How to use a custom Input or Output Format in Pentaho MapReduce
In some situations you may need an input or output format beyond the base formats included in Hadoop. In this guide you will develop and implement a custom output format that names each output file after the year of the data it contains, instead of using the default part-00000 name. Although this guide implements a custom output format, the same steps can also be used for an input format. For more information on file formats, see: http://developer.yahoo.com/hadoop/tutorial/module5.html#inputformat
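The file-naming rule itself can be sketched without a cluster. In Hadoop's older org.apache.hadoop.mapred API, one common way to get per-year file names is to subclass MultipleTextOutputFormat and override generateFileNameForKeyValue(K key, V value, String name), returning the year instead of the default name. The Java below is a hypothetical, Hadoop-free model of such an override, not this guide's actual class. It assumes each output key is a tab-separated string whose second field is a four-digit year (for example 10.0.0.1<TAB>2010<TAB>7); the names YearFileNamer, fileNameFor, partition and YEAR_FIELD are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, Hadoop-free model of a "name the file after the year" rule.
 * In a real job, fileNameFor's logic would live in an override of
 * MultipleTextOutputFormat.generateFileNameForKeyValue(key, value, name).
 */
public class YearFileNamer {

    // Assumption: keys are tab-separated and the year is the second field,
    // e.g. "10.0.0.1\t2010\t7". Adjust to match your actual key layout.
    static final int YEAR_FIELD = 1;

    /** Returns the year from the key, or the default file name if none is found. */
    static String fileNameFor(String key, String defaultName) {
        String[] fields = key.split("\t");
        if (fields.length > YEAR_FIELD && fields[YEAR_FIELD].matches("\\d{4}")) {
            return fields[YEAR_FIELD];
        }
        // Fall back to the default name (e.g. part-00000) so no record is lost.
        return defaultName;
    }

    /** Groups keys by the output file they would be written to, sorted by file name. */
    static Map<String, List<String>> partition(List<String> keys, String defaultName) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String k : keys) {
            files.computeIfAbsent(fileNameFor(k, defaultName), x -> new ArrayList<>()).add(k);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("10.0.0.1\t2010\t7", "10.0.0.2\t2011\t1", "10.0.0.1\t2011\t3");
        Map<String, List<String>> files = partition(keys, "part-00000");
        System.out.println(files.keySet()); // prints [2010, 2011]
    }
}
```

In an actual job, the body of fileNameFor would move into the generateFileNameForKeyValue override, and the compiled class would also need to be packaged and made available to the job so Pentaho MapReduce can reference it. Falling back to the default name keeps records without a recognizable year from being silently dropped.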
- List the Output Files: Listing the output files should return a file named 2010 and a file named 2011.
hadoop fs -ls /user/pdi/weblogs/aggregate_mr