Hitachi Vantara Pentaho Community Wiki



How to use a custom Input or Output Format in Pentaho MapReduce.

In some situations you may need to use an input or output format beyond the base formats included in Hadoop. In this guide you will develop and implement a custom output format that names each output file after the year of the data it contains, instead of using the default part-00000 name. Although this guide implements a custom output format, the same steps can also be used for an input format. For more information on file formats:


  1. List the Output Files: Listing the output files should return a file named 2010 and a file named 2011.
    hadoop fs -ls /user/pdi/weblogs/aggregate_mr
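
The file names 2010 and 2011 in that listing come from the custom output format choosing a file name per record rather than accepting the default part-00000. The following is only a minimal sketch of that naming rule in plain Java, with no Hadoop dependency. The class name `YearFileNamer`, the method `fileNameFor`, and the assumption that each key starts with a four-digit year are all hypothetical and not taken from this guide. In Hadoop's older `mapred` API, logic like this could plausibly sit inside an override of `MultipleTextOutputFormat.generateFileNameForKeyValue`, but whether this guide uses that class is also an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: illustrates the naming rule only, not a Hadoop OutputFormat.
public class YearFileNamer {

    // Assumption: the record key begins with a four-digit year, e.g. "2010-03".
    private static final Pattern LEADING_YEAR = Pattern.compile("^(\\d{4})");

    // Returns the year as the file name, or the default name when no year is found.
    static String fileNameFor(String key, String defaultName) {
        Matcher m = LEADING_YEAR.matcher(key.trim());
        return m.find() ? m.group(1) : defaultName;
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor("2010-03", "part-00000"));
        System.out.println(fileNameFor("2011-11", "part-00000"));
        System.out.println(fileNameFor("unknown", "part-00000"));
    }
}
```

Falling back to the default name for keys without a recognizable year keeps malformed records from being silently dropped; they would simply land in the usual part file.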