Using a Custom Input or Output Format in Pentaho MapReduce

How to use a custom Input or Output Format in Pentaho MapReduce.

In some situations you may need an input or output format beyond the base formats included with Hadoop. In this guide you will develop and implement a custom output format that names each output file after the year of the data it contains, instead of using the default part-00000 name. Although this guide implements a custom output format, the same steps can also be used for an input format. For more information on file formats, see http://developer.yahoo.com/hadoop/tutorial/module5.html#inputformat
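The naming rule described above can be sketched in plain Java, independent of Hadoop. This is only an illustration under stated assumptions: the class name `YearFileNamer`, the method `fileNameFor`, and the idea that each record key begins with a four-digit year (for example `2010-03`) are all hypothetical and not taken from this guide. In Hadoop's older `org.apache.hadoop.mapred` API, a subclass of `MultipleTextOutputFormat` can override `generateFileNameForKeyValue(key, value, name)`, which is one place logic like this could be called from. Whether the guide's own implementation takes that route is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: derives an output file name from a record key.
class YearFileNamer {
    // Assumption: the key starts with a four-digit year, e.g. "2010-03".
    private static final Pattern LEADING_YEAR = Pattern.compile("^(\\d{4})");

    // Returns the leading year if the key has one; otherwise falls back to
    // the default name supplied by the caller (for example "part-00000").
    static String fileNameFor(String key, String defaultName) {
        Matcher m = LEADING_YEAR.matcher(key == null ? "" : key.trim());
        return m.find() ? m.group(1) : defaultName;
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor("2010-03", "part-00000"));
        System.out.println(fileNameFor("2011-11", "part-00000"));
        System.out.println(fileNameFor("unknown", "part-00000"));
    }
}
```

Falling back to the default name for keys without a recognizable year keeps malformed records from producing empty or surprising file names.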

...

  1. List the Output Files: The listing should show a file named 2010 and a file named 2011.
    hadoop fs -ls /user/pdi/weblogs/aggregate_mr