How to use compression with Pentaho MapReduce.

This guide uses the Snappy compression codec in its examples, but you can use any compression codec that Hadoop supports. The following scenarios are covered:


To follow along with this how-to guide, you will need the following:

Step-by-Step Instructions

Reading Compressed Files

In this task, you will configure Pentaho MapReduce to read compressed files into the Map/Reduce Input step.
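When a compressed file is read, it is decompressed as it streams in, so the input step receives ordinary text records. The following sketch shows that idea using only the JDK. It assumes gzip as a stand-in for Snappy, because the JDK has no Snappy codec. The class and method names are hypothetical, and this is not how Pentaho implements its input step.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo: gzip stands in for Snappy, byte arrays stand in for files in HDFS.
public class CompressedInputDemo {

    // Compress text with gzip, simulating a compressed input file.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and the gzip trailer written) before we read the bytes.
        return bytes.toByteArray();
    }

    // Decompress on the fly and return one record per line, as a mapper would see them.
    static List<String> readLines(byte[] compressed) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("alpha\nbeta\n");
        for (String line : readLines(compressed)) {
            System.out.println(line);
        }
    }
}
```

The consumer never handles compressed bytes directly. The decompressing stream wraps the raw input, which is why reading files compressed with a supported codec needs no changes to the transformation itself.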

Pentaho MapReduce supports the following compression codecs automatically. Reading a file compressed with one of these codecs requires no additional configuration:
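In Hadoop, `CompressionCodecFactory` chooses a codec for an input file from its file-name suffix. This is why a correctly named compressed file can be read without extra settings. The sketch below imitates that lookup in plain Java so that it runs without Hadoop on the classpath. The extension-to-class table lists the default suffixes of several standard Hadoop codecs. It is an illustrative subset, not the list of codecs that Pentaho supports automatically. The `CodecLookup` class and the `codecFor` method are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical imitation of suffix-based codec selection; not Hadoop's or Pentaho's code.
public class CodecLookup {

    // Default file extensions of some standard Hadoop codecs (illustrative subset).
    private static final Map<String, String> CODECS_BY_EXTENSION = new LinkedHashMap<>();
    static {
        CODECS_BY_EXTENSION.put(".gz", "org.apache.hadoop.io.compress.GzipCodec");
        CODECS_BY_EXTENSION.put(".bz2", "org.apache.hadoop.io.compress.BZip2Codec");
        CODECS_BY_EXTENSION.put(".deflate", "org.apache.hadoop.io.compress.DefaultCodec");
        CODECS_BY_EXTENSION.put(".snappy", "org.apache.hadoop.io.compress.SnappyCodec");
        CODECS_BY_EXTENSION.put(".lz4", "org.apache.hadoop.io.compress.Lz4Codec");
    }

    // Return the codec class name for a file, or empty if it would be read as uncompressed.
    static Optional<String> codecFor(String fileName) {
        for (Map.Entry<String, String> entry : CODECS_BY_EXTENSION.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"part-00000.snappy", "weblogs.gz", "weblogs.txt"}) {
            System.out.println(name + " -> " + codecFor(name).orElse("(uncompressed)"));
        }
    }
}
```

A file named `weblogs.txt` matches no suffix in this sketch and is treated as plain text. For this reason, compressed inputs should keep their codec's standard extension.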

  1. Create Year Partitioner Class: In a text editor, create a new file named YearPartitioner.java containing the following code: