This guide shows how to use compression with Pentaho MapReduce. The examples use the Snappy compression codec, but you may use any compression codec that is supported in Hadoop. The following scenarios are covered:
- Reading Compressed Files
- Writing Compressed Files
- Compressing Intermediate Data
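Each of these scenarios ultimately comes down to setting standard Hadoop job properties, which Pentaho MapReduce lets you supply on the job entry. As a preview of the intermediate-data scenario, here is a minimal sketch (not from the original guide) that enables Snappy compression of map output using the classic pre-YARN property names; newer Hadoop versions use different names, so check what your distribution expects:

```java
import org.apache.hadoop.conf.Configuration;

// Sketch only: shows the Hadoop properties commonly used to compress
// intermediate map output with Snappy. Property names are the classic
// (pre-YARN) ones; newer Hadoop uses mapreduce.map.output.compress.
public class SnappyIntermediateConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Compress the map output that is shuffled to the reducers.
        conf.setBoolean("mapred.compress.map.output", true);
        conf.set("mapred.map.output.compression.codec",
                "org.apache.hadoop.io.compress.SnappyCodec");

        System.out.println("map output compression enabled: "
                + conf.getBoolean("mapred.compress.map.output", false));
    }
}
```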
To follow along with this how-to guide, you will need the following:
- Pentaho Data Integration
- Pentaho Hadoop Distribution
- Compression Codec Installed on Hadoop
Reading Compressed Files
In this task you will configure Pentaho MapReduce to read compressed files into the Map/Reduce Input.
Compression codecs registered with Hadoop are supported automatically by Pentaho MapReduce: Hadoop selects the codec based on the input file's extension (for example, .gz for gzip or .bz2 for bzip2), so you do not need to do any configuration to read a file compressed with these codecs.
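To illustrate the mechanism, the sketch below (not part of the original guide) uses Hadoop's CompressionCodecFactory, the same extension-based lookup Hadoop's input formats perform when opening a file. The class name CodecLookup and the input path are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CodecLookup {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);

        // Hadoop matches the file extension against the codecs registered
        // in io.compression.codecs and returns the match, or null if none.
        Path input = new Path("/weblogs/raw/access_log.gz"); // placeholder path
        CompressionCodec codec = factory.getCodec(input);

        System.out.println(codec == null
                ? "No codec matched; the file will be read uncompressed"
                : "Matched codec: " + codec.getClass().getName());
    }
}
```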
- Create Year Partitioner Class: In a text editor, create a new file named YearPartitioner.java containing the code shown below.
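A minimal sketch of such a partitioner, written against Hadoop's old-style mapred API and assuming the map output key begins with a four-digit year; adjust the key parsing to match your data:

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Sends each record to a reducer chosen by the year at the start of
// the key, so all records for a given year land on the same reducer.
public class YearPartitioner implements Partitioner<Text, Text> {

    @Override
    public void configure(JobConf job) {
        // No job configuration is needed for this partitioner.
    }

    @Override
    public int getPartition(Text key, Text value, int numReduceTasks) {
        // Assumption: the key starts with a four-digit year, e.g. "2012-...".
        int year = Integer.parseInt(key.toString().substring(0, 4));
        return year % numReduceTasks;
    }
}
```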