How to use compression with Pentaho MapReduce. This guide uses the Snappy compression codec in its examples, but you can use any compression codec that Hadoop supports. This guide covers the following scenarios:
- Reading Compressed Files
- Writing Compressed Files
- Compressing Intermediate Data
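The writing and intermediate-data scenarios map onto standard Hadoop job properties. Reading compressed input needs no property, because the codec is chosen from the file extension. The Java sketch below only collects those property names with Snappy as the codec. It assumes a Hadoop 2.x or later cluster; MR1 clusters use deprecated names such as `mapred.output.compress` and `mapred.compress.map.output`. The class and method names are hypothetical. In Pentaho MapReduce you would typically enter such values as user-defined properties on the job entry rather than write Java.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: collects the Hadoop 2.x property names that enable
 * compressed job output and compressed intermediate (map) output with Snappy.
 * Reading compressed input needs no property; the codec comes from the file extension.
 */
public class CompressionProperties {

    static final String SNAPPY = "org.apache.hadoop.io.compress.SnappyCodec";

    /** Returns the properties in a stable, insertion-ordered map. */
    static Map<String, String> snappyJobProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        // Writing compressed files: compress the final job output.
        props.put("mapreduce.output.fileoutputformat.compress", "true");
        props.put("mapreduce.output.fileoutputformat.compress.codec", SNAPPY);
        // Compressing intermediate data: compress map output before the shuffle.
        props.put("mapreduce.map.output.compress", "true");
        props.put("mapreduce.map.output.compress.codec", SNAPPY);
        return props;
    }

    public static void main(String[] args) {
        // Print each property as name=value.
        snappyJobProperties().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Compressing intermediate data mainly reduces shuffle traffic between mappers and reducers. Compressing the final output reduces storage and the I/O of any downstream job that reads it.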
To follow along with this how-to guide, you will need the following:
- Pentaho Data Integration
- Pentaho Hadoop Distribution
- Compression Codec Installed on Hadoop
Reading Compressed Files
In this task, you will configure Pentaho MapReduce to read compressed files into the Map/Reduce Input step.
Pentaho MapReduce supports the following compression codecs automatically. Reading a file compressed with one of them requires no additional configuration.
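For text input, Hadoop decides whether and how to decompress a file by matching the file name's suffix against the codecs it knows about. Hadoop's `CompressionCodecFactory` does this lookup. The Java sketch below is a simplified, dependency-free model of that lookup and is not Pentaho or Hadoop code. It lists default suffixes for a few codecs bundled with Hadoop. The class name and the `weblogs.txt` file names are hypothetical. The `.snappy` entry only works on a real cluster if the native Snappy library is installed there, which is the "Compression Codec Installed on Hadoop" prerequisite.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Simplified model of extension-based codec selection (assumption: modeled on
 * Hadoop's CompressionCodecFactory, not taken from it). A file whose name matches
 * no known suffix is read as plain, uncompressed input.
 */
public class CodecByExtension {

    // Default file suffixes for some codecs that ship with Hadoop.
    private static final Map<String, String> CODECS = Map.of(
            ".gz", "org.apache.hadoop.io.compress.GzipCodec",
            ".bz2", "org.apache.hadoop.io.compress.BZip2Codec",
            ".deflate", "org.apache.hadoop.io.compress.DefaultCodec",
            ".snappy", "org.apache.hadoop.io.compress.SnappyCodec");

    /** Returns the codec class name for a file name, or empty if no suffix matches. */
    static Optional<String> codecFor(String fileName) {
        return CODECS.entrySet().stream()
                .filter(e -> fileName.endsWith(e.getKey()))
                .map(Map.Entry::getValue)
                .findFirst();
    }

    public static void main(String[] args) {
        // Hypothetical input file names.
        for (String f : new String[] {"weblogs.txt.snappy", "weblogs.txt.gz", "weblogs.txt"}) {
            System.out.println(f + " -> " + codecFor(f).orElse("(uncompressed)"));
        }
    }
}
```

Because selection is driven by the suffix, a compressed file must keep its codec's extension (for example `.snappy`) to be decompressed automatically.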
- Create Year Partitioner Class: In a text editor, create a new file named YearPartitioner.java containing the following code: