Hitachi Vantara Pentaho Community Wiki


This guide explains how to use compression with Pentaho MapReduce. Its examples use the Snappy compression codec, but you can use any compression codec that Hadoop supports. It covers the following scenarios:

  • Reading Compressed Files
  • Writing Compressed Files
  • Compressing Intermediate Data
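The two write-side scenarios correspond to standard Hadoop job properties. Reading compressed files needs no properties, because Hadoop chooses a decompression codec from each file's suffix. The Java sketch below only collects the Hadoop 2 (MRv2) property names for compressing final output and intermediate (map output) data. It assumes you pass these properties to Pentaho MapReduce as name/value pairs, for example on the job entry's User Defined tab; check that against your PDI version. The class and method names are hypothetical. On Hadoop 1.x clusters, the equivalent (now deprecated) names are mapred.compress.map.output, mapred.map.output.compression.codec, mapred.output.compress and mapred.output.compression.codec.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: gathers the standard Hadoop 2 job properties for
 * compression. How these reach Pentaho MapReduce (e.g. a User Defined tab)
 * is an assumption, not something this class does.
 */
public class CompressionProperties {

    // Fully qualified class name of Hadoop's Snappy codec
    static final String SNAPPY = "org.apache.hadoop.io.compress.SnappyCodec";

    /** Properties that compress map output before it is shuffled to reducers. */
    static Map<String, String> intermediate(String codecClass) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapreduce.map.output.compress", "true");
        props.put("mapreduce.map.output.compress.codec", codecClass);
        return props;
    }

    /** Properties that compress the files the job writes as its final output. */
    static Map<String, String> finalOutput(String codecClass) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapreduce.output.fileoutputformat.compress", "true");
        props.put("mapreduce.output.fileoutputformat.compress.codec", codecClass);
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> all = new LinkedHashMap<>(intermediate(SNAPPY));
        all.putAll(finalOutput(SNAPPY));
        // Print name=value pairs, one per line, ready to copy into the job settings
        all.forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

Keeping the two groups separate reflects that they are independent: you can compress intermediate data without compressing the final output, and vice versa.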

Prerequisites

To follow along with this how-to guide, you will need the following:

  • Hadoop
  • Pentaho Data Integration
  • Pentaho Hadoop Distribution
  • Compression Codec Installed on Hadoop

Step-By-Step Instructions

Reading Compressed Files

In this task, you will configure Pentaho MapReduce to read compressed files into the Map/Reduce Input step.

Pentaho MapReduce supports the following compression codecs automatically. No configuration is needed to read a file compressed with one of these codecs.
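Reading works without configuration because Hadoop's input formats look up a codec from each input file's suffix (through CompressionCodecFactory) and decompress the data before records reach the mapper. The stdlib-only Java sketch below imitates that behaviour; it is not Hadoop code. Its suffix table is an illustrative subset of Hadoop's default codec extensions, not the list of codecs this guide refers to. Gzip stands in for Snappy because the JDK has no Snappy implementation. CompressedInputSketch and its methods are hypothetical names.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Illustrative sketch (not Hadoop code): picks a codec by file suffix, the way
 * Hadoop's CompressionCodecFactory does, and reads a gzip file back as lines.
 */
public class CompressedInputSketch {

    // Illustrative subset of Hadoop's default codec file extensions
    private static final Map<String, String> SUFFIX_TO_CODEC = new LinkedHashMap<>();
    static {
        SUFFIX_TO_CODEC.put(".gz", "org.apache.hadoop.io.compress.GzipCodec");
        SUFFIX_TO_CODEC.put(".bz2", "org.apache.hadoop.io.compress.BZip2Codec");
        SUFFIX_TO_CODEC.put(".snappy", "org.apache.hadoop.io.compress.SnappyCodec");
        SUFFIX_TO_CODEC.put(".deflate", "org.apache.hadoop.io.compress.DefaultCodec");
    }

    /** Returns the codec class for the first matching suffix, or null if none matches. */
    static String codecFor(String fileName) {
        for (Map.Entry<String, String> e : SUFFIX_TO_CODEC.entrySet()) {
            if (fileName.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null; // no known suffix: the file is read as uncompressed text
    }

    /** Gzip-compresses text; stands in for a compressed input file. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Decompresses gzip data and splits it into lines, like a line record reader. */
    static List<String> readLines(byte[] gzipped) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzipped)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(codecFor("weblogs.txt.gz"));
        System.out.println(readLines(gzip("2012,a\n2013,b\n")));
    }
}
```

Running main prints the GzipCodec class name followed by `[2012,a, 2013,b]`: the mapper sees plain lines, never the compressed bytes.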

  1. Create Year Partitioner Class: In a text editor, create a new file named YearPartitioner.java that contains the following code: