How to configure client-side PDI so that files compressed using the Snappy codec can be decompressed using the Hadoop file input or Text file input step.


Step-By-Step Instructions

Configure PDI to Access Snappy Native Libraries

To use client-side PDI to decompress files encoded by hadoop-snappy (the snappy implementation used in Hadoop), you must build and install both the hadoop-snappy JNI interface and the snappy native libraries for your platform. Instructions for achieving this can be found at:


In particular, the instructions under "Build Hadoop Snappy" should be followed. The "Install Hadoop Snappy in Hadoop" instructions should be followed only if both of the following apply:

  1. You want to decompress snappy encoded files within a Pentaho MapReduce job (see Using Compression with Pentaho MapReduce for more information), and
  2. Your Hadoop installation does not have hadoop-snappy installed already (recent Hadoop distributions from Cloudera etc. are configured with hadoop-snappy out of the box)

Once you have built hadoop-snappy:

  1. Uncompress the hadoop-snappy-x.y.z-SNAPSHOT.tar.gz archive that the build process creates to a location on your client PDI machine
  2. Copy hadoop-snappy-x.y.z-SNAPSHOT/lib/hadoop-snappy-x.y.z-SNAPSHOT.jar to libext/bigdata in your client PDI installation
  3. Set the java.library.path property to point to the subdirectory of hadoop-snappy-x.y.z-SNAPSHOT/lib/native that corresponds to your platform
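
As a rough way to check the result of Step 3, the following Java sketch scans a java.library.path value for native library files. It is an illustration only: the class name LibraryPathCheck is hypothetical, and the library names snappy and hadoopsnappy are assumptions about what the build places under lib/native, so compare them with the files actually present in your platform subdirectory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDI or hadoop-snappy.
class LibraryPathCheck {

    /** Returns every file on the given library path that matches the platform file name of libName. */
    static List<File> findLibrary(String libraryPath, String libName) {
        // Maps "snappy" to the platform-specific file name, e.g. libsnappy.so on Linux.
        String fileName = System.mapLibraryName(libName);
        List<File> hits = new ArrayList<>();
        for (String entry : libraryPath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue; // skip empty entries such as a trailing separator
            }
            File candidate = new File(entry, fileName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.library.path", "");
        // Assumed library names; adjust to match the files in lib/native/<your platform>.
        for (String lib : new String[] {"snappy", "hadoopsnappy"}) {
            System.out.println(lib + " -> " + findLibrary(path, lib));
        }
    }
}
```

Run the class with the same -Djava.library.path value that you intend to give PDI. An empty list for a library suggests that the property does not yet point at the correct platform subdirectory.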

Where to set the java.library.path in Step 3 will vary depending on your platform:

Verifying that Snappy Decompression is Available to PDI

After following the instructions in the previous section, restart PDI. If hadoop-snappy and the snappy native libraries have been installed correctly on the PDI client machine, a "Hadoop-snappy" option will be available in the "Compression" drop-down box on the "Content" tab of the Hadoop file input and Text file input steps.
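
If the option does not appear, it may help to test native library loading outside PDI. The sketch below simply asks the JVM to load each library by name and reports the outcome. The class name SnappyLoadCheck is hypothetical, and the library names snappy and hadoopsnappy are assumptions rather than names confirmed by this page.

```java
// Hypothetical diagnostic, not part of PDI or hadoop-snappy.
class SnappyLoadCheck {

    /** Returns true if the JVM can load the named native library from java.library.path. */
    static boolean canLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            // Thrown when the library is missing from java.library.path or cannot be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed library names; adjust to match your hadoop-snappy build.
        for (String lib : new String[] {"snappy", "hadoopsnappy"}) {
            System.out.println(lib + ": " + (canLoad(lib) ? "loaded" : "not loadable from java.library.path"));
        }
    }
}
```

Run it with the same -Djava.library.path setting used for PDI. A library that fails to load here would most likely fail inside PDI as well, which narrows the problem to the native library setup rather than the step configuration.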