Configure Pentaho for Cloudera CDH4

Outdated Material
This page contains outdated material for an old Kettle version (4.3) and has been archived. If you found it via search, make sure it is what you want.

Client Configuration

These instructions are for the Cloudera CDH4 MRv1 distribution. For previous versions, see Configure Pentaho for Cloudera and Other Hadoop Versions.

Kettle Client (PDI)

  1. Download and extract Kettle 4.3.0 CE from the Downloads page.
    The Kettle Client comes pre-configured for Apache Hadoop 0.20.2; if you are using that distribution and version, no further configuration is required.
  2. Configure PDI Client for CDH4 MRv1
    1. Delete the pentaho-big-data-plugin directory found at $PDI_HOME/plugins.
    2. Download the Pentaho Big Data plugin built for CDH4 MRv1 from http://ci.pentaho.com/job/BRANCH_CDH4_pentaho-big-data-plugin/, unzip it, and move the pentaho-big-data-plugin directory to $PDI_HOME/plugins.
    3. Delete the following from $PDI_HOME/libext/bigdata:
      1. hadoop-0.20.2-core.jar
      2. hbase-0.90.3.jar
      3. zookeeper-3.3.2.jar
    4. Delete the following from $PDI_HOME/libext/bigdata/hive:
      1. hive-exec-0.7.0-pentaho-1.0.1.jar
      2. hive-metastore-0.7.0-pentaho-1.0.1.jar
      3. hive-service-0.7.0-pentaho-1.0.1.jar
      4. libfb303.jar
      5. libthrift.jar
    5. Delete: $PDI_HOME/libext/google/google-collections-1.0-rc5.jar
    6. Copy the following jars from the CDH4 MRv1 installation to $PDI_HOME/libext/bigdata:
      1. avro-1.5.4.jar
      2. commons-configuration-1.6.jar
      3. hadoop-auth-2.0.0-cdh4.0.0.jar
      4. hadoop-common-2.0.0-cdh4.0.0.jar
      5. hadoop-core-2.0.0-mr1-cdh4.0.0.jar
      6. hadoop-hdfs-2.0.0-cdh4.0.0.jar
      7. hbase-0.92.1-cdh4.0.0-security.jar
      8. protobuf-java-2.4.0a.jar
      9. zookeeper-3.4.3-cdh4.0.0.jar
    7. Copy the following jars from the CDH4 MRv1 installation to $PDI_HOME/libext/bigdata/hive:
      1. hive-builtins-0.8.1-cdh4.0.0.jar
      2. hive-exec-0.8.1-cdh4.0.0.jar
      3. hive-metastore-0.8.1-cdh4.0.0.jar
      4. hive-service-0.8.1-cdh4.0.0.jar
      5. libfb303-0.7.0.jar
      6. libthrift-0.7.0.jar
    8. Move $PDI_HOME/plugins/pentaho-big-data-plugin/lib/guava-11.0.2.jar to the $PDI_HOME/libext/bigdata directory.
  3. Copy the core-site.xml, hdfs-site.xml, and mapred-site.xml configuration files from your Hadoop cluster to the $PDI_HOME directory. (A consolidated shell sketch of steps 2 and 3 follows this list.)
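
The following is a minimal shell sketch of configuration steps 2.3-2.8 and step 3 above, not an official script. It assumes $PDI_HOME points at your Kettle installation, and that the CDH4 MRv1 jars from step 2.6 and the Hive jars from step 2.7 have already been staged into two hypothetical directories, $CDH4_JARS and $CDH4_HIVE_JARS; those staging paths and $HADOOP_CONF are placeholders to adjust for your cluster layout.

  # $CDH4_JARS, $CDH4_HIVE_JARS, and $HADOOP_CONF are hypothetical placeholders; set them for your cluster.
  cd "$PDI_HOME"
  # Steps 2.3-2.5: remove the bundled Apache Hadoop 0.20.2 and Hive 0.7.0 jars
  rm libext/bigdata/hadoop-0.20.2-core.jar libext/bigdata/hbase-0.90.3.jar libext/bigdata/zookeeper-3.3.2.jar
  rm libext/bigdata/hive/hive-*-0.7.0-pentaho-1.0.1.jar libext/bigdata/hive/libfb303.jar libext/bigdata/hive/libthrift.jar
  rm libext/google/google-collections-1.0-rc5.jar
  # Steps 2.6-2.7: copy in the CDH4 MRv1 jars staged from the cluster
  cp "$CDH4_JARS"/*.jar libext/bigdata/
  cp "$CDH4_HIVE_JARS"/*.jar libext/bigdata/hive/
  # Step 2.8: move guava out of the plugin and into libext/bigdata
  mv plugins/pentaho-big-data-plugin/lib/guava-11.0.2.jar libext/bigdata/
  # Step 3: copy the cluster configuration files into $PDI_HOME
  cp "$HADOOP_CONF"/core-site.xml "$HADOOP_CONF"/hdfs-site.xml "$HADOOP_CONF"/mapred-site.xml .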

Pentaho Report Designer (PRD)

  1. Download and extract PRD from the Downloads page.
    PRD comes pre-configured for Apache Hadoop 0.20.2; if you are using that distribution and version, no further configuration is required.
  2. Configure PRD for CDH4 MRv1
    1. Delete the pentaho-big-data-plugin directory found at $PRD_HOME/plugins.
    2. Download the Pentaho Big Data plugin built for CDH4 MRv1 from http://ci.pentaho.com/job/BRANCH_CDH4_pentaho-big-data-plugin/, unzip it, and move the pentaho-big-data-plugin directory to $PRD_HOME/plugins.
    3. Delete the following from $PRD_HOME/lib/bigdata:
      1. hadoop-0.20.2-core.jar
      2. hbase-0.90.3.jar
      3. zookeeper-3.3.2.jar
    4. Delete the following from $PRD_HOME/lib/jdbc:
      1. hive-exec-0.7.0-pentaho-1.0.1.jar
      2. hive-metastore-0.7.0-pentaho-1.0.1.jar
      3. hive-service-0.7.0-pentaho-1.0.1.jar
      4. libfb303-0.5.0.jar
      5. libthrift-0.5.0.jar
    5. Copy the following jars from the CDH4 MRv1 installation to $PRD_HOME/lib/bigdata:
      1. avro-1.5.4.jar
      2. commons-configuration-1.6.jar
      3. hadoop-auth-2.0.0-cdh4.0.0.jar
      4. hadoop-common-2.0.0-cdh4.0.0.jar
      5. hadoop-core-2.0.0-mr1-cdh4.0.0.jar
      6. hadoop-hdfs-2.0.0-cdh4.0.0.jar
      7. hbase-0.92.1-cdh4.0.0-security.jar
      8. protobuf-java-2.4.0a.jar
      9. zookeeper-3.4.3-cdh4.0.0.jar
    6. Copy the following jars from the CDH4 MRv1 installation to $PRD_HOME/lib/jdbc:
      1. hive-builtins-0.8.1-cdh4.0.0.jar
      2. hive-exec-0.8.1-cdh4.0.0.jar
      3. hive-metastore-0.8.1-cdh4.0.0.jar
      4. hive-service-0.8.1-cdh4.0.0.jar
      5. libfb303-0.7.0.jar
      6. libthrift-0.7.0.jar
    7. Move $PRD_HOME/plugins/pentaho-big-data-plugin/lib/guava-11.0.2.jar to the $PRD_HOME/lib/bigdata directory. (A shell sketch of these steps follows this list.)
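
The PRD procedure differs from the PDI one only in its target directories (lib/bigdata and lib/jdbc instead of libext/bigdata and libext/bigdata/hive). A minimal sketch under the same staging-directory assumptions as the PDI example:

  # $CDH4_JARS and $CDH4_HIVE_JARS are hypothetical staging directories; set them for your cluster.
  cd "$PRD_HOME"
  # Steps 2.3-2.4: remove the bundled Hadoop 0.20.2 and Hive 0.7.0 jars
  rm lib/bigdata/hadoop-0.20.2-core.jar lib/bigdata/hbase-0.90.3.jar lib/bigdata/zookeeper-3.3.2.jar
  rm lib/jdbc/hive-*-0.7.0-pentaho-1.0.1.jar lib/jdbc/libfb303-0.5.0.jar lib/jdbc/libthrift-0.5.0.jar
  # Steps 2.5-2.6: copy in the CDH4 MRv1 jars staged from the cluster
  cp "$CDH4_JARS"/*.jar lib/bigdata/
  cp "$CDH4_HIVE_JARS"/*.jar lib/jdbc/
  # Step 2.7: move guava out of the plugin and into lib/bigdata
  mv plugins/pentaho-big-data-plugin/lib/guava-11.0.2.jar lib/bigdata/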

Pentaho Business Intelligence Server (BI Server)

  1. Download and extract BI Server from the Downloads page.
    The BI Server comes pre-configured for Apache Hadoop 0.20.2; if you are using that distribution and version, no further configuration is required.
  2. Configure BI Server for CDH4 MRv1
    1. Delete the pentaho-big-data-plugin directory found at $BI_SERVER_HOME/pentaho-solutions/system/kettle/plugins.
    2. Download the Pentaho Big Data plugin built for CDH4 MRv1 from http://ci.pentaho.com/job/BRANCH_CDH4_pentaho-big-data-plugin/, unzip it, and move the pentaho-big-data-plugin directory to $BI_SERVER_HOME/pentaho-solutions/system/kettle/plugins.
    3. Delete the following from $BI_SERVER_HOME/tomcat/webapps/pentaho/WEB-INF/lib:
      1. hadoop-0.20.2-core.jar
      2. hbase-0.90.3.jar
      3. hive-exec-0.7.0-pentaho-1.0.1.jar
      4. hive-metastore-0.7.0-pentaho-1.0.1.jar
      5. hive-service-0.7.0-pentaho-1.0.1.jar
      6. libfb303-0.5.0.jar
      7. libthrift-0.5.0.jar
      8. zookeeper-3.3.2.jar
    4. Copy the following jars from the CDH4 MRv1 installation to $BI_SERVER_HOME/tomcat/webapps/pentaho/WEB-INF/lib:
      1. avro-1.5.4.jar
      2. commons-configuration-1.6.jar
      3. hadoop-auth-2.0.0-cdh4.0.0.jar
      4. hadoop-common-2.0.0-cdh4.0.0.jar
      5. hadoop-core-2.0.0-mr1-cdh4.0.0.jar
      6. hadoop-hdfs-2.0.0-cdh4.0.0.jar
      7. hbase-0.92.1-cdh4.0.0-security.jar
      8. hive-builtins-0.8.1-cdh4.0.0.jar
      9. hive-exec-0.8.1-cdh4.0.0.jar
      10. hive-metastore-0.8.1-cdh4.0.0.jar
      11. hive-service-0.8.1-cdh4.0.0.jar
      12. libfb303-0.7.0.jar
      13. libthrift-0.7.0.jar
      14. protobuf-java-2.4.0a.jar
      15. zookeeper-3.4.3-cdh4.0.0.jar
    5. Move $BI_SERVER_HOME/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/lib/guava-11.0.2.jar to the $BI_SERVER_HOME/tomcat/webapps/pentaho/WEB-INF/lib directory.
  3. Place the Hadoop configuration files (hdfs-site.xml, core-site.xml, mapred-site.xml) into $BI_SERVER_HOME/tomcat/webapps/pentaho/WEB-INF/classes. (A shell sketch of these steps follows this list.)
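
For the BI Server, everything lands in the web application's WEB-INF tree. A minimal sketch under the same assumptions as the sketches above ($CDH4_JARS and $CDH4_HIVE_JARS as hypothetical staging directories for the cluster jars, $HADOOP_CONF for the configuration files):

  # $CDH4_JARS, $CDH4_HIVE_JARS, and $HADOOP_CONF are hypothetical placeholders; set them for your cluster.
  WEBINF="$BI_SERVER_HOME"/tomcat/webapps/pentaho/WEB-INF
  # Step 2.3: remove the bundled Hadoop 0.20.2 and Hive 0.7.0 jars
  rm "$WEBINF"/lib/hadoop-0.20.2-core.jar "$WEBINF"/lib/hbase-0.90.3.jar "$WEBINF"/lib/zookeeper-3.3.2.jar
  rm "$WEBINF"/lib/hive-*-0.7.0-pentaho-1.0.1.jar "$WEBINF"/lib/libfb303-0.5.0.jar "$WEBINF"/lib/libthrift-0.5.0.jar
  # Step 2.4: copy in all fifteen CDH4 MRv1 jars
  cp "$CDH4_JARS"/*.jar "$CDH4_HIVE_JARS"/*.jar "$WEBINF"/lib/
  # Step 2.5: move guava out of the plugin and into WEB-INF/lib
  mv "$BI_SERVER_HOME"/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/lib/guava-11.0.2.jar "$WEBINF"/lib/
  # Step 3: place the cluster configuration files on the web application classpath
  cp "$HADOOP_CONF"/core-site.xml "$HADOOP_CONF"/hdfs-site.xml "$HADOOP_CONF"/mapred-site.xml "$WEBINF"/classes/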


Comments

  1. Oct 05, 2012

    Vijay Ivaturi says:

    In addition to the above steps, make sure you copy the config files to the PDI home directory. We ended up wasting a lot of time with all the CDH4 jars copied correctly, but PDI could not connect to the remote namenode to submit MapReduce jobs.

    1. Apply the Hadoop client configuration files by placing core-site.xml, hdfs-site.xml, and mapred-site.xml in the $PDI_HOME directory.
  2. Oct 11, 2012

    Vijay Ivaturi says:

    The HDFS step in Spoon failed even after the jar file changes. The fix is to copy commons-cli-1.2.jar to $PENTAHO_HOME/design-tools/data-integration/libext/bigdata/. Thanks Jonathan Bender!
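
    A one-line sketch of that fix; /usr/lib/hadoop/lib is a typical jar location on a CDH4 package install, but adjust it to wherever your distribution keeps commons-cli:

      # /usr/lib/hadoop/lib is an assumed source location; adjust as needed
      cp /usr/lib/hadoop/lib/commons-cli-1.2.jar "$PENTAHO_HOME"/design-tools/data-integration/libext/bigdata/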

