Hitachi Vantara Pentaho Community Wiki

Outdated Material

This page contains outdated material for an old Kettle version (4.3) and has been archived. If you found it via search, make sure it is what you want.

Setting up and configuring the Pentaho node dist, Kettle (PDI) and Reporting

Preconfigured Packages

These instructions are specific to the MapR distribution of Hadoop. If you are not using MapR, go to the Configure Pentaho for Cloudera and Other Hadoop Versions page.

Client Configuration

MapR Client

  1. Follow the installation instructions provided by MapR for your architecture: Setting up the Client - MapR

Kettle Client

  1. Download and extract Kettle CE from the Downloads page.
  2. Configure PDI Client for MapR
    1. Overview:
      1. The MapR native libraries for your architecture must be on the java.library.path
      2. The MapR Hadoop configuration directory must be on the classpath
      3. The MapR Hadoop Core library must be on the classpath
    2. All architectures
      1. Update the classpath property in $PDI_HOME/launcher/launcher.properties to include the relative path to your MapR configuration directory, e.g. classpath=../:../ui:../ui/images:../libext/mondrian/config:${HADOOP_HOME}/conf:../libext/bigdata/pigConf:../../../../opt/mapr/conf. Alternatively, use the attached launcher.properties.
      2. Delete $PDI_HOME/libext/bigdata/hadoop-0.20.2-core.jar
      3. Copy $MAPR_HOME/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar into $PDI_HOME/libext/bigdata
      4. Copy $MAPR_HOME/hadoop/hadoop-0.20.2/lib/maprfs-0.1.jar into $PDI_HOME/libext/bigdata
    3. Linux x64
      1. Update the $PDI_HOME/spoon.sh with the attached spoon.sh
      2. Update the $PDI_HOME/pan.sh with the attached pan.sh
      3. Update the $PDI_HOME/kitchen.sh with the attached kitchen.sh
      4. Update the $PDI_HOME/carte.sh with the attached carte.sh
    4. Mac OS X 64-bit
      1. Update the Data Integration 64-bit.app/Contents/Info.plist with the attached Info.plist
  3. Apply the Hadoop client configuration by adding the core-site.xml, hdfs-site.xml, and mapred-site.xml files to the $PDI_HOME directory.

Report Designer

  1. Download and extract PRD from the Downloads page.
  2. Configure PRD for MapR
    1. Delete $PRD_HOME/lib/jdbc/hadoop-0.20.2-core.jar
    2. Copy $MAPR_HOME/hadoop/hadoop-0.20.2/lib/hadoop-0.20.2-dev-core.jar into $PRD_HOME/lib
    3. Copy $MAPR_HOME/hadoop/hadoop-0.20.2/lib/maprfs-0.1.jar into $PRD_HOME/lib
    4. Linux x64:
      1. Add "-Djava.library.path=/opt/mapr/hadoop/hadoop-0.20.2/lib/native/Linux-amd64-64" to the last line in $PRD_HOME/report-designer.sh
    5. Mac OS X 64-bit:
      1. Add "-Djava.library.path=/opt/mapr/hadoop/hadoop-0.20.2/lib/native/Mac_OS_X-x86_64-64" to the "VMOptions" entry in $PRD_HOME/Pentaho\ Report\ Designer.app/Contents/Info.plist
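
The two platform-specific options above differ only in the native library directory. As a rough sketch, the helper below (mapr_native_opt is a hypothetical name) prints the option for the current operating system. It assumes MapR is installed under /opt/mapr, as in the paths above, and that the machine is 64-bit.

```shell
#!/bin/sh
# Sketch only. mapr_native_opt is a hypothetical helper, not part of PRD or MapR.
# Prints the -Djava.library.path option for the given OS name
# (defaults to the output of `uname -s`).
mapr_native_opt() {
    os=${1:-$(uname -s)}
    native_root=/opt/mapr/hadoop/hadoop-0.20.2/lib/native   # assumes the default MapR location

    case "$os" in
        Linux)  echo "-Djava.library.path=$native_root/Linux-amd64-64" ;;
        Darwin) echo "-Djava.library.path=$native_root/Mac_OS_X-x86_64-64" ;;
        *)      echo "unsupported OS: $os" >&2; return 1 ;;
    esac
}
```

The printed string is what gets added to report-designer.sh on Linux or to the "VMOptions" entry of Info.plist on Mac OS X. The sketch does not edit either file.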

Known Issues

  1. When using the HBase Input or Output steps from within a Pentaho MapReduce job, you must have the HBase jar on the HADOOP_CLASSPATH of each node running a TaskTracker.
  2. When using CDH3u1 or later, the Hive JDBC driver fails to retrieve the last row. Replace the Hive JDBC driver in Kettle's libext/bigdata/JDBC directory and in PRD's lib/jdbc directory with hive-jdbc-0.7.0-pentaho-SNAPSHOT.jar.
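
For the first issue, one way to put the HBase jar on HADOOP_CLASSPATH is sketched below. The helper name add_to_hadoop_classpath and the jar path in the usage note are illustrative assumptions. The real jar location depends on your HBase install, and the variable is typically set in Hadoop's conf/hadoop-env.sh on each TaskTracker node.

```shell
#!/bin/sh
# Sketch only. add_to_hadoop_classpath is a hypothetical helper.
# Appends a jar to HADOOP_CLASSPATH unless it is already listed, then exports it.
add_to_hadoop_classpath() {
    jar=$1
    case ":${HADOOP_CLASSPATH:-}:" in
        *":$jar:"*)
            ;;  # already present, leave the variable unchanged
        *)
            # Add a ':' separator only when the variable is non-empty.
            HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$jar"
            ;;
    esac
    export HADOOP_CLASSPATH
}
```

For example, add_to_hadoop_classpath /path/to/hbase.jar, where /path/to/hbase.jar is a placeholder for the actual HBase jar on that node.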