Hitachi Vantara Pentaho Community Wiki

Understanding How Pentaho works with Hadoop

For a complete overview of using Pentaho and Hadoop, visit

Pentaho is integrated with Hadoop at many levels

  • Traditional ETL - Graphical designer to visually build transformations that read and write data in Hadoop from/to anywhere and transform the data on the way. No coding required - unless you want to. Transformation steps include...
    • HDFS files Read and Write
    • HBase Read/Write
    • Hive, Hive2 SQL Query and Write
    • Impala SQL Query and Write
    • Support for Avro file format and snappy compression
  • Data Orchestration - Graphical designer to visually build and schedule jobs that orchestrate processing, data movement and most aspects of operationalizing your data preparation. No coding required - unless you want to. Job steps include...
    • HDFS Copy files
    • MapReduce Job Execution
    • Pig Script Execution
    • Amazon EMR Job Execution
    • Oozie integration
    • Sqoop Import/Export
    • Pentaho MapReduce Execution
    • PDI Clustering via YARN
  • Pentaho MapReduce - Graphical designer to visually build MapReduce jobs and run them in the cluster. As a point-and-click alternative to writing Hadoop MapReduce programs in Java or Pig, Pentaho exposes a familiar ETL-style user interface. This makes Hadoop usable by IT staff and data scientists, not just developers with specialized MapReduce and Pig coding skills. As always, no coding required - unless you want to.
  • Traditional Reporting - All data sources supported above can be used directly or blended with other data to drive Pentaho's pixel-perfect reporting engine. Reports can be secured, parameterized, and published to the web to provide guided ad hoc capabilities to end users. Reports can also be combined with other Pentaho visualizations to create dashboards.
  • Web Based Interactive Reporting - Pentaho's Metadata layer leverages data stored in Hive, Hive2 and Impala for WYSIWYG, interactive, self-service reporting.
  • Pentaho Analyzer - Leverage your data stored in Impala or Hive2 (Stinger) for interactive visual analysis with drill through, lasso filtering, zooming, and attribute highlighting for greater insight.
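
To give a sense of the hand-written logic that the Pentaho MapReduce designer is meant to replace, here is a minimal sketch of the map, shuffle, and reduce phases of a word count in plain Python. It does not use Hadoop or any Pentaho API; the function names (map_words, shuffle, reduce_counts, word_count) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for each whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group emitted values by key, as a framework would
    do between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_counts(key, values):
    """Reduce phase: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce functions run in parallel across nodes and the framework performs the shuffle; the sketch runs everything in one process purely to show the shape of the computation.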

In cluster ETL

This documentation is maintained by the Pentaho community, and members are encouraged to create new pages in the appropriate spaces, or edit existing pages that need to be corrected or updated.
