Hitachi Vantara Pentaho Community Wiki


Welcome to the Big Data space in the Pentaho Community wiki. This space is the community home for Big Data and NoSQL technologies within the Pentaho ecosystem. It is the place to find information, how-to's, developer info, technology previews and other information about employing Pentaho technology as part of your overall Big Data Strategy. It is also where you can share your own information and experiences. We look forward to your participation and contribution!

If you are not a developer, are looking for more product-specific information, or are interested in commercial support, the main Pentaho website is the place to find those resources.


h1. News and Information

* *Pentaho Big Data components are now open source!* \- In order to play well within the Hadoop open source ecosystem and make Kettle the best and most pervasive ETL engine in the Big Data space, Pentaho has put all of the Hadoop and NoSQL components into open source starting with the 4.3 release. [READ MORE HERE Press Release (TODO)|READ MORE HERE Press Release (TODO)]\\
* *Kettle license moves to Apache* \- To further Kettle adoption within the Hadoop community, Pentaho has decided to move the Kettle open source license from LGPL to the more permissive Apache license. This removes any question about what restrictions apply to a derivative work that combines Kettle with Hadoop. [READ MORE HERE Press Release (TODO)|READ MORE HERE Press Release (TODO)]\\
* *4.3 Pre-Release* of Kettle with the new Big Data components is now available for [download|Downloads].
* *First set of Big Data How-To's Published* \- Check out the How-To's for MapR Hadoop and the Cassandra NoSQL database [here|How To's].

h1. Intro Videos

*Running a transform in the Cluster* \- A quick introduction to executing a Kettle transform as a Mapper within the cluster

*Putting data into Hadoop* \- A quick demo of using Kettle to put data into HDFS


{note:title=This is currently a closed wiki space}
The only people with access are Pentaho employees and the BAD team.{note}

* !Common Images^new-icon.png!*Community Edition Upgrade to Big Data 5.0.4* \- [Upgrade Hadoop in Community Edition to 5.0.4]
* !Common Images^new-icon.png!*Pentaho Labs update* \- [Kettle running on Storm|Kettle Execution on Storm]
* !Common Images^new-icon50.png!*Pentaho Labs update* \- [Realtime debugging Kettle transforms running in Hadoop|Pentaho Map Reduce Vizor]
* *Update to Big Data Plugin Available for PDI 4.4 and BA Suite 4.8* \- Lots of fixes and new distro support. [download|]

New and recently updated Big Data content is listed on the [What's New?|What's New?] page.


With growing volumes and varieties of data flowing at increasing speed, organizations need a fast and easy way to harness and gain insight from their big data sources. Pentaho accelerates the realization of value from big data with the most complete solution for big data analytics.


For a more complete overview of the Pentaho Big Data story, visit the main Pentaho website.

h1. Getting Started

Select your Big Data technology to get started...