Hitachi Vantara Pentaho Community Wiki


The Pentaho Big Data Initiative


Welcome to the Big Data space in the Pentaho Community wiki. This space is the community home and collection point for all things "Big Data" and NoSQL technologies within the Pentaho ecosystem. It is the place to find information, how-tos, use cases, developer info, technology previews and other material about employing Pentaho technology as part of your overall Big Data strategy. It is also a place where you can share your own information and experiences using Pentaho Big Data technology. We look forward to your participation and contribution!

The term "Big Data" gets thrown around quite a bit these days (especially by us vendors), and there is no absolute, universally accepted definition. The two words themselves are about as nondescriptive as possible. It would not be useful to get into an academic debate about what Big Data really means, or should mean, or whether all information in this wiki space strictly fits every possible definition of Big Data. Let's agree to table all such debates.

Pentaho generally uses the Wikipedia definition where big data usually has one or more of the following characteristics:

  • Very large data volumes measured in terabytes or petabytes
  • Variety of structured, unstructured and semi-structured data
  • High-velocity, rapidly changing data
  • Datasets that grow so large that they become awkward or uneconomic to work with using traditional database management and BI tools

Analyzing big data allows analysts, data scientists and now casual business users to do things not previously possible, including identifying business trends, preventing diseases and combating crime.

If you are not a developer, are looking for more product specific information, or are interested in commercial support, is the place to find those resources.

News and Information

  • Community Edition Upgrade to Big Data 5.0.4: Upgrade Hadoop in Community Edition to 5.0.4
  • Pentaho Labs update: Kettle running on Storm
  • Pentaho Labs update: Real-time debugging of Kettle transforms running in Hadoop
  • Update to Big Data Plugin Available for PDI 4.4 and BA Suite 4.8: Lots of fixes and new distro support (download)

New and recently updated Big Data content is listed on the What's New? page.


With growing volumes and varieties of data flowing at increasing speed, organizations need a fast and easy way to harness and gain insight from their big data sources. Pentaho accelerates the realization of value from big data with the most complete solution for big data analytics.


For a more complete overview of the Pentaho Big Data story, visit

Getting Started

Select your Big Data technology to get started...