The Pentaho Big Data Initiative

This wiki space is the community home and collection point for all things "Big Data" within the Pentaho ecosystem. This is the place to find documentation, how-tos, use cases and other information about employing Pentaho technology as part of your overall Big Data strategy. It is also a place where you can share your own information and experiences using Pentaho Big Data technology. We look forward to your participation and contribution!

The term "Big Data" gets thrown around quite a bit these days (especially by us vendors) and there is no absolute, universally accepted definition. The two words themselves are about as nondescriptive as possible. It would not be useful to get into an academic debate about what Big Data really means, or should mean, or whether all of the information in this wiki space strictly fits every possible definition of Big Data. Let's please agree to table all such debates.

Pentaho generally uses the Wikipedia definition, under which Big Data usually has one or more of the following characteristics: