The Pentaho Big Data Initiative
This wiki space is the community home and collection point for all things "Big Data" within the Pentaho ecosystem. This is the place to find documentation, how-tos, best practices, use cases and other information about employing Pentaho technology as part of your overall Big Data strategy. It is also where you can share your own information and experiences using Pentaho Big Data technology. We look forward to your participation and contribution!
The term "Big Data" gets thrown around quite a bit these days (especially by us vendors) and there is no absolute, universally accepted definition. The two words themselves are about as nondescriptive as possible. It would not be useful to get into an academic debate about what Big Data really means, or should mean, or whether all the information in this wiki space strictly fits every possible definition of Big Data. Let's please agree to table all such debates.
Pentaho generally uses the Wikipedia definition, in which big data usually has one or more of the following characteristics:
- Very large data volumes measured in terabytes or petabytes
- Variety of structured, unstructured and semi-structured data
- High-velocity, rapidly changing data
- Datasets that grow so large that they become awkward or uneconomic to work with using traditional database management and BI tools.
End of the Big Data philosophy discussion - for Pentaho marketing fluff, please visit: http://www.pentaho.com/big-data/
This is a closed wiki space
This is a first shot at getting an open source collaboration space for Big Data. It will eventually be open, but it is currently a work in progress and a place to put the use cases, demos, etc. I completely pulled the structure and initial content from my arse and am not in love with any of it. It is a round lump of clay, waiting to be molded by the brilliant minds of the Big Ass Data Team.