The Pentaho Big Data Initiative
This wiki space is the community home and collection point for all things "Big Data" within the Pentaho ecosystem. This is the place to find documentation, how-tos, use cases, and other information about employing Pentaho technology as part of your overall Big Data strategy. It is also the place to share your own information and experiences using Pentaho Big Data technology. We look forward to your participation and contribution!
The term "Big Data" gets thrown around quite a bit these days (especially by us vendors), and there is no absolute, universally accepted definition. The two words themselves are about as nondescript as possible. It would not be useful to get into an academic debate about what Big Data really means, or should mean, or about whether all of the information in this wiki space strictly fits every possible definition of Big Data. Let's agree to table all such debates.
Pentaho generally follows the Wikipedia definition, under which big data usually has one or more of the following characteristics:
- Very large data volumes measured in terabytes or petabytes
- Variety of structured, unstructured and semi-structured data
- High-velocity, rapidly changing data
- Datasets that grow so large that they become awkward or uneconomic to work with using traditional database management and BI tools

Analyzing big data allows analysts, data scientists, and now casual business users to do things not previously possible, including identifying business trends, preventing diseases, and combating crime.