Hitachi Vantara Pentaho Community Wiki

Comment: Migration of unmigrated content due to installation of a new plugin

The Pentaho Big Data Initiative


Welcome to the Big Data space in the Pentaho Community wiki. This space is the community home and collection point for all things "Big Data" and NoSQL technologies within the Pentaho ecosystem. This is the place to find documentation, how-tos, best practices, use cases, developer info, technology previews and other information about employing Pentaho technology as part of your overall Big Data strategy. It is also where you can share your own information and experiences using Pentaho Big Data technology. We look forward to your participation and contribution!

Quick Launch

Overview

Pentaho's Big Data story revolves around Pentaho Data Integration (PDI), also known as Kettle. Kettle is a powerful Extraction, Transformation and Loading (ETL) engine that uses a metadata-driven approach. The Kettle engine provides data services for, and is embedded in, many of the applications within the Pentaho BI suite. Kettle comes with a graphical, drag-and-drop design environment for designing and running Kettle Jobs and Transformations.

A quick 2 min video of PDI in action

Kettle Transformations
A Kettle transformation consists of one or more steps that perform core ETL work, such as reading data in the form of rows from a file or database, filtering rows, calculating new columns and sending the new data stream somewhere else. All steps in a transformation execute simultaneously (usually in separate threads), and data is passed from step to step in parallel. The data is operated on in a continuous stream without having to be fully read into memory or staged. The image to the right shows a very simple Kettle transformation: read from a data source, apply some transformation (in this case a filter), and then write the data stream to another data source.
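The streaming model described above can be illustrated with a minimal Python sketch. This is not Kettle code and does not use any Kettle API; the function names (`read_rows`, `filter_rows`, `write_rows`, `run_pipeline`), the sample rows and the queue size are all hypothetical, chosen only to show three steps running in their own threads and passing rows along as they become available.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the row stream


def read_rows(rows, out_q):
    """Input step: emit rows one at a time, then signal end of stream."""
    for row in rows:
        out_q.put(row)
    out_q.put(_DONE)


def filter_rows(in_q, out_q, predicate):
    """Filter step: pass along only the rows that satisfy the predicate."""
    while True:
        row = in_q.get()
        if row is _DONE:
            out_q.put(_DONE)
            return
        if predicate(row):
            out_q.put(row)


def write_rows(in_q, sink):
    """Output step: append each incoming row to the sink list."""
    while True:
        row = in_q.get()
        if row is _DONE:
            return
        sink.append(row)


def run_pipeline(rows, predicate):
    # Small bounded queues: rows stream between steps instead of being
    # fully read into memory or staged first.
    q1 = queue.Queue(maxsize=2)
    q2 = queue.Queue(maxsize=2)
    sink = []
    steps = [
        threading.Thread(target=read_rows, args=(rows, q1)),
        threading.Thread(target=filter_rows, args=(q1, q2, predicate)),
        threading.Thread(target=write_rows, args=(q2, sink)),
    ]
    for step in steps:
        step.start()  # all steps run at the same time
    for step in steps:
        step.join()
    return sink


# Hypothetical sample data standing in for a file or database source.
source = [
    {"id": 1, "amount": 5},
    {"id": 2, "amount": 50},
    {"id": 3, "amount": 500},
]
result = run_pipeline(source, lambda row: row["amount"] >= 50)
print(result)
```

Because each step has a single thread and the queues are first-in, first-out, row order is preserved in this sketch. A real engine would also have to deal with errors, multiple copies of a step and back-pressure tuning, none of which is attempted here.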

(IN WORK DM)

Note: This is a closed wiki space

The only people with access are Pentaho employees and Dave Reinke (Chris will need to sign up for the wiki and send me his user ID).

This is a first shot at getting an open source collaboration space for Big Data. It will eventually be open, but it is currently a work in progress and a place to put the use cases, demos, etc. I completely pulled the structure and initial content from my arse and am not in love with any of it. It is a round lump of clay, waiting to be molded by the brilliant minds of the Big Ass Data Team.

If you are not a developer, are looking for more product specific information, or are interested in commercial support, PentahoBigData.com is the place to find those resources.

Wiki Markup
{HTMLComment}

h1. News and Information

* !Common Images^new-icon.png!*Community Edition Upgrade to Big Data 5.0.4* \- [Upgrade Hadoop in Community Edition to 5.0.4]
* !Common Images^new-icon.png!*Pentaho Labs update* \- [Kettle running on Storm|Kettle Execution on Storm]
* !Common Images^new-icon50.png!*Pentaho Labs update* \- [Realtime debugging Kettle transforms running in Hadoop|Pentaho Map Reduce Vizor]
* *Update to Big Data Plugin Available for PDI 4.4 and BA Suite 4.8* \- Lots of fixes and new distro support [download|https://support.pentaho.com/entries/24445558-Big-Data-Plugin-Version-1-3-3-for-Pentaho-BA-Server-4-8-1-x-and-PDI-4-4-1-x]

New and recently updated Big Data content on the [What's New?|What's New?] page
{HTMLComment}

Overview

With growing volumes and varieties of data flowing at increasing speed, organizations need a fast and easy way to harness and gain insight from their big data sources. Pentaho accelerates the realization of value from big data with the most complete solution for big data analytics.


For a more complete overview of the Pentaho Big Data story, visit PentahoBigData.com/overview.

Getting Started

Select your Big Data technology to get started...