Hitachi Vantara Pentaho Community Wiki

Comment: Migration of unmigrated content due to installation of a new plugin

Welcome to the Big Data space in the Pentaho Community wiki. This space is the community home and collection point for all things Big Data and NoSQL within the Pentaho ecosystem. It is the place to find documentation, how-to's, best practices, use cases, developer info, technology previews and other information about employing Pentaho technology as part of your overall Big Data strategy. It is also where you can share your own information and experiences. We look forward to your participation and contribution!

Overview

Pentaho's Big Data story revolves around Pentaho Data Integration, also known as Kettle. Kettle is a powerful Extraction, Transformation and Loading (ETL) engine that uses a metadata-driven approach. The Kettle engine provides data services for, and is embedded in, most of the applications within the Pentaho BI suite, from Spoon, the Kettle designer, to the Pentaho Report Designer. See About Kettle and Big Data for more details of the Pentaho Big Data story.

News and Information

  • Update to Big Data Plugin Available for PDI 4.4 and BA Suite 4.8 - Lots of fixes and new distro support (download)
  • The 4.4 stable release of Kettle with the new Big Data components is now available for download.
  • First set of Big Data How-To's Published - Check out the How-To's for Hadoop, MapR, Cassandra and MongoDB here.

Getting Started

It's easy to get started with Pentaho for Big Data.

  1. Watch the intro videos below.
  2. Read about Kettle and Big Data.
  3. Download and configure the software here.
  4. Try the How To's for yourself.
  5. Join the Pentaho Big Data forum and let us know how you are using Big Data, ask questions and give feedback.
  6. Tell all your friends and neighbors.

Intro Videos

...

1) Pentaho MapReduce with Kettle

The first three videos compare two ways of creating and executing a simple MapReduce job: with Pentaho Kettle and with hand-written Java. The Kettle transformation shown here runs as a Mapper and Reducer within the cluster.

YouTube video ID: KZe1UugxXcs

2) Straight Java

What would the same task as "1) Pentaho MapReduce with Kettle" look like if you coded it in Java? At a half hour long, you may not want to watch the entire video...

YouTube video ID: cfFq1XB4kww

3) Compare using Kettle to Java

This is a quick summary of the previous two videos, "1) Pentaho MapReduce with Kettle" and "2) Straight Java", and why Pentaho Kettle boosts productivity and maintainability.

YouTube video ID: ZnyuTICOrhk

Loading Data into Hadoop

A quick example of loading into the Hadoop Distributed File System (HDFS) using Pentaho Kettle.

YouTube video ID: Ylekzmd6TAc

Extracting Data from Hadoop

A quick example of extracting data from the Hadoop Distributed File System (HDFS) using Pentaho Kettle.

YouTube video ID: 3Xew58LcMbg

If you are not a developer, are looking for more product specific information, or are interested in commercial support, PentahoBigData.com is the place to find those resources.

Wiki Markup
{HTMLComment}

h1. News and Information

* !Common Images^new-icon.png!*Community Edition Upgrade to Big Data 5.0.4* \- [Upgrade Hadoop in Community Edition to 5.0.4]
* !Common Images^new-icon.png!*Pentaho Labs update* \- [Kettle running on Storm|Kettle Execution on Storm]
* !Common Images^new-icon50.png!*Pentaho Labs update* \- [Realtime debugging Kettle transforms running in Hadoop|Pentaho Map Reduce Vizor]
* *Update to Big Data Plugin Available for PDI 4.4 and BA Suite 4.8* \- Lots of fixes and new distro support [download|https://support.pentaho.com/entries/24445558-Big-Data-Plugin-Version-1-3-3-for-Pentaho-BA-Server-4-8-1-x-and-PDI-4-4-1-x]

New and recently updated Big Data content on the [What's New?|What's New?] page
{HTMLComment}

Overview

With growing volumes and varieties of data flowing at increasing speed, organizations need a fast and easy way to harness and gain insight from their big data sources. Pentaho accelerates the realization of value from big data with the most complete solution for big data analytics.

For a more complete overview of the Pentaho Big Data story, visit PentahoBigData.com/overview.

Getting Started

Select your Big Data technology to get started...