Hitachi Vantara Pentaho Community Wiki

{table:width=30%|align=right}
{tr}
{td}
{roundrect:width=100%|height=100%|bgcolor=#CADC99|title=Resources}
* *[Downloads|Downloads|Download Released Builds]* - Get the apps
* *[How-To's|How To's|Tutorials and samples]* - Get me started
* *[Forum|http://forums.pentaho.com/forumdisplay.php?301-Big-Data]* - Ask questions
* *IRC* irc.freenode.net ##pentaho
* *[In Development|PMOPEN:|Current Development and Road Map]* - What's next
* *[CI Builds|http://ci.pentaho.com/view/Data%20Integration/|Continuous Integration Server]* - Last Dev Build
* *[Source code|http://source.pentaho.org/viewvc/svnkettleroot/|Source on subversion]* - Get the code
* *[Pentaho Community Home|http://community.pentaho.com|The rest of the Pentaho Community]*
{roundrect}
{td}
{tr}
{table}
{include:COM:StyleInclude}


*Welcome to the Big Data space in the Pentaho Community wiki.* This space is the community home and collection point for all things [Big Data|http://en.wikipedia.org/wiki/Big_data] within the Pentaho ecosystem. It is the place to find documentation, how-to's, best practices, use cases and other information about using Pentaho technology as part of your overall Big Data strategy. It is also where you can share your own information and experiences. We look forward to your participation and contributions\!

h1. Overview

Pentaho's Big Data story revolves around [Pentaho Data Integration, also known as Kettle|http://kettle.pentaho.com]. Kettle is a powerful Extraction, Transformation and Loading (ETL) engine that uses a metadata-driven approach. The Kettle engine provides data services for, and is embedded in, most of the applications in the Pentaho BI suite, from Spoon (the Kettle designer) to Pentaho Report Designer. Check out [About Kettle and Big Data] for more details on the Pentaho Big Data story.

h1. News and Information

{color:red}*Pentaho will announce on Monday, January 30th that it is open-sourcing its big data components and moving Kettle to the Apache license.*{color} Stay tuned for more information...

* *Pentaho Big Data components are now open source* \- To play well within the Hadoop open source ecosystem and make Kettle the best and most pervasive ETL engine in the Big Data space, Pentaho has released all of its Hadoop and NoSQL components as open source, starting with the 4.3 release.
* *Kettle license moves to Apache* \- To further Kettle adoption within the Hadoop community, Pentaho has decided to move the Kettle open source license from LGPL to the more permissive Apache license. This removes uncertainty about which restrictions apply to a derivative work that combines Kettle with Hadoop.
\\
* *4.3 Pre-Release* of Kettle with the new Big Data components will be available for [download|Downloads] on January 30, 2012.
\\
* *First set of Big Data How-To's published* \- Check out the How-To's for MapR Hadoop and the Cassandra NoSQL database [here|How To's].

h1. Intro Videos


{composition-setup}{composition-setup}{deck:id=MyDeck|class=tan}
{card:label=Pentaho MapReduce}
A quick introduction to executing Kettle transformations as a mapper and reducer within a Hadoop cluster.
{youtube}KZe1UugxXcs{youtube}

{card}
{card:label=Loading Data into Hadoop}
A quick example of loading data into the Hadoop Distributed File System (HDFS) using Pentaho Kettle.
{youtube}Ylekzmd6TAc{youtube}
{card}

{card:label=Extracting Data from Hadoop}
A quick example of extracting data from the Hadoop Distributed File System (HDFS) using Pentaho Kettle.
{youtube}3Xew58LcMbg{youtube}

{card}
{deck}