Hitachi Vantara Pentaho Community Wiki

Comment: Migration of unmigrated content due to installation of a new plugin
{table:width=30%|align=right}
{tr}
{td}
{roundrect:width=100%|height=100%|bgcolor=#CADC99|title=Resources}
* *[Downloads|Downloads|Download Released Builds]* - Get the code
* *[CI Builds|http://ci.pentaho.com/view/Data%20Integration/|Continuous Integration Server]* - Last Dev Build
* *[How-To's|How To's|Tutorials and samples]* - Get me started
* *[Forum|http://forums.pentaho.com/forumdisplay.php?301-Big-Data]* - Ask questions
* *IRC* - irc.freenode.net ##pentaho
* *[In Development|PMOPEN:|Current Development and Road Map]* - What's next
* *[Pentaho Community Home|http://community.pentaho.com|The rest of the Pentaho Community]*
{roundrect}
{td}
{tr}
{table}
{include:COM:StyleInclude} *Welcome to the Big Data space in the Pentaho Community wiki.* This space is the community home and collection point for all things [Big Data|http://en.wikipedia.org/wiki/Big_data] within the Pentaho ecosystem.  It is the place to find documentation, how-to's, best practices, use cases and other information about employing Pentaho technology as part of your overall Big Data strategy. It is also where you can share your own information and experiences.  We look forward to your participation and contribution\!

*Expectations* \- If you are unfamiliar with open source, [this article|http://en.wikipedia.org/wiki/Open_source] is a good place to start.  The open source community thrives on participation and cooperation.  There are several communication channels available where people can help you, but they are not obligated to do so.  You are responsible for your own success, which will require time, effort and a small amount of technical ability.  If you prefer to have a relationship with a known vendor who will answer questions over the phone, help you during your evaluation and support you in production, please visit [www.pentaho.com|http://www.pentaho.com].

h1. Overview

Pentaho's Big Data story revolves around [Pentaho Data Integration, also known as Kettle|http://kettle.pentaho.com]. Kettle is a powerful Extraction, Transformation and Loading (ETL) engine that uses a metadata-driven approach. The Kettle engine provides data services for, and is embedded in, many of the applications within the Pentaho BI suite.  As such, it comes in many forms and packages:

* *Spoon* (sometimes referred to as Kettle) \- The Kettle desktop visual design tool used to create and edit ETL transformations and jobs.  Spoon also has perspectives for running and debugging, visualizing and generating data models that can be used by the rest of the Pentaho Suite.
* *Pentaho Hadoop Distribution (PHD)* \- This is the Kettle engine packaged for distribution to a Hadoop cluster.  The PHD allows Kettle transformations to be run as a map task, reduce task or combiner and take advantage of the power of the Hadoop cluster.  This distribution will eventually become unnecessary as Kettle is modified to use the Hadoop distributed cache to locate the resources it needs to execute within the cluster.
* *Pan* \- A program that can execute transformations from the command line, usually via a scheduler.
* *Kitchen* \- A program that can execute jobs from the command line, usually via a scheduler.
* *Carte* \- A simple web server that allows you to execute transformations and jobs remotely.  It does so by accepting XML (using a small servlet) that contains the transformation to execute and the execution configuration.  It also allows you to remotely monitor, start and stop the transformations and jobs that run on the Carte server.
* *Pentaho Report Designer (PRD)* \- The Kettle engine is embedded in the Pentaho Report Designer, which enables PRD to generate reports from a Kettle transformation without having to stage the data.  It also gives PRD access to all of the database connectors within Kettle, including the NoSQL databases.
* *Pentaho BI Platform* \- The Kettle engine is embedded in the BI Platform, which enables reports created with PRD that rely on transformations to be published to the web.
* *Pentaho Data Integration Server (DI Server) EE* \- A standalone server for running Kettle jobs and transformations. It has a CMS repository for storing and versioning jobs and transformations.  It also has a scheduler and performance monitor.  The DI Server is part of Pentaho Enterprise Edition and is not available in open source.
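
As a rough illustration of how Pan and Kitchen are typically driven from a scheduler, the sketch below assembles a command line for either tool. It assumes the {{-file}}, {{-level}} and {{-param:NAME=value}} options as commonly documented for the PDI command-line tools; verify them against your installed version. The helper name {{build_kettle_command}}, the file paths and the parameter names are hypothetical and exist only for this example.

```python
import shlex
from typing import Dict, List, Optional


def build_kettle_command(
    tool: str,
    file_path: str,
    level: str = "Basic",
    params: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build an argument list for Pan (transformations) or Kitchen (jobs).

    Hypothetical helper: the option names are assumptions taken from the
    usual PDI command-line conventions, not from this wiki page.
    """
    scripts = {"pan": "pan.sh", "kitchen": "kitchen.sh"}
    if tool not in scripts:
        raise ValueError(f"unknown tool: {tool!r}")

    # -file points at a .ktr (transformation) or .kjb (job) file;
    # -level sets the logging verbosity.
    command = [scripts[tool], f"-file={file_path}", f"-level={level}"]

    # Named parameters are passed as -param:NAME=value; sorted for stable output.
    for name, value in sorted((params or {}).items()):
        command.append(f"-param:{name}={value}")
    return command


if __name__ == "__main__":
    # Hypothetical nightly job, e.g. for a cron entry.
    nightly_job = build_kettle_command(
        "kitchen", "/opt/etl/nightly_load.kjb", params={"RUN_DATE": "2013-01-01"}
    )
    print(shlex.join(nightly_job))
```

The helper only builds the argument list and does not launch anything, so the result can be handed to cron, {{subprocess.run}} or any other scheduler once the options have been checked.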

{note:title=This is a closed wiki space}
The only people with access are Pentaho employees and the BAD team.

This is a first shot at getting an open source collaboration space for Big Data.  It will eventually be open but is currently a work in progress and a place to put the use cases, demos, etc.  I completely pulled the structure and initial content from my arse and am not in love with any of it.  It is a round lump of clay, waiting to be molded by the brilliant minds of the Big Ass Data Team.
{note}

Welcome to the Big Data space in the Pentaho Community wiki. This space is the community home for Big Data and NoSQL technologies within the Pentaho ecosystem. It is the place to find information, how-to's, developer info, technology previews and other information about employing Pentaho technology as part of your overall Big Data Strategy. It is also where you can share your own information and experiences. We look forward to your participation and contribution!

If you are not a developer, are looking for more product-specific information, or are interested in commercial support, PentahoBigData.com is the place to find those resources.

{HTMLComment}

h1. News and Information

* !Common Images^new-icon.png!*Community Edition Upgrade to Big Data 5.0.4* \- [Upgrade Hadoop in Community Edition to 5.0.4]
* !Common Images^new-icon.png!*Pentaho Labs update* \- [Kettle running on Storm|Kettle Execution on Storm]
* !Common Images^new-icon50.png!*Pentaho Labs update* \- [Realtime debugging Kettle transforms running in Hadoop|Pentaho Map Reduce Vizor]
* *Update to Big Data Plugin Available for PDI 4.4 and BA Suite 4.8* \- Lots of fixes and new distro support [download|https://support.pentaho.com/entries/24445558-Big-Data-Plugin-Version-1-3-3-for-Pentaho-BA-Server-4-8-1-x-and-PDI-4-4-1-x]

New and recently updated Big Data content is listed on the [What's New?|What's New?] page.
{HTMLComment}

Overview

With growing volumes and varieties of data flowing at increasing speed, organizations need a fast and easy way to harness and gain insight from their big data sources. Pentaho accelerates the realization of value from big data with the most complete solution for big data analytics.

For a more complete overview of the Pentaho Big Data story, visit PentahoBigData.com/overview.

Getting Started

Select your Big Data technology to get started...