h1. Pentaho Big Data Plugin {div:style=float:right}[!http://ci.pentaho.com/job/pentaho-big-data-plugin/lastBuild/buildStatus!|http://ci.pentaho.com/job/pentaho-big-data-plugin/]{div}

The Pentaho Big Data Plugin Project provides support for an ever-expanding Big Data community within the Pentaho ecosystem. It is a plugin for the Pentaho Kettle engine which can be used within Pentaho Data Integration (Kettle), Pentaho Reporting, and the Pentaho BI Platform.

h2. Pentaho Big Data Plugin Features

This project contains the implementations for connecting to or performing the following:
- *Pentaho MapReduce*: Visually design MapReduce jobs as Kettle transformations
- *HDFS File Operations*: Read and write HDFS files directly from any Kettle step. This works because Kettle uses Apache VFS for file access throughout
- *Data Sources*
-- JDBC connectivity
--- *Apache Hive*
-- Native RPC connectivity for reading/writing
--- *Apache HBase*
--- *Cassandra*
--- *MongoDB*

h1. Key Links

- Git Repository: [https://github.com/pentaho/big-data-plugin]
- CI: [pentaho-big-data-plugin|http://ci.pentaho.com/job/pentaho-big-data-plugin]
- Download the latest development build: [pentaho-big-data-plugin-TRUNK-SNAPSHOT.tar.gz|http://ci.pentaho.com/job/pentaho-big-data-plugin/lastSuccessfulBuild/artifact/pentaho-big-data-plugin/dist/pentaho-big-data-plugin-TRUNK-SNAPSHOT.tar.gz]

h1. Community and where to find help

The [Big Data Forum|http://forums.pentaho.com/forumdisplay.php?301-Big-Data] exists for both users and developers. The community also manages the ##pentaho IRC channel on irc.freenode.net.

h1. Quick Start: Building the project

The Pentaho Big Data Plugin is built with [Apache Ant|http://ant.apache.org/] and uses [Apache Ivy|http://ant.apache.org/ivy/] for dependency management. To build the project you need only Ant 1.8.0 or newer. The build scripts will download Ivy if you do not already have it installed.
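
Before building, it can help to confirm that the installed Ant meets the 1.8.0 minimum. The following is a minimal sketch and not part of the project's build scripts: the helper names {{ant_version_from_banner}} and {{version_at_least}} are hypothetical, and the parsing assumes the usual {{ant -version}} banner with purely numeric, dot-separated version fields.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with the project) to check the Ant version.

# Print the version number found in the banner read from standard input,
# for example "1.10.12" from "Apache Ant(TM) version 1.10.12 compiled on ...".
ant_version_from_banner() {
  sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p'
}

# Return 0 when version $1 is greater than or equal to version $2.
# Assumption: at most three numeric, dot-separated fields are compared.
version_at_least() {
  have=$1
  need=$2
  for _field in 1 2 3; do
    h=${have%%.*}
    n=${need%%.*}
    # Drop the field just read; an exhausted version string counts as 0.
    case $have in *.*) have=${have#*.} ;; *) have=0 ;; esac
    case $need in *.*) need=${need#*.} ;; *) need=0 ;; esac
    if [ "$h" -gt "$n" ]; then return 0; fi
    if [ "$h" -lt "$n" ]; then return 1; fi
  done
  return 0
}

# Only run the check when Ant is actually on the PATH.
if command -v ant >/dev/null 2>&1; then
  found=$(ant -version 2>/dev/null | ant_version_from_banner)
  if [ -n "$found" ] && version_at_least "$found" 1.8.0; then
    echo "Ant $found is new enough"
  else
    echo "Ant 1.8.0 or newer is required (found: ${found:-unknown})"
  fi
fi
```

The fields are compared numerically rather than as strings, so that 1.10 is treated as newer than 1.8.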

{code}git clone git://github.com/pentaho/big-data-plugin.git
cd big-data-plugin
ant{code}

h1. Developing with Eclipse

We recommend [Apache IvyDE|http://ant.apache.org/ivy/ivyde/] to manage your Ivy dependencies within Eclipse.

# Import pentaho-big-data-plugin into Eclipse
# Resolve the project using IvyDE

If IvyDE is not an option, you can manually add the jars from lib/ and libswt/ to your classpath. This project, like all other Pentaho projects, uses the open source [Subfloor|http://code.google.com/p/subfloor/] Ant build framework. Running the following targets will configure the Eclipse project to reference the required libraries:

{code}ant resolve create-dot-classpath{code}

Then import or refresh the project in Eclipse.

h1. Contributing Changes

We use the [Fork + Pull Model|http://help.github.com/send-pull-requests/] to manage community contributions. Please fork the repository and submit a pull request with your changes.

Here's a sample git workflow to get you started:
# [Install Git|http://help.github.com/set-up-git-redirect]
# Set up Git to auto-correct line endings: {code}git config --global core.autocrlf input{code}
# Create a GitHub account
# Fork the project from [https://github.com/pentaho/big-data-plugin]
# Clone your repository: {code}git clone git@github.com:USERNAME/big-data-plugin.git{code}
# * Hack away *
# Stage and commit your changes. Make sure each commit message includes the JIRA case for the change, in the format \[JIRA-CASE\] Short description of fixes: {code}git add . && git commit{code}
# Push changes back up to GitHub: {code}git push{code}
# Submit a pull request from your project page. Please include a brief summary of what you changed and why.
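
The commit message format from the workflow above can be checked locally before pushing. This is a minimal sketch under stated assumptions: the function name {{has_jira_case}} is hypothetical, the case key {{PDI-1234}} is a made-up example, and the pattern assumes a JIRA case is an uppercase project key, a hyphen, and a number. The project itself does not document this check.

```shell
#!/bin/sh
# Hypothetical helper: return 0 when the first line of the commit message $1
# starts with a bracketed JIRA case followed by a short description.
# Assumed case shape: uppercase project key, hyphen, digits (e.g. PDI-1234).
has_jira_case() {
  printf '%s\n' "$1" | head -n 1 | grep -Eq '^\[[A-Z][A-Z0-9]*-[0-9]+\] .+'
}

# Example with a made-up case key and description.
if has_jira_case "[PDI-1234] Fix null handling in HBase Input"; then
  echo "commit message format looks fine"
fi
```

One way to use it is from a Git {{commit-msg}} hook, which receives the path of the message file as its first argument; the hook would read that file and reject the commit when the function returns non-zero.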

h3. Git Resources

Here's a short list of resources to help you learn and master Git:
- http://gitref.org/
- http://help.github.com/
- http://progit.org/book/
- http://gitready.com/

h1. Documentation

h2. Kettle Plugin Development

Getting started with the [Pentaho Data Integration Java API|http://wiki.pentaho.com/display/EAI/Pentaho+Data+Integration+-+Java+API+Examples]

h2. Step Documentation

- [Cassandra Input|http://wiki.pentaho.com/display/EAI/Cassandra+Input]
- [Cassandra Output|http://wiki.pentaho.com/display/EAI/Cassandra+Output]
- [MongoDB Input|http://wiki.pentaho.com/display/EAI/MongoDB+Input]
- [MongoDB Output|http://wiki.pentaho.com/display/EAI/MongoDB+Output]
- [HBase Input|http://wiki.pentaho.com/display/EAI/HBase+Input]
- [HBase Output|http://wiki.pentaho.com/display/EAI/HBase+Output]

h2. Job Entry Documentation

- [Pentaho MapReduce]

h1. Community Plugins

Here's a list of known community plugins that fall into the "big data" category:

- [Voldemort Lookup|http://type-exit.org/adventures-with-open-source-bi/2010/06/developing-a-custom-kettle-plugin-looking-up-values-in-voldemort/]
- [HPCC Systems ECL Plugins|https://github.com/hpcc-systems/spoon-plugins]