Pentaho Big Data Plugin
The Pentaho Big Data Plugin project provides support for the growing Big Data community within the Pentaho ecosystem. It is a plugin for the Pentaho Kettle engine and can be used within Pentaho Data Integration (Kettle), Pentaho Reporting, and the Pentaho BI Platform.
Pentaho Big Data Plugin Features
This project contains implementations for connecting to or performing the following:
- Pentaho MapReduce: visually design MapReduce jobs as Kettle transformations
- HDFS File Operations: read and write directly from any Kettle step, made possible by the use of Apache VFS throughout Kettle
- Data Sources
  - JDBC connectivity
    - Apache Hive
  - Native RPC connectivity for reading/writing
    - Apache HBase
    - Cassandra
    - MongoDB
- JDBC connectivity
Key Links
- SVN Repository: svn://source.pentaho.org/svnkettleroot/pentaho-big-data-plugin
- CI: pentaho-big-data-plugin
- Download: The latest development build: pentaho-big-data-plugin-TRUNK-SNAPSHOT.tar.gz
Community and where to find help
The Big Data Forum exists for both users and developers. The community also manages the ##pentaho IRC channel on irc.freenode.net.
Quick Start: Building the project
The Pentaho Big Data Plugin is built with Apache Ant and uses Apache Ivy for dependency management. To build the project you need only Ant 1.8.0 or newer; the build scripts download Ivy if it is not already installed.
svn co svn://source.pentaho.org/svnkettleroot/pentaho-big-data-plugin/trunk pentaho-big-data-plugin
cd pentaho-big-data-plugin
ant
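Since the build requires Ant 1.8.0 or newer, it can help to check the installed version before running it. The following is a minimal sketch, not part of the project's build scripts: the function names (version_ge, parse_ant_version, check_ant) are hypothetical, it assumes a `sort` that supports the `-V` version-sort option (GNU coreutils and recent BSD), and it assumes `ant -version` prints a line of the form "Apache Ant(TM) version X.Y.Z compiled on ...".

```shell
#!/bin/sh
# Minimum Ant version stated in the Quick Start section.
REQUIRED_ANT="1.8.0"

# version_ge A B: succeeds when dotted version A is at least B.
# Assumes `sort -V` is available; if B sorts first (or ties), then A >= B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# parse_ant_version: read `ant -version` output on stdin and print
# only the version number (for example 1.9.16).
parse_ant_version() {
    sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# check_ant: succeeds when an Ant of at least REQUIRED_ANT is on the PATH.
check_ant() {
    found=$(ant -version 2>/dev/null | parse_ant_version)
    if [ -z "$found" ]; then
        echo "Ant was not found on the PATH" >&2
        return 1
    fi
    if ! version_ge "$found" "$REQUIRED_ANT"; then
        echo "Ant $found is older than the required $REQUIRED_ANT" >&2
        return 1
    fi
    echo "Ant $found satisfies the minimum of $REQUIRED_ANT"
}
```

With these functions sourced into the current shell, `check_ant && ant` runs the build only when the version check passes.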
Developing with Eclipse
We recommend Apache IvyDE to manage your Ivy dependencies within Eclipse.
- Import pentaho-big-data-plugin into Eclipse
- Resolve the project using IvyDE
If IvyDE is not an option, you can manually add the jars from lib/ and libswt/ to your classpath. This project, like all other Pentaho projects, uses the open-source Subfloor Ant build framework. Running the following targets configures the Eclipse project to reference the required libraries:
ant resolve create-dot-classpath
Then import or refresh the project in Eclipse and add the SWT libraries for your architecture, e.g. for Mac OS X x64:
[Screenshot not available: osx-swt-jars.png]
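When adding jars by hand, it can be useful to see exactly which jar files sit under lib/ and libswt/. The sketch below is an illustration only: build_classpath is a hypothetical helper, not something the project provides, and it simply joins every .jar it finds under the given directories into a ':'-separated classpath string.

```shell
#!/bin/sh
# build_classpath DIR...: print every .jar found under the given
# directories as a single ':'-separated string, in a stable sorted order.
build_classpath() {
    find "$@" -type f -name '*.jar' 2>/dev/null | LC_ALL=C sort | paste -sd: -
}
```

Run from the project root after the libraries have been resolved, `build_classpath lib libswt` prints the full list. Note that libswt/ holds SWT jars for several platforms, so in practice only the subdirectory matching your architecture should be passed.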
Documentation
Kettle Plugin Development
Getting started with the Pentaho Data Integration Java API