Kettle Execution on Storm


An experimental environment for executing a Kettle transformation as a Storm topology.

Project Info

  • Status: Early prototype, proof of concept (in development)
  • Roadmap: Not on any roadmap, not committed; Pentaho is looking for a customer to do joint development.
  • Availability: Undecided - Closed source for now
  • Contact: dmoran@pentaho.com
  • JIRA: none
Video: https://www.youtube.com/watch?v=kj65SAgB8Vg

Kettle for Storm empowers Pentaho ETL developers to process big data in real time by running their existing visual Pentaho ETL transformations across a cluster of machines with Storm. It decomposes a transformation into a topology, wrapping each step in either a Storm Spout or a Bolt. The topology is then submitted to the cluster, where it runs continuously or until all inputs report end of data.
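To give a rough idea of what this wrapping looks like, the sketch below shows the general shape of hosting one Kettle step's per-row logic inside a Storm bolt. The class name, field names, and the inline filter are hypothetical stand-ins; the actual Kettle for Storm wrapper delegates to the Kettle step API rather than hard-coding row logic like this.

{code:java}
// Illustrative sketch only: one Kettle step's per-row logic hosted in a Storm bolt.
// Class and field names are hypothetical; the real Kettle for Storm wrapper
// delegates to the Kettle step API instead of the inline filter shown here.
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class KettleStepBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        // In the real wrapper, the serialized transformation metadata would be
        // restored here and the wrapped step initialized.
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Each tuple carries one Kettle row; the wrapped step processes it and
        // the resulting rows are emitted to the downstream bolts.
        String customer = input.getStringByField("customer");
        long amount = input.getLongByField("amount");

        // Stand-in for the wrapped step's row logic (here: a trivial filter).
        if (amount > 0) {
            collector.emit(input, new Values(customer, amount));
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Output fields mirror the wrapped step's output row metadata.
        declarer.declare(new Fields("customer", "amount"));
    }
}
{code}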

Video: https://www.youtube.com/watch?v=RPoWdIWWPkc

Closing the gap between batch and real time

Pentaho has led the big data ETL space for three years by providing ETL developers with a visual environment to design, test, and execute ETL that leverages the power of MapReduce. Now, with Kettle for Storm, that same ETL developer is immediately productive with one of today's most popular distributed stream-processing systems: Storm. Any existing Pentaho ETL transformation can be executed as a real-time process via Storm, including those used in Pentaho MapReduce. This powerful combination allows an ETL developer to deliver data to business users when they need it most, without the delay of batch processing or the overhead of designing additional transformations.

...

  • Sampling
  • Aggregation
  • Sorting
  • Filtering
  • First-class Spoon support
  • Repository-based transformations
  • Error handling
  • Conditional hops
  • Multiple end steps
  • Sub-transformations
  • Metrics: Kettle timing, throughput, logging

Try it out!

Instructions and code are available on GitHub. Download the preview build from our CI environment.

HortonWorks Sandbox VM Quick Start: takes an existing plain-vanilla Sandbox VM and adds Storm and the Kettle-Storm components.

https://github.com/deinspanjer/hw-sandbox-storm-provision
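For a feel of the Storm mechanics underneath (one component per step, hops expressed as stream groupings, the resulting topology submitted to a cluster), the sketch below wires a stand-in input spout to the bolt sketched above and runs it on an in-process test cluster. RowGeneratorSpout, KettleStepBolt, and the component names are hypothetical placeholders wired by hand, not part of the Kettle-Storm code base, which assembles the topology automatically from the transformation metadata; a real deployment would submit via StormSubmitter rather than LocalCluster.

{code:java}
// Illustrative sketch: assembling and running a topology that mirrors a tiny
// transformation (input step -> filter step). All class and component names are
// hypothetical placeholders, not part of the Kettle-Storm code base.
import java.util.Map;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class KettleTopologyDemo {

    // Stand-in for a Kettle input step (e.g. a CSV file input) wrapped as a spout.
    public static class RowGeneratorSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private long counter;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            // Emit one synthetic row roughly every 100 ms.
            Utils.sleep(100);
            collector.emit(new Values("customer-" + counter, counter % 5));
            counter++;
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("customer", "amount"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // One Storm component per transformation step; the hop between them
        // becomes a stream grouping.
        builder.setSpout("row-input", new RowGeneratorSpout(), 1);
        builder.setBolt("filter-step", new KettleStepBolt(), 2)
               .shuffleGrouping("row-input");

        Config conf = new Config();
        conf.setDebug(true);

        // Run on an in-process test cluster for a few seconds; on a real cluster
        // this would be StormSubmitter.submitTopology(...) instead.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("kettle-transformation", conf, builder.createTopology());
        Utils.sleep(10000);
        cluster.shutdown();
    }
}
{code}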
