Kettle for Storm

Kettle for Storm empowers Pentaho ETL developers to process big data in real
time by running their existing visual Pentaho ETL transformations across a
cluster of machines with Storm.

Closing the gap between batch and real time

Pentaho has led the big data ETL space for three years by providing ETL developers
a visual environment to design, test, and execute ETL that leverages the power of
MapReduce. Now with Kettle for Storm, that same ETL developer is immediately
productive with one of the most popular distributed stream processing systems
today: Storm. Existing Pentaho ETL transformations can be executed as real-time
processes via Storm - including those used in Pentaho MapReduce. This
powerful combination allows an ETL developer to provide data to business users
when they need it most, without the delay of batch processing or the overhead of
designing additional transformations.

Process data as it arrives

Pentaho ETL begins processing data as it arrives from the source and immediately
produces the valuable data sets your business depends on. Get up-to-the-second
insight into your key business metrics by reacting when data arrives and
delivering real-time dashboards, reports, or intermediate data sets to be used by
your existing applications.

Hybrid workflows

Many of our customers have long-running batch Pentaho ETL jobs that run within
Hadoop via MapReduce. Kettle for Storm complements these by allowing
developers to reuse existing transformations to process data immediately. Both
batch and real-time workflows are powered by Pentaho ETL, empowering existing
developers to build upon years of knowledge to learn the most from their data,
instantly.

Leverage existing Kettle ETL

Kettle for Storm allows Pentaho ETL developers to reuse their knowledge and
beloved Kettle components to process data differently. Deliver data when it's
needed - all with a familiar tool set.

Looking for additional tools for your Pentaho ETL toolkit? Check out the [Kettle
Marketplace|http://wiki.pentaho.com/display/EAI/Marketplace]!

Next steps

Today, Kettle for Storm can process many of your existing transformations, but it
wouldn't be in Pentaho Labs if it were complete. We're continuing to build out
support for the entire Kettle ecosystem of steps. Stay tuned while we complete the
implementation.

Upcoming features:

  1. Spoon integration
  2. Support for aggregations, sorting, filtering, and sampling
  3. Support for executing an entire transformation as a component within an
    existing Storm topology