
This is a production Pentaho Labs project

This project has been incorporated into the product and is supported. We are very interested in your feedback and your use cases.

A recipe for executing Weka in Hadoop.

Project Info

This package for Weka >= 3.7.10 provides several jobs for executing learning tasks inside of Hadoop. These include:

  1. Determining ARFF metadata and summary statistics
  2. Computing a correlation or covariance matrix
  3. Training a Weka classifier or regressor
  4. Generating randomly shuffled (and stratified) input data chunks
  5. Evaluating a Weka classifier or regressor via cross-validation or a hold-out set
  6. Scoring using a trained classifier or regressor
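
To illustrate why a task such as computing summary statistics (item 1) fits Hadoop's map-reduce model, here is a minimal sketch in plain Python. It is not the package's actual implementation, and the names PartialStats, map_chunk, merge and reduce_stats are hypothetical. The idea is that each mapper summarises its own chunk of rows, and the reducer merges those partial summaries without ever seeing the raw data again.

```python
from dataclasses import dataclass
from functools import reduce
import math


@dataclass
class PartialStats:
    """Summary of one numeric attribute over one chunk of rows (hypothetical type)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0                 # sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, x: float) -> None:
        # Welford's online update for a single value.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    def variance(self) -> float:
        # Sample variance; defined as 0.0 here when fewer than two values were seen.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def map_chunk(values) -> PartialStats:
    """'Map' phase: summarise one chunk of the input independently."""
    stats = PartialStats()
    for v in values:
        stats.add(v)
    return stats


def merge(a: PartialStats, b: PartialStats) -> PartialStats:
    """Combine two partial summaries (pairwise merge of count, mean and M2)."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    n = a.count + b.count
    delta = b.mean - a.mean
    return PartialStats(
        count=n,
        mean=a.mean + delta * b.count / n,
        m2=a.m2 + b.m2 + delta * delta * a.count * b.count / n,
        minimum=min(a.minimum, b.minimum),
        maximum=max(a.maximum, b.maximum),
    )


def reduce_stats(partials) -> PartialStats:
    """'Reduce' phase: fold all partial summaries into one."""
    return reduce(merge, partials, PartialStats())
```

For example, reduce_stats(map_chunk(c) for c in chunks) gives the same count, mean, variance, minimum and maximum as a single pass over all rows, regardless of how the rows were split into chunks.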

A full-featured command line interface is available along with GUI Knowledge Flow components for job orchestration. Predictive models learned in Hadoop are fully compatible with Pentaho Data Integration's "Weka Scoring" transformation step.

More information on what is available in the distributed Weka package, and how it is implemented, can be found in a three-part blog posting:

Try it out!

Open Weka's package manager (GUIChooser->Tools->Package manager) and install "distributedWekaHadoop".
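
For scripted or headless installs, Weka's package manager can also be driven from the command line through the weka.core.WekaPackageManager class and its -install-package option. The sketch below only builds that command. The helper name build_install_command and the weka.jar path are assumptions, so adjust them to your installation.

```python
def build_install_command(weka_jar: str, package: str = "distributedWekaHadoop"):
    """Return the argv list that asks Weka's package manager to install a package.

    weka_jar is the path to your weka.jar (an assumption; it depends on where
    Weka is installed on your machine).
    """
    return [
        "java",
        "-cp", weka_jar,
        "weka.core.WekaPackageManager",
        "-install-package", package,
    ]


# Hypothetical usage; the path is illustrative only:
#   import subprocess
#   subprocess.run(build_install_command("/opt/weka/weka.jar"), check=True)
```

Running the command requires a Java runtime and network access to the Weka package repository.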

