Pentaho Data Mining Community Documentation

Quick Start and Overview

Pentaho Data Mining, based on the Weka project, is a comprehensive set of tools for machine learning and data mining. Its broad suite of classification, regression, association rule, and clustering algorithms can help you understand your business better and improve future performance through predictive analytics.

There are three versions of Weka:

  1. Weka 3.4 - stable branch that was created in 2003 to correspond with what is described in the 2nd edition of the Witten and Frank Data Mining book (published 2005). This branch is feature frozen and receives only bug fixes. It is also reaching end of life.
  2. Weka 3.6 - stable branch that was created in mid 2008 to correspond with what is described in the 3rd edition of the Witten, Frank and Hall Data Mining book (published January 2011). This branch is feature frozen and receives only bug fixes.
  3. Weka 3.7 - development branch. This is a continuation of the 3.6 code line that receives both bug fixes and new features.

Documentation

Pentaho Data Mining (Weka)

The book Data Mining: Practical Machine Learning Tools and Techniques (Third Edition) was written to accompany Weka.

Plugins for Pentaho Data Integration (Kettle)

Developing with Weka

Awards and Publications

Under Development/Roadmap

Archived

Further Links and Information


This documentation is maintained by the Pentaho community, and members are encouraged to create new pages in the appropriate spaces, or edit existing pages that need to be corrected or updated.

Please do not leave comments on Wiki pages asking for help. They will be deleted. Use the forums instead.
