Hitachi Vantara Pentaho Community Wiki

Pentaho Data Mining Community Documentation

Quick Start and Overview

Pentaho Data Mining, based on the Weka project, is a comprehensive set of tools for machine learning and data mining. Its broad suite of classification, regression, association rule, and clustering algorithms can help you understand your business better and improve future performance through predictive analytics.

There are two versions of Weka:

  1. Weka 3.8 - current stable version. This branch receives bug fixes to core Weka; new features are released through packages that can be installed via the built-in package manager.
  2. Weka 3.9 - development branch. This is a continuation of the 3.8 code line that receives both bug fixes and new features/improvements to core Weka. It also takes advantage of new features released in packages.


Pentaho Data Mining (Weka)

The book Data Mining: Practical Machine Learning Tools and Techniques (Third Edition) was written to accompany Weka.

Plugins for Pentaho Data Integration (Kettle)

Developing with Weka

Awards and Publications

Under Development/Roadmap


Further Links and Information

This documentation is maintained by the Pentaho community, and members are encouraged to create new pages in the appropriate spaces, or edit existing pages that need to be corrected or updated.

Please do not leave comments on Wiki pages asking for help; such comments will be deleted. Use the forums instead.
