Quick Start and Overview
Pentaho Data Mining, based on the Weka project, is a comprehensive set of tools for machine learning and data mining. Its broad suite of classification, regression, association rule, and clustering algorithms can help you understand your business better and improve future performance through predictive analytics.
There are two versions of Weka:
- Weka 3.8 - current stable version. This branch receives bug fixes to core Weka; new features are released through packages that can be installed via the built-in package manager.
- Weka 3.9 - development branch. This is a continuation of the 3.8 code line that receives both bug fixes and new features/improvements to core Weka. It also takes advantage of new features released in packages.
Pentaho Data Mining (Weka)
The book Data Mining: Practical Machine Learning Tools and Techniques (Fourth Edition) was written to accompany Weka.
Plugins for Pentaho Data Integration (Kettle)
Developing with Weka
Awards and Publications
- Ian H. Witten, Eibe Frank, Mark A. Hall and Christopher J. Pal. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, Burlington, MA, 4th edition, 2016.
- Remco R. Bouckaert, Eibe Frank, Mark A. Hall, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann and Ian H. Witten. WEKA - Experiences with a Java Open-Source Project. Journal of Machine Learning Research, 11:2533-2541, 2010.
- Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann and Ian H. Witten. The WEKA Data Mining Software: An Update. SIGKDD Explorations, 11(1), 2009.
- ACM SIGKDD Service Award 2005
Further Links and Information