The Knowledge Flow plugin is an enterprise edition tool that allows entire data mining processes to be run as part of a Kettle (Pentaho Data Integration, PDI) ETL transformation. There are several use cases for combining ETL and data mining, such as:
- Automated batch training/refreshing of predictive models
- Including data mining results in reports
- Access to data mining pre-processing techniques in ETL transformations
This document describes the first of these applications, automated training/refreshing of predictive models. When combined with the Weka Scoring plugin, which deploys predictive models, this can provide a fully automated predictive analytics solution.
The Knowledge Flow plugin requires Kettle 3.1 or higher and Weka 3.6 or higher. Due to SWT-AWT integration problems on Mac OS X, OS X users will need the Eclipse Cocoa 64-bit SWT libraries (version 3.5) to use the plugin. These libraries can be dropped in to replace the ones included in the Kettle Mac application (Kettle.app/Contents/Resources/Java/libswt/osx).
The Knowledge Flow Kettle plugin must be installed before Kettle's Spoon UI is started. It can go either in the plugins/steps directory of your Kettle distribution or in $HOME/.kettle/plugins/steps; the instructions here use the latter. Unpack the Knowledge Flow archive and copy the contents of the KFDeploy directory to a new subdirectory of $HOME/.kettle/plugins/steps. Then copy the "weka.jar" file from your Weka distribution into that same subdirectory.
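If you install the plugin repeatedly (for example on several machines), the copy steps for the Kettle side can be scripted. The following is a minimal Python 3.8+ sketch, not something shipped with the plugin: it assumes the Knowledge Flow archive has already been unpacked, and the function name and the subdirectory name "KnowledgeFlow" are hypothetical choices for illustration.

```python
from pathlib import Path
import shutil


def install_kettle_step_plugin(kfdeploy_dir: Path, weka_jar: Path, home: Path,
                               subdir_name: str = "KnowledgeFlow") -> Path:
    """Hypothetical helper: copy the contents of KFDeploy plus weka.jar into
    <home>/.kettle/plugins/steps/<subdir_name> and return that directory.

    subdir_name is an arbitrary illustrative choice; any new subdirectory works.
    """
    target = home / ".kettle" / "plugins" / "steps" / subdir_name
    # Create the new plugin subdirectory (and any missing parents).
    target.mkdir(parents=True, exist_ok=True)

    # Copy the *contents* of KFDeploy, not the KFDeploy directory itself.
    for entry in kfdeploy_dir.iterdir():
        dest = target / entry.name
        if entry.is_dir():
            # dirs_exist_ok requires Python 3.8 or later.
            shutil.copytree(entry, dest, dirs_exist_ok=True)
        else:
            shutil.copy2(entry, dest)

    # weka.jar from the Weka distribution goes alongside the plugin files.
    shutil.copy2(weka_jar, target / "weka.jar")
    return target
```

For example, `install_kettle_step_plugin(Path("KFDeploy"), Path("/path/to/weka/weka.jar"), Path.home())` would install into $HOME/.kettle/plugins/steps/KnowledgeFlow; the weka.jar path shown is a placeholder for wherever your Weka distribution lives.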
The Knowledge Flow Kettle plugin also requires a small plugin to be installed in the Weka Knowledge Flow application. This plugin provides a special data source component for the Weka Knowledge Flow that accepts incoming data sets from Kettle. Copy the contents of the "KettleInject" directory to a subdirectory of $HOME/.knowledgeFlow/plugins. If $HOME/.knowledgeFlow/plugins does not exist, create it manually first.
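The KettleInject copy step for the Weka side can also be scripted. This is a minimal Python 3.8+ sketch under the assumption that the unpacked KettleInject directory is available locally; the function name and the subdirectory name "KettleInject" are hypothetical. Creating the target with `mkdir(parents=True)` also creates $HOME/.knowledgeFlow/plugins when it is missing.

```python
from pathlib import Path
import shutil


def install_kettle_inject(kettle_inject_dir: Path, home: Path,
                          subdir_name: str = "KettleInject") -> Path:
    """Hypothetical helper: copy the contents of KettleInject into
    <home>/.knowledgeFlow/plugins/<subdir_name> and return that directory."""
    # parents=True also creates .knowledgeFlow/plugins if it does not exist yet.
    target = home / ".knowledgeFlow" / "plugins" / subdir_name
    target.mkdir(parents=True, exist_ok=True)

    # Copy the directory's contents, recursing into any subdirectories.
    for entry in kettle_inject_dir.iterdir():
        dest = target / entry.name
        if entry.is_dir():
            shutil.copytree(entry, dest, dirs_exist_ok=True)
        else:
            shutil.copy2(entry, dest)
    return target
```

Keeping the plugin in its own subdirectory, rather than copying files straight into plugins, keeps it easy to remove or upgrade later.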
Once both plugins are installed correctly, the Kettle Knowledge Flow step appears in the "Transform" folder of the Spoon user interface.