Hitachi Vantara Pentaho Community Wiki
Child pages
  • Using the Knowledge Flow Plugin

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

4 Using the Knowledge Flow Plugin

As a simple example, We we will use the Knowledge Flow step to create and export a predictive model for the "pendigits.csv"data set (docs/data/pendigits.csv). This data set is also used in the "Using the Weka Scoring Plugin"documentation.

4.1 Create a Simple Transformation 

First construct a simple Kettle transformation that links a CSV input step to the Knowledge Flow step. Next configure the input step to load the "pendigits.csv" file. Make sure that the Delimiter text box contains a "," and then click "Get Fields" to make the CSV input step analyze a few lines of the file and determine the types of the fields.
 

All the fields in the "pendigits.csv" file are integers. However, the problem is a discrete classification task and Weka will need the "class" field to be declared as a nominal attribute. In the CSV input step's configuration dialog, change the type of the "class" field from "Integer" to "String."
 

4.2 Configuring the Knowledge Flow Kettle Step

The Knowledge Flow step's configuration dialog is made up of three tabs (although only two are visible initially when the dialog is first opened). The first tab, "KnowledgeFlow file," enables existing Knowledge Flow flow definition files to be loaded or imported from disk. It also allows you to configure how the incoming data from the transformation is connected to the Knowledge Flow process and how to deal with the output.

If a flow definition is loaded, then the definition file will be loaded (sourced) from the disk every time that the transformation is executed. If, on the other hand, the the flow definition file is imported, it will be stored in either the transformation's XML configuration file (.ktr file) or the repository (if one is being used).
Image Added
 A third option is to design a new Knowledge Flow process from scratch using the embedded Knowledge Flow editor. In this case the new flow definition will be stored in the .ktr file/repository. This is the approach we will take for the purposes of demonstration. Clicking the "Show embedded KnowledgeFlow editor" button will cause a new "KnowledgeFlow" tab to appear on the dialog.