To begin with, we need an entry point through which data from the Kettle transformation can enter the data mining process. Select the "Plugins" tab of the embedded editor and place a "KettleInject" step onto the layout canvas. If the "Plugins" tab is not visible, or the "KettleInject" step is not available from it, you will need to review the installation process described earlier.
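
Conceptually, "KettleInject" turns each incoming Kettle row into a Weka instance and collects those instances into a dataset for the downstream steps to consume. The following is a minimal sketch of that conversion using the Weka 3.7.x API directly; the field names and values are hypothetical and stand in for whatever the transformation's rows actually carry.

    import java.util.ArrayList;

    import weka.core.Attribute;
    import weka.core.DenseInstance;
    import weka.core.Instances;

    public class RowToInstances {
        public static void main(String[] args) {
            // Declare attributes matching the incoming row layout
            // (hypothetical field names for illustration).
            ArrayList<Attribute> attrs = new ArrayList<Attribute>();
            attrs.add(new Attribute("age"));                  // numeric field
            attrs.add(new Attribute("income"));               // numeric field
            ArrayList<String> classVals = new ArrayList<String>();
            classVals.add("no");
            classVals.add("yes");
            attrs.add(new Attribute("purchased", classVals)); // nominal class field

            // An initially empty dataset that accumulates the injected rows.
            Instances data = new Instances("kettleRows", attrs, 0);
            data.setClassIndex(data.numAttributes() - 1);

            // Convert one row's field values into a Weka instance and append it.
            double[] vals = new double[data.numAttributes()];
            vals[0] = 42;
            vals[1] = 55000;
            vals[2] = data.attribute(2).indexOfValue("yes");
            data.add(new DenseInstance(1.0, vals));

            System.out.println(data);
        }
    }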

Next, connect a "TrainingSetMaker" step to the "KettleInject" step by right-clicking over "KettleInject" and selecting "dataSet" from the list of connections.

Now add a logistic regression classifier to the flow and connect it by right-clicking over "TrainingSetMaker" and selecting "trainingSet" from the list of connections.
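
For orientation, training this same kind of model outside of the Knowledge Flow takes only a couple of Weka API calls. A minimal sketch, assuming the training data is available as an ARFF file (the path is hypothetical):

    import weka.classifiers.functions.Logistic;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class TrainLogistic {
        public static void main(String[] args) throws Exception {
            // Load the training data (hypothetical path).
            Instances train = DataSource.read("/path/to/training.arff");
            // Tell Weka which attribute is the class; here we assume the last one.
            train.setClassIndex(train.numAttributes() - 1);

            // Build the logistic regression model, as the "Logistic" step does
            // when it receives a "trainingSet" connection.
            Logistic logistic = new Logistic();
            logistic.buildClassifier(train);

            // Print the model's textual description.
            System.out.println(logistic);
        }
    }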

Next, add a "SerializedModelSaver" step and connect it by right-clicking over "Logistic" and selecting "batchClassifier" from the list of connections.
Now configure the "SerializedModelSaver" to specify where the trained model should be saved. Either double-click the icon or right-click over it and select "Configure..." from the pop-up menu. If you are using Weka version 3.7.x, the Knowledge Flow supports environment variables, and Kettle's internal variables are available. In the screenshot below, we are saving the trained classifier to ${Internal.Transformation.Filename.Directory}, the directory that the Kettle transformation has been saved to (note that this only makes sense when a repository is not being used). You can always specify an absolute path to a directory on your file system; in fact, this is necessary if you are using Weka version 3.6.x.
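
Under the hood, "SerializedModelSaver" simply serializes the trained model object to disk; the equivalent direct API call is weka.core.SerializationHelper.write. A minimal, self-contained sketch (the paths are hypothetical, and the output directory stands in for the resolved value of ${Internal.Transformation.Filename.Directory}):

    import weka.classifiers.functions.Logistic;
    import weka.core.Instances;
    import weka.core.SerializationHelper;
    import weka.core.converters.ConverterUtils.DataSource;

    public class SaveModel {
        public static void main(String[] args) throws Exception {
            // Train a classifier as in the previous sketch (hypothetical path).
            Instances train = DataSource.read("/path/to/training.arff");
            train.setClassIndex(train.numAttributes() - 1);
            Logistic logistic = new Logistic();
            logistic.buildClassifier(train);

            // Serialize the trained model to disk, as "SerializedModelSaver" does.
            SerializationHelper.write("/path/to/transformation/logistic.model", logistic);
        }
    }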
 

Finally, add a "TextViewer" step to the layout and connect it to the "Logistic" step by right-clicking over "Logistic" and selecting "text" from the list of connections.

Now we can return to the "KnowledgeFlow file" tab in the Knowledge Flow Kettle step's configuration dialog and establish how data is passed into and out of the Knowledge Flow process we have just designed. First, click the "Get changes from KnowledgeFlow editor" button. This extracts the flow from the editor and populates the drop-down boxes with the applicable step and connection names. To specify that incoming data should be passed into the Knowledge Flow process, select the "Inject data into KnowledgeFlow" checkbox and choose "KettleInject" in the "Inject step name" field. The "Inject connection name" field will be filled in automatically with the value "dataSet".

The choices for output are either to pass the incoming data rows through to downstream Kettle steps or to pick up output from the Knowledge Flow process and pass that on instead. In this example we will do the latter by picking up output from the "TextViewer" step in the Knowledge Flow process. Note that the "SerializedModelSaver" step writes to disk and does not produce output that can be passed on inside a Kettle transformation. Select "TextViewer" in the "Output step name" field and "text" in the "Output connection name" field. Make sure to leave "Pass rows through" unchecked.
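
Once the transformation has run, the file written by "SerializedModelSaver" can be deserialized and used for scoring from any Java code. A minimal sketch, reusing the hypothetical paths from the earlier sketches:

    import weka.classifiers.functions.Logistic;
    import weka.core.Instances;
    import weka.core.SerializationHelper;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ScoreWithSavedModel {
        public static void main(String[] args) throws Exception {
            // Deserialize the model written by "SerializedModelSaver"
            // (hypothetical path).
            Logistic logistic = (Logistic) SerializationHelper.read(
                    "/path/to/transformation/logistic.model");

            // Load some rows to score (hypothetical path) and classify each one.
            Instances unlabeled = DataSource.read("/path/to/new_rows.arff");
            unlabeled.setClassIndex(unlabeled.numAttributes() - 1);
            for (int i = 0; i < unlabeled.numInstances(); i++) {
                double pred = logistic.classifyInstance(unlabeled.instance(i));
                System.out.println(unlabeled.classAttribute().value((int) pred));
            }
        }
    }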