Hitachi Vantara Pentaho Community Wiki
Child pages
  • Using the Knowledge Flow Plugin

Versions Compared


  • This line was added.
  • This line was removed.
  • Formatting was changed.


If a flow definition is loaded, then the definition file will be loaded (sourced) from the disk every time that the transformation is executed. If, on the other hand, the the flow definition file is imported, it will be stored in either the transformation's XML configuration file (.ktr file) or the repository (if one is being used).

A third option is to design a new Knowledge Flow process from scratch using the embedded Knowledge Flow editor. In this case the new flow definition will be stored in the .ktr file/repository. This is the approach we will take for the purposes of demonstration.

4.2.1 Creating a Knowledge Flow Process Using the Embedded Editor


Finally, add a "TextViewer" step to the layout and connect it to the "Logistic" step by right clicking over "Logistic" and selecting "text" from the list of connections.

4.2.2 Linking the Knowledge Flow Data Mining Process to the Kettle Transformation



This tab is only applicable when data is being injected into the Knowledge Flow process.

The table in the upper portion of the tab shows the incoming Kettle fields, their type and corresponding Weka type. The table allows you to delete any fields that you don't wan't to become input to the data mining process. Clicking on the "Get fields" button will reset the table with all incoming fields.

Directly below the "Get fields" button is a text field that allows you to enter a relation name for the Weka data set that will be constructed from the incoming Kettle data. This is set to "Sampled rows" by default, but you may enter any descriptive string you like here.

The next two text fields relate to sampling the incoming Kettle data. The Knowledge Flow Kettle step has built in Reservoir sampling (similar to that of the separate Reservoir Sampling plugin step). In batch training mode (incremental training is discussed in the "Advanced Features" section below) the "Sample/cache size (rows)" text field allows you to specify how many incoming Kettle rows should be randomly sampled and passed on to the Knowledge Flow data mining process. Reservoir sampling ensures that each row has an equal probability of ending up in the sample (uniform sampling). The "Random seed" text field provides a seed value for the random sampling process - changing this will result in a different random sample of the data. Entering a 0 (zero) in the "Sample/cache size" field tells the step that all the incoming data should be passed on to the data mining process (i.e. no sampling is to be performed). Make sure that you enter a zero in this field or change the value to something more than the default 100 rows.
Image Added

The "Set class attribute" allows you to indicate that a class or target attribute is to be set on the data set created for the data mining process. Select this checkbox and then select the "class" field from the "Class attribute" drop-down box. Image Added