Hitachi Vantara Pentaho Community Wiki


Groovy is an agile and dynamic language for the Java virtual machine. It compiles directly to Java bytecode and integrates with all existing Java objects and libraries. The Groovy scripting plugin for WEKA's KnowledgeFlow allows you to write, test and debug KnowledgeFlow steps in Groovy. This is much faster than writing a standard plugin for the KnowledgeFlow because the dynamic nature of Groovy eliminates the need to restart the KnowledgeFlow after changes to the code. Furthermore, the Groovy scripting plugin provides a fully compilable class template that implements all the important KnowledgeFlow interfaces - you just fill in the methods that you need. This makes for an easy way to get started with learning to program for the KnowledgeFlow.


Download the Groovy scripting plugin for the KnowledgeFlow here (includes the examples below). From WEKA 3.7.2 onward, the Groovy scripting plugin is available as a package called "kfGroovy", which can be downloaded and installed with the package manager. You can also browse the package's details online at:


Requires WEKA 3.7.0 or higher.


The screenshot below shows the KnowledgeFlow with a Groovy script that loads all the ARFF files in a directory and passes them, one by one, to the naive Bayes classifier. The classifier is evaluated on each data set via cross-validation, and the model learned from each data set is saved to a file. The screenshot also shows the built-in editor for creating Groovy scripts.

The DirectoryLoader.groovy script can be seen here. It shows how to implement the "Startable" interface, how to generate "DataSet" events and how to use environment variables. 
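
The directory-scanning part of such a script can be sketched independently of WEKA. The example below is a minimal illustration in plain Java (which a Groovy script can call directly), not the contents of DirectoryLoader.groovy. The class name `ArffDirectoryLister`, both method names, and the `${NAME}` placeholder syntax are assumptions made for this sketch. A real script would rely on the KnowledgeFlow's own environment-variable support and would emit a "DataSet" event for each file it finds.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of WEKA or the kfGroovy package.
class ArffDirectoryLister {

    // Replace each ${NAME} placeholder with its value from vars.
    // Placeholders whose name is not in vars are left untouched.
    static String substitute(String text, Map<String, String> vars) {
        String result = text;
        for (Map.Entry<String, String> e : vars.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    // Return the regular files in dir whose name ends in ".arff"
    // (case-insensitive), sorted by path. Returns an empty list when
    // dir does not exist or is not a readable directory.
    static List<File> listArffFiles(File dir) {
        List<File> out = new ArrayList<>();
        File[] files = dir.listFiles((d, name) ->
                name.toLowerCase().endsWith(".arff") && new File(d, name).isFile());
        if (files != null) {
            Arrays.sort(files);
            out.addAll(Arrays.asList(files));
        }
        return out;
    }
}
```

A caller would first resolve the configured path, for example `substitute("${DATA_DIR}/arff", vars)` with a hypothetical `DATA_DIR` variable, and then loop over `listArffFiles(new File(resolvedPath))`, loading each file and passing it downstream.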

The screenshot below shows another example. This time a Groovy script has been written to generate a learning curve for an incoming data set (the german credit data in this example).

The LearningCurve.groovy script can be seen here. It demonstrates how a script can provide a graphical dialog for setting options, and makes further use of environment variables. Environment variables are particularly useful in KnowledgeFlow Groovy scripts because parameters that the user edits at runtime on an object of the script class cannot be saved when the KnowledgeFlow layout is saved to disk: the layout stores the source code of the script, not the object instantiated from it.
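
One way to work with this limitation is to read each setting from a variable and fall back to a default written into the script source, so the value survives a save and reload of the layout. The sketch below illustrates that idea together with the training-set sizes a learning curve might use. It is plain Java and is not taken from LearningCurve.groovy. The class `LearningCurveSettings`, its methods, and the variable name `CURVE_STEPS` are all hypothetical, and the plain `Map` stands in for whatever variable store the KnowledgeFlow provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of WEKA or the kfGroovy package.
class LearningCurveSettings {

    // Read an integer setting from vars, falling back to defaultValue
    // when the variable is missing or is not a valid integer.
    static int intSetting(Map<String, String> vars, String name, int defaultValue) {
        String raw = vars.get(name);
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    // Training-set sizes for a curve with `steps` evenly spaced points,
    // ending at the full data set. Every size is at least 1, so very
    // small data sets may repeat a size.
    static List<Integer> trainingSizes(int numInstances, int steps) {
        if (numInstances < 1 || steps < 1) {
            throw new IllegalArgumentException("numInstances and steps must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int i = 1; i <= steps; i++) {
            int n = (int) Math.round((double) numInstances * i / steps);
            sizes.add(Math.max(1, n));
        }
        return sizes;
    }
}
```

With `CURVE_STEPS` set to `5` and a data set of 1000 instances, `trainingSizes(1000, intSetting(vars, "CURVE_STEPS", 10))` yields the sizes 200, 400, 600, 800 and 1000. If the variable is unset, the default of 10 steps from the source code applies.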

