04/22/10 - SupportVectorMachineModel is now supported!
06/22/09 - RuleSetModel is now supported.
02/26/09 - TreeModel is now supported.
02/08/09 - Feedback from the PMML testing web page has resulted in several bug fixes and improvements (e.g. derived fields can now reference other derived fields, as long as the referenced field is declared before the referencing field). These improvements are available via the download link above.
09/15/08 - NeuralNetwork, TransformationDictionary, LocalTransformations and DerivedField are now supported.
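The declaration-order rule for derived fields mentioned in the 02/08/09 entry can be illustrated with a minimal sketch. It assumes a simplified, hypothetical representation in which each derived field is a name, the list of fields it reads, and a function over values computed so far; it is not Weka's internal code. Fields are evaluated strictly in document order, so a field may read anything declared earlier, while a forward reference is rejected.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: evaluates PMML-style derived fields in declaration order. */
final class DerivedFieldSketch {

    /** A derived field: its name, the fields it reads, and an expression over known values. */
    record DerivedField(String name, List<String> references,
                        Function<Map<String, Double>, Double> expression) { }

    /**
     * Computes derived fields one by one. A field may only reference raw inputs or
     * derived fields declared before it; a forward reference raises an exception.
     */
    static Map<String, Double> evaluate(Map<String, Double> inputs, List<DerivedField> fields) {
        Map<String, Double> values = new LinkedHashMap<>(inputs);
        for (DerivedField field : fields) {
            for (String ref : field.references()) {
                if (!values.containsKey(ref)) {
                    throw new IllegalStateException(
                        field.name() + " references " + ref + ", which is not declared before it");
                }
            }
            values.put(field.name(), field.expression().apply(values));
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical fields: "scaled" depends on "ageSquared", which is declared first.
        List<DerivedField> fields = List.of(
            new DerivedField("ageSquared", List.of("age"), v -> v.get("age") * v.get("age")),
            new DerivedField("scaled", List.of("ageSquared"), v -> v.get("ageSquared") / 100.0));
        System.out.println(evaluate(Map.of("age", 50.0), fields).get("scaled")); // 25.0
    }
}
```

Swapping the two declarations makes evaluate throw, which mirrors the restriction described in the news entry.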
What is PMML?
The Predictive Model Markup Language (PMML) is a vendor-agnostic, XML-based standard for expressing statistical and data mining models. Applications can produce and consume PMML models, so a model created in one application can be consumed and used for scoring (prediction) in another. The PMML standard is maintained by the Data Mining Group (DMG).
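A PMML consumer usually begins by reading the document's DataDictionary and MiningSchema to learn which fields the model expects and which field it predicts. The sketch below does only that, using the JDK's DOM parser on a small hand-written document; the field names and coefficients are invented, and this is not how Weka's PMML import is implemented.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: reads the field definitions a PMML consumer needs before scoring. */
final class PmmlSchemaSketch {

    /** A tiny, invented PMML 3.2 document. */
    static final String PMML = """
        <PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">
          <Header description="hypothetical example"/>
          <DataDictionary numberOfFields="2">
            <DataField name="temperature" optype="continuous" dataType="double"/>
            <DataField name="humidity" optype="continuous" dataType="double"/>
          </DataDictionary>
          <RegressionModel functionName="regression">
            <MiningSchema>
              <MiningField name="temperature"/>
              <MiningField name="humidity" usageType="predicted"/>
            </MiningSchema>
            <RegressionTable intercept="12.0">
              <NumericPredictor name="temperature" coefficient="0.8"/>
            </RegressionTable>
          </RegressionModel>
        </PMML>
        """;

    /** Parses the XML text (non-validating, not namespace-aware). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse PMML document", e);
        }
    }

    /** Names of all DataField elements, in document order. */
    static List<String> dataFields(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList fields = doc.getElementsByTagName("DataField");
        for (int i = 0; i < fields.getLength(); i++) {
            names.add(((Element) fields.item(i)).getAttribute("name"));
        }
        return names;
    }

    /** Name of the MiningField marked usageType="predicted", or null if there is none. */
    static String predictedField(Document doc) {
        NodeList fields = doc.getElementsByTagName("MiningField");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            if ("predicted".equals(field.getAttribute("usageType"))) {
                return field.getAttribute("name");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = parse(PMML);
        System.out.println(dataFields(doc));     // [temperature, humidity]
        System.out.println(predictedField(doc)); // humidity
    }
}
```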
What PMML model types are supported?
Support for importing PMML models into Weka is under development. Implementation of the PMML 3.2 model types Regression, GeneralRegression, NeuralNetwork, TreeModel, RuleSetModel and SupportVectorMachineModel is complete. Support for other model types will follow; the current plan is to implement naive Bayes, association rules and clustering models, in that order. This wiki page will be updated with new information and new download archives as more features are implemented.
What are the current limitations of Weka's PMML support?
Only PMML Regression, GeneralRegression, NeuralNetwork, TreeModel, RuleSetModel and SupportVectorMachineModel are implemented so far. GeneralRegression supports a single Predictor-to-Parameter matrix (i.e. in the case of classification, each target class value shares the same PPMatrix). Aggregate and MapValues expressions are not supported yet. The first six of the eleven PMML built-in functions are supported so far. There is no support for exporting PMML models from Weka yet.
How will I be able to use PMML models with Pentaho?
PMML models can be used in two contexts: 1) in the Weka GUIs (Explorer and KnowledgeFlow) or from the command line, a PMML model can be loaded and applied to test data to score it. Since Weka's implementation of PMML import renders a PMML model as a standard (albeit immutable) Weka Classifier, all the standard Weka evaluation metrics are available for evaluating performance on the test set (if it contains reference target values); 2) using the Weka scoring plugin for Pentaho Data Integration (Kettle), PMML models can be deployed for scoring as part of an ETL job.
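The "standard but immutable classifier" design can be sketched as an adapter. The interface below is a hypothetical, cut-down stand-in for a classifier contract and is not Weka's actual Classifier API: training is refused because the parameters come from the PMML document, while prediction behaves like any other classifier, so generic evaluation code can score it unchanged.

```java
import java.util.List;

/** Hypothetical sketch of an imported, read-only model behind a generic classifier contract. */
final class ImmutableClassifierSketch {

    /** Cut-down classifier contract (an assumption, not Weka's real interface). */
    interface SimpleClassifier {
        void train(List<double[]> rows, List<Integer> labels);
        double[] distribution(double[] row);
    }

    /** Binary logistic model whose parameters were supplied externally; retraining is refused. */
    static final class FixedLogisticClassifier implements SimpleClassifier {
        private final double intercept;
        private final double[] weights;

        FixedLogisticClassifier(double intercept, double[] weights) {
            this.intercept = intercept;
            this.weights = weights.clone();
        }

        @Override
        public void train(List<double[]> rows, List<Integer> labels) {
            throw new UnsupportedOperationException("parameters come from PMML and cannot be re-estimated");
        }

        @Override
        public double[] distribution(double[] row) {
            double z = intercept;
            for (int i = 0; i < weights.length; i++) {
                z += weights[i] * row[i];
            }
            double p1 = 1.0 / (1.0 + Math.exp(-z)); // probability of class 1
            return new double[] {1.0 - p1, p1};
        }
    }

    /** Evaluation that only needs predictions, so it accepts imported models too. */
    static double accuracy(SimpleClassifier c, List<double[]> rows, List<Integer> labels) {
        int correct = 0;
        for (int i = 0; i < rows.size(); i++) {
            double[] d = c.distribution(rows.get(i));
            int predicted = d[1] > d[0] ? 1 : 0;
            if (predicted == labels.get(i)) {
                correct++;
            }
        }
        return (double) correct / rows.size();
    }

    public static void main(String[] args) {
        SimpleClassifier model = new FixedLogisticClassifier(-1.0, new double[] {0.5});
        System.out.println(accuracy(model, List.of(new double[] {4.0}, new double[] {0.0}), List.of(1, 0))); // 1.0
    }
}
```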
Integration of PMML support into the Weka scoring plugin and a new PMML classifier scoring plugin for the Weka KnowledgeFlow have been completed (see below for example usage and screenshots). From Weka 3.6.0, PMML models can be run from the Classify panel in Weka's Explorer user interface and from the command line.
Below is some example output of Weka's implementation of PMML GeneralRegression (multinomial logistic in this case) and the first few predictions (probability distributions over the class values) for some test data for the famous Iris dataset:
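Probability distributions like these come from computing one linear predictor per class and normalising with the softmax function. The sketch below assumes the design vector (a leading 1 for the intercept followed by predictor values) has already been built and is shared by all classes, in line with the single-PPMatrix limitation noted earlier, and that the last class is the reference category whose linear predictor is fixed at zero. The coefficients are made up and do not come from any fitted Iris model.

```java
import java.util.Arrays;

/** Hypothetical sketch of multinomial logistic scoring with a reference category. */
final class MultinomialLogisticSketch {

    /**
     * Returns class probabilities. beta[k] holds the coefficients for class k; the
     * reference class (index beta.length) has an implicit linear predictor of 0.
     */
    static double[] probabilities(double[][] beta, double[] x) {
        int numClasses = beta.length + 1;
        double[] eta = new double[numClasses]; // last entry stays 0 for the reference class
        for (int k = 0; k < beta.length; k++) {
            for (int j = 0; j < x.length; j++) {
                eta[k] += beta[k][j] * x[j];
            }
        }
        // Subtract the maximum before exponentiating to avoid overflow.
        double max = Arrays.stream(eta).max().getAsDouble();
        double sum = 0.0;
        double[] p = new double[numClasses];
        for (int k = 0; k < numClasses; k++) {
            p[k] = Math.exp(eta[k] - max);
            sum += p[k];
        }
        for (int k = 0; k < numClasses; k++) {
            p[k] /= sum;
        }
        return p;
    }

    public static void main(String[] args) {
        // Hypothetical coefficients {intercept, petal length, petal width} for two non-reference classes.
        double[][] beta = {
            {10.0, -3.0, -4.0},
            {4.0, -0.5, -1.5}
        };
        double[] x = {1.0, 1.4, 0.2}; // the leading 1.0 multiplies the intercept
        System.out.println(Arrays.toString(probabilities(beta, x)));
    }
}
```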
Here is another example. This shows the output from Weka's implementation of PMML Regression (polynomial regression in this case) and the first few predictions for some test data on the Elnino dataset:
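PMML's NumericPredictor element carries an exponent attribute, so one way to express a polynomial is as several terms over the same field, and the prediction is then the intercept plus the sum of coefficient × value^exponent. The sketch below assumes that encoding; the Term record and the quadratic coefficients are hypothetical and unrelated to the Elnino model.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of scoring a RegressionTable whose numeric terms carry exponents. */
final class PolynomialRegressionSketch {

    /** One NumericPredictor-style term: coefficient * value(field)^exponent. */
    record Term(String field, int exponent, double coefficient) { }

    static double predict(double intercept, List<Term> terms, Map<String, Double> record) {
        double y = intercept;
        for (Term t : terms) {
            y += t.coefficient() * Math.pow(record.get(t.field()), t.exponent());
        }
        return y;
    }

    public static void main(String[] args) {
        // Hypothetical quadratic in one input: 2 + 0.5*x - 0.25*x^2
        List<Term> terms = List.of(new Term("x", 1, 0.5), new Term("x", 2, -0.25));
        System.out.println(predict(2.0, terms, Map.of("x", 4.0))); // 0.0
    }
}
```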
Weka's implementation of TreeModel for classification and regression trees implements Weka's Drawable interface, which allows the tree to be output in the Dot language used by the excellent Graphviz graph visualization software from AT&T Research. This enables the tree to be visualized by Weka's built-in TreeVisualizer or by other tools that support the Dot language. Here is a visualization of a PMML tree generated by SPSS Clementine from the Cleveland heart disease data.
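To show what "output in the Dot language" means, here is a small sketch that walks a tree and emits a Graphviz digraph, numbering nodes in pre-order and labelling edges with the split predicate. The Node class and the example split are hypothetical, and the exact Dot text Weka's Drawable implementation produces may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: renders a decision tree as a Graphviz Dot digraph. */
final class TreeToDotSketch {

    /** A tree node: split attribute or leaf prediction, plus the predicate on its incoming edge. */
    static final class Node {
        final String label;
        final String edgeLabel; // null at the root
        final List<Node> children = new ArrayList<>();

        Node(String label, String edgeLabel) {
            this.label = label;
            this.edgeLabel = edgeLabel;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    static String toDot(Node root) {
        StringBuilder sb = new StringBuilder("digraph Tree {\n");
        write(root, new int[] {0}, sb);
        sb.append("}\n");
        return sb.toString();
    }

    /** Writes a node, then each child followed by the edge to it; returns the node's id. */
    private static int write(Node node, int[] counter, StringBuilder sb) {
        int id = counter[0]++;
        sb.append("N").append(id).append(" [label=").append(quote(node.label)).append("]\n");
        for (Node child : node.children) {
            int childId = write(child, counter, sb);
            sb.append("N").append(id).append("->N").append(childId)
              .append(" [label=").append(quote(child.edgeLabel)).append("]\n");
        }
        return id;
    }

    /** Quotes a label, escaping backslashes and double quotes for Dot. */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Node root = new Node("thal", null)
            .add(new Node("no disease", "= normal"))
            .add(new Node("disease", "!= normal"));
        System.out.print(toDot(root));
    }
}
```

Piping the printed text into Graphviz (for example `dot -Tpng`) renders the tree.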
Once the Weka PMML library is installed in the same directory as the Weka scoring plugin in your Kettle plugins directory, using PMML models is simple and follows the same procedure as using a standard serialized Weka model (for more information on using the Weka scoring plugin, see the documentation provided with the distribution).
The following screenshot shows browsing for PMML model files from the WekaScoring file browser.
The next screenshot shows the "HEART_NOMREG" PMML GeneralRegression model loaded into the Weka scoring plugin.
The PMML classifier scoring plugin for the KnowledgeFlow allows PMML classification and regression models to be loaded and used to score incoming batches of instances or instance streams in the KnowledgeFlow. Below are some example screenshots showing the PMML classifier scoring plugin, with a PMML binomial logistic regression model loaded, accepting an instance stream from the UCI Cleveland heart disease dataset. Evaluation metrics are computed by the incremental classifier evaluator component and displayed in a text viewer. Predictions for the data are appended and saved to a new ARFF file via the prediction appender and the ARFF saver components.
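Incremental evaluation over an instance stream only needs running totals that are updated once per instance, and a prediction appender simply adds the predicted distribution to each row. The sketch below illustrates both ideas with accuracy and a multi-class Brier score; the class and method names are hypothetical, and the metrics Weka's incremental classifier evaluator actually reports, and how it computes them, may differ.

```java
import java.util.Arrays;

/** Hypothetical sketch of stream-style evaluation and prediction appending. */
final class IncrementalEvaluationSketch {

    private long seen;
    private long correct;
    private double squaredErrorSum; // summed squared error of each probability vector

    /** Updates running statistics with one predicted distribution and the actual class index. */
    void update(double[] distribution, int actualClass) {
        int predicted = 0;
        for (int k = 1; k < distribution.length; k++) {
            if (distribution[k] > distribution[predicted]) {
                predicted = k;
            }
        }
        if (predicted == actualClass) {
            correct++;
        }
        for (int k = 0; k < distribution.length; k++) {
            double diff = distribution[k] - (k == actualClass ? 1.0 : 0.0);
            squaredErrorSum += diff * diff;
        }
        seen++;
    }

    double accuracy() {
        return seen == 0 ? Double.NaN : (double) correct / seen;
    }

    /** Mean over instances of the summed squared error (multi-class Brier score). */
    double brierScore() {
        return seen == 0 ? Double.NaN : squaredErrorSum / seen;
    }

    /** Returns a copy of the row with the predicted distribution appended. */
    static double[] appendPrediction(double[] row, double[] distribution) {
        double[] out = Arrays.copyOf(row, row.length + distribution.length);
        System.arraycopy(distribution, 0, out, row.length, distribution.length);
        return out;
    }

    public static void main(String[] args) {
        IncrementalEvaluationSketch eval = new IncrementalEvaluationSketch();
        eval.update(new double[] {0.8, 0.2}, 0);
        eval.update(new double[] {0.3, 0.7}, 0);
        System.out.println(eval.accuracy()); // 0.5
        System.out.println(Arrays.toString(appendPrediction(new double[] {5.0, 1.0}, new double[] {0.8, 0.2})));
    }
}
```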