The Weka Forecasting plugin is a transformation step for PDI 4.x that is similar to the Weka Scoring plugin. It can load or import a time series forecasting model created in Weka's time series analysis and forecasting environment and use it to generate a forecast for future time steps beyond the end of incoming historical data. This differs from the standard classification or regression scenario covered by the Weka Scoring plugin, where each incoming row receives a prediction (score) from the model. Here, the incoming rows instead provide a "window" over the recent history of the time series, and the forecasting model uses that window to initiate a closed-loop forecasting process that generates predictions for future time steps.
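The closed-loop idea can be illustrated with a minimal Python sketch. It is not the plugin's implementation: `closed_loop_forecast`, `predict_next` and `mean_of_window` are hypothetical names, and the "model" is just an average over the window, standing in for a trained Weka forecaster. The point it shows is that each prediction is fed back into the window and used as input for the next step.

```python
from collections import deque


def closed_loop_forecast(history, predict_next, window_size, steps):
    """Forecast `steps` values beyond the end of `history`.

    `predict_next` is a hypothetical stand-in for a trained model: it takes
    the most recent `window_size` values (oldest first) and returns the next.
    """
    if len(history) < window_size:
        raise ValueError("not enough history to fill the window")
    # The window starts out holding only real, observed values.
    window = deque(history[-window_size:], maxlen=window_size)
    forecast = []
    for _ in range(steps):
        prediction = predict_next(list(window))
        forecast.append(prediction)
        # Closing the loop: the prediction becomes input for the next step,
        # pushing the oldest value out of the window.
        window.append(prediction)
    return forecast


def mean_of_window(window):
    """Toy model (an assumption for illustration): average of the window."""
    return sum(window) / len(window)


example = closed_loop_forecast([10.0, 12.0, 14.0], mean_of_window,
                               window_size=2, steps=3)
print(example)  # [13.0, 13.5, 13.25]
```

After the first step the window already contains a predicted value, which is why errors can compound the further ahead a closed-loop forecast runs.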
The Weka Forecasting plugin requires PDI 4 or higher, Weka 3.7.3 or higher and the core time series forecasting library from the time series forecasting package. Both Weka and the time series forecasting core library are bundled with the plugin, so no further downloads are required.
Before starting Kettle's Spoon UI, the Weka Forecasting plugin must be installed in either the plugins/steps directory in your Kettle distribution or in $HOME/.kettle/plugins/steps. Simply unpack the distribution zip file into plugins/steps and then start Spoon.
Once the forecasting plugin step is installed, and Spoon has been restarted, the Weka Forecasting step can be found in the "Transform" folder in the "Design" tab.
In this section we will demonstrate using the model developed on the Australian wine data in Section 3.1.1 of Time Series Analysis and Forecasting with Weka. This forecaster modeled monthly sales of the "Fortified" and "Dry-white" series. The following simple transformation loads the wine data and passes it to the Weka Forecasting step. The Weka Forecasting step uses the incoming data as historical "priming" data, that is, the data is used to populate the values of lagged variables and variables derived from the time stamp. These values are then input to the forecasting model and a forecast is produced for a user-defined number of steps beyond the end of the priming data. The step outputs the historical data followed by a number of new rows that contain the forecasted values.
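As a rough illustration of what "priming" means, the Python sketch below derives the inputs for the first forecasted step from the tail of the historical rows: lagged values of a target field plus one variable derived from the time stamp. Everything here is an assumption made for illustration: the function name `prime_inputs`, the input names such as `Fortified_lag_1`, the `Date` field name, and the sales figures and dates, which are made up. The real forecaster decides which lags and date-derived variables it needs from how it was configured in Weka.

```python
from datetime import date


def prime_inputs(rows, target, max_lag, date_field="Date"):
    """Build inputs for the first forecast step from priming rows.

    `rows` is a list of dicts, oldest first. The returned names
    (e.g. 'Fortified_lag_1', 'month') are illustrative only.
    """
    if len(rows) < max_lag:
        raise ValueError("need at least max_lag priming rows")
    inputs = {}
    # Lagged variables: lag 1 is the most recent historical value.
    for lag in range(1, max_lag + 1):
        inputs[f"{target}_lag_{lag}"] = rows[-lag][target]
    # Variable derived from the time stamp: the month being forecast,
    # assuming monthly data (December rolls over to January).
    last = rows[-1][date_field]
    inputs["month"] = 1 if last.month == 12 else last.month + 1
    return inputs


# Made-up priming rows, not values from the Australian wine data.
priming = [
    {"Date": date(2000, 11, 1), "Fortified": 100.0},
    {"Date": date(2000, 12, 1), "Fortified": 120.0},
]
first_step_inputs = prime_inputs(priming, "Fortified", max_lag=2)
print(first_step_inputs)
```

If fewer priming rows arrive than the largest lag requires, some lagged variables cannot be filled, so the incoming data should cover at least the longest lag the model uses.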
Subsequent sections explain the configuration options for the step and the output that it produces in detail.
Conceptually, the UI for the Weka Forecasting step is laid out in a similar fashion to that of the Weka Scoring plugin. The Model file tab allows a model to be loaded from the file system and configured for forecasting.
The Load/import forecaster field allows a serialized forecasting model to be loaded from a file. A path can be entered into the field directly, or the Browse button can be used to bring up a file browser dialog. If the field is left populated with a path, then the forecasting model will be loaded from the file every time the transformation is run. Alternatively, a forecasting model can be imported, either by pressing enter in the field after a path has been typed or by using the Browse button. If the field is then cleared and the OK button pressed, the model will be stored in the XML ".ktr" file or in the repository (if one is being used).
The Number of steps to forecast field allows the user to specify how many time steps into the future the model will produce predictions for. In this example we have entered "24" in order to get a monthly forecast out to 24 months beyond the end of the incoming priming data.
The Number of historical rows beyond end of training data field only becomes enabled when the step detects that the loaded forecasting model is using an artificial time stamp (see Section 3.1.2 of Time Series Analysis and Forecasting with Weka). In this case, the user can specify how many rows of the incoming priming data occur after the most recent row seen by the forecaster when it was trained. This enables the forecaster to synchronize the artificial time stamp value with the priming data.
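The arithmetic behind this synchronization can be sketched in a few lines of Python. The sketch assumes that the artificial time stamp is a simple counter that increases by one per row; the function name `first_forecast_time_index` and the numbers used are hypothetical and chosen only for illustration.

```python
def first_forecast_time_index(last_training_index, rows_beyond_training):
    """Artificial time stamp value of the first forecasted step.

    Assumes the artificial time stamp increases by 1 per row.
    `last_training_index` is the value for the final training row;
    `rows_beyond_training` is the number entered in the step's field.
    """
    if rows_beyond_training < 0:
        raise ValueError("rows_beyond_training must not be negative")
    # Skip over the priming rows the forecaster never saw in training,
    # then move one step further to reach the first future time step.
    return last_training_index + rows_beyond_training + 1


# Trained on rows 1..100; the priming data holds 12 newer rows.
print(first_forecast_time_index(100, 12))  # 113
```

If the field were left at 0 in this situation, the forecaster would treat the first forecasted step as row 101 rather than row 113, so any trend modeled through the time stamp would be evaluated at the wrong point.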
The Rebuild/reestimate forecaster on incoming data check box allows the user to specify that the forecaster should be trained on the incoming data rather than primed. This allows the forecasting model to be brought up to date with the latest historical data. After training is complete, a forecast is generated as described above. Selecting this option enables the Save forecaster field, which can be used to specify a file to which the updated forecasting model will be saved. Leaving the field blank tells the step not to save the updated forecasting model.
The Fields mapping tab allows the user to check how the step is mapping incoming transformation fields to those that the model saw in its training data. The step matches both field names and types. Note that this matching is done between incoming Kettle fields and the original training data fields (before any internal transformations done by the forecasting model itself). Any training data fields that don't have a counterpart in the incoming data are indicated by an entry labelled "missing". If there is a difference in type between a training field and an incoming field, then this will be indicated by the label "type-mismatch". In both cases, the forecaster will receive a missing value as input for the field in question for all incoming data rows. How much this degrades the forecast depends on how heavily the model relies on the affected field.
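The matching rules described above can be summarized in a short Python sketch. It is a simplified model of the behavior, not the step's actual code: the function names, the type names ("number", "string", "date") and the "ok" label for a successful match are all assumptions, while "missing" and "type-mismatch" are the labels the step displays.

```python
def map_fields(training_fields, incoming_fields):
    """Compare training fields with incoming fields by name and type.

    Both arguments are dicts of field name -> type name. The type names
    and the 'ok' label are illustrative assumptions.
    """
    mapping = {}
    for name, expected_type in training_fields.items():
        if name not in incoming_fields:
            mapping[name] = "missing"
        elif incoming_fields[name] != expected_type:
            mapping[name] = "type-mismatch"
        else:
            mapping[name] = "ok"
    return mapping


def model_input(row, training_fields, mapping):
    """Values handed to the forecaster for one row.

    Fields that are missing or mismatched become None, standing in for
    a missing value.
    """
    return {
        name: (row[name] if mapping[name] == "ok" else None)
        for name in training_fields
    }


training = {"Fortified": "number", "Dry-white": "number", "Date": "date"}
incoming = {"Fortified": "number", "Dry-white": "string"}

mapping = map_fields(training, incoming)
print(mapping)
print(model_input({"Fortified": 5.0, "Dry-white": "n/a"}, training, mapping))
```

In this example only "Fortified" reaches the forecaster with a real value; "Dry-white" is replaced by a missing value because of its type, and "Date" because it is absent from the incoming rows.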