Hitachi Vantara Pentaho Community Wiki

The Model tab lets the user check that the loaded model is actually the one that he or she intends to use. This tab displays the textual description of the forecasting model exactly as it appears in the output of the time series forecasting environment.

4.3 Output

The following screenshot shows a preview of the rows output by the Weka Forecasting plugin step for the Australian wine forecasting model. Historical rows are output first, and these rows contain values for all types of wine. The forecasted values follow the historical rows. For this model, the forecasted values for the "Fortified" and "Dry-white" types of wine appear in the rows towards the end of the output (where the values for the other types of wine are missing). These rows also contain the confidence bounds for these two types of wine.

Note that confidence bounds are only output for as many future time steps as were specified when the model was trained. For example, if the user specified a forecast of 12 future time steps in the Basic configuration panel of the time series analysis and forecasting environment and turned on the computation of confidence intervals, then confidence intervals are output for at most 12 time steps into the future. The user can still request forecasts for more than 12 time steps, but predictions beyond 12 steps into the future will have no confidence limits.

[Screenshot: preview of the rows output by the Weka Forecasting step for the Australian wine model]
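
The layout of the output and the rule for confidence bounds can be illustrated with a minimal Python sketch. This is not the plugin's code. The `OutputRow` type, the `assemble_output` function and the symmetric `half_widths` are hypothetical, chosen only to keep the example short; the intervals computed by the forecasting environment need not be symmetric. The sketch shows historical rows first, then one row per forecast step, with bounds present only up to the horizon that was fixed at training time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OutputRow:
    step: int                      # 0 = historical row, k >= 1 = k steps ahead
    value: float                   # observed value (history) or point forecast
    lower: Optional[float] = None  # lower confidence bound, None if not available
    upper: Optional[float] = None  # upper confidence bound, None if not available


def assemble_output(history: List[float],
                    forecasts: List[float],
                    half_widths: List[float]) -> List[OutputRow]:
    """Lay out historical rows first, then forecast rows.

    half_widths[k - 1] is a hypothetical interval half-width estimated at
    training time for step k, so len(half_widths) stands in for the number
    of future steps configured when the model was trained.
    """
    trained_horizon = len(half_widths)
    rows = [OutputRow(step=0, value=v) for v in history]
    for k, f in enumerate(forecasts, start=1):
        if k <= trained_horizon:
            w = half_widths[k - 1]
            rows.append(OutputRow(step=k, value=f, lower=f - w, upper=f + w))
        else:
            # Beyond the trained horizon: point forecast only, no bounds.
            rows.append(OutputRow(step=k, value=f))
    return rows


if __name__ == "__main__":
    # Model "trained" for 2 steps, but 3 steps requested.
    for row in assemble_output([1.0, 2.0], [3.0, 4.0, 5.0], [0.5, 1.0]):
        print(row)
```

In this example the third forecast step is still produced, but its `lower` and `upper` fields stay `None`, mirroring the missing confidence limits beyond the trained horizon.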

4.5 Using Overlay Data

The time series analysis and forecasting environment documentation (Section 3.2.4) explains how external "overlay" fields (sometimes called intervention data) can be incorporated into a forecasting model. If such data was used to train the forecasting model loaded in the Weka Forecasting plugin step, then overlay values must also be supplied for the future time periods for which forecasts are requested. This is done by including rows in the incoming data stream for those future time steps. Each such row contains a value for the time stamp (if one is in use) and values for the overlay fields that the model expects. In this case, the number of these rows determines the number of forecasted values produced, and the Number of steps to forecast field in the Model file tab is ignored.
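
A minimal Python sketch of preparing the future overlay rows is shown below, assuming monthly data with first-of-month time stamps. Rows are plain dictionaries rather than Kettle rows, and the field names (`Date`, the overlay field `promo`) and the helpers `add_months` and `future_overlay_rows` are hypothetical. The target names "Fortified" and "Dry-white" are borrowed from the wine example purely for illustration.

```python
from datetime import date
from typing import Dict, List


def add_months(d: date, n: int) -> date:
    """Advance a first-of-month date by n months."""
    m = d.month - 1 + n
    return date(d.year + m // 12, m % 12 + 1, 1)


def future_overlay_rows(last_timestamp: date,
                        overlay_values: List[Dict[str, float]],
                        targets: List[str]) -> List[Dict[str, object]]:
    """Build one incoming row per future month.

    Each row carries the time stamp and the overlay values the model expects;
    the target fields are deliberately left missing (None).
    """
    rows: List[Dict[str, object]] = []
    for i, overlay in enumerate(overlay_values, start=1):
        row: Dict[str, object] = {"Date": add_months(last_timestamp, i)}
        row.update(overlay)
        for t in targets:
            row[t] = None
        rows.append(row)
    return rows


if __name__ == "__main__":
    future = future_overlay_rows(date(1994, 12, 1),
                                 [{"promo": 1.0}, {"promo": 0.0}],
                                 ["Fortified", "Dry-white"])
    # With overlay data, the forecast horizon is the number of rows supplied.
    print(len(future), future)
```

Appending these two rows after the historical rows would ask for two forecast steps, regardless of the value entered in Number of steps to forecast.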


For the step to identify where the historical priming/training data ends and the future overlay data begins in the incoming data stream, it is crucial that the values of the forecasted target field(s) are missing in the future overlay rows.
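
One way to locate that boundary is sketched below in Python. This is an assumption about the logic, not the plugin's actual implementation: the sketch treats the first row in which all target fields are missing as the start of the future overlay data, and it rejects streams where a target value reappears after that point. The function name `split_history_and_future` and the dictionary row format are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]


def split_history_and_future(rows: List[Row],
                             targets: List[str]) -> Tuple[List[Row], List[Row]]:
    """Split rows at the first row whose target values are all missing."""
    for i, row in enumerate(rows):
        if all(row.get(t) is None for t in targets):
            future = rows[i:]
            # Sanity check: no row after the boundary may carry target values.
            for r in future:
                if any(r.get(t) is not None for t in targets):
                    raise ValueError("target value found after the history/future boundary")
            return rows[:i], future
    return rows, []  # no future overlay rows were supplied


if __name__ == "__main__":
    stream = [
        {"Fortified": 1.0, "promo": 1.0},   # historical
        {"Fortified": 2.0, "promo": 0.0},   # historical
        {"Fortified": None, "promo": 1.0},  # future overlay row
    ]
    history, future = split_history_and_future(stream, ["Fortified"])
    print(len(history), len(future))
```

If a future row accidentally carried a target value, the boundary would be misplaced, which is why the sketch fails loudly instead of guessing.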