Hitachi Vantara Pentaho Community Wiki

Produces a random subsample of a dataset using either sampling with replacement or without replacement.
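The two sampling modes can be illustrated in plain Java. This is a minimal sketch, not Weka's implementation. The class and method names (`SamplingModes`, `withReplacement`, `withoutReplacement`) are hypothetical, and instances are represented only by their row indices 0..n-1. With replacement, every draw is independent, so a row can appear several times. Without replacement, a partial Fisher-Yates shuffle guarantees that each row appears at most once.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper illustrating the two sampling modes on row indices 0..n-1.
// Not part of Weka; a sketch only.
class SamplingModes {

    // With replacement: each draw is independent, so a row can be picked more than once.
    static int[] withReplacement(int n, int sampleSize, Random rnd) {
        int[] sample = new int[sampleSize];
        for (int i = 0; i < sampleSize; i++) {
            sample[i] = rnd.nextInt(n);
        }
        return sample;
    }

    // Without replacement: partial Fisher-Yates shuffle, so every row appears at most once.
    // sampleSize must not exceed n.
    static int[] withoutReplacement(int n, int sampleSize, Random rnd) {
        if (sampleSize > n) {
            throw new IllegalArgumentException("sampleSize > n is impossible without replacement");
        }
        int[] rows = new int[n];
        for (int i = 0; i < n; i++) {
            rows[i] = i;
        }
        for (int i = 0; i < sampleSize; i++) {
            int j = i + rnd.nextInt(n - i); // pick from the not-yet-chosen tail
            int tmp = rows[i];
            rows[i] = rows[j];
            rows[j] = tmp;
        }
        return Arrays.copyOf(rows, sampleSize);
    }

    public static void main(String[] args) {
        Random rnd = new Random(1); // fixed seed so the run is reproducible
        System.out.println(Arrays.toString(withReplacement(10, 10, rnd)));
        System.out.println(Arrays.toString(withoutReplacement(10, 5, rnd)));
    }
}
```

A fixed seed makes repeated runs draw the same rows. This is the role the randomSeed option plays for the filter.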
The original dataset must fit entirely in memory. The number of instances in the generated dataset can be specified as a percentage of the original dataset (sampleSizePercent). The dataset must have a nominal class attribute; if it does not, use the unsupervised version of the filter (weka.filters.unsupervised.instance.Resample) instead. The filter can either maintain the class distribution of the original data in the subsample or bias the class distribution toward a uniform distribution. When used in batch mode (i.e. in the FilteredClassifier), only the first batch is resampled; subsequent batches are NOT resampled and pass through unchanged.
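One way to bias the class distribution toward uniform when sampling with replacement is sketched below. It is an assumed scheme for illustration, not a statement of how Resample is coded. On each draw, with probability `bias` a class is chosen uniformly among the classes that actually occur, and then a random row of that class is taken. Otherwise, a row is taken uniformly from the whole dataset, which keeps the original class distribution on average. The names `BiasedResample` and `sample` are hypothetical, and class labels are assumed to be integers 0..k-1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of class-aware sampling with replacement and a bias factor.
// labels[i] is the class index (0..k-1) of row i; bias must be in [0, 1].
class BiasedResample {

    static int[] sample(int[] labels, int sampleSize, double bias, Random rnd) {
        if (bias < 0.0 || bias > 1.0) {
            throw new IllegalArgumentException("bias must be in [0, 1]");
        }
        // Group row indices by class; classes that never occur are dropped.
        int numClasses = 0;
        for (int label : labels) {
            numClasses = Math.max(numClasses, label + 1);
        }
        List<List<Integer>> rowsByClass = new ArrayList<>();
        for (int c = 0; c < numClasses; c++) {
            rowsByClass.add(new ArrayList<>());
        }
        for (int i = 0; i < labels.length; i++) {
            rowsByClass.get(labels[i]).add(i);
        }
        rowsByClass.removeIf(List::isEmpty);

        int[] sample = new int[sampleSize];
        for (int i = 0; i < sampleSize; i++) {
            if (rnd.nextDouble() < bias) {
                // Uniform over present classes, then uniform within the chosen class.
                List<Integer> rows = rowsByClass.get(rnd.nextInt(rowsByClass.size()));
                sample[i] = rows.get(rnd.nextInt(rows.size()));
            } else {
                // Uniform over all rows: preserves the original class distribution in expectation.
                sample[i] = rnd.nextInt(labels.length);
            }
        }
        return sample;
    }

    public static void main(String[] args) {
        int[] labels = new int[100];
        for (int i = 90; i < 100; i++) {
            labels[i] = 1; // 90 rows of class 0, 10 rows of class 1
        }
        int[] drawn = sample(labels, 1000, 1.0, new Random(1));
        int ones = 0;
        for (int row : drawn) {
            ones += labels[row];
        }
        System.out.println("class-1 rows with bias 1.0: " + ones + " of " + drawn.length);
    }
}
```

With `bias` at 0 the subsample follows the original 90/10 split on average. With `bias` at 1 each class present is drawn equally often on average, so the minority class is heavily oversampled.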


The table below describes the options available for Resample.

Option | Description
biasToUniformClass | How strongly to bias the class distribution towards uniform, from 0 to 1. A value of 0 leaves the class distribution as-is; a value of 1 ensures the class distribution is uniform in the output data.
invertSelection | Inverts the selection, i.e. outputs the instances that were not selected (only if instances are drawn WITHOUT replacement).
noReplacement | Draws instances without replacement instead of with replacement.
randomSeed | Sets the random number seed for subsampling.
sampleSizePercent | The subsample size as a percentage of the original dataset.
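To make the numeric options concrete, the sketch below computes three things. First, the sample size implied by sampleSizePercent. Second, the class proportions one would expect for a given biasToUniformClass value, if the bias blends linearly between the original proportions p_c and the uniform 1/k over the k classes present, giving q_c = (1 - b) p_c + b / k. Third, the rows that remain when invertSelection is applied to a sample drawn without replacement. The linear blend, the truncation of the sample size to an integer, and all names (`ResampleOptions`, `sampleSize`, `expectedProportions`, `invert`) are assumptions for illustration, not documented behaviour of the filter.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical helpers illustrating the numeric Resample options; not Weka code.
class ResampleOptions {

    // Rows implied by sampleSizePercent, truncated toward zero (assumed rounding rule).
    static int sampleSize(int originalSize, double sampleSizePercent) {
        return (int) (originalSize * sampleSizePercent / 100.0);
    }

    // Expected class proportions under an assumed linear blend between the original
    // proportions p_c and the uniform 1/k over the k classes that occur:
    //   q_c = (1 - bias) * p_c + bias / k   (classes with zero count stay at 0)
    static double[] expectedProportions(int[] classCounts, double bias) {
        int total = 0;
        int present = 0;
        for (int count : classCounts) {
            total += count;
            if (count > 0) {
                present++;
            }
        }
        double[] q = new double[classCounts.length];
        for (int c = 0; c < classCounts.length; c++) {
            if (classCounts[c] > 0) {
                double p = (double) classCounts[c] / total;
                q[c] = (1.0 - bias) * p + bias / present;
            }
        }
        return q;
    }

    // invertSelection: the rows NOT chosen by a sample drawn without replacement.
    static int[] invert(int originalSize, int[] selectedRows) {
        boolean[] chosen = new boolean[originalSize];
        for (int row : selectedRows) {
            chosen[row] = true;
        }
        return IntStream.range(0, originalSize)
                .filter(row -> !chosen[row])
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(sampleSize(150, 50.0));
        System.out.println(Arrays.toString(expectedProportions(new int[] {90, 10}, 0.5)));
        System.out.println(Arrays.toString(invert(6, new int[] {0, 2, 5})));
    }
}
```

Under this assumed blend, a 90/10 dataset with a bias of 0.5 would be expected to yield roughly a 70/30 subsample. Because an inverted selection is just the complement of a without-replacement sample, it is only meaningful when noReplacement is set.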


The table below describes the capabilities of Resample.

Capability | Supported
Class | Binary class, Nominal class
Attributes | Missing values, String attributes, Numeric attributes, Empty nominal attributes, Binary attributes, Unary attributes, Nominal attributes, Date attributes, Relational attributes
Minimum number of instances | 0

This documentation is maintained by the Pentaho community, and members are encouraged to create new pages in the appropriate spaces, or edit existing pages that need to be corrected or updated.
