Hitachi Vantara Pentaho Community Wiki




Produces a random subsample of a dataset. The original dataset must fit entirely in memory. This filter allows you to specify the maximum "spread" between the rarest and most common class. For example, you may specify that there be at most a 2:1 difference in class frequencies. When used in batch mode, subsequent batches are NOT resampled.
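
The effect of the spread limit on class counts can be illustrated with a small calculation. This is a minimal sketch and not the filter's actual code: the function name `spread_caps` is hypothetical, and how values between 0 and 1 are handled (the rarest class is never reduced) is an assumption made here.

```python
import math

def spread_caps(class_counts, spread):
    """Return per-class target counts under a maximum spread.

    class_counts: dict mapping class label -> number of instances.
    spread: 0 means no limit; otherwise the largest allowed ratio
            between any class and the rarest non-empty class.
    """
    present = {c: n for c, n in class_counts.items() if n > 0}
    if not present or spread == 0:
        return dict(class_counts)
    rarest = min(present.values())
    # Assumption of this sketch: the rarest class is never shrunk.
    limit = max(rarest, math.floor(rarest * spread))
    return {c: min(n, limit) for c, n in class_counts.items()}

counts = {"no": 900, "yes": 100}
print(spread_caps(counts, 2))  # {'no': 200, 'yes': 100}
print(spread_caps(counts, 1))  # {'no': 100, 'yes': 100}
print(spread_caps(counts, 0))  # {'no': 900, 'yes': 100}
```

With a 2:1 spread, the common class is cut from 900 to 200 instances while the rare class keeps all 100.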


The table below describes the options available for SpreadSubsample.

| Option | Description |
| --- | --- |
| adjustWeights | Whether instance weights are adjusted to maintain the total weight per class. |
| distributionSpread | The maximum class distribution spread (0 = no maximum spread, 1 = uniform distribution, 10 = allow at most a 10:1 ratio between the classes). |
| maxCount | The maximum count for any class value (0 = unlimited). |
| randomSeed | The random number seed used for subsampling. |
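
To show how maxCount, randomSeed, and adjustWeights might interact, here is a rough sketch. It does not use the Weka API. The function `draw_subsample` and its parameters are hypothetical, and it assumes every instance starts with weight 1.0 and that sampling is done without replacement.

```python
import random
from collections import defaultdict

def draw_subsample(labels, targets, seed=1, adjust_weights=False):
    """Pick instance indices so each class keeps at most targets[class] rows.

    Returns a list of (index, weight) pairs, sorted by index.
    """
    rng = random.Random(seed)  # same seed -> same subsample
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    picked = []
    for label, indices in by_class.items():
        keep = min(len(indices), targets.get(label, len(indices)))
        # With adjust_weights, survivors are scaled up so the class keeps
        # its original total weight (assumes initial weights of 1.0).
        weight = len(indices) / keep if adjust_weights and keep > 0 else 1.0
        for i in rng.sample(indices, keep):  # sampling without replacement
            picked.append((i, weight))
    return sorted(picked)

labels = ["no"] * 8 + ["yes"] * 2
max_count = 4  # stand-in for maxCount
targets = {"no": min(8, max_count), "yes": min(2, max_count)}
sample = draw_subsample(labels, targets, seed=42, adjust_weights=True)
print(len(sample))  # 6
```

In this sketch the four surviving "no" instances each carry weight 2.0, so the class still sums to its original total weight of 8.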


The table below describes the capabilities of SpreadSubsample.

| Capability | Supported |
| --- | --- |
| Class | Nominal class, Binary class |
| Attributes | Binary attributes, Missing values, Nominal attributes, Numeric attributes, Unary attributes, String attributes, Empty nominal attributes, Relational attributes, Date attributes |
| Min # of instances | 0 |
