Hitachi Vantara Pentaho Community Wiki




Resamples a dataset by applying the Synthetic Minority Oversampling TEchnique (SMOTE). The original dataset must fit entirely in memory. The amount of SMOTE and number of nearest neighbors may be specified. For more information, see

Nitesh V. Chawla et al. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research. 16:321-357.

Bundled with Weka 3.6.x through 3.7.1. For Weka 3.7.2 and later, available via the package management system (package name: SMOTE).


The table below describes the options available for SMOTE.

Option            Description
classValue        The index of the class value to which SMOTE should be applied. Use a value of 0 to auto-detect the non-empty minority class.
nearestNeighbors  The number of nearest neighbors to use.
percentage        The percentage of SMOTE instances to create.
randomSeed        The seed used for random sampling.
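
To illustrate how nearestNeighbors, percentage, and randomSeed interact, the following is a minimal sketch of the interpolation step described by Chawla et al. It is not Weka's implementation: the class name SmoteSketch and the method oversample are hypothetical, only numeric attributes are handled, and the treatment of a percentage that is not a multiple of 100 (spending the remainder on a random subset of instances) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Illustrative SMOTE sketch for numeric attributes only; not Weka's implementation. */
final class SmoteSketch {

    /**
     * @param minority         minority-class instances, one row of numeric attributes each
     * @param nearestNeighbors number of nearest neighbors to choose from
     * @param percentage       amount of SMOTE; 100 yields one synthetic row per minority row
     * @param randomSeed       seed for reproducible sampling
     * @return the synthetic rows only (the original rows are not included)
     */
    static List<double[]> oversample(double[][] minority, int nearestNeighbors,
                                     double percentage, long randomSeed) {
        Random random = new Random(randomSeed);
        List<double[]> synthetic = new ArrayList<>();
        // A row cannot be its own neighbor, so cap k at (rows - 1).
        int k = Math.min(nearestNeighbors, minority.length - 1);
        if (k < 1) {
            return synthetic; // at least two minority rows are needed to interpolate
        }

        // Whole multiples of 100% give every row the same number of synthetic copies.
        int perInstance = (int) Math.floor(percentage / 100.0);
        // Assumption: the fractional remainder is spent on a random subset of rows.
        int extra = (int) ((percentage / 100.0 - perInstance) * minority.length);
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < minority.length; i++) {
            order.add(i);
        }
        Collections.shuffle(order, random);
        int[] copies = new int[minority.length];
        Arrays.fill(copies, perInstance);
        for (int j = 0; j < extra; j++) {
            copies[order.get(j)]++;
        }

        for (int i = 0; i < minority.length; i++) {
            int[] neighbors = nearestNeighborIndices(minority, i, k);
            for (int c = 0; c < copies[i]; c++) {
                double[] base = minority[i];
                double[] neighbor = minority[neighbors[random.nextInt(k)]];
                double gap = random.nextDouble(); // position on the segment base -> neighbor
                double[] row = new double[base.length];
                for (int a = 0; a < base.length; a++) {
                    row[a] = base[a] + gap * (neighbor[a] - base[a]);
                }
                synthetic.add(row);
            }
        }
        return synthetic;
    }

    /** Indices of the k rows closest (Euclidean distance) to data[index], excluding itself. */
    private static int[] nearestNeighborIndices(double[][] data, int index, int k) {
        Integer[] others = new Integer[data.length - 1];
        int pos = 0;
        for (int i = 0; i < data.length; i++) {
            if (i != index) {
                others[pos++] = i;
            }
        }
        Arrays.sort(others,
                Comparator.comparingDouble((Integer i) -> squaredDistance(data[index], data[i])));
        int[] result = new int[k];
        for (int i = 0; i < k; i++) {
            result[i] = others[i];
        }
        return result;
    }

    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

Under these assumptions, calling SmoteSketch.oversample with four minority rows and a percentage of 200 produces eight synthetic rows, each lying on the line segment between a minority row and one of its nearest neighbors. Reusing the same randomSeed reproduces the same synthetic rows.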


The table below describes the capabilities of SMOTE.

Capability                   Supported
Class                        Nominal class, Binary class, Missing class values
Attributes                   Binary attributes, String attributes, Nominal attributes, Numeric attributes, Unary attributes, Relational attributes, Date attributes, Missing values, Empty nominal attributes
Minimum number of instances  0
