Hitachi Vantara Pentaho Community Wiki
ReservoirSample

Package

weka.filters.unsupervised.instance

Synopsis

Produces a random subsample of a data set using reservoir sampling (Algorithm R, by Vitter). Instances are read one at a time in a single pass, so the original data set does not have to fit into main memory, but the reservoir does.
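
The single-pass behaviour described above can be illustrated with a minimal, self-contained sketch of textbook Algorithm R. This is not the Weka source: the class name ReservoirSketch and the method sample are hypothetical, and only the parameter names sampleSize and randomSeed are borrowed from the options of this filter. The sketch assumes the stream holds fewer than 2^31 items.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.stream.IntStream;

// Hypothetical illustration of Algorithm R; not the Weka implementation.
public class ReservoirSketch {

    /**
     * Draws up to sampleSize items uniformly at random from a stream whose
     * length is not known in advance. Only the reservoir is held in memory.
     */
    public static <T> List<T> sample(Iterator<T> stream, int sampleSize, long randomSeed) {
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must be non-negative");
        }
        Random random = new Random(randomSeed);
        List<T> reservoir = new ArrayList<>(sampleSize);
        int seen = 0; // number of items read so far (assumes fewer than 2^31 items)

        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            if (reservoir.size() < sampleSize) {
                // Fill phase: the first sampleSize items go straight into the reservoir.
                reservoir.add(item);
            } else {
                // Replace phase: pick a slot uniformly in [0, seen). The new item
                // enters the reservoir only if that slot lies inside it, which
                // happens with probability sampleSize / seen.
                int slot = random.nextInt(seen);
                if (slot < sampleSize) {
                    reservoir.set(slot, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        // One million integers are streamed, but only five are ever stored.
        Iterator<Integer> stream = IntStream.range(0, 1_000_000).boxed().iterator();
        List<Integer> subsample = sample(stream, 5, 1L);
        System.out.println(subsample);
    }
}
```

With a fixed seed the same input order yields the same subsample, and a stream shorter than sampleSize is returned unchanged.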

Options

The table below describes the options available for ReservoirSample.

Option        Description

randomSeed    The seed used for random sampling.

sampleSize    Size of the subsample (reservoir), i.e. the number of instances to keep.

Capabilities

The table below describes the capabilities of ReservoirSample.

Capability            Supported

Class                 Relational class, Binary class, No class, Date class, Numeric class, Nominal class, String class, Empty nominal class, Unary class, Missing class values

Attributes            Date attributes, Empty nominal attributes, Missing values, Unary attributes, String attributes, Nominal attributes, Relational attributes, Binary attributes, Numeric attributes

Min # of instances    0
