Hitachi Vantara Pentaho Community Wiki

Package

weka.filters.unsupervised.attribute

Synopsis

Reduces the dimensionality of the data by projecting it onto a lower-dimensional subspace using a random matrix with columns of unit length. In other words, it reduces the number of attributes while preserving much of the data's variation, as PCA does, but at a much lower computational cost.
It first applies the NominalToBinary filter to convert all attributes to numeric before reducing the dimension. It preserves the class attribute.
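The core projection step can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not Weka's implementation. It draws matrix entries from a standard Gaussian (a choice swapped in here for simplicity) and scales each column to unit length as described above. It also assumes the input rows already hold only numeric, non-class attributes, that is, the data after NominalToBinary has run and the class attribute has been set aside. The names `random_matrix` and `project` are hypothetical.

```python
import math
import random


def random_matrix(d, k, rng):
    """Draw a d x k matrix of Gaussian entries (an assumption), then scale each column to unit length."""
    cols = []
    for _ in range(k):
        col = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in col))
        cols.append([v / norm for v in col])
    # Transpose the list of columns into a row-major d x k matrix.
    return [[cols[j][i] for j in range(k)] for i in range(d)]


def project(rows, k, seed=42):
    """Map each numeric row of length d to length k by computing rows @ R."""
    d = len(rows[0])
    rng = random.Random(seed)
    r = random_matrix(d, k, rng)
    # Each output attribute j is a weighted sum of all d input attributes.
    return [[sum(x[i] * r[i][j] for i in range(d)) for j in range(k)] for x in rows]


data = [[1.0, 0.0, 2.0, 3.0, 0.5, 1.5],
        [0.0, 1.0, 1.0, 0.0, 2.0, 2.5],
        [3.0, 2.0, 0.0, 1.0, 1.0, 0.0]]
reduced = project(data, k=2)
print(len(reduced), len(reduced[0]))  # 3 2
```

Each new attribute is just a weighted sum of the old ones, so the whole reduction is a single matrix product. That is where the cost advantage over PCA comes from, since PCA needs an eigendecomposition or SVD of the data.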

For more information, see:

Dmitriy Fradkin, David Madigan: Experiments with random projections for machine learning. In: KDD '03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 517-522, 2003.

Options

The table below describes the options available for RandomProjection.

Option

Description

distribution

The distribution to use for calculating the random matrix.

Sparse1: each entry of the random matrix is sqrt(3) * { -1 with probability 1/6, 0 with probability 2/3, +1 with probability 1/6 }
Sparse2: each entry of the random matrix is { -1 with probability 1/2, +1 with probability 1/2 }

numberOfAttributes

The number of dimensions (attributes) the data should be reduced to.

percent

The percentage of dimensions (attributes) the data should be reduced to (inclusive of the class attribute). If this option is present and greater than zero, the numberOfAttributes option is ignored.

randomSeed

The seed for the random number generator used to generate the random matrix.

replaceMissingValues

If set, the filter uses weka.filters.unsupervised.attribute.ReplaceMissingValues to replace missing values.
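The two sparse distributions listed under the distribution option can be sampled as shown below. This is a minimal Python sketch, not Weka's code. The function names `sparse1_entry` and `sparse2_entry` are hypothetical. Both distributions have mean 0 and variance 1. For Sparse1 the variance is 3 * (1/6 + 1/6) = 1, which is why the sqrt(3) factor appears. Sparse1 also sets about two thirds of the entries to zero, which makes the matrix cheap to apply.

```python
import math
import random


def sparse1_entry(rng):
    """Sparse1: sqrt(3) * {-1 w.p. 1/6, 0 w.p. 2/3, +1 w.p. 1/6}."""
    u = rng.random()
    if u < 1.0 / 6.0:
        return -math.sqrt(3.0)
    if u < 5.0 / 6.0:
        return 0.0
    return math.sqrt(3.0)


def sparse2_entry(rng):
    """Sparse2: {-1 w.p. 1/2, +1 w.p. 1/2}."""
    return -1.0 if rng.random() < 0.5 else 1.0


rng = random.Random(1)
sample = [sparse1_entry(rng) for _ in range(60000)]
zero_share = sample.count(0.0) / len(sample)          # fraction of zero entries
mean_square = sum(v * v for v in sample) / len(sample)  # empirical variance (mean is 0)
print(round(zero_share, 2), round(mean_square, 2))  # close to 0.67 and 1.0
```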

Capabilities

The table below describes the capabilities of RandomProjection.

Capability

Supported

Class

Missing class values, Empty nominal class, Nominal class, String class, Date class, No class, Unary class, Relational class, Numeric class, Binary class

Attributes

Unary attributes, Date attributes, Nominal attributes, Relational attributes, String attributes, Empty nominal attributes, Missing values, Numeric attributes, Binary attributes

Min # of instances

0
