Hitachi Vantara Pentaho Community Wiki
RaceSearch

Package

weka.attributeSelection

Synopsis

Races the cross-validation error of competing attribute subsets. Use in conjunction with a ClassifierSubsetEval. RaceSearch has four modes:

Forward selection races all single-attribute additions to a base set (initially no attributes), selects the winner to become the new base set, and then iterates until there is no improvement over the base set.
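The forward selection loop can be sketched as follows. This is a minimal illustration, not the Weka implementation: the names forward_select and toy_error are hypothetical, and the race itself is replaced by evaluating every candidate in full with a caller-supplied error function.

```python
from typing import Callable, FrozenSet, Iterable

Subset = FrozenSet[str]


def forward_select(attributes: Iterable[str],
                   error: Callable[[Subset], float]) -> Subset:
    """Greedy forward selection over attribute subsets (illustrative only)."""
    attributes = list(attributes)
    base: Subset = frozenset()           # the base set initially has no attributes
    base_error = error(base)
    while True:
        # All single-attribute additions to the current base set.
        candidates = [base | {a} for a in attributes if a not in base]
        if not candidates:
            return base
        # Stand-in for the race: score every candidate in full.
        scored = [(error(c), c) for c in candidates]
        best_error, best = min(scored, key=lambda pair: pair[0])
        if best_error >= base_error:     # no improvement over the base set
            return base
        base, base_error = best, best_error


def toy_error(subset: Subset) -> float:
    """Hypothetical error: penalise missing useful attributes and extra noise."""
    useful = {"a", "b"}
    return len(useful - subset) + 0.1 * len(subset - useful)
```

With the toy error above, forward_select(["a", "b", "c"], toy_error) stops once adding "c" no longer lowers the error.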

Backward elimination is similar but the initial base set has all attributes included and races all single attribute deletions.
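A matching sketch for backward elimination, under the same assumptions: backward_eliminate and toy_error are hypothetical names, and each candidate is evaluated in full instead of being raced.

```python
def backward_eliminate(attributes, error):
    """Greedy backward elimination over attribute subsets (illustrative only)."""
    base = frozenset(attributes)         # the base set initially has all attributes
    base_error = error(base)
    while base:
        # All single-attribute deletions from the current base set.
        scored = [(error(base - {a}), base - {a}) for a in sorted(base)]
        best_error, best = min(scored, key=lambda pair: pair[0])
        if best_error >= base_error:     # no improvement over the base set
            break
        base, base_error = best, best_error
    return base


def toy_error(subset):
    """Hypothetical error: penalise missing useful attributes and extra noise."""
    useful = {"a", "b"}
    return len(useful - subset) + 0.1 * len(subset - useful)
```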

Schemata search is a bit different. In each iteration, a series of races is run in parallel. Each race in a set determines whether a particular attribute should be included or not; that is, the race is between the attribute being "in" or "out". The other attributes for this race are included or excluded randomly at each point in the evaluation. As soon as one race has a clear winner (i.e., it has been decided whether a particular attribute should be in or not), the next set of races begins, using the result of the winning race from the previous iteration as the new base set.
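One iteration of a schemata-style search might look like the sketch below. It is heavily simplified and makes several assumptions: the function name, the point_error callback, and the fixed margin used to declare a "clear winner" are all hypothetical, and the margin stands in for the t-test that RaceSearch applies.

```python
import random
from statistics import mean


def schemata_iteration(undecided, fixed_in, point_error, rng,
                       min_points=10, margin=0.2, max_points=1000):
    """Run one set of parallel in/out races (illustrative only).

    Returns (attribute, keep) for the first race with a clear winner,
    or None if no race is decided within max_points evaluation points.
    """
    samples = {a: {True: [], False: []} for a in undecided}
    for point in range(max_points):
        for a in undecided:
            for keep in (True, False):
                # Attributes already decided as "in" are always present.
                subset = set(fixed_in)
                # The other undecided attributes are in or out at random.
                subset.update(b for b in undecided
                              if b != a and rng.random() < 0.5)
                if keep:
                    subset.add(a)
                samples[a][keep].append(point_error(frozenset(subset), point))
        if point + 1 >= min_points:
            for a in undecided:
                # Positive gap: leaving the attribute out costs more error.
                gap = mean(samples[a][False]) - mean(samples[a][True])
                if abs(gap) > margin:    # assumed stand-in for the t-test
                    return a, gap > 0
    return None
```

A caller would apply the returned decision to the base set (adding the attribute to fixed_in if keep is true, and removing it from undecided either way) before starting the next set of races.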

Rank race first ranks the attributes using an attribute evaluator and then races the ranking. The competing subsets are: no attributes, the top-ranked attribute, the top two attributes, the top three attributes, and so on.
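The set of competitors in a rank race can be illustrated with a short sketch. The function names and the score dictionary are hypothetical, and it assumes that a higher evaluator score means a better attribute.

```python
def rank_attributes(scores):
    """Order attributes from best to worst by a per-attribute score."""
    return sorted(scores, key=scores.get, reverse=True)


def rank_race_candidates(ranking):
    """Nested subsets to race: no attributes, top one, top two, and so on."""
    return [tuple(ranking[:k]) for k in range(len(ranking) + 1)]
```

For n ranked attributes this yields n + 1 competing subsets, far fewer than the 2^n subsets of an exhaustive search.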

It is also possible to generate a ranked list of attributes through the forward racing process. If generateRanking is set to true, a complete forward race is run; that is, racing continues until all attributes have been selected. The order in which they are added determines a complete ranking of all the attributes.
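A complete forward pass that produces a ranking could be sketched like this. As before, forward_ranking and toy_error are hypothetical names, and full evaluation of each candidate replaces the race.

```python
def forward_ranking(attributes, error):
    """Add attributes one at a time until none remain; return the order added."""
    base = frozenset()
    remaining = list(attributes)
    ranking = []
    while remaining:
        # Pick the single addition that gives the lowest error.
        best = min(remaining, key=lambda a: error(base | {a}))
        ranking.append(best)
        remaining.remove(best)
        base = base | {best}             # continue even without improvement
    return ranking


def toy_error(subset):
    """Hypothetical error: penalise missing useful attributes and extra noise."""
    useful = {"a", "b"}
    return len(useful - subset) + 0.1 * len(subset - useful)
```

Unlike plain forward selection, the loop does not stop when the error stops improving, so every attribute receives a rank.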

Racing uses paired and unpaired t-tests on cross-validation errors of competing subsets. When there is a significant difference between the means of the errors of two competing subsets then the poorer of the two can be eliminated from the race. Similarly, if there is no significant difference between the mean errors of two competing subsets and they are within some threshold of each other, then one can be eliminated from the race.
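The elimination rule can be sketched for the paired case as follows. This is an assumption-laden illustration: the function names and return labels are hypothetical, the critical value is passed in by the caller instead of being derived from a significance level, and the unpaired test is not shown.

```python
from math import copysign, inf, sqrt
from statistics import mean, stdev


def paired_t(errors_a, errors_b):
    """Paired t statistic for per-fold errors of two competing subsets."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    if len(diffs) < 2:
        raise ValueError("need at least two paired observations")
    d_mean, d_sd = mean(diffs), stdev(diffs)
    if d_sd == 0.0:
        # All differences identical: no spread to scale by.
        return 0.0 if d_mean == 0.0 else copysign(inf, d_mean)
    return d_mean / (d_sd / sqrt(len(diffs)))


def race_step(errors_a, errors_b, critical_t, threshold):
    """Decide what to do with two competitors after the latest evaluations."""
    t = paired_t(errors_a, errors_b)
    if abs(t) > critical_t:
        # Significant difference: eliminate the subset with the higher error.
        return "drop_a" if t > 0 else "drop_b"
    if abs(mean(errors_a) - mean(errors_b)) < threshold:
        # Not significantly different and close together: one can go.
        return "drop_either"
    return "continue"
```

For example, with five paired folds a two-sided 5% test would use a critical value of about 2.776 (4 degrees of freedom).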

For more information see:

Andrew W. Moore, Mary S. Lee: Efficient Algorithms for Minimizing Cross Validation Error. In: Eleventh International Conference on Machine Learning, 190-198, 1994.

Options

The table below describes the options available for RaceSearch.

| Option | Description |
|---|---|
| attributeEvaluator | Attribute evaluator to use for generating an initial ranking. Use in conjunction with a rank race. |
| debug | Turn on verbose output for monitoring the search's progress. |
| foldsType | Set the number of folds to use for cross-validation error estimation; leave-one-out is selected automatically for schemata search. |
| generateRanking | Use the racing process to generate a ranked list of attributes. Using this mode forces the race to be a forward type and then races until all attributes have been added, thus giving a ranked list. |
| numToSelect | Specify the number of attributes to retain. Use in conjunction with generateRanking. The default value (-1) indicates that all attributes are to be retained. Use either this option or a threshold to reduce the attribute set. |
| raceType | Set the type of search (forward selection, backward elimination, schemata search, or rank race). |
| selectionThreshold | Set the threshold by which attributes can be discarded. The default value results in no attributes being discarded. Use in conjunction with generateRanking. |
| significanceLevel | Set the significance level to use for t-test comparisons. |
| threshold | Set the error threshold by which to consider two subsets equivalent. |
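How numToSelect and selectionThreshold might reduce a ranked list can be sketched as below. This is not the Weka code: trim_ranking is a hypothetical name, and it assumes each ranked attribute carries a merit value where higher is better and the threshold acts as a lower bound.

```python
def trim_ranking(ranked, num_to_select=-1, selection_threshold=float("-inf")):
    """Reduce a ranked list of (attribute, merit) pairs, best first.

    num_to_select=-1 keeps everything; the default threshold discards nothing.
    """
    # Assumed semantics: discard attributes whose merit is below the threshold.
    kept = [(attr, merit) for attr, merit in ranked
            if merit >= selection_threshold]
    if num_to_select >= 0:
        kept = kept[:num_to_select]
    return [attr for attr, _ in kept]
```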
