Hitachi Vantara Pentaho Community Wiki
InterquartileRange

Package

weka.filters.unsupervised.attribute

Synopsis

A filter for detecting outliers and extreme values based on interquartile ranges. The filter skips the class attribute.

Outliers:
Q3 + OF*IQR < x <= Q3 + EVF*IQR
or
Q1 - EVF*IQR <= x < Q1 - OF*IQR

Extreme values:
x > Q3 + EVF*IQR
or
x < Q1 - EVF*IQR

Key:
Q1 = first quartile (25th percentile)
Q3 = third quartile (75th percentile)
IQR = Interquartile Range, Q3 - Q1
OF = Outlier Factor
EVF = Extreme Value Factor
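
The inequalities above can be sketched in plain Java. This is an illustration only, not Weka's implementation: the class name IqrLabelSketch, the Label enum, and the quartile and factor values in main are all hypothetical, and the sketch assumes EVF >= OF so that the extreme-value band lies outside the outlier band.

```java
// Illustrative sketch only; not taken from Weka's source code.
final class IqrLabelSketch {

    // Hypothetical labels; the real filter writes Outlier/ExtremeValue attributes instead.
    enum Label { NORMAL, OUTLIER, EXTREME }

    // Classify x with the inequalities from the Synopsis (assumes evf >= of).
    static Label classify(double x, double q1, double q3, double of, double evf) {
        double iqr = q3 - q1;
        // Extreme values: x > Q3 + EVF*IQR  or  x < Q1 - EVF*IQR
        if (x > q3 + evf * iqr || x < q1 - evf * iqr) {
            return Label.EXTREME;
        }
        // Outliers: beyond the OF thresholds but not beyond the EVF thresholds
        if (x > q3 + of * iqr || x < q1 - of * iqr) {
            return Label.OUTLIER;
        }
        return Label.NORMAL;
    }

    public static void main(String[] args) {
        double q1 = 10.0, q3 = 20.0;  // hypothetical quartiles, so IQR = 10
        double of = 3.0, evf = 6.0;   // hypothetical factor values chosen for the example
        for (double x : new double[] {15.0, 55.0, 85.0, -25.0, -55.0}) {
            System.out.println(x + " -> " + classify(x, q1, q3, of, evf));
        }
    }
}
```

With these example values the outlier thresholds are -20 and 50 and the extreme-value thresholds are -50 and 80. A value exactly on an extreme-value threshold (80 or -50) is still an outlier, and a value exactly on an outlier threshold (50 or -20) is neither.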

Options

The table below describes the options available for InterquartileRange.

| Option | Description |
| --- | --- |
| attributeIndices | The range of attributes to act on, given as a comma-separated list of attribute indices. "first" and "last" are valid values, and "-" specifies an inclusive range, e.g. "first-3,5,6-10,last". |
| debug | Turns on output of debugging information. |
| detectionPerAttribute | Generates an Outlier/ExtremeValue attribute pair for each numeric attribute, rather than a single pair for all numeric attributes together. |
| extremeValuesAsOutliers | Whether to also tag extreme values as outliers. |
| extremeValuesFactor | The factor (EVF) used to determine the thresholds for extreme values. |
| outlierFactor | The factor (OF) used to determine the thresholds for outliers. |
| outputOffsetMultiplier | Generates an additional attribute 'Offset' containing the multiplier by which the value is offset from the median, measured in IQRs: value = median + multiplier * IQR. |
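
As a rough illustration of two of these options, the sketch below rearranges the outputOffsetMultiplier relation (value = median + multiplier * IQR) to solve for the multiplier, and shows one plausible reading of extremeValuesAsOutliers. The class and method names are hypothetical and the code is not Weka's implementation. In particular, it does not treat an IQR of zero specially, and how the filter itself handles that case is not covered here.

```java
// Illustrative sketch only; names are hypothetical and not part of the Weka API.
final class IqrOptionsSketch {

    // Solve value = median + multiplier * IQR for the multiplier.
    // No special handling for iqr == 0 (the division then yields Infinity or NaN).
    static double offsetMultiplier(double value, double median, double iqr) {
        return (value - median) / iqr;
    }

    // Assumed reading of extremeValuesAsOutliers: when enabled, an extreme value
    // is flagged as an outlier as well as an extreme value.
    static boolean outlierFlag(boolean isOutlier, boolean isExtreme,
                               boolean extremeValuesAsOutliers) {
        return isOutlier || (extremeValuesAsOutliers && isExtreme);
    }

    public static void main(String[] args) {
        double median = 15.0, iqr = 10.0;  // hypothetical statistics for one attribute
        System.out.println(offsetMultiplier(35.0, median, iqr));  // two IQRs above the median
        System.out.println(offsetMultiplier(12.5, median, iqr));  // a quarter IQR below the median
        System.out.println(outlierFlag(false, true, true));
        System.out.println(outlierFlag(false, true, false));
    }
}
```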

Capabilities

The table below describes the capabilities of InterquartileRange.

| Capability | Supported |
| --- | --- |
| Class | Unary class, Relational class, Date class, Missing class values, Numeric class, No class, String class, Empty nominal class, Binary class, Nominal class |
| Attributes | Missing values, Nominal attributes, String attributes, Empty nominal attributes, Relational attributes, Binary attributes, Date attributes, Unary attributes, Numeric attributes |
| Min # of instances | 0 |
