Hitachi Vantara Pentaho Community Wiki
FPGrowth

Package

weka.associations

Synopsis

Class implementing the FP-growth algorithm for finding large item sets without candidate generation. Iteratively reduces the minimum support until it finds the required number of rules with the given minimum metric. For more information see:

J. Han, J. Pei, Y. Yin: Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM-SIGMOD International Conference on Management of Data, 1-12, 2000.
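The iterative support reduction described in the synopsis can be illustrated with a minimal sketch. This is not Weka's implementation: a brute-force item set search stands in for the FP-growth tree, confidence is the only metric, and the function names are hypothetical. It assumes that support is lowered by subtracting delta at each step and that a rule qualifies only when its score is strictly higher than the minimum metric, following the wording of the option descriptions on this page.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force stand-in for FP-growth: map each frequent item set to its support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count / n >= min_support:
                result[frozenset(candidate)] = count / n
    return result


def rules_from(itemsets, min_metric):
    """Generate (premise, consequence, confidence) rules scoring above min_metric."""
    rules = []
    for itemset, support in itemsets.items():
        for size in range(1, len(itemset)):
            for premise in combinations(sorted(itemset), size):
                premise = frozenset(premise)
                # Every subset of a frequent item set is itself frequent,
                # so the premise is guaranteed to be present in `itemsets`.
                confidence = support / itemsets[premise]
                if confidence > min_metric:
                    rules.append((premise, itemset - premise, confidence))
    rules.sort(key=lambda rule: -rule[2])
    return rules


def mine_with_support_reduction(transactions, num_rules, min_metric,
                                upper=1.0, lower=0.1, delta=0.05):
    """Lower the support threshold step by step (assumed subtractive) until
    enough rules are found or the next step would fall below the lower bound."""
    support = upper
    while True:
        rules = rules_from(frequent_itemsets(transactions, support), min_metric)
        next_support = round(support - delta, 10)
        if len(rules) >= num_rules or next_support < lower:
            return support, rules[:num_rules]
        support = next_support


TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

final_support, found = mine_with_support_reduction(
    TRANSACTIONS, num_rules=1, min_metric=0.9, upper=1.0, lower=0.25, delta=0.25)
print(final_support, found)
```

In this toy data set no rule qualifies at support 1.0 or 0.75. The search stops at 0.5, where the rule butter => bread reaches confidence 1.0.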

Options

The table below describes the options available for FPGrowth.

Option

Description

delta

Iteratively decrease support by this factor. Support is reduced until the lower bound on minimum support is reached or the required number of rules has been generated.

findAllRulesForSupportLevel

Find all rules that meet the lower bound on minimum support and the minimum metric constraint. Turning this mode on will disable the iterative support reduction procedure to find the specified number of rules.

lowerBoundMinSupport

Lower bound for minimum support.

maxNumberOfItems

The maximum number of items to include in frequent item sets. -1 means no limit.

metricType

Set the type of metric by which to rank rules. Confidence is the proportion of the examples covered by the premise that are also covered by the consequence (class association rules can only be mined using confidence). Lift is confidence divided by the proportion of all examples that are covered by the consequence. This is a measure of the importance of the association that is independent of support. Leverage is the proportion of additional examples covered by both the premise and the consequence above those expected if the premise and consequence were independent of each other. The total number of examples that this represents is presented in brackets following the leverage. Conviction is another measure of departure from independence.

minMetric

Minimum metric score. Consider only rules with scores higher than this value.

numRulesToFind

The number of rules to output.

positiveIndex

Set the index of binary valued attributes that is to be considered the positive index. Has no effect for sparse data (in this case the first index, i.e. non-zero values, is always treated as positive). Also has no effect for unary valued attributes (i.e. when using the Weka Apriori-style format for market basket data, which uses the missing value "?" to indicate absence of an item).

upperBoundMinSupport

Upper bound for minimum support. Start iteratively decreasing minimum support from this value.
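The four metrics named under metricType can be worked through from raw counts. The sketch below is illustrative only: the function name and the example counts are hypothetical, and the conviction formula P(premise) * P(not consequence) / P(premise and not consequence) is the commonly used definition rather than something stated on this page.

```python
def rule_metrics(n_total, n_premise, n_consequence, n_both):
    """Compute rule metrics from example counts.

    n_total:       number of examples
    n_premise:     examples covered by the premise
    n_consequence: examples covered by the consequence
    n_both:        examples covered by both premise and consequence
    """
    p_premise = n_premise / n_total
    p_consequence = n_consequence / n_total
    p_both = n_both / n_total

    confidence = n_both / n_premise
    lift = confidence / p_consequence
    # Coverage above what independence of premise and consequence would predict.
    leverage = p_both - p_premise * p_consequence
    # Assumed standard definition; infinite when the rule never fails.
    n_premise_only = n_premise - n_both
    if n_premise_only == 0:
        conviction = float("inf")
    else:
        conviction = p_premise * (1 - p_consequence) / (n_premise_only / n_total)

    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        # The number of examples the leverage represents (shown in brackets).
        "leverage_examples": leverage * n_total,
        "conviction": conviction,
    }


metrics = rule_metrics(n_total=100, n_premise=40, n_consequence=50, n_both=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For these counts the arithmetic gives a confidence of 0.75, a lift of 1.5, a leverage of 0.1 (10 examples) and a conviction of 2.0.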

Capabilities

The table below describes the capabilities of FPGrowth.

Capability

Supported

Class

No class

Attributes

Unary attributes, Missing values, Empty nominal attributes, Binary attributes

Min # of instances

1
