Hitachi Vantara Pentaho Community Wiki

SGDText

Package

weka.classifiers.functions

Synopsis

Implements stochastic gradient descent for learning a linear binary-class SVM or a binary-class logistic regression model on text data. Operates directly on String attributes. Available from Weka 3.7.5.
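
As a rough illustration of the idea in the synopsis, the sketch below trains a linear model on raw strings with stochastic gradient descent, using either hinge loss or log loss. It is not Weka's source code: the class `SgdTextSketch`, its method names, the whitespace tokenizer, the binary bag-of-words representation, and the exact form of the regularization step are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of SGD on text; NOT the Weka SGDText implementation. */
class SgdTextSketch {
    enum Loss { HINGE, LOG }

    private final Map<String, Double> weights = new HashMap<>();
    private double bias = 0.0;
    private final Loss loss;
    private final double learningRate;
    private final double lambda;

    SgdTextSketch(Loss loss, double learningRate, double lambda) {
        this.loss = loss;
        this.learningRate = learningRate;
        this.lambda = lambda;
    }

    /** Assumed tokenizer: lowercased whitespace tokens as a binary bag of words. */
    static Set<String> tokens(String doc) {
        Set<String> bag = new HashSet<>();
        for (String t : doc.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) bag.add(t);
        }
        return bag;
    }

    /** Linear score: bias plus the weight of every distinct token in the document. */
    double score(String doc) {
        double s = bias;
        for (String t : tokens(doc)) s += weights.getOrDefault(t, 0.0);
        return s;
    }

    /** One SGD step on a single document; y must be +1 or -1. */
    void update(String doc, int y) {
        double z = y * score(doc);
        // L2 regularization (assumed form): shrink every weight toward zero.
        weights.replaceAll((word, w) -> w * (1.0 - learningRate * lambda));
        // Magnitude of the loss gradient with respect to the score.
        double factor = (loss == Loss.HINGE)
                ? (z < 1.0 ? 1.0 : 0.0)        // hinge: update only inside the margin
                : 1.0 / (1.0 + Math.exp(z));   // log loss: always a (possibly tiny) update
        if (factor == 0.0) return;
        double step = learningRate * y * factor;
        for (String t : tokens(doc)) weights.merge(t, step, Double::sum);
        bias += step;
    }

    /** Batch training; returns the number of updates, which is epochs * docs.length. */
    int train(String[] docs, int[] labels, int epochs) {
        int iterations = 0;
        for (int e = 0; e < epochs; e++) {
            for (int i = 0; i < docs.length; i++) {
                update(docs[i], labels[i]);
                iterations++;
            }
        }
        return iterations;
    }

    /** Predicted class: +1 or -1. */
    int predict(String doc) {
        return score(doc) >= 0 ? 1 : -1;
    }
}
```

For example, training with `new SgdTextSketch(SgdTextSketch.Loss.HINGE, 0.1, 0.0001)` on four short documents for 50 epochs performs 200 updates. Keeping the weights in a map keyed by token is what lets a model of this kind work on strings directly, without a separate string-to-vector filtering step.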

Options

The table below describes the options available for SGDText.

| Option | Description |
|---|---|
| LNorm | The L-norm (the p in the L-p norm) to use for document length normalization. |
| debug | If set to true, the classifier may output additional information to the console. |
| epochs | The number of epochs to perform (batch learning). The total number of iterations is epochs * number of training instances. |
| lambda | The regularization constant. (default = 0.0001) |
| learningRate | The learning rate. |
| lossFunction | The loss function to use: hinge loss (SVM) or log loss (logistic regression). |
| lowercaseTokens | Whether to convert all tokens to lowercase. |
| minWordFrequency | Ignore any words that occur fewer than this many times in the training data. If periodic pruning is turned on, the dictionary is pruned according to this value. |
| norm | The norm that each instance has after normalization. |
| periodicPruning | How often (in number of instances) to prune low-frequency terms from the dictionary. 0 means no pruning; a positive integer n means prune after every n instances. |
| seed | The random number seed to be used. |
| stemmer | The stemming algorithm to use on the words. |
| stopwords | The file containing the stopwords (if this is a directory, the default stopwords are used). |
| tokenizer | The tokenizing algorithm to use on the strings. |
| useStopList | If true, ignores all words that are on the stoplist. |
| useWordFrequencies | Use word frequencies rather than a binary bag-of-words representation. |
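
To make the text-handling options more concrete, the following sketch shows one plausible reading of lowercaseTokens, minWordFrequency, periodicPruning, useWordFrequencies, LNorm and norm. Everything in it is an assumption for illustration: the class `TextOptionsSketch`, its methods, and the whitespace tokenizer are hypothetical, and Weka's own implementation may differ in detail.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of the text options; NOT Weka code. */
class TextOptionsSketch {

    /** Assumed tokenizer: split on whitespace, optionally lowercasing first. */
    static String[] tokenize(String doc, boolean lowercaseTokens) {
        String text = (lowercaseTokens ? doc.toLowerCase() : doc).trim();
        return text.isEmpty() ? new String[0] : text.split("\\s+");
    }

    /** Builds a dictionary of word counts over all documents. */
    static Map<String, Integer> countWords(String[] docs, boolean lowercaseTokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            for (String t : tokenize(doc, lowercaseTokens)) {
                counts.merge(t, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Removes words seen fewer than minWordFrequency times. */
    static void prune(Map<String, Integer> dictionary, int minWordFrequency) {
        dictionary.values().removeIf(count -> count < minWordFrequency);
    }

    /** True when pruning is due: 0 disables it, n > 0 triggers after every n instances. */
    static boolean shouldPrune(int instancesSeen, int periodicPruning) {
        return periodicPruning > 0 && instancesSeen % periodicPruning == 0;
    }

    /** Document vector over dictionary words: counts, or 1.0 for mere presence. */
    static Map<String, Double> vectorize(String doc, Set<String> dictionary,
                                         boolean lowercaseTokens, boolean useWordFrequencies) {
        Map<String, Double> vector = new HashMap<>();
        for (String t : tokenize(doc, lowercaseTokens)) {
            if (!dictionary.contains(t)) continue;   // word was pruned or never seen
            if (useWordFrequencies) {
                vector.merge(t, 1.0, Double::sum);
            } else {
                vector.put(t, 1.0);
            }
        }
        return vector;
    }

    /** Rescales the vector so that its L-p norm (p = lnorm) equals norm. */
    static void normalize(Map<String, Double> vector, double lnorm, double norm) {
        double sum = 0.0;
        for (double v : vector.values()) sum += Math.pow(Math.abs(v), lnorm);
        double length = Math.pow(sum, 1.0 / lnorm);
        if (length == 0.0) return;                   // empty document: nothing to scale
        vector.replaceAll((word, v) -> v * norm / length);
    }
}
```

With `lnorm = 2.0` and `norm = 1.0`, the document "the cat the dog" with "dog" absent from the dictionary becomes the count vector (the = 2, cat = 1), which is then divided by the square root of 5 so that its Euclidean length is 1.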

Capabilities

The table below describes the capabilities of SGDText.

| Capability | Supported |
|---|---|
| Class | Binary class, Missing class values |
| Attributes | String attributes, Missing values |
| Minimum number of instances | 0 |
