Multinomial naive Bayes for text data. Operates directly (and only) on String attributes. Other types of input attributes are accepted but ignored during training and classification.
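
To make the description above concrete, here is a minimal sketch of multinomial naive Bayes applied directly to raw strings. It is not the source of NaiveBayesMultinomialText: the class name TinyMultinomialTextNB, the simple lowercase tokenizer, and the add-one (Laplace) smoothing are all assumptions chosen for illustration, and the real classifier's tokenization, smoothing, and pruning may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; hypothetical class, not Weka code.
class TinyMultinomialTextNB {
    private final Map<String, Map<String, Double>> wordCounts = new HashMap<>(); // class -> word -> count
    private final Map<String, Double> totalWords = new HashMap<>();              // class -> total word count
    private final Map<String, Integer> docCounts = new HashMap<>();              // class -> document count
    private final Set<String> vocabulary = new HashSet<>();
    private int numDocs = 0;

    // Assumption: lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Incrementally adds one labeled document to the model.
    void update(String text, String label) {
        numDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Double> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            counts.merge(token, 1.0, Double::sum);
            totalWords.merge(label, 1.0, Double::sum);
            vocabulary.add(token);
        }
    }

    // Returns the class with the highest log posterior, or null if nothing was trained.
    String classify(String text) {
        List<String> tokens = tokenize(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log(docCounts.get(label) / (double) numDocs); // log prior
            Map<String, Double> counts = wordCounts.get(label);
            double denom = totalWords.getOrDefault(label, 0.0) + vocabulary.size();
            for (String token : tokens) {
                if (!vocabulary.contains(token)) {
                    continue; // words never seen in training are skipped
                }
                // Add-one smoothed log likelihood of the word given the class.
                score += Math.log((counts.getOrDefault(token, 0.0) + 1.0) / denom);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TinyMultinomialTextNB nb = new TinyMultinomialTextNB();
        nb.update("cheap pills buy now", "spam");
        nb.update("lunch meeting tomorrow", "ham");
        System.out.println(nb.classify("buy cheap pills")); // spam
    }
}
```

The sketch only ever looks at the text and the class label, which mirrors the statement that non-String attributes are ignored during training and classification.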


The table below describes the options available for NaiveBayesMultinomialText.

Option Description
LNorm The LNorm to use for document length normalization.
debug If set to true, classifier may output additional info to the console.
lowercaseTokens Whether to convert all tokens to lowercase.
minWordFrequency Ignore any words that occur fewer than minWordFrequency times in the training data. If periodic pruning is turned on, the dictionary is pruned according to this value.
norm The norm of the instances after normalization.
normalizeDocLength If true, document length is normalized according to the settings for norm and LNorm.
periodicPruning How often (in number of instances) to prune low-frequency terms from the dictionary. 0 means never prune. A positive integer n means prune after every n instances.
stemmer The stemming algorithm to use on the words.
stopwords The file containing the stopwords (if this is a directory then the default ones are used).
tokenizer The tokenizing algorithm to use on the strings.
useStopList If true, ignores all words that are on the stoplist.
useWordFrequencies Use word frequencies rather than a binary bag-of-words representation.
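
The following sketch shows one plausible reading of how LNorm, norm, minWordFrequency, and periodicPruning interact. It is an assumption based only on the option descriptions above, not on the classifier's source: the class DocLengthSketch and its methods normalize, prune, and shouldPrune are hypothetical names, and the real implementation may apply these steps differently (for example, per class rather than over one dictionary).

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; hypothetical helper, not Weka code.
class DocLengthSketch {

    // Rescales a document's word-count vector so that its L-lnorm length equals norm.
    // With lnorm = 2 and norm = 1 this is ordinary Euclidean unit-length scaling.
    static Map<String, Double> normalize(Map<String, Double> counts, double lnorm, double norm) {
        double sum = 0.0;
        for (double v : counts.values()) {
            sum += Math.pow(Math.abs(v), lnorm);
        }
        double docNorm = Math.pow(sum, 1.0 / lnorm);
        Map<String, Double> scaled = new HashMap<>();
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            // Guard against empty or all-zero documents.
            scaled.put(e.getKey(), docNorm == 0.0 ? 0.0 : e.getValue() * norm / docNorm);
        }
        return scaled;
    }

    // Drops every dictionary entry whose accumulated count is below minWordFrequency.
    static void prune(Map<String, Double> dictionary, double minWordFrequency) {
        dictionary.values().removeIf(count -> count < minWordFrequency);
    }

    // periodicPruning = 0 disables pruning; n > 0 prunes after every n instances.
    static boolean shouldPrune(long instancesSeen, int periodicPruning) {
        return periodicPruning > 0 && instancesSeen % periodicPruning == 0;
    }

    public static void main(String[] args) {
        Map<String, Double> doc = new HashMap<>();
        doc.put("naive", 3.0);
        doc.put("bayes", 4.0);
        System.out.println(normalize(doc, 2.0, 1.0));

        Map<String, Double> dictionary = new HashMap<>();
        dictionary.put("the", 5.0);
        dictionary.put("rare", 1.0);
        if (shouldPrune(6, 3)) {
            prune(dictionary, 2.0);
        }
        System.out.println(dictionary.keySet());
    }
}
```

Under this reading, a small periodicPruning value keeps the dictionary compact on long streams, at the cost of possibly discarding words that would have reached minWordFrequency later.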


The table below describes the capabilities of NaiveBayesMultinomialText.

Capability Supported
Class Nominal class, Binary class, Missing class values
Attributes Numeric attributes, Missing values, Date attributes, Unary attributes, Empty nominal attributes, Binary attributes, Nominal attributes, String attributes
Min # of instances 0

