nlp:add-one-smoothing-performance [CS Wiki]

How does add-one smoothing affect performance in a multinomial naive Bayes text classifier?

PLEASE NOTE BEFORE READING: This document is not complete yet. I may have implemented the DDJ (Dr. Dobb's Journal) naive Bayes classifier incorrectly. This document offers no scientific data or conclusions; it only documents an ad hoc experiment.

In this experiment, add-one smoothing gave much higher accuracy than a poor man's smoothing mechanism: about 0.95 versus 0.56 on a single data set.

I compared the naive Bayes implementation published in the May 2005 issue of Dr. Dobb's Journal (DDJ) with a simplified version of naive Bayes from (McCallum, 1998). The (McCallum, 1998) version reached about 1.7 times the accuracy of the DDJ version (0.95 versus 0.56). The results below come from a single data set.

(McCallum, 1998) uses Laplace smoothing, also known as add-one smoothing. DDJ uses what I call a poor man's smoothing mechanism. Smoothing is necessary when a token encountered during classification does not occur in the training data: without it, the token's estimated probability is zero, which zeroes out the whole product of token probabilities for that class. Poor man's smoothing handles this by substituting a hard-coded, very low probability. Both approaches assign a low probability to this event, but add-one smoothing is better founded statistically: it adds one to every token count, so the estimates for each class still sum to one.
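
The difference between the two mechanisms can be sketched in a few lines of Python. This is a minimal illustration, not the code from either (McCallum, 1998) or DDJ: the function names, the toy token counts, and the floor value 1e-9 are all assumptions made up for the example.

```python
import math
from collections import Counter

# Hypothetical token counts for one class, plus a vocabulary containing
# one token ("church") that was never seen with this class in training.
class_counts = Counter({"vote": 3, "election": 1})
vocabulary = ["vote", "election", "church"]


def add_one_log_prob(token, counts, vocab_size):
    """log P(token | class) with add-one (Laplace) smoothing."""
    total = sum(counts.values())
    return math.log((counts[token] + 1) / (total + vocab_size))


def floor_log_prob(token, counts, floor=1e-9):
    """log P(token | class) from raw relative frequency, falling back to a
    hard-coded floor for unseen tokens (the floor is an assumed value)."""
    total = sum(counts.values())
    if counts[token] == 0:
        return math.log(floor)
    return math.log(counts[token] / total)


# Sum the estimated probabilities over the whole vocabulary.
add_one_total = sum(
    math.exp(add_one_log_prob(t, class_counts, len(vocabulary)))
    for t in vocabulary
)
floor_total = sum(
    math.exp(floor_log_prob(t, class_counts)) for t in vocabulary
)

print(add_one_total)  # sums to 1, up to floating-point rounding
print(floor_total)    # slightly more than 1
```

With add-one smoothing the unseen token receives probability 1/7 in this toy example and the three estimates sum to one. With the fixed floor, the seen tokens keep their unsmoothed estimates, so the total slightly exceeds one and the size of the penalty for an unseen token depends entirely on the constant that was chosen.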

(McCallum, 1998) version:

         GVOTE                  GREL                   GENT
GVOTE    1.0                    0.0                    0.0
GREL     0.1111111111111111     0.8518518518518519     0.037037037037037035
GENT     0.027777777777777776   0.027777777777777776   0.9444444444444444
Accuracy: 0.9523809523809523

Dr. Dobb's Journal version:

         GVOTE                  GREL                   GENT
GVOTE    1.0                    0.0                    0.0
GREL     0.8518518518518519     0.14814814814814814    0.0
GENT     0.8888888888888888     0.0                    0.1111111111111111
Accuracy: 0.5634920634920635

My initial reaction is that the DDJ version would perform better with a larger training set. The confusion matrix shows that DDJ assigns most of the dev documents the GVOTE label, and GVOTE is the most frequent label by roughly a factor of two. No conclusion can be drawn until the classifiers are given more training data.
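
The reported accuracies and the majority-label observation can be cross-checked with a short Python sketch. It assumes that the rows of each matrix are true labels and the columns are predicted labels, and it assumes dev-set class sizes of 63, 27, and 36 documents. Those sizes are not stated anywhere in this document; they are inferred from the fractions in the matrices (27ths and 36ths) and from the reported accuracies, so treat them as a guess.

```python
# Row-normalized confusion matrices copied from the results above.
# ASSUMPTION: rows are true labels, columns are predicted labels.
LABELS = ["GVOTE", "GREL", "GENT"]
MCCALLUM = [
    [1.0, 0.0, 0.0],
    [0.1111111111111111, 0.8518518518518519, 0.037037037037037035],
    [0.027777777777777776, 0.027777777777777776, 0.9444444444444444],
]
DDJ = [
    [1.0, 0.0, 0.0],
    [0.8518518518518519, 0.14814814814814814, 0.0],
    [0.8888888888888888, 0.0, 0.1111111111111111],
]

# ASSUMPTION: class sizes inferred from the fractions and the reported
# accuracies; they are not given in the original write-up.
CLASS_SIZES = [63, 27, 36]


def accuracy(matrix, class_sizes):
    """Overall accuracy: per-class recall weighted by class size."""
    correct = sum(matrix[i][i] * n for i, n in enumerate(class_sizes))
    return correct / sum(class_sizes)


def majority_baseline(class_sizes):
    """Accuracy of always predicting the most frequent label."""
    return max(class_sizes) / sum(class_sizes)


print(accuracy(MCCALLUM, CLASS_SIZES))  # ~0.9524, matches the reported value
print(accuracy(DDJ, CLASS_SIZES))       # ~0.5635, matches the reported value
print(majority_baseline(CLASS_SIZES))   # 0.5
```

Under those assumptions, always predicting GVOTE would score 0.5, so the DDJ result of about 0.56 is only slightly above that baseline.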

References

McCallum, A. and Nigam, K. “A Comparison of Event Models for Naive Bayes Text Classification”. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41-48. Technical Report WS-98-05. AAAI Press. 1998.

nlp/add-one-smoothing-performance.txt · Last modified: 2015/04/23 15:41 by ryancha
