How does Add-one smoothing affect performance in a multinomial naive Bayes text classifier?

PLEASE NOTE BEFORE READING: this document is not complete. I may have implemented the DDJ naive Bayes incorrectly, and this document offers no scientific data or conclusions. I'm just documenting an ad hoc experiment.

Compared to a poor man's smoothing mechanism, Add-one smoothing increases classification performance significantly.
I compared the naive Bayes implementation from the May 2005 issue of Dr. Dobb's Journal (DDJ) with a simplified multinomial naive Bayes following (McCallum, 1998). The (McCallum, 1998) version significantly outperformed the DDJ version, reaching nearly twice its accuracy. Here are the results on a single data set.

(McCallum, 1998) uses Laplace smoothing, also known as Add-one smoothing. DDJ uses what I call a poor man's smoothing mechanism. Smoothing is necessary when a token encountered during classification does not occur in the training data: without it, that token's zero probability would zero out the score for the whole document. Poor man's smoothing simply substitutes a hard-coded, very low probability in that case. Both Add-one smoothing and poor man's smoothing assign low probabilities to unseen tokens, but Add-one smoothing is more statistically principled.

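To make the difference concrete, here is a minimal sketch of the two schemes inside a multinomial naive Bayes likelihood. This is not the DDJ or McCallum code; the toy counts, vocabulary size, and the 1e-7 floor are illustrative assumptions.

```python
import math
from collections import Counter

def log_prob(tokens, class_counts, vocab_size, add_one=True, floor=1e-7):
    """Log P(tokens | class) under a multinomial model, with either
    Add-one smoothing or a hard-coded "poor man's" floor probability."""
    total = sum(class_counts.values())
    logp = 0.0
    for t in tokens:
        c = class_counts[t]  # Counter returns 0 for unseen tokens
        if add_one:
            # Laplace / Add-one: pretend every vocabulary word was
            # seen once more than it actually was.
            p = (c + 1) / (total + vocab_size)
        else:
            # Poor man's smoothing: raw MLE, with a hard-coded tiny
            # probability substituted for unseen tokens.
            p = c / total if c > 0 else floor
        logp += math.log(p)
    return logp

train = Counter({"vote": 5, "election": 3})  # toy class counts
print(log_prob(["vote", "senate"], train, vocab_size=10, add_one=True))
print(log_prob(["vote", "senate"], train, vocab_size=10, add_one=False))
```

The hard-coded floor makes a single unseen token dominate the score (log 1e-7 is a large penalty), whereas Add-one spreads a small amount of probability mass over the whole vocabulary.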
(McCallum, 1998) version:
<pre>
      GVOTE                 GREL                  GENT
GVOTE 1.0                   0.0                   0.0
GREL  0.1111111111111111    0.8518518518518519    0.037037037037037035
GENT  0.027777777777777776  0.027777777777777776  0.9444444444444444
Accuracy: 0.9523809523809523
</pre>

Dr. Dobb's Journal version:
<pre>
      GVOTE                 GREL                  GENT
GVOTE 1.0                   0.0                   0.0
GREL  0.8518518518518519    0.14814814814814814   0.0
GENT  0.8888888888888888    0.0                   0.1111111111111111
Accuracy: 0.5634920634920635
</pre>

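As a sanity check, the reported accuracies are consistent with the confusion matrices if the dev set contained 63 GVOTE, 27 GREL, and 36 GENT documents. Those class sizes are my inference from the row fractions (e.g. 0.037 ≈ 1/27, 0.0277 ≈ 1/36), not values reported above:

```python
# Accuracy is total correct over total documents; per-class correct counts
# come from diagonal fraction * assumed class size (63 / 27 / 36).
sizes = {"GVOTE": 63, "GREL": 27, "GENT": 36}
correct_mccallum = {"GVOTE": 63, "GREL": 23, "GENT": 34}  # 23/27 ≈ 0.852, 34/36 ≈ 0.944
correct_ddj = {"GVOTE": 63, "GREL": 4, "GENT": 4}         # 4/27 ≈ 0.148, 4/36 ≈ 0.111

total = sum(sizes.values())  # 126
print(sum(correct_mccallum.values()) / total)  # ≈ 0.9524, matching the reported accuracy
print(sum(correct_ddj.values()) / total)       # ≈ 0.5635, matching the reported accuracy
```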
My initial reaction is that the DDJ version may perform better with a larger training set. The confusion matrix shows that DDJ assigns the GVOTE label to the majority of the dev documents, and GVOTE is the most frequent label by roughly a factor of two, so its accuracy is propped up by the majority class. No conclusion can be made until the classifiers are given more training data.

==References==
* McCallum, A. and Nigam, K. "A Comparison of Event Models for Naive Bayes Text Classification". In AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41-48. Technical Report WS-98-05. AAAI Press. 1998. (available online: [http://www.kamalnigam.com/papers/multinomial-aaaiws98.pdf PDF]).
nlp/add-one-smoothing-performance.txt · Last modified: 2015/04/23 21:41 by ryancha