Mark's Notes

These are my notes from McCallumNigam_NaiveBayes-aaaiws98.pdf (://faculty.cs.byu.edu/~ringger/papers/McCallumNigam_NaiveBayes-aaaiws98.pdf).

Multi-variate Bernoulli Event Model

Multinomial Event Model

Bayesian Learning Framework

Assume that there is a finite number of generative classes and that one could gain access to all documents from each class. In other words, assume that one could gain access to every document ever written and that these documents represent every possible generative class. Could a clustering technique applied to this corpus discover the generative classes? Perhaps the full corpus is unnecessary, and a substantial number of documents per generative class (far fewer than all of them) would be enough.

Consider a document selected at random from the corpus of all documents ever written. Treat the selection as a Bernoulli trial in which success denotes that the document comes from a generative class of interest, with that class specified as a parameter of the trial. Is this feasible?
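To make the question concrete, here is a minimal Python sketch of the Bernoulli trial described above. It rests on a strong assumption that the notes do not grant: every document already carries a known class label, which is exactly what would be unavailable in the clustering setting. The names bernoulli_trial, estimate_class_prior, and toy_corpus are hypothetical, and the five-document corpus is made-up illustration data, not anything from the paper.

```python
import random


def bernoulli_trial(corpus, target_class, rng):
    """Draw one document uniformly at random; return 1 (success) if it
    belongs to target_class, otherwise 0 (failure)."""
    _text, label = rng.choice(corpus)
    return 1 if label == target_class else 0


def estimate_class_prior(corpus, target_class, n_trials, seed=0):
    """Repeat the trial n_trials times and return the fraction of successes,
    an empirical estimate of the probability of drawing target_class."""
    rng = random.Random(seed)
    successes = sum(
        bernoulli_trial(corpus, target_class, rng) for _ in range(n_trials)
    )
    return successes / n_trials


# Hypothetical toy corpus of (text, class label) pairs.
# ASSUMPTION: labels are known in advance, which the notes do not guarantee.
toy_corpus = [
    ("stocks fell sharply", "finance"),
    ("the team won the match", "sports"),
    ("bond yields rose", "finance"),
    ("a late goal decided the game", "sports"),
    ("new striker signed", "sports"),
]

# Two of the five documents are "finance", so the estimate should be near 0.4.
estimate = estimate_class_prior(toy_corpus, "finance", n_trials=10_000)
print(f"estimated P(finance) = {estimate:.3f}")
```

With labels in hand the trial is easy to run, and the success frequency approaches the share of the corpus that belongs to the class. The feasibility question in the notes is therefore mostly about where the labels come from, since without them the outcome of a single trial cannot be observed.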