Week 1: Text Classification with Naive Bayes

  • “A Comparison of Event Models for Naive Bayes Text Classification”, by Andrew McCallum and Kamal Nigam. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41-48. Technical Report WS-98-05. AAAI Press. 1998. PDF.
  • (optional) “Naive Bayes Text Classification: A Statistical Natural Language Processing Project”, by Chris Monson. PDF.
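
As a concrete companion to the McCallum & Nigam paper, here is a minimal sketch of the multinomial event model it compares against the multi-variate Bernoulli model. The function and variable names are illustrative, and add-one (Laplace) smoothing is a simplifying assumption, not the paper's exact estimator:

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels):
    """Multinomial event model: per-class word counts with add-one
    (Laplace) smoothing, plus class priors. docs are token lists."""
    classes = set(labels)
    word_counts = {c: Counter() for c in classes}
    class_counts = Counter(labels)
    vocab = set()
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)
        vocab.update(doc)
    V = len(vocab)
    model = {}
    for c in classes:
        total = sum(word_counts[c].values())
        # log P(w|c) with add-one smoothing; also a default for unseen words
        logp = {w: math.log((word_counts[c][w] + 1) / (total + V)) for w in vocab}
        model[c] = (math.log(class_counts[c] / len(docs)),  # log prior
                    logp,
                    math.log(1 / (total + V)))              # log P(unseen|c)
    return model

def classify(model, doc):
    """argmax over classes of log P(c) + sum over words of log P(w|c)."""
    best, best_score = None, float("-inf")
    for c, (log_prior, logp, log_unseen) in model.items():
        score = log_prior + sum(logp.get(w, log_unseen) for w in doc)
        if score > best_score:
            best, best_score = c, score
    return best
```

Working in log space avoids the underflow that multiplying many small word probabilities would cause.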

Week 2: Semi-Supervised Learning with Naive Bayes and Expectation Maximization

  • “Learning to Classify Text from Labeled and Unlabeled Documents”, by Kamal Nigam, Andrew McCallum, Sebastian Thrun and Tom Mitchell. PDF (8 pages)
  • (optional) “Text Classification from Labeled and Unlabeled Documents using EM”, by Kamal Nigam, Andrew McCallum, Sebastian Thrun and Tom Mitchell. Machine Learning, 39(2/3). pp. 103-134. 2000. PDF (34 pages)
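
The core loop of the Nigam et al. papers trains Naive Bayes on the labeled documents, then alternates between soft-labeling the unlabeled documents (E-step) and retraining on the combined fractional counts (M-step). A minimal sketch, with illustrative names and add-one smoothing standing in for the papers' exact estimators:

```python
import math
from collections import Counter

def nb_train_soft(docs, resp, vocab):
    """M-step: multinomial Naive Bayes from soft class responsibilities.
    resp[i][c] is the weight of doc i in class c (1.0 for labeled docs)."""
    classes = resp[0].keys()
    n = sum(sum(r.values()) for r in resp)
    model = {}
    for c in classes:
        wc, mass = Counter(), 0.0
        for doc, r in zip(docs, resp):
            for w in doc:
                wc[w] += r[c]
            mass += r[c]
        total = sum(wc.values())
        logp = {w: math.log((wc[w] + 1) / (total + len(vocab))) for w in vocab}
        model[c] = (math.log((mass + 1) / (n + len(classes))), logp)
    return model

def nb_posterior(model, doc):
    """E-step for one doc: P(c|d), normalized via log-sum-exp."""
    scores = {c: lp + sum(logp[w] for w in doc if w in logp)
              for c, (lp, logp) in model.items()}
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {c: math.exp(s - m) / z for c, s in scores.items()}

def em_nb(labeled, labels, unlabeled, classes, iters=10):
    docs = labeled + unlabeled
    vocab = {w for d in docs for w in d}
    # Hard responsibilities for labeled docs, uniform for unlabeled ones.
    resp = [{c: 1.0 if c == y else 0.0 for c in classes} for y in labels]
    resp += [{c: 1.0 / len(classes) for c in classes} for _ in unlabeled]
    for _ in range(iters):
        model = nb_train_soft(docs, resp, vocab)
        # The E-step only re-estimates the unlabeled documents.
        for i, d in enumerate(unlabeled):
            resp[len(labeled) + i] = nb_posterior(model, d)
    return model
```

Note how words that occur only in unlabeled documents (like “score” below) still acquire informative class-conditional probabilities through the soft counts.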

Week 3: Text Classification with Maximum Entropy

  • “Using Maximum Entropy for Text Classification”, by Kamal Nigam, John Lafferty and Andrew McCallum. PDF (7 pages)
  • (optional) “A Maximum Entropy Approach to Natural Language Processing”, by Adam Berger, Vincent Della Pietra, Stephen Della Pietra. PDF (34 pages)
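
With one indicator feature per (word, class) pair, the conditional maximum entropy model in these readings is essentially multinomial logistic regression. The sketch below fits it by plain gradient ascent on the conditional log-likelihood; the papers use iterative scaling, so the optimizer here is a simpler stand-in, and all names are illustrative:

```python
import math
from collections import defaultdict

def maxent_posterior(w, doc, classes):
    """P(c|d) under a log-linear model with (word, class) features."""
    scores = {c: sum(w.get((word, c), 0.0) for word in doc) for c in classes}
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {c: math.exp(s - m) / z for c, s in scores.items()}

def maxent_train(docs, labels, classes, lr=0.5, iters=200):
    """Gradient ascent on the conditional log-likelihood: the gradient for
    each feature is its empirical count minus its expected count under the
    current model (the same fixed point iterative scaling converges to)."""
    w = defaultdict(float)
    for _ in range(iters):
        grad = defaultdict(float)
        for doc, y in zip(docs, labels):
            p = maxent_posterior(w, doc, classes)
            for word in doc:
                grad[(word, y)] += 1.0        # empirical feature count
                for c in classes:
                    grad[(word, c)] -= p[c]   # expected feature count
        for k, g in grad.items():
            w[k] += lr * g / len(docs)
    return w
```

Unlike Naive Bayes, nothing here assumes word independence; overlapping or redundant features are handled by the shared normalization.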

Week 4: Feature Selection

  • Mutual information and log-likelihood ratio sections in Manning & Schütze, sections 5.1-5.4
  • (optional) “A comparative study on feature selection for text categorization”, by Yiming Yang and Jan Pedersen. PDF
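
The mutual information criterion in these readings scores how much a term's presence tells you about a document's class. A small sketch using document-level counts and pointwise MI, in the spirit of Manning & Schütze's treatment; the +0.5 smoothing is a convenience to avoid log(0) and division by zero, not something from the readings:

```python
import math

def mutual_information(docs, labels, term, cls):
    """Pointwise MI between occurrence of `term` and class `cls`:
        MI(t, c) = log [ P(t, c) / (P(t) P(c)) ]
    with probabilities estimated as fractions of documents."""
    N = len(docs)
    n_t = sum(1 for d in docs if term in d)
    n_c = sum(1 for y in labels if y == cls)
    n_tc = sum(1 for d, y in zip(docs, labels) if term in d and y == cls)
    return math.log(((n_tc + 0.5) * N) / ((n_t + 0.5) * (n_c + 0.5)))

def select_features(docs, labels, k=10):
    """Rank terms by their maximum MI over classes (the MI_max variant
    discussed by Yang & Pedersen) and keep the top k."""
    vocab = {w for d in docs for w in d}
    classes = set(labels)
    scored = {t: max(mutual_information(docs, labels, t, c) for c in classes)
              for t in vocab}
    return sorted(scored, key=scored.get, reverse=True)[:k]
```

As Yang & Pedersen note, MI favors rare terms, since the score compares probabilities rather than raw counts; that bias is one of the paper's main points of comparison.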

Week 5: Feature Selection in the Learning Loop

  • Focus on Section 4, on feature selection in the learning loop: “A Maximum Entropy Approach to Natural Language Processing”, by Adam Berger, Vincent Della Pietra, Stephen Della Pietra. PDF

Week 6: Feature Selection as Word Clustering

  • “Distributional Clustering of Words for Text Classification”, by Douglas Baker and Andrew McCallum. PDF

Week 7: Text Classification with Support Vector Machines

  • Work through as much of the SVM tutorial by Nello Cristianini as you can; I don't expect you to get all the way through it. Presentation slides from the ICML 2001 tutorial: PDF
  • “Text Categorization with Support Vector Machines: Learning with Many Relevant Features”, by Thorsten Joachims. PDF

Moving on to text clustering …

Weeks 8 & 9: Clustering with Naive Bayes

  • “An Experimental Comparison of Several Clustering and Initialization Methods”, by Marina Meila and David Heckerman. Try to fight through the whole thing. PS

Week 10: Bayesian Smoothing

Week 11: Going Beyond Naive Bayes

  • “Latent Dirichlet Allocation”, by D. Blei, A. Ng, and M. Jordan. This is dense. Read as much of this as you can. PDF
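
For orientation while reading: the Blei, Ng & Jordan paper fits LDA by variational EM, but the model is often easier to grasp through the later collapsed Gibbs sampler of Griffiths & Steyvers, sketched below with illustrative names. This is a compact alternative inference method, not the paper's own algorithm:

```python
import random

def lda_gibbs(docs, K, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA: resample each token's topic from
    its conditional given all other assignments, tracking count tables."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    wid = {w: i for i, w in enumerate(vocab)}
    ndk = [[0] * K for _ in docs]       # doc-topic counts
    nkw = [[0] * V for _ in range(K)]   # topic-word counts
    nk = [0] * K                        # topic totals
    z = []                              # z[d][n]: topic of token n in doc d
    for d, doc in enumerate(docs):      # random initialization
        zs = []
        for w in doc:
            k = rng.randrange(K)
            zs.append(k)
            ndk[d][k] += 1; nkw[k][wid[w]] += 1; nk[k] += 1
        z.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]             # remove this token's assignment
                ndk[d][k] -= 1; nkw[k][wid[w]] -= 1; nk[k] -= 1
                # P(z = j | rest) ∝ (n_dj + alpha) * (n_jw + beta) / (n_j + V*beta)
                weights = [(ndk[d][j] + alpha) * (nkw[j][wid[w]] + beta)
                           / (nk[j] + V * beta) for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][n] = k
                ndk[d][k] += 1; nkw[k][wid[w]] += 1; nk[k] += 1
    return ndk, nkw, vocab
```

The count tables returned here play the role of the paper's posterior estimates: row-normalizing ndk approximates the per-document topic mixtures, and nkw the per-topic word distributions.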

Extra reading:

Clustering Email

  • “Inferring Ongoing Activities of Workstation Users by Clustering Email”. PDF (shorter version: PDF)
  • “Automatic Discovery of Personal Topics To Organize Email”, by Arun C. Surendran, John C. Platt and Erin Renshaw. Conference on Email and Anti-Spam, Stanford University, 21-22 July 2005. PDF

nlp/readings.txt · Last modified: 2015/04/23 21:37 by ryancha
CC Attribution-Share Alike 4.0 International