CS598R Winter 2006: Special Projects - Text Classification and Text Clustering

Welcome to the home page for CS598R Special Topics for Winter 2006.

Our proposal for study is on the CS598R Winter 2006 Proposal page.

The papers we are reading are located on the Readings page.

Topic Pages

Mixture Models

Naive Bayes

Expectation-Maximization

Maximum Entropy

Log Likelihood Ratios

Log Domain Computations

Cluster Quality Metrics

Experiments

Add-one Smoothing Performance

Gaussian Mixture Distributions as a fit for a Non-parametric distribution

Naive Bayes with EM and Unlabeled Documents vs Naive Bayes

TF Feature Selection With Naive Bayes Multinomial

Distributional Word Clustering

Feature Selection with Naive Bayes

Document Clustering

Data

Reuters Data

Reuters21578

Enron

Student Pages

Msg26

Tools and Resources

ARFF Data Format

Weka

LibSVM

Windows Desktop Search API

Research From Abroad

Automatic Categorization of Email into Folders: Benchmark Experiments on Enron and SRI Corpora

Ron Bekkerman's Research