nlp-private:cs-679-action

Course Questions

Adam Drake's game review data
- Consider running a histogram, etc. to find a Positive-Negative split, with or without a neutral class.
- Make usable for class
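
A possible starting point for the histogram item above, as a rough sketch only. It assumes each review carries a numeric score on a 0-10 scale, which has not been checked against the actual data. The class name `ScoreHistogram`, the bin count, and the negative/positive thresholds are all hypothetical placeholders.

```java
import java.util.Arrays;

public class ScoreHistogram {

    /**
     * Counts scores into equal-width bins over [min, max].
     * A score equal to max goes in the last bin; out-of-range scores are skipped.
     */
    public static int[] histogram(double[] scores, double min, double max, int bins) {
        int[] counts = new int[bins];
        double width = (max - min) / bins;
        for (double s : scores) {
            if (s < min || s > max) {
                continue; // ignore scores outside the assumed scale
            }
            int b = (int) ((s - min) / width);
            if (b >= bins) {
                b = bins - 1; // clamp s == max into the last bin
            }
            counts[b]++;
        }
        return counts;
    }

    /**
     * Maps a score to -1 (negative), 0 (neutral), or +1 (positive).
     * negMax and posMin are hypothetical thresholds to be chosen from the histogram.
     */
    public static int label(double score, double negMax, double posMin) {
        if (score <= negMax) {
            return -1;
        }
        if (score >= posMin) {
            return 1;
        }
        return 0;
    }

    public static void main(String[] args) {
        // Made-up scores for illustration; not taken from the review data.
        double[] scores = {2.0, 3.5, 5.0, 6.5, 8.0, 9.0, 10.0};
        System.out.println(Arrays.toString(histogram(scores, 0.0, 10.0, 5)));
        for (double s : scores) {
            System.out.println(s + " -> " + label(s, 4.0, 7.0));
        }
    }
}
```

If few reviews land between the two thresholds, dropping the neutral class and using a two-way split may be reasonable; if many do, keeping it is probably safer.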
Find a good summarization data set
- Used by state-of-the-art approaches, or at least Bayesian ones
- Hal's paper on Query-based Summarization is a good start
Enable calling libSVM directly from the code base, without going through text files.
Clean up/update rubrics for projects.
TODO: Set up the code distribution system better:
- Suggestion: move the support code into a jar and keep the stub classes separate from it. The jar can then be distributed through svn or as a tarball, and changes should only involve the jar. The stub classes can also be released with each lab instead of all at once.

(to be moved up or deleted soon)

Organization of codebase: separating edu.berkeley.nlp and edu.byu.nlp

Fix handling of held-out data

Projects:

Text Class.: get to know tokenization pipeline and do simple text classifier (like k-NN)
Text Class.: Naive Bayes
Text Class.: MaxEnt or SVM
Clustering: k-Means
Clustering: EM
Clustering: LDA
Final Project: help fill our clustering survey matrix
- Pick unique entries from list that add up to a certain number of “difficulty points”

Final Presentation:

Datasets: