Course Questions
Adam Drake's game review data
Find a good summarization data set
Used by state-of-the-art approaches, or at least Bayesian ones
Hal's paper on Query-based Summarization is a good start
Enable calling libSVM directly from the code base, without going through intermediate text files.
Clean up and update the project rubrics.
TODO: Set up the code distribution system better:
Suggestion: move the support code into a jar and keep the stub classes separate from it. The jar can then be distributed through svn or as a tarball, and later changes should only involve the jar. The stub classes can also be released with each lab instead of all at once.
George's TODO
Solve the course questions I can
Analyze and clean up the cluster browser as a first step toward thinking about visualization
Check all instruction pages (e.g., the supercomputer how-to)
Old Material
(to be moved up or deleted soon)
Organization of codebase: separating edu.berkeley.nlp and edu.byu.nlp
Fix handling of held-out data
Projects:
Text Classification: get to know the tokenization pipeline and build a simple text classifier (e.g., k-NN)
Text Classification: Naive Bayes
Text Classification: MaxEnt or SVM
Clustering: k-Means
Clustering: EM
Clustering: LDA
Final Project: help fill our clustering survey matrix
Final Presentation:
Implement clustering algorithm from the literature in our framework
Evaluate on given datasets
Do Error Analysis
Propose some improvement
Run experiments using the improved algorithm
Evaluate
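As a rough illustration of the first text classification project listed above, here is a minimal sketch of a k-NN classifier over bag-of-words counts. Everything in it is an assumption made for illustration: the class name KnnSketch, the regex tokenizer (a stand-in for the course tokenization pipeline, which is not described here), and the choice of cosine similarity with majority voting. It does not use the course code base.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: k-NN text classification over bag-of-words counts. */
public class KnnSketch {

    /** A labeled training document, stored as word counts plus a class label. */
    static final class Example {
        final Map<String, Integer> counts;
        final String label;

        Example(String text, String label) {
            this.counts = tokenize(text);
            this.label = label;
        }
    }

    /** Stand-in tokenizer (assumption): lowercase, then split on non-letters. */
    static Map<String, Integer> tokenize(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tok : text.toLowerCase().split("[^a-z]+")) {
            if (!tok.isEmpty()) {
                counts.merge(tok, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cosine similarity between two sparse count vectors; 0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /**
     * Returns the majority label among the k most similar training documents.
     * Ties go to the label whose votes came from nearer neighbors.
     * Returns null if the training set is empty.
     */
    static String classify(List<Example> train, String text, int k) {
        Map<String, Integer> query = tokenize(text);
        List<Example> sorted = new ArrayList<>(train);
        // Sort by descending similarity to the query.
        sorted.sort((x, y) -> Double.compare(cosine(query, y.counts), cosine(query, x.counts)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        for (Example ex : sorted.subList(0, Math.min(k, sorted.size()))) {
            int v = votes.merge(ex.label, 1, Integer::sum);
            if (best == null || v > votes.get(best)) {
                best = ex.label;
            }
        }
        return best;
    }
}
```

Calling KnnSketch.classify(train, text, k) with a small list of Example objects is enough to try it out. A real project would swap in the actual tokenization pipeline and evaluate on held-out data.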
Datasets: