TODO

[Added 8 May 2008]

Follow-up to Dan's KDD paper on Mixture of Multinomials involving convergence diagnostics. (This one is for Dan.)
Co-clustering paper explaining a common framework underpinning LDA and the Author-Topic models.
LDA with credible intervals, or at least a multi-sample technique, for estimating P(w|z)
Label switching analysis on LDA
Length modeling
Surveying Gibbs samplers (possibly for MM or for LDA):
- collapsed v. non-collapsed
- block v. non-block sampler
- random scan
- non-collapsed: continuous variables – diagnose using traditional convergence diagnostics
Keyword identification for documents – build on your work for the Data Mining group project involving LDA topics
Sampling N_d (# of tokens in doc. d) == choosing feature selector in the inference loop
- choosing dimensionality / cut-off for a fixed feature selector
Comparison of Bayesian model selection and non-parametric prior

Re-read E & D. Specifically look for values we can use for candidate, batch, and other parameter settings
- Re-run all experiments using best values

Redoing cost model in R
Code to compute cost (derived columns)
QBC: fix sampling, code review/Vote entropy
“Human Waits” Active learning (human only labels 1)
- Batch but based on previous annotations
We want the submitted job to run the revision that was current at submission (could pass revision number, or could set aside binaries)
Write out XML file without results at beginning. On a termination signal, dump the results you have before terminating (trap SIGTERM; SIGKILL itself cannot be trapped)
Wrap ant script in a shell or Python script so that it can trap the termination signal
Fast Maxent uses prior
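
A minimal sketch of the wrapper idea from the items above, in Python. It assumes the job scheduler sends SIGTERM before killing the job, since SIGKILL cannot be caught by any process. The names ResultLog and run_with_trap, the XML element names, and the command passed in are all hypothetical placeholders, not part of the existing code base.

```python
import signal
import subprocess
import sys
import xml.etree.ElementTree as ET


class ResultLog:
    """Holds the results collected so far and writes them out as XML."""

    def __init__(self, path):
        self.path = path
        self.results = []
        self.write()  # write the file up front, before any results exist

    def add(self, name, value):
        self.results.append((name, value))

    def write(self):
        # Element names here are placeholders, not an existing schema.
        root = ET.Element("experiment")
        for name, value in self.results:
            ET.SubElement(root, "result", name=name).text = str(value)
        ET.ElementTree(root).write(
            self.path, encoding="utf-8", xml_declaration=True
        )


def run_with_trap(cmd, log):
    """Run cmd as a child process; on SIGTERM or SIGINT, dump results and stop it."""
    child = subprocess.Popen(cmd)

    def on_signal(signum, frame):
        log.write()        # dump whatever has been collected so far
        child.terminate()  # forward a SIGTERM to the wrapped process
        sys.exit(128 + signum)

    signal.signal(signal.SIGTERM, on_signal)
    signal.signal(signal.SIGINT, on_signal)
    return child.wait()
```

Usage would look something like run_with_trap(["ant", "run"], ResultLog("results.xml")), where the ant target name is a guess. Results would still need to be fed into the log as they arrive, which this sketch leaves open.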

Fellowships

Research work done at BBN on speech N-best lists from Schwartz, Nguyen, and Austin
Research work done by Robert Moore on estimating probability distributions using N-best lists