== Other ==

== Items from Qualifying Paper ==
* Initialize Gibbs with the noisy-marginal technique
* Tom Griffiths: measure convergence using inter-chain metrics (ones that are immune to label switching)
* Evolution (or lack thereof) over time between samples within the same chain
** Auto-correlation (see the sketch at the end of this page)
*** likelihood within chains
*** favorite clustering metrics (e.g., ARI, unnormalized K-L divergence of each cell on the diagonal)
** Plot these over time (like the divergence movie, but in a single graph)
* Question about the negative correlation between the MAP sample and metrics on the "comb"
* EM as a refinement of Gibbs to climb to the mode of a local (?) maximum
* Gelman: converged when inter-chain variance is the same as intra-chain variance (see the R-hat sketch at the end of this page)
** on likelihood
** on metrics
* Another chain-summary idea from Kevin Seppi: the most frequent label occurring in the last 100 samples

== Near Term ==
* Get comparable likelihood measures for Gibbs and EM
* Implement and run variational EM
* Do a tech report with the full derivation of the collapsed sampler
* Start EM with Gibbs
* Develop the "comb" idea
* Implement versions of the partition comparison metrics that can be run on samples (inter-chain and cross-chain); see the ARI/VI sketch at the end of this page
* Look at the mean entropy metric. Can this be adapted for
* Experiment with feature selectors/dimensionality reducers
* Split out a held-out dataset to compute held-out likelihood on Enron
* Automatic stop-word detection / feature selection
* Complete the bibliography of clustering techniques in prep

== Longer Term ==
* Reproduce a result from one of the papers (LDA)
* Identify something in the model that can be improved
* Implement the differences and write a paper

== CS 601R ==
* Fix held-out set handling for CS 601R

== Done ==
* 9/15: Present the hierarchical bisecting k-means clustering algorithm at the NLP Lab Meeting
* 9/25: Finish LogCounter (or set it aside for near-term experiments)
* 9/21: Label with name for every PC
* 9/25: Get a copy of Hal's evaluation script
* 9/30: Figure out the profiling situation - JProfiler
* 10/2: Send me your 598R PowerPoint
* 10/6: Subscribe to topic-models at Princeton [https://lists.cs.princeton.edu/mailman/listinfo/topic-models]
* 10/6: Factor clustering away from classification
* 10/6: Hoist computation out of the "foreach document" loop and getProbabilities() for anything not specific to the current document, e.g., logDistOfC
* 10/13: Mine Hal's script for good metrics, etc.
* 10/11: Fix the Adjusted Rand index calculation
* 11/1: Prepare a 10-15 minute presentation detailing your current activities for a CS + machine learning audience at the UofU for 11/2
* 11/3: Implement P, R, and F_1 as metrics
* 11/10: Implement the Variation of Information metric

== Brainstorming ==
[[Dan/Brainstorming|Brainstorming List]]
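== Sketches ==
Minimal, hedged sketches of the diagnostics mentioned above; all function names, array shapes, and defaults here are illustrative assumptions, not settled decisions. First, lag-k autocorrelation of a within-chain scalar trace (e.g., likelihood per sample), for the "evolution over time" plots:

<pre>
# A minimal sketch of within-chain autocorrelation of a scalar trace
# (e.g., per-sample likelihood); the max_lag default is an assumption.
import numpy as np

def autocorrelation(trace, max_lag=50):
    """Autocorrelation of one chain's scalar series at lags 0..max_lag."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)  # n * (biased) variance, the usual normalizer
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])
</pre>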
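Next, Gelman's potential scale reduction factor (R-hat), which is exactly the inter- vs. intra-chain variance comparison: it can be run on the likelihood trace or on any per-sample clustering metric.

<pre>
# A minimal sketch of Gelman's R-hat, assuming each chain's scalar trace
# is stored as one row of a 2-D array.
import numpy as np

def gelman_rhat(chains):
    """chains: (m_chains, n_samples) array of one scalar statistic."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # B: between-chain variance; W: mean within-chain variance
    B = n / (m - 1) * np.sum((chain_means - chain_means.mean()) ** 2)
    W = chains.var(axis=1, ddof=1).mean()
    var_hat = (n - 1) / n * W + B / n
    # ~1 when inter-chain variance matches intra-chain variance
    return np.sqrt(var_hat / W)
</pre>

A common heuristic reading: values below about 1.1 suggest the chains have mixed.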
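Finally, the partition comparison metrics run directly on pairs of samples (inter-chain or cross-chain). Both ARI and Variation of Information depend only on co-assignment counts, so they are immune to label switching; only numpy and scipy are assumed.

<pre>
# A minimal sketch of ARI and Variation of Information between two label
# vectors (e.g., two Gibbs samples over the same documents).
import numpy as np
from scipy.special import comb

def contingency(a, b):
    """Counts of items assigned to cluster pair (i, j) under labelings a, b."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=np.int64)
    np.add.at(table, (ai, bi), 1)
    return table

def adjusted_rand_index(a, b):
    t = contingency(a, b)
    sum_ij = comb(t, 2).sum()
    sum_a = comb(t.sum(axis=1), 2).sum()
    sum_b = comb(t.sum(axis=0), 2).sum()
    expected = sum_a * sum_b / comb(t.sum(), 2)
    return (sum_ij - expected) / ((sum_a + sum_b) / 2 - expected)

def variation_of_information(a, b):
    """VI = H(A) + H(B) - 2 I(A; B); a true metric on partitions."""
    p = contingency(a, b) / len(a)
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    nz = p > 0
    mi = np.sum(p[nz] * np.log(p[nz] / (pa @ pb)[nz]))
    ha = -np.sum(pa[pa > 0] * np.log(pa[pa > 0]))
    hb = -np.sum(pb[pb > 0] * np.log(pb[pb > 0]))
    return ha + hb - 2.0 * mi
</pre>

Slow autocorrelation decay, an R-hat stuck above 1, or large sample-to-sample VI would all be within-chain evidence of the non-convergence the items above are probing for.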