nlp-private:dan [2015/04/23 13:32] (current), created by ryancha

== Other ==

== Items from Qualifying Paper ==
* Initialize Gibbs with noisy marginal technique
* Tom Griffiths: measure convergence using inter-chain metrics (that are immune to label switching)
* Evolution (or lack thereof) over time between samples within the same chain
** Auto-correlation
*** likelihood within chains
*** favorite clustering metrics (e.g., ARI, unnormalized K-L divergence of each cell in diagonal)
** Plot these over time (like the divergence movie in a single graph)
* Question about negative correlation between MAP sample and metrics on "comb"
* EM as refinement of Gibbs to climb to mode of local (?) maxima
* Gelman: converged when inter-chain variance is the same as intra-chain variance
** on likelihood
** on metrics
* Another chain summary idea from Kevin Seppi: most frequent label occurring in last 100 samples

== Near Term ==
* Get comparable likelihood measures for Gibbs and EM
* Implement and run Variational EM
* Do a Tech Report with the full derivation of the collapsed sampler
* Start EM with Gibbs
* Do Tech-Paper with a full derivation of the collapsed sampler
* Develop the "comb" idea
* Implement versions of the partition comparison metrics that can be run on samples (inter-chain and cross-chain)
* Look at the mean entropy metric. Can this be adapted for
* Experiment with feature selectors/dimensionality reducers
* Split out held-out dataset to compute held-out likelihood on Enron
* Auto stop-word detection / feature selection
* Complete bibliography of clustering techniques in prep

== Longer term: ==

* Reproduce a result from one of the papers (LDA)
* Identify something in the model that can be improved
* Implement differences and write a paper

== CS 601R ==

* Fix held-out set handling for CS 601R

== Done: ==
* 9/15: Present hierarchical bisecting k-means clustering algorithm at NLP Lab Meeting
* 9/25: Finish LogCounter (or set it aside for near-term experiments)
* 9/21: Label with name for every PC
* 9/25: Get a copy of Hal's evaluation script
* 9/30: Figure out the profiling situation - JProfiler
* 10/2: Send me your 598R PowerPoint
* 10/6: Subscribe to topic-models at Princeton [https://lists.cs.princeton.edu/mailman/listinfo/topic-models]
* 10/6: Factor clustering away from classification
* 10/6: Hoist computation out of "foreach document" loop and getProbabilities() for anything not specific to the current document, e.g., logDistOfC
* 10/13: Mine Hal's script for good metrics, etc.
* 10/11: Fix Adjusted Rand index calculation
* 11/1: Prepare 10-15 minute presentation detailing your current activities for a CS + Machine Learning audience at the UofU for 11/2.
* 11/3: Implement P, R, and F_1 as metrics.
* 11/10: Implement Variation of information metric

== Brainstorming ==

[[Dan/Brainstorming|Brainstorming List]]
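
== Sketch: Gelman convergence check ==

The Gelman item under "Items from Qualifying Paper" (converged when inter-chain variance is the same as intra-chain variance) could be sketched roughly as below. This is only an illustration, not the project's code: it assumes each chain has already been reduced to one scalar per sample, such as the log-likelihood (which is unaffected by label switching), and the function name `potential_scale_reduction` is hypothetical. It computes the usual potential scale reduction factor, which approaches 1 as the two variances agree.

```python
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Return the Gelman-Rubin R-hat for equal-length chains of scalars.

    `chains` is a list of lists, one list per Gibbs chain, holding one
    scalar per sample (assumed here to be a log-likelihood trace).
    """
    m = len(chains)
    if m < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    if n < 2 or any(len(c) != n for c in chains):
        raise ValueError("chains must have equal length of at least 2")

    # W: average of the within-chain (intra-chain) sample variances.
    w = mean(variance(c) for c in chains)
    if w == 0:
        raise ValueError("within-chain variance is zero; R-hat undefined")

    # B / n: sample variance of the per-chain means (inter-chain part).
    b_over_n = variance([mean(c) for c in chains])

    # Pooled estimate of the target variance, then the ratio to W.
    var_plus = (n - 1) / n * w + b_over_n
    return (var_plus / w) ** 0.5


if __name__ == "__main__":
    mixed = [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
    stuck = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]
    print(potential_scale_reduction(mixed))
    print(potential_scale_reduction(stuck))
```

Values close to 1 suggest the chains agree on the tracked quantity; a common rule of thumb treats values above about 1.1 as not yet converged. The same function could be applied to any per-sample clustering metric in place of the likelihood.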