== Other ==

== Items from Qualifying Paper ==
* Initialize Gibbs with the noisy-marginal technique
* Tom Griffiths: measure convergence using inter-chain metrics (ones that are immune to label switching)
* Evolution (or lack thereof) over time between samples within the same chain
** Auto-correlation (see the sketch at the end of this page)
*** likelihood within chains
*** favorite clustering metrics (e.g., ARI, unnormalized K-L divergence of each cell on the diagonal)
** Plot these over time (like the divergence movie, but in a single graph)
* Question about the negative correlation between the MAP sample and metrics on the "comb"
* EM as a refinement of Gibbs to climb to the mode of a local (?) maximum
* Gelman: converged when inter-chain variance is the same as intra-chain variance (see the R-hat sketch at the end of this page)
** on likelihood
** on metrics
* Another chain-summary idea from Kevin Seppi: the most frequent label occurring in the last 100 samples

== Near Term ==
* Get comparable likelihood measures for Gibbs and EM
* Implement and run variational EM
* Do a tech report with the full derivation of the collapsed sampler
* Start EM with Gibbs
* Develop the "comb" idea
* Implement versions of the partition comparison metrics that can be run on samples (inter-chain and cross-chain); see the ARI/VI sketch at the end of this page
* Look at the mean entropy metric. Can this be adapted for
* Experiment with feature selectors/dimensionality reducers
* Split out a held-out dataset to compute held-out likelihood on Enron
* Automatic stop-word detection / feature selection
* Complete the bibliography of clustering techniques in prep

== Longer Term ==
* Reproduce a result from one of the papers (LDA)
* Identify something in the model that can be improved
* Implement the differences and write a paper

== CS 601R ==
* Fix held-out set handling for CS 601R

== Done ==
* 9/15: Present the hierarchical bisecting k-means clustering algorithm at the NLP Lab Meeting
* 9/25: Finish LogCounter (or set it aside for near-term experiments)
* 9/21: Label with name for every PC
* 9/25: Get a copy of Hal's evaluation script
* 9/30: Figure out the profiling situation - JProfiler
* 10/2: Send me your 598R PowerPoint
* 10/6: Subscribe to topic-models at Princeton [https://lists.cs.princeton.edu/mailman/listinfo/topic-models]
* 10/6: Factor clustering away from classification
* 10/6: Hoist computation out of the "foreach document" loop and getProbabilities() for anything not specific to the current document, e.g., logDistOfC
* 10/13: Mine Hal's script for good metrics, etc.
* 10/11: Fix the Adjusted Rand index calculation
* 11/1: Prepare a 10-15 minute presentation detailing your current activities for a CS + machine learning audience at the UofU for 11/2
* 11/3: Implement P, R, and F_1 as metrics
* 11/10: Implement the Variation of Information metric

== Brainstorming ==
[[Dan/Brainstorming|Brainstorming List]]
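== Sketches ==
Minimal, hedged sketches of the diagnostics mentioned above; all function names, array shapes, and defaults here are illustrative assumptions, not settled decisions. First, lag-k autocorrelation of a within-chain scalar trace (e.g., likelihood per sample), for the "evolution over time" plots:

<pre>
# A minimal sketch of within-chain autocorrelation of a scalar trace
# (e.g., per-sample likelihood); the max_lag default is an assumption.
import numpy as np

def autocorrelation(trace, max_lag=50):
    """Autocorrelation of one chain's scalar series at lags 0..max_lag."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)  # n * (biased) variance, the usual normalizer
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])
</pre>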
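Next, Gelman's potential scale reduction factor (R-hat), which is exactly the inter- vs. intra-chain variance comparison: it can be run on the likelihood trace or on any per-sample clustering metric.

<pre>
# A minimal sketch of Gelman's R-hat, assuming each chain's scalar trace
# is stored as one row of a 2-D array.
import numpy as np

def gelman_rhat(chains):
    """chains: (m_chains, n_samples) array of one scalar statistic."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # B: between-chain variance; W: mean within-chain variance
    B = n / (m - 1) * np.sum((chain_means - chain_means.mean()) ** 2)
    W = chains.var(axis=1, ddof=1).mean()
    var_hat = (n - 1) / n * W + B / n
    # ~1 when inter-chain variance matches intra-chain variance
    return np.sqrt(var_hat / W)
</pre>

A common heuristic reading: values below about 1.1 suggest the chains have mixed.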
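Finally, the partition comparison metrics run directly on pairs of samples (inter-chain or cross-chain). Both ARI and Variation of Information depend only on co-assignment counts, so they are immune to label switching; only numpy and scipy are assumed.

<pre>
# A minimal sketch of ARI and Variation of Information between two label
# vectors (e.g., two Gibbs samples over the same documents).
import numpy as np
from scipy.special import comb

def contingency(a, b):
    """Counts of items assigned to cluster pair (i, j) under labelings a, b."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=np.int64)
    np.add.at(table, (ai, bi), 1)
    return table

def adjusted_rand_index(a, b):
    t = contingency(a, b)
    sum_ij = comb(t, 2).sum()
    sum_a = comb(t.sum(axis=1), 2).sum()
    sum_b = comb(t.sum(axis=0), 2).sum()
    expected = sum_a * sum_b / comb(t.sum(), 2)
    return (sum_ij - expected) / ((sum_a + sum_b) / 2 - expected)

def variation_of_information(a, b):
    """VI = H(A) + H(B) - 2 I(A; B); a true metric on partitions."""
    p = contingency(a, b) / len(a)
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    nz = p > 0
    mi = np.sum(p[nz] * np.log(p[nz] / (pa @ pb)[nz]))
    ha = -np.sum(pa[pa > 0] * np.log(pa[pa > 0]))
    hb = -np.sum(pb[pb > 0] * np.log(pb[pb > 0]))
    return ha + hb - 2.0 * mi
</pre>

Slow autocorrelation decay, an R-hat stuck above 1, or large sample-to-sample VI would all be within-chain evidence of the non-convergence the items above are probing for.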