== Items from Qualifying Paper ==
* Initialize Gibbs with noisy marginal technique
* Tom Griffiths: measure convergence using inter-chain metrics (that are immune to label switching)
* Evolution (or lack thereof) over time between samples within the same chain
** Auto-correlation (see the sketch after this list)
*** likelihood within chains
*** favorite clustering metrics (e.g., ARI, unnormalized K-L divergence of each cell on the diagonal)
** Plot these over time (like the divergence movie, in a single graph)
* Question about the negative correlation between the MAP sample and metrics on "comb"
* EM as a refinement of Gibbs to climb to the mode of a local (?) maximum
* Gelman: converged when inter-chain variance is the same as intra-chain variance (R-hat; see the sketch after this list)
** on likelihood
** on metrics
* Another chain-summary idea from Kevin Seppi: most frequent label occurring in the last 100 samples
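
The auto-correlation item above, as a minimal sketch: lag-k auto-correlation of a single chain's per-sample log-likelihoods. The class name and array layout are illustrative assumptions, not our actual code.

<code java>
// Lag-k sample auto-correlation of a chain's per-sample log-likelihoods.
// logLik[t] is assumed to hold the log-likelihood of sample t (hypothetical layout).
public final class ChainDiagnostics {
    public static double autocorrelation(double[] logLik, int lag) {
        int n = logLik.length;
        double mean = 0.0;
        for (double v : logLik) mean += v;
        mean /= n;
        double var = 0.0;
        for (double v : logLik) var += (v - mean) * (v - mean);
        // Covariance between the series and its lagged copy, normalized by
        // the full-series variance (values near 0 suggest good mixing).
        double cov = 0.0;
        for (int t = 0; t + lag < n; t++) {
            cov += (logLik[t] - mean) * (logLik[t + lag] - mean);
        }
        return cov / var;
    }
}
</code>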
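Likewise, a sketch of Gelman's criterion as the potential scale reduction factor (R-hat) over a scalar summary (log-likelihood, or any of the metrics) collected from several parallel chains; values near 1.0 mean inter-chain and intra-chain variance agree. The chains[j][t] layout and equal chain lengths are assumptions.

<code java>
// Gelman-Rubin R-hat on a scalar summary from m parallel chains.
// chains[j][t] = value of chain j at iteration t; all chains same length.
public final class GelmanRubin {
    public static double rHat(double[][] chains) {
        int m = chains.length;
        int n = chains[0].length;
        double[] chainMean = new double[m];
        double grandMean = 0.0;
        for (int j = 0; j < m; j++) {
            double s = 0.0;
            for (double v : chains[j]) s += v;
            chainMean[j] = s / n;
            grandMean += chainMean[j];
        }
        grandMean /= m;
        double b = 0.0;  // between-chain variance term
        for (int j = 0; j < m; j++) {
            double d = chainMean[j] - grandMean;
            b += d * d;
        }
        b *= (double) n / (m - 1);
        double w = 0.0;  // mean within-chain variance
        for (int j = 0; j < m; j++) {
            double s2 = 0.0;
            for (double v : chains[j]) {
                double d = v - chainMean[j];
                s2 += d * d;
            }
            w += s2 / (n - 1);
        }
        w /= m;
        // Pooled variance estimate; R-hat near 1.0 indicates convergence.
        double varPlus = ((n - 1.0) / n) * w + b / n;
        return Math.sqrt(varPlus / w);
    }
}
</code>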

== Near Term ==
* Get comparable likelihood measures for Gibbs and EM
* Implement and run variational EM
* Do a tech report with the full derivation of the collapsed sampler
* Start EM with Gibbs
* Develop the "comb" idea
* Implement versions of the partition comparison metrics that can be run on samples, both within and across chains (see the ARI sketch after this list)
* Look at the mean entropy metric. Can this be adapted for …?
* Experiment with feature selectors/dimensionality reducers
* Split out a held-out dataset to compute held-out likelihood on Enron
* Auto stop-word detection / feature selection
* Complete bibliography of clustering techniques in prep
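
The partition-metric item above, sketched as an Adjusted Rand Index over two label vectors for the same documents; since it compares partitions rather than raw label values, it is immune to label switching and works for both within-chain and cross-chain comparisons. The signature (dense labels in [0, ka) and [0, kb)) is an illustrative assumption.

<code java>
// Adjusted Rand Index between two labelings a and b of the same items.
// Labels are assumed to be dense integers: a[i] in [0, ka), b[i] in [0, kb).
public final class PartitionMetrics {
    private static double choose2(long k) { return k * (k - 1) / 2.0; }

    public static double adjustedRandIndex(int[] a, int[] b, int ka, int kb) {
        long[][] table = new long[ka][kb];  // contingency table
        long[] rowSum = new long[ka];
        long[] colSum = new long[kb];
        for (int i = 0; i < a.length; i++) {
            table[a[i]][b[i]]++;
            rowSum[a[i]]++;
            colSum[b[i]]++;
        }
        double index = 0.0;
        for (int i = 0; i < ka; i++)
            for (int j = 0; j < kb; j++) index += choose2(table[i][j]);
        double rows = 0.0, cols = 0.0;
        for (long r : rowSum) rows += choose2(r);
        for (long c : colSum) cols += choose2(c);
        // Correct the raw index by its expected value under random labelings.
        double expected = rows * cols / choose2(a.length);
        double max = 0.5 * (rows + cols);
        return (index - expected) / (max - expected);
    }
}
</code>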

== Longer Term ==

* Reproduce a result from one of the papers (LDA)
* Identify something in the model that can be improved
* Implement differences and write a paper

== CS 601R ==

* Fix held-out set handling for CS 601R
== Done ==
* 9/15: Present hierarchical bisecting k-means clustering algorithm at NLP Lab Meeting
* 9/25: Finish LogCounter (or set it aside for near-term experiments)
* 9/21: Label with name for every PC
* 9/25: Get a copy of Hal's evaluation script
* 9/30: Figure out the profiling situation - JProfiler
* 10/2: Send me your 598R PowerPoint
* 10/6: Subscribe to topic-models at Princeton [https://lists.cs.princeton.edu/mailman/listinfo/topic-models]
* 10/6: Factor clustering away from classification
* 10/6: Hoist computation out of the "foreach document" loop and getProbabilities() for anything not specific to the current document, e.g., logDistOfC
* 10/13: Mine Hal's script for good metrics, etc.
* 10/11: Fix Adjusted Rand index calculation
* 11/1: Prepare a 10-15 minute presentation detailing your current activities for a CS + Machine Learning audience at the UofU for 11/2.
* 11/3: Implement P, R, and F_1 as metrics.
* 11/10: Implement the Variation of Information metric (see the sketch below)
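
A sketch of the Variation of Information item above: VI(A, B) = H(A) + H(B) - 2 I(A; B), computed from joint label counts. Like ARI it is a partition metric, so it is unaffected by label switching; the signature mirrors the hypothetical ARI sketch above.

<code java>
// Variation of Information between two labelings a and b of the same items.
// Labels are assumed dense: a[i] in [0, ka), b[i] in [0, kb).
public final class VariationOfInformation {
    public static double vi(int[] a, int[] b, int ka, int kb) {
        int n = a.length;
        double[][] joint = new double[ka][kb];  // empirical joint distribution
        for (int i = 0; i < n; i++) joint[a[i]][b[i]] += 1.0 / n;
        double[] pa = new double[ka];
        double[] pb = new double[kb];
        for (int i = 0; i < ka; i++)
            for (int j = 0; j < kb; j++) {
                pa[i] += joint[i][j];
                pb[j] += joint[i][j];
            }
        double hA = 0.0, hB = 0.0, mi = 0.0;
        for (double p : pa) if (p > 0) hA -= p * Math.log(p);  // H(A)
        for (double p : pb) if (p > 0) hB -= p * Math.log(p);  // H(B)
        for (int i = 0; i < ka; i++)
            for (int j = 0; j < kb; j++)
                if (joint[i][j] > 0)
                    mi += joint[i][j] * Math.log(joint[i][j] / (pa[i] * pb[j]));
        return hA + hB - 2 * mi;  // 0 iff the two partitions are identical
    }
}
</code>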

== Brainstorming ==

[[Dan/Brainstorming|Brainstorming List]]
  