Other

Items from Qualifying Paper

  • Initialize Gibbs with noisy marginal technique
  • Tom Griffiths: measure convergence using inter-chain metrics that are immune to label switching (see the ARI sketch under Near Term)
  • Evolution (or lack thereof) over time between samples within the same chain
    • Auto-correlation (sketch after this list)
      • likelihood within chains
      • favorite clustering metrics (e.g., ARI, unnormalized K-L divergence of each cell on the diagonal)
    • Plot these over time (like the divergence movie, but in a single graph)
  • Question about negative correlation between MAP sample and metrics on “comb”
  • EM as a refinement of Gibbs, climbing to the mode of a local (?) maximum
  • Gelman: converged when inter-chain variance matches intra-chain variance (the Gelman-Rubin diagnostic; sketch after this list)
    • on likelihood
    • on metrics
  • Another chain-summary idea from Kevin Seppi: label each item with its most frequent label over the last 100 samples (sketch after this list)
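
A minimal sketch of the auto-correlation item above, assuming the per-sample log-likelihoods (or metric values) for one chain are already collected into a double[]; the class and method names are hypothetical. Plotting acf(series, k) for k = 1..K gives the single-graph view mentioned above.

<code java>
// Lag-k autocorrelation of a within-chain scalar series,
// e.g., per-sample log-likelihood. Values that decay slowly
// toward 0 as the lag grows indicate a slowly mixing chain.
public class Autocorrelation {
    public static double acf(double[] series, int lag) {
        int n = series.length;
        double mean = 0.0;
        for (double x : series) mean += x;
        mean /= n;

        // Numerator: covariance at the given lag; denominator: total variance.
        double num = 0.0, den = 0.0;
        for (int t = 0; t < n; t++) {
            double d = series[t] - mean;
            den += d * d;
            if (t + lag < n) num += d * (series[t + lag] - mean);
        }
        return num / den;
    }
}
</code>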
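For the Gelman item, a sketch of the Gelman-Rubin potential scale reduction factor over a scalar summary (likelihood or a clustering metric). It assumes all chains are the same length and that chains[i][t] holds chain i's value at sample t; these names are assumptions, not existing code. Values near 1 mean inter-chain and intra-chain variance agree.

<code java>
// Gelman-Rubin potential scale reduction factor (PSRF).
public class GelmanRubin {
    /** chains[i][t] = scalar summary (e.g., log-likelihood) of sample t from chain i. */
    public static double psrf(double[][] chains) {
        int m = chains.length;    // number of chains
        int n = chains[0].length; // samples per chain

        // Per-chain means and the grand mean.
        double[] means = new double[m];
        double grand = 0.0;
        for (int i = 0; i < m; i++) {
            for (double x : chains[i]) means[i] += x;
            means[i] /= n;
            grand += means[i];
        }
        grand /= m;

        // B: between-chain variance of the chain means (scaled by n).
        double b = 0.0;
        for (int i = 0; i < m; i++) {
            double d = means[i] - grand;
            b += d * d;
        }
        b *= (double) n / (m - 1);

        // W: mean within-chain variance.
        double w = 0.0;
        for (int i = 0; i < m; i++) {
            double s2 = 0.0;
            for (double x : chains[i]) {
                double d = x - means[i];
                s2 += d * d;
            }
            w += s2 / (n - 1);
        }
        w /= m;

        // Pooled variance estimate; PSRF near 1.0 suggests convergence.
        double varPlus = ((n - 1.0) / n) * w + b / n;
        return Math.sqrt(varPlus / w);
    }
}
</code>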
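Kevin Seppi's summary idea, sketched under the assumption that samples[t][d] holds the cluster label of document d at sample t (a hypothetical layout); within a single chain the labels are usually stable enough that the mode over a trailing window is meaningful.

<code java>
import java.util.HashMap;
import java.util.Map;

// Summarize a chain by giving each document the label it received
// most often over the last `window` samples (e.g., window = 100).
public class ModalLabelSummary {
    public static int[] summarize(int[][] samples, int window) {
        int t = samples.length;
        int numDocs = samples[0].length;
        int start = Math.max(0, t - window);
        int[] summary = new int[numDocs];
        for (int d = 0; d < numDocs; d++) {
            Map<Integer, Integer> counts = new HashMap<>();
            int bestLabel = -1, bestCount = -1;
            for (int s = start; s < t; s++) {
                int c = counts.merge(samples[s][d], 1, Integer::sum);
                if (c > bestCount) { bestCount = c; bestLabel = samples[s][d]; }
            }
            summary[d] = bestLabel;
        }
        return summary;
    }
}
</code>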

Near Term

  • Get comparable likelihood measures for Gibbs and EM
  • Implement and run Variational EM
  • Do a Tech Report with the full derivation of the collapsed sampler
  • Start EM with Gibbs
  • Develop the “comb” idea
  • Implement versions of the partition comparison metrics that can be run on samples, both within and across chains (see the ARI sketch after this list)
  • Look at the mean entropy metric. Can this be adapted for
  • Experiment with feature selectors/dimensionality reducers
  • Split out a held-out dataset to compute held-out likelihood on Enron
  • Auto stop-word detection / feature selection
  • Complete bibliography of clustering techniques in prep
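
For the partition comparison item above, one label-switching-immune metric that runs directly on a pair of samples (from the same chain or from different chains) is the Adjusted Rand Index already used elsewhere in this project. A self-contained sketch, assuming labels are dense non-negative ints; the class name is hypothetical.

<code java>
// Adjusted Rand Index between two labelings of the same items.
// Invariant to permutations of the label ids, so it is immune
// to label switching between samples/chains.
public class AdjustedRand {
    private static long choose2(long n) { return n * (n - 1) / 2; }

    public static double ari(int[] a, int[] b) {
        int ka = 0, kb = 0;
        for (int x : a) ka = Math.max(ka, x + 1);
        for (int x : b) kb = Math.max(kb, x + 1);

        // Contingency table and its margins.
        long[][] table = new long[ka][kb];
        long[] rows = new long[ka];
        long[] cols = new long[kb];
        for (int i = 0; i < a.length; i++) {
            table[a[i]][b[i]]++;
            rows[a[i]]++;
            cols[b[i]]++;
        }

        long sumCells = 0, sumRows = 0, sumCols = 0;
        for (long[] row : table)
            for (long cell : row) sumCells += choose2(cell);
        for (long r : rows) sumRows += choose2(r);
        for (long c : cols) sumCols += choose2(c);

        double expected = (double) sumRows * sumCols / choose2(a.length);
        double max = 0.5 * (sumRows + sumCols);
        return (sumCells - expected) / (max - expected);
    }
}
</code>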

Longer term:

  • Reproduce a result from one of the papers (LDA)
  • Identify something in the model that can be improved
  • Implement differences and write a paper

CS 601R

  • Fix held-out set handling for CS 601R

Done:

  • 9/15: Present hierarchical bisecting k-means clustering algorithm at NLP Lab Meeting
  • 9/25: Finish LogCounter (or set it aside for near-term experiments)
  • 9/21: Label with name for every PC
  • 9/25: Get a copy of Hal's evaluation script
  • 9/30: Figure out the profiling situation - JProfiler
  • 10/2: Send me your 598R PowerPoint
  • 10/6: Subscribe to topic-models at Princeton: https://lists.cs.princeton.edu/mailman/listinfo/topic-models
  • 10/6: Factor clustering away from classification
  • 10/6: Hoist computation out of “foreach document” loop and getProbabilities() for anything not specific to the current document. e.g., logDistOfC
  • 10/13: Mine Hal's script for good metrics, etc.
  • 10/11: Fix Adjusted Rand index calculation
  • 11/1: Prepare 10-15 minute presentation detailing your current activities for a CS + Machine Learning audience at the UofU on 11/2.
  • 11/3: Implement P, R, and F_1 as metrics.
  • 11/10: Implement the Variation of Information metric

Brainstorming
