Other
Items from Qualifying Paper
Initialize Gibbs with noisy marginal technique
Tom Griffiths: measure convergence using inter-chain metrics that are immune to label switching
Evolution (or lack thereof) over time between samples within the same chain
Question about negative correlation between MAP sample and metrics on “comb”
EM as a refinement of Gibbs to climb to the mode of a local (?) maximum
Gelman: converged when inter-chain variance is the same as intra-chain variance
Another chain summary idea from Kevin Seppi: most frequent label occurring in last 100 samples
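Gelman's criterion above is the potential scale reduction factor (R-hat). A minimal sketch, assuming NumPy and equal-length chains of some scalar summary (per-sample log likelihood, say); the function name is just illustrative:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for a scalar quantity.

    chains: array-like of shape (m, n) -- m chains, n samples each,
    e.g. per-sample log likelihood.  Convergence is suggested when
    the between-chain variance matches the within-chain variance,
    i.e. when R-hat is close to 1.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)        # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()  # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n      # pooled variance estimate
    return float(np.sqrt(var_hat / W))
```

A common rule of thumb is to keep sampling while R-hat stays well above 1 (e.g. above 1.1).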
Near Term
Get comparable likelihood measures for Gibbs and EM
Implement and run Variational EM
Do a Tech Report with the full derivation of the collapsed sampler
Start EM with Gibbs
Develop the “comb” idea
Implement versions of the partition comparison metrics that can be run on samples (intra-chain and inter-chain)
Look at the mean entropy metric. Can this be adapted for
Experiment with feature selectors/dimensionality reducers
Split out a held-out dataset to compute held-out likelihood on Enron
Auto stop-word detection / feature selection
Complete bibliography of clustering techniques in prep
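One candidate for the sample-runnable partition comparison metrics above is the variation of information: it depends only on the partitions, not on how clusters happen to be numbered, so label switching between chains is harmless. A minimal Python sketch, assuming two flat hard clusterings of equal length (not the project's actual implementation):

```python
from collections import Counter
from math import log

def variation_of_information(labels_a, labels_b):
    """Variation of information (Meila) between two flat clusterings.

    VI = H(A) + H(B) - 2 * I(A; B), measured in nats.  It is a true
    metric on partitions and is invariant to relabeling clusters;
    VI is 0 iff the two partitions are identical.
    """
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    pa, pb = Counter(labels_a), Counter(labels_b)

    def entropy(counts):
        return -sum((c / n) * log(c / n) for c in counts.values())

    # mutual information between the two cluster assignments
    mi = sum((c / n) * log((c / n) / ((pa[a] / n) * (pb[b] / n)))
             for (a, b), c in joint.items())
    return entropy(pa) + entropy(pb) - 2 * mi
```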
Longer term:
Reproduce a result from one of the papers (LDA)
Identify something in the model that can be improved
Implement differences and write a paper
CS 601R
Done:
9/15: Present hierarchical bisecting k-means clustering algorithm at NLP Lab Meeting
9/25: Finish LogCounter (or set it aside for near-term experiments)
9/21: Label with name for every PC
9/25: Get a copy of Hal's evaluation script
9/30: Figure out the profiling situation - JProfiler
10/2: Send me your 598R PowerPoint
10/6: Factor clustering away from classification
10/6: Hoist computation out of “foreach document” loop and getProbabilities() for anything not specific to the current document. e.g., logDistOfC
10/13: Mine Hal's script for good metrics, etc.
10/11: Fix Adjusted Rand index calculation
11/1: Prepare 10-15 minute presentation detailing your current activities for a CS + Machine Learning audience at the UofU for 11/2.
11/3: Implement P, R, and F_1 as metrics.
11/10: Implement variation of information metric
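For reference, the adjusted Rand index (the calculation fixed on 10/11) can be computed directly from pair counts. A minimal Python sketch, assuming flat hard clusterings; this is an illustration, not the project's actual implementation:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two flat clusterings.

    Counts pairs of items the two clusterings agree on and corrects
    for chance: 1.0 means identical partitions (up to relabeling),
    values near 0 mean roughly chance-level agreement.
    """
    n = len(labels_a)
    contingency = Counter(zip(labels_a, labels_b))
    index = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both all-one-cluster
        return 1.0
    return (index - expected) / (max_index - expected)
```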
Brainstorming