= TODO =

=== Paper Ideas ===
[Added 8 May 2008]
* Follow-up to Dan's KDD paper on Mixture of Multinomials, involving convergence diagnostics. (This one is for Dan.)
* Co-clustering paper explaining a common framework underpinning LDA and the Author-Topic models.
* LDA with credible intervals, or at least a multi-sample technique for estimating P(w|z)
* Label-switching analysis on LDA
* Length modeling
* Surveying Gibbs samplers (possibly for MM or for LDA) -- see the collapsed-sampler sketch under Code Sketches below:
** collapsed vs. non-collapsed
** block vs. non-block sampler
** random scan
** non-collapsed: continuous variables -- diagnose using traditional convergence diagnostics
* Keyword identification for documents -- build on your work for the Data Mining group project involving LDA topics
* Sampling N_d (the number of tokens in document d) is equivalent to choosing a feature selector in the inference loop
** choosing the dimensionality / cut-off for a fixed feature selector
* Comparison of Bayesian model selection and a non-parametric prior

=== ALFA ===
* Re-read E & D. Specifically, look for values we can use for candidate, batch, and other parameter settings.
** Re-run all experiments using the best values
* Redo the cost model in R
* Code to compute cost (derived columns)
* QBC: fix sampling; code review / vote entropy
* "Human Waits" active learning (the human only labels 1)
** Batch, but based on previous annotations
* We want the submitted job to run the revision that was current at submission (could pass the revision number, or could set aside binaries)
* Write out the XML file without results at the beginning; dump whatever results are available on SIGKILL before terminating
* Wrap the ant script in a shell script/Python so that it can trap SIGKILL
* Fast Maxent uses a prior
* Phase in cutoffs

=== Other ===
* Thesis!!!!
* Prepare slides for the Bayesian reading group
* MI for POS-Tagging
* Add MI to the LID pipeline

= Fellowships =
* BYU Graduate Research Fellowship Award: http://www.byu.edu/gradstudies/index.php?action=resources.fellowship&fellowshipid=57
** Application due around January 2008

= Current Work =
* LDA and co-clustering
* Mutual-Information feature selector for POS-Tagging (see the MI sketch under Code Sketches below)
* Incorporate the Mutual-Information feature selector into LID.

== Semi-supervised + Maxent ==
* Research work done at BBN on speech N-best lists by Schwartz, Nguyen, and Austin
* Research work done by Robert Moore on estimating probability distributions using n-best lists

= Brainstorming =
[[User:Rah67/Brainstorming|Brainstorming List]]
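
= Code Sketches =

== Collapsed Gibbs sampler for LDA ==
A minimal sketch of what the "collapsed" option in the Gibbs-sampler survey item refers to, assuming a standard LDA setup with symmetric Dirichlet priors. All names here (docs, n_topics, alpha, beta, and the function itself) are illustrative assumptions, not part of any existing codebase; docs is assumed to be a list of lists of integer word ids.

<pre>
# Collapsed Gibbs sampling for LDA: theta and phi are integrated out,
# and only the per-token topic assignments z are resampled.
import random

def collapsed_gibbs_lda(docs, vocab_size, n_topics, alpha=0.1, beta=0.01, n_iters=200):
    # Count tables: document-topic, topic-word, and topic totals.
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * vocab_size for _ in range(n_topics)]
    nk = [0] * n_topics
    z = []  # topic assignment for every token

    # Random initialization of the topic assignments.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = random.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional P(z_di = t | z_-di, w), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                # Draw a new topic proportional to the weights.
                r = random.random() * sum(weights)
                acc = 0.0
                for t, wt in enumerate(weights):
                    acc += wt
                    if r <= acc:
                        k = t
                        break
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Point estimate of P(w|z); averaging this over several retained samples
    # would give the multi-sample estimate mentioned in the paper ideas.
    phi = [[(nkw[t][w] + beta) / (nk[t] + vocab_size * beta) for w in range(vocab_size)]
           for t in range(n_topics)]
    return phi, z
</pre>

A non-collapsed sampler would instead draw theta and phi explicitly as continuous variables, which is what makes the traditional convergence diagnostics mentioned above applicable.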
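
== Mutual-information feature scoring ==
For the MI feature-selector items (POS-Tagging and the LID pipeline), a minimal sketch of scoring binary feature indicators by mutual information with the label, I(F; Y), estimated from counts. The data layout (a list of (feature_set, label) pairs) and the function name are assumptions, not part of the existing pipeline.

<pre>
import math
from collections import Counter

def mi_scores(examples):
    """examples: list of (set_of_features, label) pairs -> {feature: I(F; Y)}."""
    n = float(len(examples))
    label_counts = Counter(label for _, label in examples)
    feat_counts = Counter()
    joint_counts = Counter()  # (feature, label) -> number of examples with both
    for feats, label in examples:
        for f in set(feats):
            feat_counts[f] += 1
            joint_counts[(f, label)] += 1

    scores = {}
    for f, nf in feat_counts.items():
        mi = 0.0
        for y, ny in label_counts.items():
            n11 = joint_counts[(f, y)]  # feature present, label == y
            n01 = ny - n11              # feature absent, label == y
            # I(F; Y) = sum over the (present/absent, label) cells of
            # p(f, y) * log( p(f, y) / (p(f) * p(y)) ).
            for n_fy, n_f in ((n11, nf), (n01, n - nf)):
                if n_fy > 0 and n_f > 0:
                    mi += (n_fy / n) * math.log((n_fy * n) / (n_f * ny))
        scores[f] = mi
    return scores

# Example: keep the 1000 highest-scoring features.
# scores = mi_scores(training_examples)
# keep = sorted(scores, key=scores.get, reverse=True)[:1000]
</pre>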