Paper Ideas

[Added 8 May 2008]

  • Follow-up to Dan's KDD paper on Mixture of Multinomials involving convergence diagnostics. (This one is for Dan.)
  • Co-clustering paper explaining a common framework underpinning LDA and the Author-Topic models.
  • LDA with credible intervals or at least a multi-sample technique for estimating P(w|z)
  • Label switching analysis on LDA
  • Length modeling
  • Surveying Gibbs samplers (possibly for MM or for LDA):
    • collapsed v. non-collapsed
    • block v. non-block sampler
    • random scan
    • non-collapsed: continuous variables – diagnose using traditional convergence diagnostics
  • Keyword identification for documents – build on your work for Data Mining group project involving LDA topics
  • Sampling N_d (# of tokens in doc. d) == choosing feature selector in the inference loop
    • choosing dimensionality / cut-off for a fixed feature selector
  • Comparison of Bayesian model selection and non-parametric prior
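A rough sketch of the "multi-sample technique for estimating P(w|z)" idea above, assuming a collapsed Gibbs sampler whose retained samples each provide the count of word w in topic z and the total token count of topic z. The function names, the symmetric smoothing parameter `beta`, and the use of an empirical percentile interval are all assumptions made for illustration, and the sketch presumes topic labels have already been aligned across samples (the label-switching item above).

```python
import math


def phi_estimate(n_zw, n_z, beta, vocab_size):
    """Smoothed point estimate of P(w|z) from the counts in one sample."""
    return (n_zw + beta) / (n_z + vocab_size * beta)


def summarize_phi(samples, beta, vocab_size, mass=0.95):
    """Return (mean, lower, upper) for P(w|z) across several samples.

    samples: list of (n_zw, n_z) pairs, one pair per retained sample.
    The interval is a simple empirical percentile interval, which is
    only one of several ways to form a credible interval.
    """
    if not samples:
        raise ValueError("need at least one sample")
    estimates = sorted(
        phi_estimate(n_zw, n_z, beta, vocab_size) for n_zw, n_z in samples
    )
    mean = sum(estimates) / len(estimates)
    tail = (1.0 - mass) / 2.0
    last = len(estimates) - 1
    lower = estimates[int(math.floor(tail * last))]
    upper = estimates[int(math.ceil((1.0 - tail) * last))]
    return mean, lower, upper
```

With only a handful of samples the interval collapses to the minimum and maximum estimate, so this is mainly useful once many well-spaced samples are available.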


  • Re-read E & D. Specifically look for values we can use for candidate, batch, and other parameter settings
    • Re-run all experiments using best values
  • Redoing cost model in R
  • Code to compute cost (derived columns)
  • QBC: fix sampling, code review/Vote entropy
  • “Human Waits” Active learning (human only labels 1)
    • Batch but based on previous annotations
  • We want the submitted job to run the revision that was current at submission (could pass revision number, or could set aside binaries)
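  • Write out XML file without results at beginning. Dump the results you have when a termination signal arrives, before exiting (SIGKILL itself cannot be caught, so this has to be SIGTERM or similar)
  • Wrap ant script in shell script/python so that it can trap the termination signal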
  • Fast Maxent uses prior
  • Phase in cutoffs
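For the QBC item above, a minimal sketch of vote entropy, assuming each committee member casts one hard vote (a label) per unlabeled instance. This is the unnormalized form; some formulations divide by the log of the committee size or of the number of labels. The function names and the dict-based pool are hypothetical and not taken from the existing code.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's label votes for a single instance."""
    if not votes:
        raise ValueError("need at least one vote")
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_most_disputed(pool_votes, batch_size=1):
    """Return the ids of the instances the committee disagrees on most.

    pool_votes: dict mapping instance id -> list of committee votes.
    """
    ranked = sorted(
        pool_votes, key=lambda i: vote_entropy(pool_votes[i]), reverse=True
    )
    return ranked[:batch_size]
```

A unanimous committee gives entropy 0, and an evenly split committee gives the maximum for that number of distinct labels, so ranking by this score queries the most contested instances first.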


  • Thesis!!!!
  • Prepare slides for Bayesian reading group
  • MI for POS-Tagging
  • Add MI to LID pipeline


Current Work

  • LDA and Co-clustering
  • Mutual-Information feature selector for POS-Tagging
  • Incorporate Mutual-Information feature selector into LID.
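As a reference point for the mutual-information feature selector items above, here is a small sketch that scores each binary feature by the mutual information between its presence and the class label, then keeps the top k. The representation (one set of active feature names per instance) and the function names are assumptions for illustration, not the selector actually used for POS-tagging or LID.

```python
import math
from collections import Counter


def mutual_information(feature_sets, labels, feature):
    """MI (in nats) between presence of `feature` and the label."""
    n = len(labels)
    joint = Counter((feature in fs, y) for fs, y in zip(feature_sets, labels))
    present = Counter(feature in fs for fs in feature_sets)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, y), c in joint.items():
        p_fy = c / n
        p_f = present[f] / n
        p_y = label_counts[y] / n
        mi += p_fy * math.log(p_fy / (p_f * p_y))
    return mi


def top_k_features(feature_sets, labels, k):
    """Return the k features with the highest MI, ties broken by name."""
    vocab = set().union(*feature_sets)
    scored = sorted(
        vocab,
        key=lambda f: (-mutual_information(feature_sets, labels, f), f),
    )
    return scored[:k]
```

The cut-off k here plays the role of the fixed dimensionality mentioned under Paper Ideas; choosing it inside the inference loop would need a different design.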

Semi-supervised + Maxent

  • Research work done at BBN on speech N-best lists from Schwartz, Nguyen, and Austin
  • Research work done by Robert Moore on estimating probability distributions using n-best lists


nlp-private/rah67.txt · Last modified: 2015/04/23 13:33 by ryancha