George: Finish up the Forward Entropy paper / the comparison of QBUE and QBU
George: Prepare a presentation on the Voted Perceptron / Averaged Perceptron and CRFs
George: post the bibliography on the private wiki
Eric: give everyone access to the private wiki
Eric: create an action-list page on the wiki
George: upload the PDF for each paper
Eric: create mailing list
Robbie: Coordinate the creation of Subversion repository with Marc
Marc: find out about POS-annotated poetry (BNC data is stored in the data directory as BNC.zip; Emily Dickinson data is on the way)
Marc: abstract query-by-X
Marc: submit the appropriate subset of his code
Marc: add “future-work” list (from 401R/581 final projects) to this action list (This was added by Robbie under Abstraction and Parameterization)
Robbie: share suggestions on abstraction with Marc
James: do an asymptotic (Big-O) analysis of query-by-EVSI (full EVSI)
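For the write-up, one common risk-reduction formulation of full EVSI (a sketch only; the group's exact definition may differ) is:

    EVSI(_w_) = sum_{_t_} P(_T_=_t_ | _w_) * [ Risk(current model) - Risk(model retrained with (_w_, _t_) added) ]

For an n-word sentence with K possible tags the sum has K^n terms, and each term costs a retrain, so naive full EVSI over an unlabeled pool U is on the order of |U| * K^n * C_retrain - the quantity the Big-O analysis needs to pin down, and one that any practical version has to sample or truncate.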
Peter: make query-by-uncertainty conform to Marc's query-by-X interface in the shared code-base
Everyone: check out the Alembic Workbench and Callisto (Java)
Eric: write up other query-by-uncertainty approaches
Peter: another query-by-uncertainty measure, with uncertainty given by 1 - max_{_t_} P(_T_=_t_) (i.e., 1 - P(Viterbi sequence))
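A minimal sketch of this measure, assuming a hypothetical model.viterbi(sentence) that returns the best tag sequence together with its conditional probability (not necessarily the shared code-base's actual interface):

    def viterbi_uncertainty(model, sentence):
        # Uncertainty = 1 - P(Viterbi sequence | sentence).
        # model.viterbi is an assumed interface returning
        # (best_tag_sequence, its conditional probability).
        _, p_best = model.viterbi(sentence)
        return 1.0 - p_best

    def rank_pool(model, pool):
        # Most uncertain sentences first.
        return sorted(pool, key=lambda s: viterbi_uncertainty(model, s),
                      reverse=True)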
Peter: approximate per-sentence QBU, and weighted QBU
James: theory - compare EVSI and query-by-uncertainty
James: Write QBU v. EVSI insights
James: Write up asymptotic analysis of EVSI
Marc: share results of query-set batch size experiment (10, 100, 1000 sentences) on the experiment log page
Marc: QBC (query-by-committee)
Marc: Random Baseline, multiple runs
George: full-sentence query-by-uncertainty, where entropy is computed using Monte Carlo sampling
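A sketch of the Monte Carlo entropy estimate, assuming a hypothetical model.sample(sentence) that draws one tag sequence from P(_T_ | _w_) and returns it with its log-probability; the average of -log P over samples is an unbiased estimate of the sequence entropy:

    def mc_sequence_entropy(model, sentence, num_samples=1000):
        # H(T | w) = E[-log P(T | w)], estimated by sampling.
        total = 0.0
        for _ in range(num_samples):
            _tags, logp = model.sample(sentence)  # assumed interface
            total += -logp
        return total / num_samples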
Peter: Change ActiveLearner to do data splits online - enables randomization of all experiments; allow percentages based on words or on sentences (should be close, but possibly with small variance)
Peter: experiment on query-set batch size (10, 100, 1000, 10K words); take whole sentences only; the word count is a lower bound (allow extra words if necessary to get whole sentences)
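One way to read the "whole sentences, word count as lower bound" rule, as a sketch (ranked_sentences is whatever ordering the query strategy produces; sentences are token lists):

    def take_batch(ranked_sentences, word_budget):
        # Take whole sentences until the word budget is reached;
        # the budget is a lower bound, so the last sentence may
        # push the batch slightly over (budgets: 10, 100, 1000, 10K).
        batch, words = [], 0
        for sent in ranked_sentences:
            if words >= word_budget:
                break
            batch.append(sent)
            words += len(sent)
        return batch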
George: code review of MC math
Peter: automate the Ant build and the Python script for running on the supercomputer.
Peter: code optimization and abstraction
Eric: write and submit draft
Peter: Post the EMNLP results on entropy
George: quickly pull together your existing writings and some brainstorms about what to do next with the MC decoder
George: post results summarizing the performance of the Monte Carlo tagger in estimating P(_t_ | _w_)
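For the write-up, the estimator in question can be sketched as follows, again assuming a hypothetical model.sample(sentence) that draws tag sequences from the posterior:

    from collections import Counter

    def mc_tag_marginals(model, sentence, num_samples=1000):
        # Estimate P(t_i | w) at each position i from the relative
        # frequency of tags across sampled sequences.
        counts = [Counter() for _ in sentence]
        for _ in range(num_samples):
            tags, _logp = model.sample(sentence)  # assumed interface
            for i, t in enumerate(tags):
                counts[i][t] += 1
        return [{t: c / num_samples for t, c in ctr.items()}
                for ctr in counts]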
automatic search for thresholds on MC decoding that yield performance comparable to Viterbi/beam search (a sketch follows below)
search for thresholds on MC decoding that yield a full distribution (measured by entropy) in as little time as possible
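A sketch of the automatic search for the first of these two items, treating the MC decoder's sample count as the threshold and comparing against Viterbi accuracy on held-out data (model.viterbi and model.mc_decode are assumed stand-ins for the shared code-base's decoders):

    def smallest_sufficient_sample_count(model, dev_data, tol=0.002,
                                         grid=(10, 50, 100, 500, 1000)):
        # dev_data: list of (sentence, gold_tags) pairs.
        def accuracy(decode):
            correct = total = 0
            for sent, gold in dev_data:
                pred = decode(sent)
                correct += sum(p == g for p, g in zip(pred, gold))
                total += len(gold)
            return correct / total

        target = accuracy(lambda s: model.viterbi(s)[0]) - tol
        for n in grid:
            if accuracy(lambda s, n=n: model.mc_decode(s, n)) >= target:
                return n  # smallest sample count within tol of Viterbi
        return grid[-1]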
George: post results from using MC decoding in full-sentence QBU
Peter: Build a little Unicode Syriac display app to verify that the plumbing works on the NT data
Peter: Change the active learner to default to 1 sentence of initial training
Peter: Consolidation / refactoring of code
Peter: specify normalization and weighting in the config file, independently of the experiment's name
George: present results on QBU with per-word importance weighting
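A sketch of the per-word importance-weighted QBU score behind these results; the marginals could come from mc_tag_marginals above, and the weighting scheme itself (frequency-based, confidence-based, etc.) is left open:

    import math

    def weighted_qbu_score(marginals, weights):
        # Score = sum_i weights[i] * H(T_i | w), where marginals[i]
        # maps tags to P(t_i | w) and weights[i] is the importance
        # weight for word i (the weighting scheme is an open choice).
        score = 0.0
        for dist, wt in zip(marginals, weights):
            h = -sum(p * math.log(p) for p in dist.values() if p > 0)
            score += wt * h
        return score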
George: finish a 10-entry annotated bibliography on active learning
Peter: Syriac Reader: employ the existing word_TAG reader
Peter: Complete Word/Tag/Not a Tag distinction
Peter: Run the Syriac test involving the monolithic tag
Peter: Measure mutual information of all subtag pairs
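A plug-in estimator for this measurement, assuming each Syriac tag is represented as a tuple of subtags (one tuple per token):

    import math
    from collections import Counter

    def mutual_information(pairs):
        # MI (in bits) between two subtag positions, using relative
        # frequencies from the corpus as plug-in probabilities.
        n = len(pairs)
        joint = Counter(pairs)
        left = Counter(a for a, _ in pairs)
        right = Counter(b for _, b in pairs)
        return sum((c / n) * math.log2((c / n) /
                   ((left[a] / n) * (right[b] / n)))
                   for (a, b), c in joint.items())

    def all_subtag_pair_mi(tokens):
        # tokens: list of subtag tuples; returns MI for every pair
        # of subtag positions.
        k = len(tokens[0])
        return {(i, j): mutual_information([(t[i], t[j]) for t in tokens])
                for i in range(k) for j in range(i + 1, k)}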
Peter: Measure the number (percentage) of tags in the Syriac devtest set not seen in the training set
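A small sketch; since "number (percentage) of tags" could mean tag types or tag tokens, this reports both:

    def unseen_tag_stats(train_tags, devtest_tags):
        # train_tags / devtest_tags: flat lists of tag strings.
        seen = set(train_tags)
        dev_types = set(devtest_tags)
        unseen_types = dev_types - seen
        unseen_tokens = sum(1 for t in devtest_tags if t not in seen)
        return {
            "unseen_types": len(unseen_types),
            "type_pct": 100.0 * len(unseen_types) / len(dev_types),
            "unseen_tokens": unseen_tokens,
            "token_pct": 100.0 * unseen_tokens / len(devtest_tags),
        }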
Peter: Experiment with PTB as unlabeled set; compute informativeness on random sub-samples
Old: the experimental regimen produced four graphs, by x-axis:
x: # of labeled sentences - sentence-at-a-time
x: # of corrected words while labeling sentences
x: # of labeled words - word-at-a-time
x: # of corrected words while labeling words
Ideal: x: total cost (assuming a model of cost in time or $$)
George: Linear-time sequence entropy
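One known route to this (a sketch, not necessarily the intended approach): the posterior of a linear-chain model is itself a Markov chain given _w_, so the chain rule gives

    H(_T_{1..n} | _w_) = H(_T_1 | _w_) + sum_{i=2..n} H(_T_i | _T_{i-1}, _w_)

with H(_T_i | _T_{i-1}, _w_) = - sum_{y,y'} P(_T_{i-1}=y, _T_i=y' | _w_) log P(_T_i=y' | _T_{i-1}=y, _w_). All of the needed pairwise marginals come from a single forward-backward pass, so the whole sequence entropy costs O(nK^2) for n words and K tags - linear in sentence length.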