17 Oct 2007
all words are now rare
baseline (inspired by bug): train model at round 1 (on batch 1) and use it across the experiment
- essentially longest sentence
alternative: stop updating model at round x
- (similar to switching to random selection at round x)
- for some sufficiently large x, should see no disadvantage
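A minimal sketch of the baseline and the alternative above, assuming a generic round-based loop. The callables `train`, `select`, and `annotate` are hypothetical stand-ins supplied by the caller, not existing project code. Setting `stop_updating_at=1` gives the round-1 baseline.

```python
def run_experiment(num_rounds, stop_updating_at, train, select, annotate, pool):
    """Run active-learning rounds, freezing the model after a given round.

    train, select, annotate are hypothetical callables:
      select(model, pool) -> batch of items (model is None before any training)
      annotate(batch)     -> list of labeled items
      train(labeled)      -> model
    """
    labeled = []
    model = None
    for rnd in range(1, num_rounds + 1):
        batch = select(model, pool)
        pool = [x for x in pool if x not in batch]
        labeled.extend(annotate(batch))
        if rnd <= stop_updating_at:
            model = train(labeled)  # no retraining once rnd > stop_updating_at
    return model, labeled
```

Comparing runs over increasing values of `stop_updating_at` is one way to look for the x beyond which freezing shows no disadvantage.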
issue: cost of waiting for computer to select next sample
idea: time-limited sample selection (use stale scores, if time doesn't permit more work)
- time limit could be fixed const. or determined by completion of annotator's work unit.
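A rough sketch of time-limited selection with a fixed time limit, under stated assumptions: `score_fn` is a hypothetical scoring function (higher means more useful to annotate), and `stale_scores` holds scores left over from earlier rounds. Neither name comes from existing code.

```python
import time

def select_with_time_limit(candidates, score_fn, stale_scores, time_limit_s,
                           clock=time.monotonic):
    """Rescore candidates until time runs out, then pick the best available score.

    candidates:   list of hashable item ids
    score_fn:     hypothetical callable, item -> float
    stale_scores: dict of item -> score from earlier rounds (updated in place)
    """
    deadline = clock() + time_limit_s
    rescored = 0
    for item in candidates:
        if clock() >= deadline:
            break  # out of time: remaining items keep their stale scores
        stale_scores[item] = score_fn(item)
        rescored += 1
    # Items that have never been scored rank last.
    best = max(candidates, key=lambda c: stale_scores.get(c, float("-inf")))
    return best, rescored
```

For the other variant, where the limit is set by completion of the annotator's work unit, the deadline check could be replaced by a check on a "work unit done" flag.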
measure cumulative time of experiments
idea: most common words per length of sentence
oracle labeled data split into training and test.
19 Feb. 2008 ALFA Notes from whiteboard
Multiple annotators/annotations
Simple model w/better selection (at first)
More look-ahead w/ simple model
Interactive constrained Viterbi (cost perspective)
3 processes
Proper initialization of prior – convergence of fast maxent (Acero & Chelba)
QBC:
Committee selection
Disagreement metric
Size
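One possible disagreement metric, sketched for reference: vote entropy over the committee's predicted labels. This is a common choice for QBC, not necessarily the one intended on the whiteboard; the function names and data layout are assumptions.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Entropy of the label votes for one item (one vote per committee member)."""
    counts = Counter(votes)
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def pick_most_disputed(committee_votes):
    """committee_votes: dict of item -> list of labels, one per committee member."""
    return max(committee_votes, key=lambda item: vote_entropy(committee_votes[item]))
```

Committee size bounds the metric: with k members the vote entropy is at most log k, so very small committees give coarse rankings.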
Randomize the unannotated set every round
Profile
Net cost = gross cost – annotation time
For 1 cycle of a.l.
Case                        Q_HC      Q_CH
A. both idle (have this)    0         0
Human idle                  \infty    0
No-one idle                 \infty    \infty
Computer idle               0         \infty
Candidate set size
Data/results “commit”
Diagram: edge from H to C and from C to H. Edges annotated by Q_HC and Q_CH, respectively
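A small worked sketch of the net-cost formula for the strictly sequential case (computer selects, then human annotates, each waiting on the other). It assumes gross cost is measured as total elapsed time per round; the function name and the per-round lists are hypothetical.

```python
def cycle_costs(selection_times, annotation_times):
    """Gross and net cost over sequential rounds, in the same time unit.

    gross = all elapsed time (selection + annotation)
    net   = gross - annotation time, i.e. time the annotator spent waiting
    """
    annotation_total = sum(annotation_times)
    gross = sum(selection_times) + annotation_total
    net = gross - annotation_total
    return gross, net
```

For example, two rounds with selection times of 2 and 3 seconds and annotation times of 10 and 12 seconds give a gross cost of 27 seconds and a net cost of 5 seconds.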
6/6/2008 Syriac project meeting and 6/12/2008 Morph. Tagging project meeting
Questions:
1 character per prefix? yes
more than one prefix? 0-3
is a single separator ambiguous in some cases? use 2: one for prefix and one for suffix
multi-purpose interface: active learning, browsing, review for proofing
are we tagging prefixes or just segmenting them?
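A sketch of how the answers above could be checked when reading a segmented token. The separator characters `+` and `-` and the sample tokens are made up for illustration; only the constraints (0-3 prefixes, 1 character each, separate prefix and suffix separators) come from the notes.

```python
PREFIX_SEP = "+"  # hypothetical separator written after each prefix
SUFFIX_SEP = "-"  # hypothetical separator written before the suffix

def split_token(token):
    """Split a segmented token into (prefixes, stem, suffix or None)."""
    *prefixes, rest = token.split(PREFIX_SEP)
    stem, _, suffix = rest.partition(SUFFIX_SEP)
    if len(prefixes) > 3 or any(len(p) != 1 for p in prefixes):
        raise ValueError(f"expected 0-3 single-character prefixes: {token!r}")
    return prefixes, stem, suffix or None
```

With a single shared separator, a token with one boundary could be read as prefix + stem or as stem + suffix; using two distinct separators removes that ambiguity.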
Action:
Brandon: clicking on side words updates attribute-value box's label
Peter: the prefix string is a value tagged by the Syriac tagger. shouldn't be.
Eric: follow up with Harry Diakoff
Auto-complete in little-language box
Support keyboard-only data entry
Update public site for ALFA project
6/13/2008 Syriac project meeting
Prefixes:
dolath (d)
lamad (l)
waw (conj)
prepositions
Suffixes: small, finite number
Idea: Layered / Prioritized Tags
tool should support full expressivity (full tag set)
tool should be configurable to hide layers (reduced tag set)
Configurable: may later want other layers of annotation, other distinctions
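A minimal sketch of layered tags, assuming a full tag is stored as a mapping from layer name to value and that hiding layers is a projection onto the visible ones. The layer names and values below are hypothetical placeholders, not the project's tag set.

```python
# Hypothetical layers, ordered from highest to lowest priority.
DEFAULT_LAYERS = ["pos", "gender", "number", "state"]

def reduce_tag(full_tag, visible_layers, layers=DEFAULT_LAYERS):
    """Project a full tag (dict of layer -> value) onto the visible layers."""
    return {layer: full_tag[layer]
            for layer in layers
            if layer in visible_layers and layer in full_tag}
```

Because the full tag is always stored, adding a layer later only means extending `layers`; existing annotations stay valid and simply lack a value for the new layer.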
Important: supporting linkage of stems to headwords (“lexemes”) in the dictionary
Modes:
Features:
Reveal context: tool-tips should reveal attributes on more distant tokens in any text view
configurable amount of context
situated in the corpus, in a document
ability to browse files: explorer pane on the left
ability to browse dictionary: dictionary pane on the right
ability to link stems to lexemes in the dictionary
ability to add to dictionary with pointer(s) back to corpus for examples
Roles for annotators:
24 June 2008 ALFA project meeting
Questions / variables for user study:
scope of jumps taken by active learner / measure the cost of context switching
article
genre
time period
author
corpus
availability of lexicon
how much context
granularity of annotation:
sentence
phrase
word
word sub-tag
correct or annotated from scratch
presentation of top-N model hypotheses
order of forced annotation: step-by-step in order or jumping (per AL)
Research question:
Separate problem: vowel restoration
Action:
Interface modes:
1. active learning
- machine determines granularity: sentence, phrase, word, sub-tag
2. review mode: sequential order
3. review mode: arbitrary order
4. review mode: AL order
5. browse (no changes)
Date unknown (prior to 8 July 2008)
8 July 2008 ALFA project meeting
Proposal to Harry Diakoff
Paper ideas:
Wait for it! Cost/benefit trade-offs in waiting
probability of datum (later, 15 July 08?, decided to be unpromising based on prior experiments and discussion)
utility / loss as part of active learner
multiple annotators / imperfect annotators
Cost model on the fly
Cost implications of error correction propagation
see Culotta & McCallum
another point on the correction vs. from scratch spectrum
Greedy EVSI
Particle filters for Bayesian models in AL
17 July 2008 Morph Tagging project meeting
Ask Ivan @ MS about Win Server licenses
Re-engage with Marc Carmen
Paul & Brandon do web-based prototype
Paul: GWT & JSF
Brandon: ASP & JSF
Eric: architecture for client/server set-up in Visio diagram
18 July 2008 Syriac project meeting
Features for prototype
Inspect lexical entry
Nestorian font
Font size
Features for review mode in prototype
highlight word in top line
single line of review cells
divider bar separating text (top) from review cells (bottom); ability to move to create more or fewer lines of review cells
allowance for multiple annotations
ability to change
indicator: tagged by human (blue) or machine (yellow)
reveal automatic tag on demand
reviewer can acknowledge blue and yellow tags
reveal levels of confidence on yellow cells - little confidence bars?
editing lens
progress tracker
Features for active learning mode in prototype:
“next” button should be a “done” button when annotating word-at-a-time
sent.-at-a-time: previous word, next word, done
“back” button to return to previous case
highlight
hide file browser
remove review boxes above and below the “lens” row
add path (corpus -> author -> doc) at top of view for context
place the highlighted word in the middle
use yellow & blue highlights on text in text view and in edit controls
constraint and prediction
changing attribute affects lexeme and vice versa
segment, then constrain the lexicon, then attributes
Features for browse mode in prototype
tool tips on every word
link to dictionary
Features for all modes in prototype
22 July 2008 Combined ALFA / Morph. Tagging project meetings
Focus: knowledge-free
Perspective: cost reduction
Partial solutions:
Whole solution:
Reminders on Syriac tagger:
Proposals:
1. predict # of characters in prefix, suffix
2. choose among letter sequences as prefix
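A sketch of the candidate space both proposals would choose from, assuming at most 3 single-character prefixes and at least one character left for the stem. The letter inventory is illustrative (transliterated d, l, w for dolath, lamad, waw); the real inventory and the function name are assumptions.

```python
PREFIX_LETTERS = {"d", "l", "w"}  # illustrative, transliterated; not the full inventory
MAX_PREFIXES = 3

def candidate_prefix_lengths(word, prefix_letters=PREFIX_LETTERS,
                             max_prefixes=MAX_PREFIXES):
    """Return every k such that the first k letters could all be prefixes."""
    lengths = [0]  # "no prefix" is always a candidate
    # Leave at least one character for the stem.
    for k in range(1, min(max_prefixes, len(word) - 1) + 1):
        if word[k - 1] not in prefix_letters:
            break
        lengths.append(k)
    return lengths
```

Under proposal 1 a model would predict one of the returned lengths; under proposal 2 it would choose among the corresponding leading letter sequences `word[:k]`.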