nlp-private:2008-meeting-notes [CS Wiki]

17 Oct 2007
19 Feb. 2008 ALFA Notes from whiteboard
6/6/2008 Syriac project meeting and 6/12/2008 Morph. Tagging project meeting
6/13/2008 Syriac project meeting
24 June 2008 ALFA project meeting
Date unknown (prior to 8 July 2008)
8 July 2008 ALFA project meeting
17 July 2008 Morph Tagging project meeting
18 July 2008 Syriac project meeting
22 July 2008 Combined ALFA / Morph. Tagging project meetings

17 Oct 2007

all words are now rare

baseline (inspired by bug): train model at round 1 (on batch 1) and use it across the experiment - essentially longest sentence

alternative: stop updating model at round x - (similar to switching to random selection at round x) - for some sufficiently large x, should see no disadvantage

issue: cost of waiting for computer to select next sample

idea: time-limited sample selection (use stale scores, if time doesn't permit more work) - time limit could be fixed const. or determined by completion of annotator's work unit.
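
sketch of the time-limited selection idea (not the project's code): rescore candidates until a deadline, then pick the best item using whatever scores exist, fresh or stale. The names `select_next`, `score_fn`, and `time_limit_s` are made up for illustration; the time limit here is a fixed constant, but it could equally be set from the annotator's expected work-unit time.

```python
import time
from typing import Callable, Dict, Hashable, Sequence


def select_next(
    candidates: Sequence[Hashable],
    scores: Dict[Hashable, float],
    score_fn: Callable[[Hashable], float],
    time_limit_s: float,
) -> Hashable:
    """Refresh scores until the time limit, then return the top-scoring candidate.

    `scores` holds scores from earlier rounds (possibly stale) and is updated
    in place. Candidates that have never been scored rank last.
    """
    deadline = time.monotonic() + time_limit_s
    for item in candidates:
        if time.monotonic() >= deadline:
            break  # out of time: remaining candidates keep their stale scores
        scores[item] = score_fn(item)
    return max(candidates, key=lambda item: scores.get(item, float("-inf")))
```

With a time limit of zero this falls back entirely on stale scores, which is close to the "stop updating model at round x" alternative above.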

measure cumulative time of experiments

idea: most common words per length of sentence

oracle labeled data split into training and test.

19 Feb. 2008 ALFA Notes from whiteboard

Multiple annotators/annotations
Simple model w/better selection (at first)
More look-ahead w/ simple model
Interactive constrained Viterbi (cost perspective)
3 processes
Proper initialization of prior – convergence of fast maxent (Acero & Chelba)
QBC:
- Committee selection
- Disagreement metric
- Size
Randomize unannotated set every round
Profile
Net cost = gross cost – annotation time
For 1 cycle of a.l.
- A. both idle (have this)
- Human idle
- No-one idle
- Computer idle
Candidate set size
Data/results “commit”
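
Sketch for the QBC items above, assuming vote entropy as the disagreement metric (one common choice; the notes do not say which metric was intended). Each committee member votes a label for a candidate, and the candidate with the highest vote entropy is selected. `vote_entropy` and `most_disputed` are illustrative names.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def vote_entropy(votes: Sequence[str]) -> float:
    """Entropy (in nats) of the label distribution voted by the committee."""
    n = len(votes)
    counts = Counter(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def most_disputed(committee_votes: Dict[str, Sequence[str]]) -> str:
    """Return the candidate whose committee votes disagree the most."""
    return max(committee_votes, key=lambda k: vote_entropy(committee_votes[k]))
```

Committee size shows up here as the length of each vote list; a larger committee gives a finer-grained disagreement score at a higher computation cost per round.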

Q_HC    Q_CH
0       0
\infty  0
\infty  \infty
0       \infty

First two rows: pay when human waits; no pay when human waits

Diagram: edge from H to C and from C to H. Edges annotated by Q_HC and Q_CH, respectively

6/6/2008 Syriac project meeting and 6/12/2008 Morph. Tagging project meeting

Questions:

1 character per prefix? yes
more than one prefix? 0-3
is a single separator ambiguous in some cases? use 2: one for prefix and one for suffix
multi-purpose interface: active learning, browsing, review for proofing
are we tagging prefixes or just segmenting them?

Action:

Brandon: clicking on side words updates attribute-value box's label
Peter: the prefix string is a value tagged by the Syriac tagger; it shouldn't be
Eric: follow up with Harry Diakoff

Auto-complete in little-language box
Support keyboard-only data entry
Update public site for ALFA project

6/13/2008 Syriac project meeting

Prefixes:

dolath (d)
lamad (l)
waw (conj)
prepositions

Suffixes: small, finite number

Idea: Layered / Prioritized Tags

tool should support full expressivity (full tag set)
tool should be configurable to hide layers (reduced tag set)
Configurable: may later want other layers of annotation, other distinctions

Important: supporting linkage of stems to headwords (“lexemes”) in the dictionary

Modes:

Active learning
Review

Features:

Reveal context: tool-tips should reveal attributes on more distant tokens in any text view
configurable amount of context
situated in the corpus, in a document
ability to browse files: file explorer on the left
ability to browse dictionary: dictionary pane on the right
ability to link stems to lexemes in the dictionary
ability to add to dictionary with pointer(s) back to corpus for examples

Roles for annotators:

editors of dictionary
non-editors - can only propose new entries

24 June 2008 ALFA project meeting

Questions / variables for user study:

scope of jumps taken by active learner / measure the cost of context switching
- article
- genre
- time period
- author
- corpus
availability of lexicon
how much context
- sentence, phrase, KWIC, etc.
granularity of annotation:
- sentence
- phrase
- word
- word sub-tag
correct or annotated from scratch
presentation of top-N model hypotheses
order of forced annotation: step-by-step in order or jumping (per AL)

Research question:

1st item choice
online cost model estimation
which model to select data for user study
- minimize bias of item selector for user study
layers of annotation

Separate problem: vowel restoration

Action:

Hebrew OOV investigation

Interface modes:

1. active learning
- machine determines granularity: sentence, phrase, word, sub-tag
2. review mode: sequential order
3. review mode: arbitrary order
4. review mode: AL order
5. browse (no changes)

Date unknown (prior to 8 July 2008)

Projects to complete:

User study for Syriac
Simulating AL for Syriac
Cost model on the fly
Get Habash/Rambow data for Arabic

8 July 2008 ALFA project meeting

Proposal to Harry Diakoff

Machine learning
Annotation
could be joint with BYU Classics or with Perseus Project
2 pages

Paper ideas:

Wait for it! Cost/benefit trade-offs in waiting
probability of datum - later (15 July 08 ?) decided to be unpromising based on prior experiments and discussion
utility / loss as part of active learner
multiple annotators / imperfect annotators
Cost model on the fly
Cost implications of error correction propagation
see Culotta & McCallum
another point on the correction vs. from scratch spectrum
Greedy EVSI
Particle filters for Bayesian models in AL

17 July 2008 Morph Tagging project meeting

Ask Ivan @ MS about Win Server licenses

Re-engage with Marc Carmen

Paul & Brandon do web-based prototype

Paul: GWT & JSF
Brandon: ASP & JSF

Eric: architecture for client/server set-up in Visio diagram

18 July 2008 Syriac project meeting

Features for prototype

Inspect lexical entry
Nestorian font
Font size

Features for review mode in prototype

highlight word in top line
single line of review cells
divider bar separating text (top) from review cells (bottom); ability to move to create more or fewer lines of review cells
allowance for multiple annotations
ability to change
indicator: tagged by human (blue) or machine (yellow)
reveal auto tag. on demand
reviewer can acknowledge blue and yellow tags
- turns green
- retrains model, as appropriate
reveal levels of confidence on yellow cells - little confidence bars?
editing lens
progress tracker
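
Sketch of the review-cell coloring described above, assuming a cell only needs to track who produced the tag and whether a reviewer has acknowledged it. The class and field names are hypothetical; retraining on acknowledgement is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Source(Enum):
    HUMAN = "blue"      # tagged by human
    MACHINE = "yellow"  # tagged by machine


@dataclass
class ReviewCell:
    tag: str
    source: Source
    confidence: Optional[float] = None  # model confidence, shown on yellow cells
    acknowledged: bool = False

    @property
    def color(self) -> str:
        """Green once acknowledged; otherwise the color of the tag's source."""
        return "green" if self.acknowledged else self.source.value

    def acknowledge(self) -> None:
        """Reviewer accepts the tag as-is (works for blue and yellow cells)."""
        self.acknowledged = True
```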

Features for active learning mode in prototype:

“next” button should be “done button” when word-at-a-time
sent.-at-a-time: previous word, next word, done
“back” button to return to previous case
highlight
hide file browser
remove review boxes above and below the “lens” row
add path (corpus –> author –> doc) at top of view for context
place the highlighted word in the middle
use yellow & blue highlights on text in text view and in edit controls
constraint and prediction
changing attribute affects lexeme and vice versa
segment, then constrain lexicon, then attributes

Features for browse mode in prototype

tool tips on every word
link to dictionary

Features for all modes in prototype

annotator should be able to flag a transcription as possibly erroneous
allow viewing of image?

22 July 2008 Combined ALFA / Morph. Tagging project meetings

Focus: knowledge-free

Perspective: cost reduction

compare machine learning (data-driven approach) with knowledge engineering

Partial solutions:

morph. tagging (as we have defined it - one vector of attributes per token)
segmentation only
vowel restoration

Whole solution:

tag + segment
look up in / link to dictionary
required for Syriac project

Reminders on Syriac tagger:

Re-do without string prediction
Remove vowels and re-do

Proposals:

feature sets for predictive segmentation

1. predict # of characters in prefix, suffix
2. choose among letter sequences as prefix
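
Sketch of how proposal 1 turns into a segmentation, using the answers from the 6/6 meeting (one character per prefix, 0-3 prefixes, separate separators for prefix and suffix). The function names and the separator characters `+` and `-` are placeholders, and the example word is a dummy string, not Syriac.

```python
from typing import List, Tuple

MAX_PREFIXES = 3  # from the 6/6 notes: 0-3 prefixes, one character each


def segment(word: str, n_prefix: int, n_suffix: int) -> Tuple[List[str], str, str]:
    """Split a word given predicted prefix and suffix character counts."""
    if not 0 <= n_prefix <= MAX_PREFIXES:
        raise ValueError("n_prefix must be between 0 and 3")
    if n_suffix < 0 or n_prefix + n_suffix >= len(word):
        raise ValueError("prediction leaves no characters for the stem")
    stem_end = len(word) - n_suffix
    prefixes = list(word[:n_prefix])  # each prefix is a single character
    return prefixes, word[n_prefix:stem_end], word[stem_end:]


def render(prefixes: List[str], stem: str, suffix: str,
           prefix_sep: str = "+", suffix_sep: str = "-") -> str:
    """Write the segmentation with distinct prefix and suffix separators."""
    out = "".join(p + prefix_sep for p in prefixes) + stem
    return out + suffix_sep + suffix if suffix else out
```

For example, `render(*segment("abcdef", 2, 1))` gives `a+b+cde-f`.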

nlp-private/2008-meeting-notes.txt · Last modified: 2015/04/23 14:46 by ryancha
