nlp-private:2008-meeting-notes [2015/04/23 20:46] (current)
ryancha created
 +== 17 Oct 2007 ==
 +
 +
 +all words are now rare
 +
 +baseline (inspired by bug): train model at round 1 (on batch 1) and use it across the experiment
 +- essentially longest sentence
 +
 +alternative: stop updating model at round x
 +- (similar to switching to random selection at round x)
 +- for some sufficiently large x, should see no disadvantage
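
A minimal sketch of the baseline and the "stop updating at round x" alternative in one loop. All names here (`run_active_learning`, `train`, `score`, `stop_updating_at`) are hypothetical and not from the notes, and the first batch is simply taken in pool order because no model exists yet. Setting `stop_updating_at=1` gives the baseline (train once on batch 1, reuse the model throughout).

```python
from typing import Any, Callable, List, Sequence


def run_active_learning(
    pool: Sequence[Any],
    batch_size: int,
    rounds: int,
    stop_updating_at: int,
    train: Callable[[List[Any]], Any],
    score: Callable[[Any, Any], float],
) -> List[Any]:
    """Select batches for `rounds` rounds; retrain only through round `stop_updating_at`."""
    remaining = list(pool)
    labeled: List[Any] = []
    model = None
    for round_no in range(1, rounds + 1):
        if not remaining:
            break
        if model is None:
            batch = remaining[:batch_size]  # no model yet: take items in pool order
        else:
            ranked = sorted(remaining, key=lambda x: score(model, x), reverse=True)
            batch = ranked[:batch_size]
        for item in batch:
            remaining.remove(item)  # equal items are treated as interchangeable
        labeled.extend(batch)
        if round_no <= stop_updating_at:
            model = train(labeled)  # after round x the model is frozen
    return labeled
```

Comparing runs with different `stop_updating_at` values against a run that never stops updating is one way to look for the "sufficiently large x" mentioned above.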
 +
 +issue: cost of waiting for computer to select next sample
 +
 +idea: time-limited sample selection (use stale scores if time doesn't permit more work)
 +- time limit could be a fixed constant or determined by completion of the annotator's work unit.
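
A rough sketch of time-limited selection, assuming scores from an earlier round are kept in a dictionary and rescoring proceeds item by item until a deadline. The function and parameter names are hypothetical; the `clock` argument exists only so the behaviour can be checked with a fake clock.

```python
import time
from typing import Callable, Dict, Hashable, Iterable


def select_with_time_limit(
    candidates: Iterable[Hashable],
    stale_scores: Dict[Hashable, float],
    score_fn: Callable[[Hashable], float],
    time_limit_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Hashable:
    """Return the best-scoring candidate, rescoring as many as time permits."""
    deadline = clock() + time_limit_s
    scores = dict(stale_scores)  # start from the previous round's scores
    for item in candidates:
        if clock() >= deadline:
            break  # out of time: the remaining items keep their stale scores
        scores[item] = score_fn(item)
    # Items that were never scored (no stale score, not reached) are not eligible.
    # Raises ValueError if nothing has a score at all.
    return max(scores, key=scores.get)
```

With a fixed constant, `time_limit_s` is set once; to tie it to the annotator's work unit, the caller would pass whatever time remains until the annotator is expected to finish.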
 +
 +measure cumulative time of experiments
 +
 +idea: most common words per length of sentence
 +
 +oracle labeled data split into training and test.
 +
 +
 +== 19 Feb. 2008 ALFA Notes from whiteboard ==
 +
 +* Multiple annotators/annotations
 +* Simple model w/better selection (at first)
 +* More look-ahead w/ simple model
 +* Interactive constrained Viterbi (cost perspective)
 +* 3 processes
 +* Proper initialization of prior – convergence of fast maxent (Acero & Chelba)
 +* QBC:
 +** Committee selection
 +** Disagreement metric
 +** Size
 +* Randomization of unannotated set every round
 +* Profile
 +* Net cost = gross cost – annotation time
 +* For 1 cycle of a.l.
 +** A. both idle (have this)
 +** Human idle
 +** No-one idle
 +** Computer idle
 +* Candidate set size
 +* Data/results “commit”
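
For the QBC disagreement-metric item above: the notes do not say which metric was under discussion. Vote entropy is one common choice and is sketched here purely as an illustration; the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def vote_entropy(votes: Sequence[Hashable]) -> float:
    """Entropy (in nats) of the labels voted by the committee for one item."""
    total = len(votes)
    counts = Counter(votes)
    # Zero when the committee is unanimous; grows as the votes spread out.
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

Each entry in `votes` is the label one committee member assigns to the item, so committee size is just `len(votes)`. Items with higher vote entropy would be selected first.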
 +
 +^ Q_HC ^ Q_CH ^
 +| 0 | 0 |
 +| \infty | 0 |
 +| \infty | \infty |
 +| 0 | \infty |
 +
 +* First two rows: pay when human waits; no pay when human waits
 +
 +Diagram: edge from H to C and from C to H.  Edges annotated by Q_HC and Q_CH, respectively
 +|Q_HC| = batch size
 +|Q_CH| = candidate set size – could have stale scores
 +
 +
 +== 6/6/2008 Syriac project meeting and 6/12/2008 Morph. Tagging project meeting ==
 +
 +Questions:
 +* 1 character per prefix? yes
 +* more than one prefix? 0-3
 +* is a single separator ambiguous in some cases? use 2: one for prefix and one for suffix
 +* multi-purpose interface: active learning, browsing, review for proofing
 +* are we tagging prefixes or just segmenting them?
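
A small sketch of what the answers above imply for segmentation: 0-3 one-character prefixes, with one separator for prefixes and a different one for the suffix. The separator characters, the function name, and the prefix and suffix inventories passed in are all assumptions for illustration.

```python
from typing import FrozenSet, List

PREFIX_SEP = "+"  # hypothetical marker written after each prefix
SUFFIX_SEP = "-"  # hypothetical marker written before the suffix
MAX_PREFIXES = 3  # the notes allow 0-3 one-character prefixes


def candidate_segmentations(
    token: str, prefix_letters: FrozenSet[str], suffixes: FrozenSet[str]
) -> List[str]:
    """List every marked-up reading of token with 0-3 prefixes and an optional suffix."""
    results: List[str] = []
    for n in range(MAX_PREFIXES + 1):
        if n >= len(token) or any(ch not in prefix_letters for ch in token[:n]):
            break  # a longer prefix run is impossible once this one fails
        rest = token[n:]
        head = "".join(ch + PREFIX_SEP for ch in token[:n])
        results.append(head + rest)  # reading with no suffix
        for suf in sorted(suffixes):
            if suf and rest.endswith(suf) and len(rest) > len(suf):
                results.append(head + rest[: -len(suf)] + SUFFIX_SEP + suf)
    return results
```

Because the two separators differ, a reading such as `w+lmlk-h` (a made-up transliterated string) cannot be confused with one where the same character marks both boundaries.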
 +
 +Action:
 +Brandon: clicking on side words updates attribute-value box's label
 +Peter: the prefix string is a value tagged by the Syriac tagger; it shouldn't be.
 +Eric: follow up with Harry Diakoff
 +* Auto-complete in little-language box
 +* Support keyboard-only data entry
 +* Update public site for ALFA project
 +
 +
 +
 +== 6/13/2008 Syriac project meeting ==
 +
 +
 +Prefixes:
 +* dolath (d)
 +* lamad (l)
 +* waw (conj)
 +* prepositions
 +
 +Suffixes: small, finite number
 +
 +Idea: Layered / Prioritized Tags
 +* tool should support full expressivity (full tag set)
 +* tool should be configurable to hide layers (reduced tag set)
 +* Configurable: may later want other layers of annotation, other distinctions
 +
 +Important: supporting linkage of stems to headwords ("lexemes") in the dictionary
 +
 +Modes:
 +* Active learning
 +* Review
 +
 +Features:
 +* Reveal context: tool-tips should reveal attributes on more distant tokens in any text view
 +* configurable amount of context
 +* situated in the corpus, in a document
 +* ability to browse files: explorer pane on the left
 +* ability to browse dictionary: dictionary pane on the right
 +* ability to link stems to lexemes in the dictionary
 +* ability to add to dictionary with pointer(s) back to corpus for examples
 +
 +Roles for annotators:
 +* editors of dictionary
 +* non-editors - can only propose new entries
 +
 +
 +
 +== 24 June 2008 ALFA project meeting ==
 +
 +
 +Questions / variables for user study:
 +
 +* scope of jumps taken by active learner / measure the cost of context switching
 +** article
 +** genre
 +** time period
 +** author
 +** corpus
 +* availability of lexicon
 +* how much context
 +** sentence, phrase, KWIC, etc.
 +* granularity of annotation:
 +** sentence
 +** phrase
 +** word
 +** word sub-tag
 +* correct or annotated from scratch
 +* presentation of top-N model hypotheses
 +* order of forced annotation: step-by-step in order or jumping (per AL)
 +
 +Research question:
 +* 1st item choice
 +* online cost model estimation
 +* which model to select data for user study
 +** minimize bias of item selector for user study
 +* layers of annotation
 +
 +Separate problem: vowel restoration
 +
 +Action:
 +* Hebrew OOV investigation
 +
 +Interface modes:
 +1. active learning
 +** machine determines granularity: sentence, phrase, word, sub-tag
 +2. review mode: sequential order
 +3. review mode: arbitrary order
 +4. review mode: AL order
 +5. browse (no changes)
 +
 +
 +
 +== Date unknown (prior to 8 July 2008) ==
 +
 +
 +Projects to complete:
 +* User study for Syriac
 +* Simulating AL for Syriac
 +* Cost model on the fly
 +* Get Habash/Rambow data for Arabic
 +
 +
 +
 +== 8 July 2008 ALFA project meeting ==
 +
 +
 +Proposal to Harry Diakoff
 +* Machine learning
 +* Annotation
 +* could be joint with BYU Classics or with Perseus Project
 +* 2 pages
 +
 +Paper ideas:
 +* Wait for it!  Cost/benefit trade-offs in waiting
 +* probability of datum - later (15 July 08 ?) decided to be unpromising based on prior experiments and discussion
 +* utility / loss as part of active learner
 +* multiple annotators / imperfect annotators
 +* Cost model on the fly
 +* Cost implications of error correction propagation
 +** see Culotta & McCallum
 +** another point on the correction vs. from-scratch spectrum
 +* Greedy EVSI
 +* Particle filters for Bayesian models in AL
 +
 +
 +
 +== 17 July 2008 Morph Tagging project meeting ==
 +
 +
 +Ask Ivan @ MS about Win Server licenses
 +
 +Re-engage with Marc Carmen
 +
 +Paul & Brandon do web-based prototype
 +* Paul: GWT & JSF
 +* Brandon: ASP & JSF
 +
 +Eric: architecture for client/server set-up in Visio diagram
 +
 +
 +
 +== 18 July 2008 Syriac project meeting ==
 +
 +
 +Features for prototype
 +* Inspect lexical entry
 +* Nestorian font
 +* Font size
 +
 +Features for review mode in prototype
 +* highlight word in top line
 +* single line of review cells
 +* divider bar separating text (top) from review cells (bottom); ability to move to create more or fewer lines of review cells
 +* allowance for multiple annotations
 +* ability to change
 +* indicator: tagged by human (blue) or machine (yellow)
 +* reveal auto tag. on demand
 +* reviewer can acknowledge blue and yellow tags
 +** turns green
 +** retrains model, as appropriate
 +* reveal levels of confidence on yellow cells - little confidence bars?
 +* editing lens
 +* progress tracker
 +
 +Features for active learning mode in prototype:
 +* "next" button should be a "done" button when word-at-a-time
 +* sent.-at-a-time: previous word, next word, done
 +* "back" button to return to previous case
 +* highlight
 +* hide file browser
 +* remove review boxes above and below the "​lens"​ row
 +* add path (corpus --> author --> doc) at top of view for context
 +* place the highlighted word in the middle
 +* use yellow & blue highlights on text in text view and in edit controls
 +* constraint and prediction
 +* changing attribute affects lexeme and vice versa
 +* segment, then constrain lexicon, then attributes
 +
 +Features for browse mode in prototype
 +* tool tips on every word
 +* link to dictionary
 +
 +Features for all modes in prototype
 +* annotator should be able to flag a transcription as possibly erroneous
 +* allow viewing of image?
 +
 +
 +
 +== 22 July 2008 Combined ALFA / Morph. Tagging project meetings ==
 +
 +
 +Focus: knowledge-free
 +
 +Perspective: cost reduction
 +* compare machine learning (data-driven approach) with knowledge engineering
 +
 +Partial solutions:
 +* morph. tagging (as we have defined it - one vector of attributes per token)
 +* segmentation only
 +* vowel restoration
 +
 +Whole solution:
 +* tag + segment
 +* look up in / link to dictionary
 +* required for Syriac project
 +
 +Reminders on Syriac tagger:
 +* Re-do without string prediction
 +* Remove vowels and re-do
 +
 +Proposals:
 +* feature sets for predictive segmentation
 +1. predict # of characters in prefix, suffix
 +2. choose among letter sequences as prefix
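
To illustrate proposal 1 above: if a classifier predicts the number of prefix and suffix characters, the segmentation follows directly from the two counts. This is only a sketch; `split_by_lengths` is a hypothetical helper and the predictor itself is not shown.

```python
from typing import Tuple


def split_by_lengths(token: str, prefix_len: int, suffix_len: int) -> Tuple[str, str, str]:
    """Cut token into (prefix, stem, suffix) given predicted character counts."""
    if prefix_len < 0 or suffix_len < 0 or prefix_len + suffix_len > len(token):
        raise ValueError("predicted lengths do not fit the token")
    stem_end = len(token) - suffix_len
    return token[:prefix_len], token[prefix_len:stem_end], token[stem_end:]
```

Framing the task this way turns segmentation into two small classification problems, whereas proposal 2 would instead score candidate prefix strings directly.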
  
nlp-private/2008-meeting-notes.txt · Last modified: 2015/04/23 20:46 by ryancha