Table of Contents
April 28, 2009
May 5, 2009
May 12, 2009
May 19, 2009
May 26, 2009
May 31 - June 6, 2009
June 9, 2009
June 16, 2009
… to the ALFA project.
April 28, 2009
(continued on 4/30/09)
Topic: Welcome
Topic: Intro. to Probability Theory
Presenter: Eric Ringger
Reading assignment: Manning & Schuetze 2.1, 2.2, 3, 4
Reading assignment: Russell & Norvig 14.1-14.4
Homework:
https://cswiki.cs.byu.edu/cs479/index.php/Homework_0.1
Optional homework:
https://cswiki.cs.byu.edu/cs479/index.php/Homework_0.2
May 5, 2009
Topic: Word Sense Disambiguation as motivation for Feature Engineering
Presenter: Eric Ringger
Topic: Feature Engineering Console
Presenter: Josh Hansen
Topic: Maximum Entropy Models
Presenter: Peter McClanahan
Reading assignment: M&S 7, M&S 16
Optional reading assignment:
Berger's MaxEnt tutorial
Homework:
https://cswiki.cs.byu.edu/cs479/index.php/Project_2.2
BUT: Use the Feature Engineering Console!
http://nlp.cs.byu.edu/mediawiki-private/index.php/Feature_Engineering_Console
(on the Private wiki – BYU NLP only – requires authentication)
Write as little extra code as possible. Possible exceptions: new feature templates/extractors.
Work with Josh Hansen if you want to improve the FEC itself.
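As background for the maximum entropy session, a minimal sketch of a conditional maximum entropy (log-linear) model is given below. This is an illustration only, not the course code-base or the Feature Engineering Console; the function and feature names are made up for the example.

```python
import math

def maxent_distribution(active_features, weights, labels):
    """Conditional maximum-entropy (log-linear) model:
    p(y | x) is proportional to exp( sum_i lambda_i * f_i(x, y) ).
    `active_features(y)` lists the binary features that fire for the
    current context x paired with candidate label y; `weights` maps
    feature name -> lambda_i (unknown features get weight 0)."""
    scores = {y: math.exp(sum(weights.get(f, 0.0) for f in active_features(y)))
              for y in labels}
    z = sum(scores.values())  # partition function (normalizer)
    return {y: s / z for y, s in scores.items()}
```

For word sense disambiguation, x would be the context of the ambiguous word, and the features would be instantiated from templates like those the Feature Engineering Console manages.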
May 12, 2009
Topic: Active Learning
Presenter: Robbie Haertel
Reading assignment:
Survey of Active Learning by Burr Settles
Homework:
Implement one active learning selection function
Reference:
http://nlp.cs.byu.edu/mediawiki/index.php/Using_the_active_learner
Plot learning curve for chosen function, versus random, using Gnuplot or Excel
Work with Robbie Haertel to bring the plotting code back to life
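One candidate for the "implement one selection function" homework is entropy-based uncertainty sampling. The sketch below assumes the model exposes a predictive distribution per instance; the function and parameter names are hypothetical and are not taken from the active learner linked above.

```python
import math

def prediction_entropy(dist):
    """Shannon entropy (in nats) of a predictive distribution,
    given as a dict mapping label -> probability."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)

def select_most_uncertain(unlabeled, predict_proba, batch_size):
    """Uncertainty sampling: score each unlabeled instance by the
    entropy of the model's predictive distribution and return the
    batch_size highest-entropy (most uncertain) instances."""
    ranked = sorted(unlabeled,
                    key=lambda x: prediction_entropy(predict_proba(x)),
                    reverse=True)
    return ranked[:batch_size]
```

A random baseline for the learning-curve comparison is just a random sample of `batch_size` instances from the same pool.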
May 19, 2009
Topic: Sequence Labeling
Presenter: George Busby
Reading assignment: M&S 9, M&S 10
Reading assignment:
Paper by Toutanova & Manning on MEMMs
Optional reading assignment:
Paper on TnT by Brants
Homework:
Continue on Active Learning experiments
Focus on PNP classification task
Plot the means of multiple (around 5) random runs over the number of iterations.
It would also be interesting to plot the variances of those random runs over the number of iterations.
Try the POS tagging task with a small batch size B and number of iterations N such that N × B ≈ 300.
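The per-iteration means and variances for the plots above can be computed as in this small sketch (one accuracy value per iteration per run); the actual plotting is left to Gnuplot or Excel as suggested.

```python
def mean_curve(runs):
    """Pointwise mean of several learning curves.
    `runs` is a list of equal-length lists, one accuracy value
    per iteration per random run."""
    n = len(runs)
    return [sum(vals) / n for vals in zip(*runs)]

def variance_curve(runs):
    """Pointwise (population) variance of the same runs, useful for
    seeing how stable random selection is at each iteration."""
    n = len(runs)
    out = []
    for vals in zip(*runs):
        m = sum(vals) / n
        out.append(sum((v - m) ** 2 for v in vals) / n)
    return out
```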
May 26, 2009
Topic: Intro. to StatNLP Code-base
Presenter: Robbie Haertel
Reading assignment:
our ALFA paper at the LAW (Linguistic Annotation Workshop)
Reading assignment:
Tomanek et al., "An Approach to Text Corpus Construction which Cuts Annotation Costs and Maintains Reusability of Annotated Data"
Homework:
https://cswiki.cs.byu.edu/cs479/index.php/Project_3.1
May 31 - June 6, 2009
NAACL HLT 2009
Workshop on Active Learning for NLP
June 9, 2009
Topic: Named Entity Recognition
Presenter: Eric Ringger
Reading assignment:
Klein et al. paper from 2003: "Named Entity Recognition with Character-Level Models"
Reading assignment:
McCallum et al. paper in CoNLL 2003: "Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons"
Reading assignment:
Ratinov and Roth paper in CoNLL 2009: "Design Challenges and Misconceptions in Named Entity Recognition"
Homework:
Data: CoNLL 2003 Named Entity shared task data set:
http://www.cnts.ua.ac.be/conll2003/ner/
Baseline: dictionary look-up method on CoNLL named entity recognition shared task data
dictionary is simply the list of named entities extracted from the training set
Baseline: MEMM for Named Entity Recognition on the CoNLL data
Improve on this by doing error analysis and feature engineering, as you did for the POS tagging task
Run both methods (dictionary look-up and MEMM) on noisy OCR data
Coordinate with Thomas Packer for noisy OCR data (esp. the labeled dev test set)
Private wiki site for the noisy OCR data:
http://nlp.cs.byu.edu/mediawiki-private/index.php/Ancestry_dot_Com
Pick one 3rd-party tool (distinct from other students) from the list of open source tools on Wikipedia:
http://en.wikipedia.org/wiki/Named_entity_recognition
Prefer one of the following:
Stanford Named Entity tagger
CCG group at UIUC: Named Entity + semantic role-labeling tagger
Mallet from U. Mass. Amherst
Run 3rd-party tool on CoNLL data and noisy OCR data
Report results
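A minimal sketch of the dictionary look-up baseline described above, assuming BIO-style labels as in the CoNLL 2003 data. The names and the longest-match tagging strategy are my own choices for the illustration, not a prescribed implementation.

```python
def build_dictionary(training_sents):
    """Collect every labeled entity span from BIO-annotated training
    sentences. `training_sents` is a list of (tokens, labels) pairs;
    the result maps a token tuple to its entity type."""
    entities = {}
    for tokens, labels in training_sents:
        i = 0
        while i < len(tokens):
            if labels[i].startswith("B-"):
                etype = labels[i][2:]
                j = i + 1
                while j < len(tokens) and labels[j] == "I-" + etype:
                    j += 1
                entities[tuple(tokens[i:j])] = etype
                i = j
            else:
                i += 1
    return entities

def tag_with_dictionary(tokens, entities, max_len=5):
    """Greedy longest-match dictionary look-up; tokens not covered
    by any dictionary entry are labeled 'O'."""
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            etype = entities.get(tuple(tokens[i:i + span]))
            if etype is not None:
                labels[i] = "B-" + etype
                for k in range(i + 1, i + span):
                    labels[k] = "I-" + etype
                i += span
                break
        else:
            i += 1
    return labels
```

Running this baseline on the noisy OCR data should make its brittleness obvious: any OCR error inside an entity string defeats the exact-match look-up.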
June 16, 2009
Topic: User Study and Regression Results
Presenter: Kevin Seppi
Reading assignment:
our LREC Paper
Reading assignment:
Shilpa Arora et al.'s paper at ALNLP 2009
Homework:
Reproduce the regression results from the LREC paper using R
Apply methods from Arora's study to our data using SVM Regression
over-all cost model
per-subject cost model
per-subject-type cost model
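The homework proper calls for R and SVM regression; as a point of reference for what a cost model is, the sketch below fits an over-all one-predictor model (e.g. annotation time vs. sentence length) by ordinary least squares. The variable names are illustrative only, and the per-subject and per-subject-type variants would simply fit the same coefficients separately on each annotator's (or annotator type's) data.

```python
def fit_linear_cost(lengths, times):
    """Ordinary least squares for a one-predictor cost model:
    predicted_time = a * length + b.
    `lengths` are the predictor values (e.g. tokens per sentence)
    and `times` the observed annotation costs."""
    n = len(lengths)
    mx = sum(lengths) / n
    my = sum(times) / n
    sxx = sum((x - mx) ** 2 for x in lengths)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lengths, times))
    a = sxy / sxx
    b = my - a * mx
    return a, b
```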