Table of Contents
April 28, 2009
May 5, 2009
May 12, 2009
May 19, 2009
May 26, 2009
May 31 - June 6, 2009
June 9, 2009
June 16, 2009
… to the ALFA project.
April 28, 2009
(continued on 4/30/09)
Topic: Welcome
Topic: Intro. to Probability Theory
Presenter: Eric Ringger
Reading assignment: Manning & Schuetze 2.1, 2.2, 3, 4
Reading assignment: Russell & Norvig 14.1-14.4
Homework:
https://cswiki.cs.byu.edu/cs479/index.php/Homework_0.1
Optional homework:
https://cswiki.cs.byu.edu/cs479/index.php/Homework_0.2
May 5, 2009
Topic: Word Sense Disambiguation as motivation for Feature Engineering
Presenter: Eric Ringger
Topic: Feature Engineering Console
Presenter: Josh Hansen
Topic: Maximum Entropy Models
Presenter: Peter McClanahan
Reading assignment: M&S 7, M&S 16
Optional reading assignment:
Berger's MaxEnt tutorial
Homework:
https://cswiki.cs.byu.edu/cs479/index.php/Project_2.2
BUT: Use the Feature Engineering Console!
http://nlp.cs.byu.edu/mediawiki-private/index.php/Feature_Engineering_Console
(on the Private wiki – BYU NLP only – requires authentication)
Write as little extra code as possible. Possible exceptions: new feature templates/extractors.
Work with Josh Hansen if you want to improve the FEC itself.
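As background for the maximum entropy session, a minimal sketch of a conditional maximum entropy (log-linear) model is given below. This is an illustration only, not the course code-base or the Feature Engineering Console; the function and feature names are made up for the example.

```python
import math

def maxent_distribution(active_features, weights, labels):
    """Conditional maximum-entropy (log-linear) model:
    p(y | x) is proportional to exp( sum_i lambda_i * f_i(x, y) ).
    `active_features(y)` lists the binary features that fire for the
    current context x paired with candidate label y; `weights` maps
    feature name -> lambda_i (unknown features get weight 0)."""
    scores = {y: math.exp(sum(weights.get(f, 0.0) for f in active_features(y)))
              for y in labels}
    z = sum(scores.values())  # partition function (normalizer)
    return {y: s / z for y, s in scores.items()}
```

For word sense disambiguation, x would be the context of the ambiguous word, and the features would be instantiated from templates like those the Feature Engineering Console manages.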
May 12, 2009
Topic: Active Learning
Presenter: Robbie Haertel
Reading assignment:
Survey of Active Learning by Burr Settles
Homework:
Implement one active learning selection function
Reference:
http://nlp.cs.byu.edu/mediawiki/index.php/Using_the_active_learner
Plot learning curve for chosen function, versus random, using Gnuplot or Excel
Work with Robbie Haertel to bring the plotting code back to life
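One candidate for the "implement one selection function" homework is entropy-based uncertainty sampling. The sketch below assumes the model exposes a predictive distribution per instance; the function and parameter names are hypothetical and are not taken from the active learner linked above.

```python
import math

def prediction_entropy(dist):
    """Shannon entropy (in nats) of a predictive distribution,
    given as a dict mapping label -> probability."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)

def select_most_uncertain(unlabeled, predict_proba, batch_size):
    """Uncertainty sampling: score each unlabeled instance by the
    entropy of the model's predictive distribution and return the
    batch_size highest-entropy (most uncertain) instances."""
    ranked = sorted(unlabeled,
                    key=lambda x: prediction_entropy(predict_proba(x)),
                    reverse=True)
    return ranked[:batch_size]
```

A random baseline for the learning-curve comparison is just a random sample of `batch_size` instances from the same pool.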
May 19, 2009
Topic: Sequence Labeling
Presenter: George Busby
Reading assignment: M&S 9, M&S 10
Reading assignment:
Paper by Toutanova & Manning on MEMMs
Optional reading assignment:
Paper on TnT by Brants
Homework:
Continue on Active Learning experiments
Focus on PNP classification task
Plot the means of multiple (around 5) random runs over the number of iterations.
It would also be interesting to plot the variances of those random runs over the number of iterations.
Try the POS tagging task with a small batch size B and number of iterations N such that N × B ≈ 300.
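The per-iteration means and variances for the plots above can be computed as in this small sketch (one accuracy value per iteration per run); the actual plotting is left to Gnuplot or Excel as suggested.

```python
def mean_curve(runs):
    """Pointwise mean of several learning curves.
    `runs` is a list of equal-length lists, one accuracy value
    per iteration per random run."""
    n = len(runs)
    return [sum(vals) / n for vals in zip(*runs)]

def variance_curve(runs):
    """Pointwise (population) variance of the same runs, useful for
    seeing how stable random selection is at each iteration."""
    n = len(runs)
    out = []
    for vals in zip(*runs):
        m = sum(vals) / n
        out.append(sum((v - m) ** 2 for v in vals) / n)
    return out
```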
May 26, 2009
Topic: Intro. to StatNLP Code-base
Presenter: Robbie Haertel
Reading assignment:
our ALFA paper at the LAW (Linguistic Annotation Workshop)
Reading assignment:
Tomanek et al., "An Approach to Text Corpus Construction which Cuts Annotation Costs and Maintains Reusability of Annotated Data"
Homework:
https://cswiki.cs.byu.edu/cs479/index.php/Project_3.1
May 31 - June 6, 2009
NAACL HLT 2009
Workshop on Active Learning for NLP
June 9, 2009
Topic: Named Entity Recognition
Presenter: Eric Ringger
Reading assignment:
Klein et al. paper from 2003: "Named Entity Recognition with Character-Level Models"
Reading assignment:
McCallum et al. paper in CoNLL 2003: "Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons"
Reading assignment:
Ratinov and Roth paper in CoNLL 2009: "Design Challenges and Misconceptions in Named Entity Recognition"
Homework:
Data: CoNLL 2003 Named Entity shared task data set:
http://www.cnts.ua.ac.be/conll2003/ner/
Baseline: dictionary look-up method on CoNLL named entity recognition shared task data
dictionary is simply the list of named entities extracted from the training set
Baseline: MEMM for Named Entity Recognition on the CoNLL data
Improve on this by doing error analysis and feature engineering, as you did for the POS tagging task
Run both methods (dictionary look-up and MEMM) on noisy OCR data
Coordinate with Thomas Packer for noisy OCR data (esp. the labeled dev test set)
Private wiki site for the noisy OCR data:
http://nlp.cs.byu.edu/mediawiki-private/index.php/Ancestry_dot_Com
Pick one 3rd-party tool (distinct from other students) from the list of open source tools on Wikipedia:
http://en.wikipedia.org/wiki/Named_entity_recognition
Prefer one of the following:
Stanford Named Entity tagger
CCG group at UIUC: Named Entity + semantic role-labeling tagger
Mallet from U. Mass. Amherst
Run 3rd-party tool on CoNLL data and noisy OCR data
Report results
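A minimal sketch of the dictionary look-up baseline described above, assuming BIO-style labels as in the CoNLL 2003 data. The names and the longest-match tagging strategy are my own choices for the illustration, not a prescribed implementation.

```python
def build_dictionary(training_sents):
    """Collect every labeled entity span from BIO-annotated training
    sentences. `training_sents` is a list of (tokens, labels) pairs;
    the result maps a token tuple to its entity type."""
    entities = {}
    for tokens, labels in training_sents:
        i = 0
        while i < len(tokens):
            if labels[i].startswith("B-"):
                etype = labels[i][2:]
                j = i + 1
                while j < len(tokens) and labels[j] == "I-" + etype:
                    j += 1
                entities[tuple(tokens[i:j])] = etype
                i = j
            else:
                i += 1
    return entities

def tag_with_dictionary(tokens, entities, max_len=5):
    """Greedy longest-match dictionary look-up; tokens not covered
    by any dictionary entry are labeled 'O'."""
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            etype = entities.get(tuple(tokens[i:i + span]))
            if etype is not None:
                labels[i] = "B-" + etype
                for k in range(i + 1, i + span):
                    labels[k] = "I-" + etype
                i += span
                break
        else:
            i += 1
    return labels
```

Running this baseline on the noisy OCR data should make its brittleness obvious: any OCR error inside an entity string defeats the exact-match look-up.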
June 16, 2009
Topic: User Study and Regression Results
Presenter: Kevin Seppi
Reading assignment:
our LREC Paper
Reading assignment:
Shilpa Arora et al.'s paper at ALNLP 2009
Homework:
Reproduce the regression results from the LREC paper using R
Apply methods from Arora's study to our data using SVM Regression
over-all cost model
per-subject cost model
per-subject-type cost model
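The homework proper calls for R and SVM regression; as a point of reference for what a cost model is, the sketch below fits an over-all one-predictor model (e.g. annotation time vs. sentence length) by ordinary least squares. The variable names are illustrative only, and the per-subject and per-subject-type variants would simply fit the same coefficients separately on each annotator's (or annotator type's) data.

```python
def fit_linear_cost(lengths, times):
    """Ordinary least squares for a one-predictor cost model:
    predicted_time = a * length + b.
    `lengths` are the predictor values (e.g. tokens per sentence)
    and `times` the observed annotation costs."""
    n = len(lengths)
    mx = sum(lengths) / n
    my = sum(times) / n
    sxx = sum((x - mx) ** 2 for x in lengths)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lengths, times))
    a = sxy / sxx
    b = my - a * mx
    return a, b
```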