... to the ALFA project.

== April 28, 2009 ==
(continued on 4/30/09)
* Topic: Welcome
* Topic: Intro. to Probability Theory
** Presenter: Eric Ringger
* Reading assignment: Manning & Schuetze 2.1, 2.2, 3, 4
* Reading assignment: Russell & Norvig 14.1-14.4
* Homework: https://cswiki.cs.byu.edu/cs479/index.php/Homework_0.1
* Optional homework: https://cswiki.cs.byu.edu/cs479/index.php/Homework_0.2

== May 5, 2009 ==
* Topic: Word Sense Disambiguation as motivation for Feature Engineering
** Presenter: Eric Ringger
* Topic: Feature Engineering Console
** Presenter: Josh Hansen
* Topic: Maximum Entropy Models
** Presenter: Peter McClanahan
* Reading assignment: M&S 7, M&S 16
* Optional reading assignment: [http://www.cs.cmu.edu/afs/cs/user/aberger/www/html/tutorial/tutorial.html Berger's MaxEnt tutorial]
* Homework: https://cswiki.cs.byu.edu/cs479/index.php/Project_2.2
** BUT: Use the Feature Engineering Console! http://nlp.cs.byu.edu/mediawiki-private/index.php/Feature_Engineering_Console (on the private wiki -- BYU NLP only -- requires authentication)
** Write as little extra code as possible. Possible exceptions: new feature templates/extractors.
** Work with Josh Hansen if you want to improve the FEC itself.
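The maximum entropy models covered above (see Berger's tutorial) can be sketched compactly for the binary case, where a maxent model reduces to logistic regression over binary features. The word-sense data and feature names below are hypothetical toy examples, not the course data or code-base; this is a minimal sketch of training by gradient ascent on the conditional log-likelihood.

```python
import math

# Toy word-sense-disambiguation data: each instance is a set of active
# binary features (hypothetical names), labeled with one of two senses
# of "bank" (0 = financial institution, 1 = river bank).
DATA = [
    ({"prev=the", "ctx=money"}, 0),
    ({"prev=the", "ctx=loan"}, 0),
    ({"ctx=river", "prev=a"}, 1),
    ({"ctx=water", "prev=the"}, 1),
]
FEATURES = sorted({f for feats, _ in DATA for f in feats})

def prob(weights, feats):
    """P(sense=1 | feats) under a binary log-linear (maxent) model."""
    score = sum(weights[f] for f in feats)
    return 1.0 / (1.0 + math.exp(-score))

def train(data, iters=200, lr=0.5):
    """Maximize the conditional log-likelihood by stochastic gradient ascent."""
    weights = {f: 0.0 for f in FEATURES}
    for _ in range(iters):
        for feats, label in data:
            error = label - prob(weights, feats)  # observed minus expected
            for f in feats:
                weights[f] += lr * error
    return weights

weights = train(DATA)
print(prob(weights, {"ctx=river", "prev=a"}))   # close to 1.0 (river sense)
print(prob(weights, {"ctx=money", "prev=the"})) # close to 0.0 (money sense)
```

A real maxent tagger would add many classes, feature templates, and regularization; the Feature Engineering Console is meant to supply the templates so you write little code of your own.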
== May 12, 2009 ==
* Topic: Active Learning
** Presenter: Robbie Haertel
* Reading assignment: [http://pages.cs.wisc.edu/~bsettles/pub/settles.activelearning.pdf Survey of Active Learning by Burr Settles]
* Homework:
** Implement one active learning selection function
** Reference: http://nlp.cs.byu.edu/mediawiki/index.php/Using_the_active_learner
** Plot the learning curve for your chosen function versus random selection, using Gnuplot or Excel
** Work with Robbie Haertel to bring the plotting code back to life

== May 19, 2009 ==
* Topic: Sequence Labeling
** Presenter: George Busby
* Reading assignment: M&S 9, M&S 10
* Reading assignment: [http://faculty.cs.byu.edu/~ringger/CS401R/papers/ToutanovaManning_POS-emnlp2000.pdf Paper by Toutanova & Manning on MEMMs]
* Optional reading assignment: [http://faculty.cs.byu.edu/~ringger/CS401R/papers/Brants_POS-00tnt.pdf Paper on TnT by Brants]
* Homework:
** Continue the Active Learning experiments
** Focus on the PNP classification task
*** Plot the mean of multiple (around 5) random runs over the number of iterations.
*** It would also be interesting to plot the variance of those random runs over the number of iterations.
** Try the POS tagging task with a small batch size B and number of iterations N, such that N x B is approximately 300

== May 26, 2009 ==
* Topic: Intro. to the StatNLP Code-base
** Presenter: Robbie Haertel
* Reading assignment: [http://aclweb.org/anthology-new/W/W07/W07-1516.pdf our ALFA paper at the LAW]
* Reading assignment: [http://aclweb.org/anthology-new/D/D07/D07-1051.pdf Tomanek et al., "An Approach to Text Corpus Construction which Cuts Annotation Costs and Maintains Reusability of Annotated Data"]
* Homework: https://cswiki.cs.byu.edu/cs479/index.php/Project_3.1

== May 31 - June 6, 2009 ==
* [http://www.naacl2009.org NAACL HLT 2009]
* [http://nlp.cs.byu.edu/alnlp/ Workshop on Active Learning for NLP]

== June 9, 2009 ==
* Topic: Named Entity Recognition
** Presenter: Eric Ringger
* Reading assignment: [http://nlp.stanford.edu/pubs/conll-ner.pdf Klein et al. paper from 2003: "Named Entity Recognition with Character-Level Models"]
* Reading assignment: [http://www.cs.umass.edu/~mccallum/papers/mccallum-conll2003.pdf McCallum et al. paper in CoNLL 2003: "Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons"]
* Reading assignment: [http://l2r.cs.uiuc.edu/~danr/Papers/RatinovRo09.pdf Ratinov and Roth paper in CoNLL 2009: "Design Challenges and Misconceptions in Named Entity Recognition"]
* Homework:
** Data: CoNLL 2003 Named Entity shared task data set: http://www.cnts.ua.ac.be/conll2003/ner/
** Baseline: dictionary look-up method on the CoNLL named entity recognition shared task data
*** The dictionary is simply the list of named entities extracted from the training set
** Baseline: MEMM for Named Entity Recognition on the CoNLL data
*** Improve on this by doing error analysis and feature engineering, as you did for the POS tagging task
** Run both methods (dictionary look-up and MEMM) on the noisy OCR data
** Coordinate with Thomas Packer for the noisy OCR data (esp. the labeled dev test set)
*** Private wiki site for the noisy OCR data: http://nlp.cs.byu.edu/mediawiki-private/index.php/Ancestry_dot_Com
** Pick one 3rd-party tool (distinct from other students) from the list of open-source tools on Wikipedia: http://en.wikipedia.org/wiki/Named_entity_recognition
*** Prefer one of the following:
**** Stanford Named Entity tagger
**** CCG group at UIUC: Named Entity + semantic role-labeling tagger
**** Mallet from U. Mass. Amherst
** Run the 3rd-party tool on the CoNLL data and the noisy OCR data
** Report results

== June 16, 2009 ==
* Topic: User Study and Regression Results
** Presenter: Kevin Seppi
* Reading assignment: [http://www.lrec-conf.org/proceedings/lrec2008/pdf/832_paper.pdf our LREC paper]
* Reading assignment: [http://aclweb.org/anthology-new/W/W09/W09-1903.pdf Shilpa Arora et al.'s paper at ALNLP 2009]
* Homework:
** Reproduce the regression results from the LREC paper using R
** Apply the methods from Arora's study to our data using SVM regression:
*** overall cost model
*** per-subject cost model
*** per-subject-type cost model
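Several homeworks above center on implementing an active learning selection function. A common choice from the Settles survey is uncertainty sampling; below is a minimal sketch under an assumed interface (a `predict_proba` hook returning a label distribution), not the actual ALFA/StatNLP API.

```python
import math

def entropy(dist):
    """Shannon entropy of a label distribution given as {label: prob}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)

def select_batch(unlabeled, predict_proba, batch_size):
    """Uncertainty sampling: choose the batch_size unlabeled instances whose
    predicted label distribution has the highest entropy.

    predict_proba is a hypothetical hook mapping an instance to
    {label: prob}; the real code-base's interface may differ.
    """
    scored = sorted(unlabeled,
                    key=lambda x: entropy(predict_proba(x)),
                    reverse=True)
    return scored[:batch_size]

# Toy model: instances are numbers in [0, 1]; P(positive) is the number itself.
def toy_proba(x):
    return {"pos": x, "neg": 1.0 - x}

pool = [0.05, 0.5, 0.9, 0.45, 0.99]
print(select_batch(pool, toy_proba, 2))  # the two most uncertain: [0.5, 0.45]
```

Plotting the learning curve then amounts to retraining after each selected batch and recording accuracy, with random selection (shuffling the pool) as the baseline to compare against.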