... to the ALFA project.

== April 28, 2009 ==
(continued on 4/30/09)
* Topic: Welcome
* Topic: Intro. to Probability Theory
** Presenter: Eric Ringger
* Reading assignment: Manning & Schuetze 2.1, 2.2, 3, 4
* Reading assignment: Russell & Norvig 14.1-14.4
* Homework: https://cswiki.cs.byu.edu/cs479/index.php/Homework_0.1
* Optional homework: https://cswiki.cs.byu.edu/cs479/index.php/Homework_0.2

== May 5, 2009 ==
* Topic: Word Sense Disambiguation as motivation for Feature Engineering
** Presenter: Eric Ringger
* Topic: Feature Engineering Console
** Presenter: Josh Hansen
* Topic: Maximum Entropy Models
** Presenter: Peter McClanahan
* Reading assignment: M&S 7, M&S 16
* Optional reading assignment: [http://www.cs.cmu.edu/afs/cs/user/aberger/www/html/tutorial/tutorial.html Berger's MaxEnt tutorial]
* Homework: https://cswiki.cs.byu.edu/cs479/index.php/Project_2.2
** BUT: Use the Feature Engineering Console! http://nlp.cs.byu.edu/mediawiki-private/index.php/Feature_Engineering_Console (on the private wiki -- BYU NLP only -- requires authentication)
** Write as little extra code as possible. Possible exceptions: new feature templates/extractors.
** Work with Josh Hansen if you want to improve the FEC itself.
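The maximum entropy models covered above (see Berger's tutorial) can be sketched compactly for the binary case, where a maxent model reduces to logistic regression over binary features. The word-sense data and feature names below are hypothetical toy examples, not the course data or code-base; this is a minimal sketch of training by gradient ascent on the conditional log-likelihood.

```python
import math

# Toy word-sense-disambiguation data: each instance is a set of active
# binary features (hypothetical names), labeled with one of two senses
# of "bank" (0 = financial institution, 1 = river bank).
DATA = [
    ({"prev=the", "ctx=money"}, 0),
    ({"prev=the", "ctx=loan"}, 0),
    ({"ctx=river", "prev=a"}, 1),
    ({"ctx=water", "prev=the"}, 1),
]
FEATURES = sorted({f for feats, _ in DATA for f in feats})

def prob(weights, feats):
    """P(sense=1 | feats) under a binary log-linear (maxent) model."""
    score = sum(weights[f] for f in feats)
    return 1.0 / (1.0 + math.exp(-score))

def train(data, iters=200, lr=0.5):
    """Maximize the conditional log-likelihood by stochastic gradient ascent."""
    weights = {f: 0.0 for f in FEATURES}
    for _ in range(iters):
        for feats, label in data:
            error = label - prob(weights, feats)  # observed minus expected
            for f in feats:
                weights[f] += lr * error
    return weights

weights = train(DATA)
print(prob(weights, {"ctx=river", "prev=a"}))   # close to 1.0 (river sense)
print(prob(weights, {"ctx=money", "prev=the"})) # close to 0.0 (money sense)
```

A real maxent tagger would add many classes, feature templates, and regularization; the Feature Engineering Console is meant to supply the templates so you write little code of your own.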
== May 12, 2009 ==
* Topic: Active Learning
** Presenter: Robbie Haertel
* Reading assignment: [http://pages.cs.wisc.edu/~bsettles/pub/settles.activelearning.pdf Survey of Active Learning by Burr Settles]
* Homework:
** Implement one active learning selection function
** Reference: http://nlp.cs.byu.edu/mediawiki/index.php/Using_the_active_learner
** Plot the learning curve for your chosen function versus random selection, using Gnuplot or Excel
** Work with Robbie Haertel to bring the plotting code back to life

== May 19, 2009 ==
* Topic: Sequence Labeling
** Presenter: George Busby
* Reading assignment: M&S 9, M&S 10
* Reading assignment: [http://faculty.cs.byu.edu/~ringger/CS401R/papers/ToutanovaManning_POS-emnlp2000.pdf Paper by Toutanova & Manning on MEMMs]
* Optional reading assignment: [http://faculty.cs.byu.edu/~ringger/CS401R/papers/Brants_POS-00tnt.pdf Paper on TnT by Brants]
* Homework:
** Continue the Active Learning experiments
** Focus on the PNP classification task
*** Plot the mean of multiple (around 5) random runs over the number of iterations.
*** It would also be interesting to plot the variance of those random runs over the number of iterations.
** Try the POS tagging task with a small batch size B and number of iterations N, such that N x B is approximately 300

== May 26, 2009 ==
* Topic: Intro. to the StatNLP Code-base
** Presenter: Robbie Haertel
* Reading assignment: [http://aclweb.org/anthology-new/W/W07/W07-1516.pdf our ALFA paper at the LAW]
* Reading assignment: [http://aclweb.org/anthology-new/D/D07/D07-1051.pdf Tomanek et al., "An Approach to Text Corpus Construction which Cuts Annotation Costs and Maintains Reusability of Annotated Data"]
* Homework: https://cswiki.cs.byu.edu/cs479/index.php/Project_3.1

== May 31 - June 6, 2009 ==
* [http://www.naacl2009.org NAACL HLT 2009]
* [http://nlp.cs.byu.edu/alnlp/ Workshop on Active Learning for NLP]

== June 9, 2009 ==
* Topic: Named Entity Recognition
** Presenter: Eric Ringger
* Reading assignment: [http://nlp.stanford.edu/pubs/conll-ner.pdf Klein et al. paper from 2003: "Named Entity Recognition with Character-Level Models"]
* Reading assignment: [http://www.cs.umass.edu/~mccallum/papers/mccallum-conll2003.pdf McCallum et al. paper in CoNLL 2003: "Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons"]
* Reading assignment: [http://l2r.cs.uiuc.edu/~danr/Papers/RatinovRo09.pdf Ratinov and Roth paper in CoNLL 2009: "Design Challenges and Misconceptions in Named Entity Recognition"]
* Homework:
** Data: CoNLL 2003 Named Entity shared task data set: http://www.cnts.ua.ac.be/conll2003/ner/
** Baseline: dictionary look-up method on the CoNLL named entity recognition shared task data
*** The dictionary is simply the list of named entities extracted from the training set
** Baseline: MEMM for Named Entity Recognition on the CoNLL data
*** Improve on this by doing error analysis and feature engineering, as you did for the POS tagging task
** Run both methods (dictionary look-up and MEMM) on the noisy OCR data
** Coordinate with Thomas Packer for the noisy OCR data (esp. the labeled dev test set)
*** Private wiki site for the noisy OCR data: http://nlp.cs.byu.edu/mediawiki-private/index.php/Ancestry_dot_Com
** Pick one 3rd-party tool (distinct from other students) from the list of open-source tools on Wikipedia: http://en.wikipedia.org/wiki/Named_entity_recognition
*** Prefer one of the following:
**** Stanford Named Entity tagger
**** CCG group at UIUC: Named Entity + semantic role-labeling tagger
**** Mallet from U. Mass. Amherst
** Run the 3rd-party tool on the CoNLL data and the noisy OCR data
** Report results

== June 16, 2009 ==
* Topic: User Study and Regression Results
** Presenter: Kevin Seppi
* Reading assignment: [http://www.lrec-conf.org/proceedings/lrec2008/pdf/832_paper.pdf our LREC paper]
* Reading assignment: [http://aclweb.org/anthology-new/W/W09/W09-1903.pdf Shilpa Arora et al.'s paper at ALNLP 2009]
* Homework:
** Reproduce the regression results from the LREC paper using R
** Apply the methods from Arora's study to our data using SVM regression:
*** overall cost model
*** per-subject cost model
*** per-subject-type cost model
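Several homeworks above center on implementing an active learning selection function. A common choice from the Settles survey is uncertainty sampling; below is a minimal sketch under an assumed interface (a `predict_proba` hook returning a label distribution), not the actual ALFA/StatNLP API.

```python
import math

def entropy(dist):
    """Shannon entropy of a label distribution given as {label: prob}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)

def select_batch(unlabeled, predict_proba, batch_size):
    """Uncertainty sampling: choose the batch_size unlabeled instances whose
    predicted label distribution has the highest entropy.

    predict_proba is a hypothetical hook mapping an instance to
    {label: prob}; the real code-base's interface may differ.
    """
    scored = sorted(unlabeled,
                    key=lambda x: entropy(predict_proba(x)),
                    reverse=True)
    return scored[:batch_size]

# Toy model: instances are numbers in [0, 1]; P(positive) is the number itself.
def toy_proba(x):
    return {"pos": x, "neg": 1.0 - x}

pool = [0.05, 0.5, 0.9, 0.45, 0.99]
print(select_batch(pool, toy_proba, 2))  # the two most uncertain: [0.5, 0.45]
```

Plotting the learning curve then amounts to retraining after each selected batch and recording accuracy, with random selection (shuffling the pool) as the baseline to compare against.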