## To Do Now

• Create instructions/templates for Eclipse and for creating Ant scripts
• Create a build script for the stat nlp library
• Retire CS 479 repository
• Update project guidelines to match refactorings (Ringger)
• Clearer separation of wiki and guidelines (Ringger)

## Misc Details

• supply students with skeleton classes. This will largely take the place of class definitions on the web.
• Move worthless baselines to their own repository so they don't clutter statnlp. Provide students with this code.
• Possibly rename the getOrder() methods in Lab 1 to something clearer, e.g. getMarkovOrder()
• Have a quiz at the beginning of class to assess prior statistical knowledge
• See whether students can reuse their confusion matrices across labs
• Make the input/output parameters of Model clear (e.g. Sentence in, tree out) in the lab description
• Documentation/Javadocs – especially the generics (see Richard's email)
• Stipulate an appropriate package where students are to place their code to avoid name conflicts
• Rename LanguageModel to be more generic (SmoothedDistribution?) since it is used in so many different ways
• We can probably drop localTrigramScorer in favor of the above
• Use same splits for Lab 1 & Lab 3
• Provide MaxEnt code as option to lab 2
• Ask students to use generation to compare and contrast lab 1 and lab 3
• Make a base class that works for both lab 1 and lab 3
• Make a base Learner class that includes things like a feature extractor/data transformer chain and possible hooks for optimization (number of parameters, value at a point, that type of deal). Note that both learners and models can be optimized (learners optimize parameters that affect training, whereas models can be optimized based on things like K in GT, which shouldn't affect training).
• Add sentence accuracy to hw3
• Restructure trellis; idea: be able to getNextStates(int position); internally there would be N queues; would have to make a SetQueue to adapt existing code
• Use the same split as T & M for lab 3
• Drill down on confusion matrix and do some error analysis online
• Review named entity results - CoNLL
• POS tagging – s': previous state
• Generate PCFG data (with different independence assumptions)
• proj. #1 dist check:
• 4 unigram; 6 trigram
• checking, passing, saying so - 1/3 each
• final proj. guidelines: “new” not required
• Use hierarchical model notation in CS 401R
• Labeled dependency parsing results and approaches
• Upconvert code and guidelines for translation project
• use Eisner's parsing song at end of Parsing unit. Wow!
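
For the trellis restructuring idea above, a SetQueue might look roughly like the sketch below. The semantics are an assumption, not something settled in these notes: a FIFO queue that ignores an element that is already waiting in the queue. The class and method names are hypothetical and do not come from statnlp.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical FIFO queue that holds each element at most once at a time. */
final class SetQueue<E> {
    private final Deque<E> queue = new ArrayDeque<>();
    private final Set<E> queued = new HashSet<>();

    /** Adds the element unless an equal element is already queued. */
    boolean offer(E element) {
        Objects.requireNonNull(element, "element");
        if (!queued.add(element)) {
            return false; // duplicate: leave the queue unchanged
        }
        queue.addLast(element);
        return true;
    }

    /** Removes and returns the oldest element, or null if the queue is empty. */
    E poll() {
        E head = queue.pollFirst();
        if (head != null) {
            queued.remove(head);
        }
        return head;
    }

    boolean isEmpty() {
        return queue.isEmpty();
    }

    int size() {
        return queue.size();
    }
}
```

A trellis could then keep one such queue per position and have getNextStates(int position) drain the queue for that position. Whether a state may be re-queued after it has been polled (as it can be here) is a design choice that would need to be settled.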

## Notes from jar extraction for Fall '07

• Having both the byu and berkeley util directories is distracting
• Remove berkeley's priority queue (also very distracting to have multiple types of priority queues)
• Solidify the semantics of LogCounter

## Homework 0

• Should include a requirement to use the distribution checker for ensuring a proper distribution.
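
To make the requirement concrete, here is a minimal sketch of the kind of check a distribution checker performs. It is not the course's actual checker; the class name, method name, and tolerance parameter are all hypothetical.

```java
import java.util.Map;

/** Hypothetical helper; not the course's actual distribution checker. */
final class DistributionCheck {
    private DistributionCheck() {}

    /**
     * Returns true if every probability is a number in [0, 1] (allowing the
     * given tolerance at the top end) and the probabilities sum to 1 within
     * the same tolerance.
     */
    static <E> boolean isProperDistribution(Map<E, Double> probs, double tolerance) {
        double total = 0.0;
        for (double p : probs.values()) {
            if (Double.isNaN(p) || p < 0.0 || p > 1.0 + tolerance) {
                return false;
            }
            total += p;
        }
        return Math.abs(total - 1.0) <= tolerance;
    }
}
```

For a conditional model, the same check would be run once per conditioning context, over the full event space including whatever stands in for unseen events.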

## Lab 1

• Guidelines for dealing with unseen events
• Emphasize requirement for a true distribution
• Currently, code for generation from a counter (roughly, a categorical dist.) is provided for the students, although they still need to figure out how to generate an entire sentence.
• Better explanations can be added to the “Methods” sections. Here is a reply given to a student asking specific questions; it could be converted into a supplement of some sort.
• Line searchers in math package for tuning interpolation weights???
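
On the generation bullet above: a sketch of how sampling from a counter extends to a whole sentence, assuming a bigram model stored as nested count maps. This is not the code handed to students; the class name, the nested-map layout, and the start/stop symbols are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch of generation from counters. */
final class CounterSampling {
    private CounterSampling() {}

    /** Draws one key with probability proportional to its non-negative count. */
    static <T> T sample(Map<T, Double> counts, Random rng) {
        double total = 0.0;
        for (double c : counts.values()) {
            total += c;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("counter has no mass");
        }
        double threshold = rng.nextDouble() * total;
        double running = 0.0;
        T last = null;
        for (Map.Entry<T, Double> e : counts.entrySet()) {
            running += e.getValue();
            last = e.getKey();
            if (threshold < running) {
                return last;
            }
        }
        return last; // only reached through floating-point round-off
    }

    /**
     * Samples tokens one at a time, each conditioned on the previous token,
     * until the stop symbol is drawn or maxLength tokens have been produced.
     */
    static <T> List<T> generate(Map<T, Map<T, Double>> bigramCounts,
                                T start, T stop, int maxLength, Random rng) {
        List<T> sentence = new ArrayList<>();
        T previous = start;
        while (sentence.size() < maxLength) {
            Map<T, Double> next = bigramCounts.get(previous);
            if (next == null) {
                break; // no continuation was ever observed for this history
            }
            T token = sample(next, rng);
            if (token.equals(stop)) {
                break;
            }
            sentence.add(token);
            previous = token;
        }
        return sentence;
    }
}
```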


I had an idea for Lab 1 based on my experience after revising Lab 3. If we modify the requirements of Lab 1 appropriately, the students will save a significant amount of time on Lab 2 and Lab 3 because they will be able to directly re-use their code.

Specifically, I think that the unigram model is fine as is. We should also require an un-interpolated bigram model (this will allow them to re-use that code in Lab 3). In addition, they would be required to write an un-interpolated trigram model and then a fourth model which simply interpolates the previous three. That way we force them to have separate models lying around for future labs (I don’t believe ANYONE did this). We would recommend, but not require, implementing a single model that works for any order n; that would fulfill the bigram and trigram requirements much more quickly than doing them separately, and certainly with less code.
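
A rough sketch of what the fourth model could look like, assuming each of the three un-interpolated models sits behind a common interface. The interface and class names are hypothetical, not existing statnlp types, and the weights are assumed to be fixed (how they get tuned is a separate question).

```java
import java.util.List;

/** Hypothetical common interface for the un-interpolated models. */
interface ConditionalModel<T> {
    double probability(List<T> history, T event);
}

/** Linear interpolation of already-trained component models. */
final class InterpolatedModel<T> implements ConditionalModel<T> {
    private final List<ConditionalModel<T>> components;
    private final double[] weights;

    InterpolatedModel(List<ConditionalModel<T>> components, double[] weights) {
        if (components.size() != weights.length) {
            throw new IllegalArgumentException("one weight per component is required");
        }
        double total = 0.0;
        for (double w : weights) {
            if (w < 0.0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            total += w;
        }
        if (Math.abs(total - 1.0) > 1e-9) {
            // Weights that sum to 1 keep the mixture a proper distribution
            // whenever every component is one.
            throw new IllegalArgumentException("weights must sum to 1");
        }
        this.components = components;
        this.weights = weights.clone();
    }

    @Override
    public double probability(List<T> history, T event) {
        double p = 0.0;
        for (int i = 0; i < weights.length; i++) {
            p += weights[i] * components.get(i).probability(history, event);
        }
        return p;
    }
}
```

Each component decides for itself how much of the history it looks at, so the unigram, bigram, and trigram models remain usable on their own in later labs.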

Another note: because I recommended that they concatenate strings together to form the history, students’ code was not directly pluggable into Lab 2. They had to change all of the characters to strings or build new LMs from scratch. Two solutions: (1) require that their LM work with any data type (after all, this is the purpose of generics) or (2) change the reader, or add an optional reader, so that it produces strings (each string is one character) instead of characters.
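
For solution (1), one way to avoid string concatenation is to key counts on a List of tokens, which works for characters and strings alike. This is only a sketch under that assumption; NgramCounts is a hypothetical class, not part of the provided code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical n-gram count table that is generic in the token type. */
final class NgramCounts<T> {
    private final int order;
    private final Map<List<T>, Map<T, Integer>> counts = new HashMap<>();

    NgramCounts(int order) {
        if (order < 1) {
            throw new IllegalArgumentException("order must be at least 1");
        }
        this.order = order;
    }

    /**
     * Counts every token together with the (order - 1) tokens before it.
     * Histories at the start of the sequence are simply shorter; padding
     * with a start symbol is left out of this sketch.
     */
    void observe(List<T> sequence) {
        for (int i = 0; i < sequence.size(); i++) {
            int from = Math.max(0, i - (order - 1));
            // Copy the sublist so the key does not depend on the caller's list.
            List<T> history = new ArrayList<>(sequence.subList(from, i));
            counts.computeIfAbsent(history, h -> new HashMap<>())
                  .merge(sequence.get(i), 1, Integer::sum);
        }
    }

    int count(List<T> history, T event) {
        Map<T, Integer> events = counts.get(history);
        return events == null ? 0 : events.getOrDefault(event, 0);
    }
}
```

Because List equality is element-wise, the same class serves NgramCounts<Character> in Lab 1 and NgramCounts<String> later without changes.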

Finally, we should encourage (not require) them to separate their learner from their model so that in future labs they need only write new learners (oftentimes, even that is unnecessary).
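
A small sketch of the learner/model separation, to show what we would be encouraging. The Model and Learner interfaces here are hypothetical stand-ins and may not match the framework's actual signatures; the learner uses plain maximum likelihood with no smoothing, so it does not meet the Lab 1 requirements for unseen events.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in: a model only answers queries. */
interface Model<I, O> {
    O predict(I input);
}

/** Hypothetical stand-in: a learner only knows how to fit a model. */
interface Learner<D, M> {
    M train(List<D> data);
}

/** Holds fitted parameters; knows nothing about how they were estimated. */
final class UnigramModel<T> implements Model<T, Double> {
    private final Map<T, Double> probs;

    UnigramModel(Map<T, Double> probs) {
        this.probs = probs;
    }

    @Override
    public Double predict(T token) {
        return probs.getOrDefault(token, 0.0); // unseen tokens get 0 in this sketch
    }
}

/** Estimates unigram probabilities by relative frequency. */
final class MleUnigramLearner<T> implements Learner<T, UnigramModel<T>> {
    @Override
    public UnigramModel<T> train(List<T> data) {
        Map<T, Double> probs = new HashMap<>();
        for (T token : data) {
            probs.merge(token, 1.0, Double::sum);
        }
        for (Map.Entry<T, Double> e : probs.entrySet()) {
            e.setValue(e.getValue() / data.size());
        }
        return new UnigramModel<>(probs);
    }
}
```

Swapping in a smoothed estimator then means writing a new Learner only; UnigramModel and anything that consumes it stay untouched.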

Write up coherent, self-contained notes on Kneser-Ney to aid students in implementation - rely on the “bit of progress” paper
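
As a starting point for those notes, the bigram case of interpolated Kneser-Ney, as I understand it from the literature, can be written as below. The notation is mine, and the write-up should check it against the paper before it goes to students.

```latex
\begin{align*}
p_{\mathrm{KN}}(w_i \mid w_{i-1})
  &= \frac{\max\bigl(c(w_{i-1} w_i) - D,\ 0\bigr)}{c(w_{i-1})}
     + \lambda(w_{i-1})\, p_{\mathrm{cont}}(w_i) \\
\lambda(w_{i-1})
  &= \frac{D \cdot N_{1+}(w_{i-1}\,\bullet)}{c(w_{i-1})} \\
p_{\mathrm{cont}}(w_i)
  &= \frac{N_{1+}(\bullet\, w_i)}{N_{1+}(\bullet\,\bullet)}
\end{align*}
```

Here $c(w_{i-1}) = \sum_{w} c(w_{i-1} w)$, $D$ is a discount with $0 < D < 1$, $N_{1+}(w_{i-1}\,\bullet)$ is the number of distinct word types seen after $w_{i-1}$, $N_{1+}(\bullet\, w_i)$ is the number of distinct word types seen before $w_i$, and $N_{1+}(\bullet\,\bullet)$ is the number of distinct bigram types. The discounted mass removed from the first term is exactly $\lambda(w_{i-1})$, so the result sums to 1 over $w_i$, which is what the distribution checker would verify.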

## Lab 2

• Consider doing a full joint distribution check
• Since they have supposedly validated their distribution in the previous lab, it might be more important for this lab that they understand that the posterior is a proper distribution, by showing that the normalizer in .normalize is equivalent to p(features).
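
The point in the last bullet can be illustrated with a small sketch: summing the joint p(label, features) over labels gives p(features), and dividing by it yields a posterior that sums to 1. The class and method names are hypothetical and do not refer to the framework's own normalize method.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of posterior normalization. */
final class PosteriorCheck {
    private PosteriorCheck() {}

    /** Sums the joint p(label, features) over all labels, giving p(features). */
    static <L> double evidence(Map<L, Double> joint) {
        double total = 0.0;
        for (double p : joint.values()) {
            total += p;
        }
        return total;
    }

    /** Divides each joint value by p(features) to get p(label | features). */
    static <L> Map<L, Double> posterior(Map<L, Double> joint) {
        double z = evidence(joint);
        if (!(z > 0.0)) {
            throw new IllegalArgumentException("joint has no probability mass");
        }
        Map<L, Double> result = new HashMap<>();
        for (Map.Entry<L, Double> e : joint.entrySet()) {
            result.put(e.getKey(), e.getValue() / z);
        }
        return result;
    }
}
```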

Remember how I was talking about how slick lab 2 was for maximum entropy? Well there is a downside. The way I had the students do things means that there is no way to grade their model.

Here’s the thing. Feature extraction happens in the reader. Therefore, for their report, the test set they used already had all of the features extracted and they didn’t have to add any extra code.

In general, this is desirable: why rewrite feature-extraction code in every model?

The downside—the serialized model expects data to be given in its already extracted form. Of course, the auto-grader has no idea how to run their feature extractor (it wasn’t part of the serialized model, although it must exist somewhere in the .jar).

Bottom line: we don’t have any (reliable) scores for any maxent model.

Short term options: (1) require everyone to add feature extractors in the appropriate manner and re-serialize and re-upload (2) Just do the Hall of Fame based on the dev test set (3) have people report their own results on the dev test.

I’m leaning towards 2, since I don’t want to add to the students’ burden.

Long term solution:

Not quite sure. Maybe we’ll have to run evaluations based on submitted XML files now. That causes its own problems, for instance paths, and also how to insert the information related to the blind data set.

Possible solution: if their XML file is also submitted (in a predictable location, perhaps it can be one of the upload boxes), then it may be possible to parse the reader information from their own XML file (nothing else). This shouldn't (normally) have paths or any other information like that (whereas datasets will). Note that in Lab 3, the binarizers are attached to the datasets; this is correct because the blind data set should NOT be binarized (there is nothing to binarize).
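
For comparison, short-term option (1), where the feature extractor travels with the serialized model so the auto-grader can hand it raw data, might be structured like this. It is a sketch under assumptions: the interface names are hypothetical and the framework's real model and extractor types may look different.

```java
import java.io.Serializable;

/** Hypothetical interface; the framework's real extractor type may differ. */
interface FeatureExtractor<R, F> extends Serializable {
    F extract(R raw);
}

/** Hypothetical interface for anything that maps an input to a label. */
interface Classifier<F, L> extends Serializable {
    L classify(F features);
}

/**
 * Wraps a trained classifier together with the extractor it was trained
 * with. Both fields are serializable, so writing this object out also
 * writes the extractor, and the result accepts raw input.
 */
final class ExtractingClassifier<R, F, L> implements Classifier<R, L> {
    private static final long serialVersionUID = 1L;

    private final FeatureExtractor<R, F> extractor;
    private final Classifier<F, L> classifier;

    ExtractingClassifier(FeatureExtractor<R, F> extractor, Classifier<F, L> classifier) {
        this.extractor = extractor;
        this.classifier = classifier;
    }

    @Override
    public L classify(R raw) {
        return classifier.classify(extractor.extract(raw));
    }
}
```

The reader could still run the extractor once up front for training speed; the wrapper only matters at the point where a model is serialized for grading.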

## Project 2.2

• Consider NOT giving them an XML file that slurps in the validation data b/c some students were feature engineering using the validation set (as they should).
• We could write a blurb on how to feature engineer THEN slurp in as a matter of proper technique. This applies to project 1 as well.

## Project 3.1 NEW

• Clean up the tag accuracy calculator and remove the multi-tag stuff.
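
A cleaned-up, single-tag calculator could be as small as the sketch below, which also covers the sentence accuracy mentioned under Misc Details. The class is hypothetical and is not the existing calculator; it assumes gold and predicted tags are already aligned sentence by sentence.

```java
import java.util.List;

/** Hypothetical single-tag accuracy calculator. */
final class TagAccuracy {
    private TagAccuracy() {}

    /** Fraction of tokens whose predicted tag equals the gold tag. */
    static double tokenAccuracy(List<List<String>> gold, List<List<String>> predicted) {
        checkAligned(gold, predicted);
        int correct = 0;
        int total = 0;
        for (int i = 0; i < gold.size(); i++) {
            List<String> g = gold.get(i);
            List<String> p = predicted.get(i);
            for (int j = 0; j < g.size(); j++) {
                if (g.get(j).equals(p.get(j))) {
                    correct++;
                }
                total++;
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    /** Fraction of sentences in which every tag is correct. */
    static double sentenceAccuracy(List<List<String>> gold, List<List<String>> predicted) {
        checkAligned(gold, predicted);
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return gold.isEmpty() ? 0.0 : (double) correct / gold.size();
    }

    private static void checkAligned(List<List<String>> gold, List<List<String>> predicted) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("different number of sentences");
        }
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).size() != predicted.get(i).size()) {
                throw new IllegalArgumentException("sentence " + i + " differs in length");
            }
        }
    }
}
```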

## Lab 3

• Add a slide showing Viterbi as done in the code so that I don't have to rehash it as the TA
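
Something like the following could go on that slide. It is a generic textbook-style Viterbi for a first-order model in log space, not the trellis-based code in the framework; the array layout and names are assumptions made for the sketch.

```java
/** Hypothetical array-based Viterbi decoder for a first-order model. */
final class Viterbi {
    private Viterbi() {}

    /**
     * @param logInit  logInit[s] is the log probability of starting in state s
     * @param logTrans logTrans[r][s] is the log probability of state s given previous state r
     * @param logEmit  logEmit[t][s] is the log probability of the observation at position t given state s
     * @return the highest-scoring state sequence, one state index per position
     */
    static int[] decode(double[] logInit, double[][] logTrans, double[][] logEmit) {
        int length = logEmit.length;
        if (length == 0) {
            return new int[0];
        }
        int numStates = logInit.length;
        double[][] best = new double[length][numStates];
        int[][] backPointer = new int[length][numStates];

        for (int s = 0; s < numStates; s++) {
            best[0][s] = logInit[s] + logEmit[0][s];
        }
        for (int t = 1; t < length; t++) {
            for (int s = 0; s < numStates; s++) {
                double bestScore = Double.NEGATIVE_INFINITY;
                int bestPrev = 0;
                for (int r = 0; r < numStates; r++) {
                    double score = best[t - 1][r] + logTrans[r][s];
                    if (score > bestScore) {
                        bestScore = score;
                        bestPrev = r;
                    }
                }
                best[t][s] = bestScore + logEmit[t][s];
                backPointer[t][s] = bestPrev;
            }
        }

        // Pick the best final state, then follow back pointers to the start.
        int last = 0;
        for (int s = 1; s < numStates; s++) {
            if (best[length - 1][s] > best[length - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[length];
        path[length - 1] = last;
        for (int t = length - 1; t > 0; t--) {
            path[t - 1] = backPointer[t][path[t]];
        }
        return path;
    }
}
```

A useful classroom contrast is an input where picking the best state independently at each position disagrees with the decoded path, since that shows what the transition scores contribute.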

## Lab 4

• Verify that Lexicon returns a proper distribution for p(word|tag). I don't believe it does.
• Kyle requested that Lexicon return a counter with keys only having non-zero probability
• I had already intended on making Lexicon a model which would solve this problem
• Similarly, Grammar should be a model (p(child|parent)).
• Add an option in the Metric to only display trees with errors
• Richard reported perfect precision/recall on trees that had errors
• Compare and contrast PCFG as LM as POS tagger
• Generate text from PCFG. Error analysis
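
For the generation bullet, top-down sampling from a PCFG can be sketched as follows. The grammar representation (a map from nonterminal to weighted rules, with any symbol lacking rules treated as a terminal) is an assumption for illustration and is not how Grammar or Lexicon are currently structured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical top-down sampler for a PCFG. */
final class PcfgSampler {
    /** One rewrite rule: the right-hand side and p(rhs | lhs). */
    static final class Rule {
        final List<String> rhs;
        final double probability;

        Rule(double probability, String... rhs) {
            this.probability = probability;
            this.rhs = Arrays.asList(rhs);
        }
    }

    private final Map<String, List<Rule>> rules;
    private final Random rng;

    PcfgSampler(Map<String, List<Rule>> rules, Random rng) {
        this.rules = rules;
        this.rng = rng;
    }

    /** Expands the given symbol and returns the terminals left to right. */
    List<String> generate(String symbol) {
        List<String> words = new ArrayList<>();
        expand(symbol, words);
        return words;
    }

    private void expand(String symbol, List<String> words) {
        List<Rule> options = rules.get(symbol);
        if (options == null) {
            words.add(symbol); // no rules for this symbol: treat it as a terminal
            return;
        }
        double threshold = rng.nextDouble();
        double running = 0.0;
        Rule chosen = options.get(options.size() - 1); // fallback for round-off
        for (Rule rule : options) {
            running += rule.probability;
            if (threshold < running) {
                chosen = rule;
                break;
            }
        }
        for (String child : chosen.rhs) {
            expand(child, words);
        }
    }
}
```

The recursion has no depth limit, so a grammar that puts too much mass on recursive rules can overflow the stack; a depth cap would be a sensible addition before handing this to students.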

## Final Projects

• require proposal draft 1. After feedback, proposal draft 2.
• require addressing all five questions in the proposal itself.

## Bruce's Email

I spent a lot of time trying to figure out where to start. I was pretty unclear on which classes I need to write and how they fit in with the rest of the system. Graphical depictions of entity relationships and data flow would have been helpful. Navigating the sea of undocumented generic parameters and fuzzy class relationships in the framework code took several extra hours. Having documentation of even just the semantics of generic parameters for the classes I used would have saved me several hours.

I also spent several hours (6 or 7, maybe?) trying to understand the graphical model and mathematical foundations of the project, because being unclear on the math bit me on the last project. I finally just decided to talk to Robbie and take the “get stuff done” approach more than the “understand what's going on” approach. I have a superficial idea of what is happening, but there's still a lot of magic that's happening in the supplied code that I don't totally understand.

A small thing that would help is to supply prewritten code to automate the running of experiments. I spent an hour or two writing a Perl script to run an experiment, time it, and copy all of the relevant details (POS tagger file, experiment XML file, output, etc.) to a directory so that I could analyze it later.

Providing access to fast computing resources would be helpful for this project. I was lucky that I had access to a fast box with lots of memory at work, so each feature engineering run took about 1.5 hours (training + decoding w/ Viterbi), but I would imagine that some people could do only one feature engineering iteration per day.

It didn't help that I started late because I was putting out fires in my other classes.

Once I finally got going and understood things, I really enjoyed the project. It was fun to do feature engineering and see the accuracy increase with each iteration as the features that I added eliminated errors.