
My favorite month is Halloween month. My favorite day is Halloween day.

We're Nearly Done Now

Bibliography?

NER with Noisy OCR

Things being done now

Improve the argument in the paper

  1. Results from our codebase on a vanilla MEMM.
  2. Results from Mallet on a CRF.
  3. Clean up code to commit into the repository.

Make our model better than DEG's

  1. Implement features from the papers below.
  2. Change to the BILOU encoding of the data.
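A minimal sketch of the BILOU change, assuming our data is currently tagged in BIO (IOB2) form with labels like B-PER, I-PER, and O; the function name bio_to_bilou is hypothetical. BILOU marks the Begin, Inside, and Last tokens of a multi-token entity separately, tags single-token entities as Unit, and keeps O for Outside:

```python
def bio_to_bilou(tags: list[str]) -> list[str]:
    """Convert one sentence's BIO (IOB2) tags to BILOU tags.

    Assumes tags look like "B-PER", "I-PER", or "O" (an assumption
    about our data format, not something the codebase guarantees).
    """
    bilou = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bilou.append("O")
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next token is I- of the same type.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            # B followed by a continuation stays B; otherwise it is a one-token Unit.
            bilou.append(f"B-{label}" if continues else f"U-{label}")
        elif prefix == "I":
            # I followed by a continuation stays I; otherwise it is the Last token.
            bilou.append(f"I-{label}" if continues else f"L-{label}")
        else:
            raise ValueError(f"unexpected tag: {tag}")
    return bilou


print(bio_to_bilou(["B-PER", "I-PER", "I-PER", "O", "B-LOC", "O", "B-ORG", "I-ORG"]))
# ['B-PER', 'I-PER', 'L-PER', 'O', 'U-LOC', 'O', 'B-ORG', 'L-ORG']
```

Two adjacent one-token entities of the same type (B-PER, B-PER) both become U-PER, so the boundary between them stays explicit.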

Things to be doing soonish

  1. Import Aaron's regexes and templates as features.
  2. Try out Thomas' list pruning on name dictionaries.
  3. Get more labeled data through the HBLL.
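Aaron's regexes and templates are not reproduced on this page, so the patterns below are hypothetical placeholders. The sketch assumes "templates" means conjoining each regex match with a relative token offset, which yields string features an MEMM or CRF can consume; the names REGEX_FEATURES, regex_features, and sentence_features are all hypothetical:

```python
import re

# Hypothetical placeholder patterns; swap in Aaron's actual regexes.
REGEX_FEATURES = {
    "INIT_CAP": re.compile(r"^[A-Z][a-z]+$"),
    "ALL_CAPS": re.compile(r"^[A-Z]+$"),
    "INITIAL": re.compile(r"^[A-Z]\.$"),
    "YEAR": re.compile(r"^1[5-9]\d\d$"),
    "HAS_DIGIT": re.compile(r"\d"),
}


def regex_features(token: str) -> list[str]:
    """Return the names of every pattern that matches the token."""
    return [name for name, pattern in REGEX_FEATURES.items() if pattern.search(token)]


def sentence_features(tokens: list[str], window: int = 1) -> list[list[str]]:
    """Build per-token features: the lowercased word plus each regex match
    from tokens within +/- window, tagged with its offset (a simple template)."""
    feats = []
    for i, tok in enumerate(tokens):
        f = [f"W={tok.lower()}"]
        for offset in range(-window, window + 1):
            j = i + offset
            if 0 <= j < len(tokens):
                f.extend(f"{name}@{offset}" for name in regex_features(tokens[j]))
        feats.append(f)
    return feats


print(sentence_features(["John", "Smith"]))
# [['W=john', 'INIT_CAP@0', 'INIT_CAP@1'], ['W=smith', 'INIT_CAP@-1', 'INIT_CAP@0']]
```

Keeping the features as plain strings makes them easy to feed to either our MEMM codebase or a Mallet CRF, though the exact input format each expects should be checked.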

598R NER reading list

L. Ratinov and D. Roth, “Design Challenges and Misconceptions in Named Entity Recognition,” in Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009), 2009, 147–155.

H. L. Chieu and H. T. Ng, “Named entity recognition with a maximum entropy approach.”

Further reading

  1. P. F. Brown et al., “Class-based n-gram models of natural language,” Computational Linguistics 18, no. 4 (1992): 467–479.
  2. P. Liang, “Semi-supervised learning for natural language” (Citeseer, 2005).
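Both Brown et al. and Liang deal with Brown word clustering, and Ratinov and Roth use prefixes of each word's Brown cluster bit-string path as NER features. A minimal sketch, assuming the clusters arrive as tab-separated path, word, count lines (the usual output of Liang's brown-cluster tool, but treat the format as an assumption). The prefix lengths 4, 6, 10, and 20 are also an assumption; check Ratinov and Roth for the values they used. The function names are hypothetical:

```python
# Assumed prefix lengths; verify against Ratinov and Roth before relying on them.
PREFIX_LENGTHS = (4, 6, 10, 20)


def load_brown_paths(lines) -> dict[str, str]:
    """Map word -> bit-string path from 'path<TAB>word<TAB>count' lines (assumed format)."""
    paths = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            path, word = parts[0], parts[1]
            paths[word] = path
    return paths


def brown_features(token: str, paths: dict[str, str],
                   prefix_lengths=PREFIX_LENGTHS) -> list[str]:
    """One feature per prefix length; words missing from the clustering get none.
    A prefix longer than the path is just the full path, under its own feature name."""
    path = paths.get(token)
    if path is None:
        return []
    return [f"BROWN{k}={path[:k]}" for k in prefix_lengths]


paths = load_brown_paths(["0010\tSmith\t12\n", "110101\tProvo\t3\n"])
print(brown_features("Provo", paths))
# ['BROWN4=1101', 'BROWN6=110101', 'BROWN10=110101', 'BROWN20=110101']
```

Short prefixes group words into a few coarse clusters and long prefixes into many fine ones, so the model can back off to the coarse clusters when an OCR-damaged or rare word has few training examples.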

Things done