

Papers to be published; what to do.




ASIST 2010

Re-evaluate using page-level full-name annotations made against transcriptions of the image text (no token IDs or bounding boxes required). About 7 days, with fewer hidden variables.

  1. Annotate full names at the page level against a transcription of the image text (2 days).
  2. Read new hand-label file format and add new page-level metrics (no token ID or bounding boxes) to evaluation code (1/2 day).
  3. Re-evaluate extractors (1 day).
  4. Collect data and re-make figures (1 day).
  5. Re-write paper for ASIST including new metric explanation (2 days).
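
A minimal sketch of what the page-level metric in step 2 might look like, assuming each page's annotation reduces to a list of full-name strings with no token IDs or bounding boxes. The dict-of-lists input, the normalization rule, and the function names are illustrative assumptions, not the existing evaluation code or hand-label file format.

```python
from collections import Counter


def normalize(name):
    """Case-fold and collapse whitespace so trivial differences are not errors."""
    return " ".join(name.casefold().split())


def page_level_scores(gold, predicted):
    """Micro-averaged precision, recall, and F1 over per-page multisets of names.

    gold, predicted: dict mapping a page ID to a list of full-name strings
    (a hypothetical in-memory form of the hand labels and extractor output).
    """
    true_pos = n_pred = n_gold = 0
    for page in set(gold) | set(predicted):
        g = Counter(normalize(n) for n in gold.get(page, []))
        p = Counter(normalize(n) for n in predicted.get(page, []))
        # Multiset intersection: a name repeated on a page must be found
        # the same number of times to be fully credited.
        true_pos += sum((g & p).values())
        n_gold += sum(g.values())
        n_pred += sum(p.values())
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Matching on multisets per page keeps the metric independent of where on the page a name appears, which is the point of dropping token IDs and bounding boxes.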

Re-evaluate using instance-level annotations based on image bounding-box coordinates. About 13 days minimum, with many hidden variables.

  1. Two tool options:
    • Write an annotation tool just for drawing full-name boxes and typing in the transcription of each full name; use it to annotate the blind test set images and output XML with coordinates, full-name labels, and transcribed names (5 days).
    • Use an existing art program; copy the coordinates it reports into an XML file by hand, along with the transcribed names (5 days).
  2. Read new hand-label file format and add new coordinate-based metric (no token ID) to evaluation code (2 days or infinite if the coordinates are bad).
  3. Alter all extractors' code to output full-name coordinates and re-run them (2 days if everyone is available and responsive, 1 or more weeks if not).
  4. Re-evaluate extractors (1 day).
  5. Re-make figures (1 day).
  6. Re-write paper for ASIST including new metric explanation (2 days).
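
A rough sketch of one way the coordinate-based metric in step 2 could work, assuming gold and extracted full names are both given as axis-aligned boxes (x0, y0, x1, y1) in the same image coordinate system. The intersection-over-union criterion, the 0.5 threshold, and the greedy matching are assumptions chosen for illustration; the notes do not specify how boxes would be compared.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    iw = min(ax1, bx1) - max(ax0, bx0)
    ih = min(ay1, by1) - max(ay0, by0)
    if iw <= 0 or ih <= 0:
        return 0.0  # boxes do not overlap
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union


def match_boxes(gold, predicted, threshold=0.5):
    """Greedy one-to-one matching, best overlap first.

    Returns (true positives, false positives, false negatives) for one page.
    The threshold is an assumed value, not one taken from the notes.
    """
    pairs = sorted(
        ((iou(g, p), gi, pi)
         for gi, g in enumerate(gold)
         for pi, p in enumerate(predicted)),
        reverse=True,
    )
    used_gold, used_pred = set(), set()
    for score, gi, pi in pairs:
        if score < threshold:
            break  # remaining pairs overlap even less
        if gi in used_gold or pi in used_pred:
            continue  # each box may be matched at most once
        used_gold.add(gi)
        used_pred.add(pi)
    tp = len(used_gold)
    return tp, len(predicted) - tp, len(gold) - tp
```

If the hand-drawn coordinates turn out to be noisy, the threshold is the knob most likely to need tuning, which is one of the hidden variables noted in step 2.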


nlp-private/papers.txt · Last modified: 2015/04/23 13:23 by ryancha