Papers to be published; what to do.
<br />
__TOC__
<br />
ASIST 2010
Re-evaluate using page-level full-name annotations based on transcriptions, with no token IDs or bounding boxes required (about 7 days, with fewer hidden variables).
Annotate at the page level against a transcription of the image text (2 days).
Read the new hand-label file format and add new page-level metrics (no token IDs or bounding boxes) to the evaluation code (1/2 day).
Re-evaluate extractors (1 day).
Collect data and re-make figures (1 day).
Re-write paper for ASIST including new metric explanation (2 days).
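The page-level metric in the steps above could be sketched roughly as follows. This is only an illustration, not the group's evaluation code: the function names, the dictionary-of-lists input format, and the normalization (lowercasing and collapsing whitespace) are all assumptions. Each page is treated as a bag of full names, so no token IDs or bounding boxes are needed.

```python
from collections import Counter

def normalize(name):
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(name.lower().split())

def page_level_counts(gold_names, predicted_names):
    """Count true positives, false positives, and false negatives on one page.

    Names are compared as multisets, so a name that appears twice in the
    gold transcription must be extracted twice to be fully credited.
    """
    gold = Counter(normalize(n) for n in gold_names)
    pred = Counter(normalize(n) for n in predicted_names)
    tp = sum((gold & pred).values())  # multiset intersection
    fp = sum(pred.values()) - tp
    fn = sum(gold.values()) - tp
    return tp, fp, fn

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def evaluate_pages(gold_by_page, pred_by_page):
    """Micro-average over all pages; both arguments map page ID to a list of names."""
    tp = fp = fn = 0
    for page_id in set(gold_by_page) | set(pred_by_page):
        t, f, n = page_level_counts(
            gold_by_page.get(page_id, []), pred_by_page.get(page_id, [])
        )
        tp, fp, fn = tp + t, fp + f, fn + n
    return precision_recall_f1(tp, fp, fn)

# Hypothetical example data, not taken from the test set.
gold = {"p1": ["John Smith", "Mary Jones"], "p2": ["Ann Lee"]}
pred = {"p1": ["john  smith", "Bob Ray"], "p2": ["Ann Lee"]}
precision, recall, f1 = evaluate_pages(gold, pred)
```

Exact string matching after normalization is the simplest choice; a fuzzy match that tolerates OCR errors would be a separate design decision.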
Re-evaluate using the instance-level annotations based on image bounding box coordinates (about 13 days minimum, with a lot of hidden variables).
Two tool options:
Write an annotation tool just for drawing full-name boxes and typing in the transcription of each full name; use it to annotate the blind test set images and output XML with coordinates, full-name labels, and transcribed names (5 days).
Use an existing art program: read the coordinates off the art program and write them into an XML file by hand, together with the transcribed names (5 days).
Read the new hand-label file format and add a new coordinate-based metric (no token IDs) to the evaluation code (2 days, or unbounded if the coordinates are bad).
Alter every extractor's code to output full-name coordinates and re-run the extractors (2 days if everyone is available and responsive, 1 or more weeks if not).
Re-evaluate extractors (1 day).
Re-make figures (1 day).
Re-write paper for ASIST including new metric explanation (2 days).
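One way the coordinate-based metric in the steps above might work is to match extracted full-name boxes to hand-labeled boxes by overlap. The sketch below is an assumption, not a decided design: the box format (x_min, y_min, x_max, y_max) in image pixels, the intersection-over-union criterion, the 0.5 threshold, and the greedy one-to-one matching are all illustrative choices.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    inter_w = min(ax1, bx1) - max(ax0, bx0)
    inter_h = min(ay1, by1) - max(ay0, by0)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def match_boxes(gold_boxes, pred_boxes, threshold=0.5):
    """Greedily pair gold and predicted boxes, best overlap first.

    Returns (true positives, false positives, false negatives) for one page.
    Each box takes part in at most one match.
    """
    candidates = sorted(
        (
            (iou(g, p), gi, pi)
            for gi, g in enumerate(gold_boxes)
            for pi, p in enumerate(pred_boxes)
        ),
        reverse=True,
    )
    used_gold, used_pred = set(), set()
    for score, gi, pi in candidates:
        if score < threshold:
            break  # remaining candidates overlap even less
        if gi in used_gold or pi in used_pred:
            continue
        used_gold.add(gi)
        used_pred.add(pi)
    tp = len(used_gold)
    return tp, len(pred_boxes) - tp, len(gold_boxes) - tp

# Hypothetical boxes for a single page.
gold_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred_boxes = [(1, 1, 10, 10), (50, 50, 60, 60)]
tp, fp, fn = match_boxes(gold_boxes, pred_boxes)
```

If the hand-labeled coordinates turn out to be unreliable, the threshold is the first thing to revisit, since a strict overlap requirement would penalize extractors for annotation noise.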