Annotation Guidelines

The original Ancestry.com data labeling guidelines: Ancestry dot Com

Page-level Image-based Labeling Guidelines


Metrics

Page Level

  • Process: Take the union of instances (e.g. full names) in both the hand-labeled and predicted label files, removing token IDs and coordinates. Evaluate as usual, matching on the text only.
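The page-level process above could be sketched as follows. This is a minimal illustration, not the project's actual scorer: it assumes each instance has already been reduced to a (label, text) pair, that "evaluate as usual" means precision, recall and F1, and that text is matched exactly. The function name `page_level_scores` and the label name `full_name` are hypothetical.

```python
def page_level_scores(hand_labeled, predicted):
    """Score one page by comparing (label, text) pairs only.

    Assumption: token IDs and coordinates were dropped before this call,
    and duplicates on the same page collapse into a single instance.
    """
    gold = set(hand_labeled)
    pred = set(predicted)
    true_positives = len(gold & pred)

    # Guard against empty sets so an empty page does not divide by zero.
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical page: the repeated hand label collapses to one instance.
hand = [("full_name", "John Smith"), ("full_name", "Mary Smith"),
        ("full_name", "John Smith")]
auto = [("full_name", "John Smith"), ("full_name", "Jon Smyth")]
print(page_level_scores(hand, auto))
```

Using sets means a name that appears several times on a page counts once; if repeated mentions should each count, a multiset (e.g. `collections.Counter`) would be the natural substitute.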

Instance Level

  • Coordinates are given for both the extractor output and the hand annotation.
  • Metrics are generated by comparing labeled boxes rather than text, using coordinates and labels.
  • The main reason for basing this metric on coordinates rather than IDs is that IDs become useless when more than one OCR engine is used.
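A box-based comparison along these lines could look like the sketch below. The guidelines do not say how two boxes are judged to match, so this example assumes an intersection-over-union (IoU) threshold of 0.5 with greedy one-to-one matching between boxes that share a label; the `Box` type, the `min_iou` parameter and the function names are all hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    label: str
    x0: float  # left
    y0: float  # top
    x1: float  # right
    y1: float  # bottom


def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = min(a.x1, b.x1) - max(a.x0, b.x0)
    inter_h = min(a.y1, b.y1) - max(a.y0, b.y0)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a.x1 - a.x0) * (a.y1 - a.y0)
    area_b = (b.x1 - b.x0) * (b.y1 - b.y0)
    return inter / (area_a + area_b - inter)


def instance_level_scores(hand_boxes, predicted_boxes, min_iou=0.5):
    """Greedily match each predicted box to one unused hand-labeled box.

    Assumption: a match needs the same label and IoU >= min_iou.
    """
    unmatched = list(hand_boxes)
    true_positives = 0
    for pred in predicted_boxes:
        best, best_iou = None, min_iou
        for gold in unmatched:
            overlap = iou(pred, gold)
            if gold.label == pred.label and overlap >= best_iou:
                best, best_iou = gold, overlap
        if best is not None:
            unmatched.remove(best)  # each hand box is matched at most once
            true_positives += 1

    precision = true_positives / len(predicted_boxes) if predicted_boxes else 0.0
    recall = true_positives / len(hand_boxes) if hand_boxes else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Because only coordinates and labels are compared, the same hand annotation can score output from any OCR engine, regardless of how that engine numbers its tokens.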


nlp-private/annotation-guidelines-and-metrics.txt · Last modified: 2015/04/23 13:22 by ryancha