nlp-private:annotation-guidelines-and-metrics [CS Wiki]




Annotation Guidelines

The original Ancestry.com data labeling guidelines: Ancestry dot Com

Page-level Image-based Labeling Guidelines


Metrics

Page Level

Process: Take the union of instances (e.g. full names) in both the hand-label and predicted-label files, removing token IDs and coordinates. Evaluate as normal, matching on the text only.
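A minimal sketch of the page-level process, under several assumptions not stated on this page: each instance is a dict with hypothetical keys `label` and `text` (plus ID and coordinate fields that are discarded), "union" is read as collapsing duplicate instances on a page into a set, and "evaluate as normal" is read as precision, recall, and F1. The whitespace normalization is also an assumption.

```python
def strip_to_text(instances):
    """Reduce instances to a set of (label, text) pairs.

    Token IDs and coordinates are dropped, and duplicates on the page
    collapse into one entry. Whitespace is collapsed (an assumption).
    """
    return {(inst["label"], " ".join(inst["text"].split())) for inst in instances}


def page_level_scores(hand, predicted):
    """Return (precision, recall, f1) for one page, matching on text only."""
    gold = strip_to_text(hand)
    pred = strip_to_text(predicted)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical records; the field names are illustrative only.
hand = [
    {"label": "fullname", "text": "John Smith", "token_ids": [1, 2], "box": (0, 0, 10, 10)},
    {"label": "fullname", "text": "Mary  Jones", "token_ids": [7, 8], "box": (0, 20, 10, 30)},
]
predicted = [
    {"label": "fullname", "text": "John Smith", "token_ids": [3, 4], "box": (1, 0, 11, 10)},
    {"label": "fullname", "text": "Jon Smith", "token_ids": [9, 10], "box": (0, 40, 10, 50)},
]
scores = page_level_scores(hand, predicted)
```

Because only the label and text survive, two files that use different token numbering or different coordinate systems can still be compared.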

Instance Level

Coordinates are given for both the extractor output and the hand annotations.
Metrics are computed by comparing labeled boxes rather than text, using coordinates and labels.
Coordinates are used instead of IDs mainly because IDs become useless when more than one OCR engine is involved.
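One way box-based comparison could be sketched is shown below. This page does not say how two boxes are judged to match, so the overlap criterion here (intersection over union with a threshold of 0.5), the greedy one-to-one matching, and the `(label, (x0, y0, x1, y1))` input shape are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def instance_level_scores(hand, predicted, threshold=0.5):
    """Return (precision, recall, f1) by matching labeled boxes.

    A predicted box counts as correct if an unmatched hand-labeled box
    with the same label overlaps it with IoU >= threshold (assumed rule).
    Each hand-labeled box can be matched at most once.
    """
    hand = list(hand)
    predicted = list(predicted)
    unmatched = list(hand)
    true_positives = 0
    for label, box in predicted:
        best_index, best_score = None, threshold
        for i, (gold_label, gold_box) in enumerate(unmatched):
            if gold_label != label:
                continue
            score = iou(box, gold_box)
            if score >= best_score:
                best_index, best_score = i, score
        if best_index is not None:
            unmatched.pop(best_index)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(hand) if hand else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical boxes: one good match, one miss, one duplicate prediction.
hand_boxes = [("fullname", (0, 0, 10, 10)), ("date", (20, 0, 30, 10))]
pred_boxes = [
    ("fullname", (1, 0, 11, 10)),
    ("date", (50, 50, 60, 60)),
    ("fullname", (0, 0, 10, 10)),
]
box_scores = instance_level_scores(hand_boxes, pred_boxes)
```

Nothing in this comparison depends on IDs, so output from different OCR engines can be scored against the same hand annotations as long as the coordinates share a reference frame.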


nlp-private/annotation-guidelines-and-metrics.txt · Last modified: 2015/04/23 13:22 by ryancha


CC Attribution-Share Alike 4.0 International