nlp-private:annotation-guidelines-and-metrics [2015/04/23 19:22] (current)
ryancha created
 +Back to [[Noisy OCR Group]]
 +
 +<br />
 +
 +__TOC__
 +
 +<br />
 +
 +== Annotation Guidelines ==
 +
 +The original Ancestry.com data labeling guidelines: [[Ancestry dot Com]]
 +
 +[[Page-level Image-based Labeling Guidelines]]
 +
 +<br />
 +
 +== Metrics ==
 +
 +Page Level
 +
 +* Process: Take the union of instances (e.g. full names) in both the hand-labeled and predicted label files, removing token IDs and coordinates. Evaluate as usual, matching on the text only.
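
The page-level process can be sketched in Python roughly as follows. This is a minimal illustration, not the group's actual scoring script: the instance fields (`label`, `text`, `token_ids`, `box`) and the function names are hypothetical, and it assumes that "union" means collapsing each file into a set of distinct (label, text) pairs and that "evaluate as usual" means precision, recall, and F1.

```python
def to_text_set(instances):
    """Drop token IDs and coordinates; keep distinct (label, text) pairs.

    Each instance is assumed to be a dict with hypothetical keys
    "label", "text", "token_ids", and "box".
    """
    # Collapse runs of whitespace so OCR spacing differences do not matter.
    return {(inst["label"], " ".join(inst["text"].split())) for inst in instances}


def page_level_scores(gold_instances, predicted_instances):
    """Return (precision, recall, f1) for one page, matching on text only."""
    gold = to_text_set(gold_instances)
    pred = to_text_set(predicted_instances)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example page: "John Smith" is hand-labeled twice.
gold_page = [
    {"label": "full_name", "text": "John Smith", "token_ids": [1, 2], "box": (0, 0, 10, 5)},
    {"label": "full_name", "text": "John  Smith", "token_ids": [40, 41], "box": (0, 50, 10, 55)},
    {"label": "full_name", "text": "Mary Jones", "token_ids": [7, 8], "box": (0, 20, 10, 25)},
]
predicted_page = [
    {"label": "full_name", "text": "John Smith", "token_ids": [3, 4], "box": (1, 0, 11, 5)},
    {"label": "full_name", "text": "Jon Smith", "token_ids": [9, 10], "box": (0, 50, 10, 55)},
]
page_precision, page_recall, page_f1 = page_level_scores(gold_page, predicted_page)
```

Because instances are reduced to sets, a name that appears several times on a page counts once, which is the assumed meaning of "union" here.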
 +
 +Instance Level
 +
 +* Coordinates are given for both the extractor output and the hand annotations.
 +* Metrics are generated by comparing labeled boxes, not text, using coordinates and labels.
 +* The main reason for basing this metric on coordinates rather than IDs is that IDs become useless when more than one OCR engine is used.
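
A box-based comparison along these lines could look like the following Python sketch. The page does not specify how two boxes are judged to match, so the criterion here is an assumption: a predicted box counts as correct if it has the same label as a hand-annotated box and an intersection-over-union (IoU) of at least 0.5, with each hand-annotated box matched at most once. The box format `(x0, y0, x1, y1)` and all function names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def instance_level_scores(gold, predicted, min_iou=0.5):
    """Return (precision, recall, f1) from (label, box) pairs.

    Greedy one-to-one matching; min_iou is an assumed threshold,
    not one taken from the guidelines.
    """
    gold = list(gold)
    predicted = list(predicted)
    unmatched = list(gold)
    true_positives = 0
    for pred_label, pred_box in predicted:
        best_index, best_score = None, min_iou
        for i, (gold_label, gold_box) in enumerate(unmatched):
            if gold_label != pred_label:
                continue  # labels must agree before boxes are compared
            score = iou(pred_box, gold_box)
            if score >= best_score:
                best_index, best_score = i, score
        if best_index is not None:
            unmatched.pop(best_index)  # each gold box is used at most once
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example: one good match, one misplaced box, one wrong label.
gold_boxes = [("full_name", (0, 0, 10, 10)), ("date", (20, 0, 30, 10))]
predicted_boxes = [
    ("full_name", (1, 0, 11, 10)),
    ("date", (50, 50, 60, 60)),
    ("full_name", (20, 0, 30, 10)),
]
box_precision, box_recall, box_f1 = instance_level_scores(gold_boxes, predicted_boxes)
```

No token IDs appear anywhere in this comparison, so output from different OCR engines can be scored against the same hand annotations as long as the coordinates share one page coordinate system.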
 +
 +<br />
  
nlp-private/annotation-guidelines-and-metrics.txt · Last modified: 2015/04/23 19:22 by ryancha
CC Attribution-Share Alike 4.0 International