Annotation Guidelines

The original data labeling guidelines: Ancestry dot Com

Page-level Image-based Labeling Guidelines

Page Level

  • Process: Take the union of instances (e.g. full names) from the hand-label and predicted-label files, after removing token IDs and coordinates. Then evaluate as usual, matching on the text only.
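
The page-level process above could be sketched as follows. This is a minimal illustration, not the original evaluation code: the `Instance` record, its field names, and the `page_level_scores` helper are hypothetical, and "evaluate as usual" is assumed here to mean precision, recall, and F1. Repeated instances with the same label and text on a page are assumed to collapse into one.

```python
from typing import Iterable, NamedTuple


class Instance(NamedTuple):
    # Hypothetical record layout; the real label files may differ.
    label: str               # e.g. "full_name"
    text: str                # transcribed text of the instance
    token_ids: tuple = ()    # dropped at page level
    box: tuple = ()          # coordinates, dropped at page level


def page_level_scores(hand: Iterable[Instance], predicted: Iterable[Instance]) -> dict:
    # Strip token IDs and coordinates: keep only (label, text) pairs.
    hand_set = {(i.label, i.text) for i in hand}
    pred_set = {(i.label, i.text) for i in predicted}

    # Every pair in the union falls into exactly one of these three groups.
    tp = len(hand_set & pred_set)   # in both files
    fp = len(pred_set - hand_set)   # predicted only
    fn = len(hand_set - pred_set)   # hand-labeled only

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Example data (invented for illustration only).
hand = [
    Instance("full_name", "John Smith", (1, 2), (0, 0, 10, 10)),
    Instance("full_name", "Mary Smith", (3, 4), (0, 20, 10, 30)),
]
predicted = [
    Instance("full_name", "John Smith", (7, 8), (1, 1, 11, 11)),
    Instance("full_name", "Jon Smyth", (9, 10), (0, 21, 10, 31)),
]
scores = page_level_scores(hand, predicted)
```

Because only the label and text are compared, the differing token IDs and boxes on "John Smith" do not prevent a match.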

Instance Level

  • Coordinates are given for both the extractor output and the hand annotations.
  • Metrics are generated by comparing labeled boxes and not text, using coordinates and labels.
  • The main reason for basing this metric on coordinates instead of IDs is that IDs become useless when more than one OCR engine is used.
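
One way to compare labeled boxes is sketched below. The guidelines do not say how two boxes are judged to match, so the intersection-over-union (IoU) criterion, the 0.5 threshold, the greedy one-to-one matching, and the `(x0, y0, x1, y1)` box layout are all assumptions made for illustration; `iou` and `instance_level_scores` are hypothetical helper names.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def instance_level_scores(hand, predicted, threshold=0.5):
    """Match (label, box) pairs using labels and coordinates only.

    No token IDs or text are consulted, so the result does not depend
    on which OCR engine produced the tokens. The threshold is an assumption.
    """
    unmatched = list(hand)
    tp = 0
    for p_label, p_box in predicted:
        best_idx, best_iou = None, threshold
        for idx, (h_label, h_box) in enumerate(unmatched):
            if h_label != p_label:
                continue  # labels must agree
            score = iou(p_box, h_box)
            if score >= best_iou:
                best_idx, best_iou = idx, score
        if best_idx is not None:
            unmatched.pop(best_idx)  # each hand box is matched at most once
            tp += 1
    fp = len(predicted) - tp
    fn = len(unmatched)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Example data (invented for illustration only).
hand_boxes = [("full_name", (0, 0, 10, 10)), ("date", (20, 0, 30, 10))]
pred_boxes = [("full_name", (1, 0, 11, 10)), ("date", (50, 50, 60, 60))]
box_scores = instance_level_scores(hand_boxes, pred_boxes)
```

In this example the predicted "full_name" box overlaps its hand-annotated counterpart closely enough to match, while the predicted "date" box lies elsewhere on the page and counts as both a false positive and a missed instance.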

