Main Page



Welcome. This wiki provides access to and information about the Printed Document Corpus. This corpus contains document images from a variety of printed documents relevant to family history research, including books and newspapers.

Corpus Description

Family History Children Lists

The text of about 300 entries from lists of children in two family history books: Ely and Barber. The entries have been manually labeled with field labels and entity labels, stored in two separate files. Additional files give other variations of the text, such as conflated numerals (all digits replaced by 8's). The entries have been split into three subsets: training, development test, and blind test. The data is suitable for training and evaluating a hidden Markov model sequence labeler.
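
As an illustration of the conflated-numeral variation, the following minimal Python sketch replaces every ASCII digit in a line of text with 8. The function name `conflate_numerals` and the sample entry are hypothetical and are not taken from the corpus files; the corpus itself may have been produced with a different tool.

```python
import re


def conflate_numerals(text: str) -> str:
    """Return text with every ASCII digit (0-9) replaced by '8'.

    Hypothetical helper: it mirrors the "all digits replaced by 8's"
    description, not the actual script used to build the corpus.
    """
    return re.sub(r"[0-9]", "8", text)


# Made-up sample entry, for illustration only.
sample = "2. Mary, b. 12 May 1793."
print(conflate_numerals(sample))
```

Conflating digits in this way collapses all numbers with the same number of digits into a single token shape, which reduces vocabulary sparsity for a sequence labeler.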

Transcriptions and Family

Several pages of documents with corresponding images, OCR output, manual transcriptions, and manual annotations of life event and family relationship information.
