Main Page


Ancestry.com Corpus

Welcome. This wiki provides access to and information about the Ancestry.com Printed Document Corpus. This corpus contains document images from a variety of printed documents relevant to family history research, including books and newspapers.

Corpus Description


Family History Children Lists

The text of about 300 entries drawn from lists of children in two family history books: Ely and Barber. The entries have been manually labeled with field labels and entity labels, stored in two separate files. Additional files give other variations of the text, such as conflated numerals (all digits replaced by 8's). The entries are split into three subsets: training, development test, and blind test. The data is suitable for training and evaluating a hidden Markov model sequence labeler.

Media:FamilyHistoryChildrenLists.zip
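
The conflated-numerals variation described above can be reproduced from plain text with a one-line substitution. The following is a minimal sketch, not the procedure actually used to build the corpus files: the function name conflate_numerals and the sample entry are hypothetical, and it assumes that only ASCII digits 0-9 are conflated.

```python
import re


def conflate_numerals(text: str) -> str:
    """Replace every ASCII digit with '8'; all other characters are unchanged.

    Assumption: only the digits 0-9 are conflated. The corpus files may
    have been produced differently.
    """
    return re.sub(r"[0-9]", "8", text)


if __name__ == "__main__":
    # Hypothetical entry, invented for illustration (not taken from the corpus).
    entry = "3. Mary, b. 12 May 1794; d. 1860."
    print(conflate_numerals(entry))  # prints: 8. Mary, b. 88 May 8888; d. 8888.
```

Conflating digits this way keeps the length and shape of each number (for example, a four-digit year) while collapsing all distinct numerals into a single pattern, which reduces the vocabulary a sequence labeler has to model.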



Transcriptions and Family Annotations

Several pages of documents with corresponding images, OCR, manual transcription and manual annotation of life event and family relationship information.

Media:TranscriptionsAndFamilyAnnotations.zip


Wiki Stuff

Consult the User's Guide for information on using the wiki software.

