Differences

This shows you the differences between two versions of the page.

nlp-private:library-ocr-tasks [2015/04/23 19:36] ryancha created
nlp-private:library-ocr-tasks [2015/04/23 19:37] ryancha
Line 7:
 * Use more OCR engines in the current system
 ** [http://www.irislink.com/c2-1584-189/Readiris-12---OCR-Software-------Convert-your-Paper-Documents-into-Editable-Text-.aspx ReadIRIS]
-** Adobe Acrobat OCR -- Assigned to [[User:Cr24|Chris Rotz]]
+** Adobe Acrobat OCR -- Assigned to [[Cr24|Chris Rotz]]
 ** [http://www.primerecognition.com/ Prime OCR]
 * OCR confusion matrix to adjust the costs on mismatches. The hope is that there will be fewer paths through the network which may allow us to do more complex documents, and explore the network more quickly.
Line 16:
 * Use a language model to select between multiple accepted words. Requires augmenting the lattice as described above.
 ** Need a mid-20th century news corpus for training.
-* [[Sclite Viewer]]: take an Sclite file and view the contents in a way that shows each "sausage". -- Assigned to [[User:Cr24|Chris Rotz]]
+* [[Sclite Viewer]]: take an Sclite file and view the contents in a way that shows each "sausage". -- Assigned to [[Cr24|Chris Rotz]]
 * [[Aligned Backpointer Viewer]]: take the aligned backpointer output of DocumentLattice and view the contents in a way that shows the optimal alignment, the "sausages" and a count of the optimal paths for each sausage.
  
Line 60:
 ** Complete runs on both DS2 and Marylou5
 
-==Tasks for [[User:Cr24|Chris Rotz]]==
+==Tasks for [[Cr24|Chris Rotz]]==
 * Come up to speed
 * Run Eisenhower Communiques through Adobe OCR
nlp-private/library-ocr-tasks.txt · Last modified: 2015/04/23 19:37 by ryancha
CC Attribution-Share Alike 4.0 International