__NOTOC__

Chris Rotz

<!-- Please add contact information here. -->

<!-- As you can see, you can add comments to the page that are not displayed.

 If you surround text with equals signs, you get headings. The more equals signs, the lower the ranking.
 Make each day a separate heading. -->

<!-- Chris, edit below this line and it will appear in the column on the left side of the page… -->

==Wednesday, December 2, 2009==

Updated documentlattice class for Adobe and Readiris

Completed simple and combined confusion matrices for Omnipage and Tesseract

Fixing transcripts. Transcript 217 is actually 217a, and 217 is missing. Will update confusion matrices when this is done.

==Monday, November 30, 2009==

Did alignments for Omnipage and Tesseract

Completed combined confusion matrix for Abbyy

==Friday, November 11, 2009==

Created a new set of transcript files with the variants and comments removed. The new files are found in fslg_nlpocr/OCR/transcriptRemovedComments.

Redid the Abbyy/transcript alignments.

==Wednesday, November 11, 2009==

Continue work on confusion matrix.

Noticed that some transcript files contain extra information. Many of them contain variant or corrected spellings in brackets, such as PILSEN [PLZEN]. This changes the alignments and alignment scores.

I could remove or comment out these variants if it has not already been done.
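A minimal sketch of how the bracketed variants could be stripped before alignment. It assumes annotations never nest and never span lines; the function name `strip_bracketed` is hypothetical and is not the script actually used for this project.

```python
import re

# One bracketed annotation plus any spaces or tabs directly before it.
# Assumption: annotations do not nest and do not cross line breaks.
BRACKETED = re.compile(r"[ \t]*\[[^\[\]\n]*\]")


def strip_bracketed(text: str) -> str:
    """Remove bracketed variants or comments from transcript text."""
    return BRACKETED.sub("", text)


print(strip_bracketed("PILSEN [PLZEN] was taken"))
```

Removing the whitespace in front of each bracket keeps the surrounding words separated by a single space, so the cleaned transcript does not pick up doubled spaces that could shift the alignment.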

==Friday, November 6, 2009==

Continue work on confusion matrix.

==Wednesday, November 4, 2009==

The blank files were a problem with Marylou. It is fixed now, so I have done the alignments.

Working on the confusion matrix. Creating an XML-table type of structure to hold information for each document and OCR engine.
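A rough sketch of one way to hold confusion counts per OCR engine. It assumes each alignment is available as two equal-length strings with a gap symbol; the `GAP` marker, the function names, and the in-memory layout are all hypothetical and stand in for the XML-style structure described above.

```python
from collections import Counter, defaultdict

GAP = "-"  # hypothetical gap symbol marking an insertion or deletion


def count_confusions(ref_aligned: str, hyp_aligned: str) -> Counter:
    """Count (transcript char, OCR char) pairs from one aligned document."""
    if len(ref_aligned) != len(hyp_aligned):
        raise ValueError("aligned strings must have equal length")
    return Counter(zip(ref_aligned, hyp_aligned))


# Engine name -> Counter of (transcript char, OCR char) pairs.
matrices = defaultdict(Counter)


def add_document(engine: str, ref_aligned: str, hyp_aligned: str) -> None:
    """Fold one document's alignment into the matrix for its engine."""
    matrices[engine].update(count_confusions(ref_aligned, hyp_aligned))


add_document("Abbyy", "PILSEN", "P1LSEN")
print(matrices["Abbyy"][("I", "1")])
```

Keeping one counter per engine means documents can be added in any order, and the off-diagonal entries (pairs whose two characters differ) are the recognition errors.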

==Monday, November 2, 2009==

Use Marylou to align Abbyy output with transcript output

Alignment files are coming back blank. They were working earlier, and I have not changed anything.

==Wednesday, October 28, 2009==

Complete Sclite output for Adobe and Readiris. Sclite results are on Marylou5 with the other baseline results. The earlier conversion errors were caused by the characters '•' and '·' being handled differently in Mac and Unix environments.
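One possible explanation, not confirmed anywhere in this log, is a character encoding mismatch between the two machines. The sketch below only shows that the two characters have different byte forms under Mac OS Roman and UTF-8; the choice of those two encodings is an assumption, and `byte_forms` is a hypothetical helper.

```python
# Assumption: the files were written on the Mac as Mac OS Roman and read on
# Unix as UTF-8. The log does not say which encodings were actually involved.
ENCODINGS = ("mac_roman", "utf-8")


def byte_forms(ch: str) -> dict:
    """Return the encoded bytes of one character under each encoding."""
    return {enc: ch.encode(enc) for enc in ENCODINGS}


for ch in ("•", "·"):
    print(repr(ch), byte_forms(ch))
```

If the byte forms differ, a script that matches these characters byte-for-byte will behave differently depending on how the file was saved, so decoding the input with an explicit encoding is a reasonable first thing to try.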

==Monday, October 26, 2009==

Work on Sclite output for Adobe and Readiris. Converting Adobe to sclite-friendly output works on the Mac, but not in Unix. Not sure why yet.

==Monday, October 12, 2009==

Work on improving Readiris output.

Work on implementing Apache CLI.

==Friday, October 9, 2009==

Work on improving Readiris output.

Work on learning Apache CLI.

==Wednesday, October 7, 2009==

sclite viewer documentation

Begin work on a program to format Readiris output (sclite-friendly).

==Monday, October 5, 2009==

Finished with Readiris fixes.

==Friday, October 2, 2009==

The Readiris files are being read incorrectly, with sections of text placed out of order or interpreted as images. I am going through and fixing them.

==Wednesday, September 30, 2009==

Finish sclite Viewer (still needs better documentation and testing on PC and Linux)

Continue working on Readiris OCR

Start learning BASH shell

==Monday, September 28, 2009==

Work on sclite Viewer and Readiris

==Wednesday, September 23, 2009==

Work on sclite Viewer

==Monday, September 21, 2009==

Learn Readiris OCR

Finish Adobe OCR

==Friday, September 18, 2009==

Adobe OCR of Eisenhower Communique and DesNews

==Wednesday, September 16, 2009==

Set up new computer

Connect to Marylou5
:copy TIFF files (in progress)

Become more familiar with BASH commands

==Friday, September 11, 2009==

Read over documents:

:Improving Optical Character Recognition…
:An Improved Search Algorithm…

Met with Bill.

Transcribing documents. (Abandoned)

NOCR meeting.

<!-- End of left panel. Do not edit below this line. -->

==Stuff for Chris to Do==
  • Look on Marylou5 and determine which files referenced in trainingSet.txt and devTest.txt are missing.
    • OCR/*
    • Results/Baseline OCR Sclite Results/*
    • Sclite/*
nlp-private/cr24.txt · Last modified: 2015/04/23 19:36 by ryancha