nlp-private:cr24 [2015/04/23 19:36] (current)
ryancha created
__NOTOC__
=Chris Rotz=
<!-- Please add contact information here. -->

<!-- As you can see, you can add comments to the page that are not displayed.
  If you surround text with equals signs, you get headings. The more equals signs, the lower the heading level.
  Make each day a separate heading. -->
{|cellspacing="3" |- valign="top" |width="50%" style="border: 1px solid #ffc9c9; color: #000"|
|
<!-- Chris, edit below this line and it will appear in the column on the left side of the page... -->
==Wednesday, December 2, 2009==

Updated the documentlattice class for Adobe and Readiris.

Completed the simple and combined confusion matrices for Omnipage and Tesseract.
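The log does not define "simple" versus "combined" confusion matrices. As a hedged sketch only, assume a simple matrix counts (transcript character, OCR character) pairs from one document's alignment, and a combined matrix sums those counts over all documents. The pair format (with "" marking a gap) and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical input: one alignment is a sequence of (reference, hypothesis)
# character pairs, with "" marking an insertion or deletion gap.
AlignedPair = Tuple[str, str]


def simple_confusion(alignment: Iterable[AlignedPair]) -> Counter:
    """Count (transcript char, OCR char) pairs for a single document."""
    return Counter(alignment)


def combined_confusion(alignments: Iterable[Iterable[AlignedPair]]) -> Counter:
    """Sum per-document matrices into one (assumed meaning of 'combined')."""
    total = Counter()
    for alignment in alignments:
        total.update(simple_confusion(alignment))
    return total


if __name__ == "__main__":
    doc_a = [("P", "P"), ("L", "1"), ("Z", "Z")]
    doc_b = [("L", "1"), ("E", "")]
    matrix = combined_confusion([doc_a, doc_b])
    print(matrix[("L", "1")])  # 2: "L" read as "1" once in each document
```

Keeping the matrix as a sparse Counter keyed by character pairs avoids fixing the alphabet in advance, which matters when different OCR engines emit different character sets.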

Fixing transcripts: the file labeled transcript 217 is actually 217a, and the real transcript 217 is missing. I will update the confusion matrices when this is done.

==Monday, November 30, 2009==

Did the alignments for Omnipage and Tesseract.

Completed the combined confusion matrix for Abbyy.

==Friday, November 13, 2009==

Created a new set of transcript files with the variants and comments removed. The new files are in fslg_nlpocr/OCR/transcriptRemovedComments.

Redid the Abbyy/transcript alignments.

==Wednesday, November 11, 2009==

Continue work on the confusion matrix.

Noticed that some transcript files contain extra information. Many of them contain variant or corrected spellings in brackets, such as PILSEN [PLZEN]. These bracketed variants change the alignments and the alignment scores.

I could remove or comment out these variants if that has not already been done.
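Here is a minimal sketch of removing the bracketed variants. It assumes every variant is a single, non-nested pair of square brackets like [PLZEN] and that the whitespace in front of it should be removed too. The regular expression and function name are illustrative, not the script actually used. The comment syntax mentioned elsewhere in the log is not handled because it is not described.

```python
import re

# Assumption: a variant is any bracketed span such as "[PLZEN]", never nested,
# and any whitespace immediately before it is removed along with it.
BRACKETED_VARIANT = re.compile(r"\s*\[[^\]]*\]")


def strip_variants(line: str) -> str:
    """Remove bracketed variant spellings from one transcript line."""
    return BRACKETED_VARIANT.sub("", line)


if __name__ == "__main__":
    # Invented sample line built around the example from the log.
    print(strip_variants("ARRIVED AT PILSEN [PLZEN] TODAY"))
    # prints: ARRIVED AT PILSEN TODAY
```

Writing the cleaned lines to a separate directory, rather than editing the originals, keeps the variant spellings available if they are needed later.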

==Friday, November 6, 2009==

Continue work on the confusion matrix.

==Wednesday, November 4, 2009==

The blank alignment files were caused by a problem on Marylou. It is fixed now, so I have done the alignments.

Working on the confusion matrix: creating an XML table-like structure to hold the information for each document and OCR engine.
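The sketch below shows one possible XML table-like structure, with one entry per document and OCR engine, built with Python's standard xml.etree.ElementTree. The element and attribute names (results, document, engine, cell, ref, hyp, count) are hypothetical, because the log does not give the actual schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# counts maps (document id, engine name) -> {(transcript char, OCR char): n}
Counts = Dict[Tuple[str, str], Dict[Tuple[str, str], int]]


def build_results(counts: Counts) -> ET.Element:
    """Nest confusion counts as <results>/<document>/<engine>/<cell>."""
    root = ET.Element("results")
    documents: Dict[str, ET.Element] = {}
    for (doc_id, engine), matrix in sorted(counts.items()):
        doc = documents.get(doc_id)
        if doc is None:
            # One <document> element per id, shared by all of its engines.
            doc = ET.SubElement(root, "document", id=doc_id)
            documents[doc_id] = doc
        eng = ET.SubElement(doc, "engine", name=engine)
        for (ref, hyp), n in sorted(matrix.items()):
            ET.SubElement(eng, "cell", ref=ref, hyp=hyp, count=str(n))
    return root


if __name__ == "__main__":
    counts = {("217a", "Tesseract"): {("L", "1"): 2}}
    print(ET.tostring(build_results(counts), encoding="unicode"))
```

Grouping engines under each document makes per-document comparisons between engines a single lookup, for example with root.find("./document[@id='217a']").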

==Monday, November 2, 2009==

Use Marylou to align Abbyy output with the transcripts.
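The log does not say which aligner was run on Marylou. As a stand-in, the sketch below uses a plain Levenshtein (edit-distance) alignment with unit costs and a backtrace. It produces (transcript character, OCR character) pairs, with "" marking a gap. This is an assumed technique, not necessarily the one used for the alignments described here.

```python
from typing import List, Tuple


def align(ref: str, hyp: str) -> List[Tuple[str, str]]:
    """Levenshtein alignment with unit costs; "" marks a gap."""
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub,  # match or substitution
                dist[i - 1][j] + 1,        # deletion: char missing from OCR
                dist[i][j - 1] + 1,        # insertion: extra OCR char
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((ref[i - 1], ""))
            i -= 1
        else:
            pairs.append(("", hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    print(align("PLZEN", "P1ZEN"))
```

The table takes O(len(ref) × len(hyp)) memory, so whole pages are usually aligned line by line or in chunks rather than as one string.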

Alignment files are coming back blank. They were working earlier, and I have not changed anything, so I am not sure why.

==Wednesday, October 28, 2009==

Complete sclite output for Adobe and Readiris. The sclite results are on Marylou5 with the other baseline results.
The earlier conversion errors were caused by the characters '•' and '·', which behave differently in the Mac and Unix environments.
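The log does not say why '•' and '·' behaved differently. One plausible cause is a text-encoding mismatch, though this is only an assumption. In UTF-8 these characters take several bytes (E2 80 A2 and C2 B7), while Mac OS Roman stores each one as a single byte. The sketch below decodes with a UTF-8-then-Mac-Roman fallback and replaces both characters with a hypothetical ASCII stand-in.

```python
# The two characters named in the log.
BULLET = "\u2022"      # '•'
MIDDLE_DOT = "\u00b7"  # '·'


def decode_ocr_bytes(raw: bytes) -> str:
    """Decode as UTF-8, falling back to Mac OS Roman (an assumed heuristic)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("mac_roman")


def normalize(text: str, replacement: str = " ") -> str:
    """Replace both characters with a hypothetical ASCII stand-in."""
    return text.replace(BULLET, replacement).replace(MIDDLE_DOT, replacement)


if __name__ == "__main__":
    for ch in (BULLET, MIDDLE_DOT):
        print(repr(ch), ch.encode("utf-8"), ch.encode("mac_roman"))
    # The same text in two encodings normalizes to the same string.
    mac_bytes = ("A" + BULLET + "B").encode("mac_roman")
    utf8_bytes = ("A" + BULLET + "B").encode("utf-8")
    print(normalize(decode_ocr_bytes(mac_bytes)) == normalize(decode_ocr_bytes(utf8_bytes)))
```

The fallback is only a heuristic. A Mac Roman file whose bytes happen to form valid UTF-8 would be misread, so converting everything to one encoding before running sclite is the more robust choice.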

==Monday, October 26, 2009==

Work on sclite output for Adobe and Readiris. Converting Adobe output to a sclite-friendly format works on the Mac but not on Unix. Not sure why yet.

==Monday, October 12, 2009==

Work on improving Irisreader output.

Work on implementing Apache Commons CLI.

==Friday, October 9, 2009==

Work on improving Irisreader output.

Work on learning Apache Commons CLI.

==Wednesday, October 7, 2009==

sclite Viewer documentation.

Begin work on an Irisreader output formatting program (sclite-friendly).
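sclite's trn format puts each segment's words on one line, followed by an utterance ID in parentheses. Below is a hedged sketch of an "sclite friendly" formatter. It assumes one record per document and treats parentheses in the OCR text as noise. The function names and the ID scheme are hypothetical, because the actual Irisreader formatting program is not shown.

```python
import re
from pathlib import Path
from typing import Dict


def to_trn(text: str, utterance_id: str) -> str:
    """Format one document's text as a single sclite trn record."""
    # Parentheses would be confused with the trailing ID field, so blank
    # them out (an assumption about how such characters should be treated).
    cleaned = re.sub(r"[()]", " ", text)
    # Collapse newlines, tabs and repeated spaces so the record is one line.
    words = " ".join(cleaned.split())
    return f"{words} ({utterance_id})"


def write_trn(texts: Dict[str, str], path: Path) -> None:
    """Write one record per document, sorted by ID."""
    with path.open("w", encoding="utf-8") as out:
        for utterance_id in sorted(texts):
            out.write(to_trn(texts[utterance_id], utterance_id) + "\n")
```

sclite pairs reference and hypothesis records by ID. The transcript file and each OCR engine's file therefore have to be written with the same IDs, which is why this sketch keys every record on a document ID rather than on OCR line numbers.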

==Monday, October 5, 2009==

Finished the Irisreader fixes.

==Friday, October 2, 2009==

The Irisreader files are being read incorrectly: sections of text are placed out of order or interpreted as images. I am going through and fixing them.

==Wednesday, September 30, 2009==

Finish sclite Viewer (still needs better documentation and testing on PC and Linux).

Continue working on Readiris OCR.

Start learning the Bash shell.

==Monday, September 28, 2009==

Work on sclite Viewer and Readiris.

==Wednesday, September 23, 2009==

Work on sclite Viewer.

==Monday, September 21, 2009==

Learn Readiris OCR.

Finish Adobe OCR.

==Friday, September 18, 2009==

Adobe OCR of the Eisenhower Communique and DesNews.

==Wednesday, September 16, 2009==

Set up new computer.

Connect to Marylou5
:copy TIFF files (in progress)

Become more familiar with Bash commands.

==Friday, September 11, 2009==

Read over documents:

:Improving Optical Character Recognition...
:An Improved Search Algorithm...

Met with Bill.

Transcribing documents. (Abandoned)

NOCR meeting.

<!-- End of left panel. Do not edit below this line. -->

|valign="top"|
==Stuff for Chris to Do==

* Look on Marylou5 and determine which files referenced in the trainingSet.txt and devTest.txt files are missing:
** OCR/*
** Results/Baseline OCR Sclite Results/*
** Sclite/*
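Here is a sketch of the missing-file check, under several assumptions. First, trainingSet.txt and devTest.txt list one document ID per line. Second, a file counts as present when its name without the extension (its stem) matches an ID. Third, subdirectories are searched too. The log does not give the base directory on Marylou5, so it is a parameter here. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Set

# Directories from the to-do list, relative to a base directory
# that is not given in the log.
DIRECTORIES = ["OCR", "Results/Baseline OCR Sclite Results", "Sclite"]


def read_ids(list_file: Path) -> List[str]:
    """Assumption: one document ID per line; blank lines are ignored."""
    return [line.strip() for line in list_file.read_text().splitlines() if line.strip()]


def find_missing(base: Path, list_files: List[Path]) -> Dict[str, Set[str]]:
    """Map each directory to the listed IDs that have no file with that stem."""
    wanted: Set[str] = set()
    for list_file in list_files:
        wanted.update(read_ids(list_file))
    missing: Dict[str, Set[str]] = {}
    for directory in DIRECTORIES:
        folder = base / directory
        # A missing directory counts as containing nothing.
        present = {p.stem for p in folder.rglob("*") if p.is_file()} if folder.is_dir() else set()
        missing[directory] = wanted - present
    return missing
```

A call such as find_missing(base, [base / "trainingSet.txt", base / "devTest.txt"]) returns one set of absent IDs per directory, which can be printed or written back to the wiki.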

|}
  
CC Attribution-Share Alike 4.0 International