nlp-private:noisy-ocr-group [2015/04/23 19:20] (current)
ryancha created
__TOC__

<br />

== Planned Meeting Topics ==

<br />

[[Past Meetings]]

<br />

== Future Meeting Agenda Items ==

For Now:

* Review annotation tools, e.g., Lehigh's DAE (http://dae.cse.lehigh.edu/DAE/), GEDI, and PixLabeler.
* Dan and Dr. Ringger to present a good paper on relevant current research.
* Get the CIKM "AND" proceedings ('07, '08, '09). This is a very relevant venue for this group.
* Josh Hansen's HMM-LDA project?
* Report on the state of the data.
* Decide on an annotation tool ([[Image Annotation Tools]]).
* Decide on an OCR engine ([[OCR Engines]]).
* Document image data sets ([[Document Image Data Sets]]).
* Funding discussion.
* Aaron to demo the FOCIH-based image annotation tool.

For Later:

* Create an annotation plan for Ancestry and other data.
* Arrange annotation help: hire undergrads, involve the library, and request Ancestry employees to do annotation.
* Dr. Ringger to invite Dan Lopresti to come to BYU.
* Decide whether to organize a competition.
* Gain more clarity on directions and contributions, including specific projects, which tools to use (OCRopus?), how to leverage those tools, which data to work on, how to magnify our efforts by working together as a group, etc.
* Plan for zoning, layout, language modeling, table interpretation, and E/R classification earlier in the (OCRopus) pipeline.
* Should we use OCRopus, train a character recognition model on Internet Archive data, and hire an undergrad to do this and output bounding boxes, etc.?
* CHURP to NSF proposal (sometime between June and August).
* OCRopus update.

<br />

== Major Tasks ==

These are big tasks affecting everyone in the NOCR group.

* '''[[Image Annotation Tools]]''': Find, adapt, or create an annotation tool to help us create good benchmarking data sets, including gold-standard zone, entity, and relationship annotations.
* '''[[Document Image Data Sets]]''': Find, annotate, and extract from one or more document image data sets.
* '''[[OCR Engines]]''': Find a good OCR engine and use it.
* '''[[Annotation Guidelines and Metrics]]''':
* '''[[Papers]]'''

<br />

== Participants ==

* Aaron Stewart
* Bill Lund
* Dan Walker
* Thomas Packer (tpacker@byu.net)
* Dr. David Embley
* Dr. Eric Ringger
  
CC Attribution-Share Alike 4.0 International