  • Cuneiform

:Pros
:*Free
:*Mostly open source; the rest is to be released in the future.
:Cons
:*Currently Windows only. A port to Mac and Linux is planned, but work on it seems to have stalled.
:*Text output only, no searchable PDF
:*Much of the documentation is in Russian

  • GOCR (JOCR)

:Pros
:*Free
:*Open source
:Cons
:*Text output only
:*Images are converted to PBM, PGM, PPM. Is this a lossy conversion?

  • OCRAD

:Pros
:*Free
:*Open source
:Cons
:*Text output only
:*PBM, PGM, and PPM images only

  • Expervision (has a trial version, which I have not been able to obtain yet)

:Pros
:*Exports to searchable PDF
:*Has an SDK for use with C/C++
:Cons
:*Licensing model involves royalty fees
:*Claims to be compatible with all operating systems, but the demo information says Visual C++ is required?

  • Microsoft Office Document Imaging, MODI (I have not found a computer with this installed yet)

:Pros
:*Comes free with some/all versions of MS Office on Windows. It is an optional install, so many computers do not have it installed.
:*I have read that text coordinates can be obtained from MODI.
:*Takes TIFF images
:Cons
:*Windows only

  • ReadSoft

:ReadSoft is geared more toward businesses looking for ways to automate large-scale document processing for managing and organizing data. OCR is only a small part of what they do. In talking with them, it doesn't seem like they have a product that is very specific to what we are doing.

  • SimpleOCR (I have not tried the demo yet)

:Pros
:*Freeware version, including a command-line version and SDK
:*Documentation says it can return coordinates of recognized words and images
:*Takes TIFF and other images
:Cons
:*Windows only
:*Does not appear to output PDF

  • PDF OCR X

:Pros
:*Free version and paid ($30) “Enterprise” version. The free version restricts PDFs to one page.
:*Takes TIFF files
:Cons
:*Text output only

  • NovoDynamics

:Pros
:*Professional version creates searchable PDFs
:Cons
:*Focused mainly on Middle-Eastern & Asian languages. Works on “Embedded English.”
:*Expensive. Standard version costs $1300. Professional version is “call for pricing.”
:*No demo available (at least not to me, perhaps if I were a better potential customer?)

  • MoreData/MoreDataFast (based on Tesseract)

:Pros
:*Free
:Cons
:*Windows only
:*Text output only
:*Documentation in Italian

  • BrainWare

:BrainWare is a product like ReadSoft, geared toward the bigger picture of automating data management. OCR is only a small piece of what they do.
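
Regarding the question under GOCR about whether conversion to PBM, PGM, or PPM is lossy: these Netpbm formats store uncompressed samples, so converting an 8-bit grayscale image to PGM (or 8-bit RGB to PPM) should preserve every pixel value, while PBM holds only one bit per pixel and therefore discards grayscale detail. The following is a minimal Python sketch of that distinction, not the code GOCR or OCRAD actually use. The function names (`encode_pgm`, `decode_pgm`, `threshold_to_bits`) and the cutoff of 128 are assumptions made for illustration, and the decoder handles only the simple headers written by `encode_pgm` (no comment lines).

```python
def encode_pgm(width, height, pixels, maxval=255):
    """Encode 8-bit grayscale pixels (row-major ints) as binary PGM (P5)."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match dimensions")
    header = f"P5\n{width} {height}\n{maxval}\n".encode("ascii")
    return header + bytes(pixels)


def decode_pgm(data):
    """Decode a binary PGM written by encode_pgm (header without comments)."""
    # Split off the three header lines only; the raster may itself contain
    # newline bytes, so it must stay in one piece.
    magic, dims, maxval, raster = data.split(b"\n", 3)
    if magic != b"P5":
        raise ValueError("not a binary PGM")
    width, height = map(int, dims.split())
    return width, height, int(maxval), list(raster[: width * height])


def threshold_to_bits(pixels, cutoff=128):
    """Reduce grayscale to 1 bit per pixel, as PBM stores it (1 = black)."""
    # The cutoff is an illustrative assumption; real tools may pick it differently.
    return [0 if p >= cutoff else 1 for p in pixels]


pixels = [0, 10, 127, 128, 200, 255]

# PGM round trip: every grayscale value comes back unchanged.
roundtrip = decode_pgm(encode_pgm(3, 2, pixels))
print(roundtrip[3] == pixels)

# PBM-style reduction: distinct gray levels collapse to two values.
bits = threshold_to_bits(pixels)
print(bits)
```

Alpha channels and metadata such as TIFF tags are not carried over by these formats, so "lossless" here refers to pixel values only.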

nlp-private/ocr-engine-pros-and-cons.txt · Last modified: 2015/04/23 19:38 by ryancha
CC Attribution-Share Alike 4.0 International