ZDNet UK




Google slips out open source OCR engine

David Meyer ZDNet.co.uk

Published: 05 Sep 2006 12:55 BST


Google has announced that it "quietly released" a veteran optical character recognition (OCR) engine as open source a few months ago.

The engine, Tesseract, was developed between 1985 and 1995 by HP Labs to some acclaim, but was filed away when the company pulled out of the OCR business.

According to a recent Google Code Blog post by "Uber Tech Lead" Luc Vincent, a couple of HP employees decided to dust it off as open source software with the help of the Information Science Research Institute at UNLV, which in turn called on Google to help with debugging.

Tesseract is mostly covered by the Apache open source licence, although part is covered by a second licence that may put some restrictions on commercial use.

Although Vincent admitted that Tesseract was not currently a strong competitor to commercial OCR engines due to various issues — it only supports English, performs poorly with multi-column material and balks at greyscale or colour documents — he insisted it was "far more accurate than any other open source OCR package out there".

Interestingly, the post also mentioned that Google was looking to hire "top-notch OCR engineers".



