5 Best OCR Tools in the Industry Compared!
02 Oct 2019 Sai Prasad

Need for the Best OCR software

Our comparison of the 5 best OCR software tools began when we started building ‘Intelligent Document Processing’ software that captures ‘structured data’ from unstructured financial documents such as invoices, receipts, and sales orders. As most readers will know, OCR software recognizes text in scanned documents and images and converts it into a searchable, editable format. Since we needed full-text search over the documents, we were looking for a tool that performs OCR as accurately as possible. We hoped the major OCR vendors would offer a few good, modern tools, so we researched the current OCR providers. As of July 2019, we found none that fit our needs, so we wrote our own!

What we developed is essentially a wrapper around the Tesseract library that enriches its capabilities. Our goal was a library that extends Tesseract's features to make it as versatile as Google Cloud Vision OCR, while still being usable offline.

What are the common expectations from an ideal OCR tool?

Document images come in different shapes and qualities. Sometimes they are scanned; other times they are captured with handheld devices. Apart from printed text, they may also contain handwriting and structural elements such as boxes and tables. Thus, the ideal OCR tool should:

  1. recognize well-scanned text reliably,
  2. be robust towards bad image quality and handwriting,
  3. output information on the formatting and structure of the document.

Comparison of the 5 Best OCR Software Tools

In this article, we compare and shed some light on:

  1. Tesseract OCR
  2. ABBYY FineReader
  3. Kofax OmniPage (previously Nuance)
  4. Google Cloud Vision
  5. KlearStack’s OCR

1. Tesseract OCR

The best thing about Tesseract is that it is free and easy to use. It is a command-line OCR engine originally developed by Hewlett-Packard, and it is much easier to use from Python through a wrapper called pytesseract. There is also a GUI front end, gImageReader, so you can choose whichever interface best fits your purposes. However, we noticed that Tesseract's image processing is very rudimentary. To get the most out of it, you need to run the image through a pre-processor first or supply an image that has already been processed. This is also a major reason KlearStack OCR comes in handy: it has a built-in capability to pre-process images before extracting the text with Tesseract.
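To illustrate the kind of pre-processing meant here, below is a minimal pure-Python sketch of one common step: binarizing a grayscale image with Otsu's global threshold, so text becomes black on a white background. This is an illustrative choice, not a description of KlearStack's actual pre-processing, and it assumes the image is already a 2-D list of grayscale values from 0 to 255.

```python
from typing import List

Gray = List[List[int]]  # assumed input: 2-D grayscale image, values 0..255


def otsu_threshold(image: Gray) -> int:
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # sum of intensities of those pixels
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue  # no dark pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is on one side; no split possible
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(image: Gray) -> Gray:
    """Map pixels above the Otsu threshold to white (255), the rest to black (0)."""
    t = otsu_threshold(image)
    return [[255 if p > t else 0 for p in row] for row in image]
```

In practice you would use an imaging library for this (for example OpenCV's cv2.threshold with the cv2.THRESH_OTSU flag) and then pass the cleaned image to pytesseract.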

2. ABBYY FineReader

ABBYY is comparatively versatile at extracting text from scanned files and images of well-scanned documents. It can extract text from the most popular image formats, such as PNG, JPG, BMP, and TIFF, as well as from document formats such as PDF. All you have to do is upload a high-resolution image or file for the program to analyze, and then select which portions of the extractable text you want to save. However, the quality of the OCR output degrades significantly when scans are of poor quality and contain handwritten text. In addition, when complex financial documents are handled, the text extracted by ABBYY needs further post-processing for domain-specific keywords.

3. Google Cloud Vision API

Next in line is Google Cloud Vision, which is available through an API. Like ABBYY FineReader, it is a paid service.

The Google Vision API does well on scanned emails and recognizes text in smartphone-captured documents about as well as ABBYY. Where it stands out is handwriting: it is much better than Tesseract or ABBYY at recognizing handwritten text. On the other hand, Google Cloud Vision doesn't handle tables very well: it extracts the text, but that's about it.

Another major limitation is that Google Vision OCR does not accept documents larger than 10 MB, and larger documents are a common use case.

The raw Cloud Vision output is a JSON file containing information about character positions. As with Tesseract, one could use this information to try to detect structural elements, but again, this functionality is not built in.
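As a rough illustration of what detecting structure from positions involves, here is a minimal sketch that groups word boxes into text lines by vertical overlap, then orders each line left to right. The WordBox input shape and the 0.5 overlap ratio are assumptions for illustration; real Cloud Vision JSON nests positions inside bounding polygons, which you would first convert to min/max coordinates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordBox:
    # Assumed simplified shape of one recognized word with its position.
    text: str
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def group_into_lines(words: List[WordBox], min_overlap: float = 0.5) -> List[str]:
    """Cluster words into lines when their vertical extents overlap enough."""
    lines: List[List[WordBox]] = []
    # Visit words top-to-bottom so lines are created in reading order.
    for w in sorted(words, key=lambda w: (w.ymin, w.xmin)):
        for line in lines:
            ref = line[0]
            overlap = min(w.ymax, ref.ymax) - max(w.ymin, ref.ymin)
            height = min(w.ymax - w.ymin, ref.ymax - ref.ymin)
            if height > 0 and overlap / height >= min_overlap:
                line.append(w)
                break
        else:
            lines.append([w])  # no existing line matched: start a new one
    # Within each line, join words from left to right.
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.xmin)) for line in lines]
```

This handles a single column only; multi-column layouts would first need to be split into columns, which is exactly the kind of logic that is not built in.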

4. OmniPage (Nuance)

The OmniPage SDK is another easy-to-use OCR SDK that can handle more complex document layouts, such as tables, columns, lists, and even graphics. It also includes image editing tools that let you improve an image's clarity to ensure optimal extraction. The major drawback is that most of these features are limited to Windows and need tedious configuration to work in Linux-based environments. Accuracy also drops when documents contain colored or highlighted text or backgrounds.

5. KlearStack OCR

KlearStack’s OCR, which is built on top of Tesseract, uses a hybrid technique that combines two stages. First, a deep-learning, region-based approach detects ‘text-containing’ zones. Then, Tesseract OCR extracts the content of each detected text region.
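The detect-then-recognize flow can be sketched as follows. This is a minimal illustration under assumptions, not KlearStack's code: detect_regions and recognize are hypothetical stand-ins for the deep-learning detector and the Tesseract call, and the image is a plain 2-D list of pixels.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Image = List[List[int]]  # assumed: 2-D grayscale image as rows of pixels


@dataclass
class Region:
    xmin: int
    ymin: int
    xmax: int  # exclusive
    ymax: int  # exclusive


def crop(image: Image, r: Region) -> Image:
    """Cut out the rectangle covered by one detected region."""
    return [row[r.xmin:r.xmax] for row in image[r.ymin:r.ymax]]


def hybrid_ocr(
    image: Image,
    detect_regions: Callable[[Image], Sequence[Region]],  # stage 1 (hypothetical detector)
    recognize: Callable[[Image], str],                    # stage 2 (hypothetical OCR call)
) -> List[Dict]:
    results = []
    # Process regions top-to-bottom, then left-to-right.
    for r in sorted(detect_regions(image), key=lambda r: (r.ymin, r.xmin)):
        text = recognize(crop(image, r)).strip()
        if text:  # drop regions where nothing was recognized
            results.append({"text": text, "box": (r.xmin, r.ymin, r.xmax, r.ymax)})
    return results
```

In a real pipeline, recognize might convert the crop to a PIL image and call pytesseract.image_to_string on it. Running OCR per region, rather than on the whole page, keeps Tesseract focused on areas the detector already believes contain text.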

To understand this better, let us dig a bit deeper. Building on recent work in object detection, our deep-learning model simultaneously localizes and recognizes text blocks in arbitrarily complex documents. Training such a model requires a large number of labeled images. Instead of collecting and tagging these manually, we chose to generate our own synthetic training documents. With enough variability in document layouts, fonts, font sizes, highlight colors, brand logos, and so on, the synthetic data let us train models that now perform well on real-world document images. We generated around ten thousand such images, which led to strong performance on real-world cases at prediction time. This proved to be a promising approach for increasing the efficiency of document processing pipelines.
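The core idea behind synthetic training data is that the generator places every text block itself, so the bounding-box labels come for free. Below is a minimal sketch of that idea. Everything in it (the field names, size ranges, highlight colors, and the make_document function) is a hypothetical illustration; rendering the layout into an actual image is left out, and a production generator would vary far more properties.

```python
import random
from typing import Dict, List

FIELDS = ["Invoice No", "Date", "Vendor", "Total", "Tax", "Due Date"]  # hypothetical labels
HIGHLIGHTS = [None, "yellow", "lightblue", "pink"]                     # hypothetical colors


def make_document(rng: random.Random, width: int = 800, height: int = 1100) -> Dict:
    """Lay out random fields top-to-bottom and record each block's box as its label."""
    blocks: List[Dict] = []
    y = rng.randint(20, 80)  # random top margin
    for field in rng.sample(FIELDS, k=rng.randint(3, len(FIELDS))):
        font_size = rng.randint(10, 28)
        text = f"{field}: {rng.randint(1, 99999)}"
        w = int(len(text) * font_size * 0.6)  # rough width estimate per character
        h = int(font_size * 1.4)              # line height with some leading
        x = rng.randint(20, max(20, width - w - 20))
        if y + h > height - 20:
            break  # page is full
        blocks.append({
            "text": text,
            "font_size": font_size,
            "highlight": rng.choice(HIGHLIGHTS),
            "box": (x, y, x + w, y + h),  # label: (xmin, ymin, xmax, ymax)
        })
        y += h + rng.randint(5, 40)  # random vertical gap before the next block
    return {"width": width, "height": height, "blocks": blocks}
```

Seeding the random.Random instance makes a generated dataset reproducible, which helps when comparing models trained on it.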

KlearStack’s OCR has also been customized to identify currency symbols in financial documents, such as the Indian rupee symbol, which even top-notch OCR tools fail to recognize. It also uses Natural Language Processing (NLP) models to post-process the raw OCR output, so that domain-specific text is auto-corrected when the scan quality is below par.
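The NLP models themselves are not described here, so as a much simpler stand-in, the sketch below shows the basic idea of domain-specific post-correction: snap each OCR token to the closest entry in a domain vocabulary using Python's standard difflib. The vocabulary, the correct_tokens function, and the 0.8 cutoff are illustrative assumptions.

```python
import difflib
from typing import Iterable, List

# Hypothetical domain vocabulary for financial documents.
DOMAIN_TERMS = ["invoice", "subtotal", "total", "gst", "amount", "quantity", "receipt"]


def correct_tokens(tokens: Iterable[str], vocab: List[str] = DOMAIN_TERMS,
                   cutoff: float = 0.8) -> List[str]:
    """Replace each token with its closest vocabulary term if similar enough."""
    corrected = []
    for tok in tokens:
        match = difflib.get_close_matches(tok.lower(), vocab, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else tok)  # keep the token if no close match
    return corrected
```

For example, typical OCR confusions such as "lnvoice" and "Subtota1" map back to "invoice" and "subtotal", while "42.00" is left alone. Matched tokens come back in the vocabulary's lowercase form; a real post-processor would also preserve casing and use context, which a plain lookup cannot.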

Although KlearStack OCR performs versatile OCR on images and PDF documents, it was primarily developed as part of an AI-based product called KlearStack. Organizations across industries have adopted KlearStack for its template-less data extraction from financial documents, using it for end-to-end automation of a wide range of accounting operations such as invoice automation, straight-through receipt processing, fraud detection in employee expense claims, bank statement and foreign currency reconciliation, and much more!

What were the Image Processing Results?

For TL;DR readers, we have summarized the merits and limitations from our comparative analysis of the 5 best OCR software tools in the table below:

 

Parameter | Tesseract | OmniPage (Nuance) | ABBYY | KlearStack’s OCR | Google Vision OCR
Overall accuracy on documents | 78% | 84% | 83% | 85% | 92%
Accuracy on scanned images | 70% | 75% | 77% | 80% | 90%
Accuracy on native PDFs | 88% | 95% | 94% | 93% | 96%
Reading text on coloured backgrounds | Low accuracy | Cannot read | Low accuracy | Low accuracy | Reads it perfectly
Reading single digits | Reads perfectly | Reads perfectly | Reads perfectly | Reads perfectly | Misses single digits
Recognizing handwritten text | Low accuracy | Low accuracy | Low accuracy | Low accuracy | Best for handwritten text
Service type | Library | SDK | SDK | Library | Cloud-only
Indian currency (rupee) symbol | Cannot recognize | Cannot recognize | Cannot recognize | Can recognize | Cannot recognize
Comparative pricing | Low | Moderate | Pricey | Moderate | Low
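The percentages in the table depend on how accuracy is scored. As an illustration only (not necessarily the method behind these figures), one common metric is character accuracy: one minus the Levenshtein edit distance between the OCR output and a ground-truth transcription, divided by the length of the ground truth. A minimal sketch:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ocr_text: str, ground_truth: str) -> float:
    """1 - (edit distance / ground-truth length), clamped at 0."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    return max(0.0, 1.0 - levenshtein(ocr_text, ground_truth) / len(ground_truth))
```

For example, reading "Total: 42.00" as "Tota1: 42.00" is one substitution in 12 characters, giving 11/12, or about 91.7%.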

Conclusion

The key takeaways from the comparison above:

  1. If you deal with machine-printed, well-scanned documents, or PDF files lacking metadata, Tesseract OCR might do the job, although the commercial services are more reliable.
  2. If recognition of handwritten characters is important for you, Google Cloud Vision is your only viable option among the tested ones as of today.
  3. If the document image quality is bad, both ABBYY FineReader and Google Cloud Vision still do a good job.
  4. If your aim is to extract tabular information, you might want to choose ABBYY FineReader.
  5. You should opt for KlearStack if you expect the following from your OCR:
    1. You deal with machine-printed, well-scanned document images or PDF files.
    2. Owing to security constraints, you are not comfortable with a cloud-based API and prefer an on-premise OCR.
    3. You need to extract text along with its metadata/coordinates, i.e. xmin, xmax, ymin, ymax, width, height.
    4. Your application needs to extract text line by line across multiple text columns.
    5. You expect the OCR to detect the file properties of scanned images.
    6. You need support for a wide variety of document types, such as “.pdf”, “.jpg/.jpeg”, “.png”, “.bmp”, “.tiff”.
    7. You need support for documents/images of any size (Google Vision does not support documents larger than 10 MB).
    8. You need detection of structures such as tables and blocks.
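The coordinate metadata listed above is easy to derive once an OCR engine reports a word's bounding polygon. A minimal sketch, assuming the polygon is a list of (x, y) corner points; the word_metadata function is hypothetical, and its output keys simply mirror the field names above rather than KlearStack's actual API:

```python
from typing import Dict, List, Tuple


def word_metadata(text: str, polygon: List[Tuple[int, int]]) -> Dict[str, object]:
    """Axis-aligned box metadata for one recognized word."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    return {
        "text": text,
        "xmin": xmin, "xmax": xmax,
        "ymin": ymin, "ymax": ymax,
        # Width/height as coordinate differences (an assumed convention; some
        # tools count pixels inclusively and add 1).
        "width": xmax - xmin,
        "height": ymax - ymin,
    }
```

Taking the min/max over all corners also covers slightly skewed scans, where the polygon is not a perfect rectangle.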
