Document images come in different shapes and qualities. Sometimes they are scanned; other times they are captured by handheld devices. Apart from printed text, they might also contain handwriting and structural elements such as boxes and tables. An ideal OCR software should therefore:
- recognize well-scanned text reliably,
- be robust towards bad image quality and handwriting,
- output information on the formatting and structure of the document.
Comparison of the 5 Best OCR Software
In this article, we will compare and shed some light on the best OCR software available in the market.
Tesseract
Tesseract is one of the best OCR tools on the market. The best thing about Tesseract is that it is free and easy to use. It is a command-line OCR engine originally developed by Hewlett-Packard, and using it becomes significantly simpler with a Python wrapper called pytesseract. There is also a GUI front end, gImageReader, so you can choose whichever best fits your purpose.
However, we noticed that Tesseract’s image processing is very rudimentary. To get the most out of it, you need to run an image pre-processor first or supply an image that has already been processed. This is also a major reason why KlearStack OCR comes in handy: it has a built-in capability to pre-process images before extracting the text through Tesseract.
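As an illustration of the kind of pre-processing meant here, the sketch below binarizes a grayscale image with Otsu's method before it would be handed to an OCR engine. It is a minimal, pure-Python example under stated assumptions: the image is already loaded as rows of 0-255 grey levels, and the function names `otsu_threshold` and `binarize` are made up for this article. It is not the pre-processing that KlearStack or Tesseract actually performs.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that best separates dark from light pixels."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of the grey levels of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance: large when the two groups are far apart.
        variance = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if variance > best_var:
            best_t, best_var = t, variance
    return best_t


def binarize(image, threshold):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[0 if p <= threshold else 255 for p in row] for row in image]


# Hypothetical 2x3 grayscale image: dark ink on a light background.
image = [[30, 40, 200], [35, 210, 220]]
threshold = otsu_threshold([p for row in image for p in row])
clean = binarize(image, threshold)
```

In practice an imaging library would do the loading and thresholding; the point is only that a clean black-and-white image is a much easier input for the OCR step.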
ABBYY FineReader
ABBYY FineReader is comparatively versatile at extracting text from well-scanned documents and images. It can extract text from the most popular image formats, such as PNG, JPG, BMP, and TIFF, and can also extract text from PDF files. All you have to do is upload a high-resolution image or file for the program to analyze, and then select which portions of the extractable text you want to save.
However, the quality of the OCR output degrades significantly when the scans are of poor quality and contain handwritten text. In addition, the text extracted by ABBYY needs further post-processing for domain-specific keywords when complex financial documents are handled.
Google Cloud Vision API
Next in line is Google Cloud Vision, which is available through an API. Just like ABBYY FineReader, it is a paid service.
Google Vision API does well on the scanned email and recognizes the text in the smartphone-captured document about as well as ABBYY. However, it is much better than Tesseract or ABBYY at recognizing handwriting. On the other hand, Google Cloud Vision doesn’t handle tables very well: it extracts the text, but that’s about it.
Another major limitation users face is that Google Vision OCR does not support documents larger than 10 MB, which can be a common use case.
In fact, the original Cloud Vision output is a JSON file containing information about character positions. Just as with Tesseract, one could try to detect structural elements based on this information. But again, this functionality is not built in.
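To give an idea of what building structure detection on top of position data could look like, here is a minimal sketch that groups word boxes into text lines. It assumes each word arrives as a dictionary with a `description` and a `boundingPoly.vertices` list, which is the shape of the word entries in Cloud Vision's `textAnnotations`. The helper names and the `tolerance` parameter are our own illustration, not part of any API.

```python
def word_box(annotation):
    """Reduce a word annotation to its text and axis-aligned bounding box."""
    vertices = annotation["boundingPoly"]["vertices"]
    # Assumption: a coordinate missing from the JSON is treated as 0.
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return {
        "text": annotation["description"],
        "xmin": min(xs), "xmax": max(xs),
        "ymin": min(ys), "ymax": max(ys),
    }


def group_into_lines(annotations, tolerance=0.5):
    """Group words whose vertical centres are close into left-to-right lines."""
    boxes = sorted(
        (word_box(a) for a in annotations),
        key=lambda b: (b["ymin"] + b["ymax"]) / 2,
    )
    lines = []
    for box in boxes:
        centre = (box["ymin"] + box["ymax"]) / 2
        height = box["ymax"] - box["ymin"]
        # Same line if the centre is within a fraction of the word height
        # of the centre of the word that opened the current line.
        if lines and abs(centre - lines[-1]["centre"]) <= tolerance * height:
            lines[-1]["boxes"].append(box)
        else:
            lines.append({"centre": centre, "boxes": [box]})
    return [
        " ".join(b["text"] for b in sorted(line["boxes"], key=lambda b: b["xmin"]))
        for line in lines
    ]
```

Detecting tables would need further steps, such as aligning the x-positions of words across consecutive lines, which is exactly the work the API leaves to the user.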
OmniPage SDK
OmniPage SDK is another easy-to-use OCR SDK that can handle more complex document layouts, such as tables, columns, lists, and even graphics. It also includes image editing tools that let you clean up an image for clarity to ensure optimal extraction.
The major drawback is that most of these features are limited to Windows and need tedious configuration to work in Linux-based environments. Also, the output accuracy falls when documents contain colored or highlighted text or backgrounds.
KlearStack AI-driven OCR
KlearStack’s OCR, which is built on top of Tesseract, uses a hybrid technique that combines two approaches. First, deep learning algorithms apply a region-based approach to detect ‘text-containing’ zones. Then Tesseract OCR is used to extract all the features from each text region.
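The two stages described above can be sketched as a small pipeline. This is only an outline under stated assumptions: `detect_regions` stands in for the deep learning detector and `recognize` for the Tesseract call (for example a thin wrapper around `pytesseract.image_to_string`). Both are passed in as plain functions, and all names here are hypothetical rather than KlearStack's actual code.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (xmin, ymin, xmax, ymax) in pixels
Image = Sequence[Sequence[int]]   # rows of grey levels


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the rectangle described by `box` out of the image."""
    xmin, ymin, xmax, ymax = box
    return [list(row[xmin:xmax]) for row in image[ymin:ymax]]


def hybrid_ocr(
    image: Image,
    detect_regions: Callable[[Image], List[Box]],
    recognize: Callable[[List[List[int]]], str],
) -> List[Tuple[Box, str]]:
    """Stage 1: find text-containing zones. Stage 2: run OCR on each zone."""
    results = []
    for box in detect_regions(image):
        results.append((box, recognize(crop(image, box))))
    return results
```

Keeping the detector and the recognizer as separate, swappable functions means either stage can be improved or replaced without touching the other.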
To understand this better, let us dig a bit deeper. Building on recent work in object detection, our deep learning algorithm is able to simultaneously localize and recognize text blocks in arbitrarily complex documents. To train the model described here, we needed a large number of labeled images. Instead of generating and tagging these manually, we chose to develop our own synthetic training documents.
With enough variability in document layouts, fonts, sizes, highlight colors, brand logos, and so on, our synthetic data was used to train models that now perform well on real-world images of documents. We generated around ten thousand such images, which led to strong prediction performance in real-world cases. This proved to be a promising approach for increasing the efficiency of document processing pipelines.
Additionally, KlearStack’s OCR has been customized to identify the currency symbols in financial documents, which even the best OCR software fails to identify. KlearStack OCR also leverages Natural Language Processing models to post-process the raw OCR data. This ensures that domain-specific text is auto-corrected when the scan quality is below par.
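To make the idea of auto-correcting domain-specific text concrete, here is a deliberately simple sketch that snaps OCR tokens to the closest term in a domain vocabulary using Python's standard `difflib` module. It is a stand-in for illustration only: the vocabulary, the `cutoff` value, and the function name are assumptions, and real NLP post-processing models are considerably more sophisticated than fuzzy string matching.

```python
import difflib


def correct_tokens(tokens, vocabulary, cutoff=0.75):
    """Replace each token with its closest vocabulary term, if one is close enough."""
    corrected = []
    for token in tokens:
        if token in vocabulary:
            corrected.append(token)
            continue
        # get_close_matches returns the best candidates above the similarity cutoff.
        matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return corrected


# Hypothetical finance vocabulary and a noisy OCR result
# ("l" read instead of "I", "1" read instead of "l").
vocabulary = ["Invoice", "Total", "Amount"]
raw_tokens = ["lnvoice", "Tota1", "Amount", "2024"]
fixed_tokens = correct_tokens(raw_tokens, vocabulary)
```

Tokens with no sufficiently close vocabulary term, such as numbers, are left untouched, which keeps amounts and dates from being "corrected" into keywords.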
Although KlearStack OCR performs versatile OCR over images and PDF documents, it was primarily developed and integrated to be a part of an AI-based software called KlearStack. Organizations across all industries have adopted KlearStack, owing to its template-less data extraction from financial documents.
It has been adopted to achieve end-to-end automation of a wide range of accounting operations, such as invoice processing automation, straight-through receipt processing, fraud detection in employee expense claims, bank statement and foreign currency reconciliations, and much more!
Comparative Analysis of the 5 Best OCR Software
For the TL;DR readers, we have summarized the merits and limitations discussed above in the table below:
| Parameter | Tesseract | OmniPage (Nuance) | ABBYY | KlearStack’s OCR | Google Vision OCR |
|---|---|---|---|---|---|
| Overall accuracy on documents | 78% | 84% | 83% | 85% | 92% |
| Accuracy on scanned images | 70% | 75% | 77% | 80% | 90% |
| Accuracy on native PDFs | 88% | 95% | 94% | 93% | 96% |
| Ability to read text on coloured background | Can read with low accuracy | Cannot read | Can read with low accuracy | Can read with low accuracy | Can read it perfectly |
| Ability to read single digits | Can read single digits perfectly | Can read single digits perfectly | Can read single digits perfectly | Can read single digits perfectly | Misses out single digits |
| Ability to recognize handwritten text | Low accuracy on handwritten text | Low accuracy on handwritten text | Low accuracy on handwritten text | Best accuracy on handwritten text | Low accuracy on handwritten text |
| Service | Library | SDK | SDK | Library | Only cloud-based |
| Indian currency (Rupee) symbol | Cannot recognize | Cannot recognize | Cannot recognize | Can recognize | Cannot recognize |
The key takeaways from the comparison table above:
- If you deal with machine-written and well-scanned documents, or maybe PDF files lacking metadata, then Tesseract OCR might do the job, although the commercial services are more reliable.
- If recognition of handwritten characters is important for you, Google Cloud Vision is your only viable option among the tested ones as of today.
- If the document image quality is bad, both ABBYY FineReader and Google Cloud Vision still do a good job.
- If your aim is to extract tabular information, you might want to choose ABBYY FineReader.
You should opt for KlearStack if any of the following applies to you:
- You deal with machine-written, well-scanned images of documents or PDF files.
- Owing to security constraints, you are not comfortable with a cloud-based API and would rather use an on-premise OCR.
- You need to extract text along with its metadata/coordinates, i.e. xmin, xmax, ymin, ymax, width, and height.
- Your application needs to extract line-wise text across multiple text columns.
- You expect the OCR to detect the file properties of a scanned image.
- You need the OCR to support a wide variety of file types, such as “.pdf”, “.jpg/.jpeg”, “.png”, “.bmp”, and “.tiff”.
- You need support for documents and images of any size (Google Vision does not support documents larger than 10 MB).
- You need detection of structures such as tables and blocks.
- You need accurate data extraction from unseen documents on the first attempt.
- You need data transformation and validation.
- You need end-to-end document automation with straight-through processing.
If you want the best OCR software for your organization, contact KlearStack for a free demo today!
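For readers wondering what text with coordinate metadata looks like in an application, the sketch below shows one possible record with the fields listed above (xmin, xmax, ymin, ymax, width, height) and a simple way to read boxes column by column. The class and function names and the `min_gap` parameter are hypothetical, chosen for illustration; they are not KlearStack's output format.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TextBox:
    text: str
    xmin: int
    ymin: int
    xmax: int
    ymax: int

    @property
    def width(self):
        return self.xmax - self.xmin

    @property
    def height(self):
        return self.ymax - self.ymin

    def to_record(self):
        """Return the stored fields plus the derived width and height."""
        record = asdict(self)
        record["width"] = self.width
        record["height"] = self.height
        return record


def split_columns(boxes, min_gap):
    """Start a new column wherever the horizontal gap exceeds min_gap pixels."""
    columns = []
    right_edge = None
    for box in sorted(boxes, key=lambda b: b.xmin):
        if right_edge is None or box.xmin - right_edge > min_gap:
            columns.append([])
            right_edge = box.xmax
        columns[-1].append(box)
        right_edge = max(right_edge, box.xmax)
    # Within each column, read from top to bottom.
    return [sorted(column, key=lambda b: b.ymin) for column in columns]
```

Reading each column top to bottom before moving right keeps the text of a two-column page from being interleaved line by line.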