Advancement in artificial intelligence and machine learning has helped to automate various business operations across industries. Businesses look for lean growth and artificial intelligence plays an important role in it. Deep learning and computer vision are some of the many technologies of artificial intelligence that help businesses to automate their document management and data extraction processes.
Invoices and receipts pass through every accounts department, no matter what industry you are in. Processing these documents manually is tedious and error-prone. Deep learning can reduce these errors and improve the productivity of both employees and the enterprise.
To understand how artificial intelligence uses technologies like deep learning to recognize data on receipts, we will take a closer look at the process.
Deep Learning & Computer Vision
The two main technologies that artificial intelligence uses to accurately recognize data on receipts are deep learning and computer vision.
In this context, deep learning means using a trained model to locate the receipt in the image so that it can be rectified and made easier to read.
Computer vision means applying a series of transformations to the image to locate the receipt and rectify it.
Deep learning can be overkill when classical computer vision gets the job done with fewer lines of code. However, at times deep learning is required for more robust and accurate data extraction. We will look at both approaches in detail.
Procedures of Deep Learning & Computer Vision
The steps used to recognize a receipt and enhance its view are listed below. The technical terms are explained in the next section.
- Look for the contour of the receipt in the image.
- Apply an affine transformation to rotate the receipt upright.
- Capture the corners of the receipt.
- Apply a homography transformation using the corners and the enclosing bounding box.
- Calculate the grid mapping between the contour and its corresponding bounding box.
- Apply a thin plate spline transformation to undistort the receipt.
The most important of these steps is the first. Contouring determines the success of the rest of the receipt recognition and data extraction process.
Operations of Computer Vision
The computer vision process was explained in six steps above. Now we will explore the meaning of the terms used in those steps.
Contouring: This is a method for detecting a shape by the edges around it. The algorithm calculates the gradient of neighbouring pixel values to find the closed shape. This is a well-known problem in both computer vision and deep learning. Once the contour of the receipt and its enclosing bounding box are found, the receipt can be rotated using an affine transformation.
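As a rough illustration of the gradient idea, the sketch below marks pixels where the intensity changes sharply on a small synthetic image. The function name `boundary_mask` and the threshold value are assumptions made for this example. It only finds edge pixels and does not trace them into an ordered closed contour, which a library routine such as OpenCV's `findContours` would normally handle.

```python
import numpy as np

def boundary_mask(image, threshold=0.25):
    """Return True where the gradient magnitude of `image` exceeds `threshold`."""
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    return magnitude > threshold

# Synthetic example: a bright 4x6 "receipt" on a dark 10x12 background.
image = np.zeros((10, 12))
image[3:7, 2:8] = 1.0

edges = boundary_mask(image)
print(edges.astype(int))  # 1s form a ring around the bright rectangle
```

Pixels deep inside the rectangle and far out in the background have zero gradient, so only the border region is flagged.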
Bounding Box: This is the smallest rectangle with vertical and horizontal edges that surrounds an object, in this case a receipt.
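A minimal sketch of this definition, assuming the contour is given as a list of `(x, y)` pixel coordinates. The function name `bounding_box` and the sample points are hypothetical.

```python
def bounding_box(points):
    """Return (x_min, y_min, x_max, y_max) of the axis-aligned box around points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

# Four corner points of a slightly tilted receipt (made-up values).
contour = [(12, 40), (95, 33), (101, 210), (18, 220)]
print(bounding_box(contour))  # (12, 33, 101, 220)
```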
Affine Transformation: In the context of computer vision, this is a linear transformation combined with a translation, such as rotating, scaling, shearing or shifting an image. It keeps parallel lines parallel.
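To make the rotation case concrete, here is a small sketch that builds a 2x3 affine matrix for a rotation around a chosen center and applies it to points. The helper names `rotation_about` and `apply_affine` are assumptions for this example. Applying the same matrix to every pixel of an image would additionally need interpolation, which is left out here.

```python
import numpy as np

def rotation_about(center, angle_deg):
    """Return a 2x3 affine matrix rotating points by angle_deg around center."""
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    cx, cy = center
    # p' = R (p - center) + center, written as a linear part plus a translation.
    return np.array([
        [c, -s, cx - c * cx + s * cy],
        [s,  c, cy - s * cx - c * cy],
    ])

def apply_affine(matrix, points):
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    pts = np.asarray(points, dtype=float)
    return pts @ matrix[:, :2].T + matrix[:, 2]

M = rotation_about((1.0, 1.0), 90)
rotated = apply_affine(M, [(2.0, 1.0), (1.0, 1.0)])
print(np.round(rotated, 6))  # the center (1, 1) stays where it is
```

With the y axis pointing up, a positive angle rotates counterclockwise. In image coordinates, where y points down, the same matrix appears to rotate clockwise.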
Homography Operation: This is a transformation that maps points in one image to another so that the image can be observed from a different point of view. Here it is used to rectify the image by correcting the pitch or roll angles so that the receipt appears as if viewed from directly above.
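One common way to obtain such a mapping is to solve for the 3x3 homography that sends the four receipt corners onto the four corners of an upright rectangle. The sketch below does this with a direct linear solve. The function names and the corner coordinates are illustrative assumptions, and production code would typically use a library routine instead.

```python
import numpy as np

def fit_homography(src, dst):
    """Solve for the 3x3 homography mapping four src points onto four dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the eight unknowns.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)  # fix the last entry to 1

def apply_homography(H, points):
    """Map (N, 2) points through H using homogeneous coordinates."""
    pts = np.asarray(points, dtype=float)
    mapped = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Corners of a receipt photographed at an angle (made-up values) ...
src = [(30, 20), (210, 45), (190, 380), (10, 340)]
# ... and the upright rectangle they should map to.
dst = [(0, 0), (200, 0), (200, 400), (0, 400)]

H = fit_homography(src, dst)
print(np.round(apply_homography(H, src), 3))
```

The solve assumes that no three of the four corner points lie on one line. Otherwise the system is singular.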
Thin Plate Spline Transformation: This operation rectifies image distortion by applying a spatial mapping between two sets of grid points.
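The following is a compact sketch of a thin plate spline fitted between two small sets of control points: a slightly bowed grid standing in for points on a crumpled receipt, and the regular grid it should map to. All names and coordinates are assumptions for illustration. The sketch maps points only, not pixels.

```python
import numpy as np

def _kernel(sq_dist):
    """Thin plate spline radial basis r^2 * log(r^2), with value 0 at r = 0."""
    out = np.zeros_like(sq_dist)
    nonzero = sq_dist > 0
    out[nonzero] = sq_dist[nonzero] * np.log(sq_dist[nonzero])
    return out

def _sq_dists(a, b):
    """Pairwise squared distances between rows of a and rows of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def fit_tps(src, dst):
    """Fit a spline that sends each src control point exactly to its dst point."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    P = np.hstack([np.ones((n, 1)), src])  # affine part: 1, x, y
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = _kernel(_sq_dists(src, src))
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    return src, np.linalg.solve(A, b)

def apply_tps(model, points):
    """Warp (N, 2) points with a fitted spline."""
    src, params = model
    pts = np.asarray(points, dtype=float)
    U = _kernel(_sq_dists(pts, src))
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return U @ params[:-3] + P @ params[-3:]

# Distorted grid: the middle column is bowed downward by 0.1.
src = [(x, y + (0.1 if x == 1 else 0.0)) for y in range(3) for x in range(3)]
dst = [(float(x), float(y)) for y in range(3) for x in range(3)]

model = fit_tps(src, dst)
print(np.round(apply_tps(model, src), 6))  # reproduces the regular grid
```

Points between the control points are moved smoothly, which is what allows curved or wrinkled paper to be flattened.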
Deep Learning Approach for Receipt Recognition
There are many aspects to artificial intelligence. Computer vision is one of them and deep learning is another. Deep learning aims to map input data through layered models to estimate variables, framed as a classification or a regression problem. Images, text and tables are some examples of input data. The output can take whatever form the task requires.
Instance segmentation is a deep learning task that detects distinct objects appearing in an image. It treats different objects of the same class as distinct individual instances, producing a unique segmentation mask that labels the pixels of each instance.
The advantage for receipt recognition is that it provides a segmentation mask that can be used as a receipt contour, even when one image contains multiple receipts. Unlike classical computer vision, deep learning needs training datasets to learn from.
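To show how an instance mask separates several receipts, the sketch below takes a label mask of the kind a segmentation model might output and extracts one bounding box per receipt. The mask is hand-made for the example rather than produced by a model, and the convention that `0` means background is an assumption.

```python
import numpy as np

def instance_boxes(label_mask):
    """Map each instance id (> 0) to its box (x_min, y_min, x_max, y_max)."""
    boxes = {}
    for instance_id in np.unique(label_mask):
        if instance_id == 0:  # assumed convention: 0 is background
            continue
        ys, xs = np.nonzero(label_mask == instance_id)
        boxes[int(instance_id)] = (
            int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
        )
    return boxes

# Two receipts of the same class, each with its own instance id.
mask = np.zeros((8, 12), dtype=int)
mask[1:6, 1:4] = 1
mask[2:7, 6:11] = 2

print(instance_boxes(mask))  # {1: (1, 1, 3, 5), 2: (6, 2, 10, 6)}
```

Each per-instance mask can then be handed to the contour and rectification steps described earlier, one receipt at a time.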
Combining computer vision and deep learning makes it possible to recognize invoices and receipts and extract their data with high accuracy and lower overall error rates.
KlearStack AI leverages these technologies to ensure that data is accurately extracted from unstructured documents. Straight Through Processing is also possible with KlearStack AI, which enables documents to be processed end-to-end without any manual intervention.
KlearStack AI also improves accuracy of data extraction as more and more documents are scanned through the system. Adaptive learning is an integral part of KlearStack AI.