Data Extraction Using AI vs. Traditional Template-Based Data Extraction

Most mid-size to large enterprises globally have, on average, 500+ suppliers who send invoices every month for the goods and services they provide. For reconciliation and payment management, these enterprises have to read the invoices manually, extract information (such as invoice date, invoice number, due date, invoice amount, tax amount, supplier name and PO reference number), validate that information against the data in their ERP and then enter the invoice into the ERP.
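The validation step described above can be sketched as a comparison between the fields extracted from an invoice and the matching purchase-order record in the ERP. This is a minimal illustration only: the `Invoice` and `PurchaseOrder` types, their field names, the `validate_invoice` function and the rounding tolerance are all hypothetical, and a real ERP integration would look up the PO through that ERP's own interface rather than a Python dict.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Dict, List


@dataclass
class Invoice:
    """Fields extracted from one supplier invoice (hypothetical schema)."""
    invoice_number: str
    invoice_date: date
    due_date: date
    supplier_name: str
    po_reference: str
    amount: Decimal
    tax_amount: Decimal


@dataclass
class PurchaseOrder:
    """The matching PO as it might be stored in an ERP (hypothetical schema)."""
    po_number: str
    supplier_name: str
    amount: Decimal


def validate_invoice(inv: Invoice, po_lookup: Dict[str, PurchaseOrder],
                     tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """Return a list of mismatches; an empty list means the invoice can be entered."""
    po = po_lookup.get(inv.po_reference)
    if po is None:
        return [f"unknown PO reference {inv.po_reference!r}"]
    problems = []
    # Compare supplier names case-insensitively, ignoring surrounding spaces.
    if inv.supplier_name.strip().lower() != po.supplier_name.strip().lower():
        problems.append("supplier name does not match PO")
    # Allow a small rounding tolerance when comparing amounts.
    if abs(inv.amount - po.amount) > tolerance:
        problems.append(f"amount {inv.amount} differs from PO amount {po.amount}")
    if inv.tax_amount > inv.amount:
        problems.append("tax amount exceeds invoice amount")
    if inv.due_date < inv.invoice_date:
        problems.append("due date is before invoice date")
    return problems


# Example: one PO in the (hypothetical) ERP and one matching invoice.
erp_pos = {"PO-1001": PurchaseOrder("PO-1001", "Acme Supplies", Decimal("1180.00"))}
invoice = Invoice("INV-77", date(2024, 3, 1), date(2024, 3, 31), "ACME Supplies",
                  "PO-1001", Decimal("1180.00"), Decimal("180.00"))
print(validate_invoice(invoice, erp_pos))  # → []
```

Returning a list of problems instead of a single boolean lets a reviewer see every mismatch on an invoice at once, which is the information a person doing this check by hand would need.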

Sounds tedious? Imagine doing the same for hundreds of invoices daily.

Most of this data processing in organizations today is manual. The biggest challenge is that every supplier uses a different invoice layout, format and field naming convention. The placement of each field differs from supplier to supplier, and even when layouts are similar, the field labels can differ (for example, "Invoice No." on one invoice and "Bill #" on another). Because invoices are so non-standard, automated data extraction is challenging and cumbersome.

Often these invoices aren’t structured according to any one specific invoice template and don’t conform to a fixed set of layout rules. This makes it uncertain how a system will respond to data that isn’t aligned with the expected template.

In the past, there have been numerous attempts to automate the data extraction process. OCR (optical character recognition) is one of the most widely used tools. It extracts all of the text as one big string, but it fails to organize the data systematically when complex invoices are processed, which leads to inaccurate results.

There are also several solutions based on OCR templates and rules.

To use these template-driven solutions, users have to define one set of templates and rules per invoice layout. That means you have to define thousands of templates if you have thousands of suppliers, which increases time and costs for the organization. A template-driven approach may work when you deal with a small number of suppliers, but as your supplier base grows rapidly, the pace of defining new templates has to keep up. Besides, even small changes to an existing supplier's invoice layout can cause data extraction to fail. So organizations that adopt a template-driven approach must keep maintenance and support activities running continuously.
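To make the maintenance problem concrete, here is a minimal sketch of a template-driven extractor: each supplier layout gets its own set of rules, here regular expressions keyed by field name and applied to raw OCR text. The supplier names, label texts, patterns and the `extract_with_template` function are hypothetical and do not describe any particular product.

```python
import re
from typing import Dict, Optional

# One template per supplier layout: field name -> regex with one capture group.
# Supplier names and patterns are hypothetical examples.
TEMPLATES = {
    "Acme Supplies": {
        "invoice_number": r"Invoice No\.\s*:\s*(\S+)",
        "invoice_date": r"Invoice Date\s*:\s*(\d{2}/\d{2}/\d{4})",
        "total": r"Total Due\s*:\s*\$?([\d,]+\.\d{2})",
    },
    "Globex Ltd": {
        "invoice_number": r"Bill #\s*(\S+)",
        "invoice_date": r"Dated\s+(\d{4}-\d{2}-\d{2})",
        "total": r"Amount Payable\s+([\d,]+\.\d{2})",
    },
}


def extract_with_template(supplier: str, ocr_text: str) -> Dict[str, Optional[str]]:
    """Apply the supplier's template to raw OCR text.

    Raises KeyError if no template exists for the supplier, and returns None
    for any field whose pattern no longer matches (e.g. after a layout change).
    """
    template = TEMPLATES[supplier]
    result: Dict[str, Optional[str]] = {}
    for field, pattern in template.items():
        match = re.search(pattern, ocr_text)
        result[field] = match.group(1) if match else None
    return result


old_layout = "Invoice No.: A-1001\nInvoice Date: 05/03/2024\nTotal Due: $1,180.00"
# The same supplier later renames one label from "Invoice No." to "Invoice Number".
new_layout = "Invoice Number: A-1002\nInvoice Date: 05/04/2024\nTotal Due: $990.00"

print(extract_with_template("Acme Supplies", old_layout))
# → {'invoice_number': 'A-1001', 'invoice_date': '05/03/2024', 'total': '1,180.00'}
print(extract_with_template("Acme Supplies", new_layout))
# → {'invoice_number': None, 'invoice_date': '05/04/2024', 'total': '990.00'}
```

Every new supplier means another entry in `TEMPLATES`, and a single renamed label silently turns a field into `None`. That is the per-layout setup and ongoing maintenance cost described above.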

Our Solution: KlearStack

KlearStack was developed with a clear goal: to provide automated data extraction without using any templates or rules. The question we asked was, “Could we train a machine to look at an invoice and make sense of the data on it, just as a human does?” With this question in mind, we set out to research various approaches to the problem.

After many experiments, our data scientists and machine learning developers created a proprietary machine learning model that extracts specific fields irrespective of layout. The model is continuously trained to extract data regardless of layout, format or field naming conventions. This eliminates the need for templates and saves customers a lot of time and money. It enables AI-based data extraction from financial documents such as invoices, purchase orders (POs), receipts and many more!

KlearStack sorts and manages extraction using deep learning, OCR (Optical Character Recognition) and NLR (Natural Language Representation) methods, converting documents from unstructured to structured data to increase productivity by 200%. Customers can also use KlearStack RPA components to take the structured data extracted from an invoice and use it to fill customer forms and ERP screens and to reconcile invoices.

Ashutosh Saitwal is a transformation leader with over 25 years of experience in both large enterprise and start-up environments. He has globally led cross-functional teams for $10M to $100M business units at organizations such as Symantec, ZS Associates, BindView, Capgemini, IDeaS (a SAS company) and Scala.
Ashutosh has deep experience with automation programs in RPA, DevOps, software testing and marketing. He was a very early adopter of RPA.
He has a 360-degree view of automation: he has used automation extensively in end-user organizations and has also led an automation technology platform and services business. Ashutosh believes that great automation is more about the mindset than the tool set.
