Data Extraction using AI vs. Traditional Template-Based Data Extraction
04 Mar 2019 Ashutosh Saitwal

Most mid-sized to large enterprises globally have an average of 500+ suppliers who send invoices every month for the goods and services they provide. For reconciliation and payment management, these enterprises have to read each invoice manually, extract information (such as invoice date, invoice number, due date, amount, tax amount, supplier name, and PO reference), validate that information against the data in their ERP system, and then enter the invoice into the ERP.
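To make the manual validation step more concrete, here is a minimal Python sketch of checking extracted invoice fields against an ERP purchase order. It is an illustration only, not KlearStack code: the record shapes (`ExtractedInvoice`, `ErpPurchaseOrder`), the function `validate_against_erp`, the assumption that `amount` is the total payable including tax, and the one-cent tolerance are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class ExtractedInvoice:
    # Fields named in the article; the attribute names are illustrative.
    invoice_number: str
    supplier_name: str
    po_reference: str
    amount: Decimal      # assumed: total payable, including tax
    tax_amount: Decimal


@dataclass
class ErpPurchaseOrder:
    # Hypothetical shape of a purchase-order record exported from the ERP.
    po_number: str
    supplier_name: str
    approved_amount: Decimal


def validate_against_erp(invoice, erp_pos, tolerance=Decimal("0.01")):
    """Return a list of problems; an empty list means the invoice can be entered."""
    po = erp_pos.get(invoice.po_reference)
    if po is None:
        return [f"unknown PO reference {invoice.po_reference!r}"]

    problems = []
    # Compare supplier names case-insensitively (e.g. "ACME" vs. "Acme").
    if po.supplier_name.casefold() != invoice.supplier_name.casefold():
        problems.append("supplier name does not match the PO")
    # Allow a small rounding tolerance when comparing amounts.
    if abs(po.approved_amount - invoice.amount) > tolerance:
        problems.append("invoice amount differs from the approved PO amount")
    # Sanity check: tax must be non-negative and cannot exceed the total.
    if not (Decimal("0") <= invoice.tax_amount <= invoice.amount):
        problems.append("tax amount is outside the invoice amount")
    return problems


erp = {"PO-1001": ErpPurchaseOrder("PO-1001", "Acme Supplies", Decimal("1180.00"))}
invoice = ExtractedInvoice("INV-77", "ACME Supplies", "PO-1001",
                           Decimal("1180.00"), Decimal("180.00"))
print(validate_against_erp(invoice, erp))  # []
```

Returning a list of problems, rather than a single pass/fail flag, lets a reviewer see every mismatch on an invoice at once.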

Sounds tedious? Imagine doing the same for hundreds of invoices daily.

Most of this data processing in organizations today is done manually. The biggest challenge is that every supplier uses a different invoice layout, format, and field naming convention. The placement of each field differs from supplier to supplier, and even when layouts are similar, the labels can differ (for example, "Invoice No." on one invoice and "Bill Number" on another). Because invoices are so non-standard, automating data extraction is challenging and cumbersome.

Often these invoices aren’t structured according to any one specific template and don’t conform to a common set of layout rules. This makes it uncertain how an extraction system will respond to data that isn’t aligned with the template it expects.

In the past, there have been numerous attempts to automate data extraction. OCR is one of the most common tools: it can extract all of the text on an invoice as one big string, but it fails to arrange that data systematically when complex invoices are processed, and it delivers inaccurate results.

Then there are several solutions that are based on OCR templates and rules.

To use these template-driven solutions, users have to define ‘one set of templates and rules per invoice layout.’ That means defining thousands of templates if you have thousands of suppliers, which increases time and costs for the organization. A template-driven approach may work when you deal with a small number of suppliers, but as your supplier base grows rapidly, template creation has to keep pace. Besides, even small changes to an existing supplier's invoices will cause data extraction to fail, so organizations that adopt a template-driven approach have to keep maintenance and support activities going continuously.
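The brittleness of the template-driven approach is easy to see in code. The Python sketch below is hypothetical: the supplier names, the `TEMPLATES` table, the `extract_with_template` function, and the regular expressions are made up for illustration, and real template tools often also rely on field positions on the page. Each supplier gets its own set of rules, and each rule matches only the exact label it was written for.

```python
import re

# Hypothetical templates: one set of rules per supplier invoice layout.
TEMPLATES = {
    "Acme Supplies": {
        "invoice_number": r"Invoice No\.\s*:\s*(\S+)",
        "invoice_date": r"Invoice Date\s*:\s*(\d{2}/\d{2}/\d{4})",
        "amount": r"Total Due\s*:\s*([\d,]+\.\d{2})",
    },
    "Globex Ltd": {
        "invoice_number": r"Bill #\s*(\S+)",
        "invoice_date": r"Dated\s+(\d{4}-\d{2}-\d{2})",
        "amount": r"Amount Payable\s+([\d,]+\.\d{2})",
    },
}


def extract_with_template(supplier, ocr_text):
    """Apply the supplier's template; fields whose rule does not match come back as None."""
    template = TEMPLATES.get(supplier)
    if template is None:
        # A new supplier means someone has to write a new template first.
        raise KeyError(f"no template defined for supplier {supplier!r}")
    result = {}
    for field, pattern in template.items():
        match = re.search(pattern, ocr_text)
        result[field] = match.group(1) if match else None
    return result


original_layout = "Invoice No.: A-1042\nInvoice Date: 04/03/2019\nTotal Due: 1,180.00"
# The same supplier renames one label; the existing template no longer matches it.
changed_label = original_layout.replace("Invoice No.", "Invoice Number")

print(extract_with_template("Acme Supplies", original_layout))
# {'invoice_number': 'A-1042', 'invoice_date': '04/03/2019', 'amount': '1,180.00'}
print(extract_with_template("Acme Supplies", changed_label))
# {'invoice_number': None, 'invoice_date': '04/03/2019', 'amount': '1,180.00'}
```

A single renamed label silently drops a field, and an unknown supplier raises an error, which is exactly the maintenance burden described above.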

Our Solution: KlearStack - Data Extraction using AI

KlearStack was developed with a clear goal: to provide automated data extraction using AI, without any templates or rules. The question we asked was, “Could we train a machine to look at an invoice and make sense of the data on it, just like the human eye does?” With this question in mind, we set out to research various approaches to the problem.

After many experiments, our data scientists and machine learning developers created a proprietary machine learning model that extracts specific fields irrespective of layout. The model is continuously trained to extract data regardless of layouts, formats, and field naming conventions. This eliminates the need for templates and saves customers a lot of time and money. It supports data extraction from financial documents such as invoices, purchase orders (POs), receipts, and many more!

KlearStack sorts and manages data extraction using deep learning, OCR (Optical Character Recognition), and NLR (Natural Language Representation) methods, converting documents from unstructured to structured data to increase productivity by 200%. Customers can also use KlearStack's RPA components to take the newly structured data extracted from invoices and use it to fill customer forms and ERP screens and to reconcile invoices.
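As a rough illustration of what converting unstructured data into structured data can mean in practice (not a description of how KlearStack does it), the Python sketch below takes field values captured as raw text and normalizes them into clean, JSON-ready values that an RPA bot or ERP import could consume. The function name `to_structured_record`, the field set, and the default day/month/year date convention are assumptions.

```python
import json
from datetime import datetime
from decimal import Decimal


def to_structured_record(raw_fields, date_format="%d/%m/%Y"):
    """Normalize raw extracted strings into a JSON-ready dict.

    raw_fields: mapping of field name -> text as read from the document
    date_format: the supplier's date convention (assumed day/month/year by default)
    """
    invoice_date = datetime.strptime(raw_fields["invoice_date"].strip(), date_format)
    return {
        "invoice_number": raw_fields["invoice_number"].strip(),
        "supplier_name": raw_fields["supplier_name"].strip(),
        # ISO 8601 dates are unambiguous for downstream systems.
        "invoice_date": invoice_date.date().isoformat(),
        # Drop thousands separators; Decimal keeps exact cents (no float rounding).
        "amount": str(Decimal(raw_fields["amount"].replace(",", "").strip())),
    }


raw = {"invoice_number": " A-1042 ", "supplier_name": "Acme Supplies",
       "invoice_date": "04/03/2019", "amount": "1,180.00"}
print(json.dumps(to_structured_record(raw), indent=2))
```

Passing `date_format` explicitly matters because "04/03/2019" means 4 March in day/month order but 3 April in month/day order.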

With its RPA add-ons, KlearStack lays a strong foundation of automation for organizations that want to ramp up their Accounts Payable and procure-to-pay (P2P) cycles.
