Multi Page Invoice Processing Made Easier with KlearStack

Invoice data extraction and processing have long been a challenge for business leaders, owing to the complexities that different data extraction software brings along. While AI-enabled OCR tools can interpret and extract meaningful data from invoices, there is a significant aspect of intelligent document processing that businesses may be missing out on.

Multi page invoices pose a real problem when it comes to intelligently extracting relevant data across a number of pages. Such files often bundle different categories of documents together. While one or more pages in the file represent an invoice, the other pages may be supporting documents, copies of the original invoice, or additional invoices from different vendors.

When extracting data from multi page invoices, it becomes extremely difficult to capture the exact data fields and ensure that fields from all the pages of the invoice are in line and accurately extracted into the system. However, KlearStack presents a comprehensive solution to the problem with the concept of intelligent document processing.

This article lists the issues associated with multi page invoice processing and explains how KlearStack streamlines the process.

The accounts payable department can encounter the following issues while extracting data from a multi page invoice:

  1. While processing the multi page document containing invoice(s), the pages containing invoice data could get jumbled with supporting documents or vice-versa. Similarly, if the multi page file contains only invoices but from different vendors, while processing the file, the data from separate vendor invoices could mix with one another.
  2. When extracting data from a single vendor invoice with multiple pages, the software may not be able to tell whether the second page continues the first, and whether the third page follows the second.
  3. Similarly, while processing two vendor invoices together, the software may fail to differentiate between the invoices, so the extracted data ends up in a single table with data from the second invoice merged into the first.
  4. Another significant issue while processing multi page invoices is the line items (item table) of an invoice spanning multiple pages. This plays out in several scenarios.
    • Line items occupy different amounts of space. Some line items may spread across three lines while others take just one or two. This variation can reduce the accuracy of item extraction and create confusion.
    • When the line items continue onto the second or third page of the invoice, the table headers may not be repeated. The next invoice page might contain only the line items, with no indication of the column to which each value belongs. In such a case, it is difficult to ensure the items are not jumbled between the pages or mixed across columns.
    • In some cases, when the invoice pages are not scanned in sequence, even with the headers repeated on every page, you cannot be sure that the line items on the second page you processed continue from the line items on the first page.

Considering all the pitfalls associated with multi page document processing, it is necessary to invest in software that has a proven solution for the worst case scenarios mentioned above. KlearStack is built to overcome these multi page invoice processing challenges.

How does KlearStack streamline and oversee multi page invoice processing?

You now have a clear understanding of all the possible problems that may arise while extracting data from multi page invoices. It’s time you see the solution in action!

KlearStack is an AI-enabled, OCR data extraction software that incorporates a template-less data extraction approach with end-to-end automation while processing documents. By leveraging intelligent document processing, KlearStack uses the following two techniques to intelligently extract and process data from multi page invoices.

1. Page Classification

In the page classification technique, KlearStack employs machine learning algorithms to group different pages of the same invoice in a single document in sequence and set aside the remaining pages. Let us see the different examples of the page classification technique.

Grouping pages of the same invoice in order

KlearStack can identify which pages belong to the same invoice and which pages continue from the previous ones. Owing to this identification, KlearStack extracts table data in the sequence of the original invoice. For example, in the images below, all three pages of the same invoice are grouped into one document and in sequence.

When you switch to the next page of the invoice, you can see the corresponding line items in the extracted table. With page classification, KlearStack identified the page number, supplier name, and invoice number on all three pages and grouped them together in order.

Differentiating invoice from the supporting documents

When a two-page file containing an invoice and its supporting document is scanned into KlearStack, it automatically identifies the invoice page and extracts the relevant data, while classifying the supporting document into the “Not an Invoice” category and leaving it untouched.

In the above example, KlearStack accurately extracted data from the invoice page into the table along with other relevant information, but left the supporting Proforma Invoice/Challan unprocessed and intact.
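To make the routing step concrete, here is a toy sketch of sorting pages into "Invoice" and "Not an Invoice". It assumes the page text is already available from OCR and uses plain keyword matching, which is a deliberately simple substitute for the trained classifier the article describes. The hint lists and the `classify_page` function are illustrative assumptions, not part of any KlearStack API.

```python
# Hypothetical keyword lists; a production system would use a trained model instead.
NON_INVOICE_HINTS = ("proforma", "challan", "delivery note")
INVOICE_HINTS = ("tax invoice", "invoice no", "invoice number", "amount due")


def classify_page(text):
    """Return "Invoice" or "Not an Invoice" for the OCR text of a single page."""
    lowered = text.lower()
    # Supporting-document hints win, since a proforma also mentions "invoice".
    if any(hint in lowered for hint in NON_INVOICE_HINTS):
        return "Not an Invoice"
    if any(hint in lowered for hint in INVOICE_HINTS):
        return "Invoice"
    # Anything unrecognised is left untouched rather than extracted.
    return "Not an Invoice"


labels = [
    classify_page("TAX INVOICE\nInvoice No: 42\nAmount due: 500.00"),
    classify_page("Delivery Challan\nGoods dispatched: 10 boxes"),
]
```

Only pages labelled "Invoice" would go on to field and table extraction; the rest are kept as they are.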

Differentiating invoices of two different vendors

When two different vendor invoices are scanned together into a single multi page file, KlearStack automatically processes them as two separate invoices/files with the same file name but distinct supplier names, invoice numbers, and dates.

The above two images represent two different vendor invoices enclosed in the same multi page document. While processing, KlearStack automatically splits these invoices into one document each, with its own supplier name and invoice number.

KlearStack processed each of these invoices separately while extracting accurate data from each of them in order.

None of the invoice data is jumbled up with the other invoice and KlearStack successfully processed two different vendor invoices according to their corresponding details.

2. Line item extraction with computer vision and deep learning

The second problem associated with multi page invoice processing is line items spanning multiple pages of the same invoice. KlearStack addresses the challenges of accurate line item extraction by employing computer vision, deep learning, and natural language processing techniques. Once the pages of a multi page invoice are classified into their respective categories, KlearStack proceeds with line item extraction.

If you have carefully looked at the images above, you may have noticed how line items from multiple pages were grouped in sequence in all the three examples. Let us revisit those cases, now focusing on extraction of line items.

Extracting line items in sequence from multiple pages of the same invoice

In the above images, KlearStack was able to extract accurate line item data from all the three pages of the invoice in sequence. The line items are not jumbled up between the pages and are extracted in accordance with related information. With the help of computer vision and deep learning algorithms, KlearStack is trained to identify the exact order of line items across the invoice pages.
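As a rough illustration of stitching a table together across pages, the sketch below handles the case where continuation pages do not repeat the column headers. It assumes the pages are already in the correct order and that each page's table has been parsed into rows of cell strings. The `merge_page_tables` function and the input layout are hypothetical, and this rule-based approach stands in for the computer vision and deep learning models mentioned above.

```python
def merge_page_tables(page_tables):
    """Merge per-page tables into one list of row dictionaries.

    page_tables is a list of (header, rows) tuples in page order, where
    header is a list of column names or None when the page has no header.
    """
    header = None
    merged = []
    for page_header, rows in page_tables:
        if page_header is not None:
            header = page_header  # a repeated or new header replaces the old one
        if header is None:
            raise ValueError("first page with rows has no table header")
        for row in rows:
            if len(row) != len(header):
                raise ValueError(f"row {row!r} does not match header {header!r}")
            # Carry the last seen header forward onto header-less pages.
            merged.append(dict(zip(header, row)))
    return merged


tables = [
    (["Item", "Qty", "Amount"], [["Bolt", "10", "50.00"]]),
    (None, [["Nut", "20", "30.00"]]),       # page 2: no header repeated
    (None, [["Washer", "5", "2.50"]]),      # page 3: no header repeated
]
line_items = merge_page_tables(tables)
```

Raising an error on a column-count mismatch is a conservative choice: it flags a page for review instead of silently shifting values into the wrong columns.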

Extracting accurate line items irrespective of their span size

Moreover, the line items in the scanned images occupy different amounts of space, with some taking up more lines than others. Even then, KlearStack is able to extract accurate line item data into the table without jumbling it with other line items.
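One simple way to picture the handling of line items of different heights is a heuristic that folds wrapped description lines back into the item they belong to. The sketch assumes each text line of the table has been split into description, quantity, and amount cells, and that a wrapped line has empty quantity and amount cells. Both the assumption and the `merge_wrapped_rows` name are illustrative, not a description of KlearStack's models.

```python
def merge_wrapped_rows(rows):
    """Fold continuation lines into the preceding line item.

    rows is a list of (description, quantity, amount) tuples, one per text
    line in the table. A line with empty quantity and amount is treated as
    the wrapped remainder of the previous item's description.
    """
    items = []
    for description, quantity, amount in rows:
        is_continuation = not quantity and not amount
        if is_continuation and items:
            items[-1]["description"] += " " + description
        else:
            items.append(
                {"description": description, "quantity": quantity, "amount": amount}
            )
    return items


raw_rows = [
    ("Steel bolt M8", "10", "50.00"),
    ("zinc plated,", "", ""),            # second line of the first item
    ("pack of 100", "", ""),             # third line of the first item
    ("Nut M8", "20", "30.00"),           # single-line item
]
items = merge_wrapped_rows(raw_rows)
```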

This way, KlearStack prevents jumbling up of line items and invoice pages with each other and extracts relevant information from multi page invoices.

The bottom line is that extracting data from multi page documents can be a real problem if you use inefficient and unreliable data extraction software. The techniques mentioned in this post require a unified strategy combining machine learning, computer vision, deep learning, and artificial intelligence techniques.

KlearStack has the ability and expertise to harness the intelligence of AI and ML for productive and meaningful invoice data extraction. If you wish to watch multi page document processing in action, contact us for a free demo of KlearStack.

Ashutosh Saitwal

Ashutosh is the founder and director of the award winning KlearStack AI platform. You can catch him speaking at NASSCOM events around the world, where he is an evangelist for RPA, AI, Machine Learning and Intelligent Document Processing.