At Open Connections we have spent many years helping customers with their paper capture requirements. Whilst a significant proportion of this work is now the capture of electronic documents via email and other sources, there are still some sizable paper archives out there that can benefit from being converted to electronic form.
For some situations there is an ongoing requirement, and we have solutions to help customers with this. Recently, however, I was challenged to help an organisation that had a number of paper archives it desperately needed to convert to electronic form, indexing the documents using a few key fields of data included on the paper documents. I can’t show the actual documents here, but they were of a financial nature, so for this article we will use receipts to demonstrate.
In this article I will show how easy it was to create a low-cost solution that could achieve all the basic requirements without expensive software and weeks or months of configuration.
In future articles I will describe how we can use the facilities provided by Klippa DocHorizon to create mobile capture solutions using mobile phones or tablets, but for this scenario (imagine a number of filing cabinets all located in one room) it was a one-off requirement to capture the paper contents and store them with key pieces of data.
Step 1 – Capture
I considered a number of options here. Various low-cost scanning tools were evaluated, but none gave quite the level of control required. Eventually I settled on creating a custom scanning interface, which offered the tight control needed to keep scanner configuration dialogs away from users (who could otherwise change settings and break the whole solution).
The scanning interface was written in Microsoft Visual C# and it did not take long to build an initial basic version. I have enhanced the functionality since and there are plans for more flexibility in the future. Initially it had a simple “Scan” button, “Next” and “Previous” buttons to view other pages and a “Submit” button.
Driving the scanner was achieved by using an open-source component which supports WIA (Windows Image Acquisition) and TWAIN drivers. This allows for the use of a wide range of small, medium and even large volume scanners. For testing we hooked up a trusty Fujitsu scanner with a speed of approximately 30 to 80 pages per minute, but virtually any scanner from flatbeds up would work.
Step 2 – The magic
The real challenge now was to extract key information from the scanned documents without the operator having to do a lot of manual data entry. To achieve this, I plugged in Klippa DocHorizon, specifically to use one of its AI-driven document models. For this article I used the Financial model.
I could have called the Financial model directly via the API, but I needed some additional flexibility so chose to do this via the DocHorizon FlowBuilder.
The first flow I built simply receives a webhook call from my C# code when the user clicks the “Submit” button, performs the ‘magic’ using AI to extract the key values (in this case the Vendor, Date and Amount) and returns the desired data as a response.
This works well because the process is quite fast and the documents for this customer are quite small. If needed, I could create an asynchronous system where we send the documents to Klippa and come back later for the results.
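As an illustration of that asynchronous pattern, here is a minimal sketch in Python (the scanning application itself is C#). The `submit` and `fetch_result` callables are hypothetical stand-ins for whatever calls such a flow would expose; nothing here is taken from the DocHorizon API.

```python
import time
from typing import Callable, Optional


def submit_and_poll(
    submit: Callable[[bytes], str],
    fetch_result: Callable[[str], Optional[dict]],
    document: bytes,
    interval_s: float = 2.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Send a document, then check back until the extracted data is ready.

    submit       -- hypothetical: uploads the document and returns a job id
    fetch_result -- hypothetical: returns the result dict, or None if not ready
    """
    job_id = submit(document)
    waited = 0.0
    while waited <= timeout_s:
        result = fetch_result(job_id)
        if result is not None:
            return result
        # Not ready yet: wait before asking again.
        sleep(interval_s)
        waited += interval_s
    raise TimeoutError(f"No result for job {job_id} after {timeout_s} seconds")
```

Passing the two calls in as parameters keeps the waiting logic separate from the transport, so the operator's screen can carry on scanning while results arrive later.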
This could also be extended to have Klippa DocHorizon do much more of the work, sending the image and/or the data to other systems including OneDrive, Google, SharePoint and a range of specific backends such as CRMs and accounts packages (I will cover this in future articles).
Many people will be thinking that the effort to call out to an external AI engine must be huge. Well, the code to do this fits on a single page.
It’s also worth considering that you could do this from VBA inside Excel or from PowerShell!
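To give a feel for how little is involved, here is a rough sketch of the same kind of call in Python rather than C#. It is not the code from the application: the webhook URL, the request body layout and the response field names (`vendor`, `date`, `amount`) are assumptions made for illustration, since the real ones depend on how the flow is built.

```python
import base64
import json
import urllib.request

# Hypothetical placeholder; the real URL comes from the flow's webhook trigger.
WEBHOOK_URL = "https://example.invalid/my-flow-webhook"


def build_request(image_bytes: bytes, filename: str,
                  url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Wrap the scanned image in a JSON POST (assumed body layout)."""
    body = json.dumps({
        "filename": filename,
        "document": base64.b64encode(image_bytes).decode("ascii"),
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def parse_response(raw: bytes) -> dict:
    """Pick out the three index fields (assumed response field names)."""
    data = json.loads(raw)
    return {key: data.get(key) for key in ("vendor", "date", "amount")}


def extract_fields(image_bytes: bytes, filename: str) -> dict:
    """Send the document and wait for the extracted values (synchronous)."""
    request = build_request(image_bytes, filename)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_response(response.read())
```

Only the standard library is used here, which is the point: one HTTP POST out, one JSON document back.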
Step 3 – The Result
It takes less than 3 seconds on average to process the receipt and we have all the key data extracted (there is lots more data available; we are just displaying the three critical items).
If the operator clicks OK, the file is saved to an archive with the extracted data.
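The save step can be pictured with a short Python sketch. It assumes the archive is simply a folder and that the extracted data is written as a JSON file next to the scan; the article's actual archive format is not described, so treat the layout and the function name `archive_document` as hypothetical.

```python
import json
import shutil
from pathlib import Path


def archive_document(scan_path: Path, fields: dict, archive_dir: Path) -> Path:
    """Copy the scan into the archive and write its index data alongside it."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / scan_path.name
    if target.exists():
        # Refuse to overwrite a document that has already been archived.
        raise FileExistsError(f"{target} is already in the archive")
    shutil.copy2(scan_path, target)
    # Sidecar file: same name with a .json extension (assumed convention).
    sidecar = target.with_suffix(".json")
    sidecar.write_text(json.dumps(fields, indent=2), encoding="utf-8")
    return target
```

Keeping the index data in a sidecar file means the archive can later be loaded into another system without re-running the extraction.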
Unfortunately, in this case I can’t show the real documents the customer needed to process. However, it was critical that they could convert and index them as part of solving a business issue.
In the case of the real documents we opted to use the DocHorizon prompt builder, an amazing tool which allows us to write plain English ‘prompts’ such as ‘what is the total value of this contract’ or ‘what was the item that was repaired’ and have AI return the value to us. This allows us to process a vast range of document types including:
- Resumés (CVs)
- Contracts
- Delivery documents
- Maintenance records
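One way to picture the prompt approach is as a small table of index fields and the question that fills each one. The Python sketch below is purely illustrative: the field names, the document type keys and the `prompts_for` helper are hypothetical and do not reflect how the prompt builder stores its configuration.

```python
from typing import Dict, List

# Hypothetical index fields and the plain-English question that fills each one.
PROMPTS_BY_DOCUMENT_TYPE: Dict[str, Dict[str, str]] = {
    "contract": {
        "total_value": "What is the total value of this contract?",
    },
    "maintenance_record": {
        "repaired_item": "What was the item that was repaired?",
    },
}


def prompts_for(document_type: str) -> List[Dict[str, str]]:
    """Return the field/prompt pairs to ask for one type of document."""
    try:
        prompts = PROMPTS_BY_DOCUMENT_TYPE[document_type]
    except KeyError:
        raise ValueError(f"No prompts defined for {document_type!r}") from None
    return [{"field": name, "prompt": text} for name, text in prompts.items()]
```

Supporting a new document type then becomes a matter of writing a few more questions rather than configuring a new extraction model.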
I hope you found this article helpful in showing how a straightforward C# application can be combined with Klippa DocHorizon to create an effective yet easy-to-implement document capture solution. I’ll be publishing a separate article soon about using the prompt builder, so be sure to follow me to stay updated. If you have any questions about this article, feel free to connect with me.
