Introduction
IBM® Datacap software is a key capability of the IBM Cloud Pak® for Business Automation. It streamlines the capture, recognition and classification of business documents. Its natural language processing and text analytics technologies identify, classify and extract content from unstructured or variable paper documents.
Datacap exposes its document processing capabilities through a REST API, provided by a component called Datacap Web Services. This allows external applications to interact with Datacap programmatically.
Datacap Web Services can be deployed in Azure, where it can take advantage of native Azure features such as:
• Azure VM Scale Sets
• Azure SQL
• Azure Application Gateway
The Business Problem
A customer receives unstructured documents from a variety of sources. Some of these documents contain data which would traditionally have been extracted using a Datacap application. The customer required an architecture which could be scaled out horizontally when demand increased and scaled back in when demand decreased. A traditional Datacap application relies on Rulerunner Servers to process batches, and Datacap Rulerunner does not lend itself easily to load balancing or to dynamic horizontal scaling. The Datacap REST API offers an alternative: it can be used to create transactions which execute Datacap Rulesets on the documents and return the extracted data. Because Datacap Web Servers can be load balanced, they can be scaled horizontally.
Corresponding White Paper
More detail about the solution is available in the corresponding White Paper, which can be found on the IBM Community.
The Logical Architecture
The logical architecture of the solution consists of:
• A Datacap Taskmaster Server
• A single Datacap Web Server, deployed inside an Azure Virtual Machine Scale Set (VMSS)
• An Azure Application Gateway, which provides load balancing
• An Azure SQL Database, which hosts the Datacap application’s Admin Database
• An instance of Azure Active Directory Services
SoapUI was used to exercise the Web Services endpoints.
Configuration of Horizontal Scaling
The VMSS was set to scale out (add one web server) when the average CPU load exceeded 75% for 5 minutes, and to scale in (remove one web server) when the average CPU load fell below 25% for 5 minutes. The Application Gateway load balanced the web servers, distributing incoming traffic across whichever instances were running at the time. An Azure VMSS has a theoretical limit of 1000 virtual machines. It’s difficult to imagine requiring that many Datacap Web Servers, but good to know that the capacity exists, should it ever become necessary!
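The scaling rules above can be illustrated with a short Python sketch. This is not how Azure implements autoscale, and it is not taken from the solution itself; it simply models the decision described in the text. It assumes one CPU reading per minute, that "for 5 minutes" means the mean of the last five readings, and a minimum of one web server. The function name and the constants are illustrative only.

```python
from statistics import mean
from typing import Sequence

# Thresholds taken from the text above.
SCALE_OUT_CPU = 75.0   # percent: scale out above this average
SCALE_IN_CPU = 25.0    # percent: scale in below this average
WINDOW_MINUTES = 5     # length of the evaluation window

# Assumptions for this sketch, not stated in the article.
MIN_INSTANCES = 1      # never scale in below one web server
MAX_INSTANCES = 1000   # theoretical VMSS limit quoted in the text


def desired_instance_count(current: int, cpu_samples: Sequence[float]) -> int:
    """Return the instance count after applying the scaling rules once.

    cpu_samples holds one average-CPU percentage per minute, newest last.
    """
    if len(cpu_samples) < WINDOW_MINUTES:
        # Not enough history to evaluate the window: leave the count alone.
        return current

    window_average = mean(cpu_samples[-WINDOW_MINUTES:])

    if window_average > SCALE_OUT_CPU:
        # Sustained high load: add one web server, up to the limit.
        return min(current + 1, MAX_INSTANCES)
    if window_average < SCALE_IN_CPU:
        # Sustained low load: remove one web server, down to the minimum.
        return max(current - 1, MIN_INSTANCES)
    return current
```

For example, calling desired_instance_count(1, [80, 82, 85, 90, 88]) returns 2, because the five-minute average is above 75%. Leaving a gap between the scale-out and scale-in thresholds, as the configuration above does, helps to stop the scale set from repeatedly adding and removing servers when load hovers around a single value.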
Conclusion
The Datacap REST API provides a little-known means of consuming Datacap functionality. When coupled with the Azure VMSS, it allows the infrastructure to respond in real time to the demands being placed upon it, something that would be very difficult to achieve with on-premises infrastructure.
For further information and help configuring this solution, please contact technical@openc.co.uk or call us on +44 (0) 1454 889966.
Open Connections is an innovative provider of Business Process Automation solutions. We have a wealth of expertise and have helped many organisations find solutions to their document-centred problems.