Document hierarchy
The document hierarchy is a core element of the design of the capture system. In addition to defining structure, the document hierarchy provides the information that the Datacap system requires to identify and assemble documents.
To process documents, Datacap requires a way of representing documents so that it can inspect their structure and component parts, manipulate them, and extract the target data. In fact, Datacap must represent not just the documents but the entire contents of an acquisition session. An acquisition session typically contains a set of documents that must accompany each other and that Datacap can handle as a whole. For example, a session might contain the following documents:
- Signed application form
- Pay stub
- Identification card with photograph
For this purpose, Datacap has a flexible object model that is called the document hierarchy or Datacap object (DCO). At the top of the hierarchy is the batch, a container and unit of work that is processed as a whole. The batch contains one or more documents, each document contains one or more pages, and each page contains one or more fields.
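The batch, document, page, and field levels described above can be sketched as a nested data structure. This is only an illustration in Python, not the Datacap API: the class names, attribute names, and sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


# All class and attribute names here are hypothetical, chosen for illustration.
@dataclass
class Field:
    name: str
    value: str = ""


@dataclass
class Page:
    page_type: str
    fields: List[Field] = field(default_factory=list)


@dataclass
class Document:
    doc_type: str
    pages: List[Page] = field(default_factory=list)


@dataclass
class Batch:
    batch_id: str
    documents: List[Document] = field(default_factory=list)

    def field_count(self) -> int:
        """Count every field on every page of every document in the batch."""
        return sum(len(p.fields) for d in self.documents for p in d.pages)


# One batch holding two documents, each with one page and its fields.
batch = Batch(
    batch_id="B001",
    documents=[
        Document("application_form",
                 [Page("signature_page", [Field("name"), Field("date")])]),
        Document("pay_stub",
                 [Page("pay_stub_page", [Field("employer")])]),
    ],
)
print(batch.field_count())
```

Because the batch is the root of the structure, anything that operates on the unit of work as a whole, such as the field count here, can walk down from the batch through documents and pages to fields.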
A batch corresponds to the physical grouping of individual sheets of paper or pages that are acquired together during a capture session. Documents are identified after Datacap inspects the content of the batch and applies rules to separate them. Rules can specify separator pages or known page types that mark the start or end of documents. Rules can also use knowledge of the batch structure, for example when a new document starts every four pages.
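The two kinds of separation rule mentioned above can be sketched as follows. This is a minimal Python illustration, not how Datacap rules are written: the function names, the `"separator"` page type, and the choice to drop separator pages from the output are all assumptions.

```python
from typing import List


def split_every_n(page_types: List[str], n: int) -> List[List[str]]:
    """Fixed batch structure: start a new document every n pages."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [page_types[i:i + n] for i in range(0, len(page_types), n)]


def split_on_separator(page_types: List[str],
                       separator: str = "separator") -> List[List[str]]:
    """Separator pages: each one closes the current document and is dropped."""
    documents: List[List[str]] = []
    current: List[str] = []
    for page_type in page_types:
        if page_type == separator:
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page_type)
    if current:
        documents.append(current)
    return documents


# Hypothetical batch of eight scanned pages, four pages per document.
scanned = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]
print(split_every_n(scanned, 4))

# Hypothetical batch in which separator sheets were scanned between documents.
with_separators = ["separator", "cover", "info", "separator", "cover"]
print(split_on_separator(with_separators))
```

Both functions take the flat sequence of pages in a batch and return one list of pages per document, which is the grouping that the rest of the hierarchy depends on.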
When you design an application, the document hierarchy defines the following items:
- The document types that the application can process
  - You might have only one type, or you might have multiple types. For example, an application might process marketing postcards, refinance loan applications, and bank statement document types.
- The page types within each document type
  - Each document might have only one page type, or it might have multiple types. For example, the loan application document type might include several page types: a cover page, a customer information page, a signature page, and more.
- The number and order of pages within each document type
  - Pages can be required or optional.
- The data fields within each page type
  - Data fields can also be required or optional. A marketing postcard is a fairly basic document with only a few fields, such as name, phone number, and date. Loan application documents have many more data fields, which might include address, personal identification number, bar codes, check boxes, signatures, and more.
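The items in this list can be pictured as a small definition that maps each document type to an ordered list of page types, with required or optional flags on pages and fields. The sketch below is a Python illustration only: the dictionary layout, the `attachment_page` entry, the choice of which pages and fields are required, and the `check_page_order` helper are assumptions and do not reflect the format that Datacap uses.

```python
from typing import List

# Hypothetical definition: document type -> ordered page types.
# Each field maps to True (required) or False (optional); the flags are illustrative.
SETUP = {
    "loan_application": [
        {"page_type": "cover_page", "required": True,
         "fields": {"bar_code": True}},
        {"page_type": "customer_information_page", "required": True,
         "fields": {"address": True, "personal_identification_number": True}},
        {"page_type": "attachment_page", "required": False, "fields": {}},
        {"page_type": "signature_page", "required": True,
         "fields": {"signature": True, "date": False}},
    ],
    "marketing_postcard": [
        {"page_type": "postcard", "required": True,
         "fields": {"name": True, "phone_number": False, "date": False}},
    ],
}


def check_page_order(doc_type: str, page_types: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the pages fit the definition."""
    problems: List[str] = []
    remaining = list(page_types)
    for spec in SETUP[doc_type]:
        if remaining and remaining[0] == spec["page_type"]:
            remaining.pop(0)          # page is present in the expected position
        elif spec["required"]:
            problems.append(f"missing required page: {spec['page_type']}")
    problems.extend(f"unexpected page: {p}" for p in remaining)
    return problems


print(check_page_order(
    "loan_application",
    ["cover_page", "customer_information_page", "signature_page"],
))
print(check_page_order("loan_application", ["cover_page", "signature_page"]))
```

The first call omits only the optional page, so it reports no problems. The second omits a required page, so it reports that page as missing.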
The document hierarchy that is defined when you design an application is the setup DCO. The setup DCO is used as a blueprint for creating runtime instances during the capture process. After data is extracted and exported to the back-end repository, the instance is deleted by the Datacap maintenance process.
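The relationship between the blueprint and a runtime instance can be sketched as a copy that is filled in, exported, and then discarded. This Python fragment is an assumption-laden illustration: the `SETUP_PAGE` dictionary, the `new_runtime_page` helper, and the sample value are hypothetical, and the `exported` dictionary only stands in for a back-end repository.

```python
import copy

# Hypothetical blueprint for one page type: field names with empty values.
SETUP_PAGE = {
    "page_type": "postcard",
    "fields": {"name": "", "phone_number": "", "date": ""},
}


def new_runtime_page(blueprint: dict) -> dict:
    """Create an independent runtime copy so the blueprint is never modified."""
    return copy.deepcopy(blueprint)


runtime = new_runtime_page(SETUP_PAGE)
runtime["fields"]["name"] = "A. Example"   # stands in for extracted data

exported = dict(runtime["fields"])          # stands in for export to a repository
del runtime                                 # the instance is discarded after export

print(SETUP_PAGE["fields"]["name"] == "")   # the blueprint is unchanged
```

A deep copy is used so that values written to the runtime instance never leak back into the blueprint, which must stay reusable for the next batch.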