Document hierarchy

The document hierarchy is a core element of the design of the capture system. In addition to defining structure, the document hierarchy provides the information that the Datacap system requires to identify and assemble documents.

To process documents, Datacap requires a way of representing documents so that it can inspect their structure and component parts, manipulate them, and extract the target data. In fact, Datacap must represent not just the documents but the entire contents of an acquisition session. An acquisition session typically contains a set of documents that must accompany each other and that Datacap can handle as a whole.

For example, a credit card application might include the following documents:
  • Signed application form
  • Pay stub
  • Identification card with photograph
All of these documents must be processed as a single transaction so that they can be delivered together to business users. You must also uniquely identify the application for tracking purposes so that business users can refer to it and retrieve it from the back-end content management system.

For this purpose, Datacap has a flexible object model that is called the document hierarchy or Datacap object (DCO). At the top of the document hierarchy is the batch, a container and unit of work that is processed as a whole. The batch contains one or more documents, each document contains one or more pages, and each page contains one or more fields.

Diagram of the document hierarchy.
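
The four levels of the hierarchy can be pictured as nested containers. The following Python sketch is only an illustration of that nesting, not Datacap's actual object model or API: the class names, attribute names, page type names, and field names are all assumptions made for this example. It models the credit card application described earlier as one batch that holds three documents.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """A single data field on a page (hypothetical model)."""
    name: str
    value: str = ""


@dataclass
class Page:
    """A page of a given page type, holding its fields."""
    page_type: str
    fields: List[Field] = field(default_factory=list)


@dataclass
class Document:
    """A document of a given document type, holding its pages."""
    doc_type: str
    pages: List[Page] = field(default_factory=list)


@dataclass
class Batch:
    """The container and unit of work that is processed as a whole."""
    batch_id: str
    documents: List[Document] = field(default_factory=list)


def count_fields(batch: Batch) -> int:
    """Walk batch -> documents -> pages and count every field."""
    return sum(len(page.fields) for doc in batch.documents for page in doc.pages)


# One acquisition session: the three documents travel together in one batch.
batch = Batch("credit-card-application-0001", [
    Document("Signed application form", [
        Page("Application page", [Field("name"), Field("signature")]),
    ]),
    Document("Pay stub", [Page("Pay stub page", [Field("employer")])]),
    Document("Identification card", [Page("ID page", [Field("id_number")])]),
])

print(len(batch.documents), count_fields(batch))
```

Because every level holds a list of the level beneath it, a single traversal from the batch reaches every document, page, and field, which is what allows the batch to be handled as one unit.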

A batch corresponds to the physical grouping of individual sheets of paper or pages that are acquired together during a capture session. Documents are recognized after Datacap inspects the content of the batch and applies rules to separate the documents. Rules can specify separator pages or known page types that mark the start or end of documents. Rules can also use knowledge of the batch structure, for example, when the batch is structured such that a new document starts every four pages.
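
Two of the separation strategies described above can be sketched in a few lines of Python. This is a simplified illustration and not how Datacap rules are written: the function names, the "Separator" page type label, and the representation of a batch as a flat list of page type names are assumptions made for this example.

```python
from typing import List


def split_on_separators(page_types: List[str],
                        separator: str = "Separator") -> List[List[str]]:
    """Start a new document at each separator page; drop the separator itself."""
    documents: List[List[str]] = []
    current: List[str] = []
    for page_type in page_types:
        if page_type == separator:
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page_type)
    if current:
        documents.append(current)
    return documents


def split_fixed(page_types: List[str],
                pages_per_document: int = 4) -> List[List[str]]:
    """Start a new document every `pages_per_document` pages."""
    return [page_types[i:i + pages_per_document]
            for i in range(0, len(page_types), pages_per_document)]


scanned = ["Separator", "Cover", "Info", "Separator", "Cover"]
print(split_on_separators(scanned))

eight_pages = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]
print(split_fixed(eight_pages))
```

The first function relies only on page content (a recognizable separator), while the second relies only on position in the batch. Which approach fits depends on how the paper is prepared before scanning.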

Beneath the batch level, the document hierarchy defines the following information:
  • The document types that the application can process
    You might have only one type, or you might have multiple types. For example, an application might process marketing postcards, refinance loan applications, and bank statement document types.
  • The page types within each document type
    Each document might have only one page type, or it might have multiple types. For example, the loan application document type might include several page types: cover page, customer information page, signature page, and more.
  • The number and order of pages within each document type
    Pages can be required or optional.
  • The data fields within each page type
    Data fields can also be required or optional. The marketing postcard has only a few fields because it is a fairly basic document. It contains such things as name, phone number, and date. The loan application documents have many more data fields and might include address, personal identification number, bar codes, check boxes, signatures, and more.
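
To make the required and optional distinction concrete, the following Python sketch describes two of the document types above as plain data and checks a captured document against that description. It is a hypothetical illustration only: the dictionary layout, the `find_missing` function, and the choice of which pages and fields are required are assumptions made for this example, not Datacap's setup format.

```python
from typing import Dict, List

# Hypothetical definition: document type -> ordered list of page definitions.
# Each field name maps to True (required) or False (optional).
SETUP_HIERARCHY = {
    "Marketing postcard": [
        {"page_type": "Postcard", "required": True,
         "fields": {"name": True, "phone_number": False, "date": False}},
    ],
    "Loan application": [
        {"page_type": "Cover page", "required": False, "fields": {}},
        {"page_type": "Customer information page", "required": True,
         "fields": {"address": True, "personal_identification_number": True}},
        {"page_type": "Signature page", "required": True,
         "fields": {"signature": True}},
    ],
}


def find_missing(doc_type: str, pages: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the required pages and fields that are absent from a document.

    `pages` maps each captured page type to its extracted field values.
    """
    problems: List[str] = []
    for page_def in SETUP_HIERARCHY[doc_type]:
        page_type = page_def["page_type"]
        if page_type not in pages:
            # An absent page is a problem only when the page is required.
            if page_def["required"]:
                problems.append(f"missing page: {page_type}")
            continue
        captured = pages[page_type]
        for field_name, required in page_def["fields"].items():
            if required and not captured.get(field_name):
                problems.append(f"missing field: {page_type}/{field_name}")
    return problems


captured_loan = {
    "Customer information page": {"address": "1 Main St"},
}
for problem in find_missing("Loan application", captured_loan):
    print(problem)
```

In this sketch the optional cover page can be absent without being reported, while the missing signature page and the empty required field are both flagged.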

The document hierarchy that you define when you design an application is called the setup DCO. The setup DCO is used as a blueprint for creating runtime instances during the capture process. After data is extracted and exported to the back-end repository, the runtime instance is deleted by the Datacap maintenance process.
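
The blueprint relationship between the setup DCO and its runtime instances can be illustrated with a short Python sketch. It does not show how Datacap stores or creates runtime instances. The dictionary layout and the `create_runtime_instance` function are assumptions made for this example, which only demonstrates the idea that each runtime instance is an independent copy that is filled in while the definition stays unchanged.

```python
import copy

# Hypothetical setup definition for one page type; field values start empty.
SETUP_POSTCARD = {
    "page_type": "Postcard",
    "fields": {"name": None, "phone_number": None, "date": None},
}


def create_runtime_instance(setup: dict) -> dict:
    """Return an independent copy of the blueprint to fill during capture."""
    return copy.deepcopy(setup)


runtime_page = create_runtime_instance(SETUP_POSTCARD)
runtime_page["fields"]["name"] = "A. Customer"

# The blueprint is unchanged and can be reused for the next captured page.
print(SETUP_POSTCARD["fields"]["name"])  # prints None
```

A deep copy is used so that the nested field dictionary is not shared between the blueprint and the instance. Discarding the instance after export therefore has no effect on the definition.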