Overview of the concepts

In IBM Business Automation Navigator, you process documents and batches of documents, and review possible errors that need to be corrected manually.

In your business activities, you might deal with large amounts of data from unstructured documents, such as banking forms, insurance claims, tax forms, and invoices. Scanning documents, extracting and entering data, and verifying fields are time-consuming activities. IBM Automation Document Processing can significantly speed up these tasks.

To process documents, you use applications that are built in Application Designer and deployed to your Business Automation Navigator environment. According to the business needs that were defined during the design phase, these documents are automatically classified and the relevant data is extracted. Machine-learning capabilities help you process more documents, faster and more accurately.

Documents can contain one or more pages.

Your application might differ depending on the specifications that the application developer worked from, but two basic types of applications exist.

Batch Document Processing application

With this type of application, you process batches of documents. A batch is a set of documents or pages that need to be processed together.

For example, a batch can consist of documents that are scanned and sent from IBM® Datacap, and grouped together for processing in your application. Or, you might receive a series of documents attached to an email that do not make sense on their own and that you need to keep together to process them accurately.

When you define your batch, you can set different values to easily identify it. When you select a batch, you see the list of documents that it contains, and their status. You can also add or delete documents within a batch.

After a batch is processed, you might need to correct errors manually.

Document type issues

If a document type was automatically classified with low confidence or is missing, you must add it manually. You can choose from a list of pre-trained document types, and the system can recommend the most likely types for a specific document. You might also need to reorder, add, or remove pages in documents. You must reprocess the document before you move on to the next step, which is resolving data extraction issues.
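
As a rough illustration of the rule above, the following Python sketch flags documents whose type is missing or was assigned with low confidence. The `Classification` record, the `needs_manual_type` helper, and the 0.8 threshold are all hypothetical and are not part of the product. They only show the kind of check that sends a document to manual review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Classification:
    """Hypothetical record of an automatic classification result."""
    document: str
    doc_type: Optional[str]  # None when no document type was assigned
    confidence: float        # assumed to range from 0.0 to 1.0


def needs_manual_type(result: Classification, threshold: float = 0.8) -> bool:
    """Return True when the type is missing or its confidence is below the threshold.

    The threshold value is an assumption chosen for illustration only.
    """
    return result.doc_type is None or result.confidence < threshold


results = [
    Classification("claim_001.pdf", "Insurance claim", 0.97),
    Classification("scan_002.pdf", "Invoice", 0.41),
    Classification("scan_003.pdf", None, 0.0),
]

# Collect the documents that a user would have to classify by hand.
to_review = [r.document for r in results if needs_manual_type(r)]
print(to_review)
```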

Data extraction issues

If expected fields were not extracted correctly or are missing, you must correct them either by typing in the fields or by capturing the data directly on the document that you are viewing. Possible field types include:

Numeric values (integers or decimals)
Text strings
Boolean values (either true or false)
Dates (which you can pick from a calendar)
Checkboxes
Bar codes
Signatures

Document Processing application

This type of application is also called a single document processing application. With it, you process stand-alone documents. You upload documents from your own computer and, after processing, verify that the fields were extracted correctly.

Finalized documents

The finalized documents are used in the customer’s business applications, in the same Content Engine object store in which capture processing occurs. The design and capture processing roles have no access to these finalized documents. Instead, a different set of roles is defined for the business users who access the finalized documents. Content Engine enforces access to these finalized documents by using the three per-project business roles and the three per-document class business roles.

If a classification worker requires access to the documents after they are finalized, the worker must be a member of the business team that corresponds to the Content Platform Engine dynamic roles that control access to the document in the business application environment. Classification workers who are not added to the correct business team might see errors such as "Error retrieving document properties" after the document is finalized and the document processing application refreshes the list of documents that are being processed. For more information, see Document Processing security and roles.

Note: The sort order of the list of documents and batches is case-sensitive. This behavior is influenced by the database collation configuration for the target repository. To change how the batch or document lists are sorted, adjust the database collation configuration. Contact your database administrator, who must refer to the documentation for the specific database server.
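
To make the effect of case sensitivity concrete, the following Python sketch sorts the same made-up names in two ways. It does not reproduce any particular database collation. It only contrasts a case-sensitive ordering (here, Python's default code point order, where uppercase letters sort before lowercase letters) with a case-insensitive one.

```python
# Hypothetical batch and document names, used for illustration only.
names = ["invoice_b", "Invoice_a", "claim", "Batch_1"]

# Case-sensitive: compares code points, so "B" and "I" sort before "c" and "i".
case_sensitive = sorted(names)

# Case-insensitive: compares the case-folded form of each name.
case_insensitive = sorted(names, key=str.casefold)

print(case_sensitive)    # ['Batch_1', 'Invoice_a', 'claim', 'invoice_b']
print(case_insensitive)  # ['Batch_1', 'claim', 'Invoice_a', 'invoice_b']
```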

Reviewing and exporting finalized data

If you have permission to view the finalized documents, you can review them and the extracted data. Next to each document, open the Overflow menu and select View to open the document in the viewer, or View data to see the extracted fields in read-only mode.

From the same menu, you can select Export data to export the data as a JSON or CSV file.

Tip: If you use Microsoft Excel to view a CSV file that contains double-byte characters, Excel might display corrupted characters instead of the double-byte characters. As a workaround, open a new blank workbook and import the data from the CSV file with the Unicode (UTF-8) encoding option selected.
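
If you prefer to prepare the file before you open it, one possible alternative is to add a UTF-8 byte order mark (BOM), which Excel generally uses to detect the encoding. The following Python sketch assumes that the exported CSV file is already encoded as UTF-8. The `add_utf8_bom` helper and the file names are hypothetical.

```python
import codecs
from pathlib import Path


def add_utf8_bom(src: str, dst: str) -> None:
    """Copy src to dst, adding a UTF-8 byte order mark if it is missing.

    Assumes that src is already UTF-8 encoded; the bytes are not re-encoded.
    """
    data = Path(src).read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    Path(dst).write_bytes(data)


# Example usage with hypothetical file names:
# add_utf8_bom("export.csv", "export_for_excel.csv")
```

Working on raw bytes keeps the original line endings and field contents unchanged, so only the three BOM bytes at the start of the file differ.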

In batch applications, you can select multiple documents to export. If you export in CSV format, several files might be downloaded if you select the Download table data option in the Export data dialog: one file for scalar property data, and separate files for tabular data. If your browser prompts you each time to confirm the download of the documents, you can change your browser settings to download CSV files without prompting. For more information, see the user documentation for your browser, for example Firefox or Chrome.
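
Because a CSV export can produce several files, it can be convenient to load them together for review. The following Python sketch reads every CSV file in a download folder into memory. It assumes that each file is UTF-8 encoded and starts with a header row, and it makes no assumption about the column names or file names that the export actually produces. The `load_exported_csvs` helper and the folder name are hypothetical.

```python
import csv
from pathlib import Path


def load_exported_csvs(folder: str) -> dict:
    """Return {file name: list of row dictionaries} for every CSV file in folder."""
    tables = {}
    for path in sorted(Path(folder).glob("*.csv")):
        # "utf-8-sig" tolerates an optional byte order mark at the start of the file.
        with path.open(newline="", encoding="utf-8-sig") as handle:
            tables[path.name] = list(csv.DictReader(handle))
    return tables


# Example usage with a hypothetical download folder:
# for name, rows in load_exported_csvs("downloads").items():
#     print(name, len(rows), "rows")
```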

Figure 1. Processing flow.

The following graphic shows the steps of processing a document or batch of documents.

[Graphic: the steps during processing of documents or batches]