Datacap functional overview

IBM® Datacap acquires documents, extracts useful information from them, and feeds them into other business processes downstream. Its strength is its ability to complete these tasks with a high degree of automation, flexibility, and accuracy.

At a high level, Datacap functions can be organized into three areas:
  1. Acquisition of documents from several sources
  2. Processing of documents to extract useful information
  3. Delivery of content and data to back-end systems

These functions are integrated into a task flow that controls the processing of the documents from acquisition to delivery. Background tasks are used whenever the processing can be automated. Foreground tasks are used when human interaction is required, such as resolving errors and ambiguities in the extracted data.
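The following minimal sketch models the idea of a task flow that mixes background and foreground tasks. It is not Datacap's API or configuration format. The `Task` and `Batch` classes, the `advance` and `complete_operator_task` functions, and the task names are hypothetical, chosen only for illustration. Background tasks run unattended in order, and the batch pauses at the first foreground task until an operator completes it.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    foreground: bool = False  # True if a human operator must perform the task


@dataclass
class Batch:
    batch_id: str
    position: int = 0        # index of the next task to run in the flow
    status: str = "ready"    # "ready", "pending_operator", "operator_done", or "done"
    log: list[str] = field(default_factory=list)  # tasks completed so far


def advance(batch: Batch, flow: list[Task]) -> Batch:
    """Run tasks in order until the flow ends or a foreground task
    needs an operator (hypothetical model, not Datacap's engine)."""
    while batch.position < len(flow):
        task = flow[batch.position]
        if task.foreground and batch.status != "operator_done":
            # Pause here: the batch waits in an operator queue.
            batch.status = "pending_operator"
            return batch
        batch.log.append(task.name)
        batch.position += 1
        batch.status = "ready"
    batch.status = "done"
    return batch


def complete_operator_task(batch: Batch, flow: list[Task]) -> Batch:
    """Record that the operator finished the pending foreground task,
    then let the remaining background tasks continue."""
    if batch.status != "pending_operator":
        raise ValueError("batch is not waiting for an operator")
    batch.status = "operator_done"
    return advance(batch, flow)


flow = [Task("Import"), Task("Recognize"), Task("Verify", foreground=True), Task("Export")]
batch = advance(Batch("B001"), flow)
print(batch.status, batch.log)   # pending_operator ['Import', 'Recognize']
batch = complete_operator_task(batch, flow)
print(batch.status, batch.log)   # done ['Import', 'Recognize', 'Verify', 'Export']
```

Keeping the position and status on the batch means a paused batch can be picked up later by whichever operator opens it, which mirrors the split between automated and human steps described above.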

Datacap handles the following main functions:
  • Acquires paper documents from scanners, multifunction printers, or mobile devices, such as smartphones and tablets
  • Imports electronic documents or existing images from a file system, fax, or email server
  • Cleans up images and prepares documents to improve data extraction by using image-processing capabilities, such as deskewing and removing lines, smears, and borders
  • Classifies and separates documents by type to determine which data needs to be extracted
  • Extracts data by using recognition technologies:
    • Optical character recognition (OCR) for machine-printed characters
    • Intelligent character recognition (ICR) for handwriting, typically detached block letters, but also cursive writing on checks or in other well-identified contexts
    • Optical mark recognition (OMR) for identifying checked boxes and other marks, such as bubbles in surveys or a signature on a form
    • Bar code reading, including the following types:
      • One-dimensional bar codes, such as those that are used for price reference in stores
      • Two-dimensional bar codes that are used to encode larger sets of data, such as name, address, or shipping information
  • Checks the accuracy of extracted information against business rules and corrects errors

    Datacap can also automatically look up information in a database by using partially recognized data. It can trigger verification and validation by a human operator when confidence in the accuracy of the data falls below a predetermined level.

  • Learns automatically from the experience of human operators and the processing of documents to improve accuracy over time
  • Exports image documents and extracted data to FileNet® Content Manager or other ECM repositories, databases, or business applications
  • Organizes the tasks of the capture process, from scanning to export and including exception handling, into a workflow
  • Reduces manual data entry of index values by using recognition to identify the index values on each document automatically and to automate document identification
  • Controls access to the system and tasks by using functional security
  • Monitors the progress of capture operations so that problems can be identified and fixed in real time
  • Reports on capture operations and provides statistics about system performance
  • Supports flexible deployment scenarios
  • Provides libraries of hundreds of script-based and code-based (.NET) actions
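To make the accuracy checks concrete, the following minimal sketch validates recognized fields against business rules, uses a lookup to complete a partially recognized value, and flags fields for human verification when confidence falls below a threshold. It is a generic illustration, not Datacap's rule engine or action library. The field names, the rules, the `KNOWN_VENDORS` table (standing in for a database), the `?` convention for unreadable characters, and the 0.85 threshold are all hypothetical.

```python
import re
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.85  # hypothetical "predetermined level"


@dataclass
class Field:
    name: str
    value: str         # recognized text; '?' marks a character that could not be read
    confidence: float  # recognition confidence between 0.0 and 1.0


# Hypothetical business rules: field name -> check on the field value.
RULES = {
    "invoice_number": lambda v: re.fullmatch(r"INV-\d{6}", v) is not None,
    "amount": lambda v: re.fullmatch(r"\d+\.\d{2}", v) is not None,
}

# Hypothetical reference data standing in for a database table of vendors.
KNOWN_VENDORS = {"ACME CORP", "GLOBEX INC", "INITECH"}


def lookup(partial: str, known: set[str]) -> list[str]:
    """Return reference values consistent with a partially recognized string."""
    pattern = "".join("." if ch == "?" else re.escape(ch) for ch in partial)
    return sorted(v for v in known if re.fullmatch(pattern, v))


def check(fields: list[Field]) -> list[str]:
    """Correct what can be corrected automatically and return the names
    of fields that a human operator should verify."""
    needs_review = []
    for f in fields:
        if f.name == "vendor" and "?" in f.value:
            matches = lookup(f.value, KNOWN_VENDORS)
            if len(matches) == 1:
                # A unique database match resolves the unreadable characters.
                f.value, f.confidence = matches[0], 1.0
        rule = RULES.get(f.name)
        low_confidence = f.confidence < CONFIDENCE_THRESHOLD
        breaks_rule = rule is not None and not rule(f.value)
        if low_confidence or breaks_rule or "?" in f.value:
            needs_review.append(f.name)
    return needs_review


fields = [
    Field("invoice_number", "INV-123456", 0.97),
    Field("amount", "12O.50", 0.92),      # letter O misread for zero
    Field("vendor", "ACME C?RP", 0.60),   # one unreadable character
    Field("date", "2024-01-31", 0.70),
]
print(check(fields))      # ['amount', 'date']
print(fields[2].value)    # ACME CORP
```

Only the fields that fail a rule or stay below the threshold reach an operator, which is how automated checks can cut down the amount of manual verification.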