Identifying pages: IBM Content Classification

The Datacap CC actions use IBM® Content Classification technology to identify page types in two interrelated ways: category-based classification and rule-based classification.

For information about IBM Content Classification, see Classification overview.

Category-based classification

Category-based classification is a method of identifying the type of page text (or other text) by using an IBM Content Classification knowledge base. The text in question is compared against the text categories in the knowledge base to find the most closely matching category.

The following terms are helpful for understanding category-based classification:
Category confidence score: The degree of similarity between a piece of text and an IBM Content Classification category that describes text. The score ranges from 0.0 to 1.0, where 1.0 indicates a perfect match. For example, the confidence score for a piece of text and a matching category might be 0.7.
Minimum category confidence score: The lowest category confidence score at which a category is considered a match. This minimum score is configurable.
Text confidence level: The category confidence score between a page's text and its closest matching category.
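
The relationship between these three terms can be illustrated with a minimal Python sketch. It is not the IBM Content Classification API: the function name best_category, the hard-coded scores, and the 0.5 default threshold are all hypothetical, and a real knowledge base would supply the per-category scores.

```python
from typing import Dict, Optional, Tuple


def best_category(
    scores: Dict[str, float], min_score: float = 0.5
) -> Tuple[Optional[str], float]:
    """Pick the closest matching category for a page.

    scores maps each category name to its category confidence score
    (0.0 to 1.0). min_score plays the role of the minimum category
    confidence score. Both are illustrative assumptions.
    Returns (matched category or None, text confidence level).
    """
    if not scores:
        return None, 0.0
    # The closest matching category is the one with the highest score.
    name, score = max(scores.items(), key=lambda item: item[1])
    # The text confidence level is that top score, whether or not it
    # reaches the configured minimum.
    if score < min_score:
        return None, score
    return name, score


# Hypothetical scores for one page of text.
page_scores = {"Mortgage": 0.7, "Invoice": 0.2}
category, confidence = best_category(page_scores, min_score=0.5)
```

In this sketch, a page whose best score falls below the minimum is left unmatched, but its text confidence level is still reported so that a later step can decide how to handle it.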

Rule-based classification

Rule-based classification is a method of identifying a page type by using rules that are defined in an IBM Content Classification decision plan. For example, a decision plan that is called Mortgage might have the following rule: “If the document contains the word ‘Loan’, create a field that is called ‘MyType’ with the value ‘Mortgage’”.
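
The example rule can be sketched in Python to show its logic. This is only an illustration of the idea, not how a decision plan is defined or run in IBM Content Classification. The function name apply_mortgage_rule is hypothetical, and the case-sensitive, whole-word match is an assumption that the rule text does not specify.

```python
import re
from typing import Dict


def apply_mortgage_rule(document_text: str) -> Dict[str, str]:
    """Apply the example rule to one document.

    Returns the fields that the rule creates. The result is empty
    when the rule does not fire.
    """
    fields: Dict[str, str] = {}
    # Assumption: "contains the word 'Loan'" means a case-sensitive,
    # whole-word match.
    if re.search(r"\bLoan\b", document_text):
        fields["MyType"] = "Mortgage"
    return fields


fields = apply_mortgage_rule("Loan agreement between the lender and the borrower")
```

The returned dictionary stands in for the fields that a decision plan produces, such as MyType.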

In Datacap, you use actions to specify the decision plan to run and the fields from the decision plan to save in the DCO page object. For example, you might call the following actions:

SetDecisionPlanCC("Mortgage")           // Specify the decision plan to use
SetDecisionPlanFieldsCC("MyType")       // Specify the fields to be set in the DCO page object
RunDecisionPlanCC()                     // Run the decision plan
rrSet("MyType", "Page type")            // Copy the MyType field value as the page type