Introduction to the Pattern Action language

You use the Pattern Action language to manipulate data. You can decipher and identify patterns in data, and then perform actions against the data based on the pattern.

A Pattern-Action file contains a series of pattern-action sets. Each set contains a pattern condition followed by action statements. Actions are taken against input data that has been separated into tokens and classified, and they are based on the pattern of those tokens.

The pattern condition can contain the following elements:
Operands
Class representation of the incoming data. The class representation is either a user-defined class or a default class.
User variables
Symbolic names, defined by the user, that are associated with values that can be changed.
Dictionary fields
A collection of output field names as defined in the dictionary table located in the dictionary definition file (.DCT).

Patterns are executed in the order they appear in the Pattern-Action file. A pattern either matches the input data or it does not match. If it matches, the actions associated with the pattern are executed. If not, the actions are skipped. Processing then continues with the next pattern in the file.
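
The processing order described above can be sketched in Python. This is only an illustrative model, not the product's implementation: the function name run_pattern_actions, the list-of-classes representation, and the exact whole-input comparison are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for one pattern-action set: a list of class
# symbols plus a function that plays the role of the action statements.
PatternActionSet = Tuple[List[str], Callable[[List[str]], None]]


def run_pattern_actions(classes: List[str], sets: List[PatternActionSet]) -> List[int]:
    """Try each set in file order and return the indexes of the sets that fired."""
    fired = []
    for index, (pattern, action) in enumerate(sets):
        if pattern == classes:  # simplified: the pattern must equal the whole input
            action(classes)     # match: run the actions for this pattern
            fired.append(index)
        # no match: the actions are skipped and processing moves to the next set
    return fired


log: List[str] = []
sets: List[PatternActionSet] = [
    (["^", "?", "T"], lambda c: log.append("first set")),
    (["^", "D", "?", "T"], lambda c: log.append("second set")),
    (["^", "D", "?", "T"], lambda c: log.append("third set")),
]
fired = run_pattern_actions(["^", "D", "?", "T"], sets)
```

In this sketch the first set is skipped and the second and third both fire, because processing continues with the next pattern after every set. Real actions can change the data, which this model does not attempt to show.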

The Pattern-Action file holds the logic that becomes part of a rule set. Rule sets, applied against parsed and classified input data, standardize the data.

Pattern-Action files are ASCII files that you can create or update by using any standard text editor.

Describing the pattern format

You can describe the pattern format with operands and conditional statements.

A pattern can consist of one or more operands. The operands are separated by vertical lines. For example, the following pattern has four operands:


^ | D | ? | T

The actions refer to these operands as [1], [2], [3], and [4].

You can use spaces to separate operands. For example, the following two patterns are equivalent:


^|D|?|T 
^ | D | ? | T

Tip: The use of spaces to separate operands and pipes makes reading and debugging the patterns easier. Spaces are used in the prebuilt rules provided with IBM® InfoSphere® QualityStage®.
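
To illustrate why the two spellings are equivalent, the following Python sketch splits a pattern on vertical lines and discards the surrounding spaces. The helper names split_operands and operand_reference are hypothetical, and the sketch handles only simple class operands, not bracketed conditional statements.

```python
from typing import List


def split_operands(pattern: str) -> List[str]:
    """Split a pattern on vertical lines and drop the optional spaces."""
    return [operand.strip() for operand in pattern.split("|")]


def operand_reference(pattern: str, number: int) -> str:
    """Return the operand that an action would call [number] (numbering starts at 1)."""
    return split_operands(pattern)[number - 1]


compact = split_operands("^|D|?|T ")
spaced = split_operands("^ | D | ? | T")
second = operand_reference("^ | D | ? | T", 2)
```

Both calls to split_operands produce the same four operands, and operand_reference shows the 1-based numbering that the actions use.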

You can add comments by preceding them with a semicolon. An entire line or just the end of a line can be a comment. For example:


;
;Process standard addresses
;
^ | D | ? | T  ; 123 N MAPLE AVE
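
The comment rule can be modeled with a short Python sketch. The function name strip_comment is hypothetical, and treating a semicolon inside double quotation marks as literal text is an assumption of this example rather than something stated here.

```python
def strip_comment(line: str) -> str:
    """Return the line without its trailing comment, if it has one."""
    in_quotes = False
    for position, char in enumerate(line):
        if char == '"':
            in_quotes = not in_quotes  # assumption: quoted text is never a comment
        elif char == ";" and not in_quotes:
            return line[:position].rstrip()
    return line.rstrip()


pattern_only = strip_comment("^ | D | ? | T  ; 123 N MAPLE AVE")
whole_line = strip_comment(";Process standard addresses")
```

A line that is entirely a comment reduces to an empty string, and a pattern line keeps only the text before the semicolon.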

You can refer to fields by enclosing the column name in braces. For example, {HouseNumber}, {StreetPrefixDirectional}, {StreetName}, and {StreetSuffixType} refer to dictionary column names that are defined in the dictionary definition file.


^ | D | ? | T | $ |
[ {HouseNumber} = "" & {StreetPrefixDirectional} = "" &
 {StreetName} = "" & {StreetSuffixType} = "" ] ; Common Pattern Found: CALL Address_Type SUBROUTINE then EXIT

Pattern matching for an individual pattern line stops after the first match is found. For example, in the address 123 MAPLE AVE & 456 HILL PLACE, the following pattern matches to 123 MAPLE AVE but not to 456 HILL PLACE:


^ | ? | T
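
The first-match behavior can be sketched in Python as follows. Several things here are assumptions made for illustration: the function name find_first_match, the left-to-right scan, the one-token-per-operand comparison, and the classification of the sample address (AVE and PLACE as T, the ampersand as its own class).

```python
from typing import List, Optional


def find_first_match(classes: List[str], pattern: List[str]) -> Optional[int]:
    """Return the start index of the first match, or None if nothing matches."""
    width = len(pattern)
    for start in range(len(classes) - width + 1):
        if classes[start:start + width] == pattern:
            return start  # stop here: later matches are not considered
    return None


tokens = ["123", "MAPLE", "AVE", "&", "456", "HILL", "PLACE"]
classes = ["^", "?", "T", "&", "^", "?", "T"]  # assumed classification
start = find_first_match(classes, ["^", "?", "T"])
matched = tokens[start:start + 3] if start is not None else []
```

The same class sequence occurs again at 456 HILL PLACE, but the search returns as soon as it finds 123 MAPLE AVE.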

A pattern might not contain any operands. For example, the conditional statement [Required_Step ="TRUE"] is a pattern that contains a user-variable: Required_Step.

The simplest patterns consist of only classification types:


^ | D | ? | T

These patterns need little further explanation. Hyphens and slashes can also occur in patterns. For example:


123-45 matches to ^ | - | ^

123 1/2 matches to ^ | ^ | / | ^
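
A minimal Python sketch shows how such input might be tokenized into these class sequences, with the fraction written as 1/2. The function name classify is hypothetical, and the tokenizing rules (runs of digits, runs of letters, single hyphens and slashes) are simplifying assumptions; in particular, every alphabetic token is labeled ? here instead of being looked up in a classification table.

```python
import re
from typing import List


def classify(text: str) -> str:
    """Return a pattern string for the tokens found in text."""
    tokens: List[str] = re.findall(r"\d+|[A-Za-z]+|[-/]", text)
    classes = []
    for token in tokens:
        if token.isdigit():
            classes.append("^")    # numeric token
        elif token in ("-", "/"):
            classes.append(token)  # hyphen and slash stand for themselves
        else:
            classes.append("?")    # simplification: any alphabetic token
    return " | ".join(classes)


hyphenated = classify("123-45")
fraction = classify("123 1/2")
```

Under these assumptions, classify("123-45") gives ^ | - | ^ and classify("123 1/2") gives ^ | ^ | / | ^, which agrees with the two examples above.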