Rules in InfoSphere Information Analyzer thin client

In InfoSphere® Information Analyzer thin client, you can use two different rule types: data rules and quality rules. All rules evaluate or validate specific conditions associated with your data sources. Some rules can also affect the data quality score for each column and data set.

Both data rules and quality rules validate user-defined conditions in your data sets. The two rule types differ in how they are managed and in the output that each produces when it runs. Both are generated from rule definitions that are created in the Information Analyzer workbench.

Rule definitions: A rule definition is reusable logic that you can apply to data in your various projects. Rule definition logic describes a particular condition in a record using basic syntax where a variable, such as a word or term, is evaluated based on a given condition. Rule logic evaluates to a true or false value and sets up pass or fail checks to evaluate the quality of your data.
Data rules: You generate a data rule by binding rule definition logic to your physical data, and then naming and saving the data rule. A data rule is an object that you can manage and run independently to produce a specific output.
Rule sets: A rule set is a collection of data rules. A rule set captures how a record in a data source conforms to multiple data rules, for example, how many rules a specific record breaks. Because a rule set runs and evaluates multiple rules together against individual records, it also gives a broader view of a data source and its records, which helps you gauge the overall quality of, or confidence in, that data source.
Quality rules: A quality rule is a simple check that can be quickly applied to multiple columns because there is no need to assign a name or define output and scheduling. When you create a quality rule, you add a new quality dimension to your data analysis. You generate a quality rule by binding rule definition logic to your physical data. Quality rules run only as part of the quality analysis for an entire data set. The output of a quality rule is displayed as a rule violation in the data quality score analysis results. Because records that do not meet the conditions of a quality rule are identified as rule violations, the rule definitions that you use for quality rules must test for valid conditions.
Automation rules: An automation rule automatically applies rule definitions or quality dimensions to data sets or columns based on their term assignment. Automation rules are run during the discover process or when you run a column analysis.
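
The relationship between a rule definition, its binding to physical data, and a rule set can be sketched in a few lines of Python. This is only a conceptual illustration, not the product's API: the class names, the helper function, the column names, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass(frozen=True)
class RuleDefinition:
    """Reusable logic over a named variable; not tied to any column (hypothetical model)."""
    name: str
    variable: str
    condition: Callable[[Any], bool]


@dataclass(frozen=True)
class BoundRule:
    """A rule definition whose variable is bound to a physical column."""
    definition: RuleDefinition
    column: str

    def passes(self, record: Record) -> bool:
        # The rule logic evaluates to true (pass) or false (fail) for each record.
        return self.definition.condition(record.get(self.column))


def violations_per_record(rules: List[BoundRule], records: List[Record]) -> List[int]:
    """Rule-set view: how many rules does each record break?"""
    return [sum(not rule.passes(record) for rule in rules) for record in records]


# Hypothetical rule definitions, bindings, and data.
age_in_range = RuleDefinition("AgeInRange", "age", lambda v: v is not None and 0 <= v <= 120)
email_present = RuleDefinition("EmailPresent", "email", lambda v: bool(v))

rules = [BoundRule(age_in_range, "CUST_AGE"), BoundRule(email_present, "CUST_EMAIL")]
records = [
    {"CUST_AGE": 34, "CUST_EMAIL": "a@example.com"},
    {"CUST_AGE": 150, "CUST_EMAIL": ""},
    {"CUST_AGE": None, "CUST_EMAIL": "b@example.com"},
]

broken = violations_per_record(rules, records)
print(broken)
```

The same rule definition can be bound to different columns in different data sets, which is what makes it reusable. Evaluating several bound rules against each record gives the per-record view that a rule set provides.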

Quality rule example

A data analyst for the fictional Sample Outdoor Company wants to run a quick analysis on a data set to investigate whether the customers listed have a valid rewards account on file. He uses the InfoSphere Information Analyzer workbench to create a rule definition:

AccountHolder = 'true'

The analyst wants to review the number of records in the data set that don't meet the conditions of this rule definition, and is only interested in viewing the basic results of the execution. Since he doesn't need granular control over the output or scheduling of this rule, he chooses to create a quality rule. In the Information Analyzer thin client, from the Column Properties grid, he selects the ACCOUNT_HOLDER column and binds the rule definition variable AccountHolder to it. He runs a new analysis on the data set, and sees the results of the quality rule execution in the data set's quality score analysis results.
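
As a rough illustration of what this check amounts to, the following Python sketch applies the same condition to a handful of made-up records and counts the violations. It assumes the column stores the string value 'true', as the rule definition implies. The sample data, the CUSTOMER_ID column, and the simple violation rate are hypothetical, and the sketch does not reproduce how the thin client computes its data quality score.

```python
from typing import Any, Dict, List


def account_holder_rule(account_holder: Any) -> bool:
    """Mirrors the rule definition AccountHolder = 'true'."""
    return account_holder == "true"


# Binding: rule definition variable -> physical column.
binding = {"AccountHolder": "ACCOUNT_HOLDER"}

# Hypothetical sample records.
customers: List[Dict[str, Any]] = [
    {"CUSTOMER_ID": 1, "ACCOUNT_HOLDER": "true"},
    {"CUSTOMER_ID": 2, "ACCOUNT_HOLDER": "false"},
    {"CUSTOMER_ID": 3, "ACCOUNT_HOLDER": "true"},
    {"CUSTOMER_ID": 4, "ACCOUNT_HOLDER": None},
]

column = binding["AccountHolder"]

# Records that do not meet the condition are treated as rule violations.
violations = [row for row in customers if not account_holder_rule(row[column])]
violation_count = len(violations)
violation_rate = violation_count / len(customers)

print(f"{violation_count} of {len(customers)} records violate the rule")
```

Note that the record with a missing value counts as a violation here, because the condition tests for the valid state rather than for a specific invalid one.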

Learn more about validating data with data rules.