Data rule definitions and rule set definitions

Use data rule and rule set definitions to evaluate or validate specific conditions that are associated with your data sources. Data rule definitions contain logic to analyze your data, and rule set definitions are collections of data rule definitions.

Data rule definitions
Data rule definitions contain the rule logic that you use to analyze data. You can set up a data rule definition so that a variable, such as a word or term, is evaluated against a specific condition or type of check. When you create a data rule definition, you do not bind actual data to it. Instead, you specify variables that are bound to real data later, when you transform the definition into a data rule or quality rule. By associating physical data sources with a data rule definition, you create a data rule or quality rule. You can then run that rule to return analysis statistics and detailed results.
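
The split between a definition, which has variables but no data, and a rule, which has variables bound to columns, can be sketched in Python. This is a hypothetical model and not the InfoSphere Information Analyzer API or its rule logic syntax. The names `DataRuleDefinition`, `DataRule`, `bind`, `run`, and the column `CUSTOMER_AGE` are illustrative assumptions, and the logic is a Python callable rather than product rule logic.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Mapping, Tuple


@dataclass
class DataRuleDefinition:
    """Hypothetical model: rule logic over named variables, with no data bound."""
    name: str
    variables: Tuple[str, ...]
    # Receives one value per variable (in order); returns True if the check passes.
    logic: Callable[..., bool]

    def bind(self, bindings: Mapping[str, str]) -> "DataRule":
        """Map every variable to a real column name, producing a runnable rule."""
        missing = set(self.variables) - set(bindings)
        if missing:
            raise ValueError(f"unbound variables: {sorted(missing)}")
        return DataRule(self, dict(bindings))


@dataclass
class DataRule:
    definition: DataRuleDefinition
    bindings: Dict[str, str]  # variable name -> column name

    def run(self, records: List[Mapping[str, Any]]) -> Dict[str, int]:
        """Count the records that meet and do not meet the rule."""
        met = 0
        for record in records:
            values = [record.get(self.bindings[v]) for v in self.definition.variables]
            if self.definition.logic(*values):
                met += 1
        return {"met": met, "not_met": len(records) - met}


# A range check written once, with no data attached.
age_in_range = DataRuleDefinition(
    name="age_in_range",
    variables=("age",),
    logic=lambda age: age is not None and 0 <= age <= 120,
)

# Binding the variable to a (hypothetical) column turns the definition into a rule.
rule = age_in_range.bind({"age": "CUSTOMER_AGE"})
rows = [{"CUSTOMER_AGE": 34}, {"CUSTOMER_AGE": -2}, {"CUSTOMER_AGE": None}]
print(rule.run(rows))  # {'met': 1, 'not_met': 2}
```

Because the definition carries no data, the same `age_in_range` object could be bound again to a different column in another data source to produce a second, independent rule.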

Rule set definitions
Rule set definitions are collections of data rule definitions and are used to create rule sets. Data rules capture data quality at the column level, because they run against, or are associated with, one or more columns. They show which records meet or do not meet an individual rule. However, they do not capture how a record in a data source conforms to multiple data rules, for example, how many rules a specific record breaks. They also do not identify the overall quality of, or confidence in, a data source. Rule sets provide this broader, more holistic view of a data source and its records by running and evaluating multiple rules together against individual records.
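
The difference between per-rule and per-record results can be sketched in Python. This is a hypothetical model and not how InfoSphere Information Analyzer computes rule set statistics. The function `evaluate_rule_set`, the record format, the column names, and the use of "share of records that meet all rules" as a confidence measure are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Mapping

Record = Mapping[str, Any]
# Hypothetical: each rule is already bound to columns and returns True when a record meets it.
Rule = Callable[[Record], bool]


def evaluate_rule_set(rules: Mapping[str, Rule], records: List[Record]) -> Dict[str, Any]:
    """Run every rule against every record and summarize per record and per rule."""
    broken_per_record: List[int] = []
    not_met_per_rule = {name: 0 for name in rules}
    for record in records:
        broken = 0
        for name, rule in rules.items():
            if not rule(record):
                broken += 1
                not_met_per_rule[name] += 1
        broken_per_record.append(broken)
    clean = sum(1 for b in broken_per_record if b == 0)
    return {
        "broken_per_record": broken_per_record,  # how many rules each record breaks
        "not_met_per_rule": not_met_per_rule,    # what individual data rules report
        # Assumed overall measure: share of records that meet all rules.
        "records_meeting_all_rules": clean / len(records) if records else 1.0,
    }


rules = {
    "email_present": lambda r: bool(r.get("EMAIL")),
    "age_in_range": lambda r: r.get("AGE") is not None and 0 <= r["AGE"] <= 120,
}
records = [
    {"EMAIL": "a@example.com", "AGE": 30},
    {"EMAIL": "", "AGE": 30},
    {"EMAIL": None, "AGE": 200},
]
summary = evaluate_rule_set(rules, records)
print(summary["broken_per_record"])  # [0, 1, 2]
```

The `not_met_per_rule` counts are the column-level view that individual data rules already provide. The `broken_per_record` list is the extra, record-level view that only evaluating the rules together makes possible.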

Prerequisites
As a prerequisite, you must have InfoSphere® Information Analyzer installed.
To access data rule and rule set definitions, you must have a role to access Information Governance Catalog (Information Governance Catalog User role or higher), the Information Analyzer User role to access the Quality tab, and one of these roles: Rules User, Rules Author, Rules Administrator, Rules Manager.
The roles that are required for other tasks are listed in the topics for those tasks.

Accessing data rule and rule set definitions

You can find data rule and rule set definitions on the Data rules tab of each project, or you can search for them in the Catalog, in the Data Quality asset group.

Required roles
To access data rule definitions and rule set definitions, you must have the following roles:
  • A role to access Information Governance Catalog (Information Governance Catalog User role or higher).
  • One of the following Information Analyzer roles to access the Quality tab: Data Administrator, Project Administrator, or User.
  • A Business Analyst or Data Operator role at the project level to access the project.
  • One of these roles: Rules User, Rules Author, Rules Administrator, Rules Manager.

Predefined data rule definitions

By default, a set of predefined data rule definitions is available in the catalog. In each project, you can find them on the Data rules tab, in the Published Rules folder. You can use them instead of creating your own data rule definitions. They cover common information domains, such as keys, national identifiers, dates, country codes, and email addresses, and common conditions, such as completeness checks, valid values, range checks, aggregated totals, and equations.

You can also use them as examples and models for developing your own rules.

Published data rule and rule set definitions

When a data rule or rule set definition is published, it becomes available in all projects to all users who can access projects. Definitions that have the candidate status are published with the accepted status. Published definitions are available in the Published Rules folder.

When you work on a published definition in a project, it can be in the following states:
  • In sync - Indicates that the rule logic (for data rule definitions) or the quality controls (for rule set definitions) are the same as in the published version. Properties such as the description or associated terms might differ from the published version.
  • Out of sync - Indicates that the rule logic or quality controls in the project version differ from those in the published version. A rule set definition also becomes out of sync when at least one of the data rule definitions that it contains becomes out of sync. When you publish the updated definition, the published version is updated as well.
  • Invalid - Indicates that the rule logic for data rule definitions is not valid. Data rule definitions with the draft status can contain invalid rule logic. Rule logic can also become invalid later, for example, when an asset that is used in the logic is deleted. A rule set definition becomes invalid when at least one of the data rule definitions that it contains becomes invalid.
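
The state rules listed above can be summarized in a small Python sketch. It is a hypothetical model, not the product's implementation. The class `RuleDefinitionVersion`, the functions `rule_state` and `rule_set_state`, and the placeholder logic strings are illustrative assumptions. The sketch also assumes that Invalid takes precedence over Out of sync, which the states above do not state explicitly.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class SyncState(Enum):
    IN_SYNC = "In sync"
    OUT_OF_SYNC = "Out of sync"
    INVALID = "Invalid"


@dataclass
class RuleDefinitionVersion:
    logic: str             # placeholder rule logic text, compared as a plain string
    valid: bool = True     # False, for example, if an asset used in the logic was deleted
    description: str = ""  # not considered when checking sync state


def rule_state(project: RuleDefinitionVersion, published: RuleDefinitionVersion) -> SyncState:
    """State of a data rule definition in a project relative to its published version."""
    if not project.valid:  # assumed: Invalid is checked before sync
        return SyncState.INVALID
    if project.logic != published.logic:
        return SyncState.OUT_OF_SYNC
    # Only the logic is compared, so a changed description still counts as in sync.
    return SyncState.IN_SYNC


def rule_set_state(project_controls: str, published_controls: str,
                   member_states: List[SyncState]) -> SyncState:
    """State of a rule set definition, derived from its quality controls and members."""
    if SyncState.INVALID in member_states:
        return SyncState.INVALID
    if project_controls != published_controls or SyncState.OUT_OF_SYNC in member_states:
        return SyncState.OUT_OF_SYNC
    return SyncState.IN_SYNC


published = RuleDefinitionVersion(logic="value >= 0", description="Non-negative")
renamed = RuleDefinitionVersion(logic="value >= 0", description="Must not be negative")
edited = RuleDefinitionVersion(logic="value > 0")
broken = RuleDefinitionVersion(logic="value >= 0", valid=False)

print(rule_state(renamed, published).value)  # In sync
print(rule_state(edited, published).value)   # Out of sync
print(rule_state(broken, published).value)   # Invalid
print(rule_set_state("", "", [SyncState.IN_SYNC, SyncState.OUT_OF_SYNC]).value)  # Out of sync
```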