Data quality dimensions

A data quality dimension describes a measurable characteristic of data and helps you define data quality requirements. Use data quality dimensions to determine the expected results of a data quality assessment, whether that is an initial assessment or ongoing monitoring.

Unless otherwise noted, this information applies to all editions of IBM Knowledge Catalog (Base, Premium, and Standard).

The state that you want your data to be in can usually be defined as fit for use, free of defects, conforming to specification, or meeting expectations and requirements. When you measure data quality, you compare the actual state of your data to this desired state. The standards, expectations, and requirements that are important to your business processes are expressed as characteristics or dimensions of the data.

The Data Management Association (DAMA) International published a paper that describes six core dimensions of data quality: accuracy, completeness, consistency, timeliness, uniqueness, and validity.

In addition to these core dimensions, IBM Knowledge Catalog provides the dimension Homogeneity.

The following table describes the data quality dimensions and lists the data quality checks in metadata enrichment that can identify issues associated with a specific dimension. In addition, these dimensions can be evaluated by running individual data quality rules.

Data quality dimensions
| Dimension | Description | Types of data quality checks |
|---|---|---|
| Accuracy | Data values are as close as possible to real values. | None. |
| Completeness | All required data values are present. | Completeness check |
| Consistency | Data values within a column comply with a rule. | Capitalization style check, Missing values representation check, Referential integrity check, Suspect values check |
| Homogeneity | Data within a data asset is uniform and consistent over time. All data points share similar characteristics, formats, or structures. | Historical stability |
| Timeliness | Data represents the reality from a required point in time. | None. |
| Uniqueness | Distinct values appear only once. | Uniqueness check |
| Validity | Data conforms to the format, type, or range of its definition. | Data class check, Data type check, Format check, Length check, Possible values check, Range check, Regex check |
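
To make the idea of a dimension score more concrete, the following minimal Python sketch computes a percentage for three of the dimensions in the table (completeness, uniqueness, and validity by range) on a single column of values. It is an illustration only: the function names, the definition of a missing value, and the exact scoring formulas are assumptions made for this example and do not represent how IBM Knowledge Catalog calculates its scores.

```python
from collections import Counter
from typing import Iterable, List

# Assumption for this sketch: None and the empty string count as missing.
MISSING = (None, "")


def _present(values: Iterable[object]) -> List[object]:
    """Return only the values that are not considered missing."""
    return [v for v in values if v not in MISSING]


def completeness_score(values: Iterable[object]) -> float:
    """Percentage of values that are present."""
    values = list(values)
    if not values:
        return 100.0
    return 100.0 * len(_present(values)) / len(values)


def uniqueness_score(values: Iterable[object]) -> float:
    """Percentage of present values that occur exactly once."""
    present = _present(values)
    if not present:
        return 100.0
    counts = Counter(present)
    unique = sum(1 for v in present if counts[v] == 1)
    return 100.0 * unique / len(present)


def range_validity_score(values: Iterable[object], low: float, high: float) -> float:
    """Percentage of present values that fall within [low, high]."""
    present = _present(values)
    if not present:
        return 100.0
    valid = sum(1 for v in present if low <= v <= high)
    return 100.0 * valid / len(present)


# Hypothetical "age" column: one missing value, one duplicate, one outlier.
ages = [34, None, 27, 27, 150, 41]
print(round(completeness_score(ages), 1))   # 83.3
print(uniqueness_score(ages))               # 60.0
print(range_validity_score(ages, 0, 120))   # 80.0
```

Each function returns a percentage, so the results can be compared against a threshold that reflects the desired state of the data.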

IBM Match 360 (if deployed) contributes the Entity confidence dimension. This dimension indicates how confident the system is that the entity matches within your data are correct. The dimension score represents the percentage of entities of a particular entity type that have no member records with potential match issues.
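
The score definition above can be sketched as a simple ratio. The following Python example is a hedged illustration, not the IBM Match 360 implementation: the `Record` and `Entity` classes, the `potential_match_issue` flag, and the function name are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    record_id: str
    potential_match_issue: bool  # hypothetical flag for this sketch


@dataclass
class Entity:
    entity_id: str
    entity_type: str
    members: List[Record]


def entity_confidence_score(entities: List[Entity], entity_type: str) -> Optional[float]:
    """Percentage of entities of the given type with no flagged member records.

    Returns None when there are no entities of that type.
    """
    of_type = [e for e in entities if e.entity_type == entity_type]
    if not of_type:
        return None
    clean = sum(
        1 for e in of_type
        if not any(r.potential_match_issue for r in e.members)
    )
    return 100.0 * clean / len(of_type)


# Hypothetical data: four person entities, one of which has a flagged record.
entities = [
    Entity("e1", "person", [Record("r1", False), Record("r2", False)]),
    Entity("e2", "person", [Record("r3", False)]),
    Entity("e3", "person", [Record("r4", True), Record("r5", False)]),
    Entity("e4", "person", [Record("r6", False)]),
    Entity("e5", "organization", [Record("r7", True)]),
]
print(entity_confidence_score(entities, "person"))  # 75.0
```

In this sketch a single flagged member record is enough to exclude the whole entity from the count, which follows the wording of the score definition.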

You can create your own data quality dimensions by using the "Create a data quality dimension" operation of the IBM Knowledge Catalog API.
