Data classification

Integrating IBM® StoredIQ® with a governance catalog provides the means to classify enterprise content consistently.

You gain a uniform view of the data classifications in the enterprise and of how they are defined. You can also apply data classes consistently to structured and unstructured content to identify and find key data, for example, to detect policy violations in the placement of sensitive data.

In terms of information governance, a data class categorizes data according to type and usage. Data classification is the process of assigning a data class to an information asset. Applying the classification rules of a data class lets you detect data that contains elements of a given kind, for example, phone or credit card numbers.

Use of data class detection logic as cartridge actions in IBM StoredIQ

In the governance catalog, you create or update data classes of type Regex that can be applied to unstructured content. When synchronized, these data classes are reflected in IBM StoredIQ as filters.

By applying the regexes that are derived from the data classes, you can analyze and classify data and make the classified data searchable and detectable in IBM StoredIQ. Classification results for IBM StoredIQ infosets and volume contribution information are available in the governance catalog.
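
To illustrate the general idea of regex-based classification, the following minimal Python sketch matches text against one pattern per data class. It is not IBM StoredIQ code: the data class names, the simplified patterns, and the classify_text function are assumptions made for this example only.

```python
import re

# Hypothetical data classes with simplified, illustrative patterns.
# Real data class definitions in a governance catalog will differ.
DATA_CLASS_PATTERNS = {
    "US Phone Number": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "Credit Card Number": re.compile(r"\b(?:\d{4}[-\s]?){3}\d{4}\b"),
}


def classify_text(text):
    """Return the sorted names of all data classes whose regex matches the text."""
    return sorted(
        name
        for name, pattern in DATA_CLASS_PATTERNS.items()
        if pattern.search(text)
    )


if __name__ == "__main__":
    print(classify_text("Call 555-123-4567 for details"))
    print(classify_text("Card 4111 1111 1111 1111 on file"))
```

A production pattern would typically be stricter, for example, by validating credit card numbers with a checksum to reduce false positives.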

Classification of infoset assets

Data classifications of infoset assets can be derived from filter, union, expand, and collapse operations. Applying a filter is the basis for all classification. During synchronization, the number and size of the objects in an infoset that match a given data class are calculated (the so-called data class contributions). For example, if you apply a filter that is linked to a data class, all objects of the new infoset are classified by that data class.
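
The calculation of data class contributions can be pictured as a simple aggregation. The following Python sketch is an assumption about the general shape of such a calculation, not the product's implementation: the InfosetObject type, its fields, and the data_class_contributions function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfosetObject:
    """Hypothetical stand-in for one object in an infoset."""
    name: str
    size_bytes: int
    data_classes: frozenset  # names of the data classes the object matches


def data_class_contributions(objects):
    """Map each data class to a tuple of (object count, total size in bytes)."""
    totals = {}
    for obj in objects:
        for data_class in obj.data_classes:
            count, size = totals.get(data_class, (0, 0))
            totals[data_class] = (count + 1, size + obj.size_bytes)
    return totals


if __name__ == "__main__":
    infoset = [
        InfosetObject("a.txt", 100, frozenset({"Phone"})),
        InfosetObject("b.txt", 250, frozenset({"Phone", "Credit Card"})),
        InfosetObject("c.txt", 40, frozenset()),
    ]
    print(data_class_contributions(infoset))
```

In this sketch, an object that matches several data classes is counted once for each of them, and an object that matches none contributes nothing.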

The calculated classification is inherited upward and downward in the infoset's ancestry. The calculation results in either the match type Exact or the match type GreaterThan:
  • Exact: The respective data class contributes exactly the given number and size of objects to the infoset.
  • GreaterThan: The respective data class contributes at least the given number and size of objects to the infoset. In cases where the data class contributes to the infoset but the number and the size of objects cannot be determined (which might happen for infosets created through expand or collapse operations), both values are set to zero.
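
The two match types can be modeled as a small data structure. The following Python sketch shows one way to represent and interpret a contribution, including the case where both values are zero. The MatchType, Contribution, and describe names are hypothetical and do not reflect an IBM StoredIQ or governance catalog API.

```python
from dataclasses import dataclass
from enum import Enum


class MatchType(Enum):
    EXACT = "Exact"
    GREATER_THAN = "GreaterThan"


@dataclass(frozen=True)
class Contribution:
    """Hypothetical record of one data class contribution to an infoset."""
    data_class: str
    match_type: MatchType
    object_count: int
    total_size: int  # in bytes


def describe(contribution):
    """Render a contribution as text, following the match type semantics above."""
    c = contribution
    if c.match_type is MatchType.EXACT:
        return f"{c.data_class}: exactly {c.object_count} objects, {c.total_size} bytes"
    if c.object_count == 0 and c.total_size == 0:
        # GreaterThan with both values zero: the class contributes,
        # but the number and size of objects could not be determined.
        return f"{c.data_class}: contributes, number and size undetermined"
    return f"{c.data_class}: at least {c.object_count} objects, {c.total_size} bytes"


if __name__ == "__main__":
    print(describe(Contribution("Phone", MatchType.EXACT, 2, 350)))
    print(describe(Contribution("Phone", MatchType.GREATER_THAN, 2, 350)))
    print(describe(Contribution("Phone", MatchType.GREATER_THAN, 0, 0)))
```

Treating the zero-valued GreaterThan case separately avoids reporting "at least 0 objects", which would hide the fact that the data class does contribute to the infoset.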