Advanced data masking

Advanced data masking extends the capability of data protection rules and data location rules by protecting data with advanced de-identification techniques. These techniques maintain the format and integrity of the data. Because the protected data keeps its utility, data users such as data scientists, business analysts, and application developers can still produce high-quality insights from it.

Advanced data masking includes the following features:

The following scenarios explain how advanced data masking extends the capability of data protection rules.

Data scientists want to use financial data, such as credit card numbers and bank account numbers, in their machine learning model to predict fraudulent transactions. To produce the results that they're looking for, the credit card numbers cannot be masked as XXXXXXXXX. Instead, they need values that look and behave like actual credit card numbers. The preserve format method in advanced data masking produces credit card numbers that meet format requirements. These requirements include keeping the issuer identification information, which indicates the card company (for example, Visa or Mastercard) that issued the card, and passing the Luhn checksum algorithm. Because the masking is realistic, data users can produce accurate results.
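The following minimal Python sketch illustrates what format-preserving masking of a card number can look like. It is not the Watson Knowledge Catalog implementation. It assumes that the first six digits act as the issuer identifier and are kept. The remaining account digits are randomized, and a new Luhn check digit is computed so that the masked value still passes validation. The function names (luhn_check_digit, luhn_valid, mask_card_number) and the keep_prefix parameter are hypothetical.

```python
import random
from typing import Optional

DIGITS = "0123456789"


def luhn_check_digit(payload: str) -> str:
    """Return the Luhn check digit for a digit string that does not yet have one."""
    total = 0
    # Walk right to left: the rightmost payload digit sits next to the check digit,
    # so it is doubled, and so is every second digit after it.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """True if the last digit of `number` is a correct Luhn check digit."""
    return luhn_check_digit(number[:-1]) == number[-1]


def mask_card_number(card: str, keep_prefix: int = 6,
                     rng: Optional[random.Random] = None) -> str:
    """Replace the account digits of a card number while preserving its format.

    Assumption (hypothetical): the first `keep_prefix` digits are the issuer
    identifier and are kept. The remaining digits are randomized, and a fresh
    Luhn check digit is appended.
    """
    rng = rng or random.Random()
    digits = [c for c in card if c in DIGITS]
    if len(digits) <= keep_prefix + 1:
        raise ValueError("card number is too short to mask")
    prefix = "".join(digits[:keep_prefix])
    middle = "".join(rng.choice(DIGITS) for _ in range(len(digits) - keep_prefix - 1))
    payload = prefix + middle
    new_digits = iter(payload + luhn_check_digit(payload))
    # Put the new digits back into the original layout so that separators survive.
    return "".join(next(new_digits) if c in DIGITS else c for c in card)


masked = mask_card_number("4111-1111-1111-1111", rng=random.Random(0))
print(masked)  # keeps "4111-11", the hyphens, the length, and a valid check digit
```

A seeded random.Random is passed here only to make the example repeatable. Whether a real masking engine randomizes or derives the account digits deterministically is a separate design choice that this sketch does not model.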

Healthcare data users want to use patient data that contains patients' names and addresses to analyze results from clinical studies of terminal diseases. To produce the results that they're looking for, patient names cannot be masked as "XXXX". Instead, they need realistic names and realistic street names, cities, and countries. Realistic values give data users a broader context while they perform their analyses. For example, they can see that "Jane Doe," who lives on "123 Maple Lane," is the study participant with breast cancer.
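As an illustration of realistic substitution, the following Python sketch replaces names and street addresses with values drawn from small, hypothetical replacement pools. It uses a keyed hash (HMAC-SHA256), so the same original value always maps to the same substitute, and that consistency keeps records linkable across tables. This is an assumption made for illustration and not a description of how Watson Knowledge Catalog generates its substitutes. All pool contents, function names, and the key are hypothetical.

```python
import hashlib
import hmac
from typing import Sequence

# Hypothetical replacement pools. A production masking engine would use much
# larger, locale-aware dictionaries.
FIRST_NAMES = ["Jane", "John", "Maria", "Wei", "Amara", "Lucas"]
LAST_NAMES = ["Doe", "Smith", "Garcia", "Chen", "Okafor", "Novak"]
STREETS = ["Maple Lane", "Oak Street", "Cedar Avenue", "Birch Road"]


def _digest(value: str, key: bytes, label: str) -> int:
    """Keyed hash of a value. The label keeps the choices for each field independent."""
    mac = hmac.new(key, f"{label}:{value}".encode("utf-8"), hashlib.sha256)
    return int.from_bytes(mac.digest()[:8], "big")


def _pick(pool: Sequence[str], value: str, key: bytes, label: str) -> str:
    """Deterministically choose a substitute from the pool for this value."""
    return pool[_digest(value, key, label) % len(pool)]


def mask_name(full_name: str, key: bytes) -> str:
    """Map a 'First Last' name to a realistic substitute name."""
    first, _, last = full_name.partition(" ")
    return f"{_pick(FIRST_NAMES, first, key, 'first')} {_pick(LAST_NAMES, last, key, 'last')}"


def mask_street_address(address: str, key: bytes) -> str:
    """Replace a street address with a realistic house number and street name."""
    house_number = 1 + _digest(address, key, "house") % 999
    return f"{house_number} {_pick(STREETS, address, key, 'street')}"


key = b"example-secret-key"  # hypothetical; keep a real key out of source code
print(mask_name("Alice Walker", key), "|", mask_street_address("42 Elm Street", key))
```

A keyed hash is used instead of a plain hash so that someone without the key cannot rebuild the mapping by hashing candidate names. With pools this small, different people can receive the same substitute, so the mapping is consistent but not unique.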

Important

Because advanced data masking options are specific to a data class, they can be applied to only one data class at a time. These options are optimized for all 165 predefined Watson Knowledge Catalog data classes and are recommended as the best format-preserving options for each data class. They can also be applied to custom-defined Watson Knowledge Catalog data classes.

The Advanced masking option can be enabled only for the Redact and Obfuscate masking methods. Advanced masking options apply only to rules that mask data in columns containing a data class. Business terms, column names, and tags are not yet supported.

Creating data protection rules with advanced data masking

Advanced masking options are only enabled for data classes.

  1. Complete the conditions and select the attributes that you want to process. The recommended practice is to create rules as follows:
    • If the data class contains any __insert data class__, then mask data in columns containing data class __insert data class__.
    • You can optionally add conditions for asset owners, business terms, tags, and so on, but make sure that you understand how these governance artifacts work, because they might unintentionally leak unmasked data. See Managing data protection rules.
  2. Select the action Mask data.
  3. Select one of the following methods to mask data:

    • Redact
    • Obfuscate

    Substitute is not supported for advanced masking.

  4. Click Advanced data masking options.
  5. Select your masking options in the Advanced data masking section. Some options are selected by default for you. See Redacting data method and Obfuscating data method for more information.
  6. Create a rule. See Masking data for more information on how to mask data in assets.
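The following Python sketch models the constraints that this topic places on advanced masking rules as a small validation check. Advanced options are enabled only for data class conditions, they apply to one data class at a time, and they work only with Redact or Obfuscate because Substitute is not supported. The MaskingRule class, its field names, and validate_advanced_rule are hypothetical. They are not the Watson Knowledge Catalog API or rule schema. You could use a check like this to review rule definitions before you create them in the product.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical names that mirror the documented constraints; not a real API.
ADVANCED_METHODS = {"redact", "obfuscate"}  # Substitute is not supported


@dataclass
class MaskingRule:
    name: str
    condition_type: str          # for example "data_class", "business_term", "tag"
    condition_values: List[str]  # the data classes (or terms, tags) that the rule matches
    method: str                  # "redact", "obfuscate", or "substitute"
    advanced_options: Dict[str, object] = field(default_factory=dict)


def validate_advanced_rule(rule: MaskingRule) -> List[str]:
    """Return the constraint violations for a rule that uses advanced options."""
    if not rule.advanced_options:
        return []  # ordinary masking rule; the advanced-option limits do not apply
    errors = []
    if rule.condition_type != "data_class":
        errors.append("advanced options are only enabled for data classes")
    if len(rule.condition_values) != 1:
        errors.append("advanced options apply to one data class at a time")
    if rule.method not in ADVANCED_METHODS:
        errors.append(f"method {rule.method!r} does not support advanced masking")
    return errors


ok = MaskingRule("Mask card numbers", "data_class", ["Credit Card Number"],
                 "obfuscate", {"preserve_format": True})
print(validate_advanced_rule(ok))  # []
```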

Using the masking previews

The Before and After previews in the Action section show how data is dynamically masked when you view data assets in catalogs and projects, and before you run masking flow jobs.

The After preview in the Advanced data masking options shows how the data is masked in the masked copies that masking flow jobs produce.
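To make the idea of a Before and After view concrete, the following Python sketch pairs sample values with their masked forms. The masking function keeps separators and spacing and replaces letters and digits with "X". This is one plausible format-preserving redaction style, assumed for illustration, and it is not necessarily what the product's Redact method produces. Both function names are hypothetical.

```python
from typing import Callable, List, Tuple


def redact_preserving_format(value: str, mask_char: str = "X") -> str:
    """Replace letters and digits with mask_char; keep separators and spacing."""
    return "".join(mask_char if ch.isalnum() else ch for ch in value)


def preview(values: List[str], mask: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Pair each original value with its masked form, like a Before/After view."""
    return [(v, mask(v)) for v in values]


for before, after in preview(["123-45-6789", "jane.doe@example.com"], redact_preserving_format):
    print(f"{before:<24} -> {after}")
```

Passing the masking function as a parameter lets the same preview helper compare different masking styles side by side.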
