Masking virtual data (Data Virtualization)
Tech preview You can use a data protection rule to mask virtual data. When queried, masked columns return disguised data.
Data masking applies to the result sets of the queries only. The original data in tables and columns remains untouched. Masking does not apply to query predicates. You can use data masking to avoid exposing sensitive data. However, data masking does not stop a Data Virtualization users from connecting to the service and running queries against that data. Users can join and group data, generate the reports, perform analytics and collect insights by using the raw data, while masking the result set only.
- Tables that were not virtualized in the strict virtualization mode. See Enabling the strict virtualization mode for details.
- Tables that have one-to-one mapping with the original (non-virtualized) table. Therefore, data masking is not supported for grouped tables or tables that can be grouped before or during the virtualization process.
Watch the following video for an overview of data masking in Data Virtualization:
- Data type support for masking virtual data.
- Limitations and known issues for masking virtual data.
- To avoid limitations, you can deny access to the virtual table. See Denying access to masked data for details.
Data type support for masking virtual data
Data Virtualization supports data masking on columns of several data types. Masking operations behave differently based on the operation and the data type of the column.
- Redact action data mapping
- The following table shows which data types are mapped for Redact action
in Data Virtualization.
Table 1. Redact action data mapping Data type Masked to VARCHAR 'XXXXXXXXXX' or 'X' * original string length, whichever is shorter. CHAR 'XXXXXXXXXX' or 'X' * column length, whichever is shorter. Numeric 0 DATE '0001-01-01' TIME '00:00:00' TIMESTAMP DATE('0001-01-01') BOOLEAN 0 - Substitute action data mapping
-
- VARCHAR
-
- If the original value is at least 32 characters or longer, masked data is md5 of the original value.
- If the original value is less than 32 characters, masked data falls back to the Redact action.
- CHAR
-
- If column length is at least 32 characters or longer, masked data is md5 of the original value.
- If column length is less than 32 characters, masked data falls back to the Redact action.
- Add other data
- Falls back to the Redact action.
- Obfuscate action data mapping
- Obfuscation support is limited to the following data classes only:
- Personal Information
- Gender
- Honorific
- Name suffix
- Contact Details
- Email address
- USPHN
- STREET
- ADDRESSLINE1
- ADDRESSLINE2
- ADDRESSLINE3
- CITY
- STATEPROVNAME
- COUNTRY
- Latitude
- Longitude
- Financial Accounts
- Amex card
- Diners club card
- Discover card
- Master card
- VISA card
- Credit card
- Japan Credit Bureau (JCB)
- Government Identities
- SSN
- SSN4
- Organization Information
- DUNS Number
- CAGE CODE
- USSIC
- Connectivity Data
- IP v4 address
- Personal Demographic Information
- RELIGION
- ETHNICITY
- EYECOLOR
- HAIRCOLOR
- MARITALSTATUS
- HOBBY
- EMPLOYMENTSTATUS
- Other
- USSCP
- RELATIONSHIP
If obfuscation cannot be performed, the behavior falls back to Substitute.Note: If obfuscated value is longer than the length of the original value, masked data falls back to the Redact action to avoid overflowing column length. - Personal Information
Limitations and known issues for masking virtual data
- Data masking support
- Data masking is not supported for:
WHERE
,GROUP BY
,JOIN
,HAVING
,SELECT DISTINCT
, andORDER BY
clauses in an SQL statement. Masking is not applied to query predicates.- Non-virtualized tables in the catalog on which virtual tables are based.
- Data masking is not supported for grouped tables or tables that can be grouped before or during the virtualization process.
- Tables that are virtualized with a subset of columns smaller than the original non-virtualized table. If you choose not to virtualize all table columns, you get this type of tables.
- Views. As a workaround, you can anonymize tables that are referenced in the
view definition:
- Ensure that the virtual tables that are referenced in a view definition are basic virtual tables
by executing the following statement:
SELECT BSCHEMA, BNAME, BTYPE FROM SYSCAT.TABDEP WHERE TABSCHEMA = <View schema> AND TABNAME = <View name>
If
BTYPE
for all referenced objects isN
, your view is referencing basic virtual tables only and you can use this workaround. - Publish the referenced tables to the catalog and define masking rules that apply to these virtual tables.
Note: When the objects that are referenced in the views are anonymized, the views will be anonymized as well. - Ensure that the virtual tables that are referenced in a view definition are basic virtual tables
by executing the following statement:
- Substitute masking action can be performed only on character strings with original length >= 32 characters. In all other cases, the Substitute action is downgraded to Redact.
- Masked columns cannot be used in
SELECT
(including views) derived from a set operation that involves anEXCEPT
,EXCEPT ALL
,INTERSECT
,INTERSECT ALL
orUNION
set operator.
- Data masking support for data types
- Data masking is limited to the following data types:
- Character strings (CHAR, VARCHAR)
- Numeric
- Date/time
- Boolean
- Data masking in connections with personal credentials
-
When data assets in the catalog use connections with personal credentials, asset preview does not support masked data in the following components: Watson™ Knowledge Catalog, Watson Studio, and Data Refinery.
If you try to preview an asset in these conditions, you get the following error:An error occurred attempting to preview this asset. The anonymized data for this data asset can't be retrieved. There might be a temporary server outage.
By default, Data Virtualization's
DV
connection in Watson Knowledge Catalog uses personal credentials.
Denying access to masked data
- Run the following command to run the
dv-engine-0
:oc exec -it dv-engine-0 -c dv-engine bash
- Modify $BIGSQL_HOME/conf/bigsql-conf.xml to add the following
property:
<property> <name>bigsql.wkc.allow_on_transform</name> <value>false</value> </property>
- Save the file.
- Restart Big SQL
scheduler:
bigsql stop -scheduler bigsql start -scheduler