IBM InfoSphere Data Replication (CDC Replication) and related PIDs, InfoSphere Classic CDC for z/OS, considerations for GDPR readiness

This document is intended to help you in your preparations for GDPR readiness. It provides information about features of the CDC Replication and CDC Replication for z/OS® technologies in IBM® InfoSphere® Data Replication 11.4 and IBM InfoSphere Data Replication for Db2® for z/OS 11.4 (PIDs 5725E30, 5655DRQ, 5724U70, and 5655IM5) that you can configure, and aspects of the products' use that you should consider to help your organization with GDPR readiness.

This information is not an exhaustive list, due to the many ways that clients can choose and configure features, and the large variety of ways that the products can be used on their own and with third-party applications and systems.

Clients are responsible for ensuring their own compliance with various laws and regulations, including the European Union General Data Protection Regulation. Clients are solely responsible for obtaining advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulations that may affect the clients' business, and any actions the clients might need to take to comply with such laws and regulations.

The products, services, and other capabilities described herein are not suitable for all client situations and might have restricted availability. IBM does not provide legal, accounting, or auditing advice or represent or warrant that its services or products will ensure that clients are in compliance with any law or regulation.

GDPR

General Data Protection Regulation (GDPR) has been adopted by the European Union (EU) and applies from May 25, 2018.

GDPR establishes a stronger data protection regulatory framework for the processing of personal data of individuals. GDPR brings:

  • New and enhanced rights for individuals
  • Widened definition of personal data
  • New obligations for processors
  • Potential for significant financial penalties for non-compliance
  • Compulsory data breach notification


Configuration to support data handling requirements

The GDPR legislation requires that personal data be strictly controlled and that the integrity of the data be maintained. Data must therefore be secured against loss through system failure, unauthorized access, and theft of computer equipment or storage media.

You can deploy and configure CDC Replication in an environment where security measures are in place to address data handling requirements that are related to the GDPR. The product is back-office in nature and is designed to reside in a secure environment while operating. The approach that you choose will depend on your business requirements.

You should have an overall high-level understanding of the topology of the nodes and components that make up the product and of the environments where it runs. Information on this can be found in IBM Knowledge Center here: About CDC Replication.

You can learn about possible approaches and factors related to securing an environment for the product in the CDC Replication security primer.

Note that CDC Replication is a facilitator for moving data (which could include personal data) between different data storage technologies (databases, queuing technologies, file storage). This guide covers only topics that are related to the product itself. GDPR-related configuration and security information for potential source and target components is outside the scope of this information.

Configuring the products to support data privacy

A common approach to addressing data privacy is to limit and divide access to data and processing functions among small groups or individuals on an as-required basis. You can take this approach with CDC Replication.

The overall security of the systems that are involved is covered in Configuring the products to support data security, but one aspect of data privacy is controlling who can access a system where these products exist. This type of control limits the overall number of individuals who can come into contact with data. This control should cover not only product-specific users but also system administrators.

Access to the product installation and to the configuration of specific source and target agents should also be limited to a small group. This is especially important because this area provides direct access both to data storage (the source and target databases, through the associated access credentials) and to product operational data.

Implementing privacy controls is normally limited to the product administrators who are in charge of setting up and maintaining the product. Regular operators who use CDC Replication to perform a business function should be a separately controlled group without this access.

In addition to the overall primer mentioned earlier, specific information about installation, configuration, and administration can be found here: Overview of CDC Replication.

For operational management, control, and monitoring functions, the Management Console component of CDC Replication allows capabilities to be separated along role lines. For more control of data privacy, grant the roles that control data movement (what data can be moved where) only to a small group, separate from the users who need only to monitor overall progress and perform high-level processing control (for example, scheduling). For more information on this topic, see Installing Access Server (Windows) and Installing Access Server (UNIX and Linux®).
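To illustrate the role-based separation described above, the following Python sketch audits a hypothetical export of user-to-role assignments and reports when more users can control data movement than your policy allows. The role names (`SYSADMIN`, `ADMIN`, `OPERATOR`, `MONITOR`), the assumption that only `SYSADMIN` and `ADMIN` control data movement, and the dictionary input format are all illustrative assumptions, not the actual Access Server role model or an export format that Management Console produces.

```python
from typing import Dict, List, Set

# Hypothetical role names; map them to the roles defined in your own Access Server setup.
DATA_MOVEMENT_ROLES: Set[str] = {"SYSADMIN", "ADMIN"}  # assumed to control what data moves where
MONITORING_ROLES: Set[str] = {"OPERATOR", "MONITOR"}    # assumed to monitor and schedule only


def data_movement_users(assignments: Dict[str, Set[str]]) -> List[str]:
    """Return the sorted names of users who hold at least one data-movement role."""
    return sorted(user for user, roles in assignments.items() if roles & DATA_MOVEMENT_ROLES)


def audit_role_separation(assignments: Dict[str, Set[str]], max_privileged: int) -> List[str]:
    """Return findings as text; an empty list means no policy violations were found."""
    findings: List[str] = []
    privileged = data_movement_users(assignments)
    if len(privileged) > max_privileged:
        findings.append(
            f"{len(privileged)} users can control data movement "
            f"(policy limit {max_privileged}): {', '.join(privileged)}"
        )
    # Report roles the script cannot classify instead of silently trusting them.
    for user, roles in sorted(assignments.items()):
        unknown = roles - DATA_MOVEMENT_ROLES - MONITORING_ROLES
        if unknown:
            findings.append(f"{user} has unrecognized roles: {', '.join(sorted(unknown))}")
    return findings


if __name__ == "__main__":
    example = {
        "alice": {"ADMIN"},
        "bob": {"MONITOR"},
        "carol": {"OPERATOR"},
        "dave": {"SYSADMIN"},
    }
    for finding in audit_role_separation(example, max_privileged=1):
        print(finding)
```

Reporting unrecognized roles keeps the audit conservative: a role that the script does not classify is flagged for review rather than treated as monitoring-only.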

Configuring the products to support data security

Data security with CDC Replication is accomplished by deploying and operating the products in a secure environment (encrypted file systems, VPN technologies that provide secure network connections, firewall security, and control of access to the system perimeter). Direct security of data within the scope of the product is managed by controlling administrator access to the system where CDC Replication is installed and to the product administration account (the account under which CDC Replication is installed).

You also need to consider the security of data when support trace logging is configured. By default, the product writes this data to the location where the product is installed, which should be on a secure file system. The way that you manage these captured logs, including their retention and distribution, can affect data security, so develop a secure procedure that meets GDPR handling requirements.

CDC Replication also requires access to individual datastores to operate, so managing the credentials that are required for connecting to datastores is important to achieving a secure environment. The products do not store these credentials securely (encrypted), so you must encrypt the file systems where they are stored and limit user access to those areas.
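Encrypting the file system protects data at rest, but you still need to confirm that file permissions restrict access to the owning account. The following Python sketch, which assumes a POSIX system, walks a directory tree and lists every file or directory whose permission bits grant any access to the group or to other users. Which directory you point it at (for example, the product installation directory) is your choice; the script does not know where CDC Replication stores credentials, and it does not inspect ACLs or encryption status.

```python
import os
import stat
from typing import List

# Any read, write, or execute bit for the group or for "other" users.
GROUP_OR_OTHER_BITS = stat.S_IRWXG | stat.S_IRWXO


def find_exposed_paths(root: str) -> List[str]:
    """Return sorted paths under root (root included) that grant access beyond the owner.

    Symbolic links are skipped because their own permission bits are not used for
    access checks on most POSIX systems; os.walk does not descend into linked directories.
    """
    if not os.path.isdir(root):
        raise NotADirectoryError(root)
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        candidates.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    exposed = []
    for path in candidates:
        mode = os.lstat(path).st_mode
        if stat.S_ISLNK(mode):
            continue
        if mode & GROUP_OR_OTHER_BITS:
            exposed.append(path)
    return sorted(exposed)
```

For example, `find_exposed_paths("/path/to/installation")` (a placeholder path) returns an empty list when everything below that directory is accessible only by its owner. Note that `os.walk` silently skips directories that it cannot read unless you pass an `onerror` handler, so run the check as the installation owner.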

Finally, from an operations perspective, the Management Console user interface allows these credentials to be saved, in non-secure form, for convenience. For stronger security, disable this option for datastore management in Management Console so that users are always prompted for the access credentials of individual datastores.

More information about user access security and managing users and datastores in Management Console can be found in IBM Knowledge Center here:

Data lifecycle

CDC Replication is data agnostic: it is not aware of the nature of the data that it handles beyond the technical level (encoding, data type, size). As such, the product cannot be aware of the presence (or absence) of personal data, except in cases, described later in this section, where a customer has explicitly defined handling for data that might be personal in nature. It is up to the customer to determine whether personal data might be present in the data that is being moved by this product.

More information about the general high-level process for data handling is available in IBM Knowledge Center: About CDC Replication.

In brief, as it relates to GDPR: a source CDC agent takes data from a source and sends it to a target CDC agent, which delivers it to an eventual target. Depending on the configuration and the types of sources and targets involved, the product reads data from a source, stages data in program memory (and in some cases in a temporary disk storage cache), transmits the data over a network connection, stages data again in program memory (and again sometimes in a disk storage cache), and then moves the data to an eventual target.

Because the product also lets users define specialized handling of specific data (for example, data filtering and translations), there can be specific cases where a customer explicitly defines how data that is personal in nature is handled. In these cases, the replication agents act on the personal data, but only according to the customer-defined rules and without being aware of the nature of the data.

Any personal data that the product handles, whether it is present on a source or covered by customer-defined handling rules, follows this path and lifecycle.

During special (customer-controlled) situations such as diagnostic servicing, product trace logging might be enabled, which can result in data being captured in the logs. These logs are persisted to disk storage where the product resides, and normally the customer provides them to IBM to assist with servicing. Best practice is to remove these logs from the system as soon as they have been collected and provided to IBM Support, which ends the lifecycle of any personal data that they might contain.
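Removing trace logs immediately after servicing can be backed by a scheduled retention sweep that catches logs that were overlooked. The following Python sketch lists, and optionally deletes, files in a log directory whose last modification time is older than a retention period. The directory, the `*.log` file pattern, and the retention period are assumptions that you must adapt to where and how your installation actually writes trace files; the function runs in dry-run mode by default so that you can review what would be removed.

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def sweep_old_logs(log_dir: str, retention_days: float, pattern: str = "*.log",
                   dry_run: bool = True, now: Optional[float] = None) -> List[str]:
    """Return files in log_dir matching pattern that were last modified more than
    retention_days ago, and delete them as well when dry_run is False.

    Only the top level of log_dir is examined; pass pattern="**/*.log" to recurse.
    """
    cutoff = (time.time() if now is None else now) - retention_days * SECONDS_PER_DAY
    expired: List[str] = []
    for path in sorted(Path(log_dir).glob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            expired.append(str(path))
            if not dry_run:
                path.unlink()
    return expired
```

Deleting a file does not overwrite its contents on the underlying storage, which is one reason that encrypting the volumes where logs are written still matters.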

If you want to ensure that no personal data is shared with IBM in logs collected for product servicing, you are responsible for ensuring that such data has either been removed or rendered no longer personal.
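One way to reduce personal data in logs before sharing them is pattern-based redaction. The following Python sketch writes a redacted copy of a log file, replacing text that looks like an email address with a placeholder. The single pattern is only an example: it will not find names, account numbers, or other personal data specific to your tables, so add patterns that match your own data and review the output manually before sending it. Pattern matching alone does not guarantee that the result is no longer personal data.

```python
import re

# Example pattern only. Add patterns for the personal data that your own tables contain
# (names, customer numbers, and so on); the product does not supply these.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def redact_line(line: str) -> str:
    """Replace every match of each pattern with a labeled placeholder."""
    for label, pattern in REDACTION_PATTERNS.items():
        line = pattern.sub(f"[REDACTED-{label}]", line)
    return line


def redact_file(src: str, dst: str) -> int:
    """Write a redacted copy of src to dst and return the number of lines that changed.

    Undecodable bytes are replaced rather than raising an error, so the copy is not
    byte-identical to the original even where nothing was redacted.
    """
    changed = 0
    with open(src, encoding="utf-8", errors="replace") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            redacted = redact_line(line)
            if redacted != line:
                changed += 1
            fout.write(redacted)
    return changed
```

Writing to a separate file keeps the original log intact for your own records while only the redacted copy is shared.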

Finally, note that the nature of the data, its handling, and its lifecycle on either side of CDC Replication (source or target) have their own specifics related to personal data and the GDPR. For that information, consult the product documentation for those sources and targets.

Personal data used for online contact with IBM

CDC Replication clients can submit online comments, feedback, and requests to contact IBM about CDC Replication subjects in a variety of ways, primarily:

  • Public comments area on pages in the CDC Replication community on IBM developerWorks®
  • Public comments area on pages of CDC Replication documentation in IBM Knowledge Center
  • Public comments in the CDC Replication space of dWAnswers
  • Feedback forms in the CDC Replication community

Typically, only the client name and email address are used to enable personal replies for the subject of the contact, and the use of personal data conforms to the IBM Online Privacy Statement.

Data storage

Transparency and data minimization

Because CDC Replication is data agnostic and does not collect data for the purpose of storage, no personal data is directly available for transparency review, and the product thus meets the principle of data minimization.

Protection

The product does handle data that could include personal data, and in some cases this data could reside on disk storage at some points in the data lifecycle. Access controls are built on top of system and component mechanisms (for example, operating system and database access controls) to control and limit product access.

To protect data when using CDC Replication, you must provide a secure environment in which to run the product. For data that might reside at rest (such as in temporary disk storage caches or trace log files), a supplementary full disk volume encryption solution is required. For data transmission between nodes, a VPN solution (either software-based or hardware-based) is recommended.

Additionally, standard system and IT security approaches (such as firewalls and network architectures) should be employed to protect all nodes involved with the movement and storage of data from risk of outside attack.

For more details on these approaches and related considerations, see the CDC Replication security primer.

Additional considerations

  • Account data: Most user account data that CDC Replication uses is contained in and controlled by the system user account management facilities, by facilities in related components such as database access controls, and in some cases by external controls such as LDAP directories. In certain cases, for convenience or to meet product function requirements, Classic CDC stores certain user credential information in a way that does not meet the criteria for "secure" storage. In some cases the product configuration can be adjusted to prevent these insecure conditions (for example, by requiring users of the Management Console to always provide passwords when connecting to datastores), but in other cases the product must store the credentials (source and target datastore credentials for the CDC agent). For this reason, the product needs to be installed in a secured (encrypted) disk environment.
  • Backups: Although there is no set CDC backup procedure that covers the areas where GDPR-controlled data could reside at any given time, a backup of a system where the product exists could inadvertently capture personal data. You must ensure that any actions that you perform on a system that could contain personal data comply with GDPR handling policies.
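If a system-level backup must cover the file systems where the product is installed, one way to keep operational data out of it is to filter the backup set before it is written. The following Python sketch removes any path that lies inside an excluded directory. Which directories hold trace logs, staging caches, or other operational data is an assumption that you must confirm for your installation; the paths in the example are hypothetical. The check compares whole path components, so `/opt/cdc/logs2` is not treated as inside `/opt/cdc/log`, but it does not resolve `..` or symbolic links, so pass absolute, normalized paths.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def is_within(path: PurePosixPath, directory: PurePosixPath) -> bool:
    """True if path is the directory itself or any file or folder beneath it."""
    return path == directory or directory in path.parents


def filter_backup_set(paths: Iterable[str], excluded_dirs: Iterable[str]) -> List[str]:
    """Return the input paths, in order, minus any that fall inside an excluded directory."""
    excluded = [PurePosixPath(d) for d in excluded_dirs]
    return [p for p in paths if not any(is_within(PurePosixPath(p), d) for d in excluded)]


if __name__ == "__main__":
    # Hypothetical layout; substitute the directories that your installation actually uses.
    candidates = [
        "/opt/cdc/conf/instance.xml",
        "/opt/cdc/log/trace_20240101.log",
        "/opt/cdc/cache/staging.dat",
        "/opt/cdc/logs2/readme.txt",
    ]
    for path in filter_backup_set(candidates, ["/opt/cdc/log", "/opt/cdc/cache"]):
        print(path)
```

Most backup tools accept an exclusion list directly; a filter like this is mainly useful for generating or checking that list.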

Data access

Data access in CDC Replication is limited to a small number of roles, which typically correspond to a small number of user accounts:

  • The account under which the product is installed and base configuration is performed.
  • User account credentials that the configured product uses to access actual data for reading and writing (for source and target agents).
  • User accounts that are used for operational configuration and monitoring.

Of these, the last group usually includes the largest number of users, because separate account roles are available for system administration, operations, and monitoring.

Details on account requirements and options can be found in IBM Knowledge Center at the following links:

Because the product is an intermediary processor of data to and from various sources and targets, it must access these data stores through various APIs and other means. Access to each is controlled through user account credentials that depend on the specific database or queuing technology. Be aware that these are the user accounts that are used to gain access to all data.

Controlling access to logs

All major components of the CDC Replication product infrastructure have activity and debug logging capabilities. The level of logging detail is configurable; in the default mode, minimal information is logged during normal operations. These logs are visible to the user account that owns the product installation and to any superuser administrator accounts. Access to these accounts is controlled through the underlying operating system mechanisms for user access control on the node where the product is running.

When detailed trace logging is enabled (such as during servicing), or when the product encounters an error, detailed data that the product is processing could be captured in the log files. As such, any user account with access to the log files could access potential personal data.

IBM support engineers might also need to be given access to logs or data during customer requested product servicing.

Additional considerations

In addition to the separation of duties for overall system administration, product installation and ownership, and data access controls, accounts and duties can also be separated for product configuration, management, and operational control. Although these duties do not involve direct data access, product management users can have privileges that allow them to control which data is moved, as well as the movement itself. Carefully consider which users or groups of users are authorized to have this capability.

Data processing

Users of CDC Replication can control the way data (and potentially personal data) is processed by the product through the product configuration and control interface (the Management Console). This activity is performed more frequently than the initial product setup and configuration, in which the source and target datastores are defined and access to them is configured.

Encryption

The products do not handle the security of the data directly; they rely on outside mechanisms to provide a secure environment. Encryption should be provided through system-level encryption of the file systems and of the network connections over which the product communicates.

Security profiles and data processing

A model that combines an overall secure system with individual application access controls allows one group, such as a security administration team, to secure the entire system, while only a specific product user team can access and control the product's data processing activities. Access to the underlying data can also be separated from control over data movement by using separate accounts for product setup and for user control.

Specific details on the access to data and data processing controls for CDC can be found in IBM Knowledge Center here:

The details of the actual processing of the data are described in Data lifecycle.

Data deletion

Article 17 of the GDPR states that data subjects have the right to have their personal data removed from the systems of controllers and processors without undue delay.

CDC Replication is not a customer-facing application and thus does not provide any mechanisms for data subjects to request or control data deletion. Any deletion of data related to the product can be performed only by authorized users.

Also, because the product does not store any data permanently, and the data that it does come in contact with is purged regularly as part of continuous operation, there is normally no active requirement to delete any data related to the product.

Special cases related to this include:

  • Trace/error logging and other data for servicing
  • System level backups that capture configuration and operational data on file systems where the product is installed

Follow best practices to avoid spreading personal data in scenarios like these, such as not backing up the product's operational data and enforcing a rigorous policy for the collection and management of trace logging data.

Data monitoring

Because CDC Replication must be installed in a secure environment to meet the data protection requirements of the GDPR, use the security mechanisms of that environment to regularly monitor the security state of the product and environment. Consult the documentation for those solutions for details on how to monitor system security.

Within the Access Server and Management Console components of CDC Replication, you can also monitor user-related activity. Depending on the configuration, this monitoring is provided by the Access Server component itself or by an external LDAP directory. For information on this and the available functions, see Auditing user accounts, datastores, security policies, and general events.

These components of CDC Replication also provide features for monitoring the state and progress of data processing and flow, which you can use to monitor the overall progress of data movement. For more information, see Monitoring subscriptions.

An effective security monitoring and management protocol needs to cover many areas including:

  • Overall system security and access
  • Product configuration
  • Product monitoring
  • Monitoring of log and trace data produced by products

Individual customer needs will vary. Use the tools and functions that are mentioned above as part of developing this overall security management solution for your specific needs.

Responding to data subject rights

Because CDC Replication is not a customer-facing application, data subject rights requests have limited applicability. No actions are required in the product to correct, modify, restrict, or extract data. The only applicable case could be deletion of data.

Because personal data can potentially flow through the product, a request to delete all of a data subject's personal data could apply to data that is passing through the product.

Because normal product operation moves data from a source to a target continuously and never retains data for longer than necessary to perform this function, data naturally ages out of the end-to-end data flow. This happens as fast as the product can process the data and depends only on the latency of that processing.

Special scenarios where this might not be the case could include:

  • Service trace/error logs
  • System backups covering the product installation and operational file systems of the product

To limit the potential spread of personal data, follow best practices such as not backing up the product's operational data and enforcing a rigorous policy for the collection and management of trace logging data.