
IBM Integrated Synchronization setup and migration scenarios for incremental updates with IBM Db2 Analytics Accelerator V7.5

White Papers


Abstract

This document provides an overview of the new advanced data synchronization technique to process incremental updates to the accelerator. The overview includes a description of the architecture of the new technique and the setup steps. It also highlights the advantages of IBM Integrated Synchronization in comparison to IBM Change Data Capture (CDC) of InfoSphere® Data Replication for z/OS®.
In addition, this document describes the following scenarios for migrating from existing CDC implementations to IBM Integrated Synchronization:
- Accelerator V7: Migrating from CDC (without high-availability setup) to IBM Integrated Synchronization
- Accelerator V7: Migrating from CDC (with high-availability setup) to IBM Integrated Synchronization
- Accelerator V5 and V7: Migrating from a V5 accelerator with CDC to a V7 accelerator with Integrated Synchronization

Content

IBM Integrated Synchronization overview

IBM Integrated Synchronization is a new advanced data synchronization technique to process incremental updates to the accelerator. This functionality is integrated into Db2 for z/OS. Its purpose is to capture table changes from the Db2 for z/OS log and to apply these changes to the tables on the accelerator. Customers who want to use incremental updates no longer need to install and configure IBM CDC (Change Data Capture) of InfoSphere® Data Replication for z/OS®.

In addition, IBM Integrated Synchronization provides the following advantages:

  • Low latency
  • Reduced CPU consumption on z/OS due to a streamlined and optimized design
  • On z/OS, the workload to capture the table changes has been massively reduced and the remainder can be handled by IBM Z Integrated Information Processors (zIIPs)
  • Simplified administration, packaging, upgrades, and support
  • Enterprise-grade enabler for Hybrid Transactional Analytical Processing (HTAP): The integrated low latency protocol is now enabled to support significantly more analytical queries running against the latest committed data.
  • Supports the replication of Db2 archive tables (not supported with CDC)

Db2 Analytics Accelerator V7.5 provides and supports both techniques:

  • Incremental updates using IBM Integrated Synchronization
  • Incremental updates using CDC

Customers still have the choice between two incremental update techniques. The choice can be made for each Db2 subsystem that is paired to the accelerator. However, if the prerequisites can be met, IBM Integrated Synchronization is recommended for new incremental update implementations. For existing incremental update implementations, a migration from CDC to IBM Integrated Synchronization is also recommended.

Prerequisites for IBM Integrated Synchronization:

  • Db2 12 for z/OS with APAR PH06628, running at function level V12R1M500
    • Additionally recommended APARs (available at the time of writing): PH19181, PH19886, PH20587, PH21187, PH21419, PH28849, PH29443, PH30397
  • Distributed data facility (DDF) with a secure port, configured for network encryption through AT-TLS
    • For data sharing groups: Ensure that it is possible to always connect to the same Db2 member, for example by using a specific secure port and location alias for IBM Integrated Synchronization.
  • IBM Db2 Analytics Accelerator for z/OS Version 7.5.0 or later. It is recommended to use the latest available maintenance level.
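
The prerequisites above lend themselves to a simple pre-flight check. The following Python sketch is only an illustration: the function names and the way the inputs are collected (function level string, accelerator version, set of applied APARs, secure port flag) are assumptions and not part of any IBM tool. Only the threshold values come from the list above.

```python
import re

# Thresholds taken from the prerequisites list above.
MIN_FUNCTION_LEVEL = (12, 1, 500)      # V12R1M500
MIN_ACCELERATOR_VERSION = (7, 5, 0)    # Version 7.5.0
REQUIRED_APAR = "PH06628"


def parse_function_level(text):
    """Parse a function level such as 'V12R1M500' into (12, 1, 500)."""
    match = re.fullmatch(r"V(\d+)R(\d+)M(\d+)", text.strip().upper())
    if match is None:
        raise ValueError(f"not a function level: {text!r}")
    return tuple(int(part) for part in match.groups())


def parse_version(text):
    """Parse a dotted version such as '7.5' or '7.5.3' into a tuple of at least three numbers."""
    parts = [int(part) for part in text.strip().split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def missing_prerequisites(function_level, accelerator_version,
                          applied_apars, has_secure_ddf_port):
    """Return a list of unmet prerequisites (empty if all are met)."""
    problems = []
    if parse_function_level(function_level) < MIN_FUNCTION_LEVEL:
        problems.append("Db2 function level is below V12R1M500")
    if REQUIRED_APAR not in applied_apars:
        problems.append(f"APAR {REQUIRED_APAR} is not applied")
    if not has_secure_ddf_port:
        problems.append("no DDF secure port with AT-TLS is configured")
    if parse_version(accelerator_version) < MIN_ACCELERATOR_VERSION:
        problems.append("accelerator version is below 7.5.0")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites("V12R1M500", "7.5.3", {"PH06628"}, True):
        print(problem)
```

An empty result only means that the minimum prerequisites are met; the additionally recommended APARs are not checked in this sketch.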

IBM Integrated Synchronization architecture and setup

The following figure describes the components that are involved in an IBM Integrated Synchronization setup:

[Figure: Components involved in an IBM Integrated Synchronization setup]

The Log data provider is a newly developed, internal Db2 for z/OS component that is provided with Db2 12 APAR PH06628 (PTF UI63356). It reads the Db2 log into a memory buffer via a service request block (SRB) that is scheduled in the Db2 address space DBM1.

The Log data processor is a newly developed, internal accelerator component. It is responsible for regularly forwarding the provided log data to a staging area on the accelerator, and for applying the data from the staging area to the tables on the accelerator in an optimized, high-performance way.

Compared with incremental updates using CDC, the design to read and fetch the log is streamlined, which results in reduced CPU usage and higher throughput.

For communication between both components, the log data processor on the accelerator connects to Db2 for z/OS (DIST address space) through the DDF secure port. It authenticates itself to Db2 for z/OS with a z/OS user ID and password; this user ID must have the MONITOR2 privilege in Db2 for z/OS. The privilege is defined in a special RACF DSNR profile (“ACCEL”). RACF PassTickets are supported as an alternative to the password. After the connection has been established, the log data provider starts reading the Db2 log and the log data processor forwards the log data to the staging area on the accelerator.

The log data transfer is always encrypted, which requires that AT-TLS be set up for this connection.

The setup of IBM Integrated Synchronization is described in detail in the Db2 Analytics Accelerator Knowledge Center: https://www.ibm.com/support/knowledgecenter/en/SS4LQ8_7.5.0/com.ibm.datatools.aqt.doc/installmanual/topics/tp_idaa_inst_incr_updt_isync.html.

The setup consists of the following steps:
  • Installing the Db2 prerequisites
  • Defining a secure network port for DDF
    • IBM Integrated Synchronization needs to maintain a stable connection to the same log data provider task on the same Db2 subsystem, or on the Db2 member where the session was started.
    • For data sharing groups: Ensure that it is always possible to connect to the same Db2 member, for example as follows:
      • Define a dedicated location alias and a secure port (SECPORT) for IBM Integrated Synchronization on all Db2 members. If you already use a particular SECPORT for other workloads, choose a different SECPORT for IBM Integrated Synchronization. A dedicated SECPORT and location alias for IBM Integrated Synchronization have the advantage that the location alias can be started individually per Db2 member. In a high-availability setup for IBM Integrated Synchronization, the location alias can be started on multiple Db2 members. If a high-availability setup is not present, the location alias should only be started on one Db2 member.
    • The following figure shows an example in which the location alias is started on one Db2 member only. An example of a high-availability setup is shown in a later section of this document.

[Figure: Data sharing group with the location alias started on one Db2 member only]

  • AT-TLS configuration
    • Create certificates, set up a RACF key ring and store certificates
    • Export the public key of the signer certificate in DER format and transfer it to the accelerator
    • Set up AT-TLS to encrypt the DDF connections using the created certificates
  • Prepare a user ID with the required access rights for IBM Integrated Synchronization
    • Optionally, use a RACF PassTicket or a password for authentication
    • Required access rights: READ access to the DSNR profiles ssid.DIST and ssid.ACCEL, MONITOR2 privilege in Db2 for z/OS
  • Enable IBM Integrated Synchronization for a Db2 subsystem using the Db2 Analytics Accelerator Console
    • Required input parameters: Db2 for z/OS IP address, DDF secure port, public key of the signer certificate, prepared user ID
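
Before enabling IBM Integrated Synchronization from the console, it can be useful to confirm that the DDF secure port answers with a certificate that chains to the exported signer certificate. The following Python sketch, run from any workstation that can reach the port, is one possible way to do that. It is not the mechanism the accelerator uses; it only attempts a TLS handshake and does not exercise the Db2 protocol or the user ID. The file name, IP address, and port in the usage part are sample values.

```python
import socket
import ssl


def signer_der_to_pem(der_bytes):
    """Wrap the DER-encoded signer certificate in PEM armor."""
    return ssl.DER_cert_to_PEM_cert(der_bytes)


def probe_secure_port(host, port, signer_pem, timeout=10.0):
    """Attempt a TLS handshake and return the negotiated protocol version.

    Raises ssl.SSLError if the server certificate does not chain to the
    signer certificate, and OSError if the port cannot be reached.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # The examples in this document address Db2 by IP address, so host name
    # matching is switched off; certificate chain verification stays on.
    context.check_hostname = False
    context.load_verify_locations(cadata=signer_pem)
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw) as tls:
            return tls.version()


if __name__ == "__main__":
    # Sample values only: adjust the file name, address, and port.
    with open("signer_cert.der", "rb") as handle:
        pem = signer_der_to_pem(handle.read())
    print(probe_secure_port("10.2.9.8", 12000, pem))
```

A successful handshake indicates that AT-TLS is active on the port and that the transferred signer certificate matches the server certificate chain.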

After these steps have been completed, IBM Integrated Synchronization is ready to use. You can now start replication and enable tables for replication (for example by using administration client controls from Db2 Analytics Accelerator Studio or from Data Server Manager).

Comparison of IBM Integrated Synchronization and CDC for incremental updates

The following table compares both techniques for incremental updates from a functional, setup, maintenance, performance and resource consumption perspective. The comparison is based on IBM Db2 Analytics Accelerator V7.5 and outlines the differences.

Installation and setup

| IBM Integrated Synchronization | CDC |
| --- | --- |
| Streamlined and optimized design tailored specifically to Db2 z/OS -> Db2 Analytics Accelerator replication, resulting in less CPU consumption and better throughput | General purpose replication technique supporting multiple sources and targets |
| New log data provider component integrated into Db2 for z/OS for capturing table updates from the Db2 for z/OS log. Initial installation by applying a Db2 for z/OS PTF. The new log data processor component is integrated into Db2 Analytics Accelerator. | CDC capture agent installation and setup required for capturing table updates from the Db2 for z/OS log. CDC apply components are included in Db2 Analytics Accelerator. |
| Supported with Db2 12 only. Minimum prerequisite: Function level V12R1M500 with APAR PH06628 (PTF UI63356) | Supported with Db2 11 and Db2 12 |
| Log data provider maintained by applying Db2 PTFs | CDC capture agent maintained by applying CDC PTFs |
| Db2 DDF secure port and AT-TLS network encryption required | No encryption required, but optional configuration of encryption of data-in-motion possible |
| Optional support for RACF PassTickets for Db2 for z/OS authentication from the accelerator via Db2 DDF as an alternative to the user ID/password authentication method | Db2 for z/OS authentication from the accelerator via Db2 DDF with user ID/password only |
| High-availability setup for the log data provider component in data sharing environments supported since Db2 Analytics Accelerator V7.5.3 | High-availability setup of the CDC capture agent in data sharing environments supported |

CPU usage and latency

| IBM Integrated Synchronization | CDC |
| --- | --- |
| 50% less CPU consumption on z/OS compared to CDC | Higher CPU consumption on z/OS |
| Workload that captures changes from the Db2 log is fully zIIP enabled | The CDC capture agent workload that captures changes from the Db2 log runs on GP processors (not zIIP enabled) |
| Significant reduction in replication latency. Latency < 10 seconds | Latency around 30 seconds, can be more or less depending on workload. Higher with a significant share of “True HTAP” jobs in the workload. |
| “True HTAP” query scalability significantly improved. Use of the WAITFORDATA protocol for concurrent queries has practically no impact on throughput and latency | Use of the WAITFORDATA protocol for queries may impact replication throughput and latency |

Functional differences

| IBM Integrated Synchronization | CDC |
| --- | --- |
| Replication of Db2 archive tables (transparent archiving) supported | No support for replication of Db2 archive tables (transparent archiving) because of a CDC limitation |
| Supports non-logged utility actions to empty table partitions or complete tables: LOAD REPLACE DD DUMMY, REORG DISCARD. Data is deleted from the accelerator and replication continues. | No support for non-logged utility actions. Tables must be reloaded to the accelerator to get utility changes synchronized and to continue replication. |
| Since maintenance level V7.5.5: Executing a DROP TABLE statement in Db2 for z/OS for a replication-enabled table leads to an automatic table removal on the accelerator. The SYSPROC.ACCEL_REMOVE_TABLE stored procedure is called automatically. | The table is not automatically removed from the accelerator if a DROP TABLE statement is executed. The SYSPROC.ACCEL_REMOVE_TABLE stored procedure must be called manually by an accelerator user. |
| So far, the waitForReplication option of the SYSPROC.ACCEL_CONTROL_ACCELERATOR stored procedure is not supported. Use the WAITFORDATA protocol to execute queries on the most recently committed data. | Supports the waitForReplication option of the SYSPROC.ACCEL_CONTROL_ACCELERATOR stored procedure. |

Available with Db2 Analytics Accelerator V7.5 maintenance levels:

  • Option to change the IP address of the Log Data Provider on the accelerator -> available since V7.5.1
  • Automatic disablement of query acceleration when a table is no longer replicated -> available since V7.5.2.2
  • Replication of schema changes (First stage: ALTER TABLE xxx ADD COLUMN) -> available since V7.5.4 as a technical preview function and since V7.5.6 for production usage.

With maintenance level V7.5.5, incremental updates based on IBM InfoSphere Data Replication for z/OS (CDC) have been deprecated.

There are no plans to make additional features available to CDC users.
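
The comparison above recommends the WAITFORDATA protocol for queries that must see the most recently committed data. The following Python sketch shows how an application might request this behavior for one query. It assumes a Python DB-API cursor that is already connected to Db2 for z/OS (the driver and connection are not shown and are up to you), and it uses the Db2 12 special registers CURRENT QUERY ACCELERATION and CURRENT QUERY ACCELERATION WAITFORDATA. The function name and the default wait time of 10 seconds are illustrative choices; check the permitted value range in the Db2 documentation.

```python
def run_on_latest_committed_data(cursor, query, wait_seconds=10.0):
    """Run `query` with acceleration enabled and a WAITFORDATA delay.

    `cursor` is any DB-API cursor connected to Db2 for z/OS (assumption).
    `wait_seconds` is the maximum time the accelerator may wait for
    replication to catch up before the query is run.
    """
    if wait_seconds < 0:
        raise ValueError("wait_seconds must not be negative")
    # Allow the query to be routed to the accelerator.
    cursor.execute("SET CURRENT QUERY ACCELERATION = ENABLE")
    # Ask the accelerator to wait for committed changes to be applied.
    cursor.execute(
        f"SET CURRENT QUERY ACCELERATION WAITFORDATA = {wait_seconds:.1f}"
    )
    cursor.execute(query)
    return cursor.fetchall()
```

Both special registers apply to the current connection, so subsequent queries on the same connection inherit the settings until they are changed again.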


Coexistence of IBM Integrated Synchronization and CDC

Mixed CDC and IBM Integrated Synchronization environments that handle incremental updates for one or more accelerators are supported. For each Db2 for z/OS connection to an accelerator (“pairing”), you can decide whether you want to enable incremental updates based on IBM Integrated Synchronization or on CDC. For example, for Db2 subsystems still on Db2 for z/OS V11, you can enable CDC, but for Db2 for z/OS V12 subsystems that meet the prerequisites, it is recommended that you use IBM Integrated Synchronization.

  • One accelerator supports both incremental update techniques
  • From the same Db2 subsystem, the same or different data can be replicated to one accelerator with CDC and to another one with IBM Integrated Synchronization. This configuration might be used in a coexistence setup where one accelerator is still on version 5 and the other one on version 7.

The following figure illustrates an environment in which two different Db2 subsystems replicate data to a single accelerator. The Db2 subsystem on LPAR1 uses IBM Integrated Synchronization, and the Db2 subsystem on LPAR2 uses CDC. Possible examples of such an environment:

  • The Db2 subsystem on LPAR1 is a test system. IBM Integrated Synchronization is already implemented for test purposes. The Db2 subsystem on LPAR2 is a production system that still uses CDC for replication. After a successful test of IBM Integrated Synchronization, the replication setup on LPAR2 will be migrated from CDC to IBM Integrated Synchronization.
  • The Db2 subsystem on LPAR1 is already running Db2 for z/OS V12 with all prerequisites in place and can therefore use IBM Integrated Synchronization to replicate data to the accelerator. The Db2 subsystem on LPAR2 is still running Db2 for z/OS V11 and therefore uses CDC to replicate data to the accelerator.

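[Figure: Two Db2 subsystems replicating to a single accelerator; LPAR1 uses IBM Integrated Synchronization, LPAR2 uses CDC]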

The following figure illustrates an environment in which one Db2 subsystem replicates data to two accelerators. The Db2 subsystem replicates data to Accelerator 1 using IBM Integrated Synchronization, and to Accelerator 2 using CDC. An example of such an environment is the coexistence of a V5 accelerator and a V7 accelerator during a migration from V5 to V7. Both versions are set up in coexistence mode until the migration has been completed and the V5 accelerator can be removed. In this example, the Db2 subsystem replicates to a V5 accelerator using CDC and to a V7 accelerator using IBM Integrated Synchronization.

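[Figure: One Db2 subsystem replicating to Accelerator 1 with IBM Integrated Synchronization and to Accelerator 2 with CDC]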

Migration scenarios

This chapter describes scenarios that outline the steps to migrate from an incremental update setup using CDC to an incremental update setup using IBM Integrated Synchronization.

Accelerator V7: Migration from CDC (without high-availability setup) to IBM Integrated Synchronization

In this scenario, a Db2 for z/OS V12 FL 500 subsystem is paired with a Db2 Analytics Accelerator V7, and data is replicated to the accelerator using CDC. CDC is set up without high availability. The following picture shows this environment with sample values for the CDC port number, the DDF port number and the IP addresses.  

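[Figure: CDC replication without high availability, with sample values for the CDC port number, the DDF port number, and the IP addresses]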

After the migration, the data is replicated to the accelerator using IBM Integrated Synchronization. The following picture shows the migrated environment with sample values for the DDF secure port and the IP addresses.

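[Figure: Migrated environment using IBM Integrated Synchronization, with sample values for the DDF secure port and the IP addresses]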

The setup and migration steps are the following (after it has been verified that all prerequisites are installed for Db2 for z/OS and AT-TLS encryption):

  • Setup phase: The following steps can be taken while CDC replication to the accelerator is still running:
    • Define a secure network port for DDF (for example 12000).
    • Set up AT-TLS encryption between the accelerator and the Db2 DDF.
    • Prepare a user ID with defined access rights for IBM Integrated Synchronization using a RACF PassTicket or password authentication.
      • Optionally, this can be the same user ID that is already in use for CDC and that was specified during the incremental update enablement from the Db2 Analytics Accelerator Console. For IBM Integrated Synchronization, the user needs additional access rights. These are described in the chapter ‘IBM Integrated Synchronization architecture and setup’.
  • Migration phase: During the execution of the following steps, a time window must be defined carefully because no data will be replicated to the accelerator until the steps are complete. The time window must be at least as long as it will take to reload all the involved tables to the accelerator.
    • Stop replication for the Db2 subsystem (for example by using administration client controls in Db2 Analytics Accelerator Studio or Data Server Manager).
    • Disable replication for all tables (for example by using administration client controls).
    • Log on to the Db2 Analytics Accelerator Console and disable CDC replication for the Db2 subsystem.
    • Using the Db2 Analytics Accelerator Console, enable IBM Integrated Synchronization for the Db2 subsystem. Required input parameters are the Db2 IP address that the accelerator uses to connect to Db2 for z/OS (for example 10.2.9.8), the secure DDF port number (for example 12000), the prepared user ID and the transferred public key of the signer certificate.
    • Start replication for the Db2 subsystem (for example by using administration client controls).
    • Enable tables for replication (for example by using administration client controls).
    • Reload the tables (for example by using administration client controls).
  • After a successful reload, you have completed the migration to IBM Integrated Synchronization. If it is no longer needed, you can now start removing the CDC capture agent from the z/OS system.
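
The migration phase above requires a time window that is at least as long as the reload of all involved tables. A rough lower bound can be estimated as in the following Python sketch. The table sizes, the reload rate, and the fixed overhead are hypothetical inputs that you would measure in your own environment; none of these figures come from this document.

```python
def minimum_migration_window_hours(table_sizes_gb, load_rate_gb_per_hour,
                                   fixed_overhead_hours=0.0):
    """Estimate the shortest replication outage for a CDC migration.

    table_sizes_gb: mapping of table name to size in GB (hypothetical input)
    load_rate_gb_per_hour: measured reload throughput (hypothetical input)
    fixed_overhead_hours: time for the console steps, such as disabling CDC
        and enabling IBM Integrated Synchronization (hypothetical input)
    """
    if load_rate_gb_per_hour <= 0:
        raise ValueError("load_rate_gb_per_hour must be positive")
    reload_hours = sum(table_sizes_gb.values()) / load_rate_gb_per_hour
    return reload_hours + fixed_overhead_hours


if __name__ == "__main__":
    sizes = {"SALES.ORDERS": 300.0, "SALES.ITEMS": 100.0}
    print(minimum_migration_window_hours(sizes, 200.0, fixed_overhead_hours=0.5))
```

The estimate assumes that tables are reloaded one after another at a constant rate, so treat the result as a planning aid and add a safety margin.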

Accelerator V7: Migration from CDC (with high-availability setup) to IBM Integrated Synchronization

In this scenario, a Db2 for z/OS V12 FL 500 data sharing group is paired to a Db2 Analytics Accelerator V7, and data is replicated to the accelerator using CDC. CDC is set up with high availability. This means one CDC Capture Agent is set up on one Db2 member and is replicating data to the V7 accelerator. The accelerator itself maintains a Db2 connection via DDF to the same Db2 member that the CDC Capture Agent is running on. A second CDC Capture Agent in hot standby mode is set up on another Db2 member. If the Db2 member of the active CDC Capture Agent goes down, or the Capture Agent itself fails, the second CDC Capture Agent on the other Db2 member takes over. The accelerator server then connects to Db2 for z/OS via DDF on that Db2 member. The following figure shows this environment with sample values for the CDC port number, the DDF port number and the IP addresses.

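[Figure: CDC replication with high availability in a data sharing group, with sample values for the CDC port number, the DDF port number, and the IP addresses]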

After the migration, the data is replicated to the accelerator using IBM Integrated Synchronization. High availability for the log data provider can be ensured by using a special DDVIPA setup, that is, by specifying the TIMEDAFFINITY 60 option in the VIPADISTRIBUTE statement for the DDVIPA and the dedicated SECPORT. The TIMEDAFFINITY option specifies a period during which incoming connection requests from the accelerator are directed to the same Db2 member.

IBM Integrated Synchronization has a timeout value of 60 seconds for the Db2 log data provider tasks. If no new request arrives for an active session within 60 seconds, Db2 terminates that session automatically and cleans up its resources. Accelerators send new requests within this period to keep the log data provider task active. A TIMEDAFFINITY value of 60 (specified in seconds) lets the DDVIPA direct all repeated requests to the same member as long as both sides are running.

When Db2 is no longer available to receive the request (for example, because no process is listening on the port after the Db2 member went down), the DDVIPA ignores the TIMEDAFFINITY parameter and starts a new distribution process to find the next available Db2 member that can accept a connection on the specified port. After the new connection has been established, the accelerator remains connected to that member until it can no longer reach it.
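
The routing behavior described above can be pictured with a small toy model. The following Python sketch is a deliberate simplification and not how the DDVIPA is implemented: it assumes a single client, an affinity window that restarts with every request, and round-robin selection whenever a new distribution is needed. The member names are hypothetical.

```python
import itertools


class TimedAffinityDistributor:
    """Toy model: repeated requests stay on one member while the affinity
    window is fresh and that member is still accepting connections."""

    def __init__(self, members, affinity_seconds=60):
        members = list(members)
        self.available = set(members)       # members currently listening
        self.affinity_seconds = affinity_seconds
        self._cycle = itertools.cycle(members)
        self._member_count = len(members)
        self._target = None                 # member chosen by the last distribution
        self._last_request = None           # time of the last request in seconds

    def _pick_next(self):
        # New distribution: take the next listening member in round-robin order.
        for _ in range(self._member_count):
            member = next(self._cycle)
            if member in self.available:
                return member
        raise ConnectionError("no Db2 member accepts connections on the port")

    def route(self, now):
        """Return the member that receives a request arriving at time `now`."""
        affinity_fresh = (
            self._last_request is not None
            and now - self._last_request <= self.affinity_seconds
        )
        if not (affinity_fresh and self._target in self.available):
            self._target = self._pick_next()
        self._last_request = now
        return self._target


if __name__ == "__main__":
    ddvipa = TimedAffinityDistributor(["DB2A", "DB2B"])
    print(ddvipa.route(0), ddvipa.route(30))   # same member both times
    ddvipa.available.discard("DB2A")           # the member goes down
    print(ddvipa.route(50))                    # request moves to the other member
```

In the model, a gap of more than 60 seconds between requests also triggers a new distribution, which mirrors why the accelerator sends requests within the timeout period to stay on the same log data provider task.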

The following picture shows the migrated environment with sample values for the DDF secure port and the IP addresses. IBM Integrated Synchronization is set up for high-availability.

[Figure: Integrated Synchronization high-availability setup]

The setup and migration steps are the following (after it has been verified that all prerequisites are met for Db2 for z/OS and AT-TLS encryption):

  • Setup phase: The following steps can be taken while CDC replication to the accelerator is still running:
    • Define a location alias (for example DBINSYNCH) and a secure network port for DDF (for example 12011) to be used by IBM Integrated Synchronization. Specify the TIMEDAFFINITY 60 option on the VIPADISTRIBUTE statement for the DDVIPA and the SECPORT. Start the location alias on all Db2 members that are supposed to participate in the high-availability setup for IBM Integrated Synchronization.
      • In specific environments, there might already be a SECPORT for other workloads. In the previous figure, this is SECPORT 12000. For this high-availability setup, it is required to define a dedicated SECPORT for IBM Integrated Synchronization (for example 12011).
      • If you have more than one Db2 member on the same z/OS LPAR, then only one of these members can participate in the high-availability setup.
    • Set up AT-TLS encryption between the accelerator and the Db2 DDF.
    • Prepare a user ID with defined access rights for IBM Integrated Synchronization using a RACF PassTicket or password authentication.
      • Optionally, this can be the same user ID that is already used in the incremental update setup for CDC (specified during the incremental update enablement from the Db2 Analytics Accelerator Console). For IBM Integrated Synchronization, the user needs additional access rights. These are described in the chapter ‘IBM Integrated Synchronization architecture and setup’.
  • Migration phase: During the execution of the following steps, a time window must be defined carefully because no data will be replicated to the accelerator until the steps are complete. The time window must be at least as long as it will take to reload all the involved tables to the accelerator.
    • Stop replication for the Db2 subsystem (for example by using administration client controls from Db2 Analytics Accelerator Studio or from Data Server Manager).
    • Disable replication for all tables (for example by using administration client controls).
    • Log on to the Db2 Analytics Accelerator Console and disable CDC replication for the Db2 subsystem.
    • Using the Db2 Analytics Accelerator Console, enable IBM Integrated Synchronization for the Db2 subsystem. Required input parameters are the Db2 DDVIPA (Db2 group IP) that the accelerator uses to connect to Db2 for z/OS (for example 10.2.9.8), the Db2 secure DDF port number (for example 12011) associated with the location alias that has been started on the Db2 members, the prepared user ID and the transferred public key of the signer certificate.
    • Start replication for the Db2 subsystem (for example by using administration client controls).
    • Enable tables for replication (for example by using administration client controls).
    • Reload the tables (for example by using administration client controls).
  • After a successful reload, you have completed the migration to IBM Integrated Synchronization. If it is no longer needed, you can now start removing the CDC capture agent from the z/OS system.
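
For the setup phase above, the location alias has to be defined and started on every participating Db2 member, and the DDVIPA needs the TIMEDAFFINITY option. The following Python sketch only generates the corresponding command and statement text so that it can be reviewed before use. It assumes the -MODIFY DDF ALIAS command of Db2 for z/OS and the VIPADISTRIBUTE statement of the z/OS TCP/IP profile; the command prefixes and the DESTIP ALL operand are assumptions for illustration, and the exact syntax for your environment should be verified against the Db2 command reference and the z/OS Communications Server documentation.

```python
def location_alias_commands(command_prefixes, alias, secport):
    """Return Db2 commands that add, configure, and start a location alias
    on each participating member.

    command_prefixes: one command prefix per member, for example "-DB2A "
        (hypothetical values)
    """
    commands = []
    for prefix in command_prefixes:
        commands.append(f"{prefix}MODIFY DDF ALIAS({alias}) ADD")
        commands.append(f"{prefix}MODIFY DDF ALIAS({alias}) SECPORT({secport})")
        commands.append(f"{prefix}MODIFY DDF ALIAS({alias}) START")
    return commands


def vipadistribute_statement(ddvipa, secport, affinity_seconds=60):
    """Return a VIPADISTRIBUTE statement with timed affinity for the
    dedicated secure port. DESTIP ALL is an assumption for illustration."""
    return (
        f"VIPADISTRIBUTE DEFINE TIMEDAFFINITY {affinity_seconds} "
        f"{ddvipa} PORT {secport} DESTIP ALL"
    )


if __name__ == "__main__":
    # Sample values from this document; the command prefixes are hypothetical.
    for line in location_alias_commands(["-DB2A ", "-DB2B "], "DBINSYNCH", 12011):
        print(line)
    print(vipadistribute_statement("10.2.9.8", 12011))
```

Without a high-availability setup, the same commands would be issued for one member only, because the location alias should then be started on a single Db2 member.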

Accelerator V5 and V7: Migration from Accelerator V5 with CDC to Accelerator V7 with IBM Integrated Synchronization

In this scenario, a Db2 for z/OS V12 FL 500 system is paired with a Db2 Analytics Accelerator V5, and data is replicated to the V5 accelerator using CDC. The Db2 system can either be a Db2 subsystem or a Db2 data sharing group. In addition, a new V7 accelerator is set up in coexistence mode with the V5 accelerator; this means both accelerators are paired with the same Db2 system.

For this setup, it is highly recommended that you start immediately with IBM Integrated Synchronization on the V7 accelerator rather than starting with CDC first and migrating to IBM Integrated Synchronization at a later time. This way, a replication downtime on the V7 accelerator will be avoided. In contrast, a replication downtime would be required for migration from CDC to IBM Integrated Synchronization on the V7 accelerator as described in the previous migration chapters because tables must be reloaded after migrating these to the IBM Integrated Synchronization environment.

Additional information about IBM Integrated Synchronization

The following additional publications are available:


Document Information

Modified date:
06 September 2021

UID

ibm16027910