Understanding how CDC Replication interacts with your database

When CDC Replication interacts with your database, by reading its logs or applying data to its tables, it creates a dependency on your database.

The CDC Replication engine provides two ways to source change data:
  • Log-based configuration mode
  • Capture table-based configuration mode

Log-based configuration mode

Log management

Log management requires that you keep the logs that CDC Replication reads until it has replicated the data from them. The dmshowlogdependency command, available for most CDC Replication engines, lists the database logs on which CDC Replication still depends. Do not remove a database log until it no longer appears in the list that the command displays.
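The retention rule above amounts to a simple set comparison, sketched here for illustration only. The function name removable_logs and the log names are hypothetical and not part of CDC Replication. The sketch assumes that you have already transcribed the list of logs still required from the dmshowlogdependency output, whose format varies by engine and is not parsed here.

```python
def removable_logs(logs_on_disk, logs_still_required):
    """Return the logs that no longer appear in the dependency list.

    logs_on_disk: names of the database log files currently retained.
    logs_still_required: names reported as still needed (assumed to be
        transcribed by hand from the dmshowlogdependency output).
    """
    required = set(logs_still_required)
    # Keep the original order so the result is easy to review.
    return [name for name in logs_on_disk if name not in required]


# Hypothetical log names, for illustration only.
on_disk = ["log_0001", "log_0002", "log_0003", "log_0004"]
still_required = ["log_0003", "log_0004"]
print(removable_logs(on_disk, still_required))
```

Treat the result as a list of candidates to review, not as an instruction to delete: rerun the command immediately before removing anything, because the dependency list changes as replication progresses.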

If you do not follow this policy, CDC Replication will either end with an error or appear to hang while it waits for the log files to become available, depending on the database. If the log files have been deleted and are permanently unavailable, you have no option but to refresh the data. CDC Replication cannot skip logs while maintaining data integrity, because it cannot know what data the skipped log files contained.

Similarly, the log files must have file system permissions that allow CDC Replication to read them. If the permissions are insufficient, CDC Replication will fail with a message indicating that the specified log could not be opened for reading.
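A quick way to check read permissions ahead of time is sketched below. It is an illustration under stated assumptions, not a CDC Replication utility: the function name unreadable_logs and the example path are hypothetical, and the check is only meaningful when the script runs as the same operating system user that runs CDC Replication.

```python
import os


def unreadable_logs(log_paths):
    """Return the paths that the current OS user cannot open for reading.

    A path that does not exist is also reported, because os.access
    returns False for it.
    """
    return [path for path in log_paths if not os.access(path, os.R_OK)]


# Hypothetical path, for illustration only.
for path in unreadable_logs(["/path/to/database/logs/log_0003"]):
    print(f"Not readable by the current user: {path}")
```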

Resource utilization and availability

CDC Replication is frequently installed on the same server as the database from which or to which it is replicating. For this reason, it is important to ensure that the memory allocated to CDC Replication is physically available on the machine. Some databases are configured by default to use all available memory on the machine. Such a configuration will not work for CDC Replication, because it leaves no memory with which to run. At least the amount of memory allocated to IBM® Data Replication must be withheld from the database to ensure that CDC Replication can run.
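The sizing rule above is plain subtraction, sketched here with hypothetical figures. The function name max_database_memory_mb is illustrative, and the os_reserved_mb parameter is an added assumption (headroom for the operating system) that the text above does not mention.

```python
def max_database_memory_mb(physical_mb, cdc_allocated_mb, os_reserved_mb=0):
    """Return the most memory the database can be given, in megabytes.

    physical_mb: physical memory installed on the machine.
    cdc_allocated_mb: memory allocated to CDC Replication.
    os_reserved_mb: optional headroom for the operating system
        (an assumption of this sketch, not a documented requirement).
    """
    remaining = physical_mb - cdc_allocated_mb - os_reserved_mb
    if remaining <= 0:
        raise ValueError("No memory is left for the database with these settings")
    return remaining


# Hypothetical machine: 64 GB physical, 8 GB for CDC Replication, 4 GB headroom.
print(max_database_memory_mb(65536, 8192, 4096))  # prints 53248
```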

Symptoms of resource starvation include many variations on CDC Replication failing due to out of memory conditions, communications failures, very high latency, timeout errors, and others.

Change management

Sometimes referred to as schema evolution, change management refers to the necessity of planning changes to the structure of database tables that CDC Replication is replicating and coordinating those changes with the operation of CDC Replication to ensure that the changes do not disrupt replication.

The database and CDC Replication must share the same understanding of the structure of the tables being replicated. Without a shared understanding, CDC Replication will interpret the table data incorrectly, and thereby replicate that data incorrectly. CDC Replication endeavours to protect users from potential data loss or corruption resulting from uncoordinated table structure changes, but it is not always able to do so. In order to minimize recovery efforts resulting from uncoordinated table structure changes, it is a best practice to follow the change management procedures appropriate to your database. Coordinating change management between the database and CDC Replication will ensure smooth continuity of replication with minimal effort. Please note that change management practices apply to the tables in both source and target databases.

Recognizing that some table structure changes are inadvertently performed, tech notes are also available to assist you in recovering from uncoordinated table structure changes.

IBM Data Replication continues replication without interruption during an index rebuild, table rebuild, or index reorganization operation that is unrelated to a table structure change.

Capture table-based configuration mode

When CDC Replication interacts with your database by reading Microsoft SQL Server CDC capture tables, it creates a dependency on your database.

This dependency appears in several ways:
  • Changed data reading process
  • Capture table Retention policy

Changed data reading process

CDC Replication reads the changed data from the capture table, which is created when Microsoft SQL Server CDC is enabled on the table. If the user is the database owner, IIDR CDC enables Microsoft SQL Server CDC on the table using the default values. If Microsoft SQL Server CDC has already been enabled on the table, IIDR CDC uses the table exactly as it is. Refer to the Microsoft SQL Server CDC documentation for more information about policies and cleanup procedures.

Capture table Retention policy

Because the tables mapped in IIDR CDC depend on the capture tables, you must plan how often capture table cleanup runs and how long the change data is retained. For most CDC Replication engines, use the dmshowlogdependency command to determine the minimum LSN that CDC Replication still requires in the capture tables.
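The comparison implied by this policy can be sketched as follows. This is an illustration under stated assumptions, not a documented procedure: the function names and LSN values are hypothetical, the LSNs are assumed to be available as hexadecimal strings, and the cleanup is assumed to remove only rows whose LSN is strictly below the chosen boundary. Parsing the dmshowlogdependency output is not shown.

```python
def parse_lsn(hex_lsn):
    """Convert a hexadecimal LSN string, with or without '0x', to an int."""
    text = hex_lsn.strip()
    if text.lower().startswith("0x"):
        text = text[2:]
    return int(text, 16)


def cleanup_is_safe(cleanup_boundary_lsn, min_required_lsn):
    """Return True if cleaning up below the boundary keeps all required rows.

    Assumes rows with an LSN strictly below cleanup_boundary_lsn are removed,
    so the boundary must not exceed the minimum LSN still required.
    """
    return parse_lsn(cleanup_boundary_lsn) <= parse_lsn(min_required_lsn)


# Hypothetical LSN values, for illustration only.
min_required = "0x0000002A000001C00001"
print(cleanup_is_safe("0x0000002A000001B80003", min_required))
print(cleanup_is_safe("0x0000002B0000000A0001", min_required))
```

In this example the first boundary lies below the minimum required LSN and is safe, while the second would remove change data that replication still needs.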