Kafka custom operation processor (KCOP) for the CDC Replication Engine for Kafka

By configuring a Kafka custom operation processor (KCOP), you can control the Kafka producer records that the CDC Replication Engine for Kafka writes.

A KCOP is a Java™ class that gives you control over the Kafka producer records that are written to Kafka topics in response to insert, update, and delete operations that occur on source database tables.

When a subscription is configured to use a KCOP, the CDC Replication Engine for Kafka passes the KCOP Avro generic records that represent the source operation. The KCOP returns zero or more Kafka producer records in response, and the CDC Replication Engine for Kafka writes these records to Kafka. The KCOP can determine every aspect of a Kafka producer record: the topic name and partition to which the record is written, the content of the key and value fields, and how many records to write to which topics in response to a given source operation. Using a KCOP can also remove the strict dependency on the Confluent Schema Registry Platform or the Hortonworks equivalent: some integrated KCOPs do not use a Confluent serializer, so subscriptions that use them do not require a schema registry.
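The contract described above can be sketched in Java. Everything in this example is an illustrative stand-in, not the product's actual KCOP API: the class, record, and method names are hypothetical, and the `Record` class is a minimal substitute for Kafka's `ProducerRecord`. It shows the key idea that one source operation can yield zero, one, or several producer records, with the KCOP choosing each record's topic, partition, key, and value.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the KCOP contract: for each source operation the
// engine passes in, the processor returns zero or more producer records.
// All names here are illustrative, not the actual product API.
public class KcopSketch {

    // Minimal stand-in for org.apache.kafka.clients.producer.ProducerRecord.
    static final class Record {
        final String topic;
        final Integer partition; // null lets the producer pick the partition
        final byte[] key;
        final byte[] value;
        Record(String topic, Integer partition, byte[] key, byte[] value) {
            this.topic = topic; this.partition = partition;
            this.key = key; this.value = value;
        }
    }

    // Stand-in for the Avro generic record describing one source operation.
    static final class SourceOperation {
        final String table;    // source table name
        final String opType;   // "INSERT", "UPDATE", or "DELETE"
        final byte[] keyImage; // serialized key columns
        final byte[] rowImage; // serialized after-image (null for deletes)
        SourceOperation(String table, String opType, byte[] keyImage, byte[] rowImage) {
            this.table = table; this.opType = opType;
            this.keyImage = keyImage; this.rowImage = rowImage;
        }
    }

    // The KCOP decides topic, partition, key, value, and record count.
    static List<Record> createProducerRecords(SourceOperation op) {
        String topic = "cdc." + op.table.toLowerCase();
        switch (op.opType) {
            case "INSERT":
            case "UPDATE":
                // One record carrying the row's after-image.
                return Collections.singletonList(
                        new Record(topic, null, op.keyImage, op.rowImage));
            case "DELETE":
                // A tombstone (null value) plus an audit copy on a second
                // topic: one source operation, multiple producer records.
                return Arrays.asList(
                        new Record(topic, null, op.keyImage, null),
                        new Record(topic + ".audit", null, op.keyImage, op.keyImage));
            default:
                // Returning zero records filters the operation out entirely.
                return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        SourceOperation del = new SourceOperation("CUSTOMERS", "DELETE",
                "k1".getBytes(), null);
        System.out.println(createProducerRecords(del).size()); // prints 2
    }
}
```

The zero-record case is what allows a KCOP to act as a filter, and the multi-record case is what enables patterns such as audit topics or fan-out to several consumers.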

The CDC Replication Engine for Kafka includes integrated KCOPs that you can run by entering the fully qualified class name and, if needed, parameters. If you want to alter the behavior of an integrated KCOP or create an entirely new one, the CDC Replication Engine for Kafka also supports user-defined KCOPs.

Whether you use integrated or user-defined KCOPs, you receive the same guaranteed data delivery semantics as with default replication. The CDC Replication Engine for Kafka maintains its bookmark so that only records that are explicitly confirmed as written by Kafka are considered committed, and therefore the engine never skips operations. This behavior holds even across multiple replication sessions, where a replication session is a subscription in Mirror/Active state. Even after an abnormal termination (for example, if the Kafka cluster went down), when the subscription is next started as Mirror/Active the CDC Replication Engine for Kafka resends any unconfirmed replicated source operations and waits until Kafka confirms them before it advances the bookmark that marks the point to which replication is complete. This is the data delivery guarantee.
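The bookmark rule can be sketched as follows. This is an illustrative model under stated assumptions, not the engine's internal implementation: source operations are numbered by a hypothetical sequence value, and the bookmark advances only across a contiguous prefix of confirmed operations, so an unconfirmed earlier operation blocks advancement and is resent on restart.

```java
import java.util.Set;
import java.util.TreeSet;

// Hedged sketch of the bookmark rule described above: the bookmark advances
// only past source operations whose Kafka writes have all been confirmed,
// so nothing is skipped across replication sessions. The classes and logic
// here are illustrative, not the engine's internals.
public class BookmarkSketch {
    private final Set<Long> confirmed = new TreeSet<>();
    private long bookmark = 0; // all operations <= bookmark are durable in Kafka

    // Called when Kafka acknowledges every producer record for operation seq.
    public void confirm(long seq) {
        confirmed.add(seq);
        // Advance only across a contiguous confirmed prefix; a gap (an
        // unconfirmed earlier operation) blocks the bookmark.
        while (confirmed.remove(bookmark + 1)) {
            bookmark++;
        }
    }

    // On restart after an abnormal termination, replication resumes here:
    // operations after the bookmark are resent, even ones Kafka may have
    // partially received before the failure.
    public long resumeFrom() {
        return bookmark + 1;
    }

    public static void main(String[] args) {
        BookmarkSketch b = new BookmarkSketch();
        b.confirm(1);
        b.confirm(3);            // out-of-order confirmation: 2 still pending
        System.out.println(b.resumeFrom()); // prints 2
        b.confirm(2);            // gap filled; bookmark moves past 3 as well
        System.out.println(b.resumeFrom()); // prints 4
    }
}
```

The same rule covers the multi-record case: when a KCOP emits several producer records for one operation, `confirm` is called for that operation only after Kafka has confirmed all of them.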

If a KCOP specifies that a given source operation should result in multiple Kafka producer records being written, all records are confirmed before the CDC Replication Engine for Kafka considers replication of the operation to be successful.

For a subscription in Mirror/Active state, all data written to a given topic/partition pair is written in order. For example, if records from different source tables in a subscription are sent to the same topic/partition, they appear in that topic/partition in the same order in which the operations occurred on the source database.
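One practical consequence for a KCOP author is that records which must stay in source order should be routed deterministically to the same partition, typically by hashing a stable key. The sketch below is illustrative only: it uses a simple hash rather than Kafka's actual default partitioner (which uses murmur2), and the key format is a hypothetical "table:primary-key" string.

```java
import java.nio.charset.StandardCharsets;

// Illustrative partition routing: every operation on the same source row
// maps to the same partition, so its insert/update/delete sequence is
// preserved there. Simple hash shown for clarity; Kafka's default
// partitioner uses murmur2.
public class PartitionSketch {
    static int partitionFor(String key, int numPartitions) {
        int hash = 0;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + (b & 0xFF);
        }
        return (hash & 0x7fffffff) % numPartitions; // non-negative index
    }

    public static void main(String[] args) {
        // Repeated operations on row 42 of CUSTOMERS always land together.
        int p1 = partitionFor("CUSTOMERS:42", 8);
        int p2 = partitionFor("CUSTOMERS:42", 8);
        System.out.println(p1 == p2); // prints true
    }
}
```

Conversely, a KCOP that spreads one row's operations across partitions gives up this ordering guarantee, since the guarantee applies per topic/partition pair, not per topic.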

Sample KCOPs are provided with the CDC Replication Engine for Kafka. You can extend or modify these samples to suit your environment.