Replicating multibyte (MBCS) and double-byte (DBCS) character data

CDC Replication replicates character data across a wide variety of encodings. It automatically converts the data from the column encoding detected on the source to the column encoding detected on the target. This includes multibyte character data, such as Japanese, Chinese, or Korean text, whose characters cannot all be represented in a single byte. The most common multibyte character set (MBCS) implementation is the double-byte character set (DBCS).
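The following Python sketch illustrates what such a conversion involves at the byte level. It uses Python's standard `shift_jis` and `utf-8` codecs as stand-ins for a source column encoding and a target column encoding. It does not show how CDC Replication itself performs the conversion, and the function name `convert_column_value` is hypothetical.

```python
# Hypothetical illustration of source-to-target encoding conversion.
# This is not the CDC Replication implementation.


def convert_column_value(raw: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Decode bytes using the source encoding, then re-encode them for the target."""
    text = raw.decode(source_encoding)   # source bytes -> Unicode characters
    return text.encode(target_encoding)  # Unicode characters -> target bytes


# "日本" (Japan) as it might be stored in a Shift-JIS column on the source.
source_bytes = "日本".encode("shift_jis")
target_bytes = convert_column_value(source_bytes, "shift_jis", "utf-8")

print(len("日本"), "characters")
print(len(source_bytes), "bytes in Shift-JIS")  # two bytes per character (DBCS)
print(len(target_bytes), "bytes in UTF-8")      # three bytes per character
```

The number of characters is unchanged, but the byte length of the value differs between the two encodings.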

By default, CDC Replication assumes that the data stored in a character-capable column is in the encoding associated with that column type. For instance, if your database is set to use Shift-JIS, then data stored in CHAR and VARCHAR columns is assumed to be in Shift-JIS. However, CDC Replication is concerned only with the encoding of the data, not with the encoding of the column's storage type. This flexibility allows the product to handle situations where the encoding of the data does not match the encoding specified for the column in the database. CDC Replication determines whether the detected encoding of a given column can be overridden. Where an override is permitted, you can use it to specify the encoding that you know the data is actually in.
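A minimal Python sketch of the override rule follows, under the assumption that an override, when present, simply replaces the detected encoding when the column data is decoded. The function `effective_encoding` and its parameters are hypothetical and are not part of any CDC Replication API. The example simulates a column whose detected encoding is ISO-8859-1 (Latin-1) but whose data was actually written in Shift-JIS.

```python
from typing import Optional


def effective_encoding(detected: str, override: Optional[str] = None) -> str:
    """Return the encoding used to interpret a column's data.

    Hypothetical rule: a user-specified override, if any, takes precedence
    over the encoding detected from the column definition.
    """
    return override if override is not None else detected


# Shift-JIS bytes stored in a column that the database reports as Latin-1.
stored = "日本".encode("shift_jis")

# Without an override, the bytes are misread as Latin-1 characters (mojibake).
wrong = stored.decode(effective_encoding("latin-1"))

# With an override that states the actual encoding, the text is recovered.
right = stored.decode(effective_encoding("latin-1", override="shift_jis"))

print(repr(wrong))
print(repr(right))
```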
Restrictions:
  • Encoding overrides are not permitted for the CDC Replication Engine for FlexRep.
  • For DataStage and Kafka targets, the target encoding is UTF-8 for all character data. For Kafka targets, you can override the encoding to be binary.
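To make the Kafka restriction concrete, the following hypothetical Python sketch shows the two possible outcomes for a character column sent to a Kafka target: by default the value is converted to UTF-8, and with a binary override the original bytes are passed through unchanged. The function `kafka_payload` and its `binary_override` flag are illustrative only and do not correspond to an actual CDC Replication setting name.

```python
def kafka_payload(raw: bytes, source_encoding: str, binary_override: bool = False) -> bytes:
    """Hypothetical model of how a character column value reaches a Kafka target."""
    if binary_override:
        # Binary override: the bytes are delivered exactly as stored on the source.
        return raw
    # Default: all character data is delivered to Kafka as UTF-8.
    return raw.decode(source_encoding).encode("utf-8")


value = "한국어".encode("euc_kr")  # Korean text stored in an EUC-KR column

as_utf8 = kafka_payload(value, "euc_kr")
as_binary = kafka_payload(value, "euc_kr", binary_override=True)
```

With the binary override, a Kafka consumer receives the source bytes and must know the source encoding to interpret them as text.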

This functionality applies not only to standard character-capable column types such as CHAR and VARCHAR, but also to traditionally Unicode-capable column types such as NCHAR and NVARCHAR, to many traditionally binary column types, and to many large object (LOB) column types, whether or not these types are traditionally considered character-based. CDC Replication treats all of them as capable of holding character data. To provide the greatest flexibility, and where the limitations of the database permit, CDC Replication removes the distinction between the data itself and the data type that the database uses to contain it.

In some situations, you may want to replicate the data exactly as is, with no change to its encoding. In these situations, you can designate the column as binary, and its data is replicated without conversion. A column designated as binary must also be mapped to a binary column.
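The following hypothetical Python sketch models the binary designation: values in a column designated as binary bypass conversion, and a mapping check rejects a binary-designated column mapped to a non-binary target column. The `ColumnMapping` class, its fields, and `replicate_value` are assumptions made for illustration, not CDC Replication APIs.

```python
from dataclasses import dataclass


@dataclass
class ColumnMapping:
    """Hypothetical source-to-target column mapping."""
    source_column: str
    target_column: str
    source_is_binary: bool           # True if the source column is designated binary
    target_is_binary: bool           # True if the target column is a binary column
    source_encoding: str = "utf-8"   # used only for character data
    target_encoding: str = "utf-8"   # used only for character data

    def validate(self) -> None:
        # A binary-designated column must be mapped to a binary column.
        if self.source_is_binary and not self.target_is_binary:
            raise ValueError(
                f"{self.source_column} is designated binary; "
                f"{self.target_column} must be a binary column"
            )


def replicate_value(mapping: ColumnMapping, raw: bytes) -> bytes:
    mapping.validate()
    if mapping.source_is_binary:
        return raw  # replicated as is, with no encoding conversion
    return raw.decode(mapping.source_encoding).encode(mapping.target_encoding)
```

For example, `replicate_value(ColumnMapping("DOC", "DOC", True, True), data)` returns `data` unchanged, while mapping the same binary-designated column to a non-binary target raises `ValueError`.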

Encoding conversion can increase the workload on your source or target servers. A subscription-level preference lets you specify where that workload is incurred: on the source or on the target.
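The effect of this preference can be pictured with a small hypothetical Python model. The conversion step runs either in a source-side stage or in a target-side stage, and the target receives the same bytes either way; only the server doing the work changes. The `ConversionSite` enum and the `source_stage`, `target_stage`, and `replicate` functions are illustrative names, not the product's preference names.

```python
from enum import Enum


class ConversionSite(Enum):
    SOURCE = "source"
    TARGET = "target"


def convert(raw: bytes, src_enc: str, tgt_enc: str) -> bytes:
    return raw.decode(src_enc).encode(tgt_enc)


def source_stage(raw: bytes, site: ConversionSite, src_enc: str, tgt_enc: str) -> bytes:
    """Runs on the source server before the data is sent."""
    if site is ConversionSite.SOURCE:
        return convert(raw, src_enc, tgt_enc)  # the source server does the work
    return raw  # forward the bytes unconverted


def target_stage(received: bytes, site: ConversionSite, src_enc: str, tgt_enc: str) -> bytes:
    """Runs on the target server after the data is received."""
    if site is ConversionSite.TARGET:
        return convert(received, src_enc, tgt_enc)  # the target server does the work
    return received  # already converted on the source


def replicate(raw: bytes, site: ConversionSite, src_enc: str, tgt_enc: str) -> bytes:
    sent = source_stage(raw, site, src_enc, tgt_enc)
    return target_stage(sent, site, src_enc, tgt_enc)


row = "中文".encode("gb2312")
on_source = replicate(row, ConversionSite.SOURCE, "gb2312", "utf-8")
on_target = replicate(row, ConversionSite.TARGET, "gb2312", "utf-8")
```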

CDC Replication also provides an upgrade process for subscriptions that use the older implementations of MBCS support found in CDC Replication version 6.3 and earlier. In Management Console, you can quickly convert these subscriptions to the auto-encoding mode for MBCS data that is available in CDC Replication version 6.5 and later.

Note: When you override default character encodings for a database, the specified encoding must be one that is supported by the database itself.