IBM Support

Mixed EBCDIC (DBCS) Conversion Considerations with Unicode

Troubleshooting


Problem

The ODBC and OLE DB providers in the IBM i Access Client Solutions Windows Application Package, and the Java toolbox JDBC driver, might convert some mixed EBCDIC (DBCS) characters to a different character in Unicode, which makes round-trip conversions difficult. This problem can also affect other IBM i Access Client Solutions functions, such as data transfer.

Resolving The Problem

The IBM i Access Client Solutions Windows Application Package converts client-side Unicode to a mixed EBCDIC code page by using both the IBM conversion tables and the Microsoft Windows conversion tables. The IBM Unicode conversion does not match the Microsoft Unicode conversion for all characters because the two platforms use different standards. IBM i Access uses the Microsoft conversion to best maintain character conversion for use in a Microsoft Windows environment. As a result, some characters do not convert to the expected character, which makes a round-trip conversion difficult. Commonly affected characters include the ¥ (yen), the full-width hyphen, the half-width hyphen, and the CJK Unified Ideograph ranges F9D6 -> FEFE and 8140 -> A0FE. (The Unicode range E000 -> F8FF is the "Private Use Area".)
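The effect of two conversion tables that disagree on a single character can be sketched in Python. This is only an illustration under stated assumptions: Python's `cp932` codec stands in for the Microsoft table, and Python's `shift_jis` codec stands in for a table that follows a different standard. Neither is the IBM conversion table, and the helper name `round_trip` is hypothetical.

```python
# Minimal sketch: encode with one table, decode with another.
# Assumption: "cp932" approximates the Microsoft mapping and "shift_jis"
# is only a stand-in for a differing, non-Microsoft mapping.

def round_trip(text: str, encode_with: str, decode_with: str) -> str:
    """Encode text with one codec and decode the bytes with another."""
    return text.encode(encode_with).decode(decode_with)


FULLWIDTH_HYPHEN = "\uff0d"  # FULLWIDTH HYPHEN-MINUS

# Same table in both directions: the character survives.
same_table = round_trip(FULLWIDTH_HYPHEN, "cp932", "cp932")

# Different tables in each direction: the same byte pair is read
# as a different Unicode character.
mixed_tables = round_trip(FULLWIDTH_HYPHEN, "cp932", "shift_jis")

print(f"same table:   U+{ord(same_table):04X}")
print(f"mixed tables: U+{ord(mixed_tables):04X}")
```

When the two directions of a conversion use tables that differ for a character, the value that comes back is not the value that was sent, which is the round-trip problem described above.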

To avoid this issue, users can convert their database table columns to use Unicode or UTF-8 rather than mixed EBCDIC. This change avoids most of the conversion problems that can be encountered with mixed EBCDIC and Unicode conversions.


Specific Insert Example with Yen and ODBC

For Unicode-to-SBCS conversions, a simple conversion table (retrieved from IBM i) is used. Unicode u005C is converted to xE0 in CCSID 1027 because, in Unicode, a u005C is a backslash, and in CCSID 1027 xE0 is a backslash.

For Unicode-to-mixed EBCDIC conversions, the Windows API WideCharToMultiByte is first used to convert the Unicode to Windows Mixed ASCII. The result is then converted by using the Mixed ASCII-to-Mixed EBCDIC conversion tables supplied by IBM i.

During the first phase, WideCharToMultiByte converts the u005C (backslash) to x5C (yen sign) because there is no backslash in MS-932. During the second phase, the IBM i conversion table converts the x5C (yen sign) to xB2 (yen sign) in CCSID 5035.
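The two phases can be sketched in Python to show where the backslash becomes a yen sign. This is a simplified illustration, not the product's implementation: Python's `cp932` codec stands in for WideCharToMultiByte with code page 932, and the two dictionaries are hypothetical one-entry tables that contain only the values quoted in this document (x5C -> xB2 for CCSID 5035, and u005C -> xE0 for CCSID 1027).

```python
# Hypothetical one-entry stand-ins for the IBM i conversion tables.
# They hold only the values quoted in the text above.
MIXED_ASCII_TO_EBCDIC_5035 = {0x5C: 0xB2}  # x5C (yen in MS-932) -> xB2 (yen)
UNICODE_TO_EBCDIC_1027 = {0x005C: 0xE0}    # u005C (backslash) -> xE0 (backslash)


def two_phase_to_5035(ch: str) -> bytes:
    """Unicode -> Windows mixed ASCII -> mixed EBCDIC (two lookups)."""
    # Phase 1: assumed equivalent of WideCharToMultiByte for code page 932.
    mixed_ascii = ch.encode("cp932")
    # Phase 2: mixed ASCII to mixed EBCDIC by table lookup.
    return bytes(MIXED_ASCII_TO_EBCDIC_5035[b] for b in mixed_ascii)


def one_step_to_1027(ch: str) -> bytes:
    """Unicode -> SBCS EBCDIC with a single table lookup."""
    return bytes([UNICODE_TO_EBCDIC_1027[ord(ch)]])


backslash = "\\"  # U+005C
print("two-phase (CCSID 5035):", two_phase_to_5035(backslash).hex())
print("one-step  (CCSID 1027):", one_step_to_1027(backslash).hex())
```

The intermediate byte x5C is the point where the meaning changes: Unicode treats the code point as a backslash, while the second table treats the byte as a yen sign.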

Note: This behavior changes with APAR SE14513. The APAR provides a code change to convert Unicode directly to a mixed EBCDIC CCSID with a one-step conversion. This conversion works similarly to the SBCS conversion. A Unicode backslash is correctly converted to an EBCDIC backslash rather than the yen sign.

Unicode CJK Range to DBCS

The characters in the "CJK unified Ideograph" Unicode area (http://www.unicode.org/charts/PDF/U4E00.pdf) are not mapped to the similar characters in EBCDIC DBCS code pages. IBM defines these characters as being in the user-defined range of the Unicode standard. Conversions for characters in this range on non-IBM Unicode platforms (for example, Microsoft, Sun's JVM, or Linux) do not convert to the expected character. To resolve the issue, programmers must use an IBM Unicode code page on the client (such as the IBM JVM) or change the database table to use a Unicode or UTF-8 CCSID.

Unicode surrogate characters

Unicode implements some DBCS characters as surrogates, and the Access for Windows converters are not designed to handle Unicode surrogates. An example is the Japanese CCSID 1399 x'B7' range of characters. For example, attempting to convert CCSID 1399 x'B79B' to Unicode results in a substitution character or a conversion error. There are no plans to enhance the IBM i Access Client Solutions Windows Application Package converters to support Unicode surrogates. The solution is to use Unicode columns or an IBM i Access Client Solutions client that supports Unicode surrogates.
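To show what a surrogate is in practice, the following Python sketch lists the UTF-16 code units of a character and flags text that needs surrogates. The function names are hypothetical, and U+20B9F is used only as an arbitrary character outside the Basic Multilingual Plane; it is not asserted to be the Unicode mapping of CCSID 1399 x'B79B'.

```python
from typing import List


def utf16_units(ch: str) -> List[int]:
    """Return the UTF-16 code units that represent the given text."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def needs_surrogates(text: str) -> bool:
    """True if any character lies above U+FFFF and so needs a surrogate pair."""
    return any(ord(c) > 0xFFFF for c in text)


bmp_char = "\u65e5"             # a CJK ideograph inside the BMP
supplementary = "\U00020b9f"    # arbitrary example outside the BMP

for label, ch in (("BMP", bmp_char), ("supplementary", supplementary)):
    units = " ".join(f"{u:04X}" for u in utf16_units(ch))
    print(f"{label}: U+{ord(ch):04X} -> UTF-16 units {units}, "
          f"needs surrogates: {needs_surrogates(ch)}")
```

A converter that maps one 16-bit unit at a time has no single unit to emit for such a character, which is consistent with the substitution character or conversion error described above. A check like `needs_surrogates` could be used to screen data before it is sent through a client that does not support surrogates.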


Historical Number

349034081

Document Information

Modified date:
01 July 2021

UID

nas8N1015955