Character sets and code pages

Even within a single encoding scheme, multiple coded character set identifiers (CCSIDs) exist, and the same code point can represent different characters in different CCSIDs. Furthermore, a byte in a character string does not necessarily represent a character from a single-byte character set (SBCS); it can instead be one byte of a multi-byte character.

The following figure shows how a typical character set might map to different code points in two different code pages.

Begin figure description. A table shows an ASCII code page and another table shows an EBCDIC code page. Character set ss1 maps to different code points for each code page. End figure description. — Figure 1. Code page mappings for character set ss1 in ASCII and EBCDIC
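The mapping idea can be sketched in Python, using its built-in codecs as stand-ins for code pages. The pairing of codec names with CCSIDs here (cp037 for an EBCDIC code page, ascii for 7-bit ASCII, latin-1 for ISO 8859-1) is an assumption made for illustration, and the helper name code_points is hypothetical.

```python
def code_points(text, codec):
    """Return the encoded bytes of text under a codec as uppercase hex."""
    return text.encode(codec).hex().upper()

# The same characters land on different code points in each code page.
for ch in ("A", "1"):
    print(ch, "ASCII:", code_points(ch, "ascii"),
          "EBCDIC (cp037):", code_points(ch, "cp037"))

# The same code point can also be a different character in each code page.
raw = bytes([0xC1])
print(raw.decode("cp037"))    # 'A' in the EBCDIC code page
print(raw.decode("latin-1"))  # 'Á' in ISO 8859-1
```

Interpreting a byte string correctly therefore depends on knowing which code page it was encoded with; the bytes alone do not identify the characters.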

For Unicode, there is only one CCSID for UTF-8 (CCSID 1208) and only one CCSID for UTF-16. The following figure shows how the first 127 single-byte code points for UTF-8 are the same as those for ASCII with a CCSID of 367. For example, in both UTF-8 and ASCII CCSID 367, an A is X'41' and a 1 is X'31'.

Begin figure description. A table that shows the code point mapping for the first 127 code points for UTF-8. End figure description. — Figure 2. Code point mapping for the first 127 code points for UTF-8 single-byte characters (CCSID 1208)
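The overlap between UTF-8 and 7-bit ASCII can be checked with a short Python sketch. It assumes that Python's ascii and utf-8 codecs are reasonable stand-ins for CCSID 367 and CCSID 1208; the function name same_in_ascii_and_utf8 is hypothetical.

```python
def same_in_ascii_and_utf8(code_point):
    """True if the code point encodes to the same single byte in both codecs."""
    ch = chr(code_point)
    return ch.encode("ascii") == ch.encode("utf-8") == bytes([code_point])

# Every 7-bit value, X'00' through X'7F', is one identical byte in both.
print(all(same_in_ascii_and_utf8(cp) for cp in range(0x80)))

print("A".encode("utf-8").hex().upper())  # 41
print("1".encode("utf-8").hex().upper())  # 31

# Outside that range, a UTF-8 character needs more than one byte,
# so a single byte is no longer a whole character.
print("é".encode("utf-8").hex().upper())  # C3A9
```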

The following figure compares how some sample characters map to UTF-16 and UTF-8 code points. Because the eighth note musical symbol is a supplementary character, UTF-16 represents it with two 2-byte code units (a surrogate pair).

Begin figure description. A table that shows the UTF-8 and UTF-16 code points for four sample characters. End figure description. — Figure 3. A comparison of how some UTF-8 and UTF-16 code points map to some sample characters
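A comparison of this kind can be reproduced with the following Python sketch. The sample characters are chosen here for illustration and are not necessarily the four in the figure; the eighth note is taken to be U+1D160, and utf-16-be is used so that no byte order mark is added. The helper name encodings_for is hypothetical.

```python
def encodings_for(ch):
    """Return (UTF-8 hex, UTF-16 big-endian hex) for a single character."""
    utf8 = ch.encode("utf-8").hex().upper()
    utf16 = ch.encode("utf-16-be").hex().upper()
    return utf8, utf16

samples = {
    "LATIN CAPITAL LETTER A": "A",
    "MUSICAL SYMBOL EIGHTH NOTE": "\U0001D160",  # supplementary character
}

for name, ch in samples.items():
    utf8, utf16 = encodings_for(ch)
    print(f"{name}: UTF-8 X'{utf8}', UTF-16 X'{utf16}'")

# The supplementary character takes 4 bytes in UTF-8 and two 2-byte
# code units (X'D834' and X'DD60') in UTF-16.
```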