UTF-8 support
UTF stands for UCS (Unicode) Transformation Format. The UTF-8 encoding can be used to
represent any Unicode character. For the Unicode range shown in Table 1 (0000-FFFF), the corresponding
UTF-8 encoding is a 1-, 2-, or 3-byte sequence, depending on the character's numeric value. Table 1 shows the mapping
between Unicode and UTF-8. See RFC 2279 and RFC 2253 for more information about UTF-8.
Unicode range (hexadecimal) | UTF-8 octet sequence (binary) |
---|---|
0000-007F | 0xxxxxxx |
0080-07FF | 110xxxxx 10xxxxxx |
0800-FFFF | 1110xxxx 10xxxxxx 10xxxxxx |
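
The mapping in the table can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the LDAP server: the function name `encode_utf8_bmp` is hypothetical, it covers only the range 0000-FFFF listed in the table, and it rejects the surrogate range D800-DFFF, which is an assumption beyond what the table states.

```python
def encode_utf8_bmp(code_point: int) -> bytes:
    """Encode one code point in 0000-FFFF using the three bit patterns in the table."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("only the range 0000-FFFF from the table is covered")
    if 0xD800 <= code_point <= 0xDFFF:
        # Assumption beyond the table: surrogates are not valid characters to encode.
        raise ValueError("surrogate code points cannot be encoded")
    if code_point <= 0x7F:
        # 0000-007F: 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 0080-07FF: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    # 0800-FFFF: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([0xE0 | (code_point >> 12),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])
```

For example, `encode_utf8_bmp(0x41)` yields the single byte `41`, `encode_utf8_bmp(0xE9)` yields `C3 A9`, and `encode_utf8_bmp(0x20AC)` yields `E2 82 AC`, which matches Python's built-in `chr(cp).encode("utf-8")` for the same code points.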
The LDAP Version 3 protocol specifies that all data exchanged between LDAP clients and servers be UTF-8. The LDAP server supports UTF-8 data exchange as part of its Version 3 protocol support.
Note: For UTF-8 data stored in an LDAP server's TDBM and GDBM (when Db2®-based) backends, collation for
single-byte UTF-8 characters is relative to the server's locale. For multi-byte UTF-8 characters,
collation is relative to the numeric value of the equivalent Unicode character.
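
As a rough illustration of the multi-byte case in the note, the Python sketch below orders a few multi-byte characters by their Unicode numeric values. It does not reproduce the server's collation code and does not model the locale-dependent ordering of single-byte characters. The helper name `code_point_key` and the sample values are hypothetical.

```python
def code_point_key(text: str) -> list[int]:
    """Sort key: the Unicode numeric value of each character in the string."""
    return [ord(ch) for ch in text]

# Sample multi-byte characters: U+20AC, U+00E9, U+0436
values = ["€", "é", "ж"]

# Ordering by Unicode numeric value
by_code_point = sorted(values, key=code_point_key)

# Ordering by the raw UTF-8 byte sequences, for comparison
by_utf8_bytes = sorted(values, key=lambda s: s.encode("utf-8"))
```

Both orderings place U+00E9 first, then U+0436, then U+20AC, because byte-wise comparison of UTF-8 sequences preserves code point order.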