UTF-8 support

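UTF stands for UCS Transformation Format, where UCS is the Universal Coded Character Set (ISO/IEC 10646), whose character repertoire is the same as Unicode's. The UTF-8 encoding can represent any Unicode character. The length of a character's UTF-8 sequence depends on its numeric value: characters from U+0000 through U+FFFF are encoded as 1-, 2-, or 3-byte sequences, and characters above U+FFFF, which Table 1 does not cover, require 4 bytes under RFC 3629. Table 1 shows the mapping between Unicode and UTF-8. See RFC 2279 for more information about UTF-8 and RFC 2253 for its use in the string representation of LDAP distinguished names.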

Table 1. Mapping between Unicode and UTF-8
Unicode range (hexadecimal)	UTF-8 octet sequence (binary)
0000-007F	0xxxxxxx
0080-07FF	110xxxxx 10xxxxxx
0800-FFFF	1110xxxx 10xxxxxx 10xxxxxx
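Table 1 translates directly into code. The following Python sketch (the function name utf8_encode_bmp is hypothetical) selects the row for a code point and packs its bits into the x positions of that row. It also rejects the surrogate code points U+D800 through U+DFFF, which RFC 3629 excludes from UTF-8 even though they fall inside the 0800-FFFF row.

```python
def utf8_encode_bmp(code_point: int) -> bytes:
    """Encode one code point in U+0000..U+FFFF using the rows of Table 1."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("Table 1 covers only U+0000 to U+FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogates are not characters; RFC 3629 forbids encoding them.
        raise ValueError("surrogate code points cannot be encoded")
    if code_point <= 0x7F:
        # Row 1: 0xxxxxxx (7 payload bits)
        return bytes([code_point])
    if code_point <= 0x7FF:
        # Row 2: 110xxxxx 10xxxxxx (5 + 6 payload bits)
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    # Row 3: 1110xxxx 10xxxxxx 10xxxxxx (4 + 6 + 6 payload bits)
    return bytes([0xE0 | (code_point >> 12),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for cp in (0x41, 0xE9, 0x20AC):
    print(f"U+{cp:04X} -> {utf8_encode_bmp(cp).hex(' ')}")
# U+0041 -> 41
# U+00E9 -> c3 a9
# U+20AC -> e2 82 ac
```

The three sample characters (A, é, and the euro sign) land in rows 1, 2, and 3 respectively, so they produce 1-, 2-, and 3-byte sequences.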

The LDAP Version 3 protocol specifies that string data exchanged between LDAP clients and servers be encoded in UTF-8. The LDAP server supports UTF-8 data exchange as part of its Version 3 protocol support.
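On the client side this means converting string values to UTF-8 before they are sent and decoding received values strictly, so that malformed sequences are rejected rather than silently altered. The following Python sketch illustrates both directions. The helper names are hypothetical, and the assumption that local data is held in EBCDIC code page 037 (Python codec cp037) is for illustration only; substitute the code page your application actually uses.

```python
def to_ldap_utf8(local_octets: bytes, local_codec: str = "cp037") -> bytes:
    """Convert text held in a local code page to UTF-8 for an LDAP request.

    Hypothetical helper; cp037 (EBCDIC) is an illustrative default only.
    """
    return local_octets.decode(local_codec).encode("utf-8")


def from_ldap_utf8(octets: bytes) -> str:
    """Decode a string value received from the server.

    errors="strict" raises UnicodeDecodeError for any octet sequence
    that is not well-formed UTF-8.
    """
    return octets.decode("utf-8", errors="strict")


if __name__ == "__main__":
    print(to_ldap_utf8("cn=José".encode("cp037")))  # b'cn=Jos\xc3\xa9'
    try:
        # 0xC3 starts a 2-byte sequence, but 0x28 is not of the form 10xxxxxx.
        from_ldap_utf8(b"\xc3\x28")
    except UnicodeDecodeError as err:
        print("rejected:", err.reason)
```

Strict decoding is the conservative choice here: a lenient mode such as errors="replace" would substitute U+FFFD for bad bytes, which could make two different malformed values compare as equal.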

Note: For UTF-8 data stored in an LDAP server's TDBM and GDBM (when Db2®-based) backends, collation for single-byte UTF-8 characters is determined by the server's locale. For multi-byte UTF-8 characters, collation is based on the numeric value of the corresponding Unicode character.
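The note does not say how the backends implement this ordering. One general property of UTF-8 is relevant, however: comparing UTF-8 byte strings octet by octet gives the same order as comparing the underlying Unicode numeric values. The following Python sketch demonstrates only that property; it does not model the locale-dependent ordering of single-byte characters and is not a description of TDBM or GDBM internals. Both function names are hypothetical.

```python
def sort_by_code_point(values: list[str]) -> list[str]:
    # Python compares str objects code point by code point.
    return sorted(values)


def sort_by_utf8_bytes(values: list[str]) -> list[str]:
    # bytes objects compare octet by octet, with no decoding.
    return sorted(values, key=lambda s: s.encode("utf-8"))


sample = ["\u4e2d", "\u00fc", "\u20ac", "\u00e9"]
print(sort_by_code_point(sample) == sort_by_utf8_bytes(sample))  # True
```

For the sample, both functions return é (U+00E9), ü (U+00FC), € (U+20AC), 中 (U+4E2D), because the leading byte patterns in Table 1 (110xxxxx before 1110xxxx) grow with the code point ranges they encode.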