Specifying how DB2 calculates the length of a string
For example, consider the string Jürgen in UTF-8. This string consists of 6 characters. However, it takes 7 bytes of storage, because the character ü takes 2 bytes in UTF-8. You can specify whether you want DB2 to count the length as 6 or 7.
The key is to specify the size code unit that you want DB2 to use when calculating the length. A code unit is the minimal bit combination that can represent a character.
Procedure
To specify how DB2 calculates the length of a string:
- CHARACTER_LENGTH
- CLOB
- DBCLOB
- GRAPHIC
- LEFT
- LOCATE
- LOCATE_IN_STRING
- OVERLAY
- POSITION
- RIGHT
- SUBSTRING
- VARCHAR
- VARGRAPHIC
Options to specify unit of measurement:
- CODEUNITS16
- Specifies that DB2 is to count the length by 16-bit (or 2-byte) code units. For every character that is 2 bytes or less in the string, DB2 counts a length of 1.
- CODEUNITS32
- Specifies that DB2 is to count the length by 32-bit (or 4-byte) code units. For every character that is 4 bytes or less in the string, DB2 counts a length of 1. CODEUNITS32 always returns the same value as CODEUNITS16 unless you have supplementary characters.
- OCTETS
- Specifies that DB2 is to
count the length by bytes. For every byte in the string, DB2 counts a length of 1.
The OCTETS option is not available for all of the listed functions.
Example
The following two queries both return the value 6, because DB2 counts the string Jürgen as 6 characters. In the first query, CODEUNITS32 means that any character that is 4 bytes or less is counted as 1. In the second query, CODEUNITS16 means that any character that is 2 bytes or less is counted as 1. In both cases, the result is the same.
SELECT CHARACTER_LENGTH(NAME,CODEUNITS32)
FROM T1 WHERE NAME = 'Jürgen';
SELECT CHARACTER_LENGTH(NAME,CODEUNITS16)
FROM T1 WHERE NAME = 'Jürgen';
However the following two queries return the value 7,
because the string contains 7 bytes. In the first query, OCTETS means
that length is to be calculated in bytes. In the second query, the
LENGTH function always counts by bytes.SELECT CHARACTER_LENGTH(NAME,OCTETS)
FROM T1 WHERE NAME = 'Jürgen';
SELECT LENGTH(NAME)
FROM T1 WHERE NAME = 'Jürgen';
SET :POSITION = LOCATE_IN_STRING('Jürgen lives on Hegelstraße','ß',-1,CODEUNITS32);
-- search from end
The following statement sets the value of the host variable
POSITION to 6. DB2 starts at
position 1 and looks for the third occurrence of the character N.
In this case, OCTETS means that DB2 counts
the length by bytes. SET :POSITION = LOCATE_IN_STRING('WINNING','N',1,3,OCTETS);
Function | Result | Hexadecimal result value |
---|---|---|
|
|
X'4AC3BC' |
|
|
X'4AC3BC' |
|
|
X'4A20' (a truncated string) |
|
|
X'4AC3' (The letter ‘J' and a partial character)1 |
|
|
X'C3BC7267656E' |
|
|
X'C3BC7267656E' |
|
|
X'207267656E' (a truncated string) |
|
|
X'BC7267656E' (a partial character followed by ‘rgen')1 |
|
|
X'4AC3BC' |
|
|
X'4AC3BC' |
|
|
X'4A20' (a truncated string) |
|
|
X'4AC3' (a partial character) |
|
|
a zero-length string |
|
|
a zero-length string |
|