UVALID
If a character data item contains valid UTF-8 or UTF-16 data, the UVALID function returns zero. If the data item contains invalid UTF-8 or UTF-16 data, UVALID returns the position of the first invalid byte (for UTF-8 data) or encoding unit (for UTF-16 data).
The function type is integer.
- argument-1
- Must be of class alphabetic, alphanumeric, national or UTF-8.
- If argument-1 is of class alphabetic, alphanumeric or UTF-8, and it consists of valid UTF-8 encoded Unicode data, the returned value is zero.
- If argument-1 is of class alphabetic, alphanumeric or UTF-8, and it contains invalid UTF-8 encoded Unicode data, the returned value is the position of the first byte where the invalid UTF-8 data starts.
- If argument-1 is of class national, and it consists of valid UTF-16 encoded Unicode data, the returned value is zero.
- If argument-1 is of class national, and it contains invalid UTF-16 encoded Unicode data, the returned value is the position of the first UTF-16 encoding unit where the invalid UTF-16 data starts. This position is one plus the number of well-formed UTF-16 encoding units that precede the invalid data.
The following table shows which UTF-8 byte sequences are valid. A first byte in a range marked Valid is valid only when its dependency is met; otherwise the sequence is invalid.

Value Range | Dependency | Validity |
---|---|---|
x'00' - x'7F' | None | Valid |
x'80' - x'C1' | None | Invalid |
x'C2' - x'DF' | Followed by another byte that is in the range x'80' to x'BF' | Valid |
x'E0' | Followed by two more bytes: the second byte is in the range x'A0' to x'BF', and the third byte is in the range x'80' to x'BF' | Valid |
x'E1' - x'EC' | Followed by two more bytes, both in the range x'80' to x'BF' | Valid |
x'ED' | Followed by two more bytes: the second byte is in the range x'80' to x'9F', and the third byte is in the range x'80' to x'BF' | Valid |
x'EE' - x'EF' | Followed by two more bytes, both in the range x'80' to x'BF' | Valid |
x'F0' | Followed by three more bytes: the second byte is in the range x'90' to x'BF', and the third and fourth bytes are in the range x'80' to x'BF' | Valid |
x'F1' - x'F3' | Followed by three more bytes, all in the range x'80' to x'BF' | Valid |
x'F4' | Followed by three more bytes: the second byte is in the range x'80' to x'8F', and the third and fourth bytes are in the range x'80' to x'BF' | Valid |
x'F5' - x'FF' | None | Invalid |
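The UTF-8 rules can be read as a left-to-right scan over the bytes. The Python sketch below shows that reading. It returns 0 for valid data, or the 1-based position of the first byte of the first invalid sequence. It is not IBM's implementation. The names `_lead_rule` and `uvalid_utf8` are hypothetical, and the data item is modeled as a Python `bytes` value. The sketch also assumes one point the text does not spell out: when a multi-byte sequence is malformed or truncated, the reported position is that of the sequence's first (lead) byte.

```python
def _lead_rule(lead: int):
    """Hypothetical helper: for a lead byte, return (allowed range of the
    second byte, number of bytes that must follow), or None if the byte can
    never start a valid sequence."""
    if 0xC2 <= lead <= 0xDF:
        return (0x80, 0xBF), 1
    if lead == 0xE0:
        return (0xA0, 0xBF), 2
    if 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
        return (0x80, 0xBF), 2
    if lead == 0xED:
        return (0x80, 0x9F), 2
    if lead == 0xF0:
        return (0x90, 0xBF), 3
    if 0xF1 <= lead <= 0xF3:
        return (0x80, 0xBF), 3
    if lead == 0xF4:
        return (0x80, 0x8F), 3
    return None  # x'80'-x'C1' and x'F5'-x'FF'


def uvalid_utf8(data: bytes) -> int:
    """Return 0 if data is valid UTF-8, else the 1-based position of the
    lead byte of the first invalid sequence (an assumption, see above)."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead <= 0x7F:              # single-byte character
            i += 1
            continue
        rule = _lead_rule(lead)
        if rule is None:              # byte that cannot start a sequence
            return i + 1
        (lo, hi), count = rule
        if i + count >= n:            # truncated: too few bytes follow
            return i + 1
        if not lo <= data[i + 1] <= hi:
            return i + 1
        # Any remaining continuation bytes must be in x'80'-x'BF'
        if any(not 0x80 <= b <= 0xBF for b in data[i + 2:i + count + 1]):
            return i + 1
        i += count + 1
    return 0


print(uvalid_utf8(bytes.fromhex("4BC3A4666572")))  # 'Käfer' -> 0
print(uvalid_utf8(b"Abc\xc3"))                      # truncated at byte 4 -> 4
```

The special second-byte ranges for x'E0', x'ED', x'F0', and x'F4' reject overlong encodings, encoded surrogates, and values above U+10FFFF. That is why those lead bytes get their own rows in the table.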
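The following table shows which UTF-16 encoding units are valid and how many bytes each requires when converted to UTF-8.

Value Range | Dependency | Validity | Number of bytes if converted to UTF-8 |
---|---|---|---|
nx'0000' - nx'007F' | None | Valid | 1 |
nx'0080' - nx'07FF' | None | Valid | 2 |
nx'0800' - nx'D7FF' | None | Valid | 3 |
nx'D800' - nx'DBFF' | Followed by a second encoding unit in the range nx'DC00' to nx'DFFF' (a Unicode surrogate pair) | Valid | 4 (for the pair) |
nx'D800' - nx'DBFF' | Any other case | Invalid | Not applicable |
nx'DC00' - nx'DFFF' | Not preceded by an encoding unit in the range nx'D800' to nx'DBFF' | Invalid | Not applicable |
nx'E000' - nx'FFFF' | None | Valid | 3 |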
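The UTF-16 table above can be read as a simple scan over encoding units. The sketch below applies that reading. The Python code models a national data item as a list of 16-bit encoding units, taking four hexadecimal digits of an nx'...' literal per unit. It returns 0 for valid data. Otherwise it returns one plus the number of well-formed units that precede the first invalid one. The names `units_from_hex` and `uvalid_utf16` are illustrative only, not part of any COBOL or IBM API.

```python
def units_from_hex(hex_digits: str) -> list[int]:
    """Split the hex digits of an nx'...' literal into 16-bit encoding units."""
    return [int(hex_digits[k:k + 4], 16) for k in range(0, len(hex_digits), 4)]


def uvalid_utf16(units: list[int]) -> int:
    """Return 0 if units form valid UTF-16, else the 1-based position of
    the first invalid encoding unit."""
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: valid only as the first half of a surrogate pair
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return i + 1
        if 0xDC00 <= u <= 0xDFFF:
            # Low surrogate that is not preceded by a high surrogate
            return i + 1
        i += 1  # nx'0000'-nx'D7FF' and nx'E000'-nx'FFFF' stand alone
    return 0


# The national literals from Examples 2, 3, and 4 below
print(uvalid_utf16(units_from_hex("005400F6006200750072D858DC6B0073")))  # 0
print(uvalid_utf16(units_from_hex("0054D9C3006200750072D858DC6B0073")))  # 2
print(uvalid_utf16(units_from_hex("005400F60062DC010072D858DC6B0073")))  # 4
```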
Example 1
If A is an alphabetic or alphanumeric data item that contains the value x'4BC3A4666572' ('Käfer') in UTF-8 encoding, the returned value from UVALID(A) is 0.
Example 2
If B is a national data item that contains the value nx'005400F6006200750072D858DC6B0073' ('Töbur𦁫s') in UTF-16 encoding, the returned value from UVALID(B) is 0.
Example 3
If C is a national data item that contains the value nx'0054D9C3006200750072D858DC6B0073' in UTF-16 encoding, the returned value from UVALID(C) is 2. The high surrogate nx'D9C3' in the second encoding unit is not followed by a low surrogate (an encoding unit in the range nx'DC00' to nx'DFFF').
Example 4
If D is a national data item that contains the value nx'005400F60062DC010072D858DC6B0073' in UTF-16 encoding, the returned value from UVALID(D) is 4. The low surrogate nx'DC01' in the fourth encoding unit is not preceded by a high surrogate (an encoding unit in the range nx'D800' to nx'DBFF').
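Python's standard `bytes.decode` with the big-endian `"utf-16-be"` codec offers an independent cross-check of the national examples. This sketch only tests whether each literal is accepted or rejected. Python's decoder is not UVALID, and its error positions are not compared here. The sketch also checks the UTF-8 byte counts from the UTF-16 table for Example 2's data, where the surrogate pair takes 4 bytes. The helper name `strict_utf16_ok` is hypothetical.

```python
def strict_utf16_ok(hex_digits: str) -> bool:
    """True if Python's strict big-endian UTF-16 decoder accepts the bytes."""
    try:
        bytes.fromhex(hex_digits).decode("utf-16-be")
    except UnicodeDecodeError:
        return False
    return True


b = "005400F6006200750072D858DC6B0073"  # Example 2
c = "0054D9C3006200750072D858DC6B0073"  # Example 3: unpaired high surrogate
d = "005400F60062DC010072D858DC6B0073"  # Example 4: unpaired low surrogate

print(strict_utf16_ok(b), strict_utf16_ok(c), strict_utf16_ok(d))  # True False False

text = bytes.fromhex(b).decode("utf-16-be")
# nx'D858' nx'DC6B' is one surrogate pair, decoding to the code point U+2606B
print(len(text), hex(ord(text[5])))  # 7 0x2606b
# UTF-8 bytes per character: 1 + 2 + 1 + 1 + 1 + 4 + 1
print(len(text.encode("utf-8")))     # 11
```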