%SPLIT (Split String into Substrings)
%SPLIT(string {: separators { : *ALLSEP {: *NATURAL | *STDCHARSIZE}}})
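%SPLIT splits a string into an array of substrings. It returns a temporary array of the substrings.
%SPLIT can be used in calculation expressions wherever an array can be used, except with the following: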
- SORTA
- %ELEM
- %LOOKUP
- %SUBARR
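
Because %SPLIT cannot be used directly as the array operand of %LOOKUP, one possible workaround is to assign the result to a declared array first and search that array. The following is a minimal sketch; the names days and pos are illustrative and do not come from the reference text.

```rpgle
**FREE
// Illustrative names: days holds the substrings, pos receives the index
DCL-S days VARCHAR(10) DIM(7);
DCL-S pos  INT(10);

// Copy the temporary array returned by %SPLIT into a declared array
days = %SPLIT('Mon Tue Wed');

// %LOOKUP is now applied to the declared array, not to %SPLIT
pos = %LOOKUP('Tue' : days);
// pos = 2

*INLR = *ON;
RETURN;
```

The same approach should apply to SORTA, %ELEM, and %SUBARR, since they then operate on an ordinary array.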
The first operand is the string to be split. It can be alphanumeric, graphic, or UCS-2.
The second operand lists the separator characters.
- If it is not specified, or if *BLANK or *BLANKS is specified, %SPLIT splits at blanks.
- If it is specified and not *BLANK or *BLANKS:
  - It must have the same type and CCSID as the first operand.
  - If the length of the second operand is greater than 1, any of the characters in the second operand indicate the end of each substring.
    For example, %SPLIT('abc.def-ghi' : '.-') has two separator characters, '.' and '-', so it returns an array with three elements: ('abc','def','ghi').
The third operand can be *ALLSEP, indicating that every separator is considered to separate two substrings. When *ALLSEP is not specified, separators following other separators, leading separators, and trailing separators are ignored.
The last operand can be *NATURAL or *STDCHARSIZE:
- Specify *NATURAL to indicate that %SPLIT operates in CHARCOUNT NATURAL mode. The number of bytes in each character is considered when locating the separators.
  For example, if the string parameter is a UTF-8 string with the value '1á2ç3', and the separators parameter is a UTF-8 string with the value 'á', only 'á' is considered to be a separator. The result strings are '1' and '2ç3'.
- Specify *STDCHARSIZE to indicate that %SPLIT operates in CHARCOUNT STDCHARSIZE mode.
  In the previous example, with CHARCOUNT STDCHARSIZE mode, each byte of the separator is considered to be a separator character. Characters 'á' and 'ç' are 2-byte characters, x'C3A1' and x'C3A7'. The first bytes of 'á' and 'ç' are the same, x'C3', so the first byte of 'ç' is considered to be a separator. The result elements of %SPLIT are not all valid, since some UTF-8 characters are split incorrectly.
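
The CHARCOUNT NATURAL case described above can be sketched as follows. This is an illustration only: the names str, sep, and parts are hypothetical, and *ALLSEP is included solely so that the call follows the operand order given in the syntax line at the top of this topic. It does not change the result here, because the string has no adjacent, leading, or trailing separators.

```rpgle
**FREE
// Hypothetical UTF-8 fields; 'á' and 'ç' each occupy two bytes in UTF-8
DCL-S str   VARCHAR(20) CCSID(*UTF8);
DCL-S sep   VARCHAR(4)  CCSID(*UTF8);
DCL-S parts VARCHAR(20) CCSID(*UTF8) DIM(5);

str = '1á2ç3';
sep = 'á';

// In CHARCOUNT NATURAL mode, only the whole character 'á' acts as a separator
parts = %SPLIT(str : sep : *ALLSEP : *NATURAL);
// parts(1) = '1'
// parts(2) = '2ç3'

*INLR = *ON;
RETURN;
```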
How separators are handled by %SPLIT
In the following table, the separator is assumed to be '.'.
| Separator type | String to be split | Without *ALLSEP | With *ALLSEP |
|---|---|---|---|
| A leading separator | '.abc' | An array with one element: ('abc') | An array with two elements: ('','abc') |
| An extra separator | 'abc..def' | An array with two elements: ('abc','def') | An array with three elements: ('abc','','def') |
| A trailing separator | 'def.' | An array with one element: ('def') | An array with two elements: ('def','') |
| Several extra separators | '..abc..def..' | An array with two elements: ('abc','def') | An array with seven elements: ('','','abc','','def','','') |
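
The "several extra separators" case can be reproduced with a short sketch. The array names plain and allSep are illustrative, and the values in the comments are taken from the table rather than from a captured run.

```rpgle
**FREE
// Illustrative arrays, sized for the largest result (seven elements)
DCL-S plain  VARCHAR(10) DIM(7);
DCL-S allSep VARCHAR(10) DIM(7);

// Without *ALLSEP: leading, trailing, and repeated separators are ignored
plain = %SPLIT('..abc..def..' : '.');
// plain(1) = 'abc'
// plain(2) = 'def'
// plain(3) to plain(7) are not assigned and keep their initial empty value

// With *ALLSEP: every separator ends a substring, so empty strings appear
allSep = %SPLIT('..abc..def..' : '.' : *ALLSEP);
// allSep = '' | '' | 'abc' | '' | 'def' | '' | ''

*INLR = *ON;
RETURN;
```

Because unassigned elements of plain are also empty, inspecting the target array alone cannot distinguish "no element returned" from "empty substring returned"; the element count has to be tracked separately when that distinction matters.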
If the length of the separators operand is zero, the result is a single element with the value of the string operand.
For example, if sep has a length of zero, %SPLIT('a.b.c' : sep) returns an array with one element: ('a.b.c').
If the string has a length of zero, %SPLIT returns zero elements.
If the string consists only of separator characters:
- If *ALLSEP is not specified, %SPLIT returns zero elements.
- If *ALLSEP is specified, %SPLIT returns N + 1 empty strings, where N is the length of the string.
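
One way to observe how many elements %SPLIT returns, without applying %ELEM to it, is to count iterations of a FOR-EACH loop. The sketch below assumes a string made up of three separator characters; the names part and count are illustrative.

```rpgle
**FREE
// Illustrative names: part receives each substring, count tallies them
DCL-S part  VARCHAR(10);
DCL-S count INT(10);

// Without *ALLSEP, a string of only separators yields no elements
count = 0;
FOR-EACH part IN %SPLIT('...' : '.');
   count += 1;
ENDFOR;
// count = 0

// With *ALLSEP, the same string yields N + 1 empty strings (N = 3)
count = 0;
FOR-EACH part IN %SPLIT('...' : '.' : *ALLSEP);
   count += 1;
ENDFOR;
// count = 4

*INLR = *ON;
RETURN;
```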
Examples of %SPLIT
- In the following example, %SPLIT has only one parameter, so
the string is split at blanks.

      DCL-S array VARCHAR(10) DIM(10);

      array = %SPLIT('Monday Tuesday Wednesday');
      // array(1) = "Monday"
      // array(2) = "Tuesday"
      // array(3) = "Wednesday"

- In the following example, %SPLIT has two parameters.
The second parameter has two characters, period (.) and blank.
The string is split when either of the characters in the second parameter is found.

      DCL-S array VARCHAR(10) DIM(10);

      array = %SPLIT('Today is Monday. Tomorrow is Tuesday.' : '. ');
      // array(1) = "Today"
      // array(2) = "is"
      // array(3) = "Monday"
      // array(4) = "Tomorrow"
      // array(5) = "is"
      // array(6) = "Tuesday"

- In the following example, the FOR-EACH operation is used to process the temporary arrays returned by %SPLIT.
  1. The string is first split into sentences, splitting at the characters used to end a sentence.
  2. Each sentence is then split into phrases, splitting at commas, semicolons, and colons.
  3. Each phrase is then split into words, splitting at blanks.

      DCL-S sentence VARCHAR(10000);
      DCL-S phrase VARCHAR(10000);
      DCL-S word VARCHAR(10000);
      DCL-S string VARCHAR(10000);

      FOR-EACH sentence in %SPLIT(string : '.!?'); // 1
         ...
         FOR-EACH phrase in %SPLIT(sentence : ',;:'); // 2
            ...
            FOR-EACH word in %SPLIT(phrase); // 3
               ...
            ENDFOR;
         ENDFOR;
      ENDFOR;

- In the following example, the RPG programmer does not want extra separators
to be ignored by %SPLIT.
  - The string normally contains three names, separated by commas: 'Mary,Jane,Smith'.
  - If there is no middle name, there are two commas together: 'Mary,,Smith'.

  To ensure that all commas act as separators, *ALLSEP is specified.

      DCL-S string VARCHAR(100);
      DCL-S names VARCHAR(100) DIM(3);

      string = 'Mary,Jane,Smith';
      names = %SPLIT (string : ',' : *ALLSEP);
      // names = Mary | Jane | Smith

      string = 'Mary,,Smith';
      names = %SPLIT (string : ',' : *ALLSEP);
      // names = Mary | | Smith