|
This topic describes how to call the z/OS® support
for the Unicode Standard collation service.
Collation works under two basic schemes:
binary comparison of two Unicode strings, and generation
of a sort key vector. The parameters for each type of call are described first,
followed by an explanation of how the two types
of calls are used.
Binary comparison:
The 31-bit caller has to provide:
- Source1 buffer pointer (31-bit pointer), ALET (4 byte), and length
(8 byte)
- Source2 buffer pointer (31-bit pointer), ALET (4 byte), and length
(8 byte)
- Target1 buffer pointer (31-bit pointer), ALET (4 byte), and length
(8 byte)
- Target2 buffer pointer (31-bit pointer), ALET (4 byte), and length
(8 byte)
- Collation level
- Work1 buffer pointer (31-bit pointer), ALET (4 byte), and length
(8 byte)
- Work2 buffer pointer (31-bit pointer), ALET (4 byte), and length
(8 byte)
- Dynamic data area pointer (DDA) (31-bit pointer), ALET (4 byte),
and length (8 byte)
- Flag1 (handle options)
- Collation mask options (sort key option=0)
The 64-bit caller has to provide:
- Source1 buffer pointer (64-bit pointer), ALET (4 byte), and length
(8 byte)
- Source2 buffer pointer (64-bit pointer), ALET (4 byte), and length
(8 byte)
- Target1 buffer pointer (64-bit pointer), ALET (4 byte), and length
(8 byte)
- Target2 buffer pointer (64-bit pointer), ALET (4 byte), and length
(8 byte)
- Collation level
- Work1 buffer pointer (64-bit pointer), ALET (4 byte), and length
(8 byte)
- Work2 buffer pointer (64-bit pointer), ALET (4 byte), and length
(8 byte)
- Dynamic data area pointer (DDA) (64-bit pointer), ALET (4 byte),
and length (8 byte)
- Flag1 (handle options)
- Collation mask options (sort key option=0)
For the collation features (UCA400R1, UCA410, and UCA600),
there are two ways to set the APIs as part of Unicode Dynamic Capabilities:
- Long Path. This way of specifying Collation API
settings lets you continue to use the existing collation
settings in addition to the new ones.
- Short Path. This newer way of setting the Collation API
is a simpler method that covers all of the supported collation features.
Another option is to use the SETUNI or SET UNI=xx commands as part
of a static initialization. For more information, see SETUNI command
in z/OS MVS System Commands.
Long Path:
The 31-bit caller has to provide:
- Set parameter area version2
- Source1 buffer pointer (31-bit pointer), ALET (4 byte), and length
(8 byte)
- Source2 buffer pointer (31-bit pointer), ALET (4 byte), and length
(8 byte)
- Target1 buffer pointer (31-bit pointer), ALET (4 byte), and length
(8 byte)
- Target2 buffer pointer (31-bit pointer), ALET (4 byte), and length
(8 byte)
- Collation level
- Work1 buffer pointer (31-bit pointer), ALET (4 byte), and length
(8 byte)
- Work2 buffer pointer (31-bit pointer), ALET (4 byte), and length
(8 byte)
- Dynamic data area pointer (DDA) (31-bit pointer), ALET (4 byte),
and length (8 byte)
- Flag1 (handle options)
- Collation mask options (sort key option=0)
- Case Options Flags
- Hiragana support
- Locale or User Collation Rules file + DSN
+ Vol
The 64-bit caller has to provide:
- Set parameter area version2
- Source1 buffer pointer (64-bit pointer), ALET (4 byte), and length
(8 byte)
- Source2 buffer pointer (64-bit pointer), ALET (4 byte), and length
(8 byte)
- Target1 buffer pointer (64-bit pointer), ALET (4 byte), and length
(8 byte)
- Target2 buffer pointer (64-bit pointer), ALET (4 byte), and length
(8 byte)
- Collation level
- Work1 buffer pointer (64-bit pointer), ALET (4 byte), and length
(8 byte)
- Work2 buffer pointer (64-bit pointer), ALET (4 byte), and length
(8 byte)
- Dynamic data area pointer (DDA) (64-bit pointer), ALET (4 byte),
and length (8 byte)
- Flag1 (handle options)
- Collation mask options (sort key option=0)
- Case Options Flags
- Hiragana support
- Locale or User Collation Rules file + DSN
+ Vol
Short Path:
The 31-bit caller has to provide:
- Set parameter area version2
- Source1 buffer pointer (31-bit pointer), ALET (4 byte), and length
(4 byte)
- Source2 buffer pointer (31-bit pointer), ALET (4 byte), and length
(4 byte)
- Target1 buffer pointer (31-bit pointer), ALET (4 byte), and length
(4 byte)
- Target2 buffer pointer (31-bit pointer), ALET (4 byte), and length
(4 byte)
- Work2 buffer pointer (31-bit pointer), ALET (4 byte), and length
(4 byte)
- Dynamic data area pointer (DDA) (31-bit pointer), ALET (4 byte),
and length (4 byte)
- Collation Keyword
The 64-bit caller has to provide:
- Set parameter area version2
- Source1 buffer pointer (64-bit pointer), ALET (4 byte), and length
(8 byte)
- Source2 buffer pointer (64-bit pointer), ALET (4 byte), and length
(8 byte)
- Target1 buffer pointer (64-bit pointer), ALET (4 byte), and length
(8 byte)
- Target2 buffer pointer (64-bit pointer), ALET (4 byte), and length
(8 byte)
- Work2 buffer pointer (64-bit pointer), ALET (4 byte), and length
(8 byte)
- Dynamic data area pointer (DDA) (64-bit pointer), ALET (4 byte),
and length (8 byte)
- Collation Keyword
Note: Short path settings take priority over long path settings.
Sort key vector:
How you generate the sort key vector depends on how you set the
sourceX buffer length. For example, you can use any of the following
input combinations:
- Source1
- Source2
- Source1 and source2
In the first two cases, you only need to provide the pointers for
the applicable source, work, and target buffers. In the third case,
you must provide pointers for both sets of buffers.
You must always provide the following, regardless of which of the
three cases applies:
- Collation level
- Dynamic data area pointer (DDA), ALET, and length
- Flag1 (handle options)
- Collation mask options (sort key option=1)
Following is an explanation of the two types of calls to the collation
service.
- Binary comparison:
This is the most common use of the collation
service. The caller supplies two Unicode strings to be compared
(collated) in a culturally correct manner. Before the call, the
caller must provide the desired collation level and, optionally,
alternate weighting and other options in the collation parameter
area to specify a particular comparison type. When the collation
service is called, it returns a compare result and a return and
reason code. For two given Unicode input strings A and B, the compare
result shows how one string is related to the other in the following
way:
- -1, if A < B
- 0, if A = B
- 1, if A > B
The compare result and return codes are returned in the
fields CUNBOPRM_Result, CUNBOPRM_Return_Code, and CUNBOPRM_Reason_code
(for 31-bit), or CUN4BOPR_Result, CUN4BOPR_Return_Code and CUN4BOPR_Reason_code
(for 64-bit), respectively. To set alternate weighting options and
a collation level, parameter fields CUNBOPRM_Mask and CUNBOPRM_Coll_Level
(for 31-bit) or CUN4BOPR_Mask and CUN4BOPR_Coll_Level (for 64-bit)
are used, respectively.
For more information on how to use
these fields, see Description of parameters in area CUNBOPRM.
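The -1/0/1 compare result follows the usual three-way comparison contract, so it can drive an ordinary sort routine. The following Python sketch illustrates only that contract. The function `compare_strings` is a hypothetical stand-in for a call to the collation service: it compares code points, which is not culturally correct collation, and it does not model the return and reason codes.

```python
from functools import cmp_to_key


def compare_strings(a: str, b: str) -> int:
    """Hypothetical stand-in for a collation call.

    Returns -1 if a < b, 0 if a == b, and 1 if a > b. It compares code
    points only; the real service applies the collation level and options.
    """
    return (a > b) - (a < b)


def sort_strings(strings):
    """Sort strings by feeding the three-way compare result to sorted()."""
    return sorted(strings, key=cmp_to_key(compare_strings))
```

Calling the comparison once per pair in this way is what a stored sort key avoids when the same strings are compared repeatedly.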
The two
input Unicode strings to be compared are set in the same way as the
other Unicode Services source buffers. A buffer pointer, length,
and ALET are set for each source buffer.
The target buffers
that hold the converted bytes in the other Unicode services
do not need to be set in this case, because no bytes are
converted unless the CUNBOPRM_Norm_Type or CUN4BOPR_Norm_Type
field is set to NFD, NFKD, NFC, or NFKC.
For the UCA400R1, UCA410,
and UCA600 versions, only NFD is supported.
If the Collation API is set with version 2 and
a normalization form (NF) other than NFD is specified, the
NF is ignored and no normalization is performed.
In addition, RC = CUN_RC_WARN and RS = CUN_RS_INVALID_NORMALIZATION_VALUE are
set, although processing continues without any normalization form.
The
results of the comparison are returned in the result, return,
and reason code fields described earlier. The work
buffers are used as auxiliary buffers to hold data during the collation
process. Always set the work buffers on each collation
call with a length sufficient for the collation process;
otherwise, a work buffer error results.
For more information
about the target and work buffers, see Target buffer length considerations
and Work buffer length considerations.
- Sort Key:
A sort key, or sort key vector, is a collection of
weights for a given Unicode string which can be binary compared against
another sort key to produce a compare result.
Sort keys are
produced by the collation process if the user sets the parameter area
field CUNBOPRM_Coll_Mask or CUN4BOPR_Coll_Mask with constant CUNBOPRM_MASK_SK
(see call samples). The user can specify an associated comparison level and alternate
weighting option to form a particular
sort key. In addition, for Collation versions
UCA400R1, UCA410, and UCA600, the long
and short path settings also apply to sort key generation.
The sort key
can be considered a "compare file", because it can be created as
a data set if properly specified by the user. The usefulness of a
sort key is that once created for an input string, it can be kept
and used repeatedly by the caller in binary comparisons with other
sort keys. This can represent a performance advantage for the caller,
because in this case there would be no need to call the collation
services, but only perform a binary comparison with the caller's
preferred compare routine.
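As an illustration of comparing stored sort keys without calling the collation service again, the following Python sketch performs a three-way, byte-wise comparison. It assumes that the keys can be compared as plain unsigned byte strings, with a shorter key that is a prefix of a longer one ordering first; check that assumption against Sort key vector format. The byte values are placeholders, not real sort keys.

```python
def compare_sort_keys(key_a: bytes, key_b: bytes) -> int:
    """Three-way comparison of two sort keys as unsigned byte strings.

    Returns -1, 0, or 1, mirroring the compare result convention.
    """
    if key_a < key_b:
        return -1
    if key_a > key_b:
        return 1
    return 0


# Placeholder bytes standing in for sort keys that were saved earlier.
stored_keys = {
    "record-1": bytes.fromhex("0a3c0b10"),
    "record-2": bytes.fromhex("0a3c0a7f"),
    "record-3": bytes.fromhex("0a3c"),
}


def order_by_sort_key(keys):
    """Return the record names ordered by their stored sort keys."""
    return sorted(keys, key=keys.get)
```

Because Python compares `bytes` objects lexicographically by unsigned byte value, the same ordering is obtained whether the keys are compared pairwise or used directly as sort keys.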
A sort key for a given Unicode character
is formed by reading and processing the level weights found in the
AllKeys.txt file provided by the Unicode Consortium at http://www.Unicode.org/Unicode/reports/tr10/allkeys.txt.
Collation version 3.0.1 follows sort key
generation as described in Unicode Consortium TR#10, whereas the more recent Collation versions UCA400R1, UCA410, and UCA600 do not, because of their tailoring features.
In
order to use this collation functionality, the caller must set the target buffers
in addition to the source and work buffers.
The target buffers hold the resulting sort keys for their respective
source buffers. One or both sort keys can be generated on each
call to the collation services. To indicate that one of the source
buffers is not being used, set its length to zero.
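The relationship between the source buffer lengths and the sort keys that are produced can be sketched as follows. This Python fragment only models the rule stated above; the function name and the "target1"/"target2" labels are hypothetical and are not part of the service interface. How the service reacts when both lengths are zero is not described here, so the sketch returns an empty list in that case.

```python
def sort_keys_requested(source1_len: int, source2_len: int) -> list:
    """Return which target buffers would receive a sort key.

    A source buffer length of zero marks that source as unused.
    """
    if source1_len < 0 or source2_len < 0:
        raise ValueError("buffer lengths must not be negative")
    requested = []
    if source1_len > 0:
        requested.append("target1")  # sort key for source1
    if source2_len > 0:
        requested.append("target2")  # sort key for source2
    return requested
```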
If
you plan to use your own binary compare algorithms for sort keys,
it is important that you can interpret the sort key format, which is explained
in Sort key vector format. The size of the sort key is
determined by the collation level chosen. The greater the collation
level, the longer the sort key.
z/OS Unicode
Services collation does not provide a way to perform a binary comparison
of a pair of sort keys provided by the user. It is the user's
responsibility to do the binary comparisons. If a call to z/OS
collation returns a zero return code, you can find the sort key
in the target buffer(s). Otherwise, you must interpret the return
and reason code, and retry the collation call after taking the appropriate
steps.
For Collation versions UCA400R1,
UCA410, and UCA600, sort key weights have different
values than the corresponding weights in the DUCET (Default Unicode Collation Element Table, http://www.unicode.org/Public/UCA/latest/allkeys.txt)
because they were modified for tailoring reasons (Locales or User Collation Rules, UCR).
Depending
on the UCA (Unicode Collation Algorithm) version
and settings (Locales or UCR), sort keys might contain different
weights. Comparisons between sort keys from different UCA versions,
in combination with some Locales or UCR, might therefore return an undesired
comparison result. To avoid undesired results with
previously generated sort keys, compare sort keys
only if they come from the same settings, that is, the same UCA
version, Locale, Collation Level, case options, and so on. Otherwise,
results might be inconsistent.
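One way to enforce this practice in an application is to store each sort key together with the settings that produced it and to refuse comparisons across different settings. The following Python sketch shows the idea. The class names, field names, and sample values are hypothetical, and the byte-wise comparison assumes that keys generated under identical settings can be compared as plain byte strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollationSettings:
    """Settings that were in effect when a sort key was generated."""
    uca_version: str
    locale: str
    collation_level: int
    case_options: str


@dataclass(frozen=True)
class TaggedSortKey:
    """A sort key stored together with its generating settings."""
    settings: CollationSettings
    key: bytes


def compare_tagged(a: TaggedSortKey, b: TaggedSortKey) -> int:
    """Compare two sort keys only if their settings are identical."""
    if a.settings != b.settings:
        raise ValueError("sort keys were generated with different collation settings")
    return (a.key > b.key) - (a.key < b.key)
```

Rejecting the comparison outright, rather than returning a result, keeps a mismatch from silently producing an inconsistent ordering.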
General considerations:
A successful call to collation always returns a valid collation
handle. This handle can be used as a fast path when recalling the
collation services, because it specifies a direct access to the collation
tables. IBM® recommends providing the collation handle
if successive collation calls are to be performed. If the caller
only wants to request a collation handle, the field CUNBOPRM_Get_New_Handle
or CUN4BOPR_Get_New_Handle must be set to X'80'. See the description
of the field CUNBOPRM_Flag1 in Description of parameters in area CUNBOPRM. A
sample program, CUNSOSMC, is provided in SYS1.SAMPLIB.
The caller can put the source parameters in any data space. To
allow the service to access data not in primary space, an ALET must
be specified. An ALET of 0 indicates that the data is in the primary
address space (default value), which is the case for most callers.
A dynamic data area (DDA) must always be specified. The required
length is defined by constant CUNBOPRM_DDA_Req or CUN4BOPR_DDA_Req.
Refer to the interface definition file (CUNBOIDF).
|