Calling the character conversion services

This is a general description of how the character conversion services have to be called and what problems can occur.

The recommended DDA size for the character conversion services is 8K, set in the CUNBCPRM_DDA_BUF_LEN and CUN4BCPR_DDA_BUF_LEN fields in the parameter list.

The 31-bit caller of the conversion services must provide the following fields in the parameter area:
  • Source buffer pointer, ALET, and length.
  • Target buffer pointer, ALET, and length (see Note 2).
  • FROM-CCSID (or conversion handle in subsequent calls).
  • TO-CCSID (or conversion handle in subsequent calls).
  • Conversion technique (or conversion handle in subsequent calls).
  • Work buffer pointer, ALET, and length (see Note 2).
  • Dynamic data area pointer (DDA), ALET, and length.
  • Flags.
Note:
  1. A dynamic data area (DDA) must always be specified. The required length is defined by constant CUNBCPRM_DDA_REQ for AMODE (31). See Interface Definition File CUNBCIDF.
  2. To take advantage of a performance improvement, specifically for EBCDIC <=> UTF-8 and EBCDIC MBCS <=> UTF-16 conversions, the application developer can provide larger work and target buffers. The work buffer and target buffer must each be at least three times the size of the source buffer. Expressed mathematically:
    Wrk Buffer Len >= 3 * Src Buffer Len AND
    Targ Buffer Len >= 3 * Src Buffer Len
The 64-bit caller of the conversion services must provide the following fields in the parameter area:
  • Source buffer pointer (64-bit pointer), ALET (4 byte), and length (8 byte).
  • Target buffer pointer (64-bit pointer), ALET (4 byte), and length (8 byte) (see Note 2).
  • FROM-CCSID (or conversion handle in subsequent calls).
  • TO-CCSID (or conversion handle in subsequent calls).
  • Conversion technique (or conversion handle in subsequent calls).
  • Work buffer pointer (64-bit pointer), ALET (4 byte), and length (8 byte) (see Note 2).
  • Dynamic data area pointer (DDA), ALET, and length (see Note 1).
  • Flags.
Note:
  1. A dynamic data area (DDA) must always be specified. The required length is defined by constant CUN4BCPR_DDA_REQ for AMODE (64). See Interface Definition File CUN4BCID.
  2. To take advantage of a performance improvement, specifically for EBCDIC <=> UTF-8 and EBCDIC MBCS <=> UTF-16 conversions, the application developer can provide larger work and target buffers. The work buffer and target buffer must each be at least three times the size of the source buffer. Expressed mathematically:
    Wrk Buffer Len >= 3 * Src Buffer Len AND
    Targ Buffer Len >= 3 * Src Buffer Len
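
The sizing rule in Note 2 can be sketched as a small helper. This is only an illustration of the arithmetic: the function name and its parameters are hypothetical and are not part of the conversion services interface.

```python
def recommended_buffer_lengths(src_len: int, factor: int = 3) -> tuple[int, int]:
    """Return (work_len, target_len), the minimum lengths that satisfy Note 2.

    `factor` defaults to 3, the multiplier stated in Note 2.
    """
    if src_len < 0:
        raise ValueError("source buffer length must be non-negative")
    work_len = factor * src_len    # Wrk Buffer Len  >= 3 * Src Buffer Len
    target_len = factor * src_len  # Targ Buffer Len >= 3 * Src Buffer Len
    return work_len, target_len


# Example: a 4096-byte source buffer
work_len, target_len = recommended_buffer_lengths(4096)
print(work_len, target_len)  # 12288 12288
```

Any larger values also satisfy the rule; the helper returns the smallest lengths that qualify.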
From the caller's perspective, conversions are always done with a single call to the conversion services. Internally, conversions between the following are done in two steps (an indirect conversion):
  • A mixed code page and anything other than simple code pages
  • UTF-8 and anything other than UTF-16
  • A conversion requesting bidi transformations
Two-step conversions require the caller to supply a work buffer. For coding simplicity, a caller may choose to always supply a work buffer (which goes unused for single-step conversions). Alternatively, if the caller knows that a particular conversion is single-step, the work buffer need not be supplied.

The dynamic data area (DDA) holds all the variables that the conversion service needs internally. The required DDA size depends on the type of conversion being done (source and target CCSIDs), the addressing mode (AMODE(31) or AMODE(64)), whether the B technique is requested, and the parameter area version being used. If the DDA is too small to support the conversion specified by Src_CCSID and Trg_CCSID, the conversion services return with return code CUN_RC_USER_ERR and reason code CUN_RS_DDA_BUF_SMALL, and also return the DDA size required for the specified conversion in field UCCE_DDA_BUF_LEN of the UCCE handle. It is recommended that the caller provide code to recognize a CUN_RS_DDA_BUF_SMALL error and react to it by allocating a larger DDA buffer and calling the service again.
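
The recommended reaction to CUN_RS_DDA_BUF_SMALL can be sketched as a retry loop. This is a minimal model, not the real interface: `fake_convert`, `Result`, and `convert_with_dda_retry` are hypothetical names, the codes are represented as plain strings rather than their numeric values, `CUN_RC_OK` is assumed as the name of the success return code, and the 12K requirement is an arbitrary value chosen for the demonstration.

```python
from dataclasses import dataclass

# Symbolic codes as plain strings; the numeric values are defined in the
# interface definition files and are deliberately not reproduced here.
RC_OK = "CUN_RC_OK"                      # assumed name for the success code
RC_USER_ERR = "CUN_RC_USER_ERR"
RS_DDA_BUF_SMALL = "CUN_RS_DDA_BUF_SMALL"


@dataclass
class Result:
    return_code: str
    reason_code: str = ""
    required_dda_len: int = 0            # stands in for UCCE_DDA_BUF_LEN


def fake_convert(dda_len: int, needed: int = 12 * 1024) -> Result:
    """Hypothetical stand-in for the service: rejects a DDA shorter than `needed`."""
    if dda_len < needed:
        return Result(RC_USER_ERR, RS_DDA_BUF_SMALL, needed)
    return Result(RC_OK)


def convert_with_dda_retry(convert, dda_len: int = 8 * 1024, max_attempts: int = 2):
    """Call `convert`; if the DDA is too small, retry with the size it reports."""
    for _ in range(max_attempts):
        result = convert(dda_len)
        if result.reason_code != RS_DDA_BUF_SMALL:
            return result, dda_len       # success, or an error unrelated to the DDA
        dda_len = result.required_dda_len  # "allocate" a larger DDA and try again
    raise RuntimeError("DDA still too small after retrying")


result, used_len = convert_with_dda_retry(fake_convert)
print(result.return_code, used_len)  # CUN_RC_OK 12288
```

Starting from the recommended 8K and growing only when the service asks for more keeps the common case to a single call.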

When the service returns, it updates the source and target buffer pointers and lengths. The caller can thus see how many bytes were converted and how much of the target buffer was filled. Return codes and reason codes indicate whether a target buffer overflow or another error occurred. Recommendations for the work buffer and target buffer sizes are listed in Handling a target buffer overflow.
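
One way a caller might use the updated pointers and lengths is to resume from the point where the previous call stopped. The sketch below models this with a hypothetical `fake_convert` that copies bytes one-for-one until its target is full and reports counts as return values; the real service instead updates fields in the parameter area, and its actual overflow handling is described in Handling a target buffer overflow.

```python
def fake_convert(source: bytes, target_capacity: int):
    """Hypothetical stand-in for one service call.

    Copies bytes 1:1 until the target is full and returns
    (bytes_consumed, produced_bytes, overflow_detected).
    """
    n = min(len(source), target_capacity)
    return n, source[:n], len(source) > n


def convert_all(source: bytes, target_capacity: int) -> bytes:
    """Call the stand-in repeatedly, advancing the source offset each time."""
    out = bytearray()
    offset = 0                           # models the updated source pointer
    while offset < len(source):
        consumed, produced, _overflow = fake_convert(source[offset:], target_capacity)
        if consumed == 0:
            raise RuntimeError("no progress: target buffer too small")
        out += produced                  # drain the filled target buffer
        offset += consumed               # resume where the last call stopped
    return bytes(out)


print(convert_all(b"ABCDEFGHIJ", 4))  # b'ABCDEFGHIJ'
```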

The source buffer may contain characters that have no equivalent in the TO-CCSID or may contain the substitution character in the FROM-CCSID. The user of the conversion services specifies the action to take on detection of such a character by the value of the input parameter bit 'CUNBCPRM_Sub_Action'. Depending on this input bit the conversion service either terminates conversion with reason code CUN_RS_SUB_ACT_TERM or it inserts the conversion table's substitution character into the target buffer, sets bit CUNBCPRM_Substitution in the parameter list, and continues conversion with the next character in the source buffer.

The source buffer may also contain byte strings that do not represent a character in the source code page. Such byte strings are referred to as "malformed characters" and cannot be converted to a valid target code point. If, in CUNBCPRM_Flag1, bit CUNBCPRM_Sub_Action specifies "substitute" and bit CUNBCPRM_Mal_Action specifies "terminate", the conversion terminates with RC=4 and RS=0C when a malformed character is encountered. If CUNBCPRM_Mal_Action specifies "substitute", the malformed character is substituted.

The source code page (FROM-CCSID), target code page (TO-CCSID), and technique search order are specified on the initial call. Such a call always returns a conversion handle, which gives the services a fast path to the conversion table and its properties. On subsequent calls, it is recommended that the caller provide the conversion handle. To request the conversion handle without converting any data, specify a source buffer length of 0.

The caller can put the conversion data in any data space. To allow the conversion service to access the data, an ALET must be specified. An ALET of 0 indicates that the data is in the primary address space.

To indicate which code page was active at the end of conversions from and to mixed code pages, the character conversion services update CUNBCPRM_Subcodepage. The same technique is used for the designator sequences of some ISO 2022 encodings.

Specifically, because an MBCS encoding is made up of SBCS and DBCS tables, the character conversion service uses a dedicated algorithm to handle it. When converting to an MBCS encoding, the service first searches the SBCS table for the character to be converted. If the code point is not in the valid range of the SBCS table (from X'00' to X'FF'), the service switches to the DBCS table to look up and convert that code point. This switch generates a X'0E' (Shift-Out) in the converted data stream, because a shift out of SBCS mode was performed. The service then continues to use the DBCS table for subsequent characters. If there are no more characters to be converted at this point, the service stops, and the converted data stream ends without a X'0F' (shift into SBCS mode). However, if the service encounters a code point that is in the valid SBCS range, it switches back to the SBCS table, which generates a X'0F' (Shift-In) in the converted data stream, because a shift into SBCS mode was performed.

When a string that involves MBCS characters is broken up across multiple calls to the character conversion service, it is the responsibility of the exploiter of the service to add the necessary SI/SO (Shift-In/Shift-Out) characters.

This is where the CUNBCPRM_Subcodepage parameter is useful. CUNBCPRM_Subcodepage is made up of two halves: the first half is CUNBCPRM_Source_SCP_State and the second half is CUNBCPRM_Target_SCP_State. When converting from Unicode to EBCDIC (MBCS), the character conversion service sets CUNBCPRM_Target_SCP_State. When converting from EBCDIC (MBCS) to Unicode, the character conversion service sets CUNBCPRM_Source_SCP_State. See the Description of parameters in area CUNBCPRM for the specific values and their definitions.
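
The shift behavior described above, and the reason a caller needs to know the final state, can be modeled in a few lines. This is a simplified sketch and not the service's implementation: `emit_mixed` is a hypothetical function, the input is a list of already-converted code points (1 byte for SBCS, 2 bytes for DBCS), the byte values are arbitrary placeholders, and the returned boolean merely stands in for the kind of information that the target state in CUNBCPRM_Subcodepage conveys.

```python
SO = 0x0E  # Shift-Out: leave SBCS mode and enter DBCS mode
SI = 0x0F  # Shift-In: return to SBCS mode


def emit_mixed(code_points, start_in_dbcs=False):
    """Build a mixed SBCS/DBCS stream, inserting SO/SI only on a mode switch.

    Returns (stream, ended_in_dbcs). No trailing SI is emitted, matching
    the behavior described in the text.
    """
    out = bytearray()
    in_dbcs = start_in_dbcs
    for code in code_points:
        if len(code) not in (1, 2):
            raise ValueError("each code point must be 1 (SBCS) or 2 (DBCS) bytes")
        is_dbcs = len(code) == 2
        if is_dbcs and not in_dbcs:
            out.append(SO)               # switching from SBCS to DBCS
        elif not is_dbcs and in_dbcs:
            out.append(SI)               # switching back to SBCS
        in_dbcs = is_dbcs
        out += code
    return bytes(out), in_dbcs


# Placeholder values: one SBCS byte followed by two DBCS code points.
stream, ended_in_dbcs = emit_mixed([b"\xC1", b"\x42\x43", b"\x42\x44"])
print(stream.hex(), ended_in_dbcs)       # c10e42434244 True

# The stream ended in DBCS mode, so a caller that treats the string as
# complete appends the closing Shift-In itself.
if ended_in_dbcs:
    stream += bytes([SI])
print(stream.hex())                      # c10e424342440f
```

A caller that instead continues the same string in a later call would pass the saved state as `start_in_dbcs`, so that no redundant X'0E' is generated at the start of the next piece.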

For the internal handling of MBCS conversions, refer to Conversion support for multi-byte encodings (MBCS).