z/OS Unicode Services User's Guide and Reference


Handling a target buffer overflow

SA38-0680-00

If the target buffer is too small, the conversion services convert as many characters as fit into the target buffer. When the service returns with the reason code for that situation, the source buffer pointer points to the byte following the last successfully converted source character, and the target buffer pointer points to the byte following the last inserted target character. In addition, the source buffer length is updated to the number of bytes left unconverted in the source buffer, and the target buffer length is updated to the number of bytes not yet consumed in the target buffer.

There are two ways in which a caller can respond to reason code CUN_RS_TRG_EXH (target buffer exhausted):

  1. Redo the conversion with a large enough target buffer:

    Repeat the conversion with a target buffer large enough to hold at least the maximum possible number of target string bytes. To perform the necessary 'worst case' calculation, the caller has to take into account the number of source bytes to be converted and the nature of the CCSIDs involved (in terms of minimum possible source character width, maximum possible target character width, and possible shift-in/shift-out character sequences or sub table switch control bytes). Such a 'worst case size' target buffer prevents the occurrence of reason code CUN_RS_TRG_EXH (target buffer exhausted).

    The following table lists the minimum and maximum character widths of the different encoding schemes:

    Table 1. Minimum and maximum character widths of the different encoding schemes

    Encoding scheme | ESID | Minimum character width | Maximum character width | Rationale
    SBCS | x1xx | 1 | 1 | Pure single byte.
    DBCS and UCS-2 | x2xx | 2 | 2 | Pure double byte.
    UTF-8 | 7807 | 1 | 4 | UTF-8 uses 1 to 4 bytes to encode Unicode characters.
    PC MBCS | 2300 to 3300 | 1 | 2 | PC MBCS encodings always use one SBCS and one DBCS code page.
    EUC MBCS | 4403 | 1 | 2 - 4 | EUC encodings use at least one SBCS and at least one DBCS sub code page. If more than two sub code pages are used, shift characters are inserted for characters of the third and fourth sub code page. Then the maximum width is 2 + 1 = 3. Some EUC encodings use TBCS (triple byte) code pages as the third sub code page (this case is not yet supported). Then the maximum width is 3 + 1 = 4.
    EBCDIC MBCS | 1301 | 1 | 3 | EBCDIC MBCS encodings always use one SBCS and one DBCS sub code page. Because switching between them is done with shift characters, the maximum width is 2 + 1 = 3.
    ISO2022 MBCS JP and ISO2022 MBCS JP-1 | 5404 | 1 | 5 - 6 | ISO2022 MBCS JP encodings always use at least one SBCS and at least one DBCS sub code page. Most ISO2022-JP encodings use an escape sequence of 4 characters for at least one of the DBCS sub code pages, which gives 2 + 4 = 6. In one case, the escape sequence is only 3 characters long, which gives 2 + 3 = 5.
    ISO2022 MBCS KR | 5409 | 1 | 6 - 7 | ISO2022 MBCS KR encodings always use one or two SBCS sub code pages or one SBCS sub code page and one DBCS sub code page. Furthermore, they use one designator sequence of length 4 before the first occurrence of a character of sub code page 2, and shift characters to switch between the sub code pages. This gives (1 or 2) + 4 + 1 = (6 or 7).
    PC Data for GB 18030 | 2A00 | 1 | 4 | S-ch PC Data mixed for GB 18030.
    QBCS | 2900 | 4 | 4 | S-ch 4 bytes part PC Data for GB 18030 (Fixed UCS2 Subset).

  2. Do the conversion piece-by-piece:

    Save the target buffer characters already converted. Provide a new target buffer and call the conversion service again without modifying CUNBCPRM_Src_Buf_Len and CUNBCPRM_Src_Buf_Ptr, so that the conversion continues where it was interrupted. This follow-on step might have to be repeated several times until all source bytes are converted. Completion of the conversion is indicated by return code CUN_RC_OK (return code 0). Concatenate the individual conversion results to form the complete converted string.

    The piece-by-piece method is not recommended with the B technique, because the B technique requires complete input to produce correct results. You can, however, use the piece-by-piece method when using the extended bidi support with Layout_Streaming.
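
The two approaches above can be sketched in C. This is an illustration only, not the Unicode Services API: demo_prm, demo_convert, worst_case_target_bytes, and convert_piecewise are hypothetical names. demo_convert is a stand-in that widens every source byte to two target bytes and mimics the pointer and length updates described above. It returns a single status value where the real service reports a return code plus a reason code such as CUN_RS_TRG_EXH. The sizing helper assumes that the maximum width taken from Table 1 already includes any shift or escape overhead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the conversion parameter area. Only the four
   buffer fields discussed in this topic are modeled. */
typedef struct {
    const unsigned char *src_ptr;  /* models CUNBCPRM_Src_Buf_Ptr */
    size_t               src_len;  /* models CUNBCPRM_Src_Buf_Len */
    unsigned char       *trg_ptr;  /* target buffer pointer */
    size_t               trg_len;  /* target buffer length */
} demo_prm;

/* Hypothetical status values; the real service uses return and reason codes. */
enum { DEMO_OK = 0, DEMO_TRG_EXH = 1 };

/* Mock conversion: each 1-byte source character becomes 2 target bytes.
   On exit, pointers and lengths are advanced past what was converted. */
static int demo_convert(demo_prm *p)
{
    while (p->src_len > 0) {
        if (p->trg_len < 2)
            return DEMO_TRG_EXH;          /* target buffer exhausted */
        p->trg_ptr[0] = 0x00;
        p->trg_ptr[1] = *p->src_ptr;
        p->trg_ptr += 2;  p->trg_len -= 2;
        p->src_ptr += 1;  p->src_len -= 1;
    }
    return DEMO_OK;
}

/* Approach 1: conservative worst-case target size. The source holds at most
   src_len / min_src_width characters (rounded up here), and each of them
   needs at most max_trg_width target bytes. */
static size_t worst_case_target_bytes(size_t src_len, size_t min_src_width,
                                      size_t max_trg_width)
{
    assert(min_src_width > 0);
    size_t max_chars = (src_len + min_src_width - 1) / min_src_width;
    return max_chars * max_trg_width;
}

/* Approach 2: convert through a small scratch buffer and concatenate the
   pieces into out. Returns the total number of bytes written, or (size_t)-1
   if out is too small to hold the result. */
static size_t convert_piecewise(const unsigned char *src, size_t src_len,
                                unsigned char *out, size_t out_cap)
{
    unsigned char piece[8];
    demo_prm p = { src, src_len, piece, sizeof piece };
    size_t total = 0;
    int rc;

    do {
        /* Provide a fresh target buffer; the source pointer and length are
           left exactly as the previous call updated them. */
        p.trg_ptr = piece;
        p.trg_len = sizeof piece;
        rc = demo_convert(&p);

        /* Save what this call produced before the buffer is reused. */
        size_t produced = sizeof piece - p.trg_len;
        if (produced > out_cap - total)
            return (size_t)-1;
        memcpy(out + total, piece, produced);
        total += produced;
    } while (rc == DEMO_TRG_EXH);

    return total;
}
```

With the widths from Table 1, worst_case_target_bytes(10, 1, 4) sizes a target buffer for 10 bytes of SBCS input converted to UTF-8, and worst_case_target_bytes(10, 2, 3) covers 10 bytes of UCS-2 input converted to EBCDIC MBCS. In convert_piecewise, the scratch buffer must be able to hold at least one complete target character, otherwise no call can make progress.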

Copyright IBM Corporation 1990, 2014