Parsing XDBX input streams

Extensible Dynamic Binary XML (XDBX) is a binary XML form composed of both numeric and string data. The numeric data is used for several purposes, including identifying the semantic purpose and length of each associated string in the stream. See http://www.ibm.com/support/docview.wss?&uid=swg27019354 for more information about the format of XDBX streams.

XDBX can be passed to z/OS XML and parsed with validation to create a z/OS XML record stream, in the same way that regular XML text documents are handled. See the appropriate parser API section for details about how to initialize and control a parse instance for XDBX streams. Once the parse instance is initialized and configured, parsing proceeds in the same way as for regular XML text input. Non-validating parse requests are not supported for XDBX streams.

Although the API is called the same way for both XML text and XDBX input streams, the parser handles each type of input differently. In particular, z/OS XML does not need to perform certain low-level parsing functions on XDBX streams, the most significant being a low-level scan of the input stream. XDBX streams already carry tag fields that describe the meaning of each string and length fields that delimit each string's boundaries. By using the information already provided in the XDBX form, the z/OS® XML parser gains a performance advantage over the validation of XML text input.

z/OS XML does not re-scan each string of text from an XDBX stream. Consequently, the no-escapes bit setting is determined entirely from the tag used to represent a given string. This is important for the 'U' (text), 'b' (attribute), and 'W' (whitespace) tags in the XDBX stream. If the XDBX stream creator associated these tags with strings that do in fact contain characters that need to be escaped on serialization of the stream, z/OS XML will not catch this, and will set the XEH_NO_ESCAPES bit in the record header for any associated records generated during validation. Similarly, if a 'T' (text), 'y' (attribute) or a 'C' (CDATA) tag is used when the associated string has no characters that require escaping for serialization, the XEH_NO_ESCAPES flag will be off. This is true even when values are defaulted from the DTD or schema during the validation process.
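
The tag-driven rule described above can be sketched as a small lookup, together with a check that a stream creator might run before choosing a tag. This is an illustration only, not z/OS XML code: the function names are hypothetical, the tags are assumed to be available as native C characters, and the set of bytes treated as needing an escape is an assumption (the exact set depends on whether the string is text or an attribute value). The content bytes are compared as UTF-8 code values so that the check does not depend on the compiler's character set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the z/OS XML API.
 * Returns 1 if the tag implies XEH_NO_ESCAPES on, 0 if it implies
 * the flag off, and -1 for any tag not discussed in this section. */
int no_escapes_from_tag(char tag)
{
    switch (tag) {
    case 'U': case 'b': case 'W':
        return 1;   /* text, attribute, whitespace: flag set */
    case 'T': case 'y': case 'C':
        return 0;   /* text, attribute, CDATA: flag off */
    default:
        return -1;
    }
}

/* Assumed escape set, as UTF-8 byte values: & < > " ' */
static bool has_escapable_byte(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        switch (s[i]) {
        case 0x26: case 0x3C: case 0x3E: case 0x22: case 0x27:
            return true;
        default:
            break;
        }
    }
    return false;
}

/* True when the tag promises "no escapes" but the string contains a
 * byte from the assumed escape set. The parser would not detect this. */
bool tag_overstates_no_escapes(char tag, const unsigned char *s, size_t len)
{
    return no_escapes_from_tag(tag) == 1 && has_escapable_byte(s, len);
}
```

A stream creator could call tag_overstates_no_escapes before emitting a 'U', 'b', or 'W' string and fall back to the corresponding 'T' or 'y' tag when it returns true. The opposite mismatch is harmless: the flag is simply left off.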

Another difference from XML text input is that XDBX streams are required to have all entity references resolved. For this reason, none of the z/OS XML functionality implemented for managing unresolved entities is relevant for XDBX input. See the descriptions of the control APIs for more information about how character and entity references are handled for XDBX streams.

Every XDBX stream begins with a magic number (0xCA3B), and is encoded in big-endian form. There is no need for a byte-order-mark, and the parse request will fail if one is present in the XDBX stream.
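
A caller that wants to reject obviously malformed input before starting a parse could check the leading bytes itself. The following is a minimal sketch, not part of the z/OS XML API: the function and enum names are hypothetical, and it assumes that the magic number occupies the first two bytes of the stream with the high-order byte first.

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum xdbx_precheck {
    XDBX_PRECHECK_OK = 0,
    XDBX_PRECHECK_TOO_SHORT,
    XDBX_PRECHECK_BOM,
    XDBX_PRECHECK_BAD_MAGIC
};

/* Inspect the start of a buffer that is expected to hold an XDBX stream. */
enum xdbx_precheck xdbx_precheck(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < 2)
        return XDBX_PRECHECK_TOO_SHORT;

    /* UTF-8 byte-order-mark: EF BB BF */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return XDBX_PRECHECK_BOM;

    /* UTF-16 byte-order-marks: FE FF or FF FE */
    if ((buf[0] == 0xFE && buf[1] == 0xFF) ||
        (buf[0] == 0xFF && buf[1] == 0xFE))
        return XDBX_PRECHECK_BOM;

    /* Magic number 0xCA3B, big-endian (assumed to be the first two bytes) */
    if (buf[0] != 0xCA || buf[1] != 0x3B)
        return XDBX_PRECHECK_BAD_MAGIC;

    return XDBX_PRECHECK_OK;
}
```

The check is only a convenience for producing an earlier and more specific diagnostic; the parser rejects such input on its own.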

The following usage notes apply to parsing XDBX streams:

- XDBX input streams may be passed to z/OS XML for parsing with validation when the GXLHXEC_FEAT_XDBX_INPUT feature is enabled. Attempts to initialize a parse instance for an XDBX input stream without validation will result in a failure.
- Validation is performed using an Optimized Schema Representation (OSR) in the same way as for conventional XML text input. The output of the parser is a conventional z/OS XML record stream.
- XDBX input streams contain a combination of binary information and UTF-8 text strings, meaning that the CCSID specified at parser initialization must always be UTF-8.
- Certain other parser features are not currently supported in combination with XDBX streams:
  - GXLHXEC_FEAT_SCHEMA_DISCOVERY
  - GXLHXEC_FEAT_SRC_OFFSETS
In addition, some control operations are not allowed when the parser is initialized to handle XDBX streams. See the description of gxlpControl (perform a parser control function) for details of the operations that are not compatible with XDBX streams.
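
The initialization rules in the usage notes above can be summarized as a pre-flight check on the values a caller intends to pass at parser initialization. This is a sketch under stated assumptions, not the parser's own logic: the DEMO_ bit values are placeholders invented for illustration (real code would use the constants from the z/OS XML header files), DEMO_FEAT_VALIDATE stands in for whichever feature flag requests validation, and the function name is hypothetical. CCSID 1208 is the IBM CCSID for UTF-8.

```c
#include <stddef.h>

/* Placeholder bit values, for illustration only. They are NOT the
 * values of the corresponding GXLHXEC_FEAT_* constants. */
#define DEMO_FEAT_VALIDATE          0x01u
#define DEMO_FEAT_XDBX_INPUT        0x02u
#define DEMO_FEAT_SCHEMA_DISCOVERY  0x04u
#define DEMO_FEAT_SRC_OFFSETS       0x08u

#define CCSID_UTF8 1208

/* Returns NULL if the combination is acceptable for XDBX input under
 * the usage notes, otherwise a short description of the conflict.
 * Configurations that do not request XDBX input are not checked. */
const char *xdbx_config_error(unsigned int features, int ccsid)
{
    if (!(features & DEMO_FEAT_XDBX_INPUT))
        return NULL;
    if (!(features & DEMO_FEAT_VALIDATE))
        return "XDBX input requires a validating parse";
    if (ccsid != CCSID_UTF8)
        return "XDBX input requires a UTF-8 CCSID";
    if (features & DEMO_FEAT_SCHEMA_DISCOVERY)
        return "schema discovery is not supported with XDBX input";
    if (features & DEMO_FEAT_SRC_OFFSETS)
        return "source offsets are not supported with XDBX input";
    return NULL;
}
```

Running a check of this kind before initialization lets an application report the specific conflict in its own terms rather than interpreting a parser return and reason code.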