Parsing strategies
You can use several parsing strategies during message flow development to reduce memory usage when you parse and serialize messages. This section describes partial parsing, opaque parsing, and how to avoid unnecessary parsing.
Identifying the message type quickly
It is important to recognize the message format and type as quickly as possible. In message flows that process multiple types of message, this identification can be a problem: the message is often parsed multiple times to establish which format it has. Avoid that extra parsing to reduce memory usage.
The message flow in Figure 1 shows a number of Filter nodes (Filter1 & Filter2), several subflows (subflow1, subflow2, and so on) each containing more nodes. The message flow is complex and is implemented with a long critical path. As such, messages are parsed multiple times.
In a redesigned version of this flow, functions and procedures, ESQL parsing techniques, and dynamic routing are combined into the minimum number of nodes (excluding the error handling subflow nodes). The logic for each path is coded as a function or procedure, and called from the main procedure in the Compute node. This method avoids the repeated parsing that takes place in the multiple Filter nodes and subflows. Because the flow has fewer nodes, it does less parsing and tree copying, which significantly reduces the processing cost.
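The single-node approach can be pictured with a short sketch. It is written in Python purely for illustration and is not IBM Integration Bus code; the message types, the handler names, and the MsgType element are all hypothetical. The message is parsed once, and the type that is found selects a function from a dispatch table, in place of a chain of Filter nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-type handlers. In a message flow these would be ESQL
# functions or procedures called from the Compute node's main procedure.
def handle_order(root):
    return "orders"

def handle_invoice(root):
    return "invoices"

def handle_unknown(root):
    return "errors"

# Dispatch table that stands in for a chain of Filter nodes.
HANDLERS = {"Order": handle_order, "Invoice": handle_invoice}

def route(message: bytes) -> str:
    root = ET.fromstring(message)            # parse once
    msg_type = root.findtext("MsgType", "")  # hypothetical type field
    return HANDLERS.get(msg_type, handle_unknown)(root)

if __name__ == "__main__":
    print(route(b"<Msg><MsgType>Order</MsgType><Id>1</Id></Msg>"))
```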
Partial parsing
A message is parsed only when necessary to resolve the reference to a particular part of its content. An input message can be of any length, and parsing the entire message for only a specific part of content is not usually required. Partial parsing (also referred to as On-demand parsing) improves the performance of message flows, and reduces the amount of parsed data that is stored in memory. Partial parsing is used to parse an input message bit stream only as far as is necessary to satisfy the current reference.
To use partial parsing, you must set the Parse timing property on the input node to On Demand.
All the parsers that are provided with IBM® Integration Bus support partial parsing. The amount of parsing that must be performed depends on which fields in a message need to be accessed, and the position of those fields in the message. In the next two diagrams, one has the fields ordered A to Z (Figure 3) and the other has them ordered Z to A (Figure 4). Depending on which field is needed, one of the cases is more efficient than the other. If you need to access field Z, then the second case is best, because Z is the first field that the parser reaches. Where you have influence over message design, ensure that information that is needed early, for routing for example, is placed at the start of the message and not at the end.
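The effect of field order can be sketched outside the product. The following Python example is only an analogy, not how the IBM Integration Bus parsers are implemented: it uses the standard library's incremental parser and counts how many elements are opened before the wanted field is complete. The function name and the generated A to Z messages are assumptions made for the illustration.

```python
import io
import xml.etree.ElementTree as ET

def elements_parsed_until(xml_bytes: bytes, wanted: str) -> int:
    """Count the elements opened up to the point where `wanted` is complete."""
    count = 0
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            count += 1
        elif elem.tag == wanted:
            # Stop here: later elements are never visited. iterparse reads
            # its input in chunks, so for a large message the remaining
            # chunks are never read either.
            return count
    raise KeyError(wanted)

fields = [chr(c) for c in range(ord("A"), ord("Z") + 1)]
a_to_z = ("<Body>" + "".join(f"<{f}>x</{f}>" for f in fields) + "</Body>").encode()
z_to_a = ("<Body>" + "".join(f"<{f}>x</{f}>" for f in reversed(fields)) + "</Body>").encode()

if __name__ == "__main__":
    print("Z in A-to-Z message:", elements_parsed_until(a_to_z, "Z"))
    print("Z in Z-to-A message:", elements_parsed_until(z_to_a, "Z"))
```

In the A to Z message, reaching field Z means opening Body and all 26 fields; in the Z to A message, Z is complete after only Body and Z have been opened.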
When you use ESQL and Mapping nodes, the field references are typically explicit. That is, you have references such as InputRoot.Body.A. IBM Integration Bus parses only as far as the required message field to satisfy that reference. The parser stops at the first instance. When you use the XPath query language, the situation is different. By default, an XPath expression searches for all instances of an element in the message, which implicitly means that a full parse of the message takes place. If you know that an element occurs only once in a message, you can optimize the XPath query to retrieve only the first instance. For example, use /aaa[1] if you want just the first instance of the search argument.
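The difference between an unqualified search and a first-instance search can be seen in a small Python sketch. It uses the limited XPath support in the standard library's ElementTree, which accepts only relative paths, so the expression is aaa[1] rather than /aaa[1]. ElementTree has already parsed the whole document before the query runs, so the sketch shows only what each expression selects, not any saving in parsing. The sample message is invented for the illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<msg><aaa>1</aaa><aaa>2</aaa><aaa>3</aaa></msg>")

all_matches = root.findall("aaa")     # every instance of the element
first_only = root.findall("aaa[1]")   # positional predicate: first instance only

if __name__ == "__main__":
    print([e.text for e in all_matches])
    print([e.text for e in first_only])
```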
Opaque parsing
For XMLNSC messages, you can use opaque parsing, a technique that allows the whole of an XML subtree to be placed in the message tree as a single element.
Opaque parsing is supported for the XMLNS and XMLNSC domains only.
Use the XMLNSC domain in new message flows if you want to use opaque parsing. The XMLNS domain is deprecated, and offers a more limited opaque parsing facility than the XMLNSC domain. The XMLNS domain is provided only to support legacy message flows.
Opaque parsing has two benefits:
- It reduces the size of the message tree because the XML subtree is not expanded into the individual elements.
- It reduces the cost of parsing because less of the input message is expanded as individual elements and added to the message tree.
You can use opaque parsing where you do not need to access the elements of the subtree. For example, a message flow might copy a portion of the input tree to the output message without needing to know what that portion contains. The flow accepts the content of the subfolder as it is, and has no need to validate or process it in any way.
Specifying elements for opaque parsing
You must specify elements for opaque parsing in the Parser Options section of the Input node of the message flow, as shown in Figure 6:
To specify elements for opaque parsing, add the element names to the Opaque elements table. Ensure that message validation is not enabled, because validation automatically disables opaque parsing. Opaque parsing does not make sense for validation, because the whole message must be parsed and validated.
Opaque parsing in action
<tns:Inventory xmlns:tns="http://www.example.org/NewXMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.example.org/NewXMLSchema Inventory.xsd ">
<tns:header> <tns:version>V100</tns:version>
</tns:header>
<tns:body>
<tns:field1>tns:field1</tns:field1>
<tns:field2>tns:field2</tns:field2>
…..
<tns:field1000>tns:field2</tns:field1000>
</tns:body>
<tns:trailer>
<tns:type>tns:type</tns:type>
</tns:trailer>
</tns:Inventory>
Using opaque parsing, you can eliminate the need to parse the body section of the payload. Specify the parent of the elements that you do not want to parse in the Parser Options section of the Input node of the message flow, as shown in Figure 7. All elements that are defined in the Opaque elements list are treated as a single string when parsed. This parsing behavior is shown in Figure 8.
When you design a message structure, group elements according to how they need to be parsed if you have the opportunity to do so, because this grouping can greatly improve performance. In the previous example, if you move the <type> field into the header, there would be no need for opaque parsing: the on-demand parser would not need to go past the header.
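The idea of holding a subtree as a single string can be sketched in Python. This is a minimal illustration under stated assumptions, not the XMLNSC parser: it uses the standard library's expat bindings, the function name parse_with_opaque and the path-to-text dictionary are hypothetical, and the input is assumed to be UTF-8. Elements named in the opaque set are recorded as the raw text of their subtree, and none of their children are added to the result. Note that expat still tokenizes the opaque subtree in this sketch; the saving shown is only in the tree that is built.

```python
import xml.parsers.expat

def parse_with_opaque(data: bytes, opaque: set) -> dict:
    """Map element paths to text, keeping `opaque` elements as raw strings."""
    parser = xml.parsers.expat.ParserCreate()
    tree, path, text = {}, [], []
    skip = {"depth": 0, "start": 0}  # state while inside an opaque element

    def start(name, attrs):
        if skip["depth"]:
            skip["depth"] += 1       # nested element: not added to the tree
            return
        path.append(name)
        text.clear()
        if name in opaque:
            skip["depth"] = 1
            skip["start"] = parser.CurrentByteIndex  # offset of the opening '<'

    def end(name):
        if skip["depth"]:
            skip["depth"] -= 1
            if skip["depth"] == 0:
                # CurrentByteIndex is the offset where the closing tag starts.
                stop = data.index(b">", parser.CurrentByteIndex) + 1
                tree["/".join(path)] = data[skip["start"]:stop].decode()
                path.pop()
            return
        value = "".join(text).strip()
        if value:
            tree["/".join(path)] = value
        text.clear()
        path.pop()

    def chars(chunk):
        if not skip["depth"]:
            text.append(chunk)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(data, True)
    return tree
```

Calling parse_with_opaque(message, {"tns:body"}) on a message shaped like the Inventory example yields one entry for the version, one for the type, and a single string entry for the whole tns:body subtree.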
Avoiding unnecessary parsing
One effective technique to reduce the cost of parsing is not to parse.
The strategy is to avoid having to parse some parts of the message as shown in Figure 9.
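For example, suppose that a message flow makes a routing decision based on a single field in the message body. The cost of reaching that field depends on where it is:
- If it is field A, then it is right at the beginning of the body and is found quickly.
- If it is field Z, then the cost might be much higher, especially if the message is several megabytes in size.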
Use the application that created this message to copy the field that is needed for routing into a header within the message. For a WebSphere® MQ message, this header might be an MQRFH2 header; for a JMS message, the field might be copied into a JMS property. If you use this technique, it is no longer necessary to parse the message body, potentially saving a large amount of processing effort. The MQRFH2 or JMS Properties folder still needs to be parsed, but it contains a smaller amount of data. The parsers in this case are also more efficient than the general parser for a message body because the structure of the header is known. Copy key data structures to MQMD, MQRFH2, or JMS Properties to prevent parsing the user data.
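The header-based approach can be sketched as follows. This Python example is an illustration only: the Message class, the RoutingKey property name, and the default destination are hypothetical, and the properties dictionary merely stands in for an MQRFH2 header or JMS properties. The point it shows is that the routing decision reads a small header and leaves the body as an unparsed bit stream.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    # Stands in for MQRFH2 or JMS properties set by the sending application.
    properties: dict = field(default_factory=dict)
    # The body is kept as raw bytes and is never parsed by route().
    body: bytes = b""

def route(msg: Message) -> str:
    """Choose a destination from a header property only."""
    return msg.properties.get("RoutingKey", "default")  # hypothetical property

if __name__ == "__main__":
    large_body = b"<Body>" + b"<Z>x</Z>" * 100000 + b"</Body>"
    print(route(Message({"RoutingKey": "EU"}, large_body)))
```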