Technical Blog Post
Abstract
Direct I/O and XML Query Performance
Body
When assessing performance for queries, particularly when performance is insufficient or when doing query performance tuning, assessing both CPU utilization and bufferpool I/O (especially data,index, and temporary physical reads) via snapshots or monitor routines are essential metrics which are fundamental to identifying what areas of the query processing are of greatest importance. When running queries which involve XML data, direct I/O can also be an important factor in query performance
Queries which have XML operations such as XMLquery predicates, or which fetch XML column data (for example via XML2CLOB) may end up doing temporary processing which involves direct I/O during query execution. As a simple example of a single table select which includes XML constructs:
SELECT C1, C2...
XML2CLOB(my_xml_column) as xml_data
FROM IMAIONE.T1
WHERE <other local predicates> AND
XMLcast(XMLquery('$d/node1/node2/node3/node4' passing T1.my_xml_column as "d") as varchar(25)) is null AND
XMLcast(XMLquery('$d/node1/node2/node3/node5' passing T1.my_xml_column as "d") as varchar(25)) = C4
This query is assumed to contain some regular (non-XML) predicates on T1, as well as two XMLQUERY predicates which have to be applied on the XML data (in column MY_XML_COLUMN). The query also selects data from the XML column using the XML2CLOB scalar function. An explain plan for this query might look similar to the following:
250
NLJOIN
( 2)
340.973
45.0008
/-+--\
1000 0.25
FETCH XSCAN
( 3) ( 5)
22.7255 318.247
3.00078 42
/---+---\
1000 100000
IXSCAN TABLE: IMAIONE
( 4) T1
15.152 Q4
2
|
100000
INDEX: IMAIONE
IX5
Q4
The query plan includes an XSCAN operator which corresponds to the fetching of the XML column data to be returned via XMLCLOB, as well as the application of the XMLquery based predicates. When monitoring this type of query, check for direct I/O activity associated with the application handle and ensure that the I/O response times look adequate. Note also that these direct I/Os would generally be associated with temporary tablespace activity, although there is no explicit temp operation indicated in the explain graph:
Application Snapshot
Application handle = 32693
Direct reads = 1167414
Direct writes = 1167414
Direct read requests = 25334
Direct write requests = 46870
Direct reads elapsed time (ms) = 9574
Direct write elapsed time (ms) = 7002210
Tablespace Snapshot
Tablespace name = TEMPSPACE1
Tablespace ID = 1
Direct reads = 9251864
Direct writes = 9270058
Direct read requests = 402363
Direct write requests = 259955
Direct reads elapsed time (ms) = 21822
Direct write elapsed time (ms) = 11263002
In this example, although the average direct read time indicated by the snapshots looks fast, the direct write times (average 7002210 ms/1167414 = ~5ms/write) which is a very high number for write I/O. This would indicate a potential issue in the I/O subsystem which would likely create a serious bottleneck for the query's performance.
In addition to ensuring that direct I/O performance of the I/O subsystem is adequate, in some cases it may be possible to reduce the amount of direct I/O incurred by the query by creating appropriate XML indexes, if applicable. Note that using XML indexes may not always prevent such an XSCAN however. Depending on the type of XQuery predicates specified by the query, this may restrict whether an XML index can be used to apply those predicates. Also, if the query selects the full XML column data, as this one does, an XSCAN may still be required to fetch the data.
UID
ibm11140700