Annotated XML schema decomposition and recursive XML documents
Types of recursion
An XML schema is said to be recursive when its type definitions allow elements of the same name and type to appear in their own definition. Recursion can be explicit or implicit.
- Explicit recursion
- Explicit recursion occurs when an element is defined in terms of itself. This is shown in the following example, where the element <root> is explicitly referred to in its own definition using the ref element declaration attribute:
<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="a" type="xs:string"/>
      <xs:element name="b">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="c" type="xs:string"/>
            <xs:element ref="root" minOccurs="0"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
With explicit recursion, a recursive branch is delimited as follows:
- The start of a recursive branch is a declaration of element Y whose ancestors do not include another element declaration of Y. The start of a recursive branch can have multiple branches of descendants; if a particular descendant branch contains another element declaration of Y, that branch is considered a recursive branch.
- The end of a recursive branch is the highest-level element declaration of Y that is a descendant of the start of the branch. Note that the end of the branch is specifically an element reference.
The node that is the start of a recursive branch can serve as the starting node for multiple recursive branches. In the following example there are two explicitly recursive branches:
- <root> (*), <b>, <root> (**)
- <root> (*), <b>, <root> (***)
<xs:element name="root"> <!-- * -->
  <xs:complexType>
    <xs:sequence>
      <xs:element name="a" type="xs:string"/>
      <xs:element name="b">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="c" type="xs:string"/>
            <xs:element ref="root" minOccurs="1"/> <!-- ** -->
            <xs:element ref="root" minOccurs="1"/> <!-- *** -->
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
A recursive branch delineates how its member elements are decomposed. In the instance document, the occurrence of element Y that corresponds to the start of the recursive branch, and its descendants, up to the occurrence of Y that corresponds to the end of that branch, can be decomposed as scalar values. The occurrence of Y in the instance document that corresponds to the end of the recursive branch marks the recursive region. The recursive region begins with the start tag of this occurrence of Y and ends with its end tag. All elements and attributes in the instance document that are in this recursive region can be decomposed as markup or as string values, depending on the value specified for the db2-xdb:contentHandling decomposition annotation.
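As a minimal sketch of what this means for annotations, the explicitly recursive schema shown earlier could map the scalar element <a> to a column. The table and column names (TABLEy, COLa) are hypothetical, and the db2-xdb namespace declaration is assumed to be the usual one for annotated schema decomposition. Because one declaration of <a> governs every occurrence, only the <a> in the non-recursive region (the child of the outermost <root>) would be expected to be decomposed as a scalar value; an <a> inside a nested <root> lies in the recursive region.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:db2-xdb="http://www.ibm.com/xmlns/prod/db2/xdb1">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <!-- Hypothetical mapping: TABLEy and COLa are illustrative names -->
        <xs:element name="a" type="xs:string"
                    db2-xdb:rowSet="TABLEy"
                    db2-xdb:column="COLa"/>
        <xs:element name="b">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="c" type="xs:string"/>
              <!-- End of the recursive branch: an element reference -->
              <xs:element ref="root" minOccurs="0"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```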
- Implicit recursion
- Implicit recursion occurs when an element with a complex type definition contains another element, also defined as a complex type, where the latter has as its type attribute the name of a complex type definition of which it is a part. This is shown in the following example, where the element <beginRecursion> refers to the type "rootType" and the element <beginRecursion> is itself part of the type "rootType" being defined:
<xs:element name="root" type="rootType"/>
<xs:complexType name="rootType">
  <xs:sequence>
    <xs:element name="a" type="xs:string"/>
    <xs:element name="b">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="c" type="xs:string"/>
          <xs:element name="beginRecursion" type="rootType" minOccurs="0"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:complexType>
With implicit recursion, a recursive branch is delimited as follows:
- The start of a recursive branch is a declaration of element Y of complexType type CT whose ancestors do not include another element declaration of type CT. The start of a recursive branch can have multiple branches of descendants; if a particular descendant branch contains another element declaration, Z, of type CT, that branch is considered a recursive branch.
- The end of a recursive branch is the highest-level element declaration of type CT that is a descendant of the start of the branch.
The node that is the start of a recursive branch can serve as the starting node for multiple recursive branches. In the following example there are two implicitly recursive branches:
- <root>, <b>, <beginRecursion>
- <root>, <b>, <anotherRecursion>
<xs:element name="root" type="rootType"/>
<xs:complexType name="rootType">
  <xs:sequence>
    <xs:element name="a" type="xs:string"/>
    <xs:element name="b">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="c" type="xs:string"/>
          <xs:element name="beginRecursion" type="rootType" minOccurs="2"/>
          <xs:element name="anotherRecursion" type="rootType" minOccurs="0"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:complexType>
Implicit recursion is decomposed slightly differently from explicit recursion. In the instance document, the occurrence of element Y that corresponds to the start of the recursive branch, and its descendants, up to the occurrence of Z that corresponds to the end of that branch, can be decomposed as scalar values. This occurrence of Z in the instance document marks the recursive region. The recursive region begins after the start tag of Z and ends immediately before the end tag of Z. All element descendants of this occurrence of Z lie in this recursive region. However, the attributes of this occurrence are outside the recursive region and can therefore be decomposed as scalar values.
Decomposition behavior for recursive branches
For both types of recursion, the recursive branch delineates non-recursive and recursive regions in the corresponding part of the instance document. Only the non-recursive regions of an XML instance document can be decomposed as scalar values into a target database table.

This restriction also applies to any non-recursive region of one branch that lies inside the recursive region of an enclosing branch. That is, if recursive branch RB2 is completely encompassed by recursive branch RB1, then for some instances of RB2 in the instance XML document, the non-recursive region of RB2 can fall inside the recursive region of an instance of RB1. In this case, the non-recursive region cannot be decomposed as scalar values; instead, it is part of the markup that is the decomposition result for RB1. For any instance of RB2, only the part of its non-recursive region that is not inside any other recursive region can be decomposed as scalar values.

The following schema contains two recursive branches:
- RB1 (<root> (identified with *), <b>, <root> (identified with **))
- RB2 (<d>, <d>)
<xs:element name="d">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="d"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:int"/>
  </xs:complexType>
</xs:element>
<xs:element name="root"> <!-- * -->
  <xs:complexType>
    <xs:sequence>
      <xs:element name="a" type="xs:string"/>
      <xs:element ref="d"/>
      <xs:element name="b">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="c" type="xs:string"/>
            <xs:element ref="root" minOccurs="1"/> <!-- ** -->
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
In the following associated instance document, the recursive regions are the nested <d> elements and the nested <root> element. There are two instances of RB2 (<d>, <d>) in the instance document, but only the non-recursive region of the first instance of RB2 (the <d> element with id="1") can be decomposed as scalar values. That is, the attribute id="1" can be decomposed. The non-recursive region of the second instance of RB2 lies completely within the nested <root> element, which is a recursive region of the instance of RB1. Therefore, the attribute id="2" cannot be decomposed.
<root>
  <a>a str1</a>
  <d id="1"> <d id="11"> </d> </d>
  <b>
    <c>c str1</c>
    <root>
      <a>a str11</a>
      <d id="2"> <d id="22"> </d> </d>
      <b>
        <c>c str11</c>
      </b>
    </root>
  </b>
</root>
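To make the nested-branch rule concrete, the declaration of <d> could carry a mapping on its id attribute, as in the sketch below. TABLEd and COLid are hypothetical names, the db2-xdb prefix is assumed to be declared on the enclosing xs:schema element, and minOccurs="0" on the reference is an assumption added so that a finite instance can validate. With this annotation, only id="1" from the instance document would be expected to be decomposed as a scalar value: id="11" lies in the recursive region of RB2, and id="2" and id="22" lie in the recursive region of RB1.

```xml
<xs:element name="d">
  <xs:complexType>
    <xs:sequence>
      <!-- End of RB2; minOccurs="0" is an assumption for this sketch -->
      <xs:element ref="d" minOccurs="0"/>
    </xs:sequence>
    <!-- Hypothetical mapping: TABLEd and COLid are illustrative names -->
    <xs:attribute name="id" type="xs:int"
                  db2-xdb:rowSet="TABLEd"
                  db2-xdb:column="COLid"/>
  </xs:complexType>
</xs:element>
```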
Example: Using the db2-xdb:contentHandling decomposition annotation with both types of recursion
This example demonstrates decomposition behavior for both explicit and implicit recursion, and the results of setting different values for the db2-xdb:contentHandling annotation. Two XML instance documents are used. In Document 1 (explicit recursion), the recursive region is the nested <root> element, including its start and end tags. In Document 2 (implicit recursion), the recursive region is the content of <beginRecursion>, between its start and end tags.
Document 1:
<root>
  <a>a str1</a>
  <b>
    <c>c str1</c>
    <root>
      <a>a str11</a>
      <b>
        <c>c str11</c>
      </b>
    </root>
  </b>
</root>
Document 2:
<root>
  <a>a str2</a>
  <b>
    <c>c str2</c>
    <beginRecursion>
      <a>a str22</a>
      <b>
        <c>c str22</c>
      </b>
    </beginRecursion>
  </b>
</root>
In an instance document, the elements and attributes, and their contents, that appear between the beginning and the end of recursion cannot be decomposed as scalar values into table-column pairs. However, a serialized markup version of these items can be obtained by annotating an element (of complexType) in the recursive branch with the db2-xdb:contentHandling attribute set to "serializeSubtree". A text serialization of all the character data in this part can be obtained by setting db2-xdb:contentHandling to "stringValue" instead. In general, the content or markup of the recursive path can be obtained by setting the db2-xdb:contentHandling attribute appropriately on any complexType element of the recursive branch, or on an element that is an ancestor of the elements in the recursive branch.
For example, annotating the <b> element in the explicitly recursive schema as follows:
<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="a" type="xs:string"/>
      <xs:element name="b"
                  db2-xdb:rowSet="TABLEx"
                  db2-xdb:column="COLx"
                  db2-xdb:contentHandling="serializeSubtree">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="c" type="xs:string"/>
            <xs:element ref="root" minOccurs="0"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
results in this XML fragment being inserted into a row of TABLEx, COLx when Document 1 is decomposed:
<b>
  <c>c str1</c>
  <root>
    <a>a str11</a>
    <b>
      <c>c str11</c>
    </b>
  </root>
</b>
Similarly, annotating the <beginRecursion> element in the implicitly recursive schema as follows:
<xs:element name="root" type="rootType"/>
<xs:complexType name="rootType">
  <xs:sequence>
    <xs:element name="a" type="xs:string"/>
    <xs:element name="b">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="c" type="xs:string"/>
          <xs:element name="beginRecursion"
                      type="rootType" minOccurs="0"
                      db2-xdb:rowSet="TABLEx"
                      db2-xdb:column="COLx"
                      db2-xdb:contentHandling="serializeSubtree"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:complexType>
results in this XML fragment being inserted into a row of TABLEx, COLx when Document 2 is decomposed:
<beginRecursion>
  <a>a str22</a>
  <b>
    <c>c str22</c>
  </b>
</beginRecursion>
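The "stringValue" setting described earlier can be sketched in the same way. The fragment below reuses the explicitly recursive schema and the TABLEx and COLx names from the example above and changes only the db2-xdb:contentHandling value; the db2-xdb prefix is assumed to be declared on the enclosing xs:schema element. With this setting, the column would be expected to receive the character data of the <b> subtree as text rather than as serialized markup. The exact string, including how whitespace between elements is handled, is not shown here because the source does not specify it.

```xml
<xs:element name="root">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="a" type="xs:string"/>
      <!-- Same mapping as the serializeSubtree example, but as text -->
      <xs:element name="b"
                  db2-xdb:rowSet="TABLEx"
                  db2-xdb:column="COLx"
                  db2-xdb:contentHandling="stringValue">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="c" type="xs:string"/>
            <xs:element ref="root" minOccurs="0"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```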