Creating the common analysis structure to index mapping file
Using the common analysis structure to index mapping file, you can determine which analysis results in the common analysis structure you want to index.
About this task
The common analysis structure to index mapping file is an XML file. The following sample mapping file is based on the type system defined for the police report scenario:
<?xml version="1.0" encoding="UTF-8"?>
<indexBuildSpecification
    xmlns="http://www.ibm.com/of/822/consumer/index/xml">
  <skipCondition>
    <type>com.ibm.uima.tt.DocumentAnnotation</type>
    <filter syntax="FeatureValue">toBeProcessed = 0</filter>
  </skipCondition>
  <indexBuildItem>
    <name>com.ibm.analytics.types.Person</name>
    <indexRule>
      <style name="Annotation">
        <attributemappings>
          <mapping>
            <feature>role</feature>
            <indexName>role</indexName>
          </mapping>
          <mapping>
            <feature>title</feature>
            <indexName>title</indexName>
          </mapping>
          <mapping>
            <feature>gender</feature>
            <indexName>gender</indexName>
          </mapping>
        </attributemappings>
      </style>
    </indexRule>
  </indexBuildItem>
  <indexBuildItem>
    <name>com.ibm.analytics.types.Suspect</name>
    <indexRule>
      <style name="Annotation"/>
      <style name="Field"/>
      <style name="Facet"/>
    </indexRule>
  </indexBuildItem>
  <indexBuildItem>
    <name>com.ibm.analytics.types.City</name>
    <indexRule>
      <style name="Annotation">
        <attributemappings>
          <mapping>
            <feature>cityDistrict</feature>
            <indexName>district</indexName>
          </mapping>
        </attributemappings>
      </style>
    </indexRule>
  </indexBuildItem>
  <indexBuildItem>
    <name>com.ibm.analytics.types.Date</name>
    <indexRule>
      <style name="Field">
        <attribute name="fixedName" value="date"/>
      </style>
      <style name="Field">
        <attribute name="fixedName" value="hour"/>
        <attribute name="valueFeature" value="hour"/>
      </style>
    </indexRule>
    <filter syntax="FeatureValue">year="2005"</filter>
  </indexBuildItem>
  <indexBuildItem>
    <name>com.ibm.analytics.types.PoliceReport</name>
    <indexRule>
      <style name="Annotation">
        <attribute name="fixedName" value="PoliceReport"/>
        <attributemappings>
          <mapping>
            <feature>crimeDescription</feature>
            <indexName>crimeDescription</indexName>
          </mapping>
          <mapping>
            <feature>time/coveredText()</feature>
            <indexName>time</indexName>
          </mapping>
          <mapping>
            <feature>date/englDate</feature>
            <indexName>date</indexName>
          </mapping>
          <mapping>
            <feature>location/coveredText()</feature>
            <indexName>location</indexName>
          </mapping>
          <mapping>
            <feature>knownSuspects[]/com.ibm.analytics.types.Suspect:surName</feature>
            <indexName>suspectsLastNames</indexName>
          </mapping>
        </attributemappings>
      </style>
    </indexRule>
  </indexBuildItem>
  <indexBuildItem>
    <name>com.ibm.lang.LastName</name>
    <indexRule>
      <style name="Facet">
        <attribute name="fixedName" value="$.lastName"/>
      </style>
    </indexRule>
  </indexBuildItem>
</indexBuildSpecification>
The common analysis structure to index mapping file must contain all of the analysis results that you want to be able to query and view as facets in the content analytics miner and enterprise search applications.
Procedure
To create the common analysis structure to index mapping file:
- Create an XML file.
  To avoid XML syntax errors, use an XML editor or XML authoring tool of your choice. The XSD schema for the mapping file is called CasToIndexMapping.xsd and is located in the ES_INSTALL_ROOT/configurations/parserservice/jediidata directory.
- Include your mappings in an <indexBuildSpecification xmlns="http://www.ibm.com/of/822/consumer/index/xml"> element. The namespace (specified in the xmlns attribute) must be exactly as shown.
- Add a <skipCondition> element to prevent certain documents from being indexed, based on a feature value. This element is optional. In the example, documents that contain a data structure of type com.ibm.uima.tt.DocumentAnnotation with a feature named toBeProcessed set to zero are not indexed.
- Add one or more <indexBuildItem> elements, each of which contains the mapping of one particular feature structure in the common analysis structure to a structure in the index.
- Save and validate the XML file.
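The steps above can be sketched as a minimal mapping file. This is an illustrative skeleton assembled only from elements that appear in the sample; the choice of the Suspect type and the Annotation style is arbitrary, and a real file would list every type that you want to index.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Root element; the namespace must be exactly as shown -->
<indexBuildSpecification
    xmlns="http://www.ibm.com/of/822/consumer/index/xml">
  <!-- Optional: skip documents whose toBeProcessed feature is 0 -->
  <skipCondition>
    <type>com.ibm.uima.tt.DocumentAnnotation</type>
    <filter syntax="FeatureValue">toBeProcessed = 0</filter>
  </skipCondition>
  <!-- One mapping: Suspect annotations are indexed as spans -->
  <indexBuildItem>
    <name>com.ibm.analytics.types.Suspect</name>
    <indexRule>
      <style name="Annotation"/>
    </indexRule>
  </indexBuildItem>
</indexBuildSpecification>
```

Validating a file like this against CasToIndexMapping.xsd before you deploy it is the quickest way to catch misplaced or misspelled elements.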
Example
- The <indexBuildItem> element
  The common analysis structure to index mapping file contains one or more <indexBuildItem> elements. Each element describes the mapping of one particular feature structure in the common analysis structure to a structure in the index (a span, field, or facet).
  The <name> element contains the feature structure type. There are two ways to specify a type:
  - The full type name. For example, com.ibm.analytics.types.Suspect
  - A wildcard. For example, com.ibm.analytics.types.*. The wildcard character can be added only at the end of the type specification.
  Use only subtypes of uima.tcas.Annotation as index build items. If a feature structure is a subtype of uima.cas.TOP (and not of uima.tcas.Annotation), you can access this feature structure by using a feature path starting from an annotation.
  If type A is a subtype of type B (in the sample, com.ibm.analytics.types.Suspect is a subtype of com.ibm.analytics.types.Person), and <indexBuildItem> elements Ia and Ib are defined for both types, processing is as follows:
  - Each index rule that is defined in Ib is applied to feature structures of type B and feature structures of type A.
  - Each index rule that is defined in Ia is applied to feature structures of type A only.
  In the example, the <indexBuildItem> element that is defined for com.ibm.analytics.types.Person annotations also applies to com.ibm.analytics.types.Suspect annotations. Two spans are created for a suspect annotation: one named Person and the other named Suspect.
  The <filter> element is optional and restricts the <indexBuildItem> mapping to feature structures that have a certain attribute value. This element is useful if you want an attribute to act as a switch for what to index. For example, persons and organizations might be recorded in an annotation of type EntityAnnotation. Its feature called type is set to either person or organization. To extract only the persons, and not the organizations, you can add the following filter:
  <filter syntax="FeatureValue">type = "person"</filter>
  Moreover, you could choose to index persons and organizations under different span names, for example, person and organization. To do so, define two <indexBuildItem> elements of type EntityAnnotation and use two filters on the type feature to select either the persons or the organizations.
- The <indexRule> element
  Each <indexBuildItem> element contains one <indexRule> element. Each <indexRule> element contains all the information that is needed to map a feature structure in the common analysis structure to the index as a field, annotation, or facet. The Annotation, Field, and Facet styles support a number of attributes.
  Restriction: Watson Explorer Content Analytics does not support the Term style that is predefined in the UIMA Software Development Kit.
  For the Annotation, Field, and Facet styles, the following alternatives exist when you specify the annotation, field, or facet name to include in the index:
  - Use fixedName if you want each feature structure to be accessible in the index under the same name. In the following example, each feature structure of type com.ibm.analytics.types.Person is mapped to a span named "Person" in the index.
    <indexBuildItem>
      <name>com.ibm.analytics.types.Person</name>
      <indexRule>
        <style name="Annotation">
          <attribute name="fixedName" value="Person" />
        </style>
      </indexRule>
    </indexBuildItem>
    This example enables queries like "Give me documents where Boss is contained as a person name". The query is expressed as follows by using XML fragments:
    @xmlf2::'<Person>Boss</Person>'
  - Use nameFeature if the annotation stores different entities that you want to be able to access by using different spans, depending on the value of a certain feature of the annotation. In the following example, com.ibm.tt.EntityAnnotation is indexed as a person or organization span, depending on the value of the feature named type. The feature can also be a feature path.
    <indexBuildItem>
      <name>com.ibm.tt.EntityAnnotation</name>
      <indexRule>
        <style name="Annotation">
          <attribute name="nameFeature" value="type" />
        </style>
      </indexRule>
    </indexBuildItem>
    This example enables queries like "Give me documents about the organization WHO" (as opposed to the English term "who"). The query is expressed as follows in limited XPath syntax:
    @xmlp::'/organization[ftcontains="WHO"]'
  - If the <attribute> element is not specified, the short name of the annotation type in the <indexBuildItem> element is used by default. For example:
    <indexBuildItem>
      <name>com.ibm.uima.tutorial.RoomNumber</name>
      <indexRule>
        <style name="Annotation" />
        <style name="Field" />
        <style name="Facet" />
      </indexRule>
    </indexBuildItem>
    This <indexBuildItem> element results in annotations, fields, and facets named RoomNumber that are populated with the text covered by the com.ibm.uima.tutorial.RoomNumber annotation.
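Because the naming attribute is set on each <style> element separately, the alternatives above can be mixed within one index rule. The following fragment is a sketch of that idea, not a documented recipe: the field name room is a hypothetical name chosen for illustration, and the Field style would still need a matching index field defined in the administration console.

```xml
<!-- Fragment: this item belongs inside <indexBuildSpecification> -->
<indexBuildItem>
  <name>com.ibm.uima.tutorial.RoomNumber</name>
  <indexRule>
    <!-- No <attribute>: the span takes the default short name RoomNumber -->
    <style name="Annotation" />
    <!-- fixedName: the field is named "room" (hypothetical name) -->
    <style name="Field">
      <attribute name="fixedName" value="room" />
    </style>
  </indexRule>
</indexBuildItem>
```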
- The <style name="Annotation" /> element
  Annotation in the <style> element specifies how you can access span information. Besides allowing the use of the fixedName and nameFeature attributes, this style also supports the <attributemappings> element. Within this element, you can map the value of a feature to an attribute of the resulting span in the index, which you can later use in a search expression.
  Each mapping is done within a separate <mapping> element. The <feature> element contains a feature path, and the <indexName> element contains the name of the attribute that is used in the index to store the value of <feature>. For example:
  <mapping>
    <feature>make/companyname</feature>
    <indexName>company</indexName>
  </mapping>
  This <mapping> element stores the value of the feature in the path make/companyname directly in the index attribute company.
  Mapping feature values to index attributes is especially useful if the type system used during text analysis is complex and includes many nested feature structures. By using the <mapping> element, you can expose relevant attributes and use them in queries without detailed knowledge of the original type system structure.
- The <style name="Field" /> element
  Field in the <style> element specifies the fields that you want to access. You can use the fixedName and nameFeature attributes. To use Field style rules in the index mapping file, you must use the administration console to define index fields for the fields that you want to search or analyze and configure the appropriate attributes, such as parametric or returnable.
  Field information is always content searchable, that is, field information is accessible through keyword queries.
  The optional attribute valueFeature defines which feature value to take as the field value. If the feature structure is an annotation, and the attribute is not set, the covered text of the annotation is used as the field value. In the example:
  <indexBuildItem>
    <name>com.ibm.analytics.types.Date</name>
    <indexRule>
      <style name="Field">
        <attribute name="fixedName" value="date"/>
      </style>
      <style name="Field">
        <attribute name="fixedName" value="hour"/>
        <attribute name="valueFeature" value="hour"/>
      </style>
    </indexRule>
    <filter syntax="FeatureValue">year="2005"</filter>
  </indexBuildItem>
  Two fields are generated for com.ibm.analytics.types.Date. One field named date contains the covered text, for example, 5:15pm. Another field named hour contains the value of the hour feature. Here you can query by using 'hour::<17'.
- The <style name="Facet" /> element
  Facet in the <style> element specifies the facets to map to feature structures. You can use the fixedName and nameFeature attributes to specify the names of the facets to include in the index. When you specify the value for the fixedName attribute, ensure that you use the correct identifier for the type of collection. For enterprise search collections, specify the name of the facet as specified in the administration console. For content analytics collections, specify the facet path.
  Requirement: When you include facet mappings in the index mapping file, you must use the administration console to define the facets that are stored in the index.
  In addition to the <attribute> element, this style supports the <pathComponent> element. Use this element to specify how to construct the value of the facet. You can use the feature or literal attribute values for the name attribute of the <pathComponent> element.
  With the <pathComponent name="feature"/> element, the value of the feature that is specified by the value attribute is used as the value of the facet in the index. In the following example, the value of the cityName feature of the com.ibm.analytics.types.City annotation is used for the value of the City facet.
  <indexBuildItem>
    <name>com.ibm.analytics.types.City</name>
    <indexRule>
      <style name="Facet">
        <attribute name="fixedName" value="City"/>
        <pathComponent name="feature" value="cityName"/>
      </style>
    </indexRule>
  </indexBuildItem>
  With the <pathComponent name="literal"/> element, the specified string in the value attribute is used as the value of the facet in the index. In the following example, the string Energy is used for the value of the technology facet.
  <indexBuildItem>
    <name>com.ibm.analytics.types.Technology</name>
    <indexRule>
      <style name="Facet">
        <attribute name="fixedName" value="technology"/>
        <pathComponent name="literal" value="Energy"/>
      </style>
    </indexRule>
  </indexBuildItem>
  If no <pathComponent> element is specified, the entire text span that is covered by the annotation is used as the facet value.
  For enterprise search collections only, you can specify multiple <pathComponent> elements per <style name="Facet"> element to produce hierarchical facets. In the following example, the value of the role feature of the com.ibm.lang.Expert annotation is used for the first-level value of the expert facet, and the value of the division feature is used for the second-level value of the facet, which produces a hierarchical facet value.
  <indexBuildItem>
    <name>com.ibm.lang.Expert</name>
    <indexRule>
      <style name="Facet">
        <attribute name="fixedName" value="expert"/>
        <pathComponent name="feature" value="role"/>
        <pathComponent name="feature" value="division"/>
      </style>
    </indexRule>
  </indexBuildItem>
  If you specify multiple <pathComponent> elements in a <style/> element for a content analytics collection, a flat facet is created according to the first <pathComponent> element that is specified. All additional <pathComponent> elements in that style are ignored.
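For a content analytics collection, the fixedName value is a facet path rather than a facet name. The following fragment sketches how the City facet might be written for such a collection. The facet path $.city is a hypothetical value modeled on the $.lastName path in the sample mapping file, and it assumes that a facet with that path is already defined in the administration console.

```xml
<!-- Fragment: this item belongs inside <indexBuildSpecification> -->
<indexBuildItem>
  <name>com.ibm.analytics.types.City</name>
  <indexRule>
    <style name="Facet">
      <!-- Facet path for a content analytics collection (hypothetical path) -->
      <attribute name="fixedName" value="$.city"/>
      <!-- Only one pathComponent: additional ones are ignored for this collection type -->
      <pathComponent name="feature" value="cityName"/>
    </style>
  </indexRule>
</indexBuildItem>
```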