IBM Support

Indexing for IBM FileNet P8 Content Engine Searches

Question & Answer


Question

How do I diagnose slow searches and create database indexes for faster searches?

Cause

The correct set of indexes has not been created, or the searches have not been constructed to take advantage of existing indexes.

Answer


Introduction


In addition to allowing customizable classes and properties, the Content Engine supports customizable searches against the classes and properties that have been uniquely created. Searches can be constructed with a variety of operators and combinations of search conditions, joins, subqueries, ordering, and can include the paging of results.

Because of this powerful and customizable search capability, and because unnecessary indexes can degrade performance when objects are added or updated, a single set of database indexes to support high performance searches for all application designs cannot be created "out of the box" when an object store is created.

Therefore, the FileNet P8 Administrator and Database Administrator must analyze the searches run by the specific FileNet P8 application or custom applications that are being used, in order to create the right and minimal set of database indexes to ensure a high performance system.

Guidelines will be presented here in order to create the right set of indexes for user-customizable searches. Also discussed here are best practices for writing a search to use indexes, the types of tools necessary to identify slow queries, techniques to validate that the required indexes are used, and additional causes of indexes not being used in a database query.

For existing FileNet P8 applications, there are also product-specific indexing recommendations in the Performance Tuning Guide (1) as well as in the documentation associated with those products.

Please note that starting in the IBM FileNet Content Manager 5.2 release, the information in this technote is now in the P8 Documentation under the Performance Tuning section "Tuning IBM FileNet P8 queries".

Also, starting in the IBM FileNet Content Manager 5.2.1 release, the Administration Console replaces FileNet Enterprise Manager. The Administration Console now supports creating composite indexes as well as case-insensitive indexes and additional keywords. See the P8 documentation for more information.

Search Processing in the Content Engine


Understanding how Content Engine searches are processed and then issued against the database will help to understand what columns are needed in a database index to support a fast database query. The columns required in an index will either be the columns associated with user properties alone, or those columns combined with a system column in a composite index.

Content Engine searches are issued by using the Content Engine API, for example:
    Select Id, Creator, DocumentTitle From DocSubClass
    Where SiteLocation = 'site123'
    Order By DocumentTitle

This form and syntax is called the "CE SQL". The CE SQL uses Content Engine properties and classes in the Select, From, Where, and Order By conditions as well as possible joins or other conditions and operators supported by the CE SQL search syntax. See the Content Engine documentation SQL Syntax page for details.

The Content Engine then parses the "CE SQL" into "DB SQL". Classes are mapped to tables, properties are mapped to columns, and system columns are also added into the SELECT, WHERE, and ORDER BY clauses, depending on the type of search.

An example of the DB SQL corresponding to the previous CE SQL is:
    Select object_id, object_class_id, security_id, creator, ..., u1708_documenttitle
    From DocVersion
    Where object_class_id IN (?, ?) And home_id Is Null And u7E24_sitelocation = ?
    Order By u1708_documenttitle, object_id

The DB SQL query is then executed against the database, followed by additional calls by the Content Engine to fetch all the rows required to satisfy the search request.

The total search time thus consists of the time to:
(a) Execute the SQL query
(b) Fetch enough rows from the database to fill a page (if continuable; otherwise all rows are returned, or TOP N rows if specified)
(c) Check security permissions to verify that the user has access to the fetched rows
(d) Return the rows to the client

If the client requests multiple pages in a continuable search, then the total search time is the sum of the search time taken for each page; that is, steps (a) - (d) must be repeated for each page.

The ? symbols added into the DB SQL by the Content Engine are bind markers, so that a query plan does not have to be recompiled for each execution of the same search with different search values. Bind markers are required for fast performance in a high concurrency system. The markers can be seen in a FileNet P8 Server trace; similar markers, such as :1, :2, etc., can be seen in database traces.

The prefix on property columns in the database takes the form uxy_, where xy is a set of digits, for example u54_. Starting in Content Engine 4.5.1, four hexadecimal digits are used rather than two, for example the u1708_ or u7E24_ shown in the previous example. In the DB SQL query examples that follow, the form uxy_ will be used for brevity.

The object_class_id clause will always be added to a search on a subclass in order to restrict the search to that class, or to the set of subclasses for that class as well, if the WITH INCLUDESUBCLASSES clause (the default) is specified. In this case the Document subclass is "DocSubClass".

The home_id clause is added to restrict the search to objects not associated with cross-object store references. The Order By object_id is added by the Content Engine for any continuable search as will be explained in more detail in the following sections.

Continuable Searches

Content Engine searches can be divided into those that are "continuable", or "paged" searches, and "non-continuable", or "non-paged" searches. The type of search, whether continuable or not, can affect the form of index that will be needed.

Many Content Engine applications will always use continuable searching, for example:
  • Administration Console for Content Engine (ACCE)
  • FileNet Enterprise Manager (available prior to 5.2.1)
  • Workplace XT
  • IBM Enterprise Records
  • IBM eDiscovery Manager
  • IBM Content Navigator

Continuable searches can also be used in any custom application using the Content Engine API. In the Content Engine Java API, for example, a continuable search can be configured by specifying continuable=true in the fetchRows() or fetchObjects() APIs.

In a continuable search, rows that satisfy the search criteria are returned a page at time. A request for a new page is issued as a new search, with the Content Engine automatically appending an "ORDER BY object_id" to the database query that is issued. The object_id is the database column corresponding to the Id property of the object's class. This clause is required in order to continue the search at the next value. Succeeding page queries will include an "object_id > 'last Id' " clause. The presence of this order by column affects the type of index that might be needed, as will be shown in the examples that follow.
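For example, if the first page of a continuable search on a hypothetical Property1 produced this DB SQL:
    Select object_id, ... From DocVersion
    Where home_id Is Null And uxy_Property1 = ?
    Order By object_id

then a sketch of the query for the next page would add the continuation condition (the exact form varies by release and database):
    Select object_id, ... From DocVersion
    Where home_id Is Null And uxy_Property1 = ? And object_id > ?
    Order By object_id

where the object_id bind value is the last Id returned in the previous page.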

In a continuable search, database locks are held for a short amount of time compared to a non-continuable search returning all rows, and the best concurrency is achieved with many active user searches. In a non-continuable search, all rows are returned and the database can hold row locks for the duration of the query, causing possible timeouts or database contention.

Basic Search Example

For some simple searches, the necessary indexes can be easily identified and then readily created through the Administration Console, for example:
    Select Id From Document Where Property1 = 'value'

Without an index on Property1, and when there are many rows in the underlying table, the search can be slow. To improve the performance of the search, Property1 should be indexed from the Administration Console (in the properties dialog of the Document class property definition), provided:

1. The search is issued frequently, or it is important for the search to return quickly
2. The number of rows satisfying the search value is small relative to the total number of rows in the table

If the search is issued infrequently, for example only once a week, it might be acceptable to forgo the index, in order to avoid adding unnecessary indexes that slow down object creation and add to storage costs, and instead run the search during off-hours, assuming that the search will still return in an acceptable amount of time and not interfere with any other active Content Engine applications. While not having an index might make the search a lot slower, it might be an acceptable trade-off, given that there could be dozens of possible searchable properties.

There are no fixed rules indicating when an index is needed, such as some percentage of the table having rows satisfying the search value of Property1. At some point, as decided by the database query optimizer based on different costing factors, the best query plan might involve a table scan, rather than one that uses an index on Property1.

However, if there are many rows satisfying the search value, there are still possible indexing solutions that will help, depending on the type of search, as will be discussed in the sections that follow.

The phrase "many rows" as used here typically means several thousand rows or more, although subjective criteria apply as well. For example, the required user response time can be a factor in the decision to optimize the search with the indexing techniques that follow.

Also, when Property1 is a string, and for all the indexing solutions that follow, if the object store has been configured with "Force Case-Insensitive Search", be sure to modify index creation as discussed in the section that follows "Indexing for Case-Insensitive Search".
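As a preview, on a database that supports function-based indexes, such as Oracle, a sketch of a case-insensitive version of a composite index would be:
    create index I_Property1_CI on DocVersion (lower(uxy_Property1), object_id)

The details, including the generated or computed column approach required on other databases, are covered in the "Indexing for Case-Insensitive Search" section.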

Content Based Retrieval (CBR) Searches

Content Based Retrieval (CBR) searches are performed by first retrieving some set of full text rows that satisfy the CBR Contains search criteria from the full text repository, storing those rows in a temporary database table, and then performing a database join query between the database table representing the search class and the temporary table. For the final join query, the indexing solutions presented here still apply.

For more details on CBR searches, including how to perform an "Optimized full-text query", please see the topic "SQL Syntax Reference" in the FileNet P8 Information Center (2). The subtopics "FullTextRowLimit" and "FullTextRowLimit and Optimized Queries" include presentations of how the full text retrieval process works in each case.

Search Timeout

If a timeout is specified for the search, it is first applied by the Content Engine to the SQL execution at search step (a) presented above as well as to the initial content-based retrieval (CBR) execution, if CBR criteria are present in the search.

In Content Engine releases prior to 4.5.1 FixPack 7 and in 5.0 GA, the search timeout does not subsequently apply to the fetch time described in search step (b) of the previous section. However, beginning in 4.5.1 FixPack 7, and in 5.0 FixPack 1 and later, the timeout also applies to the fetch time. This change helps to time out very long-running searches that are discarding many rows due to insufficient access privileges.

Consequently, in Content Engine releases prior to 4.5.1 FixPack 7 and in 5.0 GA, a search will not time out if the SQL execution is fast but many rows are returned. For example, a search without a WHERE clause might begin a table scan and find rows to return with very fast execution, but the fetching of the large row set will not time out.

If you are on the IBM FileNet Content Manager 5.2 release, please consult the product P8 Documentation for descriptions of the changes that have occurred for the query time limit. The Timelimit section in the SQL Syntax References presents the new QueryDatabaseTimeout property that can be applied to the JDBC execution portion of a search. The Server Query-Timeout Behavior section describes the new partial result set behavior.

Searches that can time out during SQL execution are those with slow SQL execution times, such as those with an ORDER BY or JOIN without supporting indexes, or a LIKE condition beginning with a wildcard.


Optimizing a Search Returning Many Rows


Consider the case of a search returning many rows. For example, thousands of rows are returned out of the underlying table with a row count in the millions.

Continuable Search Case

For a continuable (paged) search, if the DB SQL finds rows satisfying the query conditions quickly, without having to scan a table or index to find a page size of rows, the first page of rows will be returned quickly. Retrieving the entire result set, by requesting every page of rows, may still take a long time, because of the physical time required to fetch each row set from the database server and transfer it over the network to the Content Engine server.

A continuable search may thus take a long time because either the rows satisfying the query conditions are not found quickly, or because the total time to request and return all the pages takes a long time.

One approach to improve the performance when many rows are returned is to add more restrictive property conditions to the search. Keep in mind that a continuable search already limits the number of rows returned, albeit a page at a time. So while applying TOP N to the CE SQL will restrict the total rows returned across all the pages of a continuable search, the DB SQL issued by the Content Engine for each page already contains internal optimizations.

In a continuable search, the Content Engine adds into the DB SQL some database vendor dependent "TOP N" syntax (for example, FETCH FIRST N for DB2 or ROWNUM < N for Oracle). These additions encourage the query to return quickly by limiting how many rows must be selected from the database (which might otherwise invoke a table scan), provided there are selective and indexed properties available.
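For example, a sketch of the DB2 form of the DB SQL for a continuable search with a page size of 50 (the exact syntax emitted by the Content Engine can vary by release):
    Select object_id, ... From DocVersion
    Where home_id Is Null And uxy_Property1 = ?
    Order By object_id
    Fetch First 50 Rows Only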

If security filtering removes rows that the user does not have access to, the query is re-issued internally again until the requested page is filled. For this reason, it is important to not rely on security to filter down a search returning many rows.

There are however indexing solutions to improve the performance of a continuable search when many rows are returned, as discussed in the next sections for specific types of search.

Non-Continuable Search Case

For a non-continuable search, after the DB SQL query is executed, all the rows are fetched from the database. Those rows that the user has permission to are returned to the end user. While the initial SQL execution of the DB query might be fast, the overall user search will still be slow due to the time required to fetch all the rows to the Content Engine, process them, and return them to the end user.

The solution to improve the performance of a non-continuable search is to either change the search, by adding in restrictive Where conditions on properties that are indexed, or by limiting the number of rows returned, for example by setting TOP N in the search or by specifying the maximum rows to return in the application interface. For example:
    Select TOP 200 Id From Document Where Property1 = 'value'

The TOP N keyword in the search will curtail the fetching of the rows once 200 rows that the user has permission to see have been found. Because permissions have to be checked on each row, and the permission information is stored in one of the column values selected for each row, database-specific TOP N forms cannot be issued directly to the database. Because of this, it is very important to not rely on security to filter down a search returning many rows.

Additionally, in Content Engine 5.0 and later, when TOP N is used and N is not too large, an optimization clause is added to the DB SQL syntax so that index use will be encouraged in the database query optimizer. A similar optimization clause is already added to all continuable searches by the Content Engine.

Property Equals Search

Consider again the Property Equals Search:
    Select Id From Document Where Property1 = 'value'

For a continuable search, the DB SQL becomes:
    Select object_id, object_class_id, security_id, ...
    From DocVersion
    Where home_id Is Null And uxy_Property1 = ?
    Order By object_id

When there are many rows that satisfy the Property1 = 'value' condition, the "Order By object_id" clause that the Content Engine must add to make the search continuable will add a significant sort cost to the query. For each page of rows, the DB SQL that is executed requires the database to sort the rows by object_id. Because of this sort cost, an index on Property1 alone might not be used in the query plan. It is faster for the database to issue a table scan and then sort the table on object_id, or to scan the entire primary key, which is already in object_id order, than it is to scan the Property1 index, because scanning the Property1 index requires a very slow row ID look-up for each row found.

The solution in this case is to first drop the index on Property1, if present, from Administration Console and then create an index on columns uxy_Property1 and object_id like this:
    create index I_Property1_Id on DocVersion (uxy_Property1, object_id)

The prefix uxy_, which designates a user property column as opposed to a system column, will vary from system to system and from property to property, for example it might have the value u1708_.

The index provides the database a way to go directly to those rows satisfying Property1 and return the rows directly by scanning the index. Because the rows in the index are also in object_id order, no sort step is required in the database query plan.

Keep in mind that if there are not many rows returned by the search, the composite index in this example is not necessary. The original index on Property1 is sufficient. Composite indexes will reduce the performance of object creation as well as add to storage costs if too many are added unnecessarily.

With a non-continuable search returning many rows, the composite index is not necessary. The search can apply TOP N as was previously discussed.

Property Equals Search with Order By

A variation of the Property Equals Search with an Order By condition is:
    Select Id From Document Where Property1 = 'value' Order By Property2

When the search is continuable, the DB SQL becomes:
    Select object_id, object_class_id, security_id, ...
    From DocVersion
    Where home_id Is Null And uxy_Property1 = ?
    Order By uxy_Property2, object_id

When many rows are returned from the search, first drop the index on Property1, if present, from Administration Console and then create an index like this:
    create index I_Prop1_Prop2_Id on DocVersion (uxy_Property1, uxy_Property2, object_id)

If there are not many rows returned by the search, this index is not required because the sort cost will be small.

If the search is non-continuable and returns many rows, create the index similarly, except without the object_id column at the end.
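That is, for the non-continuable case, an index like this is sufficient:
    create index I_Prop1_Prop2 on DocVersion (uxy_Property1, uxy_Property2)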

Property Range Search

Consider this example of a property range search:
    Select Id From Document
    Where Property1 > 'aValue' And Property1 < 'zValue'

If the search is continuable and there are many rows returned, first modify the search to add an "Order By Property1" clause:
    Select Id From Document
    Where Property1 > 'aValue' And Property1 < 'zValue'
    ORDER BY Property1

The DB SQL constructed by the Content Engine becomes similar to:
    Select object_id, object_class_id, security_id, ...
    From DocVersion
    Where home_id Is Null And uxy_Property1 > ? And uxy_Property1 < ?
    Order By uxy_Property1, object_id

Also, drop the index on Property1, if present, from Administration Console and then add a composite index like this:
    create index I_Property1_Id on DocVersion (uxy_Property1, object_id)

With that index, the sort step in the plan is removed and it will become more likely that the database optimizer will use the index in the query plan for the search.

If there are not many rows returned by the search, these changes are not required.

If the search is non-continuable, the Order By Property1 would not have to be added, and a composite index on Property1 with the object_id column would not be needed.

Property Range Search with a Different Order By Property

Here is an example of a search where the Order By property is different from the property used in the range condition:
    Select Id From Document
    Where Property1 > 'aValue' And Property1 < 'zValue'
    Order By Property2

For a continuable search, the DB SQL becomes similar to:
    Select object_id, object_class_id, security_id, ...
    From DocVersion
    Where home_id Is Null And uxy_Property1 > ? And uxy_Property1 < ?
    Order By uxy_Property2, object_id

This is an example where indexes may not be able to solve all forms of a search. An index look-up on uxy_Property1 will return the rows in Property1 order, not in Property2 order. If there are many rows returned by this search, the database will have to perform a sort, and the query plan might not use any index, or might scan the entire index on Property2 if one is present. In short, this query will have a slow execution, in addition to the natural time it takes to fetch and return all the rows. If there are not many rows returned by the search, no changes are required.

A solution to the slow query execution when many rows are returned is only possible if the Order By clause is first changed to use Property1:
    Select Id From Document
    Where Property1 > 'aValue' And Property1 < 'zValue'
    Order By Property1

Also, create a composite index for the query like this:
    create index I_Property1_Id on DocVersion (uxy_Property1, object_id)

If there are not many rows returned by the search, these changes are not required.

If the search is non-continuable, the Order By would also have to be changed to Property1, but the composite index would not be necessary.


Additional Indexing Use-Cases


Composite Indexes for Selective Search Conditions

Sometimes search conditions require too much database row filtering. Database row filtering occurs when each potential row found in the primary property index used in the query plan must be checked for the remaining where conditions back on the main data page of the table.

Consider this search:
    Select Id From Document
    Where Property1 = 'value1' And Property2 = 'value2'
    Order By Property2

Assume the cardinality (number of unique values) of Property1 is greater than that of Property2, and there is an index already present on Property1. As the index on Property1 is traversed, many rows might have to be discarded after checking the value of Property2 back on the main data page. This main-page look-up process can be very expensive in database time, switching back and forth between the index and the main table in order to find the rows that satisfy the where condition. The way to eliminate this time, however, is to replace the simple index on Property1 with a composite index like this:
    create index I_Prop1_Prop2 on DocVersion (uxy_Property1, uxy_Property2)

In realistic searches, there will usually be several other conditions, some only slightly selective, as well as search variations on different underlying columns. Since adding more columns to the composite index is costly both in terms of storage and object creation costs, putting in only the most selective columns makes sense, so as to assist as many property search conditions as possible. Some trade-offs will be necessary to provide high performance overall, in terms of both object retrieval (search) and persistence (creation).

If there are many rows that satisfy the search, the object_id column can also be added to the end of this composite index as well, similar to the earlier examples.
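For example:
    create index I_Prop1_Prop2_Id on DocVersion (uxy_Property1, uxy_Property2, object_id)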

Searches on a Property with Skew

Consider a property equals search:
    Select Id From Document
    Where Property1 = 'value'

When most but not all rows in the table have the same value for a property, the column values are skewed. For the common case of a search looking for a value with few occurrences, an index on Property1 may not be used, depending on the statistics and configuration that is available in the database, as well as the value that is searched on when the query plan is cached. See the Performance Tuning Guide (1) for a discussion of skew solutions for different database vendors.

For example, as described in the Performance Tuning Guide (1), skew solutions for Oracle include executing explicit histogram stats and sometimes a large sample percentage.
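As a sketch only, gathering a histogram on a skewed property column on Oracle might look like the following, where the schema, table, and column names are placeholders and the exact parameters should be taken from the Performance Tuning Guide (1):
    begin
      dbms_stats.gather_table_stats(
        ownname => 'P8USER', tabname => 'DOCVERSION',
        method_opt => 'FOR COLUMNS uxy_property1 SIZE 254',
        estimate_percent => 100);
    end;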

On DB2, this technote must be followed to address skew:
https://www-304.ibm.com/support/docview.wss?uid=swg21402554
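The DB2 solution generally involves collecting distribution statistics, for example (a sketch; the schema name is a placeholder and the technote above remains the authoritative procedure):
    runstats on table P8USER.DOCVERSION with distribution and detailed indexes all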

Workplace (XT) Retrieval of Stored Searches

Retrieval of stored searches from the Workplace (XT) user interface currently invokes a database query for documents where the mime type equals a specific value designated for stored searches. The standard solution for making this query fast is to have the DBA index the mime_type column in the docversion table.

However, because the mime_type values are typically skewed, where most documents only have a few distinct mime_type values, and there are typically a small number of stored searches, skew solutions are sometimes needed to improve the performance of the stored searches retrieval. Solutions for skew are mentioned previously in the section "Searches on a Property with Skew" and can be used if the standard approach of indexing the mime_type column does not result in index use.

Be sure also to check if the object store has been configured with "Force Case-Insensitive Search". In that case, a function-based / generated column index is necessary on the mime_type column, specifically on lower(mime_type). This topic is discussed under the section that follows, "Indexing for Case-Insensitive Search".
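On Oracle, for example, such an index can be created as a function-based index, sketched here:
    create index I_MimeType_CI on DocVersion (lower(mime_type))

On databases without function-based indexes, a generated or computed column on lower(mime_type) is indexed instead, as described in that section.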

Beginning in the Workplace XT 1.1.5 release, indexing the mime_type column in the docversion table for Stored Search retrieval will no longer be necessary. The mime_type search condition used by XT will be replaced by a search on the SearchType (integer) property, which can be indexed directly from the Administration Console, from the StoredSearch document subclass. No skew solutions will be required.

Searches using the LIKE Operator

Searches can be performed from applications using Starts With or (non-CBR) Contains, or explicitly with the LIKE operator. A Starts With search will result in a LIKE query on 'value%', that is, with the wildcard at the end. If there are enough unique characters at the beginning of the search string, an index on the property can be used by the database for the search.

However, if the property search uses Contains, constructed as a LIKE search on '%value%', with a wildcard at the front, then the database cannot use the index for the search, because index usage requires some characters at the front of the search value in order to traverse the index. A Contains search with no other viable predicates will result in a table scan and, under concurrency, can severely impact performance.
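To illustrate with a hypothetical DocumentTitle search, the first (Starts With) form below can use an index on the property, while the second (Contains) form cannot:
    Select Id From Document Where DocumentTitle Like 'invoice%'
    Select Id From Document Where DocumentTitle Like '%invoice%'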

Some recommendations:
  • Contains searches should, when feasible, be performed during off-hours.
  • Contains searches should be minimized in the application, for example by making the default behavior Starts With.
  • Contains searches can look into using full text index solutions, where feasible.

Starts With searches that return many rows also need to follow the same index recommendations mentioned previously in the sections "Property Range Search" and "Property Range Search with a Different Order By Property". The requirements in those sections discuss (a) creating a composite index on the property along with object_id, and (b) changing the ORDER BY clause appropriately.

Remember as always, for string properties, to include the LOWER() function on the column in the index if "Force Case-Insensitive Search" is enabled on the object store, as discussed in the section "Indexing for Case-Insensitive Search".

Searches for a Specific Class

Consider a search against a specific class with no Where clause, like this:

Select Id From DocClass

The DB SQL for a continuable search becomes:
    Select object_id, object_class_id, security_id, creator, ...
    From DocVersion
    Where object_class_id IN (?) And home_id Is Null
    Order By object_id

This type of search is problematic because the cardinality of object_class_id is typically too low for an index on that column to be used in a query plan. The object_class_id column is frequently skewed, and a search against some small minority of rows will result in a table scan.

While solutions to skew exist in the Performance Tuning Guide (1), so that an index on object_class_id can be used to help the search, it is better to design the search with a useful Where condition like this, where Property1 is indexed:
    Select Id From DocClass
    Where Property1 = 'value'

Another approach is to find a property that only has a value for the DocClass and use this Where condition:
    Where Property1 Is Not Null

If adding a selective and indexed property condition is not feasible, an index on object_class_id may be the only choice. Be sure to follow the skew recommendations in the Performance Tuning Guide (1).

If there are many rows returned for the continuable search, create a composite index on object_class_id and object_id like this:
    create index I_OCID_ID on DocVersion (object_class_id, object_id)

If there are subclasses of DocClass, so that the DB SQL clause uses "object_class_id IN (?, ?)", then a sort step on object_id is still required and the composite index cannot eliminate it; an index on object_class_id alone is sufficient.

System Column Indexes

Most system columns that require indexes are already indexed "out of the box" when an object store is created. Sometimes, system properties are added by the Content Engine to the DB SQL query, as shown in this where clause:
    Where T0.object_class_id IN (?, ?) And T0.home_id Is Null
    And (T0.is_current = ? OR ( T0.version_status = ? AND T0.is_reserved = ?))

However indexes on the system columns in this where clause are generally not needed. Consider each of those columns:

home_id
A column such as home_id, used for cross-object store references, is always null, or perhaps has one value (for example, in an IBM Enterprise Records system), so indexing this column will not help the performance of a search. An index on this column will not be used in a query plan because it is too expensive for the database to scan the entire index, switching back to the main table to check the other where conditions.

Adding home_id into a composite index will not help performance either. As discussed in the section "Composite Indexes for Selective Search Conditions", a column should be added into a composite index only if the search condition is selective, thereby reducing discarded rows when the value is checked on the main data page. With home_id, checking the value on the main data page will not discard the row (because it is typically always NULL), and the time to perform this filter step is negligible.

The presence of home_id in the search conditions also does not incur any noticeable expense in the performance of a search. Because the value is typically always null, the "home_id is null" condition is satisfied by virtually every row found by the other search conditions. The row returned from the main data page does have to be checked for the condition, but this filter step is extremely fast and incurs a negligible performance hit. In all cases, reports that a query can be made faster by removing the "home_id is null" clause have turned out to result from faulty analysis.

is_current, version_status, is_reserved
These columns on the DocVersion table have very low cardinality (the number of unique values is small), and as such they are poor candidates for indexing, because the value selected in the search is typically the same value that most rows have. As in the discussion of home_id, when the values are mostly the same, an index will likely not be used, because it is faster to perform a table scan than to scan an index while switching back and forth to the main table to access all of the rows.
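The cardinality argument above can be made concrete with a small sketch. The helper below (an illustration only; real distributions come from the database's statistics or a GROUP BY count, as shown later in this technote) computes the fraction of rows a predicate keeps. A column such as is_current, where nearly every row holds the same value, filters almost nothing, so a b-tree index on it rarely pays off.

```python
from collections import Counter

def selectivity(values, predicate_value):
    """Fraction of rows matching predicate_value; lower is more selective."""
    counts = Counter(values)
    return counts[predicate_value] / len(values)

# 9,990 current documents and 10 non-current ones: searching for the
# common value keeps 99.9% of the table, so a table scan beats the index.
is_current = [1] * 9990 + [0] * 10
print(selectivity(is_current, 1))   # 0.999 -> poor index candidate
print(selectivity(is_current, 0))   # 0.001 -> selective, but a rare query
```

A ratio near 1.0 for the commonly searched value is the signature of a column that should not be indexed on its own.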

object_class_id
This column, which stores the Id of an object's class, also typically has low cardinality. While the same arguments apply as for the other system columns, a distinction can be made in the case of searching for a specific class, as was discussed in the previous section.

Covering Indexes

Consider a search with several property conditions like this:
    Select Id, Property1, Property2, ... From Document
    Where Property1 = 'value' and Property2 = 'value' ...

The DB SQL for this search will contain additional Where conditions on home_id and object_class_id, as well as on other system columns such as is_current, depending on the search.

Creating an index for this search does not require a large composite index, or SQL Server "include columns", to "cover" all the search columns, either in the Where or Select clauses. The right approach for the best performance of the search can be obtained by following the discussion in the previous section "Composite Indexes for Selective Search Conditions".

Covering the entire set of Select or Where columns with a large composite index is an expensive solution in terms of storage costs, ingestion overhead, and statistics run time. It is also fragile: the solution breaks whenever user or system property columns are added to the Select list, as occurs during Content Engine upgrades, metadata authoring, or changes to the search.
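A local sketch (with SQLite standing in for the vendor database, and invented table and column names) illustrates the point: a small composite index on just the selective user properties is picked up by the planner even though it "covers" neither the remaining Where columns nor the Select list.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""CREATE TABLE DocVersion
    (object_id TEXT, uxy_p1 TEXT, uxy_p2 TEXT, home_id TEXT, is_current INTEGER)""")
# Index only the two selective user properties; no "covering" columns.
conn.execute("CREATE INDEX I_P1P2 ON DocVersion (uxy_p1, uxy_p2)")
conn.executemany("INSERT INTO DocVersion VALUES (?, ?, ?, ?, ?)",
                 [(f'id{i}', f'a{i}', f'b{i}', None, 1) for i in range(1000)])

plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT object_id, uxy_p1, uxy_p2 FROM DocVersion
    WHERE uxy_p1 = ? AND uxy_p2 = ? AND home_id IS NULL AND is_current = ?""",
    ('a7', 'b7', 1)).fetchall()
print(plan)   # the plan detail reports a SEARCH ... USING INDEX I_P1P2
```

The residual conditions on home_id and is_current are then evaluated as cheap filter steps against the rows the index returns, as described for home_id earlier.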

Indexes for Complex Searches

For a search with many predicates or joins, some trial and error may be required. Database vendor tools such as design advisors can be used, but be cautious about their output: vendor tools can suggest incorrect indexes, so always follow the techniques presented here. Creating the right index can be an iterative process: create an index, run statistics, test the performance of the query, and validate that the query plan is optimal and uses the new index. As mentioned, too many indexes can slow the performance of inserts and updates, so drop any indexes that are not used in any query plan for the searches issued against the tables involved.

Indexes on GUID data type Columns

A popular misconception is that on Oracle, a special index on a GUID data type column is needed, such as one that uses rawtohex(id_column). This is not the case for the Content Engine. The Content Engine does not require any special index, only an index directly on the GUID column name.

During query execution, the Content Engine properly "binds" any GUID data values used in a query by using the JDBC method setBytes(). Setting the bytes in this manner for the GUID ensures that the value is properly matched to any database index on that column, allowing it to be used in a query plan.

Similarly, the Content Engine does not require hextoraw() to be used in a GUID column where condition, for any query executed by using the Content Engine API, or by using FileNet P8 Applications created with the Content Engine API.

The only case when the Oracle hextoraw() function is needed in a query is when using a non-Content Engine API application, or when using database query tools. In those cases, for queries against the database directly, issue a query like this:
    select object_id from DocVersion Where object_class_id = hextoraw('4AA9266F32C33846AB4C084516BFFDEE')

When the query is executed by using non-Content Engine API applications or database query tools, including hextoraw() on a GUID value will allow the query plan to use the index on the column.

For more information, see this Technote:
https://www-304.ibm.com/support/docview.wss?uid=swg21469817

Note also:
- GUID values are stored in the database in byte-reversed form in Oracle and DB2
- Query plans generated from query tools will generally not be correct, because the Content Engine uses bind values, not literals. To obtain the proper query plan, use profiling techniques, capture the query plan from the shared pool, or use a similar method appropriate for your database, for the actual query as executed by the Content Engine.
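The byte-reversed storage noted above can be sketched as follows, assuming the standard mixed-endian GUID layout (the first three fields stored little-endian, the final eight bytes in order). The helper name is our own, and the GUID shown is a hypothetical original for the hextoraw() literal used earlier in this technote; this is an illustration, not a Content Engine API.

```python
def guid_to_raw_hex(guid):
    """Convert a canonical GUID string to its byte-reversed raw hex form."""
    h = guid.strip('{}').replace('-', '').upper()
    b = bytes.fromhex(h)
    # Reverse the 4-byte, 2-byte, and 2-byte fields; keep the final 8 bytes.
    reordered = b[3::-1] + b[5:3:-1] + b[7:5:-1] + b[8:]
    return reordered.hex().upper()

print(guid_to_raw_hex('{6F26A94A-C332-4638-AB4C-084516BFFDEE}'))
# 4AA9266F32C33846AB4C084516BFFDEE
```

Applying the same reversal twice returns the original byte order, which is why the transformation is safe to use in either direction when comparing tool output to stored values.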

Indexes for List Property Searches

A search on a list property will be constructed in the Content Engine as in this example:
    Select Id From Document
    Where 'value1' IN ListProperty1

The DB SQL will be created similar to this:
    Select T0.object_id, T0.object_class_id, T0.security_id, ...
    From DocVersion T0
    Where T0.home_id IS NULL AND EXISTS
    (select 1 from ListOfString L where L.parent_id = T0.object_id and L.parent_prop_id = ?
    and LOWER(L.element_value) = LOWER(?))

By default, the tables in the Content Engine object store of the form "ListOf<Table>" are not indexed for user searches, although there is a system index present that is necessary for retrievals of values for a specific object. The <Table> suffix can designate String, Boolean, Integer32, Float64, Datetime, or Id.

In order to support user searches on a list property value when there are many rows in the ListOf<Table> table, an index is needed with the element_value column in the leading position, like this:
    create index I_LOS on ListOf<Table> (element_value, parent_prop_id, parent_id)

If "Force Case-Insensitive Search" is enabled, so that the LOWER() function is applied for a ListOfString search, as shown in Example 6, create the index like this:
    create index I_LOS on ListOfString (lower(element_value), parent_prop_id, parent_id)

Also, see the Performance Tuning Guide (1) for additional indexing recommendations for List Property searches.
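A local sketch (with SQLite standing in for the object store database, and invented data) shows that the suggested index with element_value leading is picked up by the query planner for a list-property search. The table and index names follow the technote.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""CREATE TABLE ListOfString
                (parent_id TEXT, parent_prop_id INTEGER, element_value TEXT)""")
# element_value leads, as recommended above for user searches.
conn.execute("""CREATE INDEX I_LOS ON ListOfString
                (element_value, parent_prop_id, parent_id)""")
conn.executemany("INSERT INTO ListOfString VALUES (?, ?, ?)",
                 [(f'obj{i}', 42, f'value{i % 100}') for i in range(1000)])

plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT 1 FROM ListOfString
    WHERE element_value = ? AND parent_prop_id = ?""",
    ('value1', 42)).fetchall()
print(plan)   # the plan detail reports a SEARCH ... USING INDEX I_LOS
```

If the index were created with parent_id leading instead, an element_value-only search could not seek into it, which is why the column order matters.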

Indexing for Case-Insensitive Search

When Force Case-Insensitive Search is enabled on an object store, the Content Engine applies the LOWER function to both the String search property and also the value like this:
    Where LOWER(uxy_Property) = LOWER(?)

An index on uxy_Property will not be used in the query plan because of the presence of the LOWER() function. Instead, an index on the LOWER() function must be created; the steps are unique to each database vendor. The original index on uxy_Property can be dropped.

Also, see the FileNet P8 Information Center (2) for details on creating indexes on the LOWER() function, when Force Case-Insensitive Search is enabled.
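A local sketch (with SQLite, which supports expression indexes, standing in for the vendor database) illustrates the principle of a function-based index: once the index is built on the LOWER() expression itself, the planner can use it for the case-insensitive comparison shown above. The column name uxy_Property follows the technote; the data is invented, and the vendor-specific syntax differs, as noted.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE DocVersion (object_id TEXT, uxy_Property TEXT)")
# Index the expression, not the raw column.
conn.execute("CREATE INDEX I_LOWER ON DocVersion (LOWER(uxy_Property))")
conn.executemany("INSERT INTO DocVersion VALUES (?, ?)",
                 [(f'id{i}', f'Value{i}') for i in range(500)])

plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT object_id FROM DocVersion
    WHERE LOWER(uxy_Property) = LOWER(?)""", ('value7',)).fetchall()
print(plan)   # the plan detail reports a SEARCH ... USING INDEX I_LOWER
```

The comparison in the Where clause must use exactly the indexed expression, which matches the form of the query the Content Engine issues when Force Case-Insensitive Search is enabled.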


Best Practices for Writing a Search to use Indexes


The Best Practices for Searches section in the Working with Queries topic found in the FileNet P8 Information Center (2) discusses the following recommendations in detail. Database indexes will only be used for a search if the appropriate guidance is followed and the correct indexes are created.
  • Use a Wildcard only at the end of a string search value (use Starts With for a LIKE query)
  • Create Oracle Function-based or DB2 Generated Column Indexes for Case-Insensitive string comparisons
  • Join only one or two tables at a time in a query
  • Avoid unnecessary Order By clauses
  • Use the Intersects operator for multiple List Property OR conditions

The Performance Tuning Guide (1) also discusses important factors to ensure that indexes are used in a database query plan for a search, including:
  • Gathering Statistics
  • Running REORG occasionally
  • Addressing Skew by proper design of the application, or with necessary database specific solutions

Also, the SQL Syntax Reference page in the FileNet P8 Information Center (2) includes some important requirements for high performance CBR searches, for example see the section titled "FullTextRowLimit and Optimized Queries".


Diagnosing Slow Queries


In a large or complex FileNet P8 system, many database queries can be executed frequently. In order to find the queries that require tuning, tools and techniques are necessary to identify slow queries. All sites should use these techniques to determine whether slow queries are causing system overhead.

Keep in mind, as discussed earlier, that the total search time consists of the SQL query execution time, the fetch time to return rows from the database, and the time to return every page of rows to the client if many rows satisfy the search. The techniques that follow identify slow SQL query execution times. A client search may still be slow if many rows have to be returned, due to the natural time it takes to fetch and return all the rows. Techniques to help searches that return many rows were discussed in the previous sections.

The first productive technique for addressing slow SQL queries is to examine common searches from the FileNet P8 application side, based on the search interface, in order to see directly what kinds of searches can be executed. Searches that are frequently executed on selective properties will require at least some of those properties to be indexed, as has been discussed. In some cases, a search interface will have to be designed to prevent slow queries from being executed, when possible. For example, a database property "Contains" search will run slowly because a LIKE '%value' query cannot use an index when there is a wildcard at the front. Changing the default for the search to "Starts With" will make the entire system much faster by eliminating table scans. Other techniques are discussed or referenced in the previous section on Best Practices for Writing a Search.

Another necessary technique is to use database SQL monitoring tools to identify slow running queries within the database. For example, depending on which database vendor you are using, use these types of tools:
  • Oracle AWR / ADDM
  • DB2 Snapshots and Event Monitors
  • SQL Server Profiler

Content Engine Tools are also available to monitor for slow searches, including:
  • Performance Logging
  • Content Engine Tracing
  • IBM System Dashboard for ECM

The Content Engine "Must Gather Technotes" discuss Performance Logging, which can identify slow DB SQL execution:
https://www-304.ibm.com/support/docview.wss?uid=swg21391007

A FileNet P8 server trace for the Content Engine with Database and Search tracing subsystems enabled will show the SQL execution time of each database query:

Enabling IBM FileNet Content Engine Tracing
http://www-01.ibm.com/support/docview.wss?uid=swg21308282

Configuring trace logging at different levels
http://www-01.ibm.com/support/docview.wss?uid=swg27013426

Logging, when FileNet Enterprise Manager cannot be used
https://www-304.ibm.com/support/docview.wss?uid=swg21308321

Once a slow SQL query is identified, the next steps are to obtain the DDL for the tables involved in the query, as well as row counts for the tables and the distribution of the properties involved, so that skew and selectivity can be known.

DBA techniques can be used to gather the distributions and cardinality of values, such as issuing db2look on DB2 or similar queries of the statistics, or the following explicit query method can be used:
    select count(*) from <table>
    select count(*), uxy_property from <table> group by uxy_property order by 1 desc

Substitute the relevant table(s) for <table>, and the column name(s) involved for uxy_property.
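Run against a small SQLite stand-in with invented data, the two queries above show how the GROUP BY output exposes skew: one value dominating the counts is exactly the pattern that defeats an index.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE DocVersion (uxy_status TEXT)")
# Heavily skewed data: 'closed' dominates the column.
conn.executemany("INSERT INTO DocVersion VALUES (?)",
                 [('closed',)] * 950 + [('open',)] * 40 + [('hold',)] * 10)

total = conn.execute("SELECT COUNT(*) FROM DocVersion").fetchone()[0]
dist = conn.execute("""SELECT COUNT(*), uxy_status FROM DocVersion
                       GROUP BY uxy_status ORDER BY 1 DESC""").fetchall()
print(total)   # 1000
print(dist)    # [(950, 'closed'), (40, 'open'), (10, 'hold')]
```

Here a search for 'closed' keeps 95% of the rows (a table scan is cheaper), while a search for 'hold' is selective and could benefit from an index, which is the skew situation referenced in the DB2 Skew Solutions section below.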

Also, obtain a query plan to see what indexes if any are used for the query. This is discussed in the next section.

To address the slow query, the most selective column in the query should generally be indexed, if it is not already. However, all of the sections presented here should be studied to properly index the necessary columns, or to determine the possible reasons that existing indexes are not used in the query plan.


Validating Index use in a Query


When indexes are being used, the database query typically runs faster. However, a query may run fast on the second execution without using an index, if data is buffered in the database. It is therefore generally a best practice to record the execution time the first time a query is executed, as well as the second time, and report both results. In addition, obtain the query plan in order to see whether the database is using the right indexes and plan.

Recall, as discussed previously, that the end-user search may still take a long time if many rows are returned, because the search consists of the database SQL query execution, followed by the fetching of rows, plus the total time to return all pages requested in a continuable search. The validation discussed here ensures that the DB SQL query executes quickly and uses the right indexes.

Obtaining a query plan can be problematic if not done correctly for several reasons:
(1) Content Engine queries are issued through the JDBC thin driver, using certain API calls. If a query is executed using command line or GUI database tools, the query plan can be different than when the search is executed by the Content Engine. Where feasible, the query plan that was actually used for the query executed by the Content Engine should be obtained.
(2) Content Engine queries use bind markers for the values. If a query is executed using literals, such as uxy_property = 'value' for the values, rather than "uxy_property = ?", the query plan will often be different depending on how the Content Engine query was first compiled based on any "bind peeking" provided by the database vendor.
(3) Oracle Explain plans can sometimes be incorrect when they involve raw data types (as are used by the Content Engine for Id columns), or when improper hextoraw() conversions are used. See the previous section "Indexes on GUID data type Columns".
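Point (2) can be sketched with a minimal example (SQLite stands in; the plan divergence itself only manifests on the vendor databases). The Content Engine sends bind markers ("?") with the value supplied at execute time, whereas a GUI tool re-typing the query usually embeds a literal:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE DocVersion (object_id TEXT, uxy_property TEXT)")
conn.execute("INSERT INTO DocVersion VALUES ('id1', 'value')")

# Parameterized form, as the Content Engine issues it (value bound at execute):
bound = conn.execute(
    "SELECT object_id FROM DocVersion WHERE uxy_property = ?",
    ('value',)).fetchall()

# Literal form, as a query tool might re-type it:
literal = conn.execute(
    "SELECT object_id FROM DocVersion WHERE uxy_property = 'value'").fetchall()

print(bound == literal)   # same rows either way
```

Both forms return the same rows, but on Oracle, DB2, or SQL Server the optimizer may compile different plans for them (for example, through bind peeking), which is why the plan must be captured for the bound query the Content Engine actually executed.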

Obtaining an Oracle Query Plan
To obtain a query plan for Oracle, use the following technique from Oracle support to obtain the query plan that was actually executed by the Content Engine, from the shared pool.

Immediately after a search is run from the FileNet P8 application through the Content Engine, log in as the sys user and run these commands from a SQL query tool such as SQL*Plus. The "query portion" mentioned in the following commands is enough exact SQL text to uniquely identify the query that was previously found to be slow.

column id format 999
column pid format 999
column operation format a25
column options format a20
set linesize 150
set pagesize 999

select distinct hash_value, sql_text from v$sql where lower(sql_text) like lower('%query portion%');

select distinct id, parent_id as pid, lpad (' ', depth) || operation operation, options, object_name, cost, bytes from v$sql_plan where hash_value = <hash value from previous query> order by id;

Obtaining a DB2 Query Plan
On DB2, use the Explain plan methods with the exact query executed by the Content Engine (including bind markers). On DB2 9.7, a better approach to ensure that the right plan is obtained is to gather a section explain immediately after a search is run from the FileNet P8 application through the Content Engine. For example, after creating the explain tables, log in as the schema owner and run these commands:

DB2 "SELECT SECTION_TYPE, EXECUTABLE_ID, VARCHAR(STMT_TEXT, 50) AS TEXT FROM TABLE(MON_GET_PKG_CACHE_STMT ( 'D', NULL, NULL, -2)) as T WHERE STMT_TYPE_ID LIKE 'DML%' AND lower(STMT_TEXT) LIKE lower('%query portion%')"
(Use a portion of the query obtained from a P8 server trace after the query is run)

CALL EXPLAIN_FROM_SECTION (x'015057652000', 'M', NULL, 0, NULL, ?, ?, ?, ?, ? )
(Use the entire executable_id returned in the first query for the first argument)

db2exfmt -d databaseName -e schemaName -w 2011-06-28-18.53.10.973000 -n SYSSH200 -s NULLID -t -# 0
(Use the -w explain_time, -n source_name, -s source_schema from last call)

Obtaining a SQL Server Query Plan
For SQL Server, SQL Profiler can be configured to obtain the query execution plan after a search is run from the FileNet P8 application through the Content Engine.


Additional causes of Indexes not being used in a Search


Statistics
Keep statistics up to date. However, if there are already millions of rows in the tables, the distribution might not change much at all, so frequent statistics runs are not necessary.

Oracle statistics are known in some cases to produce incorrect query plans, even when the right indexes have been created. See the following Technote for solutions:
https://www-304.ibm.com/support/docview.wss?uid=swg21468281

Oracle Date Properties
On Oracle, indexes on Date properties in versions of the Content Engine prior to 5.0 are not used in a query plan, due to a bug in Oracle JDBC drivers prior to 11.2.0.2.

See the following Technote for solutions:
https://www-304.ibm.com/support/docview.wss?uid=swg21397282

Oracle Parallel Query
Oracle parallel query is generally used for data warehousing applications, and is not appropriate for the Content Engine. Oracle parallel query can cause table scans in query plans where indexes should have been used.

See the following Technote for more information:
https://www-304.ibm.com/support/docview.wss?uid=swg21469922

DB2 Skew Solutions
For searches on columns with skew, see the following Technote:
https://www-304.ibm.com/support/docview.wss?uid=swg21402554

DB2 REORG
On DB2, a table REORG can make a query faster, when it is feasible to run.


References


1. Performance Tuning Guide
The IBM FileNet P8 Performance Tuning Guide is referenced on the main IBM FileNet P8 Platform Documentation page:
https://www.ibm.com/support/docview.wss?uid=swg27019441

2. FileNet P8 Information Center
The Information Center is referenced on the IBM FileNet P8 Platform Documentation page:
https://www.ibm.com/support/docview.wss?uid=swg27019441


Document Information

Modified date:
17 June 2018

UID

swg21502886