DB2 Version 10.1 for Linux, UNIX, and Windows

Net Search Extender key concepts

Net Search Extender offers users and application programmers a way to search full-text documents stored in DB2® databases, other databases, and file systems using SQL queries.

Important: Net Search Extender has been deprecated. Its use is no longer recommended and might be removed in a future release. Use DB2 Text Search as a fast and versatile method of searching full-text documents stored in DB2 databases using SQL and XQuery statements. See the topic about migrating from Net Search Extender to DB2 Text Search for details.

To fully understand the capabilities of Net Search Extender, it is necessary to understand key terms and the various options available. It is also necessary to have a basic understanding of DB2 database concepts and terms.

Basically, Net Search Extender searches text documents that are held in the column of a database table.

The text documents must be uniquely identifiable. Net Search Extender uses the primary key of the table for this purpose.

The documents can be stored in various formats. The formats include either unstructured plain text, structured text such as HTML or XML, or proprietary document formats such as PDF or Microsoft Office document formats. For the latter, additional filtering software that might have to be licensed separately is required.

Rather than sequentially searching through the text documents at query time that would take a considerable amount of time, Net Search Extender creates a text index in order to make documents searchable efficiently.

A text index consists of significant terms that are extracted from the text documents.

Figure 1. Creating a text index

Text index creation is the process of defining and declaring the properties of the index, such as the location of the index. After creation, the text index does not as yet contain any data. Index update is the process of adding data about terms and documents to the text index. The first index update adds information about all text documents from the text column to the index. The first update is known as the initial update.

By using a text index for searching, there are synchronization issues between the table and the text index that must be taken into account, as any follow-up changes to the table, such as additions, deletions, and updates to the text documents must be reflected in the text index. These changes are applied to the text index with an incremental update.

Net Search Extender supports two options to synchronize the text index with its source table. The basic synchronization in Net Search Extender is based on triggers that automatically store information about new, changed, and deleted documents in a log table. There is one log table for each text index.

Figure 2. Incremental update process with triggers

The basic option is based only on triggers; updates that are not recognized by triggers will therefore be ignored, for example, loading data with the LOAD command, attaching or detaching ranges for a range-partitioned table. An extended synchronization option enables capturing such changes through integrity processing by adding a text-maintained staging table to store information about new and deleted documents, while the log table stores information about changed documents captured through a trigger.

Figure 3. Incremental update with triggers and integrity processing

You can update the text index by using a manual or automatic option. The automatic option uses an update schedule with specified days and times.

Note that neither of these options synchronizes the text index within the scope of a transaction that updates, deletes, and inserts text documents. Net Search Extender's asynchronous text indexing improves performance and concurrency. The update is applied within a separate transaction to a copy of a very small part of the index. The index is only locked for read access during a very short period of time when the copy is put in place of the original. This is invisible to search operations, see Net Search Extender instance services for information.

A text index has certain properties, such as index file location and automatic update properties. If necessary, you can change some of the properties. This is known as altering the index. Altering the index does not modify any index data.

One such property is whether the ORDER BY phrase should presort the text index on the table columns. In such a case, the initial update will index the text document in the order specified, and return the search results in this order.

For example, you might specify presorted book abstracts according to the book price. When looking for the least expensive books about relational database systems, you can restrict your text search to return only the first couple of books as these will be the cheapest. However, without presorted indexes, you would have to search for all books and join these with the cheapest books, which would be a more costly operation.

Net Search Extender allows several presorted indexes per text column. For example, one index for presorting books according to the date of publication and, a second presorting books according to the price.

Usually the first update after creating a text index is an initial update, and the following updates are incremental. However, when working with presorted indexes you want to keep the order in case of updates. This is addressed by the RECREATE INDEX ON UPDATE option, which totally rebuilds the index each time an update is performed.

After the text index is updated, you can search using one of the following options:

An SQL scalar search function
A stored procedure search
An SQL table-valued function

As the search options have different operating characteristics, they are explained in the following sections.