Db2 Text Search key features and concepts

Db2 Text Search offers you a fast and versatile method for searching text documents that are stored in a table column in Db2 databases. You can search the documents by using SQL queries or XQuery for searches on XML documents.

The text documents must be uniquely identifiable. Db2 Text Search uses the primary key of the table for this purpose.

Rather than searching text documents sequentially, Db2 Text Search searches using a text search index, which is a more efficient approach. A text search index consists of significant terms that are extracted from the text documents.

Figure 1. Creating a text search index
Creating a text search index
Creating a text search index defines the properties of the index, such as the update frequency. The text search index does not contain any data immediately after you create it. Updating the index adds data about the terms and the text documents to the text search index. The initial index update adds all text documents from a text column to the index. Subsequent updates are known as incremental updates and synchronize the data in the table and the data in the text search index. Db2 Text Search provides two methods for synchronizing a text search index with its table:
  • The basic synchronization method uses triggers that automatically store information about new, changed, and deleted documents in a staging table.
  • The extended synchronization method uses a trigger to store information about changed documents in a staging table but captures information about new and deleted documents through integrity processing and stores that information in an auxiliary staging table.
See the text search index creation, updates, and property alterations topic for details.
Db2 Text Search works by collecting data from diverse sources and indexing it for subsequent fast retrieval. Db2 Text Search uses linguistic analysis to improve search results and supports the following document formats:
  • Unstructured plain text.
  • Structured text such as that in HTML or XML documents
  • Proprietary document formats such as PDF or Microsoft Office document formats.

    For proprietary formats, you need filtering software that might require an additional download and setup step.

Db2 Text Search supports full-text search in a partitioned database environment. You can also create a text search index for range-partitioned tables or tables that use the multidimensional clustering feature in a single-partition or partitioned database environment. Text search indexes are supported for any partitioning feature combination. In a partitioned database environment, the text search index is partitioned according to the partitioning of the table across multiple database partitions. Other partitioning features, such as table partitioning or multidimensional clustering, do not affect the partitioning of the text search index.

Db2 Text Search supports both stand-alone and integrated setups. For partitioned environments, a stand-alone setup is preferred to avoid resource contention with the database server. For Db2 pureScale® environments, only stand-alone servers are supported. For more detail on Db2 text search set up for pureScale environments please see this recipe.

In all environments, Db2 text search is not supported on column-organized tables.