db2ts CREATE INDEX command

The db2ts CREATE INDEX command creates a text search index for a text column. You can then search the column data by using text search functions.

Important: Net Search Extender (NSE) is no longer supported in Db2®. Use the Db2 Text Search feature.

The text search index does not contain any data until you run the text search UPDATE INDEX command or the Db2 Administrative Task Scheduler runs the UPDATE INDEX command according to the defined update frequency for the index.

To issue the CREATE INDEX command, you must prefix the command name with db2ts.

Authorization

The authorization ID of the db2ts CREATE INDEX command must hold the SYSTS_MGR role and CREATETAB authority on the database and one of the following items:
  • CONTROL privilege on the table on which the index will be defined
  • INDEX privilege on the table on which the index will be defined and one of the following items:
    • IMPLICIT_SCHEMA authority on the database, if the implicit or explicit schema name of the index does not exist
    • CREATEIN privilege on the schema, if the schema name of the index exists
  • DBADM authority

To schedule automatic index updates, the instance owner must have DBADM authority or CONTROL privileges on the administrative task scheduler tables.

Required connection

Database

Command syntax

Read syntax diagramSkip visual syntax diagramCREATE INDEXindex_nameFOR TEXTONschema_nametable_name (text_column_name)(function_name(text_column_name))text default informationupdate characteristicsstorage optionsindex configuration optionsconnection options
text default information
Read syntax diagramSkip visual syntax diagramCODEPAGEcode_pageLANGUAGElocaleFORMATformat
update characteristics
Read syntax diagramSkip visual syntax diagramUPDATE FREQUENCYNONEupdate frequencyincremental update characteristics
update frequency
Read syntax diagramSkip visual syntax diagramD(*,integer1)H(*,integer2)M(,integer3)
incremental update characteristics
Read syntax diagramSkip visual syntax diagramUPDATE MINIMUMminchanges
storage options
Read syntax diagramSkip visual syntax diagramCOLLECTION DIRECTORYdirectoryADMINISTRATION TABLES INtablespace_name
index configuration options
Read syntax diagramSkip visual syntax diagramINDEX CONFIGURATION(,option value)
option value
Read syntax diagramSkip visual syntax diagram COMMENTtext UPDATEAUTOCOMMITcommitcount_numbercommitsizeCOMMITTYPEcommittypeCOMMITCYCLEScommitcyclesINITIALMODEinitialmodeLOGTYPEltypeAUXLOGauxlog_valueCJKSEGMENTATION cjksegmentation_methodUPDATEDELAYupdateDelay value
server configuration options
Read syntax diagramSkip visual syntax diagramSERVERIDserverId
connection options
Read syntax diagramSkip visual syntax diagramCONNECT TOdatabase_nameUSERusernameUSINGpassword

Command parameters

INDEX index_name
Specifies the name of the index to create. This name (optionally, schema qualified) will uniquely identify the text search index within the database. The index name must adhere to the naming restrictions for Db2 indexes.
ON table_name
Specifies the table name containing the text column. In Db2 Version 10.5 Fix Pack 1 and later fix packs, you can create a text search index on a nickname. You cannot create text search indexes on federated tables, materialized query tables, or views.
text_column_name
Specifies the name of the column to index. The data type of the column must be one of the following types: CHAR, VARCHAR, CLOB, DBCLOB, BLOB, GRAPHIC, VARGRAPHIC, or XML. If the data type of the column is not one of these data types, use a transformation function with the name function_schema.function_name to convert the column type to one of the valid types. Alternatively, you can specify a user-defined external function that accesses the text documents that you want to index.

You can create only a single text search index for a column.

function_name(text_column_name)
Specifies the schema-qualified name of an external scalar function that accesses text documents in a column that is not of a supported data type for text searching. The name must conform to Db2 naming conventions. This parameter performs a column type conversion. This function must take only one parameter and return only one value.
CODEPAGE code_page
Specifies the Db2 code page (CODEPAGE) to use when indexing text documents. The default value is specified by the value in the view SYSIBMTS.TSDEFAULTS, where DEFAULTNAME='CODEPAGE'. This parameter applies only to binary data types, such as the column type or return type from a transformation function must be BLOB or FOR BIT DATA.
LANGUAGE locale
Specifies the language that Db2 Text Search uses for language-specific processing of a document during indexing. To have your documents automatically scanned to determine the locale, specify AUTO for the locale option. If you do not specify a locale, the database territory determines the default setting for the LANGUAGE parameter.
FORMAT format
Specifies the format of text documents in the column. The supported formats include TEXT, XML, HTML, and INSO. Db2 Text Search requires this information when indexing documents. If you do not specify the format, the default value is used. The default value is in the view SYSIBMTS.TSDEFAULTS, where DEFAULTNAME='FORMAT';. For columns of data type XML, the default format 'XML'; is used, regardless of the value of DEFAULTNAME. To use the INSO format, you must install rich text support
UPDATE FREQUENCY
Specifies the frequency of index updates. The index is updated if the number of changes is at least the value of the UPDATE MINIMUM parameter. You can do automatic updates if the Db2 Text Search instance services are running, which you start by issuing the START FOR TEXT command.

The default frequency value is taken from the view SYSIBMTS.TSDEFAULTS, where DEFAULTNAME is set to UPDATEFREQUENCY.

NONE
No further index updates are made. The NONE option can be useful for a text column in a table with data that does not change. It is also useful if you intend to manually update the index by using the UPDATE INDEX command.
D
The days of the week when the index is updated.
*
Every day of the week.
integer1
Specific days of the week, from Sunday to Saturday: 0 - 6.
H
The hours of the specified days when the index is updated.
*
Every hour of the day.
integer2
Specific hours of the day, from midnight to 11 p.m.: 0 - 23.
M
The minutes of the specified hours when the index is updated.
integer3
The top of the hour (0) , or 5-minute increments after the hour: 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or 55.
UPDATE MINIMUM minchanges
Specifies the minimum number of changes to text documents before the index is updated incrementally according to the frequency that you specify for the UPDATE FREQUENCY parameter. Only positive integer values are allowed. The default value is taken from the view SYSIBMTS.TSDEFAULTS, where DEFAULTNAME='UPDATEMINIMUM'.

The UPDATE INDEX command ignores the value of the UPDATE MINIMUM parameter unless you specify the USING UPDATE MINIMUM option for that command.

A small value for the UPDATE MINIMUM parameterincreases consistency between the table column and the text search index. However, it also increases the load on the system.

COLLECTION DIRECTORY directory
Specifies the directory in which the text search index collection is stored. You must specify the absolute path, where the maximum length of the absolute path name is 215 characters. The process owner of the text search server instance service must have read and write access to this directory.

The COLLECTION DIRECTORY parameter is supported only for an integrated text search server setup. For additional information about collection locations, review the usage notes.

ADMINISTRATION TABLES IN tablespace_name
Specifies the name of an existing nontemporary table space for the administration tables that are created for the index.

For a nonpartitioned database, if you do not specify a table space, the table space of the base table for which you are creating the index is used.

For a partitioned database, you must use the ADMINISTRATION TABLES IN parameter. To ensure that the staging tables for the text search index are distributed in the same manner as the corresponding base table, the table space must be in the same partition group as the table space of the base table.

INDEX CONFIGURATION (option_value)
Specifies more index-related options as option-value string pairs. Options and values are as follows:
Table 1. Option-value pairs
Option Value Data type Description
COMMENT text String value of fewer than 512 bytes Adds a string comment value to the REMARKS column in the Db2 Text Search catalog view TSINDEXES. It also appends the string comment value as the description of the collection to the table.
UPDATEAUTOCOMMIT commitsize String

Specifies the number of rows or number of hours after which a commit is run to preserve the previous work for either initial or incremental updates.

If you specify the number of rows, after the number of updated documents reaches the COMMITCOUNT number, the server applies a commit. COMMITCOUNT counts the number of documents that are updated by using the primary key, not the number of staging table entries.

If you specify the number of hours, the data in text index is committed after the specified number of hours is reached. The maximum number of hours is 24.

For an initial update, the index update processes batches of documents from the base table. After the commitsize value is reached, update processing completes a COMMIT operation, and the last processed key is saved in the staging table with the operational identifier '4.' This key is used to restart update processing after a failure or after the completion of the specified number of commitcycles . If you specify a commitcycles value, the update mode is changed to incremental to initiate capturing changes by using the LOGTYPEBASIC option to create triggers on the text table. However, , until the initial update is complete, log entries that were generated by documents that were not processed in a previous cycle are removed from the staging table.

Using the UPDATEAUTOCOMMIT option for an initial text index update significantly increases execution time.

For incremental updates, log entries that are processed are removed from the staging table with each interim commit.

COMMITTYPE committype String

Specifies rows or hours for the UPDATEAUTOCOMMIT index configuration option. The default is rows.

COMMITCYCLES commitcycles Integer

Specifies the number of commit cycles. The default is 0, meaning unlimited cycles.

If you do not specify the number of cycles, the update operation uses as many cycles as required to finish the update processing, based on the batch size that you specify for the UPDATEAUTOCOMMIT option.

You can use the COMMITCYCLES option with the UPDATEAUTOCOMMIT option with a committype option .

INITIALMODE initialmode String

Specifies how the updates are processed. The possible values of the INITIALMODE option are as follows:

FIRST
The primary update is the default value of the INITIALMODE option.
SKIP
The update mode is immediately set to incremental, triggers are added for the LOGTYPEBASIC option, but no initial update is performed.
NOW
The update is started after the index is created as the final part of the CREATE INDEX command operation. This option is supported only for single-node setups.
LOGTYPE ltype String Specifies whether triggers are added to populate the primary log table. The values are as follows:
BASIC
The primary staging table is created, and triggers are created on the text table to recognize any changes. This is the default value for text search indexes on base tables. This option is not supported for nicknames.
CUSTOM
The primary staging table is created, but no triggers are created on the text table. To identify changes for incremental updates, especially if you do not plan to use the ALLROWS option for updates. The CUSTOM option is supported for nicknames.
Note: The default value of the LOGTYPE option is CUSTOM for text search indexes on nicknames.
AUXLOG auxlog_value String Controls the creation of the additional log infrastructure to capture changes that are not recognized by a trigger. The default setting for range-partitioned tables is ON. You can change the default value in the default table by setting AuxLogNorm for non-range-partitioned tables and AuxLogPart for range-partitioned tables.

For text search indexes on nicknames, only the OFF option is supported for theAUXLOG option.

CJKSEGMENTATION cjksegmentation_method String value of fewer than 512 bytes Specifies the segmentation method that applies to documents that use the Chinese, Japanese, or Korean language (zh_CN, zh_TW, ja_JP, or ko_KR locale set), including such documents when automatic language detection is enabled (when you specify the LANGUAGE parameter with the AUTO option). Supported values are:
  • MORPHOLOGICAL
  • NGRAM
If you do not specify a value, the value stored in the SYSIBMTS.TSDEFAULTS view is used. Specifically, the value in the DEFAULTVALUE column of the row whose DEFAULTNAME value is CJKSEGMENTATION.

The specified segmentation method is added to the SYSIBMTS. TSCONFIGURATION administrative view. You cannot change the method after creating the text index.

Important: You must enclose non-numeric values, such as comments, in single quotation marks. A single quotation mark character within a string value must be represented by two consecutive single quotation marks, as shown in the following example:
INDEX CONFIGURATION (COMMENT 'Index on User''s Guide column')
SERVERID serverId
If a multiple server setup is used, specifies the serverId from SYSIBMTS.SYSTSSERVERS in which the index is to be created. If there are no multiple servers, the default server is used to create the index.
partition options
Reserved for internal IBM use.
CONNECT TO database_name
Specifies the database to which a connection is established. The database must be on the local system. This parameter takes precedence over the DB2DBDFT environment variable. You can omit this parameter if the following statements are both true:
  • The DB2DBDFT environment variable is set to a valid database name.
  • The user running the command has the required authorization to connect to the database server.
USER username USING password
Specifies the authorization name and password that are used to establish the connection.

Usage notes

All limits and naming conventions that apply to Db2 database objects and queries also apply to Db2 Text Search features and queries. Db2 Text Search identifiers must conform to the Db2 naming conventions. There are some additional restrictions. For example, these identifiers can be of the form:
[A-Za-z][A-Za-z0-9@#$_]*
or
"[A-Za-z ][A-Za-z0-9@#$_ ]*"
Successful execution of the CREATE INDEX command has the following effects:
  • The Db2 Text Search server data is updated. A collection with the name instance_database_name_index_identifier_number is created per database partition, as in the following example:
    tigertail_MYTSDB_TS250517_0000
    You can retrieve the collection name from the COLLECTIONNAME column in the SYSIBMTS.TSCOLLECTIONNAMES view.
  • The Db2 Text Search catalog information is updated.
  • An index staging table is created in the specified table space with Db2 indexes. In addition, an index event table is created in the specified table space. If you specified the AUXLOG ON option, a second staging table is created to capture changes through integrity processing.
  • If Db2 Text Search coexists with Db2 Net Search Extender and an active Net Search Extender index exists for the table column, the new text search index is set to inactive.
  • The new text search index is not automatically populated. The UPDATE INDEX command must be executed either manually or automatically (as a result of an update schedule being defined for the index through the specification of the UPDATE FREQUENCY option) for the text search index to be populated.
  • If you specified a frequency, a schedule task is created for the Db2 Administrative Scheduler.
The following key-related restrictions apply:
  • You must define a primary key for the table. In Db2 Text Search, you can use a multicolumn Db2 primary key without type limitations. The maximum number of primary key columns is two fewer than the maximum number of primary key columns that are allowed by Db2.
  • The maximum total length of all primary key columns for a table with Db2 Text Search indexes is 15 bytes fewer than the maximum total primary key length that is allowed by Db2. See the restrictions for the Db2 CREATE INDEX statement.

You cannot issue multiple commands concurrently on a text search index if they might conflict. If you issue this command while a conflicting command is running, an error occurs, and the command fails, after which you can try to run the command again. A conflicting command is DISABLE DATABASE FOR TEXT.

You cannot change the auxiliary log property for a text index after creating the index.

The AUXLOG option is not supported for nicknames for data columns that support an MQT with deferred refresh. It is also not supported for views.

To create a text search index on a nickname, the nickname must be a non-relational flat file nickname. Non-relational XML nicknames are not supported

For compatibility with an earlier version, you can specify the UPDATEAUTOCOMMIT index configuration option without type and cycles. This option is associated by default with the COMMITTYPE rows option and unrestricted cycles.

To override the configured values, you can specify the UPDATEAUTOCOMMIT, COMMITTYPE, and COMMITSIZE index configuration options for an UPDATE INDEX operation. Values that you submit for a specific update are applied only once and not persisted.

If you specify theINITIALMODE SKIP option, the text search index manager populates the index. Use this option to control the sequence in which data from the text table is initially processed.

The following rules apply to the LOGTYPE index configuration option:
  • If you use the LOGTYPE CUSTOM setting, use the SYSIBMTS.TSSTAGING administrative view to insert log entries for new, changed, and deleted documents.
  • To view the setting for an index, check the value of the LOGTYPE option in the SYSIBMTS.TSCONFIGURATION administrative view.
  • To view the default log type that is applied to new text indexes, check the value of the LOGTYPE option in the SYSIBMTS.TSDEFAULTS administrative view.
  • The LOGTYPE option is not valid with the ALLROWS option of the CREATE INDEX command because the ALLROWS option forces an initial update and no log tables are created.

For a partitioned database environment, administration tables that are specific to text search indexes, such as staging tables, and text search indexes are distributed in a manner like that used for the corresponding base table. When creating a text search index, use the ADMINISTRATION TABLES IN parameter so that the specified table space is in the same partition group as the table space of the base table.

The CJKSEGMENTATION option applies to zh_CN, zh_TW, ja_JP and ko_KR locale sets for Chinese, Japanese, and Korean languages. The MORPHOLOGICAL or NGRAM option that you specify for the segmentation method is added to the SYSIBMTS.TSCONFIGURATION administration view.

If you create an index with the LANGUAGE parameter set to the AUTO option, you can specify the CJKSEGMENTATION option. The specified segmentation method applies to Chinese, Japanese, and Korean language documents. You cannot change the value that you set for the cjksegmentation_method option after index creation is complete.

If you create a text search index by setting the LANGUAGE parameter to AUTO and the CJKSEGMENTATION option to MORPHOLOGICAL, searches for valid strings on a morphological index might not return the expected results. In such a case, use the CONTAINS function with the QUERYLANGUAGE option to obtain the results, as shown in the following sample statement:
select bookname from ngrambooks where contains (story, '军书','QUERYLANGUAGE=zh_CN') = 1

If you use the INITIALMODE SKIP option, combined with the LOGTYPE ON and AUXLOG ON options, you must manually insert the log entries into the staging table, but only for the initial update. All subsequent updates are handled automatically.