About this task
Use the following instructions to set up and synchronize DB2® Text Search indexes for morphological
and N-gram indexing in the SAMPLE database, and then search for linguistically
meaningful Chinese words.
- Create two tables, one for morphological indexing and one for N-gram indexing. Each table has columns for the ISBN,
book name, author, story, and the year the book was published.
db2 "CREATE TABLE morphobooks (
  isbn     VARCHAR(18) NOT NULL PRIMARY KEY,
  bookname VARCHAR(30),
  author   VARCHAR(30),
  story    BLOB(1G),
  year     INTEGER
)"
db2 "CREATE TABLE ngrambooks (
  isbn     VARCHAR(18) NOT NULL PRIMARY KEY,
  bookname VARCHAR(30),
  author   VARCHAR(30),
  story    BLOB(1G),
  year     INTEGER
)"
- Issue the CREATE INDEX command to create
a text search index on the STORY column of the MORPHOBOOKS table. The
name of the text search index is MORPHOINDEX.
db2ts " CREATE INDEX db2ts.morphoindex FOR TEXT
ON morphobooks (story) LANGUAGE zh_TW
INDEX CONFIGURATION (CJKSEGMENTATION 'morphological')
CONNECT TO sample";
- Issue the CREATE INDEX command to create
a text search index on the STORY column of the NGRAMBOOKS table. The name
of the text search index is NGRAMINDEX.
db2ts " CREATE INDEX db2ts.ngramindex FOR TEXT
ON ngrambooks (story) LANGUAGE zh_TW
INDEX CONFIGURATION (CJKSEGMENTATION 'ngram')
CONNECT TO sample";
- Load data into the two tables.
db2 "import from ./data/books.del of DEL lobs from ./data/
replace into morphobooks";
db2 "import from ./data/books.del of DEL lobs from ./data/
replace into ngrambooks";
The books.del file has the entry:
"0-13-086755-4", "book1", "Julie", "books_zh_TW1.lob.0.449/", 2004
The books_zh_TW1.lob large object has the following content:
Figure 1. Content of the books_zh_TW1.lob object
- Synchronize the text search indexes with the data from the
corresponding tables by issuing the following commands:
db2ts "UPDATE INDEX db2ts.morphoindex FOR TEXT CONNECT TO sample";
db2ts "UPDATE INDEX db2ts.ngramindex FOR TEXT CONNECT TO sample";
- Search for linguistically meaningful Chinese words. The search
succeeds with both morphological and N-gram segmentation.
Figure 2. Query results for meaningful
Chinese words
The output indicates that the result from morphological segmentation
is the same as the result from N-gram segmentation.
- Search for meaningless Chinese words to see the difference
between morphological and N-gram segmentation.
Figure 3. Query results for meaningless Chinese
words
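Only the N-gram index returns a book name, because N-gram segmentation indexes
overlapping character sequences regardless of meaning, whereas morphological
segmentation indexes only linguistically meaningful words.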
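The two search steps can be sketched from the command line as follows. This is a minimal sketch, not the exact queries shown in the figures: it assumes that the shell is already connected to the SAMPLE database, it uses the CONTAINS function to search the STORY column, and both the build_query helper and the SEARCH_TERM placeholder are hypothetical names introduced for illustration. Replace the placeholder with the Chinese words that you want to search for.

```shell
#!/bin/sh
# Sketch only. SEARCH_TERM is a placeholder, not a value from this task;
# replace it with the Chinese words to search for.
SEARCH_TERM='<search term>'

# build_query is a hypothetical helper. It prints a SELECT statement that
# returns the names of the books in table $1 whose STORY column matches
# the search term $2 in the text search index.
build_query() {
  printf "SELECT bookname FROM %s WHERE CONTAINS(story, '%s') = 1" "$1" "$2"
}

# Run the same search against both tables so that the results of
# morphological and N-gram segmentation can be compared. The queries are
# issued only if the db2 command line processor is available.
if command -v db2 >/dev/null 2>&1; then
  for table in morphobooks ngrambooks; do
    db2 "$(build_query "$table" "$SEARCH_TERM")"
  done
fi
```

The helper does not escape single quotation marks, so a search term that contains one must be doubled by hand before it is passed in.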