DB2 10.5 for Linux, UNIX, and Windows

Fuzzy search

Use a fuzzy search to find documents that contain words with similar spelling to the term that you are searching.

A fuzzy search query searches for character sequences that are not only the same but similar to the query term. Use the tilde symbol (~) at the end of a term to do a fuzzy search. For example, the following query finds documents that include the terms analytics, analyze, analysis, and so on.
analytics~
You can add an optional parameter to specify the degree of similarity of the search results to the search term. Specify a value greater than or equal to 0 and less than 1. You must precede the value by a 0 and a decimal point, for example, 0.8. A value closer to 1 matches terms with a higher similarity. If you do not specify the parameter, the default is 0.5.
analytics~0.8
You can specify a fuzzy search on a term but not on a phrase. To apply fuzzy search to multiple words in a query, you must apply a fuzzy search factor for each term. For example, the following query finds documents that include terms that are similar to summer and time.
summer~0.7 time~0.7

Example

Step 1. Create a table called BOOKS:
create table books (
         isbn varchar(18) not null primary key, 
         author varchar(30), 
         story varchar(100), 
         year integer);
Step 2. Create a text search index on the STORY column:
db2ts "create index bookidx for text on books(story) connect to test";
Step 3. Import data into the table:
insert into books values ('0-13-086755-1','John','The Blue Can',2001)
insert into books values ('0-13-086755-2','Mike','Cats and Dogs', 2000)
insert into books values ('0-13-086755-3','Peter','Hats on the Rack',1999)
insert into books values ('0-13-086755-4','Agatha','Cat among the Pigeons',1997)
insert into books values ('0-13-086755-5','Edgar','Cars Unlimited',2010)
insert into books values ('0-13-086755-6','Roy','Carson and Lemon',2008) 
Step 4. Update the text search index:
db2ts "update index bookidx for text connect to test"
Step 5. Issue a fuzzy search with the CONTAINS function:
select author, year, story from books where contains(story, 'cat~0.4') = 1
The following is the sample output:
AUTHOR YEAR STORY
------------------------ ----------- -------------------------
John 2001 The Blue Can
Mike 2000 Cats and Dogs
Agatha 1997 Cat among the Pigeons

3 record(s) selected.
To see the associated score, issue the following query that is modified for increased fuzziness:
select author, year, story, integer(score(story, 'cat~0.3')*1000) as score 
from books where contains(story, 'cat~0.3') = 1 order by score desc 
The following is the sample output:
AUTHOR YEAR STORY SCORE
------------------------------ ----------- -------
Agatha 1997 Cat among the Pigeons 32
John 2001 The Blue Can 17
Mike 2000 Cats and Dogs 17
Peter 1999 Hats on the Rack 1
Edgar 2010 Cars Unlimited 1

5 record(s) selected.

Restrictions