Configuration of multi-language settings
The following table lists the languages that IBM® StoredIQ® supports. An X in the Lemmas or Stop words column indicates that lemmas or stop words are available for that language.
This list does not apply to IBM StoredIQ Cognitive Data Assessment (CDA), which supports only English.
Language | Code | Lemmas | Stop words |
---|---|---|---|
Arabic | ar | X | |
Catalan | ca | | |
Chinese | zh | X | |
Czech | cs | X | |
Danish | da | X | |
Dutch | nl | X | |
English | en | X | X |
Finnish | fi | X | |
French | fr | X | X |
German | de | X | X |
Greek | el | X | |
Hebrew | he | X | |
Hungarian | hu | | |
Icelandic | is | | |
Italian | it | X | |
Japanese | ja | X | |
Korean | ko | X | |
Malay | ms | | |
Norwegian (Bokmål) | nb | X | |
Norwegian (Nynorsk) | nn | X | |
Polish | pl | X | |
Portuguese | pt | X | X |
Romanian | ro | | |
Russian | ru | X | |
Spanish | es | X | X |
Swedish | sv | X | |
Thai | th | X | |
Turkish | tr | X | |
Vietnamese | vi | | |
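Before you put language codes into the properties file, you might want to check them against the Code column of the table to catch typos. The following Python snippet is a minimal sketch of such a check. The names SUPPORTED_CODES and unsupported_codes are hypothetical and are not part of StoredIQ; the check only compares codes against this table and does not confirm that the data server accepts them.

```python
# Language codes copied from the "Code" column of the table above.
SUPPORTED_CODES = {
    "ar", "ca", "zh", "cs", "da", "nl", "en", "fi", "fr", "de",
    "el", "he", "hu", "is", "it", "ja", "ko", "ms", "nb", "nn",
    "pl", "pt", "ro", "ru", "es", "sv", "th", "tr", "vi",
}


def unsupported_codes(language_ids: str) -> list[str]:
    """Return the entries of a comma-separated code list that are not in the table.

    Whitespace around each code is ignored. An empty entry (for example,
    from a trailing comma) is reported as an empty string.
    """
    codes = [code.strip() for code in language_ids.split(",")]
    return [code for code in codes if code not in SUPPORTED_CODES]
```

For example, unsupported_codes("en,fr,de,pt") returns an empty list, while unsupported_codes("en, xx") returns ["xx"].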
By default, English is the only language that Multi-language Support identifies during a harvest, and it is also the default search language. You can change both the identified languages and the default search language in the siq-findex.properties file, which is in the /usr/local/tomcat/webapps/storediq/WEB-INF/classes directory on each data server. Keep the siq-findex.properties file identical on all data servers; otherwise, searches can return inconsistent or incorrect results.
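To confirm that every data server has the same version of siq-findex.properties, you can compare checksums of the copies. The following Python snippet is a minimal sketch that assumes you have already gathered each server's copy to one machine by your own means (for example, with scp); the server names and the helpers group_by_content and in_sync are hypothetical. Because it compares bytes, it also flags copies that differ only in whitespace or comments.

```python
import hashlib
from collections import defaultdict


def group_by_content(copies: dict[str, bytes]) -> dict[str, list[str]]:
    """Map each distinct SHA-256 digest to the servers that hold that version."""
    groups: dict[str, list[str]] = defaultdict(list)
    for server, content in copies.items():
        groups[hashlib.sha256(content).hexdigest()].append(server)
    return dict(groups)


def in_sync(copies: dict[str, bytes]) -> bool:
    """Return True when every server's copy is byte-for-byte identical."""
    return len(group_by_content(copies)) <= 1
```

If in_sync returns False, the groups returned by group_by_content show which servers share each version.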
To change the languages that the harvester can identify, edit the index.presetLanguageIDs field, which is the second-to-last line of the file. List the language codes from the preceding table, separated by commas, for example:
index.presetLanguageIDs = en,fr,de,pt
The first language in the list is the default language, which is assigned to any document whose language cannot be identified.
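The rule that the first listed code is the fallback language can be checked in a script. The following Python snippet is a minimal sketch that reads simple key = value lines and returns the identified languages together with the fallback. It assumes that the file contains no line continuations, escape sequences, or colon separators, which the full Java properties format allows but this sketch does not handle. The helper names read_simple_properties and preset_languages are hypothetical.

```python
def read_simple_properties(text: str) -> dict[str, str]:
    """Parse simple 'key = value' lines, skipping blank lines and #/! comments.

    Assumption: no line continuations or escape sequences are present.
    """
    props: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def preset_languages(props: dict[str, str]) -> tuple[list[str], str]:
    """Return (identified languages, fallback language) from index.presetLanguageIDs."""
    codes = [c.strip() for c in props["index.presetLanguageIDs"].split(",") if c.strip()]
    if not codes:
        raise ValueError("index.presetLanguageIDs is empty")
    # The first code is assigned to documents whose language cannot be identified.
    return codes, codes[0]
```

To use it on a data server, pass the contents of the siq-findex.properties file (for example, read with pathlib.Path(...).read_text()) to read_simple_properties. With index.presetLanguageIDs = en,fr,de,pt, preset_languages returns (["en", "fr", "de", "pt"], "en").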
To change the default search language, edit the search.defaultLanguage field, which is the last line of the file. The search language determines which language's rules (stop words, lemmas, and character normalization) are applied in a search. Only one language can be set as the default for search. However, you can override the default language in an individual full-text search by prefixing the search terms with lang: and a language code and enclosing the terms in square brackets, for example:
lang:de[umkämpft großteils]
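If you script the change to index.presetLanguageIDs or search.defaultLanguage, replace only the value of the existing line so that the rest of the file stays the same. The following Python snippet is a minimal sketch, assuming that each property is on a single key = value line; the helper name set_property is hypothetical. It raises KeyError when the key is missing, so a misspelled key is not silently added as an unused setting. Back up the file before you write the result, and keep in mind that the restart and reharvest steps are still required.

```python
import re


def set_property(text: str, key: str, value: str) -> str:
    """Return text with the value of an existing 'key = value' line replaced.

    Comments, other properties, and line order are left unchanged.
    Commented-out lines such as '# key = value' are not matched.
    """
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*).*$", re.MULTILINE)
    # A function replacement keeps backslashes in `value` from being interpreted.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, text)
    if count == 0:
        raise KeyError(key)
    return new_text
```

For example, set_property(text, "search.defaultLanguage", "de") changes search.defaultLanguage = en to search.defaultLanguage = de and leaves every other line as it was.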
After you change the properties file, restart the data server and reharvest the volumes that are to be searched. If the data server is of the DataServer - Distributed type, also run the following command on the data server after you restart it:
/etc/deepfile/dataserver/es-update-findex-props.py