Substitution/Synonym dictionaries

A substitution dictionary is a collection of terms that help to group similar terms under one target term. Substitution dictionaries are managed in the bottom pane of the Library Resources tab. If you are in an interactive workbench session, you can access this view with View > Resource Editor in the menus. Otherwise, you can edit dictionaries for a specific template in the Template Editor.

You can define two forms of substitutions in this dictionary: synonyms and optional elements. You can click the tabs in this pane to switch between them.

After you run an extraction on your text data, you may find several concepts that are synonyms or inflected forms of other concepts. By identifying optional elements and synonyms, you can force the extraction engine to map these to one single target term.

Substituting with synonyms and optional elements reduces the number of concepts in the Extraction Results pane by combining them into more significant, representative concepts with higher frequency Doc. counts.

Synonyms

Synonyms associate two or more words that have the same meaning. You can also use synonyms to group terms with their abbreviations or to group commonly misspelled words with the correct spelling. You can define these synonyms on the Synonyms tab.

A synonym definition is made up of two parts. The first is a Target term, which is the term under which you want the extraction engine to group all synonym terms. Unless this target term is used as a synonym of another target term or unless it is excluded, it is likely to become the concept that appears in the Extraction Results pane. The second is the list of synonyms that will be grouped under the target term.

For example, if you want automobile to be replaced by vehicle, then automobile is the synonym and vehicle is the target term.

You can enter any words into the Synonym column, but if a word is not found during extraction and its term has the match option set to Entire, then no substitution can take place. However, the target term does not need to be extracted for the synonyms to be grouped under this term.
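
The effect of a synonym definition on the concept list can be sketched in a few lines of Python. This is only an illustration of the idea, not the extraction engine's implementation: the names SYNONYMS, build_lookup, and doc_counts are hypothetical, the term car is an invented extra synonym, and each document is assumed to be already reduced to a list of extracted terms.

```python
from collections import Counter

# Hypothetical synonym table: target term -> synonyms grouped under it.
# "automobile" -> "vehicle" comes from the example above; "car" is invented.
SYNONYMS = {"vehicle": ["automobile", "car"]}


def build_lookup(synonyms):
    """Invert target -> synonyms into a synonym -> target lookup."""
    lookup = {}
    for target, terms in synonyms.items():
        for term in terms:
            lookup[term.lower()] = target
    return lookup


def doc_counts(documents, synonyms):
    """Count, per concept, the number of documents that mention it."""
    lookup = build_lookup(synonyms)
    counts = Counter()
    for terms in documents:
        # Replace each synonym with its target; a set ensures a document
        # is counted once per concept even if several synonyms occur in it.
        concepts = {lookup.get(t.lower(), t.lower()) for t in terms}
        counts.update(concepts)
    return counts


docs = [["automobile", "engine"], ["vehicle"], ["car", "automobile"]]
result = doc_counts(docs, SYNONYMS)
```

In this sketch the three documents all count toward vehicle, and automobile and car no longer appear as separate concepts. Note that the target term is applied even in documents where vehicle itself never occurs.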

Optional elements

Optional elements identify optional words in a compound term that can be ignored during extraction in order to keep similar terms together even if they appear slightly different in the text. Optional elements are single words that, if removed from a compound, could create a match with another term. These single words can appear anywhere within the compound: at the beginning, middle, or end. You can define optional elements on the Optional tab.

For example, to group the terms ibm and ibm corp together, declare corp as an optional element. In another example, if you designate the term access as an optional element and both internet access speed and internet speed are found during extraction, they will be grouped together under whichever of the two terms occurs most frequently.
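
The grouping behavior described above can be approximated as follows. This is a rough sketch under stated assumptions, not the product's algorithm: group_by_optional is a hypothetical helper, the counts are invented, terms are assumed to be whitespace-separated, and ties between equally frequent terms are broken arbitrarily.

```python
from collections import defaultdict


def group_by_optional(term_counts, optional):
    """Group terms that become identical once optional words are removed."""
    groups = defaultdict(dict)
    for term, count in term_counts.items():
        words = term.split()
        # Optional elements are only dropped from compounds (2+ words).
        if len(words) > 1:
            kept = [w for w in words if w not in optional]
        else:
            kept = words
        # Never reduce a term to an empty key.
        key = " ".join(kept or words)
        groups[key][term] = count

    merged = {}
    for members in groups.values():
        # The most frequent member becomes the label for the whole group.
        label = max(members, key=members.get)
        merged[label] = sum(members.values())
    return merged


# Invented counts, for illustration only.
term_counts = {
    "internet access speed": 5,
    "internet speed": 2,
    "ibm": 4,
    "ibm corp": 1,
}
merged = group_by_optional(term_counts, optional={"access", "corp"})
```

With these invented counts, internet access speed and internet speed collapse into one entry labeled with the more frequent form, and ibm corp is folded into ibm.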