The Pattern Matcher annotator captures patterns that are constructed from one or more words in the input text. The text is mapped to predefined facets for the parts of speech, such as nouns and verbs, and phrase patterns, such as a noun sequence.
The Pattern Matcher annotator can be used with content analytics collections only.
In the administration console, an administrator can configure rules for the patterns that are to be extracted and analyzed, and associate the rules with facets. When the annotator runs, it uses the rules to extract the defined patterns of text. Pattern matching during text analysis is case-sensitive.
A pattern is a sequence of one or more words, and each word in the sequence can have constraints. The following constraints are available:
Constraint | Description | Example |
---|---|---|
str | Surface string (the exact characters that appear in the input text) | ate |
lex | Lemma of the word | eat |
pos | The part of speech that the word represents | noun |
ftrs | Additional features (attributes) of the words | proper |
category | The facet path assigned by the Dictionary Lookup annotator | $.myword |
guard | If a word is set as a guard, it matches a word that meets its other constraints (as usual), and it also matches the beginning or end of the sentence. For example, to capture a sequence of exactly two nouns, the pattern is "!noun" "noun" "noun" "!noun". Without guards, two nouns at the beginning of a sentence do not match, because no word precedes them to match the first element. Set guard="true" for the first and the last elements to guard the inner two nouns, which are the ones that you want. The default value is guard="false". | guard="true" |
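
The guard behavior described in the table can be easier to follow with a small simulation. The following Python sketch is not the annotator's implementation or rule syntax. It is a minimal model under stated assumptions: the `Token` and `Element` classes, their field names, the `negate` flag (standing in for the "!noun" notation), and the `find_matches` function are all hypothetical, and the sketch assumes that words matched by guard elements are context only and are not part of the captured text.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence


@dataclass(frozen=True)
class Token:
    """One analyzed word. Field names are illustrative, not a product API."""
    surface: str                         # str: exact characters in the input text
    lemma: str                           # lex: lemma of the word
    pos: str                             # pos: part of speech
    ftrs: FrozenSet[str] = frozenset()   # ftrs: additional features
    category: Optional[str] = None       # category: facet path from dictionary lookup


@dataclass(frozen=True)
class Element:
    """One pattern element. A constraint that is left as None is not checked."""
    surface: Optional[str] = None
    lemma: Optional[str] = None
    pos: Optional[str] = None
    ftr: Optional[str] = None
    category: Optional[str] = None
    negate: bool = False   # hypothetical stand-in for the "!noun" notation
    guard: bool = False    # a guard may also match the sentence start or end


def word_matches(el: Element, tok: Token) -> bool:
    """Check every constraint that is set; string comparison is case-sensitive."""
    ok = (
        (el.surface is None or el.surface == tok.surface)
        and (el.lemma is None or el.lemma == tok.lemma)
        and (el.pos is None or el.pos == tok.pos)
        and (el.ftr is None or el.ftr in tok.ftrs)
        and (el.category is None or el.category == tok.category)
    )
    return ok != el.negate


def find_matches(pattern: Sequence[Element],
                 sentence: Sequence[Token]) -> List[List[str]]:
    """Return the surface strings captured by each match of the pattern."""
    # None marks the sentence boundaries; only guard elements can match it.
    padded: List[Optional[Token]] = [None, *sentence, None]
    found = []
    for start in range(len(padded) - len(pattern) + 1):
        window = padded[start:start + len(pattern)]
        if all(el.guard if tok is None else word_matches(el, tok)
               for el, tok in zip(pattern, window)):
            # Assumption: guard elements are context only and are not captured.
            found.append([tok.surface for el, tok in zip(pattern, window)
                          if not el.guard and tok is not None])
    return found


# Two nouns at the beginning of the sentence.
sentence = [
    Token("Data", "data", "noun"),
    Token("mining", "mining", "noun"),
    Token("finds", "find", "verb"),
    Token("patterns", "pattern", "noun"),
]
noun = Element(pos="noun")
not_noun = Element(pos="noun", negate=True)
guarded = Element(pos="noun", negate=True, guard=True)

unguarded_hits = find_matches([not_noun, noun, noun, not_noun], sentence)
guarded_hits = find_matches([guarded, noun, noun, guarded], sentence)
print(unguarded_hits)  # []
print(guarded_hits)    # [['Data', 'mining']]
```

In this model, the unguarded pattern finds nothing because no word precedes "Data" to satisfy the first element, while the guarded pattern lets the first element match the start of the sentence. The same sketch also reflects case-sensitive matching: an element with the surface string "ate" does not match the word "Ate", but an element with the lemma "eat" does.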
Content analytics collections have predefined pattern definitions to provide default text analytics capability. The following facets are defined by default for part-of-speech analysis. Part-of-speech analysis is provided for all languages.
Facet path | Facet name |
---|---|
$._word.noun.general | General Noun |
$._word.noun.unk | Unknown |
$._word.verb | Verb |
$._word.adj | Adjective |
$._word.adv | Adverb |
$._word.conj | Conjunction |
$._word.intj | Interjection |
$._word.num | Numeral |
The following facets are defined by default for phrase analysis. Unlike part-of-speech analysis, phrase analysis differs by language. For example, some facets are not used for some languages.
Facet path | Facet name |
---|---|
$._phrase.noun_phrase.nouns | Noun Sequence |
$._phrase.noun_phrase.mod_noun | Modified Noun |
$._phrase.noun_phrase.adp_noun | Preposition Noun |
$._phrase.pred_phrase.adv_pred | Predicate with Adverb |
$._phrase.pred_phrase.noun_pred | Noun - Predicate |
$._phrase.pred_phrase.verb_noun | Verb - Noun |
$._phrase.conj_phrase.resultative | Resultative Conjunction |
$._phrase.conj_phrase.contradictory | Contradictory Conjunction |
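
When you work with the default facets outside the product, for example in a script that post-processes analysis results, it can help to group them by path. The following Python sketch copies the facet paths and names from the two tables above. The `DEFAULT_FACETS` dictionary and the `facets_under` function are hypothetical helpers, and the sketch assumes that the dot-separated paths form a hierarchy.

```python
# Default facet paths and names, copied from the tables above.
DEFAULT_FACETS = {
    "$._word.noun.general": "General Noun",
    "$._word.noun.unk": "Unknown",
    "$._word.verb": "Verb",
    "$._word.adj": "Adjective",
    "$._word.adv": "Adverb",
    "$._word.conj": "Conjunction",
    "$._word.intj": "Interjection",
    "$._word.num": "Numeral",
    "$._phrase.noun_phrase.nouns": "Noun Sequence",
    "$._phrase.noun_phrase.mod_noun": "Modified Noun",
    "$._phrase.noun_phrase.adp_noun": "Preposition Noun",
    "$._phrase.pred_phrase.adv_pred": "Predicate with Adverb",
    "$._phrase.pred_phrase.noun_pred": "Noun - Predicate",
    "$._phrase.pred_phrase.verb_noun": "Verb - Noun",
    "$._phrase.conj_phrase.resultative": "Resultative Conjunction",
    "$._phrase.conj_phrase.contradictory": "Contradictory Conjunction",
}


def facets_under(prefix: str) -> dict:
    """Return the facets at or below a path (assumes dots separate levels)."""
    return {
        path: name
        for path, name in DEFAULT_FACETS.items()
        if path == prefix or path.startswith(prefix + ".")
    }


noun_facets = facets_under("$._word.noun")
print(sorted(noun_facets.values()))  # ['General Noun', 'Unknown']
```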