Syntax

At a glance

The Syntax model performs fundamental NLP tasks on the input text:

  • Sentence detection
  • Tokenization: can't -> ca + n't
  • Part-of-Speech tagging: I thought -> I/PRON, thought/VERB
  • Lemmatization: I thought -> I/I, thought/think
  • Dependency parsing: I -> nsubj -> thought -> root

Class definition

watson_nlp.blocks.syntax.izumo.IzumoTextProcessing

For language support, see Supported languages.

This model offers two implementations:

  • Izumo: Provides high accuracy and throughput at moderate computational cost. The model combines curated human knowledge (dictionaries, complementary rules) with machine learning algorithms (Logistic Regression, Conditional Random Fields). The implementation has been tested over many years in IBM products.

  • Transformer-based syntax: Provides state-of-the-art accuracy by leveraging IBM's fine-tuned Slate model. This comes at a higher computational cost; in particular, runtime throughput is lower in a CPU-only environment without a GPU.

Pretrained models

Model names are listed below.

Model ID Container Image
Izumo models
syntax_izumo_lang_af_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_af_stock:1.4.1
syntax_izumo_lang_ar_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ar_stock:1.4.1
syntax_izumo_lang_bs_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_bs_stock:1.4.1
syntax_izumo_lang_ca_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ca_stock:1.4.1
syntax_izumo_lang_cs_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_cs_stock:1.4.1
syntax_izumo_lang_da_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_da_stock:1.4.1
syntax_izumo_lang_de_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_de_stock:1.4.1
syntax_izumo_lang_el_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_el_stock:1.4.1
syntax_izumo_lang_en_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_en_stock:1.4.1
syntax_izumo_lang_es_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_es_stock:1.4.1
syntax_izumo_lang_fi_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_fi_stock:1.4.1
syntax_izumo_lang_fr_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_fr_stock:1.4.1
syntax_izumo_lang_he_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_he_stock:1.4.1
syntax_izumo_lang_hi_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_hi_stock:1.4.1
syntax_izumo_lang_hr_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_hr_stock:1.4.1
syntax_izumo_lang_it_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_it_stock:1.4.1
syntax_izumo_lang_ja_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ja_stock:1.4.1
syntax_izumo_lang_ko_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ko_stock:1.4.1
syntax_izumo_lang_nb_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_nb_stock:1.4.1
syntax_izumo_lang_nl_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_nl_stock:1.4.1
syntax_izumo_lang_nn_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_nn_stock:1.4.1
syntax_izumo_lang_pl_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_pl_stock:1.4.1
syntax_izumo_lang_pt_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_pt_stock:1.4.1
syntax_izumo_lang_ro_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ro_stock:1.4.1
syntax_izumo_lang_ru_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_ru_stock:1.4.1
syntax_izumo_lang_sk_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_sk_stock:1.4.1
syntax_izumo_lang_sr_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_sr_stock:1.4.1
syntax_izumo_lang_sv_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_sv_stock:1.4.1
syntax_izumo_lang_tr_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_tr_stock:1.4.1
syntax_izumo_lang_zh-cn_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_zh-cn_stock:1.4.1
syntax_izumo_lang_zh-tw_stock cp.icr.io/cp/ai/watson-nlp_syntax_izumo_lang_zh-tw_stock:1.4.1
Transformer models
syntax_transformer_en_stock cp.icr.io/cp/ai/watson-nlp_syntax_transformer_en_stock:1.4.1
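
Every row in the table above follows the same naming pattern, so the image reference can be derived from the model ID. The following is a minimal sketch of that pattern; the helper name container_image is hypothetical, and the registry prefix and the 1.4.1 tag are simply copied from the table rather than taken from any official API.

```python
# Assumption: prefix and tag are copied from the table above.
REGISTRY_PREFIX = "cp.icr.io/cp/ai/watson-nlp_"
DEFAULT_TAG = "1.4.1"


def container_image(model_id: str, tag: str = DEFAULT_TAG) -> str:
    """Return the container image reference for a stock Syntax model ID."""
    # Only the two model families listed in the table are accepted.
    if not model_id.startswith(("syntax_izumo_lang_", "syntax_transformer_")):
        raise ValueError(f"not a Syntax model ID: {model_id!r}")
    return f"{REGISTRY_PREFIX}{model_id}:{tag}"


print(container_image("syntax_izumo_lang_en_stock"))
```

Other release tags may exist; pass tag explicitly if your deployment uses a different one.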

The Syntax models use pre-defined algorithms and models suitable for each language. The output schema for part-of-speech tagging and dependency parsing follows the Universal Part-of-Speech v2 and Universal Dependency Relations v2 standards for all languages. This means that the Syntax models output the same set of part-of-speech tags and dependency relations for all supported languages. For details about the Universal Part-of-Speech and Universal Dependency v2 standards, see Grammatical properties.

Both Izumo models and transformer-based syntax models have been trained with Universal Dependency corpora with commercial license, as well as additional data generated via a novel silver data generation process invented by IBM Research Yorktown and Tokyo labs. Models are continuously improved with feedback from users.

For Izumo models, the specific algorithms used in training the Syntax models for each language have been chosen to provide a good trade-off between accuracy and runtime performance. The algorithms differ across language groups: as a general rule, simpler and faster algorithms are used for simpler languages such as English (Group A languages), and increasingly sophisticated (and slower) algorithms are used for complex languages such as Arabic (Group B languages), and Chinese, Japanese and Korean (Group C languages).

The transformer-based syntax model relies purely on machine learning, so the same algorithm is used for all supported languages. Currently, only the English stock model is available.

Running models

The Syntax model request accepts the following fields:

Field Type Required / Optional / Repeated Description
raw_document watson_core_data_model.nlp.RawDocument required The input document on which to perform Syntax predictions
parsers str repeated List containing any of the following strings: token, sentence, lemma, part_of_speech, dependency

The model returns the following responses:

Token

Groups a sequence of characters into a useful semantic unit for processing.

Sentence

Identifies sentence(s) within a text.

Lemma

Returns the base, or root, form of a word.

Part of Speech

Returns a part-of-speech code:

Name Number Description
POS_UNSET 0 Default value when no POS tagging performed
POS_ADJ 1 adjective
POS_ADP 2 adposition
POS_ADV 3 adverb
POS_AUX 4 auxiliary
POS_CCONJ 5 coordinating conjunction
POS_DET 6 determiner
POS_INTJ 7 interjection
POS_NOUN 8 noun
POS_NUM 9 numeral
POS_PART 10 particle
POS_PRON 11 pronoun
POS_PROPN 12 proper noun
POS_PUNCT 13 punctuation
POS_SCONJ 14 subordinating conjunction
POS_SYM 15 symbol
POS_VERB 16 verb
POS_X 17 other
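
Client code often needs to translate between the enum names and the numbers in this table. The sketch below mirrors the table as a Python IntEnum; the class name PartOfSpeech and the helper ud_tag are hypothetical names chosen for illustration and are not part of the Watson NLP client library.

```python
from enum import IntEnum


class PartOfSpeech(IntEnum):
    """Part-of-speech codes, copied from the table above."""
    POS_UNSET = 0
    POS_ADJ = 1
    POS_ADP = 2
    POS_ADV = 3
    POS_AUX = 4
    POS_CCONJ = 5
    POS_DET = 6
    POS_INTJ = 7
    POS_NOUN = 8
    POS_NUM = 9
    POS_PART = 10
    POS_PRON = 11
    POS_PROPN = 12
    POS_PUNCT = 13
    POS_SCONJ = 14
    POS_SYM = 15
    POS_VERB = 16
    POS_X = 17


def ud_tag(value) -> str:
    """Return the short tag (for example "PRON") for an enum name or number."""
    # Strings are looked up by name, anything else by numeric value.
    member = PartOfSpeech[value] if isinstance(value, str) else PartOfSpeech(value)
    return member.name[len("POS_"):]


print(ud_tag("POS_PRON"))  # lookup by enum name
print(ud_tag(16))          # lookup by number from the table
```

Note that POS_UNSET is a placeholder for "no tagging performed" rather than a Universal Part-of-Speech tag, so callers may want to treat it separately.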

Dependency

Returns a dependency relation code:

Name Number Description
DEP_OTHER 0 other
DEP_ACL 1 clausal modifier of noun (adjectival clause)
DEP_ACL_RELCL 38 relative clause modifier
DEP_ADVCL 2 adverbial clause modifier
DEP_ADVMOD 3 adverbial modifier
DEP_ADVMOD_EMPH 39 emphasizing word, intensifier
DEP_ADVMOD_LMOD 40 locative adverbial modifier
DEP_AMOD 4 adjectival modifier
DEP_APPOS 5 appositional modifier
DEP_AUX 6 auxiliary
DEP_AUX_PASS 41 passive auxiliary
DEP_CASE 7 case marking
DEP_CC 8 coordinating conjunction
DEP_CC_PRECONJ 42 preconjunct
DEP_CCOMP 9 clausal complement
DEP_CLF 10 classifier
DEP_COMPOUND 11 compound
DEP_COMPOUND_LVC 44 light verb construction
DEP_COMPOUND_PRT 45 phrasal verb particle
DEP_COMPOUND_REDUP 46 reduplicated compounds
DEP_COMPOUND_SVC 47 serial verb compounds
DEP_CONJ 12 conjunct
DEP_COP 13 copula
DEP_CSUBJ 14 clausal subject
DEP_CSUBJ_PASS 43 clausal passive subject
DEP_DEP 15 unspecified dependency
DEP_DET 16 determiner
DEP_DET_NUMGOV 48 pronominal quantifier governing the case of the noun
DEP_DET_NUMNOD 49 pronominal quantifier agreeing in case with the noun
DEP_DET_POSS 50 possessive determiner
DEP_DISCOURSE 17 discourse element
DEP_DISLOCATED 18 dislocated elements
DEP_EXPL 19 expletive
DEP_EXPL_IMPERS 51 impersonal expletive
DEP_EXPL_PASS 52 reflexive pronoun used in reflexive passive
DEP_EXPL_PV 53 reflexive clitic with an inherently reflexive verb
DEP_FIXED 20 fixed multiword expression
DEP_FLAT 21 flat multiword expression
DEP_FLAT_FOREIGN 54 foreign words
DEP_FLAT_NAME 55 names
DEP_GOESWITH 22 goes with
DEP_IOBJ 23 indirect object
DEP_LIST 24 list
DEP_MARK 25 marker
DEP_NMOD 26 nominal modifier
DEP_NMOD_POSS 56 possessive nominal modifier
DEP_NMOD_TMOD 57 temporal modifier
DEP_NSUBJ 27 nominal subject
DEP_NSUBJ_PASS 58 passive nominal subject
DEP_NUMMOD 28 numeric modifier
DEP_NUMMOD_GOV 59 numeric modifier governing the case of the noun
DEP_OBJ 29 object
DEP_OBL 30 oblique nominal
DEP_OBL_AGENT 60 agent modifier
DEP_OBL_ARG 61 oblique argument
DEP_OBL_LMOD 62 locative modifier
DEP_OBL_TMOD 63 temporal modifier
DEP_ORPHAN 31 orphan
DEP_PARATAXIS 32 parataxis
DEP_PUNCT 33 punctuation
DEP_REPARANDUM 34 overridden disfluency
DEP_ROOT 35 root
DEP_VOCATIVE 36 vocative
DEP_XCOMP 37 open clausal complements
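
Universal Dependencies writes relation subtypes with a colon, for example nsubj:pass, while the enum names in this table use underscores. The sketch below converts an enum name to that notation. It assumes, based only on the names listed above, that the first underscore after the DEP_ prefix separates the base relation from its subtype; the function names are hypothetical.

```python
def ud_label(enum_name: str) -> str:
    """Convert an enum name such as DEP_NSUBJ_PASS to the label nsubj:pass."""
    if not enum_name.startswith("DEP_"):
        raise ValueError(f"not a dependency enum name: {enum_name!r}")
    # Split "nsubj_pass" into base "nsubj" and subtype "pass" (subtype may be empty).
    base, _, subtype = enum_name[len("DEP_"):].lower().partition("_")
    return f"{base}:{subtype}" if subtype else base


def base_relation(enum_name: str) -> str:
    """Return only the base relation, dropping any subtype."""
    return ud_label(enum_name).split(":", 1)[0]


for name in ("DEP_ROOT", "DEP_NSUBJ_PASS", "DEP_ACL_RELCL"):
    print(name, "->", ud_label(name), "| base:", base_relation(name))
```

Grouping by base_relation can be useful when a downstream consumer only cares about the core relation and wants to ignore language-specific subtypes.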

Example requests

REST API

curl -s \
  "http://localhost:8080/v1/watson.runtime.nlp.v1/NlpService/SyntaxPredict" \
  -H "accept: application/json" \
  -H "content-type: application/json" \
  -H "Grpc-Metadata-mm-model-id: syntax_izumo_lang_en_stock" \
  -d '{ "raw_document": { "text": "This is a test sentence." }, "parsers": ["token","sentence","lemma","part_of_speech","dependency"] }'

Response

{"text":"This is a test sentence.", "producerId":{"name":"Izumo Text Processing", "version":"0.0.1"},
 "tokens":[
  {"span":{"begin":0, "end":4, "text":"This"}, "lemma":"this", "partOfSpeech":"POS_PRON", "dependency":{"relation":"DEP_NSUBJ", "identifier":1, "head":2}, "features":[]},
  {"span":{"begin":5, "end":7, "text":"is"}, "lemma":"be", "partOfSpeech":"POS_AUX", "dependency":{"relation":"DEP_COP", "identifier":3, "head":2}, "features":[]},
  {"span":{"begin":8, "end":9, "text":"a"}, "lemma":"a", "partOfSpeech":"POS_DET", "dependency":{"relation":"DEP_DET", "identifier":4, "head":2}, "features":[]},
  {"span":{"begin":10, "end":14, "text":"test"}, "lemma":"test", "partOfSpeech":"POS_NOUN", "dependency":{"relation":"DEP_COMPOUND", "identifier":5, "head":2}, "features":[]},
  {"span":{"begin":15, "end":23, "text":"sentence"}, "lemma":"sentence", "partOfSpeech":"POS_NOUN", "dependency":{"relation":"DEP_ROOT", "identifier":2, "head":0}, "features":[]},
  {"span":{"begin":23, "end":24, "text":"."}, "lemma":"", "partOfSpeech":"POS_PUNCT", "dependency":{"relation":"DEP_PUNCT", "identifier":6, "head":2}, "features":[]}
  ],
 "sentences":[
  {"span":{"begin":0, "end":24, "text":"This is a test sentence."}}
  ],
 "paragraphs":[
  {"span":{"begin":0, "end":24, "text":"This is a test sentence."}}
  ]
}
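
The same REST call can be prepared from Python with only the standard library. The sketch below builds the request shown in the curl example above; the function name build_syntax_request and the ALLOWED_PARSERS constant are hypothetical, and the host and port are the ones used in that example. Sending the request requires a running runtime container, so that step is left commented out.

```python
import json
import urllib.request

# Parser names accepted by the request, as listed in the request fields table.
ALLOWED_PARSERS = ("token", "sentence", "lemma", "part_of_speech", "dependency")


def build_syntax_request(text, model_id, parsers=ALLOWED_PARSERS,
                         base_url="http://localhost:8080"):
    """Build (but do not send) a SyntaxPredict REST request."""
    unknown = [p for p in parsers if p not in ALLOWED_PARSERS]
    if unknown:
        raise ValueError(f"unknown parsers: {unknown}")
    body = {"raw_document": {"text": text}, "parsers": list(parsers)}
    return urllib.request.Request(
        url=f"{base_url}/v1/watson.runtime.nlp.v1/NlpService/SyntaxPredict",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "accept": "application/json",
            "content-type": "application/json",
            # The model is selected through this header.
            "Grpc-Metadata-mm-model-id": model_id,
        },
        method="POST",
    )


request = build_syntax_request("This is a test sentence.",
                               "syntax_izumo_lang_en_stock")
print(request.get_method(), request.full_url)

# To send it against a running container:
# with urllib.request.urlopen(request) as resp:
#     result = json.load(resp)
```

Validating the parser names on the client side is a design choice of this sketch; it surfaces typos before a network round trip.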

Python

import grpc

from watson_nlp_runtime_client import common_service_pb2, common_service_pb2_grpc

# Open an unencrypted gRPC channel to the runtime and create the service stub.
client = common_service_pb2_grpc.NlpServiceStub(grpc.insecure_channel("localhost:8085"))

# The model is selected through the "mm-model-id" metadata entry.
response = client.SyntaxPredict(
  common_service_pb2.SyntaxRequest(
    raw_document={"text": "This is a test sentence."},
    parsers=('token', 'sentence', 'lemma', 'part_of_speech', 'dependency')
  ),
  metadata=[("mm-model-id", "syntax_izumo_lang_en_stock")],
)

print(response)

Response

text: "This is a test sentence."
producer_id {
  name: "Izumo Text Processing"
  version: "0.0.1"
}
tokens {
  span {
    end: 4
    text: "This"
  }
  lemma: "this"
  part_of_speech: POS_PRON
  dependency {
    relation: DEP_NSUBJ
    identifier: 1
    head: 2
  }
}
tokens {
  span {
    begin: 5
    end: 7
    text: "is"
  }
  lemma: "be"
  part_of_speech: POS_AUX
  dependency {
    relation: DEP_COP
    identifier: 3
    head: 2
  }
}
tokens {
  span {
    begin: 8
    end: 9
    text: "a"
  }
  lemma: "a"
  part_of_speech: POS_DET
  dependency {
    relation: DEP_DET
    identifier: 4
    head: 2
  }
}
tokens {
  span {
    begin: 10
    end: 14
    text: "test"
  }
  lemma: "test"
  part_of_speech: POS_NOUN
  dependency {
    relation: DEP_COMPOUND
    identifier: 5
    head: 2
  }
}
tokens {
  span {
    begin: 15
    end: 23
    text: "sentence"
  }
  lemma: "sentence"
  part_of_speech: POS_NOUN
  dependency {
    relation: DEP_ROOT
    identifier: 2
  }
}
tokens {
  span {
    begin: 23
    end: 24
    text: "."
  }
  part_of_speech: POS_PUNCT
  dependency {
    relation: DEP_PUNCT
    identifier: 6
    head: 2
  }
}
sentences {
  span {
    end: 24
    text: "This is a test sentence."
  }
}
paragraphs {
  span {
    end: 24
    text: "This is a test sentence."
  }
}
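
In both responses, each token's dependency carries an identifier and a head, and the identifiers do not follow token order (the fifth token, "sentence", has identifier 2). The sketch below resolves each head to its token text. It works on a trimmed copy of the JSON response, and it assumes that a head of 0, or a missing head as in the gRPC text output, marks the root; that reading is inferred from the sample output and is not a documented guarantee. The function name dependency_edges is hypothetical.

```python
# Trimmed copy of the REST response above: only the fields used here.
response = {
    "tokens": [
        {"span": {"text": "This"}, "dependency": {"relation": "DEP_NSUBJ", "identifier": 1, "head": 2}},
        {"span": {"text": "is"}, "dependency": {"relation": "DEP_COP", "identifier": 3, "head": 2}},
        {"span": {"text": "a"}, "dependency": {"relation": "DEP_DET", "identifier": 4, "head": 2}},
        {"span": {"text": "test"}, "dependency": {"relation": "DEP_COMPOUND", "identifier": 5, "head": 2}},
        {"span": {"text": "sentence"}, "dependency": {"relation": "DEP_ROOT", "identifier": 2, "head": 0}},
        {"span": {"text": "."}, "dependency": {"relation": "DEP_PUNCT", "identifier": 6, "head": 2}},
    ]
}


def dependency_edges(resp: dict) -> list:
    """Return (dependent, relation, head) triples with heads resolved to text."""
    # Look tokens up by identifier, not by position in the list.
    by_id = {t["dependency"]["identifier"]: t["span"]["text"] for t in resp["tokens"]}
    edges = []
    for tok in resp["tokens"]:
        dep = tok["dependency"]
        head = dep.get("head", 0)  # assumption: 0 or missing means root
        edges.append((tok["span"]["text"], dep["relation"], by_id[head] if head else "ROOT"))
    return edges


edges = dependency_edges(response)
for dependent, relation, head in edges:
    print(f"{dependent} --{relation}--> {head}")
```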