Categories
At a glance
The categories task assigns individual nodes within a hierarchical taxonomy to an input document. For example, in the text:
IBM announces new advances in quantum computing.
examples of extracted categories are "technology and computing/hardware/computer" and "technology and computing/operating systems", which are nodes at level 3 and level 2 of the hierarchical taxonomy, respectively.
This model differs from the classification model in that training starts from a set of seed phrases associated with each node in the taxonomy, and does not require labeled documents.
Class definition |
---|
watson_nlp.workflows.categories.esa_hierarchical.ESAHierarchical |
For language support, see Supported languages.
The implementation is motivated by Hierarchical Dataless Classification (Song & Roth, AAAI 2014). For implementation details, see algorithm details.
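The dataless idea described above (scoring a document against seed phrases attached to each taxonomy node, with no labeled training documents) can be illustrated with a toy sketch. This is not the ESAHierarchical implementation: it uses plain bag-of-words cosine similarity as a stand-in for the semantic representation the real model uses, and the seed phrases, `SEED_PHRASES`, and `rank_nodes` are hypothetical names invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical seed phrases per taxonomy node (illustrative only,
# not the taxonomy or seeds shipped with the stock model).
SEED_PHRASES = {
    ("technology and computing", "hardware", "computer"): [
        "computer hardware",
        "quantum computing processor",
    ],
    ("food & drink", "desserts and baking"): [
        "baking cake dessert",
        "bake pastry oven",
    ],
}


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_nodes(text, seeds=SEED_PHRASES, limit=3):
    """Score every node by similarity to its seed phrases; keep the top `limit`."""
    doc = bag_of_words(text)
    scored = [
        (cosine(doc, bag_of_words(" ".join(phrases))), node)
        for node, phrases in seeds.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop nodes that share no vocabulary with the document.
    return [(node, score) for score, node in scored[:limit] if score > 0]


for node, score in rank_nodes("IBM announces new advances in quantum computing."):
    print("/".join(node), round(score, 3))
```

Because the nodes are described only by seed phrases, adding a category in this sketch means adding an entry to the dictionary rather than collecting labeled documents.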
Pretrained models
Model names are listed below.
Model ID | Container Image |
---|---|
categories_esa-workflow_lang_en_stock | cp.icr.io/cp/ai/watson-nlp_categories_esa-workflow_lang_en_stock:1.4.1 |
The models have been tested on data from news reports and general web pages.
For details of the Categories type system, see Understanding model type systems.
Running models
The Categories model request accepts the following fields:
Field | Type | Required / Optional / Repeated | Description |
---|---|---|---|
raw_document | watson_core_data_model.nlp.RawDocument | required | The input document on which to perform Categories predictions |
explanation | bool | optional | Boolean flag indicating whether explanations should be computed and returned |
limit | int32 | optional | The maximum number of predicted categories. If not specified, the limit defaults to 3 |
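As a minimal sketch of how these fields map onto a request body, the helper below builds the JSON payload and leaves out optional fields that the caller did not set, so the service applies its own defaults. The function name `build_categories_payload` and its validation rules are assumptions made for illustration; they are not part of the Watson NLP client library.

```python
import json

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def build_categories_payload(text, explanation=None, limit=None):
    """Return a dict suitable for use as the JSON body of a Categories request.

    Hypothetical helper: only `raw_document` is required; `explanation`
    and `limit` are included only when the caller supplies them.
    """
    if not text:
        raise ValueError("raw_document.text is required")
    payload = {"raw_document": {"text": text}}
    if explanation is not None:
        payload["explanation"] = bool(explanation)
    if limit is not None:
        # The field is typed int32, so reject values outside that range.
        if not INT32_MIN <= limit <= INT32_MAX:
            raise ValueError("limit must fit in an int32")
        payload["limit"] = int(limit)
    return payload


body = json.dumps(
    build_categories_payload(
        "IBM announces new advances in quantum computing.",
        explanation=True,
        limit=2,
    )
)
print(body)
```

Omitting `limit` rather than sending a hard-coded 3 keeps the client in step with the documented server-side default.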
Example requests
REST API
curl -s \
"http://localhost:8080/v1/watson.runtime.nlp.v1/NlpService/CategoriesPredict" \
-H "accept: application/json" \
-H "content-type: application/json" \
-H "Grpc-Metadata-mm-model-id: categories_esa-workflow_lang_en_stock" \
-d '{ "raw_document": { "text": "A solicitor from Loughborough is delighted when she gets the chance to take part in the final of Bake Off. However, her chances are scuppered when she finds out her arch rival is also going to compete. Unexpectedly, the solicitor is bitten by a zombie and therefore is disqualified from competing." }, "explanation": true, "limit": 2 }'
Response
{
  "categories": [
    {
      "labels": [
        "food & drink",
        "desserts and baking"
      ],
      "score": 0.74634,
      "explanation": [
        {"text": "bake"}
      ]
    },
    {
      "labels": [
        "sports",
        "field hockey"
      ],
      "score": 0.599151,
      "explanation": [
        {"text": "chances"}
      ]
    }
  ],
  "producerId": {
    "name": "Categories ESA Workflow",
    "version": "1.0.0"
  }
}
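A response of this shape can be post-processed with the standard library alone. The sketch below flattens each category into a slash-separated path, mirroring the path notation used in the overview, and collects the explanation terms. The helper name `summarize_categories`, the `min_score` threshold, and the reading of the number of labels as the taxonomy level are assumptions made for illustration.

```python
import json

# The example response shown above, embedded so the sketch is self-contained.
SAMPLE_RESPONSE = """
{"categories": [
  {"labels": ["food & drink", "desserts and baking"],
   "score": 0.74634, "explanation": [{"text": "bake"}]},
  {"labels": ["sports", "field hockey"],
   "score": 0.599151, "explanation": [{"text": "chances"}]}
 ],
 "producerId": {"name": "Categories ESA Workflow", "version": "1.0.0"}}
"""


def summarize_categories(response_json, min_score=0.0):
    """Flatten each predicted category into a small summary dict."""
    data = json.loads(response_json)
    rows = []
    for category in data.get("categories", []):
        labels = category.get("labels", [])
        score = category.get("score", 0.0)
        if score < min_score:
            continue  # hypothetical client-side filter, not a service feature
        rows.append(
            {
                "path": "/".join(labels),
                "level": len(labels),  # assumed: one label per taxonomy level
                "score": score,
                # "explanation" is absent when explanations were not requested
                "evidence": [e["text"] for e in category.get("explanation", [])],
            }
        )
    return rows


for row in summarize_categories(SAMPLE_RESPONSE):
    print(row["path"], row["level"], row["score"], row["evidence"])
```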
Python
import grpc
from watson_nlp_runtime_client import (
    common_service_pb2,
    common_service_pb2_grpc,
    syntax_types_pb2,
)

# Open an unencrypted gRPC channel to the runtime (port 8085 in this example)
channel = grpc.insecure_channel("localhost:8085")
stub = common_service_pb2_grpc.NlpServiceStub(channel)

# Build the request: the input text plus the optional explanation and limit fields
request = common_service_pb2.CategoriesRequest(
    raw_document=syntax_types_pb2.RawDocument(text="A solicitor from Loughborough is delighted when she gets the chance to take part in the final of Bake Off. However, her chances are scuppered when she finds out her arch rival is also going to compete. Unexpectedly, the solicitor is bitten by a zombie and therefore is disqualified from competing."),
    explanation=True,
    limit=2,
)

# The mm-model-id metadata entry selects the model that serves the request
response = stub.CategoriesPredict(
    request, metadata=[("mm-model-id", "categories_esa-workflow_lang_en_stock")]
)
print(response)
Response
categories {
  labels: "food & drink"
  labels: "desserts and baking"
  score: 0.74634
  explanation {
    text: "bake"
  }
}
categories {
  labels: "sports"
  labels: "field hockey"
  score: 0.599151
  explanation {
    text: "chances"
  }
}
producer_id {
  name: "Categories ESA Workflow"
  version: "1.0.0"
}
When to use Categorization instead of Classification?
Categorization models work well when documents can be mapped to taxonomy nodes based on the general-knowledge topics they discuss, and when those topics are well represented in Wikipedia. Categorization is not expected to work well for use cases that are highly domain-specific (with little or no representation in Wikipedia) or that require interpretation beyond general topics. For example, tasks such as sentiment or emotion classification are not a good match for Categorization models.