Categories

At a glance

The categories task assigns an input document to individual nodes within a hierarchical taxonomy. For example, given the text:

IBM announces new advances in quantum computing.

examples of extracted categories are `technology and computing/hardware/computer` and `technology and computing/operating systems`, which are nodes at level 3 and level 2 of the hierarchical taxonomy, respectively.
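To make the level numbering concrete, the following minimal sketch treats a category as a slash-separated path and counts its segments. The helper name `category_level` is hypothetical and not part of the Watson NLP API, and the sketch assumes that `/` never appears inside a node name.

```python
def category_level(path: str) -> int:
    """Return the depth of a slash-separated taxonomy path (top level = 1)."""
    # Hypothetical helper: assumes "/" only ever separates taxonomy nodes.
    return len([part for part in path.split("/") if part.strip()])


examples = [
    "technology and computing/hardware/computer",
    "technology and computing/operating systems",
]

for path in examples:
    print(f"level {category_level(path)}: {path}")
```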

This model differs from the classification model in that training starts from a set of seed phrases associated with each node in the taxonomy, and does not require labeled documents.
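The idea of classifying from seed phrases instead of labeled documents can be illustrated with a toy sketch. This is not the ESA-based implementation used by the model: it swaps in plain bag-of-words cosine similarity, and the node names and seed phrases below are made up for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical taxonomy nodes and seed phrases (illustration only).
SEED_PHRASES = {
    "technology and computing/hardware": ["quantum computing", "computer chip", "processor"],
    "food and drink/desserts and baking": ["baking cake", "dessert recipe", "pastry"],
    "sports/field hockey": ["hockey stick", "hockey match", "goalkeeper"],
}


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_nodes(document: str, limit: int = 3) -> list[tuple[str, float]]:
    """Score each node by comparing the document with that node's seed phrases."""
    doc_vec = bag_of_words(document)
    scores = {
        node: cosine(doc_vec, bag_of_words(" ".join(phrases)))
        for node, phrases in SEED_PHRASES.items()
    }
    # Highest score first; no labeled training documents are involved.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:limit]


ranking = rank_nodes("IBM announces new advances in quantum computing.", limit=2)
print(ranking)
```

A real system would replace the word-overlap vectors with a richer semantic representation, but the overall flow stays the same: represent each node through its seed phrases, represent the document the same way, and rank nodes by similarity.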

Class definition
`watson_nlp.workflows.categories.esa_hierarchical.ESAHierarchical`

For language support, see Supported languages.

The implementation is motivated by Hierarchical Dataless Classification (Song & Roth, AAAI 2014). For more information, see algorithm details.

Pretrained models

Model names are listed below.

Model ID	Container Image
categories_esa-workflow_lang_en_stock	cp.icr.io/cp/ai/watson-nlp_categories_esa-workflow_lang_en_stock:1.4.1

The models have been tested on data from news reports and general web pages.

For details of the Categories type system, see Understanding model type systems.

Running models

The Categories model request accepts the following fields:

Field	Type	Required / Optional / Repeated	Description
`raw_document`	`watson_core_data_model.nlp.RawDocument`	required	The input document on which to perform Categories predictions
`explanation`	`bool`	optional	Boolean flag indicating whether explanations should be computed and returned
`limit`	`int32`	optional	The maximum number of predicted categories to return. Defaults to 3 if not specified
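As a rough illustration of how these fields map onto a REST request body, the sketch below builds the JSON payload with the Python standard library and leaves out optional fields that were not set, so that the service can apply its own defaults. The helper names `build_payload` and `predict_categories` are hypothetical; the URL and model ID are taken from the REST example in this document, and `predict_categories` is only defined, not called, because it needs a running server.

```python
import json
import urllib.request

# Endpoint and model ID as used in the REST example in this document.
URL = "http://localhost:8080/v1/watson.runtime.nlp.v1/NlpService/CategoriesPredict"
MODEL_ID = "categories_esa-workflow_lang_en_stock"


def build_payload(text, explanation=None, limit=None):
    """Build the request body, omitting optional fields that were not set."""
    payload = {"raw_document": {"text": text}}
    if explanation is not None:
        payload["explanation"] = bool(explanation)
    if limit is not None:
        payload["limit"] = int(limit)
    return payload


def predict_categories(text, **options):
    """POST the payload to the service (hypothetical helper, needs a live runtime)."""
    request = urllib.request.Request(
        URL,
        data=json.dumps(build_payload(text, **options)).encode("utf-8"),
        headers={
            "accept": "application/json",
            "content-type": "application/json",
            "Grpc-Metadata-mm-model-id": MODEL_ID,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


text = "IBM announces new advances in quantum computing."
minimal = build_payload(text)
full = build_payload(text, explanation=True, limit=2)
print(json.dumps(minimal))
print(json.dumps(full))
```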

Example requests

REST API

curl -s \
  "http://localhost:8080/v1/watson.runtime.nlp.v1/NlpService/CategoriesPredict" \
  -H "accept: application/json" \
  -H "content-type: application/json" \
  -H "Grpc-Metadata-mm-model-id: categories_esa-workflow_lang_en_stock" \
  -d '{ "raw_document": { "text": "A solicitor from Loughborough is delighted when she gets the chance to take part in the final of Bake Off. However, her chances are scuppered when she finds out her arch rival is also going to compete. Unexpectedly, the solicitor is bitten by a zombie and therefore is disqualified from competing." }, "explanation": true, "limit": 2 }'

Response

{
  "categories": [
    {
      "labels": [
        "food & drink",
        "desserts and baking"
      ],
      "score": 0.74634,
      "explanation": [
        {"text": "bake"}
      ]
    },
    {
      "labels": [
        "sports",
        "field hockey"
      ],
      "score": 0.599151,
      "explanation": [
        {"text": "chances"}
      ]
    }
  ],
  "producerId": {
    "name": "Categories ESA Workflow",
    "version": "1.0.0"
  }
}
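A response like the one above can be post-processed with a few lines of Python. The sketch below is one possible approach, not part of the product: the `summarize` helper is hypothetical, and joining `labels` with `/` assumes the list runs from the top of the taxonomy downward, matching the slash-separated paths used elsewhere on this page.

```python
import json

# Sample response body, copied from the REST example above.
RESPONSE_JSON = """
{"categories": [
  {"labels": ["food & drink", "desserts and baking"], "score": 0.74634,
   "explanation": [{"text": "bake"}]},
  {"labels": ["sports", "field hockey"], "score": 0.599151,
   "explanation": [{"text": "chances"}]}
 ],
 "producerId": {"name": "Categories ESA Workflow", "version": "1.0.0"}}
"""


def summarize(response: dict) -> list[dict]:
    """Flatten each category into a path, a score, and its explanation texts."""
    rows = []
    for category in response.get("categories", []):
        rows.append({
            # Assumption: labels are ordered from the top-level node downward.
            "path": "/".join(category["labels"]),
            "score": category["score"],
            # The explanation list is only present when it was requested.
            "evidence": [item["text"] for item in category.get("explanation", [])],
        })
    return sorted(rows, key=lambda row: row["score"], reverse=True)


summary = summarize(json.loads(RESPONSE_JSON))
for row in summary:
    print(f'{row["score"]:.3f}  {row["path"]}  {row["evidence"]}')
```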

Python

import grpc

from watson_nlp_runtime_client import (
    common_service_pb2,
    common_service_pb2_grpc,
    syntax_types_pb2,
)

# Open an insecure gRPC channel to the locally running runtime
channel = grpc.insecure_channel("localhost:8085")

stub = common_service_pb2_grpc.NlpServiceStub(channel)

# Build the request: input text, explanations enabled, at most 2 categories
request = common_service_pb2.CategoriesRequest(
    raw_document=syntax_types_pb2.RawDocument(text="A solicitor from Loughborough is delighted when she gets the chance to take part in the final of Bake Off. However, her chances are scuppered when she finds out her arch rival is also going to compete. Unexpectedly, the solicitor is bitten by a zombie and therefore is disqualified from competing."),
    explanation=True,
    limit=2,
)

# Select the model through the mm-model-id metadata entry
response = stub.CategoriesPredict(
    request, metadata=[("mm-model-id", "categories_esa-workflow_lang_en_stock")]
)

print(response)

Response

categories {
  labels: "food & drink"
  labels: "desserts and baking"
  score: 0.74634
  explanation {
    text: "bake"
  }
}
categories {
  labels: "sports"
  labels: "field hockey"
  score: 0.599151
  explanation {
    text: "chances"
  }
}
producer_id {
  name: "Categories ESA Workflow"
  version: "1.0.0"
}

When to use Categorization instead of Classification?

Categorization models work well when documents can be mapped to taxonomy nodes based on the general-knowledge topics they discuss, and when those topics are well represented in Wikipedia. Categorization is not expected to work well for use cases that are highly domain-specific (with little or no representation in Wikipedia) or that require interpretation beyond general topics. For example, sentiment or emotion classification is not a good match for Categorization models.