Watson Natural Language Processing library

The Watson Natural Language Processing library provides basic natural language processing functions for syntax analysis and out-of-the-box pre-trained models for a wide variety of text processing tasks, such as sentiment analysis, keyword extraction and vectorization. The Watson Natural Language Processing library is available for Python only.

With Watson Natural Language Processing, you can turn unstructured data into structured data, making the data easier to understand and transferable, in particular if you are working with a mix of unstructured and structured data. Examples of such data are call center records, customer complaints, social media posts, or problem reports. The unstructured data is often part of a larger data record which includes columns with structured data. Extracting meaning and structure from the unstructured data and combining this information with the data in the columns of structured data, gives you a deeper understanding of the input data and can help you to make better decisions.

Watson Natural Language Processing provides pre-trained models in over 20 languages. They are curated by a dedicated team of experts, and evaluated for quality on each specific language. These pre-trained models can be used in production environments without you having to worry about license or intellectual property infringements.

Although you can create your own models, the easiest way to get started with Watson Natural Language Processing is to run the pre-trained models on unstructured text to perform language processing tasks.

Here are some examples of language processing tasks available in Watson Natural Language Processing pre-trained models:

  • Syntax: tokenization, lemmatization, part of speech tagging, and dependency parsing
  • Entity extraction: find mentions of entities (like person, organization, or date)
  • Keywords extraction: extract noun phrases that are relevant in the input text
  • Text classification: analyze text and then assign a set of pre-defined tags or categories based on its content
  • Sentiment classification: is the input document positive, negative or neutral?
  • Tone classification: classify the tone in the input document (like excited, frustrated, or sad)
  • Emotion classification: classify the emotion of the input document (like anger or disgust)
  • Embeddings: map individual words or larger text snippets into a vector space

Using Watson Natural Language Processing in a notebook

The Watson Natural Language Processing library is only available if the Jupyter Notebooks with Python 3.10 or Python 3.9 service is installed. Additionally, the pre-trained Natural Language Processing models must be installed on the IBM Cloud Pak for Data platform. See Specifying additional installation options for default Runtime for Python 3.10 and Specifying additional installation options for Python 3.9.

You can run your Python notebooks using the Watson Natural Language Processing library in the following provided default environment.

Environment templates that include the Watson Natural Language Processing library

Name                           Hardware configuration
Runtime 22.1 on Python 3.9 *   1 vCPU and 2 GB RAM
Runtime 22.2 on Python 3.10    1 vCPU and 2 GB RAM

* The Runtime 22.1 on Python 3.9 environment template is deprecated.

The Runtime 22.x environments are not large enough to run notebooks that use the prebuilt models. For example, to run the Syntax and Sentiment models, you need an environment with 1 vCPU and 4 GB RAM. To work with larger environments, you must create a custom environment template of type Default (only CPU) or GPU. When you create this template, consider the following:

  • The environment must have at least 4 GB of memory and use one of the following software versions:

    • Runtime 22.1 on Python 3.9 *
    • Runtime 22.2 on Python 3.10
    • JupyterLab with Runtime 22.1 on Python 3.9 *
    • JupyterLab with Runtime 22.2 on Python 3.10
  • You can only select type GPU when creating a custom template if the Jupyter notebooks with Python for GPU service is installed on the IBM Cloud Pak for Data platform. GPU environments are not available by default. For details, see GPU environments.

Working with the pre-trained models

Watson Natural Language Processing encapsulates natural language functionality in blocks. Each block supports the following functions:

  • load(): load a block model.
  • run(): run the block on input argument(s).
  • train(): train the block on your own data. Not all blocks support training.
  • save(): save the block model you trained on your own data.
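The lifecycle above can be sketched as a plain-Python analogy. This is an illustrative toy class only, not the actual watson_nlp implementation; the class and model names are invented for the example:

```python
# Illustrative sketch: a toy class that mimics the block lifecycle
# (load/run/save). The real watson_nlp blocks deserialize pre-trained
# models and return rich prediction objects instead of plain strings.

class UppercaseBlock:
    """Toy 'block' whose run() simply uppercases the input text."""

    @classmethod
    def load(cls, model_name):
        # A real block would deserialize the named pre-trained model here.
        block = cls()
        block.model_name = model_name
        return block

    def run(self, text):
        # A real block returns a structured prediction object.
        return text.upper()

    def save(self, path):
        # A real block serializes the trained model to disk.
        # (Not all blocks support train(), so it is omitted from this toy.)
        with open(path, "w") as f:
            f.write(self.model_name)

block = UppercaseBlock.load("toy_uppercase_en")
print(block.run("Welcome to IBM!"))  # WELCOME TO IBM!
```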

There are two types of blocks:

Blocks that operate directly on the input document

An example of a block that operates directly on the input document is the Syntax block, which performs natural language processing operations such as tokenization, lemmatization, part of speech tagging or dependency parsing.

This block can be loaded and run on the input document directly. For example:

import watson_nlp

# Load the syntax model for English
syntax_model = watson_nlp.load('syntax_izumo_en_stock')

# Run the syntax model and print the result
syntax_prediction = syntax_model.run('Welcome to IBM!')
print(syntax_prediction)

Blocks that depend on other blocks

Blocks that depend on other blocks cannot be applied to the input document directly, and must be linked with one or more blocks in order to process the input document. In general, machine learning models such as classifiers or entity extractors that require preprocessing the input text fall into this category. For example, the Entity Mention block depends on the Syntax block.

These blocks can be loaded but can only be run in a particular order on the input document. For example:

import watson_nlp

# Load the Syntax and Entity Mentions models for English
syntax_model = watson_nlp.load('syntax_izumo_en_stock')
entity_model = watson_nlp.load('entity-mentions_bert_multi_stock')

# Run the syntax model on the input text
syntax_prediction = syntax_model.run('IBM announced new advances in quantum computing')

# Now run the entity mention model on the result of syntax
entity_mentions = entity_model.run(syntax_prediction)
print(entity_mentions)

Loading and running a model

Watson Natural Language Processing contains the load() function to allow you to quickly load pre-trained models to your notebook. To load a model, you first need to know its name. Model names follow a standard convention encoding the type of model (like classification or entity extraction), type of algorithm (like BERT or SVM), language code and details of the type system.
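As an illustration of this naming convention, you could split a stock model name into its parts. The field order (task, algorithm, language, type system) is an assumption inferred from the names used in this document, such as 'syntax_izumo_en_stock':

```python
# Hypothetical helper, not part of watson_nlp: split a stock model name
# into its components. Assumes the order task_algorithm_language_typesystem,
# inferred from names like 'syntax_izumo_en_stock' and
# 'entity-mentions_bert_multi_stock' (multi-word tasks use hyphens,
# so splitting on '_' still yields four fields).

def parse_model_name(name):
    task, algorithm, language, type_system = name.split("_")
    return {
        "task": task,
        "algorithm": algorithm,
        "language": language,
        "type_system": type_system,
    }

print(parse_model_name("syntax_izumo_en_stock"))
# {'task': 'syntax', 'algorithm': 'izumo', 'language': 'en', 'type_system': 'stock'}
```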

To find the right block to use, use the block catalog. See Watson NLP block catalog.

You can find the expected input for a given block class (for example, for the Entity Mentions block) by using help() on the block class run() method:

import watson_nlp

help(watson_nlp.blocks.entity_mentions.BERT.run)

Sample project and notebooks

To help you get started with the Watson Natural Language Processing library, you can download a sample project and notebooks from a Data Science sample GitHub repository at Notebooks and projects. The notebooks in the sample project demonstrate how to use the different Watson Natural Language Processing blocks and how to train your own models.

Note that you must download the version of the sample project and notebooks that matches the runtime environment in which you want to run the notebooks.

Sample notebooks

Sample project

If you don't want to download the sample notebooks to your project individually, you can download the entire sample project.

The sample project contains the sample notebooks listed in the previous section, including:

  • Analyzing hotel reviews using Watson Natural Language Processing

    This notebook shows you how to use syntax analysis to extract the most frequently used nouns from the hotel reviews, classify the sentiment of the reviews and use aspect-oriented sentiment analysis for the most frequently extracted aspects. The data file that is used by this notebook is included in the project as a data asset.

The sample project contains custom environment templates for the sample notebooks. If you want to create your own templates, ensure that they have at least 4 GB of memory. The Complaint classification with Watson Natural Language processing notebook requires a custom template with at least 8 GB of memory.

Learn more

Parent topic: Libraries and scripts for notebooks