What's new and changed in Watson Speech services

Watson Speech services updates can include new features, bug fixes, and security updates. Updates are listed in reverse chronological order so that the latest release is at the beginning of the topic.

You can see a list of the new features for the platform and all of the services at What's new in IBM Cloud Pak® for Data.

Installing or upgrading Watson Speech services

Ready to install or upgrade Watson Speech services?

Related documentation:

Cloud Pak for Data Version 4.7.3

The Watson Speech services were released in September 2023 with Cloud Pak for Data 4.7.3.

Operand version: 4.7.3

This release includes the following changes:

Issues fixed in this release

The following issues were fixed in this release:

Updates to US English next-generation telephony model: The US English next-generation telephony model (en-US_Telephony) has been updated for improved speech recognition.

Security issues fixed in this release

This release includes fixes for the following security issues:

CVE-2023-2976, CVE-2023-29402, CVE-2023-29403, CVE-2023-29404, CVE-2023-29405

Cloud Pak for Data Version 4.7.1

The Watson Speech services were released in July 2023 with Cloud Pak for Data 4.7.1.

Operand version: 4.7.1

This release includes the following changes:

Security issues fixed in this release

This release includes fixes for the following security issues:

CVE-2016-1000027

CVE-2022-35252, CVE-2022-36227, CVE-2022-41724, CVE-2022-41725, CVE-2022-4304, CVE-2022-4450

CVE-2023-0215, CVE-2023-20863, CVE-2023-20873, CVE-2023-24532, CVE-2023-24534, CVE-2023-24536, CVE-2023-24537, CVE-2023-24538, CVE-2023-24539, CVE-2023-24540, CVE-2023-29400

Cloud Pak for Data Version 4.7.0

The Watson Speech services were released in June 2023 with Cloud Pak for Data 4.7.0.

Operand version: 4.7.0

This release includes the following changes:

New features

The 4.7.0 release of the Watson Speech services includes the following features and updates:

Online backup and restore with OADP

You can now use the Cloud Pak for Data OpenShift® APIs for Data Protection (OADP) backup and restore utility to do an online backup and restore of the Watson Speech services.

For more information, see Cloud Pak for Data online backup and restore.

Offline backup and restore with OADP is not available for the Watson Speech services.

Shut down and restart the Watson Speech services

You can now shut down and restart the Watson Speech services. Shutting down services when you don't need them helps you conserve cluster resources. For more information, see Shutting down and restarting services.

Updates

The following updates were introduced in this release:

Updates to English next-generation Medical telephony model

The English next-generation Medical telephony model (en-WW_Medical_Telephony) has been updated for improved speech recognition.

Added support for French and German on new improved next-generation language model customization

Language model customization for French and German next-generation models was recently added. This service update includes further internal improvements. For more information about improved next-generation customization, see:

New procedure for upgrading a custom model that is based on an improved next-generation model

Two approaches are now available to upgrade a custom language model to an improved next-generation base model. You can still modify and then retrain the custom model, as already documented. Alternatively, you can now upgrade the custom model by including the query parameter force=true with the POST /v1/customizations/{customization_id}/train request. The force parameter upgrades the custom model regardless of whether it contains changes (that is, whether it is in the ready or available state). For more information, see Upgrading a custom language model based on an improved next-generation model.
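As a rough illustration of the second approach, the following Python sketch builds (but does not send) a training request that carries force=true. Only the POST /v1/customizations/{customization_id}/train path and the force parameter come from this release note. The helper name build_train_request, the placeholder service URL, and the bearer-token header are assumptions made for the example, so adjust them to your deployment.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_train_request(service_url, customization_id, token, force=True):
    """Build, but do not send, a training request for a custom language model.

    Hypothetical helper: the name and signature are illustrative only.
    """
    path = f"/v1/customizations/{quote(customization_id, safe='')}/train"
    # force=true asks the service to upgrade the custom model even if it
    # contains no pending changes.
    query = urlencode({"force": "true"}) if force else ""
    url = service_url.rstrip("/") + path + (f"?{query}" if query else "")
    # Assumption: the deployment accepts a bearer token for authentication.
    headers = {"Authorization": f"Bearer {token}"}
    return Request(url, method="POST", headers=headers)


# Placeholder values, not a real endpoint or credential.
request = build_train_request(
    "https://example.invalid/speech-to-text/api",
    "my-customization-id",
    "my-token",
)
print(request.get_method(), request.full_url)
```

To send the request, you could pass the returned object to urllib.request.urlopen once the URL and token point at a real service instance.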

Guidance for adding words to custom models that are based on improved next-generation models

The documentation now offers more guidance about adding words to custom models that are based on improved next-generation models. For performance reasons during training, the guidance encourages the use of corpora rather than the direct addition of custom words whenever possible. For more information, see Guidelines for adding words to custom models based on improved next-generation models.

Japanese custom words for custom models that are based on improved next-generation models are handled differently

For Japanese custom models that are based on next-generation models, custom words are handled differently from other languages. For Japanese, you can add a custom word or sounds-like that does not exceed 25 characters in length. If your custom word or sounds-like exceeds that limit, the service adds the word to the custom model as if it were added by a corpus. The word does not appear as a custom word for the model. For more information, see Guidelines for adding words to Japanese models based on improved next-generation models.

Further improvements to updated next-generation language model customization

Language model customization for English and Japanese next-generation models was recently improved. This service update includes further internal improvements. For more information about improved next-generation customization, see:

Issues fixed in this release

The following issues were fixed in this release:

Creating and training a custom Language Model is now optimal for both standard and low-latency Next-Generation models

Creating and training a custom language model with corpus text files, custom words, or both now performs the same way with a next-generation low-latency model as it does with a standard model. Previously, performance was suboptimal only when a next-generation low-latency model was used.

STT WebSocket sessions no longer fail due to tensor error message

When you use STT WebSockets, sessions no longer fail with the error message "Sizes of tensors must match except in dimension 0".

Custom words containing half-width Katakana characters now return a clear error message with Japanese Telephony model

Only full-width Katakana characters are accepted in custom words. Next-generation models now return an error message that explains that half-width Katakana characters are not supported. Previously, no error message was provided when you created custom words that contained half-width Katakana characters.
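If you want to catch half-width Katakana before you submit custom words, a client-side check along the following lines might help. This is a sketch only, not part of the service: the function names are hypothetical, the range U+FF65 to U+FF9F is the Unicode Halfwidth Katakana range, and NFKC normalization is used as one possible way to convert to full-width forms. Note that NFKC also rewrites other compatibility characters (for example, full-width Latin letters), so review the converted text before you use it.

```python
import unicodedata


def has_halfwidth_katakana(text):
    """Return True if text contains a character in U+FF65..U+FF9F."""
    return any("\uff65" <= ch <= "\uff9f" for ch in text)


def to_fullwidth(text):
    """Convert half-width Katakana to full-width forms by using NFKC.

    Caution: NFKC also changes other compatibility characters.
    """
    return unicodedata.normalize("NFKC", text)


# Illustrative words only: the first is half-width, the second full-width.
candidate_words = ["ｶﾀｶﾅ", "カタカナ"]
for word in candidate_words:
    if has_halfwidth_katakana(word):
        print(f"{word!r} contains half-width Katakana; suggested: {to_fullwidth(word)!r}")
    else:
        print(f"{word!r} looks fine")
```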

Japanese Telephony language model no longer fails due to long training time

When training a custom language model with Japanese Telephony, the service now effectively handles large numbers of custom words without failing.

The WebSocket interface now times out as expected when using next-generation models

When used for speech recognition with next-generation models, the WebSocket interface now times out as expected after long periods of silence. Previously, when used for speech recognition of short audio files, the WebSocket session could fail to time out. When the session failed to time out, the service did not return a final hypothesis to the waiting client application, and the client instead timed out while waiting for the results.

Limits to allow completion of training for next-generation Japanese custom models

Successful training of a next-generation Japanese custom language model requires that custom words and sounds-likes added to the model each contain no more than 25 characters. For the most effective training, it is recommended that custom words and sounds-likes contain no more than 20 characters. Training of Japanese custom models with longer custom words and sounds-likes fails to complete after multiple hours of training. If you need to add the equivalent of a long word or sounds-like to a next-generation Japanese custom model, take these steps:

Add a shorter word or sounds-like that captures the essence of the longer word or sounds-like to the custom model.
Add one or more sentences that use the longer word or sounds-like to a corpus.
Consider adding sentences to the corpus that provide more context for the word or sounds-like. Greater context gives the service more information with which to recognize the word and apply the correct sounds-like.
Add the corpus to the custom model.
Retrain the custom model on the combination of the shorter word or sounds-like and the corpus that contains the longer string.
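Before you follow the steps above, it can help to sort candidate entries by length. The following sketch applies the 25-character limit and the 20-character recommendation stated in this section. The function name classify_entry and the category labels are hypothetical, and the sketch assumes that the length is counted the way Python's len() counts it (Unicode code points), which might not match how the service counts characters.

```python
MAX_CHARS = 25          # limit stated in this section
RECOMMENDED_CHARS = 20  # recommended maximum stated in this section


def classify_entry(word, sounds_like=()):
    """Classify a custom word and its sounds-likes by their longest string.

    Returns "ok", "long" (allowed but above the recommendation), or
    "too_long" (candidate for a shorter word plus a corpus).
    Assumption: length is measured in Unicode code points via len().
    """
    longest = max(len(s) for s in (word, *sounds_like))
    if longest > MAX_CHARS:
        return "too_long"
    if longest > RECOMMENDED_CHARS:
        return "long"
    return "ok"


# Illustrative entries: (custom word, list of sounds-likes).
entries = [("あ" * 10, []), ("あ" * 22, []), ("あ" * 12, ["い" * 30])]
for word, sounds in entries:
    print(len(word), [len(s) for s in sounds], classify_entry(word, sounds))
```

Entries that come back as "too_long" are the ones for which you would add a shorter word and place sentences that use the longer string in a corpus.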

The limits and steps just described allow next-generation Japanese custom models to complete training. Keep in mind that adding large numbers of new custom words to a custom language model increases the training time of the model. But the increased training time occurs only when the custom model is initially trained on the new words. Once the custom model has been trained on the new words, training time returns to normal. For more information, see:

Smart formatting for US English dates is now correct

Smart formatting now correctly includes days of the week and dates when both are present in the spoken audio, for example, Tuesday February 28. Previously, in some cases the day of the week was omitted and the date was presented incorrectly. Note that smart formatting is beta functionality.

Updated documentation for speech hesitation words for next-generation models

Documentation for speech hesitation words for next-generation models has been updated. More details are provided about US English and Japanese hesitation words. Next-generation models include the actual hesitation words in transcription results, unlike previous-generation models, which include only hesitation markers. For more information, see Speech hesitations and hesitation markers.

Security issues fixed in this release

This release includes fixes for the following security issues:

CVE-2023-20860, CVE-2023-24998, CVE-2023-25669, CVE-2023-28708, CVE-2023-25577, CVE-2023-25662, CVE-2023-25663, CVE-2023-25665, CVE-2023-25666, CVE-2023-25676, CVE-2023-25801, CVE-2023-25675, CVE-2023-25660, CVE-2023-25658, CVE-2023-25659, CVE-2023-23916, CVE-2023-25664, CVE-2023-25667, CVE-2023-25673, CVE-2023-25674, CVE-2023-25670, CVE-2023-25671, CVE-2023-25672, CVE-2023-27579, CVE-2023-25661, CVE-2023-25668, CVE-2023-20861, CVE-2023-23934

CVE-2022-45907, CVE-2022-32149, CVE-2022-1471, CVE-2022-41721

CVE-2021-44269, CVE-2021-22570