What's new and changed in Watson Speech services
Watson Speech services updates can include new features, bug fixes, and security updates. Updates are listed in reverse chronological order so that the latest release is at the beginning of the topic.
You can see a list of the new features for the platform and all of the services at What's new in IBM Cloud Pak® for Data.
- Related documentation:
- Installing or upgrading Watson Speech services
Cloud Pak for Data Version 4.7.3
The Watson Speech services were released in September 2023 with Cloud Pak for Data 4.7.3.
Operand version: 4.7.3
This release includes the following changes:
- Issues fixed in this release
- The following issues were fixed in this release:
- Updates to US English next-generation telephony model
- The US English next-generation telephony model (en-US_Telephony) has been updated for improved speech recognition.
- Security issues fixed in this release
- This release includes fixes for the following security issues:
CVE-2023-2976, CVE-2023-29402, CVE-2023-29403, CVE-2023-29404, CVE-2023-29405
Cloud Pak for Data Version 4.7.1
The Watson Speech services were released in July 2023 with Cloud Pak for Data 4.7.1.
Operand version: 4.7.1
This release includes the following changes:
- Security issues fixed in this release
- This release includes fixes for the following security issues:
CVE-2016-1000027
CVE-2022-35252, CVE-2022-36227, CVE-2022-41724, CVE-2022-41725, CVE-2022-4304, CVE-2022-4450
CVE-2023-0215, CVE-2023-20863, CVE-2023-20873, CVE-2023-24532, CVE-2023-24534, CVE-2023-24536, CVE-2023-24537, CVE-2023-24538, CVE-2023-24539, CVE-2023-24540, CVE-2023-29400
Cloud Pak for Data Version 4.7.0
The Watson Speech services were released in June 2023 with Cloud Pak for Data 4.7.0.
Operand version: 4.7.0
This release includes the following changes:
- New features
- The 4.7.0 release of the Watson Speech services includes the following features and updates:
- Online backup and restore with OADP
- You can now use the Cloud Pak for Data OpenShift® APIs for Data Protection (OADP) backup and restore utility to do an online backup and restore of the Watson Speech services.
For more information, see Cloud Pak for Data online backup and restore.
Offline backup and restore with OADP is not available for the Watson Speech services.
- Shut down and restart the Watson Speech services
- You can now shut down and restart the Watson Speech services. Shutting down services when you don't need them helps you conserve cluster resources. For more information, see Shutting down and restarting services.
- Updates
- The following updates were introduced in this release:
- Updates to English next-generation Medical telephony model
- The English next-generation Medical telephony model (en-WW_Medical_Telephony) has been updated for improved speech recognition.
- Added support for French and German on new improved next-generation language model customization
- Language model customization for French and German next-generation models was recently added. This service update includes further internal improvements. For more information about improved next-generation customization, see:
- New procedure for upgrading a custom model that is based on an improved next-generation model
- Two approaches are now available to upgrade a custom language model to an improved next-generation base model. You can still modify and then retrain the custom model, as already documented. But now, you can also upgrade the custom model by including the query parameter force=true with the POST /v1/customizations/{customization_id}/train request. The force parameter upgrades the custom model regardless of whether it contains changes (is in the ready or available state). For more information, see Upgrading a custom language model based on an improved next-generation model.
- Guidance for adding words to custom models that are based on improved next-generation models
- The documentation now offers more guidance about adding words to custom models that are based on improved next-generation models. For performance reasons during training, the guidance encourages the use of corpora rather than the direct addition of custom words whenever possible. For more information, see Guidelines for adding words to custom models based on improved next-generation models
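The forced upgrade described above (force=true on the train request) can be sketched as follows. This is a minimal illustration that only builds the request and does not send it. The path and the force parameter come from the release note; the service URL, customization ID, token, and the use of bearer-token authentication are assumptions that you should replace with the values for your own deployment.

```python
import urllib.parse
import urllib.request


def build_force_train_request(
    service_url: str, customization_id: str, token: str
) -> urllib.request.Request:
    """Build, but do not send, a train request that forces the upgrade."""
    # Path and query parameter as named in the release note:
    # POST /v1/customizations/{customization_id}/train with force=true.
    path = "/v1/customizations/{}/train".format(
        urllib.parse.quote(customization_id, safe="")
    )
    query = urllib.parse.urlencode({"force": "true"})
    url = service_url.rstrip("/") + path + "?" + query
    # Assumption: bearer-token authentication; adjust to your deployment.
    headers = {"Authorization": "Bearer " + token}
    return urllib.request.Request(url, data=b"", headers=headers, method="POST")


if __name__ == "__main__":
    request = build_force_train_request(
        "https://cpd.example.com/speech-to-text/instance-id",  # hypothetical URL
        "my-customization-id",  # hypothetical customization ID
        "my-token",  # hypothetical token
    )
    print(request.get_method(), request.full_url)
```

To submit the request, pass the returned object to urllib.request.urlopen. Because force=true upgrades the model whether or not it contains changes, the same call applies to models in either the ready or the available state.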
- Japanese custom words for custom models that are based on improved next-generation models are handled differently
- For Japanese custom models that are based on next-generation models, custom words are handled differently from other languages. For Japanese, you can add a custom word or sounds-like that does not exceed 25 characters in length. If your custom word or sounds-like exceeds that limit, the service adds the word to the custom model as if it were added by a corpus. The word does not appear as a custom word for the model. For more information, see Guidelines for adding words to Japanese models based on improved next-generation models
- Further improvements to updated next-generation language model customization
- Language model customization for English and Japanese next-generation models was recently improved. This service update includes further internal improvements. For more information about improved next-generation customization, see:
- Issues fixed in this release
- The following issues were fixed in this release:
- Creating and training a custom language model is now optimal for both standard and low-latency next-generation models
- Creating and training a custom language model with corpora text files, custom words, or both now performs the same way with a next-generation low-latency model as with a standard model. Previously, performance was not optimal when a next-generation low-latency model was used.
- STT WebSocket sessions no longer fail due to tensor error message
- When using STT WebSockets, sessions no longer fail with the error message "STT returns the error: Sizes of tensors must match except in dimension 0".
- Custom words containing half-width Katakana characters now return a clear error message with Japanese Telephony model
- Only full-width Katakana characters are accepted in custom words. The next-generation models now return an error message that explains that half-width Katakana characters are not supported. Previously, when creating custom words containing half-width Katakana characters, no error message was provided.
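Because only full-width Katakana is accepted, a client might check custom words before submitting them. The following sketch is one possible client-side check and is not part of the service. It assumes that half-width Katakana means the Unicode range U+FF65 to U+FF9F and uses NFKC normalization to convert only those runs to full-width forms. The function names are hypothetical.

```python
import re
import unicodedata

# Assumption: half-width Katakana (including the voiced sound marks)
# occupies U+FF65 through U+FF9F.
HALF_WIDTH_KATAKANA = re.compile("[\uFF65-\uFF9F]+")


def has_half_width_katakana(text: str) -> bool:
    """Return True if the text contains any half-width Katakana character."""
    return HALF_WIDTH_KATAKANA.search(text) is not None


def to_full_width_katakana(text: str) -> str:
    """Convert runs of half-width Katakana to full-width; leave the rest as is."""
    return HALF_WIDTH_KATAKANA.sub(
        lambda match: unicodedata.normalize("NFKC", match.group(0)), text
    )


if __name__ == "__main__":
    word = "\uff76\uff9e\uff72\uff84\uff9e"  # half-width spelling of "guide"
    if has_half_width_katakana(word):
        word = to_full_width_katakana(word)
    print(word)
```

Normalizing only the matched runs, rather than the whole string, avoids changing other characters that NFKC would also rewrite, such as full-width Latin letters.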
- Japanese Telephony language model no longer fails due to long training time
- When training a custom language model with Japanese Telephony, the service now effectively handles large numbers of custom words without failing.
- The WebSocket interface now times out as expected when using next-generation models
- When used for speech recognition with next-generation models, the WebSocket interface now times out as expected after long periods of silence. Previously, when used for speech recognition of short audio files, the WebSocket session could fail to time out. When the session failed to time out, the service did not return a final hypothesis to the waiting client application, and the client instead timed out while waiting for the results.
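For client applications that wait on a final hypothesis, the following sketch shows the kind of messages involved. It is not taken from this release note: the message fields (action, content-type, inactivity_timeout, results, final) are assumptions based on the public Speech to Text WebSocket interface, so verify them against the API reference for your version. The helper names are hypothetical, and no connection is opened.

```python
import json


def build_start_message(
    content_type: str = "audio/wav", inactivity_timeout: int = 30
) -> str:
    """Build the JSON text message that opens a recognition request."""
    # Assumption: inactivity_timeout is the number of seconds of silence
    # after which the service closes the connection.
    return json.dumps(
        {
            "action": "start",
            "content-type": content_type,
            "inactivity_timeout": inactivity_timeout,
        }
    )


def build_stop_message() -> str:
    """Build the JSON text message that signals the end of the audio."""
    return json.dumps({"action": "stop"})


def has_final_result(raw_message: str) -> bool:
    """Return True if a service message carries at least one final result."""
    payload = json.loads(raw_message)
    return any(
        result.get("final") is True for result in payload.get("results", [])
    )


if __name__ == "__main__":
    print(build_start_message(inactivity_timeout=10))
    print(build_stop_message())
```

A client would typically keep reading messages until has_final_result returns True, while still applying its own deadline so that it does not wait indefinitely if the connection stalls.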
- Limits to allow completion of training for next-generation Japanese custom models
- Successful training of a next-generation Japanese custom language model requires that custom words and sounds-likes added to the model each contain no more than 25 characters. For the most effective training, it is recommended that custom words and sounds-likes contain no more than 20 characters. Training of Japanese custom models with longer custom words and sounds-likes fails to complete after multiple hours of training.
If you need to add the equivalent of a long word or sounds-like to a next-generation Japanese custom model, take these steps:
- Add a shorter word or sounds-like that captures the essence of the longer word or sounds-like to the custom model.
- Add one or more sentences that use the longer word or sounds-like to a corpus.
- Consider adding sentences to the corpus that provide more context for the word or sounds-like. Greater context gives the service more information with which to recognize the word and apply the correct sounds-like.
- Add the corpus to the custom model.
- Retrain the custom model on the combination of the shorter word or sounds-like and the corpus that contains the longer string.
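The limits and steps above can be applied before any request is made. The following sketch sorts candidate Japanese words or sounds-likes into three groups based on the 25-character limit and the 20-character recommendation stated above. It assumes that one Python string character corresponds to one character as the service counts them. The type and function names are hypothetical.

```python
from typing import Iterable, List, NamedTuple

HARD_LIMIT = 25         # longer entries prevent training from completing
RECOMMENDED_LIMIT = 20  # recommended maximum for the most effective training


class WordPlan(NamedTuple):
    add_directly: List[str]      # within the recommended limit
    add_with_caution: List[str]  # allowed, but over the recommended limit
    move_to_corpus: List[str]    # too long; use a shorter word plus a corpus


def plan_japanese_words(words: Iterable[str]) -> WordPlan:
    """Group words by how they should be added to the custom model."""
    plan = WordPlan([], [], [])
    for word in words:
        # Assumption: len() matches the way the service counts characters.
        if len(word) <= RECOMMENDED_LIMIT:
            plan.add_directly.append(word)
        elif len(word) <= HARD_LIMIT:
            plan.add_with_caution.append(word)
        else:
            plan.move_to_corpus.append(word)
    return plan


if __name__ == "__main__":
    candidates = ["\u3042" * 5, "\u3042" * 22, "\u3042" * 26]
    plan = plan_japanese_words(candidates)
    print(len(plan.add_directly), len(plan.add_with_caution), len(plan.move_to_corpus))
```

Entries in the move_to_corpus group are the ones to handle with the steps above: add a shorter equivalent as a custom word and place sentences that use the longer string in a corpus.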
- Smart formatting for US English dates is now correct
- Smart formatting now correctly includes days of the week and dates when both are present in the spoken audio, for example, Tuesday February 28. Previously, in some cases the day of the week was omitted and the date was presented incorrectly. Note that smart formatting is beta functionality.
- Update documentation for speech hesitation words for next-generation models
- Documentation for speech hesitation words for next-generation models has been updated. More details are provided about US English and Japanese hesitation words. Next-generation models include the actual hesitation words in transcription results, unlike previous-generation models, which include only hesitation markers. For more information, see: Speech hesitations and hesitation markers
- Security issues fixed in this release
- This release includes fixes for the following security issues:
CVE-2023-20860, CVE-2023-24998, CVE-2023-25669, CVE-2023-28708, CVE-2023-25577, CVE-2023-25662, CVE-2023-25663, CVE-2023-25665, CVE-2023-25666, CVE-2023-25676, CVE-2023-25801, CVE-2023-25675, CVE-2023-25660, CVE-2023-25658, CVE-2023-25659, CVE-2023-23916, CVE-2023-25664, CVE-2023-25667, CVE-2023-25673, CVE-2023-25674, CVE-2023-25670, CVE-2023-25671, CVE-2023-25672, CVE-2023-27579, CVE-2023-25661, CVE-2023-25668, CVE-2023-20861, CVE-2023-23934
CVE-2022-45907, CVE-2022-32149, CVE-2022-1471, CVE-2022-41721
CVE-2021-44269, CVE-2021-22570