copyright: years: 2017, 2023 lastupdated: "2023-01-06"
Dynamically configuring the Text to Speech service
By using the IBM® Voice Gateway API, you can dynamically configure the Text to Speech service during a call. To change the configuration, define the vgwActSetTTSConfig action in the output of a node response in your Watson Assistant dialog tree. For more information about using the API, see Defining action tags and state variables.
By default, Watson Text to Speech service instances that are not part of Premium plans log requests and their results to improve the service for future users. To prevent IBM from using your data in this way, see Data collection in the Watson Text to Speech API reference.
The following example shows a configuration that you can add to the node response in your Watson Assistant dialog tree. The settings are passed through unchanged as JSON properties to the Text to Speech service.
{
  "output": {
    "vgwAction": {
      "command": "vgwActSetTTSConfig",
      "parameters": {
        "credentials": {
          "url": "https://api.us-south.text-to-speech.watson.cloud.ibm.com/instances/{instance_id}",
          "apikey": "{apikey}",
          "tokenServiceProviderUrl": "https://iam.cloud.ibm.com/identity/token"
        },
        "config": {
          "x-watson-learning-opt-out": true,
          "voice": "es-ES_LauraVoice"
        },
        "jitterBufferDelay": 200,
        "connectionTimeout": 30,
        "requestTimeout": 10,
        "cacheTimeToLive": 336
      }
    }
  }
}
JSON property | Description |
---|---|
credentials | Credentials for the Watson Text to Speech service. If not defined, the default credentials from the Media Relay configuration are used. In Version 1.0.0.5a and later, you can also reduce call latency by setting the tokenAuthEnabled credential to enable token authentication. See Enabling user name and password based token authentication for Watson services. |
config | Parameters for the Watson Text to Speech service. See WebSockets API reference for Watson Text to Speech Service. |
jitterBufferDelay | The amount of time in milliseconds to buffer before playing back audio from the Text to Speech service. This buffer accounts for any jitter in the streaming audio. If not defined, the value of WATSON_TTS_JITTER_BUFFER_DELAY in the Media Relay configuration is used. |
connectionTimeout | Time in seconds that Voice Gateway waits to establish a socket connection with the Watson Text to Speech service. If the time is exceeded, Voice Gateway reattempts to connect with the Watson Text to Speech service. If the service still can't be reached, the call fails. Version 1.0.0.5 and later. |
requestTimeout | Time in seconds that Voice Gateway waits to establish a speech synthesis session with the Watson Text to Speech service. If the time is exceeded, Voice Gateway reattempts to connect with the Watson Text to Speech service. If the service still can't be reached, the call fails. Version 1.0.0.5 and later. |
cacheTimeToLive | The time in hours to cache responses from the Text to Speech service to improve playback response time. When enabled, all Text to Speech responses are cached unless they are excluded in the Watson Assistant dialog. If not defined, the value of TTS_CACHE_TIME_TO_LIVE in the Media Relay configuration is used. |
updateMethod | Optional. Specifies the update strategy to use when setting the speech configuration. Possible values: replace, replaceOnce, merge, and mergeOnce. See Using updateMethod. Version 1.0.0.7 and later. |
Note: The following parameters from the Text to Speech service can't be modified because they have fixed values that are used by the Media Relay.
accept
text
Example: Setting the Text to Speech voice (en-US_LisaVoice)
In this example, the voice is set to en-US_LisaVoice. Because the credentials property isn't defined, the Media Relay will use the credentials defined through the Media Relay configuration (WATSON_TTS_URL, WATSON_TTS_USERNAME, and WATSON_TTS_PASSWORD).
{
  "output": {
    "vgwAction": {
      "command": "vgwActSetTTSConfig",
      "parameters": {
        "config": {
          "voice": "en-US_LisaVoice"
        }
      }
    }
  }
}
Using updateMethod
You can use the updateMethod property in dynamic configuration to define how changes to the configuration are made: by replacing the configuration or by merging in new configuration properties, and for either the duration of the call or a single conversation turn.
updateMethod value | Description |
---|---|
replace | Replaces the configuration for the duration of the call. |
replaceOnce | Replaces the configuration once, so the configuration is used for only the following conversation turn. Then, it reverts to the previous configuration. |
merge | Merges the configuration with the existing configuration for the duration of the call. |
mergeOnce | Merges the configuration for one turn of the conversation, and then reverts to the previous configuration. |
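As an illustration, a node response along the following lines would switch the voice for a single conversation turn and then revert to the previous configuration. This is a sketch, assuming that updateMethod is set alongside config inside the parameters object, as the property table earlier in this topic indicates; the voice value is reused from the earlier example.

```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActSetTTSConfig",
      "parameters": {
        "updateMethod": "mergeOnce",
        "config": {
          "voice": "en-US_LisaVoice"
        }
      }
    }
  }
}
```

Using replaceOnce instead of mergeOnce would also apply the change for only one turn, but it would replace the configuration for that turn rather than merge into it.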
Updating fields that are not root level
When you configure the service dynamically from Watson Assistant, only the root level fields, such as config or bargeInResume, are updated. If a root level field is omitted from the action, its original configuration settings persist. To change individual fields inside config without replacing the rest of it, set updateMethod to merge or mergeOnce, which merges the config fields with the existing configuration.
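For instance, suppose that the call is currently using the config from the first example in this topic, where both x-watson-learning-opt-out and voice are set. The following sketch changes only the voice. It assumes that, with merge, the config fields that are not listed here stay in effect, whereas with replace the whole config field would be replaced by the object shown. Confirm the behavior against your Voice Gateway version, because updateMethod requires Version 1.0.0.7 or later.

```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActSetTTSConfig",
      "parameters": {
        "updateMethod": "merge",
        "config": {
          "voice": "en-US_LisaVoice"
        }
      }
    }
  }
}
```

Because credentials, jitterBufferDelay, and the other root level fields are omitted from this action, their existing settings persist.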
Deprecated: Configuring the Text to Speech service by defining state variables
In Version 1.0.0.2, configuring the Watson speech services by defining state variables was deprecated in favor of the action tags described in the previous sections.
Important: Although the state variables continue to function, you can't define both these deprecated state variables and the action tags within the same node. Your Watson Assistant dialog can contain a mixture of action tags and deprecated state variables, but the JSON definition for each node can contain only one or the other.
{
  "context": {
    "vgwTTSConfigSettings": {
      "credentials": {
        "url": "https://api.us-south.text-to-speech.watson.cloud.ibm.com/instances/{instance_id}",
        "apikey": "{apikey}",
        "tokenServiceProviderUrl": "https://iam.cloud.ibm.com/identity/token"
      },
      "config": {
        "x-watson-learning-opt-out": true,
        "voice": "es-ES_LauraVoice"
      },
      "jitterBufferDelay": 200,
      "cacheTimeToLive": 336
    }
  }
}
JSON property | Description |
---|---|
credentials | Credentials for the Watson Text to Speech service. If not defined, the default credentials from the Media Relay configuration are used. |
config | Parameters for the Watson Text to Speech service. See WebSockets API reference for Watson Text to Speech Service. |
jitterBufferDelay | The amount of time in milliseconds to buffer before playing back audio from the Text to Speech service. This buffer accounts for any jitter in the streaming audio. If not defined, the value of WATSON_TTS_JITTER_BUFFER_DELAY in the Media Relay configuration is used. |
cacheTimeToLive | The time in hours to cache responses from the Text to Speech service to improve playback response time. When enabled, all Text to Speech responses are cached unless they are excluded in the Watson Assistant dialog. If not defined, the value of TTS_CACHE_TIME_TO_LIVE in the Media Relay configuration is used. Version 1.0.0.1 and later. |