copyright: years: 2017, 2023 lastupdated: "2023-01-06"


Dynamically configuring the Text to Speech service

By using the IBM® Voice Gateway API, you can dynamically configure the Text to Speech service during a call. To change the configuration, define the vgwActSetTTSConfig action in the output of a node response in your Watson Assistant dialog tree. For more information about using the API, see Defining action tags and state variables.

By default, Watson Text to Speech service instances that are not part of Premium plans log requests and their results to improve the service for future users. To prevent IBM from using your data in this way, see Data collection in the Watson Text to Speech API reference.

The following example shows a configuration that you can add to the node response in your Watson Assistant dialog tree. The settings are passed unchanged as JSON properties to the Text to Speech service.

{
  "output": {
    "vgwAction": {
      "command": "vgwActSetTTSConfig",
      "parameters": {
        "credentials": {
          "url": "https://api.us-south.text-to-speech.watson.cloud.ibm.com/instances/{instance_id}",
          "apikey": "{apikey}",
          "tokenServiceProviderUrl": "https://iam.cloud.ibm.com/identity/token"
        },
        "config": {
          "x-watson-learning-opt-out": true,
          "voice": "es-ES_LauraVoice"
        },
        "jitterBufferDelay": 200,
        "connectionTimeout": 30,
        "requestTimeout": 10,
        "cacheTimeToLive": 336
      }
    }
  }
}

Table 1. JSON properties for the Text to Speech service
JSON property Description
credentials Credentials for the Watson Text to Speech service. If not defined, the default credentials from the Media Relay configuration are used. In Version 1.0.0.5a and later, you can also reduce call latency by configuring the tokenAuthEnabled credential to enable token authentication. See Enabling user name and password based token authentication for Watson services.
config Parameters for the Watson Text to Speech service. See WebSockets API reference for Watson Text to Speech Service.
jitterBufferDelay The amount of time in milliseconds to buffer before playing back audio from the Text to Speech service. This buffer accounts for any jitter in the streaming audio. If not defined, the value of WATSON_TTS_JITTER_BUFFER_DELAY in the Media Relay configuration is used.
connectionTimeout Time in seconds that Voice Gateway waits to establish a socket connection with the Watson Text to Speech service. If the time is exceeded, Voice Gateway reattempts to connect with the Watson Text to Speech service. If the service still can't be reached, the call fails. Version 1.0.0.5 and later.
requestTimeout Time in seconds that Voice Gateway waits to establish a speech synthesis session with the Watson Text to Speech service. If the time is exceeded, Voice Gateway reattempts to connect with the Watson Text to Speech service. If the service still can't be reached, the call fails. Version 1.0.0.5 and later.
cacheTimeToLive The time in hours to cache responses from the Text to Speech service to improve playback response time. When enabled, all Text to Speech responses are cached unless they are excluded in the Watson Assistant dialog. If not defined, the value of TTS_CACHE_TIME_TO_LIVE in the Media Relay configuration is used.
updateMethod Optional. Specifies the update strategy to choose when setting the speech configuration. Possible values:
  • replace
  • replaceOnce
  • merge
  • mergeOnce
See Using updateMethod. Version 1.0.0.7 and later.

Note: The following parameters from the Text to Speech service can't be modified because they have fixed values that are used by the Media Relay.

  • accept
  • text

Example: Setting the Text to Speech voice (en-US_LisaVoice)

In this example, the voice is set to en-US_LisaVoice. Because the credentials property isn't defined, the Media Relay uses the credentials that are defined in the Media Relay configuration (WATSON_TTS_URL, WATSON_TTS_USERNAME, and WATSON_TTS_PASSWORD).

{
  "output": {
    "vgwAction": {
      "command": "vgwActSetTTSConfig",
      "parameters": {
        "config": {
          "voice": "en-US_LisaVoice"
        }
      }
    }
  }
}

Using updateMethod

You can use the updateMethod property in dynamic configuration to define how configuration changes are applied: either by replacing the configuration or by merging in new configuration properties, and either for the duration of the call or for a single conversation turn.

Table 2. Available options to update properties when using updateMethod.
Value Description
replace Replaces the configuration for the duration of the call.
replaceOnce Replaces the configuration once, so the configuration is used for only the following conversation turn. Then, it reverts to the previous configuration.
merge Merges the configuration with the existing configuration for the duration of the call.
mergeOnce Merges the configuration for one turn of the conversation, and then reverts to the previous configuration.
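
As a minimal sketch of a single-turn change, the following node response sets updateMethod to mergeOnce so that a different voice is used for only one conversation turn. It assumes that updateMethod is placed alongside config under parameters, as its listing in Table 1 suggests; the voice value is reused from the earlier example.

```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActSetTTSConfig",
      "parameters": {
        "updateMethod": "mergeOnce",
        "config": {
          "voice": "en-US_LisaVoice"
        }
      }
    }
  }
}
```

After that turn, the configuration would be expected to revert to whatever voice was in effect before the action was sent.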

Updating fields that are not root level

When you configure the service dynamically from Watson Assistant, only the root-level fields, such as config or bargeInResume, are updated. Root-level fields that are omitted from the action keep their original settings. To merge nested config fields with the existing configuration, set updateMethod to merge or mergeOnce.
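
To illustrate, suppose that the current configuration already sets voice inside its config field. The following sketch adds x-watson-learning-opt-out to config without repeating the voice setting. It is an illustration only and assumes that updateMethod is specified under parameters next to config, in line with the other properties in Table 1.

```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActSetTTSConfig",
      "parameters": {
        "updateMethod": "merge",
        "config": {
          "x-watson-learning-opt-out": true
        }
      }
    }
  }
}
```

With merge, the existing voice setting would be expected to persist for the rest of the call. With replace, the config field would presumably be replaced as a whole, so any nested field that is not repeated in the action would no longer be set.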

Deprecated: Configuring the Text to Speech service by defining state variables

In Version 1.0.0.2, configuring the Watson speech services by defining state variables was deprecated in favor of the action tags that are described in the previous sections.

Important: Although the state variables continue to function, you can't define both these deprecated state variables and the action tags within the same node. Your Watson Assistant dialog can contain a mixture of action tags and deprecated state variables, but the JSON definition for each node can contain only one or the other.

{
  "context": {
    "vgwTTSConfigSettings": {
      "credentials": {
        "url": "https://api.us-south.text-to-speech.watson.cloud.ibm.com/instances/{instance_id}",
        "apikey": "{apikey}",
        "tokenServiceProviderUrl": "https://iam.cloud.ibm.com/identity/token"
      },
      "config": {
        "x-watson-learning-opt-out": true,
        "voice": "es-ES_LauraVoice"
      },
      "jitterBufferDelay": 200,
      "cacheTimeToLive": 336
    }
  }
}

Table 3. JSON properties for the Text to Speech service
JSON property Description
credentials Credentials for the Watson Text to Speech service. If not defined, the default credentials from the Media Relay configuration are used.
config Parameters for the Watson Text to Speech service. See WebSockets API reference for Watson Text to Speech Service.
jitterBufferDelay The amount of time in milliseconds to buffer before playing back audio from the Text to Speech service. This buffer accounts for any jitter in the streaming audio. If not defined, the value of WATSON_TTS_JITTER_BUFFER_DELAY in the Media Relay configuration is used.
cacheTimeToLive The time in hours to cache responses from the Text to Speech service to improve playback response time. When enabled, all Text to Speech responses are cached unless they are excluded in the Watson Assistant dialog. If not defined, the value of TTS_CACHE_TIME_TO_LIVE in the Media Relay configuration is used. Version 1.0.0.1 and later.