Dynamically configuring the Text to Speech service

By using the IBM® Voice Gateway API, you can dynamically configure the Text to Speech service during a call. To change the configuration, define the vgwActSetTTSConfig action in the output of a node response in your Watson Assistant dialog tree. For more information about using the API, see Defining action tags and state variables.

By default, Watson Text to Speech service instances that are not part of Premium plans log requests and their results to improve the service for future users. To prevent IBM from using your data in this way, see Data collection in the Watson Text to Speech API reference.

The following example shows a configuration that you can add to the node response in your Watson Assistant dialog tree. The settings are passed transparently as JSON properties to the Text to Speech service.

{
  "output": {
    "vgwAction": {
      "command": "vgwActSetTTSConfig",
      "parameters": {
        "credentials": {
          "url": "https://api.us-south.text-to-speech.watson.cloud.ibm.com/instances/{instance_id}",
          "apikey": "{apikey}",
          "tokenServiceProviderUrl": "https://iam.cloud.ibm.com/identity/token"
        },
        "config": {
          "x-watson-learning-opt-out": true,
          "voice": "es-ES_LauraVoice"
        },
        "jitterBufferDelay": 200,
        "connectionTimeout": 30,
        "requestTimeout": 10,
        "cacheTimeToLive": 336
      }
    }
  }
}

Table 1. JSON properties for the Text to Speech service
JSON property	Description
`credentials`	Credentials for the Watson Text to Speech service. If not defined, the default credentials from the Media Relay configuration are used. You can also reduce call latency times by configuring the `tokenAuthEnabled` credential to enable token authentication for Version 1.0.0.5a and later. See Enabling user name and password based token authentication for Watson services.
`config`	Parameters for the Watson Text to Speech service. See WebSockets API reference for Watson Text to Speech Service.
`jitterBufferDelay`	The amount of time in milliseconds to buffer before playing back audio from the Text to Speech service. This buffer accounts for any jitter in the streaming audio. If not defined, the value of `WATSON_TTS_JITTER_BUFFER_DELAY` in the Media Relay configuration is used.
`connectionTimeout`	Time in seconds that Voice Gateway waits to establish a socket connection with the Watson Text to Speech service. If the time is exceeded, Voice Gateway reattempts to connect with the Watson Text to Speech service. If the service still can't be reached, the call fails. Version 1.0.0.5 and later.
`requestTimeout`	Time in seconds that Voice Gateway waits to establish a speech synthesis session with the Watson Text to Speech service. If the time is exceeded, Voice Gateway reattempts to connect with the Watson Text to Speech service. If the service still can't be reached, the call fails. Version 1.0.0.5 and later.
`cacheTimeToLive`	The time in hours to cache responses from the Text to Speech service to improve playback response time. When enabled, all Text to Speech responses are cached unless they are excluded in the Watson Assistant dialog. If not defined, the value of `TTS_CACHE_TIME_TO_LIVE` in the Media Relay configuration is used.
`updateMethod`	Optional. Specifies the update strategy to use when setting the speech configuration. Possible values: `replace`, `replaceOnce`, `merge`, and `mergeOnce`. See Using `updateMethod`. Version 1.0.0.7 and later.

Note: The following parameters from the Text to Speech service can't be modified because they have fixed values that are used by the Media Relay.

accept
text
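
An action does not have to carry every property from Table 1. The following sketch reuses the values from the example above and sends only the buffering, timeout, and caching properties. It is an illustration rather than a recommended tuning, and it assumes that a partial set of properties is accepted in the same way as in the voice example that follows.

```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActSetTTSConfig",
      "parameters": {
        "jitterBufferDelay": 200,
        "connectionTimeout": 30,
        "requestTimeout": 10,
        "cacheTimeToLive": 336
      }
    }
  }
}
```

Here the audio buffer is 200 milliseconds, the connection and request timeouts are 30 and 10 seconds, and responses are cached for 336 hours, which is 14 days.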

Example: Setting the Text to Speech voice (en-US_LisaVoice)

In this example, the voice is set to en-US_LisaVoice. Because the credentials property isn't defined, the Media Relay uses the credentials that are defined in the Media Relay configuration (WATSON_TTS_URL, WATSON_TTS_USERNAME, and WATSON_TTS_PASSWORD).

{
  "output": {
    "vgwAction": {
      "command": "vgwActSetTTSConfig",
      "parameters": {
        "config": {
          "voice": "en-US_LisaVoice"
        }
      }
    }
  }
}

Using `updateMethod`

Use the updateMethod property in a dynamic configuration to control how configuration changes are applied: whether the new settings replace the existing configuration or are merged into it, and whether the change lasts for the rest of the call or for a single conversation turn.

Table 2. Available values for the updateMethod property
Value	Description
replace	Replaces the configuration for the duration of the call.
replaceOnce	Replaces the configuration once, so the configuration is used for only the following conversation turn. Then, it reverts to the previous configuration.
merge	Merges the configuration with the existing configuration for the duration of the call.
mergeOnce	Merges the configuration for one turn of the conversation, and then reverts to the previous configuration.
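
As an illustration of the replaceOnce value, the following sketch plays a single conversation turn with a different voice and then lets the previous configuration return. It assumes that updateMethod is set next to config inside parameters, as its listing among the JSON properties in Table 1 suggests. The voice name and the opt-out setting are reused from the first example on this page.

```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActSetTTSConfig",
      "parameters": {
        "updateMethod": "replaceOnce",
        "config": {
          "x-watson-learning-opt-out": true,
          "voice": "es-ES_LauraVoice"
        }
      }
    }
  }
}
```

Changing the value to replace would keep the new configuration for the duration of the call instead of one turn.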

Updating fields that are not root level

When you configure the service dynamically from Watson Assistant, only root-level fields, such as config or bargeInResume, are updated. Root-level fields that are omitted from the action keep their original settings. To change individual fields inside config while keeping the rest, set updateMethod to merge or mergeOnce, which merges the config fields with the existing configuration.
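
To illustrate, suppose that the active configuration is the one from the first example on this page, where config sets both x-watson-learning-opt-out and voice. Based on the description above, an action like the following sketch should change only the voice for the rest of the call and leave the opt-out setting in place. The placement of updateMethod inside parameters is an assumption that is drawn from Table 1.

```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActSetTTSConfig",
      "parameters": {
        "updateMethod": "merge",
        "config": {
          "voice": "en-US_LisaVoice"
        }
      }
    }
  }
}
```

By the same reading, sending this config with replace would swap out the whole config field, so any setting that should survive, such as x-watson-learning-opt-out, would need to be restated in the action.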

Deprecated: Configuring the Text to Speech service by defining state variables

In Version 1.0.0.2, configuring the Watson speech services by defining state variables was deprecated in favor of the action tags that are described in the previous sections.

Important: Although the state variables continue to function, you can't define these deprecated state variables and the action tags within the same node. Your Watson Assistant dialog can contain a mixture of action tags and deprecated state variables, but the JSON definition for each node can contain only one or the other.

{
	"context": {
		"vgwTTSConfigSettings": {
			"credentials": {
				"url": "https://api.us-south.text-to-speech.watson.cloud.ibm.com/instances/{instance_id}",
				"apikey": "{apikey}",
				"tokenServiceProviderUrl": "https://iam.cloud.ibm.com/identity/token"
			},
			"config": {
				"x-watson-learning-opt-out": true,
				"voice": "es-ES_LauraVoice"
			},
			"jitterBufferDelay": 200,
			"cacheTimeToLive": 336
		}
	}
}

Table 3. JSON properties for the Text to Speech service
JSON property	Description
`credentials`	Credentials for the Watson Text to Speech service. If not defined, the default credentials from the Media Relay configuration are used.
`config`	Parameters for the Watson Text to Speech service. See WebSockets API reference for Watson Text to Speech Service.
`jitterBufferDelay`	The amount of time in milliseconds to buffer before playing back audio from the Text to Speech service. This buffer accounts for any jitter in the streaming audio. If not defined, the value of `WATSON_TTS_JITTER_BUFFER_DELAY` in the Media Relay configuration is used.
`cacheTimeToLive`	The time in hours to cache responses from the Text to Speech service to improve playback response time. When enabled, all Text to Speech responses are cached unless they are excluded in the Watson Assistant dialog. If not defined, the value of `TTS_CACHE_TIME_TO_LIVE` in the Media Relay configuration is used. Version 1.0.0.1 and later.
