---
copyright:
  years: 2018, 2020
lastupdated: "2020-06-04"
---
# Configuring MRCPv2 speech recognizer services
As an alternative to IBM® Speech to Text, you can configure your Voice Gateway deployment to connect with a third-party speech recognizer service by using an MRCPv2 connection.
## Connecting to an MRCPv2 recognizer
1. Clone or download the `sample.voice.gateway` repository on GitHub.
1. Go to the directory where you cloned the `sample.voice.gateway` repository, and open the `mrcp/` directory, which contains the following files:
   - `docker-compose.yml`: Basic configuration of Voice Gateway with MRCPv2
   - `tenantConfiguration.json`: JSON configuration file
1. Open the `unimrcpConfig/unimrcpclient.xml` configuration file. In the `<server-ip>` field, specify the IP address of the MRCPv2 server. In the `<ext-ip>` field, specify the external IP address of the machine where the Media Relay container is running. For a full list of fields that can be configured, see the UniMRCP configuration file.
1. In the `docker-compose.yml` file, mount the `unimrcpclient.xml` file to the Media Relay container.
1. In the `tenantConfiguration.json` file, specify an MRCPv2 provider for speech recognition by setting the `providerType` parameter of your `stt` configuration to `mrcpv2`. You can include more configuration fields, such as recognizer grammars and MRCPv2 message header fields, to further customize your deployment. See the MRCPv2 recognizer configuration and programming model.

   ```json
   "stt": {
     ...
     "providerType": "mrcpv2"
     ...
   }
   ```
   {: codeblock}

   **Remember**: If you don't specify a `providerType`, Voice Gateway uses the `watson` provider by default.
1. To get started, import the `mrcp/sample-mrcp-conversation.json` file from your cloned `sample.voice.gateway` GitHub repository. After the import, specify your Watson Assistant credentials and workspace ID in the `tenantConfiguration.json` file. To learn more about importing JSON files, see [Creating a dialog skill](https://cloud.ibm.com/docs/assistant-data?topic=assistant-data-skill-dialog-add).
1. After your configuration is complete, create and start the containers by running the following command.

   ```
   docker-compose up
   ```
   {: codeblock}
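Before you start the containers, it can help to confirm that the tenant configuration actually selects the MRCPv2 provider. The following sketch is illustrative only; the `check_stt_provider` helper is hypothetical and simply reads the `stt.providerType` field described in the step above.

```python
import json

def check_stt_provider(path: str) -> str:
    """Return the STT provider type from a Voice Gateway tenant configuration file."""
    with open(path) as f:
        config = json.load(f)
    # Voice Gateway falls back to the "watson" provider when providerType is absent.
    return config.get("stt", {}).get("providerType", "watson")
```

For example, after the edit above, `check_stt_provider("mrcp/tenantConfiguration.json")` should return `"mrcpv2"`.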
### Example: Single provider configuration that uses MRCPv2

In the following example, a single provider is configured to use an MRCPv2 connection with a speech recognizer as an alternative to using a Speech to Text service instance. The configuration properties are formatted similarly for single and multiple provider configurations, but they use different root-level properties. For example, the single provider format requires only `providerType`, while multiple provider configurations include `providerSelectionPolicy` and `providers` at the root level.

```json
{
  "stt": {
    "providerType": "mrcpv2",
    "config": {
      "mrcpv2ProfileID": "MRCP #1",
      "recognizeHeaders": {
        "Speech-Language": "en-US",
        "Confidence-Threshold": 0.9
      },
      "recognizeBody": {
        "contentType": "text/uri",
        "body": "builtin:grammar/digits?language=en-us\nbuiltin:grammar/boolean?language=en-us"
      }
    },
    "bargeInResume": true
  }
}
```
{: codeblock}
### Example: Multiple provider configuration that uses MRCPv2 {: #MRCP_recognizer_multiple}
In the following example, one provider is shown in a multiple provider formatted JSON configuration file. Unlike the single provider configuration, the multiple provider configuration has `providerSelectionPolicy` and `providers` at the root level.
```json
{
  "stt": {
    "providerSelectionPolicy": "sequential",
    "providers": [
      {
        "name": "mrcp-primary",
        "providerType": "mrcpv2",
        "config": {
          "mrcpv2ProfileID": "MRCP #1",
          "recognizeHeaders": {
            "Speech-Language": "en-US",
            "Confidence-Threshold": 0.9
          },
          "recognizeBody": {
            "contentType": "text/uri",
            "body": "builtin:grammar/digits?language=en-us\nbuiltin:grammar/boolean?language=en-us"
          }
        },
        "bargeInResume": true
      }
    ]
  }
}
```
{: codeblock}
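The structural difference between the two formats can be summarized programmatically. This is an illustrative sketch, not part of Voice Gateway; the `stt_provider_types` helper is hypothetical and only inspects the root-level properties discussed above.

```python
def stt_provider_types(stt_config: dict) -> list:
    """Return the provider types from an "stt" configuration object.

    The single-provider format has "providerType" at the root of "stt";
    the multiple-provider format nests each provider under "providers",
    alongside "providerSelectionPolicy".
    """
    if "providers" in stt_config:
        return [p.get("providerType", "watson") for p in stt_config["providers"]]
    # Single-provider format; Voice Gateway defaults to "watson" if unset.
    return [stt_config.get("providerType", "watson")]
```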
## Programming with the MRCPv2 recognizer configuration and grammars
By using the configuration variables and parameters in your Voice Gateway configuration, you can fully control the headers and the body of the MRCPv2 RECOGNIZE request.
### Changes to Speech to Text configuration variables for MRCPv2
The top-level Speech to Text parameters in the Voice Gateway configuration have equivalent values when you configure an MRCPv2 speech recognizer.
| Parameters | Value | Description |
|------------|-------|-------------|
| `providerType` | String | Defines the type of the speech provider, `mrcpv2` or `watson`. Defaults to `watson`. |
| `credentials` | Credentials | Required for Speech to Text when you have mixed providers. Not required for MRCPv2 recognizer services. |
| `config` | `WatsonSpeechToTextConfig` / `MrcpRecognizerConfig` | Required. Defines the configuration for the specified speech to text provider. |
| `connectionTimeout` | Float | Optional. Time in seconds that Voice Gateway waits to establish a socket connection with the Speech to Text or recognizer service. If the time is exceeded, Voice Gateway reattempts to connect with the service. If the service still can't be reached, the call fails. Version 1.0.0.5 and later. |
| `requestTimeout` | Float | Optional. Time in seconds that Voice Gateway waits to establish a speech recognition session with the Speech to Text or recognizer service. If the time is exceeded, Voice Gateway reattempts to connect with the service. If the service still can't be reached, the call fails. Version 1.0.0.5 and later. |
| `providers[]` | String | Optional. A list of speech providers. |
### MRCPv2 recognizer configuration parameters

There are also configuration parameters that are specific to the `mrcpv2` provider configuration.
| Parameters | Value | Description |
|------------|-------|-------------|
| `mrcpv2ProfileID` | String | Profile ID for the MRCPv2 client. The original configuration document in XML can be found in the MRCPv2 client configuration manual. |
| `recognizeHeaders` | JSON object | Collection of name and value pairs that are used as headers for the MRCPv2 RECOGNIZE request. |
| `recognizeBody` | JSON object | Specifies the content body of the recognition request. |
### Grammars
Programming an MRCPv2 recognizer usually involves grammars. The MRCPv2 speech recognizer uses grammars to specify the words and patterns that it listens for. You can use inline grammars, external grammars, and built-in grammars to define what the recognizer listens for.
#### Inline grammars
Inline grammars are explicitly defined in the Voice Gateway action tag, `vgwActSetSTTConfig`. These grammars are suited for short responses from the caller. For example, from a Watson Assistant dialog, you can specify an inline grammar through the `vgwActSetSTTConfig` action tag.

The following example shows how to specify the grammar in XML format instead of in JSON format. The value for the `body` parameter is a single line, and the quote character (`"`) is escaped to satisfy the JSON format of the configuration. Also, in this example, Voice Gateway is instructed to use the specified configuration for one conversational turn by specifying `mergeOnce`.
```json
{
  "vgwAction": {
    "command": "vgwActSetSTTConfig",
    "parameters": {
      "updateMethod": "mergeOnce",
      "config": {
        "recognizeHeaders": {
          "No-Input-Timeout": 10000
        },
        "recognizeBody": {
          "contentType": "application/srgs+xml",
          "body": "<grammar version=\"1.0\" xml:lang=\"en-US\" root=\"confirmations\"> <rule id=\"confirmations\"> <one-of> <item> affirmative </item> <item> nah </item> <item> no </item> <item> nope </item> <item> yeah </item> <item> yep </item> <item> yes </item> <item> yup </item> <item> fine </item> <item> negative </item> <item> OK </item> <item> sure </item> </one-of> </rule> </grammar>"
        }
      }
    }
  }
}
```
{: codeblock}
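Escaping a multi-line SRGS grammar into the single-line JSON `body` value by hand is error-prone; a JSON serializer can do the quote escaping for you. The following sketch is a hypothetical illustration, and the `inline_grammar_action` helper is not part of Voice Gateway.

```python
import json

# A readable SRGS grammar; its whitespace is collapsed so the JSON "body"
# value becomes a single line, as the configuration format requires.
GRAMMAR_XML = """<grammar version="1.0" xml:lang="en-US" root="confirmations">
  <rule id="confirmations">
    <one-of> <item> yes </item> <item> no </item> </one-of>
  </rule>
</grammar>"""

def inline_grammar_action(grammar_xml: str) -> str:
    """Build a vgwActSetSTTConfig action with an inline SRGS grammar."""
    body = " ".join(grammar_xml.split())  # collapse to a single line
    action = {
        "vgwAction": {
            "command": "vgwActSetSTTConfig",
            "parameters": {
                "updateMethod": "mergeOnce",
                "config": {
                    "recognizeBody": {
                        "contentType": "application/srgs+xml",
                        "body": body,  # json.dumps escapes the quotes
                    }
                },
            },
        }
    }
    return json.dumps(action, indent=2)
```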
#### External grammars
External grammars are specified in external files and referenced by URI. You can include external grammars as a list in the Watson Assistant dialog.
In the following example, the `recognizeBody` specifies the content type as `text/uri-list`. This content type instructs the MRCPv2 recognizer to fetch the grammars from the list of URIs that are defined in the `body` field.
```json
{
  "vgwAction": {
    "command": "vgwActSetSTTConfig",
    "parameters": {
      "updateMethod": "mergeOnce",
      "config": {
        "recognizeBody": {
          "body": ["http://example-grammar-server.com/grammars/basic_grammar.xml", "file://grammars/example_file_system_grammar.xml"],
          "contentType": "text/uri-list"
        }
      }
    }
  }
}
```
{: codeblock}
#### Built-in grammars

Some MRCPv2 recognizers support built-in grammars, which typically cover common user inputs, such as confirmations or digits. To use built-in grammars, specify them as though they are external grammars. For example, you can specify that the MRCPv2 recognizer use its built-in grammar for digits.

In the following example, the specified grammar, `builtin:grammar/digits?language=en-US;length=9`, indicates that the caller inputs digits and that the length of the spoken digits must be 9.
```json
{
  "vgwAction": {
    "command": "vgwActSetSTTConfig",
    "parameters": {
      "updateMethod": "mergeOnce",
      "config": {
        "recognizeBody": {
          "body": ["builtin:grammar/digits?language=en-US;length=9"],
          "contentType": "text/uri-list"
        }
      }
    }
  }
}
```
{: codeblock}
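The built-in grammar URI format shown above (a grammar name, a `language` query, and optional semicolon-separated parameters) can be composed with a small helper. This is a hypothetical illustration; `builtin_grammar_uri` is not part of Voice Gateway.

```python
def builtin_grammar_uri(name: str, language: str, **params) -> str:
    """Compose a built-in grammar URI such as
    builtin:grammar/digits?language=en-US;length=9.

    Extra keyword arguments (for example length=9) are appended as
    semicolon-separated key=value pairs, matching the example above.
    """
    uri = f"builtin:grammar/{name}?language={language}"
    for key, value in params.items():
        uri += f";{key}={value}"
    return uri
```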
## Handling recognizer results
If recognition completes successfully, the result with the highest confidence is used as the input for the Watson Assistant conversation turn. Additionally, the Voice Gateway state variable `vgwSTTResponse` is set to the resulting JSON, which includes the list of other alternatives as well as the original response from the server. The following example shows the original response of the MRCPv2 recognizer set in the `mrcpv2RecognizerResult` field of the `vgwSTTResponse` context state variable.
```json
{
  "vgwSTTResponse": {
    "warnings": "",
    "error": "",
    "mrcpv2RecognizerResult": "<?xml version='1.0'?><result><interpretation grammar=\"https:\/\/raw.githubusercontent.com\/WASdev\/sample.voice.gateway\/54e662f18e67c6935e8c8b815b89b57014a37494\/mrcp\/grammar-samples\/store-hours_query.xml\" confidence=\"0.76\"><input mode=\"speech\">Are you open on Sunday<\/input><instance><SWI_literal>Are you open on Sunday<\/SWI_literal><SWI_grammarName>https:\/\/raw.githubusercontent.com\/WASdev\/sample.voice.gateway\/54e662f18e67c6935e8c8b815b89b57014a37494\/mrcp\/grammar-samples\/store-hours_query.xml<\/SWI_grammarName><SWI_meaning>{SWI_literal:Are you open on Sunday}<\/SWI_meaning><\/instance><\/interpretation><interpretation grammar=\"https:\/\/raw.githubusercontent.com\/WASdev\/sample.voice.gateway\/54e662f18e67c6935e8c8b815b89b57014a37494\/mrcp\/grammar-samples\/store-hours_query.xml\" confidence=\"0.29\"><input mode=\"speech\">Are you open on Monday<\/input><instance><SWI_literal>Are you open on Monday<\/SWI_literal><SWI_grammarName>https:\/\/raw.githubusercontent.com\/WASdev\/sample.voice.gateway\/54e662f18e67c6935e8c8b815b89b57014a37494\/mrcp\/grammar-samples\/store-hours_query.xml<\/SWI_grammarName><SWI_meaning>{SWI_literal:Are you open on Monday}<\/SWI_meaning><\/instance><\/interpretation><\/result>",
    "results": [{
      "final": true,
      "alternatives": [{
        "transcript": "Are you open on Sunday",
        "confidence": "0.76"
      }, {
        "transcript": "Are you open on Monday",
        "confidence": "0.29"
      }]
    }]
  }
}
```
{: codeblock}
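A service orchestration layer that inspects `vgwSTTResponse` can pick the highest-confidence alternative in the same way. This sketch is illustrative; the `best_transcript` helper is hypothetical and assumes the response shape shown above, where confidence values are strings.

```python
def best_transcript(vgw_stt_response):
    """Return (transcript, confidence) for the highest-confidence alternative,
    or None when the response contains no alternatives."""
    alternatives = [
        alt
        for result in vgw_stt_response.get("results", [])
        for alt in result.get("alternatives", [])
    ]
    if not alternatives:
        return None
    # Confidence values arrive as strings in the response, so convert to float.
    best = max(alternatives, key=lambda alt: float(alt["confidence"]))
    return best["transcript"], float(best["confidence"])
```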
If recognition is unsuccessful, Voice Gateway sends the keyword `vgwMrcpRecognitionUnsuccessful` as the user input to inform Watson Assistant that recognition was not successful. In this case, the state variable `vgwMrcpRecognizerResponse` is set to the response of the MRCPv2 recognizer.
```json
{
  "context": {
    "vgwMrcpRecognizerResponse": {
      "headers": {
        "Completion-Cause": "001 no-input-timeout"
      },
      "body": "<body of the recognition result>"
    }
  }
}
```
{: codeblock}
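From the dialog side, you can split the `Completion-Cause` header into its numeric code and name for routing decisions. This is a hypothetical sketch (`recognition_failure_cause` is not part of Voice Gateway) that assumes the context shape shown above.

```python
def recognition_failure_cause(context):
    """Extract the Completion-Cause from an unsuccessful recognition response.

    Returns a (code, name) tuple such as ("001", "no-input-timeout"), or
    None when no MRCPv2 recognizer response is present in the context.
    """
    response = context.get("vgwMrcpRecognizerResponse")
    if not response:
        return None
    cause = response.get("headers", {}).get("Completion-Cause", "")
    code, _, name = cause.partition(" ")
    return code, name
```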
The conversation developer can check the response of the MRCPv2 recognizer in `vgwMrcpRecognizerResponse` for insight into what happened to the recognition request. For a full list of recognition completion causes, see the MRCPv2 specification (RFC 6787).