---
copyright:
  years: 2018, 2020
lastupdated: "2020-06-04"
---
# Configuring MRCPv2 speech recognizer services
As an alternative to IBM® Speech to Text, you can configure your Voice Gateway deployment to connect with a third-party speech recognizer service by using an MRCPv2 connection.
## Connecting to an MRCPv2 recognizer
1. Clone or download the `sample.voice.gateway` repository on GitHub.
1. Go to the directory where you cloned the `sample.voice.gateway` repository, and open the `mrcp/` directory, which contains the following files:
   - `docker-compose.yml`: Basic configuration of Voice Gateway with MRCPv2
   - `tenantConfiguration.json`: JSON configuration file
1. Open the `unimrcpConfig/unimrcpclient.xml` configuration file. In the `<server-ip>` field, specify the IP address of the MRCPv2 server. In the `<ext-ip>` field, specify the external IP address of the machine where the Media Relay container is running. For a full list of fields that can be configured, see the UniMRCP configuration file.
1. In the `docker-compose.yml` file, mount the `unimrcpclient.xml` file to the Media Relay container.
1. In the `tenantConfiguration.json` file, specify an MRCPv2 provider for speech recognition by setting the `providerType` parameter of your `stt` configuration to `mrcpv2`. You can include more configuration fields, such as recognizer grammars and MRCPv2 message header fields, to further customize your deployment. See the MRCPv2 recognizer configuration and programming model.

   ```json
   "stt": {
     ...
     "providerType": "mrcpv2"
     ...
   }
   ```
   {: codeblock}

   **Remember**: If you don't specify a `providerType`, Voice Gateway uses the `watson` provider by default.
1. To get started, import the `mrcp/sample-mrcp-conversation.json` file from your cloned `sample.voice.gateway` GitHub repository. After the import, specify your Watson Assistant credentials and workspace ID in the `tenantConfiguration.json` file. To learn more about importing JSON files, see [Creating a dialog skill](https://cloud.ibm.com/docs/assistant-data?topic=assistant-data-skill-dialog-add).
1. After your configuration is complete, create and start the containers by running the following command.

   ```
   docker-compose up
   ```
   {: codeblock}
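Before you start the containers, it can help to confirm that the tenant configuration actually selects the MRCPv2 provider. The following sketch is illustrative only; the `check_stt_provider` helper is hypothetical and simply reads the `stt.providerType` field described in the step above.

```python
import json

def check_stt_provider(path: str) -> str:
    """Return the STT provider type from a Voice Gateway tenant configuration file."""
    with open(path) as f:
        config = json.load(f)
    # Voice Gateway falls back to the "watson" provider when providerType is absent.
    return config.get("stt", {}).get("providerType", "watson")
```

For example, after the edit above, `check_stt_provider("mrcp/tenantConfiguration.json")` should return `"mrcpv2"`.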
### Example: Single provider configuration that uses MRCPv2

In the following example, a single provider is configured to use an MRCPv2 connection with a speech recognizer as an alternative to using a Speech to Text service instance. The configuration properties are formatted similarly for single and multiple provider configurations, but they use different root-level properties. For example, the single provider format requires only `providerType`, while multiple provider configurations include `providerSelectionPolicy` and `providers` at the root level.

```json
{
  "stt": {
    "providerType": "mrcpv2",
    "config": {
      "mrcpv2ProfileID": "MRCP #1",
      "recognizeHeaders": {
        "Speech-Language": "en-US",
        "Confidence-Threshold": 0.9
      },
      "recognizeBody": {
        "contentType": "text/uri",
        "body": "builtin:grammar/digits?language=en-us\nbuiltin:grammar/boolean?language=en-us"
      }
    },
    "bargeInResume": true
  }
}
```
{: codeblock}
### Example: Multiple provider configuration that uses MRCPv2 {: #MRCP_recognizer_multiple}
In the following example, one provider is shown in a multiple provider formatted JSON configuration file. Unlike the single provider configuration, the multiple provider configuration has `providerSelectionPolicy` and `providers` at the root level.
```json
{
  "stt": {
    "providerSelectionPolicy": "sequential",
    "providers": [
      {
        "name": "mrcp-primary",
        "providerType": "mrcpv2",
        "config": {
          "mrcpv2ProfileID": "MRCP #1",
          "recognizeHeaders": {
            "Speech-Language": "en-US",
            "Confidence-Threshold": 0.9
          },
          "recognizeBody": {
            "contentType": "text/uri",
            "body": "builtin:grammar/digits?language=en-us\nbuiltin:grammar/boolean?language=en-us"
          }
        },
        "bargeInResume": true
      }
    ]
  }
}
```
{: codeblock}
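The structural difference between the two formats can be summarized programmatically. This is an illustrative sketch, not part of Voice Gateway; the `stt_provider_types` helper is hypothetical and only inspects the root-level properties discussed above.

```python
def stt_provider_types(stt_config: dict) -> list:
    """Return the provider types from an "stt" configuration object.

    The single-provider format has "providerType" at the root of "stt";
    the multiple-provider format nests each provider under "providers",
    alongside "providerSelectionPolicy".
    """
    if "providers" in stt_config:
        return [p.get("providerType", "watson") for p in stt_config["providers"]]
    # Single-provider format; Voice Gateway defaults to "watson" if unset.
    return [stt_config.get("providerType", "watson")]
```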
## Programming with the MRCPv2 recognizer configuration and grammars
By using the configuration variables and parameters in your Voice Gateway configuration, you can fully control the headers and the body of the MRCPv2 RECOGNIZE request.
### Changes to Speech to Text configuration variables for MRCPv2
The top-level Speech to Text parameters in the Voice Gateway configuration have equivalent values when you configure an MRCPv2 speech recognizer.
| Parameters | Value | Description |
|------------|-------|-------------|
| `providerType` | String | Defines the type of the speech provider, `mrcpv2` or `watson`. Defaults to `watson`. |
| `credentials` | Credentials | Required for Speech to Text when you have mixed providers. Not required for MRCPv2 recognizer services. |
| `config` | `WatsonSpeechToTextConfig` / `MrcpRecognizerConfig` | Required. Defines the configuration for the specified speech to text provider. |
| `connectionTimeout` | Float | Optional. Time in seconds that Voice Gateway waits to establish a socket connection with the Speech to Text or recognizer service. If the time is exceeded, Voice Gateway reattempts to connect with the service. If the service still can't be reached, the call fails. Version 1.0.0.5 and later. |
| `requestTimeout` | Float | Optional. Time in seconds that Voice Gateway waits to establish a speech recognition session with the Speech to Text or recognizer service. If the time is exceeded, Voice Gateway reattempts to connect with the service. If the service still can't be reached, the call fails. Version 1.0.0.5 and later. |
| `providers[]` | String | Optional. A list of speech providers. |
### MRCPv2 recognizer configuration parameters

There are also configuration parameters that are specific to the `mrcpv2` provider configuration.
| Parameters | Value | Description |
|------------|-------|-------------|
| `mrcpv2ProfileID` | String | Profile ID for the MRCPv2 client. The original configuration document in XML can be found in the MRCPv2 client configuration manual. |
| `recognizeHeaders` | JSON object | Collection of name and value pairs that are used as headers for the MRCPv2 RECOGNIZE request. |
| `recognizeBody` | JSON object | Specifies the content body of the recognition request. |
### Grammars
Programming an MRCPv2 recognizer usually involves grammars. The MRCPv2 speech recognizer uses grammars to specify the words and patterns that it listens for. You can use inline grammars, external grammars, and built-in grammars to define what the recognizer listens for.
#### Inline grammars
Inline grammars are explicitly defined in the Voice Gateway action tag, `vgwActSetSTTConfig`. These grammars are suited for short responses from the caller. For example, from a Watson Assistant dialog, you can specify an inline grammar through the `vgwActSetSTTConfig` action tag.

The following example shows how to specify the grammar in XML format instead of in JSON format. The value for the `body` parameter is a single line, and the quote character (`"`) is escaped to satisfy the JSON format of the configuration. Also, in this example, Voice Gateway is instructed to use the specified configuration for one conversational turn by specifying `mergeOnce`.
```json
{
  "vgwAction": {
    "command": "vgwActSetSTTConfig",
    "parameters": {
      "updateMethod": "mergeOnce",
      "config": {
        "recognizeHeaders": {
          "No-Input-Timeout": 10000
        },
        "recognizeBody": {
          "contentType": "application/srgs+xml",
          "body": "<grammar version=\"1.0\" xml:lang=\"en-US\" root=\"confirmations\"> <rule id=\"confirmations\"> <one-of> <item> affirmative </item> <item> nah </item> <item> no </item> <item> nope </item> <item> yeah </item> <item> yep </item> <item> yes </item> <item> yup </item> <item> fine </item> <item> negative </item> <item> OK </item> <item> sure </item> </one-of> </rule> </grammar>"
        }
      }
    }
  }
}
```
{: codeblock}
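Escaping a multi-line SRGS grammar into the single-line JSON `body` value by hand is error-prone; a JSON serializer can do the quote escaping for you. The following sketch is a hypothetical illustration, and the `inline_grammar_action` helper is not part of Voice Gateway.

```python
import json

# A readable SRGS grammar; its whitespace is collapsed so the JSON "body"
# value becomes a single line, as the configuration format requires.
GRAMMAR_XML = """<grammar version="1.0" xml:lang="en-US" root="confirmations">
  <rule id="confirmations">
    <one-of> <item> yes </item> <item> no </item> </one-of>
  </rule>
</grammar>"""

def inline_grammar_action(grammar_xml: str) -> str:
    """Build a vgwActSetSTTConfig action with an inline SRGS grammar."""
    body = " ".join(grammar_xml.split())  # collapse to a single line
    action = {
        "vgwAction": {
            "command": "vgwActSetSTTConfig",
            "parameters": {
                "updateMethod": "mergeOnce",
                "config": {
                    "recognizeBody": {
                        "contentType": "application/srgs+xml",
                        "body": body,  # json.dumps escapes the quotes
                    }
                },
            },
        }
    }
    return json.dumps(action, indent=2)
```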
#### External grammars
External grammars are specified in external files and referenced by URI. You can include external grammars as a list in the Watson Assistant dialog.
In the following example, the `recognizeBody` specifies the content type as `text/uri-list`. This content type instructs the MRCPv2 recognizer to fetch the grammars from the list of URIs that are defined in the `body` field.
```json
{
  "vgwAction": {
    "command": "vgwActSetSTTConfig",
    "parameters": {
      "updateMethod": "mergeOnce",
      "config": {
        "recognizeBody": {
          "body": ["http://example-grammar-server.com/grammars/basic_grammar.xml", "file://grammars/example_file_system_grammar.xml"],
          "contentType": "text/uri-list"
        }
      }
    }
  }
}
```
{: codeblock}
#### Built-in grammars

Some MRCPv2 recognizers support built-in grammars, which typically cover common user inputs, such as confirmations or digits. To use built-in grammars, specify them as though they are external grammars. For example, you can specify that the MRCPv2 recognizer use its built-in grammar for digits.

In the following example, the specified grammar, `builtin:grammar/digits?language=en-US;length=9`, indicates that the caller inputs digits and that the length of the spoken digits must be 9.
```json
{
  "vgwAction": {
    "command": "vgwActSetSTTConfig",
    "parameters": {
      "updateMethod": "mergeOnce",
      "config": {
        "recognizeBody": {
          "body": ["builtin:grammar/digits?language=en-US;length=9"],
          "contentType": "text/uri-list"
        }
      }
    }
  }
}
```
{: codeblock}
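The built-in grammar URI format shown above (a grammar name, a `language` query, and optional semicolon-separated parameters) can be composed with a small helper. This is a hypothetical illustration; `builtin_grammar_uri` is not part of Voice Gateway.

```python
def builtin_grammar_uri(name: str, language: str, **params) -> str:
    """Compose a built-in grammar URI such as
    builtin:grammar/digits?language=en-US;length=9.

    Extra keyword arguments (for example length=9) are appended as
    semicolon-separated key=value pairs, matching the example above.
    """
    uri = f"builtin:grammar/{name}?language={language}"
    for key, value in params.items():
        uri += f";{key}={value}"
    return uri
```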
## Handling recognizer results
If recognition completes successfully, the result with the highest confidence is used as the input for the Watson Assistant conversation turn. Additionally, the Voice Gateway state variable `vgwSTTResponse` is set to the resulting JSON, which includes the list of other alternatives as well as the original response from the server. The following example shows the original response of the MRCPv2 recognizer set in the `mrcpv2RecognizerResult` field of the `vgwSTTResponse` context state variable.
```json
{
  "vgwSTTResponse": {
    "warnings": "",
    "error": "",
    "mrcpv2RecognizerResult": "<?xml version='1.0'?><result><interpretation grammar=\"https:\/\/raw.githubusercontent.com\/WASdev\/sample.voice.gateway\/54e662f18e67c6935e8c8b815b89b57014a37494\/mrcp\/grammar-samples\/store-hours_query.xml\" confidence=\"0.76\"><input mode=\"speech\">Are you open on Sunday<\/input><instance><SWI_literal>Are you open on Sunday<\/SWI_literal><SWI_grammarName>https:\/\/raw.githubusercontent.com\/WASdev\/sample.voice.gateway\/54e662f18e67c6935e8c8b815b89b57014a37494\/mrcp\/grammar-samples\/store-hours_query.xml<\/SWI_grammarName><SWI_meaning>{SWI_literal:Are you open on Sunday}<\/SWI_meaning><\/instance><\/interpretation><interpretation grammar=\"https:\/\/raw.githubusercontent.com\/WASdev\/sample.voice.gateway\/54e662f18e67c6935e8c8b815b89b57014a37494\/mrcp\/grammar-samples\/store-hours_query.xml\" confidence=\"0.29\"><input mode=\"speech\">Are you open on Monday<\/input><instance><SWI_literal>Are you open on Monday<\/SWI_literal><SWI_grammarName>https:\/\/raw.githubusercontent.com\/WASdev\/sample.voice.gateway\/54e662f18e67c6935e8c8b815b89b57014a37494\/mrcp\/grammar-samples\/store-hours_query.xml<\/SWI_grammarName><SWI_meaning>{SWI_literal:Are you open on Monday}<\/SWI_meaning><\/instance><\/interpretation><\/result>",
    "results": [{
      "final": true,
      "alternatives": [{
        "transcript": "Are you open on Sunday",
        "confidence": "0.76"
      }, {
        "transcript": "Are you open on Monday",
        "confidence": "0.29"
      }]
    }]
  }
}
```
{: codeblock}
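A service orchestration layer that inspects `vgwSTTResponse` can pick the highest-confidence alternative in the same way. This sketch is illustrative; the `best_transcript` helper is hypothetical and assumes the response shape shown above, where confidence values are strings.

```python
def best_transcript(vgw_stt_response):
    """Return (transcript, confidence) for the highest-confidence alternative,
    or None when the response contains no alternatives."""
    alternatives = [
        alt
        for result in vgw_stt_response.get("results", [])
        for alt in result.get("alternatives", [])
    ]
    if not alternatives:
        return None
    # Confidence values arrive as strings in the response, so convert to float.
    best = max(alternatives, key=lambda alt: float(alt["confidence"]))
    return best["transcript"], float(best["confidence"])
```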
If recognition is unsuccessful, Voice Gateway sends the keyword `vgwMrcpRecognitionUnsuccessful` as the user input to inform Watson Assistant that recognition was not successful. In this case, the state variable `vgwMrcpRecognizerResponse` is set to the response of the MRCPv2 recognizer.
```json
{
  "context": {
    "vgwMrcpRecognizerResponse": {
      "headers": {
        "Completion-Cause": "001 no-input-timeout"
      },
      "body": "<body of the recognition result>"
    }
  }
}
```
{: codeblock}
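From the dialog side, you can split the `Completion-Cause` header into its numeric code and name for routing decisions. This is a hypothetical sketch (`recognition_failure_cause` is not part of Voice Gateway) that assumes the context shape shown above.

```python
def recognition_failure_cause(context):
    """Extract the Completion-Cause from an unsuccessful recognition response.

    Returns a (code, name) tuple such as ("001", "no-input-timeout"), or
    None when no MRCPv2 recognizer response is present in the context.
    """
    response = context.get("vgwMrcpRecognizerResponse")
    if not response:
        return None
    cause = response.get("headers", {}).get("Completion-Cause", "")
    code, _, name = cause.partition(" ")
    return code, name
```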
The conversation developer can check the response of the MRCPv2 recognizer in `vgwMrcpRecognizerResponse` for insight into what happened to the recognition request. For a full list of recognition completion causes, see the MRCPv2 specification (RFC 6787).