
copyright: years: 2019, 2023 lastupdated: "2023-04-18"


Monitoring

Monitoring and alerting via log messages

On every call, IBM® Voice Gateway logs information such as the end status of the call, reasons for failure, and other details. You can monitor these log messages and build alerts on them to proactively check the state of the application. We recommend monitoring call failures and call quality, as described below.

Call failures

You can determine the number of failed calls from the CWSGW0158E log message code and the number of calls received from the CWSGW0003I log message code. In a log aggregation system, such as Splunk or LogDNA, you can build a query that counts the instances of CWSGW0158E and divides that count by the number of CWSGW0003I instances. The result is the percentage of calls that failed within a time frame, which you can use to create an alert. A recommended alert threshold is at least 5% of calls failing within 15 minutes.
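The ratio described above can be sketched in Python for cases where the log lines are already available outside a log aggregation system. This is a minimal sketch, not a Splunk or LogDNA query. It assumes that each message code appears verbatim in the log line that reports it, and the function names are hypothetical.

```python
# Message codes taken from the text above.
CALL_FAILED = "CWSGW0158E"
CALL_RECEIVED = "CWSGW0003I"


def count_lines_with(code, lines):
    """Count the log lines that contain the given message code.

    Assumption: the code appears verbatim in the line that reports it.
    """
    return sum(1 for line in lines if code in line)


def failure_percentage(lines):
    """Return failed calls as a percentage of received calls.

    Returns None when no calls were received, because the ratio is undefined.
    """
    received = count_lines_with(CALL_RECEIVED, lines)
    if received == 0:
        return None
    failed = count_lines_with(CALL_FAILED, lines)
    return 100.0 * failed / received
```

Pass in only the lines that fall inside the time frame of interest, for example the last 15 minutes, and compare the result against the alert threshold.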

Call quality

You can determine the number of calls that had call quality issues from the CWSMR0134W log message code and the number of calls received from the CWSGW0003I log message code. In a log aggregation system, such as Splunk or LogDNA, you can build a query that counts the instances of CWSMR0134W and divides that count by the number of CWSGW0003I instances. The result is the percentage of calls that experienced a call quality issue within a time frame, which you can use to create an alert. A recommended alert threshold is at least 5% of calls affected within 15 minutes.
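The windowed threshold check can be illustrated as follows. This sketch assumes the log messages have already been parsed into (timestamp, message code) pairs. The log line layout is not described here, so that parsing step is left out, and the function name and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

# Message codes taken from the text above.
QUALITY_ISSUE = "CWSMR0134W"
CALL_RECEIVED = "CWSGW0003I"


def should_alert(events, now, issue_code=QUALITY_ISSUE,
                 window=timedelta(minutes=15), threshold_percent=5.0):
    """Return True when issue_code reaches the threshold within the window.

    events: iterable of (datetime, message_code) pairs, assumed pre-parsed.
    The window covers the interval (now - window, now].
    """
    start = now - window
    recent = [code for timestamp, code in events if start < timestamp <= now]
    received = recent.count(CALL_RECEIVED)
    if received == 0:
        # No calls received in the window, so there is nothing to alert on.
        return False
    issues = recent.count(issue_code)
    return 100.0 * issues / received >= threshold_percent
```

Passing issue_code="CWSGW0158E" applies the same check to call failures.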

For a full reference of system messages, see the Voice Gateway System Messages page.

Monitoring via Prometheus metrics

The Voice Gateway monitoring feature provides a REST API that administrators can use to retrieve metrics.

Formats

The metrics endpoint provides two output formats. The format that is used for each response depends on the HTTP Accept header of the corresponding request.

  • Prometheus text format - A representation of the metrics that is compatible with the Prometheus monitoring tool. This format is returned for requests that have a text/plain Accept header.
  • JSON format - A JSON representation of the metrics. This format is returned for requests that have an application/json Accept header.
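As an illustration of the format selection, the following Python sketch builds a GET request for the /metrics/application endpoint with the Accept header for either format. It uses only the standard library. The helper name, the host name, and the port in the usage note are placeholders and not documented values, and the sketch does not cover authentication.

```python
from urllib.request import Request

METRICS_PATH = "/metrics/application"

# Accept header values for the two output formats described above.
ACCEPT_HEADERS = {
    "prometheus": "text/plain",
    "json": "application/json",
}


def build_metrics_request(base_url, output_format):
    """Build a GET request for the metrics endpoint in the chosen format.

    base_url: scheme, host, and port of the Voice Gateway server
              (placeholder values in the usage note; use your own deployment).
    output_format: "prometheus" or "json".
    """
    accept = ACCEPT_HEADERS[output_format]
    url = base_url.rstrip("/") + METRICS_PATH
    return Request(url, headers={"Accept": accept}, method="GET")
```

To send the request, pass the result to urllib.request.urlopen, for example urlopen(build_metrics_request("http://voice-gateway.example:9080", "json")), where the host and port are hypothetical.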

REST endpoint

The following table describes the monitoring endpoint that returns the metrics.

Table 1. REST endpoint for the monitoring feature
Endpoints             Request type  Supported formats  Description
/metrics/application  GET           JSON, Prometheus   Returns Voice Gateway metrics.

Connecting to data stores and monitoring tools

You can connect the Voice Gateway metrics to tools and stacks that analyze and monitor metric information. By default, the /metrics/application endpoint returns data in a format that is compatible with Prometheus. To connect the Voice Gateway server to Prometheus, configure Prometheus to use the http://host:http_port/metrics/application or https://host:https_port/metrics/application endpoint. Other metrics collection tools that understand JSON can use the JSON format.

Prometheus format details

The Prometheus text format is based on the 0.0.4 exposition format described in the Prometheus documentation. Where available, metadata is provided for each metric. The # HELP line contains a description of the metric. Any tags present in the metadata are provided as Prometheus labels. For gauges and histograms, the metric's unit is appended to the end of the metric name.

A gauge is represented by a single value. The following example shows how a gauge named vg_max_conversation_latency, with seconds as the unit of measurement, would be displayed for the 123456789 tenant:

# TYPE application_vg_max_conversation_latency_seconds gauge
# HELP application_vg_max_conversation_latency_seconds Maximum conversation latency per monitoring interval
application_vg_max_conversation_latency_seconds{tenant_id="123456789"} 7.049

The following example shows the generated text format for the vg_max_calls gauge, which has no tenant label. Its unit, per_second, is appended to the metric name.

# TYPE application_vg_max_calls_per_second gauge
# HELP application_vg_max_calls_per_second Maximum calls per second per monitoring interval
application_vg_max_calls_per_second 1
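For readers who want to consume this output without a Prometheus server, the sample lines above can be parsed as sketched below. This is a simplified reader written for the examples in this section, not a full implementation of the exposition format: it skips comment lines and does not handle escaped quotes or closing braces inside label values. The function name is hypothetical.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_PATTERN = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
# One label pair such as tenant_id="123456789" (no escaped quotes supported).
LABEL_PATTERN = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and # TYPE / # HELP metadata.
        match = SAMPLE_PATTERN.match(line)
        if match is None:
            continue  # Skip lines this simplified pattern does not cover.
        labels = dict(LABEL_PATTERN.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```

Applied to the two examples above, the function yields one sample with a tenant_id label and one sample with an empty label dictionary.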

JSON format details

The JSON format returns data that is formatted in a tree. Each metric is referenced by the name and the value.

{
  "vg_max_tts_latency_seconds{tenant_id=\"123456789\"}": 0.528,
  "vg_max_conversation_latency_seconds{tenant_id=\"123456789\"}": 7.049,
  "vg_max_stt_latency_seconds{tenant_id=\"123456789\"}": 0,
  "vg_max_calls_per_second": 1,
  "vg_max_concurrent_calls{tenant_id=\"123456789\"}": 1
}
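Because the tenant label is embedded in each JSON key, a consumer typically needs to split the key into a metric name and its labels. The following sketch does this for a payload shaped like the example above and groups the values by tenant. The function name is hypothetical, and the key pattern is an assumption based only on the example shown here.

```python
import json
import re

# A key is a metric name, optionally followed by {label="value",...}.
KEY_PATTERN = re.compile(r'^(?P<name>[^{]+)(?:\{(?P<labels>.*)\})?$')
LABEL_PATTERN = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def group_by_tenant(payload):
    """Split a JSON metrics payload into general and per-tenant metrics.

    Returns (general, per_tenant), where general maps metric names to values
    and per_tenant maps each tenant_id to its own name-to-value dictionary.
    """
    metrics = json.loads(payload)
    general, per_tenant = {}, {}
    for key, value in metrics.items():
        match = KEY_PATTERN.match(key)
        if match is None:
            continue  # Skip keys that do not fit the assumed pattern.
        name = match.group("name")
        labels = dict(LABEL_PATTERN.findall(match.group("labels") or ""))
        tenant = labels.get("tenant_id")
        if tenant is None:
            general[name] = value
        else:
            per_tenant.setdefault(tenant, {})[name] = value
    return general, per_tenant
```

For the example payload, vg_max_calls_per_second lands in the general dictionary and the remaining four metrics are grouped under tenant 123456789.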

Metrics

Table 2. General metrics
Key Value
vg_max_calls Maximum calls per second per monitoring interval.

Tenant-specific metrics

Table 3. Tenant-specific metrics
Key Value
vg_max_concurrent_calls Maximum concurrent calls per monitoring interval.
vg_max_conversation_latency Maximum Watson Assistant service latency per monitoring interval, in seconds.
vg_max_tts_latency Maximum Text to Speech Service latency per monitoring interval, in seconds.
vg_max_stt_latency Maximum Speech to Text Service latency per monitoring interval, in seconds.

Configuration

Use the following environment variables to configure the monitoring feature.

Table 4. General deployment configuration environment variables for the monitoring REST API
Key Value
METRICS_SAMPLING_INTERVAL See Configuration environment variables for Voice Gateway.
ENABLE_METRICS_AUTH See Configuration environment variables for Voice Gateway.
HTTP_HOST See Configuration environment variables for Voice Gateway.
ADMIN_USERNAME See Configuration environment variables for Voice Gateway.
ADMIN_PASSWORD See Configuration environment variables for Voice Gateway.