TTS troubleshooting tips

This topic contains troubleshooting tips for text-to-speech (TTS).

The TTS engine(s) will not start (Linux platforms)

Use the following checklist to troubleshoot this problem:

Speech Synthesis Markup Language or lexicon-related troubleshooting tips

The embedded audio or lexicon document cannot be fetched, or the embedded audio file within the SSML document will not play

If the SSML processor cannot find an audio file, it continues and synthesizes the files that it can find. No error message is rendered, but the missing file is not played. To troubleshoot this problem, turn on tracing to see why the file was not fetched.
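Because a missing audio file produces no error message, it can help to list every audio reference in the SSML document before turning on tracing, and then check each one by hand. The following is a minimal sketch, not part of WebSphere Voice Server; the function name audio_sources is hypothetical, and it assumes the SSML document is well-formed XML.

```python
import xml.etree.ElementTree as ET


def audio_sources(ssml_text):
    """Return the src attribute of every <audio> element, in document order."""
    root = ET.fromstring(ssml_text)
    sources = []
    for element in root.iter():
        # Drop a namespace prefix such as {http://www.w3.org/2001/10/synthesis}
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "audio" and element.get("src"):
            sources.append(element.get("src"))
    return sources
```

Each returned value can then be checked manually, for example by requesting the URL from the machine where the TTS engine runs.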

Audio specified in the SSML <audio> tag not heard

If you cannot hear the audio specified in the SSML <audio> tag, verify that the server where WebSphere Voice Server is installed does not have Windows Internet Information Services (IIS) or any other web server running or installed. Disable any such web server software that you find, because additional web servers conflict with the WebSphere Application Server HTTP server.

The SSML document encounters a parse error

When the <voice> tag has a gender attribute set to "neutral", the voice is not changed

Certain <voice> tag attributes, such as gender "neutral" and age, are not supported for any concatenative voices, including US English voices. Also note that the "variant" attribute is not supported for the US English voices.

A synthesized audio output is truncated

The amount of text included in one speak request is limited to the amount of text that can be spoken in five minutes. The synthesized audio for larger amounts of text may be truncated.
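One way to stay under the five-minute limit is to split long text into several speak requests at sentence boundaries. The following is a rough sketch only: the function name split_for_speak_requests is hypothetical, and the speaking rate of 150 words per minute is an assumed estimate, not a documented value for any WebSphere Voice Server voice.

```python
import re


def split_for_speak_requests(text, max_minutes=5.0, words_per_minute=150):
    """Split text into chunks whose estimated speaking time stays within max_minutes.

    words_per_minute is an assumed average rate; tune it for the voice in use.
    A single sentence longer than the limit is kept whole, not cut mid-sentence.
    """
    max_words = int(max_minutes * words_per_minute)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk would then be sent as its own speak request.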

Prosody volume and range attributes have no effect

The prosody volume and range attributes are currently unsupported and do not have the expected effect on the generated TTS output.

Age and variant voice settings unsupported in the TTS

The TTS supports only adult voices. Some age and variant voice settings may not affect the TTS.

Australian TTS and city name pronunciations

The Australian TTS uses the UK English TTS. As a result, some of the names of Australian cities will sound as if pronounced in UK English. You may insert lexicons into the VoiceXML script to correct the pronunciations.

Speech markup <emphasis> element limitation

The <emphasis> element, included in VoiceXML 2.0, is currently not supported. A fix is planned for a future release.

Errors when trying to load multiple voices

Windows platforms

When loading more than three different TTS voices (for example, the U.S. English male and female voices together with the U.K. English male and female voices), you may need to boot Windows Server 2003 with a configuration that allocates more contiguous memory to application programs. To do this, follow these steps:

Important: Allocating more contiguous memory may degrade the performance of some systems, so only use this procedure if necessary. See Memory Support and Windows Operating Systems for more information.
Restriction: This procedure only works on machines with at least 1 GB of memory running Windows Server 2003 Standard or Enterprise editions.
  1. Change directory to your Windows boot partition.
    C:
    cd \
  2. At the command line prompt, type: attrib boot.ini

    You should see a response similar to the following: A SH C:\boot.ini

  3. Next, type:
    attrib -h -s boot.ini
    notepad boot.ini

    You should see something similar to the following in Notepad:

    [boot loader]
    timeout=30
    default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
    [operating systems]
    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, Enterprise" /fastdetect
  4. Carefully duplicate and edit the last line, adding /3GB /userva=3030 as in the following, and save the file:
    [boot loader]
    timeout=30
    default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
    [operating systems]
    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, Enterprise /3GB" /fastdetect /3GB /userva=3030
    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, Enterprise" /fastdetect
  5. Reboot your machine, and when given the choice, select Windows Server 2003, Enterprise /3GB, or wait until the machine automatically boots the new default /3GB configuration.

This configuration allows you to run additional voices, given that you have sufficient real memory and/or pagespace. If there is any problem with the /3GB configuration, you can boot from the original configuration (the second line on the boot screen) and remove the /3GB line from boot.ini.

To hide the boot.ini file again, type: attrib +h +s boot.ini
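The edit in step 4 can also be expressed as a small text transformation. The sketch below is illustrative only: the function name add_3gb_entry is hypothetical, it works on the text of boot.ini and does not read or write the file, and it assumes the first entry under [operating systems] is the one to duplicate. Keep a backup of boot.ini before applying any change.

```python
def add_3gb_entry(boot_ini_text, switches="/3GB /userva=3030"):
    """Return boot.ini text with a /3GB copy of the first OS entry placed before it."""
    out = []
    in_os_section = False
    added = False
    for line in boot_ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            in_os_section = stripped.lower() == "[operating systems]"
        elif in_os_section and not added and '="' in stripped:
            # Split: path="label" options
            path, rest = stripped.split('="', 1)
            label, _, options = rest.partition('"')
            out.append(f'{path}="{label} /3GB"{options} {switches}')
            added = True
        out.append(line)  # the original line is always kept
    return "\n".join(out)
```

Applied to the sample file shown in step 3, this yields the six-line file shown in step 4, with the original entry still available as the second choice on the boot screen.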

Linux platforms

Because WebSphere Voice Server uses a single process to load voices, the memory available for loading languages is limited by the Linux limit on memory for a single process, even if the machine has more memory installed.

Reliability issues when antivirus software installed

If you have antivirus software installed or enabled on a WebSphere Voice Server Windows Server 2003 machine, you may encounter the following:

If the operating system stops, the Windows Server 2003 Online Crash Analysis will indicate that the antivirus software has had a critical failure and request that you update the antivirus software to correct the problem.

Also, it is possible that the antivirus scanning software is attempting to scan some or all of the WebSphere Voice Server files; therefore, it is recommended that you disable scanning for the following directories (where installation_drive is the letter designation of the drive where the software is installed, such as C:):

If you have configured WebSphere Voice Server to point the General (first level) Cache and Grammar (second level) Cache to directories other than the default, then you must disable scanning for those directories as well.

If scanning of the above directories is disabled and problems still occur, it may be necessary to uninstall or disable the antivirus software. Updating the antivirus software may not correct the problems. If it does not, contact your antivirus software vendor for a solution.

Errors when trying to load 22 kHz voices

Make certain that the list of available TTS voices does not contain both 22 kHz voices and 8 kHz voices. If it does, the following log message is displayed:

CWVSB0001E: Failed to launch TTS engine process. Reason: Failed to load TTS voice voice name due to error status = 407, error cause = mixing sample rates not supported

Configure the list of available TTS voices so that it includes only 22 kHz voices, or only 8 kHz voices. Configure the default TTS voice accordingly, and restart the server (see Configuring default and available TTS voices).
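Before restarting the server, the configured voice list can be checked for mixed sample rates. This is a minimal sketch under stated assumptions: the function name mixed_sample_rates and the voice names in the usage example are hypothetical, and the mapping of voice name to sample rate must be built by hand from your own configuration.

```python
def mixed_sample_rates(voices):
    """Return the sorted distinct sample rates if more than one is present, else [].

    voices maps a voice name to its sample rate in Hz.
    """
    rates = sorted(set(voices.values()))
    return rates if len(rates) > 1 else []


# Hypothetical voice names, for illustration only.
configured = {"en_US_female_22k": 22000, "en_US_male_8k": 8000}
problem = mixed_sample_rates(configured)
if problem:
    print("Mixed sample rates configured:", problem)
```

An empty result means that all listed voices share one sample rate, which is the condition the CWVSB0001E message requires.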



(c) Copyright IBM Corporation 2004, 2009. All rights reserved.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.