# Run with Docker run

## Overview

  1. Use the watson-tts-runtime image as the base
  2. Set the required configuration using environment variables
  3. Pull model archives out of model images and gather them into a single directory
  4. Run a simple file server that serves the model archives
  5. Run the TTS runtime container, configured with a UrlService that points at the file server, so that it downloads and extracts the models
  6. Use the resulting files in the model cache as the model source that the runtime uses at startup

## Usage

  1. Log in to the IBM Entitled Registry

    Container images for the Watson Text to Speech runtime and the pretrained models are stored in the IBM Entitled Registry. After you obtain an entitlement key from the container software library, log in to the registry with the key so that you can pull the images to your local machine.

    echo "$IBM_ENTITLEMENT_KEY" | docker login -u cp --password-stdin cp.icr.io
    
  2. Update the configuration for the set of models you want to use

    A list of available models can be found in the [models catalog](tts_models_catalog.html).

    The configuration consists of environment variables, which are set in the Dockerfile in the next step. The provided values are good defaults, but depending on which models you use, some of them must be updated. For more details, see the [configuration page](tts_configuration.html).

  3. Build the container image from the provided Dockerfile

    # Model images
    FROM cp.icr.io/cp/ai/watson-tts-generic-models:1.10.0 AS catalog
    FROM cp.icr.io/cp/ai/watson-tts-en-us-michaelv3voice:1.10.0 AS en-us-michael
    FROM cp.icr.io/cp/ai/watson-tts-es-la-sofiav3voice:1.10.0 AS es-la-sofia
    # Add FROM statements for additional models here

    # Base image for the runtime
    FROM cp.icr.io/cp/ai/watson-tts-runtime:1.10.0 AS runtime

    # Configure the runtime
    # MODELS is a comma-separated list of model IDs
    ENV MODELS=en-US_MichaelV3Voice,es-LA_SofiaV3Voice
    # DEFAULT_MODEL is the voice used when a request does not specify one
    ENV DEFAULT_MODEL=en-US_MichaelV3Voice

    # Copy in the model catalog
    # $CHUCK is already set in the base image
    COPY --chown=watson:0 --from=catalog catalog.json ${CHUCK}/var/catalog.json

    # Intermediate image to populate the model cache
    FROM runtime AS model_cache

    # Copy model archives from model images
    RUN sudo mkdir -p /models
    COPY --chown=watson:0 --from=en-us-michael model/ /models/
    COPY --chown=watson:0 --from=es-la-sofia model/ /models/
    # For each additional model, copy the line above and update --from

    # Run script to initialize the model cache from the model archives
    RUN prepare_models.sh

    # Final runtime image with models baked in
    FROM runtime AS release

    COPY --from=model_cache ${CHUCK}/var/cache/ ${CHUCK}/var/cache/
    

    The build starts by referencing the images it needs. Model archives from the model images are copied into the model_cache stage, and the prepare_models.sh script then populates the model cache from them. Finally, the release stage is built from the runtime stage with the populated model cache copied in. To build the image, run:

    docker build . -t tts-standalone
    
  4. Run the newly built image

    docker run --rm -it --env ACCEPT_LICENSE=true --publish 1080:1080 tts-standalone
    

    The environment variable ACCEPT_LICENSE must be set to true for the container to run. To view the licenses, run the container without setting this variable.

    You can also write the licenses to a file for easier viewing:

    docker run --rm tts-standalone > tts-licenses.txt
    
  5. List the available voices to confirm that the voices are being loaded

    curl "http://localhost:1080/text-to-speech/api/v1/voices"
    

    Example output:

    {
       "voices": [
          {
             "name": "es-LA_SofiaV3Voice",
             "language": "es-LA",
             "gender": "female",
             "description": "Sofia: Latin American Spanish (español latinoamericano) female voice. Dnn technology.",
             "customizable": true,
             "supported_features": {
                "custom_pronunciation": true,
                "voice_transformation": false
             },
             "url": "http://localhost:1080/text-to-speech/api/v1/voices/es-LA_SofiaV3Voice"
          },
          {
             "name": "en-US_MichaelV3Voice",
             "language": "en-US",
             "gender": "male",
             "description": "Michael: American English male voice. Dnn technology.",
             "customizable": true,
             "supported_features": {
                "custom_pronunciation": true,
                "voice_transformation": false
             },
             "url": "http://localhost:1080/text-to-speech/api/v1/voices/en-US_MichaelV3Voice"
          }
       ]
    }
    
  6. Send a /synthesize request to test the service

    Send text in JSON format to the service and it will return an audio file:

    curl "http://localhost:1080/text-to-speech/api/v1/synthesize" \
      --header "Content-Type: application/json" \
      --data '{"text":"Hello world"}' \
      --header "Accept: audio/wav" \
      --output hello_world.wav
    

    To use a different model, add the voice query parameter to the request. To change the audio format, change the Accept header. For example:

    curl "http://localhost:1080/text-to-speech/api/v1/synthesize?voice=es-LA_SofiaV3Voice" \
      --header "Content-Type: application/json" \
      --data '{"text":"Hola! Hoy es un día muy bonito."}' \
      --header "Accept: audio/mp3" \
      --output hola.mp3
    

    For more details, see the [Text-to-Speech API docs](https://cloud.ibm.com/apidocs/text-to-speech#getsynthesize).
    Note that the runtime does not support all of the endpoints described there.
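The checks in steps 5 and 6 can be wrapped into a small smoke-test script, which also covers the case where the runtime is still starting up when you first call it. The following is a minimal sketch, not part of the product: the function names (`wait_for_tts`, `list_voice_names`, `accept_for_file`, `json_text`, `synthesize`) and the `TTS_URL` variable are hypothetical, it assumes the container from step 4 is listening on `localhost:1080`, and the extension-to-`Accept` mapping assumes the runtime accepts the same audio formats as the public API.

```shell
#!/usr/bin/env bash
# Hypothetical smoke-test helpers for a locally running tts-standalone container.
# Assumes the container from step 4 is published on localhost:1080.
TTS_URL=${TTS_URL:-http://localhost:1080/text-to-speech/api/v1}

# Poll the voices endpoint until it answers (default: 30 attempts, 2 s apart).
wait_for_tts() {
  local attempts=${1:-30} delay=${2:-2} i=1
  while [ "$i" -le "$attempts" ]; do
    if curl --silent --fail --output /dev/null "$TTS_URL/voices"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "TTS runtime did not respond at $TTS_URL/voices" >&2
  return 1
}

# Print the "name" of every voice in a /voices JSON response read from stdin.
# Uses grep/sed instead of jq; it relies on "name" appearing only as the voice name.
list_voice_names() {
  grep -o '"name": *"[^"]*"' | sed 's/^"name": *"\(.*\)"$/\1/'
}

# Map an output file extension to an Accept header (assumed subset of supported formats).
accept_for_file() {
  case "${1##*.}" in
    wav)  echo "audio/wav" ;;
    mp3)  echo "audio/mp3" ;;
    ogg)  echo "audio/ogg;codecs=opus" ;;
    flac) echo "audio/flac" ;;
    *)    echo "unsupported extension: $1" >&2; return 1 ;;
  esac
}

# Build the JSON request body, escaping backslashes and double quotes.
# Newlines and other control characters are NOT escaped.
json_text() {
  local escaped
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text":"%s"}' "$escaped"
}

# Synthesize TEXT with VOICE into OUTFILE; the Accept header follows OUTFILE's extension.
# VOICE is not URL-encoded (voice IDs such as en-US_MichaelV3Voice are URL-safe).
synthesize() {
  local text=$1 voice=$2 outfile=$3 accept
  accept=$(accept_for_file "$outfile") || return 1
  curl --silent --fail "$TTS_URL/synthesize?voice=$voice" \
    --header "Content-Type: application/json" \
    --header "Accept: $accept" \
    --data "$(json_text "$text")" \
    --output "$outfile"
}

# Example usage once the container is running:
#   wait_for_tts && curl --silent "$TTS_URL/voices" | list_voice_names
#   synthesize "Hello world" en-US_MichaelV3Voice hello_world.wav
#   synthesize "Hola! Hoy es un día muy bonito." es-LA_SofiaV3Voice hola.mp3
```

Passing `--fail` to curl makes an HTTP error status return a non-zero exit code instead of silently writing an error body into the audio file.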

### Using Additional Models

To include additional models:

1. Find the additional model images in the [model catalog](tts_models_catalog.html)

1. Add the model image to the top of the `Dockerfile` in a new `FROM <model-image> AS <short-model-image-name>` statement

1. Populate the intermediate model cache by adding another `COPY --chown=watson:0 --from=<short-model-image-name> model/ /models/` line to the `model_cache` stage of the `Dockerfile`

1. Update the comma-separated list of Model IDs in the `ENV MODELS=` line in the `Dockerfile`
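When adding several models, the `FROM` and `COPY` lines follow a fixed pattern, so they can be generated. The sketch below is a hypothetical helper, not part of the product. It assumes image references follow the `<registry>/<path>/watson-tts-<name>:<tag>` pattern used in the `Dockerfile` above and uses `<name>` as the build-stage name (for example `es-la-sofiav3voice` rather than the hand-shortened `es-la-sofia`). Image references pinned by digest (`@sha256:...`) are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: generate Dockerfile lines for additional model images.

# Derive a build-stage name from an image reference, e.g.
# cp.icr.io/cp/ai/watson-tts-es-la-sofiav3voice:1.10.0 -> es-la-sofiav3voice
stage_name() {
  local ref=${1##*/}          # drop the registry and path
  ref=${ref%%:*}              # drop the tag
  printf '%s\n' "${ref#watson-tts-}"
}

# Print FROM lines (for the top of the Dockerfile) and
# COPY lines (for the model_cache stage) for every image argument.
model_lines() {
  local img
  echo "# Add next to the other model FROM statements:"
  for img in "$@"; do
    printf 'FROM %s AS %s\n' "$img" "$(stage_name "$img")"
  done
  echo "# Add to the model_cache stage:"
  for img in "$@"; do
    printf 'COPY --chown=watson:0 --from=%s model/ /models/\n' "$(stage_name "$img")"
  done
}

# Example usage:
#   model_lines cp.icr.io/cp/ai/watson-tts-en-us-michaelv3voice:1.10.0 \
#               cp.icr.io/cp/ai/watson-tts-es-la-sofiav3voice:1.10.0
```

The helper only prints text; the model IDs in `ENV MODELS=` still have to be updated by hand, because they cannot be derived reliably from the lowercase image names.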

## Notes

The runtime container itself does not support TLS. The `watson-tts-haproxy` container (or another proxy) is required for TLS termination. For details, see the [configuration page](tts_configuration.html).

The runtime treats the files in the model cache as an LRU cache. Cleanup is triggered when the total size of the files on disk grows too large, and once any model data is deleted, the server can no longer use it. Therefore, keep the total size of all models in the cache below 2.5 GiB; smaller is better.
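To check a cache directory against the 2.5 GiB guideline, a minimal sketch using `du -sk` follows. Sizes are in KiB, so the limit is 2.5 * 1024 * 1024 = 2621440 KiB. `cache_within_limit` and `CACHE_LIMIT_KIB` are hypothetical names, not part of the runtime. Note that `du` measures allocated disk space, which may differ slightly from the apparent file sizes.

```shell
#!/usr/bin/env bash
# Hypothetical check: is a model cache directory within the recommended size?
# 2.5 GiB in KiB (the unit reported by `du -k`): 2.5 * 1024 * 1024 = 2621440
CACHE_LIMIT_KIB=2621440

# Return 0 if DIR uses at most LIMIT KiB (default 2.5 GiB), 1 if it is larger,
# and 2 if the size could not be measured (for example, DIR does not exist).
cache_within_limit() {
  local dir=$1 limit=${2:-$CACHE_LIMIT_KIB} used
  used=$(du -sk "$dir") || return 2
  used=${used%%[[:space:]]*}      # keep only the number before the tab
  echo "model cache: ${used} KiB used, limit ${limit} KiB" >&2
  [ "$used" -le "$limit" ]
}

# Example usage on a directory reachable from the shell:
#   cache_within_limit ./cache && echo "cache size OK"
```

To measure the cache inside the built image instead, one option, assuming the image contains `sh` and `du` and that `CHUCK` is set as the `Dockerfile` comment states, is `docker run --rm --entrypoint sh tts-standalone -c 'du -sk "$CHUCK/var/cache"'`.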