# Run with Docker run
Build a container image to serve pretrained Watson Speech to Text Library for Embed models, and run it with Docker. The container image should include both the Watson Speech to Text Runtime and the models.
## Overview

- Use the `watson-stt-runtime` image as the base
- Set the required configuration using environment variables
- Pull model archives out of model images and gather them into a single directory
- Run a simple file server serving the model archives
- Run the STT runtime container configured with a `UrlService` pointing at the file server to download and extract the models from
- Use the resulting files in the model cache as the model source that the runtime uses at bootup
## Usage

1. Log in to the IBM Entitled Registry

   Container images for the Watson Speech to Text runtime and the pretrained models are stored in the IBM Entitled Registry. Once you've obtained the entitlement key from the container software library, you can log in to the registry with the key and pull the images to your local machine.

   ```sh
   echo $IBM_ENTITLEMENT_KEY | docker login -u cp --password-stdin cp.icr.io
   ```
2. Update configurations for the set of models you want to use

   A list of available models can be found in the models catalog.

   The set of configurations includes environment configurations and resource requirements. The provided configs are a good set of defaults; however, depending on which models are being used, some configurations will have to be updated. For more details, see the configuration page.
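   As a sketch of one way to keep those settings in one place (assuming the runtime reads `MODELS` and `DEFAULT_MODEL` from the environment at startup, as the Dockerfile in the next step suggests), the model configuration can be written to an env file:

   ```sh
   # Collect the model-related settings in an env file. MODELS and
   # DEFAULT_MODEL are the variables used in this guide; check the
   # configuration page before relying on any others.
   cat > stt.env <<'EOF'
   MODELS=en-US_Multimedia,es-LA_Telephony
   DEFAULT_MODEL=en-US_Multimedia
   EOF
   ```

   The file can then be passed at run time with `docker run --env-file stt.env ...`, which overrides the values baked into the image.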
3. Build the container image from the provided `Dockerfile`

   ```dockerfile
   # Model images
   FROM cp.icr.io/cp/ai/watson-stt-generic-models:1.12.0 as catalog
   FROM cp.icr.io/cp/ai/watson-stt-en-us-multimedia:1.12.0 as en-us-multimedia
   FROM cp.icr.io/cp/ai/watson-stt-es-la-telephony:1.12.0 as es-la-telephony
   # Add additional FROM statements for additional models here

   # Base image for the runtime
   FROM cp.icr.io/cp/ai/watson-stt-runtime:1.12.0 AS runtime

   # Configure the runtime
   # MODELS is a comma-separated list of Model IDs
   ENV MODELS=en-US_Multimedia,es-LA_Telephony
   ENV DEFAULT_MODEL=en-US_Multimedia

   # Copy in the catalog
   # $CHUCK is already set in the base image
   COPY --chown=watson:0 --from=catalog catalog.json ${CHUCK}/var/catalog.json

   # Intermediate image to populate the model cache
   FROM runtime as model_cache

   # Copy model archives from model images
   RUN sudo mkdir -p /models
   COPY --chown=watson:0 --from=en-us-multimedia model/ /models/
   COPY --chown=watson:0 --from=es-la-telephony model/ /models/
   # For each additional model, copy the line above and update the --from

   # Run script to initialize the model cache from the model archives
   RUN prepare_models.sh

   # Final runtime image with models baked in
   FROM runtime as release
   COPY --from=model_cache ${CHUCK}/var/cache/ ${CHUCK}/var/cache/
   ```

   The container image build starts by referencing the set of images that are required for the build. Files from those images are copied into the `model_cache` stage, and then the model cache is populated by running the `prepare_models.sh` script. Finally, the `release` stage is built with the model cache copied in.

   ```sh
   docker build . -t stt-standalone
   ```
4. Run the newly built image

   ```sh
   docker run --rm -it --env ACCEPT_LICENSE=true --publish 1080:1080 stt-standalone
   ```

   The environment variable `ACCEPT_LICENSE` must be set to `true` in order for the container to run. To view the set of licenses, run the container without the environment variable set. You can also output the licenses to a file for ease of viewing:

   ```sh
   docker run --rm stt-standalone > stt-licenses.txt
   ```
5. List the available models to confirm that the models are being loaded

   ```sh
   curl "http://localhost:1080/speech-to-text/api/v1/models"
   ```

   Example output:

   ```json
   {
     "models": [
       {
         "name": "en-US_Multimedia",
         "rate": 16000,
         "language": "en-US",
         "description": "US English multimedia model for broadband audio (16kHz or more)",
         "supported_features": {
           "custom_acoustic_model": false,
           "custom_language_model": true,
           "low_latency": true,
           "speaker_labels": true
         },
         "url": "http://localhost:1080/speech-to-text/api/v1/models/en-US_Multimedia"
       },
       {
         "name": "es-LA_Telephony",
         "rate": 8000,
         "language": "es-LA",
         "description": "Latin American Spanish telephony model for narrowband audio (8kHz)",
         "supported_features": {
           "custom_acoustic_model": false,
           "custom_language_model": true,
           "low_latency": true,
           "speaker_labels": true
         },
         "url": "http://localhost:1080/speech-to-text/api/v1/models/es-LA_Telephony"
       }
     ]
   }
   ```
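For a scripted health check, a saved copy of that response can be grepped for each expected Model ID. This is a rough sketch that matches on the `"name"` fields only; `check_models` is a hypothetical helper, not part of the product:

```sh
# check_models: verify that each expected Model ID appears in a saved
# /v1/models response. Usage:
#   curl -s "http://localhost:1080/speech-to-text/api/v1/models" -o models.json
#   check_models models.json en-US_Multimedia es-LA_Telephony
check_models() {
  file="$1"; shift
  status=0
  for model in "$@"; do
    if grep -q "\"name\": *\"$model\"" "$file"; then
      echo "$model: loaded"
    else
      echo "$model: missing"
      status=1
    fi
  done
  return $status
}
```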
6. Send a `/recognize` request to test the service

   Download an example audio file or use your own:

   ```sh
   curl "https://github.com/watson-developer-cloud/doc-tutorial-downloads/raw/master/speech-to-text/0001.flac" \
     -sLo example.flac
   ```

   Send the audio file to the service:

   ```sh
   curl "http://localhost:1080/speech-to-text/api/v1/recognize" \
     --header "Content-Type: audio/flac" \
     --data-binary @example.flac
   ```

   Example response:

   ```json
   {
     "result_index": 0,
     "results": [
       {
         "final": true,
         "alternatives": [
           {
             "transcript": "several tornadoes touched down as a line of severe thunderstorms swept through colorado on sunday ",
             "confidence": 0.99
           }
         ]
       }
     ]
   }
   ```
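To pull just the transcript text out of a saved response without depending on `jq`, a small `sed` script covers simple cases. This is an illustrative helper (it assumes transcripts contain no escaped quotes), not part of the service:

```sh
# transcript: print the "transcript" strings from a saved /v1/recognize
# response. Usage:
#   curl -s "http://localhost:1080/speech-to-text/api/v1/recognize" \
#     --header "Content-Type: audio/flac" --data-binary @example.flac \
#     -o response.json
#   transcript response.json
transcript() {
  sed -n 's/.*"transcript": *"\([^"]*\)".*/\1/p' "$1"
}
```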
   To use a different model, add the `model` query parameter to the request. The audio format can also be changed as long as the `Content-Type` header matches. For example:

   ```sh
   curl "http://localhost:1080/speech-to-text/api/v1/recognize?model=es-LA_Telephony" \
     --header "Content-Type: audio/mp3" \
     --data-binary @hola.mp3
   ```

   ```json
   {
     "result_index": 0,
     "results": [
       {
         "final": true,
         "alternatives": [
           {
             "transcript": "hola hoy es un día muy bonito ",
             "confidence": 0.92
           }
         ]
       }
     ]
   }
   ```
For more details, such as what types of audio files are supported and additional input parameters, view the Speech-to-Text API docs. Note that not all of the endpoints are supported.
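Since only the `model` query parameter changes between the requests above, the URL assembly can be factored out. `build_recognize_url` is a hypothetical convenience, assuming the service is published on `localhost:1080` as in the earlier steps:

```sh
# build_recognize_url: print the /v1/recognize URL, optionally with a model
# query parameter, for use with curl as in the examples above.
build_recognize_url() {
  url="http://localhost:1080/speech-to-text/api/v1/recognize"
  if [ -n "${1:-}" ]; then
    url="${url}?model=$1"
  fi
  echo "$url"
}
```

For example: `curl "$(build_recognize_url es-LA_Telephony)" --header "Content-Type: audio/mp3" --data-binary @hola.mp3`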
## Using Additional Models

To include additional models:

1. Find the additional model images in the models catalog
2. Add the model image to the top of the `Dockerfile` in a new `FROM <model-image> as <short-model-image-name>` statement
3. Populate the intermediate model cache by adding another `COPY --chown=watson:0 --from=<short-model-image-name> model/ /models/` line to the `Dockerfile`
4. Update the comma-separated list of Model IDs in the `ENV MODELS=` line in the `Dockerfile`
## Notes

The runtime container itself does not support TLS. The `watson-stt-haproxy` container (or another proxy) is required for TLS termination. See the configuration page for details.

The runtime caching code treats the files as an LRU cache. If any of the model data is deleted, the server will not be able to use that data. Cleanup of the cache is triggered when the size of the files on disk grows too large, so the total size of all models in the cache should be kept below 2.5 GiB. Smaller is better.
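As a rough pre-build sanity check against that limit, the staged model archives can be sized up before baking them into the image. `cache_budget_check` is a hypothetical helper; the 2.5 GiB figure comes from the note above:

```sh
# cache_budget_check: report whether a directory fits in the ~2.5 GiB
# model-cache budget. Run it against the directory of model archives
# (e.g. the /models staging directory) before building.
cache_budget_check() {
  limit_kb=$((2560 * 1024))          # 2.5 GiB expressed in KiB
  used_kb=$(du -sk "$1" | cut -f1)   # total on-disk size in KiB
  if [ "$used_kb" -gt "$limit_kb" ]; then
    echo "over budget: ${used_kb} KiB used (limit ${limit_kb} KiB)"
    return 1
  fi
  echo "within budget: ${used_kb} KiB used (limit ${limit_kb} KiB)"
}
```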