IBM Watson Speech to Text
Convert speech into text using AI-powered speech recognition and transcription
Start your free trial
What is IBM Watson Speech to Text?

IBM Watson® Speech to Text technology enables fast and accurate speech transcription in multiple languages for a variety of use cases, including but not limited to customer self-service, agent assistance and speech analytics. Get started fast with our advanced machine learning models out-of-the-box or customize them for your use case.

IBM Watson Speech to Text is now available as a containerized library for IBM partners to embed AI technology in their commercial applications.

Benefits

More accurate AI

Our best-in-class AI, embedded within Watson Speech to Text, truly understands your customers.

Customizable for your business

Train Watson Speech to Text on your unique domain language and specific audio characteristics.

Protects your data

Enjoy the security of IBM’s world-class data governance practices.

Truly runs anywhere

Built to support global languages and deployable on any cloud — public, private, hybrid, multicloud, or on-premises.

Feature highlights

What sets Watson Speech to Text apart?

Automatic speech recognition

Enable your voice applications using neural technologies for speech recognition powered by IBM Watson.

Model training options

Improve speech recognition accuracy for your use case with language and acoustic training options.

Optimized for customer care

Activate your voice application with speech models tuned for the customer care domain.

Pre-trained speech models

Activate your voice application with speech models tuned for the customer care domain.

Fine-tuning features

Improve speech recognition accuracy for extracting phrases, words, letters, numbers or lists.

Low latency transcription

Use our models optimized for low latency in real-time speech applications.

Audio diagnostics before transcription

Analyze and correct weak audio signals before transcription begins.

Interim transcription before final results

Improve application response times by using interim transcripts as they are generated, before the final results are ready.
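As a rough illustration of how an application might consume interim results, the sketch below keeps finalized segments and replaces the provisional text as newer hypotheses arrive. The message shape (a `results` list whose entries carry a `final` flag and an `alternatives` list with a `transcript`) is an assumption modeled on the service's JSON responses, and `live_captions` and the sample messages are hypothetical names and data invented for this example.

```python
from typing import Iterable, Iterator


def live_captions(messages: Iterable[dict]) -> Iterator[str]:
    """Yield the caption text to display after each incoming message.

    Finalized segments accumulate; the latest interim hypothesis is shown
    provisionally and is replaced when a newer message arrives.
    """
    finalized = []
    for message in messages:
        interim = ""
        for result in message.get("results", []):
            alternatives = result.get("alternatives") or [{}]
            text = alternatives[0].get("transcript", "").strip()
            if result.get("final"):
                finalized.append(text)
            else:
                interim = text
        yield " ".join(part for part in finalized + [interim] if part)


# Hypothetical message sequence (assumed shape, invented data).
sample = [
    {"results": [{"final": False, "alternatives": [{"transcript": "hello "}]}]},
    {"results": [{"final": False, "alternatives": [{"transcript": "hello world "}]}]},
    {"results": [{"final": True, "alternatives": [{"transcript": "hello world "}]}]},
    {"results": [{"final": False, "alternatives": [{"transcript": "how are "}]}]},
]

for caption in live_captions(sample):
    print(caption)
```

Showing the interim text immediately and overwriting it later is one simple design choice; an application could also style provisional words differently until they are finalized.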

Smart formatting

Transcribe dates, times, numbers, currency values, email and website addresses in your final transcripts by converting them into conventional forms.

Speaker diarization

Recognize who said what in a multi-participant voice exchange. Speaker diarization is currently optimized for two-way call center conversations but can detect up to six different speakers.
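To show what "who said what" can look like in practice, here is a minimal sketch that groups timestamped words into per-speaker turns. It assumes a response in which each word timestamp is a `[word, start, end]` triple and each speaker label carries `from`, `to` and `speaker` fields; that shape is an assumption to check against the API reference, and `speaker_turns` and the sample data are hypothetical.

```python
def speaker_turns(timestamps, speaker_labels):
    """Group timestamped words into consecutive per-speaker turns.

    Words are matched to labels by their exact (start, end) times, which
    assumes both lists come from the same response and share those values.
    """
    speaker_at = {
        (label["from"], label["to"]): label["speaker"]
        for label in speaker_labels
    }
    turns = []
    for word, start, end in timestamps:
        speaker = speaker_at.get((start, end))  # None if no label matches
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(word)
        else:
            turns.append((speaker, [word]))
    return [(speaker, " ".join(words)) for speaker, words in turns]


# Hypothetical two-speaker exchange (invented data).
timestamps = [
    ["hello", 0.0, 0.4], ["there", 0.4, 0.8],
    ["hi", 1.1, 1.3],
    ["how", 1.5, 1.7], ["can", 1.7, 1.9], ["i", 1.9, 2.0], ["help", 2.0, 2.4],
]
speaker_labels = [
    {"from": 0.0, "to": 0.4, "speaker": 0},
    {"from": 0.4, "to": 0.8, "speaker": 0},
    {"from": 1.1, "to": 1.3, "speaker": 1},
    {"from": 1.5, "to": 1.7, "speaker": 0},
    {"from": 1.7, "to": 1.9, "speaker": 0},
    {"from": 1.9, "to": 2.0, "speaker": 0},
    {"from": 2.0, "to": 2.4, "speaker": 0},
]

for speaker, text in speaker_turns(timestamps, speaker_labels):
    print(f"Speaker {speaker}: {text}")
```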

Word spotting and filtering

Filter for specific words or inappropriate content by using our keyword spotting and profanity filtering features. (US English only)
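Several of the features above are typically switched on through request parameters. The sketch below only assembles a query string and sends nothing over the network. The parameter names (`model`, `smart_formatting`, `speaker_labels`, `profanity_filter`, `keywords`, `keywords_threshold`) and the `/v1/recognize` path are taken from the public API as best understood here and should be confirmed in the API reference; the base URL, the model identifier and the helper name `build_recognize_url` are placeholders invented for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_recognize_url(service_url, model, keywords=(), keywords_threshold=0.5,
                        smart_formatting=False, speaker_labels=False,
                        profanity_filter=True):
    """Assemble a recognition request URL with feature flags as query parameters."""
    params = {
        "model": model,
        "smart_formatting": str(smart_formatting).lower(),
        "speaker_labels": str(speaker_labels).lower(),
        "profanity_filter": str(profanity_filter).lower(),
    }
    if keywords:
        # Keyword spotting needs both the keyword list and a confidence threshold.
        params["keywords"] = ",".join(keywords)
        params["keywords_threshold"] = str(keywords_threshold)
    return f"{service_url.rstrip('/')}/v1/recognize?{urlencode(params)}"


# Placeholder service URL and model name (assumptions, not real credentials).
url = build_recognize_url(
    "https://example.com/stt/",
    model="en-US_Telephony",
    keywords=["refund", "cancel"],
    smart_formatting=True,
    speaker_labels=True,
)
query = parse_qs(urlsplit(url).query)
print(urlsplit(url).path)
print(query["keywords"], query["keywords_threshold"])
```

Not every model supports every feature, so the combination shown here is illustrative rather than a recommended configuration.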

Use cases

Customer self-service
Call analytics
Agent assist
Interactive demo
Experience the difference

Explore the powerful capabilities of advanced AI, neural voices and voice customization in our interactive demo.
Partner with IBM

Accelerate your business growth as an Independent Software Vendor (ISV) by innovating with IBM. Partner with us to deliver enhanced commercial solutions embedded with AI to better address clients’ needs.

Explore ways to accelerate your growth with IBM
Find out more

Build AI-based solutions faster with IBM embeddable AI
Ways to buy

Get started for free or view a demo. 

Lite

Free

500 minutes of free speech recognition a month and 38 pre-trained speech models.

Start for free

Plus

As low as USD 0.01 per minute

Tune your speech models to improve recognition and transcription accuracy. The Plus version includes unlimited minutes per month and 100 concurrent transcriptions.

View details

Premium

Contact us for pricing

Provides large and security-sensitive firms with more capacity and data protection. Premium includes unlimited minutes per month and unlimited concurrent transcriptions.

Deploy Anywhere

Contact us for pricing

Deploy behind your firewall or on any cloud with the flexibility of IBM Cloud Pak for Data. The Deploy Anywhere version includes unlimited minutes per month and unlimited concurrent transcriptions, along with noise detection, speech customization and data isolation. 

Resources

API reference

Technical API specifications for all of your development needs.

Download SDKs

The Watson SDK repository in GitHub.

Data privacy and security

See documentation about our enhanced security features that ensure your data is isolated and encrypted end-to-end, while in transit and at rest.

Build custom speech recognition models within minutes

Learn how to create custom speech models using IBM Watson quickly — without knowing how to code.

How to train your own speech “dragon”

Read about Watson Speech to Text requirements, the methodology and some best practices inspired by actual clients.

Replacing my old IVR system with IBM Watson

Guidelines on how to add a new or existing virtual assistant to your brand-new Watson IVR.

Related products

Watson Text to Speech

Improve customer engagement by interacting with users in their own language using any written text.

watsonx Assistant

Solve customer issues the first time using an AI virtual assistant across any application, device, or channel.

Watson Speech Libraries for Embed

Infuse powerful natural language AI into commercial applications with a containerized library designed to empower IBM partners with greater flexibility.

Take the next step

See Watson Speech to Text capabilities in action.

Start your free trial