IBM watsonx.ai | Pricing

watsonx.ai pricing

Explore the pricing tiers for the free trial, Essentials, and Standard plans on IBM® watsonx.ai®. For model pricing, see the sections on IBM foundation models, embedding models, and third-party foundation models.

Foundation models from IBM

Includes pay-as-you-go pricing per million tokens and hourly rates for on-demand model hosting and deployment.

Embedding models

Includes IBM and third-party models available for USD 0.10 per million tokens.

Third-party foundation models

Includes third-party models from Meta, Google, DeepSeek, Mistral, and more, with pay-as-you-go pricing per million tokens and hourly options for on-demand hosting and deployment.

Use case-specific pricing

Includes use case-based pricing for machine learning, text extraction, and model customization, with Essentials and Standard plan options.

Pricing tiers (SaaS)

Free

Toolbox playground

Foundation Models: Up to 300,000 tokens per month

Machine Learning Tools: Up to 20 Compute Usage Hours (CUH) per month

Text Extraction: Up to 100 documents per month

Essentials (Pay-as-you-go)

Production deployments

Starting at USD 0/month*

Model price breakdown***

Feature-specific price breakdown**

Standard (Pay-as-you-go)

Enterprise production

Starting at USD 1,050/month*

Model price breakdown***

Feature-specific price breakdown**

Playground UI

Inferencing

Open source models

IBM watsonx® models

Work with foundational models (PromptLab)

Supports retrieval augmented generation (RAG)

Work with agents (AgentLab)

Synthetic data generator

ML functionality**

Text extraction**

LoRA/QLoRA fine-tuning*

Custom foundation models***

Model hosting***

Deploy on-demand models***

Support

watsonx community and online chatbot

Basic support included: 24x7 access to tech support through cases

Options available

Advanced support with SLAs available starting at USD 200 per month

*Prices shown are indicative, may vary by country, exclude any applicable taxes and duties, and are subject to product offering availability in a locale.
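One way to read the tiers above is to ask at what usage the Standard plan's lower unit rates offset its higher starting price. The sketch below is illustrative only: it assumes the "Starting at" amounts behave as flat monthly base fees (the page does not confirm this), takes the per-unit rates from the feature-specific pricing section of this page, and looks at one metered feature at a time. The function names are hypothetical.

```python
from decimal import Decimal

# "Starting at" prices quoted on this page (USD per month).
# Treating them as flat base fees is an assumption of this sketch.
ESSENTIALS_BASE = Decimal("0")
STANDARD_BASE = Decimal("1050")


def monthly_cost(base: Decimal, unit_rate: Decimal, units: Decimal) -> Decimal:
    """Base fee plus metered usage for a single feature."""
    return base + unit_rate * units


def break_even_units(essentials_rate: Decimal, standard_rate: Decimal) -> Decimal:
    """Usage at which both plans cost the same for a single metered feature."""
    return (STANDARD_BASE - ESSENTIALS_BASE) / (essentials_rate - standard_rate)


# Machine learning: USD 0.52 (Essentials) vs USD 0.42 (Standard) per Capacity Unit Hour
ml_cuh = break_even_units(Decimal("0.52"), Decimal("0.42"))
# Text extraction: USD 0.038 (Essentials) vs USD 0.03 (Standard) per page
pages = break_even_units(Decimal("0.038"), Decimal("0.03"))

print(f"Machine learning break-even: {ml_cuh:,.0f} CUH per month")
print(f"Text extraction break-even: {pages:,.0f} pages per month")
```

Under these assumptions the two plans cost the same at 10,500 Capacity Unit Hours of machine learning usage, or at 131,250 extracted pages, per month. Real bills combine several features and may include charges not modeled here.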

IBM Foundation Models

Model name

Pay as you go (per million tokens)

Model hosting/Deploy on demand (per hour, priced by GPU configuration)

granite-4-h-small

USD 0.06 per 1M tokens input / USD 0.25 per 1M tokens output

Not available

granite-vision-3-3-2b

Not available

granite-vision-3-2-2b¹

USD 0.10

Not available

granite-3-2b-instruct (v3.1)¹

USD 0.10

Not available

granite-guardian-3-2b (v3.1)¹ (Deprecated)

USD 0.10

Not available

granite-guardian-3-8b (v3.1)¹

USD 0.20

Not available

granite-timeseries-ttm-r2¹

USD 0.38

Not available

granite-13b-instruct¹ (Deprecated)

USD 0.60

Not available

granite-3-8b-instruct (v3.1)

USD 0.20

Not available

granite-8b-code-instruct

USD 0.20

granite-3-2-8b-instruct

USD 0.20

granite-3-1-8b-base

Not available

granite-20b-code-base-sql-gen¹

Not available

granite-20b-code-base-schema-linking¹

Not available

granite-3-8b-base¹

Not available

granite-7b-lab¹

Not available

granite-8b-japanese¹

Not available
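The per-million-token prices above translate into a bill by scaling each rate by the number of tokens consumed. The following is a minimal sketch, not an official calculator: the function name and the example token counts are hypothetical, and for models that list a single price it assumes the same rate applies to input and output tokens.

```python
from decimal import Decimal


def token_cost_usd(input_tokens: int, output_tokens: int,
                   input_rate_per_m: Decimal, output_rate_per_m: Decimal) -> Decimal:
    """Cost in USD, given separate per-million-token rates for input and output."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000


# granite-4-h-small, from the table: USD 0.06 input / USD 0.25 output per 1M tokens
cost = token_cost_usd(2_000_000, 500_000, Decimal("0.06"), Decimal("0.25"))
print(f"USD {cost}")

# A model with a single listed price (for example USD 0.20): pass the same rate twice.
flat = token_cost_usd(750_000, 250_000, Decimal("0.20"), Decimal("0.20"))
print(f"USD {flat}")
```

Decimal is used instead of float so that small per-token amounts add up without binary rounding noise.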

Embedding models

All embedding models are USD 0.10 per million tokens. This includes IBM models (granite-embedding-107m-multilingual, granite-embedding-278m-multilingual, slate-125m-english-rtrvr-v2, slate-125m-english-rtrvr, slate-30m-english-rtrvr-v2, slate-30m-english-rtrvr) and third-party models (all-minilm-l6-v2, all-minilm-l12-v2, and multilingual-e5-large).

Third-party foundation models

Model name

Provider

Pay as you go (per million tokens)

Model hosting/Deploy on demand^ (per hour, priced by GPU configuration)

llama-4-maverick-17b-128e-instruct-int4

Meta

Not available

llama-4-maverick-17b-128e-instruct-fp8

Meta

USD 0.35 input / USD 1.40 output

Not available

llama-3-2-1b-instruct

Meta

USD 0.10

Not available

llama-3-2-3b-instruct

Meta

USD 0.15

Not available

llama-3-2-90b-vision-instruct

Meta

USD 2.00

Not available

llama-3-405b-instruct

Meta

USD 5.00 input / USD 16.00 output

Not available

llama-guard-3-11b-vision

Meta

USD 0.35

Not available

mistral-medium-2505

Mistral AI

USD 3.00 input / USD 10.00 output

Not available

mistral-large-2² (Deprecated)

Mistral AI

USD 3.00 input / USD 10.00 output

Not available

mistral-small-3-1-24b-instruct-2503²

Mistral AI

USD 0.10 input / USD 0.30 output

Not available

pixtral-12b² (Deprecated)

Mistral AI

USD 0.35

Not available

llama-3-3-70b-instruct

Meta

USD 0.71

flan-t5-xl-3b (Deprecated)

Google

USD 0.60

allam-1-13b-instruct

SDAIA

USD 1.80

gpt-oss-120b

OpenAI

USD 0.15 input / USD 0.60 output

llama-3-2-11b-vision-instruct

Meta

USD 0.35

llama-3-13b-chat (Deprecated)

Meta

USD 0.0006 per 1,000 tokens (input and output)

deepseek-r1-distill-llama-70b

DeepSeek

Not available

deepseek-r1-distill-llama-8b

DeepSeek

Not available

eurollm-1-7b-instruct

Utter Project

Not available

eurollm-9b-instruct

Utter Project

Not available

llama-2-70b-chat

Meta

Not available

llama-3-1-70b

Meta

Not available

llama-3-1-8b

Meta

Not available

llama-3-3-70b-instruct-hf

Meta

Not available

mistral-large-instruct-2411²

Mistral AI

Not available

mistral-nemo-instruct-2407²

Mistral AI

Not available

mixtral-8x7b-base²

Mistral AI

Not available

poro-34b-chat

LumiOpen

Not available
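Because input and output rates differ widely across the models above, the cheapest option depends on the shape of the workload. The sketch below ranks a few of the listed models for a hypothetical workload and also converts the one price quoted per 1,000 tokens to the per-million basis used elsewhere. The rates are copied from the table; the function names and the workload size are assumptions, and hosting fees or the Mistral access fees mentioned in the footnotes are not modeled.

```python
from decimal import Decimal


def per_million(rate_per_1k: Decimal) -> Decimal:
    """Convert a per-1,000-token price to a per-million-token price."""
    return rate_per_1k * 1000


# (input rate, output rate) in USD per 1M tokens, copied from the table above
RATES = {
    "llama-4-maverick-17b-128e-instruct-fp8": (Decimal("0.35"), Decimal("1.40")),
    "llama-3-405b-instruct": (Decimal("5.00"), Decimal("16.00")),
    "gpt-oss-120b": (Decimal("0.15"), Decimal("0.60")),
    # Quoted per 1,000 tokens for input and output alike, so convert and reuse
    "llama-3-13b-chat": (per_million(Decimal("0.0006")),) * 2,
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Pay-as-you-go cost in USD for one model and one workload."""
    in_rate, out_rate = RATES[model]
    return (in_rate * input_tokens + out_rate * output_tokens) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[str, Decimal]]:
    """Models sorted from cheapest to most expensive for the given workload."""
    costs = {m: workload_cost(m, input_tokens, output_tokens) for m in RATES}
    return sorted(costs.items(), key=lambda kv: kv[1])


# Hypothetical monthly workload: 10M input tokens, 2M output tokens
for model, cost in rank_by_cost(10_000_000, 2_000_000):
    print(f"{model}: USD {cost:.2f}")
```

Price is only one axis of the comparison; model quality, context length, and deprecation status are outside the scope of this sketch.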

Feature-specific pricing

Machine learning models
Essentials plan: USD 0.52 per Capacity Unit Hour
Standard plan: USD 0.42 per Capacity Unit Hour

Text extraction³
Essentials plan: USD 0.038 per page
Standard plan: USD 0.03 per page

LoRA fine-tuning
Essentials plan: Not available
Standard plan:
NVIDIA 1 x A100 GPU: USD 5.88 per hour
NVIDIA 1 x H100 GPU: USD 13.86 per hour

Model hosting/Deploy on demand
Essentials plan: Not available
Standard plan:
NVIDIA 1 x L40S GPU: USD 4.43 per hour
NVIDIA 2 x L40S GPU: USD 8.86 per hour
NVIDIA 1 x A100 GPU: USD 5.80 per hour
NVIDIA 2 x A100 GPU: USD 11.60 per hour
NVIDIA 4 x A100 GPU: USD 23.20 per hour
NVIDIA 8 x A100 GPU: USD 46.40 per hour
NVIDIA 1 x H100 GPU: USD 14.50 per hour
NVIDIA 2 x H100 GPU: USD 29.00 per hour
NVIDIA 4 x H100 GPU: USD 58.00 per hour
NVIDIA 8 x H100 GPU: USD 116.00 per hour
NVIDIA 1 x H200 GPU: USD 16.00 per hour
NVIDIA 2 x H200 GPU: USD 32.00 per hour
NVIDIA 4 x H200 GPU: USD 64.00 per hour
NVIDIA 8 x H200 GPU: USD 128.00 per hour
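The model hosting prices listed above scale linearly with GPU count (for example, 2 x L40S costs exactly twice 1 x L40S), so a hosting estimate reduces to rate times GPUs times hours. The sketch below encodes that observation; the function name and the 730-hour month are assumptions, and only the GPU counts that actually appear in the list are accepted.

```python
from decimal import Decimal

# USD per hour for a single GPU on the Standard plan, from the hosting list above
HOURLY_RATE_PER_GPU = {
    "L40S": Decimal("4.43"),
    "A100": Decimal("5.8"),
    "H100": Decimal("14.5"),
    "H200": Decimal("16"),
}
# GPU counts that the list above offers for each GPU type
LISTED_COUNTS = {
    "L40S": (1, 2),
    "A100": (1, 2, 4, 8),
    "H100": (1, 2, 4, 8),
    "H200": (1, 2, 4, 8),
}


def hosting_cost_usd(gpu: str, count: int, hours: int) -> Decimal:
    """Hosting cost in USD for a listed GPU configuration kept running for `hours`."""
    if count not in LISTED_COUNTS[gpu]:
        raise ValueError(f"{count} x {gpu} is not a listed configuration")
    return HOURLY_RATE_PER_GPU[gpu] * count * hours


# Assumption: a deployment left running for a 730-hour month
monthly = hosting_cost_usd("A100", 2, 730)
print(f"2 x A100 for 730 hours: USD {monthly:,.2f}")
```

Since hosting is billed by the hour rather than by the token, an estimate like this is most useful when set against the pay-as-you-go cost of the expected token volume.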

Footnotes

¹ For foundation model inference, charges are based on a Resource Unit (RU) metric equivalent to 1,000 tokens (including both input and output tokens).

² Mistral commercial models have a GPU hosting fee and a model access fee. For more information, view the documentation.

* Prices shown are indicative, may vary by country, exclude any applicable taxes and duties, and are subject to product offering availability in a locale.

^ Capacity Unit Hour pricing depends on the environment and tools utilized within a billing month.

³ Unless otherwise specified under Software pricing, all features, capabilities, and potential updates refer exclusively to SaaS. IBM makes no representation that SaaS and software features and capabilities will be the same.
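Footnote ¹ meters inference in Resource Units while the tables quote prices per million tokens, so the two need to be reconciled when reading a bill. The sketch below shows the conversion as a rough illustration; the function names are hypothetical, and because the page does not say whether partial Resource Units are rounded for billing, fractional values are kept as they are.

```python
from decimal import Decimal

TOKENS_PER_RU = 1000  # footnote ¹: one Resource Unit equals 1,000 tokens


def resource_units(input_tokens: int, output_tokens: int) -> Decimal:
    """Resource Units consumed; input and output tokens both count."""
    return Decimal(input_tokens + output_tokens) / TOKENS_PER_RU


def price_per_ru(price_per_million_tokens: Decimal) -> Decimal:
    """Convert a per-million-token price into a per-Resource-Unit price."""
    return price_per_million_tokens * TOKENS_PER_RU / 1_000_000


# Hypothetical request: 1,200 input tokens and 300 output tokens
rus = resource_units(1_200, 300)
# A model listed at USD 0.20 per million tokens
cost = rus * price_per_ru(Decimal("0.20"))
print(f"{rus} RU, USD {cost}")
```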