Supported foundation models available with watsonx.ai
A collection of open source and IBM foundation models are deployed in IBM watsonx.ai. You can prompt the deployed foundation models in the Prompt Lab or programmatically.
The following models are available in watsonx.ai:
- granite-13b-chat-v2
- granite-13b-instruct-v2
- granite-7b-lab
- granite-8b-japanese
- granite-20b-multilingual
- granite-3b-code-instruct
- granite-8b-code-instruct
- granite-20b-code-instruct
- granite-34b-code-instruct
- allam-1-13b-instruct
- codellama-34b-instruct
- elyza-japanese-llama-2-7b-instruct
- flan-t5-xl-3b
- flan-t5-xxl-11b
- flan-ul2-20b
- jais-13b-chat
- llama-3-8b-instruct
- llama-3-70b-instruct
- llama-2-13b-chat
- llama-2-70b-chat
- llama2-13b-dpo-v7
- merlinite-7b
- mistral-large
- mixtral-8x7b-instruct-v01
- mixtral-8x7b-instruct-v01-q
- mt0-xxl-13b
To understand how the model provider, instruction tuning, token limits, and other factors can affect which model you choose, see Choosing a model.
IBM foundation models
The following table lists the supported foundation models that IBM provides for inferencing. All IBM models are instruction-tuned.
Some IBM foundation models are also available from Hugging Face. License terms for IBM models that you access from Hugging Face are available from the Hugging Face website. For more information about contractual protections related to IBM indemnification for IBM foundation models that you access in watsonx.ai, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
Model name | IBM indemnification | Billing class | Maximum tokens Context (input + output) |
Supported tasks | More information |
---|---|---|---|---|---|
granite-13b-chat-v2 | Yes | Class 1 | 8192 | • classification • extraction • generation • question answering • summarization |
• Model card • Website • Research paper |
granite-13b-instruct-v2 | Yes | Class 1 | 8192 | • classification • extraction • generation • question answering • summarization |
• Model card • Website • Research paper Note: This foundation model can be tuned.
|
granite-7b-lab | Yes | Class 1 | 8192 | • classification • extraction • generation • question answering • retrieval-augmented generation • summarization |
• Model card • Research paper (LAB) |
granite-8b-japanese | Yes | Class 1 | 8192 | • classification • extraction • generation • question answering • summarization |
• Model card • Website • Research paper |
granite-20b-multilingual | Yes | Class 1 | 8192 | • classification • extraction • generation • question answering • summarization |
• Model card • Website • Research paper |
granite-3b-code-instruct | Yes | Class 1 | 2048 | • code • classification • extraction • generation • question answering • summarization |
• Model card • Website • Research paper |
granite-8b-code-instruct | Yes | Class 1 | 4096 | • code • classification • extraction • generation • question answering • summarization |
• Model card • Website • Research paper |
granite-20b-code-instruct | Yes | Class 1 | 8192 | • code • classification • extraction • generation • question answering • summarization |
• Model card • Research paper |
granite-34b-code-instruct | Yes | Class 1 | 8192 | • code • classification • extraction • generation • question answering • summarization |
• Model card • Research paper |
For more information about the supported foundation models that IBM provides for embedding text, see Supported embedding models.
Third-party foundation models
The following table lists the supported foundation models that third parties provide. All third-party models are instruction-tuned.
Model name | Provider | Billing class | Maximum tokens Context (input + output) |
Supported tasks | More information |
---|---|---|---|---|---|
allam-1-13b-instruct | National Center for Artificial Intelligence and Saudi Authority for Data and Artificial Intelligence | Class 2 | 4096 | • classification • extraction • generation • question answering • retrieval-augmented generation • summarization • translation |
• Model card (Frankfurt data center) |
codellama-34b-instruct | Code Llama | Class 2 | 16,384 | • code | • Model card • Meta AI Blog |
elyza-japanese-llama-2-7b-instruct | ELYZA, Inc | Class 2 | 4096 | • classification • extraction • generation • question answering • retrieval-augmented generation • summarization • translation |
• Model card • Blog on note.com |
flan-t5-xl-3b | Class 1 | 4096 | • classification • extraction • generation • question answering • retrieval-augmented generation • summarization |
• Model card • Research paper Note: This foundation model can be tuned.
|
|
flan-t5-xxl-11b | Class 2 | 4096 | • classification • extraction • generation • question answering • retrieval-augmented generation • summarization |
• Model card • Research paper |
|
flan-ul2-20b | Class 3 | 4096 | • classification • extraction • generation • question answering • retrieval-augmented generation • summarization |
• Model card • UL2 research paper • Flan research paper |
|
jais-13b-chat | Inception, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), and Cerebras Systems | Class 2 | 2048 | • classification • extraction • generation • question answering • retrieval-augmented generation • summarization • translation |
• Model card • Research paper |
llama-3-8b-instruct | Meta | Class 1 | 8192 | • classification • code • extraction • generation • question answering • retrieval-augmented generation • summarization |
• Model card • Meta AI website |
llama-3-70b-instruct | Meta | Class 2 | 8192 | • classification • code • extraction • generation • question answering • retrieval-augmented generation • summarization |
• Model card • Meta AI website |
llama-2-13b-chat | Meta | Class 1 | 4096 | • classification • code • extraction • generation • question answering • retrieval-augmented generation • summarization |
• Model card • Research paper Note: This foundation model can be tuned.
|
llama-2-70b-chat | Meta | Class 2 | 4096 | • classification • code • extraction • generation • question answering • retrieval-augmented generation • summarization |
• Model card • Research paper |
llama2-13b-dpo-v7 | Meta | Class 2 | 4096 | • classification • code • extraction • generation • question answering • retrieval-augmented generation • summarization |
• Model card • Research paper (DPO) |
merlinite-7b | Mistral AI and IBM | Class 1 | 32,768 | • classification • extraction • generation • retrieval-augmented generation • summarization |
• Model card • Research paper (LAB) |
mistral-large | Mistral AI | Mistral Large | 32,768 | • classification • code • extraction • generation • retrieval-augmented generation • summarization • translation |
• Model card • Mistral AI website |
mixtral-8x7b-instruct-v01 | Mistral AI | Class 1 | 32,768 | • classification • code • extraction • generation • retrieval-augmented generation • summarization • translation |
• Model card • Research paper |
mixtral-8x7b-instruct-v01-q |
Mistral AI and IBM | Class 1 | 32,768 | • classification • code • extraction • generation • retrieval-augmented generation • summarization • translation |
• Research paper |
mt0-xxl-13b | BigScience | Class 2 | 4096 | • classification • generation • question answering • summarization |
• Model card • Research paper |
- For a list of which models are provided in each regional data center, see Regional availability of foundation model.
- For information about pricing and rate limiting, see Watson Machine Learning plans.
Foundation model details
The available foundation models support a range of use cases for both natural languages and programming languages. To see the types of tasks that these models can do, review and try the sample prompts.
allam-1-13b-instruct
The allam-1-13b-instruct foundation model is a bilingual large language model for Arabic and English provided by the National Center for Artificial Intelligence and supported by the Saudi Authority for Data and Artificial Intelligence that is fine-tuned to support conversational tasks. The ALLaM series is a collection of powerful language models designed to advance Arabic language technology. These models are initialized with Llama-2 weights and undergo training on both Arabic and English languages.
- Usage
- Supports Q&A, summarization, classification, generation, extraction, and translation in Arabic.
- Cost
- Class 2. For pricing details, see Watson Machine Learning plans.
- Try it out
- Experiment with samples:
- Size
- 13 billion parameters
- Token limits
- Context window length (input + output): 4096
- Supported natural languages
- Arabic (Modern Standard Arabic) and English
- Instruction tuning information
- allam-1-13b-instruct is based on the Allam-13b-base model, which is a foundation model that is pre-trained on a total of 3 trillion tokens in English and Arabic, including the tokens seen from its initialization. The Arabic data set contains 500 billion tokens after cleaning and deduplication. The additional data is collected from open-source collections and web crawls. The allam-1-13b-instruct foundation model is fine-tuned with a curated set of 4 million Arabic and 6 million English prompt-and-response pairs.
- Model architecture
- Decoder-only
- License
- Llama 2 community license and ALLaM license
- Learn more
- Read the following resources:
codellama-34b-instruct
A programmatic code generation model that is based on Llama 2 from Meta. Code Llama is fine-tuned for generating and discussing code.
- Usage
- Use Code Llama to create prompts that generate code based on natural language inputs, explain code, or that complete and debug code.
- Cost
- Class 2. For pricing details, see Watson Machine Learning plans.
- Try it out
- Experiment with samples:
- Size
- 34 billion parameters
- Token limits
- Context window length (input + output): 16,384
- Note: The maximum new tokens, which means the tokens generated by the foundation model, is limited to 8192.
- Supported natural languages
- English
- Supported programming languages
- The codellama-34b-instruct-hf foundation model supports many programming languages, including Python, C++, Java, PHP, Typescript (Javascript), C#, Bash, and more.
- Instruction tuning information
- The instruction fine-tuned version was fed natural language instruction input and the expected output to guide the model to generate helpful and safe answers in natural language.
- Model architecture
- Decoder
- License
- License
- Learn more
- Read the following resources:
elyza-japanese-llama-2-7b-instruct
The elyza-japanese-llama-2-7b-instruct model is provided by ELYZA, Inc on Hugging Face. The elyza-japanese-llama-2-7b-instruct foundation model is a version of the Llama 2 model from Meta that is trained to understand and generate Japanese text. The model is fine-tuned for solving various tasks that follow user instructions and for participating in a dialog.
- Usage
- General use with zero- or few-shot prompts. Works well for classification and extraction in Japanese and for translation between English and Japanese. Performs best when prompted in Japanese.
- Cost
- Class 2. For pricing details, see Watson Machine Learning plans.
- Try it out
- Experiment with samples:
- Sample prompt: Classification
- Sample prompt: Translation
- Size
- 7 billion parameters
- Token limits
- Context window length (input + output): 4096
- Supported natural languages
- Japanese, English
- Instruction tuning information
- For Japanese language training, Japanese text from many sources were used, including Wikipedia and the Open Super-large Crawled ALMAnaCH coRpus (a multilingual corpus that is generated by classifying and filtering language in the Common Crawl corpus). The model was fine-tuned on a data set that was created by ELYZA. The ELYZA Tasks 100 data set contains 100 diverse and complex tasks that were created manually and evaluated by humans. The ELYZA Tasks 100 data set is publicly available from HuggingFace.
- Model architecture
- Decoder
- License
- License
- Learn more
- Read the following resources:
flan-t5-xl-3b
The flan-t5-xl-3b model is provided by Google on Hugging Face. This model is based on the pretrained text-to-text transfer transformer (T5) model and uses instruction fine-tuning methods to achieve better zero- and few-shot performance. The model is also fine-tuned with chain-of-thought data to improve its ability to perform reasoning tasks.
- Usage
-
General use with zero- or few-shot prompts.
- Cost
-
Class 1. For pricing details, see Watson Machine Learning plans.
- Try it out
- Size
-
3 billion parameters
- Token limits
-
Context window length (input + output): 4096
Note: Lite plan output is limited to 700
- Supported natural languages
-
Multilingual
- Instruction tuning information
-
The model was fine-tuned on tasks that involve multiple-step reasoning from chain-of-thought data in addition to traditional natural language processing tasks. Details about the training data sets used are published.
- Model architecture
-
Encoder-decoder
- License
- Learn more
-
Read the following resources:
flan-t5-xxl-11b
The flan-t5-xxl-11b model is provided by Google on Hugging Face. This model is based on the pretrained text-to-text transfer transformer (T5) model and uses instruction fine-tuning methods to achieve better zero- and few-shot performance. The model is also fine-tuned with chain-of-thought data to improve its ability to perform reasoning tasks.
- Usage
-
General use with zero- or few-shot prompts.
- Cost
-
Class 2. For pricing details, see Watson Machine Learning plans.
- Try it out
-
Experiment with samples:
- Size
-
11 billion parameters
- Token limits
-
Context window length (input + output): 4096
Note: Lite plan output is limited to 700
- Supported natural languages
-
English, German, French
- Instruction tuning information
-
The model was fine-tuned on tasks that involve multiple-step reasoning from chain-of-thought data in addition to traditional natural language processing tasks. Details about the training data sets used are published.
- Model architecture
-
Encoder-decoder
- License
- Learn more
-
Read the following resources:
flan-ul2-20b
The flan-ul2-20b model is provided by Google on Hugging Face. This model was trained by using the Unifying Language Learning Paradigms (UL2). The model is optimized for language generation, language understanding, text classification, question answering, common sense reasoning, long text reasoning, structured-knowledge grounding, and information retrieval, in-context learning, zero-shot prompting, and one-shot prompting.
- Usage
-
General use with zero- or few-shot prompts.
- Cost
-
Class 3. For pricing details, see Watson Machine Learning plans.
- Try it out
-
Experiment with samples:
- Sample prompts
- Sample prompt: Earnings call summary
- Sample prompt: Meeting transcript summary
- Sample prompt: Scenario classification
- Sample prompt: Sentiment classification
- Sample prompt: Thank you note generation
- Sample prompt: Named entity extraction
- Sample prompt: Fact extraction
- Sample notebook: Use watsonx to summarize cybersecurity documents
- Sample notebook: Use watsonx and LangChain to answer questions by using retrieval-augmented generation (RAG)
- Sample notebook: Use watsonx, Elasticsearch, and LangChain to answer questions (RAG)
- Sample notebook: Use watsonx, and Elasticsearch Python library to answer questions (RAG)
- Size
-
20 billion parameters
- Token limits
-
Context window length (input + output): 4096
Note: Lite plan output is limited to 700
- Supported natural languages
-
English
- Instruction tuning information
-
The flan-ul2-20b model is pretrained on the colossal, cleaned version of Common Crawl's web crawl corpus. The model is fine-tuned with multiple pretraining objectives to optimize it for various natural language processing tasks. Details about the training data sets used are published.
- Model architecture
-
Encoder-decoder
- License
- Learn more
-
Read the following resources:
granite-13b-chat-v2
The granite-13b-chat-v2 model is provided by IBM. This model is optimized for dialog use cases and works well with virtual agent and chat applications.
Usage: Generates dialog output like a chatbot. Uses a model-specific prompt format. Includes a keyword in its output that can be used as a stop sequence to produce succinct answers.
Cost: Class 1. For pricing details, see Watson Machine Learning plans.
- Try it out
- Size
-
13 billion parameters
- Token limits
-
Context window length (input + output): 8192
- Supported natural languages
-
English
- Instruction tuning information
-
The Granite family of models is trained on enterprise-relevant data sets from five domains: internet, academic, code, legal, and finance. Data used to train the models first undergoes IBM data governance reviews and is filtered of text that is flagged for hate, abuse, or profanity by the IBM-developed HAP filter. IBM shares information about the training methods and data sets used.
- Model architecture
-
Decoder
- License
-
IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
- Learn more
-
Read the following resources:
granite-13b-instruct-v2
The granite-13b-instruct-v2 model is provided by IBM. This model was trained with high-quality finance data, and is a top-performing model on finance tasks. Financial tasks evaluated include: providing sentiment scores for stock and earnings call transcripts, classifying news headlines, extracting credit risk assessments, summarizing financial long-form text, and answering financial or insurance-related questions.
- Usage
-
Supports extraction, summarization, and classification tasks. Generates useful output for finance-related tasks. Uses a model-specific prompt format. Accepts special characters, which can be used for generating structured output.
- Cost
-
Class 1. For pricing details, see Watson Machine Learning plans.
- Try it out
-
Experiment with samples:
- Size
-
13 billion parameters
- Token limits
-
Context window length (input + output): 8192
- Supported natural languages
-
English
- Instruction tuning information
-
The Granite family of models is trained on enterprise-relevant data sets from five domains: internet, academic, code, legal, and finance. Data used to train the models first undergoes IBM data governance reviews and is filtered of text that is flagged for hate, abuse, or profanity by the IBM-developed HAP filter. IBM shares information about the training methods and data sets used.
- Model architecture
-
Decoder
- License
-
IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
- Learn more
-
Read the following resources:
granite-7b-lab
The granite-7b-lab foundation model is provided by IBM. The granite-7b-lab foundation model uses a novel alignment tuning method from IBM Research. Large-scale Alignment for chatBots, or LAB is a method for adding new skills to existing foundation models by generating synthetic data for the skills, and then using that data to tune the foundation model.
- Usage
- Supports general purpose tasks, including extraction, summarization, classification, and more.
- Cost
-
Class 1. For pricing details, see Watson Machine Learning plans.
- Try it out
- Size
-
7 billion parameters
- Token limits
-
Context window length (input + output): 8192
Note: The maximum new tokens, which means the tokens generated by the foundation model, is limited to 4096.
- Supported natural languages
-
English
- Instruction tuning information
-
The granite-7b-lab foundation model is trained iteratively by using the large-scale alignment for chatbots (LAB) methodology.
- Model architecture
-
Decoder
- License
-
IBM-developed foundation models are considered part of the IBM Cloud Service. When you use the granite-7b-lab foundation model that is provided in watsonx.ai the contractual protections related to IBM indemnification apply. See the IBM Client Relationship Agreement and IBM watsonx.ai service description.
- Learn more
-
Read the following resources:
granite-8b-japanese
The granite-8b-japanese model is provided by IBM. The granite-8b-japanese foundation model is based on the IBM Granite Instruct foundation model and is trained to understand and generate Japanese text.
- Usage
-
Useful for general purpose tasks in the Japanese language, such as classification, extraction, question-answering, and for language translation between Japanese and English.
- Cost
-
Class 1. For pricing details, see Watson Machine Learning plans.
- Try it out
-
Experiment with samples:
- Size
-
8 billion parameters
- Token limits
-
Context window length (input + output): 8192
- Supported natural languages
-
English, Japanese
- Instruction tuning information
-
The Granite family of models is trained on enterprise-relevant data sets from five domains: internet, academic, code, legal, and finance. The granite-8b-japanese model was pretrained on 1 trillion tokens of English and 0.5 trillion tokens of Japanese text.
- Model architecture
-
Decoder
- License
-
IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
- Learn more
-
Read the following resources:
granite-20b-multilingual
A foundation model from the IBM Granite family. The granite-20b-multilingual foundation model is based on the IBM Granite Instruct foundation model and is trained to understand and generate text in English, German, Spanish, French, and Portuguese.
- Usage
- English, German, Spanish, French, and Portuguese closed-domain question answering, summarization, generation, extraction, and classification.
- Cost
-
Class 1. For pricing details, see Watson Machine Learning plans.
- Try it out
- Size
-
13 billion parameters
- Token limits
-
Context window length (input + output): 8192
- Supported natural languages
-
English, German, Spanish, French, and Portuguese
- Instruction tuning information
-
The Granite family of models is trained on enterprise-relevant data sets from five domains: internet, academic, code, legal, and finance. Data used to train the models first undergoes IBM data governance reviews and is filtered of text that is flagged for hate, abuse, or profanity by the IBM-developed HAP filter. IBM shares information about the training methods and data sets used.
- Model architecture
-
Decoder
- License
-
IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
- Learn more
-
Read the following resources:
Granite code models
Foundation models from the IBM Granite family. The Granite code foundation models are instruction-following models fine-tuned using a combination of Git commits paired with human instructions and open-source synthetically generated code instruction data sets.
- Usage
-
Granite code foundation models are designed to respond to coding-related instructions and can be used to build coding assitants.
- Cost
-
Class 1. For pricing details, see Watson Machine Learning plans.
- Try it out
-
Experiment with samples:
- Available sizes
-
The model is available in the following sizes:
- 3 billion parameters
- 8 billion parameters
- 20 billion parameters
- 34 billion parameters
- Token limits
-
Context window length (input + output)
- granite-3b-code-instruct : 2048
- granite-8b-code-instruct : 4096
- granite-20b-code-instruct : 8192
- granite-34b-code-instruct : 8192
- Supported natural languages
-
English
- Supported programming languages
-
The Granite code foundation models support 116 programming languages including Python, Javascript, Java, C++, Go, and Rust. For the full list, see IBM foundation models.
- Instruction tuning information
-
These models were fine-tuned from Granite code base models on a combination of permissively licensed instruction data to enhance instruction-following capabilities including logical reasoning and problem-solving skills.
- Model architecture
-
Decoder
- License
-
IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.
- Learn more
-
Read the following resources:
jais-13b-chat
The jais-13b-chat foundation model is a bilingual large language model for Arabic and English that is fine-tuned to support conversational tasks.
- Usage
- Supports Q&A, summarization, classification, generation, extraction, and translation in Arabic.
- Cost
- Class 2. For pricing details, see Watson Machine Learning plans.
- Try it out
- Sample prompt: Arabic chat
- Size
- 13 billion parameters
- Token limits
- Context window length (input + output): 2048
- Supported natural languages
- Arabic (Modern Standard Arabic) and English
- Instruction tuning information
- Jais-13b-chat is based on the Jais-13b model, which is a foundation model that is trained on 116 billion Arabic tokens and 279 billion English tokens. Jais-13b-chat is fine-tuned with a curated set of 4 million Arabic and 6 million English prompt-and-response pairs.
- Model architecture
- Decoder
- License
- Apache 2.0
- Learn more
- Read the following resources:
Llama 3 Chat
Meta Llama 3 foundation models are accessible, open large language model that are built with Meta Llama 3 and provided by Meta on Hugging Face. The Llama 3 foundation models are instruction fine-tuned language models that can support various use cases.
Usage: Generates dialog output like a chatbot.
- Cost
-
- 8b: Class 1
- 70b: Class 2
For pricing details, see Watson Machine Learning plans.
- Try it out
- Available sizes
-
- 8 billion parameters
- 70 billion parameters
- Token limits
-
Context window length (input + output): 8192
Note: The maximum new tokens, which means the tokens generated by the foundation model, is limited to 4096.
- Supported natural languages
-
English
- Instruction tuning information
-
Llama 3 features improvements in post-training procedures that reduce false refusal rates, improve alignment, and increase diversity in the foundation model output. The result is better reasoning, code generation, and instruction-following capabilities. Llama 3 has more training tokens (15T) that result in better language comprehension.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
Llama 2 Chat
The Llama 2 Chat model is provided by Meta on Hugging Face. The fine-tuned model is useful for chat generation. The model is pretrained with publicly available online data and fine-tuned using reinforcement learning from human feedback.
You can choose to use the 13 billion parameter or 70 billion parameter version of the model.
- Usage
-
Generates dialog output like a chatbot. Uses a model-specific prompt format.
- Cost
-
- 13b: Class 1
- 70b: Class 2
For pricing details, see Watson Machine Learning plans.
- Try it out
-
Experiment with samples:
- Sample prompt: Chat with Llama 2
- Sample prompt: Questions about an article
- Sample notebook: Use watsonx and Meta llama-2-70b-chat to answer questions about an article
- Sample notebook: Use watsonx and Meta llama-2-70b-chat to answer questions about an article
- Sample notebook: Use watsonx to tune Meta llama-2-13b-chat model with CFPB documents
- Available sizes
-
- 13 billion parameters
- 70 billion parameters
- Token limits
-
Context window length (input + output): 4096
Lite plan output is limited as follows:
- 70b version: 900
- 13b version: 2048
- Supported natural languages
-
English
- Instruction tuning information
-
Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction data sets and more than one million new examples that were annotated by humans.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
llama2-13b-dpo-v7
The llama2-13b-dpo-v7 foundation model is provided by Minds & Company. The llama2-13b-dpo-v7 foundation model is a version of llama2-13b foundation model from Meta that is instruction-tuned and fine-tuned by using the direct preference optimzation method to handle Korean.
- Usage
- Suitable for many tasks, including classification, extraction, summarization, code creation and conversion, question-answering, generation, and retreival-augmented generation in Korean.
- Cost
- Class 2. For pricing details, see Watson Machine Learning plans.
- Try it out
- Experiment with samples:
- Size
- 13.2 billion parameters
- Token limits
- Context window length (input + output): 4096
- Supported natural languages
- English, Korean
- Instruction tuning information
- Direct preference optimzation (DPO) is an alternative to reinforcement learning from human feedback. With reinforcement learning from human feedback, responses must be sampled from a language model and an intermediate step of training a reward model is required. The direct preference optimzation uses a binary method of reinforcement learning where the model chooses the best of two answers based on preference data.
- Model architecture
- Decoder-only
- License
- License
- Learn more
- Read the following resources:
merlinite-7b
The merlinite-7b foundation model is provided by Mistral AI and tuned by IBM. The merlinite-7b foundation model is a derivative of the Mistral-7B-v0.1 model that is tuned with a novel alignment tuning method from IBM Research. Large-scale Alignment for chatBots, or LAB is a method for adding new skills to existing foundation models by generating synthetic data for the skills, and then using that data to tune the foundation model.
- Usage
- Supports general purpose tasks, including extraction, summarization, classification, and more.
- Cost
-
Class 1. For pricing details, see Watson Machine Learning plans.
- Try it out
- Size
-
7 billion parameters
- Token limits
-
Context window length (input + output): 32,768
Note: The maximum new tokens, which means the tokens generated by the foundation model, is limited to 8192.
- Supported natural languages
-
English
- Instruction tuning information
-
The merlinite-7b foundation model is trained iteratively by using the large-scale alignment for chatbots (LAB) methodology.
- Model architecture
-
Decoder
- License
- Learn more
-
Read the following resources:
mistral-large
Mistral Large is a large language model developed by Mistral Al. The mistral-large foundation model is fluent in and understands the grammar and cultural context of English, French, Spanish, German, and Italian. The foundation model can also understand dozens of other languages. The model has a large context window, which means you can add large documents as contextual information in prompts that you submit for retrieval-augmented generation (RAG) use cases. The mistral-large foundation model is effective at programmatic tasks, such as generating, reviewing, and commenting on code, and can generate results in JSON format.
- Usage
-
Suitable for complex multilingual reasoning tasks, including text understanding, transformation, and code generation. Due to the model's large context window, use the max tokens parameter to specify a token limit when prompting the model.
- Cost
-
Mistral Large. For pricing details, see Watson Machine Learning plans.
- Try it out
- Token limits
-
Context window length (input + output): 32,768
Note: The maximum new tokens, which means the tokens generated by the foundation model, is limited to 16,384.
- Supported natural languages
-
English, French, German, Italian, Spanish, and dozens of other languages.
- Instruction tuning information
-
The Mistral Large foundation model is pre-trained on diverse data sets like text, codebases, and mathematical data from various domains.
- Model architecture
-
Decoder-only
- License
-
For terms of use, including information about contractual protections related to indemnification, see Terms of use.
- Learn more
-
Read the following resources:
mixtral-8x7b-instruct-v01
The mixtral-8x7b-instruct-v01 foundation model is provided by Mistral AI. The mixtral-8x7b-instruct-v01 foundation model is a pretrained generative sparse mixture-of-experts network that groups the model parameters, and then for each token chooses a subset of groups (referred to as experts) to process the token. As a result, each token has access to 47 billion parameters, but only uses 13 billion active parameters for inferencing, which reduces costs and latency.
- Usage
-
Suitable for many tasks, including classification, summarization, generation, code creation and conversion, and language translation. Due to the model's unusually large context window, use the max tokens parameter to specify a token limit when prompting the model.
- Cost
-
Class 1. For pricing details, see Watson Machine Learning plans.
- Try it out
- Size
-
46.7 billion parameters
- Token limits
-
Context window length (input + output): 32,768
Note: The maximum new tokens, which means the tokens generated by the foundation model, is limited to 16,384.
- Supported natural languages
-
English, French, German, Italian, Spanish
- Instruction tuning information
-
The Mixtral foundation model is pretrained on internet data. The Mixtral 8x7B Instruct foundation model is fine-tuned to follow instructions.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
mixtral-8x7b-instruct-v01-q (Deprecated)
The mixtral-8x7b-instruct-v01-q model is provided by IBM. The mixtral-8x7b-instruct-v01-q foundation model is a quantized version of the Mixtral 8x7B Instruct foundation model from Mistral AI.
The underlying Mixtral 8x7B foundation model is a sparse mixture-of-experts network that groups the model parameters, and then for each token chooses a subset of groups (referred to as experts) to process the token. As a result, each token has access to 47 billion parameters, but only uses 13 billion active parameters for inferencing, which reduces costs and latency.
- Usage
-
Suitable for many tasks, including classification, summarization, generation, code creation and conversion, and language translation. Due to the model's unusually large context window, use the max tokens parameter to specify a token limit when prompting the model.
- Cost
-
Class 1. For pricing details, see Watson Machine Learning plans.
- Try it out
- Size
-
8 x 7 billion parameters
- Token limits
-
Context window length (input + output): 32,768
Note: The maximum new tokens, which means the tokens generated by the foundation model, is limited to 4096.
- Supported natural languages
-
English, French, German, Italian, Spanish
- Instruction tuning information
-
The Mixtral foundation model is pretrained on internet data. The Mixtral 8x7B Instruct foundation model is fine-tuned to follow instructions.
The IBM-tuned model uses the AutoGPTQ (Post-Training Quantization for Generative Pre-Trained Transformers) method to compress the model weight values from 16-bit floating point data types to 4-bit integer data types during data transfer. The weights decompress at computation time. Compressing the weights to transfer data reduces the GPU memory and GPU compute engine size requirements of the model.
- Model architecture
-
Decoder-only
- License
- Learn more
-
Read the following resources:
mt0-xxl-13b
The mt0-xxl-13b model is provided by BigScience on Hugging Face. The model is optimized to support language generation and translation tasks with English, languages other than English, and multilingual prompts.
Usage: General use with zero- or few-shot prompts. For translation tasks, include a period to indicate the end of the text you want translated or the model might continue the sentence rather than translate it.
- Cost
-
Class 2. For pricing details, see Watson Machine Learning plans.
- Try it out
-
Experiment with the following samples:
- Size
-
13 billion parameters
- Supported natural languages
-
Multilingual
- Token limits
-
Context window length (input + output): 4096
Note: Lite plan output is limited to 700
- Supported natural languages
-
The model is pretrained on multilingual data in 108 languages and fine-tuned with multilingual data in 46 languages to perform multilingual tasks.
- Instruction tuning information
-
BigScience publishes details about its code and data sets.
- Model architecture
-
Encoder-decoder
- License
- Learn more
-
Read the following resources:
Any deprecated foundation models are highlighted with a deprecated warning icon . Any withdrawn foundation models are highlighted with a withdrawn warning icon . For more information about deprecation, including foundation model withdrawal dates, see Foundation model lifecycle.
Learn more
Parent topic: Developing generative AI solutions