Supported hardware, model architectures, and performance settings
Review the foundation model architectures, and performance-boosting settings and features available for deploying a custom foundation model with watsonx.ai.
Note: The required watsonx.ai service and other supplemental services are not available by default. An administrator must install these services on the IBM Cloud Pak for Data platform. To determine whether a service is installed, open the Services catalog and check whether the service is enabled.
Deploying custom foundation models is available starting with Cloud Pak for Data 4.8.4.
When you deploy a custom foundation model, consider the following requirements:
- Make sure that your hardware supports deploying a custom foundation model.
- Make sure that the model that you are deploying uses a supported architecture.
- Choose a hardware specification for deploying a custom foundation model, based on the size and number of parameters for the model.
Supported software specification
The required software specification for deploying a custom foundation model is watsonx-cfm-caikit-1.0. This specification is not editable.
Hardware requirements
To deploy custom foundation models in Cloud Pak for Data, you must have the NVIDIA A100 or H100 80 GB hardware set up on your cluster.
Supported model architectures
To find the architecture type for your custom foundation model, see Planning to deploy a custom foundation model.
You must deploy foundation models for which the model architecture is supported by watsonx.ai.
The following table provides information about architectures, quantization methods, and parallel tensors that are supported for each architecture:
Model architecture type | Supported quantization method | Supports parallel tensors (multiGpu)
---|---|---
bloom | N/A | Yes
codegen | N/A | No
falcon | N/A | Yes
gpt_bigcode | gptq | Yes
gpt_neox | N/A | Yes
gptj | N/A | No
llama | gptq | Yes
mixtral | gptq | No
mistral | N/A | No
mt5 | N/A | No
mpt | N/A | No
t5 | N/A | Yes
IBM does not provide support for deployment failures that result from deploying foundation models with unsupported architectures. If you deploy a model with an unsupported architecture, you might receive a warning message.
Predefined hardware specifications
Choose a hardware specification when you deploy a custom foundation model.
- WX-S: 1 GPU, 2 CPU, and 60 GB of memory
- WX-M: 2 GPU, 3 CPU, and 120 GB of memory
- WX-L: 4 GPU, 5 CPU, and 240 GB of memory
- WX-XL: 8 GPU, 9 CPU, and 600 GB of memory
Assign a hardware specification to your custom foundation model, based on the number of parameters that the model was trained with:
- 1B to 20B parameters: WX-S
- 21B to 40B parameters: WX-M
- 41B to 80B parameters: WX-L
- 80B to 200B parameters: WX-XL

Note:
- The codegen, gptj, mpt, mistral, mixtral, and mt5 architecture types support one GPU, so regardless of the number of parameters, you must assign the WX-S hardware specification to them.
- You cannot use predefined model specifications with quantized models. For quantized models and in other non-standard cases, use a custom hardware specification.
Creating custom hardware specifications
Follow these guidelines to optionally create a custom hardware specification for your model:
Non-quantized models:
Resource | Calculation
---|---
GPU memory | (Number of billion parameters * 2) + 50% additional memory
Number of GPUs | Depends on GPU memory requirements: 1 GPU = 80 GB
Number of CPUs | Number of GPUs + 1
CPU memory | Equal to GPU memory
Quantized models:
4-bit quantized models:
Resource | Calculation
---|---
GPU memory | (Number of billion parameters * 0.5) + 50% additional memory
Number of GPUs | Depends on GPU memory requirements: 1 GPU = 80 GB
Number of CPUs | Number of GPUs + 1
CPU memory | Equal to GPU memory
8-bit quantized models:
Resource | Calculation
---|---
GPU memory | Number of billion parameters + 50% additional memory
Number of GPUs | Depends on GPU memory requirements: 1 GPU = 80 GB
Number of CPUs | Number of GPUs + 1
CPU memory | Equal to GPU memory
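The three sizing tables can be condensed into one calculation: multiply the parameter count (in billions) by the bytes per parameter for the precision (2 for non-quantized, 0.5 for 4-bit, 1 for 8-bit), then add 50%. The sketch below is illustrative only; the function name `size_custom_spec` and the return structure are assumptions, not part of any watsonx.ai API.

```python
import math

# Illustrative calculator for the sizing formulas in the tables above.
# Bytes per parameter: 2 (non-quantized), 0.5 (4-bit), 1 (8-bit).
BYTES_PER_PARAM = {"none": 2.0, "4bit": 0.5, "8bit": 1.0}

def size_custom_spec(billion_params: float, quantization: str = "none") -> dict:
    """Compute custom hardware specification resources for a model."""
    # GPU memory = params * bytes-per-param + 50% additional memory.
    gpu_memory_gb = billion_params * BYTES_PER_PARAM[quantization] * 1.5
    # 1 GPU = 80 GB, so round the memory requirement up to whole GPUs.
    num_gpus = math.ceil(gpu_memory_gb / 80)
    return {
        "gpu_memory_gb": gpu_memory_gb,
        "num_gpus": num_gpus,
        "num_cpus": num_gpus + 1,          # Number of GPUs + 1
        "cpu_memory_gb": gpu_memory_gb,    # Equal to GPU memory
    }

# Example: a 70B non-quantized model needs 70 * 2 * 1.5 = 210 GB of
# GPU memory, which rounds up to 3 GPUs (and therefore 4 CPUs).
print(size_custom_spec(70))
```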
Use the following code sample to create a custom hardware specification for your model in a Project:
curl -ik -X POST -H "Authorization: Bearer $TOKEN" "https://<cluster_url>/v2/hardware_specifications?project_id=$project_id" \
-H "Content-Type:application/json" \
--data '{
"name": "custom_hw_spec",
"description": "Custom hardware specification for foundation models",
"nodes": {
"cpu": {
"units": "2"
},
"mem": {
"size": "128Gi"
},
"gpu": {
"num_gpu": 1
}
}
}'
Use the following code sample to create a custom hardware specification for your model in a deployment space:
curl -ik -X POST -H "Authorization: Bearer $TOKEN" "https://<cluster_url>/v2/hardware_specifications?space_id=$space_id" \
-H "Content-Type:application/json" \
--data '{
"name": "custom_hw_spec",
"description": "Custom hardware specification for foundation models",
"nodes": {
"cpu": {
"units": "2"
},
"mem": {
"size": "128Gi"
},
"gpu": {
"num_gpu": 1
}
}
}'
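If you prefer to make the same request from Python, the curl call above can be sketched with the standard library as follows. This is an illustrative equivalent, not sample code from the product documentation; `<cluster_url>`, the bearer token, and the space ID are placeholders you must supply, and the helper names are assumptions.

```python
import json
import ssl
import urllib.request

def build_hw_spec_payload(name="custom_hw_spec", cpus="2", mem="128Gi", gpus=1):
    """Request body mirroring the --data payload in the curl samples above."""
    return {
        "name": name,
        "description": "Custom hardware specification for foundation models",
        "nodes": {
            "cpu": {"units": cpus},
            "mem": {"size": mem},
            "gpu": {"num_gpu": gpus},
        },
    }

def create_hw_spec(cluster_url, token, space_id, payload):
    """POST the payload to /v2/hardware_specifications in a deployment space."""
    req = urllib.request.Request(
        f"{cluster_url}/v2/hardware_specifications?space_id={space_id}",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # Matches curl's -k flag: skips certificate verification.
    # Use properly signed certificates in production instead.
    ctx = ssl._create_unverified_context()
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.loads(resp.read())
```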
Next steps
Setting up storage and uploading the model
Parent topic: Planning to deploy a custom foundation model