Supported hardware, model architectures, and performance settings

Review the foundation model architectures and the performance-boosting settings and features that are available for deploying a custom foundation model with watsonx.ai.

The required watsonx.ai service and other supplemental services are not available by default. An administrator must install these services on the IBM Cloud Pak for Data platform. To determine whether a service is installed, open the Services catalog and check whether the service is enabled.

Deploying custom foundation models is available starting with Cloud Pak for Data 4.8.4.

When you deploy a custom foundation model, consider the following requirements:

Supported software specification

The required software specification for deploying a custom foundation model is watsonx-cfm-caikit-1.0. This specification is not editable.

Hardware requirements

To deploy custom foundation models in Cloud Pak for Data, you must have the NVIDIA A100 or H100 80 GB hardware set up on your cluster.

Supported model architectures

To find the architecture type for your custom foundation model, see Planning to deploy a custom foundation model.

You must deploy foundation models for which the model architecture is supported by watsonx.ai.

The following table lists the supported quantization methods and parallel-tensor (multi-GPU) support for each supported model architecture:

Supported model architectures, quantization methods, and multi-GPU availability

Model architecture type   Supported quantization method   Supports parallel tensors (multi-GPU)
bloom                     N/A                             Yes
codegen                   N/A                             No
falcon                    N/A                             Yes
gpt_bigcode               gptq                            Yes
gpt_neox                  N/A                             Yes
gptj                      N/A                             No
llama                     gptq                            Yes
mixtral                   gptq                            No
mistral                   N/A                             No
mt5                       N/A                             No
mpt                       N/A                             No
t5                        N/A                             Yes
Note: For the llama architecture, the llama and llama2 model types are supported.
Important:

IBM does not provide support for deployment failures that result from deploying foundation models with unsupported architectures. If you deploy a model with an unsupported architecture, you might receive a warning message.

Predefined hardware specifications

Choose a hardware specification when you deploy a custom foundation model.

  • WX-S: 1 GPU, 2 CPU, and 60 GB of memory
  • WX-M: 2 GPU, 3 CPU, and 120 GB of memory
  • WX-L: 4 GPU, 5 CPU, and 240 GB of memory
  • WX-XL: 8 GPU, 9 CPU, and 600 GB of memory

Assign a hardware specification to your custom foundation model, based on the number of parameters that the model was trained with:

  • 1B to 20B parameters: WX-S
  • 21B to 40B parameters: WX-M
  • 41B to 80B parameters: WX-L
  • 81B to 200B parameters: WX-XL
Note:
  • The codegen, gptj, mpt, mistral, mixtral, and mt5 architecture types support only one GPU. Regardless of the number of parameters, you must assign the WX-S hardware specification to models with these architectures.
  • You cannot use predefined hardware specifications with quantized models. For quantized models and in other non-standard cases, use a custom hardware specification.
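As a sketch, the sizing rules above can be expressed as a small shell helper. The `pick_spec` function is illustrative only and not part of any watsonx.ai tooling; it assumes the parameter count is given in billions:

```shell
#!/bin/sh
# Illustrative helper (not part of watsonx.ai): map a model's parameter
# count (in billions) and architecture type to a predefined hardware spec.

pick_spec() {
  b=$1     # parameter count, in billions
  arch=$2  # model architecture type

  # Architectures that support only one GPU always get WX-S.
  case "$arch" in
    codegen|gptj|mpt|mistral|mixtral|mt5) echo "WX-S"; return ;;
  esac

  if   [ "$b" -le 20 ]; then echo "WX-S"
  elif [ "$b" -le 40 ]; then echo "WX-M"
  elif [ "$b" -le 80 ]; then echo "WX-L"
  else                       echo "WX-XL"
  fi
}

pick_spec 13 llama    # WX-S
pick_spec 70 llama    # WX-L
pick_spec 46 mixtral  # WX-S (single-GPU architecture)
```

Remember that this mapping applies only to non-quantized models; quantized models need a custom hardware specification.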

Creating custom hardware specifications

Follow these guidelines to optionally create a custom hardware specification for your model:

Non-quantized models:

Resource         Calculation
GPU memory       (number of parameters in billions × 2 GB) + 50% additional memory
Number of GPUs   Depends on GPU memory requirements: 1 GPU = 80 GB
Number of CPUs   Number of GPUs + 1
CPU memory       Equal to GPU memory

Quantized models:

4-bit quantized models:

Resource         Calculation
GPU memory       (number of parameters in billions × 0.5 GB) + 50% additional memory
Number of GPUs   Depends on GPU memory requirements: 1 GPU = 80 GB
Number of CPUs   Number of GPUs + 1
CPU memory       Equal to GPU memory

8-bit quantized models:

Resource         Calculation
GPU memory       (number of parameters in billions × 1 GB) + 50% additional memory
Number of GPUs   Depends on GPU memory requirements: 1 GPU = 80 GB
Number of CPUs   Number of GPUs + 1
CPU memory       Equal to GPU memory
Note: Failure to follow these formulas might result in unexpected model behavior.
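The three formulas differ only in the GB required per billion parameters: 2 for non-quantized, 0.5 for 4-bit quantized, and 1 for 8-bit quantized models. As an illustrative sketch, not part of any watsonx.ai tooling, they can be combined into one calculation:

```shell
#!/bin/sh
# Illustrative sizing helper (not part of watsonx.ai). Prints suggested
# custom hardware-spec values from the formulas above.
# usage: hw_sizing <billions_of_params> <none|4bit|8bit>

hw_sizing() {
  awk -v b="$1" -v q="$2" 'BEGIN {
    per  = (q == "4bit") ? 0.5 : (q == "8bit") ? 1 : 2  # GB per billion params
    mem  = b * per * 1.5                                # + 50% additional memory
    gpus = int(mem / 80)                                # 1 GPU = 80 GB
    if (mem > gpus * 80) gpus++                         # round up to whole GPUs
    if (gpus < 1) gpus = 1
    printf "gpu_mem=%gGi gpus=%d cpus=%d cpu_mem=%gGi\n", mem, gpus, gpus + 1, mem
  }'
}

hw_sizing 40 none   # 120 GB, 2 GPUs, 3 CPUs -- matches the predefined WX-M
hw_sizing 40 4bit   # 4-bit quantization fits the same model on a single GPU
```

The same 40B-parameter model that needs two GPUs non-quantized fits in 30 GB when 4-bit quantized, which is why quantized models fall outside the predefined specifications.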

Use the following code sample to create a custom hardware specification for your model in a project:

curl -ik -X POST -H "Authorization: Bearer $TOKEN" "https://<cluster_url>/v2/hardware_specifications?project_id=$project_id" \
-H "Content-Type:application/json" \
--data '{
  "name": "custom_hw_spec",
  "description": "Custom hardware specification for foundation models",
  "nodes": {
    "cpu": {
      "units": "2"
    },
    "mem": {
      "size": "128Gi"
    },
    "gpu": {
      "num_gpu": 1
    }
  }
}'

Use the following code sample to create a custom hardware specification for your model in a deployment space:

curl -ik -X POST -H "Authorization: Bearer $TOKEN" "https://<cluster_url>/v2/hardware_specifications?space_id=$space_id" \
-H "Content-Type:application/json" \
--data '{
  "name": "custom_hw_spec",
  "description": "Custom hardware specification for foundation models",
  "nodes": {
    "cpu": {
      "units": "2"
    },
    "mem": {
      "size": "128Gi"
    },
    "gpu": {
      "num_gpu": 1
    }
  }
}'

Next steps

Setting up storage and uploading the model

Parent topic: Planning to deploy a custom foundation model