Deploying custom foundation models

You can upload and deploy a custom foundation model for use with watsonx.ai inferencing capabilities.

The required watsonx.ai service and other supplemental services are not available by default. An administrator must install these services on the IBM Cloud Pak for Data platform. To determine whether a service is installed, open the Services catalog and check whether the service is enabled.

For a watsonx.ai lightweight engine installation, you follow different steps to add custom foundation models. For more information, see Adding custom foundation models.

In addition to working with foundation models that are curated by IBM, you can upload and deploy your own foundation models. After the models are deployed and registered with watsonx.ai, you can create prompts that inference the custom models from the Prompt Lab.

Deploying a custom foundation model provides the flexibility for you to implement the AI solutions that are right for your use case. The deployment process differs slightly depending on the source of your custom foundation model.

It is best to get the model directly from the model builder. One place to find new models is Hugging Face, a repository of open source foundation models that is used by many model builders.

Importing custom foundation models to a deployment space

The process for deploying a foundation model and making it available for inferencing includes tasks that are performed by a Cloud Pak for Data System administrator, ModelOps engineer, and Prompt engineer.

The system administrator must prepare the model and upload it to persistent volume claim (PVC) storage. After storing the model, the administrator must register the model with watsonx.ai.

To deploy a custom foundation model, the ModelOps engineer must create or promote a foundation model asset into the deployment space context.

After the model is deployed to production, the Prompt engineer can prompt the custom foundation model from the Prompt Lab or watsonx.ai API.

The following graphic represents a flow of tasks that are typically performed by a Cloud Pak for Data system administrator, a ModelOps engineer, and a Prompt engineer:

Process overview for deploying a custom foundation model

Preparing the model and uploading to PVC storage

To prepare the model and upload it to PVC storage, the system administrator must perform the following tasks:

  1. Prepare to deploy a custom foundation model
  2. Review the supported architecture frameworks and hardware specifications for custom foundation models
  3. Set up a storage repository for hosting the model and then upload the model to the storage repository
  4. Register the custom foundation model to use with watsonx.ai
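
On Cloud Pak for Data, registration is typically done by adding an entry for the model to the watsonx.ai inference service custom resource. The following fragment is an illustrative sketch only; the field names (`model_id`, `pvc_name`, `sub_path`) and the overall schema are assumptions to verify against the registration documentation for your release:

```
# Illustrative sketch of a registration entry for a custom foundation model.
# Field names and structure are assumptions; check the "Register the custom
# foundation model" topic for the exact custom resource schema.
spec:
  custom_foundation_models:
    - model_id: example-custom-model      # identifier used later when creating the model asset
      location:
        pvc_name: custom-fm-pvc           # PVC that hosts the uploaded model files
        sub_path: example-custom-model    # directory in the PVC that contains config.json
      tags:
        - example
```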

Deploying a custom foundation model

To deploy a custom foundation model from a deployment space, the ModelOps engineer must perform the following tasks:

  1. Create the deployment for the custom foundation model
  2. Manage or update the deployment
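
As a rough sketch, creating the deployment through the watsonx.ai REST API amounts to a POST to the `/ml/v4/deployments` endpoint with an online configuration and a hardware request for the model asset in the space. The payload below is illustrative; the `hardware_request` size value (`gpu_s`) and the `serving_name` parameter are assumptions to verify against the deployment documentation:

```python
import json

def build_cfm_deployment_payload(space_id, model_asset_id, name, serving_name, size="gpu_s"):
    """Sketch of a POST /ml/v4/deployments request body for a custom
    foundation model. Field names follow the public watsonx.ai v4 API;
    verify details against the current API reference."""
    return {
        "name": name,
        "space_id": space_id,
        "asset": {"id": model_asset_id},  # the foundation model asset in the deployment space
        "online": {"parameters": {"serving_name": serving_name}},
        "hardware_request": {"size": size, "num_nodes": 1},  # GPU size per supported hardware specs
    }

# Placeholder IDs; in practice these come from the space and the stored model asset.
payload = build_cfm_deployment_payload("SPACE_ID", "ASSET_ID", "my-cfm-deployment", "my_cfm")
print(json.dumps(payload, indent=2))
```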

Prompting the custom foundation model

To prompt the custom foundation model from the Prompt Lab or watsonx.ai API, the Prompt engineer must perform the following task:

  1. Use the custom foundation model for generating prompt output
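
Outside the Prompt Lab, a deployed custom model can be inferenced over HTTP. The sketch below builds (but does not send) a request to the deployment text-generation endpoint; the version query parameter, host name, and token handling are assumptions to verify against the watsonx.ai API reference:

```python
import json
import urllib.request

def build_generation_request(base_url, deployment_id, token, prompt, max_new_tokens=200):
    """Sketch of a text-generation call against a custom foundation model
    deployment. The endpoint path follows the public watsonx.ai inference
    API; verify the version parameter and generation parameters in the docs."""
    url = f"{base_url}/ml/v1/deployments/{deployment_id}/text/generation?version=2024-01-01"
    body = {"input": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )

# Placeholder values; send the request with urllib.request.urlopen(req) against a live deployment.
req = build_generation_request("https://cpd.example.com", "DEPLOYMENT_ID", "TOKEN", "Summarize: ...")
print(req.full_url)
```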

Requirements and usage notes for custom foundation models

Deployable custom models must meet these requirements:

  • The file list for the model must contain a config.json file. See Planning to deploy a custom foundation model for steps on how to check for the file.

  • The model must be compatible with the Text Generation Inference (TGI) standard and be built with a supported model architecture type. The model type is listed in the config.json file for the model.

  • The model must be in the safetensors format, be saved with a supported transformers library, and include a tokenizer. If the model is otherwise compatible, a conversion utility can convert the model to meet these requirements as part of preparing it for upload.

    Important:

    You must make sure that your custom foundation model is saved with the supported transformers library. If the model.safetensors file for your custom foundation model uses an unsupported data format in the metadata header, your deployment might fail. For more information, see Troubleshooting Watson Machine Learning.
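
The first and third requirements can be sanity-checked locally before upload. The sketch below reads `model_type` from `config.json` and lists the tensor data types recorded in a `model.safetensors` metadata header, which starts with an 8-byte little-endian header length followed by a JSON header. The demo files are synthetic; a real check would point at the downloaded model directory:

```python
import json
import os
import struct
import tempfile

def read_model_type(config_path):
    """Return the model_type declared in a model's config.json."""
    with open(config_path) as f:
        return json.load(f).get("model_type")

def safetensors_dtypes(path):
    """Return the tensor dtypes recorded in a .safetensors metadata header."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]  # 8-byte little-endian header size
        header = json.loads(f.read(header_len))         # JSON header describing each tensor
    return {entry["dtype"] for key, entry in header.items() if key != "__metadata__"}

# Demo on synthetic files standing in for a downloaded model directory.
model_dir = tempfile.mkdtemp()
with open(os.path.join(model_dir, "config.json"), "w") as f:
    json.dump({"model_type": "llama"}, f)
header = json.dumps({"w": {"dtype": "F32", "shape": [1], "data_offsets": [0, 4]}}).encode()
with open(os.path.join(model_dir, "model.safetensors"), "wb") as f:
    f.write(struct.pack("<Q", len(header)) + header + b"\x00" * 4)

print(read_model_type(os.path.join(model_dir, "config.json")))       # llama
print(safetensors_dtypes(os.path.join(model_dir, "model.safetensors")))  # {'F32'}
```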

Note these restrictions for using custom foundation models after they are deployed and registered with watsonx.ai:

  • You cannot tune a custom foundation model.
  • You cannot use watsonx.governance to evaluate or track a prompt template for a custom foundation model.

Next steps

Watch this video to see how to deploy a custom foundation model.

This video provides a visual method to learn the concepts and tasks in this documentation.

Learn more

Developing generative AI solutions with foundation models (watsonx.ai)

Parent topic: Deploying foundation model assets