Managing predictive deployments
For proper deployment, you must set up a deployment space and then select and configure a specific deployment type. After you deploy assets, you can manage and update them to make sure they perform well and to monitor their accuracy.
To be able to deploy assets from a space, you must have a machine learning service instance that is provisioned and associated with that space.
Online and batch deployments provide simple ways to create an online scoring endpoint or do batch scoring with your models.
If you want to implement custom logic:
- Create a Python function to use for creating your online endpoint
- Write a notebook or script for batch scoring
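The custom online-endpoint option follows the deployable Python function pattern: an outer function that sets up state once, returning an inner `score` function that handles each request. The sketch below is a minimal illustration; the `input_data`/`predictions` payload shape follows the Watson Machine Learning convention, but the threshold "model" inside is a stand-in for whatever model or logic you would load.

```python
def deployable_function():
    """Outer function; runs once when the deployment starts."""
    # In a real deployment, load a model or other state here.
    threshold = 0.5  # stand-in for loaded model state

    def score(payload):
        """Inner function; called for every online scoring request.

        payload["input_data"] is a list of {"fields": [...], "values": [...]}.
        """
        values = payload["input_data"][0]["values"]
        predictions = [[1 if row[0] > threshold else 0] for row in values]
        return {"predictions": [{"fields": ["prediction"],
                                 "values": predictions}]}

    return score
```

For example, calling the returned `score` function with `{"input_data": [{"fields": ["x"], "values": [[0.7], [0.2]]}]}` yields a prediction of `1` for the first row and `0` for the second.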
Deployable assets
The following table lists the assets that you can deploy from a watsonx.ai Runtime space, with the deployment types that apply to each:

| Asset type | Batch deployment | Online deployment |
| --- | --- | --- |
| Functions | Yes | Yes |
| Models | Yes | Yes |
| Scripts | Yes | No |
Notes:
- A deployment job is a way of running a batch deployment, or a self-contained asset like a flow in watsonx.ai Runtime. You can select the input and output for your job and choose to run it manually or on a schedule. For more information, see Creating a deployment job.
- You can deploy a Natural Language Processing model by using Python functions or Python scripts. Both online and batch deployments are supported.
- Notebooks and flows use notebook environments. You can run them in a deployment space, but they are not deployable.
For more information, see:
- Creating online deployments
- Creating batch deployments
- Deploying Python functions
- Deploying NLP models
- Deploying scripts
After you deploy assets, you can manage and update them in the following ways:
- Manage deployment jobs. After you create one or more jobs, you can view and manage them from the Jobs tab of your deployment space.
- Update a deployment. For example, you can replace a model with a better-performing version without having to create a new deployment.
- Scale a deployment to increase availability and throughput by creating replicas of the deployment.
- Delete a deployment to remove it and free up resources.
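Scaling and other updates are applied as patches to the deployment. As a rough illustration, the sketch below builds a JSON Patch body that sets the replica count. The `/hardware_spec` path and `num_nodes` field follow the watsonx.ai Runtime v4 REST API conventions, but treat the exact field names as assumptions and verify them against the current API reference before use.

```python
import json


def build_scale_patch(num_nodes):
    """Return a JSON Patch body that sets a deployment's replica count.

    The "/hardware_spec" path and "num_nodes" field are assumptions based
    on the v4 deployments API; check the API reference for your release.
    """
    if num_nodes < 1:
        raise ValueError("a deployment needs at least one replica")
    return [{"op": "replace",
             "path": "/hardware_spec",
             "value": {"num_nodes": num_nodes}}]


# Serialize the patch as it would be sent in a PATCH request body.
print(json.dumps(build_scale_patch(2)))
```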
Configuring API gateways to provide stable endpoints
watsonx.ai Runtime provides stable endpoints to prevent downtime. However, you might experience downtime if you move to a new Cloud Pak for Data instance or add an instance.
API gateways provide a stable URL that you can use with your Watson Machine Learning API endpoint. You can use an API gateway (available in Cloud Pak for Integration) with your deployment endpoints to handle downtime in the following cases:
- You have more than one instance of Cloud Pak for Data in a high-availability configuration, and one of the instances fails. In this case, the API gateway switches automatically to another instance, preventing complete failure.
- More than one application uses the same endpoint, and the deployment endpoint becomes unavailable, for example because you accidentally delete the deployment. In this case, you can update the endpoint in the API gateway so that the applications continue to work.
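The failover behavior that the gateway provides in the first case can be pictured with a small sketch: try each scoring endpoint in order and fall back when one is unreachable. The endpoint URLs and the injected `send` callable here are hypothetical; in production the API gateway performs this switch for you, so client code only sees the gateway's stable URL.

```python
def score_with_failover(endpoints, send, payload):
    """Try each endpoint in order; return the first successful response.

    endpoints -- list of endpoint URLs to try (hypothetical)
    send      -- callable (url, payload) -> response, raising
                 ConnectionError when an endpoint is unreachable
    """
    last_error = None
    for url in endpoints:
        try:
            return send(url, payload)
        except ConnectionError as err:
            last_error = err  # this instance is down; try the next one
    raise RuntimeError("all endpoints failed") from last_error
```

With one instance down, a request routed this way lands on the next healthy instance instead of failing outright.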
Enabling GPU and MIG support for deployment runtimes
If you are deploying a predictive machine learning model that requires significant processing power for inferencing, you can optionally configure a GPU for deployment runtimes.
You can also enable MIG support for GPUs when you want to deploy an application that does not require the full power of an entire GPU. If you configure MIG for GPU-accelerated workloads, all GPU-enabled nodes must adhere to the single strategy that you determined in the prior configuration steps, which ensures consistent behavior across the cluster. To configure MIG support, see Nvidia Guide for configuring MIG support.
Parent topic: Deploying assets