IBM Cloud Pak® for Data Version 4.8 will reach end of support (EOS) on 31 July 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.
Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.8 reaches end of support. For more information, see Upgrading from IBM Cloud Pak for Data Version 4.8 to IBM Software Hub Version 5.1.
Scaling a deployment
When you create an online deployment for a model, function, or Shiny app from a deployment space or programmatically, a single copy of the asset is deployed by default. To increase scalability and availability, you can increase the number of copies (replicas) by editing the configuration of the deployment. More copies allow for a larger volume of scoring requests.
Deployments can be scaled in the following ways:
- Update the configuration for a deployment in a deployment space.
- Programmatically, by using the Watson Machine Learning Python client library or the Watson Machine Learning REST APIs.
You can scale up to ten copies by using the UI. If a larger number of copies is required, use the API to scale your deployment.
Changing the number of copies of an online deployment from a space
- Click the Deployments tab of your deployment space.
- From the action menu for your deployment name, click Edit.
- In the Edit deployment dialog box, change the number of copies and click Save.
Increasing the number of replicas of a deployment programmatically
To scale a deployment programmatically, increase the number of replicas (num_nodes) in the hardware specification metadata for the deployment.
Python example
This example uses the Python client to set the number of replicas to 3.
# Update the hardware specification to run 3 replicas of the deployment
change_meta = {
    client.deployments.ConfigurationMetaNames.HARDWARE_SPEC: {
        "name": "S",        # a name (or ID) is required by the API
        "num_nodes": 3      # number of replicas
    }
}

client.deployments.update(<deployment_id>, change_meta)
The HARDWARE_SPEC value includes a name because the API requires a name or an ID to be provided. However, this argument is disregarded for online deployments with the following frameworks: Spark, PMML, Scikit-learn (excluding custom image), XGBoost (excluding custom image), and PyTorch (excluding custom image).
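Because the payload that client.deployments.update() expects is a plain dictionary, its shape can be constructed and inspected without a live Watson Machine Learning connection. The following self-contained sketch uses a hypothetical helper, build_scaling_meta, which is not part of the client library; the literal "hardware_spec" key stands in for the value of client.deployments.ConfigurationMetaNames.HARDWARE_SPEC:

```python
# Hypothetical helper illustrating the metadata patch used to scale
# an online deployment. The "hardware_spec" key stands in for
# client.deployments.ConfigurationMetaNames.HARDWARE_SPEC.
def build_scaling_meta(spec_name, num_replicas):
    # A name (or ID) is required by the API; num_nodes is the replica count.
    return {
        "hardware_spec": {
            "name": spec_name,
            "num_nodes": num_replicas,
        }
    }

change_meta = build_scaling_meta("S", 3)
print(change_meta["hardware_spec"]["num_nodes"])  # 3
```

In a real session, the resulting dictionary would be passed to client.deployments.update() together with the deployment ID, as shown in the example above.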
REST API example
curl -k -X PATCH \
  -d '[ { "op": "replace", "path": "/hardware_spec", "value": { "name": "S", "num_nodes": 2 } } ]' \
  <Deployment end-point URL>
You must specify a name for the hardware_spec value, but the argument is not applied for scaling.
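The -d argument in the curl command is a JSON Patch document (an array of operations). As a self-contained illustration that makes no API call, the same payload can be generated and serialized with Python's standard json module:

```python
import json

# JSON Patch operation equivalent to the curl -d payload:
# replace /hardware_spec with a spec named "S" scaled to 2 replicas.
patch = [
    {
        "op": "replace",
        "path": "/hardware_spec",
        "value": {"name": "S", "num_nodes": 2},
    }
]

# Serialize to the JSON string that would be sent in the request body.
print(json.dumps(patch))
```

Generating the body programmatically avoids quoting mistakes when the replica count comes from a variable rather than being typed inline.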
Parent topic: Managing predictive deployments