IBM Cloud Pak® for Data Version 4.8 will reach end of support (EOS) on 31 July, 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.
Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.8 reaches end of support. For more information, see Upgrading from IBM Cloud Pak for Data Version 4.8 to IBM Software Hub Version 5.1.
Known issues and limitations for Watson Machine Learning
The following known issues and limitations apply to Watson Machine Learning.
Known issues
- Known issues for Federated Learning
- Known issues for AutoAI
- Known issues for Watson Machine Learning
  - Unusable deployments after an upgrade or restore from backup
  - Predictions API in Watson Machine Learning service can time out too soon
  - Decision Optimization deployment job fails with error: "Add deployment failed with deployment not finished within time"
  - Previewing masked data assets is blocked in deployment space
  - Deployment runtime containers in CrashLoopBackoff state after upgrade from previous releases
  - Deployments with custom conda_yml package extensions and nodefaults fail
  - Deployment of runtime pods fails after upgrade
  - Exporting asset files from a space fails after backing up and restoring Cloud Pak for Data
Limitations
- Limitations for Watson Machine Learning
  - Restrictions for IBM Z and IBM LinuxONE users
  - Deploying a model on an s390x cluster might require retraining
  - Limits on size of model deployments
  - Security for file uploads
  - Automatic mounting of storage volumes is not supported by online and batch deployments
  - Batch deployments that use large data volumes as input might fail
  - Batch deployment jobs that use large inline payload might get stuck in starting or running state
  - Python scoring function with a custom software specification executed on the Linux on Power (ppc64le) platform fails when custom software spec has a YML package extension
  - Update pod time-out limits to manage resources for long-running jobs
  - Setting environment variables in a conda yaml file does not work for deployments
  - R Shiny applications deployed with shiny-r3.6 software specification fail after upgrade
- Limitations for AutoAI experiments
Known issues for Federated Learning
Authentication failures for Federated Learning training jobs when allowed IPs are specified in the Remote Training System
Applies to: 4.8.0 and later
Currently, the Red Hat OpenShift Ingress controller is not setting the X-Forwarded-For header with the client's IP address regardless of the forwardedHeaderPolicy setting. This causes authentication failures for Federated
Learning training jobs when allowed_ips are specified in the Remote Training System even though the client IP address is correct.
To use the Federated Learning Remote Training System IP restriction feature in Cloud Pak for Data 4.0.3, configure an external proxy to inject the X-Forwarded-For header. For more information, see the article on configuring ingress.
Known issues for AutoAI
The Flight service returns "Received RST_STREAM with error code 3" when reading large data sets
Applies to: 4.8.0
Fixed in: 4.8.1
If you use the Flight service and the pyarrow library to read large data sets in an AutoAI experiment in a notebook, the Flight service might return the following message:
Received RST_STREAM with error code 3
When this error occurs, the AutoAI experiment receives incomplete data, which can affect training of the model candidate pipelines.
If this error occurs, add the following code to your notebook:
os.environ['GRPC_EXPERIMENTAL_AUTOFLOWCONTROL'] = 'false'
Then, rerun the experiment.
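For example, a minimal notebook cell that sets the flag before the experiment reads data; the os import is required if the module is not already loaded, and the trailing comment stands in for your own experiment code:

import os

# Disable the experimental gRPC flow control used when reading data through Flight.
os.environ['GRPC_EXPERIMENTAL_AUTOFLOWCONTROL'] = 'false'

# ... then rerun the AutoAI experiment cells that read the data set ...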
Importing an AutoAI notebook from a catalog can result in runtime error
Applies to: 4.8.0 and later
If you save an AutoAI notebook to an IBM Knowledge Catalog, and then you import it into a project and run it, you might get this error: Library not compatible or missing.
This error results from a mismatch between the runtime environment saved in the catalog and the runtime environment required to run the notebook in the project. To resolve, update the runtime environment to the latest supported version. For
example, if the imported notebook uses Runtime 22.2 in the catalog version, update to Runtime 23.1 and run the notebook job again.
Running AutoAI pipeline notebook generates TypeError
Applies to: 4.8.8 and later
Running an AutoAI pipeline notebook results in a TypeError with a missing argument in the initialized CatImputer:
cat_imputer = CatImputer(missing_values=float("nan"), sklearn_version_family="1")
To work around this issue, add strategy="most_frequent" to the initializer and rerun the cell:
cat_imputer = CatImputer(strategy="most_frequent", missing_values=float("nan"), sklearn_version_family="1")
Known issues for Watson Machine Learning
Unusable deployments after an upgrade or restore from backup
Applies to: 4.8.0 and later
For deployments created on Cloud Pak for Data 4.6.x, generating predictions with a deployment might fail after an upgrade to Cloud Pak for Data 4.8.x. The error message for this problem is:
Deployment: <deployment-ID> has been suspended due to the deployment owner either not being a member of the deployment space: <space-ID> any more or removed from the system.
These errors can also occur following a restore from backup.
To resolve the problem, update the deployments by using the following steps. R Shiny deployments require the alternative steps described later in this section.
To update deployments, except for R Shiny deployments:
1. For HOST="CPD_HOSTNAME", replace "CPD_HOSTNAME" with the Cloud Pak for Data hostname.
2. For SPACE_ID="WML_SPACE_ID", replace "WML_SPACE_ID" with the space ID of the deployment that is failing.
3. For DEPLOYMENT_ID="WML_DEPLOYMENT_ID", replace "WML_DEPLOYMENT_ID" with the deployment ID of the broken deployment.
4. For "Authorization: ZenApiKey <token>", supply a valid token. If you export the token as an environment variable, use ${TOKEN} instead of <token>.
5. Run the following curl command, replacing "OWNER_ID" in the PATCH payload with the actual owner ID on this cluster:
   curl -k -X PATCH "$HOST/ml/v4/deployments/$DEPLOYMENT_ID?version=2020-04-20&space_id=$SPACE_ID" -H "content-type: application/json" -H "Authorization: ZenApiKey <token>" --data '[{ "op": "replace", "path": "/metadata/owner", "value": "OWNER_ID" }]'
To run these commands, you must generate and export the token as the ${MY_TOKEN} environment variable. For details, see Generating an API authorization token.
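If you prefer to script the update, the following Python sketch issues the same PATCH request with the requests library. It only illustrates the curl command shown above; the placeholder values and the MY_TOKEN environment variable are assumptions that you must adapt to your cluster.

import os
import requests

HOST = "https://<CPD_HOSTNAME>"          # Cloud Pak for Data hostname (placeholder)
SPACE_ID = "<WML_SPACE_ID>"              # space ID of the failing deployment (placeholder)
DEPLOYMENT_ID = "<WML_DEPLOYMENT_ID>"    # ID of the broken deployment (placeholder)
OWNER_ID = "<OWNER_ID>"                  # actual owner ID on this cluster (placeholder)
TOKEN = os.environ["MY_TOKEN"]           # token generated and exported as described above

response = requests.patch(
    f"{HOST}/ml/v4/deployments/{DEPLOYMENT_ID}",
    params={"version": "2020-04-20", "space_id": SPACE_ID},
    headers={
        "content-type": "application/json",
        "Authorization": f"ZenApiKey {TOKEN}",
    },
    json=[{"op": "replace", "path": "/metadata/owner", "value": OWNER_ID}],
    verify=False,  # equivalent of curl -k; verify certificates in production
)
print(response.status_code, response.text)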
To update R Shiny deployments:
1. Run oc get pods -n NAMESPACE | grep "wml-deployment-manager", replacing NAMESPACE with the Watson Machine Learning namespace.
2. For oc exec -it WML_DEPLOYMENT_MANAGER_POD_NAME bash -n NAMESPACE, replace WML_DEPLOYMENT_MANAGER_POD_NAME with the deployment manager pod name displayed in the previous step and replace NAMESPACE with the Watson Machine Learning namespace.
3. For deployment_id="DEPLOYMENT_ID", replace DEPLOYMENT_ID with the deployment ID.
4. For space_id="SPACE_ID", replace SPACE_ID with the space ID for the deployment.
5. For HOST="https://wml-deployment-manager-svc.NAMESPACE.svc:16500", replace NAMESPACE with the Watson Machine Learning namespace.
6. For "Authorization: ZenApiKey <token>", supply a valid token. If you export the token as an environment variable, use ${TOKEN} instead of <token>. To run these commands, you must generate and export the token as the ${MY_TOKEN} environment variable. For details, see Generating an API authorization token.
7. Re-create the R Shiny deployment by using the following curl command:
   curl -k -X PUT "$HOST/ml/v4_private/recreate_deployment/$deployment_id?version=2020-06-12&space_id=$space_id" -H "Authorization: ZenApiKey <token>"
8. Verify the status of the R Shiny deployment and wait for the deployment to become "Ready" before proceeding to the next step:
   curl -k -X GET "$HOST/ml/v4/deployments/$deployment_id?version=2020-06-12&space_id=$space_id" -H "Authorization: ZenApiKey ${MY_TOKEN}"
9. If you are upgrading to Cloud Pak for Data 4.8.0 or restoring from backup, scale up the number of copies by 1 from the deployment space UI. The deployment state changes from "Unusable" to "Deployed".
10. Optionally, scale the number of copies back to 1 or the original setting when the deployment is working as expected.
Predictions API in Watson Machine Learning service can time out too soon
Applies to: 4.8.0 and later
If the predictions API (POST /ml/v4/deployments/{deployment_id}/predictions) in the Watson Machine Learning deployment service is timing out too soon, follow these steps to manually update the timeout interval.
1. Update the API timeout parameter in the Watson Machine Learning CR:
   REQUIRED_TIMEOUT_IN_SECONDS=<timeout-in-seconds>
   NAMESPACE=<wml-instance-namespace>
   oc patch wmlbase wml-cr -p "{\"spec\":{\"wml_api_timeout\": $REQUIRED_TIMEOUT_IN_SECONDS, \"wml_envoy_pods\": 1}}" --type=merge -n "$NAMESPACE"
   The following example shows how to update the timeout to 600 seconds for the service instance namespace zen:
   REQUIRED_TIMEOUT_IN_SECONDS=600
   NAMESPACE=zen
   oc patch wmlbase wml-cr -p "{\"spec\":{\"wml_api_timeout\": $REQUIRED_TIMEOUT_IN_SECONDS, \"wml_envoy_pods\": 1}}" --type=merge -n "$NAMESPACE"
   Note: If HPA is disabled on the Cloud Pak for Data cluster and you want to increase the throughput of Watson Machine Learning prediction API requests, you can increase the number of Watson Machine Learning envoy pods by using the wml_envoy_pods parameter in the command. One envoy pod can support up to 1500 requests per second.
2. Restart the NGINX pods:
   oc rollout restart deployment ibm-nginx -n "$NAMESPACE"
3. Check that the NGINX pods come up:
   oc get pods -n "$NAMESPACE" | grep "ibm-nginx"
Decision Optimization deployment job fails with error: "Add deployment failed with deployment not finished within time"
Applies to: 4.8.0 and later
If your Decision Optimization deployment job fails with the following error, complete these steps to extend the timeout window.
"status": {
"completed_at": "2022-09-02T02:35:31.711Z",
"failure": {
"trace": "0c4c4308935a3c4f2d9987b22139c61c",
"errors": [{
"code": "add_deployment_failed_in_runtime",
"message": "Add deployment failed with deployment not finished within time"
}]
},
"state": "failed"
}
To update the deployment timeout in the deployment manager:
1. Edit the wmlbase wml-cr and add the line ignoreForMaintenance: true. This setting puts the WML operator into maintenance mode, which stops automatic reconciliation; otherwise, automatic reconciliation undoes any configmap changes that you apply.
   oc patch wmlbase wml-cr --type merge --patch '{"spec": {"ignoreForMaintenance": true}}' -n <namespace>
   For example:
   oc patch wmlbase wml-cr --type merge --patch '{"spec": {"ignoreForMaintenance": true}}' -n zen
2. Capture the contents of the wmlruntimemanager configmap in a YAML file.
   oc get cm wmlruntimemanager -n <namespace> -o yaml > wmlruntimemanager.yaml
   For example:
   oc get cm wmlruntimemanager -n zen -o yaml > wmlruntimemanager.yaml
3. Create a backup of the wmlruntimemanager YAML file.
   cp wmlruntimemanager.yaml wmlruntimemanager.yaml.bkp
4. Open wmlruntimemanager.yaml.
   vi wmlruntimemanager.yaml
5. Navigate to the file runtimeManager.conf and search for the property service.
6. Increase the number of retries in the retry_count field to extend the timeout window:
   service {
     jobs {
       do {
         check_deployment_status {
           retry_count = 420 // Increase the number of retries to extend the timeout window
         }
         retry_delay = 1000
       }
     }
   }
   Where:
   - retry_count is the number of retries
   - retry_delay is the delay between each retry, in milliseconds
   In this example, the timeout is configured as 7 minutes (retry_count * retry_delay = 420 * 1000 ms = 7 minutes). To increase the timeout further, increase the number of retries in the retry_count field.
7. Apply the deployment manager configmap changes:
   oc delete -f wmlruntimemanager.yaml
   oc create -f wmlruntimemanager.yaml
8. Restart the deployment manager pods:
   oc get pods -n <namespace> | grep wml-deployment-manager
   oc delete pod <podname> -n <namespace>
9. Wait for the deployment manager pod to come up:
   oc get pods -n <namespace> | grep wml-deployment-manager
If you plan to upgrade the Cloud Pak for Data cluster, you must bring the WML operator out of maintenance mode by setting the field ignoreForMaintenance to false in wml-cr.
Previewing masked data assets is blocked in deployment space
Applies to: 4.8.0 and later
A data asset preview might fail with this message:
This asset contains masked data and is not supported for preview in the Deployment Space
Deployment spaces currently don't support data masking, so the preview of masked assets is blocked to prevent data leaks.
Deployments with custom conda_yml package extensions and nodefaults fail
Applies to: 4.8.1 and later
Fixed in: 4.8.3
If you deploy an asset with a custom conda_yml package extension or update packages with a conda env update subprocess call, your deployment might fail when conda channels are restricted to nodefaults.
The workaround is as follows:
- For Cloud Pak for Data version 4.8.3, use the conda_yml package extension.
- For Cloud Pak for Data versions prior to 4.8.3, use the conda_yml package extension and remove the nodefaults restriction from the channels list.
The following example shows conda channels restricted to nodefaults:
channels:
- empty
- nodefaults
dependencies:
- pip:
- langdetect==1.0.9
As a workaround, remove the nodefaults restriction from the channels list:
channels:
- empty
dependencies:
- pip:
- langdetect==1.0.9
Deployment of runtime pods fails after upgrade
Applies to: 4.8.4
If you deploy a machine learning model with a constricted software specification in FIPS mode, the runtime pod might fail after you upgrade to Cloud Pak for Data version 4.8.4. To learn more about constricted software specifications, see Software specifications lifecycle.
The following code snippet shows the py39 runtime pods entering the CrashLoopBackOff state after an upgrade from Cloud Pak for Data version 4.6.5 to version 4.8.4.
wml-dep-py39-00d7b8ba-e942-4b9e-bf89-3096fb143481-5449b56b9lnrx 1/2 CrashLoopBackOff 4 (22s ago) 2m2s
wml-dep-py39-00d7b8ba-e942-4b9e-bf89-3096fb143481-5d55449f2f8mm 1/2 CrashLoopBackOff 4 (16s ago) 2m2s
wml-dep-py39-2dfb43d1-32ea-46b4-9318-1270a9869e7c-5bd5bb5cmm5jv 1/2 CrashLoopBackOff 4 (34s ago) 2m2s
wml-dep-py39-2dfb43d1-32ea-46b4-9318-1270a9869e7c-ff74c46dztl7r 1/2 CrashLoopBackOff 4 (23s ago) 2m2s
wml-dep-py39-38f88d99-78d2-4c2d-8fb4-e1039d465c5a-75c98ffd5nb4b 1/2 CrashLoopBackOff 4 (31s ago) 2m2s
wml-dep-py39-38f88d99-78d2-4c2d-8fb4-e1039d465c5a-86c8767bmx42v 1/2 CrashLoopBackOff 4 (20s ago) 2m2s
wml-dep-py39-5ac302d4-819a-4c48-8a42-d63d2437e9af-547dc9876jzkn 1/2 CrashLoopBackOff 4 (11s ago) 2m2s
wml-dep-py39-76d51889-cb37-460c-b86f-078b234163e4-7454fd76sgkrm 1/2 CrashLoopBackOff 4 (23s ago) 2m2s
wml-dep-py39-76d51889-cb37-460c-b86f-078b234163e4-775564f7tmntg 1/2 CrashLoopBackOff 4 (21s ago) 2m2s
wml-dep-py39-9409a1c5-02f4-4183-ae87-8025815d01bb-6b89bdc9fkk6g 1/2 CrashLoopBackOff 4 (26s ago) 2m2s
wml-dep-py39-9409a1c5-02f4-4183-ae87-8025815d01bb-6b9559f57b2tm 1/2 CrashLoopBackOff 4 (17s ago) 2m2s
wml-dep-py39-ecd96b96-27d6-4abf-9fe3-9a1f1eff16de-65c66b4cn7gz5 1/2 CrashLoopBackOff 4 (32s ago) 2m2s
wml-dep-py39-ecd96b96-27d6-4abf-9fe3-9a1f1eff16de-795f955f2xtgp 1/2 CrashLoopBackOff 4 (33s ago) 2m1s
wml-dep-py39-f60e182c-627d-468d-a7ca-bfb9387e3ad8-57cbdcb6fhvs2 1/2 CrashLoopBackOff 4 (31s ago) 2m1s
wml-dep-py39-f60e182c-627d-468d-a7ca-bfb9387e3ad8-6d44cfbdqmh8n 1/2 CrashLoopBackOff 4 (22s ago) 2m1s
As a workaround, you must upgrade to Cloud Pak for Data version 4.6.5 or higher and contact IBM Support to apply the hot fix before upgrading to Cloud Pak for Data version 4.8.4.
Exporting asset files from a space fails after backing up and restoring Cloud Pak for Data
Applies to: 4.8.4
This issue occurs on clusters running on Power (ppc64le) hardware.
After you back up and restore an instance of Cloud Pak for Data that uses Spectrum Scale storage, you cannot export asset files from a space. The export fails with a message that indicates that the asset files API was not able to connect to RabbitMQ.
To resolve the problem, restart the asset-files-api pod:
1. Set the API_POD_NAME environment variable:
   export API_POD_NAME=$(oc get pods -n=${PROJECT_CPD_INST_OPERANDS} | grep "asset-files-api" | awk '{print $1}')
2. Restart the asset-files-api pod:
   oc delete pod ${API_POD_NAME} -n=${PROJECT_CPD_INST_OPERANDS}
Limitations for Watson Machine Learning
AutoAI file gets pushed to the Git repository in default Git projects
After you create an AutoAI experiment in a default Git project and then create a commit, you see a file that includes your experiment name in the list of files that can be committed. There are no consequences to including this file in your commit. The AutoAI experiment will not appear in the asset list for any other user who pulls the file into their local clone by using Git. Additionally, other users are not prevented from creating an AutoAI experiment with the same name.
Restrictions for IBM Z and IBM LinuxONE users
Applies to: 4.8.0 and later
For a list of feature restrictions, see Capabilities on Linux on IBM Z and IBM LinuxONE
Deploying a model on an s390x cluster might require retraining
Applies to: 4.8.0 and later
Training an AI model on a different platform such as x86/ppc and deploying the AI model on s390x using Watson Machine Learning might fail because of an endianness issue. In such cases, retrain and deploy the existing AI model on the s390x platform to resolve the problem.
Limits on size of model deployments
Applies to: 4.8.0 and later
Limits on the size of models you deploy with Watson Machine Learning depend on factors such as the model framework and type. In some instances, when you exceed a threshold, you will be notified with an error when you try to store a model in
the Watson Machine Learning repository, for example: OverflowError: string longer than 2147483647 bytes. In other cases, the failure might be indicated by a more general error message, such as The service is experiencing some downstream errors, please re-try the request or There's no available attachment for the targeted asset. Any of these results indicate that you have exceeded the allowable size limits for that type of deployment.
Security for file uploads
Applies to: 4.8.0 and later
Files you upload through the Watson Studio or Watson Machine Learning UI are not validated or scanned for potentially malicious content. It is recommended that you run security software, such as an anti-virus application, on all files before uploading to ensure the security of your content.
Python scoring function with a custom software specification executed on the Linux on Power (ppc64le) platform fails when custom software spec has a YML package extension
Applies to: 4.8.0 to 4.8.2
Fixed in: 4.8.3
When executing a Python scoring function with a custom software specification that has a YML package extension, the scoring call returns this error: certificate verify failed: unable to get local issuer certificate.
To resolve the problem, explicitly install the certifi==2023.5.7 version in the runtime. For example:
%%writefile tmp_custom_env.yml
dependencies:
- certifi==2023.5.7
- pip:
- langdetect==1.0.9
The cell returns: Overwriting tmp_custom_env.yml
Maximum number of feature columns in AutoAI experiments
Applies to: 4.8.0 and later
The maximum number of feature columns for a classification or regression experiment is 5000.
No support for Cloud Pak for Data authentication with storage volume connection
Applies to: 4.8.0 and later
You cannot use a storage volume connection with the 'Cloud Pak for Data authentication' option enabled as a data source in an AutoAI experiment. AutoAI does not currently support the user authentication token. Instead, disable the 'Cloud Pak for Data authentication' option in the storage volume connection to use the connection as a data source in your AutoAI experiment.
Deployment runtime containers in CrashLoopBackoff state after upgrade from previous releases
Applies to: 4.8.0 and later
After upgrading Watson Machine Learning, some runtime containers are in CrashLoopBackoff state.
To fix the issue, patch the RTA of the deployment. First, fetch the rta-id for the deployments in CrashLoopBackoff state by using this command:
oc get rta -l WML_DEPLOYMENT_ID=<deployment-id>
Then, update the RTA by using this command:
curl -k -X PUT "https:///v2/runtime_services?uid=&location=urn:ibm:type:cpd" -H "Authorization: ZenApiKey ${MY_TOKEN}" -H "Service-Authorization: Basic $TOKEN" --data-raw '{"id":"<rta-id>","location":{"type":"cpd"},"environment":{"env":["productVersion=4.7.0 "]}}'
Automatic mounting of storage volumes is not supported by online and batch deployments
Applies to: 4.8.0 and later
You cannot use automatic mounts for storage volumes with Watson Machine Learning online and batch deployments. Watson Machine Learning does not support this feature for Python-based runtimes or for R-script, SPSS Modeler, Spark, and Decision Optimization runtimes. You can use automatic mounts for storage volumes only with Watson Machine Learning Shiny app deployments and notebook runtimes.
As a workaround, you can use the download method from the Data assets library, which is part of the ibm-watson-machine-learning Python client.
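For example, a minimal sketch with the ibm-watson-machine-learning Python client; the credential values, space ID, asset ID, and file name are placeholders, and the exact credential fields can vary with your authentication setup:

from ibm_watson_machine_learning import APIClient

# Placeholder credentials; adapt to your Cloud Pak for Data cluster and auth method.
wml_credentials = {
    "url": "https://<CPD_HOSTNAME>",
    "username": "<USERNAME>",
    "apikey": "<API_KEY>",
    "instance_id": "openshift",
    "version": "4.8",
}

client = APIClient(wml_credentials)
client.set.default_space("<SPACE_ID>")

# Download the data asset to the local file system of the runtime instead of
# relying on an automatically mounted storage volume.
client.data_assets.download("<DATA_ASSET_ID>", filename="training_data.csv")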
Batch deployments that use large data volumes as input might fail
Applies to: 4.8.0 and later
If you are scoring a batch job that uses large volumes of data as the input source, the job might fail because of internal timeout settings. A symptom of this problem might be an error message similar to the following example:
Incorrect input data: Flight returned internal error, with message: CDICO9999E: Internal error occurred: Snowflake sQL logged error: JDBC driver internal error: Timeout waiting for the download of #chunk49(Total chunks: 186) retry=0.
If the timeout occurs when you score your batch deployment, you must configure the data source query level timeout limitation to handle long-running jobs.
Query-level timeout information for data sources is as follows:
| Data source | Query level time limitation | Default time limit | Modify default time limit |
|---|---|---|---|
| Apache Cassandra | Yes | 10 seconds | Set the read_timeout_in_ms and write_timeout_in_ms parameters in the Apache Cassandra configuration file or in the Apache Cassandra connection URL to change the default time limit. |
| Cloud Object Storage | No | N/A | N/A |
| Db2 | Yes | N/A | Set the QueryTimeout parameter to specify the amount of time (in seconds) that a client waits for a query execution to complete before a client attempts to cancel the execution and return control to the application. |
| Hive via Execution Engine for Hadoop | Yes | 60 minutes (3600 seconds) | Set the hive.session.query.timeout property in the connection URL to change the default time limit. |
| Microsoft SQL Server | Yes | 30 seconds | Set the QUERY_TIMEOUT server configuration option to change the default time limit. |
| MongoDB | Yes | 30 seconds | Set the maxTimeMS parameter in the query options to change the default time limit. |
| MySQL | Yes | 0 seconds (No default time limit) | Set the timeout property in the connection URL or in the JDBC driver properties to specify a time limit for your query. |
| Oracle | Yes | 30 seconds | Set the QUERY_TIMEOUT parameter in the Oracle JDBC driver to specify the maximum amount of time a query can run before it is automatically cancelled. |
| PostgreSQL | No | N/A | Set the queryTimeout property to specify the maximum amount of time that a query can run. The default value of the queryTimeout property is 0. |
| Snowflake | Yes | 6 hours | Set the queryTimeout parameter to change the default time limit. |
To prevent your batch deployments from failing, partition your data set or decrease its size.
Batch deployment jobs that use large inline payload might get stuck in starting or running state
Applies to: 4.8.0 and later
If you provide a large asynchronous payload for your inline batch deployment, the runtime manager process can run out of heap memory.
In the following example, 92 MB of payload was passed inline to the batch deployment, which caused the heap to run out of memory:
Uncaught error from thread [scoring-runtime-manager-akka.scoring-jobs-dispatcher-35] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[scoring-runtime-manager]
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:172)
at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:538)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:174)
...
This can result in concurrent jobs getting stuck in the starting or running state. The starting state can be cleared only by deleting the deployment and creating a new deployment. The running state can be cleared without deleting the deployment.
As a workaround, use data references instead of inline payloads when you provide large inputs to batch deployments.
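For illustration, a hedged sketch of submitting a batch deployment job with a data asset reference instead of an inline payload, using the ibm-watson-machine-learning Python client; the client setup, asset IDs, and output file name are placeholders, and the exact metadata fields can differ in your environment:

from ibm_watson_machine_learning import APIClient

client = APIClient(wml_credentials)        # credentials as in your environment
client.set.default_space("<SPACE_ID>")     # placeholder space ID

# Reference a data asset in the space instead of passing the rows inline.
job_payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA_REFERENCES: [{
        "type": "data_asset",
        "connection": {},
        "location": {"href": "/v2/assets/<INPUT_ASSET_ID>?space_id=<SPACE_ID>"},
    }],
    client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
        "type": "data_asset",
        "connection": {},
        "location": {"name": "batch_output.csv"},
    },
}

job = client.deployments.create_job("<DEPLOYMENT_ID>", meta_props=job_payload)
print(job)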
Update pod time-out limits to manage resources for long-running jobs
Control how frequently long-running pods should be reclaimed to free up resources.
Follow the steps described in Predictions API in Watson Machine Learning service can time out too soon for details on stopping and starting pods. Update the parameters jobs_per_deployment_limit, job_pod_cleanup_check_interval, and job_pod_cleanup_idle_time.
In this example, a long-running Decision Optimization solution is consuming pod resources. The administrator can intervene to reclaim pods.
oc patch wmlbase wml-cr --type=merge -p '{"spec": {"jobs_per_deployment_limit": <JOBS_PER_DEPLOYMENT_LIMIT>, "job_pod_cleanup_check_interval": <CHECK_INTERVAL_IN_SECONDS>, "job_pod_cleanup_idle_time": <IDLE_TIME_IN_MINUTES>}}' -n <NAMESPACE>
where:
- jobs_per_deployment_limit controls the maximum number of jobs that can run in parallel per deployment. It takes an integer as input. The default is 2.
- job_pod_cleanup_check_interval controls how frequently the internal scheduler wakes up to check for idle Decision Optimization runtime pods. It takes an integer as input. The default is 900 (seconds).
- job_pod_cleanup_idle_time controls the minimum time that a Decision Optimization runtime pod must be idle before it is selected for reclaim. It takes an integer as input. The default is 120 (minutes).
Setting environment variables in a conda yaml file does not work for deployments
Setting environment variables in a conda yaml file does not work for deployments. This means that you cannot override existing environment variables, for example LD_LIBRARY_PATH, when deploying assets in Watson Machine Learning.
As a workaround, if you're using a Python function, consider setting default parameters. For details, see Deploying Python functions.
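For example, a minimal sketch of a deployable Python function that carries its configuration in default parameters instead of environment variables; the parameter names and values are placeholders, not part of the product documentation:

# Placeholder configuration values; adapt to your deployment.
default_params = {"model_path": "/opt/models", "log_level": "INFO"}

def deployable_function(params=default_params):
    # Values from params are available at scoring time without relying on
    # environment variables such as LD_LIBRARY_PATH.
    def score(payload):
        values = payload["input_data"][0]["values"]
        # ... use params and values to produce predictions ...
        return {"predictions": [{"fields": ["echo"], "values": values}]}
    return score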
More resources required for feature engineering on Power platform
Applies to: 4.8.0 to 4.8.2
Fixed in: 4.8.3
When you train an AutoAI experiment with a 16x64 environment on the Power platform, disable the text feature engineering function, or use an 8x32 AutoAI environment if you use text feature engineering.
R Shiny applications deployed with shiny-r3.6 software specification fail after upgrade
Applies to: 4.8.4 and later
After upgrading from Cloud Pak for Data version 4.7.0 to version 4.8.4, R Shiny applications that are deployed by using the shiny-r3.6 software specification in FIPS mode for the x86 architecture fail. You might receive the error message Error 502 - Bad Gateway.
As a workaround, make sure that your R Shiny applications are not deployed with the shiny-r3.6 software specification. For applications deployed with the shiny-r3.6 software specification, update your deployment to use the latest software specification. For more information, see Managing outdated software specifications or frameworks. You can also delete your application deployment if you no longer need it to free up resources.
Troubleshooting
Follow these tips to resolve common problems you might encounter when working with Watson Machine Learning.
Insufficient class members in training data for AutoAI experiment
Training data for an AutoAI experiment must have at least 4 members for each class. If your training data has an insufficient number of members in a class, you will encounter this error:
ERROR: ingesting data Message id: AC10011E. Message: Each class must have at least 4 members. The following classes have too few members: ['T'].
To resolve the problem, update the training data to remove the class or add more members.
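For example, a minimal sketch, assuming the training data is in a CSV file and the prediction column is named label (both placeholders), that removes classes with fewer than 4 members:

import pandas as pd

df = pd.read_csv("training_data.csv")              # placeholder file name
counts = df["label"].value_counts()                # members per class
too_small = counts[counts < 4].index               # classes with fewer than 4 members
df_clean = df[~df["label"].isin(too_small)]        # drop those classes (or add more rows instead)
df_clean.to_csv("training_data_clean.csv", index=False)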
Parent topic: Limitations and known issues in IBM Cloud Pak for Data