IBM Cloud Pak® for Data Version 4.8 will reach end of support (EOS) on 31 July, 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.
Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.8 reaches end of support. For more information, see Upgrading from IBM Cloud Pak for Data Version 4.8 to IBM Software Hub Version 5.1.
Known issues and limitations for Watson Machine Learning
The following known issues and limitations apply to Watson Machine Learning.
Known issues
- Known issues for Federated Learning
- Known issues for AutoAI
- Known issues for Watson Machine Learning
  - Unusable deployments after an upgrade or restore from backup
  - Predictions API in Watson Machine Learning service can time out too soon
  - Decision Optimization deployment job fails with error: "Add deployment failed with deployment not finished within time"
  - Previewing masked data assets is blocked in deployment space
  - Deployment runtime containers in CrashLoopBackoff state after upgrade from previous releases
  - Deployments with custom conda_yml package extensions and nodefaults fail
  - Deployment of runtime pods fails after upgrade
  - Exporting asset files from a space fails after backing up and restoring Cloud Pak for Data
Limitations
- Limitations for Watson Machine Learning
  - Restrictions for IBM Z and IBM LinuxONE users
  - Deploying a model on an s390x cluster might require retraining
  - Limits on size of model deployments
  - Security for file uploads
  - Automatic mounting of storage volumes is not supported by online and batch deployments
  - Batch deployments that use large data volumes as input might fail
  - Batch deployment jobs that use large inline payload might get stuck in starting or running state
  - Python scoring function with a custom software specification executed on the Linux on Power (ppc64le) platform fails when custom software spec has a YML package extension
  - Update pod time-out limits to manage resources for long-running jobs
  - Setting environment variables in a conda yaml file does not work for deployments
  - R Shiny applications deployed with shiny-r3.6 software specification fail after upgrade
- Limitations for AutoAI experiments
Known issues for Federated Learning
Authentication failures for Federated Learning training jobs when allowed IPs are specified in the Remote Training System
Applies to: 4.8.0 and later
Currently, the Red Hat OpenShift Ingress controller is not setting the X-Forwarded-For header with the client's IP address regardless of the forwardedHeaderPolicy setting. This causes authentication failures for Federated
Learning training jobs when allowed_ips are specified in the Remote Training System even though the client IP address is correct.
To use the Federated Learning Remote Training System IP restriction feature in Cloud Pak for Data 4.0.3, configure an external proxy to inject the X-Forwarded-For header. For more information, see the article on configuring ingress.
Known issues for AutoAI
The Flight service returns "Received RST_STREAM with error code 3" when reading large data sets
Applies to: 4.8.0
Fixed in: 4.8.1
If you use the Flight service and the pyarrow library to read large data sets in an AutoAI experiment in a notebook, the Flight service might return the following message:
Received RST_STREAM with error code 3
When this error occurs, the AutoAI experiment receives incomplete data, which can affect training of the model candidate pipelines.
If this error occurs, add the following code to your notebook:
os.environ['GRPC_EXPERIMENTAL_AUTOFLOWCONTROL'] = 'false'
Then, rerun the experiment.
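For example, a minimal notebook cell that sets the flag before the experiment reads data; the os import is required if the module is not already loaded, and the trailing comment stands in for your own experiment code:

import os

# Disable the experimental gRPC flow control used when reading data through Flight.
os.environ['GRPC_EXPERIMENTAL_AUTOFLOWCONTROL'] = 'false'

# ... then rerun the AutoAI experiment cells that read the data set ...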
Importing an AutoAI notebook from a catalog can result in runtime error
Applies to: 4.8.0 and later
If you save an AutoAI notebook to an IBM Knowledge Catalog, and then you import it into a project and run it, you might get this error: Library not compatible or missing.
This error results from a mismatch between the runtime environment saved in the catalog and the runtime environment required to run the notebook in the project. To resolve, update the runtime environment to the latest supported version. For
example, if the imported notebook uses Runtime 22.2 in the catalog version, update to Runtime 23.1 and run the notebook job again.
Running AutoAI pipeline notebook generates TypeError
Applies to: 4.8.8 and later
Running an AutoAI pipeline notebook results in a TypeError with a missing argument in the initialized CatImputer:
cat_imputer = CatImputer(missing_values=float("nan"), sklearn_version_family="1")
To work around this issue, add strategy="most_frequent" to the initializer and rerun the cell:
cat_imputer = CatImputer(strategy="most_frequent", missing_values=float("nan"), sklearn_version_family="1")
Known issues for Watson Machine Learning
Unusable deployments after an upgrade or restore from backup
Applies to: 4.8.0 and later
For deployments created on Cloud Pak for Data 4.6.x, generating predictions with a deployment might fail after an upgrade to Cloud Pak for Data 4.8.x. The error message for this problem is:
Deployment: <deployment-ID> has been suspended due to the deployment owner either not being a member of the deployment space: <space-ID> any more or removed from the system.
These errors can also occur following a restore from backup.
To resolve the problem, update the deployments by using the following steps. R Shiny deployments require the alternative steps described later in this section.
To update deployments, except for R Shiny deployments:
1. For HOST="CPD_HOSTNAME", replace "CPD_HOSTNAME" with the Cloud Pak for Data hostname.
2. For SPACE_ID="WML_SPACE_ID", replace "WML_SPACE_ID" with the space ID of the deployment that is failing.
3. For DEPLOYMENT_ID="WML_DEPLOYMENT_ID", replace "WML_DEPLOYMENT_ID" with the deployment ID of the broken deployment.
4. For "Authorization: ZenApiKey <token>", supply a valid token. If you export the token as an environment variable, use ${TOKEN} instead of <token>.
5. Run the following curl command, replacing "OWNER_ID" in the PATCH payload with the actual owner ID on this cluster:
   curl -k -X PATCH "$HOST/ml/v4/deployments/$DEPLOYMENT_ID?version=2020-04-20&space_id=$SPACE_ID" -H "content-type: application/json" -H "Authorization: ZenApiKey <token>" --data '[{ "op": "replace", "path": "/metadata/owner", "value": "OWNER_ID" }]'
To run these commands, you must generate and export the token as the ${MY_TOKEN} environment variable. For details, see Generating an API authorization token.
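If you prefer to script the update, the following Python sketch issues the same PATCH request with the requests library. It only illustrates the curl command shown above; the placeholder values and the MY_TOKEN environment variable are assumptions that you must adapt to your cluster.

import os
import requests

HOST = "https://<CPD_HOSTNAME>"          # Cloud Pak for Data hostname (placeholder)
SPACE_ID = "<WML_SPACE_ID>"              # space ID of the failing deployment (placeholder)
DEPLOYMENT_ID = "<WML_DEPLOYMENT_ID>"    # ID of the broken deployment (placeholder)
OWNER_ID = "<OWNER_ID>"                  # actual owner ID on this cluster (placeholder)
TOKEN = os.environ["MY_TOKEN"]           # token generated and exported as described above

response = requests.patch(
    f"{HOST}/ml/v4/deployments/{DEPLOYMENT_ID}",
    params={"version": "2020-04-20", "space_id": SPACE_ID},
    headers={
        "content-type": "application/json",
        "Authorization": f"ZenApiKey {TOKEN}",
    },
    json=[{"op": "replace", "path": "/metadata/owner", "value": OWNER_ID}],
    verify=False,  # equivalent of curl -k; verify certificates in production
)
print(response.status_code, response.text)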
To update R Shiny deployments:
1. Run oc get pods -n NAMESPACE | grep "wml-deployment-manager", replacing NAMESPACE with the Watson Machine Learning namespace.
2. For oc exec -it WML_DEPLOYMENT_MANAGER_POD_NAME bash -n NAMESPACE, replace WML_DEPLOYMENT_MANAGER_POD_NAME with the deployment manager pod name displayed in the previous step and replace NAMESPACE with the Watson Machine Learning namespace.
3. For deployment_id="DEPLOYMENT_ID", replace DEPLOYMENT_ID with the deployment ID.
4. For space_id="SPACE_ID", replace SPACE_ID with the space ID for the deployment.
5. For HOST="https://wml-deployment-manager-svc.NAMESPACE.svc:16500", replace NAMESPACE with the Watson Machine Learning namespace.
6. For "Authorization: ZenApiKey <token>", supply a valid token. If you export the token as an environment variable, use ${TOKEN} instead of <token>. To run these commands, you must generate and export the token as the ${MY_TOKEN} environment variable. For details, see Generating an API authorization token.
7. Re-create the R Shiny deployment by using the following curl command:
   curl -k -X PUT "$HOST/ml/v4_private/recreate_deployment/$deployment_id?version=2020-06-12&space_id=$space_id" -H "Authorization: ZenApiKey <token>"
8. Verify the status of the R Shiny deployment and wait for the deployment to become "Ready" before proceeding to the next step:
   curl -k -X GET "$HOST/ml/v4/deployments/$deployment_id?version=2020-06-12&space_id=$space_id" -H "Authorization: ZenApiKey ${MY_TOKEN}"
9. If you are upgrading to Cloud Pak for Data 4.8.0 or restoring from backup, scale up the number of copies by 1 from the deployment space UI. The deployment state changes from "Unusable" to "Deployed".
10. Optionally, scale the number of copies back to 1 or the original setting when the deployment is working as expected.
Predictions API in Watson Machine Learning service can time out too soon
Applies to: 4.8.0 and later
If the predictions API (POST /ml/v4/deployments/{deployment_id}/predictions) in the Watson Machine Learning deployment service is timing out too soon, follow these steps to manually update the timeout interval.
1. Update the API timeout parameter in the Watson Machine Learning CR:
   REQUIRED_TIMEOUT_IN_SECONDS=<timeout-in-seconds>
   NAMESPACE=<wml-instance-namespace>
   oc patch wmlbase wml-cr -p "{\"spec\":{\"wml_api_timeout\": $REQUIRED_TIMEOUT_IN_SECONDS, \"wml_envoy_pods\": 1}}" --type=merge -n "$NAMESPACE"
   The following example shows how to update the timeout to 600 seconds for the service instance namespace zen:
   REQUIRED_TIMEOUT_IN_SECONDS=600
   NAMESPACE=zen
   oc patch wmlbase wml-cr -p "{\"spec\":{\"wml_api_timeout\": $REQUIRED_TIMEOUT_IN_SECONDS, \"wml_envoy_pods\": 1}}" --type=merge -n "$NAMESPACE"
   Note: If HPA is disabled on the Cloud Pak for Data cluster and you want to increase the throughput of Watson Machine Learning prediction API requests, you can increase the number of Watson Machine Learning envoy pods by using the wml_envoy_pods parameter in the command. One envoy pod can support up to 1500 requests per second.
2. Restart the NGINX pods:
   oc rollout restart deployment ibm-nginx -n "$NAMESPACE"
3. Check that the NGINX pods come up:
   oc get pods -n "$NAMESPACE" | grep "ibm-nginx"
Decision Optimization deployment job fails with error: "Add deployment failed with deployment not finished within time"
Applies to: 4.8.0 and later
If your Decision Optimization deployment job fails with the following error, complete these steps to extend the timeout window.
"status": {
"completed_at": "2022-09-02T02:35:31.711Z",
"failure": {
"trace": "0c4c4308935a3c4f2d9987b22139c61c",
"errors": [{
"code": "add_deployment_failed_in_runtime",
"message": "Add deployment failed with deployment not finished within time"
}]
},
"state": "failed"
}
To update the deployment timeout in the deployment manager:
1. Edit the wmlbase wml-cr and add the line ignoreForMaintenance: true. This setting puts the WML operator into maintenance mode, which stops automatic reconciliation; otherwise, automatic reconciliation undoes any configmap changes that you apply.
   oc patch wmlbase wml-cr --type merge --patch '{"spec": {"ignoreForMaintenance": true}}' -n <namespace>
   For example:
   oc patch wmlbase wml-cr --type merge --patch '{"spec": {"ignoreForMaintenance": true}}' -n zen
2. Capture the contents of the wmlruntimemanager configmap in a YAML file.
   oc get cm wmlruntimemanager -n <namespace> -o yaml > wmlruntimemanager.yaml
   For example:
   oc get cm wmlruntimemanager -n zen -o yaml > wmlruntimemanager.yaml
3. Create a backup of the wmlruntimemanager YAML file.
   cp wmlruntimemanager.yaml wmlruntimemanager.yaml.bkp
4. Open wmlruntimemanager.yaml.
   vi wmlruntimemanager.yaml
5. Navigate to the file runtimeManager.conf and search for the property service.
6. Increase the number of retries in the retry_count field to extend the timeout window:
   service {
     jobs {
       do {
         check_deployment_status {
           retry_count = 420 // Increase the number of retries to extend the timeout window
         }
         retry_delay = 1000
       }
     }
   }
   Where:
   - retry_count is the number of retries
   - retry_delay is the delay between each retry, in milliseconds
   In this example, the timeout is configured as 7 minutes (retry_count * retry_delay = 420 * 1000 ms = 7 minutes). To increase the timeout further, increase the number of retries in the retry_count field.
7. Apply the deployment manager configmap changes:
   oc delete -f wmlruntimemanager.yaml
   oc create -f wmlruntimemanager.yaml
8. Restart the deployment manager pods:
   oc get pods -n <namespace> | grep wml-deployment-manager
   oc delete pod <podname> -n <namespace>
9. Wait for the deployment manager pod to come up:
   oc get pods -n <namespace> | grep wml-deployment-manager
If you plan to upgrade the Cloud Pak for Data cluster, you must bring the WML operator out of maintenance mode by setting the field ignoreForMaintenance to false in wml-cr.
Previewing masked data assets is blocked in deployment space
Applies to: 4.8.0 and later
A data asset preview might fail with this message:
This asset contains masked data and is not supported for preview in the Deployment Space
Deployment spaces currently don't support data masking, so the preview of masked assets is blocked to prevent data leaks.
Deployments with custom conda_yml package extensions and nodefaults fail
Applies to: 4.8.1 and later
Fixed in: 4.8.3
If you deploy an asset with a custom conda_yml package extension or update packages with a conda env update subprocess call, your deployment might fail when conda channels are restricted to nodefaults.
The workaround is as follows:
- For Cloud Pak for Data version 4.8.3, use the conda_yml package extension.
- For Cloud Pak for Data versions prior to 4.8.3, use the conda_yml package extension and remove the nodefaults restriction from the channels list.
The following example shows conda channels restricted to nodefaults:
channels:
- empty
- nodefaults
dependencies:
- pip:
- langdetect==1.0.9
As a workaround, remove the nodefaults restriction from the channels list:
channels:
- empty
dependencies:
- pip:
- langdetect==1.0.9
Deployment of runtime pods fails after upgrade
Applies to: 4.8.4
If you deploy a machine learning model with a constricted software specification in FIPS mode, the runtime pod might fail after you upgrade to Cloud Pak for Data version 4.8.4. To learn more about constricted software specifications, see Software specifications lifecycle.
The following code snippet shows the py39 runtime pods entering the CrashLoopBackOff state after an upgrade from Cloud Pak for Data version 4.6.5 to version 4.8.4.
wml-dep-py39-00d7b8ba-e942-4b9e-bf89-3096fb143481-5449b56b9lnrx 1/2 CrashLoopBackOff 4 (22s ago) 2m2s
wml-dep-py39-00d7b8ba-e942-4b9e-bf89-3096fb143481-5d55449f2f8mm 1/2 CrashLoopBackOff 4 (16s ago) 2m2s
wml-dep-py39-2dfb43d1-32ea-46b4-9318-1270a9869e7c-5bd5bb5cmm5jv 1/2 CrashLoopBackOff 4 (34s ago) 2m2s
wml-dep-py39-2dfb43d1-32ea-46b4-9318-1270a9869e7c-ff74c46dztl7r 1/2 CrashLoopBackOff 4 (23s ago) 2m2s
wml-dep-py39-38f88d99-78d2-4c2d-8fb4-e1039d465c5a-75c98ffd5nb4b 1/2 CrashLoopBackOff 4 (31s ago) 2m2s
wml-dep-py39-38f88d99-78d2-4c2d-8fb4-e1039d465c5a-86c8767bmx42v 1/2 CrashLoopBackOff 4 (20s ago) 2m2s
wml-dep-py39-5ac302d4-819a-4c48-8a42-d63d2437e9af-547dc9876jzkn 1/2 CrashLoopBackOff 4 (11s ago) 2m2s
wml-dep-py39-76d51889-cb37-460c-b86f-078b234163e4-7454fd76sgkrm 1/2 CrashLoopBackOff 4 (23s ago) 2m2s
wml-dep-py39-76d51889-cb37-460c-b86f-078b234163e4-775564f7tmntg 1/2 CrashLoopBackOff 4 (21s ago) 2m2s
wml-dep-py39-9409a1c5-02f4-4183-ae87-8025815d01bb-6b89bdc9fkk6g 1/2 CrashLoopBackOff 4 (26s ago) 2m2s
wml-dep-py39-9409a1c5-02f4-4183-ae87-8025815d01bb-6b9559f57b2tm 1/2 CrashLoopBackOff 4 (17s ago) 2m2s
wml-dep-py39-ecd96b96-27d6-4abf-9fe3-9a1f1eff16de-65c66b4cn7gz5 1/2 CrashLoopBackOff 4 (32s ago) 2m2s
wml-dep-py39-ecd96b96-27d6-4abf-9fe3-9a1f1eff16de-795f955f2xtgp 1/2 CrashLoopBackOff 4 (33s ago) 2m1s
wml-dep-py39-f60e182c-627d-468d-a7ca-bfb9387e3ad8-57cbdcb6fhvs2 1/2 CrashLoopBackOff 4 (31s ago) 2m1s
wml-dep-py39-f60e182c-627d-468d-a7ca-bfb9387e3ad8-6d44cfbdqmh8n 1/2 CrashLoopBackOff 4 (22s ago) 2m1s
As a workaround, you must upgrade to Cloud Pak for Data version 4.6.5 or higher and contact IBM Support to apply the hot fix before upgrading to Cloud Pak for Data version 4.8.4.
Exporting asset files from a space fails after backing up and restoring Cloud Pak for Data
Applies to: 4.8.4
This issue occurs on clusters running on Power (ppc64le) hardware.
After you back up and restore an instance of Cloud Pak for Data that uses Spectrum Scale storage, you cannot export asset files from a space. The export fails with a message that indicates that the asset files API was not able to connect to RabbitMQ.
To resolve the problem, restart the asset-files-api pod:
1. Set the API_POD_NAME environment variable:
   export API_POD_NAME=$(oc get pods -n=${PROJECT_CPD_INST_OPERANDS} | grep "asset-files-api" | awk '{print $1}')
2. Restart the asset-files-api pod:
   oc delete pod ${API_POD_NAME} -n=${PROJECT_CPD_INST_OPERANDS}
Limitations for Watson Machine Learning
AutoAI file gets pushed to the Git repository in default Git projects
After you create an AutoAI experiment in a default Git project and then create a commit, you see a file that includes your experiment name in the list of files that can be committed. There are no consequences to including this file in your commit. The AutoAI experiment will not appear in the asset list for any other user who pulls the file into their local clone by using Git. Additionally, other users are not prevented from creating an AutoAI experiment with the same name.
Restrictions for IBM Z and IBM LinuxONE users
Applies to: 4.8.0 and later
For a list of feature restrictions, see Capabilities on Linux on IBM Z and IBM LinuxONE
Deploying a model on an s390x cluster might require retraining
Applies to: 4.8.0 and later
Training an AI model on a different platform such as x86/ppc and deploying the AI model on s390x using Watson Machine Learning might fail because of an endianness issue. In such cases, retrain and deploy the existing AI model on the s390x platform to resolve the problem.
Limits on size of model deployments
Applies to: 4.8.0 and later
Limits on the size of models you deploy with Watson Machine Learning depend on factors such as the model framework and type. In some instances, when you exceed a threshold, you will be notified with an error when you try to store a model in
the Watson Machine Learning repository, for example: OverflowError: string longer than 2147483647 bytes. In other cases, the failure might be indicated by a more general error message, such as The service is experiencing some downstream errors, please re-try the request or There's no available attachment for the targeted asset. Any of these results indicate that you have exceeded the allowable size limits for that type of deployment.
Security for file uploads
Applies to: 4.8.0 and later
Files you upload through the Watson Studio or Watson Machine Learning UI are not validated or scanned for potentially malicious content. It is recommended that you run security software, such as an anti-virus application, on all files before uploading to ensure the security of your content.
Python scoring function with a custom software specification executed on the Linux on Power (ppc64le) platform fails when custom software spec has a YML package extension
Applies to: 4.8.0 to 4.8.2
Fixed in: 4.8.3
When executing a Python scoring function with a custom software specification that has a YML package extension, the scoring call returns this error: certificate verify failed: unable to get local issuer certificate.
To resolve the problem, explicitly install the certifi==2023.5.7 version in the runtime. For example:
%%writefile tmp_custom_env.yml
dependencies:
- certifi==2023.5.7
- pip:
- langdetect==1.0.9
The cell returns: Overwriting tmp_custom_env.yml
Maximum number of feature columns in AutoAI experiments
Applies to: 4.8.0 and later
The maximum number of feature columns for a classification or regression experiment is 5000.
No support for Cloud Pak for Data authentication with storage volume connection
Applies to: 4.8.0 and later
You cannot use a storage volume connection with the 'Cloud Pak for Data authentication' option enabled as a data source in an AutoAI experiment. AutoAI does not currently support the user authentication token. Instead, disable the 'Cloud Pak for Data authentication' option in the storage volume connection to use the connection as a data source in your AutoAI experiment.
Deployment runtime containers in CrashLoopBackoff state after upgrade from previous releases
Applies to: 4.8.0 and later
After upgrading Watson Machine Learning, some runtime containers are in CrashLoopBackoff state.
To fix the issue, patch the RTA of the deployment. First, fetch the rta-id for the deployments in CrashLoopBackoff state by using this command:
oc get rta -l WML_DEPLOYMENT_ID=<deployment-id>
Then, update the RTA by using this command:
curl -k -X PUT "https:///v2/runtime_services?uid=&location=urn:ibm:type:cpd" -H "Authorization: ZenApiKey ${MY_TOKEN}" -H "Service-Authorization: Basic $TOKEN" --data-raw '{"id":"<rta-id>","location":{"type":"cpd"},"environment":{"env":["productVersion=4.7.0 "]}}'
Automatic mounting of storage volumes is not supported by online and batch deployments
Applies to: 4.8.0 and later
You cannot use automatic mounts for storage volumes with Watson Machine Learning online and batch deployments. Watson Machine Learning does not support this feature for Python-based runtimes or for R-script, SPSS Modeler, Spark, and Decision Optimization runtimes. You can use automatic mounts for storage volumes only with Watson Machine Learning Shiny app deployments and notebook runtimes.
As a workaround, you can use the download method from the Data assets library, which is part of the ibm-watson-machine-learning Python client.
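For example, a minimal sketch with the ibm-watson-machine-learning Python client; the credential values, space ID, asset ID, and file name are placeholders, and the exact credential fields can vary with your authentication setup:

from ibm_watson_machine_learning import APIClient

# Placeholder credentials; adapt to your Cloud Pak for Data cluster and auth method.
wml_credentials = {
    "url": "https://<CPD_HOSTNAME>",
    "username": "<USERNAME>",
    "apikey": "<API_KEY>",
    "instance_id": "openshift",
    "version": "4.8",
}

client = APIClient(wml_credentials)
client.set.default_space("<SPACE_ID>")

# Download the data asset to the local file system of the runtime instead of
# relying on an automatically mounted storage volume.
client.data_assets.download("<DATA_ASSET_ID>", filename="training_data.csv")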
Batch deployments that use large data volumes as input might fail
Applies to: 4.8.0 and later
If you are scoring a batch job that uses large volumes of data as the input source, the job might fail because of internal timeout settings. A symptom of this problem might be an error message similar to the following example:
Incorrect input data: Flight returned internal error, with message: CDICO9999E: Internal error occurred: Snowflake sQL logged error: JDBC driver internal error: Timeout waiting for the download of #chunk49(Total chunks: 186) retry=0.
If the timeout occurs when you score your batch deployment, you must configure the data source query level timeout limitation to handle long-running jobs.
Query-level timeout information for data sources is as follows:
| Data source | Query level time limitation | Default time limit | Modify default time limit |
|---|---|---|---|
| Apache Cassandra | Yes | 10 seconds | Set the read_timeout_in_ms and write_timeout_in_ms parameters in the Apache Cassandra configuration file or in the Apache Cassandra connection URL to change the default time limit. |
| Cloud Object Storage | No | N/A | N/A |
| Db2 | Yes | N/A | Set the QueryTimeout parameter to specify the amount of time (in seconds) that a client waits for a query execution to complete before a client attempts to cancel the execution and return control to the application. |
| Hive via Execution Engine for Hadoop | Yes | 60 minutes (3600 seconds) | Set the hive.session.query.timeout property in the connection URL to change the default time limit. |
| Microsoft SQL Server | Yes | 30 seconds | Set the QUERY_TIMEOUT server configuration option to change the default time limit. |
| MongoDB | Yes | 30 seconds | Set the maxTimeMS parameter in the query options to change the default time limit. |
| MySQL | Yes | 0 seconds (No default time limit) | Set the timeout property in the connection URL or in the JDBC driver properties to specify a time limit for your query. |
| Oracle | Yes | 30 seconds | Set the QUERY_TIMEOUT parameter in the Oracle JDBC driver to specify the maximum amount of time a query can run before it is automatically cancelled. |
| PostgreSQL | No | N/A | Set the queryTimeout property to specify the maximum amount of time that a query can run. The default value of the queryTimeout property is 0. |
| Snowflake | Yes | 6 hours | Set the queryTimeout parameter to change the default time limit. |
To prevent your batch deployments from failing, partition your data set or decrease its size.
Batch deployment jobs that use large inline payload might get stuck in starting or running state
Applies to: 4.8.0 and later
If you provide a large asynchronous payload for your inline batch deployment, the runtime manager process can run out of heap memory.
In the following example, 92 MB of payload was passed inline to the batch deployment, which caused the heap to run out of memory:
Uncaught error from thread [scoring-runtime-manager-akka.scoring-jobs-dispatcher-35] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[scoring-runtime-manager]
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:172)
at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:538)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:174)
...
This can result in concurrent jobs getting stuck in the starting or running state. The starting state can be cleared only by deleting the deployment and creating a new deployment. The running state can be cleared without deleting the deployment.
As a workaround, use data references instead of inline payloads when you provide large inputs to batch deployments.
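For illustration, a hedged sketch of submitting a batch deployment job with a data asset reference instead of an inline payload, using the ibm-watson-machine-learning Python client; the client setup, asset IDs, and output file name are placeholders, and the exact metadata fields can differ in your environment:

from ibm_watson_machine_learning import APIClient

client = APIClient(wml_credentials)        # credentials as in your environment
client.set.default_space("<SPACE_ID>")     # placeholder space ID

# Reference a data asset in the space instead of passing the rows inline.
job_payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA_REFERENCES: [{
        "type": "data_asset",
        "connection": {},
        "location": {"href": "/v2/assets/<INPUT_ASSET_ID>?space_id=<SPACE_ID>"},
    }],
    client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
        "type": "data_asset",
        "connection": {},
        "location": {"name": "batch_output.csv"},
    },
}

job = client.deployments.create_job("<DEPLOYMENT_ID>", meta_props=job_payload)
print(job)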
Update pod time-out limits to manage resources for long-running jobs
Control how frequently long-running pods should be reclaimed to free up resources.
Follow the steps described in Predictions API in Watson Machine Learning service can time out too soon for details on stopping and starting pods. Update the parameters jobs_per_deployment_limit, job_pod_cleanup_check_interval, and job_pod_cleanup_idle_time.
In this example, a long-running Decision Optimization solution is consuming pod resources. The administrator can intervene to reclaim pods.
oc patch wmlbase wml-cr --type=merge -p '{"spec": {"jobs_per_deployment_limit": <JOBS_PER_DEPLOYMENT_LIMIT>, "job_pod_cleanup_check_interval": <CHECK_INTERVAL_IN_SECONDS>, "job_pod_cleanup_idle_time": <IDLE_TIME_IN_MINUTES>}}' -n <NAMESPACE>
where:
- jobs_per_deployment_limit controls the maximum number of jobs that can run in parallel per deployment. It takes an integer as input. The default is 2.
- job_pod_cleanup_check_interval controls how frequently the internal scheduler wakes up to check for idle Decision Optimization runtime pods. It takes an integer as input. The default is 900 (seconds).
- job_pod_cleanup_idle_time controls the minimum time that a Decision Optimization runtime pod must be idle before it is selected for reclaim. It takes an integer as input. The default is 120 (minutes).
Setting environment variables in a conda yaml file does not work for deployments
Setting environment variables in a conda yaml file does not work for deployments. This means that you cannot override existing environment variables, for example LD_LIBRARY_PATH, when deploying assets in Watson Machine Learning.
As a workaround, if you're using a Python function, consider setting default parameters. For details, see Deploying Python functions.
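For example, a minimal sketch of a deployable Python function that carries its configuration in default parameters instead of environment variables; the parameter names and values are placeholders, not part of the product documentation:

# Placeholder configuration values; adapt to your deployment.
default_params = {"model_path": "/opt/models", "log_level": "INFO"}

def deployable_function(params=default_params):
    # Values from params are available at scoring time without relying on
    # environment variables such as LD_LIBRARY_PATH.
    def score(payload):
        values = payload["input_data"][0]["values"]
        # ... use params and values to produce predictions ...
        return {"predictions": [{"fields": ["echo"], "values": values}]}
    return score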
More resources required for feature engineering on Power platform
Applies to: 4.8.0 to 4.8.2
Fixed in: 4.8.3
When you train an AutoAI experiment with a 16x64 environment on the Power platform, disable the text feature engineering function, or use an 8x32 AutoAI environment if you use text feature engineering.
R Shiny applications deployed with shiny-r3.6 software specification fail after upgrade
Applies to: 4.8.4 and later
After upgrading from Cloud Pak for Data version 4.7.0 to version 4.8.4, R Shiny applications that are deployed by using the shiny-r3.6 software specification in FIPS mode for the x86 architecture fail. You might receive the error message Error 502 - Bad Gateway.
As a workaround, make sure that your R Shiny applications are not deployed with the shiny-r3.6 software specification. For applications deployed with the shiny-r3.6 software specification, update your deployment to use the latest software specification. For more information, see Managing outdated software specifications or frameworks. You can also delete your application deployment if you no longer need it to free up resources.
Troubleshooting
Follow these tips to resolve common problems you might encounter when working with Watson Machine Learning.
Insufficient class members in training data for AutoAI experiment
Training data for an AutoAI experiment must have at least 4 members for each class. If your training data has an insufficient number of members in a class, you will encounter this error:
ERROR: ingesting data Message id: AC10011E. Message: Each class must have at least 4 members. The following classes have too few members: ['T'].
To resolve the problem, update the training data to remove the class or add more members.
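For example, a minimal sketch, assuming the training data is in a CSV file and the prediction column is named label (both placeholders), that removes classes with fewer than 4 members:

import pandas as pd

df = pd.read_csv("training_data.csv")              # placeholder file name
counts = df["label"].value_counts()                # members per class
too_small = counts[counts < 4].index               # classes with fewer than 4 members
df_clean = df[~df["label"].isin(too_small)]        # drop those classes (or add more rows instead)
df_clean.to_csv("training_data_clean.csv", index=False)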
Parent topic: Limitations and known issues in IBM Cloud Pak for Data