Troubleshooting Watson OpenScale
You can use the following techniques to work around problems with IBM Watson OpenScale.
- I can't see the monitoring metrics when my evaluation finishes
- When I use AutoAI, why am I getting an error about mismatched data?
- A monitor run fails with an OutOfResources exception error message
- Connection issue from IBM Watson OpenScale for IBM Cloud Pak for Data to IBM Watson Machine Learning
- The Kafka SSL certificate and the compound public key are treated as invalid by the Python service
- Watson OpenScale evaluation might fail due to large number of subscriptions
- Watson OpenScale operator doesn't watch and reconcile custom resources
- Watson OpenScale etcd and scheduler pods fail after large data upload
- Uploading feedback data fails in production subscription after importing settings
- Watson OpenScale fails to create a new Hive table for the batch deployment subscription
My model evaluations are failing
When you provide the minimum number of payload records for model evaluations, the evaluations do not run successfully and fail to produce the expected metrics. The Evaluation window on your dashboard either displays metrics from the last successful evaluation or displays an error message.
The performance of the evaluations might be affected by the latency that occurs during payload logging data processing.
To fix this issue, you can slow down the rate of scoring requests by sending bulk scoring requests in one of the following ways:
If you use the IBM Watson Machine Learning Python SDK, you can append multiple request values to the values array, as shown in the following example:
scoring_payload = {
    wml_client.deployments.ScoringMetaNames.INPUT_DATA: [{
        'fields': ['GENDER','AGE','MARITAL_STATUS','PROFESSION'],
        'values': [
            ['M',24,'Single','Student'],
            ['F',21,'Single','Retail'],
            ...
            ['M',40,'Married','Retail']
        ]
    }]
}
predictions = wml_client.deployments.score(deployment_id, scoring_payload)
If you use the Watson OpenScale Python SDK, you can append multiple request values to the values array in the request argument. Then, append the corresponding multiple response values to the values array in the response argument, as shown in the following example:
store_record_info = client.data_sets.store_records(
request_body=[PayloadRecord(
scoring_id='42e62c3ae2244f0d851009dec4754d74',
request={
'fields': ['GENDER','AGE','MARITAL_STATUS','PROFESSION'],
'values': [
['M',24,'Single','Student'],
['F',21,'Single','Retail'],
...
['M',40,'Married','Retail']
]
},
response={
'fields': ['probability', 'prediction'],
'values': [
[[0.9800331274776278, 0.01996687252237212], True],
[[0.7256561304103799, 0.2743438695896201], False],
...
[[0.9998955605441558, 0.00010443945584419821], False]
]
},
response_time=4600
)],
data_set_id='997b1474-00d2-4g05-ac02-287ebfc603b5',
)
By sending bulk scoring requests, you slow down the request rate: for example, you can send 10 seconds' worth of scoring requests at once instead of sending a small request every second.
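The batching idea can be sketched in plain Python. This is illustrative only: send_bulk below is a hypothetical stand-in for your client's scoring call, not a Watson OpenScale API.

```python
# Collect individual scoring rows and send them as one bulk request
# instead of one request per row. `send_bulk` is a hypothetical stand-in
# for the real scoring call.
def send_bulk(values):
    return [f"scored:{row[0]}" for row in values]  # placeholder responses

rows = [
    ['M', 24, 'Single', 'Student'],
    ['F', 21, 'Single', 'Retail'],
    ['M', 40, 'Married', 'Retail'],
]

payload = {
    'fields': ['GENDER', 'AGE', 'MARITAL_STATUS', 'PROFESSION'],
    'values': rows,  # all rows travel in a single request
}
responses = send_bulk(payload['values'])
print(len(responses))  # one response per row, from one request
```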
I can't see the monitoring metrics when my evaluation finishes
After your evaluation runs successfully, the fairness, quality, and drift monitoring metrics do not appear on your dashboard.
The metrics might not appear because the events that are related to the evaluations might be stuck in the internal queuing service.
To fix this issue, you can restart the data mart service pods by completing the following steps:
- Check the data mart service pods by running the following command:
  oc get pod -l release=<your openscale release name>,component=aios-datamart
  The pods are specified with the component=aios-datamart and release=<your openscale release name> labels. The release name is usually aiopenscale or wos-cr if you don't specify a custom release name during installation.
- Restart the pods by running the following command:
  oc delete pod -l release=<your openscale release name>,component=aios-datamart
- View the current status of the pods by running the oc get pod -l release=<your openscale release name>,component=aios-datamart command, as shown in the following example:
  NAME                                           READY   STATUS    RESTARTS   AGE
  aiopenscale-ibm-aios-datamart-858c4ccd-4dcvb   1/1     Running   0          14h
Why is my Spark job failing?
You're running model evaluation, and a Spark job fails.
A Spark job might fail because the resource quota for the Spark instance is insufficient to run the Spark job.
If this issue arises, you can increase the resource quota for the Spark instance. To increase the resource quota, complete the following steps:
- First, get the Spark instance URL by going to the Spark instance details page on the Cloud Pak for Data cluster.
  - If you're using a Spark 3.0 instance, the Spark instance URL is https://<CLUSTER_HOST>/v2/spark/v3/instances/<INSTANCE_ID>, substituting CLUSTER_HOST and INSTANCE_ID for your cluster and instance.
  - If you aren't using a Spark 3.0 instance, the Spark instance URL is https://<CLUSTER_HOST>/ae/spark/v2/<INSTANCE_ID>, substituting CLUSTER_HOST and INSTANCE_ID for your cluster and instance.
- Get your access token by running the following command:
  curl -k -X GET --user "<username>:<password>" "https://<CLUSTER_HOST>/v1/preauth/validateAuth"
  Use the accessToken that is returned by this command in the next step for the ACCESS_TOKEN variable.
- Get the existing resource quota by running the following command:
  curl -ivk -X GET -H "Content-Type: application/json" -H "Authorization: Bearer <ACCESS_TOKEN>" <INSTANCE_URL>
- Update the resource quota.
  If you're using a Spark 3.0 instance, update the resource quota by running the following command:
  curl -i -X PUT \
    -H "Accept:application/json" \
    -H "Authorization:Bearer <ACCESS_TOKEN>" \
    -H "Content-Type:application/json" \
    -d '{ "cpu_quota": 40, "memory_quota_gibibytes": 80 }' \
    <INSTANCE_URL>/resource_quota
  If you aren't using a Spark 3.0 instance, update the resource quota by running the following command:
  curl -i -X PUT \
    -H "Accept:application/json" \
    -H "Authorization:Bearer <ACCESS_TOKEN>" \
    -H "Content-Type:application/json" \
    -d '{ "cpu_quota": 40, "memory_quota": "80g" }' \
    <INSTANCE_URL>/resource_quota
I'm unable to run a Db2 common configuration notebook
When you run the Db2 configuration notebook, one or more of the cells fails with the following exception error:
Caused by: javax.net.ssl.SSLHandshakeException: The server selected protocol version TLS11 is not accepted by client preferences [TLS12]
This error occurs because the client, the Spark job, is trying to communicate with a server, Db2, by using a Transport Layer Security (TLS) protocol that isn't supported by the server.
To fix this issue, update the SSL_VERSIONS setting on your database. When this error occurs, the client typically uses TLS version 1.2 and the server supports only TLS version 1.1. In this case, you can update your Db2 configuration by using the following command, where TLSV12 references TLS version 1.2:
db2 update dbm cfg using SSL_VERSIONS TLSV12
The user who runs this command must have admin access, which is often db2inst1. Run the command after all applications that are using the Db2 server are stopped. To resolve this issue, you might run the following commands in order:
db2stop force
db2 update dbm cfg using SSL_VERSIONS TLSV12
db2start
When I use AutoAI, why am I getting an error about mismatched data?
You receive an error message about mismatched data when using AutoAI for binary classification. Note that AutoAI is only supported in IBM Watson OpenScale for IBM Cloud Pak for Data.
For binary classification type, AutoAI automatically sets the data type of the prediction column to boolean.
To fix this issue, implement one of the following solutions:
- Change the label column values in the training data to integer values, such as 0 or 1, depending on the outcome.
- Change the label column values in the training data to string values, such as A and B.
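For example, a boolean label column can be converted before training. This is a plain-Python sketch with made-up values, not product code:

```python
# Hypothetical boolean labels from a training-data label column.
labels = [True, False, False, True]

# Option 1: convert the labels to integer values (True -> 1, False -> 0).
int_labels = [int(v) for v in labels]

# Option 2: convert the labels to string values instead.
str_labels = ["A" if v else "B" for v in labels]

print(int_labels)  # [1, 0, 0, 1]
print(str_labels)  # ['A', 'B', 'B', 'A']
```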
Why is my getting errors during model configuration?
The following error messages appear when you are configuring model details: Field feature_fields references column <name>, which is missing in input_schema of the model. Feature not found in input schema.
These messages, which appear while you complete the Model details section during configuration, indicate a mismatch between the model input schema and the model training data schema.
To fix the issue, determine which of the following conditions is causing the error and take corrective action:
- If you use IBM Watson Machine Learning as your machine learning provider and the model type is XGBoost/scikit-learn, refer to the Machine Learning Python SDK documentation for important information about how to store the model.
- To generate the drift detection model, you must use scikit-learn version 0.20.2 in notebooks.
- For all other cases, ensure that the training data column names match the input schema column names.
Why are my class labels missing when I use XGBoost?
Native XGBoost multiclass classification does not return class labels.
By default, for binary and multiple class models, the XGBoost framework does not return class labels.
For XGBoost binary and multiple class models, you must update the model to return class labels.
Why are the payload analytics not displaying properly?
Payload analytics does not display properly and the following error message displays: AIQDT0044E Forbidden character " in column name <column name>.
For proper processing of payload analytics, Watson OpenScale does not support column names with double quotation marks (") in the payload. This affects both scoring payload and feedback data in CSV and JSON formats.
Remove double quotation marks (") from the column names of the payload file.
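A minimal sketch of that cleanup in Python, using made-up column names:

```python
# Hypothetical payload column names, one containing a double quotation mark.
columns = ['AGE', 'MARITAL"STATUS', 'PROFESSION']

# Strip double quotation marks from the column names before sending the payload.
cleaned = [name.replace('"', '') for name in columns]
print(cleaned)  # ['AGE', 'MARITALSTATUS', 'PROFESSION']
```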
Error: An error occurred while computing feature importance
You receive the following error message during processing: Error: An error occurred while computing feature importance.
Having an equals sign (=) in the column name of a dataset causes an issue with explainability.
Remove the equals sign (=) from the column name and send the dataset through processing again.
Why are some of my active debias records missing?
Active debias records do not reach the payload logging table.
When you use the active debias API, there is a limit of 1000 records that can be sent at one time for payload logging.
To avoid loss of data, you must use the active debias API to score in chunks of 1000 records or fewer.
For more information, see Reviewing debiased transactions.
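Chunking the records before scoring can be sketched as follows; the records list is a placeholder for your own data, and the call to the active debias API itself is omitted:

```python
# Split the records into chunks of at most 1000 before calling the
# active debias API once per chunk (the API call itself is omitted).
def chunked(records, size=1000):
    for start in range(0, len(records), size):
        yield records[start:start + size]

records = list(range(2500))  # stand-in for real payload records
batch_sizes = [len(batch) for batch in chunked(records)]
print(batch_sizes)  # [1000, 1000, 500]
```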
Watson OpenScale does not show any available schemas
When a user attempts to retrieve schema information for Watson OpenScale, none are available. Checking directly in Db2, without reference to Watson OpenScale, which schemas are available for the database user ID also returns none.
Insufficient permissions for the database user ID cause database connection issues for Watson OpenScale.
Make sure that the database user has the permissions that Watson OpenScale requires.
A monitor run fails with an OutOfResources exception error message
You receive an OutOfResources exception error message.
Although there's no longer a limit on the number of rows in the feedback payload, scoring payload, or business payload tables, the 50,000 limit now applies to the number of records that you can run through the quality and bias monitors each billing period.
After you reach your limit, you must either upgrade to a Standard plan or wait for the next billing period.
After re-installing Watson Machine Learning, everything appears to work fine, but no payloads are reaching Watson OpenScale
None of the payload from Watson Machine Learning is making it to Watson OpenScale. No scoring is taking place.
Any time that Watson Machine Learning is installed after Watson OpenScale, it is impossible to establish proper linkage between the two services. The wmlkafkaconfigmap file is missing a broker configuration, which can be established only by installing Watson OpenScale after Watson Machine Learning.
If you uninstall and then re-install Watson Machine Learning, you must also uninstall and re-install Watson OpenScale. This may lead to loss of data and configuration settings.
When creating a subscription, why do I get an error?
The following error applies to the following IBM Watson OpenScale for IBM Cloud Pak for Data versions: 2.1.0.2, 2.5.0, and 3.0.x. You may see the following error display: Action Create Payload Tables has failed: Invalid token.
In addition to that UI message, the dashboard service log might contain the following entry:
[ERROR] [aiops-dashboard] [configuration-routes] {"error":"Unexpected status code 500 from PUT API http://ai-open-scale-ibm-aios-nginx-internal/v1/data_marts/00000000-0000-0000-0000-000000000000/service_bindings/999/subscriptions/093ee84d-5a59-49fb-a367-1932ad3403df/configurations/payload_logging","statusCode":500,"globalTransactionId":"c3789a70-de41-46a2-979f-890c07eaad98","errors":[{"code":"AIQCS0046E","message":"Action Create Payload Tables has failed: Invalid token"}]}
The configuration service log might contain the following entry:
ApiRequestFailure: Failure during payload logging setup. (PUT https://icpd-aios-cluster1.cpolab.ibm.com:31843/v1/data_marts/00000000-0000-0000-0000-000000000000/service_bindings/999/subscriptions/093ee84d-5a59-49fb-a367-1932ad3403df/configurations/payload_logging) Status code: 500, body: {"errors":[{"code":"AIQCS0046E","message":"Action Create Payload Tables has failed: Invalid token"}], "trace":"N2RmNWViMTItMTc4Zi00MWM5LTkyMmUtZDEwYzg1NGJmZDA0"}
The access token used by the payload logging, configuration, and datamart processes has become invalid. Restarting these processes will generate a new, valid token.
When this happens, it can be resolved by restarting the following pods:
- aiopenscale-ibm-aios-bkpicombined
- aiopenscale-ibm-aios-payload-logging
- aiopenscale-ibm-aios-payload-logging-api
- aiopenscale-ibm-aios-configuration
- aiopenscale-ibm-aios-datamart
- aiopenscale-ibm-aios-feedback
Missing deployments
A deployed model does not show up as a deployment that can be selected to create a subscription.
There are different reasons that a deployment does not show up in the list of available deployed models. If the model is not a supported type of model because it uses an unsupported algorithm or framework, it won't appear. Your machine learning provider might not be configured properly. It could also be that there are issues with permissions.
Use the following steps to resolve this issue:
- Check that the model is a supported type. Not sure? For more information, see Supported machine learning engines, frameworks, and models.
- Check that a machine learning provider exists in the Watson OpenScale configuration for the specific deployment space. For more information, see Deployment spaces.
- Check that the CP4D admin user has permission to access the deployment space.
Connection issue from IBM Watson OpenScale for IBM Cloud Pak for Data to IBM Watson Machine Learning
You receive the following error message while attempting to establish a connection between IBM Watson OpenScale for IBM Cloud Pak for Data and IBM Watson Machine Learning: Error: Unable to connect to local Watson Machine Learning from OpenScale:
You cannot connect to the Watson Machine Learning instance because of an issue with the token.
Use the following steps to resolve this issue:
- Check whether the ICP_WML_TOKEN value in the discovery instance is populated by running the following command:
  kubectl -n <namespace> exec -it $(kubectl get pod | grep "aiopenscale-ibm-aios-ml-gateway-discovery" | awk '{print $1}') -- env | grep ICP_WML_TOKEN
  Output:
  ICP_WML_TOKEN=tsG3UMLF7esl4oDWfMGbIhm6IkrlkmYSirYy2UgzNhiItv8xofkj0bbj8zSxR27FTa1AG9R6bxWernMBrtUNasampletokennotcapableofbeingused92837cjqoe8r
  Important: If the ICP_WML_TOKEN value is empty (ICP_WML_TOKEN=), continue with the following steps. Otherwise, reach out to IBM Support for guidance.
- Generate a new Watson Machine Learning token by running the following commands:
  user_pwd_token=$(printf %s <user>:<password> | base64)
  curl -k --request GET --url <CPD Web Console URL>/v1/preauth/validateAuth --header "authorization: Basic $user_pwd_token"
  Where <user>:<password> is a valid user name and password for the Cloud Pak for Data cluster.
- Use the accessToken from the previous output to run the following command:
  curl -k --request POST --url <CPD Web Console URL>/api/v1/usermgmt/v1/usermgmt/getTimedToken --header 'authorization: Bearer accessToken_from_previous_step' --header 'lifetime: 87600'
- Encode the accessToken value from the previous step to base64 format:
  printf %s 'accessToken_from_previous_step' | base64
- Edit the secret and use the encoded value:
  kubectl -n <namespace> edit secret ibm-aios-icp4d-token
- Edit the file to add the encoded value to the following field:
  data:
    token: "encoded_value_from_previous_step"
- Restart all Watson OpenScale pods by running the following command:
  kubectl -n <namespace> delete pods -l app=ibm-aios
- Verify that the ICP_WML_TOKEN value in the discovery instance is now populated by running the following command:
  kubectl -n <namespace> exec -it $(kubectl get pod | grep "aiopenscale-ibm-aios-ml-gateway-discovery" | awk '{print $1}') -- env | grep ICP_WML_TOKEN
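If it's more convenient, the base64 encoding step can also be done in Python instead of with printf and the base64 command; the token value here is a placeholder:

```python
import base64

# Placeholder for the accessToken value returned in the earlier step.
access_token = "accessToken_from_previous_step"

# Encode the token for the `token` field of the ibm-aios-icp4d-token secret.
encoded = base64.b64encode(access_token.encode("utf-8")).decode("ascii")
print(encoded)

# Decoding recovers the original token, confirming the encoding round-trips.
assert base64.b64decode(encoded).decode("utf-8") == access_token
```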
The Kafka SSL certificate and the compound public key are treated as invalid by the Python service
Automatic payload logging does not work for Python-based models, such as scikit-learn, XGBoost, Keras/TensorFlow, or Python function models. The following error about SSL decoding appears in the pods:
oc logs -f wml-dep-od-scikit-learn0.23-1.3-379-2ndxbi3c-65b74df584-pmxn4
...
%3|1626368951.958|SSL|python-event-client-producer#producer-1| [thrd:sasl_ssl://aiopenscale-ibm-aios-kafka-2.aiopenscale-ibm-aios-ka]: sasl_ssl://aiopenscale-ibm-aios-kafka-2.aiopenscale-ibm-aios-kafka-headless.namespace1.svc.cluster.local:9092/bootstrap: rsa_eay.c:693: error:04067072:rsa routines:RSA_EAY_PUBLIC_DECRYPT:padding check failed:
The Kafka certificate file is invalid.
To resolve this issue, you must generate a new certificate file and replace the secret. The following steps must be run from the OpenShift command line interface (CLI):
- Set up the OpenShift CLI by running the following commands:
  oc login <url> -u <username> -p <password>
  oc project <project>
- List the Kafka pods. The release label value depends on the system; the default value is aiopenscale.
  oc get pod -l release=aiopenscale,serviceSelector=kafka
- Run the following script from any of the Kafka pods. A quick way to do this is to copy everything from the oc exec -i ... line to the EOF line and paste it into your shell terminal.
oc exec -i aiopenscale-ibm-aios-kafka-0 bash <<'EOF'
dir=/tmp/`date "+%Y%m%d-%H%M%S"`
echo ""
echo "1. create temporal directory in $dir"
mkdir $dir
cd $dir
echo ""
echo "2. Generate SSL key"
openssl genrsa -out rootCA.key 4096
openssl req -x509 -new -nodes -key rootCA.key -sha256 -days 3650 -out ca-cert -subj "/C=US/ST=California/L=San Jose/O=IBM/OU=Watson"
echo ""
echo "3. create client and server truststore"
keytool -keystore kafka.server.truststore.jks -alias CARoot -import -file ca-cert -noprompt -storepass password
keytool -keystore kafka.client.truststore.jks -alias CARoot -import -file ca-cert -noprompt -storepass password
echo ""
echo "4. create keystore"
keytool -genkey -noprompt -alias localhost -keyalg RSA -dname "CN=kafka.aiopenscale, OU=Watson, O=IBM, L=SJ, S=CA, C=US" -keystore kafka.server.keystore.jks -storepass password -keypass password
echo ""
echo "5. export cert from keystore"
keytool -keystore kafka.server.keystore.jks -alias localhost -certreq -file cert-file -noprompt -storepass password -keypass password
echo ""
echo "6. sign cert with CA"
openssl x509 -req -CA ca-cert -CAkey rootCA.key -in cert-file -out cert-signed -days 3650 -CAcreateserial -passin pass:password
echo ""
echo "7. import signed cert and CA to keystore"
keytool -keystore kafka.server.keystore.jks -alias CARoot -import -file ca-cert -noprompt -storepass password
keytool -keystore kafka.server.keystore.jks -alias localhost -import -file cert-signed -noprompt -storepass password
echo ""
echo "8. show certfile and jks"
caCert=`cat ca-cert | base64 | tr -d '\n'`
keystore=`cat kafka.server.keystore.jks | base64 | tr -d '\n'`
serverTruststore=`cat kafka.server.truststore.jks | base64 | tr -d '\n'`
clientTruststore=`cat kafka.client.truststore.jks | base64 | tr -d '\n'`
cat << EOL > patch-secret.yaml
data:
  es-cert.pem: ${caCert}
  es-cert.jks: ${clientTruststore}
  kafka.server.keystore.jks: ${keystore}
  kafka.server.truststore.jks: ${serverTruststore}
EOL
cat << EOL > patch-cm.yaml
binaryData:
  es-cert.pem: <the above es-cert.pem>
  es-cert.jks: <the above es-cert.jks>
EOL
cat patch-secret.yaml
EOF
-
Replace the certificate file and secret, both the client and server, in the existing Kafka secret with the new one that you generated in the previous command.
- To ensure that this is done properly, dump the current Kafka secret and the wmlkafkaconfigmap .yaml file by running the following commands:
  oc get secret aiopenscale-ibm-aios-kafka-secrets -o yaml > aiopenscale-ibm-aios-kafka-secrets.yaml
  oc get cm wmlkafkaconfigmap -o yaml > wmlkafkaconfigmap.yaml
- Check that they are successfully dumped out by running the following commands:
  cat aiopenscale-ibm-aios-kafka-secrets.yaml
  cat wmlkafkaconfigmap.yaml
- Replace the secret by running the following command:
oc patch secret aiopenscale-ibm-aios-kafka-secrets --patch "$(cat << EOL
data:
  es-cert.pem: <the above es-cert.pem>
  es-cert.jks: <the above es-cert.jks>
  kafka.server.keystore.jks: <the above kafka.server.keystore.jks>
  kafka.server.truststore.jks: <the above kafka.server.truststore.jks>
EOL
)"
  For example, the following code shows an actual case:
oc patch secret aiopenscale-ibm-aios-kafka-secrets --patch "$(cat << EOL
data:
  es-cert.pem: LS0tLS1C...RFLS0tLS0K
  es-cert.jks: /u3+7QAAAAIA...Wt0ju8dM=
  kafka.server.keystore.jks: /u3+7QAAAAI...R4dffsGKsiRAfcxYQA==
  kafka.server.truststore.jks: /u3+7QAAAAI...+vHdqvmEAdk7MuQ8=
EOL
)"
- Replace the certificate file and secret for the client with the new wmlkafkaconfigmap file that was generated in the previous command:
oc patch cm wmlkafkaconfigmap --patch "$(cat << EOL
binaryData:
  es-cert.pem: <the above es-cert.pem>
  es-cert.jks: <the above es-cert.jks>
EOL
)"
  For example:
oc patch cm wmlkafkaconfigmap --patch "$(cat << EOL
binaryData:
  es-cert.pem: LS0tLS1C...RFLS0tLS0K
  es-cert.jks: /u3+7QAAAAIA...Wt0ju8dM=
EOL
)"
- Restart the Kafka service by running the following command:
  oc delete pod -l release=aiopenscale,serviceSelector=kafka
- Wait for the Kafka service to be ready:
  oc get pod -l release=aiopenscale,serviceSelector=kafka
- Restart the aiopenscale services that depend on the Kafka service by running the following commands:
  oc get pod -l release=aiopenscale,"serviceSelector in (payload-logging, payload-logging-api, datamart, notification)"
  oc delete pod -l release=aiopenscale,"serviceSelector in (payload-logging, payload-logging-api, datamart, notification)"
- Wait for the services to be ready:
  oc get pod -l release=aiopenscale,"serviceSelector in (payload-logging, payload-logging-api, datamart, notification)"
- Restart the old regular wml runtime pods. These might vary depending on the version.
  oc get pod -l servicename!=wml-scoring,wml_types=runtime
  oc delete pod -l servicename!=wml-scoring,wml_types=runtime
- Wait for the runtime pods to be ready:
  oc get pod -l servicename!=wml-scoring,wml_types=runtime
- Restart the regular wml runtime pods if they exist:
  oc get pod -l servicename=wml-scoring,wml_types=runtime
  oc delete pod -l servicename=wml-scoring,wml_types=runtime
- Wait for the runtime pods to be ready:
  oc get pod -l servicename=wml-scoring,wml_types=runtime
- Restart the autoai runtime pods if they exist:
  oc get pod -l servicename=wml-scoring,wml_types!=runtime
  oc delete pod -l servicename=wml-scoring,wml_types!=runtime
- Wait for the runtime pods to be ready:
  oc get pod -l servicename=wml-scoring,wml_types!=runtime
The certificate is updated and you can now work with your Kafka deployed models.
Watson OpenScale evaluation might fail due to large number of subscriptions
If a Watson OpenScale instance contains too many subscriptions, such as 100 subscriptions, your quality evaluations might fail. You can view the details of the failure in the log for the data mart service pod that displays the following error message:
"Failure converting response to expected model EntityStreamSizeException: actual entity size (Some(8644836)) exceeded content length limit (8388608 bytes)! You can configure this by setting akka.http.[server|client].parsing.max-content-length or calling HttpEntity.withSizeLimit before materializing the dataBytes stream".
You can use the oc get pod -l component=aios-datamart command to find the name of the pod. You can also use the oc logs <pod name> command to view the log for the pod.
To fix this error, you can increase the maximum request body size by editing the "ADDITIONAL_JVM_OPTIONS" environment variable with the following command:
oc patch woservice <release name> -p '{"spec": {"datamart": {"additional_jvm_options":"-Dakka.http.client.parsing.max-content-length=100m"} }}' --type=merge
The release name is "aiopenscale" if you don't customize the release name when you install Watson OpenScale.
Watson OpenScale operator doesn't watch and reconcile custom resources
The Watson OpenScale operator might not process the WOService custom resources because a stale WATCH_NAMESPACE environment variable points only to a single namespace. You can use the following steps to fix this issue:
- Log in to Red Hat OpenShift Container Platform with the following command:
  oc login <OpenShift_URL>:<port>
- Restart the Watson OpenScale operator pod with the following command:
  oc delete pod -n ibm-common-services -l name=ibm-cpd-wos-operator,icpdsupport/app=ibm-aios
  If you did not install the Watson OpenScale operator in the ibm-common-services project, specify an accurate namespace parameter.
- Verify that the Watson OpenScale operator pod is running with the following command:
  oc get pod -n ibm-common-services -l name=ibm-cpd-wos-operator,icpdsupport/app=ibm-aios
Watson OpenScale etcd and scheduler pods fail after large data upload
For pre-production models, the aiopenscale-ibm-aios-etcd and aiopenscale-ibm-aios-scheduling pods might fail to run after you upload CSV files with large amounts of data to configure model risk management. To fix this issue, you can run the following commands to increase the storage size of the etcd PersistentVolumeClaim (PVC) request and the memory of the etcd StatefulSet object:
oc get pvc -l release=aiopenscale,serviceSelector=etcd | cut -d " " -f1 | while read pvc; do oc patch pvc $pvc -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'; done
persistentvolumeclaim/data-aiopenscale-ibm-aios-etcd-0 patched
persistentvolumeclaim/data-aiopenscale-ibm-aios-etcd-1 patched
persistentvolumeclaim/data-aiopenscale-ibm-aios-etcd-2 patched
oc patch sts aiopenscale-ibm-aios-etcd -p '{"spec":{"template":{"spec":{"containers":[{"name":"etcd","resources":{"limits":{"memory":"10Gi"}}}]}}}}'
Watson OpenScale model risk management evaluation fails
When you run a model risk management (MRM) evaluation, the evaluation might fail because the feedback data that you uploaded is not persisted to Db2. To fix this issue, you can use the following steps to restart Kafka:
- Log in to Red Hat OpenShift Container Platform and go to your project.
- Stop the services that are connected to Kafka by running the following command:
  oc scale deployment -l "component in (aios-datamart,aios-payload,aios-notification)" --replicas=0
- Verify that the services are not running with the following command:
  oc get pod -l "component in (aios-datamart,aios-payload,aios-notification)"
- Stop the Kafka cluster with the following command:
  oc scale sts -l component=aios-kafka --replicas=0
- Verify that the Kafka services are not running with the following command:
  oc get pod -l component=aios-kafka
- Start the Kafka cluster with the following command:
  oc scale sts -l component=aios-kafka --replicas=3
- Verify that 3 Kafka pods are ready with the following command:
  oc get pod -l component=aios-kafka
- Start the services that are connected to Kafka with the following command:
  oc scale deployment -l "component in (aios-datamart,aios-payload,aios-notification)" --replicas=1
- Verify that the services are ready with the following command:
  oc get pod -l "component in (aios-datamart,aios-payload,aios-notification)"
Microsoft Azure ML Studio
- Of the two types of Azure Machine Learning web services, only the New type is supported by Watson OpenScale. The Classic type is not supported.
- Default input name must be used: in the Azure web service, the default input name is "input1". Currently, this field is mandated for Watson OpenScale and, if it is missing, Watson OpenScale will not work. If your Azure web service does not use the default name, change the input field name to "input1", then redeploy your web service and reconfigure your OpenScale machine learning provider settings.
- If calls to Microsoft Azure ML Studio to list the machine learning models cause the response to time out, for example when you have many web services, you must increase timeout values. You might need to work around this issue by changing the /etc/haproxy/haproxy.cfg configuration setting:
  - Log in to the load balancer node and update /etc/haproxy/haproxy.cfg to set the client and server timeout from 1m to 5m:
    timeout client 5m
    timeout server 5m
  - Run systemctl restart haproxy to restart the HAProxy load balancer.
- If you are using a different load balancer than HAProxy, you might need to adjust timeout values in a similar fashion.
Uploading feedback data fails in production subscription after importing settings
After importing the settings from your pre-production space to your production space you might have problems uploading feedback data. This happens when the datatypes do not match precisely. When you import settings, the feedback table references the payload table for its column types. You can avoid this issue by making sure that the payload data has the most precise value type first. For example, you must prioritize a double datatype over an integer datatype.
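The precision rule can be illustrated with a small Python sketch (illustrative only; infer_type is not a product API): if the first value that a naive column-type inference sees is an integer, the column is typed too narrowly, so the double value must come first.

```python
# Hypothetical first column of payload rows: the inferred column type
# follows the first value seen, so the most precise value must lead.
def infer_type(values):
    return type(values[0]).__name__  # naive first-value inference

too_narrow = infer_type([1, 2.5, 3])   # integer seen first
correct = infer_type([2.5, 1, 3])      # double seen first; integers still fit
print(too_narrow, correct)  # int float
```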
Microsoft Azure Machine Learning Service
When performing model evaluation, you might encounter issues where Watson OpenScale is not able to communicate with Azure Machine Learning Service when it needs to invoke deployment scoring endpoints. Security tools that enforce your enterprise security policies, such as Symantec Blue Coat, might prevent such access.
Watson OpenScale fails to create a new Hive table for the batch deployment subscription
When you choose to create a new Apache Hive table with the Parquet format during your Watson OpenScale batch deployment configuration, the following error might occur:
Attribute name "table name" contains invalid character(s) among " ,;{}()\\n\\t=". Please use alias to rename it.;
This error occurs if Watson OpenScale fails to run the CREATE TABLE SQL operation due to white space in a column name. To avoid this error, remove any white space from your column names or change the Apache Hive format to csv.
Watson OpenScale setup might fail with default Db2 database
When you set up Watson OpenScale and specify the default Db2 database, the setup might fail to complete.
To fix this issue, you must run the following command in Cloud Pak for Data to update Db2:
db2 update db cfg using DFT_EXTENT_SZ 32
After you run the command, you must create a new Db2 database to set up Watson OpenScale.