Known issues and limitations for watsonx Assistant

The following known issues and limitations apply to watsonx Assistant.

The deploy-knative-eventing command fails with the error: multiNamespace InstallModeType not supported

Applies to: 5.1.x

Problem
This issue arises from the interaction between the namespace scoping approach that the deploy-knative-eventing installation uses and the default behavior of the IBM Namespace Scope Operator when you run the setup-instance-topology command.
Solution
To remove the error messages, modify the operator setting to allow the MultiNamespace InstallModeType:
  1. Edit the Namespace Scope Operator ClusterServiceVersion (CSV):
    oc edit csv ibm-namespace-scope-operator.v${CPD_VERSION} -n ibm-knative-events
  2. Set supported to true for the MultiNamespace install mode:
      
      installModes:
      - supported: true
        type: MultiNamespace
  3. Delete the old namespace-scope-operator pod.
    
    nss_op_pod=$(oc get pods -n ibm-knative-events -l name=ibm-namespace-scope-operator --no-headers | awk '{print $1}')
    oc delete pod $nss_op_pod
  4. Rerun the following command:
    cpd-cli manage setup-instance-topology --release=${VERSION} --cpd_operator_ns=ibm-knative-events --cpd_instance_ns=knative-eventing --license_acceptance=true
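To verify the change, you can read the install modes back from the CSV (a minimal check; the jsonpath expression assumes the installModes structure that is shown in step 2):
  oc get csv ibm-namespace-scope-operator.v${CPD_VERSION} -n ibm-knative-events -o jsonpath='{.spec.installModes[?(@.type=="MultiNamespace")].supported}'
The command prints true when the MultiNamespace install mode is enabled.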

watsonx Assistant webhook is not working

Applies to: 5.1.x

Problem
Webhooks cannot be invoked programmatically in watsonx Assistant because self-signed certificates are blocked and callouts to private IP addresses are restricted.
Solution
Apply the following patch to allow self-signed certificates and private IP addresses in watsonx Assistant:
oc patch wa $INSTANCE_NAME --type='merge' -p='{"configOverrides":{"dialog":{"extra_vars": {"DIALOG_FEATURE_SELF_SIGNED_CERTIFICATES_IN_WEBHOOKS":true,"BLOCKED_CALLOUT_IPS":"[]"}}}}'
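To confirm that the override was applied, you can read it back from the custom resource (a minimal check; it assumes that configOverrides is a top-level field of the wa resource, as in the patch above):
  oc get wa $INSTANCE_NAME -o jsonpath='{.configOverrides.dialog.extra_vars}'
The output should list DIALOG_FEATURE_SELF_SIGNED_CERTIFICATES_IN_WEBHOOKS and BLOCKED_CALLOUT_IPS.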

The deploy-knative-eventing command fails with the error: no matching resources found

Applies to: 5.1.3

Problem
When you run the cpd-cli manage deploy-knative-eventing command, it fails with the error no matching resources found after the message deployment.apps/kafka-controller condition met. This issue arises because no pods with the label app=kafka-broker-dispatcher are present.
Solution
  1. Exec into the Docker container that runs olm-utils.
    docker exec -it olm-utils-play-v3 bash
  2. Check for the line that is to be removed.
    cat /opt/ansible/bin/deploy-knative-eventing | grep kafka-broker-dispatcher
    Output:
    oc wait pods -n knative-eventing --selector app=kafka-broker-dispatcher --for condition=Ready --timeout=60s
  3. Remove the line.
    sed -i '/kafka-broker-dispatcher/d' /opt/ansible/bin/deploy-knative-eventing
  4. Verify whether the line is removed.
    cat /opt/ansible/bin/deploy-knative-eventing | grep kafka-broker-dispatcher
    The command returns no output, which confirms that the line is removed.
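Optionally, you can confirm the root cause that is described in the Problem section: listing pods by the app=kafka-broker-dispatcher label should return no resources, which is why the removed oc wait command could never succeed.
  oc get pods -n knative-eventing -l app=kafka-broker-dispatcher
You can then rerun the cpd-cli manage deploy-knative-eventing command with the same options that you used before.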
    

Reset test run in Evaluate response settings fails with 500 error

Applies to: 5.1.2 and 5.1.3

Problem
When you reset the test run from the Evaluate response settings page, it fails with a 500 Internal Server Error, and the UI hangs.
Solution
If the page hangs after the error, go to a different page and then return to the same page, or refresh the page.

Preview page not available when watsonx Assistant is created by using the API

Applies to: 5.1.0, 5.1.1, and 5.1.2

Problem
The Preview page is not accessible when watsonx Assistant is created using the API.

wa-system-entities pods fail with Error and ContainerStatusUnknown statuses

Applies to: 5.1.1

Problem
The wa-system-entities pods fail with Error and ContainerStatusUnknown statuses. In both cases, the pods exit with error code 137 (OOMKilled) and the message: "Pod ephemeral local storage usage exceeds the total limit of containers 200Mi".
Solution
  1. Set environment variables.
    export PROJECT_CPD_INSTANCE=<namespace where watsonx Assistant is running>
    export INSTANCE=<watsonx Assistant Instance Name>  # Normally "wa"
  2. Verify the values.
    echo $PROJECT_CPD_INSTANCE
    echo $INSTANCE
  3. Apply the patch using variables.
    cat <<EOF | oc apply -f -
    apiVersion: assistant.watson.ibm.com/v1
    kind: TemporaryPatch
    metadata:
      name: wa-system-entities-ephemeral-storage
      namespace: $PROJECT_CPD_INSTANCE
    spec:
      apiVersion: assistant.watson.ibm.com/v1
      kind: WatsonAssistantClu
      name: $INSTANCE
      patchType: patchStrategicMerge
      patch:
        system-entities:
          deployment:
            spec:
              template:
                spec:
                  containers:
                  - name: system-entities
                    resources:
                      limits:
                        ephemeral-storage: 400Mi
                      requests:
                        ephemeral-storage: 300Mi
    EOF
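To verify that the patch took effect, you can check the ephemeral-storage limit on the deployment (a minimal check; it assumes that the deployment is named wa-system-entities, matching the pod names above):
  oc -n $PROJECT_CPD_INSTANCE get deploy wa-system-entities -o jsonpath='{.spec.template.spec.containers[?(@.name=="system-entities")].resources.limits.ephemeral-storage}'
The command should print 400Mi after the operator reconciles the patch.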
    

Cronjob gets stuck during the watsonx Assistant upgrade from any prior release to Version 5.1.0

Applies to: 5.1.0

Problem
The cronjob for the EDB Postgres database continues to use the old configuration after watsonx Assistant is upgraded to Version 5.1.0, due to changes in how the resources are created. As a result, the upgrade fails.
Solution
  1. Delete the existing cronjob and the jobs that it created so that a new cronjob is created with the correct configuration.
    oc delete job -l component=store-cronjob,service=conversation
    oc delete cronjob -l component=store-cronjob,service=conversation
  2. Delete the cleanup job to restart the cleanup.
    oc delete job -l component=cleanup,service=conversation
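The operator re-creates the cronjob with the new configuration. To verify, you can list the resources by the same labels that are used above (this assumes that the re-created resources carry the same labels):
  oc get cronjob -l component=store-cronjob,service=conversation
  oc get job -l component=cleanup,service=conversation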

watsonx Assistant instance cannot be opened and returns either Error 404 or 502

Applies to: 5.1.0 and later

Problem
You cannot open an instance of watsonx Assistant; the request fails with either Error 404 Not Found or Error 502 Bad Gateway.
Solution
  1. Get the list of Watson gateway custom resources (CRs).
    oc get watsongateway -o yaml
    Example output:
    nginxDirectives:
    - proxy_buffer_size       32k;
    - proxy_busy_buffers_size 32k;
    - proxy_buffers           8 32k;
  2. Get the maximum value of each of the three directives from the nginx pods.
    • Get the nginx pod name from the operand namespace.
      export PROJECT_CPD_INST_OPERANDS=cpd  # Change to the namespace where watsonx Assistant is installed
      
      oc get po -n ${PROJECT_CPD_INST_OPERANDS} | grep nginx
      
      Example output:
      
      ibm-nginx-788bc899f6-6hv95                                        2/2     Running     0                 17d
      ibm-nginx-788bc899f6-cwsg9                                        2/2     Running     0                 17d
      
      export nginx_pod=ibm-nginx-788bc899f6-6hv95 
    • Get the buffer values from any one of the nginx pods.
      
      oc -n ${PROJECT_CPD_INST_OPERANDS} exec -it $nginx_pod -- nginx -T | grep proxy_buffers
      Example output:
      Defaulted container "ibm-nginx-container" out of: ibm-nginx-container, zen-objstore-mirror-container, zen-objstore-init-container (init)
       proxy_buffers  4 256k;
       proxy_buffers  8 512k;
       proxy_buffers  4 256k;
       proxy_buffers  4 256k;
       proxy_buffers  4 256k;
       proxy_buffers 8 64k;
       proxy_buffers 8 64k;
       proxy_buffers 8 64k;
      
      oc -n ${PROJECT_CPD_INST_OPERANDS} exec -it $nginx_pod -- nginx -T | grep proxy_buffer_size
      Example output:
      
      Defaulted container "ibm-nginx-container" out of: ibm-nginx-container, zen-objstore-mirror-container, zen-objstore-init-container (init)
      proxy_buffer_size 8k;
       proxy_buffer_size  512k;
       proxy_buffer_size  256k;
       proxy_buffer_size  256k;
       proxy_buffer_size  256k;
       proxy_buffer_size  256k;
      proxy_buffer_size 8k;
       proxy_buffer_size 64k;
       proxy_buffer_size 64k;
       proxy_buffer_size 64k;
      proxy_buffer_size 8k;
      oc -n ${PROJECT_CPD_INST_OPERANDS} exec -it $nginx_pod -- nginx -T | grep proxy_busy_buffers_size
      Example output:
      Defaulted container "ibm-nginx-container" out of: ibm-nginx-container, zen-objstore-mirror-container, zen-objstore-init-container (init)
       proxy_busy_buffers_size 256k;
       proxy_busy_buffers_size 512k;
       proxy_busy_buffers_size 64k;
      
  3. Take the maximum value of each of the three directives and substitute the values in the following ZenExtension definition.
    Note: proxy_busy_buffers_size must be equal to or greater than the larger of proxy_buffer_size and the size of a single buffer in proxy_buffers. For example, with proxy_buffers 8 512k and proxy_buffer_size 512k, proxy_busy_buffers_size must be at least 512k.
    cat <<EOF | oc apply -f -
    apiVersion: zen.cpd.ibm.com/v1
    kind: ZenExtension
    metadata:
      name: proxy-buffers-extension
    spec:
      proxy-buffer.conf: |
        proxy_buffers 8 512k;
        proxy_buffer_size 512k;
        proxy_busy_buffers_size 512k;
      extensions: | 
        [
          {
            "extension_point_id": "zen_front_door",
            "extension_name": "proxy-buffers-extension",
            "details": {
              "upstream_conf": "proxy-buffer.conf"
            }
          }
        ]
    EOF
  4. Check whether all the ZenExtension resources are in the Completed state. This process takes approximately 15 minutes to complete.
    oc get zenextension -n $PROJECT_CPD_INST_OPERANDS
  5. Check whether the zenservice is in the Completed state.
    oc get zenservice -n $PROJECT_CPD_INST_OPERANDS
  6. Check that the nginx pods are in the Running state.
    oc get po -n $PROJECT_CPD_INST_OPERANDS | grep nginx
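After the nginx pods are running, you can confirm that the new values are active by repeating the checks from step 2 (re-export nginx_pod with a current pod name first, because the pods might have restarted):
  oc -n ${PROJECT_CPD_INST_OPERANDS} exec -it $nginx_pod -- nginx -T | grep proxy_buffer_size
The output should now include the values that you set in the ZenExtension, for example proxy_buffer_size 512k;.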

Opening bookmarked URL of watsonx Assistant tooling page returns error

Applies to: 5.1.0

Problem
When you open the watsonx Assistant tooling page from a bookmarked URL, it returns an error with code 500.
Solution
Log in to IBM Software Hub and then access the watsonx Assistant tooling page.

Preview page not available for Watson Discovery integration

Applies to: 5.1.0

Problem
The Preview page does not appear when you integrate Watson Discovery in watsonx Assistant.

Postgres pod goes to CrashLoopBackOff status after the upgrade

Applies to: 5.1.0 and later

Problem
When you upgrade watsonx Assistant, one of the Postgres pods goes into the CrashLoopBackOff state. This issue occurs because the data on that pod's persistent volume is corrupted.
Solution
  1. Run the following command to find the watsonx Assistant Postgres pod in the CrashLoopBackOff state.
    oc get pods --no-headers | grep -Ev "Comp|0/0|1/1|2/2|3/3|4/4|5/5|6/6|7/7|8/8" | grep wa-postgres
    The output looks like this:
    wa-postgres-3 0/1 CrashLoopBackOff 115 (2m30s ago) 9h
  2. Run the following command to identify if the Postgres pod is the primary pod:
    oc get cluster | grep wa-postgres
    The output looks like this:
    wa-postgres         2d20h   3           3       Cluster in healthy state   wa-postgres-1
    Where wa-postgres-1 is the primary pod.
    Tip: If the primary instance is in CrashLoopBackOff status, do the steps in the Postgres cluster in bad state topic.
  3. Delete the non-primary pod and its PersistentVolumeClaim (PVC) to create a new pod that syncs with the primary pod.
    Warning: Do not delete a primary pod because doing so can lead to database downtime and potential data loss.
    Important: Ensure that the EDB operator is running before you delete the pod and its PVC.
    oc delete pod/wa-postgres-3 pvc/wa-postgres-3
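After you delete the pod and its PVC, the EDB operator creates a replacement pod that resynchronizes from the primary. You can watch the recovery with the same commands that are used in steps 1 and 2:
  oc get pods | grep wa-postgres
  oc get cluster | grep wa-postgres
The cluster reports "Cluster in healthy state" when the new replica has rejoined.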