Known issues for watsonx Orchestrate

The following known issues and limitations apply to watsonx Orchestrate.

After adding users to watsonx Orchestrate, Personal Skills are not displayed for the users

Applies to: 4.8.5

Problem

You cannot create more than 30 instances of watsonx Assistant to support Personal Skills in watsonx Orchestrate.

Cause
watsonx Assistant limits the number of assistant instances to 30 per watsonx Orchestrate instance.
Solution
To solve the problem, apply the following patch:
INSTANCE_NAME=$(oc -n <cpd-namespace> get wa --output jsonpath='{.items[0].metadata.name}')
oc -n <cpd-namespace> patch wa ${INSTANCE_NAME} --type='merge' -p='{"configOverrides":{"store":{"extra_vars": {"store": {"MAX_NEW_IA_ASSISTANTS":"1000","MAX_NEW_IA_SKILLS":"300000","ASSISTANT_MAX_PAGE_LIMIT":"1000"}}}}}'
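To confirm that the override took effect, you can inspect the patched custom resource. The following check is a minimal sketch that reuses the ${INSTANCE_NAME} variable and the <cpd-namespace> placeholder from the commands above:
# Verify that the new store variables are present in the CR
oc -n <cpd-namespace> get wa ${INSTANCE_NAME} -o yaml | grep MAX_NEW_IA_ASSISTANTS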

watsonx Orchestrate pod component limits are not defined for resource scaling

Applies to: 4.8.4 and 4.8.5

Problem
Requests and limits are not defined for optimizing pod resources.
Cause

Limited support for resource scaling in watsonx Orchestrate.

Solution
To resolve this issue, disable the reconcile loop for the pod components so that you can adjust their resources manually. Use one of the following methods (an example of setting resources after the loop is disabled follows this list):
  • Apply the following label to the component that you want to scale:
    wo.watsonx.ibm.com/hands-off=true
    This disables the reconcile loop for the component. You can modify the CR per your requirements. A limitation with this method is that not all components expose their resources in the CR, so you cannot change the default resources for them.
  • After the watsonx Orchestrate deployment is complete, scale down the watsonx Orchestrate component operator. To scale down, navigate to the component operator deployment and manually scale to 0, or use the following command:
    oc scale -n ${OPERATOR_NS} deploy/ibm-wxo-component-operator-controller-manager --replicas 0
    This action disables the reconcile loop for all components, and you can modify the resources per your requirements. This method allows you to scale deployments directly and modify components that do not expose resources in their CRs.
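For example, after the reconcile loop is disabled with either method, you can set explicit requests and limits on a component deployment directly. This is a sketch only; the deployment name wo-example-api and the resource values are placeholders that you replace with your own:
# Set requests and limits on a component deployment (the name is a placeholder)
oc -n <cpd-namespace> set resources deploy/wo-example-api --requests=cpu=500m,memory=1Gi --limits=cpu=1,memory=2Gi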

watsonx Assistant Dialog pod experiences continuous increased memory utilization and OOMKill

Applies to: 4.8.5

Problem

During testing, the wa-dialog pod experienced increased memory utilization and one of the pods was OOMKilled.

Cause
This problem is caused by an insufficient memory limit on the watsonx Assistant dialog pods.
Solution
To solve the problem, apply the following patch:
Note: Apply the following patch only to a watsonx Assistant deployment that uses the medium T-shirt size and has HPA enabled.
INSTANCE_NAME=$(oc -n <cpd-namespace> get wa --output jsonpath='{.items[0].metadata.name}')
oc -n <cpd-namespace> patch wa ${INSTANCE_NAME} --type=merge -p="{\"configOverrides\":{\"dialog\":{\"resources\":{\"limits\":{\"memory\":\"1.5Gi\"}}}}}"
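To verify that the new limit is applied, you can check the memory limit on the dialog deployment. This quick check assumes that the dialog deployment is named wa-dialog, as in the problem description; adjust the name if it differs in your cluster:
# Show the memory limit that is set on the dialog containers
oc -n <cpd-namespace> get deploy wa-dialog -o jsonpath='{.spec.template.spec.containers[*].resources.limits.memory}'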

CR remains in the In Progress status after rebooting the cluster

Applies to: 4.8.4

Fixed in: 4.8.5

Problem
After a cluster reboot, the CR gets stuck in the In Progress status, as shown in the following example output:
 watsonx_orchestrate  WatsonxOrchestrate    wo                              zen          InProgress  1.0.0      2024-03-21T09:30:22Z  1.0.0                    N/A 
Cause

This problem occurs when Kafka is waiting to reconcile its topics.

Solution
To resolve this problem, delete the pending Kafka pod. For example, in the following cluster, the wo-watson-orchestrate-kafkaibm-kafka-2 pod cannot be updated:
  conditions:
  - lastTransitionTime: "2024-03-22T14:05:09.741114596Z"
    message: Pod wo-watson-orchestrate-kafkaibm-kafka-2 cannot be updated right now.
    reason: UnforceableProblem
    status: "True"
    type: NotReady
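To delete the pending pod in this example, run a command similar to the following one; the pod name comes from the example conditions above, so adjust the name and namespace to match your cluster:
# Delete the pending Kafka pod so that it is re-created
oc -n <cpd-namespace> delete pod wo-watson-orchestrate-kafkaibm-kafka-2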

A mongodb-wo-mongo-ops-manager-db pod error can halt the watsonx Orchestrate installation

Applies to: 4.8.4

Fixed in: 4.8.5

Error
This issue does not show a clear error in the pod or in the opsmanager CR. To confirm that the mongodb-wo-mongo-ops-manager-db pod error has occurred in the cluster, run the following command:
oc exec -it -c mongodb-agent mongodb-wo-mongo-ops-manager-db-0 -- /opt/scripts/readinessprobe

If the error has occurred in the cluster, the following is displayed:

panic: Get "https://172.30.0.1:443/api/v1/namespaces/cpd-instance-1/pods/mongodb-wo-mongo-ops-manager-db-0": dial tcp 172.30.0.1:443: i/o timeout
Cause

This error occurs in certain clusters when network traffic from the MongoDB readiness probe to the Kubernetes API is blocked.

Solution
To resolve this error, run the following commands:
# Update opsmanager labels to allow access
oc patch opsmanager mongodb-wo-mongo-ops-manager --type merge --patch '{"spec":{"applicationDatabase":{"podSpec":{"podTemplate":{"metadata":{"labels":{"wo.watsonx.ibm.com/external-access":"true"}}}}}}}'
# Delete existing sts to force re-rollout:
oc delete sts mongodb-wo-mongo-ops-manager-db
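After the statefulset is re-created, you can confirm that the fix took effect by waiting for the rollout to finish and re-running the readiness probe check from the Error section. This is a sketch that reuses the commands above:
# Wait for the re-created statefulset to roll out
oc rollout status sts/mongodb-wo-mongo-ops-manager-db
# Re-run the readiness probe check; it must no longer time out
oc exec -it -c mongodb-agent mongodb-wo-mongo-ops-manager-db-0 -- /opt/scripts/readinessprobe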

Adding skills before synchronization results in an error

Applies to: 4.8.4

Error
When you add skills before synchronization, you might see the following error:
Failed to complete this operation due to the following errors. Failed to upskill skill because
{'status': 'failed', 'status_code':500, 'message':'"next_action"', 'detailed_message': None, 'other_details': None, 'is_available': True,
'skill_set_orch':[]}. Skill has not bootstrapped.
Cause

Synchronization of applications and skills takes up to five minutes to complete after the service instance creation.

Solution
Wait up to five minutes for the synchronization of applications and skills to complete, and then add the skills.

An error is displayed when connecting to a skill

Applies to: 4.8.4

Error
When you try to connect to a skill, you might see the following error:
You couldn't connect. An error occurred while processing the request. Error: unable to get local issuer certificate.
Cause
IBM® watsonx Orchestrate does not support insecure communication with servers that use self-signed certificates or any certificates that are not signed by a trusted Certificate Authority (CA).
Solution
The server administrator must use a certificate that is signed by a trusted CA.
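One way for the server administrator to inspect the certificate chain that the skill endpoint presents is with openssl. This is a generic check, not specific to watsonx Orchestrate; <skill-host> is a placeholder for the server host name:
# Display the certificate chain and the verification result for the endpoint
openssl s_client -connect <skill-host>:443 -servername <skill-host> -showcerts </dev/null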

Cannot upload files over 6 MB by using Box skills

Applies to: 4.8.4

Problem
You cannot upload files that are larger than 6 MB by using Box skills.
Cause
This issue is caused by the 6 MB file size limit for uploads through Box skills.
Solution
Upload the file manually or compress the file to a size smaller than 6 MB.

Unable to add input to the connection fields of Salesforce

Applies to: 4.8.4

Problem
You are unable to add input to the connection fields of Salesforce because the focus does not stay on the input fields.
Cause
This is a known user interface issue in which the input fields lose focus.
Solution
Click the border of an input field to keep the focus on the field.

OAuth 2.0 web authentication is not supported when adding new skills

Applies to: 4.8.4

Problem
OAuth 2.0 web authentication is not supported when you add new skills.
Cause
This issue is caused by browser redirection.
Solution
Use a different authentication method.

RabbitMQ cluster is set to In progress or Fail and wo-rabbitmq-orchestrate-backup-label has not completed after 1 minute

Applies to: 4.8.4

Problem
The watsonx Orchestrate CR is stuck at 32 pods deployed, as shown in the following output:
#  oc get WatsonxOrchestrate wo -n bvt
NAME   DEPLOYED   VERIFIED   TOTAL
wo     32         31         47

Check whether the following log messages appear in the job:

# oc logs job/wo-rabbitmq-orchestrate-backup-label
Unable to connect to the server: dial tcp 172.30.0.1:443: i/o timeout
Retries=18
waiting for all PVCs to be ready
E0328 14:58:32.163323     208 memcache.go:265] couldn't get current server API group list: Get "https://172.30.0.1:443/api?timeout=32s": dial tcp 172.30.0.1:443: i/o timeout
Cause
The installation in a production environment fails because of RabbitMQ instability and displays the error: "WO-Mongo is setting cpdmongodbservice-cr to fail".
Solution
To solve the problem, apply the following patch:
# Add rabbit label
oc patch rabbitmqcluster wo-rabbitmq --type 'merge' --patch '{"metadata":{"annotations":{"wo.watsonx.ibm.com/hands-off":"true"}},"spec":{"global":{"podLabels":{"wo.watsonx.ibm.com/external-access":"true"}}}}'
# Restart rabbitmq job
oc delete job wo-rabbitmq-orchestrate-backup-label
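After you delete the job, it is expected to be re-created as part of the restart. To confirm that the re-created job completes, you can watch its status and review its log; this sketch reuses the job name from the commands above:
# Watch the re-created job until it completes
oc get job wo-rabbitmq-orchestrate-backup-label -w
# Review the job log for remaining timeout errors
oc logs job/wo-rabbitmq-orchestrate-backup-label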