Known issues for watsonx Orchestrate
The following known issues and limitations apply to watsonx Orchestrate.
- After adding users to watsonx Orchestrate, Personal Skills are not displayed for the users
- watsonx Orchestrate pod component limits are not defined for resource scaling
- watsonx Assistant Dialog pod experiences continuous increased memory utilization and OOMKill
- CR remains in the In Progress status after rebooting the cluster
- A mongodb-wo-mongo-ops-manager-db pod error can halt the watsonx Orchestrate installation
- Adding skills before synchronization results in an error
- An error is displayed when connecting to a skill
- Cannot upload files over 6 MB by using Box skills
- Unable to add input to the connection fields of Salesforce
- OAuth 2.0 web authentication is not supported when adding new skills
- RabbitMQ cluster is set to In progress or Fail and wo-rabbitmq-orchestrate-backup-label has not completed after 1 minute
After adding users to watsonx Orchestrate, Personal Skills are not displayed for the users
Applies to: 4.8.5
- Problem
- Unable to create more than 30 instances of watsonx Assistant to support Personal Skills in watsonx Orchestrate.
- Cause
- watsonx Assistant supports a maximum of 30 instances per watsonx Orchestrate instance.
- Solution
- To solve the problem, apply the following patch:
INSTANCE_NAME=$(oc -n <cpd-namespace> get wa --output jsonpath='{.items[0].metadata.name}')
oc patch wa ${INSTANCE_NAME} --type='merge' -p='{"configOverrides":{"store":{"extra_vars": {"store": {"MAX_NEW_IA_ASSISTANTS":"1000","MAX_NEW_IA_SKILLS":"300000","ASSISTANT_MAX_PAGE_LIMIT":"1000"}}}}}'
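To confirm that the override was applied, you can read it back from the custom resource. This is a minimal check, assuming the configOverrides path that the preceding patch merges:
# Print the store overrides that the patch merged (path assumed from the patch above)
oc -n <cpd-namespace> get wa ${INSTANCE_NAME} --output jsonpath='{.configOverrides.store.extra_vars.store}'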
watsonx Orchestrate pod component limits are not defined for resource scaling
Applies to: 4.8.4 and 4.8.5
- Problem
- Resource requests and limits are not defined for the watsonx Orchestrate pod components, so pod resources cannot be optimized.
- Cause
- Support for resource scaling in watsonx Orchestrate is limited.
- Solution
- To resolve this issue, disable the reconcile loop for the pod components by using one of the following methods (a combined sketch follows this list):
- Apply the following label to the component that you want to scale:
wo.watsonx.ibm.com/hands-off=true
This disables the reconcile loop for the component, and you can modify the CR per your requirements. A limitation of this method is that not all components expose their resources in the CR, so you cannot change the default resources for those components.
- After the watsonx Orchestrate deployment is complete, scale down the watsonx Orchestrate component operator. To scale down, navigate to the component operator deployment and manually scale it to 0, or use the following command:
oc scale -n ${OPERATOR_NS} deploy/ibm-wxo-component-operator-controller-manager --replicas 0
This action disables the reconcile loop for all components, and you can modify the resources per your requirements. This method also lets you scale deployments directly and modify components that do not expose resources in their CRs.
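The following sketch combines both methods. The component kind and name are placeholders; the exact resource that takes the label depends on which component you want to remove from reconciliation:
# Method 1 (hypothetical target): apply the hands-off label to the component that you want to scale
oc -n <cpd-namespace> label <component-kind>/<component-name> wo.watsonx.ibm.com/hands-off=true
# Method 2, reversed: restore reconciliation after you finish tuning resources
oc scale -n ${OPERATOR_NS} deploy/ibm-wxo-component-operator-controller-manager --replicas 1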
watsonx Assistant Dialog pod experiences continuous increased memory utilization and OOMKill
Applies to: 4.8.5
- Problem
- During testing, the wa-dialog pod experienced increased memory utilization, and one of the pods was OOMKilled.
- Cause
- This problem is caused by a low memory limit for watsonx Assistant.
- Solution
- To solve the problem, apply the following patch. Note: Apply this patch on a watsonx Assistant deployment with the medium t-shirt size and HPA enabled.
INSTANCE_NAME=$(oc -n <cpd-namespace> get wa --output jsonpath='{.items[0].metadata.name}')
oc -n <cpd-namespace> patch wa ${INSTANCE_NAME} --type=merge -p="{\"configOverrides\":{\"dialog\":{\"resources\":{\"limits\":{\"memory\":\"1.5Gi\"}}}}}"
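To verify that the new limit is in place, you can read it back from the custom resource. This is a minimal check, assuming the configOverrides path that the preceding patch merges:
# Print the dialog memory limit that the patch merged (path assumed from the patch above)
oc -n <cpd-namespace> get wa ${INSTANCE_NAME} --output jsonpath='{.configOverrides.dialog.resources.limits.memory}'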
CR remains in the In Progress status after rebooting the cluster
Applies to: 4.8.4
Fixed in: 4.8.5
- Problem
- After a cluster reboot, the CR gets stuck in the In Progress status:
watsonx_orchestrate WatsonxOrchestrate wo zen InProgress 1.0.0 2024-03-21T09:30:22Z 1.0.0 N/A
- Cause
- This problem occurs while Kafka waits to reconcile its topics.
- Solution
- To resolve this problem, delete the pending Kafka pod, for example, the wo-watson-orchestrate-kafkaibm-kafka-2 pod that is named in the following conditions:
conditions:
- lastTransitionTime: "2024-03-22T14:05:09.741114596Z"
  message: Pod wo-watson-orchestrate-kafkaibm-kafka-2 cannot be updated right now.
  reason: UnforceableProblem
  status: "True"
  type: NotReady
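Deleting the pod lets it be re-created and clears the condition. A minimal example, assuming the pod name from the condition above and your Cloud Pak for Data namespace:
# Delete the pending Kafka pod so that it is re-created
oc -n <cpd-namespace> delete pod wo-watson-orchestrate-kafkaibm-kafka-2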
A mongodb-wo-mongo-ops-manager-db pod error can halt the watsonx Orchestrate installation
Applies to: 4.8.4
Fixed in: 4.8.5
- Error
- This issue does not show a clear error in the pod or in the opsmanager CR. To confirm that the mongodb-wo-mongo-ops-manager-db pod error has occurred in the cluster, run the following command:
oc exec -it -c mongodb-agent mongodb-wo-mongo-ops-manager-db-0 -- /opt/scripts/readinessprobe
If the error has occurred in the cluster, the following is displayed:
panic: Get "https://172.30.0.1:443/api/v1/namespaces/cpd-instance-1/pods/mongodb-wo-mongo-ops-manager-db-0": dial tcp 172.30.0.1:443: i/o timeout
- Cause
- This error occurs in certain clusters when the network does not allow the MongoDB readinessprobe calls to the Kubernetes API.
- Solution
- To resolve this error, run the following commands:
# Update opsmanager labels to allow access
oc patch opsmanager mongodb-wo-mongo-ops-manager --type merge --patch '{"spec":{"applicationDatabase":{"podSpec":{"podTemplate":{"metadata":{"labels":{"wo.watsonx.ibm.com/external-access":"true"}}}}}}}'
# Delete existing sts to force re-rollout
oc delete sts mongodb-wo-mongo-ops-manager-db
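After the StatefulSet is re-created, you can rerun the readiness check from the Error section to confirm that the timeout is resolved:
# Confirm that the StatefulSet is back, then rerun the readiness probe
oc get sts mongodb-wo-mongo-ops-manager-db
oc exec -it -c mongodb-agent mongodb-wo-mongo-ops-manager-db-0 -- /opt/scripts/readinessprobe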
Adding skills before synchronization results in an error
Applies to: 4.8.4
- Error
- When you add skills before synchronization, you might see the following error:
Failed to complete this operation due to the following errors. Failed to upskill skill because {'status': 'failed', 'status_code':500, 'message':'"next_action"', 'detailed_message': None, 'other_details': None, 'is_available': True, 'skill_set_orch':[]}. Skill has not bootstrapped.
- Cause
- Synchronization of applications and skills takes up to five minutes to complete after the service instance is created.
- Solution
- Wait up to five minutes for the synchronization of applications and skills to complete.
An error is displayed when connecting to a skill
Applies to: 4.8.4
- Error
- When you try to connect to a skill, you might see the following error:
You couldn't connect. An error occurred while processing the request. Error: unable to get local issuer certificate.
- Cause
- IBM® watsonx Orchestrate does not support insecure communication with servers that use self-signed certificates or any certificates that are not signed by a trusted Certificate Authority (CA).
- Solution
- The server administrator must use a certificate that is signed by a trusted CA.
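One way to check whether a server presents a certificate that is signed by a trusted CA is with openssl. This is a minimal sketch, assuming openssl is installed and the server is reachable on port 443; the host name is a placeholder:
# Inspect the certificate chain; "Verify return code: 0 (ok)" indicates a trusted CA
openssl s_client -connect <server-host>:443 -showcerts </dev/null | grep 'Verify return code'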
Cannot upload files over 6 MB by using Box skills
Applies to: 4.8.4
- Problem
- You cannot upload files over 6 MB by using Box skills.
- Cause
- This issue is caused by the 6 MB file size limit of Box skills.
- Solution
- Upload the file manually, or compress the file to a size smaller than 6 MB.
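A minimal example of compressing a file before upload, assuming the zip utility is available; the file names are placeholders:
# Compress the file and confirm that the result is under 6 MB
zip report.zip report.docx
ls -lh report.zip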
Unable to add input to the connection fields of Salesforce
Applies to: 4.8.4
- Problem
- You are unable to add input to the connection fields of Salesforce because the focus does not stay on the input fields.
- Cause
- This issue is caused by a known user interface defect.
- Solution
- Click the border of an input field to keep the focus on the field.
OAuth 2.0 web authentication is not supported when adding new skills
Applies to: 4.8.4
- Problem
- OAuth 2.0 web authentication is not supported when you add new skills.
- Cause
- This issue is caused by browser redirection.
- Solution
- Use a different authentication method.
RabbitMQ cluster is set to In progress or Fail and wo-rabbitmq-orchestrate-backup-label has not completed after 1 minute
Applies to: 4.8.4
- Problem
- The watsonx Orchestrate CR is stuck at 32 pods deployed:
# oc get WatsonxOrchestrate wo -n bvt
NAME   DEPLOYED   VERIFIED   TOTAL
wo     32         31         47
Check whether the following log appears in the job:
# oc logs job/wo-rabbitmq-orchestrate-backup-label
Unable to connect to the server: dial tcp 172.30.0.1:443: i/o timeout
Retries=18 waiting for all PVCs to be ready
E0328 14:58:32.163323     208 memcache.go:265] couldn't get current server API group list: Get "https://172.30.0.1:443/api?timeout=32s": dial tcp 172.30.0.1:443: i/o timeout
- Cause
- The installation in a production environment fails because of RabbitMQ instability and displays the following error: "WO-Mongo is setting cpdmongodbservice-cr to fail".
- Solution
- To solve the problem, apply the following patch:
# Add rabbit label
oc patch rabbitmqcluster wo-rabbitmq --type 'merge' --patch '{"metadata":{"annotations":{"wo.watsonx.ibm.com/hands-off":"true"}},"spec":{"global":{"podLabels":{"wo.watsonx.ibm.com/external-access":"true"}}}}'
# Restart rabbitmq job
oc delete job wo-rabbitmq-orchestrate-backup-label
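Deleting the job causes it to be re-created. To confirm that it completes and that the CR progresses past 32 deployed pods, you can watch both resources; the bvt namespace is taken from the example above:
# Confirm that the re-created job completes
oc get job wo-rabbitmq-orchestrate-backup-label -n bvt
# Watch the CR deploy the remaining pods
oc get WatsonxOrchestrate wo -n bvt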