IBM Support

Mustgather: Collecting data to diagnose issues with IBM Business Automation Workflow in a container

Troubleshooting


Problem

This document describes the general information and diagnostic data needed to start troubleshooting issues related to IBM Business Automation Workflow (BAW) containers. When you open a case for problems related to Business Automation Workflow containers running stand-alone or in Cloud Pak for Automation, include the diagnostics gathered by following this document.

Resolving The Problem

Overview of Business Automation Workflow diagnostic information


General diagnostic information


As needed diagnostic information


Detailed diagnostic collection steps

The following steps describe how to gather each type of data for BAW. Run the diagnostic commands from an empty collection directory so that the resulting files are easy to package. Run the commands from the project or namespace that contains BAW, or add the -n <namespace> flag to every oc command.
Note: oc commands are interchangeable with kubectl.
On platforms other than OpenShift Container Platform (OCP), use kubectl with the -n parameter for Kubernetes commands.
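
The setup described above can be sketched as two small shell helpers. This is a minimal sketch, not IBM tooling: the names new_collection_dir, k, CLI, and NS are hypothetical, and the namespace value is a placeholder to replace with your own.

```shell
# Hypothetical helpers; CLI, NS, new_collection_dir and k are illustrative names.
CLI="${CLI:-oc}"               # set CLI=kubectl on platforms other than OCP
NS="${NS:-my-baw-namespace}"   # placeholder; replace with your BAW namespace

# Create an empty, timestamped collection directory and print its path.
new_collection_dir() {
  dir="${1:-.}/baw-mustgather-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Run a cluster command against the BAW namespace with the chosen CLI.
k() {
  "$CLI" -n "$NS" "$@"
}
```

For example, cd "$(new_collection_dir)" followed by k get pods runs the command in a fresh directory against the chosen namespace.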

Important: If your issue is with workflow authoring on version 22.x or later, see the BA Studio mustgather for diagnostic collection.

1: Provide a detailed description of the problem and your environment

  • Provide a detailed description of your issue. Include screen captures and re-create steps if possible.
    Is it an intermittent or recreatable issue? Has this issue always been a problem or one that started only after a change occurred?
    What is the business impact? Are there any deadlines impacted by the issue?
  • Provide a reference to the documentation being followed for the failing operation
  • Which platform are you using (Red Hat OpenShift, managed Red Hat OpenShift, other Kubernetes platform)?
  • What is the database type and version?

2: Gather the configuration information

Gather the general configuration data
oc get icp4acluster -oyaml > Cp4aCR.yaml
oc get content -oyaml > ContentCR.yaml

oc adm must-gather --image=icr.io/cpopen/cpfs/must-gather:latest -- gather -m automationfoundation -n <cloud pak namespace>
The -n parameter is required and must be a single namespace.  If you are using an air gap setup, ensure you push the latest version of the must-gather image into your local repository. The command requires cluster admin access to execute. Generally, this collection takes 5 - 10 minutes and produces a 25 - 50MB gzip file.

In addition, version 23.0.1 introduced new must-gather command options. See Gathering deployment information and logs from Cloud Pak for Business Automation for more details. If the issue is specific to workflow, the following command is an alternative that gathers more targeted configuration data and logs. In this example, cp4baNS is the Cloud Pak namespace and 23.0.2 is the version tag. Replace both with the values for your deployment.
oc adm must-gather --image=icr.io/cpopen/cp4ba/icp4a-must-gather:23.0.2 -- gather -m cp4ba -p workflow_runtime -n cp4baNS

If you cannot use the must-gather command, see item 2, option 2 of the main Cloud Pak MustGather to gather some basic information. The oc adm must-gather command is not valid for non-OCP environments. In those environments, remember to replace oc with kubectl in the other commands.

3: Log and Tracing data for WebSphere Liberty 
Follow these steps to enable IBM WebSphere Liberty tracing on the BAW containers.
  1. Edit the icp4acluster CR YAML that the operator uses to create the BAW pods.
    The same steps can be used for either baw_configuration or workflow_authoring_configuration.
    Modify the trace_specification property in the BAW logs section of the YAML. Set it to the following trace string or to a trace string that fits your problem.
    spec:
      ...
      baw_configuration:
        ...
        logs:
          trace_specification: '*=info:WLE.*=all:com.ibm.bpm.*=all:com.ibm.workflow.*=all'
    Update the CR with the new configuration by using your preferred method. For example, the edit command can be used.
    oc edit icp4acluster

    Note: It can take a long time (up to the length of an operator reconcile) for the change to be recognized and the configuration to be updated. You can grep the log file for traceSpecification to see when the trace settings change.

  2. Optional: The changes can be applied immediately by additionally modifying the configmap ending with baw-server-configmap-custom. This configmap contains a trace-specification.xml file. Edit the settings of this file to match what was used in the CR file.

  3. Use the following command to gather the BAW logs, where <pod-name> is one of the BAW pods.
    oc cp <pod-name>:/logs/application/ ./BAW
    Note: The logs can also be gathered directly from the associated BAW logging (baw-logstore-pvc) persistent volume (PV).
  4. Disable the trace by setting trace_specification back to "*=info" and applying the changes again.
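
For the optional step 2, only the suffix of the configmap name is given because the full name depends on the deployment. The following sketch looks the name up by its suffix. The function name find_configmap is hypothetical, and the commands assume that you are already in the BAW project or namespace.

```shell
# Hypothetical helper: print the name of each configmap whose name ends with
# the given suffix, for example baw-server-configmap-custom.
find_configmap() {
  # "oc get ... -o name" prints entries as configmap/<name>; strip the prefix.
  oc get configmap -o name | sed 's|^configmap/||' | grep -- "$1\$"
}
```

If exactly one configmap matches, oc edit configmap "$(find_configmap baw-server-configmap-custom)" opens it for editing.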

4: Export of your application

If the issue is specific to a certain application, provide an export of that application.
For more information, see Importing and exporting projects.

5: Collect Operator logs

If you are having issues during the install or upgrade of BAW or CP4BA, collect the operator logs:
oc cp $operator_pod_name:/tmp/ansible-operator/runner/ ./operator_logs/
 
Where $operator_pod_name is the name of the operator pod you are concerned with (for example ibm-cp4a-operator).
For recent versions of BAW, you generally need to provide these from both the cp4a and content operators.
For more information, see the installation troubleshooting page.
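
When logs are needed from more than one operator, copying each pod's logs into its own subdirectory keeps the collections from overwriting each other. The following is a minimal sketch. The function name collect_operator_logs and the per-pod directory layout are assumptions, not part of the documented procedure.

```shell
# Hypothetical helper: copy the Ansible runner logs from each named operator
# pod into ./operator_logs/<pod>/ so that the collections stay separate.
collect_operator_logs() {
  for pod in "$@"; do
    mkdir -p "./operator_logs/$pod"
    oc cp "$pod:/tmp/ansible-operator/runner/" "./operator_logs/$pod/" \
      || echo "warning: could not copy logs from $pod" >&2
  done
}
```

Call it with the full pod names reported by oc get pods, for example collect_operator_logs <cp4a-operator-pod> <content-operator-pod>.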

6: Collect Browser data for UI issues

For console or web application usage issues, capture the following browser data:

7: Gathering javacores and heap dumps

For issues related to performance, hangs, JVM crashes, or memory, we likely need dumps from the Liberty servers.
  1. Determine the names of the BAW server pods by using the get pods command.
    oc get pods | grep baw-server
  2. If dumps need to be generated, you can use the Liberty server dump commands to create them. Use the javadump command to generate javacores for each BAW server pod. Include the option --include=heap or --include=system to generate heap dumps or system core dumps. For example, the following command generates a javacore and heap dump for the pod.

    oc exec <pod-name> -- bash -c "server javadump --include=heap"
    Note: If a BAW Liberty server JVM crashes, then you might also see dumps get generated.
  3. Use the following command to gather the BAW dumps, where <pod-name> is one of the BAW pods.
    oc cp <pod-name>:/opt/ibm/wlp/output/defaultServer/dump ./BAW/dumps/
    Note: The dumps can also be gathered directly from the associated BAW dump (baw-dumpstore-pvc) persistent volume (PV).
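
Because dumps are needed from each BAW server pod, the three steps above can be combined into a loop. The following is a minimal sketch that reuses the commands shown in the steps. The function name dump_all_baw_pods and the per-pod output directories are assumptions.

```shell
# Hypothetical helper: generate a javacore and heap dump on every BAW server
# pod, then copy each pod's dump directory into ./BAW/dumps/<pod>/.
dump_all_baw_pods() {
  # "oc get pods -o name" prints entries as pod/<name>; strip the prefix.
  for pod in $(oc get pods -o name | grep baw-server | sed 's|^pod/||'); do
    oc exec "$pod" -- bash -c "server javadump --include=heap"
    mkdir -p "./BAW/dumps/$pod"
    oc cp "$pod:/opt/ibm/wlp/output/defaultServer/dump" "./BAW/dumps/$pod/"
  done
}
```

Heap dumps can be large, so check the free space in the collection directory before running the loop against several pods.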

Enabling verbose:gc and other JVM dump options.

Update the CR to include the needed JVM options and point the logs at an appropriate location.
baw_configuration:
  jvm_customize_options: -verbose:gc -Xverbosegclog:/logs/application/verbosegc/verbosegc.%Y%m%d.%H%M%S.%pid.txt,20,10000 -Xdump:stack:events=allocation,filter=#25m
These options enable verbose:gc, send the log files to the verbosegc directory on the logging PVC, and enable stack dumps for object allocations larger than 25 MB.
The operator rolls out the changes. To confirm the change or speed up the process, you can view or edit the configmap ending with baw-server-configmap-liberty. The jvm.options key in this configmap contains the settings. If you change the configmap, the pods must be restarted to pick up the new settings. After the options are enabled, the logs can be gathered from the logging PV as described in item 3 of this mustgather.
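
To confirm that the options reached the configmap, the jvm.options key can be printed and checked. The following is a minimal sketch. The function names show_jvm_options and gc_logging_enabled are hypothetical, and the configmap name must be the full name from your deployment. In the -Xverbosegclog option, the trailing 20,10000 values request 20 rotating log files of 10000 GC cycles each.

```shell
# Hypothetical helper: print the jvm.options key of the given configmap.
# The backslash escapes the dot inside the key name for jsonpath.
show_jvm_options() {
  oc get configmap "$1" -o jsonpath='{.data.jvm\.options}'
}

# Hypothetical helper: succeed only if -verbose:gc is present in jvm.options.
gc_logging_enabled() {
  show_jvm_options "$1" | grep -q -- '-verbose:gc'
}
```

For example, gc_logging_enabled <configmap-name> && echo "verbose:gc is configured" reports whether the setting is in place.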

8: Gathering resource registry data

 
If you have an issue with the interaction between workflow and resource registry, gather the following data:
  • Get a dump of the resource registry contents. Run this command from one of the resource registry pods.
    etcdctl --cacert=/shared/resources/tls/ca-cert.pem --user=root:<root password> --insecure-skip-tls-verify get "" --from-key
    The root password can be determined by checking the secret icp4adeploy-rr-admin-secret.
  • Enable this trace string in addition to any other needed tracing when recreating the issue:
    com.ibm.bpm.dbaregistry.*=all:com.ibm.bpm.resourceregistry.*=all:com.ibm.bpm.serviceregistry.*=all:com.ibm.bpm.bas.registry.*=all
    See item 3 for more details on enabling trace.
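
The secret lookup and the registry dump can also be run from outside the pod. The following is a minimal sketch under stated assumptions. The function names and the output file name resource-registry-dump.txt are hypothetical. The key names inside the secret are not listed above, so the first helper prints every key rather than assuming one.

```shell
# Hypothetical helper: print every key of a secret with its decoded value.
show_secret() {
  oc get secret "$1" -o go-template='{{range $k, $v := .data}}{{$k}}={{$v | base64decode}}{{"\n"}}{{end}}'
}

# Hypothetical helper: run the dump command in a resource registry pod and
# save the output locally. $1 is the pod name, $2 is the root password.
dump_resource_registry() {
  oc exec "$1" -- etcdctl --cacert=/shared/resources/tls/ca-cert.pem \
    --user="root:$2" --insecure-skip-tls-verify get "" --from-key \
    > resource-registry-dump.txt
}
```

For example, run show_secret icp4adeploy-rr-admin-secret, then dump_resource_registry <rr-pod-name> <root password>. The password is passed on the command line, so clear the shell history afterward if that is a concern.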

What to do next

  1. Review the log files and traces at the time of the problem to try to determine the source of the problem.
     
  2. Check these locations for known issues:
  3. After you gather all the needed information and diagnostics, add them to your case. Alternatively, you can upload files to ECuRep. For more information, see Enhanced Customer Data Repository (ECuRep) - Overview.
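
Before you attach the data, the collection directory can be packed into a single archive. The following is a minimal sketch. The function name package_collection is hypothetical, and tar with gzip is an assumed choice of format rather than a stated requirement.

```shell
# Hypothetical helper: pack a collection directory into <dir>.tar.gz next to
# the directory and print the archive path.
package_collection() {
  dir="${1%/}"   # drop a trailing slash, if any
  tar -czf "$dir.tar.gz" -C "$(dirname "$dir")" "$(basename "$dir")" \
    && printf '%s\n' "$dir.tar.gz"
}
```

For example, package_collection ./baw-mustgather creates ./baw-mustgather.tar.gz with the directory contents under a single top-level folder.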

Document Location

Worldwide

Document Information

Modified date:
06 February 2024

UID

ibm16259483