IBM Support

MustGather: Liberty on OpenShift with the Liberty Operator

Troubleshooting


Problem

This problem determination document describes how to collect diagnostic data for WebSphere Liberty on OpenShift when using the Liberty Operator.

Diagnosing The Problem

Gather logs and configuration

Perform the following steps to gather logs and configuration:
  1. Ensure you are in the right namespace of the target application pods by replacing $NAME with your project name:
    oc project $NAME
  2. Find the relevant pods:
    oc get pods
  3. For each relevant pod, send the standard logs to a local file, replacing both occurrences of $POD with a value from the NAME column in the output of the previous step:
    oc logs --all-containers=true $POD > $POD.txt
  4. For each relevant pod, open a remote shell in it, replacing $POD based on the output above (if there are multiple containers, use -c $CONTAINER after $POD to specify the Liberty container):
    oc exec -it $POD -- /bin/sh
  5. Optional: Copy useful process and cgroup information into /tmp:
    cp -R --no-preserve=all --parents /proc/cpuinfo /proc/stat /proc/schedstat /proc/vmstat /proc/meminfo /proc/version /proc/pressure /proc/loadavg /proc/[0-9]*/cgroup /proc/[0-9]*/environ /proc/[0-9]*/cmdline /proc/[0-9]*/smaps /proc/[0-9]*/limits /proc/[0-9]*/stat /proc/[0-9]*/status /proc/[0-9]*/sched /proc/[0-9]*/schedstat /proc/[0-9]*/wchan /proc/[0-9]*/task/*/stat /proc/[0-9]*/task/*/wchan /proc/[0-9]*/task/*/sched /proc/[0-9]*/task/*/status /sys/fs/cgroup/cpu* /sys/fs/cgroup/memory* /sys/fs/cgroup/*/*/cpu* /sys/fs/cgroup/*/*/memory* /tmp/ 2>/dev/null
  6. Create a compressed file with all logs and configuration:
    tar czhf /tmp/liberty_${HOSTNAME}_$(date +%Y%m%d_%H%M%S).tar.gz /logs /config /serviceability/*/${HOSTNAME} /opt/*/wlp/usr/servers/*/logs /opt/*/wlp/usr/servers/*/configDropins /opt/*/wlp/usr/servers/*/*xml /opt/*/wlp/usr/servers/*/*options /opt/*/wlp/usr/servers/*/*env /opt/*/wlp/usr/servers/*/*properties /opt/*/wlp/usr/servers/*/javacore* /opt/*/wlp/usr/servers/*/verbosegc* /opt/*/wlp/usr/servers/*/heapdump* /opt/*/wlp/usr/servers/*/core* ${LOG_DIR} ${X_LOG_DIR} ${WLP_OUTPUT_DIR} ${SERVER_WORKING_DIR} ${VARIABLE_SOURCE_DIRS} ${JAVA_HOME}/jre/lib/security/java.security /tmp/proc /tmp/sys 2>/dev/null
  7. Optional: If you performed step 5, delete only the temporary files that it produced:
    rm -rf /tmp/proc /tmp/sys
  8. List the compressed file and then exit the container:
    cd /tmp; ls *.tar.gz; exit &>/dev/null
  9. Download the compressed file, replacing $POD with the pod name and both occurrences of $FILE with the file name listed in the previous step:
    oc cp $POD:/tmp/$FILE $FILE --retries=999
  10. Gather various resource state. The command will produce a local directory starting with the name inspect.local:
    1. If using the IBM WebSphere Liberty Operator:
      oc adm inspect WebSphereLibertyApplications,deployments,replicasets,pods,events
    2. If using the OpenLiberty Operator:
      oc adm inspect OpenLibertyApplications,deployments,replicasets,pods,events
  11. Upload:
    1. Pod standard logs (step 3)
    2. Liberty logs and configuration (step 9)
    3. Resource state (step 10): the directory with the name inspect.local*
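
When several pods are relevant, step 3 can be repeated with a small shell function. The following is a minimal sketch, assuming a POSIX-compatible shell and an oc session that is already logged in and set to the right project; save_pod_logs is a hypothetical helper name and is not part of the oc CLI.

```shell
# Hypothetical helper: save the standard logs of each named pod to <pod>.txt
# in the current directory.
save_pod_logs() {
  for pod in "$@"; do
    # Same command as step 3, applied once per pod name given as an argument.
    oc logs --all-containers=true "$pod" > "${pod}.txt"
  done
}
```

For example, save_pod_logs followed by one or more pod names from the NAME column of oc get pods writes one .txt file per pod.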

Enable diagnostic trace at runtime

Perform the following steps to enable diagnostic trace at runtime:
  1. If runtime configuration updates are not enabled, then you must enable diagnostic trace at startup.
  2. If runtime configuration updates are enabled (as they are by default):
    1. If the operator storage for serviceability is not configured, then follow the Enable diagnostic trace at runtime steps in Liberty on OpenShift without the Liberty Operator.
    2. If the operator storage for serviceability is configured, then a WebSphereLibertyTrace custom resource (or an OpenLibertyTrace custom resource if using the OpenLiberty Operator) may be used:
      1. Ensure you are in the right namespace of the target application pods by replacing $NAME with your project name:
        oc project $NAME
      2. Find the relevant pods:
        oc get pods
      3. For each relevant pod, create a local file named trace.yaml, replacing $POD based on the output above and replacing $TRACE with an IBM support-requested trace specification or your desired trace specification:
        1. If using the IBM WebSphere Liberty Operator:
          apiVersion: liberty.websphere.ibm.com/v1
          kind: WebSphereLibertyTrace
          metadata:
            name: libertytrace1
            annotations:
              day2operation.openliberty.io/targetKinds: Pod
          spec:
            license:
              accept: true
            podName: $POD
            traceSpecification: "*=info:$TRACE"
            maxFileSize: 100
            maxFiles: 5
            disable: false
        2. If using the OpenLiberty Operator:
          apiVersion: apps.openliberty.io/v1
          kind: OpenLibertyTrace
          metadata:
            name: libertytrace1
            annotations:
              day2operation.openliberty.io/targetKinds: Pod
          spec:
            license:
              accept: true
            podName: $POD
            traceSpecification: "*=info:$TRACE"
            maxFileSize: 100
            maxFiles: 5
            disable: false
      4. Apply the YAML:
        oc apply -f trace.yaml
      5. List the Liberty trace resources and verify that TRACING shows true:
        1. If using the IBM WebSphere Liberty Operator:
          oc get WebSphereLibertyTrace
          
        2. If using the OpenLiberty Operator:
          oc get OpenLibertyTrace
          
      6. Reproduce the problem
      7. Disable the diagnostic trace:
        1. If using the IBM WebSphere Liberty Operator:
          oc delete WebSphereLibertyTrace libertytrace1
          
        2. If using the OpenLiberty Operator:
          oc delete OpenLibertyTrace libertytrace1
          
      8. Gather and upload all logs
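
The trace.yaml file from step 3 can also be generated rather than edited by hand. The following is a minimal sketch, assuming a POSIX-compatible shell; write_trace_yaml and its operator argument values (websphere, open) are hypothetical names chosen for illustration, and the YAML fields are copied from the two examples above.

```shell
# Hypothetical helper: write_trace_yaml OPERATOR POD TRACE
# Writes trace.yaml in the current directory for the chosen operator.
write_trace_yaml() {
  operator="$1"
  pod="$2"
  trace="$3"
  # Select the apiVersion and kind shown in step 3 for each operator.
  case "$operator" in
    websphere) api="liberty.websphere.ibm.com/v1"; kind="WebSphereLibertyTrace" ;;
    open)      api="apps.openliberty.io/v1";       kind="OpenLibertyTrace" ;;
    *) echo "operator must be 'websphere' or 'open'" >&2; return 1 ;;
  esac
  # The here-document expands the variables but does not glob the asterisk.
  cat > trace.yaml <<EOF
apiVersion: ${api}
kind: ${kind}
metadata:
  name: libertytrace1
  annotations:
    day2operation.openliberty.io/targetKinds: Pod
spec:
  license:
    accept: true
  podName: ${pod}
  traceSpecification: "*=info:${trace}"
  maxFileSize: 100
  maxFiles: 5
  disable: false
EOF
}
```

After calling the function with the operator, the pod name, and the trace specification (quote the specification so the shell does not expand any asterisks), review trace.yaml and then continue with step 4.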

Enable diagnostic trace at startup

Perform the following steps to enable diagnostic trace at startup:
  1. Ensure you are in the right namespace of the target application pods by replacing $NAME with your project name:
    oc project $NAME
  2. In your current directory, create a local file named tracefromstartup.xml with the following contents and replace $TRACE with an IBM support-requested trace specification or your desired trace specification:
    <?xml version="1.0" encoding="UTF-8"?>
    <server>
      <logging traceSpecification="*=info:$TRACE" maxFileSize="100" maxFiles="10" />
    </server>
  3. Create a ConfigMap entry based on this local file:
    oc create configmap tracefromstartup --from-file tracefromstartup.xml
  4. List the Liberty Operator managed applications:
    1. If using the IBM WebSphere Liberty Operator:
      oc get WebSphereLibertyApplication
    2. If using the OpenLiberty Operator:
      oc get OpenLibertyApplication
  5. Edit the relevant Liberty Operator managed application:
    1. If using the IBM WebSphere Liberty Operator:
      oc edit WebSphereLibertyApplication $NAME
    2. If using the OpenLiberty Operator:
      oc edit OpenLibertyApplication $NAME
  6. In the spec section, add or edit a volumes section that mounts the ConfigMap and a volumeMounts section that places the file into the container; for example:
    spec:
      volumes:
      - name: tracefromstartup
        configMap:
          name: tracefromstartup
      volumeMounts:
      - name: tracefromstartup
        mountPath: /config/configDropins/overrides/tracefromstartup.xml
        subPath: tracefromstartup.xml
    
  7. Save and quit the editor. Ensure that the change succeeded by verifying the output ends with "edited"; for example:
    webspherelibertyapplication.liberty.websphere.ibm.com/websphereliberty-app-sample edited
    
  8. After a short time, the old pods should be deleted and new pods with the new trace enabled should be created. You can watch the rollout with oc get pods.
  9. Reproduce the problem
  10. Gather and upload all logs
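
The tracefromstartup.xml file from step 2 can be written from the command line. The following is a minimal sketch, assuming a POSIX-compatible shell; write_trace_xml is a hypothetical helper name, and the XML is the same as in step 2 with $TRACE substituted. It assumes the trace specification contains no characters that need XML escaping.

```shell
# Hypothetical helper: write_trace_xml TRACE
# Writes tracefromstartup.xml in the current directory.
write_trace_xml() {
  trace="$1"
  # The here-document expands ${trace} and leaves the asterisk untouched.
  cat > tracefromstartup.xml <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<server>
  <logging traceSpecification="*=info:${trace}" maxFileSize="100" maxFiles="10" />
</server>
EOF
}
```

After the file is written, step 3 (oc create configmap tracefromstartup --from-file tracefromstartup.xml) can be run unchanged.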

Change Java options at startup

The JVM_ARGS Liberty environment variable may be configured at container startup to change JVM options. Perform the following steps to add Java options at startup:
  1. First, check if this environment variable is already set:
    1. Ensure you are in the right namespace of the target application pods by replacing $NAME with your project name:
      oc project $NAME
    2. Find the relevant pods:
      oc get pods
    3. For one of the relevant pods, replace $POD based on the output above to search its environment variables (if there are multiple containers, use -c $CONTAINER after $POD to specify the Liberty container):
      oc exec -it $POD -- /bin/sh -c "cat /proc/[0-9]*/environ | tr '\0' '\n' | grep JVM_ARGS="
  2. Append your desired arguments to the value found in the above steps (if any).
  3. List the Liberty Operator managed applications:
    1. If using the IBM WebSphere Liberty Operator:
      oc get WebSphereLibertyApplication
    2. If using the OpenLiberty Operator:
      oc get OpenLibertyApplication
  4. Edit the relevant Liberty Operator managed application:
    1. If using the IBM WebSphere Liberty Operator:
      oc edit WebSphereLibertyApplication $NAME
    2. If using the OpenLiberty Operator:
      oc edit OpenLibertyApplication $NAME
  5. In the spec section, add or edit the JVM_ARGS entry in the env section; for example:
    spec:
      env:
      - name: JVM_ARGS
        value: -Djavax.net.debug=all
    
  6. Save and quit the editor. Ensure that the change succeeded by verifying the output ends with "edited"; for example:
    webspherelibertyapplication.liberty.websphere.ibm.com/websphereliberty-app-sample edited
    
  7. After a short time, the old pods should be deleted and new pods with the new arguments should be created. You can verify this by describing a new pod; for example:
    $ oc describe pod websphereliberty-app-sample-5bc7bb657f-h9bp9
      [...]
        Environment:
          JVM_ARGS:     -Djavax.net.debug=all
    
  8. Reproduce the problem
  9. Gather and upload all logs
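
Step 2 asks you to append new arguments to any existing JVM_ARGS value instead of overwriting it. The following is a minimal sketch of that string handling, assuming a POSIX-compatible shell; append_jvm_args is a hypothetical helper name, and -Xmx512m is only an illustrative existing value.

```shell
# Hypothetical helper: append_jvm_args EXISTING EXTRA
# Prints the value to use for JVM_ARGS in the env section.
append_jvm_args() {
  existing="$1"
  extra="$2"
  if [ -n "$existing" ]; then
    # Keep the existing options and add the new ones after a space.
    printf '%s %s\n' "$existing" "$extra"
  else
    # Nothing was set before, so the new options are the whole value.
    printf '%s\n' "$extra"
  fi
}
```

For example, append_jvm_args "-Xmx512m" "-Djavax.net.debug=all" prints -Xmx512m -Djavax.net.debug=all, which is the value to place in the env section in step 5.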

Execute a server dump

Perform the following steps to execute and gather a Liberty server dump:

Warning: These commands start a new process that consumes some memory (likely tens of MB). If your container has a memory limit and is close to that limit, this may cause the container to crash. Alternatively, you may gather logs and configuration manually.
  1. If the operator storage for serviceability is not configured, then follow the Execute a server dump steps in Liberty on OpenShift without the Liberty Operator.
  2. If the operator storage for serviceability is configured, then a WebSphereLibertyDump custom resource (or an OpenLibertyDump custom resource if using the OpenLiberty Operator) may be used to perform a server dump:
    1. Ensure you are in the right namespace of the target application pods by replacing $NAME with your project name:
      oc project $NAME
    2. Find the relevant pods:
      oc get pods
    3. For each relevant pod, create a local file named dump.yaml, replacing $POD based on the output above:
      1. If using the IBM WebSphere Liberty Operator:
        apiVersion: liberty.websphere.ibm.com/v1
        kind: WebSphereLibertyDump
        metadata:
          name: libertydump1
          annotations:
            day2operation.openliberty.io/targetKinds: Pod
        spec:
          license:
            accept: true
          podName: $POD
          include:
            - thread
        
      2. If using the OpenLiberty Operator:
        apiVersion: apps.openliberty.io/v1
        kind: OpenLibertyDump
        metadata:
          name: libertydump1
          annotations:
            day2operation.openliberty.io/targetKinds: Pod
        spec:
          license:
            accept: true
          podName: $POD
          include:
            - thread
        
    4. Apply the YAML:
      oc apply -f dump.yaml
    5. List the Liberty dump resources:
      1. If using the IBM WebSphere Liberty Operator:
        oc get WebSphereLibertyDump
        
      2. If using the OpenLiberty Operator:
        oc get OpenLibertyDump
        
    6. Download the dump file, replacing $POD with the name of the pod and $FILE with the value of DUMP FILE in the output of the previous command:
      oc cp $POD:$FILE libertydump.zip --retries=999

Gather data on a performance, hang, or high CPU issue


Notes & Tips

  1. If you are on macOS or Linux (or use Cygwin on Windows), you may use shell variables to simplify the commands above. For example, many commands use $POD for the target pod. If you first set the POD variable, the shell replaces later references to $POD in the same terminal window with that value, so you can copy and paste commands from the instructions above without modifying them. In the following example, POD is set to liberty1-5545f8475b-zdwmg, so the final command, oc logs $POD, uses that value:
    $ oc get pods
    NAME                                           READY   STATUS    RESTARTS   AGE
    liberty1-5545f8475b-zdwmg                      1/1     Running   2          16d
    websphereliberty-app-sample-6f698f5bcb-srn55   1/1     Running   2          16d
    $ POD=liberty1-5545f8475b-zdwmg
    $ oc logs $POD
    

Document Location

Worldwide


Document Information

Modified date:
22 May 2024

UID

ibm17152466