Troubleshooting
Problem
This is a problem determination document to help collect data for Liberty on Kubernetes without the Liberty Operator.
Environment
Diagnosing The Problem
Table of contents:
- Gather logs and configuration
- Enable diagnostic trace at runtime
- Enable diagnostic trace at startup
- Change Java options at startup
- Execute a server dump
- Gather data on a performance, hang, or high CPU issue
Gather logs and configuration
Perform the following steps to gather logs and configuration:
- Ensure you are in the right namespace of the target application pods by replacing $NAME with your namespace:
kubectl config set-context --current --namespace=$NAME
- Find the relevant pods:
kubectl get pods
- For each relevant pod, send the standard logs to a local file, replacing $POD twice based on the NAME in the previous step:
kubectl logs --all-containers=true $POD > $POD.txt
- For each relevant pod, open a shell inside it, replacing $POD based on the output above (if there are multiple containers, use -c $CONTAINER after $POD to specify the Liberty container):
kubectl exec -it $POD -- /bin/sh
- Optional: Copy some useful process information into /tmp:
cp -R --no-preserve=all --parents /proc/cpuinfo /proc/stat /proc/schedstat /proc/vmstat /proc/meminfo /proc/version /proc/pressure /proc/loadavg /proc/[0-9]*/cgroup /proc/[0-9]*/environ /proc/[0-9]*/cmdline /proc/[0-9]*/smaps /proc/[0-9]*/limits /proc/[0-9]*/stat /proc/[0-9]*/status /proc/[0-9]*/sched /proc/[0-9]*/schedstat /proc/[0-9]*/wchan /proc/[0-9]*/task/*/stat /proc/[0-9]*/task/*/wchan /proc/[0-9]*/task/*/sched /proc/[0-9]*/task/*/status /sys/fs/cgroup/cpu* /sys/fs/cgroup/memory* /sys/fs/cgroup/*/*/cpu* /sys/fs/cgroup/*/*/memory* /tmp/ 2>/dev/null
- Create a compressed file with all logs and configuration:
tar czhf /tmp/liberty_${HOSTNAME}_$(date +%Y%m%d_%H%M%S).tar.gz /logs /config /serviceability/*/${HOSTNAME} /opt/*/wlp/usr/servers/*/logs /opt/*/wlp/usr/servers/*/configDropins /opt/*/wlp/usr/servers/*/*xml /opt/*/wlp/usr/servers/*/*options /opt/*/wlp/usr/servers/*/*env /opt/*/wlp/usr/servers/*/*properties /opt/*/wlp/usr/servers/*/javacore* /opt/*/wlp/usr/servers/*/verbosegc* /opt/*/wlp/usr/servers/*/heapdump* /opt/*/wlp/usr/servers/*/core* ${LOG_DIR} ${X_LOG_DIR} ${WLP_OUTPUT_DIR} ${SERVER_WORKING_DIR} ${VARIABLE_SOURCE_DIRS} ${JAVA_HOME}/jre/lib/security/java.security /tmp/proc /tmp/sys 2>/dev/null
- Optional: If you performed step 5, delete only the temporary files produced in that step:
rm -rf /tmp/proc /tmp/sys
- List the compressed file and then exit the container:
cd /tmp; ls *.tar.gz; exit &>/dev/null
- Download the compressed file, replacing $POD with the pod name and $FILE twice from the output of the previous step:
kubectl cp $POD:/tmp/$FILE $FILE --retries=999
- Gather various resource state:
- For each relevant pod, describe the pod, replacing $POD twice with the pod name:
kubectl describe pod $POD > ${POD}_describe.txt
- Upload:
- Pod standard logs (step 3)
- Liberty logs and configuration (step 9)
- Resource state text files (step 10)
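When several pods are involved, steps 3 and 10 above can be repeated in a loop. The following is a minimal sketch, not a required part of the procedure; the function name collect_pod_data is hypothetical, and it assumes kubectl already points at the right namespace (step 1).

```shell
#!/bin/sh
# Hypothetical helper: gather standard logs and describe output for each pod
# name passed as an argument. Assumes the namespace is already set (step 1).
collect_pod_data() {
  for pod in "$@"; do
    # Step 3: standard logs from every container in the pod
    kubectl logs --all-containers=true "$pod" > "${pod}.txt"
    # Step 10: resource state of the pod
    kubectl describe pod "$pod" > "${pod}_describe.txt"
  done
}

# Example usage (the pod name is a placeholder):
# collect_pod_data liberty1-5545f8475b-zdwmg
```

The braces in ${pod}_describe.txt matter: without them the shell would look up a variable named pod_describe instead of pod.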
Enable diagnostic trace at runtime
Perform the following steps to enable diagnostic trace at runtime:
- If runtime configuration updates are not enabled, then you must enable diagnostic trace at startup.
- If runtime configuration updates are enabled (as they are by default):
- Ensure you are in the right namespace of the target application pods by replacing $NAME with your namespace:
kubectl config set-context --current --namespace=$NAME
- Find the relevant pods:
kubectl get pods
- For each of the relevant pods, replace $POD based on the output above and replace $TRACE with an IBM support-requested trace specification or your desired trace specification (if there are multiple containers, use -c $CONTAINER after $POD to specify the Liberty container):
kubectl exec -it $POD -- /bin/sh -c "echo '<?xml version=\"1.0\" encoding=\"UTF-8\"?><server><logging traceSpecification=\"*=info:$TRACE\" maxFileSize=\"100\" maxFiles=\"10\" /></server>' > /config/configDropins/overrides/trace.xml"
- Reproduce the problem
- Disable the diagnostic trace:
kubectl exec -it $POD -- /bin/sh -c "rm /config/configDropins/overrides/trace.xml"
- Gather and upload all logs
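The enable and disable commands above rely on nested quoting, which is easy to mistype. As a sketch under stated assumptions, the same two actions can be wrapped in small shell functions; enable_trace and disable_trace are hypothetical names, and the XML is sent over standard input with kubectl exec -i rather than embedded in the remote command.

```shell
#!/bin/sh
# Hypothetical helper: write the trace override into a pod.
# $1 = pod name, $2 = trace specification (for example, one requested by IBM support)
enable_trace() {
  pod=$1
  trace_spec=$2
  # Send the same XML as in the step above on stdin; the remote shell saves it.
  printf '%s\n' "<?xml version=\"1.0\" encoding=\"UTF-8\"?><server><logging traceSpecification=\"*=info:${trace_spec}\" maxFileSize=\"100\" maxFiles=\"10\" /></server>" |
    kubectl exec -i "$pod" -- /bin/sh -c 'cat > /config/configDropins/overrides/trace.xml'
}

# Hypothetical helper: remove the trace override again.
# $1 = pod name
disable_trace() {
  kubectl exec "$1" -- /bin/sh -c 'rm /config/configDropins/overrides/trace.xml'
}

# Example usage (pod name and trace specification are placeholders):
# enable_trace liberty1-5545f8475b-zdwmg 'com.ibm.ws.webcontainer*=all'
# disable_trace liberty1-5545f8475b-zdwmg
```

If the pod has multiple containers, add -c $CONTAINER after the pod name in both kubectl exec calls, as in the steps above.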
Enable diagnostic trace at startup
Perform the following steps to enable diagnostic trace at startup:
- Ensure you are in the right namespace of the target application pods by replacing $NAME with your namespace:
kubectl config set-context --current --namespace=$NAME
- In your current directory, create a local file named tracefromstartup.xml with the following contents and replace $TRACE with an IBM support-requested trace specification or your desired trace specification:
<?xml version="1.0" encoding="UTF-8"?>
<server>
  <logging traceSpecification="*=info:$TRACE" maxFileSize="100" maxFiles="10" />
</server>
- Create a ConfigMap entry based on this local file:
kubectl create configmap tracefromstartup --from-file tracefromstartup.xml
- List the relevant Liberty Deployment:
kubectl get deployment
- Edit the relevant Liberty Deployment, replacing $NAME with the deployment name from the previous step:
kubectl edit deployment $NAME
- In the spec section of the target container template, add or edit a volumes section that mounts the ConfigMap and a volumeMounts section that places the file into the container; for example:
spec:
  template:
    spec:
      containers:
      - image: [...]
        volumeMounts:
        - name: tracefromstartup
          mountPath: /config/configDropins/overrides/tracefromstartup.xml
          subPath: tracefromstartup.xml
      volumes:
      - name: tracefromstartup
        configMap:
          name: tracefromstartup
- Save and quit the editor. Ensure that the change succeeded by verifying the output ends with "edited"; for example:
deployment.apps/libertysample edited
- After a short time, Kubernetes should delete the old pods and create new pods that start with the new trace enabled.
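To check the last step rather than wait and guess, the rollout can be watched and the mounted file listed. This is a sketch only; verify_trace_rollout is a hypothetical function name, and it assumes the deployment name is the one edited above and that the Liberty container is the default container of the pod.

```shell
#!/bin/sh
# Hypothetical helper: wait for the Deployment rollout to finish, then confirm
# that the trace override file is present in one of its pods.
# $1 = deployment name
verify_trace_rollout() {
  deployment=$1
  # Blocks until the new pods are rolled out (or the rollout fails)
  kubectl rollout status "deployment/${deployment}" &&
    # Lists the file mounted from the ConfigMap in a pod of the deployment
    kubectl exec "deployment/${deployment}" -- ls -l /config/configDropins/overrides/tracefromstartup.xml
}

# Example usage (the deployment name is a placeholder):
# verify_trace_rollout libertysample
```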
Change Java options at startup
The JVM_ARGS Liberty environment variable may be configured at container startup to change JVM options. Perform the following steps to add Java options at startup:
- First, check if this environment variable is already set:
- Ensure you are in the right namespace of the target application pods by replacing $NAME with your namespace:
kubectl config set-context --current --namespace=$NAME
- Find the relevant pods:
kubectl get pods
- For one of the relevant pods, replace $POD based on the output above to search its environment variables (if there are multiple containers, use -c $CONTAINER after $POD to specify the Liberty container):
kubectl exec -it $POD -- /bin/sh -c "cat /proc/[0-9]*/environ | tr '\0' '\n' | grep JVM_ARGS="
- Append your desired arguments to the value found in the above steps (if any).
- List the relevant Liberty Deployments:
kubectl get deployments
- Edit the relevant Liberty Deployment, replacing $NAME with the deployment name from the previous step:
kubectl edit deployment $NAME
- In the spec section, add or edit the JVM_ARGS entry in the env section; for example:
spec:
  template:
    spec:
      containers:
      - image: [...]
        env:
        - name: JVM_ARGS
          value: -Djavax.net.debug=all
- Save and quit the editor. Ensure that the change succeeded by verifying the output ends with "edited"; for example:
deployment.apps/libertysample edited
- After a short time, Kubernetes should delete the old pods and create new pods with the new arguments. You can verify by describing a new pod; for example:
$ kubectl describe pod libertysample--5545f8475b-hzj42
[...]
    Environment:
      JVM_ARGS: -Djavax.net.debug=all
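The step of appending new arguments to any existing JVM_ARGS value can be sketched as a small shell function. The name merge_jvm_args is hypothetical, and the kubectl set env line in the usage comment is an alternative to editing the Deployment by hand that this document does not itself prescribe.

```shell
#!/bin/sh
# Hypothetical helper: combine the JVM_ARGS value already set on the pod (if
# any) with the additional options, separated by a single space.
# $1 = existing value (may be empty), $2 = options to add
merge_jvm_args() {
  existing=$1
  extra=$2
  if [ -n "$existing" ]; then
    printf '%s %s\n' "$existing" "$extra"
  else
    printf '%s\n' "$extra"
  fi
}

# Example usage (values are placeholders):
# NEW_ARGS=$(merge_jvm_args "-Xmx512m" "-Djavax.net.debug=all")
# Possible alternative to kubectl edit, applied to all containers by default:
# kubectl set env deployment/$NAME JVM_ARGS="$NEW_ARGS"
```

Keeping the existing value first preserves whatever options the image or Deployment already relied on.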
Execute a server dump
Perform the following steps to execute and gather a Liberty server dump.
Warning: These commands will start a new process which will consume some memory (likely in the range of dozens of MB). If your container has a memory limit and it is near its limit, this may cause the container to crash. Alternatively, you may gather logs and configuration manually.
- Ensure you are in the right namespace of the target application pods by replacing $NAME with your namespace:
kubectl config set-context --current --namespace=$NAME
- Find the relevant pods:
kubectl get pods
- For each of the relevant pods, replace $POD based on the output above (if there are multiple containers, use -c $CONTAINER after $POD to specify the Liberty container):
kubectl exec -it $POD -- /bin/sh -c "/opt/*/wlp/bin/server dump --include=thread"
- The output of the command should state where the server dump is written; for example:
Dumping server defaultServer.
Server defaultServer dump complete in /opt/ibm/wlp/output/defaultServer/defaultServer.dump-24.05.15_17.11.35.zip.
- Download the file, replacing $POD with the pod name and $DUMP_FILE with the full path from the output of the previous step (without the period at the end):
kubectl cp $POD:$DUMP_FILE serverdump.zip --retries=999
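Copying the dump path by hand makes it easy to include the trailing period by mistake. As a sketch, the path can be extracted from the command output with sed; dump_path_from_output and the DUMP_FILE variable are hypothetical names, and the pattern assumes the message has the form shown in the example output above.

```shell
#!/bin/sh
# Hypothetical helper: read the output of the server dump command on stdin and
# print only the path of the dump file, without the trailing period.
dump_path_from_output() {
  sed -n 's/.*dump complete in \(.*\)\.$/\1/p'
}

# Example usage. The -t option is left out here because a terminal adds
# carriage returns to the output, which would prevent the pattern from matching.
# DUMP_FILE=$(kubectl exec $POD -- /bin/sh -c "/opt/*/wlp/bin/server dump --include=thread" | dump_path_from_output)
# kubectl cp "$POD:$DUMP_FILE" serverdump.zip --retries=999
```

A variable name other than PATH is used deliberately, because assigning to PATH would change where the shell looks for commands such as kubectl.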
Gather data on a performance, hang, or high CPU issue
Notes & Tips
- If you are on macOS or Linux (or Cygwin on Windows), you may use shell variables to simplify the above commands. For example, various commands use $POD for the target pod, so you can set the POD variable once and the shell replaces every later reference to $POD in that terminal window with the value you specified. In the following example, POD is set to liberty1-5545f8475b-zdwmg, so the final command, kubectl logs $POD, uses that value and you can copy and paste commands from the instructions above without modifying them.
$ kubectl get pods
NAME                                           READY   STATUS    RESTARTS   AGE
liberty1-5545f8475b-zdwmg                      1/1     Running   2          16d
websphereliberty-app-sample-6f698f5bcb-srn55   1/1     Running   2          16d
$ POD=liberty1-5545f8475b-zdwmg
$ kubectl logs $POD
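The same idea extends to running a command against every pod in the namespace. The following sketch assumes all pods in the current namespace are relevant, which may not hold in your environment; pod_names is a hypothetical function name.

```shell
#!/bin/sh
# Hypothetical helper: print the bare name of every pod in the current
# namespace, one per line. kubectl prints names as pod/NAME with -o name,
# so the prefix is stripped.
pod_names() {
  kubectl get pods -o name | sed 's|^pod/||'
}

# Example usage: set POD for each pod in turn and reuse a command from above.
# for POD in $(pod_names); do
#   kubectl logs --all-containers=true $POD > $POD.txt
# done
```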
Document Location
Worldwide
Document Information
Modified date:
22 May 2024
UID
ibm17152481