IBM Support

Cassandra pods are down and going to CrashLoopBackOff mode

Troubleshooting


Problem

The Cassandra pods are down and repeatedly go into the CrashLoopBackOff state.

Symptom

ERROR [main] 2023-06-15 12:19:44,673 JVMStabilityInspector.java:124 - Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Encountered bad header at position 49377 of commit log /opt/ibm/cassandra/bin/../data/commitlog/CommitLog-6-1681292370141.log, with bad position but valid CRC
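The error message names the commit log file that must be removed later. As a minimal sketch, assuming you have copied the error line into a shell variable (the variable names `log_line` and `bad_log` are illustrative only), the path can be pulled out and simplified like this:

```shell
# Error line copied from the Symptom section above.
log_line='org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Encountered bad header at position 49377 of commit log /opt/ibm/cassandra/bin/../data/commitlog/CommitLog-6-1681292370141.log, with bad position but valid CRC'

# Capture the path that follows "of commit log " up to the next comma.
bad_log=$(printf '%s\n' "$log_line" | sed -n 's/.*of commit log \([^,]*\),.*/\1/p')

# Collapse the "bin/../" segment so the path is in its shortest form.
bad_log=$(printf '%s\n' "$bad_log" | sed 's#/bin/\.\./#/#')

echo "$bad_log"
```

For the sample line this yields /opt/ibm/cassandra/data/commitlog/CommitLog-6-1681292370141.log, which is the same file that is removed in the resolution steps.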

Resolving The Problem

To fix this, delete the corrupt commit log that is named in the error. Before you can do that, you must stop the pod from crashing. Edit the stateful set for Cassandra and add the line:
command: ["sh","-c","sleep 1000"]
after the line that starts with:
image: cp.icr.io/cp/noi/cassandra@sha256:b10b...

The result looks like this, for example:
image: cp.icr.io/cp/noi/cassandra@sha256:37c07e695d2cdd5b765f5829fd9f0eadc479f2ccf8ccc80fbf0890e4e262ee37
command: ["sh","-c","sleep 1000"]
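The edit above can be made with the Kubernetes command line client. The following is a sketch only: the namespace `noi` and the stateful set name `noi-cassandra` are hypothetical placeholders, not values from this document, so list the stateful sets first to find the real name. On OpenShift, `oc` accepts the same subcommands.

```shell
# Hypothetical placeholders: replace with the values from your deployment.
NAMESPACE="noi"
STATEFULSET="noi-cassandra"
# Client binary; set KUBECTL=oc on OpenShift.
KUBECTL="${KUBECTL:-kubectl}"

# List the stateful sets so you can confirm the real Cassandra stateful set name.
list_statefulsets() {
  "$KUBECTL" get statefulset -n "$NAMESPACE"
}

# Open the stateful set in your editor to add (or later remove) the sleep command.
edit_cassandra_statefulset() {
  "$KUBECTL" edit statefulset "$STATEFULSET" -n "$NAMESPACE"
}
```

Run `list_statefulsets` to check the name, then `edit_cassandra_statefulset` to make the change.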
When you save the stateful set, the pods should restart.
If they do not, restart the failing pod. Log in (rsh) to the previously failing pod and remove the bad commit log, for example:
rm -f /opt/ibm/cassandra/data/commitlog/CommitLog-6-1681292370141.log
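As a sketch of the restart and cleanup step, the helpers below delete the pod so that the stateful set recreates it, and then remove the file without opening an interactive shell. The namespace `noi` and pod name `noi-cassandra-0` are hypothetical placeholders. If the pod runs more than one container, you might also need the `-c` option to select the Cassandra container.

```shell
# Hypothetical placeholders: replace with the values from your deployment.
NAMESPACE="noi"
POD="noi-cassandra-0"
# Commit log named in the error message.
BAD_LOG="/opt/ibm/cassandra/data/commitlog/CommitLog-6-1681292370141.log"
# Client binary; set KUBECTL=oc on OpenShift.
KUBECTL="${KUBECTL:-kubectl}"

# Delete the pod; the stateful set controller recreates it with the sleep command.
restart_pod() {
  "$KUBECTL" delete pod "$POD" -n "$NAMESPACE"
}

# Remove the corrupt commit log inside the running pod.
remove_bad_commit_log() {
  "$KUBECTL" exec -n "$NAMESPACE" "$POD" -- rm -f "$BAD_LOG"
}
```

Call `restart_pod` only if the pod did not restart on its own, then call `remove_bad_commit_log` once the pod is running the sleep command.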

Then edit the stateful set again and remove the sleep command. Restart the pods again if they do not restart automatically. After this, log in to the pod once more and run:
/opt/ibm/cassandra/bin/nodetool repair --full
Note: Removing the commit log results in some loss of data.
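The final step can be sketched in the same way. The helpers below wait for the pod to report Ready after the sleep command has been removed, and then start the repair. The namespace `noi`, the pod name `noi-cassandra-0`, and the 600 second timeout are assumptions chosen for illustration.

```shell
# Hypothetical placeholders: replace with the values from your deployment.
NAMESPACE="noi"
POD="noi-cassandra-0"
# Client binary; set KUBECTL=oc on OpenShift.
KUBECTL="${KUBECTL:-kubectl}"

# Block until the pod reports the Ready condition, or give up after 10 minutes.
wait_for_pod_ready() {
  "$KUBECTL" wait --for=condition=Ready "pod/$POD" -n "$NAMESPACE" --timeout=600s
}

# Run the full repair inside the pod, using the nodetool path from this document.
run_full_repair() {
  "$KUBECTL" exec -n "$NAMESPACE" "$POD" -- /opt/ibm/cassandra/bin/nodetool repair --full
}
```

Run `wait_for_pod_ready && run_full_repair` so that the repair starts only after Cassandra has come up cleanly.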

Document Location

Worldwide


Document Information

Modified date:
22 June 2023

UID

ibm17006093