Troubleshooting
Problem
Due to the following Kubernetes defect:
https://github.com/kubernetes/kubernetes/issues/84931
Kubernetes may mark certain pods as Not Ready even though their readiness checks pass.
This removes the affected Endpoints from Services, which may impact the application's stability.
At this point we are aware that the following OpenShift versions are affected:
3.11.154 (used in Cloud Pak for Data System)
3.11.157
3.11.161
Symptom
Application becomes unstable:
- issues with logging in
- pages are not loading
- 50x error codes returned
- pods not in Ready state
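To spot the last symptom quickly, the pod list can be filtered for pods that report Running but whose READY count is below the container total. The following is a minimal sketch, not part of the original procedure: the helper name `list_unready_running` is hypothetical, and it assumes the column order NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE that `oc get pods --all-namespaces --no-headers` normally prints. It also lists pods that are legitimately not ready, so treat the result only as a starting point.

```shell
#!/usr/bin/env bash
# list_unready_running (hypothetical helper): reads the output of
# `oc get pods --all-namespaces --no-headers` on stdin and prints
# "namespace/pod ready/total" for pods whose STATUS is Running but
# whose READY count is lower than the number of containers.
# Assumed column order: NAMESPACE NAME READY STATUS RESTARTS AGE.
list_unready_running() {
  awk '$4 == "Running" {
         split($3, r, "/")
         if (r[1] + 0 < r[2] + 0) print $1 "/" $2, $3
       }'
}

# Typical use against a live cluster (requires an authenticated oc session):
#   oc get pods --all-namespaces --no-headers | list_unready_running
```

The function only parses text, so it can be tried on saved command output before running it against the cluster.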
Cause
Network instability, as well as other factors, may prevent a pod's readiness state from being updated.
As a result, some services become inaccessible, which disrupts internal traffic.
Environment
Cloud Pak for Data 2.5
Cloud Pak for Data System 2.5
OCP 3.11.154 (used in Cloud Pak for Data System)
OCP 3.11.157
OCP 3.11.161
Diagnosing The Problem
After authenticating to OCP, run the following command from the CLI:
oc get endpoints --no-headers --all-namespaces | grep -v ":" | grep -v "<none>"
The command should return no rows.
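The filter in the diagnostic command drops every row that contains an `ip:port` entry (the rows with a colon) and every row whose ENDPOINTS column shows `<none>`, so any row that remains is an endpoint with no ready address listed. The sketch below wraps the same two `grep` calls in a function and adds a small report step; the function names `suspect_endpoints` and `report_suspect_endpoints` are hypothetical and not part of `oc`.

```shell
#!/usr/bin/env bash
# suspect_endpoints (hypothetical helper): reads the output of
# `oc get endpoints --no-headers --all-namespaces` on stdin and keeps only
# the rows the diagnostic filter keeps: rows without any "ip:port" entry
# and without the "<none>" placeholder.
suspect_endpoints() {
  grep -v ":" | grep -v "<none>"
}

# report_suspect_endpoints (hypothetical helper): prints the suspect rows
# and returns non-zero if any were found, so it can be used in a script.
report_suspect_endpoints() {
  local rows
  rows=$(suspect_endpoints || true)   # grep exits 1 when nothing matches
  if [ -n "$rows" ]; then
    printf 'Endpoints without ready addresses:\n%s\n' "$rows"
    return 1
  fi
  echo "No affected endpoints found."
}

# Typical use against a live cluster (requires an authenticated oc session):
#   oc get endpoints --no-headers --all-namespaces | report_suspect_endpoints
```

A non-zero return status means at least one endpoint row survived the filter and the workaround may be needed.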
Resolving The Problem
Temporary workaround:
If the command returns any endpoints (a non-empty response), the pods corresponding to those endpoints need to be restarted.
To do this, first select the project of the failing endpoint:
oc project <project of failing endpoint>
and then run the following command:
for ep in $(oc get ep --no-headers | grep -v ":" | grep -v "<none>" | awk '{print $1}'); do
  # Collect the IPs listed under NotReadyAddresses for this endpoint
  ips=$(oc describe ep "$ep" | grep NotReadyAddresses | awk '{print $2}' | sed -e 's/,/ /g')
  echo "$ep"
  for ip in $ips; do
    # Find the pod that owns this IP and delete it
    pod=$(oc get po -o custom-columns=:metadata.name --no-headers=true --field-selector status.podIP="$ip")
    echo "$pod"
    oc delete po $pod --wait=false
  done
done
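Because the workaround deletes pods, it can be useful to preview which pods it would touch first. The following is a sketch of such a dry run, assuming the same `oc` commands as the workaround; the function names `not_ready_ips` and `preview_restart` are hypothetical. Unlike the workaround, `not_ready_ips` skips a `<none>` value on the NotReadyAddresses line, and `preview_restart` only prints pod names and deletes nothing.

```shell
#!/usr/bin/env bash
# not_ready_ips (hypothetical helper): reads `oc describe ep <name>` output
# on stdin and prints the IPs from the NotReadyAddresses line, one per line.
# Empty values and the "<none>" placeholder are skipped.
not_ready_ips() {
  awk '/NotReadyAddresses/ {print $2}' | tr ',' '\n' | grep -Ev '^(<none>)?$' || true
}

# preview_restart (hypothetical helper): prints the pods in the current
# project that the workaround would delete. It performs no deletion.
preview_restart() {
  local ep ip
  for ep in $(oc get ep --no-headers | grep -v ":" | grep -v "<none>" | awk '{print $1}'); do
    echo "endpoint: $ep"
    for ip in $(oc describe ep "$ep" | not_ready_ips); do
      # Same lookup as the workaround: match the pod by its IP address
      oc get po -o custom-columns=:metadata.name --no-headers=true \
        --field-selector status.podIP="$ip"
    done
  done
}

# Typical use (requires an authenticated oc session and a selected project):
#   preview_restart
```

If the preview lists the expected pods, the workaround itself can then be run in the same project.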
If possible, upgrade OCP to 3.11.188 or later.
Document Location
Worldwide
Document Information
Modified date:
28 May 2020
UID
ibm16098818