IBM Support

Stability of Cloud Pak for Data and Cloud Pak for Data System affected by Kubernetes defect

Troubleshooting


Problem

Due to the Kubernetes defect tracked at
https://github.com/kubernetes/kubernetes/issues/84931
Kubernetes may mark certain pods as Not Ready even though their readiness checks pass.
The addresses of those pods are then removed from the ready Endpoints of their Services, which may impact the application's stability.
At this point we are aware that the following OpenShift versions are affected:
3.11.154  (used in Cloud Pak for Data System)
3.11.157
3.11.161 

Symptom

Application becomes unstable:
- issues with logging in
- pages are not loading
- 50x error codes returned
- pods not in Ready state

Cause

Network instability, as well as other factors, may cause a pod's readiness state not to be updated.
As a result, some services become inaccessible, which disrupts internal traffic.

Environment

Cloud Pak for Data 2.5
Cloud Pak for Data System 2.5
OCP 3.11.154  (used in Cloud Pak for Data System)
OCP 3.11.157
OCP 3.11.161

Diagnosing The Problem

Once authenticated to OCP, run the following command from the CLI:
oc get endpoints --no-headers --all-namespaces | grep -v ":" | grep -v "<none>"
On a healthy cluster the command returns no rows. Any row that is returned identifies an affected endpoint.
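The filter in the command above can be tried offline on sample text. The following is a minimal sketch, assuming that a ready address is printed as ip:port in the ENDPOINTS column, that an endpoint with no subsets is printed as <none>, and that the column is left blank when only not-ready addresses exist. The function name filter_unready_endpoints and the sample rows are hypothetical and are not taken from a real cluster.

```shell
#!/bin/sh
# Same two grep filters as the diagnostic command, reading rows from stdin.
# Rows with ip:port (ready) or <none> (no subsets) are dropped, so only
# rows with a blank ENDPOINTS column remain.
filter_unready_endpoints() {
  grep -v ":" | grep -v "<none>"
}

# Hypothetical sample rows: NAMESPACE NAME ENDPOINTS AGE
sample='zen   usermgmt-svc   10.128.2.15:3443   12d
zen   zen-core-svc                      12d
zen   idle-svc       <none>             12d'

# Prints only the zen-core-svc row, the one with a blank ENDPOINTS column.
printf '%s\n' "$sample" | filter_unready_endpoints
```

On a live cluster the same function can be fed from oc get endpoints --no-headers --all-namespaces.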

 

Resolving The Problem

Temporary workaround:
If the diagnostic command returns any endpoints (a non-empty response), the pods behind those endpoints need to be restarted.
To do this, switch to the project of the failing endpoint:
oc project <project of failing endpoint>
and then run the following command:
for ep in $(oc get ep --no-headers | grep -v ":" | grep -v "<none>" | awk '{print $1}'); do
  ips=$(oc describe ep $ep | grep NotReadyAddresses | awk '{print $2}' | sed -e 's/,/ /g')
  echo $ep
  for ip in $ips; do
    pod=$(oc get po -o custom-columns=:metadata.name --no-headers=true --field-selector status.podIP=$ip)
    echo $pod
    oc delete po $pod --wait=false
  done
done
If possible, upgrade OCP to 3.11.188 or higher.
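Before deleting anything, it may be useful to preview which pods the workaround would restart. The following is a minimal sketch of such a dry run, not part of the official procedure. The function names not_ready_ips and list_affected_pods are hypothetical. The sketch assumes that oc describe ep prints a NotReadyAddresses: line with comma-separated IPs, or <none> when there are none.

```shell
#!/bin/sh
# Read `oc describe ep <name>` output on stdin and print the not-ready IPs,
# space-separated. Prints nothing when the value is <none>.
not_ready_ips() {
  awk '/NotReadyAddresses/ && $2 != "<none>" { gsub(",", " ", $2); print $2 }'
}

# Dry run for the current project: print the names of the pods the
# workaround would delete. Requires an authenticated oc session.
# Nothing is deleted.
list_affected_pods() {
  for ep in $(oc get ep --no-headers | grep -v ":" | grep -v "<none>" | awk '{print $1}'); do
    for ip in $(oc describe ep "$ep" | not_ready_ips); do
      oc get po -o custom-columns=:metadata.name --no-headers=true \
        --field-selector "status.podIP=$ip"
    done
  done
}
```

After selecting the project with oc project, calling list_affected_pods prints one pod name per line. If the list looks right, the workaround command can then be run to delete those pods.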

Document Location

Worldwide

Document Information

Modified date:
28 May 2020

UID

ibm16098818