IBM Support

Postgres database pods in CrashLoopBackOff state

Troubleshooting


Problem

Postgres database pods are in 'CrashLoopBackOff' state, and the Postgres pod log shows:
FATAL: could not load private key file \"/controller/certificates/server.key\": key values mismatch
The Postgres operator controller manager in the 'ibm-common-services' namespace also reports key mismatch errors.

Symptom

The Postgres cluster might be reported as healthy even though its pods are failing.
The Postgres cluster name varies between the Cloud Pak operators that use the database. You can find the name of the Postgres cluster by using the following command.
Example:
$ oc get cluster -A

NAMESPACE   NAME                   AGE    INSTANCES   READY   STATUS                     PRIMARY
cp4data     wa-dwf-ibm-mt-dwf-pg   293d   3           3       Cluster in healthy state   wa-dwf-ibm-mt-dwf-pg-6
cp4data     wa-postgres            126d   3           3       Cluster in healthy state   wa-postgres-1
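
The namespace and cluster name from this listing are needed later to build the secret names. As a minimal sketch, the helper below (a hypothetical name, `list_cnp_clusters`, not part of any IBM or EDB tooling) reduces the listing to "namespace cluster_name" pairs. It assumes the column layout shown above, with NAMESPACE first and NAME second.

```shell
# Hypothetical helper: reads `oc get cluster -A` output on stdin and prints
# "namespace cluster_name" for each data row. The header row (first field
# "NAMESPACE") and blank lines are skipped.
list_cnp_clusters() {
  awk '$1 != "NAMESPACE" && NF >= 2 { print $1, $2 }'
}

# Possible usage against a live cluster (assumes `oc` is logged in):
#   oc get cluster -A | list_cnp_clusters
```

With the sample output above, this would print one line per cluster, such as `cp4data wa-postgres`.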

Users might see IBM Cloud Pak application pods that use the Postgres database in 'Error' or 'CrashLoopBackOff' state.

Cause

This is a known issue found in Cloud Native PostgreSQL (CNP) 1.10.0 and fixed in version 11, as described in the Release Notes.
The following two bug fixes in version 11 correct the problem:
  • Disable Public Key Infrastructure (PKI) initialization on Red Hat OpenShift and OLM installations, by using the provided one.
  • Use the correct public key when renewing the expired webhook TLS secret.
The latest PostgreSQL versions are listed in the Release Notes.

Environment

  • Cloud Platform: Red Hat OpenShift 4.x.x
  • Postgres EDB/PostgreSQL version: 1.10.0

Diagnosing The Problem

The operator logs show the following message. In the log entry, note the "namespace":"cp4ba" field, which identifies the namespace where the affected CNP (Cloud Native Postgres) cluster is running.
Example:
 
$ oc logs postgresql-operator-controller-manager-xxxx -n ibm-common-services

"level":"error","ts":1659944568.3490374,"logger":"controller.cluster","msg":"Reconciler error","reconciler group":"postgresql.k8s.enterprisedb.io","reconciler kind":"Cluster","name":"ibm-bts-cnpg-cp4ba-cp4ba-bts","namespace":"cp4ba","error":"cannot create Cluster auxiliary objects: generating server certificate: tls: private key does not match public key"
Postgres pods fail with:
 
FATAL: could not load private key file \"/controller/certificates/server.key\": key values mismatch
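
Both messages can be searched for mechanically once the logs are saved or piped. The following is a sketch only: the function names are hypothetical, and the second function assumes that "name" is immediately followed by "namespace" in each operator log entry, as in the sample entry above.

```shell
# Succeeds (exit status 0) if the log text on stdin contains either of the
# two key mismatch messages quoted in this document.
has_key_mismatch() {
  grep -q -F -e 'private key does not match public key' -e 'key values mismatch'
}

# Reads operator log lines on stdin and prints "namespace/name" once for each
# cluster whose reconciliation failed with the key mismatch error.
# Assumption: "name" is directly followed by "namespace" in the log entry.
mismatched_clusters() {
  grep -F 'private key does not match public key' |
    sed -n 's|.*"name":"\([^"]*\)","namespace":"\([^"]*\)".*|\2/\1|p' |
    sort -u
}

# Possible usage (the pod name suffix is elided in the original, keep it as-is):
#   oc logs postgresql-operator-controller-manager-xxxx -n ibm-common-services | mismatched_clusters
```

The output gives the namespace and cluster name to use when identifying the secrets to delete.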
You might see 3 instances ready and 'Cluster in healthy state' even though some Postgres pods do not show up.
$ oc get cluster
NAME                 AGE    INSTANCES   READY   STATUS                     PRIMARY
cp4na-o-postgresql   164d   3           3       Cluster in healthy state   cp4na-o-postgresql-3
The 'oc get pods' output shows Postgres pods that expired after 90 days.
$ oc get pods
cp4na-o-postgresql-2   0/1   CrashLoopBackOff   649   90d
cp4na-o-postgresql-3   0/1   CrashLoopBackOff   688   164d
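
In a namespace with many pods, the failing ones can be picked out of the listing with a small filter. This is a sketch with a hypothetical function name. It assumes the default single-namespace column order of `oc get pods` (NAME, READY, STATUS, RESTARTS, AGE) as shown above.

```shell
# Hypothetical helper: reads `oc get pods` output on stdin and prints the pod
# name and restart count for every pod whose STATUS column (third field) is
# CrashLoopBackOff.
crashlooping_pods() {
  awk '$3 == "CrashLoopBackOff" { print $1, $4 }'
}

# Possible usage (assumes `oc` is logged in and the project is selected):
#   oc get pods | crashlooping_pods
```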

Resolving The Problem

To resolve this issue, the expired TLS secrets must be renewed with the correct public key.
Delete only the certificate secrets that are associated with the affected cluster.
Check the namespace where the CNP operator resides in the cluster. Postgres is used by IBM Cloud Pak operators (for example, CP4Data, BTS, CP4BA). Typically, the CNP operator resides in the 'ibm-common-services' namespace, and one or more CNP clusters reside in the Cloud Pak namespaces.
The administrator needs to delete the following secrets:

<cluster_name>-server
<cluster_name>-ca
<cluster_name>-replication
For example, in the namespace where the CNP cluster resides:
$ oc delete secret <cluster_name>-server
$ oc delete secret <cluster_name>-ca
$ oc delete secret <cluster_name>-replication
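
When several clusters are affected, the three deletions can be wrapped in a small loop. The following is a sketch, not an IBM-provided script: the function names and the DRY_RUN variable are hypothetical, and it assumes the secrets live in the same namespace as the CNP cluster. By default it only prints the commands so that they can be reviewed first.

```shell
# Hypothetical helper: prints the three certificate secret names for a
# given CNP cluster name, one per line.
cnp_secret_names() {
  cluster=$1
  for suffix in server ca replication; do
    printf '%s-%s\n' "$cluster" "$suffix"
  done
}

# Hypothetical helper: arguments are the cluster name and its namespace.
# With DRY_RUN unset or set to 1, the delete commands are only printed.
# With DRY_RUN=0, the secrets are actually deleted.
delete_cnp_secrets() {
  cluster=$1
  namespace=$2
  for secret in $(cnp_secret_names "$cluster"); do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "oc delete secret $secret -n $namespace"
    else
      oc delete secret "$secret" -n "$namespace"
    fi
  done
}

# Possible usage, reviewing the commands before running them:
#   delete_cnp_secrets cp4na-o-postgresql <namespace>
#   DRY_RUN=0 delete_cnp_secrets cp4na-o-postgresql <namespace>
```

Keeping the dry run as the default is a deliberate choice here, because deleting the wrong secrets in a shared namespace could affect other workloads.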

Note: 

- EDB strongly recommends updating the operator to a current version and keeping it up to date with new releases.
- If Postgres is not updated, these steps must be performed within 90 days because of Postgres pod expiry.

Document Location

Worldwide


Product Synonym

EDB; Cloud Native Postgres; Postgres; CNP, Cloud Pak, BTS, CP4BA

Document Information

Modified date:
16 August 2022

UID

ibm16599269