Troubleshooting Key Management Service
Troubleshoot common Key Management Service issues.
Install the Kubernetes CLI to run the troubleshooting commands. For more information, see Installing the Kubernetes CLI (kubectl).
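To confirm that kubectl is installed and can reach your cluster before you start, you can run a quick check (this assumes only a configured kubeconfig):
# Verify client and server versions, and confirm cluster connectivity
kubectl version --short
kubectl get nodes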
- UPGRADE FAILED error
- Key rotation does not work - shows 501 Not Implemented Error
- Key operations do not work - show 400 Bad Request Error
- Key operations do not work - show 500 Internal Server Error
- Key operations do not work - show 503 Unavailable Experiencing delays error
- HSM connection does not work on all management nodes
- Cannot import root key
- key-management-persistence log reports errors after Key Management Service configuration
- Kubernetes Ingress Controller fake certificate is returned by the NGINX ingress controller
- key-management-pep pod not running
UPGRADE FAILED error
Symptom
Upgrading the Helm chart from 3.1.1 to 3.1.2 fails. You see the error Error: UPGRADE FAILED.
Cause
You did not specify the overrides.yaml configuration file during the Helm upgrade.
Solution
- Create a separate overrides.yaml configuration file and specify the new image paths for IBM® Cloud Private 3.1.2 in the file. Following is a sample overrides.yaml file:
api:
  image:
    repository: mycluster.icp:8500/ibmcom/kms-api-amd64
    tag: <ICP_VERSION, like 3.1.2>
persistence:
  image:
    repository: mycluster.icp:8500/ibmcom/kms-persistence-amd64
    tag: <ICP_VERSION, like 3.1.2>
storage:
  image:
    repository: mycluster.icp:8500/ibmcom/kms-onboarding-amd64
    tag: <ICP_VERSION, like 3.1.2>
lifecycle:
  image:
    repository: mycluster.icp:8500/ibmcom/kms-lifecycle-amd64
    tag: <ICP_VERSION, like 3.1.2>
pep:
  image:
    repository: mycluster.icp:8500/ibmcom/kms-pep-amd64
    tag: <ICP_VERSION, like 3.1.2>
crypto:
  image:
    repository: mycluster.icp:8500/ibmcom/kms-crypto-amd64
    tag: <ICP_VERSION, like 3.1.2>
auditService:
  image:
    repository: mycluster.icp:8500/ibmcom/icp-audit-service-amd64
    tag: <ICP_VERSION, like 3.1.2>
- Specify the file when you run the helm upgrade command:
helm upgrade -f overrides.yaml
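A complete invocation also names the release and the chart. The following sketch assumes a release named key-management and a chart archive key-management-3.1.2.tgz in the current directory; both names are placeholders, and the --tls flag is typical for Helm against IBM Cloud Private (run helm list --tls to find your release name):
# Upgrade the existing release with the new image overrides (release and chart names are assumptions)
helm upgrade key-management key-management-3.1.2.tgz -f overrides.yaml --tls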
Key rotation does not work - shows 501 Not Implemented Error
Symptom
After you install the key-management-hsm Helm chart, key rotation does not work with the Hardware Security Module (HSM). You see the error 501 Not Implemented Error.
Cause
Key rotation is supported only in version 3.1.2 and later.
Solution
Install the key-management-3.1.2.tgz Helm chart or upgrade the existing release.
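A minimal sketch of each option, assuming a release named key-management (hypothetical) and the chart archive in the current directory; the --tls flag is typical for Helm against IBM Cloud Private:
# Fresh install of the 3.1.2 chart
helm install key-management-3.1.2.tgz --name key-management --namespace kube-system --tls
# Or upgrade an existing release in place
helm upgrade key-management key-management-3.1.2.tgz --tls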
Key operations do not work - show 400 Bad Request Error
Symptom
After you upgrade the key management Helm chart, creating, wrapping, or unwrapping keys does not work with HSM. The log contains the following error: 400 Bad Request Error: "Provided API key could not be found".
Cause
The kms-api-key data that is contained in the key-management-secret was overwritten with the invalid value "default_kms_api_key".
Solution
- Create a new api-key by following the instructions in API key management APIs.
- Encode the key with base64. See the sketch after this list.
- Overwrite the existing data in the kms-api-key section of the secret by using the management console.
- Restart the pod by removing the key-management-pep pod.
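A minimal sketch of the encoding and restart steps, assuming that the secret and pod live in the kube-system namespace and that the pod carries an app=key-management-pep label (both are assumptions; check your deployment):
# Base64-encode the new API key before pasting it into the secret
echo -n "<NEW_API_KEY>" | base64
# Delete the pod; its deployment recreates it with the updated secret
kubectl delete pod --namespace kube-system -l app=key-management-pep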
Key operations do not work - show 500 Internal Server Error
Symptom
After you install the key-management-hsm Helm chart, you cannot create keys, or wrap or unwrap keys with HSM. You see the error 500 Internal Server Error.
Cause
The cleanup job did not complete because of a mismatch in the image repository path.
Solution
- Remove the key-management-hsm-cleanup batch job. See the sketch after this list for a command-line alternative.
  - Log in to the management console.
  - From the navigation menu, select Workloads > Jobs > Batch Jobs.
  - Place the cursor on the key-management-hsm-cleanup batch job.
  - Click ... > Remove to remove the batch job.
- Redeploy the key-management-hsm Helm chart.
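If you prefer the command line, the following sketch removes the batch job directly, assuming it runs in the kube-system namespace (adjust the namespace if your release was installed elsewhere):
# Delete the stuck cleanup job before redeploying the chart
kubectl delete job key-management-hsm-cleanup --namespace kube-system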
Key operations do not work - show 503 Unavailable Experiencing delays error
Symptom
After you upgrade the key management Helm chart, creating, wrapping, or unwrapping keys does not work with HSM. The log contains the following error: 503 Service Error "Unavailable Experiencing delays. Please try again in few minutes."
Cause
The HSM that is connected to key-management-hsm-middleware is unavailable or shut down.
Solution
- Check the status of the HSM to determine whether it is offline or its configuration has been changed.
- Restore the original configuration settings to the HSM, if they were changed.
- Restart the HSM.
HSM connection does not work on all management nodes
Symptom
HSM connection works on some but not all management nodes.
Cause
The certificate and key pairs are not found on the management nodes on which HSM does not work.
Solution
- Install kubectl. For more information, see Installing the Kubernetes CLI (kubectl).
- Check the HSM secret to confirm whether the certificate and key pairs are listed for all management nodes.
kubectl get secret hsm-secret -o yaml --namespace kube-system
The information is available in the following format:
<master-node-IP>: <BASE64_ENCODED_CERTIFICATE>
<master-node-IP-key>: <BASE64_ENCODED_KEY>
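To inspect a single entry, you can decode it, as in the following sketch; <master-node-IP> and the encoded value are placeholders for what your secret actually contains:
# Show the entry for one management node
kubectl get secret hsm-secret --namespace kube-system -o yaml | grep "<master-node-IP>"
# Decode the value to inspect the certificate
echo "<BASE64_ENCODED_CERTIFICATE>" | base64 --decode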
Cannot import root key
You can import root keys only when you use a supported HSM model. SoftHSM is not supported.
For the supported HSM models, see Configuring Key Management Service.
key-management-persistence log reports errors after Key Management Service configuration
Symptom
After you configure the Key Management Service, you see errors in the key-management-persistence log.
kubectl logs key-management-persistence-5d6974bf8c-vxxwl --namespace kube-system
Following is a sample output:
2018/11/27 14:31:13 maxprocs: Leaving GOMAXPROCS=8: CPU quota undefined
{"caller":"config.go:402","component":"config","file":"/opt/keyprotect/config//production","location":"local","msg":"config loaded from local","ts":"2018-11-27T14:31:13.891450032Z"}
{"caller":"root.go:104","commit":"5bbc1228","component":"root","semver":"2.1.0","ts":"2018-11-27T14:31:15.157576488Z"}
Creating MongoDB session with options: [mongodb:27017], rs0
Failed to create session: no reachable servers
Creating MongoDB session with options: [mongodb:27017], rs0
Failed to create session: no reachable servers
Creating MongoDB session with options: [mongodb:27017], rs0
Failed to create session: no reachable servers
Creating MongoDB session with options: [mongodb:27017], rs0
Cause
Containers on the management node failed to look up other services on the master node. The routing table was not configured properly because of a configuration issue with the kube-controller.
Solution
Update the kube-controller configuration.
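Because the log shows the persistence service repeatedly failing to reach MongoDB at mongodb:27017, a quick first check is whether the MongoDB service and pods are visible from the cluster. This sketch assumes they run in kube-system under names that contain mongodb (an assumption based on the address in the log):
# Confirm that the MongoDB service and pods exist and are running
kubectl get svc --namespace kube-system | grep mongodb
kubectl get pods --namespace kube-system | grep mongodb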
Kubernetes Ingress Controller fake certificate is returned by the NGINX ingress controller
Symptom
When you call https://proxy_ip/, a Kubernetes Ingress Controller Fake Certificate is returned.
Cause
The Kubernetes Ingress Controller Fake Certificate is used as the default SSL certificate in the NGINX ingress controller.
Solution
You can configure --default-ssl-certificate in the nginx-ingress-controller daemonset to replace the "Kubernetes Ingress Controller Fake Certificate".
For example:
- Create a secret that contains an SSL certificate:
openssl genrsa -out ing-tls.key 4096
openssl req -new -key ing-tls.key -out ing-tls.csr -subj "/CN=TTTEEESSSTTT"
openssl x509 -req -days 36500 -in ing-tls.csr -signkey ing-tls.key -out ing-tls.crt
kubectl create secret tls ing-tls-secret --cert=ing-tls.crt --key=ing-tls.key -n kube-system
- Set --default-ssl-certificate in the nginx-ingress-controller daemonset. For example:
kubectl edit ds -n kube-system nginx-ingress-controller
containers:
- args:
  - /nginx-ingress-controller
  - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
  - --configmap=$(POD_NAMESPACE)/nginx-ingress-controller
  - --annotations-prefix=ingress.kubernetes.io
  - --enable-ssl-passthrough=true
  - --publish-status-address=172.16.247.161
  - --default-ssl-certificate=$(POD_NAMESPACE)/ing-tls-secret
- Check the result. For example:
# ps -ef | grep nginx-ingress-controller | grep default-ssl-certificate
33 23251 23207 0 22:45 ? 00:00:00 /usr/bin/dumb-init -- /nginx-ingress-controller --default-backend-service=kube-system/default-http-backend --configmap=kube-system/nginx-ingress-controller --annotations-prefix=ingress.kubernetes.io --enable-ssl-passthrough=true --publish-status-address=172.16.247.161 --default-ssl-certificate=kube-system/ing-tls-secret
33 23308 23251 0 22:45 ? 00:00:02 /nginx-ingress-controller --default-backend-service=kube-system/default-http-backend --configmap=kube-system/nginx-ingress-controller --annotations-prefix=ingress.kubernetes.io --enable-ssl-passthrough=true --publish-status-address=172.16.247.161 --default-ssl-certificate=kube-system/ing-tls-secret

# curl -kv https://172.16.247.161
* About to connect() to 172.16.247.161 port 443 (#0)
*   Trying 172.16.247.161...
* Connected to 172.16.247.161 (172.16.247.161) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
*   subject: CN=TTTEEESSSTTT
*   start date: May 05 05:44:02 2019 GMT
*   expire date: Apr 11 05:44:02 2119 GMT
*   common name: TTTEEESSSTTT
*   issuer: CN=TTTEEESSSTTT
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 172.16.247.161
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Date: Sun, 05 May 2019 05:49:49 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 21
< Connection: keep-alive
< Strict-Transport-Security: max-age=15724800; includeSubDomains
<
* Connection #0 to host 172.16.247.161 left intact
key-management-pep pod not running
Symptom
The key-management-pep pod is not running, and displays "CreateContainerConfigError".
Cause
The kms-api-key data in the key-management-secret is not valid.
Solution
- Check the status of the secret-watcher pod. See the sketch after this list.
- If the pod is running, restart it.
- If it is not running, see the troubleshooting guide for the secret watcher service.
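A minimal sketch of the check and restart, assuming the secret-watcher pod runs in the kube-system namespace and its name contains secret-watcher (both are assumptions; match them to your cluster):
# Find the secret-watcher pod and check its status
kubectl get pods --namespace kube-system | grep secret-watcher
# If the pod is running but stale, delete it so that its controller recreates it
kubectl delete pod <secret-watcher-pod-name> --namespace kube-system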