Troubleshooting Key Management Service

Troubleshoot common Key Management Service issues.

Install the Kubernetes CLI to run the troubleshooting commands. For more information, see Installing the Kubernetes CLI (kubectl).
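A quick check such as the following confirms that the CLI is installed and configured against your cluster:

kubectl version --client
kubectl get nodes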

UPGRADE FAILED error

Symptom

Upgrading the Helm chart from 3.1.1 to 3.1.2 does not work. You see the error Error: UPGRADE FAILED.

Cause

You did not specify the overrides.yaml configuration file during the Helm upgrade.

Solution

  1. Create a separate overrides.yaml configuration file and specify the new image paths for IBM® Cloud Private 3.1.2 in the file.

The following is a sample overrides.yaml file:

api:
  image:
    repository: mycluster.icp:8500/ibmcom/kms-api-amd64
    tag: <ICP_VERSION, like 3.1.2>

persistence:
  image:
    repository: mycluster.icp:8500/ibmcom/kms-persistence-amd64
    tag: <ICP_VERSION, like 3.1.2>

storage:
  image:
    repository: mycluster.icp:8500/ibmcom/kms-onboarding-amd64
    tag: <ICP_VERSION, like 3.1.2>

lifecycle:
  image:
    repository: mycluster.icp:8500/ibmcom/kms-lifecycle-amd64
    tag: <ICP_VERSION, like 3.1.2>

pep:
  image:
    repository: mycluster.icp:8500/ibmcom/kms-pep-amd64
    tag: <ICP_VERSION, like 3.1.2>

crypto:
  image:
    repository: mycluster.icp:8500/ibmcom/kms-crypto-amd64
    tag: <ICP_VERSION, like 3.1.2>

auditService:
  image:
    repository: mycluster.icp:8500/ibmcom/icp-audit-service-amd64
    tag: <ICP_VERSION, like 3.1.2>

  2. Specify the file when you run the helm upgrade command:

helm upgrade -f overrides.yaml <release_name> <chart>
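
For example, with a hypothetical release name of key-management (check the actual name with helm ls), the upgrade command might look like the following. The --tls option applies only if your Helm installation requires TLS connections to Tiller.

helm upgrade key-management key-management-3.1.2.tgz -f overrides.yaml --tls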

Key rotation does not work - shows 501 Not Implemented Error

Symptom

After you install the key-management-hsm Helm chart, key rotation does not work with Hardware Security Module (HSM). You see the error 501 Not Implemented Error.

Cause

Key rotation is supported starting with version 3.1.2.

Solution

Install the key-management-3.1.2.tgz Helm chart or upgrade the existing release to that version.
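
For example, a fresh installation of the 3.1.2 chart might look like the following sketch; the release name and the kube-system namespace are assumptions for illustration:

helm install key-management-3.1.2.tgz --name key-management --namespace kube-system --tls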

Key operations do not work - show 400 Bad Request Error

Symptom

After you upgrade the key management Helm chart, the operations for creating keys, wrapping keys, or unwrapping keys do not work with HSM. The log contains the following error: 400 Bad Request Error: "Provided API key could not be found".

Cause

The kms-api-key data that is contained in the key-management-secret was overwritten with the invalid value "default_kms_api_key".

Solution

  1. Create a new API key by following the instructions in API key management APIs.

  2. Encode the key by using base64 encoding.

  3. Overwrite the existing data in the kms-api-key section of the secret by using the management console.

  4. Restart the pod by deleting the key-management-pep pod. A command-line alternative for steps 2 - 4 is shown in the sketch after this list.
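
The following sketch is a command-line alternative for steps 2 - 4; the kube-system namespace is an assumption based on the rest of this section, and the placeholder values must be replaced with your own:

# Step 2: base64-encode the new API key
echo -n '<NEW_API_KEY>' | base64

# Step 3: overwrite the kms-api-key data in the secret with the encoded value
kubectl patch secret key-management-secret -n kube-system -p '{"data":{"kms-api-key":"<BASE64_ENCODED_API_KEY>"}}'

# Step 4: delete the pep pod so that it restarts with the updated secret
kubectl delete pod <key-management-pep-pod-name> -n kube-system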

Key operations do not work - show 500 Internal Server Error

Symptom

After you install the key-management-hsm Helm chart, you cannot create keys, or wrap or unwrap keys with HSM. You see the error 500 Internal Server Error.

Cause

The cleanup job did not complete because of a mismatch in the image repository path.

Solution

  1. Remove the key-management-hsm-cleanup batch job. The following substeps use the management console; a command-line alternative is shown in the sketch after this list.

    1. Log in to the management console.
    2. From the navigation menu, select Workloads > Jobs > Batch Jobs.
    3. Place the cursor on the key-management-hsm-cleanup batch job.
    4. Click ... > Remove to remove the batch job.
  2. Redeploy the key-management-hsm Helm chart.
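
The command-line equivalent of step 1 is a single kubectl call; the kube-system namespace is an assumption based on the rest of this section:

kubectl delete job key-management-hsm-cleanup -n kube-system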

Key operations do not work - show 503 Unavailable Experiencing delays error

Symptom

After you upgrade the key management Helm chart, the operations for creating keys, wrapping keys, or unwrapping keys do not work with HSM. The log contains the following error: 503 Service Error "Unavailable Experiencing delays. Please try again in few minutes."

Cause

The HSM that is connected to key-management-hsm-middleware is unavailable or shut down.

Solution

  1. Check the status of the HSM to determine if it is offline or if its configuration has been changed.

  2. Restore the original configuration settings to the HSM, if they were changed.

  3. Restart the HSM.
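
To confirm that the failure is between the middleware and the HSM, you can check the middleware logs before you restart; the pod name is an example, so look it up with the first command:

kubectl get pods -n kube-system | grep key-management-hsm-middleware
kubectl logs <key-management-hsm-middleware-pod-name> -n kube-system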

HSM connection does not work on all management nodes

Symptom

HSM connection works on some but not all management nodes.

Cause

The certificate and key pairs are not found on the management nodes on which HSM does not work.

Solution

  1. Install kubectl. For more information, see Installing the Kubernetes CLI (kubectl).
  2. Check the HSM secret to confirm whether the certificate and key pairs are listed for all management nodes.

    kubectl get secret hsm-secret -o yaml --namespace kube-system
    

    The information is available in the following format:

    <master-node-IP>: <BASE64_ENCODED_CERTIFICATE>
    <master-node-IP-key>: <BASE64_ENCODED_KEY>
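
    To inspect an entry, decode the base64 value and check the certificate details; this assumes that the stored value is a PEM-encoded certificate:

    echo '<BASE64_ENCODED_CERTIFICATE>' | base64 -d | openssl x509 -noout -subject -dates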
    

Cannot import root key

You can import root keys only when you use a supported HSM model. SoftHSM is not supported.

For the supported HSM models, see Configuring Key Management Service.

key-management-persistence log reports errors after Key Management Service configuration

Symptom

After you configure the Key Management Service, you see errors in the key-management-persistence log.

kubectl logs key-management-persistence-5d6974bf8c-vxxwl --namespace kube-system

The following is a sample output:

2018/11/27 14:31:13 maxprocs: Leaving GOMAXPROCS=8: CPU quota undefined
{"caller":"config.go:402","component":"config","file":"/opt/keyprotect/config//production","location":"local","msg":"config loaded from local","ts":"2018-11-27T14:31:13.891450032Z"}
{"caller":"root.go:104","commit":"5bbc1228","component":"root","semver":"2.1.0","ts":"2018-11-27T14:31:15.157576488Z"}
Creating MongoDB session with options: [mongodb:27017],  rs0
Failed to create session:  no reachable servers
Creating MongoDB session with options: [mongodb:27017],  rs0
Failed to create session:  no reachable servers
Creating MongoDB session with options: [mongodb:27017],  rs0
Failed to create session:  no reachable servers
Creating MongoDB session with options: [mongodb:27017],  rs0

Cause

Containers on the management node failed to look up other services on the master node. The routing table was not configured properly because of a configuration issue with the kube-controller.

Solution

Update the kube-controller configuration.
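
To verify that service lookup works after the change, you can resolve the mongodb service from a throwaway pod; the busybox image and the kube-system namespace for the service are assumptions for illustration:

kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup mongodb.kube-system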

Kubernetes Ingress Controller fake certificate is returned by the NGINX ingress controller

Symptom

When you call https://<proxy_ip>/, a Kubernetes Ingress Controller Fake Certificate is returned.

Cause

The Kubernetes Ingress Controller Fake Certificate is used as the default SSL certificate by the NGINX ingress controller.

Solution

You can configure --default-ssl-certificate in the daemonset nginx-ingress-controller to replace the "Kubernetes Ingress Controller Fake Certificate".

For example:

  1. Create a secret that contains an SSL certificate:
     openssl genrsa -out ing-tls.key 4096
     openssl req -new -key ing-tls.key -out ing-tls.csr -subj "/CN=TTTEEESSSTTT"
     openssl x509 -req -days 36500 -in ing-tls.csr -signkey ing-tls.key -out ing-tls.crt
     kubectl create secret tls ing-tls-secret --cert=ing-tls.crt --key=ing-tls.key -n kube-system
    
  2. Set --default-ssl-certificate in the daemonset nginx-ingress-controller. For example:
     kubectl edit ds -n kube-system nginx-ingress-controller
    
           containers:
           - args:
             - /nginx-ingress-controller
             - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
             - --configmap=$(POD_NAMESPACE)/nginx-ingress-controller
             - --annotations-prefix=ingress.kubernetes.io
             - --enable-ssl-passthrough=true
             - --publish-status-address=172.16.247.161
             - --default-ssl-certificate=$(POD_NAMESPACE)/ing-tls-secret
    
  3. Check the result. For example:
     # ps -ef | grep nginx-ingress-controller | grep default-ssl-certificate
     33       23251 23207  0 22:45 ?        00:00:00 /usr/bin/dumb-init -- /nginx-ingress-controller --default-backend-service=kube-system/default-http-backend --configmap=kube-system/nginx-ingress-controller --annotations-prefix=ingress.kubernetes.io --enable-ssl-passthrough=true --publish-status-address=172.16.247.161 --default-ssl-certificate=kube-system/ing-tls-secret
     33       23308 23251  0 22:45 ?        00:00:02 /nginx-ingress-controller --default-backend-service=kube-system/default-http-backend --configmap=kube-system/nginx-ingress-controller --annotations-prefix=ingress.kubernetes.io --enable-ssl-passthrough=true --publish-status-address=172.16.247.161 --default-ssl-certificate=kube-system/ing-tls-secret
    
     # curl -kv  https://172.16.247.161
     * About to connect() to 172.16.247.161 port 443 (#0)
     *   Trying 172.16.247.161...
     * Connected to 172.16.247.161 (172.16.247.161) port 443 (#0)
     * Initializing NSS with certpath: sql:/etc/pki/nssdb
     * skipping SSL peer certificate verification
     * SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
     * Server certificate:
     *       subject: CN=TTTEEESSSTTT
     *       start date: May 05 05:44:02 2019 GMT
     *       expire date: Apr 11 05:44:02 2119 GMT
     *       common name: TTTEEESSSTTT
     *       issuer: CN=TTTEEESSSTTT
     > GET / HTTP/1.1
     > User-Agent: curl/7.29.0
     > Host: 172.16.247.161
     > Accept: */*
     >
     < HTTP/1.1 404 Not Found
     < Date: Sun, 05 May 2019 05:49:49 GMT
     < Content-Type: text/plain; charset=utf-8
     < Content-Length: 21
     < Connection: keep-alive
     < Strict-Transport-Security: max-age=15724800; includeSubDomains
     <
     * Connection #0 to host 172.16.247.161 left intact
    

key-management-pep pod not running

Symptom

The key-management-pep pod is not running, and displays "CreateContainerConfigError".

Cause

The kms-api-key data in the key-management-secret is not valid.

Solution

  1. Check the status of the secret-watcher pod, as shown in the sketch after this list.

  2. If the pod is running, restart it.

  3. If it is not running, see the troubleshooting guide for the secret watcher service.
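
The following sketch covers steps 1 and 2; the kube-system namespace is an assumption based on the rest of this section, and the pod name is a placeholder:

kubectl get pods -n kube-system | grep secret-watcher

# If the pod is running, delete it so that Kubernetes re-creates it
kubectl delete pod <secret-watcher-pod-name> -n kube-system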