Cannot create pod because of MutatingAdmissionWebhook failure

Pods cannot be created because the MutatingAdmissionWebhook admission plug-in fails to complete mutation within 13 seconds.

Symptom

You usually see this issue during an upgrade of IBM Cloud Pak foundational services in your cluster.

When this issue occurs, the replicaset-controller cannot create pods. You can verify this behavior by deleting a pod and checking whether it is re-created.

  1. Get the pod names.

     oc get pods -n <pod-namespace>
    
  2. Delete a pod in the cluster.

     oc delete pod <pod-name> -n <pod-namespace>
    
  3. Verify whether the pod is re-created.

     oc get pods -n <pod-namespace>
    

    If the issue is present, the pod is not re-created.

  4. Get the replicaset of the pod.

     oc get replicaset -n <pod-namespace>
    
  5. Check the events in the replicaset.

     oc describe replicaset <pod-replicaset> -n <pod-namespace>
    

    The following is a sample event:

     Events:
       Type     Reason        Age                 From                   Message
       ----     ------        ----                ----                   -------
       Warning  FailedCreate  37m (x20 over 47m)  replicaset-controller  Error creating: Internal error occurred: admission plug-in "MutatingAdmissionWebhook" failed to complete mutation in 13s
    
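The check in step 5 can be scripted. The following is a minimal sketch and not part of the product: `webhook_failures` is a hypothetical helper name, and the filter assumes that the event message has the same wording as the sample event above.

```shell
#!/bin/sh
# Hypothetical helper: reads `oc describe replicaset` output on stdin and
# prints only the FailedCreate events that report the
# MutatingAdmissionWebhook timeout. Returns non-zero if none are found.
webhook_failures() {
  grep 'FailedCreate' |
    grep -F 'admission plug-in "MutatingAdmissionWebhook" failed to complete mutation'
}

# Usage against a live cluster (placeholders as in the steps above):
#   oc describe replicaset <pod-replicaset> -n <pod-namespace> | webhook_failures
#
# To list FailedCreate events for the whole namespace instead:
#   oc get events -n <pod-namespace> --field-selector reason=FailedCreate
```

If the helper prints at least one line, the replicaset is affected by this issue. If it prints nothing, the pod creation failure probably has a different cause.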

Cause

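A webhook server that is registered in a MutatingWebhookConfiguration is failing or not responding, so the MutatingAdmissionWebhook admission plug-in cannot complete the mutation in time and pod creation is rejected.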

Resolving the problem

Identify the failed webhook and delete its MutatingWebhookConfiguration.

  1. Get the kube-apiserver pod name. In an OpenShift cluster, the kube-apiserver pods run in the openshift-kube-apiserver namespace.

     oc get pod -n openshift-kube-apiserver
    
  2. Check the kube-apiserver logs to identify the webhook server that failed.

     oc logs <kube-apiserver-pod> -n openshift-kube-apiserver -c <kube-apiserver-container>
    

    The following is a sample from the log:

       W0501 11:12:28.735594       1 dispatcher.go:168] Failed calling webhook, failing open iam.hooks.securityenforcement.admission.cloud.ibm.com: failed calling webhook "iam.hooks.securityenforcement.admission.cloud.ibm.com": Post https://platform-identity-management.kube-system.svc:443/identity/api/v1/users/validateandmutate?timeout=30s: context canceled
    

    The log shows that the iam.hooks.securityenforcement.admission.cloud.ibm.com webhook has failed.

  3. List the MutatingWebhookConfiguration resources in the cluster.

     oc get MutatingWebhookConfiguration
    
  4. Delete the MutatingWebhookConfiguration of the failed webhook.

     oc delete MutatingWebhookConfiguration <webhook-name>
    
  5. Check whether the foundational services webhook is causing the webhook server to fail.

     oc get pod -n ibm-common-services | grep ibm-common-service-webhook
    

    If the ibm-common-service-webhook pod shows errors, delete the pod so that it is re-created.

     oc delete pod <ibm-common-service-webhook-pod-name> -n ibm-common-services
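
Steps 2 to 4 can be partly scripted. The following is a minimal sketch under stated assumptions: `failed_webhooks` and `configs_for_webhook` are hypothetical helper names, the log filter assumes the `failed calling webhook "<name>"` wording from the sample log, and the lookup assumes that the name in the log is a webhook entry inside a MutatingWebhookConfiguration, which might differ from the name of the configuration itself.

```shell
#!/bin/sh
# Hypothetical helper: reads kube-apiserver log lines on stdin and prints
# the unique names of webhooks that failed, one per line.
failed_webhooks() {
  sed -n 's/.*failed calling webhook "\([^"]*\)".*/\1/p' | sort -u
}

# Hypothetical helper: $1 is a webhook name. Stdin holds one line per
# configuration in the form "<configuration-name><TAB><webhook names>",
# with the webhook names separated by spaces. Prints the names of the
# configurations that contain the webhook.
configs_for_webhook() {
  awk -F '\t' -v name="$1" '{
    n = split($2, hooks, " ")
    for (i = 1; i <= n; i++) if (hooks[i] == name) { print $1; break }
  }'
}

# Usage against a live cluster (placeholders as in the steps above):
#   oc logs <kube-apiserver-pod> -n openshift-kube-apiserver \
#     -c <kube-apiserver-container> | failed_webhooks
#
#   oc get MutatingWebhookConfiguration -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .webhooks[*]}{.name}{" "}{end}{"\n"}{end}' \
#     | configs_for_webhook <failed-webhook-name>
#
# Optional precaution that is not part of the original procedure: save the
# configuration before you delete it.
#   oc get MutatingWebhookConfiguration <webhook-name> -o yaml > <webhook-name>.backup.yaml
```

Both helpers only filter text, so you can try them on saved log output before you run anything against the cluster.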