Troubleshooting installation on OpenShift

Review the following troubleshooting tips if you encounter a problem while installing or upgrading API Connect on OpenShift, including as a component of IBM Cloud Pak for Integration (CP4I).

One or more pods are in CrashLoopBackOff or Error state and report a certificate error in the logs

In rare cases, cert-manager might detect a certificate in a bad state right after it has been issued, and then re-issue the certificate. If a CA certificate has been issued twice, any certificate that was signed by the previously issued CA is left stale and cannot be validated by the newly issued CA. In this scenario, one of the following messages is displayed in the log:
  • javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
  • Error: unable to verify the first certificate
  • ERROR: openssl verify failed to verify the Portal CA tls.crt, ca.crt chain signed the Portal Server tls.crt cert
    
Resolve the problem by completing the following steps:
  1. Use apicops (v10 version 0.10.57+ required) to validate the certificates in the system:
    apicops upgrade:stale-certs -n <namespace>
  2. If any certificate that is managed by cert-manager fails the validation, delete the stale certificate secret:
    oc delete secret <stale-secret> -n <namespace>

    Cert-manager automatically generates a new certificate to replace the one you deleted.

  3. Use apicops to make sure all certificates can be verified successfully:
    apicops upgrade:stale-certs -n <namespace>
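
To find which pods report one of the messages listed above, it can help to filter their logs before running the steps. The following is a minimal sketch and is not part of apicops or the product: the function name scan_cert_errors is hypothetical, and the patterns are substrings of the three messages shown in this section.

```shell
# Hypothetical helper: reads log text on stdin and prints only the lines
# that contain one of the certificate errors listed in this section.
# Returns 0 if at least one line matched, non-zero otherwise.
scan_cert_errors() {
  grep -F \
    -e 'Received fatal alert: certificate_unknown' \
    -e 'unable to verify the first certificate' \
    -e 'openssl verify failed to verify the Portal CA'
}
```

For example, pipe the output of oc logs <pod> -n <namespace> into scan_cert_errors. A match only suggests a stale certificate; confirm it with apicops upgrade:stale-certs as described in the steps.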

You see the denied: insufficient scope error during an air-gapped deployment

Problem: You encounter the denied: insufficient scope message while mirroring images during an air-gapped installation.

Reason: This error occurs when there is a problem with the entitlement key that is used to obtain the images.

Solution: Obtain a new entitlement key by completing the following steps:

  1. Log in to the IBM Container Library.
  2. In the Container software library, select Get entitlement key.
  3. After the Access your container software heading, click Copy key.
  4. Copy the key to a safe location.

Apiconnect operator crashes

Problem: During installation, the Apiconnect operator crashes with the following message:

panic: unable to build API support: unable to get Group and Resources: unable to retrieve the complete list of server APIs: packages.operators.coreos.com/v1: the server is currently unable to handle the request

goroutine 1 [running]:
github.ibm.com/velox/apiconnect-operator/operator-utils/v2/apiversions.GetAPISupport(0x0)
	operator-utils/v2/apiversions/api-versions.go:89 +0x1e5
main.main()
	ibm-apiconnect/cmd/manager/main.go:188 +0x4ee

Additional symptoms:
  • The Apiconnect operator is in CrashLoopBackOff status
  • Kube apiserver pods log the following information:
    E1122 18:02:07.853093 18 available_controller.go:437] v1.packages.operators.coreos.com failed with:
     failing or missing response from https://10.128.0.3:5443/apis/packages.operators.coreos.com/v1:
     bad status from https://10.128.0.3:5443/apis/packages.operators.coreos.com/v1: 401
  • The IP logged here belongs to the package server pod present in the openshift-operator-lifecycle-manager namespace
  • Package server pods log that the /apis/packages.operators.coreos.com/v1 API call is rejected with a 401 status:
    E1122 18:10:25.614179 1 authentication.go:53] Unable to authenticate the request due to an error: x509: certificate signed by unknown authority
    I1122 18:10:25.614224 1 httplog.go:90] verb="GET" URI="/apis/packages.operators.coreos.com/v1" latency=161.243µs resp=401 UserAgent="Go-http-client/2.0" srcIP="10.128.0.1:41370":
  • The problem is intermittent

Solution:
  • If you see the exact symptoms described, delete the package server pods in the openshift-operator-lifecycle-manager namespace.
  • The new package server pods log a 200 Success message for the same API call.
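
Before deleting any pods, you might want to confirm which package server address the kube-apiserver reports as failing. The following is a rough sketch, assuming the log text contains the "bad status from" fragment in the form shown in the symptoms; the function name packageserver_401_ips is hypothetical.

```shell
# Hypothetical helper: reads kube-apiserver log text on stdin and prints the
# unique IP addresses whose packages.operators.coreos.com/v1 endpoint
# returned HTTP 401.
packageserver_401_ips() {
  sed -n 's#.*bad status from https://\([0-9.]*\):[0-9]*/apis/packages\.operators\.coreos\.com/v1: 401.*#\1#p' | sort -u
}
```

Compare the printed addresses with the IP column of oc get pods -n openshift-operator-lifecycle-manager -o wide, and then delete the matching pods with oc delete pod <package-server-pod> -n openshift-operator-lifecycle-manager.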

Disabling the Portal web endpoint check

When you create or register a Developer Portal service, the Portal subsystem checks that the Portal web endpoint is accessible. However, the endpoint sometimes cannot be reached, for example because of the complexity of public and private networks. The following example shows the errors that you might see in the admin container logs of the portal-www pod if the endpoint cannot be reached:
An error occurred contacting the provided portal web endpoint: example.com
The provided Portal web endpoint example.com returned HTTP status code 504
In this instance, you can disable the Portal web endpoint check so that the Developer Portal service can be created successfully.
To disable the endpoint check, complete the following update:
On Kubernetes, OpenShift, and IBM® Cloud Pak for Integration
Add the following section to the Portal custom resource (CR) template:
spec:
  template:
  - containers:
    - env:
      - name: PORTAL_SKIP_WEB_ENDPOINT_VALIDATION
        value: "true"
      name: admin
    name: www
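
After the Portal pods restart with the updated CR, one way to confirm that the setting reached the admin container is to inspect its environment. The following is a minimal sketch: the function name skip_check_enabled is hypothetical, and it assumes that the env command is available inside the container.

```shell
# Hypothetical helper: reads NAME=value lines on stdin (for example, the
# output of `env` run inside the admin container) and succeeds only if the
# skip flag is present and set to true.
skip_check_enabled() {
  grep -qx 'PORTAL_SKIP_WEB_ENDPOINT_VALIDATION=true'
}
```

For example: oc exec <portal-www-pod> -c admin -n <namespace> -- env | skip_check_enabled && echo "endpoint check disabled"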