Known limitations
The following are possible issues you may encounter when using IBM Cloud Pak® for Integration (with applicable solutions):
Cannot publish an unenforced API to any catalog in the UI (event endpoints only)
Error validating CRs against a new CRD schema in OLM when upgrading DataPower
Unable to deploy API Connect using the Platform UI with Portal Site Name set
Kubernetes garbage collection might cause pods to go into a CrashLoopBackOff state
OpenShift version support warning remains after upgrading Automation Assets
Upgrading Automation Assets to version 2022.2 can result in an outage during upgrade
Automation Assets may show as CrashLoopBackOff for a period of time before successfully deploying
Event Streams shows the Kafka cluster is not ready during upgrade to 2022.2
Unable to delete a namespace due to the OperandRequest remaining
Installations on IBM Power or Z platforms can fail to complete due to Zen storage requirements
Platform UI continuously changes from Ready back to Pending even when the UI is accessible
Errors when installing API Connect
For a list of possible errors and solutions when installing API Connect, see Troubleshooting installation and upgrade on OpenShift in the API Connect documentation.
Issues when using the Platform UI
Symptom: You experience issues when you use the Platform UI; for example, you see an Application not available message even though the associated pods are running.
Solution: Try restarting the pods.
Find the pods that are in the same namespace as your Platform UI instance. You can view a list of pods in a namespace by running the following command:
oc get pods -n <namespace>
Use the oc delete command to delete the pods with the following names, where the asterisk (*) is a wildcard:
ibm-nginx-*
usermgmt-*
*-ibm-integration-platform-navigator-*
Wait for the pods to be re-created.
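The restart steps above can be scripted. The following is a minimal sketch that assumes you are logged in with the oc CLI; the NAMESPACE value is a placeholder for the namespace of your Platform UI instance, and the name patterns come from the list above:

```shell
#!/bin/sh
# Sketch only: restart the Platform UI pods named in this topic.
# NAMESPACE is a placeholder; set it to your Platform UI namespace.
NAMESPACE="${NAMESPACE:-integration}"
# Regular-expression form of the pod name patterns listed above
SELECTOR='^(ibm-nginx-|usermgmt-|.*-ibm-integration-platform-navigator-)'
if command -v oc >/dev/null 2>&1; then
  # List pod names, keep the matching ones, and delete them
  oc get pods -n "$NAMESPACE" --no-headers -o custom-columns=NAME:.metadata.name \
    | grep -E "$SELECTOR" \
    | xargs -r oc delete pod -n "$NAMESPACE"
else
  echo "oc CLI not found; commands shown for reference only"
fi
```

The deleted pods are re-created automatically by their controllers.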
If the problem remains, check whether another known limitation in this topic could be the cause.
OLM causes Operator failures on OCP 4.10.38 and later
Symptom: Users may observe operators in an unknown state, with a log error stating that a dependency is not satisfied, and the catalog-operator-* pod in the openshift-operator-lifecycle-manager namespace has multiple restarts.
Cause: Operator Lifecycle Manager (OLM) in OpenShift 4.10.38 is limited to processing a single operator request at a time.
Solution: Go to the openshift-operator-lifecycle-manager namespace and restart the catalog-operator-* pod, then the olm-operator-* pod. Alternatively, delete all of the pods in the openshift-operator-lifecycle-manager namespace, which triggers them to be re-created.
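A sketch of this restart with the oc CLI follows. The app labels on the OLM pods are an assumption based on the standard OLM deployments; deleting the pods by name works equally well:

```shell
#!/bin/sh
# Sketch only: restart the OLM pods as described above.
# Assumes you are logged in with the oc CLI as a cluster administrator.
NS="openshift-operator-lifecycle-manager"
if command -v oc >/dev/null 2>&1; then
  # Restart catalog-operator first, then olm-operator
  # (app labels assumed from the standard OLM deployments)
  oc delete pod -n "$NS" -l app=catalog-operator
  oc delete pod -n "$NS" -l app=olm-operator
  # Alternatively, delete every pod in the namespace:
  # oc delete pod --all -n "$NS"
else
  echo "oc CLI not found; commands shown for reference only"
fi
```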
Unable to access Cloud Pak for Integration user interfaces
Symptom: When users try to access the Cloud Pak for Integration UI routes, they get the message Application is not available.
Cause: Network traffic has not been allowed between the deployed instance of Platform Navigator and the Ingress Controller, as required. For more information about this policy, see https://docs.openshift.com/container-platform/4.6/networking/network_policy/multitenant-network-policy.html
Solution:
Using the CLI
Log in to the Red Hat OpenShift cluster CLI as a Cluster Administrator.
Confirm the endpointPublishingStrategy type of the IngressController:
oc get --namespace openshift-ingress-operator ingresscontrollers/default --output jsonpath='{.status.endpointPublishingStrategy.type}'; echo
If the type value from the previous step is HostNetwork, ingress traffic must be enabled through the default namespace. Add the following label to the default namespace:
oc label namespace default 'network.openshift.io/policy-group=ingress'
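The check and the label can be combined in one small script. This is a sketch that only applies the label when the HostNetwork strategy is in use; it assumes you are logged in with the oc CLI:

```shell
#!/bin/sh
# Sketch only: label the default namespace when the IngressController
# uses the HostNetwork endpoint publishing strategy.
LABEL='network.openshift.io/policy-group=ingress'
if command -v oc >/dev/null 2>&1; then
  TYPE=$(oc get --namespace openshift-ingress-operator ingresscontrollers/default \
    --output jsonpath='{.status.endpointPublishingStrategy.type}')
  if [ "$TYPE" = "HostNetwork" ]; then
    oc label namespace default "$LABEL" --overwrite
  fi
else
  echo "oc CLI not found; commands shown for reference only"
fi
```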
Using the Red Hat OpenShift web console
Log in to the Red Hat OpenShift web console for the cluster as a Cluster Administrator.
In the navigation pane, click Home > Search. Click to expand Project, select the openshift-ingress-operator namespace, then search for the resource IngressController.
To confirm the value of spec.endpointPublishingStrategy, click to open the default IngressController and view the YAML.
If the value of spec.endpointPublishingStrategy.type is HostNetwork, ingress traffic must be enabled through the default namespace. In the left navigation pane, click Home > Search. Search for the resource Namespace, select the default namespace, and click Edit Labels.
Add the label network.openshift.io/policy-group=ingress, then click Save.
Expired leaf certificates not automatically refreshed
Symptom: User gets an Application Unavailable message when attempting to access the Platform UI or other capabilities in Cloud Pak for Integration.
In addition, the management-ingress pod logs show an error:
Error: exit status 1
2021/01/23 16:56:00 [emerg] 44#44: invalid number of arguments in "ssl_certificate" directive in /tmp/nginx-cfg123456789:123
nginx: [emerg] invalid number of arguments in "ssl_certificate" directive in /tmp/nginx-cfg123456789:123
nginx: configuration file /tmp/nginx-cfg123456789 test failed
Cause: Self-signed CA certificate refresh does not automatically refresh leaf certificates, resulting in unavailable services.
Solution: For information on how to refresh these certificates, see Replacing default keys and certificates.
Cannot publish an unenforced API to any catalog in the UI (event endpoints only)
Symptoms: If you have Event Endpoint Management with only event endpoints enabled, you cannot publish an unenforced API to any catalog in the UI.
After creating an AsyncAPI document to describe an event source and selecting not to enforce access control to your API through an Event Gateway Service, there are no catalogs to select when publishing the draft API by clicking the Menu icon for more options and clicking Publish.
Causes: When publishing a product to a catalog, the UI provides a list of catalogs and spaces to choose from. The list is filtered to only include catalogs and spaces that have registered gateways, which support enforced APIs in the product.
When an unenforced draft API with no x-ibm-configuration.gateway value is published, the draft product that is created for it has a default gateway type of datapower-api-gateway. This gateway type does not match any catalog in Event Endpoint Management, because only the gateway type event-gateway is available when only event endpoints are enabled.
Solution: Edit the newly created product to manually set the gatewayType as follows:
In the API Manager UI, click the Develop button in the navigation bar.
Click Products to list the draft products.
Click the product created when publishing the API to open the product editor.
In the product editor, click Source to show the product document source.
Replace datapower-api-gateway with event-gateway in the gateways list.
Click Save.
The product can now be published to a catalog with a registered Event Gateway.
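In the product source, the edit from the steps above amounts to the following change. This fragment is only an illustration; the surrounding fields and layout in your draft product document will differ:

```yaml
# Before: gateway type generated for the unenforced draft product
# gateways:
#   - datapower-api-gateway
# After: the gateway type that Event Endpoint Management catalogs accept
gateways:
  - event-gateway
```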
Clients fail to connect to the Event Gateway Service
Symptoms: After signing up to use an enforced Kafka AsyncAPI in the Developer Portal, an application developer sees their client application failing to connect to the Event Gateway Service. The connection is closed by the gateway service after a 30 second delay.
The logs for the Event Gateway Service provide the following messages at the time the client application attempts to connect, and the client then reports the following error:
...
INFO Events Gateway - [start:196] Kafka Server starting.
INFO com.ibm.eem.Kafka Server - [startGatewayVerticle:65] Broker port: <BOOTSTRAP SERVER PORT> Broker hostname: <BOOTSTRAP SERVER>
...
ERROR com.ibm.eem.core.Request - [abortAndCloseConnection:187] Timed out after waiting 30000(ms) for a reply. address: __vertx.reply.<ID>, repliedAddress: ConnectionManager
...
Causes: The Event Gateway Service is attempting to connect to the Kafka cluster using an incorrect SSL configuration. For example, it is trying to connect to Kafka brokers without using TLS when TLS is required.
Solution: In the AsyncAPI editor, review and correct the Gateway > Invoke settings for this API to match the TLS settings of your Kafka cluster.
If the cluster is using TLS:
Set Security Protocol to SASL_SSL or SSL, depending on whether you are using SASL authentication and authorization.
Provide the Transport CA certificate for the Event Gateway Service to use to establish trust with the Kafka cluster. This is only required if the cluster is not using a certificate issued by a well-known authority.
If the Kafka cluster does not require TLS:
Set Security Protocol to PLAINTEXT or SASL_PLAINTEXT, depending on whether you are using SASL authentication and authorization.
After updating the API to have the correct settings, save the API, and republish the Products this API is included in. This will update the configuration used by the Event Gateway Service, and allow clients to connect successfully.
Error validating CRs against a new CRD schema in OLM when upgrading DataPower
Symptom: An error occurs when you attempt to uninstall and reinstall the DataPower Operator on OpenShift 4.8 or higher. When OLM is validating existing CRs against a new CRD schema, the conversion webhook is not found.
Solution: Reinstall the DataPower Operator using the following steps:
Uninstall the failed DataPower Operator.
Edit the DataPowerService CRD in the cluster:
Remove the spec.conversion.webhook spec.
Set spec.conversion.strategy to None.
Apply changes to the DataPowerService CRD.
Reinstall the DataPower Operator.
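After the edit, the conversion stanza of the DataPowerService CRD should look like the following fragment. These are the standard apiextensions.k8s.io/v1 CRD fields; the rest of the CRD is unchanged:

```yaml
spec:
  conversion:
    # webhook stanza removed; fall back to no conversion
    strategy: None
```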
Unable to deploy API Connect using the Platform UI with Portal Site Name set
Symptom: If you attempt to deploy API Connect using the Platform UI with a Portal Site Name set, the create button is disabled.
Solution: To deploy API Connect with a Portal Site Name set, create your API Connect CR using the OpenShift Console or the OpenShift command-line interface (oc CLI).
Kubernetes garbage collection might cause pods to go into a CrashLoopBackOff state
Symptom: When the IBM Platform Navigator Operator is upgraded from 2020.4 to 2022.2, the pod enters a CrashLoopBackOff state.
Cause: The Kubernetes garbage collection might not remove the package lock, which can cause the pod to enter a CrashLoopBackOff state.
Solution: Delete the configMap associated with the operator. This releases the lock and allows the upgrade to proceed.
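A sketch of that deletion with the oc CLI follows. The lock ConfigMap name pattern is an assumption; check the operator logs for the exact name before deleting, and set NS to the namespace where the operator runs:

```shell
#!/bin/sh
# Sketch only: delete the operator's leader-election lock ConfigMap.
# The name pattern is an assumption; verify it against your cluster first.
NS="${NS:-openshift-operators}"   # placeholder: the operator's namespace
PATTERN='navigator.*lock'
if command -v oc >/dev/null 2>&1; then
  oc get configmaps -n "$NS" --no-headers -o custom-columns=NAME:.metadata.name \
    | grep -E "$PATTERN" \
    | xargs -r oc delete configmap -n "$NS"
else
  echo "oc CLI not found; commands shown for reference only"
fi
```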
The Automation Assets UI can fail to load after a new installation or an upgrade from 2020.4 to 2022.2
Symptom: After the new install or upgrade, the Automation Assets user interface is inaccessible. The page renders only the text of a version and platform (for example, 3.5.0.0 (20220429_2020) x86_64).
Cause: The Automation Assets user interface may be inaccessible if the Zen component does not process the required configMaps.
Solution: Run the following commands to restart the zen-watcher pod associated with this deployment. This triggers the configMap to be loaded.
oc get pods -n <namespace>
oc delete pod <zen-watcher-pod-name> -n <namespace>
OpenShift version support warning remains after upgrading Automation Assets
Symptom: After Automation Assets has been upgraded from version 2020.4 (with OpenShift versions prior to 4.10) to version 2022.2 (with OpenShift version 4.10), the following warning remains in the status after the custom resource has reconciled:
An EUS instance (2020.4.1-6-eus) has been installed but the OCP version (4.7.45) does not qualify for the extended support duration. This instance is supported with a regular CD release.
Upgrading Automation Assets to version 2022.2 can result in an outage during upgrade
Symptom: Upgrading Automation Assets to version 2022.2 from a prior version can result in an outage of the Automation Assets for a period of time during the upgrade.
Automation Assets may show as CrashLoopBackOff for a period of time before successfully deploying
Symptom: When installing Automation Assets, the pod can enter the CrashLoopBackOff status during the installation process. After some time this can complete successfully. The following error is shown in the logs during this time:
Error starting server Error: connect ECONNREFUSED
...
statusText: 'ECONNREFUSED',
body: 'Response not received - no connection was made to the service.'
During upgrade, the new API pod for Automation Assets running in single replica mode gets stuck in a Creating state
Symptom: Attempting to upgrade an Automation Assets instance that is running in single replica mode leaves the new Automation Assets API pod in a Creating state. The pod events show messages similar to:
Generated from attachdetach-controller
Multi-Attach error for volume "[PVC-NAME]" Volume is already used by pod(s) [POD-NAME]
Cause: The Automation Assets API Deployment uses the RollingUpdate strategy, but the new pod cannot mount the existing PVC because the original pod still has it mounted.
Solution: Locate the Automation Assets API Deployment (which is called <INSTANCE-NAME>-ibm-integration-asset-repository-api), click the YAML tab, and find the strategy section:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 40%
    maxSurge: 100%
Replace it with:
strategy:
  type: Recreate
Save your changes.
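The same edit can be made from the command line with oc patch. Note that the rollingUpdate block must be removed in the same operation, because the Kubernetes API rejects a Deployment that sets type: Recreate while rollingUpdate parameters are still present. The instance name and namespace are placeholders:

```shell
#!/bin/sh
# Sketch only: switch the Automation Assets API Deployment to the
# Recreate strategy. <INSTANCE-NAME> and NAMESPACE are placeholders.
DEPLOY="<INSTANCE-NAME>-ibm-integration-asset-repository-api"
NAMESPACE="${NAMESPACE:-integration}"
# Remove rollingUpdate and change the type in one JSON patch
PATCH='[{"op":"remove","path":"/spec/strategy/rollingUpdate"},{"op":"replace","path":"/spec/strategy/type","value":"Recreate"}]'
if command -v oc >/dev/null 2>&1; then
  oc patch deployment "$DEPLOY" -n "$NAMESPACE" --type=json -p "$PATCH"
else
  echo "oc CLI not found; command shown for reference only"
fi
```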
Event Streams shows the Kafka cluster is not ready during upgrade to 2022.2
Symptom: When upgrading Event Streams from operator version 2021.4 to 2022.2 (instance version 10.5.0 to 11.0.2), the Kafka cluster shows the error Kafka cluster is not ready.
Cause: The Kafka cluster can show the error if the connection to the Zookeeper component drops.
Solution: Edit the custom resource for the Event Streams instance, updating the instance spec.version field to 11.0.2.
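The spec.version update can also be applied with a merge patch instead of editing the custom resource by hand. The instance name and namespace below are placeholders:

```shell
#!/bin/sh
# Sketch only: set spec.version on the Event Streams instance.
NAME="<instance-name>"       # placeholder: your Event Streams instance
NAMESPACE="<namespace>"      # placeholder
PATCH='{"spec":{"version":"11.0.2"}}'
if command -v oc >/dev/null 2>&1; then
  oc patch eventstreams "$NAME" -n "$NAMESPACE" --type=merge -p "$PATCH"
else
  echo "oc CLI not found; command shown for reference only"
fi
```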
Unable to delete a namespace due to the OperandRequest remaining
Symptom: An attempt to delete a namespace fails. The logs for the ODLM show "No permission to update OperandRequest".
Cause: The OperandRequest for the operand-deployment-lifecycle-manager (ODLM) might not be deleted, causing the namespace to remain in the 'Terminating' state.
Solution: Remove all finalizers from the OperandRequest by deleting the finalizers: field and any finalizer entries it contains, such as finalizer.request.ibm.com in the following snippet:
apiVersion: operator.ibm.com/v1alpha1
kind: OperandRequest
metadata:
  finalizers:
    - finalizer.request.ibm.com
  generation: 1
  labels:
    ibm-common-services.common-service/config: "true"
    ibm-common-services.common-service/registry: "true"
  name: common-service
  namespace: cp1
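Instead of editing the object, the finalizers can be cleared with a single merge patch. The resource name and namespace come from the snippet above:

```shell
#!/bin/sh
# Sketch only: clear all finalizers on the OperandRequest shown above
# so that namespace deletion can complete. Assumes oc is logged in.
PATCH='{"metadata":{"finalizers":null}}'
if command -v oc >/dev/null 2>&1; then
  oc patch operandrequest common-service -n cp1 --type=merge -p "$PATCH"
else
  echo "oc CLI not found; command shown for reference only"
fi
```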
Installations on IBM Power or Z platforms can fail to complete due to Zen storage requirements
Cause: When installing the Platform UI on IBM Power or Z platforms, the Zen component requires ReadWriteMany (RWX) storage in addition to the default ReadWriteOnce (RWO) storage.
Solution: Follow the steps in https://www.ibm.com/docs/en/cloud-paks/cp-integration/2022.2?topic=ui-deploying-platform-rwo-storage in the documentation for Cloud Pak for Integration 2022.2.
Installation may result in multiple iam-config-job pods
Symptom: When Cloud Pak for Integration 2022.2 is installed, the Cloud Pak foundational services deploy an iam-config-job pod, which may enter an error state with the following message in the pod logs:
error: unable to upgrade connection: container not found ("usermgmt-container")
Could not copy ca.crt to pod
This results in the pod being recreated, though the previous pod remains and has to be manually deleted.
Platform UI continuously changes from Ready back to Pending even when the UI is accessible
Symptom: When you install Platform UI 2022.2.1, you might notice that the status of the resource keeps reverting to Pending even when the resource is accessible, or hours after creation completed successfully.
Cause: This can happen because of a known limitation in which the ZenService object does not stop reconciling.
Solution: Apply the annotation "integration.ibm.com/reconcile-zen-service": "false" to the Platform UI YAML, for example:
metadata:
  name: integration-quickstart
  namespace: integration
  annotations:
    "integration.ibm.com/reconcile-zen-service": "false"
With this annotation, the Cloud Pak operator no longer reconciles the Zen Service object. Note that some features, such as replica and TLS control, are also disabled.
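The annotation can also be applied from the command line. The PlatformNavigator resource name and namespace below come from the example above:

```shell
#!/bin/sh
# Sketch only: apply the reconcile-zen-service annotation with oc annotate.
ANNOTATION='integration.ibm.com/reconcile-zen-service=false'
if command -v oc >/dev/null 2>&1; then
  oc annotate platformnavigator integration-quickstart -n integration \
    "$ANNOTATION" --overwrite
else
  echo "oc CLI not found; command shown for reference only"
fi
```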
Upgrade of an operator fails
Symptom: Upgrade of an operator fails with this error:
Bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline
Cause: Uncertain, but the cause may be that the first time the operator bundle was pulled from the openshift-marketplace namespace, the extract job failed (probably due to an issue accessing the remote image, or a similar issue) and corrupted the ConfigMap. The operator manifest was most likely also corrupted. Once that happens, any attempt to use the same job and ConfigMap to install another instance of the operator, for example in another namespace, will fail.
Solution:
Find the corresponding job and ConfigMap (usually with the same name) in the openshift-marketplace namespace, and grep for the operator name or keyword in its contents:
oc get job -n openshift-marketplace -o json | jq -r '.items[] | select(.spec.template.spec.containers[].env[].value|contains ("<operator_name_keyword>")) | .metadata.name'
Delete the job and corresponding ConfigMap (which has the same name as the job) found in the previous step:
oc delete job <job_string> -n openshift-marketplace
oc delete configmap <configmap_string> -n openshift-marketplace
Try installing the operator again. If the installation is successful, the process is complete and you can end here. If you are still unable to install the operator, continue with the next step.
Uninstall the failed operator installation using the procedure in Uninstalling the operators and catalog sources.
Delete the install plan, subscription, and CSV that are in the same namespace as the operator:
oc delete ip <operator_installplan_name> -n <user_namespace>
oc delete sub <operator_subscription_name> -n <user_namespace>
oc delete csv <operator_csv_name> -n <user_namespace>
Retry installing the operator. The installation should complete successfully. If not, collect a new must-gather by using the script in Troubleshooting and note the operator InstallPlan error messages for IBM support.
A docker run command returns a permission denied error
Symptom: When running a docker run command, you get the following error:
docker run "/kube/config": open /kube/config: permission denied
Cause: Read-write permissions are needed for KUBECONFIG (~/.kube/config).
Solution: Run the following to give the user read-write permissions to the file:
chmod +rw ~/.kube/config
User is unable to generate an upgrade plan by using the CLI
Symptom: When following the instructions for "Generating an upgrade plan using the CLI" in Upgrading from 2020.4 or Upgrading from 2021.4 for an online (connected) installation, you are unable to generate the upgrade plan.
Cause: You may not have the correct configuration or permissions for Docker or Podman, or there is an error in your KUBECONFIG configuration.
Solution:
On your online (connected) cluster, run the oc adm must-gather command:
oc adm must-gather --image=icr.io/cpopen/ibm-integration-upgrade-must-gather:v2 -- /usr/bin/gather --namespace cp4i --to 2022.2