IBM Support

IBM MQ: Considerations for OpenShift Container Storage 4.2 and 4.3 (CephFS)

Troubleshooting


Problem

IBM MQ provides guidance on the functional capabilities required of a file system and on the testing clients should complete in their own environment. File systems that the Lab has verified, and any known issues identified, are documented here:

https://www.ibm.com/support/pages/testing-statement-ibm-mq-multi-instance-queue-manager-file-systems

IBM completed testing against OpenShift Container Storage (OCS) 4.2 and 4.3, which provide the core characteristics needed for an IBM MQ Multi-Instance deployment. During this testing, two considerations were identified that clients should be aware of:

  • Failover time: When the active instance of a Multi-Instance pair fails, the recovery time depends heavily on how long the underlying file system takes to detect the failure. Once the failure is detected, the file lock held by the active instance is released, and the standby instance obtains the lock and starts. The default file system detection time in OCS 4.2 and 4.3 is 60 seconds, which can be customized using the mds session timeout parameter. Currently, it can be lowered to 30 seconds by using the Ceph CLI.
  • Standby instance node failure causes outage: If the node running the standby instance fails unexpectedly, the active instance also fails. This is caused by an OCS issue in which a file shared by the active and standby instances becomes unavailable when the standby instance's node fails. The active instance then stops, which in a container environment is seen as a pod restart. The OCS team accepted this as an issue, and we understand it has been fixed in the Ceph kernel client for RHEL-8.3.0 in kernel-4.18.0-207.el8. The issue should not be present in Ceph clients running kernel-4.18.0-207.el8 or later.
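
The kernel level named in the second consideration can be checked with a small script. The following is a minimal sketch, not an IBM or Red Hat tool: the function names are hypothetical, and it assumes a RHEL 8 style kernel release string of the form 4.18.0-<build>.el8. It compares only the numeric version and build fields, so it does not account for fixes backported to other kernel streams.

```python
import re

# Kernel build that the text above identifies as containing the fix
# (kernel-4.18.0-207.el8).
FIXED_KERNEL = (4, 18, 0, 207)


def parse_kernel_release(release: str) -> tuple:
    """Parse a release such as '4.18.0-240.el8.x86_64' into (4, 18, 0, 240).

    An optional 'kernel-' package prefix is accepted. Anything after the
    first build number (for example '.el8.x86_64') is ignored.
    """
    match = re.match(r"^(?:kernel-)?(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def has_standby_node_fix(release: str) -> bool:
    """Return True if the release is at or above kernel-4.18.0-207.el8."""
    return parse_kernel_release(release) >= FIXED_KERNEL
```

On a worker node, the release string could be taken from `uname -r` or from Python's `platform.release()` and passed to `has_standby_node_fix`. A result of False on a kernel outside the 4.18.0 RHEL 8 series says nothing about whether that kernel carries the fix.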

In general, OpenShift Container Storage 4.2 and 4.3 provide the functional characteristics required to run IBM MQ Multi-Instance. However, clients need to be aware of the expected failover behavior and verify that it meets their business resilience and availability requirements.
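
As a rough aid to that verification, the failover time can be estimated as the file system detection time plus the time the standby instance needs to obtain the lock and start. The sketch below is illustrative only. The 60 and 30 second detection times come from the text above, while the standby start time and the recovery objective are hypothetical values that each client would need to measure or define for their own environment.

```python
def estimated_failover_seconds(detection_timeout_s: float,
                               standby_start_s: float) -> float:
    """Estimate failover time as detection time plus standby start time."""
    return detection_timeout_s + standby_start_s


def meets_objective(detection_timeout_s: float,
                    standby_start_s: float,
                    objective_s: float) -> bool:
    """Return True if the estimated failover time fits within the objective."""
    return estimated_failover_seconds(detection_timeout_s,
                                      standby_start_s) <= objective_s


# Hypothetical inputs: a measured standby start time and a recovery objective.
STANDBY_START_S = 15.0
OBJECTIVE_S = 60.0

# Default detection time in OCS 4.2 and 4.3 (60 seconds).
default_ok = meets_objective(60.0, STANDBY_START_S, OBJECTIVE_S)

# Detection time lowered to 30 seconds.
lowered_ok = meets_objective(30.0, STANDBY_START_S, OBJECTIVE_S)

print(f"default timeout meets objective: {default_ok}")
print(f"lowered timeout meets objective: {lowered_ok}")
```

With these example numbers, the default setting gives an estimate of 75 seconds and misses the 60 second objective, while the lowered setting gives 45 seconds and meets it. Measured failover tests in the target environment remain the authoritative check.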

Document Location

Worldwide


Product Synonym

WebSphere MQ;MQ

Document Information

Modified date:
15 July 2021

UID

ibm16115900