APAR status
Closed as program error.
Error description
A high availability (HA) queue manager on the IBM MQ Appliance cannot be started or failed over. The status command returns:

mqa(mqcli)#? status QM1
QM(QM1) Status(Running elsewhere)
HA role: Secondary
HA status: Resource failed
HA control: Enabled
HA preferred location: This appliance

The pcs-status.stdout file in the runmqras output shows that the "Resource failed" status is reported because of the following failed action:

* QM1_start_0 on MQAPP1 'unknown error' (1): call=692, status=complete, exitreason='', last-rc-change='Fri Apr 23 09:45:19 2021', queued=0ms, exec=10473ms

The attempt to start the queue manager failed, and the messages log shows the following errors:

2021-04-23 09:45:29.773449-04:00 [localhost] mqsystem[550754]: HAQM_start Leaving for queue manager QM1, result is 24
2021-04-23 09:45:29.787727-04:00 [localhost] QueueManager(QM1)[550162]: INFO: Leaving QueueManager_start(), returning 1
2021-04-23 09:45:29.790285-04:00 [localhost] lrmd[23296]: notice: QM1_start_0:550162:stderr [ AMQ8041S: The queue manager cannot be restarted or deleted because processes, ]
2021-04-23 09:45:29.790304-04:00 [localhost] lrmd[23296]: notice: QM1_start_0:550162:stderr [ that were previously connected, are still running. ]
2021-04-23 09:45:29.790315-04:00 [localhost] lrmd[23296]: notice: QM1_start_0:550162:stderr [ AMQ7018E: The queue manager operation cannot be completed. ]
2021-04-23 09:45:29.790829-04:00 [localhost] crmd[23299]: notice: Result of start operation for QM1 on MQAPP1: 1 (unknown error)

The queue manager could not start because processes that were previously connected were still running. The ps output shows that old instances of the endmqm process are still running:

0 S mqm 816342 1 0 80 0 - 37121 hrtime Apr22 ? 00:00:46 endmqm -C -ir QM1
0 S mqm 825305 1 0 80 0 - 52659 futex_ Apr22 ? 00:00:01 endmqm -C -pr QM1
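When reviewing collected diagnostics offline, the check for leftover endmqm processes can be scripted. The following is a minimal sketch, not an IBM tool: it assumes the ps output uses the 15-column long format seen above (PID in the fourth column, command last), and the function name find_endmqm_pids is hypothetical.

```python
from typing import List

# The two ps lines quoted in this APAR, used here as sample input.
SAMPLE_PS = """\
0 S mqm 816342 1 0 80 0 - 37121 hrtime Apr22 ? 00:00:46 endmqm -C -ir QM1
0 S mqm 825305 1 0 80 0 - 52659 futex_ Apr22 ? 00:00:01 endmqm -C -pr QM1
"""


def find_endmqm_pids(ps_output: str, qmgr: str) -> List[int]:
    """Return the PIDs of endmqm processes whose last argument is qmgr.

    Assumption: each line follows the long ps layout shown in this APAR,
    so the PID is the fourth whitespace-separated field.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if "endmqm" not in fields:
            continue
        # Everything from the endmqm token onward is the command line.
        command = fields[fields.index("endmqm"):]
        if command[-1] == qmgr:
            pids.append(int(fields[3]))
    return pids


if __name__ == "__main__":
    print(find_endmqm_pids(SAMPLE_PS, "QM1"))  # [816342, 825305]
```

A non-empty result for a queue manager that reports "Resource failed" is consistent with the symptom described in this APAR.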
Local fix
Suspend the preferred node from HA and reboot it.
Problem summary
****************************************************************
USERS AFFECTED:
All MQ Appliance users using HA.

Platforms affected:
MultiPlatform
****************************************************************
PROBLEM DESCRIPTION:
If the endmqm process hung during failover, the queue manager was unable to return to its original node, because the endmqm process was still running on that system.
Problem conclusion
The MQ Appliance HA logic is modified to ensure that the endmqm process is terminated if it does not end in a timely manner during a failover.

---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version    Maintenance Level
v9.2 LTS   9.2.0.5
v9.x CD    9.2.5

The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
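The general pattern behind this kind of fix (wait for a process to end, then terminate it forcibly once a grace period expires) can be illustrated as follows. This is a generic sketch only and does not represent the appliance's internal HA code; the function name end_or_kill and the grace period values are assumptions chosen for illustration.

```python
import subprocess
import sys


def end_or_kill(proc: subprocess.Popen, grace_seconds: float) -> bool:
    """Wait up to grace_seconds for proc to exit; kill it if it does not.

    Returns True if the process had to be killed, False if it ended
    on its own within the grace period.
    """
    try:
        proc.wait(timeout=grace_seconds)
        return False
    except subprocess.TimeoutExpired:
        # The process did not end in time: terminate it forcibly,
        # then reap it so that no stale process is left behind.
        proc.kill()
        proc.wait()
        return True


if __name__ == "__main__":
    # Stand-in for a hung shutdown command: a child that sleeps too long.
    hung = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    print("killed:", end_or_kill(hung, grace_seconds=0.5))
```

The second wait after the kill matters: it confirms the process is really gone before any restart is attempted, which is the condition that AMQ8041S reports as unmet in this APAR.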
Temporary fix
Comments
APAR Information
APAR number
IT36744
Reported component name
MQ APPL M2002 V
Reported component ID
5737H4701
Reported release
920
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-04-29
Closed date
2021-12-21
Last modified date
2021-12-21
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
MQ APPL M2002 V
Fixed component ID
5737H4701
Applicable component levels
Document Information
Modified date:
22 December 2021