IBM Support

IT36744: Unable to move MQ appliance HA queue manager back to its original node after fail over

APAR status

  • Closed as program error.

Error description

  • A high availability (HA) queue manager on the IBM MQ Appliance
    cannot be started or failed over. The status command returns:
    
    mqa(mqcli)#? status QM1
    QM(QM1)                                  Status(Running elsewhere)
    HA role:                                 Secondary
    HA status:                               Resource failed
    HA control:                              Enabled
    HA preferred location:                   This appliance
    
    
    The pcs-status.stdout file in the runmqras shows that the
    Resource failed status is reported due to the following failed
    action:
    
    * QM1_start_0 on MQAPP1 'unknown error' (1): call=692,
    status=complete, exitreason='',
        last-rc-change='Fri Apr 23 09:45:19 2021', queued=0ms,
    exec=10473ms
    
    
    The attempt to start the queue manager failed, and the messages
    log shows the following errors:
    
    2021-04-23 09:45:29.773449-04:00 [localhost] mqsystem[550754]:
    HAQM_start Leaving for queue manager QM1, result is 24
    2021-04-23 09:45:29.787727-04:00 [localhost]
    QueueManager(QM1)[550162]: INFO: Leaving QueueManager_start(),
    returning 1
    2021-04-23 09:45:29.790285-04:00 [localhost]
    lrmd[23296]:   notice: QM1_start_0:550162:stderr [ AMQ8041S:
    The queue manager cannot be restarted or deleted because
    processes, ]
    2021-04-23 09:45:29.790304-04:00 [localhost]
    lrmd[23296]:   notice: QM1_start_0:550162:stderr [ that were
    previously connected, are still running. ]
    2021-04-23 09:45:29.790315-04:00 [localhost]
    lrmd[23296]:   notice: QM1_start_0:550162:stderr [ AMQ7018E:
    The queue manager operation cannot be completed. ]
    2021-04-23 09:45:29.790829-04:00 [localhost]
    crmd[23299]:   notice: Result of start operation for QM1 on
    MQAPP1: 1 (unknown error)
    
    
    The queue manager could not start because processes that were
    previously connected were still running. The ps output shows
    that old instances of the endmqm process are still running:
    
    0 S mqm       816342       1  0  80   0 - 37121 hrtime Apr22
    ?        00:00:46 endmqm -C -ir QM1
    0 S mqm       825305       1  0  80   0 - 52659 futex_ Apr22
    ?        00:00:01 endmqm -C -pr QM1
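
The check described above, looking through a process listing for endmqm instances left over for a given queue manager, can be sketched as follows. This is a minimal illustration and not part of the appliance tooling: the function name find_lingering_endmqm is hypothetical, the input is assumed to be unwrapped "ps -elf"-style lines with the PID in the fourth column (as in the listing above), and the third sample line is invented to show a non-matching process.

```python
def find_lingering_endmqm(ps_lines, qmgr):
    """Return (pid, command) for each endmqm process that names qmgr.

    Assumes one process per line, in the column layout shown in the
    APAR (F S UID PID PPID ...), so the PID is the fourth field.
    """
    hits = []
    for line in ps_lines:
        fields = line.split()
        if "endmqm" not in fields:
            continue
        cmd_start = fields.index("endmqm")
        args = fields[cmd_start + 1:]
        if qmgr not in args:
            continue
        hits.append((int(fields[3]), " ".join(fields[cmd_start:])))
    return hits


# The first two lines are taken from the APAR (joined onto one line each);
# the third is a made-up entry for a different queue manager.
sample = [
    "0 S mqm       816342       1  0  80   0 - 37121 hrtime Apr22 ?        00:00:46 endmqm -C -ir QM1",
    "0 S mqm       825305       1  0  80   0 - 52659 futex_ Apr22 ?        00:00:01 endmqm -C -pr QM1",
    "0 S mqm       900001       1  0  80   0 - 10000 futex_ Apr22 ?        00:00:01 amqzxma0 -m QM2",
]

for pid, command in find_lingering_endmqm(sample, "QM1"):
    print(pid, command)
```

Any entry reported for the queue manager that failed to start is consistent with the AMQ8041S message shown in the log.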
    

Local fix

  • Suspend the preferred node from HA and reboot it.
    

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    All MQ Appliance users using HA.
    
    
    Platforms affected:
    MultiPlatform
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    If the endmqm process hung during a failover, the queue
    manager was unable to return to its original node because the
    endmqm process was still running on that system.
    

Problem conclusion

  • The MQ Appliance HA logic is modified to ensure that the endmqm
    process is terminated if it does not end in a timely manner during
    a failover.
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v9.2 LTS   9.2.0.5
    v9.x CD    9.2.5
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
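
The general pattern behind the fix, waiting a bounded time for a shutdown command and terminating it if it does not end, can be sketched as follows. This is an illustration of the technique only and is not the appliance's actual HA code, which is not shown in this APAR. The function name run_with_deadline, the timeout values, and the stand-in HUNG_COMMAND are all assumptions made for the example.

```python
import subprocess
import sys


def run_with_deadline(cmd, timeout_s, grace_s=1.0):
    """Run cmd and return (returncode, forced).

    If cmd has not exited after timeout_s seconds it is asked to
    terminate; if it is still alive after a further grace_s seconds
    it is killed. forced is True when termination was needed.
    """
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=timeout_s)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        proc.terminate()  # polite request first (SIGTERM on POSIX)
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # forceful stop (SIGKILL on POSIX)
            proc.wait()
        return proc.returncode, True


# Hypothetical stand-in for a shutdown command that hangs.
HUNG_COMMAND = [sys.executable, "-c", "import time; time.sleep(30)"]

if __name__ == "__main__":
    returncode, forced = run_with_deadline(HUNG_COMMAND, timeout_s=0.5)
    print("forced termination:", forced)
```

Terminating first and killing only after a grace period gives the process a chance to release resources; the values chosen here are arbitrary.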
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT36744

  • Reported component name

    MQ APPL M2002 V

  • Reported component ID

    5737H4701

  • Reported release

    920

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-04-29

  • Closed date

    2021-12-21

  • Last modified date

    2021-12-21

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    MQ APPL M2002 V

  • Fixed component ID

    5737H4701

Document Information

Modified date:
22 December 2021