IBM Support

IT36745: IBM MQ Appliance HA queue manager does not return to its preferred node after unexpected fail over

APAR status

  • Closed as program error.

Error description

  • The cluster manager process terminates unexpectedly due to an
    internal error, which causes the queue managers to fail over to
    the secondary node. The queue managers continue to run there
    even after their preferred node is online again.
    
    The following errors, which triggered the failover, were
    recorded in the system log:
    
    2021-04-21 13:13:19.374056+00:00 MQAPP1 crmd[861101]:    error: Could not recover from internal error
    2021-04-21 13:13:19.376296+00:00 MQAPP1 pacemakerd[860937]: error: crmd[861101] exited with status 201 (Generic Pacemaker error)
    2021-04-21 13:13:19.376375+00:00 MQAPP1 pacemakerd[860937]: notice: Respawning failed child process: crmd
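
To check whether an appliance has hit this condition, the system log can be searched for the pacemakerd entry that reports the crmd exit. The following is a minimal sketch, not an IBM-supplied tool. It assumes each log entry occupies a single line in the exported log and follows the layout of the entries shown in this APAR; the function name `find_crmd_failures` is hypothetical.

```python
import re

# Matches the pacemakerd entry that reports crmd exiting with an error status.
# The layout (timestamp, host, process[pid]: level: message) is taken from the
# log entries quoted in this APAR; other log formats are not handled.
CRMD_EXIT = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<host>\S+) pacemakerd\[\d+\]:\s+error:\s+"
    r"crmd\[(?P<pid>\d+)\] exited with status\s+(?P<status>\d+)"
)

def find_crmd_failures(lines):
    """Return (timestamp, host, status) for each crmd exit entry found."""
    hits = []
    for line in lines:
        match = CRMD_EXIT.match(line.strip())
        if match:
            hits.append(
                (match.group("ts"), match.group("host"), int(match.group("status")))
            )
    return hits

# Sample input: the three entries from this APAR, one entry per line.
sample = [
    "2021-04-21 13:13:19.374056+00:00 MQAPP1 crmd[861101]:    error: Could not recover from internal error",
    "2021-04-21 13:13:19.376296+00:00 MQAPP1 pacemakerd[860937]: error: crmd[861101] exited with status 201 (Generic Pacemaker error)",
    "2021-04-21 13:13:19.376375+00:00 MQAPP1 pacemakerd[860937]: notice: Respawning failed child process: crmd",
]

for ts, host, status in find_crmd_failures(sample):
    print(f"{ts} {host}: crmd exited with status {status}")
```

A match only indicates that the cluster manager process terminated. Whether the queue manager then stayed on the secondary node still has to be confirmed from the HA status of the queue manager.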
    

Local fix

  • On the node where the queue managers are now running, suspend
    the node from the HA group and then resume it.
    

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    All MQ Appliance users with one or more queue managers
    configured in an HA group.
    
    
    Platforms affected:
    MultiPlatform
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    When the cluster manager process terminated unexpectedly, some
    of the transient attributes that determine whether a node is
    eligible to run the cluster resources were lost. As a result,
    the queue manager was unable to move back to its preferred node
    even though the node was online again.
    

Problem conclusion

  • The code has been improved such that the missing transient
    attributes are restored in the event of an unexpected failure.
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v9.2 LTS   9.2.0.5
    v9.x CD    9.2.5
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
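
The maintenance levels listed above can be turned into a quick check of whether a given 9.2 level contains the fix. This is a sketch under stated assumptions: version strings have the form V.R.M.F, a modification level of 0 identifies the LTS stream, and any other modification level identifies the CD stream. The function name `includes_fix` is hypothetical, and levels outside 9.2 are rejected because this APAR text does not cover them.

```python
def includes_fix(version):
    """Return True if a 9.2 firmware level is at or above the fix level.

    Fix levels come from the table in this APAR: 9.2.0.5 (LTS) and 9.2.5 (CD).
    Assumption: a modification level of 0 means the LTS stream.
    """
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (4 - len(parts))  # pad "9.2.5" to 9.2.5.0
    v, r, m, f = parts[:4]
    if (v, r) != (9, 2):
        raise ValueError("only 9.2 levels are covered by this APAR text")
    if m == 0:
        return f >= 5  # LTS stream: fixed in 9.2.0.5
    return m >= 5      # CD stream: fixed in 9.2.5

for level in ("9.2.0.4", "9.2.0.5", "9.2.4.0", "9.2.5"):
    print(level, includes_fix(level))
```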
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT36745

  • Reported component name

    MQ APPL M2002 V

  • Reported component ID

    5737H4701

  • Reported release

    920

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-04-29

  • Closed date

    2022-01-20

  • Last modified date

    2022-01-25

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    MQ APPL M2002 V

  • Fixed component ID

    5737H4701

Applicable component levels

  • Product: IBM MQ Appliance (SS5K6E)
  • Version: 920
  • Platform: Platform Independent (PF025)
  • Line of Business: IBM Automation (LOB36)
  • Business Unit: Cloud & Data Platform (BU053)

Document Information

Modified date:
26 January 2022