IBM Support

IT41625: RDQM configured for both HA and DR unable to run on any node after frequent connection interruptions

APAR status

  • Closed as program error.

Error description

  • RDQM configured for both HA and DR is unable to run on any node
    after frequent connection interruptions, even with APAR IT38764
    applied. When a failover is initiated, a stop is issued on the
    stacked DRBD resource, but the resource is never restarted. The
    DR/HA queue manager's stacked DRBD resource remains stuck in the
    Diskless state. Entries such as the following can be found in
    the logs:
    Jul 11 20:58:48 ### rdqmd: Waiting for Diskless stacked resource '###.dr' to be Secondary
    Jul 11 20:58:54 ### rdqmd: Stopped resource 'ms_drbd_dr_###'
    Jul 11 20:58:54 ### rdqmd: Set target-role=Master for resource 'ms_drbd_dr_#'
    

Local fix

  • Suspend the primary node, then resume it, by running the
    following commands on that node:
    rdqmadm -s
    rdqmadm -r
    

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    All MQ users who have configured RDQM for high availability (HA)
    and disaster recovery (DR).
    
    
    Platforms affected:
    Linux on x86-64
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    During failover, when a stop is issued on the stacked resource,
    MQ then sets the target-role to Master. In a small timing window,
    where quorum returns before Pacemaker has stopped the stacked
    resource, this target-role change overrides the pending stop.
    The stop request is asynchronous, and MQ does not wait for the
    resource to stop before changing the target role. APAR IT38764
    attempted to fix a similar issue previously, but did not account
    for this additional timing window.
    
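The timing window described above can be illustrated with a small simulation. This is a hypothetical model, not code from MQ, rdqmd, or Pacemaker: the function `run_failover`, the integer clock ticks, and the rule that the cluster manager acts on a stop request only `stop_delay` ticks after it is issued (using whatever target role is current at that moment) are assumptions made for illustration.

```python
from __future__ import annotations


def run_failover(stop_delay: int, quorum_returns_at: int) -> list[str]:
    """Hypothetical model of the race (not IBM MQ code).

    A stop is requested at t=0 but is only acted on at t=stop_delay.
    When quorum returns, target-role is set to Master WITHOUT waiting
    for the stop to complete. Both arguments are positive integer ticks.
    """
    events = ["t=0 stop requested"]
    target_role = "Stopped"  # the asynchronous stop request
    stopped = False
    for t in range(1, max(stop_delay, quorum_returns_at) + 1):
        if t == quorum_returns_at:
            # Quorum is back: the role changes regardless of the pending stop.
            target_role = "Master"
            events.append(f"t={t} target-role=Master")
            if stopped:
                events.append(f"t={t} resource restarted")
        if t == stop_delay:
            # The cluster manager acts on the stop only now, using the
            # target role that is current at this moment.
            if target_role == "Stopped":
                stopped = True
                events.append(f"t={t} resource stopped")
            else:
                events.append(f"t={t} stop overridden; resource not restarted")
    return events


if __name__ == "__main__":
    # Quorum returns (t=1) before the stop is acted on (t=3): the race.
    for line in run_failover(stop_delay=3, quorum_returns_at=1):
        print(line)
```

In this model, when quorum returns at tick 1 and the stop would take effect at tick 3, the stop is overridden and no stop/start cycle ever happens. If the stop completes first (for example `stop_delay=1, quorum_returns_at=3`), the later target-role change restarts the resource as intended.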

Problem conclusion

  • The code has been modified to wait for the stop of the stacked
    DRBD resource to complete before changing the target role.
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v9.2 LTS   9.2.0.10
    v9.3 LTS   9.3.0.5
    v9.x CD    9.3.3
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
    
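The fix described above, waiting for the stop to complete before changing the target role, follows a common poll-with-timeout pattern. The sketch below is a hypothetical illustration only: `stop_then_promote` and `FakeResource` are invented names, the three callables stand in for whatever cluster interface is actually used, and the timeout behaviour (give up without changing the role) is an assumption rather than documented MQ behaviour.

```python
import time
from typing import Callable


def stop_then_promote(
    request_stop: Callable[[], None],
    is_stopped: Callable[[], bool],
    set_target_role: Callable[[str], None],
    timeout: float = 30.0,
    interval: float = 0.5,
) -> bool:
    """Hypothetical sketch: change the target role only after the stop completes.

    The callables are placeholders, not real rdqmd or Pacemaker APIs.
    Returns True if the stop completed and the role was changed,
    False if the timeout expired first (role left unchanged).
    """
    request_stop()  # asynchronous: returns before the stop has happened
    deadline = time.monotonic() + timeout
    while not is_stopped():
        if time.monotonic() >= deadline:
            return False  # do not override a stop that is still pending
        time.sleep(interval)
    # The stop has completed, so changing the role can no longer cancel it.
    set_target_role("Master")
    return True


class FakeResource:
    """Hypothetical stand-in that reports 'stopped' after a few polls."""

    def __init__(self, polls_needed: int) -> None:
        self.polls_needed = polls_needed
        self.polls = 0
        self.role = "Started"
        self.stop_requested = False

    def request_stop(self) -> None:
        self.stop_requested = True

    def is_stopped(self) -> bool:
        self.polls += 1
        return self.stop_requested and self.polls >= self.polls_needed

    def set_target_role(self, role: str) -> None:
        self.role = role


if __name__ == "__main__":
    res = FakeResource(polls_needed=3)
    ok = stop_then_promote(res.request_stop, res.is_stopped,
                           res.set_target_role, timeout=5, interval=0.01)
    print(ok, res.role)  # True Master
```

Ordering is the point of the design: the role change happens strictly after the stop is observed to be complete, which closes the window in which a late-arriving quorum could override a pending stop.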

Temporary fix

Comments

APAR Information

  • APAR number

    IT41625

  • Reported component name

    MQ BASE V9.2

  • Reported component ID

    5724H7281

  • Reported release

    920

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2022-07-27

  • Closed date

    2023-02-28

  • Last modified date

    2023-02-28

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    MQ BASE V9.2

  • Fixed component ID

    5724H7281

Applicable component levels

  • Product

    IBM MQ (SSYHRD)

  • Platform

    Platform Independent

  • Version

    920

  • Business Unit

    IBM Software w/o TPS (BU059)

  • Line of Business

    Automation (LOB45)

Document Information

Modified date:
01 March 2023