IBM Support

IT43517: Unable to switch back to RDQM node after applying FixPack 9.2.0.10

APAR status

  • Closed as program error.

Error description

  • After FixPack 9.2.0.10 was applied to an RDQM node, it was not
    possible to switch the queue manager back to that node.
    The "rdqmadm -r" command appeared to run successfully, but the
    queue manager would not start on the 9.2.0.10 node when
    "rdqmadm -p" was run.
    
    Messages in /var/log/messages indicate conflicting options in
    /etc/drbd.d/global_common.conf, for example:
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Apr  6 13:43:55 rdqm-test-1 drbd(p_drbd_myrdqm)[16344]: ERROR:
    myrdqm: Command stderr: drbd.d/global_common.conf:10:
    conflicting use of resource options section 'common:res_options'
    ...#012drbd.d/global_common.conf:7: resource options section
    'common:res_options' first used here.
    Apr  6 13:43:55 rdqm-test-1 lrmd[7438]:  notice:
    p_drbd_myrdqm_stop_0:16344:stderr [ ocf-exit-reason:DRBD
    resource myrdqm not found in configuration file /etc/drbd.conf.
    ]
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    
    Similar errors can be found in the output from "journalctl -xe":
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    -- Unit drbd.service has begun starting up.
    Apr 11 09:23:17 rdqm-test-1.ibm.com drbd[5900]: Starting DRBD
    resources:
    Apr 11 09:23:17 rdqm-test-1.ibm.com drbd[5900]:
    drbd.d/global_common.conf:10: conflicting use of resource
    options section 'common:res_opt
    Apr 11 09:23:17 rdqm-test-1.ibm.com drbd[5900]:
    drbd.d/global_common.conf:7: resource options section
    'common:res_options' first used her
    Apr 11 09:23:17 rdqm-test-1.ibm.com drbd[5900]: .
    Apr 11 09:23:17 rdqm-test-1.ibm.com systemd[1]: drbd.service:
    main process exited, code=exited, status=6/NOTCONFIGURED
    Apr 11 09:23:17 rdqm-test-1.ibm.com systemd[1]: Failed to start
    DRBD -- please disable. Unless you are NOT using a cluster
    manager..
    -- Subject: Unit drbd.service has failed
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    
    ...and crm status reports failed resource actions:
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    Stack: corosync
    Current DC: rdqm-test-2.ibm.com (version
    1.1.24.linbit-2.0.el7-8f22be2ae) - partition with quorum
    Last updated: Tue Apr 11 09:54:20 2023
    Last change: Tue Apr 11 09:24:56 2023 by root via crm_attribute
    on rdqm-test-1.ibm.com
    
    3 nodes configured
    6 resource instances configured
    
    Online: [ rdqm-test-1.ibm.com rdqm-test-2.ibm.com
    rdqm-test-3.ibm.com ]
    
    Full list of resources:
    
    myrdqm (ocf::ibm:rdqm):        Started rdqm-test-2.ibm.com
    p_fs_myrdqm    (ocf::heartbeat:Filesystem):    Started
    rdqm-test-2.ibm.com
    p_rdqmx_myrdqm (ocf::ibm:rdqmx):       Started
    rdqm-test-2.ibm.com
    Master/Slave Set: ms_drbd_myrdqm [p_drbd_myrdqm]
         Masters: [ rdqm-test-2.ibm.com ]
         Slaves: [ rdqm-test-3.ibm.com ]
         Stopped: [ rdqm-test-1.ibm.com ]
    
    Failed Resource Actions:
    * p_drbd_myrdqm_start_0 on rdqm-test-1.ibm.com 'not installed'
    (5): call=19, status=complete, exitreason='DRBD resource myrdqm
    not found in configuration file /etc/drbd.conf.',
        last-rc-change='Fri Apr  7 18:14:26 2023', queued=0ms,
    exec=176ms
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
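
    To check whether a node shows the symptoms above, the log can be
    searched for the two messages quoted in this APAR. The following
    is a minimal sketch, not an IBM-supplied tool: the function name
    find_drbd_conflicts is hypothetical, and it assumes each message
    appears on a single line in the real log file.

```shell
#!/bin/sh
# Print log lines that report the DRBD configuration conflict described above.
# The default path is the syslog file named in this APAR; pass another file
# as the first argument to scan that file instead.
# find_drbd_conflicts is a hypothetical helper name, not an IBM MQ command.
find_drbd_conflicts() {
    log_file="${1:-/var/log/messages}"
    grep -E "conflicting use of resource options section|not found in configuration file /etc/drbd.conf" "$log_file"
}
```

    For example, "find_drbd_conflicts /var/log/messages" prints any
    matching lines and returns a non-zero status if none are found.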
    

Local fix

  • On each node where the failure has been found, remove the
    following duplicate lines from /etc/drbd.d/global_common.conf:
    
        options {
            twopc-timeout 100;
        }
    
            ping-timeout 40;
            socket-check-timeout 5;
            socket-check-timeout 5;
    
    The simplest way to do this is to replace
    /etc/drbd.d/global_common.conf with the file in the IBM MQ
    samples directory, for example:
    
      cp [MQ_INSTALLATION_PATH]/samp/rdqm/etc/drbd.d/global_common.conf /etc/drbd.d
    
    Then clean up any failed resource actions.  To check for failed
    resource actions, run:
    
      crm status
    
    If any are found, run:
    
      crm resource cleanup
    
    ...on the appropriate resource(s) (for example the queue manager
    name in lower case).
    
    When all failed resource actions have been cleared, use "rdqmadm
    -r" and "rdqmadm -p" to resume and switch the RDQM to the
    preferred node.
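
    The local fix can be sketched as a small script for one node.
    This is an illustration under stated assumptions, not an
    IBM-supplied procedure: the function names run and
    apply_local_fix are hypothetical, /opt/mqm is assumed as the
    value of MQ_INSTALLATION_PATH when it is not set, and the backup
    copy is an extra precaution that this APAR does not mention. By
    default the script only prints the commands; set DRY_RUN=0 to
    execute them.

```shell
#!/bin/sh
# Sketch of the local fix for a single RDQM node.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
# Assumption: /opt/mqm is used when MQ_INSTALLATION_PATH is not set.
MQ_INSTALLATION_PATH="${MQ_INSTALLATION_PATH:-/opt/mqm}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: $1 is the resource to clean up, for example the
# queue manager name in lower case.
apply_local_fix() {
    resource="$1"
    conf=/etc/drbd.d/global_common.conf

    # Extra precaution, not part of the APAR text: keep a copy of the file.
    run cp -p "$conf" "$conf.bak"
    # Replace the file with the copy from the IBM MQ samples directory.
    run cp "$MQ_INSTALLATION_PATH/samp/rdqm/etc/drbd.d/global_common.conf" /etc/drbd.d
    # Show failed resource actions, then clear them for the named resource.
    run crm status
    run crm resource cleanup "$resource"
    # Resume the node.
    run rdqmadm -r
    # The APAR does not list the arguments for the final step, so only a
    # reminder is printed here.
    echo "next: run rdqmadm -p with the options for your queue manager"
}
```

    For example, "apply_local_fix myrdqm" lists the commands for the
    queue manager shown in the output above. Review the "crm status"
    output before running the cleanup for real, because the sketch
    cleans up the named resource unconditionally.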
    

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    Those using RDQM who applied FixPack 9.2.0.10
    
    
    Platforms affected:
    Linux on x86-64
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    An error in the 9.2.0.10 FixPack migration code caused duplicate
    lines to be added to file /etc/drbd.d/global_common.conf which
    prevented RDQM from restarting on the node.
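
    A quick way to see whether a node's file contains the duplicate
    section is to count the "options {" lines in it. The following
    is a minimal sketch and a heuristic only: it does not parse the
    DRBD configuration syntax, and the function names
    count_options_sections and report_options_sections are
    hypothetical.

```shell
#!/bin/sh
# Count lines that open an "options {" section in a DRBD configuration file.
# Heuristic only: this matches text and does not parse the DRBD syntax.
count_options_sections() {
    conf_file="${1:-/etc/drbd.d/global_common.conf}"
    grep -c -E '^[[:space:]]*options[[:space:]]*\{' "$conf_file"
}

# Report the count; return 1 if more than one section is found,
# 2 if the file cannot be read, and 0 otherwise.
report_options_sections() {
    conf_file="${1:-/etc/drbd.d/global_common.conf}"
    if [ ! -r "$conf_file" ]; then
        echo "cannot read $conf_file" >&2
        return 2
    fi
    count=$(count_options_sections "$conf_file")
    if [ "$count" -gt 1 ]; then
        echo "$conf_file: $count options sections found (duplicates likely)"
        return 1
    fi
    echo "$conf_file: $count options section(s) found"
}
```

    Running "report_options_sections" with no argument checks
    /etc/drbd.d/global_common.conf on the current node.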
    

Problem conclusion

  • The error has been fixed.
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v9.2 LTS   9.2.0.11
    v9.3 LTS   9.3.0.10
    v9.x CD    9.3.3
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT43517

  • Reported component name

    MQ BASE V9.2

  • Reported component ID

    5724H7281

  • Reported release

    920

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2023-04-06

  • Closed date

    2023-04-18

  • Last modified date

    2023-04-25

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    MQ BASE V9.2

  • Fixed component ID

    5724H7281

Document Information

Modified date:
25 April 2023