IBM Support

IT38839: RDQM queue manager fails due to an I/O error

APAR status

  • Closed as program error.

Error description

  • RDQM queue manager fails due to an I/O error. The following FFST
    records can be observed in the diagnostic data:
    
    2021/10/04 19:14:18.877382-4 Installation1 amqzmuc0 71669  4
    HL206037 mqloWritevFile        xecF_E_UNEXPECTED_RC
    hrcE_MQLO_DERR
    
    Major Errorcode  :- xecF_E_UNEXPECTED_RC
    Minor Errorcode  :- hrcE_MQLO_DERR
    Probe Description :- AMQ6118S: An internal IBM MQ error has
    occurred (20806826)
    Arith1      :- 545286182 (0x20806826)
    Arith2      :- 5 (0x5)
    
    2021/10/04 19:14:18.905774-4 Installation1 amqzmuc0 71669  4
    HL214091 mllWriteLogPages       hrcE_MQLO_DERR       OK
    2021/10/04 19:14:19.150322-4 Installation1 amqzmur0 71680  13
    XC402011 xeeOpenFiles         xecF_E_UNEXPECTED_SYSTEM_RC OK
    
    
    Probe Id     :- HL214091
    Application Name :- MQM
    Component     :- mllWriteLogPages
    Major Errorcode  :- hrcE_MQLO_DERR
    
    
    2021/10/04 19:14:18.933868-4 Installation1 amqzlaa0 61578 2755
    AL004008 almLogIt           xecF_E_UNEXPECTED_RC
    hrcE_LOG_DAMAGED
    2021/10/04 19:14:18.967913-4 Installation1 amqzlaa0 61578 2755
    AT049000 atxPerformCommit       STOP_ALL          OK
    2021/10/04 19:14:19.014360-4 Installation1 amqzlaa0 71731  33
    AT049000 atxPerformCommit       STOP_ALL          OK
    2021/10/04 19:14:19.075417-4 Installation1 amqzmur0 71680  13
    XC407051 xeiWriteFn          xecF_E_UNEXPECTED_SYSTEM_RC OK
    
    2021/10/04 19:14:19.094195-4 Installation1 amqzlaa0 71731  30
    AL004008 almLogIt           xecF_E_UNEXPECTED_RC
    hrcE_LOG_DAMAGED
    
    
    Probe Id     :- XC407051
    Component     :- xeiWriteFn
    Major Errorcode  :- xecF_E_UNEXPECTED_SYSTEM_RC
    Probe Description :- AMQ6119S: An internal IBM MQ error has
    occurred ('30 - Read-only file system' from write.)
    
    Arith1      :- 30 (0x1e)
    Comment1     :- '30 - Read-only file system' from write.
    
    MQM Function Stack
    xeiDiagnosticMessageService
    xeiProcessServiceMessages
    xcsDisplayMessage
    xcsFFST
    
    2021/10/04 19:14:19.154138-4 Installation1 runmqlsr 71729  5
    XC402011 xeeOpenFiles         xecF_E_UNEXPECTED_SYSTEM_RC OK
    
    2021/10/04 19:14:19.378770-4 Installation1 amqzlaa0 71731  30
    AT048003 atxCommit          STOP_ALL          OK
    2021/10/04 19:14:29.160506-4 Installation1 amqzxma0 71657  1
    ZX015050 zxcStopAgents        MQRC_Q_MGR_STOPPING     OK
    
    
    // The message log on the node shows the file system being
    remounted read-only after the failure on the node.
    
    Oct  2 19:09:33 3p corosync[1615]:  [MAIN  ] Corosync main
    process was not scheduled for 7084.0664 ms (threshold is
    1320.0000 ms). Consider token timeout increase.
    Oct  2 19:09:33 3p corosync[1615]:  [TOTEM ] A processor
    failed, forming new configuration.
    Oct  2 19:09:33 3p corosync[1615]:  [TOTEM ] A new membership
    (123.45.67.89:50728) was formed. Members joined: 1 2 left: 1 2
    Oct  2 19:09:33 3p corosync[1615]:  [TOTEM ] Failed to receive
    the leave message. failed: 1 2
    Oct  2 19:09:33 3p corosync[1615]:  [CPG   ] downlist
    left_list: 2 received
    Oct  2 19:09:33 3p corosync[1615]:  [CPG   ] downlist
    left_list: 0 received
    Oct  2 19:09:33 3p corosync[1615]:  [CPG   ] downlist
    left_list: 0 received
    Oct  2 19:09:33 3p kernel: drbd 2p 2p: sock was shut down by
    peer
    Oct  2 19:09:33 3p kernel: drbd 1p 1p: sock was reset by peer
    Oct  2 19:09:33 3p kernel: drbd 1p 1p: conn( Connected ->
    BrokenPipe ) peer( Secondary -> Unknown )
    Oct  2 19:09:33 3p kernel: drbd 1p/0 drbd100 1p: pdsk( UpToDate
    -> DUnknown ) repl( Established -> Off )
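
The XC407051 record above reports the failure as an operating system error number plus the call that returned it. As a reading aid, the following is a minimal Python sketch that splits a Comment1 string of that shape into its parts and looks up the symbolic errno name. The function name and the regular expression are illustrative assumptions and are not part of IBM MQ.

```python
import errno
import os
import re

# Assumed shape of the FFST Comment1 text, taken from the record above:
#   '30 - Read-only file system' from write.
COMMENT_RE = re.compile(r"'(\d+) - (.+?)' from (\w+)\.")


def decode_ffst_comment(comment: str) -> dict:
    """Split an FFST system-call comment into errno number, symbol, text and call."""
    match = COMMENT_RE.search(comment)
    if match is None:
        raise ValueError(f"unrecognised comment: {comment!r}")
    number = int(match.group(1))
    return {
        "errno": number,
        # Symbolic name known to this platform, or None if the number is unmapped.
        "symbol": errno.errorcode.get(number),
        "reported_text": match.group(2),
        # Text this platform's C library gives for the same number.
        "platform_text": os.strerror(number),
        "syscall": match.group(3),
    }


info = decode_ffst_comment("'30 - Read-only file system' from write.")
print(info["errno"], info["symbol"], info["syscall"])
```

On Linux, error number 30 is EROFS, which is consistent with a write being attempted against a file system that had been remounted read-only.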
    

Local fix

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    Users running MQ in an RDQM cluster
    
    
    Platforms affected:
    Linux on x86
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    A queue manager might fail its automatic recovery after an active
    node loses connectivity. Because of a timing window, if the node
    regained quorum while it was unmounting the file system, a file
    I/O error was generated and the queue manager was left
    unavailable.
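
One way to check a node for the symptom described above is to look for file systems that are mounted read-only. The following Python sketch parses text in the /proc/mounts format and lists entries whose options include `ro`. The function name is hypothetical, and the device and mount point in the sample are illustrative assumptions rather than values taken from this APAR.

```python
def read_only_mounts(mounts_text: str) -> list:
    """Return (device, mount_point) pairs whose mount options include 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # Options are comma-separated; match 'ro' exactly, not as a substring.
        if "ro" in options.split(","):
            result.append((device, mount_point))
    return result


# Hypothetical sample in /proc/mounts format; device and path are illustrative.
SAMPLE = """\
/dev/sda1 / xfs rw,relatime 0 0
/dev/drbd100 /var/mqm/vols/QM1 ext4 ro,relatime 0 0
"""

print(read_only_mounts(SAMPLE))
```

On a live Linux node the same function could be given the contents of /proc/mounts instead of the sample text.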
    

Problem conclusion

  • Updated the code to avoid this timing window.
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v9.1 LTS   9.1.0.11
    v9.x CD    9.1.5
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
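
The table above can be turned into a simple level check. The following Python sketch compares a four-part version string against the two levels listed. It assumes that a modification number of 0 identifies the LTS stream and any other value identifies CD, and it returns no answer for releases outside 9.1 because the table does not cover them. The function and constant names are hypothetical.

```python
from typing import Optional

# Fix levels copied from the table above.
LTS_FIX = (9, 1, 0, 11)  # v9.1 LTS
CD_FIX = (9, 1, 5)       # v9.x CD


def includes_fix(version: str) -> Optional[bool]:
    """Return True or False for 9.1 levels, or None when the table cannot tell."""
    parts = tuple(int(p) for p in version.split("."))
    if len(parts) != 4 or parts[:2] != (9, 1):
        return None  # outside the levels listed in the table
    if parts[2] == 0:
        # Assumption: modification 0 means the LTS stream.
        return parts >= LTS_FIX
    # Assumption: a non-zero modification means the CD stream.
    return parts[:3] >= CD_FIX


for level in ("9.1.0.9", "9.1.0.11", "9.1.4.0", "9.1.5.0", "9.2.0.4"):
    print(level, includes_fix(level))
```

A result of None means the level should be checked against the maintenance information for that release rather than against this table.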
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT38839

  • Reported component name

    IBM MQ BASE MP

  • Reported component ID

    5724H7271

  • Reported release

    910

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-10-27

  • Closed date

    2022-02-25

  • Last modified date

    2022-06-16

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    IBM MQ BASE MP

  • Fixed component ID

    5724H7271

Applicable component levels

  • Product

    IBM MQ (SSYHRD)

  • Platform

    Platform Independent (PF025)

  • Version

    910

  • Line of Business

    Automation (LOB45)

  • Business Unit

    IBM Software w/o TPS (BU059)

Document Information

Modified date:
17 June 2022