APAR status
Closed as program error.
Error description
An RDQM queue manager fails due to an I/O error. The following can be observed in the diagnostic data:

2021/10/04 19:14:18.877382-4 Installation1 amqzmuc0 71669 4 HL206037 mqloWritevFile xecF_E_UNEXPECTED_RC hrcE_MQLO_DERR
Major Errorcode :- xecF_E_UNEXPECTED_RC
Minor Errorcode :- hrcE_MQLO_DERR
Probe Description :- AMQ6118S: An internal IBM MQ error has occurred (20806826)
Arith1 :- 545286182 (0x20806826)
Arith2 :- 5 (0x5)

2021/10/04 19:14:18.905774-4 Installation1 amqzmuc0 71669 4 HL214091 mllWriteLogPages hrcE_MQLO_DERR OK
2021/10/04 19:14:19.150322-4 Installation1 amqzmur0 71680 13 XC402011 xeeOpenFiles xecF_E_UNEXPECTED_SYSTEM_RC OK
Probe Id :- HL214091
Application Name :- MQM
Component :- mllWriteLogPages
Major Errorcode :- hrcE_MQLO_DERR

2021/10/04 19:14:18.933868-4 Installation1 amqzlaa0 61578 2755 AL004008 almLogIt xecF_E_UNEXPECTED_RC hrcE_LOG_DAMAGED
2021/10/04 19:14:18.967913-4 Installation1 amqzlaa0 61578 2755 AT049000 atxPerformCommit STOP_ALL OK
2021/10/04 19:14:19.014360-4 Installation1 amqzlaa0 71731 33 AT049000 atxPerformCommit STOP_ALL OK
2021/10/04 19:14:19.075417-4 Installation1 amqzmur0 71680 13 XC407051 xeiWriteFn xecF_E_UNEXPECTED_SYSTEM_RC OK
2021/10/04 19:14:19.094195-4 Installation1 amqzlaa0 71731 30 AL004008 almLogIt xecF_E_UNEXPECTED_RC hrcE_LOG_DAMAGED
Probe Id :- XC407051
Component :- xeiWriteFn
Major Errorcode :- xecF_E_UNEXPECTED_SYSTEM_RC
Probe Description :- AMQ6119S: An internal IBM MQ error has occurred ('30 - Read-only file system' from write.)
Arith1 :- 30 (0x1e)
Comment1 :- '30 - Read-only file system' from write.

MQM Function Stack
xeiDiagnosticMessageService
xeiProcessServiceMessages
xcsDisplayMessage
xcsFFST

2021/10/04 19:14:19.154138-4 Installation1 runmqlsr 71729 5 XC402011 xeeOpenFiles xecF_E_UNEXPECTED_SYSTEM_RC OK
2021/10/04 19:14:19.378770-4 Installation1 amqzlaa0 71731 30 AT048003 atxCommit STOP_ALL OK
2021/10/04 19:14:29.160506-4 Installation1 amqzxma0 71657 1 ZX015050 zxcStopAgents MQRC_Q_MGR_STOPPING OK

// The message log on the node shows the file system being remounted read-only after the failure on the node.
Oct 2 19:09:33 3p corosync[1615]: [MAIN ] Corosync main process was not scheduled for 7084.0664 ms (threshold is 1320.0000 ms). Consider token timeout increase.
Oct 2 19:09:33 3p corosync[1615]: [TOTEM ] A processor failed, forming new configuration.
Oct 2 19:09:33 3p corosync[1615]: [TOTEM ] A new membership (123.45.67.89:50728) was formed. Members joined: 1 2 left: 1 2
Oct 2 19:09:33 3p corosync[1615]: [TOTEM ] Failed to receive the leave message. failed: 1 2
Oct 2 19:09:33 3p corosync[1615]: [CPG ] downlist left_list: 2 received
Oct 2 19:09:33 3p corosync[1615]: [CPG ] downlist left_list: 0 received
Oct 2 19:09:33 3p corosync[1615]: [CPG ] downlist left_list: 0 received
Oct 2 19:09:33 3p kernel: drbd 2p 2p: sock was shut down by peer
Oct 2 19:09:33 3p kernel: drbd 1p 1p: sock was reset by peer
Oct 2 19:09:33 3p kernel: drbd 1p 1p: conn( Connected -> BrokenPipe ) peer( Secondary -> Unknown )
Oct 2 19:09:33 3p kernel: drbd 1p/0 drbd100 1p: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
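When checking whether a system shows the same symptom, it can help to pull the probe IDs out of the FFST summary lines above and to confirm what the numeric error code means. The following is a minimal sketch, not an IBM tool. It assumes each summary line is whitespace-separated in the order date, time, installation, process, pid, thread, probe ID, component, major code, minor code, which is inferred only from the excerpt in this APAR. The names `SUMMARY_RE`, `parse_summary_line` and `errno_name` are hypothetical.

```python
import errno
import re

# Assumed field layout, inferred from the excerpt in this APAR only:
# date time installation process pid thread probe_id component major minor
SUMMARY_RE = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2})\s+(?P<time>\S+)\s+(?P<installation>\S+)\s+"
    r"(?P<process>\S+)\s+(?P<pid>\d+)\s+(?P<thread>\d+)\s+"
    r"(?P<probe_id>[A-Z]{2}\d{6})\s+(?P<component>\S+)\s+"
    r"(?P<major>\S+)\s+(?P<minor>\S+)$"
)


def parse_summary_line(line):
    """Return the fields of one summary line as a dict, or None if the
    line does not match the assumed layout (for example a 'Probe Id :-' line)."""
    match = SUMMARY_RE.match(line.strip())
    return match.groupdict() if match else None


def errno_name(code):
    """Map a numeric errno (such as the Arith1 value of probe XC407051)
    to its symbolic name using the Python standard library."""
    return errno.errorcode.get(code, "unknown")


if __name__ == "__main__":
    sample = (
        "2021/10/04 19:14:19.075417-4 Installation1 amqzmur0 71680 13 "
        "XC407051 xeiWriteFn xecF_E_UNEXPECTED_SYSTEM_RC OK"
    )
    fields = parse_summary_line(sample)
    print(fields["probe_id"], fields["component"], fields["major"])
    print(errno_name(30))  # symbolic name for errno 30
```

In this APAR the probes of interest are HL206037 (mqloWritevFile), HL214091 (mllWriteLogPages) and XC407051 (xeiWriteFn), the last of which reports errno 30 from write.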
Local fix
Problem summary
****************************************************************
USERS AFFECTED:
Users running MQ in an RDQM cluster

Platforms affected:
Linux on x86
****************************************************************
PROBLEM DESCRIPTION:
A queue manager may fail its automatic recovery after an active node loses connectivity. A potential timing window meant that if the node regained quorum while unmounting the filesystem, it would generate a file I/O error and leave the queue manager unavailable.
Problem conclusion
Updated the code to avoid this timing window.

---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version     Maintenance Level
v9.1 LTS    9.1.0.11
v9.x CD     9.1.5

The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
Temporary fix
Comments
APAR Information
APAR number
IT38839
Reported component name
IBM MQ BASE MP
Reported component ID
5724H7271
Reported release
910
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-10-27
Closed date
2022-02-25
Last modified date
2022-06-16
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
IBM MQ BASE MP
Fixed component ID
5724H7271
Applicable component levels
Document Information
Modified date:
17 June 2022