A fix is available
APAR status
Closed as program error.
Error description
Peer Level Recovery (PLR) hangs on peer queue managers, leading to S026-08118001 / S026-08118002 abends and abnormal queue manager termination, after the owning queue manager fails a PLR attempt when it restarts. The problem follows abend 602, which occurs during queue manager startup if the AMSM address space fails to start. After CSQ1 restarted, it attempted to perform PLR for its connection to the MUTUAL structure. This failed with an IxlRsnCodeHeldBySys return code when attempting to lock a list header, and the connection to the structure was disconnected with REASON=FAILURE. The other queue managers received a DiscConnFail event for CSQ1's connection to MUTUAL and attempted to perform PLR. However, they required an ENQ held by CSQ1, which would not be released until CSQ1 ended or disconnected from the admin structure for another reason. As a result, the structure task for MUTUAL on each of the other queue managers hung waiting for the ENQ until terminated by XCF.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All users of IBM MQ for z/OS Version 9       *
*                 Release 1 Modification 0,                    *
*                 Release 2 Modification 0 and                 *
*                 Release 3 Modification 0.                    *
****************************************************************
* PROBLEM DESCRIPTION: Abend S026-08118001, followed later by  *
*                      abend S026-08118002 and abnormal        *
*                      queue manager termination S6C6 occurs   *
*                      due to ENQ contention when a peer queue *
*                      manager fails Peer Level Recovery of    *
*                      its own connection to an application    *
*                      structure during startup.               *
****************************************************************
A queue manager terminated abnormally while shared queue operations were in flight. Any peer queue managers connected to the structure attempted Peer Level Recovery (PLR) for the failed connection, but were unable to perform recovery because a list's lock was held by the system. When the terminated queue manager restarted, it connected to the structure and attempted PLR. This also failed for the same reason, and the queue manager disconnected, indicating REASON=FAILURE. The connected peers detected this failure and attempted to start PLR. However, this required an ENQ that was held by the disconnecting queue manager until termination. This resulted in the structure task of each connected peer hanging until terminated by XCF.
Problem conclusion
Connected peers will no longer attempt PLR when a queue manager fails PLR for its own connection during startup, preventing the hang condition. PLR for the connection will be retried when the owning queue manager next attempts to connect to the structure.
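The change in peer behavior described above can be pictured with a small toy model. This is only an illustrative sketch, not the logic in the shipped modules: the names DisconnectCause, DisconnectEvent, peer_should_attempt_plr and peer_outcome, and the fix_applied flag, are all hypothetical and exist only for this example. It assumes that a peer can tell a disconnect caused by the owner's failed startup PLR apart from an abnormal termination.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class DisconnectCause(Enum):
    """Hypothetical reasons a peer might see for a failed connection."""
    ABNORMAL_TERMINATION = auto()       # owner ended while connected
    OWN_PLR_FAILED_AT_STARTUP = auto()  # owner restarted, failed PLR, disconnected


@dataclass
class DisconnectEvent:
    owner: str        # queue manager that owned the connection, e.g. "CSQ1"
    structure: str    # structure name, e.g. "MUTUAL"
    cause: DisconnectCause


def peer_should_attempt_plr(event: DisconnectEvent, fix_applied: bool) -> bool:
    """Decide whether a peer starts PLR for the failed connection (toy model)."""
    if fix_applied and event.cause is DisconnectCause.OWN_PLR_FAILED_AT_STARTUP:
        # With the fix, recovery is left until the owner next connects.
        return False
    return True


def peer_outcome(event: DisconnectEvent, fix_applied: bool,
                 enq_holder: Optional[str]) -> str:
    """Return 'deferred', 'hang' or 'plr_attempted' for one peer (toy model)."""
    if not peer_should_attempt_plr(event, fix_applied):
        return "deferred"
    if enq_holder == event.owner:
        # The peer needs an ENQ that the still-running owner holds, so it waits.
        return "hang"
    return "plr_attempted"


startup_failure = DisconnectEvent("CSQ1", "MUTUAL",
                                  DisconnectCause.OWN_PLR_FAILED_AT_STARTUP)
print(peer_outcome(startup_failure, fix_applied=False, enq_holder="CSQ1"))
print(peer_outcome(startup_failure, fix_applied=True, enq_holder="CSQ1"))
```

In this model, the first call represents the reported failure (the peer waits on the ENQ held by CSQ1) and the second represents the corrected behavior (the peer defers recovery). A disconnect caused by abnormal termination still leads to a PLR attempt in both cases.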
Temporary fix
Comments
APAR Information
APAR number
PH39958
Reported component name
IBM MQ Z/OS V9
Reported component ID
5655MQ900
Reported release
100
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-08-18
Closed date
2022-07-21
Last modified date
2022-10-07
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UI81009 UI81010 UI81600
Modules/Macros
CSQECLOS CSQESEX CSQESTE
Fix information
Fixed component name
IBM MQ Z/OS V9
Fixed component ID
5655MQ900
Applicable component levels
R100 PSY UI81010
UP22/07/01 P F206
R200 PSY UI81009
UP22/07/01 P F206
R300 PSY UI81600
UP22/08/03 P F208
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
07 October 2022