A fix is available
APAR status
Closed as program error.
Error description
When commit processing on one queue manager in a queue sharing group (QSG) is delayed by a log full condition (CSQJ111A), the other queue managers in the QSG cannot process the failure events raised by a CF structure failure. The expected CF rebuild cannot proceed until the two-phase commit completes. If the wait exceeds the limit defined in the XCF policy, XCF terminates the hung process; this is working as designed. However, in this case additional queue managers that had neither lost connectivity to the structure nor filled their logs were also terminated, which is not expected. This inconsistent behavior needs to be handled more gracefully.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All users of IBM MQ for z/OS Version 9       *
*                 Release 2 Modification 0 and                 *
*                 Release 3 Modification 0.                    *
****************************************************************
* PROBLEM DESCRIPTION: Several queue managers are terminated   *
*                      S026-08110102 following a partial loss  *
*                      of connectivity to a subset of the      *
*                      queue managers.                         *
****************************************************************
A queue manager experienced a partial loss of connectivity to an application structure and prepared to disconnect from the structure. However, processing of the EEPLLossConn event was delayed because the logs had filled while a commit was in progress. While this processing was delayed, another queue manager disconnected from the structure because of the same loss of connectivity, and all queue managers that remained connected (including the delayed queue manager) received a DiscFailConn event. One of them started Peer Level Recovery (PLR) phase 1 and waited for all connected queue managers to acknowledge the associated USync event. PLR phase 2 was therefore delayed by the original queue manager's log full condition. As a result, other queue managers participating in the PLR operation, in addition to the one that was hung, can be terminated abnormally with S026-08110102 abends.
Problem conclusion
Queue managers processing a loss of connectivity event will not participate in new PLR operations for an affected structure, allowing other participating queue managers to complete PLR processing and respond to the DiscFailConn event.
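The effect of this change can be illustrated with a toy model. The sketch below is not IBM MQ code: the class, field, and function names are all hypothetical, and it assumes only what is described above, namely that PLR phase 1 waits for an acknowledgement from every participating queue manager, and that after the fix a queue manager already processing a loss of connectivity event is no longer a participant.

```python
from dataclasses import dataclass


@dataclass
class QueueManager:
    # All names here are hypothetical; this is a toy model, not MQ internals.
    name: str
    handling_loss_of_connectivity: bool = False  # processing a loss of connectivity event
    stalled: bool = False  # cannot acknowledge, e.g. commit blocked by a full log


def plr_participants(qmgrs, exclude_loss_conn):
    """Return the queue managers whose acknowledgement PLR phase 1 waits for."""
    if exclude_loss_conn:
        # Behaviour described in the problem conclusion: skip queue managers
        # that are already processing a loss of connectivity event.
        return [q for q in qmgrs if not q.handling_loss_of_connectivity]
    return list(qmgrs)


def plr_phase1_completes(qmgrs, exclude_loss_conn):
    """Phase 1 completes only if every participant is able to acknowledge."""
    return all(not q.stalled for q in plr_participants(qmgrs, exclude_loss_conn))


qmgrs = [
    QueueManager("QM1", handling_loss_of_connectivity=True, stalled=True),
    QueueManager("QM2"),
    QueueManager("QM3"),
]

before_fix = plr_phase1_completes(qmgrs, exclude_loss_conn=False)
after_fix = plr_phase1_completes(qmgrs, exclude_loss_conn=True)
print(before_fix, after_fix)
```

In this model the stalled queue manager QM1 blocks phase 1 when it is counted as a participant, and phase 1 completes for QM2 and QM3 once it is excluded.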
Temporary fix
Comments
**** PE23/03/23 FIX IN ERROR. SEE APAR PH53483 FOR DESCRIPTION
APAR Information
APAR number
PH40878
Reported component name
IBM MQ Z/OS V9
Reported component ID
5655MQ900
Reported release
200
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-09-24
Closed date
2022-07-20
Last modified date
2023-06-30
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UI79255 UI81566
Modules/Macros
CSQE197M CSQESEX CSQEWRKT
Fix information
Fixed component name
IBM MQ Z/OS V9
Fixed component ID
5655MQ900
Applicable component levels
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
01 July 2023