A fix is available
APAR status
Closed as program error.
Error description
The first report of this problem was that queue managers in a Queue Sharing Group (QSG) were hung and showed GRS contention. DISPLAY GRS STATUS showed that QP23MSTR owned an exclusive lock and all other QMGRs in the QSG were waiting.

D GRS,C showed:
  S=SYSTEMS SYSZCSQE PeerLevelRecoveryQPP30000000C00000039
  SYSNAME  JOBNAME   ASID  TCBADDR   EXC/SHR    STATUS
  MFOS     QP23MSTR  008C  00960140  EXCLUSIVE  OWN
  MBOS     QP11MSTR  0095  00962140  EXCLUSIVE  WAIT
  MBOS     QPG2MSTR  010E  0096C7F0  EXCLUSIVE  WAIT
  MAOS     QPS3MSTR  0093  00973A48  EXCLUSIVE  WAIT
  MAOS     QP10MSTR  017E  0096B988  EXCLUSIVE  WAIT
  etc.

MQ error messages logged:
  CSQ3201E +QP22 ABNORMAL EOT IN PROGRESS FOR USER=WSSRWDP CONNECTION-ID=RRSBATCH THREAD-XREF=
  CSQE007I +QP20 EEPLDISCFAILCONNECTION event received for structure MQRBSDJ02 connection name CSQEQPP3QP220C
  CSQE008I +QP20 Recovery event from QP22 received for structure MQRBSDJ02

Dumps taken included:
  Dump Title: QP22,ABN=5C6-00C51027,U=SYSOPR ,C=L8200.600.CFM -CSQERAD1,M=CSQGFRCV,LOC=CSQELPLM.CSQERAD1+0912
  Dump Title: ABEND=S026,REASON=08118001,CONNECTOR HANG: CONNAME=CSQEQPP3QP230D,JOBNAME=QP23MSTR

SYSLOG also showed IXL041E and IXL049E received for the structure. LOGREC shows abend S13E occurring in XES code.

* ADDITIONAL SYMPTOMS:
Subsequent reports of this problem did not necessarily have the symptom of GRS contention. Their symptoms included:
- An MQ dump shows that MQ system threads and application threads were waiting for latch SSSCONN / DMCSEGAL (the csObjectLatch). The latch owner was in a wait in CSQEUCAT for a lock in the Coupling Facility (CF). The lock owner had abended after CSQEUCAT obtained the lock and before it updated its flag to indicate that the lock is held. Therefore, when the abend occurred, the lock was not released. LOGREC will have an abend entry, possibly for S13E or S222, with SUBFUNCTION: CFM CSQEUCATCHG LHQC
- The command processor (thread RTSSRV01) was not working because it was waiting for the latch.
- Application or channel threads may be hung in CSQE* modules, e.g. waiting in CSQEMPUT with the TCB waiting in IXLRQRSU. An attempt to restart the channel fails with CSQX514E channel is active.
- CSQE020E CSQ1 Structure <strucid> connection as <connection> failed, RC=0000000C reason=02010C27 codes=00000002 00000008 00000C27
- CSQE007I CSQ1 EEPLDISCFAILCONNECTION event received for structure APPLICATION1 connection name CSQECSQSCSQ101
- ABEND=S026 REASON=08118001 ABENDS026 ABEND026 ABEND S026 026
- ABEND878-10 due to a build-up of DXWB control blocks ABENDS878 ABEND878 ABEND S878 878

L2 Verification Steps: See details in the internal Level 2 forum
Local fix
Recycle the queue manager that owns the lock
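When GRS contention is present, the queue manager to recycle is the one shown with STATUS OWN in the D GRS,C output. The following is a minimal sketch of picking that row out of captured output. It assumes the six-column row layout seen in the excerpt in the error description; the function name split_owner_and_waiters is hypothetical, and real D GRS,C output may contain other lines that this sketch simply skips.

```python
# Rows copied from the D GRS,C excerpt in the error description.
GRS_ROWS = """\
SYSNAME JOBNAME ASID TCBADDR EXC/SHR STATUS
MFOS QP23MSTR 008C 00960140 EXCLUSIVE OWN
MBOS QP11MSTR 0095 00962140 EXCLUSIVE WAIT
MBOS QPG2MSTR 010E 0096C7F0 EXCLUSIVE WAIT
MAOS QPS3MSTR 0093 00973A48 EXCLUSIVE WAIT
MAOS QP10MSTR 017E 0096B988 EXCLUSIVE WAIT
"""


def split_owner_and_waiters(text):
    """Return (owners, waiters) as lists of (sysname, jobname) tuples.

    Assumes each data row has exactly six whitespace-separated fields;
    any other line (including the column header) is ignored.
    """
    owners, waiters = [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue
        sysname, jobname, _asid, _tcbaddr, _mode, status = fields
        if status == "OWN":
            owners.append((sysname, jobname))
        elif status == "WAIT":
            waiters.append((sysname, jobname))
    return owners, waiters


owners, waiters = split_owner_and_waiters(GRS_ROWS)
print("owner(s):", owners)
print("waiting jobs:", [job for _, job in waiters])
```

For the excerpt above this identifies QP23MSTR on MFOS as the owner. Later reports of the problem did not always show GRS contention, so this check only helps for the first set of symptoms.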
Problem summary
****************************************************************
* USERS AFFECTED: All users of WebSphere MQ for z/OS Version 6 *
****************************************************************
* PROBLEM DESCRIPTION: Multiple queue managers in a qsg hang   *
*                      while waiting for a lock on a list      *
*                      header. Symptoms include:               *
*                      - Queue Manager hangs during startup    *
*                        performing peer level recovery        *
*                      - Tasks accessing shared queues hang    *
*                      - Abend 5C6-00C51027 in CSQERAD1        *
*                      - Abend S026 due to CONNECTOR HANG      *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
A task opening a shared queue calls CSQEUCA1 to update the list header for that queue, and CSQEUCA1 calls IXLLSTC to lock the list header. If the task abends after the lock is granted but before CSQEUCA1 flags that it holds the lock, CSQEUCA1's recovery routine is invoked. Because the successful granting of the lock was not yet recorded, the recovery routine makes no attempt to release the lock. Subsequently, any task on any qmgr in the qsg that attempts to get the lock hangs until the qmgr where the abend occurred is recycled.
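The timing window described above can be illustrated with a small simulation. This is only a sketch of the general pattern, not the CSQEUCA1 code: ListHeaderLock, Abend, update_list_header and lock_held_flag are hypothetical stand-ins for the CF list header lock, a task abend, the update routine and the flag that the recovery routine consults.

```python
class Abend(Exception):
    """Hypothetical stand-in for a task abend."""


class ListHeaderLock:
    """Hypothetical stand-in for the lock on a CF list header."""

    def __init__(self):
        self.owner = None

    def acquire(self, task):
        if self.owner is not None:
            return False          # a real task would wait (hang) here
        self.owner = task
        return True

    def release(self, task):
        if self.owner == task:
            self.owner = None


def update_list_header(lock, task, abend_in_window):
    lock_held_flag = False        # what the recovery routine looks at
    try:
        lock.acquire(task)        # lock is granted here ...
        if abend_in_window:
            # ... but the task abends before the grant is recorded
            raise Abend("abend after grant, before flag is set")
        lock_held_flag = True
        # (the list header would be updated here)
        lock.release(task)
        lock_held_flag = False
    except Abend:
        # Recovery routine: releases the lock only if the flag says so.
        if lock_held_flag:
            lock.release(task)


lock = ListHeaderLock()
update_list_header(lock, "QP23", abend_in_window=True)
print("owner after abend:", lock.owner)          # QP23 still owns the lock
print("QP11 gets lock:", lock.acquire("QP11"))   # False: QP11 would hang
```

In the simulation the lock stays owned by the abended task because the flag was never set, which mirrors why other queue managers wait until the owning queue manager is recycled.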
Problem conclusion
CSQEUCA1 is changed to close the timing window where a lock has been granted but this has not been flagged for the recovery routine. Additionally, CSQGFRCV is changed to save the time of an abend in the FRE to aid in diagnosing similar problems in the future.
000Y
CSQEUCA1
CSQGFRCV
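The APAR text does not say how the window was closed. One generic way to close a window of this kind, shown here purely as an assumption, is to record the intent to lock before making the request and have the recovery routine check actual ownership whenever that intent is set. The sketch also stores a timestamp at abend time, loosely analogous to saving the abend time for diagnosis. All names (ListHeaderLock, is_owned_by, lock_requested, recovery_record) are hypothetical.

```python
import time


class Abend(Exception):
    """Hypothetical stand-in for a task abend."""


class ListHeaderLock:
    """Hypothetical stand-in for the lock on a CF list header."""

    def __init__(self):
        self.owner = None

    def acquire(self, task):
        if self.owner is not None:
            return False          # a real task would wait here
        self.owner = task
        return True

    def release(self, task):
        if self.owner == task:
            self.owner = None

    def is_owned_by(self, task):
        return self.owner == task


def update_list_header(lock, task, abend_in_window, recovery_record):
    lock_requested = False
    try:
        lock_requested = True     # intent recorded BEFORE the request
        lock.acquire(task)
        if abend_in_window:
            raise Abend("abend immediately after the lock is granted")
        # (the list header would be updated here)
        lock.release(task)
        lock_requested = False
    except Abend:
        # Recovery routine: note when the abend happened, then release
        # the lock if it was requested and this task really owns it.
        recovery_record["abend_time"] = time.time()
        if lock_requested and lock.is_owned_by(task):
            lock.release(task)


lock = ListHeaderLock()
record = {}
update_list_header(lock, "QP23", abend_in_window=True, recovery_record=record)
print("owner after abend:", lock.owner)          # None: lock was released
print("QP11 gets lock:", lock.acquire("QP11"))   # True
```

Checking ownership in the recovery routine, rather than trusting a flag set after the grant, means an abend at any point after the request no longer leaves the lock orphaned in this model.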
Temporary fix
*********
* HIPER *
*********
Comments
APAR Information
APAR number
PM69098
Reported component name
WMQ Z/OS V6
Reported component ID
5655L8200
Reported release
000
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2012-07-18
Closed date
2012-08-30
Last modified date
2013-04-03
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
PM69584 UK81431
Modules/Macros
CSQEUCA1 CSQGFRCV
Fix information
Fixed component name
WMQ Z/OS V6
Fixed component ID
5655L8200
Applicable component levels
R000 PSY UK81431
UP12/10/10 P F210
Document Information
Modified date:
03 April 2013