A fix is available
APAR status
Closed as program error.
Error description
The first report of this problem was that queue managers in a Queue Sharing Group (QSG) are hung and show GRS contention. DISPLAY GRS STATUS shows yyyyMSTR owns an exclusive lock and ALL other QMGRs in the QSG are waiting.

D GRS,C showed:

S=SYSTEMS SYSZCSQE PeerLevelRecoveryzzzz0000000C00000039
SYSNAME JOBNAME  ASID TCBADDR  EXC/SHR   STATUS
MFOS    yyyyMSTR 008C 00960140 EXCLUSIVE OWN
MBOS    aaaaMSTR 0095 00962140 EXCLUSIVE WAIT
MBOS    bbbbMSTR 010E 0096C7F0 EXCLUSIVE WAIT
MAOS    ccccMSTR 0093 00973A48 EXCLUSIVE WAIT
MAOS    ddddMSTR 017E 0096B988 EXCLUSIVE WAIT
etc.

MQ error messages logged:

CSQ3201E +xxxx ABNORMAL EOT IN PROGRESS FOR USER=WSSRWDP CONNECTION-ID=RRSBATCH THREAD-XREF=
CSQE007I +yyyy EEPLDISCFAILCONNECTION event received for structure MQRBSDJ02 connection name CSQEzzzzxxxx0C
CSQE008I +yyyy Recovery event from xxxx received for structure MQRBSDJ02

Dumps taken included:

Dump Title: xxxx,ABN=5C6-00C51027,U=SYSOPR CSQERAD1,M=CSQGFRCV,LOC=CSQELPLM.CSQERAD1+0912
Dump Title: ABEND=S026,REASON=08118001,CONNECTOR HANG: CONNAME=CSQEzzzzyyyy0D,JOBNAME=yyyyMSTR

SYSLOG also showed IXL041E and IXL049E received for the structure. LOGREC shows abend S13E occurring in XES code.

Additional symptoms:

Subsequent reports of this problem did not necessarily have the symptom of GRS contention. Their symptoms included:

- An MQ dump shows that MQ system threads and application threads were waiting for latch SSSCONN / DMCSEGAL (the csObjectLatch). The latch owner was in a wait in CSQEUCAT for a lock in the Coupling Facility (CF). The lock owner had abended after CSQEUCAT obtained the lock and before it updated its flag to indicate that the lock is held. Therefore, when the abend occurred, the lock was not released.
- The command processor (thread RTSSRV01) was not working because it was waiting for the latch.
- Application or channel threads may be hung in CSQE* modules, e.g. waiting in CSQEMPUT with the TCB waiting in IXLRQRSU. An attempt to restart the channel fails with CSQX514E channel is active.
- CSQE020E CSQ1 Structure <strucid> connection as <connection> failed, RC=0000000C reason=02010C27 codes=00000002 00000008 00000C27
- CSQE007I CSQ1 EEPLDISCFAILCONNECTION event received for structure APPLICATION1 connection name CSQECSQSCSQ101
- ABEND=S026 REASON=08118001
  ABENDS026 ABEND026 ABEND S026 026
- ABEND878-10 due to a build-up of DXWB control blocks
  ABENDS878 ABEND878 ABEND S878 878

L2 Verification Steps: See details in the internal Level 2 forum
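When checking for the GRS contention symptom described above, it can help to pull the owner and the waiters out of captured D GRS,C output. The following is only a minimal sketch: the row layout is assumed from the excerpt in this APAR (SYSNAME, JOBNAME, ASID, TCBADDR, EXC/SHR, STATUS), and the function name summarize_contention is hypothetical, not part of any IBM tooling.

```python
import re

# Row layout assumed from the D GRS,C excerpt in this APAR:
# SYSNAME JOBNAME ASID TCBADDR EXC/SHR STATUS
ROW = re.compile(
    r"^(?P<sysname>\S+)\s+(?P<jobname>\S+)\s+(?P<asid>[0-9A-F]{4})\s+"
    r"(?P<tcb>[0-9A-F]{8})\s+(?P<mode>\S+)\s+(?P<status>OWN|WAIT)$"
)


def summarize_contention(display_text):
    """Return (owners, waiters) as lists of job names (hypothetical helper)."""
    owners, waiters = [], []
    for line in display_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # skips the resource line, the column header, and blanks
        target = owners if match["status"] == "OWN" else waiters
        target.append(match["jobname"])
    return owners, waiters


SAMPLE = """\
S=SYSTEMS SYSZCSQE PeerLevelRecoveryzzzz0000000C00000039
SYSNAME JOBNAME  ASID TCBADDR  EXC/SHR   STATUS
MFOS    yyyyMSTR 008C 00960140 EXCLUSIVE OWN
MBOS    aaaaMSTR 0095 00962140 EXCLUSIVE WAIT
MBOS    bbbbMSTR 010E 0096C7F0 EXCLUSIVE WAIT
MAOS    ccccMSTR 0093 00973A48 EXCLUSIVE WAIT
MAOS    ddddMSTR 017E 0096B988 EXCLUSIVE WAIT
"""

owners, waiters = summarize_contention(SAMPLE)
print("owner(s): ", owners)
print("waiter(s):", waiters)
```

A single owner with every other xxxxMSTR address space in WAIT matches the pattern reported in the first occurrence of this problem.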
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: All users of WebSphere MQ for z/OS Version 7 *
*                 Release 0 Modification 1 and Release 1       *
*                 Modification 0.                              *
****************************************************************
* PROBLEM DESCRIPTION: Multiple queue managers in a qsg hang   *
*                      while waiting for a lock on a list      *
*                      header. Symptoms include:               *
*                      - Queue Manager hangs during startup    *
*                        performing peer level recovery        *
*                      - Tasks accessing shared queues hang    *
*                      - Abend 5C6-00C51027 in CSQERAD1        *
*                      - Abend S026 due to CONNECTOR HANG      *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
A task opening a shared queue calls CSQEUCAT to update the list header for that queue, and this calls IXLLSTC to lock the list header. After the lock is granted, but before CSQEUCAT flags this, the task abends and CSQEUCAT's recovery routine is invoked, but because the successful granting of the lock was not yet recorded, no attempt is made to release the lock.

Subsequently, any task on any qmgr in the qsg attempting to get the lock hangs until the qmgr where the abend occurred is recycled.
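The timing window described above can be illustrated with a small simulation. This is only a sketch of the general pattern, not MQ or XES code: ListHeaderLock, TaskAbend and update_list_header are hypothetical stand-ins for the CF list header lock, an abend, and the CSQEUCAT update path with its recovery routine.

```python
class ListHeaderLock:
    """Hypothetical stand-in for the CF list header lock; tracks the owner only."""

    def __init__(self):
        self.owner = None

    def acquire(self, task):
        if self.owner is not None:
            # A real requester would wait; the simulation reports it instead.
            raise RuntimeError(f"{task} would wait: lock held by {self.owner}")
        self.owner = task

    def release(self, task):
        if self.owner == task:
            self.owner = None


class TaskAbend(Exception):
    """Simulated abend of the task."""


def update_list_header(lock, task, abend_in_window=False):
    lock_held_flag = False
    try:
        lock.acquire(task)            # the lock is granted here
        if abend_in_window:
            raise TaskAbend(task)     # abend BEFORE the flag is set
        lock_held_flag = True
        # ... the list header update would happen here ...
        lock.release(task)
        lock_held_flag = False
    except TaskAbend:
        # Recovery routine: trusts only the flag, so the lock leaks
        # when the abend falls inside the window.
        if lock_held_flag:
            lock.release(task)


lock = ListHeaderLock()
update_list_header(lock, "task-A", abend_in_window=True)
print("owner after abend:", lock.owner)

try:
    update_list_header(lock, "task-B")
except RuntimeError as err:
    print(err)
```

In the simulation, task-A remains the recorded owner after its abend, so every later requester is blocked until the owner is cleared, which corresponds to recycling the queue manager in the real case.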
Problem conclusion
CSQEUCAT is changed to close the timing window where a lock has been granted but this has not been flagged for the recovery routine. Additionally, CSQGFRCV is changed to save the time of an abend in the FRE to aid in diagnosing similar problems in the future.

010Y 100Y
CSQEUCAT CSQGFRCV
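This APAR does not say how CSQEUCAT closes the window. One common way to close this kind of window, shown here purely as an assumption and not as the actual change, is to record the intent before the lock request and let recovery check real ownership. All names (ListHeaderLock, held_by, update_list_header, the diagnostics dictionary standing in for the FRE) are hypothetical.

```python
import time


class ListHeaderLock:
    """Hypothetical stand-in for the CF list header lock."""

    def __init__(self):
        self.owner = None

    def acquire(self, task):
        if self.owner is not None:
            raise RuntimeError(f"{task} would wait: lock held by {self.owner}")
        self.owner = task

    def release(self, task):
        if self.owner == task:
            self.owner = None

    def held_by(self, task):
        return self.owner == task


class TaskAbend(Exception):
    """Simulated abend of the task."""


def update_list_header(lock, task, diagnostics, abend_in_window=False):
    lock_requested = False
    try:
        lock_requested = True         # intent recorded BEFORE the request
        lock.acquire(task)
        if abend_in_window:
            raise TaskAbend(task)     # same abend point as in the failure
        # ... the list header update would happen here ...
        lock.release(task)
        lock_requested = False
    except TaskAbend:
        # Save the abend time for later diagnosis (assumed analogue of
        # saving it in the FRE).
        diagnostics["abend_time"] = time.time()
        # Recovery checks actual ownership, so no window remains between
        # the grant and the bookkeeping.
        if lock_requested and lock.held_by(task):
            lock.release(task)


lock = ListHeaderLock()
diag = {}
update_list_header(lock, "task-A", diag, abend_in_window=True)
print("owner after abend:", lock.owner)
update_list_header(lock, "task-B", {})
print("task-B completed; owner:", lock.owner)
```

The design choice in this sketch is that recovery never depends on a flag set after the grant: either the request was never issued, or ownership is verified directly before release.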
Temporary fix
*********
* HIPER *
*********
Comments
APAR Information
APAR number
PM69584
Reported component name
WMQ Z/OS V7
Reported component ID
5655R3600
Reported release
010
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2012-07-25
Closed date
2012-09-11
Last modified date
2013-04-03
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UK81713 UK81714
Modules/Macros
CSQEUCAT CSQGFRCV
Fix information
Fixed component name
WMQ Z/OS V7
Fixed component ID
5655R3600
Applicable component levels
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
03 April 2013