A fix is available
APAR status
Closed as program error.
Error description
The initial problem report was high CPU. The MSTR trace showed a loop between CSQP1GET and CSQP1REL in the SCAVNGOB (object scavenger) thread. PSWs from the systrace showed that CSQISCO2 was driving the loop. The loop was caused by a corruption of the queue object page chain in PSID(00).

The problem occurs when an MQCLOSE with MQCO_DELETE_PURGE is issued for a permanent dynamic queue that still has messages on it, at the same time as the object scavenger is running. A change made by PM93543 (PTF UI11858) means that the lock on the page in page set 0 which contains the queue object is released prematurely, with the result that the page can be deallocated twice.

ABEND symptoms may include:

- ABN=5C6-00C91600,U=SYSOPR ,C=R3600.710.DMC -CSQIERS3,M=CSQGFRCV,LOC=CSQILPLM.CSQIERS3+00000F9E
  where 00C91600 means CSQI_OBJECT_ALREADY_EXISTS

- ABN=5C6-00C90600,U=C000940 ,C=R3600.710.DMC -CSQIMGE6,M=CSQGFRCV,LOC=CSQILPLM.CSQIMGE6+000004C8
  ABN=5C6-00C90600,U=C000940 ,C=R3600.710.DMC -CSQILCHG,M=CSQGFRCV,LOC=CSQILPLM.CSQILCHG+0000051C
  ABN=5C6-00C90600,U=C000940 ,C=R3600.710.DMC -CSQILVAL,M=CSQGFRCV,LOC=CSQILPLM.CSQILVAL+0000038A
  where 00C90600 means CSQI_NO_RECORD_FOUND

In the reported case, the queue manager was part of a Queue Sharing Group (QSG). The loop prevented MQ from processing other work, so the following messages were issued:

IXC431I GROUP CSQGSEP0 MEMBER QBPK JOB ssidMSTR ASID nnnn
STALLED AT mm/dd/yyyy hh:mm:ss.ssssss ID: 0.1
LAST MSGX: mm/dd/yyyy hh:mm:ss.ssssss nn STALLED nnnn PENDINGQ
LAST GRPX: mm/dd/yyyy hh:mm:ss.ssssss n STALLED n PENDINGQ
LAST STAX: 0 STALLED
IXC430E SYSTEM ssss HAS STALLED XCF GROUP MEMBERS

The inbound paths for the Coupling Facility (CF) were affected because messages were accumulating. XCF tried to restart the PATHINs to alleviate the problem:

IXC467I RESTARTING PATHIN STRUCTURE IXC_PATH_S2 LIST 15
USED TO COMMUNICATE WITH SYSTEM ssss
RSN: START CONVERTED TO RESTART

Eventually, people could not log in or issue system commands, and an IPL was required. The queue manager abended with ABN=5C6-00C91600 upon restart. A zap from the change team was required to repair the chain and get the queue manager restarted.

Additional Symptom(s) Search Keyword(s):
ABEND5C6 ABENDS5C6 5C6 S5C6 S05C6 00C90600 00C91600 performance loops looping PSID 00
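When scanning a job log for the ABN symptom strings listed in the error description, a small parser can help. The following is a minimal sketch only: the function name, the regular expression and the dictionary are illustrative assumptions, not part of any MQ tooling, and the only reason-code meanings included are the two stated in this APAR.

```python
import re

# Reason-code meanings exactly as given in this APAR's error description.
# The table is deliberately limited to those two codes.
REASON_CODES = {
    "00C91600": "CSQI_OBJECT_ALREADY_EXISTS",
    "00C90600": "CSQI_NO_RECORD_FOUND",
}

# Hypothetical pattern for symptom strings of the form
# ABN=5C6-00C91600,...,LOC=CSQILPLM.CSQIERS3+00000F9E
ABN_PATTERN = re.compile(
    r"ABN=(?P<abend>[0-9A-F]{3})-(?P<reason>[0-9A-F]{8})"
    r".*?LOC=(?P<module>\w+)\.(?P<csect>\w+)\+(?P<offset>[0-9A-F]+)"
)


def classify_symptom(line):
    """Return the parsed fields of an ABN symptom string, or None if no match."""
    match = ABN_PATTERN.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # Codes not documented in this APAR are reported as "unknown".
    info["meaning"] = REASON_CODES.get(info["reason"], "unknown")
    return info


if __name__ == "__main__":
    sample = ("ABN=5C6-00C91600,U=SYSOPR ,C=R3600.710.DMC -CSQIERS3,"
              "M=CSQGFRCV,LOC=CSQILPLM.CSQIERS3+00000F9E")
    print(classify_symptom(sample))
```

A line that matches with CSECT CSQIERS3, CSQIMGE6, CSQILCHG or CSQILVAL in load module CSQILPLM is a candidate for this APAR, but the match alone does not confirm it.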
Local fix
Contact the support center for a zap to repair PSID(00). Technote http://www.ibm.com/support/docview.wss?uid=swg21256847 has instructions on how to use ADRDSSU to ship the page set.
Back out UI11858.
Problem summary
****************************************************************
* USERS AFFECTED: All users of WebSphere MQ for z/OS Version 7 *
*                 Release 1 Modification 0.                    *
****************************************************************
* PROBLEM DESCRIPTION: After applying UI11858, or its          *
*                      superseded PTF UI13085, users may       *
*                      experience a chain corruption in        *
*                      PSID(0) if using permanent dynamic      *
*                      queues. This can result in a loop       *
*                      in an SRB in the queue manager.         *
*                                                              *
*                      Symptoms include one or more of         *
*                      the following:                          *
*                      - The queue manager issues abend        *
*                        5C6-00C90600 in CSQIMGE6,             *
*                        CSQILCHG and CSQILVAL                 *
*                      - The queue manager issues abend        *
*                        5C6-00C90B00 in CSQIMGE9              *
*                      - The queue manager issues abend        *
*                        5C6-00C91600 in CSQIERS3 during       *
*                        startup and fails to start            *
*                      - The queue manager is using high       *
*                        CPU in load-module CSQILPLM,          *
*                        CSECT CSQISCO2                        *
*                      - The queue manager storage usage       *
*                        increases over time                   *
*                      - The LPAR responsiveness is reduced,   *
*                        particularly on LPARs with few CPs    *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
UI11858 (PM93543) has introduced changes to the processing of MQCLOSE API calls for permanent dynamic queues, where the MQCO_DELETE_PURGE option is specified. These changes release the lock on the page holding the queue definition prematurely, opening a small timing window where the same page could be deallocated twice. If the scavenger is invoked and tries to release pages on pageset 0 with a corrupt chain, it can go into a loop.

This problem can also occur for a "DELETE QLOCAL() PURGE" command issued against a local or permanent dynamic queue.
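To illustrate why deallocating the same page twice can make a chain walker loop, here is a toy model in Python. It is a sketch under loud assumptions: the singly linked free chain, the names `free_page` and `walk_chain`, and the page numbers are all hypothetical and do not represent the actual page set 0 layout or the CSQISCO2 logic.

```python
def free_page(chain_head, next_ptr, page):
    """Push `page` onto the front of a singly linked chain (toy model)."""
    next_ptr[page] = chain_head
    return page  # the freed page becomes the new chain head


def walk_chain(chain_head, next_ptr, limit):
    """Walk the chain, giving up after `limit` pages instead of looping forever."""
    seen = []
    page = chain_head
    while page is not None:
        if len(seen) >= limit:
            raise RuntimeError("chain longer than expected: possible cycle")
        seen.append(page)
        page = next_ptr[page]
    return seen


if __name__ == "__main__":
    # Healthy case: two different pages are freed once each.
    next_ptr = {}
    head = free_page(None, next_ptr, 7)
    head = free_page(head, next_ptr, 9)
    print(walk_chain(head, next_ptr, limit=10))

    # Corrupt case: page 7 is freed twice, so it ends up pointing at itself.
    next_ptr = {}
    head = free_page(None, next_ptr, 7)
    head = free_page(head, next_ptr, 7)
    try:
        walk_chain(head, next_ptr, limit=10)
    except RuntimeError as err:
        print(err)
```

In this model the second free of page 7 overwrites its forward pointer with its own number, so an unbounded walk would never reach the end of the chain. Holding the page lock until the delete completes, as described in the problem conclusion, is what prevents the second free.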
Problem conclusion
The code has been changed to release the page locks at the correct point in the delete processing, ensuring that the object page chain remains consistent.
100Y
CSQIDDEL CSQIDEL3 CSQIMGE3 CSQLRSAV CSQMHDRS CSQMRPUT CSQMSSUB CSQMSUB CSQMSUBI CSQMSUBV
Temporary fix
*********
* HIPER *
*********
Comments
APAR Information
APAR number
PI25131
Reported component name
WMQ Z/OS V7
Reported component ID
5655R3600
Reported release
100
Status
CLOSED PER
PE
YesPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2014-09-04
Closed date
2014-09-30
Last modified date
2014-11-04
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
PI26011 UI21836
Modules/Macros
CSQIDDEL CSQIDEL3 CSQIMGE3 CSQLRSAV CSQMHDRS CSQMRPUT CSQMSSUB CSQMSUB CSQMSUBI CSQMSUBV
Fix information
Fixed component name
WMQ Z/OS V7
Fixed component ID
5655R3600
Applicable component levels
R100 PSY UI21836
UP14/10/17 P F410
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
04 November 2014