IBM Support

PH42296: MQ FOR Z/OS: A LOOP OCCURS IN CSQVSRX WHILE HOLDING THE LOCAL LOCK FOR THE MQ MSTR ADDRESS SPACE 22/01/10 PTF PECHANGE

A fix is available

APAR status

  • Closed as program error.

Error description

  • A standalone dump was created as the system became
    unresponsive, and TSO and CICS users were unable to log on.
    SYSLOG stopped recording messages.
    
    The z/OS team determined that a TCB ("TCB1") in JOB *MASTER*
    (ASID 1) was running its end of memory (EOM) resource managers.
    IPCS ANALYZE RESOURCE showed that the LOCAL LOCK FOR ASID 0001
    was held by TCB1 with DATA=SUSPENDED AND NOT DISPATCHABLE.
    Another TCB ("TCB2") in ASID 1 held the LOCAL LOCK for an MQ
    MSTR job.  This second lock was needed by that MSTR job, its
    CHIN job, and other jobs including CICS.  TCB2 was in a loop
    per system trace and had not released the lock for some time.
    
    The loop was in routine SetEBCncl in module CSQVSRX, which
    performs CSQVCNCL processing. The code loops until it succeeds
    in setting bit EBCancel on, but because the EB control block
    was in an unexpected state, it looped forever.  The
    inconsistent state in the EB arose from a timing window between
    suspend and resume processing when the suspending task abends.
    
    
    The scenario leading up to the problem:
    CSQMCLMT was called for local memory termination for MQ allied
    (application) address spaces. For CICS address spaces, it
    disconnects each of the associated EBs and, if they are
    suspended, it wakes them up with a CSQVCNCL call. In this case,
    TCB2 was disconnecting an EB for the CICS job.
    
    There was a large amount of latch contention for the
    IVSA.csCursLatch for a queue. This occurred due to many
    transactions simultaneously doing MQGET by MsgId from the
    queue, which had a high CURDEPTH.  MQ latch control was
    required while MQ was locating the correct message. The queue
    was not indexed, and message CSQI004I was issued for the queue.
    CSQI004I indicates that indexing the queue (setting INDXTYPE)
    will improve performance.
    
    This latch contention caused the TCBs for the transactions to
    be suspended and resumed frequently. At the same time, storage
    shortages in the CICS region caused many SRB-to-task
    percolations, and CICS TCBs were abended with S878 while
    running in MQ code.
    
    A combination of MQ trace and SYSTRACE shows the order of
    events for TCB2:
     1) The TCB was paused in CSQVSUSP waiting for the
        IVSA.csCursLatch.
     2) Almost immediately afterwards, the latch was released and
        the TCB was resumed in CSQVRESM. The resume cleared many
        key fields in the EB.
     3) SRB-to-task percolation occurred, and the TCB was ABTERMed
        with ABEND S878-10. This drove recovery routine CSQVSRRX
        for the TCB.
     4) The recovery code identified that a resume had occurred and
        set a temporary value in EBSROB. This was to prevent the
        field from being used improperly during the dumping
        process.
     5) Recovery decided to create a dump for the 878-10 abend.
     6) The CICS address space was memtermed, preventing further
        recovery processing for the TCB.
    
    This processing resulted in the temporary value being left in
    EBSROB. Part of the memterm processing for the CICS application
    address space resulted in each of the EBs for the address space
    being disconnected with a call to CSQVCNCL in CSQVSRX. This was
    where the looping occurred due to the temporary value in
    EBSROB. If not for the memterm, this temporary value would have
    been cleared by CSQVSRRX after the dump was taken.
    
    The CSQVSRX code should not be allowed to loop while holding
    the local lock for the queue manager.  In the reported case,
    the system was IPL'd to clear the problem.
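
The failure mode described above, a retry loop that can never succeed because a placeholder value was left behind, can be illustrated with a small model. This is only a sketch, not MQ code: the `EB` class, the `SROB_TEMPORARY` encoding, and the reason the single attempt refuses to proceed are all assumptions made for illustration, since the APAR does not publish the EB layout or the SetEBCncl logic. The `spin_limit` parameter exists only so the demonstration terminates; the loop in the reported problem had no such limit.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical encodings; the real EBSROB values are not given in the APAR.
SROB_NONE = 0        # nothing recorded in the field
SROB_TEMPORARY = -1  # placeholder left by the recovery routine (assumed)


@dataclass
class EB:
    """Toy stand-in for an execution block with the two fields of interest."""
    srob: int = SROB_NONE
    cancel: bool = False


def try_set_cancel(eb: EB) -> bool:
    """Make one attempt to set the cancel bit.

    Assumption: the attempt is refused while the field holds the temporary
    value, because that state is expected to be short-lived.
    """
    if eb.srob == SROB_TEMPORARY:
        return False
    eb.cancel = True
    return True


def set_cancel_with_retry(eb: EB, spin_limit: int) -> Optional[int]:
    """Retry until the bit is set; return the attempt count, or None if the
    demonstration limit was reached without success."""
    for attempt in range(1, spin_limit + 1):
        if try_set_cancel(eb):
            return attempt
    return None


if __name__ == "__main__":
    healthy = EB()
    stuck = EB(srob=SROB_TEMPORARY)  # recovery never ran to completion
    print(set_cancel_with_retry(healthy, 1000))
    print(set_cancel_with_retry(stuck, 1000))
```

In this model the healthy EB is cancelled on the first attempt, while the EB holding the placeholder exhausts every attempt. Nothing else will ever clear the field once the owning address space has been memtermed, so without the limit the caller would spin indefinitely while holding whatever lock it acquired beforehand.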
    

Local fix

  • None, other than taking steps to avoid the timing window by
    preventing the ABEND878 and by altering the INDXTYPE of the
    queue.
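
The INDXTYPE attribute is set with the MQSC command ALTER QLOCAL, for example with INDXTYPE(MSGID) for a queue that is read by MsgId. The following sketch shows, in general terms, why an index shortens the time spent locating a message and therefore the time a latch would need to be held. It is a conceptual model only: the function names and the list-plus-dictionary representation are hypothetical and say nothing about how MQ stores or indexes messages internally.

```python
from typing import Dict, List, Optional, Tuple

Message = Tuple[bytes, str]  # (msg_id, payload)


def get_by_scan(queue: List[Message], msg_id: bytes) -> Tuple[Optional[str], int]:
    """Walk the queue from the head; return (payload, messages examined)."""
    for examined, (mid, payload) in enumerate(queue, start=1):
        if mid == msg_id:
            return payload, examined
    return None, len(queue)


def build_index(queue: List[Message]) -> Dict[bytes, int]:
    """Map each msg_id to its position in the queue."""
    return {mid: pos for pos, (mid, _) in enumerate(queue)}


def get_by_index(queue: List[Message], index: Dict[bytes, int],
                 msg_id: bytes) -> Tuple[Optional[str], int]:
    """Look the message up directly; return (payload, messages examined)."""
    pos = index.get(msg_id)
    if pos is None:
        return None, 0
    return queue[pos][1], 1


if __name__ == "__main__":
    depth = 100_000  # stands in for a high CURDEPTH
    queue = [(i.to_bytes(4, "big"), f"payload-{i}") for i in range(depth)]
    target = (depth - 1).to_bytes(4, "big")
    index = build_index(queue)
    print(get_by_scan(queue, target)[1], "messages examined without an index")
    print(get_by_index(queue, index, target)[1], "message examined with an index")
```

The work done per lookup without an index grows with queue depth, which is consistent with the contention reported when many transactions issued MQGET by MsgId against a deep, unindexed queue.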
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All users of IBM MQ for z/OS Version 9       *
    *                 Release 1 Modification 0 and Release 2       *
    *                 Modification 0.                              *
    ****************************************************************
    * PROBLEM DESCRIPTION: A MEMTERM of an MQ allied address space *
    *                      may result in a hang in End of Memory   *
    *                      (EOM) processing for the address space. *
    *                      Examining the *MASTER* TCB responsible  *
    *                      for the EOM processing shows that it is *
    *                      looping in CSECT CSQVSRX, while holding *
    *                      the local lock for the QMGR MSTR        *
    *                      address space.                          *
    ****************************************************************
    If an address space is MEMTERMed, then recovery routines will
    not get control. MQ recovery routine CSQVSRRX sets a temporary
    value in an abended task's EBSROB. If the task's home address
    space is MEMTERMed, then this temporary value may persist.
    
    During end of memory processing for an allied address space,
    active EBs will be disconnected from the QMGR address space.
    This may require a call to routine CSQVCNCL in CSECT CSQVSRX.
    This routine cannot handle the temporary EBSROB value, and will
    loop indefinitely while holding the local lock for the QMGR MSTR
    address space.
    

Problem conclusion

  • CSQVCNCL has been corrected to handle the temporary EBSROB
    value.
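
The APAR does not describe how the corrected CSQVCNCL treats the temporary value. One plausible shape for such a correction, shown here purely as an assumption, is to recognise the placeholder explicitly and complete in a single pass instead of retrying. The `EB` class, the `SROB_TEMPORARY` encoding, and the returned status strings are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical encodings; the real EBSROB values are not given in the APAR.
SROB_NONE = 0
SROB_TEMPORARY = -1


@dataclass
class EB:
    """Toy stand-in for an execution block."""
    srob: int = SROB_NONE
    cancel: bool = False


def set_cancel_tolerant(eb: EB) -> str:
    """Set the cancel bit without looping and report what was found.

    Assumption: a leftover temporary value means the owning task abended and
    its recovery never finished, so there is no waiter to resume and the
    placeholder can simply be discarded.
    """
    if eb.srob == SROB_TEMPORARY:
        eb.srob = SROB_NONE
        eb.cancel = True
        return "cleared-temporary"
    eb.cancel = True
    return "set"


if __name__ == "__main__":
    print(set_cancel_tolerant(EB()))
    print(set_cancel_tolerant(EB(srob=SROB_TEMPORARY)))
```

The point of the sketch is the control flow: every path through the routine finishes after a bounded amount of work, so a caller holding a lock cannot be trapped by an EB whose state will never change.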
    

Temporary fix

Comments

APAR Information

  • APAR number

    PH42296

  • Reported component name

    IBM MQ Z/OS V9

  • Reported component ID

    5655MQ900

  • Reported release

    100

  • Status

    CLOSED PER

  • PE

    YesPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-11-22

  • Closed date

    2022-01-10

  • Last modified date

    2022-03-01

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UI78860 UI78861

Modules/Macros

  • CSQ0CACB CSQ0COPN CSQ0DEAD CSQ0DPCS CSQ0DSVC CSQ0ERST CSQ0IPRH
    CSQ0LEPL CSQ3AAES CSQ3AM00 CSQ3AMFR CSQ3AUCM CSQ3AUCN CSQ3AUFR
    CSQ3AUGI CSQ3CT30 CSQ3CT80 CSQ3EXT0 CSQ3GCAB CSQ3ID80 CSQ3IDES
    CSQ3LCHX CSQ3PR00 CSQ3RIA0 CSQ3RIM0 CSQ3RIND CSQ3RRSR CSQ3RRSX
    CSQ3RRXF CSQ3SSES CSQ3SSFR CSQ9SCN9 CSQAPRHX CSQARIB  CSQGEXIT
    CSQGFFRR CSQGFRCV CSQGGEPL CSQIRECP CSQJB004 CSQJC001 CSQJC003
    CSQJC006 CSQJC008 CSQJC09A CSQJCR01 CSQJOFF6 CSQJOFF9 CSQJPOPN
    CSQJR007 CSQJR06A CSQJRE01 CSQJRE08 CSQJRE26 CSQJW008 CSQJW206
    CSQJWE01 CSQMALCH CSQMCALH CSQMCCHT CSQMCDLC CSQMCFEF CSQMCFRQ
    CSQMCFTK CSQMCFWU CSQMCIDT CSQMCLMT CSQMCMHB CSQMCPRH CSQMCRES
    CSQMCTXE CSQMCTXS CSQMFMH1 CSQMXARH CSQMXCLN CSQMZLOO CSQRCAFR
    CSQRCRFR CSQRCRQS CSQRCRSC CSQRCSHT CSQRCURS CSQRIURS CSQRPBCS
    CSQRPBCW CSQRPECS CSQRPLCS CSQRRRQS CSQRRURS CSQRUA01 CSQRUB01
    CSQRUC01 CSQRUE01 CSQSCON  CSQSCON2 CSQSDMPS CSQSFACL CSQSFBK
    CSQSFPL  CSQSGMN  CSQSHDWN CSQSPOWN CSQSPURS CSQSRSUP CSQSTERM
    CSQSVPL  CSQUZAP  CSQV002M CSQVCFRR CSQVCONN CSQVCRTH CSQVCST0
    CSQVDISC CSQVDST0 CSQVEOT1 CSQVEUS1 CSQVEUS2 CSQVEUS3 CSQVEUS4
    CSQVFACE CSQVFEB  CSQVGACE CSQVIALC CSQVLEPL CSQVLFRR CSQVLTT0
    CSQVSDC0 CSQVSLK  CSQVSLT0 CSQVSRRX CSQVSRX  CSQVSUL0 CSQVTFRR
    CSQVTRTH CSQVUTIL CSQVXLT0 CSQVXUL0 CSQWAAPI CSQWACC6 CSQWACCV
    CSQWDSD0 CSQWDSDM CSQWDST2 CSQWVFRR CSQWVOPX CSQWVSMT CSQWVSR2
    CSQWVZSA CSQWVZSS CSQWVZXT CSQWWFST CSQXDTRM CSQXFSTR CSQXGRIM
    CSQXJST  CSQXSUPR CSQXTCNC CSQXTCTL CSQYALLI CSQYASCP CSQYEAT2
    CSQYEATE CSQYEPL0 CSQYESCF CSQYESWE CSQYLGBL CSQYLGUN CSQYMESP
    CSQYMESS CSQYSIRM CSQYSTRT
    

Fix information

  • Fixed component name

    IBM MQ Z/OS V9

  • Fixed component ID

    5655MQ900

Applicable component levels

  • R100 PSY UI78861

       UP22/01/27 P F201

  • R200 PSY UI78860

       UP22/02/10 P F202

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
02 March 2022