IBM Support

PH50958: MQ Z/OS: A SUSPENDED JOB THREAD OR CHANNEL IS NOT RESUMED, FOR EXAMPLE FROM A LATCH WAIT, DUE TO A RARE TIMING CONDITION

A fix is available

APAR status

  • Closed as program error.

Error description

  • Original APAR title:
    MQ Z/OS: A CHANNEL REMAINS IN THE STOPPING STATE DUE TO THE
    RESUME FROM A LATCH SUSPEND NOT TAKING PLACE CORRECTLY
    .
    .
    A suspended SRB (service request block) was marked as resumed
    but was still paused. A very small timing window has been found
    in the suspend/resume code that caused the resume not to take
    effect.
    
    In the reported case, the symptom was that a receiver channel
    had CHSTATUS attributes that included STATUS(STOPPING),
    SUBSTATE(MQICALL), and STOPREQ(YES).
    
    STOP CHANNEL(channel-name) MODE(FORCE) did not end it.
    
    An attempt to start the channel from the sender queue manager
    resulted in:
      AMQ9558E: The remote channel '<channel-name>' on host
      '<ip-address>' is not currently available.
    
    CSQX558E is the z/OS equivalent of AMQ9558E.
    
    The message in the CHIN log was:
      CSQX514E CSQXRESP Channel <channel-name> is active
    This was despite the QMGR definition having settings of
    ADOPTCHK(ALL) and ADOPTMCA(ALL).
    
    From the TCP/IP perspective, the socket on the z/OS side was
    in the CLOSEWAIT (CLOSWT) state.
    
    In a dump, the thread for the channel was in commit processing
    (modules CSQMCCMT and CSQRUC01) for batch confirmation. The
    commit processing was scheduled on a CHIN adapter TCB. The
    adapter scheduled an SRB to CSQRUCA3 to complete the commit.
    The adapter subsequently suspended in CSQVSRX waiting for this
    request to complete. The SRB tried to obtain the
    IVSA.csObjectLatch latch for a queue and suspended in CSQVXLT0
    due to latch contention. Normally a latch wait is short, but in
    this case, the wait was for hours or days.
    
    The latch was no longer held, yet the waiting thread did not
    wake up from its suspend state.
    Additional symptoms and keywords:
    --------------------------------
    CLOSWT CLOSE_WAIT CLOSE-WAIT
    .
    Symptoms can vary based on the function that was not resumed.
    Channels, IMS, Db2, and other jobs can be affected.
    .
    In one case, the suspend came from CSQJW101 for a write to
    the active log. The I/O completed, but the waiter was not
    successfully resumed.
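
The channel symptom reported above (STATUS(STOPPING), SUBSTATE(MQICALL), and STOPREQ(YES) together) can be checked for mechanically. The following is a minimal sketch only: it assumes the channel status attributes are available as plain text in KEYWORD(value) form, and the function names and the sample string are hypothetical, not output captured from a real system. A match is a hint to investigate further, not proof that this APAR applies.

```python
import re

# Attribute values the APAR reports for the hung receiver channel.
HUNG_SIGNATURE = {"STATUS": "STOPPING", "SUBSTATE": "MQICALL", "STOPREQ": "YES"}


def parse_attributes(text):
    """Collect KEYWORD(value) pairs from MQSC-style text into a dict."""
    pairs = re.findall(r"(\w+)\(([^()]*)\)", text)
    return {key.upper(): value.strip().upper() for key, value in pairs}


def matches_hung_signature(text):
    """Return True when all three reported attribute values are present."""
    attrs = parse_attributes(text)
    return all(attrs.get(key) == value for key, value in HUNG_SIGNATURE.items())


# Hypothetical sample text (an assumption, not real command output).
sample = "CHSTATUS(EXAMPLE.CHANNEL) STATUS(STOPPING) SUBSTATE(MQICALL) STOPREQ(YES)"
print(matches_hung_signature(sample))
```

The other symptoms described above (the CLOSWT socket state and the CSQX514E message) would still need to be confirmed separately.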
    

Local fix

  • To clear the hung thread, recycle the QMGR and CHIN. Use
    STOP MODE(FORCE) if necessary. If shutdown does not complete,
    cancel the address spaces, starting with the CHIN or other
    hung job.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All users of IBM MQ for z/OS Version 9       *
    *                 Release 1 Modification 0,                    *
    *                 Release 2 Modification 0 and                 *
    *                 Release 3 Modification 0.                    *
    ****************************************************************
    * PROBLEM DESCRIPTION: A task running inside MQ is suspended,  *
    *                      for example while waiting for a latch,  *
    *                      and is not correctly woken up when the  *
    *                      task should resume processing.          *
    ****************************************************************
    During CSQVSUSP processing, IEAVPSE reported that the provided
    Pause Element Token (PET) was stale. CSQVSUSP updated the ROB
    with a valid PET and retried the IEAVPSE request, which
    suspended the task.
    However, CSQVRESM was already in the process of resuming the
    same ROB using the stale PET. Under rare timing conditions,
    CSQVRESM does not detect that the PET has changed and does not
    release the Pause Element (PE).
    As a result, the suspended task remains hung.
    

Problem conclusion

  • CSQVRESM is changed to appropriately handle the stale PET when
    the reported timing condition occurs.
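
The timing window described in the problem summary can be illustrated with a small single-threaded model that replays the interleaving step by step. This is a sketch under stated assumptions, not MQ or z/OS code: PauseService, Rob, and run are hypothetical names, the token handling is a simplified stand-in for the real pause and release services, and the recheck option shows only one possible way of handling a changed token. It is not a description of the actual change made to CSQVRESM.

```python
class PauseService:
    """Toy stand-in for a pause/release service (an assumption, not IEAVPSE)."""

    def __init__(self):
        self._next_token = 0
        self._valid = set()
        self.paused = set()

    def new_token(self):
        self._next_token += 1
        self._valid.add(self._next_token)
        return self._next_token

    def retire(self, token):
        # Make a token stale without releasing anything.
        self._valid.discard(token)

    def pause(self, token):
        if token not in self._valid:
            return "STALE"
        self.paused.add(token)
        return "PAUSED"

    def release(self, token):
        if token not in self._valid:
            return "STALE"
        self._valid.discard(token)
        self.paused.discard(token)
        return "RELEASED"


class Rob:
    """Hypothetical control block holding the current token."""

    def __init__(self, pet):
        self.pet = pet
        self.marked_resumed = False


def run(recheck):
    svc = PauseService()
    rob = Rob(svc.new_token())
    svc.retire(rob.pet)           # the token in the control block is stale

    seen = rob.pet                # resumer reads the token, then is delayed

    # Suspender runs inside the window: stale token, so obtain a new one.
    if svc.pause(rob.pet) == "STALE":
        rob.pet = svc.new_token()
        svc.pause(rob.pet)

    # Resumer continues with the token it read earlier.
    result = svc.release(seen)
    if recheck and result == "STALE" and rob.pet != seen:
        result = svc.release(rob.pet)   # illustrative handling only
    rob.marked_resumed = True
    return rob.marked_resumed, rob.pet in svc.paused


print(run(recheck=False))  # (True, True): marked resumed but still paused
print(run(recheck=True))   # (True, False): the waiter is released
```

Without the recheck, the model ends in the state the error description reports: the work is marked as resumed while the waiter is still paused.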
    

Temporary fix

Comments

APAR Information

  • APAR number

    PH50958

  • Reported component name

    IBM MQ Z/OS V9

  • Reported component ID

    5655MQ900

  • Reported release

    100

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2022-11-16

  • Closed date

    2023-06-16

  • Last modified date

    2023-09-29

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UI92293 UI92294 UI92295

Modules/Macros

  • CSQVSRX
    

Fix information

  • Fixed component name

    IBM MQ Z/OS V9

  • Fixed component ID

    5655MQ900

Applicable component levels

  • R100 PSY UI92295

       UP23/07/15 P F307 ¢

  • R200 PSY UI92294

       UP23/07/15 P F307 ¢

  • R300 PSY UI92293

       UP23/07/15 P F307 ¢

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
29 September 2023