A fix is available
APAR status
Closed as program error.
Error description
A hang occurred in MQ for a job using RRS coordination after the job abended or was cancelled. From a dump taken at the time of the hang, the RRS perspective is as follows: CTXCEMGR issued a PC 3506 to RRS to process its end context syncpoint exit. RRS then queued a request to the RRS address space to process syncpoint for the associated UR and suspended the requester until syncpoint completed. There was one interest, from the MQ Resource Manager. RRS called MQ, which had not returned.

For the RRS TCB, the linkage stack entries show that RRS issued a PC 81F06, followed by a PC 30D from MQ load module CSQ3EPX, CSECT CSQ3RRSX, routine EBACTL_RETRY1. MQ is waiting for ECB DIEWA.lECB. The RRS TCB was looping in EBACTL_RETRY1, waiting for the EBACTL flag to be set for a thread.

In the reported case, the job was cancelled or killed while the related MQ thread was making an MQ API call. Because the work came from an RRS batch application, the task underwent an EB switch. As part of this call an execution unit switch (EUS) was required, and the task was suspended awaiting completion of the request. The task was then killed from USS (abend S422), and recovery routine EUS1FRRE got control. The EUS did not complete within the next 2 seconds, so the EBDR flag was set to indicate that recovery should be deferred until end of task (EOT), and the recovery routine percolated. As a result, a critical recovery exit in CSQMCPRH, which would have turned on EBACTL in the context ACE EB, was not called. This later leads to the loop in CSQ3RRSX and to the address space failing to terminate.

The problem exists only when EBDR is set in the context ACE EB, which happens only in certain timing windows when a task abends (or is cancelled) while waiting for an execution unit switch. This problem is a regression caused by APAR PH38111.

Additional symptoms:
-------------------
ABEND422 ABENDS422

In the reported case, the RRS application affected was Financial Transaction Manager for SWIFT Services for z/OS (FTM).
The problem can affect other RRS applications. For the reported FTM case, symptoms included:

DNFF4008E ou-service LT 'ltname': Attempt to initialize SFD failed; reason code='Old SFD process found for this LT'.

We issued abort and abort force for the LT (logical terminal), but the UNIX broker task was still running. We tried to cancel the SFD (SWIFTNet FIN Daemon) broker job, but the task was still running. We then moved the LTs to another LPAR, but the broker job on the original LPAR still had the instance.DNF_FSM_SLS.lt queue open, so we got this error on the other LPAR:

DNFF4400E ou-service LT 'ltname': MQ 'OPEN' operation on queue 'instance.DNF_FSM_SLS.lt' of queue manager 'qmgr-name' failed; Reason code='2042'; error text='SLS input queue open error'.

Reason code 2042 means MQRC_OBJECT_IN_USE.
Local fix
The problem of the hung RRS job was resolved with an IPL.
Problem summary
****************************************************************
* USERS AFFECTED: All users of IBM MQ for z/OS Version 9       *
*                 Release 1 Modification 0 and Version 9       *
*                 Release 2 Modification 0 and Version 9       *
*                 Release 3 Modification 0.                    *
****************************************************************
* PROBLEM DESCRIPTION: An RRS application that is connected to *
*                      MQ hangs after abending or being        *
*                      canceled.                               *
****************************************************************
After an RRS application that is connected to MQ abends or is canceled, it may hang. The hang is caused by a loop that waits for the connected application to finish its work inside the Queue Manager's Address Space. However, the internal state that reflects when the connected RRS application is no longer operating inside the Queue Manager's Address Space was not being set correctly, resulting in the hang.
Problem conclusion
The code has been corrected so that the internal state that reflects when the connected RRS application is no longer operating inside the Queue Manager's Address Space is set correctly, preventing the connected RRS application from hanging.
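The failure pattern described above, a waiter that loops until a flag is set while one recovery path never sets that flag, can be illustrated with a small model. This is a hypothetical sketch, not IBM MQ code: the names `ContextEB`, `recovery_routine`, `rrs_exit_wait` and `simulate` are invented for illustration, and only the flag names EBACTL and EBDR come from the error description. It assumes the EUS never completes, so recovery is deferred to EOT; the 2-second EUS timeout is shrunk to 0.05 seconds and the retry loop is bounded purely so the demo terminates, whereas the loop in EBACTL_RETRY1 had no such limit.

```python
import threading
from dataclasses import dataclass, field

# Hypothetical model of the hang described in this APAR. This is NOT IBM MQ
# code; only the flag names EBACTL and EBDR are taken from the APAR text.


@dataclass
class ContextEB:
    # Set when the thread is no longer active inside the queue manager.
    ebactl: threading.Event = field(default_factory=threading.Event)
    # Set when recovery is deferred until end of task (EOT).
    ebdr: bool = False


def recovery_routine(eb: ContextEB, eus_done: threading.Event,
                     fixed: bool, eus_timeout: float) -> None:
    """Runs after the task abends while waiting for an EUS (assumed model)."""
    if eus_done.wait(eus_timeout):
        eb.ebactl.set()  # EUS finished: normal recovery marks the thread inactive
        return
    eb.ebdr = True       # EUS did not finish in time: defer recovery to EOT
    if fixed:
        eb.ebactl.set()  # corrected behavior: state is still set on this path
    # Unfixed behavior: percolate without anything setting EBACTL.


def rrs_exit_wait(eb: ContextEB, max_retries: int, interval: float) -> bool:
    """Waits for EBACTL; bounded here only so the demo terminates."""
    for _ in range(max_retries):
        if eb.ebactl.wait(interval):
            return True
    return False  # an unbounded loop would keep retrying: the reported hang


def simulate(fixed: bool, eus_completes: bool = False) -> bool:
    eb = ContextEB()
    eus_done = threading.Event()
    if eus_completes:
        eus_done.set()
    recovery_routine(eb, eus_done, fixed, eus_timeout=0.05)
    return rrs_exit_wait(eb, max_retries=5, interval=0.01)


print(simulate(fixed=False))  # False: EBACTL never set, the waiter would spin
print(simulate(fixed=True))   # True: EBACTL set even though EBDR is on
```

When the EUS completes in time, both variants return True; the difference appears only on the deferred (EBDR) path, which mirrors the timing-window nature of the problem.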
Temporary fix
Comments
APAR Information
APAR number
PH54116
Reported component name
IBM MQ Z/OS V9
Reported component ID
5655MQ900
Reported release
100
Status
CLOSED PER
PE
Yes
HIPER
Yes
Special Attention
No
Submitted date
2023-04-24
Closed date
2023-08-07
Last modified date
2023-09-01
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UI93081 UI93082 UI93083
Modules/Macros
CSQ3ID30 CSQCECTX CSQM148M CSQMCLMT CSQMCPRH CSQVEOT1 CSQVEUS2 CSQVEUS3
Fix information
Fixed component name
IBM MQ Z/OS V9
Fixed component ID
5655MQ900
Applicable component levels
R100 PSY UI93083
UP23/08/19 P F308
R200 PSY UI93082
UP23/08/19 P F308
R300 PSY UI93081
UP23/08/19 P F308
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
28 September 2023