A fix is available
APAR status
Closed as program error.
Error description
A diagnostic SVC dump was taken by CPSM transaction COWC, with a title similar to the following:

  EYU0XZSD Dump,masname,applid,sysidnt,LMAS,COWC,00012356,TRAC,EYU0WNLM,mm/dd/yyyy,hh:mm:ss

In the CICS job log, the following messages will appear:

  EYUWG0106E applid WLM has encountered an error while attempting to release MAS resources.

In the SVC dump, you format the CPSM trace entries using

  VERBX EYU9Dxxx 'TRC=A,JOB=cicsname'

and find repeated entries similar to the following:

  Task  Mtd  Prev Tran Obj Level Pt-ID Debug    UOW  CMAS/Usr Envr
  12345 XSRA WDTR EZLI SRV Excp  1     INVRESPT CPSM CICSNAME LMAS
  12345 WDTR XLOP EZLI WLM Excp  3     WDTRXSRA CPSM CICSNAME LMAS
  12345 XSRA WDTR EZLI SRV Excp  1     INVRESPT CPSM CICSNAME LMAS
  12345 WDTR XLOP EZLI WLM Excp  3     WDTRXSRA CPSM CICSNAME LMAS
  12345 XSRA WDTR EZLI SRV Excp  1     INVRESPT CPSM CICSNAME LMAS
  12345 WDTR XLOP EZLI WLM Excp  3     WDTRXSRA CPSM CICSNAME LMAS
  12345 XSRA WDTR EZLI SRV Excp  1     INVRESPT CPSM CICSNAME LMAS
  12345 WDTR XLOP EZLI WLM Excp  3     WDTRXSRA CPSM CICSNAME LMAS
  12345 XSRA WDTR EZLI SRV Excp  1     INVRESPT CPSM CICSNAME LMAS
  12345 WDTR XLOP EZLI WLM Excp  3     WDTRXSRA CPSM CICSNAME LMAS
  12345 XSRA WDTR EZLI SRV Excp  1     INVRESPT CPSM CICSNAME LMAS
  12345 WDTR XLOP EZLI WLM Excp  3     WDTRXSRA CPSM CICSNAME LMAS

The entries from XSRA are due to a failed attempt to get a shared lock on a workload descriptor. Formatting the same trace with 'TRC' instead of 'TRC=A', the full MAL for the call to XSRA is formatted. It shows the response and reason as follows:

  Keyword          Data  Queue  Req  Data      Data
  Value            Type  Dir    Opt  Address   Value
  In:
  *FUNCTION        FUN   .           0001E3C0  RESACQ
  *DEBUG           CHR   .           0001E3C4  WDTRXSRA
   EXCLUSIVE       SDT   .
  *RESOURCE_PTR    EPT   .           0001E3CC  A=01FF002A O=00058B50
   CONDITIONAL     SDT   .
   EXCLUSIVE_MODE  ENM   .
   DOWNGRADE       SDT   .
  Out:
  *RESPONSE        RSP   .           0001E3C2  INVALID
  *REASON          RSN   .           0001E3C3  INVALID_RESOURCE_PTR
  *STATUS          STA   .           0001E3D4  OK
The INVALID_RESOURCE_PTR reason in this instance refers to the workload descriptor lock, field WRKD_LOCK in the workload descriptor. However, when we go to the workload descriptor in the WLM data spaces, we find the following in the eye-catcher:

  | ..>eYUWMEYURWRKD |

Notice the lower case 'e'. This indicates that CPSM has logically deleted the descriptor. As such, the lock word is invalid. When this occurs, we return and fail to free up a control block called a WNLE.

When transaction COWC ran to perform cleanup of these WNLEs, it found them all in use by an active task, so it produced the diagnostic dump.

Additional Symptom(s) Search Keyword(s): KIXREVSVR
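
The eye-catcher check described above can be illustrated with a small Python sketch. It is not CPSM code: the function name is hypothetical, it assumes the eye-catcher has already been converted from the dump into a text string, and it assumes that an intact descriptor carries an upper case 'E' where the deleted one shows a lower case 'e'.

```python
def is_logically_deleted(eyecatcher: str) -> bool:
    """Return True if the first letter of the eye-catcher is lower case.

    Assumption (not from CPSM documentation): a logically deleted control
    block is marked only by folding the first letter of its eye-catcher
    to lower case, as seen in the dump excerpt.
    """
    for ch in eyecatcher:
        if ch.isalpha():
            return ch.islower()
    return False  # no letters found, so nothing indicates deletion


# Eye-catcher as shown in the dump, and the assumed intact form.
print(is_logically_deleted(">eYUWMEYURWRKD"))  # True
print(is_logically_deleted(">EYUWMEYURWRKD"))  # False
```

A check of this kind only identifies the symptom; it says nothing about why the descriptor was deleted while still in use.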
Local fix
Problem summary
****************************************************************
USERS AFFECTED: All CICSPlex SM V5R1M0, V5R2M0 and V5R3M0 Users
****************************************************************
PROBLEM DESCRIPTION:

MASes active as routing or target regions for a CPSM WLM workload may be invalidly removed from the workload, if the CMAS they are connected to is restarted multiple consecutive times while the MASes remain active, and the CMAS restarts are terminated before the CMAS performs Topology Connect with the MASes.

During the first terminated restart, messages similar to the following will be issued in the EYULOG of the CMAS:

  EYUTI0009I Topology warm start for <masname> initiated - APPLID(<applid>) CICSplex(<plexname>).

  EYUWT0053W Workload Specifications cannot be removed during CMAS termination for CICSplex(<plexname>) because at least one <routing|target> region is connected to the CMAS.

  EYUWT0054I MAS <masname> is connected to the CMAS as a <routing|target> region in Workload(<wlmspec>) for CICSplex(<plexname>).

During the second terminated restart, messages similar to the following will be issued in the EYULOG of the CMAS:

  EYUWM0425I Target region (<aorname>) has been terminated for Workload (<wlmspec>).

  EYUWM0421I Routing region (<torname>) has been removed from Workload (<wlmspec>).

  EYUWM0411I Workload Specification (<wlmspec>) has been removed from this CMAS for context (<plexname>).

If this occurs, then routes attempted by any routing region for which message EYUWM0421I was received will fail, and routes attempted to any target region for which message EYUWM0425I was received may fail.

In either case, the failures can result in orphaning of CPSM WLM resources in the MAS, which can result in message EYUWG0106E being issued in the MAS, followed by a dump.

The message text will be similar to the following:

  EYUWG0106E WLM has encountered an error while attempting to release MAS resources.

and the dump title will be similar to the following:

  EYU0XZSD Dump,<jobname>,<masname>,<lparid>,LMAS,COWC,<tasknum>,TRAC,EYU0WNLM,<date>,<time>

The errors with the routing and target regions will continue until the CMAS is restarted and performs Topology Connect with the routing and target regions.
****************************************************************
RECOMMENDATION:

After applying the PTF that resolves this APAR, all CMASes and MASes must be restarted to pick up the new code.

The restarts need not be performed at the same time, however if systems are not restarted at the same time, the following rules apply:

- Maintenance Point (MP) CMASes must be restarted on the updated code before non-MP CMASes.

- If you have more than one MP CMAS and any of those MP CMASes are connected directly or indirectly, then those MP CMASes must be restarted at the same time.

- Before a MAS is restarted with the updated code, the CMAS to which the MAS connects must be running with the updated code.

- This fix is being provided across all supported releases of CPSM as follows:

    - CPSM V4R1M0 - APAR PI75418
    - CPSM V4R2M0 - APAR PI75418
    - CPSM V5R1M0 - APAR PI76327
    - CPSM V5R2M0 - APAR PI76327
    - CPSM V5R3M0 - APAR PI76327

  Before a CMAS running with the PTF that resolves this APAR for its release connects directly or indirectly to a CMAS running a higher release of CPSM, the higher release CMAS must be restarted so that it is running with the appropriate PTF for its release.
****************************************************************

When a CMAS that manages a CPSM WLM workload terminates while connected to MASes running as routing or target regions for the workload, and those MASes remain active, those routing and target regions should remain active in the workload. When the CMAS restarts:

- Method EYU0TIWS (TIWS) executes part one of Topology warm start processing. Since the Topology data spaces are retained over the restart due to previously connected MASes still being active (CPSM CMAS warm start), TIWS is able to scan the Topology CICS system descriptor blocks (CSDBs) in the data spaces as they were at CMAS termination, to determine which MASes were active. For MASes that were connected to the CMAS, it will call the ESSS to determine if those regions are still active, and if so, will call method EYU0CPAM (CPAM) to inform the Communication component of that, issue message EYUTI0009I, and then build a TOPWCDTR resource record indicating that the MASes are in a lost state.

- These records are processed by method EYU0TIW2 (TIW2) during part two of Topology warm start. It is the job of TIW2 to update the CSDBs with information from the TOPWCDTR records, including marking the CSDB status as lost connection for any lost MASes. This is done so that when Topology Connect occurs for the MASes, they will be processed properly, including marked as active.

- Additionally, the TOPWCDTR records are passed to method EYU0WMWS (WMWS), which performs warm start processing for the WLM component. WMWS will then call either method EYU0WMAT (WMAT - target region termination) or EYU0WMTT (WMTT - routing region termination) with a status of lost for the MAS, which will mark the AOR descriptor (EYURWAOR) or the TOR descriptor (EYURWTOR) for the MAS to indicate it is in a lost contact state, but still active and available for WLM.

However, if the first CMAS restart is terminated before performing Topology Connect for the lost MASes, then when the next restart occurs the CSDBs of still active MASes will now be marked as lost contact instead of active.

- TIWS will set a status of gone for the MASes in the TOPWCDTR records, the ESSS and CPAM will not be called, and message EYUTI0009I will not be issued.

- When TIW2 is called, it will propagate the gone status in the CSDB.

- When WMWS is called, it will call WMAT or WMTT with a status of gone, which will result in the MAS being removed from the workload, with either message EYUWM0425I (WMAT) or EYUWM0421I (WMTT) being issued.

When this restart of the CMAS terminates before performing Topology Connect with the MASes, since there are no local routing or target regions active in the workload, method EYU0WMWT (WMWT) will be called to terminate the workload, issuing message EYUWM0411I.

If, before another restart occurs for the CMAS that results in the MASes going through Topology Connect, a terminated routing region is called for a route request, or a terminated target region is called to handle a distributed route, CICS will pass control to module EYU9XLOP (XLOP), which is the CPSM DTRPGM and DSRTPGM routing exit. XLOP will call method EYU0WDTR (WDTR) to process the request. WDTR first calls method EYU0WDIN (WDIN), which will allocate CICS EDSA storage required for its processing. It will then call method EYU0XSRA (XSRA) to acquire a shared lock on the workload.
Since the workload has been terminated, its lock has been unregistered, and XSRA will fail the request. As such, WDTR will propagate the failure back to XLOP, which will return to CICS with a response of abort. This results in CICS terminating processing for the route without calling CPSM again, and the storage allocated by the WDTR call to WDIN is orphaned.

Note that if a customer is calling EYU9XLOP directly for routing decision processing, the same type of resource orphaning can occur.

There are two problems that result in the errors documented above:

- WDTR is not freeing allocated storage when it fails.

- Topology warm start is causing the workload to be terminated prematurely, which causes subsequent calls to WDTR to fail.

This APAR will address the Topology warm start problem. This should minimize the possibility of WDTR failing. WDTR resource management will be improved either in a subsequent APAR or through the development process.
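
The warm start sequence described above can be modelled with a short Python sketch that shows how two consecutive terminated restarts move a still-running MAS from active to lost contact to gone. This is a simplified model, not CPSM code: the `Status` enum, the function names, and the reduction of the TIWS, TIW2 and WMWS processing to a single status transition are all assumptions made for illustration. The handling of a MAS that is no longer running is also assumed.

```python
from enum import Enum
from typing import List


class Status(Enum):
    ACTIVE = "active"
    LOST = "lost contact"
    GONE = "gone"


def warm_start_pre_fix(csdb_status: Status, mas_still_running: bool) -> Status:
    """Status left in the CSDB by one warm start, before the fix.

    Only a CSDB marked ACTIVE leads to the check of whether the MAS is
    still running. Any other status, including LOST, is treated as gone.
    """
    if csdb_status is Status.ACTIVE and mas_still_running:
        return Status.LOST  # EYUTI0009I issued, MAS stays in the workload
    return Status.GONE      # EYUWM0425I or EYUWM0421I, MAS is removed


def simulate(restarts: int) -> List[Status]:
    """Run consecutive CMAS restarts that all end before Topology Connect."""
    status = Status.ACTIVE
    history = []
    for _ in range(restarts):
        status = warm_start_pre_fix(status, mas_still_running=True)
        history.append(status)
    return history


print([s.value for s in simulate(2)])  # ['lost contact', 'gone']
```

In this model the MAS never stops running, yet the second restart classifies it as gone, which corresponds to the premature removal from the workload.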
Problem conclusion
There are two problems with Topology warm start processing that need to be addressed to allow for a CMAS restart terminating before the CMAS performs Topology Connect:

- TIWS only checks for active MASes. It needs to also check for lost contact MASes, since that is what TIW2 would set the MAS state to.

- TIWS assumes that the data it needs for the CPAM call will be in the CSDB. That is not the case since what is in the CSDB is what a previous call to TIW2 might have set, and TIWS does not set into the TOPWCDTR record all of that data.

To address these problems, the following changes have been made:

- The TOPWCDTR resource table has been updated to include new attributes to hold additional data required from the existing CSDBs.

- TIWS has been updated to collect that additional data and set it into the TOPWCDTR record. Additionally, TIWS has been updated to process MASes with a CSDB status of lost contact exactly as it processes MASes with a CSDB status of active.

- TIW2 has been updated to update the CSDB with the additional data in the TOPWCDTR record.
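
The changes listed above can be sketched in Python as a two-step update in which the warm start record carries the extra data and a lost contact MAS is handled like an active one. This is a rough model under stated assumptions: the `Csdb` and `WarmStartRecord` classes, the single `connect_data` field standing in for the new TOPWCDTR attributes, and the function names are hypothetical and do not reflect the real control block layouts.

```python
from dataclasses import dataclass


@dataclass
class Csdb:
    """Hypothetical stand-in for a Topology CICS system descriptor block."""
    mas_name: str
    status: str        # "active", "lost contact", or "gone"
    connect_data: str  # stands in for the data needed by the CPAM call


@dataclass
class WarmStartRecord:
    """Hypothetical stand-in for a TOPWCDTR record."""
    mas_name: str
    status: str
    connect_data: str  # models the new attributes added by the fix


def build_record(csdb: Csdb, mas_still_running: bool) -> WarmStartRecord:
    """Part one of warm start (TIWS role), modelling the fixed behaviour."""
    candidate = csdb.status in ("active", "lost contact")
    status = "lost contact" if candidate and mas_still_running else "gone"
    return WarmStartRecord(csdb.mas_name, status, csdb.connect_data)


def apply_record(csdb: Csdb, record: WarmStartRecord) -> None:
    """Part two of warm start (TIW2 role): copy the record into the CSDB."""
    csdb.status = record.status
    csdb.connect_data = record.connect_data


csdb = Csdb("MAS1", "active", "connect-info")
for _ in range(3):  # three restarts, none reaching Topology Connect
    apply_record(csdb, build_record(csdb, mas_still_running=True))
print(csdb.status)  # lost contact
```

With lost contact treated like active, repeated terminated restarts leave the MAS in the lost contact state rather than removing it, and the data copied through the record remains available for the next restart.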
Temporary fix
Comments
APAR Information
APAR number
PI76327
Reported component name
CICS TS Z/OS V5
Reported component ID
5655Y0400
Reported release
80M
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2017-02-09
Closed date
2017-02-14
Last modified date
2017-03-02
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UI44658 UI44659 UI44660
Modules/Macros
EYU0TIW2 EYU0TIWS EYU0WMWS EYUT2542 EYUY2542
Fix information
Fixed component name
CICS TS Z/OS V5
Fixed component ID
5655Y0400
Applicable component levels
R00M PSY UI44660
UP17/02/20 P F702
R80M PSY UI44658
UP17/02/20 P F702
R90M PSY UI44659
UP17/02/20 P F702
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
02 March 2017