Checking for ENQ contention problems

z/OS MVS Planning: Global Resource Serialization, SA23-1389-00
When workload slows down and global resource serialization appears to be operating normally, the problem is often that some part of the workload is dominating ENQ resources in the sysplex. Because many workloads require exclusive access to resources (for example, to update a file), resource contention occurs between different parts of the workload when incompatible requests are made for the same resources. By itself, resource contention is not a sign of a problem. However, contention that persists for a long time among the same resources and requesters might indicate a problem. Global resource serialization provides diagnostic commands to help determine the source of contention.
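As a quick reference, these are the operator commands this topic uses. The one-line descriptions here are paraphrased; see z/OS MVS System Commands for the complete syntax and output formats:

```
DISPLAY GRS,C                    Display outstanding ENQ contention
DISPLAY GRS,ANALYZE,BLOCKER      Rank units of work by how long they have blocked others
DISPLAY GRS,ANALYZE,WAITER       Rank units of work by how long they have waited
DISPLAY GRS,ANALYZE,DEPENDENCY   Follow each waiter's chain of blockers to its end
```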
To illustrate how to use contention analysis, consider the following example. In this example, a three-system sysplex is made up of systems PROD1, PROD2, and TEST.

Figure 1. Example of Three System Sysplex
In the scenario, four different units of work are affected: the production job (PRODJOB) on PROD1; the cleanup job (CLEANUP) that it submits; a task in the master scheduler address space (*MASTER*) on PROD1; and your TSO user session (SYSPROG).
The production job on PROD1 is a multistep process that submits the cleanup job, which is to run after the production run completes. The cleanup job is kept from running by the exclusive data set ENQ [SYSDSN, PROD.DB] owned by the production job.

The scenario begins with the production job running. It reaches the step where it submits the cleanup job. The cleanup job initiates but is blocked in allocation on the global ENQ for [SYSDSN, PROD.DB]. However, as part of allocation, it takes exclusive ownership of [SYSDSN, PROD.PROCS]. The current view of contention is displayed in the following figure. In the figure, units of work are represented by rectangles and resources are represented by ovals. The arrow and text from the unit of work to the resource represent the dependency.

Figure 2. Current View of Contention
While the production job is executing, a task in the master scheduler address space (*MASTER*) fails, but does not end, while holding the system command resource, [SYSIEFSD, Q10]. This resource is required by tasks that need to issue MVS™ system commands. Following this failure, the production job invokes the MGCRE macro to issue a system command. Because the command resource is held indefinitely by the failed *MASTER* task, ownership cannot be granted to PRODJOB. Contention for ENQ resources now looks like the following figure.

Figure 3. Contention for ENQ Resources
You discover that there is a problem with the production database; the production job and the cleanup job seem to be hung. Interactive requests for the database fail with an indication that the database is unavailable. You run an exec from your TSO session that attempts to allocate both the production database, [SYSDSN, PROD.DB], and the production procedures library, [SYSDSN, PROD.PROCS]. This, of course, hangs the TSO session in an ENQ wait. The final state of contention is as follows:

Figure 4. Final State of Contention
Note that on a normal system there is always some level of "background" contention. The preceding example ignores that background contention and displays only the contention that applies to the scenario.

To debug this problem, use the contention analysis features provided by global resource serialization. The first thing that you discover is that commands do not seem to work on PROD1, so any analysis must occur on either PROD2 or TEST. You must determine whether any resources are in contention. If DISPLAY GRS,C were issued on PROD2 or TEST, the result would be as follows:
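The following is a sketch of the information the display conveys, not the verbatim message format. The job and resource names come from the scenario; the ASID values, the exclusive wait attributes, and the placement of SYSPROG on PROD2 are illustrative assumptions:

```
S=SYSTEMS  SYSDSN   PROD.DB
  SYSNAME  JOBNAME   ASID  EXC/SHR    STATUS
  PROD1    PRODJOB   001F  EXCLUSIVE  OWN
  PROD1    CLEANUP   0023  EXCLUSIVE  WAIT
  PROD2    SYSPROG   0028  EXCLUSIVE  WAIT

S=SYSTEMS  SYSDSN   PROD.PROCS
  SYSNAME  JOBNAME   ASID  EXC/SHR    STATUS
  PROD1    CLEANUP   0023  EXCLUSIVE  OWN
  PROD2    SYSPROG   0028  EXCLUSIVE  WAIT
```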
Looking at this output, it would appear that the problem is with PRODJOB; it is blocking both CLEANUP and SYSPROG from continuing. However, because the DISPLAY GRS,C command was issued on PROD2 or TEST, it does not return information about local resources on PROD1. If system commands were working on PROD1, DISPLAY GRS,C from PROD1 would provide a more complete picture:
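Schematically again, the display from PROD1 would include the same two SYSDSN entries plus the local (system-scope) command resource. *MASTER* runs in ASID 0001; the other values remain illustrative:

```
S=SYSTEM   SYSIEFSD Q10
  SYSNAME  JOBNAME   ASID  EXC/SHR    STATUS
  PROD1    *MASTER*  0001  EXCLUSIVE  OWN
  PROD1    PRODJOB   001F  EXCLUSIVE  WAIT
```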
You can see that the local resource [SYSIEFSD, Q10], held by *MASTER*, is really holding up all of the workload.

In a fully loaded system, where a considerable amount of workload is being processed concurrently, the opportunities for contention and the number of units of work involved in that contention can be much higher. It might be impossible to quickly analyze which resources and units of work are part of an ENQ lockout and which ones are not. When this occurs, the DISPLAY GRS,ANALYZE command is much more useful. Additionally, the GRS analysis command options are truly sysplex-wide in scope; the analysis includes local resources on all systems in the sysplex.

The analysis provided by this command is based on the fact that most of the "benign" contention in the sysplex is short term. That is, if you issue the same command over a period of time, the contention that is most affecting the sysplex remains in the output of the command. The output from the command is ordered by the length of time that the contention has been in effect. In a serious resource lockout, where one requester dominates ownership of a resource for a long period of time, or a resource deadlock, where a set of requesters requires resources held by the others in the set such that no request can be granted, the contention quickly rises to the top of the output.

Using the previous lockout scenario, the DISPLAY GRS,ANALYZE,BLOCKER command would return:
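As before, this is a sketch of what the analysis conveys rather than the exact message layout; the elapsed times are illustrative:

```
LONGEST BLOCKING UNITS OF WORK
  ELAPSED   SYSTEM  JOBNAME   RESOURCE
  00:45:10  PROD1   PRODJOB   SYSDSN   PROD.DB
  00:12:03  PROD1   *MASTER*  SYSIEFSD Q10
  00:05:44  PROD1   CLEANUP   SYSDSN   PROD.PROCS
```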
It is clear from this output that PRODJOB has been blocking other requesters for the longest time. However, the display does not tell the complete story. The view obtained from the DISPLAY GRS,ANALYZE,WAITER command shows that PRODJOB is also a waiter:
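In the same schematic form, with illustrative times:

```
LONGEST WAITING UNITS OF WORK
  ELAPSED   SYSTEM  JOBNAME   RESOURCE            BLOCKED BY
  00:45:10  PROD1   CLEANUP   SYSDSN   PROD.DB    PRODJOB
  00:12:03  PROD1   PRODJOB   SYSIEFSD Q10        *MASTER*
  00:05:44  PROD2   SYSPROG   SYSDSN   PROD.DB    PRODJOB
```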
Again, because this is a simple case, it is easy to see that, although PRODJOB has been blocking the longest, PRODJOB is itself blocked by *MASTER* for [SYSIEFSD, Q10]. What if the scenario were far more complicated? Then the DISPLAY GRS,ANALYZE,DEPENDENCY command is very useful in determining whether a single job or a small set of jobs is causing the lockout. The command can also detect a resource allocation deadlock. For this scenario, the output from the command would be:
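A sketch of the dependency analysis: each waiter's chain of blockers is followed to its end, and every chain in this scenario terminates at *MASTER* (layout and times illustrative):

```
DEPENDENCY ANALYSIS
  WAITER: PROD1 CLEANUP   (WAITING 00:45:10)
    WAITS FOR  SYSDSN   PROD.DB   HELD BY PROD1 PRODJOB
    WHICH WAITS FOR  SYSIEFSD Q10   HELD BY PROD1 *MASTER*
  WAITER: PROD2 SYSPROG   (WAITING 00:05:44)
    WAITS FOR  SYSDSN   PROD.DB   HELD BY PROD1 PRODJOB
    WHICH WAITS FOR  SYSIEFSD Q10   HELD BY PROD1 *MASTER*
  WAITER: PROD1 PRODJOB   (WAITING 00:12:03)
    WAITS FOR  SYSIEFSD Q10   HELD BY PROD1 *MASTER*
```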
From this analysis, it is obvious that the problem is with *MASTER* and not PRODJOB. In fact, all of the dependency chains end with *MASTER*. Because you cannot restart *MASTER*, the only way to clear this lockout is to re-IPL PROD1. In the case where the unit of work that all of the other requests depend on is a cancelable job or a subsystem that can be recycled, the operator can take appropriate action against that address space to resume normal system operations. For complete information on the DISPLAY GRS,ANALYZE command, see z/OS MVS System Commands.
Copyright IBM Corporation 1990, 2014