Troubleshooting high CPU usage

This topic helps you diagnose and resolve high CPU usage.

A common reason for high CPU usage is a loop in the code, either in a user application or in CICS code. Possible causes of a loop that does not terminate are:

- The termination condition can never occur.
- The code never tests for the termination condition.
- When the termination condition is met, the conditional branch causes the loop to be performed again.
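The three causes above can be illustrated with a minimal sketch. It is written in Python only for brevity and is not taken from any TXSeries application. The function names and the MAX_ITERATIONS safety cap are assumptions added so that the examples stop instead of looping forever.

```python
# Illustration only: each function reproduces one of the three causes.
# MAX_ITERATIONS is a hypothetical safety cap so that the examples stop.
MAX_ITERATIONS = 1000


def condition_never_occurs():
    """Cause 1: the counter steps 1, 3, 5, ... so it never equals 10."""
    counter, iterations = 1, 0
    while counter != 10 and iterations < MAX_ITERATIONS:
        counter += 2
        iterations += 1
    return iterations


def condition_never_tested():
    """Cause 2: 'done' becomes True, but the loop never checks it."""
    done, iterations = False, 0
    while iterations < MAX_ITERATIONS:  # stands in for 'while True'
        done = True
        iterations += 1
    return iterations


def branch_repeats_loop():
    """Cause 3: the end condition is detected, but the branch loops again."""
    remaining, iterations = 3, 0
    while iterations < MAX_ITERATIONS:
        iterations += 1
        remaining = max(remaining - 1, 0)
        finished = remaining == 0
        if finished:
            continue  # bug: this branch should be 'break'
    return iterations
```

In each case the loop ends only because the safety cap is reached, so every function returns MAX_ITERATIONS.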

The following procedures can help identify programs that are involved in a loop that does not terminate.

If the looping code is in one of your applications, check the code for errors.
If the error appears to be in the TXSeries code, contact your support organization. See Working with your support organization.
Investigate the loop - Some examples of initial symptoms that can indicate a loop are:
- Repetitive output.
- Statistics show an excessive number of input and output operations.
- Statistics show an excessive number of requests for storage.
- An application server is using a lot of CPU time.
The characteristics of the symptoms might indicate which transaction is causing the loop, but to define the limits of the loop, you must use trace. You can use auxiliary trace to capture the whole loop in the trace data. If you use internal trace, wraparound might prevent you from seeing the whole loop. See Understanding TXSeries system trace.

After you have captured the trace data, purge the looping task from the system. To do this, find the task number of the task by using CEMT INQ TASK. Use CEMT SET TASK PURGE or FORCEPURGE to purge the task. This causes the transaction to terminate abnormally and produce a transaction dump of the task storage areas.

If the loop does not contain any EXEC CICS statements, you cannot use trace to determine the limits of the loop. Insert EXEC CICS ENTER calls into the application code in the areas that you suspect are causing problems, and capture the trace data again.
The following documentation is useful:
- The trace data.
- The transaction dump.
- Source listings of all the programs that are in the transaction.
- At least three stack traces of the process, taken at 10-second intervals.
Use the trace data and the program listings to identify the limits of the loop. Use the transaction dump to examine the user storage for the program. Examine the data to see why the loop occurred.
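One way to use several stack samples of the process is to look for the frames that appear in every sample, because a looping task tends to stay under the same callers. The following is a minimal sketch of that comparison. It assumes that each sample has already been reduced to a list of function names, innermost frame first. The function names in the sample data are invented for illustration.

```python
def frames_in_every_sample(samples):
    """Return the frames of the first sample that occur in all samples."""
    if not samples:
        return []
    shared = set(samples[0]).intersection(*map(set, samples[1:]))
    return [frame for frame in samples[0] if frame in shared]


# Hypothetical stack samples taken 10 seconds apart (innermost frame first).
samples = [
    ["compare_keys", "search_table", "process_order", "main"],
    ["search_table", "process_order", "main"],
    ["hash_key", "search_table", "process_order", "main"],
]
print(frames_in_every_sample(samples))
```

Frames that are common to all samples are candidates for the code that contains the loop. They are not proof, because long-running but correct code produces the same picture.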
Identify the loop - Use the trace table to detect the repeating pattern of trace entries. If the pattern is difficult to detect, the loop might be large because many different programs are involved. Another possibility is that you have not captured the whole loop in the trace file, because the loop did not complete one cycle before you purged the transaction.
Remember that you might not be dealing with a loop; the symptoms might be caused by something else, such as poor application design.

If you can detect a pattern, you can identify the corresponding pattern of statements in your source code.
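Searching for the repeating pattern can be partly automated. The following is a minimal sketch, assuming that the trace has already been reduced to a list of short entry labels and that the task was still looping when the trace ended. The labels and the min_repeats threshold are hypothetical and do not reflect the real TXSeries trace format.

```python
def find_repeating_cycle(entries, min_repeats=3):
    """Find the shortest cycle that repeats back to back up to the end.

    Returns (start_index, cycle), or None if no cycle covers at least
    min_repeats times its own length at the end of the list.
    """
    n = len(entries)
    for period in range(1, n // min_repeats + 1):
        # Walk backwards from the end while entries repeat with this period.
        start = n - period
        while start > 0 and entries[start - 1] == entries[start - 1 + period]:
            start -= 1
        if n - start >= period * min_repeats:
            return start, entries[start:start + period]
    return None


# Hypothetical trace labels: one setup entry, then a three-entry loop.
trace = ["INIT", "READ", "LINK", "RETURN", "READ", "LINK", "RETURN",
         "READ", "LINK", "RETURN", "READ"]
print(find_repeating_cycle(trace))
```

For the sample data, the function reports that the cycle READ, LINK, RETURN starts at index 1. If the function returns None, the trace might not contain a complete cycle, or the symptoms might not be caused by a loop.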

To help you identify the task that might be looping, you can set the MaxTaskCPU and the MaxTaskAction attributes of the Region Definitions (RD). The MaxTaskAction attribute can be set to abend or warning, so that if a task exceeds the limit that is set in MaxTaskCPU, either the transaction abends or a message is issued. The MaxTaskCPU attribute of the Transaction Definitions (TD) allows you to target specific transactions.
Find the reason for the loop - Examine the statements that are contained in the loop.
Does the logic of the code suggest why the loop occurred?

If not, examine the contents of data fields in the task's user storage. Look for unexpected response codes, and for null values where the program handles only finite values. A program can behave unpredictably when it encounters these conditions unless the code tests for them and handles them accordingly.
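The following minimal sketch shows the kind of test that prevents such a loop. It is a language-neutral illustration written in Python, not TXSeries code. The response codes NORMAL, ENDFILE, and NOTOPEN and the read_next callable are hypothetical stand-ins for a record-reading call and its responses. A loop that tested only for the end-of-file response would spin forever if the call kept returning some other code.

```python
NORMAL, END_OF_FILE = "NORMAL", "ENDFILE"  # hypothetical response codes


def read_all(read_next):
    """Collect records until END_OF_FILE; fail fast on anything unexpected."""
    records = []
    while True:
        response, record = read_next()
        if response == END_OF_FILE:
            return records
        if response != NORMAL:
            # Without this test, a repeated error response would loop forever.
            raise RuntimeError(f"unexpected response code: {response}")
        if record is None:
            raise ValueError("NORMAL response carried no record")
        records.append(record)


def make_reader(responses):
    """Build a hypothetical read_next() that replays canned responses."""
    iterator = iter(responses)
    return lambda: next(iterator)


good = make_reader([(NORMAL, "a"), (NORMAL, "b"), (END_OF_FILE, None)])
print(read_all(good))
```

Raising an error on an unexpected response turns a silent loop into a visible failure that can be diagnosed from the error message.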