[z/OS]

Examining the problem in greater depth on z/OS

Further checks to carry out when you have established that no changes have been made to your system, and that there are no problems with your application programs, but the preliminary checks have not enabled you to solve your problem.

Procedure

  1. Have you received some incorrect output?
    If you have obtained what you believe to be some incorrect output, consider the following:
    • When to classify output as incorrect

      "Incorrect output" might be regarded as any output that you were not expecting. However, use this term with care in the context of problem determination because it might be a secondary effect of some other type of error. For example, looping could be occurring if you get any repetitive output, even though that output is what you expected.

    • Error messages

      IBM MQ also responds to many errors it detects by sending error messages. You might regard these messages as "incorrect output", but they are only symptoms of another type of problem. If you have received an error message from IBM MQ that you were not expecting, see Are there any error messages, return codes or other error conditions? in Identifying characteristics of the problem on z/OS.

    • Unexpected messages

      Your application might not have received a message that it was expecting, or has received a message containing unexpected or corrupted information, or has received a message that it was not expecting (for example, one that was destined for a different application). For more information, see Dealing with incorrect output on z/OS.

  2. Have you received an unexpected error message or return code?
    If your application has received an unexpected error message, consider whether the error message has originated from IBM MQ or from another program.
    • IBM MQ error messages

      IBM MQ for z/OS error messages are prefixed with the letters CSQ. If you get an unexpected IBM MQ error message (for example, in the console log, or the CICS® log), see IBM MQ for z/OS messages, completion, and reason codes for an explanation, which might give you enough information to resolve the problem quickly, or it might redirect you to further information. If you cannot deal with the message, you might have to contact IBM Support for help.

    • Non-IBM MQ error messages
      If you get an error message from another IBM program, or from the operating system, look in the appropriate messages and codes documentation for an explanation of what it means. In a queue-sharing environment, look for the following error messages:
      • XES (prefixed with the letters IXL)
      • Db2® (prefixed with the letters DSN)
      • RRS (prefixed with the letters ATR)
    • Unexpected return codes

      If your application has received an unexpected return code from IBM MQ, see Return codes for information about how your application can handle IBM MQ return codes.
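
The prefixes listed in this step can be turned into a small lookup when you scan logs. The following is a minimal Python sketch and is not part of IBM MQ: the function name `classify_message_prefix` and the returned labels are illustrative assumptions. It uses only the prefixes named above (CSQ, IXL, DSN, and ATR).

```python
# Prefixes taken from the step above; the labels are illustrative.
PREFIXES = {
    "CSQ": "IBM MQ for z/OS",
    "IXL": "XES",
    "DSN": "Db2",
    "ATR": "RRS",
}


def classify_message_prefix(message_id: str) -> str:
    """Return the component suggested by the first three letters of a message ID."""
    prefix = message_id.strip().upper()[:3]
    # Anything else belongs to another program or to the operating system.
    return PREFIXES.get(prefix, "other (see that product's messages and codes documentation)")


if __name__ == "__main__":
    for msg in ("CSQU572E", "IXL013I", "IEE105I"):
        print(msg, "->", classify_message_prefix(msg))
```

A lookup like this only routes you to the right messages and codes documentation; it does not interpret the message itself.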

  3. Has there been an abend?
    If your application has stopped running, this might be caused by an abnormal termination (abend). Abends can be caused by the user ending the task being performed before it terminates normally; for example, if you purge a CICS transaction. Abends can also be caused by an error in an application program.
    You are notified of an abend in one of the following places, depending on what type of application you are using:
    • For Batch applications, your listing shows the abend.
    • For CICS applications, you see a CICS transaction abend message. If your task is a terminal task, this message is displayed on your screen. If your task is not attached to a terminal, the message is displayed on the CICS CSMT log.
    • For IMS applications, in all cases, you see a message at the IBM MQ for IMS master terminal and in the listing of the dependent region involved. If an IMS transaction that had been entered from a terminal was being processed, an error message is also sent to that terminal.
    • For TSO applications, you might see a TSO message with a return code on your screen. (Whether this message is displayed depends on the way your system is set up, and the type of error.)
    For some abends, an address space dump is produced. For CICS transactions, a transaction dump showing the storage areas of interest to the transaction is provided.
    • If an application passes some data, the address of which is no longer valid, a dump is sometimes produced in the address space of the user.
      Note: For a batch dump, the dump is formatted and written to SYSUDUMP. For information about SYSUDUMPs, see SYSUDUMP information on z/OS. For CICS, a system dump is written to the SYS1.DUMP data sets, as well as a transaction dump being taken.
    • If a problem with IBM MQ for z/OS itself causes an abend, an abend code of X'5C6' or X'6C6' is returned, along with an abend reason code. This reason code uniquely describes the cause of the problem. See IBM MQ for z/OS abends for information about the abend codes, and see Return codes for an explanation of the reason code.

    If your program has terminated abnormally, see Dealing with abends on IBM MQ for z/OS.

    If your system has terminated abnormally, and you want to analyze the dump produced, see IBM MQ for z/OS dumps. This section tells you how to format the dump, and how to interpret the data contained in it.
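
As a rough illustration of the abend codes mentioned in this step, the following Python sketch checks whether an abend code is one of the two that the text attributes to IBM MQ for z/OS itself (X'5C6' or X'6C6'). The function name `is_mq_abend` and the accepted input spellings (`X'5C6'`, `S5C6`, `5C6`) are assumptions made for the example.

```python
# Abend codes that the step above attributes to IBM MQ for z/OS itself.
MQ_ABEND_CODES = {"5C6", "6C6"}


def is_mq_abend(abend_code: str) -> bool:
    """Return True if the abend code is X'5C6' or X'6C6'.

    Assumption: the code may be written as X'5C6', S5C6, or 5C6.
    """
    code = abend_code.strip().upper()
    if code.startswith("X'") and code.endswith("'"):
        code = code[2:-1]          # strip the X'...' wrapper
    elif code.startswith("S") and len(code) == 4:
        code = code[1:]            # strip a leading S (system abend)
    return code in MQ_ABEND_CODES
```

A result of `True` only tells you where to look next; the accompanying abend reason code is what describes the cause.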

  4. Have you received no response from an MQSC command?
    If you have issued an MQSC command from an application, and not from a z/OS console, but you have not received a response, consider the following questions:
    • Is the command server running?
      Check that the command server is running, as follows:
      1. Use the DISPLAY CMDSERV command at the z/OS console to display the status of the command server.
      2. If the command server is not running, start it using the START CMDSERV command.
      3. If the command server is running, use the DISPLAY QUEUE command with the name of the system-command input queue and the CURDEPTH and MAXDEPTH attributes to define the data displayed. If these values show that the queue is full, and the command server has been started, the messages are not being read from the queue.
      4. Try stopping the command server and then restarting it, responding to any error messages that are produced.
      5. Issue the display command again to see if it is working now.
    • Has a reply been sent to the dead-letter queue?

      If you do not know the name of the system dead-letter queue, use the DISPLAY QMGR DEADQ command to find the name. Use this name in the DISPLAY QUEUE command with the CURDEPTH attribute to see if there are any messages on the queue. The dead-letter queue message header (dead-letter header structure) contains a reason or feedback code describing the problem. For information about the dead-letter header structure, see Reason (MQLONG).

    • Are the queues enabled for PUTs and GETs?

      Use the DISPLAY QUEUE command from the console to check, for example DISPLAY QUEUE(SYSTEM.COMMAND.INPUT) PUT GET.

    • Is the WaitInterval parameter set to a sufficiently long time?

      If your MQGET call has timed out, your application receives a completion code of 2 and a reason code of 2033 (MQRC_NO_MSG_AVAILABLE). (See WaitInterval (MQLONG) and MQGET - Get message for information about the WaitInterval parameter, and completion and reason codes from MQGET.)

    • Is a sync point required?

      If you are using your own application program to put commands onto the system-command input queue, consider whether you must take a sync point. Unless you have excluded your request message from sync point (for example, by specifying MQPMO_NO_SYNCPOINT when putting it), you must take a sync point after putting messages to the queue and before attempting to receive reply messages.

    • Are the MaxDepth and MaxMsgL parameters of your queues set sufficiently high?

      See CSQO016E for information about defining the system-command input queue and the reply-to queue.

    • Are you using the CorrelId and MsgId parameters correctly?

      You must identify the queue and then display the CURDEPTH. Use the DISPLAY QUEUE command from the console (for example, DISPLAY QUEUE(MY.REPLY.QUEUE) CURDEPTH), to see if there are messages on the reply-to queue that you have not received. Set the values of MsgId and CorrelId in your application to ensure that you receive all messages from the queue.

    The following questions are applicable if you have issued an MQSC command from either a z/OS console (or its equivalent), or an application, but have not received a response:
    • Is the queue manager still running, or did your command cause an abend?

      Look for error messages indicating an abend, and if one occurred, see IBM MQ for z/OS dumps.

    • Were any error messages issued?

      Check to see if any error messages were issued that might indicate the nature of the error.

    For information about the different methods you can use to enter MQSC commands, see Sources from which you can issue MQSC and PCF commands on IBM MQ for z/OS.
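
The MQGET timeout case described in this step (completion code 2 with reason code 2033) can be recognized in application code. The following Python sketch uses plain integer constants rather than any MQ client library; the function names and the wording of the follow-up checks are illustrative assumptions drawn from the questions in this step.

```python
MQCC_FAILED = 2               # completion code quoted in this step
MQRC_NO_MSG_AVAILABLE = 2033  # reason code quoted in this step


def mqget_timed_out(comp_code: int, reason: int) -> bool:
    """Return True if the pair matches the MQGET timeout described above."""
    return comp_code == MQCC_FAILED and reason == MQRC_NO_MSG_AVAILABLE


def follow_up_checks(comp_code: int, reason: int) -> list:
    """List the checks from this step that apply after a reply was not received."""
    if not mqget_timed_out(comp_code, reason):
        return ["Look up the reason code in the return codes documentation."]
    return [
        "Is the command server running?",
        "Has a reply been sent to the dead-letter queue?",
        "Is the WaitInterval parameter set to a sufficiently long time?",
        "Was a sync point taken before waiting for the reply?",
        "Are CorrelId and MsgId set so that the reply can be matched?",
    ]
```

The list is a reminder of the questions in this step, not a diagnosis; a 2033 reason code by itself does not say which of them applies.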

  5. Is there a problem with the IBM MQ queues?
    If you suspect that there is a problem affecting the queues on your subsystem, use the operations and control panels to display the system-command input queue.
    • Has the system responded? If the system responds, then at least one queue is working. In this case, continue with Step 6.
    • Has the system not responded? The problem might be with the whole subsystem. In this instance, try stopping and restarting the queue manager, responding to any error messages that are produced. Check for any messages on the console needing action. Resolve any that might affect IBM MQ, such as a request to mount a tape for an archive log. See if other subsystems or CICS regions are affected. Use the DISPLAY QMGR COMMANDQ command to identify the name of the system command input queue.
    • Does the problem still occur after restart? Contact IBM Support for help (see Collecting troubleshooting information).
  6. Are some of your queues working?
    If you suspect that the problem occurs with only a subset of queues, select the name of a local queue that you think is having problems and use the DISPLAY QUEUE and DISPLAY QSTATUS commands to display information about the queue.
    • Is the queue being processed?
      • If CURDEPTH is at MAXDEPTH, it might indicate that the queue is not being processed. Check that all applications that use the queue are running normally (for example, check that transactions in your CICS system are running or that applications started in response to Queue Depth High events are running).
      • Use the command DISPLAY QSTATUS(xx) IPPROCS to see if the queue is open for input. If not, start the application.
      • If CURDEPTH is not at MAXDEPTH, check the following queue attributes to ensure that they are correct:
        • If triggering is being used, is the trigger monitor running? Is the trigger depth too big? Is the process name correct? Have all the trigger conditions been met?

          Use the command DISPLAY QSTATUS(xx) IPPROCS to see if an application has the same queue open for input. In some triggering scenarios, a trigger message is not produced if the queue is open for input. Stop the application to cause the triggering processing to be invoked.

        • Can the queue be shared? If not, another application (batch, IMS, or CICS) might already have it open for input.
        • Is the queue enabled appropriately for GET and PUT?
    • Do you have a long-running unit of work?

      If CURDEPTH is not zero, but when you attempt to MQGET a message the queue manager replies that there is no message available, either use the command DIS QSTATUS(xx) TYPE(HANDLE) to show you information about applications that have the queue open, or use the command DIS CONN(xx) to give you more information about an application that is connected to the queue.

    • How many tasks are accessing the queues?

      Use the command DISPLAY QSTATUS(xx) OPPROCS IPPROCS to see how many tasks are putting messages on to, and getting messages from the queue. In a queue-sharing environment, check OPPROCS and IPPROCS on each queue manager. Alternatively, use the CMDSCOPE attribute to check all the queue managers. If there are no application processes getting messages from the queue, determine the reason, which might, for example, be because the applications need to be started, or a connection has been disrupted, or because the MQOPEN call has failed for some reason.

    • Is this queue a shared queue? Does the problem affect only shared queues?

      Check that there is not a problem with the sysplex elements that support shared queues. For example, check that there is not a problem with the IBM MQ-managed Coupling Facility list structure.

      Use the command D XCF,STRUCTURE,STRNAME=ALL to check that the Coupling Facility structures are accessible.

      Use the command D RRS to check that RRS is active.

    • Is this queue part of a cluster?

      Check to see if the queue is part of a cluster (from the CLUSTER or CLUSNL attribute). If it is, verify that the queue manager that hosts the queue is still active in the cluster.

    If you cannot solve the problem, contact IBM Support for help (see Collecting troubleshooting information).
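
The depth and open-handle checks in this step can be summarized as a small decision helper. The following Python sketch assumes you have already read CURDEPTH, MAXDEPTH, and IPPROCS from the DISPLAY QUEUE and DISPLAY QSTATUS output; the function name `triage_queue` and the hint wording are illustrative and not part of IBM MQ.

```python
from typing import List


def triage_queue(curdepth: int, maxdepth: int, ipprocs: int) -> List[str]:
    """Return hints based on the CURDEPTH, MAXDEPTH, and IPPROCS checks above."""
    hints = []
    if curdepth >= maxdepth:
        # CURDEPTH at MAXDEPTH suggests that nothing is draining the queue.
        hints.append("Queue is full: check that the applications using it are running.")
    if ipprocs == 0:
        # No handle open for input, so no application is getting messages.
        hints.append("Queue is not open for input: start the getting application.")
    if 0 < curdepth < maxdepth and ipprocs > 0:
        hints.append(
            "Messages are waiting although the queue is open for input: check "
            "triggering, whether the queue can be shared, GET and PUT "
            "enablement, and long-running units of work."
        )
    return hints
```

For example, `triage_queue(5000, 5000, 0)` returns both the "full" and the "not open for input" hints, while an empty queue with an active getter returns no hints.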
  7. Are the correct queues defined?
    IBM MQ requires certain predefined queues. Problems can occur if these queues are not defined correctly.
    • Check that the system-command input queue, the system-command reply model queue, and the reply-to queue are correctly defined, and that the MQOPEN calls were successful.
    • If you are using the system-command reply model queue, check that it was defined correctly.
    • If you are using clusters, you need to define the SYSTEM.CLUSTER.COMMAND.QUEUE to use commands relating to cluster processing.
  8. Does the problem affect only remote or cluster queues?
    If the problem affects only remote or cluster queues, check:
    • Are the remote queues being accessed? Check that the programs putting messages to the remote queues have run successfully (see Dealing with incorrect output on z/OS).
    • Is the system link active? Use APPC or TCP/IP commands as appropriate to check whether the link between the two systems is active. Use PING or OPING for TCP/IP or D NET,ID=xxxxx,E for APPC.
    • Is triggering working? If you use triggering to start the distributed queuing process, check that the transmission queue has triggering set on and that the queue is get-enabled.
    • Is the channel or listener running? If necessary, start the channel or the listener manually, or try stopping and restarting the channel. See Configuring distributed queuing for more information. Look for error messages on the startup of the channel initiator and listener. See IBM MQ for z/OS messages, completion, and reason codes and Configuring distributed queuing to determine the cause.
    • What is the channel status? Check the channel status using the DISPLAY CHSTATUS(channel_name) command.
    • Are your process and channel definitions correct? Check your process definitions and your channel definitions.

    For information about how to use distributed queuing, and for information about how to define channels, see Configuring distributed queuing.

  9. Does the problem affect only shared queues?

    If the problem affects only queue-sharing groups, use the VERIFY QSG function of the CSQ5PQSG utility. This command verifies that the Db2 setup is consistent in terms of the bitmap allocation fields, and object definition for the Db2 queue manager, structure, and shared queue objects, and reports details of any inconsistency that is discovered.

    The following is an example of a VERIFY QSG report with errors:

    
    CSQU501I  VERIFY QSG function requested
    CSQU503I  QSG=SQ02, DB2 DSG=DSN710P5, DB2 ssid=DFP5
    CSQU517I  XCF group CSQGSQ02 already defined
    CSQU520I  Summary information for XCF group CSQGSQ02
    CSQU522I  Member=MQ04, state=QUIESCED, system=MV4A
    CSQU523I  User data=D4E5F4C15AD4D8F0F4404040C4C5....
    CSQU522I  Member=MQ03, state=QUIESCED, system=MV4A
    CSQU523I  User data=D4E5F4C15AD4D8F0F3404040C4C6....
    CSQU526I  Connected to DB2 DF4A
    CSQU572E  Usage map T01_ARRAY_QMGR and DB2 table CSQ.ADMIN_B_QMGR inconsistent
    CSQU573E  QMGR MQ04 in table entry 1 not set in usage map
    CSQU574E  QMGR 27 in usage map has no entry in table
    CSQU572E  Usage map T01_ARRAY_STRUC and DB2 table CSQ.ADMIN_B_STRUCTURE inconsistent
    CSQU575E  Structure APPL2 in table entry 4 not set in usage map
    CSQU576E  Structure 55 in usage map has no entry in table
    CSQU572E  Usage map T03_LH_ARRAY and DB2 table CSQ.OBJ_B_QUEUE inconsistent
    CSQU577E  Queue MYSQ in table entry 13 not set in usage map for structure APPL1
    CSQU576E  Queue 129 in usage map for structure APPL1 has no entry in table
    CSQU528I  Disconnected from DB2 DF4A
    CSQU148I  CSQ5PQSG Utility completed, return code=12
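
When a report like the one above is long, it can help to pull out only the error lines. The following Python sketch assumes that the final letter of each message ID indicates severity (I for information, E for error), as it appears to in the sample; the names `LINE`, `SAMPLE`, and `error_lines` are illustrative, and `SAMPLE` is a shortened copy of the report above.

```python
import re

# Message ID (CSQU + three digits + severity letter), then the message text.
LINE = re.compile(r"^\s*(CSQU\d{3})([A-Z])\s+(.*)$")

# Shortened copy of the sample report above.
SAMPLE = """\
CSQU501I  VERIFY QSG function requested
CSQU572E  Usage map T01_ARRAY_QMGR and DB2 table CSQ.ADMIN_B_QMGR inconsistent
CSQU573E  QMGR MQ04 in table entry 1 not set in usage map
CSQU148I  CSQ5PQSG Utility completed, return code=12
"""


def error_lines(report: str):
    """Return (message_id, text) pairs for lines whose severity letter is E."""
    errors = []
    for line in report.splitlines():
        match = LINE.match(line)
        if match and match.group(2) == "E":
            errors.append((match.group(1) + match.group(2), match.group(3)))
    return errors


if __name__ == "__main__":
    for msg_id, text in error_lines(SAMPLE):
        print(msg_id, text)
```

Look up each extracted message ID in IBM MQ for z/OS messages, completion, and reason codes for the recommended response.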
    
  10. Is your application or IBM MQ for z/OS running slowly?
    Slow applications can be caused by the application itself or underlying software including IBM MQ.
    If your application is running slowly, this could indicate that it is in a loop, or waiting for a resource that is not available.
    • Is the problem worse at peak system load times? This could also be caused by a performance problem. Perhaps it is because your system needs tuning, or because it is operating near the limits of its capacity. This type of problem is probably worst at peak system load times, typically at mid-morning and mid-afternoon. If your network extends across more than one time zone, peak system load might seem to you to occur at some other time.
    • Does the problem occur when the system is lightly loaded? If you find that degrading performance is not dependent on system loading, but happens sometimes when the system is lightly loaded, a poorly designed application program is probably to blame. This could manifest itself as a problem that only occurs when specific queues are accessed.
    • Is IBM MQ for z/OS running slowly? The following symptoms might indicate that IBM MQ for z/OS is running slowly:
      • Your system is slow to respond to commands.
      • Repeated displays of the queue depth indicate that the queue is being processed slowly for an application with which you would expect a large amount of queue activity.

    For guidance on dealing with waits and loops, see Dealing with applications that are running slowly or have stopped on z/OS, and on dealing with performance problems, see Dealing with performance problems on z/OS.

  11. Has your application or IBM MQ for z/OS stopped processing work?
    There are several reasons why your system might unexpectedly stop processing work. The problem areas to check for include:
    • Are there any queue manager problems? The queue manager might be shutting down.
    • Are there any application problems? An application programming error might mean that the program branches away from its normal processing, or the application might get in a loop. There might also have been an application abend.
    • Are there any problems with IBM MQ? Your queues might have become disabled for MQPUT or MQGET calls, the dead-letter queue might be full, or IBM MQ for z/OS might be in a wait state, or a loop.
    • Are there any z/OS or other system problems? z/OS might be in a wait state, or CICS or IMS might be in a wait state or a loop. There might be problems at the system or sysplex level that are affecting the queue manager or the channel initiator; for example, excessive paging, DASD problems, or higher priority tasks with high processor usage.
    • Are there any Db2 or RRS problems? Check that Db2 and RRS are active.

    In all cases, carry out the following checks to determine the cause of the problem:

    1. Check for error messages.
      Use the DISPLAY THREAD(*) command to check if the queue manager is running. If the queue manager has stopped running, look for any messages that might explain the situation. Messages are displayed on the z/OS console, or on your terminal if you are using the operations and control panels. Use the DISPLAY DQM command to see if the channel initiator is working, and the listeners are active. The z/OS command
      DISPLAY R,L
      
      lists messages with outstanding replies. Check to see whether any of these replies are relevant. In some circumstances, for example, when it has used all its active logs, IBM MQ for z/OS waits for operator intervention.
    2. If there are no error messages, issue the following z/OS commands:
      DISPLAY A,xxxxMSTR
      DISPLAY A,xxxxCHIN
      
      where xxxx is the IBM MQ for z/OS subsystem name.

      If you receive a message telling you that the queue manager or channel initiator has not been found, this message indicates that the subsystem has terminated. This condition could be caused by an abend or by operator shutdown of the system.

      If the subsystem is running, you receive message IEE105I. This message includes the CT=nnnn field, which contains information about the processor time being used by the subsystem. Note the value of this field, and reissue the command.
      • If the CT= value has not changed, this indicates that the subsystem is not using any processor time. This could indicate that the subsystem is in a wait state (or that it has no work to do). If you can issue a command like DISPLAY DQM and you get output back, this indicates there is no work to do rather than a hang condition.
      • If the CT= value has changed dramatically, and continues to do so over repeated displays, this could indicate that the subsystem is busy or possibly in a loop.
      • If the reply indicates that the subsystem is now not found, this indicates that it was in the process of terminating when the first command was issued. If a dump is being taken, the subsystem might take a while to terminate. A message is produced at the console before terminating. To check that the channel initiator is working, issue the DISPLAY DQM command. If the response does not show the channel initiator working this could be because it is getting insufficient resources (like the processor). In this case, use the z/OS monitoring tools, such as RMF, to determine if there is a resource problem. If it is not, restart the channel initiator.
    3. Check whether the queue manager or channel initiator has terminated abnormally.
      Look for any messages saying that the queue manager or channel initiator address space has abnormally terminated. If you get a message for which the system action is to terminate IBM MQ, find out whether a system dump was produced. For more information, see IBM MQ dumps.
    4. Check whether IBM MQ for z/OS might still be running.
      Consider also that IBM MQ for z/OS might still be running, but only slowly. If it is running slowly, you probably have a performance problem. To confirm this, see Step 10. For advice about what to do next, see Dealing with performance problems.
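
The CT= comparison in check 2 above can be expressed as a short helper. The following Python sketch assumes that you have already converted each CT= reading to seconds (the display format is not parsed here); the function name `interpret_ct` and the `busy_delta` threshold are hypothetical, because the text does not define how large a "dramatic" change is.

```python
from typing import Sequence


def interpret_ct(samples: Sequence[float], busy_delta: float = 1.0) -> str:
    """Interpret successive CT= readings (seconds, oldest first).

    busy_delta is a hypothetical threshold for a large change between displays.
    """
    if len(samples) < 2:
        raise ValueError("need at least two CT= readings")
    # Change in processor time between each pair of consecutive displays.
    deltas = [later - earlier for earlier, later in zip(samples, samples[1:])]
    if all(delta == 0 for delta in deltas):
        return "no processor time used: waiting, or no work to do"
    if all(delta >= busy_delta for delta in deltas):
        return "processor time rising on every display: busy, or possibly in a loop"
    return "some processor time used: inconclusive, repeat the display"
```

If the readings are unchanged, issuing a command such as DISPLAY DQM and getting output back suggests that there is no work to do rather than a hang, as described in check 2.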