What to do if an agent is shown as being in an UNKNOWN state

Your agent is running and responds successfully to the ftePingAgent command, and items are being transferred normally. However, the fteListAgents and fteShowAgentDetails commands, and the IBM® MQ Explorer Managed File Transfer plugin, report the agent as being in an UNKNOWN state.

Why this problem occurs

Periodically, each agent publishes its status to the SYSTEM.FTE topic on the coordination queue manager. The frequency at which an agent publishes its status is controlled by the following agent properties:
agentStatusPublishRateLimit
The maximum rate, in seconds, at which the agent republishes its status because of a change in file transfer status. The default value of this property is 30 seconds.
agentStatusPublishRateMin
The minimum rate, in seconds, at which the agent publishes its status. This value must be greater than or equal to the value of the agentStatusPublishRateLimit property. The default value for the agentStatusPublishRateMin property is 300 seconds (or 5 minutes).
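As a rough illustration of how these two properties relate, the following Python sketch reads them from the text of an agent.properties file and checks the constraint described above. The helper names parse_properties and check_publish_rates are hypothetical and are not part of Managed File Transfer. The parser is an assumption that handles only simple key=value lines, not the full Java properties syntax.

```python
# Defaults taken from the property descriptions above (values in seconds).
DEFAULT_RATE_LIMIT = 30
DEFAULT_RATE_MIN = 300


def parse_properties(text):
    """Parse simple key=value lines; lines starting with '#' or '!' are comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_publish_rates(props):
    """Return (rate_limit, rate_min, ok); ok is False when rate_min < rate_limit."""
    rate_limit = int(props.get("agentStatusPublishRateLimit", DEFAULT_RATE_LIMIT))
    rate_min = int(props.get("agentStatusPublishRateMin", DEFAULT_RATE_MIN))
    return rate_limit, rate_min, rate_min >= rate_limit
```

For example, a file that sets only agentStatusPublishRateMin=20 leaves agentStatusPublishRateLimit at its default of 30, so the check reports the combination as invalid.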
The fteListAgents and fteShowAgentDetails commands, and the IBM MQ Explorer Managed File Transfer (MFT) plugin, use these publications to determine the status of an agent. To do this, the commands and the plugin perform the following steps:
  1. Connect to the coordination queue manager.
  2. Subscribe to the SYSTEM.FTE topic.
  3. Receive agent status publications.
  4. Create a temporary queue on the coordination queue manager.
  5. Put a message to the temporary queue, and save the put time in order to get the current time on the coordination queue manager system.
  6. Close the temporary queue.
  7. Use the information contained within the publications, and the current time, to determine the status of an agent.
  8. Disconnect from the coordination queue manager.
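
Step 5 relies on the put time of a message to obtain the current time on the coordination queue manager system. The following Python sketch shows one way such a put time could be turned into a timestamp. It assumes that the put time is read from the PutDate (YYYYMMDD) and PutTime (HHMMSSTH, in GMT) fields of the message descriptor. The function name is hypothetical and does not reflect how the commands are implemented internally.

```python
from datetime import datetime, timedelta, timezone


def mqmd_put_time_to_utc(put_date, put_time):
    """Convert MQMD PutDate (YYYYMMDD) and PutTime (HHMMSSTH) to a UTC datetime."""
    if len(put_date) != 8 or len(put_time) != 8:
        raise ValueError("expected 8-character PutDate and PutTime values")
    # The first six characters of PutTime are hours, minutes, and seconds.
    whole_seconds = datetime.strptime(put_date + put_time[:6], "%Y%m%d%H%M%S")
    # The last two characters are hundredths of a second.
    hundredths = int(put_time[6:8])
    stamp = whole_seconds + timedelta(milliseconds=10 * hundredths)
    return stamp.replace(tzinfo=timezone.utc)
```

Using the queue manager's own clock in this way means that the comparison in step 7 does not depend on the clock of the system where the command or plugin is running.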

The status message of an agent is considered stale if the difference between the time that it was published and the current time is greater than the value of the agent property agentStatusPublishRateMin (included in the status message) plus the value of the advanced coordination queue manager property agentStatusJitterTolerance.

By default, the agentStatusJitterTolerance property has a value of 3000 milliseconds (3 seconds).

If the agentStatusPublishRateMin and agentStatusJitterTolerance properties are set to their default values, then the status of an agent is considered stale if the difference between the time that it was published, and the current time, is greater than 303 seconds (or 5 minutes 3 seconds).

Any agent with a stale status message is reported by the fteListAgents and fteShowAgentDetails commands, and the IBM MQ Explorer MFT plugin, as being in an UNKNOWN state.
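
The staleness rule described above can be sketched in Python as follows. This is a minimal illustration rather than the product's actual code. The function and parameter names are hypothetical, and both timestamps are assumed to be expressed in milliseconds on the same clock.

```python
def staleness_threshold_ms(publish_rate_min_s=300, jitter_tolerance_ms=3000):
    """agentStatusPublishRateMin (seconds) plus agentStatusJitterTolerance (ms)."""
    return publish_rate_min_s * 1000 + jitter_tolerance_ms


def is_status_stale(published_ms, current_ms,
                    publish_rate_min_s=300, jitter_tolerance_ms=3000):
    """Return True if the status message is older than the staleness threshold."""
    age_ms = current_ms - published_ms
    return age_ms > staleness_threshold_ms(publish_rate_min_s, jitter_tolerance_ms)
```

With the default values the threshold is 303000 milliseconds, so a status message that is exactly 303 seconds old is still current, and one that is any older is treated as stale.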

The status publication of an agent can be stale for one of the following reasons:
  1. There is a significant difference in the system time between the system where the agent queue manager is running, and the system where the coordination queue manager is located.
  2. The channels between the agent queue manager and the coordination queue manager are stopped (which prevents new status messages from reaching the coordination queue manager).
  3. An authorization issue is preventing the agent from publishing its status to the SYSTEM.FTE topic on the coordination queue manager.
  4. An agent failure has occurred.

Troubleshooting the problem

Take the following steps to determine why the status of an agent is being reported as UNKNOWN:
  1. Check whether the agent is running, by logging on to the agent system. If the agent is stopped, then investigate why it is no longer running. Once it is running again, check whether its status is now being reported correctly.
  2. Check that the coordination queue manager is running. If it is not, restart it and then use the fteListAgents or fteShowAgentDetails command, or the IBM MQ Explorer MFT plugin, to see if the agent status is now being reported correctly.
  3. If the agent and coordination queue managers are running, check their error logs to see if there are any authorization issues which are preventing the agent from publishing its status messages. If the logs show that authorization issues are occurring, then ensure that the user running the agent process has the correct authority to publish messages to the SYSTEM.FTE topic on the coordination queue manager.

    If the error logs of the queue managers do not report any authorization issues, check that the status messages are not stuck in the IBM MQ network. Verify that all of the sender and receiver channels used to route the messages from the agent queue manager to the coordination queue manager are running.

    If the channels are running, check the transmission queues associated with the channels to make sure that the status messages are not stuck on them. Also check the dead letter queues of the queue managers to make sure that the status messages have not been placed there.

  4. If the channels are running, and the status messages are flowing through the IBM MQ network, then the next thing to check is that the queue manager's queued publish/subscribe engine is picking up the messages.
    The fteSetupCoordination command, which is used to define the coordination queue manager, provides you with some MQSC commands that must be run on the coordination queue manager to configure the queued publish/subscribe engine to receive publications. These commands perform the following steps:
    • Create the SYSTEM.FTE topic and its associated topic string.
    • Define a local queue called SYSTEM.FTE that will be used to receive incoming status messages.
    • Enable the queued publish/subscribe engine, by setting the PSMODE attribute on the queue manager to ENABLED.
    • Modify the SYSTEM.QPUBSUB.QUEUE.NAMELIST namelist, which is used by the queued publish/subscribe engine, so that it includes an entry for the new SYSTEM.FTE queue.
    For more information on this, including the MQSC commands that need to be run, see fteSetupCoordination.

    If there are messages on the SYSTEM.FTE queue, then you should check that the SYSTEM.QPUBSUB.QUEUE.NAMELIST namelist has been set up correctly and contains an entry for that queue. If the entry is missing, then the queued publish/subscribe engine will not detect any incoming status messages from the agent and will not process them.

    You should also ensure that the PSMODE attribute on the queue manager is set to ENABLED, which turns on the queued publish/subscribe engine.

  5. If the channels are running, and the status messages are flowing through the IBM MQ network and are being picked up from the SYSTEM.FTE queue by the queue manager's queued publish/subscribe engine, then collect the following traces:
    • An IBM MQ MFT trace from the agent, covering a time period equal to three times the value of the agent property agentStatusPublishRateMin. This ensures that the trace covers the time when the agent is publishing at least three messages containing its status. The trace should be collected dynamically, using the trace specification:
      
      com.ibm.wmqfte.statestore.impl.FTEAgentStatusPublisher,
      com.ibm.wmqfte.utils.AgentStatusDetails,
      com.ibm.wmqfte.wmqiface.AgentPublicationUtils,
      com.ibm.wmqfte.wmqiface.RFHMessageFactory=all
      
      Note: A reduced amount of trace is output using these strings.

      For information on how to enable the trace for agents running on IBM MQ for Multiplatforms, see Collecting a Managed File Transfer agent trace dynamically.

      For information on how to enable the trace for agents running on IBM MQ for z/OS®, see Collecting a Managed File Transfer for z/OS agent trace dynamically.

    • A concurrent trace of the queue managers used to route the status messages from the agent queue manager to the coordination queue manager.
    • A trace of the fteListAgents command, covering the time when the agent is shown as being in an UNKNOWN state. The trace should be collected using the trace specification:
      com.ibm.wmqfte=all

      For information on how to enable the trace for commands running on IBM MQ for Multiplatforms, see Tracing Managed File Transfer commands on Multiplatforms.

      For information on how to enable the trace for commands running on IBM MQ for z/OS, see Tracing Managed File Transfer for z/OS commands.

    Once the traces have been collected, they should be made available to IBM Support for analysis.
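
The checks in step 4 and the agent trace settings in step 5 can be sketched in Python as follows. This is an illustrative helper only, and all of the function names are hypothetical. It assumes that the runmqsc command is on the PATH, that the user has authority to administer the coordination queue manager, and that the four class names listed in step 5 form a single comma-separated trace specification.

```python
import subprocess

# MQSC commands that display the settings discussed in step 4.
COORDINATION_CHECKS = (
    "DISPLAY QMGR PSMODE",
    "DISPLAY NAMELIST(SYSTEM.QPUBSUB.QUEUE.NAMELIST) NAMES",
    "DISPLAY QLOCAL(SYSTEM.FTE) CURDEPTH",
)

# Classes from the agent trace specification in step 5.
AGENT_TRACE_CLASSES = (
    "com.ibm.wmqfte.statestore.impl.FTEAgentStatusPublisher",
    "com.ibm.wmqfte.utils.AgentStatusDetails",
    "com.ibm.wmqfte.wmqiface.AgentPublicationUtils",
    "com.ibm.wmqfte.wmqiface.RFHMessageFactory",
)


def build_mqsc_script():
    """Return the MQSC commands as one script, one command per line."""
    return "\n".join(COORDINATION_CHECKS) + "\n"


def run_coordination_checks(qmgr_name):
    """Pipe the script to runmqsc and return its raw output for manual review."""
    result = subprocess.run(
        ["runmqsc", qmgr_name],
        input=build_mqsc_script(),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def agent_trace_specification():
    """Join the class names into one specification ending in =all."""
    return ",".join(AGENT_TRACE_CLASSES) + "=all"


def agent_trace_duration_seconds(publish_rate_min_s=300):
    """Three times agentStatusPublishRateMin, as recommended in step 5."""
    return 3 * publish_rate_min_s
```

In the output of run_coordination_checks, look for PSMODE(ENABLED), an entry for SYSTEM.FTE in the namelist, and the current depth of the SYSTEM.FTE queue. With the default agentStatusPublishRateMin of 300 seconds, agent_trace_duration_seconds returns 900, so the agent trace should run for at least 15 minutes.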