Resolve the problem of an MQTT client program failing to
connect to the telemetry (MQXR) service.
Before you begin
Is the problem at the server, at the client, or with the
connection? Have you written your own MQTT v3 protocol handling
client, or an MQTT client app using the C or Java WebSphere® MQTT
clients?
Run the verification application supplied with WebSphere MQ Telemetry on
the server, and check that the telemetry channel and telemetry (MQXR)
service are running correctly. Then transfer the verification application
to the client, and run the verification application there.
About this task
An MQTT client might fail to connect to the telemetry server, or
might appear not to have connected, for a number of reasons.
Procedure
- Consider what inferences can be drawn from the reason code
that the telemetry (MQXR) service returned to MqttClient.Connect.
What type of connection failure is it?
Option | Description |
REASON_CODE_INVALID_PROTOCOL_VERSION | Make sure that the socket address corresponds to a telemetry channel, and you have not used the same socket address for another broker. |
REASON_CODE_INVALID_CLIENT_ID | Check that the client identifier is no longer than 23 bytes, and contains only characters from the range: A-Z, a-z, 0-9, './_% |
REASON_CODE_INVALID_DESTINATION | Check that the client identifier is not the same as the queue manager name. |
REASON_CODE_SERVER_CONNECT_ERROR | Check that the telemetry (MQXR) service and the queue manager are running normally. Use netstat to check that the socket address is not allocated to another application. |
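The client identifier rule in the table above can be checked before the client attempts to connect. The following is a minimal sketch, not part of WebSphere MQ Telemetry: the class name ClientIdCheck and the method isValid are hypothetical, the apostrophe is assumed to be one of the permitted characters as the list is written, and rejecting an empty identifier is an assumption.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical pre-connection check for the client identifier rule. */
final class ClientIdCheck {
    // Characters permitted in addition to A-Z, a-z, 0-9 (assumed to include the apostrophe).
    private static final String EXTRA = "'./_%";
    private static final int MAX_BYTES = 23;

    static boolean isValid(String clientId) {
        // Assumption: an empty or missing identifier is treated as invalid.
        if (clientId == null || clientId.isEmpty()) {
            return false;
        }
        // The limit is expressed in bytes, so measure the encoded length.
        if (clientId.getBytes(StandardCharsets.UTF_8).length > MAX_BYTES) {
            return false;
        }
        for (int i = 0; i < clientId.length(); i++) {
            char c = clientId.charAt(i);
            boolean allowed = (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || EXTRA.indexOf(c) >= 0;
            if (!allowed) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Example identifiers; pass your own as arguments.
        String[] ids = args.length > 0 ? args : new String[] {"sensor_01/line.A", "bad id"};
        for (String id : ids) {
            System.out.println(id + " -> " + isValid(id));
        }
    }
}
```

Running a check like this at the client helps separate an invalid identifier from the other causes of REASON_CODE_INVALID_CLIENT_ID before looking at the server.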
If you have written an MQTT client library rather than using
one of the libraries provided by IBM® WebSphere MQ
Telemetry, look at the CONNACK return code.
From these three errors you can infer that the client has connected to
the telemetry (MQXR) service, but the service has found an error.
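If you are handling the protocol yourself, the CONNACK return code can be decoded directly from the packet. The following is a minimal sketch based on the MQTT v3.1 packet layout (fixed header byte 0x20, remaining length 2, one reserved byte, then the return code). The class name ConnackDecoder and the method describe are hypothetical, and the sketch assumes that the complete four-byte packet has already been read from the socket.

```java
/** Hypothetical decoder for an MQTT v3.1 CONNACK packet. */
final class ConnackDecoder {

    static String describe(byte[] packet) {
        // Message type is in the upper four bits of byte 0; CONNACK is type 2.
        if (packet == null || packet.length < 4
                || (packet[0] & 0xF0) != 0x20 || packet[1] != 0x02) {
            throw new IllegalArgumentException("not a CONNACK packet");
        }
        // Byte 2 is reserved; byte 3 carries the connect return code.
        int returnCode = packet[3] & 0xFF;
        switch (returnCode) {
            case 0: return "Connection accepted";
            case 1: return "Connection refused: unacceptable protocol version";
            case 2: return "Connection refused: identifier rejected";
            case 3: return "Connection refused: server unavailable";
            case 4: return "Connection refused: bad user name or password";
            case 5: return "Connection refused: not authorized";
            default: return "Reserved return code " + returnCode;
        }
    }

    public static void main(String[] args) {
        // Sample packet with return code 2.
        byte[] sample = {0x20, 0x02, 0x00, 0x02};
        System.out.println(describe(sample));
    }
}
```

Any non-zero return code shows that the server received the CONNECT packet and answered it, so the network path to the server is working.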
- Consider what inferences can be drawn from the reason codes
that the client produces when the telemetry (MQXR) service does not
respond:
Option | Description |
REASON_CODE_CLIENT_EXCEPTION, REASON_CODE_CLIENT_TIMEOUT | Look for an FDC file at the server; see Server-side logs. When the telemetry (MQXR) service detects the client has timed out, it writes a first-failure data capture (FDC) file. It writes an FDC file whenever the connection is unexpectedly broken. |
The telemetry (MQXR) service might not have responded to
the client, so the timeout at the client expires. The WebSphere MQ Telemetry Java client hangs only if the application has
set an indefinite timeout. The client throws one of these exceptions
when the timeout set for MqttClient.Connect expires
with an undiagnosed connection problem.
Unless you find an FDC
file that correlates with the connection failure, you cannot infer
that the client tried to connect to the server:
- Confirm that the client sent a connection request.
- Does the remote socket address used by the client match
the socket address defined for the telemetry channel?
The default file persistence class in the Java SE MQTT client supplied with IBM WebSphere MQ Telemetry creates a folder with the name: clientIdentifier-tcphostNameport or clientIdentifier-sslhostNameport in the client working directory. The folder name tells you the hostName and port used in the connection attempt;
see Client-side log files.
- Can you ping the remote server address?
- Does netstat on the server show that the
telemetry channel is running on the port the client is connecting
to?
- Check whether the telemetry (MQXR) service found a problem
in the client request.
The telemetry (MQXR) service
writes errors it detects into mqxr.log, and the
queue manager writes errors into AMQERR01.LOG;
see
- Attempt to isolate the problem by running another client.
Run the sample programs on the server platform to eliminate
uncertainties about the network connection, then run the samples on
the client platform.
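Alongside the ping and netstat checks, a plain TCP connection attempt from the client machine shows whether anything is listening on the socket address of the telemetry channel. The following is a minimal sketch using only the Java standard library. The class name PortProbe and the method canConnect are hypothetical, and the host name and port 1883 in main are placeholders for your own telemetry channel address.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical TCP reachability probe for a telemetry channel address. */
final class PortProbe {

    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // A bounded timeout prevents the probe from hanging indefinitely.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolved host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder address; replace with the telemetry channel host and port.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 1883;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

A successful probe shows only that some application accepted the TCP connection. It does not confirm that the listener is a telemetry channel, so the netstat check on the server is still needed.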
- Other things to check:
- Are tens of thousands of MQTT clients trying to connect
at the same time?
Telemetry channels have a queue to
buffer a backlog of incoming connections. Connections are processed
at a rate in excess of 10,000 a second. The size of the backlog buffer is configurable
using the telemetry channel wizard in IBM WebSphere MQ Explorer. Its default size
is 4096. Check that the backlog has not been configured to a low value.
- Are the telemetry (MQXR) service and queue manager still
running?
- Has the client connected to a high availability queue
manager that has switched its TCP/IP address?
- Is a firewall selectively filtering outbound or return
data packets?