APAR status
Closed as program error.
Error description
An MQ classes for JMS application using the automatic client reconnect function may hang if there are frequent network interruptions or packet loss between the client and queue manager systems. When this occurs, messages are not delivered to the application and the depth of the MQ queue increases.

A Javacore (thread dump) of the application JVM will show that application and internal MQ classes for JMS threads are stuck with the following callstacks until the JVM is killed and restarted:

Java callstack of an application thread attempting to consume a message:

"Application-Thread-1" prio=5 os_prio=0 tid=0x0000000019051000 nid=0x1e8c in Object.wait()
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait
        at com.ibm.mq.jmqi.remote.api.RemoteHconn.checkForReconnect
        - locked <0x00000000c8fc53b0> (a com.ibm.mq.jmqi.remote.api.RemoteHconn$ReconnectMutex)
        at com.ibm.mq.jmqi.remote.impl.RemoteProxyQueue.requestMutex
        at com.ibm.mq.jmqi.remote.impl.RemoteProxyQueue.requestMessagesReconnectable
        at com.ibm.mq.jmqi.remote.impl.RemoteProxyQueue.requestMessages
        at com.ibm.mq.jmqi.remote.impl.RemoteProxyQueue.flushQueue
        at com.ibm.mq.jmqi.remote.impl.RemoteProxyQueue.proxyMQGET
        at com.ibm.mq.jmqi.remote.api.RemoteFAP.jmqiGetInternalWithRecon
        at com.ibm.mq.jmqi.remote.api.RemoteFAP.jmqiGetInternal
        at com.ibm.mq.jmqi.internal.JmqiTools.getMessage
        at com.ibm.mq.jmqi.remote.api.RemoteFAP.jmqiGet
        at com.ibm.mq.ese.jmqi.InterceptedJmqiImpl.jmqiGet
        at com.ibm.mq.ese.jmqi.ESEJMQI.jmqiGet
        at com.ibm.msg.client.wmq.internal.WMQConsumerShadow.getMsg
        - locked <0x00000000c8fc5430> (a java.lang.Object)
        at com.ibm.msg.client.wmq.internal.WMQSyncConsumerShadow.receiveInternal
        at com.ibm.msg.client.wmq.internal.WMQConsumerShadow.receive
        at com.ibm.msg.client.wmq.internal.WMQMessageConsumer.receive
        at com.ibm.msg.client.jms.internal.JmsMessageConsumerImpl.receiveInboundMessage
        at com.ibm.msg.client.jms.internal.JmsMessageConsumerImpl.receive
        at com.ibm.mq.jms.MQMessageConsumer.receive

An MQ classes for JMS "Remote Receive Thread" which is responsible for reading data sent by the queue manager over a TCP/IP connection:

"RcvThread: com.ibm.mq.jmqi.remote.impl.RemoteTCPConnection@1303583763[qmid=QM1_2019-07-10_15.32.22,fap=13,channel=JMS.SVRCONN,ccsid=819,sharecnv=10,hbint=300,peer=localhost/127.0.0.1(1414),localport=50251,ssl=no]" #334 daemon prio=5 os_prio=0 tid=0x0000000015b93000 nid=0x1ef0 in Object.wait()
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait
        at com.ibm.mq.jmqi.remote.api.RemoteHconn.checkForReconnect
        - locked <0x00000000c8f6e618> (a com.ibm.mq.jmqi.remote.api.RemoteHconn$ReconnectMutex)
        at com.ibm.mq.jmqi.remote.api.RemoteHconn.getSession
        at com.ibm.mq.jmqi.remote.api.RemoteHconn.getSession
        at com.ibm.mq.jmqi.remote.api.RemoteFAP.spiOpen
        at com.ibm.mq.jmqi.remote.api.RemoteFAP.spiOpen
        at com.ibm.mq.jmqi.remote.api.RemoteHconn.dummyJmqiCall
        at com.ibm.mq.jmqi.remote.api.RemoteHconn.eligibleForReconnect
        at com.ibm.mq.jmqi.remote.api.RemoteHconn.deliverException
        at com.ibm.mq.jmqi.remote.impl.RemoteSession.deliverException
        at com.ibm.mq.jmqi.remote.impl.RemoteConnection.asyncConnectionBroken
        - locked <0x00000000c8f293e8> (a com.ibm.mq.jmqi.remote.impl.RemoteConnection$SessionsMutex)
        at com.ibm.mq.jmqi.remote.impl.RemoteRcvThread.run
        at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.runTask
        at com.ibm.msg.client.commonservices.workqueue.SimpleWorkQueueItem.runItem
        at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.run
        at com.ibm.msg.client.commonservices.workqueue.WorkQueueManager.runWorkQueueItem
        at com.ibm.msg.client.commonservices.j2se.workqueue.WorkQueueManagerImplementation$ThreadPoolWorker.run

The MQ classes for JMS "Remote Reconnect Thread" which is responsible for creating connection and object handles for JMS resources used by the application:

"JMSCCThreadPoolWorker-2" #92 daemon prio=5 os_prio=0 tid=0x00000000190e7000 nid=0x1e18 waiting for monitor entry [0x000000002be7e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.ibm.mq.jmqi.remote.impl.RemoteConnection.removeSession
        - waiting to lock <0x00000000c8f293e8> (a com.ibm.mq.jmqi.remote.impl.RemoteConnection$SessionsMutex)
        at com.ibm.mq.jmqi.remote.impl.RemoteSession.disconnect
        at com.ibm.mq.jmqi.remote.api.RemoteFAP.jmqiConnect
        at com.ibm.mq.jmqi.remote.impl.RemoteReconnectThread.reconnect
        at com.ibm.mq.jmqi.remote.impl.RemoteReconnectThread.run
        at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.runTask
        at com.ibm.msg.client.commonservices.workqueue.SimpleWorkQueueItem.runItem
        at com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.run
        at com.ibm.msg.client.commonservices.workqueue.WorkQueueManager.runWorkQueueItem
        at com.ibm.msg.client.commonservices.j2se.workqueue.WorkQueueManagerImplementation$ThreadPoolWorker.run
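When checking whether a hung application matches this APAR, it can help to search a thread dump for the frames shown in the callstacks above. The following is a minimal sketch, not part of IBM MQ: the class ThreadDumpScanner and its method threadsWithFrame are hypothetical names, and the parser assumes a dump in which each thread entry begins with a quoted thread name and each frame is on its own line starting with "at ". Other dump layouts would need a different parser.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists threads in a thread dump whose stack contains a given frame. */
final class ThreadDumpScanner {

    /**
     * Returns the names of all threads whose stack has a frame containing {@code frame}.
     * Assumes each thread entry starts with a line of the form "thread name" ...
     * and each frame line starts with "at " (an assumption about the dump layout).
     */
    static List<String> threadsWithFrame(String dump, String frame) {
        List<String> matches = new ArrayList<>();
        String currentThread = null;
        boolean recorded = false;
        for (String rawLine : dump.split("\\R")) {
            String line = rawLine.trim();
            if (line.startsWith("\"")) {
                // A quoted name at the start of a line opens a new thread entry.
                int end = line.indexOf('"', 1);
                currentThread = end > 0 ? line.substring(1, end) : line.substring(1);
                recorded = false;
            } else if (currentThread != null && !recorded
                    && line.startsWith("at ") && line.contains(frame)) {
                // Record each thread at most once, even if the frame repeats.
                matches.add(currentThread);
                recorded = true;
            }
        }
        return matches;
    }
}
```

For example, calling threadsWithFrame with the frame "RemoteHconn.checkForReconnect" and then with "RemoteConnection.removeSession" would list the waiting and blocked threads respectively. Threads stuck in both sets across successive dumps are consistent with the hang described here, although they do not prove it.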
Local fix
Problem summary
****************************************************************
USERS AFFECTED:
This issue affects users of the IBM MQ classes for JMS automatic client reconnect function.

Platforms affected:
MultiPlatform
****************************************************************
PROBLEM DESCRIPTION:
A JMS Connection and a JMS Session created by an MQ classes for JMS application each have a connection to a queue manager. These connections are referred to as "conversations" or "connection handles" (hConns) and multiple hConns can be multiplexed over a server-connection channel instance - as determined by the value of the channel's sharing conversations property.

A deadlock occurred within the MQ classes for JMS automatic client reconnect function when a connection error was detected by the hConn associated with a JMS Session and reconnect processing was invoked. This prevented the reconnection processing from completing.

The "RcvThread" that was associated with the channel instance (TCP/IP connection) used by the JMS Session's hConn attempted to verify that its "parent" hConn (the one associated with the JMS Connection) was either still valid, already reconnected or in need of reconnection. This is because the JMS Session must always connect to the same queue manager as the JMS Connection from which it was created and uses connection information from this parent hConn.

In MQ V9.1, it did this by attempting to issue a lightweight MQ API call to the queue manager because, for the most part, the hConn associated with a JMS Connection is used as a controlling hConn for asynchronous consume operations and so few MQ API calls are issued using it.

Before issuing the MQ API call, the "RcvThread" took a lock on a list of hConns multiplexed over a channel instance and checked to see if the hConn was in the process of being reconnected. It was and so the "RcvThread" blocked, waiting for the reconnect to complete.

An internal "RemoteReconnectThread" is responsible for reconnecting hConns. It was in the process of attempting to reconnect a particular hConn and required the lock on the list of hConns for the channel instance in order to perform some clean-up. This was because it initially tried to reconnect the hConn using an existing channel instance but failed because that channel instance was in the process of being disconnected due to the original connection error.

The "RemoteReconnectThread" could not obtain the lock because it was held by the "RcvThread", and the "RcvThread" would not release it until the "RemoteReconnectThread" had completed the reconnect processing. Neither thread could make progress.
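The lock ordering described above can be illustrated with a small, self-contained sketch. This is not the MQ classes for JMS code: the class ReconnectDeadlockSketch, its simulate method and all variable names are hypothetical, a ReentrantLock stands in for the lock on the list of hConns, and a CountDownLatch stands in for "reconnect complete". Timeouts replace the indefinite waits so that the demonstration terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of the stall: one thread waits while holding a lock the other needs. */
final class ReconnectDeadlockSketch {

    /** Returns true if the simulated reconnect completed, false if the threads stalled. */
    static boolean simulate(boolean holdListLockWhileWaiting) {
        ReentrantLock hconnListLock = new ReentrantLock();      // stands in for the hConn list lock
        CountDownLatch reconnectDone = new CountDownLatch(1);   // stands in for "reconnect complete"
        CountDownLatch receiverReady = new CountDownLatch(1);
        AtomicBoolean receiverSawReconnect = new AtomicBoolean(false);

        Thread receiver = new Thread(() -> {
            if (holdListLockWhileWaiting) {
                hconnListLock.lock();   // the problematic order: take the list lock first
            }
            try {
                receiverReady.countDown();
                // Wait for the reconnect; bounded here so the sketch terminates.
                receiverSawReconnect.set(reconnectDone.await(500, TimeUnit.MILLISECONDS));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                if (holdListLockWhileWaiting) {
                    hconnListLock.unlock();
                }
            }
        }, "receiver");

        Thread reconnector = new Thread(() -> {
            try {
                receiverReady.await();
                // Clean-up needs the list lock before the reconnect can be reported as done.
                if (hconnListLock.tryLock(200, TimeUnit.MILLISECONDS)) {
                    try {
                        reconnectDone.countDown();
                    } finally {
                        hconnListLock.unlock();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "reconnector");

        receiver.start();
        reconnector.start();
        try {
            receiver.join();
            reconnector.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return receiverSawReconnect.get();
    }

    public static void main(String[] args) {
        System.out.println("lock held while waiting: completed=" + simulate(true));
        System.out.println("lock not held while waiting: completed=" + simulate(false));
    }
}
```

With holdListLockWhileWaiting set to true the reconnect never completes within the timeouts, which mirrors the hang; with false it completes at once. Note that because one of the real threads is in Object.wait() rather than blocked on a monitor, a deadlock of this shape would not necessarily be flagged by JVM monitor-deadlock detection, which is one reason the callstacks have to be compared by hand.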
Problem conclusion
Two changes have been made to the MQ classes for JMS.

The first is to ensure that the "RemoteReconnectThread" does not attempt to reuse old channel instances to reconnect broken connection handles (hConns). At the start of each reconnect cycle, a new connection is created which can then be used to reconnect disconnected hConns.

The second change is to remove the need for the "RcvThread" to make an MQ API call on a "parent" hConn when a connection error is detected on a child hConn. The channel heartbeating function is sufficient to detect errors on the channel instance associated with a parent hConn even if it is not being used to issue regular MQ API calls.

---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version    Maintenance Level
v9.1 CD    9.1.4
v9.1 LTS   9.1.0.4

The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
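As a convenience, the maintenance levels listed above can be turned into a simple check against the version string of an installed V9.1 client. This is a rough sketch only: the class It29729FixLevel and method includesFix are hypothetical, and the code assumes that a V9.1 level with a modification number of 0 (9.1.0.x) is an LTS level and any other 9.1.x level is a CD level. It deliberately rejects anything other than 9.1, because this APAR text states fix levels only for that release.

```java
/** Hypothetical check of a 9.1 version string against the fix levels listed in this APAR. */
final class It29729FixLevel {

    /**
     * Returns true if the given V9.1 level is at or above the fix level for its stream.
     * Accepts "V.R.M" or "V.R.M.F"; a missing fix number is treated as 0.
     */
    static boolean includesFix(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 3 || parts.length > 4) {
            throw new IllegalArgumentException("Expected V.R.M or V.R.M.F: " + version);
        }
        int v = Integer.parseInt(parts[0]);
        int r = Integer.parseInt(parts[1]);
        int m = Integer.parseInt(parts[2]);
        int f = parts.length == 4 ? Integer.parseInt(parts[3]) : 0;
        if (v != 9 || r != 1) {
            // Other releases are outside what this APAR text states.
            throw new IllegalArgumentException("Only 9.1 levels are covered: " + version);
        }
        if (m == 0) {
            return f >= 4;   // assumed LTS stream: fixed in 9.1.0.4
        }
        return m >= 4;       // assumed CD stream: fixed in 9.1.4
    }

    public static void main(String[] args) {
        for (String level : new String[] {"9.1.0.3", "9.1.0.4", "9.1.3", "9.1.4"}) {
            System.out.println(level + " includes fix: " + includesFix(level));
        }
    }
}
```

The version string itself would come from whatever the installation reports; how it is obtained is left outside this sketch.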
Temporary fix
Comments
APAR Information
APAR number
IT29729
Reported component name
IBM MQ BASE MP
Reported component ID
5724H7271
Reported release
910
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2019-07-15
Closed date
2019-10-04
Last modified date
2019-10-04
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
IBM MQ BASE MP
Fixed component ID
5724H7271
Applicable component levels
Document Information
Modified date:
04 October 2019