APAR status
Closed as program error.
Error description
An MQ classes for JMS application using the automatic client reconnect function may hang during reconnect processing after a multi-instance queue manager fails over to the standby instance. Application threads remain in a receive() call on a JMS MessageConsumer object and do not return. A Javacore (thread dump) of the application JVM shows that application threads and internal MQ classes for JMS threads are stuck in conditional wait states until the JVM is killed and restarted.

The following example threads, with their associated Java callstacks, illustrate the problem.

An application thread attempting to consume a message:

"Application-Thread-1" J9VMThread:0x0000000031D6F600, j9thread_t:0x0000000042D9CE30, java/lang/Thread:0x000000000964B6D0, state:CW, prio=5
  at java/lang/Object.wait
  at com/ibm/mq/jmqi/remote/impl/RemoteSession.exchangeTSH
     (entered lock: com/ibm/mq/jmqi/remote/impl/RemoteSession$RemoteRequestEntry@0x000000000970E028, entry count: 1)
  at com/ibm/mq/jmqi/remote/impl/RemoteProxyQueue.requestMessagesReconnectable
  at com/ibm/mq/jmqi/remote/impl/RemoteProxyQueue.requestMessages
  at com/ibm/mq/jmqi/remote/impl/RemoteProxyQueue.flushQueue
  at com/ibm/mq/jmqi/remote/impl/RemoteProxyQueue.proxyMQGET
  at com/ibm/mq/jmqi/remote/api/RemoteFAP.jmqiGetInternalWithRecon
  at com/ibm/mq/jmqi/remote/api/RemoteFAP.jmqiGetInternal
  at com/ibm/mq/jmqi/internal/JmqiTools.getMessage
  at com/ibm/mq/jmqi/remote/api/RemoteFAP.jmqiGet
  at com/ibm/mq/ese/jmqi/InterceptedJmqiImpl.jmqiGet
  at com/ibm/mq/ese/jmqi/ESEJMQI.jmqiGet
  at com/ibm/msg/client/wmq/internal/WMQConsumerShadow.getMsg
     (entered lock: java/lang/Object@0x00000000093C19B8, entry count: 1)
  at com/ibm/msg/client/wmq/internal/WMQSyncConsumerShadow.receiveInternal
  at com/ibm/msg/client/wmq/internal/WMQConsumerShadow.receive
  at com/ibm/msg/client/wmq/internal/WMQMessageConsumer.receive
  at com/ibm/msg/client/jms/internal/JmsMessageConsumerImpl.receiveInboundMessage
  at com/ibm/msg/client/jms/internal/JmsMessageConsumerImpl.receive
  at com/ibm/mq/jms/MQMessageConsumer.receive

An MQ classes for JMS "Remote Receive Thread", which is responsible for reading data sent by the queue manager over a TCP/IP connection:

"RcvThread: com.ibm.mq.jmqi.remote.impl.RemoteTCPConnection@1641630639[qmid=QM1_2019-07-01_10.31.17,fap=13,channel=JMS.SVRCONN,ccsid=850,sharecnv=10,hbint=300,peer=xxx.xxx.xxx.MIQM01/xxx.xxx.xxx.MIQM01(1414),localport=50832,ssl=no]" J9VMThread:0x0000000031D99B00, j9thread_t:0x0000000040681550, java/lang/Thread:0x00000000092C3A40, state:CW, prio=5
Waiting on: com/ibm/mq/jmqi/remote/api/RemoteHconn$ReconnectMutex@0x000000000931B348 Owned by: <unowned>
Java callstack:
  at java/lang/Object.wait
  at com/ibm/mq/jmqi/remote/api/RemoteHconn.checkForReconnect
     (entered lock: com/ibm/mq/jmqi/remote/api/RemoteHconn$ReconnectMutex@0x000000000931B348, entry count: 1)
  at com/ibm/mq/jmqi/remote/api/RemoteHconn.getSession
  at com/ibm/mq/jmqi/remote/api/RemoteHconn.getSession
  at com/ibm/mq/jmqi/remote/impl/RemoteProxyQueueManager.receiveNotification
  at com/ibm/mq/jmqi/remote/impl/RemoteRcvThread.run
  at com/ibm/msg/client/commonservices/workqueue/WorkQueueItem.runTask
  at com/ibm/msg/client/commonservices/workqueue/SimpleWorkQueueItem.runItem
  at com/ibm/msg/client/commonservices/workqueue/WorkQueueItem.run
  at com/ibm/msg/client/commonservices/workqueue/WorkQueueManager.runWorkQueueItem
  at com/ibm/msg/client/commonservices/j2se/workqueue/WorkQueueManagerImplementation$ThreadPoolWorker.run

The MQ classes for JMS "Remote Reconnect Thread", which is responsible for creating connection and object handles for JMS resources used by the application:

"JMSCCThreadPoolWorker-4" J9VMThread:0x0000000031C08C00, j9thread_t:0x0000000045AE7B48, java/lang/Thread:0x0000000028303C50, state:CW, prio=5
Waiting on: com/ibm/mq/jmqi/remote/api/RemoteHconn$CallLock@0x000000000931B2A8 Owned by: <unowned>
Java callstack:
  at java/lang/Object.wait
  at com/ibm/mq/jmqi/remote/util/ReentrantMutex.acquire
     (entered lock: com/ibm/mq/jmqi/remote/api/RemoteHconn$CallLock@0x000000000931B2A8, entry count: 2)
  at com/ibm/mq/jmqi/remote/util/ReentrantMutex.acquire
     (entered lock: com/ibm/mq/jmqi/remote/api/RemoteHconn$CallLock@0x000000000931B2A8, entry count: 1)
  at com/ibm/mq/jmqi/remote/api/RemoteHconn.enterCall
  at com/ibm/mq/jmqi/remote/api/RemoteHconn.enterCall
  at com/ibm/mq/jmqi/remote/impl/RemoteReconnectThread.reconnect
  at com/ibm/mq/jmqi/remote/impl/RemoteReconnectThread.run
  at com/ibm/msg/client/commonservices/workqueue/WorkQueueItem.runTask
  at com/ibm/msg/client/commonservices/workqueue/SimpleWorkQueueItem.runItem
  at com/ibm/msg/client/commonservices/workqueue/WorkQueueItem.run
  at com/ibm/msg/client/commonservices/workqueue/WorkQueueManager.runWorkQueueItem
  at com/ibm/msg/client/commonservices/j2se/workqueue/WorkQueueManagerImplementation$ThreadPoolWorker.run
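When a Javacore is not immediately to hand, a rough in-process check can list threads that are waiting inside MQ client code. The following is a minimal sketch only, not part of IBM MQ: the class WaitingThreadScan and the method findWaiting are hypothetical names, and it assumes that the Javacore state "CW" corresponds to the Java thread states WAITING and TIMED_WAITING. The package prefix passed in main is taken from the callstacks in this APAR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative helper (hypothetical, not part of IBM MQ).
class WaitingThreadScan {

    // Returns the names of live threads that are in a waiting state and have
    // at least one stack frame whose class name starts with classPrefix.
    static List<String> findWaiting(String classPrefix) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread.State state = entry.getKey().getState();
            // Assumption: Javacore "CW" maps to WAITING or TIMED_WAITING.
            if (state != Thread.State.WAITING && state != Thread.State.TIMED_WAITING) {
                continue;
            }
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().startsWith(classPrefix)) {
                    names.add(entry.getKey().getName());
                    break;
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Package prefix taken from the callstacks shown in this APAR.
        for (String name : findWaiting("com.ibm.mq.jmqi.remote.")) {
            System.out.println("waiting in MQ client code: " + name);
        }
    }
}
```

A thread that appears in this list on repeated checks over several minutes, with an unchanged callstack, is a candidate for the hang described here. A full Javacore is still needed to confirm the lock details.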
Local fix
Use a sharing conversations value of 1 on the server-connection channel used by the MQ classes for JMS application.
Problem summary
****************************************************************
USERS AFFECTED:
This issue affects MQ classes for JMS applications that use the automatic client reconnect function.

Platforms affected:
MultiPlatform
****************************************************************
PROBLEM DESCRIPTION:
This APAR addresses two related, but subtly different, deadlocks that could have resulted from the same scenario, depending upon non-deterministic timing windows.

In both scenarios, the MQ classes for JMS received two notification flows from a queue manager indicating that a waiting get had failed. This could be due to a multi-instance queue manager failing over to the standby instance, for example. The notification flow contained the MQ reason code 2009.

An internal thread known as the "Remote Receive Thread" (named "RcvThread: ..." as shown in Javacores) received the notification flows sent by the queue manager over the channel instance. When the first notification flow was processed, the automatic client reconnection logic was invoked. However, the connection object associated with the channel instance was not marked as disconnected, because the TCP/IP socket was still valid. The Remote Receive Thread then began processing the second notification and became blocked, waiting for the reconnect processing to complete.

For the threads to deadlock with the Java callstacks noted in the Error description section of this APAR, the application thread performing the waiting get needed to hold a lock on the connection handle (hConn) used to issue the message get API call. It held the lock and was waiting for data to be made available by the Remote Receive Thread. However, the Remote Receive Thread was blocked due to reconnect processing. The internal "Remote Reconnect Thread" (shown in the Error description as the "JMSCCThreadPoolWorker-4" thread) could not complete the reconnect processing because it was blocked waiting for the lock held by the application thread.

Therefore, there was a three-way deadlock between an application thread, a Remote Receive Thread and the Remote Reconnect Thread.

A second deadlock could have occurred when the Remote Receive Thread was in the same state as described above but the Remote Reconnect Thread had the following callstack:

Waiting on: com/ibm/mq/jmqi/remote/impl/RemoteSession$AsyncTshLock@0x0000000017A9A060 Owned by: <unowned>
Java callstack:
  at java/lang/Object.wait
  at com/ibm/mq/jmqi/remote/impl/RemoteSession.receiveAsyncTsh
     (entered lock: com/ibm/mq/jmqi/remote/impl/RemoteSession$AsyncTshLock@0x0000000017A9A060, entry count: 1)
  at com/ibm/mq/jmqi/remote/impl/RemoteSession.receiveTSH
  at com/ibm/mq/jmqi/remote/impl/RemoteSession.startConversation
  at com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification.sessionFromEligible
  at com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification.getSessionFromEligibleConnection
     (entered lock: com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification$ConnectionsLock@0x000000000868CF50, entry count: 1)
  at com/ibm/mq/jmqi/remote/impl/RemoteConnectionSpecification.getSession
  at com/ibm/mq/jmqi/remote/impl/RemoteConnectionPool.getSession
  at com/ibm/mq/jmqi/remote/api/RemoteFAP.jmqiConnect
  at com/ibm/mq/jmqi/remote/impl/RemoteReconnectThread.reconnect
  at com/ibm/mq/jmqi/remote/impl/RemoteReconnectThread.run
  at com/ibm/msg/client/commonservices/workqueue/WorkQueueItem.runTask
  at com/ibm/msg/client/commonservices/workqueue/SimpleWorkQueueItem.runItem
  at com/ibm/msg/client/commonservices/workqueue/WorkQueueItem.run
  at com/ibm/msg/client/commonservices/workqueue/WorkQueueManager.runWorkQueueItem
  at com/ibm/msg/client/commonservices/j2se/workqueue/WorkQueueManagerImplementation$ThreadPoolWorker.run

In this scenario, the Remote Reconnect Thread attempted to reconnect a connection handle over an existing channel instance. It was waiting for a response to be made available by the Remote Receive Thread. However, the Remote Receive Thread was blocked waiting for the reconnection to complete.
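The circular wait in the second scenario can be reduced to a small model. The sketch below is illustrative only and does not use MQ code: the class ReconnectWaitModel, the method stalls and both latches are hypothetical stand-ins for the Remote Receive Thread, the Remote Reconnect Thread and the events each was waiting for. Timeouts are used so that the model terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Simplified model of the second deadlock (hypothetical, not MQ code).
class ReconnectWaitModel {

    // Returns true if either thread gave up waiting, i.e. the pair stalled.
    // receiveBlocksOnReconnect = true models the behaviour before the fix.
    static boolean stalls(boolean receiveBlocksOnReconnect, long timeoutMillis) {
        CountDownLatch reconnectDone = new CountDownLatch(1); // signalled by the reconnect thread
        CountDownLatch responseRead = new CountDownLatch(1);  // signalled by the receive thread
        boolean[] timedOut = new boolean[2];

        Thread receive = new Thread(() -> {
            // Before the fix: no further data is read until reconnect completes.
            if (receiveBlocksOnReconnect && !await(reconnectDone, timeoutMillis)) {
                timedOut[0] = true;
                return;
            }
            responseRead.countDown();
        }, "receive");

        Thread reconnect = new Thread(() -> {
            // Reconnect needs a response that only the receive thread can deliver.
            if (!await(responseRead, timeoutMillis)) {
                timedOut[1] = true;
                return;
            }
            reconnectDone.countDown();
        }, "reconnect");

        receive.start();
        reconnect.start();
        join(receive);
        join(reconnect);
        return timedOut[0] || timedOut[1];
    }

    private static boolean await(CountDownLatch latch, long millis) {
        try {
            return latch.await(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking receive stalls: " + stalls(true, 200));
        System.out.println("non-blocking receive stalls: " + stalls(false, 200));
    }
}
```

With receiveBlocksOnReconnect set to true, each thread waits for an event that only the other can produce, so both time out. With it set to false, the receive thread keeps delivering data and the reconnect completes.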
Problem conclusion
Three changes have been made to the MQ classes for JMS to address the deadlocks described by this APAR:

1) On receipt of a notification from the queue manager containing a connection broken type reason code, mark the channel instance as broken even if no exception has been thrown when using the java.net.Socket object associated with the channel instance.

2) Allow Remote Receive Threads to continue processing and reading data over an existing channel instance when automatic client reconnect is invoked.

3) Version channel instances such that the Remote Reconnect Thread does not attempt to reuse existing channel instances created before its current reconnect cycle. Instead, create a new channel instance and then attempt to use this to reconnect any broken connection handles (hConns). This may result in a small increase in the number of channel instances, because channel instances that were created before the reconnect cycle, and that may still be connected and valid, will not be considered for multiplexing reconnected hConns.

---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version    Maintenance Level
v8.0       8.0.0.14
v9.0 LTS   9.0.0.9
v9.1 CD    9.1.4
v9.1 LTS   9.1.0.4

The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
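Changes 1 and 3 can be pictured with a small model of a channel instance pool. This is a sketch under stated assumptions, not the MQ classes for JMS implementation: VersionedChannelPool, ChannelInstance and all method names are hypothetical, only reason code 2009 is treated as a connection broken type code, and any per-channel limit on shared conversations is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of versioned channel instances (hypothetical, not MQ code).
class VersionedChannelPool {

    static final class ChannelInstance {
        final int id;
        final int cycle;   // reconnect cycle in which this instance was created
        boolean broken;

        ChannelInstance(int id, int cycle) {
            this.id = id;
            this.cycle = cycle;
        }
    }

    private final List<ChannelInstance> instances = new ArrayList<>();
    private int currentCycle = 0;
    private int nextId = 1;

    ChannelInstance create() {
        ChannelInstance c = new ChannelInstance(nextId++, currentCycle);
        instances.add(c);
        return c;
    }

    void beginReconnectCycle() {
        currentCycle++;
    }

    // Change 1: a connection broken notification marks the instance as broken,
    // regardless of whether the underlying socket has reported an error.
    void onNotification(ChannelInstance c, int reasonCode) {
        if (reasonCode == 2009) { // assumption: only 2009 is modelled here
            c.broken = true;
        }
    }

    // Change 3: reuse only instances created in the current reconnect cycle;
    // older instances are skipped even if they still look connected.
    ChannelInstance acquireForReconnect() {
        for (ChannelInstance c : instances) {
            if (!c.broken && c.cycle == currentCycle) {
                return c;
            }
        }
        return create();
    }

    int size() {
        return instances.size();
    }

    public static void main(String[] args) {
        VersionedChannelPool pool = new VersionedChannelPool();
        ChannelInstance old = pool.create();
        pool.beginReconnectCycle();
        ChannelInstance fresh = pool.acquireForReconnect();
        System.out.println("reused old instance: " + (fresh == old));
        System.out.println("instances in pool: " + pool.size());
    }
}
```

The extra instance created in main, even though the old one was never marked broken, corresponds to the small increase in channel instances that the APAR text describes.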
Temporary fix
Comments
APAR Information
APAR number
IT29434
Reported component name
IBM MQ BASE MP
Reported component ID
5724H7251
Reported release
800
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2019-06-12
Closed date
2019-10-04
Last modified date
2019-10-04
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
IBM MQ BASE MP
Fixed component ID
5724H7251
Applicable component levels
Document Information
Modified date:
04 October 2019