APAR status
Closed as program error.
Error description
In this scenario the local queue manager had logger filesystem availability problems. As a result, thousands of application messages had accumulated on the SYSTEM.CLUSTER.TRANSMIT.QUEUE, and one or more channels held indoubt batches of messages. While the queue manager was in this state, the administrator ran REFRESH CLUSTER on it.

The indoubt batches were then rolled back (probably automatically because of the log space shortage, although the same effect could be seen if RESOLVE CHANNEL were used to roll back the batch), so the messages reappeared on the transmission queue. The local queue manager's repository manager then tried to reallocate the messages from the rolled-back indoubt batches. The reallocation routine found that the cluster cache no longer held any information about the queues for which the messages were destined. The repository manager did not break out of its reallocation routine to ask the full repositories for details of those queues, so it hit repeated MQRC_CLUSTER_RESOLUTION_ERROR errors over a period of many minutes. These errors were visible in the MQ trace file for the amqrrmfa process.

Application calls to MQOPEN for queues not known locally fail with reason code 2189 (MQRC_CLUSTER_RESOLUTION_ERROR).

Other symptoms include:
-- Failure to recognize other queue managers in the cluster, including the queue manager hosting the cluster queue.
-- Message buildup on the SYSTEM.CLUSTER.COMMAND.QUEUE (SCCQ) and the SYSTEM.CLUSTER.TRANSMIT.QUEUE (SCTQ).
Local fix
Problem summary
****************************************************************
USERS AFFECTED:
Systems that suffer multiple problems, including a log space shortage, at a time when there are inflight batches and many other application messages on the SYSTEM.CLUSTER.TRANSMIT.QUEUE. The problem occurs only if the REFRESH CLUSTER command is issued while the queue manager is in this state.

Platforms affected:
MultiPlatform
****************************************************************
PROBLEM DESCRIPTION:
The reallocation routine within the repository manager repeatedly hit an MQRC_CLUSTER_RESOLUTION_ERROR condition, for the same message each time. There were thousands of messages on the cluster transmission queue and, because of a flaw in the logic flow, the routine re-read the same message thousands of times, sleeping for 1 second between attempts. It did not break out of this loop to send the query to the full repositories that was needed to relieve the situation.
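The flawed loop described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not IBM's implementation: the function names, the use of a Python list for the transmission queue, and the "one failed re-read per queued message" model are all hypothetical and chosen to match the description (same message re-read, 1 second sleep, no query sent).

```python
MQRC_NONE = 0
MQRC_CLUSTER_RESOLUTION_ERROR = 2189  # reason code named in this APAR


def resolve(queue_name, cluster_cache):
    """Hypothetical lookup: succeed only if the cache knows the queue."""
    if queue_name in cluster_cache:
        return MQRC_NONE
    return MQRC_CLUSTER_RESOLUTION_ERROR


def flawed_reallocation(xmit_queue, cluster_cache, sleep=lambda seconds: None):
    """Model of the pre-fix behaviour (an assumption, not the real code).

    xmit_queue is a list of destination queue names, one per message.
    Returns (failures, seconds_slept, queries_sent).
    """
    failures = 0
    seconds_slept = 0
    queries_sent = 0  # never incremented: the loop has no exit to send a query
    for _ in range(len(xmit_queue)):
        stuck = xmit_queue[0]  # the same message is re-read on every pass
        if resolve(stuck, cluster_cache) == MQRC_CLUSTER_RESOLUTION_ERROR:
            failures += 1
            sleep(1)  # 1 second pause between attempts, per the description
            seconds_slept += 1
    return failures, seconds_slept, queries_sent


# 3000 queued messages and an empty cluster cache (as after REFRESH CLUSTER)
print(flawed_reallocation(["APP.QUEUE"] * 3000, set()))
```

In this model the time lost grows linearly with the depth of the transmission queue, which is consistent with the "many minutes" of repeated errors seen in the trace.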
Problem conclusion
The correct behaviour for the reallocation routine is to schedule a re-run of itself in 60 seconds' time, and to break from its work to allow the repository manager to request information from the full repositories. This is the behaviour that has now been coded in the MQ queue manager.

---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version    Maintenance Level
v8.0       8.0.0.10
v9.0 LTS   9.0.0.4

The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
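The corrected control flow (break out, let the repository manager query the full repositories, re-run 60 seconds later) might be sketched as follows. This is a simplified illustration, not the product code: every function and variable name is hypothetical, the full repository is modelled as a plain set of known queue names, and time is a simulated counter instead of a real timer.

```python
RERUN_DELAY_SECONDS = 60  # re-run interval stated in the problem conclusion


def reallocation_pass(xmit_queue, cluster_cache):
    """One pass over the queued messages (destination names).

    Returns (reallocated, queries, rerun_in). On the first unknown
    destination the pass stops, so the caller can query the full
    repositories, and asks to be re-run after RERUN_DELAY_SECONDS.
    """
    reallocated = []
    for dest in xmit_queue:
        if dest not in cluster_cache:
            return reallocated, {dest}, RERUN_DELAY_SECONDS
        reallocated.append(dest)
    return reallocated, set(), None


def run_repository_manager(xmit_queue, cluster_cache, full_repository,
                           max_passes=10):
    """Hypothetical driver: alternate reallocation passes and queries.

    Returns (passes, simulated_seconds) once every message is reallocated.
    """
    remaining = list(xmit_queue)
    cache = set(cluster_cache)
    clock = 0
    for passes in range(1, max_passes + 1):
        done, queries, rerun_in = reallocation_pass(remaining, cache)
        remaining = remaining[len(done):]
        if rerun_in is None:
            return passes, clock
        # The routine has yielded, so the queries can now be answered.
        cache |= {q for q in queries if q in full_repository}
        clock += rerun_in
    raise RuntimeError("destinations still unresolved after max_passes")


print(run_repository_manager(["Q1", "Q1", "Q2"], set(), {"Q1", "Q2"}))
```

The `max_passes` guard exists only to keep the simulation finite when a destination is unknown to the modelled full repository; it is not something described in the APAR.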
Temporary fix
Comments
APAR Information
APAR number
IT24269
Reported component name
IBM MQ BASE MP
Reported component ID
5724H7251
Reported release
800
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2018-03-02
Closed date
2018-03-19
Last modified date
2018-03-19
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
PI95380 IT33907
Fix information
Fixed component name
IBM MQ BASE MP
Fixed component ID
5724H7251
Applicable component levels
Product: IBM MQ; Platform: Platform Independent; Version: 8.0.0.0; Business Unit: Cloud & Data Platform; Line of Business: Automation
Document Information
Modified date:
15 August 2020