APAR status
Closed as program error.
Error description
The queue manager might generate a failure data capture (FDC) record with probe ID RN189010 or RN262010 and error code rrcE_FILE_CORRUPT under heavy load after the fix for APAR IT42194 has been applied. The FDC looks like the following:

| Operating System   :- SunOS 5.11 |
| LVLS               :- 9.0.0.16 |
| Product Long Name  :- IBM MQ for Solaris (x86-64 platform) |
| Vendor             :- IBM |
| O/S Registered     :- 1 |
| Data Path          :- /var/mqm |
| Installation Path  :- /opt/mqm |
| Installation Name  :- Installation2 (2) |
| License Type       :- Production |
| Probe Id           :- RN189010 |
| Application Name   :- MQM |
| Component          :- rflIndex |
| SCCS Info          :- /build/jslot0/p900_P/src/lib/client/amqrfila.c, |
| Program Name       :- amqrmppa |
| Arguments          :- -m QME |
| Addressing mode    :- 64-bit |
| LANG               :- C |
| Process            :- 3895 |
| Thread             :- 27 RemoteResponder |
| QueueManager       :- QME |
| UserApp            :- FALSE |
| ConnId(1) IPCC     :- 187 |
| ConnId(3) QM-P     :- 451 |
| Last HQC           :- 1.0.0-3211776 |
| Last HSHMEMB       :- 0.0.0-0 |
| Last ObjectName    :- |
| Major Errorcode    :- rrcE_FILE_CORRUPT |
| Minor Errorcode    :- OK |
| Probe Type         :- MSGAMQ9517 |
| Probe Severity     :- 2 |
| Probe Description  :- AMQ9517: File damaged. |
| FDCSequenceNumber  :- 0 |

MQM Function Stack
ccxResponder
rrxResponder
rrxOpenSync
rflOpen
rflIndex
xcsFFST

The channel might also become unresponsive, with the symptoms documented in APAR IT40017:

AMQ9514 Channel is in use

! Operating System   :- OS400 V7R5M0 !
! PIDS               :- 5724H7251 !
! LVLS               :- 9.2.0.11 !
! Product Long Name  :- IBM MQ for IBM i !
! Probe Id           :- XC308010 !
! Application Name   :- MQM !
! Component          :- xlsReleaseMutex !
! SCCS Info          :- /build/slot1/p920_P/src/lib/cs/amqxlfsa.c, !
! Line Number        :- 3076 !
! Build Date         :- Apr 21 2023 !
! Build Level        :- p920-011-230421 !
! Job Name           :- 662118/QMQM/RUNMQCHL (RUNMQCHL) !
! Job Description    :- QMQM/QMQMJOBD !
! Submitted By       :- 653643/QMQM/RUNMQCHI !
! Activation Group   :- 19 (QMQM) (QMQM/RUNMQCHL) !
! Max File Handles   :- 2048 !
! Thread             :- 00000001 RunChannel !
! UserApp            :- FALSE !
! Major Errorcode    :- xecL_W_LONG_LOCK_WAIT !
! Minor Errorcode    :- OK !
! Probe Type         :- MSGAMQ6150 !
! Probe Severity     :- 3 !
! Probe Description  :- AMQ6150W: IBM MQ resource busy. !
! FDCSequenceNumber  :- 0 !
! Arith1             :- 0 !
! Arith2             :- 2402 0x'962' !
! Comment1           :- SyncFile !
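
To check whether existing FDC files show the rrcE_FILE_CORRUPT symptom described here, the header fields can be matched programmatically. The following is a minimal Python sketch, not an IBM-supplied tool. The function names are hypothetical, the errors directory /var/mqm/errors is an assumption derived from the Data Path shown in the sample FDC, and the parser assumes each header field sits on its own line in the form "| Name :- value |" (or with "!" borders, as in the IBM i sample).

```python
import re
from pathlib import Path

# Probe IDs that this APAR associates with rrcE_FILE_CORRUPT.
PROBE_IDS = {"RN189010", "RN262010"}

# One FDC header line, e.g. "| Probe Id :- RN189010 |".
# The border character is "|" in the Solaris sample and "!" in the IBM i sample.
FIELD_RE = re.compile(
    r"^\s*[|!]\s*(?P<name>.+?)\s*:-\s*(?P<value>.*?)\s*[|!]\s*$"
)


def parse_fdc_records(text):
    """Split FDC text into a list of {field name: value} dictionaries.

    A repeated field name is treated as the start of the next record
    (an assumption made for this sketch, since one file can hold
    several records).
    """
    records, current = [], {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue  # function stack, trace data, blank lines
        name, value = match.group("name"), match.group("value")
        if name in current:
            records.append(current)
            current = {}
        current[name] = value
    if current:
        records.append(current)
    return records


def matches_apar_symptom(record):
    """True if the record has the probe ID and error code named in this APAR."""
    return (
        record.get("Probe Id") in PROBE_IDS
        and record.get("Major Errorcode") == "rrcE_FILE_CORRUPT"
    )


def scan_error_directory(directory="/var/mqm/errors"):
    """Return (file name, probe ID) pairs for matching records.

    The default directory is an assumption; adjust it to the data path
    of the installation being checked.
    """
    hits = []
    for path in sorted(Path(directory).glob("*.FDC")):
        text = path.read_text(errors="replace")
        for record in parse_fdc_records(text):
            if matches_apar_symptom(record):
                hits.append((path.name, record["Probe Id"]))
    return hits
```

Calling scan_error_directory() with no arguments scans the assumed default location. A matching record indicates only that the symptom was recorded, not that the queue manager was harmed.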
Local fix
Problem summary
****************************************************************
USERS AFFECTED:
Users of IBM MQ with the fix for APAR IT42194 applied.

Platforms affected:
MultiPlatform
****************************************************************
PROBLEM DESCRIPTION:
In internal testing under a very high load, an FDC record with probe ID RN189010 and error code rrcE_FILE_CORRUPT was generated due to a logic error in the code that updates the channel sync file. The logic error exposed a timing condition where invalid data could be written into the channel sync file during concurrent updates. The invalid data was detected by new sync file validation logic added by IT42194, although it was often not harmful to the queue manager.

At MQ 9.2 and earlier, this could have been seen when multiple channels are starting or stopping concurrently. In this scenario there was also a possibility that a further timing window could be encountered, within which the affected channel processes could enter a looping state, consuming additional CPU and potentially delaying other channels running within the same process (if the affected channel is being serviced by an amqrmppa channel pool process).

The channel sync file is not used at version 9.3 and later, and so 9.3 is not susceptible to this issue.

At all versions including 9.3, it was theoretically possible to encounter the same FDC record if concurrent administrative modifications are made to a client channel definition table (CCDT). IBM is not aware of any cases of this latter scenario being encountered.
Problem conclusion
The timing window to encounter this FDC record when updating the channel sync file, and the possibility of the channel becoming unresponsive as a result, have been corrected for the affected 9.2 and earlier releases.

The timing window to encounter this FDC record when updating a CCDT file has been corrected for the affected 9.3 and earlier releases.

---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version    Maintenance Level
v9.0 LTS   9.0.0.18
v9.1 LTS   9.1.0.15
v9.2 LTS   9.2.0.15
v9.3 LTS   9.3.0.10
v9.x CD    9.3.3

The latest available maintenance can be obtained from
'WebSphere MQ Recommended Fixes'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on its
planned availability can be found in 'WebSphere MQ Planned
Maintenance Release Dates'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
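
The maintenance levels listed in this section can be compared against an installed level (for example, the LVLS value from an FDC header) with a short script. The following Python sketch is illustrative only. The function names are hypothetical, and it assumes that a non-zero modification number (the third component of V.R.M.F) identifies a Continuous Delivery level and that later CD levels are cumulative. LTS releases that this APAR does not list are reported as unknown rather than guessed.

```python
# Maintenance levels targeted by this APAR, keyed by (version, release).
LTS_FIX_LEVELS = {
    (9, 0): (9, 0, 0, 18),
    (9, 1): (9, 1, 0, 15),
    (9, 2): (9, 2, 0, 15),
    (9, 3): (9, 3, 0, 10),
}
# First Continuous Delivery level targeted by this APAR.
CD_FIX_LEVEL = (9, 3, 3)


def parse_vrmf(version):
    """Turn 'V.R.M' or 'V.R.M.F' (e.g. '9.2.0.11') into a 4-tuple of ints."""
    parts = [int(part) for part in version.strip().split(".")]
    if not 3 <= len(parts) <= 4:
        raise ValueError(f"expected V.R.M or V.R.M.F, got {version!r}")
    while len(parts) < 4:
        parts.append(0)  # a missing fix pack number is treated as 0
    return tuple(parts)


def has_it43654_fix(version):
    """Return True or False for listed releases, None if the release is not listed."""
    vrmf = parse_vrmf(version)
    if vrmf[2] != 0:
        # Assumption: non-zero modification means a CD level.
        return vrmf[:3] >= CD_FIX_LEVEL
    fix_level = LTS_FIX_LEVELS.get(vrmf[:2])
    if fix_level is None:
        return None  # LTS release not covered by the list in this APAR
    return vrmf >= fix_level
```

For example, has_it43654_fix("9.2.0.11") returns False for the level shown in the IBM i FDC above, and has_it43654_fix("9.2.0.15") returns True.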
Temporary fix
Comments
APAR Information
APAR number
IT43654
Reported component name
IBM MQ BASE M/P
Reported component ID
5724H7261
Reported release
900
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2023-04-27
Closed date
2023-05-09
Last modified date
2023-09-25
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
IBM MQ BASE M/P
Fixed component ID
5724H7261
Applicable component levels
Document Information
Modified date:
26 September 2023