APAR status
Closed as program error.
Error description
IBM MQ channels might become unresponsive, with high CPU usage in the channel process (amqrmppa or runmqchl), if the channel synchronization record is corrupted.

If the problem affects channels on the sender side (for example, SDR or CLUSSDR channels), the channels send no messages. The DIS CHS output is likely to show no value in the SUBSTATE field:

AMQ8417I: Display Channel Status details.
   CHANNEL(CLUSCHL1)                       CHLTYPE(CLUSSDR)
   ...
   RQMNAME(RQM1)                           STATUS(RUNNING)
   SUBSTATE( )                             XMITQ(SYSTEM.CLUSTER.TRANSMIT.CLUSCHL1)

If the affected channels are on the receiver side (for example, RCVR or CLUSRCVR channels), the channel process on the receiver side consumes high CPU and the corresponding SDR or CLUSSDR channel goes into RETRYING state. The top output for the affected channel process shows high CPU usage:

   PID USER  PR NI   VIRT   RES   SHR S  %CPU %MEM   TIME+ COMMAND
133101 mqm   20  0 265412 13836 11584 R 106.2  0.4 2:08.97 runmqchl
133101 mqm   20  0 265412 13836 11584 R  99.7  0.4 2:11.97 runmqchl
133101 mqm   20  0 265412 13836 11584 R  99.3  0.4 2:14.97 runmqchl
133101 mqm   20  0 265412 13836 11584 R  99.7  0.4 2:17.97 runmqchl

IBM MQ trace shows the channel process repeatedly calling rflSeekBytes and rflReadBytes with the same pattern of comparisons and file pointers.
Example:

19:54:22.511503 133101.1 RSESS:000001 NewFilePointer=716(0x000002cc) <---------
19:54:22.511528 133101.1 RSESS:000001 NewFilePointer=1072(0x00000430)
19:54:22.511552 133101.1 RSESS:000001 NewFilePointer=1428(0x00000594)
19:54:22.511577 133101.1 RSESS:000001 NewFilePointer=1784(0x000006f8)
19:54:22.511603 133101.1 RSESS:000001 NewFilePointer=2140(0x0000085c)
19:54:22.511628 133101.1 RSESS:000001 NewFilePointer=2496(0x000009c0)
19:54:22.511653 133101.1 RSESS:000001 NewFilePointer=2852(0x00000b24)
19:54:22.511678 133101.1 RSESS:000001 NewFilePointer=716(0x000002cc) <---------
19:54:22.511704 133101.1 RSESS:000001 NewFilePointer=1072(0x00000430)
19:54:22.511728 133101.1 RSESS:000001 NewFilePointer=1428(0x00000594)
19:54:22.511753 133101.1 RSESS:000001 NewFilePointer=1784(0x000006f8)
19:54:22.511778 133101.1 RSESS:000001 NewFilePointer=2140(0x0000085c)
19:54:22.511806 133101.1 RSESS:000001 NewFilePointer=2496(0x000009c0)
19:54:22.511831 133101.1 RSESS:000001 NewFilePointer=2852(0x00000b24)
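The loop shows up in the trace as the same run of file pointers repeating. As a rough aid when reading a long trace, the following Python sketch pulls the NewFilePointer values out of formatted trace lines like those above and reports the shortest block that the excerpt repeats. It is a hypothetical helper, not an IBM tool, and it assumes the exact `NewFilePointer=<decimal>(0x<hex>)` layout shown in this example and that the excerpt starts at the beginning of a loop iteration.

```python
import re
from typing import List, Optional

# Matches entries such as "NewFilePointer=716(0x000002cc)".
# ASSUMPTION: the trace layout shown in this APAR; other trace formats may differ.
POINTER_RE = re.compile(r"NewFilePointer=(\d+)\(0x([0-9a-fA-F]+)\)")


def extract_pointers(trace_text: str) -> List[int]:
    """Return the file pointers, in order of appearance, as decimal integers."""
    pointers = []
    for decimal, hex_digits in POINTER_RE.findall(trace_text):
        value = int(decimal)
        # Each pointer is printed in decimal and in hex; reject inconsistent lines.
        if value != int(hex_digits, 16):
            raise ValueError(f"decimal/hex mismatch: {decimal} vs 0x{hex_digits}")
        pointers.append(value)
    return pointers


def repeating_block(pointers: List[int], min_repeats: int = 2) -> Optional[List[int]]:
    """Return the shortest block the sequence repeats at least min_repeats times.

    The final repetition may be cut off, because a trace excerpt can end
    part-way through a loop. Returns None when no such block exists.
    """
    n = len(pointers)
    for period in range(1, n // min_repeats + 1):
        if all(pointers[i] == pointers[i % period] for i in range(period, n)):
            return pointers[:period]
    return None
```

On the 14 pointers in the example, this returns the seven offsets 716 through 2852. Successive offsets in that block differ by 356 bytes, which suggests fixed-size records, although the APAR does not document the file layout.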
Local fix
1. Stop the queue manager.
2. Back up the queue manager.
3. Rename the sync file AMQRSYNA.DAT.
4. Start the queue manager with the -ns option (strmqm -ns QM).
5. Recreate the channel sync file (rcrmqobj -m QM -t syncfile).
6. Stop the queue manager.
7. Start the queue manager.
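The workaround steps can be scripted. The Python sketch below only prints the commands (a dry run) unless dry_run=False is passed. It assumes a UNIX installation where the queue manager data directory is /var/mqm/qmgrs/<dir> and the sync file is syncfile/AMQRSYNA.DAT beneath it, and it uses a hypothetical .corrupt suffix for the renamed file; verify both on your own system. The backup is left to your site procedure, and endmqm -w (wait for the queue manager to end) is our choice of shutdown option, since the APAR does not name one.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Tuple, Union

# ASSUMPTION: default UNIX data path and sync file location. The directory under
# qmgrs can differ from the queue manager name, so pass the real directory name.
QMGRS_ROOT = Path("/var/mqm/qmgrs")

Step = Tuple[str, Union[List[str], Tuple[Path, Path], str]]


def sync_file_path(qmgr_dir: str, qmgrs_root: Path = QMGRS_ROOT) -> Path:
    """Assumed location of the channel sync file for one queue manager."""
    return qmgrs_root / qmgr_dir / "syncfile" / "AMQRSYNA.DAT"


def local_fix_steps(qm: str, sync_file: Path) -> List[Step]:
    """The Local fix steps, in order, as (kind, argument) pairs."""
    renamed = sync_file.with_name(sync_file.name + ".corrupt")  # hypothetical suffix
    return [
        ("run", ["endmqm", "-w", qm]),                      # 1. stop
        ("manual", "Back up the queue manager"),            # 2. site-specific backup
        ("rename", (sync_file, renamed)),                   # 3. rename AMQRSYNA.DAT
        ("run", ["strmqm", "-ns", qm]),                     # 4. start with -ns
        ("run", ["rcrmqobj", "-m", qm, "-t", "syncfile"]),  # 5. recreate sync file
        ("run", ["endmqm", "-w", qm]),                      # 6. stop
        ("run", ["strmqm", qm]),                            # 7. start normally
    ]


def execute(steps: List[Step], dry_run: bool = True) -> None:
    """Print each step; perform it only when dry_run is False."""
    for kind, arg in steps:
        if kind == "run":
            print("$", shlex.join(arg))
            if not dry_run:
                subprocess.run(arg, check=True)  # stop at the first failing command
        elif kind == "rename":
            src, dst = arg
            print(f"rename {src} -> {dst}")
            if not dry_run:
                src.rename(dst)
        else:
            print("MANUAL:", arg)
            if not dry_run:
                input("Press Enter when this step is complete...")
```

For example, execute(local_fix_steps("QM1", sync_file_path("QM1"))) prints the plan without changing anything, which makes it easy to review before running it for real.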
Problem summary
****************************************************************
USERS AFFECTED:
All users of IBM MQ distributed channels who have a corrupted channel synchronization record in the channel sync file. Corruption of this file is not an expected or typical usage pattern and has not been observed as a result of any known product defect. The channel sync file is used by all queue manager channel types except SVRCONN/CLNTCONN and AMQP channels.

Platforms affected:
MultiPlatform
****************************************************************
PROBLEM DESCRIPTION:
The IBM MQ channel process did not detect corruption in the channel synchronization record. This caused an infinite loop, which left the channel unresponsive.
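The APAR does not describe the sync file format or how the record search is implemented, so the following Python sketch is only a hypothetical model of the failure and of the kind of guard the fix describes. It assumes a search that moves from record to record, where each record determines the offset of the next one. If corruption makes that offset point back to a record already visited, an unguarded search never ends; remembering the visited offsets turns the endless loop into an error. SyncFileCorrupt and find_record are invented names, and the record layout is an assumption.

```python
from typing import Dict, Optional, Set


class SyncFileCorrupt(Exception):
    """Hypothetical error raised when the record chain loops back on itself."""


def find_record(records: Dict[int, dict], start: int, channel: str) -> Optional[dict]:
    """Search a chain of records for a channel, detecting cycles.

    Hypothetical model: records maps a file offset to
    {"channel": name, "next": offset of the next record or None}.
    """
    visited: Set[int] = set()
    offset: Optional[int] = start
    while offset is not None:
        if offset in visited:
            # Without this check, a corrupted "next" offset would loop forever.
            raise SyncFileCorrupt(f"record chain revisits offset {offset}")
        visited.add(offset)
        record = records[offset]
        if record["channel"] == channel:
            return record
        offset = record["next"]
    return None  # reached the end of the chain without a match
```

A set of visited offsets is the simplest guard for a structure of unknown length; a fixed iteration limit would also stop the loop but needs a bound chosen in advance.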
Problem conclusion
The IBM MQ code has been modified to prevent an infinite loop if the channel synchronization record is corrupted. This APAR does not address the corruption of the channel synchronization record itself, because the cause of the corruption observed when this issue was reported remains unknown.

With the fix applied, if the queue manager detects an infinite loop while finding a channel record in the channel synchronization file, it generates the following error message and the channel goes into RETRYING state.

------------------------
02/28/2022 10:37:07 PM - Process(181467.1) User(root) Program(runmqchl)
                    Host(host1.ibm.com) Installation(Installation1)
                    VRMF(9.1.0.7) QMgr(qm1)
                    Time(2022-03-01T06:37:07.434Z)
                    ArithInsert1(1017)
                    CommentInsert1(AMQRSYNA.DAT)

AMQ9516E: File error occurred for file 'AMQRSYNA.DAT'.

EXPLANATION:
The filesystem returned error code 1017 for file 'AMQRSYNA.DAT'.
ACTION:
Record the name of the file and tell the systems administrator, who should ensure that file is correct and available, for example that the current user has appropriate access to the file for reading or writing.
------------------------

To resolve the issue, rebuild the sync file with rcrmqobj as described in the Local fix section.

The queue manager also generates the following failure data capture (FDC) record.
AMQ184577.0.FDC 2022/03/01 17:37:07.740247-8 Installation1 runmqchl 184577 1 RM738001 rflFindRecord Unknown(3F9)

Probe Id          :- RM738001
Application Name  :- MQM
Component         :- rflFindRecord
Program Name      :- runmqchl
Arguments         :- -c "CHL9 " -m "qm1
Major Errorcode   :- Unknown(3F9)
---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version     Maintenance Level
v9.0 LTS    9.0.0.16
v9.1 LTS    9.1.0.12
v9.2 LTS    9.2.0.7
v9.x CD     9.3.2

The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes':
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates':
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
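Two follow-up checks can be sketched in Python; all function names here are hypothetical. includes_it40017 compares a VRMF string against the PTF levels listed above and returns None for levels this APAR does not list (for example 9.3.0.x or later versions); it cannot see interim fixes applied below those levels. sync_file_errors scans queue manager error log text (typically errors/AMQERR01.LOG under the queue manager data directory) for AMQ9516E entries that name AMQRSYNA.DAT, assuming entries are separated by lines of dashes as in the example above. The code 1017 in ArithInsert1 is 0x3F9 in hexadecimal, the value the FDC shows as Unknown(3F9); treating the two as the same code is our reading, not something the APAR states.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical helpers; not an IBM-provided tool.
# PTF levels listed in APAR IT40017.
LTS_FIX_LEVELS = {
    (9, 0): (9, 0, 0, 16),
    (9, 1): (9, 1, 0, 12),
    (9, 2): (9, 2, 0, 7),
}
CD_FIX_LEVEL = (9, 3, 2)  # CD releases from 9.3.2 onward


def parse_vrmf(vrmf: str) -> Tuple[int, int, int, int]:
    """Parse "V.R.M.F" (or "V.R.M", with F taken as 0) into integers."""
    parts = [int(p) for p in vrmf.strip().split(".")]
    if not 3 <= len(parts) <= 4:
        raise ValueError(f"not a VRMF string: {vrmf!r}")
    parts += [0] * (4 - len(parts))
    v, r, m, f = parts
    return v, r, m, f


def includes_it40017(vrmf: str) -> Optional[bool]:
    """True/False for the streams this APAR lists; None when it does not say."""
    v, r, m, f = parse_vrmf(vrmf)
    if m == 0 and (v, r) in LTS_FIX_LEVELS:
        return (v, r, m, f) >= LTS_FIX_LEVELS[(v, r)]
    if (v, r) == (9, 3) and m > 0:
        return (v, r, m) >= CD_FIX_LEVEL
    return None


# ASSUMPTION: log entries are separated by lines of five or more dashes.
ENTRY_SEPARATOR = re.compile(r"^-{5,}.*$", re.MULTILINE)
ARITH_INSERT1 = re.compile(r"ArithInsert1\((\d+)\)")


def sync_file_errors(log_text: str) -> List[Optional[int]]:
    """ArithInsert1 of each AMQ9516E entry naming AMQRSYNA.DAT (None if absent)."""
    codes: List[Optional[int]] = []
    for entry in ENTRY_SEPARATOR.split(log_text):
        if "AMQ9516E" in entry and "AMQRSYNA.DAT" in entry:
            match = ARITH_INSERT1.search(entry)
            codes.append(int(match.group(1)) if match else None)
    return codes
```

For example, includes_it40017("9.1.0.7") returns False, even though the sample error message above reports VRMF(9.1.0.7) with the fix applied; this illustrates why the check covers only the listed PTF levels.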
Temporary fix
Comments
APAR Information
APAR number
IT40017
Reported component name
IBM MQ BASE MP
Reported component ID
5724H7271
Reported release
910
Status
CLOSED PER
PE
No
HIPER
Yes
Special Attention
NoSpecatt / Xsystem
Submitted date
2022-02-19
Closed date
2022-10-11
Last modified date
2023-02-24
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
IT43171
Fix information
Fixed component name
IBM MQ BASE MP
Fixed component ID
5724H7271
Applicable component levels
Business Unit: IBM Software w/o TPS (BU059)
Product: IBM MQ (SSYHRD)
Platform: Platform Independent (PF025)
Version: 910
Line of Business: Automation (LOB45)
Document Information
Modified date:
24 February 2023