APAR status
Closed as program error.
Error description
A customer running 12.10.FC7 on Linux x86_64 reported that a benchmark achieved far fewer transactions per second when running ~6000 clients against a single 52 CPU VP server than when splitting the clients across two 26 CPU VP servers. Each server was the primary server of an HDR pair, with DRINTERVAL set to 0 and HDR_TXN_SCOPE set to NEAR_SYNC.

The 52 CPU VP server was showing hundreds of threads sleeping forever with stacks like:

0x00000000013e79ef (oninit) yield_processor_mvp
0x00000000013eb1d7 (oninit) mt_yield
0x000000000106550d (oninit) cdrLSNQ_Wait
0x00000000011f55af (oninit) proxyWaitForAllNodesFromPrimary
0x0000000000d3401f (oninit) rscommit
0x000000000078788e (oninit) sqiscommit
0x000000000074c0fa (oninit) sqcommit
0x00000000006bec06 (oninit) aud_sqcommit
0x0000000000a03f44 (oninit) sql_commit
0x0000000000a040c9 (oninit) sq_commit
0x0000000000ad3653 (oninit) sqmain
0x00000000014f7756 (oninit) spawn_thread
0x00000000013c1790 (oninit) th_init_initgls
0x0000000001428327 (oninit) startup

This is a common stack for a thread in a NEAR_SYNC HDR environment that is waiting for the secondary to acknowledge that the commit log record made it to an HDR buffer. Stacks like these are expected for sqlexec threads in this HDR environment, but a multitude of them might indicate another, unexpected issue.

In stress testing designed to mimic comparable work, I observed via profiling that the cdrLSNQ_Wait function was often among the top 10, or at least the top 20, most expensive functions. This function contains a list that gets quite long, and the list is traversed to find a particular waiter, often requiring several hundred iterations. The more threads that run transactions and enter this function, the longer the list becomes and the more expensive each traversal is, which explains why the customer saw better throughput after dividing the clients across two instances.
So the above stack, which shows a sleeping thread, can also indicate a performance issue when hundreds of threads are in this state.
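The scaling effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Informix implementation: the names `find_waiter` and `total_cost` are hypothetical, the waiter list is modeled as a plain Python list scanned from the front, each waiter is assumed to be looked up exactly once, and the client counts are scaled down by a factor of 10 (600 instead of ~6000) to keep the run short.

```python
def find_waiter(waiters, target):
    """Scan the list front to back and return how many nodes were visited
    before the target waiter was found (hypothetical model of the lookup)."""
    steps = 0
    for node in waiters:
        steps += 1
        if node == target:
            return steps
    raise ValueError("waiter not in list")


def total_cost(num_waiters):
    """Total nodes visited when every waiter in a list of num_waiters
    entries is looked up once. Assumes the list stays at full length."""
    waiters = list(range(num_waiters))
    return sum(find_waiter(waiters, target) for target in waiters)


# One instance holding all waiters vs. two instances holding half each.
one_instance = total_cost(600)
two_instances = 2 * total_cost(300)

print("one instance: ", one_instance)
print("two instances:", two_instances)
print("ratio:        ", round(one_instance / two_instances, 2))
```

In this model the total traversal work grows roughly with the square of the list length, so halving the number of waiters per instance roughly halves the combined work even though the same number of lookups is performed. The real function's behavior under concurrency may differ; the sketch only shows why longer lists make each lookup more expensive.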
Local fix
Set HDR_TXN_SCOPE to ASYNC.
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* Users of Informix Server prior to 12.10.xC15 and 14.10.xC4.  *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Upgrade to Informix Server 12.10.xC15 (when available) or    *
* 14.10.xC4.                                                   *
****************************************************************
Problem conclusion
Fixed in Informix Server 12.10.xC15 and 14.10.xC4.
Temporary fix
Comments
APAR Information
APAR number
IT32060
Reported component name
INFORMIX SERVER
Reported component ID
5725A3900
Reported release
C10
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2020-03-03
Closed date
2020-12-10
Last modified date
2024-09-24
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
INFORMIX SERVER
Fixed component ID
5725A3900
Applicable component levels
[{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSGU8G","label":"Informix Servers"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"C10","Line of Business":{"code":"LOB10","label":"Data and AI"}}]
Document Information
Modified date:
25 September 2024