Fixes are available
DB2 Version 9.1 Fix Pack 7 for Linux, UNIX and Windows
DB2 Version 9.1 Fix Pack 6 for Linux, UNIX and Windows
DB2 Version 9.1 Fix Pack 6a for Linux, UNIX and Windows
DB2 Version 9.1 Fix Pack 7a for Linux, UNIX and Windows
DB2 Version 9.1 Fix Pack 8 for Linux, UNIX and Windows
DB2 Version 9.1 Fix Pack 9 for Linux, UNIX and Windows
DB2 Version 9.1 Fix Pack 10 for Linux, UNIX and Windows
DB2 Version 9.1 Fix Pack 11 for Linux, UNIX and Windows
DB2 Version 9.1 Fix Pack 12 for Linux, UNIX and Windows
APAR status
Closed as program error.
Error description
DB2 uses an internal shared structure called a socket pair when an application connects to the database. The connection listener process acquires one of these socket pairs, and when an agent is dispatched for the connection, the agent process takes the socket pair and frees it. The number of socket pairs in use therefore increments when the connection listener receives a connection and quickly decrements again when the agent is assigned.

There is an internal limit of 32 socket pairs. Under most normal conditions DB2 never needs that many, because they are used and freed quickly. The agent processes are expected to get roughly the same share of CPU time as the listener, so the get and free actions on this structure happen at about the same rate.

However, a timing window has been observed. If a connection listener gets 32 consecutive agent dispatch requests (each one acquiring a socket pair) and the OS gives CPU time slices only to the listener process during that time, the agent processes do not free socket pairs in time, and the connection listener hits the limit of 32 socket pairs.

If this happens, the connection listener has a stack traceback like this (if manually generated by the support team):

  msgrcv + 0x98
  sqloCSemP + 0xC8
  GetSharedSocketPair + 0x30
  sqleSendInbound + 0x60
  sqleInitAgentCB + 0x2A8
  sqleGetAgentFromPool + 0x45C
  sqleGetAgent + 0x1C4
  sqlcctcpconnmgr_child + 0xDD0
  sqloCreateEDU + 0x194

The listener is blocked waiting for a socket pair, but none are left. It is also holding a latch that prevents other agents from being dispatched, which results in an instance hang.

This timing issue has only been seen when a CPU bottleneck affects the timing (i.e. CPU contention or CPU spikes). Connection concentrator environments also seem to be vulnerable to this timing issue.
This APAR will help to reduce the chances of hitting this timing window.
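
The timing window described above can be illustrated with a small model. This is only a sketch under stated assumptions, not DB2 code: the function name simulate, the "L"/"A" schedule notation, and the choice to block on the request that follows 32 successful acquisitions are all hypothetical. Only the limit of 32 comes from the description above.

```python
SOCKET_PAIR_LIMIT = 32  # internal limit stated in the error description


def simulate(schedule, limit=SOCKET_PAIR_LIMIT):
    """Replay a CPU schedule and find where the listener would block.

    schedule is a string of time slices (hypothetical notation):
      "L" = the listener runs and acquires one socket pair for a connection
      "A" = an agent runs and frees one socket pair, if any is in use
    Returns the 0-based index of the slice at which the listener finds no
    free socket pair, or None if the pool is never exhausted.
    """
    in_use = 0
    for step, who in enumerate(schedule):
        if who == "L":
            if in_use == limit:
                # In the real failure the listener waits here while still
                # holding the latch that other agents need.
                return step
            in_use += 1
        elif who == "A":
            if in_use > 0:
                in_use -= 1
        else:
            raise ValueError(f"unknown slice owner: {who!r}")
    return None


# Listener and agents alternate: the pool never fills.
fair = "LA" * 1000
# The listener receives every time slice: the pool fills and it blocks.
starved = "L" * 40

print(simulate(fair))     # None
print(simulate(starved))  # 32
```

In this model a fair schedule never uses more than one socket pair at a time, while a schedule that starves the agents exhausts the pool as soon as the listener has run 32 times in a row. That matches the observation that the problem appears only under CPU contention.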
Local fix
Tune the system to reduce CPU contention, which lowers the chances of hitting this rare timing issue. The issue does not affect DB2 9.5, because this design has been reworked in that release.
Problem summary
Same as above.
Problem conclusion
First fixed in DB2 UDB Version x, FixPak 6 (s081007).
Temporary fix
Same as above.
Comments
APAR Information
APAR number
IZ18649
Reported component name
DB2 CEE AIX
Reported component ID
5765F3000
Reported release
910
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2008-03-27
Closed date
2009-03-10
Last modified date
2009-03-10
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
DB2 CEE AIX
Fixed component ID
5765F3000
Applicable component levels
R950 PSY UP
R810 PSN UP
R820 PSN UP
R910 PSN UP
Document Information
Modified date:
04 October 2021