
DB2: Short spikes in the number of applications in "Commit Active" state and/or in COMMIT response time with a large history file.

Troubleshooting


Problem

In a highly concurrent DB2 OLTP environment with a large history file, one might see random, short spikes in COMMIT response time. With multiple concurrent applications, an increased number of applications in "Commit Active" state might also be observed.
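
As a quick check, the number of applications currently in "Commit Active" state can be sampled. A minimal sketch, assuming a database named SAMPLE (a placeholder) and authority to use the snapshot monitor:

# Count connections whose status line reports "Commit Active".
db2 list applications for database SAMPLE show detail | grep -c "Commit Active"

# Alternatively, query the snapshot administrative view, where the same
# state is reported as APPL_STATUS = 'COMMIT_ACT'.
db2 connect to SAMPLE
db2 "SELECT COUNT(*) AS COMMIT_ACTIVE_APPLS FROM SYSIBMADM.SNAPAPPL_INFO WHERE APPL_STATUS = 'COMMIT_ACT'"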

Cause

The DB2 EDU responsible for writing log records to disk, db2loggw, must update the HISTORY facility with information about the current log file before it starts writing into a new transaction log file (information about the current log must be available to other facilities, e.g. BACKUP). If that first write to the new log happens while another log is being archived by the log manager EDU (db2logmgr), db2loggw has to wait for the lock on the history file. As a result, all applications connected to the database might experience a short spike in COMMIT response time.
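
Whether a switch to a new log file coincides with an archive operation in progress can be observed with db2pd. A sketch, with SAMPLE as a placeholder database name (the exact fields shown depend on the release and fix pack):

# Show the state of the active log, including the current log file
# number and the archive status maintained by db2logmgr.
db2pd -db SAMPLE -logs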

Diagnosing The Problem

Spikes in COMMIT response time can happen for multiple reasons, including slow writes to the active log path or, in HADR environments, slow processing on the STANDBY database. If the typical factors have been excluded, one can evaluate whether slow history file access is the problem.
Typical symptoms include the following (example commands to check them are sketched after the list):
1. A fairly large history file (db2rhist.asc). There is no specific threshold, but anything above 100 MB can be considered large.
2. Applications waiting in "Commit Active" state for the DB2 log writer (db2loggw), with the sqlpgild function on the stack, e.g.:
semop
sqloWaitThreshold
sqlpgild
sqlpWriteToLog
sqlpWriteLR
sqlptlog
sqlpxcm1
sqlrrcom
sqlrr_commit
sqljs_ddm_rdbcmm
sqljsParseRdbAccessed
sqljsParse

3. The log writer EDU waiting for the lock on the history file:
semop
sqloNLCKLock
sqluhLock
sqluhAllocateHandleVerifyHistoryFilesAndOptToRecover
sqluhOpen
sqluhBeginBatch
sqluhInsertSingle
sqlpInsertLogToHistFile
sqlpAddNewLogToHistFile
sqlpgWriteToDisk
sqlpgwlp
sqlpgasn2
sqloEDUEntry

4. The lock on the history file being held by the DB2 log manager EDU (db2logmgr), e.g.:
vsscanf
sscanf
sqluhReadEntry
sqluhUpdate
sqluhUpdateLogArchive
sqlpRecordArchiveInHistFile
sqlpgArchiveLogFile
sqlpgArchivePendingLogs
sqlpgLogManager
sqloEDUEntry
start_thread
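
These symptoms can be checked with standard tools. A minimal sketch, assuming a UNIX system, the instance owner's shell, and a single-partition database (the db2rhist.asc path shown is only illustrative):

# Symptom 1: check the size of the recovery history file, which resides
# in the database directory.
ls -lh /home/db2inst1/db2inst1/NODE0000/SQL00001/db2rhist.asc

# Symptoms 2-4: dump stack traces for all EDUs into the diagnostic
# directory (DIAGPATH), then look for sqlpgild on db2loggw and
# sqluhUpdate on db2logmgr in the generated stack files.
db2pd -stack all

# Map EDU names such as db2loggw and db2logmgr to their EDU IDs.
db2pd -edus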

With a large history file, the time db2logmgr needs to update the history file grows considerably, which increases the probability that the log writer attempts to access the history file while db2logmgr holds the exclusive lock.
The design of history file access was improved in DB2 10.1 Fix Pack 2, so this should no longer be an issue starting from that release:
> The process of writing to the recovery history file in DB2® Version 10.1 Fix Pack 2 is now more efficient. This improvement might result in general speed increases, especially as the recovery history file size increases. These speed improvements occur automatically; no change to the system or queries are required.
https://www.ibm.com/support/knowledgecenter/SSEPGG_10.1.0/com.ibm.db2.luw.wn.doc/doc/c0060645.html

Resolving The Problem

To avoid the problem, one should:

- reduce the size of the history file by running "PRUNE HISTORY"

- consider a larger LOGFILSIZ value, which reduces the likelihood of a log archival and the first write to a new log file happening at exactly the same time
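
Both actions can be scripted. A minimal sketch; the database name SAMPLE, the timestamp, and the new log file size are placeholders, not recommendations:

db2 connect to SAMPLE

# Prune history entries with a timestamp of 2017-01-01 or earlier,
# which shrinks db2rhist.asc.
db2 "PRUNE HISTORY 20170101"

# Raise the log file size to 64 MB (16384 pages of 4 KB); the new value
# takes effect after the database is deactivated and reactivated.
db2 "UPDATE DB CFG FOR SAMPLE USING LOGFILSIZ 16384"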

[{"Product":{"code":"SSEPGG","label":"Db2 for Linux, UNIX and Windows"},"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Component":"Recovery - Logging","Platform":[{"code":"PF002","label":"AIX"},{"code":"PF010","label":"HP-UX"},{"code":"PF016","label":"Linux"},{"code":"PF027","label":"Solaris"},{"code":"PF033","label":"Windows"}],"Version":"9.7;9.5;10.1","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Document Information

Modified date:
16 June 2018

UID

swg21991251