Technical Blog Post
Abstract
DB2 : SQL2038N while reading a log file on HADR Primary
Body
Recently, we had a case where customer reported of problems on Primary HADR when their Q-replication was trying to read a log but failed with SQL2038N.
Problem Description :
Customer used TSM for log archiving and the log archival location was shared between Primary(write to TSM) and Standby(read from TSM). When Q-rep tried to read a log on Primary, it reported the following error :
ASN0005E CAPTURE "ASN01" : "LogrdThread". The Capture program encountered an error when reading the DB2 log.
The log sequence number is “<log_sequence_number>", the SQLCODE is "-2038", and the reason code is "".
Root cause identification :
1> Db2diag.log had the below error logged for the SQL2038N :
EDUID : 112 EDUNAME: db2lfr.0 (<dbname>) 0
FUNCTION: DB2 UDB, recovery manager, sqlplfrIsLogFromValidChain, probe:4700
DATA #1 : <preformatted>
LFR Scan Num = 4304204
LFR Scan Caller's EDUID = 1715
Log extent 105382 on log stream 0 starts on a log chain larger than the most recent valid log chain.
Valid log chain: 8075083570/0000004E128922DA/1532490193/1227929500
Extent log chain: 8123381362/0000004E7A5CAA87/1532659966/1256964695
This shows that the log file we are trying to read (Log extent 105382) also exists on a higher log chain than the valid one we are currently working on.
2> On checking TSM, we saw 2 copies of same log file in 2 different chains:
Note : Although it might work in some cases, this operation is striclty unsupported by IBM DB2. No db maintenance activity which needs to break hadr and make a connection to the standby should be performed in a HADR setup. Should such an activity happen, we recommend to setup the HADR again from scratch by taking a new backup and restoring it in Standby. |
Because the log directory was shared on TSM, the Q-replication from Primary tried to read the next log in sequence S0105468 from that location.
Q-rep will always read from the highest log chain, and since the log file S0105468.LOG generated by standby was in log chain 59 (in TSM), it asked DB2 to read from there. Obviously, DB2 did not find the next log record in log chain 58 (current valid log chain on Primary) it was expecting and reported it with SQL2038N.
Please refer below link for more information about when log chains are created :
Title : Log chains are associated with bringing database out of rollforward.
Existing tech note: /support/pages/node/127841
Solution :
Hiding the log chain 59 so that the Q-replication will get the valid log from log chain 58 is the solution.
We suggested following steps in this case :
1> Let us query the log files on this chain 59 :
db2adutl query logs chain 59 db <dbname>
2> Extract logs from chain 59 to a different location (just in case we need to check or use them later):
db2adutl extract logs chain 59 db <dbname>
3> Delete the logs on chain 59 :
db2adutl delete logs chain 59 db <dbname>
On starting Qrep, it read the required log from log chain 58 and operations were resumed as usual.
As for HADR, we recommended to rebuild it from scratch as the earlier setup had become unsupported for already specified reasons.
UID
ibm11140154