Both databases in a Db2 HADR pair assume the primary role
This topic describes how to identify and resolve a situation in which, because of compounding issues, both databases in an HADR pair assume the primary HADR database role.
Identification of the problem
Confirm that both databases have the HADR database role set to PRIMARY. As the instance user, run the following command on each host:
db2 get db cfg for <dbname> | grep "HADR database role"
[rohant@svlxtorcpacemaker]# db2 get db cfg for gtdb| grep "HADR database role"
HADR database role = PRIMARY
[rohant@svlxtordpacemaker]# db2 get db cfg for gtdb| grep "HADR database role"
HADR database role = PRIMARY
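If you check this often, the comparison can be scripted. The following is a minimal POSIX shell sketch, not a Db2-supplied tool: the function names hadr_role and is_double_primary are hypothetical, and it assumes you capture the db2 get db cfg output on each host yourself.

```shell
#!/bin/sh
# Sketch only: hadr_role and is_double_primary are hypothetical helpers.

# Print the value of "HADR database role" from db2 get db cfg output on stdin.
# Example input line: " HADR database role    = PRIMARY"
hadr_role() {
  awk -F'=' '/HADR database role/ { gsub(/[ \t\r]/, "", $2); print $2; exit }'
}

# Succeed (exit status 0) only when both roles passed in are PRIMARY,
# which is the double-primary condition described in this topic.
is_double_primary() {
  [ "$1" = "PRIMARY" ] && [ "$2" = "PRIMARY" ]
}
```

For example, pipe db2 get db cfg for gtdb into hadr_role on each host, then pass the two results to is_double_primary.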
Additionally, running crm status as root shows the database resource on one host in the FAILED state.
[root@svlxtorcpacemaker]# crm status
...
Clone Set: db2_rohant_rohant_GTDB-clone [db2_rohant_rohant_GTDB] (promotable)
db2_rohant_rohant_GTDB (ocf::heartbeat:db2hadr): FAILED
Masters: [ svltord ]
This crm status output might reflect a transient state. Run the command several times to confirm that the failure persists.
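Repeating the check can be automated. This is a minimal POSIX shell sketch under stated assumptions: has_failed_resource and failure_is_persistent are hypothetical helpers, the command to run (normally crm status, as root) is passed in as arguments, and the run count and interval are arbitrary choices.

```shell
#!/bin/sh
# Sketch only: has_failed_resource and failure_is_persistent are hypothetical
# helpers; the run count and interval are arbitrary choices.

# Succeed if any line on stdin names the resource and contains FAILED.
has_failed_resource() {
  awk -v r="$1" 'index($0, r) && /FAILED/ { found = 1 } END { exit !found }'
}

# Usage: failure_is_persistent RESOURCE RUNS INTERVAL COMMAND [ARGS...]
# Run COMMAND RUNS times, INTERVAL seconds apart, and succeed only if every
# run reports RESOURCE as FAILED (a transient failure returns 1).
failure_is_persistent() {
  fp_resource=$1 fp_runs=$2 fp_interval=$3
  shift 3
  fp_i=1
  while [ "$fp_i" -le "$fp_runs" ]; do
    if ! "$@" | has_failed_resource "$fp_resource"; then
      return 1
    fi
    if [ "$fp_i" -lt "$fp_runs" ]; then
      sleep "$fp_interval"
    fi
    fp_i=$((fp_i + 1))
  done
  return 0
}
```

For example, as root: failure_is_persistent db2_rohant_rohant_GTDB 3 10 crm status. The awk test reads all of its input, so the status command is never cut off mid-output.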
Resolution
Search pacemaker.log or db2diag.log for the entries that record the promotion of the standby database to primary. For example:
Jun 19 14:10:52 svltord pacemaker-controld[1765] (abort_transition_graph) notice: Transition 8608 aborted by nodes-1-db2hadr-rohant_rohant_GTDB_reint doing modify db2hadr-rohant_rohant_GTDB_reint=1: Configuration change | cib=18.14477.0 source=te_update_diff_v2:465 path=/
db2hadr(db2_rohant_rohant_GTDB)[31427]: 2020/06/19_14:10:52 INFO: promote: 959: svtdbm: 0: CORAL: Debug data: "DB20000I The TAKEOVER HADR ON DATABASE command completed successfully.". db2hadr_promote() exit with rc=0.
2020-06-19-14.10.50.093150-420 I133204209A456 LEVEL: Info
PID : 16226 TID : 4395462813968 PROC : db2sysc 0
INSTANCE: rohant NODE : 000 DB : GTDB
HOSTNAME: svltord
EDUID : 80 EDUNAME: db2hadrs.0.0 (CORAL) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrStbyTkHandleInitialRequest, probe:46000
MESSAGE : Standby has initiated a takeover by force peer window only.....
2020-06-19-14.10.52.593838-420 I133268013A437 LEVEL: Info
PID : 16226 TID : 4395462813968 PROC : db2sysc 0
INSTANCE: rohant NODE : 000 DB : GTDB
HOSTNAME: svltord
EDUID : 80 EDUNAME: db2hadrp.0.1 (CORAL) 0
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrStbyTkHandleDoneDrain, probe:46840
MESSAGE : Standby has completed takeover (now primary).
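The search can also be scripted. This minimal POSIX shell sketch filters log text for the promotion messages shown in the sample entries; promotion_events is a hypothetical helper, and other Db2 or Pacemaker levels might word these messages differently.

```shell
#!/bin/sh
# Sketch only: promotion_events is a hypothetical helper. The patterns come
# from the example pacemaker.log and db2diag.log entries in this topic.

# Read log text on stdin and print lines that record a promotion.
promotion_events() {
  awk '
    # A db2diag.log record starts with a timestamp line; remember the latest.
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-/ { ts = $1 }
    # db2diag.log MESSAGE lines: prefix them with their record timestamp.
    /Standby has (initiated a|completed) takeover/ { print ts ": " $0; next }
    # pacemaker.log lines from the db2hadr resource agent promote action.
    /db2hadr_promote|TAKEOVER HADR ON DATABASE/ { print }
  '
}
```

For example: promotion_events < /path/to/pacemaker.log. The pacemaker.log location depends on your distribution and configuration (assumption); the directory that holds db2diag.log is given by the DIAGPATH database manager configuration parameter, which db2 get dbm cfg | grep DIAGPATH displays.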
As the log entries above show, svltord was promoted to primary, so the other host, svlxtorc, should be reintegrated as the standby.
On the host that was not promoted to primary, reintegrate the database as the standby by running the following command:
db2 start hadr on db <dbname> as standby
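A guarded version of this step could look like the following minimal POSIX shell sketch. The function reintegrate_as_standby and the DB2_CLI override (which defaults to the real db2 command so the logic can be dry-run) are hypothetical. The role check is an assumed safeguard: it only confirms that the local database still reports PRIMARY, so choosing the correct host still depends on the log search.

```shell
#!/bin/sh
# Sketch only: reintegrate_as_standby and the DB2_CLI override are
# hypothetical. Run this only on the host that was NOT promoted.
DB2_CLI=${DB2_CLI:-db2}

reintegrate_as_standby() {
  dbname=$1
  # Read the local HADR database role from the database configuration.
  role=$($DB2_CLI get db cfg for "$dbname" |
    awk -F'=' '/HADR database role/ { gsub(/[ \t\r]/, "", $2); print $2; exit }')
  # Guard (assumption): the host to reintegrate should currently report PRIMARY.
  if [ "$role" != "PRIMARY" ]; then
    echo "HADR database role is '$role', not PRIMARY; not reintegrating." >&2
    return 1
  fi
  $DB2_CLI start hadr on db "$dbname" as standby
}
```

For example, on svlxtorc as the instance user: reintegrate_as_standby gtdb.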
Finally, run db2support to collect Db2 and Pacemaker diagnostics so that the original conditions that led to the double-primary state can be analyzed.
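One way to keep each collection separate is sketched below in POSIX shell. The function collect_diagnostics, the timestamped directory name, and the DB2SUPPORT_CLI override are hypothetical; the call relies only on the db2support output-path and -d database-name arguments. Which Pacemaker data db2support includes depends on your Db2 level, so check the db2support reference. Running it on both hosts may help capture both sides of the pair.

```shell
#!/bin/sh
# Sketch only: collect_diagnostics, the directory naming and the
# DB2SUPPORT_CLI override are hypothetical.
DB2SUPPORT_CLI=${DB2SUPPORT_CLI:-db2support}

collect_diagnostics() {
  dbname=$1
  # One new directory per collection, for example db2support_20200619_141500
  outdir="$2/db2support_$(date +%Y%m%d_%H%M%S)"
  mkdir -p "$outdir" || return 1
  # db2support takes the output path first; -d names the database.
  $DB2SUPPORT_CLI "$outdir" -d "$dbname"
}
```

For example: collect_diagnostics gtdb /tmp.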