Scenario - Deploying a two-site multiple standby cluster with same-site failover automation

This scenario describes the planning, configuration, and deployment of a Pacemaker HADR cluster with multiple standby databases, and details a takeover from one of the auxiliary standby databases. The setup has three standby databases: one principal standby and two auxiliary standby databases.

Important: In Db2® 11.5.8 and later, Mutual Failover high availability is supported when using Pacemaker as the integrated cluster manager. In Db2 11.5.6 and later, the Pacemaker cluster manager for automated failover to HADR standby databases is packaged and installed with Db2. In Db2 11.5.5, Pacemaker is included and available for production environments. In Db2 11.5.4, Pacemaker is included as a technology preview only, for development, test, and proof-of-concept environments.

Environment

The topic illustrates deploying HADR with multiple standby databases across two sites with same-site failover automation. The following table shows the set of hosts, instances, ports, and intended roles that are used in the example scenario:
Hostname Instance name Service port Intended role
Host_A db2inst1 10 Primary
Host_B db2inst2 20 Principal standby
Host_C db2inst3 30 Auxiliary standby
Host_D db2inst4 40 Auxiliary standby

The database hadrdb is configured as a multiple standby database with the topology outlined. All the commands that are used can be updated with the arguments appropriate for your specific deployment (such as the database name).

Configure the database with archive logging

HADR is supported only on a database that is configured with archive logging. If your database is configured with circular logging, you must first change the logarchmeth1 (and optionally logarchmeth2) configuration parameter. An offline backup of the database is required after the database is changed to use archive logging, because the change places the database in backup pending state.
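
For example, a minimal sketch of switching hadrdb from circular to archive logging, assuming a hypothetical archive log path of /db2/archive_logs (the configuration change places the database in backup pending state, so the offline backup follows immediately):
db2 UPDATE DB CFG FOR hadrdb USING LOGARCHMETH1 DISK:/db2/archive_logs
db2 BACKUP DB hadrdb TO backup_dir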

Plan a multiple standby setup

This scenario introduces the two-site multiple standby Pacemaker HADR setup. Automated failover to a standby is enabled within each site, but not across the two sites: the primary HADR database and the principal standby are in one Pacemaker domain at the primary site, and the two auxiliary standby databases are in another domain at the disaster recovery (DR) site. This configuration allows the disaster recovery site to completely replace the primary site, with automation enabled by default, when a manual takeover is issued on either of the auxiliary standby databases.

The following diagram, Figure 1, depicts a two-site, three-standby, dual automated database setup. The primary virtual IP at the primary site can be used to access the database, and each query is always directed to the primary database automatically. Pacemaker automation is supported between the two hosts within each site, which means that when a host fails at either the primary site or the disaster recovery site, Pacemaker tries to recover it.

Figure 1. Two-site, three-standby, dual automated database setup for Pacemaker
With multiple standby databases configured in this topology, recovery is automated by Pacemaker with the following behavior that depends on the scenario:
  • Primary database failure
    • If the primary database fails on Host_A, the principal standby database on Host_B automatically takes over as the primary. When the old primary database comes back online, it reintegrates as the principal standby.
  • Principal standby failure
    • If the principal standby database fails on Host_B, Pacemaker tries to bring it back as the principal standby, while all other databases remain in their current roles.
  • Auxiliary standby failure
    • If an auxiliary standby database fails on Host_C or Host_D, Pacemaker tries to bring it back as an auxiliary standby. Other database roles remain unchanged.
  • Takeover by force on an auxiliary database located in the disaster recovery site
    • If a manual takeover by force is issued on Host_C or Host_D, the databases on Host_A and Host_B both become auxiliary standby databases, while databases on Host_C and Host_D become the primary and principal standby. After the role switch, the previous three failure scenarios are still applicable.

Configuring a multiple standby setup

Take a backup of the intended primary HADR database by using the following command:
db2 BACKUP DB hadrdb TO backup_dir
Copy the backup image to the other hosts. On each of the intended standby hosts, drop any old copy of the database that might exist and restore the backup image:
db2 DROP DB hadrdb
db2 RESTORE DB hadrdb FROM backup_dir
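For example, assuming the backup image was written to backup_dir on Host_A with a hypothetical timestamp of 20200714085000, it could be copied to the principal standby host with scp (adjust the image name, user, and paths for your environment):
scp backup_dir/HADRDB.0.db2inst1.DBPART000.20200714085000.001 db2inst2@Host_B:backup_dir/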
After the databases are restored on all standby hosts, as in a regular HADR setup, the following database configuration parameters must be explicitly set:
  • hadr_local_host
  • hadr_local_svc
  • hadr_remote_host
  • hadr_remote_inst
  • hadr_remote_svc
However, before these parameters are set, users need to determine the hostname, port number, and instance name of the four databases that are in the HADR setup:
Intended role Hostname Port number Instance name
Primary Host_A 10 db2inst1
Principal standby Host_B 20 db2inst2
Auxiliary standby Host_C 30 db2inst3
Auxiliary standby Host_D 40 db2inst4

On the primary, the settings for the hadr_remote_host, hadr_remote_inst, and hadr_remote_svc configuration parameters correspond to the hostname, instance name, and port number of the principal standby. On the standby databases, the values of these configuration parameters correspond to the hostname, instance name, and port number of the primary.

In addition, hostnames and port numbers are used to set the hadr_target_list configuration parameter on all the databases. In the following example, hadr_target_list would be configured as follows for each host:
Hostname Intended role hadr_target_list
Host_A Primary Host_B:20|Host_C:30|Host_D:40
Host_B Principal standby Host_A:10|Host_C:30|Host_D:40
Host_C Auxiliary standby Host_D:40|Host_A:10|Host_B:20
Host_D Auxiliary standby Host_C:30|Host_A:10|Host_B:20

In addition to the hadr_target_list configuration settings, users generally want the closest possible synchronization between the primary and the principal standby. Because automated failover is configured within each site, the databases at the disaster recovery site can become the primary and principal standby after a takeover; therefore, all four databases can have the hadr_syncmode parameter set to SYNC. Setting SYNC on the auxiliary standby databases is safe because the configured synchronization mode takes effect only when a database becomes the primary or principal standby; while a database remains an auxiliary standby, its effective synchronization mode is always SUPERASYNC.

The following commands can be issued to update the configuration parameters on each of the databases:
  • On Host_A (Primary):
    DB2 "UPDATE DB CFG FOR hadrdb USING
    HADR_TARGET_LIST  Host_B:20|Host_C:30|Host_D:40
    HADR_REMOTE_HOST  Host_B
    HADR_REMOTE_SVC   20
    HADR_LOCAL_HOST   Host_A
    HADR_LOCAL_SVC    10
    HADR_SYNCMODE     sync
    HADR_REMOTE_INST  db2inst2"
  • On Host_B (Principal Standby):
    DB2 "UPDATE DB CFG FOR hadrdb USING
    HADR_TARGET_LIST  Host_A:10|Host_C:30|Host_D:40
    HADR_REMOTE_HOST  Host_A
    HADR_REMOTE_SVC   10
    HADR_LOCAL_HOST   Host_B
    HADR_LOCAL_SVC    20
    HADR_SYNCMODE     sync
    HADR_REMOTE_INST  db2inst1"
  • On Host_C (Auxiliary Standby):
    DB2 "UPDATE DB CFG FOR hadrdb USING
    HADR_TARGET_LIST  Host_D:40|Host_A:10|Host_B:20
    HADR_REMOTE_HOST  Host_A
    HADR_REMOTE_SVC   10
    HADR_LOCAL_HOST   Host_C
    HADR_LOCAL_SVC    30
    HADR_SYNCMODE     sync
    HADR_REMOTE_INST  db2inst1"
  • On Host_D (Auxiliary Standby):
    DB2 "UPDATE DB CFG FOR hadrdb USING
    HADR_TARGET_LIST  Host_C:30|Host_A:10|Host_B:20
    HADR_REMOTE_HOST  Host_A
    HADR_REMOTE_SVC   10
    HADR_LOCAL_HOST   Host_D
    HADR_LOCAL_SVC    40
    HADR_SYNCMODE     sync
    HADR_REMOTE_INST  db2inst1"
After each database is configured, the configuration values for each database are as follows:
Configuration parameter Host_A Host_B Host_C Host_D
hadr_target_list Host_B:20|Host_C:30|Host_D:40 Host_A:10|Host_C:30|Host_D:40 Host_D:40|Host_A:10|Host_B:20 Host_C:30|Host_A:10|Host_B:20
hadr_remote_host Host_B Host_A Host_A Host_A
hadr_remote_svc 20 10 10 10
hadr_remote_inst db2inst2 db2inst1 db2inst1 db2inst1
hadr_local_host Host_A Host_B Host_C Host_D
hadr_local_svc 10 20 30 40
Configured hadr_syncmode SYNC SYNC SYNC SYNC
Effective hadr_syncmode N/A SYNC SUPERASYNC SUPERASYNC
Note: The effective hadr_syncmode parameter can be viewed by running the db2pd -db hadrdb -hadr command on each host.
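You can also confirm the configured HADR parameters on any of the hosts by listing them from the database configuration. For example:
db2 GET DB CFG FOR hadrdb | grep -i hadr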

Starting the HADR databases

Start HADR on the standby databases first by issuing the following command on Host_B, Host_C, and Host_D:
db2 START HADR ON DB hadrdb AS STANDBY
Then, start HADR on the primary database, in this example, on Host_A:
db2 START HADR ON DB hadrdb AS PRIMARY
To verify that HADR is up and running, query the status of the databases from the primary on Host_A by running the db2pd -db hadrdb -hadr command, which returns information about all of the standby databases. For example:
db2pd -db hadrdb -hadr

Database Member 0 -- Database hadrdb -- Active -- Up 0 days 00:00:13 -- Date 2020-07-14-08.52.30.018629

                            HADR_ROLE = PRIMARY
                          REPLAY_TYPE = PHYSICAL
                        HADR_SYNCMODE = SYNC
                          STANDBY_ID = 1
                        LOG_STREAM_ID = 0
                          HADR_STATE = PEER
                          HADR_FLAGS = TCP_PROTOCOL
                  PRIMARY_MEMBER_HOST = Host_A
                    PRIMARY_INSTANCE = db2inst1
                      PRIMARY_MEMBER = 0
                  STANDBY_MEMBER_HOST = Host_B
                    STANDBY_INSTANCE = db2inst2
                      STANDBY_MEMBER = 0
                  HADR_CONNECT_STATUS = CONNECTED
            HADR_CONNECT_STATUS_TIME = 2020-07-14 08.52.19.818272 (1594741939)
          HEARTBEAT_INTERVAL(seconds) = 30
                    HEARTBEAT_MISSED = 0
                  HEARTBEAT_EXPECTED = 0
                HADR_TIMEOUT(seconds) = 120
        TIME_SINCE_LAST_RECV(seconds) = 8
            PEER_WAIT_LIMIT(seconds) = 0
          LOG_HADR_WAIT_CUR(seconds) = 0.000
    LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.007153
  LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.021
                  LOG_HADR_WAIT_COUNT = 3
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 87040
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 369280
            PRIMARY_LOG_FILE,PAGE,POS = S0000000.LOG, 14, 44893780
            STANDBY_LOG_FILE,PAGE,POS = S0000000.LOG, 0, 44836001
                  HADR_LOG_GAP(bytes) = 0
    STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000000.LOG, 0, 44836001
      STANDBY_RECV_REPLAY_GAP(bytes) = 0
                    PRIMARY_LOG_TIME = 2020-07-14 08.52.21.000000 (1594741941)
                    STANDBY_LOG_TIME = 2020-07-14 08.49.57.000000 (1594741797)
              STANDBY_REPLAY_LOG_TIME = 2020-07-14 08.49.57.000000 (1594741797)
        STANDBY_RECV_BUF_SIZE(pages) = 512
            STANDBY_RECV_BUF_PERCENT = 0
          STANDBY_SPOOL_LIMIT(pages) = 13000
                STANDBY_SPOOL_PERCENT = 0
                  STANDBY_ERROR_TIME = NULL
                PEER_WINDOW(seconds) = 120
                      PEER_WINDOW_END = 2020-07-14 08.54.20.000000 (1594742060)
            READS_ON_STANDBY_ENABLED = N
              HADR_LAST_TAKEOVER_TIME = NULL

                            HADR_ROLE = PRIMARY
                          REPLAY_TYPE = PHYSICAL
                        HADR_SYNCMODE = SUPERASYNC
                          STANDBY_ID = 2
                        LOG_STREAM_ID = 0
                          HADR_STATE = REMOTE_CATCHUP
                          HADR_FLAGS = TCP_PROTOCOL
                  PRIMARY_MEMBER_HOST = Host_A
                    PRIMARY_INSTANCE = db2inst1
                      PRIMARY_MEMBER = 0
                  STANDBY_MEMBER_HOST = Host_C
                    STANDBY_INSTANCE = db2inst3
                      STANDBY_MEMBER = 0
                  HADR_CONNECT_STATUS = CONNECTED
            HADR_CONNECT_STATUS_TIME = 2020-07-14 08.52.20.189945 (1594741940)
          HEARTBEAT_INTERVAL(seconds) = 30
                    HEARTBEAT_MISSED = 0
                  HEARTBEAT_EXPECTED = 0
                HADR_TIMEOUT(seconds) = 120
        TIME_SINCE_LAST_RECV(seconds) = 10
            PEER_WAIT_LIMIT(seconds) = 0
          LOG_HADR_WAIT_CUR(seconds) = 0.000
    LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.007153
  LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.021
                  LOG_HADR_WAIT_COUNT = 3
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 87040
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 369280
            PRIMARY_LOG_FILE,PAGE,POS = S0000000.LOG, 14, 44893780
            STANDBY_LOG_FILE,PAGE,POS = S0000000.LOG, 0, 44836001
                  HADR_LOG_GAP(bytes) = 0
    STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000000.LOG, 0, 44836001
      STANDBY_RECV_REPLAY_GAP(bytes) = 0
                    PRIMARY_LOG_TIME = 2020-07-14 08.52.21.000000 (1594741941)
                    STANDBY_LOG_TIME = 2020-07-14 08.49.57.000000 (1594741797)
              STANDBY_REPLAY_LOG_TIME = 2020-07-14 08.49.57.000000 (1594741797)
        STANDBY_RECV_BUF_SIZE(pages) = 512
            STANDBY_RECV_BUF_PERCENT = 0
          STANDBY_SPOOL_LIMIT(pages) = 13000
                STANDBY_SPOOL_PERCENT = 0
                  STANDBY_ERROR_TIME = NULL
                PEER_WINDOW(seconds) = 0
            READS_ON_STANDBY_ENABLED = N
              HADR_LAST_TAKEOVER_TIME = NULL

                            HADR_ROLE = PRIMARY
                          REPLAY_TYPE = PHYSICAL
                        HADR_SYNCMODE = SUPERASYNC
                          STANDBY_ID = 3
                        LOG_STREAM_ID = 0
                          HADR_STATE = REMOTE_CATCHUP
                          HADR_FLAGS = TCP_PROTOCOL
                  PRIMARY_MEMBER_HOST = Host_A
                    PRIMARY_INSTANCE = db2inst1
                      PRIMARY_MEMBER = 0
                  STANDBY_MEMBER_HOST = Host_D
                    STANDBY_INSTANCE = db2inst4
                      STANDBY_MEMBER = 0
                  HADR_CONNECT_STATUS = CONNECTED
            HADR_CONNECT_STATUS_TIME = 2020-07-14 08.52.19.403210 (1594741939)
          HEARTBEAT_INTERVAL(seconds) = 30
                    HEARTBEAT_MISSED = 0
                  HEARTBEAT_EXPECTED = 0
                HADR_TIMEOUT(seconds) = 120
        TIME_SINCE_LAST_RECV(seconds) = 11
            PEER_WAIT_LIMIT(seconds) = 0
          LOG_HADR_WAIT_CUR(seconds) = 0.000
    LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.007153
  LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.021
                  LOG_HADR_WAIT_COUNT = 3
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 87040
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 369280
            PRIMARY_LOG_FILE,PAGE,POS = S0000000.LOG, 14, 44893780
            STANDBY_LOG_FILE,PAGE,POS = S0000000.LOG, 0, 44836001
                  HADR_LOG_GAP(bytes) = 0
    STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000000.LOG, 0, 44836001
      STANDBY_RECV_REPLAY_GAP(bytes) = 0
                    PRIMARY_LOG_TIME = 2020-07-14 08.52.21.000000 (1594741941)
                    STANDBY_LOG_TIME = 2020-07-14 08.49.57.000000 (1594741797)
              STANDBY_REPLAY_LOG_TIME = 2020-07-14 08.49.57.000000 (1594741797)
        STANDBY_RECV_BUF_SIZE(pages) = 512
            STANDBY_RECV_BUF_PERCENT = 0
          STANDBY_SPOOL_LIMIT(pages) = 13000
                STANDBY_SPOOL_PERCENT = 0
                  STANDBY_ERROR_TIME = NULL
                PEER_WINDOW(seconds) = 0
            READS_ON_STANDBY_ENABLED = N
              HADR_LAST_TAKEOVER_TIME = NULL
After HADR is running, Pacemaker resources need to be created for all hosts, at both sites, for cluster management. Because Pacemaker resources can be created only on the active (primary) site, users must first create the Pacemaker resources on the primary site by completing the following steps as root:
  1. Create the cluster and Ethernet resource by running:
    db2cm -create -cluster -domain db2ha1 -host Host_A -publicEthernet eth1 -host Host_B -publicEthernet eth1
  2. Create the instance resources by running:
    db2cm -create -instance db2inst1 -host Host_A
    and
    db2cm -create -instance db2inst2 -host Host_B
  3. On Host_A, create the database resource by running:
    db2cm -create -db hadrdb -instance db2inst1
  4. On Host_A, create the primary virtual IP (the IP address must be in the same subnet as the hosts) by running:
    db2cm -create -primaryVIP x.x.x.x -db hadrdb -instance db2inst1
  5. On Host_A, create the standby virtual IP (the IP address must be in the same subnet as the hosts) by running:
    db2cm -create -standbyVIP x.x.x.x -db hadrdb -instance db2inst1
After the previous steps are completed successfully, the Pacemaker resources can be verified by running the crm status command:
Online: [ Host_A Host_B ]

Full list of resources:
db2_Host_A_eth1       (ocf::heartbeat:db2ethmon):     Started Host_A
db2_Host_B_eth1      (ocf::heartbeat:db2ethmon):     Started Host_B
db2_Host_B_db2inst2_0        (ocf::heartbeat:db2inst):       Started Host_B
db2_Host_A_db2inst1_0 (ocf::heartbeat:db2inst):       Started Host_A
Clone Set: db2_db2inst1_db2inst2_hadrdb-clone [db2_db2inst1_db2inst2_hadrdb] (promotable)
Masters: [ Host_A ]
Slaves: [ Host_B ]
db2_db2inst1_db2inst2_hadrdb-primary-VIP  (ocf::heartbeat:IPaddr2):    Started Host_A
db2_db2inst1_db2inst2_hadrdb-standby-VIP  (ocf::heartbeat:IPaddr2):  Started Host_B
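Optionally, the resources created by db2cm can also be summarized from the Db2 side. A minimal check, assuming that your db2cm version supports the -list option:
db2cm -list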
Next, create Pacemaker resources on the disaster recovery site.
  1. Manually issue a takeover command on Host_C so that the disaster recovery site becomes the primary site and Host_C hosts the primary database:
    db2 TAKEOVER HADR ON DB hadrdb
  2. Repeat the previous resource creation steps on Host_C, replacing the hostnames and instance names with those of Host_C and Host_D. Verify the status of the resources by running crm status.
  3. After the Pacemaker resources are online, manually issue a takeover on Host_A to move the primary role back to the original primary site:
    db2 TAKEOVER HADR ON DB hadrdb
Now, the crm status output at the disaster recovery site can look like the following:
Online: [ Host_C Host_D ]

Full list of resources:
db2_Host_C_eth1       (ocf::heartbeat:db2ethmon):     Started Host_C
db2_Host_D_eth1       (ocf::heartbeat:db2ethmon):     Started Host_D
db2_Host_D_db2inst4_0        (ocf::heartbeat:db2inst):       Started Host_D
db2_Host_C_db2inst3_0        (ocf::heartbeat:db2inst):       Started Host_C
Clone Set: db2_db2inst3_db2inst4_hadrdb-clone [db2_db2inst3_db2inst4_hadrdb] (promotable) (unmanaged)
db2_db2inst3_db2inst4_hadrdb (ocf::heartbeat:db2hadr):  Masters Host_C (unmanaged)
db2_db2inst3_db2inst4_hadrdb (ocf::heartbeat:db2hadr):  Slaves Host_D (unmanaged)
db2_db2inst3_db2inst4_hadrdb-primary-VIP  (ocf::heartbeat:IPaddr2):    Started Host_C
db2_db2inst3_db2inst4_hadrdb-standby-VIP  (ocf::heartbeat:IPaddr2):  Started Host_D

Failover between primary and principal standby

When the primary database on Host_A fails, Pacemaker automatically has the principal standby database on Host_B take over as the primary, and then tries to bring the old primary back as the principal standby, as shown in the following diagram.
In this case, the configuration parameters hadr_remote_host, hadr_remote_svc, and hadr_remote_inst are changed on Host_C and Host_D, and they are updated to point to the new primary on Host_B.
Configuration parameter Host_A Host_B Host_C Host_D
hadr_target_list Host_B:20|Host_C:30|Host_D:40 Host_A:10|Host_C:30|Host_D:40 Host_D:40|Host_A:10|Host_B:20 Host_C:30|Host_A:10|Host_B:20
hadr_remote_host Host_B Host_A Host_B Host_B
hadr_remote_svc 20 10 20 20
hadr_remote_inst db2inst2 db2inst1 db2inst2 db2inst2
hadr_local_host Host_A Host_B Host_C Host_D
hadr_local_svc 10 20 30 40
Configured hadr_syncmode SYNC SYNC SYNC SYNC
Effective hadr_syncmode SYNC N/A SUPERASYNC SUPERASYNC

Takeover between sites in a multiple standby Pacemaker HADR setup

If the entire primary site is going to undergo maintenance or a planned outage, the disaster recovery site can take over as the primary site, by way of a manual takeover, to ensure that the database remains available. Before issuing the takeover, users need to query the two auxiliary standby databases to determine which one has the most log data by running the following command:
db2pd -hadr -db hadrdb | grep STANDBY_LOG_FILE,PAGE,POS
Here is sample output of the command from Host_C:
STANDBY_LOG_FILE,PAGE,POS = S0000001.LOG, 142, 72318245
And from Host_D:
STANDBY_LOG_FILE,PAGE,POS = S0000001.LOG, 134, 63254745

To check whether a standby is up to date, compare the log file numbers first, then the page numbers, and finally the position values; the standby with the largest values is the most current. In the sample output, both standbys are on log file S0000001.LOG, and Host_C has the larger page number (142 versus 134) and position (72318245 versus 63254745), so the database on Host_C is more up to date. If all three values are the same on both auxiliary standbys, you can pick either one.
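
A minimal sketch of collecting this value from both auxiliary standbys in one place, assuming passwordless SSH access to the instance owners on each host:
ssh db2inst3@Host_C "db2pd -hadr -db hadrdb | grep 'STANDBY_LOG_FILE,PAGE,POS'"
ssh db2inst4@Host_D "db2pd -hadr -db hadrdb | grep 'STANDBY_LOG_FILE,PAGE,POS'"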

In the example, the database on Host_C is determined to be the most up to date, so it can become the primary by running the takeover command on Host_C:
db2 TAKEOVER HADR ON DB hadrdb

After the takeover completes successfully, the database on Host_C becomes the new primary and the database on Host_D becomes the new principal standby, since it is first in the hadr_target_list of the database on Host_C. The primary VIP at the new primary site can now be used to access the database.

The setup would look like the following diagram:
The values for hadr_remote_host, hadr_remote_svc, and hadr_remote_inst on Host_C are reconfigured to point to Host_D, while these parameters on Host_D are reconfigured to point to Host_C. Because this is a graceful takeover and the databases on Host_A and Host_B remain online, their hadr_remote_host, hadr_remote_svc, and hadr_remote_inst values are also updated to point to the new primary on Host_C. The configuration values of each database are shown:
Configuration parameter Host_A Host_B Host_C Host_D
hadr_target_list Host_B:20|Host_C:30|Host_D:40 Host_A:10|Host_C:30|Host_D:40 Host_D:40|Host_A:10|Host_B:20 Host_C:30|Host_A:10|Host_B:20
hadr_remote_host Host_C Host_C Host_D Host_C
hadr_remote_svc 30 30 40 30
hadr_remote_inst db2inst3 db2inst3 db2inst4 db2inst3
hadr_local_host Host_A Host_B Host_C Host_D
hadr_local_svc 10 20 30 40
Configured hadr_syncmode SYNC SYNC SYNC SYNC
Effective hadr_syncmode SUPERASYNC SUPERASYNC N/A SYNC

Failover between sites in a multiple standby Pacemaker HADR setup

If the entire primary site goes down, that is, both Host_A and Host_B fail, the disaster recovery site needs to take over as the primary site and accept client traffic. Before issuing the takeover, users need to query the two auxiliary standby databases to determine which one has the most log data by running the following command:
db2pd -hadr -db hadrdb | grep STANDBY_LOG_FILE,PAGE,POS
In this example, the database on Host_C is the most up to date, so it can be picked as the new primary by using the following command:
db2 TAKEOVER HADR ON DB hadrdb BY FORCE

After the takeover completes successfully, the database on Host_C becomes the new primary and the database on Host_D becomes the new principal standby since it is the first on the hadr_target_list of the database on Host_C. The primary VIP on the new primary site can now be used to access the database on Host_C.

The values for hadr_remote_host, hadr_remote_svc, and hadr_remote_inst on Host_C are reconfigured to point to Host_D. In the meantime, these parameters on Host_D are reconfigured to point to Host_C. Configuration values on Host_A and Host_B are not changed since they are still offline and unavailable. The configuration values of each database are shown:
Configuration parameter Host_A (unavailable) Host_B (unavailable) Host_C Host_D
hadr_target_list Host_B:20|Host_C:30|Host_D:40 Host_A:10|Host_C:30|Host_D:40 Host_D:40|Host_A:10|Host_B:20 Host_C:30|Host_A:10|Host_B:20
hadr_remote_host Host_B Host_A Host_D Host_C
hadr_remote_svc 20 10 40 30
hadr_remote_inst db2inst2 db2inst1 db2inst4 db2inst3
hadr_local_host Host_A Host_B Host_C Host_D
hadr_local_svc 10 20 30 40
Configured hadr_syncmode SYNC SYNC SYNC SYNC
Effective hadr_syncmode N/A N/A N/A SYNC
When the old primary site comes back online and Host_A and Host_B become available again, the databases on those hosts fail to reintegrate as standby databases because not all of their logs were propagated to Host_C before the failure. They must be manually reintegrated by using the following steps:
  1. Deactivate the database on Host_A and Host_B by issuing the following command on both hosts:
    db2 DEACTIVATE DB hadrdb
  2. Stop HADR on Host_A and Host_B to disable automation between them by running:
    db2 STOP HADR ON DB hadrdb
  3. Drop the databases on Host_A and Host_B by running:
    db2 DROP DB hadrdb
  4. Back up the new primary database on Host_C with an online backup by running:
    db2 BACKUP DB hadrdb ONLINE
  5. Copy the backup image to both Host_A and Host_B by using the scp command.
  6. Restore the database from the backup image on Host_A and Host_B by running:
    db2 RESTORE DB hadrdb
  7. Reconfigure the restored database on Host_A and Host_B by running the command on each host:
    On Host_A:
    DB2 "UPDATE DB CFG FOR hadrdb USING
    HADR_TARGET_LIST  Host_B:20|Host_C:30|Host_D:40
    HADR_REMOTE_HOST  Host_C
    HADR_REMOTE_SVC   30
    HADR_LOCAL_HOST   Host_A
    HADR_LOCAL_SVC    10
    HADR_SYNCMODE     sync
    HADR_REMOTE_INST  db2inst3"
    On Host_B:
    DB2 "UPDATE DB CFG FOR hadrdb USING
    HADR_TARGET_LIST  Host_A:10|Host_C:30|Host_D:40
    HADR_REMOTE_HOST  Host_C
    HADR_REMOTE_SVC   30
    HADR_LOCAL_HOST   Host_B
    HADR_LOCAL_SVC    20
    HADR_SYNCMODE     sync
    HADR_REMOTE_INST  db2inst3"
  8. Start HADR on Host_A and Host_B to reintegrate them as auxiliary standby databases by running:
    db2 START HADR ON DB hadrdb AS STANDBY
After the previous commands complete successfully, the databases on Host_A and Host_B become auxiliary standby databases. The following diagram illustrates the new setup:
The values for hadr_remote_host, hadr_remote_svc, and hadr_remote_inst on both databases now point to the new primary, and the effective hadr_syncmode for both databases becomes SUPERASYNC. The configuration for each database is shown:
Configuration parameter Host_A Host_B Host_C Host_D
hadr_target_list Host_B:20|Host_C:30|Host_D:40 Host_A:10|Host_C:30|Host_D:40 Host_D:40|Host_A:10|Host_B:20 Host_C:30|Host_A:10|Host_B:20
hadr_remote_host Host_C Host_C Host_D Host_C
hadr_remote_svc 30 30 40 30
hadr_remote_inst db2inst3 db2inst3 db2inst4 db2inst3
hadr_local_host Host_A Host_B Host_C Host_D
hadr_local_svc 10 20 30 40
Configured hadr_syncmode SYNC SYNC SYNC SYNC
Effective hadr_syncmode SUPERASYNC SUPERASYNC N/A SYNC
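
As a final check after the reintegration, you can confirm from the new primary on Host_C that all three standbys are connected, and verify that the Pacemaker resources at the disaster recovery site are still online, by running the following commands on Host_C:
db2pd -db hadrdb -hadr
crm status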