HADR takeover operations in a Db2 pureScale environment
When an HADR standby database takes over as the primary database in a Db2 pureScale environment, there are a number of important differences from HADR in other environments.
With HADR, there are two types of takeover: role switch and failover. Role switch, sometimes called graceful takeover or non-forced takeover, can be performed only when the primary is available, and it swaps the roles of the primary and the standby. Failover, or forced takeover, can be performed when the primary is not available. It is commonly used when the primary fails, to make the standby the new primary. In a forced takeover, the old primary remains in the primary role, but the standby sends it a message to disable it. Both types of takeover are supported in a Db2 pureScale environment, and both can be issued from any standby member, not just the current replay member. However, after the standby completes the transition to the primary role, the database is started only on the member that served as the replay member before the takeover. You can start the database on the other members by issuing the ACTIVATE DATABASE command, or it is started implicitly by a client connection.
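As an illustration, the following Python sketch builds the command-line processor strings for the two takeover types and for the follow-up activation. It only assembles text and does not run anything. The function names and the database alias HADRDB are placeholders chosen for this example; the commands themselves are TAKEOVER HADR (with BY FORCE for a failover) and ACTIVATE DATABASE.

```python
def takeover_command(db_alias: str, forced: bool = False) -> str:
    """Build the CLP command for a role switch or, with forced=True, a failover."""
    command = f"db2 takeover hadr on database {db_alias}"
    if forced:
        # BY FORCE requests a failover (forced takeover).
        command += " by force"
    return command


def activate_command(db_alias: str) -> str:
    """Build the CLP command that starts the database on the remaining members."""
    return f"db2 activate database {db_alias}"


if __name__ == "__main__":
    # HADRDB is a placeholder alias, not a name taken from the documentation.
    print(takeover_command("HADRDB"))
    print(takeover_command("HADRDB", forced=True))
    print(activate_command("HADRDB"))
```

You would issue the takeover command on any member of the standby cluster, and the activation command on the new primary once the takeover completes.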
Role switch
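You initiate a role switch by issuing the TAKEOVER HADR command from any standby member. You can initiate a role switch only in the following circumstances:
- Crash recovery is not occurring on the primary cluster, including member crash recovery that is pending or in progress.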
- All the log streams are in peer or assisted remote catchup state.
- All the log streams are in remote catchup state or in assisted remote catchup state, and the synchronization mode is SUPERASYNC.
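The role switch prerequisites listed above can be expressed as a small check. This is a minimal sketch under one reading of the list: crash recovery must not be occurring, and the log streams must satisfy either the peer condition or the SUPERASYNC condition. The `StreamState` labels and the function name are hypothetical and are not the literal values reported by Db2 monitoring tools. Treating an empty list of streams as not eligible is also an assumption.

```python
from enum import Enum


class StreamState(Enum):
    # Hypothetical labels for the states named in the text.
    PEER = "peer"
    REMOTE_CATCHUP = "remote catchup"
    ASSISTED_REMOTE_CATCHUP = "assisted remote catchup"
    OTHER = "other"


def role_switch_allowed(stream_states, sync_mode, crash_recovery_active):
    """Return True if the documented role switch conditions hold."""
    if crash_recovery_active:
        # Includes member crash recovery that is pending or in progress.
        return False
    states = list(stream_states)
    if not states:
        return False  # Assumption: no stream information means not eligible.
    peer_or_assisted = all(
        s in (StreamState.PEER, StreamState.ASSISTED_REMOTE_CATCHUP) for s in states
    )
    catchup_or_assisted = all(
        s in (StreamState.REMOTE_CATCHUP, StreamState.ASSISTED_REMOTE_CATCHUP)
        for s in states
    )
    return peer_or_assisted or (
        catchup_or_assisted and sync_mode.upper() == "SUPERASYNC"
    )
```

For example, two log streams in peer and assisted remote catchup state pass the check in any synchronization mode, whereas a stream in remote catchup state passes only when the mode is SUPERASYNC.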
During a role switch, the following steps occur on the primary:
- New connections are rejected on all members, any open transactions are rolled back, and all remaining logs are shipped to the standby.
- The primary cluster's database role changes to standby.
- A member that has a direct connection to the standby is chosen as the replay member, with preference given to the preferred replay member (that is, the member that HADR was started from).
- Log receiving and replay start on the replay member.
- The database is shut down on the other non-replay members of the cluster.
During a role switch, the following steps occur on the standby:
- Log receiving is stopped on the replay member after the end of logs is reached on each log stream, helping ensure no data loss.
- The replay member finishes replaying all received logs.
- After it is confirmed that the primary cluster is now in the standby role, the replay member changes the standby cluster's role to primary.
- The database is opened for client connections, but it is only activated on the member that was previously the standby replay member.
Failover
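You initiate a failover by issuing the TAKEOVER HADR command with the BY FORCE option from any standby member. During a failover, the following step occurs on the primary, if it is connected to the standby:
- After the primary receives the disabling message, the database is shut down and log writing is stopped.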
During a failover, the following steps occur on the standby:
- A disabling message is sent to the primary, if it is connected.
- Log shipping and log retrieval are stopped, which entails a risk of data loss.
- The replay member finishes replaying all received logs (that is, the logs that are stored in the log path).
- Any open transactions are rolled back.
- The replay member changes the standby cluster's role to primary.
- The database is opened for client connections, but it is only activated on the member that was previously the standby replay member.
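The difference in data-loss risk between the two takeover types can be illustrated with a toy model. This sketch is not how Db2 tracks logs internally. It assumes, purely for illustration, that each log stream has an integer position for the last log written on the old primary and the last log received by the standby. All names are hypothetical.

```python
def logs_on_new_primary(written, received, forced):
    """Return, per log stream, the last log position the new primary holds.

    written:  last position written on the old primary, keyed by stream
    received: last position already received by the standby, keyed by stream
    """
    if forced:
        # Failover: shipping and retrieval stop, so only received logs are replayed.
        return dict(received)
    # Role switch: receiving continues until the end of logs on every stream.
    return dict(written)


def lost_log_range(written, received, forced):
    """Return, per stream, how much written log never reaches the new primary."""
    final = logs_on_new_primary(written, received, forced)
    return {
        stream: written[stream] - final[stream]
        for stream in written
        if written[stream] > final[stream]
    }


if __name__ == "__main__":
    written = {0: 100, 1: 250}   # illustrative positions only
    received = {0: 100, 1: 200}
    print(lost_log_range(written, received, forced=False))
    print(lost_log_range(written, received, forced=True))
```

In this model a role switch never reports a gap, while a failover reports a gap for every stream on which the standby was behind when the takeover was issued.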
You can reintegrate the old primary as a new standby only if its log streams did not diverge from the new primary's log streams. Before you can start HADR, the database must be offline on all of the old primary's members; the cluster caching facilities, however, can stay online. If any members are online, kill them instead of issuing the DEACTIVATE DATABASE command on them.
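The reintegration preconditions can be sketched as a guard that runs before the old primary is restarted as a standby. The function name, the `member_online` mapping, and the `streams_diverged` flag are hypothetical inputs that you would have to determine yourself; the sketch does not query Db2. The only command that it produces is START HADR with the AS STANDBY option.

```python
def reintegration_commands(db_alias, member_online, streams_diverged):
    """Return the CLP commands for reintegration, or raise if a precondition fails.

    member_online:    dict mapping member ID to True if the database is online there
    streams_diverged: True if the old primary's log streams diverged from the
                      new primary's log streams
    """
    if streams_diverged:
        raise ValueError("log streams diverged; the old primary cannot be reintegrated")
    online = sorted(member for member, up in member_online.items() if up)
    if online:
        # The database must be offline on every member of the old primary.
        raise RuntimeError(f"database is still online on members {online}")
    return [f"db2 start hadr on database {db_alias} as standby"]
```

Raising an error instead of returning an empty list makes a failed precondition hard to overlook in a script that calls this function.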