TAKEOVER HADR command

The TAKEOVER HADR command instructs an HADR standby database to take over as the new HADR primary database for the HADR pair. This is a cluster-wide command in a Db2® pureScale® environment, so you can issue it on any member on the standby, including non-replay members.

Authorization

One of the following authorities:

SYSADM
SYSCTRL
SYSMAINT

Required connection

Instance. The command establishes a database connection if one does not exist, and closes the database connection when the command completes.

Command syntax

Command parameters

DATABASE database-alias

Identifies the current HADR standby database that should take over as the HADR primary database.

USER user-name

Identifies the user name under which the takeover operation is to be started.

USING password

The password used to authenticate user-name.

BY FORCE

Specifies that the database is not to wait for confirmation that the original HADR primary database has been shut down. Unless you are using SUPERASYNC synchronization mode, this option is required if the HADR pair is not in peer state.

PEER WINDOW ONLY

When this option is specified, no committed transaction is lost if the command succeeds and the primary database is brought down before the end of the peer window period. The peer window is enabled by setting the database configuration parameter hadr_peer_window to a non-zero value. If the primary database is not brought down before the peer window expires, a split brain results. If the TAKEOVER BY FORCE PEER WINDOW ONLY command is issued when the HADR pair is not in peer or disconnected peer state (that is, the peer window has expired), an error is returned.

You cannot use the PEER WINDOW ONLY option when the synchronization mode is set to ASYNC or SUPERASYNC.

Note: The takeover operation with the PEER WINDOW ONLY option can behave incorrectly if the primary database clock and the standby database clock are not synchronized to within 5 seconds of each other. That is, the operation may succeed when it should fail, or fail when it should succeed. You should use a time synchronization service (for example, NTP) to keep the clocks synchronized to the same source.
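The parameter rules above can be sketched as a small command builder. This is an illustrative sketch, not part of Db2: the function name build_takeover_command and its sync_mode argument are hypothetical, and the leading "TAKEOVER HADR ON DATABASE" keywords are assumed from the usual CLP form of the command, because the syntax diagram is not reproduced in this text. The builder only assembles command text and rejects combinations that the parameter descriptions rule out.

```python
from typing import List, Optional

# Synchronization modes for which the text says PEER WINDOW ONLY cannot be used.
_NO_PEER_WINDOW_MODES = {"ASYNC", "SUPERASYNC"}


def build_takeover_command(
    database_alias: str,
    user: Optional[str] = None,
    password: Optional[str] = None,
    by_force: bool = False,
    peer_window_only: bool = False,
    sync_mode: Optional[str] = None,  # hypothetical input, used only for validation
) -> str:
    """Assemble TAKEOVER HADR command text (hypothetical helper)."""
    if peer_window_only and not by_force:
        raise ValueError("PEER WINDOW ONLY is only valid together with BY FORCE")
    if peer_window_only and sync_mode and sync_mode.upper() in _NO_PEER_WINDOW_MODES:
        raise ValueError("PEER WINDOW ONLY cannot be used in ASYNC or SUPERASYNC mode")
    if password is not None and user is None:
        raise ValueError("USING password requires USER user-name")

    parts: List[str] = ["TAKEOVER HADR ON DATABASE", database_alias]
    if user is not None:
        parts += ["USER", user]
        if password is not None:
            parts += ["USING", password]
    if by_force:
        parts.append("BY FORCE")
        if peer_window_only:
            parts.append("PEER WINDOW ONLY")
    return " ".join(parts)
```

For example, build_takeover_command("SAMPLE", by_force=True) produces the text of an unconditional forced takeover for a database alias SAMPLE (the alias is a placeholder).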

Usage notes

Table 1 and Table 2 show the behavior of the TAKEOVER HADR command when issued on an active standby database for each possible state and option combination. An error message is returned if this command is issued on an inactive standby database.

Table 1. Takeover operation without the BY FORCE option
Standby state	Takeover behavior
Disconnected peer	Takeover fails and an error message is returned.
Local catchup	Takeover fails and an error message is returned.
Peer	The primary database and standby database switch roles. If no failure is encountered during takeover, there is no data loss. However, if failures are encountered during takeover, data loss might occur and the roles of the primary and standby might or might not have been changed. Use the following guidelines to handle a failure during a takeover in which the primary and standby switch roles: If possible, make sure both databases are online. Check the HADR role of the available database or databases using the Snapshot Monitor, or by checking the value of the database configuration parameter hadr_db_role. If the intended new primary is still in standby role, and takeover is still required, reissue the TAKEOVER HADR command (see the following guideline regarding the BY FORCE option). It is possible to end up with both databases in standby role. In that case, issue the TAKEOVER HADR command with the BY FORCE option on whichever database should now become the primary. The BY FORCE option is required in this case because the two standbys cannot establish the usual HADR primary-standby connection.
Remote catchup	Non-forced takeover is allowed in remote catchup state only if one of the following is true: (1) the HADR synchronization mode is SUPERASYNC; or (2) in a Db2 pureScale environment, a stream is in assisted remote catchup, regardless of the synchronization mode. Before starting a non-forced takeover operation, check the log gap between the primary and standby databases. Because the standby database must retrieve the logs in the gap and replay them, a large gap causes a long elapsed time for the takeover operation. It is recommended that you perform non-forced takeover operations only when the log gap is small. To reduce the log gap between the primary and the standby databases, consider stopping or reducing the workload on the primary database.
Remote catchup pending	Takeover fails and an error message is returned.
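The rows of Table 1 can be read as a simple decision function. The following is a minimal sketch under stated assumptions: the function name, the lowercase state labels, and the assisted_remote_catchup flag are hypothetical and chosen only to mirror the table. The sketch reports whether a non-forced takeover is permitted, not whether it completes without failure.

```python
# Standby states as named in Table 1 (lowercased for comparison).
STANDBY_STATES = {
    "disconnected peer",
    "local catchup",
    "peer",
    "remote catchup",
    "remote catchup pending",
}


def non_forced_takeover_allowed(
    standby_state: str,
    sync_mode: str,
    assisted_remote_catchup: bool = False,  # pureScale only; hypothetical flag
) -> bool:
    """Return True if Table 1 permits a takeover without BY FORCE."""
    state = standby_state.strip().lower()
    if state not in STANDBY_STATES:
        raise ValueError(f"unknown standby state: {standby_state!r}")
    if state == "peer":
        # Primary and standby switch roles.
        return True
    if state == "remote catchup":
        # Allowed only in SUPERASYNC mode, or when a pureScale stream
        # is in assisted remote catchup.
        return sync_mode.strip().upper() == "SUPERASYNC" or assisted_remote_catchup
    # Disconnected peer, local catchup, remote catchup pending: takeover fails.
    return False
```

A caller would still check the log gap before acting on a True result in remote catchup state, as the table recommends.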

Table 2. Takeover operation specifying the BY FORCE option
Standby state	Takeover behavior
Disconnected peer (without the PEER WINDOW ONLY option)	The standby database becomes the primary database, but there is no assurance of data consistency. Note: A takeover with no transaction loss is also possible using the TAKEOVER BY FORCE command without the PEER WINDOW ONLY option (that is, an unconditional failover), as long as the necessary conditions hold. Such a failover can be executed even long after the expiration of the peer window that was in effect when the primary failed.
Disconnected peer (with the PEER WINDOW ONLY option)	The standby database becomes the primary database, and there is a greater assurance of data consistency than if you did not specify the PEER WINDOW ONLY option. There are situations in which data loss can still happen: (1) If the primary database remains active past the time when the peer window expires, and if the primary database still has no connection to the standby database, the primary database moves out of disconnected peer state and resumes processing transactions independently. (2) In NEARSYNC mode, if the standby database fails after acknowledging receipt of transaction logs from the primary database but before writing that transaction log information to disk, then that transaction log information, in the log receive buffer, might be lost.
Local catchup	In most cases, takeover fails and an error message is returned. The exception is when primary reintegration is in progress; during the reintegration, a forced takeover is allowed on a standby in local catchup state.
Peer	The standby database becomes the primary database, but there is no assurance of data consistency. Even with SYNC and NEARSYNC mode, the primary can fall out of peer state and commit more transactions, with the standby still in peer state and not aware of the primary's state change (the primary and standby may not notice network connection breakage at the same time).
Remote catchup	The standby database becomes the primary database, but there is a risk of data loss.
Remote catchup pending	The standby database becomes the primary database, but there is a risk of data loss. If log retrieval is in progress (retrieval only happens in remote catchup pending state), retrieval is stopped as part of the takeover process.
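Table 2 can likewise be summarized as a lookup. This is a sketch only: the Outcome enum, the function name, and the reintegration_in_progress flag are hypothetical. Two cases are assumptions rather than statements from the table: a PEER WINDOW ONLY takeover in peer state is given the same outcome as the plain Peer row (Table 2 has no separate row for it), and PEER WINDOW ONLY in any state other than peer or disconnected peer is treated as an error, following the parameter description.

```python
from enum import Enum


class Outcome(Enum):
    FAILS = "takeover fails and an error message is returned"
    NO_CONSISTENCY_ASSURANCE = "standby becomes primary; no assurance of data consistency"
    GREATER_ASSURANCE = "standby becomes primary; greater assurance of data consistency"
    RISK_OF_DATA_LOSS = "standby becomes primary; risk of data loss"
    ALLOWED_DURING_REINTEGRATION = "forced takeover is allowed during primary reintegration"


def forced_takeover_outcome(
    standby_state: str,
    peer_window_only: bool = False,
    reintegration_in_progress: bool = False,  # hypothetical flag
) -> Outcome:
    """Map a standby state to the Table 2 behavior for TAKEOVER ... BY FORCE."""
    state = standby_state.strip().lower()
    if peer_window_only and state not in ("peer", "disconnected peer"):
        # PEER WINDOW ONLY returns an error outside peer / disconnected peer state.
        return Outcome.FAILS
    if state == "disconnected peer":
        if peer_window_only:
            return Outcome.GREATER_ASSURANCE
        return Outcome.NO_CONSISTENCY_ASSURANCE
    if state == "peer":
        # Assumption: no separate row exists for PEER WINDOW ONLY in peer state.
        return Outcome.NO_CONSISTENCY_ASSURANCE
    if state == "local catchup":
        if reintegration_in_progress:
            return Outcome.ALLOWED_DURING_REINTEGRATION
        return Outcome.FAILS
    if state in ("remote catchup", "remote catchup pending"):
        return Outcome.RISK_OF_DATA_LOSS
    raise ValueError(f"unknown standby state: {standby_state!r}")
```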

When you issue the TAKEOVER HADR command, one of the following error codes might be generated: SQL1767N, SQL1769N, or SQL1770N with a reason code of 98. The reason code indicates that there is no installed license for HADR on the server where the command was issued. To correct the problem, install a valid HADR license using the db2licm command, or install a version of the server that contains a valid HADR license as part of its distribution.

When you issue the TAKEOVER HADR command, error code SQL1770N with a reason code of 15 might be generated. The reason code indicates that a takeover (either forced or non-forced) is not allowed on an HADR standby database on which an upgrade is in progress. To correct the problem, do one of the following:

If you do not have an immediate need to connect to the standby database, wait for the UPGRADE DATABASE command to complete on the primary database and for the standby database to replay all upgrade log records that were sent from the primary database, then reissue the command.
If you need to connect to this standby database immediately, issue the STOP HADR command to change the database role to STANDARD.

When the TAKEOVER BY FORCE PEER WINDOW ONLY command succeeds (that is, you issued it while the primary was disconnected from the standby but still within the peer window), all transaction information on the primary database had already been copied to the standby database, so no transaction information is lost.

Note: On forced takeovers that specify the PEER WINDOW ONLY option, the following occur:

Forced takeover stops log shipping or log retrieval on the standby. Log replay continues to the end of received or retrieved logs.
During a forced takeover, if the standby is connected to the old primary, it sends a poison pill, or a disabling message, to the old primary. This is done on a best effort basis; due to network, hardware, or software problems, the old primary might not receive the disabling message or correctly process it. Once the disabling message is received, the old primary should persist the poison pill to disk and shut itself down. As long as the pill is in effect, the old primary cannot be restarted. The pill is cleared only when one of the following commands is issued on the old primary:
- START HADR with the AS STANDBY option (that is, the old primary is reintegrated as a new standby)
- START HADR with the AS PRIMARY and BY FORCE options (the old primary is explicitly restarted as the primary, for reasons such as: the new primary failed to serve as the primary, so the user switches back to old primary; the user wants a clone of the database)
- STOP HADR (that is, the database is no longer an HADR database)
- DROP DATABASE
- RESTORE DATABASE
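The poison pill behavior described above can be modeled as a small state machine. This is a conceptual sketch, not Db2 internals: the OldPrimary class, the Command enum members, and PLAIN_RESTART (a stand-in for any restart attempt that is not one of the listed clearing commands) are hypothetical names. The enum members are labels for the listed commands, not their full syntax.

```python
from enum import Enum, auto


class Command(Enum):
    START_HADR_AS_STANDBY = auto()
    START_HADR_AS_PRIMARY_BY_FORCE = auto()
    STOP_HADR = auto()
    DROP_DATABASE = auto()
    RESTORE_DATABASE = auto()
    PLAIN_RESTART = auto()  # hypothetical: any restart that does not clear the pill


# The commands that the text lists as clearing the pill.
CLEARING_COMMANDS = frozenset({
    Command.START_HADR_AS_STANDBY,
    Command.START_HADR_AS_PRIMARY_BY_FORCE,
    Command.STOP_HADR,
    Command.DROP_DATABASE,
    Command.RESTORE_DATABASE,
})


class OldPrimary:
    """Toy model of an old primary that may have received a poison pill."""

    def __init__(self) -> None:
        self.pill_on_disk = False
        self.running = True

    def receive_disabling_message(self) -> None:
        # The old primary persists the pill to disk and shuts itself down.
        self.pill_on_disk = True
        self.running = False

    def issue(self, command: Command) -> bool:
        """Return True if the command is accepted, False if the pill blocks it."""
        if command in CLEARING_COMMANDS:
            self.pill_on_disk = False
            return True
        return not self.pill_on_disk
```

Because delivery of the disabling message is best effort, a real old primary might never reach the pill_on_disk state; the model covers only the case in which the message arrives and is processed.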

Takeover and reads on standby

If you have reads on standby enabled, any user application currently connected to the standby is disconnected to allow the takeover to proceed. Depending on the number of readers that are active on the standby, the takeover operation can take slightly longer to complete than it would if there were no readers on the standby. New connections are not allowed during the role switch. Any attempt to connect to the HADR standby during the role switch on takeover receives an error (SQL1776N).

Takeover and log spooling

If you set hadr_spool_limit to a high value, consider that a large gap between the log position of the primary and the log replay position on the standby might lead to a longer takeover time, because the standby cannot assume the role of the new primary until the replay of the spooled logs finishes.

Takeover and delayed replay

If you have configured hadr_replay_delay to a non-zero value on a standby, you cannot issue the TAKEOVER HADR command on that standby; the command fails with SQL1770N.

Takeover in a Db2 pureScale environment

The following considerations apply to Db2 pureScale environments:

All log streams must pass the check to allow a takeover command to proceed. However, the streams do not need to be in the same state.
When a primary database changes role into a standby database, a member that has a direct connection to the old standby is chosen as the replay member, with preference given to the preferred replay member (the preferred member is not chosen if it has no direct connection to the standby). Non-replay members are deactivated.
When a standby database changes role into a primary database, only the old replay member stays active; other members on the new primary are not activated.
Non-forced takeover is not allowed if any member on the primary is in member crash recovery (MCR) pending or in progress state.
Non-forced takeover is not allowed if the primary database is in group crash recovery because the streams cannot be in the required state.
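The last two pureScale restrictions can be sketched as a pre-check that lists reasons why a non-forced takeover would be refused. This is an illustrative sketch: the function name, the member-state labels ("mcr pending", "mcr in progress", "active"), and the input shape are hypothetical and do not correspond to actual monitoring output. The log stream checks are outside its scope.

```python
from typing import Dict, List

# Hypothetical labels for the member states that block a non-forced takeover.
MCR_BLOCKING_STATES = {"mcr pending", "mcr in progress"}


def non_forced_takeover_blockers(
    primary_member_states: Dict[int, str],
    group_crash_recovery: bool = False,
) -> List[str]:
    """Return reasons a non-forced takeover is not allowed (empty list if none)."""
    reasons: List[str] = []
    if group_crash_recovery:
        reasons.append("primary database is in group crash recovery")
    for member_id in sorted(primary_member_states):
        state = primary_member_states[member_id].strip().lower()
        if state in MCR_BLOCKING_STATES:
            reasons.append(f"member {member_id} on the primary is in {state} state")
    return reasons
```

An empty result means only that these two restrictions do not apply; all log streams must still pass their own checks before the takeover can proceed.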