HADR takeover service

The HADR takeover service takes over from a primary database when a connection problem occurs between the IBM® Security Guardium® Key Lifecycle Manager master server and the primary database in the Multi-Master cluster. When the primary database is down, the takeover operation is initiated on a standby database so that user operations are not hindered during the outage.

Important: Auto takeover functionality is discontinued from V4.1.0.1. For more information, see IBM Security Guardium Key Lifecycle Manager Version 4.1.0 Fix Packs.

To set the time interval for running the HADR takeover service, configure the agent.takeover.svc.interval property in the <SKLM_HOME>/config/SKLMConfig.properties file, for example, C:\Program Files\IBM\WebSphere\AppServer\products\sklm\config\SKLMConfig.properties. For more information about the configuration property, see agent.takeover.svc.interval.
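To check which interval is currently configured, the property can be read straight from the file. The following Python sketch is illustrative only: it handles just the simple key=value form of a properties file, and the sample value 60 is a made-up placeholder, not a documented default or unit.

```python
from pathlib import Path


def read_property(text, key, default=None):
    """Return the value of `key` from simple key=value properties text.

    Simplified parser: skips blank lines and comment lines, and does not
    handle line continuations or the ':' separator.
    """
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return default


def read_takeover_interval(config_path):
    """Read agent.takeover.svc.interval from an SKLMConfig.properties file."""
    text = Path(config_path).read_text(encoding="latin-1")
    return read_property(text, "agent.takeover.svc.interval")


# Hypothetical file content; the value 60 is a placeholder for illustration.
sample = "# agent settings\nagent.takeover.svc.interval = 60\n"
print(read_property(sample, "agent.takeover.svc.interval"))
```

The value is returned as a string, because its unit and valid range are described in the agent.takeover.svc.interval property reference and not here.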

DB2® High Availability Disaster Recovery (HADR) is used in the IBM Security Guardium Key Lifecycle Manager Multi-Master cluster. Configuring DB2 HADR protects you against data loss by transmitting data changes from a primary database to standby databases. Under normal conditions, DB2 HADR keeps the primary and standby databases in sync.

Agents are installed on all the master servers in the cluster. Agent services track the availability of the ports that are related to IBM Security Guardium Key Lifecycle Manager. If the primary database is down, the takeover service instructs the HADR standby database to take over as the new HADR primary database.

For the takeover operation, the primary and standby databases are continuously synchronized by using a secure communication channel. A set of DB2 HADR and WebSphere® Application Server configuration parameters is automatically updated for the takeover operation by the configuration services that the agent runs. For more information about the various configuration services, see Configuration services.

DB2 HADR supports up to three standby databases in your Multi-Master setup. You can have one principal standby and up to two auxiliary standbys. A priority is assigned to each standby database in the cluster. The standby with the highest priority assumes the primary database role. For example, if a primary database in the IBM Security Guardium Key Lifecycle Manager Multi-Master cluster fails, the standby database with priority index 1 takes over the role of acting primary database. If the takeover operation on the standby database with priority index 1 fails, the standby with the next priority (priority index 2) takes over as acting primary database.
Note: You must manually restart WebSphere Application Server on all the standby servers if an auxiliary standby takes over the primary role. A WebSphere Application Server restart is not required when the principal standby takes over the primary role.
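The priority rule and the restart note can be summarized in a short Python sketch. It is a model for illustration only, not product code: the Standby class, its field names, and the host names are hypothetical, and the sketch does not assume which priority index the principal standby holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Standby:
    host: str            # hypothetical host name
    priority_index: int  # 1 is tried first, then 2, then 3
    principal: bool      # True for the principal standby, False for auxiliary


def takeover_order(standbys):
    """Return the standbys in the order in which takeover is attempted."""
    return sorted(standbys, key=lambda s: s.priority_index)


def needs_was_restart(new_primary):
    """A manual WebSphere Application Server restart on the standby servers
    is needed only when an auxiliary standby assumes the primary role."""
    return not new_primary.principal


cluster = [
    Standby("standby-c", 3, principal=False),
    Standby("standby-a", 1, principal=True),
    Standby("standby-b", 2, principal=False),
]
for candidate in takeover_order(cluster):
    print(candidate.host, "restart needed:", needs_was_restart(candidate))
```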

IBM Security Guardium Key Lifecycle Manager supports the failback option. You can configure the primary database to take over the primary role when it comes up.

HADR Takeover
  • The takeover service of Instance 1 (primary master server) checks the status of the Primary Database by using DB2 commands.
  • If the Primary Database is down, Instance 2 (standby master server) receives a takeover request from the primary server. The Standby Database takes over as the Primary Database.
  • The primary master server receives a message from the standby that indicates whether the takeover operation was successful. If the takeover operation fails and the cluster is configured with multiple standby servers, the takeover service on the primary server sends a takeover request to the next standby.
  • When the old primary database server is up again, the takeover service starts HADR on it as a standby.
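The sequence above can be sketched as a small control loop. This Python sketch only models the described flow under stated assumptions: the two callables stand in for the DB2 status check and the takeover request sent to a standby agent, and their names and signatures are hypothetical, not part of any product API.

```python
from typing import Callable, Iterable, Optional


def run_takeover(primary_is_up: Callable[[], bool],
                 standbys_by_priority: Iterable[str],
                 request_takeover: Callable[[str], bool]) -> Optional[str]:
    """Return the standby that became primary, or None if no takeover was needed.

    primary_is_up        -- placeholder for the database status check
    standbys_by_priority -- standby hosts, highest priority first
    request_takeover     -- placeholder; returns True when the standby
                            reports a successful takeover
    """
    if primary_is_up():
        return None  # primary database is healthy, nothing to do
    for standby in standbys_by_priority:
        if request_takeover(standby):
            return standby  # this standby is now the acting primary
        # takeover failed on this standby, so try the next one
    raise RuntimeError("takeover failed on every configured standby")


# Usage with stub callables: the first standby refuses, the second succeeds.
attempts = []


def fake_request(host: str) -> bool:
    attempts.append(host)
    return host == "standby-2"


new_primary = run_takeover(lambda: False, ["standby-1", "standby-2", "standby-3"], fake_request)
print(new_primary, attempts)
```

The final step of the flow, restarting HADR on the old primary as a standby, is left out because it depends on when that server becomes available again.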

For more information about prerequisites for DB2 HADR configuration, see Database configuration for high availability disaster recovery (HADR).

Manually initiating takeover operation

When the IBM Security Guardium Key Lifecycle Manager primary master server that contains the primary database is down, the takeover operation is not initiated automatically. In such cases, you can manually start the takeover operation by running the sklmTakeoverHADR script.
Note: If the operating system of the IBM Security Guardium Key Lifecycle Manager primary master server fails, use the instructions for manually initiating the takeover operation given here: Operating system of the IBM Security Guardium Key Lifecycle Manager primary master server fails.
  1. Locate the sklmTakeoverHADR script.
    Windows
    <SKLM_INSTALL_HOME>\agent

    Default location is C:\Program Files\IBM\SKLMV41\agent.

    Linux®
    <SKLM_INSTALL_HOME>/agent

    Default location is /opt/IBM/SKLMV41/agent.

  2. Open a command prompt and run the script.
    Windows
    Go to the <SKLM_INSTALL_HOME>\agent directory and run the following command:
    sklmTakeoverHADR.bat <WAS_HOME> [IP_HOSTNAME] [AGENT_PORT]
    For example,
    sklmTakeoverHADR.bat "C:\Program Files\IBM\WebSphere\AppServer" 9.113.37.10 60015
    Linux
    Go to the <SKLM_INSTALL_HOME>/agent directory and run the following command:
    sklmTakeoverHADR.sh <WAS_HOME> [IP_HOSTNAME] [AGENT_PORT]
    For example,
    ./sklmTakeoverHADR.sh /opt/IBM/WebSphere/AppServer 9.113.37.10 60015
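If the script is called from automation on both platforms, the command line can be assembled in one place. The Python sketch below only builds the argument list and does not run anything. The function name and parameters are hypothetical, and it assumes that the bracketed arguments are positional, so AGENT_PORT can be given only together with IP_HOSTNAME.

```python
import ntpath
import posixpath


def build_takeover_command(agent_dir, was_home, host=None, port=None, windows=False):
    """Build the argument list for the sklmTakeoverHADR script.

    agent_dir -- the <SKLM_INSTALL_HOME> agent directory
    was_home  -- the <WAS_HOME> directory (required argument)
    host/port -- the optional [IP_HOSTNAME] and [AGENT_PORT] arguments
    """
    if port is not None and host is None:
        # Assumption: the optional arguments are positional.
        raise ValueError("AGENT_PORT requires IP_HOSTNAME")
    script = "sklmTakeoverHADR.bat" if windows else "sklmTakeoverHADR.sh"
    join = ntpath.join if windows else posixpath.join
    argv = [join(agent_dir, script), was_home]
    if host is not None:
        argv.append(host)
    if port is not None:
        argv.append(str(port))
    return argv


# Values taken from the examples above.
linux_cmd = build_takeover_command(
    "/opt/IBM/SKLMV41/agent", "/opt/IBM/WebSphere/AppServer", "9.113.37.10", 60015)
windows_cmd = build_takeover_command(
    r"C:\Program Files\IBM\SKLMV41\agent",
    r"C:\Program Files\IBM\WebSphere\AppServer", "9.113.37.10", 60015, windows=True)
print(linux_cmd)
print(windows_cmd)
```

Keeping the arguments as a list avoids quoting problems with paths that contain spaces, such as C:\Program Files. On the matching platform, the list could then be handed to a process launcher such as subprocess.run.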