IBM Tivoli Netcool/OMNIbus, Version 7.4

Peer-to-peer failover mode for probes

Two instances of a probe can run simultaneously in a peer-to-peer failover relationship. One instance is designated as the master. The other instance acts as a slave and is on hot standby. If the master instance fails, the slave instance is activated.

Note: Peer-to-peer failover is not supported for all probes. Probes that list the Mode, PeerHost, and PeerPort properties when you run the command $OMNIHOME/probes/nco_p_probename -dumpprops support peer-to-peer failover.
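The check described in the note can be automated. The following is a minimal sketch, assuming that the text printed by the -dumpprops option has already been captured into a string; the function name supports_peer_failover is hypothetical, and the exact layout of the -dumpprops output is not assumed beyond the three property names appearing in it as whole words.

```python
import re

# Property names that, per the note above, indicate peer-to-peer failover support.
REQUIRED_PROPERTIES = ("Mode", "PeerHost", "PeerPort")


def supports_peer_failover(dumpprops_output: str) -> bool:
    """Return True if every required property name appears as a whole word.

    Assumption: the caller passes the captured text of
    `$OMNIHOME/probes/nco_p_probename -dumpprops`. No particular line
    format is assumed, so a description that merely mentions one of the
    names would also count as a match.
    """
    return all(
        re.search(rf"\b{re.escape(name)}\b", dumpprops_output) is not None
        for name in REQUIRED_PROPERTIES
    )
```

Because the match is on whole words only, treat a positive result as a hint and confirm it against the probe's own documentation.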

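To set up a peer-to-peer failover relationship, set the following properties: for the master instance, set the Mode property to master and the PeerHost property to the host name of the slave; for the slave instance, set the Mode property to slave and the PeerHost property to the host name of the master; for both instances, set the PeerPort property to the same port, through which the two instances communicate.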

The master instance sends a heartbeat poll to the slave instance at the time interval specified by the BeatInterval property. The slave instance caches all the alert data it receives, and deletes all alert data from the cache each time it receives a heartbeat from the master instance.

If the slave instance receives no heartbeat within the time period defined by the sum of the values of the BeatInterval and BeatThreshold properties (BeatInterval + BeatThreshold), the slave instance assumes that the master is no longer active, and forwards all alerts in the cache to the ObjectServer. The slave instance continues to forward all alerts until it receives another heartbeat from the original master instance.

While waiting for heartbeats, the slave instance uses a timeout period of 1 second. As a result, there can be a maximum delay of (BeatInterval + BeatThreshold + 1) seconds before the slave instance forwards its cached alerts.

The BeatInterval setting that is defined for the master instance takes precedence; the slave instance ignores its local BeatInterval setting.
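The heartbeat and cache behavior described above can be illustrated with a small model. This is a sketch for reasoning about the timing only, not the probe's implementation: the class SlaveModel, its methods, and the function max_forward_delay are hypothetical names, time is passed in explicitly as seconds, and the values used in the usage lines are illustrative rather than product defaults.

```python
from dataclasses import dataclass, field


@dataclass
class SlaveModel:
    """Toy model of the slave instance's cache-and-heartbeat logic."""

    beat_interval: float   # seconds; the master's BeatInterval takes precedence
    beat_threshold: float  # seconds; BeatThreshold
    last_beat: float = 0.0
    master_alive: bool = True
    cache: list = field(default_factory=list)
    forwarded: list = field(default_factory=list)

    def on_heartbeat(self, now: float) -> None:
        # A heartbeat means the master is active: discard the cached alerts.
        self.last_beat = now
        self.master_alive = True
        self.cache.clear()

    def on_alert(self, alert: str) -> None:
        # Cache while the master is active; forward once it is presumed failed.
        if self.master_alive:
            self.cache.append(alert)
        else:
            self.forwarded.append(alert)

    def check(self, now: float) -> None:
        # Stands in for the end of each 1-second wait for a heartbeat.
        window = self.beat_interval + self.beat_threshold
        if self.master_alive and now - self.last_beat >= window:
            self.master_alive = False
            self.forwarded.extend(self.cache)  # all cached alerts are sent
            self.cache.clear()


def max_forward_delay(beat_interval: float, beat_threshold: float) -> float:
    """Worst-case delay in seconds, including the 1-second heartbeat timeout."""
    return beat_interval + beat_threshold + 1


# Illustrative values only.
slave = SlaveModel(beat_interval=2, beat_threshold=1)
slave.on_heartbeat(0.0)
slave.on_alert("alert-1")
slave.check(3.0)            # no heartbeat for BeatInterval + BeatThreshold
print(slave.forwarded)      # the cached alert has been forwarded
print(max_forward_delay(2, 1))
```

In this model a later call to on_heartbeat returns the slave to caching, which mirrors the statement that the slave forwards alerts only until it hears from the original master again.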

To disable the peer-to-peer failover relationship, run a single instance of the probe with the Mode property set to standard. This is the default setting.

The failover mode of probes running in a peer-to-peer failover relationship is set in the properties files.

You can also switch the mode of a probe between master and slave in the rules file. There is a delay of up to one second before the mode change takes effect. This can result in duplicate events if two probe instances are switching from standard mode to master or slave; however, no data is lost.

When two probe instances that are running in store-and-forward mode are connected to a failover pair of ObjectServers, the master instance sends alerts to the primary ObjectServer. If the primary ObjectServer fails, the master instance of the probe fails over and starts sending the alerts in its store-and-forward file to the backup ObjectServer. If the master instance of the probe fails, the slave instance takes over. If the slave instance fails to connect to the ObjectServer, it creates a store-and-forward file for storing alert data. When the master instance is reactivated, any store-and-forward files in the master instance are deleted to prevent old alerts from being resent.

Example: Setting the peer-to-peer failover mode in the properties files

Example properties file values for the master are as follows:

PeerPort: 9999
PeerHost: "slavehost"
Mode: "master"

Example properties file values for the slave are as follows:

PeerPort: 9999
PeerHost: "masterhost"
Mode: "slave"
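
The two sets of values mirror each other: each instance names the other host as its peer, and both use the same port. As a minimal sketch of that symmetry, the following hypothetical helper (peer_properties is not part of the product) renders both sets of values from one pair of host names, so that the PeerHost and PeerPort values cannot drift apart.

```python
def peer_properties(master_host: str, slave_host: str, peer_port: int) -> dict:
    """Return properties-file text for each instance, keyed by role."""

    def render(mode: str, peer_host: str) -> str:
        # Same three properties, in the same order as the examples above.
        return (
            f"PeerPort: {peer_port}\n"
            f'PeerHost: "{peer_host}"\n'
            f'Mode: "{mode}"\n'
        )

    # Each instance points at the other host; the port is shared.
    return {
        "master": render("master", slave_host),
        "slave": render("slave", master_host),
    }


props = peer_properties("masterhost", "slavehost", 9999)
print(props["master"])
print(props["slave"])
```

The returned strings contain only the three failover properties and would need to be merged with the rest of each probe's properties file.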

Example: Setting the peer-to-peer failover mode in the rules file

To switch a probe instance to become the master, use the rules file syntax:

%Mode = "master"