Overview of the failover process

A failover environment contains a primary and a backup Data server and, optionally, a primary and a backup Netcool/OMNIbus ObjectServer. The ObjectServer and Data server can be located on the same computer or on different computers. If the primary host fails or the network connection between the primary and backup servers is lost, the backup server takes over the processes of the primary server.

TBSM Data server and ObjectServer failover provides support for TBSM event and status processing, ObjectServer event and status processing, or both in the event of a hardware or software failure that affects one of these capabilities. You can configure a single backup Data server, a single backup ObjectServer, or both to take over processing if the corresponding primary server fails. These backup servers do not perform any operational processing while the primary server is functional. Therefore, they do not perform any load-balancing function.

Within a few minutes of startup, each backup server loads its database and resynchronizes its status from events. If the primary server resumes function while the backup server is running as the primary server, the original primary server assumes the role of backup server.

You can configure the system so that the original primary server resumes the role of primary, and the backup server returns to its backup role. This behavior is called fail back.

If network connectivity between the primary and backup servers is lost, the backup server assumes the role of primary server while the original primary server continues to function as the primary server. When connectivity resumes, the backup server detects that the primary server is running again and transitions back to the role of backup server. If the backup server is restarted, it resumes its backup role.
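The role transitions described above can be summarized as a small state model. The following Python sketch is illustrative only and is not TBSM code: the names Role, Pair, and fail_back are hypothetical, and the model covers only the cases this overview describes (primary failure, primary recovery with and without fail back, and loss of connectivity while both servers are running).

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    BACKUP = "backup"
    DOWN = "down"


@dataclass
class Pair:
    """Roles held by the configured primary ("original") and backup servers."""
    original: Role = Role.PRIMARY
    backup: Role = Role.BACKUP
    fail_back: bool = False  # hypothetical flag standing in for the fail back setting

    def primary_fails(self):
        # The backup server takes over the processes of the primary server.
        self.original = Role.DOWN
        self.backup = Role.PRIMARY

    def primary_recovers(self):
        if self.fail_back:
            # Fail back: both servers return to their configured roles.
            self.original, self.backup = Role.PRIMARY, Role.BACKUP
        else:
            # Default: the original primary server becomes the backup server.
            self.original = Role.BACKUP

    def connection_lost(self):
        # Both servers are running but cannot see each other, so the backup
        # server also acts as primary. Only this documented case is modeled.
        if self.original is Role.PRIMARY:
            self.backup = Role.PRIMARY

    def connection_restored(self):
        # The backup server detects the running primary and steps down.
        if self.original is Role.PRIMARY:
            self.backup = Role.BACKUP
```

For example, calling primary_fails() and then primary_recovers() on Pair() leaves the original server in the backup role, whereas the same sequence on Pair(fail_back=True) restores the configured roles.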

A data fetcher failure in the primary server does not trigger the failover process. If the primary data fetcher cannot connect to the database, the backup data fetcher probably cannot connect either.

Figure 1 illustrates the architecture of a failover environment.
Figure 1. Failover architecture

By default, both the primary and backup TBSM Data and Dashboard servers use the primary instances of Tivoli® Netcool/OMNIbus. The backup servers for the supporting applications also use the primary servers of the other applications by default.

Time Window Analyzer metric data store

As with all properties files, the TBSM Metric Collection properties files must be kept in sync between the primary and backup TBSM Data servers. The TBSM Metric Collection Component does not provide any automatic process for synchronizing these files.
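Because the files are not synchronized automatically, a periodic check can help detect drift. The following Python sketch is one possible approach and is not part of TBSM: it assumes that copies of the properties directories from both servers are reachable as local paths, and the function names and the "*.props" file pattern are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def out_of_sync(primary_dir, backup_dir, pattern="*.props"):
    """List file names that differ between the two directories.

    A file is reported if its contents differ or if it exists on only
    one side. The default pattern is an assumption; adjust it to match
    the files that you need to compare.
    """
    primary = {p.name: file_digest(p) for p in Path(primary_dir).glob(pattern)}
    backup = {p.name: file_digest(p) for p in Path(backup_dir).glob(pattern)}
    names = set(primary) | set(backup)
    return sorted(n for n in names if primary.get(n) != backup.get(n))
```

An empty result means that every matching file is identical on both sides. Any name in the result identifies a file to copy from the primary Data server to the backup Data server.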