Network Manager failover architecture (core processes)

Failover of the Network Manager core processes can be implemented by setting up primary and backup Network Manager installations that run on different servers. Both installations can connect either to a single Tivoli Netcool/OMNIbus ObjectServer or to a virtual pair of ObjectServers.

When you connect to a Network Manager server, you must identify the domain under which its processes run. For failover, Network Manager provides a virtual domain. Any connection to this virtual domain is routed to the Network Manager installation that is currently running as the primary server in the failover architecture. This routing is provided by the Virtual Domain component.

The following figure shows the high-level failover architecture for the primary and backup Network Manager core processes, which are set up in two separate domains.

Figure 1. Network Manager failover architecture

In the figure, both the primary and backup installations connect to a virtual pair of ObjectServers.

In each domain:

The Virtual Domain component (ncp_virtualdomain) manages failover, and raises health check events to indicate whether the domain is healthy.
The Probe for Tivoli Netcool/OMNIbus (nco_p_ncpmonitor) connects to the virtual ObjectServer pair and forwards the health check events to it.
The Event Gateway (ncp_g_event) connects to the virtual ObjectServer pair, reads all health check events from it, and passes them to the Virtual Domain component.
The Virtual Domain component uses these health check events to trigger failover.
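The health check loop described above can be pictured with a toy Python model. It is a sketch under stated assumptions, not a description of ncp_virtualdomain internals: the class and method names (HealthCheckEvent, VirtualDomain, route, on_health_check) and the domain names are hypothetical. The model records the latest health state of each domain, routes connections to the active domain, and switches to the other domain only when the active domain reports itself unhealthy and the other domain is healthy. It does not attempt to reproduce the product's exact failover or failback rules.

```python
"""Toy model of health-check-driven failover between two domains.

Hypothetical sketch: names are illustrative only and do not correspond
to ncp_virtualdomain internals or to any Network Manager API.
"""
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HealthCheckEvent:
    domain: str    # domain that raised the health check event
    healthy: bool  # True if that domain reports itself healthy


@dataclass
class VirtualDomain:
    primary: str
    backup: str
    active: str = field(init=False)
    health: Dict[str, bool] = field(init=False)

    def __post_init__(self) -> None:
        # Assume both domains start healthy; connections go to the primary.
        self.health = {self.primary: True, self.backup: True}
        self.active = self.primary

    def route(self) -> str:
        """Return the domain that a connection to the virtual domain reaches."""
        return self.active

    def on_health_check(self, event: HealthCheckEvent) -> None:
        """Handle an event read back from the ObjectServer by the gateway."""
        self.health[event.domain] = event.healthy
        other = self.backup if self.active == self.primary else self.primary
        # Fail over only if the active domain is unhealthy and the other is not.
        if not self.health[self.active] and self.health[other]:
            self.active = other
```

For example, `VirtualDomain("PRIMARY_DOMAIN", "BACKUP_DOMAIN")` routes to `PRIMARY_DOMAIN` until an unhealthy event for that domain arrives, after which `route()` returns `BACKUP_DOMAIN`.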

A TCP socket connection is required between the Virtual Domain processes in the two domains so that data can be copied from the primary domain to the backup domain. This keeps the topology in the backup domain synchronized with the primary domain, so that it is current when failover occurs.
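To illustrate what copying data over a stream socket involves, the following Python sketch sends a topology snapshot from one end of a connection to the other. The wire format (a 4-byte big-endian length prefix followed by a JSON body) and the function names are assumptions made for illustration; the actual protocol used between Virtual Domain processes is not described here. Length-prefixed framing is needed because a TCP stream does not preserve message boundaries, so the receiver must read exactly the announced number of bytes.

```python
import json
import socket
import struct


def send_snapshot(sock: socket.socket, topology: dict) -> None:
    """Send one snapshot: a 4-byte big-endian length, then a JSON body."""
    body = json.dumps(topology).encode("utf-8")
    sock.sendall(struct.pack("!I", len(body)) + body)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a stream socket may deliver data in pieces."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_snapshot(sock: socket.socket) -> dict:
    """Receive one snapshot written by send_snapshot()."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))
```

For a local trial, `socket.socketpair()` can stand in for the TCP connection between the primary and backup hosts: call `send_snapshot` on one end and `recv_snapshot` on the other.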

Note: If you implement failover, you must ensure that the primary and backup installations use identical encryption keys. If the encryption keys are not identical, the backup poller does not function correctly during failover. To ensure that the keys are identical, copy the following file from the primary server to the same location on the backup server: $NCHOME/etc/security/keys/conf.key. If you enter all SNMP community strings on the command line and do not encrypt them, you do not need to do this task. To update the NCIM and ObjectServer passwords, use the Perl script ncp_password_update.pl.
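After copying conf.key, you might want to confirm that the two files really are identical without displaying the key itself. The following Python sketch compares SHA-256 digests of two copies of the file; the helper names (conf_key_path, sha256_hex, keys_match) are hypothetical and are not part of Network Manager. Because the primary and backup installations run on different servers, in practice you would compute the digest on each server and compare the two values.

```python
import hashlib
import os
from pathlib import Path
from typing import Optional


def conf_key_path(nchome: Optional[str] = None) -> Path:
    """Build $NCHOME/etc/security/keys/conf.key (uses $NCHOME if not given)."""
    base = nchome if nchome is not None else os.environ["NCHOME"]
    return Path(base) / "etc" / "security" / "keys" / "conf.key"


def sha256_hex(path: Path) -> str:
    """Return the SHA-256 digest of a file, so the key itself is never shown."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def keys_match(primary_key: Path, backup_key: Path) -> bool:
    """True if both copies of conf.key have identical contents."""
    return sha256_hex(primary_key) == sha256_hex(backup_key)
```

For example, running `print(sha256_hex(conf_key_path()))` on each server prints a digest that can be compared by eye or by script.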

NCIM implementations for failover

You can set up Network Manager failover with NCIM topology database high availability. This configuration protects against data loss by replicating data changes from the source NCIM topology database in the primary Network Manager domain to one or more target NCIM topology databases in the backup Network Manager domain. The source database is referred to as the primary database, and each target database is referred to as a standby database. This approach removes the database as a single point of failure, because both the primary and backup Network Manager domains connect to whichever database is currently acting as the primary database.

In any failover configuration, both the primary and backup Network Manager domains connect to the same database, even when database high availability is configured. The difference with high availability is that the database is also replicated on the standby database server.

Note: In previous Network Manager releases, you could configure NCIM topology database failover by using NCIM replication (also referred to as NCIM topology database replication). NCIM replication has been replaced by the high availability feature that is provided by the supported database:

If you have a Db2® database, you can use the High Availability Disaster Recovery (HADR) feature to set up failover for NCIM.
If you have an Oracle database, you can use the Real Application Clusters (RAC) feature to set up failover for NCIM.

Whether or not failover is configured with NCIM topology database high availability, all entities in the topology are stored under the primary domain name, and all poll policies are configured for the primary domain. There is no entry in the domainMgr table for the backup domain. As a result, when failover is configured, the NmosDomainName field for an event in the alerts.status table is always populated with the primary domain name.

Note: To configure NCIM topology database high availability using Db2 HADR, first set up the HADR environment by following the instructions in the Db2 documentation (see Related information for links to your Db2 Information Center). Then perform the tasks to configure Network Manager to work with Db2 HADR. If you have an Oracle database, first set up the Oracle RAC environment by following the instructions in the Oracle documentation (see Related information for a link to the Oracle documentation). Then perform the tasks to configure Network Manager to work in the Oracle RAC environment.