Network Manager failover architecture (core processes)
Failover of the Network Manager core processes can be implemented by setting up primary and backup Network Manager installations that run on different servers. Both installations can connect either to a single Tivoli Netcool/OMNIbus ObjectServer or to a virtual pair of ObjectServers.
When you connect to a Network Manager server, the associated domain under which the processes run needs to be identified. Network Manager provides a virtual domain that can be used when running in failover mode. Any connection to this virtual domain is routed to the Network Manager installation that is running as the primary server in the failover architecture. This routing capability is provided by the Virtual Domain component.
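The routing idea can be illustrated with a minimal sketch. It is a toy model, not the Virtual Domain component's actual implementation: the `Installation` class, the `resolve_virtual_domain` function, and all domain and host names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Installation:
    domain: str       # real domain name (hypothetical value)
    host: str         # server that runs this installation (hypothetical value)
    is_primary: bool  # True while this installation acts as the primary server


def resolve_virtual_domain(virtual_domain, members):
    """Return the installation that currently acts as primary for a virtual domain."""
    active = [m for m in members.get(virtual_domain, []) if m.is_primary]
    if len(active) != 1:
        raise RuntimeError(
            f"expected exactly one primary for {virtual_domain}, found {len(active)}"
        )
    return active[0]


# Two installations in separate domains, grouped under one virtual domain.
primary = Installation("NCP_PRIMARY", "server-a.example.com", True)
backup = Installation("NCP_BACKUP", "server-b.example.com", False)
members = {"NCP_VIRTUAL": [primary, backup]}

target = resolve_virtual_domain("NCP_VIRTUAL", members)
print(f"Connections to NCP_VIRTUAL are routed to {target.domain} on {target.host}")
```

A client that always addresses the virtual domain does not need to know which installation is currently acting as the primary server.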
The following figure shows the high-level failover architecture for the primary and backup Network Manager core processes, which are set up in two separate domains.
In the figure, both the primary and backup installations connect to a virtual pair of ObjectServers.
- The Virtual Domain component (ncp_virtualdomain) manages failover, and raises health check events to indicate whether the domain is healthy.
- The Probe for Tivoli Netcool/OMNIbus (nco_p_ncpmonitor) connects to the virtual ObjectServer pair, and forwards the health check events.
- The Event Gateway (ncp_g_event) connects to the virtual ObjectServer pair, reads in all health check events, and then passes the events to the Virtual Domain component.
These health check events are used to trigger failover.
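The event flow described above can be sketched as a small simulation. This is an illustrative model only: the class and function names are hypothetical, the ObjectServer pair is reduced to an in-memory queue, and the rule that a single unhealthy event from the primary domain triggers failover is an assumption made for the sketch, not the product's documented failover criteria.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class HealthEvent:
    domain: str    # domain that raised the event
    healthy: bool  # whether that domain reports itself as healthy


class VirtualDomain:
    """Toy stand-in for ncp_virtualdomain in one domain."""

    def __init__(self, domain, role):
        self.domain = domain
        self.role = role                 # "primary" or "backup"
        self.active = role == "primary"  # the primary starts as the active side

    def raise_health_event(self, healthy):
        return HealthEvent(self.domain, healthy)

    def on_health_event(self, event):
        # Assumption: only health events from the primary domain drive failover.
        if self.role == "primary" and event.domain == self.domain:
            self.active = event.healthy
        elif self.role == "backup" and event.domain != self.domain:
            self.active = not event.healthy


def probe_forward(event, object_server):
    """Toy stand-in for nco_p_ncpmonitor: forward the event to the ObjectServer."""
    object_server.append(event)


def gateway_deliver(object_server, virtual_domains):
    """Toy stand-in for ncp_g_event: read all events and pass them on."""
    while object_server:
        event = object_server.popleft()
        for vd in virtual_domains:
            vd.on_health_event(event)


object_server = deque()  # stands in for the virtual ObjectServer pair
primary_vd = VirtualDomain("NCP_PRIMARY", "primary")
backup_vd = VirtualDomain("NCP_BACKUP", "backup")

# The primary domain reports a problem; the backup takes over.
probe_forward(primary_vd.raise_health_event(False), object_server)
gateway_deliver(object_server, [primary_vd, backup_vd])
print(primary_vd.active, backup_vd.active)
```

Because both Event Gateways read from the same ObjectServer pair, each Virtual Domain process sees the health events of both domains, which is what allows the backup to react to a problem in the primary domain.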
A TCP socket connection is required between the Virtual Domain processes to copy data from the primary domain to the backup domain. This copy keeps the backup topology synchronized with the primary topology, so that the topology is current when failover occurs.
$NCHOME/etc/security/keys/conf.key. If you enter all SNMP community strings on the command line and do not encrypt them, you do not need to do this task.
Also, to update the NCIM and ObjectServer passwords, use the Perl script ncp_password_update.pl.
NCIM implementations for failover
You can set up Network Manager failover with NCIM topology database high availability. This failover configuration protects against data loss by replicating data changes from the source NCIM topology database in the primary Network Manager domain to one or more target NCIM topology databases in the backup Network Manager domain. The source NCIM topology database is referred to as the primary database and the target NCIM topology database is referred to as the standby database. This approach removes the single point of failure because both the primary and backup Network Manager domains connect to whichever database is acting as the primary database.
In any failover configuration, both the primary and backup Network Manager domains connect to the same database, even with database high availability configured. The main difference is that with high availability, the database is replicated on the standby database server.
- If you have a Db2® database, you can use the High Availability Disaster Recovery (HADR) feature to set up failover for NCIM.
- If you have an Oracle database, you can use the Real Application Clusters (RAC) feature to set up failover for NCIM.
Regardless of whether failover is configured with or without NCIM topology database high availability, all entities in the topology are stored under the primary domain name, and all poll policies are configured for the primary domain. There is no entry in the domainMgr table for the backup domain. As a result, the NmosDomainName field for an event in the alerts.status table will always be populated with the primary domain name when failover is configured.
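One practical consequence is that any filter on Network Manager events should use the primary domain name, even while the backup domain is active. The following minimal sketch illustrates this with in-memory rows; the function name, the domain names, and the sample rows are hypothetical, and only the NmosDomainName and Identifier field names come from the alerts.status table.

```python
def events_for_domain(events, domain_name):
    """Return the alerts.status-style rows whose NmosDomainName matches."""
    return [e for e in events if e.get("NmosDomainName") == domain_name]


# Made-up sample rows: with failover configured, every row carries the
# primary domain name, regardless of which domain processed the event.
sample_events = [
    {"Identifier": "evt-1", "NmosDomainName": "NCP_PRIMARY"},
    {"Identifier": "evt-2", "NmosDomainName": "NCP_PRIMARY"},
]

print(len(events_for_domain(sample_events, "NCP_PRIMARY")))
print(len(events_for_domain(sample_events, "NCP_BACKUP")))
```

Filtering on the backup domain name matches nothing in this model, which mirrors the absence of a backup entry in the domainMgr table.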