Configuring for Operating System Level Failover

The objective is to take a standard application installation on one node, detect failures, disable the first node, enable a second node, and start an identically configured application. The exact mechanics of this process will vary, depending on the hardware and software involved.

These solutions generally entail four key components, all of which are managed by cluster management software specific to the OS or environment.

IP Takeover: The designated backup platform will take over the IP address of the primary platform. This might be accomplished through mechanisms within a particular host or through external network facilities like NAT (Network Address Translation).
Disk Takeover: The failover platform will get access to the disks that were being used by the primary platform (access from the primary host must cease). An alternative to this is to have the required disk storage attached using a shared filesystem or cluster filesystem.
Restart Logic: There will be some kind of restart logic to manage dependencies and restart all necessary components only when their resources are available. In a Unix environment, this logic is often implemented as scripts. In a Windows environment, it is likely to be a combination of scripts and configuration in the Microsoft Clustering Software.
Failure Detection: This is usually handled by the OS level failover technology, which monitors the platform and/or Sterling B2B Integrator to determine when a failover is needed. In addition to software, this mechanism often includes hardware components such as serial connections, additional isolated network connections, and sometimes shared disk devices, which help prevent partial takeovers in the event of partial failures.
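
The way these four components fit together can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular cluster manager works: the names FailureDetector, Step, and run_failover are hypothetical, the probe and step actions stand in for whatever the cluster software actually does, and the threshold of three missed probes is an arbitrary example value.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FailureDetector:
    """Signals a failover only after `threshold` consecutive failed probes."""
    probe: Callable[[], bool]  # hypothetical health check; True means healthy
    threshold: int = 3         # example value, not a product default
    misses: int = 0

    def poll(self) -> bool:
        """Run one probe and return True when a failover should be triggered."""
        if self.probe():
            self.misses = 0
        else:
            self.misses += 1
        return self.misses >= self.threshold


@dataclass
class Step:
    """One takeover step, such as IP takeover, disk takeover, or restart."""
    name: str
    action: Callable[[], bool]  # hypothetical action; True means success


def run_failover(steps: List[Step]) -> List[str]:
    """Run the steps in order and stop at the first one that fails.

    Stopping early means a later step (for example, the application
    restart) never runs without the resources it depends on.
    """
    completed = []
    for step in steps:
        if not step.action():
            break
        completed.append(step.name)
    return completed
```

Requiring several consecutive missed probes before acting is one simple way to avoid failing over on a single transient error. Ordering the steps as IP takeover, disk takeover, then restart mirrors the dependency handling described under Restart Logic.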

The following sections show some common arrangements of this process. Each of these configurations has advantages and disadvantages.

In all options except the first (separate local disk), Sterling B2B Integrator is installed as if it were a standalone installation, and the low level failover technology is then configured.

Option 1: Separate Local Disk (Application Installed Separately On Each Node)

This option, with a completely separate disk, is in many ways the simplest and cheapest way to go. It amounts to having two identically configured computers with Sterling B2B Integrator installed identically on each.

The biggest problem with this approach is that the installations are not automatically kept synchronized. In addition, because Sterling B2B Integrator is licensed to IP addresses, it is not possible to bring it up in the second environment while the first is operating. The synchronization issue can be partially mitigated through the use of automated file copying tools. Documents must be stored to the database in this configuration.
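
As a rough illustration of what an automated file copying tool does, the following Python sketch copies any file that is missing or different in a second directory tree. It is a minimal example under stated assumptions: the function names are hypothetical, both trees are assumed to be reachable as local paths (for example, through a mount), and files deleted from the source are not removed from the target. In practice an existing synchronization tool would normally be used instead.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sync_tree(source: Path, target: Path) -> List[str]:
    """Copy files that are missing or changed in `target`.

    Returns the relative paths that were copied. Files that exist only
    in `target` are left alone (an assumption made to keep this short).
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = target / rel
        if not dst.exists() or file_digest(src) != file_digest(dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            copied.append(rel.as_posix())
    return copied
```

Running sync_tree a second time with no changes in between returns an empty list, which makes it easy to confirm that the two installations match.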

Option 2: Dual Attached Disk

This option, using dual attached hardware, requires disks that have the capability of being attached to two (or more) servers. Most modern external storage subsystems can do this (with varying degrees of sophistication).

In this environment, the failover process has the additional step of logically moving the disks to the second server. This is usually handled more or less transparently by the OS level failover technology. This option performs the same as local disk, and because both servers use the same disks, the Sterling B2B Integrator installation is always up to date.

Option 3: Shared Network Attached Storage

This option, which uses shared network attached storage such as NFS or Windows shared drives, is also viable, but it has two important issues:

There is almost always a performance hit because network attached storage is slower than local disk. The exception is when Sterling B2B Integrator is configured to use local disk for temporary files and does not use the file system adapter extensively.
The network attached storage itself must be capable of essentially instantaneous and transparent failover.

This is an attractive option in lower volume configurations or where network attached storage with the desired characteristics is already available.
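
One way to get a feel for the performance difference is to time the same write against a local directory and against the network attached storage. The Python sketch below is a simple measurement under stated assumptions, not a benchmark of Sterling B2B Integrator itself: the function name write_latency is hypothetical, and the payload size and number of rounds are arbitrary example values.

```python
import os
import tempfile
import time


def write_latency(directory: str, size: int = 1 << 20, rounds: int = 5) -> float:
    """Return the mean time in seconds to write and flush `size` bytes.

    Each round creates a temporary file in `directory`, writes the
    payload, forces it to storage with fsync, and removes the file.
    """
    payload = b"\0" * size
    total = 0.0
    for _ in range(rounds):
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as handle:
                start = time.perf_counter()
                handle.write(payload)
                handle.flush()
                os.fsync(handle.fileno())  # wait until the data reaches storage
                total += time.perf_counter() - start
        finally:
            os.remove(path)
    return total / rounds
```

Calling write_latency once with a local directory and once with a directory on the shared storage gives two numbers that can be compared directly. The paths to use depend on the environment and are not specified here.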