IBM Tivoli Provisioning Manager for Images, Version 7.1.1.16

Fault tolerance

A system is fault-tolerant if it continues to operate when some of its parts fail. Fault tolerance makes your remote-boot infrastructure more robust.

In the case of OS deployment servers, the whole system is fault-tolerant if the OS deployment servers back each other up. When a server fails, the other servers handle the requests that the failed server would have served.
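
The idea of servers backing each other up can be illustrated with a minimal sketch. It is not the product's actual failover mechanism: the `DeploymentServer` class, the `pick_server` function, and the server names are hypothetical and serve only to show that a request goes to another server when the preferred one is down.

```python
from dataclasses import dataclass


@dataclass
class DeploymentServer:
    name: str        # hypothetical label, not a product identifier
    up: bool = True  # whether the server currently answers requests


def pick_server(servers):
    """Return the first server that is up, or None if all are down."""
    for server in servers:
        if server.up:
            return server
    return None


servers = [DeploymentServer("os-deploy-1"), DeploymentServer("os-deploy-2")]
servers[0].up = False  # simulate a failure of the first server

chosen = pick_server(servers)
print(chosen.name if chosen else "no server available")
```

With the first server marked as down, the request is handled by the second one. If every server is down, no backup is left and the request cannot be served.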

Implementing fault tolerance at the Tivoli® Provisioning Manager for Images level does not mean that your whole network infrastructure is fault-tolerant. You can implement fault tolerance at all levels:

At the physical level, by having redundant power sources (if all OS deployment servers lose power at the same time, fault tolerance at the product level is useless)
At the network level, by having backup network links and backup active elements (the backup server must be able to reach remote-boot targets)
At the network operating system level, by having multiple network domains, or by running OS deployment servers outside of your domain architecture (OS deployment servers should not all depend on the same NT PDC or the same NFS server)
At the DHCP level, by having multiple DHCP servers on the same subnet
At the Tivoli Provisioning Manager for Images level, by following the product's fault-tolerance instructions
At the operating system level (if Tivoli Provisioning Manager for Images survives a severe problem, but the operating system then cannot find its network server, fault tolerance at the product level is useless)
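
The reasoning behind this list can be expressed as a simple rule: a deployment can proceed only if every level still has at least one working component. The following sketch shows that rule under stated assumptions. The level names, the `deployment_possible` function, and the sample states are hypothetical and do not correspond to any product interface.

```python
def deployment_possible(levels):
    """Return True only if every level has at least one working component.

    `levels` maps a level name to a list of booleans, one per redundant
    component at that level (True means the component is working).
    """
    return all(any(components) for components in levels.values())


# Hypothetical state: one network link and one DHCP server have failed,
# but each level still has a working backup.
levels = {
    "power": [True, True],
    "network": [True, False],
    "dhcp": [False, True],
    "os_deployment_servers": [True, True],
}
print(deployment_possible(levels))

# If every power source fails, redundancy at the other levels no longer helps.
levels["power"] = [False, False]
print(deployment_possible(levels))
```

The first call reports that deployment is still possible. The second reports that it is not, which mirrors the remark that product-level fault tolerance is useless when all servers lose power.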

The following sections present information about how to implement fault tolerance at the DHCP and Tivoli Provisioning Manager for Images levels. Other levels are beyond the scope of this document.
