Node quorum with tiebreaker disks
When running on small GPFS™ clusters, you might want to have the cluster remain online with only one surviving node.
To achieve this, you need to add a tiebreaker disk to the quorum configuration. Node quorum with tiebreaker disks allows you to run with as few as one quorum node available, as long as you have access to a majority of the quorum disks (refer to Figure 1). Enabling node quorum with tiebreaker disks starts by designating one or more nodes as quorum nodes. Then one to three disks are defined as tiebreaker disks using the tiebreakerDisks parameter on the mmchconfig command. You can designate any disk to be a tiebreaker.
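As a sketch of that initial setup, assuming hypothetical node names (nodeA, nodeB) and NSD names (nsd1 through nsd3), the commands might look like this:

```shell
# Designate two nodes as quorum nodes (node names are examples).
mmchnode --quorum -N nodeA,nodeB

# Define three tiebreaker disks by NSD name, semicolon-separated
# and enclosed in quotes, as required by mmchconfig.
mmchconfig tiebreakerDisks="nsd1;nsd2;nsd3"
```

Substitute the actual node and NSD names from your cluster (see mmlsnsd for the NSD names).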
When utilizing node quorum with tiebreaker disks, there are specific rules for cluster nodes and for tiebreaker disks.
- There is a maximum of eight quorum nodes.
- All quorum nodes need to have access to all of the tiebreaker disks.
- When using the traditional server-based (non-CCR) configuration repository, you should include the primary and secondary cluster configuration servers as quorum nodes.
- You may have an unlimited number of non-quorum nodes.
- If a network failure causes a loss of quorum, and quorum is maintained by tiebreaker disks, the following rationale is used to re-establish quorum. If a group has the cluster manager, it is the "survivor". The cluster manager can give up its role if it communicates with fewer than the minimum number of quorum nodes as defined by the minQuorumNodes configuration parameter. In this case, other groups with the minimum number of quorum nodes (if they exist) can choose a new cluster manager.
Changing quorum semantics:
When using the cluster configuration repository (CCR) to store configuration files, the total number of quorum nodes is limited to eight, regardless of quorum semantics, but the use of tiebreaker disks can be enabled or disabled at any time by issuing an mmchconfig tiebreakerDisks command. The change will take effect immediately, and it is not necessary to shut down GPFS when making this change.
- To configure more than eight quorum nodes under the server-based (non-CCR) configuration repository, you must disable node quorum with tiebreaker disks and restart the GPFS daemon.
To disable node quorum with tiebreaker disks:
- Issue the mmshutdown -a command to shut down GPFS on all nodes.
- Change quorum semantics by issuing the mmchconfig tiebreakerDisks=no command.
- Add additional quorum nodes.
- Issue the mmstartup -a command to restart GPFS on all nodes.
- If you remove quorum nodes and the new configuration has fewer than eight quorum nodes, you can change the configuration to node quorum with tiebreaker disks. To enable node quorum with tiebreaker disks:
- Issue the mmshutdown -a command to shut down GPFS on all nodes.
- Delete the appropriate quorum nodes, or run mmchnode --nonquorum to change them to nonquorum (client) nodes.
- Change quorum semantics by issuing the mmchconfig tiebreakerDisks="diskList" command. The diskList contains the NSD names of the tiebreaker disks, preferably one or three, separated by semicolons (;) and enclosed in quotes.
- Issue the mmstartup -a command to restart GPFS on all nodes.
- You can have one, two, or three tiebreaker disks. However, you should use an odd number of tiebreaker disks.
- Among the quorum node groups that appear after an interconnect failure, only those having access to a majority of tiebreaker disks can be candidates to be the survivor group.
- Tiebreaker disks must be connected to all quorum nodes.
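The disable and enable procedures above can be sketched as the following command sequences, again with hypothetical node names (nodeC, nodeD) and NSD names standing in for your own:

```shell
# --- Disable node quorum with tiebreaker disks (non-CCR repository) ---
mmshutdown -a                         # shut down GPFS on all nodes
mmchconfig tiebreakerDisks=no         # switch back to plain node quorum
mmchnode --quorum -N nodeC,nodeD      # now additional quorum nodes may be added
mmstartup -a                          # restart GPFS on all nodes

# --- Enable node quorum with tiebreaker disks ---
mmshutdown -a                         # shut down GPFS on all nodes
mmchnode --nonquorum -N nodeC,nodeD   # reduce quorum nodes to eight or fewer
mmchconfig tiebreakerDisks="nsd1;nsd3;nsd5"   # odd number of NSDs preferred
mmstartup -a                          # restart GPFS on all nodes
```

Under CCR, the mmshutdown/mmstartup steps are unnecessary, since the tiebreakerDisks change takes effect immediately.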
When a quorum node detects loss of network connectivity, but before GPFS runs the algorithm that decides if the node will remain in the cluster, the tiebreakerCheck event is triggered. This event is generated only in configurations that use quorum nodes with tiebreaker disks. It is also triggered on the cluster manager periodically by a challenge-response thread to verify that the node can still continue as cluster manager.
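If you want visibility into when this check fires, GPFS lets you register a user callback on the tiebreakerCheck event with mmaddcallback. A minimal sketch, assuming a hypothetical logging script at /usr/local/bin/tb_audit.sh:

```shell
# Register a callback (identifier and script path are examples) that
# runs whenever the tiebreakerCheck event is triggered on this node.
mmaddcallback tiebreakerAudit \
    --command /usr/local/bin/tb_audit.sh \
    --event tiebreakerCheck
```

Because this event sits on the quorum decision path, any callback script should return quickly; consult the mmaddcallback documentation for the event's exact parameters and execution constraints before using it in production.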