PowerHA SystemMirror use of Cluster Aware AIX
PowerHA® SystemMirror® is built on top of the core clustering capabilities that are supported in the AIX® operating system. PowerHA SystemMirror is supported for all editions of AIX that support Cluster Aware AIX (CAA) capabilities.
CAA and PowerHA SystemMirror use Universal IDs (UID and UUID) to track disks and nodes. Dynamically changing the UID or UUID is not supported. The UID and UUID normally do not change. However, there are known scenarios, such as reinstalling the operating system, where the UID and UUID can change. If the UID or UUID changes, you must remove and re-create the CAA cluster to ensure that all UIDs and UUIDs are updated.
In AIX Version 7.2, or later, or in IBM® AIX 7.1 with Technology Level 4, or later, CAA detects and handles network failures after 20 seconds (default value). To change the default value from 20 seconds, run the clmgr modify cluster NETWORK_FAILURE_DETECTION_TIME=<xxx> command, where xxx is the number of seconds, in the range 5 - 590.
- Heartbeat management
- By default, PowerHA SystemMirror uses unicast communications for heartbeat. As an alternative, you can configure multicast communications instead of unicast. For multicast, you can specify a multicast address while configuring the cluster, or let Cluster Aware AIX (CAA) assign one automatically during the configuration based on the network environment. Cluster communication takes place over multiple redundant paths of communication. The following redundant paths of communication provide a robust clustering foundation that is less prone to cluster partitioning:
- TCP/IP Networks
- PowerHA SystemMirror and Cluster Aware AIX use all network interfaces that are available for cluster communication. All of these interfaces are discovered by default and used for health management and other cluster communication. You can use the PowerHA SystemMirror management interfaces to remove any interface that you do not want to be used for application availability. You can also define the interfaces that you do not want to be used as private interfaces with PowerHA SystemMirror.
- SAN based communication
- CAA supports storage area network (SAN) fabric-based cluster communication, including heartbeating, for a limited number of adapters. This type of heartbeating is optional and might not work with most environments because of network zoning requirements that allow packets to move from one client to another client by using Small Computer System Interface (SCSI) protocol.
- Central cluster-repository based communication
- Cluster health and other cluster communication is achieved through the central repository disk. PowerHA SystemMirror 7.2, or later, provides an Automatic Repository Disk Replacement (ARR) function that automatically replaces a failed repository disk with a backup repository disk. The ARR function is available only if you configure and identify a backup repository disk by using PowerHA SystemMirror.
- Network interface failure detection time
- PowerHA SystemMirror relies on CAA to monitor and detect network interface failures and node failures. In IBM AIX 7.1 with Technology Level 4, or earlier, CAA detected network failures within a fixed amount of time (5 seconds). If a hardware failure occurred in these versions of the AIX operating system, the failures were reported immediately. This type of reporting is called the quick failure detection process. This detection and reporting process in the AIX operating system is different from how PowerHA SystemMirror Version 6.1 detects and reports failures. In PowerHA SystemMirror 6.1, failures are not declared until the full network failure detection time elapses. This process is called the full wait time process and is based on relaxed failure detection.
In AIX Version 7.2, or later, or in IBM AIX 7.1 with Technology Level 4, or later, you can use the NETWORK_FAILURE_DETECTION_TIME option with the clmgr command to set the failure detection time for the network interface. The default value for the NETWORK_FAILURE_DETECTION_TIME option is 20 seconds. In AIX Version 7.2, or later, or in IBM AIX 7.1 with Technology Level 4, or later, the failure detection process occurs after the full wait period of the failure detection time. These versions of the AIX operating system do not use the quick failure detection process.
To change the default value from 20 seconds for the NETWORK_FAILURE_DETECTION_TIME option, run the clmgr modify cluster NETWORK_FAILURE_DETECTION_TIME=<xxx> command, where xxx is one of the following values:
- 0
- If you specify this value and the cluster is synchronized, then the network failure detection occurs after 5 seconds and uses the quick failure detection process. This option was used in IBM AIX 7.1 with Technology Level 4, or earlier.
- 5 - 590 seconds
- If you specify a value in this range and if the cluster is synchronized, the network failure detection occurs after the specified value and uses the full wait time process.
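The two accepted kinds of value described above (0, or 5 - 590 seconds) can be checked before the command is run. The following is a minimal sketch, not part of PowerHA SystemMirror: the function name nfdt_command is hypothetical, and it only prints the clmgr command (a dry run) rather than executing it.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not a PowerHA SystemMirror tool).
# Validates a NETWORK_FAILURE_DETECTION_TIME value against the rules in the
# text above and prints the clmgr command that would apply it.
#   0        -> quick failure detection process
#   5 - 590  -> full wait time process
nfdt_command() {
    value=$1
    # Reject empty or non-numeric input.
    case $value in
        ''|*[!0-9]*)
            echo "error: value must be a non-negative integer" >&2
            return 1 ;;
    esac
    if [ "$value" -eq 0 ] || { [ "$value" -ge 5 ] && [ "$value" -le 590 ]; }; then
        echo "clmgr modify cluster NETWORK_FAILURE_DETECTION_TIME=$value"
    else
        echo "error: value must be 0 or in the range 5 - 590" >&2
        return 1
    fi
}

# Dry run: print the command for a 30-second detection time.
nfdt_command 30
```

As the text notes, a new value applies only after the cluster is synchronized, so printing the command first leaves room to review it before changing a live cluster.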
- Node failure detection time
- PowerHA SystemMirror and CAA can detect the failure of a partner node in a cluster when heartbeats are missing from network communication and disk communication. When these communication channels are lost, monitoring is enabled for a set period of time. This period is known as the node failure detection time. To configure the node failure detection time, you can use one of the following options:
- SMIT
- To configure node failure detection time, complete the following steps:
- From the command line, enter smit sysmirror.
- In the SMIT interface, select , and press Enter.
- Complete all required fields, and press Enter.
- Command line
- From the command line, run the clmgr modify cluster HEARTBEAT_FREQUENCY=<v1> GRACE_PERIOD=<v2> command, where v1 and v2 are values in seconds.
The HEARTBEAT_FREQUENCY option is the node communication time-out value. This value is the number of seconds that CAA waits to receive packets from the partner node before completing the next step in the process to determine whether the partner node has failed. Valid values for the HEARTBEAT_FREQUENCY option are 20 - 600 seconds. The default value is 30 seconds. The value for the HEARTBEAT_FREQUENCY option must be 10 seconds more than the value used for the NETWORK_FAILURE_DETECTION_TIME option.
The GRACE_PERIOD option is the additional time for which CAA waits after the value specified for the HEARTBEAT_FREQUENCY option. The default value of the GRACE_PERIOD option is 10 seconds.
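The constraints on HEARTBEAT_FREQUENCY described above can be checked in the same way before the command is issued. This is a minimal sketch under stated assumptions: the function name node_fdt_command is hypothetical, the "10 seconds more" rule is interpreted here as "at least 10 seconds more", and no range is enforced for GRACE_PERIOD because the text does not give one. The function only prints the clmgr command and does not run it.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not a PowerHA SystemMirror tool).
# Arguments: HEARTBEAT_FREQUENCY, GRACE_PERIOD, and the current
# NETWORK_FAILURE_DETECTION_TIME, all in seconds.
node_fdt_command() {
    hb=$1
    grace=$2
    nfdt=$3
    # All three values must be plain non-negative integers.
    for n in "$hb" "$grace" "$nfdt"; do
        case $n in
            ''|*[!0-9]*)
                echo "error: all values must be non-negative integers" >&2
                return 1 ;;
        esac
    done
    # Valid HEARTBEAT_FREQUENCY values are 20 - 600 seconds.
    if [ "$hb" -lt 20 ] || [ "$hb" -gt 600 ]; then
        echo "error: HEARTBEAT_FREQUENCY must be in the range 20 - 600" >&2
        return 1
    fi
    # Assumption: "10 seconds more" is read as "at least 10 seconds more".
    if [ "$hb" -lt $((nfdt + 10)) ]; then
        echo "error: HEARTBEAT_FREQUENCY must exceed NETWORK_FAILURE_DETECTION_TIME by 10 seconds" >&2
        return 1
    fi
    echo "clmgr modify cluster HEARTBEAT_FREQUENCY=$hb GRACE_PERIOD=$grace"
}

# Dry run with the documented defaults (30, 10) and a 20-second
# network failure detection time.
node_fdt_command 30 10 20
```

With the documented defaults, the heartbeat value of 30 seconds is exactly 10 seconds more than the default network failure detection time of 20 seconds, so the default configuration passes the check.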
- Enhanced event management
- CAA generates fine-grained storage and network events that PowerHA SystemMirror uses to make better decisions for high availability management.
- Manage storage across the nodes
- PowerHA SystemMirror uses the storage fencing capabilities of AIX for better storage management across the nodes in the cluster. The fencing capabilities are supported only for disks that are configured with native AIX Multipath I/O (MPIO). PowerHA SystemMirror manages shared disks through the enhanced concurrent volume management method.
Note: PowerHA SystemMirror attempts to use the CAA storage framework fencing capability to prevent access of shared disks by nodes that do not have access to the owning shared volume group. This fencing capability prevents data corruption because of inadvertent access to shared disks from multiple nodes. However, the CAA storage framework fencing capability is supported only for native AIX MPIO.