Red Hat Enterprise Linux High Availability Add-On
Follow this step-by-step procedure to set up a working Red Hat Enterprise Linux High Availability Add-On (Red Hat HA) cluster.
These steps are described in detail in the following sections.
- Install Red Hat HA
- Configure quorum
- Configure fencing
- Add fence levels
- Add highly available shared storage
- Install bastion
Install Red Hat HA
The following steps are performed on all OpenShift LPARs:
- Register the system and install Red Hat HA:
subscription-manager register --auto-attach
dnf config-manager --set-enabled rhel-8-for-s390x-highavailability-rpms
yum update -y
yum install -y pcs pacemaker fence-agents-all
- Add firewall exceptions:
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload
- Create the Linux user that is used by the cluster:
passwd hacluster
Note: The same password for each node is recommended.
- Enable and start the cluster controlling and configuration daemon:
systemctl enable pcsd.service --now
systemctl status pcsd.service
The following steps are performed on one OpenShift LPAR:
- Authenticate the nodes:
pcs host auth halp01 halp02 halp03
# Username: hacluster
# Password: ...
- Set up the cluster (replace the name my_cluster):
pcs cluster setup my_cluster halp01 halp02 halp03
- Start and enable the cluster service:
pcs cluster start --all
pcs cluster enable --all
The pcs CLI tool allows you to configure the cluster (Pacemaker + Corosync) and view the status.
Full cluster status:
pcs status --full
# Pacemaker xml configuration
pcs cluster cib
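In addition to pcs, you can query Corosync directly to check the status of the cluster communication links (an extra check, not part of the original steps):
# show the status of the Corosync links/rings on this node
corosync-cfgtool -s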
Quorum configuration
The following steps are performed on one OpenShift LPAR:
- With wait_for_all enabled, the whole cluster only becomes quorate (functional) for the first time when all cluster members are available:
pcs quorum update wait_for_all=1
Note: For example, if you start three LPARs consecutively without enabling wait_for_all, the last LPAR might be fenced by the two already available LPARs.
- The totem token timeout specifies the time in milliseconds until a token loss is reported:
pcs cluster config update totem token=5000
Note: For totem token limits, check out the Corosync support policies.
Check that the totem token timeout is activated:
corosync-cmapctl | grep "runtime.*totem.token "
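To confirm that wait_for_all is in effect, you can also display the quorum configuration and status, for example (additional checks, not part of the original steps):
pcs quorum config
# the WaitForAll flag should be listed:
corosync-quorumtool -s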
Fencing
For fencing, the following three levels are implemented:
- fence_ibmz (Level 2): The main fencing method is power fencing over the HMC. It provides solid fencing because it triggers the fencing action externally via the HMC API.
- fence_sbd (Level 3): When the HMC is not available, SBD is used as the backup fence agent. As a last resort, self-fencing is a reliable backup option that might take a bit longer but should take effect even in the worst cases. The poison pill is used to speed up this fencing method in some failure cases.
- fence_kdump (Level 1): Included for debugging purposes. When you take a kdump (either automatically or manually), you want to prevent the other fencing methods from triggering. This way the fencing is considered successful when an LPAR kdumps.
fence_ibmz
HMC prerequisites before setting up fence_ibmz
- For the fence_ibmz fence agent, an HMC user is needed with the following rights: Deactivate, Activate, Load, View Activation Profiles. These rights are listed under Summary -> Tasks:
Figure 1. Summary for the user4fencing user on the HMC.
At the bottom, Objects shows that the user is scoped to the cluster members (halp01, halp02, and halp03) only.
- Allow access to the Web Services management interface:
Figure 2. Allow access to Web Services management interfaces.
Keep in mind how the session timeouts are configured. The default values are usually very high, so they should not affect fencing actions.
Figure 3. Session timeouts
- In the activation profile of each LPAR you might want to set the option load during activation. This has the advantage of skipping the additional load task triggered by the fence agent. This option is required when you use SCSI disks (instead of DASDs); with DASDs it is optional.
Figure 4. Load during activation
Perform the following steps on all OpenShift LPARs:
- To reach the HMC in a secure way from the LPARs, the TLS root certificate and the intermediate CA certificates must be trusted.
The file CA_CERT.pem contains the root certificate and all intermediate CA certificates used in the chain of the HMC server certificate. You can view the chain with most web browsers. The format of the file is PEM.
Copy the CA certificate file to the appropriate location and update the certificate authority trust store:
cp CA_CERT.pem /etc/pki/ca-trust/source/anchors/
update-ca-trust
- Verify that the certificates are in the trust store:
trust list | less
Verify that the complete certificate chain to the HMC is trusted:
openssl s_client -showcerts -connect ${HMC_URL}:443 -verify_return_error < /dev/null
Install fence_ibmz
Perform the following steps on all OpenShift LPARs:
- fence_ibmz is installed from upstream because the fence_ibmz package is not available via the package manager at the time of writing:
Note: In newer RHEL versions, fence_ibmz might already be installed with the fence-agents-all package. In this case, skip to the verification step below.
dnf install -y wget
wget https://raw.githubusercontent.com/ClusterLabs/fence-agents/master/agents/ibmz/fence_ibmz.py
- Replace platform-specific variables in the fence_ibmz code:
sed -i 's+@PYTHON@+/usr/libexec/platform-python+' fence_ibmz.py
sed -i 's+@FENCEAGENTSLIBDIR@+/usr/share/fence+' fence_ibmz.py
- Copy fence_ibmz.py to a common location for fence agents and make the script executable:
cp fence_ibmz.py /usr/sbin/fence_ibmz
chmod +x /usr/sbin/fence_ibmz
- Verify that fence_ibmz is visible and installed:
pcs stonith list | grep fence_ibmz
Add fence_ibmz to the cluster
Perform on one OpenShift LPAR:
- Add the fence agent to the cluster:
pcs stonith create fence_ibmz fence_ibmz \
  ip=${HMC_URL} \
  username="${HMC_USER}" \
  password="${HMC_USER_PASSWORD}" \
  ssl_secure=true \
  pcmk_host_map="halp01:ZZ1/HALP01;halp02:ZZ1/HALP02;halp03:ZZ1/HALP03"
Note: In pcmk_host_map, the second value is case-sensitive! It is possible to hide the password by providing a password_script as described here (see the sketch after this list).
- If you decided to enable load during activation in the activation profile of each LPAR in the HMC (see the previous section), then set the load_on_activate attribute as well:
pcs stonith update fence_ibmz load_on_activate=true
- Add debug log output for verification:
pcs stonith update fence_ibmz verbose=1 debug_file=/tmp/fence_ibmz.log
Trigger fencing manually to verify the fence agent:
pcs stonith fence halp02
cat /tmp/fence_ibmz.log | less
Note: The stonith-timeout and stonith-action options might be ignored when triggering manual fencing as described here.
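As a minimal sketch of the password_script approach mentioned above (the script path /usr/local/bin/hmc-passwd.sh and the password file /etc/pacemaker/hmc-passwd are assumptions, not part of the original setup), the script only has to print the HMC password on stdout:
# hypothetical helper script that reads the password from a root-only file
cat > /usr/local/bin/hmc-passwd.sh <<'EOF'
#!/bin/sh
cat /etc/pacemaker/hmc-passwd
EOF
chmod 700 /usr/local/bin/hmc-passwd.sh
# let the fence agent call the script instead of storing the password in the cluster configuration
pcs stonith update fence_ibmz password_script=/usr/local/bin/hmc-passwd.sh password=""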
Further considerations (recommended)
The stonith-timeout property defines how long to wait until a STONITH action (for example, on or off) is completed; the default is 60 seconds. It can be overridden per fence agent with the pcmk_xxx_timeout options.
Because the deactivation process on the HMC can take up to 900 seconds (default), you can override the STONITH action timeout for the fence agent.
In the following code snippet, the timeout for each operation (off and on) is set to 905 seconds, and the reboot timeout (off followed by on) to 1810 seconds (perform on one OpenShift LPAR):
pcs stonith update fence_ibmz \
  pcmk_reboot_timeout=1810 \
  pcmk_off_timeout=905 \
  pcmk_on_timeout=905
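To see the currently effective cluster-wide default, you can list the cluster properties (a quick check, not part of the original procedure):
pcs property list --all | grep stonith-timeout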
fence_sbd
fence_sbd prerequisite - watchdog
Perform on all OpenShift LPARs:
- SBD uses watchdogs to monitor the systems. For IBM zSystems, the diag288_wdt hardware watchdog is preferred. To enable the watchdog, load the diag288_wdt kernel module:
modprobe diag288_wdt
- To verify that the watchdog is loaded, use the following command:
wdctl
- To load the kernel module across reboots, run:
echo "diag288_wdt" > /etc/modules-load.d/watchdog.conf
Note: The watchdog timeout cannot be lower than 15 seconds.
fence_sbd prerequisite - shared disk for poison pill
Perform on one OpenShift LPAR:
- For the poison pill communication, a shared disk is needed. In the following, a DASD disk is used.
Attach the DASD, format it, and create a partition (replace 0.0.0200 with the appropriate DASD of your environment):
# attach/enable the DASD
chzdev -e dasd 0.0.0200
# format the DASD
dasdfmt -b 4096 -d cdl -p /dev/disk/by-path/ccw-0.0.0200
# creates a partition over the entire disk automatically
fdasd -a /dev/disk/by-path/ccw-0.0.0200
- For SBD to work, a header is written to the previously created partition:
pcs stonith sbd device setup \
  --device=/dev/disk/by-path/ccw-0.0.0200-part1 \
  watchdog-timeout=15 \
  msgwait-timeout=30
- Verify the status of SBD by viewing the full status:
pcs stonith sbd status --full
- Enable the SBD daemon:
pcs stonith sbd enable \
  watchdog=/dev/watchdog \
  device=/dev/disk/by-path/ccw-0.0.0200-part1 \
  SBD_DELAY_START=60 SBD_WATCHDOG_TIMEOUT=15
Notes:
- SBD_* are environment variables for the SBD systemd service.
- SBD_WATCHDOG_TIMEOUT only applies when SBD runs in diskless mode. When disks are defined, the watchdog timeout written to the disk header is used.
- The diag288 watchdog minimum timeout is 15 seconds.
- SBD_DELAY_START postpones the start of the Pacemaker systemd daemon.
- SBD_DELAY_START should be longer than: corosync token timeout (5) + consensus timeout (6) + pcmk_delay_max (0) + msgwait (30) = 41 seconds. Otherwise, Pacemaker might start with exit code 100. You can read these values with the commands shown at the end of this section.
- To make the changes active, restart the cluster:
pcs cluster stop --all
pcs cluster start --all
- Show the default power_timeout, which indicates how long the fencing process waits before Pacemaker must be up again after fencing:
fence_sbd -o metadata | grep -A 2 power_timeout
- Create the SBD fence agent and set the power_timeout:
pcs stonith create fence_sbd fence_sbd \
  devices="/dev/disk/by-path/ccw-0.0.0200-part1" \
  power_timeout=45
- Show and verify the settings of the SBD fence agent:
pcs stonith config
- Make sure to test SBD fencing before continuing:
- Send messages:
sbd -d /dev/disk/by-path/ccw-0.0.0200-part1 message halp01 test
- Look at the log of the SBD systemd service:
journalctl -u sbd -f
- Send poison pills:
# make sure to disable other fencing methods first:
pcs stonith disable fence_ibmz
# trigger a test fence
pcs stonith fence halp01
pcs stonith enable fence_ibmz
The target system should reboot, and the Pacemaker systemd service should be delayed by the SBD systemd service by msgwait-timeout seconds.
- Further debugging options:
- Increase the SBD verbosity by adding a -v to the SBD_OPTS in /etc/sysconfig/sbd.
- Look at the systemd startup order:
systemd-analyze critical-chain
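As referenced in the notes above, the following is a quick way to read the values that go into the SBD_DELAY_START calculation (an additional sketch, not part of the original procedure; the device path matches the DASD example above):
# corosync token and consensus timeouts (milliseconds)
corosync-cmapctl | grep -E "totem\.(token|consensus)"
# msgwait and watchdog timeouts stored in the SBD disk header
sbd -d /dev/disk/by-path/ccw-0.0.0200-part1 dump
# pcmk_delay_max of the fence agent (empty output if it is not set)
pcs stonith config fence_ibmz | grep pcmk_delay_max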
fence_kdump
Perform on all OpenShift LPARs:
- Enable and start the kdump systemd service:
systemctl enable kdump --now
- Add the required firewall rule for kdump's communication to fence_kdump:
firewall-cmd --add-port=7410/udp --permanent
systemctl reload firewalld
systemctl restart firewalld
- Edit the kdump configuration /etc/kdump.conf:
- Set the port to send the kdump message to (see man fence_kdump_send) and set the interval to every 10 seconds (repeating forever):
fence_kdump_args -p 7410 -f auto -c 0 -i 10
- Add all hostnames (cluster members) to send the kdump message to:
fence_kdump_nodes halp01 halp02 halp03
The system that runs fence_kdump at that time receives the message.
- Restart the kdump service:
systemctl restart kdump
- Add the kdump fence agent to the cluster:
pcs stonith create fence_kdump fence_kdump \
  pcmk_reboot_action="off" \
  pcmk_host_list="halp01 halp02 halp03" \
  verbose=1
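As a quick sanity check (additional commands, not part of the original procedure), confirm the kdump.conf entries and the new fence agent:
grep "^fence_kdump" /etc/kdump.conf
pcs stonith config fence_kdump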
Add Fence Levels
Perform on one OpenShift LPAR:
- Use fence levels to set the order in which the fence agents are run. Fence levels are run from low to high. Adjust the regexp to match the LPAR names of your environment (halp01, halp02, halp03).
pcs stonith level add 1 regexp%halp[0-9] fence_kdump
pcs stonith level add 2 regexp%halp[0-9] fence_ibmz
pcs stonith level add 3 regexp%halp[0-9] fence_sbd
- Show and verify the fencing levels:
pcs stonith level
pcs stonith level verify
If you made a mistake, you can remove entire levels like this:
pcs stonith level remove 1
Additional configurations
- When your fencing is misconfigured, or the node still has healthy cluster communication (for example, if you are using fabric fencing), the node to be fenced is notified of its own fencing. In this case, the fence-reaction property decides what happens. panic is the safest choice and reboots the node.
pcs property set fence-reaction=panic
- To prevent multiple fencing operations in parallel, you can disable concurrent-fencing. In a 3-node cluster (that can only withstand the failure of one node), you might not need concurrent-fencing:
pcs property set concurrent-fencing=false
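You can review both properties afterwards (a quick check, not part of the original steps):
pcs property list | grep -E "fence-reaction|concurrent-fencing"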
Add highly available shared storage
Previously, you added the shared SCSI disks to the systems and created a partition on each of them. In this topic, the partitions are mounted on one OpenShift LPAR in an active/passive fashion.
Perform on one OpenShift LPAR:
- Before you can add the shared SCSI LUNs to the cluster, a logical volume is needed for each SCSI LUN. Repeat the following steps for each SCSI LUN that is used by a KVM guest (replace the WWID with the appropriate WWID of your environment); a scripted sketch follows at the end of this section:
pvcreate /dev/disk/by-id/scsi-3600507630bffc320000000000000e027-part1
vgcreate bastion-vg /dev/disk/by-id/scsi-3600507630bffc320000000000000e027-part1
lvcreate -n bastion-lv -l 100%FREE bastion-vg
mkfs.ext4 /dev/bastion-vg/bastion-lv
Repeat the above process for control[0-2], compute[0-1], and bootstrap.
Note: It is not required to make the bootstrap node LUN highly available because you only need it during installation. However, one advantage of adding it is the possibility to repurpose the bootstrap node as a compute node after installation.
- Create all mounting directories:
mkdir -p /mnt/control0-mnt/images
mkdir -p /mnt/control1-mnt/images
mkdir -p /mnt/control2-mnt/images
mkdir -p /mnt/compute0-mnt/images
mkdir -p /mnt/compute1-mnt/images
mkdir -p /mnt/bootstrap-mnt/images
mkdir -p /mnt/bastion-mnt/images
- Create the required Red Hat HA resources for each SCSI LUN. Perform the following steps for bastion, control 0-2, compute 0-1, and bootstrap (using the appropriate values of your environment):
pcs resource create bastion-lvm LVM-activate vgname=bastion-vg vg_access_mode=system_id --group bastion-group
pcs resource create bastion-fs Filesystem device="/dev/bastion-vg/bastion-lv" directory="/mnt/bastion-mnt" fstype="ext4" --group bastion-group
- Make sure the resource has no errors and is mounted on the LPAR you are currently working on. If the resource group is not currently mounted on this system, use the following command to move the resource group:
pcs resource move group-name destination-lpar
- Make sure libvirt has sufficient permissions for the new directories. You also need to manage SELinux. The man page suggests using the svirt_home_t SELinux type for custom directories. However, the virt_image_t type also works correctly. The following command creates a regex rule to label each file under the used mount directories with virt_image_t. The restorecon command triggers a relabeling of the files and directories recursively.
semanage fcontext --add -t virt_image_t -f f '/mnt(/[^/]*-mnt)(/.*)?'
restorecon -R -v '/mnt'
# if already customized, use force (-F):
# restorecon -F -R -v '/mnt'
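If you prefer to script the per-LUN preparation referenced above, a minimal sketch could look like the following. The WWIDs shown are hypothetical placeholders, and the naming scheme simply mirrors the bastion example; the same pattern also applies to the pcs resource create step.
#!/bin/bash
# placeholder WWIDs - replace them with the real WWIDs of your environment
declare -A luns=(
  [control0]=scsi-36005076300000000000000000000e001
  [compute0]=scsi-36005076300000000000000000000e002
)
for name in "${!luns[@]}"; do
  dev="/dev/disk/by-id/${luns[$name]}-part1"
  pvcreate "$dev"
  vgcreate "${name}-vg" "$dev"
  lvcreate -n "${name}-lv" -l 100%FREE "${name}-vg"
  mkfs.ext4 "/dev/${name}-vg/${name}-lv"
done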
Install bastion
Prepare bastion KVM guest
Perform on one OpenShift LPAR where all the storage from above is currently mounted:
- Create the qcow image required for the bastion installation under the mounted, highly available shared SCSI LUN:
qemu-img create -f qcow2 /mnt/bastion-mnt/images/bastion-disk.qcow2 100G
- Install RHEL 8.5 on the bastion disk (replace the values with those of your environment; a sketch of a minimal kickstart file follows at the end of this section):
virt-install --name bastion \
  --memory 8192 \
  --vcpus 4 \
  --disk /mnt/bastion-mnt/images/bastion-disk.qcow2 \
  --location http://redhat-dvd-hosting-server.corp/redhat/s390x/RHEL8.5-latest/DVD/ \
  --os-variant "rhel8.5" \
  --network "network=default,mac=02:9B:16:BB:BB:BB" \
  --network "network=macvtap-10-bridge,mac=02:9B:17:BB:BB:BB" \
  --initrd-inject "/root/bastion.ks" \
  --extra-args "inst.ks=file:///bastion.ks" \
  --noautoconsole
- Shut down the bastion guest and dump the XML guest description to the highly available SCSI disk:
virsh shutdown bastion
virsh dumpxml bastion > /mnt/bastion-mnt/bastion-guest.xml
virsh destroy bastion
virsh undefine bastion
- To make sure the SELinux labels are correct, relabel everything under /mnt:
restorecon -R -v '/mnt'
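The content of /root/bastion.ks is not part of the original procedure; as a rough sketch (all values here are assumptions that you must adapt, including the placeholder root password), a minimal kickstart could be created like this:
# hypothetical minimal kickstart for the bastion guest - adjust before use
cat > /root/bastion.ks <<'EOF'
text
lang en_US.UTF-8
keyboard us
timezone Etc/UTC
rootpw --plaintext changeme
network --bootproto=dhcp
zerombr
clearpart --all --initlabel
autopart
reboot
%packages
@^minimal-environment
%end
EOF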
Add the KVM guest as a cluster resource
- Before adding the KVM guest as a resource to the cluster, make sure all OpenShift LPARs can reach each other via SSH, for example (perform on all OpenShift LPARs):
ssh-keygen
ssh-copy-id root@halp01
ssh-copy-id root@halp02
ssh-copy-id root@halp03
Note: This is a prerequisite for the VirtualDomain resource option migration_transport=ssh (see the next step). This option means that when a KVM guest is moved to a different LPAR, SSH is used for communication.
- Add the KVM guest as a VirtualDomain resource to the high availability cluster by performing the following step on the OpenShift LPAR that has all the storage from above attached:
pcs resource create bastion-guest \
  ocf:heartbeat:VirtualDomain \
  config="/mnt/bastion-mnt/bastion-guest.xml" \
  hypervisor="qemu:///system" \
  migration_transport="ssh" \
  meta allow-migrate=false \
  --group bastion-group
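To verify that the guest can fail over, you could, for example, move the whole resource group to another LPAR and check the status (an additional test, not part of the original procedure; halp02 is just an example target). Because allow-migrate=false is set, the move is a stop/start rather than a live migration.
pcs resource move bastion-group halp02
pcs resource status
# on halp02, the bastion guest should now be running:
virsh list
# remove the location constraint created by the move:
pcs resource clear bastion-group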
Congratulations, you just made your first KVM guest highly available.