Most servers in the Power Systems range since POWER7 allow HMC-organized online hot replacement of:
- Cooling fans
- Power supplies
And the online hot adding, removing, or replacement of:
- Internal hard disks
- PCIe adapters.
On POWER7, there is no online add or remove of a CEC, nor of processor modules or memory cards. Online CPU and memory activations that use Capacity Upgrade on Demand are still an excellent option.
On POWER8 and POWER9, we don't currently have the ability to hot add or remove I/O drawers online. The direction we are moving in, for these core physical configuration changes or for some simpler things like major VIOS updates, is to use Server Evacuation. At the end of 2014, something like 65% of customers with POWER6 and later hardware use Live Partition Mobility (LPM). All the virtual machines (LPARs) can be moved to alternative Power servers, and then the IBM service representative (I still call them a customer engineer) or the VIOS admin team has complete control of the server with zero risk to production workloads.
PowerVC 1.2.2 has a Server Evacuation feature but no feature to return the virtual machines to their "home" server. There is a Lab Services offering: an advanced tool to organize this LPAR Evacuation and Return. I saw a demo at the Technical University in Las Vegas and it was impressive. To gain access to the tool, clients can purchase a week or two of Lab Services time to install it, configure it for their environment, and train on its use. For more on this tool:
- To find an overview and information, Google for: "Lab Services offering LPAR Automation Evacuation tool"
- Ignore the advertisements at the top; below them are websites and links to YouTube videos.
For smaller teams and smaller computer rooms, you might want something simpler. I wanted to see what I could do via the HMC command line interface to Evacuate and Return virtual machines. Many sites run their servers in small sets of five or six machines. A "six pack" is a nice concept, but here it is for High Availability and Maintenance reasons: for example, five active servers and one spare. When a disastrous crash of a whole server occurs, all its virtual machines can be moved to the empty spare server. The spare also helps with maintenance periods and unexpected massive peaks in workload. These server groups were called pods for many years. Now it seems the pod name is used for containers and Kubernetes.
So let us assume you want to evacuate one server to one other server, which means one source server and one target server. That is simple:
![evacuate one VM](/support/pages/system/files/inline-images/Evacuate_one.jpg)
Alternatively, evacuate to a number of target servers, which means evacuating one source server to many target servers. The later return is then from many source servers to one target server, which is tricky:
![evacuate many at one time](/support/pages/system/files/inline-images/Evacuate_many.jpg)
I am not a regular user of the HMC command line interface (CLI). I try to stick to the GUI, like most admin users. Gareth Coates covers that and shares his experience in his "Hints and Tips of the Power M??????s" session, often given at Technical Universities and Virtual User Groups. But I wanted to have a go and so remember how it all works.
To use the HMC command line, you can log on by using ssh to get a terminal session on the HMC, or run commands remotely (normally from AIX or Linux), which works best, assuming you have exchanged SSH keys. Setting up the SSH keys is not covered here, but you can find out how to do it in other places on the web or in the HMC Redbook.
- One example, from the IBM Knowledge Center: Setting up secure script execution between SSH clients and the HMC
- One point that catches me is the frustrating mkauthkeys command. If it fails, then it does not state the command option that is the problem. It just says "no"!
- It has two options for adding the key: -a and --add (note the double "-"). If you don't spot the double dash, it just gives the usage with no explanation whatsoever! ("It's a UNIX command, Jim, but not as we know it!")
- The web documentation does not show a worked example. It is unclear whether you need the entire id_dsa.pub file content or just the encoded key part. The answer is that you need the whole file content, including spaces. Note: wrap it in single quotation marks.
- That's enough grumpy comments for now :-)
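Since the documentation lacks a worked example, here is a minimal sketch of the point above: the whole public key file goes inside single quotes after --add. The function name build_mkauthkeys_cmd is hypothetical, and it only prints the command so you can inspect it before running anything on the HMC.

```shell
#!/bin/sh
# Hypothetical helper (not an HMC tool): print the mkauthkeys command
# for a given public key file. Nothing is sent to the HMC here.
build_mkauthkeys_cmd() {
    keyfile=$1
    [ -r "$keyfile" ] || { echo "cannot read $keyfile" >&2; return 1; }
    # The whole file content (key type, key data, comment) is needed,
    # spaces included, and it must sit inside single quotes.
    key=$(cat "$keyfile")
    printf "mkauthkeys --add '%s'\n" "$key"
}
```

You could then run it with something like ssh nag@hmc13 "$(build_mkauthkeys_cmd ~/.ssh/id_rsa.pub)", where the key file path is an assumption about your setup. A key comment that itself contains a single quotation mark would break the quoting.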
So the first attempt was to get a list of the machines connected to my HMC called hmc13:
Test connectivity:
$ ssh nag@hmc13 date
Thu Dec 18 13:40:43 GMT 2014
$
List servers the HMC knows about:
$ ssh nag@hmc13 lssyscfg -r sys -F name
red-8203-E4A-SN10E0A41
lime-8284-22A-SN215296V
silver-8203-SN10E0A31
ruby-9119-MME-SN108D2C7
emerald-8286-42A-SN100EC7V
indigo-8231-E1C-SN0659FDR
gold-8203-SN10E0A11
orange-8203-E4A-SN10E0A51
bronze-8203-E4A-SN10E0A21
$
We can list the virtual machines on one server with:
$ ssh nag@hmc13 lssyscfg -r lpar -F name -m orange-8203-E4A-SN10E0A51
vm150-db5172e2-000000dd
vm61_SLES113-417ce624-00000014
vm52-59380644-0000002d
vm97-41097a95-0000002b
orangevios1 SSP4
$
We can also include the operating system type:
$ ssh nag@hmc13 lssyscfg -r lpar -F "name:os_version" -m orange-8203-E4A-SN10E0A51
vm150-db5172e2-000000dd:"AIX 7.1 7100-03-03-1415"
vm61_SLES113-417ce624-00000014:"Linux/SuSE 3.0.76-0.11-ppc64 11"
vm52-59380644-0000002d:Unknown
vm97-41097a95-0000002b:Unknown
"orangevios1 SSP4" "VIOS 2.2.3.3"
We can match virtual machines to servers by using the list of server names to drive a "while loop" that fetches each server's virtual machines. This script relies on the fact that the HMC runs Linux and a Bash shell. The script and its output are shown in the following example:
$ ssh nag@hmc13 'lssyscfg -r sys -F name | \
while read SYSTEM ; \
do \
lssyscfg -r lpar -F name -m $SYSTEM | \
while read LPAR ; \
do \
echo $SYSTEM $LPAR ; \
done ; \
done;'
red-8203-E4A-SN10E0A41 vm87-78c5a8ba-0000006c
red-8203-E4A-SN10E0A41 vm89-de4be4cd-00000067
red-8203-E4A-SN10E0A41 redvios1 SSP4
lime-8284-22A-SN215296V No results were found. <-- new machine
silver-8203-SN10E0A31 vm63-hand-made-AIX733
silver-8203-SN10E0A31 vm92-9f7591f7-00000041
silver-8203-SN10E0A31 m53-801c9749-0000002e
silver-8203-SN10E0A31 silvervios1-SSP4
silver-8203-SN10E0A31 vm82-c2953634-00000025
ruby-9119-MME-SN108D2C7 ruby10_OpenSUSE
ruby-9119-MME-SN108D2C7 ruby35-Ubuntu1410
ruby-9119-MME-SN108D2C7 ruby34-SLES113
ruby-9119-MME-SN108D2C7 vm29-RHEL65-PowerVC-121
ruby-9119-MME-SN108D2C7 vm20-SLES-11.3
. . .
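If you would rather not depend on the shell on the HMC, the same nested loop can run on your AIX or Linux machine instead. This is a minimal sketch, assuming key-based ssh login already works. The function name list_lpars_by_system and the HMC variable are my own illustration. It makes one ssh connection per server, so it is slower than running the loop on the HMC.

```shell
#!/bin/sh
# User and HMC name from this article; override with HMC=user@host.
HMC=${HMC:-nag@hmc13}

# Hypothetical helper: print "server lpar" pairs, one per line.
list_lpars_by_system() {
    ssh "$HMC" lssyscfg -r sys -F name </dev/null |
    while read -r system; do
        # </dev/null stops the inner ssh from swallowing the rest of
        # the server list that the outer loop is still reading.
        ssh "$HMC" lssyscfg -r lpar -F name -m "$system" </dev/null |
        while read -r lpar; do
            printf '%s %s\n' "$system" "$lpar"
        done
    done
}
```

Calling list_lpars_by_system with no arguments prints the same kind of list as the example above.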
We now have the functions we need to create an Evacuation script. Let us assume we want to Evacuate the "orange" machine to the "ruby" machine.
We need LPM commands that look like this:
1 To LPM Validate (-o v) the command is:
migrlpar -o v -m source -t target -p LPARname
2 To LPM Migrate (-o m) the command is:
migrlpar -o m -m source -t target -p LPARname
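The two commands combine naturally: validate first and migrate only if the validation passes. The following is a minimal sketch, assuming that migrlpar returns a non-zero exit status when validation fails (ssh passes the remote exit status back). The function name evacuate_lpar and the HMC variable are hypothetical.

```shell
#!/bin/sh
# User and HMC name from this article; override with HMC=user@host.
HMC=${HMC:-nag@hmc13}

# Hypothetical wrapper: evacuate_lpar <source> <target> <LPARname>
evacuate_lpar() {
    source_sys=$1 target_sys=$2 lpar=$3
    # Step 1: LPM Validate (-o v). Stop here if it reports a problem.
    if ! ssh "$HMC" migrlpar -o v -m "$source_sys" -t "$target_sys" -p "$lpar" </dev/null; then
        echo "validation failed for $lpar, not migrating" >&2
        return 1
    fi
    # Step 2: LPM Migrate (-o m).
    ssh "$HMC" migrlpar -o m -m "$source_sys" -t "$target_sys" -p "$lpar" </dev/null
}
```

For example: evacuate_lpar orange-8203-E4A-SN10E0A51 ruby-9119-MME-SN108D2C7 vm52-59380644-0000002d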
So let us write the Evacuation script
1) Save the possible targets at the top of the script (adding a hash at the beginning of each line so it is a Bash comment):
ssh nag@hmc13 lssyscfg -r sys -F name | sed 's/^/# /' > orange_evac
2) Define the source and target by appending them to the end of the script:
echo \# >> orange_evac
echo SOURCE=orange-8203-E4A-SN10E0A51 >> orange_evac
echo TARGET=ruby-9119-MME-SN108D2C7 >> orange_evac
echo \# >> orange_evac
3) Add the migrlpar commands to the script, but we want to remove the VIOS servers from the list. A VIOS is never moved.
ssh nag@hmc13 lssyscfg -r lpar -F "name:os_version" -m orange-8203-E4A-SN10E0A51 | \
grep -v VIOS | \
awk -F : '{ print "ssh nag@hmc13 migrlpar -o m -m $SOURCE -t $TARGET -p " $1 }' \
>> orange_evac
Our script now looks like this (file orange_evac):
# red-8203-E4A-SN10E0A41
# lime-8284-22A-SN215296V
# silver-8203-SN10E0A31
# ruby-9119-MME-SN108D2C7
# emerald-8286-42A-SN100EC7V
# indigo-8231-E1C-SN0659FDR
# gold-8203-SN10E0A11
# orange-8203-E4A-SN10E0A51
# bronze-8203-E4A-SN10E0A21
#
SOURCE=orange-8203-E4A-SN10E0A51
TARGET=ruby-9119-MME-SN108D2C7
#
ssh nag@hmc13 migrlpar -o m -m $SOURCE -t $TARGET -p vm150-db5172e2-000000dd
ssh nag@hmc13 migrlpar -o m -m $SOURCE -t $TARGET -p vm61_SLES113-417ce624-00000014
ssh nag@hmc13 migrlpar -o m -m $SOURCE -t $TARGET -p vm52-59380644-0000002d
ssh nag@hmc13 migrlpar -o m -m $SOURCE -t $TARGET -p vm97-41097a95-0000002b
We now have the script to evacuate all virtual machines from one server to another single server.
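Steps 2 and 3 can be gathered into one reusable function. This is a minimal sketch and not part of the original procedure: the name make_evac_script is hypothetical, and it reads the "name:os_version" lines on standard input so that it can be tried on saved lssyscfg output without touching the HMC. It skips the commented list of targets from step 1.

```shell
#!/bin/sh
# Hypothetical generator: make_evac_script <source> <target> [user@hmc]
# Reads "name:os_version" lines on stdin, as produced by
#   lssyscfg -r lpar -F "name:os_version" -m <source>
# and writes the evacuation script to stdout.
make_evac_script() {
    source_sys=$1 target_sys=$2 hmc=${3:-nag@hmc13}
    echo "#"
    echo "SOURCE=$source_sys"
    echo "TARGET=$target_sys"
    echo "#"
    # Same filter as "grep -v VIOS": a VIOS is never moved.
    # $SOURCE and $TARGET stay literal so the script can be edited later.
    awk -F : -v hmc="$hmc" '
        !/VIOS/ {
            print "ssh " hmc " migrlpar -o m -m $SOURCE -t $TARGET -p " $1
        }'
}
```

For example: ssh nag@hmc13 lssyscfg -r lpar -F "name:os_version" -m orange-8203-E4A-SN10E0A51 | make_evac_script orange-8203-E4A-SN10E0A51 ruby-9119-MME-SN108D2C7 > orange_evac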
If you want to spread the virtual machines to different servers, you can edit this script and replace the "$TARGET" with a specific other server from the list at the top.
If you want to run the LPM validate check, then replace "-o m" with "-o v".
4) Run the script but watch the output carefully. It is assumed that all virtual machines are LPM-ready.
You can save the output to check that the migrations all worked OK.
You can run up to 16 concurrent LPMs on POWER8 with HMC 8 and later; lower limits apply to older servers.
If a migration generates no errors, then the command does not return any details, which is a little spooky! Warnings are not reported.
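Because a successful migration is silent, it can help to print a line for every command, whether it worked or not. The following is a minimal sketch, assuming that a failed migrlpar gives a non-zero exit status. The function name report is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command and say whether it worked.
# Usage: report ssh nag@hmc13 migrlpar -o m -m ... -t ... -p ...
report() {
    if "$@" </dev/null; then
        echo "OK: $*"
    else
        # In the else branch, $? still holds the exit status of the command.
        echo "FAILED (exit $?): $*"
    fi
}
```

To use it, put the function at the top of orange_evac and prefix each migration line with "report", for example with sed 's/^ssh /report ssh /'.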
5) After your maintenance or hardware additions are complete, the virtual machines need to be returned. Edit the orange_evac script by simply swapping the "-m" and "-t" flags. Even if you sent some virtual machines to other servers, this amended script works fine.
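The swap can also be done with sed rather than by hand. This is a minimal sketch: the function name make_return_script and the output file name orange_return are hypothetical. It swaps the values that follow "-m" and "-t", which has the same effect as swapping the flags, and it also handles lines where you replaced "$TARGET" with a specific server name.

```shell
#!/bin/sh
# Hypothetical filter: turn an evacuation script (stdin) into a
# return script (stdout). Lines without "-m x -t y" pass through unchanged.
make_return_script() {
    sed 's/ -m \([^ ][^ ]*\) -t \([^ ][^ ]*\) / -m \2 -t \1 /'
}
```

For example: make_return_script < orange_evac > orange_return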
Well, that is enough for one blog. Happy HMC CLI LPM scripting, and Evacuations + Returns on the cheap.
There are other more automated options like PowerVC and the Lab Services tools but this simple generated script is a useful option in your toolbox.