IBM Support

Experiments in Power Server Evacuation via LPM & HMC Command Line

How To


Summary

So you need to power off a server for some hardware maintenance or CPU / memory upgrades. Live Partition Mobility is the answer, and if you have fewer than half a dozen virtual machines it can be done by hand. More than that and you need to get organized. Here is how to do that.

Steps

Most of the Power Systems range of servers since POWER7 allow for the HMC-organized online hot replacement of:

  • Cooling fans
  • Power supplies

And the online hot adding, removing, or replacement of:

  • Internal hard disks
  • PCIe adapters.

On POWER7, we are not going to get online add or remove of a CEC, nor online adding or removing of processor modules or memory cards.  Online CPU and memory activations that use Capacity Upgrade on Demand are still an excellent option.

On POWER8 and POWER9, we don't currently have the ability to hot add or remove I/O drawers online.  For these core physical configuration changes, and for some simpler things like major VIOS updates, the direction we are moving in is Server Evacuation. At the end of 2014, something like 65% of customers with POWER6 and later hardware were using Live Partition Mobility (LPM).  All the virtual machines (LPARs) can be moved to alternative Power servers, and then the IBM service representative (I still call them a customer engineer) or the VIOS server admin team has complete control of the server with zero risk to production workloads.

PowerVC 1.2.2 has a Server Evacuation feature but no feature to return the virtual machines to their "home" server.  There is a Lab Services offering: an advanced tool to organize this LPAR Evacuation and Return.  I saw a demo at the Technical University in Las Vegas and it was impressive.  To gain access to the tool, clients can purchase a week or two of Lab Services time to install it, configure it to your environment, and train on its use.  For more on this tool:

  • To find an overview and information, Google for: "Lab Services offering LPAR Automation Evacuation tool"
  • Ignore the advertisements at the top and then, there are websites and links to YouTube videos.

For smaller teams and smaller computer rooms, you might want something simpler.  I wanted to see what I could do via the HMC command line interface to Evacuate and Return virtual machines.  Many sites run their servers in small sets of five or six machines.  A "six pack" is a nice concept, but here it is for High Availability and Maintenance reasons: for example, five active servers and one spare. When a disastrous crash of a whole server occurs, all its virtual machines can be moved to the empty spare server. It also helps out with maintenance periods and unexpected massive peaks in workload.  These server groups were called pods for many years. Now it seems the pod name is used for containers and Kubernetes.

So let us assume you want to evacuate one server to one other server. That means one source server and one target server, which is simple:

[Figure: evacuate one VM]

Alternatively, evacuate to a number of target servers. That means evacuating one source server to many target servers. Later, the return is from many source servers to one target server, which is tricky:

[Figure: evacuate many at one time]

I am not a regular user of the HMC command line interface (CLI).  I try to stick to the GUI like most admin users.  Gareth Coates covers the CLI and shares his experience in his "Hints and Tips of the Power M??????s" session, often given at Technical Universities and Virtual User Groups. But I wanted to have a go and so remind myself how it all works.

To use the HMC command line, you can log on by using ssh to get a terminal session on the HMC or run commands remotely (normally from AIX or Linux), which works best, assuming you have exchanged the security certificates (SSH keys).  Setting up Secure Shell keys is not covered here, but you can find out how to set them up in other places on the web or in the HMC Redbook.

  • One example, from the IBM Knowledge Center:  Setting up secure script execution between SSH clients and the HMC
  • One point that catches me is the frustrating mkauthkeys command.  If it fails, then it does not state the command option that is the problem.  It just says "no"!
  • It has two options for adding the key: -a and --add (note the double "-").  If you don't spot the double, it just gives the usage with no explanation whatsoever!  "It's a UNIX command, Jim, but not as we know it!"
  • The web documentation does not show a worked example.  It is unclear whether you need the entire id_dsa.pub file content or just the Hexadecimal part. The answer is that you need the whole file content, including spaces. Note the single quotation marks around the key.
  • That's enough grumpy comments for now :-)
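
A minimal sketch of the key exchange grumbled about above. I am assuming an OpenSSH client, a public key that already exists (the ~/.ssh/id_rsa.pub path in the usage comment is only an example), and the same HMC user and host used in the rest of this article. The helper name build_mkauthkeys_cmd is made up for illustration. It only prints the remote command so you can check the quoting before anything is sent to the HMC:

```shell
#!/bin/sh
# Hypothetical helper: build the mkauthkeys command for a public key file.
# It prints the command instead of running it.
build_mkauthkeys_cmd() {
    keyfile="$1"
    # Use the whole file content: key type, key data and comment, spaces included.
    key=$(cat "$keyfile")
    # Single quotes keep the key together as one argument on the HMC side.
    printf "mkauthkeys --add '%s'\n" "$key"
}

# Usage (expect a password prompt, as the key is not in place yet):
#   ssh nag@hmc13 "$(build_mkauthkeys_cmd ~/.ssh/id_rsa.pub)"
```

Once the key is accepted, a simple remote command such as date should run without a password prompt.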

So the first attempt was to get a list of the machines connected to my HMC called hmc13:

Test connectivity:

  $ ssh nag@hmc13 date  
Thu Dec 18 13:40:43 GMT 2014  
$

List servers the HMC knows about:

  $ ssh nag@hmc13 lssyscfg -r sys -F name  
red-8203-E4A-SN10E0A41  
lime-8284-22A-SN215296V  
silver-8203-SN10E0A31  
ruby-9119-MME-SN108D2C7  
emerald-8286-42A-SN100EC7V  
indigo-8231-E1C-SN0659FDR  
gold-8203-SN10E0A11  
orange-8203-E4A-SN10E0A51  
bronze-8203-E4A-SN10E0A21  
$

We can list the virtual machines on one server with:

  $  ssh nag@hmc13 lssyscfg -r lpar -F name -m  orange-8203-E4A-SN10E0A51  
vm150-db5172e2-000000dd  
vm61_SLES113-417ce624-00000014  
vm52-59380644-0000002d  
vm97-41097a95-0000002b  
orangevios1 SSP4  
$

We can also include the operating system type:

  $ ssh nag@hmc13 lssyscfg -r lpar -F "name:os_version" -m  orange-8203-E4A-SN10E0A51  
vm150-db5172e2-000000dd:"AIX 7.1 7100-03-03-1415"  
vm61_SLES113-417ce624-00000014:"Linux/SuSE 3.0.76-0.11-ppc64 11"  
vm52-59380644-0000002d:Unknown  
vm97-41097a95-0000002b:Unknown  
"orangevios1 SSP4" "VIOS 2.2.3.3"

We can match servers to their virtual machines by using the server name list to drive a "while loop" that fetches the virtual machines of each server.  This script relies on the fact that the HMC runs Linux with a Bash shell.  The script and its output are shown in the following example:

  $ ssh nag@hmc13 'lssyscfg -r sys -F name | \
while read SYSTEM ; \
do \
lssyscfg -r lpar -F name -m $SYSTEM | \
while read LPAR ; \
do \
echo $SYSTEM $LPAR ; \
done  ; \
done;'

red-8203-E4A-SN10E0A41 vm87-78c5a8ba-0000006c  
red-8203-E4A-SN10E0A41 vm89-de4be4cd-00000067  
red-8203-E4A-SN10E0A41 redvios1 SSP4  
lime-8284-22A-SN215296V No results were found.     <-- new machine  
silver-8203-SN10E0A31 vm63-hand-made-AIX733  
silver-8203-SN10E0A31 vm92-9f7591f7-00000041  
silver-8203-SN10E0A31 m53-801c9749-0000002e  
silver-8203-SN10E0A31 silvervios1-SSP4  
silver-8203-SN10E0A31 vm82-c2953634-00000025  
ruby-9119-MME-SN108D2C7 ruby10_OpenSUSE  
ruby-9119-MME-SN108D2C7 ruby35-Ubuntu1410  
ruby-9119-MME-SN108D2C7 ruby34-SLES113  
ruby-9119-MME-SN108D2C7 vm29-RHEL65-PowerVC-121  
ruby-9119-MME-SN108D2C7 vm20-SLES-11.3  
. . .

We now have the building blocks we need to create an Evacuation script. Let us assume we want to Evacuate the "orange" machine to the "ruby" machine.

We need LPM commands that look like this:

  1 To LPM Validate (-o v) the command is:
  migrlpar -o v -m source -t target -p LPARname
  2 To LPM Migrate (-o m) the command is:
  migrlpar -o m -m source -t target -p LPARname
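
The two commands pair up naturally: validate first, migrate only if the validation passes. Here is a rough sketch of that idea. The function name lpm_move and the HMC_RUN variable are my own inventions, and I am assuming that migrlpar exits non-zero when a validation fails and that ssh hands that exit status back to the caller, so check that on your HMC level before trusting it:

```shell
#!/bin/sh
# Hypothetical wrapper: run the LPM validation and migrate only if it passes.
# HMC_RUN is the command prefix used to reach the HMC.
# Set HMC_RUN="echo" for a dry run that only prints the commands.
HMC_RUN="${HMC_RUN:-ssh nag@hmc13}"

lpm_move() {
    src="$1"; tgt="$2"; lpar="$3"
    # Step 1: validate (-o v). A non-zero exit status is treated as a failure.
    if ! $HMC_RUN migrlpar -o v -m "$src" -t "$tgt" -p "$lpar"; then
        echo "Validation failed for $lpar, not migrating" >&2
        return 1
    fi
    # Step 2: migrate (-o m).
    $HMC_RUN migrlpar -o m -m "$src" -t "$tgt" -p "$lpar"
}

# Usage:
#   lpm_move orange-8203-E4A-SN10E0A51 ruby-9119-MME-SN108D2C7 vm52-59380644-0000002d
```

As written, this does not handle LPAR names that contain spaces, because ssh joins the arguments into one string for the remote shell.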

So let us write the Evacuation script

1) Save the possible targets at the top of the script (adding a hash at the beginning of each line so it is a Bash comment):

  ssh nag@hmc13 lssyscfg -r sys -F name | sed 's/^/# /'  > orange_evac

2) Define the source and target in the script by appending to the end:

  echo \# >> orange_evac  
echo SOURCE=orange-8203-E4A-SN10E0A51 >> orange_evac  
echo TARGET=ruby-9119-MME-SN108D2C7 >> orange_evac  
echo \# >> orange_evac

3) Add the migrlpar commands to the script, but we want to remove the VIOS servers from the list. A VIOS is never moved.

  ssh nag@hmc13 lssyscfg -r lpar -F "name:os_version" -m  orange-8203-E4A-SN10E0A51 | \
      grep -v VIOS | \
      awk -F : '{ print "ssh nag@hmc13 migrlpar -o m -m $SOURCE -t $TARGET -p " $1  }' \
>> orange_evac
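
If you do this for more than one server, steps 1 to 3 can be wrapped in a function so the source and target become parameters. This is only a sketch: the names hmc and make_evac_script are hypothetical, and the commands inside are the same ones used in the steps above:

```shell
#!/bin/sh
# Hypothetical helper: the single place that knows how to reach the HMC.
hmc() {
    ssh nag@hmc13 "$@"
}

# Hypothetical generator: writes an evacuation script to stdout.
make_evac_script() {
    source_sys="$1"; target_sys="$2"
    # Step 1: list the possible targets as comments.
    hmc lssyscfg -r sys -F name | sed 's/^/# /'
    # Step 2: define the source and target.
    echo "#"
    echo "SOURCE=$source_sys"
    echo "TARGET=$target_sys"
    echo "#"
    # Step 3: one migrlpar line per LPAR, skipping any VIOS.
    hmc lssyscfg -r lpar -F name:os_version -m "$source_sys" |
        grep -v VIOS |
        awk -F : '{ print "ssh nag@hmc13 migrlpar -o m -m $SOURCE -t $TARGET -p " $1 }'
}

# Usage:
#   make_evac_script orange-8203-E4A-SN10E0A51 ruby-9119-MME-SN108D2C7 > orange_evac
```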

Our script now looks like this in the file orange_evac:

  # red-8203-E4A-SN10E0A41  
# lime-8284-22A-SN215296V  
# silver-8203-SN10E0A31  
# ruby-9119-MME-SN108D2C7  
# emerald-8286-42A-SN100EC7V  
# indigo-8231-E1C-SN0659FDR  
# gold-8203-SN10E0A11  
# orange-8203-E4A-SN10E0A51  
# bronze-8203-E4A-SN10E0A21  
#  
SOURCE=orange-8203-E4A-SN10E0A51  
TARGET=ruby-9119-MME-SN108D2C7  
#  
ssh nag@hmc13 migrlpar -o m -m $SOURCE -t $TARGET -p vm150-db5172e2-000000dd  
ssh nag@hmc13 migrlpar -o m -m $SOURCE -t $TARGET -p vm61_SLES113-417ce624-00000014  
ssh nag@hmc13 migrlpar -o m -m $SOURCE -t $TARGET -p vm52-59380644-0000002d  
ssh nag@hmc13 migrlpar -o m -m $SOURCE -t $TARGET -p vm97-41097a95-0000002b

We now have the script to evacuate all virtual machines from one server to another single server.

If you want to spread the virtual machines to different servers, you can edit this script and replace the "$TARGET" with a specific other server from the list at the top.
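
Editing by hand is fine for a few virtual machines. For more, here is a sketch that hands out targets in turn (round-robin). The function name spread_targets is hypothetical. It takes no account of the free CPU or memory on each target, so treat the result as a first draft to review:

```shell
#!/bin/sh
# Hypothetical helper: replace $TARGET on each migrlpar line with a server name,
# cycling through the server names given as arguments.
# Reads the evacuation script on stdin, writes the edited script to stdout.
spread_targets() {
    awk -v targets="$*" '
        BEGIN { n = split(targets, t, " ") }
        / migrlpar / && n > 0 {
            # Pick the next target in turn and put it in place of the literal $TARGET.
            sub(/\$TARGET/, t[(i++ % n) + 1])
        }
        { print }
    '
}

# Usage:
#   spread_targets ruby-9119-MME-SN108D2C7 gold-8203-SN10E0A11 < orange_evac > orange_evac_spread
```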

If you want to run the LPM validate check, then replace "-o m" with "-o v".
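
Rather than editing the file, a validate-only copy can be generated with sed. A small sketch, with the function name to_validate and the output file name orange_validate made up for the example:

```shell
#!/bin/sh
# Hypothetical helper: turn every migrate (-o m) into a validate (-o v).
# Reads the evacuation script on stdin, writes the validate script to stdout.
to_validate() {
    sed 's/ migrlpar -o m / migrlpar -o v /'
}

# Usage:
#   to_validate < orange_evac > orange_validate
#   sh orange_validate
```

Keeping the original file untouched means the migrate script that was validated is exactly the one that gets run afterwards.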

4) Run the script but watch the output carefully.  It is assumed that all virtual machines are LPM-ready.

You can save the output to check that the migrations all worked OK.

You could run up to 16 concurrent LPMs on POWER8 and HMC 8+. Lower limits apply to older servers.

If the script generates no errors, then the command does not return any details - which is a little spooky!  Warnings are not reported.
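
Because silence is the only sign of success, it helps to record the exit status of each command. The following is a sketch of one way to do that. The name run_logged is hypothetical, and it assumes that a failed migrlpar gives a non-zero exit status through ssh, which I have not checked on every HMC level:

```shell
#!/bin/sh
# Hypothetical runner: executes an evacuation script line by line and logs
# the output and exit status of every migrlpar command.
run_logged() {
    script="$1"; log="$2"
    : > "$log"
    failures=0
    # Read the script on file descriptor 3 so ssh cannot swallow the remaining lines.
    while IFS= read -r line <&3; do
        case "$line" in
            \#*|'') ;;                       # skip comments and blank lines
            *migrlpar*)
                if eval "$line" >> "$log" 2>&1; then status=0; else status=$?; fi
                echo "exit=$status : $line" >> "$log"
                [ "$status" -eq 0 ] || failures=$((failures + 1))
                ;;
            *) eval "$line" ;;               # the SOURCE= and TARGET= assignments
        esac
    done 3< "$script"
    echo "$failures migration(s) failed, details in $log"
    [ "$failures" -eq 0 ]
}

# Usage:
#   run_logged orange_evac orange_evac.log
```

The migrations still run one after the other here, so a long list takes a while.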

5) After your maintenance or hardware additions are complete, the virtual machines need to be returned.  Edit the orange_evac script by simply swapping the "-m" and "-t" flags.  Even if you sent some virtual machines to other machines, this amended script would work fine.
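
The flag swap can also be done with sed. A sketch, with the function name to_return and the file name orange_return made up for the example. It swaps the flag letters themselves, so lines where $TARGET was replaced by a specific server name are handled too:

```shell
#!/bin/sh
# Hypothetical helper: swap the -m (source) and -t (target) flags on every
# migrlpar line. Reads the evacuation script on stdin, writes the return
# script to stdout. "-@" is only a temporary marker during the swap.
to_return() {
    sed -e '/ migrlpar /!b' \
        -e 's/ -m / -@ /' \
        -e 's/ -t / -m /' \
        -e 's/ -@ / -t /'
}

# Usage:
#   to_return < orange_evac > orange_return
```

The SOURCE= and TARGET= lines stay as they are: after the swap, each virtual machine moves from wherever it was sent back to $SOURCE.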

Well, that's enough for one blog - happy HMC CLI LPM scripting and Evacuations + Returns on the cheap.


There are other more automated options like PowerVC and the Lab Services tools but this simple generated script is a useful option in your toolbox.

Additional Information


Other places to find content from Nigel Griffiths IBM (retired)

Document Information

Modified date:
31 December 2023

UID

ibm11165708