
POWER CPU Memory Affinity 7 - VM CPU placement also needs RAM

How To


Summary

Power Systems gain their massive performance from lots of technology; this series details many of those technologies.

Objective


Originally written in 2012 for the DeveloperWorks AIXpert blog and POWER7, then updated in 2019 for POWER8 and POWER9.

Steps

We all tend to concentrate on the CPU first and the memory second.  The CPUs, as the "brains" of the machine, do get a high focus and have lots of extremely good technology within them, but the RAM is the "guts" of the machine that "feeds" the CPUs with nutrient data.  OK, let us stop the analogy there :-).  Along with reducing the number of CPUs via a lower Virtual Processor count, we also need to have the CPUs matching the memory - so AIX has a fighting chance to localise a running process to its home SRAD and thus have its data local for maximum speed.
SRAD? Definition: Scheduler Resource Allocation Domain.  This is the collection of Virtual Processors and memory areas allocated to a virtual machine.  Another way of saying it: the SRADs define the slices of the computer that the VM can see and draw its CPU and memory resources from. Note it might "see" a CPU but it may not get sole use of all the compute cycles of that CPU, as it could be shared with other VMs running on the same CPU.
 
Logical Memory Block size
At the SRAD level, we deal with whole CPUs, but memory is finer grained and dealt with in chunks of many MB. The smallest memory chunk is the Logical Memory Block, often just called the LMB. This is set to a default size depending on the installed memory when the machine booted up for the first time. I think the intention is to end up with something like 1000 chunks of memory: flexible for assigning memory to a virtual machine and for dynamic memory changes in sensible sizes, but without too much waste in tracking millions of little memory chunks. On my medium and large POWER machines, I have seen LMB defaults of 64 MB to 256 MB.  After trying Live Partition Mobility, you quickly realise that the source and target machines have to have the same LMB size - so we standardised on 128 MB in our computer room.
How to set the LMB?
  • This size is set in the ASMI Service Processor menu. On the HMC, select the machine, then Operations and "Launch Advanced System Management", log in, then find Performance Settings and then Logical Memory Block - unfortunately, changing it needs a cold reboot of the machine. So get this right during the initial machine setup or it can be very painful to change while in production mode.
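You can also read the current LMB size from the HMC command line; a minimal sketch, with an illustrative managed system name:

    $ lshwres -r mem -m MyServer-9117-MMB --level sys -F mem_region_size
    128

The value is reported in MB, so 128 here matches the size we standardised on.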
Assigning memory to virtual machine SRADs
When starting your virtual machine, the Hypervisor has to decide the optimal placement for both the CPU and memory. I have customers that, for similar workloads, have a fixed CPU to RAM ratio of 1 CPU to 32 GB of memory - if the machine was configured with this ratio and the virtual machines have it too, then there can be a good chance there is RAM available with each processor - it depends if you feel lucky!  I am thinking here of a machine that is 80% allocated to running virtual machines and we are adding a further one.   If instead there is no fixed ratio and each ratio is decided based on needs, then I think you are likely to have more situations where there are CPU and memory islands - by which I mean:
  • Some SRADs have spare CPUs but no RAM 
  • Other SRADs have no CPUs but do have RAM available
 In diagram form:
[Diagram: CPU and RAM "islands" - some SRADs with spare CPUs but no free RAM, others with free RAM but no spare CPUs]
This (above) is a worst case and what we want to avoid, if at all possible.
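You can get a feel for how close a machine is to this situation by checking its free resources from the HMC command line before adding the next virtual machine; a minimal sketch, with an illustrative machine name:

    $ lshwres -r proc -m MyServer-9117-MMB --level sys -F curr_avail_sys_proc_units
    4.0
    $ lshwres -r mem -m MyServer-9117-MMB --level sys -F curr_avail_sys_mem
    24576

Note these are machine-wide totals - they cannot tell you whether the free CPUs and the free memory sit in the same SRADs, which is exactly the islands problem above.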

Virtual machine persistent placement:

It might not be obvious, but when you shut down a VM and later start it again, it usually starts on the same CPUs and memory.  This is to provide consistency of performance. This has been true for many years and was not introduced with POWER7. We would not want a simple cold restart to change the virtual machine placement and so give you different performance - that would be very hard to live with and nearly impossible to explain.  This does limit some of the choices in virtual machine placement.  Removing this "sticky" feature is difficult - you can make a profile with very little in it, then start the virtual machine until it starts loading AIX and then stop it immediately, as scripted below.  I think a cold reboot of the machine might get the allocation removed too.
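If you do want to try that minimal-profile trick, it can be driven from the HMC command line; a sketch, with illustrative machine, profile and VM names:

    $ chsysstate -m MyServer-9117-MMB -r lpar -o on -f tiny_profile -n myVM
      (wait until the VM just starts booting AIX, then stop it immediately)
    $ chsysstate -m MyServer-9117-MMB -r lpar -o shutdown --immed -n myVM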

A few examples of good, bad and "oh dear"

I have seen some customer virtual machines that have ended up with bad layouts in SRAD placement terms.
 
First, here is my 20 Virtual Processor virtual machine (remember: Entitlement is irrelevant) as it started up after a 7 hour power outage (which took down a 2 mile radius of London - not our fault!), so it was definitely a cold boot.
[Output of lssrad -av: a good example]
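The original screen capture is not reproduced here, but from the numbers listed below, the lssrad -av output would have looked roughly like this (the logical CPU ranges assume SMT4, i.e. 4 logical CPUs per Virtual Processor, and the REF1 grouping of two SRADs per CEC drawer is my assumption):

    # lssrad -av
    REF1   SRAD        MEM      CPU
    0
              0   21195.25      0-27
              1   20027.00      28-51
    1
              2   11827.50      52-67
              3   10582.50      68-79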
Here we find:
  • SRAD 0 - 7 virtual processors with 21195.25 MB RAM = 3028 MB per CPU-core
  • SRAD 1 - 6 virtual processors with 20027.0 MB RAM = 3338 MB per CPU-core
  • SRAD 2 - 4 virtual processors with 11827.5 MB RAM = 2957 MB per CPU-core
  • SRAD 3 - 3 virtual processors with 10582.5 MB RAM =  3528 MB per CPU-core

The machine has only 32 CPUs in two CEC drawers of the Power 770, and the Hypervisor has decided to lay the VM out across all four SRADs with a pretty even spread of CPUs and fairly even memory to match.  This virtual machine was started after three Virtual I/O Servers, a NIM server and my Systems Director virtual machine. This might explain why it is not absolutely consistent, but it will be fairly typical of a largish virtual machine started when the machine was (guessing here) 25% already running other smallish virtual machines. I am pleased with this placement.
 
Update: I then tried a few experiments.
I dropped the VP count to 16, thinking it would then use fewer than all four SRADs of my Power 770, and got the following:
[Output of lssrad -av: 16 Virtual Processors, still four SRADs]
In the above example, it is still using all four SRADs, but then I remembered this virtual machine has 64 GB of RAM. The machine has 128 GB in total, so that is 32 GB per POWER7 chip - as I have other LPARs running, some memory of each SRAD was probably in use, so it could not get 100% of the memory of two SRADs and was thus spreading out the VM.  I then dropped the memory from 64 GB to 32 GB with the same 16 Virtual Processors, restarted the VM and got ...
[Output of lssrad -av: 16 Virtual Processors and 32 GB]
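Again reconstructing roughly what lssrad -av reported, from the description below (SMT4 assumed, and the MEM figures are illustrative at about 2 GB per CPU-core):

    # lssrad -av
    REF1   SRAD        MEM      CPU
    0
              0   16384.00      0-31
              1   10240.00      32-51
    1
              2    6144.00      52-63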
Above, we now have:
  • Only 3 SRADs (good)
  • An uneven CPU split across the SRADs (8 CPUs, 5 CPUs and 3 CPUs) - this looks odd, but perhaps there was no further SRAD with 8 free CPUs (i.e. a whole empty POWER7 chip) for this VM, so it splits the CPUs between SRADs
  • An even, balanced memory split of 2 GB per CPU-core (good)
This is not the configuration I would prefer, but it is not terrible.
What have we learnt?
  1. You can't decide where the virtual machine goes in absolute physical terms.
  2. You might not get the layout balance you would prefer (like, in the example above, two whole POWER7 chips with 16 GB of RAM each) due to existing virtual machines.
  3. Given the options (free resources) the Hypervisor does something sensible.
 Here is a "not too bad" example of a smaller 8 Virtual Processor virtual machine:
[Output of lssrad -av: an unbalanced example]
Above, most of the virtual machine (logical (SMT) CPUs 0 - 27 = 28 in total = 7 physical CPUs) is in one SRAD. I can only guess, but the last CPU was probably allocated late, forcing it into a different SRAD.  The memory is not balanced either:
  • SRAD 0 - 7 virtual processors but only roughly 1490 MB of RAM per CPU-core - this is rather low
 Now here is a pretty bad example, from a customer with a Power 795 (it has been fixed):
[Output of lssrad -av: a bad example on a Power 795]
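Reconstructed from the numbers below, the lssrad -av output would have looked roughly like this (SMT4 assumed, REF1 grouping of four SRADs per processor book is my assumption):

    # lssrad -av
    REF1   SRAD        MEM      CPU
    0
              0   13677.50      0-11
              1   35808.00      12-43
              2    4731.00      44-47
              3       0.00      48-51
    1
              4       0.00      52-55
              5       0.00      56-59
              6       0.00      60-63
              7       0.00      64-67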
Above, we see first that the Power 795 can have four SRADs per processor book, as each book contains four POWER7 chips to make up the 32 CPU-cores per book. There are clearly five SRADs with one CPU-core and no memory at all.
  • SRAD 0 - 3 virtual processors with 13677.5 MB RAM = 4560 MB per CPU-core
  • SRAD 1 - 8 virtual processors with 35808 MB RAM = 4476 MB per CPU-core
  • SRAD 2 - 1 virtual processor with 4731 MB RAM = 4731 MB per CPU-core
  • SRAD 3 - 1 virtual processor with 0 MB RAM = 0 MB per CPU-core
  • SRAD 4 - 1 virtual processor with 0 MB RAM = 0 MB per CPU-core
  • SRAD 5 - 1 virtual processor with 0 MB RAM = 0 MB per CPU-core
  • SRAD 6 - 1 virtual processor with 0 MB RAM = 0 MB per CPU-core
  • SRAD 7 - 1 virtual processor with 0 MB RAM = 0 MB per CPU-core
Above, the first three SRADs look OK with a fair sharing out of the memory, but the last five are not good news - with no memory at all, processes running there will have 100% of their memory Near (SRAD 1) or Far (SRADs 4, 5, 6 & 7). This is not a disaster, but we could do better - at least SRADs 0 and 1 have the bulk of the CPUs and memory.
 
So have a look at your virtual machines with lssrad -av and decide whether you like the look of them. This may influence the VM start-up order next time you restart your machine (after you have reduced your VP count).  There is one way to make your placement much worse (covered in part 8) but no simple way to improve it, unless you are on old firmware and have Capacity Upgrade on Demand (covered in part 9).

Get Out of Jail Free Card:
Just to remind readers again: POWER based systems have excellent memory sub-systems and excellent inter-node memory bandwidth, and the machines will still perform well with heavy use of Far memory accesses. In addition, AIX will place processes and take action to minimise Far memory access.  But it is not optimal, and with planning and care we can get the performance a bit higher.  In the case above, I could see AIX trying not to use the last five SRADs until absolutely necessary, but we have already seen that the first SMT thread on each physical CPU-core is used before other threads.  Unfortunately, I don't have the above layout to experiment on to find out which algorithm wins.

Update:
After writing this article, IBM released the Dynamic Platform Optimiser (DPO).  This can adjust, on the fly with virtual machines running, the allocation of CPU and memory to localise them - this is done quickly when moving CPUs, as the new CPUs are simply allocated and scheduled instead.  If allocated memory needs to be reallocated, then it takes a little time, as the memory has to be locked one LMB at a time and the contents moved.  DPO was first available with POWER7 and is available for POWER8 and POWER9 too, although it is less important with the later processors as the memory sub-system is that much faster.
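DPO is driven from the HMC command line; a minimal sketch, with an illustrative machine name and illustrative scores (the affinity score runs from 0 to 100, higher is better):

    $ lsmemopt -m MyServer-9119-FHB -o currscore
    curr_sys_score=85
    $ lsmemopt -m MyServer-9119-FHB -o calcscore
    curr_sys_score=85,calc_sys_score=98
    $ optmem -m MyServer-9119-FHB -o start -t affinity

The calcscore option predicts what optmem could achieve, so you can decide before committing to the (potentially long-running) memory moves.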

Additional Information


Other places to find Nigel Griffiths, IBM (retired)

Document Location

Worldwide

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SWG10","label":"AIX"},"Component":"","Platform":[{"code":"PF002","label":"AIX"}],"Version":"All Versions","Edition":"","Line of Business":{"code":"LOB08","label":"Cognitive Systems"}},{"Business Unit":{"code":"BU054","label":"Systems w\/TPS"},"Product":{"code":"HW1W1","label":"Power -\u003EPowerLinux"},"Component":"","Platform":[{"code":"PF016","label":"Linux"}],"Version":"All Versions","Edition":"","Line of Business":{"code":"","label":""}},{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SWG60","label":"IBM i"},"Component":"","Platform":[{"code":"PF012","label":"IBM i"}],"Version":"All Versions","Edition":"","Line of Business":{"code":"LOB57","label":"Power"}}]

Document Information

Modified date:
13 June 2023

UID

ibm11126701