
POWER CPU Memory Affinity 6 - Too Many Virtual Processors has a Bad Side Effect

How To


Summary

Power Systems gain their massive performance from a host of technologies; this series details many of them.

Objective


Originally written in 2012 for the DeveloperWorks AIXpert Blog and POWER7, updated in 2019 for POWER8 and POWER9.

Steps

In this entry, we carry on from part 5, but we are going to look at setting the virtual processor number for the virtual machine. There is a side effect that is not obvious and, after 6 years of using them, it never occurred to me, so perhaps it is news to others too. The problem with virtual processors is that they are ephemeral - that is, they don't physically exist and cost nothing. So I find most systems administrators feel they can be generous and allocate lots of them. In the AIXpert blog, I have pointed out that it would be clearer if Virtual Processor (VP) was called the "Spreading Factor". In Uncapped virtual machines, the VP number tells the Hypervisor the number of physical processors the VM can "see" (that is, the number of CPUs AIX knows about) and can spread its processes across. Also, in previous entries we have studied AIX CPU Folding, where AIX decides that running the CPU cycles on fewer physical CPUs is more efficient and drops hints to the Hypervisor. This may make some systems administrators feel there are no bad effects from high VP numbers.
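As an aside, you can see a VM's current Entitlement, Virtual Processor count and folding settings from inside AIX. A minimal read-only sketch - the numbers shown are illustrative, not from a real system:

  # Show the Entitlement and Virtual Processor count of this LPAR
  lparstat -i | egrep "Entitled Capacity|Online Virtual CPUs|^Mode"
  #   Entitled Capacity        : 4.80
  #   Online Virtual CPUs      : 48
  #   Mode                     : Uncapped
  # CPU folding is governed by the schedo tunables (listing only, no changes)
  schedo -a | grep vpm
  #   vpm_fold_policy = 1      <- folding enabled for shared-processor LPARs
  #   vpm_xvcpus = 0           <- extra virtual CPUs kept unfolded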
So is there any harm in having a large Virtual Processor number? The answer is YES.

Here is a little reminder from part 5 of an example virtual machine (LPAR) setup:

For illustration purposes, let us take a VM that averages 16 physical CPUs in busy periods, regularly peaks to 18 physical CPUs for a few minutes, and sits in a shared processor pool of, say, 48 physical CPUs. Four scenarios (a small sizing sketch follows the list):
1 Maximum flexibility - "Virtual Processors are free, right!"
  • E as low as possible (as a minimum you need a tenth of a CPU per VP) then VP as high as possible
  • Example: VP = 48 and E=48/10 = 4.8
2 It is production, so it must get what it needs
  • E around the average CPU use of the VM (as monitored over a week or month), then VP = E * 2
  • Example: E=16 and VP=32
3 It is vital production so it needs to peak and perform
  • E to cover the regular peaks (as monitored) and then VP a good handful of extra CPUs
  • Example: E=18 and VP=36
4 It is vital but limit the spread across the machine, so VP=Entitlement rounded up plus 1 or 2
  • E to cover the peaks and the minimum extra VP
  • Example: E=18 and VP=20
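As a rough sketch of the scenario 4 arithmetic in ksh and awk (the variable names are mine, not from any IBM tool):

  # Round the monitored peak up to a whole CPU, then add 2 VPs of head-room
  peak=18                 # monitored peak physical CPU consumption
  VP=$(echo $peak | awk '{ c = int($1); if ($1 > c) c++; print c + 2 }')
  echo "Suggested: E=$peak VP=$VP"
  # Suggested: E=18 VP=20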
     
All of these involve some compromise, but there are two hidden side effects that you should know about. The too-low-an-Entitlement side effect was covered in part 5; the Virtual Processor number side effect is covered below.

Note:
  • This article was written after POWER7 based computers were released. With a Power 770, each of the four CPU and memory drawers could contain 16 CPUs, totalling 64.
  • In the POWER8 and POWER9 based E880 and E980 computers, that became 48 CPUs per CPU drawer and a total of 192 CPUs.
  • The principles are exactly the same, even if the diagrams would have to be a lot more complicated with all those extra CPUs. So, in the interests of simplicity, I left the diagrams showing the POWER7 details.
  • All these Enterprise servers have up to four drawers, each 4U high, containing CPUs, memory and adapters, inter-connected by very fast communication cables that turn the four drawers into one computer. These drawers are also called Central Electronic Complex (CEC) drawers.

Ask yourself the question: what is the difference between virtual machines with VP=48, 36 or 20?
Well, whether you are using them or not (that is, AIX has folded them), the Virtual Processors have to be allocated to physical CPU-cores in the machine - it is the Hypervisor that allocates them. At last in this series of blogs, we are back to considering virtual machine placement and the memory implications - sorry for the long detour into processor scheduling.
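You can see the placement the Hypervisor chose from inside AIX with lssrad, which reports the Scheduler Resource Allocation Domains. The output below is made up for illustration - broadly, a REF1 corresponds to a drawer and an SRAD to a POWER chip:

  lssrad -av
  # REF1   SRAD        MEM      CPU
  # 0
  #           0    31488.00    0-31
  # 1
  #           1    31744.00    32-63
  # Two REF1 values mean this VM spans two CEC drawers; a "lucky" VM
  # shows a single REF1 and a single SRAD.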
Let's have a look at how it could be laid out.
 
[Diagram: VP 1]
If we take the example of a 64 CPU-core Power 770, it would look like the following picture, with eight POWER7 chips and 64 CPU-cores in total. These are housed in the four CEC drawers (brown squares with two POWER7 chips each), with a reminder about "Local, Near and Far" memory access:
[Diagram: VP 2]
Small size Virtual Machines

So if we have a virtual machine with a VP number of 8 (or less), then, if we are lucky, the entire virtual machine could be placed on a single POWER chip with its 8 CPU-cores. The result is that every data access is from the local memory DIMMs.
Medium size Virtual Machines

If we have a virtual machine with a VP number of 9 to 16, then (again, if lucky) it would get placed on the two POWER7 chips within a single CEC drawer. This would mean the data accesses would be Local or Near, with AIX taking every opportunity to keep a process and its data on the same CPU-core, and minimal Far data access.
Why do I keep saying "lucky"?

Well, if you already have virtual machines (LPARs) running, the Hypervisor may have already allocated a few CPU-cores on each of the POWER chips and it will have to work around them. This would mean the Hypervisor can't go for a perfect placement. In the following diagram, we show four different levels of "luckiness" for virtual machines using 8 virtual processors allocated to physical CPU-cores:
 
[Diagram: VP 4]
"Do you feel lucky, punk?":
  • Lucky - All CPU-cores are on a single POWER chip and you get every data access local
  • Lucky-ish - Not too bad: across two chips in the same CEC drawer, so at worst accesses are Local and Near, and hopefully Local most of the time as AIX tries for Local affinity
  • Unlucky - The CPU-cores are spread across two CEC drawers; typically 50% of random memory accesses will be Far, but again AIX will avoid that as much as possible.
  • Very Unlucky - You are having a bad day: most memory accesses (roughly 75%) will be Far. AIX will struggle to avoid Far memory accesses, so you have to take that hit, but if you have lots of processes that don't share memory you still might do OK. You can measure your own luck - see the sketch below.
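You can measure this from inside AIX: mpstat -d reports, for each logical CPU, the percentage of thread dispatches that were Local (S3hrd), Near (S4hrd) and Far (S5hrd). The output below is illustrative only:

  mpstat -d 10 1          # one 10-second interval
  # cpu  ...  S3hrd  S4hrd  S5hrd
  #   0  ...   92.0    6.5    1.5
  #   1  ...   88.3    9.1    2.6
  # A high S5hrd percentage suggests an "Unlucky" placement with many Far dispatches.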
DON'T PANIC!
Although I have painted this "Very Unlucky" layout as bad, do not lose sight of the fact that the POWER range has VERY HIGH-SPEED CPU and memory sub-systems and can work well across CEC drawers. Those rPerf performance numbers are based on large 64 CPU-core virtual machines, where Local, Near and Far memory accesses have to be handled ALL THE TIME. It will work very well with Far memory accesses, but with a little thinking and planning we can get our smaller VMs going even faster by avoiding Far memory, so we might as well optimise them.
 Back to the larger LPAR example

In the worst case, where
  • We are starting the last virtual machine on the box,
  • Which will then mean we are using all the CPU-cores and
  • The virtual processor number is large.
We may get allocated CPU-cores on every chip in every CEC drawer because that is all that is available. Oh well, that is life - get used to it!
In the best case, where
  • We start the larger virtual machines first - so they get the best placement
  • Then we start the smaller virtual machines to fit around them
  • We keep the virtual processor number low - so we don't force the Hypervisor to allocate CPU-cores that are NOT actually going to get used
We may get the maximum Local memory and minimum use of Near and Far memory - and an over-sized VP number can be corrected later, as shown below.
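If a VM already has too many virtual processors, they can be removed with a DLPAR operation from the HMC command line and the profile updated to match. A sketch using hypothetical LPAR and managed-system names:

  # Dynamically remove 16 virtual processors from LPAR "prodvm1"
  chhwres -r proc -m Server-9117-MMD-SN123 -o r -p prodvm1 --procs 16
  # Update the profile so the change survives a reactivation
  chsyscfg -r prof -m Server-9117-MMD-SN123 \
    -i "name=default,lpar_name=prodvm1,desired_procs=32,max_procs=32"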
A bad idea

If we unnecessarily set VP=64, then we get the worst case every time: the VM is spread over the whole machine :-(
Worst cases

For the worst cases with VP = 20, 32, 36 or 48 virtual processors, we end up with CPU-cores in every CEC drawer and on every POWER chip.
Best cases
But for the best cases, the lower VP numbers do much better. See below for details:
 
[Diagram: VP 6]
With the lower VP numbers, the VM does not have to be spread across so many CEC drawers and thus gains efficiency - and this only took a little monitoring and planning.
Note:
  • In the VP=20 case, the bulk of the VM is on the one CEC drawer containing 16 CPU-cores, and the bulk of the memory accesses will be Local or Near. AIX is aware that the other four CPU-cores are Far - it will try to avoid using them.
 Conclusions
  1. Smaller Virtual Processor numbers reduce the unnecessary spread of the Virtual Machine across the server.
  2. Don't use the maximum possible.
  3. Don't use a times-two or 50%-extra ratio.
  4. Preferably, assuming a sensible Entitlement, use the minimum plus 1 or 2.
  5. Now we know the side effect of high VP numbers. But even in the VP=20 case, we might consider in more detail the effects of VP=16, which gets the VM down to a single CEC drawer. A quick SMT check is sketched after this list.
    • How much of the day does it require those two extra CPUs?
    • Are we absolutely sure that we are using all four SMT threads during those peaks?
    • See other AIXpert Blog entries on why AIX tends to use the maximum physical CPU-cores before switching on SMT.
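A minimal sketch for checking SMT thread use during a peak (the output shown is illustrative):

  smtctl                  # reports the SMT capability and current mode
  mpstat -s 10 1          # per-virtual-processor SMT thread utilisation
  #  Proc0 (79.9%)                    Proc4 (62.1%)
  #  cpu0    cpu1   cpu2   cpu3       cpu4   cpu5   cpu6   cpu7
  #  70.1%   5.2%   2.4%   2.2%       55.0%  3.1%   2.1%   1.9%
  # If the secondary threads (cpu1-cpu3 and so on) sit nearly idle even at
  # peak, the extra VPs are not buying much.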
But wait, there is more: there is the placement of memory, in addition to the CPU-cores, to consider - that will be covered in part 7.
 
- - - The End - - -
 

Additional Information


Other places to find Nigel Griffiths, IBM (retired)

Document Location

Worldwide

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SWG10","label":"AIX"},"Component":"","Platform":[{"code":"PF002","label":"AIX"}],"Version":"All Versions","Edition":"","Line of Business":{"code":"LOB08","label":"Cognitive Systems"}},{"Business Unit":{"code":"BU054","label":"Systems w\/TPS"},"Product":{"code":"HW1W1","label":"Power -\u003EPowerLinux"},"Component":"","Platform":[{"code":"PF016","label":"Linux"}],"Version":"All Versions","Edition":"","Line of Business":{"code":"","label":""}},{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SWG60","label":"IBM i"},"Component":"","Platform":[{"code":"PF012","label":"IBM i"}],"Version":"All Versions","Edition":"","Line of Business":{"code":"LOB57","label":"Power"}}]

Document Information

Modified date:
13 June 2023

UID

ibm11126419