
POWER CPU Memory Affinity 8 - Dynamic LPAR changes can mess up your placement

How To


Summary

Power Systems gain their massive performance from a lot of technology - this series details many of those technologies.

Objective


Originally written in 2012 for the DeveloperWorks AIXpert Blog for POWER7, but updated in 2019 for POWER8 and POWER9.

Steps

After you have started and used your virtual machine (VM) for a while, you may decide to change its size using a Dynamic LPAR (DLPAR) change from the HMC (example HMC commands follow this list). This has virtual machine placement implications. 
  • When shrinking your VM, the hypervisor decides which CPU or memory (LMB, logical memory block) to release and it might not select what you think is the obvious choice. 
  • When enlarging your VM, we might think there is an obvious way to grow in a balanced way, but we can't see:
    • 1) where our virtual machine is physically placed in the physical machine - note: the lssrad command output is relative and always starts from SRAD 0
    • 2) where there are free resources
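For reference, these DLPAR changes can be driven from the HMC command line as well as the GUI. A minimal sketch using the standard chhwres command - the managed system name myserver and partition name testvm are made-up examples, not from this article:

# On the HMC (myserver and testvm are example names)
chhwres -m myserver -r proc -o a -p testvm --procs 2    # add 2 virtual processors
chhwres -m myserver -r mem -o r -p testvm -q 4096       # remove 4096 MB of memory
# Then re-check the placement from inside the VM with: lssrad -av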
When playing with DLPAR on my non-production box, I frequently found myself scratching my head, wondering why it was doing certain things, and then later suddenly working it out.  Examples: I worked out that the second SRAD was empty (it must not have had any LPAR using its CPUs or memory) and that is why the hypervisor kept adding more CPU and RAM to it. Or I was asking for 32 GB and realised it could not make a single-SRAD VM because there is a little overhead, but it could make a 31 GB one.  In practice, I have found the hypervisor makes good choices but often the reasons are a little mysterious!  Not many systems administrators can spend half a day experimenting to work out why.
 
In the Advanced Technology Support group, we tend to have the latest HMC, firmware (hypervisor) and operating systems installed, including early beta versions, as we get involved with testing for user experience feedback, improving the documentation and so on.  When I was first experimenting with early POWER7 machines and looking at CPU and memory affinity, we were on an older firmware level and I found that shrinking and enlarging virtual machines got into some odd situations and more asymmetric placements over time. But today, when I went to capture some examples of shrinking a 16 virtual processor + 32 GB of memory virtual machine down to just 2 CPUs and 4 GB and back again, it was well behaved.  We updated the firmware in between the tests.
 
It is good practice to keep the CPU to memory ratio about the same for our workloads, but this requires two Dynamic LPAR operations. We can't remove CPU and memory at the same time - if you try, you will find the first operation locks the virtual machine so no further changes are possible until it completes.  The two steps make it harder work for the hypervisor, as it can't guess your intentions or what you may do in the near future. Those who have been around since POWER4 days will know the recommended best practice of not making very large memory reductions in one go but taking, say, 4 GB at a time a few times. Releasing memory from the virtual machine requires AIX to empty the memory first. If the memory is in use, that means paging the content out, which can take a considerable time. My test VM was not using the memory so it was quick, but on a busy server it can take many minutes or an hour if you use a draconian removal size.  I noticed that on large memory reductions, the HMC shows the Reference Code 2003. If you click the code it tells you a "Dynamic LPAR Memory Removal" is in progress.
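A sketch of that "small steps" approach from the HMC command line - again with the made-up myserver and testvm names, and assuming your HMC shell allows a simple loop (otherwise just repeat the command by hand):

# Remove 16 GB as four 4 GB (4096 MB) steps rather than one big operation;
# -w 15 lets each step wait up to 15 minutes for AIX to free the memory
for i in 1 2 3 4
do
  chhwres -m myserver -r mem -o r -p testvm -q 4096 -w 15
done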
 
An example:
Starting with Entitlement=16 (note: the entitlement is not important here), virtual processors=16 and desired memory=32 GB
# lssrad -av
REF1   SRAD        MEM      CPU
0
          0   15824.31      0-7 12-15 20-23 28-31 36-39 44-47 56-59
1
          1   12325.50      8-11 16-19 24-27 32-35 40-43 52-55
          2    3610.50      48-51 60-63
#
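As a cross-check, the same configuration can be confirmed from inside AIX with the standard lparstat command; the fields of interest are the entitled capacity, online virtual CPUs and online memory (exact field names can vary slightly between AIX levels):

# Show the partition configuration as AIX sees it
lparstat -i | grep -E "Entitled Capacity|Online Virtual CPUs|Online Memory"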

 This was taken down in various stages to E=2, VP=2 and RAM=8 GB
# lssrad -av
REF1   SRAD        MEM      CPU
0
          0    4121.31      56-59
1
          1    4108.50      52-55
          2     124.50
#

 Then boosted up to E=24, VP=24, RAM=48 GB
# lssrad -av
REF1   SRAD        MEM      CPU
0
          0   19932.81      0-7 12-15 20-23 28-31 36-39 44-47 56-59 64-67 72-75 80-83 88-91
          3    3859.50
1
          1   21289.50      8-11 16-19 24-27 32-35 40-43 52-55 68-71 76-79 84-87
          2    2614.50      48-51 60-63 92-95
# 

 Followed by going back to the start position of E=16, VP=16 and RAM=32 GB
# lssrad -av
REF1   SRAD        MEM      CPU
0
          0   14828.31      28-31 36-39 44-47 56-59 64-67 72-75 80-83 88-91
          3       0.00
1
          1   14317.50      24-27 32-35 40-43 52-55 68-71 76-79 84-87
          2    2614.50      48-51
#

Comments:
  • We have ended up with four SRADs instead of three, but SRAD 3 is actually empty (I guess SRADs are not removed from the list completely once they have been used). 
  • We have SRAD 2 with less memory and only one CPU now (logical CPUs 48 to 51 are one physical CPU) compared to the start position, where it had an extra GB and two CPUs. 
  • The bulk of the virtual machine is in two SRADs with near-equal memory, differing by only one CPU (8 and 7 CPUs) - I am guessing that SRAD 1 has a CPU allocated to another VM so it can't have 8.
  • I think the current placement is pretty good - full marks to the hypervisor algorithms and their developers. It can't be easy to make a set of rules to cover all the combinations and permutations that it could encounter.
 
Further experiments:
I took the VM down from 16 CPUs and 32 GB to an extremely small one of just VP=2 and RAM=4 GB and then started adding 1 GB at a time:
 
# lssrad -av
REF1   SRAD        MEM      CPU
0
          0    1382.31      56-59
          3       0.00
1
          1    1245.00      68-71
          2    1245.00

ADD 1 GB
# lssrad -av
REF1   SRAD        MEM      CPU
0
          0    2378.31      56-59
          3       0.00
1
          1    1245.00      68-71
          2    1245.00

ADD 1 GB
# lssrad -av
REF1   SRAD        MEM      CPU
0
          0    3374.31      56-59
          3       0.00
1
          1    1245.00      68-71
          2    1245.00

ADD 1 GB
# lssrad -av
REF1   SRAD        MEM      CPU
0
          0    4370.31      56-59
          3       0.00
1
          1    1245.00      68-71
          2    1245.00

ADD 1 GB
# lssrad -av
REF1   SRAD        MEM      CPU
0
          0    5366.31      56-59
          3       0.00
1
          1    1245.00      68-71
          2    1245.00
#
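Those repeated 1 GB additions were done by hand; a hedged sketch of scripting the same experiment from a workstation with ssh access to the HMC and the VM (hmc, myserver, testvm and the key-based logins are all assumptions, not from this article) might look like this:

# Add 1 GB (1024 MB) four times and record the placement after each step
for step in 1 2 3 4
do
  ssh hscroot@hmc "chhwres -m myserver -r mem -o a -p testvm -q 1024"
  ssh root@testvm "lssrad -av" >> lssrad_steps.txt
done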

 
 Now we have orphaned memory in SRAD 2 and rather unbalanced memory, with 5.3 GB for one CPU and 1.2 GB for the other. Also, I think the two CPUs are in different CEC drawers - indicated by the REF1 values 0 and 1.
 
Now I ask the impossible - while our experimental virtual machine is tiny, I start up a large virtual machine that takes up all the resources of the machine except just enough "left over" for our test VM to grow back to its original size.  This new large VM has probably got the CPUs and memory that were used by our test VM, so the test VM has to make do with the leftovers - there was no way the hypervisor could know we would grow again.
 
First, I add back the memory to 32 GB
# lssrad -av
REF1   SRAD        MEM      CPU
0
          0   12836.31      56-59
          3       0.00
1
          1    1618.50      68-71
          2   17305.50
#
This is not a pleasant VM placement at all - but the hypervisor has no real options, as nearly all of the machine's memory is now in use.
 
Second, I add back the CPUs to VP=16:
# lssrad -av
REF1   SRAD        MEM      CPU
0
          0   12836.31      0-7 12-15 20-23 28-31 36-39 44-47 56-59
          3       0.00
1
          1    1618.50      8-11 16-19 24-27 32-35 40-43 52-55 68-71
          2   17305.50      48-51
#
As expected, the memory is the same and the CPUs, with their "sticky" nature (you tend to get the same ones every time), were added back as they were.  If I had started and stopped a dozen other VMs on this VM's CPUs, I suspect that may not be the case - I have not tested that idea.  The worst aspect of our latest placement is the memory in SRAD 2. Did you spot that SRAD 0 has one more CPU than SRAD 1?  Also note that we have gaps in the CPU numbers and a top logical CPU number of 71, yet 16 physical CPUs with SMT=4 gives us only 64 logical CPUs. Can you spot the missing ones? (One way to find them is sketched below.)
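If you would rather not count by eye, here is a quick sketch from the AIX command line: bindprocessor -q lists the logical CPU numbers this VM currently has, and a little awk reports any gaps in the numbering (this snippet is my illustration, not from the original article):

# List this VM's logical CPU numbers and report any gaps in the numbering
bindprocessor -q | tr ' ' '\n' | grep '^[0-9][0-9]*$' | sort -n |
  awk 'NR>1 && $1>prev+1 {printf "missing: %d-%d\n", prev+1, $1-1} {prev=$1}'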
 
In the case above, the warning is clear: drastic DLPAR changes, with new virtual machines started in between, can lead to sub-optimal VM placement.  Of course, smaller changes or a temporary boost will not have such a dramatic effect as this deliberately extreme test case.
 

 Get Out of Jail Free Card:
Just to remind readers: POWER based systems have excellent memory subsystems and excellent inter-node memory bandwidth, and the machines will still perform well with heavy use of Far memory accesses. In addition, AIX will place processes and take action to minimise Far memory access.  But it is not optimal, and with planning and care we can get the performance a bit higher. 

We can use the Dynamic Platform Optimiser (DPO) to rethink the VM CPU and memory placement, estimate the benefit this would give, and then actually make the changes.  This is not disruptive to the running system.
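From the HMC command line, DPO is driven with the lsmemopt and optmem commands, roughly like this - myserver is again a made-up managed system name:

# Current server-wide affinity score (0 = worst, 100 = perfect)
lsmemopt -m myserver -o currscore
# Score the HMC estimates a DPO run could achieve
lsmemopt -m myserver -o calcscore
# Start a Dynamic Platform Optimiser run across the server
optmem -m myserver -o start -t affinity

For more information, see the Additional Information section below.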

Additional Information


Other places to find content from Nigel Griffiths IBM (retired)

Document Location

Worldwide

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SWG10","label":"AIX"},"Component":"","Platform":[{"code":"PF002","label":"AIX"}],"Version":"All Versions","Edition":"","Line of Business":{"code":"LOB08","label":"Cognitive Systems"}},{"Business Unit":{"code":"BU054","label":"Systems w\/TPS"},"Product":{"code":"HW1W1","label":"Power -\u003EPowerLinux"},"Component":"","Platform":[{"code":"PF016","label":"Linux"}],"Version":"All Versions","Edition":"","Line of Business":{"code":"","label":""}},{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SWG60","label":"IBM i"},"Component":"","Platform":[{"code":"PF012","label":"IBM i"}],"Version":"All Versions","Edition":"","Line of Business":{"code":"LOB57","label":"Power"}}]

Document Information

Modified date:
13 June 2023

UID

ibm11126719