IBM Support

Why does vNIC failover/failback take a long time and use high CPU on the VIOS?

Question & Answer


Question

A Power9 system (Model: 9009-42A) has the following configuration.
System Firmware: 930.03
VIOS:
Number of VIOS: 2
VIOS Level: 3.1.0.21
Each VIOS has 24 Virtual NIC Server Devices
Backing device: PCIe3 2 Port 25/10 Gb NIC & RoCE Adapter VF (b315161014101e06), Feature Code: EC2U, device driver: mlxcendd
VIO Clients:
Number of VIO Clients: 24
AIX Level: 7.2 TL03 SP02
Each VIO Client has ent0 as a vNIC Client Adapter in vNIC failover mode
When a vNIC failover/failback is performed, it takes a long time, uses high CPU on the VIOS, and some VIO clients lose connectivity for longer than expected.
What actions can be taken to resolve these problems?

Answer

The following diagram shows the current vNIC failover configuration.
• 24 VIO Clients with vNIC failover configuration with failover priority 50 and 60
• Auto Priority Failover enabled
• 48 backing devices are distributed between two physical ports (one port on
   each adapter)
• 24 active backing devices on one physical port when both physical ports are up
Current vNIC failover Configuration:
[Diagram: current vNIC failover configuration]
The following actions were taken to reduce the failover/failback time and the CPU usage on the VIOS.
(1) In the current configuration, all 24 vNIC servers have a backing device
associated with physical port T1 on the adapter in slot C2. When a failover
occurs, all 24 vNIC servers try to fail over at the same time to physical port T1
on the adapter in slot C10, which delays the failover. The same happens on
failback to the adapter in slot C2. To reduce the failover/failback time,
distribute the backing devices among 4 physical ports, as shown in the revised
vNIC failover configuration below.
• 24 VIO Clients with vNIC failover configuration with failover priority 50 and 60
• Auto Priority Failover enabled
• 48 backing devices are distributed between four physical ports (two physical
  ports on each adapter)
• 6 active backing devices on one physical port when all 4 physical ports are up
Revised vNIC failover configuration:
[Diagram: revised vNIC failover configuration]
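The effect of spreading the backing devices can be checked with a little arithmetic. The following is a minimal Python sketch, not a VIOS tool: it assumes the priority 50 backing device is the preferred one, that preferred devices are assigned to the four ports round-robin, and that each backup sits on the same port number of the other adapter. The pairing, the port labels such as "C2-T1", and the function names are assumptions for illustration; the source does not state how the pairs are laid out.

```python
from collections import Counter

# Port labels reuse the slot (C2, C10) and port (T1, T2) names from the text.
PORTS = ["C2-T1", "C2-T2", "C10-T1", "C10-T2"]

# Assumption (not stated in the source): each backup sits on the same port
# number of the other adapter.
BACKUP_OF = {
    "C2-T1": "C10-T1",
    "C2-T2": "C10-T2",
    "C10-T1": "C2-T1",
    "C10-T2": "C2-T2",
}


def plan_layout(num_clients=24, ports=PORTS):
    """Round-robin the preferred (priority 50) backing device across ports."""
    layout = []
    for i in range(num_clients):
        preferred = ports[i % len(ports)]
        layout.append(
            {"client": i + 1, "prio50": preferred, "prio60": BACKUP_OF[preferred]}
        )
    return layout


def active_per_port(layout, failed=()):
    """Count active backing devices per port.

    A client whose priority 50 port is in `failed` is counted on its
    priority 60 port. The case where both ports are down is not modelled.
    """
    counts = Counter()
    for entry in layout:
        port = entry["prio50"] if entry["prio50"] not in failed else entry["prio60"]
        counts[port] += 1
    return dict(counts)


if __name__ == "__main__":
    layout = plan_layout()
    print(active_per_port(layout))
    print(active_per_port(layout, failed={"C2-T1"}))
```

Under these assumptions, each port carries 6 active backing devices when all four ports are up, and the loss of C2-T1 moves only 6 vNIC servers at once (C10-T1 then carries 12). Calling plan_layout(ports=["C2-T1"]) reproduces the original layout, where the same failure moves all 24 vNIC servers to C10-T1 at the same time.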
(2) Upgrade the system firmware to 940.02 to benefit from the new XIVE (External Interrupt Virtualization Engine) performance feature added in system firmware level 940.00 and to get the latest adapter firmware fixes.
(3) Upgrade the VIOS to 3.1.1.10
Note: To benefit from the new XIVE performance feature, the VIOS must be at 3.1.1.0 or higher.
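The level requirements in (2) and (3) are simple minimum-level comparisons. The sketch below shows one way to compare dotted level strings in Python; parse_level and meets_minimum are hypothetical helper names, and the level strings are assumed to be typed in by hand rather than read from the system.

```python
def parse_level(level):
    """Turn a dotted level such as '3.1.1.10' into a tuple of integers."""
    return tuple(int(part) for part in level.split("."))


def meets_minimum(current, minimum):
    """Return True when `current` is at or above `minimum`.

    Both strings are assumed to have the same number of dotted fields.
    """
    return parse_level(current) >= parse_level(minimum)


if __name__ == "__main__":
    # VIOS levels from the text, checked against the 3.1.1.0 minimum for XIVE.
    print(meets_minimum("3.1.0.21", "3.1.1.0"))
    print(meets_minimum("3.1.1.10", "3.1.1.0"))
    # System firmware levels from the text, checked against 940.00.
    print(meets_minimum("930.03", "940.00"))
    print(meets_minimum("940.02", "940.00"))
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank "3.1.1.10" below "3.1.1.9".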
(4) Install ifixes for the following APARs on the VIOS, because the fixes are not currently available otherwise.
IJ23731 - Reduce CPU consumption in entcore
IJ23628 - Reduce CPU consumption in mlxcendd driver
IJ21338 - vnicserver_crq process high CPU usage
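To keep track of which of these APARs still need an ifix, a small check like the following can help. It is a sketch only: it assumes you have saved, as plain text, a listing of the interim fixes installed on the VIOS (for example the output of emgr -l), and it simply searches that text for APAR numbers. The sample listing lines in the usage part are made up and do not reflect the real output format.

```python
import re

# APARs named in the text, with their abstracts.
REQUIRED_APARS = {
    "IJ23731": "Reduce CPU consumption in entcore",
    "IJ23628": "Reduce CPU consumption in mlxcendd driver",
    "IJ21338": "vnicserver_crq process high CPU usage",
}


def missing_apars(listing_text, required=REQUIRED_APARS):
    """Return the required APAR numbers that do not appear in the listing text."""
    found = set(re.findall(r"IJ\d{5}", listing_text))
    return sorted(apar for apar in required if apar not in found)


if __name__ == "__main__":
    # Made-up sample text; a real listing will look different.
    sample = "ifix label IJ23731s1a installed\nifix label IJ21338s2a installed\n"
    for apar in missing_apars(sample):
        print(apar, "-", REQUIRED_APARS[apar])
```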
Author: Darshan Patel
Operating System: AIX and VIOS
Hardware: Power
Feedback:
aix_feedback@wwpdl.vnet.ibm.com


Document Information

Modified date:
20 October 2021

UID

ibm16198863