IBM Support

Causes of Hypervisor Send and Receive Failures

Question & Answer


Question

My netstat -v output shows values for Hypervisor Send Failures and Hypervisor Receive Failures. What is the cause of these failures?

Answer

The following discussion applies to a VIOS LPAR. The diagram below shows a typical VIOS system.



The user can get the Hypervisor statistics in a few different ways. The command

entstat -d entN

where entN is a SEA (Shared Ethernet Adapter) or a VEA (Virtual Ethernet Adapter) will result in these statistics being written out for each of the VEAs the SEA is using or the specified VEA. The more general command

netstat -v

runs the equivalent of entstat -d entN for every network device that is currently in the Available state and is not under a SEA or an EtherChannel.

A sample of this output is shown below along with line numbers. The text will refer back to the line numbers.

1| Hypervisor Send Failures: 791
2| Receiver Failures: 791
3| Send Errors: 0
4| Hypervisor Receive Failures: 0
5|
6| Invalid VLAN ID Packets: 0

Line 1: This is simply the sum of line 2 and line 3.
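The relationship between lines 1, 2, and 3 can be checked mechanically. The following is a minimal sketch that parses the sample counters shown above and confirms that the sum holds. The helper name get_counter is hypothetical and not part of AIX; on a live system, the text would come from entstat -d entN instead of the embedded sample.

```shell
# Sample counters copied from the output above (without the line-number prefixes).
sample='Hypervisor Send Failures: 791
Receiver Failures: 791
Send Errors: 0
Hypervisor Receive Failures: 0'

# Print the value of one named counter from entstat-style text on stdin.
# get_counter is a hypothetical helper written for this example.
get_counter() {
    awk -F': *' -v name="$1" '
        { key = $1; sub(/^[ \t]+/, "", key) }   # strip leading indentation
        key == name { print $2; exit }'
}

send_failures=$(printf '%s\n' "$sample" | get_counter "Hypervisor Send Failures")
receiver_failures=$(printf '%s\n' "$sample" | get_counter "Receiver Failures")
send_errors=$(printf '%s\n' "$sample" | get_counter "Send Errors")

# Line 1 should equal line 2 plus line 3.
if [ "$send_failures" -eq $((receiver_failures + send_errors)) ]; then
    echo "consistent: $send_failures = $receiver_failures + $send_errors"
else
    echo "inconsistent"
fi
```

With the sample values, the Receiver Failures counter accounts for all of the Hypervisor Send Failures, which points the investigation at line 2 rather than line 3.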

Line 2: This counter increments for a number of reasons. The two main reasons are described below as case 1 and case 2.

For unicast packets, when a VEA sends a packet, in concept the packet goes to the virtual switch and then the virtual switch gives it to the appropriate receiving VEA. In reality, one partition (the LPAR, which may be AIX, VIOS, Linux, or IBM i) sends the packet to PHYP, and PHYP looks for a receiving VEA.

1) In case 1, the receiving VEA is out of buffers. PHYP has no choice but to drop the packet. The receiving side will show an increase in "Hypervisor Receive Failures" while the sender will show an increase in "Receiver Failures". If this is the cause, a review of all the other VEAs in the other LPARs using the same VLAN tag will show which VEAs are failing to receive the packets. The reason for the failures and the remedy are explained further under Line 4 below.

2) In case 2, PHYP cannot find any VEA with a matching VLAN and MAC address. If PHYP does not find a match, it must drop the packet. The return to the sending side looks the same as in case 1, so the Receiver Failures counter increases.

For a VEA on a virtual client LPAR, the cause will be a unicast packet for which there is no matching active trunk VEA (usually one being used as a virtual adapter by a SEA) and no VEA on the same virtual switch with the destination MAC.

For a trunk VEA being used as one of the virtual adapters under a SEA, this counter can increase for a few reasons. The most common are:

2a) Normal operation of a switch uses a CAM table to route packets. Switches differ from hubs in that they send packets destined for MAC address A out the physical port where they last received a packet with a source MAC address of A. To rephrase: a switch notes which port p packets from MAC A come in on by sniffing the source MAC address of each packet. The switch then sends packets destined for that particular MAC address out port p.

The CAM table is the table the switch maintains that tells it which MAC addresses arrived on which ports. A key characteristic of CAM tables is that the entries time out. If the switch receives a packet destined for a MAC that is not in its table, it acts like a hub and sends the packet out all[1] of the ports. Thus in normal operation, a small percentage of packets arrive at the switch after the CAM table entry has expired. In this case, the switch sends the packet out all of the ports. Usually this goes unnoticed because the packets are dropped at the adapter[2]. But the SEA has its REA (Real Ethernet Adapter) in promiscuous mode, so these packets come through the REA to the SEA, which bridges them to the appropriate VEA. Since the MAC is somewhere else and not a VEA on the CEC, PHYP drops the packet and the "Receiver Failures" counter increments.

The percentage of these should be very small: one percent or less.

2b) If the switch detects various errors, it can turn off the use of the CAM table and essentially act like a hub for all packets. The switch should log an error when this happens. One article that explains this is here.

2c) For a link from A to B that has a complex set of routers between them, if the router closest to A routes the packet through a different path than the router closest to B, this is called asymmetric routing. In this case, a particular switch along one of the paths will get packets going from A to B but will not see any packets going from B to A. Since it never sees packets with a particular MAC as the source, it cannot populate the CAM table for that MAC. In essence, it has no idea which physical port to send the packet out, so it again acts like a hub and sends it out all the ports. This has the same effect as 2b, but there will not be any errors in the switch's error log.
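The learning and flooding behavior described in 2a through 2c can be illustrated with a toy model. The sketch below is not how any real switch is implemented; it is a minimal simulation, assuming one CAM entry per source MAC and a single timeout value. The function name simulate_switch, the input format (time, ingress port, source MAC, destination MAC), and the timeout of 300 time units are all assumptions made for this example.

```shell
# Toy learning switch. Each input line: "time in_port src_mac dst_mac".
# $1 is the assumed CAM entry lifetime, in the same units as the time column.
simulate_switch() {
    awk -v ttl="$1" '
    {
        t = $1; port = $2; src = $3; dst = $4
        # Forward only if the destination was learned and has not timed out.
        if ((dst in cam_port) && (t - cam_time[dst] <= ttl))
            printf "t=%s %s->%s: forward out port %s\n", t, src, dst, cam_port[dst]
        else
            printf "t=%s %s->%s: flood\n", t, src, dst
        # Learn (or refresh) the source MAC on its ingress port.
        cam_port[src] = port; cam_time[src] = t
    }'
}

# A talks at t=0, B replies at t=1, A talks again at t=2 and at t=400.
printf '%s\n' '0 1 A B' '1 2 B A' '2 1 A B' '400 1 A B' | simulate_switch 300
```

The first packet is flooded because B has not been learned yet, and the last packet is flooded because the entry for B has aged out (case 2a). If the switch never sees a packet with B as the source, as in the asymmetric routing of case 2c, every packet from A to B is flooded.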

Line 3: This counter increases for a few reasons as well. In recent customer cases, the most frequent reason has been that a TCP packet with checksum offload enabled and destined for a multicast MAC address is being transmitted on the VEA. The next to the last paragraph on this page briefly describes this. The Microsoft load balancer generates such traffic, as do some security cameras.

Line 4: This counter increments when PHYP attempts to deliver a packet to the VEA but there are no buffers of the proper size to receive the packet. The buffers come in a range of sizes from tiny up to huge. When these errors are observed, support recommends a few steps.

The first is to check the entitlement of the LPAR to make sure it has enough CPU to cope with the highest load that can occur. The second is to adjust the allocation of the buffers. This technique is explained under "Adjusting the VEA buffers" below.

Line 6: Due to changes in the SEA, it is rare that line 6 will have a non-zero value. The counter increments for the same reason as line 3, but the VEA checks whether the VLAN tag is associated with the VEA. If it is, then the line 3 counter is incremented. If it is not, then the line 6 counter is incremented. The change to the SEA is that it no longer gives such packets to the VEA to transmit in the first place.

Adjusting the VEA buffers

These are the recommendations from support, which may differ from those of other groups. Support recommends that the min value be equal to the max value. The reason for this is to reduce the amount of "thrashing" caused by allocating and deallocating buffers. Setting min to max will help the overall performance of the LPAR.

For tiny and small, support recommends proactively setting both min and max to 4096.

For VEAs that have large send enabled, jumbo frames enabled, or both, the medium, large, and huge buffers need to be adjusted if the user still observes Hypervisor Receive Failures after the tiny and small sizes have been adjusted.

Reviewing the Lowest Registered values gives clues as to which sizes need to be adjusted. A size whose Lowest Registered value equals its Min Buffers value is a size for which this particular LPAR is not receiving packets. To rephrase: if the Min Buffers value in the Medium column equals its Lowest Registered value, then the LPAR is not receiving any packets of Medium size, and that size does not need to be adjusted.
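The comparison described above can be scripted. The following is a minimal sketch that compares Min Buffers with Lowest Registered for each size. The embedded excerpt is illustrative only: its numbers are invented, and its layout is an assumption modeled on the Receive Buffers portion of typical entstat output, so the parsing may need adjusting for a particular system. The function name check_buffers is hypothetical.

```shell
# Illustrative excerpt; the numbers are invented for this example and the
# layout is an assumption about the Receive Buffers table format.
sample='    Buffer Type              Tiny    Small   Medium    Large     Huge
    Min Buffers               512      512      128       24       24
    Max Buffers              2048     2048      256       64       64
    History
      Max Allocated           532      537      128       24       24
      Lowest Registered       502      509      128       24       24'

# For each buffer size, compare Min Buffers with Lowest Registered.
check_buffers() {
    awk '
        $1 == "Buffer" && $2 == "Type"       { for (i = 3; i <= NF; i++) name[i] = $i; n = NF }
        $1 == "Min" && $2 == "Buffers"       { for (i = 3; i <= NF; i++) min[i] = $i }
        $1 == "Lowest" && $2 == "Registered" { for (i = 3; i <= NF; i++) low[i] = $i }
        END {
            for (i = 3; i <= n; i++) {
                if (low[i] + 0 < min[i] + 0)
                    verdict = "receiving packets (candidate for adjustment)"
                else
                    verdict = "no packets of this size (no adjustment needed)"
                printf "%s: %s\n", name[i], verdict
            }
        }'
}

printf '%s\n' "$sample" | check_buffers
```

In this invented sample, only Tiny and Small dipped below their Min Buffers values, so only those two sizes would be candidates for adjustment.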

There is no simple, sure way to make these adjustments. Simply allocating the maximum value for each size for all VEAs can lead to the system running out of memory and not being able to configure some of the VEAs. Adjusting the sizes while the system is live is troublesome because the VEA must not be open and in use when its attributes are changed.

For calculation purposes, the sizes of the buffers are: tiny is 512 bytes, small is 2048 bytes, medium is 16384 bytes, large is 32768 bytes, and huge is 65536 bytes. From these sizes, the amount of memory consumed by the buffers can be calculated.
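As a worked example of that calculation, the sketch below multiplies a buffer count by the per-buffer size listed above. The function names buf_size and buf_mem are hypothetical helpers for this example. The result covers only the buffer payload for one VEA and assumes no additional per-buffer overhead.

```shell
# Buffer sizes in bytes, as listed above.
buf_size() {
    case "$1" in
        tiny)   echo 512 ;;
        small)  echo 2048 ;;
        medium) echo 16384 ;;
        large)  echo 32768 ;;
        huge)   echo 65536 ;;
        *)      echo "unknown buffer type: $1" >&2; return 1 ;;
    esac
}

# Memory in bytes consumed by COUNT buffers of the given TYPE.
# Usage: buf_mem TYPE COUNT
buf_mem() {
    size=$(buf_size "$1") || return 1
    echo $(( $2 * size ))
}

# Tiny and small both set to 4096, as support recommends.
tiny_bytes=$(buf_mem tiny 4096)
small_bytes=$(buf_mem small 4096)
echo "tiny:  $tiny_bytes bytes"
echo "small: $small_bytes bytes"
echo "total: $(( (tiny_bytes + small_bytes) / 1048576 )) MiB"
```

With tiny and small both at 4096, one VEA uses 2 MiB for tiny buffers and 8 MiB for small buffers, or 10 MiB in total. Multiplying by the number of VEAs shows how quickly maximum settings everywhere can add up.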

A common but not foolproof technique for adjusting the buffers is the following sequence of commands, shown here for the tiny and small sizes with ent12 as the example adapter:

chdev -P -l ent12 -a max_buf_tiny=4096
chdev -P -l ent12 -a min_buf_tiny=4096
chdev -P -l ent12 -a max_buf_small=4096
chdev -P -l ent12 -a min_buf_small=4096

Then reboot the system.

Note that max must be changed first. The chdev command fails if min is set to a value greater than the current max.

Because the -P option is used, the new values are only saved in the ODM database and do not take effect immediately. When the system comes back up after the reboot, the new values will be in use.
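When several adapters or sizes need the same treatment, the command sequence can be generated rather than typed. The sketch below only prints the chdev commands so that they can be reviewed before being run; it does not execute them. The function name gen_chdev is a hypothetical helper, and the max-before-min ordering follows the note above for the case where values are being raised.

```shell
# Print (do not run) the chdev commands for one adapter.
# Usage: gen_chdev ADAPTER VALUE SIZE...
gen_chdev() {
    adapter=$1; value=$2; shift 2
    for kind in "$@"; do
        # max first, so that min never exceeds max while values are raised
        echo "chdev -P -l $adapter -a max_buf_$kind=$value"
        echo "chdev -P -l $adapter -a min_buf_$kind=$value"
    done
}

gen_chdev ent12 4096 tiny small
```

The output for this call is the same four commands shown above. After the reboot, the values in effect can be reviewed with lsattr -El ent12.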

Footnotes:
[1] "all" in reference to all of the physical ports of a switch is actually an overstatement. Under most conditions, the packet will never be sent out the same port that it was received on. Also, if the packet is tagged, it will be sent out only those ports that either have the same PVID as the packet or have the tag listed in the allowed VLAN list for the port. Clearly, the packet will not be sent out any port that does not have link up.

[2] A real adapter normally filters out all packets except for packets destined for a small set of MAC addresses. This set includes the adapter's MAC address, any multicast MACs that the system has expressed interest in, and the broadcast address.

Author: Perry Smith
Team: wwnetk
Operating System: VIOS
Hardware: Power
Feedback: aix_feedback@wwpdl.vnet.ibm.com


Document Information

Modified date:
21 October 2021

UID

isg3T1026752