IBM Support

The native AIX largesend segmentation offload feature in a virtual Ethernet environment

Question & Answer


Question

What is the native AIX largesend offload feature in a virtual Ethernet environment and how can it be enabled?

Cause

IBM Power servers running logical partitions (LPARs) with the IBM AIX operating system can use Virtual Ethernet Adapters (VEAs).
VEAs offer high network packet throughput by using the Power Hypervisor to transfer data packets directly between LPARs.
Virtual Ethernet Adapters can be configured with a Maximum Transmission Unit (MTU) of up to 65390 bytes.
The computational overhead of sending and receiving a packet is nearly the same regardless of its size. Sending larger packets therefore yields higher throughput, lower CPU usage, or both, compared with standard Ethernet frames with an MTU of 1500 bytes.
Virtual I/O Servers (VIOS) provide Shared Ethernet Adapters (SEAs) that connect the virtual Ethernet to the external network components.
In most environments the virtual Ethernet MTU is set to the standard 1500 bytes to allow interoperability with external network components.
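
The effect of the MTU on the packet count can be estimated with a short calculation. The following Python sketch is illustrative only: it assumes IPv4 and TCP headers of 20 bytes each without options, so the payload per packet is the MTU minus 40 bytes. The name packets_needed is hypothetical and not part of any AIX tool.

```python
import math

# Assumption: 20-byte IPv4 header + 20-byte TCP header, no options.
IP_TCP_HEADER_BYTES = 40


def packets_needed(payload_bytes: int, mtu: int) -> int:
    """Return how many packets are needed to carry payload_bytes at the given MTU."""
    mss = mtu - IP_TCP_HEADER_BYTES
    if mss <= 0:
        raise ValueError("MTU is too small for the assumed IPv4/TCP headers")
    return math.ceil(payload_bytes / mss)


# Compare the standard Ethernet MTU with the largest virtual Ethernet MTU.
for mtu in (1500, 65390):
    print(mtu, packets_needed(1_000_000, mtu))
```

Because the per-packet overhead is roughly constant, the smaller packet count at the larger MTU is where the CPU savings come from.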

Answer

What is the native AIX largesend feature?

For TCP sessions between VEAs that are configured in the same virtual Ethernet switch (vswitch) and use the same virtual LAN ID (VLAN ID, PVID), the native AIX largesend feature makes it possible to send larger packets while keeping the configured IP interface MTU at a smaller value, such as the standard MTU of 1500 bytes.
Enabling largesend can improve the network throughput, reduce the CPU usage during a data transfer, or both. The Power Hypervisor forwards the packets to the destination VEA or, if the destination is not inside the same VLAN/vswitch, to the trunk VEA. The largesend feature is transparent to applications and to the external network components.
The segmentation of largesend packets into standard Ethernet MTU sized packets is offloaded to the real Ethernet adapter hardware. The checksum offloading feature of VEAs and real Ethernet adapters is therefore a prerequisite for largesend.
Important note:
The largesend feature is negotiated separately for every TCP session during the three-way handshake (SYN/SYN-ACK/ACK). Largesend is therefore only used for a session if both sender and receiver have the feature enabled.

How can largesend be enabled on an IP interface that was configured on a VEA?

1. Find the VEA adapter
# lsdev -Cc adapter | grep l-lan
ent0   Available  Virtual I/O Ethernet Adapter (l-lan)
2. Make sure that the chksum_offload device attribute is active
# lsattr -El ent0 -a chksum_offload
chksum_offload    yes       Enable Checksum Offload for IPv4 packets True
The chksum_offload attribute defaults to "yes". If it has been set to "no", it must be changed back to "yes".
-> To change the device attribute on the VEA ent0, the IP interface en0 must be detached first.
# ifconfig en0 detach
# chdev -l ent0 -a chksum_offload=yes
# mkdev -l en0
3. Enable largesend
The mtu_bypass IP interface attribute controls largesend:
# lsattr -El en0 -a mtu_bypass
mtu_bypass   off     Enable/Disable largesend for virtual Ethernet True
The mtu_bypass attribute can be changed dynamically, but the new value only applies to newly opened TCP sessions because largesend is negotiated during connection establishment:
# chdev -l en0 -a mtu_bypass=on
Alternatively, largesend can be enabled directly with the ifconfig command.
In this case the setting is not stored in the ODM database and is therefore lost after a reboot.
# ifconfig en0 largesend
4. Check the IP interface flags
# ifconfig en0
en0:  flags=1e084863,14c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 10.0.0.1 netmask 0xffffffc0 broadcast 10.0.0.255
         tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
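
The flag check in step 4 can also be scripted. The following Python sketch parses captured ifconfig output and reports whether both LARGESEND and CHECKSUM_OFFLOAD(ACTIVE) are present. The sample text is copied from the output above; the function names are hypothetical helpers, and the output layout on other AIX levels is assumed to match this sample.

```python
import re

# Sample copied from the ifconfig output shown in step 4.
SAMPLE_IFCONFIG = (
    "en0:  flags=1e084863,14c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,"
    "MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>\n"
    "        inet 10.0.0.1 netmask 0xffffffc0 broadcast 10.0.0.255\n"
    "         tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1\n"
)


def interface_flags(ifconfig_output: str) -> set:
    """Return the flag names listed between '<' and '>' on the flags line."""
    match = re.search(r"flags=[^<]*<([^>]*)>", ifconfig_output)
    return set(match.group(1).split(",")) if match else set()


def largesend_ready(ifconfig_output: str) -> bool:
    """True if the interface shows both checksum offload and largesend."""
    flags = interface_flags(ifconfig_output)
    return "LARGESEND" in flags and "CHECKSUM_OFFLOAD(ACTIVE)" in flags


print(largesend_ready(SAMPLE_IFCONFIG))
```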

How can largesend be disabled on an IP interface that was configured on a VEA?

The mtu_bypass attribute can be changed dynamically. Existing TCP sessions will stop using the largesend feature.
# chdev -l en0 -a mtu_bypass=off
Alternatively, largesend can be disabled directly with the ifconfig command.
In this case the setting is not stored in the ODM database and is therefore lost after a reboot.
# ifconfig en0 -largesend
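
Whether the persistent setting changed can be checked by reading the attribute back with "lsattr -El en0 -a mtu_bypass". The following Python sketch extracts the current value from such output. It assumes the column layout of the lsattr sample shown earlier in this document (attribute name first, value second); lsattr_value is a hypothetical helper name.

```python
# Sample line in the layout printed by "lsattr -El en0 -a mtu_bypass".
SAMPLE_LSATTR = "mtu_bypass   off     Enable/Disable largesend for virtual Ethernet True\n"


def lsattr_value(lsattr_output: str, attribute: str):
    """Return the current value of `attribute` from lsattr -El output, or None."""
    for line in lsattr_output.splitlines():
        fields = line.split()
        # First column is the attribute name, second column its current value.
        if len(fields) >= 2 and fields[0] == attribute:
            return fields[1]
    return None


print(lsattr_value(SAMPLE_LSATTR, "mtu_bypass"))
```

Note that this only reflects the ODM attribute. A change made with ifconfig alone is visible in the interface flags instead.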

How can largesend be enabled on a Shared Ethernet Adapter in a VIO Server?

When an SEA device is created from a real Ethernet network adapter and one or more VEAs, the largesend attribute is activated by default if the underlying Ethernet network adapter supports checksum offloading and large_send and has these device attributes enabled.
1. Log in to the VIO server as the padmin user
2. List the Shared Ethernet Devices
$ lsdev | grep Shared
ent3             Available   Shared Ethernet Adapter
3. Show the largesend attribute of the SEA device
$ lsdev -dev ent3 -attr | grep largesend
largesend       1        Enable Hardware Transmit TCP Resegmentation                                        True
4. Enable largesend
$ chdev -dev ent3 -attr largesend=1
The change is dynamic.
5. Check the device driver flags of the Shared Ethernet Adapter
$ netstat -cdlistats ent3|grep "<"
    < THREAD >
    < LARGESEND >
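
The driver flag check in step 5 can be automated in the same way. The following Python sketch collects the names printed between "<" and ">" in captured statistics output. The sample is copied from the output above; driver_flags is a hypothetical helper, and the flag format is assumed to match this sample.

```python
import re

# Sample copied from the filtered netstat -cdlistats output in step 5.
SAMPLE_SEA_FLAGS = "    < THREAD >\n    < LARGESEND >\n"


def driver_flags(stats_output: str) -> set:
    """Return the set of flag names that appear as '< NAME >' in the output."""
    return set(re.findall(r"<\s*([A-Z0-9_]+)\s*>", stats_output))


print("LARGESEND" in driver_flags(SAMPLE_SEA_FLAGS))
```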

How does the SEA handle largesend packets?

The largesend feature is negotiated during TCP connection establishment. If a TCP session is opened between a largesend enabled VEA and an external device, the packets go through the trunk VEA that is part of the SEA. The SEA then takes part in the largesend negotiation by altering the SYN/SYN-ACK packets.
When sending packets to external network components, a largesend enabled Shared Ethernet Adapter (SEA) in a VIO server passes the largesend packets to the network adapter, which segments the large packets into small packets in hardware. Each largesend packet carries the TCP Maximum Segment Size (MSS) coded into its TCP checksum header field.
When the largesend enabled SEA receives such a packet from the trunk VEA, it sends the packet to the real network adapter, which segments it into small packets according to the MSS value provided by the sending VEA.
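
The segmentation itself happens in adapter hardware, but its arithmetic can be illustrated in a few lines. The following Python sketch splits a largesend payload into MSS sized segments. The values 60816 and 1366 are taken from two unrelated sample outputs later in this document and are combined here only for illustration; segment_sizes is a hypothetical name.

```python
def segment_sizes(payload_bytes: int, mss: int) -> list:
    """Return the payload sizes of the segments produced for one largesend packet."""
    if mss <= 0:
        raise ValueError("MSS must be positive")
    full_segments, remainder = divmod(payload_bytes, mss)
    # All segments carry a full MSS except possibly the last one.
    return [mss] * full_segments + ([remainder] if remainder else [])


sizes = segment_sizes(60816, 1366)
print(len(sizes), sizes[-1])
```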
Hint:
Some virtual Ethernet device drivers for Linux LPARs hosted on IBM Power servers offer a legacy largesend function.
This Linux legacy largesend function does not negotiate largesend with the SEA devices. If the SEA device has largesend enabled, sending works.
If the SEA does not have largesend enabled, these packets are NOT handled as largesend packets by the SEA code but as ordinary large packets, and the SEA device fragments them in software. If a packet has the "Don't Fragment" (DF) bit set in the IP header, the packet is dropped and an ICMP error packet "Destination unreachable, fragmentation needed" is sent back to the sender.
The DF bit is often set in the IP header, for example when Path MTU discovery is enabled.
Fragmenting these largesend packets in software can cause throughput and load problems on the VIO servers, and dropping packets with the DF bit set can cause serious communication problems.
Both effects are visible in the SEA adapter statistics:
 
$ netstat -cdlistats ent3
...
Virtual Side Statistics:
    Packets received: 18468928
    Packets bridged: 18468764
    Packets consumed: 78255
    Packets fragmented: 48
    Packets transmitted: 23998741
    Packets dropped: 0
    Packets filtered(VlanId): 0
Other Statistics:
    Output packets generated: 1519703
    Output packets dropped: 0
    Device output failures: 0
    Memory allocation failures: 0
    ICMP error packets sent: 48
    Non IP packets larger than MTU: 0
    Thread queue overflow packets: 0
...
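
These counters can be monitored with a small script. The following Python sketch parses captured statistics output into a dictionary and flags non-zero "Packets fragmented" or "ICMP error packets sent" counters. The sample is an excerpt of the output above; the function names are hypothetical, and the flat dictionary is a simplification in which a counter name that occurs in several sections keeps only its last value.

```python
import re

# Excerpt copied from the SEA statistics shown above.
SAMPLE_SEA_STATS = """\
Virtual Side Statistics:
    Packets received: 18468928
    Packets fragmented: 48
    Packets dropped: 0
    Packets filtered(VlanId): 0
Other Statistics:
    ICMP error packets sent: 48
    Non IP packets larger than MTU: 0
"""


def parse_counters(stats_output: str) -> dict:
    """Map each 'name: number' line to an integer counter."""
    counters = {}
    for line in stats_output.splitlines():
        match = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def software_fragmentation_seen(counters: dict) -> bool:
    """True if the SEA fragmented packets in software or sent ICMP errors."""
    return (counters.get("Packets fragmented", 0) > 0
            or counters.get("ICMP error packets sent", 0) > 0)


print(software_fragmentation_seen(parse_counters(SAMPLE_SEA_STATS)))
```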

What happens if the VIO client LPARs are using largesend while the SEA has largesend disabled?

If a TCP session is opened between a largesend enabled VEA and an external device, the packets go through the trunk VEA that is part of the SEA. With an SEA that does not have largesend enabled, the largesend negotiation fails. TCP sessions through this SEA device therefore do not use the largesend feature.
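
The negotiation rules described in this document can be summarized as a simplified decision model. The following Python sketch is not how AIX or the SEA implements the negotiation; it only encodes the outcomes stated above, and all names are hypothetical.

```python
def session_uses_largesend(*, client_enabled: bool, peer_in_same_vlan: bool,
                           peer_enabled: bool = False,
                           sea_enabled: bool = False) -> bool:
    """Simplified model of whether a TCP session ends up using largesend."""
    if not client_enabled:
        return False
    if peer_in_same_vlan:
        # VEA to VEA in the same vswitch/VLAN: both endpoints must have it enabled.
        return peer_enabled
    # External peer: the session passes through the SEA, which must have largesend enabled.
    return sea_enabled


# Client LPAR with largesend talking to an external host through a non-largesend SEA.
print(session_uses_largesend(client_enabled=True, peer_in_same_vlan=False,
                             sea_enabled=False))
```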

How can I check if largesend packets are sent by a VIO client LPAR?

The netstat TCP protocol statistics show whether largesend packets have been sent.
$ netstat -p TCP |grep large
                33 large sends
                1048576 bytes sent using largesend
                60816 bytes is the biggest largesend
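
To track these counters over time, the three lines can be parsed into numbers. The following Python sketch does this for captured output and derives the average largesend size. The sample is copied from the output above; largesend_stats and its dictionary keys are hypothetical names.

```python
import re

# Sample copied from the filtered netstat output above.
SAMPLE_TCP_STATS = """\
                33 large sends
                1048576 bytes sent using largesend
                60816 bytes is the biggest largesend
"""

PATTERNS = {
    "large_sends": r"(\d+) large sends",
    "bytes_sent": r"(\d+) bytes sent using largesend",
    "biggest": r"(\d+) bytes is the biggest largesend",
}


def largesend_stats(netstat_output: str) -> dict:
    """Extract the largesend counters that are present in the output."""
    stats = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, netstat_output)
        if match:
            stats[key] = int(match.group(1))
    return stats


stats = largesend_stats(SAMPLE_TCP_STATS)
if stats.get("large_sends"):
    # Integer average of bytes per largesend packet.
    print(stats["bytes_sent"] // stats["large_sends"])
```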
The netstat socket listing shows for every TCP socket whether largesend has been negotiated successfully.
$ netstat -Aon
f1000e00002c73b8 tcp4       0      0  10.0.0.1.22       10.0.0.2.53019    ESTABLISHED

         so_options: (REUSEADDR|KEEPALIVE)
         so_state: (ISCONNECTED|PRIV|NBIO)
         timeo:0 uid:0
         so_special: (LOCKABLE|MEMCOMPRESS|DISABLE)
         so_special2: (PROC)
         sndbuf:
                 hiwat:262272 lowat:16384 mbcnt:0 mbmax:1049088
         rcvbuf:
                 hiwat:262272 lowat:1 mbcnt:0 mbmax:1049088
                 sb_flags: (SEL|NOTIFY)
         TCP:
         mss:1366  flags: (DELACK|NODELAY|RFC1323|SENT_WS|RCVD_WS|LARGESEND|VIRTUAL_LARGESEND|SENT_LS...
...|RCVD_LS|COPYFLAGS)
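
The per-socket check can be scripted as well. The following Python sketch extracts the TCP flags of one socket from captured "netstat -Aon" output and tests for the LARGESEND flag. It assumes, based on the sample above, that this flag marks a successful negotiation and that "..." only marks a wrapped line; tcp_flags is a hypothetical helper.

```python
import re

# Excerpt copied from the socket listing above, including the wrapped flags line.
SAMPLE_SOCKET = (
    "                 sb_flags: (SEL|NOTIFY)\n"
    "         TCP:\n"
    "         mss:1366  flags: (DELACK|NODELAY|RFC1323|SENT_WS|RCVD_WS|"
    "LARGESEND|VIRTUAL_LARGESEND|SENT_LS...\n"
    "...|RCVD_LS|COPYFLAGS)\n"
)


def tcp_flags(socket_listing: str) -> set:
    """Return the TCP flag names of the first socket in the listing."""
    # \b keeps "sb_flags:" from matching; [^)]* also spans the wrapped line.
    match = re.search(r"\bflags:\s*\(([^)]*)\)", socket_listing)
    if not match:
        return set()
    # Strip whitespace and the "..." wrap markers around each flag name.
    names = (token.strip(" .\t\n") for token in match.group(1).split("|"))
    return {name for name in names if name}


print("LARGESEND" in tcp_flags(SAMPLE_SOCKET))
```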

SUPPORT:

If additional assistance is required after completing all of the instructions provided in this document, please follow the step-by-step instructions below to contact IBM to open a case for software under warranty or with an active and valid support contract.  The technical support specialist assigned to your case will confirm that you have completed these steps.

a.  Document and/or take screen shots of all symptoms, errors, and/or messages that might have occurred

b.  Capture any logs or data relevant to the situation.

c.  Contact IBM to open a case:

   -For electronic support, please visit the IBM Support Community:
     https://www.ibm.com/mysupport
   -If you require telephone support, please visit the web page:
      https://www.ibm.com/planetwide/

d.  Provide a good description of your issue and reference this technote

e.  Upload all of the details and data to your case

   -You can attach files to your case in the IBM Support Community
   -Or Upload data to IBM testcase server analysis:

    http://www.ibm.com/support/docview.wss?uid=ibm10733581


Document Information

Modified date:
03 June 2021

UID

ibm10885512