IBM Support

IBM PowerVM and AIX How To: Identifying and resolving common Ethernet adapter failures, SEA failures on VIOS, and AIX Etherchannel issues

How To


Summary

The purpose of this document is to list common issues and their solutions concerning Ethernet adapters with IBM PowerVM and AIX. This document covers dedicated (stand-alone) Ethernet adapters, Etherchannel, and SEA configurations.

Objective

This document discusses the following topics:
  • Link Down and Link Up errors on Ethernet devices
  • General Etherchannel Failure
  • Link Aggregation Control Protocol (LACP) Etherchannel Tips
  • General Shared Ethernet adapter (SEA) failures
  • Configure output options for unused Ethernet devices

Environment

IBM PowerVM (VIOS) and IBM AIX

Steps


Link Down and Link Up events

Tips:
A link down or link up event indicates a problem with the Ethernet device, the Small Form-Factor Pluggable (SFP) modules, the cables, or the switchport on the switch. Work with your network administrators to trace the event and verify that the switch logs match the AIX error log. In most cases, run the diag tool against the adapter port (by using the appropriate loopback plug) to verify that the Ethernet port is serviceable. The tool might return a message to replace the whole adapter. Otherwise, the SFPs or the cables might need to be replaced. In rare instances, the switchport on the switch has failed. Using the SMIT (System Management Interface Tool) fast path is the easiest method.
If you want to use the command line, use the following command syntax:
[Image: diag command example]
For the SMIT fast path, use:
[Image: SMIT fast path example]
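When you compare the AIX error log with the switch logs, it helps to have both in the same time format. The following is a minimal sketch, not an IBM-provided tool: it converts the MMDDhhmmYY timestamp printed in the errpt summary into a readable date. The function name errpt_ts_to_iso is hypothetical, and the sketch assumes that the two-digit year belongs to the 2000s.

```shell
# Convert an errpt summary timestamp (MMDDhhmmYY) to "YYYY-MM-DD hh:mm".
# Assumption: the two-digit year belongs to the 2000s.
# Prints nothing and returns 1 when the input is not exactly 10 digits.
errpt_ts_to_iso() {
    printf '%s\n' "$1" | awk '
        length($0) == 10 && $0 ~ /^[0-9]+$/ {
            printf "20%s-%s-%s %s:%s\n",
                substr($0, 9, 2), substr($0, 1, 2), substr($0, 3, 2),
                substr($0, 5, 2), substr($0, 7, 2)
            ok = 1
        }
        END { exit ok ? 0 : 1 }'
}
```

On AIX, the timestamps could be taken from the second column of the errpt summary for the affected adapter (for example, errpt -N ent0, where ent0 is a placeholder device name) and passed to the function one at a time.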

General Etherchannel Failure

Tips:
Etherchannel failures, where the member Ethernet ports are having link issues, need to be handled slightly differently. Standard Etherchannels do not use LACP (Link Aggregation Control Protocol). Logically assign these switchports to a port channel group so that the network administrators can manage them more easily and keep the port configuration standard. Etherchannels offer robust link failure protection, requiring only one good link to continue handling traffic.
As with individual port failures, work with your network administrators to determine whether the failure is network or hardware based.
Another version of the Etherchannel is the Network Interface Backup (NIB). This version of the standard Etherchannel has Ethernet adapters in the primary (active) channel and in the backup (inactive) channel. A network address to ping is required for this Etherchannel to fail over properly. Alternatively, you can use the poll uplink option in place of the network address to ping, because many secure environments treat the constant ping sent by the Etherchannel to the specified address as a DDoS attack. Without the network address to ping or the poll uplink feature in use, the NIB code cannot determine whether the primary link is down, and so cannot fail the Etherchannel over to the backup adapter. If a NIB Etherchannel has issues failing over from "Primary" to "Backup", the issue is either the "netaddr" value in the Etherchannel configuration or the poll uplink feature, which needs to be turned on.
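The failover rule described above can be sketched as a small check. This is an illustration only: the function name nib_detection_mode is hypothetical, and the sketch assumes that an unset "netaddr" shows up as 0 (or empty) and that the poll uplink setting is reported as yes or no.

```shell
# Report how a NIB Etherchannel can detect a failed primary link.
# $1 = netaddr value of the Etherchannel (assumed "0" or empty when unset)
# $2 = poll uplink setting (assumed "yes" or "no")
# Prints "ping", "poll_uplink", or "none".
nib_detection_mode() {
    if [ -n "$1" ] && [ "$1" != "0" ]; then
        echo "ping"
    elif [ "$2" = "yes" ]; then
        echo "poll_uplink"
    else
        echo "none"
    fi
}
```

A result of "none" corresponds to the case where the NIB code has no way to decide that the primary link is down. On AIX, the current "netaddr" value could be read with lsattr -El entX -a netaddr, where entX stands for the Etherchannel device.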

LACP Etherchannel issues

Tips:
802.3ad Etherchannels use LACP. This setting requires the Ethernet adapter ports to generate LACP data units (LACPDUs) and exchange them with the switch to ensure that aggregation is in sync and taking place. The following output is from an adapter that is in sync.
Here is the configuration output from an 802.3ad Etherchannel with two 40 Gb adapters.
[Image: Etherchannel configuration output]
Here, ent22 and ent23 are configured to send LACPDUs to the switch. The following part of the "entstat -d" command output verifies that the links on the Etherchannel are in sync and are sending and collecting data.
[Image: LACP section of the entstat -d output]
Errors can occur with an LACP Etherchannel when adapter ports fall out of sync. This condition can happen in a number of ways:
1.  A link down event
2.  A switchport is removed from the Etherchannel
3.  The switchport mode on the switch is not set properly for an LACP-enabled port channel
For a link down event, after the LACP sync expires (90 seconds), you need to bounce (restart) the Ethernet adapter either from the switch or from AIX. Use the AIX SMIT Etherchannel fast path to remove and re-add adapters in the Etherchannel to restart the LACP negotiation.
If the switchport is removed from the Etherchannel (port channel), LACP is no longer sent to the AIX adapter, and it falls out of sync. Re-add this switchport to the port channel to resume LACP sync.
The last possibility is that the switchport is set for a port channel, but the mode is set incorrectly on the switch. The AIX adapter expects LACP from the switch, but the switch does not know to send LACP to the AIX adapter. This condition causes the AIX 802.3ad Etherchannel to be out of sync. Ensure that your network administrators set the port channel mode to either active or passive so that LACP is used.
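To spot ports that have fallen out of sync, the LACP section of the "entstat -d" output can be summarized with a short filter. This is a rough sketch under assumptions: the function name lacp_sync_summary is hypothetical, and it assumes that the output contains lines of the form "Synchronization: IN_SYNC" for ports that are in sync. Check the exact wording on your system before relying on it.

```shell
# Summarize LACP synchronization lines read from standard input.
# Assumption: each relevant line looks like "Synchronization: <STATE>",
# and <STATE> is IN_SYNC when the port is in sync.
# Prints "<bad> of <total> sync entries not IN_SYNC".
lacp_sync_summary() {
    awk '
        $1 == "Synchronization:" {
            total++
            if ($2 != "IN_SYNC") bad++
        }
        END { printf "%d of %d sync entries not IN_SYNC\n", bad, total }'
}
```

A possible use is entstat -d entX | lacp_sync_summary, where entX stands for the 802.3ad Etherchannel device. Any count above zero suggests working through the three causes listed above with your network administrators.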
*SPECIAL NOTE*
You can configure an 802.3ad Etherchannel with primary and backup adapters. If you want to create this type of LACP-enabled Etherchannel, a different port channel is required for the backup adapters. Backup adapters with the same operational key as the primary adapters are not allowed to handle traffic. Ensure that the primary channel adapters are members of port channel A and the backup adapters are members of port channel B. This limitation does not apply when a port channel with the same number is created on a different switch, because that port channel has a different operational key.

General Shared Ethernet adapter (SEA) failures

Tips:
Without a physical adapter to move the data to the network, your SEA might enter the LIMBO state. The current condition can be shown with the following command:
[Image: command that shows the SEA state]
The SEA state output is one of: "Primary", "Backup", "PRIMARY_SH", "BACKUP_SH", or "Limbo". If your SEA is constantly flapping or swapping from primary to backup, troubleshoot the underlying adapter issue. Start by removing the troublesome SEA from rotation with this command:
[Image: command that places the SEA in standby]
Now check the adapters and the network to resolve the flapping issue. Once the issue is resolved, return the SEA to service with:
[Image: command that returns the SEA to service] (or sharing, if you are using the SEAs in a sharing mode)
An SEA in a traditional primary-backup configuration is easy to switch over with minimal downtime for your clients. If you do not have "failover" enabled, IBM cannot guarantee minimized outage time.
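The standby and return-to-service steps can be sketched as a helper that only prints the command to run, so that it can be reviewed first. This is an illustration under assumptions, not the exact commands from the original screenshots: the function name sea_ha_mode_cmd and the device name ent5 are hypothetical, and the sketch assumes the VIOS padmin form of chdev with the ha_mode attribute values standby, auto, and sharing.

```shell
# Print (do not run) the VIOS command that changes the SEA failover mode.
# $1 = SEA device name (for example ent5, a placeholder)
# $2 = target mode: standby (remove from rotation), auto or sharing (return to service)
# Assumption: the padmin chdev syntax and the ha_mode attribute apply to your SEA.
sea_ha_mode_cmd() {
    case "$2" in
        standby|auto|sharing)
            echo "chdev -dev $1 -attr ha_mode=$2" ;;
        *)
            echo "unsupported ha_mode: $2" >&2
            return 1 ;;
    esac
}
```

For example, sea_ha_mode_cmd ent5 standby prints the command that takes the SEA out of rotation, and sea_ha_mode_cmd ent5 auto prints the one that returns it to service. Printing instead of executing keeps the change deliberate on a production VIOS.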
*NOTE*
If you need to replace the real adapter of the SEA with another adapter, delete the SEA and re-create it with the new adapter. Merely changing the real adapter in the SEA can work, but in rare instances the forced dynamic change can affect the ability of AIX to detect the active link speed.

Configure output options for unused adapters

Tips:
As AIX progresses, new enhancements are provided. Some of these enhancements change the behavior of the adapter by design, which requires system administrators to adjust how they manage these adapters. With the latest network adapter code, leaving unused adapters in an unconfigured state can affect commands and system performance. Monitoring software running on your VIOS or AIX logical partition (for example nmon, topas, perfprovider, or IBM Tivoli Monitor, to name a few) checks the interface and brings it up and then down, which logs a link down error. You might see performance degradation because these checks require memory and CPU. Many unused adapters can cause system crashes, high network latency, high CPU usage, or slow applications.
The first and easiest way to be rid of this problem is to remove adapters that are not being used. If there are extra adapters assigned, remove them and assign them elsewhere. Alternatively, you can use wrap plugs, cable the adapter, or set a dummy IP. A dummy IP forces the link to report as up and alleviates the slowness you could encounter with unused adapters.
If you want to use a dummy IP, use the following commands as an example:
[Image: dummy IP configuration commands]
Setting the network address to a nonroutable IP address can be a safe way to force the interface into the up state without creating an access point that leaves your network. If you do not want to assign an IP, routable or not, you could force the adapter into the up state with:
[Image: command that forces the adapter up]
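The dummy IP approach can be sketched as a helper that refuses routable addresses and prints the command instead of running it. This is a sketch under assumptions: the function names, the interface name en5, and the example address are hypothetical, and the sketch assumes the usual AIX chdev attributes netaddr, netmask, and state on the interface. "Nonroutable" is taken here to mean the RFC 1918 private ranges.

```shell
# Return 0 when the address starts with an RFC 1918 private prefix.
# Note: this is a prefix check only, it does not validate the full address.
is_private_ipv4() {
    case "$1" in
        10.*|192.168.*) return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print (do not run) a chdev command that assigns a dummy IP to an interface.
# $1 = interface (for example en5, a placeholder), $2 = address, $3 = netmask
dummy_ip_cmd() {
    if is_private_ipv4 "$2"; then
        echo "chdev -l $1 -a netaddr=$2 -a netmask=$3 -a state=up"
    else
        echo "refusing routable address: $2" >&2
        return 1
    fi
}
```

For example, dummy_ip_cmd en5 192.168.255.5 255.255.255.0 prints a command for review. If no IP is wanted at all, ifconfig en5 up is the kind of command that brings the interface up, although that setting does not survive a restart.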
Alternatively, you could also set the adapter to the Defined state:
[Image: command that sets the adapter to the Defined state]
However, anytime you restart the system, or if someone runs the cfgmgr command, the adapter returns to the Available state and the issue returns.
The last option is the ENTSTAT_MODE environment variable. With this setting, you can stop tools such as entstat and netstat -v from opening and closing unused physical adapters that are not already open. This stops the link down messages in the error log and improves your CPU usage.
The ENTSTAT_MODE variable can be set with the export command, in the ".profile" of a specific user, system wide in "/etc/profile", or from within a shell script.
There are three possible settings:
closed.ignore = Ignore Ethernet devices that are closed.
closed.message = Print "Device entX is closed." to stdout.
closed.error = Print "Device entX is closed, errno = ENETDOWN (69)." to stderr.
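The following examples demonstrate how the different values work.
To set closed.ignore, use this command:
[Image: command that sets closed.ignore]
Now verify with:
[Image: verification command]
No message or error is displayed.
To set closed.message, use this command:
[Image: command that sets closed.message]
Now verify with:
[Image: verification command and its message output]
To set closed.error, use this command:
[Image: command that sets closed.error]
Now verify with:
[Image: verification command and its error output]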
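Because ENTSTAT_MODE accepts only the three values listed above, a small wrapper can guard against typos before the variable is exported. This is a minimal sketch: the function name set_entstat_mode is hypothetical and is not part of AIX.

```shell
# Export ENTSTAT_MODE only when the value is one of the three documented settings.
# Returns 1 and leaves the variable unchanged for any other value.
set_entstat_mode() {
    case "$1" in
        closed.ignore|closed.message|closed.error)
            ENTSTAT_MODE=$1
            export ENTSTAT_MODE ;;
        *)
            echo "unknown ENTSTAT_MODE value: $1" >&2
            return 1 ;;
    esac
}
```

Calling set_entstat_mode closed.ignore in a ".profile" or at the top of a monitoring script has the same effect as export ENTSTAT_MODE=closed.ignore. Running entstat against a closed adapter afterward should show the behavior described for that setting.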

Additional Information

SUPPORT

If you require more assistance, use the following step-by-step instructions to contact IBM to open a case for software with an active and valid support contract.  

1. Document (or collect screen captures of) all symptoms, errors, and messages related to your issue.

2. Capture any logs or data relevant to the situation.

3. Contact IBM to open a case:

   -For electronic support, see the IBM Support Community:
     https://www.ibm.com/mysupport
   -If you require telephone support, see the web page:
      https://www.ibm.com/planetwide/

4. Provide a clear, concise description of the issue.

 - For guidance, use this link: Working with IBM AIX Support: Describing the problem.

5. If the system is accessible, collect a system snap, and upload all of the details and data for your case.

 - For guidance, use this link: Working with IBM AIX Support: Collecting snap data


Additional Information

Author: Eduardo D. Garza Jr.
Reviewers: Roy G. Spencer, Roger Leuckie
Contributors: Darshan Patel, Saania Khanna

Document Location

Worldwide

Operating System

System x:AIX


Document Information

Modified date:
20 October 2022

UID

ibm16493885