IBM Support

Why does the link of PCIe3 2 PORT 25/10/1 Gb NIC&ROCE SFP28 ADAPTER (feature code EC2U) take a long time to come up or not come up at all?

Question & Answer


Question

Why do the links of ent4 and ent7 in the following setup take a long time to come up, or not come up at all?
System:
Model              : 9040-MR9
Firmware        : FW950.40
ent_capacity  : 2.00
VIOS Level      : 3.1.3.14
 
Network Adapter:
SEA ent20
      Real Adapter: ent19 - Etherchannel - mode: 8023ad - hash_mode: src_dst_port
           Primary Adapter:
                ent4 P2-C5-T1 PCIe3 2 PORT 25/10/1 Gb NIC&ROCE SFP28 ADAPTER (b315151014101e06)
                ent7 P1-C5-T2 PCIe3 2 PORT 25/10/1 Gb NIC&ROCE SFP28 ADAPTER (b315151014101e06)
           Backup Adapter: None
SEA ent15
      Real Adapter: ent14 - Etherchannel - mode: 8023ad - hash_mode: src_dst_port
           Primary Adapter:  
               ent5 P2-C5-T2 PCIe3 2 PORT 25/10/1 Gb NIC&ROCE SFP28 ADAPTER (b315151014101e06)
               ent6 P1-C5-T1 PCIe3 2 PORT 25/10/1 Gb NIC&ROCE SFP28 ADAPTER (b315151014101e06)
           Backup Adapter: None
ent4/ent5/ent6/ent7 details:
        Part Number:                                     01FT751
        EC Level:                                             P14620
        FRU Number:                                     01FT753
        Feature Code/Marketing ID:           EC2U
        ROM Level (alterable):                     001400311014
        media_speed:                                    Auto_Negotiation
        Physical Port Negotiated Speed:   25Gbps Full Duplex
       • These adapters are made by Mellanox (now Nvidia).
       • If lsdev -Cc adapter and lsattr -El entX show a device ID that starts with b315, the adapter is a Mellanox adapter.
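The prefix check described above can be sketched as a small shell helper. This is only an illustration: is_mellanox_id is a hypothetical function name, not an AIX command, and the device ID is assumed to have been copied by hand from the adapter description shown for ent4.

```shell
# Hypothetical helper (not an AIX command): succeeds when the given
# device ID begins with the b315 prefix.
is_mellanox_id() {
  case "$1" in
    b315*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Device ID as listed for ent4 in this setup.
if is_mellanox_id "b315151014101e06"; then
  echo "Mellanox adapter"
fi
```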
  
Network Switch:
       • All 4 adapters are connected to 25 Gb ports on the Cisco Nexus 9K in the ACI setup.
      
Problem Symptoms:
       • Occasionally, the links of ent4 and ent7 take up to 1 hour to come up, or do not come up at all, when the VIOS
         reboots or the cable is disconnected and reconnected.
       • Links of ent5 and ent6 always come up promptly even though they are of the same type as ent4 and ent7.

Answer

 The following actions were taken to resolve the problem:
 • Checked the cable and SFP; both were fine.

 • Replaced the cable and SFP, but that did not help.

 • Discussion with Cisco revealed that the switch port connected to the Mellanox adapter needs special tuning.
   Mellanox adapters and switches use a low-frequency communication method for auto-negotiation during
   the link-up process. Some switches have compatibility issues and do not support this low-frequency
   communication in their hardware. To prevent the switch port from locking onto the negotiation signal,
   Cisco Nexus 9000 switches provide a dfe-tuning-delay command. With it, the port starts locking to the
   signal only after a predefined delay, so it does not try to lock onto the low-frequency signal.
   The signal paths on the switch PCB that connect each switch port to the chip inside the switch are not
   all the same length. Depending on the signal path, only certain ports on the switch require this tuning.
    dfe-tuning-delay is not set by default. Its range is 0 to 10000 ms, and it can be set per port.
    Setting dfe-tuning-delay to 1500 ms on the switch ports connected to ent4 and ent7 resolved the problem.
    After the change, the links came up promptly and consistently.
    Apply this tuning only where it is needed, because not all Mellanox adapters and switch ports require it.
    For more information about dfe-tuning-delay, please contact Cisco support.
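
As a rough sketch of the per-port setting and its 0 to 10000 ms range, the shell function below prints candidate configuration lines for review. It does not connect to a switch. The function name gen_dfe_config and the interface names are hypothetical, and the "interface ... / dfe-tuning-delay ..." layout is an assumption about how the command is entered. Confirm the exact syntax with Cisco, particularly in an ACI setup where the setting might be applied by other means.

```shell
# Hypothetical helper: prints candidate lines only, never touches a switch.
# Usage: gen_dfe_config <delay_ms> <interface> [<interface> ...]
gen_dfe_config() {
  if [ "$#" -lt 2 ]; then
    echo "usage: gen_dfe_config <delay_ms> <interface>..." >&2
    return 1
  fi
  delay=$1
  shift
  # Reject anything that is not a whole number.
  case "$delay" in
    ''|*[!0-9]*)
      echo "delay must be a whole number of milliseconds" >&2
      return 1
      ;;
  esac
  # The range stated in this document is 0 to 10000 ms.
  if [ "$delay" -gt 10000 ]; then
    echo "delay must be between 0 and 10000 ms" >&2
    return 1
  fi
  # One stanza per port, because the value is set per port.
  for port in "$@"; do
    printf 'interface %s\n  dfe-tuning-delay %s\n' "$port" "$delay"
  done
}

# Example with placeholder interface names (assumed, not from this case).
gen_dfe_config 1500 "ethernet 1/1" "ethernet 1/2"
```

Only the ports that actually show the slow link-up symptom would be listed, in line with the advice to use this tuning only where needed.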
Author: Darshan Patel
Platform: AIX and VIOS on Power
Feedback: aix_feedback@wwpdl.vnet.ibm.com
     


Document Information

Modified date:
13 April 2023

UID

ibm16967619