
IBM Spectrum Scale Alert: Mellanox OFED 5.x considerations in IBM ESS V6.1.2.x+

Flashes (Alerts)


Abstract

This flash describes considerations for performing online upgrades (with RDMA, verbsRDMA enabled) in which the Mellanox OFED (MOFED) libraries are migrated to 5.x.

ESS versions 6.1.2.0 and greater migrate the Mellanox libraries to 5.x:
- ESS Power8 hardware includes MOFED 4.9.x with 5.x libraries.
- All other ESS hardware includes MOFED 5.x with 5.x libraries.

If you are running high-speed Ethernet or InfiniBand environments that do not have verbsRDMA enabled, online upgrade is fully supported without these considerations (you are not impacted by this flash).

Once all of your systems (ESS, remote, client, etc.) have been migrated to the MOFED 5.x libraries, online upgrade is fully supported without these considerations (you are not impacted by this flash).

Content

Problem Summary:

MOFED 4.9-3.1.5.3 is the only MOFED version supported for an online upgrade (with RDMA, verbsRDMA enabled) in which the libraries are migrated to MOFED 5.x.
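To confirm whether verbsRDMA is enabled in your cluster (and therefore whether this flash applies to you), you can query the Spectrum Scale configuration. A minimal check; the output shown is illustrative:

    [EMS]# mmlsconfig verbsRdma

        verbsRdma enable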

Upgrade Options:

1. Offline upgrade – Bring down all ESS, remote, client, etc. nodes, upgrade the ESS nodes to 6.1.2.x, and migrate all other nodes to the MOFED 5.x libraries.

2. Online upgrade with RDMA disabled – Disable verbsRDMA on all nodes until all nodes have migrated to the MOFED 5.x libraries (see the sketch after this list).

3. Online upgrade with RDMA enabled – Upgrade/downgrade all ESS nodes to 4.9-3.1.5.3, then migrate all remote, client, etc. nodes to the MOFED 5.x libraries, then upgrade the ESS nodes to 6.1.2.x or newer.

    - Once the ESS nodes are at 4.9-3.1.5.3, mixed libraries (pre-5.x and 5.x) are supported. However, do not leave the cluster in this mixed state any longer than necessary.
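For option 2, verbsRDMA can be disabled through mmchconfig. This is a minimal sketch; it assumes the change takes effect as the GPFS daemon is recycled on each node, so stagger the restarts to stay online, and re-enable with verbsRdma=enable once all nodes are on the MOFED 5.x libraries:

    [EMS]# mmchconfig verbsRdma=disable

    [NODE]# mmshutdown

    [NODE]# mmstartup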

ESS releases that include MOFED 4.9-3.1.5.3:
- 5.3.7.2
- 5.3.7.3
- 6.0.2.2 and greater for ESS 3000
- 6.0.2.2 for Power8 LE and ESS 5000
- 6.0.2.3 for Power8 LE and ESS 5000
- 6.1.1.2

Online upgrade with RDMA enabled

1. Upgrade ESS nodes to a minimum of 5.3.7.x or 6.0.2.3.

2. Verify the correct/supported MOFED version (4.9-3.1.5.3) on the ESS nodes. In the example output below, the nodes are at 4.9-4.1.1.1 and must be moved to 4.9-3.1.5.3 in step 3.

    [EMS]# mmdsh -N all "ofed_info -s"

        ems1-hs.gpfs.net:  MLNX_OFED_LINUX-4.9-4.1.1.1:
        gssio1-hs.gpfs.net:  MLNX_OFED_LINUX-4.9-4.1.1.1:
        gssio2-hs.gpfs.net:  MLNX_OFED_LINUX-4.9-4.1.1.1:

3. Manually upgrade/downgrade MOFED as needed on ESS Nodes.

    [NODE]# mmshutdown

    [NODE]# /sbin/ofed_uninstall.sh --force

    [NODE]# mount -o ro,loop [MLNX_OFED_LINUX-4.9-3.1.5.3-rhel7.9-...iso] /mnt

    [NODE]# cd /mnt

    UPGRADING   - [NODE]# ./mlnxofedinstall --add-kernel-support --force

    DOWNGRADING - [NODE]# ./mlnxofedinstall --add-kernel-support --without-fw-update --force

    [NODE]# dracut -f -v

    [NODE]# reboot
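After the node reboots, it is worth confirming that it is at the expected MOFED level before bringing GPFS back up (illustrative output; mmstartup is only needed if GPFS does not start automatically on boot):

    [NODE]# ofed_info -s

        MLNX_OFED_LINUX-4.9-3.1.5.3:

    [NODE]# mmstartup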

4. Upgrade all remote, client, etc. nodes to MOFED 5.x (MOFED libraries migrated to 5.x libraries).

5. Upgrade ESS to 6.1.2.x:

    - A workaround is needed when upgrading from 5.3.7.[0|1|2|3]: install the 5.3.7.4 version of gpfs.gss.tools on the ESS nodes before upgrading (contact support).
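To confirm which level of gpfs.gss.tools is installed before upgrading, a quick check (an rpm query only; the package name is taken from the step above):

    [NODE]# rpm -q gpfs.gss.tools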

Other considerations:

1. ESS upgrades to MOFED 5.x must be completed in a timely manner across the whole cluster; do not run with mixed MOFED library levels longer than necessary.

2. Adding ESS 6.1.2.x nodes into existing environments: You must upgrade existing ESS building blocks to ESS 6.1.2.x before adding new nodes into the environment. For example, if you are running ESS 6.1.1.2 and purchase a new building block (ESS 3500), it will come with ESS 6.1.4.x code. The existing ESS 6.1.1.2 environment must be upgraded to ESS 6.1.2.x (or newer) before integrating the new building block into the cluster. This allows the building blocks to communicate seamlessly with each other.

3. Client node considerations: Client nodes should migrate to MOFED 5.x during the process of upgrading to or integrating ESS 6.1.2.x. It is recommended that all nodes move to the RDMA core libraries. MOFED 5.x is preferred, but if compatibility issues exist (for example, with ConnectX-3 adapters), MOFED 4.9.x can still be used; migrating to the RDMA core libraries is still required.
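A hedged way to confirm that a node is already on the rdma-core based stack is to query the relevant packages (package names assume a RHEL-family distribution; adjust for your OS):

    [NODE]# rpm -q rdma-core libibverbs librdmacm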

4. ConnectX-3 Card Considerations

If you have ConnectX-3 cards and have set the ring buffer to its maximum, you are required to set the ring buffer back to the defaults (a reset sketch follows the identification steps below).

a. Identify whether you have ConnectX-3 adapters

[NODE]# lspci | grep "ConnectX-3"

    Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
    Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

b. Identify Ring Buffer Setting

[NODE]# ibdev2netdev

    mlx4_0 port 1 ==> enP3p1s0 (Up)
    mlx4_0 port 2 ==> enP3p1s0d1 (Down)

[NODE]# mmgetifconf

    lo 127.0.0.1 255.0.0.0
    enP3p9s0f0 192.168.45.20 255.255.255.0
    enP3p9s0f1 172.17.31.200 255.240.0.0
    enP3p9s0f2 9.42.166.20 255.255.255.0
    enP3p9s0f3 10.0.0.1 255.255.255.0
    bond0 10.1.0.10 255.255.255.0

[NODE]# ethtool -g enP3p1s0 | egrep "max|RX:|TX:|Cur"

    Pre-set maximums:
    RX: 8192
    TX: 8192

    Current hardware settings:
    RX: 8192
    TX: 8192

[NODE]# ethtool -g enP3p1s0d1 | egrep "max|RX:|TX:|Cur"

    Pre-set maximums:
    RX: 8192
    TX: 8192

    Current hardware settings:
    RX: 8192
    TX: 8192
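c. Set the ring buffer back to the defaults

A minimal sketch using ethtool -G; the 1024/1024 values below are an assumption, as the actual defaults vary by adapter and driver, so verify the correct defaults for your hardware before applying:

    [NODE]# ethtool -G enP3p1s0 rx 1024 tx 1024

    [NODE]# ethtool -G enP3p1s0d1 rx 1024 tx 1024

Re-run the ethtool -g commands from step b to confirm the current hardware settings match the intended defaults.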

Supporting documentation:

Migration to RDMA-Core

MOFED 4.9 - https://docs.nvidia.com/networking/display/MLNXOFEDv494170/Installing+Mellanox+OFED 

MOFED 5.x - https://docs.nvidia.com/networking/display/MLNXOFEDv551032/General+Support

[{"Type":"MASTER","Line of Business":{"code":"LOB26","label":"Storage"},"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"STHMCM","label":"IBM Elastic Storage Server"},"ARM Category":[{"code":"a8m50000000KzfeAAC","label":"Melanox, Ofed, Bonding"}],"Platform":[{"code":"PF016","label":"Linux"}],"Version":"6.1.2"}]

Document Information

Modified date:
08 November 2022

UID

ibm16574739