Setting up the SMC support

6.6 | LPAR mode | z/VM guest | KVM guest

SMC traffic requires two associated network interfaces: an interface for a traditional TCP/IP connection and an interface for an SMC-capable device.

Any network interface that can reach the communication peer can provide the TCP/IP connection, including HiperSockets interfaces and interfaces of OSA-Express or RoCE Express adapters. The SMC-capable devices are ISM devices for SMC-D or PCI functions of RoCE Express adapters for SMC-R.

How to associate network interfaces for SMC connections depends on your version of SMC-D or SMC-R. Issue an smcd info or smcr info command to display the supported versions.

In the following example, both the hardware and software support SMC-Dv2 and SMC-Rv2 as well as SMC-Dv1 and SMC-Rv1.
# smcr info
Kernel Capabilities
SMC Version: 2.0
SMC Hostname: t8345009.lnxne.boe
SMC-D Features: v1 v2
SMC-R Features: v1 v2

Hardware Capabilities
SEID: IBM-SYSZ-ISMSEID000000002E488561
ISM:  v1 v2
RoCE: v1 v2

For SMC-Dv2, you need an IBM z15®, IBM® LinuxONE III, or later hardware system. The smcd info command must list v2 for the SMC-D Features and for ISM.

For SMC-Rv2, your SMC-capable network adapter must be RoCE Express2 or later. The smcr info command must list v2 for the SMC-R Features and for RoCE.
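
The version checks above can also be scripted. The following C sketch scans captured smcd info or smcr info output for the v2 entries. The function names lists_version, smcd_v2_ready, and smcr_v2_ready are hypothetical and not part of smc-tools, and the parsing assumes that the output uses the labels shown in the preceding example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if some line of `info` starts (after optional indentation)
 * with `label` and lists `version` as a blank-separated token, else 0.
 * Assumption: the label layout matches the example output above. */
int lists_version(const char *info, const char *label, const char *version)
{
    size_t label_len, ver_len;
    const char *line = info;

    assert(info != NULL && label != NULL && version != NULL);
    label_len = strlen(label);
    ver_len = strlen(version);

    while (line != NULL && *line != '\0') {
        const char *nl = strchr(line, '\n');
        const char *stop = nl ? nl : line + strlen(line);
        const char *p = line;

        while (p < stop && (*p == ' ' || *p == '\t'))
            p++;                              /* skip indentation */
        if ((size_t)(stop - p) >= label_len &&
            strncmp(p, label, label_len) == 0) {
            p += label_len;
            while (p < stop) {
                const char *tok;

                while (p < stop && (*p == ' ' || *p == '\t'))
                    p++;                      /* skip separators */
                tok = p;
                while (p < stop && *p != ' ' && *p != '\t')
                    p++;                      /* walk over one token */
                if ((size_t)(p - tok) == ver_len &&
                    strncmp(tok, version, ver_len) == 0)
                    return 1;
            }
        }
        line = nl ? nl + 1 : NULL;            /* advance to the next line */
    }
    return 0;
}

/* SMC-Dv2: v2 must be listed for both the SMC-D Features and ISM. */
int smcd_v2_ready(const char *info)
{
    return lists_version(info, "SMC-D Features:", "v2") &&
           lists_version(info, "ISM:", "v2");
}

/* SMC-Rv2: v2 must be listed for both the SMC-R Features and RoCE. */
int smcr_v2_ready(const char *info)
{
    return lists_version(info, "SMC-R Features:", "v2") &&
           lists_version(info, "RoCE:", "v2");
}
```

Pass the complete command output as one string. For the example output above, both smcd_v2_ready and smcr_v2_ready return 1.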

Setting up connections with SMC-Dv1 or SMC-Rv1

With SMC-Dv1 or SMC-Rv1, use physical network (PNET) IDs to associate network interfaces for TCP/IP and for ISM devices or RoCE Express PCI functions. If these interfaces have the same PNET ID, they are connected to the same physical network and can be used together for SMC.

LPAR and z/VM®
For Linux® in LPAR mode and for Linux on z/VM, you can assign PNET IDs to OSA, HiperSockets, RoCE, and ISM devices through the IOCDS.
Figure 1 illustrates how the IOCDS assigns the PNET ID NET1 to an SMC-capable device and a network interface for an Ethernet device. In Linux, the matching PNET ID associates the ISM device with the Ethernet device.
Figure 1. PNET ID and SMC device association
The PNET ID of the Ethernet device and of the SMC-capable device must be the same

As a fallback, you can also use a software PNET table that maps network interfaces to PCI functions of RoCE Express adapters. For more information about PNET tables, see the KVM information that follows.

KVM
For SMC-R on Linux on KVM, you need a software PNET table that maps network interfaces of TCP/IP connections to those of PCI functions of RoCE Express adapters. Use the smc_pnet command to create a physical network (PNET) table with this mapping (see smc_pnet - Create network mapping table).
Note: z/OS® does not support the RoCE Express adapter as an IP device, and therefore uses OSA adapters for the initial handshake for SMC-R connections. Linux has no such constraint.

Setting up connections with SMC-Dv2 or SMC-Rv2

Unlike SMC-Dv1 and SMC-Rv1, SMC-Dv2 and SMC-Rv2 support connections across IP subnets.

How to associate the TCP/IP network interfaces and SMC-capable devices that can reach a communication peer is different for SMC-Dv2 and SMC-Rv2.
SMC-Dv2
Unlike SMC-Dv1, SMC-Dv2 does not require PNET IDs to explicitly associate the interfaces, but any PNET IDs that are set must not contradict the association. If set for both interfaces, the PNET ID must be the same, which also enables the fallback to SMC-Dv1. This fallback would otherwise not be available, and it is required when connecting to peers that support SMC-Dv1 only.
SMC-Rv2
Like SMC-Rv1, SMC-Rv2 requires PNET IDs to explicitly associate the interfaces.

SMC traffic is constrained by enterprise IDs (EIDs), which are assigned at the operating system level. Operating system instances that share an EID constitute a group that, with associated interfaces of TCP/IP and SMC-capable devices in place, can exchange SMC traffic. You can use EIDs to establish groups that are isolated from one another with respect to SMC. This isolation can separate operating system instances for data privacy. It can also prevent SMC-R connections between peers that are geographically or topologically too distant for efficient RDMA traffic.

EIDs apply to both SMC-Dv2 and SMC-Rv2. With SMC-D already limited to traffic within a hardware system, EIDs are useful mainly for SMC-Rv2.

An EID can be pre-defined in the hardware system or it can be user-defined.
System-defined EID
The unique system-defined EIDs of IBM Z® and IBM LinuxONE hardware systems are relevant to SMC-Dv2. Operating system instances with the same system-defined EID run on the same hardware system and are eligible to exchange SMC-Dv2 traffic.

By default, Linux instances use the system-defined EID. With the smcd seid command, you can disable or enable the system-defined EID (see smcd - Display information about SMC-D link groups and devices).

In contrast, z/OS disables the system-defined EID by default. The system-defined EID is enabled or disabled through a configuration parameter; see z/OS Communications Server: IP Configuration Guide.

With user-defined EIDs you can restrict SMC traffic to groups of operating system instances.

User-defined EIDs
User-defined EIDs are relevant to both SMC-Dv2 and SMC-Rv2, and the same user-defined EIDs apply to both SMC variants.

Assign user-defined EIDs to set up groups of operating system instances that are eligible for SMC traffic within the groups. For SMC-Rv2, user-defined EIDs can span multiple hardware systems.

If EIDs are used to group operating system instances that are geographically close, guests of the same z/VM system can all share an EID. Similarly, for SMC-Rv2 traffic, KVM guests on the same KVM host often have the same EID.

A Linux instance can have up to four EIDs, and so be a member of up to four groups. It is then eligible for SMC traffic with operating system instances in each group.

You can use the smcd ueid command or the smcr ueid command to manage user-defined EIDs (see smcr - Display information about SMC-R and smcd - Display information about SMC-D link groups and devices).

Instances of Linux on IBM Z or IBM LinuxONE have at least one active EID.
  • You cannot disable the system-defined EID unless at least one user-defined EID is assigned.
  • Deleting the last user-defined EID automatically enables the system-defined EID.
Figure 2 shows an example with three Linux instances on an IBM Z system. For all instances, the system-defined EID is enabled. With IP connectivity and eligible ISM devices in place, all instances can exchange SMC-Dv2 traffic across IP subnets.
Figure 2. SMC-Dv2 with system-defined EID
IBM Z with two subnets connected through an IP router, and three Linux instances that all use the SEID.
In Figure 3, two of the Linux instances disabled their system-defined EID and use a matching user-defined EID instead. With this setup, only the instances with matching user-defined EIDs can exchange SMC-Dv2 traffic (Linux 1 and Linux 3 in the example).
Figure 3. SMC-Dv2 with user-defined EIDs
Above graphic with two Linux instances using a matching UEID instead of the SEID.
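
The grouping rule behind Figures 2 and 3 can be expressed as a simple check: two instances are eligible for SMC traffic with each other only if they have at least one active EID in common. The following C sketch models that rule only. The function name share_eid and the EID strings in the usage note are hypothetical, and the sketch does not query or configure a real system.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the two instances have at least one active EID in common,
 * else 0. `a` and `b` hold the active EIDs (system-defined or
 * user-defined) of the two operating system instances. */
int share_eid(const char *const a[], size_t na,
              const char *const b[], size_t nb)
{
    size_t i, j;

    assert(a != NULL && b != NULL);
    for (i = 0; i < na; i++)
        for (j = 0; j < nb; j++)
            if (strcmp(a[i], b[j]) == 0)
                return 1;     /* common EID found: same group */
    return 0;                 /* no common EID: isolated from each other */
}
```

For the setup in Figure 3, with the placeholder EIDs "EXAMPLE-UEID" for Linux 1 and Linux 3 and "EXAMPLE-SEID" for Linux 2, the check succeeds for Linux 1 and Linux 3 and fails for either of them paired with Linux 2.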

If Linux instances with matching user-defined EIDs are connected through RoCE Express adapters, the connection can be SMC-Rv2 instead of SMC-Dv2. Because SMC-D is more performant than SMC-R, SMC-D is used if the prerequisites for both options are in place.

SMC-R connections can span both IP subnets and hardware systems, as illustrated in Figure 4.
Figure 4. SMC-R across IP subnets and hardware systems
The graphic shows two subnets on different IBM Z systems. An external network connects the two subnets and two RoCE adapters, one on each hardware system.

In the example, Linux 1 and Linux 4 can exchange SMC-R traffic, assuming that PNET IDs associate the TCP/IP interface and the SMC-R capable interface on both Linux 1 and Linux 4.

Network device settings for SMC-R

On the network device that is associated with the RoCE Express PCI function that you want to use for SMC traffic, check the settings with the ethtool command and ensure that pause settings are turned on.

For example, if eno3 is the network device that is associated with the PCI function that you want to use:
# ethtool -a eno3
Pause parameters for eno3:
Autonegotiate: off
RX: on
TX: on
RoCE Express PCI functions provide both interfaces for SMC-R RDMA traffic and Ethernet interfaces for TCP traffic. To use a PCI function as a failover device for RDMA, the Ethernet interface must be active but must not permit any traffic. The following example shows how to achieve this condition with the ip command. For a persistent configuration, use the network manager of your distribution.
  1. Set up a link mylnk_eth0 for an interface eth0.
    # ip link add dev mylnk_eth0 link eth0
    To set up the link in the context of a VLAN, append the VLAN specifications to this command. For example, for a VLAN with ID 661, the command becomes:
    # ip link add dev mylnk_eth0 link eth0 type vlan id 661
  2. Assign an IP address to the link.
    # ip addr add 10.2.1.1/16 dev mylnk_eth0
  3. Activate the link.
    # ip link set mylnk_eth0 up
  4. Remove all auto-generated routes for the new link.
    # ip route flush scope link dev mylnk_eth0
  5. The network manager of your distribution might interpret this stale link setup as a configuration error. Prevent the network manager from reverting your settings in an attempt to make the link functional. The example shows a NetworkManager command.
    # nmcli device set eth0 managed no

    Your distribution might use a different network manager, for example, wicked or netplan. Use a command according to your network manager.

Sysctl settings

SMC requires contiguous memory. The minimum is 16 KB, and the maximum is 512 MB. The SMC implementation selects a value as follows:
  • Some socket applications define the socket send and receive buffer sizes with a setsockopt call, whose upper limits are defined in net.core.wmem_max and net.core.rmem_max.
  • If setsockopt SO_SNDBUF is not used, the socket send buffer size is taken from the value of net.ipv4.tcp_wmem.
  • If setsockopt SO_RCVBUF is not used, the socket receive buffer is taken from the value of net.ipv4.tcp_rmem, rounded to the next higher power of 2.
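
The rounding and the limits can be illustrated with a few lines of C. This is a minimal sketch of the arithmetic only, assuming that a requested size is kept within the stated limits and rounded up to a power of 2, and that a value that already is a power of 2 is left unchanged. The function name smc_buf_size_sketch and the two macros are hypothetical, and the kernel's actual selection logic might differ in detail.

```c
#include <assert.h>
#include <stddef.h>

#define SMC_BUF_MIN ((size_t)16 * 1024)           /* 16 KB minimum  */
#define SMC_BUF_MAX ((size_t)512 * 1024 * 1024)   /* 512 MB maximum */

/* Round `requested` up to a power of 2 within the documented limits. */
size_t smc_buf_size_sketch(size_t requested)
{
    size_t size = SMC_BUF_MIN;

    if (requested >= SMC_BUF_MAX)
        return SMC_BUF_MAX;
    while (size < requested)
        size <<= 1;           /* 16 KB, 32 KB, 64 KB, ... */

    /* postcondition: the result stays within the documented limits */
    assert(size >= SMC_BUF_MIN && size <= SMC_BUF_MAX);
    return size;
}
```

With these assumptions, a requested size of 87380 bytes results in 131072 bytes (128 KB), and any request below 16 KB results in the 16 KB minimum.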

Make an existing application use SMC

Use the preload library to make an unmodified socket application use SMC. Existing TCP/IP applications can benefit from the SMC protocol without recompiling if they are started with the SMC preload library libsmc-preload.so. See the smc-tools package for the smc_run script, which makes an existing TCP/IP socket program use SMC.

As an alternative to smc_run, you can use the LD_PRELOAD environment variable to specify the preload library with the application's start command:
# LD_PRELOAD=libsmc-preload.so <application_start_cmd>

Converting an application to use SMC

Alternatively, if you need to, you can convert an application. To convert an application from TCP/IP to SMC sockets, change the socket() function call from AF_INET to AF_SMC with protocol 0 and from AF_INET6 to AF_SMC with protocol 1. For example, change:
sd = socket(AF_INET, SOCK_STREAM, 0);
to:
sd = socket(AF_SMC, SOCK_STREAM, 0);
and
sd = socket(AF_INET6, SOCK_STREAM, 0);
to:
sd = socket(AF_SMC, SOCK_STREAM, 1);
Use the sys/socket.h header file from the glibc-header package. For more programming information, see the af_smc(7) man page.
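
A converted application might still have to run on systems where the kernel provides no AF_SMC support. The following C sketch shows one possible way to handle this: it requests an AF_SMC socket with the protocol value that matches the original address family and opens a plain TCP socket if that request fails. The function name open_stream_socket and the two protocol macros are hypothetical. The fallback definition of AF_SMC as 43 is the value used by the Linux kernel headers and applies only if your C library headers do not define it.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef AF_SMC
#define AF_SMC 43             /* value from the Linux kernel headers */
#endif

#define SMC_PROTO_IPV4 0      /* protocol for former AF_INET sockets  */
#define SMC_PROTO_IPV6 1      /* protocol for former AF_INET6 sockets */

/* Open a stream socket for `family` (AF_INET or AF_INET6).
 * Try AF_SMC first; if the kernel rejects it, use plain TCP.
 * Returns the socket descriptor, or -1 if both attempts fail. */
int open_stream_socket(int family)
{
    int sd;

    assert(family == AF_INET || family == AF_INET6);
    sd = socket(AF_SMC, SOCK_STREAM,
                family == AF_INET6 ? SMC_PROTO_IPV6 : SMC_PROTO_IPV4);
    if (sd < 0)
        sd = socket(family, SOCK_STREAM, 0);   /* plain TCP fallback */
    return sd;
}
```

The returned descriptor is used like any other stream socket, for example with connect() or bind(), and is released with close().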