Configuring GPUDirect Storage for IBM Spectrum Scale
After you install IBM Spectrum Scale, enable the GPUDirect Storage (GDS) feature by running the mmchconfig verbsGPUDirectStorage=yes command.
You also need to set the following configuration options by using the mmchconfig command on the GDS clients and storage servers:
- minReleaseLevel must be 5.1.2 or later.
- verbsRdma = enable
- verbsRdmaSend = yes
- verbsPorts. The values must be compliant with the values of the rdma_dev_addr_list parameter that is at /etc/cufile.json.
- verbsRdmaCm = disable
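The options above can be collected in one place and turned into mmchconfig command lines. The following is a minimal sketch that only prints the commands and does not run them. The verbsPorts value is a hypothetical placeholder, and minReleaseLevel is omitted because the text gives only a minimum level, not a value to set.

```python
import shlex

# Options named in the text. The verbsPorts value is a hypothetical
# placeholder; use the devices and ports of your own GDS clients and servers.
GDS_SETTINGS = {
    "verbsRdma": "enable",
    "verbsRdmaSend": "yes",
    "verbsRdmaCm": "disable",
    "verbsPorts": "mlx5_0/1 mlx5_1/1",  # hypothetical devices and ports
    "verbsGPUDirectStorage": "yes",
}


def build_mmchconfig_commands(settings):
    """Return one 'mmchconfig attribute=value' command line per setting."""
    return [
        "mmchconfig " + shlex.quote(f"{name}={value}")
        for name, value in settings.items()
    ]


# Print the commands for review instead of executing them.
for command in build_mmchconfig_commands(GDS_SETTINGS):
    print(command)
```

Values that contain spaces, such as a verbsPorts list with several ports, are quoted so that the shell passes them as a single argument.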
Configuring virtual fabrics
The RDMA subsystem within IBM Spectrum Scale supports virtual fabrics to control how RDMA ports on NSD clients and NSD servers communicate with each other through Queue Pairs. Only RDMA ports on the same virtual fabric communicate with each other. With this feature, it is possible to use GDS on setups with multiple, separate InfiniBand fabrics.
All virtual fabric numbers that are used on the GDS clients must be used on the NSD server also.
No configuration changes are required within the NVIDIA GDS software stack. All RDMA ports that are configured within the NVIDIA software stack (the rdma_dev_addr_list key in the properties object) must also be configured on the NSD server by using the verbsPorts configuration variable. Any GDS I/O operation through an RDMA port that is not listed in verbsPorts results in an I/O error, and an error message is logged in the IBM Spectrum Scale log file. The verbsPorts syntax remains unchanged.
All NSD servers must have RDMA ports in all virtual fabrics that are configured on the NSD clients that perform I/O through GDS. For example, if the RDMA ports on the GDS clients are configured to use virtual fabric numbers 1, 2, 3, and 4, RDMA ports on the same four virtual fabric numbers must be configured on the NSD server. If a GDS client submits a GDS request through an RDMA port on virtual fabric number 4, but the NSD server does not have an RDMA port on virtual fabric number 4, the request fails with an I/O error in the GDS application. An error message is also recorded in the IBM Spectrum Scale log file.
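The coverage rule in the example above can be checked before any I/O is attempted. The following is a minimal sketch that assumes verbsPorts values are whitespace-separated entries of the form device/port/fabric, and that an entry without a fabric field belongs to fabric 0. The function names and device names are hypothetical.

```python
def fabric_numbers(verbs_ports):
    """Collect the virtual fabric numbers from a verbsPorts-style string.

    Assumption: entries look like 'device/port/fabric'; an entry without
    a fabric field is counted as fabric 0.
    """
    fabrics = set()
    for entry in verbs_ports.split():
        fields = entry.split("/")
        fabrics.add(int(fields[2]) if len(fields) > 2 else 0)
    return fabrics


def missing_fabrics(client_ports, server_ports):
    """Return the fabrics used on the client that the server does not cover."""
    return fabric_numbers(client_ports) - fabric_numbers(server_ports)


# Hypothetical values: the client uses fabrics 1 to 4, the server only 1 to 3.
client = "mlx5_0/1/1 mlx5_1/1/2 mlx5_2/1/3 mlx5_3/1/4"
server = "mlx5_0/1/1 mlx5_1/1/2 mlx5_2/1/3"
print(sorted(missing_fabrics(client, server)))  # prints [4]
```

An empty result means that every virtual fabric used on the GDS client has at least one RDMA port on the NSD server.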
Configuring CUDA
- rdma_dev_addr_list
- Defines the InfiniBand devices to be used. The IP over InfiniBand addresses specified must be consistent with the values that are set for the verbsPorts parameter on the GDS clients.
- rdma_load_balancing_policy
- Specifies the load-balancing policy for RDMA memory registration. If the GDS client is a DGX, the following values must be set:
  - RoundRobin: For storage Network Interface Cards (NIC).
  - RoundRobinMaxMin: For compute NICs.

  The default value is RoundRobin. For more information on DGX, see https://www.nvidia.com/en-us/data-center/dgx-systems/.
- rdma_access_mask
- Enables relaxed ordering. Set the value to 0x1f.
- "logging"."level"
- Defines the log level. Set the value to ERROR or WARN unless debug output is required. Setting log levels such as DEBUG and TRACE impacts performance.
- use_poll_mode
- Switches the NVIDIA driver between asynchronous and synchronous I/O modes. Set the value to false when configuring GDS for IBM Spectrum Scale.
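The recommended values above can be checked against a parsed configuration. The following is a minimal sketch that assumes the RDMA keys and use_poll_mode live in the properties object, that rdma_access_mask is written as a string, and that the fragment is strict JSON. A real /etc/cufile.json may contain comments that json.loads rejects. The IP addresses and the function name are hypothetical.

```python
import json

ALLOWED_POLICIES = {"RoundRobin", "RoundRobinMaxMin"}
QUIET_LOG_LEVELS = {"ERROR", "WARN"}

# Hypothetical fragment; the addresses are placeholders.
EXAMPLE = """
{
  "logging": {"level": "ERROR"},
  "properties": {
    "rdma_dev_addr_list": ["192.168.1.10", "192.168.2.10"],
    "rdma_load_balancing_policy": "RoundRobin",
    "rdma_access_mask": "0x1f",
    "use_poll_mode": false
  }
}
"""


def check_cufile_settings(config):
    """Return a list of findings; an empty list means no deviation was found."""
    findings = []
    props = config.get("properties", {})
    if not props.get("rdma_dev_addr_list"):
        findings.append("rdma_dev_addr_list is empty or missing")
    # RoundRobin is the documented default when the key is absent.
    if props.get("rdma_load_balancing_policy", "RoundRobin") not in ALLOWED_POLICIES:
        findings.append("rdma_load_balancing_policy has an unexpected value")
    if str(props.get("rdma_access_mask", "")).lower() != "0x1f":
        findings.append("rdma_access_mask is not 0x1f")
    if config.get("logging", {}).get("level") not in QUIET_LOG_LEVELS:
        findings.append("logging level is not ERROR or WARN")
    if props.get("use_poll_mode") is not False:
        findings.append("use_poll_mode is not false")
    return findings


config = json.loads(EXAMPLE)
print(check_cufile_settings(config))
```

The check only compares values against the recommendations in this section. It does not verify that the listed addresses match the verbsPorts settings on the GDS clients.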