GPUDirect Storage support for IBM Spectrum Scale

IBM Spectrum Scale's support for NVIDIA's GPUDirect Storage (GDS) enables a direct path between GPU memory and storage. This solution addresses the need for higher throughput and lower latencies. File system storage is directly connected to the GPU buffers to reduce latency and the load on the CPU. For IBM Spectrum Scale, this means that data is read directly from an NSD server's pagepool and sent to the GPU buffers of the IBM Spectrum Scale clients by using RDMA. IBM Spectrum Scale with GDS requires an InfiniBand fabric. In IBM Spectrum Scale 5.1.3, the mmdiag command is enhanced to print diagnostic information for GPUDirect Storage.

IBM Spectrum Scale 5.1.3 supports NVIDIA's GDS over RDMA over Converged Ethernet (RoCE), which achieves low latencies and high throughput when reading data from an NSD server's pagepool into a GPU buffer. This configuration requires a high-speed Ethernet fabric with GDS-capable hardware, and the CUDA environment must be installed on the GDS clients.

GDS is useful for workloads with significant I/O where the CPU is a bottleneck to overall system performance. This situation arises when CPU cycles are heavily consumed managing data transfers into and out of CPU memory, and CPU DRAM bandwidth is saturated. The addition of GDS enhances the ability of IBM Spectrum Scale and all-flash solutions such as ESS 3200 to avoid many of those bottlenecks.

You need to install CUDA, which is provided by NVIDIA, on the IBM Spectrum Scale client. GDS enables the CUDA developer to bring data directly from IBM Spectrum Scale storage into GPU memory by using RDMA. GDS eliminates the buffer copies in system (CPU) memory, bypasses the CPU memory, and places data directly into the GPU application memory. GDS delivers benefits such as increased data transfer rates, lower latency, and reduced CPU utilization.

I/O requests that are triggered by calling the function cuFileRead() from a CUDA application are executed as RDMA transfers from the NSD servers directly into the GPU buffer. Currently, only read operations are supported. More restrictions and limitations are listed in the following sections.
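As an illustration, a CUDA application issues a GDS read through NVIDIA's cuFile API roughly as follows. This is a hedged sketch, not an IBM-provided example: the file path and buffer size are illustrative, and error handling is omitted for brevity.

```cpp
#define _GNU_SOURCE     // for O_DIRECT
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include <cufile.h>

int main() {
    // Initialize the cuFile (GDS) driver.
    cuFileDriverOpen();

    // Open a file on the IBM Spectrum Scale mount (path is illustrative).
    int fd = open("/gpfs/fs1/data.bin", O_RDONLY | O_DIRECT);

    // Register the file descriptor with cuFile.
    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    // Allocate and register the target GPU buffer.
    const size_t size = 1 << 20;  // 1 MiB, illustrative
    void *devPtr;
    cudaMalloc(&devPtr, size);
    cuFileBufRegister(devPtr, size, 0);

    // Read from storage directly into GPU memory; on IBM Spectrum Scale
    // this is served via RDMA from the NSD server pagepool.
    ssize_t n = cuFileRead(handle, devPtr, size,
                           /*file_offset=*/0, /*devPtr_offset=*/0);

    // Cleanup.
    cuFileBufDeregister(devPtr);
    cudaFree(devPtr);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return n >= 0 ? 0 : 1;
}
```

Building and running this sketch requires the CUDA toolkit with the cuFile library, a GDS-capable GPU, and a mounted IBM Spectrum Scale file system.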

Compatibility mode

For certain types of I/O, GDS cannot perform a direct RDMA transfer from the pagepool into the GPU buffer. In those cases, the buffered I/O path is taken, which delivers the data correctly to the GPU but does not provide any performance improvement. This is called compatibility mode for GDS. The types of I/O that switch GDS into compatibility mode are as follows:
  • Files with size less than 4096 bytes.
  • Sparse files or files with preallocated storage, for example, files preallocated with fallocate() or gpfs_prealloc().
  • Encrypted files.
  • Memory-mapped files.
  • Compressed files or files that are marked for deferred compression. For more information on compression, see File compression.
  • Files in snapshots or clones.
  • Direct I/O is disabled by using the mmchconfig disableDIO = true option. The default value of the disableDIO parameter is false.
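The last item can be checked and reverted from the cluster command line. This is a sketch, assuming administrative access to the cluster; consult the mmchconfig documentation before changing cluster-wide settings.

```shell
# Show the current setting of disableDIO (the default is false).
mmlsconfig disableDIO

# Re-enable Direct I/O so that GDS can use the direct RDMA path.
mmchconfig disableDIO=false
```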

IBM Spectrum Scale 5.1.3 supports recovery from failed GDS RDMA operations: the failed read request is returned to CUDA, which retries it in compatibility mode.

Other limitations

The following limitations are also applicable for the GDS support:
  • Write operations (cuFileWrite()) are supported only in compatibility mode. For writes in compatibility mode, the data is first copied from the GPU buffer into host memory (cudaMemcpy()) and then Direct I/O is used if possible.
  • IBM Spectrum Scale does not support GDS in the following scenarios:
    • NVIDIA GDS in asynchronous "poll" mode. The NVIDIA GDS library implicitly converts a poll mode request on a file in an IBM Spectrum Scale mount to a synchronous GDS I/O request.
    • Reading a file with a GDS read concurrently with a buffered read does not deliver full GDS performance for the GDS thread. This limitation applies whether the concurrent threads belong to the same or to different user applications. In this context, a buffered read is any non-GDS, non-direct I/O read.
    • Files that use data tiering, including Transparent Cloud Tiering.
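For completeness, a write through the cuFile API looks much like a read; on IBM Spectrum Scale it takes the compatibility path described above. This is a hedged sketch with an illustrative path and size, and error handling omitted.

```cpp
#define _GNU_SOURCE     // for O_DIRECT
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include <cufile.h>

int main() {
    cuFileDriverOpen();

    // Open an output file on the IBM Spectrum Scale mount (illustrative path).
    int fd = open("/gpfs/fs1/out.bin", O_CREAT | O_WRONLY | O_DIRECT, 0644);

    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    const size_t size = 1 << 20;  // 1 MiB, illustrative
    void *devPtr;
    cudaMalloc(&devPtr, size);
    cudaMemset(devPtr, 0, size);

    // On IBM Spectrum Scale this write runs in compatibility mode: the data
    // is first staged in host memory, then written with Direct I/O if possible.
    ssize_t n = cuFileWrite(handle, devPtr, size,
                            /*file_offset=*/0, /*devPtr_offset=*/0);

    cudaFree(devPtr);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return n >= 0 ? 0 : 1;
}
```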