GPUDirect Storage support for IBM Spectrum Scale

IBM Spectrum Scale's support for NVIDIA's GPUDirect Storage (GDS) enables a direct path between GPU memory and storage. This solution addresses the need for higher throughput and lower latencies. File system storage is directly connected to the GPU buffers to reduce latency and load on the CPU. For IBM Spectrum Scale, this means that data is read from or written to an NSD server's pagepool and transferred to or from the GPU buffers of the IBM Spectrum Scale clients by using RDMA. IBM Spectrum Scale with GDS requires an InfiniBand or RoCE fabric. The mmdiag command is enhanced to print diagnostic information for GPUDirect Storage.

IBM Spectrum Scale supports NVIDIA's GDS over RDMA over Converged Ethernet (RoCE), which achieves low latencies and high throughput when reading data from an NSD server pagepool into a GPU buffer. This configuration requires a high-speed Ethernet fabric with GDS-capable hardware, and the CUDA environment must be installed on the GDS clients.

GDS is useful where significant I/O is involved and the CPU is a bottleneck to overall system performance. This happens when CPU cycles are heavily consumed by managing data transfers into and out of CPU memory, and CPU DRAM bandwidth is exhausted. The addition of GDS enhances the ability of IBM Spectrum Scale and all-flash solutions such as ESS 3200 to avoid many of those bottlenecks.

You must install CUDA, which is provided by NVIDIA, on the IBM Spectrum Scale client. GDS enables the CUDA developer to copy data directly between IBM Spectrum Scale storage and GPU memory by using RDMA. GDS eliminates the buffer copies in system (CPU) memory, bypasses the CPU memory, and can place data directly into the GPU application memory. GDS delivers benefits such as an increased data transfer rate, lower latency, and reduced CPU utilization.
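
The following is a minimal sketch of the cuFile API usage pattern on a GDS client, assuming CUDA and the cuFile library are installed. The file path is illustrative and error handling is omitted for brevity:

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <unistd.h>
  #include <cuda_runtime.h>
  #include "cufile.h"

  int main(void) {
      // Open a file on the IBM Spectrum Scale mount; cuFile expects
      // O_DIRECT for the direct data path.
      int fd = open("/gpfs/fs1/data.bin", O_RDONLY | O_DIRECT);

      // Register the file descriptor with the cuFile driver.
      CUfileDescr_t descr = {0};
      descr.handle.fd = fd;
      descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
      CUfileHandle_t handle;
      cuFileDriverOpen();
      cuFileHandleRegister(&handle, &descr);

      // Allocate a GPU buffer and register it so that data can move
      // from the NSD server pagepool directly into GPU memory by RDMA.
      const size_t size = 1 << 20;
      void *devPtr;
      cudaMalloc(&devPtr, size);
      cuFileBufRegister(devPtr, size, 0);

      // Read 1 MiB from file offset 0 straight into the GPU buffer.
      ssize_t n = cuFileRead(handle, devPtr, size, 0, 0);

      cuFileBufDeregister(devPtr);
      cuFileHandleDeregister(handle);
      cuFileDriverClose();
      cudaFree(devPtr);
      close(fd);
      return n < 0;
  }

Such a program is typically linked against the cuFile library (for example, with -lcufile).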

I/O requests that are triggered by calling the cuFileRead() or cuFileWrite() APIs from a CUDA application are run as RDMA requests that are started on the NSD servers, copying data into or out of the GPU buffers. If the preconditions for an RDMA are not satisfied, the read and write operations are transparently handled in compatibility mode.
Note: GDS writes follow Direct I/O semantics. As such, IBM Spectrum Scale does not serialize concurrent GDS writes (or concurrent Direct I/O writes) to overlapping regions of a file. Overlapping writes must be serialized by the application, for example by partitioning file offsets as sketched below.
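
For illustration, one way to satisfy this requirement is to give each concurrent writer a disjoint region of the file. The helper below is a hypothetical sketch; handle and devPtr are assumed to be a registered cuFile handle and GPU buffer as shown earlier:

  #include <sys/types.h>
  #include "cufile.h"

  #define CHUNK_SIZE (1 << 20)

  // Each worker owns a disjoint 1 MiB slice of the file, so concurrent
  // GDS writes never overlap. Overlapping offsets would need
  // application-level locking, because IBM Spectrum Scale does not
  // serialize concurrent GDS (Direct I/O) writes.
  ssize_t write_worker_slice(CUfileHandle_t handle, const void *devPtr,
                             int worker_id) {
      off_t file_offset = (off_t)worker_id * CHUNK_SIZE;
      return cuFileWrite(handle, devPtr, CHUNK_SIZE, file_offset, 0);
  }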

Compatibility mode

For certain types of I/O, GDS cannot use the direct RDMA from the pagepool into the GPU buffer. In those cases, the buffered I/O path is taken, which gets the data correctly into the GPU but does not produce any performance improvement. This is called compatibility mode for GDS. The types of I/O that switch GDS into compatibility mode include, for example:
  • Files with size less than 4096 bytes.
  • Sparse files or files with preallocated storage, for example through fallocate() or gpfs_prealloc(); see the sketch after this list.
  • Encrypted files.
  • Memory-mapped files.
  • Compressed files or files that are marked for deferred compression. For more information on compression, see File compression.
  • Files in snapshots or clones.
  • Direct I/O is disabled by using the mmchconfig disableDIO=true option. The default value of the disableDIO parameter is false.
  • For a full list of cases in which compatibility mode is used and how to diagnose them, see Restriction counters in GPUDirect Storage troubleshooting.
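
As an illustration of the preallocation case above, a file that is preallocated as in the following sketch is subsequently handled in compatibility mode rather than by direct RDMA into the GPU buffer. The path and size are illustrative:

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <unistd.h>

  int main(void) {
      // Create a file on the IBM Spectrum Scale mount and preallocate
      // 1 GiB of storage. Because of the preallocation, GDS I/O to
      // this file falls back to compatibility mode.
      int fd = open("/gpfs/fs1/prealloc.bin", O_WRONLY | O_CREAT, 0644);
      if (fd < 0)
          return 1;
      fallocate(fd, 0, 0, (off_t)1 << 30);
      close(fd);
      return 0;
  }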

IBM Spectrum Scale supports recovery from failing GDS RDMA operations by returning the failed read request to CUDA, which then retries the request in compatibility mode.

Other limitations

The following limitations are also applicable for the GDS support:
  • IBM Spectrum Scale does not support GDS in the following scenarios:
    • NVIDIA GDS in asynchronous "poll" mode. The NVIDIA GDS library implicitly converts a poll mode request on a file in an IBM Spectrum Scale mount to a synchronous GDS I/O request; see the sketch after this list.
    • Reading a file with a GDS read concurrently with a buffered read does not deliver full GDS performance for the GDS thread. This limitation holds whether the concurrent threads are part of the same or different user applications. In this context, a buffered read is considered non-GDS, indirect I/O.
    • Files that use data tiering, including Transparent Cloud Tiering.
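
For reference, an application requests poll mode through the cuFile driver as in the sketch below; the threshold value and its interpretation follow the cuFile API documentation and are illustrative here. On files in an IBM Spectrum Scale mount, such requests are handled synchronously, so the setting has no performance effect there:

  #include <stdbool.h>
  #include "cufile.h"

  int main(void) {
      cuFileDriverOpen();
      // Ask the cuFile driver to use asynchronous poll mode for small
      // requests. For files in an IBM Spectrum Scale mount, the GDS
      // library converts such requests to synchronous GDS I/O.
      cuFileDriverSetPollMode(true, 4);
      /* ... cuFileRead()/cuFileWrite() as usual ... */
      cuFileDriverClose();
      return 0;
  }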