Spectrum Scale on IBM Cloud performance

In the cloud era, a compute cluster that once took months to build out can now be created and ready to use in minutes. In this blog post, we will discuss all pieces that come together to make this near-instant infrastructure a reality. From there, we will show how this infrastructure and file system fulfills the promise of performance right out of the box.

IBM Spectrum Scale

The express path to a high-performance distributed file system and compute cluster begins with the IBM Spectrum Scale catalog tile. Follow the link, and the IBM Cloud Schematics interface offers a straightforward process for filling out the parameters to configure your cloud-based storage and compute cluster. After you provide all the configuration details, your input is stored as a Schematics workspace. This workspace contains your infrastructure specification, and upon your command, the workspace connects with the Terraform and Ansible code contained in the repository to create your cloud-based infrastructure.

VPC infrastructure

The IBM Cloud VPC infrastructure used by the Spectrum Scale catalog tile can employ storage nodes based on bare metal instances with NVMe devices or use virtual instances with instance storage. For this post, we will be using bare metal instances that offer the following:

8 3.2 TB NVMe storage devices
48 physical cores (96 vCPUs) from Intel Xeon 8260 processors
192 – 1536 GB of memory

The number and configuration of the compute nodes is up to the user with virtual instance profiles:

From 2 to 176 vCPUs
From 2 GB to 2.5 TB of memory

In addition to the storage and compute nodes, the automation provisions and configures a bastion node that helps to secure the cluster’s VPC in several ways:

Serves as an SSH jump host allowing secure command line access to the cluster’s VPC
Isolates the cluster VPC from the internet by closing non-essential ports
Restricts access to the cluster to approved remote IP addresses or CIDR blocks

IBM Spectrum Scale file system

IBM Spectrum Scale is a high-performance clustered file system that provides concurrent access to a shared file system from multiple nodes. It can be used in a wide variety of hardware and software configurations. For our purposes, it is configured as a collection of nodes built from both bare metal servers for storage and virtual instances for compute. Each bare metal instance has direct-attached NVMe storage serving as NSD volumes and a 100 Gbps network interface. We’ll have more to say about this in the performance section.

Security

The tile automation scripts build a cluster that employs simple and effective security practices to get you started:

User-supplied SSH keys
A login (bastion) node jump host
Firewall with only the SSH port open and restricted to your specified CIDR
All nodes in the cluster can only be accessed from within the VPC

From there, it is expected that you employ the rich set of tools supplied by IBM Cloud and Spectrum Scale to implement the level of security that meets your needs.

Cluster creation

As discussed earlier, before it is rendered in real hardware, the cluster exists as a specification stored in a Schematics workspace. This workspace can be thought of as a form of infrastructure that incurs no cost or energy while in storage.

Assuming the cluster is already configured, the process of bringing it to life begins with invoking the “apply” command, which executes the pre-existing and well-tested Terraform scripts from the Schematics repository to provision the cloud resources. Whenever possible, the provision steps are carried out in parallel. In the case of our largest example, a 10 storage node and 64 compute node cluster, there can be close to 100 discrete cloud operations in flight at one time. In this way, for one example, the 64 compute nodes are provisioned concurrently and complete in a little over 1 minute, and so it goes with subnets, security rules, a bastion node, storage nodes and so on. Once the hardware is in place, Ansible scripts are kicked off to install and configure the software.

Time required to create a Spectrum Scale cluster

The following timings were measured on varying cluster configurations in real experiments and can be used as a guideline. As always, your results may vary to some degree. Three different cluster sizes were tested, and the times needed to create them were broken down to give an idea of how long various operations take.

“Schematics time” is the amount of time spent running Terraform scripts in a Schematics container. This time is spent provisioning a login node and a “controller” node to which we transfer the responsibility for finishing the cluster. The reason we make this transition is to allow us to move execution to a node that we own and control. We can also size to speed up the process that is executing Terraform scripts to provision resources and later Ansible scripts to install software and configure the cluster.

This time is split into the Controller Terraform and Controller Ansible components. The “Total Time” column is the elapsed time from “apply” to the cluster being ready to get to work. It is interesting to note how the performance varies as we scale up the cluster size. Schematics time is essentially invariant because it is the same amount of work in this phase, regardless of cluster size. The controller Terraform illustrates how successfully we can parallelize the Terraform provisions. In this case, the time needed to do 74 (10 storage + 64 compute) provisions is less than 5% longer than the time needed to do 6. In contrast, the Ansible-based configurations run serially in many cases, so the time needed is proportional to the number of nodes in the cluster.

We also tested the time needed to destroy a cluster, and the results are in Table 2 above. The total time is made up of two separate operations. There are two operations due to the split nature of the Terraform work. Some of it runs on the Schematics container, while the bulk of the work is carried out on the bootstrap instance.

These two operations run sequentially, so the total time is obtained by adding the two operations together. Regardless of cluster size, it takes approximately 10 minutes to free all the resources and return them to the cloud. Just as in resource creation, we take advantage of the ability to run Terraform operations in parallel to keep the total time down.

Spectrum Scale storage resiliency

Out of the box, our cluster offers resiliency that allows for the loss of a storage node and the loss of a storage block.

This level of redundancy requires two settings that are applied at cluster creation time:

A minimum storage cluster consists of the three nodes
A write replication factor of two is set

The above settings can be seen as providing the basic level of resiliency that befits a large, clustered file system. Beyond this, and depending on your needs, Spectrum Scale and IBM Cloud can be customized to provide resiliency and security at very high levels.

Spectrum Scale storage performance

The testing was performed on a system with the following characteristics:

10 storage nodes
80 NVMe drives
256 TB of raw storage capacity
100 Gbps network in each storage node
A single 107 TB file system provided by Spectrum Scale 5.1.4

On the compute side, we have:

64 compute nodes
cx2-16×32 (16 vCPU, 32 GB memory) instance profile
512 physical cores
24 Gbps network per instance

It should be evident that these are very good numbers for a clustered file system. The read bandwidth of 112 GiB/sec is essentially all the bandwidth supplied by the 10 100 Gbps network adapters, which means when it comes to read bandwidth, the Scale software and IBM Cloud network infrastructure is leaving nothing on the table. Write bandwidth is also good, operating under the constraints imposed by replication. The 5.4 million read IOPs supplied are also impressive. In short, this is a very high-performance offering out of the box.

It should be noted that all the results listed above were achieved “out of the box.” As with any high-performance computing system, the cluster has benefited from testing and tuning, but it was done over the course of our development and performance testing and the tuning is now applied automatically when a cluster is built from the tile.

Conclusion

The IBM Cloud Spectrum Scale catalog tile has been designed and built to offer you the shortest path possible to get to a high-performance compute and storage cluster. In less than one hour, you can build a compute/storage cluster to your specification with up to a 100 TB distributed file system, as much compute capacity as you desire and tuned to extract maximum performance from the underlying hardware. We invite you to try out our offering and embrace the cloud-based future of high-performance computing today.

Author

Greg Mewhinney

Senior Engineer

IBM Cloud Performance Engineering

Paul Mazzurana