This article presents an example of how the IBM Spectrum Symphony offering on IBM Cloud can be used for HPC Monte Carlo simulation workloads.
High Performance Computing (HPC) workloads can run under the IBM Spectrum Symphony scheduling software on a cluster of compute nodes that is easy to deploy on IBM Cloud. In this blog post, we summarize the results of a set of Monte Carlo simulation workload runs. Monte Carlo simulation is widely used throughout the financial industry for risk analysis.
The Monte Carlo workload
The Monte Carlo Value at Risk (VaR) simulation workload is provided as a sample application with the IBM Spectrum Symphony offering. The VaR workload processes a pre-defined sample portfolio with a user-selectable number of equities in the portfolio, number of iterations of 10K simulations, number of days in the time horizon for the simulation and degree of confidence. The total number of simulations executed during a workload run is the number of equities multiplied by the number of iterations multiplied by 10K. Note: To change the portfolio dataset, see Updating equity data to calculate Value at Risk.
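The internals of the Symphony sample application are not shown here, but the following minimal Python sketch illustrates the general shape of a Monte Carlo VaR calculation of this kind. The geometric Brownian motion price model, the function name and the parameter values are illustrative assumptions, not the sample application's actual code:

```python
import numpy as np

def monte_carlo_var(start_price, mu, sigma, horizon_days, n_sims, confidence=0.99):
    """Illustrative Monte Carlo VaR for a single equity position."""
    dt = horizon_days / 365.0                       # time horizon expressed in years
    z = np.random.standard_normal(n_sims)           # one random draw per simulation
    terminal = start_price * np.exp((mu - 0.5 * sigma ** 2) * dt
                                    + sigma * np.sqrt(dt) * z)
    losses = start_price - terminal                 # positive values are losses
    return np.percentile(losses, confidence * 100)  # loss not exceeded at the confidence level

# The workload sizes a run as equities * iterations * 10K simulations;
# the values below are illustrative, not the evaluation's settings.
equities, iterations, sims_per_iteration = 4, 100, 10_000
total_simulations = equities * iterations * sims_per_iteration
print(f"total simulations per run: {total_simulations:,}")
print(f"sample 99% VaR: {monte_carlo_var(100.0, 0.05, 0.20, horizon_days=30, n_sims=10_000):.2f}")
```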
Environment characteristics
The Spectrum Symphony offering on IBM Cloud was used to create all the necessary resources and to configure the HPC cluster for evaluating the Monte Carlo workload. The offering makes use of virtual private cloud (VPC) infrastructure services and is deployed through IBM Cloud Schematics and Terraform. The basic elements of the cluster are illustrated in Figure 1: a jump host (login system), one or more management nodes, one NFS server node for storage and a dynamically variable number of worker nodes. The login system and the nodes are VPC Virtual Server Instances (VSIs).
We used the Spectrum Symphony Host Factory Scheduled Requests feature to dynamically provision worker nodes for the Monte Carlo simulations and then deprovision them once the workload runs were completed and the nodes were idle.
Monte Carlo simulation workload run results
For this evaluation, we used the bx2-2x8 and bx2-48x192 profiles, two of the options within the Balanced family of VPC VSI profiles. We compared clusters built from small nodes (bx2-2x8) and from large nodes (bx2-48x192), keeping the total number of vCPUs in the cluster the same. The nodes were provisioned using the Scheduled Requests feature of IBM Spectrum Symphony. When evaluating a cluster with 960 vCPUs, 480 bx2-2x8 small nodes were provisioned for one run and 20 bx2-48x192 large nodes were provisioned for a second run.
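The node counts for each run follow directly from the vCPU size encoded in the profile names (2 vCPUs for bx2-2x8, 48 for bx2-48x192); a quick sanity check:

```python
# vCPUs per node, taken from the profile names bx2-2x8 and bx2-48x192
SMALL_NODE_VCPUS = 2
LARGE_NODE_VCPUS = 48

target_vcpus = 960
print(target_vcpus // SMALL_NODE_VCPUS)   # 480 small nodes
print(target_vcpus // LARGE_NODE_VCPUS)   # 20 large nodes
```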
For each workload run, a 99% degree of confidence was used, along with 12 equities in the portfolio, 1,000 iterations of 10K simulations (a total of 12 × 1,000 × 10K = 120M simulations) and a time horizon of 1,095 days. CentOS 7.7 was used for the evaluation, but RHEL 7.7 is also available with the IBM Spectrum Symphony offering.
The Monte Carlo scaling curve is shown in Figure 2:
On the left, at 48 vCPUs, the comparison is 24 small nodes vs. 1 large node. The small node cluster achieved a throughput of 7.2M simulations/minute during the workload run, and the large node cluster achieved 6.9M simulations/minute. The small node cluster was slightly faster, but the difference was not large enough to indicate a significant advantage for small nodes.
On the right, at 960 vCPUs, the comparison is 480 small nodes vs. 20 large nodes. The small node cluster achieved a throughput of 117.1M simulations/minute during the workload run, and the large node cluster achieved 118.9M simulations/minute. Even as the number of nodes and the overall number of vCPUs were scaled up, throughput remained nearly identical for the small node and large node clusters.
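Normalizing the reported throughput by cluster size makes the comparison easier to read. The short sketch below derives per-vCPU throughput and the implied simulation time for the 960 vCPU runs from the figures above (simple arithmetic on the reported numbers; cluster setup time is not included):

```python
# Reported throughput (simulations per minute) for the two 960-vCPU runs
runs = {
    "480 x bx2-2x8 (small nodes)":   117.1e6,
    "20 x bx2-48x192 (large nodes)": 118.9e6,
}

TOTAL_SIMULATIONS = 120e6   # 12 equities * 1,000 iterations * 10K simulations
TOTAL_VCPUS = 960

for name, rate in runs.items():
    per_vcpu = rate / TOTAL_VCPUS          # simulations per minute per vCPU
    minutes = TOTAL_SIMULATIONS / rate     # implied simulation time for the run
    print(f"{name}: {per_vcpu / 1e3:.0f}K sims/min per vCPU, ~{minutes:.2f} min")
```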
In addition to runtime performance, another important aspect of running HPC workloads in a cloud environment is how quickly you can set up and tear down the configuration used for the workload runs. Cluster creation times for the small node and large node clusters were comparable at the 48 and 240 vCPU scaling points. At 480, 720 and 960 vCPUs, the small node cluster creation times (240, 360 and 480 nodes, respectively) were longer than the large node cluster creation times (10, 15 and 20 nodes, respectively), which is to be expected. Using the 960 vCPU scaling point as an example, creation time for 20 large nodes was 1 minute 47 seconds, whereas creation time for 480 small nodes was 3 minutes 49 seconds.
In general, cluster destroy times for large node clusters were shorter than for small node clusters, again as expected. The longest destroy time for a large node cluster was under 1 minute, and the longest destroy time for a small node cluster was just over 3 minutes.
Summary of the Monte Carlo workload evaluation
The IBM Cloud environment deployed with the IBM Spectrum Symphony offering provides a good illustration of running the Monte Carlo simulation workload with the Host Factory dynamic hosts capability. Performance scaling for small and large node sizes is very comparable, giving customers the flexibility to choose either node size.
In addition to the comparable performance scaling, VPC VSI pricing within a profile family is based on a fixed cost per vCPU, so the total cost for a cluster with a given aggregate number of vCPUs is the same regardless of which VSI profile is used for the nodes. Using the 960 vCPU cluster size as an example, the total cost for a cluster deployed with 480 small nodes would be the same as for one deployed with 20 large nodes.
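To make that cost equivalence concrete, the sketch below uses a hypothetical per-vCPU hourly rate (actual rates depend on region and current IBM Cloud pricing); the point is simply that the profile choice drops out of the total:

```python
# Hypothetical per-vCPU hourly rate for the Balanced profile family (illustrative only)
RATE_PER_VCPU_HOUR = 0.05   # assumed figure, not an actual IBM Cloud price

hours = 1.0
small_cluster_cost = 480 * 2 * RATE_PER_VCPU_HOUR * hours    # 480 bx2-2x8 nodes
large_cluster_cost = 20 * 48 * RATE_PER_VCPU_HOUR * hours    # 20 bx2-48x192 nodes

assert small_cluster_cost == large_cluster_cost   # same aggregate vCPUs, same cost
print(small_cluster_cost, large_cluster_cost)     # 48.0 48.0
```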
By leveraging the IBM Spectrum Symphony Host Factory capabilities and the fast provisioning of IBM Cloud VSIs, HPC cluster setup is simple and efficient, and operational costs can be minimized by paying for compute resources only when they are needed. Equivalent price-performance for small and large node clusters gives customers flexibility in how they choose to deploy.