A look at the auto-scaling capability offered by IBM Spectrum LSF in the HPC cluster solution.
The HPC cluster blog post and tutorial show you how to create an HPC cluster on IBM Cloud. This post focuses on the auto-scaling capability offered by IBM Spectrum LSF in the HPC cluster solution.
The parameters related to auto-scaling are worker_node_min_count and worker_node_max_count:
Your cluster always keeps worker_node_min_count workers, regardless of whether any jobs are running. When jobs request more computing resources than worker_node_min_count workers can provide, LSF auto-scaling automatically adds more workers, up to a total of worker_node_max_count workers in the cluster, to satisfy the jobs' demands. These additional computing resources (that is, the total number of workers minus worker_node_min_count) are removed from the cluster after job demand has diminished and the resources have been idle for 10 minutes.
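To make the behavior above concrete, here is a minimal Python sketch of the scaling rule. It is an illustration only, not how LSF implements auto-scaling: the ScalingPolicy class and the target_workers and removable_workers functions are hypothetical names, and the sketch assumes demand can be expressed as a simple count of workers and that all surplus workers share one idle timer.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical container for the auto-scaling parameters in this post."""
    worker_node_min_count: int
    worker_node_max_count: int
    idle_minutes_before_removal: int = 10  # default idle time described above


def target_workers(policy: ScalingPolicy, workers_demanded: int) -> int:
    """Clamp the demanded worker count to the [min, max] range."""
    capped = min(workers_demanded, policy.worker_node_max_count)
    return max(policy.worker_node_min_count, capped)


def removable_workers(policy: ScalingPolicy, current_workers: int,
                      workers_demanded: int, idle_minutes: int) -> int:
    """Count surplus workers that may be removed.

    Simplification (an assumption of this sketch): every surplus worker
    has been idle for the same number of minutes.
    """
    if idle_minutes < policy.idle_minutes_before_removal:
        return 0  # surplus workers have not been idle long enough
    return max(0, current_workers - target_workers(policy, workers_demanded))


if __name__ == "__main__":
    policy = ScalingPolicy(worker_node_min_count=2, worker_node_max_count=10)
    for demand in (0, 5, 50):
        print(demand, "->", target_workers(policy, demand))
```

With a minimum of 2 and a maximum of 10, a demand of 0 workers keeps 2, a demand of 5 scales to 5, and a demand of 50 is capped at 10. Once demand drops back to zero, the surplus workers become removable only after the idle timer has elapsed.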
Updating auto-scaling parameters
After you create an HPC cluster, you can still adjust the auto-scaling parameters by changing the LSF configuration files inside the cluster. This page shows you how to change worker_node_max_count. You can also lengthen the default 10-minute idle time if you want to keep the additional computing resources created by auto-scaling for longer; this timer can be adjusted following the steps here. We suggest treating 10 minutes as the minimum value.
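Before editing the configuration files, it can help to sanity-check the values you plan to use. The following is a small sketch of such a check in Python. The check_autoscaling_settings function is a hypothetical helper, not part of LSF or the HPC cluster solution. The 10-minute floor comes from the recommendation above, and the requirement that the maximum is not below the minimum is an assumption based on how the two parameters are described.

```python
from typing import List

RECOMMENDED_MIN_IDLE_MINUTES = 10  # suggested minimum idle time from this post


def check_autoscaling_settings(min_count: int, max_count: int,
                               idle_minutes: int) -> List[str]:
    """Return a list of human-readable problems; empty means the values look sane."""
    problems = []
    if min_count < 0:
        problems.append("worker_node_min_count must not be negative")
    if max_count < min_count:
        # Assumption: the maximum should never be below the minimum.
        problems.append("worker_node_max_count is below worker_node_min_count")
    if idle_minutes < RECOMMENDED_MIN_IDLE_MINUTES:
        problems.append("idle time is below the recommended 10-minute minimum")
    return problems


if __name__ == "__main__":
    for issue in check_autoscaling_settings(min_count=5, max_count=3, idle_minutes=5):
        print("warning:", issue)
```

The helper only reports problems; applying the values still means editing the LSF configuration files as described in the linked instructions.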
One of the nice features of LSF auto-scaling is that it lets you mix different types of workers in one cluster. The instructions to add multiple compute profiles are provided here. You can then select the computing resources that best match the performance characteristics of your workloads.
Conclusions
This blog post shows you how to configure IBM Spectrum LSF auto-scaling parameters with the HPC cluster solution. Try creating an HPC cluster and see how LSF auto-scaling manages computing resources dynamically. You can learn more about LSF auto-scaling here.