GPU enhancements

The following enhancements affect LSF GPU support.

GPU fair share scheduling

LSF can now use GPU run time and historical GPU run time as weighting factors in the dynamic priority calculation for fair share scheduling. These factors apply only to GPUs that are used exclusively.

To add GPU run time to the dynamic priority calculation, specify the GPU_RUN_TIME_FACTOR parameter in the lsb.params or lsb.queues file. To also include historical GPU run time in the GPU run time factor, set the ENABLE_GPU_HIST_RUN_TIME parameter to Y in the lsb.params or lsb.queues file. If a parameter is defined in both files, the queue-level value takes precedence.

LSB_GPU_NEW_SYNTAX=extend must be defined in the lsf.conf file to include GPU run time in the fair share formula.
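The following fragments are a minimal sketch of the GPU fair share settings described above, not a recommended configuration. The queue name gpu_fs, its FAIRSHARE share assignment, and the factor values 1.0 and 2.0 are hypothetical illustrations. Because the queue-level values take precedence, jobs in gpu_fs would use the lsb.queues settings rather than the lsb.params defaults.

# lsf.conf: required so that GPU run time is included in the fair share formula
LSB_GPU_NEW_SYNTAX=extend

# lsb.params: cluster-wide defaults (factor value is illustrative only)
Begin Parameters
GPU_RUN_TIME_FACTOR = 1.0
ENABLE_GPU_HIST_RUN_TIME = Y
End Parameters

# lsb.queues: hypothetical fair share queue; these values override lsb.params
Begin Queue
QUEUE_NAME = gpu_fs
FAIRSHARE = USER_SHARES[[default,1]]
GPU_RUN_TIME_FACTOR = 2.0
ENABLE_GPU_HIST_RUN_TIME = Y
End Queue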

GPU preemption

LSF GPU jobs now support preemptive scheduling, so a lower priority GPU job can release its GPU resources to a higher priority GPU job. To be preempted by other GPU jobs, a GPU job must either use exclusive_process mode or have j_exclusive=yes set. Non-GPU jobs cannot preempt GPU jobs. In addition, a higher priority GPU job can preempt only lower priority GPU jobs that are configured for automatic job migration and rerun (that is, the MIG parameter is defined and the RERUNNABLE parameter is set to yes in the lsb.queues or lsb.applications file).

To enable GPU preemption, set the LSB_GPU_NEW_SYNTAX parameter in the lsf.conf file to either Y or extend, then include the ngpus_physical resource in the PREEMPTABLE_RESOURCES parameter in the lsb.params file. LSF then treats GPU resources the same way as other preemptable resources.
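The following fragments sketch one way to put these requirements together, assuming queue-based preemption between two queues. The queue names gpu_high and gpu_low, their priorities, and the MIG threshold of 10 minutes are hypothetical. The low-priority queue defines MIG and sets RERUNNABLE to yes so that its GPU jobs are eligible for preemption. Jobs in gpu_low must still run in exclusive_process mode or with j_exclusive=yes.

# lsf.conf: Y would also enable GPU preemption
LSB_GPU_NEW_SYNTAX=extend

# lsb.params: make physical GPUs a preemptable resource
Begin Parameters
PREEMPTABLE_RESOURCES = ngpus_physical
End Parameters

# lsb.queues: hypothetical high-priority queue that can preempt gpu_low
Begin Queue
QUEUE_NAME = gpu_high
PRIORITY = 70
PREEMPTION = PREEMPTIVE[gpu_low]
End Queue

# lsb.queues: hypothetical low-priority queue; MIG and RERUNNABLE make its GPU jobs preemptable
Begin Queue
QUEUE_NAME = gpu_low
PRIORITY = 30
PREEMPTION = PREEMPTABLE[gpu_high]
MIG = 10
RERUNNABLE = yes
End Queue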

General GPU number limits

LSF now supports a general GPU number limit (that is, a limit on the number of ngpus_physical resources) when the new GPU syntax is enabled.

For example, the following Limit section in the lsb.resources file limits the general GPU number to 3:

Begin Limit
NAME = Limit1
RESOURCE=[ngpus_physical,3]
End Limit

To use the general GPU number limit, enable the new GPU syntax by setting LSB_GPU_NEW_SYNTAX=extend in the lsf.conf file.