Example GPU job submissions

The following are examples of possible submissions for jobs that use GPU resources.

  • The following job requests the default GPU resource requirement, num=1:mode=shared:mps=no:j_exclusive=no. That is, the job requests one GPU in DEFAULT mode without starting MPS, and the GPU can be shared with other jobs because j_exclusive is set to no:
    bsub -gpu - ./app
  • The following job requires 2 EXCLUSIVE_PROCESS mode GPUs and starts MPS before the job runs:
    bsub -gpu "num=2:mode=exclusive_process:mps=yes" ./app
  • The following job requires 2 EXCLUSIVE_PROCESS mode GPUs, starts MPS before the job runs, and allows multiple jobs in the host to share the same MPS daemon if those jobs are submitted by the same user with the same GPU requirements:
    bsub -gpu "num=2:mode=exclusive_process:mps=yes,share" ./app
  • The following job requires 2 EXCLUSIVE_PROCESS mode GPUs and starts multiple MPS daemons (one MPS daemon per socket):
    bsub -gpu "num=2:mode=exclusive_process:mps=per_socket" ./app
  • The following job requires 2 EXCLUSIVE_PROCESS mode GPUs and starts multiple MPS daemons (one MPS daemon per socket), and allows multiple jobs in the socket to share the same MPS daemon if those jobs are submitted by the same user with the same GPU requirements:
    bsub -gpu "num=2:mode=exclusive_process:mps=per_socket,share" ./app
  • The following job requires 2 EXCLUSIVE_PROCESS mode GPUs and starts multiple MPS daemons (one MPS daemon per GPU):
    bsub -gpu "num=2:mode=exclusive_process:mps=per_gpu" ./app
  • The following job requires 2 EXCLUSIVE_PROCESS mode GPUs and starts multiple MPS daemons (one MPS daemon per GPU), and allows multiple jobs in the GPU to share the same MPS daemon if those jobs are submitted by the same user with the same GPU requirements:
    bsub -gpu "num=2:mode=exclusive_process:mps=per_gpu,share" ./app
  • The following job requires 2 DEFAULT mode GPUs and uses them exclusively. The two GPUs cannot be used by other jobs even though the mode is shared:
    bsub -gpu "num=2:mode=shared:j_exclusive=yes" ./app
  • The following job uses 3 DEFAULT mode GPUs and shares them with other jobs:
    bsub -gpu "num=3:mode=shared:j_exclusive=no" ./app
  • The following job requests 2 AMD GPUs:
    bsub -gpu "num=2:gvendor=amd" ./app
  • The following job requests 2 Vega GPUs with xGMI connections:
    bsub -gpu "num=2:gmodel=Vega:glink=yes" ./app
  • The following job requests 2 Nvidia GPUs:
    bsub -gpu "num=2:gvendor=nvidia" ./app
  • The following job requests 2 Tesla C2050/C2070 GPUs:
    bsub -gpu "num=2:gmodel=C2050_C2070"
  • The following job requests 2 Tesla GPUs of any model with a total memory size of 12 GB on each GPU:
    bsub -gpu "num=2:gmodel=Tesla-12G"
  • The following job requests 2 Tesla GPUs of any model with a total memory size of 12 GB on each GPU, but with relaxed GPU affinity enforcement:
    bsub -gpu "num=2:gmodel=Tesla-12G":aff=no
  • The following job requests 2 Tesla GPUs of any model with a total memory size of 12 GB on each GPU and reserves 8 GB of GPU memory on each GPU:
    bsub -gpu "num=2:gmodel=Tesla-12G:gmem=8G"
  • The following job requests 4 Tesla K80 GPUs per host, with 2 GPUs on each socket:
    bsub -gpu "num=4:gmodel=K80:gtile=2" ./app
  • The following job requests 4 Tesla K80 GPUs per host and spreads the GPUs evenly across the sockets:
    bsub -gpu "num=4:gmodel=K80:gtile='!'" ./app
  • The following job requests 4 Tesla P100 GPUs per host with NVLink connections and spreads the GPUs evenly across the sockets:
    bsub -gpu "num=4:gmodel=TeslaP100:gtile='!':glink=yes" ./app
  • The following job uses 2 Nvidia MIG devices with a GPU instance size of 3 and a compute instance size of 2:
    bsub -gpu "num=2:mig=3/2" ./app
  • The following job uses 4 EXCLUSIVE_PROCESS GPUs that cannot be used by other jobs because the j_exclusive option defaults to yes for this job:
    bsub -gpu "num=4:mode=exclusive_process" ./app
  • The following job requires two tasks on two hosts, and each task requires 2 EXCLUSIVE_PROCESS GPUs. The GPUs are allocated in the same NUMA node as the allocated CPU (a job script form of this submission is sketched after these examples):
    bsub -gpu "num=2:mode=exclusive_process" -n2 -R "span[ptile=1] affinity[core(1)]" ./app
  • The following job ignores the simple GPU resource requirements that are specified in the -gpu option because the -R option specifies the ngpus_physical GPU resource:
    bsub -gpu "num=2:mode=exclusive_process" -n2 -R "span[ptile=1] rusage[ngpus_physical=2:gmodel=TeslaP100:glink=yes]" ./app
    Because EXCLUSIVE_PROCESS GPUs can be requested only with the -gpu option, move the contents of the rusage[] string into the -gpu option arguments. The following corrected job submission requires two tasks, and each task requires 2 EXCLUSIVE_PROCESS Tesla P100 GPUs with NVLink connections on two hosts:
    bsub -gpu "num=2:mode=exclusive_process:gmodel=TeslaP100:glink=yes" -n2 -R "span[ptile=1]" ./app