Configuring LSF to run Docker jobs

Configure the Docker application profile or queue in LSF to run Docker jobs.

About this task

You cannot run pre-execution and post-execution scripts in container jobs. The following are workarounds for specific pre-execution and post-execution operations:
  • To prepare data for the container as a pre-execution or post-execution operation, put the data into a directory that is mounted in the job container.
  • To customize the environment inside the job container, modify the starter scripts to prepare the appropriate environment.

Procedure

  1. Edit the lsb.applications or lsb.queues file and define the CONTAINER parameter for the application profile or queue to run Docker jobs.

    If this parameter is specified in both files, the parameter value in the lsb.applications file overrides the value in the lsb.queues file.

    Use this syntax:
    CONTAINER=docker[image(image_name) options(docker_run_options)]
    Note: Do not use the following options in the options keyword configuration or in the options script:
    • --cgroup-parent, --user, -u, and --name: These options are reserved for LSF internal use.
    • -w and --ulimit: LSF sets these options automatically. Any values that you specify for them override the LSF settings.

    For more details, refer to the CONTAINER parameter in the lsb.applications file or the CONTAINER parameter in the lsb.queues file.
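
    The restrictions in the note above can be checked before you edit the configuration file. The following is a minimal sketch, not part of LSF: the function name find_forbidden_options is hypothetical, and the check only compares whitespace-separated tokens, so it does not catch short options with an attached value such as -uroot.

```python
import shlex
from typing import List

# Options that the note above lists as reserved for LSF internal use.
RESERVED = {"--cgroup-parent", "--user", "-u", "--name"}
# Options that the note above lists as set automatically by LSF.
AUTO_SET = {"-w", "--ulimit"}


def find_forbidden_options(options: str) -> List[str]:
    """Return the flags in a docker run options string that the note says to avoid."""
    forbidden = []
    for token in shlex.split(options):
        # Keep only the flag part so that "--name=foo" and "--name foo" match alike.
        flag = token.split("=", 1)[0]
        if flag in RESERVED or flag in AUTO_SET:
            forbidden.append(flag)
    return forbidden
```

    For example, calling find_forbidden_options("--rm --name=myjob -w /tmp") reports --name and -w, while an options string that contains only --rm returns an empty list.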

    You can enable LSF to automatically assign a name to a Docker container when it creates the Docker container. To enable this feature, set the ENABLE_CONTAINER_NAME parameter to True in the lsfdockerlib.py file.

    The container name uses the following naming conventions:
    • Normal jobs and blaunch parallel job containers: <cluster_name>.job.<job_id>
    • Array jobs and array blaunch parallel job containers: <cluster_name>.job.<job_id>.<job_index>
    • blaunch parallel job task containers: <cluster_name>.job.<job_id>.task.<task_id>
    • Array blaunch parallel job task containers: <cluster_name>.job.<job_id>.<job_index>.task.<task_id>
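
    The naming conventions above can be sketched as a small helper, for example to predict the container name for a given job when you inspect containers on an execution host. This is an illustration only, not LSF code: the function name container_name and its parameters are hypothetical, and it assumes that the feature is enabled with ENABLE_CONTAINER_NAME.

```python
def container_name(cluster_name, job_id, job_index=None, task_id=None):
    """Build a container name that follows the conventions listed above."""
    parts = [cluster_name, "job", str(job_id)]
    # Array jobs add the job index after the job ID.
    if job_index is not None:
        parts.append(str(job_index))
    # blaunch task containers add ".task.<task_id>" at the end.
    if task_id is not None:
        parts += ["task", str(task_id)]
    return ".".join(parts)
```

    With a cluster named cluster1 (a made-up name), container_name("cluster1", 101, job_index=3, task_id=2) gives cluster1.job.101.3.task.2.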

    In the following examples, LSF uses the ubuntu image to run the job in the Docker container.

    • For sequential jobs:
      CONTAINER=docker[image(ubuntu) options(--rm)]

      The docker run --rm option removes the container for the job after the job is done.

    • For parallel jobs:
      CONTAINER=docker[image(ubuntu) options(--rm --net=host --ipc=host -v /path/to/my/passwd:/etc/passwd)]

      This configuration uses the following docker run options:

      --rm
      The container for the job is removed after the job is done.
      --net=host
      LSF needs the host network for launching parallel tasks.
      --ipc=host
      The container uses the IPC namespace of the host.
      -v
      Mounts the specified passwd file as /etc/passwd in the container. LSF needs the user ID and user name for launching parallel tasks.
      Note: The passwd file must be in the standard format for UNIX and Linux password files, for example:
      user1:x:10001:10001:::
      user2:x:10002:10002:::
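
      A passwd file in the format shown above can be generated from a list of users. The following is a minimal sketch under stated assumptions: the function names passwd_line and render_passwd are hypothetical, and the comment, home directory, and shell fields are left empty, as in the example entries.

```python
def passwd_line(user_name: str, uid: int, gid: int) -> str:
    """Format one entry as name:x:uid:gid::: with the last three fields empty."""
    return f"{user_name}:x:{uid}:{gid}:::"


def render_passwd(users) -> str:
    """Render (user_name, uid, gid) tuples as passwd file content, one entry per line."""
    return "\n".join(passwd_line(name, uid, gid) for name, uid, gid in users) + "\n"
```

      Calling render_passwd([("user1", 10001, 10001), ("user2", 10002, 10002)]) produces the two entries shown in the note. Write the result to the file that the -v option mounts as /etc/passwd.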
  2. Edit the lsb.applications or lsb.queues file and define the EXEC_DRIVER parameter for the application profile or queue to run Docker jobs.

    If this parameter is specified in both files, the parameter value in the lsb.applications file overrides the value in the lsb.queues file.

    EXEC_DRIVER=context[user(lsfadmin)]
                starter[/path/to/serverdir/docker-starter.py]
                controller[/path/to/serverdir/docker-control.py]
                monitor[/path/to/serverdir/docker-monitor.py]

    Replace /path/to/serverdir with the actual file path of the LSF_SERVERDIR directory.
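
    The path substitution can also be scripted. The following sketch renders the EXEC_DRIVER value for a given server directory. It is an illustration only: the function name render_exec_driver is hypothetical, and the layout simply mirrors the example above.

```python
def render_exec_driver(serverdir: str, user: str = "lsfadmin") -> str:
    """Render the EXEC_DRIVER value with the script paths under serverdir."""
    serverdir = serverdir.rstrip("/")
    # Continuation lines are aligned under the first keyword, as in the example above.
    indent = " " * len("EXEC_DRIVER=")
    entries = [
        f"context[user({user})]",
        f"starter[{serverdir}/docker-starter.py]",
        f"controller[{serverdir}/docker-control.py]",
        f"monitor[{serverdir}/docker-monitor.py]",
    ]
    return "EXEC_DRIVER=" + ("\n" + indent).join(entries)
```

    One possible use, assuming that the LSF_SERVERDIR environment variable is set in your shell, is print(render_exec_driver(os.environ["LSF_SERVERDIR"])) after importing os. Review the output before you paste it into the lsb.applications or lsb.queues file.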

  3. Optional: Enable Docker image affinity by defining DOCKER_IMAGE_AFFINITY=Y in the lsb.applications file for the application profile to run Docker jobs, the lsb.queues file for the queue to run Docker jobs, or the lsb.params file for the entire cluster.

    Docker image affinity enables LSF to give preference to execution hosts that already have the requested Docker image. This reduces network bandwidth use and the job start time because the execution host does not have to pull the Docker image from the repository, so the job can start immediately on the execution host.

    If this parameter is specified in more than one file, the parameter value in the lsb.applications file overrides the value in the lsb.queues file, which overrides the value in the lsb.params file.
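
    The override order for DOCKER_IMAGE_AFFINITY can be expressed as a short lookup. This is a sketch of the precedence rule only, not of how LSF reads its configuration: the function name effective_value is hypothetical, and None stands for a file in which the parameter is not defined.

```python
def effective_value(application=None, queue=None, params=None):
    """Return the first defined value in the order lsb.applications, lsb.queues, lsb.params."""
    for value in (application, queue, params):
        if value is not None:
            return value
    # The parameter is not defined in any of the three files.
    return None
```

    For example, effective_value(application=None, queue="Y", params="N") returns "Y" because the queue-level value overrides the cluster-level value.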