Define Intel Xeon Phi resources

Enable LSF so applications can use Intel Xeon Phi co-processors (previously referred to as Intel Many Integrated Core Architecture, or MIC, co-processors) in a Linux environment. LSF supports parallel jobs that request Xeon Phi resources, so you can specify some co-processors on each node at run time, based on availability.

Specifically, LSF supports the following environments:

  • Intel Xeon Phi co-processors for serial and parallel jobs. Use the blaunch command to launch parallel jobs.
  • Intel Xeon Phi co-processor for LSF jobs in offload mode, both serial and parallel.
  • CUDA 4.0 to CUDA 8.0 and later.
  • Intel Xeon Phi co-processors support Linux x64.

LSF also supports the collection of metrics for Xeon Phi co-processors by using ELIMs and predefined LSF resources.

The elim.mic ELIM collects the following information:

  • elim.mic detects the number of Intel Xeon Phi co-processors (nmics)
  • For each co-processor, the optional elim detects the following resources:
    mic_ncores*
    Number of cores.
    mic_temp*
    Co-processor temperature.
    mic_freq*
    Co-processor frequency.
    mic_freemem*
    Co-processor free memory.
    mic_util*
    Co-processor utilization.
    mic_power*
    Co-processor total power.

* If more than one instance of a resource exists, an index is appended to the resource name, starting at 0. For example, for mic_ncores, you might see mic_ncores0, mic_ncores1, mic_ncores2, and so on.
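The index suffix convention in the note above can be illustrated with a short Python sketch. The helper names split_mic_resource and names_for_devices are hypothetical and are not part of LSF. The sketch assumes that every per-device resource name ends in a decimal device index.

```python
import re

# Per-device metric prefixes listed above; nmics is host-wide and has no index.
MIC_METRICS = ("mic_ncores", "mic_temp", "mic_freq",
               "mic_freemem", "mic_util", "mic_power")

_PATTERN = re.compile(r"^(%s)(\d+)$" % "|".join(MIC_METRICS))


def split_mic_resource(name):
    """Return (metric, device_index) for a name such as 'mic_temp1', else None."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def names_for_devices(metric, device_count):
    """Build the indexed names for one metric, starting at device 0."""
    return [f"{metric}{index}" for index in range(device_count)]


print(names_for_devices("mic_ncores", 3))
print(split_mic_resource("mic_temp1"))
```

A name without an index, such as nmics, is not a per-device resource, so split_mic_resource returns None for it.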

When you enable LSF support for Intel Xeon Phi resources, note the following considerations:

  • Checkpoint and restart are not supported.
  • Preemption is not supported.
  • Resource duration and decay are not supported.
  • ELIMs for CUDA 4.0 can work with CUDA 8.0 or later.

Configure and use Intel Xeon Phi resources

To configure and use Intel Xeon Phi resources, complete the following steps:
  1. Binary files for the base elim.mic file are located under $LSF_SERVERDIR. The elim.mic.ext script file is located under LSF_TOP/10.1.0/util/elim.mic.ext.

    Make sure that the elim executable files are in the LSF_SERVERDIR directory.

    For Intel Xeon Phi co-processor support, make sure that the following third-party software is installed correctly:

    • Intel Xeon Phi co-processor (Knights Corner).
    • Intel MPSS version 2.1.4982-15 or later.
    • Runtime support library/tools from Intel Xeon Phi offload support.
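As a quick sanity check for step 1, the following Python sketch reports elim files that are missing or not executable in a given directory. The function name missing_elims is hypothetical. The sketch assumes that the LSF environment is already sourced so that LSF_SERVERDIR is set, and it checks only elim.mic by default; pass additional names if your site also installs elim.mic.ext there.

```python
import os


def missing_elims(server_dir, names=("elim.mic",)):
    """Return the elim names that are absent or not executable in server_dir."""
    missing = []
    for name in names:
        path = os.path.join(server_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # LSF_SERVERDIR is normally exported by the LSF profile scripts.
    server_dir = os.environ.get("LSF_SERVERDIR")
    if server_dir is None:
        print("LSF_SERVERDIR is not set; source the LSF environment first")
    else:
        print(missing_elims(server_dir) or "all elim executables found")
```

The check covers only file presence and the execute permission. It does not verify that the elim reports valid values.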
  2. Configure the LSF cluster that contains the Intel Xeon Phi resources.
    • Configure the lsf.shared file.
      For Intel Xeon Phi support, define the following resources in the Resource section. The first resource (nmics) is required. The others are optional:
      Begin Resource 
      RESOURCENAME TYPE    INTERVAL  INCREASING  CONSUMABLE  DESCRIPTION
      nmics        Numeric 60        N           Y           (Number of MIC devices)
      mic_temp0    Numeric 60        Y           N           (MIC device 0 CPU temp)
      mic_temp1    Numeric 60        Y           N           (MIC device 1 CPU temp)
      mic_freq0    Numeric 60        N           N           (MIC device 0 CPU freq)
      mic_freq1    Numeric 60        N           N           (MIC device 1 CPU freq)
      mic_power0   Numeric 60        Y           N           (MIC device 0 total power)
      mic_power1   Numeric 60        Y           N           (MIC device 1 total power)
      mic_freemem0 Numeric 60        N           N           (MIC device 0 free memory)
      mic_freemem1 Numeric 60        N           N           (MIC device 1 free memory)
      mic_util0    Numeric 60        Y           N           (MIC device 0 CPU utility)
      mic_util1    Numeric 60        Y           N           (MIC device 1 CPU utility)
      mic_ncores0  Numeric 60        N           N           (MIC device 0 number cores)
      mic_ncores1  Numeric 60        N           N           (MIC device 1 number cores)
      ...
      End Resource
      Note: The mic_util resource is a numeric resource, so the lsload command does not display it as the internal resource.
    • Configure the lsf.cluster.cluster_name file.

      For Intel Xeon Phi support, define the following lines in the ResourceMap section. The first resource (nmics) is provided by the elim.mic. The others are optional:

      Begin ResourceMap
      RESOURCENAME      LOCATION
      ...
      nmics             [default]
      mic_temp0         [default]
      mic_temp1         [default]
      mic_freq0         [default]
      mic_freq1         [default]
      mic_power0        [default]
      mic_power1        [default]
      mic_freemem0      [default]
      mic_freemem1      [default]
      mic_util0         [default]
      mic_util1         [default]
      mic_ncores0       [default]
      mic_ncores1       [default]
      ...
      End ResourceMap
      
    • Configure the nmics resource in the lsb.resources file. You can set attributes in the ReservationUsage section with the following values:
      Begin ReservationUsage 
      RESOURCE         METHOD        RESERVE
      ...
      nmics            PER_TASK      N
      ...
      End ReservationUsage
      

      If this file has no configuration for Intel Xeon Phi resources, by default LSF considers all resources as PER_HOST.
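The per-device entries in the Resource and ResourceMap sections of step 2 follow a regular pattern, so they can be generated for hosts with more than two co-processors. The Python sketch below is one way to do that. The function names are hypothetical, and the flag values and description strings are copied from the example tables in step 2. Treat the output as fragments to merge into your existing sections, not as complete files.

```python
# (metric prefix, INCREASING flag, description suffix), copied from step 2.
PER_DEVICE = [
    ("mic_temp", "Y", "CPU temp"),
    ("mic_freq", "N", "CPU freq"),
    ("mic_power", "Y", "total power"),
    ("mic_freemem", "N", "free memory"),
    ("mic_util", "Y", "CPU utility"),
    ("mic_ncores", "N", "number cores"),
]


def resource_rows(device_count, interval=60):
    """Rows for the Resource section: nmics first, then each metric per device."""
    rows = [("nmics", "Numeric", str(interval), "N", "Y", "(Number of MIC devices)")]
    for prefix, increasing, what in PER_DEVICE:
        for index in range(device_count):
            rows.append((f"{prefix}{index}", "Numeric", str(interval), increasing,
                         "N", f"(MIC device {index} {what})"))
    return rows


def render_resource_section(device_count):
    """Render a Resource section fragment for lsf.shared."""
    header = ("RESOURCENAME", "TYPE", "INTERVAL", "INCREASING",
              "CONSUMABLE", "DESCRIPTION")
    lines = ["Begin Resource"]
    for row in [header] + resource_rows(device_count):
        lines.append("%-12s %-7s %-9s %-11s %-11s %s" % row)
    lines.append("End Resource")
    return "\n".join(lines)


def render_resource_map(device_count):
    """Render a ResourceMap fragment that maps every resource to [default]."""
    lines = ["Begin ResourceMap", "%-17s %s" % ("RESOURCENAME", "LOCATION")]
    for row in resource_rows(device_count):
        lines.append("%-17s %s" % (row[0], "[default]"))
    lines.append("End ResourceMap")
    return "\n".join(lines)


print(render_resource_section(2))
print(render_resource_map(2))
```

With a device count of 2, the generated rows match the resource names shown in step 2: nmics plus twelve per-device resources.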

  3. Use the lsload -I command to show Intel Xeon Phi resources:
    lsload -I nmics:ngpus:ngpus_shared:ngpus_excl_t:ngpus_excl_p
    HOST_NAME       status nmics ngpus ngpus_shared ngpus_excl_t ngpus_excl_p
    hostA           ok      -    3.0   12.0         0.0          0.0
    hostB           ok     1.0    -     -            -            -
    hostC           ok     1.0    -     -            -            -
    hostD           ok     1.0    -     -            -            -
    hostE           ok     1.0    -     -            -            -
    hostF           ok      -    3.0    12.0        0.0          0.0
    hostG           ok      -    3.0    12.0        0.0          1.0
    hostH           ok      -    3.0    12.0        1.0          0.0
    hostI           ok     2.0    -      -           -            -
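To pick out the hosts that report co-processors, the tabular output above can be post-processed. The following Python sketch parses text in the layout shown in this step. The function name hosts_with_mics is hypothetical, and the sample is an excerpt of the output above. The sketch assumes whitespace-separated columns and skips any line whose field count differs from the header, for example hosts that are unavailable.

```python
SAMPLE = """\
HOST_NAME       status nmics ngpus ngpus_shared ngpus_excl_t ngpus_excl_p
hostA           ok      -    3.0   12.0         0.0          0.0
hostB           ok     1.0    -     -            -            -
hostI           ok     2.0    -      -           -            -
"""


def hosts_with_mics(lsload_text):
    """Map host name to its nmics value, skipping hosts that report '-'."""
    lines = [line for line in lsload_text.splitlines() if line.strip()]
    header = lines[0].split()
    column = header.index("nmics")
    result = {}
    for line in lines[1:]:
        fields = line.split()
        # Ignore malformed rows and hosts without a reported nmics value.
        if len(fields) != len(header) or fields[column] == "-":
            continue
        result[fields[0]] = float(fields[column])
    return result


print(hosts_with_mics(SAMPLE))
```

For the excerpt, only hostB and hostI are returned, because hostA reports a dash in the nmics column.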
    
  4. Use the bhosts -l command to see how the LSF scheduler allocated Intel Xeon Phi resources. These resources are treated as normal host-based resources:
    bhosts -l hostA
    HOST  hostA
    STATUS   CPUF  JL/U   MAX  NJOBS  RUN  SSUSP  USUSP  RSV DISPATCH_WINDOW
    ok       60.00  -     12   2      2    0      0      0   -
     
    CURRENT LOAD USED FOR SCHEDULING:
             r15s  r1m  r15m  ut  pg   io  ls it tmp  swp   mem   slots nmics
    Total    0.0   0.0  0.0   0%  0.0  3   4  0  28G  3.9G  22.5G  10   0.0
    Reserved 0.0   0.0  0.0   0%  0.0  0   0  0  0M   0M    0M      -    - 
     
              ngpus ngpus_shared ngpus_excl_t ngpus_excl_p
    Total     3.0   10.0         0.0          0.0
    Reserved  0.0   2.0          0.0          0.0
     
    LOAD THRESHOLD USED FOR SCHEDULING:
               r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
    loadSched   -    -     -    -   -   -   -   -   -    -    -  
    loadStop    -    -     -    -   -   -   -   -   -    -    -  
     
                nmics ngpus ngpus_shared ngpus_excl_t ngpus_excl_p 
    loadSched   -     -     -            -            -  
    loadStop    -     -     -            -            -  
    
  5. Use the lshosts -l command to see the information for Intel Xeon Phi co-processors that is collected by the elim:
    lshosts -l hostA
     
    HOST_NAME:  hostA
    type    model        cpuf ncpus ndisks maxmem maxswp maxtmp rexpri server nprocs ncores nthreads
    X86_64  Intel_EM64T  60.0 12    1      23.9G  3.9G   40317M 0      Yes    2      6      1
     
    RESOURCES: (mg)
    RUN_WINDOWS:  (always open)
     
    LOAD_THRESHOLDS:
    r15s  r1m  r15m ut pg io ls it tmp swp mem nmics ngpus ngpus_shared ngpus_excl_t ngpus_excl_p
    -     3.5  -    -  -  -  -  -  -   -   -   -     -     -            -            -
    
  6. Submit jobs. Use the select[] string in a resource requirement (-R option) to choose the hosts that have Intel Xeon Phi resources. Use the rusage[] string to tell LSF how many resources to use.
    • Use Intel Xeon Phi resources in an LSF MPI job:
      bsub -n 4 -R "rusage[nmics=2]" mpirun -lsf mic_app
    • Request n Intel Xeon Phi co-processors:
      bsub -R "rusage[nmics=n]"
    • Consume one Intel Xeon Phi resource on the execution host:
      bsub -R "rusage[nmics=1]" mic_app 
    • Run the job on one host and consume 2 Intel Xeon Phi resources on that host:
      bsub -R "rusage[nmics=2]" mic_app
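When jobs are submitted from a script, the rusage[] string in the examples above can be assembled programmatically. The Python sketch below builds the argument list only and does not run bsub. The function name bsub_mic_command and its parameters are hypothetical, and mic_app is the placeholder application name from the examples.

```python
import shlex


def bsub_mic_command(app_args, nmics, tasks=None):
    """Build a bsub argument list that requests nmics co-processors via rusage[]."""
    if nmics < 1:
        raise ValueError("nmics must be at least 1")
    command = ["bsub"]
    if tasks is not None:
        # -n sets the number of tasks, as in the MPI example above.
        command += ["-n", str(tasks)]
    command += ["-R", f"rusage[nmics={nmics}]"]
    command += list(app_args)
    return command


cmd = bsub_mic_command(["mic_app"], nmics=2)
# shlex.join quotes the resource string so that it is safe to paste into a shell.
print(shlex.join(cmd))
```

On a host where LSF is installed, the returned list can be passed to subprocess.run without further quoting.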