Installing on a compute host

Add a compute host that runs deep learning workloads to your IBM Spectrum Conductor™ Deep Learning Impact 1.2.3 environment.

You must install and configure IBM Spectrum Conductor Deep Learning Impact on at least one management host and use it as the master host. You can then install IBM Spectrum Conductor and IBM Spectrum Conductor Deep Learning Impact on additional compute hosts, which provide computing resources to the cluster.

Before you install the IBM Spectrum Conductor and IBM Spectrum Conductor Deep Learning Impact packages on a compute host, ensure that the host has the prerequisite packages installed and that all compute hosts have the same set of packages installed.
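One way to confirm that compute hosts have the same packages installed is to capture each host's RPM package list and compare the lists. The following POSIX shell sketch assumes you have already saved each host's list to a file, for example with rpm -qa --qf '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' | sort > compute1.pkgs on each host. The function name compare_pkg_lists is hypothetical and is not part of either product.

```shell
#!/bin/sh
# Sketch only: compare_pkg_lists is a hypothetical helper.
# Input: two files, each with one installed package per line (for example,
# generated on each host with: rpm -qa --qf '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n').
# Output: packages found only in the first file (flush left) and packages
# found only in the second file (indented by a tab).
# Returns 0 if the lists are identical, 1 if they differ, 2 on temp-file errors.
compare_pkg_lists() {
  sorted_a=$(mktemp) && sorted_b=$(mktemp) || return 2
  # comm requires sorted input, so sort (and de-duplicate) both lists first.
  sort -u "$1" > "$sorted_a"
  sort -u "$2" > "$sorted_b"
  # comm -3 suppresses the lines that appear in both files.
  diff_out=$(comm -3 "$sorted_a" "$sorted_b")
  rm -f "$sorted_a" "$sorted_b"
  if [ -z "$diff_out" ]; then
    return 0
  fi
  printf '%s\n' "$diff_out"
  return 1
}
```

For example, compare_pkg_lists compute1.pkgs compute2.pkgs prints nothing and returns 0 when both hosts have identical package lists; any line it prints identifies a package to install or align before you continue.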

Before installing IBM Spectrum Conductor Deep Learning Impact, ensure that you have completed all system setup tasks: Set up your system (Manual install) and Configure a system for IBM Spectrum Conductor Deep Learning Impact.

  1. Log in to the host as root or as a user with sudo access to root.
  2. Copy the installation files to the compute host.
  3. Define the following environment variables that are required by IBM Spectrum Conductor Deep Learning Impact. These values must be the same across all compute hosts.
    Environment variable Description
    CLUSTERADMIN Mandatory. Set to any valid operating system user account. This account owns all installation files. For example:
    export CLUSTERADMIN=egoadmin
    DLI_SHARED_FS Mandatory. Set to the shared file system directory used by IBM Spectrum Conductor Deep Learning Impact for deep learning user data. The default is /gpfs/dlfs1. For example:
    export DLI_SHARED_FS=/gpfs/dlfs1
    Note: Ensure that the cluster administrator has read, write, and execute permissions on this directory.
    DLI_DATA_FS Mandatory. Set to the mount point for shared data storage. For example:
    export DLI_DATA_FS=/dli_data_fs
    Note: Shared data storage must use NFS version 4 (NFSv4).
    DLI_RESULT_FS Mandatory. Set to the mount point for shared result data storage. For example:
    export DLI_RESULT_FS=/dli_result_fs
    Note: This mount point cannot be the same as DLI_SHARED_FS.
    DLI_CONDA_HOME Mandatory. Set to the Anaconda directory that is used by deep learning frameworks. The default value is /opt/anaconda3. For example:
    export DLI_CONDA_HOME=/opt/anaconda3
    EGOCOMPUTEHOST Mandatory. Set to Y to configure the host as a compute host. For example:
    export EGOCOMPUTEHOST=Y
    DLI_RDMA_ENABLE Mandatory if RDMA is enabled on the management hosts; otherwise, not required. When set to Y, enables remote direct memory access (RDMA) for elastic distributed training. The default value is N. For example:
    export DLI_RDMA_ENABLE=N

    Note: If you are installing to a shared environment, you must update the profile.elastic file in the $EGO_TOP/dli/1.2.3/dlpd/conf directory by adding the RDMA_DEVICE_NAME and RDMA_DEVICE_PORT settings for each compute host. For example:

    RDMA_DEVICE_NAME=compute1.ibm.com:mlx4_0
    RDMA_DEVICE_PORT=compute1.ibm.com:1
    RDMA_DEVICE_NAME=compute2.ibm.com:mlx4_0
    RDMA_DEVICE_PORT=compute2.ibm.com:1 
    DLI_RDMA_DEVICE_NAME Required if DLI_RDMA_ENABLE is set to Y. Specifies the RDMA device name. For example:
    export DLI_RDMA_DEVICE_NAME=mlx4_0
    DLI_RDMA_DEVICE_PORT Required if DLI_RDMA_ENABLE is set to Y. Specifies the RDMA device port. The specified port must be available and in the PORT_ACTIVE state. For example:
    export DLI_RDMA_DEVICE_PORT=1
    DLI_RDMA_BUFFER_SIZE Optional; applies only if DLI_RDMA_ENABLE is set to Y. Specifies the RDMA buffer size in bytes. If not specified, the default value of 1 GB (1073741824 bytes) is used. For example:
    export DLI_RDMA_BUFFER_SIZE=1073741824
  4. Run the IBM Spectrum Conductor Deep Learning Impact installer package.
    Important: Make sure that you install IBM Spectrum Conductor Deep Learning Impact to the same directory as the IBM Spectrum Conductor installation.
    Entitled version
    • If IBM Spectrum Conductor was installed with default settings, run the command for your platform:
      • Power® install
        sudo ./dli-1.2.3.0_ppc64le.bin --quiet
      • x86 install
        sudo ./dli-1.2.3.0_x86_64.bin --quiet
    • If IBM Spectrum Conductor was not installed with default settings, run the command for your platform:
      • Power install
        sudo ./dli-1.2.3.0_ppc64le.bin --prefix install_location --dbpath dbpath_location --quiet
      • x86 install
        sudo ./dli-1.2.3.0_x86_64.bin --prefix install_location --dbpath dbpath_location --quiet
      • --prefix install_location specifies the absolute path to the installation directory. The --prefix parameter is optional. If you install without the --prefix option, IBM Spectrum Conductor Deep Learning Impact is installed in its default directory: /opt/ibm/spectrumcomputing. Ensure that the path is set to the same directory as IBM Spectrum Conductor.
      • --dbpath dbpath_location sets the RPM database to a directory different from the default /var/lib/rpm. The --dbpath parameter is optional. Ensure that the path is set to the same directory as IBM Spectrum Conductor.
      • --quiet enables silent installation. The --quiet parameter is optional.
    Evaluation version
    • If IBM Spectrum Conductor was installed with default settings, run the command for your platform:
      • Power install
        sudo ./dlieval-1.2.3.0_ppc64le.bin --quiet
      • x86 install
        sudo ./dlieval-1.2.3.0_x86_64.bin --quiet
    • If IBM Spectrum Conductor was not installed with default settings, run the command for your platform:
      • Power install
        sudo ./dlieval-1.2.3.0_ppc64le.bin --prefix install_location --dbpath dbpath_location --quiet
      • x86 install
        sudo ./dlieval-1.2.3.0_x86_64.bin --prefix install_location --dbpath dbpath_location --quiet
      • --prefix install_location specifies the absolute path to the installation directory. The --prefix parameter is optional. If you install without the --prefix option, IBM Spectrum Conductor Deep Learning Impact is installed in its default directory: /opt/ibm/spectrumcomputing. Ensure that the path is set to the same directory as IBM Spectrum Conductor.
      • --dbpath dbpath_location sets the RPM database to a directory different from the default /var/lib/rpm. The --dbpath parameter is optional. Ensure that the path is set to the same directory as IBM Spectrum Conductor.
      • --quiet enables silent installation. The --quiet parameter is optional.
  5. Source the environment by running one of the following commands, where <install_location> is the path to your installation directory (the default is /opt/ibm/spectrumcomputing):
    • For the Bash shell, run: source <install_location>/profile.platform
    • For the C shell (csh), run: source <install_location>/cshrc.platform
  6. As the root user, log in to the cluster as the cluster administrator user:
    egosh user logon -u username -x password
    For example:
    egosh user logon -u Admin -x Admin
  7. On the management or compute host that is being added to the cluster, install the WML Accelerator license conda package and accept the license. See Installing the WML Accelerator license on an additional node.
  8. Add the compute host to the IBM Spectrum Conductor cluster. See Adding a host to a cluster.
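A mistyped value in step 3, such as a result mount point that matches DLI_SHARED_FS, is easier to catch before the installer runs. The following POSIX shell sketch checks the step 3 variables in the current shell against the rules stated on this page. check_dli_env and data_fs_is_nfs4 are hypothetical helper names that are not part of the installer, the EGOCOMPUTEHOST=Y check assumes the example value is the required one, and the NFSv4 check assumes the findmnt utility from util-linux is available.

```shell
#!/bin/sh
# Sketch only: check_dli_env and data_fs_is_nfs4 are hypothetical helpers,
# not part of IBM Spectrum Conductor Deep Learning Impact.

# Report missing or conflicting step 3 variables on stderr; return 1 on any problem.
check_dli_env() {
  status=0
  for var in CLUSTERADMIN DLI_SHARED_FS DLI_DATA_FS DLI_RESULT_FS DLI_CONDA_HOME EGOCOMPUTEHOST; do
    eval "val=\${$var:-}"   # indirect lookup of the variable named in $var
    if [ -z "$val" ]; then
      echo "missing: $var" >&2
      status=1
    fi
  done
  # Assumption: a compute host uses EGOCOMPUTEHOST=Y, as in the example.
  if [ -n "${EGOCOMPUTEHOST:-}" ] && [ "$EGOCOMPUTEHOST" != "Y" ]; then
    echo "EGOCOMPUTEHOST should be Y on a compute host" >&2
    status=1
  fi
  # DLI_RESULT_FS cannot use the same mount point as DLI_SHARED_FS.
  if [ -n "${DLI_RESULT_FS:-}" ] && [ "$DLI_RESULT_FS" = "${DLI_SHARED_FS:-}" ]; then
    echo "DLI_RESULT_FS must differ from DLI_SHARED_FS" >&2
    status=1
  fi
  # The RDMA device name and port are required only when RDMA is enabled.
  if [ "${DLI_RDMA_ENABLE:-N}" = "Y" ]; then
    for var in DLI_RDMA_DEVICE_NAME DLI_RDMA_DEVICE_PORT; do
      eval "val=\${$var:-}"
      if [ -z "$val" ]; then
        echo "missing: $var (required when DLI_RDMA_ENABLE=Y)" >&2
        status=1
      fi
    done
  fi
  return "$status"
}

# Return 0 if the given path is on an NFSv4 mount (assumes findmnt from util-linux).
data_fs_is_nfs4() {
  [ "$(findmnt -n -o FSTYPE -T "$1" 2>/dev/null)" = "nfs4" ]
}
```

For example, after exporting the step 3 variables, run check_dli_env && data_fs_is_nfs4 "$DLI_DATA_FS" before step 4; each missing or conflicting setting is reported on standard error.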
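When you install to a shared environment with RDMA enabled, the profile.elastic entries described in step 3 follow a host:value pattern, one RDMA_DEVICE_NAME and one RDMA_DEVICE_PORT line per compute host. The following sketch generates those lines from a hypothetical list of "host device port" triples on standard input. It also includes an optional check for the PORT_ACTIVE requirement, which assumes the ibv_devinfo utility (typically shipped in the rdma-core or libibverbs-utils package) is installed on the compute host. Both function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: rdma_profile_lines and rdma_port_active are hypothetical helpers.

# Read "host device port" triples from stdin and print the matching
# RDMA_DEVICE_NAME / RDMA_DEVICE_PORT lines in host:value form.
rdma_profile_lines() {
  while read -r host device port; do
    [ -n "$host" ] || continue   # skip blank lines
    printf 'RDMA_DEVICE_NAME=%s:%s\n' "$host" "$device"
    printf 'RDMA_DEVICE_PORT=%s:%s\n' "$host" "$port"
  done
}

# Return 0 if port $2 of RDMA device $1 reports PORT_ACTIVE.
# Assumes ibv_devinfo is installed; run it on the compute host itself.
rdma_port_active() {
  ibv_devinfo -d "$1" -i "$2" | grep -q 'PORT_ACTIVE'
}
```

For example, printf 'compute1.ibm.com mlx4_0 1\n' | rdma_profile_lines prints the two entries for compute1.ibm.com; review the output before adding it to the profile.elastic file in $EGO_TOP/dli/1.2.3/dlpd/conf.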
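The installer file name in step 4 depends on the platform (Power or x86) and on whether you have the entitled or evaluation version. The following sketch prints the matching command. It assumes that uname -m reports ppc64le or x86_64 on the supported platforms; dli_install_cmd is a hypothetical helper, and any --prefix and --dbpath values you pass must match your IBM Spectrum Conductor installation.

```shell
#!/bin/sh
# Sketch only: dli_install_cmd is a hypothetical helper.
# Usage: dli_install_cmd ARCH EDITION [PREFIX] [DBPATH]
#   ARCH     ppc64le (Power) or x86_64 (x86), for example from `uname -m`
#   EDITION  entitled (dli-*.bin) or eval (dlieval-*.bin)
#   PREFIX   optional; only if IBM Spectrum Conductor was not installed with defaults
#   DBPATH   optional; must match the RPM database used by IBM Spectrum Conductor
dli_install_cmd() {
  arch=$1 edition=$2 prefix=${3:-} dbpath=${4:-}
  case "$arch" in
    ppc64le|x86_64) ;;
    *) echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
  case "$edition" in
    entitled) pkg=dli ;;
    eval) pkg=dlieval ;;
    *) echo "unknown edition: $edition" >&2; return 1 ;;
  esac
  cmd="sudo ./${pkg}-1.2.3.0_${arch}.bin"
  # Add --prefix and --dbpath only when values were supplied.
  if [ -n "$prefix" ]; then cmd="$cmd --prefix $prefix"; fi
  if [ -n "$dbpath" ]; then cmd="$cmd --dbpath $dbpath"; fi
  echo "$cmd --quiet"
}
```

For example, dli_install_cmd "$(uname -m)" entitled prints sudo ./dli-1.2.3.0_x86_64.bin --quiet on an x86 host. Printing the command rather than running it lets you confirm the file name and paths before you start the installation.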