Installing

IBM Watson® Machine Learning Accelerator consists of several components. You can install all of the components that make up WML Accelerator either manually, by following the steps in this topic, or by using the Automated installer.

Install only the most current level of each release of IBM Watson Machine Learning Accelerator. Version numbers use the format version.release.modification.
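Because version numbers follow the version.release.modification format, the most current level of a release is the one with the highest modification number, compared numerically (1.2.10 is newer than 1.2.9). The following is a minimal sketch that picks that level from a list of available levels. The latest_level function and the way levels are passed to it are illustrative assumptions, not part of the product.

```shell
#!/bin/sh
# Hypothetical helper: print the highest version.release.modification level
# that belongs to the given version.release (for example, 1.2).
# Usage: latest_level RELEASE LEVEL...
latest_level() {
  release="$1"
  shift
  # Escape the dots so that "1.2" matches only a literal "1.2".
  pattern=$(printf '%s' "$release" | sed 's/\./\\./g')
  # Keep only levels of the requested release, sort numerically on each
  # dot-separated field, and print the last (highest) one.
  printf '%s\n' "$@" \
    | grep "^${pattern}\.[0-9][0-9]*\$" \
    | sort -t. -k1,1n -k2,2n -k3,3n \
    | tail -n 1
}
```

For example, latest_level 1.2 1.2.0 1.2.1 1.1.5 prints 1.2.1. Sorting on each numeric field, rather than comparing whole strings, keeps 1.2.10 ahead of 1.2.9.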

Note:
  • The Automated installer automates some of the setup and install processes. If you want to use the Automated installer, use the information in this topic instead: Automated installer.
Before you begin
  1. If you plan on using WML Accelerator with IBM Watson Studio Local, you must ensure that IBM Watson Studio Local is installed: Installing IBM Watson Studio Local.
  2. If required, make sure to configure a pluggable authentication module (PAM), like LDAP, for user authentication: Configuring user authentication for PAM and default clients.

    If you use WML Accelerator with IBM Watson Studio Local, user authentication can be handled by PAM; otherwise, the cluster administrator must add each user to WML Accelerator.

  3. Ensure that your system meets all requirements: Hardware and software requirements.
  4. Ensure that you have set up your system: Set up your system (Manual install).
Steps:
  1. Log in to the host with root permission.
  2. Download the appropriate install package to the master host. If you are entitled to the packages, download them from Passport Advantage Online or Entitled Systems Support (ESS). If you want to evaluate the product, download the evaluation packages from the WML Accelerator 1.2.0 Evaluation page.
  3. Extract the component packages.
    1. Log in to the master host with root or sudo to root permission.
    2. Ensure that you are in the base conda environment:
      . /opt/anaconda3/etc/profile.d/conda.sh
      conda activate base
    3. Run the WML Accelerator install package to extract the install files:
      Entitled version
      • To extract the install files to the current directory, run:
        sh ibm-wmla-1.2.0_ppc64le.bin
      • To extract the install files to a different directory, run:
        sh ibm-wmla-1.2.0_ppc64le.bin --extract extract_directory
      Evaluation version
      • To extract the install files to the current directory, run:
        sh ibm-wmla-eval-1.2.0_ppc64le.bin
      • To extract the install files to a different directory, run:
        sh ibm-wmla-eval-1.2.0_ppc64le.bin --extract extract_directory
      Note: PowerAI 1.6.0 and the WML Accelerator license are installed to .powerai in the home directory of the user who performed the install. The default install location for IBM Spectrum Conductor™ Deep Learning Impact and IBM Spectrum Conductor is /opt/ibm/spectrumcomputing.
  4. The WML Accelerator license panel is displayed. Review the license terms and accept the license.
  5. Configure the system for IBM Spectrum Conductor Deep Learning Impact: Configure a system for IBM Spectrum Conductor Deep Learning Impact.
  6. Install IBM Spectrum Conductor by following the instructions in one of these topics, depending on your environment:
  7. Entitle IBM Spectrum Conductor: Entitle IBM Spectrum Conductor.
  8. Install IBM Spectrum Conductor Deep Learning Impact: Install IBM Spectrum Conductor Deep Learning Impact.
  9. Give the IBM Spectrum Conductor Deep Learning Impact cluster administrator ($CLUSTERADMIN, egoadmin by default) permission to write to the audit directory. In the following command, $EGO_TOP is the path to your installation directory; the default is /opt/ibm/spectrumcomputing.
    setfacl -m u:$CLUSTERADMIN:rwx $EGO_TOP/kernel/audit
  10. If IBM Watson Studio Local is installed with WML Accelerator, update the public key.
    1. Get the standalone IBM Watson Studio Local certificate:
      wget https://ws_host:ws_port/auth/jwtcert
      where ws_host and ws_port are the IBM Watson Studio Local host IP address and port.
    2. Get the public PEM key from the certificate:
      openssl x509 -pubkey -in jwtcert -noout >new_pub_key.pem
    3. Locate the JWT key file in IBM Spectrum Conductor Deep Learning Impact. In this example, $EGO_TOP is /opt/wmla/ego_top:
      grep DLI_JWT_SECRET_KEY /opt/wmla/ego_top/dli/conf/dlpd/dlpd.conf
      The output shows the path of the key file, for example:
       "DLI_JWT_SECRET_KEY": "/dlishared/public_key.pem",
    4. Update the JWT public key file with the new public key:
      cat new_pub_key.pem > /dlishared/public_key.pem
    5. Update the permissions of the JWT key file:
      chmod 777 /dlishared/public_key.pem
    6. Restart the dlpd service:
      source /opt/wmla/ego_top/profile.platform
      egosh user logon -u Admin -x Admin
      egosh service stop dlpd
      sleep 5
      egosh service start dlpd
  11. If IBM Watson Studio Local is installed, create a dedicated user for IBM Watson Studio Local, for example wml-user. In IBM Spectrum Conductor, complete the following steps:
    Note: Each IBM Watson Studio Local user who runs training jobs must be added to WML Accelerator with a matching user name. You can use a common LDAP server for both IBM Watson Studio Local and WML Accelerator to store user credentials.
    1. In the cluster management console, select System & Services > Users > Roles.
    2. Click the Create New User Account icon in the Users column.
    3. Fill in the fields, specifying wml-user as the account name.
    4. Click Create.
    5. Select the user wml-user and enable either the consumer user role or the data scientist role.
  12. Configure IBM Spectrum Conductor Deep Learning Impact: Configure IBM Spectrum Conductor Deep Learning Impact. If IBM Watson Studio Local is installed, you can configure two Spark instance groups: one for distributed training and one for elastic distributed training. Create these Spark instance groups as the wml-user user.
  13. If IBM Watson Studio Local is installed and both Spark instance groups are created and configured, you can set up resource sharing between the two Spark instance groups so that all workloads can access available resources.
    1. Open and edit the ego.conf configuration file to set EGO_ENABLE_BORROW_ONLY_CONSUMER to Y. Save your changes and close the file.
    2. In the consumer plan, update slot formation for both Spark instance groups.
      • For the elastic distributed training Spark instance group, specify 0 owned slots for the top consumer.
      • For the other Spark instance group, specify 1 owned slot for the top consumer.
    3. Set the limit for the distributed training consumer to the total number of slots in the GPU group minus 1.
    4. After you update ego.conf, you can modify the consumer from the cluster management console. Set the share ratio for non-master workloads to 0.
  14. Apply any available fixes from Fix Central:
    Note: If IBM Watson Studio Local is installed with WML Accelerator, make sure to apply interim fix 517129.
    1. Obtain the tools that are required to get the fix. If your product update installer is not already installed, download it from Fix Central: http://www.ibm.com/support/fixcentral. This site provides download, installation, and configuration instructions for the update installer.
      Note: For more information about how to obtain software fixes, from the Fix Central page, click Getting started with Fix Central, then click the Software tab.
    2. Under Find product, type WML Accelerator in the Product selector field.
    3. Select WML Accelerator. For Installed version, select All. For Platform, select the appropriate platform or select All, then click Continue.
    4. Identify and select the fix that is required, then click Continue.
    5. Click the "readme" link and review the readme file for instructions to apply the fix and other important information.
    6. Download the fix, ensuring that the name of the maintenance file is not changed, either intentionally or by the web browser or download utility.
    7. Use the information in the readme file to apply the fix.
  15. If IBM Watson Studio Local is installed with WML Accelerator, run the updateWMLClusterdetails.sh command-line utility, which allows IBM Watson Studio Local to locate and use a WML Accelerator instance and to identify which WML Accelerator details to use. For example:
    updateWMLClusterdetails.sh https://wmla-master.example.com 9243 wml-ig wml-ig-edt
    Learn more about running the command line utility: Setting up WML Accelerator with IBM Watson Studio Local
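Some of the file edits in steps 10 and 13 can be scripted. The following is a minimal sketch, not an IBM-provided tool. The function names are hypothetical; dlpd.conf is assumed to contain the DLI_JWT_SECRET_KEY line shown in step 10.3; ego.conf is assumed to use KEY=VALUE lines; and sed -i assumes GNU sed. Sourcing the file only defines the functions; nothing runs on its own.

```shell
#!/bin/sh
# Hypothetical helpers for steps 10 and 13. Source this file, then call the
# functions explicitly.

# Step 10.3: print the JWT public key path configured in dlpd.conf.
# $1: path to dlpd.conf, for example $EGO_TOP/dli/conf/dlpd/dlpd.conf
jwt_key_path() {
  sed -n 's/.*"DLI_JWT_SECRET_KEY"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Steps 10.2 to 10.4: extract the public key from the downloaded certificate
# and write it to the key file named in dlpd.conf.
# $1: certificate file (jwtcert), $2: path to dlpd.conf
update_jwt_key() {
  key_file=$(jwt_key_path "$2")
  [ -n "$key_file" ] || { echo "DLI_JWT_SECRET_KEY not found in $2" >&2; return 1; }
  # Write to a temporary file first so that a failed extraction does not
  # truncate the existing key file.
  tmp=$(mktemp) || return 1
  openssl x509 -pubkey -in "$1" -noout > "$tmp" && cat "$tmp" > "$key_file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Step 13.1: set EGO_ENABLE_BORROW_ONLY_CONSUMER=Y, replacing an existing
# setting or appending one if none exists.
# $1: path to ego.conf (assumed location: $EGO_TOP/kernel/conf/ego.conf)
enable_borrow_only_consumer() {
  if grep -q '^EGO_ENABLE_BORROW_ONLY_CONSUMER=' "$1"; then
    sed -i 's/^EGO_ENABLE_BORROW_ONLY_CONSUMER=.*/EGO_ENABLE_BORROW_ONLY_CONSUMER=Y/' "$1"
  else
    echo 'EGO_ENABLE_BORROW_ONLY_CONSUMER=Y' >> "$1"
  fi
}

# Step 13.3: print the limit for the distributed training consumer, which is
# the total number of slots in the GPU group minus 1.
# $1: total number of slots in the GPU group
dt_consumer_limit() {
  [ "$1" -ge 1 ] 2>/dev/null || { echo "slot count must be at least 1" >&2; return 1; }
  echo $(( $1 - 1 ))
}
```

For example, with $EGO_TOP set to /opt/wmla/ego_top, running update_jwt_key jwtcert /opt/wmla/ego_top/dli/conf/dlpd/dlpd.conf covers steps 10.2 to 10.4; steps 10.5 and 10.6 still apply as written. Similarly, dt_consumer_limit 8 prints 7 for a GPU group with 8 slots. Changes to ego.conf still need the consumer updates in steps 13.2 to 13.4.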
Notes:
  • Before adding any hosts to the cluster, you must manually install the WML Accelerator license conda package and accept the license on the management or compute host that is being added to the cluster. See this topic for instructions: Installing the WML Accelerator license on a host.
  • When installing IBM Spectrum Conductor on a compute node, do not add the node to the cluster until after you install IBM Spectrum Conductor Deep Learning Impact on it.