IBM Watson® Machine Learning Accelerator (WML Accelerator) consists of several components. Follow these steps to install all of the components that make up WML Accelerator, either manually or by using the Automated installer.
Only the most current level of each release of IBM Watson Machine Learning Accelerator should be installed, where version
numbers are in the format version.release.modification.
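As a rough illustration of the version.release.modification format, the sketch below picks the highest level from a list of version strings. The helper name latest_level and the version numbers are made up for the example, and it assumes GNU sort (for the -V version-sort option) is available.

```shell
# Hypothetical helper: print the highest version.release.modification
# string among the arguments. Assumes GNU sort, which provides -V.
latest_level() {
    printf '%s\n' "$@" | sort -V | tail -n 1
}

# Example with made-up level numbers; prints 1.2.1
latest_level 1.2.0 1.2.1 1.1.3
```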
Note: The Automated installer automates some of the setup and installation processes. If you want to use the Automated installer, use the information in this topic instead: Automated installer.
Before you begin:
- If you plan to use WML Accelerator with IBM Watson Studio Local, ensure that IBM Watson Studio Local is installed: Installing IBM Watson Studio Local.
- If required, configure a pluggable authentication module (PAM), such as LDAP, for user authentication: Configuring user authentication for PAM and default clients.
If you use WML Accelerator with IBM Watson Studio Local, user authentication can be handled by PAM; otherwise, the cluster administrator must add each user to WML Accelerator.
- Ensure that your system meets all requirements: Hardware and software requirements.
- Ensure that you have set up your system: Set up your system (Manual install).
Steps:
- Log in to the host with root permission.
- Download the appropriate install package on the master host. If you are entitled to the packages, download them from Passport Advantage Online or Entitled Systems Support (ESS). If you want to evaluate the product, download the evaluation packages from the WML Accelerator 1.2.0 Evaluation page.
- Extract the component packages.
- Log in to the master host with root or sudo to root permission.
- Ensure that you are in the base conda environment:
. /opt/anaconda3/etc/profile.d/conda.sh
conda activate base
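To confirm that the activation worked before continuing, a check along these lines may help. It is a minimal sketch: the helper name in_base_env is hypothetical, and it relies on the CONDA_DEFAULT_ENV variable that conda activate sets in the current shell.

```shell
# Hypothetical helper: succeeds only when the active conda environment
# is "base". CONDA_DEFAULT_ENV is set by "conda activate".
in_base_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "base" ]
}

if in_base_env; then
    echo "base environment is active"
else
    echo "base environment is not active; run: conda activate base" >&2
fi
```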
- Run the WML Accelerator install package to extract the
install files:
- Entitled version
-
- Evaluation version
-
Note: PowerAI 1.6.0 and the WML Accelerator license are installed to .powerai in the home directory of the user who performed the install. The default install location for IBM Spectrum Conductor™ Deep Learning Impact and IBM Spectrum Conductor is /opt/ibm/spectrumcomputing.
- The WML Accelerator license panel is displayed. Review
the license terms and accept the license.
- Configure the system for IBM Spectrum Conductor Deep Learning Impact: Configure a system for IBM Spectrum Conductor Deep Learning Impact.
- Install IBM Spectrum
Conductor by following
the instructions in one of these topics, depending on your environment:
- Entitle IBM Spectrum Conductor: Entitle IBM Spectrum Conductor.
- Install IBM Spectrum Conductor Deep Learning Impact: Install IBM Spectrum Conductor Deep Learning Impact.
- Grant $CLUSTERADMIN, the cluster administrator of IBM Spectrum Conductor Deep Learning Impact (by default, egoadmin), permission to write to the audit directory. In the following command, $EGO_TOP is the path to your installation directory (by default, /opt/ibm/spectrumcomputing):
setfacl -m u:$CLUSTERADMIN:rwx $EGO_TOP/kernel/audit
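One way to confirm that the ACL entry is in place is to inspect the getfacl output for the directory. The sketch below is illustrative only: acl_grants_rwx is a hypothetical helper that reads getfacl output on standard input, and the commented usage line assumes $EGO_TOP and $CLUSTERADMIN are set as in the command above.

```shell
# Hypothetical helper: read getfacl output on stdin and succeed if the
# named user has an entry that starts with rwx permissions.
acl_grants_rwx() {
    grep -q "^user:$1:rwx"
}

# Intended use on the master host (commented out because it needs the
# real audit directory):
# getfacl -p "$EGO_TOP/kernel/audit" | acl_grants_rwx "$CLUSTERADMIN" \
#     && echo "ACL entry found for $CLUSTERADMIN"
```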
- If IBM Watson Studio Local is installed with WML Accelerator, update the public key.
- Get the standalone IBM Watson Studio Local certificate:
wget https://ws_host:ws_port/auth/jwtcert
where ws_host and ws_port are the IBM Watson Studio Local host IP address and port.
- Get the public PEM key from the certificate:
openssl x509 -pubkey -in jwtcert -noout >new_pub_key.pem
- Locate the JWT key file in IBM Spectrum Conductor Deep Learning Impact. The command prints the configured path:
grep DLI_JWT_SECRET_KEY /opt/wmla/ego_top/dli/conf/dlpd/dlpd.conf
"DLI_JWT_SECRET_KEY": "/dlishared/public_key.pem",
- Update the JWT public key file with the new public key:
cat new_pub_key.pem > /dlishared/public_key.pem
- Update the permissions of the JWT key file:
chmod 777 /dlishared/public_key.pem
- Restart the dlpd service:
source /opt/wmla/ego_top/profile.platform
egosh user logon -u Admin -x Admin
egosh service stop dlpd
sleep 5
egosh service start dlpd
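The key file path in the steps above is whatever dlpd.conf reports, so a script can read it rather than hard-code it. The following is a minimal sketch under that assumption: jwt_key_path is a hypothetical helper, and it expects the line format shown in the grep output earlier in this step. The backup copy in the commented usage is an optional precaution, not part of the documented procedure.

```shell
# Hypothetical helper: print the path configured for DLI_JWT_SECRET_KEY
# in a dlpd.conf-style file (line format as shown in the grep output).
jwt_key_path() {
    sed -n 's/.*"DLI_JWT_SECRET_KEY"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Intended use on the master host (commented out because it needs the
# real configuration file):
# key_file=$(jwt_key_path /opt/wmla/ego_top/dli/conf/dlpd/dlpd.conf)
# cp -p "$key_file" "$key_file.bak"    # optional: keep the previous key
# cat new_pub_key.pem > "$key_file"
```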
- If IBM Watson Studio Local is installed, create a dedicated user for IBM Watson Studio Local (for example, wml-user). In IBM Spectrum Conductor, complete the following steps:
Note: Each IBM Watson Studio Local user who wants to run training jobs must be added to WML Accelerator with a matching user name. You can use a common LDAP server for both IBM Watson Studio Local and WML Accelerator to store user credentials.
- In the cluster management console, select .
- Click the Create New User Account icon in the
Users column.
- Fill in the fields, specifying wml-user as the account name.
- Click Create.
- Select the user wml-user and enable either the consumer user role
or the data scientist role.
- Configure IBM Spectrum Conductor Deep Learning Impact: Configure IBM Spectrum Conductor Deep Learning Impact. If IBM Watson Studio Local is installed, two Spark instance groups can be configured, one for distributed training and one for elastic distributed training. Create these Spark instance groups as user wml-user.
- If IBM Watson Studio Local is installed and both Spark instance groups are created and configured, you can set up resource sharing between the two Spark instance groups so that all workloads have access to available resources.
- Open and edit the ego.conf configuration file to set
EGO_ENABLE_BORROW_ONLY_CONSUMER to Y. Save your changes
and close the file.
- In the consumer plan, update slot formation for both Spark instance groups.
- For the elastic distributed training Spark instance group, for the top consumer, specify 0 for owned slots.
- For the other Spark instance group, for the top consumer, specify 1 for owned slots.
- Set the limit for the distributed training consumer to the total number of slots in the GPU group minus 1.
- After you update ego.conf, you can modify the consumer from the cluster management console. Set the share ratio for non-master workloads to 0.
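The ego.conf edit in the first sub-step above can also be scripted. The sketch below assumes that ego.conf stores settings as plain KEY=VALUE lines; set_conf_key is a hypothetical helper, and the path to ego.conf is left as a placeholder because it depends on your installation.

```shell
# Hypothetical helper: set KEY=VALUE in a conf file. Replaces an existing
# uncommented assignment (keeping a .bak copy), or appends a new line if
# the key is not present. Assumes plain KEY=VALUE lines.
set_conf_key() {
    file=$1
    key=$2
    value=$3
    if grep -q "^$key=" "$file"; then
        sed -i.bak "s|^$key=.*|$key=$value|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Intended use (placeholder path; commented out because it edits a real file):
# set_conf_key /path/to/ego.conf EGO_ENABLE_BORROW_ONLY_CONSUMER Y
```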
- Apply any available fixes from Fix Central:
Note: If IBM Watson Studio Local is installed with WML Accelerator, make sure to apply interim fix 517129.
- Obtain the tools that are required to get the fix. If the product update installer is not installed, download it from Fix Central: http://www.ibm.com/support/fixcentral. This site provides download, installation, and configuration instructions for the update installer.
Note: For more information about how to obtain software fixes, from the Fix Central page,
click Getting started with Fix Central, then click the Software
tab.
- Under Find product, type WML Accelerator in the
Product selector field.
- Select WML Accelerator. For Installed
version, select All. For Platform, select
the appropriate platform or select All, then click
Continue.
- Identify and select the fix that is required, then click Continue.
- Click the "readme" link and review the readme file for instructions to apply the fix and
other important information.
- Download the fix, ensuring that the name of the maintenance file is not changed, either intentionally or by the web browser or download utility.
- Use the information in the readme file to apply the fix.
- If IBM Watson Studio Local is installed with WML Accelerator, run the updateWMLClusterdetails.sh command line utility, which allows IBM Watson Studio Local to locate and use a WML Accelerator instance and to identify which WML Accelerator details to use. For example:
updateWMLClusterdetails.sh https://wmla-master.example.com 9243 wml-ig wml-ig-edt
Learn more about running the command line utility: Setting up WML Accelerator with IBM Watson Studio Local.
Notes:
- Before you add any host to the cluster, you must manually install the WML Accelerator license conda package and accept the license on the management or compute host that is being added. See this topic for instructions: Installing the WML Accelerator license on a host.
- When you install IBM Spectrum Conductor on a compute node, do not add the node to the cluster until after you install IBM Spectrum Conductor Deep Learning Impact on that node.