Install the elastic distributed inference package.
Note: When installing the elastic distributed inference package, you
must install it on the master host before installing it on any compute
hosts.
Before installing the elastic distributed inference package, ensure the
following:
- Obtain and install interim fix 226695.
- Make sure that you obtain the elastic distributed inference package
(dliedi-1.2.3.0.x86_64.rpm or dliedi-1.2.3.0.ppc64le.rpm)
from IBM Fix Central.
- High availability for elastic distributed inference is implemented using the EGO service virtual
IP and is installed in the shared directory for IBM Spectrum Conductor Deep Learning Impact. To enable high
availability for elastic distributed inference, ensure the following:
- A virtual IP address must be prepared for the management services (imd, lbd, and etcd).
- A hostname is configured to use the virtual IP. It must have the same domain name as the
hosts.
- If SSL is enabled, all hosts must have an entry for the virtual IP and hostname in
/etc/hosts.
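As a sketch, the /etc/hosts entry can be composed as follows; the address and hostname below are the example values used elsewhere on this page, so substitute your own:

```shell
# Compose the /etc/hosts entry for the virtual IP (example values;
# replace with your own virtual IP address and virtual hostname).
VIP=9.21.52.250
VHOST=vauto.ibm.com
ENTRY="$VIP $VHOST"
echo "$ENTRY"   # append this line to /etc/hosts on every host
```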
- If a virtual IP is not specified during installation and
the cluster has more than one network card, it is recommended that you specify which network card to use after installation. If no
network card is specified, the first network card is used by default.
After installation, to specify which network card is used, update the
$EGO_TOP/dlim/conf/etcd.conf file and specify the NETWORKID and
HOSTIP to be used for elastic distributed inference. For example:
NETWORKID=network-IP/24
HOSTIP=network-IP
where network-IP is the network IP address that you want to use for elastic
distributed inference.
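As a sketch, the two etcd.conf settings can be derived from the chosen network IP like this; the IP below is a placeholder, and you can find the address of the card you want with `ip -4 -o addr show`:

```shell
# Build the etcd.conf values for the chosen network card (placeholder IP).
NET_IP=9.21.52.77              # IP address of the network card to use
NETWORKID="${NET_IP}/24"       # the network in CIDR notation
HOSTIP="${NET_IP}"
printf 'NETWORKID=%s\nHOSTIP=%s\n' "$NETWORKID" "$HOSTIP"
# Append these two lines to $EGO_TOP/dlim/conf/etcd.conf
```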
To install elastic distributed inference, complete the following
steps on each host.
- Log in to the host as root (or use sudo to gain root permissions).
- Source the environment. For example:
source /opt/ibm/spectrumcomputing/profile.platform
- Copy the elastic distributed inference package files to the host.
- Set environment variables. You must set the same variables that you used for the
installation of IBM Spectrum Conductor Deep Learning Impact and any
additional variables that enable and define elastic distributed inference properties.
Environment variable descriptions:
- EDI_SHARED_FS
Required. Sets the working shared file system directory used by elastic distributed inference. If not specified, the directory specified by DL_NFS_PATH (found in the dlpd.conf file) is used. It is recommended that you use the default; otherwise, you must ensure the following for the directory:
- The cluster administrator has read, write, and execute permissions on the directory.
- Compute hosts use the same directory as management hosts.
- The directory already exists before you export this variable.
- The directory has 755 permissions.
For example: export EDI_SHARED_FS=/gpfs/dlfs1
Note: For the test functionality to be available from the cluster management console, you must set EDI_SHARED_FS to the same value as DLI_SHARED_FS. Otherwise, the test functionality is disabled.
- EDI_CONDA_HOME
Required. Sets the Anaconda directory that is used by elastic distributed inference for deep learning frameworks. If not specified, the directory specified by DLI_CONDA_HOME (found in the dlpd.conf file) is used. It is recommended that you use the default; otherwise, ensure that the compute hosts use the same directory as the management hosts.
For example: export EDI_CONDA_HOME=/opt/anaconda3
- DISABLESSL
Disables SSL. The default value is N.
Note: This value must match your IBM Spectrum Conductor installation.
For example: export DISABLESSL=N
- DLI_EDI_MGT_NETWORKID
Specifies the inference service management network.
For example: export DLI_EDI_MGT_NETWORKID=9.21.52.77/24
- DLI_EDI_VIP_NETWORKID
Specifies the inference service base virtual IP network that is used for high availability.
For example: export DLI_EDI_VIP_NETWORKID=9.21.52.250/24
Note: For the inference service to have high availability, both DLI_EDI_VIP_NETWORKID and DLI_EDI_VIRTUAL_HOSTNAME must be specified.
- DLI_EDI_VIRTUAL_HOSTNAME
Specifies the inference service virtual hostname that is used for high availability.
For example: export DLI_EDI_VIRTUAL_HOSTNAME=vauto.ibm.com
Note: For the inference service to have high availability, both DLI_EDI_VIP_NETWORKID and DLI_EDI_VIRTUAL_HOSTNAME must be specified.
- DLI_EDI_LBD_REST_PORT
Specifies the load balancing daemon (LBD) REST port for the inference service. The default port number is 9000.
For example: export DLI_EDI_LBD_REST_PORT=9000
- DLI_EDI_LBD_STREAM_PORT
Specifies the load balancing daemon (LBD) streaming port for the inference service. The default port number is 9010.
For example: export DLI_EDI_LBD_STREAM_PORT=9010
- DLI_EDI_IMD_REST_PORT
Specifies the inference management daemon (IMD) REST port. The default port number is 8888.
For example: export DLI_EDI_IMD_REST_PORT=8888
- DLI_EDI_IMD_MGT_PORT
Specifies the inference management daemon (IMD) management port. The default port number is 8889.
For example: export DLI_EDI_IMD_MGT_PORT=8889
- DLI_EDI_IMD_STREAM_PORT
Specifies the inference management daemon (IMD) streaming port. The default port number is 8890.
For example: export DLI_EDI_IMD_STREAM_PORT=8890
- DLI_EDI_ETCD_PEER_PORT
Specifies the etcd management port. The default port number is 2380.
For example: export DLI_EDI_ETCD_PEER_PORT=2380
- DLI_EDI_ETCD_CLIENT_PORT
Specifies the etcd client port. The default port number is 2379.
For example: export DLI_EDI_ETCD_CLIENT_PORT=2379
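Taken together, a typical pre-installation environment might look like the following sketch; the paths, addresses, and hostname are the example values from the descriptions above and must be adjusted to your cluster:

```shell
# Example environment for an elastic distributed inference installation
# (values are illustrative; match them to your own cluster).
export EDI_SHARED_FS=/gpfs/dlfs1          # shared file system directory
export EDI_CONDA_HOME=/opt/anaconda3      # Anaconda directory
export DISABLESSL=N                       # must match IBM Spectrum Conductor
export DLI_EDI_MGT_NETWORKID=9.21.52.77/24
# Both of the next two must be set for high availability:
export DLI_EDI_VIP_NETWORKID=9.21.52.250/24
export DLI_EDI_VIRTUAL_HOSTNAME=vauto.ibm.com
```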
- Install the elastic distributed inference package. The elastic distributed inference
package must be installed to the same installation location as IBM Spectrum Conductor Deep Learning Impact.
For x86:
rpm -ivh --prefix $EGO_TOP --dbpath $DB_PATH dliedi-1.2.3.0.x86_64.rpm
For Power:
rpm -ivh --prefix $EGO_TOP --dbpath $DB_PATH dliedi-1.2.3.0.ppc64le.rpm
where:
- --prefix $EGO_TOP specifies the absolute path to the installation directory. The
--prefix parameter is optional. If IBM Spectrum Conductor Deep Learning Impact was installed with the
--prefix option, you must specify the same path that was used during that
installation. If you install without the --prefix option, the default path
/opt/ibm/spectrumcomputing is used.
- --dbpath $DB_PATH sets the path to the RPM database directory. The
--dbpath parameter is optional. If IBM Spectrum Conductor Deep Learning Impact was installed with the
--dbpath option, you must specify the same path that was used during that
installation.
- Restart EGO services.
egosh service stop all
egosh ego shutdown all
egosh ego start all
- Log in to the cluster management console to verify that elastic distributed inference is
available.
- Log in to the cluster management console at
https://<webserver_hostname>:8443 (or
http://<webserver_hostname>:8080 if SSL is disabled).
- Navigate to
.
- Verify that Elastic Distributed Inference is one of the
available tabs.
Elastic distributed inference is installed.