Create a Spark instance group using the wmla-ig-template-2.3.3 template. This template is only available with IBM Spectrum Conductor 2.4.0.
Note: The wmla-ig-template-2.3.3 template should only be used in IBM Watson Machine Learning Accelerator environments where IBM Watson Studio Local is installed.
By default, the wmla-ig-template-2.3.3 template enables impersonation settings. Enabling impersonation lets Spark applications run as the submission user; that is, the system runs executables under a designated operating system account. The wmla-ig-template-2.3.3 template enables impersonation with authentication; therefore, Spark applications must be run by LDAP or operating system execution users.
- Ensure that you have completed setting up your resources; see Configuring a Spark instance group after installing IBM Spectrum Conductor Deep Learning Impact.
- Create an instance group for IBM Spectrum Conductor Deep Learning Impact using the
wmla-ig-template-2.3.3 template.
Attention: The wmla-ig-template-2.3.3 template creates a Spark instance group that is used for running deep learning workloads. To use elastic distributed training, use the wmla-ig-edt-template-2.3.3 template to create a second instance group for running elastic distributed training workloads.
- Select the Workload tab and click .
- In the Instance Group List tab, click New.
- Click the Templates button to load the wmla-ig-template-2.3.3 template.
- Click Use to select and use the wmla-ig-template-2.3.3 template.
- Provide a name for the instance group, for example: wml-ig.
- Provide a directory for the Spark deployment. The wml-user user must have read, write, and execute permissions on the specified directory and its parent directory.
- Set the execution user to wml-user.
- Provide a Spark version and configure Spark. By default, the template uses Spark version 2.3.3 and is configured for single node training using Python 3.6. If you change the Spark version, these configurations are lost and must be reapplied manually. If you want to use a different training type or a different Python version, you must configure additional parameters as follows.
- Under Consumers, the Enable impersonation to have Spark applications run as the submission user option is enabled. This option is required for IBM Watson Studio Local and requires the use of LDAP.
- Under Resource Groups and Plans, enable GPU slot allocation and specify the resource group from which resources are allocated to executors in the instance group. Make sure that the CPU executors resource group contains all the CPU and GPU executor hosts; otherwise, GPU slots are used for the shuffle service.
- Select a CPU resource group for use by Spark executors (CPU slots).
- Select the previously created GPU resource group for use by Spark executors (GPU
slots).
- Create the Spark instance group by clicking Create and Deploy Instance
Group.
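The deployment directory mentioned in the steps above must be readable, writable, and executable by the execution user. A minimal sketch of preparing such a directory, assuming /tmp/wml-ig-deploy as a stand-in for your actual deployment path (the chown line is commented out because it assumes the wml-user account exists on the host and that you run as root):

```shell
# Stand-in path for the Spark deployment directory; substitute your own.
DEPLOY_DIR=/tmp/wml-ig-deploy

# Create the directory and grant read, write, and execute permissions.
mkdir -p "$DEPLOY_DIR"
chmod 775 "$DEPLOY_DIR"

# On the cluster, hand ownership to the execution user (requires root):
# chown wml-user:wml-user "$DEPLOY_DIR"

# Confirm the permissions on the directory and its parent.
ls -ld "$(dirname "$DEPLOY_DIR")" "$DEPLOY_DIR"
```

Remember that the parent directory also needs read, write, and execute permissions for the execution user, so check both lines of the ls output.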
- Edit the consumer properties of the Spark instance group.
- Navigate to .
- Select the <Spark-instance-group-name>-sparkexecutor consumer.
- Under the Consumer Properties tab, set Reclaim grace
period to the maximum value of 596 Hours.
- For each child consumer belonging to
<Spark-instance-group-name>-sparkexecutor (for example,
<Spark-instance-group-name>-sparkexecutor0), complete the following:
- Under the Consumer Properties tab, set Reclaim grace
period to the maximum value of 596 Hours.
- Enable exclusive slots at the consumer level (where free slots from the
host can be shared and assigned to any number of allocations, but only amongst a select set of
consumers within an exclusive consumer group).
- Open the $EGO_CONFDIR/ego.conf file for editing.
- Set EGO_ENABLE_CONSUMER_LEVEL_EXCLUSIVE=Y.
- Save your changes.
- Click Apply.
The instance group is configured and ready to be started.
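The ego.conf change in the final step can also be scripted. A minimal sketch, assuming EGO_CONFDIR is set in your Conductor environment (the fallback to /tmp is only so the snippet runs standalone for illustration):

```shell
# ego.conf normally lives under $EGO_CONFDIR; fall back to /tmp so this
# sketch is runnable outside a Conductor environment.
EGO_CONF="${EGO_CONFDIR:-/tmp}/ego.conf"
touch "$EGO_CONF"

# Append the setting only if it is not already present (idempotent).
grep -q '^EGO_ENABLE_CONSUMER_LEVEL_EXCLUSIVE=' "$EGO_CONF" ||
  echo 'EGO_ENABLE_CONSUMER_LEVEL_EXCLUSIVE=Y' >> "$EGO_CONF"

# Show the resulting line.
grep 'EGO_ENABLE_CONSUMER_LEVEL_EXCLUSIVE' "$EGO_CONF"
```

The grep-before-append guard keeps the script safe to rerun without duplicating the setting in ego.conf.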