Working with snap-ml-spark in WML Accelerator
IBM Spectrum Conductor™ in WML Accelerator lets you set up a Spark cluster automatically. After configuring your environment, you can run applications that use the snap-ml-spark APIs in an IBM Spectrum Conductor environment in WML Accelerator.
Configure an IBM Spectrum Conductor resource group for PowerAI SnapML
- Create a resource group called "gpus".
- Log in to the cluster management console as an administrator.
- Navigate to the resource groups page.
- Under Global Actions, click Create a Resource Group.
- Create a resource group called "gpus" with Advanced Formula ngpus.
- Navigate to the consumers page and select a consumer, such as "SampleApplications". Select the Consumer Properties page. Under the section "Specify slot-based resource groups", select the "gpus" resource group and click Apply.
- Navigate to Resource Group: gpus. For the Slot allocation policy, select Exclusive. This setting specifies that when IBM Spectrum Conductor allocates resources from this resource group, it uses all free slots from a host. For example, assuming there are four GPUs on a host, a request for 1, 2, 3, or 4 GPUs would use the whole host. Click Apply.
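The effect of the Exclusive policy can be sketched with a small model (the function name and host size are illustrative, not part of any IBM Spectrum Conductor API):

```python
import math

def hosts_consumed(requested_gpus: int, gpus_per_host: int = 4) -> int:
    """Under an Exclusive slot allocation policy, any request takes
    whole hosts: 1-4 GPUs on a 4-GPU host consumes the entire host."""
    return math.ceil(requested_gpus / gpus_per_host)

# Requests for 1, 2, 3, or 4 GPUs each occupy one whole 4-GPU host;
# a 5-GPU request spills onto a second host.
print([hosts_consumed(n) for n in (1, 2, 3, 4, 5)])  # → [1, 1, 1, 1, 2]
```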
- Create an Anaconda environment.
During the creation of the Spark Instance Group, you will select this environment.
- Navigate to Workload > Spark > Anaconda Management. If Anaconda3-2018-12-Linux-ppc64le is not listed, download it from https://repo.continuum.io/archive/Anaconda3-2018.12-Linux-ppc64le.sh, then click Add and upload it.
- Select Anaconda3-2018-12-Linux-ppc64le from the Anaconda Management page. Click Deploy.
- Specify a name, for example, myAnaconda, and a deployment directory, such as /home/egoadmin/myAnaconda, for the Anaconda distribution.
- Select the Environment Variables tab. Click Add Variable and add the following variables:
IBM_POWERAI_LICENSE_ACCEPT=yes
PATH=$PATH:/usr/bin or PATH=$PATH:/bin, depending on where bash exists.
- Click Deploy, then click Continue to Anaconda Distribution Instance.
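The choice between the two PATH values depends on where bash is installed on the hosts. A small illustrative check (the helper name is an assumption, not part of any product API):

```python
import shutil

def path_entry_for_bash() -> str:
    """Mirror the choice above: add /usr/bin to PATH if bash lives
    there, otherwise fall back to /bin."""
    bash = shutil.which("bash") or "/bin/bash"
    return "/usr/bin" if bash.startswith("/usr/bin") else "/bin"

print(f"PATH=$PATH:{path_entry_for_bash()}")
```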
- Prepare the conda environment yml file from the existing dlipy3 conda environment:
- Log in to the WML Accelerator master host as the user who installed WML Accelerator (possibly root) and run the commands below. You might have to run the export command first so that the correct conda takes effect. Example:
export PATH=/opt/anaconda3/bin:$PATH
source activate dlipy3
conda env export | grep -v "^prefix: " > /tmp/dlipy3_env.yml
- Remove the line containing
- https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda
at the beginning of the file /tmp/dlipy3_env.yml. Add the following lines under "dependencies:" but above "- pip:" in this yml file:
- jupyter=1.0.0
- jupyter_client=5.2.2
- jupyter_console=5.2.0
- jupyter_core=4.4.0
- jupyterlab=0.31.5
- jupyterlab_launcher=0.10.2
- notebook=5.6.0
- conda=4.5.12
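These yml edits can also be scripted. A minimal sketch, assuming the file was exported as described above and the markers "dependencies:" and "- pip:" appear verbatim; the helper name is illustrative:

```python
# Jupyter dependency lines to insert, indented to match a conda env yml.
JUPYTER_DEPS = [
    "  - jupyter=1.0.0",
    "  - jupyter_client=5.2.2",
    "  - jupyter_console=5.2.0",
    "  - jupyter_core=4.4.0",
    "  - jupyterlab=0.31.5",
    "  - jupyterlab_launcher=0.10.2",
    "  - notebook=5.6.0",
    "  - conda=4.5.12",
]

def patch_env_yml(text: str) -> str:
    """Drop the ibm-ai conda channel line and insert the Jupyter
    dependencies under 'dependencies:' but above '- pip:'."""
    lines = [
        l for l in text.splitlines()
        if "public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda" not in l
    ]
    out = []
    for line in lines:
        if line.strip() == "- pip:":
            out.extend(JUPYTER_DEPS)  # insert just before the pip section
        out.append(line)
    return "\n".join(out) + "\n"

# Usage (path from the steps above):
# with open("/tmp/dlipy3_env.yml") as f:
#     patched = patch_env_yml(f.read())
```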
- Copy the file /tmp/dlipy3_env.yml to your system so that this yml file can be uploaded when creating the conda environment through the user interface.
- Click the Anaconda3-2018-12-Linux-ppc64le Anaconda distribution name. In the wizard, click the Anaconda distribution instance myAnaconda. Under Conda environments, click Add.
- Select Create environment from a yaml file, click Browse, then select the dlipy3_env.yml file and click Add.
This creates a conda environment named dlipy3 with the PowerAI components installed in the environment. This dlipy3 conda environment is selected when you configure a Spark instance group in the following section.
- Create a Spark instance group.
To use snap-ml-spark, you must configure a Spark instance group in IBM Spectrum Conductor with the following configuration.
- Select Spark 2.3.1 from the Version drop-down box. Click the Configuration link and set these properties:
SPARK_EGO_CONF_DIR_EXTRA=/home/egoadmin/myAnaconda/anaconda/envs/dlipy3/snap-ml-spark/conductor_spark/conf
SPARK_EGO_GPU_EXECUTOR_SLOTS_MAX=<the number of GPUs available on each host in the cluster>. For example, SPARK_EGO_GPU_EXECUTOR_SLOTS_MAX=4.
- Go to Additional Parameters and click Add a Parameter. Add the parameter spark.jars with the value /home/egoadmin/myAnaconda/anaconda/envs/dlipy3/snap-ml-spark/lib/snap-ml-spark-v1.2.0-ppc64le.jar.
- Enable the Jupyter 5.4.0 Notebook:
- Open the Spark Instance Group creation page. Under Enable notebooks, select Jupyter 5.4.0.
- Provide a shared directory as the base data directory. For example, /wmla-nfs/ravigumm-2019-03-07-00-03-03-dli-shared-fs/data.
- Select the Anaconda distribution instance myAnaconda and the Conda environment dlipy3.
- Click the Jupyter 5.4.0 Configuration link.
- Go to the Environment Variables tab and click Add a variable. Add the JUPYTER_SPARK_OPTS variable. For example, to use eight GPUs with eight partitions for notebooks, where the Spark instance group has two hosts with four GPUs each, add the following:
JUPYTER_SPARK_OPTS = --conf spark.ego.gpu.app=true --conf spark.ego.gpu.executors.slots.max=4 --conf spark.default.parallelism=8
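The arithmetic behind these settings can be sketched as follows: slots.max matches the GPUs per host, and the default parallelism matches the total GPU count (2 hosts × 4 GPUs = 8). The helper below is illustrative only; Conductor consumes just the resulting option string:

```python
def jupyter_spark_opts(hosts: int, gpus_per_host: int) -> str:
    """Build JUPYTER_SPARK_OPTS so every GPU in the Spark instance
    group backs one partition."""
    total_gpus = hosts * gpus_per_host
    return (
        "--conf spark.ego.gpu.app=true "
        f"--conf spark.ego.gpu.executors.slots.max={gpus_per_host} "
        f"--conf spark.default.parallelism={total_gpus}"
    )

print(jupyter_spark_opts(2, 4))
```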
- Under the Resource Groups and Plans section, ensure that the following is selected:
- For Spark executors (GPU slots), select gpus resource group.
- For all other options in the section, select ComputeHosts resource group.
- After the Spark instance group is deployed and started, run Jupyter Notebooks: select the Spark instance group, select the Notebooks tab, and click Create Notebooks for Users. Select users and click Create.
- Stop and start Jupyter 5.4.0 to refresh the Notebook and ensure that the sample notebooks are
added to the Jupyter Notebooks home page. Note: Start this Jupyter 5.4.0 Notebook only when Jupyter Notebooks are to be executed. This ensures that GPUs are not allocated to the Notebook unnecessarily.
Run snap-ml-spark applications through spark-submit in WML Accelerator
For example, to submit the example-criteo45m application in client mode:
spark-submit \
  --master ego-client \
  --conf spark.ego.gpu.app=true \
  /home/egoadmin/myAnaconda/anaconda/envs/dlipy3/snap-ml-spark/examples/example-criteo45m/example-criteo45m.py \
  --data_path /wmla-nfs/criteoData \
  --num_partitions 8 \
  --use_gpu
To submit the Spark job in cluster mode, specify ego-cluster instead of ego-client. On the Applications page, you will see a Running application that is using eight GPUs from two hosts.
The /wmla-nfs/criteoData/data/ directory should contain the input criteo data. This data directory is on the host in the cluster where the Spark application driver will be running. Therefore, it is recommended that you use a shared file system for the data directory. For instructions to run this example and its related data set, review the README file here: /home/egoadmin/myAnaconda/anaconda/envs/dlipy3/snap-ml-spark/examples/example-criteo45m/README.md.
For additional examples, see /home/egoadmin/myAnaconda/anaconda/envs/dlipy3/snap-ml-spark/examples.
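The spark-submit invocation above can also be assembled from a script, for example when sweeping over data sets or partition counts. A minimal sketch; the helper name is illustrative and the paths simply mirror the example above:

```python
def criteo_submit_args(master="ego-client", data_path="/wmla-nfs/criteoData",
                       num_partitions=8, use_gpu=True):
    """Build the spark-submit argument list for example-criteo45m;
    pass master='ego-cluster' to run in cluster mode instead."""
    env = "/home/egoadmin/myAnaconda/anaconda/envs/dlipy3"
    args = [
        "spark-submit", "--master", master,
        "--conf", "spark.ego.gpu.app=true",
        f"{env}/snap-ml-spark/examples/example-criteo45m/example-criteo45m.py",
        "--data_path", data_path,
        "--num_partitions", str(num_partitions),
    ]
    if use_gpu:
        args.append("--use_gpu")
    return args

print(" ".join(criteo_submit_args()))
```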
Run Jupyter Notebooks in WML Accelerator using snap-ml-spark
Follow these instructions to run Jupyter Notebooks by using snap-ml-spark:
- Open the Spark instance group and select the Notebooks tab.
- Start the notebook if it is not in "started" state.
- Under My Notebooks, select the notebook, for example "Jupyter 5.4.0 - owned by Admin". The notebook login page opens.
- Log in as the Admin user and run an existing notebook or create a new notebook where
snap-ml-spark can be imported and its API can be run:
- To run a sample notebook, open the snap_ml_spark_example_notebooks folder, and select a notebook to open and run. Review the instructions at the beginning of the notebook before running it.
- To create a new IPython notebook, click .