Working with snap-ml-spark in WML Accelerator

IBM Spectrum Conductor™ in WML Accelerator lets you set up a Spark cluster automatically. After configuring your environment, you can run applications that use the snap-ml-spark APIs in an IBM Spectrum Conductor environment in WML Accelerator.

Configure an IBM Spectrum Conductor resource group for WML CE SnapML

  1. Create a resource group called "gpus".
    1. Log in to the cluster management console as an administrator.
    2. Navigate to Resources > Resource Planning (Slot) > Resource Groups.
    3. Under Global Actions, click Create a Resource Group.
    4. Create a resource group called "gpus" with Advanced Formula ngpus.
    5. Navigate to Resources > Consumers and select a consumer, such as "SampleApplications". Select the Consumer Properties page. Under the "Specify slot-based resource groups" section, select the "gpus" resource group and click Apply.
    6. Navigate to Resources > Resource Planning (Slot) > Resource Plan and select Resource Group: gpus. For the Slot allocation policy, select Exclusive. This setting specifies that when IBM Spectrum Conductor allocates resources from this resource group, it uses all free slots from a host. For example, assuming there are four GPUs on a host, a request for 1, 2, 3, or 4 GPUs would use the whole host.

      Click Apply.

  2. Create an Anaconda environment.

    During the creation of the Spark Instance Group, you will select this environment.

    1. Navigate to Workload > Spark > Anaconda Management. If Anaconda3-2018-12-Linux-ppc64le is not listed, download it from https://repo.continuum.io/archive/Anaconda3-2018.12-Linux-ppc64le.sh, then click Add and upload it.
    2. Select Anaconda3-2018-12-Linux-ppc64le from the Anaconda Management page. Click Deploy.
    3. Specify a name, for example, myAnaconda, and a deployment directory, such as /home/egoadmin/myAnaconda, for the Anaconda distribution.
    4. Select the Environment Variables tab. Click Add Variable and add the following variables:

      IBM_POWERAI_LICENSE_ACCEPT=yes
      PATH=$PATH:/usr/bin or PATH=$PATH:/bin, depending on where bash is installed.

      Click Deploy, then click Continue to Anaconda Distribution Instance.

    1. Prepare the conda environment yml file from the existing dlipy3 conda environment:
      1. Log in to the WML Accelerator master host as the user who installed WML Accelerator (this may be root) and run the following commands:
        . ${DLI_CONDA_HOME}/etc/profile.d/conda.sh
        source activate dlipy3
        conda env export | grep -v "^prefix: " > /tmp/dlipy3_env.yml
      2. Remove the line containing - https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda near the beginning of /tmp/dlipy3_env.yml. Then add the following lines under "dependencies:" but above "- pip:" in the yml file:
          - jupyter=1.0.0
          - jupyter_client=5.2.2
          - jupyter_console=5.2.0
          - jupyter_core=4.4.0
          - jupyterlab=0.31.5
          - jupyterlab_launcher=0.10.2
          - notebook=5.6.0
          - conda=4.5.12
      3. Change the tornado version to 5.1.1 in /tmp/dlipy3_env.yml:
        - tornado=5.1.1=py36h7b6447c_0
      4. Copy the file /tmp/dlipy3_env.yml to your local system so that it can be uploaded when you create the conda environment through the user interface. (If you prefer to script these edits, see the sketch after this list.)
    2. Click the Anaconda3-2018-12-Linux-ppc64le Anaconda distribution name. In the wizard, click the Anaconda distribution instance myAnaconda. Under Conda environments, click Add.
    3. Select Create environment from a yaml file, click Browse, select the dlipy3_env.yml file, and click Add.

      This creates a conda environment named dlipy3 with the WML CE components installed. You will select this dlipy3 conda environment when you create the Spark instance group in the next step.
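
      If you prefer to script the yml edits from step 1 instead of making them by hand, the following Python sketch applies the same changes. It is a minimal sketch only: it assumes the exported file is at /tmp/dlipy3_env.yml and that the indentation and version pins match the listing above, so adjust it as needed before use.

        # Hypothetical helper that applies the edits described above to /tmp/dlipy3_env.yml.
        # Assumes the file was produced by "conda env export" as shown earlier.
        yml_path = "/tmp/dlipy3_env.yml"

        extra_deps = [
            "  - jupyter=1.0.0",
            "  - jupyter_client=5.2.2",
            "  - jupyter_console=5.2.0",
            "  - jupyter_core=4.4.0",
            "  - jupyterlab=0.31.5",
            "  - jupyterlab_launcher=0.10.2",
            "  - notebook=5.6.0",
            "  - conda=4.5.12",
        ]

        with open(yml_path) as f:
            lines = [line.rstrip("\n") for line in f]

        out = []
        for line in lines:
            # Drop the IBM AI conda channel line.
            if "public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda" in line:
                continue
            # Pin tornado to 5.1.1 (the build string is an assumption; match your export).
            if line.strip().startswith("- tornado="):
                line = "  - tornado=5.1.1=py36h7b6447c_0"
            out.append(line)
            # Insert the notebook packages directly after the "dependencies:" key,
            # which keeps them above the "- pip:" subsection.
            if line.strip() == "dependencies:":
                out.extend(extra_deps)

        with open(yml_path, "w") as f:
            f.write("\n".join(out) + "\n")
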
  3. Create a Spark instance group.

    To use snap-ml-spark, you must configure a Spark instance group in IBM Spectrum Conductor with the following configuration.

    1. Navigate to Workload > Spark > Spark Instance Groups > New. Select Spark 2.3.1 from the Version drop-down box. Click the Configuration link and set these properties:

      SPARK_EGO_CONF_DIR_EXTRA=/home/egoadmin/myAnaconda/anaconda/envs/dlipy3/snap-ml-spark/conductor_spark/conf
      Set SPARK_EGO_GPU_EXECUTOR_SLOTS_MAX to the number of GPUs available on each host in the cluster. For example:
      SPARK_EGO_GPU_EXECUTOR_SLOTS_MAX=4

    2. Go to Additional Parameters and click Add a Parameter. Add the parameter spark.jars with the value /home/egoadmin/myAnaconda/anaconda/envs/dlipy3/snap-ml-spark/lib/snap-ml-spark-v1.2.0-ppc64le.jar.
    3. Enable the Jupyter 5.4.0 Notebook:
      1. On the Spark instance group creation page, under Enable notebooks, select Jupyter 5.4.0.
      2. Provide a shared directory as the base data directory. For example, /wmla-nfs/ravigumm-2019-03-07-00-03-03-dli-shared-fs/data.
      3. Select the Anaconda distribution instance myAnaconda and the Conda environment dlipy3.
    4. Click the Jupyter 5.4.0 Configuration link.
      1. Go to the Environment Variables tab and click Add a variable. Add the JUPYTER_SPARK_OPTS variable. For example, to use eight GPUs with eight partitions for notebooks, where the Spark instance group contains two hosts with four GPUs each, add the following (you can verify these settings later from a running notebook; see the sketch after this procedure):
        JUPYTER_SPARK_OPTS = --conf spark.ego.gpu.app=true --conf spark.ego.gpu.executors.slots.max=4 --conf spark.default.parallelism=8
      2. Under the Resource Groups and Plans section, make the following selections:
        • For Spark executors (GPU slots), select the gpus resource group.
        • For all other options in the section, select the ComputeHosts resource group.
        Click Create and Deploy Instance Group.
    5. After the Spark instance group is deployed and started, select the Spark instance group, open the Notebooks tab, and click Create Notebooks for Users. Select the users and click Create.
    6. Stop and start Jupyter 5.4.0 to refresh the Notebook and ensure that the sample notebooks are added to the Jupyter Notebooks home page.
      Note: Start the Jupyter 5.4.0 notebook only when you intend to run Jupyter Notebooks. This ensures that GPUs are not allocated to the notebook unnecessarily.
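
      After a notebook is created and started, you can confirm from a notebook cell that the GPU-related Spark settings took effect. The following is a minimal sketch; it assumes the notebook kernel already provides a SparkSession named spark, which is how the notebook kernels created here are expected to behave.

        # List the GPU- and parallelism-related Spark settings visible to this session.
        # Assumes a SparkSession named `spark` is already available in the notebook kernel.
        for key, value in sorted(spark.sparkContext.getConf().getAll()):
            if "gpu" in key.lower() or "parallelism" in key.lower():
                print(key, "=", value)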

Run snap-ml-spark applications through spark-submit in WML Accelerator

To run snap-ml-spark applications through spark-submit, open the Cluster Management Console and navigate to Workload > Spark > My Applications and Notebooks. Next to spark-submit, click Run Application. The Run Application wizard opens; enter the appropriate arguments. In the following example, a Spark job is submitted in client mode. The file example-criteo45m.py is an example file shipped with WML CE:
--master ego-client 
--conf spark.ego.gpu.app=true /home/egoadmin/myAnaconda/anaconda/envs/dlipy3/snap-ml-spark/examples/example-criteo45m/example-criteo45m.py 
--data_path /wmla-nfs/criteoData 
--num_partitions 8 
--use_gpu

To submit the Spark job in cluster mode, specify ego-cluster instead of ego-client. On the Applications page, you will see a running application that uses eight GPUs across two hosts.

The /wmla-nfs/criteoData/data/ directory should contain the input Criteo data. The data directory must be available on the host in the cluster where the Spark application driver runs, so it is recommended that you use a shared file system for the data directory. For instructions on running this example and obtaining its data set, see the README file at /home/egoadmin/myAnaconda/anaconda/envs/dlipy3/snap-ml-spark/examples/example-criteo45m/README.md.

For additional examples, see /home/egoadmin/myAnaconda/anaconda/envs/dlipy3/snap-ml-spark/examples.
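
The shipped example-criteo45m.py is the authoritative reference for how a snap-ml-spark application is structured. For orientation only, the following sketch shows the general shape of a PySpark script that accepts the arguments used above; the snap_ml_spark import, the LogisticRegression estimator, its use_gpu parameter, and the input format are assumptions that you should verify against the shipped examples.

  # Hypothetical skeleton of a snap-ml-spark application run through spark-submit.
  # The snap_ml_spark class and parameter names are assumptions; see the shipped examples.
  import argparse
  from pyspark.sql import SparkSession

  def main():
      parser = argparse.ArgumentParser()
      parser.add_argument("--data_path", required=True, help="directory containing the input data")
      parser.add_argument("--num_partitions", type=int, default=8, help="number of data partitions")
      parser.add_argument("--use_gpu", action="store_true", help="train on GPUs")
      args = parser.parse_args()

      spark = SparkSession.builder.appName("snap-ml-spark example").getOrCreate()

      # Load and repartition the training data; the reader and data format here are
      # placeholders, since the real example uses the Criteo data set.
      df = spark.read.parquet(args.data_path).repartition(args.num_partitions)

      # Assumed snap-ml-spark estimator usage; confirm the API in the shipped examples.
      from snap_ml_spark import LogisticRegression
      model = LogisticRegression(use_gpu=args.use_gpu).fit(df)

      spark.stop()

  if __name__ == "__main__":
      main()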

Run Jupyter Notebooks in WML Accelerator using snap-ml-spark

Follow these instructions to run Jupyter Notebooks by using snap-ml-spark:

  1. Open the Spark instance group and select the Notebooks tab.
  2. Start the notebook if it is not in "started" state.
  3. Under My Notebooks, select the notebook, for example, "Jupyter 5.4.0 - owned by Admin". The notebook login page opens.
  4. Log in as the Admin user and run an existing notebook or create a new notebook where snap-ml-spark can be imported and its API can be run:
    • To run a sample notebook, open the snap_ml_spark_example_notebooks folder, and select a notebook to open and run. Review the instructions at the beginning of the notebook before running it.
    • To create a new IPython notebook, click New > Spark Cluster.
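
A minimal first cell for such a notebook simply verifies that snap-ml-spark is importable in the dlipy3 environment and that the kernel's SparkSession is available. Here, the Python module name snap_ml_spark is inferred from the example notebook folder name, and the pre-created spark session is an assumption about the Spark Cluster kernel; the sample notebooks remain the authoritative API reference.

  # Verify the snap-ml-spark installation and the kernel-provided SparkSession.
  # `spark` is assumed to be created automatically by the Spark Cluster notebook kernel.
  import snap_ml_spark
  print("snap-ml-spark loaded from:", snap_ml_spark.__file__)
  print("Spark version:", spark.version)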