Using frameworks via the command line interface

IBM Spectrum Conductor Deep Learning Impact 1.2.3 supports additional frameworks through the command line interface (CLI). Use dlicmd to run deep learning jobs with any supported framework on your existing cluster resources.

Users can submit deep learning tasks to a particular deep learning framework, provided that the framework is installed and made available by the cluster administrator. The dlicmd.py file is located in the $EGO_TOP/dli/1.2.3/dlpd/bin directory, and the corresponding framework plugins are located in the $EGO_TOP/dli/1.2.3/dlpd/tools/dl_plugins directory.
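
For example, to see which framework plugins are installed on a host, list the contents of the plugin directory. This is a minimal sketch; it assumes a shell on the master host with the EGO_TOP environment variable set:

  $ ls $EGO_TOP/dli/1.2.3/dlpd/tools/dl_plugins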

The dlicmd command assumes that models can access data sources from within the IBM Spectrum Conductor Deep Learning Impact cluster. Model data must be dynamically downloaded, reside in shared directories, or be available from remote data connection services.
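
For example, before submitting a task, you might stage the training data on a shared file system that all compute hosts can access. This is a minimal sketch; the /dli_shared_fs path matches the Caffe example later in this topic, and the local mnist_data directory is hypothetical:

  $ cp -r ./mnist_data /dli_shared_fs/datasets/mnist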

Obtain plugins

By default, some framework plugins are provided with IBM Spectrum Conductor Deep Learning Impact 1.2.3. It is recommended that you obtain the latest plugins for IBM Spectrum Conductor Deep Learning Impact 1.2.3 from IBM Cloud.

New plugins can be created and added to the $EGO_TOP/dli/1.2.3/dlpd/tools/dl_plugins directory by a cluster administrator; see Add a framework.
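
As an illustration, a cluster administrator could copy a new plugin into that directory. This is a minimal sketch; myFramework is a hypothetical plugin directory, and additional configuration described in Add a framework may be required:

  $ cp -r myFramework $EGO_TOP/dli/1.2.3/dlpd/tools/dl_plugins/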

Limitations

Consider the following limitations when using framework plugins:
  • Frameworks configured with a plugin for the dlicmd command cannot be used from within a Jupyter notebook.
  • dlicmd framework plugins cannot be managed from the cluster management console; they must be managed using the command line.

Examples

  • Using the mnist example, run a TensorFlow task with the TensorFlow plugin and the dlicmd command:
    1. From any host, log in to IBM Spectrum Conductor Deep Learning Impact.
      $ python dlicmd.py --logon --master-host abc.ibm.com --username Admin --password Admin
    2. List all available frameworks on host abc.ibm.com.
      $ python dlicmd.py --dl-frameworks --master-host abc.ibm.com
    3. Execute a TensorFlow task with the model file mnist.py, using the dliig instance group on host abc.ibm.com.
      $ python dlicmd.py --exec-start tensorflow --master-host abc.ibm.com --ig dliig --model-main mnist.py
    4. After the task is submitted, you can view it in one of two ways:
      • Using the CLI:
        • Use dlicmd with the execution ID of the task to obtain information about the task (see the sketch after these examples).
      • Using the cluster management console:
        • Log in to the cluster management console, navigate to the Workload tab, and select Spark > My Applications & Notebooks. The application name and execution ID appear as a submitted application. Note that no information about the execution is available under Spark > Deep Learning.
  • Using the mnist example, run a Caffe task with the Caffe plugin and the dlicmd command:
    1. From any host, log in to IBM Spectrum Conductor Deep Learning Impact. This example assumes that the cluster DLI_DLPD_REST_PORT is set to 9243 (the default) and that the master host is a host where the DLPD service is running.
      $ python dlicmd.py --logon --master-host abc.ibm.com --username Admin --password Admin
    2. List all available frameworks on host abc.ibm.com.
      $ python dlicmd.py --dl-frameworks --master-host abc.ibm.com
    3. Execute a Caffe train task with the solver file lenet_solver.prototxt. To run a Caffe job on multiple nodes, make sure that all nodes have access to the files required by the Caffe task, such as models and datasets.
      $ python dlicmd.py --exec-start caffePowerAICaffeIBM --master-host abc.ibm.com --ig dliig --gpuPerWorker 1 --model-dir /dli_shared_fs/models/caffe --model-main lenet_solver.prototxt train
    4. After the task is submitted, you can view it in one of two ways:
      • Using the CLI:
        • Use dlicmd with the execution ID of the task to obtain information about the task (see the sketch after these examples).
      • Using the cluster management console:
        • Log in to the cluster management console, navigate to the Workload tab, and select Spark > My Applications & Notebooks. The application name and execution ID appear as a submitted application. Note that no information about the execution is available under Spark > Deep Learning.
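
As referenced in step 4 of both examples, you can query a submitted task from the CLI by its execution ID. The following is a minimal sketch; it assumes that --exec-start returned the execution ID admin-20190101-000001 (a hypothetical value) and that your dlicmd version supports the --exec-get and --exec-outlogs options for retrieving task details and output logs:

  $ python dlicmd.py --exec-get admin-20190101-000001 --master-host abc.ibm.com
  $ python dlicmd.py --exec-outlogs admin-20190101-000001 --master-host abc.ibm.com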