Tutorial: PyTorch with DDL
This tutorial explains the steps necessary to enable distributed deep learning (DDL) from within the PyTorch example scripts provided in the WML CE distribution. DDL is directly integrated into the PyTorch distributed communication package, torch.distributed, as the backend ddl.
Additional information about torch.distributed is available at https://pytorch.org/docs/stable/distributed.html.
Enabling DDL in a PyTorch program
The DDL PyTorch integration makes it simple to run a PyTorch program on a cluster. To enable DDL, you simply initialize the torch.distributed package with the ddl backend before any other torch.distributed method is called in the program. The init_method needs to be set to env://, as shown in this example:
torch.distributed.init_process_group('ddl', init_method='env://')
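A minimal sketch of this initialization, including the standard torch.distributed calls for querying the rank and world size of each DDL instance (these names are used in the data-splitting examples below), might look like the following; it assumes the script is launched with ddlrun as described later in this tutorial:
import torch.distributed as dist

# Initialize the ddl backend before any other torch.distributed call.
dist.init_process_group('ddl', init_method='env://')

# Each DDL instance can now query its own rank and the total number of instances.
rank = dist.get_rank()
world_size = dist.get_world_size()
print('DDL instance {} of {}'.format(rank, world_size))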
For the distributed program to work efficiently, the training data needs to be split across the DDL instances. The data can be split using either of the following two approaches:
Splitting training data manually
In this approach, you split the data and make the dist.all_reduce call manually. The mnist.py script was modified in this manner to split the training data, as described above, to enable DDL. The modified script can be found in CONDA_PREFIX/lib/python$PY_VER/site-packages/torch/examples/ddl_examples/mnist/mnist.py. For more detailed information about how the data is being split, see the PyTorch tutorial Writing Distributed Applications with PyTorch.
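As an illustration of the manual approach, a gradient-averaging helper built on dist.all_reduce might look like the following sketch; this is illustrative only and is not the exact code used in mnist.py:
import torch.distributed as dist

def average_gradients(model):
    # Sum each parameter's gradient across all DDL instances, then divide
    # by the number of instances to obtain the average.
    world_size = float(dist.get_world_size())
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
            param.grad.data /= world_size

# Typical use inside the training loop:
#   loss.backward()
#   average_gradients(model)
#   optimizer.step()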
Splitting training data through the PyTorch modules DistributedSampler and DistributedDataParallel
PyTorch offers a DistributedSampler module that splits the training data among the DDL instances, and a DistributedDataParallel module that averages the gradients on the backward pass. DDL does not support the num_workers argument to torch.utils.data.DataLoader; if it is passed, unexpected behavior may occur.
# DistributedSampler gives each DDL instance its own shard of the dataset.
kwargs = {'pin_memory': True}
train_sampler = torch.utils.data.distributed.DistributedSampler(dataset, num_replicas=world_size, rank=rank)
# shuffle must be False when a sampler is supplied; the sampler handles shuffling.
train_data = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False,
                                         sampler=train_sampler, **kwargs)
To average the gradients across the DDL instances on the backward pass, wrap your model with the DistributedDataParallel module:
model = torch.nn.parallel.DistributedDataParallel(model)
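Putting the sampler and the DistributedDataParallel wrapper together, a training loop might look like the following sketch. The names dataset, model, optimizer, criterion, batch_size, and epochs are placeholders, and the shipped mnist_ddp.py may differ in its details:
import torch
import torch.distributed as dist

dist.init_process_group('ddl', init_method='env://')
rank = dist.get_rank()
world_size = dist.get_world_size()

# Shard the dataset and wrap the model as described above.
train_sampler = torch.utils.data.distributed.DistributedSampler(
    dataset, num_replicas=world_size, rank=rank)
train_data = torch.utils.data.DataLoader(
    dataset, batch_size=batch_size, shuffle=False,
    sampler=train_sampler, pin_memory=True)
model = torch.nn.parallel.DistributedDataParallel(model)

for epoch in range(epochs):
    # Reshuffle the shards each epoch so every instance sees different samples.
    train_sampler.set_epoch(epoch)
    for data, target in train_data:
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()   # DistributedDataParallel averages the gradients here.
        optimizer.step()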
The mnist_ddp.py script was modified in this manner to split the training data, as described above, to enable DDL. The modified script can be found in CONDA_PREFIX/lib/python$PY_VER/site-packages/torch/examples/ddl_examples/mnist/mnist_ddp.py.
Running the script using ddlrun
DDL-enabled programs are launched using the ddlrun tool. See Using the ddlrun tool and ddlrun -h for more information about ddlrun.
ddlrun -H host1,host2 --accelerators 4 python
CONDA_PREFIX/lib/python$PY_VER/site-packages/torch/examples/ddl_examples/mnist/mnist.py
--dist-backend ddl
For additional information about how to use ddlrun, see CONDA_PREFIX/doc/ddl/README.md.
Running the DDL-enabled ImageNet script
The ImageNet example script available from PyTorch has been modified to take advantage of DDL.
ddlrun --accelerators 4 python -u
CONDA_PREFIX/lib/python$PY_VER/site-packages/torch/examples/ddl_examples/imagenet/imagenet_ddp.py
--dist-backend ddl --seed 1234 --dist-url env:// /dir/to/training_data
For more information about the ImageNet example, see CONDA_PREFIX/lib/python$PY_VER/site-packages/torch/examples/ddl_examples/imagenet/README.md.