Getting started with DDL
Find common configuration steps for getting started with distributed deep learning (DDL).
Some configuration steps are common to all uses of DDL:
- PowerAI frameworks must be installed at the same version on all nodes in the DDL cluster.
- The DDL master node must be able to log in to all the nodes in the cluster by using ssh keys.
To create and add the keys:
- Generate an SSH private/public key pair on the master node:
ssh-keygen
- Copy the generated public key in ~/.ssh/id_rsa.pub to the ~/.ssh/authorized_keys file on every node:
ssh-copy-id -i ~/.ssh/id_rsa.pub $USER@$HOST
- Linux system firewalls might need to be adjusted to pass MPI traffic. This adjustment might be done broadly, as shown in the following command. Note: Opening only the required ports would be more secure; the required ports vary with configuration.
sudo iptables -A INPUT -p tcp --dport 1024:65535 -j ACCEPT
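The SSH key and firewall steps above could be scripted roughly as follows. This is a minimal sketch, not part of DDL or PowerAI: the helper names (run, distribute_key, allow_cluster_tcp) and the variables PUBKEY, REMOTE_USER, and DRY_RUN are hypothetical, and it assumes the same login name on every node. The firewall helper narrows the broad rule by source subnet rather than by port, which is a different restriction from the per-port approach mentioned in the note. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; these helpers are hypothetical and not part of DDL.

PUBKEY="${PUBKEY:-$HOME/.ssh/id_rsa.pub}"  # public key generated by ssh-keygen
REMOTE_USER="${REMOTE_USER:-$USER}"        # assumed: same login name on every node
DRY_RUN="${DRY_RUN:-1}"                    # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy the public key to each host given as an argument, then confirm that
# key-based login works. BatchMode=yes makes ssh fail rather than prompt
# for a password.
distribute_key() {
  local host
  for host in "$@"; do
    run ssh-copy-id -i "$PUBKEY" "$REMOTE_USER@$host" || return 1
    run ssh -o BatchMode=yes "$REMOTE_USER@$host" true || return 1
  done
}

# Accept TCP traffic on the unprivileged port range, but only from the given
# source subnet (CIDR notation). This narrows by source address, not by port.
allow_cluster_tcp() {
  run sudo iptables -A INPUT -p tcp -s "$1" --dport 1024:65535 -j ACCEPT
}
```

After sourcing the file on the master node, calling distribute_key with the worker host names (for example, node1 node2) and allow_cluster_tcp with the cluster subnet (for example, 10.0.0.0/24) prints the commands for review. Setting DRY_RUN=0 runs them instead. The host names and subnet here are placeholders.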