Getting started with PyTorch

Find information about getting started with PyTorch.

This release of PowerAI includes the community development preview of PyTorch 1.0 (rc1). PowerAI's PyTorch includes support for IBM's Distributed Deep Learning (DDL) and Large Model Support (LMS).

PyTorch examples

The PyTorch package includes a set of examples. A script is provided to copy the sample content into a specified directory:

pytorch-install-samples <somedir>

Large Model Support (LMS)

Large Model Support is a feature provided in PowerAI PyTorch that allows the successful training of deep learning models that would otherwise exhaust GPU memory and abort with “out of memory” errors. LMS manages this oversubscription of GPU memory by temporarily swapping tensors to host memory when they are not needed.

One or more elements of a deep learning model can lead to GPU memory exhaustion. These include:

  • Model depth and complexity
  • Base data size (for example, high-resolution images)
  • Batch size

Traditionally, the solution to this problem has been to modify the model until it fits in GPU memory. This approach, however, can negatively impact accuracy, especially if concessions are made by reducing data fidelity or model complexity.

With LMS, deep learning models can scale significantly beyond what was previously possible and, ultimately, generate more accurate results.

Large Model Support is available as a technology preview in PowerAI PyTorch.

LMS usage

A PyTorch program enables Large Model Support by calling torch.cuda.set_enabled_lms(True) prior to model creation.

In addition, a pair of tunables is provided to control how GPU memory used for tensors is managed under LMS.

  • torch.cuda.set_limit_lms(limit)

    Defines the soft limit in bytes on GPU memory allocated for tensors (default: 0).

    By default, LMS favors GPU memory reuse (moving inactive tensors to host memory) over new allocations. This effectively minimizes GPU memory consumption.

    However, when a limit is defined, the algorithm favors allocation of GPU memory up to the limit prior to swapping any tensors out to host memory. This allows the user to control the amount of GPU memory consumed when using LMS.

    Tuning this limit to optimize GPU memory utilization, therefore, can reduce data transfers and improve performance. Because the ideal setting differs from one scenario to another, it is best determined experimentally: use the largest value that does not result in an out-of-memory error.

  • torch.cuda.set_size_lms(size)

    Defines the minimum tensor size in bytes that is eligible for LMS swapping (default: 1 MB).

    Any tensor smaller than this value is exempt from LMS reuse and persists in GPU memory.
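Putting the pieces together, the sketch below enables LMS and applies both tunables before model creation. The limit and size values are illustrative assumptions, not recommendations, and the guard lets the snippet run harmlessly on PyTorch builds that lack the LMS extensions:

```python
# Sketch: enabling and tuning Large Model Support in a PowerAI PyTorch program.
# The byte values below are example settings only; tune them experimentally.

GB = 1024 ** 3
MB = 1024 ** 2

lms_limit = 8 * GB  # soft limit on GPU memory allocated for tensors
lms_size = 4 * MB   # minimum tensor size eligible for LMS swapping

try:
    import torch
    # The set_*_lms interfaces exist only in PowerAI's PyTorch build,
    # so probe for them before calling.
    if hasattr(torch.cuda, "set_enabled_lms"):
        torch.cuda.set_enabled_lms(True)   # must run before model creation
        torch.cuda.set_limit_lms(lms_limit)
        torch.cuda.set_size_lms(lms_size)
except ImportError:
    pass  # stock Python environment without PyTorch
```

Any model constructed after this point allocates its tensors under LMS management.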

LMS example

The PyTorch imagenet example provides a simple illustration of Large Model Support in action. ResNet-152 is a deep residual network that requires a significant amount of GPU memory.

On a system with a single 16 GB GPU, without LMS enabled, a training attempt with the default batch size of 256 will fail with insufficient GPU memory:

python main.py -a resnet152 -b 256 [imagenet-folder with train and val folders]
=> creating model 'resnet152'
[...]
RuntimeError: CUDA error: out of memory

After enabling LMS, the training proceeds without issue:

git diff
--- a/imagenet/main.py
+++ b/imagenet/main.py
@@ -90,6 +90,7 @@ def main():
                          world_size=args.world_size)
 
     # create model
+    torch.cuda.set_enabled_lms(True)
     if args.pretrained:
         print("=> using pre-trained model '{}'".format(args.arch))
         model = models.__dict__[args.arch](pretrained=True)
python main.py -a resnet152 -b 256 [imagenet-folder with train and val folders]
=> creating model 'resnet152'
Epoch: [0][0/5005] [...]
Epoch: [0][10/5005] [...]
Epoch: [0][20/5005] [...]
Epoch: [0][30/5005] [...]
Epoch: [0][40/5005] [...]
Epoch: [0][50/5005] [...]
Epoch: [0][60/5005] [...]
[...]

PowerAI PyTorch API Extensions for LMS

Large Model Support extends the torch.cuda package to provide the following control and tuning interfaces.

torch.cuda.set_enabled_lms(enable)
Enable or disable Large Model Support.

Parameters: enable (bool): desired LMS setting.

torch.cuda.get_enabled_lms()
Returns a bool indicating whether Large Model Support is currently enabled.

torch.cuda.set_limit_lms(limit)
Sets the allocation limit (in bytes) for LMS.

Parameters: limit (int): soft limit on GPU memory allocated for tensors.

torch.cuda.get_limit_lms()
Returns the allocation limit (in bytes) for LMS.

torch.cuda.set_size_lms(size)
Sets the minimum swap-eligible tensor size (in bytes) for LMS.

Parameters: size (int): any tensor smaller than this value is exempt from LMS reuse and persists in GPU memory.

torch.cuda.get_size_lms()
Returns the minimum swap-eligible tensor size (in bytes) for LMS.
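As a usage sketch, each setter pairs with a getter that reads the current value back. The helper below (a hypothetical name, not part of the API) applies the tunables and reports what the runtime returns, yielding None on PyTorch builds without the LMS extensions:

```python
import importlib


def apply_lms_settings(limit, size):
    """Apply the LMS tunables and read them back via the getters.

    Returns (enabled, limit, size) as reported by the runtime, or None
    when the LMS extensions are unavailable (e.g. stock PyTorch).
    """
    try:
        cuda = importlib.import_module("torch").cuda
    except ImportError:
        return None
    if not hasattr(cuda, "set_enabled_lms"):
        return None
    cuda.set_enabled_lms(True)
    cuda.set_limit_lms(limit)
    cuda.set_size_lms(size)
    return (cuda.get_enabled_lms(), cuda.get_limit_lms(), cuda.get_size_lms())
```

A limit of 0 restores the default behavior of favoring GPU memory reuse over new allocations.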

More information

The PyTorch home page offers a variety of information, including tutorials and a getting started guide.

Additional tutorials and examples are available from the community: