PyTorch is a software-based open source deep learning framework used to build neural networks, combining the machine learning (ML) library of Torch with a Python-based high-level API. Its flexibility and ease of use, among other benefits, have made it the leading ML framework for academic and research communities.
PyTorch supports a wide variety of neural network architectures, from simple linear regression algorithms to complex convolutional neural networks and generative transformer models used for tasks like computer vision and natural language processing (NLP). Built on the widely understood Python programming language and offering extensive libraries of pre-configured (and even pre-trained) models, PyTorch allows data scientists to build and run sophisticated deep learning networks while minimizing the time and labor spent on code and mathematical structure.
PyTorch also executes code eagerly, so data scientists can run and test portions of code immediately rather than waiting for the entire model to be defined and compiled first, which for large deep learning models can take a very long time. This makes PyTorch an excellent platform for rapid prototyping, and also greatly expedites the debugging process.
Originally developed by Facebook AI Research (now Meta), PyTorch was made open source in 2017 and has been under the stewardship of the PyTorch Foundation (which is part of the larger Linux Foundation) since 2022. The foundation serves as a neutral space for the deep learning community to collaborate on further development of the PyTorch ecosystem.
In 2023, IBM became a premier member of the PyTorch Foundation, having already collaborated on two major projects: enabling more efficient training of flexible AI foundation models with billions of parameters and making checkpointing for AI training considerably more cost effective. The IBM watsonx portfolio uses PyTorch to provide an enterprise-grade software stack for artificial intelligence foundation models, from end-to-end training to fine-tuning of models.
PyTorch’s mathematical and programming structure simplifies and streamlines machine learning workflows, without limiting the complexity or performance of deep neural networks.
Python is a general-purpose, high-level programming language widely used in data science, making it an intuitive choice for data scientists extending their work into actively modeling deep learning networks. Python’s simple syntax is easy to read, takes relatively little time to learn and runs on all major operating systems, including Windows, macOS, Linux and Unix. Python has been the second most used programming language on GitHub for over three years, having overtaken Java in 2019. It continues to grow in popularity, with a 22.5 percent increase in 2022.1
This flexibility and simplicity have helped foster a robust online community of Python developers, who collaborate on educational resources and on a wide array of Python libraries and APIs, like Numerical Python (NumPy) for mathematical operations, Pandas for data manipulation and matplotlib for data visualization. This community has also produced a great volume of PyTorch libraries that reduce the monotony and guesswork of coding for machine learning, freeing developers and data scientists to focus on innovation rather than on writing repetitive boilerplate code.
In any machine learning algorithm, even those applied to ostensibly non-numerical information like sounds or images, data must be represented numerically. In PyTorch, this is achieved through tensors, which serve as the fundamental units of data used for computation on the platform.
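In the context of machine learning, a tensor is a multi-dimensional array of numbers that functions like a mathematical bookkeeping device. Linguistically, “tensor” functions as a generic term inclusive of some more familiar mathematical entities:
A scalar is a zero-dimensional tensor, containing a single number.
A vector is a one-dimensional tensor, containing multiple scalars of the same type.
A matrix is a two-dimensional tensor, containing multiple vectors of the same type.
Tensors with three or more dimensions, like the 3D tensors used to represent RGB images in computer vision, are simply called tensors.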
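As a quick illustration, the snippet below creates a tensor of each rank and prints its number of dimensions and shape. The values and the 3×32×32 "image" size are arbitrary choices for the example, not anything prescribed by PyTorch.

```python
import torch

scalar = torch.tensor(3.0)                       # 0-D tensor: a single number
vector = torch.tensor([1.0, 2.0, 3.0])           # 1-D tensor
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # 2-D tensor
image = torch.zeros(3, 32, 32)                   # 3-D tensor, e.g. an RGB image (channels, height, width)

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("image", image)]:
    # ndim is the number of dimensions; shape is the size along each one
    print(name, t.ndim, tuple(t.shape))
# scalar 0 ()
# vector 1 (3,)
# matrix 2 (2, 2)
# image 3 (3, 32, 32)
```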
PyTorch tensors function similarly to the ndarrays used in NumPy, but unlike ndarrays, which can only run on central processing units (CPUs), tensors can also run on graphics processing units (GPUs). GPUs can perform the large, highly parallel matrix operations at the heart of deep learning dramatically faster than CPUs, which is a major advantage given the massive volumes of data typical of deep learning.
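A minimal sketch of the NumPy relationship and of moving a tensor to a GPU, assuming NumPy is installed alongside PyTorch. The array contents are made up; the code falls back to the CPU when no CUDA device is present.

```python
import numpy as np
import torch

array = np.arange(6, dtype=np.float32).reshape(2, 3)
t = torch.from_numpy(array)   # CPU tensor that shares memory with the NumPy array
back = t.numpy()              # back to an ndarray, again sharing memory

# Move the tensor to a GPU if one is available; otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
t_dev = t.to(device)
result = (t_dev * 2).sum()    # computed on whichever device holds t_dev
print(device, result.item())  # result is 30.0 on either device
```

Note that `.numpy()` only works on CPU tensors; a GPU tensor must first be copied back with `.cpu()`.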
In addition to encoding a model’s inputs and outputs, PyTorch tensors also encode model parameters (the weights and biases that are “learned” in machine learning) as well as the gradients used to update them. This property of tensors enables automatic differentiation, which is one of PyTorch’s most important features.
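To make this concrete, the sketch below inspects the parameters of a single linear layer. The random inputs and the squared-output "loss" are placeholders chosen only to trigger a gradient computation.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)  # weight and bias tensors are created and initialized here

# Parameters are tensors (nn.Parameter subclasses torch.Tensor) tracked by autograd
print(type(layer.weight).__name__, layer.weight.requires_grad)  # Parameter True
print(layer.weight.grad)  # None: no gradient has been computed yet

x = torch.randn(8, 4)          # a batch of 8 made-up inputs
loss = layer(x).pow(2).mean()  # toy loss, for illustration only
loss.backward()                # stores a gradient in .grad for each parameter

print(layer.weight.grad.shape, layer.bias.grad.shape)  # torch.Size([2, 4]) torch.Size([2])
```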
PyTorch uses modules as the building blocks of deep learning models, which allows for the quick and straightforward construction of neural networks without the tedious work of manually coding each algorithm.
Modules can (and often do) contain other nested modules. In addition to enabling the creation of more elaborate multi-layer neural networks, this allows complex deep learning models to be easily saved as a single named module and transferred between different machines, CPUs or GPUs. PyTorch models can even be run in non-Python environments, like C++, using TorchScript, helping bridge the gap between research prototypes and production deployment.
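The sketch below nests modules inside one another and then saves and restores the whole model through its `state_dict`. `Block` and `Net` are hypothetical names chosen for the example, and an in-memory buffer stands in for a file on disk.

```python
import io

import torch
from torch import nn


class Block(nn.Module):
    """Hypothetical sub-module: a linear layer followed by a ReLU."""

    def __init__(self, n_in, n_out):
        super().__init__()
        self.linear = nn.Linear(n_in, n_out)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.linear(x))


class Net(nn.Module):
    """Top-level module that contains other modules."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(Block(4, 8), Block(8, 8))
        self.head = nn.Linear(8, 1)

    def forward(self, x):
        return self.head(self.features(x))


model = Net()
# Parameters of every nested module are collected automatically: 40 + 72 + 9
n_params = sum(p.numel() for p in model.parameters())

# Save all nested parameters as one state_dict
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Load into a fresh instance; map_location remaps tensors to the given device,
# e.g. when loading GPU-trained weights on a CPU-only machine
restored = Net()
restored.load_state_dict(torch.load(buffer, map_location="cpu"))

x = torch.randn(2, 4)
print(n_params, torch.allclose(model(x), restored(x)))  # 121 True
```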
Broadly speaking, there are three primary classes of modules used to build and optimize deep learning models in PyTorch:
nn modules are deployed as the layers of a neural network. The torch.nn package contains a large library of modules that perform common operations like convolutions, pooling, linear transformations and loss computation. For example, torch.nn.Linear(n, m) creates a fully connected (linear) layer with n input features and m output features, whose weights and bias are initialized automatically when the layer is created.
The autograd package (torch.autograd) provides a simple way to automatically compute gradients, which are used to optimize model parameters via gradient descent, for any differentiable operation performed within a neural network. Creating a tensor with requires_grad=True (or setting that attribute on an existing tensor) signals to autograd that every operation on that tensor should be tracked, which enables automatic differentiation.
Optim modules apply optimization algorithms to those gradients. The torch.optim package provides optimizers for various methods, like stochastic gradient descent (SGD) or root mean square propagation (RMSprop), to suit specific optimization needs.
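The three kinds of modules typically work together in a training loop. The sketch below fits a single linear layer to synthetic data generated from y = 3x + 1 plus noise; the data, learning rate and step count are illustrative assumptions, not recommended settings.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic data (made up for illustration): y = 3x + 1 plus a little noise
x = torch.linspace(-1, 1, 100).unsqueeze(1)
y = 3 * x + 1 + 0.05 * torch.randn_like(x)

model = nn.Linear(1, 1)                                  # nn: one fully connected layer
loss_fn = nn.MSELoss()                                   # nn: mean squared error loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # optim: plain SGD

for step in range(500):
    optimizer.zero_grad()        # clear gradients left over from the previous step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # autograd: gradient of the loss w.r.t. each parameter
    optimizer.step()             # optim: update weight and bias using those gradients

print(model.weight.item(), model.bias.item())  # should end up close to 3 and 1
```

Swapping in a different optimizer, such as `torch.optim.RMSprop(model.parameters(), lr=0.01)`, only changes the line that constructs `optimizer`; the loop itself stays the same.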
Dynamic computation graphs (DCGs) are how deep learning models are represented in PyTorch. Abstractly speaking, computation graphs map the flow of data between the different operations in a mathematical system: in the context of deep learning, they essentially translate a neural network’s code into a flowchart indicating the operations performed at each node and the dependencies between different layers in the network—the arrangement of steps and sequences that transform input data into output data.
What differentiates dynamic computation graphs (like those used in PyTorch) from static computation graphs (like those used in TensorFlow 1.x; TensorFlow 2.x executes eagerly by default) is that DCGs are built at run time, as each operation executes, rather than being fully specified in advance. In other words, whereas a static computation graph requires the architecture of the entire neural network to be fully determined and compiled in order to run, DCGs can be iterated and modified on the fly.
This makes DCGs particularly useful for debugging and prototyping, as specific portions of a model’s code can be altered or run in isolation without having to reset the entire model, which for the very large deep learning models used for sophisticated computer vision and NLP tasks can waste both time and computational resources. The benefits of this flexibility extend to model training, because the graph recorded during the forward pass can simply be traversed in reverse during backpropagation to compute gradients.
While their fixed structure can enable greater computational efficiency, static computation graphs have limited flexibility: for example, building a model that uses a varying number of layers depending on the input data, like a convolutional neural network (CNN) that can process images of different sizes, is considerably more difficult with static graphs.
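A minimal sketch of data-dependent depth with a dynamic graph. `RepeatNet` is a hypothetical model that applies one shared layer a number of times decided by the input values; the rule `sum % 3 + 1` is arbitrary and exists only to show ordinary Python control flow inside `forward`.

```python
import torch
from torch import nn


class RepeatNet(nn.Module):
    """Hypothetical model: one shared layer applied a data-dependent number of times."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 4)

    def forward(self, x):
        # Plain Python control flow: the input values decide how many times
        # the layer runs, so each call records a differently shaped graph
        n_steps = int(x.abs().sum().item()) % 3 + 1
        for _ in range(n_steps):
            x = torch.tanh(self.layer(x))
        return x, n_steps


torch.manual_seed(0)
model = RepeatNet()
steps_seen = []
for scale in (0.1, 1.0, 2.0):
    out, n_steps = model(torch.full((1, 4), scale))
    out.sum().backward()  # autograd walks back through whichever graph was just built
    steps_seen.append(n_steps)
    model.zero_grad()     # reset gradients before the next input

print(steps_seen)  # [1, 2, 3]
```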
One extensively used method for training neural networks, particularly in supervised learning, is backpropagation. First, in a forward pass, a model is fed some inputs (x) and predicts some outputs (y), and a loss function measures the error of those predictions against the correct values. Then, in a backward pass that works from the output layer toward the input layer, backpropagation computes the derivative (gradient) of the loss with respect to each weight in the network. Gradient descent uses those gradients to adjust the weights in the direction that reduces the loss.
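In symbols, and using notation introduced here only for illustration: let $L$ be the loss, $a^{(k)}$ the output of layer $k$ in a network of $N$ layers, $w^{(l)}$ a weight in layer $l$, and $\eta$ the learning rate. Treating each quantity as a scalar for readability (for vectors, each factor becomes a Jacobian), backpropagation obtains the gradient by the chain rule, and gradient descent then applies the update:

```latex
\frac{\partial L}{\partial w^{(l)}}
  = \frac{\partial L}{\partial a^{(N)}}
    \cdot \frac{\partial a^{(N)}}{\partial a^{(N-1)}}
    \cdots
    \frac{\partial a^{(l+1)}}{\partial a^{(l)}}
    \cdot \frac{\partial a^{(l)}}{\partial w^{(l)}},
\qquad
w^{(l)} \leftarrow w^{(l)} - \eta \, \frac{\partial L}{\partial w^{(l)}}
```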
PyTorch’s autograd package powers its automatic differentiation using a calculus rule called the chain rule, which calculates complex derivatives by splitting them into simpler derivatives and then combining them. Autograd automatically records the operations executed in a computation graph and, when backpropagation is triggered (for example, by calling .backward() on the loss), computes the gradients for all of them, greatly reducing the legwork of backpropagation.
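A small check of the chain rule in action: for y = sin(x²), the derivative is cos(x²) · 2x. The sketch lets autograd compute it and compares the result with the hand-derived formula; the input value 1.5 is arbitrary.

```python
import torch

# y = sin(u) with u = x^2, so by the chain rule dy/dx = cos(x^2) * 2x
x = torch.tensor(1.5, requires_grad=True)
u = x ** 2          # inner function, recorded in the graph
y = torch.sin(u)    # outer function, recorded in the graph
y.backward()        # autograd multiplies the local derivatives along the graph

# Hand-derived derivative, computed outside the graph
manual = torch.cos(x.detach() ** 2) * 2 * x.detach()
print(x.grad.item(), manual.item())  # the two values should match
```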
When running a model that has already been trained (inference), gradient tracking by autograd is an unnecessary use of computational resources. Wrapping inference code in a torch.no_grad() block, or setting requires_grad=False on a model’s parameters, signals PyTorch to stop tracking gradients.
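The sketch below shows both approaches on an untrained placeholder layer standing in for a trained model. PyTorch also offers `torch.inference_mode()`, a stricter context manager intended for pure inference.

```python
import torch
from torch import nn

model = nn.Linear(3, 1)  # placeholder for a trained model
model.eval()             # inference behavior for layers like dropout or batch norm, if present

x = torch.randn(5, 3)
with torch.no_grad():    # no computation graph is recorded inside this block
    preds = model(x)
print(preds.requires_grad)  # False

# Alternatively, freeze the parameters so autograd never tracks them
for p in model.parameters():
    p.requires_grad_(False)
print(model(x).requires_grad)  # False
```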
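Working with the large datasets required to train deep learning models can be complex and computationally demanding. PyTorch provides two data primitives in torch.utils.data, Dataset and DataLoader, to facilitate data loading and make code more readable: a Dataset stores the samples and their corresponding labels, while a DataLoader wraps an iterable around a Dataset to handle tasks like batching and shuffling.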
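A minimal custom Dataset only needs `__len__` and `__getitem__`. In the sketch below, `SquaresDataset` is a hypothetical toy dataset whose label is the square of each input; the DataLoader then serves it in shuffled batches of four.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Hypothetical dataset: each sample is (x, x**2) for x = 0..n-1."""

    def __init__(self, n):
        self.x = torch.arange(n, dtype=torch.float32)

    def __len__(self):
        return len(self.x)  # number of samples

    def __getitem__(self, idx):
        x = self.x[idx]
        return x, x ** 2    # (input, label) pair


dataset = SquaresDataset(10)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

batch_sizes = []
for xb, yb in loader:  # DataLoader groups samples into batches, shuffling each epoch
    batch_sizes.append(len(xb))
    assert torch.equal(yb, xb ** 2)  # labels stay paired with their inputs
print(batch_sizes)  # [4, 4, 2]: the last batch holds the remaining samples
```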
PyTorch’s core features are supplemented by a robust ecosystem of tools, libraries and extensions developed by members of the PyTorch community. Many additional open source libraries, containing purpose-specific modules, pre-configured neural networks and even pre-trained models, are available to supplement the core torch library.
Torchvision is a toolkit containing modules, network architectures and datasets for various image classification, object detection and image segmentation tasks.
TorchText provides resources like datasets, basic text-processing transformations and pre-trained models for use in NLP.
The Open Neural Network Exchange (ONNX) is an open format for representing machine learning models that enables interoperability between AI frameworks, allowing users to export their PyTorch models and run them on other platforms.
Many helpful tutorials are available at PyTorch.org. For example, this intermediate tutorial teaches the fundamentals of deep reinforcement learning by training an AI to play a video game.
PyTorch can be installed and run in different configurations on both local systems and cloud platforms.
Running PyTorch locally requires installing Python, using either the Anaconda distribution, Homebrew or the official Python website.
PyTorch can be locally installed via Anaconda using the command conda install pytorch torchvision -c pytorch, or via pip using the command pip3 install torch torchvision. Anaconda was historically recommended, as it provides all PyTorch dependencies (including Python) in one sandboxed install.2 However, PyTorch has since stopped publishing new releases to its official Anaconda channel, so pip is now the installation route documented on its Get Started page.
PyTorch can also be run on cloud platforms, including Amazon Web Services, Google Cloud and Microsoft Azure.
It is recommended (but not required) to work with NVIDIA GPUs in order to take advantage of PyTorch’s support for CUDA (Compute Unified Device Architecture), which offers dramatically faster training and performance than can be delivered by CPUs.
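After installation, a short script like the one below can confirm which build is present and whether CUDA is usable. It only reads standard attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`) and falls back to the CPU when no GPU is found.

```python
import torch

# Report what this PyTorch installation supports
print("PyTorch version:", torch.__version__)
print("Built with CUDA:", torch.version.cuda)        # None for CPU-only builds
print("CUDA available:", torch.cuda.is_available())  # requires an NVIDIA GPU and driver
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

# Place tensors on the GPU when possible, otherwise on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.ones(2, 2, device=device)
print(x.device.type)
```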
1 Octoverse 2022: The top programming languages, GitHub, 17 November 2022
2 PyTorch: Get Started – Start Locally