What is an AI accelerator?

An artificial intelligence (AI) accelerator, also known as an AI chip, deep learning processor or neural processing unit (NPU), is a hardware accelerator built to speed up AI workloads such as neural networks, deep learning and machine learning.

As AI technology expands, AI accelerators are critical to processing the large amounts of data needed to run AI applications. Currently, AI accelerator use cases span smartphones, PCs, robotics, autonomous vehicles, the Internet of Things (IoT), edge computing and more.

For decades, computer systems depended on accelerators (or coprocessors) for a variety of specialized tasks. Typical examples of coprocessors include graphics processing units (GPUs), sound cards and video cards. But with the growth of AI applications over the last decade, traditional central processing units (CPUs) and even some GPUs weren’t able to process the large amounts of data needed to run AI applications. Enter AI accelerators, with specialized parallel-processing capabilities that allow them to perform billions of calculations at once. 


Why are AI accelerators important?

As the AI industry expands into new applications and fields, AI accelerators are critical to speeding the processing of data necessary to create AI applications at scale. Without AI accelerators like GPUs, field programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) to speed deep learning, breakthroughs in AI like ChatGPT would take much longer and be more costly. AI accelerators are widely used by some of the world’s largest companies, including Apple, Google, IBM, Intel and Microsoft.

Benefits of AI accelerators

With speed and scalability at a premium in the fast-moving AI technology industry, AI accelerators have become indispensable in helping companies innovate at scale and bring new AI applications to market sooner. AI accelerators are superior to their older counterparts in three critical ways: speed, efficiency and design.

Speed

AI accelerators are much faster than traditional CPUs due to their dramatically lower latency, a measure of delays in a system. Low latency is particularly critical in the development of AI applications in the medical and autonomous vehicle fields where delays of seconds—even milliseconds—are dangerous.

Efficiency

AI accelerators can be anywhere from one hundred to a thousand times more efficient than other, more standard compute systems. Both the large AI accelerator chips used in data centers and the smaller ones typically used in edge devices draw less power and dissipate lower amounts of heat than their older counterparts.  

Design

AI accelerators have what’s known as a heterogeneous architecture, enabling multiple processors to support separate tasks, a capability that increases compute performance to the levels required by AI applications.

Challenges to AI accelerator technology

AI accelerators are crucial to applications of AI technology. However, the industry faces challenges that will need to be resolved soon, or they will hamper innovation.

Most AI accelerators are made exclusively in Taiwan

Taiwan manufactures 60% of the world’s semiconductors and 90% of its advanced chips, including AI accelerators. Additionally, the world’s largest AI hardware and software company, Nvidia, relies almost exclusively on a single company—the Taiwan Semiconductor Manufacturing Company (TSMC)—for its AI accelerators.

AI models are developing faster than AI accelerator design

Today’s most powerful AI models require more computational power than many AI accelerators can handle, and innovation in chip design isn’t keeping pace with the innovation occurring in AI models. Companies are exploring areas like in-memory computing and AI-algorithm-enhanced performance and fabrication to increase efficiency, but these efforts aren’t moving as fast as the growth in computational demand from AI-powered applications.

AI accelerators need more power than their size allows

AI accelerators are small: most are measured in millimeters, and the largest in the world is only about the size of an iPad. This makes it difficult to direct the amount of energy needed to power them into such a small space, a problem that has grown as compute demands from AI workloads have risen in recent years. Advancements will need to be made soon in the power delivery network (PDN) architectures behind AI accelerators, or their performance will start to suffer.

How do AI accelerators work?

Due to their unique design and specialized hardware, AI accelerators boost AI processing performance considerably when compared to their predecessors. Purpose-built features enable the solving of complex AI algorithms at rates that far outpace general-purpose chips.

AI accelerators are typically made from a semiconductor material, like silicon, containing billions of transistors connected in an electronic circuit. Electrical currents running through the material are turned on and off, creating signals that are then read by a digital device. In advanced accelerators, the signals switch on and off billions of times per second, allowing the circuits to solve complex computations using binary code.

Some AI accelerators are designed for a specific purpose while others have more general functionality. For example, NPUs are AI accelerators built specifically for deep learning, while GPUs are AI accelerators designed for video and image processing.

Features of AI accelerators

AI accelerators are tasked primarily with solving advanced algorithms, so their performance is crucial to various AI-related operations, such as machine learning (ML), deep learning and deep neural network problems. They can solve many algorithms at once—quickly and accurately—due to the unique way they deploy computational resources, primarily through parallel processing, their unique memory architecture and a feature known as reduced precision. Today’s most advanced AI accelerators are designed to tackle large, complex problems by dividing them into smaller ones and solving them at the same time, exponentially increasing their speed.

Parallel processing

No other feature enhances an AI accelerator’s performance like its ability to perform many computations at once, a capability known as parallel processing. Unlike other chips, AI accelerators can complete in minutes, seconds—even milliseconds—tasks that previously took hours or even days. This capability makes them indispensable to AI technologies that rely on real-time data processing, such as edge computing. Because of the sheer number of complex algorithms in ML and deep learning processes, AI accelerators are critical to the advancement of both the technology and its applications.
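To make the idea concrete, here is a minimal sketch in Python with NumPy, where a vectorized library call stands in for dedicated parallel hardware: the same arithmetic runs dramatically faster when issued as one parallel operation instead of a sequence of scalar steps.

```python
import time
import numpy as np

# Illustrative only: the same arithmetic expressed as one-at-a-time
# scalar work versus a single vectorized call that the hardware can
# execute over many elements in parallel (SIMD, multiple cores).
x = np.random.rand(10_000_000).astype(np.float32)

start = time.perf_counter()
total = 0.0
for v in x:          # one scalar operation per loop iteration
    total += float(v)
loop_s = time.perf_counter() - start

start = time.perf_counter()
total_vec = x.sum()  # one call, executed in parallel over the array
vec_s = time.perf_counter() - start

print(f"scalar loop: {loop_s:.2f}s  vectorized: {vec_s:.4f}s  "
      f"speedup: {loop_s / vec_s:.0f}x")
```

A dedicated AI accelerator takes the same principle much further, spreading the work across thousands of specialized compute units rather than a handful of CPU cores.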

Reduced precision for AI training

To save power, AI accelerators can employ a feature known as reduced precision arithmetic. Neural networks are still highly functional using 16-bit or even 8-bit floating point numbers, instead of the 32 bits that more general-purpose chips use. This means they can achieve faster processing speeds at lower energy expenditure without sacrificing accuracy.
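The trade-off is easy to demonstrate. The following illustrative Python/NumPy sketch runs the same matrix multiply in 32-bit and 16-bit floating point: the operands occupy half the memory, while the result stays close to the full-precision answer.

```python
import numpy as np

# Minimal sketch: the same matrix multiply in 32-bit and 16-bit
# floating point. Halving the precision halves memory traffic,
# which is where much of an accelerator's speed and power savings
# come from, while results stay close enough for neural networks.
rng = np.random.default_rng(0)
a32 = rng.standard_normal((512, 512), dtype=np.float32)
b32 = rng.standard_normal((512, 512), dtype=np.float32)

a16, b16 = a32.astype(np.float16), b32.astype(np.float16)

c32 = a32 @ b32
c16 = (a16 @ b16).astype(np.float32)

print(f"fp32 operand bytes: {a32.nbytes + b32.nbytes}")
print(f"fp16 operand bytes: {a16.nbytes + b16.nbytes}")   # half the memory
rel_err = np.abs(c32 - c16).max() / np.abs(c32).max()
print(f"worst-case relative error: {rel_err:.4f}")        # small for ML purposes
```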

Memory hierarchy

The way data is moved from one place to another in an AI accelerator is critical to the optimization of AI workloads. AI accelerators use different memory architectures than general-purpose chips, allowing them to achieve lower latencies and better throughput. These specialized design features, including on-chip caches and high-bandwidth memory, are vital to speeding the processing of large datasets necessary for high-performance AI workloads.
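The following Python/NumPy sketch illustrates the tiling idea behind these memory hierarchies; the tile size and matrix dimensions are arbitrary choices for illustration, with the CPU cache standing in for an accelerator's fast on-chip memory.

```python
import numpy as np

# Sketch of the tiling idea behind accelerator memory hierarchies:
# operate on small blocks that fit in fast local memory, reusing
# each block many times before returning to slow main memory.
def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 64) -> np.ndarray:
    n = a.shape[0]
    c = np.zeros((n, n), dtype=a.dtype)
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            for k in range(0, n, tile):
                # Each small tile is loaded once and reused for a
                # full block of partial products before being evicted.
                c[i:i+tile, j:j+tile] += (
                    a[i:i+tile, k:k+tile] @ b[k:k+tile, j:j+tile]
                )
    return c

a = np.random.rand(256, 256).astype(np.float32)
b = np.random.rand(256, 256).astype(np.float32)
assert np.allclose(tiled_matmul(a, b), a @ b, rtol=1e-4)
```

Accelerators bake this access pattern into hardware, pairing on-chip caches with high-bandwidth memory so data rarely has to make the expensive trip back to main memory.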

Types of AI accelerators

AI accelerators are divided into two architectures based on where they operate: AI accelerators for data centers and AI accelerators for edge computing frameworks. Data center AI accelerators require a highly scalable architecture and large chips, such as the Wafer-Scale Engine (WSE) built by Cerebras for deep learning systems, while AI accelerators built for edge computing ecosystems focus more on energy efficiency and the ability to deliver near real-time results.

Wafer-scale integration

Wafer-scale integration, or WSI, is a process for building extremely large AI chip networks into a single, “super” chip to reduce cost and accelerate the performance of deep learning models. The best-known example of wafer-scale integration is the WSE-3 chip produced by Cerebras and built on TSMC’s 5 nm process, currently the fastest AI accelerator in the world.

NPUs

NPUs, or neural processing units, are AI accelerators built for deep learning, neural networks and the data processing requirements unique to these workloads. NPUs can process large amounts of data faster than other chips. They can perform a wide range of AI tasks associated with machine learning, such as image recognition, and they power the neural networks behind popular AI and ML applications like ChatGPT.

GPUs

GPUs—electronic circuits built to enhance the performance of computer graphics and image processing—are used in a variety of devices including video cards, motherboards and mobile phones. However, due to their parallel processing capabilities, they are also increasingly being used in the training of AI models. One popular method is to connect many GPUs to a single AI system to increase that system’s processing power.
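As a hedged illustration of that method, the following Python sketch uses PyTorch (one common framework for this, assumed here rather than prescribed) to replicate a small model across however many GPUs are available, so each device handles a slice of every batch.

```python
import torch
import torch.nn as nn

# Illustrative sketch: replicate one model across every available GPU
# so each device processes a slice of the batch in parallel. Assumes
# PyTorch with CUDA support; falls back to the CPU if no GPU exists.
model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 10))

if torch.cuda.device_count() > 1:
    # DataParallel splits each input batch across the GPUs and
    # gathers the outputs back on the primary device.
    model = nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

batch = torch.randn(256, 512, device=device)  # 256 samples split across GPUs
logits = model(batch)
print(logits.shape)  # torch.Size([256, 10])
```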

Field programmable gate arrays (FPGAs)

FPGAs are highly customizable AI accelerators that depend on specialized knowledge to be reprogrammed for a specific purpose. Unlike other AI accelerators, FPGAs have a unique design that suits a specific function, often having to do with the processing of data in real time. FPGAs are reprogrammable on a hardware level, enabling a much higher level of customization. Common FPGA applications include aerospace, Internet of Things (IoT) and wireless networking.

Application-specific integrated circuits (ASICs)

ASICs are AI accelerators that have been designed with a specific purpose or workload in mind, like deep learning in the case of the Cerebras WSE-3 accelerator. Unlike FPGAs, ASICs cannot be reprogrammed, but because they are constructed with a singular purpose, they typically outperform other, more general-purpose accelerators. One example is Google’s Tensor Processing Unit (TPU), developed for neural network machine learning using Google’s own TensorFlow software.
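For illustration, the following Python sketch shows how a TensorFlow model can target a TPU. It assumes a TPU-enabled environment (such as a cloud TPU VM) where the runtime can be discovered, and the model itself is an arbitrary placeholder.

```python
import tensorflow as tf

# Hedged sketch: targeting Google's TPU ASIC from TensorFlow.
# Assumes a TPU-enabled environment where the TPU runtime can be
# discovered; on ordinary hardware, resolution will fail.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Any model built under the strategy scope is compiled for and
# replicated across the TPU's cores.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
```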

AI accelerator use cases

From smartphones and PCs to state-of-the-art AI technology like robotics and satellites, AI accelerators play a crucial role in the development of new AI applications. Here are some examples of how AI accelerators are being used:  

Autonomous vehicles

AI accelerators can capture and process data in near real time, making them critical to the development of self-driving cars, drones and other autonomous vehicles. Their unmatched parallel processing capabilities allow them to interpret data from cameras and sensors so that vehicles can react to their surroundings. For example, when a self-driving car arrives at a traffic light, AI accelerators speed the processing of data from its sensors, allowing it to read the traffic signal and the positions of other cars at the intersection.

Edge computing and edge AI

Edge computing is a process that brings applications and compute power closer to data sources like IoT devices, allowing data to be processed with or without an internet connection. Edge AI allows AI capabilities and the accelerators that power ML tasks to run at the edge, rather than moving the data to a data center for processing. This reduces latency and improves energy efficiency in many AI applications.

Large language models

Large language models (LLMs) depend on AI accelerators to help them develop their unique ability to understand and generate natural language. AI accelerators’ parallel processing helps speed processes in neural networks, optimizing the performance of cutting-edge AI applications like generative AI and chatbots.

Robotics

AI accelerators are critical to the development of the robotics industry due to their ML and computer vision capabilities. As AI-enhanced robots are developed for various tasks—ranging from personal companions to surgical tools—AI accelerators will continue to play a crucial role in developing their ability to detect and react to environments with the same speed and accuracy as a human.
