This blog post is sponsored by IBM. The author, Peter Rutten, is a Research Director for the IDC Enterprise Infrastructure Practice, focusing on high-end, accelerated, and heterogeneous infrastructure and its use cases. Information and opinions contained in this article are his own, based on research conducted for IDC.

An IDC survey that I conducted in 2018 (N = 200) shows that what organizations require above all else from an artificial intelligence (AI) platform is performance–more so than affordability, a specific form factor, vendor support, or a full AI software stack on the platform. In other words, businesses want horsepower, and they want it from all the critical components: the processors, the accelerators, the interconnects, and the I/O system.

Horsepower is indeed what the industry has been focused on. Most of the disruption in the infrastructure ecosystem is caused by a renewed quest for processor and co-processor performance in the AI era. Significant progress has been made in the past two years, and organizations now have a lot of horsepower available to run their AI workloads, including deep learning training tasks. At the same time, though, the bar keeps rising: the algorithms are becoming more complex, not to mention larger, and the data volumes they train on are growing immensely.

There’s an interesting IDC chart that shows x86 systems reaching nearly 99 percent of worldwide server units in 2016, and a simultaneous, sudden ramp-up in co-processor sales starting that same year. That, of course, was the year when AI–more specifically, deep learning–entered the stage. It quickly became evident that general-purpose CPUs couldn’t handle core-hungry AI workloads.

What do I mean by “core hungry”? AI is based on sophisticated mathematical and statistical computations. Take, for example, image and video analytics. Images are converted to matrices, with each pixel represented by a number. Millions of matrices, plus their classifications, are fed into a neural network for correlation. The network then performs enormous numbers of matrix multiplications to arrive at a result. To be fast enough, these multiplications must run in parallel, on many more cores than a CPU can provide.
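To make that concrete, here is a minimal sketch (mine, not from the survey) of the “images as matrices” idea, assuming NumPy is installed; the dimensions and random values are hypothetical stand-ins for real pixels and trained weights:

```python
import numpy as np

# A grayscale image is just a 2-D matrix: one number per pixel.
image = np.random.rand(224, 224)            # a hypothetical 224x224 input image

# At its core, a neural-network layer is a large matrix multiplication:
# flatten the image and multiply it by a weight matrix.
weights = np.random.rand(224 * 224, 1000)   # hypothetical layer with 1,000 outputs
activations = image.reshape(1, -1) @ weights

# Training repeats operations like this millions of times over millions of
# images, which is why the work must be spread across as many cores as possible.
print(activations.shape)                    # (1, 1000)
```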

CPUs are designed for serial processing, and they are close to reaching their maximum potential due to the size and cost of their cores. Hence the rise of different types of CPUs, as well as accelerators such as GPUs and custom-designed processors (ASICs, FPGAs). These accelerators have massively parallel architectures, with hundreds or even thousands of cores on a die, that affordably deliver the parallel compute performance needed.
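A small sketch of that difference, again assuming NumPy and toy dimensions: the explicit loop below mimics serial, one-operation-at-a-time processing, while the single vectorized call lets the library fan the same work out across many cores (or, with GPU-backed libraries, across thousands of them):

```python
import numpy as np

a = np.random.rand(512, 512)
b = np.random.rand(512, 512)

def matmul_serial(a, b):
    """Serial mindset: one multiply-accumulate at a time, like a lone CPU core."""
    out = np.zeros((a.shape[0], b.shape[1]))
    for i in range(a.shape[0]):
        for j in range(b.shape[1]):
            for k in range(a.shape[1]):
                out[i, j] += a[i, k] * b[k, j]
    return out

# Parallel mindset: hand the whole operation to hardware built for it.
# This one call dispatches to a multi-threaded BLAS routine; the serial
# version above produces the same answer orders of magnitude more slowly.
out_parallel = a @ b
```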

The AI performance paradigm is Massively Parallel Compute (MPC). AI workloads (but also big data analytics and simulation and modeling) require performance that can only be achieved with clustered server nodes housing multiple co-processors that contain thousands of cores–the tensor cores in a GPU, for example. Co-processors–typically GPUs, FPGAs, or ASICs–are being used to improve performance across various workloads. The most common accelerated workloads today are networking, security, encryption, real-time analytics, and compression, followed closely by AI deep learning training.
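As an illustration of the MPC pattern, here is a minimal, hypothetical sketch assuming PyTorch is installed; it slices a synthetic workload across whatever accelerators a single node exposes (falling back to the CPU if there are none), a toy version of what distributed training frameworks do across clustered nodes:

```python
import torch

# Discover the co-processors on this node; fall back to the CPU if none exist.
devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]
if not devices:
    devices = [torch.device("cpu")]

# A synthetic batch of matrices, split into one slice per device.
batch = torch.rand(len(devices) * 8, 1024, 1024)
chunks = batch.chunk(len(devices))

results = []
for device, chunk in zip(devices, chunks):
    chunk = chunk.to(device)
    # Each co-processor works on its slice independently and in parallel.
    results.append((chunk @ chunk.transpose(1, 2)).cpu())

print(torch.cat(results).shape)   # recombined result: (num_devices * 8, 1024, 1024)
```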

The scramble to compensate for limited host processor performance is not unique to AI

The world’s fastest supercomputer in 2019–Summit–was built with thousands of nodes, each with two IBM POWER9 processors and six NVIDIA V100 GPUs. Every year, the TOP500 list of supercomputers includes more systems that leverage such co-processors rather than relying solely on CPUs.

The same survey referenced earlier shows that businesses achieve performance improvements of between 58 and 73 percent from acceleration with co-processors. They do so at the expense of a 26 to 33 percent increase in CAPEX (if on premises) or OPEX (if using a cloud service provider, or CSP).

Those are decent statistics, but there’s a fundamental concern that has crept into the discussion: is it really enough? Kunle Olukotun, professor of electrical engineering and computer science at Stanford University and a co-founder of AI hardware start-up SambaNova, has stated: “Organizations are currently settling for expensive temporary solutions that are being cobbled together to run AI applications. A radically new architecture is needed.”

There are many startups in the purpose-built AI infrastructure category, and I fully expect one or more of them to make a significant impact in the near future.

However, the large processor incumbents–IBM, Intel, AMD, Xilinx, Google, AWS, and NVIDIA–have been innovating aggressively to close the performance gap.

What will the result of all this innovation be? First and foremost, more horsepower for AI. Much more! But also some competitive chaos in the AI infrastructure marketplace. Processor startups will need to continue their financing efforts (processor startups are not cheap) while creating software ecosystems and server OEM and CSP partnerships. The processor incumbents are competing for server OEMs’ and CSPs’ buy-in, the only exception being IBM, which builds its own AI-optimized servers with its own processors, for both Power Systems and IBM Z. And IT will have to evaluate which processor platform warrants investment. My advice to IT would be:

  • Don’t get core-starved. Performance issues with AI are generally the result of insufficient parallelization of the infrastructure that the AI workload runs on. AI infrastructure is increasingly based on MPC, meaning clusters of accelerated server nodes with fast interconnects.
  • Keep track of the new AI processor technologies. While some are not expected for several years, others are available today, especially those from the large incumbents. Ask your server vendor what its stance is on emerging performance requirements in terms of new processors, co-processors, interconnects, or combinations thereof.
  • Don’t be afraid to build a heterogeneous AI compute infrastructure, even if that means a bit more complexity than a 100% homogeneous environment–AI requires it. Remember that heterogeneous infrastructure is no longer complicated the way it used to be, thanks to open source layers that abstract away the hardware (think: Linux, containers, Kubernetes, etc.); see the sketch after this list.
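As a simple example of what such abstraction buys you, here is a minimal sketch, assuming PyTorch, of application code that asks which accelerator a given node happens to have rather than being written for one specific chip–so the same code runs unchanged across a heterogeneous fleet:

```python
import torch

def pick_device() -> torch.device:
    """Return the best accelerator this particular node exposes."""
    if torch.cuda.is_available():      # an NVIDIA (or ROCm-built) GPU node
        return torch.device("cuda")
    return torch.device("cpu")         # a plain CPU-only node

device = pick_device()
weights = torch.rand(1024, 1024, device=device)   # placeholder model state
print(f"Running on: {device}")                    # same code, different hardware
```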

In short: to achieve the horsepower you need for AI, embrace the diversity. Talk to your data scientists and AI developers about infrastructure parallelization. Then, investigate platforms that you may not have had in the datacenter before, platforms with different processors, co-processors, and interconnects for better parallelization. Your AI performance will depend on it.

To learn more about infrastructure for AI, you can read my IDC paper, Rethinking Your Infrastructure for Enterprise AI.
