March 13, 2023 By Rodrigo Ceron 2 min read

An inferencing model is a model that, having been trained to recognize patterns of interest, is used to extract insights from new data.

Inferencing requires far less compute power than training an artificial intelligence (AI) model. It is therefore entirely possible, and even more energy efficient, to run inference without extra hardware accelerators (such as GPUs), and even to run it on edge devices. AI inferencing models commonly run on smartphones and similar devices using only the CPU; many of the picture and face filters in social media apps are AI inferencing models.
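To make that concrete: at inference time a trained model only runs a forward pass, with no gradients and no weight updates, so a small model reduces to a handful of matrix operations that a CPU handles comfortably. The sketch below is a hypothetical two-layer classifier with made-up weights, not any real filter model:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical pre-trained weights (in practice, loaded from a model file).
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 8)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 3)), np.zeros(3)

def infer(x):
    """Forward pass only -- this is all that inferencing needs."""
    h = relu(x @ W1 + b1)
    return softmax(h @ W2 + b2)

probs = infer(np.array([0.5, -1.2, 3.3, 0.7]))
print(probs)  # three class probabilities summing to 1
```

Training the same model would also require backpropagation and many passes over a large dataset, which is where the heavy accelerator demand comes from.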

IBM’s Power10 chip

IBM pioneered on-processor inferencing acceleration with the Matrix Math Accelerator (MMA) engines in its IBM Power10 chip. This gives the Power10 platform a speed advantage over other hardware architectures without spending a single extra watt on added GPUs. The Power10 chip can extract insights from data faster than other chip architectures while consuming much less energy than GPU-based systems, which is why it is optimized for AI.

Leveraging IBM Power10 for AI, especially for inferencing, requires no extra effort from AI DevOps teams. The data science libraries—such as OpenBLAS, libATen, Eigen and MLAS, to name a few—are already optimized to use the MMA engines. So, AI frameworks that build on these libraries—such as PyTorch, TensorFlow and ONNX Runtime—already benefit from the on-chip acceleration. These optimized libraries are available through the RocketCE channel on anaconda.org.
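As a starting point, pulling these optimized builds onto a Power10 system is an ordinary conda install pointed at that channel. The package name below is illustrative, not confirmed by this article; check the channel listing on anaconda.org for what is actually published:

```shell
# Install an MMA-optimized framework build from the RocketCE channel.
# The package name "pytorch-cpu" is an assumption -- verify it on
# anaconda.org before installing.
conda install -c rocketce pytorch-cpu
```

Because the acceleration lives in the underlying math libraries, existing model code runs unchanged once the optimized packages are installed.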

IBM Power10 can also speed up inferencing by using reduced-precision data. Instead of feeding the inference model 32-bit floating point data, one can feed it 16-bit floating point data, for example—moving twice as much data through the processor at a time. For many models this works well with no meaningful loss of accuracy in the inference results.
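The memory arithmetic behind that claim is easy to check: a float16 value is half the width of a float32, so the same bandwidth moves twice as many values. A small NumPy sketch (illustrative only, not Power10-specific code):

```python
import numpy as np

x32 = np.linspace(-1.0, 1.0, 1024, dtype=np.float32)
x16 = x32.astype(np.float16)  # reduced-precision copy of the same data

# Half the bytes per element -> twice the values per cache line.
print(x32.itemsize, x16.itemsize)  # 4 2

# A dot product in each precision: for well-scaled data the fp16 input
# gives a close result, which is why accuracy often survives the cut.
w = np.ones(1024, dtype=np.float32)
full = float(x32 @ w)
reduced = float(x16.astype(np.float32) @ w)
print(abs(full - reduced))  # small rounding difference
```

Whether a given model tolerates the reduced precision still has to be validated case by case, since some models are more sensitive to rounding than others.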

Inferencing is the last stage of the AI DevOps cycle, and the IBM Power10 platform was designed to be AI-optimized, helping clients extract insights from data more cost-effectively—both through energy efficiency and by reducing the need for extra accelerators.

Learn more

If you want to learn more about inferencing on Power10, please reach out to IBM Technology Expert Labs at technologyservices@ibm.com.  
