The integrated AI accelerator is a feature of the IBM Telum® processor. It is an on-chip processing unit that is memory coherent and connects directly to the processor fabric, like any other general-purpose core. By colocating AI compute with the data it operates on, the accelerator minimizes latency and increases AI inferencing performance.
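On Linux on Z, the accelerator is exposed through the Neural Network Processing Assist (NNPA) facility. As a minimal sketch, assuming the kernel reports NNPA in the standard "features" line of /proc/cpuinfo, the following Python check detects whether the facility is available:

    # Minimal sketch: detect the Integrated Accelerator for AI on Linux on Z.
    # Assumes the kernel lists the NNPA (Neural Network Processing Assist)
    # facility in the "features" line of /proc/cpuinfo.

    def has_nnpa(cpuinfo_path="/proc/cpuinfo"):
        with open(cpuinfo_path) as f:
            for line in f:
                if line.lower().startswith("features"):
                    return "nnpa" in line.split()
        return False

    if __name__ == "__main__":
        status = "available" if has_nnpa() else "not detected"
        print("Integrated Accelerator for AI (NNPA):", status)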
The IBM Telum chip, designed for LinuxONE systems, delivers over 40%1 performance growth per socket compared to the IBM LinuxONE III. It introduces a dedicated on-chip AI accelerator that provides consistent low-latency, high-throughput inference capacity and reduces software orchestration and library complexity. The accelerator transforms how enterprises integrate AI, delivering real-time insights with unmatched performance across hybrid cloud environments.
This webinar discusses how IBM LinuxONE can help you unlock new use cases for AI across industries.
IBM is working with the IBM LinuxONE Ecosystem to help ISVs provide solutions for today's AI, sustainability and cybersecurity challenges.
Explore two innovative solutions that are tailored for financial and healthcare institutions: Clari5 Enterprise Fraud Management on IBM LinuxONE 4 Express for real-time fraud prevention and Exponential AI's Enso Decision Intelligence Platform on LinuxONE for advanced AI solutions at scale.
The Clari5 Enterprise Fraud Management Solution on IBM LinuxONE 4 Express empowers financial institutions with a robust decisioning engine for real-time fraud prevention. It is designed to monitor, detect and influence transactions, ensuring compliance and enhancing productivity, all while delivering unprecedented speed and scalability.
Exponential AI's Enso Decision Intelligence Platform on LinuxONE provides cutting-edge capabilities for building, training, orchestrating and managing near real-time AI solutions at scale. The platform addresses the complex-transaction challenges faced by leading national health insurance payers through Intelligent Automation solutions developed by Exponential AI.
TensorFlow is an open-source machine learning framework that offers a comprehensive set of tools for model development, training and inference. It has a rich, robust ecosystem and is supported in Linux environments on IBM LinuxONE.
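As a brief illustration, the following Python sketch runs a batch of inference through a small Keras model; the model architecture and feature width are hypothetical stand-ins, and on LinuxONE the IBM-provided TensorFlow build can route eligible operations to the Integrated Accelerator for AI transparently:

    # Minimal sketch: batch inference with TensorFlow. The tiny model below is
    # a self-contained stand-in; a real deployment would load a trained model.
    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(220,)),                  # hypothetical feature width
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

    batch = np.random.rand(128, 220).astype(np.float32)  # synthetic batch
    scores = model.predict(batch, verbose=0)
    print(scores.shape)  # (128, 1)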
IBM SnapML is a library designed for high-speed training and inference of popular machine learning models. It leverages the IBM Integrated Accelerator for AI to enhance performance for Random Forest, Extra Trees and Gradient Boosting Machines models. It is available as part of the AI Toolkit for IBM Z and LinuxONE and with IBM Cloud Pak for Data.
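As a minimal sketch, assuming the snapml Python package is installed, the following trains and scores one of the supported model types on synthetic data; Snap ML estimators follow the familiar scikit-learn fit/predict pattern:

    # Minimal sketch: train a Snap ML random forest on synthetic data.
    # Assumes the snapml package; its API mirrors scikit-learn.
    import numpy as np
    from snapml import RandomForestClassifier

    X = np.random.rand(10_000, 32).astype(np.float32)  # synthetic features
    y = (X[:, 0] > 0.5).astype(np.int32)               # synthetic labels

    clf = RandomForestClassifier(n_estimators=100, n_jobs=8)
    clf.fit(X, y)
    pred = clf.predict(X)
    print("training accuracy:", (pred == y).mean())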
The Triton Inference Server is an open-source model server developed by NVIDIA that supports model inference on both CPU and GPU devices. It is widely used across platforms and architectures, including s390x (Linux on Z). On Linux on Z, Triton can leverage AI frameworks that exploit both the SIMD architecture and the IBM Integrated Accelerator for AI to optimize performance.
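Client code is platform-independent. The following sketch, assuming a Triton server reachable at localhost:8000 that serves a hypothetical model named fraud_detect with one FP32 input and one output, sends a single inference request using the tritonclient Python package:

    # Minimal sketch: HTTP inference request to a Triton server. The model
    # name, tensor names, and shape below are hypothetical.
    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    data = np.random.rand(1, 220).astype(np.float32)
    inputs = [httpclient.InferInput("input_1", list(data.shape), "FP32")]
    inputs[0].set_data_from_numpy(data)
    outputs = [httpclient.InferRequestedOutput("output_1")]

    result = client.infer(model_name="fraud_detect",
                          inputs=inputs, outputs=outputs)
    print(result.as_numpy("output_1"))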
The IBM Z Deep Learning Compiler enables data scientists to develop deep learning models using familiar tools and frameworks, then deploy those models on Linux on IBM Z, where mission-critical data resides. The compiler makes it quick and easy for existing models to take advantage of the Telum processor's Integrated Accelerator for AI.
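As a sketch of the deployment step, assuming a model has already been compiled to a shared library (model.so) and that the onnx-mlir-style PyRuntime bindings shipped with the compiler are importable, inference reduces to a session load and a run call; class and method names can vary between compiler versions:

    # Minimal sketch: run a model compiled by the IBM Z Deep Learning Compiler.
    # Assumes model.so was produced by the compiler and that the PyRuntime
    # bindings are on the Python path; names vary by version.
    import numpy as np
    from PyRuntime import OMExecutionSession

    session = OMExecutionSession("model.so")           # compiled model library
    batch = np.random.rand(1, 220).astype(np.float32)  # hypothetical input shape
    outputs = session.run([batch])                     # list in, list out
    print(outputs[0].shape)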
Open Neural Network Exchange (ONNX) is an open format built to represent machine learning models. ONNX defines a common set of operators—the building blocks of machine learning and deep learning models—and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes and compilers.
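To make the format concrete, the following sketch uses the onnx Python package to assemble a one-node model (Y = X + X) from those common operators and validate it against the standard:

    # Minimal sketch: build and validate a one-node ONNX model (Y = X + X).
    import onnx
    from onnx import TensorProto, helper

    X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 4])
    Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 4])
    node = helper.make_node("Add", inputs=["X", "X"], outputs=["Y"])

    graph = helper.make_graph([node], "tiny_graph", [X], [Y])
    model = helper.make_model(graph)
    onnx.checker.check_model(model)   # structural validation
    onnx.save(model, "tiny.onnx")     # the common file format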
1 The IBM Telum processor on IBM z16™ delivers over 40% per socket performance growth versus the IBM z15 processor.
DISCLAIMER: Results are based on engineering analysis of the total processing capacity offered by the IBM Telum processor and the IBM z15 processor, as well as the IBM Large System Performance Reference (LSPR) ratios published at: https://www.ibm.com/support/pages/ibm-z-large-systems-performance-reference. The number of cores per processor socket accessible for general use varies based on system configuration. Total processing capacity varies based on workload, configuration and software levels.
2 The on-chip AI acceleration is designed to add up to 5.8 TFLOPS of processing power shared by all cores on the chip.
DISCLAIMER: The result is the maximum theoretical number of floating-point operations per second (FLOPS) in 16-bit precision that can be executed by a single on-chip AI engine. There is one on-chip AI engine per chip.
3 DISCLAIMER: Performance result is extrapolated from IBM internal tests running local inference operations in an IBM LinuxONE Emperor 4 LPAR with 48 cores and 128 GB memory on Ubuntu 20.04 (SMT mode) using a synthetic credit card fraud detection model (https://github.com/IBM/ai-on-z-fraud-detection) exploiting the Integrated Accelerator for AI. The benchmark was running with 8 parallel threads each pinned to the first core of a different chip. The lscpu command was used to identify the core-chip topology. A batch size of 128 inference operations was used. Results vary.