The integrated AI accelerator is a feature of the IBM Telum® processor. It is an on-chip processing unit that is memory coherent and connects directly to the processor fabric, like any other general-purpose core. By colocating AI compute with the data it operates on, the accelerator minimizes latency and increases AI inferencing performance.
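On Linux on Z, the accelerator is exposed through the Neural Network Processing Assist (NNPA) facility. As a minimal sketch, assuming the kernel reports NNPA in the standard "features" line of /proc/cpuinfo, the following Python check detects whether the facility is available:

    # Minimal sketch: detect the Integrated Accelerator for AI on Linux on Z.
    # Assumes the kernel lists the NNPA (Neural Network Processing Assist)
    # facility in the "features" line of /proc/cpuinfo.

    def has_nnpa(cpuinfo_path="/proc/cpuinfo"):
        with open(cpuinfo_path) as f:
            for line in f:
                if line.lower().startswith("features"):
                    return "nnpa" in line.split()
        return False

    if __name__ == "__main__":
        status = "available" if has_nnpa() else "not detected"
        print("Integrated Accelerator for AI (NNPA):", status)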
The IBM Telum chip, designed for LinuxONE systems, delivers over 40%1 performance growth per socket compared to the IBM LinuxONE III. It introduces a dedicated on-chip AI accelerator that provides consistent low-latency, high-throughput inference capacity and reduces software orchestration and library complexity. The accelerator transforms how enterprises integrate AI, delivering real-time insights with unmatched performance across hybrid cloud environments.
This webinar discusses how IBM LinuxONE can help you unlock new use cases for AI across industries.
IBM is working with the IBM LinuxONE Ecosystem to help ISVs provide solutions for today's AI, sustainability and cybersecurity challenges.
Explore two innovative solutions that are tailored for financial and healthcare institutions: Clari5 Enterprise Fraud Management on IBM LinuxONE 4 Express for real-time fraud prevention and Exponential AI's Enso Decision Intelligence Platform on LinuxONE for advanced AI solutions at scale.
The Clari5 Enterprise Fraud Management Solution on IBM LinuxONE 4 Express empowers financial institutions with a robust decisioning engine for real-time fraud prevention. It is designed to monitor, detect and influence transactions, ensuring compliance and enhancing productivity, all while delivering unprecedented speed and scalability.
Exponential AI's Enso Decision Intelligence Platform on LinuxONE provides cutting-edge capabilities for building, training, orchestrating and managing near real-time AI solutions at scale. The platform addresses the complex-transaction challenges faced by leading national health insurance payers through Intelligent Automation solutions developed by Exponential AI.
TensorFlow is an open-source machine learning framework that offers a comprehensive set of tools for model development, training and inference. It has a rich, robust ecosystem and is supported in Linux environments on IBM LinuxONE.
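As a brief illustration, the following Python sketch runs a batch of inference through a small Keras model; the model architecture and feature width are hypothetical stand-ins, and on LinuxONE the IBM-provided TensorFlow build can route eligible operations to the Integrated Accelerator for AI transparently:

    # Minimal sketch: batch inference with TensorFlow. The tiny model below is
    # a self-contained stand-in; a real deployment would load a trained model.
    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(220,)),                  # hypothetical feature width
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

    batch = np.random.rand(128, 220).astype(np.float32)  # synthetic batch
    scores = model.predict(batch, verbose=0)
    print(scores.shape)  # (128, 1)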
IBM SnapML is a library designed for high-speed training and inference of popular machine learning models. It leverages the IBM Integrated Accelerator for AI to enhance performance for Random Forest, Extra Trees and Gradient Boosting Machines models. It is available as part of the AI Toolkit for IBM Z and LinuxONE and with IBM Cloud Pak for Data.
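As a minimal sketch, assuming the snapml Python package is installed, the following trains and scores one of the supported model types on synthetic data; Snap ML estimators follow the familiar scikit-learn fit/predict pattern:

    # Minimal sketch: train a Snap ML random forest on synthetic data.
    # Assumes the snapml package; its API mirrors scikit-learn.
    import numpy as np
    from snapml import RandomForestClassifier

    X = np.random.rand(10_000, 32).astype(np.float32)  # synthetic features
    y = (X[:, 0] > 0.5).astype(np.int32)               # synthetic labels

    clf = RandomForestClassifier(n_estimators=100, n_jobs=8)
    clf.fit(X, y)
    pred = clf.predict(X)
    print("training accuracy:", (pred == y).mean())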
The Triton Inference Server is an open-source model server developed by NVIDIA that supports model inference on both CPU and GPU devices. It is widely used across platforms and architectures, including s390x (Linux on Z). On Linux on Z, Triton can leverage AI frameworks that exploit both the SIMD architecture and the IBM Integrated Accelerator for AI to optimize performance.
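Client code is platform-independent. The following sketch, assuming a Triton server reachable at localhost:8000 that serves a hypothetical model named fraud_detect with one FP32 input and one output, sends a single inference request using the tritonclient Python package:

    # Minimal sketch: HTTP inference request to a Triton server. The model
    # name, tensor names, and shape below are hypothetical.
    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    data = np.random.rand(1, 220).astype(np.float32)
    inputs = [httpclient.InferInput("input_1", list(data.shape), "FP32")]
    inputs[0].set_data_from_numpy(data)
    outputs = [httpclient.InferRequestedOutput("output_1")]

    result = client.infer(model_name="fraud_detect",
                          inputs=inputs, outputs=outputs)
    print(result.as_numpy("output_1"))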
The IBM Z Deep Learning Compiler enables data scientists to develop deep learning models using familiar tools and frameworks, then deploy those models on Linux on IBM Z, where mission-critical data resides. The compiler makes it quick and easy for existing models to take advantage of the Telum processor's Integrated Accelerator for AI.
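As a sketch of the deployment step, assuming a model has already been compiled to a shared library (model.so) and that the onnx-mlir-style PyRuntime bindings shipped with the compiler are importable, inference reduces to a session load and a run call; class and method names can vary between compiler versions:

    # Minimal sketch: run a model compiled by the IBM Z Deep Learning Compiler.
    # Assumes model.so was produced by the compiler and that the PyRuntime
    # bindings are on the Python path; names vary by version.
    import numpy as np
    from PyRuntime import OMExecutionSession

    session = OMExecutionSession("model.so")           # compiled model library
    batch = np.random.rand(1, 220).astype(np.float32)  # hypothetical input shape
    outputs = session.run([batch])                     # list in, list out
    print(outputs[0].shape)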
Open Neural Network Exchange (ONNX) is an open format built to represent machine learning models. ONNX defines a common set of operators—the building blocks of machine learning and deep learning models—and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes and compilers.
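To make the format concrete, the following sketch uses the onnx Python package to assemble a one-node model (Y = X + X) from those common operators and validate it against the standard:

    # Minimal sketch: build and validate a one-node ONNX model (Y = X + X).
    import onnx
    from onnx import TensorProto, helper

    X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 4])
    Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 4])
    node = helper.make_node("Add", inputs=["X", "X"], outputs=["Y"])

    graph = helper.make_graph([node], "tiny_graph", [X], [Y])
    model = helper.make_model(graph)
    onnx.checker.check_model(model)   # structural validation
    onnx.save(model, "tiny.onnx")     # the common file format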
1 The IBM Telum processor on IBM z16™ delivers over 40% per socket performance growth versus the IBM z15 processor.
DISCLAIMER: Results are based on engineering analysis of the total processing capacity offered by the IBM Telum processor and the IBM z15 processor, as well as the IBM Large System Performance Reference (LSPR) ratios published at: https://www.ibm.com/support/pages/ibm-z-large-systems-performance-reference. The number of cores per processor socket accessible for general use varies based on system configuration. Total processing capacity varies based on workload, configuration and software levels.
2 The on-chip AI acceleration is designed to add up to 5.8 TFLOPS of processing power shared by all cores on the chip.
DISCLAIMER: The result is the maximum theoretical number of floating-point operations per second (FLOPS) in 16-bit precision that can be executed by a single on-chip AI engine. There is one on-chip AI engine per chip.
3 DISCLAIMER: Performance result is extrapolated from IBM internal tests running local inference operations in an IBM LinuxONE Emperor 4 LPAR with 48 cores and 128 GB memory on Ubuntu 20.04 (SMT mode) using a synthetic credit card fraud detection model (https://github.com/IBM/ai-on-z-fraud-detection) exploiting the Integrated Accelerator for AI. The benchmark was running with 8 parallel threads each pinned to the first core of a different chip. The lscpu command was used to identify the core-chip topology. A batch size of 128 inference operations was used. Results vary.