In 2021, IBM® introduced the IBM Telum® processor, featuring the company’s first on-chip AI accelerator for inferencing. The Telum processor’s ability to deliver business outcomes has been a key driver behind the success of the IBM z16™ mainframe program. As client needs evolve, IBM continues to innovate and push the envelope on emerging technologies.

Today at the Hot Chips 2024 conference in Palo Alto, California, IBM announced the next generation of enterprise computing for the AI era with the IBM Telum® II processor and a preview of the IBM Spyre™ Accelerator. Both are expected to be available in 2025.

Developed using Samsung 5nm technology, the new IBM Telum II processor will feature eight high-performance cores running at 5.5GHz. Telum II will include a 40% increase in on-chip cache capacity, with the virtual L3 and virtual L4 growing to 360MB and 2.88GB, respectively. The processor integrates a new data processing unit (DPU) specialized for I/O acceleration and the next generation of on-chip AI acceleration. These hardware enhancements are designed to provide significant performance improvements for clients over previous generations.

Infusing AI into enterprise transactions has become essential for many of our clients’ workloads. For instance, our AI-driven fraud detection solutions are designed to save clients millions of dollars annually. With the introduction of the AI accelerator on the Telum processor, we’ve seen active adoption across our client base. Building on this success, we’ve significantly enhanced the AI accelerator on the Telum II processor.

The compute power of each accelerator is expected to improve by 4x, reaching 24 trillion operations per second (TOPS). But TOPS alone don’t tell the whole story. What matters is the accelerator’s architectural design, combined with the optimization of the AI ecosystem that sits on top of it. When it comes to AI acceleration in production enterprise workloads, a fit-for-purpose architecture matters. Telum II is engineered to let model runtimes sit side by side with the most demanding enterprise workloads while delivering high-throughput, low-latency inferencing. Additionally, support for INT8 as a data type has been added to enhance compute capacity and efficiency for applications where INT8 is preferred, thereby enabling the use of newer models.
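To illustrate why INT8 support matters, here is a minimal NumPy sketch of symmetric INT8 quantization with wide integer accumulation. It is a generic illustration of the data type, not IBM’s toolchain or the Telum II accelerator’s actual implementation; the scaling scheme and function names are assumptions made for the example.

```python
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: map FP32 values onto [-127, 127].
    (Illustrative scheme only; real accelerators may use other calibrations.)"""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(a_q, a_scale, b_q, b_scale) -> np.ndarray:
    """Integer matrix multiply with INT32 accumulation, then dequantize."""
    acc = a_q.astype(np.int32) @ b_q.astype(np.int32)
    return acc.astype(np.float32) * (a_scale * b_scale)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)).astype(np.float32)
b = rng.standard_normal((8, 3)).astype(np.float32)

a_q, a_s = quantize_int8(a)
b_q, b_s = quantize_int8(b)

print("FP32 result:\n", a @ b)
print("INT8 approximation:\n", int8_matmul(a_q, a_s, b_q, b_s))
```

Because each INT8 operand carries a quarter of the bits of an FP32 value, a fixed silicon and memory budget can deliver more operations per second whenever INT8 precision is acceptable for the model.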

New compute primitives have also been incorporated to better support large language models within the accelerator. They are designed to support an increasingly broad range of AI models for comprehensive analysis of both structured and textual data.

Beyond the raw compute improvements in Telum II, we’ve made system-level enhancements in the processor drawer. These enhancements enable each AI accelerator to accept work from any core in the same drawer, improving load balancing across all eight of those AI accelerators. This gives each core access to more low-latency AI acceleration, with a design target of 192 TOPS when fully configured across all the AI accelerators in the drawer. A simple sketch of this idea follows.
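As a rough illustration of drawer-level load balancing, the Python sketch below dispatches toy inference requests to whichever of the eight accelerators frees up earliest, and sums the aggregate capacity implied by the figures above (24 TOPS per accelerator × 8 accelerators = 192 TOPS). The dispatch policy, class names, and request timings are hypothetical and do not describe the actual hardware or firmware scheduler.

```python
import heapq
from dataclasses import dataclass, field

TOPS_PER_ACCELERATOR = 24      # per-accelerator figure cited above
ACCELERATORS_PER_DRAWER = 8    # eight on-chip AI accelerators per drawer

@dataclass(order=True)
class Accelerator:
    busy_until: float = 0.0                       # time at which it becomes free
    ident: int = field(compare=False, default=0)

def dispatch(requests, accelerators):
    """Send each (arrival_time, duration) request to the accelerator that
    frees up earliest -- a simple least-loaded policy for illustration."""
    heap = list(accelerators)
    heapq.heapify(heap)
    schedule = []
    for arrival, duration in requests:
        acc = heapq.heappop(heap)                 # least-loaded accelerator
        start = max(arrival, acc.busy_until)
        acc.busy_until = start + duration
        schedule.append((acc.ident, start, acc.busy_until))
        heapq.heappush(heap, acc)
    return schedule

accelerators = [Accelerator(ident=i) for i in range(ACCELERATORS_PER_DRAWER)]
requests = [(0.0, 1.0), (0.1, 0.5), (0.2, 2.0), (0.3, 0.7)]   # toy inference jobs
for ident, start, end in dispatch(requests, accelerators):
    print(f"accelerator {ident}: start={start:.1f} end={end:.1f}")

print("aggregate drawer capacity:",
      TOPS_PER_ACCELERATOR * ACCELERATORS_PER_DRAWER, "TOPS")
```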

At Hot Chips 2024, IBM also showcased the IBM Spyre Accelerator, which was jointly developed with IBM Research and IBM Infrastructure development. The Spyre Accelerator will contain 32 AI accelerator cores that will share a similar architecture to the AI accelerator integrated into the Telum II chip. Multiple IBM Spyre Accelerators can be connected into the I/O Subsystem of IBM Z via PCIe. Combining these two technologies can result in a substantial increase in the amount of available acceleration.

Both IBM Telum II and the Spyre Accelerator are designed to support a broader, larger set of models for what are called ensemble AI use cases. Ensemble AI leverages the strengths of multiple AI models to improve the overall performance and accuracy of a prediction compared with individual models.

Let’s explore insurance claims fraud detection as an example of an ensemble AI method. Traditional neural networks provide an initial risk assessment, and combining them with large language models (LLMs) is designed to enhance performance and accuracy. Similarly, these ensemble AI techniques can drive advanced detection of suspicious financial activities, supporting compliance with regulatory requirements and mitigating the risk of financial crimes.
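As a purely illustrative sketch of the ensemble idea, the Python below averages a stand-in “traditional neural network” risk score over structured claim fields with a stand-in “LLM” score over the free-text description. The features, weights, and scoring functions are invented for the example and do not reflect IBM’s fraud detection solutions.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    amount: float
    prior_claims: int
    description: str

def neural_net_score(claim: Claim) -> float:
    """Stand-in for a traditional neural network's fraud-risk score (0..1).
    A real deployment would call a trained tabular model here."""
    return min(1.0, 0.1 * claim.prior_claims + claim.amount / 100_000)

def llm_text_score(claim: Claim) -> float:
    """Stand-in for an LLM-derived score over the free-text claim description.
    A real deployment would call an LLM or text classifier here."""
    suspicious_terms = ("total loss", "no witnesses", "cash only")
    hits = sum(term in claim.description.lower() for term in suspicious_terms)
    return min(1.0, 0.4 * hits)

def ensemble_score(claim: Claim, w_nn: float = 0.6, w_llm: float = 0.4) -> float:
    """Weighted average of the two model scores -- one simple ensembling strategy."""
    return w_nn * neural_net_score(claim) + w_llm * llm_text_score(claim)

claim = Claim(amount=42_000, prior_claims=3,
              description="Vehicle declared a total loss; no witnesses present.")
print(f"ensemble fraud risk: {ensemble_score(claim):.2f}")
```

The design point is that neither model needs to be perfect on its own: the structured-data model and the text model cover different signals, and the ensemble combines them into a single, more robust risk score.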

The new Telum II processor and IBM Spyre Accelerator are engineered for a broader set of AI use cases to accelerate and deliver on client business outcomes. With these next-generation enterprise solutions, IBM continues its tradition of groundbreaking innovation, co-created with our clients to fit their mission-critical transactional workloads as their needs evolve.

Read more about Telum II
Webinar: Learn how IBM Z can enhance, scale, and secure your workloads

Statements regarding IBM’s future direction and intent are subject to change or withdrawal without notice and represent goals and objectives only.
