May 21, 2024 By AJ Nish 3 min read

As demand for advanced graphics processing units (GPUs) from vendors like NVIDIA® grows to support machine learning, AI, video streaming and 3D visualization, safeguarding performance while maximizing efficiency is critical. With AI model architectures advancing rapidly and services like IBM watsonx™ bringing large language models (LLMs) into production, workloads that require advanced NVIDIA GPUs are on the rise, and with them come new concerns over cost and proper provisioning to ensure performance.

IBM Turbonomic® is excited to announce its latest capability: optimizing NVIDIA GPU workloads in the public cloud, on premises and in containers to improve efficiency without sacrificing performance. For customers, this means faster response times, a smoother experience and better efficiency by addressing resource waste, which helps keep costs down.

With the release of IBM Turbonomic 8.12.0, Turbonomic can now monitor and optimize GPUs on premises and in the cloud, with container support coming in June. The more permissions Turbonomic is granted for observation, the more optimizations it can drive.

On cloud 

Developers may find it difficult to decide which GPU cloud instances would serve them best, and in most cases they end up over-provisioning. With some GPU instances costing more than USD 100 a day, over-provisioning can lead to a steep increase in the cloud bill.

Turbonomic enables users to scale GPU instances on demand to the instance type that best balances performance, efficiency and cost. Currently, Turbonomic supports the P2, P3, P3dn, G3, G4dn, G5 and G5g instance types on AWS. GPU metrics appear in the Capacity and Usage and Multiple Resources charts.

Example: View and scale GPU instances in AWS
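To make the over-provisioning risk concrete, here is a minimal sketch that uses the AWS boto3 SDK (not part of Turbonomic) to compare GPU count, GPU memory and vCPUs across a few instance types from the supported families. The specific instance type names and the region are illustrative assumptions; Turbonomic performs this kind of analysis automatically from its own metrics.

```python
import boto3

# Assumed region; adjust as needed.
ec2 = boto3.client("ec2", region_name="us-east-1")

# Example instance types drawn from the families Turbonomic currently supports on AWS.
candidates = ["p3.2xlarge", "g4dn.xlarge", "g5.xlarge", "g5g.xlarge"]

resp = ec2.describe_instance_types(InstanceTypes=candidates)
for it in resp["InstanceTypes"]:
    gpu_info = it.get("GpuInfo", {})
    gpus = ", ".join(
        f"{g['Count']}x {g['Manufacturer']} {g['Name']}" for g in gpu_info.get("Gpus", [])
    )
    print(
        f"{it['InstanceType']}: {gpus}, "
        f"{gpu_info.get('TotalGpuMemoryInMiB', 0)} MiB GPU memory, "
        f"{it['VCpuInfo']['DefaultVCpus']} vCPUs"
    )
```

Comparing specs like these against observed utilization is essentially what a right-sizing decision comes down to: a workload that steadily uses only a fraction of a larger instance's GPU memory may run just as well on a smaller, cheaper type.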

On premises 

GPU usage is increasingly prevalent in virtual machine (VM) environments, where it is becoming common to configure VMs with virtual GPUs (vGPUs) to take advantage of their processing power.

Turbonomic’s VM placement actions now identify the NVIDIA GPUs installed on both the source and destination hosts, as well as the NVIDIA vGPU types assigned to VMs. Turbonomic suggests placing a VM only on destination hosts with compatible NVIDIA GPU cards and vGPU types. It also recognizes VMs with passthrough GPUs attached and blocks them from move actions.
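The placement rules described above can be pictured with a small, hypothetical Python sketch. This is not Turbonomic’s implementation, and the class and field names are assumptions used only for illustration: a vGPU-backed VM can move only to hosts whose physical GPUs support its vGPU type, and passthrough VMs are blocked from moving.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Host:
    name: str
    # vGPU profiles the host's physical NVIDIA cards can back, e.g. {"grid_a100-10c"}
    supported_vgpu_types: Set[str] = field(default_factory=set)

@dataclass
class VM:
    name: str
    vgpu_type: Optional[str] = None   # assigned NVIDIA vGPU profile, if any
    passthrough_gpu: bool = False     # GPU attached in passthrough mode

def eligible_destinations(vm: VM, hosts: List[Host]) -> List[Host]:
    """Return the hosts a VM could move to, based on GPU compatibility alone."""
    if vm.passthrough_gpu:
        # A passthrough GPU is bound to physical hardware, so block move actions.
        return []
    if vm.vgpu_type is None:
        return list(hosts)  # no GPU constraint on placement
    return [h for h in hosts if vm.vgpu_type in h.supported_vgpu_types]

# Example: only host-a can back the VM's vGPU profile.
hosts = [Host("host-a", {"grid_a100-10c"}), Host("host-b", set())]
vm = VM("llm-vm", vgpu_type="grid_a100-10c")
print([h.name for h in eligible_destinations(vm, hosts)])  # ['host-a']
```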

On containers 

Generative AI (gen AI) and LLM workloads can require immense GPU processing power to perform efficiently. Turbonomic was engineered to optimize GPU resources so that gen AI workloads meet performance standards while controlling resource usage and cost.

Turbonomic is committed to developing GPU optimization services that provide performance insights and generate actions to achieve application performance and efficiency targets. It can now scale containers serving gen AI models up and down according to the size of their waiting queues.
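As an illustration of queue-driven scaling, here is a hypothetical sketch, not Turbonomic’s algorithm: the target queue depth, replica bounds and function name are all assumptions. The idea is simply to size the number of serving replicas so that each replica’s waiting queue stays near a target depth.

```python
import math

def desired_replicas(queue_length: int, current_replicas: int,
                     target_queue_per_replica: int = 4,
                     min_replicas: int = 1, max_replicas: int = 8) -> int:
    """Size inference containers so each replica's waiting queue stays near a target depth."""
    if queue_length == 0:
        # Nothing waiting: scale down gently rather than dropping to the minimum at once.
        return max(min_replicas, current_replicas - 1)
    desired = math.ceil(queue_length / target_queue_per_replica)
    return max(min_replicas, min(max_replicas, desired))

# Example: 18 queued requests with a target of 4 per replica -> 5 replicas.
print(desired_replicas(queue_length=18, current_replicas=2))
```

In practice, the scaling actions would be driven by observed queue metrics rather than a fixed formula like this one.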

IBM Turbonomic’s new GPU optimization features, combined with new IBM Instana® gen AI observability, are designed to deliver efficiency and performance for customers using GPUs for LLMs. For more information, or to be considered for the current containers preview, book a meeting with one of our Turbonomic specialists today.

Learn more about GPU optimization with IBM Turbonomic

Statements regarding IBM’s future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
