
Published: 12 December 2023
Contributors: Gita Jackson, Michael Goodwin

What is auto scaling?

Auto scaling, occasionally referred to as “automatic scaling,” is a cloud computing feature that automatically allocates computational resources based on system demand. 

Auto scaling is used to ensure that applications have the resources they need to maintain consistent availability and hit performance goals, as well as to promote the efficient use of cloud resources and minimize cloud costs. 

According to a 2023 white paper from Infosys, organizations that migrate to the cloud waste about 32% of their cloud spend.1 Because of its focus on efficient resource utilization, auto scaling is a useful component of a successful FinOps practice.

When organizations configure cloud infrastructure, they provision resources according to a “baseline” of compute, storage and network resource needs. But demand fluctuates, say, with spikes or drops in network traffic or application use. Auto scaling features allow for resources to be scaled to match real-time demand according to specific metrics like CPU utilization or bandwidth availability, without human intervention.

Auto scaling can optimize the allocation of resources in several ways. For example, predictive scaling uses historical data to predict future demand, while dynamic scaling reacts to resource needs in real time, as determined by an organization’s auto scaling policies.

Auto scaling policies automate the lifecycles of cloud computing instances, launching and terminating virtual machines as needed to assist with resource demand. Auto scaling is often used in tandem with elastic load balancing to fully leverage available cloud resources.


Load balancing versus auto scaling

While auto scaling is related to load balancing, they are not quite the same. Both of these processes affect the allocation of back-end resources and are used to optimize performance and avoid overprovisioning. They are often used together.

Load balancers distribute incoming traffic across multiple servers to reduce the load on any particular server. Load balancers often provide features like health checks that help direct traffic away from unhealthy instances and toward healthy ones. Balancing the traffic load helps improve the performance of applications in a cloud environment.
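
As a rough illustration of the idea, rather than any particular vendor's load balancer, the sketch below routes requests round-robin and skips instances that fail a health check; the Instance class and its is_healthy probe are hypothetical stand-ins.

```python
import itertools

class Instance:
    """Hypothetical back-end instance with a stubbed health check."""
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def is_healthy(self):
        # In practice this would be an HTTP or TCP health probe.
        return self.healthy

def round_robin(instances):
    """Yield healthy instances in round-robin order, skipping unhealthy ones."""
    for instance in itertools.cycle(instances):
        if instance.is_healthy():
            yield instance

pool = [Instance("web-1"), Instance("web-2", healthy=False), Instance("web-3")]
balancer = round_robin(pool)
for _ in range(4):
    print("routing request to", next(balancer).name)
# Traffic goes only to web-1 and web-3 while web-2 fails its health check.
```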

Auto scaling, by contrast, adjusts system capacity based on demand to maintain consistent performance and avoid overprovisioning resources (that is, using only what is needed). Auto scaling adds new servers or compute instances (or terminates them) in accordance with resource demand and the auto scaling policies an organization has established.

How auto scaling works

Most cloud vendors, such as IBM Cloud®, Amazon Web Services (AWS), Microsoft Azure and Oracle Cloud Infrastructure, offer auto scaling services on their cloud platforms. These services can help organizations configure auto scaling policies to meet the organization’s cloud computing needs and goals.

Different providers and platforms offer different features, capabilities and pricing, and organizations have different available resources and use cases, but auto scaling generally works as follows:

The process starts with a launch configuration, or baseline deployment, where an instance type (or types) is deployed with specific capacity and performance characteristics. This deployment is often done using API calls and infrastructure as code (IaC), a process that uses code to provision and configure IT infrastructure elements to predefined specifications.
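
To make the idea concrete, the sketch below expresses a baseline launch configuration as code; the field names and the provision stand-in are hypothetical, not a real provider API or IaC tool.

```python
# Hypothetical baseline launch configuration expressed as code.
# Real IaC tools and provider APIs differ; these field names are illustrative.
launch_config = {
    "instance_type": "general-purpose-medium",  # baseline instance profile
    "image": "company-base-image-v1",           # machine image to boot from
    "desired_capacity": 2,                      # initial number of instances
    "min_instances": 2,                         # capacity floor
    "max_instances": 10,                        # capacity ceiling
}

def provision(config):
    """Stand-in for the API calls an IaC tool would make to a cloud provider."""
    for i in range(config["desired_capacity"]):
        print(f"launching {config['instance_type']} instance {i + 1} "
              f"from image {config['image']}")

provision(launch_config)
```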

Organizations determine desired capacity, and what attributes the instance needs, based on the expected workload for that instance. In setting up an auto scaling policy, organizations can set targets and thresholds for compute, storage or network use. When these thresholds are met, the system automatically triggers a specified action to accommodate current resource demands more accurately. If desired, policies can be configured so that notifications are sent each time a scaling action is initiated.
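
A threshold policy of this kind is essentially a control loop: compare an observed metric to the configured limits, then scale and notify. The sketch below is a simplified, generic version with illustrative thresholds; the notify hook is a hypothetical stand-in for whatever notification channel an organization uses.

```python
def evaluate_policy(cpu_utilization, current_count, *,
                    scale_out_above=70.0, scale_in_below=30.0,
                    min_count=2, max_count=10):
    """Return the instance count implied by a simple threshold policy."""
    if cpu_utilization > scale_out_above and current_count < max_count:
        return current_count + 1   # scale out by one instance
    if cpu_utilization < scale_in_below and current_count > min_count:
        return current_count - 1   # scale in by one instance
    return current_count           # within the target range: no action

def notify(message):
    """Hypothetical notification hook (email, chat, ticketing and so on)."""
    print(f"[scaling event] {message}")

count = 4
for observed_cpu in (45.0, 82.0, 88.0, 25.0):
    new_count = evaluate_policy(observed_cpu, count)
    if new_count != count:
        notify(f"CPU at {observed_cpu}%: adjusting instances {count} -> {new_count}")
    count = new_count
```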

Auto scaling groups

Organizations can also set up instance groups that maintain a minimum or maximum number of instances for specified workloads, or group together different instance types to handle different types of workloads. Instance types include:2

General-purpose instances

General-purpose instance types are designed for a variety of workloads, including web servers, small databases and development and testing environments.

Compute optimized instances

These instances are optimized for compute-intensive workloads such as high-performance computing, batch processing and scientific modeling. They maximize compute power using GPUs and high-core-count CPUs.

Memory optimized instances

These high-memory instances are optimized for memory-intensive workloads such as high-performance databases, distributed in-memory caches, real-time data processing and big data analytics.

Storage optimized instances

These instances are optimized for storage-intensive workloads such as big data, data warehousing and log processing. They leverage high-capacity caching and solid-state drives (SSD) to support the intense read and write activities of the workloads.


Auto scaling groups featuring mixed instance types enable CloudOps and DevOps teams to meet resource demands more precisely and efficiently. For example, if bandwidth needs are suitably met but CPU usage exceeds the threshold established in the auto scaling policies, the system can activate more compute-specific instances. Meanwhile, instances dedicated to handling network traffic remain as they are.
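
A minimal sketch of that example, with illustrative group names and thresholds: each metric drives its own instance group, so a CPU breach adds compute-optimized capacity while the network-facing group is left unchanged.

```python
# Illustrative mixed-instance scaling: each metric drives its own group.
groups = {
    "compute-optimized": {"count": 3, "metric": "cpu_percent", "threshold": 75.0},
    "network-facing": {"count": 4, "metric": "bandwidth_percent", "threshold": 80.0},
}
observed = {"cpu_percent": 91.0, "bandwidth_percent": 55.0}  # sample readings

for name, group in groups.items():
    if observed[group["metric"]] > group["threshold"]:
        group["count"] += 1
        print(f"{name}: {group['metric']} over threshold, scaling out to {group['count']}")
    else:
        print(f"{name}: within limits, keeping {group['count']} instances")
```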

Once teams understand workload demand, they can also create launch configuration templates for new instances. These templates define the instance type, configuration parameters and other policies that govern how new instances are spun up and how they contribute to the overall cloud environment. This allows organizations to fully automate the lifecycles of virtual machines.
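
In simplified, hypothetical form, such a template might capture the settings that every new instance inherits, along with the policy that governs when more of them are launched:

```python
# Hypothetical launch template: every new instance inherits these settings.
launch_template = {
    "name": "web-tier-template",
    "instance_type": "compute-optimized-large",
    "image": "web-tier-image-2023-12",
    "startup_script": "configure_web_server.sh",  # run on first boot
    "scaling_policy": {"metric": "cpu_percent", "scale_out_above": 70.0,
                       "cooldown_seconds": 300},
    "tags": {"team": "platform", "environment": "production"},
}

def launch_from_template(template, how_many):
    """Stand-in for the provider call that creates instances from a template."""
    return [f"{template['name']}-{i}" for i in range(1, how_many + 1)]

print(launch_from_template(launch_template, 3))
```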

Types of auto scaling

There are two types of scaling, horizontal and vertical, as well as different methods of auto scaling:

Horizontal scaling

Horizontal scaling, or “scaling out,” entails adding more machines or nodes to a cloud computing environment. You can also scale in, reducing the number of nodes in the environment.

Vertical scaling

Vertical scaling, or “scaling up,” is the process of adding more power—RAM, CPU, storage, for instance—to existing nodes in your current cloud computing environment.
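
The difference can be made concrete with a toy capacity calculation (the numbers are purely illustrative): scaling out adds nodes of the same size, while scaling up enlarges the existing nodes.

```python
# Toy capacity comparison of horizontal and vertical scaling (illustrative numbers only).
nodes = 4            # current node count
vcpus_per_node = 8   # current size of each node

def total_vcpus(node_count, vcpus_each):
    return node_count * vcpus_each

print("current capacity:  ", total_vcpus(nodes, vcpus_per_node), "vCPUs")
# Horizontal scaling ("scaling out"): add two more nodes of the same size.
print("after scaling out: ", total_vcpus(nodes + 2, vcpus_per_node), "vCPUs")
# Vertical scaling ("scaling up"): double the size of each existing node.
print("after scaling up:  ", total_vcpus(nodes, vcpus_per_node * 2), "vCPUs")
```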

Auto scaling policies can be predictive, dynamic or scheduled.

Predictive scaling

Predictive scaling policies use artificial intelligence (AI) and machine learning to anticipate future resource needs based on historical utilization.

For instance, a predictive auto scaling policy might identify the likelihood of increased web traffic for an e-commerce company ahead of a holiday buying season. It might scale out or up in accordance with set policy. This approach can help proactively minimize network latency and downtime.
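
Production predictive scaling relies on machine learning forecasts; the sketch below substitutes a much simpler moving-average-plus-trend forecast over hypothetical historical data, just to show the idea of provisioning ahead of expected demand.

```python
# Simplified predictive scaling: forecast next-hour demand from recent history,
# then provision ahead of time. Real systems use machine learning forecasts.
hourly_requests = [1200, 1350, 1500, 1700, 1900, 2200]  # hypothetical history
requests_per_instance = 500                             # assumed per-instance capacity

def forecast_next_hour(history, window=3):
    """Naive moving-average forecast plus the recent growth trend."""
    recent = history[-window:]
    average = sum(recent) / len(recent)
    trend = (recent[-1] - recent[0]) / (window - 1)
    return average + trend

predicted = forecast_next_hour(hourly_requests)
instances_needed = -(-int(predicted) // requests_per_instance)  # ceiling division
print(f"predicted load: {predicted:.0f} requests/hour -> "
      f"pre-provision {instances_needed} instances")
```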

Dynamic scaling

Dynamic scaling policies react to resource needs as they occur, adjusting resource allocation based on real-time utilization. With a dynamic scaling policy, organizations can send more resources to a particular node or auto scaling group. They can also spin up additional instances when a specific threshold, like a percentage of CPU usage, is reached.

For instance, if an organization is running a web application that consumes significant resources on an irregular schedule, a dynamic scaling policy could be used to adjust resource availability as needed. Dynamic scaling is often paired with a cooldown period, during which the increased resources remain available in case there are additional traffic spikes.
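
The cooldown matters because, without it, a single sustained spike could trigger repeated scale-out actions before the new instances are even serving traffic. A simplified, generic sketch (thresholds and cooldown length are illustrative):

```python
class DynamicScaler:
    """Simplified dynamic scaling with a cooldown between scale-out actions."""

    def __init__(self, count=2, threshold=75.0, cooldown_seconds=300):
        self.count = count
        self.threshold = threshold
        self.cooldown_seconds = cooldown_seconds
        self.last_scale_time = float("-inf")  # no scaling action yet

    def observe(self, cpu_percent, now):
        """Check one metric reading at time `now` (seconds) against the policy."""
        in_cooldown = (now - self.last_scale_time) < self.cooldown_seconds
        if cpu_percent > self.threshold and not in_cooldown:
            self.count += 1
            self.last_scale_time = now
            print(f"t={now}s CPU {cpu_percent}%: scaled out to {self.count} instances")
        else:
            print(f"t={now}s CPU {cpu_percent}%: no action (cooldown={in_cooldown})")

scaler = DynamicScaler()
scaler.observe(90.0, now=0)    # threshold breached: scale out
scaler.observe(92.0, now=60)   # still spiking, but inside the cooldown window
scaler.observe(88.0, now=400)  # cooldown expired: scale out again
```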

Scheduled scaling

Scheduled auto scaling policies allocate resources according to a predetermined schedule. For example, if an organization knows that traffic or resource demand is much higher in the evenings than in the morning, an auto scaling policy can be set to accommodate that demand.
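
A scheduled policy amounts to a lookup from time of day (or calendar date) to desired capacity. A minimal, hypothetical sketch:

```python
from datetime import datetime

# Hypothetical schedule: desired instance counts by hour of day (24-hour clock).
schedule = [
    (range(0, 7), 2),    # overnight: minimal capacity
    (range(7, 18), 4),   # business hours: moderate capacity
    (range(18, 24), 8),  # evening peak: maximum capacity
]

def desired_capacity(at_time):
    """Return the instance count the schedule prescribes for a given time."""
    for hours, count in schedule:
        if at_time.hour in hours:
            return count
    return 2  # fallback

print(desired_capacity(datetime(2023, 12, 12, 9)))   # morning -> 4
print(desired_capacity(datetime(2023, 12, 12, 21)))  # evening -> 8
```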

Benefits of auto scaling

When implemented effectively, auto scaling can play a significant role in optimizing an organization’s cloud computing environment and reducing overall cloud costs.

By establishing robust auto scaling policies, organizations can reduce their dependence on manual provisioning and ensure more consistent system performance.

Minimize manual configuration of infrastructure

Auto scaling allows a cloud environment to react in real-time to resource demand without the need for human intervention. This is more efficient than manual scaling. It helps to reduce employee burnout, improve configuration and provisioning consistency, and free up employee time for more valuable tasks.

Increase scalability

Auto scaling allows organizations to expand their cloud computing environment and capabilities more seamlessly, without having to dedicate additional personnel to the monitoring and provisioning of resources.

Provide consistent performance

By ensuring that a cloud environment has the compute, network and storage resources it requires, regardless of activity or demand, auto scaling helps maintain the consistent and reliable performance of cloud services.

Improve user experience

More consistent web application and network performance means a more consistent level of service for users.

Reduce cloud computing costs

When relying on the manual provisioning of resources, organizations often overprovision as a precaution, just to make sure that resources are available for times of peak demand. By using a platform that can automatically scale compute, network and storage resources to meet demand in real-time, organizations can avoid overprovisioning. This approach makes sure that they use only what they need, resulting in a lower cloud bill and greater ROI.

Footnotes

1. “Cloud cost optimization,” Sarika Nandwani, Infosys.com, 2023.

2. “AWS EC2 instance types: Challenges and best practices for hosting your application in AWS,” Christopher Graham, 23 August 2023.