
Published: 12 December 2023
Contributors: Gita Jackson, Michael Goodwin

What is auto scaling?

Auto scaling, occasionally referred to as “automatic scaling,” is a cloud computing feature that automatically allocates computational resources based on system demand. 

Auto scaling is used to ensure that applications have the resources they need to maintain consistent availability and hit performance goals, as well as to promote the efficient use of cloud resources and minimize cloud costs. 

According to a 2023 white paper from Infosys, organizations that migrate to the cloud waste about 32% of their cloud spend.1 Because of its focus on efficient resource utilization, auto scaling is a useful component of a successful FinOps practice.

When organizations configure cloud infrastructure, they provision resources according to a “baseline” of compute, storage and network resource needs. But demand fluctuates, say, with spikes or drops in network traffic or application use. Auto scaling features allow for resources to be scaled to match real-time demand according to specific metrics like CPU utilization or bandwidth availability, without human intervention.

Auto scaling can optimize the allocation of resources in several ways. For example, predictive scaling uses historical data to predict future demand, while dynamic scaling reacts to resource needs in real time, as determined by an organization’s auto scaling policies.

Auto scaling policies automate the lifecycles of cloud computing instances, launching and terminating virtual machines as needed to assist with resource demand. Auto scaling is often used in tandem with elastic load balancing to fully leverage available cloud resources.


Load balancing versus auto scaling

While auto scaling is related to load balancing, they are not quite the same. Both of these processes affect the allocation of back-end resources and are used to optimize performance and avoid overprovisioning. They are often used together.

Load balancers distribute incoming traffic across multiple servers to reduce the load on any particular server. Load balancers often provide features like health checks that help direct traffic away from unhealthy instances and toward healthy ones. Balancing the traffic load helps improve the performance of applications in a cloud environment.
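
As a rough illustration of the idea, rather than any particular vendor's load balancer, the sketch below routes requests round-robin and skips instances that fail a health check; the Instance class and its is_healthy probe are hypothetical stand-ins.

```python
import itertools

class Instance:
    """Hypothetical back-end instance with a stubbed health check."""
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def is_healthy(self):
        # In practice this would be an HTTP or TCP health probe.
        return self.healthy

def round_robin(instances):
    """Yield healthy instances in round-robin order, skipping unhealthy ones."""
    for instance in itertools.cycle(instances):
        if instance.is_healthy():
            yield instance

pool = [Instance("web-1"), Instance("web-2", healthy=False), Instance("web-3")]
balancer = round_robin(pool)
for _ in range(4):
    print("routing request to", next(balancer).name)
# Traffic goes only to web-1 and web-3 while web-2 fails its health check.
```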

Auto scaling, by contrast, adjusts system capacity based on demand to maintain consistent performance and avoid overprovisioning resources (that is, using only what is needed). Auto scaling adds new servers or compute instances (or terminates them) in accordance with resource demand and the auto scaling policies an organization has established.

How auto scaling works

Most cloud vendors, such as IBM Cloud®, Amazon Web Services (AWS), Microsoft Azure and Oracle Cloud Infrastructure, offer auto scaling services on their cloud platforms. These services can help organizations configure auto scaling policies to meet the organization’s cloud computing needs and goals.

Different providers and platforms offer different features, capabilities and pricing, and organizations have different available resources and use cases, but auto scaling generally works as follows:

The process starts with a launch configuration, or baseline deployment, where an instance type (or types) is deployed with specific capacity and performance characteristics. This deployment is often done using API calls and infrastructure as code (IaC), a process that uses code to provision and configure IT infrastructure elements to predefined specifications.
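
To make the idea concrete, the sketch below expresses a baseline launch configuration as code; the field names and the provision stand-in are hypothetical, not a real provider API or IaC tool.

```python
# Hypothetical baseline launch configuration expressed as code.
# Real IaC tools and provider APIs differ; these field names are illustrative.
launch_config = {
    "instance_type": "general-purpose-medium",  # baseline instance profile
    "image": "company-base-image-v1",           # machine image to boot from
    "desired_capacity": 2,                      # initial number of instances
    "min_instances": 2,                         # capacity floor
    "max_instances": 10,                        # capacity ceiling
}

def provision(config):
    """Stand-in for the API calls an IaC tool would make to a cloud provider."""
    for i in range(config["desired_capacity"]):
        print(f"launching {config['instance_type']} instance {i + 1} "
              f"from image {config['image']}")

provision(launch_config)
```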

Organizations determine desired capacity, and what attributes the instance needs, based on the expected workload for that instance. In setting up an auto scaling policy, organizations can set targets and thresholds for compute, storage or network use. When these thresholds are met, the system automatically triggers a specified action to accommodate current resource demands more accurately. If desired, policies can be configured so that notifications are sent each time a scaling action is initiated.
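
A threshold policy of this kind is essentially a control loop: compare an observed metric to the configured limits, then scale and notify. The sketch below is a simplified, generic version with illustrative thresholds; the notify hook is a hypothetical stand-in for whatever notification channel an organization uses.

```python
def evaluate_policy(cpu_utilization, current_count, *,
                    scale_out_above=70.0, scale_in_below=30.0,
                    min_count=2, max_count=10):
    """Return the instance count implied by a simple threshold policy."""
    if cpu_utilization > scale_out_above and current_count < max_count:
        return current_count + 1   # scale out by one instance
    if cpu_utilization < scale_in_below and current_count > min_count:
        return current_count - 1   # scale in by one instance
    return current_count           # within the target range: no action

def notify(message):
    """Hypothetical notification hook (email, chat, ticketing and so on)."""
    print(f"[scaling event] {message}")

count = 4
for observed_cpu in (45.0, 82.0, 88.0, 25.0):
    new_count = evaluate_policy(observed_cpu, count)
    if new_count != count:
        notify(f"CPU at {observed_cpu}%: adjusting instances {count} -> {new_count}")
    count = new_count
```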

Auto scaling groups

Organizations can also set up instance groups that maintain a minimum or maximum number of instances for specified workloads, or group together different instance types to handle different types of workloads. Instance types include:2

General-purpose instances

General-purpose instance types are designed for a variety of workloads, including web servers, small databases and development and testing environments.

Compute optimized instances

These instances are optimized for compute-intensive workloads such as high-performance computing, batch processing and scientific modeling. They maximize compute power using GPUs and high-core-count CPUs.

Memory optimized instances

These high-memory instances are optimized for memory-intensive workloads such as high-performance databases, distributed in-memory caches, real-time data processing and big data analytics.

Storage optimized instances

These instances are optimized for storage-intensive workloads such as big data, data warehousing and log processing. They leverage high-capacity caching and solid-state drives (SSD) to support the intense read and write activities of the workloads.


Auto scaling groups featuring mixed instance types enable CloudOps and DevOps teams to meet resource demands more precisely and efficiently. For example, if bandwidth needs are suitably met but CPU usage exceeds the threshold established in the auto scaling policies, the system can activate more compute-specific instances. Meanwhile, instances dedicated to handling network traffic remain as they are.
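
A minimal sketch of that example, with illustrative group names and thresholds: each metric drives its own instance group, so a CPU breach adds compute-optimized capacity while the network-facing group is left unchanged.

```python
# Illustrative mixed-instance scaling: each metric drives its own group.
groups = {
    "compute-optimized": {"count": 3, "metric": "cpu_percent", "threshold": 75.0},
    "network-facing": {"count": 4, "metric": "bandwidth_percent", "threshold": 80.0},
}
observed = {"cpu_percent": 91.0, "bandwidth_percent": 55.0}  # sample readings

for name, group in groups.items():
    if observed[group["metric"]] > group["threshold"]:
        group["count"] += 1
        print(f"{name}: {group['metric']} over threshold, scaling out to {group['count']}")
    else:
        print(f"{name}: within limits, keeping {group['count']} instances")
```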

Once teams understand workload demand, they can also create launch configuration templates for new instances. These templates define the instance type, configuration parameters and other policies that govern how new instances are spun up and how they contribute to the overall cloud environment. This allows organizations to fully automate the lifecycles of virtual machines.
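
In simplified, hypothetical form, such a template might capture the settings that every new instance inherits, along with the policy that governs when more of them are launched:

```python
# Hypothetical launch template: every new instance inherits these settings.
launch_template = {
    "name": "web-tier-template",
    "instance_type": "compute-optimized-large",
    "image": "web-tier-image-2023-12",
    "startup_script": "configure_web_server.sh",  # run on first boot
    "scaling_policy": {"metric": "cpu_percent", "scale_out_above": 70.0,
                       "cooldown_seconds": 300},
    "tags": {"team": "platform", "environment": "production"},
}

def launch_from_template(template, how_many):
    """Stand-in for the provider call that creates instances from a template."""
    return [f"{template['name']}-{i}" for i in range(1, how_many + 1)]

print(launch_from_template(launch_template, 3))
```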

Types of auto scaling

There are two types of scaling, horizontal and vertical, as well as different methods of auto scaling:

Horizontal scaling

Horizontal scaling, or “scaling out,” entails adding more machines or nodes to a cloud computing environment. You can also scale in, reducing the number of nodes in the environment.

Vertical scaling

Vertical scaling, or “scaling up,” is the process of adding more power—RAM, CPU, storage, for instance—to existing nodes in your current cloud computing environment.
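
The difference can be made concrete with a toy capacity calculation (the numbers are purely illustrative): scaling out adds nodes of the same size, while scaling up enlarges the existing nodes.

```python
# Toy capacity comparison of horizontal and vertical scaling (illustrative numbers only).
nodes = 4            # current node count
vcpus_per_node = 8   # current size of each node

def total_vcpus(node_count, vcpus_each):
    return node_count * vcpus_each

print("current capacity:  ", total_vcpus(nodes, vcpus_per_node), "vCPUs")
# Horizontal scaling ("scaling out"): add two more nodes of the same size.
print("after scaling out: ", total_vcpus(nodes + 2, vcpus_per_node), "vCPUs")
# Vertical scaling ("scaling up"): double the size of each existing node.
print("after scaling up:  ", total_vcpus(nodes, vcpus_per_node * 2), "vCPUs")
```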

Auto scaling policies can be predictive, dynamic or scheduled.

Predictive scaling

Predictive scaling policies use artificial intelligence (AI) and machine learning to anticipate future resource needs based on historical utilization.

For instance, a predictive auto scaling policy might identify the likelihood of increased web traffic for an e-commerce company ahead of a holiday buying season. It might scale out or up in accordance with set policy. This approach can help proactively minimize network latency and downtime.
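
Production predictive scaling relies on machine learning forecasts; the sketch below substitutes a much simpler moving-average-plus-trend forecast over hypothetical historical data, just to show the idea of provisioning ahead of expected demand.

```python
# Simplified predictive scaling: forecast next-hour demand from recent history,
# then provision ahead of time. Real systems use machine learning forecasts.
hourly_requests = [1200, 1350, 1500, 1700, 1900, 2200]  # hypothetical history
requests_per_instance = 500                             # assumed per-instance capacity

def forecast_next_hour(history, window=3):
    """Naive moving-average forecast plus the recent growth trend."""
    recent = history[-window:]
    average = sum(recent) / len(recent)
    trend = (recent[-1] - recent[0]) / (window - 1)
    return average + trend

predicted = forecast_next_hour(hourly_requests)
instances_needed = -(-int(predicted) // requests_per_instance)  # ceiling division
print(f"predicted load: {predicted:.0f} requests/hour -> "
      f"pre-provision {instances_needed} instances")
```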

Dynamic scaling

Dynamic scaling policies react to resource needs as they occur, adjusting resource allocation based on real-time utilization. With a dynamic scaling policy, organizations can send more resources to a particular node or auto scaling group. They can also spin up additional instances when a specific threshold, like a percentage of CPU usage, is reached.

For instance, if an organization is running a web application that consumes significant resources on an irregular schedule, a dynamic scaling policy could be used to adjust resource availability as needed. Dynamic scaling is often paired with a cooldown period, during which the increased resources remain available in case there are additional traffic spikes.
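
The cooldown matters because, without it, a single sustained spike could trigger repeated scale-out actions before the new instances are even serving traffic. A simplified, generic sketch (thresholds and cooldown length are illustrative):

```python
class DynamicScaler:
    """Simplified dynamic scaling with a cooldown between scale-out actions."""

    def __init__(self, count=2, threshold=75.0, cooldown_seconds=300):
        self.count = count
        self.threshold = threshold
        self.cooldown_seconds = cooldown_seconds
        self.last_scale_time = float("-inf")  # no scaling action yet

    def observe(self, cpu_percent, now):
        """Check one metric reading at time `now` (seconds) against the policy."""
        in_cooldown = (now - self.last_scale_time) < self.cooldown_seconds
        if cpu_percent > self.threshold and not in_cooldown:
            self.count += 1
            self.last_scale_time = now
            print(f"t={now}s CPU {cpu_percent}%: scaled out to {self.count} instances")
        else:
            print(f"t={now}s CPU {cpu_percent}%: no action (cooldown={in_cooldown})")

scaler = DynamicScaler()
scaler.observe(90.0, now=0)    # threshold breached: scale out
scaler.observe(92.0, now=60)   # still spiking, but inside the cooldown window
scaler.observe(88.0, now=400)  # cooldown expired: scale out again
```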

Scheduled scaling

Scheduled auto scaling policies allocate resources according to a predetermined schedule. For example, if an organization knows that traffic or resource demand is much higher in the evenings than in the morning, an auto scaling policy can be set to accommodate that demand.
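
A scheduled policy amounts to a lookup from time of day (or calendar date) to desired capacity. A minimal, hypothetical sketch:

```python
from datetime import datetime

# Hypothetical schedule: desired instance counts by hour of day (24-hour clock).
schedule = [
    (range(0, 7), 2),    # overnight: minimal capacity
    (range(7, 18), 4),   # business hours: moderate capacity
    (range(18, 24), 8),  # evening peak: maximum capacity
]

def desired_capacity(at_time):
    """Return the instance count the schedule prescribes for a given time."""
    for hours, count in schedule:
        if at_time.hour in hours:
            return count
    return 2  # fallback

print(desired_capacity(datetime(2023, 12, 12, 9)))   # morning -> 4
print(desired_capacity(datetime(2023, 12, 12, 21)))  # evening -> 8
```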

Benefits of auto scaling

When implemented effectively, auto scaling can play a significant role in optimizing an organization’s cloud computing environment and reducing overall cloud costs.

By establishing robust auto scaling policies, organizations can reduce their dependence on manual provisioning and ensure more consistent system performance.

Minimize manual configuration of infrastructure

Auto scaling allows a cloud environment to react in real-time to resource demand without the need for human intervention. This is more efficient than manual scaling. It helps to reduce employee burnout, improve configuration and provisioning consistency, and free up employee time for more valuable tasks.

Increase scalability

Auto scaling allows organizations to expand their cloud computing environment and capabilities more seamlessly, without having to dedicate additional personnel to the monitoring and provisioning of resources.

Provide consistent performance

By ensuring that a cloud environment has the compute, network and storage resources it requires, regardless of activity or demand, auto scaling helps maintain the consistent and reliable performance of cloud services.

Improve user experience

More consistent web application and network performance means a more consistent level of service for users.

Reduce cloud computing costs

When relying on the manual provisioning of resources, organizations often overprovision as a precaution, just to make sure that resources are available for times of peak demand. By using a platform that can automatically scale compute, network and storage resources to meet demand in real-time, organizations can avoid overprovisioning. This approach makes sure that they use only what they need, resulting in a lower cloud bill and greater ROI.

Footnotes

1. “Cloud cost optimization,” Sarika Nandwani, Infosys.com, 2023.

2. “AWS EC2 instance types: Challenges and best practices for hosting your application in AWS,” Christopher Graham, 23 August 2023.