Horizontal pod auto scaling by using custom metrics
The Horizontal Pod Autoscaler (HPA) in IBM Cloud Private lets your system automatically scale workloads up or down based on resource usage. This automatic scaling helps to guarantee service level agreements (SLAs) for your workloads.
Important: This content is a technical preview, and should not be relied on in a production environment.
By default, the HPA policy automatically scales the number of pods based on the observed CPU utilization. However, in many situations, you might want to scale the application based on other monitored metrics, such as the number of incoming requests or the memory consumption. Starting with IBM Cloud Private Version 3.2.0, you can automate this scaling by using Prometheus and the Prometheus adapter.
Prometheus
Prometheus is widely used to monitor all the components of a Kubernetes cluster. These components include the control plane, the worker nodes, and the applications that are running on the cluster.
Prometheus adapter
The Prometheus adapter works through the Kubernetes aggregation layer, which allows extra Kubernetes-style APIs to be installed and custom API servers to be registered with the Kubernetes cluster. The adapter gathers the names of available metrics from Prometheus at regular intervals and then exposes those metrics to the HPA for autoscaling.
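The metrics that the adapter exposes can be read back through the custom metrics API, which is a quick way to see what the HPA sees. The following is a minimal sketch, not part of the product: pod_metric_path is a hypothetical helper name, and the default namespace in the usage comment is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: build the custom metrics API path that returns one
# metric for every pod in a namespace.
pod_metric_path() {
  namespace="$1"
  metric="$2"
  printf '/apis/custom.metrics.k8s.io/v1beta1/namespaces/%s/pods/*/%s\n' \
    "$namespace" "$metric"
}

# Example use against a live cluster (needs kubectl access and jq):
#   kubectl get --raw "$(pod_metric_path default memory_usage_bytes)" | jq .
```

Keeping the path in one helper makes it easy to query a different metric or namespace without retyping the API prefix.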
Preparing for the installation
By default, in IBM Cloud Private, HPA is enabled to autoscale based on CPU utilization. To enable autoscaling based on custom metrics, you must remove the custom-metrics-adapter option from the disabled_management_services parameter in the /<installation_directory>/cluster/config.yaml file.
Your configuration file might resemble the following code, in which custom-metrics-adapter is not listed among the disabled services:
## Management Services Settings
## You can disable following services: custom-metrics-adapter, istio, metering, monitoring, service-catalog, storage-glusterfs, vulnerability-advisor
management_services:
  istio: disabled
  vulnerability-advisor: disabled
  storage-glusterfs: disabled
  storage-minio: disabled
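Before you run the installer, it can help to confirm that the adapter is no longer marked as disabled. The following sketch is an assumption-laden convenience check, not an IBM-provided tool: adapter_still_disabled is a hypothetical function, and it simply looks for the service name on any non-comment line that does not say enabled.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if custom-metrics-adapter still appears as
# disabled in the given config.yaml, exit non-zero otherwise.
adapter_still_disabled() {
  config_file="$1"
  # Drop comment lines, keep lines that name the service, and succeed only
  # if at least one of those lines does not mark it as enabled.
  grep -v '^[[:space:]]*#' "$config_file" \
    | grep 'custom-metrics-adapter' \
    | grep -qv 'enabled'
}

# Example use (the path placeholder is the one used in this document):
#   adapter_still_disabled /<installation_directory>/cluster/config.yaml \
#     && echo "custom-metrics-adapter is still disabled"
```

The comment filter matters because the stock configuration file mentions custom-metrics-adapter in a comment that lists the services you can disable.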
Verifying the installation
After installation completes, verify that the custom-metrics-adapter is enabled.
- Ensure that the autoscaling/v2beta1 API group displays.
  kubectl api-versions | grep "autoscaling/v2beta1"
  The output resembles the following code:
  autoscaling/v2beta1
- Ensure that the corresponding custom-metrics-adapter pod is deployed and is in a running state.
  kubectl get po -n kube-system | grep custom-metrics-adapter
  The output resembles the following code:
  custom-metrics-adapter-76d7bb8dcd-2pj4k 1/1 Running 0 18m
- List the default custom metrics that the Prometheus adapter provides for pods.
  kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq . | grep "pods/"
  The output resembles the following code:
  "name": "pods/kube_pod_container_status_waiting_reason",
  "name": "pods/fs_read",
  "name": "pods/memory_failures",
  "name": "pods/kube_pod_status_phase",
  "name": "pods/kube_pod_container_resource_limits_memory_bytes",
  "name": "pods/cpu_user",
  "name": "pods/fs_usage_bytes",
  "name": "pods/tasks_state",
  "name": "pods/kube_pod_container_info",
  "name": "pods/cpu_cfs_throttled",
  "name": "pods/fs_sector_writes",
  "name": "pods/kube_pod_created",
  "name": "pods/network_tcp_usage",
  "name": "pods/spec_memory_limit_bytes",
  "name": "pods/network_udp_usage",
  "name": "pods/memory_max_usage_bytes",
  "name": "pods/spec_cpu_quota",
  "name": "pods/kube_pod_container_status_terminated_reason",
  "name": "pods/cpu_system",
  "name": "pods/kube_pod_container_status_running",
  "name": "pods/kube_pod_status_ready",
  "name": "pods/fs_io_time_weighted",
  "name": "pods/fs_reads_bytes",
  "name": "pods/kube_pod_info",
  "name": "pods/fs_reads_merged",
  "name": "pods/kube_pod_container_resource_requests_cpu_cores",
  "name": "pods/fs_io_time",
  "name": "pods/kube_pod_container_resource_limits_cpu_cores",
  "name": "pods/fs_inodes",
  "name": "pods/start_time_seconds",
  "name": "pods/kube_pod_container_status_terminated",
  "name": "pods/kube_pod_container_status_waiting",
  "name": "pods/cpu_usage",
  "name": "pods/spec_cpu_shares",
  "name": "pods/spec_memory_reservation_limit_bytes",
  "name": "pods/kube_pod_container_status_ready",
  "name": "pods/fs_writes_merged",
  "name": "pods/fs_inodes_free",
  "name": "pods/cpu_cfs_throttled_periods",
  "name": "pods/kube_pod_labels",
  "name": "pods/cpu_load_average_10s",
  "name": "pods/fs_io_current",
  "name": "pods/memory_working_set_bytes",
  "name": "pods/spec_memory_swap_limit_bytes",
  "name": "pods/fs_reads",
  "name": "pods/kube_pod_container_resource_requests_memory_bytes",
  "name": "pods/memory_rss",
  "name": "pods/cpu_cfs_periods",
  "name": "pods/fs_writes_bytes",
  "name": "pods/fs_writes",
  "name": "pods/last_seen",
  "name": "pods/spec_cpu_period",
  "name": "pods/kube_pod_start_time",
  "name": "pods/fs_write",
  "name": "pods/memory_failcnt",
  "name": "pods/kube_pod_container_status_restarts",
  "name": "pods/fs_sector_reads",
  "name": "pods/kube_pod_status_scheduled",
  "name": "pods/memory_cache",
  "name": "pods/memory_usage_bytes",
  "name": "pods/memory_swap",
  "name": "pods/fs_limit_bytes",
  "name": "pods/kube_pod_owner",
Example: Deploying an application with an HPA policy
This example shows you how to autoscale an nginx web application based on memory usage by using an HPA policy. When the average memory_usage_bytes across the nginx pods is greater than 10 MiB (10485760 bytes), the policy scales up the nginx web application. Scaling up an application increases the number of pods available for a deployment. If the average memory_usage_bytes is less than 10 MiB, the application scales down, but never below the minimum number of replicas that are specified in the HPA policy.
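The scaling decision described here can be approximated with the replica formula that the Kubernetes HPA documentation gives: the desired count is the current count multiplied by the ratio of the observed average to the target, rounded up, then clamped to the minimum and maximum. The sketch below is a simplification under stated assumptions: desired_replicas is a hypothetical helper, and it ignores the controller's tolerance band and its handling of pods that are not ready.

```shell
#!/bin/sh
# Hypothetical helper.
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_AVG TARGET_AVG MIN MAX
desired_replicas() {
  current_replicas="$1"
  current_avg="$2"
  target_avg="$3"
  min_replicas="$4"
  max_replicas="$5"
  # Integer ceiling of current_replicas * current_avg / target_avg.
  desired=$(( (current_replicas * current_avg + target_avg - 1) / target_avg ))
  # Clamp the result to the configured bounds.
  if [ "$desired" -lt "$min_replicas" ]; then desired="$min_replicas"; fi
  if [ "$desired" -gt "$max_replicas" ]; then desired="$max_replicas"; fi
  echo "$desired"
}

# Two pods averaging 20 MiB against a 10 MiB target, bounds 2..10:
desired_replicas 2 20971520 10485760 2 10   # prints 4
```

With the same bounds, two pods averaging 5 MiB yield a raw result of 1, which the lower bound raises back to 2. This is why the application never scales below the minimum.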
- Create the podinfo-svc.yaml file by using the following code:
  ---
  apiVersion: v1
  kind: Service
  metadata:
    name: podinfo
    labels:
      app: podinfo
    annotations:
      prometheus.io/scrape: "true"
  spec:
    type: NodePort
    ports:
      - port: 80
        targetPort: 80
        nodePort: 31198
        protocol: TCP
    selector:
      app: podinfo
- Create a podinfo service by running the following command:
  kubectl create -f podinfo-svc.yaml
  The response resembles the following example:
  service "podinfo" created
- Create the podinfo-dep.yaml file by using the following code:
  ---
  apiVersion: extensions/v1beta1
  kind: Deployment
  metadata:
    name: podinfo
  spec:
    replicas: 2
    template:
      metadata:
        labels:
          app: podinfo
        annotations:
          prometheus.io/scrape: 'true'
      spec:
        containers:
          - name: podinfod
            image: nginx:latest
            imagePullPolicy: Always
            ports:
              - containerPort: 80
                protocol: TCP
            resources:
              requests:
                memory: "32Mi"
                cpu: "1m"
              limits:
                memory: "256Mi"
                cpu: "100m"
- Create a podinfo deployment by running the following command:
  kubectl create -f podinfo-dep.yaml
  The response resembles the following example:
  deployment "podinfo" created
- Create the podinfo-hpa-custom.yaml file by using the following code:
  ---
  apiVersion: autoscaling/v2beta1
  kind: HorizontalPodAutoscaler
  metadata:
    name: podinfo
  spec:
    scaleTargetRef:
      apiVersion: extensions/v1beta1
      kind: Deployment
      name: podinfo
    minReplicas: 2
    maxReplicas: 10
    metrics:
      - type: Pods
        pods:
          metricName: memory_usage_bytes
          targetAverageValue: 10485760
- Create a podinfo HPA policy based on pod memory usage (memory_usage_bytes, with a target average of 10485760 bytes, which is 10 MiB) by running the following command:
  kubectl create -f podinfo-hpa-custom.yaml
  The response resembles the following example:
  horizontalpodautoscaler.autoscaling "podinfo" created
- Simulate the load by using the Apache ab application. The load that it generates triggers autoscaling of the workload.
  for a in $(seq 1 50); do ab -rSqd -c 200 -n 20000 <node_ip>:31198/; done
  <node_ip> is the IP address of a node in your IBM Cloud Private cluster.
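While the load runs, you might want to confirm that the deployment actually grows beyond its two initial replicas. The following is a rough polling sketch, not a documented procedure: current_replicas and wait_for_scale_up are hypothetical helpers, and the retry count and delay in the usage comment are arbitrary assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print the replica count that a deployment reports
# (needs kubectl access to the cluster).
current_replicas() {
  kubectl get deployment "$1" -o jsonpath='{.status.replicas}'
}

# Hypothetical helper: poll until the deployment reports at least WANT
# replicas, or give up after TRIES attempts spaced DELAY seconds apart.
# Usage: wait_for_scale_up NAME WANT TRIES DELAY
wait_for_scale_up() {
  name="$1"
  want="$2"
  tries="$3"
  delay="$4"
  while [ "$tries" -gt 0 ]; do
    have=$(current_replicas "$name")
    echo "replicas: ${have:-0}"
    if [ "${have:-0}" -ge "$want" ]; then return 0; fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}

# Example use in a second terminal while ab is running:
#   wait_for_scale_up podinfo 3 60 10
```

The function returns a non-zero status if the replica count never reaches the requested value, so it can also gate a script that depends on the scale-up having happened.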