Adjusting the CPU and memory reservations for integration server pods with the Vertical Pod Autoscaler

The Vertical Pod Autoscaler (VPA) is a Kubernetes resource that automates the setting of resource limits and requests for the pods in a cluster. You can install the VPA in your cluster and then use it to configure autoscaling for your running integration server pods, so that they are not over-resourced and can be scaled up when demand increases.

You configure autoscaling for an integration server by deploying a VPA object that defines minimum and maximum CPU and memory limits for pod containers, with an update policy to automatically modify resource requests. The VPA will then monitor CPU and memory usage over an extended period of time, generate recommendations to apply to the integration server pods, and dynamically scale up or scale down as required.


Before you begin

Cluster metrics must have been configured by your cluster administrator.

Ensure that you have cluster administrator permissions and are logged in to your cluster.

Installing the VPA

The VPA comprises three components (admission-controller, recommender, and updater), which are deployed when you install it.

Installing on Red Hat OpenShift

For Red Hat® OpenShift® clusters, see the Red Hat OpenShift documentation for installation details. The VPA is typically installed into the openshift-vertical-pod-autoscaler namespace.

After you install the VPA, ensure that the required permissions are assigned to the VPA deployment as described in Configuring role-based access control (RBAC).

Configuring role-based access control (RBAC)

Specify which autoscaling actions can be performed on deployed integration servers by assigning RBAC permissions to VPA cluster roles. You can do so by creating ClusterRole and ClusterRoleBinding resources that are linked to the vpa-recommender service account that is installed with the VPA.

To configure RBAC, complete the following steps:

  1. If you do not know the namespace of the vpa-recommender service account, identify it by running the following command. (You will need to specify this namespace in a later step.)
    oc get sa --all-namespaces | grep vpa-recommender

    You should see output similar to this, which identifies the namespace (openshift-vertical-pod-autoscaler) in the first column:

    openshift-vertical-pod-autoscaler               vpa-recommender                    2         65m
  2. Create the ClusterRole object:
    1. From your local computer, create a YAML file (for example, cr_filename.yaml) by copying the following custom resource:
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: ibm-appconnect-vpa-recommender
      rules:
        - apiGroups:
            - appconnect.ibm.com
          resources:
            - integrationservers
            - integrationservers/scale
          verbs:
            - get
            - list
            - watch
    2. From the command line, run the following command to deploy the cluster role:
      oc apply -f cr_filename.yaml
  3. Create the ClusterRoleBinding object:
    1. From your local computer, create a YAML file (for example, crb_filename.yaml) by copying the following custom resource, where vpaRecommenderNamespace is the namespace of the vpa-recommender service account:
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: ibm-appconnect-vpa-recommender
      subjects:
      - kind: ServiceAccount
        name: vpa-recommender
        namespace: vpaRecommenderNamespace
      roleRef:
        kind: ClusterRole
        name: ibm-appconnect-vpa-recommender
        apiGroup: rbac.authorization.k8s.io
    2. From the command line, run the following command to deploy the cluster role binding:
      oc apply -f crb_filename.yaml
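
After both objects are applied, you might want to confirm that the binding took effect. The following is a minimal sketch, not part of the documented procedure. It assumes that the service account is in the openshift-vertical-pod-autoscaler namespace and that your own user is allowed to impersonate service accounts. The VPA_NS variable and the vpa_recommender_subject helper are hypothetical names used only for illustration.

```shell
# Hypothetical helper: build the impersonation subject for the
# vpa-recommender service account in the given namespace.
vpa_recommender_subject() {
  echo "system:serviceaccount:$1:vpa-recommender"
}

# Assumption: replace with the namespace that you identified in step 1.
VPA_NS="${VPA_NS:-openshift-vertical-pod-autoscaler}"

# Run the checks only where the oc CLI is available.
if command -v oc >/dev/null 2>&1; then
  # Check each verb that the ClusterRole grants on integration servers.
  for verb in get list watch; do
    oc auth can-i "$verb" integrationservers.appconnect.ibm.com \
      --as="$(vpa_recommender_subject "$VPA_NS")" --all-namespaces
  done
fi
```

Each check prints yes or no. A no answer suggests that the ClusterRoleBinding references a different namespace or service account name.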

Next, configure autoscaling for your running integration servers.

Configuring autoscaling for your running integration servers

After you've installed the VPA and set ClusterRole permissions, you can configure the VPA to target your running integration servers. To configure autoscaling for an integration server, you must deploy a VerticalPodAutoscaler resource, which identifies the containers, the minimum and maximum limits, and the update policy for autoscaling.

Complete the following steps for any running integration server:

  1. From your local computer, create a file with a .yaml extension (for example, toolkit_vpa.yaml) by copying the following VerticalPodAutoscaler custom resource, where:
    • metadata.name is a unique name for this instance of the VerticalPodAutoscaler custom resource. (In this example, the value is shown as toolkit-vpa.)
    • spec.targetRef.name is set to the name of your integration server. (In this example, the value is shown as is-toolkit.)
    • spec.resourcePolicy.containerPolicies.minAllowed and spec.resourcePolicy.containerPolicies.maxAllowed are set to the lower and upper resource limits for containers that are defined for the integration server.
    apiVersion: "autoscaling.k8s.io/v1"
    kind: VerticalPodAutoscaler
    metadata:
      name: toolkit-vpa
    spec:
      targetRef:
        apiVersion: appconnect.ibm.com/v1beta1
        kind: IntegrationServer
        name: is-toolkit
      resourcePolicy:
        containerPolicies:
          - containerName: '*'
            minAllowed:
              cpu: 100m
              memory: 50Mi
            maxAllowed:
              cpu: 2
              memory: 500Mi
            controlledResources: ["cpu", "memory"]
    Note: An update policy is not explicitly set in the supplied YAML because by default, the VPA will automatically apply resource recommendations by using this spec.updatePolicy.updateMode setting:
    spec:
    ...
      updatePolicy:
        updateMode: "Auto"
  2. From the command line, run the following command to deploy the VerticalPodAutoscaler object:
    oc apply -f toolkit_vpa.yaml

    The VPA will begin to query for metrics on the pods and after approximately 30 seconds, should begin to generate recommendations based on current and previous usage. (Under certain circumstances, this process might take longer.)

  3. To verify that vertical pod autoscaling is operational, run this command to view detailed information about the VerticalPodAutoscaler object, where vpaObjectName is the metadata.name value of the object:
    oc describe vpa vpaObjectName

    The following example shows the output of the oc describe command for a sample VerticalPodAutoscaler object called toolkit-vpa. The Status section of the output indicates when the recommendations were last provided, and shows the generated recommendations for containers belonging to an integration server called is-toolkit, after querying the resource metrics.

    oc describe vpa toolkit-vpa 
    Name:         toolkit-vpa
    Namespace:    ace-cam
    API Version:  autoscaling.k8s.io/v1
    Kind:         VerticalPodAutoscaler
    Metadata:
    ...
    Spec:
      Resource Policy:
        Container Policies:
          Container Name:  *
          Controlled Resources:
            cpu
            memory
          Max Allowed:
            Cpu:     2
            Memory:  500Mi
          Min Allowed:
            Cpu:     100m
            Memory:  50Mi
      Target Ref:
        API Version:  appconnect.ibm.com/v1beta1
        Kind:         IntegrationServer
        Name:         is-toolkit
      Update Policy:
        Update Mode:  Auto
    Status:
      Conditions:
        Last Transition Time:  2020-11-03T17:46:15Z
        Status:                True
        Type:                  RecommendationProvided
      Recommendation:
        Container Recommendations:
          Container Name:  is-toolkit
          Lower Bound:
            Cpu:     100m
            Memory:  262144k
          Target:
            Cpu:     182m
            Memory:  272061154
          Uncapped Target:
            Cpu:     182m
            Memory:  272061154
          Upper Bound:
            Cpu:     2
            Memory:  500Mi
    Events:          <none>

    A pod whose resource requests are lower than the Lower Bound or higher than the Upper Bound recommendation is recreated with the Target (optimal) recommendation. The Uncapped Target values show the most recent recommendation based on actual usage alone, before the minAllowed and maxAllowed limits are applied.

    For more information about the Vertical Pod Autoscaler, see the README.md file in the GitHub repository and Automatically adjust pod resource levels with the vertical pod autoscaler (on Red Hat OpenShift).
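
The memory values in the sample output above use mixed units, which can make them hard to compare with the minAllowed and maxAllowed settings. In Kubernetes quantity notation, a plain number is a byte count, the lowercase k suffix is decimal (x 1000), and Mi is binary (x 1024 x 1024). The following sketch converts the sample values to whole mebibytes; the function names are hypothetical and the results are rounded down.

```shell
# Convert a raw byte count (as shown for Target memory) to whole Mi.
bytes_to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

# Convert a "k" quantity (decimal thousands of bytes) to whole Mi.
# Pass the number without the k suffix.
k_to_mib() {
  echo $(( $1 * 1000 / 1024 / 1024 ))
}

bytes_to_mib 272061154   # Target memory from the sample output: prints 259
k_to_mib 262144          # Lower Bound memory (262144k): prints 250
```

Read this way, the sample Target of about 259Mi sits between the 250Mi Lower Bound and the 500Mi Upper Bound.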

Example: Testing autoscaling

If you have integration servers that are running close to their resource limits, the VPA should act promptly. However, if all your integration servers are well within the limits, you can test autoscaling by applying a sample load. This example illustrates autoscaling for a sample integration server that is running close to the CPU limit.

  1. Download the attached perf-rating.bar.zip file.
  2. Extract the contents of the ZIP file to a directory on your local computer. This ZIP file contains a PerfRating.bar file for a Toolkit integration.
  3. From the App Connect Dashboard, deploy the PerfRating.bar file to an integration server.
    [Figure: Deployed integration server]
  4. Optional: Use Grafana to monitor the CPU usage for the pod, or monitor CPU usage from the Red Hat OpenShift web console (Monitoring > Dashboards); for example:
    [Figure: Red Hat web console Dashboard for monitoring CPU usage of the pod]
  5. Run the following command to obtain the HTTP route for the integration server:
    oc get routes

    In the output, take note of the entry that shows the integration server name followed by -http; for example, vpa-is-toolkit-http. You will need to specify its HOST/PORT value in the next step.

    [Figure: Sample output of the oc get routes command]
  6. To invoke the flow, run the following command, where BASEURL is the HOST/PORT value of the integrationServerName-http entry:
    curl --request GET --url 'BASEURL:80/perf/rating?SequenceNumber=50&NumberOfIterations=2&NumberOfThreads=1' --header 'accept: application/json'

    On Windows, you can run the command as follows:

    curl --request GET --url "BASEURL:80/perf/rating?SequenceNumber=50&NumberOfIterations=2&NumberOfThreads=1" --header "accept: application/json"
  7. When the pod starts requesting resources beyond the lower limit, the VPA revises its recommendations and then restarts the pod to apply the updated values.
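
A single request might not keep CPU usage high for long enough to change the recommendations, so you might prefer to invoke the flow repeatedly. The following is a minimal sketch under stated assumptions. BASEURL is the HOST/PORT value from step 5. The REQUESTS variable, its default of 20, and the rating_url helper are hypothetical and are not part of the sample.

```shell
# Hypothetical helper: build the flow URL from the route host.
rating_url() {
  echo "$1:80/perf/rating?SequenceNumber=50&NumberOfIterations=2&NumberOfThreads=1"
}

# Assumption: number of requests to send; tune it for your environment.
REQUESTS="${REQUESTS:-20}"

# Send the requests only if BASEURL has been set to the route host.
if [ -n "${BASEURL:-}" ]; then
  i=1
  while [ "$i" -le "$REQUESTS" ]; do
    # Discard the response body; only the generated load matters here.
    curl --silent --output /dev/null --max-time 30 --request GET \
      --url "$(rating_url "$BASEURL")" --header 'accept: application/json'
    i=$((i + 1))
  done
else
  echo "Set BASEURL to the HOST/PORT value of the -http route first."
fi
```

While the loop runs, you can rerun oc describe vpa to see whether the Target values change.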