Kubernetes aims to provide both resilience and scalability. It achieves this by deploying multiple pods with different resource allocations, to provide redundancy for your applications. Although you can grow and shrink your own deployments manually based on your needs, Kubernetes provides first-class support for scaling on demand, using a feature called Horizontal Pod Autoscaling. It is a closed-loop system that automatically grows or shrinks resources (application Pods) based on your current needs. You create a HorizontalPodAutoscaler (or HPA) resource for each application deployment that needs autoscaling, and let it take care of the rest for you automatically.
At a high level, HPA does the following:

- Watches the resource metrics of your application workloads (Pods), by querying the metrics server.
- Compares the observed average utilization (for example, CPU or memory) against the target threshold you set in the HPA definition.
- Scales your deployment up when the threshold is reached, and back down when utilization falls below it.
Under the hood, a HorizontalPodAutoscaler is a built-in Kubernetes API resource that drives a control loop implemented via a dedicated controller within the Control Plane of your cluster. You create a HorizontalPodAutoscaler YAML manifest targeting your application Deployment, and then use kubectl to apply the HPA resource in your cluster.
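For example, if you saved such a manifest as my-app-hpa.yaml (a hypothetical filename used here for illustration), applying it would look like this:

- kubectl apply -f my-app-hpa.yaml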
In order to work, HPA needs a metrics server available in your cluster to scrape required metrics, such as CPU and memory utilization. One straightforward option is the Kubernetes Metrics Server. The Metrics Server works by collecting resource metrics from Kubelets and exposing them via the Kubernetes API Server to the Horizontal Pod Autoscaler. The Metrics API can also be accessed via kubectl top if needed.
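For example, once a metrics server is installed and serving data, you can check current node and Pod resource usage directly (output will vary by cluster):

- kubectl top nodes
-
- kubectl top pods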
In this tutorial, you will:

- Deploy Metrics Server to your Kubernetes cluster via Helm.
- Learn how to define a HorizontalPodAutoscaler resource for your application deployments.
- Test HPA in two different scenarios: one generating constant application load, and one simulating external load.
To follow this tutorial, you will need:
A Kubernetes cluster with role-based access control (RBAC) enabled. This setup will use a DigitalOcean Kubernetes cluster, but you could also create a cluster manually. Your Kubernetes version should be between 1.20 and 1.25.
The kubectl command-line tool installed in your local environment and configured to connect to your cluster. You can read more about installing kubectl in the official documentation. If you are using a DigitalOcean Kubernetes cluster, please refer to How to Connect to a DigitalOcean Kubernetes Cluster to learn how to connect to your cluster using kubectl.
The version control tool Git available in your development environment. If you are working in Ubuntu, you can refer to installing Git on Ubuntu 22.04.
The Kubernetes Helm package manager also available in your development environment. You can refer to how to install software with Helm to install Helm locally.
You’ll start by adding the metrics-server repository to your helm package lists, using helm repo add:
- helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server
Next, use helm repo update to refresh the available packages:
- helm repo update metrics-server
Output
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "metrics-server" chart repository
Update Complete. ⎈Happy Helming!⎈
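You can confirm that the newly added chart is available by searching your local repository list:

- helm search repo metrics-server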
Now that you’ve added the repository to helm, you’ll be able to add metrics-server to your Kubernetes deployments. You could write your own deployment configuration here, but this tutorial will follow DigitalOcean’s Kubernetes Starter Kit, which includes a configuration for metrics-server.
To do that, clone the Kubernetes Starter Kit Git repository:
- git clone https://github.com/digitalocean/Kubernetes-Starter-Kit-Developers.git
The metrics-server configuration is located in Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/metrics-server-values-v3.8.2.yaml. You can view or edit it by using nano or your favorite text editor:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/metrics-server-values-v3.8.2.yaml
It contains a few stock parameters. Note that replicas is a fixed value, 2.
## Starter Kit metrics-server configuration
## Ref: https://github.com/kubernetes-sigs/metrics-server/blob/metrics-server-helm-chart-3.8.2/charts/metrics-server
##

# Number of metrics-server replicas to run
replicas: 2

apiService:
  # Specifies if the v1beta1.metrics.k8s.io API service should be created.
  #
  # You typically want this enabled! If you disable API service creation you have to
  # manage it outside of this chart for e.g horizontal pod autoscaling to
  # work with this release.
  create: true

hostNetwork:
  # Specifies if metrics-server should be started in hostNetwork mode.
  #
  # You would require this enabled if you use alternate overlay networking for pods and
  # API server unable to communicate with metrics-server. As an example, this is required
  # if you use Weave network on EKS
  enabled: false
Refer to the Metrics Server chart page for an explanation of the available parameters for metrics-server.
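As an aside, if you only need to change one or two of these parameters, you could also override them at install time using Helm's --set flag rather than editing the values file – for example, to run a single replica (an illustrative alternative, not used in this tutorial):

- helm install metrics-server metrics-server/metrics-server --set replicas=1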
Note: You need to be fairly careful when matching Kubernetes deployments to your running version of Kubernetes, and the helm charts themselves are also versioned to enforce this. The current upstream helm chart for metrics-server is 3.8.2, which deploys version 0.6.1 of metrics-server itself. From the Metrics Server Compatibility Matrix, you can see that version 0.6.x supports Kubernetes 1.19+.
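To see which chart versions are available from the repository you added earlier, you can ask helm to list them:

- helm search repo metrics-server --versions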
After you’ve reviewed the file and made any changes, you can proceed with deploying metrics-server, by providing this file along with the helm install command:
- HELM_CHART_VERSION="3.8.2"
-
- helm install metrics-server metrics-server/metrics-server --version "$HELM_CHART_VERSION" \
- --namespace metrics-server \
- --create-namespace \
- -f "Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/metrics-server-values-v${HELM_CHART_VERSION}.yaml"
This will deploy metrics-server to your configured Kubernetes cluster:
Output
NAME: metrics-server
LAST DEPLOYED: Wed May 25 11:54:43 2022
NAMESPACE: metrics-server
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
***********************************************************************
* Metrics Server *
***********************************************************************
Chart version: 3.8.2
App version: 0.6.1
Image tag: k8s.gcr.io/metrics-server/metrics-server:v0.6.1
***********************************************************************
After deploying, you can use helm ls to verify that metrics-server has been added to your deployment:
- helm ls -n metrics-server
Output
NAME            NAMESPACE       REVISION  UPDATED                               STATUS    CHART                 APP VERSION
metrics-server  metrics-server  1         2022-02-24 14:58:23.785875 +0200 EET  deployed  metrics-server-3.8.2  0.6.1
Next, you can check the status of all of the Kubernetes resources deployed to the metrics-server namespace:
- kubectl get all -n metrics-server
Based on the configuration you deployed with, both the deployment.apps and replicaset.apps values should show 2 available instances.
Output
NAME                                  READY   STATUS    RESTARTS   AGE
pod/metrics-server-694d47d564-9sp5h   1/1     Running   0          8m54s
pod/metrics-server-694d47d564-cc4m2   1/1     Running   0          8m54s

NAME                     TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/metrics-server   ClusterIP   10.245.92.63   <none>        443/TCP   8m54s

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/metrics-server   2/2     2            2           8m55s

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/metrics-server-694d47d564   2         2         2       8m55s
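You can also verify that the v1beta1.metrics.k8s.io API service described in the chart values was registered and is reporting as available:

- kubectl get apiservice v1beta1.metrics.k8s.io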
You have now deployed metrics-server into your Kubernetes cluster. In the next step, you’ll review some of the parameters of a HorizontalPodAutoscaler resource definition.
So far, your configurations have used a fixed value for the number of ReplicaSet instances to deploy. In this step, you will learn how to define a HorizontalPodAutoscaler manifest so that this value can dynamically grow or shrink.

A typical HorizontalPodAutoscaler manifest looks like this:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-deployment
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
The parameters used in this configuration are as follows:

- spec.scaleTargetRef: A named reference to the resource being scaled.
- spec.minReplicas: The lower limit for the number of replicas to which the autoscaler can scale down.
- spec.maxReplicas: The upper limit.
- spec.metrics.type: The metric to use to calculate the desired replica count. This example uses the Resource type, which tells the HPA to scale the deployment based on average CPU (or memory) utilization. averageUtilization is set to a threshold value of 50.
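Behind the scenes, the autoscaler uses the standard algorithm from the Kubernetes documentation to compute the desired replica count, clamping the result between minReplicas and maxReplicas. As a worked example with hypothetical numbers, if one replica is averaging 100% CPU utilization against the 50% target above:

desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)
                = ceil(1 * 100 / 50)
                = 2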
You have two options to create an HPA for your application deployment:

1. Use the kubectl autoscale command on an existing deployment.
2. Create an HPA YAML manifest, and then use kubectl to apply changes to your cluster.

You’ll try option #1 first, using another configuration from the DigitalOcean Kubernetes Starter Kit. It contains a deployment defined in myapp-test.yaml, which will demonstrate HPA in action by creating some arbitrary CPU load.
You can review that file by using nano or your favorite text editor:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-test
spec:
  selector:
    matchLabels:
      run: myapp-test
  replicas: 1
  template:
    metadata:
      labels:
        run: myapp-test
    spec:
      containers:
        - name: busybox
          image: busybox
          resources:
            limits:
              cpu: 50m
            requests:
              cpu: 20m
          command: ["sh", "-c"]
          args:
            - while [ 1 ]; do
                echo "Test";
                sleep 0.01;
              done
Note the last few lines of this file. They contain some shell syntax to repeatedly print “Test” a hundred times a second, to simulate load. Once you are done reviewing the file, you can deploy it into your cluster using kubectl:
- kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/myapp-test.yaml
Next, use kubectl autoscale to create a HorizontalPodAutoscaler targeting the myapp-test deployment:
- kubectl autoscale deployment myapp-test --cpu-percent=50 --min=1 --max=3
Note the arguments passed to this command – this means that your deployment will be scaled between 1 and 3 replicas whenever CPU utilization reaches 50 percent.
You can check if the HPA resource was created by running kubectl get hpa:
- kubectl get hpa
The TARGETS column of the output will eventually show a figure of current usage%/target usage%.
Output
NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
myapp-test   Deployment/myapp-test   240%/50%   1         3         3          52s
Note: The TARGETS column value will display <unknown>/50% for a while (around 15 seconds). This is normal, because HPA needs to collect average values over time, and it won’t have enough data before the first 15-second interval. By default, HPA checks metrics every 15 seconds.
You can also observe the logged events that an HPA generates by using kubectl describe:
- kubectl describe hpa myapp-test
Output
Name:                                                  myapp-test
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Mon, 28 May 2022 10:10:50 -0800
Reference:                                             Deployment/myapp-test
Metrics:                                               ( current / target )
  resource cpu on pods (as a percentage of request):  240% (48m) / 50%
Min replicas:                                          1
Max replicas:                                          3
Deployment pods:                                       3 current / 3 desired
...
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  17s   horizontal-pod-autoscaler  New size: 2; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  37s   horizontal-pod-autoscaler  New size: 3; reason: cpu resource utilization (percentage of request) above target
This is the kubectl autoscale method. In a production scenario, you should usually use a dedicated YAML manifest to define each HPA instead. This way, you can track changes by having the manifest committed to a Git repository, and modify it as needed.
You will walk through an example of this in the last step of this tutorial. Before moving on, delete the myapp-test deployment and corresponding HPA resource:
- kubectl delete hpa myapp-test
- kubectl delete deployment myapp-test
In this last step, you’ll experiment with two different ways of generating server load and scaling via a YAML manifest:

- An application deployment that creates constant load, by performing some CPU-intensive computations.
- A shell script that simulates external load, by making rapid, successive HTTP calls to a running application.
In this scenario, you will create a sample application implemented using Python, which performs some CPU-intensive computations. Similar to the shell script from the last step, this Python code is included in one of the example manifests from the starter kit. You can open constant-load-deployment-test.yaml using nano or your favorite text editor:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/constant-load-deployment-test.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: python-test-code-configmap
data:
  entrypoint.sh: |-
    #!/usr/bin/env python

    import math

    while True:
      x = 0.0001
      for i in range(1000000):
        x = x + math.sqrt(x)
      print(x)
      print("OK!")
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: constant-load-deployment-test
spec:
  selector:
    matchLabels:
      run: python-constant-load-test
  replicas: 1
  template:
    metadata:
      labels:
        run: python-constant-load-test
    spec:
      containers:
        - name: python-runtime
          image: python:alpine3.15
          resources:
            limits:
              cpu: 50m
            requests:
              cpu: 20m
          command:
            - /bin/entrypoint.sh
          volumeMounts:
            - name: python-test-code-volume
              mountPath: /bin/entrypoint.sh
              readOnly: true
              subPath: entrypoint.sh
      volumes:
        - name: python-test-code-volume
          configMap:
            defaultMode: 0700
            name: python-test-code-configmap
The Python code, which repeatedly generates arbitrary square roots, is highlighted above. The deployment will fetch a Docker image hosting the required Python runtime, and then attach a ConfigMap to the application Pod hosting the sample Python script shown earlier.

First, create a separate namespace for this deployment (for better observation), then deploy it via kubectl:
- kubectl create ns hpa-constant-load
-
- kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/constant-load-deployment-test.yaml -n hpa-constant-load
Output
configmap/python-test-code-configmap created
deployment.apps/constant-load-deployment-test created
Note: The sample deployment also configures resource requests and limits for the sample application Pods. This is important because HPA logic relies on your Pods having CPU resource requests set. In general, it is advisable to set resource requests and limits for all your application Pods, to avoid unpredictable bottlenecks.
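To make this concrete, utilization percentages are always calculated relative to a Pod's resource requests, not its limits. In the kubectl describe output from the previous step, the Pods requested 20m of CPU while actually consuming 48m, which the HPA reported as 48m / 20m = 240% utilization – well above the 50% target.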
Verify that the deployment was created successfully, and that it’s up and running:
- kubectl get deployments -n hpa-constant-load
Output
NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
constant-load-deployment-test   1/1     1            1           8s
Next, you’ll need to deploy another HPA to this cluster. There is an example matched to this scenario in constant-load-hpa-test.yaml, which you can open with nano or your favorite text editor:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/constant-load-hpa-test.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: constant-load-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: constant-load-deployment-test
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
Deploy it via kubectl:
- kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/constant-load-hpa-test.yaml -n hpa-constant-load
This will create an HPA resource targeting the sample Python deployment. You can check the constant-load-test HPA state via kubectl get hpa:
- kubectl get hpa constant-load-test -n hpa-constant-load
Note the REFERENCE column targeting constant-load-deployment-test, as well as the TARGETS column showing current CPU resource requests versus the threshold value, as in the last example.
Output
NAME                 REFERENCE                                   TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
constant-load-test   Deployment/constant-load-deployment-test   255%/50%   1         3         3          49s
You may also notice that the REPLICAS column value increased from 1 to 3 for the sample application deployment, as stated in the HPA spec. This happened very quickly because the application used in this example generates CPU load very quickly. As in the previous example, you can also inspect logged HPA events using kubectl describe hpa -n hpa-constant-load.
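For example:

- kubectl describe hpa constant-load-test -n hpa-constant-load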
A more interesting and realistic scenario is one in which external load is created. For this final example, you’re going to use a different namespace and set of manifests to avoid reusing any data from the previous test.
This example will use the quote of the moment sample server. Every time an HTTP request is made to this server, it sends a different quote as a response. You’ll create load on your cluster by sending HTTP requests every 1ms. This deployment is included in quote_deployment.yaml. Review this file using nano or your favorite text editor:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/quote_deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: quote
spec:
  replicas: 1
  selector:
    matchLabels:
      app: quote
  template:
    metadata:
      labels:
        app: quote
    spec:
      containers:
        - name: quote
          image: docker.io/datawire/quote:0.4.1
          ports:
            - name: http
              containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 50Mi
            limits:
              cpu: 200m
              memory: 100Mi
---
apiVersion: v1
kind: Service
metadata:
  name: quote
spec:
  ports:
    - name: http
      port: 80
      targetPort: 8080
  selector:
    app: quote
Note that the actual HTTP query script is not contained within the manifest this time – this manifest only provisions the application that the queries will run against, for now. When you are done reviewing the file, create the quote namespace and deployment using kubectl:
- kubectl create ns hpa-external-load
-
- kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/quote_deployment.yaml -n hpa-external-load
Verify that the quote application deployment and services are up and running:
- kubectl get all -n hpa-external-load
Output
NAME                        READY   STATUS    RESTARTS   AGE
pod/quote-dffd65947-s56c9   1/1     Running   0          3m5s

NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/quote   ClusterIP   10.245.170.194   <none>        80/TCP    3m5s

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/quote   1/1     1            1           3m5s

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/quote-6c8f564ff   1         1         1       3m5s
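Optionally, before generating any load, you can confirm that the quote service responds from inside the cluster using a throwaway Pod (a quick sanity check; the Pod is deleted when the command exits):

- kubectl run -it --rm quote-check --image=busybox -n hpa-external-load -- wget -q -O- http://quote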
Next, you’ll create the HPA for the quote deployment. This is configured in quote-deployment-hpa-test.yaml. Review the file in nano or your favorite text editor:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/quote-deployment-hpa-test.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: external-load-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: quote
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 20
Note that in this case, there’s a different threshold value set for the CPU utilization resource metric (20%). There is also a different scaling behavior: this configuration alters the scaleDown.stabilizationWindowSeconds behavior, setting it to a lower value of 60 seconds. This is not always needed in practice, but in this case you may want to speed things up to see more quickly how the autoscaler performs the scale-down action. By default, the HorizontalPodAutoscaler has a cooldown period of 5 minutes, which is sufficient in most cases and should avoid fluctuations when replicas are being scaled.
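For reference, the behavior field can shape scale-up in the same way. A minimal sketch (illustrative only, not part of the Starter Kit manifest) that limits the autoscaler to adding at most one Pod every 60 seconds would look like this:

behavior:
  scaleUp:
    policies:
      - type: Pods
        value: 1
        periodSeconds: 60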
When you’re ready, deploy it using kubectl:
- kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/quote-deployment-hpa-test.yaml -n hpa-external-load
Now, check if the HPA resource is in place and alive:
- kubectl get hpa external-load-test -n hpa-external-load
Output
NAME                 REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
external-load-test   Deployment/quote   1%/20%    1         3         1          108s
Finally, you will run the actual HTTP queries, using the shell script quote_service_load_test.sh. The reason that this shell script was not embedded into the manifest earlier is so that you can observe it running in your cluster while it logs directly to your terminal. Review the script using nano or your favorite text editor:
- nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/scripts/quote_service_load_test.sh
#!/usr/bin/env sh
echo
echo "[INFO] Starting load testing in 10s..."
sleep 10
echo "[INFO] Working (press Ctrl+C to stop)..."
kubectl run -i --tty load-generator \
--rm \
--image=busybox \
--restart=Never \
-n hpa-external-load \
-- /bin/sh -c "while sleep 0.001; do wget -q -O- http://quote; done" > /dev/null 2>&1
echo "[INFO] Load testing finished."
For this demonstration, open two separate terminal windows. In the first, run the quote_service_load_test.sh shell script:
- Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/scripts/quote_service_load_test.sh
Next, in the second window, run a kubectl watch command using the -w flag on the HPA resource:
- kubectl get hpa -n hpa-external-load -w
You should see the load tick upwards and scale automatically:
Output
NAME                 REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
external-load-test   Deployment/quote   1%/20%    1         3         1          2m49s
external-load-test   Deployment/quote   29%/20%   1         3         1          3m1s
external-load-test   Deployment/quote   67%/20%   1         3         2          3m16s
You can observe how the autoscaler kicks in when load increases, and increments the quote server deployment’s replica set to a higher value. As soon as the load generator script is stopped, there’s a cooldown period, and after 1 minute or so the replica count is lowered to the initial value of 1. You can press Ctrl+C to terminate the running script after navigating back to the first terminal window.
In this tutorial, you deployed and observed the behavior of Horizontal Pod Autoscaling (HPA) using Kubernetes Metrics Server under several different scenarios. HPA is an essential component of Kubernetes that helps your infrastructure handle more traffic on an as-needed basis.
Metrics Server has a significant limitation in that it cannot provide any metrics beyond CPU or memory usage. You can further review the Metrics Server documentation to understand how to work within its use cases. If you need to scale using any other metric (such as disk usage or network load), you can use Prometheus via a special adapter named prometheus-adapter.
The Horizontal Pod Autoscaler is a built-in Kubernetes feature that allows you to horizontally scale applications based on one or more monitored metrics. Horizontal scaling means increasing and decreasing the number of replicas. Vertical scaling means increasing and decreasing the compute resources of a single replica.