# The Horizontal Pod Autoscaler

- What is the Horizontal Pod Autoscaler, or HPA?

- It is a controller that can perform *horizontal* scaling automatically

- Horizontal scaling = changing the number of replicas

  (adding / removing pods)

- Vertical scaling = changing the size of individual replicas

  (increasing / reducing CPU and RAM per pod)

- Cluster scaling = changing the size of the cluster

  (adding / removing nodes)

---

## Principle of operation

- Each HPA resource (or "policy") specifies:

  - which object to monitor and scale (e.g. a Deployment, ReplicaSet...)

  - min/max scaling ranges (the max is a safety limit!)

  - a target resource usage (e.g. the default is CPU=80%)

- The HPA continuously monitors the CPU usage for the related object

- It computes how many pods should be running:

  `TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target)`

- It scales up/down the related object to this target number of pods

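
For instance, plugging made-up numbers into that formula: if 3 pods each use a full core (1000 millicores), each requests 1 core, and the target utilization is the default 80%, the HPA aims for 4 pods:

```bash
# Worked example of the formula above (numbers are illustrative):
# sum(CurrentPodsCPUUtilization) = 3 pods x 1000m = 3000m
# Target = 80% of a 1000m request = 800m per pod
sum=3000
target=800
# ceil(sum / target) using integer arithmetic:
echo $(( (sum + target - 1) / target ))   # prints 4
```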
---

## Prerequisites

- The metrics server needs to be running

  (i.e. we need to be able to see pod metrics with `kubectl top pods`)

- The pods that we want to autoscale need to have resource requests

  (because the target CPU% is not absolute, but relative to the request)

- The latter actually makes a lot of sense:

  - if a Pod doesn't have a CPU request, it might be using 10% of CPU ...

  - ... but only because there is no CPU time available!

  - this makes sure that we won't add pods to nodes that are already resource-starved

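
A quick illustration of "relative to the request" (the numbers are made up): a pod requesting 500 millicores and currently using 450 is at 90% utilization, regardless of how big the node is:

```bash
# Utilization is computed against the pod's request, not the node's capacity:
usage=450     # current usage, in millicores
request=500   # CPU request, in millicores
echo $(( usage * 100 / request ))%   # prints 90%
```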
---

## Testing the HPA

- We will start a CPU-intensive web service

- We will send some traffic to that service

- We will create an HPA policy

- The HPA will automatically scale up the service for us

---

## A CPU-intensive web service

- Let's use `jpetazzo/busyhttp`

  (it is a web server that will use 1s of CPU for each HTTP request)

.exercise[

- Deploy the web server:
  ```bash
  kubectl create deployment busyhttp --image=jpetazzo/busyhttp
  ```

- Expose it with a ClusterIP service:
  ```bash
  kubectl expose deployment busyhttp --port=80
  ```

- Get the ClusterIP allocated to the service:
  ```bash
  kubectl get svc busyhttp
  ```

]

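
To avoid copy-pasting the address in later commands, we can also store the ClusterIP in a shell variable (the variable name `CLUSTERIP` is our choice):

```bash
CLUSTERIP=$(kubectl get svc busyhttp -o 'jsonpath={.spec.clusterIP}')
```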
---

## Monitor what's going on

- Let's start a bunch of commands to watch what is happening

.exercise[

- Monitor pod CPU usage:
  ```bash
  watch kubectl top pods
  ```

- Monitor service latency:
  ```bash
  httping http://`ClusterIP`/
  ```

- Monitor cluster events:
  ```bash
  kubectl get events -w
  ```

]

---

## Send traffic to the service

- We will use `ab` (Apache Bench) to send traffic

.exercise[

- Send a lot of requests to the service, with a concurrency level of 3:
  ```bash
  ab -c 3 -n 100000 http://`ClusterIP`/
  ```

]

The latency (reported by `httping`) should increase above 3s.

The CPU utilization should increase to 100%.

(The server is single-threaded and won't go above 100%.)

---

## Create an HPA policy

- There is a helper command to do that for us: `kubectl autoscale`

.exercise[

- Create the HPA policy for the `busyhttp` deployment:
  ```bash
  kubectl autoscale deployment busyhttp --max=10
  ```

]

By default, it will assume a target of 80% CPU usage.

This can also be set with `--cpu-percent=`.

--

*The autoscaler doesn't seem to work. Why?*

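
Note that `kubectl autoscale` just creates an HPA resource. A roughly equivalent manifest (a sketch using the `autoscaling/v1` API; `minReplicas` defaults to 1) could be applied instead:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: busyhttp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: busyhttp
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
```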
---

## What did we miss?

- The events stream gives us a hint, but to be honest, it's not very clear:

  `missing request for cpu`

- We forgot to specify a resource request for our Deployment!

- The HPA target is not an absolute CPU%

- It is relative to the CPU requested by the pod

---

## Adding a CPU request

- Let's edit the deployment and add a CPU request

- Since our server can use up to 1 core, let's request 1 core

.exercise[

- Edit the Deployment definition:
  ```bash
  kubectl edit deployment busyhttp
  ```

- In the `containers` list, add the following block:
  ```yaml
  resources:
    requests:
      cpu: "1"
  ```

]

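
If we prefer a non-interactive method, `kubectl set resources` can make the same change in a single command:

```bash
kubectl set resources deployment busyhttp --requests=cpu=1
```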
---

## Results

- After saving and quitting, a rolling update happens

  (if `ab` or `httping` exits, make sure to restart it)

- It will take a minute or two for the HPA to kick in:

  - the HPA runs every 30 seconds by default

  - it needs to gather metrics from the metrics server first

- If we scale further up (or down), the HPA will react after a few minutes:

  - it won't scale up if it already scaled in the last 3 minutes

  - it won't scale down if it already scaled in the last 5 minutes

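
While we wait, we can watch the HPA resource itself to see its current metrics and desired replica count:

```bash
kubectl get hpa busyhttp --watch
```

(`kubectl describe hpa busyhttp` also shows the events emitted by the autoscaler.)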
---

## What about other metrics?

- The HPA in API group `autoscaling/v1` only supports CPU scaling

- The HPA in API group `autoscaling/v2beta2` supports metrics from various API groups:

  - metrics.k8s.io, aka metrics server (per-Pod CPU and RAM)

  - custom.metrics.k8s.io, custom metrics per Pod

  - external.metrics.k8s.io, external metrics (not associated to Pods)

- Kubernetes doesn't implement any of these API groups

- Using these metrics requires [registering additional APIs](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-metrics-apis)

- The metrics provided by metrics server are standard; everything else is custom

- For more details, see [this great blog post](https://medium.com/uptime-99/kubernetes-hpa-autoscaling-with-custom-and-external-metrics-da7f41ff7846) or [this talk](https://www.youtube.com/watch?v=gSiGFH4ZnS8)
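
As an illustration, here is a sketch of an `autoscaling/v2beta2` policy equivalent to the one we created, using the `metrics` list (CPU expressed as a `Resource` metric):

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: busyhttp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: busyhttp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```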