merge

2026-05-07 01:16:37 +00:00 · 2019-05-25 21:45:06 -05:00
parent 918fa2091d f4e16dccc4
commit 48bd2a98bd
1 changed files with 245 additions and 0 deletions
--- a/slides/k8s/horizontal-pod-autoscaler.md
+++ b/slides/k8s/horizontal-pod-autoscaler.md
@@ -0,0 +1,245 @@
+# The Horizontal Pod Autoscaler
+
+- What is the Horizontal Pod Autoscaler, or HPA?
+
+- It is a controller that can perform *horizontal* scaling automatically
+
+- Horizontal scaling = changing the number of replicas
+
+  (adding / removing pods)
+
+- Vertical scaling = changing the size of individual replicas
+
+  (increasing / reducing CPU and RAM per pod)
+
+- Cluster scaling = changing the size of the cluster
+
+  (adding / removing nodes)
+
+---
+
+## Principle of operation
+
+- Each HPA resource (or "policy") specifies:
+
+  - which object to monitor and scale (e.g. a Deployment, ReplicaSet...)
+
+  - min/max scaling ranges (the max is a safety limit!)
+
+  - a target resource usage (e.g. the default is CPU=80%)
+
+- The HPA continuously monitors the CPU usage for the related object
+
+- It computes how many pods should be running:
+
+  `TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target)`
+
+- It scales up/down the related object to this target number of pods
+
+---
+
+## Pre-requirements
+
+- The metrics server needs to be running
+
+  (i.e. we need to be able to see pod metrics with `kubectl top pods`)
+
+- The pods that we want to autoscale need to have resource requests
+
+  (because the target CPU% is not absolute, but relative to the request)
+
+- The latter actually makes a lot of sense:
+
+  - if a Pod doesn't have a CPU request, it might be using 10% of CPU ...
+
+  - ... but only because there is no CPU time available!
+
+  - this makes sure that we won't add pods to nodes that are already resource-starved
+
+---
+
+## Testing the HPA
+
+- We will start a CPU-intensive web service
+
+- We will send some traffic to that service
+
+- We will create an HPA policy
+
+- The HPA will automatically scale up the service for us
+
+---
+
+## A CPU-intensive web service
+
+- Let's use `jpetazzo/busyhttp`
+
+  (it is a web server that will use 1s of CPU for each HTTP request)
+
+.exercise[
+
+- Deploy the web server:
+  ```bash
+  kubectl create deployment busyhttp --image=jpetazzo/busyhttp
+  ```
+
+- Expose it with a ClusterIP service:
+  ```bash
+  kubectl expose deployment busyhttp --port=80
+  ```
+
+- Get the ClusterIP allocated to the service:
+  ```bash
+  kubectl get svc busyhttp
+  ```
+
+]
+
+---
+
+## Monitor what's going on
+
+- Let's start a bunch of commands to watch what is happening
+
+.exercise[
+
+- Monitor pod CPU usage:
+  ```bash
+  watch kubectl top pods
+  ```
+
+- Monitor service latency:
+  ```bash
+  httping http://`ClusterIP`/
+  ```
+
+- Monitor cluster events:
+  ```bash
+  kubectl get events -w
+  ```
+
+]
+
+---
+
+## Send traffic to the service
+
+- We will use `ab` (Apache Bench) to send traffic
+
+.exercise[
+
+- Send a lot of requests to the service, with a concurrency level of 3:
+  ```bash
+  ab -c 3 -n 100000 http://`ClusterIP`/
+  ```
+
+]
+
+The latency (reported by `httping`) should increase above 3s.
+
+The CPU utilization should increase to 100%.
+
+(The server is single-threaded and won't go above 100%.)
+
+---
+
+## Create an HPA policy
+
+- There is a helper command to do that for us: `kubectl autoscale`
+
+.exercise[
+
+- Create the HPA policy for the `busyhttp` deployment:
+  ```bash
+  kubectl autoscale deployment busyhttp --max=10
+  ```
+
+]
+
+By default, it will assume a target of 80% CPU usage.
+
+This can also be set with `--cpu-percent=`.
+
+--
+
+*The autoscaler doesn't seem to work. Why?*
+
+---
+
+## What did we miss?
+
+- The events stream gives us a hint, but to be honest, it's not very clear:
+
+  `missing request for cpu`
+
+- We forgot to specify a resource request for our Deployment!
+
+- The HPA target is not an absolute CPU%
+
+- It is relative to the CPU requested by the pod
+
+---
+
+## Adding a CPU request
+
+- Let's edit the deployment and add a CPU request
+
+- Since our server can use up to 1 core, let's request 1 core
+
+.exercise[
+
+- Edit the Deployment definition:
+  ```bash
+  kubectl edit deployment busyhttp
+  ```
+
+- In the `containers` list, add the following block:
+  ```yaml
+    resources:
+      requests:
+        cpu: "1"
+  ```
+
+]
+
+---
+
+## Results
+
+- After saving and quitting, a rolling update happens
+
+  (if `ab` or `httping` exits, make sure to restart it)
+
+- It will take a minute or two for the HPA to kick in:
+
+  - the HPA runs every 30 seconds by default
+
+  - it needs to gather metrics from the metrics server first
+
+- If we scale further up (or down), the HPA will react after a few minutes:
+
+  - it won't scale up if it already scaled in the last 3 minutes
+
+  - it won't scale down if it already scaled in the last 5 minutes
+
+---
+
+## What about other metrics?
+
+- The HPA in API group `autoscaling/v1` only supports CPU scaling
+
+- The HPA in API group `autoscaling/v2beta2` supports metrics from various API groups:
+
+  - metrics.k8s.io, aka metrics server (per-Pod CPU and RAM)
+
+  - custom.metrics.k8s.io, custom metrics per Pod
+
+  - external.metrics.k8s.io, external metrics (not associated to Pods)
+
+- Kubernetes doesn't implement any of these API groups
+
+- Using these metrics requires to [register additional APIs](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-metrics-apis)
+
+- The metrics provided by metrics server are standard; everything else is custom
+
+- For more details, see [this great blog post](https://medium.com/uptime-99/kubernetes-hpa-autoscaling-with-custom-and-external-metrics-da7f41ff7846) or [this talk](https://www.youtube.com/watch?v=gSiGFH4ZnS8)