From 785a8178cac46ebbfa82ebb5778f25305d24ffcc Mon Sep 17 00:00:00 2001 From: Jerome Petazzoni Date: Wed, 22 May 2019 13:47:52 -0500 Subject: [PATCH 1/2] Show quick demo using CPU-bound workload. Explain autoscaler gotchas. Explain the differences between the API groups: metrics server, custom metrics, external metrics. --- slides/k8s/horizontal-pod-autoscaler.md | 245 ++++++++++++++++++++++++ slides/kube-admin-one.yml | 1 + 2 files changed, 246 insertions(+) create mode 100644 slides/k8s/horizontal-pod-autoscaler.md diff --git a/slides/k8s/horizontal-pod-autoscaler.md b/slides/k8s/horizontal-pod-autoscaler.md new file mode 100644 index 00000000..1265b02a --- /dev/null +++ b/slides/k8s/horizontal-pod-autoscaler.md @@ -0,0 +1,245 @@ +# The Horizontal Pod Autoscaler + +- What is the Horizontal Pod Autoscaler, or HPA? + +- It is a controller that can perform *horizontal* scaling automatically + +- Horizontal scaling = changing the number of replicas + + (adding / removing pods) + +- Vertical scaling = changing the size of individual replicas + + (increasing / reducing CPU and RAM per pod) + +- Cluster scaling = changing the size of the cluster + + (adding / removing nodes) + +--- + +## Principle of operation + +- Each HPA resource (or "policy") specifies: + + - which object to monitor and scale (e.g. a Deployment, ReplicaSet...) + + - min/max scaling ranges (the max is a safety limit!) + + - a target resource usage (e.g. the default is CPU=80%) + +- The HPA continuously monitors the CPU usage of the related object + +- It computes how many pods should be running: + + `TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target)` + +- It scales the related object up or down to this target number of pods + +--- + +## Prerequisites + +- The metrics server needs to be running + + (i.e. 
we need to be able to see pod metrics with `kubectl top pods`) + +- The pods that we want to autoscale need to have resource requests + + (because the target CPU% is not absolute, but relative to the request) + +- The latter actually makes a lot of sense: + + - if a Pod doesn't have a CPU request, it might be using 10% of CPU ... + + - ... but only because there is no CPU time available! + + - this makes sure that we won't add pods to nodes that are already starved + +--- + +## Testing the HPA + +- We will start a CPU-intensive web service + +- We will send some traffic to that service + +- We will create an HPA policy + +- The HPA will automatically scale up the service for us + +--- + +## A CPU-intensive web service + +- Let's use `jpetazzo/busyhttp` + + (it is a web server that will use 1s of CPU for each HTTP request) + +.exercise[ + +- Deploy the web server: + ```bash + kubectl create deployment busyhttp --image=jpetazzo/busyhttp + ``` + +- Expose it with a ClusterIP service: + ```bash + kubectl expose deployment busyhttp --port=80 + ``` + +- Get the ClusterIP allocated to the service: + ```bash + kubectl get svc busyhttp + ``` + +] + +--- + +## Monitor what's going on + +- Let's start a bunch of commands to watch what is happening + +.exercise[ + +- Monitor pod CPU usage: + ```bash + watch kubectl top pods + ``` + +- Monitor service latency: + ```bash + httping http://`ClusterIP`/ + ``` + +- Monitor cluster events: + ```bash + kubectl get events -w + ``` + +] + +--- + +## Send traffic to the service + +- We will use `ab` (Apache Bench) to send traffic + +.exercise[ + +- Send a lot of requests to the service, with a concurrency level of 3: + ```bash + ab -c 3 -n 100000 http://`ClusterIP`/ + ``` + +] + +The latency (reported by `httping`) should increase above 3s. + +The CPU utilization should increase to 100%. + +(The server is single-threaded and won't go above 100%.) 
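Before creating the HPA policy, we can predict what the autoscaler will do by plugging this demo's numbers into the formula from the "Principle of operation" slide. Here is a minimal sketch (the `desired_replicas` helper is invented for illustration; the real controller reads these numbers from the metrics server):

```bash
# ceil(replicas * average CPU utilization% / target%),
# i.e. the HPA formula, computed with awk
desired_replicas() {
  awk -v c="$1" -v u="$2" -v t="$3" '
    BEGIN {
      d = c * u / t
      r = (d == int(d)) ? d : int(d) + 1
      print r
    }'
}

desired_replicas 1 100 80   # 1 pod pegged at 100% of its request -> 2
desired_replicas 3 100 80   # 3 saturated pods -> 4
desired_replicas 4 75 80    # ~75% each -> 4 (steady state)
```

Assuming each pod requests 1 CPU and `ab -c 3` keeps three requests in flight, the deployment should settle around 4 replicas.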
+ +--- + +## Create an HPA policy + +- There is a helper command to do that for us: `kubectl autoscale` + +.exercise[ + +- Create the HPA policy for the `busyhttp` deployment: + ```bash + kubectl autoscale deployment busyhttp --max=10 + ``` + +] + +By default, it will assume a target of 80% CPU usage. + +This can also be set with `--cpu-percent=`. + +-- + +*The autoscaler doesn't seem to work. Why?* + +--- + +## What did we miss? + +- The events stream gives us a hint, but to be honest, it's not very clear: + + `missing request for cpu` + +- We forgot to specify a resource request for our Deployment! + +- The HPA target is not an absolute CPU% + +- It is relative to the CPU requested by the pod + +--- + +## Adding a CPU request + +- Let's edit the deployment and add a CPU request + +- Since our server can use up to 1 core, let's request 1 core + +.exercise[ + +- Edit the Deployment definition: + ```bash + kubectl edit deployment busyhttp + ``` + +- In the `containers` list, add the following block: + ```yaml + resources: + requests: + cpu: "1" + ``` + +] + +--- + +## Results + +- After saving and quitting, a rolling update happens + + (if `ab` or `httping` exits, make sure to restart it) + +- It will take a minute or two for the HPA to kick in: + + - the HPA runs every 30 seconds by default + + - it needs to gather metrics from the metrics server first + +- If we scale further up (or down), the HPA will react after a few minutes: + + - it won't scale up if it already scaled in the last 3 minutes + + - it won't scale down if it already scaled in the last 5 minutes + +--- + +## What about other metrics? 
- The HPA in API group `autoscaling/v1` only supports CPU scaling + +- The HPA in API group `autoscaling/v2beta2` supports metrics from various API groups: + + - metrics.k8s.io, aka metrics server (per-Pod CPU and RAM) + + - custom.metrics.k8s.io, custom metrics per Pod + + - external.metrics.k8s.io, external metrics (not associated with Pods) + +- Kubernetes doesn't implement any of these API groups + +- Using these metrics requires [registering additional APIs](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-metrics-apis) + +- The metrics provided by the metrics server are standard; everything else is custom + +- For more details, see [this great blog post](https://medium.com/uptime-99/kubernetes-hpa-autoscaling-with-custom-and-external-metrics-da7f41ff7846) or [this talk](https://www.youtube.com/watch?v=gSiGFH4ZnS8) diff --git a/slides/kube-admin-one.yml b/slides/kube-admin-one.yml index 4dac2298..c18239db 100644 --- a/slides/kube-admin-one.yml +++ b/slides/kube-admin-one.yml @@ -38,6 +38,7 @@ chapters: - - k8s/resource-limits.md - k8s/metrics-server.md - k8s/cluster-sizing.md + - k8s/horizontal-pod-autoscaler.md - - k8s/lastwords-admin.md - k8s/links.md - shared/thankyou.md From 1f0842543747a07970c26373e5bd1b60cabc2140 Mon Sep 17 00:00:00 2001 From: Jerome Petazzoni Date: Fri, 24 May 2019 19:37:35 -0500 Subject: [PATCH 2/2] Improve phrasing --- slides/k8s/horizontal-pod-autoscaler.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/slides/k8s/horizontal-pod-autoscaler.md b/slides/k8s/horizontal-pod-autoscaler.md index 1265b02a..38e363f6 100644 --- a/slides/k8s/horizontal-pod-autoscaler.md +++ b/slides/k8s/horizontal-pod-autoscaler.md @@ -54,7 +54,7 @@ - ... but only because there is no CPU time available! - - this makes sure that we won't add pods to nodes that are already starved + - this makes sure that we won't add pods to nodes that are already resource-starved ---
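As an illustration of the `autoscaling/v2beta2` API mentioned in the "What about other metrics?" slide, here is a sketch of a policy equivalent to the one we created with `kubectl autoscale`, extended with a second metric. The `http_requests_per_second` metric is hypothetical: it would only exist if a custom metrics adapter were installed and configured to expose it.

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: busyhttp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: busyhttp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  # Same CPU target as our v1 policy (relative to the pod's request)
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  # Hypothetical custom metric, served by custom.metrics.k8s.io
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "10"
```

When multiple metrics are listed, the HPA computes a desired replica count for each and uses the highest one.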