From 785a8178cac46ebbfa82ebb5778f25305d24ffcc Mon Sep 17 00:00:00 2001 From: Jerome Petazzoni Date: Wed, 22 May 2019 13:47:52 -0500 Subject: [PATCH 1/2] Show quick demo using CPU-bound workload. Explain autoscaler gotchas. Explain the differences between the API groups: metrics server, custom metrics, external metrics. --- slides/k8s/horizontal-pod-autoscaler.md | 245 ++++++++++++++++++++++++ slides/kube-admin-one.yml | 1 + 2 files changed, 246 insertions(+) create mode 100644 slides/k8s/horizontal-pod-autoscaler.md diff --git a/slides/k8s/horizontal-pod-autoscaler.md b/slides/k8s/horizontal-pod-autoscaler.md new file mode 100644 index 00000000..1265b02a --- /dev/null +++ b/slides/k8s/horizontal-pod-autoscaler.md @@ -0,0 +1,245 @@ +# The Horizontal Pod Autoscaler + +- What is the Horizontal Pod Autoscaler, or HPA? + +- It is a controller that can perform *horizontal* scaling automatically + +- Horizontal scaling = changing the number of replicas + + (adding / removing pods) + +- Vertical scaling = changing the size of individual replicas + + (increasing / reducing CPU and RAM per pod) + +- Cluster scaling = changing the size of the cluster + + (adding / removing nodes) + +--- + +## Principle of operation + +- Each HPA resource (or "policy") specifies: + + - which object to monitor and scale (e.g. a Deployment, ReplicaSet...) + + - min/max scaling ranges (the max is a safety limit!) + + - a target resource usage (e.g. the default is CPU=80%) + +- The HPA continuously monitors the CPU usage of the related object + +- It computes how many pods should be running: + + `TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target)` + +- It scales the related object up or down to this target number of pods + +--- + +## Prerequisites + +- The metrics server needs to be running + + (i.e. 
we need to be able to see pod metrics with `kubectl top pods`) + +- The pods that we want to autoscale need to have resource requests + + (because the target CPU% is not absolute, but relative to the request) + +- The latter actually makes a lot of sense: + + - if a Pod doesn't have a CPU request, it might be using 10% of CPU ... + + - ... but only because there is no CPU time available! + + - this makes sure that we won't add pods to nodes that are already starved + +--- + +## Testing the HPA + +- We will start a CPU-intensive web service + +- We will send some traffic to that service + +- We will create an HPA policy + +- The HPA will automatically scale up the service for us + +--- + +## A CPU-intensive web service + +- Let's use `jpetazzo/busyhttp` + + (it is a web server that will use 1s of CPU for each HTTP request) + +.exercise[ + +- Deploy the web server: + ```bash + kubectl create deployment busyhttp --image=jpetazzo/busyhttp + ``` + +- Expose it with a ClusterIP service: + ```bash + kubectl expose deployment busyhttp --port=80 + ``` + +- Get the ClusterIP allocated to the service: + ```bash + kubectl get svc busyhttp + ``` + +] + +--- + +## Monitor what's going on + +- Let's start a bunch of commands to watch what is happening + +.exercise[ + +- Monitor pod CPU usage: + ```bash + watch kubectl top pods + ``` + +- Monitor service latency: + ```bash + httping http://`ClusterIP`/ + ``` + +- Monitor cluster events: + ```bash + kubectl get events -w + ``` + +] + +--- + +## Send traffic to the service + +- We will use `ab` (Apache Bench) to send traffic + +.exercise[ + +- Send a lot of requests to the service, with a concurrency level of 3: + ```bash + ab -c 3 -n 100000 http://`ClusterIP`/ + ``` + +] + +The latency (reported by `httping`) should increase above 3s. + +The CPU utilization should increase to 100%. + +(The server is single-threaded and won't go above 100%.) 
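Before creating the HPA policy, we can predict what the autoscaler will do by plugging this demo's numbers into the formula from the "Principle of operation" slide. Here is a minimal sketch (the `desired_replicas` helper is invented for illustration; the real controller reads these numbers from the metrics server):

```bash
# ceil(replicas * average CPU utilization% / target%),
# i.e. the HPA formula, computed with awk
desired_replicas() {
  awk -v c="$1" -v u="$2" -v t="$3" '
    BEGIN {
      d = c * u / t
      r = (d == int(d)) ? d : int(d) + 1
      print r
    }'
}

desired_replicas 1 100 80   # 1 pod pegged at 100% of its request -> 2
desired_replicas 3 100 80   # 3 saturated pods -> 4
desired_replicas 4 75 80    # ~75% each -> 4 (steady state)
```

Assuming each pod requests 1 CPU and `ab -c 3` keeps three requests in flight, the deployment should settle around 4 replicas.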
+ +--- + +## Create an HPA policy + +- There is a helper command to do that for us: `kubectl autoscale` + +.exercise[ + +- Create the HPA policy for the `busyhttp` deployment: + ```bash + kubectl autoscale deployment busyhttp --max=10 + ``` + +] + +By default, it will assume a target of 80% CPU usage. + +This can also be set with `--cpu-percent=`. + +-- + +*The autoscaler doesn't seem to work. Why?* + +--- + +## What did we miss? + +- The events stream gives us a hint, but to be honest, it's not very clear: + + `missing request for cpu` + +- We forgot to specify a resource request for our Deployment! + +- The HPA target is not an absolute CPU% + +- It is relative to the CPU requested by the pod + +--- + +## Adding a CPU request + +- Let's edit the deployment and add a CPU request + +- Since our server can use up to 1 core, let's request 1 core + +.exercise[ + +- Edit the Deployment definition: + ```bash + kubectl edit deployment busyhttp + ``` + +- In the `containers` list, add the following block: + ```yaml + resources: + requests: + cpu: "1" + ``` + +] + +--- + +## Results + +- After saving and quitting, a rolling update happens + + (if `ab` or `httping` exits, make sure to restart it) + +- It will take a minute or two for the HPA to kick in: + + - the HPA runs every 30 seconds by default + + - it needs to gather metrics from the metrics server first + +- If we scale further up (or down), the HPA will react after a few minutes: + + - it won't scale up if it already scaled in the last 3 minutes + + - it won't scale down if it already scaled in the last 5 minutes + +--- + +## What about other metrics? 
- The HPA in API group `autoscaling/v1` only supports CPU scaling + +- The HPA in API group `autoscaling/v2beta2` supports metrics from various API groups: + + - metrics.k8s.io, aka metrics server (per-Pod CPU and RAM) + + - custom.metrics.k8s.io, custom metrics per Pod + + - external.metrics.k8s.io, external metrics (not associated with Pods) + +- Kubernetes doesn't implement any of these API groups + +- Using these metrics requires [registering additional APIs](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-metrics-apis) + +- The metrics provided by the metrics server are standard; everything else is custom + +- For more details, see [this great blog post](https://medium.com/uptime-99/kubernetes-hpa-autoscaling-with-custom-and-external-metrics-da7f41ff7846) or [this talk](https://www.youtube.com/watch?v=gSiGFH4ZnS8) diff --git a/slides/kube-admin-one.yml b/slides/kube-admin-one.yml index 4dac2298..c18239db 100644 --- a/slides/kube-admin-one.yml +++ b/slides/kube-admin-one.yml @@ -38,6 +38,7 @@ chapters: - - k8s/resource-limits.md - k8s/metrics-server.md - k8s/cluster-sizing.md + - k8s/horizontal-pod-autoscaler.md - - k8s/lastwords-admin.md - k8s/links.md - shared/thankyou.md From 1f0842543747a07970c26373e5bd1b60cabc2140 Mon Sep 17 00:00:00 2001 From: Jerome Petazzoni Date: Fri, 24 May 2019 19:37:35 -0500 Subject: [PATCH 2/2] Improve phrasing --- slides/k8s/horizontal-pod-autoscaler.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/slides/k8s/horizontal-pod-autoscaler.md b/slides/k8s/horizontal-pod-autoscaler.md index 1265b02a..38e363f6 100644 --- a/slides/k8s/horizontal-pod-autoscaler.md +++ b/slides/k8s/horizontal-pod-autoscaler.md @@ -54,7 +54,7 @@ - ... but only because there is no CPU time available! - - this makes sure that we won't add pods to nodes that are already starved + - this makes sure that we won't add pods to nodes that are already resource-starved ---
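As an illustration of the `autoscaling/v2beta2` API mentioned in the "What about other metrics?" slide, here is a sketch of a policy equivalent to the one we created with `kubectl autoscale`, extended with a second metric. The `http_requests_per_second` metric is hypothetical: it would only exist if a custom metrics adapter were installed and configured to expose it.

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: busyhttp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: busyhttp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  # Same CPU target as our v1 policy (relative to the pod's request)
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  # Hypothetical custom metric, served by custom.metrics.k8s.io
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "10"
```

When multiple metrics are listed, the HPA computes a desired replica count for each and uses the highest one.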