♻️ Update healthcheck content

2026-07-19 04:49:19 +00:00 · 2022-01-27 11:23:43 +01:00
parent a01fecf679
commit 5aa20362eb
1 changed files with 62 additions and 29 deletions
--- a/slides/k8s/healthchecks.md
+++ b/slides/k8s/healthchecks.md
@@ -1,16 +1,18 @@
 # Healthchecks

- Kubernetes provides two kinds of healthchecks: liveness and readiness
+- Containers can have *healthchecks*

- Healthchecks are *probes* that apply to *containers* (not to pods)
+- There are three kinds of healthchecks, corresponding to very different use-cases:

- Each container can have two (optional) probes:
+  - liveness  = detect when a container is "dead" and needs to be restarted

-  - liveness = is this container dead or alive?
+  - readiness = detect when a container is ready to serve traffic

-  - readiness = is this container ready to serve traffic?
+  - startup = detect if a container has finished booting

- Different probes are available (HTTP, TCP, program execution)
+- These healthchecks are optional (we can use none, all, or some of them)
+
+- Different probes are available (HTTP request, TCP connection, program execution)

 - Let's see the difference and how to use them!

@@ -18,11 +20,13 @@

 ## Liveness probe

+*This container is dead, we don't know how to fix it, other than restarting it.*
+
 - Indicates if the container is dead or alive

 - A dead container cannot come back to life

- If the liveness probe fails, the container is killed
+- If the liveness probe fails, the container is killed (destroyed)

  (to make really sure that it's really dead; no zombies or undeads!)

@@ -50,9 +54,31 @@

 ---

-## Readiness probe
+## Readiness probe (1)

- Indicates if the container is ready to serve traffic
+*Make sure that a container is ready before continuing a rolling update.*
+
+- Indicates if the container is ready to handle traffic
+
+- When doing a rolling update, the Deployment controller waits for Pods to be ready
+
+  (a Pod is ready when all the containers in the Pod are ready)
+
+- Improves reliability and safety of rolling updates:
+
+  - don't roll out a broken version (that doesn't pass readiness checks)
+
+  - don't lose processing capacity during a rolling update
+
+---
+
+## Readiness probe (2)
+
+*Temporarily remove a container (overloaded or otherwise) from a Service load balancer.*
+
+- A container can mark itself "not ready" temporarily
+
+  (e.g. if it's overloaded or needs to reload/restart/garbage collect...)

 - If a container becomes "unready" it might be ready again soon

@@ -80,9 +106,9 @@

  - runtime is busy doing garbage collection or initial data load

- For processes that take a long time to start
+- To redirect new connections to other Pods

-  (more on that later)
+  (e.g. fail the readiness probe when the Pod's load is too high)

 ---

@@ -120,27 +146,35 @@

 ---

-class: extra-details
-
 ## Startup probe

- Kubernetes 1.16 introduces a third type of probe: `startupProbe`
+*The container takes too long to start, and is killed by the liveness probe!*

-  (it is in `alpha` in Kubernetes 1.16)
+- By default, probes (including liveness) start immediately

- It can be used to indicate "container not ready *yet*"
+- With the default probe interval and failure threshold:

-  - process is still starting
+  *a container must respond in less than 30 seconds, or it will be killed!*

-  - loading external data, priming caches
+- There are two ways to avoid that:

- Before Kubernetes 1.16, we had to use the `initialDelaySeconds` parameter
+  - set `initialDelaySeconds` (a fixed, rigid delay)

-  (available for both liveness and readiness probes)
+  - use a `startupProbe`

- `initialDelaySeconds` is a rigid delay (always wait X before running probes)
+- Kubernetes will run only the startup probe, and when it succeeds, run the other probes

- `startupProbe` works better when a container start time can vary a lot
+---
+
+## When to use a startup probe
+
+- For containers that take a long time to start
+
+  (more than 30 seconds)
+
+- Especially if that time can vary a lot
+
+  (e.g. fast in dev, slow in prod, or the other way around)

 ---

@@ -190,17 +224,16 @@ Here is a pod template for the `rng` web service of the DockerCoins app:
 apiVersion: v1
 kind: Pod
 metadata:
-  name: rng-with-liveness
+  name: healthy-app
 spec:
  containers:
-  - name: rng
-    image: dockercoins/rng:v0.1
+  - name: myapp
+    image: myregistry.io/myapp:v1.0
    livenessProbe:
      httpGet:
-        path: /
+        path: /health
        port: 80
-      initialDelaySeconds: 10
-      periodSeconds: 1
+      periodSeconds: 5
 ```

 If the backend serves an error, or takes longer than 1s, 3 times in a row, it gets killed.
@@ -267,7 +300,7 @@ If the Redis process becomes unresponsive, it will be killed.

 (In that context, worker = process that doesn't accept connections)

- Readiness isn't useful
+- Readiness is useful mostly for rolling updates

  (because workers aren't backends for a service)
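
As a rough illustration of the startup probe described in this change, here is a minimal sketch that reuses the placeholder names from the updated pod template (`healthy-app`, `myapp`, `/health`, port 80). The `periodSeconds` and `failureThreshold` values are assumptions picked for the example, not values taken from the slides.

```yaml
# Sketch only: name, image, path, and port are the placeholders from the
# pod template in this diff; the thresholds below are assumed values.
apiVersion: v1
kind: Pod
metadata:
  name: healthy-app
spec:
  containers:
  - name: myapp
    image: myregistry.io/myapp:v1.0
    # Only the startup probe runs until it succeeds once.
    startupProbe:
      httpGet:
        path: /health
        port: 80
      periodSeconds: 10
      failureThreshold: 30   # allows up to 30 x 10s = 300s to start
    # The liveness probe takes over after the startup probe has succeeded.
    livenessProbe:
      httpGet:
        path: /health
        port: 80
      periodSeconds: 5
```

With settings like these, a slow container gets several minutes to boot, while the liveness probe can still react quickly once the container is running.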