🩺 Update healthcheck exercise

2026-02-14 17:49:59 +00:00 · 2022-01-03 19:36:16 +01:00
parent ee7547999c
commit e902962f3a
2 changed files with 70 additions and 21 deletions
--- a/slides/exercises/healthchecks-brief.md
+++ b/slides/exercises/healthchecks-brief.md
@@ -4,8 +4,6 @@

  (we will use the `rng` service in the dockercoins app)

- Observe the correct behavior of the readiness probe
+- See what happens when the load increases

-  (when deploying e.g. an invalid image)
-
- Observe the behavior of the liveness probe
+  (spoiler alert: it involves timeouts!)
--- a/slides/exercises/healthchecks-details.md
+++ b/slides/exercises/healthchecks-details.md
@@ -2,34 +2,85 @@

 - We want to add healthchecks to the `rng` service in dockercoins

- First, deploy a new copy of dockercoins
+- The `rng` service exhibits an interesting behavior under load:

- Then, add a readiness probe on the `rng` service
+  *its latency increases (which will cause probes to time out!)*

-  (using a simple HTTP check on the `/` route of the service)
+- We want to see:

- Check what happens when deploying an invalid image for `rng` (e.g. `alpine`)
+  - what happens when the readiness probe fails

- Then roll back `rng` to the original image and add a liveness probe
+  - what happens when the liveness probe fails

-  (with the same parameters)
-
- Scale up the `worker` service (to 15+ workers) and observe
-
- What happens?
+  - how to set "appropriate" probes and probe parameters

 ---

-## Goal
+## Setup

- *Before* adding the readiness probe:
+- First, deploy a new copy of dockercoins

-  updating the image of the `rng` service with `alpine` should break it
+  (for instance, in a brand new namespace)

- *After* adding the readiness probe:
+- Pro tip #1: ping (e.g. with `httping`) the `rng` service at all times

-  updating the image of the `rng` service with `alpine` shouldn't break it
+  - it should initially show a few milliseconds latency

- When adding the liveness probe, nothing special should happen
+  - that will increase when we scale up

- Scaling the `worker` service will then cause disruptions
+  - it will also let us detect when the service goes "boom"
+
+- Pro tip #2: also keep an eye on the web UI
+
+---
+
+## Readiness
+
+- Add a readiness probe to `rng`
+
+  - this requires editing the pod template in the Deployment manifest
+
+  - use a simple HTTP check on the `/` route of the service
+
+  - keep all other parameters (timeouts, thresholds...) at their default values
+
+- Check what happens when deploying an invalid image for `rng` (e.g. `alpine`)
+
+*(If the probe was set up correctly, the app will continue to work:
+Kubernetes won't send traffic to the `alpine` containers,
+because they never pass the readiness probe.)*
+
+---
+
+## Readiness under load
+
+- Then roll back `rng` to the original image
+
+- Check what happens when we scale up the `worker` Deployment to 15+ workers
+
+  (get the latency above 1 second)
+
+*(We should now observe intermittent unavailability of the service, i.e. every
+30 seconds it will be unreachable for a bit, then come back, then go away again, etc.)*
+
+---
+
+## Liveness
+
+- Now replace the readiness probe with a liveness probe
+
+- What happens now?
+
+*(At first the behavior looks the same as with the readiness probe:
+the service becomes unreachable, then reachable again, etc.; but there is
+a significant difference behind the scenes. What is it?)*
+
+---
+
+## Readiness and liveness
+
+- Bonus questions!
+
+- What happens if we enable both probes at the same time?
+
+- What strategies can we use so that both probes are useful?
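
For reference, the readiness step described in the new `healthchecks-details.md` could be sketched as the fragment below. This is a minimal sketch, not the exercise's official solution: the container name `rng` and port `80` are assumptions about the dockercoins manifest, and the commented-out values are the Kubernetes defaults that the exercise asks to keep.

```yaml
# Fragment of the pod template (spec.template.spec) of the rng Deployment.
# Assumptions: the container is named "rng" and serves HTTP on port 80.
containers:
- name: rng
  # image: left unchanged
  readinessProbe:
    httpGet:
      path: /        # simple HTTP check on the / route
      port: 80
    # Kubernetes defaults, shown for reference only:
    # initialDelaySeconds: 0
    # periodSeconds: 10
    # timeoutSeconds: 1
    # successThreshold: 1
    # failureThreshold: 3
```

With these defaults, a probe that takes longer than 1 second counts as a failure, and three consecutive failures (about 30 seconds at a 10-second period) mark the pod as not ready. That is consistent with the 1-second latency target and the roughly 30-second cycle mentioned in the exercise. Moving the same `httpGet` block under `livenessProbe` gives the liveness variant.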