🩺 Update healthcheck exercise

Jérôme Petazzoni
2022-01-03 19:36:16 +01:00
parent ee7547999c
commit e902962f3a
2 changed files with 70 additions and 21 deletions


@@ -4,8 +4,6 @@
(we will use the `rng` service in the dockercoins app)
- Observe the correct behavior of the readiness probe
- See what happens when the load increases
(when deploying e.g. an invalid image)
- Observe the behavior of the liveness probe
(spoiler alert: it involves timeouts!)


@@ -2,34 +2,85 @@
- We want to add healthchecks to the `rng` service in dockercoins
- First, deploy a new copy of dockercoins
- The `rng` service exhibits an interesting behavior under load:
- Then, add a readiness probe on the `rng` service
*its latency increases (which will cause probes to time out!)*
(using a simple HTTP check on the `/` route of the service)
- We want to see:
- Check what happens when deploying an invalid image for `rng` (e.g. `alpine`)
- what happens when the readiness probe fails
- Then roll back `rng` to the original image and add a liveness probe
- what happens when the liveness probe fails
(with the same parameters)
- Scale up the `worker` service (to 15+ workers) and observe
- What happens?
- how to set "appropriate" probes and probe parameters
---
## Goal
## Setup
- *Before* adding the readiness probe:
- First, deploy a new copy of dockercoins
updating the image of the `rng` service with `alpine` should break it
(for instance, in a brand new namespace)
- *After* adding the readiness probe:
- Pro tip #1: ping (e.g. with `httping`) the `rng` service at all times
updating the image of the `rng` service with `alpine` shouldn't break it
- it should initially show a few milliseconds latency
- When adding the liveness probe, nothing special should happen
- that will increase when we scale up
- Scaling the `worker` service will then cause disruptions
- it will also let us detect when the service goes "boom"
- Pro tip #2: also keep an eye on the web UI
---
## Readiness
- Add a readiness probe to `rng`
- this requires editing the pod template in the Deployment manifest
- use a simple HTTP check on the `/` route of the service
- keep all other parameters (timeouts, thresholds...) at their default values
- Check what happens when deploying an invalid image for `rng` (e.g. `alpine`)
*(If the probe was set up correctly, the app will continue to work:
the `alpine` containers never pass the readiness probe,
so Kubernetes won't switch traffic over to them.)*
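
As a rough sketch, the readiness probe described above could look like the fragment below. It assumes that the container in the Deployment is named `rng` and that it listens on port 80; both are assumptions to check against the actual manifest. Timeouts and thresholds are left out on purpose so that they keep their default values.

```yaml
# Fragment of the rng Deployment manifest (pod template only).
# Assumptions: the container is named "rng" and serves HTTP on port 80.
spec:
  template:
    spec:
      containers:
      - name: rng
        readinessProbe:
          httpGet:
            path: /        # simple HTTP check on the "/" route
            port: 80       # assumed port; adjust to the real one
          # No timeoutSeconds / periodSeconds / failureThreshold here:
          # the defaults (1s timeout, 10s period, 3 failures) apply.
```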
---
## Readiness under load
- Then roll back `rng` to the original image
- Check what happens when we scale up the `worker` Deployment to 15+ workers
(get the latency above 1 second)
*(We should now observe intermittent unavailability of the service, i.e. every
30 seconds it will be unreachable for a bit, then come back, then go away again, etc.)*
---
## Liveness
- Now replace the readiness probe with a liveness probe
- What happens now?
*(At first the behavior looks the same as with the readiness probe:
service becomes unreachable, then reachable again, etc.; but there is
a significant difference behind the scenes. What is it?)*
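
One possible shape for the liveness probe is sketched below, using the same HTTP check with default parameters. As before, the container name `rng` and port 80 are assumptions, not values confirmed by the exercise.

```yaml
# Fragment of the rng Deployment manifest (pod template only).
# Assumptions: the container is named "rng" and serves HTTP on port 80.
spec:
  template:
    spec:
      containers:
      - name: rng
        livenessProbe:     # replaces the readinessProbe block
          httpGet:
            path: /
            port: 80       # assumed port; adjust to the real one
          # Defaults apply: 1s timeout, 10s period, 3 failures.
```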
---
## Readiness and liveness
- Bonus questions!
- What happens if we enable both probes at the same time?
- What strategies can we use so that both probes are useful?