diff --git a/slides/k8s/healthchecks-more.md b/slides/k8s/healthchecks-more.md
new file mode 100644
index 00000000..d75aa112
--- /dev/null
+++ b/slides/k8s/healthchecks-more.md
@@ -0,0 +1,404 @@
+## Questions to ask before adding healthchecks
+
+- Do we want liveness, readiness, both?
+
+ (sometimes, we can use the same check, but with different failure thresholds)
+
+- Do we have existing HTTP endpoints that we can use?
+
+- Do we need to add new endpoints, or perhaps use something else?
+
+- Are our healthchecks likely to use resources and/or slow down the app?
+
+- Do they depend on additional services?
+
+ (this can be particularly tricky, see next slide)
+
+---
+
+## Healthchecks and dependencies
+
+- A good healthcheck should always indicate the health of the service itself
+
+- It should not be affected by the state of the service's dependencies
+
+- Example: a web server requiring a database connection to operate
+
+ (make sure that the healthcheck can report "OK" even if the database is down;
+
+ because it won't help us to restart the web server if the issue is with the DB!)
+
+- Example: a microservice calling other microservices
+
+- Example: a worker process
+
+ (these will generally require minor code changes to report health)
+
+---
+
+## Adding healthchecks to an app
+
+- Let's add healthchecks to DockerCoins!
+
+- We will examine the questions of the previous slide
+
+- Then we will review each component individually to add healthchecks
+
+---
+
+## Liveness, readiness, or both?
+
+- To answer that question, we need to see the app run for a while
+
+- Do we get temporary, recoverable glitches?
+
+ → then use readiness
+
+- Or do we get hard lock-ups requiring a restart?
+
+ → then use liveness
+
+- In the case of DockerCoins, we don't know yet!
+
+- Let's pick liveness
+
+---
+
+## Do we have HTTP endpoints that we can use?
+
+- Each of the 3 web services (hasher, rng, webui) has a trivial route on `/`
+
+- These routes:
+
+ - don't seem to perform anything complex or expensive
+
+ - don't seem to call other services
+
+- Perfect!
+
+ (See next slides for individual details)
+
+---
+
+- [hasher.rb](https://github.com/jpetazzo/container.training/blob/master/dockercoins/hasher/hasher.rb)
+ ```ruby
+ get '/' do
+ "HASHER running on #{Socket.gethostname}\n"
+ end
+ ```
+
+- [rng.py](https://github.com/jpetazzo/container.training/blob/master/dockercoins/rng/rng.py)
+ ```python
+ @app.route("/")
+ def index():
+ return "RNG running on {}\n".format(hostname)
+ ```
+
+- [webui.js](https://github.com/jpetazzo/container.training/blob/master/dockercoins/webui/webui.js)
+ ```javascript
+ app.get('/', function (req, res) {
+ res.redirect('/index.html');
+ });
+ ```
+
+---
+
+## Running DockerCoins
+
+- We will run DockerCoins in a new, separate namespace
+
+- We will use a set of YAML manifests and pre-built images
+
+- We will add our new liveness probe to the YAML of the `rng` DaemonSet
+
+- Then, we will deploy the application
+
+---
+
+## Creating a new namespace
+
+- This will make sure that we don't conflict with previous exercises
+
+.exercise[
+
+- Create the yellow namespace:
+ ```bash
+ kubectl create namespace yellow
+ ```
+
+- Switch to that namespace:
+ ```bash
+ kns yellow
+ ```
+
+]
+
+---
+
+## Retrieving DockerCoins manifests
+
+- All the manifests that we need are on a convenient repository:
+
+ https://github.com/jpetazzo/kubercoins
+
+.exercise[
+
+- Clone that repository:
+ ```bash
+ cd ~
+ git clone https://github.com/jpetazzo/kubercoins
+ ```
+
+- Change directory to the repository:
+ ```bash
+ cd kubercoins
+ ```
+
+]
+
+---
+
+## A simple HTTP liveness probe
+
+This is what our liveness probe should look like:
+
+```yaml
+containers:
+- name: ...
+  image: ...
+  livenessProbe:
+    httpGet:
+      path: /
+      port: 80
+    initialDelaySeconds: 30
+    periodSeconds: 5
+```
+
+This gives the service 30 seconds to start. (Way more than necessary!)
+
+It will run the probe every 5 seconds.
+
+It will use the default timeout (1 second).
+
+It will use the default failure threshold (3 failed attempts = dead).
+
+It will use the default success threshold (1 successful attempt = alive).
+
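+The same probe, with those defaults spelled out explicitly (this is equivalent to the YAML above):
+
+```yaml
+livenessProbe:
+  httpGet:
+    path: /
+    port: 80
+  initialDelaySeconds: 30
+  periodSeconds: 5
+  timeoutSeconds: 1
+  failureThreshold: 3
+  successThreshold: 1
+```
+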
+---
+
+## Adding the liveness probe
+
+- Let's add the liveness probe, then deploy DockerCoins
+
+.exercise[
+
+- Edit `rng-daemonset.yaml` and add the liveness probe
+ ```bash
+ vim rng-daemonset.yaml
+ ```
+
+- Load the YAML for all the resources of DockerCoins:
+ ```bash
+ kubectl apply -f .
+ ```
+
+]
+
+---
+
+## Testing the liveness probe
+
+- The rng service needs 100ms to process a request
+
+ (because it is single-threaded and sleeps 0.1s in each request)
+
+- The probe timeout is set to 1 second
+
+- If we send more than 10 requests per second per backend, it will break
+
+- Let's generate traffic and see what happens!
+
+.exercise[
+
+- Get the ClusterIP address of the rng service:
+ ```bash
+ kubectl get svc rng
+ ```
+
+]
+
+---
+
+## Monitoring the rng service
+
+- Each command below will show us what's happening on a different level
+
+.exercise[
+
+- In one window, monitor cluster events:
+ ```bash
+ kubectl get events -w
+ ```
+
+- In another window, monitor the response time of rng:
+ ```bash
+ httping ``
+ ```
+
+- In another window, monitor pods status:
+ ```bash
+ kubectl get pods -w
+ ```
+
+]
+
+---
+
+## Generating traffic
+
+- Let's use `ab` to send concurrent requests to rng
+
+.exercise[
+
+- In yet another window, generate traffic:
+ ```bash
+ ab -c 10 -n 1000 http://``/1
+ ```
+
+- Experiment with higher values of `-c` and see what happens
+
+]
+
+- The `-c` parameter indicates the number of concurrent requests
+
+- The final `/1` is important to generate actual traffic
+
+ (otherwise we would use the ping endpoint, which doesn't sleep 0.1s per request)
+
+---
+
+## Discussion
+
+- Above a given threshold, the liveness probe starts failing
+
+  (about 10 concurrent requests per backend should be more than enough)
+
+- When the liveness probe fails 3 times in a row, the container is restarted
+
+- During the restart, there is *less* capacity available
+
+- ... Meaning that the other backends are likely to timeout as well
+
+- ... Eventually causing all backends to be restarted
+
+- ... And each fresh backend gets restarted, too
+
+- This goes on until the load goes down, or we add capacity
+
+*This wouldn't be a good healthcheck in a real application!*
+
+---
+
+## Better healthchecks
+
+- We need to make sure that the healthcheck doesn't trip when
+ performance degrades due to external pressure
+
+- Using a readiness check instead would have milder effects
+
+ (but it still would be an imperfect solution)
+
+- A possible combination:
+
+ - readiness check with a short timeout / low failure threshold
+
+  - liveness check with a longer timeout / higher failure threshold
+
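+- A sketch of that combination (the endpoint, port, and threshold values here are illustrative, not tuned):
+  ```yaml
+  readinessProbe:
+    httpGet:
+      path: /
+      port: 80
+    periodSeconds: 5
+    timeoutSeconds: 1
+    failureThreshold: 2
+  livenessProbe:
+    httpGet:
+      path: /
+      port: 80
+    periodSeconds: 5
+    timeoutSeconds: 5
+    failureThreshold: 6
+  ```
+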
+---
+
+## Healthchecks for redis
+
+- A liveness probe is enough
+
+ (it's not useful to remove a backend from rotation when it's the only one)
+
+- We could use an exec probe running `redis-cli ping`
+
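+- For example (a minimal sketch; the container name and image are placeholders):
+  ```yaml
+  containers:
+  - name: redis
+    image: redis
+    livenessProbe:
+      exec:
+        command: ["redis-cli", "ping"]
+  ```
+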
+---
+
+class: extra-details
+
+## Exec probes and zombies
+
+- When using exec probes, we should make sure that we have a *zombie reaper*
+
+ 🤔🧐🧟 Wait, what?
+
+- When a process terminates, its parent must call `wait()`/`waitpid()`
+
+ (this is how the parent process retrieves the child's exit status)
+
+- In the meantime, the process is in *zombie* state
+
+ (the process state will show as `Z` in `ps`, `top` ...)
+
+- When a process is killed, its children are *orphaned* and attached to PID 1
+
+- PID 1 has the responsibility of *reaping* these processes when they terminate
+
+- OK, but how does that affect us?
+
+---
+
+class: extra-details
+
+## PID 1 in containers
+
+- On ordinary systems, PID 1 (`/sbin/init`) has logic to reap processes
+
+- In containers, PID 1 is typically our application process
+
+ (e.g. Apache, the JVM, NGINX, Redis ...)
+
+- These *do not* take care of reaping orphans
+
+- If we use exec probes, we need to add a process reaper
+
+- We can add [tini](https://github.com/krallin/tini) to our images
+
+- Or [share the PID namespace between containers of a pod](https://kubernetes.io/docs/tasks/configure-pod-container/share-process-namespace/)
+
+ (and have gcr.io/pause take care of the reaping)
+
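+- A minimal sketch of the second option (container names and images are placeholders):
+  ```yaml
+  spec:
+    shareProcessNamespace: true
+    containers:
+    - name: ...
+      image: ...
+  ```
+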
+---
+
+## Healthchecks for worker
+
+- Readiness isn't useful
+
+ (because worker isn't a backend for a service)
+
+- Liveness may help us to restart a broken worker, but how can we check it?
+
+- Embedding an HTTP server is an option
+
+ (but it has a high potential for unwanted side-effects and false positives)
+
+- Using a "lease" file can be relatively easy:
+
+ - touch a file during each iteration of the main loop
+
+ - check the timestamp of that file from an exec probe
+
+- Writing logs (and checking them from the probe) also works
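+
+- For instance (assuming the worker touches `/tmp/alive` at each iteration; the path and the one-minute threshold are arbitrary):
+  ```yaml
+  livenessProbe:
+    exec:
+      # fails if the lease file hasn't been touched in the last minute
+      command: ["sh", "-c", "find /tmp/alive -mmin -1 | grep -q ."]
+    periodSeconds: 30
+  ```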