🏭️ Refactor stateful apps content

This commit is contained in:
Jérôme Petazzoni
2021-11-20 22:00:50 +01:00
parent 93d8a23c81
commit 52015b81fe
15 changed files with 1426 additions and 1363 deletions

k8s/mounter.yaml Normal file

@@ -0,0 +1,20 @@
kind: Pod
apiVersion: v1
metadata:
  generateName: mounter-
  labels:
    container.training/mounter: ""
spec:
  volumes:
  - name: pvc
    persistentVolumeClaim:
      claimName: my-pvc-XYZ45   # update this to match the name of the PVC to inspect
  containers:
  - name: mounter
    image: alpine
    stdin: true                 # keep stdin open and allocate a TTY,
    tty: true                   # so we can interact with `kubectl attach -ti`
    volumeMounts:
    - name: pvc
      mountPath: /pvc
    workingDir: /pvc

k8s/pv.yaml Normal file

@@ -0,0 +1,20 @@
kind: PersistentVolume
apiVersion: v1
metadata:
  generateName: my-pv-
  labels:
    container.training/pv: ""
spec:
  accessModes:
  - ReadWriteOnce
  - ReadWriteMany
  capacity:
    storage: 1G
  hostPath:
    path: /tmp/my-pv
  #storageClassName: my-sc
  #claimRef:
  #  kind: PersistentVolumeClaim
  #  apiVersion: v1
  #  namespace: default
  #  name: my-pvc-XYZ45

k8s/pvc.yaml Normal file

@@ -0,0 +1,13 @@
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  generateName: my-pvc-
  labels:
    container.training/pvc: ""
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1G
  #storageClassName: my-sc

slides/k8s/consul.md Normal file

@@ -0,0 +1,228 @@
# Running a Consul cluster
- Here is a good use-case for Stateful sets!
- We are going to deploy a Consul cluster with 3 nodes
- Consul is a highly-available key/value store
(like etcd or Zookeeper)
- One easy way to bootstrap a cluster is to tell each node:
- the addresses of other nodes
- how many nodes are expected (to know when quorum is reached)
---
## Bootstrapping a Consul cluster
*After reading the Consul documentation carefully (and/or asking around),
we figure out the minimal command-line to run our Consul cluster.*
```
consul agent -data-dir=/consul/data -client=0.0.0.0 -server -ui \
-bootstrap-expect=3 \
-retry-join=`X.X.X.X` \
-retry-join=`Y.Y.Y.Y`
```
- Replace X.X.X.X and Y.Y.Y.Y with the addresses of other nodes
- A node can add its own address (it will work fine)
- ... Which means that we can use the same command-line on all nodes (convenient!)
---
## Cloud Auto-join
- Since version 1.4.0, Consul can use the Kubernetes API to find its peers
- This is called [Cloud Auto-join]
- Instead of passing an IP address, we need to pass a parameter like this:
```
consul agent -retry-join "provider=k8s label_selector=\"app=consul\""
```
- Consul needs to be able to talk to the Kubernetes API
- We can provide a `kubeconfig` file
- If Consul runs in a pod, it will use the *service account* of the pod
[Cloud Auto-join]: https://www.consul.io/docs/agent/cloud-auto-join.html#kubernetes-k8s-
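In the pod template, using that service account is a one-liner (a sketch; the name is illustrative):
```yaml
spec:
  serviceAccountName: consul   # illustrative; must match the ServiceAccount we create
```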
---
## Setting up Cloud auto-join
- We need to create a service account for Consul
- We need to create a role that can `list` and `get` pods
- We need to bind that role to the service account
- And of course, we need to make sure that Consul pods use that service account
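As a sketch, the Role could look like this (names are illustrative; the actual definitions are in `k8s/consul-1.yaml`):
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: consul            # illustrative name
rules:
- apiGroups: [ "" ]       # "" is the core API group (where Pods live)
  resources: [ "pods" ]
  verbs: [ "get", "list" ]
```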
---
## Putting it all together
- The file `k8s/consul-1.yaml` defines the required resources
(service account, role, role binding, service, stateful set)
- Inspired by this [excellent tutorial](https://github.com/kelseyhightower/consul-on-kubernetes) by Kelsey Hightower
(many features from the original tutorial were removed for simplicity)
---
## Running our Consul cluster
- We'll use the provided YAML file
.exercise[
- Create the stateful set and associated service:
```bash
kubectl apply -f ~/container.training/k8s/consul-1.yaml
```
- Check the logs as the pods come up one after another:
```bash
stern consul
```
<!--
```wait Synced node info```
```key ^C```
-->
- Check the health of the cluster:
```bash
kubectl exec consul-0 -- consul members
```
]
---
## Caveats
- The scheduler may place two Consul pods on the same node
- if that node fails, we lose two Consul pods at the same time
- this will cause the cluster to fail
- Scaling down the cluster will cause it to fail
- when a Consul member leaves the cluster, it needs to inform the others
- otherwise, the last remaining node doesn't have quorum and stops functioning
- This Consul cluster doesn't use real persistence yet
- data is stored in the containers' ephemeral filesystem
- if a pod fails, its replacement starts from a blank slate
---
## Improving pod placement
- We need to tell the scheduler:
*do not put two of these pods on the same node!*
- This is done with an `affinity` section like the following one:
```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: consul
      topologyKey: kubernetes.io/hostname
```
---
## Using a lifecycle hook
- When a Consul member leaves the cluster, it needs to execute:
```bash
consul leave
```
- This is done with a `lifecycle` section like the following one:
```yaml
lifecycle:
  preStop:
    exec:
      command: [ "sh", "-c", "consul leave" ]
```
---
## Running a better Consul cluster
- Let's try to add the scheduling constraint and lifecycle hook
- We can do that in the same namespace or another one (as we like)
- If we do that in the same namespace, we will see a rolling update
(pods will be replaced one by one)
.exercise[
- Deploy a better Consul cluster:
```bash
kubectl apply -f ~/container.training/k8s/consul-2.yaml
```
]
---
## Still no persistence, though
- We aren't using actual persistence yet
(no `volumeClaimTemplate`, Persistent Volume, etc.)
- What happens if we lose a pod?
- a new pod gets rescheduled (with an empty state)
- the new pod tries to connect to the two others
- it will be accepted (after 1-2 minutes of instability)
- and it will retrieve the data from the other pods
---
## Failure modes
- What happens if we lose two pods?
- manual repair will be required
- we will need to instruct the remaining one to act solo
- then rejoin new pods
- What happens if we lose three pods? (aka all of them)
- we lose all the data (ouch)
???
:EN:- Scheduling pods together or separately
:EN:- Example: deploying a Consul cluster
:FR:- Lancer des pods ensemble ou séparément
:FR:- Exemple : lancer un cluster Consul

slides/k8s/local-persistent-volumes.md Deleted file

@@ -1,251 +0,0 @@
# Local Persistent Volumes
- We want to run that Consul cluster *and* actually persist data
- But we don't have a distributed storage system
- We are going to use local volumes instead
(similar conceptually to `hostPath` volumes)
- We can use local volumes without installing extra plugins
- However, they are tied to a node
- If that node goes down, the volume becomes unavailable
---
## With or without dynamic provisioning
- We will deploy a Consul cluster *with* persistence
- That cluster's StatefulSet will create PVCs
- These PVCs will remain unbound¹ until we create local volumes manually
(we will basically do the job of the dynamic provisioner)
- Then, we will see how to automate that with a dynamic provisioner
.footnote[¹Unbound = without an associated Persistent Volume.]
---
## If we have a dynamic provisioner ...
- The labs in this section assume that we *do not* have a dynamic provisioner
- If we do have one, we need to disable it
.exercise[
- Check if we have a dynamic provisioner:
```bash
kubectl get storageclass
```
- If the output contains a line with `(default)`, run this command:
```bash
kubectl annotate sc storageclass.kubernetes.io/is-default-class- --all
```
- Check again that it is no longer marked as `(default)`
]
---
## Deploying Consul
- Let's use a new manifest for our Consul cluster
- The only differences between that file and the previous one are:
- `volumeClaimTemplate` defined in the Stateful Set spec
- the corresponding `volumeMounts` in the Pod spec
.exercise[
- Apply the persistent Consul YAML file:
```bash
kubectl apply -f ~/container.training/k8s/consul-3.yaml
```
]
---
## Observing the situation
- Let's look at Persistent Volume Claims and Pods
.exercise[
- Check that we now have an unbound Persistent Volume Claim:
```bash
kubectl get pvc
```
- We don't have any Persistent Volume:
```bash
kubectl get pv
```
- The Pod `consul-0` is not scheduled yet:
```bash
kubectl get pods -o wide
```
]
*Hint: leave these commands running with `-w` in different windows.*
---
## Explanations
- In a Stateful Set, the Pods are started one by one
- `consul-1` won't be created until `consul-0` is running
- `consul-0` has a dependency on an unbound Persistent Volume Claim
- The scheduler won't schedule the Pod until the PVC is bound
(because the PVC might be bound to a volume that is only available on a subset of nodes; for instance EBS volumes are tied to an availability zone)
---
## Creating Persistent Volumes
- Let's create 3 local directories (`/mnt/consul`) on node2, node3, node4
- Then create 3 Persistent Volumes corresponding to these directories
.exercise[
- Create the local directories:
```bash
for NODE in node2 node3 node4; do
ssh $NODE sudo mkdir -p /mnt/consul
done
```
- Create the PV objects:
```bash
kubectl apply -f ~/container.training/k8s/volumes-for-consul.yaml
```
]
---
## Check our Consul cluster
- The PVs that we created will be automatically matched with the PVCs
- Once a PVC is bound, its pod can start normally
- Once the pod `consul-0` has started, `consul-1` can be created, etc.
- Eventually, our Consul cluster is up, and backed by "persistent" volumes
.exercise[
- Check that our Consul cluster indeed has 3 members:
```bash
kubectl exec consul-0 -- consul members
```
]
---
## Devil is in the details (1/2)
- The size of the Persistent Volumes is bogus
(it is used when matching PVs and PVCs together, but there is no actual quota or limit)
---
## Devil is in the details (2/2)
- This specific example worked because we had exactly 1 free PV per node:
- if we had created multiple PVs per node ...
- we could have ended with two PVCs bound to PVs on the same node ...
- which would have required two pods to be on the same node ...
- which is forbidden by the anti-affinity constraints in the StatefulSet
- To avoid that, we need to associate the PVs with a Storage Class that has:
```yaml
volumeBindingMode: WaitForFirstConsumer
```
(this means that a PVC will be bound to a PV only after being used by a Pod)
- See [this blog post](https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/) for more details
---
## Bulk provisioning
- It's not practical to manually create directories and PVs for each app
- We *could* pre-provision a number of PVs across our fleet
- We could even automate that with a Daemon Set:
- creating a number of directories on each node
- creating the corresponding PV objects
- We also need to recycle volumes
- ... This can quickly get out of hand
---
## Dynamic provisioning
- We could also write our own provisioner, which would:
- watch the PVCs across all namespaces
- when a PVC is created, create a corresponding PV on a node
- Or we could use one of the dynamic provisioners for local persistent volumes
(for instance the [Rancher local path provisioner](https://github.com/rancher/local-path-provisioner))
---
## Strategies for local persistent volumes
- Remember, when a node goes down, the volumes on that node become unavailable
- High availability will require another layer of replication
(like what we've just seen with Consul; or primary/secondary; etc)
- Pre-provisioning PVs makes sense for machines with local storage
(e.g. cloud instance storage; or storage directly attached to a physical machine)
- Dynamic provisioning makes sense for large number of applications
(when we can't or won't dedicate a whole disk to a volume)
- It's possible to mix both (using distinct Storage Classes)
???
:EN:- Static vs dynamic volume provisioning
:EN:- Example: local persistent volume provisioner
:FR:- Création statique ou dynamique de volumes
:FR:- Exemple : création de volumes locaux

slides/k8s/openebs.md

@@ -321,207 +321,13 @@ EOF
---
## We're ready now!
- We have a StorageClass that can provision PersistentVolumes
- These PersistentVolumes will be replicated across nodes
- They should be able to withstand single-node failures
???

slides/k8s/portworx.md

@@ -1,42 +1,4 @@
# Portworx
- Portworx is a *commercial* persistent storage solution for containers
@@ -60,7 +22,7 @@
- We're installing Portworx because we need a storage system
- If you are using AKS, EKS, GKE, Kapsule ... you already have a storage system
(but you might want another one, e.g. to leverage local storage)
@@ -301,364 +263,6 @@ parameters:
---
## Our Postgres Stateful set
- The next slide shows `k8s/postgres.yaml`
- It defines a Stateful set
- With a `volumeClaimTemplate` requesting a 1 GB volume
- That volume will be mounted to `/var/lib/postgresql/data`
- There is another little detail: we enable the `stork` scheduler
- The `stork` scheduler is optional (it's specific to Portworx)
- It helps the Kubernetes scheduler to colocate the pod with its volume
(see [this blog post](https://portworx.com/stork-storage-orchestration-kubernetes/) for more details about that)
---
.small[
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  selector:
    matchLabels:
      app: postgres
  serviceName: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      schedulerName: stork
      containers:
      - name: postgres
        image: postgres:12
        env:
        - name: POSTGRES_HOST_AUTH_METHOD
          value: trust
        volumeMounts:
        - mountPath: /var/lib/postgresql/data
          name: postgres
  volumeClaimTemplates:
  - metadata:
      name: postgres
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```
]
---
## Creating the Stateful set
- Before applying the YAML, watch what's going on with `kubectl get events -w`
.exercise[
- Apply that YAML:
```bash
kubectl apply -f ~/container.training/k8s/postgres.yaml
```
<!-- ```hide kubectl wait pod postgres-0 --for condition=ready``` -->
]
---
## Testing our PostgreSQL pod
- We will use `kubectl exec` to get a shell in the pod
- Good to know: we need to use the `postgres` user in the pod
.exercise[
- Get a shell in the pod, as the `postgres` user:
```bash
kubectl exec -ti postgres-0 -- su postgres
```
<!--
autopilot prompt detection expects $ or # at the beginning of the line.
```wait postgres@postgres```
```keys PS1="\u@\h:\w\n\$ "```
```key ^J```
-->
- Check that default databases have been created correctly:
```bash
psql -l
```
]
(This should show us 3 lines: postgres, template0, and template1.)
---
## Inserting data in PostgreSQL
- We will create a database and populate it with `pgbench`
.exercise[
- Create a database named `demo`:
```bash
createdb demo
```
- Populate it with `pgbench`:
```bash
pgbench -i demo
```
]
- The `-i` flag means "create tables"
- If you want more data in the test tables, add e.g. `-s 10` (to get 10x more rows)
---
## Checking how much data we have now
- The `pgbench` tool inserts rows in table `pgbench_accounts`
.exercise[
- Check that the `demo` database exists:
```bash
psql -l
```
- Check how many rows we have in `pgbench_accounts`:
```bash
psql demo -c "select count(*) from pgbench_accounts"
```
- Check that `pgbench_history` is currently empty:
```bash
psql demo -c "select count(*) from pgbench_history"
```
]
---
## Testing the load generator
- Let's use `pgbench` to generate a few transactions
.exercise[
- Run `pgbench` for 10 seconds, reporting progress every second:
```bash
pgbench -P 1 -T 10 demo
```
- Check the size of the history table now:
```bash
psql demo -c "select count(*) from pgbench_history"
```
]
Note: on small cloud instances, a typical speed is about 100 transactions/second.
---
## Generating transactions
- Now let's use `pgbench` to generate more transactions
- While it's running, we will disrupt the database server
.exercise[
- Run `pgbench` for 10 minutes, reporting progress every second:
```bash
pgbench -P 1 -T 600 demo
```
- You can use a longer time period if you need more time to run the next steps
<!-- ```tmux split-pane -h``` -->
]
---
## Find out which node is hosting the database
- We can find that information with `kubectl get pods -o wide`
.exercise[
- Check the node running the database:
```bash
kubectl get pod postgres-0 -o wide
```
]
We are going to disrupt that node.
--
By "disrupt" we mean: "disconnect it from the network".
---
## Disconnect the node
- We will use `iptables` to block all traffic exiting the node
(except SSH traffic, so we can repair the node later if needed)
.exercise[
- SSH to the node to disrupt:
```bash
ssh `nodeX`
```
- Allow SSH traffic leaving the node, but block all other traffic:
```bash
sudo iptables -I OUTPUT -p tcp --sport 22 -j ACCEPT
sudo iptables -I OUTPUT 2 -j DROP
```
]
---
## Check that the node is disconnected
.exercise[
- Check that the node can't communicate with other nodes:
```
ping node1
```
- Logout to go back on `node1`
<!-- ```key ^D``` -->
- Watch the events unfolding with `kubectl get events -w` and `kubectl get pods -w`
]
- It will take some time for Kubernetes to mark the node as unhealthy
- Then it will attempt to reschedule the pod to another node
- In about a minute, our pod should be up and running again
---
## Check that our data is still available
- We are going to reconnect to the (new) pod and check
.exercise[
- Get a shell on the pod:
```bash
kubectl exec -ti postgres-0 -- su postgres
```
<!--
```wait postgres@postgres```
```keys PS1="\u@\h:\w\n\$ "```
```key ^J```
-->
- Check how many transactions are now in the `pgbench_history` table:
```bash
psql demo -c "select count(*) from pgbench_history"
```
<!-- ```key ^D``` -->
]
If the 10-second test that we ran earlier gave e.g. 80 transactions per second,
and we failed the node after 30 seconds, we should have about 2400 rows in that table.
---
## Double-check that the pod has really moved
- Just to make sure the system is not bluffing!
.exercise[
- Look at which node the pod is now running on
```bash
kubectl get pod postgres-0 -o wide
```
]
---
## Re-enable the node
- Let's fix the node that we disconnected from the network
.exercise[
- SSH to the node:
```bash
ssh `nodeX`
```
- Remove the iptables rule blocking traffic:
```bash
sudo iptables -D OUTPUT 2
```
]
---
class: extra-details
## A few words about this PostgreSQL setup
- In a real deployment, you would want to set a password
- This can be done by creating a `secret`:
```
kubectl create secret generic postgres \
--from-literal=password=$(base64 /dev/urandom | head -c16)
```
- And then passing that secret to the container:
```yaml
env:
- name: POSTGRES_PASSWORD
  valueFrom:
    secretKeyRef:
      name: postgres
      key: password
```
---
class: extra-details
## Troubleshooting Portworx
@@ -666,7 +270,7 @@ class: extra-details
- If we need to see what's going on with Portworx:
```
PXPOD=$(kubectl -n kube-system get pod -l name=portworx -o json |
  jq -r .items[0].metadata.name)
kubectl -n kube-system exec $PXPOD -- /opt/pwx/bin/pxctl status
```
@@ -709,26 +313,6 @@ class: extra-details
---
class: extra-details
## Dynamic provisioning without a provider
- What if we want to use Stateful sets without a storage provider?
- We will have to create volumes manually
(by creating Persistent Volume objects)
- These volumes will be automatically bound with matching Persistent Volume Claims
- We can use local volumes (essentially bind mounts of host directories)
- Of course, these volumes won't be available in case of node failure
- Check [this blog post](https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/) for more information and gotchas
---
## Acknowledgements
The Portworx installation tutorial, and the PostgreSQL example,
@@ -748,8 +332,5 @@ were inspired by [Portworx examples on Katacoda](https://katacoda.com/portworx/s
???
:EN:- Using highly available persistent volumes
:EN:- Example: deploying a database that can withstand node outages
:FR:- Utilisation de volumes à haute disponibilité
:FR:- Exemple : déployer une base de données survivant à la défaillance d'un nœud
:EN:- Hyperconverged storage with Portworx
:FR:- Stockage hyperconvergé avec Portworx

slides/k8s/pv-pvc-sc.md Normal file

@@ -0,0 +1,323 @@
# PV, PVC, and Storage Classes
- When an application needs storage, it creates a PersistentVolumeClaim
(either directly, or through a volume claim template in a Stateful Set)
- The PersistentVolumeClaim is initially `Pending`
- Kubernetes then looks for a suitable PersistentVolume
(maybe one is immediately available; maybe we need to wait for provisioning)
- Once a suitable PersistentVolume is found, the PVC becomes `Bound`
- The PVC can then be used by a Pod
(as long as the PVC is `Pending`, the Pod cannot run)
---
## Access modes
- PV and PVC have *access modes*:
- ReadWriteOnce (only one node can access the volume at a time)
- ReadWriteMany (multiple nodes can access the volume simultaneously)
- ReadOnlyMany (multiple nodes can access, but they can't write)
- ReadWriteOncePod (only one pod can access the volume; new in Kubernetes 1.22)
- A PV lists the access modes that it supports
- A PVC lists the access modes that it requires
⚠️ A PV with only ReadWriteMany won't satisfy a PVC with ReadWriteOnce!
---
## Capacity
- A PVC must express a storage size request
(field `spec.resources.requests.storage`, in bytes)
- A PV must express its size
(field `spec.capacity.storage`, in bytes)
- Kubernetes will only match a PV and PVC if the PV is big enough
- These fields are only used for "matchmaking" purposes:
- nothing prevents the Pod mounting the PVC from using more space
- nothing requires the PV to actually be that big
---
## Storage Class
- What if we have multiple storage systems available?
(e.g. NFS and iSCSI; or AzureFile and AzureDisk; or Cinder and Ceph...)
- What if we have a storage system with multiple tiers?
(e.g. SAN with RAID1 and RAID5; general purpose vs. io optimized EBS...)
- Kubernetes lets us define *storage classes* to represent these
(see if you have any available at the moment with `kubectl get storageclasses`)
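For example, a class for the in-tree AWS EBS provisioner might look like this (a sketch; the provisioner and its parameters depend on the storage system):
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast                          # hypothetical name
provisioner: kubernetes.io/aws-ebs    # which controller creates the volumes
parameters:
  type: gp2                           # provisioner-specific parameter
```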
---
## Using storage classes
- Optionally, each PV and each PVC can reference a StorageClass
(field `spec.storageClassName`)
- When creating a PVC, specifying a StorageClass means
“use that particular storage system to provision the volume!”
- Storage classes are necessary for [dynamic provisioning](https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/)
(but we can also ignore them and perform manual provisioning)
---
## Default storage class
- We can define a *default storage class*
(by annotating it with `storageclass.kubernetes.io/is-default-class=true`)
- When a PVC is created,
**IF** it doesn't indicate which storage class to use
**AND** there is a default storage class
**THEN** the PVC `storageClassName` is set to the default storage class
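Concretely, the default class is simply the one carrying this annotation (set at creation time, or later with `kubectl annotate`):
```yaml
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
```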
---
## Additional constraints
- A PersistentVolumeClaim can also specify a volume selector
(referring to labels on the PV)
- A PersistentVolume can also be created with a `claimRef`
(indicating to which PVC it should be bound)
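For instance, a PVC restricted to PVs carrying a specific label could look like this sketch (the `disk` label is hypothetical; a `claimRef` example is shown commented out in `k8s/pv.yaml` above):
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-claim
spec:
  accessModes: [ ReadWriteOnce ]
  resources:
    requests:
      storage: 1G
  selector:
    matchLabels:
      disk: ssd        # hypothetical label that the PV must carry
```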
---
class: extra-details
## Which PV gets associated to a PVC?
- The PV must be `Available`
- The PV must satisfy the PVC constraints
(access mode, size, optional selector, optional storage class)
- The PVs with the closest access mode are picked
- Then the PVs with the closest size
- It is possible to specify a `claimRef` when creating a PV
(this will associate it to the specified PVC, but only if the PV satisfies all the requirements of the PVC; otherwise another PV might end up being picked)
- For all the details about the PersistentVolumeClaimBinder, check [this doc](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/persistent-storage.md#matching-and-binding)
---
## Creating a PVC
- Let's create a standalone PVC and see what happens!
.exercise[
- Check if we have a StorageClass:
```bash
kubectl get storageclasses
```
- Create the PVC:
```bash
kubectl create -f ~/container.training/k8s/pvc.yaml
```
- Check the PVC:
```bash
kubectl get pvc
```
]
---
## Four possibilities
1. If we have a default StorageClass with *immediate* binding:
*a PV was created and associated to the PVC*
2. If we have a default StorageClass that *waits for first consumer*:
*the PVC is still `Pending` but has a `STORAGECLASS`* ⚠️
3. If we don't have a default StorageClass:
*the PVC is still `Pending`, without a `STORAGECLASS`*
4. If we have a StorageClass, but it doesn't work:
*the PVC is still `Pending` but has a `STORAGECLASS`* ⚠️
---
## Immediate vs WaitForFirstConsumer
- Immediate = as soon as there is a `Pending` PVC, create a PV
- What if:
- the PV is only available on a node (e.g. local volume)
- ...or on a subset of nodes (e.g. SAN HBA, EBS AZ...)
- the Pod that will use the PVC has scheduling constraints
- these constraints turn out to be incompatible with the PV
- WaitForFirstConsumer = don't provision the PV until a Pod mounts the PVC
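A Storage Class for manually provisioned local volumes would typically use that mode; here is a minimal sketch:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-sc                             # hypothetical name
provisioner: kubernetes.io/no-provisioner    # this class won't provision dynamically
volumeBindingMode: WaitForFirstConsumer      # bind only when a Pod needs the PVC
```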
---
## Using the PVC
- Let's mount the PVC in a Pod
- We will use a stray Pod (no Deployment, StatefulSet, etc.)
- We will use @@LINK[k8s/mounter.yaml], shown on the next slide
- We'll need to update the `claimName`! ⚠️
---
```yaml
@@INCLUDE[k8s/mounter.yaml]
```
---
## Running the Pod
.exercise[
- Edit the `mounter.yaml` manifest
- Update the `claimName` to match the name of our PVC
- Create the Pod
- Check the status of the PV and PVC
]
Note: this "mounter" Pod can be useful to inspect the content of a PVC.
---
## Scenario 1 & 2
If we have a default Storage Class that can provision PVC dynamically...
- We should now have a new PV
- The PV and the PVC should be `Bound` together
---
## Scenario 3
If we don't have a default Storage Class, we must create the PV manually.
```bash
kubectl create -f ~/container.training/k8s/pv.yaml
```
After a few seconds, check that the PV and PVC are bound:
```bash
kubectl get pv,pvc
```
---
## Scenario 4
If our default Storage Class can't provision a PV, let's do it manually.
The PV must specify the correct `storageClassName`.
```bash
STORAGECLASS=$(kubectl get pvc --selector=container.training/pvc \
-o jsonpath={..storageClassName})
kubectl patch -f ~/container.training/k8s/pv.yaml --dry-run=client -o yaml \
--patch '{"spec": {"storageClassName": "'$STORAGECLASS'"}}' \
| kubectl create -f-
```
Check that the PV and PVC are bound:
```bash
kubectl get pv,pvc
```
---
## Checking the Pod
- If the PVC was `Pending`, then the Pod was `Pending` too
- Once the PVC is `Bound`, the Pod can be scheduled and can run
- Once the Pod is `Running`, check it out with `kubectl attach -ti`
---
## PV and PVC lifecycle
- We can't delete a PV if it's `Bound`
- If we `kubectl delete` it, it goes to `Terminating` state
- We can't delete a PVC if it's in use by a Pod
- Likewise, if we `kubectl delete` it, it goes to `Terminating` state
- Deletion is prevented by *finalizers*
(=like a post-it note saying “don't delete me!”)
- When the mounting Pods are deleted, their PVCs are freed up
- When PVCs are deleted, their PVs are freed up
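For example, a PVC that is in use typically carries this finalizer (visible with `kubectl get pvc -o yaml`):
```yaml
metadata:
  finalizers:
  - kubernetes.io/pvc-protection   # removed by the control plane once no Pod uses the PVC
```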
???
:EN:- Storage provisioning
:EN:- PV, PVC, StorageClass
:FR:- Création de volumes
:FR:- PV, PVC, et StorageClass

slides/k8s/stateful-failover.md Normal file

@@ -0,0 +1,468 @@
# Stateful failover
- How can we achieve true durability?
- How can we store data that would survive the loss of a node?
--
- We need to use Persistent Volumes backed by highly available storage systems
- There are many ways to achieve that:
- leveraging our cloud's storage APIs
- using NAS/SAN systems or file servers
- distributed storage systems
---
## Our test scenario
- We will deploy a SQL database (PostgreSQL)
- We will insert some test data in the database
- We will disrupt the node running the database
- We will see how it recovers
---
## Our Postgres Stateful set
- The next slide shows `k8s/postgres.yaml`
- It defines a Stateful set
- With a `volumeClaimTemplate` requesting a 1 GB volume
- That volume will be mounted to `/var/lib/postgresql/data`
---
.small[.small[
```yaml
@@INCLUDE[k8s/postgres.yaml]
```
]]
---
## Creating the Stateful set
- Before applying the YAML, watch what's going on with `kubectl get events -w`
.exercise[
- Apply that YAML:
```bash
kubectl apply -f ~/container.training/k8s/postgres.yaml
```
<!-- ```hide kubectl wait pod postgres-0 --for condition=ready``` -->
]
---
## Testing our PostgreSQL pod
- We will use `kubectl exec` to get a shell in the pod
- Good to know: we need to use the `postgres` user in the pod
.exercise[
- Get a shell in the pod, as the `postgres` user:
```bash
kubectl exec -ti postgres-0 -- su postgres
```
<!--
autopilot prompt detection expects $ or # at the beginning of the line.
```wait postgres@postgres```
```keys PS1="\u@\h:\w\n\$ "```
```key ^J```
-->
- Check that default databases have been created correctly:
```bash
psql -l
```
]
(This should show us 3 lines: postgres, template0, and template1.)
---
## Inserting data in PostgreSQL
- We will create a database and populate it with `pgbench`
.exercise[
- Create a database named `demo`:
```bash
createdb demo
```
- Populate it with `pgbench`:
```bash
pgbench -i demo
```
]
- The `-i` flag means "create tables"
- If you want more data in the test tables, add e.g. `-s 10` (to get 10x more rows)
---
## Checking how much data we have now
- The `pgbench` tool inserts rows in table `pgbench_accounts`
.exercise[
- Check that the `demo` database exists:
```bash
psql -l
```
- Check how many rows we have in `pgbench_accounts`:
```bash
psql demo -c "select count(*) from pgbench_accounts"
```
- Check that `pgbench_history` is currently empty:
```bash
psql demo -c "select count(*) from pgbench_history"
```
]
---
## Testing the load generator
- Let's use `pgbench` to generate a few transactions
.exercise[
- Run `pgbench` for 10 seconds, reporting progress every second:
```bash
pgbench -P 1 -T 10 demo
```
- Check the size of the history table now:
```bash
psql demo -c "select count(*) from pgbench_history"
```
]
Note: on small cloud instances, a typical speed is about 100 transactions/second.
---
## Generating transactions
- Now let's use `pgbench` to generate more transactions
- While it's running, we will disrupt the database server
.exercise[
- Run `pgbench` for 10 minutes, reporting progress every second:
```bash
pgbench -P 1 -T 600 demo
```
- You can use a longer time period if you need more time to run the next steps
<!-- ```tmux split-pane -h``` -->
]
---
## Find out which node is hosting the database
- We can find that information with `kubectl get pods -o wide`
.exercise[
- Check the node running the database:
```bash
kubectl get pod postgres-0 -o wide
```
]
We are going to disrupt that node.
--
By "disrupt" we mean: "disconnect it from the network".
---
## Node failover
⚠️ This will partially break your cluster!
- We are going to disconnect the node running PostgreSQL from the cluster
- We will see what happens, and how to recover
- We will not reconnect the node to the cluster
- This whole lab will take at least 10-15 minutes (due to various timeouts)
⚠️ Only do this lab at the very end, when you don't want to run anything else after!
---
## Disconnecting the node from the cluster
.exercise[
- Find out where the Pod is running, and SSH into that node:
```bash
kubectl get pod postgres-0 -o jsonpath={.spec.nodeName}
ssh nodeX
```
- Check the name of the network interface:
```bash
sudo ip route ls default
```
- The output should look like this:
```
default via 10.10.0.1 `dev ensX` proto dhcp src 10.10.0.13 metric 100
```
- Shutdown the network interface:
```bash
sudo ip link set ensX down
```
]
---
class: extra-details
## Another way to disconnect the node
- We can also use `iptables` to block all traffic exiting the node
(except SSH traffic, so we can repair the node later if needed)
.exercise[
- SSH to the node to disrupt:
```bash
ssh `nodeX`
```
- Allow SSH traffic leaving the node, but block all other traffic:
```bash
sudo iptables -I OUTPUT -p tcp --sport 22 -j ACCEPT
sudo iptables -I OUTPUT 2 -j DROP
```
]
---
## Watch what's going on
- Let's look at the status of Nodes, Pods, and Events
.exercise[
- In a first pane/tab/window, check Nodes and Pods:
```bash
watch kubectl get nodes,pods -o wide
```
- In another pane/tab/window, check Events:
```bash
kubectl get events --watch
```
]
---
## Node Ready → NotReady
- After \~30 seconds, the control plane stops receiving heartbeats from the Node
- The Node is marked NotReady
- It is not *schedulable* anymore
(the scheduler won't place new pods there, except some special cases)
- All Pods on that Node are also *not ready*
(they get removed from service Endpoints)
- ... But nothing else happens for now
(the control plane is waiting: maybe the Node will come back shortly?)
---
## Pod eviction
- After \~5 minutes, the control plane will evict most Pods from the Node
- These Pods are now `Terminating`
- The Pods controlled by e.g. ReplicaSets are automatically moved
(or rather: new Pods are created to replace them)
- But nothing happens to the Pods controlled by StatefulSets at this point
(they remain `Terminating` forever)
- Why? 🤔
--
- This is to avoid *split brain scenarios*
---
class: extra-details
## Split brain 🧠⚡️🧠
- Imagine that we create a replacement pod `postgres-0` on another Node
- And 15 minutes later, the Node is reconnected and the original `postgres-0` comes back
- Which one is the "right" one?
- What if they have conflicting data?
😱
- We *cannot* let that happen!
- Kubernetes won't do it
- ... Unless we tell it to
---
## The Node is gone
- One thing we can do is tell Kubernetes "the Node won't come back"
(there are other methods; but this one is the simplest one here)
- This is done with a simple `kubectl delete node`
.exercise[
- `kubectl delete` the Node that we disconnected
]
---
## Pod rescheduling
- Kubernetes removes the Node
- After a brief period of time (\~1 minute) the "Terminating" Pods are removed
- A replacement Pod is created on another Node
- ... But it doesn't start yet!
- Why? 🤔
---
## Multiple attachment
- By default, a disk can only be attached to one Node at a time
(sometimes it's a hardware or API limitation; sometimes enforced in software)
- In our Events, we should see `FailedAttachVolume` and `FailedMount` messages
- After \~5 more minutes, the disk will be force-detached from the old Node
- ... Which will allow attaching it to the new Node!
🎉
- The Pod will then be able to start
- Failover is complete!
---
## Check that our data is still available
- We are going to reconnect to the (new) pod and check
.exercise[
- Get a shell on the pod:
```bash
kubectl exec -ti postgres-0 -- su postgres
```
<!--
```wait postgres@postgres```
```keys PS1="\u@\h:\w\n\$ "```
```key ^J```
-->
- Check how many transactions are now in the `pgbench_history` table:
```bash
psql demo -c "select count(*) from pgbench_history"
```
<!-- ```key ^D``` -->
]
If the 10-second test that we ran earlier gave e.g. 80 transactions per second,
and we failed the node after 30 seconds, we should have about 2400 rows in that table.
---
## Double-check that the pod has really moved
- Just to make sure the system is not bluffing!
.exercise[
- Look at which node the pod is now running on
```bash
kubectl get pod postgres-0 -o wide
```
]
???
:EN:- Using highly available persistent volumes
:EN:- Example: deploying a database that can withstand node outages
:FR:- Utilisation de volumes à haute disponibilité
:FR:- Exemple : déployer une base de données survivant à la défaillance d'un nœud

slides/k8s/statefulsets.md

@@ -6,7 +6,7 @@
- They offer mechanisms to deploy scaled stateful applications
- At a first glance, they look like Deployments:
- a stateful set defines a pod spec and a number of replicas *R*
@@ -182,503 +182,30 @@ spec:
- These pods can each have their own persistent storage
(Deployments cannot do that)
---
## Obtaining per-pod storage
- Stateful Sets can have *persistent volume claim templates*
(declared in `spec.volumeClaimTemplates` in the Stateful set manifest)
- A claim template will create one Persistent Volume Claim per pod
(the PVC will be named `<claim-name>-<stateful-set-name>-<pod-index>`)
- Persistent Volume Claims are matched 1-to-1 with Persistent Volumes
- Persistent Volume provisioning can be done:
- automatically (by leveraging *dynamic provisioning* with a Storage Class)
- manually (human operator creates the volumes ahead of time, or when needed)
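As a sketch, such a claim template could look like this (size and name are illustrative):
```yaml
volumeClaimTemplates:
- metadata:
    name: data                     # yields PVCs named data-<set-name>-0, data-<set-name>-1, ...
  spec:
    accessModes: [ ReadWriteOnce ]
    resources:
      requests:
        storage: 1G
```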
---
# Persistent Volume Claims
- Our Pods can use a special volume type: a *Persistent Volume Claim*
- A Persistent Volume Claim (PVC) is also a Kubernetes resource
(visible with `kubectl get persistentvolumeclaims` or `kubectl get pvc`)
- A PVC is not a volume; it is a *request for a volume*
- It should indicate at least:
- the size of the volume (e.g. "5 GiB")
- the access mode (e.g. "read-write by a single pod")
---
## What's in a PVC?
- A PVC contains at least:
- a list of *access modes* (ReadWriteOnce, ReadOnlyMany, ReadWriteMany)
- a size (interpreted as the minimal storage space needed)
- It can also contain optional elements:
- a selector (to restrict which actual volumes it can use)
- a *storage class* (used by dynamic provisioning, more on that later)
---
## What does a PVC look like?
Here is a manifest for a basic PVC:
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```
---
## Using a Persistent Volume Claim
Here is a Pod definition like the ones shown earlier, but using a PVC:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-using-a-claim
spec:
  containers:
  - image: ...
    name: container-using-a-claim
    volumeMounts:
    - mountPath: /my-vol
      name: my-volume
  volumes:
  - name: my-volume
    persistentVolumeClaim:
      claimName: my-claim
```
---
## Creating and using Persistent Volume Claims
- PVCs can be created manually and used explicitly
(as shown on the previous slides)
- They can also be created and used through Stateful Sets
(this will be shown later)
---
## Lifecycle of Persistent Volume Claims
- When a PVC is created, it starts existing in "Unbound" state
(without an associated volume)
- A Pod referencing an unbound PVC will not start
(the scheduler will wait until the PVC is bound to place it)
- A special controller continuously monitors PVCs to associate them with PVs
- If no PV is available, one must be created:
- manually (by operator intervention)
- using a *dynamic provisioner* (more on that later)
---
class: extra-details
## Which PV gets associated to a PVC?
- The PV must satisfy the PVC constraints
(access mode, size, optional selector, optional storage class)
- The PVs with the closest access mode are picked
- Then the PVs with the closest size
- It is possible to specify a `claimRef` when creating a PV
(this will associate it to the specified PVC, but only if the PV satisfies all the requirements of the PVC; otherwise another PV might end up being picked)
- For all the details about the PersistentVolumeClaimBinder, check [this doc](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/persistent-storage.md#matching-and-binding)
---
## Persistent Volume Claims and Stateful sets
- A Stateful set can define one (or more) `volumeClaimTemplate`
- Each `volumeClaimTemplate` will create one Persistent Volume Claim per pod
- Each pod will therefore have its own individual volume
- These volumes are numbered (like the pods)
- Example:
- a Stateful set is named `db`
- it is scaled to 3 replicas
- it has a `volumeClaimTemplate` named `data`
- then it will create pods `db-0`, `db-1`, `db-2`
- these pods will have volumes named `data-db-0`, `data-db-1`, `data-db-2`
---
## Persistent Volume Claims are sticky
- When updating the stateful set (e.g. image upgrade), each pod keeps its volume
- When pods get rescheduled (e.g. node failure), they keep their volume
(this requires a storage system that is not node-local)
- These volumes are not automatically deleted
(when the stateful set is scaled down or deleted)
- If a stateful set is scaled back up later, the pods get their data back
---
## Dynamic provisioners
- A *dynamic provisioner* monitors unbound PVCs
- It can create volumes (and the corresponding PV) on the fly
- This requires the PVCs to have a *storage class*
(annotation `volume.beta.kubernetes.io/storage-provisioner`)
- A dynamic provisioner only acts on PVCs with the right storage class
(it ignores the other ones)
- Just like `LoadBalancer` services, dynamic provisioners are optional
(i.e. our cluster may or may not have one pre-installed)
---
## What's a Storage Class?
- A Storage Class is yet another Kubernetes API resource
(visible with e.g. `kubectl get storageclass` or `kubectl get sc`)
- It indicates which *provisioner* to use
(which controller will create the actual volume)
- And arbitrary parameters for that provisioner
(replication levels, type of disk ... anything relevant!)
- Storage Classes are required if we want to use [dynamic provisioning](https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/)
(but we can also create volumes manually, and ignore Storage Classes)
---
## The default storage class
- At most one storage class can be marked as the default class
(by annotating it with `storageclass.kubernetes.io/is-default-class=true`)
- When a PVC is created, it will be annotated with the default storage class
(unless it specifies an explicit storage class)
- This only happens at PVC creation
(existing PVCs are not updated when we mark a class as the default one)
---
## Dynamic provisioning setup
This is how we can achieve fully automated provisioning of persistent storage.
1. Configure a storage system.
(It needs to have an API, or be capable of automated provisioning of volumes.)
2. Install a dynamic provisioner for this storage system.
(This is some specific controller code.)
3. Create a Storage Class for this system.
(It has to match what the dynamic provisioner is expecting.)
4. Annotate the Storage Class to be the default one.
---
## Dynamic provisioning usage
After setting up the system (previous slide), all we need to do is:
*Create a Stateful Set that makes use of a `volumeClaimTemplate`.*
This will trigger the following actions.
1. The Stateful Set creates PVCs according to the `volumeClaimTemplate`.
2. The Stateful Set creates Pods using these PVCs.
3. The PVCs are automatically annotated with our Storage Class.
4. The dynamic provisioner provisions volumes and creates the corresponding PVs.
5. The PersistentVolumeClaimBinder associates the PVs and the PVCs together.
6. PVCs are now bound, the Pods can start.
???
:EN:- Deploying apps with Stateful Sets
:EN:- Example: deploying a Consul cluster
:EN:- Understanding Persistent Volume Claims and Storage Classes
:FR:- Déployer une application avec un *Stateful Set*
:FR:- Exemple : lancer un cluster Consul
:FR:- Comprendre les *Persistent Volume Claims* et *Storage Classes*

slides/k8s/volume-claim-templates.md Normal file

@@ -0,0 +1,314 @@
## Putting it all together
- We want to run that Consul cluster *and* actually persist data
- We'll use a StatefulSet that will leverage PV and PVC
- If we have a dynamic provisioner:
*the cluster will come up right away*
- If we don't have a dynamic provisioner:
*we will need to create Persistent Volumes manually*
---
## Persistent Volume Claims and Stateful sets
- A Stateful set can define one (or more) `volumeClaimTemplate`
- Each `volumeClaimTemplate` will create one Persistent Volume Claim per Pod
- Each Pod will therefore have its own individual volume
- These volumes are numbered (like the Pods)
- Example:
- a Stateful set is named `consul`
- it is scaled to 3 replicas
- it has a `volumeClaimTemplate` named `data`
- then it will create pods `consul-0`, `consul-1`, `consul-2`
- these pods will have volumes named `data`, referencing PersistentVolumeClaims
named `data-consul-0`, `data-consul-1`, `data-consul-2`
---
## Persistent Volume Claims are sticky
- When updating the stateful set (e.g. image upgrade), each pod keeps its volume
- When pods get rescheduled (e.g. node failure), they keep their volume
(this requires a storage system that is not node-local)
- These volumes are not automatically deleted
(when the stateful set is scaled down or deleted)
- If a stateful set is scaled back up later, the pods get their data back
---
## Deploying Consul
- Let's use a new manifest for our Consul cluster
- The only differences between that file and the previous one are:
- `volumeClaimTemplate` defined in the Stateful Set spec
- the corresponding `volumeMounts` in the Pod spec
.exercise[
- Apply the persistent Consul YAML file:
```bash
kubectl apply -f ~/container.training/k8s/consul-3.yaml
```
]
---
## No dynamic provisioner
- If we don't have a dynamic provisioner, we need to create the PVs
- We are going to use local volumes
(similar conceptually to `hostPath` volumes)
- We can use local volumes without installing extra plugins
- However, they are tied to a node
- If that node goes down, the volume becomes unavailable
---
## Observing the situation
- Let's look at Persistent Volume Claims and Pods
.exercise[
- Check that we now have an unbound Persistent Volume Claim:
```bash
kubectl get pvc
```
- We don't have any Persistent Volume:
```bash
kubectl get pv
```
- The Pod `consul-0` is not scheduled yet:
```bash
kubectl get pods -o wide
```
]
*Hint: leave these commands running with `-w` in different windows.*
---
## Explanations
- In a Stateful Set, the Pods are started one by one
- `consul-1` won't be created until `consul-0` is running
- `consul-0` has a dependency on an unbound Persistent Volume Claim
- The scheduler won't schedule the Pod until the PVC is bound
(because the PVC might be bound to a volume that is only available on a subset of nodes; for instance EBS volumes are tied to an availability zone)
---
## Creating Persistent Volumes
- Let's create 3 local directories (`/mnt/consul`) on node2, node3, node4
- Then create 3 Persistent Volumes corresponding to these directories
.exercise[
- Create the local directories:
```bash
for NODE in node2 node3 node4; do
ssh $NODE sudo mkdir -p /mnt/consul
done
```
- Create the PV objects:
```bash
kubectl apply -f ~/container.training/k8s/volumes-for-consul.yaml
```
]
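Each PV in that file presumably looks something like this sketch (a `local` volume pinned to its node with `nodeAffinity`; the exact definitions are in `k8s/volumes-for-consul.yaml`):
```yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: consul-pv-node2            # hypothetical name
spec:
  capacity:
    storage: 1G
  accessModes: [ ReadWriteOnce ]
  local:
    path: /mnt/consul
  nodeAffinity:                    # a local PV must declare which node it lives on
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: [ node2 ]
```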
---
## Check our Consul cluster
- The PVs that we created will be automatically matched with the PVCs
- Once a PVC is bound, its pod can start normally
- Once the pod `consul-0` has started, `consul-1` can be created, etc.
- Eventually, our Consul cluster is up, and backed by "persistent" volumes
.exercise[
- Check that our Consul cluster indeed has 3 members:
```bash
kubectl exec consul-0 -- consul members
```
]
---
## Devil is in the details (1/2)
- The size of the Persistent Volumes is bogus
(it is used when matching PVs and PVCs together, but there is no actual quota or limit)
- The Pod might end up using more than the requested size
- The PV may or may not have the capacity that it's advertising
- It works well with dynamically provisioned block volumes
- ...Less so in other scenarios!
---
## Devil is in the details (2/2)
- This specific example worked because we had exactly 1 free PV per node:
- if we had created multiple PVs per node ...
- we could have ended with two PVCs bound to PVs on the same node ...
- which would have required two pods to be on the same node ...
- which is forbidden by the anti-affinity constraints in the StatefulSet
- To avoid that, we need to associate the PVs with a Storage Class that has:
```yaml
volumeBindingMode: WaitForFirstConsumer
```
(this means that a PVC will be bound to a PV only after being used by a Pod)
- See [this blog post](https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/) for more details
---
## If we have a dynamic provisioner
These are the steps when dynamic provisioning happens:
1. The Stateful Set creates PVCs according to the `volumeClaimTemplate`.
2. The Stateful Set creates Pods using these PVCs.
3. The PVCs are automatically annotated with our Storage Class.
4. The dynamic provisioner provisions volumes and creates the corresponding PVs.
5. The PersistentVolumeClaimBinder associates the PVs and the PVCs together.
6. PVCs are now bound, the Pods can start.
---
## Validating persistence (1)
- When the StatefulSet is deleted, the PVC and PV still exist
- And if we recreate an identical StatefulSet, the PVC and PV are reused
- Let's see that!
.exercise[
- Put some data in Consul:
```bash
kubectl exec consul-0 -- consul kv put answer 42
```
- Delete the Consul cluster:
```bash
kubectl delete -f ~/container.training/k8s/consul-3.yaml
```
]
---
## Validating persistence (2)
.exercise[
- Wait until the last Pod is deleted:
```bash
kubectl wait pod consul-0 --for=delete
```
- Check that PV and PVC are still here:
```bash
kubectl get pv,pvc
```
]
---
## Validating persistence (3)
.exercise[
- Re-create the cluster:
```bash
kubectl apply -f ~/container.training/k8s/consul-3.yaml
```
- Wait until it's up
- Then access the key that we set earlier:
```bash
kubectl exec consul-0 -- consul kv get answer
```
]
---
## Cleaning up
- PV and PVC don't get deleted automatically
- This is great (less risk of accidental data loss)
- This is not great (storage usage increases)
- Managing PVC lifecycle:
- remove them manually
- add their StatefulSet to their `ownerReferences`
- delete the Namespace that they belong to
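The `ownerReferences` approach could look like this sketch, patched onto each PVC (the `uid` must be the one of the live StatefulSet):
```yaml
metadata:
  name: data-consul-0
  ownerReferences:
  - apiVersion: apps/v1
    kind: StatefulSet
    name: consul
    uid: 00000000-0000-0000-0000-000000000000   # placeholder; use the real UID
```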
???
:EN:- Defining volumeClaimTemplates
:FR:- Définir des volumeClaimTemplates


@@ -84,5 +84,9 @@ content:
- k8s/configuration.md
- k8s/secrets.md
- k8s/statefulsets.md
- k8s/local-persistent-volumes.md
- k8s/portworx.md
- k8s/consul.md
- k8s/pv-pvc-sc.md
- k8s/volume-claim-templates.md
#- k8s/portworx.md
- k8s/openebs.md
- k8s/stateful-failover.md


@@ -110,8 +110,12 @@ content:
#- k8s/prometheus.md
#- k8s/prometheus-stack.md
#- k8s/statefulsets.md
#- k8s/local-persistent-volumes.md
#- k8s/consul.md
#- k8s/pv-pvc-sc.md
#- k8s/volume-claim-templates.md
#- k8s/portworx.md
#- k8s/openebs.md
#- k8s/stateful-failover.md
#- k8s/extending-api.md
#- k8s/crd.md
#- k8s/admission.md


@@ -112,9 +112,12 @@ content:
- k8s/configuration.md
- k8s/secrets.md
- k8s/statefulsets.md
- k8s/local-persistent-volumes.md
- k8s/consul.md
- k8s/pv-pvc-sc.md
- k8s/volume-claim-templates.md
- k8s/portworx.md
- k8s/openebs.md
- k8s/stateful-failover.md
-
- k8s/logs-centralized.md
- k8s/prometheus.md


@@ -110,9 +110,12 @@ content:
#- k8s/prometheus-stack.md
-
- k8s/statefulsets.md
- k8s/local-persistent-volumes.md
- k8s/portworx.md
#- k8s/openebs.md
- k8s/consul.md
- k8s/pv-pvc-sc.md
- k8s/volume-claim-templates.md
#- k8s/portworx.md
- k8s/openebs.md
- k8s/stateful-failover.md
#- k8s/extending-api.md
#- k8s/admission.md
#- k8s/operators.md