✨ Docker content for derivco

2026-03-02 17:30:20 +00:00 · 2021-11-11 09:38:21 +01:00
222 changed files with 2920 additions and 8051 deletions
--- a/dockercoins/Tiltfile
+++ b/dockercoins/Tiltfile
@@ -1,67 +1,14 @@
-# (1) Setting up a registry, and telling Tilt to use it.
-
-# Tilt needs a registry to store images.
-
-# The following manifest defines a Deployment to run a basic Docker registry,
-# and a NodePort Service to access it. Using a NodePort means that we don't
-# need to obtain a TLS certificate, because we will be accessing the registry
-# through localhost.
 k8s_yaml('../k8s/tilt-registry.yaml')
-
-# Tell Tilt to use the registry that we just deployed instead of whatever
-# is defined in our Kubernetes resources. Tilt will patch image names to
-# use our registry.
 default_registry('localhost:30555')
-
-# Create a port forward so that we can access the registry from our local
-# environment, too. Note that if you run Tilt directly from a Kubernetes node
-# (which is not typical, but might happen in some lab/training environments)
-# the following might cause an error because port 30555 is already taken.
-k8s_resource(workload='tilt-registry', port_forwards='30555:5000')
-
-# (2) Telling Tilt how to build and run our app.
-
-# The following two lines will use the kubectl-build plugin
-# to leverage buildkit and build the images in our Kubernetes
-# cluster. This is not enabled by default, because it requires
-# the plugin to be installed.
-# See https://github.com/vmware-tanzu/buildkit-cli-for-kubectl
-# for more information about this plugin.
-#load('ext://kubectl_build', 'kubectl_build')
-#docker_build = kubectl_build
-
-# Our Kubernetes manifests use images 'dockercoins/...' so we tell Tilt
-# how each of these images should be built. The first argument is the name
-# of the image, the second argument is the directory containing the build
-# context (i.e. the Dockerfile to build the image).
 docker_build('dockercoins/hasher', 'hasher')
 docker_build('dockercoins/rng', 'rng')
 docker_build('dockercoins/webui', 'webui')
 docker_build('dockercoins/worker', 'worker')
-
-# The following manifests defines five Deployments and four Services for
-# our application.
 k8s_yaml('../k8s/dockercoins.yaml')

-# (3) Finishing touches.
+# Uncomment the following line to let Tilt run with the default kubeadm cluster-admin context.
+#allow_k8s_contexts('kubernetes-admin@kubernetes')

-# The following line lets Tilt run with the default kubeadm cluster-admin context.
-allow_k8s_contexts('kubernetes-admin@kubernetes')
-
-# This will run an ngrok tunnel to expose Tilt to the outside world.
-# This is intended to be used when Tilt runs on a remote machine.
-local_resource(name='ngrok:tunnel', serve_cmd='ngrok http 10350')
-
-# This will wait until the ngrok tunnel is up, and show its URL to the user.
-# We send the output to /dev/tty so that it doesn't get intercepted by
-# Tilt, and gets displayed to the user's terminal instead.
-# Note: this assumes that the ngrok instance will be running on port 4040.
-# If you have other ngrok instances running on the machine, this might not work.
-local_resource(name='ngrok:showurl', cmd='''
-  while sleep 1; do
-    TUNNELS=$(curl -fsSL http://localhost:4040/api/tunnels | jq -r .tunnels[].public_url)
-    [ "$TUNNELS" ] && break
-  done
-  printf "\nYou should be able to connect to the Tilt UI with the following URL(s): %s\n" "$TUNNELS" >/dev/tty
-  '''
-)
+# While we're here: if you're controlling a remote cluster, uncomment the k8s_resource line below.
+# It will create a port forward so that you can access the remote registry.
+#k8s_resource(workload='registry', port_forwards='30555:5000')
--- a/dockercoins/webui/Dockerfile
+++ b/dockercoins/webui/Dockerfile
@@ -1,6 +1,6 @@
 FROM node:4-slim
 RUN npm install express
-RUN npm install redis@3
+RUN npm install redis
 COPY files/ /files/
 COPY webui.js /
 CMD ["node", "webui.js"]
--- a/k8s/admission-configuration.yaml
+++ b/k8s/admission-configuration.yaml
@@ -1,16 +0,0 @@
-apiVersion: apiserver.config.k8s.io/v1
-kind: AdmissionConfiguration
-plugins:
- name: PodSecurity
-  configuration:
-    apiVersion: pod-security.admission.config.k8s.io/v1alpha1
-    kind: PodSecurityConfiguration
-    defaults:
-      enforce: baseline
-      audit: baseline
-      warn: baseline
-    exemptions:
-      usernames:
-      - cluster-admin
-      namespaces:
-      - kube-system
--- a/k8s/consul-1.yaml
+++ b/k8s/consul-1.yaml
@@ -3,12 +3,6 @@
 # - no actual persistence
 # - scaling down to 1 will break the cluster
 # - pods may be colocated
---
-apiVersion: v1
-kind: ServiceAccount
-metadata:
-  name: consul
---
 apiVersion: rbac.authorization.k8s.io/v1
 kind: Role
 metadata:
@@ -34,6 +28,11 @@ subjects:
    name: consul
 ---
 apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: consul
+---
+apiVersion: v1
 kind: Service
 metadata:
  name: consul
@@ -62,7 +61,7 @@ spec:
      serviceAccountName: consul
      containers:
        - name: consul
-          image: "consul:1.11"
+          image: "consul:1.8"
          env:
            - name: NAMESPACE
              valueFrom:
--- a/k8s/consul-2.yaml
+++ b/k8s/consul-2.yaml
@@ -2,12 +2,6 @@
 # There is still no actual persistence, but:
 # - podAntiaffinity prevents pod colocation
 # - clusters works when scaling down to 1 (thanks to lifecycle hook)
---
-apiVersion: v1
-kind: ServiceAccount
-metadata:
-  name: consul
---
 apiVersion: rbac.authorization.k8s.io/v1
 kind: Role
 metadata:
@@ -33,6 +27,11 @@ subjects:
    name: consul
 ---
 apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: consul
+---
+apiVersion: v1
 kind: Service
 metadata:
  name: consul
@@ -69,7 +68,7 @@ spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: consul
-          image: "consul:1.11"
+          image: "consul:1.8"
          env:
            - name: NAMESPACE
              valueFrom:
--- a/k8s/consul-3.yaml
+++ b/k8s/consul-3.yaml
@@ -1,11 +1,5 @@
 # Even better Consul cluster.
 # That one uses a volumeClaimTemplate to achieve true persistence.
---
-apiVersion: v1
-kind: ServiceAccount
-metadata:
-  name: consul
---
 apiVersion: rbac.authorization.k8s.io/v1
 kind: Role
 metadata:
@@ -31,6 +25,11 @@ subjects:
    name: consul
 ---
 apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: consul
+---
+apiVersion: v1
 kind: Service
 metadata:
  name: consul
@@ -76,7 +75,7 @@ spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: consul
-          image: "consul:1.11"
+          image: "consul:1.8"
          volumeMounts:
            - name: data
              mountPath: /consul/data
--- a/k8s/haproxy.cfg
+++ b/k8s/haproxy.cfg
@@ -1,16 +1,18 @@
 global
  daemon
+  maxconn 256

 defaults
  mode tcp
-  timeout connect 5s
-  timeout client 50s
-  timeout server 50s
+  timeout connect 5000ms
+  timeout client 50000ms
+  timeout server 50000ms

-listen very-basic-load-balancer
+frontend the-frontend
  bind *:80
-  server blue color.blue.svc:80
-  server green color.green.svc:80
+  default_backend the-backend
+
+backend the-backend
+  server google.com-80 google.com:80 maxconn 32 check
+  server ibm.fr-80 ibm.fr:80 maxconn 32 check

-# Note: the services above must exist,
-# otherwise HAproxy won't start.
--- a/k8s/kyverno-ingress-domain-name-1.yaml
+++ b/k8s/kyverno-ingress-domain-name-1.yaml
@@ -1,28 +0,0 @@
-apiVersion: kyverno.io/v1
-kind: ClusterPolicy
-metadata:
-  name: ingress-domain-name
-spec:
-  rules:
-  - name: create-ingress
-    match:
-      resources: 
-        kinds:
-        - Service
-    generate: 
-      kind: Ingress
-      name: "{{request.object.metadata.name}}"
-      namespace: "{{request.object.metadata.namespace}}"
-      data:
-        spec:
-          rules:
-          - host: "{{request.object.metadata.name}}.{{request.object.metadata.namespace}}.A.B.C.D.nip.io"
-            http:
-              paths:
-              - backend:
-                  service:
-                    name: "{{request.object.metadata.name}}"
-                    port:
-                      number: 80
-                path: /
-                pathType: Prefix
--- a/k8s/kyverno-ingress-domain-name-2.yaml
+++ b/k8s/kyverno-ingress-domain-name-2.yaml
@@ -1,32 +0,0 @@
-apiVersion: kyverno.io/v1
-kind: ClusterPolicy
-metadata:
-  name: ingress-domain-name
-spec:
-  rules:
-  - name: create-ingress
-    match:
-      resources: 
-        kinds:
-        - Service
-    preconditions:
-    - key: "{{request.object.spec.ports[0].name}}"
-      operator: Equals
-      value: http
-    generate: 
-      kind: Ingress
-      name: "{{request.object.metadata.name}}"
-      namespace: "{{request.object.metadata.namespace}}"
-      data:
-        spec:
-          rules:
-          - host: "{{request.object.metadata.name}}.{{request.object.metadata.namespace}}.A.B.C.D.nip.io"
-            http:
-              paths:
-              - backend:
-                  service:
-                    name: "{{request.object.metadata.name}}"
-                    port:
-                      name: http
-                path: /
-                pathType: Prefix
--- a/k8s/kyverno-ingress-domain-name-3.yaml
+++ b/k8s/kyverno-ingress-domain-name-3.yaml
@@ -1,37 +0,0 @@
-apiVersion: kyverno.io/v1
-kind: ClusterPolicy
-metadata:
-  name: ingress-domain-name
-spec:
-  rules:
-  - name: create-ingress
-    context:
-    - name: configmap
-      configMap:
-        name: ingress-domain-name
-        namespace: "{{request.object.metadata.namespace}}"
-    match:
-      resources: 
-        kinds:
-        - Service
-    preconditions:
-    - key: "{{request.object.spec.ports[0].name}}"
-      operator: Equals
-      value: http
-    generate: 
-      kind: Ingress
-      name: "{{request.object.metadata.name}}"
-      namespace: "{{request.object.metadata.namespace}}"
-      data:
-        spec:
-          rules:
-          - host: "{{request.object.metadata.name}}.{{request.object.metadata.namespace}}.{{configmap.data.domain}}"
-            http:
-              paths:
-              - backend:
-                  service:
-                    name: "{{request.object.metadata.name}}"
-                    port:
-                      name: http
-                path: /
-                pathType: Prefix
--- a/k8s/mounter.yaml
+++ b/k8s/mounter.yaml
@@ -1,20 +0,0 @@
-kind: Pod
-apiVersion: v1
-metadata:
-  generateName: mounter-
-  labels:
-    container.training/mounter: ""
-spec:
-  volumes:
-  - name: pvc
-    persistentVolumeClaim:
-      claimName: my-pvc-XYZ45
-  containers:
-  - name: mounter
-    image: alpine
-    stdin: true
-    tty: true
-    volumeMounts:
-    - name: pvc
-      mountPath: /pvc
-    workingDir: /pvc
--- a/k8s/netpol-dockercoins.yaml
+++ b/k8s/netpol-dockercoins.yaml
@@ -3,7 +3,8 @@ apiVersion: networking.k8s.io/v1
 metadata:
  name: deny-from-other-namespaces
 spec:
-  podSelector: {}
+  podSelector:
+    matchLabels:
  ingress:
  - from:
    - podSelector: {}
--- a/k8s/pv.yaml
+++ b/k8s/pv.yaml
@@ -1,20 +0,0 @@
-kind: PersistentVolume
-apiVersion: v1
-metadata:
-  generateName: my-pv-
-  labels:
-    container.training/pv: ""
-spec:
-  accessModes:
-  - ReadWriteOnce
-  - ReadWriteMany
-  capacity:
-    storage: 1G
-  hostPath:
-    path: /tmp/my-pv
-  #storageClassName: my-sc
-  #claimRef:
-  #  kind: PersistentVolumeClaim
-  #  apiVersion: v1
-  #  namespace: default
-  #  name: my-pvc-XYZ45
--- a/k8s/pvc.yaml
+++ b/k8s/pvc.yaml
@@ -1,13 +0,0 @@
-kind: PersistentVolumeClaim
-apiVersion: v1
-metadata:
-  generateName: my-pvc-
-  labels:
-    container.training/pvc: ""
-spec:
-  accessModes:
-  - ReadWriteOnce
-  resources:
-    requests:
-      storage: 1G
-  #storageClassName: my-sc
--- a/k8s/rainbow.yaml
+++ b/k8s/rainbow.yaml
@@ -1,147 +0,0 @@
---
-apiVersion: v1
-kind: Namespace
-metadata:
-  name: blue
-  labels:
-    app: rainbow
---
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  labels:
-    app: rainbow
-    color: blue
-  name: color
-  namespace: blue
-spec:
-  selector:
-    matchLabels:
-      app: rainbow
-      color: blue
-  template:
-    metadata:
-      labels:
-        app: rainbow
-        color: blue
-    spec:
-      containers:
-      - image: jpetazzo/color
-        name: color
---
-apiVersion: v1
-kind: Service
-metadata:
-  labels:
-    app: rainbow
-    color: blue
-  name: color
-  namespace: blue
-spec:
-  ports:
-  - name: http
-    port: 80
-    protocol: TCP
-    targetPort: 80
-  selector:
-    app: rainbow
-    color: blue
-  type: ClusterIP
---
-apiVersion: v1
-kind: Namespace
-metadata:
-  name: green
-  labels:
-    app: rainbow
---
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  labels:
-    app: rainbow
-    color: green
-  name: color
-  namespace: green
-spec:
-  selector:
-    matchLabels:
-      app: rainbow
-      color: green
-  template:
-    metadata:
-      labels:
-        app: rainbow
-        color: green
-    spec:
-      containers:
-      - image: jpetazzo/color
-        name: color
---
-apiVersion: v1
-kind: Service
-metadata:
-  labels:
-    app: rainbow
-    color: green
-  name: color
-  namespace: green
-spec:
-  ports:
-  - name: http
-    port: 80
-    protocol: TCP
-    targetPort: 80
-  selector:
-    app: rainbow
-    color: green
-  type: ClusterIP
---
-apiVersion: v1
-kind: Namespace
-metadata:
-  name: red
-  labels:
-    app: rainbow
---
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  labels:
-    app: rainbow
-    color: red
-  name: color
-  namespace: red
-spec:
-  selector:
-    matchLabels:
-      app: rainbow
-      color: red
-  template:
-    metadata:
-      labels:
-        app: rainbow
-        color: red
-    spec:
-      containers:
-      - image: jpetazzo/color
-        name: color
---
-apiVersion: v1
-kind: Service
-metadata:
-  labels:
-    app: rainbow
-    color: red
-  name: color
-  namespace: red
-spec:
-  ports:
-  - name: http
-    port: 80
-    protocol: TCP
-    targetPort: 80
-  selector:
-    app: rainbow
-    color: red
-  type: ClusterIP
--- a/prepare-tf/README.md
+++ b/prepare-tf/README.md
@@ -1,107 +1,17 @@
-⚠️ This is work in progress. The UX needs to be improved,
-and the docs could be better.
-
 This directory contains a Terraform configuration to deploy
-a bunch of Kubernetes clusters on various cloud providers,
-using their respective managed Kubernetes products.
+a bunch of Kubernetes clusters on various cloud providers, using their respective managed Kubernetes products.

-## With shell wrapper
-
-This is the recommended use. It makes it easy to start N clusters
-on any provider. It will create a directory with a name like
-`tag-YYYY-MM-DD-HH-MM-SS-SEED-PROVIDER`, copy the Terraform configuration
-to that directory, then create the clusters using that configuration.
-
-1. One-time setup: configure provider authentication for the provider(s) that you wish to use.
-
- Digital Ocean:
-  ```bash
-  doctl auth init
-  ```
-
- Google Cloud Platform: you will need to create a project named `prepare-tf`
-  and enable the relevant APIs for this project (sorry, if you're new to GCP,
-  this sounds vague; but if you're familiar with it you know what to do; if you
-  want to change the project name you can edit the Terraform configuration)
-
- Linode:
-  ```bash
-  linode-cli configure
-  ```
-
- Oracle Cloud: FIXME
-  (set up `oci` through the `oci-cli` Python package)
-
- Scaleway: run `scw init`
-
-2. Optional: set number of clusters, cluster size, and region.
-
-By default, 1 cluster will be configured, with 2 nodes, and auto-scaling up to 5 nodes.
-
-If you want, you can override these parameters, with the following variables.
-
-```bash
-export TF_VAR_how_many_clusters=5
-export TF_VAR_min_nodes_per_pool=2
-export TF_VAR_max_nodes_per_pool=4
-export TF_VAR_location=xxx
-```
-
-The `location` variable is optional. Each provider should have a default value.
-The value of the `location` variable is provider-specific. Examples:
-
-| Provider      | Example value     | How to see possible values
-|---------------|-------------------|---------------------------
-| Digital Ocean | `ams3`            | `doctl compute region list`
-| Google Cloud  | `europe-north1-a` | `gcloud  compute zones list`
-| Linode        | `eu-central`      | `linode-cli regions list`
-| Oracle Cloud  | `eu-stockholm-1`  | `oci iam region list`
-
-You can also specify multiple locations, and then they will be
-used in round-robin fashion.
-
-For example, with Google Cloud, since the default quotas are very
-low (my account is limited to 8 public IP addresses per zone, and
-my requests to increase that quota were denied) you can do the
-following:
-
-```bash
-export TF_VAR_location=$(gcloud compute zones list --format=json | jq -r .[].name | grep ^europe)
-```
-
-Then when you apply, clusters will be created across all available
-zones in Europe. (When I write this, there are 20+ zones in Europe,
-so even with my quota, I can create 40 clusters.)
-
-3. Run!
-
-```bash
-./run.sh <providername>
-```
-
-(If you don't specify a provider name, it will list available providers.)
-
-4. Shutting down
-
-Go to the directory that was created by the previous step (`tag-YYYY-MM...`)
-and run `terraform destroy`.
-
-You can also run `./clean.sh` which will destroy ALL clusters deployed by the previous run script.
-
-## Without shell wrapper
-
-Expert mode.
-
-Useful to run steps sperarately, and/or when working on the Terraform configurations.
+To use it:

 1. Select the provider you wish to use.

-Go to the `source` directory and edit `main.tf`.
-
 Change the `source` attribute of the `module "clusters"` section.
-
 Check the content of the `modules` directory to see available choices.

+```bash
+vim main.tf
+```
+
 2. Initialize the provider.

 ```bash
@@ -110,20 +20,24 @@ terraform init

 3. Configure provider authentication.

-See steps above, and add the following extra steps:
-
- Digital Coean:
-  ```bash
-  export DIGITALOCEAN_ACCESS_TOKEN=$(grep ^access-token ~/.config/doctl/config.yaml | cut -d: -f2 | tr -d " ")
-  ```
-
- Linode:
-  ```bash
-  export LINODE_TOKEN=$(grep ^token ~/.config/linode-cli | cut -d= -f2 | tr -d " ")
-  ```
+- Digital Ocean: `export DIGITALOCEAN_ACCESS_TOKEN=...`
+  (check `~/.config/doctl/config.yaml` for the token)
+- Linode: `export LINODE_TOKEN=...`
+  (check `~/.config/linode-cli` for the token)
+- Oracle Cloud: it should use `~/.oci/config`
+- Scaleway: run `scw init`

 4. Decide how many clusters and how many nodes per clusters you want.

+```bash
+export TF_VAR_how_many_clusters=5
+export TF_VAR_min_nodes_per_pool=2
+# Optional (will enable autoscaler when available)
+export TF_VAR_max_nodes_per_pool=4
+# Optional (will only work on some providers)
+export TF_VAR_enable_arm_pool=true
+```
+
 5. Provision clusters.

 ```bash
@@ -132,7 +46,7 @@ terraform apply

 6. Perform second stage provisioning.

-This will install an SSH server on the clusters.
+This will install a SSH server on the clusters.

 ```bash
 cd stage2
@@ -158,5 +72,5 @@ terraform destroy
 9. Clean up stage2.

 ```bash
-rm stage2/terraform.tfstate*
+rm stage/terraform.tfstate*
 ```
--- a/prepare-tf/cleanup.sh
+++ b/prepare-tf/cleanup.sh
@@ -1,9 +0,0 @@
-#!/bin/sh
-export LINODE_TOKEN=$(grep ^token ~/.config/linode-cli | cut -d= -f2 | tr -d " ")
-export DIGITALOCEAN_ACCESS_TOKEN=$(grep ^access-token ~/.config/doctl/config.yaml | cut -d: -f2 | tr -d " ")
-for T in  tag-*; do
-(
-  cd $T
-  terraform apply -destroy -auto-approve && mv ../$T ../deleted$T
-)
-done
--- a/prepare-tf/locals.tf
+++ b/prepare-tf/locals.tf
@@ -0,0 +1,16 @@
+resource "random_string" "_" {
+  length  = 5
+  special = false
+  upper   = false
+}
+
+resource "time_static" "_" {}
+
+locals {
+  tag = format("tf-%s-%s", formatdate("YYYY-MM-DD-hh-mm", time_static._.rfc3339), random_string._.result)
+  # Common tags to be assigned to all resources
+  common_tags = [
+    "created-by=terraform",
+    "tag=${local.tag}"
+  ]
+}
--- a/prepare-tf/source/main.tf
+++ b/prepare-tf/source/main.tf
@@ -1,5 +1,5 @@
 module "clusters" {
-  source             = "./modules/PROVIDER"
+  source             = "./modules/linode"
  for_each           = local.clusters
  cluster_name       = each.value.cluster_name
  min_nodes_per_pool = var.min_nodes_per_pool
@@ -7,24 +7,22 @@ module "clusters" {
  enable_arm_pool    = var.enable_arm_pool
  node_size          = var.node_size
  common_tags        = local.common_tags
-  location           = each.value.location
 }

 locals {
  clusters = {
    for i in range(101, 101 + var.how_many_clusters) :
    i => {
-      cluster_name     = format("%s-%03d", local.tag, i)
-      kubeconfig_path  = format("./stage2/kubeconfig.%03d", i)
+      cluster_name    = format("%s-%03d", local.tag, i)
+      kubeconfig_path = format("./stage2/kubeconfig.%03d", i)
+      #dashdash_kubeconfig = format("--kubeconfig=./stage2/kubeconfig.%03d", i)
      externalips_path = format("./stage2/externalips.%03d", i)
-      flags_path       = format("./stage2/flags.%03d", i)
-      location         = local.locations[i % length(local.locations)]
    }
  }
 }

 resource "local_file" "stage2" {
-  filename        = "./stage2/main.tf"
+  filename = "./stage2/main.tf"
  file_permission = "0644"
  content = templatefile(
    "./stage2.tmpl",
@@ -32,15 +30,6 @@ resource "local_file" "stage2" {
  )
 }

-resource "local_file" "flags" {
-  for_each        = local.clusters
-  filename        = each.value.flags_path
-  file_permission = "0600"
-  content         = <<-EOT
-    has_metrics_server: ${module.clusters[each.key].has_metrics_server}
-  EOT
-}
-
 resource "local_file" "kubeconfig" {
  for_each        = local.clusters
  filename        = each.value.kubeconfig_path
@@ -70,8 +59,8 @@ resource "null_resource" "wait_for_nodes" {
 }

 data "external" "externalips" {
-  for_each   = local.clusters
-  depends_on = [null_resource.wait_for_nodes]
+  for_each = local.clusters
+  depends_on = [ null_resource.wait_for_nodes ]
  program = [
    "sh",
    "-c",
--- a/prepare-tf/source/modules/digitalocean/main.tf
+++ b/prepare-tf/source/modules/digitalocean/main.tf
@@ -1,13 +1,12 @@
 resource "digitalocean_kubernetes_cluster" "_" {
-  name = var.cluster_name
-  tags = var.common_tags
-  # Region is mandatory, so let's provide a default value.
-  region  = var.location != null ? var.location : "nyc1"
+  name    = var.cluster_name
+  tags    = local.common_tags
+  region  = var.region
  version = var.k8s_version

  node_pool {
-    name       = "x86"
-    tags       = var.common_tags
+    name       = "dok-x86"
+    tags       = local.common_tags
    size       = local.node_type
    auto_scale = true
    min_nodes  = var.min_nodes_per_pool
--- a/prepare-tf/source/modules/digitalocean/outputs.tf
+++ b/prepare-tf/source/modules/digitalocean/outputs.tf
@@ -5,7 +5,3 @@ output "kubeconfig" {
 output "cluster_id" {
  value = digitalocean_kubernetes_cluster._.id
 }
-
-output "has_metrics_server" {
-  value = false
-}
--- a/prepare-tf/source/modules/digitalocean/providers.tf
+++ b/prepare-tf/source/modules/digitalocean/providers.tf
--- a/prepare-tf/source/modules/digitalocean/variables.tf
+++ b/prepare-tf/source/modules/digitalocean/variables.tf
@@ -8,6 +8,10 @@ variable "common_tags" {
  default = []
 }

+locals {
+  common_tags = [for tag in var.common_tags : replace(tag, "=", "-")]
+}
+
 variable "node_size" {
  type    = string
  default = "M"
@@ -42,11 +46,9 @@ locals {
  node_type = var.node_types[var.node_size]
 }

-# To view supported regions, run:
-# doctl compute region list
-variable "location" {
+variable "region" {
  type    = string
-  default = null
+  default = "ams3"
 }

 # To view supported versions, run:
--- a/prepare-tf/source/modules/linode/main.tf
+++ b/prepare-tf/source/modules/linode/main.tf
@@ -1,8 +1,7 @@
 resource "linode_lke_cluster" "_" {
-  label = var.cluster_name
-  tags  = var.common_tags
-  # "region" is mandatory, so let's provide a default value if none was given.
-  region      = var.location != null ? var.location : "eu-central"
+  label       = var.cluster_name
+  tags        = var.common_tags
+  region      = var.region
  k8s_version = var.k8s_version

  pool {
--- a/prepare-tf/source/modules/linode/outputs.tf
+++ b/prepare-tf/source/modules/linode/outputs.tf
@@ -5,7 +5,3 @@ output "kubeconfig" {
 output "cluster_id" {
  value = linode_lke_cluster._.id
 }
-
-output "has_metrics_server" {
-  value = false
-}
--- a/prepare-tf/source/modules/linode/providers.tf
+++ b/prepare-tf/source/modules/linode/providers.tf
--- a/prepare-tf/source/modules/linode/variables.tf
+++ b/prepare-tf/source/modules/linode/variables.tf
@@ -42,11 +42,11 @@ locals {
  node_type = var.node_types[var.node_size]
 }

-# To view supported regions, run:
+# To view supported versions, run:
 # linode-cli regions list
-variable "location" {
+variable "region" {
  type    = string
-  default = null
+  default = "us-east"
 }

 # To view supported versions, run:
--- a/prepare-tf/source/modules/oraclecloud/main.tf
+++ b/prepare-tf/source/modules/oraclecloud/main.tf
@@ -1,7 +1,6 @@
 resource "oci_identity_compartment" "_" {
-  name          = var.cluster_name
-  description   = var.cluster_name
-  enable_delete = true
+  name        = var.cluster_name
+  description = var.cluster_name
 }

 locals {
--- a/prepare-tf/source/modules/oraclecloud/network.tf
+++ b/prepare-tf/source/modules/oraclecloud/network.tf
--- a/prepare-tf/source/modules/oraclecloud/outputs.tf
+++ b/prepare-tf/source/modules/oraclecloud/outputs.tf
@@ -9,7 +9,3 @@ output "kubeconfig" {
 output "cluster_id" {
  value = oci_containerengine_cluster._.id
 }
-
-output "has_metrics_server" {
-  value = false
-}
--- a/prepare-tf/source/modules/oraclecloud/providers.tf
+++ b/prepare-tf/source/modules/oraclecloud/providers.tf
--- a/prepare-tf/source/modules/oraclecloud/variables.tf
+++ b/prepare-tf/source/modules/oraclecloud/variables.tf
@@ -70,13 +70,6 @@ locals {
  node_type = var.node_types[var.node_size]
 }

-# To view supported regions, run:
-# oci iam region list | jq .data[].name
-variable "location" {
-  type    = string
-  default = null
-}
-
 # To view supported versions, run:
 # oci ce cluster-options get --cluster-option-id all | jq -r '.data["kubernetes-versions"][]'
 variable "k8s_version" {
--- a/prepare-tf/source/modules/scaleway/main.tf
+++ b/prepare-tf/source/modules/scaleway/main.tf
@@ -1,15 +1,13 @@
 resource "scaleway_k8s_cluster" "_" {
-  name                        = var.cluster_name
-  region                      = var.location
-  tags                        = var.common_tags
-  version                     = var.k8s_version
-  cni                         = var.cni
-  delete_additional_resources = true
+  name    = var.cluster_name
+  tags    = var.common_tags
+  version = var.k8s_version
+  cni     = var.cni
 }

 resource "scaleway_k8s_pool" "_" {
  cluster_id  = scaleway_k8s_cluster._.id
-  name        = "x86"
+  name        = "scw-x86"
  tags        = var.common_tags
  node_type   = local.node_type
  size        = var.min_nodes_per_pool
--- a/prepare-tf/source/modules/scaleway/outputs.tf
+++ b/prepare-tf/source/modules/scaleway/outputs.tf
@@ -5,7 +5,3 @@ output "kubeconfig" {
 output "cluster_id" {
  value = scaleway_k8s_cluster._.id
 }
-
-output "has_metrics_server" {
-  value = sort([var.k8s_version, "1.22"])[0] == "1.22"
-}
--- a/prepare-tf/source/modules/scaleway/providers.tf
+++ b/prepare-tf/source/modules/scaleway/providers.tf
--- a/prepare-tf/source/modules/scaleway/variables.tf
+++ b/prepare-tf/source/modules/scaleway/variables.tf
@@ -47,12 +47,7 @@ variable "cni" {
  default = "cilium"
 }

-variable "location" {
-  type    = string
-  default = null
-}
-
-# To view supported versions, run:
+# See supported versions with:
 # scw k8s version list -o json | jq -r .[].name
 variable "k8s_version" {
  type    = string
--- a/prepare-tf/source/providers.tf
+++ b/prepare-tf/source/providers.tf
--- a/prepare-tf/run.sh
+++ b/prepare-tf/run.sh
@@ -1,49 +0,0 @@
-#!/bin/sh
-set -e
-
-TIME=$(which time)
-
-PROVIDER=$1
-[ "$PROVIDER" ] || {
-  echo "Please specify a provider as first argument, or 'ALL' for parallel mode."
-  echo "Available providers:"
-  ls -1 source/modules
-  exit 1
-}
-
-[ "$TAG" ] || {
-  TIMESTAMP=$(date +%Y-%m-%d-%H-%M-%S)
-  RANDOMTAG=$(base64 /dev/urandom | tr A-Z a-z | tr -d /+ | head -c5)
-  export TAG=tag-$TIMESTAMP-$RANDOMTAG
-}
-
-[ "$PROVIDER" = "ALL" ] && {
-  for PROVIDER in $(ls -1 source/modules); do
-    $TERMINAL -T $TAG-$PROVIDER -e sh -c "
-      export TAG=$TAG-$PROVIDER
-      $0 $PROVIDER
-      cd $TAG-$PROVIDER
-      bash
-      " &
-  done
-  exit 0
-}
-
-[ -d "source/modules/$PROVIDER" ] || {
-  echo "Provider '$PROVIDER' not found."
-  echo "Available providers:"
-  ls -1 source/modules
-  exit 1  
-}
-
-export LINODE_TOKEN=$(grep ^token ~/.config/linode-cli | cut -d= -f2 | tr -d " ")
-export DIGITALOCEAN_ACCESS_TOKEN=$(grep ^access-token ~/.config/doctl/config.yaml | cut -d: -f2 | tr -d " ")
-
-cp -a source $TAG
-cd $TAG
-cp -r modules/$PROVIDER modules/PROVIDER
-$TIME -o time.1.init terraform init
-$TIME -o time.2.stage1 terraform apply -auto-approve
-cd stage2
-$TIME -o ../time.3.init terraform init
-$TIME -o ../time.4.stage2 terraform apply -auto-approve
--- a/prepare-tf/source/locals.tf
+++ b/prepare-tf/source/locals.tf
@@ -1,19 +0,0 @@
-resource "random_string" "_" {
-  length  = 4
-  number  = false
-  special = false
-  upper   = false
-}
-
-resource "time_static" "_" {}
-
-locals {
-  timestamp = formatdate("YYYY-MM-DD-hh-mm", time_static._.rfc3339)
-  tag       = random_string._.result
-  # Common tags to be assigned to all resources
-  common_tags = [
-    "created-by-terraform",
-    format("created-at-%s", local.timestamp),
-    format("created-for-%s", local.tag)
-  ]
-}
--- a/prepare-tf/source/modules/googlecloud/main.tf
+++ b/prepare-tf/source/modules/googlecloud/main.tf
@@ -1,65 +0,0 @@
-resource "google_container_cluster" "_" {
-  name               = var.cluster_name
-  project            = local.project
-  location           = local.location
-  min_master_version = var.k8s_version
-
-  # To deploy private clusters, uncomment the section below,
-  # and uncomment the block in network.tf.
-  # Private clusters require extra resources (Cloud NAT,
-  # router, network, subnet) and the quota for some of these
-  # resources is fairly low on GCP; so if you want to deploy
-  # a lot of private clusters (more than 10), you can use these
-  # blocks as a base but you will probably have to refactor
-  # things quite a bit (you will at least need to define a single
-  # shared router and use it across all the clusters).
-  /*
-  network    = google_compute_network._.name
-  subnetwork = google_compute_subnetwork._.name
-
-  private_cluster_config {
-    enable_private_nodes = true
-    # This must be set to "false".
-    # (Otherwise, access to the public endpoint is disabled.)
-    enable_private_endpoint = false
-    # This must be set to a /28.
-    # I think it shouldn't collide with the pod network subnet.
-    master_ipv4_cidr_block = "10.255.255.0/28"
-  }
-  # Private clusters require "VPC_NATIVE" networking mode
-  # (as opposed to the legacy "ROUTES").
-  networking_mode = "VPC_NATIVE"
-  # ip_allocation_policy is required for VPC_NATIVE clusters.
-  ip_allocation_policy {
-    # This is the block that will be used for pods.
-    cluster_ipv4_cidr_block = "10.0.0.0/12"
-    # The services block is optional
-    # (GKE will pick one automatically).
-    #services_ipv4_cidr_block = ""
-  }
-  */
-
-  node_pool {
-    name = "x86"
-    node_config {
-      tags         = var.common_tags
-      machine_type = local.node_type
-    }
-    initial_node_count = var.min_nodes_per_pool
-    autoscaling {
-      min_node_count = var.min_nodes_per_pool
-      max_node_count = max(var.min_nodes_per_pool, var.max_nodes_per_pool)
-    }
-  }
-
-  # This is not strictly necessary.
-  # We'll see if we end up using it.
-  # (If it is removed, make sure to also remove the corresponding
-  # key+cert variables from outputs.tf!)
-  master_auth {
-    client_certificate_config {
-      issue_client_certificate = true
-    }
-  }
-}
-
--- a/prepare-tf/source/modules/googlecloud/network.tf
+++ b/prepare-tf/source/modules/googlecloud/network.tf
@@ -1,38 +0,0 @@
-/*
-resource "google_compute_network" "_" {
-  name    = var.cluster_name
-  project = local.project
-  # The default is to create subnets automatically.
-  # However, this creates one subnet per zone in all regions,
-  # which causes a quick exhaustion of the subnet quota.
-  auto_create_subnetworks = false
-}
-
-resource "google_compute_subnetwork" "_" {
-  name          = var.cluster_name
-  ip_cidr_range = "10.254.0.0/16"
-  region        = local.region
-  network       = google_compute_network._.id
-  project       = local.project
-}
-
-resource "google_compute_router" "_" {
-  name    = var.cluster_name
-  region  = local.region
-  network = google_compute_network._.name
-  project = local.project
-}
-
-resource "google_compute_router_nat" "_" {
-  name    = var.cluster_name
-  router  = google_compute_router._.name
-  region  = local.region
-  project = local.project
-  # Everyone in the network is allowed to NAT out.
-  # (We would change this if we only wanted to allow specific subnets to NAT out.)
-  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
-  # Pick NAT addresses automatically.
-  # (We would change this if we wanted to use specific addresses to NAT out.)
-  nat_ip_allocate_option = "AUTO_ONLY"
-}
-*/
--- a/prepare-tf/source/modules/googlecloud/outputs.tf
+++ b/prepare-tf/source/modules/googlecloud/outputs.tf
@@ -1,35 +0,0 @@
-data "google_client_config" "_" {}
-
-output "kubeconfig" {
-  value = <<-EOT
-    apiVersion: v1
-    kind: Config
-    current-context: ${google_container_cluster._.name}
-    clusters:
-    - name: ${google_container_cluster._.name}
-      cluster:
-        server: https://${google_container_cluster._.endpoint}
-        certificate-authority-data: ${google_container_cluster._.master_auth[0].cluster_ca_certificate}
-    contexts:
-    - name: ${google_container_cluster._.name}
-      context:
-        cluster: ${google_container_cluster._.name}
-        user: client-token
-    users:
-    - name: client-cert
-      user:
-        client-key-data: ${google_container_cluster._.master_auth[0].client_key}
-        client-certificate-data: ${google_container_cluster._.master_auth[0].client_certificate}
-    - name: client-token
-      user:
-        token: ${data.google_client_config._.access_token}
-    EOT
-}
-
-output "cluster_id" {
-  value = google_container_cluster._.id
-}
-
-output "has_metrics_server" {
-  value = true
-}
--- a/prepare-tf/source/modules/googlecloud/providers.tf
+++ b/prepare-tf/source/modules/googlecloud/providers.tf
@@ -1,8 +0,0 @@
-terraform {
-  required_providers {
-    google = {
-      source  = "hashicorp/google"
-      version = "4.5.0"
-    }
-  }
-}
--- a/prepare-tf/source/modules/googlecloud/variables.tf
+++ b/prepare-tf/source/modules/googlecloud/variables.tf
@@ -1,68 +0,0 @@
-variable "cluster_name" {
-  type    = string
-  default = "deployed-with-terraform"
-}
-
-variable "common_tags" {
-  type    = list(string)
-  default = []
-}
-
-variable "node_size" {
-  type    = string
-  default = "M"
-}
-
-variable "min_nodes_per_pool" {
-  type    = number
-  default = 2
-}
-
-variable "max_nodes_per_pool" {
-  type    = number
-  default = 5
-}
-
-# FIXME
-variable "enable_arm_pool" {
-  type    = bool
-  default = false
-}
-
-variable "node_types" {
-  type = map(string)
-  default = {
-    "S" = "e2-small"
-    "M" = "e2-medium"
-    "L" = "e2-standard-2"
-  }
-}
-
-locals {
-  node_type = var.node_types[var.node_size]
-}
-
-# To view supported locations, run:
-# gcloud compute zones list
-variable "location" {
-  type    = string
-  default = null
-}
-
-# To view supported versions, run:
-# gcloud container get-server-config --region=europe-north1 '--format=flattened(channels)'
-# But it's also possible to just specify e.g. "1.20" and it figures it out.
-variable "k8s_version" {
-  type    = string
-  default = "1.21"
-}
-
-locals {
-  location = var.location != null ? var.location : "europe-north1-a"
-  region   = replace(local.location, "/-[a-z]$/", "")
-  # Unfortunately, the following line doesn't work
-  # (that attribute just returns an empty string)
-  # so we have to hard-code the project name.
-  #project = data.google_client_config._.project
-  project = "prepare-tf"
-}
--- a/prepare-tf/source/variables.tf
+++ b/prepare-tf/source/variables.tf
@@ -1,40 +0,0 @@
-variable "how_many_clusters" {
-  type    = number
-  default = 1
-}
-
-variable "node_size" {
-  type    = string
-  default = "M"
-  # Can be S, M, L.
-  # We map these values to different specific instance types for each provider,
-  # but the idea is that they shoudl correspond to the following sizes:
-  # S = 2 GB RAM
-  # M = 4 GB RAM
-  # L = 8 GB RAM
-}
-
-variable "min_nodes_per_pool" {
-  type    = number
-  default = 1
-}
-
-variable "max_nodes_per_pool" {
-  type    = number
-  default = 0
-}
-
-variable "enable_arm_pool" {
-  type    = bool
-  default = false
-}
-
-variable "location" {
-  type    = string
-  default = null
-}
-
-# TODO: perhaps handle if it's space-separated instead of newline?
-locals {
-  locations = var.location == null ? [null] : split("\n", var.location)
-}
--- a/prepare-tf/source/stage2.tmpl
+++ b/prepare-tf/source/stage2.tmpl
@@ -2,7 +2,7 @@ terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
-      version = "2.7.1"
+      version = "2.0.3"
    }
  }
 }
@@ -119,11 +119,6 @@ resource "kubernetes_cluster_role_binding" "shpod_${index}" {
    name      = "shpod"
    namespace = "shpod"
  }
-  subject {
-    api_group = "rbac.authorization.k8s.io"
-    kind      = "Group"
-    name      = "shpod-cluster-admins"
-  }
 }

 resource "random_string" "shpod_${index}" {
@@ -140,14 +135,9 @@ provider "helm" {
 }

 resource "helm_release" "metrics_server_${index}" {
-  # Some providers pre-install metrics-server.
-  # Some don't. Let's install metrics-server,
-  # but only if it's not already installed.
-  count = yamldecode(file("./flags.${index}"))["has_metrics_server"] ? 0 : 1
  provider = helm.cluster_${index}
  repository = "https://charts.bitnami.com/bitnami"
  chart = "metrics-server"
-  version = "5.8.8"
  name = "metrics-server"
  namespace = "metrics-server"
  create_namespace = true
@@ -191,7 +181,7 @@ resource "kubernetes_config_map" "kubeconfig_${index}" {
      - name: cluster-admin
        user:
          client-key-data: $${base64encode(tls_private_key.cluster_admin_${index}.private_key_pem)}
-          client-certificate-data: $${base64encode(kubernetes_certificate_signing_request_v1.cluster_admin_${index}.certificate)}
+          client-certificate-data: $${base64encode(kubernetes_certificate_signing_request.cluster_admin_${index}.certificate)}
    EOT
  }
 }
@@ -205,14 +195,11 @@ resource "tls_cert_request" "cluster_admin_${index}" {
  private_key_pem = tls_private_key.cluster_admin_${index}.private_key_pem
  subject {
    common_name = "cluster-admin"
-    # Note: CSR API v1 doesn't allow issuing certs with "system:masters" anymore.
-    #organization = "system:masters"
-    # We'll use this custom group name instead.cluster-admin user.
-    organization = "shpod-cluster-admins"
+    organization = "system:masters"
  }
 }

-resource "kubernetes_certificate_signing_request_v1" "cluster_admin_${index}" {
+resource "kubernetes_certificate_signing_request" "cluster_admin_${index}" {
  provider = kubernetes.cluster_${index}
  metadata {
    name = "cluster-admin"
@@ -220,7 +207,6 @@ resource "kubernetes_certificate_signing_request_v1" "cluster_admin_${index}" {
  spec {
    usages = ["client auth"]
    request = tls_cert_request.cluster_admin_${index}.cert_request_pem
-    signer_name = "kubernetes.io/kube-apiserver-client"
  }
  auto_approve = true
 }
--- a/prepare-tf/variables.tf
+++ b/prepare-tf/variables.tf
@@ -0,0 +1,28 @@
+variable "how_many_clusters" {
+  type    = number
+  default = 2
+}
+
+variable "node_size" {
+  type    = string
+  default = "M"
+  # Can be S, M, L.
+  # S = 2 GB RAM
+  # M = 4 GB RAM
+  # L = 8 GB RAM
+}
+
+variable "min_nodes_per_pool" {
+  type    = number
+  default = 1
+}
+
+variable "max_nodes_per_pool" {
+  type    = number
+  default = 0
+}
+
+variable "enable_arm_pool" {
+  type    = bool
+  default = true
+}
--- a/prepare-vms/README.md
+++ b/prepare-vms/README.md
@@ -14,9 +14,7 @@ These tools can help you to create VMs on:

 - [Docker](https://docs.docker.com/engine/installation/)
 - [Docker Compose](https://docs.docker.com/compose/install/)
-- [Parallel SSH](https://github.com/lilydjwg/pssh)
-  (should be installable with `pip install git+https://github.com/lilydjwg/pssh`;
-  on a Mac, try `brew install pssh`)
+- [Parallel SSH](https://code.google.com/archive/p/parallel-ssh/) (on a Mac: `brew install pssh`) 

 Depending on the infrastructure that you want to use, you also need to install
 the CLI that is specific to that cloud. For OpenStack deployments, you will
--- a/prepare-vms/lib/commands.sh
+++ b/prepare-vms/lib/commands.sh
@@ -75,11 +75,9 @@ _cmd_createuser() {
    echo '$USER_LOGIN ALL=(ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/$USER_LOGIN
    "

-    # The MaxAuthTries is here to help with folks who have many SSH keys.
    pssh "
    set -e
    sudo sed -i 's/PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config
-    sudo sed -i 's/#MaxAuthTries 6/MaxAuthTries 42/' /etc/ssh/sshd_config
    sudo service ssh restart
    "

@@ -238,12 +236,6 @@ _cmd_docker() {
    sudo add-apt-repository 'deb https://download.docker.com/linux/ubuntu bionic stable'
    sudo apt-get -q update
    sudo apt-get -qy install docker-ce
-
-    # Add registry mirror configuration.
-    if ! [ -f /etc/docker/daemon.json ]; then
-        echo '{\"registry-mirrors\": [\"https://mirror.gcr.io\"]}' | sudo tee /etc/docker/daemon.json
-        sudo systemctl restart docker
-    fi
    "

    ##VERSION## https://github.com/docker/compose/releases
@@ -311,15 +303,13 @@ _cmd_kube() {
    need_login_password

    # Optional version, e.g. 1.13.5
-    SETTINGS=tags/$TAG/settings.yaml
-    KUBEVERSION=$(awk '/^kubernetes_version:/ {print $2}' $SETTINGS)
+    KUBEVERSION=$2
    if [ "$KUBEVERSION" ]; then
-        pssh "
-        sudo tee /etc/apt/preferences.d/kubernetes <<EOF
-Package: kubectl kubeadm kubelet
-Pin: version $KUBEVERSION*
-Pin-Priority: 1000
-EOF"
+        EXTRA_APTGET="=$KUBEVERSION-00"
+        EXTRA_KUBEADM="kubernetesVersion: v$KUBEVERSION"
+    else
+        EXTRA_APTGET=""
+        EXTRA_KUBEADM=""
    fi

    # Install packages
@@ -330,8 +320,7 @@ EOF"
    sudo tee /etc/apt/sources.list.d/kubernetes.list"
    pssh --timeout 200 "
    sudo apt-get update -q &&
-    sudo apt-get install -qy kubelet kubeadm kubectl &&
-    sudo apt-mark hold kubelet kubeadm kubectl
+    sudo apt-get install -qy kubelet$EXTRA_APTGET kubeadm$EXTRA_APTGET kubectl$EXTRA_APTGET &&
    kubectl completion bash | sudo tee /etc/bash_completion.d/kubectl &&
    echo 'alias k=kubectl' | sudo tee /etc/bash_completion.d/k &&
    echo 'complete -F __start_kubectl k' | sudo tee -a /etc/bash_completion.d/k"
@@ -343,11 +332,6 @@ EOF"
        sudo swapoff -a"
    fi

-    # Re-enable CRI interface in containerd
-    pssh "
-    echo '# Use default parameters for containerd.' | sudo tee /etc/containerd/config.toml
-    sudo systemctl restart containerd"
-
    # Initialize kube control plane
    pssh --timeout 200 "
    if i_am_first_node && [ ! -f /etc/kubernetes/admin.conf ]; then
@@ -357,38 +341,19 @@ kind: InitConfiguration
 apiVersion: kubeadm.k8s.io/v1beta2
 bootstrapTokens:
 - token: \$(cat /tmp/token)
-nodeRegistration:
-  # Comment out the next line to switch back to Docker.
-  criSocket: /run/containerd/containerd.sock
-  ignorePreflightErrors:
-  - NumCPU
----
-kind: JoinConfiguration
-apiVersion: kubeadm.k8s.io/v1beta2
-discovery:
-  bootstrapToken:
-    apiServerEndpoint: \$(cat /etc/name_of_first_node):6443
-    token: \$(cat /tmp/token)
-    unsafeSkipCAVerification: true
-nodeRegistration:
-  # Comment out the next line to switch back to Docker.
-  criSocket: /run/containerd/containerd.sock
-  ignorePreflightErrors:
-  - NumCPU
 ---
 kind: KubeletConfiguration
 apiVersion: kubelet.config.k8s.io/v1beta1
-# The following line is necessary when using Docker.
-# It doesn't seem necessary when using containerd.
-#cgroupDriver: cgroupfs
+cgroupDriver: cgroupfs
 ---
 kind: ClusterConfiguration
 apiVersion: kubeadm.k8s.io/v1beta2
 apiServer:
  certSANs:
  - \$(cat /tmp/ipv4)
+$EXTRA_KUBEADM
 EOF
-	sudo kubeadm init --config=/tmp/kubeadm-config.yaml
+	sudo kubeadm init --config=/tmp/kubeadm-config.yaml --ignore-preflight-errors=NumCPU
    fi"

    # Put kubeconfig in ubuntu's and $USER_LOGIN's accounts
@@ -412,8 +377,8 @@ EOF
    pssh --timeout 200 "
    if ! i_am_first_node && [ ! -f /etc/kubernetes/kubelet.conf ]; then
        FIRSTNODE=\$(cat /etc/name_of_first_node) &&
-        ssh $SSHOPTS \$FIRSTNODE cat /tmp/kubeadm-config.yaml > /tmp/kubeadm-config.yaml &&
-        sudo kubeadm join --config /tmp/kubeadm-config.yaml
+        TOKEN=\$(ssh $SSHOPTS \$FIRSTNODE cat /tmp/token) &&
+        sudo kubeadm join --discovery-token-unsafe-skip-ca-verification --token \$TOKEN \$FIRSTNODE:6443
    fi"

    # Install metrics server
@@ -504,7 +469,7 @@ EOF
    if [ ! -x /usr/local/bin/kustomize ]; then
        curl -fsSL $URL |
        sudo tar -C /usr/local/bin -zx kustomize
-        kustomize completion bash | sudo tee /etc/bash_completion.d/kustomize
+        echo complete -C /usr/local/bin/kustomize kustomize | sudo tee /etc/bash_completion.d/kustomize
        kustomize version
    fi"

@@ -713,7 +678,7 @@ _cmd_tailhist () {
    ARCH=${ARCHITECTURE-amd64}
    [ "$ARCH" = "aarch64" ] && ARCH=arm64

-    pssh "
+    pssh -i "
    set -e
    wget https://github.com/joewalnes/websocketd/releases/download/v0.3.0/websocketd-0.3.0-linux_$ARCH.zip
    unzip websocketd-0.3.0-linux_$ARCH.zip websocketd
--- a/prepare-vms/lib/infra/openstack-tf.sh
+++ b/prepare-vms/lib/infra/openstack-tf.sh
@@ -1,7 +1,7 @@
 infra_start() {
        COUNT=$1

-        cp terraform-openstack/*.tf tags/$TAG
+        cp terraform/*.tf tags/$TAG
        (
                cd tags/$TAG
                if ! terraform init; then
--- a/prepare-vms/netlify-dns.sh
+++ b/prepare-vms/netlify-dns.sh
@@ -1,82 +0,0 @@
-#!/bin/sh
-
-# https://open-api.netlify.com/#tag/dnsZone
-[ "$1" ] || {
-  echo ""
-  echo "Add a record in Netlify DNS."
-  echo "This script is hardcoded to add a record to container.training".
-  echo ""
-  echo "Syntax:"
-  echo "$0 list"
-  echo "$0 add <name> <ipaddr>"
-  echo "$0 del <recordid>"
-  echo ""
-  echo "Example to create a A record for eu.container.training:"
-  echo "$0 add eu 185.145.250.0"
-  echo ""
-  exit 1
-}
-
-NETLIFY_USERID=$(jq .userId < ~/.config/netlify/config.json)
-NETLIFY_TOKEN=$(jq -r .users[$NETLIFY_USERID].auth.token < ~/.config/netlify/config.json)
-
-netlify() {
-  URI=$1
-  shift
-  http https://api.netlify.com/api/v1/$URI "$@" "Authorization:Bearer $NETLIFY_TOKEN"
-}
-
-ZONE_ID=$(netlify dns_zones |
-          jq -r '.[] | select ( .name == "container.training" ) | .id')
-
-_list() {
-  netlify dns_zones/$ZONE_ID/dns_records |
-    jq -r '.[] | select(.type=="A") | [.hostname, .type, .value, .id] | @tsv'
-}
-
-_add() {
-  NAME=$1.container.training
-  ADDR=$2
-
-
-  # It looks like if we create two identical records, then delete one of them,
-  # Netlify DNS ends up in a weird state (the name doesn't resolve anymore even
-  # though it's still visible through the API and the website?)
-
-  if netlify dns_zones/$ZONE_ID/dns_records |
-          jq '.[] | select(.hostname=="'$NAME'" and .type=="A" and .value=="'$ADDR'")' |
-          grep .
-  then
-    echo "It looks like that record already exists. Refusing to create it."
-    exit 1
-  fi
-
-  netlify dns_zones/$ZONE_ID/dns_records type=A hostname=$NAME value=$ADDR ttl=300
-
-  netlify dns_zones/$ZONE_ID/dns_records |
-          jq '.[] | select(.hostname=="'$NAME'")'
-}
-
-_del() {
-  RECORD_ID=$1
-  # OK, since that one is dangerous, I'm putting the whole request explicitly here
-  http DELETE \
-    https://api.netlify.com/api/v1/dns_zones/$ZONE_ID/dns_records/$RECORD_ID \
-    "Authorization:Bearer $NETLIFY_TOKEN"
-}
-
-case "$1" in
-  list)
-    _list
-    ;;
-  add)
-    _add $2 $3
-    ;;
-  del)
-    _del $2
-    ;;
-  *)
-    echo "Unknown command '$1'."
-    exit 1
-    ;;
-esac
--- a/prepare-vms/settings/admin-oldversion.yaml
+++ b/prepare-vms/settings/admin-oldversion.yaml
@@ -1,33 +0,0 @@
-# Number of VMs per cluster
-clustersize: 3
-
-# The hostname of each node will be clusterprefix + a number
-clusterprefix: oldversion
-
-# Jinja2 template to use to generate ready-to-cut cards
-cards_template: cards.html
-
-# Use "Letter" in the US, and "A4" everywhere else
-paper_size: A4
-
-# Login and password that students will use
-user_login: k8s
-user_password: training
-
-# For a list of old versions, check:
-# https://kubernetes.io/releases/patch-releases/#non-active-branch-history
-kubernetes_version: 1.18.20
-
-image:
-
-steps:
-  - wait
-  - clusterize
-  - tools
-  - docker
-  - createuser
-  - webssh
-  - tailhist
-  - kube
-  - kubetools
-  - kubetest
--- a/prepare-vms/setup-admin-clusters.sh
+++ b/prepare-vms/setup-admin-clusters.sh
@@ -3,7 +3,7 @@ set -e

 export AWS_INSTANCE_TYPE=t3a.small

-INFRA=infra/aws-eu-north-1
+INFRA=infra/aws-us-east-2

 STUDENTS=2

@@ -33,15 +33,9 @@ TAG=$PREFIX-$SETTINGS
 	--settings settings/$SETTINGS.yaml \
 	--students $STUDENTS

-INFRA=infra/enix
+#INFRA=infra/aws-us-west-1

-SETTINGS=admin-oldversion
-TAG=$PREFIX-$SETTINGS
-./workshopctl start \
-	--tag $TAG \
-	--infra $INFRA \
-	--settings settings/$SETTINGS.yaml \
-	--students $STUDENTS
+export AWS_INSTANCE_TYPE=t3a.medium

 SETTINGS=admin-test
 TAG=$PREFIX-$SETTINGS
--- a/prepare-vms/terraform-openstack/keypair.tf
+++ b/prepare-vms/terraform-openstack/keypair.tf
--- a/prepare-vms/terraform-openstack/machines.tf
+++ b/prepare-vms/terraform-openstack/machines.tf
--- a/prepare-vms/terraform-openstack/network.tf
+++ b/prepare-vms/terraform-openstack/network.tf
--- a/prepare-vms/terraform-openstack/provider.tf
+++ b/prepare-vms/terraform-openstack/provider.tf
--- a/prepare-vms/terraform-openstack/secgroup.tf
+++ b/prepare-vms/terraform-openstack/secgroup.tf
--- a/prepare-vms/terraform-openstack/vars.tf
+++ b/prepare-vms/terraform-openstack/vars.tf
--- a/slides/1.yml
+++ b/slides/1.yml
@@ -0,0 +1,56 @@
+title: |
+  Docker & Kubernetes
+  Part 1 - Docker
+
+chat: "[Teams](https://teams.microsoft.com/l/channel/19%3arctk01XQVWxbj6pjTtJDfVd0_QOzfzYe7Xt8VDpl9681%40thread.tacv2/General?groupId=89c621d8-7080-447f-a7eb-9d6704776dd5&tenantId=72aa0d83-624a-4ebf-a683-1b9b45548610)"
+
+gitrepo: github.com/jpetazzo/container.training
+
+slides: https://2021-11-derivco.container.training/
+
+#slidenumberprefix: "#SomeHashTag &mdash; "
+
+exclude:
+- self-paced
+
+content:
+- shared/title.md
+- logistics.md
+- containers/intro.md
+- shared/about-slides.md
+- shared/chat-room-im.md
+#- shared/chat-room-zoom-meeting.md
+#- shared/chat-room-zoom-webinar.md
+- shared/toc.md
+- # DAY 1
+  #- containers/Docker_Overview.md
+  #- containers/Docker_History.md
+  - containers/Training_Environment.md
+  #- containers/Installing_Docker.md
+  - containers/First_Containers.md
+  - containers/Background_Containers.md
+  - containers/Initial_Images.md
+-
+  - containers/Building_Images_Interactively.md
+  - containers/Building_Images_With_Dockerfiles.md
+  - containers/Cmd_And_Entrypoint.md
+  - containers/Copying_Files_During_Build.md
+  - containers/Exercise_Dockerfile_Basic.md
+- # DAY 2
+  - containers/Dockerfile_Tips.md
+  - containers/Multi_Stage_Builds.md
+  - containers/Container_Networking_Basics.md
+  - containers/Local_Development_Workflow.md
+  - containers/Getting_Inside.md
+-
+  - containers/Container_Network_Model.md
+  - containers/Compose_For_Dev_Stacks.md
+  - containers/Exercise_Composefile.md
+  - containers/Exercise_Dockerfile_Advanced.md
+  - shared/thankyou.md
+- # EXTRA
+  - containers/Start_And_Attach.md
+  - containers/Naming_And_Inspecting.md
+  - containers/Labels.md
+  - containers/Advanced_Dockerfiles.md
+  - containers/Network_Drivers.md
--- a/slides/_redirects
+++ b/slides/_redirects
@@ -2,7 +2,7 @@
 #/ /kube-halfday.yml.html 200!
 #/ /kube-fullday.yml.html 200!
 #/ /kube-twodays.yml.html 200!
-/ /kube.yml.html 200!
+/ /1.yml.html 200!

 # And this allows to do "git clone https://container.training".
 /info/refs service=git-upload-pack https://github.com/jpetazzo/container.training/info/refs?service=git-upload-pack
--- a/slides/containers/Publishing_To_Docker_Hub.md
+++ b/slides/containers/Publishing_To_Docker_Hub.md
@@ -109,7 +109,7 @@ class: extra-details

 - Example: [ctr.run](https://ctr.run/)

-.lab[
+.exercise[

 - Use ctr.run to automatically build a container image and run it:
  ```bash
--- a/slides/containers/intro.md
+++ b/slides/containers/intro.md
@@ -28,7 +28,7 @@ class: self-paced
 - Likewise, it will take more than merely *reading* these slides
  to make you an expert

-- These slides include *tons* of demos, exercises, and examples
+- These slides include *tons* of exercises and examples

 - They assume that you have access to a machine running Docker

--- a/slides/exercises/appconfig-brief.md
+++ b/slides/exercises/appconfig-brief.md
@@ -1,5 +0,0 @@
-## Exercise — Application Configuration
-
-- Configure an application with a ConfigMap
-
-- Generate configuration file from the downward API
--- a/slides/exercises/appconfig-details.md
+++ b/slides/exercises/appconfig-details.md
@@ -1,87 +0,0 @@
-# Exercise — Application Configuration
-
-- We want to configure an application with a ConfigMap
-
-- We will use the "rainbow" example shown previously
-
-  (HAProxy load balancing traffic to services in multiple namespaces)
-
-- We won't provide the HAProxy configuration file
-
-- Instead, we will provide a list of namespaces
-
-  (e.g. as a space-delimited list in a ConfigMap)
-
-- Our Pod should generate the HAProxy configuration using the ConfigMap
-
----
-
-## Setup
-
-- Let's say that we have the "rainbow" app deployed:
-  ```bash
-  kubectl apply -f ~/container.training/k8s/rainbow.yaml
-  ```
-
-- And a ConfigMap like the following one:
-  ```bash
-  kubectl create configmap rainbow --from-literal=namespaces="blue green"
-  ```
-
----
-
-## Goal 1
-
-- We want a Deployment and a Service called `rainbow`
-
-- The `rainbow` Service should load balance across Namespaces `blue` and `green`
-
-  (i.e. to the Services called `color` in both these Namespaces)
-
-- We want to be able to update the configuration:
-
-  - update the ConfigMap to put `blue green red`
-
-  - what should we do so that HAproxy picks up the change?
-
----
-
-## Goal 2
-
-- Check what happens if we specify a backend that doesn't exist
-
-  (e.g. add `purple` to the list of namespaces)
-
-- If we specify invalid backends to HAProxy, it won't start!
-
-- Implement a workaround among these two:
-
-  - remove invalid backends from the list before starting HAProxy
-
-  - wait until all backends are valid before starting HAProxy
-
----
-
-## Goal 3
-
-- We'd like HAProxy to pick up ConfigMap updates automatically
-
-- How can we do that?
-
----
-
-## Hints
-
-- Check the following slides if you need help!
-
---
-
-- We want to generate the HAProxy configuration in an `initContainer`
-
---
-
-- The `namespaces` entry of the `rainbow` ConfigMap should be exposed to the `initContainer`
-
---
-
-- The HAProxy configuration should be in a volume shared with HAProxy
--- a/slides/exercises/dmuc-brief.md
+++ b/slides/exercises/dmuc-brief.md
@@ -1,7 +0,0 @@
-## Exercise — Build a Cluster
-
-- Deploy a cluster by configuring and running each component manually
-
-- Add CNI networking
-
-- Generate and validate ServiceAccount tokens
--- a/slides/exercises/dmuc-details.md
+++ b/slides/exercises/dmuc-details.md
@@ -1,33 +0,0 @@
-# Exercise — Build a Cluster
-
-- Step 1: deploy a cluster
-
-  - follow the steps in the "Dessine-moi un cluster" section
-
-- Step 2: add CNI networking
-
-  - une kube-router
-
-  - interconnect with the route-reflector
-
-  - check that you receive the routes of other clusters
-
-- Step 3: generate and validate ServiceAccount tokens
-
-  - see next slide for help!
-
----
-
-## ServiceAccount tokens
-
-- We need to generate a TLS key pair and certificate
-
-- A self-signed key will work
-
-- We don't need anything particular in the certificate
-
-  (no particular CN, key use flags, etc.)
-
-- The key needs to be passed to both API server and controller manager
-
-- Check that ServiceAccount tokens are generated correctly
--- a/slides/exercises/healthchecks-brief.md
+++ b/slides/exercises/healthchecks-brief.md
@@ -4,6 +4,8 @@

  (we will use the `rng` service in the dockercoins app)

-- See what happens when the load increses
+- Observe the correct behavior of the readiness probe

-  (spoiler alert: it involves timeouts!)
+  (when deploying e.g. an invalid image)
+
+- Observe the behavior of the liveness probe
--- a/slides/exercises/healthchecks-details.md
+++ b/slides/exercises/healthchecks-details.md
@@ -2,85 +2,36 @@

 - We want to add healthchecks to the `rng` service in dockercoins

-- The `rng` service exhibits an interesting behavior under load:
-
-  *its latency increases (which will cause probes to time out!)*
-
-- We want to see:
-
-  - what happens when the readiness probe fails
-
-  - what happens when the liveness probe fails
-
-  - how to set "appropriate" probes and probe parameters
-
----
-
-## Setup
-
 - First, deploy a new copy of dockercoins

-  (for instance, in a brand new namespace)
+- Then, add a readiness probe on the `rng` service

-- Pro tip #1: ping (e.g. with `httping`) the `rng` service at all times
-
-  - it should initially show a few milliseconds latency
-
-  - that will increase when we scale up
-
-  - it will also let us detect when the service goes "boom"
-
-- Pro tip #2: also keep an eye on the web UI
-
----
-
-## Readiness
-
-- Add a readiness probe to `rng`
-
-  - this requires editing the pod template in the Deployment manifest
-
-  - use a simple HTTP check on the `/` route of the service
-
-  - keep all other parameters (timeouts, thresholds...) at their default values
+  (using a simple HTTP check on the `/` route of the service)

 - Check what happens when deploying an invalid image for `rng` (e.g. `alpine`)

-*(If the probe was set up correctly, the app will continue to work,
-because Kubernetes won't switch over the traffic to the `alpine` containers,
-because they don't pass the readiness probe.)*
+- Then roll back `rng` to the original image and add a liveness probe
+
+  (with the same parameters)
+
+- Scale up the `worker` service (to 15+ workers) and observe
+
+- What happens, and how can we improve the situation?

 ---

-## Readiness under load
+## Goal

-- Then roll back `rng` to the original image
+- *Before* adding the readiness probe:

-- Check what happens when we scale up the `worker` Deployment to 15+ workers
+  updating the image of the `rng` service with `alpine` should break it

-  (get the latency above 1 second)
+- *After* adding the readiness probe:

-*(We should now observe intermittent unavailability of the service, i.e. every
-30 seconds it will be unreachable for a bit, then come back, then go away again, etc.)*
+  updating the image of the `rng` service with `alpine` shouldn't break it

----
+- When adding the liveness probe, nothing special should happen

-## Liveness
+- Scaling the `worker` service will then cause disruptions

-- Now replace the readiness probe with a liveness probe
-
-- What happens now?
-
-*(At first the behavior looks the same as with the readiness probe:
-service becomes unreachable, then reachable again, etc.; but there is
-a significant difference behind the scenes. What is it?)*
-
----
-
-## Readiness and liveness
-
-- Bonus questions!
-
-- What happens if we enable both probes at the same time?
-
-- What strategies can we use so that both probes are useful?
+- The final goal is to understand why, and how to fix it
--- a/slides/exercises/ingress-details.md
+++ b/slides/exercises/ingress-details.md
@@ -6,7 +6,7 @@

  - the web app itself (dockercoins, NGINX, whatever we want)

-  - an ingress controller
+  - an ingress controller (we suggest Traefik)

  - a domain name (`use \*.nip.io` or `\*.localdev.me`)

@@ -16,7 +16,7 @@

 ## Goal

-- We want to be able to access the web app using a URL like:
+- We want to be able to access the web app using an URL like:

  http://webapp.localdev.me

@@ -30,13 +30,11 @@

 ## Hints

-- For the ingress controller, we can use:
+- Traefik can be installed with Helm

-  - [ingress-nginx](https://github.com/kubernetes/ingress-nginx/blob/main/docs/deploy/index.md)
+  (it can be found on the Artifact Hub)

-  - the [Traefik Helm chart](https://doc.traefik.io/traefik/getting-started/install-traefik/#use-the-helm-chart)
-
-  - the container.training [Traefik DaemonSet](https://raw.githubusercontent.com/jpetazzo/container.training/main/k8s/traefik-v2.yaml)
+- If using Kubernetes 1.22+, make sure to use Traefik 2.5+

 - If our cluster supports LoadBalancer Services: easy

--- a/slides/exercises/ingress-secret-policy-brief.md
+++ b/slides/exercises/ingress-secret-policy-brief.md
@@ -1,5 +1,3 @@
-⚠️ BROKEN EXERCISE - DO NOT USE
-
 ## Exercise — Ingress Secret Policy

 *Implement policy to limit impact of ingress controller vulnerabilities.*
--- a/slides/exercises/ingress-secret-policy-details.md
+++ b/slides/exercises/ingress-secret-policy-details.md
@@ -1,5 +1,3 @@
-⚠️ BROKEN EXERCISE - DO NOT USE
-
 # Exercise — Ingress Secret Policy

 - Most ingress controllers have access to all Secrets
@@ -90,6 +88,6 @@

 ## Step 5: double-check

-- Check that the Ingress Controller can't access other secrets
+- Check that the Ingres Controller can't access other secrets

  (e.g. by manually creating a Secret and checking with `kubectl exec`?)
--- a/slides/exercises/k8sfundamentals-details.md
+++ b/slides/exercises/k8sfundamentals-details.md
@@ -8,37 +8,25 @@

 - We'll use one Deployment for each component

-  (created with `kubectl create deployment`)
+  (see next slide for the images to use)

 - We'll connect them with Services

-  (create with `kubectl expose`)
+- We'll check that we can access the web UI in a browser

 ---

 ## Images

-- We'll use the following images:
+- hasher → `dockercoins/hasher:v0.1`

-  - hasher → `dockercoins/hasher:v0.1`
+- redis → `redis`

-  - redis → `redis`
+- rng → `dockercoins/rng:v0.1`

-  - rng → `dockercoins/rng:v0.1`
+- webui → `dockercoins/webui:v0.1`

-  - webui → `dockercoins/webui:v0.1`
-
-  - worker → `dockercoins/worker:v0.1`
-
-- All services should be internal services, except the web UI
-
-  (since we want to be able to connect to the web UI from outside)
-
----
-
-class: pic
-
-![Dockercoins architecture diagram](images/dockercoins-diagram.png)
+- worker → `dockercoins/worker:v0.1`

 ---

@@ -46,7 +34,7 @@ class: pic

 - We should be able to see the web UI in our browser

-  (with the graph showing approximately 3-4 hashes/second)
+  (with the graph showing approximatiely 3-4 hashes/second)

 ---

@@ -56,4 +44,4 @@ class: pic

  (check the logs of the worker; they indicate the port numbers)

-- The web UI can be exposed with a NodePort or LoadBalancer Service
+- The web UI can be exposed with a NodePort Service
--- a/slides/exercises/kyverno-ingress-domain-name-brief.md
+++ b/slides/exercises/kyverno-ingress-domain-name-brief.md
@@ -1,9 +0,0 @@
-## Exercise — Generating Ingress With Kyverno
-
-- When a Service gets created, automatically generate an Ingress
-
-- Step 1: expose all services with a hard-coded domain name
-
-- Step 2: only expose services that have a port named `http`
-
-- Step 3: configure the domain name with a per-namespace ConfigMap
--- a/slides/exercises/kyverno-ingress-domain-name-details.md
+++ b/slides/exercises/kyverno-ingress-domain-name-details.md
@@ -1,33 +0,0 @@
-# Exercise — Generating Ingress With Kyverno
-
-When a Service gets created...
-
-*(for instance, Service `blue` in Namespace `rainbow`)*
-
-...Automatically generate an Ingress.
-
-*(for instance, with host name `blue.rainbow.MYDOMAIN.COM`)*
-
---
-
-## Goals
-
- Step 1: expose all services with a hard-coded domain name
-
- Step 2: only expose services that have a port named `http`
-
- Step 3: configure the domain name with a per-namespace ConfigMap
-
-  (e.g. `kubectl create configmap ingress-domain-name --from-literal=domain=1.2.3.4.nip.io`)
-
---
-
-## Hints
-
- We want to use a Kyverno `generate` ClusterPolicy
-
- For step 1, check [Generate Resources](https://kyverno.io/docs/writing-policies/generate/) documentation
-
- For step 2, check [Preconditions](https://kyverno.io/docs/writing-policies/preconditions/) documentation
-
- For step 3, check [External Data Sources](https://kyverno.io/docs/writing-policies/external-data-sources/) documentation
--- a/slides/exercises/remotecluster-brief.md
+++ b/slides/exercises/remotecluster-brief.md
@@ -1,9 +0,0 @@
-## Exercise — Remote Cluster
-
- Install kubectl locally
-
- Retrieve the kubeconfig file of our remote cluster
-
- Deploy dockercoins on that cluster
-
- Access an internal service without exposing it
--- a/slides/exercises/remotecluster-details.md
+++ b/slides/exercises/remotecluster-details.md
@@ -1,62 +0,0 @@
-# Exercise — Remote Cluster
-
- We want to control a remote cluster
-
- Then we want to run a copy of dockercoins on that cluster
-
- We want to be able to connect to an internal service
-
---
-
-## Goal
-
- Be able to access e.g. hasher, rng, or webui
-
-  (without exposing them with a NodePort or LoadBalancer service)
-
---
-
-## Getting access to the cluster
-
- If you don't have `kubectl` on your machine, install it
-
- Download the kubeconfig file from the remote cluster
-
-  (you can use `scp` or even copy-paste it)
-
- If you already have a kubeconfig file on your machine:
-
-  - save the remote kubeconfig with another name (e.g. `~/.kube/config.remote`)
-
-  - set the `KUBECONFIG` environment variable to point to that file name
-
-  - ...or use the `--kubeconfig=...` option with `kubectl`
-
- Check that you can access the cluster (e.g. `kubectl get nodes`)
-
---
-
-## If you get an error...
-
-⚠️ The following applies to clusters deployed with `kubeadm`
-
- If you have a cluster where the nodes are named `node1`, `node2`, etc.
-
- `kubectl` commands might show connection errors with internal IP addresses
-
-  (e.g. 10.10... or 172.17...)
-
- In that case, you might need to edit the `kubeconfig` file:
-
-  - find the server address
-
-  - update it to put the *external* address of the first node of the cluster
-
---
-
-
-## Deploying an app
-
- Deploy another copy of dockercoins from your local machine
-
- Access internal services (e.g. with `kubectl port-forward`)
--- a/slides/exercises/sealed-secrets-details.md
+++ b/slides/exercises/sealed-secrets-details.md
@@ -24,9 +24,9 @@ We will call them "dev cluster" and "prod cluster".

 - Our application needs two secrets:

-  - a *logging API token* (not too sensitive; same in dev and prod)
+  - `logging_api_token` (not too sensitive; same in dev and prod)

-  - a *database password* (sensitive; different in dev and prod)
+  - `database_password` (sensitive; different in dev and prod)

 - Secrets can be exposed as env vars, or mounted in volumes

@@ -42,7 +42,7 @@ We will call them "dev cluster" and "prod cluster".

 - On the dev cluster, create a Namespace called `dev`

- Create the two secrets, `logging-api-token` and `database-password`
+- Create the two secrets, `logging_api_token` and `database_password`

  (the content doesn't matter; put a random string of your choice)

@@ -110,8 +110,8 @@ We want Alice to be able to:

 - deploy the whole application in the `prod` namespace

- access the *logging API token* secret
+- access the `logging_api_token` secret

- but *not* the *database password* secret
+- but *not* the `database_password` secret

 - view the logs of the app
--- a/slides/exercises/tf-nodepools-brief.md
+++ b/slides/exercises/tf-nodepools-brief.md
@@ -1,9 +0,0 @@
-## Exercise — Terraform Node Pools
-
- Write a Terraform configuration to deploy a cluster
-
- The cluster should have two node pools with autoscaling
-
- Deploy two apps, each using exclusively one node pool
-
- Bonus: deploy an app balanced across both node pools
--- a/slides/exercises/tf-nodepools-details.md
+++ b/slides/exercises/tf-nodepools-details.md
@@ -1,69 +0,0 @@
-# Exercise — Terraform Node Pools
-
- Write a Terraform configuration to deploy a cluster
-
- The cluster should have two node pools with autoscaling
-
- Deploy two apps, each using exclusively one node pool
-
- Bonus: deploy an app balanced across both node pools
-
---
-
-## Cluster deployment
-
- Write a Terraform configuration to deploy a cluster
-
- We want to have two node pools with autoscaling
-
- Example for sizing:
-
-  - 4 GB / 1 CPU per node
-
-  - pools of 1 to 4 nodes
-
---
-
-## Cluster autoscaling
-
- Deploy an app on the cluster
-
-  (you can use `nginx`, `jpetazzo/color`...)
-
- Set a resource request (e.g. 1 GB RAM)
-
- Scale up and verify that the autoscaler kicks in
-
---
-
-## Pool isolation
-
- We want to deploy two apps
-
- The first app should be deployed exclusively on the first pool
-
- The second app should be deployed exclusively on the second pool
-
- Check the next slide for hints!
-
---
-
-## Hints
-
- One solution involves adding a `nodeSelector` to the pod templates
-
- Another solution involves adding:
-
-  - `taints` to the node pools
-
-  - matching `tolerations` to the pod templates
-
---
-
-## Balancing
-
- Step 1: make sure that the pools are not balanced
-
- Step 2: deploy a new app, check that it goes to the emptiest pool
-
- Step 3: update the app so that it balances (as much as possible) between pools
--- a/slides/find-unmerged-changes.sh
+++ b/slides/find-unmerged-changes.sh
@@ -1,60 +0,0 @@
-#!/bin/sh
-
-# The materials for a given training live in their own branch.
-# Sometimes, we write custom content (or simply new content) for a training,
-# and that content doesn't get merged back to main. This script tries to
-# detect that with the following heuristics:
-# - list all remote branches
-# - for each remote branch, list the changes that weren't merged into main
-#   (using "diff main...$BRANCH", three dots)
-# - ignore a bunch of training-specific files that change all the time anyway
-# - for the remaining files, compute the diff between main and the branch
-#   (using "diff main..$BRANCH", two dots)
-# - ignore changes of less than 10 lines
-# - also ignore a few red herrings
-# - display whatever is left
-
-# For "git diff" (in the filter function) to work correctly, we must be
-# at the root of the repo.
-cd $(git rev-parse --show-toplevel)
-
-BRANCHES=$(git branch -r | grep -v origin/HEAD | grep origin/2)
-
-filter() {
-  threshold=10
-  while read filename; do
-    case $filename in
-      # Generic training-specific files
-      slides/*.html) continue;;
-      slides/*.yml) continue;;
-      slides/logistics*.md) continue;;
-      # Specific content that can be ignored
-      #slides/containers/Local_Environment.md) threshold=100;;
-      # Content that was moved/refactored enough to confuse us
-      slides/containers/Local_Environment.md) threshold=100;;
-      slides/exercises.md) continue;;
-      slides/k8s/batch-jobs) threshold=20;;
-      # Renames
-      */{*}*) continue;;
-    esac
-    git diff --find-renames --numstat main..$BRANCH -- "$filename" | {
-      # If the files are identical, the diff will be empty, and "read" will fail.
-      read plus minus filename || return
-      # Ignore binary files (FIXME though?)
-      if [ $plus = - ]; then
-        return
-      fi
-      diff=$((plus-minus))
-      if [ $diff -gt $threshold ]; then
-        echo git diff main..$BRANCH -- $filename
-      fi
-    }
-  done
-}
-
-for BRANCH in $BRANCHES; do
-  if FILES=$(git diff --find-renames --name-only main...$BRANCH | filter | grep .); then
-    echo "🌳 $BRANCH:"
-    echo "$FILES"
-  fi
-done
--- a/slides/k8s/access-eks-cluster.md
+++ b/slides/k8s/access-eks-cluster.md
@@ -32,7 +32,7 @@

 - You're welcome to use whatever you like (e.g. AWS profiles)

-.lab[
+.exercise[

 - Set the AWS region, API access key, and secret key:
  ```bash
@@ -58,7 +58,7 @@

  - register it in our kubeconfig file

-.lab[
+.exercise[

 - Update our kubeconfig file:
  ```bash
--- a/slides/k8s/accessinternal.md
+++ b/slides/k8s/accessinternal.md
@@ -20,13 +20,13 @@

 ## Suspension of disbelief

-The labs and demos in this section assume that we have set up `kubectl` on our
+The exercises in this section assume that we have set up `kubectl` on our
 local machine in order to access a remote cluster.

 We will therefore show how to access services and pods of the remote cluster,
 from our local machine.

-You can also run these commands directly on the cluster (if you haven't
+You can also run these exercises directly on the cluster (if you haven't
 installed and set up `kubectl` locally).

 Running commands locally will be less useful
@@ -58,7 +58,7 @@ installed and set up `kubectl` to communicate with your cluster.

 - Let's access the `webui` service through `kubectl proxy`

-.lab[
+.exercise[

 - Run an API proxy in the background:
  ```bash
@@ -101,7 +101,7 @@ installed and set up `kubectl` to communicate with your cluster.

 - Let's access our remote Redis server

-.lab[
+.exercise[

 - Forward connections from local port 10000 to remote port 6379:
  ```bash
--- a/slides/k8s/admission.md
+++ b/slides/k8s/admission.md
@@ -198,7 +198,7 @@ Some examples ...

  (the Node "echo" app, the Flask app, and one ngrok tunnel for each of them)

-.lab[
+.exercise[

 - Go to the webhook directory:
  ```bash
@@ -244,7 +244,7 @@ class: extra-details

 - We need to update the configuration with the correct `url`

-.lab[
+.exercise[

 - Edit the webhook configuration manifest:
  ```bash
@@ -271,7 +271,7 @@ class: extra-details

  (so if the webhook server is down, we can still create pods)

-.lab[
+.exercise[

 - Register the webhook:
  ```bash
@@ -288,7 +288,7 @@ It is strongly recommended to tail the logs of the API server while doing that.

 - Let's create a pod and try to set a `color` label

-.lab[
+.exercise[

 - Create a pod named `chroma`:
  ```bash
@@ -328,7 +328,7 @@ Note: the webhook doesn't do anything (other than printing the request payload).

 ## Update the webhook configuration

-.lab[
+.exercise[

 - First, check the ngrok URL of the tunnel for the Flask app:
  ```bash
@@ -395,7 +395,7 @@ Note: the webhook doesn't do anything (other than printing the request payload).

 ## Let's get to work!

-.lab[
+.exercise[

 - Make sure we're in the right directory:
  ```bash
@@ -424,7 +424,7 @@ Note: the webhook doesn't do anything (other than printing the request payload).

  ... we'll store it in a ConfigMap, and install dependencies on the fly

-.lab[
+.exercise[

 - Load the webhook source in a ConfigMap:
  ```bash
@@ -446,7 +446,7 @@ Note: the webhook doesn't do anything (other than printing the request payload).

  (of course, there are plenty others options; e.g. `cfssl`)

-.lab[
+.exercise[

 - Generate a self-signed certificate:
  ```bash
@@ -470,7 +470,7 @@ Note: the webhook doesn't do anything (other than printing the request payload).

 - Let's reconfigure the webhook to use our Service instead of ngrok

-.lab[
+.exercise[

 - Edit the webhook configuration manifest:
  ```bash
@@ -504,7 +504,7 @@ Note: the webhook doesn't do anything (other than printing the request payload).

 Shell to the rescue!

-.lab[
+.exercise[

 - Load up our cert and encode it in base64:
  ```bash
--- a/slides/k8s/aggregation-layer.md
+++ b/slides/k8s/aggregation-layer.md
@@ -66,7 +66,7 @@

 - We'll ask `kubectl` to show us the exacts requests that it's making

-.lab[
+.exercise[

 - Check the URI for a cluster-scope, "core" resource, e.g. a Node:
  ```bash
@@ -122,7 +122,7 @@ class: extra-details

 - What about namespaced resources?

-.lab[
+.exercise[

 - Check the URI for a namespaced, "core" resource, e.g. a Service:
  ```bash
@@ -169,7 +169,7 @@ class: extra-details

 ## Accessing a subresource

-.lab[
+.exercise[

 - List `kube-proxy` pods:
  ```bash
@@ -200,7 +200,7 @@ command=echo&command=hello&command=world&container=kube-proxy&stderr=true&stdout

 - There are at least three useful commands to introspect the API server

-.lab[
+.exercise[

 - List resources types, their group, kind, short names, and scope:
  ```bash
@@ -249,7 +249,7 @@ command=echo&command=hello&command=world&container=kube-proxy&stderr=true&stdout

 The following assumes that `metrics-server` is deployed on your cluster.

-.lab[
+.exercise[

 - Check that the metrics.k8s.io is registered with `metrics-server`:
  ```bash
@@ -271,7 +271,7 @@ The following assumes that `metrics-server` is deployed on your cluster.

 - We can have multiple resources with the same name

-.lab[
+.exercise[

 - Look for resources named `node`:
  ```bash
@@ -298,7 +298,7 @@ The following assumes that `metrics-server` is deployed on your cluster.

 - But we can look at the raw data (with `-o json` or `-o yaml`)

-.lab[
+.exercise[

 - Look at NodeMetrics objects with one of these commands:
  ```bash
@@ -320,7 +320,7 @@ The following assumes that `metrics-server` is deployed on your cluster.

 --

-.lab[
+.exercise[

 - Display node metrics:
  ```bash
@@ -342,7 +342,7 @@ The following assumes that `metrics-server` is deployed on your cluster.

 - Then we can register that server by creating an APIService resource

-.lab[
+.exercise[

 - Check the definition used for the `metrics-server`:
  ```bash
--- a/slides/k8s/apiserver-deepdive.md
+++ b/slides/k8s/apiserver-deepdive.md
@@ -103,7 +103,7 @@ class: extra-details

 ---

-## `WithWaitGroup`
+## `WithWaitGroup`

 - When we shutdown, tells clients (with in-flight requests) to retry

--- a/slides/k8s/architecture.md
+++ b/slides/k8s/architecture.md
@@ -20,67 +20,25 @@ The control plane can run:

 - in containers, on the same nodes that run other application workloads

-  (default behavior for local clusters like [Minikube](https://github.com/kubernetes/minikube), [kind](https://kind.sigs.k8s.io/)...)
+  (examples: [Minikube](https://github.com/kubernetes/minikube) and [kind](https://kind.sigs.k8s.io/), where one node runs everything)

 - on a dedicated node

-  (default behavior when deploying with kubeadm)
+  (example: a cluster installed with kubeadm)

 - on a dedicated set of nodes

-  ([Kubernetes The Hard Way](https://github.com/kelseyhightower/kubernetes-the-hard-way); [kops](https://github.com/kubernetes/kops); also kubeadm)
+  (examples: [Kubernetes The Hard Way](https://github.com/kelseyhightower/kubernetes-the-hard-way); [kops](https://github.com/kubernetes/kops))

 - outside of the cluster

-  (most managed clusters like AKS, DOK, EKS, GKE, Kapsule, LKE, OKE...)
+  (example: most managed clusters like AKS, EKS, GKE)

 ---

 class: pic

-![](images/control-planes/single-node-dev.svg)
-
---
-
-class: pic
-
-![](images/control-planes/managed-kubernetes.svg)
-
---
-
-class: pic
-
-![](images/control-planes/single-control-and-workers.svg)
-
---
-
-class: pic
-
-![](images/control-planes/stacked-control-plane.svg)
-
---
-
-class: pic
-
-![](images/control-planes/non-dedicated-stacked-nodes.svg)
-
---
-
-class: pic
-
-![](images/control-planes/advanced-control-plane.svg)
-
---
-
-class: pic
-
-![](images/control-planes/advanced-control-plane-split-events.svg)
-
---
-
-class: pic
-
-![Kubernetes architecture diagram: communication between components](images/k8s-arch4-thanks-luxas.png)
+![Kubernetes architecture diagram: control plane and nodes](images/k8s-arch2.png)

 ---

@@ -157,6 +115,12 @@ The kubelet agent uses a number of special-purpose protocols and interfaces, inc

 ---

+class: pic
+
+![Kubernetes architecture diagram: communication between components](images/k8s-arch4-thanks-luxas.png)
+
+---
+
 # The Kubernetes API

 [
@@ -203,9 +167,9 @@ What does that mean?

 ## Let's experiment a bit!

- For this section, connect to the first node of the `test` cluster
+- For the exercises in this section, connect to the first node of the `test` cluster

-.lab[
+.exercise[

 - SSH to the first node of the test cluster

@@ -224,7 +188,7 @@ What does that mean?

 - Let's create a simple object

-.lab[
+.exercise[

 - Create a namespace with the following command:
  ```bash
@@ -246,7 +210,7 @@ This is equivalent to `kubectl create namespace hello`.

 - Let's retrieve the object we just created

-.lab[
+.exercise[

 - Read back our object:
  ```bash
@@ -354,7 +318,7 @@ class: extra-details

 - The easiest way is to use `kubectl label`

-.lab[
+.exercise[

 - In one terminal, watch namespaces:
  ```bash
@@ -402,7 +366,7 @@ class: extra-details

  - DELETED resources

-.lab[
+.exercise[

 - In one terminal, watch pods, displaying full events:
  ```bash
--- a/slides/k8s/authn-authz.md
+++ b/slides/k8s/authn-authz.md
@@ -361,7 +361,7 @@ class: extra-details

 ## Listing service accounts

-.lab[
+.exercise[

 - The resource name is `serviceaccount` or `sa` for short:
  ```bash
@@ -378,7 +378,7 @@ class: extra-details

 ## Finding the secret

-.lab[
+.exercise[

 - List the secrets for the `default` service account:
  ```bash
@@ -398,7 +398,7 @@ class: extra-details

 - The token is stored in the secret, wrapped with base64 encoding

-.lab[
+.exercise[

 - View the secret:
  ```bash
@@ -421,7 +421,7 @@ class: extra-details

 - Let's send a request to the API, without and with the token

-.lab[
+.exercise[

 - Find the ClusterIP for the `kubernetes` service:
  ```bash
@@ -495,49 +495,6 @@ class: extra-details

 ---

-class: extra-details
-
-## Listing all possible verbs
-
- The Kubernetes API is self-documented
-
- We can ask it which resources, subresources, and verb exist
-
- One way to do this is to use:
-
-  - `kubectl get --raw /api/v1` (for core resources with `apiVersion: v1`)
-
-  - `kubectl get --raw /apis/<group>/<version>` (for other resources)
-
- The JSON response can be formatted with e.g. `jq` for readability
-
---
-
-class: extra-details
-
-## Examples
-
- List all verbs across all `v1` resources
-
-  ```bash
-  kubectl get --raw /api/v1 | jq -r .resources[].verbs[] | sort -u
-  ```
-
- List all resources and subresources in `apps/v1`
-
-  ```bash
-  kubectl get --raw /apis/apps/v1 | jq -r .resources[].name
-  ```
-
- List which verbs are available on which resources in `networking.k8s.io`
-
-  ```bash
-  kubectl get --raw /apis/networking.k8s.io/v1 | \
-          jq -r '.resources[] | .name + ": " + (.verbs | join(", "))'
-  ```
-
---
-
 ## From rules to roles to rolebindings

 - A *role* is an API object containing a list of *rules*
@@ -616,7 +573,7 @@ class: extra-details

 - Nixery automatically generates images with the requested packages

-.lab[
+.exercise[

 - Run our pod:
  ```bash
@@ -632,7 +589,7 @@ class: extra-details

 - Normally, at this point, we don't have any API permission

-.lab[
+.exercise[

 - Check our permissions with `kubectl`:
  ```bash
@@ -658,7 +615,7 @@ class: extra-details

  (but again, we could call it `view` or whatever we like)

-.lab[
+.exercise[

 - Create the new role binding:
  ```bash
@@ -716,7 +673,7 @@ It's important to note a couple of details in these flags...

 - We should be able to *view* things, but not to *edit* them

-.lab[
+.exercise[

 - Check our permissions with `kubectl`:
  ```bash
@@ -971,18 +928,6 @@ class: extra-details
  kubectl describe clusterrole cluster-admin
  ```

---
-
-## `list` vs. `get`
-
-⚠️ `list` grants read permissions to resources!
-
- It's not possible to give permission to list resources without also reading them
-
- This has implications for e.g. Secrets
-
-  (if a controller needs to be able to enumerate Secrets, it will be able to read them)
-
 ???

 :EN:- Authentication and authorization in Kubernetes
--- a/slides/k8s/authoring-yaml.md
+++ b/slides/k8s/authoring-yaml.md
@@ -93,7 +93,7 @@

 - We can use the `--dry-run=client` option

-.lab[
+.exercise[

 - Generate the YAML for a Deployment without creating it:
  ```bash
@@ -128,7 +128,7 @@ class: extra-details

 ## The limits of `kubectl apply --dry-run=client`

-.lab[
+.exercise[

 - Generate the YAML for a deployment:
  ```bash
@@ -161,7 +161,7 @@ class: extra-details

  (all validation and mutation hooks will be executed)

-.lab[
+.exercise[

 - Try the same YAML file as earlier, with server-side dry run:
  ```bash
@@ -200,7 +200,7 @@ class: extra-details

 - `kubectl diff` does a server-side dry run, *and* shows differences

-.lab[
+.exercise[

 - Try `kubectl diff` on the YAML that we tweaked earlier:
  ```bash
--- a/slides/k8s/aws-eks.md
+++ b/slides/k8s/aws-eks.md
@@ -1,693 +0,0 @@
-# Amazon EKS
-
- Elastic Kubernetes Service
-
- AWS runs the Kubernetes control plane
-
-  (all we see is an API server endpoint)
-
- Pods can run on any combination of:
-
-  - EKS-managed nodes
-
-  - self-managed nodes
-
-  - Fargate
-
- Leverages and integrates with AWS services and APIs
-
---
-
-## Some integrations
-
- Authenticate with IAM users and roles
-
- Associate IAM roles to Kubernetes ServiceAccounts
-
- Load balance traffic with ALB/ELB/NLB
-
- Persist data with EBS/EFS
-
- Label nodes with instance ID, instance type, region, AZ ...
-
- Pods can be "first class citizens" of VPC
-
---
-
-## Pros/cons
-
- Fully managed control plane
-
- Handles deployment, upgrade, scaling of the control plane
-
- Available versions and features tend to lag a bit
-
- Doesn't fit the most demanding users
-
-  ("demanding" starts somewhere between 100 and 1000 nodes)
-
---
-
-## Good to know ...
-
- Some integrations are specific to EKS
-
-  (some authentication models)
-
- Many integrations are *not* specific to EKS
-
- The Cloud Controller Manager can run outside of EKS
-
-  (and provide LoadBalancer services, EBS volumes, and more)
-
---
-
-# Provisioning clusters
-
- AWS console, API, CLI
-
- `eksctl`
-
- Infrastructure-as-Code
-
---
-
-## AWS "native" provisioning
-
- AWS web console
-
-  - click-click-click!
-
-  - difficulty: low
-
- AWS API or CLI
-
-  - must provide subnets, ARNs
-
-  - difficulty: medium
-
---
-
-## `eksctl`
-
- Originally developed by Weave
-
-  (back when AWS "native" provisioning wasn't very good)
-
- `eksctl create cluster` just works™
-
- Has been "adopted" by AWS
-
-  (is listed in official documentations)
-
---
-
-## Infrastructure-as-Code
-
- Cloud Formation
-
- Terraform
-
-  [terraform-aws-eks](https://github.com/terraform-aws-modules/terraform-aws-eks)
-  by the community
-  ([example](https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/basic))
-
-  [terraform-provider-aws](https://github.com/hashicorp/terraform-provider-aws)
-  by Hashicorp
-  ([example](https://github.com/hashicorp/terraform-provider-aws/tree/main/examples/eks-getting-started))
-
-  [Kubestack](https://www.kubestack.com/)
-
---
-
-## Node groups
-
- Virtually all provisioning models have a concept of "node group"
-
- Node group = group of similar nodes in an ASG
-
-  - can span multiple AZ
-
-  - can have instances of different types¹
-
- A cluster will need at least one node group
-
-.footnote[¹As I understand it, to specify fallbacks if one instance type is unavailable or out of capacity.]
-
---
-
-# IAM → EKS authentication
-
- Access EKS clusters using IAM users and roles
-
- No special role, permission, or policy is needed in IAM
-
-  (but the `eks:DescribeCluster` permission can be useful, see later)
-
- Users and roles need to be explicitly listed in the cluster
-
- Configuration is done through a ConfigMap in the cluster
-
---
-
-## Setting it up
-
- Nothing to do when creating the cluster
-
-  (feature is always enabled)
-
- Users and roles are *mapped* to Kubernetes users and groups
-
-  (through the `aws-auth` ConfigMap in `kube-system`)
-
- That's it!
-
---
-
-## Mapping
-
- The `aws-auth` ConfigMap can contain two entries:
-
-  - `mapRoles` (map IAM roles)
-
-  - `mapUsers` (map IAM users)
-
- Each entry is a YAML file
-
- Each entry includes:
-
-  - `rolearn` or `userarn` to map
-
-  - `username` (as a string)
-
-  - `groups` (as a list; can be empty)
-
---
-
-## Example
-
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
-  namespace: kube-system
-  name: aws-auth
-data:
-  mapRoles: `|`
-    - rolearn: arn:aws:iam::111122223333:role/blah
-      username: blah
-      groups: [ devs, ops ]
-  mapUsers: `|`
-    - userarn: arn:aws:iam::111122223333:user/alice
-      username: alice
-      groups: [ system:masters ]
-    - userarn: arn:aws:iam::111122223333:user/bob
-      username: bob
-      groups: [ system:masters ]
-```
-
---
-
-## Client setup
-
- We need either the `aws` CLI or the `aws-iam-authenticator`
-
- We use them as `exec` plugins in `~/.kube/config`
-
- Done automatically by `eksctl`
-
- Or manually with `aws eks update-kubeconfig`
-
- Discovering the address of the API server requires one IAM permission
-
-  ```json
-    "Action": [
-        "eks:DescribeCluster"
-    ],
-    "Resource": "arn:aws:eks:<region>:<account>:cluster/<cluster-name>"
-  ```
-
-  (wildcards can be used when specifying the resource)
-
---
-
-class: extra-details
-
-## How it works
-
- The helper generates a token
-
-  (with `aws eks get-token` or `aws-iam-authenticator token`)
-
- Note: these calls will always succeed!
-
-  (even if AWS API keys are invalid)
-
- The token is used to authenticate with the Kubernetes API
-
- AWS' Kubernetes API server will decode and validate the token
-
-  (and map the underlying user or role accordingly)
-
---
-
-## Read The Fine Manual
-
-https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html
-
---
-
-# EKS → IAM authentication
-
- Access AWS services from workloads running on EKS
-
-  (e.g.: access S3 bucket from code running in a Pod)
-
- This works by associating an IAM role to a K8S ServiceAccount
-
- There are also a few specific roles used internally by EKS
-
-  (e.g. to let the nodes establish network configurations)
-
- ... We won't talk about these
-
---
-
-## The big picture
-
- One-time setup task
-
-  ([create an OIDC provider associated to our EKS cluster](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html))
-
- Create (or update) a role with an appropriate *trust policy*
-
-  (more on that later)
-
- Annotate service accounts to map them to that role
-
-  `eks.amazonaws.com/role-arn=arn:aws:iam::111122223333:role/some-iam-role`
-
- Create (or re-create) pods using that ServiceAccount
-
- The pods can now use that role!
-
---
-
-## Trust policies
-
- IAM roles have a *trust policy* (aka *assume role policy*)
-
-  (cf `aws iam create-role ... --assume-role-policy-document ...`)
-
- That policy contains a *statement* list
-
- This list indicates who/what is allowed to assume (use) the role
-
- In the current scenario, that policy will contain something saying:
-
-  *ServiceAccount S on EKS cluster C is allowed to use this role*
-
---
-
-## Trust policy for a single ServiceAccount
-
-```json
-{
-  "Version": "2012-10-17",
-  "Statement": [
-    {
-      "Effect": "Allow",
-      "Principal": {
-        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
-      },
-      "Action": "sts:AssumeRoleWithWebIdentity",
-      "Condition": {
-        "StringEquals": {
-          "${OIDC_PROVIDER}:sub":
-            "system:serviceaccount:<namespace>:<service-account>"
-        }
-      }
-    }
-  ]
-}
-```
-
---
-
-## Trust policy for multiple ServiceAccounts
-
-```json
-{
-  "Version": "2012-10-17",
-  "Statement": [
-    {
-      "Effect": "Allow",
-      "Principal": {
-        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
-      },
-      "Action": "sts:AssumeRoleWithWebIdentity",
-      "Condition": {
-        "StringLike": {
-            "${OIDC_PROVIDER}:sub": 
-              ["system:serviceaccount:container-training:*"]
-        }
-      }
-    }
-  ]
-}
-```
-
---
-
-## The little details
-
- When pods are created, they are processed by a mutating webhook
-
-  (typically named `pod-identity-webhook`)
-
- Pods using a ServiceAccount with the right annotation get:
-
-  - an extra token
-    <br/>
-    (mounted in `/var/run/secrets/eks.amazonaws.com/serviceaccount/token`)
-
-  - a few env vars
-    <br/>
-    (including `AWS_WEB_IDENTITY_TOKEN_FILE` and `AWS_ROLE_ARN`)
-
- AWS client libraries and tooling will work this that
-
-  (see [this list](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-minimum-sdk.html) for supported versions)
-
---
-
-# CNI
-
- EKS is a compliant Kubernetes implementation
-
-  (which means we can use a wide range of CNI plugins)
-
- However, the recommended CNI plugin is the "AWS VPC CNI"
-
-  (https://github.com/aws/amazon-vpc-cni-k8s)
-
- Pods are then "first class citizens" of AWS VPC
-
---
-
-## AWS VPC CNI
-
- Each Pod gets an address in a VPC subnet
-
- No overlay network, no encapsulation, no overhead
-
-  (other than AWS network fabric, obviously)
-
- Probably the fastest network option when running on AWS
-
- Allows "direct" load balancing (more on that later)
-
- Can use security groups with Pod traffic
-
- But: limits the number of Pods per Node
-
- But: more complex configuration (more on that later)
-
---
-
-## Number of Pods per Node
-
- Each Pod gets an IP address on an ENI
-
-  (Elastic Network Interface)
-
- EC2 instances can only have a limited number of ENIs
-
-  (the exact limit depends on the instance type)
-
- ENIs can only have a limited number of IP addresses
-
-  (with variations here as well)
-
- This gives limits of e.g. 35 pods on `t3.large`, 29 on `c5.large` ...
-
-  (see
-  [full list of limits per instance type](https://github.com/awslabs/amazon-eks-ami/blob/master/files/eni-max-pods.txt
-)
-  and
-  [ENI/IP details](https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/awsutils/vpc_ip_resource_limit.go
-))
-
---
-
-## Limits?
-
- These limits might seem low
-
- They're not *that* low if you compute e.g. the RAM/Pod ratio
-
- Except if you're running lots if tiny pods
-
- Bottom line: do the math!
-
---
-
-class: extra-details
-
-## Pre-loading
-
- It can take a little while to allocate/attach an ENI
-
- The AWS VPC CNI can keep a few extra addresses on each Node
-
-  (by default, one ENI worth of IP addresses)
-
- This is tunable if needed
-
-  (see [the docs](https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/eni-and-ip-target.md
-) for details)
-
---
-
-## Better load balancing
-
- The default path for inbound traffic is:
-
-  Load balancer → NodePort → Pod
-
- With the AWS VPC CNI, it becomes possible to do:
-
-  Load balancer → Pod
-
- More on that in the load balancing section!
-
---
-
-## Configuration complexity
-
- The AWS VPC CNI is a very good solution when running EKS
-
- It brings optimized solutions to various use-cases:
-
-  - direct load balancing
-  - user authentication
-  - interconnection with other infrastructure
-  - etc.
-
- Keep in mind that all these solutions are AWS-specific
-
- They can require a non-trivial amount of specific configuration
-
- Especially when moving from a simple POC to an IAC deployment!
-
---
-
-# Load Balancers
-
- Here be dragons!
-
- Multiple options, each with different pros/cons
-
- It's necessary to know both AWS products and K8S concepts
-
---
-
-## AWS load balancers
-
- CLB / Classic Load Balancer (formerly known as ELB)
-
-  - can work in L4 (TCP) or L7 (HTTP) mode
-  - can do TLS unrolling
-  - can't do websockets, HTTP/2, content-based routing ...
-
- NLB / Network Load Balancer
-
-  - high-performance L4 load balancer with TLS support
-
- ALB / Application Load Balancer
-
-  - HTTP load balancer
-  - can do TLS unrolling
-  - can do websockets, HTTP/2, content-based routing ...
-
---
-
-## Load balancing modes
-
- "IP targets"
-
-  - send traffic directly from LB to Pods
-
-  - Pods must use the AWS VPC CNI
-
-  - compatible with Fargate Pods
-
- "Instance targets"
-
-  - send traffic to a NodePort (generally incurs an extra hop)
-
-  - Pods can use any CNI
-
-  - not compatible with Fargate Pods
-
- Each LB (Service) can use a different mode, if necessary
-
---
-
-## Kubernetes load balancers
-
- Service (L4)
-
-  - ClusterIP: internal load balancing
-  - NodePort: external load balancing on ports >30000
-  - LoadBalancer: external load balancing on the port you want
-  - ExternalIP: external load balancing directly on nodes
-
- Ingress (L7 HTTP)
-
-  - partial content-based routing (`Host` header, request path)
-  - requires an Ingress Controller (in front)
-  - works with Services (in back)
-
---
-
-## Two controllers are available
-
- Kubernetes "in-tree" load balancer controller
-
-  - always available
-  - used by default for LoadBalancer Services
-  - creates CLB by default; can also do NLB
-  - can only do "instance targets"
-  - can use extra CLB features (TLS, HTTP)
-
- AWS Load Balancer Controller (fka AWS ALB Ingress Controller)
-
-  - optional add-on (requires additional config)
-  - primarily meant to be an Ingress Controller
-  - creates NLB and ALB
-  - can do "instance targets" and "IP targets"
-  - can also be used for LoadBalancer Services with type `nlb-ip`
-
- They can run side by side
-
---
-
-## Which one should we use?
-
- AWS Load Balancer Controller supports "IP targets"
-
-  (which means direct routing of traffic to Pods)
-
- It can be used as an Ingress controller
-
- It *seems* to be the perfect solution for EKS!
-
- However ...
-
---
-
-## Caveats
-
- AWS Load Balancer Controller requires extensive configuration
-
-  - a few hours to a few days to get it to work in a POC ...
-
-  - a few days to a few weeks to industrialize that process?
-
- It's AWS-specific
-
- It still introduces an extra hop, even if that hop is invisible
-
- Other ingress controllers can have interesting features
-
-  (canary deployment, A/B testing ...)
-
---
-
-## Noteworthy annotations and docs
-
- `service.beta.kubernetes.io/aws-load-balancer-type: nlb-ip`
-
-  - LoadBalancer Service with "IP targets" ([docs](https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/service/nlb_ip_mode/))
-  - requires AWS Load Balancer Controller
-
- `service.beta.kubernetes.io/aws-load-balancer-internal: "true"`
-
-  - internal load balancer (for private VPC)
-
- `service.beta.kubernetes.io/aws-load-balancer-type: nlb`
-
-  - opt for NLB instead of CLB with in-tree controller
-
- `service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"`
-
-  - use HAProxy [PROXY protocol](https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt)
-
---
-
-## TLS-related annotations
-
- `service.beta.kubernetes.io/aws-load-balancer-ssl-cert`
-
-  - enable TLS and use that certificate
-  - example value: `arn:aws:acm:<region>:<account>:certificate/<cert-id>`
-
- `service.beta.kubernetes.io/aws-load-balancer-ssl-ports`
-
-  - enable TLS *only* on the specified ports (when multiple ports are exposed)
-  - example value: `"443,8443"`
-
- `service.beta.kubernetes.io/aws-load-balancer-ssl-negotiation-policy`
-
-  - specify ciphers and other TLS parameters to use (see [that list](https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-security-policy-table.html))
-  - example value: `"ELBSecurityPolicy-TLS-1-2-2017-01"`
-
---
-
-## To HTTP(S) or not to HTTP(S)
-
- `service.beta.kubernetes.io/aws-load-balancer-backend-protocol`
-
-  - can be either `http`, `https`, `ssl`, or `tcp`
-
-  - if `https` or `ssl`: enable TLS to the backend
-
-  - if `http` or `https`: enable HTTP `x-forwarded-for` headers (with `http` or `https`)
-
-???
-
-## Cluster autoscaling
-
-## Logging
-
-https://docs.aws.amazon.com/eks/latest/userguide/logging-using-cloudtrail.html
-
-:EN:- Working with EKS
-:EN:- Cluster and user provisioning
-:EN:- Networking and load balancing
-
-:FR:- Travailler avec EKS
-:FR:- Outils de déploiement
-:FR:- Intégration avec IAM
-:FR:- Fonctionalités réseau
--- a/slides/k8s/batch-jobs.md
+++ b/slides/k8s/batch-jobs.md
@@ -30,7 +30,7 @@

  - or we hit the *backoff limit* of the Job (default=6)

-.lab[
+.exercise[

 - Create a Job that has a 50% chance of success:
  ```bash
@@ -49,7 +49,7 @@

 - If the Pod fails, the Job creates another Pod

-.lab[
+.exercise[

 - Check the status of the Pod(s) created by the Job:
  ```bash
@@ -108,7 +108,7 @@ class: extra-details

  (The Cron Job will not hold if a previous job is still running)

-.lab[
+.exercise[

 - Create the Cron Job:
  ```bash
@@ -135,7 +135,7 @@ class: extra-details

  (re-creating another one if it fails, for instance if its node fails)

-.lab[
+.exercise[

 - Check the Jobs that are created:
  ```bash
--- a/slides/k8s/bootstrap.md
+++ b/slides/k8s/bootstrap.md
@@ -98,7 +98,7 @@

 - Let's list our bootstrap tokens on a cluster created with kubeadm

-.lab[
+.exercise[

 - Log into node `test1`

@@ -145,7 +145,7 @@ class: extra-details

 - The token we need to use has the form `abcdef.1234567890abcdef`

-.lab[
+.exercise[

 - Check that it is accepted by the API server:
  ```bash
@@ -177,7 +177,7 @@ class: extra-details

 - That information is stored in a public ConfigMap

-.lab[
+.exercise[

 - Retrieve that ConfigMap:
  ```bash
--- a/slides/k8s/build-with-docker.md
+++ b/slides/k8s/build-with-docker.md
@@ -88,7 +88,7 @@ spec:

 - Let's try this out!

-.lab[
+.exercise[

 - Check the port used by our self-hosted registry:
  ```bash
--- a/slides/k8s/build-with-kaniko.md
+++ b/slides/k8s/build-with-kaniko.md
@@ -40,7 +40,7 @@

 - Let's build the image for the DockerCoins `worker` service with Kaniko

-.lab[
+.exercise[

 - Find the port number for our self-hosted registry:
  ```bash
@@ -160,7 +160,7 @@ spec:

 - The YAML for the pod is in `k8s/kaniko-build.yaml`

-.lab[
+.exercise[

 - Create the pod:
  ```bash
--- a/slides/k8s/buildshiprun-selfhosted.md
+++ b/slides/k8s/buildshiprun-selfhosted.md
@@ -37,7 +37,7 @@ so that your build pipeline is automated.*

 - We will deploy a registry container, and expose it with a NodePort

-.lab[
+.exercise[

 - Create the registry service:
  ```bash
@@ -57,7 +57,7 @@ so that your build pipeline is automated.*

 - We need to find out which port has been allocated

-.lab[
+.exercise[

 - View the service details:
  ```bash
@@ -78,7 +78,7 @@ so that your build pipeline is automated.*

 - A convenient Docker registry API route to remember is `/v2/_catalog`

-.lab[
+.exercise[

 <!-- ```hide kubectl wait deploy/registry --for condition=available```-->

@@ -102,7 +102,7 @@ We should see:

 - We can retag a small image, and push it to the registry

-.lab[
+.exercise[

 - Make sure we have the busybox image, and retag it:
  ```bash
@@ -123,7 +123,7 @@ We should see:

 - Let's use the same endpoint as before

-.lab[
+.exercise[

 - Ensure that our busybox image is now in the local registry:
  ```bash
@@ -143,7 +143,7 @@ The curl command should now output:

 - We are going to use a convenient feature of Docker Compose

-.lab[
+.exercise[

 - Go to the `stacks` directory:
  ```bash
@@ -217,7 +217,7 @@ class: extra-details

 - All our images should now be in the registry

-.lab[
+.exercise[

 - Re-run the same `curl` command as earlier:
  ```bash
@@ -232,4 +232,4 @@ variable, so that we can quickly switch from
 the self-hosted registry to pre-built images
 hosted on the Docker Hub. So make sure that
 this $REGISTRY variable is set correctly when
-running these commands!*
+running the exercises!*
--- a/slides/k8s/cert-manager.md
+++ b/slides/k8s/cert-manager.md
@@ -56,7 +56,7 @@

 - It can be installed with a YAML manifest, or with Helm

-.lab[
+.exercise[

 - Let's install the cert-manager Helm chart with this one-liner:
  ```bash
@@ -86,7 +86,7 @@

 - The manifest shown on the previous slide is in @@LINK[k8s/cm-clusterissuer.yaml]

-.lab[
+.exercise[

 - Create the ClusterIssuer:
  ```bash
@@ -115,7 +115,7 @@

 - The manifest shown on the previous slide is in @@LINK[k8s/cm-certificate.yaml]

-.lab[
+.exercise[

 - Edit the Certificate to update the domain name

@@ -140,7 +140,7 @@

 - then it waits for the challenge to complete

-.lab[
+.exercise[

 - View the resources created by cert-manager:
  ```bash
@@ -158,7 +158,7 @@

  `http://<our-domain>/.well-known/acme-challenge/<token>`

-.lab[
+.exercise[

 - Check the *path* of the Ingress in particular:
  ```bash
@@ -176,7 +176,7 @@

 An Ingress Controller! 😅

-.lab[
+.exercise[

 - Install an Ingress Controller:
  ```bash
--- a/slides/k8s/cluster-autoscaler.md
+++ b/slides/k8s/cluster-autoscaler.md
@@ -1,445 +0,0 @@
-# Cluster autoscaler
-
- When the cluster is full, we need to add more nodes
-
- This can be done manually:
-
-  - deploy new machines and add them to the cluster
-
-  - if using managed Kubernetes, use some API/CLI/UI
-
- Or automatically with the cluster autoscaler:
-
-  https://github.com/kubernetes/autoscaler
-
---
-
-## Use-cases
-
- Batch job processing
-
-  "once in a while, we need to execute these 1000 jobs in parallel"
-
-  "...but the rest of the time there is almost nothing running on the cluster"
-
- Dynamic workload
-
-  "a few hours per day or a few days per week, we have a lot of traffic"
-
-  "...but the rest of the time, the load is much lower"
-
---
-
-## Pay for what you use
-
- The point of the cloud is to "pay for what you use"
-
- If you have a fixed number of cloud instances running at all times:
-
-  *you're doing it wrong (except if your load is always the same)*
-
- If you're not using some kind of autoscaling, you're wasting money
-
-  (except if you like lining the pockets of your cloud provider)
-
---
-
-## Running the cluster autoscaler
-
- We must run nodes on a supported infrastructure
-
- See [here] for a non-exhaustive list of supported providers
-
- Sometimes, the cluster autoscaler is installed automatically
-
-  (or by setting a flag / checking a box when creating the cluster)
-
- Sometimes, it requires additional work
-
-  (which is often non-trivial and highly provider-specific)
-
-[here]: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider
-
---
-
-## Scaling up in theory
-
-IF a Pod is `Pending`,
-
-AND adding a Node would allow this Pod to be scheduled,
-
-THEN add a Node.
-
---
-
-## Fine print 1
-
-*IF a Pod is `Pending`...*
-
- First of all, the Pod must exist
-
- Pod creation might be blocked by e.g. a namespace quota
-
- In that case, the cluster autoscaler will never trigger
-
---
-
-## Fine print 2
-
-*IF a Pod is `Pending`...*
-
- If our Pods do not have resource requests:
-
-  *they will be in the `BestEffort` class*
-
- Generally, Pods in the `BestEffort` class are schedulable
-
-  - except if they have anti-affinity placement constraints
-
-  - except if all Nodes already run the max number of pods (110 by default)
-
- Therefore, if we want to leverage cluster autoscaling:
-
-  *our Pods should have resource requests*
-
---
-
-## Fine print 3
-
-*AND adding a Node would allow this Pod to be scheduled...*
-
- The autoscaler won't act if:
-
-  - the Pod is too big to fit on a single Node
-
-  - the Pod has impossible placement constraints
-
- Examples:
-
-  - "run one Pod per datacenter" with 4 pods and 3 datacenters
-
-  - "use this nodeSelector" but no such Node exists
-
---
-
-## Trying it out
-
- We're going to check how much capacity is available on the cluster
-
- Then we will create a basic deployment
-
- We will add resource requests to that deployment
-
- Then scale the deployment to exceed the available capacity
-
- **The following commands require a working cluster autoscaler!**
-
---
-
-## Checking available resources
-
-.lab[
-
- Check how much CPU is allocatable on the cluster:
-  ```bash
-  kubectl get nodes  -o jsonpath={..allocatable.cpu}
-  ```
-
-]
-
- If we see e.g. `2800m 2800m 2800m`, that means:
-
-  3 nodes with 2.8 CPUs allocatable each
-
- To trigger autoscaling, we will create 7 pods requesting 1 CPU each
-
-  (each node can fit 2 such pods)
-
---
-
-## Creating our test Deployment
-
-.lab[
-
- Create the Deployment:
-  ```bash
-  kubectl create deployment blue --image=jpetazzo/color
-  ```
-
- Add a request for 1 CPU:
-  ```bash
-    kubectl patch deployment blue --patch='
-    spec:
-      template:
-        spec:
-          containers:
-          - name: color
-            resources:
-              requests:
-                cpu: 1
-    '
-  ```
-]
-
---
-
-## Scaling up in practice
-
- This assumes that we have strictly less than 7 CPUs available
-
-  (adjust the numbers if necessary!)
-
-.lab[
-
- Scale up the Deployment:
-  ```bash
-  kubectl scale deployment blue --replicas=7
-  ```
-
- Check that we have a new Pod, and that it's `Pending`:
-  ```bash
-  kubectl get pods
-  ```
-
-]
-
---
-
-## Cluster autoscaling
-
- After a few minutes, a new Node should appear
-
- When that Node becomes `Ready`, the Pod will be assigned to it
-
- The Pod will then be `Running`
-
- Reminder: the `AGE` of the Pod indicates when the Pod was *created*
-
-  (it doesn't indicate when the Pod was scheduled or started!)
-
- To see other state transitions, check the `status.conditions` of the Pod
-
---
-
-## Scaling down in theory
-
-IF a Node has less than 50% utilization for 10 minutes,
-
-AND all its Pods can be scheduled on other Nodes,
-
-AND all its Pods are *evictable*,
-
-AND the Node doesn't have a "don't scale me down" annotation¹,
-
-THEN drain the Node and shut it down.
-
-.footnote[¹The annotation is: `cluster-autoscaler.kubernetes.io/scale-down-disabled=true`]
-
---
-
-## When is a Pod "evictable"?
-
-By default, Pods are evictable, except if any of the following is true.
-
- They have a restrictive Pod Disruption Budget
-
- They are "standalone" (not controlled by a ReplicaSet/Deployment, StatefulSet, Job...)
-
- They are in `kube-system` and don't have a Pod Disruption Budget
-
- They have local storage (that includes `EmptyDir`!)
-
-This can be overridden by setting the annotation:
-<br/>
-`cluster-autoscaler.kubernetes.io/safe-to-evict`
-<br/>(it can be set to `true` or `false`)
-
---
-
-## Pod Disruption Budget
-
- Special resource to configure how many Pods can be *disrupted*
-
-  (i.e. shutdown/terminated)
-
- Applies to Pods matching a given selector
-
-  (typically matching the selector of a Deployment)
-
- Only applies to *voluntary disruption*
-
-  (e.g. cluster autoscaler draining a node, planned maintenance...)
-
- Can express `minAvailable` or `maxUnavailable`
-
- See [documentation] for details and examples
-
-[documentation]: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
-
---
-
-## Local storage
-
- If our Pods use local storage, they will prevent scaling down
-
- If we have e.g. an `EmptyDir` volume for caching/sharing:
-
-  make sure to set the `.../safe-to-evict` annotation to `true`!
-
- Even if the volume...
-
-  - ...only has a PID file or UNIX socket
-
-  - ...is empty
-
-  - ...is not mounted by any container in the Pod!
-
---
-
-## Expensive batch jobs
-
- Careful if we have long-running batch jobs!
-
-  (e.g. jobs that take many hours/days to complete)
-
- These jobs could get evicted before they complete
-
-  (especially if they use less than 50% of the allocatable resources)
-
- Make sure to set the `.../safe-to-evict` annotation to `false`!
-
---
-
-## Node groups
-
- Easy scenario: all nodes have the same size
-
- Realistic scenario: we have nodes of different sizes
-
-  - e.g. mix of CPU and GPU nodes
-
-  - e.g. small nodes for control plane, big nodes for batch jobs
-
-  - e.g. leveraging spot capacity
-
- The cluster autoscaler can handle it!
-
---
-
-class: extra-details
-
-## Leveraging spot capacity
-
- AWS, Azure, and Google Cloud are typically more expensive than their competitors
-
- However, they offer *spot* capacity (spot instances, spot VMs...)
-
- *Spot* capacity:
-
-  - has a much lower cost (see e.g. AWS [spot instance advisor][awsspot])
-
-  - has a cost that varies continuously depending on regions, instance type...
-
-  - can be preempted at all times
-
- To be cost-effective, it is strongly recommended to leverage spot capacity
-
-[awsspot]: https://aws.amazon.com/ec2/spot/instance-advisor/
-
---
-
-## Node groups in practice
-
- The cluster autoscaler maps nodes to *node groups*
-
-  - this is an internal, provider-dependent mechanism
-
-  - the node group is sometimes visible through a proprietary label or annotation
-
- Each node group is scaled independently
-
- The cluster autoscaler uses [expanders] to decide which node group to scale up
-
-  (the default expander is "random", i.e. pick a node group at random!) 
-
- Of course, only acceptable node groups will be considered
-
-  (i.e. node groups that could accommodate the `Pending` Pods)
-
-[expanders]: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-expanders
-
---
-
-class: extra-details
-
-## Scaling to zero
-
- *In general,* a node group needs to have at least one node at all times
-
-  (the cluster autoscaler uses that node to figure out the size, labels, taints... of the group)
-
- *On some providers,* there are special ways to specify labels and/or taints
-
-  (but if you want to scale to zero, check that the provider supports it!)
-
---
-
-## Warning
-
- Autoscaling up is easy
-
- Autoscaling down is harder
-
- It might get stuck because Pods are not evictable
-
- Do at least a dry run to make sure that the cluster scales down correctly!
-
- Have alerts on cloud spend
-
- *Especially when using big/expensive nodes (e.g. with GPU!)*
-
---
-
-## Preferred vs. Required
-
- Some Kubernetes mechanisms allow to express "soft preferences":
-
-  - affinity (`requiredDuringSchedulingIgnoredDuringExecution` vs `preferredDuringSchedulingIgnoredDuringExecution`)
-
-  - taints (`NoSchedule`/`NoExecute` vs `PreferNoSchedule`)
-
- Remember that these "soft preferences" can be ignored
-
-  (and given enough time and churn on the cluster, they will!)
-
---
-
-## Troubleshooting
-
- The cluster autoscaler publishes its status on a ConfigMap
-
-.lab[
-
- Check the cluster autoscaler status:
-  ```bash
-  kubectl describe configmap --namespace kube-system cluster-autoscaler-status
-  ```
-
-]
-
- We can also check the logs of the autoscaler
-
-  (except on managed clusters where it's running internally, not visible to us)
-
---
-
-## Acknowledgements
-
-Special thanks to [@s0ulshake] for their help with this section!
-
-If you need help to run your data science workloads on Kubernetes,
-<br/>they're available for consulting.
-
-(Get in touch with them through https://www.linkedin.com/in/ajbowen/)
-
-[@s0ulshake]: https://twitter.com/s0ulshake
--- a/slides/k8s/cluster-upgrade.md
+++ b/slides/k8s/cluster-upgrade.md
@@ -18,9 +18,9 @@

 - It's easy to check the version for the API server

-.lab[
+.exercise[

- Log into node `oldversion1`
+- Log into node `test1`

 - Check the version of kubectl and of the API server:
  ```bash
@@ -39,7 +39,7 @@

 - It's also easy to check the version of kubelet

-.lab[
+.exercise[

 - Check node versions (includes kubelet, kernel, container engine):
  ```bash
@@ -60,7 +60,7 @@

 - If the control plane is self-hosted (running in pods), we can check it

-.lab[
+.exercise[

 - Show image versions for all pods in `kube-system` namespace:
  ```bash
@@ -81,7 +81,7 @@

 ## What version are we running anyway?

- When I say, "I'm running Kubernetes 1.18", is that the version of:
+- When I say, "I'm running Kubernetes 1.15", is that the version of:

  - kubectl

@@ -157,15 +157,15 @@

 ## Kubernetes uses semantic versioning

- Kubernetes versions look like MAJOR.MINOR.PATCH; e.g. in 1.18.20:
+- Kubernetes versions look like MAJOR.MINOR.PATCH; e.g. in 1.17.2:

  - MAJOR = 1
-  - MINOR = 18
-  - PATCH = 20
+  - MINOR = 17
+  - PATCH = 2

 - It's always possible to mix and match different PATCH releases

-  (e.g. 1.18.20 and 1.18.15 are compatible)
+  (e.g. 1.16.1 and 1.16.6 are compatible)

 - It is recommended to run the latest PATCH release

@@ -181,9 +181,9 @@

 - All components support a difference of one¹ MINOR version

- This allows live upgrades (since we can mix e.g. 1.18 and 1.19)
+- This allows live upgrades (since we can mix e.g. 1.15 and 1.16)

- It also means that going from 1.18 to 1.20 requires going through 1.19
+- It also means that going from 1.14 to 1.16 requires going through 1.15

 .footnote[¹Except kubelet, which can be up to two MINOR behind API server,
 and kubectl, which can be one MINOR ahead or behind API server.]
@@ -214,7 +214,7 @@ and kubectl, which can be one MINOR ahead or behind API server.]

 - We will change the version of the API server

- We will work with cluster `oldversion` (nodes `oldversion1`, `oldversion2`, `oldversion3`)
+- We will work with cluster `test` (nodes `test1`, `test2`, `test3`)

 ---

@@ -240,9 +240,9 @@ and kubectl, which can be one MINOR ahead or behind API server.]

 - We will edit the YAML file to use a different image version

-.lab[
+.exercise[

- Log into node `oldversion1`
+- Log into node `test1`

 - Check API server version:
  ```bash
@@ -254,7 +254,7 @@ and kubectl, which can be one MINOR ahead or behind API server.]
  sudo vim /etc/kubernetes/manifests/kube-apiserver.yaml
  ```

- Look for the `image:` line, and update it to e.g. `v1.19.0`
+- Look for the `image:` line, and update it to e.g. `v1.16.0`

 ]

@@ -264,7 +264,7 @@ and kubectl, which can be one MINOR ahead or behind API server.]

 - The API server will be briefly unavailable while kubelet restarts it

-.lab[
+.exercise[

 - Check the API server version:
  ```bash
@@ -299,7 +299,7 @@ and kubectl, which can be one MINOR ahead or behind API server.]

  (note: this is possible only because the cluster was installed with kubeadm)

-.lab[
+.exercise[

 - Check what will be upgraded:
  ```bash
@@ -308,11 +308,11 @@ and kubectl, which can be one MINOR ahead or behind API server.]

 ]

-Note 1: kubeadm thinks that our cluster is running 1.19.0.
+Note 1: kubeadm thinks that our cluster is running 1.16.0.
 <br/>It is confused by our manual upgrade of the API server!

-Note 2: kubeadm itself is still version 1.18.20..
-<br/>It doesn't know how to upgrade to 1.19.X.
+Note 2: kubeadm itself is still version 1.15.9.
+<br/>It doesn't know how to upgrade to 1.16.X.

 ---

@@ -320,7 +320,7 @@ Note 2: kubeadm itself is still version 1.18.20..

 - First things first: we need to upgrade kubeadm

-.lab[
+.exercise[

 - Upgrade kubeadm:
  ```
@@ -335,28 +335,28 @@ Note 2: kubeadm itself is still version 1.18.20..
 ]

 Problem: kubeadm doesn't know how to handle
-upgrades from version 1.18.
+upgrades from version 1.15.

-This is because we installed version 1.22 (or even later).
+This is because we installed version 1.17 (or even later).

-We need to install kubeadm version 1.19.X.
+We need to install kubeadm version 1.16.X.

 ---

 ## Downgrading kubeadm

- We need to go back to version 1.19.X.
+- We need to go back to version 1.16.X (e.g. 1.16.6)

-.lab[
+.exercise[

 - View available versions for package `kubeadm`:
  ```bash
-  apt show kubeadm -a | grep ^Version | grep 1.19
+  apt show kubeadm -a | grep ^Version | grep 1.16
  ```

 - Downgrade kubeadm:
  ```
-  sudo apt install kubeadm=1.19.8-00
+  sudo apt install kubeadm=1.16.6-00
  ```

 - Check what kubeadm tells us:
@@ -366,7 +366,7 @@ We need to install kubeadm version 1.19.X.

 ]

-kubeadm should now agree to upgrade to 1.19.8.
+kubeadm should now agree to upgrade to 1.16.6.

 ---

@@ -378,11 +378,11 @@ kubeadm should now agree to upgrade to 1.19.8.

 - Or we can try the upgrade anyway

-.lab[
+.exercise[

 - Perform the upgrade:
  ```bash
-  sudo kubeadm upgrade apply v1.19.8
+  sudo kubeadm upgrade apply v1.16.6
  ```

 ]
@@ -395,9 +395,9 @@ kubeadm should now agree to upgrade to 1.19.8.

 - We can therefore use `apt` or `apt-get`

-.lab[
+.exercise[

- Log into node `oldversion3`
+- Log into node `test3`

 - View available versions for package `kubelet`:
  ```bash
@@ -406,7 +406,7 @@ kubeadm should now agree to upgrade to 1.19.8.

 - Upgrade kubelet:
  ```bash
-  sudo apt install kubelet=1.19.8-00
+  sudo apt install kubelet=1.16.6-00
  ```

 ]
@@ -415,9 +415,9 @@ kubeadm should now agree to upgrade to 1.19.8.

 ## Checking what we've done

-.lab[
+.exercise[

- Log into node `oldversion1`
+- Log into node `test1`

 - Check node versions:
  ```bash
@@ -458,15 +458,15 @@ kubeadm should now agree to upgrade to 1.19.8.

  (after upgrading the control plane)

-.lab[
+.exercise[

 - Download the configuration on each node, and upgrade kubelet:
  ```bash
    for N in 1 2 3; do
-      ssh oldversion$N "
-        sudo apt install kubeadm=1.19.8-00 &&
+      ssh test$N "
+        sudo apt install kubeadm=1.16.6-00 &&
        sudo kubeadm upgrade node &&
-        sudo apt install kubelet=1.19.8-00"
+        sudo apt install kubelet=1.16.6-00"
    done
  ```
 ]
@@ -475,9 +475,9 @@ kubeadm should now agree to upgrade to 1.19.8.

 ## Checking what we've done

- All our nodes should now be updated to version 1.19.8
+- All our nodes should now be updated to version 1.16.6

-.lab[
+.exercise[

 - Check nodes versions:
  ```bash
@@ -492,13 +492,13 @@ class: extra-details

 ## Skipping versions

- This example worked because we went from 1.18 to 1.19
+- This example worked because we went from 1.15 to 1.16

- If you are upgrading from e.g. 1.16, you will have to go through 1.17 first
+- If you are upgrading from e.g. 1.14, you will have to go through 1.15 first

- This means upgrading kubeadm to 1.17.X, then using it to upgrade the cluster
+- This means upgrading kubeadm to 1.15.X, then using it to upgrade the cluster

- Then upgrading kubeadm to 1.18.X, etc.
+- Then upgrading kubeadm to 1.16.X, etc.

 - **Make sure to read the release notes before upgrading!**

--- a/slides/k8s/cni.md
+++ b/slides/k8s/cni.md
@@ -204,7 +204,7 @@ class: extra-details

 ## Logging into the new cluster

-.lab[
+.exercise[

 - Log into node `kuberouter1`

@@ -228,7 +228,7 @@ class: extra-details

 - By default, kubelet gets the CNI configuration from `/etc/cni/net.d`

-.lab[
+.exercise[

 - Check the content of `/etc/cni/net.d`

@@ -262,7 +262,7 @@ class: extra-details

  (where `C` is our cluster number)

-.lab[
+.exercise[

 - Edit the Compose file to set the Cluster CIDR:
  ```bash
@@ -298,7 +298,7 @@ class: extra-details

  (where `A.B.C.D` is the public address of `kuberouter1`, running the control plane)

-.lab[
+.exercise[

 - Edit the YAML file to set the API server address:
  ```bash
@@ -320,7 +320,7 @@ Note: the DaemonSet won't create any pods (yet) since there are no nodes (yet).

 - This is similar to what we did for the `kubenet` cluster

-.lab[
+.exercise[

 - Generate the kubeconfig file (replacing `X.X.X.X` with the address of `kuberouter1`):
  ```bash
@@ -338,7 +338,7 @@ Note: the DaemonSet won't create any pods (yet) since there are no nodes (yet).

 - We need to copy that kubeconfig file to the other nodes

-.lab[
+.exercise[

 - Copy `kubeconfig` to the other nodes:
  ```bash
@@ -359,7 +359,7 @@ Note: the DaemonSet won't create any pods (yet) since there are no nodes (yet).

 - We need to pass `--network-plugin=cni`

-.lab[
+.exercise[

 - Join the first node:
   ```bash
@@ -384,7 +384,7 @@ class: extra-details

  (in `/etc/cni/net.d`)

-.lab[
+.exercise[

 - Check the content of `/etc/cni/net.d`

@@ -400,7 +400,7 @@ class: extra-details

 - Let's create a Deployment and expose it with a Service

-.lab[
+.exercise[

 - Create a Deployment running a web server:
  ```bash
@@ -423,7 +423,7 @@ class: extra-details

 ## Checking that everything works

-.lab[
+.exercise[

 - Get the ClusterIP address for the service:
  ```bash
@@ -449,7 +449,7 @@ class: extra-details

 - What if we need to check that everything is working properly?

-.lab[
+.exercise[

 - Check the IP addresses of our pods:
  ```bash
@@ -490,7 +490,7 @@ class: extra-details

 ## Trying `kubectl logs` / `kubectl exec`

-.lab[
+.exercise[

 - Try to show the logs of a kube-router pod:
  ```bash
--- a/slides/k8s/configuration.md
+++ b/slides/k8s/configuration.md
@@ -344,94 +344,32 @@ We'll cover them just after!*

 ---

-## Example: HAProxy configuration
+## Passing a configuration file with a configmap

- We are going to deploy HAProxy, a popular load balancer
+- We will start a load balancer powered by HAProxy

- It expects to find its configuration in a specific place:
+- We will use the [official `haproxy` image](https://hub.docker.com/_/haproxy/)

-  `/usr/local/etc/haproxy/haproxy.cfg`
+- It expects to find its configuration in `/usr/local/etc/haproxy/haproxy.cfg`

- We will create a ConfigMap holding the configuration file
+- We will provide a simple HAProxy configuration, `k8s/haproxy.cfg`

- Then we will mount that ConfigMap in a Pod running HAProxy
+- It listens on port 80, and load balances connections between IBM and Google

 ---

-## Blue/green load balancing
+## Creating the configmap

- In this example, we will deploy two versions of our app:
+.exercise[

-  - the "blue" version in the `blue` namespace
-
-  - the "green" version in the `green` namespace
-
- In both namespaces, we will have a Deployment and a Service
-
-  (both named `color`)
-
- We want to load balance traffic between both namespaces
-
-  (we can't do that with a simple service selector: these don't cross namespaces)
-
---
-
-## Deploying the app
-
- We're going to use the image `jpetazzo/color`
-
-  (it is a simple "HTTP echo" server showing which pod served the request)
-
- We can create each Namespace, Deployment, and Service by hand, or...
-
-.lab[
-
- We can deploy the app with a YAML manifest:
+- Go to the `k8s` directory in the repository:
  ```bash
-  kubectl apply -f ~/container.training/k8s/rainbow.yaml
+  cd ~/container.training/k8s
  ```

-]
-
---
-
-## Testing the app
-
- Reminder: Service `x` in Namespace `y` is available through:
-
-  `x.y`, `x.y.svc`, `x.y.svc.cluster.local`
-
- Since the `cluster.local` suffix can change, we'll use `x.y.svc`
-
-.lab[
-
- Check that the app is up and running:
+- Create a configmap named `haproxy` and holding the configuration file:
  ```bash
-    kubectl run --rm -it --restart=Never --image=nixery.dev/curl my-test-pod \
-            curl color.blue.svc
-  ```
-
-]
-
---
-
-## Creating the HAProxy configuration
-
-Here is the file that we will use, @@LINK[k8s/haproxy.cfg]:
-
-```
-@@INCLUDE[k8s/haproxy.cfg]
-```
-
---
-
-## Creating the ConfigMap
-
-.lab[
-
- Create a ConfigMap named `haproxy` and holding the configuration file:
-  ```bash
-  kubectl create configmap haproxy --from-file=~/container.training/k8s/haproxy.cfg
+  kubectl create configmap haproxy --from-file=haproxy.cfg
  ```

 - Check what our configmap looks like:
@@ -443,21 +381,37 @@ Here is the file that we will use, @@LINK[k8s/haproxy.cfg]:

 ---

-## Using the ConfigMap
+## Using the configmap

-Here is @@LINK[k8s/haproxy.yaml], a Pod manifest using that ConfigMap:
+We are going to use the following pod definition:

 ```yaml
-@@INCLUDE[k8s/haproxy.yaml]
+apiVersion: v1
+kind: Pod
+metadata:
+  name: haproxy
+spec:
+  volumes:
+  - name: config
+    configMap:
+      name: haproxy
+  containers:
+  - name: haproxy
+    image: haproxy
+    volumeMounts:
+    - name: config
+      mountPath: /usr/local/etc/haproxy/
 ```

 ---

-## Creating the Pod
+## Using the configmap

-.lab[
+- The resource definition from the previous slide is in `k8s/haproxy.yaml`

- Create the HAProxy Pod:
+.exercise[
+
+- Create the HAProxy pod:
  ```bash
  kubectl apply -f ~/container.training/k8s/haproxy.yaml
  ```
@@ -476,21 +430,27 @@ Here is @@LINK[k8s/haproxy.yaml], a Pod manifest using that ConfigMap:

 ## Testing our load balancer

- If everything went well, we should see a perfect round robin
+- The load balancer will send:

-  (one request to `blue`, one request to `green`, one request to `blue`, etc.)
+  - half of the connections to Google

-.lab[
+  - the other half to IBM

- Send a few requests:
+.exercise[
+
+- Access the load balancer a few times:
  ```bash
-  for i in $(seq 10); do
  curl $IP
-  done
+  curl $IP
+  curl $IP
  ```

 ]

+We should see connections served by Google, and others served by IBM.
+<br/>
+(Each server sends us a redirect page. Look at the URL that they send us to!)
+
 ---

 ## Exposing configmaps with the downward API
@@ -509,7 +469,7 @@ Here is @@LINK[k8s/haproxy.yaml], a Pod manifest using that ConfigMap:

 ## Creating the configmap

-.lab[
+.exercise[

 - Our configmap will have a single key, `http.addr`:
  ```bash
@@ -530,16 +490,29 @@ Here is @@LINK[k8s/haproxy.yaml], a Pod manifest using that ConfigMap:
 We are going to use the following pod definition:

 ```yaml
-@@INCLUDE[k8s/registry.yaml]
+apiVersion: v1
+kind: Pod
+metadata:
+  name: registry
+spec:
+  containers:
+  - name: registry
+    image: registry
+    env:
+    - name: REGISTRY_HTTP_ADDR
+      valueFrom:
+        configMapKeyRef:
+          name: registry
+          key: http.addr
 ```

 ---

 ## Using the configmap

- The resource definition from the previous slide is in @@LINK[k8s/registry.yaml]
+- The resource definition from the previous slide is in `k8s/registry.yaml`

-.lab[
+.exercise[

 - Create the registry pod:
  ```bash