Compare commits


89 Commits

Author SHA1 Message Date
Jérôme Petazzoni
77606044f6 😈 Demonware advanced Kubernetes custom content 2023-12-07 15:31:04 -06:00
Jérôme Petazzoni
dbfda8b458 🐞 Typo fix 2023-12-06 15:31:09 -06:00
Jérôme Petazzoni
c8fc67c995 📃 Update V's name and social media link 2023-12-04 16:41:03 -06:00
Jérôme Petazzoni
28222db2e4 Add 1-second pre-pssh delay
Seems to help with AT&T fiber router.
(Actually it takes a longer delay to make a difference,
like 10 seconds, but this patch makes the delay configurable.)
2023-12-04 16:38:33 -06:00
Jérôme Petazzoni
a38f930858 📦 Use new k8s package repositories 2023-12-03 21:33:25 -06:00
Jérôme Petazzoni
2cef200726 Add DMUC+RBAC exercises 2023-12-03 15:38:43 -06:00
Jérôme Petazzoni
1f77a52137 📃 Flesh out upgrade information
Add the official policy (which is to drain nodes before upgrading),
and give some explanations about when it may/may not be fine to
upgrade without draining nodes.
2023-11-30 16:45:11 -06:00
Jérôme Petazzoni
b188e0f8a9 🔧 Mention priorityClasses around resource pressure 2023-11-30 16:10:12 -06:00
Jérôme Petazzoni
ac203a128d Add content about disruptions and PDB 2023-11-30 15:36:32 -06:00
Jérôme Petazzoni
a9920e5cf0 🌐 Add IPv6 support in netlify DNS scriptlet 2023-11-30 15:32:03 -06:00
Jérôme Petazzoni
d1047f950d 📃 Update resource limits to add ephemeral-storage 2023-11-29 14:23:24 -06:00
Jérôme Petazzoni
e380509ffe 💈 Tweak CSS for consistent spacing after titles 2023-11-29 14:22:54 -06:00
Jérôme Petazzoni
b5c754211e Mention Validating Admission Policies and CEL 2023-11-24 12:29:44 -06:00
Jérôme Petazzoni
cc57d983b2 🔧 Add Linode portal size for reference 2023-10-30 13:12:20 +01:00
Jérôme Petazzoni
fd86e6079d ✂️ Remove Service Catalog
This doesn't seem to be supported anymore, and looking at
https://github.com/kubernetes-retired/service-catalog/tree/master
it even looks like the whole thing might be deprecated?
2023-10-26 18:20:09 +02:00
Jérôme Petazzoni
08f2e76082 🐞 Fix a couple of typos 2023-10-26 17:53:53 +02:00
Jérôme Petazzoni
db848767c1 Update kubebuilder instructions for new controller semantics 2023-10-26 17:49:26 +02:00
Jérôme Petazzoni
c07f52c493 🔧 Add function to delete CloudFlare DNS records 2023-10-22 09:20:39 +02:00
Jérôme Petazzoni
016c8fc863 🔧 Add GP2 instance size to portal env (for reference) 2023-10-17 10:17:29 +02:00
Jérôme Petazzoni
b9bbccb346 Bump up Network Policy documentation link versions 2023-10-10 15:09:20 +02:00
Jérôme Petazzoni
311a2aaf32 🔧 Add scaleway invocation to konk script 2023-10-10 07:37:56 +02:00
Jérôme Petazzoni
a19585a587 🧹 Add clean up snippet for Scaleway PVC 2023-09-22 09:21:29 +02:00
Jérôme Petazzoni
354bd9542e Add scriptlet to list exoscale zones 2023-09-14 14:50:36 +02:00
Jérôme Petazzoni
0c73e91e6f 🔧 Tweak slides order + typo fix 2023-09-14 13:59:20 +02:00
Jérôme Petazzoni
23064b5d26 🔧 Show file name in vim 2023-09-13 16:11:03 +02:00
Jérôme Petazzoni
971314a84f 🔧 Minor fixes in DMUC refactor 2023-09-13 16:09:26 +02:00
Jérôme Petazzoni
c0689cc5df New content for M5
Instead of showing kubenet and kuberouter with
Kubernetes 1.19, we now start with Kubernetes
1.28 (or whatever is the latest version) along
with containerd and CNI.
2023-08-27 21:16:34 +02:00
Jérôme Petazzoni
033873064a 🏭️ Refactor deployment scripts for monokube/polykube
Break out kubernetes package installation and kubeadm invocation
to two different steps, so that we can install kubernetes packages
without setting up the cluster (for the new DMUC labs).
2023-08-25 17:49:30 +02:00
Jérôme Petazzoni
1ed3af6eff 🖼️ Change openstack image selection mechanism
Instead of passing an image name through a terraform variable,
use tags to select the latest image matching the specified
tags (in this case, os=Ubuntu version=22.04).
2023-08-24 01:11:31 +02:00
Jérôme Petazzoni
33ddfce3fa 🐞 Tweak index.yaml
There's something wrong with the self-paced slides (see #632) but I'm not sure
what the problem is exactly 😅
2023-08-17 21:22:43 +02:00
Jérôme Petazzoni
943783c8fb 🐞 Fix typo in swarm metrics setup
Closes #631.

Thanks @Zakariasemlali for noticing this :)
2023-08-04 02:11:39 +02:00
Or Navon
46b3aa23bf Fix minor grammar mistake 2023-07-31 11:27:28 +02:00
Jérôme Petazzoni
4498dc41a4 🔧 Make TF_VAR_cluster_name mandatory in testing script 2023-07-28 14:51:20 +02:00
Jérôme Petazzoni
58de0d31f8 🔧 Fix AWS and OCI configurations 2023-06-19 22:38:44 +02:00
Jérôme Petazzoni
d32d986a9e Add support for Azure AKS and OVH MKS 2023-06-18 19:55:31 +02:00
Jérôme Petazzoni
fcb922628c 📃 Add documentation for cloud credentials 2023-06-17 19:22:58 +02:00
Jérôme Petazzoni
77ceba7f5b 🔧 Fix broken links in intro to docker slides
Closes #622

I recovered some of the case studies from the internet
archive, and removed the other links.
2023-06-15 23:07:25 +02:00
Jérôme Petazzoni
ccb73fc872 Add CloudFlare script (WIP) 2023-05-29 12:24:54 +02:00
Jérôme Petazzoni
bb302a25de ✂️ Split prereqs/handson instructions 2023-05-29 09:05:57 +02:00
Julien Girardin
e66b90eb4e Replace ship lab by kustomize lab 2023-05-26 17:33:38 +02:00
dependabot[bot]
74add4d435 Bump socket.io-parser from 4.2.2 to 4.2.3 in /slides/autopilot
Bumps [socket.io-parser](https://github.com/socketio/socket.io-parser) from 4.2.2 to 4.2.3.
- [Release notes](https://github.com/socketio/socket.io-parser/releases)
- [Changelog](https://github.com/socketio/socket.io-parser/blob/main/CHANGELOG.md)
- [Commits](https://github.com/socketio/socket.io-parser/compare/4.2.2...4.2.3)

---
updated-dependencies:
- dependency-name: socket.io-parser
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-05-25 16:25:15 +02:00
Jérôme Petazzoni
5ee1367e79 🖼️ Use ngrok/ngrok image instead of building it from scratch 2023-05-25 16:09:47 +02:00
Jérôme Petazzoni
c1f8177f4e 🔧 Pass kubernetesVersion: in kubeadm config file 2023-05-17 19:04:32 +02:00
Jérôme Petazzoni
d4a9ea2461 🪆 Fix vcluster deployment and add konk.sh script 2023-05-16 19:16:19 +02:00
Jérôme Petazzoni
dd0f6d00fa 🏭️ Refactor the DaemonSet section 2023-05-14 20:10:23 +02:00
Jérôme Petazzoni
79359e2abc 🏭️ Refactor YAML and Namespace chapters 2023-05-14 19:58:45 +02:00
Jérôme Petazzoni
9cd812de75 Update ingress chapter and manifest 2023-05-13 12:06:47 +02:00
Jérôme Petazzoni
e29bfe7921 🔧 Improve mk8s Terraform configuration
- instead of using 'kubectl wait nodes', we now use a simpler
  'kubectl get nodes -o name' and check if there is anything
  in the output. This seems to work better (as the previous
  method would sometimes remain stuck because the kubectl
  process would never get stopped by SIGPIPE).
- the shpod SSH NodePort is no longer hard-coded to 32222,
  which allows us to use e.g. vcluster to deploy multiple
  Kubernetes labs on a single 'home' (or 'outer') Kubernetes
  cluster.
2023-05-13 08:19:19 +02:00
Jérôme Petazzoni
11bc78851b Add Scaleway and Hetzner to ARM providers 2023-05-12 18:13:19 +02:00
Jérôme Petazzoni
c611f55dca Update cluster upgrade section
We now go from 1.22 to 1.23.

Updating to 1.22 was necessary because Kubernetes 1.27
deprecated kubeadm config v1beta2, which forced us to
upgrade to v1beta3, which was only introduced in 1.22.
In other words, our scripts can only install Kubernetes
1.22+ now.
2023-05-12 07:23:36 +02:00
Jérôme Petazzoni
980bc66c3a 🔧 Improve output of 'labctl tags' 2023-05-12 07:03:49 +02:00
Jérôme Petazzoni
fd0bc97a7a 🔓️ Disable port protection on AWS and OpenStack
This is required for the kubenet and kuberouter labs, for
'operating kubernetes' training classes.
2023-05-12 06:57:54 +02:00
Jérôme Petazzoni
8f6c32e94a 🔧 Tweak history limit to keep 1 million lines 2023-05-11 14:43:04 +02:00
Jérôme Petazzoni
1a711f8c2c Add kubent
Kube No Trouble (kubent) is a simple tool to check whether you're using any deprecated API versions in your cluster, and therefore should upgrade your workloads first, before upgrading your Kubernetes cluster.
2023-05-10 19:12:55 +02:00
Jérôme Petazzoni
0080f21817 Add velero CLI 2023-05-10 18:45:34 +02:00
ENIX NOC
f937456232 Fixed executable name for pssh on ubuntu 2023-05-09 15:28:37 +00:00
ENIX NOC
8376aba5fd Fixed ssh key usage when setting password 2023-05-09 15:28:20 +00:00
Jérôme Petazzoni
6d13122a4d Add BuildKit RUN --mount=type=cache... 2023-05-09 07:50:40 +02:00
Jérôme Petazzoni
8184c46ed3 Upgrade metrics-server install instructions 2023-05-09 07:25:48 +02:00
Jérôme Petazzoni
0b900f9e5c Add example file for OpenStack tfvars 2023-05-09 07:25:11 +02:00
Jérôme Petazzoni
e14d0d4ca4 🔧 Tweak netlify DNS script to take domain as env var
Now that script can be used for container.training, but also our
other properties at Netlify (e.g. tinyshellscript.com)
2023-05-08 21:50:17 +02:00
dependabot[bot]
cdb1e41524 Bump engine.io and socket.io in /slides/autopilot
Bumps [engine.io](https://github.com/socketio/engine.io) to 6.4.2 and updates ancestor dependency [socket.io](https://github.com/socketio/socket.io). These dependencies need to be updated together.


Updates `engine.io` from 6.2.1 to 6.4.2
- [Release notes](https://github.com/socketio/engine.io/releases)
- [Changelog](https://github.com/socketio/engine.io/blob/main/CHANGELOG.md)
- [Commits](https://github.com/socketio/engine.io/compare/6.2.1...6.4.2)

Updates `socket.io` from 4.5.1 to 4.6.1
- [Release notes](https://github.com/socketio/socket.io/releases)
- [Changelog](https://github.com/socketio/socket.io/blob/main/CHANGELOG.md)
- [Commits](https://github.com/socketio/socket.io/compare/4.5.1...4.6.1)

---
updated-dependencies:
- dependency-name: engine.io
  dependency-type: indirect
- dependency-name: socket.io
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-05-04 10:25:18 +02:00
Jérôme Petazzoni
600e7c441c Bump up kubeadm configuration version
v1beta2 support was removed in Kubernetes 1.27.
Warning, v1beta3 was introduced in Kubernetes 1.22
(I think?) which means that the minimum version for
"old cluster" deployments is now 1.22.
2023-04-24 06:58:06 +02:00
Jérôme Petazzoni
81913d88a0 Add script to list civo locations 2023-04-23 16:13:51 +02:00
Jérôme Petazzoni
17d3d9a92a ♻️ Add clean up script to remove stray LBs and PVs 2023-04-12 08:25:47 +02:00
Jérôme Petazzoni
dd026b3db2 📃 Update healthchecks section 2023-04-11 12:42:51 +02:00
Jérôme Petazzoni
b9426af9cd ✂️ Remove Dockerfile and Compose file
They're not valid anymore, and fixing them would require quite a lot of
work, since we drastically changed the way we provision things. I'm
removing them rather than leaving a completely broken thing.
2023-04-11 10:19:20 +02:00
MrUtkarsh
aa4c0846ca Update Dockerfile_Tips.md
Updated the chown to chmod as it's repeated.
2023-04-10 16:18:34 +02:00
Jérôme Petazzoni
abca33af29 🏭️ Second pass of Terraform refactoring
Break down provider-specific configuration into two files:
- config.tf (actual configuration, e.g. credentials, that cannot be
  included in submodules)
- variables.tf (per-provider knobs and settings, e.g. mapping logical
  VM size like S/M/L to actual cloud SKUs)
2023-04-09 09:45:05 +02:00
Jérôme Petazzoni
f69a9d3eb8 🔧 Update .gitignore to get some Terraform stuff out of the way 2023-04-04 19:34:51 +02:00
Jérôme Petazzoni
bc10c5a5ca 📔 A bit of doc 😅 2023-04-04 19:32:49 +02:00
Jérôme Petazzoni
b6340acb6e ⚛️ Huge refactoring of lab environment deployment system
Summary of changes:
- "workshopctl" is now "labctl"
- it can handle deployment of VMs but also of managed
  Kubernetes clusters (and therefore, it replaces
  the "prepare-tf" directory)
- support for many more providers has been added

Check the README.md, in particular the "directory structure";
it has the most important information.
2023-03-29 18:36:48 +02:00
Jérôme Petazzoni
f8ab4adfb7 ⚙️ Make it possible to change number of parallel SSH connections with env var 2023-03-21 17:54:29 +01:00
Jérôme Petazzoni
dc8bd21062 📃 Add YAML exercise 2023-03-20 12:56:06 +01:00
Jérôme Petazzoni
c9710a9f70 📃 Update YAML section
- fix mapping example
- fix indentation
- add information about multi-documents
- add information about multi-line strings
2023-03-20 12:46:16 +01:00
ENIX NOC
bc1ba942c0 🔧 Retry 'terraform apply' 3 times if it fails
Some platforms (looking at you OpenStack) can exhibit random
transient failures. This helps to work around them.
2023-03-11 19:42:57 +01:00
ENIX NOC
fa0a894ebc 🔧 OpenStack pool and external_network_id are now variables 2023-03-11 19:42:57 +01:00
ENIX NOC
e78e0de377 🐞 Fix bug in 'passwords' action
It was still hard-coded to user 'docker' instead of using
the USER_LOGIN environment variable.

Also add download-retry when wgetting the websocketd deb.
2023-03-11 19:42:57 +01:00
Jérôme Petazzoni
cba2ff5ff7 🔧 Check for httpie in netlify DNS script 2023-03-08 17:57:17 +01:00
Jérôme Petazzoni
d8f8bf6d87 ♻️ Switch Hetzner to the new Terraform system 2023-03-04 15:24:51 +01:00
Jérôme Petazzoni
84f131cdc5 🏭️ Refactor Digital Ocean and Linode authentication in prepare-tf
Fetch credentials from CLI configuration files instead of environment variables.
2023-03-04 14:35:09 +01:00
Jérôme Petazzoni
8738f68a72 🏭️ Small refactorings to prepare Terraform migration
- add support for Digital Ocean (through Terraform)
- add support for per-cluster SSH key (hackish for now)
- pre-load Kubernetes APT GPG key (because of GCS outage)
2023-03-04 13:40:43 +01:00
Jérôme Petazzoni
e130884184 Bump up DOK version 2023-03-04 10:18:53 +01:00
Jérôme Petazzoni
74cb1aec85 ⚙️ Store terraform variables (# of nodes...) in tfvars file
Using environment variables was a mistake, because they must be set again
manually each time we want to re-apply the Terraform configurations.
Instead, put the variables in a tfvars file.
2023-03-04 10:18:35 +01:00
Jérôme Petazzoni
70e60d7f4e 🏭️ Big refactoring to move to Ubuntu 22.04
Instead of Ubuntu 18.04, we should use 22.04 (especially as
18.04 will be EOL soon). This moves a few providers to 22.04
(and more will follow).

We now ship a small containerd configuration file (instead
of defaulting to an empty configuration like we did before)
since it looks like recent versions of containerd cause
infinite crashloops if the cgroups driver isn't set properly.

Also, Linode is now provisioned using Terraform (instead of
the old-style system relying on linode-cli) which should make
instance provisioning faster (thanks to Terraform parallelism).

The "wait" command now tries to log in with both "ubuntu" and
"root", and if it fails with "ubuntu" but succeeds with "root",
it will create the "ubuntu" user and give it full sudo rights.

Finally, a "standardize" action has been created to gather all
the commands that deal with non-standard Ubuntu images.

Note that for completeness, we should check that all providers
work correctly; currently only Linode has been validated.
2023-02-23 16:32:10 +01:00
Jérôme Petazzoni
29b3185e7e 🐘 Add link to Mastodon profile 2023-02-23 10:06:38 +01:00
Jérôme Petazzoni
0616d74e37 Add gentle intro to YAML 2023-02-22 20:56:46 +01:00
Jérôme Petazzoni
676ebcdd3f ♻️ Replace jpetazzo/httpenv with jpetazzo/color 2023-02-20 14:22:02 +01:00
Jérôme Petazzoni
28f0253242 Add kubectl np-viewer in network policy section 2023-02-20 10:37:53 +01:00
323 changed files with 7183 additions and 6710 deletions

14
.gitignore vendored

@@ -2,11 +2,14 @@
*.swp
*~
prepare-vms/tags
prepare-vms/infra
prepare-vms/www
prepare-tf/tag-*
**/terraform.tfstate
**/terraform.tfstate.backup
prepare-labs/terraform/lab-environments
prepare-labs/terraform/many-kubernetes/one-kubernetes-config/config.tf
prepare-labs/terraform/many-kubernetes/one-kubernetes-module/*.tf
prepare-labs/terraform/tags
prepare-labs/terraform/virtual-machines/openstack/*.tfvars
prepare-labs/www
slides/*.yml.html
slides/autopilot/state.yaml
@@ -26,3 +29,4 @@ node_modules
Thumbs.db
ehthumbs.db
ehthumbs_vista.db


@@ -0,0 +1,13 @@
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-pdb
spec:
  #minAvailable: 2
  #minAvailable: 90%
  maxUnavailable: 1
  #maxUnavailable: 10%
  selector:
    matchLabels:
      app: my-app


@@ -1,36 +1,44 @@
---
apiVersion: v1
kind: Namespace
metadata:
name: traefik
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: traefik-ingress-controller
namespace: kube-system
name: traefik
namespace: traefik
---
kind: DaemonSet
apiVersion: apps/v1
metadata:
name: traefik-ingress-controller
namespace: kube-system
name: traefik
namespace: traefik
labels:
k8s-app: traefik-ingress-lb
app: traefik
spec:
selector:
matchLabels:
k8s-app: traefik-ingress-lb
app: traefik
template:
metadata:
labels:
k8s-app: traefik-ingress-lb
name: traefik-ingress-lb
app: traefik
name: traefik
spec:
tolerations:
- effect: NoSchedule
operator: Exists
hostNetwork: true
serviceAccountName: traefik-ingress-controller
# If, for some reason, our CNI plugin doesn't support hostPort,
# we can enable hostNetwork instead. That should work everywhere
# but it doesn't provide the same isolation.
#hostNetwork: true
serviceAccountName: traefik
terminationGracePeriodSeconds: 60
containers:
- image: traefik:v2.5
name: traefik-ingress-lb
- image: traefik:v2.10
name: traefik
ports:
- name: http
containerPort: 80
@@ -61,7 +69,7 @@ spec:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: traefik-ingress-controller
name: traefik
rules:
- apiGroups:
- ""
@@ -73,14 +81,6 @@ rules:
- get
- list
- watch
- apiGroups:
- extensions
resources:
- ingresses
verbs:
- get
- list
- watch
- apiGroups:
- networking.k8s.io
resources:
@@ -94,15 +94,15 @@ rules:
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: traefik-ingress-controller
name: traefik
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: traefik-ingress-controller
name: traefik
subjects:
- kind: ServiceAccount
name: traefik-ingress-controller
namespace: kube-system
name: traefik
namespace: traefik
---
kind: IngressClass
apiVersion: networking.k8s.io/v1

222
prepare-labs/README.md Normal file

@@ -0,0 +1,222 @@
# Tools to create lab environments
This directory contains tools to create lab environments for Docker and Kubernetes courses and workshops.
It also contains Terraform configurations that can be used stand-alone to create simple Kubernetes clusters.
Assuming that you have installed all the necessary dependencies, and placed cloud provider access tokens in the right locations, you could do, for instance:
```bash
# For a Docker course with 50 students,
# create 50 VMs on Digital Ocean.
./labctl create --students 50 --settings settings/docker.env --provider digitalocean
# For a Kubernetes training with 20 students,
# create 20 clusters of 4 VMs each using kubeadm,
# on a private OpenStack cluster.
./labctl create --students 20 --settings settings/kubernetes.env --provider openstack/enix
# For a Kubernetes workshop with 80 students,
# create 80 clusters with 2 VMs each,
# using Scaleway Kapsule (managed Kubernetes).
./labctl create --students 80 --settings settings/mk8s.env --provider scaleway --mode mk8s
```
Interested? Read on!
## Software requirements
For Docker labs and Kubernetes labs based on kubeadm:
- [Parallel SSH](https://github.com/lilydjwg/pssh)
(should be installable with `pip install git+https://github.com/lilydjwg/pssh`;
on a Mac, try `brew install pssh`)
For all labs:
- Terraform
If you want to generate printable cards:
- [pyyaml](https://pypi.python.org/pypi/PyYAML)
- [jinja2](https://pypi.python.org/pypi/Jinja2)
These require Python 3. If you are on a Mac, see below for specific instructions on making
Python 3 the default. In particular, if you installed `mosh`, Homebrew
may have changed your default Python to Python 2.
You will also need an account with the cloud provider(s) that you want to use to deploy the lab environments.
## Cloud provider account(s) and credentials
These scripts create VMs or Kubernetes clusters on cloud providers, so you will need cloud provider account(s) and credentials.
Generally, we try to use the credentials stored in the configuration files used by the cloud providers' CLI tools.
This means, for instance, that for Linode, if you install `linode-cli` and configure it properly, it will place your credentials in `~/.config/linode-cli`, and our Terraform configurations will try to read that file and use the credentials in it.
You don't **have to** install the CLI tools of the cloud provider(s) that you want to use; but we recommend that you do.
If you want to provide your cloud credentials through other means, you will have to adjust the Terraform configuration files in `terraform/provider-config` accordingly.
Here is where we look for credentials for each provider:
- AWS: Terraform defaults; see [AWS provider documentation][creds-aws] (for instance, you can use the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, or AWS config and profile files)
- Azure: Terraform defaults; see [AzureRM provider documentation][creds-azure] (typically, you can authenticate with the `az` CLI and Terraform will pick it up automatically)
- Civo: CLI configuration file (`~/.civo.json`)
- Digital Ocean: CLI configuration file (`~/.config/doctl/config.yaml`)
- Exoscale: CLI configuration file (`~/.config/exoscale/exoscale.toml`)
- Google Cloud: FIXME, note that the project name is currently hard-coded to `prepare-tf`
- Hetzner: CLI configuration file (`~/.config/hcloud/cli.toml`)
- Linode: CLI configuration file (`~/.config/linode-cli`)
- OpenStack: you will need to write a tfvars file (check [this example](terraform/virtual-machines/openstack/tfvars.example))
- Oracle: Terraform defaults; see [OCI provider documentation][creds-oci] (for instance, you can set up API keys; or you can use a short-lived token generated by the OCI CLI with `oci session authenticate`)
- OVH: Terraform defaults; see [OVH provider documentation][creds-ovh] (this typically involves setting up 5 `OVH_...` environment variables)
- Scaleway: Terraform defaults; see [Scaleway provider documentation][creds-scw] (for instance, you can set environment variables, but it will also automatically pick up CLI authentication from `~/.config/scw/config.yaml`)
[creds-aws]: https://registry.terraform.io/providers/hashicorp/aws/latest/docs#authentication-and-configuration
[creds-azure]: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs#authenticating-to-azure
[creds-oci]: https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/terraformproviderconfiguration.htm#authentication
[creds-ovh]: https://registry.terraform.io/providers/ovh/ovh/latest/docs#provider-configuration
[creds-scw]: https://registry.terraform.io/providers/scaleway/scaleway/latest/docs#authentication
## General Workflow
- fork/clone repo
- make sure your cloud credentials have been configured properly
- run `./labctl create ...` to create lab environments
- run `./labctl destroy ...` when you don't need the environments anymore
## Customizing things
You can edit the `settings/*.env` files, for instance to change the size of the clusters, the login or password used for the students...
Note that these files are sourced before executing any operation on a specific set of lab environments, which means that you can set Terraform variables by adding lines like the following ones in the `*.env` files:
```bash
export TF_VAR_node_size=GP1.L
export TF_VAR_location=eu-north
```
## `./labctl` Usage
If you run `./labctl` without arguments, it will show a list of available commands.
### Summary of What `./labctl` Does For You
The script will create a Terraform configuration using a provider-specific template.
There are two modes: `pssh` and `mk8s`.
In `pssh` mode, students connect directly to the virtual machines using SSH.
The Terraform configuration creates a bunch of virtual machines; provisioning and configuration are then done with `pssh`. A number of "steps" are executed on the VMs to install Docker, install various convenient tools, install and set up Kubernetes (if needed), and so on. The list of "steps" to execute is configured in the `settings/*.env` file.
In `mk8s` mode, students don't connect directly to the virtual machines. Instead, they connect to an SSH server running in a Pod (using the `jpetazzo/shpod` image), itself running on a Kubernetes cluster. The Kubernetes cluster is a managed cluster created by the Terraform configuration.
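Concretely, the way students connect differs between the two modes. The sketch below uses placeholder addresses, a placeholder login, and an example NodePort; the actual values are generated per-deployment, so none of these are literal outputs of `labctl`:

```shell
# pssh mode: each student SSHes straight into their VM.
# (Hypothetical IP address; real ones come from the generated environment.)
ssh student@203.0.113.10

# mk8s mode: students SSH into the shpod Pod, exposed through a
# NodePort on the managed cluster (32222 is just an example port;
# it is no longer hard-coded and varies between labs).
ssh -p 32222 student@203.0.113.10
```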
## `terraform` directory structure and principles
Legend:
- `📁` directory
- `📄` file
- `📄📄📄` multiple files
- `🌍` Terraform configuration that can be used "as-is"
```
📁terraform
├── 📁list-locations
│ └── 📄📄📄 helper scripts
│ (to list available locations for each provider)
├── 📁many-kubernetes
│ └── 📄📄📄 Terraform configuration template
│ (used in mk8s mode)
├── 📁one-kubernetes
│ │ (contains Terraform configurations that can spawn
│ │ a single Kubernetes cluster on a given provider)
│ ├── 📁🌍aws
│ ├── 📁🌍civo
│ ├── 📄common.tf
│ ├── 📁🌍digitalocean
│ └── ...
├── 📁providers
│ ├── 📁aws
│ │ ├── 📄config.tf
│ │ └── 📄variables.tf
│ ├── 📁azure
│ │ ├── 📄config.tf
│ │ └── 📄variables.tf
│ ├── 📁civo
│ │ ├── 📄config.tf
│ │ └── 📄variables.tf
│ ├── 📁digitalocean
│ │ ├── 📄config.tf
│ │ └── 📄variables.tf
│ └── ...
├── 📁tags
│ │ (contains Terraform configurations + other files
│ │ for a specific set of VMs or K8S clusters; these
│ │ are created by labctl)
│ ├── 📁2023-03-27-10-04-79-jp
│ ├── 📁2023-03-27-10-07-41-jp
│ ├── 📁2023-03-27-10-16-418-jp
│ └── ...
└── 📁virtual-machines
│ (contains Terraform configurations that can spawn
│ a bunch of virtual machines on a given provider)
├── 📁🌍aws
├── 📁🌍azure
├── 📄common.tf
├── 📁🌍digitalocean
└── ...
```
The directory structure can feel a bit overwhelming at first, but it's built with specific goals in mind.
**Consistent input/output between providers.** The per-provider configurations in `one-kubernetes` all take the same input variables, and provide the same output variables. Same thing for the per-provider configurations in `virtual-machines`.
**Don't repeat yourself.** As much as possible, common variables, definitions, and logic have been factored into the `common.tf` file that you can see in `one-kubernetes` and `virtual-machines`. That file is then symlinked into each provider-specific directory, to make sure that all providers use the same version of the `common.tf` file.
**Don't repeat yourself (again).** The things that are specific to each provider have been placed in the `providers` directory, and are shared between the `one-kubernetes` and the `virtual-machines` configurations. Specifically, for each provider, there is `config.tf` (which contains provider configuration, e.g. how to obtain the credentials for that provider) and `variables.tf` (which contains default values like which location and which VM size to use).
**Terraform configurations should work in `labctl` or standalone, without extra work.** The Terraform configurations (identified by 🌍 in the directory tree above) can be used directly. Just go to one of these directories, `terraform init`, `terraform apply`, and you're good to go. But they can also be used from `labctl`. `labctl` shouldn't barf out if you did a `terraform apply` in one of these directories (because it will only copy the `*.tf` files, and leave alone the other files, like the Terraform state).
The latter means that it should be easy to tweak these configurations, or create a new one, without having to use `labctl` to test it. It also means that if you want to use these configurations but don't care about `labctl`, you absolutely can!
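For instance, using one of the 🌍 configurations standalone could look like the following (the provider directory chosen here is just an example):

```shell
# Spin up a single Kubernetes cluster on one provider, without labctl.
cd terraform/one-kubernetes/digitalocean
terraform init
terraform apply

# The local Terraform state stays in this directory (labctl only copies
# the *.tf files), so tearing down is the usual:
terraform destroy
```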
## Miscellaneous info
### Making sure Python3 is the default (Mac only)
Check the `/usr/local/bin/python` symlink. It should be pointing to
`/usr/local/Cellar/python/3`-something. If it isn't, follow these
instructions.
1) Verify that Python 3 is installed.
```
ls -la /usr/local/Cellar/Python
```
You should see one or more versions of Python 3. If you don't,
install it with `brew install python`.
2) Verify that `python` points to Python3.
```
ls -la /usr/local/bin/python
```
If this points to `/usr/local/Cellar/python@2`, then we'll need to change it.
```
rm /usr/local/bin/python
ln -s /usr/local/Cellar/Python/xxxx /usr/local/bin/python
# where xxxx is the most recent Python 3 version you saw above
```
### AWS specific notes
These instructions assume that you're using a root account. If you'd like to use an IAM user instead, it will need the right permissions. For `pssh` mode, that includes at least `AmazonEC2FullAccess` and `IAMReadOnlyAccess`.
In `pssh` mode, the Terraform configuration currently uses the default VPC and Security Group. If you want to use another one, you'll have to make changes to `terraform/virtual-machines/aws`.
The default VPC Security Group does not open any ports from the Internet by default, so you'll need to add inbound rules for `SSH | TCP | 22 | 0.0.0.0/0` and `Custom TCP Rule | TCP | 8000 - 8002 | 0.0.0.0/0`.
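If you prefer the command line over the AWS console, a sketch of adding those rules with the AWS CLI could look like this (assuming the security group is the one named `default` in the default VPC):

```shell
# Allow SSH from anywhere on the default security group.
aws ec2 authorize-security-group-ingress \
    --group-name default --protocol tcp --port 22 --cidr 0.0.0.0/0

# Allow the lab HTTP ports (8000-8002) from anywhere.
aws ec2 authorize-security-group-ingress \
    --group-name default --protocol tcp --port 8000-8002 --cidr 0.0.0.0/0
```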

33
prepare-labs/cleanup.sh Executable file

@@ -0,0 +1,33 @@
#!/bin/sh

case "$1-$2" in
    linode-lb)
        linode-cli nodebalancers list --json |
            jq '.[] | select(.label | startswith("ccm-")) | .id' |
            xargs -n1 -P10 linode-cli nodebalancers delete
        ;;
    linode-pvc)
        linode-cli volumes list --json |
            jq '.[] | select(.label | startswith("pvc")) | .id' |
            xargs -n1 -P10 linode-cli volumes delete
        ;;
    digitalocean-lb)
        doctl compute load-balancer list --output json |
            jq .[].id |
            xargs -n1 -P10 doctl compute load-balancer delete --force
        ;;
    digitalocean-pvc)
        doctl compute volume list --output json |
            jq '.[] | select(.name | startswith("pvc-")) | .id' |
            xargs -n1 -P10 doctl compute volume delete --force
        ;;
    scaleway-pvc)
        scw instance volume list --output json |
            jq '.[] | select(.name | contains("_pvc-")) | .id' |
            xargs -n1 -P10 scw instance volume delete
        ;;
    *)
        echo "Unknown combination of provider ('$1') and resource ('$2')."
        ;;
esac

59
prepare-labs/dns-cloudflare.sh Executable file

@@ -0,0 +1,59 @@
#!/bin/sh
#set -eu

if ! command -v http >/dev/null; then
    echo "Could not find the 'http' command line tool."
    echo "Please install it (the package name might be 'httpie')."
    exit 1
fi

. ~/creds/creds.cloudflare.dns

cloudflare() {
    case "$1" in
        GET|POST|DELETE)
            METHOD="$1"
            shift
            ;;
        *)
            METHOD=""
            ;;
    esac
    URI=$1
    shift
    http --ignore-stdin $METHOD https://api.cloudflare.com/client/v4/$URI "$@" "Authorization:Bearer $CLOUDFLARE_TOKEN"
}

_list_zones() {
    cloudflare zones | jq -r .result[].name
}

_get_zone_id() {
    cloudflare zones?name=$1 | jq -r .result[0].id
}

_populate_zone() {
    ZONE_ID=$(_get_zone_id $1)
    shift
    for IPADDR in $*; do
        cloudflare zones/$ZONE_ID/dns_records "name=*" "type=A" "content=$IPADDR"
        cloudflare zones/$ZONE_ID/dns_records "name=\@" "type=A" "content=$IPADDR"
    done
}

_clear_zone() {
    ZONE_ID=$(_get_zone_id $1)
    for RECORD_ID in $(
        cloudflare zones/$ZONE_ID/dns_records | jq -r .result[].id
    ); do
        cloudflare DELETE zones/$ZONE_ID/dns_records/$RECORD_ID
    done
}

_add_zone() {
    cloudflare zones "name=$1"
}

echo "This script is still work in progress."
echo "You can source it and then use its individual functions."


@@ -2,16 +2,16 @@
"""
There are two ways to use this script:
1. Pass a file name and a tag name as a single argument.
It will load a list of domains from the given file (one per line),
and assign them to the clusters corresponding to that tag.
There should be more domains than clusters.
Example: ./map-dns.py domains.txt 2020-08-15-jp
2. Pass a domain as the 1st argument, and IP addresses then.
1. Pass a domain as the 1st argument, and IP addresses then.
It will configure the domain with the listed IP addresses.
Example: ./map-dns.py open-duck.site 1.2.3.4 2.3.4.5 3.4.5.6
2. Pass two file names as arguments, in which case the first
file should contain a list of domains, and the second a list of
groups of IP addresses, with one group per line.
There should be more domains than groups of addresses.
Example: ./map-dns.py domains.txt tags/2020-08-15-jp/clusters.txt
In both cases, the domains should be configured to use GANDI LiveDNS.
"""
import os
@@ -30,18 +30,9 @@ domain_or_domain_file = sys.argv[1]
if os.path.isfile(domain_or_domain_file):
domains = open(domain_or_domain_file).read().split()
domains = [ d for d in domains if not d.startswith('#') ]
ips_file_or_tag = sys.argv[2]
if os.path.isfile(ips_file_or_tag):
lines = open(ips_file_or_tag).read().split('\n')
clusters = [line.split() for line in lines]
else:
ips = open(f"tags/{ips_file_or_tag}/ips.txt").read().split()
settings_file = f"tags/{ips_file_or_tag}/settings.yaml"
clustersize = yaml.safe_load(open(settings_file))["clustersize"]
clusters = []
while ips:
clusters.append(ips[:clustersize])
ips = ips[clustersize:]
clusters_file = sys.argv[2]
lines = open(clusters_file).read().split('\n')
clusters = [line.split() for line in lines]
else:
domains = [domain_or_domain_file]
clusters = [sys.argv[2:]]


@@ -12,12 +12,15 @@
echo "$0 del <recordid>"
echo ""
echo "Example to create a A record for eu.container.training:"
echo "$0 add eu 185.145.250.0"
echo "$0 add eu A 185.145.250.0"
echo ""
exit 1
}
NETLIFY_CONFIG_FILE=~/.config/netlify/config.json
if ! [ "$DOMAIN" ]; then
DOMAIN=container.training
fi
if ! [ -f "$NETLIFY_CONFIG_FILE" ]; then
echo "Could not find Netlify configuration file ($NETLIFY_CONFIG_FILE)."
@@ -26,6 +29,12 @@ if ! [ -f "$NETLIFY_CONFIG_FILE" ]; then
exit 1
fi
if ! command -v http >/dev/null; then
echo "Could not find the 'http' command line tool."
echo "Please install it (the package name might be 'httpie')."
exit 1
fi
NETLIFY_USERID=$(jq .userId < "$NETLIFY_CONFIG_FILE")
NETLIFY_TOKEN=$(jq -r .users[$NETLIFY_USERID].auth.token < "$NETLIFY_CONFIG_FILE")
@@ -36,31 +45,33 @@ netlify() {
}
ZONE_ID=$(netlify dns_zones |
jq -r '.[] | select ( .name == "container.training" ) | .id')
jq -r '.[] | select ( .name == "'$DOMAIN'" ) | .id')
_list() {
netlify dns_zones/$ZONE_ID/dns_records |
jq -r '.[] | select(.type=="A") | [.hostname, .type, .value, .id] | @tsv'
jq -r '.[] | select(.type=="A" or .type=="AAAA") | [.hostname, .type, .value, .id] | @tsv' |
sort |
column --table
}
_add() {
NAME=$1.container.training
ADDR=$2
NAME=$1.$DOMAIN
TYPE=$2
VALUE=$3
# It looks like if we create two identical records, then delete one of them,
# Netlify DNS ends up in a weird state (the name doesn't resolve anymore even
# though it's still visible through the API and the website?)
if netlify dns_zones/$ZONE_ID/dns_records |
jq '.[] | select(.hostname=="'$NAME'" and .type=="A" and .value=="'$ADDR'")' |
jq '.[] | select(.hostname=="'$NAME'" and .type=="'$TYPE'" and .value=="'$VALUE'")' |
grep .
then
echo "It looks like that record already exists. Refusing to create it."
exit 1
fi
netlify dns_zones/$ZONE_ID/dns_records type=A hostname=$NAME value=$ADDR ttl=300
netlify dns_zones/$ZONE_ID/dns_records type=$TYPE hostname=$NAME value=$VALUE ttl=300
netlify dns_zones/$ZONE_ID/dns_records |
jq '.[] | select(.hostname=="'$NAME'")'
@@ -79,7 +90,7 @@ case "$1" in
_list
;;
add)
_add $2 $3
_add $2 $3 $4
;;
del)
_del $2


(binary image file changed; 127 KiB before and after)

prepare-labs/konk.sh Executable file

@@ -0,0 +1,23 @@
#!/bin/sh
# deploy big cluster
#TF_VAR_node_size=g6-standard-6 \
#TF_VAR_nodes_per_cluster=5 \
#TF_VAR_location=eu-west \
TF_VAR_node_size=PRO2-XS \
TF_VAR_nodes_per_cluster=5 \
TF_VAR_location=fr-par-2 \
./labctl create --mode mk8s --settings settings/mk8s.env --provider scaleway --tag konk
# set kubeconfig file
cp tags/konk/stage2/kubeconfig.101 ~/kubeconfig
# set external_ip labels
kubectl get nodes -o=jsonpath='{range .items[*]}{.metadata.name} {.status.addresses[?(@.type=="ExternalIP")].address}{"\n"}{end}' |
while read node address; do
kubectl label node $node external_ip=$address
done
# vcluster all the things
./labctl create --settings settings/mk8s.env --provider vcluster --mode mk8s --students 50


@@ -21,10 +21,13 @@ DEPENDENCIES="
man
pssh
ssh
wkhtmltopdf
yq
"
UNUSED_DEPENDENCIES="
wkhtmltopdf
"
# Check for missing dependencies, and issue a warning if necessary.
missing=0
for dependency in $DEPENDENCIES; do


@@ -50,20 +50,6 @@ sep() {
fi
}
need_infra() {
if [ -z "$1" ]; then
die "Please specify infrastructure file. (e.g.: infra/aws)"
fi
if [ "$1" = "--infra" ]; then
die "The infrastructure file should be passed directly to this command. Remove '--infra' and try again."
fi
if [ ! -f "$1" ]; then
die "Infrastructure file $1 doesn't exist."
fi
. "$1"
. "lib/infra/$INFRACLASS.sh"
}
need_tag() {
if [ -z "$TAG" ]; then
die "Please specify a tag. To see available tags, run: $0 tags"
@@ -71,25 +57,12 @@ need_tag() {
if [ ! -d "tags/$TAG" ]; then
die "Tag $TAG not found (directory tags/$TAG does not exist)."
fi
for FILE in settings.yaml ips.txt infra.sh; do
for FILE in settings.env ips.txt; do
if [ ! -f "tags/$TAG/$FILE" ]; then
warning "File tags/$TAG/$FILE not found."
fi
done
. "tags/$TAG/infra.sh"
. "lib/infra/$INFRACLASS.sh"
}
need_settings() {
if [ -z "$1" ]; then
die "Please specify a settings file. (e.g.: settings/kube101.yaml)"
fi
if [ ! -f "$1" ]; then
die "Settings file $1 doesn't exist."
if [ -f "tags/$TAG/settings.env" ]; then
. tags/$TAG/settings.env
fi
}
need_login_password() {
USER_LOGIN=$(yq -r .user_login < tags/$TAG/settings.yaml)
USER_PASSWORD=$(yq -r .user_password < tags/$TAG/settings.yaml)
}


@@ -1,5 +1,3 @@
export AWS_DEFAULT_OUTPUT=text
# Ignore SSH key validation when connecting to these remote hosts.
# (Otherwise, deployment scripts break when a VM IP address is reused.)
SSHOPTS="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=ERROR"
@@ -16,25 +14,17 @@ _cmd_help() {
printf "%s" "$HELP" | sort
}
_cmd build "Build the Docker image to run this program in a container"
_cmd_build() {
docker-compose build
}
_cmd wrap "Run this program in a container"
_cmd_wrap() {
docker-compose run --rm workshopctl "$@"
}
_cmd cards "Generate ready-to-print cards for a group of VMs"
_cmd_cards() {
TAG=$1
need_tag
die FIXME
# This will process ips.txt to generate two files: ips.pdf and ips.html
(
cd tags/$TAG
../../lib/ips-txt-to-html.py settings.yaml
../../../lib/ips-txt-to-html.py settings.yaml
)
ln -sf ../tags/$TAG/ips.html www/$TAG.html
@@ -47,10 +37,10 @@ _cmd_cards() {
info "$0 www"
}
_cmd clean "Remove information about stopped clusters"
_cmd clean "Remove information about destroyed clusters"
_cmd_clean() {
for TAG in tags/*; do
if grep -q ^stopped$ "$TAG/status"; then
if grep -q ^destroyed$ "$TAG/status"; then
info "Removing $TAG..."
rm -rf "$TAG"
fi
@@ -61,12 +51,13 @@ _cmd createuser "Create the user that students will use"
_cmd_createuser() {
TAG=$1
need_tag
need_login_password
pssh "
set -e
# Create the user if it doesn't exist yet.
id $USER_LOGIN || sudo useradd -d /home/$USER_LOGIN -g users -m -s /bin/bash $USER_LOGIN
# Make sure there is at least exec permission on their home.
sudo chmod a+X /home/$USER_LOGIN
# Add them to the docker group, if there is one.
grep ^docker: /etc/group && sudo usermod -aG docker $USER_LOGIN
# Set their password.
@@ -80,7 +71,7 @@ _cmd_createuser() {
set -e
sudo sed -i 's/PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config
sudo sed -i 's/#MaxAuthTries 6/MaxAuthTries 42/' /etc/ssh/sshd_config
sudo service ssh restart
sudo systemctl restart ssh.service
"
pssh "
@@ -96,6 +87,12 @@ _cmd_createuser() {
fi
"
# FIXME this is a gross hack to add the deployment key to our SSH agent,
# so that it can be used to bounce from host to host (which is necessary
# in the next deployment step). In the long run, we probably want to
# generate these keys locally and push them to the machines instead
# (once we move everything to Terraform).
ssh-add tags/$TAG/id_rsa
pssh "
set -e
cd /home/$USER_LOGIN
@@ -105,6 +102,7 @@ _cmd_createuser() {
sudo -u $USER_LOGIN tar -xf-
fi
"
ssh-add -d tags/$TAG/id_rsa
# FIXME do this only once.
pssh -I "sudo -u $USER_LOGIN tee -a /home/$USER_LOGIN/.bashrc" <<"SQRL"
@@ -128,6 +126,7 @@ set number
set shiftwidth=2
set softtabstop=2
set nowrap
set laststatus=2
SQRL
pssh -I "sudo -u $USER_LOGIN tee /home/$USER_LOGIN/.tmux.conf" <<SQRL
@@ -142,9 +141,11 @@ bind l select-pane -R
set -g mouse on
# Make scrolling with wheels work
bind -n WheelUpPane if-shell -F -t = "#{mouse_any_flag}" "send-keys -M" "if -Ft= '#{pane_in_mode}' 'send-keys -M' 'select-pane -t=; copy-mode -e; send-keys -M'"
bind -n WheelDownPane select-pane -t= \; send-keys -M
# Retain one million lines
set-option -g history-limit 1000000
SQRL
# Install docker-prompt script
@@ -154,80 +155,195 @@ SQRL
echo user_ok > tags/$TAG/status
}
_cmd create "Create lab environments"
_cmd_create() {
while [ ! -z "$*" ]; do
case "$1" in
--mode) MODE=$2; shift 2;;
--provider) PROVIDER=$2; shift 2;;
--settings) SETTINGS=$2; shift 2;;
--students) STUDENTS=$2; shift 2;;
--tag) TAG=$2; shift 2;;
*) die "Unrecognized parameter: $1."
esac
done
if [ -z "$MODE" ]; then
info "Using default mode (pssh)."
MODE=pssh
fi
if [ -z "$PROVIDER" ]; then
die "Please add --provider flag to specify which provider to use."
fi
if [ -z "$SETTINGS" ]; then
die "Please add --settings flag to specify which settings file to use."
fi
if [ -z "$STUDENTS" ]; then
info "Defaulting to 1 student since --students flag wasn't specified."
STUDENTS=1
fi
case "$MODE" in
mk8s)
PROVIDER_BASE=terraform/one-kubernetes
;;
pssh)
PROVIDER_BASE=terraform/virtual-machines
;;
*) die "Invalid mode: $MODE (supported modes: mk8s, pssh)." ;;
esac
if ! [ -f "$SETTINGS" ]; then
die "Settings file ($SETTINGS) not found."
fi
# Check that the provider is valid.
if [ -d $PROVIDER_BASE/$PROVIDER ]; then
if [ -f $PROVIDER_BASE/$PROVIDER/requires_tfvars ]; then
die "Provider $PROVIDER cannot be used directly, because it requires a tfvars file."
fi
PROVIDER_DIRECTORY=$PROVIDER_BASE/$PROVIDER
TFVARS=""
elif [ -f $PROVIDER_BASE/$PROVIDER.tfvars ]; then
TFVARS=$PROVIDER_BASE/$PROVIDER.tfvars
PROVIDER_DIRECTORY=$(dirname $PROVIDER_BASE/$PROVIDER)
else
error "Provider $PROVIDER not found."
info "Available providers for mode $MODE:"
(
cd $PROVIDER_BASE
for P in *; do
if [ -d "$P" ]; then
[ -f "$P/requires_tfvars" ] || info "$P"
for V in $P/*.tfvars; do
[ -f "$V" ] && info "${V%.tfvars}"
done
fi
done
)
die "Please specify a valid provider."
fi
if [ -z "$TAG" ]; then
TAG=$(_cmd_maketag)
fi
mkdir -p tags/$TAG
echo creating > tags/$TAG/status
ln -s ../../$SETTINGS tags/$TAG/settings.env.orig
cp $SETTINGS tags/$TAG/settings.env
. $SETTINGS
echo $MODE > tags/$TAG/mode
echo $PROVIDER > tags/$TAG/provider
case "$MODE" in
mk8s)
cp -d terraform/many-kubernetes/*.* tags/$TAG
mkdir tags/$TAG/one-kubernetes-module
cp $PROVIDER_DIRECTORY/*.tf tags/$TAG/one-kubernetes-module
mkdir tags/$TAG/one-kubernetes-config
mv tags/$TAG/one-kubernetes-module/config.tf tags/$TAG/one-kubernetes-config
;;
pssh)
cp $PROVIDER_DIRECTORY/*.tf tags/$TAG
if [ "$TFVARS" ]; then
cp "$TFVARS" "tags/$TAG/$(basename $TFVARS).auto.tfvars"
fi
;;
esac
(
cd tags/$TAG
terraform init
echo tag = \"$TAG\" >> terraform.tfvars
echo how_many_clusters = $STUDENTS >> terraform.tfvars
echo nodes_per_cluster = $CLUSTERSIZE >> terraform.tfvars
for RETRY in 1 2 3; do
if terraform apply -auto-approve; then
touch terraform.ok
break
fi
done
if ! [ -f terraform.ok ]; then
die "Terraform failed."
fi
)
sep
info "Successfully created $COUNT instances with tag $TAG"
echo create_ok > tags/$TAG/status
# If the settings.env file has a "STEPS" field,
# automatically execute all the actions listed in that field.
# If an action fails, retry it up to 10 times.
for STEP in $(echo $STEPS); do
sep "$TAG -> $STEP"
TRY=1
MAXTRY=10
while ! $0 $STEP $TAG ; do
TRY=$(($TRY+1))
if [ $TRY -gt $MAXTRY ]; then
error "This step ($STEP) failed after $MAXTRY attempts."
info "You can troubleshoot the situation manually, or terminate these instances with:"
info "$0 destroy $TAG"
die "Giving up."
else
sep
info "Step '$STEP' failed for '$TAG'. Let's wait 10 seconds and try again."
info "(Attempt $TRY out of $MAXTRY.)"
sleep 10
fi
done
done
sep
info "Deployment successful."
info "To log into the first machine of that batch, you can run:"
info "$0 ssh $TAG"
info "To terminate these instances, you can run:"
info "$0 destroy $TAG"
}
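The STEPS loop in `_cmd_create` retries each failed action up to 10 times with a 10-second pause between attempts. Stripped of the labctl context, the pattern can be sketched as a standalone helper (the `retry` function and `RETRY_DELAY` variable are illustrative names, not part of workshopctl):

```shell
# Generic retry helper in the spirit of the STEPS loop above.
# RETRY_DELAY overrides the pause between attempts (defaults to 10 seconds).
retry() {
    maxtry=10
    try=1
    delay=${RETRY_DELAY:-10}
    while ! "$@"; do
        try=$((try+1))
        if [ $try -gt $maxtry ]; then
            echo "Giving up after $maxtry attempts." >&2
            return 1
        fi
        echo "Attempt $try out of $maxtry, retrying in $delay seconds..." >&2
        sleep $delay
    done
}
```

Usage example: `RETRY_DELAY=1 retry some_flaky_command`.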
_cmd destroy "Destroy lab environments"
_cmd_destroy() {
TAG=$1
need_tag
cd tags/$TAG
echo destroying > status
terraform destroy -auto-approve
echo destroyed > status
}
_cmd clusterize "Group VMs in clusters"
_cmd_clusterize() {
TAG=$1
need_tag
# Disable unattended upgrades so that they don't interfere with the subsequent steps
pssh sudo rm -f /etc/apt/apt.conf.d/50unattended-upgrades
pssh "
set -e
grep PSSH_ /etc/ssh/sshd_config || echo 'AcceptEnv PSSH_*' | sudo tee -a /etc/ssh/sshd_config
sudo systemctl restart ssh.service"
# Special case for scaleway since it doesn't come with sudo
if [ "$INFRACLASS" = "scaleway" ]; then
pssh -l root "
grep DEBIAN_FRONTEND /etc/environment || echo DEBIAN_FRONTEND=noninteractive >> /etc/environment
grep cloud-init /etc/sudoers && rm /etc/sudoers
apt-get update && apt-get install sudo -y"
pssh -I < tags/$TAG/clusters.txt "
grep -w \$PSSH_HOST | tr ' ' '\n' > /tmp/cluster"
pssh "
echo \$PSSH_HOST > /tmp/ipv4
head -n 1 /tmp/cluster | sudo tee /etc/ipv4_of_first_node
echo ${CLUSTERPREFIX}1 | sudo tee /etc/name_of_first_node
echo HOSTIP=\$PSSH_HOST | sudo tee -a /etc/environment
NODEINDEX=\$((\$PSSH_NODENUM%$CLUSTERSIZE+1))
if [ \$NODEINDEX = 1 ]; then
sudo ln -sf /bin/true /usr/local/bin/i_am_first_node
else
sudo ln -sf /bin/false /usr/local/bin/i_am_first_node
fi
# FIXME
# Special case for hetzner since it doesn't have an ubuntu user
#if [ "$INFRACLASS" = "hetzner" ]; then
# pssh -l root "
#[ -d /home/ubuntu ] ||
# useradd ubuntu -m -s /bin/bash
#echo 'ubuntu ALL=(ALL:ALL) NOPASSWD:ALL' > /etc/sudoers.d/ubuntu
#[ -d /home/ubuntu/.ssh ] ||
# install --owner=ubuntu --mode=700 --directory /home/ubuntu/.ssh
#[ -f /home/ubuntu/.ssh/authorized_keys ] ||
# install --owner=ubuntu --mode=600 /root/.ssh/authorized_keys --target-directory /home/ubuntu/.ssh"
#fi
# Special case for oracle since their iptables blocks everything but SSH
pssh "
if [ -f /etc/iptables/rules.v4 ]; then
sudo sed -i 's/-A INPUT -j REJECT --reject-with icmp-host-prohibited//' /etc/iptables/rules.v4
sudo netfilter-persistent flush
sudo netfilter-persistent start
fi"
# oracle-cloud-agent upgrades packages in the background.
# This breaks our deployment scripts, because when we invoke apt-get, it complains
# that the lock already exists (symptom: random "Exited with error code 100").
# Workaround: if we detect oracle-cloud-agent, remove it.
# But this agent seems to also take care of installing/upgrading
# the unified-monitoring-agent package, so when we stop the snap,
# it can leave dpkg in a broken state. We "fix" it with the 2nd command.
pssh "
if [ -d /snap/oracle-cloud-agent ]; then
sudo snap remove oracle-cloud-agent
sudo dpkg --remove --force-remove-reinstreq unified-monitoring-agent
fi"
# Copy settings and install Python YAML parser
pssh -I tee /tmp/settings.yaml <tags/$TAG/settings.yaml
pssh "
sudo apt-get update &&
sudo apt-get install -y python-yaml"
# If there is no "python" binary, symlink to python3
pssh "
if ! which python; then
sudo ln -s $(which python3) /usr/local/bin/python
fi"
# Copy clusterize.py to the remote machines, and execute it, feeding it the list of IP addresses
pssh -I tee /tmp/clusterize.py <lib/clusterize.py
pssh --timeout 900 --send-input "python /tmp/clusterize.py >>/tmp/pp.out 2>>/tmp/pp.err" <tags/$TAG/ips.txt
# On the first node, create and deploy TLS certs using Docker Machine
# (Currently disabled.)
true || pssh "
if i_am_first_node; then
grep '[0-9]\$' /etc/hosts |
xargs -n2 sudo -H -u $USER_LOGIN \
docker-machine create -d generic --generic-ssh-user $USER_LOGIN --generic-ip-address
fi"
echo $CLUSTERPREFIX\$NODEINDEX | sudo tee /etc/hostname
sudo hostname $CLUSTERPREFIX\$NODEINDEX
N=1
while read ip; do
grep -w \$ip /etc/hosts || echo \$ip $CLUSTERPREFIX\$N | sudo tee -a /etc/hosts
N=\$((\$N+1))
done < /tmp/cluster
"
echo cluster_ok > tags/$TAG/status
}
@@ -261,7 +377,7 @@ _cmd_docker() {
# This will install the latest Docker.
sudo apt-get -qy install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository 'deb https://download.docker.com/linux/ubuntu bionic stable'
sudo add-apt-repository 'deb https://download.docker.com/linux/ubuntu jammy stable'
sudo apt-get -q update
sudo apt-get -qy install docker-ce
@@ -305,10 +421,23 @@ _cmd_kubebins() {
TAG=$1
need_tag
if [ "$KUBEVERSION" = "" ]; then
KUBEVERSION="$(curl -fsSL https://cdn.dl.k8s.io/release/stable.txt | sed s/^v//)"
fi
##VERSION##
ETCD_VERSION=v3.4.13
K8SBIN_VERSION=v1.19.11 # Can't go to 1.20 because it requires a serviceaccount signing key.
CNI_VERSION=v0.8.7
case "$KUBEVERSION" in
1.19.*)
ETCD_VERSION=v3.4.13
CNI_VERSION=v0.8.7
;;
*)
ETCD_VERSION=v3.5.10
CNI_VERSION=v1.3.0
;;
esac
K8SBIN_VERSION="v$KUBEVERSION"
ARCH=${ARCHITECTURE-amd64}
pssh --timeout 300 "
set -e
@@ -332,29 +461,41 @@ _cmd_kubebins() {
"
}
_cmd kube "Setup kubernetes clusters with kubeadm (must be run AFTER deploy)"
_cmd_kube() {
_cmd kubepkgs "Install Kubernetes packages (kubectl, kubeadm, kubelet)"
_cmd_kubepkgs() {
TAG=$1
need_tag
need_login_password
# Optional version, e.g. 1.13.5
SETTINGS=tags/$TAG/settings.yaml
KUBEVERSION=$(awk '/^kubernetes_version:/ {print $2}' $SETTINGS)
if [ "$KUBEVERSION" ]; then
pssh "
sudo tee /etc/apt/preferences.d/kubernetes <<EOF
# Prior to September 2023, there was a single Kubernetes package repo that
# contained packages for all versions, so we could just add that repo
# and install whatever was the latest version available there.
# Things have changed (versions after September 2023, e.g. 1.28.3 are
# not in the old repo) and now there is a different repo for each
# minor version, so we need to figure out what minor version we are
# installing to add the corresponding repo.
if [ "$KUBEVERSION" = "" ]; then
KUBEVERSION="$(curl -fsSL https://cdn.dl.k8s.io/release/stable.txt | sed s/^v//)"
fi
KUBEREPOVERSION="$(echo $KUBEVERSION | cut -d. -f1-2)"
# Since the new repo doesn't have older versions, add a safety check here.
MINORVERSION="$(echo $KUBEVERSION | cut -d. -f2)"
if [ "$MINORVERSION" -lt 24 ]; then
die "Cannot install kubepkgs for versions before 1.24."
fi
pssh "
sudo tee /etc/apt/preferences.d/kubernetes <<EOF
Package: kubectl kubeadm kubelet
Pin: version $KUBEVERSION*
Pin: version $KUBEVERSION-*
Pin-Priority: 1000
EOF"
fi
# Install packages
pssh --timeout 200 "
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg |
sudo apt-key add - &&
echo deb http://apt.kubernetes.io/ kubernetes-xenial main |
curl -fsSL https://pkgs.k8s.io/core:/stable:/v$KUBEREPOVERSION/deb/Release.key |
gpg --dearmor | sudo tee /etc/apt/keyrings/kubernetes-apt-keyring.gpg &&
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v$KUBEREPOVERSION/deb/ /' |
sudo tee /etc/apt/sources.list.d/kubernetes.list"
pssh --timeout 200 "
sudo apt-get update -q &&
@@ -364,18 +505,25 @@ EOF"
kubectl completion bash | sudo tee /etc/bash_completion.d/kubectl &&
echo 'alias k=kubectl' | sudo tee /etc/bash_completion.d/k &&
echo 'complete -F __start_kubectl k' | sudo tee -a /etc/bash_completion.d/k"
}
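The per-minor-version repo selection in `_cmd_kubepkgs` above boils down to a couple of string operations on the version number; here is the same derivation in isolation (the version value is just an example):

```shell
# Reproduces the KUBEREPOVERSION / MINORVERSION derivation used by kubepkgs.
KUBEVERSION=1.28.3                                      # example value
KUBEREPOVERSION="$(echo $KUBEVERSION | cut -d. -f1-2)"  # major.minor -> 1.28
MINORVERSION="$(echo $KUBEVERSION | cut -d. -f2)"       # minor only  -> 28
echo "https://pkgs.k8s.io/core:/stable:/v$KUBEREPOVERSION/deb/"
# prints https://pkgs.k8s.io/core:/stable:/v1.28/deb/
```

The safety check on `MINORVERSION` then rejects anything below 1.24, since the pkgs.k8s.io repos don't carry older releases.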
# Disable swap
# (note that this won't survive across node reboots!)
if [ "$INFRACLASS" = "linode" ]; then
pssh "
sudo swapoff -a"
_cmd kubeadm "Setup kubernetes clusters with kubeadm"
_cmd_kubeadm() {
TAG=$1
need_tag
if [ "$KUBEVERSION" ]; then
CLUSTER_CONFIGURATION_KUBERNETESVERSION='kubernetesVersion: "v'$KUBEVERSION'"'
IGNORE_SYSTEMVERIFICATION="- SystemVerification"
IGNORE_SWAP="- Swap"
fi
# Re-enable CRI interface in containerd
pssh "
echo '# Use default parameters for containerd.' | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd"
# Install a valid configuration for containerd
# (first, the CRI interface needs to be re-enabled;
# also, the correct systemd cgroup driver must be selected,
# otherwise containerd just restarts containers for no good reason)
pssh -I "sudo tee /etc/containerd/config.toml" < lib/containerd-config.toml
pssh "sudo systemctl restart containerd"
# Initialize kube control plane
pssh --timeout 200 "
@@ -383,39 +531,38 @@ EOF"
kubeadm token generate > /tmp/token &&
cat >/tmp/kubeadm-config.yaml <<EOF
kind: InitConfiguration
apiVersion: kubeadm.k8s.io/v1beta2
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- token: \$(cat /tmp/token)
nodeRegistration:
# Comment out the next line to switch back to Docker.
criSocket: /run/containerd/containerd.sock
ignorePreflightErrors:
- NumCPU
$IGNORE_SYSTEMVERIFICATION
$IGNORE_SWAP
---
kind: JoinConfiguration
apiVersion: kubeadm.k8s.io/v1beta2
apiVersion: kubeadm.k8s.io/v1beta3
discovery:
bootstrapToken:
apiServerEndpoint: \$(cat /etc/name_of_first_node):6443
token: \$(cat /tmp/token)
unsafeSkipCAVerification: true
nodeRegistration:
# Comment out the next line to switch back to Docker.
criSocket: /run/containerd/containerd.sock
ignorePreflightErrors:
- NumCPU
$IGNORE_SYSTEMVERIFICATION
$IGNORE_SWAP
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
# The following line is necessary when using Docker.
# It doesn't seem necessary when using containerd.
#cgroupDriver: cgroupfs
failSwapOn: false
---
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta2
apiVersion: kubeadm.k8s.io/v1beta3
apiServer:
certSANs:
- \$(cat /tmp/ipv4)
$CLUSTER_CONFIGURATION_KUBERNETESVERSION
EOF
sudo kubeadm init --config=/tmp/kubeadm-config.yaml
fi"
@@ -433,11 +580,17 @@ EOF
# Install weave as the pod network
pssh "
if i_am_first_node; then
#kubever=\$(kubectl version | base64 | tr -d '\n') &&
#kubectl apply -f https://cloud.weave.works/k8s/net?k8s-version=\$kubever
kubectl apply -f https://github.com/weaveworks/weave/releases/download/v2.8.1/weave-daemonset-k8s-1.11.yaml
fi"
# FIXME this is a gross hack to add the deployment key to our SSH agent,
# so that it can be used to bounce from host to host (which is necessary
# in the next deployment step). In the long run, we probably want to
# generate these keys locally and push them to the machines instead
# (once we move everything to Terraform).
if [ -f "tags/$TAG/id_rsa" ]; then
ssh-add tags/$TAG/id_rsa
fi
# Join the other nodes to the cluster
pssh --timeout 200 "
if ! i_am_first_node && [ ! -f /etc/kubernetes/kubelet.conf ]; then
@@ -445,6 +598,9 @@ EOF
ssh $SSHOPTS \$FIRSTNODE cat /tmp/kubeadm-config.yaml > /tmp/kubeadm-config.yaml &&
sudo kubeadm join --config /tmp/kubeadm-config.yaml
fi"
if [ -f "tags/$TAG/id_rsa" ]; then
ssh-add -d tags/$TAG/id_rsa
fi
# Install metrics server
pssh "
@@ -460,7 +616,6 @@ _cmd kubetools "Install a bunch of CLI tools for Kubernetes"
_cmd_kubetools() {
TAG=$1
need_tag
need_login_password
ARCH=${ARCHITECTURE-amd64}
@@ -655,6 +810,25 @@ EOF
sudo tar -zxvf- -C /usr/local/bin kubeseal
kubeseal --version
fi"
##VERSION## https://github.com/vmware-tanzu/velero/releases
VELERO_VERSION=1.11.0
pssh "
if [ ! -x /usr/local/bin/velero ]; then
curl -fsSL https://github.com/vmware-tanzu/velero/releases/download/v$VELERO_VERSION/velero-v$VELERO_VERSION-linux-$ARCH.tar.gz |
sudo tar --strip-components=1 --wildcards -zx -C /usr/local/bin '*/velero'
velero completion bash | sudo tee /etc/bash_completion.d/velero
velero version --client-only
fi"
##VERSION## https://github.com/doitintl/kube-no-trouble/releases
KUBENT_VERSION=0.7.0
pssh "
if [ ! -x /usr/local/bin/kubent ]; then
curl -fsSL https://github.com/doitintl/kube-no-trouble/releases/download/${KUBENT_VERSION}/kubent-${KUBENT_VERSION}-linux-$ARCH.tar.gz |
sudo tar -zxvf- -C /usr/local/bin kubent
kubent --version
fi"
}
_cmd kubereset "Wipe out Kubernetes configuration on all nodes"
@@ -688,8 +862,6 @@ _cmd_ips() {
TAG=$1
need_tag $TAG
SETTINGS=tags/$TAG/settings.yaml
CLUSTERSIZE=$(awk '/^clustersize:/ {print $2}' $SETTINGS)
while true; do
for I in $(seq $CLUSTERSIZE); do
read ip || return 0
@@ -699,22 +871,9 @@ _cmd_ips() {
done < tags/$TAG/ips.txt
}
_cmd inventory "List all VMs on a given infrastructure (or all infras if no arg given)"
_cmd inventory "List all VMs on a given provider (or across all providers if no arg given)"
_cmd_inventory() {
case "$1" in
"")
for INFRA in infra/*; do
$0 inventory $INFRA
done
;;
*/example.*)
;;
*)
need_infra $1
sep "Listing instances for $1"
infra_list
;;
esac
FIXME
}
_cmd maketag "Generate a quasi-unique tag for a group of instances"
@@ -759,18 +918,92 @@ _cmd_ping() {
fping < tags/$TAG/ips.txt
}
_cmd stage2 "Finalize the setup of managed Kubernetes clusters"
_cmd_stage2() {
TAG=$1
need_tag
cd tags/$TAG/stage2
terraform init -upgrade
terraform apply -auto-approve
}
_cmd standardize "Deal with non-standard Ubuntu cloud images"
_cmd_standardize() {
TAG=$1
need_tag
# Try to log in as root.
# If successful, make sure that we have:
# - sudo
# - ubuntu user
# Note that on Scaleway, the keys of the root account get copied
# a little while after boot; so the first time we run "standardize"
# we might end up copying an incomplete authorized_keys file.
# That's why we copy it unconditionally here, rather than checking
# for existence and skipping if it already exists.
pssh -l root -t 5 true 2>&1 >/dev/null && {
pssh -l root "
grep DEBIAN_FRONTEND /etc/environment || echo DEBIAN_FRONTEND=noninteractive >> /etc/environment
#grep cloud-init /etc/sudoers && rm /etc/sudoers
apt-get update && apt-get install sudo -y
getent passwd ubuntu || {
useradd ubuntu -m -s /bin/bash
echo 'ubuntu ALL=(ALL:ALL) NOPASSWD:ALL' > /etc/sudoers.d/ubuntu
}
install --owner=ubuntu --mode=700 --directory /home/ubuntu/.ssh
install --owner=ubuntu --mode=600 /root/.ssh/authorized_keys --target-directory /home/ubuntu/.ssh
"
}
# Now make sure that we have an ubuntu user
pssh true
# Disable unattended upgrades so that they don't interfere with the subsequent steps
pssh sudo rm -f /etc/apt/apt.conf.d/50unattended-upgrades
# Digital Ocean's cloud init disables password authentication; re-enable it.
pssh "
if [ -f /etc/ssh/sshd_config.d/50-cloud-init.conf ]; then
sudo rm /etc/ssh/sshd_config.d/50-cloud-init.conf
sudo systemctl restart ssh.service
fi"
# Special case for oracle since their iptables blocks everything but SSH
pssh "
if [ -f /etc/iptables/rules.v4 ]; then
sudo sed -i 's/-A INPUT -j REJECT --reject-with icmp-host-prohibited//' /etc/iptables/rules.v4
sudo netfilter-persistent flush
sudo netfilter-persistent start
fi"
# oracle-cloud-agent upgrades packages in the background.
# This breaks our deployment scripts, because when we invoke apt-get, it complains
# that the lock already exists (symptom: random "Exited with error code 100").
# Workaround: if we detect oracle-cloud-agent, remove it.
# But this agent seems to also take care of installing/upgrading
# the unified-monitoring-agent package, so when we stop the snap,
# it can leave dpkg in a broken state. We "fix" it with the 2nd command.
pssh "
if [ -d /snap/oracle-cloud-agent ]; then
sudo snap remove oracle-cloud-agent
sudo dpkg --remove --force-remove-reinstreq unified-monitoring-agent
fi"
}
_cmd tailhist "Install history viewer on port 1088"
_cmd_tailhist () {
TAG=$1
need_tag
need_login_password
ARCH=${ARCHITECTURE-amd64}
[ "$ARCH" = "aarch64" ] && ARCH=arm64
# We use "wget -c" here in case the download was aborted
# halfway through and we're actually trying to download it again.
pssh "
set -e
wget https://github.com/joewalnes/websocketd/releases/download/v0.3.0/websocketd-0.3.0-linux_$ARCH.zip
wget -c https://github.com/joewalnes/websocketd/releases/download/v0.3.0/websocketd-0.3.0-linux_$ARCH.zip
unzip websocketd-0.3.0-linux_$ARCH.zip websocketd
sudo mv websocketd /usr/local/bin/websocketd
sudo mkdir -p /tmp/tailhist
@@ -804,25 +1037,9 @@ _cmd_tools() {
sudo apt-get -qy install apache2-utils emacs-nox git httping htop jid joe jq mosh python-setuptools tree unzip
# This is for VMs with broken PRNG (symptom: running docker-compose randomly hangs)
sudo apt-get -qy install haveged
# I don't remember why we need to remove this
sudo apt-get remove -y --purge dnsmasq-base
"
}
_cmd opensg "Open the default security group to ALL ingress traffic"
_cmd_opensg() {
need_infra $1
infra_opensg
}
_cmd disableaddrchecks "Disable source/destination IP address checks"
_cmd_disableaddrchecks() {
TAG=$1
need_tag
infra_disableaddrchecks
}
_cmd pssh "Run an arbitrary command on all nodes"
_cmd_pssh() {
TAG=$1
@@ -864,122 +1081,21 @@ fi
"
}
_cmd quotas "Check our infrastructure quotas (max instances)"
_cmd_quotas() {
need_infra $1
infra_quotas
}
_cmd ssh "Open an SSH session to the first node of a tag"
_cmd_ssh() {
TAG=$1
need_tag
need_login_password
IP=$(head -1 tags/$TAG/ips.txt)
info "Logging into $IP (default password: $USER_PASSWORD)"
ssh $SSHOPTS $USER_LOGIN@$IP
}
_cmd start "Start a group of VMs"
_cmd_start() {
while [ ! -z "$*" ]; do
case "$1" in
--infra) INFRA=$2; shift 2;;
--settings) SETTINGS=$2; shift 2;;
--count) die "Flag --count is deprecated; please use --students instead." ;;
--tag) TAG=$2; shift 2;;
--students) STUDENTS=$2; shift 2;;
*) die "Unrecognized parameter: $1."
esac
done
if [ -z "$INFRA" ]; then
die "Please add --infra flag to specify which infrastructure file to use."
fi
if [ -z "$SETTINGS" ]; then
die "Please add --settings flag to specify which settings file to use."
fi
if [ -z "$COUNT" ]; then
CLUSTERSIZE=$(awk '/^clustersize:/ {print $2}' $SETTINGS)
if [ -z "$STUDENTS" ]; then
warning "Neither --count nor --students was specified."
warning "According to the settings file, the cluster size is $CLUSTERSIZE."
warning "Deploying one cluster of $CLUSTERSIZE nodes."
STUDENTS=1
fi
COUNT=$(($STUDENTS*$CLUSTERSIZE))
fi
# Check that the specified settings and infrastructure are valid.
need_settings $SETTINGS
need_infra $INFRA
if [ -z "$TAG" ]; then
TAG=$(_cmd_maketag)
fi
mkdir -p tags/$TAG
ln -s ../../$INFRA tags/$TAG/infra.sh
ln -s ../../$SETTINGS tags/$TAG/settings.yaml
echo creating > tags/$TAG/status
infra_start $COUNT
sep
info "Successfully created $COUNT instances with tag $TAG"
echo create_ok > tags/$TAG/status
# If the settings.yaml file has a "steps" field,
# automatically execute all the actions listed in that field.
# If an action fails, retry it up to 10 times.
python -c 'if True: # hack to deal with indentation
import sys, yaml
settings = yaml.safe_load(sys.stdin)
print ("\n".join(settings.get("steps", [])))
' < tags/$TAG/settings.yaml \
| while read step; do
if [ -z "$step" ]; then
break
fi
sep "$TAG -> $step"
TRY=1
MAXTRY=10
while ! $0 $step $TAG ; do
TRY=$(($TRY+1))
if [ $TRY -gt $MAXTRY ]; then
error "This step ($step) failed after $MAXTRY attempts."
info "You can troubleshoot the situation manually, or terminate these instances with:"
info "$0 stop $TAG"
die "Giving up."
else
sep
info "Step '$step' failed for '$TAG'. Let's wait 10 seconds and try again."
info "(Attempt $TRY out of $MAXTRY.)"
sleep 10
fi
done
done
sep
info "Deployment successful."
info "To log into the first machine of that batch, you can run:"
info "$0 ssh $TAG"
info "To terminate these instances, you can run:"
info "$0 stop $TAG"
}
_cmd stop "Stop (terminate, shutdown, kill, remove, destroy...) instances"
_cmd_stop() {
TAG=$1
need_tag
infra_stop
echo stopped > tags/$TAG/status
}
_cmd tags "List groups of VMs known locally"
_cmd_tags() {
(
cd tags
echo "[#] [Status] [Tag] [Infra]" \
| awk '{ printf "%-7s %-12s %-25s %-25s\n", $1, $2, $3, $4}'
echo "[#] [Status] [Tag] [Mode] [Provider]"
for tag in *; do
if [ -f $tag/ips.txt ]; then
count="$(wc -l < $tag/ips.txt)"
@@ -991,15 +1107,19 @@ _cmd_tags() {
else
status="?"
fi
if [ -f $tag/infra.sh ]; then
infra="$(basename $(readlink $tag/infra.sh))"
if [ -f $tag/mode ]; then
mode="$(cat $tag/mode)"
else
infra="?"
mode="?"
fi
echo "$count $status $tag $infra" \
| awk '{ printf "%-7s %-12s %-25s %-25s\n", $1, $2, $3, $4}'
if [ -f $tag/provider ]; then
provider="$(cat $tag/provider)"
else
provider="?"
fi
echo "$count $status $tag $mode $provider"
done
)
) | column -t
}
_cmd test "Run tests (pre-flight checks) on a group of VMs"
@@ -1054,21 +1174,28 @@ _cmd_passwords() {
$0 ips "$TAG" | paste "$PASSWORDS_FILE" - | while read password nodes; do
info "Setting password for $nodes..."
for node in $nodes; do
echo docker:$password | ssh $SSHOPTS ubuntu@$node sudo chpasswd
echo $USER_LOGIN:$password | ssh $SSHOPTS -i tags/$TAG/id_rsa ubuntu@$node sudo chpasswd
done
done
info "Done."
}
_cmd wait "Wait until VMs are ready (reachable and cloud init is done)"
_cmd wait "Wait until VMs are ready (reachable, cloud init is done, ubuntu user is up)"
_cmd_wait() {
TAG=$1
need_tag
# Wait until all hosts are reachable.
info "Trying to reach $TAG instances..."
while ! pssh -t 5 true 2>&1 >/dev/null; do
>/dev/stderr echo -n "."
while >/dev/stderr echo -n "."; do
pssh -t 5 true 2>&1 >/dev/null && {
SSH_USER=ubuntu
break
}
pssh -l root -t 5 true 2>&1 >/dev/null && {
SSH_USER=root
break
}
sleep 2
done
>/dev/stderr echo ""
@@ -1076,11 +1203,9 @@ _cmd_wait() {
# If this VM image is using cloud-init,
# wait for cloud-init to be done
info "Waiting for cloud-init to be done on $TAG instances..."
pssh "
pssh -l $SSH_USER "
if [ -d /var/lib/cloud ]; then
while [ ! -f /var/lib/cloud/instance/boot-finished ]; do
sleep 1
done
cloud-init status --wait
fi"
}
@@ -1106,7 +1231,6 @@ _cmd_webssh() {
need_tag
pssh "
sudo apt-get update &&
sudo apt-get install python-tornado python-paramiko -y ||
sudo apt-get install python3-tornado python3-paramiko -y"
pssh "
cd /opt


@@ -0,0 +1,7 @@
version = 2
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true


@@ -16,18 +16,18 @@ pssh() {
}
echo "[parallel-ssh] $@"
export PSSH=$(which pssh || which parallel-ssh)
case "$INFRACLASS" in
hetzner) LOGIN=root ;;
linode) LOGIN=root ;;
*) LOGIN=ubuntu ;;
esac
# There are some routers that really struggle with the number of TCP
# connections that we open when deploying large fleets of clusters.
# We're adding a 1-second delay here, but it can be cranked up if
# necessary - or down to zero, too.
sleep ${PSSH_DELAY_PRE-1}
$PSSH -h $HOSTFILE -l $LOGIN \
--par 100 \
$(which pssh || which parallel-ssh) -h $HOSTFILE -l ubuntu \
--par ${PSSH_PARALLEL_CONNECTIONS-100} \
--timeout 300 \
-O LogLevel=ERROR \
-O IdentityFile=tags/$TAG/id_rsa \
-O UserKnownHostsFile=/dev/null \
-O StrictHostKeyChecking=no \
-O ForwardAgent=yes \
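The `${PSSH_DELAY_PRE-1}` and `${PSSH_PARALLEL_CONNECTIONS-100}` expansions above use POSIX default-value substitution; a quick standalone illustration of the idiom:

```shell
# ${VAR-default} expands to "default" only when VAR is unset;
# a variable that is set (even to 0 or the empty string) is used as-is.
unset DELAY
echo "${DELAY-1}"      # unset: falls back to 1
DELAY=0
echo "${DELAY-1}"      # set to 0: the 0 wins (delay effectively disabled)
DELAY=
echo "${DELAY-1}"      # set but empty: expands to the empty string, not 1
```

Note this is `${VAR-default}`, not `${VAR:-default}`: the colon form would also substitute the default when the variable is set but empty.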


@@ -0,0 +1,21 @@
CLUSTERSIZE=3
CLUSTERPREFIX=kubenet
CLUSTERNUMBER=100
USER_LOGIN=k8s
USER_PASSWORD=training
STEPS="
wait
standardize
clusterize
tools
docker
createuser
webssh
tailhist
kubebins
kubetools
ips
"


@@ -0,0 +1,21 @@
CLUSTERSIZE=3
CLUSTERPREFIX=kuberouter
CLUSTERNUMBER=200
USER_LOGIN=k8s
USER_PASSWORD=training
STEPS="
wait
standardize
clusterize
tools
docker
createuser
webssh
tailhist
kubebins
kubetools
ips
"


@@ -0,0 +1,26 @@
CLUSTERSIZE=1
CLUSTERPREFIX=monokube
# We're sticking to this old version in the first DMUC lab,
# because it still works with Docker, and doesn't
# require a ServiceAccount signing key.
KUBEVERSION=1.19.11
USER_LOGIN=k8s
USER_PASSWORD=training
STEPS="
wait
standardize
clusterize
tools
docker
disabledocker
createuser
webssh
tailhist
kubebins
kubetools
ips
"


@@ -0,0 +1,25 @@
CLUSTERSIZE=3
CLUSTERPREFIX=oldversion
USER_LOGIN=k8s
USER_PASSWORD=training
# For a list of old versions, check:
# https://kubernetes.io/releases/patch-releases/#non-active-branch-history
KUBEVERSION=1.24.14
STEPS="
wait
standardize
clusterize
tools
docker
createuser
webssh
tailhist
kubepkgs
kubeadm
kubetools
kubetest
"


@@ -0,0 +1,20 @@
CLUSTERSIZE=3
CLUSTERPREFIX=polykube
USER_LOGIN=k8s
USER_PASSWORD=training
STEPS="
wait
standardize
clusterize
tools
kubepkgs
kubebins
createuser
webssh
tailhist
kubetools
ips
"


@@ -0,0 +1,21 @@
CLUSTERSIZE=3
CLUSTERPREFIX=test
USER_LOGIN=k8s
USER_PASSWORD=training
STEPS="
wait
standardize
clusterize
tools
docker
createuser
webssh
tailhist
kubepkgs
kubeadm
kubetools
kubetest
"


@@ -0,0 +1,19 @@
CLUSTERSIZE=1
CLUSTERPREFIX=moby
USER_LOGIN=docker
USER_PASSWORD=training
STEPS="
wait
standardize
clusterize
tools
docker
createuser
webssh
tailhist
cards
ips
"


@@ -0,0 +1,21 @@
CLUSTERSIZE=4
CLUSTERPREFIX=node
USER_LOGIN=k8s
USER_PASSWORD=training
STEPS="
wait
standardize
clusterize
tools
docker
createuser
webssh
tailhist
kubepkgs
kubeadm
kubetools
kubetest
"


@@ -0,0 +1,22 @@
CLUSTERSIZE=10
export TF_VAR_node_size=GP1.M
CLUSTERPREFIX=node
USER_LOGIN=k8s
USER_PASSWORD=training
STEPS="
wait
standardize
clusterize
tools
docker
createuser
webssh
tailhist
kubepkgs
kubeadm
kubetools
kubetest
"


@@ -0,0 +1,6 @@
CLUSTERSIZE=2
USER_LOGIN=k8s
USER_PASSWORD=
STEPS="stage2"


@@ -0,0 +1,19 @@
#export TF_VAR_node_size=GP2.4
#export TF_VAR_node_size=g6-standard-6
CLUSTERSIZE=1
CLUSTERPREFIX=CHANGEME
USER_LOGIN=portal
USER_PASSWORD=CHANGEME
STEPS="
wait
standardize
clusterize
tools
docker
createuser
ips
"


@@ -0,0 +1,40 @@
#!/bin/sh
set -e
PREFIX=$(date +%Y-%m-%d-%H-%M)
PROVIDER=openstack/enix # aws also works
STUDENTS=2
#export TF_VAR_location=eu-north-1
export TF_VAR_node_size=S
SETTINGS=admin-monokube
TAG=$PREFIX-$SETTINGS
./labctl create \
--tag $TAG \
--provider $PROVIDER \
--settings settings/$SETTINGS.env \
--students $STUDENTS
SETTINGS=admin-polykube
TAG=$PREFIX-$SETTINGS
./labctl create \
--tag $TAG \
--provider $PROVIDER \
--settings settings/$SETTINGS.env \
--students $STUDENTS
SETTINGS=admin-oldversion
TAG=$PREFIX-$SETTINGS
./labctl create \
--tag $TAG \
--provider $PROVIDER \
--settings settings/$SETTINGS.env \
--students $STUDENTS
SETTINGS=admin-test
TAG=$PREFIX-$SETTINGS
./labctl create \
--tag $TAG \
--provider $PROVIDER \
--settings settings/$SETTINGS.env \
--students $STUDENTS

prepare-labs/tags Symbolic link

@@ -0,0 +1 @@
terraform/tags




@@ -0,0 +1,4 @@
#!/bin/sh
az account list-locations -o table \
--query "sort_by([?metadata.regionType == 'Physical'], &regionalDisplayName)[]
.{ displayName: displayName, regionalDisplayName: regionalDisplayName }"


@@ -0,0 +1,2 @@
#!/bin/sh
civo region ls


@@ -0,0 +1,2 @@
#!/bin/sh
exo zone


@@ -8,8 +8,10 @@ resource "random_string" "_" {
resource "time_static" "_" {}
locals {
timestamp = formatdate("YYYY-MM-DD-hh-mm", time_static._.rfc3339)
tag = random_string._.result
min_nodes_per_pool = var.nodes_per_cluster
max_nodes_per_pool = var.nodes_per_cluster * 2
timestamp = formatdate("YYYY-MM-DD-hh-mm", time_static._.rfc3339)
tag = random_string._.result
# Common tags to be assigned to all resources
common_tags = [
"created-by-terraform",


@@ -1,10 +1,9 @@
module "clusters" {
source = "./modules/PROVIDER"
source = "./one-kubernetes-module"
for_each = local.clusters
cluster_name = each.value.cluster_name
min_nodes_per_pool = var.min_nodes_per_pool
max_nodes_per_pool = var.max_nodes_per_pool
enable_arm_pool = var.enable_arm_pool
min_nodes_per_pool = local.min_nodes_per_pool
max_nodes_per_pool = local.max_nodes_per_pool
node_size = var.node_size
common_tags = local.common_tags
location = each.value.location
@@ -63,7 +62,7 @@ resource "null_resource" "wait_for_nodes" {
}
command = <<-EOT
while sleep 1; do
kubectl get nodes --watch | grep --silent --line-buffered . &&
kubectl get nodes -o name | grep --silent . &&
kubectl wait node --for=condition=Ready --all --timeout=10m &&
break
done


@@ -0,0 +1 @@
one-kubernetes-config/config.tf


@@ -0,0 +1,3 @@
This directory should contain a config.tf file, even if it's empty.
(Because if the file doesn't exist, then the Terraform configuration
in the parent directory will fail.)


@@ -0,0 +1,8 @@
This directory should contain a copy of one of the "one-kubernetes" modules.
For instance, when located in this directory, you can do:
cp ../../one-kubernetes/linode/* .
Then, move the config.tf file to ../one-kubernetes-config:
mv config.tf ../one-kubernetes-config


@@ -0,0 +1 @@
one-kubernetes-module/provider.tf


@@ -0,0 +1,3 @@
terraform {
required_version = ">= 1.4"
}


@@ -90,7 +90,6 @@ resource "kubernetes_service" "shpod_${index}" {
name = "ssh"
port = 22
target_port = 22
node_port = 32222
}
type = "NodePort"
}
@@ -222,7 +221,10 @@ output "ip_addresses_of_nodes" {
value = join("\n", [
%{ for index, cluster in clusters ~}
join("\t", concat(
[ random_string.shpod_${index}.result, "ssh -l k8s -p 32222" ],
[
random_string.shpod_${index}.result,
"ssh -l k8s -p $${kubernetes_service.shpod_${index}.spec[0].port[0].node_port}"
],
split(" ", file("./externalips.${index}"))
)),
%{ endfor ~}


@@ -0,0 +1,28 @@
variable "tag" {
type = string
}
variable "how_many_clusters" {
type = number
default = 2
}
variable "nodes_per_cluster" {
type = number
default = 2
}
variable "node_size" {
type = string
default = "M"
}
variable "location" {
type = string
default = null
}
# TODO: perhaps handle space-separated lists in addition to newline-separated?
locals {
locations = var.location == null ? [null] : split("\n", var.location)
}
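One way to address that TODO (a sketch, not tested against the rest of this configuration): normalize newlines to spaces before splitting, and drop the empty entries that stray whitespace would produce:

```hcl
# Hypothetical variant accepting newline- or space-separated locations.
# compact() removes the empty strings left by consecutive separators.
locals {
  locations = var.location == null ? [null] : compact(split(" ", replace(var.location, "\n", " ")))
}
```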


@@ -0,0 +1 @@
../common.tf


@@ -0,0 +1 @@
../../providers/aws/config.tf


@@ -0,0 +1,87 @@
# Taken from:
# https://github.com/hashicorp/learn-terraform-provision-eks-cluster/blob/main/main.tf
data "aws_availability_zones" "available" {}
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "3.19.0"
name = var.cluster_name
cidr = "10.0.0.0/16"
azs = slice(data.aws_availability_zones.available.names, 0, 3)
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]
enable_nat_gateway = true
single_nat_gateway = true
enable_dns_hostnames = true
public_subnet_tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
"kubernetes.io/role/elb" = 1
}
private_subnet_tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
"kubernetes.io/role/internal-elb" = 1
}
}
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "19.5.1"
cluster_name = var.cluster_name
cluster_version = "1.24"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
cluster_endpoint_public_access = true
eks_managed_node_group_defaults = {
ami_type = "AL2_x86_64"
}
eks_managed_node_groups = {
one = {
name = "node-group-one"
instance_types = [local.node_size]
min_size = var.min_nodes_per_pool
max_size = var.max_nodes_per_pool
desired_size = var.min_nodes_per_pool
}
}
}
# https://aws.amazon.com/blogs/containers/amazon-ebs-csi-driver-is-now-generally-available-in-amazon-eks-add-ons/
data "aws_iam_policy" "ebs_csi_policy" {
arn = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
}
module "irsa-ebs-csi" {
source = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
version = "4.7.0"
create_role = true
role_name = "AmazonEKSTFEBSCSIRole-${module.eks.cluster_name}"
provider_url = module.eks.oidc_provider
role_policy_arns = [data.aws_iam_policy.ebs_csi_policy.arn]
oidc_fully_qualified_subjects = ["system:serviceaccount:kube-system:ebs-csi-controller-sa"]
}
resource "aws_eks_addon" "ebs-csi" {
cluster_name = module.eks.cluster_name
addon_name = "aws-ebs-csi-driver"
addon_version = "v1.5.2-eksbuild.1"
service_account_role_arn = module.irsa-ebs-csi.iam_role_arn
tags = {
"eks_addon" = "ebs-csi"
"terraform" = "true"
}
}


@@ -0,0 +1,44 @@
output "cluster_id" {
value = module.eks.cluster_arn
}
output "has_metrics_server" {
value = false
}
output "kubeconfig" {
sensitive = true
value = yamlencode({
apiVersion = "v1"
kind = "Config"
clusters = [{
name = var.cluster_name
cluster = {
certificate-authority-data = module.eks.cluster_certificate_authority_data
server = module.eks.cluster_endpoint
}
}]
contexts = [{
name = var.cluster_name
context = {
cluster = var.cluster_name
user = var.cluster_name
}
}]
users = [{
name = var.cluster_name
user = {
exec = {
apiVersion = "client.authentication.k8s.io/v1beta1"
command = "aws"
args = ["eks", "get-token", "--cluster-name", var.cluster_name]
}
}
}]
current-context = var.cluster_name
})
}
data "aws_eks_cluster_auth" "_" {
name = module.eks.cluster_name
}


@@ -0,0 +1,8 @@
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 4.47.0"
}
}
}


@@ -0,0 +1 @@
../../providers/aws/variables.tf


@@ -0,0 +1 @@
../common.tf


@@ -0,0 +1 @@
../../providers/azure/config.tf


@@ -0,0 +1,22 @@
resource "azurerm_resource_group" "_" {
name = var.cluster_name
location = var.location
}
resource "azurerm_kubernetes_cluster" "_" {
name = var.cluster_name
location = var.location
dns_prefix = var.cluster_name
identity {
type = "SystemAssigned"
}
resource_group_name = azurerm_resource_group._.name
default_node_pool {
name = "x86"
node_count = var.min_nodes_per_pool
min_count = var.min_nodes_per_pool
max_count = var.max_nodes_per_pool
vm_size = local.node_size
enable_auto_scaling = true
}
}


@@ -0,0 +1,12 @@
output "cluster_id" {
value = azurerm_kubernetes_cluster._.id
}
output "has_metrics_server" {
value = true
}
output "kubeconfig" {
value = azurerm_kubernetes_cluster._.kube_config_raw
sensitive = true
}


@@ -0,0 +1,7 @@
terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm"
}
}
}


@@ -0,0 +1 @@
../../providers/azure/variables.tf


@@ -0,0 +1 @@
../common.tf


@@ -0,0 +1 @@
../../providers/civo/config.tf


@@ -0,0 +1,17 @@
# As of March 2023, the default type ("k3s") only supports up
# to Kubernetes 1.23, which belongs to a museum.
# So let's use Talos, which supports up to 1.25.
resource "civo_kubernetes_cluster" "_" {
name = var.cluster_name
firewall_id = civo_firewall._.id
cluster_type = "talos"
pools {
size = local.node_size
node_count = var.min_nodes_per_pool
}
}
resource "civo_firewall" "_" {
name = var.cluster_name
}


@@ -0,0 +1,12 @@
output "cluster_id" {
value = civo_kubernetes_cluster._.id
}
output "has_metrics_server" {
value = false
}
output "kubeconfig" {
value = civo_kubernetes_cluster._.kubeconfig
sensitive = true
}


@@ -0,0 +1,7 @@
terraform {
required_providers {
civo = {
source = "civo/civo"
}
}
}


@@ -0,0 +1 @@
../../providers/civo/variables.tf


@@ -0,0 +1,28 @@
variable "cluster_name" {
type = string
default = "deployed-with-terraform"
}
variable "common_tags" {
type = list(string)
default = []
}
variable "node_size" {
type = string
default = "M"
}
variable "min_nodes_per_pool" {
type = number
default = 2
}
variable "max_nodes_per_pool" {
type = number
default = 4
}
locals {
node_size = lookup(var.node_sizes, var.node_size, var.node_size)
}
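The `lookup()` above expects each provider to declare a `node_sizes` map (not shown in this hunk); passing `var.node_size` itself as the fallback means an unknown key falls through unchanged, so users can give a raw instance type instead of a T-shirt size. A hypothetical Linode-style mapping:

```hcl
# Hypothetical per-provider mapping from T-shirt sizes to instance types.
# lookup(var.node_sizes, "M", "M")                         → "g6-standard-4"
# lookup(var.node_sizes, "g6-dedicated-8", "g6-dedicated-8") → "g6-dedicated-8"
variable "node_sizes" {
  type = map(string)
  default = {
    S = "g6-standard-2"
    M = "g6-standard-4"
    L = "g6-standard-6"
  }
}
```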


@@ -0,0 +1 @@
../common.tf


@@ -0,0 +1 @@
../../providers/digitalocean/config.tf


@@ -3,15 +3,18 @@ resource "digitalocean_kubernetes_cluster" "_" {
tags = var.common_tags
# Region is mandatory, so let's provide a default value.
region = var.location != null ? var.location : "nyc1"
version = var.k8s_version
version = data.digitalocean_kubernetes_versions._.latest_version
node_pool {
name = "x86"
tags = var.common_tags
size = local.node_type
auto_scale = true
size = local.node_size
auto_scale = var.max_nodes_per_pool > var.min_nodes_per_pool
min_nodes = var.min_nodes_per_pool
max_nodes = max(var.min_nodes_per_pool, var.max_nodes_per_pool)
}
}
data "digitalocean_kubernetes_versions" "_" {
}


@@ -1,7 +1,3 @@
output "kubeconfig" {
value = digitalocean_kubernetes_cluster._.kube_config.0.raw_config
}
output "cluster_id" {
value = digitalocean_kubernetes_cluster._.id
}
@@ -9,3 +5,8 @@ output "cluster_id" {
output "has_metrics_server" {
value = false
}
output "kubeconfig" {
value = digitalocean_kubernetes_cluster._.kube_config.0.raw_config
sensitive = true
}


@@ -0,0 +1 @@
../../providers/digitalocean/variables.tf


@@ -0,0 +1 @@
../common.tf


@@ -0,0 +1 @@
../../providers/exoscale/config.tf


@@ -0,0 +1,20 @@
resource "exoscale_sks_cluster" "_" {
zone = var.location
name = var.cluster_name
service_level = "starter"
}
resource "exoscale_sks_nodepool" "_" {
cluster_id = exoscale_sks_cluster._.id
zone = exoscale_sks_cluster._.zone
name = var.cluster_name
instance_type = local.node_size
size = var.min_nodes_per_pool
}
resource "exoscale_sks_kubeconfig" "_" {
cluster_id = exoscale_sks_cluster._.id
zone = exoscale_sks_cluster._.zone
user = "kubernetes-admin"
groups = ["system:masters"]
}


@@ -0,0 +1,12 @@
output "cluster_id" {
value = exoscale_sks_cluster._.id
}
output "has_metrics_server" {
value = true
}
output "kubeconfig" {
value = exoscale_sks_kubeconfig._.kubeconfig
sensitive = true
}


@@ -0,0 +1,7 @@
terraform {
required_providers {
exoscale = {
source = "exoscale/exoscale"
}
}
}


@@ -0,0 +1 @@
../../providers/exoscale/variables.tf


@@ -0,0 +1 @@
../common.tf


@@ -0,0 +1 @@
../../providers/googlecloud/config.tf


@@ -0,0 +1,12 @@
locals {
location = var.location != null ? var.location : "europe-north1-a"
region = replace(local.location, "/-[a-z]$/", "")
# Unfortunately, the following line doesn't work
# (that attribute just returns an empty string)
# so we have to hard-code the project name.
#project = data.google_client_config._.project
project = "prepare-tf"
}
data "google_client_config" "_" {}
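The `replace()` call above relies on Terraform's `/.../` pattern syntax, which switches `replace()` into regex mode, to strip a trailing `-<letter>` zone suffix and recover the region. For example:

```hcl
# "europe-north1-a" → "europe-north1"; an input that is already a region
# (no trailing zone letter) passes through unchanged.
locals {
  example_region = replace("europe-north1-a", "/-[a-z]$/", "")
}
```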


@@ -1,8 +1,8 @@
resource "google_container_cluster" "_" {
name = var.cluster_name
project = local.project
location = local.location
min_master_version = var.k8s_version
name = var.cluster_name
project = local.project
location = local.location
#min_master_version = var.k8s_version
# To deploy private clusters, uncomment the section below,
# and uncomment the block in network.tf.
@@ -43,12 +43,12 @@ resource "google_container_cluster" "_" {
name = "x86"
node_config {
tags = var.common_tags
machine_type = local.node_type
machine_type = local.node_size
}
initial_node_count = var.min_nodes_per_pool
autoscaling {
min_node_count = var.min_nodes_per_pool
max_node_count = max(var.min_nodes_per_pool, var.max_nodes_per_pool)
max_node_count = var.max_nodes_per_pool
}
}
@@ -62,4 +62,3 @@ resource "google_container_cluster" "_" {
}
}
}


@@ -1,7 +1,14 @@
data "google_client_config" "_" {}
output "cluster_id" {
value = google_container_cluster._.id
}
output "has_metrics_server" {
value = true
}
output "kubeconfig" {
value = <<-EOT
sensitive = true
value = <<-EOT
apiVersion: v1
kind: Config
current-context: ${google_container_cluster._.name}
@@ -25,11 +32,3 @@ output "kubeconfig" {
token: ${data.google_client_config._.access_token}
EOT
}
output "cluster_id" {
value = google_container_cluster._.id
}
output "has_metrics_server" {
value = true
}


@@ -0,0 +1 @@
../../providers/googlecloud/variables.tf


@@ -0,0 +1 @@
../common.tf


@@ -0,0 +1 @@
../../providers/linode/config.tf


@@ -3,10 +3,10 @@ resource "linode_lke_cluster" "_" {
tags = var.common_tags
# "region" is mandatory, so let's provide a default value if none was given.
region = var.location != null ? var.location : "eu-central"
k8s_version = local.k8s_version
k8s_version = data.linode_lke_versions._.versions[0].id
pool {
type = local.node_type
type = local.node_size
count = var.min_nodes_per_pool
autoscaler {
min = var.min_nodes_per_pool
@@ -15,3 +15,9 @@ resource "linode_lke_cluster" "_" {
}
}
data "linode_lke_versions" "_" {
}
# FIXME: sort the versions to be sure that we get the most recent one?
# (We don't know in which order they are returned by the provider.)


@@ -1,7 +1,3 @@
output "kubeconfig" {
value = base64decode(linode_lke_cluster._.kubeconfig)
}
output "cluster_id" {
value = linode_lke_cluster._.id
}
@@ -9,3 +5,8 @@ output "cluster_id" {
output "has_metrics_server" {
value = false
}
output "kubeconfig" {
value = base64decode(linode_lke_cluster._.kubeconfig)
sensitive = true
}


@@ -2,7 +2,7 @@ terraform {
required_providers {
linode = {
source = "linode/linode"
version = "1.22.0"
version = "1.30.0"
}
}
}


@@ -0,0 +1 @@
../../providers/linode/variables.tf


@@ -0,0 +1 @@
../common.tf

Some files were not shown because too many files have changed in this diff.