Compare commits

..

87 Commits

Author SHA1 Message Date
Jérôme Petazzoni
8c62ba7b28 🏖️ Highfive May 2025 2025-06-13 08:52:05 +02:00
Jérôme Petazzoni
71ee3012fb Add DMUC advanced exercises 2025-06-13 08:49:59 +02:00
Jérôme Petazzoni
5ed12d6631 🔧 Tweak backup chapter 2025-06-13 08:49:59 +02:00
Jérôme Petazzoni
839b50a7a6 📃 Update chapter on static pods 2025-06-13 08:49:59 +02:00
Jérôme Petazzoni
e0fdbfdb50 📃 Update control plane auth section 2025-06-13 08:49:59 +02:00
Jérôme Petazzoni
d9f53288f2 🔒️ Update section on user key and cert generation 2025-06-13 08:49:59 +02:00
Jérôme Petazzoni
697e9cf9f7 🔗 Links to docs and blog posts about ephemeral storage isolation 2025-06-13 08:49:59 +02:00
Jérôme Petazzoni
6b06fa2b35 🔗 Update Kyverno doc links 2025-06-13 08:49:59 +02:00
Jérôme Petazzoni
240b2a24e2 🐞 Typo fix 2025-06-13 08:49:59 +02:00
Hiranyey Gajbhiye
4bc97aa1b8 Update concepts-k8s.md
Fixed spelling mistake if it was unintentional
2025-06-13 08:49:59 +02:00
Jérôme Petazzoni
798dc2216c 📃 Clarify what needs to be scaled up in healthcheck lab 2025-06-13 08:49:59 +02:00
Jérôme Petazzoni
5117b27386 🔧 Tweak portal VM size to use GP4 (GP2 is deprecated) 2025-06-13 08:49:59 +02:00
Jérôme Petazzoni
d2f736a850 📍 Pin express version in webui 2025-06-13 08:49:59 +02:00
Jérôme Petazzoni
01c374d0a4 Merge pull request #664 from lpiot/main
The missing slides…😅
2025-06-13 08:48:44 +02:00
Ludovic Piot
eee44979c5 📝 Add Kyverno install chapter 2025-06-12 22:13:19 +02:00
Ludovic Piot
4d3bc06e30 📝 Add Kyverno install chapter 2025-06-12 21:50:42 +02:00
Ludovic Piot
229ab045b3 🔥 2025-06-12 21:04:06 +02:00
Ludovic Piot
fe1a61eaeb 🎨 2025-06-12 21:03:49 +02:00
Ludovic Piot
9613589dea 📝 Add small section about SSH keypairs rotation for Flux 2025-06-12 20:23:59 +02:00
Ludovic Piot
ca8865a10b 📝 Change the mermaid scenario diagram 2025-06-12 20:07:11 +02:00
Ludovic Piot
f279bbea11 ✏️ 2025-06-12 20:06:27 +02:00
Ludovic Piot
bc6100301e 📝 Add monitoring stack install 2025-06-12 20:05:14 +02:00
Jérôme Petazzoni
a32751636a Merge pull request #663 from lpiot/main
The deck with a small fix
2025-06-11 20:33:27 +02:00
Ludovic Piot
4a0e23d131 🐛 Sorry Jerome 2025-06-11 19:59:52 +02:00
Ludovic Piot
6e987d1fca Merge branch 'm6' into main 2025-06-11 19:52:03 +02:00
Ludovic Piot
18b888009e 📝 Add an MVP Network policies section 2025-06-11 19:44:17 +02:00
Ludovic Piot
36dd8bb695 📝 Add the new chapters to the M6 stack 2025-06-11 19:33:35 +02:00
Ludovic Piot
395c5a38ab 🎨 Add reference to the chapter title 2025-06-11 19:24:57 +02:00
Ludovic Piot
2b0d3b87ac 📝 Add OpenEBS install chapter 2025-06-11 19:24:13 +02:00
Ludovic Piot
a165e60407 📝 Add k0s install chapter 2025-06-11 19:22:40 +02:00
Ludovic Piot
3c13fd51dd 🎨 Add Mario animation when Flux reconcile 2025-06-11 19:22:04 +02:00
Ludovic Piot
324ad2fdd0 🎨 Update mermaid scenario diagram 2025-06-11 19:21:13 +02:00
Ludovic Piot
269ae79e30 📝 Add k0s install chapter 2025-06-11 17:08:52 +02:00
Ludovic Piot
39a15b3d7d ✏️ Clean up consistency about how we evoke the OPS team 2025-06-11 17:08:52 +02:00
Ludovic Piot
9e7ed8cb49 📝 Add MOVY tenant creation chapter 2025-06-11 17:08:52 +02:00
Ludovic Piot
06e7a47659 📝 Upgrade the mermaid scenario 2025-06-11 17:08:52 +02:00
Ludovic Piot
802e525f57 📝 Add Ingress chapter 2025-06-11 17:08:52 +02:00
Ludovic Piot
0f68f89840 📝 Add Ingress chapter 2025-06-11 17:08:52 +02:00
Ludovic Piot
b275342bd2 ✏️ Fixing TEST emphasis 2025-06-11 17:08:52 +02:00
Ludovic Piot
e11e97ccff 📝 Add k0s install chapter 2025-06-11 15:10:43 +02:00
Ludovic Piot
023a9d0346 ✏️ Clean up consistency about how we evoke the OPS team 2025-06-10 19:20:25 +02:00
Ludovic Piot
3f5eaae6b9 📝 Add MOVY tenant creation chapter 2025-06-10 19:19:19 +02:00
Ludovic Piot
1634d5b5bc 📝 Upgrade the mermaid scenario 2025-06-10 17:15:38 +02:00
Ludovic Piot
40418be55a 📝 Add Ingress chapter 2025-06-10 16:19:06 +02:00
Ludovic Piot
04198b7f91 📝 Add Ingress chapter 2025-06-10 16:05:17 +02:00
Jérôme Petazzoni
150c8fc768 Merge pull request #660 from lpiot/main
Mostly the scenario upgrade with Mermaid schemas
2025-06-10 14:24:18 +02:00
Ludovic Piot
e2af1bb057 ✏️ Fixing TEST emphasis 2025-06-10 12:51:09 +02:00
Ludovic Piot
d4c260aa4a 💄 📝 🎨 Upgrade the mermaid scenario schema 2025-06-09 21:20:57 +02:00
Ludovic Piot
89cd677b09 📝 upgrade R01 chapter 2025-06-09 21:20:57 +02:00
Ludovic Piot
3008680c12 🛂 🐛 fix permissions for persistentVolumes management 2025-06-09 21:20:57 +02:00
Ludovic Piot
f7b8184617 🎨 2025-06-09 21:20:57 +02:00
Jérôme Petazzoni
a565c0979c Merge pull request #659 from lpiot/main
Add R01 chapter and fixes to previous chapters
2025-06-09 20:05:55 +02:00
Jérôme Petazzoni
7a11f03b5e Merge branch 'm6' into main 2025-06-09 20:05:26 +02:00
Ludovic Piot
b0760b99a5 ✏️ 📝 Fix shpod access methods 2025-06-09 17:11:57 +02:00
Ludovic Piot
bcb9c3003f 📝 Add R01 chapter about test-ROCKY tenant config 2025-06-09 17:10:35 +02:00
Ludovic Piot
99ce9b3a8a 🎨 📝 Add missing steps in demo 2025-06-09 16:09:45 +02:00
Ludovic Piot
0ba602b533 🎨 clean up code display 2025-06-09 16:08:58 +02:00
Jérôme Petazzoni
d43c41e11e Proof-read first half of M6-START 2025-06-09 14:46:13 +02:00
Ludovic Piot
331309dc63 🎨 cleanup display of some console results 2025-06-09 14:11:05 +02:00
Ludovic Piot
44146915e0 📝 🍱 add T03 chapter 2025-06-04 23:55:33 +02:00
Ludovic Piot
84996e739b 🍱 📝 rewording and updating pics 2025-06-04 23:54:51 +02:00
Ludovic Piot
2aea1f70b2 📝 Add Flux install 2025-05-29 18:00:18 +02:00
Ludovic Piot
985e2ae42c 📝 add M6 intro slidedeck 2025-05-29 12:25:57 +02:00
Ludovic Piot
ea58428a0c 🐛 Slides now generate! ♻️ Move a slide 2025-05-14 22:05:59 +02:00
Ludovic Piot
59e60786c0 🎨 make personnae and cluster names consistent 2025-05-14 21:49:09 +02:00
Ludovic Piot
af63cf1405 🚨 2025-05-14 21:25:59 +02:00
Ludovic Piot
f9041807f6 🎉 first M6 draft slidedeck 2025-05-14 20:52:32 +02:00
Jérôme Petazzoni
785d704726 🏭️ Rework Kyverno chapter 2025-05-11 18:34:11 +02:00
Jérôme Petazzoni
cd346ecace 📃 Update slides about k8s setup 2025-05-07 22:33:30 +02:00
Jérôme Petazzoni
4de3c303a6 🐞 Don't query when overwriting partial zip download
Thanks @swacquie for that one
2025-05-05 19:04:52 +02:00
Jérôme Petazzoni
121713a6c7 🔧 Tweak devcontainer configuration 2025-05-02 19:43:45 +02:00
Jérôme Petazzoni
4431cfe68a 📦️ Add devcontainer
This is still highly experimental, but hopefully it'll
let us go through the beginning of the class with
github codespaces.
2025-05-02 13:04:14 +02:00
Jérôme Petazzoni
dcf218dbe2 🐞 Fix webssh python version 2025-04-28 10:07:55 +02:00
Jérôme Petazzoni
43ff815d9f 🐞 Fix tabs in logins.jsonl 2025-04-27 14:03:02 +02:00
Jérôme Petazzoni
92e61ef83b ☁️ Add nano instances for scaleway konk usecase 2025-04-27 12:53:41 +02:00
Jérôme Petazzoni
45770cc584 Add monokube exercise 2025-03-25 17:35:01 -05:00
Jérôme Petazzoni
58700396f9 🐞 Fix permissions for injected kubeconfig in mk8s stage2 2025-03-23 18:27:31 -05:00
Jérôme Petazzoni
8783da014c 🐞 Handle dualstack nodes (with multiple ExternalIP) 2025-03-23 18:15:50 -05:00
Jérôme Petazzoni
f780100217 Add kuik and a blue green exercise 2025-03-22 18:46:55 -05:00
Jérôme Petazzoni
555cd058bb 🔗 Fix source link in API deep dive 2025-03-22 18:07:18 -05:00
Jérôme Petazzoni
a05d1f9d4f ♻️ Use a variable for proxmox VM storage 2025-02-17 18:38:18 +01:00
Jérôme Petazzoni
84365d03c6 🔧 Add tags to Proxmox VMs; use linked clones by default 2025-02-17 17:28:53 +00:00
Jérôme Petazzoni
164bc01388 🛜 code-server will now also listen on IPv6 2025-02-17 17:28:01 +00:00
Jérôme Petazzoni
c07116bd29 ♻️ Update etcdctl snapshot commands; mention auger 2025-02-17 18:26:34 +01:00
Jérôme Petazzoni
c4057f9c35 🔧 Minor update to Kyverno chapter and manifests 2025-02-17 14:46:07 +01:00
Jérôme Petazzoni
f57bd9a072 Bump code server version 2025-02-17 12:55:24 +01:00
Jérôme Petazzoni
fca6396540 🐞 Fix Flux link ref 2025-02-12 11:01:00 +01:00
92 changed files with 4993 additions and 534 deletions

View File

@@ -0,0 +1,26 @@
{
    "name": "container.training environment to get started with Docker and/or Kubernetes",
    "image": "ghcr.io/jpetazzo/shpod",
    "features": {
        //"ghcr.io/devcontainers/features/common-utils:2": {}
    },
    // Use 'forwardPorts' to make a list of ports inside the container available locally.
    "forwardPorts": [],
    //"postCreateCommand": "... install extra packages...",
    "postStartCommand": "dind.sh",
    // This lets us use "docker-outside-docker".
    // Unfortunately, minikube, kind, etc. don't work very well that way;
    // so for now, we'll likely use "docker-in-docker" instead (with a
    // privileged container). But we're still exposing that socket in case
    // someone wants to do something interesting with it.
    "mounts": ["source=/var/run/docker.sock,target=/var/run/docker-host.sock,type=bind"],
    // This is for docker-in-docker.
    "privileged": true,
    // Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
    "remoteUser": "k8s"
}

View File

@@ -1,5 +1,5 @@
FROM node:4-slim
RUN npm install express
RUN npm install express@4
RUN npm install redis@3
COPY files/ /files/
COPY webui.js /

View File

@@ -0,0 +1,9 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  use-forwarded-headers: "true"
  compute-full-forwarded-for: "true"
  use-proxy-protocol: "true"

View File

@@ -0,0 +1,10 @@
apiVersion: v1
kind: Namespace
metadata:
  labels:
    app.kubernetes.io/instance: flux-system
    app.kubernetes.io/part-of: flux
    app.kubernetes.io/version: v2.5.1
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
  name: ingress-nginx

View File

@@ -0,0 +1,12 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- M6-ingress-nginx-components.yaml
- sync.yaml
patches:
- path: M6-ingress-nginx-cm-patch.yaml
  target:
    kind: ConfigMap
- path: M6-ingress-nginx-svc-patch.yaml
  target:
    kind: Service

View File

@@ -0,0 +1,8 @@
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
  annotations:
    service.beta.kubernetes.io/scw-loadbalancer-proxy-protocol-v2: "true"
    service.beta.kubernetes.io/scw-loadbalancer-use-hostname: "true"

View File

@@ -0,0 +1,10 @@
apiVersion: v1
kind: Namespace
metadata:
  labels:
    app.kubernetes.io/instance: flux-system
    app.kubernetes.io/part-of: flux
    app.kubernetes.io/version: v2.5.1
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
  name: kyverno

View File

@@ -0,0 +1,72 @@
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: flux-multi-tenancy
spec:
  validationFailureAction: enforce
  rules:
    - name: serviceAccountName
      exclude:
        resources:
          namespaces:
            - flux-system
      match:
        resources:
          kinds:
            - Kustomization
            - HelmRelease
      validate:
        message: ".spec.serviceAccountName is required"
        pattern:
          spec:
            serviceAccountName: "?*"
    - name: kustomizationSourceRefNamespace
      exclude:
        resources:
          namespaces:
            - flux-system
            - ingress-nginx
            - kyverno
            - monitoring
            - openebs
      match:
        resources:
          kinds:
            - Kustomization
      preconditions:
        any:
          - key: "{{request.object.spec.sourceRef.namespace}}"
            operator: NotEquals
            value: ""
      validate:
        message: "spec.sourceRef.namespace must be the same as metadata.namespace"
        deny:
          conditions:
            - key: "{{request.object.spec.sourceRef.namespace}}"
              operator: NotEquals
              value: "{{request.object.metadata.namespace}}"
    - name: helmReleaseSourceRefNamespace
      exclude:
        resources:
          namespaces:
            - flux-system
            - ingress-nginx
            - kyverno
            - monitoring
            - openebs
      match:
        resources:
          kinds:
            - HelmRelease
      preconditions:
        any:
          - key: "{{request.object.spec.chart.spec.sourceRef.namespace}}"
            operator: NotEquals
            value: ""
      validate:
        message: "spec.chart.spec.sourceRef.namespace must be the same as metadata.namespace"
        deny:
          conditions:
            - key: "{{request.object.spec.chart.spec.sourceRef.namespace}}"
              operator: NotEquals
              value: "{{request.object.metadata.namespace}}"

View File

@@ -0,0 +1,29 @@
apiVersion: v1
kind: Namespace
metadata:
  labels:
    app.kubernetes.io/instance: flux-system
    app.kubernetes.io/part-of: flux
    app.kubernetes.io/version: v2.5.1
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
  name: monitoring
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
spec:
  ingressClassName: nginx
  rules:
  - host: grafana.test.metal.mybestdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: kube-prometheus-stack-grafana
            port:
              number: 80

View File

@@ -0,0 +1,35 @@
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-from-other-namespaces
spec:
  podSelector: {}
  ingress:
  - from:
    - podSelector: {}
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-webui
spec:
  podSelector:
    matchLabels:
      app: web
  ingress:
  - from: []
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-db
spec:
  podSelector:
    matchLabels:
      app: db
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web
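Note that `allow-webui` (with its empty `from:` rule) keeps the web pods reachable from anywhere, e.g. from an ingress controller running in another namespace. A quick sanity check once the policies are applied (a sketch, assuming a `web` Service on port 80 in a namespace named `rocky-test`):
```bash
kubectl run probe --rm -it --restart=Never --image=busybox -- \
        wget -qO- -T 2 http://web.rocky-test
# The web pods still answer (allow-webui), while the db pods only accept
# traffic from app=web pods in the same namespace.
```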

View File

@@ -0,0 +1,10 @@
apiVersion: v1
kind: Namespace
metadata:
  labels:
    app.kubernetes.io/instance: flux-system
    app.kubernetes.io/part-of: flux
    app.kubernetes.io/version: v2.5.1
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
  name: openebs

View File

@@ -0,0 +1,12 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: openebs
resources:
- M6-openebs-components.yaml
- sync.yaml
configMapGenerator:
- name: openebs-values
  files:
  - values.yaml=M6-openebs-values.yaml
configurations:
- M6-openebs-kustomizeconfig.yaml

View File

@@ -0,0 +1,6 @@
nameReference:
- kind: ConfigMap
  version: v1
  fieldSpecs:
  - path: spec/valuesFrom/name
    kind: HelmRelease

View File

@@ -0,0 +1,15 @@
# helm install openebs --namespace openebs openebs/openebs
#   --set engines.replicated.mayastor.enabled=false
#   --set lvm-localpv.lvmNode.kubeletDir=/var/lib/k0s/kubelet/
#   --create-namespace
engines:
  replicated:
    mayastor:
      enabled: false
# Needed for k0s install since kubelet install is slightly divergent from vanilla install >:-(
lvm-localpv:
  lvmNode:
    kubeletDir: /var/lib/k0s/kubelet/
localprovisioner:
  hostpathClass:
    isDefaultClass: true

View File

@@ -0,0 +1,38 @@
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  namespace: rocky-test
  name: rocky-full-access
rules:
- apiGroups: ["", extensions, apps]
  resources: [deployments, replicasets, pods, services, ingresses, statefulsets]
  verbs: [get, list, watch, create, update, patch, delete] # You can also use [*]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rocky-pv-access
rules:
- apiGroups: [""]
  resources: [persistentvolumes]
  verbs: [get, list, watch, create, patch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    toolkit.fluxcd.io/tenant: rocky
  name: rocky-reconciler2
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rocky-pv-access
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: gotk:rocky-test:reconciler
- kind: ServiceAccount
  name: rocky
  namespace: rocky-test

k8s/M6-rocky-ingress.yaml
View File

@@ -0,0 +1,19 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rocky
  namespace: rocky-test
spec:
  ingressClassName: nginx
  rules:
  - host: rocky.test.mybestdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80

View File

@@ -0,0 +1,8 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base/rocky
patches:
- path: M6-rocky-test-patch.yaml
  target:
    kind: Kustomization

View File

@@ -0,0 +1,7 @@
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: rocky
  namespace: rocky-test
spec:
  path: ./k8s/plain

View File

@@ -3,7 +3,6 @@ kind: ClusterPolicy
metadata:
name: pod-color-policy-1
spec:
validationFailureAction: enforce
rules:
- name: ensure-pod-color-is-valid
match:
@@ -18,5 +17,6 @@ spec:
operator: NotIn
values: [ red, green, blue ]
validate:
failureAction: Enforce
message: "If it exists, the label color must be red, green, or blue."
deny: {}

View File

@@ -3,7 +3,6 @@ kind: ClusterPolicy
metadata:
name: pod-color-policy-2
spec:
validationFailureAction: enforce
background: false
rules:
- name: prevent-color-change
@@ -22,6 +21,7 @@ spec:
operator: NotEquals
value: ""
validate:
failureAction: Enforce
message: "Once label color has been added, it cannot be changed."
deny:
conditions:

View File

@@ -3,7 +3,6 @@ kind: ClusterPolicy
metadata:
name: pod-color-policy-3
spec:
validationFailureAction: enforce
background: false
rules:
- name: prevent-color-change
@@ -22,7 +21,6 @@ spec:
operator: Equals
value: ""
validate:
failureAction: Enforce
message: "Once label color has been added, it cannot be removed."
deny:
conditions:
deny: {}

View File

@@ -14,10 +14,12 @@ STUDENTS=30
case "$PROVIDER" in
linode)
export TF_VAR_node_size=g6-standard-6
export TF_VAR_location=eu-west
export TF_VAR_location=us-east
;;
scaleway)
export TF_VAR_node_size=PRO2-XS
# For tiny testing purposes, these are okay too:
#export TF_VAR_node_size=PLAY2-NANO
export TF_VAR_location=fr-par-2
;;
esac
@@ -36,7 +38,7 @@ fi
# set external_ip labels
kubectl get nodes -o=jsonpath='{range .items[*]}{.metadata.name} {.status.addresses[?(@.type=="'$ADDRTYPE'")].address}{"\n"}{end}' |
while read node address; do
while read node address ignoredaddresses; do
kubectl label node $node external_ip=$address
done

View File

@@ -55,7 +55,7 @@ _cmd_codeserver() {
need_tag
ARCH=${ARCHITECTURE-amd64}
CODESERVER_VERSION=4.96.2
CODESERVER_VERSION=4.96.4
CODESERVER_URL=https://github.com/coder/code-server/releases/download/v${CODESERVER_VERSION}/code-server-${CODESERVER_VERSION}-linux-${ARCH}.tar.gz
pssh "
set -e
@@ -76,7 +76,7 @@ Description=code-server
WantedBy=default.target
[Service]
ExecStart=/usr/local/bin/code-server --bind-addr 0:1789
ExecStart=/usr/local/bin/code-server --bind-addr [::]:1789
Restart=always
EOF
sudo systemctl --user -M $USER_LOGIN@ enable code-server.service --now
@@ -374,9 +374,13 @@ _cmd_clusterize() {
done < /tmp/cluster
"
while read line; do
printf '{"login": "%s", "password": "%s", "ipaddrs": "%s"}\n' "$USER_LOGIN" "$USER_PASSWORD" "$line"
done < tags/$TAG/clusters.tsv > tags/$TAG/logins.jsonl
jq --raw-input --compact-output \
--arg USER_LOGIN "$USER_LOGIN" --arg USER_PASSWORD "$USER_PASSWORD" '
{
"login": $USER_LOGIN,
"password": $USER_PASSWORD,
"ipaddrs": .
}' < tags/$TAG/clusters.tsv > tags/$TAG/logins.jsonl
echo cluster_ok > tags/$TAG/status
}
@@ -1116,7 +1120,7 @@ _cmd_tailhist () {
set -e
sudo apt-get install unzip -y
wget -c https://github.com/joewalnes/websocketd/releases/download/v0.3.0/websocketd-0.3.0-linux_$ARCH.zip
unzip websocketd-0.3.0-linux_$ARCH.zip websocketd
unzip -o websocketd-0.3.0-linux_$ARCH.zip websocketd
sudo mv websocketd /usr/local/bin/websocketd
sudo mkdir -p /opt/tailhist
sudo tee /opt/tailhist.service <<EOF
@@ -1392,7 +1396,7 @@ WantedBy=multi-user.target
[Service]
WorkingDirectory=/opt/webssh
ExecStart=/usr/bin/env python run.py --fbidhttp=false --port=1080 --policy=reject
ExecStart=/usr/bin/env python3 run.py --fbidhttp=false --port=1080 --policy=reject
User=nobody
Group=nogroup
Restart=always

View File

@@ -1,4 +1,4 @@
#export TF_VAR_node_size=GP2.4
#export TF_VAR_node_size=GP4.4
#export TF_VAR_node_size=g6-standard-6
#export TF_VAR_node_size=m7i.xlarge

View File

@@ -107,6 +107,36 @@ resource "helm_release" "metrics_server_${index}" {
}
}
# This section here deserves a little explanation.
#
# When we access a cluster with shpod (either through SSH or code-server)
# there is no kubeconfig file - we simply use "in-cluster" authentication
# with a ServiceAccount token. This is a bit unusual, and ideally, I would
# prefer to have a "normal" kubeconfig file in the students' shell.
#
# So what we're doing here, is that we're populating a ConfigMap with
# a kubeconfig file; and in the initialization scripts (e.g. bashrc) we
# automatically download the kubeconfig file from the ConfigMap and place
# it in ~/.kube/kubeconfig.
#
# But, which kubeconfig file should we use? We could use the "normal"
# kubeconfig file that was generated by the provider; but in some cases,
# that kubeconfig file might use a token instead of a certificate for
# user authentication - and ideally, I would like to have a certificate
# so that in the section about auth and RBAC, we can dissect that TLS
# certificate and explain where our permissions come from.
#
# So we're creating a TLS key pair; using the CSR API to issue a user
# certificate belonging to a special group; granting the cluster-admin
# role to that group; then we use the kubeconfig file generated by the
# provider but override the user with that TLS key pair.
#
# This is not strictly necessary but it streamlines the lesson on auth.
#
# Lastly - in the ConfigMap we actually put both the original kubeconfig,
# and the one where we injected our new user (just in case we want to
# use or look at the original for any reason).
resource "kubernetes_config_map" "kubeconfig_${index}" {
provider = kubernetes.cluster_${index}
metadata {
@@ -153,6 +183,23 @@ resource "tls_cert_request" "cluster_admin_${index}" {
}
}
resource "kubernetes_cluster_role_binding" "shpod_cluster_admin_${index}" {
provider = kubernetes.cluster_${index}
metadata {
name = "shpod-cluster-admin"
}
role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "ClusterRole"
name = "cluster-admin"
}
subject {
api_group = "rbac.authorization.k8s.io"
kind = "Group"
name = "shpod-cluster-admins"
}
}
resource "kubernetes_certificate_signing_request_v1" "cluster_admin_${index}" {
provider = kubernetes.cluster_${index}
metadata {

View File

@@ -13,6 +13,11 @@ variable "proxmox_password" {
default = null
}
variable "proxmox_storage" {
type = string
default = "local"
}
variable "proxmox_template_node_name" {
type = string
default = null

View File

@@ -8,6 +8,7 @@ resource "proxmox_virtual_environment_vm" "_" {
node_name = local.pve_nodes[each.value.node_index % length(local.pve_nodes)]
for_each = local.nodes
name = each.value.node_name
tags = ["container.training", var.tag]
stop_on_destroy = true
cpu {
cores = split(" ", each.value.node_size)[0]
@@ -17,7 +18,7 @@ resource "proxmox_virtual_environment_vm" "_" {
dedicated = split(" ", each.value.node_size)[1]
}
#disk {
# datastore_id = "ceph"
# datastore_id = var.proxmox_storage
# file_id = proxmox_virtual_environment_file._.id
# interface = "scsi0"
# size = 30
@@ -26,12 +27,13 @@ resource "proxmox_virtual_environment_vm" "_" {
clone {
vm_id = var.proxmox_template_vm_id
node_name = var.proxmox_template_node_name
full = false
}
agent {
enabled = true
}
initialization {
datastore_id = "ceph"
datastore_id = var.proxmox_storage
user_account {
username = "ubuntu"
keys = [trimspace(tls_private_key.ssh.public_key_openssh)]

View File

@@ -8,6 +8,9 @@ proxmox_endpoint = "https://localhost:8006/"
proxmox_username = "terraform@pve"
proxmox_password = "CHANGEME"
# Which storage to use for VM disks. Defaults to "local".
#proxmox_storage = "ceph"
proxmox_template_node_name = "CHANGEME"
proxmox_template_vm_id = CHANGEME

View File

@@ -5,7 +5,7 @@ chat: "[Mattermost](https://training.enix.io/mattermost)"
gitrepo: github.com/jpetazzo/container.training
slides: https://2025-01-enix.container.training/
slides: https://2025-05-enix.container.training/
#slidenumberprefix: "#SomeHashTag &mdash; "

View File

@@ -5,7 +5,7 @@ chat: "[Mattermost](https://training.enix.io/mattermost)"
gitrepo: github.com/jpetazzo/container.training
slides: https://2025-01-enix.container.training/
slides: https://2025-05-enix.container.training/
#slidenumberprefix: "#SomeHashTag &mdash; "
@@ -25,7 +25,7 @@ content:
#- shared/webssh.md
- shared/connecting.md
- exercises/k8sfundamentals-brief.md
- exercises/yaml-brief.md
- exercises/yaml-dockercoins-brief.md
- exercises/localcluster-brief.md
- exercises/healthchecks-brief.md
- shared/toc.md
@@ -64,7 +64,7 @@ content:
- k8s/localkubeconfig.md
- k8s/accessinternal.md
- k8s/kubectlproxy.md
- exercises/yaml-details.md
- exercises/yaml-dockercoins-details.md
- exercises/localcluster-details.md
- # 3
#- k8s/kubectlscale.md

View File

@@ -6,7 +6,7 @@ chat: "[Mattermost](https://training.enix.io/mattermost)"
gitrepo: github.com/jpetazzo/container.training
slides: https://2025-01-enix.container.training/
slides: https://2025-05-enix.container.training/
#slidenumberprefix: "#SomeHashTag &mdash; "

View File

@@ -5,7 +5,7 @@ chat: "[Mattermost](https://training.enix.io/mattermost)"
gitrepo: github.com/jpetazzo/container.training
slides: https://2025-01-enix.container.training/
slides: https://2025-05-enix.container.training/
#slidenumberprefix: "#SomeHashTag &mdash; "
@@ -26,7 +26,7 @@ content:
- shared/toc.md
- exercises/netpol-brief.md
- exercises/sealed-secrets-brief.md
- exercices/rbac-brief.md
- exercises/rbac-brief.md
- exercises/kyverno-ingress-domain-name-brief.md
- exercises/reqlim-brief.md
- #1

View File

@@ -5,7 +5,7 @@ chat: "[Mattermost](https://training.enix.io/mattermost)"
gitrepo: github.com/jpetazzo/container.training
slides: https://2025-01-enix.container.training/
slides: https://2025-05-enix.container.training/
#slidenumberprefix: "#SomeHashTag &mdash; "
@@ -14,7 +14,7 @@ exclude:
content:
- shared/title.md
- logistics-ludovic.md
- logistics.md
- k8s/intro.md
- shared/about-slides.md
- shared/chat-room-im.md
@@ -28,26 +28,38 @@ content:
- k8s/architecture.md
- k8s/deploymentslideshow.md
- k8s/dmuc-easy.md
-
- k8s/dmuc-medium.md
- k8s/dmuc-hard.md
- k8s/cni-internals.md
#- k8s/interco.md
- k8s/apilb.md
-
- k8s/internal-apis.md
- k8s/staticpods.md
- k8s/cluster-upgrade.md
- k8s/cluster-backup.md
#- k8s/cloud-controller-manager.md
-
- k8s/control-plane-auth.md
- k8s/user-cert.md
- k8s/control-plane-auth.md
- k8s/staticpods.md
- exercises/dmuc-auth-details.md
- exercises/dmuc-networking-details.md
- exercises/dmuc-staticpods-details.md
-
- k8s/dmuc-hard.md
- k8s/apilb.md
- k8s/cni-internals.md
- k8s/csr-api.md
- k8s/openid-connect.md
- k8s/pod-security-intro.md
- k8s/pod-security-policies.md
- k8s/pod-security-admission.md
#- k8s/interco.md
#- k8s/internal-apis.md
- k8s/cluster-upgrade.md
- k8s/cluster-backup.md
#- k8s/cloud-controller-manager.md
-
- k8s/M6-START-a-company-scenario.md
- k8s/M6-T02-flux-install.md
- k8s/M6-T03-installing-tenants.md
- k8s/M6-R01-flux_configure-ROCKY-deployment.md
- k8s/M6-T05-ingress-config.md
- k8s/M6-M01-adding-MOVY-tenant.md
- k8s/M6-K01-METAL-install.md
- k8s/M6-K03-openebs-install.md
- k8s/M6-monitoring-stack-install.md
- k8s/M6-kyverno-install.md
- shared/thankyou.md
#-
# |

View File

@@ -0,0 +1,32 @@
# Exercise — enable auth
- We want to enable authentication and authorization
- Checklist:
- non-privileged user can deploy in their namespace
<br/>(and nowhere else)
- each controller uses its own key, certificate, and identity
- each node uses its own key, certificate, and identity
- Service Accounts work properly
- See next slide for help / hints!
---
## Checklist
- Generate keys, certs, and kubeconfig for everything that needs them
(cluster admin, cluster user, controller manager, scheduler, kubelet)
- Reconfigure and restart each component to use its new identity
- Turn on `RBAC` and `Node` authorizers on the API server
- Check that everything works properly
(e.g. that you can create and scale a Deployment using the "cluster user" identity)
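Below, a minimal sketch for one of those identities (the "cluster user"); file paths, names, and the API server address are assumptions to adapt to your setup:
```bash
# Generate a key and a certificate signed by the cluster CA
openssl genrsa -out user.key 4096
openssl req -new -key user.key -subj "/CN=cluster-user/O=devs" -out user.csr
openssl x509 -req -in user.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
        -days 365 -out user.crt
# Build a kubeconfig using that key pair
kubectl config set-cluster dmuc --server=https://127.0.0.1:6443 \
        --certificate-authority=ca.crt --embed-certs --kubeconfig=user.kubeconfig
kubectl config set-credentials cluster-user --client-certificate=user.crt \
        --client-key=user.key --embed-certs --kubeconfig=user.kubeconfig
kubectl config set-context default --cluster=dmuc --user=cluster-user \
        --kubeconfig=user.kubeconfig
kubectl config use-context default --kubeconfig=user.kubeconfig
```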

View File

@@ -0,0 +1,51 @@
# Exercise — networking
- We want to install extra networking components:
- a CNI configuration
- kube-proxy
- CoreDNS
- After doing that, we should be able to deploy a "complex" app
(with multiple containers communicating together + service discovery)
---
## CNI
- Easy option: Weave
https://github.com/weaveworks/weave/releases
- Better option: Cilium
https://docs.cilium.io/en/stable/gettingstarted/k8s-install-default/#install-the-cilium-cli
or https://docs.cilium.io/en/stable/installation/k8s-install-helm/#installation-using-helm
---
## kube-proxy
- Option 1: author a DaemonSet
- Option 2: leverage the CNI (some CNIs like Cilium can replace kube-proxy)
---
## CoreDNS
- Suggested method: Helm chart
(available on https://github.com/coredns/helm)
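For reference, a minimal Helm-based path could look like this (a sketch; chart names and repo URLs are the upstream defaults, version pins are omitted, and the `kubeProxyReplacement` value covers option 2 above - on older Cilium releases the value is `strict` rather than `true`):
```bash
# Cilium as the CNI (optionally replacing kube-proxy)
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --namespace kube-system \
     --set kubeProxyReplacement=true
# CoreDNS from its official chart
helm repo add coredns https://coredns.github.io/helm
helm install coredns coredns/coredns --namespace kube-system
```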
---
## Testing
- Try to deploy DockerCoins and confirm that it works
(for instance with [this YAML file](https://raw.githubusercontent.com/jpetazzo/container.training/refs/heads/main/k8s/dockercoins.yaml))

View File

@@ -0,0 +1,22 @@
# Exercise — static pods
- We want to run the control plane in static pods
(etcd, API server, controller manager, scheduler)
- For Kubernetes components, we can use [these images](https://kubernetes.io/releases/download/#container-images)
- For etcd, we can use [this image](https://quay.io/repository/coreos/etcd?tab=tags)
- If we're using keys, certificates... We can use [hostPath volumes](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath)
---
## Testing
After authoring our static pod manifests and placing them in the right directory,
we should be able to start our cluster simply by starting kubelet.
(Assuming that the container engine is already running.)
For bonus points: write and enable a systemd unit for kubelet!
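For example, an etcd static pod manifest could look roughly like this (a sketch; the image tag and data directory are placeholders):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: etcd
    image: quay.io/coreos/etcd:v3.5.9        # pick a tag matching your cluster
    command:
    - etcd
    - --data-dir=/var/lib/etcd
    - --advertise-client-urls=http://127.0.0.1:2379
    - --listen-client-urls=http://127.0.0.1:2379
    volumeMounts:
    - name: data
      mountPath: /var/lib/etcd
  volumes:
  - name: data
    hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
```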

View File

@@ -26,7 +26,7 @@
- it should initially show a few milliseconds latency
- that will increase when we scale up
- that will increase when we scale up the number of `worker` Pods
- it will also let us detect when the service goes "boom"

View File

@@ -26,8 +26,8 @@ When a Service gets created...
- We want to use a Kyverno `generate` ClusterPolicy
- For step 1, check [Generate Resources](https://kyverno.io/docs/writing-policies/generate/) documentation
- For step 1, check [Generate Resources](https://kyverno.io/docs/policy-types/cluster-policy/generate/) documentation
- For step 2, check [Preconditions](https://kyverno.io/docs/writing-policies/preconditions/) documentation
- For step 2, check [Preconditions](https://kyverno.io/docs/policy-types/cluster-policy/preconditions/) documentation
- For step 3, check [External Data Sources](https://kyverno.io/docs/writing-policies/external-data-sources/) documentation
- For step 3, check [External Data Sources](https://kyverno.io/docs/policy-types/cluster-policy/external-data-sources/) documentation

View File

@@ -0,0 +1,51 @@
# Exercise — Monokube static pods
- We want to run a very basic Kubernetes cluster by starting only:
- kubelet
- a container engine (e.g. Docker)
- The other components (control plane and otherwise) should be started with:
- static pods
- "classic" manifests loaded with e.g. `kubectl apply`
- This should be done with the "monokube" VM
(which has Docker and kubelet 1.19 binaries available)
---
## Images to use
Here are some suggested images:
- etcd → `quay.io/coreos/etcd:vX.Y.Z`
- Kubernetes components → `registry.k8s.io/kube-XXX:vX.Y.Z`
(where `XXX` = `apiserver`, `scheduler`, `controller-manager`)
To know which versions to use, check the version of the binaries installed on the `monokube` VM, and use the same ones.
See next slide for more hints!
---
## Inventory
We'll need to run:
- kubelet (with the flag for static pod manifests)
- Docker
- static pods for control plane components
(suggestion: use `hostNetwork`)
- static pod or DaemonSet for `kube-proxy`
(will require a privileged security context)
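A possible starting point (a sketch; flag names as of kubelet 1.19, paths are placeholders):
```bash
# Directory watched by kubelet for static pod manifests
sudo mkdir -p /etc/kubernetes/manifests
# Start the container engine, then kubelet pointing at that directory
sudo systemctl start docker
sudo kubelet \
  --pod-manifest-path=/etc/kubernetes/manifests \
  --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \
  --fail-swap-on=false
```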

View File

@@ -0,0 +1,86 @@
# Exercise — Writing blue/green YAML
- We want to author YAML manifests for the "color" app
(use image `jpetazzo/color` or `ghcr.io/jpetazzo/color`)
- That app serves web requests on port 80
- We want to deploy two instances of that app (`blue` and `green`)
- We want to expose the app with a service named `front`, such that:
90% of the requests are sent to `blue`, and 10% to `green`
---
## End goal
- We want to be able to do something like:
```bash
kubectl apply -f blue-green-demo.yaml
```
- Then connect to the `front` service and see responses from `blue` and `green`
- Then measure e.g. on 100 requests how many go to `blue` and `green`
(we want a 90/10 traffic split)
- Go ahead, or check the next slides for hints!
---
## Step 1
- Test the app in isolation:
- create a Deployment called `blue`
- expose it with a Service
- connect to the service and see a "blue" reply
- If you use a `ClusterIP` service:
- if you're logged in directly on the cluster, you can connect directly
- otherwise you can use `kubectl port-forward`
- Otherwise, you can use a `NodePort` or `LoadBalancer` service
---
## Step 2
- Add the `green` Deployment
- Create the `front` service
- Edit the `front` service to replace its selector with a custom one
- Edit `blue` and `green` to add the label(s) of your custom selector
- Check that traffic hits both green and blue
- Think about how to obtain the 90/10 traffic split (one possible approach is sketched below)
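One way to get that split is plain replica counts behind a shared label: a `Service` spreads traffic roughly evenly across its ready endpoints, so 9 `blue` pods and 1 `green` pod give about 90/10. A sketch (the `bluegreen: front` label is just a suggestion):
```yaml
apiVersion: v1
kind: Service
metadata:
  name: front
spec:
  selector:
    bluegreen: front          # custom selector, matches both Deployments
  ports:
  - port: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blue
spec:
  replicas: 9                 # 9 blue pods...
  selector:
    matchLabels:
      app: blue
  template:
    metadata:
      labels:
        app: blue
        bluegreen: front      # ...all matched by the front Service
    spec:
      containers:
      - name: color
        image: jpetazzo/color # serves web requests on port 80
---
# (same for "green", with replicas: 1)
```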
---
## Step 3
- Generate, write, extract, ... YAML manifests for all components
(`blue` and `green` Deployments, `front` Service)
- Check that applying the manifests (e.g. in a brand new namespace) works
- Bonus points: add a one-shot pod to check the traffic split!
---
## Discussion
- Would this be a viable option to obtain, say, a 95% / 5% traffic split?
- What about 99% / 1%?

View File

@@ -12,113 +12,119 @@
<table>
<tr>
<td>Mardi 21 janvier 2025</td>
<td>Mardi 13 mai 2025</td>
<td>
<a href="1.yml.html">Docker Intensif</a>
</td>
</tr>
<tr>
<td>Mercredi 22 janvier 2025</td>
<td>Mercredi 14 mai 2025</td>
<td>
<a href="1.yml.html">Docker Intensif</a>
</td>
</tr>
<tr>
<td>Jeudi 23 janvier 2025</td>
<td>Jeudi 15 mai 2025</td>
<td>
<a href="1.yml.html">Docker Intensif</a>
</td>
</tr>
<tr>
<td>Vendredi 24 janvier 2025</td>
<td>Vendredi 16 mai 2025</td>
<td>
<a href="1.yml.html">Docker Intensif</a>
</td>
</tr>
<tr>
<td>Mardi 28 janvier 2025</td>
<td>Mardi 20 mai 2025</td>
<td>
<a href="2.yml.html">Fondamentaux Kubernetes</a>
</td>
</tr>
<tr>
<td>Mercredi 29 janvier 2025</td>
<td>Mercredi 21 mai 2025</td>
<td>
<a href="2.yml.html">Fondamentaux Kubernetes</a>
</td>
</tr>
<tr>
<td>Jeudi 30 janvier 2025</td>
<td>Jeudi 22 mai 2025</td>
<td>
<a href="2.yml.html">Fondamentaux Kubernetes</a>
</td>
</tr>
<tr>
<td>Vendredi 31 janvier 2025</td>
<td>Vendredi 23 mai 2025</td>
<td>
<a href="2.yml.html">Fondamentaux Kubernetes</a>
</td>
</tr>
<tr>
<td>Lundi 3 février 2025</td>
<td>Lundi 26 mai 2025</td>
<td>
<a href="3.yml.html">Packaging d'applications pour Kubernetes</a>
</td>
</tr>
<tr>
<td>Mardi 4 février 2025</td>
<td>Mardi 27 mai 2025</td>
<td>
<a href="3.yml.html">Packaging d'applications pour Kubernetes</a>
</td>
</tr>
<tr>
<td>Mercredi 5 février 2025</td>
<td>Mercredi 28 mai 2025</td>
<td>
<a href="3.yml.html">Packaging d'applications pour Kubernetes</a>
</td>
</tr>
<tr>
<td>Jeudi 7 février 2025</td>
<td>Lundi 2 juin 2025</td>
<td>
<a href="4.yml.html">Kubernetes Avancé</a>
</td>
</tr>
<tr>
<td>Vendredi 7 février 2025</td>
<td>Mardi 3 juin 2025</td>
<td>
<a href="4.yml.html">Kubernetes Avancé</a>
</td>
</tr>
<tr>
<td>Lundi 10 février 2025</td>
<td>Mercredi 4 juin 2025</td>
<td>
<a href="4.yml.html">Kubernetes Avancé</a>
</td>
</tr>
<tr>
<td>Mardi 11 février 2025</td>
<td>Jeudi 5 juin 2025</td>
<td>
<a href="4.yml.html">Kubernetes Avancé</a>
</td>
</tr>
<tr>
<td>Mercredi 12 février 2025</td>
<td>Mardi 10 juin 2025</td>
<td>
<a href="5.yml.html">Opérer Kubernetes</a>
</td>
</tr>
<tr>
<td>Jeudi 13 février 2025</td>
<td>Mercredi 11 juin 2025</td>
<td>
<a href="5.yml.html">Opérer Kubernetes</a>
</td>
</tr>
<tr>
<td>Vendredi 14 février 2025</td>
<td>Jeudi 12 juin 2025</td>
<td>
<a href="5.yml.html">Opérer Kubernetes</a>
</td>
</tr>
<tr>
<td>Vendredi 13 juin 2025</td>
<td>
<a href="5.yml.html">Opérer Kubernetes</a>
</td>

17 new image files added (29 KiB to 570 KiB each); binary contents not shown, and one file diff suppressed because one or more lines are too long.

View File

@@ -0,0 +1,349 @@
# K01- Installing a Kubernetes cluster from scratch
So far, we have operated a managed cluster: **Scaleway** `Kapsule`.
It's great! Most batteries are included:
- storage classes, with an already configured default one
- a default CNI with `Cilium`
<br/>(`Calico` is supported too)
- an _IaaS_ load balancer that can be driven by `ingress-controllers`
- a management _WebUI_ with the Kubernetes dashboard
- an observability stack with `metrics-server` and the Kubernetes dashboard
But what about _on-premises_ needs?
---
class: extra-details
## On-premises Kubernetes distributions
The [CNCF landscape](https://landscape.cncf.io/?fullscreen=yes&zoom=200&group=certified-partners-and-providers) currently lists **61** (!) Kubernetes distributions.
Not to mention the managed Kubernetes services from cloud providers…
Please refer to the [`Setting up Kubernetes` chapter in the High Five M2 module](./2.yml.html#toc-setting-up-kubernetes) for more information about Kubernetes distributions.
---
## Introducing k0s
Nowadays, some "light" distros are considered good enough to run production clusters.
That's the case for `k0s`.
It's an open source, lightweight Kubernetes distribution.
It is mainly backed by **Mirantis**, a long-time software vendor in the Kubernetes ecosystem.
(The ones who bought `Docker Enterprise` a while back, remember?)
`k0s` aims to be both:
- a lightweight distribution for _edge computing_ and development purposes
- an enterprise-grade HA distribution fully supported by its vendor
<br/>(`MKE4` and `kordent` build on `k0s`)
---
### `k0s` package
Its single binary includes:
- a CRI (`containerd`)
- vanilla Kubernetes control plane components (including `etcd`)
- a vanilla network stack:
  - `kube-router`
  - `kube-proxy`
  - `coredns`
  - `konnectivity`
- `kubectl` CLI
- install / uninstall features
- backup / restore features
---
class: pic
![k0s package](images/M6-k0s-packaging.png)
---
class: extra-details
### Konnectivity
You've seen that the Kubernetes cluster architecture is very versatile.
I'm referring to the [`Kubernetes architecture` chapter in the High Five M5 module](./5.yml.html#toc-kubernetes-architecture)
Network communications between control plane components and worker nodes can be tricky to configure.
`Konnectivity` is a response to this pain: it acts as an RPC proxy for any communication initiated from the control plane towards the workers.
These communications are listed in the [`Kubernetes internal APIs` chapter in the High Five M5 module](https://2025-01-enix.container.training/5.yml.html#toc-kubernetes-internal-apis)
The agent deployed on each worker node maintains an RPC tunnel with its counterpart on the control plane side.
---
class: pic
![konnectivity architecture](images/M6-konnectivity-architecture.png)
---
## Installing `k0s`
It installs with a one-liner command
- either with a lightweight single-node footprint
- or with a multi-node HA footprint
.lab[
- Get the binary
```bash
docker@m621: ~$ wget https://github.com/k0sproject/k0sctl/releases/download/v0.25.1/k0sctl-linux-amd64
docker@m621: ~$ chmod +x k0sctl-linux-amd64 && sudo mv k0sctl-linux-amd64 /usr/local/bin/k0sctl
```
]
---
### Prepare the config file
.lab[
- Create the config file
```bash
docker@m621: ~$ k0sctl init \
--controller-count 3 \
--user docker \
--k0s m621 m622 m623 > k0sctl.yaml
```
- change the following field: `spec.hosts[*].role: controller+worker`
- add the following fields: `spec.hosts[*].noTaints: true`
```bash
docker@m621: ~$ k0sctl apply --config k0sctl.yaml
```
]
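After those edits, each host entry in 📄 `k0sctl.yaml` ends up looking roughly like this (a sketch of the k0sctl config format; addresses and key paths are placeholders):
```yaml
apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
  - role: controller+worker     # field changed from the generated value
    noTaints: true              # field added
    ssh:
      address: 10.10.3.190
      user: docker
      keyPath: ~/.ssh/id_rsa
  # ...same for m622 and m623...
  k0s:
    version: v1.33.1+k0s.1
```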
---
### And the famous one-liner
.lab[
```bash
k8s@shpod: ~$ k0sctl apply --config k0sctl.yaml
```
]
---
### Check that k0s installed correctly
.lab[
```bash
docker@m621 ~$ sudo k0s status
Version: v1.33.1+k0s.1
Process ID: 60183
Role: controller
Workloads: true
SingleNode: false
Kube-api probing successful: true
Kube-api probing last error:
docker@m621 ~$ sudo k0s etcd member-list
{"members":{"m621":"https://10.10.3.190:2380","m622":"https://10.10.2.92:2380","m623":"https://10.10.2.110:2380"}}
```
]
---
### `kubectl` is included
.lab[
```bash
docker@m621 ~$ sudo k0s kubectl get nodes
NAME STATUS ROLES AGE VERSION
m621 Ready control-plane 66m v1.33.1+k0s
m622 Ready control-plane 66m v1.33.1+k0s
m623 Ready control-plane 66m v1.33.1+k0s
docker@m621 ~$ sudo k0s kubectl run shpod --image jpetazzo/shpod
```
]
---
class: extra-details
### Single node install (for info!)
For testing purposes, you may want to use a single-node (but still `etcd`-backed) install…
.lab[
- Install it
```bash
docker@m621 ~$ curl -sSLf https://get.k0s.sh | sudo sh
docker@m621 ~$ sudo k0s install controller --single
docker@m621 ~$ sudo k0s start
```
- Reset it
```bash
docker@m621 ~$ sudo k0s stop
docker@m621 ~$ sudo k0s reset
```
]
---
## Deploying shpod
.lab[
```bash
docker@m621 ~$ sudo k0s kubectl apply -f https://shpod.in/shpod.yaml
```
]
---
## Flux install
We'll install `Flux`.
And replay the all scenario a 2nd time.
Let's face it: we don't have that much time. 😅
Since all our install and configuration is `GitOps`-based, we might just leverage on copy-paste and code configuration…
Maybe.
Let's copy the 📂 `./clusters/CLOUDY` folder and rename it 📂 `./clusters/METAL`.
---
### Modifying Flux config 📄 files
- In 📄 file `./clusters/METAL/flux-system/gotk-sync.yaml`
</br>change the `Kustomization` value `spec.path: ./clusters/METAL`
- ⚠️ We'll have to adapt the `Flux` _CLI_ command line
- And that's pretty much it!
- We'll see if anything goes wrong on that new cluster
---
### Connecting to our dedicated `Github` repo to host Flux config
.lab[
- let's replace `GITHUB_TOKEN` and `GITHUB_REPO` values
- don't forget to change the path to `clusters/METAL`
```bash
k8s@shpod:~$ export GITHUB_TOKEN="my-token" && \
export GITHUB_USER="container-training-fleet" && \
export GITHUB_REPO="fleet-config-using-flux-XXXXX"
k8s@shpod:~$ flux bootstrap github \
--owner=${GITHUB_USER} \
--repository=${GITHUB_REPO} \
--team=OPS \
--team=ROCKY --team=MOVY \
--path=clusters/METAL
```
]
---
class: pic
![Running Mario](images/M6-running-Mario.gif)
---
### Flux deployed our complete stack
Everything seems to be here but…
- one database is in `Pending` state
- our `ingresses` don't work well
```bash
k8s@shpod ~$ curl --header 'Host: rocky.test.enixdomain.com' http://${myIngressControllerSvcIP}
curl: (52) Empty reply from server
```
---
### Fixing the Ingress
The current `ingress-nginx` configuration leverages specific annotations used by Scaleway to bind an _IaaS_ load balancer to the `ingress-controller`.
We don't have anything like that here. 😕
- We could bind our `ingress-controller` to a `NodePort`.
The `ingress-nginx` install manifests provide this variant here:
</br>https://github.com/kubernetes/ingress-nginx/deploy/static/provider/baremetal
- In the 📄 file `./clusters/METAL/ingress-nginx/sync.yaml`,
</br>change the `Kustomization` value `spec.path: ./deploy/static/provider/baremetal` (see the sketch below)
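The corresponding change in 📄 `./clusters/METAL/ingress-nginx/sync.yaml` would look roughly like this (a sketch; the source name and interval are assumptions, `spec.path` is the only point here):
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: ingress-nginx
  namespace: flux-system
spec:
  interval: 10m
  prune: true
  sourceRef:
    kind: GitRepository
    name: ingress-nginx                       # the source defined in the same file
  path: ./deploy/static/provider/baremetal    # NodePort-based manifests
```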
---
class: pic
![Running Mario](images/M6-running-Mario.gif)
---
### Troubleshooting the database
One of our `db-0` pods is in `Pending` state.
```bash
k8s@shpod ~$ k get pods db-0 -n *-test -oyaml
()
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2025-06-11T11:15:42Z"
message: '0/3 nodes are available: pod has unbound immediate PersistentVolumeClaims.
preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: Burstable
```
---
### Troubleshooting the PersistentVolumeClaims
```bash
k8s@shpod ~$ k get pvc postgresql-data-db-0 -n *-test -o yaml
()
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailedBinding 9s (x182 over 45m) persistentvolume-controller no persistent volumes available for this claim and no storage class is set
```
No `StorageClass` is available on this cluster.
We didn't have this problem on our managed cluster, since a default storage class was configured and automatically associated with our `PersistentVolumeClaim`.
Why is there no problem with the other database?

View File

@@ -0,0 +1,129 @@
# K03- Installing OpenEBS as our CSI
`OpenEBS` is a _CSI_ storage solution capable of hyperconvergence, synchronous replication, and other extra features.
It installs with `Helm` charts.
- `Flux` is able to watch `Helm` repositories and install `HelmReleases`
- To inject its configuration into the `Helm` chart, `Flux` relies on a `ConfigMap` containing the `values.yaml` file
.lab[
```bash
k8s@shpod ~$ mkdir -p ./clusters/METAL/openebs/ && \
cp -pr ~/container.training/k8s/M6-openebs-*.yaml \
./clusters/METAL/openebs/ && \
cd ./clusters/METAL/openebs/ && \
mv M6-openebs-kustomization.yaml kustomization.yaml && \
cd -
```
]
---
## Creating a `Helm` source in Flux for the OpenEBS Helm chart
.lab[
```bash
k8s@shpod ~$ flux create source helm openebs \
--url=https://openebs.github.io/openebs \
--interval=3m \
--export > ./clusters/METAL/openebs/sync.yaml
```
]
---
## Creating the `HelmRelease` in Flux
.lab[
```bash
k8s@shpod ~$ flux create helmrelease openebs \
--namespace=openebs \
--source=HelmRepository/openebs.flux-system \
--chart=openebs \
--values-from=ConfigMap/openebs-values \
--export >> ./clusters/METAL/openebs/sync.yaml
```
]
---
## 📂 Let's review the files
- `M6-openebs-components.yaml`
</br>To place the `Flux` resources in the same _namespace_ where `Flux` installs the `OpenEBS` resources, we need to create that _namespace_ **before** the installation occurs
- `sync.yaml`
</br>The resources `Flux` uses to watch and fetch the `Helm` chart
- `M6-openebs-values.yaml`
</br>The `values.yaml` file that will be injected into the `Helm` chart
- `kustomization.yaml`
</br>This one is a bit special: it includes a [ConfigMap generator](https://kubectl.docs.kubernetes.io/references/kustomize/kustomization/configmapgenerator/)
- `M6-openebs-kustomizeconfig.yaml`
</br></br>This one is tricky: for `Flux` to trigger an upgrade of the `HelmRelease` when the `ConfigMap` is altered, you need to explain to the `Kustomize ConfigMap generator` how the resources relate to each other (sketched below). 🤯
And here we go!
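For reference, the `HelmRelease` generated by the `flux create helmrelease` command above ends up referencing the ConfigMap roughly like this (a sketch; the `apiVersion` depends on your Flux release). The `nameReference` in `M6-openebs-kustomizeconfig.yaml` is what lets the ConfigMap generator rewrite `spec.valuesFrom[].name` when it appends a content hash to the ConfigMap name:
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: openebs
  namespace: openebs
spec:
  interval: 3m
  chart:
    spec:
      chart: openebs
      sourceRef:
        kind: HelmRepository
        name: openebs
        namespace: flux-system
  valuesFrom:
  - kind: ConfigMap
    name: openebs-values   # rewritten to openebs-values-<hash>, triggering an upgrade
```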
---
class: pic
![Running Mario](images/M6-running-Mario.gif)
---
## And the result
Now we have a _cluster_ featuring `OpenEBS`.
But still… the PersistentVolumeClaim remains in `Pending` state! 😭
```bash
k8s@shpod ~$ kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
openebs-hostpath openebs.io/local Delete WaitForFirstConsumer false 82m
```
We still don't have a default `StorageClass`!😤
---
### Manually enforcing the default `StorageClass`
Even if Flux is constantly reconciling our resources, we can still test changes by hand.
.lab[
```bash
k8s@shpod ~$ flux suspend helmrelease openebs -n openebs
► suspending helmrelease openebs in openebs namespace
✔ helmrelease suspended
k8s@shpod ~$ kubectl patch storageclass openebs-hostpath \
-p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
k8s@shpod ~$ k get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
openebs-hostpath (default) openebs.io/local Delete WaitForFirstConsumer false 82m
```
]
---
### Now the database is OK
```bash
k8s@shpod ~$ k get pvc,pods -n movy-test
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/postgresql-data-db-0 Bound pvc-ede1634f-2478-42cd-8ee3-7547cd7cdde2 1Gi RWO openebs-hostpath <unset> 20m
NAME READY STATUS RESTARTS AGE
pod/db-0 1/1 Running 0 5h43m
()
```

View File

@@ -0,0 +1,320 @@
# M01- Configuring **_🎬MOVY_** deployment with Flux
The **_🎸ROCKY_** _tenant_ is now fully usable in the **_⚗TEST_** env; let's do the same for another _dev_ team: **_🎬MOVY_**
😈 We could do it by using the `Flux` _CLI_,
but let's see if we can succeed by just adding manifests to our `Flux` configuration repository.
---
class: pic
![Flux configuration waterfall](images/M6-flux-config-dependencies.png)
---
## Impact study
In our `Flux` configuration repository:
- Creation of the following 📂 folders: `./tenants/[base|test]/MOVY`
- Modification of the following 📄 file: `./clusters/CLOUDY/tenants.yaml`?
- Well, we don't need to: the watched path includes the whole `./tenants/[test]/*` folder
In the app repository:
- Creation of a `movy` branch to deploy another version of the app dedicated to movie soundtracks
---
### Creation of the 📂 folders
.lab[
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
cp -pr tenants/base/rocky tenants/base/movy
cp -pr tenants/test/rocky tenants/test/movy
```
]
---
### Modification of tenants/[base|test]/movy/* 📄 files
- For the 📄`M6-rocky-*.yaml` files, change the file names…
- and update the 📄`kustomization.yaml` file accordingly
- In every file, replace every `rocky` entry with `movy`
- In 📄 `sync.yaml`, be mindful of which repository and which branch you want `Flux` to watch for the **_🎬MOVY_** app deployment.
- for this demo, let's assume we create a `movy` branch
---
class: extra-details
### What about reusing rocky-cluster-roles?
💡 In 📄`M6-movy-cluster-role.yaml` and 📄`rbac.yaml`, we could have reused the already existing `ClusterRoles`: `rocky-full-access`, and `rocky-pv-access`
A `ClusterRole` is cluster-wide. It is not dedicated to a namespace.
- Its permissions are restricted to a specific namespace when it is bound to a `ServiceAccount` by a `RoleBinding`.
- Whereas a `ClusterRoleBinding` extends the permissions to the whole cluster scope (see the sketch below).
But a _tenant_ is a **_tenant_**, and permissions might evolve separately for **_🎸ROCKY_** and **_🎬MOVY_**.
So [we got to keep'em separated](https://www.youtube.com/watch?v=GHUql3OC_uU).
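As a generic illustration (not the actual tenant files), here is how the same `ClusterRole` gets namespace-scoped or cluster-scoped depending on the binding:
```yaml
# Namespaced: rocky only gets these permissions inside rocky-test
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rocky-full-access
  namespace: rocky-test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rocky-full-access
subjects:
- kind: ServiceAccount
  name: rocky
  namespace: rocky-test
---
# Cluster-wide: the same ClusterRole, but effective everywhere
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: rocky-full-access-everywhere
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rocky-full-access
subjects:
- kind: ServiceAccount
  name: rocky
  namespace: rocky-test
```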
---
### Let-su-go!
The **_⚙OPS_** team pushes this new tenant configuration to `Github` for the `Flux` controllers to watch and catch it!
.lab[
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
git add . && \
git commit -m':wrench: :construction_worker: add MOVY tenant configuration' && \
git push
```
]
---
class: pic
![Running Mario](images/M6-running-Mario.gif)
---
class: extra-details
### Another Flux error?
.lab[
- It seems that our `movy` branch is not present in the app repository
```bash
k8s@shpod:~$ flux get kustomization -A
NAMESPACE NAME REVISION SUSPENDED MESSAGE
()
flux-system tenant-prod False False kustomization path not found: stat /tmp/kustomization-113582828/tenants/prod: no such file or directory
()
movy-test movy False False Source artifact not found, retrying in 30s
```
]
---
### Creating the `movy` branch
- Let's create this new `movy` branch from `rocky` branch
.lab[
- You can force immediate reconciliation by typing this command:
```bash
k8s@shpod:~$ flux reconcile source git movy-app -n movy-test
```
]
---
class: pic
![Running Mario](images/M6-running-Mario.gif)
---
### New branch detected
You now have a second app responding at http://movy.test.mybestdomain.com
But as of now, it's just the same as the **_🎸ROCKY_** one.
We want a specific (pink-colored) version with a dataset full of movie soundtracks.
---
## New version of the **_🎬MOVY_** app
In our `movy` branch,
let's make two modifications to the `deployment.yaml` file:
- in `spec.template.spec.containers.image`, change the container image tag to `1.0.3`
- and… let's introduce some evil entropy by changing this line… 😈😈😈
```yaml
value: jdbc:postgresql://db/music
```
by this one
```yaml
value: jdbc:postgresql://db.rocky-test/music
```
And push the modifications…
---
class: pic
![MOVY app has an incorrect dataset](images/M6-incorrect-dataset-in-MOVY-app.png)
---
class: pic
![ROCKY app has an incorrect dataset](images/M6-incorrect-dataset-in-ROCKY-app.png)
---
### MOVY app is connected to ROCKY database
How evil we have been! 😈
We connected the **_🎬MOVY_** app to the **_🎸ROCKY_** database.
Even if our tenants are isolated in how they manage their Kubernetes resources…
the pod network is still a full mesh, and any connection is authorized.
> The **_⚙OPS_** team should fix this!
---
class: extra-details
## Adding NetworkPolicies to **_🎸ROCKY_** and **_🎬MOVY_** namespaces
`Network policies` can be seen as the firewall feature of the pod network.
They govern ingress and egress network connections for a selected subset of pods.
Please refer to the [`Network policies` chapter in the High Five M4 module](./4.yml.html#toc-network-policies)
- In our case, we just add the file `~/container.training/k8s/M6-network-policies.yaml`
</br>to our `./tenants/base/movy` folder
- without forgetting to update our `kustomization.yaml` file
- and without forgetting to commit 😁 (a quick verification is sketched below)
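Once the policies are reconciled, a quick check from the **_🎬MOVY_** namespace (a sketch; it assumes the policies are applied to `rocky-test` as well, and that the database is PostgreSQL listening on 5432, as in the spring-music app):
```bash
k8s@shpod ~$ kubectl run pgcheck --rm -it --restart=Never -n movy-test \
             --image=postgres:15 -- pg_isready -h db.rocky-test -t 2
# Before the NetworkPolicies: "accepting connections"
# After: "no response", since the rocky-test pods only accept ingress from
# their own namespace (and the db only from app=web pods).
```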
---
class: pic
![Running Mario](images/M6-running-Mario.gif)
---
### 🗺️ Where are we in our scenario?
<pre class="mermaid">
%%{init:
{
"theme": "default",
"gitGraph": {
"mainBranchName": "OPS",
"mainBranchOrder": 0
}
}
}%%
gitGraph
commit id:"0" tag:"start"
branch ROCKY order:3
branch MOVY order:4
branch YouRHere order:5
checkout OPS
commit id:'Flux install on CLOUDY cluster' tag:'T01'
branch TEST-env order:1
commit id:'FLUX install on TEST' tag:'T02' type: HIGHLIGHT
checkout OPS
commit id:'Flux config. for TEST tenant' tag:'T03'
commit id:'namespace isolation by RBAC'
checkout TEST-env
merge OPS id:'ROCKY tenant creation' tag:'T04'
checkout OPS
commit id:'ROCKY deploy. config.' tag:'R01'
checkout TEST-env
merge OPS id:'TEST ready to deploy ROCKY' type: HIGHLIGHT tag:'R02'
checkout ROCKY
commit id:'ROCKY' tag:'v1.0.0'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.0'
checkout OPS
commit id:'Ingress-controller config.' tag:'T05'
checkout TEST-env
merge OPS id:'Ingress-controller install' type: HIGHLIGHT tag:'T06'
checkout OPS
commit id:'ROCKY patch for ingress config.' tag:'R03'
checkout TEST-env
merge OPS id:'ingress config. for ROCKY app'
checkout ROCKY
commit id:'blue color' tag:'v1.0.1'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.1'
checkout ROCKY
commit id:'pink color' tag:'v1.0.2'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.2'
checkout OPS
commit id:'FLUX config for MOVY deployment' tag:'M01'
checkout TEST-env
merge OPS id:'FLUX ready to deploy MOVY' type: HIGHLIGHT tag:'M02'
checkout MOVY
commit id:'MOVY' tag:'v1.0.3'
checkout TEST-env
merge MOVY tag:'MOVY v1.0.3' type: REVERSE
checkout OPS
commit id:'Network policies'
checkout TEST-env
merge OPS type: HIGHLIGHT
checkout YouRHere
commit id:'x'
checkout OPS
merge YouRHere id:'YOU ARE HERE'
checkout OPS
commit id:'k0s install on METAL cluster' tag:'K01'
commit id:'Flux config. for METAL cluster' tag:'K02'
branch METAL_TEST-PROD order:3
commit id:'ROCKY/MOVY tenants on METAL' type: HIGHLIGHT
checkout OPS
commit id:'Flux config. for OpenEBS' tag:'K03'
checkout METAL_TEST-PROD
merge OPS id:'openEBS on METAL' type: HIGHLIGHT
checkout OPS
commit id:'Prometheus install'
checkout TEST-env
merge OPS type: HIGHLIGHT
checkout OPS
commit id:'Kyverno install'
commit id:'Kyverno rules'
checkout TEST-env
merge OPS type: HIGHLIGHT
</pre>

View File

@@ -0,0 +1,417 @@
# R01- Configuring **_🎸ROCKY_** deployment with Flux
The **_⚙OPS_** team manages 2 distinct envs: **_⚗TEST_** and _**🚜PROD**_
Thanks to _Kustomize_:
1. it creates a common **_base_** config
2. this common config is overridden with a **_⚗TEST_** _tenant_-specific configuration
3. the same applies with a _**🚜PROD**_-specific configuration
> 💡 This seems complex, but no worries: Flux's CLI handles most of it.
---
## Creating the **_🎸ROCKY_**-dedicated _tenant_ in **_⚗TEST_** env
- Using the `flux` _CLI_, we create the file configuring the **_🎸ROCKY_** team's dedicated _tenant_
- … this file lives in the `base` common configuration shared by both envs
.lab[
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
mkdir -p ./tenants/base/rocky && \
flux create tenant rocky \
--with-namespace=rocky-test \
--cluster-role=rocky-full-access \
--export > ./tenants/base/rocky/rbac.yaml
```
]
---
class: extra-details
### 📂 ./tenants/base/rocky/rbac.yaml
Let's see our file…
3 resources are created: `Namespace`, `ServiceAccount`, and `ClusterRoleBinding`
`Flux` **impersonates** this `ServiceAccount` when it applies the resources found in the _tenant_-dedicated source(s)
- By default, the `ServiceAccount` is bound to the `cluster-admin` `ClusterRole`
- The team maintaining the sourced `Github` repository is almighty at cluster scope
Not such an isolated _tenant_ after all! 😕
That's why the **_⚙OPS_** team enforces specific `ClusterRoles` with restricted permissions
Let's create these permissions!
---
## _namespace_ isolation for **_🎸ROCKY_**
.lab[
- Here are the restricted permissions to use in the `rocky-test` `Namespace`
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
cp ~/container.training/k8s/M6-rocky-cluster-role.yaml ./tenants/base/rocky/
```
]
> 💡 Note that some resources are managed at cluster scope (like `PersistentVolumes`).
> We therefore need specific permissions for those…
---
## Creating `Github` source in Flux for **_🎸ROCKY_** app repository
A specific _branch_ of the `Github` repository is monitored by the `Flux` source
.lab[
- ⚠️ you may change the **repository URL** to the one of your own clone
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ flux create source git rocky-app \
--namespace=rocky-test \
--url=https://github.com/Musk8teers/container.training-spring-music/ \
--branch=rocky --export > ./tenants/base/rocky/sync.yaml
```
]
---
## Creating `kustomization` in Flux for **_🎸ROCKY_** app repository
.lab[
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ flux create kustomization rocky \
--namespace=rocky-test \
--service-account=rocky \
--source=GitRepository/rocky-app \
--path="./k8s/" --export >> ./tenants/base/rocky/sync.yaml
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
cd ./tenants/base/rocky/ && \
kustomize create --autodetect && \
cd -
```
]
---
class: extra-details
### 📂 Flux config files
Let's review our `Flux` configuration files
.lab[
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
cat ./tenants/base/rocky/sync.yaml && \
cat ./tenants/base/rocky/kustomization.yaml
```
]
---
## Adding a kustomize patch for **_⚗TEST_** cluster deployment
💡 Remember the DRY strategy!
- The `Flux` tenant-dedicated configuration is looking for this file: `./tenants/test/rocky/kustomization.yaml`
- It has been configured here: `clusters/CLOUDY/tenants.yaml`
- All the files we just created are located in `./tenants/base/rocky`
- So we have to create a specific kustomization in the right location
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
mkdir -p ./tenants/test/rocky && \
cp ~/container.training/k8s/M6-rocky-test-patch.yaml ./tenants/test/rocky/ && \
cp ~/container.training/k8s/M6-rocky-test-kustomization.yaml ./tenants/test/rocky/kustomization.yaml
```
---
### Synchronizing Flux config with its Github repo
Locally, our `Flux` config repo is ready
The **_⚙OPS_** team has to push it to `Github` for `Flux` controllers to watch and catch it!
.lab[
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
git add . && \
git commit -m':wrench: :construction_worker: add ROCKY tenant configuration' && \
git push
```
]
---
class: pic
![Running Mario](images/M6-running-Mario.gif)
---
class: pic
![rocky config files](images/M6-R01-config-files.png)
---
class: extra-details
### Flux resources for ROCKY tenant 1/2
.lab[
```bash
k8s@shpod:~$ flux get all -A
NAMESPACE NAME REVISION SUSPENDED
READY MESSAGE
flux-system gitrepository/flux-system main@sha1:8ffd72cf False
True stored artifact for revision 'main@sha1:8ffd72cf'
rocky-test gitrepository/rocky-app rocky@sha1:ffe9f3fe False
True stored artifact for revision 'rocky@sha1:ffe9f3fe'
()
```
]
---
class: extra-details
### Flux resources for ROCKY _tenant_ 2/2
.lab[
```bash
k8s@shpod:~$ flux get all -A
()
NAMESPACE NAME REVISION SUSPENDED
READY MESSAGE
flux-system kustomization/flux-system main@sha1:8ffd72cf False
True Applied revision: main@sha1:8ffd72cf
flux-system kustomization/tenant-prod False
False kustomization path not found: stat /tmp/kustomization-1164119282/tenants/prod: no such file or directory
flux-system kustomization/tenant-test main@sha1:8ffd72cf False
True Applied revision: main@sha1:8ffd72cf
rocky-test kustomization/rocky False
False StatefulSet/db dry-run failed (Forbidden): statefulsets.apps "db" is forbidden: User "system:serviceaccount:rocky-test:rocky" cannot patch resource "statefulsets" in API group "apps" at the cluster scope
```
]
And here is our 2nd Flux error! 😅
---
class: extra-details
### Flux Kustomization, mutability, …
🔍 Notice that none of the expected resources is created:
the whole kustomization is rejected, even though the `StatefulSet` is the only resource that fails!
🔍 The Flux `Kustomization` uses the dry-run feature to template the resources and then apply patches onto them
That's fine in general, but some resources are not fully mutable, such as `StatefulSets`
We have to apply the change in a way that doesn't require patching the resource.
🔍 Simply add `spec.targetNamespace: rocky-test` to the `Kustomization` named `rocky` (see the sketch below)
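Here is a minimal sketch of what the patched `Kustomization` could look like (values taken from the commands used earlier in this chapter; your generated `sync.yaml` may differ slightly):
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: rocky
  namespace: rocky-test
spec:
  interval: 1m0s
  path: ./k8s/
  serviceAccountName: rocky
  sourceRef:
    kind: GitRepository
    name: rocky-app
  # added fix: resources are now created in this namespace,
  # so Flux no longer has to patch the StatefulSet at cluster scope
  targetNamespace: rocky-test
```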
---
class: extra-details
## And then it's deployed 1/2
You should see the following resources in the `rocky-test` namespace
.lab[
```bash
k8s@shpod-578d64468-tp7r2 ~/$ k get pods,svc,deployments -n rocky-test
NAME READY STATUS RESTARTS AGE
pod/db-0 1/1 Running 0 47s
pod/web-6c677bf97f-c7pkv 0/1 Running 1 (22s ago) 47s
pod/web-6c677bf97f-p7b4r 0/1 Running 1 (19s ago) 47s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/db ClusterIP 10.32.6.128 <none> 5432/TCP 48s
service/web ClusterIP 10.32.2.202 <none> 80/TCP 48s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/web 0/2 2 0 47s
```
]
---
class: extra-details
## And then it's deployed 2/2
You should see the following resources in the `rocky-test` namespace
.lab[
```bash
k8s@shpod-578d64468-tp7r2 ~/$ k get statefulsets,pvc,pv -n rocky-test
NAME READY AGE
statefulset.apps/db 1/1 47s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/postgresql-data-db-0 Bound pvc-c1963a2b-4fc9-4c74-9c5a-b0870b23e59a 1Gi RWO sbs-default <unset> 47s
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE
persistentvolume/postgresql-data 1Gi RWO,RWX Retain Available <unset> 47s
persistentvolume/pvc-150fcef5-ebba-458e-951f-68a7e214c635 1G RWO Delete Bound shpod/shpod sbs-default <unset> 4h46m
persistentvolume/pvc-c1963a2b-4fc9-4c74-9c5a-b0870b23e59a 1Gi RWO Delete Bound rocky-test/postgresql-data-db-0 sbs-default <unset> 47s
```
]
---
class: extra-details
### PersistentVolumes are using a default `StorageClass`
💡 This managed cluster comes with custom `StorageClasses` leveraging Cloud _IaaS_ capabilities (i.e. block devices)
![Flux configuration waterfall](images/M6-persistentvolumes.png)
- a default `StorageClass` is applied if none is specified (like here)
- for **_🏭PROD_** purposes, the **_⚙OPS_** team might enforce a more performant `StorageClass`
- on a bare-metal cluster, the **_⚙OPS_** team has to configure and provide `StorageClasses` on its own (see the quick check below)
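A quick way to check which `StorageClass` is the default on a cluster (the default one is flagged in the output):
```bash
k8s@shpod:~$ kubectl get storageclass
```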
---
class: pic
![Flux configuration waterfall](images/M6-flux-config-dependencies.png)
---
## Upgrading ROCKY app
The Git source named `rocky-app` is pointing at
- a Github repository named [Musk8teers/container.training-spring-music](https://github.com/Musk8teers/container.training-spring-music/)
- on its branch named `rocky`
This branch deploys v1.0.0 of the _Web_ app:
`spec.template.spec.containers.image: ghcr.io/musk8teers/container.training-spring-music:1.0.0`
What happens if the **_🎸ROCKY_** team upgrades its branch to deploy `v1.0.1` of the _Web_ app?
---
## _tenant_ **_🏭PROD_**
💡 The **_🏭PROD_** _tenant_ is still waiting for its `Flux` configuration, but don't worry about it right now.
---
### 🗺️ Where are we in our scenario?
<pre class="mermaid">
%%{init:
{
"theme": "default",
"gitGraph": {
"mainBranchName": "OPS",
"mainBranchOrder": 0
}
}
}%%
gitGraph
commit id:"0" tag:"start"
branch ROCKY order:3
branch MOVY order:4
branch YouRHere order:5
checkout OPS
commit id:'Flux install on CLOUDY cluster' tag:'T01'
branch TEST-env order:1
commit id:'FLUX install on TEST' tag:'T02' type: HIGHLIGHT
checkout OPS
commit id:'Flux config. for TEST tenant' tag:'T03'
commit id:'namespace isolation by RBAC'
checkout TEST-env
merge OPS id:'ROCKY tenant creation' tag:'T04'
checkout OPS
commit id:'ROCKY deploy. config.' tag:'R01'
checkout TEST-env
merge OPS id:'TEST ready to deploy ROCKY' type: HIGHLIGHT tag:'R02'
checkout ROCKY
commit id:'ROCKY' tag:'v1.0.0'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.0'
checkout YouRHere
commit id:'x'
checkout OPS
merge YouRHere id:'YOU ARE HERE'
checkout OPS
commit id:'Ingress-controller config.' tag:'T05'
checkout TEST-env
merge OPS id:'Ingress-controller install' type: HIGHLIGHT tag:'T06'
checkout OPS
commit id:'ROCKY patch for ingress config.' tag:'R03'
checkout TEST-env
merge OPS id:'ingress config. for ROCKY app'
checkout ROCKY
commit id:'blue color' tag:'v1.0.1'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.1'
checkout ROCKY
commit id:'pink color' tag:'v1.0.2'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.2'
checkout OPS
commit id:'FLUX config for MOVY deployment' tag:'M01'
checkout TEST-env
merge OPS id:'FLUX ready to deploy MOVY' type: HIGHLIGHT tag:'M02'
checkout MOVY
commit id:'MOVY' tag:'v1.0.3'
checkout TEST-env
merge MOVY tag:'MOVY v1.0.3' type: REVERSE
checkout OPS
commit id:'Network policies'
checkout TEST-env
merge OPS type: HIGHLIGHT
</pre>

View File

View File

@@ -0,0 +1,354 @@
# Kubernetes in production — <br/>an end-to-end example
- Previous training modules focused on individual topics
(e.g. RBAC, network policies, CRDs, Helm...)
- We will now show how to put everything together to deploy apps in production
(dealing with typical challenges like: multiple apps, multiple teams, multiple clusters...)
- Our first challenge will be to pick and choose which components to use
(among the vast [Cloud Native Landscape](https://landscape.cncf.io/))
- We'll start with a basic Kubernetes cluster (on cloud or on premises)
- We'll then enhance it by adding features one at a time
---
## The cast
There are 3 teams in our company:
- **_⚙OPS_** is the platform engineering team
- they're responsible for building and configuring Kubernetes clusters
- the **_🎸ROCKY_** team develops and manages the **_🎸ROCKY_** app
- that app manages a collection of _rock & pop_ albums
- it's deployed with plain YAML manifests
- the **_🎬MOVY_** team develops and manages the **_🎬MOVY_** app
- that app manages a collection of _movie soundtrack_ albums
- it's deployed with Helm charts
---
## Code and team organization
- **_🎸ROCKY_** and **_🎬MOVY_** reside in separate git repositories
- Each team can write code, build packages, and deploy its application:
- independently
<br/>(= without having to worry about what's happening in the other repo)
- autonomously
<br/>(= without having to synchronize or obtain privileges from another team)
---
## Cluster organization
The **_⚙OPS_** team manages 2 Kubernetes clusters:
- **_☁CLOUDY_**: managed cluster from a public cloud provider
- **_🤘METAL_**: custom-built cluster installed on bare Linux servers
Let's see the differences between these clusters.
---
## **_☁CLOUDY_** cluster
- Managed cluster from a public cloud provider ("Kubernetes-as-a-Service")
- HA control plane deployed and managed by the cloud provider
- Two worker nodes (potentially with cluster autoscaling)
- Usually comes pre-installed with some basic features
(e.g. metrics-server, CNI, CSI, sometimes an ingress controller)
- Requires extra components to be production-ready
(e.g. Flux or other gitops pipeline, observability...)
- Example: [Scaleway Kapsule][kapsule] (but many other KaaS options are available)
[kapsule]: https://www.scaleway.com/en/kubernetes-kapsule/
---
## **_🤘METAL_** cluster
- Custom-built cluster installed on bare Linux servers
- HA control plane deployed and managed by the **_⚙OPS_** team
- 3 nodes
- in our example, the nodes will run both the control plane and our apps
- it is more typical to use dedicated control plane nodes
<br/>(example: 3 control plane nodes + at least 3 worker nodes)
- Comes with even less pre-installed components than **_☁CLOUDY_**
(requiring more work from our **_⚙OPS_** team)
- Example: we'll use [k0s] (but many other distros are available)
[k0s]: https://k0sproject.io/
---
## **_⚗TEST_** and **_🏭PROD_**
- The **_⚙OPS_** team creates 2 environments for each dev team
(**_⚗TEST_** and **_🏭PROD_**)
- These environments exist on both clusters
(meaning 2 apps × 2 clusters × 2 envs = 8 envs total)
- The setup for each env and cluster should follow DRY principles
(to ensure configurations are consistent and minimize maintenance)
- Each cluster and each env has its own lifecycle
(= it should be possible to deploy, add an extra component/feature...
<br/>on one env without impacting the other)
---
### Multi-tenancy
Both **_🎸ROCKY_** and **_🎬MOVY_** teams should use **dedicated _"tenants"_** on each cluster/env
- the **_🎸ROCKY_** team should be able to deploy, upgrade and configure its app within its dedicated **namespace** without anybody else involved
- and the same for **_🎬MOVY_**
- neither team's deployments may interfere with the other's, maintaining a clean and conflict-free environment
---
## Application overview
- Both dev teams are working on an app to manage music albums
- This app is mostly based on a `Spring` framework demo called spring-music
- This lab uses a dedicated fork [container.training-spring-music](https://github.com/Musk8teers/container.training-spring-music):
- with 2 branches dedicated to the **_🎸ROCKY_** and **_🎬MOVY_** teams
- The app architecture consists of 2 tiers:
- a `Java/Spring` Web app
- a `PostgreSQL` database
---
### 📂 specific file: application.yaml
This is where we configure the application to connect to the `PostgreSQL` database.
.lab[
🔍 Location: [/src/main/resources/application.yml](https://github.com/Musk8teers/container.training-spring-music/blob/main/src/main/resources/application.yml)
]
`PROFILE=postgres` env var is set in [docker-compose.yaml](https://github.com/Musk8teers/container.training-spring-music/blob/main/docker-compose.yml) file, for example…
---
### 📂 specific file: AlbumRepositoryPopulator.java
This is where the album collection is initially loaded from the file [`albums.json`](https://github.com/Musk8teers/container.training-spring-music/blob/main/src/main/resources/albums.json)
.lab[
🔍 Location: [`/src/main/java/org/cloudfoundry/samples/music/repositories/AlbumRepositoryPopulator.java`](https://github.com/Musk8teers/container.training-spring-music/blob/main/src/main/java/org/cloudfoundry/samples/music/repositories/AlbumRepositoryPopulator.java)
]
---
## 🚚 How to deploy?
The **_⚙OPS_** team offers 2 deployment strategies that dev teams can use autonomously:
- **_🎸ROCKY_** uses a `Flux` _GitOps_ workflow based on regular Kubernetes `YAML` resources
- **_🎬MOVY_** uses a `Flux` _GitOps_ workflow based on `Helm` charts
---
## 🍱 What features?
<!-- TODO: complete this slide when all the modules are there -->
The **_⚙OPS_** team aims to provide clusters offering the following features to its users:
- a network stack with efficient workload isolation
- ingress and load-balancing capabilities
- an enterprise-grade monitoring solution for real-time insights
- automated policy rule enforcement to control Kubernetes resources requested by dev teams
<!-- - HA PostgreSQL -->
<!-- - HTTPs certificates to expose the applications -->
---
## 🌰 In a nutshell
- 3 teams: **_⚙OPS_**, **_🎸ROCKY_**, **_🎬MOVY_**
- 2 clusters: **_☁CLOUDY_**, **_🤘METAL_**
- 2 envs per cluster and per dev team: **_⚗TEST_**, **_🏭PROD_**
- 2 Web apps (Java/Spring + PostgreSQL): one for rock & pop albums, another for movie soundtrack albums
- 2 deployment strategies: regular `YAML` resources + `Kustomize`, `Helm` charts
> 💻 `Flux` is used both
> - to operate the clusters
> - and to manage the _GitOps_ deployment workflows
---
### What our scenario might look like…
<pre class="mermaid">
%%{init:
{
"theme": "default",
"gitGraph": {
"mainBranchName": "OPS",
"mainBranchOrder": 0
}
}
}%%
gitGraph
commit id:"0" tag:"start"
branch ROCKY order:4
branch MOVY order:5
branch YouRHere order:6
checkout YouRHere
commit id:'x'
checkout OPS
merge YouRHere id:'YOU ARE HERE'
checkout OPS
commit id:'Flux install on CLOUDY cluster' tag:'T01'
branch TEST-env order:1
commit id:'FLUX install on TEST' tag:'T02' type: HIGHLIGHT
checkout OPS
commit id:'Flux config. for TEST tenant' tag:'T03'
commit id:'namespace isolation by RBAC'
checkout TEST-env
merge OPS id:'ROCKY tenant creation' tag:'T04'
checkout OPS
commit id:'ROCKY deploy. config.' tag:'R01'
checkout TEST-env
merge OPS id:'TEST ready to deploy ROCKY' type: HIGHLIGHT tag:'R02'
checkout ROCKY
commit id:'ROCKY' tag:'v1.0.0'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.0'
checkout OPS
commit id:'Ingress-controller config.' tag:'T05'
checkout TEST-env
merge OPS id:'Ingress-controller install' type: HIGHLIGHT tag:'T06'
checkout OPS
commit id:'ROCKY patch for ingress config.' tag:'R03'
checkout TEST-env
merge OPS id:'ingress config. for ROCKY app'
checkout ROCKY
commit id:'blue color' tag:'v1.0.1'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.1'
checkout ROCKY
commit id:'pink color' tag:'v1.0.2'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.2'
checkout OPS
commit id:'FLUX config for MOVY deployment' tag:'M01'
checkout TEST-env
merge OPS id:'FLUX ready to deploy MOVY' type: HIGHLIGHT tag:'M02'
checkout MOVY
commit id:'MOVY' tag:'v1.0.3'
checkout TEST-env
merge MOVY tag:'MOVY v1.0.3' type: REVERSE
checkout OPS
commit id:'Network policies'
checkout TEST-env
merge OPS type: HIGHLIGHT tag:'T07'
checkout OPS
commit id:'k0s install on METAL cluster' tag:'K01'
commit id:'Flux config. for METAL cluster' tag:'K02'
branch METAL_TEST-PROD order:3
commit id:'ROCKY/MOVY tenants on METAL' type: HIGHLIGHT
checkout OPS
commit id:'Flux config. for OpenEBS' tag:'K03'
checkout METAL_TEST-PROD
merge OPS id:'openEBS on METAL' type: HIGHLIGHT
checkout OPS
commit id:'Prometheus install'
checkout TEST-env
merge OPS type: HIGHLIGHT
checkout OPS
commit id:'Kyverno install'
commit id:'Kyverno rules'
checkout TEST-env
merge OPS type: HIGHLIGHT
checkout OPS
commit id:'Flux config. for PROD tenant' tag:'P01'
branch PROD-env order:2
commit id:'ROCKY tenant on PROD'
checkout OPS
commit id:'ROCKY patch for PROD' tag:'R04'
checkout PROD-env
merge OPS id:'PROD ready to deploy ROCKY' type: HIGHLIGHT
checkout PROD-env
merge ROCKY tag:'ROCKY v1.0.2'
checkout MOVY
commit id:'MOVY HELM chart' tag:'M03'
checkout TEST-env
merge MOVY tag:'MOVY v1.0'
</pre>

View File

@@ -0,0 +1,410 @@
# T02- creating **_⚗TEST_** env on our **_☁CLOUDY_** cluster
Let's take a look at our **_☁CLOUDY_** cluster!
**_☁CLOUDY_** is a Kubernetes cluster created with [Scaleway Kapsule](https://www.scaleway.com/en/kubernetes-kapsule/) managed service
This managed cluster comes preinstalled with specific features:
- Kubernetes dashboard
- specific _Storage Classes_ based on Scaleway _IaaS_ block storage offerings
- a `Cilium` _CNI_ stack already set up
---
## Accessing the managed Kubernetes cluster
To access our cluster, we'll connect via [`shpod`](https://github.com/jpetazzo/shpod)
.lab[
- If you already have `kubectl` on your desktop computer
```bash
kubectl -n shpod run shpod --image=jpetazzo/shpod
kubectl -n shpod exec -it shpod -- bash
```
- or directly via ssh
```bash
ssh -p myPort k8s@mySHPODSvcIpAddress
```
]
---
## Flux installation
Once `Flux` is installed,
the **_⚙OPS_** team exclusively operates its clusters by updating a code base in a `Github` repository
_GitOps_ and `Flux` enable the **_⚙OPS_** team to rely on the _first-class citizen pattern_ in Kubernetes' world through these steps:
- describe the **desired target state**
- and let the **automated convergence** happen
---
### Checking prerequisites
The `Flux` _CLI_ is available in our `shpod` pod
Before installation, we need to check that:
- `Flux` _CLI_ is correctly installed
- it can connect to the `API server`
- our versions of `Flux` and Kubernetes are compatible
.lab[
```bash
k8s@shpod:~$ flux --version
flux version 2.5.1
k8s@shpod:~$ flux check --pre
► checking prerequisites
✔ Kubernetes 1.32.3 >=1.30.0-0
✔ prerequisites checks passed
```
]
---
### Git repository for Flux configuration
The **_⚙OPS_** team uses `Flux` _CLI_
- to create a `git` repository named `fleet-config-using-flux-XXXXX` (⚠ replace `XXXXX` by a personal suffix)
- in our `Github` organization named `container-training-fleet`
Prerequisites are:
- `Flux` _CLI_ needs a `Github` personal access token (_PAT_)
- to create and/or access the `Github` repository
- to give permissions to existing teams in our `Github` organization
- The PAT needs _CRUD_ permissions on our `Github` organization
- repositories
- admin:public_key
- users
- As the **_⚙OPS_** team, let's create a `Github` personal access token…
---
class: pic
![Generating a Github personal access token](images/M6-github-add-token.jpg)
---
### Creating dedicated `Github` repo to host Flux config
.lab[
- let's replace the `GITHUB_TOKEN` value by our _Personal Access Token_
- and the `GITHUB_REPO` value by our specific repository name
```bash
k8s@shpod:~$ export GITHUB_TOKEN="my-token" && \
export GITHUB_USER="container-training-fleet" && \
export GITHUB_REPO="fleet-config-using-flux-XXXXX"
k8s@shpod:~$ flux bootstrap github \
--owner=${GITHUB_USER} \
--repository=${GITHUB_REPO} \
--team=OPS \
--team=ROCKY --team=MOVY \
--path=clusters/CLOUDY
```
]
---
class: extra-details
Here is the result
```bash
✔ repository "https://github.com/container-training-fleet/fleet-config-using-flux-XXXXX" created
► reconciling repository permissions
✔ granted "maintain" permissions to "OPS"
✔ granted "maintain" permissions to "ROCKY"
✔ granted "maintain" permissions to "MOVY"
► reconciling repository permissions
✔ reconciled repository permissions
► cloning branch "main" from Git repository "https://github.com/container-training-fleet/fleet-config-using-flux-XXXXX.git"
✔ cloned repository
► generating component manifests
✔ generated component manifests
✔ committed component manifests to "main" ("7c97bdeb5b932040fd8d8a65fe1dc84c66664cbf")
► pushing component manifests to "https://github.com/container-training-fleet/fleet-config-using-flux-XXXXX.git"
✔ component manifests are up to date
► installing components in "flux-system" namespace
✔ installed components
✔ reconciled components
► determining if source secret "flux-system/flux-system" exists
► generating source secret
✔ public key: ecdsa-sha2-nistp384 AAAAE2VjZHNhLXNoYTItbmlzdHAzODQAAAAIbmlzdHAzODQAAABhBFqaT8B8SezU92qoE+bhnv9xONv9oIGuy7yVAznAZfyoWWEVkgP2dYDye5lMbgl6MorG/yjfkyo75ETieAE49/m9D2xvL4esnSx9zsOLdnfS9W99XSfFpC2n6soL+Exodw==
✔ configured deploy key "flux-system-main-flux-system-./clusters/CLOUDY" for "https://github.com/container-training-fleet/fleet-config-using-flux-XXXXX"
► applying source secret "flux-system/flux-system"
✔ reconciled source secret
► generating sync manifests
✔ generated sync manifests
✔ committed sync manifests to "main" ("11035e19cabd9fd2c7c94f6e93707f22d69a5ff2")
► pushing sync manifests to "https://github.com/container-training-fleet/fleet-config-using-flux-XXXXX.git"
► applying sync manifests
✔ reconciled sync configuration
◎ waiting for GitRepository "flux-system/flux-system" to be reconciled
✔ GitRepository reconciled successfully
◎ waiting for Kustomization "flux-system/flux-system" to be reconciled
✔ Kustomization reconciled successfully
► confirming components are healthy
✔ helm-controller: deployment ready
✔ kustomize-controller: deployment ready
✔ notification-controller: deployment ready
✔ source-controller: deployment ready
✔ all components are healthy
```
---
### Flux configures Github repository access for teams
- `Flux` sets up permissions that allow teams within our organization to **access** the `Github` repository as maintainers
- Teams need to exist before `Flux` proceeds to this configuration
![Teams in Github](images/M6-github-teams.png)
---
### ⚠️ Disclaimer
- In this lab, adding these teams as maintainers was merely a demonstration of how `Flux` _CLI_ sets up permissions in Github
- But there is no need for dev teams to have access to this `Github` repository
- One advantage of _GitOps_ lies in its ability to easily set up 💪🏼 **Separation of concerns** by using multiple `Flux` sources
---
### 📂 Flux config files
`Flux` has been successfully installed onto our **_☁CLOUDY_** Kubernetes cluster!
Its configuration is managed through a _Gitops_ workflow sourced directly from our `Github` repository
Let's review the `Flux` configuration files we've created and pushed into the `Github` repository…
… as well as the corresponding components running in our Kubernetes cluster
![Flux config files](images/M6-flux-config-files.png)
---
class: pic
<!-- FIXME: wrong schema -->
![Flux architecture](images/M6-flux-controllers.png)
---
class: extra-details
### Flux resources 1/2
.lab[
```bash
k8s@shpod:~$ kubectl get all --namespace flux-system
NAME READY STATUS RESTARTS AGE
pod/helm-controller-b6767d66-h6qhk 1/1 Running 0 5m
pod/kustomize-controller-57c7ff5596-94rnd 1/1 Running 0 5m
pod/notification-controller-58ffd586f7-zxfvk 1/1 Running 0 5m
pod/source-controller-6ff87cb475-g6gn6 1/1 Running 0 5m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/notification-controller ClusterIP 10.104.139.156 <none> 80/TCP 5m1s
service/source-controller ClusterIP 10.106.120.137 <none> 80/TCP 5m
service/webhook-receiver ClusterIP 10.96.28.236 <none> 80/TCP 5m
()
```
]
---
class: extra-details
### Flux resources 2/2
.lab[
```bash
k8s@shpod:~$ kubectl get all --namespace flux-system
()
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/helm-controller 1/1 1 1 5m
deployment.apps/kustomize-controller 1/1 1 1 5m
deployment.apps/notification-controller 1/1 1 1 5m
deployment.apps/source-controller 1/1 1 1 5m
NAME DESIRED CURRENT READY AGE
replicaset.apps/helm-controller-b6767d66 1 1 1 5m
replicaset.apps/kustomize-controller-57c7ff5596 1 1 1 5m
replicaset.apps/notification-controller-58ffd586f7 1 1 1 5m
replicaset.apps/source-controller-6ff87cb475 1 1 1 5m
```
]
---
### Flux components
- the `source-controller` monitors `Git` repositories (and other sources) and makes their contents available to the other controllers
- the `kustomize-controller` applies the Kubernetes resources found in those sources onto the cluster
- the `helm-controller` checks for new `Helm` _chart_ releases in `Helm` repositories and installs updates as needed
- _CRDs_ store `Flux` configuration within the Kubernetes control plane (see the quick check below)
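To see these _CRDs_ on the cluster (the exact list depends on the installed `Flux` version):
```bash
k8s@shpod:~$ kubectl get crds | grep toolkit.fluxcd.io
```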
---
class: extra-details
### Flux resources that have been created
.lab[
```bash
k8s@shpod:~$ flux get all --all-namespaces
NAMESPACE NAME REVISION SUSPENDED
READY MESSAGE
flux-system gitrepository/flux-system main@sha1:d48291a8 False
True stored artifact for revision 'main@sha1:d48291a8'
NAMESPACE NAME REVISION SUSPENDED
READY MESSAGE
flux-system kustomization/flux-system main@sha1:d48291a8 False
True Applied revision: main@sha1:d48291a8
```
]
---
### Flux CLI
`Flux` Command-Line Interface fulfills 3 primary functions:
1. It installs and configures the first mandatory `Flux` resources in a _GitOps_ `git` repository
- ensuring proper access and permissions
2. It locally generates `YAML` files for desired `Flux` resources so that we just need to `git push` them
- _tenants_
- sources
3. It requests the API server to manage `Flux`-related resources
- _operators_
- _CRDs_
- logs
---
class: extra-details
### Flux -- for more info
Please, refer to the [`Flux` chapter in the High Five M3 module](./3.yml.html#toc-helm-chart-format)
---
### Flux relies on Kustomize
The `Flux` component named `kustomize-controller` looks for `Kustomize` resources in `Flux` code-based sources
1. `Kustomize` looks for `YAML` manifests listed in the `kustomization.yaml` file
2. and aggregates, hydrates, and patches them according to the `kustomization` configuration
---
class: extra-details
### 2 different kustomization resources
⚠️ `Flux` uses 2 distinct resources with `kind: Kustomization`
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
```
describes how Kustomize (the _CLI_ tool) aggregates and transforms `YAML` manifests into a single set of resources
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
```
describes where `Flux kustomize-controller` looks for a `kustomization.yaml` file in a given `Flux` code-based source
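For reference, a minimal example of the first kind (the `kustomization.yaml` consumed by the Kustomize _CLI_, like the one generated earlier with `kustomize create --autodetect`) might look like this:
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - rbac.yaml
  - sync.yaml
```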
---
class: extra-details
### Kustomize -- for more info
Please, refer to the [`Kustomize` chapter in the High Five M3 module](./3.yml.html#toc-kustomize)
---
class: extra-details
### Group / Version / Kind -- for more info
For more info about how Kubernetes resource natures are identified by their `Group / Version / Kind` triplet…
… please, refer to the [`Kubernetes API` chapter in the High Five M5 module](./5.yml.html#toc-the-kubernetes-api)
---
### 🗺️ Where are we in our scenario?
<pre class="mermaid">
%%{init:
{
"theme": "default",
"gitGraph": {
"mainBranchName": "OPS",
"mainBranchOrder": 0
}
}
}%%
gitGraph
commit id:"0" tag:"start"
branch ROCKY order:3
branch MOVY order:4
branch YouRHere order:5
checkout OPS
commit id:'Flux install on CLOUDY cluster' tag:'T01'
branch TEST-env order:1
commit id:'FLUX install on TEST' tag:'T02' type: HIGHLIGHT
checkout YouRHere
commit id:'x'
checkout OPS
merge YouRHere id:'YOU ARE HERE'
checkout OPS
commit id:'Flux config. for TEST tenant' tag:'T03'
commit id:'namespace isolation by RBAC'
checkout TEST-env
merge OPS id:'ROCKY tenant creation' tag:'T04'
checkout OPS
commit id:'ROCKY deploy. config.' tag:'R01'
checkout TEST-env
merge OPS id:'TEST ready to deploy ROCKY' type: HIGHLIGHT tag:'R02'
checkout ROCKY
commit id:'ROCKY' tag:'v1.0.0'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.0'
</pre>

View File

@@ -0,0 +1,200 @@
# Multi-tenants management with Flux
💡 Thanks to `Flux`, we can manage Kubernetes resources from inside the clusters.
The **_⚙OPS_** team uses `Flux` with a _GitOps_ code base to:
- configure the clusters
- deploy tools and components to extend the clusters' capabilities
- configure _GitOps_ workflow for dev teams in **dedicated and isolated _tenants_**
The **_🎸ROCKY_** team uses `Flux` to deploy every new release of its app, by detecting every new `git push` event happening in its app `Github` repository
The **_🎬MOVY_** team uses `Flux` to deploy every new release of its app, packaged and published in a new `Helm` chart release
---
## Creating _tenants_ with Flux
While basic `Flux` behavior is to use a single configuration directory applied by a cluster-wide role…
… it can also enable _multi-tenant_ configuration by:
- creating dedicated directories for each _tenant_ in its configuration code base (see the layout sketch after this list)
- and using a dedicated `ServiceAccount` with limited permissions to operate in each _tenant_
Several _tenants_ are created
- per env
- for **_⚗TEST_**
- and **_🏭PROD_**
- per team
- for **_🎸ROCKY_**
- and **_🎬MOVY_**
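Assuming the paths used in this module, the resulting layout of the `Flux` configuration repository might look like this (simplified sketch; the `movy` directories are shown for symmetry):
```
fleet-config-using-flux-XXXXX/
├── clusters/
│   └── CLOUDY/
│       ├── flux-system/     # generated by `flux bootstrap`
│       └── tenants.yaml     # Kustomizations pointing to ./tenants/test and ./tenants/prod
└── tenants/
    ├── base/
    │   ├── rocky/           # config common to both envs
    │   └── movy/
    ├── test/
    │   ├── rocky/           # TEST-specific patches
    │   └── movy/
    └── prod/
```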
---
class: pic
![Multi-tenants clusters](images/M6-cluster-multi-tenants.png )
---
### Flux CLI works locally
First, we have to **locally** clone our `Flux` configuration `Github` repository (a sketch of these steps follows the list below)
- create an ssh key pair
- add the **public** key to your `Github` repository (**with write access**)
- and git clone the repository
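A minimal sketch of these steps (the key file name is an assumption; adapt the repository name to your own suffix):
```bash
# generate a dedicated ssh keypair (no passphrase, for simplicity)
k8s@shpod:~$ ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N ""
# copy the content of ~/.ssh/id_ed25519.pub into the repository's
# deploy keys (Settings → Deploy keys, with "Allow write access" checked)
k8s@shpod:~$ git clone git@github.com:container-training-fleet/fleet-config-using-flux-XXXXX.git
k8s@shpod:~$ cd fleet-config-using-flux-XXXXX/
```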
---
### The command line 1/2
Creating the **_⚗TEST_** _tenant_
.lab[
- ⚠️ Think about renaming the repo with your own suffix
```bash
k8s@shpod:~$ cd fleet-config-using-flux-XXXXX/
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
flux create kustomization tenant-test \
--namespace=flux-system \
--source=GitRepository/flux-system \
--path ./tenants/test \
--interval=1m \
--prune --export >> clusters/CLOUDY/tenants.yaml
```
]
---
### The command line 2/2
Then we create the **_🏭PROD_** _tenant_
.lab[
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
flux create kustomization tenant-prod \
--namespace=flux-system \
--source=GitRepository/flux-system \
--path ./tenants/prod \
--interval=3m \
--prune --export >> clusters/CLOUDY/tenants.yaml
```
]
---
### 📂 Flux tenants.yaml files
Let's review the `fleet-config-using-flux-XXXXX/clusters/CLOUDY/tenants.yaml` file
⚠️ The last commands we typed with the `Flux` _CLI_ only create the `YAML` manifests **locally**
> ☝🏻 Don't forget to `git commit` and `git push` to `Github`!
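As a sketch, the `tenant-test` entry of the generated file should look roughly like this (exact fields depend on your `Flux` version):
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: tenant-test
  namespace: flux-system
spec:
  interval: 1m0s
  path: ./tenants/test
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```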
---
class: pic
![Running Mario](images/M6-running-Mario.gif)
---
### Our 1st Flux error
.lab[
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ flux get all
NAMESPACE NAME REVISION SUSPENDED
READY MESSAGE
flux-system gitrepository/flux-system main@sha1:0466652e False
True stored artifact for revision 'main@sha1:0466652e'
NAMESPACE NAME REVISION SUSPENDED
READY MESSAGE
kustomization/flux-system main@sha1:0466652e False True
Applied revision: main@sha1:0466652e
kustomization/tenant-prod False False
kustomization path not found: stat /tmp/kustomization-417981261/tenants/prod: no such file or directory
kustomization/tenant-test False False
kustomization path not found: stat /tmp/kustomization-2532810750/tenants/test: no such file or directory
```
]
> Our configuration may be incomplete 😅
---
## Configuring Flux for the **_🎸ROCKY_** team
What the **_⚙OPS_** team has to do:
- 🔧 Create a dedicated `rocky` _tenant_ for **_⚗TEST_** and **_🏭PROD_** envs on the cluster
- 🔧 Create the `Flux` source pointing to the `Github` repository embedding the **_🎸ROCKY_** app source code
- 🔧 Add a `kustomize` _patch_ into the global `Flux` config to include this specific `Flux` config. dedicated to the deployment of the **_🎸ROCKY_** app
What the **_🎸ROCKY_** team has to do:
- 👨‍💻 Create the `kustomization.yaml` file in the **_🎸ROCKY_** app source code repository on `Github`
---
### 🗺️ Where are we in our scenario?
<pre class="mermaid">
%%{init:
{
"theme": "default",
"gitGraph": {
"mainBranchName": "OPS",
"mainBranchOrder": 0
}
}
}%%
gitGraph
commit id:"0" tag:"start"
branch ROCKY order:3
branch MOVY order:4
branch YouRHere order:5
checkout OPS
commit id:'Flux install on CLOUDY cluster' tag:'T01'
branch TEST-env order:1
commit id:'FLUX install on TEST' tag:'T02' type: HIGHLIGHT
checkout OPS
commit id:'Flux config. for TEST tenant' tag:'T03'
commit id:'namespace isolation by RBAC'
checkout TEST-env
merge OPS id:'ROCKY tenant creation' tag:'T04'
checkout YouRHere
commit id:'x'
checkout OPS
merge YouRHere id:'YOU ARE HERE'
checkout OPS
commit id:'ROCKY deploy. config.' tag:'R01'
checkout TEST-env
merge OPS id:'TEST ready to deploy ROCKY' type: HIGHLIGHT tag:'R02'
checkout ROCKY
commit id:'ROCKY' tag:'v1.0.0'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.0'
</pre>

View File

@@ -0,0 +1,284 @@
# T05- Configuring ingress for **_🎸ROCKY_** app
🍾 **_🎸ROCKY_** team has just deployed its `v1.0.0`
We would like to reach it from our workstations
The regular way to do it in Kubernetes is to configure an `Ingress` resource.
- `Ingress` is an abstract resource that describes how services are exposed outside of the Kubernetes cluster (Layer 7).
- It relies on one or more `ingress-controller`s: the technical components that actually implement the ingress rules.
- Available features vary, depending on the `ingress-controller`: load-balancing, networking, firewalling, API management, throttling, TLS encryption, etc.
- An `ingress-controller` may provision Cloud _IaaS_ network resources such as load-balancers, persistent IPs, etc. (a minimal `Ingress` example is sketched below)
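As an illustration, a minimal `Ingress` for the **_🎸ROCKY_** app might look like the sketch below (the hostname is the one used later in this chapter; the actual `M6-rocky-ingress.yaml` manifest may differ):
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rocky
spec:
  ingressClassName: nginx
  rules:
    - host: rocky.test.mybestdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web     # the ROCKY web Service, listening on port 80
                port:
                  number: 80
```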
---
class: extra-details
## Ingress -- for more info
Please, refer to the [`Ingress` chapter in the High Five M2 module](./2.yml.html#toc-exposing-http-services-with-ingress-resources)
---
## Installing `ingress-nginx` as our `ingress-controller`
We'll use `ingress-nginx` (relying on `NGINX`), quite a popular choice.
- It is able to provision an _IaaS_ load-balancer in Scaleway Cloud services
- As a reverse-proxy, it is able to balance HTTP connections on an on-premises cluster
The **_⚙OPS_** team adds this new install to its `Flux` config repo
---
### Creating a `Github` source in Flux for `ingress-nginx`
.lab[
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
mkdir -p ./clusters/CLOUDY/ingress-nginx && \
flux create source git ingress-nginx \
--namespace=ingress-nginx \
--url=https://github.com/kubernetes/ingress-nginx/ \
--branch=release-1.12 \
--export > ./clusters/CLOUDY/ingress-nginx/sync.yaml
```
]
---
### Creating `kustomization` in Flux for `ingress-nginx`
.lab[
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ flux create kustomization ingress-nginx \
--namespace=ingress-nginx \
--source=GitRepository/ingress-nginx \
--path="./deploy/static/provider/scw/" \
--export >> ./clusters/CLOUDY/ingress-nginx/sync.yaml
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
cp -p ~/container.training/k8s/M6-ingress-nginx-kustomization.yaml \
./clusters/CLOUDY/ingress-nginx/kustomization.yaml && \
cp -p ~/container.training/k8s/M6-ingress-nginx-components.yaml \
~/container.training/k8s/M6-ingress-nginx-*-patch.yaml \
./clusters/CLOUDY/ingress-nginx/
```
]
---
### Applying the new config
.lab[
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
git add ./clusters/CLOUDY/ingress-nginx && \
git commit -m':wrench: :rocket: add Ingress-controller' && \
git push
```
]
---
class: pic
![Running Mario](images/M6-running-Mario.gif)
---
class: pic
![Ingress-nginx provisioned an IaaS load-balancer in Scaleway Cloud services](images/M6-ingress-nginx-scaleway-lb.png)
---
class: extra-details
### Using external Git source
💡 Note that you can directly use a public `Github` repository (not maintained by your company).
- If you have to alter the configuration, `Kustomize` patching capabilities might help.
- Depending on the _gitflow_ this repository uses, updates will be deployed automatically to your cluster (here we're using a `release` branch).
- This repo exposes a `kustomization.yaml`. Well done!
---
## Adding the `ingress` resource to ROCKY app
.lab[
- Add the new manifest to our kustomization bunch
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
cp -pr ~/container.training/k8s/M6-rocky-ingress.yaml ./tenants/base/rocky && \
echo '- M6-rocky-ingress.yaml' >> ./tenants/base/rocky/kustomization.yaml
```
- Commit, and it's done
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
git add . && \
git commit -m':wrench: :rocket: add Ingress' && \
git push
```
]
---
class: pic
![Running Mario](images/M6-running-Mario.gif)
---
### Here is the result
Once Flux has reconciled all the sources and kustomizations, you should see
- `Ingress-NGinX` controller components in `ingress-nginx` namespace
- A new `Ingress` in `rocky-test` namespace
.lab[
```bash
k8s@shpod:~$ kubectl get all -n ingress-nginx && \
kubectl get ingress -n rocky-test
k8s@shpod:~$ \
PublicIP=$(kubectl get ingress rocky -n rocky-test \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}')
k8s@shpod:~$ \
curl --header 'Host: rocky.test.mybestdomain.com' http://$PublicIP/
```
]
---
class: pic
![Rocky application screenshot](images/M6-rocky-app-screenshot.png)
---
## Upgrading **_🎸ROCKY_** app
**_🎸ROCKY_** team is now fully able to upgrade and deploy its app autonomously.
Just give it a try!
- In the `deployment.yaml` file
- in the app repo (https://github.com/Musk8teers/container.training-spring-music/)
- you can change the `spec.template.spec.containers.image` to `1.0.1` and then to `1.0.2`
Don't forget which branch is watched by the `Flux` Git source named `rocky-app`
Don't forget to commit!
---
## Few considerations
- The **_⚙OPS_** team has to decide how to manage name resolution for public IPs
- Scaleway proposes to expose a wildcard domain for its Kubernetes clusters
- Here, we chose to have both the `Ingress-controller` (which makes sense) and the `Ingress` resources managed by the **_⚙OPS_** team.
- It could have been done in many different ways!
---
### 🗺️ Where are we in our scenario?
<pre class="mermaid">
%%{init:
{
"theme": "default",
"gitGraph": {
"mainBranchName": "OPS",
"mainBranchOrder": 0
}
}
}%%
gitGraph
commit id:"0" tag:"start"
branch ROCKY order:3
branch MOVY order:4
branch YouRHere order:5
checkout OPS
commit id:'Flux install on CLOUDY cluster' tag:'T01'
branch TEST-env order:1
commit id:'FLUX install on TEST' tag:'T02' type: HIGHLIGHT
checkout OPS
commit id:'Flux config. for TEST tenant' tag:'T03'
commit id:'namespace isolation by RBAC'
checkout TEST-env
merge OPS id:'ROCKY tenant creation' tag:'T04'
checkout OPS
commit id:'ROCKY deploy. config.' tag:'R01'
checkout TEST-env
merge OPS id:'TEST ready to deploy ROCKY' type: HIGHLIGHT tag:'R02'
checkout ROCKY
commit id:'ROCKY' tag:'v1.0.0'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.0'
checkout OPS
commit id:'Ingress-controller config.' tag:'T05'
checkout TEST-env
merge OPS id:'Ingress-controller install' type: HIGHLIGHT tag:'T06'
checkout OPS
commit id:'ROCKY patch for ingress config.' tag:'R03'
checkout TEST-env
merge OPS id:'ingress config. for ROCKY app'
checkout ROCKY
commit id:'blue color' tag:'v1.0.1'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.1'
checkout ROCKY
commit id:'pink color' tag:'v1.0.2'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.2'
checkout YouRHere
commit id:'x'
checkout OPS
merge YouRHere id:'YOU ARE HERE'
checkout OPS
commit id:'FLUX config for MOVY deployment' tag:'M01'
checkout TEST-env
merge OPS id:'FLUX ready to deploy MOVY' type: HIGHLIGHT tag:'M02'
checkout MOVY
commit id:'MOVY' tag:'v1.0.3'
checkout TEST-env
merge MOVY tag:'MOVY v1.0.3' type: REVERSE
checkout OPS
commit id:'Network policies'
checkout TEST-env
merge OPS type: HIGHLIGHT
</pre>

View File

@@ -0,0 +1,353 @@
# Installing a Kubernetes cluster from scratch
We operated a managed cluster from **Scaleway** `Kapsule`.
It's great! Most batteries are included:
- storage classes, with an already configured default one
- a default CNI with `Cilium`
<br/>(`Calico` is supported too)
- an _IaaS_ load-balancer that is manageable by `ingress-controllers`
- a management _WebUI_ with the Kubernetes dashboard
- an observability stack with `metrics-server` and the Kubernetes dashboard
But what about _on premises_ needs?
---
class: extra-details
## On premises Kubernetes distributions
The [CNCF landscape](https://landscape.cncf.io/?fullscreen=yes&zoom=200&group=certified-partners-and-providers) currently lists **61** Kubernetes distributions!
Not speaking of Kubernetes managed services from Cloud providers…
Please, refer to the [`Setting up Kubernetes` chapter in the High Five M2 module](./2.yml.html#toc-setting-up-kubernetes) for more info about Kubernetes distributions.
---
## Introducing k0s
Nowadays, some "light" distros are considered good enough to run production clusters.
That's the case for `k0s`.
It's an open source Kubernetes lightweight distribution.
It is mainly backed by **Mirantis**, a long-time software vendor in the Kubernetes ecosystem.
(the ones who bought `Docker Enterprise` a while ago, remember?)
`k0s` aims to be both
- a lightweight distribution for _edge-computing_ and development purposes
- an enterprise-grade HA distribution fully supported by its vendor
<br/>`MKE4` and `k0rdent` leverage `k0s`
---
### `k0s` package
Its single binary includes:
- a CRI (`containerd`)
- vanilla Kubernetes control plane components (including `etcd`)
- a vanilla network stack
- `kube-router`
- `kube-proxy`
- `coredns`
- `konnectivity`
- `kubectl` CLI
- install / uninstall features
- backup / restore features
---
class: pic
![k0s package](images/M6-k0s-packaging.png)
---
class: extra-details
### Konnectivity
You've seen that Kubernetes cluster architecture is very versatile.
I'm referring to the [`Kubernetes architecture` chapter in the High Five M5 module](./5.yml.html#toc-kubernetes-architecture)
Network communications between control plane components and worker nodes can be tricky to configure.
`Konnectivity` is a response to this pain. It acts as an RPC proxy for any communication initiated from control plane to workers.
These communications are listed in [`Kubernetes internal APIs` chapter in the High Five M5 module](https://2025-01-enix.container.training/5.yml.html#toc-kubernetes-internal-apis)
The agent deployed on each worker node maintains an RPC tunnel with its counterpart deployed on the control plane side.
---
class: pic
![konnectivity architecture](images/M6-konnectivity-architecture.png)
---
## Installing `k0s`
It installs with a one-liner command
- either in a single-node, lightweight footprint
- or in a multi-node HA footprint
.lab[
- Get the binary (we'll make it executable right after this lab)
```bash
docker@m621: ~$ wget https://github.com/k0sproject/k0sctl/releases/download/v0.25.1/k0sctl-linux-amd64
```
]
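The downloaded file is not executable yet; a couple of extra commands (a sketch, with an assumed install path) make it usable as `k0sctl`:
```bash
docker@m621: ~$ chmod +x k0sctl-linux-amd64 && \
  sudo mv k0sctl-linux-amd64 /usr/local/bin/k0sctl
docker@m621: ~$ k0sctl version
```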
---
### Prepare the config file
.lab[
- Create the config file
```bash
docker@m621: ~$ k0sctl init \
--controller-count 3 \
--user docker \
--k0s m621 m622 m623 > k0sctl.yaml
```
- change the following field: `spec.hosts[*].role: controller+worker`
- add the following field: `spec.hosts[*].noTaints: true` (see the excerpt after this lab)
```bash
docker@m621: ~$ k0sctl apply --config k0sctl.yaml
```
]
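After these edits, each host entry in `k0sctl.yaml` should contain something like this excerpt (a sketch; other generated fields are omitted):
```yaml
spec:
  hosts:
    - ssh:
        address: m621      # assuming hostnames resolve to the node IPs
        user: docker
      role: controller+worker
      noTaints: true
    # same role / noTaints settings for m622 and m623
```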
---
### And the famous one-liner
.lab[
```bash
k8s@shpod: ~$ k0sctl apply --config k0sctl.yaml
```
]
---
### Check that k0s installed correctly
.lab[
```bash
docker@m621 ~$ sudo k0s status
Version: v1.33.1+k0s.1
Process ID: 60183
Role: controller
Workloads: true
SingleNode: false
Kube-api probing successful: true
Kube-api probing last error:
docker@m621 ~$ sudo k0s etcd member-list
{"members":{"m621":"https://10.10.3.190:2380","m622":"https://10.10.2.92:2380","m623":"https://10.10.2.110:2380"}}
```
]
---
### `kubectl` is included
.lab[
```bash
docker@m621 ~$ sudo k0s kubectl get nodes
NAME STATUS ROLES AGE VERSION
m621 Ready control-plane 66m v1.33.1+k0s
m622 Ready control-plane 66m v1.33.1+k0s
m623 Ready control-plane 66m v1.33.1+k0s
docker@m621 ~$ sudo k0s kubectl run shpod --image jpetazzo/shpod
```
]
---
class: extra-details
### Single node install (for info!)
For testing purposes, you may want to use a single-node (yet `etcd`-geared) install…
.lab[
- Install it
```bash
docker@m621 ~$ curl -sSLf https://get.k0s.sh | sudo sh
docker@m621 ~$ sudo k0s install controller --single
docker@m621 ~$ sudo k0s start
```
- Reset it
```bash
docker@m621 ~$ sudo k0s stop
docker@m621 ~$ sudo k0s reset
```
]
---
## Deploying shpod
.lab[
```bash
docker@m621 ~$ sudo k0s kubectl apply -f https://shpod.in/shpod.yaml
```
]
---
## Flux install
We'll install `Flux`.
And replay the whole scenario a 2nd time.
Let's face it: we don't have that much time. 😅
Since all our installation and configuration is `GitOps`-based, we might just leverage copy-paste and configuration as code…
Maybe.
Let's copy the 📂 `./clusters/CLOUDY` folder and rename it 📂 `./clusters/METAL`.
---
### Modifying Flux config 📄 files
- In 📄 file `./clusters/METAL/flux-system/gotk-sync.yaml`
<br/>change the `Kustomization` value `spec.path: ./clusters/METAL`
- ⚠️ We'll have to adapt the `Flux` _CLI_ command line
- And that's pretty much it!
- We'll see if anything goes wrong on that new cluster
---
### Connecting to our dedicated `Github` repo to host Flux config
.lab[
- let's replace `GITHUB_TOKEN` and `GITHUB_REPO` values
- don't forget to change the path to `clusters/METAL`
```bash
k8s@shpod:~$ export GITHUB_TOKEN="my-token" && \
export GITHUB_USER="container-training-fleet" && \
export GITHUB_REPO="fleet-config-using-flux-XXXXX"
k8s@shpod:~$ flux bootstrap github \
--owner=${GITHUB_USER} \
--repository=${GITHUB_REPO} \
--team=OPS \
--team=ROCKY --team=MOVY \
--path=clusters/METAL
```
]
---
class: pic
![Running Mario](images/M6-running-Mario.gif)
---
### Flux deployed our complete stack
Everything seems to be here but…
- one database is in `Pending` state
- our `ingresses` don't work well
```bash
k8s@shpod ~$ curl --header 'Host: rocky.test.enixdomain.com' http://${myIngressControllerSvcIP}
curl: (52) Empty reply from server
```
---
### Fixing the Ingress
The current `ingress-nginx` configuration leverages specific annotations used by Scaleway to bind an _IaaS_ load-balancer to the `ingress-controller`.
We don't have that kind of thing here. 😕
- We could bind our `ingress-controller` to a `NodePort`.
`ingress-nginx` install manifests propose it here:
<br/>https://github.com/kubernetes/ingress-nginx/deploy/static/provider/baremetal
- In the 📄 file `./clusters/METAL/ingress-nginx/sync.yaml`,
<br/>change the `Kustomization` value `spec.path: ./deploy/static/provider/baremetal`
---
class: pic
![Running Mario](images/M6-running-Mario.gif)
---
### Troubleshooting the database
One of our `db-0` pods is in `Pending` state.
```bash
k8s@shpod ~$ k get pods db-0 -n *-test -oyaml
()
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2025-06-11T11:15:42Z"
message: '0/3 nodes are available: pod has unbound immediate PersistentVolumeClaims.
preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.'
reason: Unschedulable
status: "False"
type: PodScheduled
phase: Pending
qosClass: Burstable
```
---
### Troubleshooting the PersistentVolumeClaims
```bash
k8s@shpod ~$ k get pvc postgresql-data-db-0 -n *-test -o yaml
()
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailedBinding 9s (x182 over 45m) persistentvolume-controller no persistent volumes available for this claim and no storage class is set
```
No `StorageClass` is available on this cluster.
We didn't have this problem on our managed cluster, since a default `StorageClass` was configured and automatically associated with our `PersistentVolumeClaim`.
Why is there no problem with the other database?
---
## Installing OpenEBS as our CSI

View File

@@ -0,0 +1,241 @@
## Introducing Kyverno
Kyverno is a policy management tool that extends Kubernetes admission control to express complex policies…
<br/>… and to override manifests delivered by client teams.
---
class: extra-details
### Kyverno -- for more info
Please, refer to the [`Policy management with Kyverno` chapter in the High Five M4 module](./4.yml.html#toc-policy-management-with-kyverno) for more info about `Kyverno`.
---
## Creating a `Helm` source in Flux for the Kyverno Helm chart
.lab[
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
mkdir -p clusters/CLOUDY/kyverno && \
cp -pr ~/container.training/k8s/
k8s@shpod ~$ flux create source helm kyverno \
--namespace=kyverno \
--url=https://kyverno.github.io/kyverno/ \
--interval=3m \
--export > ./clusters/CLOUDY/kyverno/sync.yaml
```
]
---
## Creating the `HelmRelease` in Flux
.lab[
```bash
k8s@shpod ~$ flux create helmrelease kyverno \
--namespace=kyverno \
--source=HelmRepository/kyverno \
--target-namespace=kyverno \
--create-target-namespace=true \
--chart-version=">=3.4.2" \
--chart=kyverno \
--export >> ./clusters/CLOUDY/kyverno/sync.yaml
```
]
---
## Add Kyverno policy
This policy is just an example.
It enforces the use of a `ServiceAccount` in `Flux` configurations (a sketch of such a policy is shown after the commands below)
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
mkdir -p clusters/CLOUDY/kyverno-policies && \
cp -pr ~/container.training/k8s/M6-kyverno-enforce-service-account.yaml \
./clusters/CLOUDY/kyverno-policies/
```
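The actual `M6-kyverno-enforce-service-account.yaml` file may differ, but such a policy could look roughly like this sketch (names and messages are illustrative):
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: enforce-flux-service-account   # hypothetical name
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-service-account
      match:
        any:
          - resources:
              kinds:
                - Kustomization        # the Flux Kustomization kind, in this context
      validate:
        message: "Flux Kustomizations must specify spec.serviceAccountName."
        pattern:
          spec:
            serviceAccountName: "?*"   # the field must exist and be non-empty
```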
---
### Creating `kustomization` in Flux for Kyverno policies
.lab[
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ \
flux create kustomization kyverno-policies \
--namespace=kyverno \
--source=GitRepository/flux-system \
--path="./clusters/CLOUDY/kyverno-policies/" \
--prune true --interval 5m \
--depends-on kyverno \
--export >> ./clusters/CLOUDY/kyverno-policies/sync.yaml
```
]
---
## Add Kyverno dependency for **_⚗TEST_** cluster
- Now that we've got `Kyverno` policies…
- …the **_⚙OPS_** team wants every upgrade of any kustomization in our dev team _tenants_
- …to wait for the `Kyverno` policies to be reconciled first (from a `Flux` perspective)
- To do so, update the file `./clusters/CLOUDY/tenants.yaml`…
- …by adding this property: `spec.dependsOn: [{name: kyverno-policies}]` (see the excerpt below)
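The change to each tenant entry in `tenants.yaml` would look roughly like this excerpt (a sketch):
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: tenant-test
  namespace: flux-system
spec:
  dependsOn:
    - name: kyverno-policies   # add `namespace: ...` if the target lives in another namespace
  # ...rest of the spec unchanged
```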
---
class: pic
![Running Mario](images/M6-running-Mario.gif)
---
### Debugging
The `kyverno-policies` `Kustomization` failed, because the `spec.dependsOn` property can only target a resource of the same `Kind` (and `kyverno` is a `HelmRelease`, not a `Kustomization`).
- Let's remove that `spec.dependsOn` property.
Now the `Kustomizations` for the **_🎸ROCKY_** and **_🎬MOVY_** tenants fail because of our policies.
---
### 🗺️ Where are we in our scenario?
<pre class="mermaid">
%%{init:
{
"theme": "default",
"gitGraph": {
"mainBranchName": "OPS",
"mainBranchOrder": 0
}
}
}%%
gitGraph
commit id:"0" tag:"start"
branch ROCKY order:4
branch MOVY order:5
branch YouRHere order:6
checkout OPS
commit id:'Flux install on CLOUDY cluster' tag:'T01'
branch TEST-env order:1
commit id:'FLUX install on TEST' tag:'T02' type: HIGHLIGHT
checkout OPS
commit id:'Flux config. for TEST tenant' tag:'T03'
commit id:'namespace isolation by RBAC'
checkout TEST-env
merge OPS id:'ROCKY tenant creation' tag:'T04'
checkout OPS
commit id:'ROCKY deploy. config.' tag:'R01'
checkout TEST-env
merge OPS id:'TEST ready to deploy ROCKY' type: HIGHLIGHT tag:'R02'
checkout ROCKY
commit id:'ROCKY' tag:'v1.0.0'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.0'
checkout OPS
commit id:'Ingress-controller config.' tag:'T05'
checkout TEST-env
merge OPS id:'Ingress-controller install' type: HIGHLIGHT tag:'T06'
checkout OPS
commit id:'ROCKY patch for ingress config.' tag:'R03'
checkout TEST-env
merge OPS id:'ingress config. for ROCKY app'
checkout ROCKY
commit id:'blue color' tag:'v1.0.1'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.1'
checkout ROCKY
commit id:'pink color' tag:'v1.0.2'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.2'
checkout OPS
commit id:'FLUX config for MOVY deployment' tag:'M01'
checkout TEST-env
merge OPS id:'FLUX ready to deploy MOVY' type: HIGHLIGHT tag:'M02'
checkout MOVY
commit id:'MOVY' tag:'v1.0.3'
checkout TEST-env
merge MOVY tag:'MOVY v1.0.3' type: REVERSE
checkout OPS
commit id:'Network policies'
checkout TEST-env
merge OPS type: HIGHLIGHT tag:'T07'
checkout OPS
commit id:'k0s install on METAL cluster' tag:'K01'
commit id:'Flux config. for METAL cluster' tag:'K02'
branch METAL_TEST-PROD order:3
commit id:'ROCKY/MOVY tenants on METAL' type: HIGHLIGHT
checkout OPS
commit id:'Flux config. for OpenEBS' tag:'K03'
checkout METAL_TEST-PROD
merge OPS id:'openEBS on METAL' type: HIGHLIGHT
checkout OPS
commit id:'Prometheus install'
checkout TEST-env
merge OPS type: HIGHLIGHT
checkout OPS
commit id:'Kyverno install'
commit id:'Kyverno rules'
checkout TEST-env
merge OPS type: HIGHLIGHT
checkout YouRHere
commit id:'x'
checkout OPS
merge YouRHere id:'YOU ARE HERE'
checkout OPS
commit id:'Flux config. for PROD tenant' tag:'P01'
branch PROD-env order:2
commit id:'ROCKY tenant on PROD'
checkout OPS
commit id:'ROCKY patch for PROD' tag:'R04'
checkout PROD-env
merge OPS id:'PROD ready to deploy ROCKY' type: HIGHLIGHT
checkout PROD-env
merge ROCKY tag:'ROCKY v1.0.2'
checkout MOVY
commit id:'MOVY HELM chart' tag:'M03'
checkout TEST-env
merge MOVY tag:'MOVY v1.0'
</pre>

View File

@@ -0,0 +1,251 @@
# Install monitoring stack
The **_⚙OPS_** team wants to have a real monitoring stack for its clusters.
Let's deploy `Prometheus` and `Grafana` onto the clusters.
---
## Creating `Github` source in Flux for monitoring components install repository
.lab[
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ mkdir -p clusters/CLOUDY/kube-prometheus-stack
k8s@shpod:~/fleet-config-using-flux-XXXXX$ flux create source git monitoring \
--namespace=monitoring \
--url=https://github.com/fluxcd/flux2-monitoring-example.git \
--branch=main --export > ./clusters/CLOUDY/kube-prometheus-stack/sync.yaml
```
]
---
### Creating `kustomization` in Flux for monitoring stack
.lab[
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ flux create kustomization monitoring \
--namespace=monitoring \
--source=GitRepository/monitoring \
--path="./monitoring/controllers/kube-prometheus-stack/" \
--export >> ./clusters/CLOUDY/kube-prometheus-stack/sync.yaml
```
]
---
### Install Flux Grafana dashboards
.lab[
```bash
k8s@shpod:~/fleet-config-using-flux-XXXXX$ flux create kustomization dashboards \
--namespace=monitoring \
--source=GitRepository/monitoring \
--path="./monitoring/configs/" \
--export >> ./clusters/CLOUDY/kube-prometheus-stack/sync.yaml
```
]
---
class: pic
![Running Mario](images/M6-running-Mario.gif)
---
## Flux repository synchronization is broken 😅
It seems that `Flux` on **_☁CLOUDY_** cluster is not able to authenticate with `ssh` on its `Github` config repository!
What happened?
When we installed `Flux` on the **_🤘METAL_** cluster, it generated a new `ssh` keypair and overrode the one used by **_☁CLOUDY_** among the "deploy keys" of the `Github` repository.
⚠️ Beware of the `flux bootstrap` command!
We have to
- generate a new keypair (or reuse an already existing one)
- add the private key to the Flux-dedicated secrets in **_☁CLOUDY_** cluster
- add the corresponding **public** key to the "deploy keys" of the `Github` repository
---
### the command
.lab[
- `Flux` _CLI_ helps to recreate the secret holding the `ssh` **private** key.
```bash
k8s@shpod:~$ flux create secret git flux-system \
--url=ssh://git@github.com/container-training-fleet/fleet-config-using-flux-XXXXX \
--private-key-file=/home/k8s/.ssh/id_ed25519
```
- copy the **public** key into the deployment keys of the `Github` repository
]
---
class: pic
![Running Mario](images/M6-running-Mario.gif)
---
## Access the Grafana dashboard
.lab[
- Get the `Host` and `IP` address to request
```bash
k8s@shpod:~$ kubectl -n monitoring get ingress
NAME CLASS HOSTS ADDRESS PORTS AGE
grafana nginx grafana.test.metal.mybestdomain.com 62.210.39.83 80 6m30s
```
- Get the `Grafana` admin password
```bash
k8s@shpod:~$ k get secret kube-prometheus-stack-grafana -n monitoring \
-o jsonpath='{.data.admin-password}' | base64 -d
```
]
## And browse…
class: pic
![Grafana dashboard screenshot](images/M6-grafana-dashboard.png)
---
### 🗺️ Where are we in our scenario?
<pre class="mermaid">
%%{init:
{
"theme": "default",
"gitGraph": {
"mainBranchName": "OPS",
"mainBranchOrder": 0
}
}
}%%
gitGraph
commit id:"0" tag:"start"
branch ROCKY order:4
branch MOVY order:5
branch YouRHere order:6
checkout OPS
commit id:'Flux install on CLOUDY cluster' tag:'T01'
branch TEST-env order:1
commit id:'FLUX install on TEST' tag:'T02' type: HIGHLIGHT
checkout OPS
commit id:'Flux config. for TEST tenant' tag:'T03'
commit id:'namespace isolation by RBAC'
checkout TEST-env
merge OPS id:'ROCKY tenant creation' tag:'T04'
checkout OPS
commit id:'ROCKY deploy. config.' tag:'R01'
checkout TEST-env
merge OPS id:'TEST ready to deploy ROCKY' type: HIGHLIGHT tag:'R02'
checkout ROCKY
commit id:'ROCKY' tag:'v1.0.0'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.0'
checkout OPS
commit id:'Ingress-controller config.' tag:'T05'
checkout TEST-env
merge OPS id:'Ingress-controller install' type: HIGHLIGHT tag:'T06'
checkout OPS
commit id:'ROCKY patch for ingress config.' tag:'R03'
checkout TEST-env
merge OPS id:'ingress config. for ROCKY app'
checkout ROCKY
commit id:'blue color' tag:'v1.0.1'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.1'
checkout ROCKY
commit id:'pink color' tag:'v1.0.2'
checkout TEST-env
merge ROCKY tag:'ROCKY v1.0.2'
checkout OPS
commit id:'FLUX config for MOVY deployment' tag:'M01'
checkout TEST-env
merge OPS id:'FLUX ready to deploy MOVY' type: HIGHLIGHT tag:'M02'
checkout MOVY
commit id:'MOVY' tag:'v1.0.3'
checkout TEST-env
merge MOVY tag:'MOVY v1.0.3' type: REVERSE
checkout OPS
commit id:'Network policies'
checkout TEST-env
merge OPS type: HIGHLIGHT tag:'T07'
checkout OPS
commit id:'k0s install on METAL cluster' tag:'K01'
commit id:'Flux config. for METAL cluster' tag:'K02'
branch METAL_TEST-PROD order:3
commit id:'ROCKY/MOVY tenants on METAL' type: HIGHLIGHT
checkout OPS
commit id:'Flux config. for OpenEBS' tag:'K03'
checkout METAL_TEST-PROD
merge OPS id:'openEBS on METAL' type: HIGHLIGHT
checkout OPS
commit id:'Prometheus install'
checkout TEST-env
merge OPS type: HIGHLIGHT
checkout YouRHere
commit id:'x'
checkout OPS
merge YouRHere id:'YOU ARE HERE'
checkout OPS
commit id:'Kyverno install'
commit id:'Kyverno rules'
checkout TEST-env
merge OPS type: HIGHLIGHT
checkout OPS
commit id:'Flux config. for PROD tenant' tag:'P01'
branch PROD-env order:2
commit id:'ROCKY tenant on PROD'
checkout OPS
commit id:'ROCKY patch for PROD' tag:'R04'
checkout PROD-env
merge OPS id:'PROD ready to deploy ROCKY' type: HIGHLIGHT
checkout PROD-env
merge ROCKY tag:'ROCKY v1.0.2'
checkout MOVY
commit id:'MOVY HELM chart' tag:'M03'
checkout TEST-env
merge MOVY tag:'MOVY v1.0'
</pre>

View File

@@ -32,7 +32,7 @@
- Problem mitigation
*block nodes with vulnerable kernels, inject log4j mitigations...*
*block nodes with vulnerable kernels, inject log4j mitigations, rewrite images...*
- Extended validation for operators
@@ -583,19 +583,38 @@ Shell to the rescue!
---
## Coming soon...
## Real world examples
- [kube-image-keeper][kuik] rewrites image references to use cached images
(e.g. `nginx` → `localhost:7439/nginx`)
- [Kyverno] implements very extensive policies
(validation, generation... it deserves a whole chapter on its own!)
[kuik]: https://github.com/enix/kube-image-keeper
[kyverno]: https://kyverno.io/
---
## Alternatives
- Kubernetes Validating Admission Policies
- Integrated with the Kubernetes API server
- Relatively recent (alpha: 1.26, beta: 1.28, GA: 1.30)
- Lets us define policies using [CEL (Common Expression Language)][cel-spec]
- Declare validation rules with Common Expression Language ([CEL][cel-spec])
- Available in beta in Kubernetes 1.28 <!-- ##VERSION## -->
- Validation is done entirely within the API server
- Check this [CNCF Blog Post][cncf-blog-vap] for more details
(no external webhook = no latency, no deployment complexity...)
[cncf-blog-vap]: https://www.cncf.io/blog/2023/09/14/policy-management-in-kubernetes-is-changing/
- Not as powerful as full-fledged webhook engines like Kyverno
(see e.g. [this page of the Kyverno doc][kyverno-vap] for a comparison)
[kyverno-vap]: https://kyverno.io/docs/policy-types/validating-policy/
[cel-spec]: https://github.com/google/cel-spec
???

View File

@@ -35,7 +35,7 @@
## The chain of handlers
- API requests go through a complex chain of filters ([src](https://github.com/kubernetes/apiserver/blob/release-1.19/pkg/server/config.go#L671))
- API requests go through a complex chain of filters ([src](https://github.com/kubernetes/apiserver/blob/release-1.32/pkg/server/config.go#L1004))
(note when reading that code: requests start at the bottom and go up)

View File

@@ -1,5 +1,7 @@
# Backing up clusters
**(Note: we won't do the labs for that section!)**
- Backups can have multiple purposes:
- disaster recovery (servers or storage are destroyed or unreachable)
@@ -183,49 +185,98 @@
- But we also need to specify:
- an environment variable to specify that we want etcdctl v3
- the address of the server to back up
- the path to the key, certificate, and CA certificate
<br/>(if our etcd uses TLS certificates)
- an environment variable to specify that we want etcdctl v3
<br/>(not necessary anymore with recent versions of etcd)
---
## Snapshotting etcd on kubeadm
## Snapshotting etcd on kubeadm
- The following command will work on clusters deployed with kubeadm
- Here is a strategy that works on clusters deployed with kubeadm
(and maybe others)
- It should be executed on a master node
- We're going to:
```bash
docker run --rm --net host -v $PWD:/vol \
-v /etc/kubernetes/pki/etcd:/etc/kubernetes/pki/etcd:ro \
-e ETCDCTL_API=3 k8s.gcr.io/etcd:3.3.10 \
etcdctl --endpoints=https://[127.0.0.1]:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
--key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
snapshot save /vol/snapshot
```
- identify a node running the control plane
- It will create a file named `snapshot` in the current directory
- identify the etcd image
- execute `etcdctl snapshot` in a *debug container*
- transfer the resulting snapshot with another *debug container*
---
## How can we remember all these flags?
## Finding an etcd node and image
- Older versions of kubeadm did add a healthcheck probe with all these flags
These commands let us retrieve the node and image automatically.
- That healthcheck probe was calling `etcdctl` with all the right flags
.lab[
- With recent versions of kubeadm, we're on our own!
- Get the name of a control plane node:
```bash
NODE=$(kubectl get nodes \
--selector=node-role.kubernetes.io/control-plane \
-o jsonpath={.items[0].metadata.name})
```
- Exercise: write the YAML for a batch job to perform the backup
- Get the etcd image:
```bash
IMAGE=$(kubectl get pods --namespace kube-system etcd-$NODE \
-o jsonpath={..containers[].image})
```
(how will you access the key and certificate required to connect?)
]
---
## Making a snapshot
This relies on the fact that in a `node` debug pod:
- the host filesystem is mounted in `/host`,
- the debug pod is using the host's network.
.lab[
- Execute `etcdctl` in a debug pod:
```bash
kubectl debug --attach --profile=general \
node/$NODE --image $IMAGE -- \
etcdctl --endpoints=https://[127.0.0.1]:2379 \
--cacert=/host/etc/kubernetes/pki/etcd/ca.crt \
--cert=/host/etc/kubernetes/pki/etcd/healthcheck-client.crt \
--key=/host/etc/kubernetes/pki/etcd/healthcheck-client.key \
snapshot save /host/tmp/snapshot
```
]
---
## Transfer the snapshot
We're going to use base64 encoding to ensure that the snapshot
doesn't get corrupted in transit.
.lab[
- Retrieve the snapshot:
```bash
kubectl debug --attach --profile=general --quiet \
node/$NODE --image $IMAGE -- \
base64 /host/tmp/snapshot | base64 -d > snapshot
```
]
We can now work with the `snapshot` file in the current directory!
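To sanity-check the snapshot, we can also ask `etcdctl` for its status with the same debug pod technique (note: `etcdctl snapshot status` is deprecated in recent etcd releases in favor of `etcdutl`, but it is typically still available):
```bash
kubectl debug --attach --profile=general --quiet \
       node/$NODE --image $IMAGE -- \
       etcdctl snapshot status /host/tmp/snapshot -w table
```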
---
@@ -252,8 +303,7 @@ docker run --rm --net host -v $PWD:/vol \
1. Create a new data directory from the snapshot:
```bash
sudo rm -rf /var/lib/etcd
docker run --rm -v /var/lib:/var/lib -v $PWD:/vol \
-e ETCDCTL_API=3 k8s.gcr.io/etcd:3.3.10 \
docker run --rm -v /var/lib:/var/lib -v $PWD:/vol $IMAGE \
etcdctl snapshot restore /vol/snapshot --data-dir=/var/lib/etcd
```
@@ -281,6 +331,20 @@ docker run --rm --net host -v $PWD:/vol \
---
## Accessing etcd directly
- Data in etcd is encoded in a binary format
- We can retrieve the data with etcdctl, but it's hard to read
- There is a tool to decode that data: `auger`
- Check the [use cases][auger-use-cases] for an example of how to retrieve and modify etcd data!
[auger-use-cases]: https://github.com/etcd-io/auger?tab=readme-ov-file#use-cases
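A minimal sketch (assuming `auger` and `etcdctl` are installed on a control plane node, and that `etcdctl` is given the same endpoint and certificate flags as in the backup commands above):
```bash
# Dump the raw value stored for the "default" Namespace, and decode it to YAML
etcdctl get /registry/namespaces/default --print-value-only | auger decode
```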
---
## More information about etcd backups
- [Kubernetes documentation](https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#built-in-snapshot) about etcd backups
@@ -291,6 +355,8 @@ docker run --rm --net host -v $PWD:/vol \
- [Another good blog post by consol labs](https://labs.consol.de/kubernetes/2018/05/25/kubeadm-backup.html) on the same topic
- [auger](https://github.com/etcd-io/auger), a tool to directly access Kubernetes objects stored in etcd
---
## Don't forget ...
@@ -346,19 +412,15 @@ docker run --rm --net host -v $PWD:/vol \
## More backup tools
- [Stash](https://appscode.com/products/stash/)
- [Stash](https://stash.run/)
back up Kubernetes persistent volumes
- [ReShifter](https://github.com/mhausenblas/reshifter)
cluster state management
- ~~Heptio Ark~~ [Velero](https://github.com/heptio/velero)
- ~~Heptio Ark~~ [Velero](https://velero.io/)
full cluster backup
- [kube-backup](https://github.com/pieterlange/kube-backup)
- [kube-backup](https://github.com/pieterlange/kube-backup) (unmaintained)
simple scripts to save resource YAML to a git repository
@@ -366,6 +428,10 @@ docker run --rm --net host -v $PWD:/vol \
Backup Interface for Volumes Attached to Containers
- [Veeam Kasten](https://www.veeam.com/products/cloud/kubernetes-data-protection.html)
commercial product; compares to Velero
???
:EN:- Backing up clusters

View File

@@ -1,5 +1,7 @@
# Upgrading clusters
**(Note: we won't do the labs for that section!)**
- It's *recommended* to run consistent versions across a cluster
(mostly to have feature parity and latest security updates)

View File

@@ -406,7 +406,7 @@ class: pic
- all the NGINX containers would be on the same node
- they would all have the same IP address
<br/>(resulting in `Address alreading in use` errors)
<br/>(resulting in `Address already in use` errors)
---

View File

@@ -22,11 +22,13 @@
## Authentication and authorization
- Authentication (checking "who you are") is done with mutual TLS
- Authentication (checking "who you are") can be done in different ways:
(both the client and the server need to hold a valid certificate)
- with mutual TLS (both client and server need to hold a valid certificate)
- Authorization (checking "what you can do") is done in different ways
- with service account tokens (issued by the Kubernetes API server)
- Authorization (checking "what you can do") can also be done in multiple ways:
- the API server implements a sophisticated permission logic (with RBAC)
@@ -34,6 +36,30 @@
- some services require a certificate signed by a particular CA / sub-CA
- there is also a special "Node Authorizer" (for kubelet API access)
---
## Mutual TLS vs tokens
- Service account tokens:
- automatically generated by API server
- can be exposed to pods through e.g. volume mounts
- require the control plane to be up and running
- can't be used by kubelets or by static pods
- Mutual TLS:
- requires manual generation (and renewal!)
- doesn't require the control plane to be up and running
- particularly relevant for kubelets and static pods
---
## In practice
@@ -114,22 +140,17 @@
---
## API server authentication with TLS certificates
## API server clients
- Some control plane components will authenticate with TLS certificates
- The API server has a sophisticated authentication and authorization system
- For connections coming from other components of the control plane:
- authentication uses certificates (trusting the certificates' subject or CN)
- authorization uses whatever mechanism is enabled (most often, RBAC)
(typically: scheduler, controller manager; also: kubelets!)
- The relevant API server flags are:
`--client-ca-file`, `--tls-cert-file`, `--tls-private-key-file`
- Each component connecting to the API server takes a `--kubeconfig` flag
- These clients will typically accept a `--kubeconfig` flag
(to specify a kubeconfig file containing the CA cert, client key, and client cert)
@@ -137,6 +158,84 @@
---
## API server authentication with tokens
- Some control plane components may authenticate with Service Account tokens
(typically: controllers like CNI, CSI, Ingress...)
- The relevant API server flags are:
`--service-account-signing-key-file`, `--service-account-issuer`, `--service-account-key-file`
- These clients will automatically detect that they should use "in cluster config"
- That detection relies on the following things to exist:
- environment variables `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT`
- token in file `/var/run/secrets/kubernetes.io/serviceaccount/token`
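For example, here is how a process running in a pod could use these to talk to the API server directly (a sketch; client libraries do the equivalent automatically):
```bash
# Run this inside a pod that has a service account token mounted
SA=/var/run/secrets/kubernetes.io/serviceaccount
curl --cacert $SA/ca.crt \
     --header "Authorization: Bearer $(cat $SA/token)" \
     https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT/api
```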
---
## API server clients authorization
- Most clients will rely on the `RBAC` authorizer
- enabled with API server flag `--authorization-mode=RBAC`
- that flag will automatically create a bunch of roles and bindings
- clients should use standard names (e.g. `system:kube-scheduler`)
- Kubelets will rely on the `Node` authorizer
- enabled with API server flag `--authorization-mode=Node`
- this authorizer makes sure that kubelets work on a "need-to-know" basis
- kubelets should use standard names (`system:node:<name-of-the-node>`)
- Note: to enable both authorizers, use `--authorization-mode=RBAC,Node`
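On a kubeadm cluster, we can check which authorizers are enabled by looking at the API server static pod manifest (flag order may differ from one cluster to another):
```bash
sudo grep authorization-mode /etc/kubernetes/manifests/kube-apiserver.yaml
```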
---
class: extra-details
## How are these permissions set up?
- A bunch of roles and bindings are defined as constants in the API server code:
[auth/authorizer/rbac/bootstrappolicy/policy.go](https://github.com/kubernetes/kubernetes/blob/release-1.19/plugin/pkg/auth/authorizer/rbac/bootstrappolicy/policy.go#L188)
- They are created automatically when the API server starts:
[registry/rbac/rest/storage_rbac.go](https://github.com/kubernetes/kubernetes/blob/release-1.19/pkg/registry/rbac/rest/storage_rbac.go#L140)
- We must use the correct Common Names (`CN`) for the control plane certificates
(since the bindings defined above refer to these common names)
---
class: extra-details
## The Node Authorizer
- Question: when should node `X` be able to access secret `Y`?
--
- Answer: if, and only if, node `X` runs a pod that uses secret `Y`
- The Node Authorizer implements that kind of logic
- It also allows kubelets to set labels and taints for themselves
(but not for other nodes)
---
## Kubelet and API server
- Communication between kubelet and API server can be established both ways
@@ -167,6 +266,12 @@
(it will authenticate like any other client)
- Authorization will typically require the Node Authorizer mentioned earlier
⚠️ Kubelet certificates need to be renewed regularly!
- This is typically done through the CSR API
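For example, pending or approved kubelet certificate requests can be inspected and approved through the CSR API (the CSR name below is hypothetical):
```bash
kubectl get csr
# Replace csr-xxxxx with the name of the CSR to approve
kubectl certificate approve csr-xxxxx
```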
---
## API server → kubelet
@@ -203,37 +308,31 @@
- Its certificate will have `CN=system:kube-controller-manager`
- To improve security posture, each controller can use an individual Service Account
- This is enabled with flag `--use-service-account-credentials=true`
---
## Controller manager keys
- The controller can create Secrets holding Service Account tokens
- this is enabled with flag `--service-account-private-key-file`
- this was used in older versions of Kubernetes (before *bound tokens*)
- in modern clusters, kubelet uses the `TokenRequest` API instead
- If we use the CSR API, the controller manager needs the CA cert and key
(passed with flags `--cluster-signing-cert-file` and `--cluster-signing-key-file`)
- the CSR API is used in many clusters to renew kubelet certificates
- We usually want the controller manager to generate tokens for service accounts
- These tokens deserve some details (on the next slide!)
- it's enabled with `--cluster-signing-cert-file` and `--cluster-signing-key-file`
---
class: extra-details
## How are these permissions set up?
- A bunch of roles and bindings are defined as constants in the API server code:
[auth/authorizer/rbac/bootstrappolicy/policy.go](https://github.com/kubernetes/kubernetes/blob/release-1.19/plugin/pkg/auth/authorizer/rbac/bootstrappolicy/policy.go#L188)
- They are created automatically when the API server starts:
[registry/rbac/rest/storage_rbac.go](https://github.com/kubernetes/kubernetes/blob/release-1.19/pkg/registry/rbac/rest/storage_rbac.go#L140)
- We must use the correct Common Names (`CN`) for the control plane certificates
(since the bindings defined above refer to these common names)
---
## Service account tokens
- Each time we create a service account, the controller manager generates a token
## Service account tokens recap
- These tokens are JWT tokens, signed with a particular key
@@ -241,13 +340,14 @@ class: extra-details
(and therefore, the API server needs to be able to verify their integrity)
- This uses another keypair:
- That key is passed to the API server using a couple of flags:
- the private key (used for signature) is passed to the controller manager
<br/>(using flags `--service-account-private-key-file` and `--root-ca-file`)
- `--service-account-private-key-file` (used to issue tokens)
- the public key (used for verification) is passed to the API server
<br/>(using flag `--service-account-key-file`)
- `--service-account-key-file` (used to verify tokens)
- The private key is also passed to the controller manager
<br/>(using flag `--service-account-private-key-file`)
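On a kubeadm cluster, we can see how these flags are wired by grepping the static pod manifests (paths assume a standard kubeadm layout):
```bash
sudo grep service-account /etc/kubernetes/manifests/kube-apiserver.yaml
sudo grep service-account /etc/kubernetes/manifests/kube-controller-manager.yaml
```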
---
@@ -261,8 +361,14 @@ class: extra-details
- It will authenticate using the token of that Service Account
- It's also possible (but rare) to run it with e.g. static pods
(it will then require TLS keys; possibly the same as kubelet's!)
---
class: extra-details
## Webhooks
- We mentioned webhooks earlier; how does that really work?
@@ -283,6 +389,8 @@ class: extra-details
---
class: extra-details
## Subject Access Review
Here is an example showing how to check if `jean.doe` can `get` some `pods` in `kube-system`:
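A minimal sketch of such a check, using a `SubjectAccessReview` (assuming we're allowed to create them; the answer is in the `status.allowed` field of the response):
```bash
kubectl create -f - -o yaml <<EOF
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: jean.doe
  resourceAttributes:
    verb: get
    resource: pods
    namespace: kube-system
EOF
```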

View File

@@ -6,7 +6,7 @@ We are going to cover:
- Controllers
- Dynamic Admission Webhooks
- Admission Control
- Custom Resource Definitions (CRDs)
@@ -128,23 +128,36 @@ then make or request changes where needed.*
---
## Admission controllers
## Admission control
- Admission controllers can vet or transform API requests
- Validate (approve/deny) or mutate (modify) API requests
- The diagram on the next slide shows the path of an API request
- In modern Kubernetes, we have at least 3 ways to achieve that:
(courtesy of Banzai Cloud)
- [admission controllers][ac-controllers] (built in the API server)
- [dynamic admission control][ac-webhooks] (with webhooks)
- [validating admission policies][ac-vap] (using CEL, Common Expression Language)
- More is coming; e.g. [mutating admission policies][ac-map] (alpha in Kubernetes 1.32)
[ac-controllers]: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/
[ac-webhooks]: https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/
[ac-vap]: https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/
[ac-map]: https://kubernetes.io/docs/reference/access-authn-authz/mutating-admission-policy/
---
class: pic
![API request lifecycle](images/api-request-lifecycle.png)
![API request lifecycle; from Kubernetes documentation](images/admission-control-phases.svg)
---
## Types of admission controllers
## Admission controllers
- Built in the API server
- *Validating* admission controllers can accept/reject the API call
@@ -156,9 +169,13 @@ class: pic
- There are a number of built-in admission controllers
(see [documentation](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#what-does-each-admission-controller-do) for a list)
([and a bunch of them are enabled by default][ac-default])
- We can also dynamically define and register our own
- They can be enabled/disabled with API server command-line flags
(this is not always possible when using *managed* Kubernetes!)
[ac-default]: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#which-plugins-are-enabled-by-default
---
@@ -202,6 +219,8 @@ class: extra-details
---
class: extra-details
## Webhook Configuration
- A ValidatingWebhookConfiguration or MutatingWebhookConfiguration contains:
@@ -229,15 +248,39 @@ class: extra-details
- Sidecar injection
(Used by some service meshes)
(used by some service meshes)
- Type validation
(More on this later, in the CRD section)
(more on this later, in the CRD section)
- And many other creative + useful scenarios!
(for example in [kube-image-keeper][kuik], to rewrite image references)
[kuik]: https://github.com/enix/kube-image-keeper
---
## Kubernetes API types
## Validating Admission Policies
- Relatively recent (alpha: 1.26, beta: 1.28, GA: 1.30)
- Declare validation rules with Common Expression Language (CEL)
- Validation is done entirely within the API server
(no external webhook = no latency, no deployment complexity...)
- Not as powerful as full-fledged webhook engines like Kyverno
(see e.g. [this page of the Kyverno doc][kyverno-vap] for a comparison)
[kyverno-vap]: https://kyverno.io/docs/policy-types/validating-policy/
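As an illustration, here is a minimal sketch of a policy and its binding (hypothetical names; it rejects Deployments that don't carry an `owner` label):
```bash
kubectl apply -f - <<EOF
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-owner-label
spec:
  matchConstraints:
    resourceRules:
    - apiGroups: ["apps"]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["deployments"]
  validations:
  - expression: "'owner' in object.metadata.labels"
    message: "Deployments must have an owner label."
EOF
kubectl apply -f - <<EOF
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-owner-label
spec:
  policyName: require-owner-label
  validationActions: [Deny]
EOF
```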
---
## Kubernetes API resource types
- Almost everything in Kubernetes is materialized by a resource
@@ -271,21 +314,21 @@ class: extra-details
## Examples
- Representing configuration for controllers and operators
(e.g. Prometheus scrape targets, gitops configuration, certificates...)
- Representing composite resources
(e.g. clusters like databases, messages queues ...)
(e.g. database cluster, message queue...)
- Representing external resources
(e.g. virtual machines, object store buckets, domain names ...)
- Representing configuration for controllers and operators
(e.g. custom Ingress resources, certificate issuers, backups ...)
(e.g. virtual machines, object store buckets, domain names...)
- Alternate representations of other objects; services and service instances
(e.g. encrypted secret, git endpoints ...)
(e.g. encrypted secret, git endpoints...)
---
@@ -339,17 +382,18 @@ class: extra-details
---
## Documentation
## And more...
- [Custom Resource Definitions: when to use them](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
- Some specifics areas of Kubernetes also have extension points
- [Custom Resources Definitions: how to use them](https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/)
- Example: scheduler
- [Built-in Admission Controllers](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/)
- it's possible to [customize the behavior of the scheduler][sched-config]
- [Dynamic Admission Controllers](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/)
- or even run [multiple schedulers][sched-multiple]
- [Aggregation Layer](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/)
[sched-config]: https://kubernetes.io/docs/reference/scheduling/config/
[sched-multiple]: https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/
???

View File

@@ -55,7 +55,7 @@
- We're going to talk mostly about "Kubernetes cluster centric" approaches here
[ArgoCD]: https://argoproj.github.io/cd/
[Flux]: https://fluxcd.io/
[FluxCD]: https://fluxcd.io/
---

slides/k8s/kuik.md Normal file
View File

@@ -0,0 +1,134 @@
# Kube Image Keeper
- Open source solution to improve registry availability
- Not-too-simple, not-too-complex operator
(nothing "magic" in the way it works)
- Leverages various Kubernetes features
(CRD, mutation webhooks...)
- Written in Go, with the kubebuilder framework
---
## Registry problems that can happen
- Registry is unavailable or slow
(e.g. [registry.k8s.io outage in April 2023][registry-k8s-outage])
- Image was deleted from registry
(accidentally, or by retention policy)
- Registry has pull quotas
(hello Docker Hub!)
[registry-k8s-outage]: https://github.com/kubernetes/registry.k8s.io/issues/234#issuecomment-1524456564
---
## Registries are hard to monitor
- Most Kubernetes clusters use images from many registries
(should we continuously monitor all of them?)
- Registry can be up, but image can be missing
(should we monitor every image individually?)
- Some registries have quotas
(can we even monitor them without triggering these quotas?)
---
## Can't we mirror registries?
- Not as straightforward as, say, mirroring package repositories
- Requires container engine configuration or rewriting image references
---
## How it works
- A mutating webhook rewrites image references:
`ghcr.io/foo/bar:lol` → `localhost:7439/ghcr.io/foo/bar:lol`
- `localhost:7439` is served by the kuik-proxy DaemonSet
- It serves images either from a cache, or directly from origin
- The cache is a regular registry running on the cluster
(it can be stateless, stateful, backed by PVC, object store...)
- Images are put in cache by the kuik-controller
- Images are tracked by a CachedImage Custom Resource
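With kuik installed, the rewrite is easy to observe on any pod (the pod name below is hypothetical; the CachedImage resources can be listed too):
```bash
# Show the (rewritten) image reference of a pod
kubectl get pod myapp -o jsonpath='{.spec.containers[0].image}'
# Expected output looks like: localhost:7439/ghcr.io/foo/bar:lol
kubectl get cachedimages
```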
---
## Some diagrams
See diagrams in [this presentation][kuik-slides].
(The full video of the presentation is available [here][kuik-video].)
[kuik-slides]: https://docs.google.com/presentation/d/19eEogm2HFRNTSqr_1ItLf2wZZP34TUl_RHFhwj3RZEY/edit#slide=id.g27e8d88ad7c_0_142
[kuik-video]: https://www.youtube.com/watch?v=W1wcIdn0DHY
---
## Operator (SRE) analysis
*After using kuik in production for a few years...*
- Prevented many outages or quasi-outages
(e.g. hitting quotas while scaling up or replacing nodes)
- Running in stateless mode is possible but *not recommended*
(it's mostly for testing and quick deployments!)
- When managing many clusters, it would be nice to share the cache
(not just to save space, but get better performance for common images)
- Kuik architecture makes it suitable to virtually *any* cluster
(not tied to a particular distribution, container engine...)
---
## Operator (CRD) analysis
- Nothing "magic"
- The mutating webhook might even be replaced with Kyverno, CEL... in the long run
- The CachedImage CR exposes internal data (reference count, age, etc)
- Leverages kubebuilder (not reinventing too many wheels, hopefully!)
- Leverages existing building blocks (like the image registry)
- Some minor inefficiencies (e.g. double pull when image is not in cache)
- Breaks semantics for `imagePullPolicy: Always` (but [this is tunable][kuik-ippa])
[kuik-ippa]: https://github.com/enix/kube-image-keeper/issues/156#issuecomment-2312966436
???
:EN:- Image retention with Kube Image Keeper
:FR:- Mise en cache des images avec KUIK

View File

@@ -1,44 +1,78 @@
# Policy Management with Kyverno
- The Kubernetes permission management system is very flexible ...
- Kyverno is a policy engine for Kubernetes
- ... But it can't express *everything!*
- It has many use cases, including:
- Examples:
- validating resources when they are created/edited
<br/>(blocking or logging violations)
- forbid using `:latest` image tag
- preventing some modifications
<br/>(e.g. restricting modifications to some fields, labels...)
- enforce that each Deployment, Service, etc. has an `owner` label
<br/>(except in e.g. `kube-system`)
- modifying resources automatically
- enforce that each container has at least a `readinessProbe` healthcheck
- generating resources automatically
- How can we address that, and express these more complex *policies?*
- clean up resources automatically
---
## Admission control
## Examples (validation)
- The Kubernetes API server provides a generic mechanism called *admission control*
- [Disallow `:latest` tag](https://kyverno.io/policies/best-practices/disallow-latest-tag/disallow-latest-tag/)
- Admission controllers will examine each write request, and can:
- [Disallow secrets in environment variables](https://kyverno.io/policies/other/disallow-secrets-from-env-vars/disallow-secrets-from-env-vars/)
- approve/deny it (for *validating* admission controllers)
- [Require that containers drop all capabilities](https://kyverno.io/policies/best-practices/require-drop-all/require-drop-all/)
- additionally *update* the object (for *mutating* admission controllers)
- [Prevent creation of Deployment, ReplicaSet, etc. without an HPA](https://kyverno.io/policies/other/check-hpa-exists/check-hpa-exists/)
- These admission controllers can be:
- [Forbid CPU limits](https://kyverno.io/policies/other/forbid-cpu-limits/forbid-cpu-limits/)
- plug-ins built into the Kubernetes API server
<br/>(selectively enabled/disabled by e.g. command-line flags)
- [Check that memory requests are equal to limits](https://kyverno.io/policies/other/memory-requests-equal-limits/memory-requests-equal-limits/)
- webhooks registered dynamically with the Kubernetes API server
- [Require containers to have healthchecks](https://kyverno.io/policies/best-practices/require-probes/require-probes/)
---
## What's Kyverno?
## Examples (mutation)
- Policy management solution for Kubernetes
- [Automatically add environment variables from a ConfigMap](https://kyverno.io/policies/other/add-env-vars-from-cm/add-env-vars-from-cm/)
- [Add image as an environment variable](https://kyverno.io/policies/other/add-image-as-env-var/add-image-as-env-var/)
- [Add image `LABEL` as an environment variable](https://kyverno.io/policies/other/inject-env-var-from-image-label/inject-env-var-from-image-label/)
- [When creating a Deployment, copy some labels from its Namespace](https://kyverno.io/policies/other/copy-namespace-labels/copy-namespace-labels/)
- [Automatically restart a given Deployment when a given ConfigMap changes](https://kyverno.io/policies/other/restart-deployment-on-secret-change/restart-deployment-on-secret-change/)
---
## Examples (generation)
- [Automatically create a PDB when a Deployment is created](https://kyverno.io/policies/other/create-default-pdb/create-default-pdb/)
- [Create an event when an object is deleted (for auditing purposes)](https://kyverno.io/policies/other/audit-event-on-delete/audit-event-on-delete/)
- [Create an audit event when using `kubectl exec`](https://kyverno.io/policies/other/audit-event-on-exec/audit-event-on-exec/)
- [Automatically create a Secret (e.g. for registry auth) when a Namespace is created](https://kyverno.io/policies/other/sync-secrets/sync-secrets/)
---
## Examples (advanced validation)
- [Only allow root user in images coming from a trusted registry](https://kyverno.io/policies/other/only-trustworthy-registries-set-root/only-trustworthy-registries-set-root/)
- [Prevent images that haven't been checked by a vulnerability scanner](https://kyverno.io/policies/other/require-vulnerability-scan/require-vulnerability-scan/)
- [Prevent ingress with the same host and path](https://kyverno.io/policies/other/unique-ingress-host-and-path/unique-ingress-host-and-path/)
---
## More about Kyverno
- Open source (https://github.com/kyverno/kyverno/)
@@ -50,53 +84,25 @@
- It's not the only solution!
(see e.g. [Open Policy Agent](https://www.openpolicyagent.org/docs/v0.12.2/kubernetes-admission-control/))
(see e.g. [Open Policy Agent](https://www.openpolicyagent.org/docs/v0.12.2/kubernetes-admission-control/) or [Validating Admission Policies](https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/))
---
## What can Kyverno do?
- *Validate* resource manifests
(accept/deny depending on whether they conform to our policies)
- *Mutate* resources when they get created or updated
(to add/remove/change fields on the fly)
- *Generate* additional resources when a resource gets created
(e.g. when namespace is created, automatically add quotas and limits)
- *Audit* existing resources
(warn about resources that violate certain policies)
---
## How does it do it?
## How does it work?
- Kyverno is implemented as a *controller* or *operator*
- It typically runs as a Deployment on our cluster
- Policies are defined as *custom resource definitions*
- Policies are defined as *custom resources*
- They are implemented with a set of *dynamic admission control webhooks*
--
🤔
--
- Let's unpack that!
---
## Custom resource definitions
- When we install Kyverno, it will register new resource types:
- When we install Kyverno, it will register new resource types, including:
- Policy and ClusterPolicy (per-namespace and cluster-scope policies)
@@ -112,32 +118,6 @@
---
## Dynamic admission control webhooks
- When we install Kyverno, it will register a few webhooks for its use
(by creating ValidatingWebhookConfiguration and MutatingWebhookConfiguration resources)
- All subsequent resource modifications are submitted to these webhooks
(creations, updates, deletions)
---
## Controller
- When we install Kyverno, it creates a Deployment (and therefore, a Pod)
- That Pod runs the server used by the webhooks
- It also runs a controller that will:
- run checks in the background (and generate PolicyReport objects)
- process GenerateRequest objects asynchronously
---
## Kyverno in action
- We're going to install Kyverno on our cluster
@@ -148,19 +128,22 @@
## Installing Kyverno
- Kyverno can be installed with a (big) YAML manifest
The recommended [installation method][install-kyverno] is to use Helm charts.
- ... or with Helm charts (which allows to customize a few things)
(It's also possible to install with a single YAML manifest.)
.lab[
- Install Kyverno:
```bash
kubectl create -f https://raw.githubusercontent.com/kyverno/kyverno/release-1.7/config/release/install.yaml
helm upgrade --install --repo https://kyverno.github.io/kyverno/ \
--namespace kyverno --create-namespace kyverno kyverno
```
]
[install-kyverno]: https://kyverno.io/docs/installation/methods/
---
## Kyverno policies in a nutshell
@@ -322,6 +305,8 @@
(with an error similar to `JMESPath query failed: Unknown key ... in path`)
- If a precondition fails, the policy will be skipped altogether (and ignored!)
- To work around that, [use an OR expression][non-existence-checks]:
`{{ requests.object.metadata.labels.color || '' }}`
@@ -330,7 +315,7 @@
(e.g. in *preconditions*, a missing label would evaluate to an empty string)
[non-existence-checks]: https://kyverno.io/docs/writing-policies/jmespath/#non-existence-checks
[non-existence-checks]: https://kyverno.io/docs/policy-types/cluster-policy/jmespath/#non-existence-checks
---
@@ -359,11 +344,39 @@
---
## `background`
## `spec.rules.validate.failureAction`
- What is this `background: false` option, and why do we need it?
- By default, this is set to `Audit`
--
- This means that rule violations are not enforced
- They still generate a warning (at the API level) and a PolicyReport
(more on that later)
- We need to change the `failureAction` to `Enforce`
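For instance, here is a minimal sketch of an enforcing policy (hypothetical name; it assumes a recent Kyverno release where `failureAction` lives under `validate`):
```bash
kubectl apply -f - <<EOF
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-owner-label
spec:
  rules:
  - name: check-owner-label
    match:
      any:
      - resources:
          kinds: ["Deployment"]
    validate:
      failureAction: Enforce
      message: "Deployments must have an owner label."
      pattern:
        metadata:
          labels:
            owner: "?*"
EOF
```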
---
## `background`, `admission`, `emitWarning`
- Policies have three boolean flags to control what they do and when
- `admission` = run that policy at admission
(when an object gets created/updated and validation controllers get invoked)
- `background` = run that policy in the background
(periodically check if existing objects fit the policy)
- `emitWarning` = generate an `Event` of type `Warning` associated with the validated object
(visible with e.g. `kubectl describe` on that object)
---
## Background checks
- Admission controllers are only invoked when we change an object
@@ -375,17 +388,15 @@
(we'll see later how they are reported)
- `background: false` disables that
- `background: true/false` controls that
--
- Alright, but ... *why* do we need it?
- When would we want to disable it? 🤔
---
## Accessing `AdmissionRequest` context
- In this specific policy, we want to prevent an *update*
- In some of our policies, we want to prevent an *update*
(as opposed to a mere *create* operation)
@@ -399,10 +410,6 @@
- We access the `AdmissionRequest` object through `{{ request }}`
--
- Alright, but ... what's the link with `background: false`?
---
## `{{ request }}`
@@ -415,6 +422,16 @@
(it can only be used when an object is actually created/updated/deleted)
--
- *Well, actually...*
--
- Kyverno exposes `{{ request.object }}` and `{{ request.namespace }}`
(see [the documentation](https://kyverno.io/docs/policy-reports/background/) for details!)
---
## Immutable primary colors, take 3
@@ -584,7 +601,7 @@ class: extra-details
- See [Linking resources with ownerReferences][ownerref] for an example
[ownerref]: https://kyverno.io/docs/writing-policies/generate/#linking-resources-with-ownerreferences
[ownerref]: https://kyverno.io/docs/writing-policies/generate/#linking-trigger-with-downstream
---
@@ -604,7 +621,19 @@ class: extra-details
---
## Footprint
## Footprint (current versions)
- 14 CRDs
- 10 webhooks
- 6 services, 4 Deployments, 2 ConfigMaps
- Internal resources (GenerateRequest) "parked" in a Namespace
---
## Footprint (older versions)
- 8 CRDs
@@ -612,9 +641,7 @@ class: extra-details
- 2 Services, 1 Deployment, 2 ConfigMaps
- Internal resources (GenerateRequest) "parked" in a Namespace
- Kyverno packs a lot of features in a small footprint
*We can see the number of resources increased over time, as Kyverno added features.*
---
@@ -622,8 +649,6 @@ class: extra-details
- Kyverno is very easy to install
(it's hard to get easier than a single `kubectl apply -f`)
- The setup of the webhooks is fully automated
(including certificate generation)
@@ -634,6 +659,10 @@ class: extra-details
(e.g. `matchExpressions`)
- It has pretty good documentation, including many examples
- There is also a CLI tool (not discussed here)
---
## Caveats

View File

@@ -314,6 +314,16 @@ class: extra-details
class: extra-details
## All the details...
- [Documentation about ephemeral local storage](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#local-ephemeral-storage)
- [Blog post about local storage capacity isolation](https://kubernetes.io/blog/2022/09/19/local-storage-capacity-isolation-ga/)
---
class: extra-details
## CPU and RAM reservation
- Kubernetes passes resources requests and limits to the container engine

View File

@@ -198,6 +198,30 @@
- The only limit is yourself, and the time you are willing to sink in!
---
## GPU support
Some solutions can expose your GPU to your containers.
This can be useful for machine learning inference and training.
It only works for some combinations of hardware and operating system.
For example:
- WSL2 + NVIDIA is supported by Docker Desktop and Podman Desktop
- Linux + NVIDIA is supported by Podman Desktop
- macOS + Apple silicon is supported by Podman Desktop
See the [Docker Desktop][gpu-docker-desktop] and [Podman Desktop][gpu-podman-desktop]
documentation for more details.
[gpu-docker-desktop]: https://docs.docker.com/desktop/features/gpu/
[gpu-podman-desktop]: https://podman-desktop.io/docs/podman/gpu
???
:EN:- Kubernetes options for local development

View File

@@ -392,6 +392,18 @@ https://www.scaleway.com/en/pricing/)
- ...
---
## Reminder...
Managed Kubernetes ≠ managed hosting!
- Running an app also involves system upgrades, supervision, on-call, backups...
- "Managed hosting" means that the hosting provider takes care of it
- In "managed Kubernetes", you are responsible for these tasks!
???
:EN:- Installing a managed cluster

View File

@@ -72,6 +72,107 @@
---
## Managed ≠ managed
- Managed Kubernetes ≠ managed hosting
- Managed hosting typically means that the hosting provider takes care of:
- installation, upgrades, time-sensitive security patches, backups
- logging and metrics collection
- setting up supervision, alerts, and on-call rotation
- Managed Kubernetes typically means that the hosting provider takes care of:
- installation
- maybe upgrades (kind of; you typically need to initiate/coordinate them)
- and that's it!
---
## "Managed" Kubernetes
- "Managed Kubernetes" gives us the equivalent of a raw VM
- We still need to add a lot of things to make it production-ready
(upgrades, logging, supervision...)
- We also need some almost-essential components that don't always come out of the box
- ingress controller
- network policy controller
- storage class...
📽️ [How to make Kubernetes rhyme with production readiness](https://www.youtube.com/watch?v=6G4v-ZE6OHI)
---
## Observability
- Logging, metrics, traces...
- Pick a solution (self-hosted, as-a-service?)
- Configure control plane, nodes, various components
- Set up dashboards, track important metrics
(e.g. on AWS, track inter-AZ and external traffic per app to avoid $$$ surprises)
- Set up supervision, on-call notifications, on-call rotation
---
## Backups
- Full machine backups of the nodes?
(not very effective)
- Backup of control plane data?
(important; it's not always possible to obtain etcd backups)
- Backup of persistent volumes?
(good idea; but not always effective)
- App-level backups, e.g. database dumps, log-shipping?
(more effective and reliable; more work depending on the app and database)
---
## Upgrades
- Control plane
*typically automated by the provider; but might cause breakage*
- Nodes
*best case scenario: can be done in-place; otherwise: requires provisioning new nodes*
- Additional components (ingress controller, operators, etc.)
*depends wildly on the components!*
---
## It's dangerous to go alone!
Don't hesitate to hire help before going to production with your first K8S app!
---
## Node management
- Most "Turnkey Solutions" offer fully managed control planes
@@ -138,7 +239,7 @@
- There are too many options to list them all
(check [this page](https://kubernetes.io/partners/#conformance) for an overview!)
(check [this page](https://kubernetes.io/partners/#iframe-landscape-conformance) for an overview!)
---

View File

@@ -10,27 +10,37 @@
(e.g. national security for states that don't have a suitable domestic cloud)
- There are [countless](https://kubernetes.io/docs/setup/pick-right-solution/) distributions available
- There are countless [distributions and installers][certified-kubernetes] available
- We can't review them all
- We're just going to explore a few options
[certified-kubernetes]: https://kubernetes.io/partners/#iframe-landscape-conformance
---
## [kops](https://github.com/kubernetes/kops)
## Evolution over time
- Deploys Kubernetes using cloud infrastructure
- 2014 - early days; Kubernetes is installed manually
(supports AWS, GCE, Digital Ocean ...)
- 2015 - CoreOS, Rancher
- Leverages special cloud features when possible
- 2016 - [kops](https://github.com/kubernetes/kops), kubeadm
(e.g. Auto Scaling Groups ...)
- 2017 - Kubernetes the hard way, Docker Enterprise
- 2018 - Crossplane, Cluster API, PKS
- 2019 - k3s, Talos
- 2021 - k0s, EKS anywhere
Note: some of these dates might be approximate (should we count
announcements, first commit, first release, release 1.0...); the
goal is to get an overall idea of how the state of the art evolved.
---
## kubeadm
## Example - kubeadm
- Provisions Kubernetes nodes on top of existing machines
@@ -40,69 +50,51 @@
- Supports HA control plane [with some extra steps](https://kubernetes.io/docs/setup/independent/high-availability/)
---
- Installing a single cluster is easy
## [kubespray](https://github.com/kubernetes-incubator/kubespray)
- Upgrading a cluster is possible, but must be done carefully
- Based on Ansible
- Works on bare metal and cloud infrastructure
(good for hybrid deployments)
- The expert says: ultra flexible; slow; complex
💡 Great to install a single cluster quickly with a reasonable learning curve.
---
## RKE (Rancher Kubernetes Engine)
## Example - Cluster API
- Opinionated installer with low requirements
- Provision and manage Kubernetes clusters declaratively
- Requires a set of machines with Docker + SSH access
- Clusters, nodes... are represented by Kubernetes resources
- Supports highly available etcd and control plane
- Initial setup is more or less complicated
- The expert says: fast; maintenance can be tricky
(depending on the infrastructure and bootstrap providers used)
- Installing many clusters is then easy
- Upgrading clusters can be fully automated
(again, depending on infrastructure, bootstrap providers...)
💡 Great to manage dozens or hundreds of clusters, with a bigger initial investment.
---
## Terraform + kubeadm
## Example - Talos Linux
- Sometimes it is necessary to build a custom solution
- Based on an immutable system
- Example use case:
(like CoreOS Linux, Flatcar... but learned a lot from these precursors)
- deploying Kubernetes on OpenStack
- Control plane and nodes are managed declaratively
- ... with highly available control plane
- Initial setup and upgrades are relatively straightforward
- ... and Cloud Controller Manager integration
- Some admin tasks require to learn a new way to do things
- Solution: Terraform + kubeadm (kubeadm driven by remote-exec)
(e.g. managing storage, troubleshooting nodes...)
- [GitHub repository](https://github.com/enix/terraform-openstack-kubernetes)
- Managing fleets of clusters is facilitated by Omni (commercial product)
- [Blog post (in French)](https://enix.io/fr/blog/deployer-kubernetes-1-13-sur-openstack-grace-a-terraform/)
---
## And many more ...
- [AKS Engine](https://github.com/Azure/aks-engine)
- Docker Enterprise Edition
- [Lokomotive](https://github.com/kinvolk/lokomotive), leveraging Terraform and [Flatcar Linux](https://www.flatcar-linux.org/)
- Pivotal Container Service (PKS)
- [Tarmak](https://github.com/jetstack/tarmak), leveraging Puppet and Terraform
- Tectonic by CoreOS (now being integrated into Red Hat OpenShift)
- [Typhoon](https://typhoon.psdn.io/), leveraging Terraform
- VMware Tanzu Kubernetes Grid (TKG)
💡 As of 2025, Talos Linux's popularity has increased significantly among "trendsetters".
---

View File

@@ -1,147 +1,109 @@
# Static pods
- Hosting the Kubernetes control plane on Kubernetes has advantages:
Question: can we host the control plane of a cluster *on the cluster itself?*
- we can use Kubernetes' replication and scaling features for the control plane
- To create a Pod, we need to communicate with the API server
- we can leverage rolling updates to upgrade the control plane
- The API server needs etcd to be up
- However, there is a catch:
- Then the Pod needs to be bound to a node by the scheduler
- deploying on Kubernetes requires the API to be available
- So... all these things need to be running already!
- the API won't be available until the control plane is deployed
- Even if the Pod already exists, we still need API server and etcd
- How can we get out of that chicken-and-egg problem?
(so that kubelet can connect to the API server and "know" about the Pod)
---
## A possible approach
## Static pods
- Since each component of the control plane can be replicated...
Solution: run (parts of) the control plane in *static pods!*
- We could set up the control plane outside of the cluster
- Normally, kubelet queries the API server to know what pods to run
- Then, once the cluster is fully operational, create replicas running on the cluster
- Additionally, we can tell kubelet to run pods:
- Finally, remove the replicas that are running outside of the cluster
- by storing manifests in a directory (`--pod-manifest-path`)
*What could possibly go wrong?*
- by retrieving manifests from an HTTP server (`--manifest-url`)
- These manifests should be normal pod manifests
(make sure to include the namespace in the metadata block!)
- kubelet will append the node name after the pod name
---
## Sawing off the branch you're sitting on
## How and when kubelet runs static pods
- What if anything goes wrong?
- kubelet runs static pods "no matter what"
(During the setup or at a later point)
(even if it can't connect to the API server, or if no API server is configured)
- Worst case scenario, we might need to:
- When there is no API server configuration, that's called "standalone mode"
- set up a new control plane (outside of the cluster)
- Almost nothing can prevent kubelet from running these pods
- restore a backup from the old control plane
(e.g. admission controllers, pod security settings... won't apply)
- move the new control plane to the cluster (again)
- kubelet monitors the manifest path (and/or the manifest URL)
- This doesn't sound like a great experience
- If manifests are deleted: their pods are destroyed
- If manifests are modified: their pods are destroyed and recreated
---
## Static pods to the rescue
## Mirror pods
- Pods are started by kubelet (an agent running on every node)
- Static pods remain running even after API server connection is up
- To know which pods it should run, the kubelet queries the API server
- Once the API server is up, kubelet will create *mirror pods*
- The kubelet can also get a list of *static pods* from:
- Mirror pods represent the static pods that are running
- a directory containing one (or multiple) *manifests*, and/or
.warning[Deleting a mirror pod has no effect on the static pod!]
- a URL (serving a *manifest*)
- kubelet will immediately recreate the mirror pod if it is deleted
- These "manifests" are basically YAML definitions
.warning[Admission control can block the mirror pod, but not the static pod!]
(As produced by `kubectl get pod my-little-pod -o yaml`)
- Since kubelet runs the static pod even if there is no connection to the API server
---
## Static pods are dynamic
## Example
- Kubelet will periodically reload the manifests
- `kubeadm` leverages static pods to run the control plane
- It will start/stop pods accordingly
(etcd, API server, controller manager, scheduler)
(i.e. it is not necessary to restart the kubelet after updating the manifests)
- It "renders" a number of YAML manifests to `/etc/kubernetes/manifests`
- When connected to the Kubernetes API, the kubelet will create *mirror pods*
- This is the cluster boot sequence:
- Mirror pods are copies of the static pods
- machine boots
(so they can be seen with e.g. `kubectl get pods`)
- kubelet is started (typically by systemd)
- kubelet reads static pod manifests and run them
- control plane is up, yay!
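On a kubeadm cluster, we can see where kubelet looks for these manifests (the path of the kubelet config file may vary with other installers):
```bash
sudo grep staticPodPath /var/lib/kubelet/config.yaml
# Typically shows: staticPodPath: /etc/kubernetes/manifests
```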
---
## Bootstrapping a cluster with static pods
class: extra-details
- We can run control plane components with these static pods
## Pod checkpointer
- They can start without requiring access to the API server
- Once they are up and running, the API becomes available
- These pods are then visible through the API
(We cannot upgrade them from the API, though)
*This is how kubeadm has initialized our clusters.*
---
## Static pods vs normal pods
- The API only gives us read-only access to static pods
- We can `kubectl delete` a static pod...
...But the kubelet will re-mirror it immediately
- Static pods can be selected just like other pods
(So they can receive service traffic)
- A service can select a mixture of static and other pods
---
## From static pods to normal pods
- Once the control plane is up and running, it can be used to create normal pods
- We can then set up a copy of the control plane in normal pods
- Then the static pods can be removed
- The scheduler and the controller manager use leader election
(Only one is active at a time; removing an instance is seamless)
- Each instance of the API server adds itself to the `kubernetes` service
- Etcd will typically require more work!
---
## From normal pods back to static pods
- Alright, but what if the control plane is down and we need to fix it?
- We restart it using static pods!
- This can be done automatically with a “pod checkpointer”
- This pattern isn't used anymore, but perhaps it can provide inspiration
- The pod checkpointer automatically generates manifests of running pods
(if they have specific labels/annotations)
- The manifests are used to restart these pods if API contact is lost
- This pattern is implemented in [openshift/pod-checkpointer-operator] and [bootkube checkpointer]
@@ -151,95 +113,6 @@
[openshift/pod-checkpointer-operator]: https://github.com/openshift/pod-checkpointer-operator
[bootkube checkpointer]: https://github.com/kubernetes-retired/bootkube/blob/master/cmd/checkpoint/README.md
---
## Where should the control plane run?
*Is it better to run the control plane in static pods, or normal pods?*
- If I'm a *user* of the cluster: I don't care, it makes no difference to me
- What if I'm an *admin*, i.e. the person who installs, upgrades, repairs... the cluster?
- If I'm using a managed Kubernetes cluster (AKS, EKS, GKE...) it's not my problem
(I'm not the one setting up and managing the control plane)
- If I already picked a tool (kubeadm, kops...) to set up my cluster, the tool decides for me
- What if I haven't picked a tool yet, or if I'm installing from scratch?
- static pods = easier to set up, easier to troubleshoot, less risk of outage
- normal pods = easier to upgrade, easier to move (if nodes need to be shut down)
---
## Static pods in action
- On our clusters, the `staticPodPath` is `/etc/kubernetes/manifests`
.lab[
- Have a look at this directory:
```bash
ls -l /etc/kubernetes/manifests
```
]
We should see YAML files corresponding to the pods of the control plane.
---
class: static-pods-exercise
## Running a static pod
- We are going to add a pod manifest to the directory, and kubelet will run it
.lab[
- Copy a manifest to the directory:
```bash
sudo cp ~/container.training/k8s/just-a-pod.yaml /etc/kubernetes/manifests
```
- Check that it's running:
```bash
kubectl get pods
```
]
The output should include a pod named `hello-node1`.
---
class: static-pods-exercise
## Remarks
In the manifest, the pod was named `hello`.
```yaml
apiVersion: v1
kind: Pod
metadata:
name: hello
namespace: default
spec:
containers:
- name: hello
image: nginx
```
The `-node1` suffix was added automatically by kubelet.
If we delete the pod (with `kubectl delete`), it will be recreated immediately.
To delete the pod, we need to delete (or move) the manifest file.
???
:EN:- Static pods

View File

@@ -18,31 +18,43 @@
- The demos in this section require that we have access to our cluster's CA
- This is easy if we are using a cluster deployed with `kubeadm`
- *On a managed cluster:* the CA is very rarely exposed by the provider
- Otherwise, we may or may not have access to the cluster's CA
- *On a self-hosted cluster:* the CA should be available somewhere
(it may or may not be easy to find, though!)
- We may or may not be able to use the CSR API instead
---
## Check that we have access to the CA
## Locate the CA key and cert
- Make sure that you are logged on the node hosting the control plane
- On a cluster deployed with `kubeadm`:
(if a cluster has been provisioned for you for a training, it's `node1`)
*the files will be in `/etc/kubernetes/pki` (on any control plane node)*
.lab[
- On a cluster deployed with something like k3s or k0s:
- Check that the CA key is here:
*Check the docs to know where the CA files are*
*(and for extra credit, submit a PR to update this slide!)*
- On a cluster deployed manually (like "dessine-moi un cluster"):
*the files will be wherever you did put them*
---
## Let's set environment variables
- To normalize the commands in the next slides:
```bash
sudo ls -l /etc/kubernetes/pki
CA_KEY=/.../ca.key
CA_CRT=/.../ca.crt
```
]
The output should include `ca.key` and `ca.crt`.
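For example, on a cluster deployed with `kubeadm` (using the location mentioned above), that could be:
```bash
CA_KEY=/etc/kubernetes/pki/ca.key
CA_CRT=/etc/kubernetes/pki/ca.crt
```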
---
## How it works
@@ -57,14 +69,14 @@ The output should include `ca.key` and `ca.crt`.
.lab[
- Check which CA is used by the Kubernetes API server:
- On clusters deployed with `kubeadm`, this will show the location of the CA cert:
```bash
sudo grep crt /etc/kubernetes/manifests/kube-apiserver.yaml
sudo grep client-ca /etc/kubernetes/manifests/kube-apiserver.yaml
```
]
This is the flag that we're looking for:
This should output a flag like the following one:
```
--client-ca-file=/etc/kubernetes/pki/ca.crt
```
@@ -111,7 +123,7 @@ This is the flag that we're looking for:
- Generate the certificate:
```bash
sudo openssl x509 -req \
-CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key \
-CA $CA_CRT -CAkey $CA_KEY \
-in user.csr -days 1 -set_serial 1234 > user.crt
```
@@ -167,12 +179,48 @@ We could also embed the key and certs with the `--embed-certs` option.
]
Access will be denied, but we should see that we were correctly *authenticated* as `jerome`.
Does it work, or do we get a permission error?
---
## On a normal cluster
- We should get a message like the following one:
```
Error from server (Forbidden): pods is forbidden: User "jerome"
cannot list resource "pods" in API group "" in the namespace "default"
```
- This means:
*your user key and cert are valid (`User "jerome"`)...*
*...but you don't have permission to get pods yet*
- We now need to grant permissions (e.g. with Roles and Rolebindings)
---
## On a cluster deployed manually
- If we haven't enabled RBAC, then it will work
(because without RBAC, any valid certificate gives full access to the API)
- We could (should?) enable RBAC!
- But then we'll need to generate keys and certs for all API clients
(including, but not limited to, control plane components and kubelets)
---
## Granting permissions
- If RBAC is enabled, we can give some permissions to our new user
- The following example assumes a `kubeadm` cluster
- Let's add some read-only permissions to the `devs` group (for instance)
.lab[
@@ -212,6 +260,16 @@ Access will be denied, but we should see that were correctly *authenticated* as
]
---
## `kubeadm kubeconfig user`
- On `kubeadm` clusters, there is a command to automate key and certificate generation
- `kubeadm kubeconfig user` will issue a key, certificate, and output the kubeconfig file
- It will access CA key and cert in `/etc/kubernetes/pki/` directly
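A sketch of how that could look for our `jerome` user in the `devs` group (run on a control plane node, since it needs the CA files):
```bash
sudo kubeadm kubeconfig user --client-name=jerome --org=devs > jerome.kubeconfig
```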
???
:EN:- Authentication with user certificates

View File

@@ -149,6 +149,7 @@ content:
- k8s/operators-design.md
- k8s/operators-example.md
- k8s/kubebuilder.md
- k8s/kuik.md
- k8s/sealed-secrets.md
- k8s/kyverno.md
- k8s/eck.md

View File

@@ -2,9 +2,15 @@
- Hello!
<!--
- On stage: Jérôme ([@jpetazzo@hachyderm.io])
-->
- Backstage: Alexandre, Antoine, Aurélien (x2), Benjamin, David, Kostas, Nicolas, Paul, Sébastien, Thibault...
- On stage Monday+Tuesday: Jérôme ([@jpetazzo@hachyderm.io])
- On stage Wednesday+Thursday: Ludovic
- Backstage: Alexandre, Antoine, Aurélien (x2), Baptiste, Benjamin, David, Hadrien, Kostas, Louis, Magalie, Nicolas, Paul, Sébastien, Thibault, Yoann...
- Schedule: every day from 9am to 1pm
@@ -55,24 +61,16 @@
---
## Allô Docker¹ ?
## The hotline
- Every afternoon: one hour of open Q&A!
(except on the last day)
- One hour of open Q&A!
- Tuesday: 4:00pm-5:00pm
- Thursday: 4:00pm-5:00pm
- Wednesday: 3:30pm-4:30pm
- Friday: 3:00pm-4:00pm
- Thursday: 3:00pm-4:00pm
- Monday: 3:30pm-4:30pm
- On [Jitsi][jitsi] (the "visioconf" link on the training portal)
.footnote[¹A nod to the excellent ["Quoi de neuf Docker?"][qdnd] by the excellent [Nicolas Deloof][ndeloof] 🙂]
[qdnd]: https://www.youtube.com/channel/UCOAhkxpryr_BKybt9wIw-NQ
[ndeloof]: https://github.com/ndeloof
[jitsi]: https://training.enix.io/jitsi-magic/jitsi.container.training/Janvier2025
- On Jitsi (the "visioconf" link on the training portal)

slides/m6.yml Normal file
View File

@@ -0,0 +1,28 @@
title: |
Using Kubernetes
in an Enterprise-like scenario
chat: "[Slack](https://dockercommunity.slack.com/messages/C7GKACWDV)"
#chat: "[Gitter](https://gitter.im/jpetazzo/workshop-yyyymmdd-city)"
gitrepo: github.com/jpetazzo/container.training
slides: https://container.training/
#slidenumberprefix: "#SomeHashTag &mdash; "
exclude:
- in-person
content:
- k8s/M6-START-a-company-scenario.md
- k8s/M6-T02-flux-install.md
- k8s/M6-T03-installing-tenants.md
- k8s/M6-R01-flux_configure-ROCKY-deployment.md
- k8s/M6-T05-ingress-config.md
- k8s/M6-M01-adding-MOVY-tenant.md
- k8s/M6-K01-METAL-install.md
- k8s/M6-K03-openebs-install.md
- k8s/M6-monitoring-stack-install.md
- k8s/M6-kyverno-install.md