Upgrading clusters
-
It's recommended to run consistent versions across a cluster
(mostly to have feature parity and latest security updates)
-
It's not mandatory
(otherwise, cluster upgrades would be a nightmare!)
-
Components can be upgraded one at a time without problems
Checking what we're running
- It's easy to check the version for the API server
.lab[
- Log into node oldversion1
- Check the version of kubectl and of the API server:
kubectl version
]
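For reference, the output looks roughly like this (the exact values, and extra fields such as the Kustomize version, depend on the kubectl release; the versions below are illustrative):
Client Version: v1.28.9
Kustomize Version: v5.0.4
Server Version: v1.28.9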
-
In an HA setup with multiple API servers, they can have different versions
-
Running the command above multiple times can return different values
Node versions
- It's also easy to check the version of kubelet
.lab[
- Check node versions (includes kubelet, kernel, container engine):
kubectl get nodes -o wide
]
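The output looks something like this (node names, addresses, and versions below are illustrative; your cluster will differ):
NAME          STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
oldversion1   Ready    control-plane   10d   v1.28.9   10.10.0.11    <none>        Ubuntu 22.04.4 LTS   5.15.0-105-generic   containerd://1.7.12
oldversion2   Ready    <none>          10d   v1.28.9   10.10.0.12    <none>        Ubuntu 22.04.4 LTS   5.15.0-105-generic   containerd://1.7.12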
-
Different nodes can run different kubelet versions
-
Different nodes can run different kernel versions
-
Different nodes can run different container engines
Control plane versions
- If the control plane is self-hosted (running in pods), we can check it
.lab[
- Show image versions for all pods in the kube-system namespace:
kubectl --namespace=kube-system get pods -o json \
  | jq -r '
      .items[]
      | [.spec.nodeName, .metadata.name]
        + (.spec.containers[].image | split(":"))
      | @tsv
    ' \
  | column -t
]
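Each output line shows a node name, a pod name, an image repository, and an image tag; for example (illustrative values; the image registry prefix can differ depending on how the cluster was installed):
oldversion1   kube-apiserver-oldversion1            registry.k8s.io/kube-apiserver            v1.28.9
oldversion1   kube-controller-manager-oldversion1   registry.k8s.io/kube-controller-manager   v1.28.9
oldversion1   etcd-oldversion1                      registry.k8s.io/etcd                      3.5.9-0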
What version are we running anyway?
-
When I say, "I'm running Kubernetes 1.28", is that the version of:
-
kubectl
-
API server
-
kubelet
-
controller manager
-
something else?
-
Other versions that are important
-
etcd
-
kube-dns or CoreDNS
-
CNI plugin(s)
-
Network controller, network policy controller
-
Container engine
-
Linux kernel
Important questions
-
Should we upgrade the control plane before or after the kubelets?
-
Within the control plane, should we upgrade the API server first or last?
-
How often should we upgrade?
-
How long are versions maintained?
-
All the answers are in the documentation about version skew policy!
-
Let's review the key elements together ...
Kubernetes uses semantic versioning
-
Kubernetes versions look like MAJOR.MINOR.PATCH; e.g. in 1.28.9:
- MAJOR = 1
- MINOR = 28
- PATCH = 9
-
It's always possible to mix and match different PATCH releases
(e.g. 1.28.9 and 1.28.13 are compatible)
-
It is recommended to run the latest PATCH release
(but it's mandatory only when there is a security advisory)
Version skew
-
API server must be more recent than its clients (kubelet and control plane)
-
... Which means it must always be upgraded first
-
All components support a difference of one¹ MINOR version
-
This allows live upgrades (since we can mix e.g. 1.28 and 1.29)
-
It also means that going from 1.28 to 1.30 requires going through 1.29
.footnote[¹Except kubelet, which can be up to two MINOR behind API server, and kubectl, which can be one MINOR ahead or behind API server.]
Release cycle
-
There is a new PATCH release whenever necessary
(every few weeks, or "ASAP" when there is a security vulnerability)
-
There is a new MINOR release every 3 months (approximately)
-
At any given time, three MINOR releases are maintained
-
... Which means that MINOR releases are maintained for approximately 9 months
-
We should expect to upgrade at least every 3 months (on average)
General guidelines
-
To update a component, use whatever was used to install it
-
If it's a distro package, update that distro package
-
If it's a container or pod, update that container or pod
-
If you used configuration management, update with that
Know where your binaries come from
-
Sometimes, we need to upgrade quickly
(when a vulnerability is announced and patched)
-
If we are using an installer, we should:
-
make sure it's using upstream packages
-
or make sure that whatever packages it uses are current
-
make sure we can tell it to pin specific component versions
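For example, with APT, pinning can be done with a preferences file; the snippet below is a hypothetical sketch (package names and version pattern depend on your setup), similar in spirit to the /etc/apt/preferences.d/kubernetes file used later in this lab:
# hypothetical /etc/apt/preferences.d/kubernetes
Package: kubelet kubeadm kubectl
Pin: version 1.28.*
Pin-Priority: 1001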
-
In practice
-
We are going to update a few cluster components
-
We will change the kubelet version on one node
-
We will change the version of the API server
-
We will work with cluster oldversion
(nodes oldversion1, oldversion2, oldversion3)
Updating the API server
-
This cluster has been deployed with kubeadm
-
The control plane runs in static pods
-
These pods are started automatically by kubelet
(even when kubelet can't contact the API server)
-
They are defined in YAML files in /etc/kubernetes/manifests
(this path is set by a kubelet command-line flag)
-
kubelet automatically updates the pods when the files are changed
Changing the API server version
- We will edit the YAML file to use a different image version
.lab[
- Log into node oldversion1
- Check API server version:
kubectl version
- Edit the API server pod manifest:
sudo vim /etc/kubernetes/manifests/kube-apiserver.yaml
- Look for the image: line, and update it to e.g. v1.30.1
]
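To check the current value without opening an editor, something like this works (the image repository prefix shown is an assumption and may differ on your cluster):
grep image: /etc/kubernetes/manifests/kube-apiserver.yaml
# before the change (illustrative):   image: registry.k8s.io/kube-apiserver:v1.28.9
# after the change (illustrative):    image: registry.k8s.io/kube-apiserver:v1.30.1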
Checking what we've done
- The API server will be briefly unavailable while kubelet restarts it
.lab[
- Check the API server version:
kubectl version
]
Was that a good idea?
--
No!
--
-
Remember the guideline we gave earlier:
To update a component, use whatever was used to install it.
-
This control plane was deployed with kubeadm
-
We should use kubeadm to upgrade it!
Updating the whole control plane
-
Let's make it right, and use kubeadm to upgrade the entire control plane
(note: this is possible only because the cluster was installed with kubeadm)
.lab[
- Check what will be upgraded:
sudo kubeadm upgrade plan
]
Note 1: kubeadm thinks that our cluster is running 1.30.1.
It is confused by our manual upgrade of the API server!
Note 2: kubeadm itself is still version 1.28.X.
It doesn't know how to upgrade to 1.29.X.
Upgrading kubeadm
-
First things first: we need to upgrade kubeadm
-
The Kubernetes package repositories are now split by minor versions
(i.e. there is one repository for 1.28, another for 1.29, etc.)
-
This avoids accidentally upgrading from one minor version to another
(e.g. with unattended upgrades or if packages haven't been held/pinned)
-
We'll need to add the new package repository and unpin packages!
Installing the new packages
- Edit /etc/apt/sources.list.d/kubernetes.list
(or copy it to e.g. kubernetes-1.29.list and edit that)
- apt-get update
- Now edit (or remove) /etc/apt/preferences.d/kubernetes
- apt-get install kubeadm should now upgrade kubeadm correctly! 🎉
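On this cluster, that boils down to something like the following (a sketch; adjust version numbers and file names to your environment):
# switch the package repository from the 1.28 line to the 1.29 line
sudo sed -i s/1.28/1.29/ /etc/apt/sources.list.d/kubernetes.list
# remove the version pin
sudo rm /etc/apt/preferences.d/kubernetes
# refresh package lists and upgrade kubeadm
sudo apt-get update
sudo apt-get install kubeadm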
Reverting our manual API server upgrade
-
First, we should revert our image: change
(so that kubeadm executes the right migration steps)
.lab[
- Edit the API server pod manifest:
sudo vim /etc/kubernetes/manifests/kube-apiserver.yaml
- Look for the image: line, and restore it to the original value (e.g. v1.28.9)
- Wait for the control plane to come back up
]
Upgrading the cluster with kubeadm
- Now we can let kubeadm do its job!
.lab[
- Check the upgrade plan:
sudo kubeadm upgrade plan
- Perform the upgrade:
sudo kubeadm upgrade apply v1.29.0
]
Updating kubelet
-
These nodes have been installed using the official Kubernetes packages
-
We can therefore use apt or apt-get
.lab[
- Log into node oldversion2
- Update package lists and APT pins like we did before
- Then upgrade kubelet
]
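Concretely, after making the same repository and pin changes as on the control plane node, the kubelet upgrade itself is just (a sketch):
sudo apt-get update
sudo apt-get install kubelet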
Checking what we've done
.lab[
- Log into node oldversion1
- Check node versions:
kubectl get nodes -o wide
- Create a deployment and scale it to make sure that the node still works
]
Was that a good idea?
--
Almost!
--
-
Yes, kubelet was installed with distribution packages
-
However, kubeadm took care of configuring kubelet
(when doing kubeadm join ...)
-
We were supposed to run a special command before upgrading kubelet!
-
That command should be executed on each node
-
It will download the kubelet configuration generated by kubeadm
Upgrading kubelet the right way
-
We need to upgrade kubeadm, upgrade kubelet config, then upgrade kubelet
(after upgrading the control plane)
.lab[
- Execute the whole upgrade procedure on each node:
for N in 1 2 3; do
  ssh oldversion$N "
    sudo sed -i s/1.28/1.29/ /etc/apt/sources.list.d/kubernetes.list &&
    sudo rm /etc/apt/preferences.d/kubernetes &&
    sudo apt update &&
    sudo apt install kubeadm -y &&
    sudo kubeadm upgrade node &&
    sudo apt install kubelet -y"
done
]
Checking what we've done
- All our nodes should now be updated to version 1.29
.lab[
- Check node versions:
kubectl get nodes -o wide
]
And now, was that a good idea?
--
Almost!
--
-
The official recommendation is to drain a node before performing node maintenance
(migrate all workloads off the node before upgrading it)
-
How do we do that?
-
Is it really necessary?
-
Let's see!
Draining a node
-
This can be achieved with the kubectl drain command, which will:
- cordon the node (prevent new pods from being scheduled there)
- evict all the pods running on the node (delete them gracefully)
- the evicted pods will automatically be recreated somewhere else
- evictions might be blocked in some cases (Pod Disruption Budgets, emptyDir volumes)
-
-
Once the node is drained, it can safely be upgraded, restarted...
-
Once it's ready, it can be put back in commission with
kubectl uncordon
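For example, draining and then reinstating oldversion2 could look like this (a sketch; --ignore-daemonsets lets the drain proceed even though DaemonSet pods can't be evicted, and --delete-emptydir-data allows evicting pods that use emptyDir volumes):
kubectl drain oldversion2 --ignore-daemonsets --delete-emptydir-data
# ... upgrade, reboot, etc. ...
kubectl uncordon oldversion2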
Is it necessary?
-
When upgrading kubelet from one patch-level version to another:
- it's probably fine
-
When upgrading system packages:
-
it's probably fine
-
except when it's not
-
-
When upgrading the kernel:
-
it's probably fine
-
...as long as we can tolerate a restart of the containers on the node
-
...and that they will be unavailable for a few minutes (during the reboot)
-
Is it necessary?
-
When upgrading kubelet from one minor version to another:
-
it may or may not be fine
-
in some cases (e.g. migrating from Docker to containerd) it will not
-
-
Here's what the documentation says:
Draining nodes before upgrading kubelet ensures that pods are re-admitted and containers are re-created, which may be necessary to resolve some security issues or other important bugs.
-
Do it at your own risk, and if you do, test extensively in staging environments!
Database operators to the rescue
-
Moving stateful pods (e.g. database servers) can cause downtime
-
Database replication can help:
-
if a node contains database servers, we make sure these servers aren't primaries
-
if they are primaries, we execute a switch over
-
-
Some database operators (e.g. CNPG) will do that switch over automatically
(when they detect that a node has been cordoned)
class: extra-details
Skipping versions
-
This example worked because we went from 1.28 to 1.29
-
If you are upgrading from e.g. 1.26, you will have to go through 1.27 first
-
This means upgrading kubeadm to 1.27.X, then using it to upgrade the cluster
-
Then upgrading kubeadm to 1.28.X, etc.
-
Make sure to read the release notes before upgrading!
???
:EN:- Best practices for cluster upgrades :EN:- Example: upgrading a kubeadm cluster
:FR:- Bonnes pratiques pour la mise à jour des clusters :FR:- Exemple : mettre à jour un cluster kubeadm