The Container Network Interface
-
Allows us to decouple network configuration from Kubernetes
-
Implemented by plugins
-
Plugins are executables that will be invoked by kubelet
- Plugins are responsible for:
  - allocating IP addresses for containers
  - configuring the network for containers
- Plugins can be combined and chained when it makes sense
Combining plugins
- Interface could be created by e.g. the `vlan` or `bridge` plugin
- IP address could be allocated by e.g. the `dhcp` or `host-local` plugin
- Interface parameters (MTU, sysctls) could be tweaked by the `tuning` plugin
The reference plugins are available here.
Look in each plugin's directory for its documentation.
How does kubelet know which plugins to use?
- The plugin (or list of plugins) is set in the CNI configuration
- The CNI configuration is a single file in `/etc/cni/net.d`
- If there are multiple files in that directory, the first one is used
  (in lexicographic order)
- That path can be changed with the `--cni-conf-dir` flag of kubelet
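To preview which file would be picked, the selection rule above can be sketched in a few lines of shell. The helper name `first_cni_config` is made up for this example. The filter on `.conf`, `.conflist`, and `.json` extensions reflects our understanding of how the reference CNI library scans the directory. This is an approximation, not kubelet's actual code.

```shell
# Hypothetical helper: print the CNI configuration file that would be picked
# from a directory (default: /etc/cni/net.d), i.e. the first one in
# lexicographic order among *.conf, *.conflist and *.json files.
first_cni_config() {
  dir="${1:-/etc/cni/net.d}"
  # LC_ALL=C gives a plain byte-wise sort, independent of the locale.
  ls -1 "$dir" 2>/dev/null \
    | grep -E '\.(conf|conflist|json)$' \
    | LC_ALL=C sort \
    | head -n 1
}

# Demo on a scratch directory, so that nothing on the host is touched.
# (The file names below are illustrative.)
DEMO_DIR=$(mktemp -d)
touch "$DEMO_DIR/99-other.conf" "$DEMO_DIR/10-kuberouter.conflist" "$DEMO_DIR/README.txt"
first_cni_config "$DEMO_DIR"
```

This is why configuration files are usually given a numeric prefix: it makes the ordering explicit.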
CNI configuration in practice
-
When we set up the "pod network" (like Calico, Weave...) it ships a CNI configuration
(and sometimes, custom CNI plugins)
-
Very often, that configuration (and plugins) is installed automatically
(by a DaemonSet featuring an initContainer with hostPath volumes)
- Examples:
  - Calico CNI config and volume
  - kube-router CNI config and volume
class: extra-details
Conf vs conflist
- There are two slightly different configuration formats
- Basic configuration format:
  - holds configuration for a single plugin
  - typically has a `.conf` name suffix
  - has a `type` string field in the top-most structure
  - examples
- Configuration list format:
  - can hold configuration for multiple (chained) plugins
  - typically has a `.conflist` name suffix
  - has a `plugins` list field in the top-most structure
  - examples
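For illustration, here is roughly what a configuration list chaining two plugins could look like. The network name, bridge name, subnet, sysctl value, and file name are all made up. The field names are the ones we understand the CNI specification and the reference plugins to use. The file is written to a scratch directory, not to `/etc/cni/net.d`.

```shell
# Write a sample configuration list to a scratch directory.
# All values (network name, bridge name, subnet, sysctl) are illustrative only.
CONF_DIR=$(mktemp -d)
cat > "$CONF_DIR/10-demo.conflist" <<'EOF'
{
  "cniVersion": "0.3.1",
  "name": "demo-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "demo0",
      "isGateway": true,
      "ipam": {
        "type": "host-local",
        "subnet": "10.99.0.0/24"
      }
    },
    {
      "type": "tuning",
      "sysctl": {
        "net.core.somaxconn": "500"
      }
    }
  ]
}
EOF

# List the plugin types referenced by this configuration list.
grep '"type"' "$CONF_DIR/10-demo.conflist"
```

In the basic format, there would be no `plugins` list: the `type` field and the plugin's own settings would sit directly in the top-most structure.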
class: extra-details
How plugins are invoked
- Parameters are given through environment variables, including:
  - CNI_COMMAND: desired operation (ADD, DEL, CHECK, or VERSION)
  - CNI_CONTAINERID: container ID
  - CNI_NETNS: path to network namespace file
  - CNI_IFNAME: what the network interface should be named
- The network configuration must be provided to the plugin on stdin
  (this avoids race conditions that could happen by passing a file path)
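The calling convention can be tried out without touching the network by using a toy stand-in for a plugin. The script below is not a real CNI plugin: it only prints what it received. The container ID, namespace path, and configuration are made-up values. A real plugin would configure the interface and report its result as JSON on stdout.

```shell
# Toy stand-in for a CNI plugin (NOT a real one): it only reports what it
# received, which is enough to illustrate the calling convention.
TOY_PLUGIN=$(mktemp)
cat > "$TOY_PLUGIN" <<'EOF'
echo "command=$CNI_COMMAND container=$CNI_CONTAINERID netns=$CNI_NETNS ifname=$CNI_IFNAME"
echo "stdin=$(cat)"
EOF

# Call it the way a container runtime would call a plugin:
# parameters in the environment, network configuration on stdin.
# (Container ID, netns path and configuration below are made-up values.)
TOY_OUTPUT=$(
  echo '{"cniVersion":"0.3.1","name":"demo-net","type":"toy"}' |
  CNI_COMMAND=ADD \
  CNI_CONTAINERID=abc123 \
  CNI_NETNS=/var/run/netns/demo \
  CNI_IFNAME=eth0 \
  sh "$TOY_PLUGIN"
)
echo "$TOY_OUTPUT"
```

Passing the configuration on stdin means that the plugin never has to open a file that might be rewritten while it is being read.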
In practice: kube-router
-
We are going to set up a new cluster
-
For this new cluster, we will use kube-router
-
kube-router will provide the "pod network"
(connectivity with pods)
-
kube-router will also provide internal service connectivity
(replacing kube-proxy)
How kube-router works
-
Very simple architecture
- Does not introduce new CNI plugins
  (uses the `bridge` plugin, with `host-local` for IPAM)
- Pod traffic is routed between nodes
  (no tunnel, no new protocol)
-
Internal service connectivity is implemented with IPVS
-
Can provide pod network and/or internal service connectivity
-
kube-router daemon runs on every node
What kube-router does
-
Connect to the API server
- Obtain the local node's `podCIDR`
- Inject it into the CNI configuration file
  (we'll use `/etc/cni/net.d/10-kuberouter.conflist`)
- Obtain the addresses of all nodes
-
Establish a full mesh BGP peering with the other nodes
-
Exchange routes over BGP
class: extra-details
What's BGP?
-
BGP (Border Gateway Protocol) is the protocol used between internet routers
-
It scales pretty well (it is used to announce the 700k CIDR prefixes of the internet)
-
It is spoken by many hardware routers from many vendors
-
It also has many software implementations (Quagga, Bird, FRR...)
-
Experienced network folks generally know it (and appreciate it)
-
It is also used by Calico (another popular network system for Kubernetes)
-
Using BGP allows us to interconnect our "pod network" with other systems
The plan
- We'll work in a new cluster (named `kuberouter`)
- We will run a simple control plane (like before)
- ... But this time, the controller manager will allocate `podCIDR` subnets
  (so that we don't have to manually assign subnets to individual nodes)
-
We will create a DaemonSet for kube-router
-
We will join nodes to the cluster
-
The DaemonSet will automatically start a kube-router pod on each node
Logging into the new cluster
.lab[
- Log into node `kuberouter1`
- Clone the workshop repository:
  git clone https://@@GITREPO@@
- Move to this directory:
  cd container.training/compose/kube-router-k8s-control-plane
]
class: extra-details
Checking the CNI configuration
- By default, kubelet gets the CNI configuration from `/etc/cni/net.d`
.lab[
- Check the content of `/etc/cni/net.d`
]
(On most machines, at this point, `/etc/cni/net.d` doesn't even exist.)
Our control plane
-
We will use a Compose file to start the control plane
- It is similar to the one we used with the `kubenet` cluster
- The API server is started with `--allow-privileged`
  (because we will start kube-router in privileged pods)
- The controller manager is started with extra flags too:
  `--allocate-node-cidrs` and `--cluster-cidr`
- We need to edit the Compose file to set the Cluster CIDR
Starting the control plane
- Our cluster CIDR will be `10.C.0.0/16`
  (where `C` is our cluster number)
.lab[
- Edit the Compose file to set the Cluster CIDR:
  vim docker-compose.yaml
- Start the control plane:
  docker-compose up
]
The kube-router DaemonSet
- In the same directory, there is a `kuberouter.yaml` file
- It contains the definition for a DaemonSet and a ConfigMap
-
Before we load it, we also need to edit it
-
We need to indicate the address of the API server
(because kube-router needs to connect to it to retrieve node information)
Creating the DaemonSet
- The address of the API server will be `http://A.B.C.D:8080`
  (where `A.B.C.D` is the public address of `kuberouter1`, running the control plane)
.lab[
- Edit the YAML file to set the API server address:
  vim kuberouter.yaml
- Create the DaemonSet:
  kubectl create -f kuberouter.yaml
]
Note: the DaemonSet won't create any pods (yet) since there are no nodes (yet).
Generating the kubeconfig for kubelet
- This is similar to what we did for the `kubenet` cluster
.lab[
- Generate the kubeconfig file (replacing `X.X.X.X` with the address of `kuberouter1`):
  kubectl config set-cluster cni --server http://`X.X.X.X`:8080
  kubectl config set-context cni --cluster cni
  kubectl config use-context cni
  cp ~/.kube/config ~/kubeconfig
]
Distributing kubeconfig
- We need to copy that kubeconfig file to the other nodes
.lab[
- Copy `kubeconfig` to the other nodes:
  for N in 2 3; do
    scp ~/kubeconfig kuberouter$N:
  done
]
Starting kubelet
- We don't need the `--pod-cidr` option anymore
  (the controller manager will allocate these automatically)
- We need to pass `--network-plugin=cni`
.lab[
- Join the first node:
  sudo kubelet --kubeconfig ~/kubeconfig --network-plugin=cni
- Open more terminals and join the other nodes:
  ssh kuberouter2 sudo kubelet --kubeconfig ~/kubeconfig --network-plugin=cni
  ssh kuberouter3 sudo kubelet --kubeconfig ~/kubeconfig --network-plugin=cni
]
class: extra-details
Checking the CNI configuration
- At this point, kube-router should have installed its CNI configuration
  (in `/etc/cni/net.d`)
.lab[
- Check the content of `/etc/cni/net.d`
]
- There should be a file created by kube-router
- The file should contain the node's podCIDR
Setting up a test
- Let's create a Deployment and expose it with a Service
.lab[
- Create a Deployment running a web server:
  kubectl create deployment blue --image=jpetazzo/color
- Scale it so that it spans multiple nodes:
  kubectl scale deployment blue --replicas=5
- Expose it with a Service:
  kubectl expose deployment blue --port=8888
]
Checking that everything works
.lab[
- Get the ClusterIP address for the service:
  kubectl get svc blue
- Send a few requests there:
  curl `X.X.X.X`:8888
]
Note that if you send multiple requests, they are load-balanced in a round robin manner.
This shows that we are using IPVS (vs. iptables, which picked random endpoints).
class: extra-details
Troubleshooting
- What if we need to check that everything is working properly?
.lab[
- Check the IP addresses of our pods:
  kubectl get pods -o wide
- Check our routing table:
  route -n
  ip route
]
We should see the local pod CIDR connected to kube-bridge, and the other nodes' pod CIDRs having individual routes, with each node being the gateway.
class: extra-details
More troubleshooting
- We can also look at the output of the kube-router pods
  (with `kubectl logs`)
- kube-router also comes with a special shell that gives lots of useful info
  (we can access it with `kubectl exec`)
But with the current setup of the cluster, these options may not work!
-
Why?
class: extra-details
Trying kubectl logs / kubectl exec
.lab[
- Try to show the logs of a kube-router pod:
  kubectl -n kube-system logs ds/kube-router
- Or try to exec into one of the kube-router pods:
  kubectl -n kube-system exec kube-router-xxxxx bash
]
These commands will give an error message that includes:
dial tcp: lookup kuberouterX on 127.0.0.11:53: no such host
What does that mean?
class: extra-details
Internal name resolution
-
To execute these commands, the API server needs to connect to kubelet
- By default, it creates a connection using the kubelet's name
  (e.g. `http://kuberouter1:...`)
- This requires our node names to be in DNS
- We can change that by setting a flag on the API server:
  `--kubelet-preferred-address-types=InternalIP`
class: extra-details
Another way to check the logs
- We can also ask the container engine directly for the logs
- First, get the container ID, with `docker ps` or like this:
  CID=$(docker ps -q \
        --filter label=io.kubernetes.pod.namespace=kube-system \
        --filter label=io.kubernetes.container.name=kube-router)
- Then view the logs:
  docker logs $CID
class: extra-details
Other ways to distribute routing tables
-
We don't need kube-router and BGP to distribute routes
- The list of nodes (and associated `podCIDR` subnets) is available through the API
- This shell snippet generates the commands to add all required routes on a node:
NODES=$(kubectl get nodes -o name | cut -d/ -f2)
for DESTNODE in $NODES; do
if [ "$DESTNODE" != "$HOSTNAME" ]; then
echo $(kubectl get node $DESTNODE -o go-template="
route add -net {{.spec.podCIDR}} gw {{(index .status.addresses 0).address}}")
fi
done
-
This could be useful for embedded platforms with very limited resources
(or lab environments for learning purposes)
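The same idea can be tested offline by separating "get the node list" from "generate the routes". The sketch below is a variation on the snippet above, not something kube-router does. The helper name `gen_routes` and the sample node data are made up, and it prints `ip route add` commands instead of running them.

```shell
# Hypothetical helper: read "NODE PODCIDR ADDRESS" lines on stdin and print one
# "ip route add" command per remote node (the node named in $1 is skipped).
gen_routes() {
  self="$1"
  while read -r node cidr addr; do
    [ -n "$node" ] || continue         # ignore blank lines
    [ "$node" = "$self" ] && continue  # no route needed for our own pod CIDR
    echo "ip route add $cidr via $addr"
  done
}

# Made-up sample data; on a real cluster, the list could come from e.g.:
#   kubectl get nodes -o go-template='{{range .items}}{{.metadata.name}} {{.spec.podCIDR}} {{(index .status.addresses 0).address}}{{"\n"}}{{end}}'
ROUTES=$(gen_routes kuberouter1 <<'EOF'
kuberouter1 10.1.0.0/24 192.0.2.11
kuberouter2 10.1.1.0/24 192.0.2.12
kuberouter3 10.1.2.0/24 192.0.2.13
EOF
)
echo "$ROUTES"
```

Printing the commands rather than executing them makes it easy to review the result before applying it with root privileges.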
???
:EN:- Configuring CNI plugins
:FR:- Configurer des plugins CNI