
# Adding nodes to the cluster

- So far, our cluster has only 1 node

- Let's see what it takes to add more nodes

- We are going to use another set of machines: `kubenet`

---

## The environment

- We have 3 identical machines: `kubenet1`, `kubenet2`, `kubenet3`

- The Docker Engine is installed (and running) on these machines

- The Kubernetes binaries are installed, but nothing is running

- We will use `kubenet1` to run the control plane

---

## The plan

- Start the control plane on `kubenet1`

- Join the 3 nodes to the cluster

- Deploy and scale a simple web server

.lab[

- Log into `kubenet1`

]

---

## Running the control plane

- We will use a Compose file to start the control plane components

.lab[

- Clone the repository containing the workshop materials:
  ```bash
  git clone https://@@GITREPO@@
  ```

- Go to the `compose/simple-k8s-control-plane` directory:
  ```bash
  cd container.training/compose/simple-k8s-control-plane
  ```

- Start the control plane:
  ```bash
  docker-compose up
  ```

]

---

## Checking the control plane status

- Before moving on, verify that the control plane works

.lab[

- Show control plane component statuses (`cs` is the short name for `componentstatuses`, so both commands are equivalent):
  ```bash
  kubectl get componentstatuses
  kubectl get cs
  ```

- Show the (empty) list of nodes:
  ```bash
  kubectl get nodes
  ```

]

---

class: extra-details

## Differences from `dmuc`

- Our new control plane listens on `0.0.0.0` instead of the default `127.0.0.1`

- The ServiceAccount admission plugin is disabled

---

## Joining the nodes

- We need to generate a `kubeconfig` file for kubelet

- This time, the server address must be the public IP address of `kubenet1`

  (instead of `localhost` or `127.0.0.1`)

.lab[

- Generate the `kubeconfig` file:
  ```bash
  kubectl config set-cluster kubenet --server http://`X.X.X.X`:8080
  kubectl config set-context kubenet --cluster kubenet
  kubectl config use-context kubenet
  cp ~/.kube/config ~/kubeconfig
  ```

]

---

## Distributing the `kubeconfig` file

- We need that `kubeconfig` file on the other nodes, too

.lab[

- Copy `kubeconfig` to the other nodes:
  ```bash
  for N in 2 3; do
    scp ~/kubeconfig kubenet$N:
  done
  ```

]

---

## Starting kubelet

- Reminder: kubelet needs to run as root; don't forget `sudo`!

.lab[

- Join the first node:
  ```bash
  sudo kubelet --kubeconfig ~/kubeconfig
  ```

- Open more terminals and join the other nodes to the cluster:
  ```bash
  ssh kubenet2 sudo kubelet --kubeconfig ~/kubeconfig
  ssh kubenet3 sudo kubelet --kubeconfig ~/kubeconfig
  ```

]

---

## Checking cluster status

- We should now see all 3 nodes

- At first, their `STATUS` will be `NotReady`

- They will move to `Ready` state after approximately 10 seconds

.lab[

- Check the list of nodes:
  ```bash
  kubectl get nodes
  ```

]
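
Instead of re-running the command by hand, the wait can be scripted. Here is a minimal sketch, assuming the default column layout of `kubectl get nodes --no-headers` (name first, status second); the helper name `count_not_ready` is hypothetical.

```bash
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints how many nodes have a STATUS other than exactly "Ready".
count_not_ready() {
  awk '$2 != "Ready" { n++ } END { print n + 0 }'
}

# Usage on the cluster (not executed here):
#   kubectl get nodes --no-headers | count_not_ready
```

When the helper prints `0`, all registered nodes report `Ready`.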

---

## Deploy a web server

- Let's create a Deployment and scale it

  (so that we have multiple pods on multiple nodes)

.lab[

- Create a Deployment running `jpetazzo/color`:
  ```bash
  kubectl create deployment blue --image=jpetazzo/color
  ```

- Scale it:
  ```bash
  kubectl scale deployment blue --replicas=5
  ```

]

---

## Check our pods

- The pods will be scheduled on the nodes

- The nodes will pull the `jpetazzo/color` image, and start the pods

- What are the IP addresses of our pods?

.lab[

- Check the IP addresses of our pods:
  ```bash
  kubectl get pods -o wide
  ```

]

--

🤔 Something's not right ... Some pods have the same IP address!
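
With more pods, duplicates get harder to spot by eye. Here is a minimal sketch that lists the addresses used more than once; the helper name `duplicate_ips` is hypothetical, and it expects one IP address per line.

```bash
# Hypothetical helper: reads one pod IP per line on stdin and prints
# each address that appears more than once.
duplicate_ips() {
  sort | uniq -d
}

# Usage on the cluster (not executed here):
#   kubectl get pods -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' | duplicate_ips
```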

---

## What's going on?

- Without the `--network-plugin` flag, kubelet defaults to "no-op" networking

- It lets the container engine use a default network

  (in this case, we end up with the default Docker bridge)

- Our pods are running on independent, disconnected, host-local networks

---

## What do we need to do?

- On a normal cluster, kubelet is configured to set up pod networking with CNI plugins

- This requires:

  - installing CNI plugins

  - writing CNI configuration files

  - running kubelet with `--network-plugin=cni`

---

## Using network plugins

- We need to set up a better network

- Before diving into CNI, we will use the `kubenet` plugin

- This plugin creates a `cbr0` bridge and connects the containers to that bridge

- This plugin allocates IP addresses from a range:

  - either specified to kubelet (e.g. with `--pod-cidr`)

  - or stored in the node's `spec.podCIDR` field

.footnote[See [here][kubenet-plugin] for more details about this `kubenet` plugin.]

[kubenet-plugin]: https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#kubenet

---

## What `kubenet` does and *does not* do

- It allocates IP addresses to pods *locally*

  (each node has its own local subnet)

- It connects the pods to a *local* bridge

  (pods on the same node can communicate with each other, but not with pods on other nodes)

- It doesn't set up routing or tunneling

  (we get pods on separated networks; we need to connect them somehow)

- It doesn't allocate subnets to nodes

  (this can be done manually, or by the controller manager)

---

## Setting up routing or tunneling

- *On each node*, we will add routes to the other nodes' pod network

- Of course, this is not convenient or scalable!

- We will see better techniques to do this; but for now, hang on!

---

## Allocating subnets to nodes

- There are multiple options:

  - passing the subnet to kubelet with the `--pod-cidr` flag

  - manually setting `spec.podCIDR` on each node

  - allocating node CIDRs automatically with the controller manager

- The last option would be implemented by adding these flags to the controller manager:
  ```
  --allocate-node-cidrs=true --cluster-cidr=<cidr>
  ```

---

class: extra-details

## The pod CIDR field is not mandatory

- `kubenet` needs the pod CIDR, but other plugins don't need it

  (e.g. because they allocate addresses in multiple pools, or a single big one)

- The pod CIDR field may eventually be deprecated and replaced by an annotation

  (see [kubernetes/kubernetes#57130](https://github.com/kubernetes/kubernetes/issues/57130))

---

## Restarting kubelet with pod CIDR

- We need to stop and restart all our kubelets

- We will add the `--network-plugin` and `--pod-cidr` flags

- Each of us has a "cluster number" (let's call it `C`) printed on our VM info card

- We will use pod CIDR `10.C.N.0/24` (where `N` is the node number: 1, 2, 3)

.lab[

- Stop all the kubelets (Ctrl-C is fine)

- Restart them all, adding `--network-plugin=kubenet --pod-cidr 10.C.N.0/24`

]
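
To avoid typos when building the flags, the `10.C.N.0/24` convention can be written as a small shell sketch. The helper names `pod_cidr` and `kubelet_command` are hypothetical, and the cluster number 42 is a made-up example; the sketch only prints the command, it does not start kubelet.

```bash
# Hypothetical helper: prints the pod CIDR for cluster number C ($1)
# and node number N ($2), following the 10.C.N.0/24 convention.
pod_cidr() {
  printf '10.%s.%s.0/24\n' "$1" "$2"
}

# Hypothetical helper: prints (does not run) the kubelet command for node N.
kubelet_command() {
  printf 'sudo kubelet --kubeconfig ~/kubeconfig --network-plugin=kubenet --pod-cidr %s\n' "$(pod_cidr "$1" "$2")"
}

# Example with the made-up cluster number 42, on node 2:
kubelet_command 42 2
```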

---

## What happens to our pods?

- When we stop (or kill) kubelet, the containers keep running

- When kubelet starts again, it detects the containers

.lab[

- Check that our pods are still here:
  ```bash
  kubectl get pods -o wide
  ```

]

🤔 But our pods still use local IP addresses!

---

## Recreating the pods

- The IP address of a pod cannot change

- kubelet doesn't automatically kill/restart containers with "invalid" addresses
  <br/>
  (in fact, from kubelet's point of view, there is no such thing as an "invalid" address)

- We must delete our pods and recreate them

.lab[

- Delete all the pods, and let the ReplicaSet recreate them:
  ```bash
  kubectl delete pods --all
  ```

- Wait for the pods to be up again:
  ```bash
  kubectl get pods -o wide -w
  ```

]

---

## Adding kube-proxy

- Let's start kube-proxy to provide internal load balancing

- Then see if we can create a Service and use it to contact our pods

.lab[

- Start kube-proxy:
  ```bash
  sudo kube-proxy --kubeconfig ~/.kube/config
  ```

- Expose our Deployment:
  ```bash
  kubectl expose deployment blue --port=80
  ```

]

---

## Test internal load balancing

.lab[

- Retrieve the ClusterIP address:
  ```bash
  kubectl get svc blue
  ```

- Send a few requests to the ClusterIP address (with `curl`)

]

--

Sometimes it works, sometimes it doesn't. Why?
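
To put numbers on this, the requests can be sent in a loop. Here is a minimal sketch; the helper name `probe_service` is hypothetical, and the one-second timeout is an arbitrary choice.

```bash
# Hypothetical helper: sends COUNT ($2) requests to ADDR ($1) and prints
# how many succeeded and how many failed. A request counts as failed when
# curl cannot connect or gets no full response within one second.
probe_service() {
  addr=$1 count=$2 ok=0 failed=0 i=0
  while [ "$i" -lt "$count" ]; do
    if curl --silent --output /dev/null --max-time 1 "http://$addr"; then
      ok=$((ok + 1))
    else
      failed=$((failed + 1))
    fi
    i=$((i + 1))
  done
  echo "ok=$ok failed=$failed"
}

# Usage (replace the placeholder with the ClusterIP shown by `kubectl get svc blue`):
#   probe_service CLUSTER_IP 10
```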

---

## Routing traffic

- Our pods have new, distinct IP addresses

- But they are on host-local, isolated networks

- If we try to ping a pod on a different node, it won't work

- kube-proxy merely rewrites the destination IP address

- But we need that IP address to be reachable in the first place

- How do we fix this?

  (hint: check the title of this slide!)

---

## Important warning

- The technique that we are about to use doesn't work everywhere

- It only works if:

  - all the nodes are directly connected to each other (at layer 2)

  - the underlying network allows the IP addresses of our pods

- If we are on physical machines connected by a switch: OK

- If we are on virtual machines in a public cloud: NOT OK by default

  - on AWS, we need to disable "source and destination checks" on our instances

  - on OpenStack, we need to disable "port security" on our network ports

---

## Routing basics

- We need to tell *each* node:

  "The subnet 10.C.N.0/24 is located on node N" (for all values of N)

- This is how we add a route on Linux:
  ```bash
  ip route add 10.C.N.0/24 via W.X.Y.Z
  ```

  (where `W.X.Y.Z` is the internal IP address of node N)

- We can see the internal IP addresses of our nodes with:
  ```bash
  kubectl get nodes -o wide
  ```
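
The rule above can be turned into a small loop. Here is a minimal sketch that prints (without running them) the `ip route add` commands one node needs for the other nodes' subnets. The helper name `print_routes`, the cluster number 42, and the internal addresses are all made up for illustration; `sudo` is added on the assumption that we are not logged in as root.

```bash
# Hypothetical helper: prints (does not run) the routes that one node needs.
# Arguments: cluster number C, this node's number, then one internal IP
# address per node, in node order (node 1 first).
print_routes() {
  c=$1 self=$2
  shift 2
  n=1
  for ip in "$@"; do
    # Skip our own subnet: it is reachable locally, through the bridge.
    if [ "$n" -ne "$self" ]; then
      echo "sudo ip route add 10.$c.$n.0/24 via $ip"
    fi
    n=$((n + 1))
  done
}

# Example for node 1 of the made-up cluster 42, with made-up addresses:
print_routes 42 1 10.10.0.11 10.10.0.12 10.10.0.13
```

Printing the commands first makes it easy to review them before pasting them on each node.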

---

## Firewalling

- By default, Docker prevents containers from using arbitrary IP addresses

  (by setting up iptables rules)

- We need to allow our containers to use our pod CIDR

- For simplicity, we will insert a blanket iptables rule allowing all traffic:

  `iptables -I FORWARD -j ACCEPT`

- This has to be done on every node
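
If a blanket rule feels too permissive, a narrower variant is possible. Here is a minimal sketch, assuming every pod CIDR follows the `10.C.N.0/24` convention and therefore falls inside `10.C.0.0/16`. The helper name `print_forward_rules` and the cluster number 42 are made up; the rules are printed, not applied.

```bash
# Hypothetical helper: prints (does not run) two FORWARD rules that accept
# only traffic coming from, or going to, the pod range of cluster number C ($1).
print_forward_rules() {
  range="10.$1.0.0/16"
  echo "sudo iptables -I FORWARD -s $range -j ACCEPT"
  echo "sudo iptables -I FORWARD -d $range -j ACCEPT"
}

# Example with the made-up cluster number 42:
print_forward_rules 42
```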

---

## Setting up routing

.lab[

- Create all the routes on all the nodes

- Insert the iptables rule allowing traffic

- Check that you can ping all the pods from one of the nodes

- Check that you can `curl` the ClusterIP of the Service successfully

]

---

## What's next?

- We did a lot of manual operations:

  - allocating subnets to nodes

  - adding command-line flags to kubelet

  - updating the routing tables on our nodes

- We want to automate all these steps

- We want something that works on all networks

???

:EN:- Connecting nodes and pods
:FR:- Interconnecter les nœuds et les pods