From 2efc29991e9bda2effc30ddcb30e957f5d846ae4 Mon Sep 17 00:00:00 2001 From: Jerome Petazzoni Date: Tue, 20 Nov 2018 12:45:32 -0600 Subject: [PATCH 1/2] Rewrite section about labels and selectors The old version was using a slightly confusing way to show which pods were receiving traffic: kubectl logs --tail 1 --selector app=rng (And then we look at the timestamp of the last request.) In this new version, concepts are introduced progressively; the YAML parser magic is isolated from the other concerns; we show the impact of removing a pod from load balancing in a way that is (IMHO) more straightforward: - follow logs of specific pod - remove pod from load balancer - logs instantly stop flowing These slides also explain why the DaemonSet and the ReplicaSet for the rng service don't step on each other's toes. --- slides/k8s/daemonset.md | 525 ++++++++++++++++++++++------------------ 1 file changed, 286 insertions(+), 239 deletions(-) diff --git a/slides/k8s/daemonset.md b/slides/k8s/daemonset.md index f3bfc54f..a569f438 100644 --- a/slides/k8s/daemonset.md +++ b/slides/k8s/daemonset.md @@ -252,38 +252,29 @@ The master node has [taints](https://kubernetes.io/docs/concepts/configuration/t --- -## What are all these pods doing? +## Is this working? -- Let's check the logs of all these `rng` pods - -- All these pods have the label `app=rng`: - - - the first pod, because that's what `kubectl create deployment` does - - the other ones (in the daemon set), because we - *copied the spec from the first one* - -- Therefore, we can query everybody's logs using that `app=rng` selector - -.exercise[ - -- Check the logs of all the pods having a label `app=rng`: - ```bash - kubectl logs -l app=rng --tail 1 - ``` - -] +- Look at the web UI -- -It appears that *all the pods* are serving requests at the moment. +- The graph should now go above 10 hashes per second! + +-- + +- It looks like the newly created pods are serving traffic correctly + +- How and why did this happen? + + (We didn't do anything special to add them to the `rng` service load balancer!) --- -## The magic of selectors +# Labels and selectors - The `rng` *service* is load balancing requests to a set of pods -- This set of pods is defined as "pods having the label `app=rng`" +- That set of pods is defined by the *selector* of the `rng` service .exercise[ @@ -294,19 +285,60 @@ It appears that *all the pods* are serving requests at the moment. ] -When we created additional pods with this label, they were -automatically detected by `svc/rng` and added as *endpoints* -to the associated load balancer. +- The selector is `app=rng` + +- It means "all the pods having the label `app=rng`" + + (They can have additional labels as well, that's OK!) --- -## Removing the first pod from the load balancer +## Selector evaluation + +- We can use selectors with many `kubectl` commands + +- For instance, with `kubectl get`, `kubectl logs`, `kubectl delete` ... and more + +.exercise[ + +- Get the list of pods matching selector `app=rng`: + ```bash + kubectl get pods -l app=rng + kubectl get pods --selector app=rng + ``` + +] + +But ... why do these pods (in particular, the *new* ones) have this `app=rng` label? + +--- + +## Where do labels come from? + +- When we create a deployment with `kubectl create deployment rng`, +
this deployment gets the label `app=rng` + +- The replica sets created by this deployment also get the label `app=rng` + +- The pods created by these replica sets also get the label `app=rng` + +- When we created the daemon set from the deployment, we re-used the same spec + +- Therefore, the pods created by the daemon set get the same labels + +.footnote[Note: when we use `kubectl run stuff`, the label is `run=stuff` instead.] + +--- + +## Updating load balancer configuration + +- We would like to remove a pod from the load balancer - What would happen if we removed that pod, with `kubectl delete pod ...`? -- - The `replicaset` would re-create it immediately. + It would be re-created immediately (by the replica set or the daemon set) -- @@ -314,90 +346,272 @@ to the associated load balancer. -- - The `replicaset` would re-create it immediately. + It would *also* be re-created immediately -- - ... Because what matters to the `replicaset` is the number of pods *matching that selector.* - --- - -- But but but ... Don't we have more than one pod with `app=rng` now? - --- - - The answer lies in the exact selector used by the `replicaset` ... + Why?!? --- -## Deep dive into selectors +## Selectors for replica sets and daemon sets -- Let's look at the selectors for the `rng` *deployment* and the associated *replica set* +- The "mission" of a replica set is: + + "Make sure that there is the right number of pods matching this spec!" + +- The "mission" of a daemon set is: + + "Make sure that there is a pod matching this spec on each node!" + +-- + +- *In fact,* replica sets and daemon sets do not check pod specifications + +- They merely have a *selector*, and they look for pods matching that selector + +- Yes, we can fool them by manually creating pods with the "right" labels + +- Bottom line: if we remove our `app=rng` label ... + + ... The pod "diseappears" for its parent, which re-creates another pod to replace it + +--- + +class: extra-details + +## Isolation of replica sets and daemon sets + +- Since both the `rng` daemon set and the `rng` replica set use `app=rng` ... + + ... Why don't they "find" each other's pods? + +-- + +- *Replica sets* have a more specific selector, visible with `kubectl describe` + + (It looks like `app=rng,pod-template-hash=abcd1234`) + +- *Daemon sets* also have a more specific selector, but it's invisible + + (It looks like `app=rng,controller-revision-hash=abcd1234`) + +- As a result, each controller only "sees" the pods it manages + +--- + +## Removing a pod from the load balancer + +- Currently, the `rng` service is defined by the `app=rng` selector + +- The only way to remove a pod is to remove or change the `app` label + +- ... But that will cause another pod to be created instead! + +- What's the solution? + +-- + +- We need to change the selector of the `rng` service! + +- Let's add another label to that selector (e.g. `enabled=yes`) + +--- + +## Complex selectors + +- If a selector specifies multiple labels, they are understood as a logical *AND* + + (In other words: the pods must match all the labels) + +- Kubernetes has support for advanced, set-based selectors + + (But these cannot be used with services, at least not yet!) + +--- + +## The plan + +1. Add the label `enabled=yes` to all our `rng` pods + +2. Update the selector for the `rng` service to also include `enabled=yes` + +3. Toggle traffic to a pod by manually adding/removing the `enabled` label + +4. Profit! 
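For reference, here is a sketch of the commands behind each step (the pod name in step 3 is just an example; the exact service edit is covered in the next slides):

```bash
# Step 1: add the label to every pod that currently has app=rng
kubectl label pods -l app=rng enabled=yes

# Step 2: edit the service so that its selector requires both labels
kubectl edit service rng

# Step 3: take one pod out of rotation by removing the label
kubectl label pod rng-xxxxxxxxxx-yyyyy enabled-
```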
+ +*Note: if we swap steps 1 and 2, it will cause a short +service disruption, because there will be a period of time +during which the service selector won't match any pod. +During that time, requests to the service will time out. +By doing things in the order above, we guarantee that there won't +be any interruption.* + +--- + +## Adding labels to pods + +- We want to add the label `enabled=yes` to all pods that have `app=rng` + +- We could edit each pod one by one with `kubectl edit` ... + +- ... Or we could use `kubectl label` to label them all + +- `kubectl label` can use selectors itself .exercise[ -- Show detailed information about the `rng` deployment: +- Add `enabled=yes` to all pods that have `app=rng`: ```bash - kubectl describe deploy rng + kubectl label pods -l app=rng enabled=yes ``` -- Show detailed information about the `rng` replica: -
(The second command doesn't require you to get the exact name of the replica set) +] + +--- + +## Updating the service selector + +- We need to edit the service specification + +- Reminder: in the service definition, we will see `app: rng` in two places + + - the label of the service itself (we don't need to touch that one) + + - the selector of the service (that's the one we want to change) + +.exercise[ + +- Update the service to add `enabled: yes` to its selector: ```bash - kubectl describe rs rng-yyyyyyyy - kubectl describe rs -l app=rng + kubectl edit service rng ``` + + ] -- -The replica set selector also has a `pod-template-hash`, unlike the pods in our daemon set. +... And then we get *the weirdest error ever.* Why? --- -# Updating a service through labels and selectors +## When the YAML parser is being too smart -- What if we want to drop the `rng` deployment from the load balancer? +- YAML parsers try to help us: -- Option 1: + - `xyz` is the string `"xyz"` - - destroy it + - `42` is the integer `42` -- Option 2: + - `yes` is the boolean value `true` - - add an extra *label* to the daemon set +- If we want the string `"42"` or the string `"yes"`, we have to quote them - - update the service *selector* to refer to that *label* +- So we have to use `enabled: "yes"` --- - -Of course, option 2 offers more learning opportunities. Right? +.footnote[For a good laugh: if we had used "ja", "oui", "si" ... as the value, it would have worked!] --- -## Add an extra label to the daemon set +## Updating the service selector, take 2 -- We will update the daemon set "spec" +.exercise[ -- Option 1: +- Update the service to add `enabled: "yes"` to its selector: + ```bash + kubectl edit service rng + ``` - - edit the `rng.yml` file that we used earlier + - - load the new definition with `kubectl apply` +] -- Option 2: +This time it should work! - - use `kubectl edit` - --- - -*If you feel like you got this๐Ÿ’•๐ŸŒˆ, feel free to try directly.* - -*We've included a few hints on the next slides for your convenience!* +If we did everything correctly, the web UI shouldn't show any change. --- +## Updating labels + +- We want to disable the pod that was created by the deployment + +- All we have to do, is remove the `enabled` label from that pod + +- To identify that pod, we can use its name + +- ... Or rely on the fact that it's the only one with a `pod-template-hash` label + +- Good to know: + + - `kubectl label ... foo=` doesn't remove a label (it sets it to an empty string) + + - to remove label `foo`, use `kubectl label ... foo-` + + - to change an existing label, we would need to add `--overwrite` + +--- + +## Removing a pod from the load balancer + +.exercise[ + +- In one window, check the logs of that pod: + ```bash + POD=$(kubectl get pod -l app=rng,pod-template-hash -o name) + kubectl logs --tail 1 --follow $POD + + ``` + (We should see a steady stream of HTTP logs) + +- In another window, remove the label from the pod: + ```bash + kubectl label pod -l app=rng,pod-template-hash enabled- + ``` + (The stream of HTTP logs should stop immediately) + +] + +There might be a slight change in the web UI (since we removed a bit +of capacity from the `rng` service). If we remove more pods, +the effect should be more visible. 
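As an extra sanity check (optional), we can compare the endpoints of the service before and after removing the label, and re-add the label if we want to put the pod back:

```bash
# The list of addresses should shrink by one when the label is removed
kubectl get endpoints rng

# Re-adding the label puts the pod back into the load balancer
kubectl label pods -l app=rng,pod-template-hash enabled=yes
```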
+ +--- + +class: extra-details + +## Updating the daemon set + +- If we scale up our cluster by adding new nodes, the daemon set will create more pods + +- These pods won't have the `enabled=yes` label + +- If we want these pods to have that label, we need to edit the daemon set spec + +- We can do that with e.g. `kubectl edit daemonset rng` + +--- + +class: extra-details + ## We've put resources in your resources - Reminder: a daemon set is a resource that creates more resources! @@ -410,7 +624,9 @@ Of course, option 2 offers more learning opportunities. Right? - the label(s) of the resource(s) created by the first resource (in the `template` block) -- You need to update the selector and the template (metadata labels are not mandatory) +- We would need to update the selector and the template + + (metadata labels are not mandatory) - The template must match the selector @@ -418,175 +634,6 @@ Of course, option 2 offers more learning opportunities. Right? --- -## Adding our label - -- Let's add a label `isactive: yes` - -- In YAML, `yes` should be quoted; i.e. `isactive: "yes"` - -.exercise[ - -- Update the daemon set to add `isactive: "yes"` to the selector and template label: - ```bash - kubectl edit daemonset rng - ``` - - - -- Update the service to add `isactive: "yes"` to its selector: - ```bash - kubectl edit service rng - ``` - - - -] - ---- - -## Checking what we've done - -.exercise[ - -- Check the most recent log line of all `app=rng` pods to confirm that exactly one per node is now active: - ```bash - kubectl logs -l app=rng --tail 1 - ``` - -] - -The timestamps should give us a hint about how many pods are currently receiving traffic. - -.exercise[ - -- Look at the pods that we have right now: - ```bash - kubectl get pods - ``` - -] - ---- - -## Cleaning up - -- The pods of the deployment and the "old" daemon set are still running - -- We are going to identify them programmatically - -.exercise[ - -- List the pods with `app=rng` but without `isactive=yes`: - ```bash - kubectl get pods -l app=rng,isactive!=yes - ``` - -- Remove these pods: - ```bash - kubectl delete pods -l app=rng,isactive!=yes - ``` - -] - ---- - -## Cleaning up stale pods - -``` -$ kubectl get pods -NAME READY STATUS RESTARTS AGE -rng-54f57d4d49-7pt82 1/1 Terminating 0 51m -rng-54f57d4d49-vgz9h 1/1 Running 0 22s -rng-b85tm 1/1 Terminating 0 39m -rng-hfbrr 1/1 Terminating 0 39m -rng-vplmj 1/1 Running 0 7m -rng-xbpvg 1/1 Running 0 7m -[...] -``` - -- The extra pods (noted `Terminating` above) are going away - -- ... But a new one (`rng-54f57d4d49-vgz9h` above) was restarted immediately! - --- - -- Remember, the *deployment* still exists, and makes sure that one pod is up and running - -- If we delete the pod associated to the deployment, it is recreated automatically - ---- - -## Deleting a deployment - -.exercise[ - -- Remove the `rng` deployment: - ```bash - kubectl delete deployment rng - ``` -] - --- - -- The pod that was created by the deployment is now being terminated: - -``` -$ kubectl get pods -NAME READY STATUS RESTARTS AGE -rng-54f57d4d49-vgz9h 1/1 Terminating 0 4m -rng-vplmj 1/1 Running 0 11m -rng-xbpvg 1/1 Running 0 11m -[...] -``` - -Ding, dong, the deployment is dead! And the daemon set lives on. - ---- - -## Avoiding extra pods - -- When we changed the definition of the daemon set, it immediately created new pods. We had to remove the old ones manually. - -- How could we have avoided this? - --- - -- By adding the `isactive: "yes"` label to the pods before changing the daemon set! 
- -- This can be done programmatically with `kubectl patch`: - - ```bash - PATCH=' - metadata: - labels: - isactive: "yes" - ' - kubectl get pods -l app=rng -l controller-revision-hash -o name | - xargs kubectl patch -p "$PATCH" - ``` - ---- - ## Labels and debugging - When a pod is misbehaving, we can delete it: another one will be recreated From 9fa7b958dca8dd4b670e2ccaf7dc6fe14c4d9ff9 Mon Sep 17 00:00:00 2001 From: Jerome Petazzoni Date: Thu, 6 Dec 2018 21:38:26 -0600 Subject: [PATCH 2/2] Update Consul demo to use Cloud auto-join Consul 1.4 introduces Cloud auto-join, which finds the IP addresses of the other nodes by querying an API (in that case, the Kubernetes API). This involves creating a service account and granting permissions to list and get pods. It is a little bit more complex, but it reuses previous notions (like RBAC) so I like it better. --- k8s/consul.yaml | 46 ++++++++++++++++++++++++++------- slides/k8s/statefulsets.md | 52 ++++++++++++++++++++++++-------------- 2 files changed, 70 insertions(+), 28 deletions(-) diff --git a/k8s/consul.yaml b/k8s/consul.yaml index 2e5bc138..a82d4733 100644 --- a/k8s/consul.yaml +++ b/k8s/consul.yaml @@ -1,3 +1,37 @@ +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: consul + labels: + app: consul +rules: + - apiGroups: [""] + resources: + - pods + verbs: + - get + - list +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: consul +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: consul +subjects: + - kind: ServiceAccount + name: consul + namespace: default +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: consul + labels: + app: consul +--- apiVersion: v1 kind: Service metadata: @@ -24,6 +58,7 @@ spec: labels: app: consul spec: + serviceAccountName: consul affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: @@ -37,18 +72,11 @@ spec: terminationGracePeriodSeconds: 10 containers: - name: consul - image: "consul:1.2.2" - env: - - name: NAMESPACE - valueFrom: - fieldRef: - fieldPath: metadata.namespace + image: "consul:1.4.0" args: - "agent" - "-bootstrap-expect=3" - - "-retry-join=consul-0.consul.$(NAMESPACE).svc.cluster.local" - - "-retry-join=consul-1.consul.$(NAMESPACE).svc.cluster.local" - - "-retry-join=consul-2.consul.$(NAMESPACE).svc.cluster.local" + - "-retry-join=provider=k8s label_selector=\"app=consul\"" - "-client=0.0.0.0" - "-data-dir=/consul/data" - "-server" diff --git a/slides/k8s/statefulsets.md b/slides/k8s/statefulsets.md index 35c7fc20..3dc45286 100644 --- a/slides/k8s/statefulsets.md +++ b/slides/k8s/statefulsets.md @@ -266,7 +266,9 @@ spec: --- -## Stateful sets in action +# Running a Consul cluster + +- Here is a good use-case for Stateful sets! - We are going to deploy a Consul cluster with 3 nodes @@ -294,42 +296,54 @@ consul agent -data=dir=/consul/data -client=0.0.0.0 -server -ui \ -retry-join=`Y.Y.Y.Y` ``` -- We need to replace X.X.X.X and Y.Y.Y.Y with the addresses of other nodes +- Replace X.X.X.X and Y.Y.Y.Y with the addresses of other nodes -- We can specify DNS names, but then they have to be FQDN - -- It's OK for a pod to include itself in the list as well - -- We can therefore use the same command-line on all nodes (easier!) +- The same command-line can be used on all nodes (convenient!) 
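For instance (the addresses below are made up), every node could run exactly the same thing:

```bash
consul agent -data-dir=/consul/data -client=0.0.0.0 -server -ui \
       -bootstrap-expect=3 \
       -retry-join=10.1.0.1 -retry-join=10.1.0.2 -retry-join=10.1.0.3
```

(It's OK for a node to see its own address in the list.)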
--- -## Discovering the addresses of other pods +## Cloud Auto-join -- When a service is created for a stateful set, individual DNS entries are created +- Since version 1.4.0, Consul can use the Kubernetes API to find its peers -- These entries are constructed like this: +- This is called [Cloud Auto-join] - `-...svc.cluster.local` +- Instead of passing an IP address, we need to pass a parameter like this: -- `` is the number of the pod in the set (starting at zero) + ``` + consul agent -retry-join "provider=k8s label_selector=\"app=consul\"" + ``` -- If we deploy Consul in the default namespace, the names could be: +- Consul needs to be able to talk to the Kubernetes API - - `consul-0.consul.default.svc.cluster.local` - - `consul-1.consul.default.svc.cluster.local` - - `consul-2.consul.default.svc.cluster.local` +- We can provide a `kubeconfig` file + +- If Consul runs in a pod, it will use the *service account* of the pod + +[Cloud Auto-join]: https://www.consul.io/docs/agent/cloud-auto-join.html#kubernetes-k8s- + +--- + +## Setting up Cloud auto-join + +- We need to create a service account for Consul + +- We need to create a role that can `list` and `get` pods + +- We need to bind that role to the service account + +- And of course, we need to make sure that Consul pods use that service account --- ## Putting it all together -- The file `k8s/consul.yaml` defines a service and a stateful set +- The file `k8s/consul.yaml` defines the required resources + + (service account, cluster role, cluster role binding, service, stateful set) - It has a few extra touches: - - the name of the namespace is injected through an environment variable - - a `podAntiAffinity` prevents two pods from running on the same node - a `preStop` hook makes the pod leave the cluster when shutdown gracefully
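Once the resources are created, here is a quick way to check that the cluster formed correctly (a sketch; `consul-0` is the name of the first pod created by the stateful set):

```bash
# Create the service account, RBAC rules, service, and stateful set
kubectl apply -f k8s/consul.yaml

# The three pods should come up one by one (stateful sets start pods in order)
kubectl get pods -l app=consul

# Ask any member to list the nodes in the Consul cluster
kubectl exec consul-0 -- consul members
```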