Merge branch 'master' into kube-2019-01

Jerome Petazzoni
2019-01-14 12:01:47 -06:00
3 changed files with 356 additions and 267 deletions

@@ -1,3 +1,37 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: consul
  labels:
    app: consul
rules:
  - apiGroups: [""]
    resources:
      - pods
    verbs:
      - get
      - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: consul
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: consul
subjects:
  - kind: ServiceAccount
    name: consul
    namespace: default
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: consul
  labels:
    app: consul
---
apiVersion: v1
kind: Service
metadata:
@@ -24,6 +58,7 @@ spec:
      labels:
        app: consul
    spec:
      serviceAccountName: consul
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
@@ -37,18 +72,11 @@ spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: consul
          image: "consul:1.2.2"
          env:
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          image: "consul:1.4.0"
          args:
            - "agent"
            - "-bootstrap-expect=3"
            - "-retry-join=consul-0.consul.$(NAMESPACE).svc.cluster.local"
            - "-retry-join=consul-1.consul.$(NAMESPACE).svc.cluster.local"
            - "-retry-join=consul-2.consul.$(NAMESPACE).svc.cluster.local"
            - "-retry-join=provider=k8s label_selector=\"app=consul\""
            - "-client=0.0.0.0"
            - "-data-dir=/consul/data"
            - "-server"

@@ -252,38 +252,29 @@ The master node has [taints](https://kubernetes.io/docs/concepts/configuration/t
---
## What are all these pods doing?
## Is this working?
- Let's check the logs of all these `rng` pods
- All these pods have the label `app=rng`:
  - the first pod, because that's what `kubectl create deployment` does
  - the other ones (in the daemon set), because we
    *copied the spec from the first one*
- Therefore, we can query everybody's logs using that `app=rng` selector
.exercise[
- Check the logs of all the pods having a label `app=rng`:
```bash
kubectl logs -l app=rng --tail 1
```
]
- Look at the web UI
--
It appears that *all the pods* are serving requests at the moment.
- The graph should now go above 10 hashes per second!
--
- It looks like the newly created pods are serving traffic correctly
- How and why did this happen?
(We didn't do anything special to add them to the `rng` service load balancer!)
---
## The magic of selectors
# Labels and selectors
- The `rng` *service* is load balancing requests to a set of pods
- This set of pods is defined as "pods having the label `app=rng`"
- That set of pods is defined by the *selector* of the `rng` service
.exercise[
@@ -294,19 +285,60 @@ It appears that *all the pods* are serving requests at the moment.
]
When we created additional pods with this label, they were
automatically detected by `svc/rng` and added as *endpoints*
to the associated load balancer.
- The selector is `app=rng`
- It means "all the pods having the label `app=rng`"
(They can have additional labels as well, that's OK!)
---
## Removing the first pod from the load balancer
## Selector evaluation
- We can use selectors with many `kubectl` commands
- For instance, with `kubectl get`, `kubectl logs`, `kubectl delete` ... and more
.exercise[
- Get the list of pods matching selector `app=rng`:
```bash
kubectl get pods -l app=rng
kubectl get pods --selector app=rng
```
]
But ... why do these pods (in particular, the *new* ones) have this `app=rng` label?
---
## Where do labels come from?
- When we create a deployment with `kubectl create deployment rng`,
<br/>this deployment gets the label `app=rng`
- The replica sets created by this deployment also get the label `app=rng`
- The pods created by these replica sets also get the label `app=rng`
- When we created the daemon set from the deployment, we re-used the same spec
- Therefore, the pods created by the daemon set get the same labels
.footnote[Note: when we use `kubectl run stuff`, the label is `run=stuff` instead.]
---
## Updating load balancer configuration
- We would like to remove a pod from the load balancer
- What would happen if we removed that pod, with `kubectl delete pod ...`?
--
The `replicaset` would re-create it immediately.
It would be re-created immediately (by the replica set or the daemon set)
--
@@ -314,90 +346,272 @@ to the associated load balancer.
--
The `replicaset` would re-create it immediately.
It would *also* be re-created immediately
--
... Because what matters to the `replicaset` is the number of pods *matching that selector.*
--
- But but but ... Don't we have more than one pod with `app=rng` now?
--
The answer lies in the exact selector used by the `replicaset` ...
Why?!?
---
## Deep dive into selectors
## Selectors for replica sets and daemon sets
- Let's look at the selectors for the `rng` *deployment* and the associated *replica set*
- The "mission" of a replica set is:
"Make sure that there is the right number of pods matching this spec!"
- The "mission" of a daemon set is:
"Make sure that there is a pod matching this spec on each node!"
--
- *In fact,* replica sets and daemon sets do not check pod specifications
- They merely have a *selector*, and they look for pods matching that selector
- Yes, we can fool them by manually creating pods with the "right" labels
- Bottom line: if we remove our `app=rng` label ...
... The pod "disappears" for its parent, which re-creates another pod to replace it
---
class: extra-details
## Isolation of replica sets and daemon sets
- Since both the `rng` daemon set and the `rng` replica set use `app=rng` ...
... Why don't they "find" each other's pods?
--
- *Replica sets* have a more specific selector, visible with `kubectl describe`
(It looks like `app=rng,pod-template-hash=abcd1234`)
- *Daemon sets* also have a more specific selector, but it's invisible
(It looks like `app=rng,controller-revision-hash=abcd1234`)
- As a result, each controller only "sees" the pods it manages
---
## Removing a pod from the load balancer
- Currently, the `rng` service is defined by the `app=rng` selector
- The only way to remove a pod is to remove or change the `app` label
- ... But that will cause another pod to be created instead!
- What's the solution?
--
- We need to change the selector of the `rng` service!
- Let's add another label to that selector (e.g. `enabled=yes`)
---
## Complex selectors
- If a selector specifies multiple labels, they are understood as a logical *AND*
(In other words: the pods must match all the labels)
- Kubernetes has support for advanced, set-based selectors
(But these cannot be used with services, at least not yet!)
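
As a rough illustration of these rules, here are a few selector expressions that `kubectl` accepts with `-l`. The `pods_matching` wrapper is a hypothetical helper (it is not part of the workshop material), and the `hasher` value in the set-based example is only there for illustration.

```bash
# Hypothetical helper: list the pods matching a given label selector.
pods_matching() {
  kubectl get pods -l "$1"
}

# Usage (each line needs a working cluster, so they are left commented out):
# pods_matching 'app=rng,enabled=yes'    # equality-based: both labels must match (AND)
# pods_matching 'app=rng,enabled!=yes'   # app=rng, and "enabled" is absent or not "yes"
# pods_matching 'app=rng,!enabled'       # app=rng, and no "enabled" label at all
# pods_matching 'app in (rng,hasher)'    # set-based: "app" is one of the listed values
```

Set-based expressions like the last one work on the `kubectl` command line, even though a service selector is limited to plain `key: value` pairs.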
---
## The plan
1. Add the label `enabled=yes` to all our `rng` pods
2. Update the selector for the `rng` service to also include `enabled=yes`
3. Toggle traffic to a pod by manually adding/removing the `enabled` label
4. Profit!
*Note: if we swap steps 1 and 2, it will cause a short
service disruption, because there will be a period of time
during which the service selector won't match any pod.
During that time, requests to the service will time out.
By doing things in the order above, we guarantee that there won't
be any interruption.*
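
For reference, steps 1 and 2 of the plan could also be scripted. This is only a sketch, assuming that a non-interactive `kubectl patch` is acceptable in place of `kubectl edit`; the function names are hypothetical.

```bash
# Step 1: label every existing rng pod (same command as in the workshop exercise).
rng_label_pods() {
  kubectl label pods -l app=rng enabled=yes
}

# Step 2: add enabled=yes to the service selector without opening an editor.
# The patch is merged into the existing selector, so app=rng is kept.
# In JSON, "yes" is unambiguously a string.
rng_patch_service() {
  kubectl patch service rng -p '{"spec":{"selector":{"enabled":"yes"}}}'
}

# Run both steps in the safe order; skip step 2 if labeling fails.
rng_apply_plan() {
  rng_label_pods && rng_patch_service
}
```

Calling `rng_apply_plan` labels the pods first, so the service selector always matches at least the pods that existed when the function ran.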
---
## Adding labels to pods
- We want to add the label `enabled=yes` to all pods that have `app=rng`
- We could edit each pod one by one with `kubectl edit` ...
- ... Or we could use `kubectl label` to label them all
- `kubectl label` can use selectors itself
.exercise[
- Show detailed information about the `rng` deployment:
- Add `enabled=yes` to all pods that have `app=rng`:
```bash
kubectl describe deploy rng
kubectl label pods -l app=rng enabled=yes
```
- Show detailed information about the `rng` replica set:
<br/>(The second command doesn't require you to get the exact name of the replica set)
]
---
## Updating the service selector
- We need to edit the service specification
- Reminder: in the service definition, we will see `app: rng` in two places
  - the label of the service itself (we don't need to touch that one)
  - the selector of the service (that's the one we want to change)
.exercise[
- Update the service to add `enabled: yes` to its selector:
```bash
kubectl describe rs rng-yyyyyyyy
kubectl describe rs -l app=rng
kubectl edit service rng
```
<!--
```wait Please edit the object below```
```keys /app: rng```
```keys ^J```
```keys noenabled: yes```
```keys ^[``` ]
```keys :wq```
```keys ^J```
-->
]
--
The replica set selector also has a `pod-template-hash`, unlike the pods in our daemon set.
... And then we get *the weirdest error ever.* Why?
---
# Updating a service through labels and selectors
## When the YAML parser is being too smart
- What if we want to drop the `rng` deployment from the load balancer?
- YAML parsers try to help us:
- Option 1:
- `xyz` is the string `"xyz"`
- destroy it
- `42` is the integer `42`
- Option 2:
- `yes` is the boolean value `true`
- add an extra *label* to the daemon set
- If we want the string `"42"` or the string `"yes"`, we have to quote them
- update the service *selector* to refer to that *label*
- So we have to use `enabled: "yes"`
--
Of course, option 2 offers more learning opportunities. Right?
.footnote[For a good laugh: if we had used "ja", "oui", "si" ... as the value, it would have worked!]
---
## Add an extra label to the daemon set
## Updating the service selector, take 2
- We will update the daemon set "spec"
.exercise[
- Option 1:
- Update the service to add `enabled: "yes"` to its selector:
```bash
kubectl edit service rng
```
- edit the `rng.yml` file that we used earlier
<!--
```wait Please edit the object below```
```keys /app: rng```
```keys ^J```
```keys noenabled: "yes"```
```keys ^[``` ]
```keys :wq```
```keys ^J```
-->
- load the new definition with `kubectl apply`
]
- Option 2:
This time it should work!
- use `kubectl edit`
--
*If you feel like you got this💕🌈, feel free to try directly.*
*We've included a few hints on the next slides for your convenience!*
If we did everything correctly, the web UI shouldn't show any change.
---
## Updating labels
- We want to disable the pod that was created by the deployment
- All we have to do is remove the `enabled` label from that pod
- To identify that pod, we can use its name
- ... Or rely on the fact that it's the only one with a `pod-template-hash` label
- Good to know:
  - `kubectl label ... foo=` doesn't remove a label (it sets it to an empty string)
  - to remove label `foo`, use `kubectl label ... foo-`
  - to change an existing label, we would need to add `--overwrite`
---
## Removing a pod from the load balancer
.exercise[
- In one window, check the logs of that pod:
```bash
POD=$(kubectl get pod -l app=rng,pod-template-hash -o name)
kubectl logs --tail 1 --follow $POD
```
(We should see a steady stream of HTTP logs)
- In another window, remove the label from the pod:
```bash
kubectl label pod -l app=rng,pod-template-hash enabled-
```
(The stream of HTTP logs should stop immediately)
]
There might be a slight change in the web UI (since we removed a bit
of capacity from the `rng` service). If we remove more pods,
the effect should be more visible.
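
To repeat the experiment with other pods, the toggle can be wrapped in small helpers. This is a minimal sketch; the function names are hypothetical, and the pod name passed as an argument is whatever `kubectl get pods -l app=rng` reports on your cluster.

```bash
# Take one pod out of rotation by removing its "enabled" label.
rng_disable() {
  kubectl label pod "$1" enabled-
}

# Put it back; --overwrite lets this replace any existing value of the label.
rng_enable() {
  kubectl label pod "$1" enabled=yes --overwrite
}

# Show the pod addresses currently behind the rng service.
rng_endpoints() {
  kubectl get endpoints rng
}
```

Running `rng_endpoints` before and after `rng_disable <pod-name>` should show one address fewer in the list, which is another way to confirm that the service stopped sending traffic to that pod.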
---
class: extra-details
## Updating the daemon set
- If we scale up our cluster by adding new nodes, the daemon set will create more pods
- These pods won't have the `enabled=yes` label
- If we want these pods to have that label, we need to edit the daemon set spec
- We can do that with e.g. `kubectl edit daemonset rng`
---
class: extra-details
## We've put resources in your resources
- Reminder: a daemon set is a resource that creates more resources!
@@ -410,7 +624,9 @@ Of course, option 2 offers more learning opportunities. Right?
- the label(s) of the resource(s) created by the first resource (in the `template` block)
- You need to update the selector and the template (metadata labels are not mandatory)
- We would need to update the selector and the template
(metadata labels are not mandatory)
- The template must match the selector
@@ -418,175 +634,6 @@ Of course, option 2 offers more learning opportunities. Right?
---
## Adding our label
- Let's add a label `isactive: yes`
- In YAML, `yes` should be quoted; i.e. `isactive: "yes"`
.exercise[
- Update the daemon set to add `isactive: "yes"` to the selector and template label:
```bash
kubectl edit daemonset rng
```
<!--
```wait Please edit the object below```
```keys /app: rng```
```keys ^J```
```keys noisactive: "yes"```
```keys ^[``` ]
```keys /app: rng```
```keys ^J```
```keys oisactive: "yes"```
```keys ^[``` ]
```keys :wq```
```keys ^J```
-->
- Update the service to add `isactive: "yes"` to its selector:
```bash
kubectl edit service rng
```
<!--
```wait Please edit the object below```
```keys /app: rng```
```keys ^J```
```keys noisactive: "yes"```
```keys ^[``` ]
```keys :wq```
```keys ^J```
-->
]
---
## Checking what we've done
.exercise[
- Check the most recent log line of all `app=rng` pods to confirm that exactly one per node is now active:
```bash
kubectl logs -l app=rng --tail 1
```
]
The timestamps should give us a hint about how many pods are currently receiving traffic.
.exercise[
- Look at the pods that we have right now:
```bash
kubectl get pods
```
]
---
## Cleaning up
- The pods of the deployment and the "old" daemon set are still running
- We are going to identify them programmatically
.exercise[
- List the pods with `app=rng` but without `isactive=yes`:
```bash
kubectl get pods -l app=rng,isactive!=yes
```
- Remove these pods:
```bash
kubectl delete pods -l app=rng,isactive!=yes
```
]
---
## Cleaning up stale pods
```
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
rng-54f57d4d49-7pt82 1/1 Terminating 0 51m
rng-54f57d4d49-vgz9h 1/1 Running 0 22s
rng-b85tm 1/1 Terminating 0 39m
rng-hfbrr 1/1 Terminating 0 39m
rng-vplmj 1/1 Running 0 7m
rng-xbpvg 1/1 Running 0 7m
[...]
```
- The extra pods (noted `Terminating` above) are going away
- ... But a new one (`rng-54f57d4d49-vgz9h` above) was restarted immediately!
--
- Remember, the *deployment* still exists, and makes sure that one pod is up and running
- If we delete the pod associated with the deployment, it is recreated automatically
---
## Deleting a deployment
.exercise[
- Remove the `rng` deployment:
```bash
kubectl delete deployment rng
```
]
--
- The pod that was created by the deployment is now being terminated:
```
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
rng-54f57d4d49-vgz9h 1/1 Terminating 0 4m
rng-vplmj 1/1 Running 0 11m
rng-xbpvg 1/1 Running 0 11m
[...]
```
Ding, dong, the deployment is dead! And the daemon set lives on.
---
## Avoiding extra pods
- When we changed the definition of the daemon set, it immediately created new pods. We had to remove the old ones manually.
- How could we have avoided this?
--
- By adding the `isactive: "yes"` label to the pods before changing the daemon set!
- This can be done programmatically with `kubectl patch`:
```bash
PATCH='
metadata:
  labels:
    isactive: "yes"
'
kubectl get pods -l app=rng,controller-revision-hash -o name |
  xargs kubectl patch -p "$PATCH"
```
---
## Labels and debugging
- When a pod is misbehaving, we can delete it: another one will be recreated

@@ -266,7 +266,9 @@ spec:
---
## Stateful sets in action
# Running a Consul cluster
- Here is a good use-case for Stateful sets!
- We are going to deploy a Consul cluster with 3 nodes
@@ -294,42 +296,54 @@ consul agent -data-dir=/consul/data -client=0.0.0.0 -server -ui \
-retry-join=`Y.Y.Y.Y`
```
- We need to replace X.X.X.X and Y.Y.Y.Y with the addresses of other nodes
- Replace X.X.X.X and Y.Y.Y.Y with the addresses of other nodes
- We can specify DNS names, but then they have to be FQDN
- It's OK for a pod to include itself in the list as well
- We can therefore use the same command-line on all nodes (easier!)
- The same command-line can be used on all nodes (convenient!)
---
## Discovering the addresses of other pods
## Cloud Auto-join
- When a service is created for a stateful set, individual DNS entries are created
- Since version 1.4.0, Consul can use the Kubernetes API to find its peers
- These entries are constructed like this:
- This is called [Cloud Auto-join]
`<name-of-stateful-set>-<n>.<name-of-service>.<namespace>.svc.cluster.local`
- Instead of passing an IP address, we need to pass a parameter like this:
- `<n>` is the number of the pod in the set (starting at zero)
```
consul agent -retry-join "provider=k8s label_selector=\"app=consul\""
```
- If we deploy Consul in the default namespace, the names could be:
- Consul needs to be able to talk to the Kubernetes API
- `consul-0.consul.default.svc.cluster.local`
- `consul-1.consul.default.svc.cluster.local`
- `consul-2.consul.default.svc.cluster.local`
- We can provide a `kubeconfig` file
- If Consul runs in a pod, it will use the *service account* of the pod
[Cloud Auto-join]: https://www.consul.io/docs/agent/cloud-auto-join.html#kubernetes-k8s-
---
## Setting up Cloud auto-join
- We need to create a service account for Consul
- We need to create a role that can `list` and `get` pods
- We need to bind that role to the service account
- And of course, we need to make sure that Consul pods use that service account
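
The first three steps can also be done imperatively. The sketch below assumes that Consul runs in the `default` namespace (as in the cluster role binding of `k8s/consul.yaml`); the function names are hypothetical, and unlike the YAML manifest, these commands do not add the `app: consul` label to the created objects.

```bash
# Create the service account, a cluster role that can get and list pods,
# and the binding between the two. Stops at the first failing command.
consul_rbac_setup() {
  kubectl create serviceaccount consul &&
  kubectl create clusterrole consul --verb=get,list --resource=pods &&
  kubectl create clusterrolebinding consul \
    --clusterrole=consul --serviceaccount=default:consul
}

# Ask the API server whether the service account is allowed to list pods.
consul_rbac_check() {
  kubectl auth can-i list pods --as=system:serviceaccount:default:consul
}
```

After `consul_rbac_setup`, `consul_rbac_check` should answer `yes`. The last step (making the pods use the service account) is done with `serviceAccountName: consul` in the pod template of the stateful set.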
---
## Putting it all together
- The file `k8s/consul.yaml` defines a service and a stateful set
- The file `k8s/consul.yaml` defines the required resources
(service account, cluster role, cluster role binding, service, stateful set)
- It has a few extra touches:
  - the name of the namespace is injected through an environment variable
  - a `podAntiAffinity` prevents two pods from running on the same node
  - a `preStop` hook makes the pod leave the cluster when it is shut down gracefully