## Adding more manager nodes

- Right now, we have only one manager (node1)

- If we lose it, we lose quorum - and that's *very bad!*

- Containers running on other nodes will be fine ...

- But we won't be able to get or set anything related to the cluster

- If the manager is permanently gone, we will have to do a manual repair!

- Nobody wants to do that ... so let's make our cluster highly available
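- For the record, "manual repair" typically means restoring the Swarm state
  (a backup of `/var/lib/docker/swarm`) on a replacement node, then forcing
  a new single-manager cluster from it, roughly:
  ```bash
  # On a node that still has (or has restored) the old Swarm state:
  docker swarm init --force-new-cluster
  ```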
---
class: self-paced

## Adding more managers

With Play-With-Docker:

```bash
TOKEN=$(docker swarm join-token -q manager)
for N in $(seq 4 5); do
  export DOCKER_HOST=tcp://node$N:2375
  docker swarm join --token $TOKEN node1:2377
done
unset DOCKER_HOST
```
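This works because Play-With-Docker exposes each node's Docker Engine on TCP port 2375
within the session: pointing `DOCKER_HOST` at `tcp://nodeN:2375` makes the local Docker CLI
drive that node, and `unset DOCKER_HOST` switches back to the local engine.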
---
class: in-person

## Building our full cluster

- We could SSH to nodes 3, 4, 5; and copy-paste the command

--

class: in-person

- Or we could use the AWESOME POWER OF THE SHELL!

--

class: in-person



--

class: in-person

- No, not *that* shell
---
class: in-person

## Let's form like Swarm-tron

- Let's get the token, and loop over the remaining nodes with SSH

.exercise[

- Obtain the manager token:
  ```bash
  TOKEN=$(docker swarm join-token -q manager)
  ```

- Loop over the 3 remaining nodes:
  ```bash
  for NODE in node3 node4 node5; do
    ssh $NODE docker swarm join --token $TOKEN node1:2377
  done
  ```

]

[That was easy.](https://www.youtube.com/watch?v=3YmMNpbFjp0)
---
## You can control the Swarm from any manager node

.exercise[

- Try the following command on a few different nodes:
  ```bash
  docker node ls
  ```

]

On manager nodes:
<br/>you will see the list of nodes, with a `*` denoting
the node you're talking to.

On non-manager nodes:
<br/>you will get an error message telling you that
the node is not a manager.

As we saw earlier, you can only control the Swarm through a manager node.
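- To compare all the nodes in one go, you can loop over them with SSH
  (assuming the same node1 ... node5 names and SSH access used in the previous exercises):
  ```bash
  # Run "docker node ls" on every node and label each chunk of output
  for N in $(seq 1 5); do
    echo "=== node$N ==="
    ssh node$N docker node ls
  done
  ```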
---
class: self-paced

## Play-With-Docker node status icon

- If you're using Play-With-Docker, you get node status icons

- Node status icons are displayed left of the node name

- No icon = no Swarm mode detected

- Solid blue icon = Swarm manager detected

- Blue outline icon = Swarm worker detected


---
## Dynamically changing the role of a node

- We can change the role of a node on the fly:

  `docker node promote nodeX` → make nodeX a manager
  <br/>
  `docker node demote nodeX` → make nodeX a worker

.exercise[

- See the current list of nodes:
  ```bash
  docker node ls
  ```

- Promote any worker node to be a manager:
  ```bash
  docker node promote <node_name_or_id>
  ```

]
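- To return to the previous topology (and keep an odd number of managers), demote that node again:
  ```bash
  docker node demote <node_name_or_id>
  ```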
---
## How many managers do we need?

- 2N+1 nodes can (and will) tolerate N failures
  <br/>(you can have an even number of managers, but there is no point)

--

- 1 manager = no failure

- 3 managers = 1 failure

- 5 managers = 2 failures (or 1 failure during 1 maintenance)

- 7 managers and more = now you might be overdoing it a little bit
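- Why 2N+1? Raft needs a strict majority of the managers (floor(N/2)+1) to acknowledge
  each update, so N managers survive ceil(N/2)-1 failures
  <br/>(3 managers → quorum of 2 → 1 failure tolerated; 4 managers → quorum of 3 → still only 1 failure tolerated)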
---
## Why not have *all* nodes be managers?

- Intuitively, it's harder to reach consensus in larger groups

- With Raft, writes have to be sent to all managers (and acknowledged by a majority of them)

- More nodes = more network traffic

- Bigger network = more latency
---
## What would MacGyver do?

- If some of your machines are more than 10ms away from each other,
  <br/>
  try to break them down into multiple clusters
  (keeping internal latency low)

- Groups of up to 9 nodes: all of them are managers

- Groups of 10 nodes and up: pick 5 "stable" nodes to be managers
  <br/>
  (Cloud pro-tip: use separate auto-scaling groups for managers and workers)

- Groups of more than 100 nodes: watch your managers' CPU and RAM

- Groups of more than 1000 nodes:

  - if you can afford to have fast, stable managers, add more of them

  - otherwise, break down your nodes into multiple clusters
---
## What's the upper limit?

- We don't know!

- Internal testing at Docker Inc.: 1000-10000 nodes is fine

  - deployed to a single cloud region

  - one of the main take-aways was *"you're gonna need a bigger manager"*

- Testing by the community: [4700 heterogeneous nodes all over the 'net](https://sematext.com/blog/2016/11/14/docker-swarm-lessons-from-swarm3k/)

  - it just works

  - more nodes require more CPU; more containers require more RAM

  - scheduling of large jobs (70000 containers) is slow, though (working on it!)
---
## Real-life deployment methods

--

Running commands manually over SSH

--

(lol jk)

--

- Using your favorite configuration management tool

- [Docker for AWS](https://docs.docker.com/docker-for-aws/#quickstart)

- [Docker for Azure](https://docs.docker.com/docker-for-azure/)