Files
container.training/docs/morenodes.md
2017-10-10 17:55:22 +02:00

237 lines
4.5 KiB
Markdown

## Adding more manager nodes
- Right now, we have only one manager (node1)
- If we lose it, we lose quorum - and that's *very bad!*
- Containers running on other nodes will be fine ...
- But we won't be able to get or set anything related to the cluster
- If the manager is permanently gone, we will have to do a manual repair!
- Nobody wants to do that ... so let's make our cluster highly available
---
class: self-paced
## Adding more managers
With Play-With-Docker:
```bash
TOKEN=$(docker swarm join-token -q manager)
for N in $(seq 4 5); do
export DOCKER_HOST=tcp://node$N:2375
docker swarm join --token $TOKEN node1:2377
done
unset DOCKER_HOST
```
---
class: in-person
## Building our full cluster
- We could SSH to nodes 3, 4, 5; and copy-paste the command
--
class: in-person
- Or we could use the AWESOME POWER OF THE SHELL!
--
class: in-person
![Mario Red Shell](mario-red-shell.png)
--
class: in-person
- No, not *that* shell
---
class: in-person
## Let's form like Swarm-tron
- Let's get the token, and loop over the remaining nodes with SSH
.exercise[
- Obtain the manager token:
```bash
TOKEN=$(docker swarm join-token -q manager)
```
- Loop over the 3 remaining nodes:
```bash
for NODE in node3 node4 node5; do
ssh $NODE docker swarm join --token $TOKEN node1:2377
done
```
]
[That was easy.](https://www.youtube.com/watch?v=3YmMNpbFjp0)
---
## You can control the Swarm from any manager node
.exercise[
- Try the following command on a few different nodes:
```bash
docker node ls
```
]
On manager nodes:
<br/>you will see the list of nodes, with a `*` denoting
the node you're talking to.
On non-manager nodes:
<br/>you will get an error message telling you that
the node is not a manager.
As we saw earlier, you can only control the Swarm through a manager node.
---
class: self-paced
## Play-With-Docker node status icon
- If you're using Play-With-Docker, you get node status icons
- Node status icons are displayed left of the node name
- No icon = no Swarm mode detected
- Solid blue icon = Swarm manager detected
- Blue outline icon = Swarm worker detected
![Play-With-Docker icons](pwd-icons.png)
---
## Dynamically changing the role of a node
- We can change the role of a node on the fly:
`docker node promote nodeX` → make nodeX a manager
<br/>
`docker node demote nodeX` → make nodeX a worker
.exercise[
- See the current list of nodes:
```
docker node ls
```
- Promote any worker node to be a manager:
```
docker node promote <node_name_or_id>
```
]
---
## How many managers do we need?
- 2N+1 nodes can (and will) tolerate N failures
<br/>(you can have an even number of managers, but there is no point)
--
- 1 manager = no failure
- 3 managers = 1 failure
- 5 managers = 2 failures (or 1 failure during 1 maintenance)
- 7 managers and more = now you might be overdoing it a little bit
---
## Why not have *all* nodes be managers?
- Intuitively, it's harder to reach consensus in larger groups
- With Raft, writes have to go to (and be acknowledged by) all nodes
- More nodes = more network traffic
- Bigger network = more latency
---
## What would McGyver do?
- If some of your machines are more than 10ms away from each other,
<br/>
try to break them down in multiple clusters
(keeping internal latency low)
- Groups of up to 9 nodes: all of them are managers
- Groups of 10 nodes and up: pick 5 "stable" nodes to be managers
<br/>
(Cloud pro-tip: use separate auto-scaling groups for managers and workers)
- Groups of more than 100 nodes: watch your managers' CPU and RAM
- Groups of more than 1000 nodes:
- if you can afford to have fast, stable managers, add more of them
- otherwise, break down your nodes in multiple clusters
---
## What's the upper limit?
- We don't know!
- Internal testing at Docker Inc.: 1000-10000 nodes is fine
- deployed to a single cloud region
- one of the main take-aways was *"you're gonna need a bigger manager"*
- Testing by the community: [4700 heterogenous nodes all over the 'net](https://sematext.com/blog/2016/11/14/docker-swarm-lessons-from-swarm3k/)
- it just works
- more nodes require more CPU; more containers require more RAM
- scheduling of large jobs (70000 containers) is slow, though (working on it!)
---
## Real-life deployment methods
--
Running commands manually over SSH
--
(lol jk)
--
- Using your favorite configuration management tool
- [Docker for AWS](https://docs.docker.com/docker-for-aws/#quickstart)
- [Docker for Azure](https://docs.docker.com/docker-for-azure/)