container.training/docs/morenodes.md at baf48657d05446eeaa5127d2f2dda3552172b695

github/container.training

Fork 0

mirror of https://github.com/jpetazzo/container.training.git synced 2026-03-03 01:40:19 +00:00

Files

Jérôme Petazzoni 825257427f Split out selfpaced and dockercon workshops

2017-10-10 17:55:22 +02:00

4.5 KiB

Raw Blame History

Adding more manager nodes

Right now, we have only one manager (node1)
If we lose it, we lose quorum - and that's very bad!
Containers running on other nodes will be fine ...
But we won't be able to get or set anything related to the cluster
If the manager is permanently gone, we will have to do a manual repair!
Nobody wants to do that ... so let's make our cluster highly available

Adding more managers

With Play-With-Docker:

TOKEN=$(docker swarm join-token -q manager)
for N in $(seq 4 5); do
  export DOCKER_HOST=tcp://node$N:2375
  docker swarm join --token $TOKEN node1:2377
done
unset DOCKER_HOST

Building our full cluster

We could SSH to nodes 3, 4, 5; and copy-paste the command

Or we could use the AWESOME POWER OF THE SHELL!

No, not that shell

Let's form like Swarm-tron

Let's get the token, and loop over the remaining nodes with SSH

Obtain the manager token:

TOKEN=$(docker swarm join-token -q manager)

Loop over the 3 remaining nodes:

  for NODE in node3 node4 node5; do
    ssh $NODE docker swarm join --token $TOKEN node1:2377
  done

]

That was easy.

You can control the Swarm from any manager node

Try the following command on a few different nodes:
```
docker node ls
```

]

On manager nodes:
you will see the list of nodes, with a * denoting the node you're talking to.

On non-manager nodes:
you will get an error message telling you that the node is not a manager.

As we saw earlier, you can only control the Swarm through a manager node.

Play-With-Docker node status icon

If you're using Play-With-Docker, you get node status icons
Node status icons are displayed left of the node name
- No icon = no Swarm mode detected
- Solid blue icon = Swarm manager detected
- Blue outline icon = Swarm worker detected

Dynamically changing the role of a node

We can change the role of a node on the fly:

docker node promote nodeX → make nodeX a manager
docker node demote nodeX → make nodeX a worker

See the current list of nodes:
```
docker node ls
```
Promote any worker node to be a manager:
```
docker node promote <node_name_or_id>
```

]

How many managers do we need?

2N+1 nodes can (and will) tolerate N failures
(you can have an even number of managers, but there is no point)

1 manager = no failure
3 managers = 1 failure
5 managers = 2 failures (or 1 failure during 1 maintenance)
7 managers and more = now you might be overdoing it a little bit

Why not have all nodes be managers?

Intuitively, it's harder to reach consensus in larger groups
With Raft, writes have to go to (and be acknowledged by) all nodes
More nodes = more network traffic
Bigger network = more latency

What would McGyver do?

If some of your machines are more than 10ms away from each other,
try to break them down in multiple clusters (keeping internal latency low)
Groups of up to 9 nodes: all of them are managers
Groups of 10 nodes and up: pick 5 "stable" nodes to be managers
(Cloud pro-tip: use separate auto-scaling groups for managers and workers)
Groups of more than 100 nodes: watch your managers' CPU and RAM
Groups of more than 1000 nodes:
- if you can afford to have fast, stable managers, add more of them
- otherwise, break down your nodes in multiple clusters

What's the upper limit?

We don't know!
Internal testing at Docker Inc.: 1000-10000 nodes is fine
- deployed to a single cloud region
- one of the main take-aways was "you're gonna need a bigger manager"
Testing by the community: 4700 heterogenous nodes all over the 'net
- it just works
- more nodes require more CPU; more containers require more RAM
- scheduling of large jobs (70000 containers) is slow, though (working on it!)

Real-life deployment methods

Running commands manually over SSH

(lol jk)

Using your favorite configuration management tool
Docker for AWS
Docker for Azure

4.5 KiB Raw Blame History