## Adding more manager nodes

- Right now, we have only one manager (node1)

- If we lose it, we lose quorum - and that's very bad!

- Containers running on other nodes will be fine ...

- But we won't be able to get or set anything related to the cluster

- If the manager is permanently gone, we will have to do a manual repair!

- Nobody wants to do that ... so let's make our cluster highly available
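
(For reference, a rough sketch of that manual repair, following Docker's documented disaster recovery procedure; it assumes you kept a backup of the manager's `/var/lib/docker/swarm` directory:)

```bash
# Sketch only: restore Swarm state from a backup, then force a new cluster
systemctl stop docker
# ... restore /var/lib/docker/swarm from the backup ...
systemctl start docker
docker swarm init --force-new-cluster
```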
class: self-paced

## Adding more managers

With Play-With-Docker:

```bash
TOKEN=$(docker swarm join-token -q manager)
for N in $(seq 4 5); do
  export DOCKER_HOST=tcp://node$N:2375
  docker swarm join --token $TOKEN node1:2377
done
unset DOCKER_HOST
```
class: in-person

## Building our full cluster

- We could SSH to nodes 3, 4, 5; and copy-paste the command

--

class: in-person

- Or we could use the AWESOME POWER OF THE SHELL!
--
class: in-person

*(image of a shell)*

--

class: in-person

- No, not that shell
class: in-person

## Let's form like Swarm-tron

- Let's get the token, and loop over the remaining nodes with SSH

.exercise[

- Obtain the manager token:

  ```bash
  TOKEN=$(docker swarm join-token -q manager)
  ```

- Loop over the 3 remaining nodes:

  ```bash
  for NODE in node3 node4 node5; do
    ssh $NODE docker swarm join --token $TOKEN node1:2377
  done
  ```

]
## You can control the Swarm from any manager node

.exercise[

- Try the following command on a few different nodes:

  ```bash
  docker node ls
  ```

]

On manager nodes: you will see the list of nodes, with a `*` denoting the node you're talking to.

On non-manager nodes: you will get an error message telling you that the node is not a manager.

As we saw earlier, you can only control the Swarm through a manager node.
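
For reference, on a manager the output looks something like this (the IDs and hostnames below are illustrative):

```
ID                            HOSTNAME   STATUS    AVAILABILITY   MANAGER STATUS
aaaaaaaaaaaaaaaaaaaaaaaaa *   node1      Ready     Active         Leader
bbbbbbbbbbbbbbbbbbbbbbbbb     node2      Ready     Active         Reachable
ccccccccccccccccccccccccc     node3      Ready     Active         Reachable
```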
class: self-paced

## Play-With-Docker node status icon

- If you're using Play-With-Docker, you get node status icons

- Node status icons are displayed left of the node name:

  - no icon = no Swarm mode detected

  - solid blue icon = Swarm manager detected

  - blue outline icon = Swarm worker detected
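
Outside of Play-With-Docker, you can check a node's role from the node itself; this prints `true` on managers and `false` on workers:

```bash
docker info --format '{{ .Swarm.ControlAvailable }}'
```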
## Dynamically changing the role of a node

- We can change the role of a node on the fly:

  `docker node promote nodeX` → make nodeX a manager

  `docker node demote nodeX` → make nodeX a worker

.exercise[

- See the current list of nodes:

  ```bash
  docker node ls
  ```

- Promote any worker node to be a manager:

  ```bash
  docker node promote <node_name_or_id>
  ```

]
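
To check the result (and revert it if you want), something like this works; `node2` is just an example name:

```bash
docker node ls           # the promoted node now shows "Reachable" under MANAGER STATUS
docker node demote node2 # demote it back to a worker
```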
## How many managers do we need?

- 2N+1 nodes can (and will) tolerate N failures

  (you can have an even number of managers, but there is no point)

--

- 1 manager = no failure tolerated

- 3 managers = 1 failure

- 5 managers = 2 failures (or 1 failure during 1 maintenance)

- 7 managers and more = now you might be overdoing it a little bit
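
The arithmetic behind this: a cluster of N managers needs a majority (quorum) of floor(N/2)+1 to stay writable, so it tolerates N minus quorum failures. A quick shell illustration (note that even sizes add no extra tolerance):

```bash
for N in 1 2 3 4 5 6 7; do
  QUORUM=$(( N / 2 + 1 ))
  echo "$N managers: quorum = $QUORUM, tolerates $(( N - QUORUM )) failure(s)"
done
```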
## Why not have all nodes be managers?

- Intuitively, it's harder to reach consensus in larger groups

- With Raft, writes are replicated to all manager nodes (and must be acknowledged by a majority before they commit)

- More nodes = more network traffic

- Bigger network = more latency
## What would MacGyver do?

- If some of your machines are more than 10ms away from each other,
  try to break them down into multiple clusters (keeping internal latency low)

- Groups of up to 9 nodes: all of them are managers

- Groups of 10 nodes and up: pick 5 "stable" nodes to be managers

  (Cloud pro-tip: use separate auto-scaling groups for managers and workers)

- Groups of more than 100 nodes: watch your managers' CPU and RAM

- Groups of more than 1000 nodes:

  - if you can afford to have fast, stable managers, add more of them

  - otherwise, break down your nodes in multiple clusters
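
One common way to keep those managers "stable" is to dedicate them to control-plane work, so application containers don't compete for their CPU and RAM:

```bash
# Prevent workloads from being scheduled on a manager node
docker node update --availability drain node1
```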
## What's the upper limit?

- We don't know!

- Internal testing at Docker Inc.: 1000-10000 nodes is fine

  - deployed to a single cloud region

  - one of the main take-aways was "you're gonna need a bigger manager"

- Testing by the community: 4700 heterogeneous nodes all over the 'net

  - it just works

  - more nodes require more CPU; more containers require more RAM

  - scheduling of large jobs (70000 containers) is slow, though (working on it!)
## Real-life deployment methods

--

Running commands manually over SSH

--

(lol jk)

--

- Using your favorite configuration management tool
