mirror of
https://github.com/jpetazzo/container.training.git
synced 2026-02-14 17:49:59 +00:00
Last touch-ups for LISA16! Good to go!
BIN docs/bell-curve.jpg (new file; binary file not shown; 15 KiB)

docs/index.html (256 lines changed)
@@ -99,16 +99,20 @@ class: title

Docker <br/> Orchestration <br/> Workshop

???

---

## Intros

- Hello! We are:
- Hello! We are

  AJ ([@s0ulshake](https://twitter.com/s0ulshake))

  &

  Jérôme ([@jpetazzo](https://twitter.com/jpetazzo))

  Tiffany ([@tiffanyfayj](https://twitter.com/tiffanyfayj))

--

- This is our collective Docker knowledge:

  ![bell curve](bell-curve.jpg)

<!--

Reminder, when updating the agenda: when people are told to show

@@ -119,11 +123,13 @@ at e.g. 9am, and start at 9:30.

-->

???

---

## Agenda

<!--
- Agenda:
-->

.small[
- 09:00-09:15 hello

@@ -134,24 +140,26 @@ at e.g. 9am, and start at 9:30.

- 13:30-15:00 part 3
- 15:00-15:15 coffee break
- 15:15-16:45 part 4
- 16:45-17:30 Q&A
- 16:45-17:00 Q&A
]

<!--
- The tutorial will run from 1pm to 5pm
- This will be fast-paced, but DON'T PANIC!
- We will do short breaks for coffee + QA every hour
-->

- The tutorial will run from 1pm to 5pm

- This will be fast-paced, but DON'T PANIC!

- We will do short breaks for coffee + QA every hour

- Feel free to interrupt for questions at any time

- Live feedback, questions, help on
  [Gitter](http://container.training/chat)
  [Slack](http://container.training/chat)
  ([get an invite](http://lisainvite.herokuapp.com/))

- All the content is publicly available (slides, code samples, scripts)

<!--
Remember to change:
- the link below
- the link above
- the "tweet my speed" hashtag in DockerCoins HTML
-->

@@ -235,7 +243,7 @@ grep '^# ' index.html | grep -v '<br' | tr '#' '-'

-->

- [Slack](FIXME) account
- [Slack](http://lisainvite.herokuapp.com/) account
  <br/>(to join the conversation during the workshop)

- [Docker Hub](https://hub.docker.com) account

@@ -342,7 +350,9 @@ wait

- you access the terminal directly in your browser

- exposing services requires something like ngrok
- exposing services requires something like
  [ngrok](https://ngrok.com/)
  or [supergrok](https://github.com/jpetazzo/orchestration-workshop#using-play-with-docker)
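(For instance, to get a public URL for a local web server, a sketch; port 8000 is an assumption, use whatever port your service actually listens on:)

```bash
# Tunnel local port 8000 to a public ngrok URL
# (8000 is illustrative; adjust to your service's port)
ngrok http 8000
```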
- If you use VMs deployed with Docker Machine:

@@ -776,6 +786,28 @@ killall docker-compose

---

## Accessing internal services

- `rng` and `hasher` are exposed on ports 8001 and 8002

- This is declared in the Compose file:

  ```yaml
  ...
  rng:
    build: rng
    ports:
      - "8001:80"

  hasher:
    build: hasher
    ports:
      - "8002:80"
  ...
  ```
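(Since those ports are published on the host, we can poke at both services directly; a quick sketch, assuming you run it on the node hosting the Compose app:)

```bash
# rng and hasher answer plain HTTP on their published ports
curl localhost:8001
curl localhost:8002
```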
---

## Measuring latency under load

We will use `httping`.
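(If you want to try it right away, a sketch, assuming `rng` is still published on port 8001:)

```bash
# Send 3 HTTP "pings" to the rng service and report per-request latency
httping -c 3 -g http://localhost:8001/
```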
@@ -807,7 +839,10 @@ We will use `httping`.

- We need to scale out the `rng` service on multiple machines!

Note: this is a fiction! We have enough entropy. But we need a pretext to scale out.
<br/>(In fact, the code of `rng` uses `/dev/urandom`, which doesn't need entropy.)

(In fact, the code of `rng` uses `/dev/urandom`, which never runs out of entropy...
<br/>
...and is [just as good as `/dev/random`](http://www.slideshare.net/PacSecJP/filippo-plain-simple-reality-of-entropy).)

---
@@ -912,6 +947,12 @@ class: title

---

## Illustration

![](swarm-mode.svg)

---

## SwarmKit concepts (2/2)

- The *managers* expose the SwarmKit API

@@ -933,7 +974,7 @@ You can refer to the [NOMENCLATURE](https://github.com/docker/swarmkit/blob/mast

## Swarm Mode

- Docker Engine 1.12 features SwarmKit integration
- Since version 1.12, Docker Engine embeds SwarmKit

- The Docker CLI features three new commands:

@@ -947,18 +988,11 @@ You can refer to the [NOMENCLATURE](https://github.com/docker/swarmkit/blob/mast

- The SwarmKit API is also exposed (on a separate socket)

???

---

## Illustration

![](swarm-mode.svg)

---

## You need to enable Swarm mode to use the new stuff

- By default, everything runs as usual
- By default, all this new code is inactive

- Swarm Mode can be enabled, "unlocking" SwarmKit functions
  <br/>(services, out-of-the-box overlay networks, etc.)

@@ -966,13 +1000,19 @@ You can refer to the [NOMENCLATURE](https://github.com/docker/swarmkit/blob/mast

.exercise[

- Try a Swarm-specific command:
  ```
  $ docker node ls
  Error response from daemon: This node is not a swarm manager. [...]
  ```
  ```bash
  docker node ls
  ```

]

--

You will get an error message:
```
Error response from daemon: This node is not a swarm manager. [...]
```

---

# Creating our first Swarm
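(For context, the gist of the exercises that follow; a sketch, with illustrative node names and addresses:)

```bash
# On the first node: turn on Swarm mode, making this node a manager
docker swarm init --advertise-addr eth0

# init prints a ready-made "docker swarm join" command with a token;
# run it on each other node to add it as a worker
# (the token and address below are placeholders)
docker swarm join --token SWMTKN-1-... 10.0.0.1:2377
```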
@@ -1160,7 +1200,7 @@ ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS

- We can still use `docker info` to verify that the node is part of the Swarm:
  ```bash
  $ docker info | grep ^Swarm
  docker info | grep ^Swarm
  ```

]

@@ -1407,11 +1447,11 @@ When a node joins the Swarm:

## Under the hood: cluster communication

- The *control plane* is encrypted over TLS
- The *control plane* is encrypted with AES-GCM; keys are rotated every 12 hours

- Keys and certificates are automatically renewed on regular intervals
- Authentication is done with mutual TLS; certificates are rotated every 90 days

  (90 days by default; tunable with `docker swarm update`)
  (`docker swarm update` allows us to change this delay or to use an external CA)

- The *data plane* (communication between containers) is not encrypted by default
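(Data plane encryption can be opted into per overlay network; a sketch, with an illustrative network name:)

```bash
# Create an overlay network whose container-to-container traffic
# is encrypted with IPsec
docker network create --driver overlay --opt encrypted my-encrypted-net
```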
@@ -1509,6 +1549,75 @@ As we saw earlier, you can only control the Swarm through a manager node.

---

## How many managers do we need?

- 2N+1 nodes can (and will) tolerate N failures
  <br/>(you can have an even number of managers, but there is no point)

--

- 1 manager = no failure

- 3 managers = 1 failure

- 5 managers = 2 failures (or 1 failure during 1 maintenance)

- 7 managers and more = now you might be overdoing it a little bit
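(Adjusting the manager count on a running cluster is straightforward; a sketch, with illustrative node names, run from a manager:)

```bash
# Promote two workers to managers
docker node promote node2 node3

# Demote a manager back to a plain worker
docker node demote node3
```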
---

## Why not have *all* nodes be managers?

- Intuitively, it's harder to reach consensus in larger groups

- With Raft, each write needs to be acknowledged by the majority of nodes

- More nodes = more chance that we will have to wait for some laggard

- Bigger network = more latency

---

## What would MacGyver do?

- If some of your machines are more than 10ms away from each other,
  <br/>
  try to break them down into multiple clusters
  (keeping internal latency low)

- Groups of up to 9 nodes: all of them are managers

- Groups of 10 nodes and up: pick 5 "stable" nodes to be managers

- Groups of more than 100 nodes: watch your managers' CPU and RAM

- Groups of more than 1000 nodes:

  - if you can afford to have fast, stable managers, add more of them
  - otherwise, break down your nodes into multiple clusters

---

## What's the upper limit?

- We don't know!

- Internal testing at Docker Inc.: 1000-10000 nodes is fine

  - deployed to a single cloud region

  - one of the main take-aways was *"you're gonna need a bigger manager"*

- Testing by the community: [4700 heterogeneous nodes all over the 'net](https://sematext.com/blog/2016/11/14/docker-swarm-lessons-from-swarm3k/)

  - it just works

  - more nodes require more CPU; more containers require more RAM

  - scheduling of large jobs (70000 containers) is slow, though (working on it!)

---

# Running our first Swarm service

- How do we run services? Simplified version:

@@ -1625,7 +1734,7 @@ As we saw earlier, you can only control the Swarm through a manager node.

- Create an ElasticSearch service (and give it a name while we're at it):
  ```bash
  docker service create --name search --publish 9200:9200 --replicas 7 \
         elasticsearch:2
         elasticsearch`:2`
  ```

- Check what's going on:

@@ -1635,6 +1744,10 @@ As we saw earlier, you can only control the Swarm through a manager node.

]

Note: don't forget the **:2**!

The latest version of the ElasticSearch image won't start without mandatory configuration.
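(Once the tasks are running, you can check that requests are load-balanced across the replicas; a sketch, assuming port 9200 was published as above:)

```bash
# ElasticSearch 2 picks a random node name at startup, so the "name"
# field in the response changes as requests hit different replicas
curl localhost:9200
curl localhost:9200
```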
---

## Tasks lifecycle

@@ -1795,13 +1908,7 @@ We just have to adapt this to our application, which has 4 services!

## Using Docker Hub

- Set the `DOCKER_REGISTRY` environment variable to your Docker Hub user name
  <br/>(the `build-tag-push.py` script prefixes each image name with that variable)

- We will also see how to run the open source registry
  <br/>(so use whatever option you want!)
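(Roughly, the script builds, tags, and pushes one image per service; this loop is a sketch of the idea, with illustrative service names and tag, and the real logic lives in `build-tag-push.py`:)

```bash
export DOCKER_REGISTRY=jpetazzo   # your Docker Hub user name (or registry/user)
export TAG=v0.1                   # illustrative version tag

# Build and push one image per service of the Compose app
for SERVICE in hasher rng webui worker; do
    docker build -t $DOCKER_REGISTRY/$SERVICE:$TAG $SERVICE
    docker push $DOCKER_REGISTRY/$SERVICE:$TAG
done
```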
.exercise[

*If we wanted to use the Docker Hub...*

<!--
```meta
@@ -1809,13 +1916,17 @@ We just have to adapt this to our application, which has 4 services!
```
-->

- Set the following environment variable:
  <br/>`export DOCKER_REGISTRY=jpetazzo`
- We would set the following environment variable:
  ```bash
  export DOCKER_REGISTRY=jpetazzo
  ```

- (Use *your* Docker Hub login, of course!)
  (Using *our* Docker Hub login, of course!)

- Log into the Docker Hub:
  <br/>`docker login`
- And we would log into the Docker Hub:
  ```bash
  docker login
  ```

<!--
```meta
@@ -1830,13 +1941,16 @@ We just have to adapt this to our application, which has 4 services!

## Using Docker Trusted Registry

If we wanted to use DTR, we would:
*If we wanted to use DTR, we would...*

- make sure we have a Docker Hub account
- [activate a Docker Datacenter subscription](
- Make sure we have a Docker Hub account

- [Activate a Docker Datacenter subscription](
  https://hub.docker.com/enterprise/trial/)

- install DTR on our machines
- set `DOCKER_REGISTRY` to `dtraddress:port/user`

- Install DTR on our machines

- Set `DOCKER_REGISTRY` to `dtraddress:port/user`

*This is out of the scope of this workshop!*

@@ -2120,6 +2234,10 @@ You might have to wait a bit for the container to be up and running.

Check its status with `docker service ps webui`.

Protip: use `docker service ps webui -a` to see *all* tasks.
<br/>
(Otherwise you only see the ones currently running.)

---

## Scaling the application

@@ -2423,7 +2541,12 @@ It is a virtual IP address (VIP) for the `rng` service.

It *should* ping. (But this might change in the future.)

Current behavior for VIPs is to ping when there is a backend available on the same machine.

With Engine 1.12: VIPs respond to ping if a backend is available on the same machine.

With Engine 1.13: VIPs respond to ping if a backend is available anywhere.

(Again: this might change in the future.)
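(You can also look at the VIP from the management side, without pinging anything; a sketch using `docker service inspect`:)

```bash
# Show the virtual IP(s) allocated to the rng service,
# one per network the service is attached to
docker service inspect --format '{{json .Endpoint.VirtualIPs}}' rng
```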
---

@@ -2714,19 +2837,24 @@ WHY?!?

- We will use `ngrep`, which allows us to grep network traffic

- We will run it in a container (because we can!)

- We will use host networking to sniff the host's traffic
- We will run it in a container, using host networking to access the host's interfaces

.exercise[

- Sniff network traffic and display all packets containing "HTTP":
  ```bash
  docker run --net host nicolaka/netshoot ngrep -tpd eth0 HTTP
  docker run --net host jpetazzo/netshoot ngrep -tpd eth0 HTTP
  ```

]

--

Seeing tons of HTTP requests? Shut down your DockerCoins workers:
```bash
docker service update worker --replicas=0
```

---

## Check that we are, indeed, sniffing traffic

@@ -2873,7 +3001,12 @@ Note how the build and push were fast (because caching).

.exercise[

- In the other window, update the service to the new image:
- In the other window, bring back the workers (if you had stopped them earlier):
  ```bash
  docker service update worker --replicas 10
  ```

- Then, update the service to the new image:
  ```bash
  docker service update worker --image $IMAGE
  ```

@@ -2921,6 +3054,8 @@ The current upgrade will continue at a faster pace.

]

Note: if you updated the roll-out parallelism, *rollback* will not roll back to the previous image; it will roll back to the previous roll-out cadence.
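(That roll-out cadence is controlled per service; a sketch, with illustrative values:)

```bash
# Update 2 tasks at a time, waiting 10 seconds between batches
docker service update worker --update-parallelism 2 --update-delay 10s
```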
---

## Timeline of an upgrade

@@ -3330,7 +3465,7 @@ Note: if somebody steals both your disks and your key, .strike[you're doomed! Do

.exercise[

- Revert to a non-encrypted cluster:
- Permanently unlock the cluster:
  ```bash
  docker swarm update --autolock=false
  ```

@@ -4492,6 +4627,8 @@ Congratulations, you are viewing the CPU usage of a single container!

- A *Prometheus server* will *scrape* URLs like these

  (It can also use protobuf to avoid the overhead of parsing line-oriented formats!)
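(A scrape target takes only a few lines of configuration; a minimal sketch, with an illustrative job name and hypothetical targets:)

```bash
# Write a minimal Prometheus configuration scraping two endpoints
# (node1/node2:8080 are placeholders for your metrics exporters)
cat > prometheus.yml <<EOF
scrape_configs:
  - job_name: 'cadvisor'
    scrape_interval: 10s
    static_configs:
      - targets: ['node1:8080', 'node2:8080']
EOF
```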
---

## Collecting metrics with Prometheus on Swarm

@@ -5061,7 +5198,7 @@ The editor is a bit less friendly than the one we used for InfluxDB.

- Adding a constraint caused the service to be redeployed:
  ```bash
  docker service ps stateful
  docker service ps stateful -a
  ```

]
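(For reference, a placement constraint is added with something like the following; a sketch, and the exact constraint used for `stateful` earlier may differ:)

```bash
# Pin the service to a specific node; this triggers a redeployment
docker service update stateful --constraint-add node.hostname==node1
```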
@@ -5517,6 +5654,11 @@ It doesn't work!?!

pip install git+git://github.com/docker/compose
```

- Re-hash our `$PATH`:
  ```bash
  hash docker-compose
  ```

]

---