mirror of
https://github.com/jpetazzo/container.training.git
synced 2026-03-05 02:40:27 +00:00
5681 lines
101 KiB
HTML
<!DOCTYPE html>
<html>
<head>
<base target="_blank">
<title>Docker Orchestration Workshop</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<style type="text/css">
@import url(https://fonts.googleapis.com/css?family=Yanone+Kaffeesatz);
@import url(https://fonts.googleapis.com/css?family=Droid+Serif:400,700,400italic);
@import url(https://fonts.googleapis.com/css?family=Ubuntu+Mono:400,700,400italic);

body { font-family: 'Droid Serif'; font-size: 150%; }

h1, h2, h3 {
  font-family: 'Yanone Kaffeesatz';
  font-weight: normal;
}
a {
  text-decoration: none;
  color: blue;
}
.remark-code, .remark-inline-code { font-family: 'Ubuntu Mono'; }
.red { color: #fa0000; }
.gray { color: #ccc; }
.small { font-size: 70%; }
.big { font-size: 140%; }
.underline { text-decoration: underline; }
.footnote {
  position: absolute;
  bottom: 3em;
}
.pic {
  vertical-align: middle;
  text-align: center;
  padding: 0 0 0 0 !important;
}
img {
  max-width: 100%;
  max-height: 450px;
}
.title {
  vertical-align: middle;
  text-align: center;
}
.title {
  font-size: 2em;
}
.title .remark-slide-number {
  font-size: 0.5em;
}
.quote {
  background: #eee;
  border-left: 10px solid #ccc;
  margin: 1.5em 10px;
  padding: 0.5em 10px;
  quotes: "\201C""\201D""\2018""\2019";
  font-style: italic;
}
.quote:before {
  color: #ccc;
  content: open-quote;
  font-size: 4em;
  line-height: 0.1em;
  margin-right: 0.25em;
  vertical-align: -0.4em;
}
.quote p {
  display: inline;
}
.icon img {
  height: 1em;
}
.exercise {
  background-color: #eee;
  background-image: url("keyboard.png");
  background-size: 1.4em;
  background-repeat: no-repeat;
  background-position: 0.2em 0.2em;
  border: 2px dotted black;
}
.exercise::before {
  content: "Exercise:";
  margin-left: 1.8em;
}
li p { line-height: 1.25em; }
</style>
</head>
<body>
<textarea id="source">
|
||
|
||
class: title
|
||
|
||
# Docker <br/> Orchestration <br/> Workshop
|
||
|
||
---
|
||
|
||
## Logistics
|
||
|
||
- Hello! We are:
|
||
<br/>`jerome at docker dot com`
|
||
<br/>`aj at soulshake dot net`
|
||
|
||
<!--
|
||
Reminder, when updating the agenda: when people are told to show
|
||
up at 9am, they usually trickle in until 9:30am (except for paid
|
||
training sessions). If you're not sure that people will be there
|
||
on time, it's a good idea to have a breakfast with the attendees
|
||
at e.g. 9am, and start at 9:30.
|
||
-->
|
||
|
||
- Agenda:
|
||
|
||
.small[
|
||
- 09:00-09:15 hello!
|
||
- 09:15-10:45 part 1
|
||
- 10:45-11:00 coffee break
|
||
- 11:00-12:30 part 2
|
||
- 12:30-13:45 lunch break
|
||
- 13:45-15:15 part 3
|
||
- 15:15-15:30 coffee break
|
||
- 15:30-17:00 part 4
|
||
]
|
||
|
||
<!-- - This will be FAST PACED, but DON'T PANIC! -->
|
||
|
||
- All the content is publicly available
|
||
<br/>(slides, code samples, scripts)
|
||
|
||
<!--
|
||
Remember to change:
|
||
- the Gitter link below
|
||
- the other Gitter link
|
||
- the "tweet my speed" hashtag in DockerCoins HTML
|
||
-->
|
||
|
||
- Experimental chat support on
|
||
[Gitter](https://gitter.im/jpetazzo/workshop-20160322-munchen)
|
||
|
||
---
|
||
|
||
|
||
<!--
|
||
grep '^# ' index.html | grep -v '<br' | tr '#' '-'^C
|
||
-->
|
||
|
||
## Outline (1/4)
|
||
|
||
- Pre-requirements
|
||
- VM environment
|
||
- Our sample application
|
||
- Running services independently
|
||
- Running the whole app on a single node
|
||
- Identifying bottlenecks
|
||
- Measuring latency under load
|
||
- Scaling HTTP on a single node
|
||
- Put a load balancer on it
|
||
- Connecting to containers on other hosts
|
||
- Abstracting remote services with ambassadors
|
||
- Various considerations about ambassadors
|
||
|
||
---
|
||
|
||
## Outline (2/4)
|
||
|
||
- Docker for ops
|
||
- Backups
|
||
- Logs
|
||
- Storing container logs in an ELK stack
|
||
- Security upgrades
|
||
- Network traffic analysis
|
||
|
||
---
|
||
|
||
## Outline (3/4)
|
||
|
||
- Dynamic orchestration
|
||
- Hands-on Swarm
|
||
- Deploying Swarm
|
||
- Cluster discovery
|
||
- Building our app on Swarm
|
||
- Connecting containers with ambassadors
|
||
- Setting up Consul and overlay networks
|
||
- Multi-host networking
|
||
- Using overlay networks with Compose
|
||
|
||
---
|
||
|
||
## Outline (4/4)
|
||
|
||
- Here be dragons
|
||
- Highly available Swarm managers
|
||
- Highly available containers
|
||
- Conclusions
|
||
|
||
---
|
||
|
||
# Pre-requirements
|
||
|
||
- Computer with network connection and SSH client
|
||
<br/>(on Windows, get [putty](http://www.putty.org/)
|
||
or [Git BASH](https://msysgit.github.io/))
|
||
|
||
- Basic Docker knowledge
|
||
<br/>(but that's OK if you're not a Docker expert!)
|
||
|
||
---
|
||
|
||
## Nice-to-haves
|
||
|
||
- [GitHub](https://github.com/join) account
|
||
<br/>(if you want to fork the repo; also used to join Gitter)
|
||
|
||
- [Gitter](https://gitter.im/) account
|
||
<br/>(to join the conversation during the workshop)
|
||
|
||
- [Docker Hub](https://hub.docker.com) account
|
||
<br/>(it's one way to distribute images on your Swarm cluster)
|
||
|
||
---
|
||
|
||
## Hands-on sections
|
||
|
||
- The whole workshop is hands-on
|
||
|
||
- I will show Docker in action
|
||
|
||
- I invite you to reproduce what I do
|
||
|
||
- All hands-on sections are clearly identified
|
||
<br/>(see below)
|
||
|
||
.exercise[
|
||
|
||
- This is the stuff you're supposed to do!
|
||
- Go to [container.training](http://container.training/) to view these slides
|
||
- Join the chat room on
|
||
[Gitter](https://gitter.im/jpetazzo/workshop-20160322-munchen)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
# VM environment
|
||
|
||
- Each person gets 5 VMs
|
||
- They are *your* VMs
|
||
- They'll be up until tomorrow
|
||
- You have a little card with login+password+IP addresses
|
||
- You can automatically SSH from one VM to another
|
||
|
||
.exercise[
|
||
|
||
- Log into the first VM (`node1`)
|
||
- Check that you can SSH (without password) to `node2`
|
||
- Check the version of docker with `docker version`
|
||
|
||
]
|
||
|
||
.footnote[Note: from now on, unless instructed otherwise, **all commands must
|
||
be run from the first VM, `node1`**.]
|
||
|
||
---
|
||
|
||
## Terminals
|
||
|
||
Once in a while, the instructions will say:
|
||
<br/>"Open a new terminal."
|
||
|
||
There are multiple ways to do this:
|
||
|
||
- create a new window or tab on your machine,
|
||
<br/>and SSH into the VM;
|
||
|
||
- use tmux on the VM and open a new window in tmux.
|
||
|
||
If you want to use screen or whatever, you're welcome!
|
||
|
||
---
|
||
|
||
## Tmux cheatsheet
|
||
|
||
- Ctrl-b c → creates a new window
|
||
- Ctrl-b n → go to next window
|
||
- Ctrl-b p → go to previous window
|
||
- Ctrl-b " → split window top/bottom
|
||
- Ctrl-b % → split window left/right
|
||
- Ctrl-b Alt-1 → rearrange windows in columns
|
||
- Ctrl-b Alt-2 → rearrange windows in rows
|
||
- Ctrl-b arrows → navigate to other windows
|
||
- Ctrl-b d → detach session
|
||
- tmux attach → reattach to session
|
||
|
||
---
|
||
|
||
## Brand new versions!
|
||
|
||
- Engine 1.10.**3**
|
||
|
||
- Compose 1.6.2
|
||
|
||
- Swarm 1.1.3
|
||
|
||
- Machine 0.6.0
|
||
|
||
---
|
||
|
||
# Our sample application
|
||
|
||
- Let's look at the general layout of the
|
||
[source code](https://github.com/jpetazzo/orchestration-workshop)
|
||
|
||
- Each directory = 1 microservice
|
||
- `rng` = web service generating random bytes
|
||
- `hasher` = web service computing hash of POSTed data
|
||
- `worker` = background process using `rng` and `hasher`
|
||
- `webui` = web interface to watch progress
|
||
|
||
.exercise[
|
||
|
||
- Clone the repository on `node1`:
|
||
<br/>.small[`git clone git://github.com/jpetazzo/orchestration-workshop`]
|
||
|
||
]
|
||
|
||
(Bonus points for forking on GitHub and cloning your fork!)
|
||
|
||
---
|
||
|
||
## What's this application?
|
||
|
||
--
|
||
|
||

|
||
|
||
(DockerCoins logo courtesy of @jonasrosland. Thanks!)
|
||
|
||
---
|
||
|
||
## What's this application?
|
||
|
||
- It is a DockerCoin miner! 💰🐳📦🚢
|
||
|
||
- No, you can't buy coffee with DockerCoins
|
||
|
||
- How DockerCoins works:
|
||
|
||
- `worker` asks `rng` for random bytes
|
||
- `worker` feeds those random bytes into `hasher`
|
||
- each hash starting with `0` is a DockerCoin
|
||
- DockerCoins are stored in `redis`
|
||
- `redis` is also updated every second to track speed
|
||
- you can see the progress with the `webui`
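
---

## Sketch: the DockerCoins loop

The steps above can be sketched in a few lines of Python. This is only an illustration, not the actual `worker` code: `mine`, its parameters, and the 32-byte payload size are assumptions, and `rng`, `hasher` and `redis` are replaced by local stand-ins (`os.urandom`, `hashlib`, a list).

```python
import hashlib
import os


def mine(rounds, get_random_bytes=os.urandom, size=32):
    """Run `rounds` iterations of the loop and return the hashes that count as coins."""
    coins = []
    for _ in range(rounds):
        data = get_random_bytes(size)              # stand-in for a call to `rng`
        digest = hashlib.sha256(data).hexdigest()  # stand-in for a call to `hasher`
        if digest.startswith("0"):                 # the rule from the previous slide
            coins.append(digest)                   # the real app stores coins in `redis`
    return coins


coins = mine(1000)
print(len(coins), "coins out of 1000 hashes")
```

A hex digest starts with `0` about once in 16 tries, so most hashes are not coins.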
|
||
|
||
Next: we will inspect components independently.
|
||
|
||
---
|
||
|
||
# Running services independently
|
||
|
||
First, we will run the random number generator (`rng`).
|
||
|
||
.exercise[
|
||
|
||
- Go to the `dockercoins` directory, in the cloned repo:
|
||
<br/>`cd orchestration-workshop/dockercoins`
|
||
|
||
- Use Compose to run the `rng` service:
|
||
<br/>`docker-compose up rng`
|
||
|
||
- Docker will pull `python` and build the microservice
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Lies, damn lies, and port numbers
|
||
|
||
Pay attention to the port mapping!
|
||
|
||
- The container log says:
|
||
<br/>`Running on http://0.0.0.0:80/`
|
||
|
||
- But if you try `curl localhost:80`, you will get:
|
||
<br/>`Connection refused`
|
||
|
||
- Port 80 on the container ≠ port 80 on the Docker host
|
||
|
||
---
|
||
|
||
## Understanding port mapping
|
||
|
||
- `node1`, the Docker host, has only one port 80
|
||
|
||
- If we give the one and only port 80 to the first
|
||
container that asks for it, we are in trouble when
|
||
another container needs it
|
||
|
||
- Default behavior: containers are not "exposed"
|
||
<br/>(only reachable by the Docker host and other containers,
|
||
through their private address)
|
||
|
||
- Container network services can be exposed:
|
||
|
||
- statically (you decide which host port to use)
|
||
|
||
- dynamically (Docker allocates a host port)
|
||
|
||
---
|
||
|
||
## Declaring port mapping
|
||
|
||
- Directly with the Docker Engine:
|
||
<br/>`docker run -d -p 8000:80 nginx` (static: host port 8000)
<br/>`docker run -d -p 80 nginx` (dynamic: Docker picks the host port)
<br/>`docker run -d -P nginx` (dynamic, for every port the image exposes)
|
||
|
||
- With Docker Compose, in the `docker-compose.yml` file:
|
||
|
||
```
rng:
  …
  ports:
  - "8001:80"
```
|
||
|
||
→ port 8001 *on the host* maps to
|
||
port 80 *in the container*
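
---

## Sketch: reading a port mapping

To make the `host:container` convention concrete, here is a small Python sketch that splits the two forms shown on the previous slide. `parse_port_spec` is a made-up helper, not part of Docker or Compose, and it deliberately ignores the richer forms (IP prefixes, `/udp` suffixes, port ranges).

```python
def parse_port_spec(spec):
    """Return (host_port, container_port).

    host_port is None when the spec leaves the choice to Docker.
    """
    if ":" in spec:
        host, container = spec.split(":")
        return int(host), int(container)
    return None, int(spec)


print(parse_port_spec("8001:80"))  # (8001, 80): static mapping
print(parse_port_spec("80"))       # (None, 80): Docker allocates the host port
```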
|
||
|
||
---
|
||
|
||
## Using the `rng` service
|
||
|
||
Let's get random bytes of data!
|
||
|
||
.exercise[
|
||
|
||
- Open a new terminal and connect to the same VM
|
||
|
||
<!--
|
||
```
|
||
NEW-TERM
|
||
```
|
||
-->
|
||
|
||
- Check that the service is alive:
|
||
<br/>`curl localhost:8001`
|
||
|
||
- Get 10 bytes of random data:
|
||
<br/>`curl localhost:8001/10`
|
||
|
||
- If the binary data output messed up your terminal, fix it:
|
||
<br/>`reset`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Running the hasher
|
||
|
||
.exercise[
|
||
|
||
- Open yet another terminal
|
||
|
||
<!--
|
||
```
|
||
NEW-TERM
|
||
```
|
||
-->
|
||
|
||
- Start the `hasher` service:
|
||
<br/>`docker-compose up hasher`
|
||
|
||
- It will pull `ruby` and do the build
|
||
|
||
]
|
||
|
||
Again, pay attention to the port mapping!
|
||
|
||
The container log says that it's listening on port 80,
|
||
but it's mapped to port 8002 on the host.
|
||
|
||
You can see the mapping in `docker-compose.yml`.
|
||
|
||
---
|
||
|
||
## Testing the hasher
|
||
|
||
.exercise[
|
||
|
||
- Open one more terminal to `node1`
|
||
|
||
<!--
|
||
```
|
||
NEW-TERM
|
||
```
|
||
-->
|
||
|
||
- Check that the `hasher` service is alive:
|
||
<br/>`curl localhost:8002`
|
||
|
||
- Posting binary data requires some extra flags:
|
||
|
||
```
curl \
  -H "Content-type: application/octet-stream" \
  --data-binary hello \
  localhost:8002
```
|
||
|
||
- Check that it computed the right hash:
|
||
<br/>`echo -n hello | sha256sum`
|
||
|
||
]
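
---

## Sketch: checking the hash in Python

The same check can be done without `sha256sum`. This is a minimal sketch using only the standard library; `sha256_hex` is a name chosen here for illustration.

```python
import hashlib


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of `payload` as a hex string."""
    return hashlib.sha256(payload).hexdigest()


print(sha256_hex(b"hello"))
# A trailing newline changes the digest, which is why the shell
# version uses `echo -n`:
print(sha256_hex(b"hello\n"))
```

The first line printed should match what `hasher` returned for `hello`.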
|
||
|
||
---
|
||
|
||
## Stopping services
|
||
|
||
We have multiple options:
|
||
|
||
- Interrupt `docker-compose up` with `^C`
|
||
|
||
- Stop individual services with `docker-compose stop rng`
|
||
|
||
- Stop all services with `docker-compose stop`
|
||
|
||
- Kill all services with `docker-compose kill`
|
||
<br/>(rude, but faster!)
|
||
|
||
- Stop and remove all services with `docker-compose down`
|
||
|
||
.exercise[
|
||
|
||
- Use any of those methods to stop `rng` and `hasher`
|
||
|
||
]
|
||
|
||
???
|
||
|
||
This hidden content is here for automation
|
||
(so that `docker-compose kill` gets executed
|
||
when auto-testing the content).
|
||
|
||
.exercise[
|
||
|
||
```
|
||
docker-compose kill
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
# Running the whole app on a single node
|
||
|
||
.exercise[
|
||
|
||
- Run `docker-compose up` to start all components
|
||
|
||
]
|
||
|
||
- `rng` and `hasher` can be started directly
|
||
|
||
- Other components are built accordingly
|
||
|
||
- Aggregate output is shown
|
||
|
||
- Output is verbose
|
||
<br/>(because the worker is constantly hitting other services)
|
||
|
||
---
|
||
|
||
## Viewing our application
|
||
|
||
- The app exposes a Web UI with a realtime progress graph
|
||
|
||
.exercise[
|
||
|
||
- Open http://[yourVMaddr]:8000/ (from a browser)
|
||
|
||
]
|
||
|
||
- The app actually has a constant, steady speed
|
||
<br/>(3.33 coins/second)
|
||
|
||
- The speed seems not-so-steady because:
|
||
|
||
- we measure a discrete value over discrete intervals
|
||
|
||
- the measurement is done by the browser
|
||
|
||
- BREAKING: network latency is a thing
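
---

## Sketch: why a steady speed looks jumpy

Counting discrete events over fixed intervals is enough to make a steady rate look uneven. The sketch below assumes, purely for illustration, one unit of work every 300 ms (about 3.33 per second) counted over 1-second windows. `events_per_window` is a made-up helper, not code from the web UI.

```python
def events_per_window(period_ms, window_ms, windows):
    """Count events that occur every `period_ms` inside consecutive windows."""
    event_times = range(0, windows * window_ms, period_ms)
    return [
        sum(1 for t in event_times if w * window_ms <= t < (w + 1) * window_ms)
        for w in range(windows)
    ]


print(events_per_window(300, 1000, 6))  # [4, 3, 3, 4, 3, 3]
```

The average is 20 events in 6 seconds (3.33 per second), but no single window ever shows 3.33.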
|
||
|
||
---
|
||
|
||
## Running in the background
|
||
|
||
- The logs are very verbose (and won't get better)
|
||
|
||
- Let's put them in the background for now!
|
||
|
||
.exercise[
|
||
|
||
- Stop the app (with `^C`)
|
||
|
||
- Start it again with `docker-compose up -d`
|
||
|
||
- Check on the web UI that the app is still making progress
|
||
|
||
]
|
||
|
||
Note: there is a regression in Compose 1.6 when it
|
||
is installed as a self-contained binary: `^C` doesn't
|
||
stop the containers. It will be fixed soon.
|
||
Meanwhile, installing Compose with `pip` avoids the problem.
|
||
|
||
---
|
||
|
||
## Looking at resource usage
|
||
|
||
- Let's look at CPU, memory, and I/O usage
|
||
|
||
.exercise[
|
||
|
||
- run `top` to see CPU and memory usage
|
||
<br/>(you should see idle cycles)
|
||
|
||
- run `vmstat 3` to see I/O usage (si/so/bi/bo)
|
||
<br/>(the 4 numbers should be almost zero,
|
||
<br/>except `bo` for logging)
|
||
|
||
]
|
||
|
||
We have available resources.
|
||
|
||
- Why?
|
||
- How can we use them?
|
||
|
||
---
|
||
|
||
## Scaling workers on a single node
|
||
|
||
- Docker Compose supports scaling.red[*]
|
||
- Let's scale `worker` and see what happens!
|
||
|
||
.exercise[
|
||
|
||
- Start 9 more `worker` containers:
|
||
<br/>`docker-compose scale worker=10`
|
||
|
||
- Check the aggregated logs of those containers:
|
||
<br/>`docker-compose logs worker`
|
||
|
||
- See the impact on CPU load (with top/htop),
|
||
<br/>and on compute speed (with web UI)
|
||
|
||
]
|
||
|
||
.footnote[.red[*]With some limitations, as we'll see later.]
|
||
|
||
---
|
||
|
||
# Identifying bottlenecks
|
||
|
||
- You should have seen a 3x speed bump (not 10x)
|
||
|
||
- Adding workers didn't result in linear improvement
|
||
|
||
- *Something else* is slowing us down
|
||
|
||
--
|
||
|
||
- ... But what?
|
||
|
||
--
|
||
|
||
- The code doesn't have instrumentation
|
||
|
||
- Let's use state-of-the-art HTTP performance analysis!
|
||
<br/>(i.e. good old tools like `ab`, `httping`...)
|
||
|
||
???
|
||
|
||
## Benchmarking our microservices
|
||
|
||
We will test microservices in isolation.
|
||
|
||
.exercise[
|
||
|
||
- Stop the application:
|
||
`docker-compose kill`
|
||
|
||
- Remove old containers:
|
||
`docker-compose rm`
|
||
|
||
- Start `hasher` and `rng`:
|
||
`docker-compose up hasher rng`
|
||
|
||
]
|
||
|
||
Now let's hammer them with requests!
|
||
|
||
???
|
||
|
||
## Testing `rng`
|
||
|
||
Let's assess the raw performance of our RNG.
|
||
|
||
.exercise[
|
||
|
||
- Test the performance on one big request:
|
||
<br/>`curl -o/dev/null localhost:8001/10000000`
|
||
<br/>(should take ~1s, and show speed of ~10 MB/s)
|
||
|
||
]
|
||
|
||
If we were doing requests of 1000 bytes ...
|
||
|
||
... Could we get 10k req/s?
|
||
|
||
Let's test and see what happens!
|
||
|
||
???
|
||
|
||
## Concurrent requests
|
||
|
||
.exercise[
|
||
|
||
- Test 100 requests of 1000 bytes each:
|
||
<br/>`ab -n 100 localhost:8001/1000`
|
||
|
||
- Test 100 requests, 10 requests in parallel:
|
||
<br/>`ab -n 100 -c 10 localhost:8001/1000`
|
||
<br/>(look how the latency has increased!)
|
||
|
||
- Try with 100 requests in parallel:
|
||
<br/>`ab -n 100 -c 100 localhost:8001/1000`
|
||
|
||
]
|
||
|
||
???
|
||
|
||
Whatever we do, we get ~10 requests/second.
|
||
|
||
Increasing concurrency doesn't help:
|
||
it just increases latency.
|
||
|
||
???
|
||
|
||
## Discussion
|
||
|
||
- When serving requests sequentially, they each take 100ms
|
||
|
||
- When 10 requests arrive at the same time:
|
||
|
||
- one request is served in 100ms
|
||
- another is served in 200ms
|
||
- another is served in 300ms
|
||
- ...
|
||
- another is served in 1000ms
|
||
|
||
- All requests are queued and served by a single thread
|
||
|
||
- It looks like `rng` doesn't handle concurrent requests
|
||
|
||
- What about `hasher`?
|
||
|
||
???
|
||
|
||
## Save some random data and stop the generator
|
||
|
||
Before testing the hasher, let's save some random
|
||
data that we will feed to the hasher later.
|
||
|
||
.exercise[
|
||
|
||
- Run `curl localhost:8001/1000000 > /tmp/random`
|
||
|
||
]
|
||
|
||
Now we can stop the generator.
|
||
|
||
.exercise[
|
||
|
||
- In the shell where you did `docker-compose up rng`,
|
||
<br/>stop it by hitting `^C`
|
||
|
||
]
|
||
|
||
???
|
||
|
||
## Benchmarking the hasher
|
||
|
||
We will hash the data that we just got from `rng`.
|
||
|
||
.exercise[
|
||
|
||
- Posting binary data requires some extra flags:
|
||
|
||
```
curl \
  -H "Content-type: application/octet-stream" \
  --data-binary @/tmp/random \
  localhost:8002
```
|
||
|
||
- Compute the hash locally to verify that it works fine:
|
||
<br/>`sha256sum /tmp/random`
|
||
<br/>(it should display the same hash)
|
||
|
||
]
|
||
|
||
???
|
||
|
||
## The hasher under load
|
||
|
||
The invocation of `ab` will be slightly more complex as well.
|
||
|
||
.exercise[
|
||
|
||
- Execute 100 requests in a row:
|
||
|
||
```
ab -n 100 -T application/octet-stream \
   -p /tmp/random localhost:8002/
```

- Execute 100 requests with 10 requests in parallel:

```
ab -c 10 -n 100 -T application/octet-stream \
   -p /tmp/random localhost:8002/
```
|
||
|
||
]
|
||
|
||
Take note of the performance numbers (requests/s).
|
||
|
||
???
|
||
|
||
## Benchmarking the hasher on smaller data
|
||
|
||
Here we hashed 1,000,000 bytes.
|
||
|
||
Later we will hash much smaller payloads.
|
||
|
||
Let's repeat the tests with smaller data.
|
||
|
||
.exercise[
|
||
|
||
- Run `truncate --size=10 /tmp/random`
|
||
- Repeat the `ab` tests
|
||
|
||
]
|
||
|
||
---
|
||
|
||
# Measuring latency under load
|
||
|
||
We will use `httping`.
|
||
|
||
.exercise[
|
||
|
||
- Scale back the `worker` service to zero:
|
||
<br/>`docker-compose scale worker=0`
|
||
|
||
- Open a new terminal and check the latency of `rng`:
|
||
<br/>`httping localhost:8001`
|
||
|
||
- Open a new terminal and do the same for `hasher`:
|
||
<br/>`httping localhost:8002`
|
||
|
||
- Keep an eye on both connections!
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Latency in initial conditions
|
||
|
||
Latency for both services should be very low (~1ms).
|
||
|
||
Now add a first worker and see what happens.
|
||
|
||
.exercise[
|
||
|
||
- Create the first `worker` instance:
|
||
<br/>`docker-compose scale worker=1`
|
||
|
||
]
|
||
|
||
- `hasher` should be very low (~1ms)
|
||
|
||
- `rng` should be low, with occasional spikes (10-100ms)
|
||
|
||
---
|
||
|
||
## Latency when scaling the worker
|
||
|
||
We will add workers and see what happens.
|
||
|
||
.exercise[
|
||
|
||
- Run `docker-compose scale worker=2`
|
||
|
||
- Check latency
|
||
|
||
- Increase number of workers and repeat
|
||
|
||
]
|
||
|
||
What happens?
|
||
|
||
- `hasher` remains low
|
||
- `rng` spikes up until it reaches ~(N-2)*100ms
|
||
<br/>(when you have N workers)
|
||
|
||
---
|
||
|
||
class: title
|
||
|
||
Why?
|
||
|
||
---
|
||
|
||
## Why does everything take (at least) 100ms?
|
||
|
||
--
|
||
|
||
`rng` code:
|
||
|
||

|
||
|
||
--
|
||
|
||
`hasher` code:
|
||
|
||

|
||
|
||
---
|
||
|
||
class: title
|
||
|
||
But ...
|
||
|
||
WHY?!?
|
||
|
||
---
|
||
|
||
## Why did we sprinkle this sample app with sleeps?
|
||
|
||
- Deterministic performance
|
||
<br/>(regardless of instance speed, CPUs, I/O...)
|
||
|
||
--
|
||
|
||
- Actual code sleeps all the time anyway
|
||
|
||
--
|
||
|
||
- When your code makes a remote API call:
|
||
|
||
- it sends a request;
|
||
|
||
- it sleeps until it gets the response;
|
||
|
||
- it processes the response.
|
||
|
||
---
|
||
|
||
## Why do `rng` and `hasher` behave differently?
|
||
|
||

|
||
|
||
--
|
||
|
||
(Synchronous vs. asynchronous event processing)
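
---

## Sketch: queued vs. overlapping requests

A back-of-the-envelope model of the difference, assuming each request spends 100 ms waiting (as in the sample app) and that all requests arrive at the same time. Both functions are illustrative names, not code from `rng` or `hasher`.

```python
def latencies_single_thread(n_requests, service_ms):
    """Requests are served one at a time: each one waits for all those before it."""
    return [service_ms * (k + 1) for k in range(n_requests)]


def latencies_concurrent(n_requests, service_ms):
    """The waits overlap, so every request takes a single service time."""
    return [service_ms] * n_requests


print(latencies_single_thread(10, 100))  # [100, 200, ..., 1000]
print(latencies_concurrent(10, 100))     # [100, 100, ..., 100]
```

In the first case 10 requests complete in 1 second (about 10 requests/second, whatever the concurrency). In the second case they all complete in 100 ms. The second model only holds while the time is spent waiting rather than burning CPU.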
|
||
|
||
---
|
||
|
||
## How to make `rng` go faster
|
||
|
||
- Obvious solution: comment out the `sleep` instruction
|
||
|
||
--
|
||
|
||
- Unfortunately, in the real world, network latency exists
|
||
|
||
--
|
||
|
||
- More realistic solution: use an asynchronous framework
|
||
<br/>(e.g. use gunicorn with gevent)
|
||
|
||
--
|
||
|
||
- New rule: we can't change the code!
|
||
|
||
--
|
||
|
||
- Solution: scale out `rng`
|
||
<br/>(dispatch `rng` requests on multiple instances)
|
||
|
||
---
|
||
|
||
# Scaling HTTP on a single node
|
||
|
||
- We could try to scale with Compose:
|
||
|
||
```
|
||
docker-compose scale rng=3
|
||
```
|
||
|
||
- Compose doesn't deal with load balancing
|
||
|
||
- We would get 3 instances ...
|
||
|
||
- ... But only the first one would serve traffic
|
||
|
||
---
|
||
|
||
## The plan
|
||
|
||
<!--
|
||
- Stop the `rng` service first
|
||
-->
|
||
|
||
- Create multiple identical `rng` containers
|
||
|
||
- Put a load balancer in front of them
|
||
|
||
- Point other services to the load balancer
|
||
|
||
???
|
||
|
||
## Stopping `rng`
|
||
|
||
- That's the easy part!
|
||
|
||
.exercise[
|
||
|
||
- Use `docker-compose` to stop `rng`:
|
||
|
||
```
|
||
docker-compose stop rng
|
||
```
|
||
|
||
]
|
||
|
||
Note: we do this first because we are about to remove
|
||
`rng` from the Docker Compose file.
|
||
|
||
If we don't stop
|
||
`rng` now, it will remain up and running, with Compose
|
||
being unaware of its existence!
|
||
|
||
---
|
||
|
||
## Scaling `rng`
|
||
|
||
.exercise[
|
||
|
||
- Replace the `rng` service with multiple copies of it:
|
||
|
||
```
rng1:
  build: rng

rng2:
  build: rng

rng3:
  build: rng
```
|
||
|
||
]
|
||
|
||
That's all!
|
||
|
||
Shortcut: `docker-compose.yml-scaled-rng`
|
||
|
||
---
|
||
|
||
## Introduction to `jpetazzo/hamba`
|
||
|
||
- Public image on the Docker Hub
|
||
|
||
- Load balancer based on HAProxy
|
||
|
||
- Expects the following arguments:
|
||
<br/>`FE-port BE1-addr:BE1-port BE2-addr:BE2-port ...`
|
||
<br/>*or*
|
||
<br/>`FE-addr:FE-port BE1-addr:BE1-port BE2-addr:BE2-port ...`
|
||
|
||
- FE=frontend (the thing other services connect to)
|
||
|
||
- BE=backend (the multiple copies of your scaled service)
|
||
|
||
.small[
|
||
Example: listen on port 80 and balance traffic across www1:1234 and www2:2345
|
||
|
||
```
|
||
docker run -d -p 80 jpetazzo/hamba 80 www1:1234 www2:2345
|
||
```
|
||
]
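
---

## Sketch: what a load balancer decides

Conceptually, the frontend picks one backend for each incoming connection. The sketch below uses round-robin, which is a common default. Whether `jpetazzo/hamba` is configured exactly this way is an assumption here, and `RoundRobinBalancer` is a made-up class for illustration.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in turn, starting over after the last one."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        """Return the backend that should receive the next connection."""
        return next(self._cycle)


lb = RoundRobinBalancer(["rng1:80", "rng2:80", "rng3:80"])
print([lb.pick() for _ in range(5)])
# ['rng1:80', 'rng2:80', 'rng3:80', 'rng1:80', 'rng2:80']
```

With three single-threaded backends, three requests can be in flight at once instead of one.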
|
||
|
||
---
|
||
|
||
# Put a load balancer on it
|
||
|
||
Let's add our load balancer to the Compose file.
|
||
|
||
.exercise[
|
||
|
||
- Add the following section to the Compose file:
|
||
|
||
```
rng:
  image: jpetazzo/hamba
  links:
  - rng1
  - rng2
  - rng3
  command: 80 rng1 80 rng2 80 rng3 80
  ports:
  - "8001:80"
```
|
||
|
||
]
|
||
|
||
Shortcut: `docker-compose.yml-scaled-rng`
|
||
|
||
???
|
||
|
||
## Point other services to the load balancer
|
||
|
||
- The only affected service is `worker`
|
||
|
||
- We have to replace the `rng` link with a link to `rng0`,
|
||
but it should still be named `rng` (so we don't change the code)
|
||
|
||
.exercise[
|
||
|
||
- Update the `worker` section as follows:
|
||
|
||
```
worker:
  build: worker
  links:
  - rng0:rng
  - hasher
  - redis
```
|
||
|
||
]
|
||
|
||
Shortcut: `docker-compose.yml-scaled-rng`
|
||
|
||
---
|
||
|
||
## Start the whole stack
|
||
|
||
.exercise[
|
||
|
||
- Start the new services:
|
||
<br/>`docker-compose up -d`
|
||
|
||
- Check worker logs:
|
||
<br/>`docker-compose logs worker`
|
||
|
||
- Check load balancer logs:
|
||
<br/>`docker-compose logs rng`
|
||
|
||
]
|
||
|
||
<!--
|
||
If you get errors about port 8001, make sure that
|
||
`rng` was stopped correctly and try again.
|
||
-->
|
||
|
||
---
|
||
|
||
## Results
|
||
|
||
- Check the latency of `rng`
|
||
<br/>(it should have improved significantly!)
|
||
|
||
- Check the application performance in the Web UI
|
||
<br/>(it should improve if you have enough workers)
|
||
|
||
*Note: if `worker` was scaled when you did `docker-compose up`,
|
||
it probably took a while, because `worker` doesn't handle
|
||
signals properly and Docker patiently waits 10 seconds for
|
||
each `worker` instance to terminate. This would be much
|
||
faster for a well-behaved application.*
|
||
|
||
---
|
||
|
||
## The good, the bad, the ugly
|
||
|
||
--
|
||
|
||
- The good
|
||
|
||
We scaled a service and added a load balancer
|
||
<br/>without changing a single line of code.
|
||
|
||
--
|
||
|
||
- The bad
|
||
|
||
We manually copy-pasted sections in `docker-compose.yml`.
|
||
|
||
Improvement: write scripts to transform the YAML file.
|
||
|
||
--
|
||
|
||
- The ugly
|
||
|
||
If we scale up/down, we have to restart everything.
|
||
|
||
Improvement: reconfigure the load balancer dynamically.
|
||
|
||
---
|
||
|
||
# Connecting to containers on other hosts
|
||
|
||
- So far, our whole stack is on a single machine
|
||
|
||
- We want to scale out (across multiple nodes)
|
||
|
||
- We will deploy the same stack multiple times
|
||
|
||
- But we want every stack to use the same Redis
|
||
<br/>(in other words: Redis is our only *stateful* service here)
|
||
|
||
--
|
||
|
||
- And remember: we're not allowed to change the code!
|
||
|
||
- the code connects to host `redis`
|
||
- `redis` must resolve to the address of our Redis service
|
||
- the Redis service must listen on the default port (6379)
|
||
|
||
???
|
||
|
||
## Using host name injection to abstract service dependencies
|
||
|
||
- It is possible to add host entries to a container
|
||
|
||
- With the CLI:
|
||
|
||
```
|
||
docker run --add-host redis:192.168.1.2 myservice...
|
||
```
|
||
|
||
- In a Compose file:
|
||
|
||
```
myservice:
  image: myservice
  extra_hosts:
    redis: 192.168.1.2
```
|
||
|
||
- Docker exposes a DNS server to the container,
|
||
<br/>with a private view where `redis` resolves to `192.168.1.2`
|
||
(Before Engine 1.10, it created entries in `/etc/hosts`)
|
||
|
||
???
|
||
|
||
## The plan
|
||
|
||
- Deploy our Redis service separately
|
||
|
||
- use the same `redis` image
|
||
|
||
- make sure that the Redis server port (6379) is publicly accessible,
|
||
using port 6379 on the Docker host
|
||
|
||
- Update our Docker Compose YAML file
|
||
|
||
- remove the `redis` section
|
||
|
||
- in the `links` section, remove `redis`
|
||
|
||
- instead, put a `redis` entry in `extra_hosts`
|
||
|
||
Note: the code stays on the first node!
|
||
<br/>(We do not need to copy the code to the other nodes.)
|
||
|
||
???
|
||
|
||
## Making Redis available on its default port
|
||
|
||
There are two strategies.
|
||
|
||
- `docker run -p 6379:6379 redis`
|
||
|
||
- the container has its own, isolated network stack
|
||
- Docker creates a port mapping rule through iptables
|
||
- slight performance overhead
|
||
- port number is explicit (visible through Docker API)
|
||
|
||
- `docker run --net host redis`
|
||
|
||
- the container uses the network stack of the host
|
||
- when it binds to 6379/tcp, that's 6379/tcp on the host
|
||
- allows raw speed (no overhead due to iptables/bridge)
|
||
- port number is not visible through Docker API
|
||
|
||
Choose wisely!
|
||
|
||
???
|
||
|
||
## Deploy Redis
|
||
|
||
.exercise[
|
||
|
||
- Start a new redis container, mapping port 6379 to 6379:
|
||
|
||
```
|
||
docker run -d -p 6379:6379 redis
|
||
```
|
||
|
||
- Check that it's running with `docker ps`
|
||
|
||
- Note the IP address of this Docker host
|
||
|
||
- Try to connect to it (from anywhere):
|
||
|
||
```
|
||
curl node1:6379
|
||
```
|
||
|
||
]
|
||
|
||
The `ERR` messages are normal: Redis speaks Redis, not HTTP.
|
||
|
||
???
|
||
|
||
## Update `docker-compose.yml` (1/3)
|
||
|
||
.exercise[
|
||
|
||
- Comment out `redis`:
|
||
|
||
```
|
||
#redis:
|
||
# image: redis
|
||
```
|
||
|
||
]
|
||
|
||
???
|
||
|
||
## Update `docker-compose.yml` (2/3)
|
||
|
||
.exercise[
|
||
|
||
- Update `worker`:
|
||
|
||
```
worker:
  build: worker
  extra_hosts:
    redis: A.B.C.D
  links:
  - rng
  - hasher
```
|
||
|
||
]
|
||
|
||
Replace `A.B.C.D` with the IP address noted earlier.
|
||
|
||
Shortcut: `docker-compose.yml-extra-hosts`
|
||
<br/>(But you still have to replace `A.B.C.D`!)
|
||
|
||
???
|
||
|
||
## Update `docker-compose.yml` (3/3)
|
||
|
||
.exercise[
|
||
|
||
- Update `webui`:
|
||
|
||
```
webui:
  build: webui
  extra_hosts:
    redis: A.B.C.D
  ports:
  - "8000:80"
  volumes:
  - "./webui/files/:/files/"
```
|
||
|
||
]
|
||
|
||
(Replace `A.B.C.D` with the IP address noted earlier)
|
||
|
||
???
|
||
|
||
## Start the stack on the first machine
|
||
|
||
- Nothing special to do here
|
||
|
||
- Just bring up the application like we did before
|
||
|
||
.exercise[
|
||
|
||
- `docker-compose up -d`
|
||
|
||
]
|
||
|
||
- Check in the web browser that it's running correctly
|
||
|
||
???
|
||
|
||
## Start the stack on another machine
|
||
|
||
- We will set the `DOCKER_HOST` variable
|
||
|
||
- `docker-compose` will detect and use it
|
||
|
||
- Our Docker hosts are listening on port 55555
|
||
|
||
.exercise[
|
||
|
||
- Set the environment variable:
|
||
<br/>`export DOCKER_HOST=tcp://node2:55555`
|
||
|
||
- Start the stack:
|
||
<br/>`docker-compose up -d`
|
||
|
||
- Check that it's running:
|
||
<br/>`docker-compose ps`
|
||
|
||
]
|
||
|
||
???
|
||
|
||
## Scale!
|
||
|
||
.exercise[
|
||
|
||
- Keep an eye on the web UI
|
||
|
||
- Create 20 workers on both nodes:
|
||
```
for NODE in node1 node2; do
  export DOCKER_HOST=tcp://$NODE:55555
  docker-compose scale worker=20
done
```
|
||
|
||
]
|
||
|
||
Note: of course, if we wanted, we could run on all five nodes.
|
||
|
||
???
|
||
|
||
## Cleanup
|
||
|
||
- Let's remove what we did
|
||
|
||
.exercise[
|
||
|
||
- You can use the following scriptlet:
|
||
|
||
```
for N in $(seq 1 5); do
  export DOCKER_HOST=tcp://node$N:55555
  docker ps -qa | xargs docker rm -f
done
unset DOCKER_HOST
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
# Using custom DNS mapping
|
||
|
||
- We could set up a Redis server on its default port
|
||
|
||
- And add a DNS entry mapping `redis` to this server
|
||
|
||
.exercise[
|
||
|
||
- See what happens if we run:
|
||
```
|
||
docker run --add-host redis:1.2.3.4 alpine ping redis
|
||
```
|
||
|
||
]
|
||
|
||
There is a Compose file option for that: `extra_hosts`.
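
To see what `--add-host` actually does, here is a minimal sketch (the helper name `show_hosts_entry` is ours, purely for illustration). The flag adds a line to the container's `/etc/hosts` file, so the name resolves without any real DNS server:

```shell
# Hypothetical helper: start a throwaway container with an extra
# host entry and print its /etc/hosts file.
show_hosts_entry() {
  # $1 = host name to add, $2 = IP address to map it to
  docker run --rm --add-host "$1:$2" alpine cat /etc/hosts
}

# Usage: show_hosts_entry redis 1.2.3.4
```

The last lines of the output should include the extra entry.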
|
||
|
||
---
|
||
|
||
# Abstracting remote services with ambassadors
|
||
|
||
- What if we can't/won't run Redis on its default port?
|
||
|
||
- What if we want to be able to move it easily?
|
||
|
||
--
|
||
|
||
- We will use an ambassador
|
||
|
||
- Redis will be started independently of our stack
|
||
|
||
- It will run at an arbitrary location (host+port)
|
||
|
||
- In our stack, we replace `redis` with an ambassador
|
||
|
||
- The ambassador will connect to Redis
|
||
|
||
- The ambassador will "act as" Redis in the stack
|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||

|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||

|
||
|
||
---
|
||
|
||
## Start redis
|
||
|
||
- Start a standalone Redis container
|
||
|
||
- Let Docker expose it on a random port
|
||
|
||
.exercise[
|
||
|
||
- Run redis with a random public port:
|
||
<br/>`docker run -d -P --name myredis redis`
|
||
|
||
- Check which port was allocated:
|
||
<br/>`docker port myredis 6379`
|
||
|
||
]
|
||
|
||
- Note the IP address of the machine, and this port
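
If you would rather capture the port in a variable than copy it by hand, something like the following could work. This is a sketch: the helper name `public_redis_port` is made up, and it assumes `docker port` prints lines shaped like `0.0.0.0:32768`.

```shell
# Hypothetical helper: print only the public port that Docker
# allocated for a container's port 6379.
public_redis_port() {
  # $1 = container name. Keep the first line of "docker port" output
  # and strip everything up to (and including) the last colon.
  docker port "$1" 6379 | head -n 1 | sed 's/.*://'
}

# Usage: PORT=$(public_redis_port myredis)
```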
|
||
|
||
---
|
||
|
||
## Update `docker-compose.yml`
|
||
|
||
.exercise[
|
||
|
||
<!-- Following line to be commented out if we skip extra_hosts section -->
|
||
<!--
|
||
- Restore `links` as they were before in `webui` and `worker`
|
||
-->
|
||
<!-- -->
|
||
|
||
- Replace `redis` with an ambassador using `jpetazzo/hamba`:
|
||
|
||
```
|
||
redis:
  image: jpetazzo/hamba
  command: 6379 AA.BB.CC.DD EEEEE
|
||
```
|
||
|
||
]
|
||
|
||
Shortcut: `docker-compose.yml-ambassador`
|
||
<br/>(But you still have to update `AA.BB.CC.DD EEEEE`!)
|
||
|
||
---
|
||
|
||
## Start the stack on the first machine
|
||
|
||
- Compose will detect the change in the `redis` service
|
||
|
||
- It will replace `redis` with a `jpetazzo/hamba` instance
|
||
|
||
.exercise[
|
||
|
||
- Just tell Compose to do its thing:
|
||
|
||
```
|
||
docker-compose up -d
|
||
```
|
||
|
||
- Check that the stack is up and running:
|
||
|
||
```
|
||
docker-compose ps
|
||
```
|
||
|
||
- Look at the Web UI to make sure that it works fine
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Start the stack on another machine
|
||
|
||
- We will set the `DOCKER_HOST` variable
|
||
|
||
- `docker-compose` will detect and use it
|
||
|
||
- Our Docker hosts are listening on port 55555
|
||
|
||
.exercise[
|
||
|
||
- Set the environment variable:
|
||
<br/>`export DOCKER_HOST=tcp://node2:55555`
|
||
|
||
- Start the stack:
|
||
<br/>`docker-compose up -d`
|
||
|
||
- Check that it's running:
|
||
<br/>`docker-compose ps`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Scale!
|
||
|
||
.exercise[
|
||
|
||
- Deploy one instance of the stack on each node:
|
||
|
||
.small[
|
||
```
|
||
for N in 3 4 5; do
  DOCKER_HOST=tcp://node$N:55555 docker-compose up -d &
done
|
||
```
|
||
]
|
||
|
||
- Add a bunch of workers all over the place:
|
||
|
||
.small[
|
||
```
|
||
for N in 1 2 3 4 5; do
  DOCKER_HOST=tcp://node$N:55555 docker-compose scale worker=10
done
|
||
```
|
||
]
|
||
|
||
- Admire the result in the Web UI!
|
||
|
||
]
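
The loops above all follow the same pattern, which could be wrapped in a small helper. This is only a sketch: `on_all_nodes` is a name we made up, and it assumes the five nodes `node1` to `node5` listening on port 55555, as in this workshop.

```shell
# Hypothetical helper: run any command once per node, pointing it at
# that node's Docker API. The "VAR=value command" form sets the
# variable for that single command only (when the command is an
# external program such as docker or docker-compose), so there is
# nothing to unset afterwards.
on_all_nodes() {
  for N in 1 2 3 4 5; do
    DOCKER_HOST=tcp://node$N:55555 "$@"
  done
}

# Usage: on_all_nodes docker-compose ps
```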
|
||
|
||
---
|
||
|
||
## Social Media Moment
|
||
|
||
Let's celebrate our success!
|
||
|
||
(And the fact that we're just 2498349893849283948982 DockerCoins away from being able to afford a cup of coffee!)
|
||
|
||
.exercise[
|
||
|
||
- If you have a Twitter account, tweet your mining speed!
|
||
<br/>(use the "Tweet this!" link below the graph ☺)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## A few words about development volumes
|
||
|
||
- Try to access the web UI on another node
|
||
|
||
--
|
||
|
||
- It doesn't work! Why?
|
||
|
||
--
|
||
|
||
- Static assets are masked by an empty volume
|
||
|
||
--
|
||
|
||
- We need to comment out the `volumes` section
|
||
|
||
---
|
||
|
||
## Why must we comment out the `volumes` section?
|
||
|
||
- Volumes have multiple uses:
|
||
|
||
- storing persistent stuff (database files...)
|
||
|
||
- sharing files between containers (logs, configuration...)
|
||
|
||
- sharing files between host and containers (source...)
|
||
|
||
- The `volumes` directive expands to a host path
|
||
<br/>.small[(e.g. `/home/docker/orchestration-workshop/dockercoins/webui/files`)]
|
||
|
||
- This host path exists on the local machine
|
||
<br/>(not on the others)
|
||
|
||
- This specific volume is used in development
|
||
<br/>(not in production)
|
||
|
||
---
|
||
|
||
## Stop the app (but leave Redis running)
|
||
|
||
- Let's use `docker-compose down`
|
||
|
||
- It will stop and remove the DockerCoins app
|
||
<br/>(but leave other containers running)
|
||
|
||
.exercise[
|
||
|
||
- We can do another simple shell loop:
|
||
```
|
||
for N in $(seq 1 5); do
  export DOCKER_HOST=tcp://node$N:55555
  docker-compose down &
done
|
||
```
|
||
|
||
]
|
||
|
||
(We need to keep the `myredis` container for
|
||
our next section, which will be about backups!)
|
||
|
||
---
|
||
|
||
# Various considerations about ambassadors
|
||
|
||
- "But, ambassadors are adding an extra hop!"
|
||
|
||
--
|
||
|
||
- Yes, but if you need load balancing, you need that hop
|
||
|
||
- Ambassadors actually *save* one hop
|
||
<br/>(they act as local load balancers)
|
||
|
||
- traditional load balancer:
|
||
<br/>client ⇒ external LB ⇒ server (2 physical hops)
|
||
|
||
- ambassadors:
|
||
<br/>client → ambassador ⇒ server (1 physical hop)
|
||
|
||
--
|
||
|
||
- Ambassadors are more reliable than traditional LBs
|
||
<br/>(they are colocated with their clients)
|
||
|
||
---
|
||
|
||
## Drawbacks of ambassadors
|
||
|
||
- Generic issues
|
||
<br/>(shared with any kind of load balancing / HA setup)
|
||
|
||
- extra logical hop (not transparent to the client)
|
||
|
||
- must assess backend health
|
||
|
||
- one more thing to worry about (!)
|
||
|
||
- Specific issues
|
||
|
||
- load balancing fairness
|
||
|
||
High-end load balancing solutions will rely on back pressure
|
||
from the backends. This addresses the fairness issue.
|
||
|
||
---
|
||
|
||
## There are many ways to deploy ambassadors
|
||
|
||
"Ambassador" is a design pattern.
|
||
|
||
There are many ways to implement it.
|
||
|
||
We will present three increasingly complex (but also more powerful)
ways to deploy ambassadors.
|
||
|
||
---
|
||
|
||
## Single-tier ambassador deployment
|
||
|
||
- One-shot configuration process
|
||
|
||
- Must be executed manually after each scaling operation
|
||
|
||
- Scans current state, updates load balancer configuration
|
||
|
||
- Pros:
|
||
<br/>- simple, robust, no extra moving part
|
||
<br/>- easy to customize (thanks to simple design)
|
||
<br/>- can deal efficiently with large changes
|
||
|
||
- Cons:
|
||
<br/>- must be executed after each scaling operation
|
||
<br/>- harder to compose different strategies
|
||
|
||
- Example: this workshop
|
||
|
||
---
|
||
|
||
## Two-tier ambassador deployment
|
||
|
||
- Daemon listens to Docker events API
|
||
|
||
- Reacts to container start/stop events
|
||
|
||
- Adds/removes backends in the load balancer configuration
|
||
|
||
- Pros:
|
||
<br/>- no extra step required when scaling up/down
|
||
|
||
- Cons:
|
||
<br/>- extra process to run and maintain
|
||
<br/>- deals with one event at a time (ordering matters)
|
||
|
||
- Hidden gotcha: load balancer creation
|
||
|
||
- Example: interlock
|
||
|
||
---
|
||
|
||
## Three-tier ambassador deployment
|
||
|
||
|
||
- Daemon listens to Docker events API
|
||
|
||
- Reacts to container start/stop events
|
||
|
||
- Adds/removes scaled services in distributed config DB
|
||
<br/>(zookeeper, etcd, consul…)
|
||
|
||
- Another daemon listens to config DB events
|
||
|
||
- Adds/removes backends in the load balancer configuration
|
||
|
||
- Pros:
|
||
<br/>- more flexibility
|
||
|
||
- Cons:
|
||
<br/>- three extra services to run and maintain
|
||
|
||
- Example: registrator
|
||
|
||
---
|
||
|
||
## Other multi-host communication mechanisms
|
||
|
||
- Overlay networks
|
||
|
||
- weave, flannel, pipework ...
|
||
|
||
- Network plugins
|
||
|
||
- available since Engine 1.9
|
||
|
||
- Allow a flat network for your containers
|
||
|
||
- Often require an extra service to deal with BUM packets
|
||
<br/>(broadcast/unknown/multicast)
|
||
|
||
- e.g. a key/value store (Consul, Etcd, Zookeeper ...)
|
||
|
||
- Load balancers and/or failover mechanisms still needed
|
||
|
||
---
|
||
|
||
class: title
|
||
|
||
# Interlude <br/>
|
||
|
||
# Docker for ops
|
||
|
||
---
|
||
|
||
# Backups
|
||
|
||
- Redis is still running (with name `myredis`)
|
||
|
||
- We want to enable backups without touching it
|
||
|
||
- We will use a special backup container:
|
||
|
||
- sharing the same volumes
|
||
|
||
- linked to it (to connect to it easily)
|
||
|
||
- possibly containing our backup tools
|
||
|
||
- This works because the `redis` container image
|
||
<br/>stores its data on a volume
|
||
|
||
---
|
||
|
||
## Starting the backup container
|
||
|
||
.exercise[
|
||
|
||
- Make sure you're talking to the initial host:
|
||
|
||
```
|
||
unset DOCKER_HOST
|
||
```
|
||
|
||
- Start the container:
|
||
|
||
```
|
||
docker run --link myredis:redis \
       --volumes-from myredis \
       -v /tmp/myredis:/output \
       -ti alpine sh
|
||
```
|
||
|
||
- Look in `/data` in the container
|
||
<br/>(That's where Redis puts its data dumps)
|
||
]
|
||
|
||
---
|
||
|
||
## Connecting to Redis
|
||
|
||
- We need to tell Redis to perform a data dump *now*
|
||
|
||
.exercise[
|
||
|
||
- Connect to Redis:
|
||
<br/>`telnet redis 6379`
|
||
|
||
- Issue commands `SAVE` then `QUIT`
|
||
|
||
- Look at `/data` again (notice the time stamps)
|
||
|
||
]
|
||
|
||
- There should be a recent dump file now!
|
||
|
||
---
|
||
|
||
## Getting the dump out of the container
|
||
|
||
- We could use many things:
|
||
|
||
- s3cmd to copy to S3
|
||
- SSH to copy to a remote host
|
||
- gzip/bzip/etc before copying
|
||
|
||
- We'll just copy it to the Docker host
|
||
|
||
.exercise[
|
||
|
||
- Copy the file from `/data` to `/output`
|
||
|
||
- Exit the container
|
||
|
||
- Look into `/tmp/myredis` (on the host)
|
||
|
||
]
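
The interactive steps above could also be run in one shot. This is a rough sketch under a few assumptions: the helper name `backup_redis` is made up, it uses the `redis` image (for `redis-cli`) instead of `alpine`, and it assumes the dump file has Redis' default name, `dump.rdb`.

```shell
# Hypothetical helper: one-shot, non-interactive Redis backup.
backup_redis() {
  # $1 = name of the Redis container, $2 = output directory on the host
  docker run --rm \
    --link "$1":redis \
    --volumes-from "$1" \
    -v "$2":/output \
    redis \
    sh -c 'redis-cli -h redis save && cp /data/dump.rdb /output/'
}

# Usage: backup_redis myredis /tmp/myredis
```

A command like this is what you would put in a crontab or trigger from a scheduler.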
|
||
|
||
---
|
||
|
||
## Scheduling backups
|
||
|
||
In the "old world," we (generally) use cron.
|
||
|
||
With containers, what are our options?
|
||
|
||
--
|
||
|
||
- run `cron` on the Docker host,
|
||
<br/>and put `docker run` in the crontab
|
||
|
||
--
|
||
|
||
- run `cron` in the backup container,
|
||
<br/>and make sure it keeps running
|
||
<br/>(e.g. with `docker run --restart=…`)
|
||
|
||
--
|
||
|
||
- run `cron` in a container,
|
||
<br/>and start backup containers from there
|
||
|
||
--
|
||
|
||
- listen to the Docker events stream,
|
||
<br/>automatically scheduling backups
|
||
<br/>when database containers are started
|
||
|
||
---
|
||
|
||
# Docker events stream
|
||
|
||
- Using the Docker API, we can get real-time
|
||
notifications of everything happening in the Engine:
|
||
|
||
- container creation/destruction
|
||
- container start/stop
|
||
- container exit/signal/out of memory
|
||
- container attach/detach
|
||
- volume creation/destruction
|
||
- network creation/destruction
|
||
- connection/disconnection of containers
|
||
|
||
(Networks will be covered a bit later!)
|
||
|
||
---
|
||
|
||
## Subscribing to the events stream
|
||
|
||
- This is done with `docker events`
|
||
|
||
.exercise[
|
||
|
||
- Get a stream of events:
|
||
```
|
||
docker events
|
||
```
|
||
|
||
<!-- NEW-TERM -->
|
||
|
||
- In a new terminal, do *anything*:
|
||
```
|
||
docker run --rm alpine sleep 10
|
||
```
|
||
|
||
]
|
||
|
||
You should see events for the lifecycle of the
|
||
container, as well as its connection/disconnection
|
||
to the default `bridge` network.
|
||
|
||
---
|
||
|
||
# Attaching labels
|
||
|
||
- You can attach arbitrary labels to engines and containers
|
||
|
||
- You can read the value of those labels
|
||
|
||
- You can use those labels as filters in some commands
|
||
|
||
.exercise[
|
||
|
||
- Start two containers, with and without a `backup` label:
|
||
```
|
||
docker run -d --name leweb nginx
docker run -d --name ledata --label backup=please redis
|
||
```
|
||
|
||
]
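
Reading a label back can be done with `docker inspect` and a Go template. A minimal sketch (the helper name `label_value` is ours):

```shell
# Hypothetical helper: print the value of one label of a container.
label_value() {
  # $1 = container name or ID, $2 = label name
  docker inspect --format "{{ index .Config.Labels \"$2\" }}" "$1"
}

# Usage: label_value ledata backup
```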
|
||
|
||
---
|
||
|
||
## Using labels as filters
|
||
|
||
- `docker ps` can take a `--filter` argument
|
||
|
||
.exercise[
|
||
|
||
- List only containers that have a `backup` label:
|
||
```
|
||
docker ps --filter label=backup
|
||
```
|
||
|
||
- List only containers where the `backup` label
|
||
has a specific value:
|
||
```
|
||
docker ps --filter label=backup=please
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Filtering events
|
||
|
||
- On a large cluster, there will be *lots* of events
|
||
<br/>(especially when using short-lived containers)
|
||
|
||
- `docker events` can also take a `--filter` argument
|
||
|
||
.exercise[
|
||
|
||
- Show events only for containers with a "backup" label:
|
||
```
|
||
docker events --filter label=backup
|
||
```
|
||
|
||
<!-- NEW-TERM -->
|
||
|
||
- In a different terminal, terminate our containers:
|
||
```
|
||
docker kill leweb ledata
|
||
```
|
||
|
||
]
|
||
|
||
Only the events for `ledata` will be shown.
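
This is the building block for event-driven backups. Here is a sketch of what a watcher could look like, assuming an Engine recent enough for `docker events` to support `--format`; the function name is made up, and the `echo` merely stands in for a real backup action:

```shell
# Hypothetical sketch: react to "start" events of containers that
# carry a backup label.
watch_backup_containers() {
  docker events --filter event=start --filter label=backup \
      --format '{{.ID}}' |
  while read -r ID; do
    echo "backup requested for container $ID"
  done
}

# Usage: watch_backup_containers   (runs until interrupted)
```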
|
||
|
||
---
|
||
|
||
## Using `docker ps` in scripts
|
||
|
||
- The default output of `docker ps` has two flaws:
|
||
|
||
- it is not machine-readable
|
||
- some information is not shown
|
||
|
||
- This can be changed with the `--format` flag
|
||
|
||
.exercise[
|
||
|
||
- List containers that have a `backup` label;
|
||
<br/>show their container ID, image, and the label:
|
||
```
|
||
docker ps \
    --filter label=backup \
    --format '{{ .ID }} {{ .Label "backup" }}'
|
||
```
|
||
|
||
]
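
Once the output is machine-readable, a script can loop over it. A small sketch (the function name `list_backup_targets` is ours, for illustration only):

```shell
# Hypothetical helper: iterate over containers that have a backup
# label, one "ID VALUE" pair per line.
list_backup_targets() {
  docker ps --filter label=backup \
      --format '{{ .ID }} {{ .Label "backup" }}' |
  while read -r ID VALUE; do
    echo "container=$ID backup=$VALUE"
  done
}

# Usage: list_backup_targets
```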
|
||
|
||
---
|
||
|
||
# Logs
|
||
|
||
- Two strategies:
|
||
|
||
- log to plain files on volumes
|
||
|
||
- log to stdout
|
||
<br/>(and use a logging driver)
|
||
|
||
---
|
||
|
||
## Logging to plain files on volumes
|
||
|
||
(Sorry, that part won't be hands-on!)
|
||
|
||
- Start a container with `-v /logs`
|
||
|
||
- Make sure that all log files are in `/logs`
|
||
|
||
- To check logs, run e.g.
|
||
|
||
```
|
||
docker run --volumes-from ... ubuntu sh -c \
    "grep WARN /logs/*.log"
|
||
```
|
||
|
||
- Or just go interactive:
|
||
|
||
```
|
||
docker run --volumes-from ... -ti ubuntu
|
||
```
|
||
|
||
- You can (should) start a log shipper that way
|
||
|
||
---
|
||
|
||
## Logging to stdout
|
||
|
||
- All containers should write to stdout/stderr
|
||
|
||
- Docker will collect logs and pass them to a logging driver
|
||
|
||
- The logging driver can be specified globally, and per container
|
||
<br/>(changing it for a container overrides the global setting)
|
||
|
||
- To change the global logging driver,
|
||
<br/>pass extra flags to the daemon
|
||
<br/>(requires a daemon restart)
|
||
|
||
- To override the logging driver for a container,
|
||
<br/>pass extra flags to `docker run`
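
To check which logging driver a given container ended up with, `docker inspect` can help. A minimal sketch (the helper name `log_driver_of` is made up):

```shell
# Hypothetical helper: print the logging driver of a container.
log_driver_of() {
  # $1 = container name or ID
  docker inspect --format '{{ .HostConfig.LogConfig.Type }}' "$1"
}

# Usage: log_driver_of myredis
```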
|
||
|
||
---
|
||
|
||
## Specifying logging flags
|
||
|
||
- `--log-driver`
|
||
|
||
*selects the driver*
|
||
|
||
- `--log-opt key=val`
|
||
|
||
*adds driver-specific options*
|
||
<br/>*(can be repeated multiple times)*
|
||
|
||
- The flags are identical for `docker daemon` and `docker run`
|
||
|
||
Tip #1: when provisioning with Docker Machine, use:
|
||
```
|
||
docker-machine create ... --engine-opt log-driver=...
|
||
```
|
||
|
||
Tip #2: you can set logging options in Compose files.
|
||
|
||
---
|
||
|
||
## Available drivers
|
||
|
||
- json-file (default)
|
||
|
||
- syslog (can send to UDP, TCP, TCP+TLS, UNIX sockets)
|
||
|
||
- awslogs (AWS CloudWatch)
|
||
|
||
- journald
|
||
|
||
- gelf
|
||
|
||
- fluentd
|
||
|
||
- splunk
|
||
|
||
---
|
||
|
||
## About json-file ...
|
||
|
||
- It doesn't rotate logs by default, so your disks will fill up
|
||
|
||
(Unless you set the `max-size` *and* `max-file` log options.)
|
||
|
||
- It's the only one supporting log retrieval
|
||
|
||
(If you want to use `docker logs`, `docker-compose logs`,
|
||
or fetch logs from the Docker API, you need json-file!)
|
||
|
||
- This might change in the future
|
||
|
||
(But it's complex since there is no standard protocol
|
||
to *retrieve* log entries.)
|
||
|
||
All about logging in the documentation:
|
||
https://docs.docker.com/reference/logging/overview/
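
As an illustration of the rotation options, here is a sketch of a wrapper around `docker run`. The wrapper name and the values (3 files of 10 MB each) are arbitrary choices, not recommendations:

```shell
# Hypothetical helper: start a container whose json-file logs are
# rotated (at most 3 files of 10 MB each).
run_with_log_rotation() {
  docker run -d \
    --log-driver json-file \
    --log-opt max-size=10m \
    --log-opt max-file=3 \
    "$@"
}

# Usage: run_with_log_rotation --name leweb nginx
```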
|
||
|
||
---
|
||
|
||
# Storing container logs in an ELK stack
|
||
|
||
*Important foreword: this is not an "official" or "recommended"
|
||
setup; it is just an example. We do not endorse ELK, GELF,
|
||
or the other elements of the stack more than others!*
|
||
|
||
What we will do:
|
||
|
||
- Spin up an ELK stack, with Compose
|
||
|
||
- Gaze at the spiffy Kibana web UI
|
||
|
||
- Manually send a few log entries over GELF
|
||
|
||
- Reconfigure our DockerCoins app to send logs to ELK
|
||
|
||
---
|
||
|
||
## What's in an ELK stack?
|
||
|
||
- ELK is three components:
|
||
|
||
- ElasticSearch (to store and index log entries)
|
||
|
||
- Logstash (to receive log entries from various
|
||
sources, process them, and forward them to various
|
||
destinations)
|
||
|
||
- Kibana (to view/search log entries with a nice UI)
|
||
|
||
- The only component that we will configure is Logstash
|
||
|
||
- We will accept log entries using the GELF protocol
|
||
|
||
- Log entries will be stored in ElasticSearch,
|
||
<br/>and displayed on Logstash's stdout for debugging
|
||
|
||
---
|
||
|
||
## Starting our ELK stack
|
||
|
||
- We will use a *separate* Compose file
|
||
|
||
- The Compose file is in the `elk` directory
|
||
|
||
.exercise[
|
||
|
||
- Go to the `elk` directory:
|
||
```
|
||
cd ~/orchestration-workshop/elk
|
||
```
|
||
|
||
- Start the ELK stack:
|
||
```
|
||
docker-compose up -d
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Checking that our ELK stack works
|
||
|
||
- Our default Logstash configuration sends a test
|
||
message every minute
|
||
|
||
- All messages are stored into ElasticSearch,
|
||
but also shown on Logstash stdout
|
||
|
||
.exercise[
|
||
|
||
- Look at Logstash stdout:
|
||
```
|
||
docker-compose logs logstash
|
||
```
|
||
|
||
]
|
||
|
||
After less than one minute, you should see a `"message" => "ok"`
|
||
in the output.
|
||
|
||
---
|
||
|
||
## Connect to Kibana
|
||
|
||
- Our ELK stack exposes two public services:
|
||
<br/>the Kibana web server, and the GELF UDP socket
|
||
|
||
.exercise[
|
||
|
||
- Check the port number for the Kibana UI:
|
||
```
|
||
docker-compose ps kibana
|
||
```
|
||
|
||
- Open the UI in your browser
|
||
<br/>(Use the instance IP address and the public port number)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## "Configuring" Kibana
|
||
|
||
- If you see a status page with a yellow item, wait a minute and reload
|
||
(Kibana is probably still initializing)
|
||
|
||
- Kibana should prompt you to "Configure an index pattern";
|
||
just click the "Create" button
|
||
|
||
- Then:
|
||
|
||
- click "Discover" (in the top-left corner)
|
||
- click "Last 15 minutes" (in the top-right corner)
|
||
- click "Last 1 hour" (in the list in the middle)
|
||
- click "Auto-refresh" (top-right corner)
|
||
- click "5 seconds" (top-left of the list)
|
||
|
||
- You should see a series of green bars
|
||
<br/>(with one new green bar every minute)
|
||
|
||
---
|
||
|
||
## Kibana out of the box
|
||
|
||

|
||
|
||
---
|
||
|
||
## Sending container output to Kibana
|
||
|
||
- We will create a simple container displaying "hello world"
|
||
|
||
- We will override the container logging driver
|
||
|
||
.exercise[
|
||
|
||
- Check the port number for the GELF socket:
|
||
<br/>`docker-compose ps logstash`
|
||
|
||
- Start a one-off container, overriding its logging driver:
|
||
<br/>(make sure to update X.X.X.X:XXXXX, of course)
|
||
|
||
```
|
||
docker run --rm --log-driver gelf \
       --log-opt gelf-address=udp://X.X.X.X:XXXXX \
       alpine echo hello world
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Visualizing container logs in Kibana
|
||
|
||
- Less than 5 seconds later (the refresh rate of the UI),
|
||
the log line should be visible in the Web UI
|
||
|
||
- We can customize the Web UI to be more readable
|
||
|
||
.exercise[
|
||
|
||
- In the left column, move the mouse over the following
|
||
columns, and click the "Add" button that appears:
|
||
|
||
- host
|
||
- container_name
|
||
- short_message
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Removing the old deployment of DockerCoins
|
||
|
||
- Before redeploying DockerCoins, remove everything
|
||
|
||
.exercise[
|
||
|
||
- Go back to the dockercoins directory:
|
||
<br/>`cd ~/orchestration-workshop/dockercoins`
|
||
|
||
- Stop and remove all DockerCoins containers:
|
||
<br/>`docker-compose kill`
|
||
<br/>`docker-compose rm -f`
|
||
|
||
- Reset the Compose file:
|
||
<br/>`git checkout docker-compose.yml`
|
||
|
||
- Point the Docker API to a single node:
|
||
<br/>`unset DOCKER_HOST`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Add the logging driver to the Compose file
|
||
|
||
- We need to add the logging section to each container
|
||
|
||
- We need the GELF endpoint (host+port) that we
|
||
got earlier with `docker-compose ps logstash`
|
||
|
||
.exercise[
|
||
|
||
- Edit the `docker-compose.yml` file,
|
||
<br/>adding the following lines **to each container**:
|
||
|
||
```
|
||
log_driver: gelf
log_opt:
  gelf-address: "udp://X.X.X.X:XXXXX"
|
||
```
|
||
|
||
]
|
||
|
||
Shortcut: `docker-compose.yml-logging`
|
||
<br/>(But you still have to update `X.X.X.X:XXXXX`!)
|
||
|
||
---
|
||
|
||
## Start the DockerCoins app
|
||
|
||
.exercise[
|
||
|
||
- Use Compose normally:
|
||
```
|
||
docker-compose up -d
|
||
```
|
||
|
||
]
|
||
|
||
If you look in the Kibana web UI, you will see log lines
|
||
refreshed every 5 seconds.
|
||
|
||
Note: to do interesting things (graphs, searches...) we
|
||
would need to create indexes. This is beyond the scope
|
||
of this workshop.
|
||
|
||
---
|
||
|
||
## Logging in production
|
||
|
||
- If we were using an ELK stack:
|
||
|
||
- scale ElasticSearch
|
||
- scale Logstash
|
||
- move away from UDP *or* put one Logstash per node
|
||
- interpose a Redis or Kafka queue
|
||
|
||
- Configure your Engines to send all logs to ELK by default
|
||
|
||
- Start the logging containers with a different logging system
|
||
<br/>(to avoid a logging loop)
|
||
|
||
- Make sure you don't end up writing *all logs* on the nodes running Logstash!
|
||
|
||
---
|
||
|
||
# Security upgrades
|
||
|
||
- This section is not hands-on
|
||
|
||
- Public Service Announcement
|
||
|
||
- We'll discuss:
|
||
|
||
- how to upgrade the Docker daemon
|
||
|
||
- how to upgrade container images
|
||
|
||
---
|
||
|
||
## Upgrading the Docker daemon
|
||
|
||
- Stop all containers cleanly
|
||
|
||
- Stop the Docker daemon
|
||
|
||
- Upgrade the Docker daemon
|
||
|
||
- Start the Docker daemon
|
||
|
||
- Start all containers
|
||
|
||
- This is like upgrading your Linux kernel,
|
||
<br/>but it will get better
|
||
|
||
---
|
||
|
||
## In practice
|
||
|
||
- Keep track of running containers before stopping the Engine:
|
||
```
|
||
docker ps --no-trunc -q |
tee /tmp/running |
xargs -n1 -P10 docker stop
|
||
```
|
||
|
||
- Restart those containers after the Engine is running again:
|
||
```
|
||
xargs docker start < /tmp/running
|
||
```
|
||
<br/>(Run this multiple times if you have linked containers!)
|
||
|
||
---
|
||
|
||
## Upgrading container images
|
||
|
||
- When a vulnerability is announced:
|
||
|
||
- if it affects your base images,
|
||
<br/>make sure they are fixed first
|
||
|
||
- if it affects downloaded packages,
|
||
<br/>make sure they are fixed first
|
||
|
||
- re-pull base images
|
||
|
||
- rebuild
|
||
|
||
- restart containers
|
||
|
||
---
|
||
|
||
## How do we know when to upgrade?
|
||
|
||
- Subscribe to CVE notifications
|
||
|
||
- https://cve.mitre.org/
|
||
|
||
- your distros' security announcements
|
||
|
||
- Check CVE status in official images
|
||
<br/>(tag [cve-tracker](
https://github.com/docker-library/official-images/labels/cve-tracker)
in [docker-library/official-images](
https://github.com/docker-library/official-images)
repo)
|
||
|
||
- Coming soon: Project Nautilus
|
||
<br/>(see [this DC15EU presentation](
|
||
http://www.slideshare.net/Docker/official-repos-and-project-nautilus/26))
|
||
|
||
---
|
||
|
||
## Upgrading with Compose
|
||
|
||
Compose makes this particularly easy:
|
||
```
|
||
docker-compose build --pull --no-cache
docker-compose up -d
|
||
```
|
||
|
||
This will automatically:
|
||
|
||
- pull base images;
|
||
- rebuild all container images;
|
||
- bring up the new containers.
|
||
|
||
Remember: Compose will automatically move our
|
||
volumes to the new containers, so data is preserved.
|
||
|
||
---
|
||
|
||
# Network traffic analysis
|
||
|
||
- We still have `myredis` running
|
||
|
||
- We will use *shared network namespaces*
|
||
<br/>to perform network analysis
|
||
|
||
- Two containers sharing the same network namespace...
|
||
|
||
- have the same IP addresses
|
||
|
||
- have the same network interfaces
|
||
|
||
- `eth0` is therefore the same in both containers
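
A quick way to convince yourself of this is to look at `eth0` from a second container that joins the namespace. A sketch (the helper name `show_shared_eth0` is ours):

```shell
# Hypothetical helper: show eth0 as seen from a throwaway container
# that joins another container's network namespace.
show_shared_eth0() {
  # $1 = name of the container whose network namespace we join
  docker run --rm --net "container:$1" alpine ip addr show eth0
}

# Usage: show_shared_eth0 myredis
```

The addresses shown are those of `myredis`, not of a new interface.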
|
||
|
||
---
|
||
|
||
## Install and start `ngrep`
|
||
|
||
Ngrep uses libpcap (like tcpdump) to sniff network traffic.
|
||
|
||
.exercise[
|
||
|
||
- Start a container with the same network namespace:
|
||
<br/>`docker run --net container:myredis -ti alpine sh`
|
||
|
||
- Install ngrep:
|
||
<br/>`apk update && apk add ngrep`
|
||
|
||
- Run ngrep:
|
||
<br/>`ngrep -tpd eth0 -Wbyline . tcp`
|
||
|
||
]
|
||
|
||
You should see a stream of Redis requests and responses.
|
||
|
||
---
|
||
|
||
class: title
|
||
|
||
# Dynamic orchestration
|
||
|
||
---
|
||
|
||
## Static vs Dynamic
|
||
|
||
- Static
|
||
|
||
- you decide what goes where
|
||
|
||
- simple to describe and implement
|
||
|
||
- seems easy at first but doesn't scale efficiently
|
||
|
||
- Dynamic
|
||
|
||
- the system decides what goes where
|
||
|
||
- requires extra components (HA KV...)
|
||
|
||
- scaling can be finer-grained, more efficient
|
||
|
||
---
|
||
|
||
## Mesos (overview)
|
||
|
||
- First presented in 2009
|
||
|
||
- Initial goal: resource scheduler
|
||
<br/>(two-level/pessimistic)
|
||
|
||
- top-level "master" knows the global cluster state
|
||
|
||
- "slave" nodes report status and resources to master
|
||
|
||
- master allocates resources to "frameworks"
|
||
|
||
- Container support added recently
|
||
<br/>(had to fit existing model)
|
||
|
||
- Network and service discovery is complex
|
||
|
||
---
|
||
|
||
## Mesos (in practice)
|
||
|
||
- Easy to set up a test cluster (in containers!)
|
||
|
||
- Great to accommodate mixed workloads
|
||
<br/>(see Marathon, Chronos, Aurora, and many more)
|
||
|
||
- "Meh" if you only want to run Docker containers
|
||
|
||
- In production on clusters of thousands of nodes
|
||
|
||
- Open source project; commercial support available
|
||
|
||
---
|
||
|
||
## Kubernetes (overview)
|
||
|
||
- Started in June 2014
|
||
|
||
- Designed specifically as a platform for containers
|
||
<br/>("greenfield" design)
|
||
|
||
- "pods" = groups of containers sharing network/storage
|
||
|
||
- Scaling and HA managed by "replication controllers"
|
||
|
||
- extensive use of "tags" instead of e.g. tree hierarchy
|
||
|
||
- Initially designed around Docker,
|
||
<br/>but doesn't hesitate to diverge in a few places
|
||
|
||
---
|
||
|
||
## Kubernetes (in practice)
|
||
|
||
- Network and service discovery is powerful, but complex
|
||
<br/>.small[(different mechanisms within pod, between pods, for inbound traffic...)]
|
||
|
||
- Initially designed around GCE
|
||
<br/>.small[(networking and persistence work better on GCE, but this is getting better)]
|
||
|
||
- Tends to be loved by ops more than devs
|
||
<br/>.small[(but keep in mind that it's evolving just as fast as Docker)]
|
||
|
||
- Adaptation is needed when it differs from Docker
|
||
<br/>.small[(need to learn new API, new tooling, new concepts)]
|
||
|
||
- Bottom line: Kubernetes is not Docker!
|
||
<br/>.small[(different APIs, concepts, configuration files...)]
|
||
|
||
---
|
||
|
||
## ECS (overview)
|
||
|
||
- Amazon EC2 Container Service
|
||
|
||
- "Bring your own instance"
|
||
|
||
- "Native" container scheduler on AWS
|
||
|
||
- Some integration with other AWS products
|
||
|
||
- No extra component to operate
|
||
|
||
- Defines new concepts:
|
||
|
||
- task
|
||
- task definition
|
||
- service
|
||
|
||
---
|
||
|
||
## ECS (in practice)
|
||
|
||
- Task definitions look like Compose files,
|
||
<br/>but are significantly different
|
||
|
||
- Integration with e.g. ELB is suboptimal
|
||
<br/>(ELB requires all backends to run on the same port)
|
||
|
||
- Cluster deployment is made easier thanks to ECS CLI
|
||
|
||
- Docker API gets partially exposed through ECS API,
|
||
<br/>with some features lagging behind
|
||
|
||
- Service discovery is painful
|
||
|
||
---
|
||
|
||
## Nomad (overview)
|
||
|
||
- Generic job scheduler
|
||
<br/>(not only for containers)
|
||
|
||
- Desired state is stored in Consul
|
||
|
||
- Nodes pull jobs from Consul
|
||
|
||
- Scheduling happens in parallel
|
||
|
||
---
|
||
|
||
## Nomad (in practice)
|
||
|
||
*Disclaimer: I have little first-hand experience with Nomad!*
|
||
|
||
- Does only one thing, but does it really well
|
||
|
||
- Works with jobs, not applications, services, etc.
|
||
|
||
- As I understand it: Nomad is an excellent building block,
|
||
<br/>but you need to add other components to deploy your apps
|
||
|
||
---
|
||
|
||
## Swarm (in theory)
|
||
|
||
- Consolidates multiple Docker hosts into a single one
|
||
|
||
- "Looks like" a Docker daemon, but it dispatches (schedules)
|
||
your containers on multiple daemons
|
||
|
||
- Talks the Docker API front and back
|
||
<br/>(leverages the Docker API and ecosystem)
|
||
|
||
- Open source and written in Go (like Docker)
|
||
|
||
- Started by two of the original Docker authors
|
||
<br/>([@aluzzardi](https://twitter.com/aluzzardi) and [@vieux](https://twitter.com/vieux))
|
||
|
||
---
|
||
|
||
## Swarm (in practice)
|
||
|
||
- Stable since November 2015
|
||
|
||
- Tested with 1000 nodes + 50000 containers
|
||
<br/>.small[(without particular tuning; see DockerCon EU opening keynotes!)]
|
||
|
||
- Perfect for some scenarios (Jenkins, grid...)
|
||
|
||
- Requires extra effort for Compose build, links...
|
||
|
||
- Requires a key/value store to achieve high availability
|
||
|
||
- We'll see it in action!
|
||
|
||
???
|
||
|
||
## PAAS on Docker
|
||
|
||
- The PAAS workflow: *just push code*
|
||
<br/>(inspired by Heroku, dotCloud...)
|
||
|
||
- TL;DR: easier for devs, harder for ops,
|
||
<br/>some very opinionated choices
|
||
|
||
- A few examples:
|
||
<br/>(Non-exhaustive list!!!)
|
||
|
||
- Cloud Foundry
|
||
- Deis
|
||
- Dokku
|
||
- Flynn
|
||
- Tsuru
|
||
|
||
*Docker made it very easy to cobble your own PAAS.*
|
||
|
||
???
|
||
|
||
## A few other tools
|
||
|
||
- Volume plugins (Convoy, Flocker...)
|
||
|
||
- manage/migrate stateful containers (and more)
|
||
|
||
- Network plugins (Contiv, Weave...)
|
||
|
||
- overlay network so that containers can ping each other
|
||
|
||
- Docker Cloud (Tutum), Docker UCP (Universal Control Plane)
|
||
|
||
- dashboards to manage fleets of Docker hosts
|
||
|
||
... And many more!
|
||
|
||
---
|
||
|
||
# Hands-on Swarm
|
||
|
||

|
||
|
||
---
|
||
|
||
## Setting up our Swarm cluster
|
||
|
||
- This can be done manually or with **Docker Machine**
|
||
|
||
- Manual deployment:
|
||
|
||
- with TLS: certificate generation is painful
|
||
<br/>(needs dual-use certs)
|
||
|
||
- without TLS: easier, but insecure
|
||
<br/>(unless you run on your internal/private network)
|
||
|
||
- Docker Machine deployment:
|
||
|
||
- generates keys, certificates, and deploys them for you
|
||
|
||
- can also create VMs
|
||
|
||
---
|
||
|
||
## The Way Of The Machine
|
||
|
||
- Install `docker-machine` (single binary download)
|
||
|
||
- Set a few environment variables (cloud credentials)
|
||
|
||
- Create one or more machines:
|
||
<br/>`docker-machine create -d digitalocean node42`
|
||
|
||
- List machines and their status:
|
||
<br/>`docker-machine ls`
|
||
|
||
- Select a machine for use:
|
||
<br/>`eval $(docker-machine env node42)`
|
||
<br/>(this will set a few environment variables)
|
||
|
||
- Execute regular commands with Docker, Compose, etc.
|
||
<br/>(they will pick up remote host address from environment)
|
||
|
||
---
|
||
|
||
## Docker Machine `generic` driver
|
||
|
||
- Most drivers work the same way:
|
||
|
||
- use cloud API to create instance
|
||
|
||
- connect to instance over SSH
|
||
|
||
- install Docker
|
||
|
||
- The `generic` driver skips the first step
|
||
|
||
- It can install Docker on any machine,
|
||
<br/>as long as you have SSH access
|
||
|
||
- We will use that!
|
||
|
||
---
|
||
|
||
# Deploying Swarm
|
||
|
||
- Components involved:
|
||
|
||
- cluster discovery mechanism
|
||
<br/>(so that the manager can learn about the nodes)
|
||
|
||
- swarm manager
|
||
<br/>(your frontend to the cluster)
|
||
|
||
- swarm agent
|
||
<br/>(runs on each node, registers it with service discovery)
|
||
|
||
---
|
||
|
||
# Cluster discovery
|
||
|
||
- Possible backends:
|
||
|
||
- dynamic, self-hosted (zk, etcd, consul)
|
||
|
||
- static (command-line or file)
|
||
|
||
- hosted by Docker (token)
|
||
|
||
- We will use the token mechanism
|
||
|
||
---
|
||
|
||
## Generating our Swarm discovery token
|
||
|
||
The token is a unique identifier, corresponding to a bucket
|
||
in the discovery service hosted by Docker Inc.
|
||
|
||
(You can think of it as a rendezvous point for your cluster.)
|
||
|
||
.exercise[
|
||
|
||
- Create your token, and carefully save it to disk as well:
|
||
|
||
```
|
||
TOKEN=$(docker run swarm create | tee token)
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Swarm agent
|
||
|
||
- Used only for dynamic discovery (zk, etcd, consul, token)
|
||
|
||
- Must run on each node
|
||
|
||
- Every 20s (by default), tells the discovery system:
|
||
<br/>"Hello, there is a Swarm node at A.B.C.D:EFGH"
|
||
|
||
- Must know the node's IP address
|
||
<br/>(sorry, it can't figure it out by itself, because
|
||
<br/>it doesn't know whether to use public or private addresses)
|
||
|
||
- The node continues to work even if the agent dies
|
||
|
||
- Automatically started by Docker Machine
|
||
<br/>(when the `--swarm` option is passed)
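
For reference, here is a minimal sketch of what starting an agent by hand could look like. The container name `swarm-agent`, the port `2376`, and the function name are assumptions for illustration; Docker Machine takes care of this for us.

```shell
# Sketch: start a Swarm agent by hand.
# Assumptions: the Engine is reachable on port 2376, and the
# container name "swarm-agent" is only illustrative.
start_swarm_agent() {
  node_ip="$1"   # address that the manager should use to reach this node
  token="$2"     # discovery token generated with `swarm create`
  docker run -d --name swarm-agent swarm join \
    --advertise="${node_ip}:2376" "token://${token}"
}
```

Usage would be something like `start_swarm_agent A.B.C.D $TOKEN`, with the address chosen explicitly, since the agent cannot guess it.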
|
||
|
||
---
|
||
|
||
## Swarm manager
|
||
|
||
- Exposes a Docker API endpoint
|
||
|
||
- Talks to the cluster nodes
|
||
|
||
- Performs healthchecks, scheduling...
|
||
|
||
.exercise[
|
||
|
||
- Connect to `node1`
|
||
|
||
- "Create" a node with Docker Machine:
|
||
|
||
.small[
|
||
```
|
||
docker-machine create --driver generic \
|
||
--swarm --swarm-master --swarm-discovery token://$TOKEN \
|
||
--generic-ssh-user docker --generic-ip-address A.B.C.D node1
|
||
```
|
||
]
|
||
|
||
]
|
||
|
||
(Don't forget to replace A.B.C.D with the node IP address!)
|
||
|
||
---
|
||
|
||
## Redundancy
|
||
|
||
- The manager is a SPOF
|
||
|
||
- If you lose the manager:
|
||
|
||
- you can't control the cluster anymore
|
||
|
||
- you can still control individual nodes
|
||
|
||
- you can start a new manager
|
||
<br/>(at this point, it is stateless)
|
||
|
||
- We'll set up active/passive redundancy later
|
||
|
||
---
|
||
|
||
## Check our node
|
||
|
||
Let's connect to the node *individually*.
|
||
|
||
.exercise[
|
||
|
||
- Select the node with Machine
|
||
|
||
```
|
||
eval $(docker-machine env node1)
|
||
```
|
||
|
||
- Execute some Docker commands
|
||
|
||
```
|
||
docker version
|
||
docker info
|
||
docker ps
|
||
```
|
||
|
||
]
|
||
|
||
Two containers should show up: the agent and the manager.
|
||
|
||
---
|
||
|
||
## Check our (single-node) Swarm cluster
|
||
|
||
Let's connect to the manager instead.
|
||
|
||
.exercise[
|
||
|
||
- Select the Swarm manager with Machine
|
||
|
||
```
|
||
eval $(docker-machine env node1 --swarm)
|
||
```
|
||
|
||
- Execute some Docker commands
|
||
|
||
```
|
||
docker version
|
||
docker info
|
||
docker ps
|
||
```
|
||
|
||
]
|
||
|
||
The output is different! Let's review this.
|
||
|
||
---
|
||
|
||
## `docker version`
|
||
|
||
Swarm identifies itself clearly:
|
||
|
||
```
|
||
Client:
|
||
Version: 1.10.2
|
||
API version: 1.22
|
||
Go version: go1.5.3
|
||
Git commit: c3959b1
|
||
Built: Mon Feb 22 21:40:35 2016
|
||
OS/Arch: linux/amd64
|
||
|
||
Server:
|
||
Version: swarm/1.1.3
|
||
API version: 1.22
|
||
Go version: go1.5.3
|
||
Git commit: 7e9c6bd
|
||
Built: Wed Mar 2 00:15:12 UTC 2016
|
||
OS/Arch: linux/amd64
|
||
```
|
||
|
||
---
|
||
|
||
## `docker info`
|
||
|
||
Swarm gives cluster information, showing all nodes:
|
||
|
||
.small[
|
||
```
|
||
Containers: 3
|
||
Images: 6
|
||
Role: primary
|
||
Strategy: spread
|
||
Filters: affinity, health, constraint, port, dependency
|
||
Nodes: 1
|
||
node1: 52.58.50.15:2376
|
||
└ Status: Healthy
|
||
└ Containers: 3
|
||
└ Reserved CPUs: 0 / 2
|
||
└ Reserved Memory: 0 B / 3.859 GiB
|
||
└ Labels: executiondriver=native-0.2,
|
||
kernelversion=4.2.0-30-generic,
|
||
operatingsystem=Ubuntu 15.10,
|
||
provider=generic,
|
||
storagedriver=aufs
|
||
└ Error: (none)
|
||
└ UpdatedAt: 2016-03-09T14:01:43Z
|
||
Kernel Version: 4.2.0-30-generic
|
||
Operating System: linux
|
||
Architecture: amd64
|
||
CPUs: 2
|
||
Total Memory: 3.86 GiB
|
||
Name: node1
|
||
```
|
||
]
|
||
|
||
---
|
||
|
||
## `docker ps`
|
||
|
||
- This one should show nothing at this point
|
||
|
||
- The Swarm containers are hidden
|
||
|
||
- This avoids unneeded pollution
|
||
|
||
- This also avoids killing them by mistake
|
||
|
||
- We can still see them with `docker ps -a`, though
|
||
|
||
---
|
||
|
||
## Add other nodes to the cluster
|
||
|
||
- Let's use *almost* the same command line
|
||
<br/>(but without `--swarm-master`)
|
||
|
||
.exercise[
|
||
|
||
- Stay on `node1` (it has keys and certificates now!)
|
||
|
||
- Add another node with Docker Machine
|
||
|
||
.small[
|
||
```
|
||
docker-machine create --driver generic \
|
||
--swarm --swarm-discovery token://$TOKEN \
|
||
--generic-ssh-user docker --generic-ip-address A.B.C.D node2
|
||
```
|
||
]
|
||
]
|
||
|
||
Remember to update the IP address correctly.
|
||
|
||
Repeat for each of the 4 remaining nodes (`node2` to `node5`).
|
||
|
||
Pro tip: look for name/address mapping in `/etc/hosts`.
|
||
|
||
---
|
||
|
||
## Scripting
|
||
|
||
To help you a little bit:
|
||
|
||
```
|
||
grep 'node[2345]' /etc/hosts | grep -v '^127' |
|
||
while read IPADDR NODENAME
|
||
do docker-machine create --driver generic \
|
||
--swarm --swarm-discovery token://$TOKEN \
|
||
--generic-ssh-user docker \
|
||
--generic-ip-address $IPADDR $NODENAME
|
||
done
|
||
```
|
||
|
||
Then check with `docker info` that all the nodes are there.
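
To see what the loop above iterates over, here is a small sketch that only prints the address/name pairs, without creating anything. The function name and the optional file argument are assumptions added for illustration.

```shell
# Print "IPADDR NODENAME" pairs for node2..node5 from a hosts file.
# The file path is a parameter (defaulting to /etc/hosts) so that
# the function can be tried on a copy.
list_worker_nodes() {
  hosts_file="${1:-/etc/hosts}"
  grep 'node[2345]' "$hosts_file" | grep -v '^127' |
  while read -r ipaddr nodename _aliases
  do
    # Extra aliases on the line (if any) end up in _aliases and are ignored.
    echo "$ipaddr $nodename"
  done
}
```

Running `list_worker_nodes` before the real loop is a cheap way to check that each node maps to the address you expect.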
|
||
|
||
---
|
||
|
||
## Running containers on Swarm
|
||
|
||
Try to run a few `busybox` containers.
|
||
|
||
Then, let's get serious:
|
||
|
||
.exercise[
|
||
|
||
- Start a Redis service:
|
||
<br/>`docker run -dP redis`
|
||
|
||
- See the service address:
|
||
<br/>`docker port $(docker ps -lq) 6379`
|
||
|
||
]
|
||
|
||
This can be any of your five nodes.
|
||
|
||
---
|
||
|
||
## Scheduling strategies
|
||
|
||
- Random: pick a node at random
|
||
<br/>(but honor resource constraints)
|
||
|
||
- Spread: pick the node with the fewest containers
|
||
<br/>(including stopped containers)
|
||
|
||
- Binpack: try to maximize resource usage
|
||
<br/>(in other words: use as few hosts as possible)
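
The difference between spread and binpack can be sketched with a toy model. This is a simplification, not Swarm's actual algorithm: the input format (`NAME CONTAINERS CAPACITY`) and the function names are made up for the example, and the real scheduler also takes reserved CPU and memory into account.

```shell
# Toy model of two strategies.
# Input on stdin: one line per node, "NAME CONTAINERS CAPACITY".

# spread: the node with the fewest containers.
pick_spread() {
  sort -k2,2n | head -n 1 | cut -d' ' -f1
}

# binpack: the fullest node that still has room for one more container.
pick_binpack() {
  awk '$2 < $3' | sort -k2,2nr | head -n 1 | cut -d' ' -f1
}
```

With the same cluster state, spread sends the next container to the emptiest node, while binpack keeps filling a busy node until it is full.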
|
||
|
||
---
|
||
|
||
# Building our app on Swarm
|
||
|
||
Before trying to build our app, we will remove previous images.
|
||
|
||
.exercise[
|
||
|
||
- Delete all images with "dockercoins" in the name:
|
||
|
||
```
|
||
docker images |
|
||
grep dockercoins |
|
||
awk '{print $1}' |
|
||
xargs -r docker rmi -f
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Building our app on Swarm
|
||
|
||
- Compose now supports builds on Swarm
|
||
<br/>(older versions would crash)
|
||
|
||
.exercise[
|
||
|
||
- Run `docker-compose build`
|
||
|
||
- Try to start and scale the application:
|
||
```
|
||
docker-compose up -d
|
||
docker-compose scale worker=10
|
||
docker-compose scale webui=2
|
||
```
|
||
|
||
]
|
||
|
||
.icon[] This is supposed to fail!
|
||
|
||
(Don't bang your head on the keyboard if it doesn't work!)
|
||
|
||
---
|
||
|
||
## Caveats when building with Swarm
|
||
|
||
- Containers are only scheduled where they were built
|
||
|
||
- cause: images are not present on all nodes
|
||
|
||
- solution: distribute images through a registry
|
||
<br/>(e.g. Docker Hub)
|
||
|
||
- You can end up with inconsistent versions
|
||
<br/>(e.g. `dockercoins_rng:latest` being different on two nodes)
|
||
|
||
- cause: build nodes can come and go
|
||
|
||
- solution: always pin builds to the same node
|
||
|
||
- Also, caching doesn't work all the time
|
||
|
||
---
|
||
|
||
## Why can't Swarm do this automatically for us?
|
||
|
||
- Let's step back and think for a minute ...
|
||
|
||
- What should `docker build` do on Swarm?
|
||
|
||
- build on one machine
|
||
|
||
- build everywhere ($$$)
|
||
|
||
- After the build, what should `docker run` do?
|
||
|
||
- run where we built (how do we know where it is?)
|
||
|
||
- run on any machine that has the image
|
||
|
||
- Could Compose+Swarm solve this automatically?
|
||
|
||
---
|
||
|
||
## A few words about "sane defaults"
|
||
|
||
- *It would be nice if Swarm could pick a node, and build there!*
|
||
|
||
- but which node should it pick?
|
||
- what if the build is very expensive?
|
||
- what if we want to distribute the build across nodes?
|
||
- what if we want to tag some builder nodes?
|
||
- ok but what if no node has been tagged?
|
||
|
||
- *It would be nice if Swarm could automatically push images!*
|
||
|
||
- using the Docker Hub is an easy choice
|
||
<br/>(you just need an account)
|
||
- but some of us can't/won't use Docker Hub
|
||
<br/>(for compliance reasons, or because they have no network access)
|
||
|
||
.small[("Sane" defaults are nice only if we agree on the definition of "sane")]
|
||
|
||
---
|
||
|
||
## The plan
|
||
|
||
- Build locally
|
||
|
||
- Tag images
|
||
|
||
- Upload them to a registry
|
||
|
||
- Update the Compose file to use those images
|
||
|
||
*That's the purpose of the `build-tag-push.py` script!*
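
The tag and push steps can be sketched as follows. This is not the actual content of `build-tag-push.py`. It assumes that Compose named the images `dockercoins_<service>`, that `DOCKER_REGISTRY` is set, and that the version string (a hypothetical parameter) is any unique tag.

```shell
# Sketch: tag and push images built by Compose, so that every
# node of the cluster can pull them.
# Assumptions: images are named dockercoins_<service>, and
# DOCKER_REGISTRY is set (e.g. to a Docker Hub user name).
tag_and_push() {
  version="$1"; shift
  for service in "$@"
  do
    image="${DOCKER_REGISTRY}/dockercoins_${service}:${version}"
    docker tag "dockercoins_${service}" "$image" &&
    docker push "$image" || return 1
  done
}
```

For instance, `tag_and_push v1 rng hasher` would push `$DOCKER_REGISTRY/dockercoins_rng:v1` and `$DOCKER_REGISTRY/dockercoins_hasher:v1`. Using a unique tag instead of `latest` avoids the inconsistent version problem.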
|
||
|
||
---
|
||
|
||
## Which registry do we want to use?
|
||
|
||
.small[
|
||
|
||
- **Docker Hub**
|
||
|
||
- hosted by Docker Inc.
|
||
- requires an account (free, no credit card needed)
|
||
- images will be public (unless you pay)
|
||
- located in AWS EC2 us-east-1
|
||
|
||
- **Docker Trusted Registry**
|
||
|
||
- self-hosted commercial product
|
||
- requires a subscription (free 30-day trial available)
|
||
- images can be public or private
|
||
- located wherever you want
|
||
|
||
- **Docker open source registry**
|
||
|
||
- self-hosted barebones repository hosting
|
||
- doesn't require anything
|
||
- doesn't come with anything either
|
||
- located wherever you want
|
||
]
|
||
|
||
---
|
||
|
||
## Using Docker Hub
|
||
|
||
- To tell `build-tag-push.py` to use Docker Hub,
|
||
<br/>set the `DOCKER_REGISTRY` environment variable
|
||
<br/>to your Docker Hub user name
|
||
|
||
- We will also see how to run the open source registry
|
||
<br/>(so use whatever option you want!)
|
||
|
||
.exercise[
|
||
|
||
- Set the following environment variable:
|
||
<br/>`export DOCKER_REGISTRY=jpetazzo`
|
||
|
||
- (Use *your* Docker Hub login, of course!)
|
||
|
||
- Log into the Docker Hub:
|
||
<br/>`docker login`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Using Docker Trusted Registry
|
||
|
||
If we wanted to use DTR, we would:
|
||
|
||
- make sure we have a Docker Hub account
|
||
- [activate a Docker Datacenter subscription](https://hub.docker.com/enterprise/trial/)
|
||
- install DTR on our machines
|
||
- set `DOCKER_REGISTRY` to `dtraddress:port/user`
|
||
|
||
*This is out of the scope of this workshop!*
|
||
|
||
---
|
||
|
||
## Using open source registry
|
||
|
||
- We need to run a `registry:2` container
|
||
<br/>(make sure you specify tag `:2` to run the new version!)
|
||
|
||
- It will store images and layers to the local filesystem
|
||
<br/>(but you can add a config file to use S3, Swift, etc.)
|
||
|
||
- Docker *requires* TLS when communicating with the registry,
|
||
except for registries on `localhost` or when using the Engine
|
||
flag `--insecure-registry`
|
||
|
||
- We will run an ambassador on each node
|
||
of the cluster, redirecting `localhost:5000` to
|
||
the registry
|
||
|
||
---
|
||
|
||
## Deploying our open source registry
|
||
|
||
.exercise[
|
||
|
||
- Start your registry on your Swarm cluster:
|
||
```
|
||
eval $(docker-machine env node1 --swarm)
|
||
docker run -dP --name registry registry:2
|
||
```
|
||
|
||
- Start five ambassadors (one per node):
|
||
```
|
||
for N in $(seq 1 5); do
|
||
docker run -d -p 5000:5000 jpetazzo/hamba \
|
||
5000 $(docker port registry 5000)
|
||
done
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Testing our local registry
|
||
|
||
- We can retag a small image, and push it to the registry
|
||
|
||
.exercise[
|
||
|
||
- Make sure we have the busybox image:
|
||
```
|
||
docker pull busybox
|
||
```
|
||
|
||
- Retag the busybox image:
|
||
```
|
||
docker tag busybox localhost:5000/busybox
|
||
```
|
||
|
||
- Push it:
|
||
```
|
||
docker push localhost:5000/busybox
|
||
```
|
||
|
||
]
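
To double-check that the push worked, we can query the registry over HTTP. The sketch below assumes that the registry implements the `_catalog` and `tags/list` endpoints of the Registry HTTP API v2 (very old 2.0 releases lack the catalog); the function names are made up for the example.

```shell
# List the repositories stored in a registry.
registry_catalog() {
  registry="${1:-localhost:5000}"
  curl -fsS "http://${registry}/v2/_catalog"
}

# List the tags of one repository, e.g. `registry_tags busybox`.
registry_tags() {
  repository="$1"
  registry="${2:-localhost:5000}"
  curl -fsS "http://${registry}/v2/${repository}/tags/list"
}
```

After pushing `localhost:5000/busybox`, calling `registry_tags busybox` should list at least one tag for that repository.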
|
||
|
||
---
|
||
|
||
## Using our local registry
|
||
|
||
- The `build-tag-push.py` script uses the `DOCKER_REGISTRY`
|
||
environment variable
|
||
|
||
.exercise[
|
||
|
||
- Set the `DOCKER_REGISTRY` variable:
|
||
```
|
||
export DOCKER_REGISTRY=localhost:5000
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Build, Tag, And Push
|
||
|
||
Let's inspect the source code of `build-tag-push.py` and run it.
|
||
|
||
.icon[] Make sure to run it against a single node!
|
||
|
||
.icon[] Make sure to use the original Compose file!
|
||
|
||
.small[(We don't want the scaled RNG service, the custom logging driver, etc.)]
|
||
|
||
.exercise[
|
||
|
||
- Run `build-tag-push.py`:
|
||
```
|
||
eval $(docker-machine env node1)
|
||
git checkout docker-compose.yml
|
||
../bin/build-tag-push.py
|
||
```
|
||
|
||
]
|
||
|
||
Inspect the `docker-compose.yml-XXX` file that it created.
|
||
|
||
---
|
||
|
||
## Can we run this now?
|
||
|
||
Let's try!
|
||
|
||
.exercise[
|
||
|
||
- Switch back to the Swarm cluster:
|
||
<br/>`eval $(docker-machine env node1 --swarm)`
|
||
|
||
- Protip - set the `COMPOSE_FILE` variable:
|
||
<br/>`export COMPOSE_FILE=docker-compose.yml-XXX`
|
||
|
||
- Bring up the application:
|
||
<br/>`docker-compose up`
|
||
|
||
]
|
||
|
||
.icon[] This is *still* supposed to fail!
|
||
|
||
--
|
||
|
||
(╯°□°)╯︵ ┻━┻
|
||
|
||
---
|
||
|
||
## Why do we get that weird error message?
|
||
|
||
- Compose and Swarm do not collaborate
|
||
to establish *placement constraints*.
|
||
|
||
---
|
||
|
||
## Simple container dependencies
|
||
|
||
- Container A has a link to container B
|
||
|
||
- Compose starts B first, then A
|
||
|
||
- Swarm translates the link into a placement constraint:
|
||
|
||
- *"put A on the same node as B"*
|
||
|
||
- Alles gut
|
||
|
||
---
|
||
|
||
## Complex container dependencies
|
||
|
||
- Container A has a link to containers B and C
|
||
|
||
- Compose starts B and C first
|
||
<br/>(but that can be on different nodes!)
|
||
|
||
- Compose starts A
|
||
|
||
- Swarm translates the links into placement constraints:
|
||
|
||
- *"put A on the same node as B"*
|
||
- *"put A on the same node as C"*
|
||
|
||
- If B and C are on different nodes, that's impossible
|
||
|
||
So, what do‽
|
||
|
||
---
|
||
|
||
## A word on placement constraints
|
||
|
||
- Swarm supports constraints
|
||
|
||
- We could tell Swarm to put all our containers together
|
||
|
||
- Linking would work
|
||
|
||
- But all containers would end up on the same node
|
||
|
||
--
|
||
|
||
- So having a cluster would be pointless!
|
||
|
||
---
|
||
|
||
## Connecting containers with Swarm (1/2)
|
||
|
||
- Implement service discovery in the application
|
||
|
||
- requires extensive code changes
|
||
|
||
- doesn't require extra services or containers
|
||
|
||
- provides load balancing and failover
|
||
|
||
- Inject service addresses in environment variables
|
||
|
||
- requires minimal code changes
|
||
|
||
- doesn't require extra services or containers
|
||
|
||
- doesn't provide load balancing and failover
|
||
|
||
---
|
||
|
||
## Connecting containers with Swarm (2/2)
|
||
|
||
- Ambassadors
|
||
|
||
- don't require code changes
|
||
|
||
- require additional containers
|
||
|
||
- provide load balancing and failover
|
||
|
||
- Overlay networks
|
||
|
||
- don't require code changes
|
||
|
||
- don't require extra services or containers
|
||
|
||
- don't provide load balancing and failover (yet)
|
||
|
||
---
|
||
|
||
# Connecting containers with ambassadors
|
||
|
||
- We will use one-tier, dynamic ambassadors
|
||
|
||
- Each link to a service will be replaced by an ambassador
|
||
|
||
- Each ambassador will be placed in the network namespace
|
||
of the service using the ambassador
|
||
|
||
- Ambassadors will be dynamically reconfigured when
|
||
linked services are updated
|
||
|
||
---
|
||
|
||
## Revisiting `jpetazzo/hamba`
|
||
|
||
- Configuration is stored in a *volume*
|
||
|
||
- A watcher process looks for configuration updates,
|
||
<br/>and restarts HAProxy when needed
|
||
|
||
- It can be started without configuration:
|
||
|
||
```
|
||
docker run --name amba jpetazzo/hamba run
|
||
```
|
||
|
||
- There is a helper to inject a new configuration:
|
||
|
||
```
|
||
docker run --rm --volumes-from amba jpetazzo/hamba \
|
||
reconfigure 80 backend1 port1 backend2 port2 ...
|
||
```
|
||
|
||
---
|
||
|
||
## Should we use `links` for our ambassadors?
|
||
|
||
Technically, we could use links.
|
||
|
||
- Before starting an app container:
|
||
|
||
start the ambassador(s) it needs
|
||
|
||
- When starting an app container:
|
||
|
||
link it to its ambassador(s)
|
||
|
||
But we wouldn't be able to use `docker-compose scale` anymore.
|
||
|
||
(We would have to scale the ambassadors *first*,
|
||
then add our client containers.)
|
||
|
||
---
|
||
|
||
## Network namespaces and `extra_hosts`
|
||
|
||
This is our plan:
|
||
|
||
- Replace each `link` with an `extra_host`,
|
||
<br/>pointing to the `127.127.X.X` address space
|
||
|
||
- Start app containers normally
|
||
<br/>(`docker-compose up`, `docker-compose scale`)
|
||
|
||
- Start ambassadors after app containers are up:
|
||
|
||
- ambassadors bind to `127.127.X.X`
|
||
|
||
- they share their client's network namespace
|
||
|
||
- Reconfigure ambassadors each time something changes
|
||
|
||
---
|
||
|
||
## Our plan for service discovery
|
||
|
||
- Replace all `links` with static `/etc/hosts` entries
|
||
|
||
- Those entries will map to `127.127.0.X`
|
||
<br/>(with different `X` for each service)
|
||
|
||
- Example: `redis` will point to `127.127.0.2`
|
||
<br/>(instead of a container address)
|
||
|
||
- Start all services; scale them if we want
|
||
<br/>(at this point, they will all fail to connect)
|
||
|
||
- Start ambassadors in the services' namespace;
|
||
<br/>each ambassador will listen on the right `127.127.0.X`
|
||
|
||
- Gather all backend addresses and configure ambassadors
|
||
|
||
.icon[] Services should try to reconnect!
|
||
|
||
---
|
||
|
||
## "Design for failure," they said
|
||
|
||
- When the containers are started, the network is not ready
|
||
|
||
- First connection attempts **will fail**
|
||
|
||
- App should try to reconnect
|
||
|
||
- It is OK to crash and restart
|
||
|
||
- Exponential back-off is nice
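
Here is a minimal sketch of a retry loop with exponential back-off, written in shell so that it can wrap any command. The function name is made up, and the variables are global because POSIX shell has no `local`.

```shell
# Retry a command with exponential back-off.
# Usage: retry_with_backoff MAX_ATTEMPTS COMMAND [ARGS...]
# Waits 1s, 2s, 4s... between attempts; fails after MAX_ATTEMPTS.
retry_with_backoff() {
  max_attempts="$1"; shift
  delay=1
  attempt=1
  until "$@"
  do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

An entrypoint script could use it as `retry_with_backoff 8 redis-cli -h redis ping` before starting the app (this assumes that `redis-cli` is available in the image).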
|
||
|
||
---
|
||
|
||
## Pictures worth 1000 words
|
||
|
||
- In the following diagrams, we are connecting a
|
||
`www` service to a `redis` service through
|
||
an ambassador.
|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||

|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||

|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||

|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||

|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||

|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||

|
||
|
||
---
|
||
|
||
## Our tools
|
||
|
||
- `link-to-ambassadors.py`
|
||
|
||
- replaces all `links` with `extra_hosts` entries
|
||
|
||
- `create-ambassadors.py`
|
||
|
||
- scans running containers
|
||
- allocates `127.127.X.X` addresses
|
||
- starts (unconfigured) ambassadors
|
||
|
||
- `configure-ambassadors.py`
|
||
|
||
- scans running containers
|
||
- gathers backend addresses
|
||
- sends configuration to ambassadors
|
||
|
||
---
|
||
|
||
## Convert links to ambassadors
|
||
|
||
- When we ran `build-tag-push.py` earlier,
|
||
<br/>it generated a new `docker-compose.yml-XXX` file.
|
||
|
||
.exercise[
|
||
|
||
- Run the first script to create a new YAML file:
|
||
<br/>`../bin/link-to-ambassadors.py`
|
||
|
||
]
|
||
|
||
In the Compose file, all links have been replaced
|
||
by `extra_hosts` sections.
|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||
## Current state
|
||
|
||

|
||
|
||
---
|
||
|
||
## Bring up the application
|
||
|
||
The application can now be started and scaled.
|
||
|
||
.exercise[
|
||
|
||
- Start the application:
|
||
<br/>`docker-compose up -d`
|
||
|
||
]
|
||
|
||
Note: you can scale everything as you like, *except Redis*,
|
||
because it is stateful.
|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||
## Current state
|
||
|
||

|
||
|
||
---
|
||
|
||
## Create the ambassadors
|
||
|
||
The `create-ambassadors.py` script has to be executed each time you create new services
|
||
or scale up existing ones.
|
||
|
||
After reading `$COMPOSE_FILE`, it will scan running containers, and compare:
|
||
|
||
- the list of app containers,
|
||
- the list of ambassadors.
|
||
|
||
It will create missing ambassadors.
|
||
|
||
.exercise[
|
||
|
||
- Run the script!
|
||
<br/>`../bin/create-ambassadors.py`
|
||
|
||
]
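
The comparison between the two lists boils down to a set difference. Here is a sketch of that step alone. It is not the code of `create-ambassadors.py`, and the `NAME_amba` naming scheme is a hypothetical convention picked for the example.

```shell
# Print the app containers that have no ambassador yet.
# Both arguments are files with one container name per line.
# Assumption: the ambassador for container NAME is called NAME_amba.
missing_ambassadors() {
  apps_file="$1"
  ambassadors_file="$2"
  while read -r app
  do
    grep -qxF "${app}_amba" "$ambassadors_file" || echo "$app"
  done < "$apps_file"
}
```

Because only the missing ambassadors are reported, running this kind of comparison again after scaling up is harmless: existing ambassadors are left alone.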
|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||
## Current state
|
||
|
||

|
||
|
||
---
|
||
|
||
## Configure the ambassadors
|
||
|
||
All ambassadors are created but they still need configuration.
|
||
|
||
That's the purpose of the last script.
|
||
|
||
It will read `$COMPOSE_FILE` and gather:
|
||
|
||
- the list of app backends,
|
||
- the list of ambassadors.
|
||
|
||
Then it configures all ambassadors with all found backends.
|
||
|
||
.exercise[
|
||
|
||
- Run it!
|
||
<br/>`../bin/configure-ambassadors.py`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||
## Current state
|
||
|
||

|
||
|
||
---
|
||
|
||
## Check what we did
|
||
|
||
.exercise[
|
||
|
||
|
||
- Find out the address of the web UI:
|
||
<br/>`docker-compose ps webui`
|
||
|
||
- Point your browser to it
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Scale
|
||
|
||
- We will now add more containers.
|
||
|
||
.exercise[
|
||
|
||
- Scale worker and rng:
|
||
```
|
||
docker-compose scale worker=5 rng=10
|
||
```
|
||
|
||
]
|
||
|
||
The performance graph stays at the same level.
|
||
|
||
If we look at the logs of the added workers, we will
|
||
see screenfuls of "connection refused" exceptions.
|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||
## Current state
|
||
|
||

|
||
|
||
---
|
||
|
||
## Add ambassadors
|
||
|
||
- The new containers don't have ambassadors at this point.
|
||
|
||
.exercise[
|
||
|
||
- Create the missing ambassadors with the script:
|
||
```
|
||
../bin/create-ambassadors.py
|
||
```
|
||
|
||
]
|
||
|
||
The performance graph stays at the same level.
|
||
|
||
If we look at the logs of the added workers, we will
|
||
now see timeout errors instead of "connection refused."
|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||
## Current state
|
||
|
||

|
||
|
||
---
|
||
|
||
## Configure ambassadors
|
||
|
||
- The last step is to inject the updated configuration.
|
||
|
||
.exercise[
|
||
|
||
- Run the last script one more time:
|
||
```
|
||
../bin/configure-ambassadors.py
|
||
```
|
||
|
||
]
|
||
|
||
Now the performance graph climbs up, and the worker
|
||
logs show normal operation.
|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||
## Current state
|
||
|
||

|
||
|
||
---
|
||
|
||
## Clean up
|
||
|
||
- Before moving on, stop and remove all containers
|
||
|
||
.exercise[
|
||
|
||
- Terminate and remove all containers:
|
||
```
|
||
docker-compose down
|
||
```
|
||
|
||
- Remove ambassadors:
|
||
```
|
||
../bin/delete-ambassadors.sh
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## A few words about those ambassadors
|
||
|
||
- There is "a lot" of added complexity here
|
||
<br/>(5 scripts of almost 50 lines each!)
|
||
|
||
- Snark aside, these scripts tap into the following concepts:
|
||
|
||
- network namespaces
|
||
- dynamic load balancer reconfiguration
|
||
- sidekick containers that are *mandatory*
|
||
- ... and have to be managed manually
|
||
|
||
- We are going to see an easier way to manage this!
|
||
|
||
---
|
||
|
||
# Setting up Consul and overlay networks
|
||
|
||
- We will reconfigure our Swarm cluster to enable overlays
|
||
|
||
- We will deploy a Consul cluster
|
||
|
||
- We will connect containers running on different machines
|
||
|
||
---
|
||
|
||
## First, let's Clean All The Things!
|
||
|
||
- We need to remove the old containers
|
||
<br/>(in particular the `swarm` agents and managers)
|
||
|
||
.exercise[
|
||
|
||
- The following snippet will nuke all containers on all hosts:
|
||
|
||
```
|
||
for N in 1 2 3 4 5
|
||
do
|
||
ssh node$N "docker ps -qa | xargs -r docker rm -f"
|
||
done
|
||
```
|
||
|
||
(If it asks you to confirm SSH keys, just do it!)
|
||
|
||
]
|
||
|
||
Note: our Swarm cluster is now broken.
|
||
|
||
---
|
||
|
||
## Remove old Machine information
|
||
|
||
- We will use `docker-machine rm`
|
||
|
||
- With the `generic` driver, this doesn't do anything
|
||
<br/>(it just deletes local configuration)
|
||
|
||
- With cloud/VM drivers, this would actually delete VMs
|
||
|
||
.exercise[
|
||
|
||
- Remove our nodes from Docker Machine config database:
|
||
|
||
```
|
||
for N in 1 2 3 4 5
|
||
do
|
||
docker-machine rm -f node$N
|
||
done
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Add extra options to our Engines
|
||
|
||
- We need two new options for our engines:
|
||
|
||
- `cluster-store` (to indicate which key/value store to use)
|
||
|
||
- `cluster-advertise` (to indicate which IP address to register)
|
||
|
||
- `cluster-store` will be `consul://localhost:8500`
|
||
<br/>(we will run one Consul node on each machine)
|
||
|
||
- `cluster-advertise` will be `eth0:2376`
|
||
<br/>(Engine will automatically pick up eth0's IP address)
|
||
|
||
---
|
||
|
||
## Reconfiguring Swarm clusters, the Docker way
|
||
|
||
- The traditional way to reconfigure a service is to edit
|
||
its configuration (or init script), then restart
|
||
|
||
- We can use Machine to make that easier
|
||
|
||
- Re-deploying with Machine's `generic` driver will reconfigure
|
||
Engines with the new parameters
|
||
|
||
.exercise[
|
||
|
||
- Re-provision the manager node:
|
||
|
||
.small[
|
||
```
|
||
docker-machine create --driver generic \
|
||
--engine-opt cluster-store=consul://localhost:8500 \
|
||
--engine-opt cluster-advertise=eth0:2376 \
|
||
--swarm --swarm-master --swarm-discovery consul://localhost:8500 \
|
||
--generic-ssh-user docker --generic-ip-address XX.XX.XX.XX node1
|
||
```
|
||
]
|
||
]
|
||
|
||
---
|
||
|
||
## Reconfigure the other nodes
|
||
|
||
- Once again, scripting to the rescue!
|
||
|
||
.exercise[
|
||
|
||
```
|
||
grep 'node[2345]' /etc/hosts | grep -v '^127' |
|
||
while read IPADDR NODENAME
|
||
do docker-machine create --driver generic \
|
||
--engine-opt cluster-store=consul://localhost:8500 \
|
||
--engine-opt cluster-advertise=eth0:2376 \
|
||
--swarm --swarm-discovery consul://localhost:8500 \
|
||
--generic-ssh-user docker \
|
||
--generic-ip-address $IPADDR $NODENAME
|
||
done
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Checking what we did
|
||
|
||
.exercise[
|
||
|
||
- Directly point the CLI to a node and check configuration:
|
||
|
||
```
|
||
eval $(docker-machine env node1)
|
||
docker info
|
||
```
|
||
|
||
(should show `Cluster store` and `Cluster advertise`)
|
||
|
||
- Try to talk to the Swarm cluster:
|
||
|
||
```
|
||
eval $(docker-machine env node1 --swarm)
|
||
docker info
|
||
```
|
||
|
||
(should show zero nodes)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Why zero nodes?
|
||
|
||
- We haven't started Consul yet
|
||
|
||
- Swarm discovery is not operational
|
||
|
||
- Swarm can't discover the nodes
|
||
|
||
Note: good guy ~~Stevedore~~ Docker will start without K/V
|
||
|
||
(This lets us run Consul itself in a container!)
|
||
|
||
---
|
||
|
||
## Adding Consul
|
||
|
||
- We will run Consul in containers
|
||
|
||
- We will use a
|
||
[custom consul image](https://hub.docker.com/r/jpetazzo/consul/)
|
||
|
||
- We will tell Docker to automatically restart it on reboots
|
||
|
||
- To simplify network setup, we will use `host` networking
|
||
|
||
---
|
||
|
||
## Starting the first Consul node
|
||
|
||
.exercise[
|
||
|
||
- Make sure you're logged into `node1`,
|
||
with a clean environment:
|
||
|
||
```
|
||
unset DOCKER_HOST
|
||
```
|
||
|
||
- The first node must be started with the `-bootstrap` flag:
|
||
|
||
```
|
||
CID=$(docker run --name consul_node1 \
|
||
-d --restart=always --net host \
|
||
jpetazzo/consul agent -server -bootstrap)
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Starting the other Consul nodes
|
||
|
||
- Other nodes have to be started with the `-join A.B.C.D`
|
||
option, where A.B.C.D is the address of an existing node
|
||
|
||
.exercise[
|
||
|
||
- Find the internal IP address of our first node:
|
||
```
|
||
IPADDR=$(ip a ls dev eth0 |
|
||
sed -n 's,.*inet \(.*\)/.*,\1,p')
|
||
```
|
||
|
||
- Start the other nodes:
|
||
```
|
||
for N in 2 3 4 5; do
|
||
ssh node$N docker run --name consul_node$N \
|
||
-d --restart=always --net host \
|
||
jpetazzo/consul agent -server -join $IPADDR
|
||
done
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Check that our Consul cluster is up
|
||
|
||
- With your browser, navigate to any instance on port 8500
|
||
<br/>(in "NODES" you should see the five nodes)
|
||
|
||
- Let's run a couple of useful Consul commands
|
||
|
||
.exercise[
|
||
|
||
- Ask Consul for the list of members it knows about:
|
||
```
|
||
docker run --net host --rm jpetazzo/consul members
|
||
```
|
||
|
||
- Ask Consul which node is the current leader:
|
||
```
|
||
curl localhost:8500/v1/status/leader
|
||
```
|
||
|
||
]
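
If you want to script that check, here is a small sketch built on the same `/v1/status/leader` endpoint. It assumes that Consul answers with an empty JSON string (`""`) while no leader is elected; the function name is made up for the example.

```shell
# Succeed if the Consul agent at $1 (default: localhost:8500)
# reports an elected leader.
# Assumption: the endpoint returns "" while there is no leader.
consul_has_leader() {
  consul_addr="${1:-localhost:8500}"
  leader=$(curl -fsS "http://${consul_addr}/v1/status/leader") || return 1
  [ -n "$leader" ] && [ "$leader" != '""' ]
}
```

This can be used in a loop, e.g. `until consul_has_leader; do sleep 1; done`, to wait for the cluster to be ready before using it.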
|
||
|
||
---
|
||
|
||
## Check that our Swarm cluster is up
|
||
|
||
.exercise[
|
||
|
||
- Try the `docker info` command from earlier again:
|
||
|
||
```
|
||
eval $(docker-machine env --swarm node1)
|
||
docker info
|
||
```
|
||
|
||
- Now all nodes should be visible
|
||
<br/>(Give them a minute or two to register)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
# Multi-host networking
|
||
|
||
- Docker 1.9 has the concept of *networks*
|
||
|
||
- By default, containers are on the default "bridge" network
|
||
|
||
- You can create additional networks
|
||
|
||
- Containers can be on multiple networks
|
||
|
||
- Containers can dynamically join/leave networks
|
||
|
||
- The "overlay" driver lets networks span multiple hosts
|
||
|
||
- Let's see that in action!
|
||
|
||
---
|
||
|
||
## Create a few networks and containers
|
||
|
||
.exercise[
|
||
|
||
- Create two networks, *blue* and *green*:
|
||
```
|
||
docker network create --driver overlay blue
|
||
docker network create --driver overlay green
|
||
docker network ls
|
||
```
|
||
|
||
- Create containers with names of blue and green
|
||
things, on their respective networks:
|
||
```
|
||
docker run -d --name sky --net blue -m 3G redis
|
||
docker run -d --name navy --net blue -m 3G redis
|
||
docker run -d --name grass --net green -m 3G redis
|
||
docker run -d --name forest --net green -m 3G redis
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Check connectivity within networks
|
||
|
||
.exercise[
|
||
|
||
- Check that our containers are on different nodes:
|
||
|
||
```
|
||
docker ps
|
||
```
|
||
|
||
- This will work:
|
||
|
||
```
|
||
docker exec -ti sky ping navy
|
||
```
|
||
|
||
- This will not:
|
||
|
||
```
|
||
docker exec -ti navy ping grass
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Containers connected to multiple networks
|
||
|
||
- Some colors are neither *quite* blue *nor* green
|
||
|
||
.exercise[
|
||
|
||
- Create a container that we want to be on both networks:
|
||
```
|
||
docker run -d --net blue --name turquoise nginx
|
||
```
|
||
|
||
- Check connectivity:
|
||
```
|
||
docker exec -ti turquoise ping -c1 navy
|
||
docker exec -ti turquoise ping -c1 grass
|
||
```
|
||
(First works; second doesn't)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Dynamically connecting containers
|
||
|
||
- This is achieved with the command:
|
||
<br/>`docker network connect NETNAME CONTAINER`
|
||
|
||
.exercise[
|
||
|
||
- Dynamically connect to the green network:
|
||
```
|
||
docker network connect green turquoise
|
||
```
|
||
|
||
- Check connectivity:
|
||
```
|
||
docker exec -ti turquoise ping -c1 navy
|
||
docker exec -ti turquoise ping -c1 grass
|
||
```
|
||
(Both commands work now)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Under the hood
|
||
|
||
- Each network has an interface in the container
|
||
|
||
- There is also an interface for the default gateway
|
||
|
||
.exercise[
|
||
|
||
- View interfaces in our `turquoise` container:
|
||
```
|
||
docker exec -ti turquoise ip addr ls
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Dynamically disconnecting containers
|
||
|
||
- There is a mirror command to `docker network connect`
|
||
|
||
.exercise[
|
||
|
||
- Disconnect the *turquoise* container from *blue*
|
||
(its original network):
|
||
```
|
||
docker network disconnect blue turquoise
|
||
docker network connect green turquoise
|
||
```
|
||
|
||
- Check connectivity:
|
||
```
|
||
docker exec -ti turquoise ping -c1 navy
|
||
docker exec -ti turquoise ping -c1 grass
|
||
```
|
||
(First command fails, second one works)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Cleaning up
|
||
|
||
.exercise[
|
||
|
||
- Destroy containers:
|
||
|
||
```
|
||
docker rm -f sky navy grass forest turquoise
|
||
```
|
||
|
||
- Destroy networks:
|
||
|
||
```
|
||
docker network rm blue
|
||
docker network rm green
|
||
```
|
||
|
||
]
|
||
|
||
You cannot remove a network if
|
||
it still has containers.
|
||
|
||
There is no `"rm -f"` for network.
|
||
<br/>
|
||
There is a `"disconnect -f"` if needed.
|
||
|
||
---
|
||
|
||
# Using overlay networks with Compose
|
||
|
||
- Compose 1.5 had the `--x-networking` flag
|
||
<br/>(enabling experimental support for overlay networks)
|
||
|
||
- Compose 1.6 has a new Compose file format
|
||
<br/>(using the new format enables overlay networks support)
|
||
|
||
- Compose will remain backward compatible with old files
|
||
|
||
- Converting to new files is (ridiculously) easy
|
||
|
||
---
|
||
|
||
## Our first "Compose v2" app
|
||
|
||
- To deploy DockerCoins, we still need a local registry
|
||
|
||
- Let's deploy a local registry using a Compose File v2!
|
||
|
||
.exercise[
|
||
|
||
- Go to the `registry` directory in the repository:
|
||
```
|
||
cd ~/orchestration-workshop/registry
|
||
```
|
||
|
||
]
|
||
|
||
Let's examine the `docker-compose.yml` file.
|
||
|
||
---
|
||
|
||
## Our first Compose v2 file
|
||
|
||
```
|
||
version: "2"

services:
  backend:
    image: registry:2
  frontend:
    image: jpetazzo/hamba
    command: 5000 backend:5000
    ports:
      - "127.0.0.1:5000:5000"
    depends_on:
      - backend
|
||
```
|
||
|
||
- *Backend* is the actual registry.
|
||
- *Frontend* is the ambassador that we deployed earlier.
|
||
<br/>
|
||
It communicates with *backend* using an internal network
|
||
and network aliases.
|
||
|
||
---
|
||
|
||
## Starting a local registry with Compose
|
||
|
||
- We will bring up the registry
|
||
|
||
- Then we will ensure that one *frontend* is running
|
||
on each node by scaling it to our number of nodes
|
||
|
||
.exercise[
|
||
|
||
- Make sure that `COMPOSE_FILE` is not set:
|
||
```
|
||
unset COMPOSE_FILE
|
||
```
|
||
|
||
- Start the registry:
|
||
```
|
||
docker-compose up -d
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## "Scaling" the local registry
|
||
|
||
- This is a particular kind of scaling
|
||
|
||
- We just want to ensure that one *frontend*
|
||
is running on every single node of the cluster
|
||
|
||
.exercise[
|
||
|
||
- Scale the registry:
|
||
```
|
||
N=1
|
||
while docker-compose scale frontend=$N; do
|
||
N=$((N+1))
|
||
done
|
||
```
|
||
|
||
]
|
||
|
||
Note: Swarm might do that automatically for us in the future.
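
Why does the loop stop? Our reading (an assumption, not something Compose
documents for us here) is that the fixed `127.0.0.1:5000` port mapping can
only be bound once per node, so the first `scale` beyond the number of
nodes fails. The sketch below replaces the real command with a stand-in
function `fake_scale` to show the control flow.

```shell
# Stand-in for `docker-compose scale frontend=$1`: it succeeds while there is
# still a node without a frontend, and fails afterwards. NODES is an assumed
# cluster size; the real command talks to Swarm instead.
NODES=5
fake_scale() {
  [ "$1" -le "$NODES" ]
}

N=1
while fake_scale "$N"; do
  N=$((N+1))
done

# The last successful scale was N-1, i.e. one frontend per node.
echo "frontends running: $((N-1))"
```

Note that when the loop exits, `N` is one more than the number of
frontends actually running.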
|
||
|
||
---
|
||
|
||
## Converting the Compose file for DockerCoins
|
||
|
||
- Services are no longer at the top level,
|
||
<br/>but under a `services` section
|
||
|
||
- There has to be a `version` key at the top level,
|
||
<br/>with value `"2"` (as a string, not an integer)
|
||
|
||
- Links should be removed
|
||
|
||
- Fixed port mappings should be removed
|
||
<br/>(until [docker/compose#2866](
|
||
https://github.com/docker/compose/issues/2866) is fixed)
|
||
|
||
- There are other minor differences, but for our sample
|
||
app, that's all we have to worry about!
|
||
|
||
---
|
||
|
||
## Our new Compose file
|
||
|
||
.small[
|
||
```
|
||
version: '2'

services:
  rng:
    build: rng
    ports:
      - 80

  hasher:
    build: hasher
    ports:
      - 80

  webui:
    build: webui
    ports:
      - 80

  redis:
    image: redis

  worker:
    build: worker
|
||
```
|
||
]
|
||
|
||
Copy-paste this into `docker-compose.yml`
|
||
<br/>(or you can `cp docker-compose.yml-v2 docker-compose.yml`)
|
||
|
||
---
|
||
|
||
## Use images, not builds
|
||
|
||
- If we try to start the app like that, containers will only
|
||
run on nodes which have the images
|
||
|
||
- Like before: we need to replace `build` with `image`
|
||
|
||
- We can re-use the `build-tag-push.py` script for that
|
||
|
||
.exercise[
|
||
|
||
- Set `DOCKER_REGISTRY` to use our local registry,
|
||
<br/>then build, tag, and push the application:
|
||
```
|
||
export DOCKER_REGISTRY=localhost:5000
|
||
../bin/build-tag-push.py
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Run the application
|
||
|
||
- At this point, our app is ready to run
|
||
|
||
- We don't need ambassadors or extra containers
|
||
|
||
.exercise[
|
||
|
||
- Start the application:
|
||
```
|
||
export COMPOSE_FILE=docker-compose.yml-XXXX
|
||
docker-compose up -d
|
||
```
|
||
|
||
- Observe that it's running on multiple nodes:
|
||
```
|
||
docker ps
|
||
```
|
||
|
||
]
|
||
|
||
Each container name is prefixed with the node it's running on.
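
If a script needs the node and the container name separately, shell
parameter expansion can split the prefixed name. A small sketch, using an
invented name in the `node/container` shape that Swarm displays:

```shell
# Invented sample of a name as shown by `docker ps` through Swarm.
FULLNAME="node3/dockercoins_worker_1"

NODE=${FULLNAME%%/*}        # text before the first slash
CONTAINER=${FULLNAME#*/}    # text after the first slash

echo "$CONTAINER runs on $NODE"
```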
|
||
|
||
---
|
||
|
||
## View the performance graph
|
||
|
||
- Load up the graph in the browser
|
||
|
||
.exercise[
|
||
|
||
- Check the `webui` service address and port:
|
||
```
|
||
docker-compose port webui 80
|
||
```
|
||
|
||
- Open it in your browser
|
||
|
||
]
|
||
|
||
---
|
||
|
||
# Load balancing with overlay networks
|
||
|
||
- Scaling the `worker` service works out of the box
|
||
(like before)
|
||
|
||
.exercise[
|
||
|
||
- Scale `worker`:
|
||
```
|
||
docker-compose scale worker=10
|
||
```
|
||
|
||
]
|
||
|
||
We will hit the bottleneck caused by the `rng` service.
|
||
|
||
How can we scale that service?
|
||
|
||
---
|
||
|
||
## The manual method
|
||
|
||
- Replace `rng` with:

  - multiple copies `rng1`, `rng2`, `rng3`, ...

  - a load balancer taking over the name `rng`,
    <br/>and spreading traffic across all instances
|
||
|
||
- You should have a sense of *déjà vu*
|
||
|
||
- We did that in the beginning of the workshop
|
||
|
||
- Can we do better?
|
||
|
||
---
|
||
|
||
## The scripted method
|
||
|
||
- We could write a script to automate those steps
|
||
|
||
--
|
||
|
||
- *Can we do better?*
|
||
|
||
--
|
||
|
||
- In a perfect world, we would like to do:
|
||
```
|
||
docker-compose scale rng=10
|
||
```
|
||
|
||
---
|
||
|
||
## Naming problem
|
||
|
||
- Service is called `rng`
|
||
|
||
- It therefore takes the network name `rng`
|
||
|
||
- Worker code connects to `rng`
|
||
|
||
- So `rng` should point to the load balancer
|
||
|
||
- What do‽
|
||
|
||
---
|
||
|
||
## Naming is *per-network*
|
||
|
||
- Solution: put `rng` on its own network
|
||
|
||
- That way, it doesn't take the network name `rng`
|
||
<br/>(at least not on the default network)
|
||
|
||
- Have the load balancer sit on both networks
|
||
|
||
- Add the name `rng` to the load balancer
|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||
## Original DockerCoins
|
||
|
||

|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||
## Load-balanced DockerCoins
|
||
|
||

|
||
|
||
---
|
||
|
||
## Declaring networks
|
||
|
||
- Networks (other than the default one)
|
||
*must* be declared
|
||
in a top-level `networks` section
|
||
|
||
.exercise[
|
||
|
||
- Add the `rng` network to the Compose file:
|
||
```
|
||
version: '2'

networks:
  rng:

services:
  rng:
    image: ...
    ...
|
||
```
|
||
|
||
]
|
||
|
||
That section can be placed anywhere in the file.
|
||
|
||
---
|
||
|
||
## Putting the `rng` service in its network
|
||
|
||
- Services can have a `networks` section
|
||
|
||
- If they don't: they are placed in the default network
|
||
|
||
- If they do: they are placed only in the mentioned networks
|
||
|
||
.exercise[
|
||
|
||
- Change the `rng` service to put it in its network:
|
||
```
|
||
  rng:
    image: localhost:5000/dockercoins_rng:…
    networks:
      rng:
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Adding the load balancer
|
||
|
||
- The load balancer has to be in both networks:
|
||
<br/>`rng` and `default`
|
||
|
||
- In the `default` network, it must have the `rng` alias
|
||
|
||
- We will use the `jpetazzo/hamba` image
|
||
|
||
.exercise[
|
||
|
||
- Add the `rng-lb` service to the Compose file:
|
||
```
|
||
  rng-lb:
    image: jpetazzo/hamba
    command: run
    networks:
      rng:
      default:
        aliases: [ rng ]
|
||
```
|
||
]
|
||
|
||
---
|
||
|
||
## Load balancer initial configuration
|
||
|
||
- We specified `run` as the initial command
|
||
|
||
- This tells `hamba` to wait for an initial configuration
|
||
|
||
- The load balancer will not be operational
|
||
<br/>(until we feed it its configuration)
|
||
|
||
---
|
||
|
||
## Start the application
|
||
|
||
.exercise[
|
||
|
||
- Bring up DockerCoins:
|
||
```
|
||
docker-compose up -d
|
||
```
|
||
|
||
- See that `worker` is complaining:
|
||
```
|
||
docker-compose logs worker
|
||
```
|
||
]
|
||
|
||
---
|
||
|
||
## Configure the load balancer
|
||
|
||
- Multiple solutions:

  - look up the IP address of the `rng` backend
  - use the backend's network name
  - use the backend's container name (easiest!)
|
||
|
||
.exercise[
|
||
|
||
- Configure the load balancer:
|
||
```
|
||
docker run --rm \
|
||
--volumes-from dockercoins_rng-lb_1 \
|
||
--net container:dockercoins_rng-lb_1 \
|
||
jpetazzo/hamba reconfigure 80 dockercoins_rng_1 80
|
||
```
|
||
|
||
]
|
||
|
||
The application should now be working correctly.
|
||
|
||
---
|
||
|
||
## Scale the application
|
||
|
||
- Use `docker-compose scale` as planned
|
||
|
||
.exercise[
|
||
|
||
- Scale `rng`:
|
||
```
|
||
docker-compose scale rng=10
|
||
```
|
||
|
||
]
|
||
|
||
Of course, the graph doesn't change *yet*.
|
||
|
||
We need to add the new backends to the load balancer
|
||
configuration first.
|
||
|
||
---
|
||
|
||
## Reconfigure the load balancer
|
||
|
||
- The command is similar to the one before
|
||
|
||
- We need to pass the list of all backends
|
||
|
||
.exercise[
|
||
|
||
- Reconfigure the load balancer:
|
||
```
|
||
docker run --rm \
|
||
--volumes-from dockercoins_rng-lb_1 \
|
||
--net container:dockercoins_rng-lb_1 \
|
||
jpetazzo/hamba reconfigure 80 \
|
||
$(for N in $(seq 1 10); do
|
||
echo dockercoins_rng_$N:80
|
||
done)
|
||
```
|
||
|
||
]
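
The backend list is plain text generated by a loop, so we can check it
before handing it to the load balancer. A minimal sketch, with a
hypothetical helper `rng_backends` that takes the number of instances:

```shell
# Hypothetical helper: print one "name:port" backend per rng container.
# The dockercoins_rng_N naming and port 80 come from the exercise above.
rng_backends() {
  for N in $(seq 1 "$1"); do
    echo "dockercoins_rng_$N:80"
  done
}

# With three instances; unquoted on purpose so that each backend becomes
# a separate argument when passed to a command.
echo $(rng_backends 3)
```

Keeping the count as a parameter means the same helper can be reused
after each `docker-compose scale`.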
|
||
|
||
---
|
||
|
||
## Automating the process
|
||
|
||
- Nobody loves artisanal YAML handicraft
|
||
|
||
- This can be automated very easily
|
||
|
||
- To make things easier, we can use a label:
|
||
|
||
*each container behind a load balancer will
|
||
have a `loadbalancer` label giving the name
|
||
of that loadbalancer*
|
||
|
||
- This is implemented by two scripts:

  - add-load-balancer-v2.py

  - reconfigure-load-balancers.py
|
||
|
||
---
|
||
|
||
# Going further
|
||
|
||
Deploying a new version (difficulty: easy)
|
||
|
||
- Just re-run all the steps!
|
||
|
||
- However, Compose will re-create the containers
|
||
|
||
- You will have to re-create ambassadors
|
||
<br/>(and configure them)
|
||
|
||
- You will have to clean up old ambassadors
|
||
<br/>(left as an exercise for the reader)
|
||
|
||
- You will experience a little bit of downtime
|
||
|
||
---
|
||
|
||
## Going further
|
||
|
||
Zero-downtime deployment (difficulty: medium)
|
||
|
||
- Isolate stateful services
|
||
<br/>(like we did earlier for Redis)
|
||
|
||
- Do blue/green deployment:

  - deploy and scale version N

  - point a "top-level" load balancer to the app

  - deploy and scale version N+1

  - put both apps in the "top-level" balancer

  - slowly switch traffic over to app version N+1
|
||
|
||
---
|
||
|
||
## Going further
|
||
|
||
Harder projects:
|
||
|
||
- Two-tier or three-tier ambassador deployments
|
||
|
||
- Deploy to Mesos or Kubernetes
|
||
|
||
---
|
||
|
||
## Cleaning up
|
||
|
||
.exercise[
|
||
|
||
- Terminate containers and remove them:
|
||
|
||
```
|
||
docker-compose down
|
||
```
|
||
|
||
]
|
||
|
||
Note: `docker-compose down` also deletes the
|
||
networks that had been created for the application.
|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||

|
||
|
||
---
|
||
|
||
# Here be dragons
|
||
|
||
- So far, we've used stable features
|
||
|
||
- We're going to explore experimental code
|
||
|
||
- **Use at your own risk**
|
||
|
||
---
|
||
|
||
# Distributing Machine credentials
|
||
|
||
- All the credentials (TLS keys and certs) are on node1
|
||
<br/>(the node on which we ran `docker-machine create`)
|
||
|
||
- If we lose node1, we're toast
|
||
|
||
- We need to move (or copy) the credentials somewhere safe
|
||
|
||
- Credentials are regular files, and relatively small
|
||
|
||
- Ah, if only we had a highly available, hierarchic store ...
|
||
|
||
--
|
||
|
||
- Wait a minute, we have one!
|
||
|
||
--
|
||
|
||
(That's Consul, if you were wondering)
|
||
|
||
---
|
||
|
||
## Storing files in Consul
|
||
|
||
- We will use [Benjamin Wester's consulfs](
|
||
https://github.com/bwester/consulfs)
|
||
|
||
- It mounts a Consul key/value store as a local filesystem
|
||
|
||
- Performance will be horrible
|
||
<br/>(don't run a database on top of that!)
|
||
|
||
- But to store files of a few KB, nobody will notice
|
||
|
||
- We will copy/link/sync... `~/.docker/machine` to Consul
|
||
|
||
---
|
||
|
||
## Installing consulfs
|
||
|
||
- Option 1: install Go, git clone, go build ...
|
||
|
||
- Option 2: be lazy and use [jpetazzo/consulfs](
|
||
https://hub.docker.com/r/jpetazzo/consulfs/)
|
||
|
||
.exercise[
|
||
|
||
- Be lazy and use the Docker image:
|
||
```
|
||
sudo docker run --rm -v /usr/local/bin:/target jpetazzo/consulfs
|
||
```
|
||
]
|
||
|
||
Note: the `jpetazzo/consulfs` image contains the
|
||
`consulfs` binary.
|
||
It copies it to `/target` (if `/target` is a volume).
|
||
|
||
We need `consulfs` locally (not in a container) because
|
||
we can't propagate a FUSE mount from a container to
|
||
the host (yet).
|
||
|
||
---
|
||
|
||
## Running consulfs
|
||
|
||
- The `consulfs` binary takes two arguments:

  - the Consul server address
  - a mount point (that has to be created first)
|
||
|
||
.exercise[
|
||
|
||
- Create a mount point:
|
||
```
|
||
mkdir ~/consul
|
||
```
|
||
|
||
- Mount Consul as a local filesystem:
|
||
```
|
||
consulfs localhost:8500 ~/consul
|
||
```
|
||
|
||
]
|
||
|
||
Leave this running in the foreground.
|
||
|
||
---
|
||
|
||
## Copying our credentials to Consul
|
||
|
||
- Use standard UNIX commands
|
||
|
||
- Don't try to preserve permissions, though
|
||
<br/>(`consulfs` doesn't store those)
|
||
|
||
.exercise[
|
||
|
||
- Check that Consul key/values are visible:
|
||
```
|
||
ls -l ~/consul/
|
||
```
|
||
|
||
- Copy Machine credentials into Consul:
|
||
```
|
||
cp -r ~/.docker/machine/. ~/consul/machine/
|
||
```
|
||
|
||
]
|
||
|
||
(This command can be re-executed to update the copy.)
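
The trailing `/.` on the source is what makes the command safe to repeat:
it copies the *contents* of the directory rather than the directory itself.
The sketch below shows this with throwaway directories (all paths are
stand-ins; nothing here touches Consul or your real credentials).

```shell
# Throwaway directories standing in for ~/.docker/machine and ~/consul/machine.
# The destination is created up front to keep the demonstration simple.
tmp=$(mktemp -d)
mkdir -p "$tmp/src/certs" "$tmp/dst"
echo "demo" > "$tmp/src/certs/ca.pem"

# Copy the *contents* of src into dst, twice, as if updating the copy.
cp -r "$tmp/src/." "$tmp/dst/"
cp -r "$tmp/src/." "$tmp/dst/"

# The tree is mirrored once; no nested dst/src directory appears.
find "$tmp/dst" -type f
```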
|
||
|
||
---
|
||
|
||
## Mount Consul on another node
|
||
|
||
- We will repeat the previous steps to mount `~/consul`
|
||
|
||
.exercise[
|
||
|
||
- Connect to node2:
|
||
```
|
||
ssh node2
|
||
```
|
||
|
||
- Install `consulfs` and mount Consul:
|
||
```
|
||
sudo docker run --rm -v /usr/local/bin:/target jpetazzo/consulfs
|
||
mkdir ~/consul
|
||
consulfs localhost:8500 ~/consul
|
||
```
|
||
|
||
]
|
||
|
||
At this point, `ls -l ~/consul` should show `docker` and
|
||
`machine` directories.
|
||
|
||
---
|
||
|
||
## Access the credentials from the other node
|
||
|
||
- We will create a symlink
|
||
|
||
- We could also copy the credentials
|
||
|
||
.exercise[
|
||
|
||
- Create the symlink:
|
||
```
|
||
mkdir -p ~/.docker/
|
||
ln -s ~/consul/machine ~/.docker/
|
||
```
|
||
|
||
- Check that all nodes are visible:
|
||
```
|
||
docker-machine ls
|
||
```
|
||
|
||
]
|
||
|
||
.icon[] Go back to node1 after this.
|
||
|
||
---
|
||
|
||
## A few words on this strategy
|
||
|
||
- Anyone accessing Consul can control your Docker cluster
|
||
<br/>(to be fair: anyone accessing Consul can wreak
serious havoc on your cluster anyway)
|
||
|
||
- ConsulFS doesn't support *all* POSIX operations,
|
||
<br/>so a few things (like `mv`) will not work
|
||
|
||
- As a consequence, with Machine 0.6, you cannot
|
||
run `docker-machine create` directly on top of ConsulFS
|
||
|
||
---
|
||
|
||
## What if Consul becomes unavailable?
|
||
|
||
- If Consul becomes unavailable (e.g. loses quorum),
|
||
<br/>you won't be able to access your credentials
|
||
|
||
- If Consul becomes unavailable ...
|
||
<br/>your cluster will be in a bad state anyway
|
||
|
||
- You can still access each Docker Engine over the
|
||
local UNIX socket (and repair Consul that way)
|
||
|
||
|
||
---
|
||
|
||
# Highly available Swarm managers
|
||
|
||
- Until now, the Swarm manager was a SPOF
|
||
<br/>(Single Point Of Failure)
|
||
|
||
- Swarm has experimental support for replication
|
||
|
||
- When replication is enabled, you deploy multiple (identical) managers

  - one will be "primary"
  - the other(s) will be "secondary"
  - this is determined automatically
    <br/>(through *leader election*)
|
||
|
||
---
|
||
|
||
## Swarm leader election
|
||
|
||
- The leader election mechanism relies on a key/value store
|
||
<br/>(consul, etcd, zookeeper)
|
||
|
||
- There is no requirement on the number of replicas
|
||
<br/>(the quorum is achieved through the key/value store)
|
||
|
||
- When the leader (or "primary") is unavailable,
|
||
<br/>a new election happens automatically
|
||
|
||
- You can issue API requests to any manager:
|
||
<br/>if you talk to a secondary, it forwards to the primary
|
||
|
||
.icon[] There is currently a bug when
|
||
the Consul cluster itself has a leader election; see [docker/swarm#1782](
|
||
https://github.com/docker/swarm/issues/1782).
|
||
|
||
---
|
||
|
||
## Swarm replication in practice
|
||
|
||
- We need to give two extra flags to the Swarm manager:

  - `--replication`

    *enables replication (duh!)*

  - `--advertise ip.ad.dr.ess:port`

    *address and port where this Swarm manager is reachable*
|
||
|
||
- Do you deploy with Docker Machine?
|
||
<br/>Then you can use `--swarm-opt`
|
||
to automatically pass flags to the Swarm manager
|
||
|
||
---
|
||
|
||
## Cleaning up our current Swarm containers
|
||
|
||
- We will use Docker Machine to re-provision Swarm
|
||
|
||
- We need to:

  - remove the nodes from the Machine registry

  - remove the Swarm containers
|
||
|
||
.exercise[
|
||
|
||
- Remove the current configuration:
|
||
```
|
||
for N in 1 2 3 4 5; do
|
||
ssh node$N docker rm -f swarm-agent swarm-agent-master
|
||
docker-machine rm -f node$N
|
||
done
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Re-deploy with the new configuration
|
||
|
||
- This time, we can deploy each node identically
|
||
<br/>(instead of 1 manager + 4 non-managers)
|
||
|
||
.exercise[
|
||
|
||
- Deploy all five nodes with the previous options,
|
||
and the new replication options:
|
||
|
||
.small[
|
||
```
|
||
grep 'node[12345]' /etc/hosts | grep -v ^127 |
while read IPADDR NODENAME; do
  docker-machine create --driver generic \
    --engine-opt cluster-store=consul://localhost:8500 \
    --engine-opt cluster-advertise=eth0:2376 \
    --swarm --swarm-master \
    --swarm-discovery consul://localhost:8500 \
    --swarm-opt replication --swarm-opt advertise=$IPADDR:3376 \
    --generic-ssh-user docker --generic-ip-address $IPADDR $NODENAME
done
|
||
```
|
||
]
|
||
|
||
]
|
||
|
||
.small[
|
||
Note: Consul is still running thanks to the `--restart=always` policy.
|
||
Other containers are now stopped, because the engines have been
|
||
reconfigured and restarted.
|
||
]
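
The loop gets its node list from `/etc/hosts`: it keeps the `node1` to
`node5` entries, drops loopback lines, and lets `read` split each line into
an address and a name. Here is a sketch of that parsing step alone, fed
with invented hosts data (the helper name `list_nodes` is hypothetical).

```shell
# Hypothetical helper: same filter as the exercise, reading hosts data on stdin.
list_nodes() {
  grep 'node[12345]' | grep -v '^127' |
  while read -r IPADDR NODENAME; do
    echo "$NODENAME=$IPADDR"
  done
}

# Invented /etc/hosts content for illustration.
sample_hosts='127.0.0.1 localhost
127.0.1.1 node1
10.0.0.1 node1
10.0.0.2 node2'

RESULT=$(printf '%s\n' "$sample_hosts" | list_nodes)
echo "$RESULT"
```

Running this against your own `/etc/hosts` first is a cheap way to check
which machines the real loop is about to (re)create.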
|
||
|
||
---
|
||
|
||
## Assess our new cluster health
|
||
|
||
- The output of `docker info` will tell us the status
|
||
of the node that we are talking to (primary or replica)
|
||
|
||
- If we talk to a replica, it will tell us who is the primary
|
||
|
||
.exercise[
|
||
|
||
- Talk to a random node, and ask its view of the cluster:
|
||
```
|
||
eval $(docker-machine env node3 --swarm)
|
||
docker info | grep -e ^Name -e ^Role -e ^Primary
|
||
```
|
||
|
||
]
|
||
|
||
Note: `docker info` is one of the few commands that will
work even when there is no elected primary. This helps
with debugging.
|
||
|
||
---
|
||
|
||
## Test Swarm manager failover
|
||
|
||
- The previous command told us which node was the primary manager

  - if `Role` is `primary`,
    <br/>then the primary is indicated by `Name`

  - if `Role` is `replica`,
    <br/>then the primary is indicated by `Primary`
|
||
|
||
.exercise[
|
||
|
||
- Kill the primary manager:
|
||
```
|
||
ssh XXX docker kill swarm-agent-master
|
||
```
|
||
|
||
]
|
||
|
||
Look at the output of `docker info` every few seconds.
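
The `Role` / `Name` / `Primary` rule can be turned into a small filter, which
is handy when polling `docker info` in a loop. This is a sketch under the
assumption that the fields appear as `Key: value` lines; the helper name
`find_primary` and the sample values are invented.

```shell
# Hypothetical helper: read `docker info` text on stdin, print the primary.
# If Role is "primary", the answer is Name; if "replica", it is Primary.
find_primary() {
  awk -F': ' '
    $1 == "Name"    { name = $2 }
    $1 == "Role"    { role = $2 }
    $1 == "Primary" { primary = $2 }
    END {
      if (role == "primary") print name
      else if (role == "replica") print primary
      else exit 1   # no role reported: we cannot tell
    }'
}

# Invented sample, shaped like the grep output from the previous exercise.
printf 'Role: replica\nPrimary: 10.0.0.2:3376\nName: node3\n' | find_primary
```

On a live cluster, the input would come from
`docker info | grep -e ^Name -e ^Role -e ^Primary`.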
|
||
|
||
---
|
||
|
||
# Highly available containers
|
||
|
||
- Swarm has support for *rescheduling* on node failure
|
||
|
||
- It has to be explicitly enabled on a per-container basis
|
||
|
||
- When the primary manager detects that a node goes down,
|
||
<br/>those containers are rescheduled elsewhere
|
||
|
||
- If the containers can't be rescheduled (constraints issue),
|
||
<br/>they are lost (there is no reconciliation loop yet)
|
||
|
||
- As of Swarm 1.1.0, this is an *experimental* feature
|
||
<br/>(To enable it, you must pass the
|
||
`--experimental` flag when you start Swarm itself!)
|
||
|
||
---
|
||
|
||
## Working around flag order
|
||
|
||
- The flag must be *before* the Swarm command
|
||
<br/>(i.e. `docker run swarm --experimental manage ...`)
|
||
|
||
- We cannot use Docker Machine to pass that flag ☹
|
||
<br/>(Machine adds flags *after* the Swarm command)
|
||
|
||
- Instead, we will use the Swarm image `jpetazzo/swarm:experimental`:
|
||
```
|
||
FROM swarm
|
||
ENTRYPOINT ["/swarm", "--experimental"]
|
||
```
|
||
|
||
- We can tell Machine to use this with `--swarm-image`
|
||
|
||
---
|
||
|
||
## Reconfigure Swarm [one more time](https://www.youtube.com/watch?v=FGBhQbmPwH8)
|
||
|
||
.exercise[
|
||
|
||
- Redeploy Swarm with `--experimental`:
|
||
|
||
.small[
|
||
```
|
||
for N in 1 2 3 4 5; do
  ssh node$N docker rm -f swarm-agent swarm-agent-master
  docker-machine rm -f node$N
done

grep 'node[12345]' /etc/hosts | grep -v ^127 |
while read IPADDR NODENAME; do
  docker-machine create --driver generic \
    --engine-opt cluster-store=consul://localhost:8500 \
    --engine-opt cluster-advertise=eth0:2376 \
    --swarm --swarm-master --swarm-image jpetazzo/swarm:experimental \
    --swarm-discovery consul://localhost:8500 \
    --swarm-opt replication --swarm-opt advertise=$IPADDR:3376 \
    --generic-ssh-user docker --generic-ip-address $IPADDR $NODENAME
done
|
||
```
|
||
]
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Start a resilient container
|
||
|
||
- By default, containers will not be restarted when their node goes down
|
||
|
||
- You must pass an explicit *rescheduling policy* to make that happen
|
||
|
||
- For now, the only policy is "on-node-failure"
|
||
|
||
.exercise[
|
||
|
||
- Start a container with a rescheduling policy:
|
||
|
||
.small[
|
||
```
|
||
CID=$(docker run -d -e reschedule:on-node-failure nginx)
|
||
```
|
||
]
|
||
|
||
]
|
||
|
||
Check that the container is up and running.
|
||
|
||
---
|
||
|
||
## Simulate a node failure
|
||
|
||
- We will reboot the node running this container
|
||
|
||
- Swarm will reschedule it
|
||
|
||
.exercise[
|
||
|
||
- Check on which node the container is running:
|
||
<br/>.small[`NODE=$(docker inspect --format '{{.Node.Name}}' $CID)`]
|
||
|
||
- Reboot that node:
|
||
<br/>`ssh $NODE sudo reboot`
|
||
|
||
- Check that the container has been rescheduled:
|
||
<br/>`docker ps`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## .icon[] Caveats
|
||
|
||
- There are some corner cases when the node is also
|
||
the Swarm leader or the Consul leader; this is being improved
|
||
right now!
|
||
|
||
- Swarm doesn't gracefully handle the fact that after the
  reboot, you have *two* containers with the same name
  (`highlander`, for instance), and attempts to manipulate
  the container by name will not work. This will be improved too.
|
||
|
||
---
|
||
|
||
# Conclusions
|
||
|
||
- Bad news: we still have work to do to deploy our apps

  - it's not all unicorns, ponies, and rainbows

  - *no, Docker will not make your job obsolete*

- Good news: a lot of hard things are becoming easier

  - building, packaging, distributing apps

  - running distributed systems on clusters
|
||
|
||
---
|
||
|
||
## "This is complicated"
|
||
|
||
- The scripts used here are pretty simple
|
||
<br/>(each is fewer than 100 lines of code)
|
||
|
||
- You can easily rewrite them in your favorite language,
|
||
<br/>adapt and customize them, in a few hours
|
||
|
||
- FYI: those scripts are smaller and simpler than the
|
||
scripts (cloud init etc) used to deploy the VMs for this
|
||
workshop!
|
||
|
||
- Docker Inc. has commercial products to wrap all this:

  - Docker Cloud
    <br/>(manage your Docker nodes from a SAAS portal)

  - Universal Control Plane
    <br/>(buzzword-compliant management solution:
    <br/>turnkey, enterprise-class, on-premise, etc.)
|
||
|
||
---
|
||
|
||
## What's next?
|
||
|
||
- November 2015: Compose 1.5 + Engine 1.9 =
|
||
<br/>first release with multi-host networking
|
||
|
||
- January 2016: Compose 1.6 + Engine 1.10 =
|
||
<br/>HUGE improvements (DNS server, HA...)
|
||
|
||
- Next release: another truckload of features
|
||
|
||
- I will deliver this workshop about twice a month
|
||
|
||
- Check out the GitHub repo for updated content!
|
||
<br/>(there is a tag for each big round of updates)
|
||
|
||
---
|
||
|
||
class: title
|
||
|
||
# Thanks! <br/> Questions?
|
||
|
||
### [@jpetazzo](https://twitter.com/jpetazzo) <br/> [@docker](https://twitter.com/docker)
|
||
|
||
</textarea>
|
||
<script src="https://gnab.github.io/remark/downloads/remark-0.5.9.min.js" type="text/javascript">
|
||
</script>
|
||
<script type="text/javascript">
|
||
var slideshow = remark.create();
|
||
</script>
|
||
</body>
|
||
</html>
|