mirror of
https://github.com/jpetazzo/container.training.git
synced 2026-03-05 19:00:38 +00:00
<!DOCTYPE html>
|
||
<html>
|
||
<head>
|
||
<base target="_blank">
|
||
<title>Docker Orchestration Workshop</title>
|
||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
|
||
<style type="text/css">
|
||
@import url(https://fonts.googleapis.com/css?family=Yanone+Kaffeesatz);
|
||
@import url(https://fonts.googleapis.com/css?family=Droid+Serif:400,700,400italic);
|
||
@import url(https://fonts.googleapis.com/css?family=Ubuntu+Mono:400,700,400italic);
|
||
|
||
body { font-family: 'Droid Serif'; font-size: 150%; }
|
||
|
||
h1, h2, h3 {
|
||
font-family: 'Yanone Kaffeesatz';
|
||
font-weight: normal;
|
||
}
|
||
a {
|
||
text-decoration: none;
|
||
color: blue;
|
||
}
|
||
.remark-code, .remark-inline-code { font-family: 'Ubuntu Mono'; }
|
||
.red { color: #fa0000; }
|
||
.gray { color: #ccc; }
|
||
.small { font-size: 70%; }
|
||
.big { font-size: 140%; }
|
||
.underline { text-decoration: underline; }
|
||
.footnote {
|
||
position: absolute;
|
||
bottom: 3em;
|
||
}
|
||
.pic {
|
||
vertical-align: middle;
|
||
text-align: center;
|
||
padding: 0 0 0 0 !important;
|
||
}
|
||
img {
|
||
max-width: 100%;
|
||
max-height: 450px;
|
||
}
|
||
.title {
|
||
vertical-align: middle;
|
||
text-align: center;
|
||
}
|
||
.title {
|
||
font-size: 2em;
|
||
}
|
||
.title .remark-slide-number {
|
||
font-size: 0.5em;
|
||
}
|
||
.quote {
|
||
background: #eee;
|
||
border-left: 10px solid #ccc;
|
||
margin: 1.5em 10px;
|
||
padding: 0.5em 10px;
|
||
quotes: "\201C""\201D""\2018""\2019";
|
||
font-style: italic;
|
||
}
|
||
.quote:before {
|
||
color: #ccc;
|
||
content: open-quote;
|
||
font-size: 4em;
|
||
line-height: 0.1em;
|
||
margin-right: 0.25em;
|
||
vertical-align: -0.4em;
|
||
}
|
||
.quote p {
|
||
display: inline;
|
||
}
|
||
.icon img {
|
||
height: 1em;
|
||
}
|
||
.exercise {
|
||
background-color: #eee;
|
||
background-image: url("keyboard.png");
|
||
background-size: 1.4em;
|
||
background-repeat: no-repeat;
|
||
background-position: 0.2em 0.2em;
|
||
border: 2px dotted black;
|
||
}
|
||
.exercise::before {
|
||
content: "Exercise:";
|
||
margin-left: 1.8em;
|
||
}
|
||
li p { line-height: 1.25em; }
|
||
</style>
|
||
</head>
|
||
<body>
|
||
<textarea id="source">
|
||
|
||
class: title
|
||
|
||
# Docker <br/> Orchestration <br/> Workshop
|
||
|
||
---
|
||
|
||
## Logistics
|
||
|
||
- Hello! I'm `jerome at docker dot com`
|
||
|
||
- Agenda:
|
||
|
||
.small[
|
||
- 09:00-10:30 part 1
|
||
- 10:30-11:00 coffee break
|
||
- 11:00-12:30 part 2
|
||
- 12:30-13:30 lunch break
|
||
- 13:30-15:00 part 3
|
||
- 15:00-15:30 coffee break
|
||
- 15:30-17:00 part 4
|
||
- 17:00- open discussion, Q&A
|
||
]
|
||
|
||
<!-- - This will be FAST PACED, but DON'T PANIC! -->
|
||
|
||
- All the content is publicly available
|
||
<br/>(slides, code samples, scripts)
|
||
|
||
- Experimental chat support on
|
||
[Gitter](https://gitter.im/jpetazzo/workshop-20160215-paris)
|
||
|
||
---
|
||
|
||
<!--
|
||
grep '^# ' index.html | grep -v '<br' | tr '#' '-'
|
||
-->
|
||
|
||
## Outline (1/2)
|
||
|
||
- Pre-requirements
|
||
- VM environment
|
||
- Our sample application
|
||
- Running services independently
|
||
- Running the whole app on a single node
|
||
- Identifying bottlenecks
|
||
- Measuring latency under load
|
||
- Scaling HTTP on a single node
|
||
- Put a load balancer on it
|
||
- Connecting to containers on other hosts
|
||
- Abstracting remote services with ambassadors
|
||
- Various considerations about ambassadors
|
||
|
||
---
|
||
|
||
## Outline (2/2)
|
||
|
||
- Docker for ops
|
||
- Backups
|
||
- Logs
|
||
- Security upgrades
|
||
- Network traffic analysis
|
||
- Dynamic orchestration
|
||
- Hands-on Swarm
|
||
- Deploying Swarm
|
||
- Cluster discovery
|
||
- Building our app on Swarm
|
||
- Network plumbing on Swarm
|
||
- Going further
|
||
|
||
---
|
||
|
||
# Pre-requirements
|
||
|
||
- Computer with network connection and SSH client
|
||
<br/>(on Windows, get [putty](http://www.putty.org/)
|
||
or [Git BASH](https://msysgit.github.io/))
|
||
- GitHub account (recommended; not mandatory)
|
||
- Gitter account (recommended; not mandatory)
|
||
- Docker Hub account (only for Swarm hands-on section)
|
||
- Basic Docker knowledge
|
||
|
||
.exercise[
|
||
|
||
- This is the stuff you're supposed to do!
|
||
- Create [GitHub](https://github.com/) and
|
||
[Docker Hub](https://hub.docker.com) accounts now if needed
|
||
- Go to [view.dckr.info](http://view.dckr.info) to view these slides
|
||
- Join the chat room on
|
||
[Gitter](https://gitter.im/jpetazzo/workshop-20160215-paris)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
# VM environment
|
||
|
||
- Each person gets 5 VMs
|
||
- They are *your* VMs
|
||
- They'll be up until tomorrow
|
||
- You have a little card with login+password+IP addresses
|
||
- You can automatically SSH from one VM to another
|
||
|
||
.exercise[
|
||
|
||
- Log into the first VM (`node1`)
|
||
- Check that you can SSH (without password) to `node2`
|
||
- Check the version of docker with `docker version`
|
||
|
||
]
|
||
|
||
.footnote[Note: from now on, unless instructed, **all commands must
|
||
be run from the first VM, `node1`**.]
|
||
|
||
---
|
||
|
||
## Brand new versions!
|
||
|
||
- Engine 1.10.0
|
||
|
||
- Compose 1.6.0
|
||
|
||
- Swarm 1.1.0
|
||
|
||
- Machine 0.6.0
|
||
|
||
---
|
||
|
||
# Our sample application
|
||
|
||
- Let's look at the general layout of the
|
||
[source code](https://github.com/jpetazzo/orchestration-workshop)
|
||
|
||
- Each directory = 1 microservice
|
||
- `rng` = web service generating random bytes
|
||
- `hasher` = web service computing hash of POSTed data
|
||
- `worker` = background process using `rng` and `hasher`
|
||
- `webui` = web interface to watch progress
|
||
|
||
.exercise[
|
||
|
||
- Clone the repository on `node1`:
|
||
<br/>.small[`git clone https://github.com/jpetazzo/orchestration-workshop`]
|
||
|
||
]
|
||
|
||
(Bonus points for forking on GitHub and cloning your fork!)
|
||
|
||
---
|
||
|
||
## What's this application?
|
||
|
||
- It is a DockerCoin miner! 💰🐳📦🚢
|
||
|
||
- No, you can't buy coffee with DockerCoins
|
||
|
||
- How DockerCoins works:
|
||
|
||
- `worker` asks `rng` for random bytes
|
||
- `worker` feeds those random bytes into `hasher`
|
||
- each hash starting with `0` is a DockerCoin
|
||
- DockerCoins are stored in `redis`
|
||
- `redis` is also updated every second to track speed
|
||
- you can see the progress with the `webui`
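
The mining rule above can be tried locally, without any of the services. This is only a sketch under assumptions: it uses `/dev/urandom` and `sha256sum` in place of `rng` and `hasher`, the 32-byte payload size is an arbitrary choice, and the helper names `is_coin` and `mine_once` are made up for this illustration.

```shell
# is_coin DIGEST: succeed when the hex digest starts with "0".
is_coin() {
  case "$1" in
    0*) return 0 ;;
    *)  return 1 ;;
  esac
}

# mine_once: hash 32 random bytes locally (assumed payload size)
# and report whether the digest counts as a coin.
mine_once() {
  digest=$(head -c 32 /dev/urandom | sha256sum | cut -d' ' -f1)
  if is_coin "$digest"; then
    echo "coin $digest"
  else
    echo "miss $digest"
  fi
}

mine_once
```

Since the first hex digit of a hash is uniformly distributed, roughly one attempt in sixteen should print `coin`.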
|
||
|
||
Next: we will inspect components independently.
|
||
|
||
---
|
||
|
||
# Running services independently
|
||
|
||
First, we will run the random number generator (`rng`).
|
||
|
||
.exercise[
|
||
|
||
- Go to the `dockercoins` directory, in the cloned repo:
|
||
<br/>`cd orchestration-workshop/dockercoins`
|
||
|
||
- Use Compose to run the `rng` service:
|
||
<br/>`docker-compose up rng`
|
||
|
||
- Docker will pull `python` and build the microservice
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Lies, damn lies, and port numbers
|
||
|
||
.icon[] Pay attention to the port mapping!
|
||
|
||
- The container log says:
|
||
<br/>`Running on http://0.0.0.0:80/`
|
||
|
||
- But if you try `curl localhost:80`, you will get:
|
||
<br/>`Connection refused`
|
||
|
||
- Port 80 on the container ≠ port 80 on the Docker host
|
||
|
||
---
|
||
|
||
## Understanding port mapping
|
||
|
||
- `node1`, the Docker host, has only one port 80
|
||
|
||
- If we give the one and only port 80 to the first
|
||
container who asks for it, we are in trouble when
|
||
another container needs it
|
||
|
||
- Default behavior: containers are not "exposed"
|
||
<br/>(only reachable by the Docker host and other containers,
|
||
through their private address)
|
||
|
||
- Container network services can be exposed:
|
||
|
||
- statically (you decide which host port to use)
|
||
|
||
- dynamically (Docker allocates a host port)
|
||
|
||
---
|
||
|
||
## Declaring port mapping
|
||
|
||
- Directly with the Docker Engine:
|
||
<br/>`docker run -P redis`
|
||
<br/>`docker run -p 6379 redis`
|
||
<br/>`docker run -p 1234:6379 redis`
|
||
|
||
- With Docker Compose, in the `docker-compose.yml` file:
|
||
|
||
```
rng:
  …
  ports:
    - "8001:80"
```
|
||
|
||
→ port 8001 *on the host* maps to
|
||
port 80 *in the container*
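
To make the reading direction of a port specification explicit, here is a small sketch. The function name `describe_port_spec` is hypothetical, and it only handles the two forms shown on this slide (`HOST:CONTAINER` and a bare container port), not specifications that include an IP address.

```shell
# describe_port_spec SPEC: explain a value passed to -p or listed under ports:.
# Only "HOST:CONTAINER" and "CONTAINER" are handled in this sketch.
describe_port_spec() {
  case "$1" in
    *:*) echo "host port ${1%%:*} -> container port ${1##*:}" ;;
    *)   echo "host port chosen by Docker -> container port $1" ;;
  esac
}

describe_port_spec 8001:80   # host port 8001 -> container port 80
describe_port_spec 6379      # host port chosen by Docker -> container port 6379
```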
|
||
|
||
---
|
||
|
||
## Using the `rng` service
|
||
|
||
Let's get random bytes of data!
|
||
|
||
.exercise[
|
||
|
||
- Open a second terminal and connect to the same VM
|
||
|
||
- Check that the service is alive:
|
||
<br/>`curl localhost:8001`
|
||
|
||
- Get 10 bytes of random data:
|
||
<br/>`curl localhost:8001/10`
|
||
|
||
- If the binary data output messed up your terminal, fix it:
|
||
<br/>`reset`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Running the hasher
|
||
|
||
.exercise[
|
||
|
||
- Start the `hasher` service:
|
||
<br/>`docker-compose up hasher`
|
||
|
||
- It will pull `ruby` and do the build
|
||
|
||
]
|
||
|
||
.icon[] Again, pay attention to the port mapping!
|
||
|
||
The container log says that it's listening on port 80,
|
||
but it's mapped to port 8002 on the host.
|
||
|
||
You can see the mapping in `docker-compose.yml`.
|
||
|
||
---
|
||
|
||
## Testing the hasher
|
||
|
||
.exercise[
|
||
|
||
- Open a third terminal window, and SSH to `node1`
|
||
|
||
- Check that the `hasher` service is alive:
|
||
<br/>`curl localhost:8002`
|
||
|
||
- Posting binary data requires some extra flags:
|
||
|
||
```
curl \
  -H "Content-type: application/octet-stream" \
  --data-binary hello \
  localhost:8002
```
|
||
|
||
- Check that it computed the right hash:
|
||
<br/>`echo -n hello | sha256sum`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Stopping services
|
||
|
||
We have multiple options:
|
||
|
||
- Interrupt `docker-compose up` with `^C`
|
||
|
||
- Stop individual services with `docker-compose stop rng`
|
||
|
||
- Stop all services with `docker-compose stop`
|
||
|
||
- Kill all services with `docker-compose kill`
|
||
<br/>(rude, but faster!)
|
||
|
||
.exercise[
|
||
|
||
- Use any of those methods to stop `rng` and `hasher`
|
||
|
||
]
|
||
|
||
???
|
||
|
||
This hidden content is here for automation
|
||
(so that `docker-compose kill` gets executed
|
||
when auto-testing the content).
|
||
|
||
.exercise[
|
||
|
||
```
docker-compose kill
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
# Running the whole app on a single node
|
||
|
||
.exercise[
|
||
|
||
- Run `docker-compose up` to start all components
|
||
|
||
]
|
||
|
||
- `rng` and `hasher` can be started directly
|
||
|
||
- Other components are built accordingly
|
||
|
||
- Aggregate output is shown
|
||
|
||
- Output is verbose
|
||
<br/>(because the worker is constantly hitting other services)
|
||
|
||
---
|
||
|
||
## Viewing our application
|
||
|
||
- The app exposes a Web UI with a realtime progress graph
|
||
|
||
.exercise[
|
||
|
||
- Open http://[yourVMaddr]:8000/ (from a browser)
|
||
|
||
]
|
||
|
||
- The app actually has a constant, steady speed
|
||
<br/>(3.33 coins/second)
|
||
|
||
- The speed seems not-so-steady because:
|
||
|
||
- we measure a discrete value over discrete intervals
|
||
|
||
- the measurement is done by the browser
|
||
|
||
- BREAKING: network latency is a thing
|
||
|
||
---
|
||
|
||
## Running in the background
|
||
|
||
- The logs are very verbose (and won't get better)
|
||
|
||
- Let's put them in the background for now!
|
||
|
||
.exercise[
|
||
|
||
- Stop the app (with `^C`)
|
||
|
||
- Start it again with `docker-compose up -d`
|
||
|
||
- Check on the web UI that the app is still making progress
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Looking at resource usage
|
||
|
||
- Let's look at CPU, memory, and I/O usage
|
||
|
||
.exercise[
|
||
|
||
- run `top` to see CPU and memory usage
|
||
<br/>(you should see idle cycles)
|
||
|
||
- run `vmstat 3` to see I/O usage (si/so/bi/bo)
|
||
<br/>(the 4 numbers should be almost zero,
|
||
<br/>except `bo` for logging)
|
||
|
||
]
|
||
|
||
We have available resources.
|
||
|
||
- Why?
|
||
- How can we use them?
|
||
|
||
---
|
||
|
||
## Scaling workers on a single node
|
||
|
||
- Docker Compose supports scaling.red[*]
|
||
- Let's scale `worker` and see what happens!
|
||
|
||
.exercise[
|
||
|
||
- Start 9 more `worker` containers:
|
||
<br/>`docker-compose scale worker=10`
|
||
|
||
- Check the aggregated logs of those containers:
|
||
<br/>`docker-compose logs worker`
|
||
|
||
- See the impact on CPU load (with top/htop),
|
||
<br/>and on compute speed (with web UI)
|
||
|
||
]
|
||
|
||
.footnote[.red[*]With some limitations, as we'll see later.]
|
||
|
||
---
|
||
|
||
# Identifying bottlenecks
|
||
|
||
- You should have seen a 3x speed bump (not 10x)
|
||
|
||
- Adding workers didn't result in linear improvement
|
||
|
||
- *Something else* is slowing us down
|
||
|
||
--
|
||
|
||
- ... But what?
|
||
|
||
--
|
||
|
||
- The code doesn't have instrumentation
|
||
|
||
- Let's use state-of-the-art HTTP performance analysis!
|
||
<br/>(i.e. good old tools like `ab`, `httping`...)
|
||
|
||
???
|
||
|
||
## Benchmarking our microservices
|
||
|
||
We will test microservices in isolation.
|
||
|
||
.exercise[
|
||
|
||
- Stop the application:
|
||
`docker-compose kill`
|
||
|
||
- Remove old containers:
|
||
`docker-compose rm`
|
||
|
||
- Start `hasher` and `rng`:
|
||
`docker-compose up hasher rng`
|
||
|
||
]
|
||
|
||
Now let's hammer them with requests!
|
||
|
||
???
|
||
|
||
## Testing `rng`
|
||
|
||
Let's assess the raw performance of our RNG.
|
||
|
||
.exercise[
|
||
|
||
- Test the performance on one big request:
|
||
<br/>`curl -o/dev/null localhost:8001/10000000`
|
||
<br/>(should take ~1s, and show speed of ~10 MB/s)
|
||
|
||
]
|
||
|
||
If we were doing requests of 1000 bytes ...
|
||
|
||
... Could we get 10k req/s?
|
||
|
||
Let's test and see what happens!
|
||
|
||
???
|
||
|
||
## Concurrent requests
|
||
|
||
.exercise[
|
||
|
||
- Test 100 requests of 1000 bytes each:
|
||
<br/>`ab -n 100 localhost:8001/1000`
|
||
|
||
- Test 100 requests, 10 requests in parallel:
|
||
<br/>`ab -n 100 -c 10 localhost:8001/1000`
|
||
<br/>(look how the latency has increased!)
|
||
|
||
- Try with 100 requests in parallel:
|
||
<br/>`ab -n 100 -c 100 localhost:8001/1000`
|
||
|
||
]
|
||
|
||
???
|
||
|
||
Whatever we do, we get ~10 requests/second.
|
||
|
||
Increasing concurrency doesn't help:
|
||
it just increases latency.
|
||
|
||
???
|
||
|
||
## Discussion
|
||
|
||
- When serving requests sequentially, they each take 100ms
|
||
|
||
- When 10 requests arrive at the same time:
|
||
|
||
- one request is served in 100ms
|
||
- another is served in 200ms
|
||
- another is served in 300ms
|
||
- ...
|
||
- another is served in 1000ms
|
||
|
||
- All requests are queued and served by a single thread
|
||
|
||
- It looks like `rng` doesn't handle concurrent requests
|
||
|
||
- What about `hasher`?
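
The queueing arithmetic above can be checked with a few lines of shell. This is a model, not a measurement: it assumes every request takes exactly 100 ms and that all requests arrive at the same instant. The function names are invented for this sketch.

```shell
# queue_latencies N SERVICE_MS: completion time (ms) of each of N requests
# that arrive together at a server handling one request at a time.
queue_latencies() {
  n=$1
  service_ms=$2
  i=1
  while [ "$i" -le "$n" ]; do
    echo $(( i * service_ms ))
    i=$(( i + 1 ))
  done
}

# mean_latency N SERVICE_MS: average of the completion times above.
mean_latency() {
  queue_latencies "$1" "$2" | awk '{ s += $1 } END { print s / NR }'
}

queue_latencies 10 100   # 100, 200, ... 1000
mean_latency 10 100      # 550
```

The last request still completes after 1000 ms, so throughput stays at 10 requests per second while the average latency grows with concurrency.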
|
||
|
||
???
|
||
|
||
## Save some random data and stop the generator
|
||
|
||
Before testing the hasher, let's save some random
|
||
data that we will feed to the hasher later.
|
||
|
||
.exercise[
|
||
|
||
- Run `curl localhost:8001/1000000 > /tmp/random`
|
||
|
||
]
|
||
|
||
Now we can stop the generator.
|
||
|
||
.exercise[
|
||
|
||
- In the shell where you did `docker-compose up rng`,
|
||
<br/>stop it by hitting `^C`
|
||
|
||
]
|
||
|
||
???
|
||
|
||
## Benchmarking the hasher
|
||
|
||
We will hash the data that we just got from `rng`.
|
||
|
||
.exercise[
|
||
|
||
- Posting binary data requires some extra flags:
|
||
|
||
```
curl \
  -H "Content-type: application/octet-stream" \
  --data-binary @/tmp/random \
  localhost:8002
```
|
||
|
||
- Compute the hash locally to verify that it works fine:
|
||
<br/>`sha256sum /tmp/random`
|
||
<br/>(it should display the same hash)
|
||
|
||
]
|
||
|
||
???
|
||
|
||
## The hasher under load
|
||
|
||
The invocation of `ab` will be slightly more complex as well.
|
||
|
||
.exercise[
|
||
|
||
- Execute 100 requests in a row:
|
||
|
||
```
ab -n 100 -T application/octet-stream \
   -p /tmp/random localhost:8002/
```
|
||
|
||
- Execute 100 requests with 10 requests in parallel:
|
||
|
||
```
ab -c 10 -n 100 -T application/octet-stream \
   -p /tmp/random localhost:8002/
```
|
||
|
||
]
|
||
|
||
Take note of the performance numbers (requests/s).
|
||
|
||
???
|
||
|
||
## Benchmarking the hasher on smaller data
|
||
|
||
Here we hashed 1,000,000 bytes.
|
||
|
||
Later we will hash much smaller payloads.
|
||
|
||
Let's repeat the tests with smaller data.
|
||
|
||
.exercise[
|
||
|
||
- Run `truncate --size=10 /tmp/random`
|
||
- Repeat the `ab` tests
|
||
|
||
]
|
||
|
||
---
|
||
|
||
# Measuring latency under load
|
||
|
||
We will use `httping`.
|
||
|
||
.exercise[
|
||
|
||
- Scale back the `worker` service to zero:
|
||
<br/>`docker-compose scale worker=0`
|
||
|
||
- Open a new SSH connection and check the latency of `rng`:
|
||
<br/>`httping localhost:8001`
|
||
|
||
- Open another SSH connection and do the same for `hasher`:
|
||
<br/>`httping localhost:8002`
|
||
|
||
- Keep an eye on both connections!
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Latency in initial conditions
|
||
|
||
Latency for both services should be very low (~1ms).
|
||
|
||
Now add a first worker and see what happens.
|
||
|
||
.exercise[
|
||
|
||
- Create the first `worker` instance:
|
||
<br/>`docker-compose scale worker=1`
|
||
|
||
]
|
||
|
||
- `hasher` should be very low (~1ms)
|
||
|
||
- `rng` should be low, with occasional spikes (10-100ms)
|
||
|
||
---
|
||
|
||
## Latency when scaling the worker
|
||
|
||
We will add workers and see what happens.
|
||
|
||
.exercise[
|
||
|
||
- Run `docker-compose scale worker=2`
|
||
|
||
- Check latency
|
||
|
||
- Increase number of workers and repeat
|
||
|
||
]
|
||
|
||
What happens?
|
||
|
||
- `hasher` remains low
|
||
- `rng` spikes up until it reaches ~(N-2)*100ms
|
||
<br/>(when you have N workers)
|
||
|
||
---
|
||
|
||
class: title
|
||
|
||
Why?
|
||
|
||
---
|
||
|
||
## Why does everything take (at least) 100ms?
|
||
|
||
--
|
||
|
||
`rng` code:
|
||
|
||

|
||
|
||
--
|
||
|
||
`hasher` code:
|
||
|
||

|
||
|
||
---
|
||
|
||
class: title
|
||
|
||
But ...
|
||
|
||
WHY?!?
|
||
|
||
---
|
||
|
||
## Why did we sprinkle this sample app with sleeps?
|
||
|
||
- Deterministic performance
|
||
<br/>(regardless of instance speed, CPUs, I/O...)
|
||
|
||
--
|
||
|
||
- Actual code sleeps all the time anyway
|
||
|
||
--
|
||
|
||
- When your code makes a remote API call:
|
||
|
||
- it sends a request;
|
||
|
||
- it sleeps until it gets the response;
|
||
|
||
- it processes the response.
|
||
|
||
---
|
||
|
||
## Why do `rng` and `hasher` behave differently?
|
||
|
||

|
||
|
||
--
|
||
|
||
(Synchronous vs. asynchronous event processing)
|
||
|
||
---
|
||
|
||
## How to make `rng` go faster
|
||
|
||
- Obvious solution: comment out the `sleep` instruction
|
||
|
||
--
|
||
|
||
- Real-world solution: use an asynchronous framework
|
||
<br/>(e.g. use gunicorn with gevent)
|
||
|
||
--
|
||
|
||
- New rule: we can't change the code!
|
||
|
||
--
|
||
|
||
- Solution: scale out `rng`
|
||
<br/>(dispatch `rng` requests on multiple instances)
|
||
|
||
---
|
||
|
||
# Scaling HTTP on a single node
|
||
|
||
- We could try to scale with Compose:
|
||
|
||
```
docker-compose scale rng=3
```
|
||
|
||
- Compose doesn't deal with load balancing
|
||
|
||
- We would get 3 instances ...
|
||
|
||
- ... But only the first one would serve traffic
|
||
|
||
---
|
||
|
||
## The plan
|
||
|
||
- Stop the `rng` service first
|
||
|
||
- Create multiple identical `rng` containers
|
||
|
||
- Put a load balancer in front of them
|
||
|
||
- Point other services to the load balancer
|
||
|
||
---
|
||
|
||
## Stopping `rng`
|
||
|
||
- That's the easy part!
|
||
|
||
.exercise[
|
||
|
||
- Use `docker-compose` to stop `rng`:
|
||
|
||
```
docker-compose stop rng
```
|
||
|
||
]
|
||
|
||
Note: we do this first because we are about to remove
|
||
`rng` from the Docker Compose file.
|
||
|
||
If we don't stop
|
||
`rng` now, it will remain up and running, with Compose
|
||
being unaware of its existence!
|
||
|
||
---
|
||
|
||
## Scaling `rng`
|
||
|
||
.exercise[
|
||
|
||
- Replace the `rng` service with multiple copies of it:
|
||
|
||
```
rng1:
  build: rng

rng2:
  build: rng

rng3:
  build: rng
```
|
||
|
||
]
|
||
|
||
That's all!
|
||
|
||
Shortcut: `docker-compose.yml-scaled-rng`
|
||
|
||
---
|
||
|
||
## Introduction to `jpetazzo/hamba`
|
||
|
||
- Public image on the Docker Hub
|
||
|
||
- Load balancer based on HAProxy
|
||
|
||
- Expects the following arguments:
|
||
<br/>`FE-port BE1-addr BE1-port BE2-addr BE2-port ...`
|
||
<br/>*or*
|
||
<br/>`FE-addr:FE-port BE1-addr BE1-port BE2-addr BE2-port ...`
|
||
|
||
- FE=frontend (the thing other services connect to)
|
||
|
||
- BE=backend (the multiple copies of your scaled service)
|
||
|
||
.small[
|
||
Example: listen to port 80 and balance traffic on www1:1234 + www2:2345
|
||
|
||
```
docker run -d -p 80 jpetazzo/hamba 80 www1 1234 www2 2345
```
|
||
]
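
When all backends listen on the same port, the argument list follows a regular pattern and can be generated. The helper below is a hypothetical sketch (the name `hamba_args` is not part of the image); it only builds the first argument form described above, `FE-port BE1-addr BE1-port ...`.

```shell
# hamba_args FE_PORT BE_PORT BACKEND...: print the argument string for
# a frontend port followed by each backend address and the shared backend port.
hamba_args() {
  fe_port=$1
  be_port=$2
  shift 2
  args=$fe_port
  for be in "$@"; do
    args="$args $be $be_port"
  done
  echo "$args"
}

hamba_args 80 80 rng1 rng2 rng3   # 80 rng1 80 rng2 80 rng3 80
```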
|
||
|
||
---
|
||
|
||
# Put a load balancer on it
|
||
|
||
Let's add our load balancer to the Compose file.
|
||
|
||
.exercise[
|
||
|
||
- Add the following section to the Compose file:
|
||
|
||
```
rng0:
  image: jpetazzo/hamba
  links:
    - rng1
    - rng2
    - rng3
  command: 80 rng1 80 rng2 80 rng3 80
  ports:
    - "8001:80"
```
|
||
|
||
]
|
||
|
||
Shortcut: `docker-compose.yml-scaled-rng`
|
||
|
||
---
|
||
|
||
## Point other services to the load balancer
|
||
|
||
- The only affected service is `worker`
|
||
|
||
- We have to replace the `rng` link with a link to `rng0`,
|
||
but it should still be named `rng` (so we don't change the code)
|
||
|
||
.exercise[
|
||
|
||
- Update the `worker` section as follows:
|
||
|
||
```
worker:
  build: worker
  links:
    - rng0:rng
    - hasher
    - redis
```
|
||
|
||
]
|
||
|
||
Shortcut: `docker-compose.yml-scaled-rng`
|
||
|
||
---
|
||
|
||
## Start the whole stack
|
||
|
||
.exercise[
|
||
|
||
- Start the new services:
|
||
<br/>`docker-compose up -d`
|
||
|
||
- Check worker logs:
|
||
<br/>`docker-compose logs worker`
|
||
|
||
- Check load balancer logs:
|
||
<br/>`docker-compose logs rng0`
|
||
|
||
]
|
||
|
||
If you get errors about port 8001, make sure that
|
||
`rng` was stopped correctly and try again.
|
||
|
||
---
|
||
|
||
## Results
|
||
|
||
- Check the latency of `rng`
|
||
<br/>(it should have improved significantly!)
|
||
|
||
- Check the application performance in the Web UI
|
||
<br/>(it should improve if you have enough workers)
|
||
|
||
*Note: if `worker` was scaled when you did `docker-compose up`,
|
||
it probably took a while, because `worker` doesn't handle
|
||
signals properly and Docker patiently waits 10 seconds for
|
||
each `worker` instance to terminate. This would be much
|
||
faster for a well-behaved application.*
|
||
|
||
---
|
||
|
||
## The good, the bad, the ugly
|
||
|
||
- The good
|
||
|
||
We scaled a service, added a load balancer -
|
||
<br/>without changing a single line of code.
|
||
|
||
- The bad
|
||
|
||
We manually copy-pasted sections in `docker-compose.yml`.
|
||
|
||
Improvement: write scripts to transform the YAML file.
|
||
|
||
- The ugly
|
||
|
||
If we scale up/down, we have to restart everything.
|
||
|
||
Improvement: reconfigure the load balancer dynamically.
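
As a rough illustration of the "write scripts" improvement, the copy-pasted `rng1`, `rng2`, `rng3` sections can be produced by a loop. This is a sketch only: the function name `gen_rng_services` is made up, and it prints the YAML fragment to standard output rather than editing `docker-compose.yml`.

```shell
# gen_rng_services N: print N Compose service sections named rng1..rngN,
# each built from the rng directory.
gen_rng_services() {
  n=$1
  i=1
  while [ "$i" -le "$n" ]; do
    printf 'rng%d:\n  build: rng\n\n' "$i"
    i=$(( i + 1 ))
  done
}

gen_rng_services 3
```

The output still has to be merged into the Compose file by hand or by another script, and the load balancer arguments must list the same names.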
|
||
|
||
---
|
||
|
||
# Connecting to containers on other hosts
|
||
|
||
- So far, our whole stack is on a single machine
|
||
|
||
- We want to scale out (across multiple nodes)
|
||
|
||
- We will deploy the same stack multiple times
|
||
|
||
- But we want every stack to use the same Redis
|
||
<br/>(in other words: Redis is our only *stateful* service here)
|
||
|
||
--
|
||
|
||
- And remember: we're not allowed to change the code!
|
||
|
||
- the code connects to host `redis`
|
||
- `redis` must resolve to the address of our Redis service
|
||
- the Redis service must listen on the default port (6379)
|
||
|
||
---
|
||
|
||
## Using host name injection to abstract service dependencies
|
||
|
||
- It is possible to add host entries to a container
|
||
|
||
- With the CLI:
|
||
|
||
```
docker run --add-host redis:192.168.1.2 myservice...
```
|
||
|
||
- In a Compose file:
|
||
|
||
```
myservice:
  image: myservice
  extra_hosts:
    redis: 192.168.1.2
```
|
||
|
||
- This creates entries in `/etc/hosts` in the container
|
||
<br/>(in Engine 1.10, a local DNS server is used instead)
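
To show what a hosts entry means for name resolution, here is a sketch that looks a name up in a hosts-format file. It works on a temporary file with sample content, not on a real container's `/etc/hosts`; the function name `lookup_host` and the sample addresses are assumptions made for this illustration.

```shell
# lookup_host FILE NAME: print the first address mapped to NAME
# in a hosts-format file, ignoring comment lines.
lookup_host() {
  awk -v name="$2" '
    $1 !~ /^#/ {
      for (i = 2; i <= NF; i++)
        if ($i == name) { print $1; exit }
    }' "$1"
}

# Sample file resembling what --add-host redis:192.168.1.2 would produce.
hosts_file=$(mktemp)
printf '127.0.0.1 localhost\n192.168.1.2 redis\n' > "$hosts_file"

lookup_host "$hosts_file" redis   # 192.168.1.2
rm -f "$hosts_file"
```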
|
||
|
||
???
|
||
|
||
## The plan
|
||
|
||
- Deploy our Redis service separately
|
||
|
||
- use the same `redis` image
|
||
|
||
- make sure that Redis server port (6379) is publicly accessible,
|
||
using port 6379 on the Docker host
|
||
|
||
- Update our Docker Compose YAML file
|
||
|
||
- remove the `redis` section
|
||
|
||
- in the `links` section, remove `redis`
|
||
|
||
- instead, put a `redis` entry in `extra_hosts`
|
||
|
||
Note: the code stays on the first node!
|
||
<br/>(We do not need to copy the code to the other nodes.)
|
||
|
||
???
|
||
|
||
## Making Redis available on its default port
|
||
|
||
There are two strategies.
|
||
|
||
- `docker run -p 6379:6379 redis`
|
||
|
||
- the container has its own, isolated network stack
|
||
- Docker creates a port mapping rule through iptables
|
||
- slight performance overhead
|
||
- port number is explicit (visible through Docker API)
|
||
|
||
- `docker run --net host redis`
|
||
|
||
- the container uses the network stack of the host
|
||
- when it binds to 6379/tcp, that's 6379/tcp on the host
|
||
- allows raw speed (no overhead due to iptables/bridge)
|
||
- port number is not visible through Docker API
|
||
|
||
Choose wisely!
|
||
|
||
???
|
||
|
||
## Deploy Redis
|
||
|
||
.exercise[
|
||
|
||
- Start a new redis container, mapping port 6379 to 6379:
|
||
|
||
```
docker run -d -p 6379:6379 redis
```
|
||
|
||
- Check that it's running with `docker ps`
|
||
|
||
- Note the IP address of this Docker host
|
||
|
||
- Try to connect to it (from anywhere):
|
||
|
||
```
telnet ip.ad.dr.ess 6379
```
|
||
|
||
]
|
||
|
||
To exit a telnet session: `Ctrl-] c ENTER`
|
||
|
||
???
|
||
|
||
## Update `docker-compose.yml` (1/3)
|
||
|
||
.exercise[
|
||
|
||
- Comment out `redis`:
|
||
|
||
```
#redis:
#  image: redis
```
|
||
|
||
]
|
||
|
||
???
|
||
|
||
## Update `docker-compose.yml` (2/3)
|
||
|
||
.exercise[
|
||
|
||
- Update `worker`:
|
||
|
||
```
worker:
  build: worker
  extra_hosts:
    redis: A.B.C.D
  links:
    - rng0:rng
    - hasher
```
|
||
|
||
]
|
||
|
||
Replace `A.B.C.D` with the IP address noted earlier.
|
||
|
||
Shortcut: `docker-compose.yml-extra-hosts`
|
||
<br/>(But you still have to replace `A.B.C.D`!)
|
||
|
||
???
|
||
|
||
## Update `docker-compose.yml` (3/3)
|
||
|
||
.exercise[
|
||
|
||
- Update `webui`:
|
||
|
||
```
webui:
  build: webui
  extra_hosts:
    redis: A.B.C.D
  ports:
    - "8000:80"
  #volumes:
  #  - "./webui/files/:/files/"
```
|
||
|
||
]
|
||
|
||
(Replace `A.B.C.D` with the IP address noted earlier)
|
||
|
||
.icon[] Don't forget to comment out the `volumes` section!
|
||
|
||
???
|
||
|
||
## Why did we comment out the `volumes` section?
|
||
|
||
- Volumes have multiple uses:
|
||
|
||
- storing persistent stuff (database files...)
|
||
|
||
- sharing files between containers (logs, configuration...)
|
||
|
||
- sharing files between host and containers (source...)
|
||
|
||
- The `volumes` directive expands to a host path
|
||
<br/>.small[(e.g. `/home/docker/orchestration-workshop/dockercoins/webui/files`)]
|
||
|
||
- This host path exists on the local machine
|
||
<br/>(not on the others)
|
||
|
||
- This specific volume is used in development
|
||
<br/>(not in production)
|
||
|
||
???
|
||
|
||
## Start the stack on the first machine
|
||
|
||
- Nothing special to do here
|
||
|
||
- Just bring up the application like we did before
|
||
|
||
.exercise[
|
||
|
||
- `docker-compose up -d`
|
||
|
||
]
|
||
|
||
- Check in the web browser that it's running correctly
|
||
|
||
???
|
||
|
||
## Start the stack on another machine
|
||
|
||
- We will set the `DOCKER_HOST` variable
|
||
|
||
- `docker-compose` will detect and use it
|
||
|
||
- Our Docker hosts are listening on port 55555
|
||
|
||
.exercise[
|
||
|
||
- Set the environment variable:
|
||
<br/>`export DOCKER_HOST=tcp://node2:55555`
|
||
|
||
- Start the stack:
|
||
<br/>`docker-compose up -d`
|
||
|
||
- Check that it's running:
|
||
<br/>`docker-compose ps`
|
||
|
||
]
|
||
|
||
???
|
||
|
||
## Scale!
|
||
|
||
.exercise[
|
||
|
||
- Open the Web UI
|
||
<br/>(on a node where it's deployed)
|
||
|
||
- Deploy one instance of the stack on each node
|
||
|
||
]
|
||
|
||
???
|
||
|
||
## Cleanup
|
||
|
||
- Let's remove what we did
|
||
|
||
.exercise[
|
||
|
||
- You can use the following scriptlet:
|
||
|
||
```
for N in $(seq 1 5); do
  export DOCKER_HOST=tcp://node$N:55555
  docker ps -qa | xargs docker rm -f
done
unset DOCKER_HOST
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
# Abstracting remote services with ambassadors
|
||
|
||
- What if we can't/won't run Redis on its default port?
|
||
|
||
- What if we want to be able to move it easily?
|
||
|
||
--
|
||
|
||
- We will use an ambassador
|
||
|
||
- Redis will be started independently of our stack
|
||
|
||
- It will run at an arbitrary location (host+port)
|
||
|
||
- In our stack, we replace `redis` with an ambassador
|
||
|
||
- The ambassador will connect to Redis
|
||
|
||
- The ambassador will "act as" Redis in the stack
|
||
|
||
---
|
||
|
||
## Start redis
|
||
|
||
- Start a standalone Redis container
|
||
|
||
- Let Docker expose it on a random port
|
||
|
||
.exercise[
|
||
|
||
- Run redis with a random public port:
|
||
<br/>`docker run -d -P --name myredis redis`
|
||
|
||
- Check which port was allocated:
|
||
<br/>`docker port myredis 6379`
|
||
|
||
]
|
||
|
||
- Note the IP address of the machine, and this port
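
The port number can be extracted from the mapping that `docker port` prints. The sketch below works on a sample string, since the real port differs on every run: the value `0.0.0.0:32768` is an assumed example, and `port_from_mapping` is a hypothetical helper name.

```shell
# port_from_mapping MAPPING: print the port part of an "ADDRESS:PORT" mapping.
# If several lines are given, only the first one is used.
port_from_mapping() {
  first_line=$(printf '%s\n' "$1" | head -n 1)
  echo "${first_line##*:}"
}

sample='0.0.0.0:32768'   # assumed example; yours will differ
port=$(port_from_mapping "$sample")
echo "Redis is reachable on port $port of this machine"
```

On a live system, the sample string would be replaced with the output of `docker port myredis 6379`.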
|
||
|
||
---
|
||
|
||
## Update `docker-compose.yml`
|
||
|
||
.exercise[
|
||
|
||
<!--
|
||
- Restore `links` as they were before in `webui` and `worker`
|
||
-->
|
||
|
||
- Replace `redis` with an ambassador using `jpetazzo/hamba`:
|
||
|
||
```
redis:
  image: jpetazzo/hamba
  command: 6379 AA.BB.CC.DD EEEEE
```
|
||
|
||
- Comment out the `volumes` section in `webui`:
|
||
|
||
```
#volumes:
#  - "./webui/files/:/files/"
```
|
||
|
||
]
|
||
|
||
Shortcut: `docker-compose.yml-ambassador`
|
||
<br/>(But you still have to update `AA.BB.CC.DD EEEEE`!)
|
||
|
||
---
|
||
|
||
## Why did we comment out the `volumes` section?
|
||
|
||
- Volumes have multiple uses:
|
||
|
||
- storing persistent stuff (database files...)
|
||
|
||
- sharing files between containers (logs, configuration...)
|
||
|
||
- sharing files between host and containers (source...)
|
||
|
||
- The `volumes` directive expands to a host path
|
||
<br/>.small[(e.g. `/home/docker/orchestration-workshop/dockercoins/webui/files`)]
|
||
|
||
- This host path exists on the local machine
|
||
<br/>(not on the others)
|
||
|
||
- This specific volume is used in development
|
||
<br/>(not in production)
|
||
|
||
---
|
||
|
||
## Start the stack on the first machine
|
||
|
||
- Compose will detect the change in the `redis` service
|
||
|
||
- It will replace `redis` with a `jpetazzo/hamba` instance
|
||
|
||
.exercise[
|
||
|
||
- Just tell Compose to do its thing:
|
||
|
||
```
|
||
docker-compose up -d
|
||
```
|
||
|
||
- Check that the stack is up and running:
|
||
|
||
```
|
||
docker-compose ps
|
||
```
|
||
|
||
- Look at the Web UI to make sure that it works fine
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Start the stack on another machine
|
||
|
||
- We will set the `DOCKER_HOST` variable
|
||
|
||
- `docker-compose` will detect and use it
|
||
|
||
- Our Docker hosts are listening on port 55555
|
||
|
||
.exercise[
|
||
|
||
- Set the environment variable:
|
||
<br/>`export DOCKER_HOST=tcp://node2:55555`
|
||
|
||
- Start the stack:
|
||
<br/>`docker-compose up -d`
|
||
|
||
- Check that it's running:
|
||
<br/>`docker-compose ps`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Scale!
|
||
|
||
.exercise[
|
||
|
||
- Deploy one instance of the stack on each node:
|
||
|
||
.small[
|
||
```
|
||
for N in 3 4 5; do
|
||
DOCKER_HOST=tcp://node$N:55555 docker-compose up -d &
|
||
done
|
||
```
|
||
]
|
||
|
||
- Add a bunch of workers all over the place:
|
||
|
||
.small[
|
||
```
|
||
for N in 1 2 3 4 5; do
|
||
DOCKER_HOST=tcp://node$N:55555 docker-compose scale worker=10
|
||
done
|
||
```
|
||
]
|
||
|
||
- Admire the result in the Web UI!
|
||
|
||
]
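The two loops above can be folded into one helper that runs any command against every node in parallel and then waits for all of them to finish. This is a sketch, assuming the five nodes and port 55555 used in this workshop; `node_endpoint` and `on_all_nodes` are hypothetical names.

```shell
# Hypothetical helper: Docker API endpoint for node number $1.
node_endpoint() {
  printf 'tcp://node%s:55555\n' "$1"
}

# Run the given command once per node, in the background,
# then block until every background job has finished.
on_all_nodes() {
  for N in 1 2 3 4 5; do
    DOCKER_HOST=$(node_endpoint "$N") "$@" &
  done
  wait
}

# Usage: on_all_nodes docker-compose scale worker=10
```

The `wait` matters: without it, the next command could start before the stacks are fully up.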
|
||
|
||
---
|
||
|
||
## Social Media Moment
|
||
|
||
Let's celebrate our success!
|
||
|
||
(And the fact that we're just 2498349893849283948982 DockerCoins away from being able to afford a cup of coffee!)
|
||
|
||
.exercise[
|
||
|
||
- If you have a Twitter account, tweet your mining speed!
|
||
<br/>(use the "Tweet this!" link below the graph ☺)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
# Various considerations about ambassadors
|
||
|
||
- "But, ambassadors are adding an extra hop!"
|
||
|
||
--
|
||
|
||
- Yes, but if you need load balancing, you need that hop
|
||
|
||
- Ambassadors actually *save* one hop
|
||
<br/>(they act as local load balancers)
|
||
|
||
- traditional load balancer:
|
||
<br/>client ⇒ external LB ⇒ server (2 physical hops)
|
||
|
||
- ambassadors:
|
||
<br/>client → ambassador ⇒ server (1 physical hop)
|
||
|
||
--
|
||
|
||
- Ambassadors are more reliable than traditional LBs
|
||
<br/>(they are colocated with their clients)
|
||
|
||
---
|
||
|
||
## Drawbacks of ambassadors
|
||
|
||
- Generic issues
|
||
<br/>(shared with any kind of load balancing / HA setup)
|
||
|
||
- extra logical hop (not transparent to the client)
|
||
|
||
- must assess backend health
|
||
|
||
- one more thing to worry about (!)
|
||
|
||
- Specific issues
|
||
|
||
- load balancing fairness
|
||
|
||
High-end load balancing solutions will rely on back pressure
|
||
from the backends. This addresses the fairness issue.
|
||
|
||
---
|
||
|
||
## There are many ways to deploy ambassadors
|
||
|
||
"Ambassador" is a design pattern.
|
||
|
||
There are many ways to implement it.
|
||
|
||
We will present three increasingly complex (but also powerful)
|
||
ways to deploy ambassadors.
|
||
|
||
---
|
||
|
||
## Single-tier ambassador deployment
|
||
|
||
- One-shot configuration process
|
||
|
||
- Must be executed manually after each scaling operation
|
||
|
||
- Scans current state, updates load balancer configuration
|
||
|
||
- Pros:
|
||
<br/>- simple, robust, no extra moving part
|
||
<br/>- easy to customize (thanks to simple design)
|
||
<br/>- can deal efficiently with large changes
|
||
|
||
- Cons:
|
||
<br/>- must be executed after each scaling operation
|
||
<br/>- harder to compose different strategies
|
||
|
||
- Example: this workshop
|
||
|
||
---
|
||
|
||
## Two-tier ambassador deployment
|
||
|
||
- Daemon listens to Docker events API
|
||
|
||
- Reacts to container start/stop events
|
||
|
||
- Adds/removes backends in the load balancer configuration
|
||
|
||
- Pros:
|
||
<br/>- no extra step required when scaling up/down
|
||
|
||
- Cons:
|
||
<br/>- extra process to run and maintain
|
||
<br/>- deals with one event at a time (ordering matters)
|
||
|
||
- Hidden gotcha: load balancer creation
|
||
|
||
- Example: interlock
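
The core of such a daemon can be sketched in a few lines of shell. This is an illustration only, not how interlock works: `watch_events` is a hypothetical name, and the handler passed as `$1` stands for whatever regenerates your load balancer configuration.

```shell
# Sketch of a two-tier watcher: call a handler once per container event.
# $1 = command or function that updates the load balancer configuration.
# "die" is used rather than "stop" so that containers exiting on their
# own are noticed too.
watch_events() {
  docker events --filter event=start --filter event=die |
  while read -r event; do
    "$1" "$event"
  done
}

# Usage: watch_events my_reconfigure_script
```

Note how events are handled strictly one at a time, which is the "ordering matters" caveat listed above.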
|
||
|
||
---
|
||
|
||
## Three-tier ambassador deployment
|
||
|
||
|
||
- Daemon listens to Docker events API
|
||
|
||
- Reacts to container start/stop events
|
||
|
||
- Adds/removes scaled services in distributed config DB
|
||
<br/>(zookeeper, etcd, consul…)
|
||
|
||
- Another daemon listens to config DB events
|
||
|
||
- Adds/removes backends in the load balancer configuration
|
||
|
||
- Pros:
|
||
<br/>- more flexibility
|
||
|
||
- Cons:
|
||
<br/>- three extra services to run and maintain
|
||
|
||
- Example: registrator
|
||
|
||
---
|
||
|
||
## Other multi-host communication mechanisms
|
||
|
||
- Overlay networks
|
||
|
||
- weave, flannel, pipework ...
|
||
|
||
- Network plugins
|
||
|
||
- available since Engine 1.9
|
||
|
||
- Allow a flat network for your containers
|
||
|
||
- Often requires an extra service to deal with BUM packets
|
||
<br/>(broadcast/unknown/multicast)
|
||
|
||
- e.g. a key/value store (Consul, Etcd, Zookeeper ...)
|
||
|
||
- Load balancers and/or failover mechanisms still needed
|
||
|
||
---
|
||
|
||
class: title
|
||
|
||
# Interlude <br/>
|
||
|
||
# Docker for ops
|
||
|
||
---
|
||
|
||
# Backups
|
||
|
||
- Redis is still running (with name `myredis`)
|
||
|
||
- We want to enable backups without touching it
|
||
|
||
- We will use a special backup container:
|
||
|
||
- sharing the same volumes
|
||
|
||
- linked to it (to connect to it easily)
|
||
|
||
- possibly containing our backup tools
|
||
|
||
- This works because the `redis` container image
|
||
<br/>stores its data on a volume
|
||
|
||
---
|
||
|
||
## Starting the backup container
|
||
|
||
.exercise[
|
||
|
||
- Make sure you're talking to the initial host:
|
||
|
||
```
|
||
unset DOCKER_HOST
|
||
```
|
||
|
||
- Start the container:
|
||
|
||
```
|
||
docker run --link myredis:redis \
|
||
--volumes-from myredis \
|
||
-v /tmp/myredis:/output \
|
||
-ti alpine sh
|
||
```
|
||
|
||
- Look in `/data` in the container
|
||
<br/>(That's where Redis puts its data dumps)
|
||
]
|
||
|
||
---
|
||
|
||
## Connecting to Redis
|
||
|
||
- We need to tell Redis to perform a data dump *now*
|
||
|
||
.exercise[
|
||
|
||
- Connect to Redis:
|
||
<br/>`telnet redis 6379`
|
||
|
||
- Issue commands `SAVE` then `QUIT`
|
||
|
||
- Look at `/data` again
|
||
|
||
]
|
||
|
||
- There should be a recent dump file now!
|
||
|
||
---
|
||
|
||
## Getting the dump out of the container
|
||
|
||
- We could use many things:
|
||
|
||
- s3cmd to copy to S3
|
||
- SSH to copy to a remote host
|
||
- gzip/bzip/etc before copying
|
||
|
||
- We'll just copy it to the Docker host
|
||
|
||
.exercise[
|
||
|
||
- Copy the file from `/data` to `/output`
|
||
|
||
- Exit the container
|
||
|
||
- Look into `/tmp/myredis` (on the host)
|
||
|
||
]
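The same backup can be done without an interactive session. The sketch below makes two assumptions: that `redis-cli` from the official `redis` image is acceptable in place of `telnet`, and that Redis writes its dump to the default file name `dump.rdb`. `backup_redis` is a hypothetical helper.

```shell
# Sketch: trigger a dump, then copy it to a directory on the Docker host.
# $1 = name of the Redis container, $2 = host directory for the copy.
backup_redis() {
  # Ask Redis for a synchronous dump through a linked one-off container.
  docker run --rm --link "$1":redis redis redis-cli -h redis SAVE || return 1
  # Mount the Redis volumes and copy the dump file out.
  docker run --rm --volumes-from "$1" -v "$2":/output alpine \
    cp /data/dump.rdb /output/
}

# Usage: backup_redis myredis /tmp/myredis
```

This form is easy to call from cron or from a larger backup script.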
|
||
|
||
---
|
||
|
||
# Logs
|
||
|
||
- Two strategies:
|
||
|
||
- log to plain files on volumes
|
||
|
||
- log to stdout
|
||
<br/>(and use a logging driver)
|
||
|
||
---
|
||
|
||
## Logging to plain files on volumes
|
||
|
||
(Sorry, that part won't be hands-on!)
|
||
|
||
- Start a container with `-v /logs`
|
||
|
||
- Make sure that all log files are in `/logs`
|
||
|
||
- To check logs, run e.g.
|
||
|
||
```
|
||
docker run --volumes-from ... ubuntu sh -c \
|
||
"grep WARN /logs/*.log"
|
||
```
|
||
|
||
- Or just go interactive:
|
||
|
||
```
|
||
docker run --volumes-from ... -ti ubuntu
|
||
```
|
||
|
||
- You can (should) start a log shipper that way
|
||
|
||
---
|
||
|
||
## Logging to stdout
|
||
|
||
- All containers should write to stdout/stderr
|
||
|
||
- Docker will collect logs and pass them to a logging driver
|
||
|
||
- The logging driver can be specified globally, and per container
|
||
<br/>(changing it for a container overrides the global setting)
|
||
|
||
- To change the global logging driver,
|
||
<br/>pass extra flags to the daemon
|
||
<br/>(requires a daemon restart)
|
||
|
||
- To override the logging driver for a container,
|
||
<br/>pass extra flags to `docker run`
|
||
|
||
---
|
||
|
||
## Specifying logging flags
|
||
|
||
- `--log-driver`
|
||
|
||
*selects the driver*
|
||
|
||
- `--log-opt key=val`
|
||
|
||
*adds driver-specific options*
|
||
<br/>*(can be repeated multiple times)*
|
||
|
||
- The flags are identical for `docker daemon` and `docker run`
|
||
|
||
Tip #1: when provisioning with Docker Machine, use:
|
||
```
|
||
docker-machine create ... --engine-opt log-driver=...
|
||
```
|
||
|
||
Tip #2: you can set logging options in Compose files.
|
||
|
||
---
|
||
|
||
## Available drivers
|
||
|
||
- json-file (default)
|
||
|
||
- syslog (can send to UDP, TCP, TCP+TLS, UNIX sockets)
|
||
|
||
- awslogs (AWS CloudWatch)
|
||
|
||
- journald
|
||
|
||
- gelf
|
||
|
||
- fluentd
|
||
|
||
- splunk
|
||
|
||
---
|
||
|
||
## About json-file ...
|
||
|
||
- It doesn't rotate logs by default, so your disks will fill up
|
||
|
||
(Unless you set the `max-size` *and* `max-file` log options.)
|
||
|
||
- It's the only one supporting log retrieval
|
||
|
||
(If you want to use `docker logs`, `docker-compose logs`,
|
||
or fetch logs from the Docker API, you need json-file!)
|
||
|
||
- This might change in the future
|
||
|
||
(But it's complex since there is no standard protocol
|
||
to *retrieve* log entries.)
|
||
|
||
All about logging in the documentation:
|
||
https://docs.docker.com/reference/logging/overview/
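
A minimal sketch of enabling rotation for one container, using the json-file options `max-size` and `max-file`. The values (10 MB, 3 files) are arbitrary, and `run_with_rotation` is a hypothetical helper.

```shell
# Sketch: start a detached container whose json-file logs are rotated.
# $1 = image to run; size and file count are illustration values.
run_with_rotation() {
  docker run -d \
    --log-driver json-file \
    --log-opt max-size=10m \
    --log-opt max-file=3 \
    "$1"
}

# Usage: run_with_rotation redis
```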
|
||
|
||
---
|
||
|
||
# Storing container logs in an ELK stack
|
||
|
||
*Important foreword: this is not an "official" or "recommended"
|
||
setup; it is just an example. We do not endorse ELK, GELF,
|
||
or the other elements of the stack more than others!*
|
||
|
||
What we will do:
|
||
|
||
- Spin up an ELK stack, with Compose
|
||
|
||
- Gaze at the spiffy Kibana web UI
|
||
|
||
- Manually send a few log entries over GELF
|
||
|
||
- Reconfigure our DockerCoins app to send logs to ELK
|
||
|
||
---
|
||
|
||
## What's in an ELK stack?
|
||
|
||
- ELK is three components:
|
||
|
||
- ElasticSearch (to store and index log entries)
|
||
|
||
- Logstash (to receive log entries from various
|
||
sources, process them, and forward them to various
|
||
destinations)
|
||
|
||
- Kibana (to view/search log entries with a nice UI)
|
||
|
||
- The only component that we will configure is Logstash
|
||
|
||
- We will accept log entries using the GELF protocol
|
||
|
||
- Log entries will be stored in ElasticSearch,
|
||
<br/>and displayed on Logstash's stdout for debugging
|
||
|
||
---
|
||
|
||
## Starting our ELK stack
|
||
|
||
- We will use a *separate* Compose file
|
||
|
||
- The Compose file is in the `elk` directory
|
||
|
||
.exercise[
|
||
|
||
- Go to the `elk` directory:
|
||
```
|
||
cd ~/orchestration-workshop/elk
|
||
```
|
||
|
||
- Start the ELK stack:
|
||
```
|
||
docker-compose up -d
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Checking that our ELK stack works
|
||
|
||
- Our default Logstash configuration sends a test
|
||
message every minute
|
||
|
||
- All messages are stored into ElasticSearch,
|
||
but also shown on Logstash stdout
|
||
|
||
.exercise[
|
||
|
||
- Look at Logstash stdout:
|
||
```
|
||
docker-compose logs logstash
|
||
```
|
||
|
||
]
|
||
|
||
After less than one minute, you should see a `"message" => "ok"`
|
||
in the output.
|
||
|
||
---
|
||
|
||
## Connect to Kibana
|
||
|
||
- Our ELK stack exposes two public services:
|
||
<br/>the Kibana web server, and the GELF UDP socket
|
||
|
||
.exercise[
|
||
|
||
- Check the port number for the Kibana UI:
|
||
```
|
||
docker-compose ps kibana
|
||
```
|
||
|
||
- Open the UI in your browser
|
||
<br/>(Use the instance IP address and the public port number)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## "Configuring" Kibana
|
||
|
||
- If you see a status page with a yellow item, wait a minute and reload
|
||
(Kibana is probably still initializing)
|
||
|
||
- Kibana should prompt you to "Configure an index pattern":
just click the "Create" button
|
||
|
||
- Then:
|
||
|
||
- click "Discover" (in the top-left corner)
|
||
- click "Last 15 minutes" (in the top-right corner)
|
||
- click "Last 1 hour" (in the list in the middle)
|
||
- click "Auto-refresh" (top-right corner)
|
||
- click "5 seconds" (top-left of the list)
|
||
|
||
- You should see a series of green bars
|
||
<br/>(with one new green bar every minute)
|
||
|
||
---
|
||
|
||
## Kibana out of the box
|
||
|
||

|
||
|
||
---
|
||
|
||
## Sending container output to Kibana
|
||
|
||
- We will create a simple container displaying "hello world"
|
||
|
||
- We will override the container logging driver
|
||
|
||
.exercise[
|
||
|
||
- Check the port number for the GELF socket:
|
||
<br/>`docker-compose ps logstash`
|
||
|
||
- Start a one-off container, overriding its logging driver:
|
||
<br/>(make sure to update X.X.X.X:XXXXX, of course)
|
||
|
||
```
|
||
docker run --rm --log-driver gelf \
|
||
--log-opt gelf-address=udp://X.X.X.X:XXXXX \
|
||
alpine echo hello world
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Visualizing container logs in Kibana
|
||
|
||
- Less than 5 seconds later (the refresh rate of the UI),
|
||
the log line should be visible in the Web UI
|
||
|
||
- We can customize the Web UI to be more readable
|
||
|
||
.exercise[
|
||
|
||
- In the left column, move the mouse over the following
|
||
columns, and click the "Add" button that appears:
|
||
|
||
- host
|
||
- container_name
|
||
- short_message
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Removing the old deployment of DockerCoins
|
||
|
||
- Before redeploying DockerCoins, remove everything
|
||
|
||
.exercise[
|
||
|
||
- Stop all DockerCoins containers:
|
||
<br/>`docker-compose kill`
|
||
|
||
- Remove them:
|
||
<br/>`docker-compose rm -f`
|
||
|
||
- Reset the Compose file:
|
||
<br/>`git checkout docker-compose.yml`
|
||
|
||
- Point the Docker API to a single node:
|
||
<br/>`eval $(docker-machine env -u)`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Add the logging driver to the Compose file
|
||
|
||
- We need to add the logging section to each container
|
||
|
||
- We need the GELF endpoint (host+port) that we
|
||
got earlier with `docker-compose ps logstash`
|
||
|
||
.exercise[
|
||
|
||
- Edit the `docker-compose.yml` file,
|
||
<br/>adding the following lines **to each container**:
|
||
|
||
```
|
||
log_driver: gelf
log_opt:
  gelf-address: "udp://X.X.X.X:XXXXX"
|
||
```
|
||
|
||
]
|
||
|
||
Shortcut: `docker-compose.yml-logging`
|
||
<br/>(But you still have to update `X.X.X.X:XXXXX`!)
|
||
|
||
---
|
||
|
||
## Start the DockerCoins app
|
||
|
||
.exercise[
|
||
|
||
- Use Compose normally:
|
||
```
|
||
docker-compose up -d
|
||
```
|
||
|
||
]
|
||
|
||
If you look in the Kibana web UI, you will see log lines
|
||
refreshed every 5 seconds.
|
||
|
||
Note: to do interesting things (graphs, searches...) we
|
||
would need to create indexes. This is beyond the scope
|
||
of this workshop.
|
||
|
||
---
|
||
|
||
# Security upgrades
|
||
|
||
- This section is not hands-on
|
||
|
||
- Public Service Announcement
|
||
|
||
- We'll discuss:
|
||
|
||
- how to upgrade the Docker daemon
|
||
|
||
- how to upgrade container images
|
||
|
||
---
|
||
|
||
## Upgrading the Docker daemon
|
||
|
||
- Stop all containers cleanly
|
||
<br/>(`docker ps -q | xargs docker stop`)
|
||
|
||
- Stop the Docker daemon
|
||
|
||
- Upgrade the Docker daemon
|
||
|
||
- Start the Docker daemon
|
||
|
||
- Start all containers
|
||
|
||
- This is like upgrading your Linux kernel,
|
||
<br/>but it will get better
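
One way to make sure that the same containers come back after the upgrade is to record their IDs first. This is a sketch under simple assumptions (no restart policies, no ordering between containers); `stop_all` and `start_all` are hypothetical helpers, and the daemon upgrade itself depends on your distribution.

```shell
# Record the IDs of running containers in file $1, then stop them cleanly.
stop_all() {
  docker ps -q > "$1"
  if [ -s "$1" ]; then
    # Word splitting is intended: each line (one ID) becomes one argument.
    docker stop $(cat "$1")
  fi
}

# After the upgrade, restart exactly the containers listed in file $1.
start_all() {
  if [ -s "$1" ]; then
    docker start $(cat "$1")
  fi
}

# Usage:
#   stop_all /tmp/running-containers
#   ... stop the daemon, upgrade it, start it again ...
#   start_all /tmp/running-containers
```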
|
||
|
||
---
|
||
|
||
## Upgrading container images
|
||
|
||
- When a vulnerability is announced:
|
||
|
||
- if it affects your base images,
|
||
<br/>make sure they are fixed first
|
||
|
||
- if it affects downloaded packages,
|
||
<br/>make sure they are fixed first
|
||
|
||
- re-pull base images
|
||
|
||
- rebuild
|
||
|
||
- restart containers
|
||
|
||
(The procedure is simple and plain, just follow it!)
|
||
|
||
---
|
||
|
||
# Network traffic analysis
|
||
|
||
- We still have `myredis` running
|
||
|
||
- We will use *shared network namespaces*
|
||
<br/>to perform network analysis
|
||
|
||
- Two containers sharing the same network namespace...
|
||
|
||
- have the same IP addresses
|
||
|
||
- have the same network interfaces
|
||
|
||
- `eth0` is therefore the same in both containers
|
||
|
||
---
|
||
|
||
## Install and start `ngrep`
|
||
|
||
Ngrep uses libpcap (like tcpdump) to sniff network traffic.
|
||
|
||
.exercise[
|
||
|
||
- Start a container with the same network namespace:
|
||
<br/>`docker run --net container:myredis -ti alpine sh`
|
||
|
||
- Install ngrep:
|
||
<br/>`apk update && apk add ngrep`
|
||
|
||
- Run ngrep:
|
||
<br/>`ngrep -tpd eth0 -Wbyline . tcp`
|
||
|
||
]
|
||
|
||
You should see a stream of Redis requests and responses.
|
||
|
||
---
|
||
|
||
class: title
|
||
|
||
# Dynamic orchestration
|
||
|
||
---
|
||
|
||
## Static vs Dynamic
|
||
|
||
- Static
|
||
|
||
- you decide what goes where
|
||
|
||
- simple to describe and implement
|
||
|
||
- seems easy at first but doesn't scale efficiently
|
||
|
||
- Dynamic
|
||
|
||
- the system decides what goes where
|
||
|
||
- requires extra components (HA KV...)
|
||
|
||
- scaling can be finer-grained, more efficient
|
||
|
||
---
|
||
|
||
## Mesos (overview)
|
||
|
||
- First presented in 2009
|
||
|
||
- Initial goal: resource scheduler
|
||
<br/>(two-level/pessimistic)
|
||
|
||
- top-level "master" knows the global cluster state
|
||
|
||
- "slave" nodes report status and resources to master
|
||
|
||
- master allocates resources to "frameworks"
|
||
|
||
- Container support added recently
|
||
<br/>(had to fit existing model)
|
||
|
||
- Network and service discovery is complex
|
||
|
||
---
|
||
|
||
## Mesos (in practice)
|
||
|
||
- Easy to set up a test cluster (in containers!)
|
||
|
||
- Great to accommodate mixed workloads
|
||
<br/>(see Marathon, Chronos, Aurora, and many more)
|
||
|
||
- "Meh" if you only want to run Docker containers
|
||
|
||
- In production on clusters of thousands of nodes
|
||
|
||
- Open source project; commercial support available
|
||
|
||
---
|
||
|
||
## Kubernetes (overview)
|
||
|
||
- 1 year old
|
||
|
||
- Designed specifically as a platform for containers
|
||
<br/>("greenfield" design)
|
||
|
||
- "pods" = groups of containers sharing network/storage
|
||
|
||
- Scaling and HA managed by "replication controllers"
|
||
|
||
- extensive use of "tags" instead of e.g. tree hierarchy
|
||
|
||
- Initially designed around Docker,
|
||
<br/>but doesn't hesitate to diverge in a few places
|
||
|
||
---
|
||
|
||
## Kubernetes (in practice)
|
||
|
||
- Network and service discovery is powerful, but complex
|
||
<br/>.small[(different mechanisms within pod, between pods, for inbound traffic...)]
|
||
|
||
- Initially designed around GCE
|
||
<br/>.small[(currently relies on "native" features for fast networking and persistence)]
|
||
|
||
- Adaptation is needed when it differs from Docker
|
||
<br/>.small[(need to learn new API, new tooling, new concepts)]
|
||
|
||
- Tends to be loved by ops more than devs
|
||
<br/>.small[(but keep in mind that it's evolving just as fast as Docker)]
|
||
|
||
---
|
||
|
||
## Swarm (in theory)
|
||
|
||
- Consolidates multiple Docker hosts into a single one
|
||
|
||
- "Looks like" a Docker daemon, but it dispatches (schedules)
|
||
your containers on multiple daemons
|
||
|
||
- Talks the Docker API front and back
|
||
<br/>(leverages the Docker API and ecosystem)
|
||
|
||
- Open source and written in Go (like Docker)
|
||
|
||
- Started by two of the original Docker authors
|
||
<br/>([@aluzzardi](https://twitter.com/aluzzardi) and [@vieux](https://twitter.com/vieux))
|
||
|
||
---
|
||
|
||
## Swarm (in practice)
|
||
|
||
- Stable since November 2015
|
||
|
||
- Tested with 1000 nodes + 50000 containers
|
||
<br/>.small[(without particular tuning; see DockerCon EU opening keynotes!)]
|
||
|
||
- Perfect for some scenarios (Jenkins, grid...)
|
||
|
||
- Requires extra effort for Compose build, links...
|
||
|
||
- Requires a key/value store to achieve high availability
|
||
|
||
- We'll see it in action!
|
||
|
||
---
|
||
|
||
## PAAS on Docker
|
||
|
||
- The PAAS workflow: *just push code*
|
||
<br/>(inspired by Heroku, dotCloud...)
|
||
|
||
- TL;DR: easier for devs, harder for ops,
|
||
<br/>some very opinionated choices
|
||
|
||
- A few examples:
|
||
<br/>(Non-exhaustive list!!!)
|
||
|
||
- Cloud Foundry
|
||
- Deis
|
||
- Dokku
|
||
- Flynn
|
||
- Tsuru
|
||
|
||
---
|
||
|
||
## A few other tools
|
||
|
||
- Volume plugins (Convoy, Flocker...)
|
||
|
||
- manage/migrate stateful containers (and more)
|
||
|
||
- Network plugins (Contiv, Weave...)
|
||
|
||
- overlay network so that containers can ping each other
|
||
|
||
- Powerstrip
|
||
|
||
- sits in front of the Docker API; great for experiments
|
||
|
||
- Tutum, Docker UCP (Universal Control Plane)
|
||
|
||
- dashboards to manage fleets of Docker hosts
|
||
|
||
... And many more!
|
||
|
||
---
|
||
|
||
# Hands-on Swarm
|
||
|
||

|
||
|
||
---
|
||
|
||
## Setting up our Swarm cluster
|
||
|
||
- This can be done manually or with **Docker Machine**
|
||
|
||
- Manual deployment:
|
||
|
||
- with TLS: certificate generation is painful
|
||
<br/>(needs dual-use certs)
|
||
|
||
- without TLS: easier, but insecure
|
||
<br/>(unless you run on your internal/private network)
|
||
|
||
- Docker Machine deployment:
|
||
|
||
- generates keys, certificates, and deploys them for you
|
||
|
||
- can also create VMs
|
||
|
||
---
|
||
|
||
## The Way Of The Machine
|
||
|
||
- Install `docker-machine` (single binary download)
|
||
|
||
- Set a few environment variables (cloud credentials)
|
||
|
||
- Create one or more machines:
|
||
<br/>`docker-machine create -d digitalocean node42`
|
||
|
||
- List machines and their status:
|
||
<br/>`docker-machine ls`
|
||
|
||
- Select a machine for use:
|
||
<br/>`eval $(docker-machine env node42)`
|
||
<br/>(this will set a few environment variables)
|
||
|
||
- Execute regular commands with Docker, Compose, etc.
|
||
<br/>(they will pick up remote host address from environment)
|
||
|
||
---
|
||
|
||
## Docker Machine `generic` driver
|
||
|
||
- Most drivers work the same way:
|
||
|
||
- use cloud API to create instance
|
||
|
||
- connect to instance over SSH
|
||
|
||
- install Docker
|
||
|
||
- The `generic` driver skips the first step
|
||
|
||
- It can install Docker on any machine,
|
||
<br/>as long as you have SSH access
|
||
|
||
- We will use that!
|
||
|
||
---
|
||
|
||
# Deploying Swarm
|
||
|
||
- Components involved:
|
||
|
||
- service discovery mechanism
|
||
<br/>(we'll use Docker's hosted system)
|
||
|
||
- swarm manager
|
||
<br/>(runs on `node1`, exposes Docker API)
|
||
|
||
- swarm agent
|
||
<br/>(runs on each node, registers it with service discovery)
|
||
|
||
---
|
||
|
||
# Cluster discovery
|
||
|
||
- Possible backends:
|
||
|
||
- dynamic, self-hosted (zk, etcd, consul)
|
||
|
||
- static (command-line or file)
|
||
|
||
- hosted by Docker (token)
|
||
|
||
- We will use the token mechanism
|
||
|
||
---
|
||
|
||
## Generating our Swarm discovery token
|
||
|
||
The token is a unique identifier, corresponding to a bucket
|
||
in the discovery service hosted by Docker Inc.
|
||
|
||
(You can think of it as a rendezvous point for your cluster.)
|
||
|
||
.exercise[
|
||
|
||
- Create your token, saving it preciously to disk as well:
|
||
|
||
```
|
||
TOKEN=$(docker run swarm create | tee token)
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Swarm agent
|
||
|
||
- Used only for dynamic discovery (zk, etcd, consul, token)
|
||
|
||
- Must run on each node
|
||
|
||
- Every 20s (by default), tells the discovery system:
<br/>"Hello, there is a Swarm node at A.B.C.D:EFGH"
|
||
|
||
- Must know the node's IP address
|
||
<br/>(sorry, it can't figure it out by itself, because
|
||
<br/>it doesn't know whether to use public or private addresses)
|
||
|
||
- The node continues to work even if the agent dies
|
||
|
||
- Automatically started by Docker Machine
|
||
<br/>(when the `--swarm` option is passed)
|
||
|
||
---
|
||
|
||
## Swarm manager
|
||
|
||
- Today: must run on the leader node
|
||
|
||
- Later: can run on multiple nodes, with leader election
|
||
|
||
- Automatically started by Docker Machine
|
||
<br/>(when the `--swarm-master` option is passed)
|
||
|
||
.exercise[
|
||
|
||
- Connect to `node1`
|
||
|
||
- "Create" a node with Docker Machine
|
||
|
||
.small[
|
||
```
|
||
docker-machine create --driver generic \
|
||
--swarm --swarm-master --swarm-discovery token://$TOKEN \
|
||
--generic-ssh-user docker --generic-ip-address 1.2.3.4 node1
|
||
```
|
||
]
|
||
|
||
]
|
||
|
||
(Don't forget to replace 1.2.3.4 with the node IP address!)
|
||
|
||
---
|
||
|
||
## Check our node
|
||
|
||
Let's connect to the node *individually*.
|
||
|
||
.exercise[
|
||
|
||
- Select the node with Machine
|
||
|
||
```
|
||
eval $(docker-machine env node1)
|
||
```
|
||
|
||
- Execute some Docker commands
|
||
|
||
```
|
||
docker version
|
||
docker info
|
||
docker ps
|
||
```
|
||
|
||
]
|
||
|
||
Two containers should show up: the agent and the manager.
|
||
|
||
---
|
||
|
||
## Check our (single-node) Swarm cluster
|
||
|
||
Let's connect to the manager instead.
|
||
|
||
.exercise[
|
||
|
||
- Select the Swarm manager with Machine
|
||
|
||
```
|
||
eval $(docker-machine env node1 --swarm)
|
||
```
|
||
|
||
- Execute some Docker commands
|
||
|
||
```
|
||
docker version
|
||
docker info
|
||
docker ps
|
||
```
|
||
|
||
]
|
||
|
||
The output is different! Let's review this.
|
||
|
||
---
|
||
|
||
## `docker version`
|
||
|
||
Swarm identifies itself clearly:
|
||
|
||
```
|
||
Client:
|
||
Version: 1.9.1
|
||
API version: 1.21
|
||
Go version: go1.4.2
|
||
Git commit: a34a1d5
|
||
Built: Fri Nov 20 13:20:08 UTC 2015
|
||
OS/Arch: linux/amd64
|
||
|
||
Server:
|
||
Version: swarm/1.0.1
|
||
API version: 1.21
|
||
Go version: go1.5.2
|
||
Git commit: 744e3a3
|
||
Built:
|
||
OS/Arch: linux/amd64
|
||
```
|
||
|
||
---
|
||
|
||
## `docker info`
|
||
|
||
Swarm gives cluster information, showing all nodes:
|
||
|
||
```
|
||
Containers: 3
|
||
Images: 6
|
||
Role: primary
|
||
Strategy: spread
|
||
Filters: affinity, health, constraint, port, dependency
|
||
Nodes: 1
|
||
node: 52.89.117.68:2376
|
||
└ Containers: 3
|
||
└ Reserved CPUs: 0 / 2
|
||
└ Reserved Memory: 0 B / 3.86 GiB
|
||
└ Labels: executiondriver=native-0.2,
|
||
kernelversion=3.13.0-53-generic,
|
||
operatingsystem=Ubuntu 14.04.2 LTS,
|
||
provider=generic, storagedriver=aufs
|
||
CPUs: 2
|
||
Total Memory: 3.86 GiB
|
||
Name: 2ec2e6c4054e
|
||
```
|
||
|
||
---
|
||
|
||
## `docker ps`
|
||
|
||
- This one should show nothing at this point.
|
||
|
||
- The Swarm containers are hidden.
|
||
|
||
- This avoids unneeded pollution.
|
||
|
||
- This also avoids killing them by mistake.
|
||
|
||
---
|
||
|
||
## Add other nodes to the cluster
|
||
|
||
- Let's use *almost* the same command line
|
||
<br/>(but without `--swarm-master`)
|
||
|
||
.exercise[
|
||
|
||
- Stay on `node1` (it has keys and certificates now!)
|
||
|
||
- Add another node with Docker Machine
|
||
|
||
.small[
|
||
```
|
||
docker-machine create --driver generic \
|
||
--swarm --swarm-discovery token://$TOKEN \
|
||
--generic-ssh-user docker --generic-ip-address 1.2.3.4 node2
|
||
```
|
||
]
|
||
]
|
||
|
||
Remember to update the IP address correctly.
|
||
|
||
Repeat for all 4 nodes.
|
||
|
||
Pro tip: look for name/address mapping in `/etc/hosts`.
|
||
|
||
---
|
||
|
||
## Scripting
|
||
|
||
To help you a little bit:
|
||
|
||
```
|
||
grep 'node[2345]' /etc/hosts | grep -v ^127 |
|
||
while read IPADDR NODENAME
|
||
do docker-machine create --driver generic \
|
||
--swarm --swarm-discovery token://$TOKEN \
|
||
--generic-ssh-user docker \
|
||
--generic-ip-address $IPADDR $NODENAME
|
||
done
|
||
```
|
||
|
||
---
|
||
|
||
## Running containers on Swarm
|
||
|
||
Try to run a few `busybox` containers.
|
||
|
||
Then, let's get serious:
|
||
|
||
.exercise[
|
||
|
||
- Start a Redis service:
|
||
<br/>`docker run -dP redis`
|
||
|
||
- See the service address:
|
||
<br/>`docker port $(docker ps -lq) 6379`
|
||
|
||
]
|
||
|
||
This can be any of your five nodes.
|
||
|
||
---
|
||
|
||
# Building our app on Swarm
|
||
|
||
Before trying to build our app, we will remove previous images.
|
||
|
||
.exercise[
|
||
|
||
- Delete all images with "dockercoins" in the name:
|
||
|
||
```
|
||
docker images |
|
||
grep dockercoins |
|
||
awk '{print $1}' |
|
||
xargs -r docker rmi -f
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Building our app on Swarm
|
||
|
||
- Compose now supports builds on Swarm
|
||
<br/>(older versions would crash)
|
||
|
||
.exercise[
|
||
|
||
- Run `docker-compose build`
|
||
|
||
- Scale a few containers:
|
||
```
|
||
docker-compose scale worker=10
|
||
docker-compose scale webui=2
|
||
```
|
||
|
||
- Check where the containers are running with `docker ps`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Caveats when building with Swarm
|
||
|
||
- Containers are only scheduled where they were built
|
||
|
||
- cause: images are not present on all nodes
|
||
|
||
- solution: distribute images through a registry
|
||
<br/>(e.g. Docker Hub)
|
||
|
||
- You can end up with inconsistent versions
|
||
<br/>(i.e. `dockercoins_rng:latest` being different on two nodes)
|
||
|
||
- cause: build nodes can come and go
|
||
|
||
- solution: always pin builds to the same node
|
||
|
||
- Also, caching doesn't work all the time
|
||
|
||
---
|
||
|
||
## Why can't Swarm do this automatically for us?
|
||
|
||
- Let's step back and think for a minute ...
|
||
|
||
- What should `docker build` do on Swarm?
|
||
|
||
- build on one machine
|
||
|
||
- build everywhere ($$$)
|
||
|
||
- After the build, what should `docker run` do?
|
||
|
||
- run where we built (how do we know where it is?)
|
||
|
||
- run on any machine that has the image
|
||
|
||
- Could Compose+Swarm solve this automatically?
|
||
|
||
---
|
||
|
||
## A few words about "sane defaults"
|
||
|
||
- *It would be nice if Swarm could pick a node, and build there!*
|
||
|
||
- but which node should it pick?
|
||
- what if the build is very expensive?
|
||
- what if we want to distribute the build across nodes?
|
||
- what if we want to tag some builder nodes?
|
||
- ok but what if no node has been tagged?
|
||
|
||
- *It would be nice if Swarm could automatically push images!*
|
||
|
||
- using the Docker Hub is an easy choice
|
||
<br/>(you just need an account)
|
||
- but some of us can't/won't use Docker Hub
|
||
<br/>(for compliance reasons or because no network access)
|
||
|
||
.small[("Sane" defaults are nice only if we agree on the definition of "sane")]
|
||
|
||
---
|
||
|
||
## The plan
|
||
|
||
- Build locally
|
||
|
||
- Tag images
|
||
|
||
- Upload them to the hub
|
||
<br/>(Note: this part requires a Docker Hub account!)
|
||
|
||
- Update the Compose file to use those images
|
||
|
||
*That's the purpose of the `build-tag-push.py` script!*
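
For a single service, the first three steps can be sketched by hand as follows. This is not a description of what `build-tag-push.py` does internally: the `latest` tag, the `dockercoins_` prefix, and the `build_and_push` name are assumptions made for illustration, and updating the Compose file is left out.

```shell
# Sketch: build one service, tag it under a Docker Hub user, and push it.
# $1 = Docker Hub user, $2 = service directory (for example rng).
build_and_push() {
  image="$1/dockercoins_$2:latest"
  # "docker build -t" builds and tags in a single step.
  docker build -t "$image" "$2" || return 1
  docker push "$image"
}

# Usage (from the dockercoins directory): build_and_push jpetazzo rng
```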
|
||
|
||
---
|
||
|
||
## Docker Hub account
|
||
|
||
- You need a Docker Hub account for that part
|
||
|
||
- If you don't have one, create it
|
||
|
||
.exercise[
|
||
|
||
- Set the following environment variable:
|
||
|
||
```
|
||
export DOCKERHUB_USER=jpetazzo
|
||
```
|
||
|
||
- (Use *your* Docker Hub login, of course!)
|
||
|
||
- Log into the Docker Hub:
|
||
|
||
```
|
||
docker login
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Build, Tag, And Push
|
||
|
||
Let's inspect the source code of `build-tag-push.py` and run it.
|
||
|
||
.icon[] It is better to run it against a single node!
|
||
|
||
(There are some race conditions within Swarm when building+pushing too fast.)
|
||
|
||
.exercise[
|
||
|
||
- Point to a single node:
|
||
<br/>`eval $(docker-machine env node1)`
|
||
|
||
- Run the script (from the `dockercoins` directory):
|
||
<br/>`../build-tag-push.py`
|
||
|
||
- Inspect the `docker-compose.yml-XXX` file that it created
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Can we run this now?
|
||
|
||
Let's try!
|
||
|
||
.exercise[
|
||
|
||
- Switch back to the Swarm cluster:
|
||
<br/>`eval $(docker-machine env node1 --swarm)`
|
||
|
||
- Protip - set the `COMPOSE_FILE` variable:
|
||
<br/>`export COMPOSE_FILE=docker-compose.yml-XXX`
|
||
|
||
- Bring up the application:
|
||
<br/>`docker-compose up`
|
||
|
||
]
|
||
|
||
--
|
||
|
||
It won't work, because Compose and Swarm do not collaborate
|
||
to establish *placement constraints*.
|
||
|
||
--
|
||
|
||
(╯°□°)╯︵ ┻━┻
|
||
|
||
---
|
||
|
||
## Simple container dependencies
|
||
|
||
- Container A has a link to container B
|
||
|
||
- Compose starts B first, then A
|
||
|
||
- Swarm translates the link into a placement constraint:
|
||
|
||
- *"put A on the same node as B"*
|
||
|
||
- Alles gut
|
||
|
||
---
|
||
|
||
## Complex container dependencies
|
||
|
||
- Container A has a link to containers B and C
|
||
|
||
- Compose starts B and C first
|
||
<br/>(but that can be on different nodes!)
|
||
|
||
- Compose starts A
|
||
|
||
- Swarm translates the links into placement constraints:
|
||
|
||
- *"put A on the same node as B"*
|
||
- *"put A on the same node as C"*
|
||
|
||
- If B and C are on different nodes, that's impossible
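
To see why this is impossible, here is a toy model of the constraint logic. It is only a sketch of the reasoning, not Swarm's scheduler: the function name `candidate_nodes` and the node names are made up.

```python
def candidate_nodes(links, placement, all_nodes):
    """Return the nodes where a container may run when every link
    becomes a "same node as the target" constraint.

    links:     names of the containers we link to
    placement: {container name: node where it already runs}
    """
    candidates = set(all_nodes)
    for target in links:
        # Each link narrows the choice down to the target's node.
        candidates &= {placement[target]}
    return candidates


nodes = ["node1", "node2"]
# Simple case: A links to B only.
print(candidate_nodes(["B"], {"B": "node1"}, nodes))
# Complex case: B and C landed on different nodes.
print(candidate_nodes(["B", "C"], {"B": "node1", "C": "node2"}, nodes))
```

In the second case the set of candidates is empty: no node satisfies both constraints.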
|
||
|
||
So, what do‽
|
||
|
||
---
|
||
|
||
## A word on placement constraints
|
||
|
||
- Swarm supports constraints
|
||
|
||
- We could tell Swarm to put all our containers together
|
||
|
||
- Linking would work
|
||
|
||
- But all containers would end up on the same node
|
||
|
||
--
|
||
|
||
- So having a cluster would be pointless!
|
||
|
||
---
|
||
|
||
# Network plumbing on Swarm
|
||
|
||
- We will use one-tier, dynamic ambassadors
|
||
<br/>(as seen before)
|
||
|
||
- Other available options:
|
||
|
||
- injecting service addresses in environment variables
|
||
|
||
- implementing service discovery in the application
|
||
|
||
- use an overlay network
|
||
|
||
---
|
||
|
||
## Revisiting `jpetazzo/hamba`
|
||
|
||
- Configuration is stored in a *volume*
|
||
|
||
- A watcher process looks for configuration updates,
|
||
<br/>and restarts HAProxy when needed
|
||
|
||
- It can be started without configuration:
|
||
|
||
```
|
||
docker run --name amba jpetazzo/hamba run
|
||
```
|
||
|
||
- There is a helper to inject a new configuration:
|
||
|
||
```
|
||
docker run --rm --volumes-from amba jpetazzo/hamba \
|
||
reconfigure 80 backend1 port1 backend2 port2 ...
|
||
```
|
||
|
||
.footnote[Note: configuration validation and error messages
|
||
will be logged by the ambassador, not the `reconfigure` container.]
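
If you drive the helper from a script, the command line above can be assembled programmatically. This is a hedged sketch: `reconfigure_command` is a hypothetical name, and the backend addresses are placeholders.

```python
def reconfigure_command(ambassador, frontend_port, backends):
    """Build the argument list for the `reconfigure` helper.

    backends is a list of (address, port) pairs; they are appended
    as "backend1 port1 backend2 port2 ...".
    """
    argv = ["docker", "run", "--rm", "--volumes-from", ambassador,
            "jpetazzo/hamba", "reconfigure", str(frontend_port)]
    for address, port in backends:
        argv += [address, str(port)]
    return argv


# Placeholder backends (documentation addresses).
print(" ".join(reconfigure_command(
    "amba", 80, [("192.0.2.10", 8001), ("192.0.2.11", 8001)])))
```

The resulting list can be passed as-is to `subprocess.call`.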
|
||
|
||
---
|
||
|
||
## Should we use `links` for our ambassadors?
|
||
|
||
Technically, we could use links.
|
||
|
||
- Before starting an app container:
|
||
|
||
start the ambassador(s) it needs
|
||
|
||
- When starting an app container:
|
||
|
||
link it to its ambassador(s)
|
||
|
||
But we wouldn't be able to use `docker-compose scale` anymore.
|
||
|
||
---
|
||
|
||
## Network namespaces and `extra_hosts`
|
||
|
||
This is our plan:
|
||
|
||
- Replace each `link` with an `extra_host`,
|
||
<br/>pointing to the `127.127.X.X` address space
|
||
|
||
- Start app containers normally
|
||
<br/>(`docker-compose up`, `docker-compose scale`)
|
||
|
||
- Start ambassadors after app containers are up:
|
||
|
||
- ambassadors bind to `127.127.X.X`
|
||
|
||
- they share their client's network namespace
|
||
|
||
- Reconfigure ambassadors each time something changes
|
||
|
||
---
|
||
|
||
## Our plan for service discovery
|
||
|
||
- Replace all `links` with static `/etc/hosts` entries
|
||
|
||
- Those entries will map to `127.127.0.X`
|
||
<br/>(with different `X` for each service)
|
||
|
||
- Example: `redis` will point to `127.127.0.2`
|
||
<br/>(instead of a container address)
|
||
|
||
- Start all services; scale them if we want
|
||
<br/>(at this point, they will all fail to connect)
|
||
|
||
- Start ambassadors in the services' namespace;
|
||
<br/>each ambassador will listen on the right `127.127.0.X`
|
||
|
||
- Gather all backend addresses and configure ambassadors
|
||
|
||
.icon[] Services should try to reconnect!
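
The first two steps (turning `links` into static host entries) can be sketched as follows. This is an illustration only, not the workshop script: the function name and the allocation order (sorted service names, starting at `.2`) are assumptions.

```python
import copy


def links_to_extra_hosts(services):
    """Replace each service's `links` with `extra_hosts` entries.

    Every linked service gets one loopback address in 127.127.0.X.
    The allocation order (sorted names, starting at .2) is an
    assumption of this sketch.
    """
    targets = sorted({link.split(":")[0]
                      for service in services.values()
                      for link in service.get("links", [])})
    addresses = {name: "127.127.0.{}".format(index)
                 for index, name in enumerate(targets, start=2)}
    result = copy.deepcopy(services)
    for service in result.values():
        links = service.pop("links", [])
        if links:
            service["extra_hosts"] = [
                # "name:alias" links keep the alias as the host name.
                "{}:{}".format(link.split(":")[-1],
                               addresses[link.split(":")[0]])
                for link in links
            ]
    return result


services = {
    "worker": {"links": ["redis", "rng"]},
    "redis": {},
    "rng": {},
}
print(links_to_extra_hosts(services)["worker"]["extra_hosts"])
```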
|
||
|
||
---
|
||
|
||
## "Design for failure," they said
|
||
|
||
- When the containers are started, the network is not ready
|
||
|
||
- First connection attempts **will fail**
|
||
|
||
- App should try to reconnect
|
||
|
||
- It is OK to crash and restart
|
||
|
||
- Exponential back-off is nice
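
A reconnection loop with exponential back-off can be as small as this. It is a generic sketch, not code from the DockerCoins app: `connect_with_backoff` and its parameters are hypothetical names.

```python
import time


def connect_with_backoff(connect, attempts=5, base_delay=0.5,
                         sleep=time.sleep):
    """Call `connect()` until it succeeds, doubling the wait after
    each failure. Re-raises the last error when attempts run out."""
    delay = base_delay
    for attempt in range(attempts):
        try:
            return connect()
        except OSError:
            if attempt == attempts - 1:
                raise  # Give up: crashing and restarting is OK too.
            sleep(delay)
            delay *= 2
```

`connect` can be any callable that raises `OSError` (for instance `ConnectionRefusedError`) while the network is not ready.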
|
||
|
||
---
|
||
|
||
## Our tools
|
||
|
||
- `link-to-ambassadors.py`
|
||
|
||
- replaces all `links` with `extra_hosts` entries
|
||
|
||
- `create-ambassadors.py`
|
||
|
||
- scans running containers
|
||
- allocates `127.127.X.X` addresses
|
||
- starts (unconfigured) ambassadors
|
||
|
||
- `configure-ambassadors.py`
|
||
|
||
- scans running containers
|
||
- gathers backend addresses
|
||
- sends configuration to ambassadors
|
||
|
||
---
|
||
|
||
## Convert links to ambassadors
|
||
|
||
- When we ran `build-tag-push.py` earlier,
|
||
<br/>it generated a new `docker-compose.yml-XXX` file.
|
||
|
||
.exercise[
|
||
|
||
- Run the first script to create a new YAML file:
|
||
<br/>`../link-to-ambassadors.py $COMPOSE_FILE new.yml`
|
||
|
||
- Look at how the file was modified:
|
||
<br/>`diff $COMPOSE_FILE new.yml`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Change `$COMPOSE_FILE` in place
|
||
|
||
The script can take zero, one, or two file name arguments:
|
||
|
||
- two arguments indicate input and output files to use;
|
||
- with one argument, the file will be modified in place;
|
||
- with zero arguments, it will act on `$COMPOSE_FILE`.
|
||
|
||
For convenience, let's avoid having a bazillion files around.
|
||
|
||
.exercise[
|
||
|
||
- Remove the temporary Compose file we just created:
|
||
<br/>`rm -f new.yml`
|
||
|
||
- Update `$COMPOSE_FILE` in place:
|
||
<br/>`../link-to-ambassadors.py`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Bring up the application
|
||
|
||
The application can now be started and scaled.
|
||
|
||
.exercise[
|
||
|
||
- Start the application:
|
||
<br/>`docker-compose up -d`
|
||
|
||
- Scale the application:
|
||
<br/>`docker-compose scale worker=5 rng=10`
|
||
|
||
]
|
||
|
||
Note: you can scale everything as you like, *except Redis*,
|
||
because it is stateful.
|
||
|
||
---
|
||
|
||
## Create the ambassadors
|
||
|
||
This has to be executed each time you create new services
|
||
or scale up existing ones.
|
||
|
||
After reading `$COMPOSE_FILE`, it will scan running containers, and compare:
|
||
|
||
- the list of app containers,
|
||
- the list of ambassadors.
|
||
|
||
It will create missing ambassadors.
|
||
|
||
.exercise[
|
||
|
||
- Run the script!
|
||
<br/>`../create-ambassadors.py`
|
||
|
||
]
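
The comparison described above boils down to a set difference. Here is a sketch under assumptions: the data shapes and the name `missing_ambassadors` are made up for illustration, and the real script may track things differently.

```python
def missing_ambassadors(app_containers, ambassadors):
    """Return the (container, service) pairs that need an ambassador.

    app_containers: {container name: [services it talks to]}
    ambassadors:    (container, service) pairs already running
    """
    wanted = {(container, service)
              for container, services in app_containers.items()
              for service in services}
    return sorted(wanted - set(ambassadors))


apps = {"worker_1": ["redis", "rng"], "worker_2": ["redis", "rng"]}
running = [("worker_1", "redis"), ("worker_1", "rng")]
print(missing_ambassadors(apps, running))
```

Running it again after scaling only reports the ambassadors for the new containers.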
|
||
|
||
---
|
||
|
||
## Configure the ambassadors
|
||
|
||
All ambassadors are created but they still need configuration.
|
||
|
||
That's the purpose of the last script.
|
||
|
||
It will read `$COMPOSE_FILE` and gather:
|
||
|
||
- the list of app backends,
|
||
- the list of ambassadors.
|
||
|
||
Then it configures all ambassadors with all found backends.
|
||
|
||
.exercise[
|
||
|
||
- Run it!
|
||
<br/>`../configure-ambassadors.py`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Check what we did
|
||
|
||
.exercise[
|
||
|
||
|
||
- Find out the address of the web UI:
|
||
<br/>`docker-compose ps webui`
|
||
|
||
- Point your browser to it
|
||
|
||
- Check the logs:
|
||
<br/>`docker-compose logs`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
# Going further
|
||
|
||
Scaling the application (difficulty: easy)
|
||
|
||
- Run `docker-compose scale`
|
||
|
||
- Re-create ambassadors
|
||
|
||
- Re-configure ambassadors
|
||
|
||
- No downtime
|
||
|
||
---
|
||
|
||
## Going further
|
||
|
||
Deploying a new version (difficulty: easy)
|
||
|
||
- Just re-run all the steps!
|
||
|
||
- However, Compose will re-create the containers
|
||
|
||
- You will have to re-create ambassadors
|
||
<br/>(and configure them)
|
||
|
||
- You will have to clean up old ambassadors
|
||
<br/>(left as an exercise for the reader)
|
||
|
||
- You will experience a little bit of downtime
|
||
|
||
---
|
||
|
||
## Going further
|
||
|
||
Zero-downtime deployment (difficulty: medium)
|
||
|
||
- Isolate stateful services
|
||
<br/>(like we did earlier for Redis)
|
||
|
||
- Do blue/green deployment:
|
||
|
||
- deploy and scale version N
|
||
|
||
- point a "top-level" load balancer to the app
|
||
|
||
- deploy and scale version N+1
|
||
|
||
- put both apps in the "top-level" balancer
|
||
|
||
- slowly switch traffic over to app version N+1
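
"Slowly switch traffic" usually means changing load balancer weights step by step. A sketch of such a schedule, with a hypothetical function name and an arbitrary number of steps:

```python
def traffic_schedule(steps):
    """Yield (percent to version N, percent to version N+1) pairs,
    moving all traffic to N+1 in `steps` equal increments."""
    for i in range(steps + 1):
        new = 100 * i // steps
        yield 100 - new, new


for old, new in traffic_schedule(4):
    print("version N: {}%  version N+1: {}%".format(old, new))
```

Each pair would be applied to the "top-level" balancer, with a pause to watch error rates before moving to the next one.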
|
||
|
||
---
|
||
|
||
## Going further
|
||
|
||
Use the new networking features (difficulty: medium)
|
||
|
||
- Create a key/value store (e.g. Consul cluster)
|
||
|
||
- Reconfigure all Engines to use the key/value store
|
||
|
||
- Load balancers can use DNS for backend discovery
|
||
|
||
Note: this is really easy to do with a 1-node Consul cluster.
|
||
|
||
---
|
||
|
||
## Going further
|
||
|
||
Harder projects:
|
||
|
||
- Two-tier or three-tier ambassador deployments
|
||
|
||
- Deploy to Mesos or Kubernetes
|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||

|
||
|
||
---
|
||
|
||
# Here be dragons
|
||
|
||
- So far, we've used stable products (versions 1.X)
|
||
|
||
- We're going to explore experimental software
|
||
|
||
- **Use at your own risk**
|
||
|
||
---
|
||
|
||
# Setting up Consul and overlay networks
|
||
|
||
- We will reconfigure our Swarm cluster to enable overlays
|
||
|
||
- We will deploy a Consul cluster
|
||
|
||
- We will connect containers running on different machines
|
||
|
||
---
|
||
|
||
## First, let's Clean All The Things!
|
||
|
||
- We need to remove the old containers
|
||
<br/>(in particular the `swarm` agents and managers)
|
||
|
||
.exercise[
|
||
|
||
- The following snippet will nuke all containers on all hosts:
|
||
|
||
```
|
||
for N in 1 2 3 4 5
|
||
do
|
||
ssh node$N "docker ps -qa | xargs -r docker rm -f"
|
||
done
|
||
```
|
||
|
||
(If it asks you to confirm SSH keys, just do it!)
|
||
|
||
]
|
||
|
||
Note: our Swarm cluster is now broken.
|
||
|
||
---
|
||
|
||
## Remove old Machine information
|
||
|
||
- We will use `docker-machine rm`
|
||
|
||
- With the `generic` driver, this doesn't do anything
|
||
<br/>(it just deletes local configuration)
|
||
|
||
- With cloud/VM drivers, this would actually delete VMs
|
||
|
||
.exercise[
|
||
|
||
- Remove our nodes from Docker Machine config database:
|
||
|
||
```
|
||
for N in 1 2 3 4 5
|
||
do
|
||
docker-machine rm -f node$N
|
||
done
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Add extra options to our Engines
|
||
|
||
- We need two new options for our engines:
|
||
|
||
- `cluster-store` (to indicate which key/value store to use)
|
||
|
||
- `cluster-advertise` (to indicate which IP address to register)
|
||
|
||
- `cluster-store` will be `consul://localhost:8500`
|
||
<br/>(we will run one Consul node on each machine)
|
||
|
||
- `cluster-advertise` will be `eth0:2376`
|
||
<br/>(Engine will automatically pick up eth0's IP address)
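
To make the role of each option explicit, here is how a provisioning script could assemble the `docker-machine create` command line. This is a sketch: the function name is hypothetical, the IP address is a placeholder, and only flags used in this workshop appear.

```python
def machine_create_args(name, ip, master=False,
                        store="consul://localhost:8500",
                        advertise="eth0:2376"):
    """Build a `docker-machine create` argument list for one node."""
    args = ["docker-machine", "create", "--driver", "generic",
            # Which key/value store the Engine should use.
            "--engine-opt", "cluster-store=" + store,
            # Which address the Engine registers in that store.
            "--engine-opt", "cluster-advertise=" + advertise,
            "--swarm"]
    if master:
        args.append("--swarm-master")
    args += ["--swarm-discovery", store,
             "--generic-ssh-user", "docker",
             "--generic-ip-address", ip, name]
    return args


# 192.0.2.10 is a documentation address, not a real node.
print(" ".join(machine_create_args("node1", "192.0.2.10", master=True)))
```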
|
||
|
||
---
|
||
|
||
## Reconfiguring Swarm clusters, the Docker way
|
||
|
||
- The traditional way to reconfigure a service is to edit
|
||
its configuration (or init script), then restart
|
||
|
||
- We can use Machine to make that easier
|
||
|
||
- Re-deploying with Machine's `generic` driver will reconfigure
|
||
Engines with the new parameters
|
||
|
||
.exercise[
|
||
|
||
- Re-provision the manager node:
|
||
|
||
.small[
|
||
```
|
||
docker-machine create --driver generic \
|
||
--engine-opt cluster-store=consul://localhost:8500 \
|
||
--engine-opt cluster-advertise=eth0:2376 \
|
||
--swarm --swarm-master --swarm-discovery consul://localhost:8500 \
|
||
--generic-ssh-user docker --generic-ip-address 52.32.216.30 node1
|
||
```
|
||
]
|
||
]
|
||
|
||
---
|
||
|
||
## Reconfigure the other nodes
|
||
|
||
- Once again, scripting to the rescue!
|
||
|
||
.exercise[
|
||
|
||
```
|
||
grep 'node[2345]' /etc/hosts | grep -v ^127 |
|
||
while read IPADDR NODENAME
|
||
do docker-machine create --driver generic \
|
||
--engine-opt cluster-store=consul://localhost:8500 \
|
||
--engine-opt cluster-advertise=eth0:2376 \
|
||
--swarm --swarm-discovery consul://localhost:8500 \
|
||
--generic-ssh-user docker \
|
||
--generic-ip-address $IPADDR $NODENAME
|
||
done
|
||
```
|
||
|
||
]
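
The `grep | while read` pipeline extracts (address, name) pairs from `/etc/hosts`. The same extraction in Python could look like this. It is a sketch: `workshop_nodes` is a hypothetical name, the sample file content is invented, and it matches names exactly, which is slightly stricter than the `grep` pattern.

```python
def workshop_nodes(hosts_text, names):
    """Return (ip, name) pairs for the wanted host names,
    skipping loopback (127.*) entries and comments."""
    nodes = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2 or fields[0].startswith("127"):
            continue
        for name in fields[1:]:
            if name in names:
                nodes.append((fields[0], name))
    return nodes


# Invented sample, using documentation addresses.
SAMPLE_HOSTS = """\
127.0.0.1 localhost node1
192.0.2.12 node2
192.0.2.13 node3  # third node
"""
print(workshop_nodes(SAMPLE_HOSTS, {"node2", "node3"}))
```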
|
||
|
||
---
|
||
|
||
## Checking what we did
|
||
|
||
.exercise[
|
||
|
||
- Directly point the CLI to a node and check configuration:
|
||
|
||
```
|
||
eval $(docker-machine env node1)
|
||
docker info
|
||
```
|
||
|
||
(should show `Cluster store` and `Cluster advertise`)
|
||
|
||
- Try to talk to the Swarm cluster:
|
||
|
||
```
|
||
eval $(docker-machine env node1 --swarm)
|
||
docker info
|
||
```
|
||
|
||
(should show zero nodes)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Why zero nodes?
|
||
|
||
- We haven't started Consul yet
|
||
|
||
- Swarm discovery is not operational
|
||
|
||
- Swarm can't discover the nodes
|
||
|
||
Note: good guy ~~Stevedore~~ Docker will start without K/V
|
||
|
||
(This lets us run Consul itself in a container!)
|
||
|
||
---
|
||
|
||
## Adding Consul
|
||
|
||
- We will run Consul in containers
|
||
|
||
- We will use a
|
||
[custom consul image](https://hub.docker.com/r/jpetazzo/consul/)
|
||
|
||
- We will tell Docker to automatically restart it on reboots
|
||
|
||
- To simplify network setup, we will use `host` networking
|
||
|
||
---
|
||
|
||
## Starting the first Consul node
|
||
|
||
.exercise[
|
||
|
||
- Log into `node1`
|
||
|
||
- The first node must be started with the `-bootstrap` flag:
|
||
|
||
```
|
||
CID=$(docker run --name consul_node1 \
|
||
-d --restart=always --net host \
|
||
jpetazzo/consul agent -server -bootstrap)
|
||
```
|
||
|
||
- Find the internal IP address of that node
|
||
<br/>With This One Weird Trick:
|
||
|
||
```
|
||
IPADDR=$(ip a ls dev eth0 |
|
||
sed -n 's,.*inet \(.*\)/.*,\1,p')
|
||
```
|
||
|
||
]
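
The `sed` expression keeps whatever sits between `inet ` and the `/` of the prefix length. An equivalent extraction in Python, for readers who find the regular expression easier to follow this way (the function name and the sample output are made up):

```python
import re


def first_ipv4(ip_output):
    """Return the first IPv4 address found in `ip addr` output,
    or None when there is none."""
    match = re.search(r"\binet (\d+\.\d+\.\d+\.\d+)/", ip_output)
    return match.group(1) if match else None


# Invented sample output; real output has more fields.
SAMPLE = """\
2: eth0: mtu 9001 state UP
    link/ether 0a:00:00:00:00:01 brd ff:ff:ff:ff:ff:ff
    inet 172.31.0.5/20 brd 172.31.15.255 scope global eth0
    inet6 fe80::800:ff:fe00:1/64 scope link
"""
print(first_ipv4(SAMPLE))
```

The `inet6` line is ignored because the pattern requires a space right after `inet`.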
|
||
|
||
---
|
||
|
||
## Starting the other Consul nodes
|
||
|
||
.exercise[
|
||
|
||
- The other nodes have to be started with the `-join IP.AD.DR.ESS` flag:
|
||
|
||
```
|
||
for N in 2 3 4 5; do
|
||
ssh node$N docker run --name consul_node$N \
|
||
-d --restart=always --net host \
|
||
jpetazzo/consul agent -server -join $IPADDR
|
||
done
|
||
```
|
||
|
||
- With your browser, navigate to any instance on port 8500
|
||
<br/>(in "NODES" you should see the five nodes)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Check that our Consul cluster is up
|
||
|
||
- Let's run a couple of useful Consul commands
|
||
|
||
.exercise[
|
||
|
||
- Ask Consul for the list of members it knows:
|
||
```
|
||
docker run --net host --rm jpetazzo/consul members
|
||
```
|
||
|
||
- Ask Consul which node is the current leader:
|
||
```
|
||
curl localhost:8500/v1/status/leader
|
||
```
|
||
|
||
]
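
If you want to use the leader information in a script, the response can be parsed like this. The sketch assumes that the endpoint returns a JSON-encoded `"address:port"` string, and an empty string when there is no leader; `parse_leader` is a hypothetical name.

```python
import json


def parse_leader(body):
    """Parse the body returned by /v1/status/leader.

    Returns (host, port), or None when no leader is elected.
    The response format is an assumption of this sketch.
    """
    leader = json.loads(body)
    if not leader:
        return None
    host, port = leader.rsplit(":", 1)
    return host, int(port)


# Invented response body.
print(parse_leader('"172.31.0.5:8300"'))
```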
|
||
|
||
---
|
||
|
||
## Check that our Swarm cluster is up
|
||
|
||
.exercise[
|
||
|
||
- Try the `docker info` command from earlier again:
|
||
|
||
```
|
||
eval $(docker-machine env --swarm node1)
|
||
docker info
|
||
```
|
||
|
||
- Now all nodes should be visible
|
||
<br/>(Give them a minute or two to register)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
# Multi-host networking
|
||
|
||
- Docker 1.9 has the concept of *networks*
|
||
|
||
- By default, containers are on the default "bridge" network
|
||
|
||
- You can create additional networks
|
||
|
||
- Containers can be on multiple networks
|
||
|
||
- Containers can dynamically join/leave networks
|
||
|
||
- The "overlay" driver lets networks span multiple hosts
|
||
|
||
- Let's see that in action!
|
||
|
||
---
|
||
|
||
## Create a few networks and containers
|
||
|
||
.exercise[
|
||
|
||
```
|
||
docker network create --driver overlay jedi
|
||
docker network create --driver overlay darkside
|
||
docker network ls
|
||
```
|
||
|
||
]
|
||
|
||
--
|
||
|
||
(Don't worry, there won't be any spoiler here, I have
|
||
been so busy preparing this workshop that I haven't
|
||
seen the new movie yet!)
|
||
|
||
--
|
||
|
||
.exercise[
|
||
|
||
```
|
||
docker run -d --name luke --net jedi -m 3G redis
|
||
docker run -d --name vador --net jedi -m 3G redis
|
||
docker run -d --name palpatine --net darkside -m 3G redis
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Check connectivity within networks
|
||
|
||
.exercise[
|
||
|
||
- Check that our containers are on different networks:
|
||
|
||
```
|
||
docker ps
|
||
```
|
||
|
||
- This will work:
|
||
|
||
```
|
||
docker exec -ti vador ping luke
|
||
```
|
||
|
||
- This will not:
|
||
|
||
```
|
||
docker exec -ti vador ping palpatine
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Dynamically connect containers
|
||
|
||
.exercise[
|
||
|
||
- ~~Connect `vador` to the `darkside`:~~
|
||
- To the `darkside`, connect `vador` we must:
|
||
|
||
```
|
||
docker network connect darkside vador
|
||
```
|
||
|
||
- Now this will work:
|
||
|
||
```
|
||
docker exec -ti vador ping palpatine
|
||
```
|
||
|
||
- Take a peek inside `vador`:
|
||
|
||
```
|
||
docker exec -ti vador ip addr ls
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Dynamically disconnecting containers
|
||
|
||
.exercise[
|
||
|
||
- This works, right:
|
||
|
||
```
|
||
docker exec -ti vador ping luke
|
||
```
|
||
|
||
- Let's disconnect `vador` from the `jedi` ~~order~~ network:
|
||
|
||
```
|
||
docker network disconnect jedi vador
|
||
```
|
||
|
||
- And now:
|
||
|
||
```
|
||
docker exec -ti vador ping luke
|
||
```
|
||
|
||
]
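
The rule at play in the last few slides: two containers can talk to each other if (and only if) they share at least one network. Here is a toy model of that rule. It is purely illustrative and has nothing to do with how the overlay driver is implemented.

```python
class Networks:
    """Toy model: which containers are attached to which network."""

    def __init__(self):
        self.members = {}

    def connect(self, network, container):
        self.members.setdefault(network, set()).add(container)

    def disconnect(self, network, container):
        self.members.get(network, set()).discard(container)

    def can_reach(self, a, b):
        # Reachable when at least one network contains both.
        return any(a in m and b in m for m in self.members.values())


net = Networks()
net.connect("jedi", "luke")
net.connect("jedi", "vador")
net.connect("darkside", "palpatine")
print(net.can_reach("vador", "palpatine"))   # different networks
net.connect("darkside", "vador")
print(net.can_reach("vador", "palpatine"))   # now on darkside too
net.disconnect("jedi", "vador")
print(net.can_reach("vador", "luke"))        # vador left jedi
```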
|
||
|
||
---
|
||
|
||
## Cleaning up
|
||
|
||
.exercise[
|
||
|
||
- Destroy containers:
|
||
|
||
```
|
||
docker rm -f luke vador palpatine
|
||
```
|
||
|
||
- Destroy networks:
|
||
|
||
```
|
||
docker network rm jedi
|
||
docker network rm darkside
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
# Compose and multi-host networking
|
||
|
||
.icon[] Here be 7-headed flame-throwing hydras!
|
||
|
||
- This is super experimental
|
||
|
||
- Your cluster is likely to blow up to bits
|
||
|
||
- Situation is much better in Engine 1.10 and Compose 1.6
|
||
<br/>(currently in RC; to be released circa February 2016!)
|
||
|
||
---
|
||
|
||
## Revisiting DockerCoins
|
||
|
||
.exercise[
|
||
|
||
- Go back to the `dockercoins` app:
|
||
|
||
```
|
||
cd ~/orchestration-workshop/dockercoins
|
||
```
|
||
|
||
- Re-execute `build-tag-push` to get a fresh Compose file:
|
||
|
||
```
|
||
eval $(docker-machine env -u)
|
||
../build-tag-push.py
|
||
export COMPOSE_FILE=docker-compose.yml-XXX
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Add `container_name` to Compose file
|
||
|
||
.exercise[
|
||
|
||
- Edit the Compose file
|
||
|
||
- In the `hasher`, `rng`, and `redis` sections, add:
|
||
<br/>`container_name: XXX`
|
||
<br/>(where XXX is the name of the section)
|
||
|
||
- Also, comment out the `volumes` section
|
||
|
||
]
|
||
|
||
Note: by default, containers will be named `dockercoins_XXX_1`
|
||
(instead of `XXX`) and links will not work.
|
||
|
||
*This is no longer necessary with Compose 1.6!*
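
If you would rather script the edit than do it by hand, it could look like this. This is a sketch working on an already-parsed Compose file: the function name is hypothetical, and dropping `volumes` from every service is an assumption standing in for "comment out the `volumes` section".

```python
import copy


def add_container_names(services, names=("hasher", "rng", "redis")):
    """Set `container_name` to the section name for the given
    services, and drop any `volumes` entry."""
    result = copy.deepcopy(services)
    for name, service in result.items():
        if name in names:
            service["container_name"] = name
        # Assumption: removing the key is equivalent to commenting
        # the section out.
        service.pop("volumes", None)
    return result


services = {
    "rng": {"image": "example/rng"},
    "webui": {"image": "example/webui", "volumes": ["./files:/files"]},
}
print(add_container_names(services))
```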
|
||
|
||
---
|
||
|
||
## Run the app
|
||
|
||
.exercise[
|
||
|
||
- Add two custom experimental flags:
|
||
|
||
```
|
||
docker-compose \
|
||
--x-networking --x-network-driver=overlay \
|
||
up -d
|
||
```
|
||
|
||
- Check the `webui` endpoint address:
|
||
|
||
```
|
||
docker-compose ps webui
|
||
```
|
||
|
||
- Go to the webui with your browser!
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Scale the app
|
||
|
||
.exercise[
|
||
|
||
- Don't forget the custom experimental flags:
|
||
|
||
```
|
||
docker-compose \
|
||
--x-networking --x-network-driver=overlay \
|
||
scale worker=2
|
||
```
|
||
|
||
- Look at the graph in your browser
|
||
|
||
]
|
||
|
||
Note: with Compose 1.6 and Engine 1.10, you can have
|
||
multiple containers with the same DNS name, thus
|
||
achieving "natural" load balancing through DNS round robin.
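
When one name resolves to several containers, a client can also spread requests itself. The sketch below shows that client-side rotation, which is related to (but not the same as) relying on the order of records returned by the DNS server. Function names are hypothetical and the addresses are placeholders.

```python
import itertools
import socket


def resolve_all(name, port):
    """Return every IPv4 address the resolver gives for `name`."""
    infos = socket.getaddrinfo(name, port,
                               socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def round_robin(addresses):
    """Cycle endlessly over the given addresses."""
    return itertools.cycle(addresses)


# Placeholder addresses; in a container on the app network you
# would use something like resolve_all("rng", 80) instead.
picker = round_robin(["192.0.2.21", "192.0.2.22"])
for _ in range(4):
    print(next(picker))
```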
|
||
|
||
---
|
||
|
||
## Cleaning up
|
||
|
||
.exercise[
|
||
|
||
- Terminate containers and remove them:
|
||
|
||
```
|
||
docker-compose kill
|
||
docker-compose rm -f
|
||
```
|
||
|
||
]
|
||
|
||
Note: Compose 1.5 doesn't support changes to an
|
||
existing app (except basic scaling).
|
||
|
||
When trying to do `docker-compose --x-... up` on existing
|
||
apps, you might get errors like this one:
|
||
<br/>.small[`ERROR: unable to find a node that satisfies container==38aac...`]
|
||
|
||
If that happens, just kill+rm the app and try again.
|
||
|
||
---
|
||
|
||
# Highly available Swarm managers
|
||
|
||
- Until now, the Swarm manager was a SPOF
|
||
<br/>(Single Point Of Failure)
|
||
|
||
- Swarm has experimental support for replication
|
||
|
||
- When replication is enabled, you deploy multiple (identical) managers
|
||
|
||
- one will be "primary"
|
||
- the other(s) will be "secondary"
|
||
- this is determined automatically
|
||
<br/>(through *leader election*)
|
||
|
||
---
|
||
|
||
## Swarm leader election
|
||
|
||
- The leader election mechanism relies on a key/value store
|
||
<br/>(consul, etcd, zookeeper)
|
||
|
||
- There is no requirement on the number of replicas
|
||
<br/>(the quorum is achieved through the key/value store)
|
||
|
||
- When the leader (or "primary") is unavailable,
|
||
<br/>a new election happens automatically
|
||
|
||
- You can issue API requests to any manager:
|
||
<br/>if you talk to a secondary, it forwards to the primary
|
||
|
||
.icon[] There is currently a bug when
|
||
the Consul cluster itself has a leader election; see [docker/swarm#1782](
|
||
https://github.com/docker/swarm/issues/1782).
|
||
|
||
---
|
||
|
||
## Swarm replication in practice
|
||
|
||
- We need to give two extra flags to the Swarm manager:
|
||
|
||
- `--replication`
|
||
|
||
*enables replication (duh!)*
|
||
|
||
- `--advertise ip.ad.dr.ess:port`
|
||
|
||
*address and port where this Swarm manager is reachable*
|
||
|
||
- Do you deploy with Docker Machine?
|
||
<br/>Then you can use `--swarm-opt`
|
||
to automatically pass flags to the Swarm manager
|
||
|
||
---
|
||
|
||
## Cleaning up our current Swarm containers
|
||
|
||
- We will use Docker Machine to re-provision Swarm
|
||
|
||
- We need to:
|
||
|
||
- remove the nodes from the Machine registry
|
||
|
||
- remove the Swarm containers
|
||
|
||
.exercise[
|
||
|
||
- Remove the current configuration:
|
||
```
|
||
for N in 1 2 3 4 5; do
|
||
ssh node$N docker rm -f swarm-agent swarm-agent-master
|
||
docker-machine rm -f node$N
|
||
done
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Re-deploy with the new configuration
|
||
|
||
- This time, we can deploy each node identically
|
||
<br/>(instead of 1 manager + 4 non-managers)
|
||
|
||
.exercise[
|
||
|
||
- Deploy all five nodes with the previous options,
|
||
and the new replication options:
|
||
|
||
.small[
|
||
```
|
||
grep 'node[12345]' /etc/hosts | grep -v ^127 |
|
||
while read IPADDR NODENAME; do
|
||
docker-machine create --driver generic \
|
||
--engine-opt cluster-store=consul://localhost:8500 \
|
||
--engine-opt cluster-advertise=eth0:2376 \
|
||
--swarm --swarm-master \
|
||
--swarm-discovery consul://localhost:8500 \
|
||
--swarm-opt replication --swarm-opt advertise=$IPADDR:3376 \
|
||
--generic-ssh-user docker --generic-ip-address $IPADDR $NODENAME
|
||
done
|
||
```
|
||
]
|
||
|
||
]
|
||
|
||
.small[
|
||
Note: Consul is still running thanks to the `--restart=always` policy.
|
||
Other containers are now stopped, because the engines have been
|
||
reconfigured and restarted.
|
||
]
|
||
|
||
---
|
||
|
||
## Assess our new cluster health
|
||
|
||
- The output of `docker info` will tell us the status
|
||
of the node that we are talking to (primary or replica)
|
||
|
||
- If we talk to a replica, it will tell us who is the primary
|
||
|
||
.exercise[
|
||
|
||
- Talk to a random node, and ask its view of the cluster:
|
||
```
|
||
eval $(docker-machine env node3 --swarm)
|
||
docker info | grep -e ^Name -e ^Role -e ^Primary
|
||
```
|
||
|
||
]
|
||
|
||
Note: `docker info` is one of the few commands that will
|
||
work even when there is no elected primary. This helps
|
||
debugging.
|
||
|
||
---
|
||
|
||
## Test Swarm manager failover
|
||
|
||
- The previous command told us which node was the primary manager
|
||
|
||
- if `Role` is `primary`,
|
||
<br/>then the primary is indicated by `Name`
|
||
|
||
- if `Role` is `replica`,
|
||
<br/>then the primary is indicated by `Primary`
|
||
|
||
.exercise[
|
||
|
||
- Powercycle the primary manager:
|
||
```
|
||
ssh XXX sudo reboot
|
||
```
|
||
|
||
]
|
||
|
||
Look at the output of `docker info` every few seconds.
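
The "who is the primary?" rule above is easy to automate. This sketch assumes that the filtered `docker info` output consists of `Key: value` lines named `Name`, `Role`, and `Primary`, as suggested by the `grep` used earlier; the function name and the sample values are made up.

```python
def find_primary(info_text):
    """Apply the rule: if Role is primary, the primary is Name;
    otherwise it is whatever Primary says."""
    fields = {}
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        # Only top-level lines count (same as `grep -e ^Name ...`).
        if sep and not line.startswith(" "):
            fields[key.strip()] = value.strip()
    if fields.get("Role") == "primary":
        return fields.get("Name")
    return fields.get("Primary")


# Invented sample values.
print(find_primary("Role: replica\nPrimary: 192.0.2.10:3376\nName: node3"))
print(find_primary("Role: primary\nName: node1"))
```

Running this in a loop while the primary reboots shows the election result as soon as it changes.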
|
||
|
||
---
|
||
|
||
# Highly available containers
|
||
|
||
- Swarm has support for *rescheduling* on node failure
|
||
|
||
- It has to be explicitly enabled on a per-container basis
|
||
|
||
- When the primary manager detects that a node goes down,
|
||
<br/>those containers are rescheduled elsewhere
|
||
|
||
- If the containers can't be rescheduled (constraints issue),
|
||
<br/>they are lost (there is no reconciliation loop yet)
|
||
|
||
- As of Swarm 1.1.0, this is an *experimental* feature
|
||
<br/>(To enable it, you must pass the
|
||
`--experimental` flag when you start Swarm itself!)
|
||
|
||
---
|
||
|
||
## Working around flag order
|
||
|
||
- The flag must be *before* the Swarm command
|
||
<br/>(i.e. `docker run swarm --experimental manage ...`)
|
||
|
||
- We cannot use Docker Machine to pass that flag ☹
|
||
<br/>(Machine adds flags *after* the Swarm command)
|
||
|
||
- Instead, we will use the Swarm image `jpetazzo/swarm:experimental`:
|
||
```
|
||
FROM swarm
|
||
ENTRYPOINT ["/swarm", "--experimental"]
|
||
```
|
||
|
||
- We can tell Machine to use this with `--swarm-image`
|
||
|
||
---
|
||
|
||
## Reconfigure Swarm [one more time](https://www.youtube.com/watch?v=FGBhQbmPwH8)
|
||
|
||
.exercise[
|
||
|
||
- Redeploy Swarm with `--experimental`:
|
||
|
||
.small[
|
||
```
|
||
for N in 1 2 3 4 5; do
|
||
ssh node$N docker rm -f swarm-agent swarm-agent-master
|
||
docker-machine rm -f node$N
|
||
done
|
||
|
||
grep 'node[12345]' /etc/hosts | grep -v ^127 |
|
||
while read IPADDR NODENAME; do
|
||
docker-machine create --driver generic \
|
||
--engine-opt cluster-store=consul://localhost:8500 \
|
||
--engine-opt cluster-advertise=eth0:2376 \
|
||
--swarm --swarm-master --swarm-image jpetazzo/swarm:experimental \
|
||
--swarm-discovery consul://localhost:8500 \
|
||
--swarm-opt replication --swarm-opt advertise=$IPADDR:3376 \
|
||
--generic-ssh-user docker --generic-ip-address $IPADDR $NODENAME
|
||
done
|
||
```
|
||
]
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Start a resilient container
|
||
|
||
- By default, containers will not be restarted when their node goes down
|
||
|
||
- You must pass an explicit *rescheduling policy* to make that happen
|
||
|
||
- For now, the only policy is "on-node-failure"
|
||
|
||
.exercise[
|
||
|
||
- Start a container with a rescheduling policy:
|
||
|
||
.small[
|
||
```
|
||
docker run -d --name highlander -e reschedule:on-node-failure redis
|
||
```
|
||
]
|
||
|
||
]
|
||
|
||
Check that the container is up and running.
|
||
|
||
---
|
||
|
||
## Simulate a node failure
|
||
|
||
- We will reboot the node running this container
|
||
|
||
- Swarm will reschedule it
|
||
|
||
.exercise[
|
||
|
||
- Check on which node the container is running:
|
||
<br/>.small[`NODE=$(docker inspect --format '{{.Node.Name}}' highlander)`]
|
||
|
||
- Reboot that node:
|
||
<br/>`ssh $NODE sudo reboot`
|
||
|
||
- Check that the container has been rescheduled:
|
||
<br/>`docker ps`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## .icon[] Caveats
|
||
|
||
- There are some corner cases when the node is also
|
||
the Swarm leader or the Consul leader; this is being improved
|
||
right now!
|
||
|
||
- Swarm doesn't gracefully handle the fact that after the
|
||
reboot, you have *two* containers named `highlander`,
|
||
and attempts to manipulate the container with its name
|
||
will not work. This will be improved too.
|
||
|
||
---
|
||
|
||
# Conclusions
|
||
|
||
- Bad news: we still have work to do to deploy our apps
|
||
|
||
- it's not all unicorns, ponies, and rainbows
|
||
|
||
- *no, Docker will not make your job obsolete*
|
||
|
||
- Good news: a lot of hard things are becoming easier
|
||
|
||
- building, packaging, distributing apps
|
||
|
||
- running distributed systems on clusters
|
||
|
||
---
|
||
|
||
## "This is complicated"
|
||
|
||
- The scripts used here are pretty simple
|
||
<br/>(each is fewer than 100 lines of code)
|
||
|
||
- You can easily rewrite them in your favorite language,
|
||
<br/>adapt and customize them, in a few hours of time
|
||
|
||
- FYI: those scripts are smaller and simpler than the
|
||
scripts (cloud init etc) used to deploy the VMs for this
|
||
workshop!
|
||
|
||
- Docker Inc. has commercial products to wrap all this:
|
||
|
||
- Docker Cloud
|
||
<br/>(manage your Docker nodes from a SAAS portal)
|
||
|
||
- Universal Control Plane
|
||
<br/>(buzzword-compliant management solution:
|
||
<br/>turnkey, enterprise-class, on-premise, etc.)
|
||
|
||
---
|
||
|
||
## What's next?
|
||
|
||
- November 2015: Compose 1.5 + Engine 1.9 =
|
||
<br/>first release with multi-host networking
|
||
|
||
- January 2016: Compose 1.6 + Engine 1.10 =
|
||
<br/>HUGE improvements (DNS server, HA...)
|
||
|
||
- Next release: another truckload of features
|
||
|
||
- I will deliver this workshop about twice a month
|
||
|
||
- Check out the GitHub repo for updated content!
|
||
|
||
---
|
||
|
||
class: title
|
||
|
||
# Thanks! <br/> Questions?
|
||
|
||
### [@jpetazzo](https://twitter.com/jpetazzo) <br/> [@docker](https://twitter.com/docker)
|
||
|
||
</textarea>
|
||
<script src="https://gnab.github.io/remark/downloads/remark-0.5.9.min.js" type="text/javascript">
|
||
</script>
|
||
<script type="text/javascript">
|
||
var slideshow = remark.create();
|
||
</script>
|
||
</body>
|
||
</html>
|