<!DOCTYPE html>
<html>
  <head>
    <base target="_blank">
    <title>Docker Orchestration Workshop</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <style type="text/css">
      @import url(https://fonts.googleapis.com/css?family=Yanone+Kaffeesatz);
      @import url(https://fonts.googleapis.com/css?family=Droid+Serif:400,700,400italic);
      @import url(https://fonts.googleapis.com/css?family=Ubuntu+Mono:400,700,400italic);

      body { font-family: 'Droid Serif'; font-size: 150%; }

      h1, h2, h3 {
        font-family: 'Yanone Kaffeesatz';
        font-weight: normal;
      }
      a {
        text-decoration: none;
        color: blue;
      }
      .remark-code, .remark-inline-code { font-family: 'Ubuntu Mono'; }
      .red { color: #fa0000; }
      .gray { color: #ccc; }
      .small { font-size: 70%; }
      .big { font-size: 140%; }
      .underline { text-decoration: underline; }
      .footnote {
        position: absolute;
        bottom: 3em;
      }
      .pic {
        vertical-align: middle;
        text-align: center;
        padding: 0 0 0 0 !important;
      }
      img {
        max-width: 100%;
        max-height: 450px;
      }
      .title {
        vertical-align: middle;
        text-align: center;
      }
      .title {
        font-size: 2em;
      }
      .title .remark-slide-number {
        font-size: 0.5em;
      }
      .quote {
        background: #eee;
        border-left: 10px solid #ccc;
        margin: 1.5em 10px;
        padding: 0.5em 10px;
        quotes: "\201C""\201D""\2018""\2019";
        font-style: italic;
      }
      .quote:before {
        color: #ccc;
        content: open-quote;
        font-size: 4em;
        line-height: 0.1em;
        margin-right: 0.25em;
        vertical-align: -0.4em;
      }
      .quote p {
        display: inline;
      }
      .icon img {
        height: 1em;
      }
      .exercise {
        background-color: #eee;
        background-image: url("keyboard.png");
        background-size: 1.4em;
        background-repeat: no-repeat;
        background-position: 0.2em 0.2em;
        border: 2px dotted black;
      }
      .exercise::before {
        content: "Exercise:";
        margin-left: 1.8em;
      }
      li p { line-height: 1.25em; }
    </style>
  </head>
  <body>
    <textarea id="source">

class: title

# Docker <br/> Orchestration <br/> Workshop

---

## Logistics

- Hello! We are:
  <br/>`jerome at docker dot com`
  <br/>`aj at soulshake dot net`

<!--
Reminder, when updating the agenda: when people are told to show
up at 9am, they usually trickle in until 9:30am (except for paid
training sessions). If you're not sure that people will be there
on time, it's a good idea to have a breakfast with the attendees
at e.g. 9am, and start at 9:30.
-->

- Agenda:

  .small[
  - 09:00-09:15 hello!
  - 09:15-10:45 part 1
  - 10:45-11:00 coffee break
  - 11:00-12:30 part 2
  - 12:30-13:45 lunch break
  - 13:45-15:15 part 3
  - 15:15-15:30 coffee break
  - 15:30-17:00 part 4
  ]

<!-- - This will be FAST PACED, but DON'T PANIC! -->

- All the content is publicly available
  <br/>(slides, code samples, scripts)

<!--
Remember to change:
- the Gitter link below
- the other Gitter link
- the "tweet my speed" hashtag in DockerCoins HTML
-->

- Experimental chat support on
  [Gitter](https://gitter.im/jpetazzo/workshop-20160322-munchen)

---


<!--
grep '^# ' index.html | grep -v '<br' | tr '#' '-'^C
-->

## Outline (1/4)

- Pre-requirements
- VM environment
- Our sample application
- Running services independently
- Running the whole app on a single node
- Identifying bottlenecks
- Measuring latency under load
- Scaling HTTP on a single node
- Put a load balancer on it
- Connecting to containers on other hosts
- Abstracting remote services with ambassadors
- Various considerations about ambassadors

---

## Outline (2/4)

- Docker for ops
- Backups
- Logs
- Storing container logs in an ELK stack
- Security upgrades
- Network traffic analysis

---

## Outline (3/4)

- Dynamic orchestration
- Hands-on Swarm
- Deploying Swarm
- Cluster discovery
- Building our app on Swarm
- Connecting containers with ambassadors
- Setting up Consul and overlay networks
- Multi-host networking
- Using overlay networks with Compose

---

## Outline (4/4)

- Here be dragons
- Highly available Swarm managers
- Highly available containers
- Conclusions

---

# Pre-requirements

- Computer with network connection and SSH client
  <br/>(on Windows, get [putty](http://www.putty.org/)
  or [Git BASH](https://msysgit.github.io/))

- Basic Docker knowledge
  <br/>(but that's OK if you're not a Docker expert!)

---

## Nice-to-haves

- [GitHub](https://github.com/join) account
  <br/>(if you want to fork the repo; also used to join Gitter)

- [Gitter](https://gitter.im/) account
  <br/>(to join the conversation during the workshop)

- [Docker Hub](https://hub.docker.com) account
  <br/>(it's one way to distribute images on your Swarm cluster)

---

## Hands-on sections

- The whole workshop is hands-on

- I will show Docker in action

- I invite you to reproduce what I do

- All hands-on sections are clearly identified
  <br/>(see below)

.exercise[

- This is the stuff you're supposed to do!
- Go to [container.training](http://container.training/) to view these slides
- Join the chat room on
  [Gitter](https://gitter.im/jpetazzo/workshop-20160322-munchen)

]

---

# VM environment

- Each person gets 5 VMs
- They are *your* VMs
- They'll be up until tomorrow
- You have a little card with login+password+IP addresses
- You can automatically SSH from one VM to another

.exercise[

- Log into the first VM (`node1`)
- Check that you can SSH (without password) to `node2`
- Check the version of Docker with `docker version`

]

.footnote[Note: from now on, unless instructed, **all commands must
be run from the first VM, `node1`**.]

---

## Terminals

Once in a while, the instructions will say:
<br/>"Open a new terminal."

There are multiple ways to do this:

- create a new window or tab on your machine,
  <br/>and SSH into the VM;

- use tmux on the VM and open a new window in tmux.

If you want to use screen or whatever, you're welcome!

---

## Tmux cheatsheet

- Ctrl-b c → creates a new window
- Ctrl-b n → go to next window
- Ctrl-b p → go to previous window
- Ctrl-b " → split window top/bottom
- Ctrl-b % → split window left/right
- Ctrl-b Alt-1 → rearrange windows in columns
- Ctrl-b Alt-2 → rearrange windows in rows
- Ctrl-b arrows → navigate to other windows
- Ctrl-b d → detach session
- tmux attach → reattach to session

---

## Brand new versions!

- Engine 1.10.**3**

- Compose 1.6.2

- Swarm 1.1.3

- Machine 0.6.0

---

# Our sample application

- Let's look at the general layout of the
  [source code](https://github.com/jpetazzo/orchestration-workshop)

- Each directory = 1 microservice
  - `rng` = web service generating random bytes
  - `hasher` = web service computing hash of POSTed data
  - `worker` = background process using `rng` and `hasher`
  - `webui` = web interface to watch progress

.exercise[

- Clone the repository on `node1`:
  <br/>.small[`git clone git://github.com/jpetazzo/orchestration-workshop`]

]

(Bonus points for forking on GitHub and cloning your fork!)

---

## What's this application?

--

![DockerCoins logo](dockercoins.png)

(DockerCoins logo courtesy of @jonasrosland. Thanks!)

---

## What's this application?

- It is a DockerCoin miner! 💰🐳📦🚢

- No, you can't buy coffee with DockerCoins

- How DockerCoins works:

  - `worker` asks `rng` to give it random bytes
  - `worker` feeds those random bytes into `hasher`
  - each hash starting with `0` is a DockerCoin
  - DockerCoins are stored in `redis`
  - `redis` is also updated every second to track speed
  - you can see the progress with the `webui`
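
The steps above can be sketched locally, without any container. This is only an illustration: the 32-byte payload and the names `is_coin` and `mine_once` are assumptions, and the real `worker` talks to `rng` and `hasher` over HTTP instead of reading `/dev/urandom`.

```bash
# Succeeds when a hex digest starts with "0" (the DockerCoin rule above).
is_coin() {
  case "$1" in
    0*) return 0 ;;
    *)  return 1 ;;
  esac
}

# One mining attempt: hash some random bytes, report whether it is a coin.
mine_once() {
  local hash
  hash=$(head -c 32 /dev/urandom | sha256sum | cut -d' ' -f1)
  if is_coin "$hash"; then
    echo "coin $hash"
  else
    echo "miss $hash"
  fi
}
```

With hex digests, about one attempt in 16 yields a coin.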

Next: we will inspect components independently.

---

# Running services independently

First, we will run the random number generator (`rng`).

.exercise[

- Go to the `dockercoins` directory, in the cloned repo:
  <br/>`cd orchestration-workshop/dockercoins`

- Use Compose to run the `rng` service:
  <br/>`docker-compose up rng`

- Docker will pull `python` and build the microservice

]

---

## Lies, damn lies, and port numbers

.icon[![Warning](warning.png)] Pay attention to the port mapping!

- The container log says:
  <br/>`Running on http://0.0.0.0:80/`

- But if you try `curl localhost:80`, you will get:
  <br/>`Connection refused`

- Port 80 on the container ≠ port 80 on the Docker host

---

## Understanding port mapping

- `node1`, the Docker host, has only one port 80

- If we give the one and only port 80 to the first
  container who asks for it, we are in trouble when
  another container needs it

- Default behavior: containers are not "exposed"
  <br/>(only reachable by the Docker host and other containers,
  through their private address)

- Container network services can be exposed:

  - statically (you decide which host port to use)

  - dynamically (Docker allocates a host port)

---

## Declaring port mapping

- Directly with the Docker Engine:
  <br/>`docker run -d -p 8000:80 nginx`
  <br/>`docker run -d -p 80 nginx`
  <br/>`docker run -d -P nginx`

- With Docker Compose, in the `docker-compose.yml` file:

```
rng:
  …
  ports:
    - "8001:80"
```

→ port 8001 *on the host* maps to
port 80 *in the container*

---

## Using the `rng` service

Let's get random bytes of data!

.exercise[

- Open a new terminal and connect to the same VM

<!--
```
NEW-TERM
```
-->

- Check that the service is alive:
  <br/>`curl localhost:8001`

- Get 10 bytes of random data:
  <br/>`curl localhost:8001/10`

- If the binary data output messed up your terminal, fix it:
  <br/>`reset`

]

---

## Running the hasher

.exercise[

- Open yet another terminal

<!--
```
NEW-TERM
```
-->

- Start the `hasher` service:
  <br/>`docker-compose up hasher`

- It will pull `ruby` and do the build

]

.icon[![Warning](warning.png)] Again, pay attention to the port mapping!

The container log says that it's listening on port 80,
but it's mapped to port 8002 on the host.

You can see the mapping in `docker-compose.yml`.

---

## Testing the hasher

.exercise[

- Open one more terminal to `node1`

<!--
```
NEW-TERM
```
-->

- Check that the `hasher` service is alive:
  <br/>`curl localhost:8002`

- Posting binary data requires some extra flags:

  ```
  curl \
    -H "Content-type: application/octet-stream" \
    --data-binary hello \
    localhost:8002
  ```

- Check that it computed the right hash:
  <br/>`echo -n hello | sha256sum`

]
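
To make the comparison easier, the local digest can be reduced to its hex part. A small sketch (the name `local_sha256` is ours, not part of the workshop repo):

```bash
# Print the hex SHA-256 of a string, without the "  -" suffix
# that sha256sum appends when it reads from standard input.
local_sha256() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

local_sha256 hello
```

The digest printed here should be the one returned by `hasher` for the same payload.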

---

## Stopping services

We have multiple options:

- Interrupt `docker-compose up` with `^C`

- Stop individual services with `docker-compose stop rng`

- Stop all services with `docker-compose stop`

- Kill all services with `docker-compose kill`
  <br/>(rude, but faster!)

- Stop and remove all services with `docker-compose down`

.exercise[

- Use any of those methods to stop `rng` and `hasher`

]

???

This hidden content is here for automation
(so that `docker-compose kill` gets executed
when auto-testing the content).

.exercise[

```
docker-compose kill
```

]

---

# Running the whole app on a single node

.exercise[

- Run `docker-compose up` to start all components

]

- `rng` and `hasher` can be started directly

- Other components are built accordingly

- Aggregate output is shown

- Output is verbose
  <br/>(because the worker is constantly hitting other services)

---

## Viewing our application

- The app exposes a Web UI with a realtime progress graph

.exercise[

- Open http://[yourVMaddr]:8000/ (from a browser)

]

- The app actually has a constant, steady speed
  <br/>(3.33 coins/second)

- The speed seems not-so-steady because:

  - we measure a discrete value over discrete intervals

  - the measurement is done by the browser

  - BREAKING: network latency is a thing

---

## Running in the background

- The logs are very verbose (and won't get better)

- Let's put them in the background for now!

.exercise[

- Stop the app (with `^C`)

- Start it again with `docker-compose up -d`

- Check on the web UI that the app is still making progress

]

Note: there is a regression in Compose 1.6 when it
is installed as a self-contained binary: `^C` doesn't
stop the containers. It will be fixed soon.
Meanwhile, installing with `pip` is fine too.

---

## Looking at resource usage

- Let's look at CPU, memory, and I/O usage

.exercise[

- run `top` to see CPU and memory usage
  <br/>(you should see idle cycles)

- run `vmstat 3` to see I/O usage (si/so/bi/bo)
  <br/>(the 4 numbers should be almost zero,
  <br/>except `bo` for logging)

]

We have available resources.

- Why?
- How can we use them?

---

## Scaling workers on a single node

- Docker Compose supports scaling.red[*]
- Let's scale `worker` and see what happens!

.exercise[

- Start 9 more `worker` containers:
  <br/>`docker-compose scale worker=10`

- Check the aggregated logs of those containers:
  <br/>`docker-compose logs worker`

- See the impact on CPU load (with top/htop),
  <br/>and on compute speed (with web UI)

]

.footnote[.red[*]With some limitations, as we'll see later.]

---

# Identifying bottlenecks

- You should have seen a 3x speed bump (not 10x)

- Adding workers didn't result in linear improvement

- *Something else* is slowing us down

--

- ... But what?

--

- The code doesn't have instrumentation

- Let's use state-of-the-art HTTP performance analysis!
  <br/>(i.e. good old tools like `ab`, `httping`...)

???

## Benchmarking our microservices

We will test microservices in isolation.

.exercise[

- Stop the application:
  `docker-compose kill`

- Remove old containers:
  `docker-compose rm`

- Start `hasher` and `rng`:
  `docker-compose up hasher rng`

]

Now let's hammer them with requests!

???

## Testing `rng`

Let's assess the raw performance of our RNG.

.exercise[

- Test the performance on one big request:
  <br/>`curl -o/dev/null localhost:8001/10000000`
  <br/>(should take ~1s, and show speed of ~10 MB/s)

]

If we were doing requests of 1000 bytes ...

... Could we get 10k req/s?

Let's test and see what happens!

???

## Concurrent requests

.exercise[

- Test 100 requests of 1000 bytes each:
  <br/>`ab -n 100 localhost:8001/1000`

- Test 100 requests, 10 requests in parallel:
  <br/>`ab -n 100 -c 10 localhost:8001/1000`
  <br/>(look how the latency has increased!)

- Try with 100 requests in parallel:
  <br/>`ab -n 100 -c 100 localhost:8001/1000`

]

???

Whatever we do, we get ~10 requests/second.

Increasing concurrency doesn't help:
it just increases latency.

???

## Discussion

- When serving requests sequentially, they each take 100ms

- When 10 requests arrive at the same time:

  - one request is served in 100ms
  - another is served in 200ms
  - another is served in 300ms
  - ...
  - another is served in 1000ms

- All requests are queued and served by a single thread

- It looks like `rng` doesn't handle concurrent requests

- What about `hasher`?
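
The arithmetic above can be checked with a few lines of shell. This is a sketch of a single-threaded server with a fixed service time; `queue_latencies` and `average_latency` are hypothetical helpers, not part of the app.

```bash
# Print the completion time (in ms) of each of N requests that arrive
# together and are served one at a time, SERVICE_MS each.
queue_latencies() {
  local n=$1 service_ms=$2 i=1
  while [ "$i" -le "$n" ]; do
    echo $((i * service_ms))
    i=$((i + 1))
  done
}

# Average latency over those N requests (integer division).
average_latency() {
  local n=$1 service_ms=$2 total=0 t
  for t in $(queue_latencies "$n" "$service_ms"); do
    total=$((total + t))
  done
  echo $((total / n))
}

queue_latencies 10 100   # 100, 200, ... up to 1000
average_latency 10 100   # 550
```

Ten simultaneous requests therefore average 550ms, even though each one needs only 100ms of work.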

???

## Save some random data and stop the generator

Before testing the hasher, let's save some random
data that we will feed to the hasher later.

.exercise[

- Run `curl localhost:8001/1000000 > /tmp/random`

]

Now we can stop the generator.

.exercise[

- In the shell where you did `docker-compose up rng`,
  <br/>stop it by hitting `^C`

]

???

## Benchmarking the hasher

We will hash the data that we just got from `rng`.

.exercise[

- Posting binary data requires some extra flags:

  ```
  curl \
    -H "Content-type: application/octet-stream" \
    --data-binary @/tmp/random \
    localhost:8002
  ```

- Compute the hash locally to verify that it works fine:
  <br/>`sha256sum /tmp/random`
  <br/>(it should display the same hash)

]

???

## The hasher under load

The invocation of `ab` will be slightly more complex as well.

.exercise[

- Execute 100 requests in a row:

  ```
  ab -n 100 -T application/octet-stream \
     -p /tmp/random localhost:8002/
  ```

- Execute 100 requests with 10 requests in parallel:

  ```
  ab -c 10 -n 100 -T application/octet-stream \
     -p /tmp/random localhost:8002/
  ```

]

Take note of the performance numbers (requests/s).

???

## Benchmarking the hasher on smaller data

Here we hashed 1,000,000 bytes.

Later we will hash much smaller payloads.

Let's repeat the tests with smaller data.

.exercise[

- Run `truncate --size=10 /tmp/random`
- Repeat the `ab` tests

]

---

# Measuring latency under load

We will use `httping`.

.exercise[

- Scale back the `worker` service to zero:
  <br/>`docker-compose scale worker=0`

- Open a new terminal and check the latency of `rng`:
  <br/>`httping localhost:8001`

- Open a new terminal and do the same for `hasher`:
  <br/>`httping localhost:8002`

- Keep an eye on both connections!

]

---

## Latency in initial conditions

Latency for both services should be very low (~1ms).

Now add a first worker and see what happens.

.exercise[

- Create the first `worker` instance:
  <br/>`docker-compose scale worker=1`

]

- `hasher` should be very low (~1ms)

- `rng` should be low, with occasional spikes (10-100ms)

---

## Latency when scaling the worker

We will add workers and see what happens.

.exercise[

- Run `docker-compose scale worker=2`

- Check latency

- Increase number of workers and repeat

]

What happens?

- `hasher` remains low
- `rng` spikes up until it reaches ~(N-2)*100ms
  <br/>(when you have N workers)

---

class: title

Why?

---

## Why does everything take (at least) 100ms?

--

`rng` code:

![RNG code screenshot](delay-rng.png)

--

`hasher` code:

![HASHER code screenshot](delay-hasher.png)

---

class: title

But ...

WHY?!?

---

## Why did we sprinkle this sample app with sleeps?

- Deterministic performance
  <br/>(regardless of instance speed, CPUs, I/O...)

--

- Actual code sleeps all the time anyway

--

- When your code makes a remote API call:

  - it sends a request;

  - it sleeps until it gets the response;

  - it processes the response.

---

## Why do `rng` and `hasher` behave differently?

![Equations on a blackboard](equations.png)

--

(Synchronous vs. asynchronous event processing)

---

## How to make `rng` go faster

- Obvious solution: comment out the `sleep` instruction

--

- Unfortunately, in the real world, network latency exists

--

- More realistic solution: use an asynchronous framework
  <br/>(e.g. use gunicorn with gevent)

--

- New rule: we can't change the code!

--

- Solution: scale out `rng`
  <br/>(dispatch `rng` requests on multiple instances)

---

# Scaling HTTP on a single node

- We could try to scale with Compose:

  ```
  docker-compose scale rng=3
  ```

- Compose doesn't deal with load balancing

- We would get 3 instances ...

- ... But only the first one would serve traffic

---

## The plan

<!--
- Stop the `rng` service first
-->

- Create multiple identical `rng` containers

- Put a load balancer in front of them

- Point other services to the load balancer

???

## Stopping `rng`

- That's the easy part!

.exercise[

- Use `docker-compose` to stop `rng`:

  ```
  docker-compose stop rng
  ```

]

Note: we do this first because we are about to remove
`rng` from the Docker Compose file.

If we don't stop
`rng` now, it will remain up and running, with Compose
being unaware of its existence!

---

## Scaling `rng`

.exercise[

- Replace the `rng` service with multiple copies of it:

  ```
  rng1:
    build: rng

  rng2:
    build: rng

  rng3:
    build: rng
  ```

]

That's all!

Shortcut: `docker-compose.yml-scaled-rng`

---

## Introduction to `jpetazzo/hamba`

- Public image on the Docker Hub

- Load balancer based on HAProxy

- Expects the following arguments:
  <br/>`FE-port BE1-addr:BE1-port BE2-addr:BE2-port ...`
  <br/>*or*
  <br/>`FE-addr:FE-port BE1-addr:BE1-port BE2-addr:BE2-port ...`

  - FE=frontend (the thing other services connect to)

  - BE=backend (the multiple copies of your scaled service)

.small[
Example: listen on port 80 and balance traffic across www1:1234 and www2:2345

```
docker run -d -p 80 jpetazzo/hamba 80 www1:1234 www2:2345
```
]

---

# Put a load balancer on it

Let's add our load balancer to the Compose file.

.exercise[

- Add the following section to the Compose file:

  ```
  rng:
      image: jpetazzo/hamba
      links:
        - rng1
        - rng2
        - rng3
      command: 80 rng1 80 rng2 80 rng3 80
      ports:
        - "8001:80"
  ```

]

Shortcut: `docker-compose.yml-scaled-rng`

???

## Point other services to the load balancer

- The only affected service is `worker`

- We have to replace the `rng` link with a link to `rng0`,
  but it should still be named `rng` (so we don't change the code)

.exercise[

- Update the `worker` section as follows:

  ```
  worker:
    build: worker
    links:
      - rng0:rng
      - hasher
      - redis
  ```

]

Shortcut: `docker-compose.yml-scaled-rng`

---

## Start the whole stack

.exercise[

- Start the new services:
  <br/>`docker-compose up -d`

- Check worker logs:
  <br/>`docker-compose logs worker`

- Check load balancer logs:
  <br/>`docker-compose logs rng`

]

<!--
If you get errors about port 8001, make sure that
`rng` was stopped correctly and try again.
-->

---

## Results

- Check the latency of `rng`
  <br/>(it should have improved significantly!)

- Check the application performance in the Web UI
  <br/>(it should improve if you have enough workers)

*Note: if `worker` was scaled when you did `docker-compose up`,
it probably took a while, because `worker` doesn't handle
signals properly and Docker patiently waits 10 seconds for
each `worker` instance to terminate. This would be much
faster for a well-behaved application.*

---

## The good, the bad, the ugly

--

- The good

  We scaled a service, added a load balancer -
  <br/>without changing a single line of code.

--

- The bad

  We manually copy-pasted sections in `docker-compose.yml`.

  Improvement: write scripts to transform the YAML file.

--

- The ugly

  If we scale up/down, we have to restart everything.

  Improvement: reconfigure the load balancer dynamically.
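
As a starting point for the scripted approach, here is a hedged sketch that prints the scaled `rng` sections for any number of copies. It assumes the same layout as the fragments shown earlier (`build: rng`, `jpetazzo/hamba` listening on port 80, host port 8001); `gen_scaled_rng` is a hypothetical name.

```bash
# Print Compose sections for N copies of rng, plus a hamba
# load balancer named "rng" in front of them.
gen_scaled_rng() {
  local n=$1 i=1 backends=""
  while [ "$i" -le "$n" ]; do
    printf 'rng%d:\n  build: rng\n\n' "$i"
    backends="$backends rng$i 80"
    i=$((i + 1))
  done
  printf 'rng:\n  image: jpetazzo/hamba\n  links:\n'
  i=1
  while [ "$i" -le "$n" ]; do
    printf '    - rng%d\n' "$i"
    i=$((i + 1))
  done
  printf '  command: 80%s\n' "$backends"
  printf '  ports:\n    - "8001:80"\n'
}

gen_scaled_rng 3
```

The output can replace the hand-written sections in `docker-compose.yml`; the rest of the file stays as it is.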

---

# Connecting to containers on other hosts

- So far, our whole stack is on a single machine

- We want to scale out (across multiple nodes)

- We will deploy the same stack multiple times

- But we want every stack to use the same Redis
  <br/>(in other words: Redis is our only *stateful* service here)

--

- And remember: we're not allowed to change the code!

  - the code connects to host `redis`
  - `redis` must resolve to the address of our Redis service
  - the Redis service must listen on the default port (6379)

???

## Using host name injection to abstract service dependencies

- It is possible to add host entries to a container

- With the CLI:

  ```
  docker run --add-host redis:192.168.1.2 myservice...
  ```

- In a Compose file:

  ```
  myservice:
    image: myservice
    extra_hosts:
      redis: 192.168.1.2
  ```

- Docker exposes a DNS server to the container,
  <br/>with a private view where `redis` resolves to `192.168.1.2`
  (Before Engine 1.10, it created entries in `/etc/hosts`)

???

## The plan

- Deploy our Redis service separately

  - use the same `redis` image

  - make sure that Redis server port (6379) is publicly accessible,
    using port 6379 on the Docker host

- Update our Docker Compose YAML file

  - remove the `redis` section

  - in the `links` section, remove `redis`

  - instead, put a `redis` entry in `extra_hosts`

Note: the code stays on the first node!
<br/>(We do not need to copy the code to the other nodes.)

???

## Making Redis available on its default port

There are two strategies.

- `docker run -p 6379:6379 redis`

  - the container has its own, isolated network stack
  - Docker creates a port mapping rule through iptables
  - slight performance overhead
  - port number is explicit (visible through Docker API)

- `docker run --net host redis`

  - the container uses the network stack of the host
  - when it binds to 6379/tcp, that's 6379/tcp on the host
  - allows raw speed (no overhead due to iptables/bridge)
  - port number is not visible through Docker API

Choose wisely!

???

## Deploy Redis

.exercise[

- Start a new redis container, mapping port 6379 to 6379:

  ```
  docker run -d -p 6379:6379 redis
  ```

- Check that it's running with `docker ps`

- Note the IP address of this Docker host

- Try to connect to it (from anywhere):

  ```
  curl node1:6379
  ```

]

The `ERR` messages are normal: Redis speaks Redis, not HTTP.

???

## Update `docker-compose.yml` (1/3)

.exercise[

- Comment out `redis`:

  ```
  #redis:
  #    image: redis
  ```

]

???

## Update `docker-compose.yml` (2/3)

.exercise[

- Update `worker`:

  ```
  worker:
    build: worker
    extra_hosts:
      redis: A.B.C.D
    links:
      - rng
      - hasher
  ```

]

Replace `A.B.C.D` with the IP address noted earlier.

Shortcut: `docker-compose.yml-extra-hosts`
<br/>(But you still have to replace `A.B.C.D`!)

???

## Update `docker-compose.yml` (3/3)

.exercise[

- Update `webui`:

  ```
  webui:
    build: webui
    extra_hosts:
      redis: A.B.C.D
    ports:
      - "8000:80"
    volumes:
      - "./webui/files/:/files/"
  ```

]

(Replace `A.B.C.D` with the IP address noted earlier)

???

## Start the stack on the first machine

- Nothing special to do here

- Just bring up the application like we did before

.exercise[

- `docker-compose up -d`

]

- Check in the web browser that it's running correctly

???

## Start the stack on another machine

- We will set the `DOCKER_HOST` variable

- `docker-compose` will detect and use it

- Our Docker hosts are listening on port 55555

.exercise[

- Set the environment variable:
  <br/>`export DOCKER_HOST=tcp://node2:55555`

- Start the stack:
  <br/>`docker-compose up -d`

- Check that it's running:
  <br/>`docker-compose ps`

]

???

## Scale!

.exercise[

- Keep an eye on the web UI

- Create 20 workers on both nodes:
  ```
  for NODE in node1 node2; do
    export DOCKER_HOST=tcp://$NODE:55555
    docker-compose scale worker=20
  done
  ```

]

Note: of course, if we wanted, we could run on all five nodes.

???

## Cleanup

- Let's remove what we did

.exercise[

- You can use the following scriptlet:

  ```
  for N in $(seq 1 5); do
    export DOCKER_HOST=tcp://node$N:55555
    docker ps -qa | xargs docker rm -f
  done
  unset DOCKER_HOST
  ```

]

---

# Using custom DNS mapping

- We could set up a Redis server on its default port

- And add a DNS entry mapping `redis` to this server

.exercise[

- See what happens if we run:
  ```
  docker run --add-host redis:1.2.3.4 alpine ping redis
  ```

]

There is a Compose file option for that: `extra_hosts`.

---

# Abstracting remote services with ambassadors

- What if we can't/won't run Redis on its default port?

- What if we want to be able to move it easily?

--

- We will use an ambassador

- Redis will be started independently of our stack

- It will run at an arbitrary location (host+port)

- In our stack, we replace `redis` with an ambassador

- The ambassador will connect to Redis

- The ambassador will "act as" Redis in the stack

---

class: pic

![Ambassador principle](ambassadors/principle-1.png)

---

class: pic

![Ambassador principle](ambassadors/principle-2.png)

---

## Start redis

- Start a standalone Redis container

- Let Docker expose it on a random port

.exercise[

- Run redis with a random public port:
  <br/>`docker run -d -P --name myredis redis`

- Check which port was allocated:
  <br/>`docker port myredis 6379`

]

- Note the IP address of the machine, and this port

---

## Update `docker-compose.yml`

.exercise[

<!-- Following line to be commented out if we skip extra_hosts section -->
<!--
- Restore `links` as they were before in `webui` and `worker`
-->
<!-- -->

- Replace `redis` with an ambassador using `jpetazzo/hamba`:

  ```
  redis:
    image: jpetazzo/hamba
    command: 6379 AA.BB.CC.DD EEEEE
  ```

]
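
The placeholders in the `command` line come from the previous slide: `AA.BB.CC.DD` is the address of the Docker host running `myredis`, and `EEEEE` is the port reported by `docker port`. As a sketch (the function name and the sample values are made up), the arguments can be derived like this:

```bash
# Build the hamba arguments for a Redis ambassador.
#   $1: address of the Docker host running Redis
#   $2: output of `docker port myredis 6379`, e.g. "0.0.0.0:32768"
ambassador_command() {
  local host=$1 mapping=$2
  # Keep what follows the last ":" (the host port Docker allocated).
  echo "6379 $host ${mapping##*:}"
}

ambassador_command 10.0.0.5 0.0.0.0:32768
```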

Shortcut: `docker-compose.yml-ambassador`
<br/>(But you still have to update `AA.BB.CC.DD EEEEE`!)

---

## Start the stack on the first machine

- Compose will detect the change in the `redis` service

- It will replace `redis` with a `jpetazzo/hamba` instance

.exercise[

- Just tell Compose to do its thing:

  ```
  docker-compose up -d
  ```

- Check that the stack is up and running:

  ```
  docker-compose ps
  ```

- Look at the Web UI to make sure that it works fine

]

---

## Start the stack on another machine

- We will set the `DOCKER_HOST` variable

- `docker-compose` will detect and use it

- Our Docker hosts are listening on port 55555

.exercise[

- Set the environment variable:
  <br/>`export DOCKER_HOST=tcp://node2:55555`

- Start the stack:
  <br/>`docker-compose up -d`

- Check that it's running:
  <br/>`docker-compose ps`

]

---

## Scale!

.exercise[

- Deploy one instance of the stack on each node:

  .small[
  ```
  for N in 3 4 5; do
    DOCKER_HOST=tcp://node$N:55555 docker-compose up -d &
  done
  ```
  ]

- Add a bunch of workers all over the place:

  .small[
  ```
  for N in 1 2 3 4 5; do
    DOCKER_HOST=tcp://node$N:55555 docker-compose scale worker=10
  done
  ```
  ]

- Admire the result in the Web UI!

]
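
Both loops follow the same pattern. A hedged sketch of a reusable helper (the name `on_each_node` is an assumption; the `nodeN` names and port 55555 are the ones used above):

```bash
# Run a command once per node, pointing the Docker client at that node.
# The variable is set for that command only, so the current shell
# keeps talking to its usual Docker host afterwards.
on_each_node() {
  local n
  for n in 1 2 3 4 5; do
    DOCKER_HOST="tcp://node$n:55555" "$@"
  done
}

# Show which endpoint each iteration would use.
on_each_node printenv DOCKER_HOST
```

For example, `on_each_node docker-compose scale worker=10` does the same thing as the second loop.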

---

## Social Media Moment

Let's celebrate our success!

(And the fact that we're just 2498349893849283948982 DockerCoins away from being able to afford a cup of coffee!)

.exercise[

- If you have a Twitter account, tweet your mining speed!
  <br/>(use the "Tweet this!" link below the graph ☺)

]

---

## A few words about development volumes

- Try to access the web UI on another node

--

- It doesn't work! Why?

--

- Static assets are masked by an empty volume

--

- We need to comment out the `volumes` section

---

## Why must we comment out the `volumes` section?

- Volumes have multiple uses:

  - storing persistent stuff (database files...)

  - sharing files between containers (logs, configuration...)

  - sharing files between host and containers (source...)

- The `volumes` directive expands to a host path
  <br/>.small[(e.g. `/home/docker/orchestration-workshop/dockercoins/webui/files`)]

- This host path exists on the local machine
  <br/>(not on the others)

- This specific volume is used in development
  <br/>(not in production)

---

## Stop the app (but leave Redis running)

- Let's use `docker-compose down`

- It will stop and remove the DockerCoins app
  <br/>(but leave other containers running)

.exercise[

- We can do another simple shell loop:
  ```
  for N in $(seq 1 5); do
    export DOCKER_HOST=tcp://node$N:55555
    docker-compose down &
  done
  wait                # let the background jobs finish
  unset DOCKER_HOST   # talk to the local Engine again
  ```

]

(We need to keep the `myredis` container for
our next section, which will be about backups!)

---

# Various considerations about ambassadors

- "But, ambassadors are adding an extra hop!"

--

- Yes, but if you need load balancing, you need that hop

- Ambassadors actually *save* one hop
  <br/>(they act as local load balancers)

  - traditional load balancer:
    <br/>client ⇒ external LB ⇒ server (2 physical hops)

  - ambassadors:
    <br/>client → ambassador ⇒ server (1 physical hop)

--

- Ambassadors are more reliable than traditional LBs
  <br/>(they are colocated with their clients)

---

## Drawbacks of ambassadors

- Generic issues
  <br/>(shared with any kind of load balancing / HA setup)

  - extra logical hop (not transparent to the client)

  - must assess backend health

  - one more thing to worry about (!)

- Specific issues

  - load balancing fairness

High-end load balancing solutions will rely on back pressure
from the backends. This addresses the fairness issue.

---

## There are many ways to deploy ambassadors

"Ambassador" is a design pattern.

There are many ways to implement it.

We will present three ways to deploy ambassadors,
each more complex (but also more powerful) than the previous one.

---

## Single-tier ambassador deployment

- One-shot configuration process

- Must be executed manually after each scaling operation

- Scans current state, updates load balancer configuration

- Pros:
  <br/>- simple, robust, no extra moving part
  <br/>- easy to customize (thanks to simple design)
  <br/>- can deal efficiently with large changes

- Cons:
  <br/>- must be executed after each scaling operation
  <br/>- harder to compose different strategies

- Example: this workshop

---

## Two-tier ambassador deployment

- Daemon listens to Docker events API

- Reacts to container start/stop events

- Adds/removes backends in the load balancer configuration

- Pros:
  <br/>- no extra step required when scaling up/down

- Cons:
  <br/>- extra process to run and maintain
  <br/>- deals with one event at a time (ordering matters)

- Hidden gotcha: load balancer creation

- Example: interlock

---

## Three-tier ambassador deployment


- Daemon listens to Docker events API

- Reacts to container start/stop events

- Adds/removes scaled services in distributed config DB
  <br/>(zookeeper, etcd, consul…)

- Another daemon listens to config DB events

- Adds/removes backends in the load balancer configuration

- Pros:
  <br/>- more flexibility

- Cons:
  <br/>- three extra services to run and maintain

- Example: registrator

---

## Other multi-host communication mechanisms

- Overlay networks

  - weave, flannel, pipework ...

- Network plugins

  - available since Engine 1.9

- Allow a flat network for your containers

- Often requires an extra service to deal with BUM packets
  <br/>(broadcast/unknown/multicast)

  - e.g. a key/value store (Consul, Etcd, Zookeeper ...)

- Load balancers and/or failover mechanisms still needed

---

class: title

# Interlude <br/>

# Docker for ops

---

# Backups

- Redis is still running (with name `myredis`)

- We want to enable backups without touching it

- We will use a special backup container:

  - sharing the same volumes

  - linked to it (to connect to it easily)

  - possibly containing our backup tools

- This works because the `redis` container image
  <br/>stores its data on a volume

---

## Starting the backup container

.exercise[

- Make sure you're talking to the initial host:

  ```
  unset DOCKER_HOST
  ```

- Start the container:

  ```
  docker run --link myredis:redis \
             --volumes-from myredis \
             -v /tmp/myredis:/output \
             -ti alpine sh
  ```

- Look in `/data` in the container
  <br/>(That's where Redis puts its data dumps)
]

---

## Connecting to Redis

- We need to tell Redis to perform a data dump *now*

.exercise[

- Connect to Redis:
  <br/>`telnet redis 6379`

- Issue commands `SAVE` then `QUIT`

- Look at `/data` again (notice the time stamps)

]

- There should be a recent dump file now!

---

## Getting the dump out of the container

- We could use many things:

  - s3cmd to copy to S3
  - SSH to copy to a remote host
  - gzip/bzip/etc before copying

- We'll just copy it to the Docker host

.exercise[

- Copy the file from `/data` to `/output`

- Exit the container

- Look into `/tmp/myredis` (on the host)

]
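
The same steps can be run non-interactively. This is only a sketch: the `backup_myredis` helper is hypothetical, and it assumes that Redis writes its dump to `/data/dump.rdb` (the default file name) and that `nc` in the `alpine` image waits for Redis to close the connection.

```bash
# Hypothetical helper (not part of the workshop repository).
# SAVE is sent before QUIT, so the dump should be complete
# by the time Redis closes the connection and nc exits.
backup_myredis() {
  docker run --rm --link myredis:redis \
             --volumes-from myredis \
             -v /tmp/myredis:/output \
             alpine sh -c '
               (echo SAVE; echo QUIT) | nc redis 6379 &&
               cp /data/dump.rdb /output/'
}
```

Run `backup_myredis`, then look into `/tmp/myredis` on the host.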

---

## Scheduling backups

In the "old world," we (generally) use cron.

With containers, what are our options?

--

- run `cron` on the Docker host,
  <br/>and put `docker run` in the crontab

--

- run `cron` in the backup container,
  <br/>and make sure it keeps running
  <br/>(e.g. with `docker run --restart=…`)

--

- run `cron` in a container,
  <br/>and start backup containers from there

--

- listen to the Docker events stream,
  <br/>automatically scheduling backups
  <br/>when database containers are started

---

# Docker events stream

- Using the Docker API, we can get real-time
  notifications of everything happening in the Engine:

  - container creation/destruction
  - container start/stop
  - container exit/signal/out of memory
  - container attach/detach
  - volume creation/destruction
  - network creation/destruction
  - connection/disconnection of containers

(Networks will be covered a bit later!)

---

## Subscribing to the events stream

- This is done with `docker events`

.exercise[

- Get a stream of events:
  ```
  docker events
  ```

<!-- NEW-TERM -->

- In a new terminal, do *anything*:
  ```
  docker run --rm alpine sleep 10
  ```

]

You should see events for the lifecycle of the
container, as well as its connection/disconnection
to the default `bridge` network.

---

# Attaching labels

- You can attach arbitrary labels to engines and containers

- You can read the value of those labels

- You can use those labels as filters in some commands

.exercise[

- Start two containers, with and without a `backup` label:
  ```
  docker run -d --name leweb nginx
  docker run -d --name ledata --label backup=please redis
  ```

]

---

## Using labels as filters

- `docker ps` can take a `--filter` argument

.exercise[

- List only containers that have a `backup` label:
  ```
  docker ps --filter label=backup
  ```

- List only containers where the `backup` label
  has a specific value:
  ```
  docker ps --filter label=backup=please
  ```

]

---

## Filtering events

- On a large cluster, there will be *lots* of events
  <br/>(especially when using short-lived containers)

- `docker events` can also take a `--filter` argument

.exercise[

- Show events only for containers with a "backup" label:
  ```
  docker events --filter label=backup
  ```

<!-- NEW-TERM -->

- In a different terminal, terminate our containers:
  ```
  docker kill leweb ledata
  ```

]

Only the events for `ledata` will be shown.
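
A script can consume that stream to react to new containers. The sketch below is hypothetical (the function name and the message are made up): it only prints a line when a container carrying the `backup` label starts, which is where a real tool would schedule a backup.

```bash
# Hypothetical sketch: one output line per matching "start" event.
# Filters on different keys (label, event) must all match.
watch_backup_candidates() {
  docker events --filter label=backup --filter event=start |
  while read -r EVENT; do
    echo "backup candidate started: $EVENT"
  done
}
```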

---

## Using `docker ps` in scripts

- The default output of `docker ps` has two flaws:

  - it is not machine-readable
  - some information is not shown

- This can be changed with the `--format` flag

.exercise[

- List containers that have a `backup` label;
  <br/>show their container ID, image, and the label:
  ```
  docker ps \
    --filter label=backup \
    --format '{{ .ID }} {{ .Image }} {{ .Label "backup" }}'
  ```

]
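
With one record per line, the output is easy to consume in a loop. A minimal sketch (the `list_backup_targets` name and the message are illustrative assumptions):

```bash
# Hypothetical helper: read "ID label-value" pairs, one per line.
list_backup_targets() {
  docker ps \
    --filter label=backup \
    --format '{{ .ID }} {{ .Label "backup" }}' |
  while read -r ID VALUE; do
    echo "container $ID wants backup: $VALUE"
  done
}
```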

---

# Logs

- Two strategies:

  - log to plain files on volumes

  - log to stdout
    <br/>(and use a logging driver)

---

## Logging to plain files on volumes

(Sorry, that part won't be hands-on!)

- Start a container with `-v /logs`

- Make sure that all log files are in `/logs`

- To check logs, run e.g.

  ```
  docker run --volumes-from ... ubuntu sh -c \
         "grep WARN /logs/*.log"
  ```

- Or just go interactive:

  ```
  docker run --volumes-from ... -ti ubuntu
  ```

- You can (should) start a log shipper that way

---

## Logging to stdout

- All containers should write to stdout/stderr

- Docker will collect logs and pass them to a logging driver

- Logging driver can be specified globally, and per container
  <br/>(changing it for a container overrides the global setting)

- To change the global logging driver,
  <br/>pass extra flags to the daemon
  <br/>(requires a daemon restart)

- To override the logging driver for a container,
  <br/>pass extra flags to `docker run`

---

## Specifying logging flags

- `--log-driver`

  *selects the driver*

- `--log-opt key=val`

  *adds driver-specific options*
  <br/>*(can be repeated multiple times)*

- The flags are identical for `docker daemon` and `docker run`

Tip #1: when provisioning with Docker Machine, use:
```
docker-machine create ... --engine-opt log-driver=...
```

Tip #2: you can set logging options in Compose files.
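
For instance, to keep the `json-file` driver but cap disk usage for one container. This is a sketch: the `run_with_rotation` wrapper is hypothetical, and the `10m` and `3` values are arbitrary.

```bash
# Hypothetical wrapper: same as "docker run -d", with log rotation.
run_with_rotation() {
  docker run -d \
         --log-driver json-file \
         --log-opt max-size=10m \
         --log-opt max-file=3 \
         "$@"
}
```

Usage: `run_with_rotation nginx`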

---

## Available drivers

- json-file (default)

- syslog (can send to UDP, TCP, TCP+TLS, UNIX sockets)

- awslogs (AWS CloudWatch)

- journald

- gelf

- fluentd

- splunk

---

## About json-file ...

- It doesn't rotate logs by default, so your disks will fill up

  (Unless you set the `max-size` *and* `max-file` log options.)

- It's the only one supporting logs retrieval

  (If you want to use `docker logs`, `docker-compose logs`,
  or fetch logs from the Docker API, you need json-file!)

- This might change in the future

  (But it's complex since there is no standard protocol
  to *retrieve* log entries.)

All about logging in the documentation:
https://docs.docker.com/reference/logging/overview/

---

# Storing container logs in an ELK stack

*Important foreword: this is not an "official" or "recommended"
setup; it is just an example. We do not endorse ELK, GELF,
or the other elements of the stack more than others!*

What we will do:

- Spin up an ELK stack, with Compose

- Gaze at the spiffy Kibana web UI

- Manually send a few log entries over GELF

- Reconfigure our DockerCoins app to send logs to ELK

---

## What's in an ELK stack?

- ELK is three components:

  - ElasticSearch (to store and index log entries)

  - Logstash (to receive log entries from various
    sources, process them, and forward them to various
    destinations)

  - Kibana (to view/search log entries with a nice UI)

- The only component that we will configure is Logstash

- We will accept log entries using the GELF protocol

- Log entries will be stored in ElasticSearch,
  <br/>and displayed on Logstash's stdout for debugging

---

## Starting our ELK stack

- We will use a *separate* Compose file

- The Compose file is in the `elk` directory

.exercise[

- Go to the `elk` directory:
  ```
  cd ~/orchestration-workshop/elk
  ```

- Start the ELK stack:
  ```
  docker-compose up -d
  ```

]

---

## Checking that our ELK stack works

- Our default Logstash configuration sends a test
  message every minute

- All messages are stored into ElasticSearch,
  but also shown on Logstash stdout

.exercise[

- Look at Logstash stdout:
  ```
  docker-compose logs logstash
  ```

]

After less than one minute, you should see a `"message" => "ok"`
in the output.

---

## Connect to Kibana

- Our ELK stack exposes two public services:
  <br/>the Kibana web server, and the GELF UDP socket

.exercise[

- Check the port number for the Kibana UI:
  ```
  docker-compose ps kibana
  ```

- Open the UI in your browser
  <br/>(Use the instance IP address and the public port number)

]

---

## "Configuring" Kibana

- If you see a status page with a yellow item, wait a minute and reload
  (Kibana is probably still initializing)

- Kibana should prompt you to "Configure an index pattern";
  just click the "Create" button

- Then:

  - click "Discover" (in the top-left corner)
  - click "Last 15 minutes" (in the top-right corner)
  - click "Last 1 hour" (in the list in the middle)
  - click "Auto-refresh" (top-right corner)
  - click "5 seconds" (top-left of the list)

- You should see a series of green bars
  <br/>(with one new green bar every minute)

---

## Kibana out of the box

![Screenshot of Kibana](kibana.png)

---

## Sending container output to Kibana

- We will create a simple container displaying "hello world"

- We will override the container logging driver

.exercise[

- Check the port number for the GELF socket:
  <br/>`docker-compose ps logstash`

- Start a one-off container, overriding its logging driver:
  <br/>(make sure to update X.X.X.X:XXXXX, of course)

  ```
  docker run --rm --log-driver gelf \
         --log-opt gelf-address=udp://X.X.X.X:XXXXX \
         alpine echo hello world
  ```

]

---

## Visualizing container logs in Kibana

- Less than 5 seconds later (the refresh rate of the UI),
  the log line should be visible in the Web UI

- We can customize the Web UI to be more readable

.exercise[

- In the left column, move the mouse over the following
  columns, and click the "Add" button that appears:

  - host
  - container_name
  - short_message

]

---

## Removing the old deployment of DockerCoins

- Before redeploying DockerCoins, remove everything

.exercise[

- Go back to the dockercoins directory:
  <br/>`cd ~/orchestration-workshop/dockercoins`

- Stop and remove all DockerCoins containers:
  <br/>`docker-compose kill`
  <br/>`docker-compose rm -f`

- Reset the Compose file:
  <br/>`git checkout docker-compose.yml`

- Point the Docker API to a single node:
  <br/>`unset DOCKER_HOST`

]

---

## Add the logging driver to the Compose file

- We need to add the logging section to each container

- We need the GELF endpoint (host+port) that we
  got earlier with `docker-compose ps logstash`

.exercise[

- Edit the `docker-compose.yml` file,
  <br/>adding the following lines **to each container**:

  ```
  log_driver: gelf
  log_opt:
    gelf-address: "udp://X.X.X.X:XXXXX"
  ```

]

Shortcut: `docker-compose.yml-logging`
<br/>(But you still have to update `X.X.X.X:XXXXX`!)

---

## Start the DockerCoins app

.exercise[

- Use Compose normally:
  ```
  docker-compose up -d
  ```

]

If you look in the Kibana web UI, you will see log lines
refreshed every 5 seconds.

Note: to do interesting things (graphs, searches...) we
would need to create indexes. This is beyond the scope
of this workshop.

---

## Logging in production

- If we were using an ELK stack:

  - scale ElasticSearch
  - scale Logstash
  - move away from UDP *or* put one Logstash per node
  - interpose a Redis or Kafka queue

- Configure your Engines to send all logs to ELK by default

- Start the logging containers with a different logging system
  <br/>(to avoid a logging loop)

- Make sure you don't end up writing *all logs* on the nodes running Logstash!

---

# Security upgrades

- This section is not hands-on

- Public Service Announcement

- We'll discuss:

  - how to upgrade the Docker daemon

  - how to upgrade container images

---

## Upgrading the Docker daemon

- Stop all containers cleanly

- Stop the Docker daemon

- Upgrade the Docker daemon

- Start the Docker daemon

- Start all containers

- This is like upgrading your Linux kernel,
  <br/>but it will get better

---

## In practice

- Keep track of running containers before stopping the Engine:
  ```
  docker ps --no-trunc -q |
  tee /tmp/running |
  xargs -n1 -P10 docker stop
  ```

- Restart those containers after the Engine is running again:
  ```
  xargs docker start < /tmp/running
  ```
  <br/>(Run this multiple times if you have linked containers!)

---

## Upgrading container images

- When a vulnerability is announced:

  - if it affects your base images,
    <br/>make sure they are fixed first

  - if it affects downloaded packages,
    <br/>make sure they are fixed first

  - re-pull base images

  - rebuild

  - restart containers

---

## How do we know when to upgrade?

- Subscribe to CVE notifications

  - https://cve.mitre.org/

  - your distros' security announcements

- Check CVE status in official images
  <br/>(tag [cve-tracker](
  https://github.com/docker-library/official-images/labels/cve-tracker)
  in [docker-library/official-images](
  https://github.com/docker-library/official-images)
  repo)

- Coming soon: Project Nautilus
  <br/>(see [this DC15EU presentation](
  http://www.slideshare.net/Docker/official-repos-and-project-nautilus/26))

---

## Upgrading with Compose

Compose makes this particularly easy:
```
docker-compose build --pull --no-cache
docker-compose up -d
```

This will automatically:

- pull base images;
- rebuild all container images;
- bring up the new containers.

Remember: Compose will automatically move our
volumes to the new containers, so data is preserved.

---

# Network traffic analysis

- We still have `myredis` running

- We will use *shared network namespaces*
  <br/>to perform network analysis

- Two containers sharing the same network namespace...

  - have the same IP addresses

  - have the same network interfaces

- `eth0` is therefore the same in both containers

---

## Install and start `ngrep`

Ngrep uses libpcap (like tcpdump) to sniff network traffic.

.exercise[

- Start a container with the same network namespace:
  <br/>`docker run --net container:myredis -ti alpine sh`

- Install ngrep:
  <br/>`apk update && apk add ngrep`

- Run ngrep:
  <br/>`ngrep -tpd eth0 -Wbyline . tcp`

]

You should see a stream of Redis requests and responses.

---

class: title

# Dynamic orchestration

---

## Static vs Dynamic

- Static

  - you decide what goes where

  - simple to describe and implement

  - seems easy at first but doesn't scale efficiently

- Dynamic

  - the system decides what goes where

  - requires extra components (HA KV...)

  - scaling can be finer-grained, more efficient

---

## Mesos (overview)

- First presented in 2009

- Initial goal: resource scheduler
  <br/>(two-level/pessimistic)

  - top-level "master" knows the global cluster state

  - "slave" nodes report status and resources to master

  - master allocates resources to "frameworks"

- Container support added recently
  <br/>(had to fit existing model)

- Network and service discovery is complex

---

## Mesos (in practice)

- Easy to set up a test cluster (in containers!)

- Great to accommodate mixed workloads
  <br/>(see Marathon, Chronos, Aurora, and many more)

- "Meh" if you only want to run Docker containers

- In production on clusters of thousands of nodes

- Open source project; commercial support available

---

## Kubernetes (overview)

- Started in June 2014

- Designed specifically as a platform for containers
  <br/>("greenfield" design)

  - "pods" = groups of containers sharing network/storage

  - Scaling and HA managed by "replication controllers"

  - extensive use of "labels" instead of e.g. tree hierarchy

- Initially designed around Docker,
  <br/>but doesn't hesitate to diverge in a few places

---

## Kubernetes (in practice)

- Network and service discovery is powerful, but complex
  <br/>.small[(different mechanisms within pod, between pods, for inbound traffic...)]

- Initially designed around GCE
  <br/>.small[(networking and persistence work better on GCE, but this is getting better)]

- Tends to be loved by ops more than devs
  <br/>.small[(but keep in mind that it's evolving just as fast as Docker)]

- Adaptation is needed when it differs from Docker
  <br/>.small[(need to learn new API, new tooling, new concepts)]

- Bottom line: Kubernetes is not Docker!
  <br/>.small[(different APIs, concepts, configuration files...)]

---

## ECS (overview)

- Amazon EC2 Container Service

- "Bring your own instance"

- "Native" container scheduler on AWS

- Some integration with other AWS products

- No extra component to operate

- Defines new concepts:

  - task
  - task definition
  - service

---

## ECS (in practice)

- Task definitions look like Compose files,
  <br/>but are significantly different

- Integration with e.g. ELB is suboptimal
  <br/>(ELB requires all backends to run on the same port)

- Cluster deployment is made easier thanks to ECS CLI

- Docker API gets partially exposed through ECS API,
  <br/>with some features lagging behind

- Service discovery is painful

---

## Nomad (overview)

- Generic job scheduler
  <br/>(not only for containers)

- Desired state is stored in Consul

- Nodes pull jobs from Consul

- Scheduling happens in parallel

---

## Nomad (in practice)

*Disclaimer: I have little first-hand experience with Nomad!*

- Does only one thing, but does it really well

- Works with jobs, not applications, services, etc.

- As I understand it: Nomad is an excellent building block,
  <br/>but you need to add other components to deploy your apps

---

## Swarm (in theory)

- Consolidates multiple Docker hosts into a single one

- "Looks like" a Docker daemon, but it dispatches (schedules)
  your containers on multiple daemons

- Talks the Docker API front and back
  <br/>(leverages the Docker API and ecosystem)

- Open source and written in Go (like Docker)

- Started by two of the original Docker authors
  <br/>([@aluzzardi](https://twitter.com/aluzzardi) and [@vieux](https://twitter.com/vieux))

---

## Swarm (in practice)

- Stable since November 2015

- Tested with 1000 nodes + 50000 containers
  <br/>.small[(without particular tuning; see DockerCon EU opening keynotes!)]

- Perfect for some scenarios (Jenkins, grid...)

- Requires extra effort for Compose build, links...

- Requires a key/value store to achieve high availability

- We'll see it in action!

???

## PAAS on Docker

- The PAAS workflow: *just push code*
  <br/>(inspired by Heroku, dotCloud...)

- TL;DR: easier for devs, harder for ops,
  <br/>some very opinionated choices

- A few examples:
  <br/>(Non-exhaustive list!!!)

  - Cloud Foundry
  - Deis
  - Dokku
  - Flynn
  - Tsuru

*Docker made it very easy to cobble your own PAAS.*

???

## A few other tools

- Volume plugins (Convoy, Flocker...)

  - manage/migrate stateful containers (and more)

- Network plugins (Contiv, Weave...)

  - overlay network so that containers can ping each other

- Docker Cloud (Tutum), Docker UCP (Universal Control Plane)

  - dashboards to manage fleets of Docker hosts

... And many more!

---

# Hands-on Swarm

![Swarm Logo](swarm.png)

---

## Setting up our Swarm cluster

- This can be done manually or with **Docker Machine**

- Manual deployment:

  - with TLS: certificate generation is painful
    <br/>(needs dual-use certs)

  - without TLS: easier, but insecure
    <br/>(unless you run on your internal/private network)

- Docker Machine deployment:

  - generates keys, certificates, and deploys them for you

  - can also create VMs

---

## The Way Of The Machine

- Install `docker-machine` (single binary download)

- Set a few environment variables (cloud credentials)

- Create one or more machines:
  <br/>`docker-machine create -d digitalocean node42`

- List machines and their status:
  <br/>`docker-machine ls`

- Select a machine for use:
  <br/>`eval $(docker-machine env node42)`
  <br/>(this will set a few environment variables)

- Execute regular commands with Docker, Compose, etc.
  <br/>(they will pick up remote host address from environment)

---

## Docker Machine `generic` driver

- Most drivers work the same way:

  - use cloud API to create instance

  - connect to instance over SSH

  - install Docker

- The `generic` driver skips the first step

- It can install Docker on any machine,
  <br/>as long as you have SSH access

- We will use that!

---

# Deploying Swarm

- Components involved:

  - cluster discovery mechanism
    <br/>(so that the manager can learn about the nodes)

  - swarm manager
    <br/>(your frontend to the cluster)

  - swarm agent
    <br/>(runs on each node, registers it with service discovery)

---

# Cluster discovery

- Possible backends:

  - dynamic, self-hosted (zk, etcd, consul)

  - static (command-line or file)

  - hosted by Docker (token)

- We will use the token mechanism

---

## Generating our Swarm discovery token

The token is a unique identifier, corresponding to a bucket
in the discovery service hosted by Docker Inc.

(You can think of it as a rendezvous point for your cluster.)

.exercise[

- Create your token, and save it to disk as well:

  ```
  TOKEN=$(docker run swarm create | tee token)
  ```

]

---

## Swarm agent

- Used only for dynamic discovery (zk, etcd, consul, token)

- Must run on each node

- Every 20s (by default), tells the discovery system:
  <br/>"Hello, there is a Swarm node at A.B.C.D:EFGH"

- Must know the node's IP address
  <br/>(sorry, it can't figure it out by itself, because
  <br/>it doesn't know whether to use public or private addresses)

- The node continues to work even if the agent dies

- Automatically started by Docker Machine
  <br/>(when the `--swarm` option is passed)
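
Without Docker Machine, an agent could be started by hand along these lines. This is a sketch: the `start_swarm_agent` helper is hypothetical, and port 2376 assumes the Engine listens on the TLS port used elsewhere in this workshop.

```bash
# $1: address that the manager should use to reach this node
# $2: discovery token
start_swarm_agent() {
  docker run -d swarm join \
         --advertise="$1:2376" \
         "token://$2"
}
```

Usage: `start_swarm_agent A.B.C.D $TOKEN`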

---

## Swarm manager

- Exposes a Docker API endpoint

- Talks to the cluster nodes

- Performs healthchecks, scheduling...

.exercise[

- Connect to `node1`

- "Create" a node with Docker Machine:

  .small[
  ```
  docker-machine create --driver generic \
    --swarm --swarm-master --swarm-discovery token://$TOKEN \
    --generic-ssh-user docker --generic-ip-address A.B.C.D node1
  ```
  ]

]

(Don't forget to replace A.B.C.D with the node IP address!)

---

## Redundancy

- The manager is a SPOF

- If you lose the manager:

  - you can't control the cluster anymore

  - you can still control individual nodes

  - you can start a new manager
    <br/>(at this point, it is stateless)

- We'll set up active/passive redundancy later

---

## Check our node

Let's connect to the node *individually*.

.exercise[

- Select the node with Machine

  ```
  eval $(docker-machine env node1)
  ```

- Execute some Docker commands

  ```
  docker version
  docker info
  docker ps
  ```

]

Two containers should show up: the agent and the manager.

---

## Check our (single-node) Swarm cluster

Let's connect to the manager instead.

.exercise[

- Select the Swarm manager with Machine

  ```
  eval $(docker-machine env node1 --swarm)
  ```

- Execute some Docker commands

  ```
  docker version
  docker info
  docker ps
  ```

]

The output is different! Let's review this.

---

## `docker version`

Swarm identifies itself clearly:

```
Client:
 Version:      1.10.2
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   c3959b1
 Built:        Mon Feb 22 21:40:35 2016
 OS/Arch:      linux/amd64

Server:
 Version:      swarm/1.1.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   7e9c6bd
 Built:        Wed Mar  2 00:15:12 UTC 2016
 OS/Arch:      linux/amd64
```

---

## `docker info`

Swarm gives cluster information, showing all nodes:

.small[
```
Containers: 3
Images: 6
Role: primary
Strategy: spread
Filters: affinity, health, constraint, port, dependency
Nodes: 1
 node1: 52.58.50.15:2376
  └ Status: Healthy
  └ Containers: 3
  └ Reserved CPUs: 0 / 2
  └ Reserved Memory: 0 B / 3.859 GiB
  └ Labels: executiondriver=native-0.2,
            kernelversion=4.2.0-30-generic,
            operatingsystem=Ubuntu 15.10,
            provider=generic,
            storagedriver=aufs
  └ Error: (none)
  └ UpdatedAt: 2016-03-09T14:01:43Z
Kernel Version: 4.2.0-30-generic
Operating System: linux
Architecture: amd64
CPUs: 2
Total Memory: 3.86 GiB
Name: node1
```
]

---

## `docker ps`

- This one should show nothing at this point

- The Swarm containers are hidden

- This avoids unneeded pollution

- This also avoids killing them by mistake

- We can still see them with `docker ps -a`, though

---

## Add other nodes to the cluster

- Let's use *almost* the same command line
  <br/>(but without `--swarm-master`)

.exercise[

- Stay on `node1` (it has keys and certificates now!)

- Add another node with Docker Machine

  .small[
  ```
  docker-machine create --driver generic \
    --swarm --swarm-discovery token://$TOKEN \
    --generic-ssh-user docker --generic-ip-address A.B.C.D node2
  ```
  ]
]

Remember to update the IP address correctly.

Repeat for the 4 remaining nodes.

Pro tip: look for name/address mapping in `/etc/hosts`.

---

## Scripting

To help you a little bit:

```
# </dev/null keeps ssh from swallowing the rest of the host list
grep 'node[2345]' /etc/hosts | grep -v '^127' |
while read -r IPADDR NODENAME
do docker-machine create --driver generic \
   --swarm --swarm-discovery "token://$TOKEN" \
   --generic-ssh-user docker \
   --generic-ip-address "$IPADDR" "$NODENAME" </dev/null
done
```

Then check with `docker info` that all the nodes are there.
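
The discovery service can also be queried directly. A sketch, assuming `$TOKEN` still holds the token generated earlier (the `list_swarm_nodes` name is made up):

```bash
# $1: discovery token
list_swarm_nodes() {
  docker run --rm swarm list "token://$1"
}
```

`list_swarm_nodes $TOKEN` should print one `address:port` line per registered node.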

---

## Running containers on Swarm

Try to run a few `busybox` containers.

Then, let's get serious:

.exercise[

- Start a Redis service:
  <br/>`docker run -dP redis`

- See the service address:
  <br/>`docker port $(docker ps -lq) 6379`

]

This can be any of your five nodes.

---

## Scheduling strategies

- Random: pick a node at random
  <br/>(but honor resource constraints)

- Spread: pick the node with the least containers
  <br/>(including stopped containers)

- Binpack: try to maximize resource usage
  <br/>(in other words: use as few hosts as possible)
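
The strategy is chosen when the manager starts. With Docker Machine, that would mean passing `--swarm-strategy` when creating the master. A sketch only (the `create_swarm_master` helper is hypothetical, and we won't recreate `node1` in this workshop):

```bash
# $1: strategy (spread, binpack, or random)
# $2: IP address of the node
# $3: discovery token
create_swarm_master() {
  docker-machine create --driver generic \
    --swarm --swarm-master --swarm-strategy "$1" \
    --swarm-discovery "token://$3" \
    --generic-ssh-user docker --generic-ip-address "$2" node1
}
```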

---

# Building our app on Swarm

Before trying to build our app, we will remove previous images.

.exercise[

- Delete all images with "dockercoins" in the name:

  ```
  docker images |
    grep dockercoins |
    awk '{print $1}' |
    xargs -r docker rmi -f
  ```

]

---

## Building our app on Swarm

- Compose now supports builds on Swarm
  <br/>(older versions would crash)

.exercise[

- Run `docker-compose build`

- Try to start and scale the application:
  ```
  docker-compose up -d
  docker-compose scale worker=10
  docker-compose scale webui=2
  ```

]

.icon[![Warning](warning.png)] This is supposed to fail!

(Don't bang your head on the keyboard if it doesn't work!)

---

## Caveats when building with Swarm

- Containers are only scheduled where they were built

  - cause: images are not present on all nodes

  - solution: distribute images through a registry
    <br/>(e.g. Docker Hub)

- You can end up with inconsistent versions
  <br/>(i.e. `dockercoins_rng:latest` being different on two nodes)

  - cause: build nodes can come and go

  - solution: always pin builds to the same node

- Also, caching doesn't work all the time

---

## Why can't Swarm do this automatically for us?

- Let's step back and think for a minute ...

- What should `docker build` do on Swarm?

  - build on one machine

  - build everywhere ($$$)

- After the build, what should `docker run` do?

  - run where we built (how do we know where it is?)

  - run on any machine that has the image

- Could Compose+Swarm solve this automatically?

---

## A few words about "sane defaults"

- *It would be nice if Swarm could pick a node, and build there!*

  - but which node should it pick?
  - what if the build is very expensive?
  - what if we want to distribute the build across nodes?
  - what if we want to tag some builder nodes?
  - ok but what if no node has been tagged?

- *It would be nice if Swarm could automatically push images!*

  - using the Docker Hub is an easy choice
    <br/>(you just need an account)
  - but some of us can't/won't use Docker Hub
    <br/>(for compliance reasons or because no network access)

.small[("Sane" defaults are nice only if we agree on the definition of "sane")]

---

## The plan

- Build locally

- Tag images

- Upload them to a registry

- Update the Compose file to use those images

*That's the purpose of the `build-tag-push.py` script!*
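
Done by hand for a single service, the first three steps could look like the sketch below. It is not the script itself: the `build_tag_push` function, its arguments, and the assumption that the `rng` image is built from the `rng` directory are for illustration only.

```bash
# $1: service name (e.g. rng)
# $2: registry prefix (e.g. a Docker Hub user name)
# $3: tag
build_tag_push() {
  docker build -t "dockercoins_$1" "$1" &&
  docker tag "dockercoins_$1" "$2/dockercoins_$1:$3" &&
  docker push "$2/dockercoins_$1:$3"
}
```

The last step of the plan (pointing the Compose file to the pushed images) would still have to be done separately.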

---

## Which registry do we want to use?

.small[

- **Docker Hub**

  - hosted by Docker Inc.
  - requires an account (free, no credit card needed)
  - images will be public (unless you pay)
  - located in AWS EC2 us-east-1

- **Docker Trusted Registry**

  - self-hosted commercial product
  - requires a subscription (free 30-day trial available)
  - images can be public or private
  - located wherever you want

- **Docker open source registry**

  - self-hosted barebones repository hosting
  - doesn't require anything
  - doesn't come with anything either
  - located wherever you want
]

---

## Using Docker Hub

- To tell `build-tag-push.py` to use Docker Hub,
  <br/>set the `DOCKER_REGISTRY` environment variable
  <br/>to your Docker Hub user name

- We will also see how to run the open source registry
  <br/>(so use whatever option you want!)

.exercise[

- Set the following environment variable:
  <br/>`export DOCKER_REGISTRY=jpetazzo`

- (Use *your* Docker Hub login, of course!)

- Log into the Docker Hub:
  <br/>`docker login`

]

---

## Using Docker Trusted Registry

If we wanted to use DTR, we would:

- make sure we have a Docker Hub account
- [activate a Docker Datacenter subscription](
  https://hub.docker.com/enterprise/trial/)
- install DTR on our machines
- set `DOCKER_REGISTRY` to `dtraddress:port/user`

*This is out of the scope of this workshop!*

---

## Using open source registry

- We need to run a `registry:2` container
  <br/>(make sure you specify tag `:2` to run the new version!)

- It will store images and layers to the local filesystem
  <br/>(but you can add a config file to use S3, Swift, etc.)

- Docker *requires* TLS when communicating with the registry,
  except for registries on `localhost`, or when the Engine
  is started with the `--insecure-registry` flag

- We will run an ambassador on each node
  of the cluster, redirecting `localhost:5000` to
  the registry

---

## Deploying our open source registry

.exercise[

- Start your registry on your Swarm cluster:
  ```
  eval $(docker-machine env node1 --swarm)
  docker run -dP --name registry registry:2
  ```

- Start five ambassadors (one per node):
  ```
  for N in $(seq 1 5); do
    docker run -d -p 5000:5000 jpetazzo/hamba \
           5000 $(docker port registry 5000)
  done
  ```

]

---

## Testing our local registry

- We can retag a small image, and push it to the registry

.exercise[

- Make sure we have the busybox image:
  ```
  docker pull busybox
  ```

- Retag the busybox image:
  ```
  docker tag busybox localhost:5000/busybox
  ```

- Push it:
  ```
  docker push localhost:5000/busybox
  ```

]

---

## Using our local registry

- The `build-tag-push.py` script uses the `DOCKER_REGISTRY`
  environment variable

.exercise[

- Set the `DOCKER_REGISTRY` variable:
  ```
  export DOCKER_REGISTRY=localhost:5000
  ```

]

---
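
## Sketch: building image names from `DOCKER_REGISTRY`

The following is a minimal sketch, not the actual code of `build-tag-push.py`.
It shows one way a registry prefix could be combined with a service name.
The `image_name` helper, the `dockercoins` project name, and the `v1` tag
are assumptions made for illustration.

```python
import os


def image_name(service, tag, project="dockercoins", registry=None):
    """Build a 'registry/project_service:tag' image reference."""
    if registry is None:
        # Same variable as in the exercise; raises KeyError if unset.
        registry = os.environ["DOCKER_REGISTRY"]
    return "{}/{}_{}:{}".format(registry, project, service, tag)


print(image_name("rng", "v1", registry="localhost:5000"))
```

With `registry="jpetazzo"`, the same helper would yield a Docker Hub style name.

---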

## Build, Tag, And Push

Let's inspect the source code of `build-tag-push.py` and run it.

.icon[![Warning](warning.png)] Make sure to run it against a single node!

.icon[![Warning](warning.png)] Make sure to use the original Compose file!

.small[(We don't want the scaled RNG service, the custom logging driver, etc.)]

.exercise[

- Run `build-tag-push.py`:
  ```
  eval $(docker-machine env node1)
  git checkout docker-compose.yml
  ../bin/build-tag-push.py
  ```

]

Inspect the `docker-compose.yml-XXX` file that it created.

---

## Can we run this now?

Let's try!

.exercise[

- Switch back to the Swarm cluster:
  <br/>`eval $(docker-machine env node1 --swarm)`

- Protip - set the `COMPOSE_FILE` variable:
  <br/>`export COMPOSE_FILE=docker-compose.yml-XXX`

- Bring up the application:
  <br/>`docker-compose up`

]

.icon[![Warning](warning.png)] This is *still* supposed to fail!

--

(╯°□°)╯︵ ┻━┻

---

## Why do we get that weird error message?

- Compose and Swarm do not collaborate
  to establish *placement constraints*.

---

## Simple container dependencies

- Container A has a link to container B

- Compose starts B first, then A

- Swarm translates the link into a placement constraint:

  - *"put A on the same node as B"*

- All good

---

## Complex container dependencies

- Container A has a link to containers B and C

- Compose starts B and C first
  <br/>(but that can be on different nodes!)

- Compose starts A

- Swarm translates the links into placement constraints:

  - *"put A on the same node as B"*
  - *"put A on the same node as C"*

- If B and C are on different nodes, that's impossible

So, what do‽

---
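
## Sketch: why the constraints can be unsatisfiable

The reasoning on the previous slide can be modeled as a set intersection.
This is a toy model, not Swarm's scheduler: the `candidate_nodes` helper
and the node names are assumptions made for illustration.

```python
def candidate_nodes(all_nodes, linked_placements):
    """Return nodes satisfying every 'same node as X' constraint."""
    candidates = set(all_nodes)
    for node in linked_placements.values():
        candidates &= {node}
    return candidates


nodes = ["node1", "node2", "node3"]
# B and C on the same node: A can be scheduled there.
print(candidate_nodes(nodes, {"B": "node1", "C": "node1"}))
# B and C on different nodes: no node satisfies both constraints.
print(candidate_nodes(nodes, {"B": "node1", "C": "node2"}))
```

An empty result means that no node can host container A.

---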

## A word on placement constraints

- Swarm supports constraints

- We could tell Swarm to put all our containers together

- Linking would work

- But all containers would end up on the same node

--

- So having a cluster would be pointless!

---

## Connecting containers with Swarm (1/2)

- Implement service discovery in the application

  - requires extensive code changes

  - doesn't require extra services or containers

  - provides load balancing and failover

- Inject service addresses in environment variables

  - requires minimal code changes

  - doesn't require extra services or containers

  - doesn't provide load balancing and failover

---

## Connecting containers with Swarm (2/2)

- Ambassadors

  - don't require code changes

  - require additional containers

  - provide load balancing and failover

- Overlay networks

  - don't require code changes

  - don't require extra services or containers

  - don't provide load balancing and failover (yet)

---

# Connecting containers with ambassadors

- We will use one-tier, dynamic ambassadors

- Each link to a service will be replaced by an ambassador

- Each ambassador will be placed in the network namespace
  of the service using the ambassador

- Ambassadors will be dynamically reconfigured when
  linked services are updated

---

## Revisiting `jpetazzo/hamba`

- Configuration is stored in a *volume*

- A watcher process looks for configuration updates,
  <br/>and restarts HAProxy when needed

- It can be started without configuration:

  ```
  docker run --name amba jpetazzo/hamba run
  ```

- There is a helper to inject a new configuration:

  ```
  docker run --rm --volumes-from amba jpetazzo/hamba \
         reconfigure 80 backend1 port1 backend2 port2 ...
  ```

---

## Should we use `links` for our ambassadors?

Technically, we could use links.

- Before starting an app container:

  start the ambassador(s) it needs

- When starting an app container:

  link it to its ambassador(s)

But we wouldn't be able to use `docker-compose scale` anymore.

(We would have to scale the ambassadors *first*,
then add our client containers.)

---

## Network namespaces and `extra_hosts`

This is our plan:

- Replace each `link` with an `extra_host`,
  <br/>pointing to the `127.127.X.X` address space

- Start app containers normally
  <br/>(`docker-compose up`, `docker-compose scale`)

- Start ambassadors after app containers are up:

  - ambassadors bind to `127.127.X.X`

  - they share their client's network namespace

- Reconfigure ambassadors each time something changes

---

## Our plan for service discovery

- Replace all `links` with static `/etc/hosts` entries

- Those entries will map to `127.127.0.X`
  <br/>(with different `X` for each service)

- Example: `redis` will point to `127.127.0.2`
  <br/>(instead of a container address)

- Start all services; scale them if we want
  <br/>(at this point, they will all fail to connect)

- Start ambassadors in the services' namespace;
  <br/>each ambassador will listen on the right `127.127.0.X`

- Gather all backend addresses and configure ambassadors

.icon[![Warning](warning.png)] Services should try to reconnect!

---
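
## Sketch: turning links into `extra_hosts` entries

The first step of the plan can be sketched as follows. This is not the code
of `link-to-ambassadors.py`: the `links_to_extra_hosts` helper is
hypothetical, and the allocation order (alphabetical, starting at `.2`)
is an assumption that need not match the real script.

```python
def links_to_extra_hosts(links, first_host_id=2):
    """Map each linked service name to a distinct 127.127.0.X address."""
    entries = []
    for offset, name in enumerate(sorted(set(links))):
        address = "127.127.0.{}".format(first_host_id + offset)
        # Compose expects "hostname:ip" strings under extra_hosts.
        entries.append("{}:{}".format(name, address))
    return entries


print(links_to_extra_hosts(["redis"]))
print(links_to_extra_hosts(["rng", "hasher", "redis"]))
```

Each entry ends up as a static line in the container's `/etc/hosts`.

---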

## "Design for failure," they said

- When the containers are started, the network is not ready

- First connection attempts **will fail**

- App should try to reconnect

- It is OK to crash and restart

- Exponential back-off is nice

---
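
## Sketch: reconnecting with exponential back-off

A minimal sketch of the retry logic described on the previous slide.
The `connect_with_backoff` helper and its default values are assumptions
made for illustration, not code taken from DockerCoins.

```python
import time


def connect_with_backoff(connect, attempts=5, base_delay=0.5,
                         sleep=time.sleep):
    """Call connect() until it succeeds, doubling the delay each time."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == attempts:
                raise  # give up: crashing and restarting is OK too
            sleep(delay)
            delay *= 2
```

Example use:
`connect_with_backoff(lambda: socket.create_connection(("redis", 6379)))`
(after `import socket`).

---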

## Pictures worth 1000 words

- In the following diagrams, we are connecting a
  `www` service to a `redis` service through
  an ambassador.

---

class: pic

![](ambassadors/simple-1.png)

---

class: pic

![](ambassadors/simple-2.png)

---

class: pic

![](ambassadors/simple-3.png)

---

class: pic

![](ambassadors/simple-4.png)

---

class: pic

![](ambassadors/simple-5.png)

---

class: pic

![](ambassadors/simple-6.png)

---

## Our tools

- `link-to-ambassadors.py`

  - replaces all `links` with `extra_hosts` entries

- `create-ambassadors.py`

  - scans running containers
  - allocates `127.127.X.X` addresses
  - starts (unconfigured) ambassadors

- `configure-ambassadors.py`

  - scans running containers
  - gathers backend addresses
  - sends configuration to ambassadors

---

## Convert links to ambassadors

- When we ran `build-tag-push.py` earlier,
  <br/>it generated a new `docker-compose.yml-XXX` file.

.exercise[

- Run the first script to create a new YAML file:
  <br/>`../bin/link-to-ambassadors.py`

]

In the Compose file, all links have been replaced
by `extra_hosts` sections.

---

class: pic

## Current state

![Two empty hosts](ambassadors/dockercoins-1.png)

---

## Bring up the application

The application can now be started and scaled.

.exercise[

- Start the application:
  <br/>`docker-compose up -d`

]

Note: you can scale everything as you like, *except Redis*,
because it is stateful.

---

class: pic

## Current state

![Containers are running, but disconnected](ambassadors/dockercoins-2.png)

---

## Create the ambassadors

This has to be executed each time you create new services
or scale up existing ones.

After reading `$COMPOSE_FILE`, it will scan running containers, and compare:

- the list of app containers,
- the list of ambassadors.

It will create missing ambassadors.

.exercise[

- Run the script!
  <br/>`../bin/create-ambassadors.py`

]

---
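
## Sketch: finding the missing ambassadors

The comparison described on the previous slide boils down to a set
difference. This is a toy model, not the code of `create-ambassadors.py`:
the `missing_ambassadors` helper and the container names are assumptions
made for illustration.

```python
def missing_ambassadors(app_links, existing):
    """Return the (container, service) pairs lacking an ambassador.

    app_links maps a container name to the services it must reach;
    existing is the set of (container, service) pairs already covered.
    """
    wanted = {(container, service)
              for container, services in app_links.items()
              for service in services}
    return sorted(wanted - set(existing))


links = {"worker_1": ["redis", "rng"], "worker_2": ["redis", "rng"]}
have = {("worker_1", "redis"), ("worker_1", "rng")}
print(missing_ambassadors(links, have))
```

Running it again once everything is covered returns an empty list,
which is why the script can be re-executed safely after each scale-up.

---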

class: pic

## Current state

![Ambassadors are started, but unconfigured](ambassadors/dockercoins-3.png)

---

## Configure the ambassadors

All ambassadors are created but they still need configuration.

That's the purpose of the last script.

It will read `$COMPOSE_FILE` and gather:

- the list of app backends,
- the list of ambassadors.

Then it configures all ambassadors with all found backends.

.exercise[

- Run it!
  <br/>`../bin/configure-ambassadors.py`

]

---

class: pic

## Current state

![Ambassadors are configured](ambassadors/dockercoins-4.png)

---

## Check what we did

.exercise[


- Find out the address of the web UI:
  <br/>`docker-compose ps webui`

- Point your browser to it

]

---

## Scale

- We will now add more containers.

.exercise[

- Scale worker and rng:
  ```
  docker-compose scale worker=5 rng=10
  ```

]

The performance graph stays at the same level.

If we look at the logs of the added workers, we will
see screenfuls of "connection refused" exceptions.

---

class: pic

## Current state

![New containers are there, but not ambassadors](ambassadors/dockercoins-5.png)

---

## Add ambassadors

- The new containers don't have ambassadors at this point.

.exercise[

- Create the missing ambassadors with the script:
  ```
  ../bin/create-ambassadors.py
  ```

]

The performance graph stays at the same level.

If we look at the logs of the added workers, we will
now see timeout errors instead of "connection refused."

---

class: pic

## Current state

![Now ambassadors are here, but unconfigured](ambassadors/dockercoins-6.png)

---

## Configure ambassadors

- The last step is to inject the updated configuration.

.exercise[

- Run the last script one more time:
  ```
  ../bin/configure-ambassadors.py
  ```

]

Now the performance graph climbs up, and the worker
logs show normal operation.

---

class: pic

## Current state

![All ambassadors are here and configured](ambassadors/dockercoins-7.png)

---

## Clean up

- Before moving on, stop and remove all containers

.exercise[

- Terminate and remove all containers:
  ```
  docker-compose down
  ```

- Remove ambassadors:
  ```
  ../bin/delete-ambassadors.sh
  ```

]

---

## A few words about those ambassadors

- There is "a lot" of added complexity here
  <br/>(5 scripts of almost 50 lines each!)

- Snark aside, these scripts rely on several concepts:

  - network namespaces
  - dynamic load balancer reconfiguration
  - sidekick containers that are *mandatory*
  - ... and have to be managed manually

- We are going to see an easier way to manage this!

---

# Setting up Consul and overlay networks

- We will reconfigure our Swarm cluster to enable overlays

- We will deploy a Consul cluster

- We will connect containers running on different machines

---

## First, let's Clean All The Things!

- We need to remove the old containers
  <br/>(in particular the `swarm` agents and managers)

.exercise[

- The following snippet will nuke all containers on all hosts:

  ```
  for N in 1 2 3 4 5
  do
    ssh node$N "docker ps -qa | xargs -r docker rm -f"
  done
  ```

(If it asks you to confirm SSH keys, just do it!)

]

Note: our Swarm cluster is now broken.

---

## Remove old Machine information

- We will use `docker-machine rm`

- With the `generic` driver, this doesn't do anything
  <br/>(it just deletes local configuration)

- With cloud/VM drivers, this would actually delete VMs

.exercise[

- Remove our nodes from Docker Machine config database:

  ```
  for N in 1 2 3 4 5
  do
    docker-machine rm -f node$N
  done
  ```

]

---

## Add extra options to our Engines

- We need two new options for our engines:

  - `cluster-store` (to indicate which key/value store to use)

  - `cluster-advertise` (to indicate which IP address to register)

- `cluster-store` will be `consul://localhost:8500`
  <br/>(we will run one Consul node on each machine)

- `cluster-advertise` will be `eth0:2376`
  <br/>(Engine will automatically pick up eth0's IP address)

---

## Reconfiguring Swarm clusters, the Docker way

- The traditional way to reconfigure a service is to edit
  its configuration (or init script), then restart

- We can use Machine to make that easier

- Re-deploying with Machine's `generic` driver will reconfigure
  Engines with the new parameters

.exercise[

- Re-provision the manager node:

  .small[
  ```
  docker-machine create --driver generic \
    --engine-opt cluster-store=consul://localhost:8500 \
    --engine-opt cluster-advertise=eth0:2376 \
    --swarm --swarm-master --swarm-discovery consul://localhost:8500 \
    --generic-ssh-user docker --generic-ip-address XX.XX.XX.XX node1
  ```
  ]
]

---

## Reconfigure the other nodes

- Once again, scripting to the rescue!

.exercise[

```
grep 'node[2345]' /etc/hosts | grep -v ^127 |
while read IPADDR NODENAME
do docker-machine create --driver generic \
   --engine-opt cluster-store=consul://localhost:8500 \
   --engine-opt cluster-advertise=eth0:2376 \
   --swarm --swarm-discovery consul://localhost:8500 \
   --generic-ssh-user docker \
   --generic-ip-address $IPADDR $NODENAME
done
```

]

---

## Checking what we did

.exercise[

- Directly point the CLI to a node and check configuration:

  ```
  eval $(docker-machine env node1)
  docker info
  ```

  (should show `Cluster store` and `Cluster advertise`)

- Try to talk to the Swarm cluster:

  ```
  eval $(docker-machine env node1 --swarm)
  docker info
  ```

  (should show zero nodes)

]

---

## Why zero nodes?

- We haven't started Consul yet

- Swarm discovery is not operational

- Swarm can't discover the nodes

Note: good guy ~~Stevedore~~ Docker will start without K/V

(This lets us run Consul itself in a container!)

---

## Adding Consul

- We will run Consul in containers

- We will use a
  [custom consul image](https://hub.docker.com/r/jpetazzo/consul/)

- We will tell Docker to automatically restart it on reboots

- To simplify network setup, we will use `host` networking

---

## Starting the first Consul node

.exercise[

- Make sure you're logged into `node1`,
  with a clean environment:

  ```
  unset DOCKER_HOST
  ```

- The first node must be started with the `-bootstrap` flag:

  ```
  CID=$(docker run --name consul_node1 \
        -d --restart=always --net host \
        jpetazzo/consul agent -server -bootstrap)
  ```

]

---

## Starting the other Consul nodes

- Other nodes have to be started with the `-join A.B.C.D`
  option, where A.B.C.D is the address of an existing node

.exercise[

- Find the internal IP address of our first node:
  ```
  IPADDR=$(ip a ls dev eth0 |
           sed -n 's,.*inet \(.*\)/.*,\1,p')
  ```

- Start the other nodes:
  ```
  for N in 2 3 4 5; do
  ssh node$N docker run --name consul_node$N \
             -d --restart=always --net host \
             jpetazzo/consul agent -server -join $IPADDR
  done
  ```

]

---

## Check that our Consul cluster is up

- With your browser, navigate to any instance on port 8500
  <br/>(in "NODES" you should see the five nodes)

- Let's run a couple of useful Consul commands

.exercise[

- Ask Consul for the list of members it knows about:
  ```
  docker run --net host --rm jpetazzo/consul members
  ```

- Ask Consul which node is the current leader:
  ```
  curl localhost:8500/v1/status/leader
  ```

]

---

## Check that our Swarm cluster is up

.exercise[

- Run the `docker info` command from earlier again:

  ```
  eval $(docker-machine env --swarm node1)
  docker info
  ```

- Now all nodes should be visible
  <br/>(Give them a minute or two to register)

]

---

# Multi-host networking

- Docker 1.9 has the concept of *networks*

- By default, containers are on the default "bridge" network

- You can create additional networks

- Containers can be on multiple networks

- Containers can dynamically join/leave networks

- The "overlay" driver lets networks span multiple hosts

- Let's see that in action!

---

## Create a few networks and containers

.exercise[

- Create two networks, *blue* and *green*:
  ```
  docker network create --driver overlay blue
  docker network create --driver overlay green
  docker network ls
  ```

- Create containers with names of blue and green
  things, on their respective networks:
  ```
  docker run -d --name sky --net blue -m 3G redis
  docker run -d --name navy --net blue -m 3G redis
  docker run -d --name grass --net green -m 3G redis
  docker run -d --name forest --net green -m 3G redis
  ```

]

---

## Check connectivity within networks

.exercise[

- Check that our containers are on different nodes:

  ```
  docker ps
  ```

- This will work:

  ```
  docker exec -ti sky ping navy
  ```

- This will not:

  ```
  docker exec -ti navy ping grass
  ```

]

---

## Containers connected to multiple networks

- Some colors aren't *quite* blue *nor* green

.exercise[

- Create a container that we want to be on both networks:
  ```
  docker run -d --net blue --name turquoise nginx
  ```

- Check connectivity:
  ```
  docker exec -ti turquoise ping -c1 navy
  docker exec -ti turquoise ping -c1 grass
  ```
  (First works; second doesn't)

]

---

## Dynamically connecting containers

- This is achieved with the command:
  <br/>`docker network connect NETNAME CONTAINER`

.exercise[

- Dynamically connect to the green network:
  ```
  docker network connect green turquoise
  ```

- Check connectivity:
  ```
  docker exec -ti turquoise ping -c1 navy
  docker exec -ti turquoise ping -c1 grass
  ```
  (Both commands work now)

]

---

## Under the hood

- Each network has an interface in the container

- There is also an interface for the default gateway

.exercise[

- View interfaces in our `turquoise` container:
  ```
  docker exec -ti turquoise ip addr ls
  ```

]

---

## Dynamically disconnecting containers

- There is a mirror command to `docker network connect`

.exercise[

- Disconnect the *turquoise* container from *blue*
  (its original network):
  ```
  docker network disconnect blue turquoise
  ```

- Check connectivity:
  ```
  docker exec -ti turquoise ping -c1 navy
  docker exec -ti turquoise ping -c1 grass
  ```
  (First command fails, second one works)

]

---

## Cleaning up

.exercise[

- Destroy containers:

  ```
  docker rm -f sky navy grass forest turquoise
  ```

- Destroy networks:

  ```
  docker network rm blue
  docker network rm green
  ```

]

You cannot remove a network if
it still has containers.

There is no `"rm -f"` for networks.
<br/>
There is a `"disconnect -f"` if needed.

---

# Using overlay networks with Compose

- Compose 1.5 had the `--x-networking` flag
  <br/>(enabling experimental support for overlay networks)

- Compose 1.6 has a new Compose file format
  <br/>(using the new format enables overlay network support)

- Compose will remain backward compatible with old files

- Converting to new files is (ridiculously) easy

---

## Our first "Compose v2" app

- To deploy DockerCoins, we still need a local registry

- Let's deploy a local registry using a Compose File v2!

.exercise[

- Go to the `registry` directory in the repository:
  ```
  cd ~/orchestration-workshop/registry
  ```

]

Let's examine the `docker-compose.yml` file.

---

## Our first Compose v2 file

```
version: "2"

services:
  backend:
    image: registry:2
  frontend:
    image: jpetazzo/hamba
    command: 5000 backend:5000
    ports:
      - "127.0.0.1:5000:5000"
    depends_on:
      - backend
```

- *Backend* is the actual registry.
- *Frontend* is the ambassador that we deployed earlier.
<br/>
It communicates with *backend* using an internal network
and network aliases.

---

## Starting a local registry with Compose

- We will bring up the registry

- Then we will ensure that one *frontend* is running
  on each node by scaling it to our number of nodes

.exercise[

- Make sure that `COMPOSE_FILE` is not set:
  ```
  unset COMPOSE_FILE
  ```

- Start the registry:
  ```
  docker-compose up -d
  ```

]

---

## "Scaling" the local registry

- This is a particular kind of scaling

- We just want to ensure that one *frontend*
  is running on every single node of the cluster

.exercise[

- Scale the registry:
  ```
  N=1
  while docker-compose scale frontend=$N; do
    N=$((N+1))
  done
  ```

]

Note: Swarm might do that automatically for us in the future.

---

## Converting the Compose file for DockerCoins

- Services are no longer at the top level,
  <br/>but under a `services` section

- There has to be a `version` key at the top level,
  <br/>with value `"2"` (as a string, not an integer)

- Links should be removed

- Fixed port mappings should be removed
  <br/>(until [docker/compose#2866](
  https://github.com/docker/compose/issues/2866) is fixed)

- There are other minor differences, but for our sample
  app, that's all we have to worry about!

---
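
## Sketch: automating the conversion

The rules on the previous slide can be applied mechanically. The sketch
below works on an already-parsed Compose file (a plain `dict`); the
`convert_to_v2` helper and the sample data are assumptions made for
illustration, and the minor differences mentioned earlier are ignored.

```python
def convert_to_v2(v1_services):
    """Convert a parsed Compose v1 mapping to the v2 layout."""
    services = {}
    for name, config in v1_services.items():
        config = dict(config)      # shallow copy; input stays intact
        config.pop("links", None)  # links should be removed
        if "ports" in config:
            # "8000:80" -> "80": keep only the container port
            config["ports"] = [str(port).split(":")[-1]
                               for port in config["ports"]]
        services[name] = config
    return {"version": "2", "services": services}


v1 = {
    "webui": {"build": "webui", "links": ["redis"], "ports": ["8000:80"]},
    "redis": {"image": "redis"},
}
print(convert_to_v2(v1))
```

Note that `version` is emitted as the string `"2"`, not the integer `2`.

---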

## Our new Compose file

.small[
```
version: '2'

services:
    rng:
        build: rng
        ports:
            - 80

    hasher:
        build: hasher
        ports:
            - 80

    webui:
        build: webui
        ports:
            - 80

    redis:
        image: redis

    worker:
        build: worker
```
]

Copy-paste this into `docker-compose.yml`
<br/>(or you can `cp docker-compose.yml-v2 docker-compose.yml`)

---

## Use images, not builds

- If we try to start the app like that, containers will only
  run on nodes which have the images

- Like before: we need to replace `build` with `image`

- We can re-use the `build-tag-push.py` script for that

.exercise[

- Set `DOCKER_REGISTRY` to use our local registry,
  <br/>then build, tag, and push the application:
  ```
  export DOCKER_REGISTRY=localhost:5000
  ../bin/build-tag-push.py
  ```

]

---

## Run the application

- At this point, our app is ready to run

- We don't need ambassadors or extra containers

.exercise[

- Start the application:
  ```
  export COMPOSE_FILE=docker-compose.yml-XXXX
  docker-compose up -d
  ```

- Observe that it's running on multiple nodes:
  ```
  docker ps
  ```

]

Each container name is prefixed with the node it's running on.

---

## View the performance graph

- Load up the graph in the browser

.exercise[

- Check the `webui` service address and port:
  ```
  docker-compose port webui 80
  ```

- Open it in your browser

]

---

# Load balancing with overlay networks

- Scaling the `worker` service works out of the box
  (like before)

.exercise[

- Scale `worker`:
  ```
  docker-compose scale worker=10
  ```

]

We will hit the bottleneck caused by the `rng` service.

How can we scale that service?

---

## The manual method

- Replace `rng` with:

  - multiple copies `rng1`, `rng2`, `rng3`, ...

  - a load balancer taking over the name `rng`,
    <br/>and spreading traffic across all instances

- You should have a sense of *déjà vu*

- We did that at the beginning of the workshop

- Can we do better?

---

## The scripted method

- We could write a script to automate those steps

--

- *Can we do better?*

--

- In a perfect world, we would like to do:
  ```
  docker-compose scale rng=10
  ```

---

## Naming problem

- Service is called `rng`

- It therefore takes the network name `rng`

- Worker code connects to `rng`

- So `rng` should point to the load balancer

- What do‽

---

## Naming is *per-network*

- Solution: put `rng` on its own network

- That way, it doesn't take the network name `rng`
  <br/>(at least not on the default network)

- Have the load balancer sit on both networks

- Add the name `rng` to the load balancer

---
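
## Sketch: per-network name resolution

The idea of the previous slide can be illustrated with a toy model.
This is not how Docker's embedded DNS is implemented: the `resolve`
helper and the container names are assumptions made for illustration.

```python
def resolve(name, client, networks):
    """Return the containers that `client` can reach under `name`."""
    found = set()
    for members in networks.values():
        if client not in members:
            continue  # client is not attached to this network
        for container, names in members.items():
            if name in names:
                found.add(container)
    return sorted(found)


# network -> container -> names it answers to on that network
networks = {
    "default": {"worker": ["worker"], "rng-lb": ["rng-lb", "rng"]},
    "rng": {"rng-lb": ["rng-lb"], "rng_1": ["rng_1", "rng"]},
}
print(resolve("rng", "worker", networks))    # the load balancer
print(resolve("rng_1", "worker", networks))  # nothing: other network
print(resolve("rng_1", "rng-lb", networks))  # the backend itself
```

---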

class: pic

## Original DockerCoins

![](dockercoins-single-node.png)

---

class: pic

## Load-balanced DockerCoins

![](dockercoins-multi-node.png)

---

## Declaring networks

- Networks (other than the default one)
  *must* be declared
  in a top-level `networks` section

.exercise[

- Add the `rng` network to the Compose file:
  ```
  version: '2'

  networks:
    rng:

  services:
    rng:
      image: ...
  ...
  ```

]

That section can be placed anywhere in the file.

---

## Putting the `rng` service in its network

- Services can have a `networks` section

- If they don't: they are placed in the default network

- If they do: they are placed only in the mentioned networks

.exercise[

- Change the `rng` service to put it in its network:
  ```
  rng:
    image: localhost:5000/dockercoins_rng:…
    networks:
      rng:
  ```

]

---

## Adding the load balancer

- The load balancer has to be in both networks:
  <br/>`rng` and `default`

- In the `default` network, it must have the `rng` alias

- We will use the `jpetazzo/hamba` image

.exercise[

- Add the `rng-lb` service to the Compose file:
  ```
  rng-lb:
    image: jpetazzo/hamba
    command: run
    networks:
      rng:
      default:
        aliases: [ rng ]
  ```
]

---

## Load balancer initial configuration

- We specified `run` as the initial command

- This tells `hamba` to wait for an initial configuration

- The load balancer will not be operational
  <br/>(until we feed it its configuration)

---

## Start the application

.exercise[

- Bring up DockerCoins:
  ```
  docker-compose up -d
  ```

- See that `worker` is complaining:
  ```
  docker-compose logs worker
  ```
]

---

## Configure the load balancer

- Multiple solutions:

  - look up the IP address of the `rng` backend
  - use the backend's network name
  - use the backend's container name (easiest!)

.exercise[

- Configure the load balancer:
  ```
  docker run --rm \
    --volumes-from dockercoins_rng-lb_1 \
    --net container:dockercoins_rng-lb_1 \
    jpetazzo/hamba reconfigure 80 dockercoins_rng_1 80
  ```

]

The application should now be working correctly.

---

## Scale the application

- Use `docker-compose scale` as planned

.exercise[

- Scale `rng`:
  ```
  docker-compose scale rng=10
  ```

]

Of course, the graph doesn't change *yet*.

We need to add the new backends to the load balancer
configuration first.

---

## Reconfigure the load balancer

- The command is similar to the one before

- We need to pass the list of all backends

.exercise[

- Reconfigure the load balancer:
  ```
  docker run --rm \
    --volumes-from dockercoins_rng-lb_1 \
    --net container:dockercoins_rng-lb_1 \
    jpetazzo/hamba reconfigure 80 \
    $(for N in $(seq 1 10); do
        echo dockercoins_rng_$N:80
      done)
  ```

]

---

## Automating the process

- Nobody loves artisanal YAML handicraft

- This can be automated very easily

- To make things easier, we can use a label:

  *each container behind a load balancer will
  have a `loadbalancer` label giving the name
  of that load balancer*

- This is implemented by two scripts:

  - add-load-balancer-v2.py

  - reconfigure-load-balancers.py

---
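
## Sketch: grouping backends by label

The label convention makes it easy to compute each load balancer's
backend list. This sketch is not the code of
`reconfigure-load-balancers.py`: the `backends_by_balancer` helper and
the sample containers are assumptions made for illustration.

```python
from collections import defaultdict


def backends_by_balancer(containers, port=80):
    """Group containers by the value of their `loadbalancer` label."""
    groups = defaultdict(list)
    for name, labels in sorted(containers.items()):
        balancer = labels.get("loadbalancer")
        if balancer is not None:
            groups[balancer].append("{}:{}".format(name, port))
    return dict(groups)


containers = {
    "dockercoins_rng_1": {"loadbalancer": "rng-lb"},
    "dockercoins_rng_2": {"loadbalancer": "rng-lb"},
    "dockercoins_worker_1": {},
}
print(backends_by_balancer(containers))
```

Each resulting list could then be passed to `jpetazzo/hamba reconfigure`.

---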

# Going further

Deploying a new version (difficulty: easy)

- Just re-run all the steps!

- However, Compose will re-create the containers

- You will have to re-create ambassadors
  <br/>(and configure them)

- You will have to clean up old ambassadors
  <br/>(left as an exercise for the reader)

- You will experience a little bit of downtime

---

## Going further

Zero-downtime deployment (difficulty: medium)

- Isolate stateful services
  <br/>(like we did earlier for Redis)

- Do blue/green deployment:

  - deploy and scale version N

  - point a "top-level" load balancer to the app

  - deploy and scale version N+1

  - put both apps in the "top-level" balancer

  - slowly switch traffic over to app version N+1

---
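
## Sketch: shifting traffic gradually

The last step of a blue/green deployment can be expressed as a series of
weights for the "top-level" load balancer. The `traffic_schedule` helper
is hypothetical, and how the weights are applied depends on the load
balancer being used.

```python
def traffic_schedule(steps):
    """Yield (old %, new %) weights moving traffic to the new version."""
    for i in range(steps + 1):
        new = round(100 * i / steps)
        yield 100 - new, new


print(list(traffic_schedule(4)))
```

Between two steps, watch error rates and latency before going further;
going back one step is the rollback path.

---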

## Going further

Harder projects:

- Two-tier or three-tier ambassador deployments

- Deploy to Mesos or Kubernetes

---

## Cleaning up

.exercise[

- Terminate containers and remove them:

  ```
  docker-compose down
  ```

]

Note: `docker-compose down` also deletes the
networks that had been created for the application.

---

class: pic

![Here Be Dragons](dragons.jpg)

---

# Here be dragons

- So far, we've used stable features

- We're going to explore experimental code

- **Use at your own risk**

---

# Distributing Machine credentials

- All the credentials (TLS keys and certs) are on node1
  <br/>(the node on which we ran `docker-machine create`)

- If we lose node1, we're toast

- We need to move (or copy) the credentials somewhere safe

- Credentials are regular files, and relatively small

- Ah, if only we had a highly available, hierarchical store ...

--

- Wait a minute, we have one!

--

(That's Consul, if you were wondering)

---

## Storing files in Consul

- We will use [Benjamin Wester's consulfs](
  https://github.com/bwester/consulfs)

- It mounts a Consul key/value store as a local filesystem

- Performance will be horrible
  <br/>(don't run a database on top of that!)

- But to store files of a few KB, nobody will notice

- We will copy/link/sync... `~/.docker/machine` to Consul

---

## Installing consulfs

- Option 1: install Go, git clone, go build ...

- Option 2: be lazy and use [jpetazzo/consulfs](
  https://hub.docker.com/r/jpetazzo/consulfs/)

.exercise[

- Be lazy and use the Docker image:
  ```
  sudo docker run --rm -v /usr/local/bin:/target jpetazzo/consulfs
  ```
]

Note: the `jpetazzo/consulfs` image contains the
`consulfs` binary.
It copies it to `/target` (if `/target` is a volume).

We need `consulfs` locally (not in a container) because
we can't propagate a FUSE mount from a container to
the host (yet).

---

## Running consulfs

- The `consulfs` binary takes two arguments:

  - the Consul server address
  - a mount point (that has to be created first)

.exercise[

- Create a mount point:
  ```
  mkdir ~/consul
  ```

- Mount Consul as a local filesystem:
  ```
  consulfs localhost:8500 ~/consul
  ```

]

Leave this running in the foreground.

---

## Copying our credentials to Consul

- Use standard UNIX commands

- Don't try to preserve permissions, though
  <br/>(`consulfs` doesn't store those)

.exercise[

- Check that Consul key/values are visible:
  ```
  ls -l ~/consul/
  ```

- Copy Machine credentials into Consul:
  ```
  cp -r ~/.docker/machine/. ~/consul/machine/
  ```

]

(This command can be re-executed to update the copy.)

---

## Mount Consul on another node

- We will repeat the previous steps to mount `~/consul`

.exercise[

- Connect to node2:
  ```
  ssh node2
  ```

- Install `consulfs` and mount Consul:
  ```
  sudo docker run --rm -v /usr/local/bin:/target jpetazzo/consulfs
  mkdir ~/consul
  consulfs localhost:8500 ~/consul
  ```

]

At this point, `ls -l ~/consul` should show `docker` and
`machine` directories.

---

## Access the credentials from the other node

- We will create a symlink

- We could also copy the credentials

.exercise[

- Create the symlink:
  ```
  mkdir -p ~/.docker/
  ln -s ~/consul/machine ~/.docker/
  ```

- Check that all nodes are visible:
  ```
  docker-machine ls
  ```

]

.icon[![Warning](warning.png)] Go back to node1 after this.

---

## A few words on this strategy

- Anyone accessing Consul can control your Docker cluster
  <br/>(to be fair: anyone accessing Consul can wreak
  serious havoc on your cluster anyway)

- ConsulFS doesn't support *all* POSIX operations,
  <br/>so a few things (like `mv`) will not work

- As a consequence, with Machine 0.6, you cannot
  run `docker-machine create` directly on top of ConsulFS

---

## What if Consul becomes unavailable?

- If Consul becomes unavailable (e.g. loses quorum),
  <br/>you won't be able to access your credentials

- If Consul becomes unavailable ...
  <br/>your cluster will be in a bad state anyway

- You can still access each Docker Engine over the
  local UNIX socket (and repair Consul that way)


---

# Highly available Swarm managers

- Until now, the Swarm manager was a SPOF
  <br/>(Single Point Of Failure)

- Swarm has experimental support for replication

- When replication is enabled, you deploy multiple (identical) managers

  - one will be "primary"
  - the other(s) will be "secondary"
  - this is determined automatically
    <br/>(through *leader election*)

---

## Swarm leader election

- The leader election mechanism relies on a key/value store
  <br/>(Consul, etcd, ZooKeeper)

- There is no requirement on the number of replicas
  <br/>(the quorum is achieved through the key/value store)

- When the leader (or "primary") is unavailable,
  <br/>a new election happens automatically

- You can issue API requests to any manager:
  <br/>if you talk to a secondary, it forwards to the primary

.icon[![Warning](warning.png)] There is currently a bug when
the Consul cluster itself has a leader election; see [docker/swarm#1782](
https://github.com/docker/swarm/issues/1782).

---
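
## Leader election in a nutshell

To build an intuition for lock-based election, here is a toy,
single-machine sketch. This is *not* how Swarm is implemented:
it assumes that an atomic `mkdir` on a local directory can stand in
for acquiring a lock in the key/value store, and the `try_lead`
name is made up for this example.

```shell
# Toy model: whoever manages to create the lock directory is primary.
# `mkdir` either creates the directory or fails, atomically; a
# key/value store lock gives the same guarantee cluster-wide.
try_lead() {
  lock=$1
  name=$2
  if mkdir "$lock" 2>/dev/null; then
    echo "$name" > "$lock/holder"
    echo "$name: primary"
  else
    echo "$name: replica (primary is $(cat "$lock/holder"))"
  fi
}
```

Calling `try_lead /tmp/lock node1` and then `try_lead /tmp/lock node2`
prints `node1: primary`, then `node2: replica (primary is node1)`.
Removing the lock directory (the primary "fails") lets the next caller win.

---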

## Swarm replication in practice

- We need to give two extra flags to the Swarm manager:

  - `--replication`

    *enables replication (duh!)*

  - `--advertise ip.ad.dr.ess:port`

    *address and port where this Swarm manager is reachable*

- Do you deploy with Docker Machine?
  <br/>Then you can use `--swarm-opt`
  to automatically pass flags to the Swarm manager

---
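
## Building the replication flags

As a sketch, the two options for one manager could be assembled
like this. The `swarm_ha_opts` helper is hypothetical, and port
3376 is an assumption: it is the port used for the Swarm manager
in the commands of this workshop.

```shell
# Hypothetical helper: print the Docker Machine options that enable
# replication for a manager reachable at the address given as $1.
swarm_ha_opts() {
  echo "--swarm-opt replication --swarm-opt advertise=$1:3376"
}

swarm_ha_opts 10.0.0.11
```

The output can be spliced into a `docker-machine create` command
line with `$(swarm_ha_opts $IPADDR)`.

---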

## Cleaning up our current Swarm containers

- We will use Docker Machine to re-provision Swarm

- We need to:

  - remove the nodes from the Machine registry

  - remove the Swarm containers

.exercise[

- Remove the current configuration:
  ```
  for N in 1 2 3 4 5; do
    ssh node$N docker rm -f swarm-agent swarm-agent-master
    docker-machine rm -f node$N
  done
  ```

]

---

## Re-deploy with the new configuration

- This time, we can deploy each node identically
  <br/>(instead of 1 manager + 4 non-managers)

.exercise[

- Deploy all five nodes with the previous options,
  and the new replication options:

  .small[
  ```
  grep 'node[12345]' /etc/hosts | grep -v '^127' |
  while read IPADDR NODENAME; do
    docker-machine create --driver generic \
      --engine-opt cluster-store=consul://localhost:8500 \
      --engine-opt cluster-advertise=eth0:2376 \
      --swarm --swarm-master \
      --swarm-discovery consul://localhost:8500  \
      --swarm-opt replication --swarm-opt advertise=$IPADDR:3376 \
      --generic-ssh-user docker --generic-ip-address $IPADDR $NODENAME
  done
  ```
  ]

]

.small[
Note: Consul is still running thanks to the `--restart=always` policy.
Other containers are now stopped, because the engines have been
reconfigured and restarted.
]

---
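
## How the node list is built

The deployment loop reads addresses and names from `/etc/hosts`.
To preview what it will iterate over, the first half of the pipeline
can be tried on its own. This sketch wraps it in a hypothetical
`list_nodes` function taking the hosts file as argument, so that it
can be run on a sample file; the addresses below are made up.

```shell
list_nodes() {
  # Keep lines mentioning node1..node5, drop loopback entries,
  # and print "address name" pairs, one per line.
  grep 'node[12345]' "$1" | grep -v '^127' |
  while read -r IPADDR NODENAME; do
    echo "$IPADDR $NODENAME"
  done
}

SAMPLE=$(mktemp)
printf '127.0.0.1 localhost node1\n10.0.0.11 node1\n10.0.0.12 node2\n' > "$SAMPLE"
list_nodes "$SAMPLE"
```

Note that if a line carries several names, `read` puts all of them
in `NODENAME`.

---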

## Assess our new cluster health

- The output of `docker info` will tell us the status
  of the node that we are talking to (primary or replica)

- If we talk to a replica, it will tell us who is the primary

.exercise[

- Talk to a random node, and ask its view of the cluster:
  ```
  eval $(docker-machine env node3 --swarm)
  docker info | grep -e ^Name -e ^Role -e ^Primary
  ```

]

Note: `docker info` is one of the few commands that
work even when there is no elected primary. This helps
with debugging.

---

## Test Swarm manager failover

- The previous command told us which node was the primary manager

  - if `Role` is `primary`,
    <br/>then the primary is indicated by `Name`

  - if `Role` is `replica`,
    <br/>then the primary is indicated by `Primary`

.exercise[

- Kill the primary manager (replace `XXX` with the name of the primary node):
  ```
  ssh XXX docker kill swarm-agent-master
  ```

]

Look at the output of `docker info` every few seconds.

---
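
## Finding the primary from a script

The rule from the previous slide can be turned into a small filter.
This is only a sketch: `swarm_primary` is a hypothetical name, and
it assumes that `docker info` prints the `Name:`, `Role:`, and
`Primary:` fields at the start of a line, as the `grep` used
earlier suggests.

```shell
# Read `docker info` output on stdin and print the primary manager.
swarm_primary() {
  awk '
    /^Name:/    { sub(/^Name: */, "");    name = $0 }
    /^Role:/    { sub(/^Role: */, "");    role = $0 }
    /^Primary:/ { sub(/^Primary: */, ""); primary = $0 }
    END {
      if (role == "primary") { print name }
      else if (role == "replica") { print primary }
      else { exit 1 }  # no role reported
    }'
}
```

Usage: `docker info | swarm_primary`. Wrapped in
`while sleep 5; do ...; done`, it lets us follow the failover.

---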

# Highly available containers

- Swarm has support for *rescheduling* on node failure

- It has to be explicitly enabled on a per-container basis

- When the primary manager detects that a node has gone down,
  <br/>the containers of that node that have rescheduling enabled
  are rescheduled elsewhere

- If the containers can't be rescheduled (constraints issue),
  <br/>they are lost (there is no reconciliation loop yet)

- As of Swarm 1.1.0, this is an *experimental* feature
  <br/>(To enable it, you must pass the
  `--experimental` flag when you start Swarm itself!)

---

## Working around flag order

- The flag must be *before* the Swarm command
  <br/>(i.e. `docker run swarm --experimental manage ...`)

- We cannot use Docker Machine to pass that flag ☹
  <br/>(Machine adds flags *after* the Swarm command)

- Instead, we will use the Swarm image `jpetazzo/swarm:experimental`:
  ```
  FROM swarm
  ENTRYPOINT ["/swarm", "--experimental"]
  ```

- We can tell Machine to use this with `--swarm-image`

---
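
## Why the custom image works

Docker appends the arguments given after the image name to the
`ENTRYPOINT` of the image. The sketch below mimics that
concatenation with plain strings. It assumes that the entrypoint of
the stock image is `/swarm`, and the `manage --replication`
arguments are only a stand-in for whatever Machine actually passes.

```shell
# $1: ENTRYPOINT of the image; remaining arguments: what follows
# the image name on the `docker run` command line.
final_command() {
  entrypoint=$1
  shift
  echo "$entrypoint $*"
}

final_command "/swarm" manage --replication
final_command "/swarm --experimental" manage --replication
```

With the second image, `--experimental` ends up *before* the Swarm
command, even though Machine only adds arguments at the end.

---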

## Reconfigure Swarm [one more time](https://www.youtube.com/watch?v=FGBhQbmPwH8)

.exercise[

- Redeploy Swarm with `--experimental`:

  .small[
  ```
  for N in 1 2 3 4 5; do
    ssh node$N docker rm -f swarm-agent swarm-agent-master
    docker-machine rm -f node$N
  done

  grep 'node[12345]' /etc/hosts | grep -v '^127' |
  while read IPADDR NODENAME; do
    docker-machine create --driver generic \
      --engine-opt cluster-store=consul://localhost:8500 \
      --engine-opt cluster-advertise=eth0:2376 \
      --swarm --swarm-master --swarm-image jpetazzo/swarm:experimental \
      --swarm-discovery consul://localhost:8500  \
      --swarm-opt replication --swarm-opt advertise=$IPADDR:3376 \
      --generic-ssh-user docker --generic-ip-address $IPADDR $NODENAME
  done
  ```
  ]

]

---

## Start a resilient container

- By default, containers will not be restarted when their node goes down

- You must pass an explicit *rescheduling policy* to make that happen

- For now, the only policy is "on-node-failure"

.exercise[

- Start a container with a rescheduling policy:

  .small[
  ```
  CID=$(docker run -d -e reschedule:on-node-failure nginx)
  ```
  ]

]

Check that the container is up and running.

---

## Simulate a node failure

- We will reboot the node running this container

- Swarm will reschedule it

.exercise[

- Check on which node the container is running:
  <br/>.small[`NODE=$(docker inspect --format '{{.Node.Name}}' $CID)`]

- Reboot that node:
  <br/>`ssh $NODE sudo reboot`

- Check that the container has been rescheduled:
  <br/>`docker ps`

]

---
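
## Waiting for the rescheduling

Rescheduling is not instantaneous, so we may have to run `docker ps`
a few times. As a sketch, a generic retry helper can do the polling
for us. `wait_until` is a hypothetical name, not a Docker command.

```shell
# Run a command up to $1 times, sleeping $2 seconds between
# attempts; succeed as soon as the command succeeds.
wait_until() {
  tries=$1
  interval=$2
  shift 2
  while [ "$tries" -gt 0 ]; do
    if "$@"; then
      return 0
    fi
    tries=$((tries - 1))
    if [ "$tries" -gt 0 ]; then
      sleep "$interval"
    fi
  done
  return 1
}
```

Usage: `wait_until 12 5 sh -c 'docker ps | grep -q nginx'`.
Swarm may keep listing the old container until it notices that the
node is down, so a success right after the reboot is not conclusive.

---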

## .icon[![Warning](warning.png)] Caveats

- There are some corner cases when the node is also
  the Swarm leader or the Consul leader; this is being improved
  right now!

- Swarm doesn't gracefully handle the fact that after the
  reboot, you have *two* copies of the container (the original
  one and the rescheduled one). If the container was started
  with a name (e.g. `--name highlander`), attempts to manipulate
  it by that name will not work. This will be improved too.

---

# Conclusions

- Bad news: we still have work to do to deploy our apps

  - it's not all unicorns, ponies, and rainbows

  - *no, Docker will not make your job obsolete*

- Good news: a lot of hard things are becoming easier

  - building, packaging, distributing apps

  - running distributed systems on clusters

---

## "This is complicated"

- The scripts used here are pretty simple
  <br/>(each is fewer than 100 lines of code)

- You can easily rewrite them in your favorite language,
  <br/>adapt and customize them, in a few hours

- FYI: those scripts are smaller and simpler than the
  scripts (cloud init etc) used to deploy the VMs for this
  workshop!

- Docker Inc. has commercial products to wrap all this:

  - Docker Cloud
    <br/>(manage your Docker nodes from a SaaS portal)

  - Universal Control Plane
    <br/>(buzzword-compliant management solution:
    <br/>turnkey, enterprise-class, on-premise, etc.)

---

## What's next?

- November 2015: Compose 1.5 + Engine 1.9 =
  <br/>first release with multi-host networking

- January 2016: Compose 1.6 + Engine 1.10 =
  <br/>HUGE improvements (DNS server, HA...)

- Next release: another truckload of features

- I will deliver this workshop about twice a month

- Check out the GitHub repo for updated content!
  <br/>(there is a tag for each big round of updates)

---

class: title

# Thanks! <br/> Questions?

### [@jpetazzo](https://twitter.com/jpetazzo) <br/> [@docker](https://twitter.com/docker)

    </textarea>
    <script src="https://gnab.github.io/remark/downloads/remark-0.5.9.min.js" type="text/javascript">
    </script>
    <script type="text/javascript">
      var slideshow = remark.create();
    </script>
  </body>
</html>