
<!DOCTYPE html>
<html>
  <head>
    <base target="_blank">
    <title>Docker Orchestration Workshop</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <style type="text/css">
      @import url(https://fonts.googleapis.com/css?family=Yanone+Kaffeesatz);
      @import url(https://fonts.googleapis.com/css?family=Droid+Serif:400,700,400italic);
      @import url(https://fonts.googleapis.com/css?family=Ubuntu+Mono:400,700,400italic);

      body { font-family: 'Droid Serif'; font-size: 150%; }

      h1, h2, h3 {
        font-family: 'Yanone Kaffeesatz';
        font-weight: normal;
      }
      a {
        text-decoration: none;
        color: blue;
      }
      .remark-code, .remark-inline-code { font-family: 'Ubuntu Mono'; }
      .red { color: #fa0000; }
      .gray { color: #ccc; }
      .small { font-size: 70%; }
      .big { font-size: 140%; }
      .underline { text-decoration: underline; }
      .footnote {
        position: absolute;
        bottom: 3em;
      }
      .pic {
        vertical-align: middle;
        text-align: center;
        padding: 0 0 0 0 !important;
      }
      img {
        max-width: 100%;
        max-height: 450px;
      }
      .title {
        vertical-align: middle;
        text-align: center;
        font-size: 2em;
      }
      .title .remark-slide-number {
        font-size: 0.5em;
      }
      .quote {
        background: #eee;
        border-left: 10px solid #ccc;
        margin: 1.5em 10px;
        padding: 0.5em 10px;
        quotes: "\201C""\201D""\2018""\2019";
        font-style: italic;
      }
      .quote:before {
        color: #ccc;
        content: open-quote;
        font-size: 4em;
        line-height: 0.1em;
        margin-right: 0.25em;
        vertical-align: -0.4em;
      }
      .quote p {
        display: inline;
      }
      .icon img {
        height: 1em;
      }
      .exercise {
        background-color: #eee;
        background-image: url("keyboard.png");
        background-size: 1.4em;
        background-repeat: no-repeat;
        background-position: 0.2em 0.2em;
        border: 2px dotted black;
      }
      .exercise::before {
        content: "Exercise:";
        margin-left: 1.8em;
      }
      li p { line-height: 1.25em; }
    </style>
  </head>
  <body>
    <textarea id="source">

class: title

# Docker <br/> Orchestration <br/> Workshop

---

## Logistics

- Hello! I'm `jerome at docker dot com`

- Agenda:

  .small[
  - 09:00-10:30 part 1
  - 10:30-11:00 coffee break
  - 11:00-12:30 part 2
  - 12:30-13:30 lunch break
  - 13:30-15:00 part 3
  - 15:00-15:30 coffee break
  - 15:30-17:00 part 4
  - 17:00- open discussion, Q&A
  ]

<!-- - This will be FAST PACED, but DON'T PANIC! -->

- All the content is publicly available
  <br/>(slides, code samples, scripts)

- Experimental chat support on
  [Gitter](https://gitter.im/jpetazzo/workshop-20160215-paris)

---

<!-- grep '^# ' index.html | grep -v '<br' | tr '#' '-'^C -->

## Outline (1/2)

- Pre-requirements
- VM environment
- Our sample application
- Running services independently
- Running the whole app on a single node
- Identifying bottlenecks
- Measuring latency under load
- Scaling HTTP on a single node
- Put a load balancer on it
- Connecting to containers on other hosts
- Abstracting remote services with ambassadors
- Various considerations about ambassadors

---

## Outline (2/2)

- Docker for ops
- Backups
- Logs
- Security upgrades
- Network traffic analysis
- Dynamic orchestration
- Hands-on Swarm
- Deploying Swarm
- Cluster discovery
- Building our app on Swarm
- Network plumbing on Swarm
- Going further

---

# Pre-requirements

- Computer with network connection and SSH client
  <br/>(on Windows, get [putty](http://www.putty.org/)
  or [Git BASH](https://msysgit.github.io/))
- GitHub account (recommended; not mandatory)
- Gitter account (recommended; not mandatory)
- Docker Hub account (only for Swarm hands-on section)
- Basic Docker knowledge

.exercise[

- This is the stuff you're supposed to do!
- Create [GitHub](https://github.com/) and
  [Docker Hub](https://hub.docker.com) accounts now if needed
- Go to [view.dckr.info](http://view.dckr.info) to view these slides
- Join the chat room on
  [Gitter](https://gitter.im/jpetazzo/workshop-20160215-paris)

]

---

# VM environment

- Each person gets 5 VMs
- They are *your* VMs
- They'll be up until tomorrow
- You have a little card with login+password+IP addresses
- You can automatically SSH from one VM to another

.exercise[

- Log into the first VM (`node1`)
- Check that you can SSH (without password) to `node2`
- Check the version of Docker with `docker version`

]

.footnote[Note: from now on, unless instructed, **all commands must
be run from the first VM, `node1`**.]

---

## Brand new versions!

- Engine 1.10.0

- Compose 1.6.0

- Swarm 1.1.0

- Machine 0.6.0

---

# Our sample application

- Let's look at the general layout of the
  [source code](https://github.com/jpetazzo/orchestration-workshop)

- Each directory = 1 microservice
  - `rng` = web service generating random bytes
  - `hasher` = web service computing hash of POSTed data
  - `worker` = background process using `rng` and `hasher`
  - `webui` = web interface to watch progress

.exercise[

- Clone the repository on `node1`:
  <br/>.small[`git clone https://github.com/jpetazzo/orchestration-workshop`]

]

(Bonus points for forking on GitHub and cloning your fork!)

---

## What's this application?

- It is a DockerCoin miner! 💰🐳📦🚢

- No, you can't buy coffee with DockerCoins

- How DockerCoins works:

  - `worker` asks `rng` for random bytes
  - `worker` feeds those random bytes into `hasher`
  - each hash starting with `0` is a DockerCoin
  - DockerCoins are stored in `redis`
  - `redis` is also updated every second to track speed
  - you can see the progress with the `webui`
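
The steps above can be sketched in a few lines of Python. This is only an illustration: `os.urandom` stands in for `rng`, `hashlib.sha256` stands in for `hasher`, and the 32-byte payload size and the `mine` helper are assumptions, not the actual `worker` code.

```python
import hashlib
import os


def is_coin(hexdigest):
    # A hash counts as a DockerCoin when its hex form starts with "0".
    return hexdigest.startswith("0")


def mine(rounds, get_random_bytes=os.urandom):
    # get_random_bytes plays the role of rng, sha256 the role of hasher.
    coins = 0
    for _ in range(rounds):
        data = get_random_bytes(32)  # assumed payload size
        if is_coin(hashlib.sha256(data).hexdigest()):
            coins += 1
    return coins


if __name__ == "__main__":
    print(mine(1000), "coins found in 1000 rounds")
```

With uniformly distributed hashes, roughly one round in 16 should yield a coin.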

Next: we will inspect components independently.

---

# Running services independently

First, we will run the random number generator (`rng`).

.exercise[

- Go to the `dockercoins` directory, in the cloned repo:
  <br/>`cd orchestration-workshop/dockercoins`

- Use Compose to run the `rng` service:
  <br/>`docker-compose up rng`

- Docker will pull `python` and build the microservice

]

---

## Lies, damn lies, and port numbers

.icon[![Warning](warning.png)] Pay attention to the port mapping!

- The container log says:
  <br/>`Running on http://0.0.0.0:80/`

- But if you try `curl localhost:80`, you will get:
  <br/>`Connection refused`

- Port 80 on the container ≠ port 80 on the Docker host

---

## Understanding port mapping

- `node1`, the Docker host, has only one port 80

- If we give the one and only port 80 to the first
  container that asks for it, we are in trouble when
  another container needs it

- Default behavior: containers are not "exposed"
  <br/>(only reachable by the Docker host and other containers,
  through their private address)

- Container network services can be exposed:

  - statically (you decide which host port to use)

  - dynamically (Docker allocates a host port)

---

## Declaring port mapping

- Directly with the Docker Engine:
  <br/>`docker run -P redis`
  <br/>`docker run -p 6379 redis`
  <br/>`docker run -p 1234:6379 redis`

- With Docker Compose, in the `docker-compose.yml` file:

```
rng:
  …
  ports:
    - "8001:80"
```

→ port 8001 *on the host* maps to
port 80 *in the container*
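
To make the static and dynamic forms concrete, here is a hypothetical helper (not part of Docker or Compose) that splits a mapping string into host and container ports. It only handles the two forms shown on this slide; `None` means that Docker picks the host port.

```python
def parse_port_mapping(spec):
    # "1234:6379" -> static mapping: host port 1234, container port 6379
    # "6379"      -> dynamic mapping: Docker allocates the host port
    host, sep, container = spec.rpartition(":")
    if not sep:
        return None, int(container)
    return int(host), int(container)


print(parse_port_mapping("8001:80"))  # (8001, 80)
print(parse_port_mapping("6379"))     # (None, 6379)
```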

---

## Using the `rng` service

Let's get random bytes of data!

.exercise[

- Open a second terminal and connect to the same VM

- Check that the service is alive:
  <br/>`curl localhost:8001`

- Get 10 bytes of random data:
  <br/>`curl localhost:8001/10`

- If the binary data output messed up your terminal, fix it:
  <br/>`reset`

]

---

## Running the hasher

.exercise[

- Start the `hasher` service:
  <br/>`docker-compose up hasher`

- It will pull `ruby` and do the build

]

.icon[![Warning](warning.png)] Again, pay attention to the port mapping!

The container log says that it's listening on port 80,
but it's mapped to port 8002 on the host.

You can see the mapping in `docker-compose.yml`.

---

## Testing the hasher

.exercise[

- Open a third terminal window, and SSH to `node1`

- Check that the `hasher` service is alive:
  <br/>`curl localhost:8002`

- Posting binary data requires some extra flags:

  ```
  curl \
    -H "Content-type: application/octet-stream" \
    --data-binary hello \
    localhost:8002
  ```

- Check that it computed the right hash:
  <br/>`echo -n hello | sha256sum`

]
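
The same check can be done in Python with the standard `hashlib` module, assuming (as the exercise above suggests) that `hasher` returns the SHA-256 hex digest of the request body. The `expected_hash` name is only for illustration.

```python
import hashlib


def expected_hash(payload):
    # Same digest as `echo -n hello | sha256sum`, without the trailing " -".
    return hashlib.sha256(payload).hexdigest()


print(expected_hash(b"hello"))
```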

---

## Stopping services

We have multiple options:

- Interrupt `docker-compose up` with `^C`

- Stop individual services with `docker-compose stop rng`

- Stop all services with `docker-compose stop`

- Kill all services with `docker-compose kill`
  <br/>(rude, but faster!)

.exercise[

- Use any of those methods to stop `rng` and `hasher`

]

???

This hidden content is here for automation
(so that `docker-compose kill` gets executed
when auto-testing the content).

.exercise[

```
docker-compose kill
```

]

---

# Running the whole app on a single node

.exercise[

- Run `docker-compose up` to start all components

]

- `rng` and `hasher` can be started directly

- Other components are built accordingly

- Aggregate output is shown

- Output is verbose
  <br/>(because the worker is constantly hitting other services)

---

## Viewing our application

- The app exposes a Web UI with a realtime progress graph

.exercise[

- Open http://[yourVMaddr]:8000/ (from a browser)

]

- The app actually has a constant, steady speed
  <br/>(3.33 coins/second)

- The speed seems not-so-steady because:

  - we measure a discrete value over discrete intervals

  - the measurement is done by the browser

  - BREAKING: network latency is a thing

---

## Running in the background

- The logs are very verbose (and won't get better)

- Let's put them in the background for now!

.exercise[

- Stop the app (with `^C`)

- Start it again with `docker-compose up -d`

- Check on the web UI that the app is still making progress

]

---

## Looking at resource usage

- Let's look at CPU, memory, and I/O usage

.exercise[

- run `top` to see CPU and memory usage
  <br/>(you should see idle cycles)

- run `vmstat 3` to see I/O usage (si/so/bi/bo)
  <br/>(the 4 numbers should be almost zero,
  <br/>except `bo` for logging)

]

We have available resources.

- Why?
- How can we use them?

---

## Scaling workers on a single node

- Docker Compose supports scaling.red[*]
- Let's scale `worker` and see what happens!

.exercise[

- Start 9 more `worker` containers:
  <br/>`docker-compose scale worker=10`

- Check the aggregated logs of those containers:
  <br/>`docker-compose logs worker`

- See the impact on CPU load (with top/htop),
  <br/>and on compute speed (with web UI)

]

.footnote[.red[*] With some limitations, as we'll see later.]

---

# Identifying bottlenecks

- You should have seen a 3x speed bump (not 10x)

- Adding workers didn't result in linear improvement

- *Something else* is slowing us down

--

- ... But what?

--

- The code doesn't have instrumentation

- Let's use state-of-the-art HTTP performance analysis!
  <br/>(i.e. good old tools like `ab`, `httping`...)

???

## Benchmarking our microservices

We will test microservices in isolation.

.exercise[

- Stop the application:
  `docker-compose kill`

- Remove old containers:
  `docker-compose rm`

- Start `hasher` and `rng`:
  `docker-compose up hasher rng`

]

Now let's hammer them with requests!

???

## Testing `rng`

Let's assess the raw performance of our RNG.

.exercise[

- Test the performance on one big request:
  <br/>`curl -o/dev/null localhost:8001/10000000`
  <br/>(should take ~1s, and show speed of ~10 MB/s)

]

If we were doing requests of 1000 bytes ...

... Could we get 10k req/s?

Let's test and see what happens!

???

## Concurrent requests

.exercise[

- Test 100 requests of 1000 bytes each:
  <br/>`ab -n 100 localhost:8001/1000`

- Test 100 requests, 10 requests in parallel:
  <br/>`ab -n 100 -c 10 localhost:8001/1000`
  <br/>(look how the latency has increased!)

- Try with 100 requests in parallel:
  <br/>`ab -n 100 -c 100 localhost:8001/1000`

]

???

Whatever we do, we get ~10 requests/second.

Increasing concurrency doesn't help:
it just increases latency.

???

## Discussion

- When serving requests sequentially, they each take 100ms

- When 10 requests arrive at the same time:

  - one request is served in 100ms
  - another is served in 200ms
  - another is served in 300ms
  - ...
  - another is served in 1000ms

- All requests are queued and served by a single thread

- It looks like `rng` doesn't handle concurrent requests

- What about `hasher`?
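
The arithmetic above can be checked with a small simulation. It assumes a fixed 100 ms service time and requests that all arrive at the same instant; `completion_times` is a hypothetical helper, not part of the workshop code.

```python
import heapq


def completion_times(requests, service_ms=100, workers=1):
    # Each worker is represented by the time at which it becomes free.
    free_at = [0] * workers
    heapq.heapify(free_at)
    done = []
    for _ in range(requests):
        start = heapq.heappop(free_at)  # earliest available worker
        finish = start + service_ms
        done.append(finish)
        heapq.heappush(free_at, finish)
    return done


print(completion_times(10))             # one thread: 100, 200, ... 1000
print(completion_times(10, workers=3))  # three instances share the queue
```

With a single worker, the last of 10 requests waits a full second; spreading the same requests over several workers shortens the queue, which is the idea behind scaling out `rng`.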

???

## Save some random data and stop the generator

Before testing the hasher, let's save some random
data that we will feed to the hasher later.

.exercise[

- Run `curl localhost:8001/1000000 > /tmp/random`

]

Now we can stop the generator.

.exercise[

- In the shell where you did `docker-compose up rng`,
  <br/>stop it by hitting `^C`

]

???

## Benchmarking the hasher

We will hash the data that we just got from `rng`.

.exercise[

- Posting binary data requires some extra flags:

  ```
  curl \
    -H "Content-type: application/octet-stream" \
    --data-binary @/tmp/random \
    localhost:8002
  ```

- Compute the hash locally to verify that it works fine:
  <br/>`sha256sum /tmp/random`
  <br/>(it should display the same hash)

]

???

## The hasher under load

The invocation of `ab` will be slightly more complex as well.

.exercise[

- Execute 100 requests in a row:

  ```
  ab -n 100 -T application/octet-stream \
     -p /tmp/random localhost:8002/
  ```

- Execute 100 requests with 10 requests in parallel:

  ```
  ab -c 10 -n 100 -T application/octet-stream \
     -p /tmp/random localhost:8002/
  ```

]

Take note of the performance numbers (requests/s).

???

## Benchmarking the hasher on smaller data

Here we hashed 1,000,000 bytes.

Later we will hash much smaller payloads.

Let's repeat the tests with smaller data.

.exercise[

- Run `truncate --size=10 /tmp/random`
- Repeat the `ab` tests

]

---

# Measuring latency under load

We will use `httping`.

.exercise[

- Scale back the `worker` service to zero:
  <br/>`docker-compose scale worker=0`

- Open a new SSH connection and check the latency of `rng`:
  <br/>`httping localhost:8001`

- Open a new SSH connection and do the same for `hasher`:
  <br/>`httping localhost:8002`

- Keep an eye on both connections!

]

---

## Latency in initial conditions

Latency for both services should be very low (~1ms).

Now add a first worker and see what happens.

.exercise[

- Create the first `worker` instance:
  <br/>`docker-compose scale worker=1`

]

- `hasher` should be very low (~1ms)

- `rng` should be low, with occasional spikes (10-100ms)

---

## Latency when scaling the worker

We will add workers and see what happens.

.exercise[

- Run `docker-compose scale worker=2`

- Check latency

- Increase number of workers and repeat

]

What happens?

- `hasher` remains low
- `rng` spikes up until it reaches ~(N-2)*100ms
  <br/>(when you have N workers)

---

class: title

Why?

---

## Why does everything take (at least) 100ms?

--

`rng` code:

![RNG code screenshot](delay-rng.png)

--

`hasher` code:

![HASHER code screenshot](delay-hasher.png)

---

class: title

But ...

WHY?!?

---

## Why did we sprinkle this sample app with sleeps?

- Deterministic performance
  <br/>(regardless of instance speed, CPUs, I/O...)

--

- Actual code sleeps all the time anyway

--

- When your code makes a remote API call:

  - it sends a request;

  - it sleeps until it gets the response;

  - it processes the response.

---

## Why do `rng` and `hasher` behave differently?

![Equations on a blackboard](equations.png)

--

(Synchronous vs. asynchronous event processing)

---

## How to make `rng` go faster

- Obvious solution: comment out the `sleep` instruction

--

- Real-world solution: use an asynchronous framework
  <br/>(e.g. use gunicorn with gevent)

--

- New rule: we can't change the code!

--

- Solution: scale out `rng`
  <br/>(dispatch `rng` requests on multiple instances)

---

# Scaling HTTP on a single node

- We could try to scale with Compose:

  ```
  docker-compose scale rng=3
  ```

- Compose doesn't deal with load balancing

- We would get 3 instances ...

- ... But only the first one would serve traffic

---

## The plan

- Stop the `rng` service first

- Create multiple identical `rng` containers

- Put a load balancer in front of them

- Point other services to the load balancer

---

## Stopping `rng`

- That's the easy part!

.exercise[

- Use `docker-compose` to stop `rng`:

  ```
  docker-compose stop rng
  ```

]

Note: we do this first because we are about to remove
`rng` from the Docker Compose file.

If we don't stop
`rng` now, it will remain up and running, with Compose
being unaware of its existence!

---

## Scaling `rng`

.exercise[

- Replace the `rng` service with multiple copies of it:

  ```
  rng1:
    build: rng

  rng2:
    build: rng

  rng3:
    build: rng
  ```

]

That's all!

Shortcut: `docker-compose.yml-scaled-rng`

---

## Introduction to `jpetazzo/hamba`

- Public image on the Docker Hub

- Load balancer based on HAProxy

- Expects the following arguments:
  <br/>`FE-port BE1-addr BE1-port BE2-addr BE2-port ...`
  <br/>*or*
  <br/>`FE-addr:FE-port BE1-addr BE1-port BE2-addr BE2-port ...`

  - FE=frontend (the thing other services connect to)

  - BE=backend (the multiple copies of your scaled service)

.small[
Example: listen to port 80 and balance traffic on www1:1234 + www2:2345

```
docker run -d -p 80 jpetazzo/hamba 80 www1 1234 www2 2345
```
]
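
To make the argument format concrete, here is a hypothetical parser for it. It is not the actual `jpetazzo/hamba` entrypoint, only a sketch of how the two forms above could be read.

```python
def parse_hamba_args(args):
    # First argument: "FE-port" or "FE-addr:FE-port".
    fe_addr, _, fe_port = args[0].rpartition(":")
    frontend = (fe_addr or None, int(fe_port))
    # Remaining arguments come in (address, port) pairs.
    rest = args[1:]
    if len(rest) % 2:
        raise ValueError("each backend needs an address and a port")
    backends = [(rest[i], int(rest[i + 1])) for i in range(0, len(rest), 2)]
    return frontend, backends


print(parse_hamba_args("80 www1 1234 www2 2345".split()))
```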

---

# Put a load balancer on it

Let's add our load balancer to the Compose file.

.exercise[

- Add the following section to the Compose file:

  ```
  rng0:
      image: jpetazzo/hamba
      links:
        - rng1
        - rng2
        - rng3
      command: 80 rng1 80 rng2 80 rng3 80
      ports:
        - "8001:80"
  ```

]

Shortcut: `docker-compose.yml-scaled-rng`

---

## Point other services to the load balancer

- The only affected service is `worker`

- We have to replace the `rng` link with a link to `rng0`,
  but it should still be named `rng` (so we don't change the code)

.exercise[

- Update the `worker` section as follows:

  ```
  worker:
    build: worker
    links:
      - rng0:rng
      - hasher
      - redis
  ```

]

Shortcut: `docker-compose.yml-scaled-rng`

---

## Start the whole stack

.exercise[

- Start the new services:
  <br/>`docker-compose up -d`

- Check worker logs:
  <br/>`docker-compose logs worker`

- Check load balancer logs:
  <br/>`docker-compose logs rng0`

]

If you get errors about port 8001, make sure that
`rng` was stopped correctly and try again.

---

## Results

- Check the latency of `rng`
  <br/>(it should have improved significantly!)

- Check the application performance in the Web UI
  <br/>(it should improve if you have enough workers)

*Note: if `worker` was scaled when you did `docker-compose up`,
it probably took a while, because `worker` doesn't handle
signals properly and Docker patiently waits 10 seconds for
each `worker` instance to terminate. This would be much
faster for a well-behaved application.*

---

## The good, the bad, the ugly

- The good

  We scaled a service, added a load balancer -
  <br/>without changing a single line of code.

- The bad

  We manually copy-pasted sections in `docker-compose.yml`.

  Improvement: write scripts to transform the YAML file.

- The ugly

  If we scale up/down, we have to restart everything.

  Improvement: reconfigure the load balancer dynamically.
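
As a sketch of the first improvement, a few lines of Python can generate the repeated sections instead of copy-pasting them. The `scaled_service` helper is hypothetical; it only reproduces the `rng1` to `rngN` and `rng0` layout used earlier, and its output still has to be merged into `docker-compose.yml` by hand.

```python
def scaled_service(name, count, port=80, public_port=8001):
    backends = [f"{name}{i}" for i in range(1, count + 1)]
    lines = []
    # One identical section per backend copy.
    for backend in backends:
        lines += [f"{backend}:", f"  build: {name}", ""]
    # Load balancer section, listing every backend as "addr port".
    targets = " ".join(f"{b} {port}" for b in backends)
    lines += [f"{name}0:", "  image: jpetazzo/hamba", "  links:"]
    lines += [f"    - {b}" for b in backends]
    lines += [f"  command: {port} {targets}", "  ports:",
              f'    - "{public_port}:{port}"']
    return "\n".join(lines) + "\n"


print(scaled_service("rng", 3))
```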

---

# Connecting to containers on other hosts

- So far, our whole stack is on a single machine

- We want to scale out (across multiple nodes)

- We will deploy the same stack multiple times

- But we want every stack to use the same Redis
  <br/>(in other words: Redis is our only *stateful* service here)

--

- And remember: we're not allowed to change the code!

  - the code connects to host `redis`
  - `redis` must resolve to the address of our Redis service
  - the Redis service must listen on the default port (6379)

---

## Using host name injection to abstract service dependencies

- It is possible to add host entries to a container

- With the CLI:

  ```
  docker run --add-host redis:192.168.1.2 myservice...
  ```

- In a Compose file:

  ```
  myservice:
    image: myservice
    extra_hosts:
      redis: 192.168.1.2
  ```

- This creates entries in `/etc/hosts` in the container
  <br/>(in Engine 1.10, a local DNS server is used instead)

???

## The plan

- Deploy our Redis service separately

  - use the same `redis` image

  - make sure that Redis server port (6379) is publicly accessible,
    using port 6379 on the Docker host

- Update our Docker Compose YAML file

  - remove the `redis` section

  - in the `links` section, remove `redis`

  - instead, put a `redis` entry in `extra_hosts`

Note: the code stays on the first node!
<br/>(We do not need to copy the code to the other nodes.)

???

## Making Redis available on its default port

There are two strategies.

- `docker run -p 6379:6379 redis`

  - the container has its own, isolated network stack
  - Docker creates a port mapping rule through iptables
  - slight performance overhead
  - port number is explicit (visible through Docker API)

- `docker run --net host redis`

  - the container uses the network stack of the host
  - when it binds to 6379/tcp, that's 6379/tcp on the host
  - allows raw speed (no overhead due to iptables/bridge)
  - port number is not visible through Docker API

Choose wisely!

???

## Deploy Redis

.exercise[

- Start a new redis container, mapping port 6379 to 6379:

  ```
  docker run -d -p 6379:6379 redis
  ```

- Check that it's running with `docker ps`

- Note the IP address of this Docker host

- Try to connect to it (from anywhere):

  ```
  telnet ip.ad.dr.ess 6379
  ```

]

To exit a telnet session: `Ctrl-] c ENTER`

???

## Update `docker-compose.yml` (1/3)

.exercise[

- Comment out `redis`:

  ```
  #redis:
  #    image: redis
  ```

]

???

## Update `docker-compose.yml` (2/3)

.exercise[

- Update `worker`:

  ```
  worker:
    build: worker
    extra_hosts:
      redis: A.B.C.D
    links:
      - rng0:rng
      - hasher
  ```

]

Replace `A.B.C.D` with the IP address noted earlier.

Shortcut: `docker-compose.yml-extra-hosts`
<br/>(But you still have to replace `A.B.C.D`!)

???

## Update `docker-compose.yml` (3/3)

.exercise[

- Update `webui`:

  ```
  webui:
    build: webui
    extra_hosts:
      redis: A.B.C.D
    ports:
      - "8000:80"
    #volumes:
    #  - "./webui/files/:/files/"
  ```

]

(Replace `A.B.C.D` with the IP address noted earlier)

.icon[![Warning](warning.png)] Don't forget to comment out the `volumes` section!

???

## Why did we comment out the `volumes` section?

- Volumes have multiple uses:

  - storing persistent stuff (database files...)

  - sharing files between containers (logs, configuration...)

  - sharing files between host and containers (source...)

- The `volumes` directive expands to a host path
  <br/>.small[(e.g. `/home/docker/orchestration-workshop/dockercoins/webui/files`)]

- This host path exists on the local machine
  <br/>(not on the others)

- This specific volume is used in development
  <br/>(not in production)

???

## Start the stack on the first machine

- Nothing special to do here

- Just bring up the application like we did before

.exercise[

- `docker-compose up -d`

]

- Check in the web browser that it's running correctly

???

## Start the stack on another machine

- We will set the `DOCKER_HOST` variable

- `docker-compose` will detect and use it

- Our Docker hosts are listening on port 55555

.exercise[

- Set the environment variable:
  <br/>`export DOCKER_HOST=tcp://node2:55555`

- Start the stack:
  <br/>`docker-compose up -d`

- Check that it's running:
  <br/>`docker-compose ps`

]

???

## Scale!

.exercise[

- Open the Web UI
  <br/>(on a node where it's deployed)

- Deploy one instance of the stack on each node

]

???

## Cleanup

- Let's remove what we did

.exercise[

- You can use the following scriptlet:

  ```
  for N in $(seq 1 5); do
    export DOCKER_HOST=tcp://node$N:55555
    docker ps -qa | xargs docker rm -f
  done
  unset DOCKER_HOST
  ```

]

---

# Abstracting remote services with ambassadors

- What if we can't/won't run Redis on its default port?

- What if we want to be able to move it easily?

--

- We will use an ambassador

- Redis will be started independently of our stack

- It will run at an arbitrary location (host+port)

- In our stack, we replace `redis` with an ambassador

- The ambassador will connect to Redis

- The ambassador will "act as" Redis in the stack

---

## Start redis

- Start a standalone Redis container

- Let Docker expose it on a random port

.exercise[

- Run redis with a random public port:
  <br/>`docker run -d -P --name myredis redis`

- Check which port was allocated:
  <br/>`docker port myredis 6379`

]

- Note the IP address of the machine, and this port

---

## Update `docker-compose.yml`

.exercise[

<!--
- Restore `links` as they were before in `webui` and `worker`
-->

- Replace `redis` with an ambassador using `jpetazzo/hamba`:

  ```
  redis:
    image: jpetazzo/hamba
    command: 6379 AA.BB.CC.DD EEEEE
  ```

- Comment out the `volumes` section in `webui`:

  ```
  #volumes:
  #  - "./webui/files/:/files/"
  ```

]

Shortcut: `docker-compose.yml-ambassador`
<br/>(But you still have to update `AA.BB.CC.DD EEEEE`!)

---

## Why did we comment out the `volumes` section?

- Volumes have multiple uses:

  - storing persistent stuff (database files...)

  - sharing files between containers (logs, configuration...)

  - sharing files between host and containers (source...)

- The `volumes` directive expands to a host path
  <br/>.small[(e.g. `/home/docker/orchestration-workshop/dockercoins/webui/files`)]

- This host path exists on the local machine
  <br/>(not on the others)

- This specific volume is used in development
  <br/>(not in production)

---

## Start the stack on the first machine

- Compose will detect the change in the `redis` service

- It will replace `redis` with a `jpetazzo/hamba` instance

.exercise[

- Just tell Compose to do its thing:

  ```
  docker-compose up -d
  ```

- Check that the stack is up and running:

  ```
  docker-compose ps
  ```

- Look at the Web UI to make sure that it works fine

]

---

## Start the stack on another machine

- We will set the `DOCKER_HOST` variable

- `docker-compose` will detect and use it

- Our Docker hosts are listening on port 55555

.exercise[

- Set the environment variable:
  <br/>`export DOCKER_HOST=tcp://node2:55555`

- Start the stack:
  <br/>`docker-compose up -d`

- Check that it's running:
  <br/>`docker-compose ps`

]

---

## Scale!

.exercise[

- Deploy one instance of the stack on each node:

  .small[
  ```
  for N in 3 4 5; do
    DOCKER_HOST=tcp://node$N:55555 docker-compose up -d &
  done
  ```
  ]

- Add a bunch of workers all over the place:

  .small[
  ```
  for N in 1 2 3 4 5; do
    DOCKER_HOST=tcp://node$N:55555 docker-compose scale worker=10
  done
  ```
  ]

- Admire the result in the Web UI!

]

---

## Social Media Moment

Let's celebrate our success!

(And the fact that we're just 2498349893849283948982 DockerCoins away from being able to afford a cup of coffee!)

.exercise[

- If you have a Twitter account, tweet your mining speed!
  <br/>(use the "Tweet this!" link below the graph ☺)

]

---

# Various considerations about ambassadors

- "But, ambassadors are adding an extra hop!"

--

- Yes, but if you need load balancing, you need that hop

- Ambassadors actually *save* one hop
  <br/>(they act as local load balancers)

  - traditional load balancer:
    <br/>client ⇒ external LB ⇒ server (2 physical hops)

  - ambassadors:
    <br/>client → ambassador ⇒ server (1 physical hop)

--

- Ambassadors are more reliable than traditional LBs
  <br/>(they are colocated with their clients)

---

## Drawbacks of ambassadors

- Generic issues
  <br/>(shared with any kind of load balancing / HA setup)

  - extra logical hop (not transparent to the client)

  - must assess backend health

  - one more thing to worry about (!)

- Specific issues

  - load balancing fairness

High-end load balancing solutions will rely on back pressure
from the backends. This addresses the fairness issue.

---

## There are many ways to deploy ambassadors

"Ambassador" is a design pattern.

There are many ways to implement it.

We will present three increasingly complex (but also more powerful)
ways to deploy ambassadors.

---

## Single-tier ambassador deployment

- One-shot configuration process

- Must be executed manually after each scaling operation

- Scans current state, updates load balancer configuration

- Pros:
  <br/>- simple, robust, no extra moving part
  <br/>- easy to customize (thanks to simple design)
  <br/>- can deal efficiently with large changes

- Cons:
  <br/>- must be executed after each scaling operation
  <br/>- harder to compose different strategies

- Example: this workshop

---

## Two-tier ambassador deployment

- Daemon listens to Docker events API

- Reacts to container start/stop events

- Adds/removes backends in the load balancer configuration

- Pros:
  <br/>- no extra step required when scaling up/down

- Cons:
  <br/>- extra process to run and maintain
  <br/>- deals with one event at a time (ordering matters)

- Hidden gotcha: load balancer creation

- Example: interlock
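
The event-driven approach can be sketched as a function that folds start/stop events into a set of backends. The event dictionaries below are a simplified, hypothetical shape (not the actual Docker events API payload), and a real daemon would also have to rewrite and reload the load balancer configuration.

```python
def apply_events(events, backends=None):
    # Events are processed strictly one at a time, in order.
    backends = set(backends or ())
    for event in events:
        if event["status"] == "start":
            backends.add(event["name"])
        elif event["status"] in ("stop", "die"):
            backends.discard(event["name"])
    return backends


events = [
    {"status": "start", "name": "rng1"},
    {"status": "start", "name": "rng2"},
    {"status": "stop", "name": "rng1"},
]
print(sorted(apply_events(events)))            # only rng2 remains
print(sorted(apply_events(reversed(events))))  # different order, different result
```

Replaying the same events in reverse leaves both backends registered, which is why ordering matters.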

---

## Three-tier ambassador deployment


- Daemon listens to Docker events API

- Reacts to container start/stop events

- Adds/removes scaled services in distributed config DB
  <br/>(zookeeper, etcd, consul…)

- Another daemon listens to config DB events

- Adds/removes backends in the load balancer configuration

- Pros:
  <br/>- more flexibility

- Cons:
  <br/>- three extra services to run and maintain

- Example: registrator

---

## Other multi-host communication mechanisms

- Overlay networks

  - weave, flannel, pipework ...

- Network plugins

  - available since Engine 1.9

- Allow a flat network for your containers

- Often require an extra service to deal with BUM packets
  <br/>(broadcast/unknown/multicast)

  - e.g. a key/value store (Consul, Etcd, Zookeeper ...)

- Load balancers and/or failover mechanisms still needed

---

class: title

# Interlude <br/>

# Docker for ops

---

# Backups

- Redis is still running (with name `myredis`)

- We want to enable backups without touching it

- We will use a special backup container:

  - sharing the same volumes

  - linked to it (to connect to it easily)

  - possibly containing our backup tools

- This works because the `redis` container image
  <br/>stores its data on a volume

---

## Starting the backup container

.exercise[

- Make sure you're talking to the initial host:

  ```
  unset DOCKER_HOST
  ```

- Start the container:

  ```
  docker run --link myredis:redis \
             --volumes-from myredis \
             -v /tmp/myredis:/output \
             -ti alpine sh
  ```

- Look in `/data` in the container
  <br/>(That's where Redis puts its data dumps)
]

---

## Connecting to Redis

- We need to tell Redis to perform a data dump *now*

.exercise[

- Connect to Redis:
  <br/>`telnet redis 6379`

- Issue commands `SAVE` then `QUIT`

- Look at `/data` again

]

- There should be a recent dump file now!

---

## Getting the dump out of the container

- We could use many things:

  - s3cmd to copy to S3
  - SSH to copy to a remote host
  - gzip/bzip/etc before copying

- We'll just copy it to the Docker host

.exercise[

- Copy the file from `/data` to `/output`

- Exit the container

- Look into `/tmp/myredis` (on the host)

]

---

# Logs

- Two strategies:

  - log to plain files on volumes

  - log to stdout
    <br/>(and use a logging driver)

---

## Logging to plain files on volumes

(Sorry, that part won't be hands-on!)

- Start a container with `-v /logs`

- Make sure that all log files are in `/logs`

- To check logs, run e.g.

  ```
  docker run --volumes-from ... ubuntu sh -c \
         "grep WARN /logs/*.log"
  ```

- Or just go interactive:

  ```
  docker run --volumes-from ... -ti ubuntu
  ```

- You can (should) start a log shipper that way

---

## Logging to stdout

- All containers should write to stdout/stderr

- Docker will collect logs and pass them to a logging driver

- The logging driver can be specified globally, and per container
  <br/>(changing it for a container overrides the global setting)

- To change the global logging driver,
  <br/>pass extra flags to the daemon
  <br/>(requires a daemon restart)

- To override the logging driver for a container,
  <br/>pass extra flags to `docker run`

---

## Specifying logging flags

- `--log-driver`

  *selects the driver*

- `--log-opt key=val`

  *adds driver-specific options*
  <br/>*(can be repeated multiple times)*

- The flags are identical for `docker daemon` and `docker run`

Tip #1: when provisioning with Docker Machine, use:
```
docker-machine create ... --engine-opt log-driver=...
```

Tip #2: you can set logging options in Compose files.

---

## Available drivers

- json-file (default)

- syslog (can send to UDP, TCP, TCP+TLS, UNIX sockets)

- awslogs (AWS CloudWatch)

- journald

- gelf

- fluentd

- splunk

---

## About json-file ...

- It doesn't rotate logs by default, so your disks will fill up

  (Unless you set the `max-size` *and* `max-file` log options.)

- It's the only one supporting log retrieval

  (If you want to use `docker logs`, `docker-compose logs`,
  or fetch logs from the Docker API, you need json-file!)

- This might change in the future

  (But it's complex since there is no standard protocol
  to *retrieve* log entries.)

All about logging in the documentation:
https://docs.docker.com/reference/logging/overview/

---

# Storing container logs in an ELK stack

*Important foreword: this is not an "official" or "recommended"
setup; it is just an example. We do not endorse ELK, GELF,
or the other elements of the stack more than others!*

What we will do:

- Spin up an ELK stack, with Compose

- Gaze at the spiffy Kibana web UI

- Manually send a few log entries over GELF

- Reconfigure our DockerCoins app to send logs to ELK

---

## What's in an ELK stack?

- ELK is three components:

  - ElasticSearch (to store and index log entries)

  - Logstash (to receive log entries from various
    sources, process them, and forward them to various
    destinations)

  - Kibana (to view/search log entries with a nice UI)

- The only component that we will configure is Logstash

- We will accept log entries using the GELF protocol

- Log entries will be stored in ElasticSearch,
  <br/>and displayed on Logstash's stdout for debugging

---

## Starting our ELK stack

- We will use a *separate* Compose file

- The Compose file is in the `elk` directory

.exercise[

- Go to the `elk` directory:
  ```
  cd ~/orchestration-workshop/elk
  ```

- Start the ELK stack:
  ```
  docker-compose up -d
  ```

]

---

## Checking that our ELK stack works

- Our default Logstash configuration sends a test
  message every minute

- All messages are stored into ElasticSearch,
  but also shown on Logstash stdout

.exercise[

- Look at Logstash stdout:
  ```
  docker-compose logs logstash
  ```

]

After less than one minute, you should see a `"message" => "ok"`
in the output.

---

## Connect to Kibana

- Our ELK stack exposes two public services:
  <br/>the Kibana web server, and the GELF UDP socket

.exercise[

- Check the port number for the Kibana UI:
  ```
  docker-compose ps kibana
  ```

- Open the UI in your browser
  <br/>(Use the instance IP address and the public port number)

]

---

## "Configuring" Kibana

- If you see a status page with a yellow item, wait a minute and reload
  (Kibana is probably still initializing)

- Kibana should prompt you to "Configure an index pattern";
  just click the "Create" button

- Then:

  - click "Discover" (in the top-left corner)
  - click "Last 15 minutes" (in the top-right corner)
  - click "Last 1 hour" (in the list in the middle)
  - click "Auto-refresh" (top-right corner)
  - click "5 seconds" (top-left of the list)

- You should see a series of green bars
  <br/>(with one new green bar every minute)

---

## Kibana out of the box

![Screenshot of Kibana](kibana.png)

---

## Sending container output to Kibana

- We will create a simple container displaying "hello world"

- We will override the container logging driver

.exercise[

- Check the port number for the GELF socket:
  <br/>`docker-compose ps logstash`

- Start a one-off container, overriding its logging driver:
  <br/>(make sure to update X.X.X.X:XXXXX, of course)

  ```
  docker run --rm --log-driver gelf \
         --log-opt gelf-address=udp://X.X.X.X:XXXXX \
         alpine echo hello world
  ```

]
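
---

## Sending a GELF message by hand

The `gelf` logging driver does the work for us, but a GELF log entry
is essentially a small JSON document sent in a UDP datagram.
Here is a minimal Python sketch of that idea. It is not part of the
workshop repository: `build_gelf_payload` and `send_gelf` are made-up
names, and we assume that the receiver accepts zlib-compressed,
unchunked GELF 1.1 messages.

```python
import json
import socket
import time
import zlib


def build_gelf_payload(short_message, host, **extra):
    """Return a zlib-compressed GELF 1.1 message as bytes."""
    message = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),
    }
    # GELF expects additional fields to be prefixed with an underscore.
    for key, value in extra.items():
        message["_" + key] = value
    return zlib.compress(json.dumps(message).encode("utf-8"))


def send_gelf(payload, address, port):
    """Send one (unchunked) GELF datagram over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (address, port))
```

To try it, pass `send_gelf` the address and port
reported by `docker-compose ps logstash`.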

---

## Visualizing container logs in Kibana

- Less than 5 seconds later (the refresh rate of the UI),
  the log line should be visible in the Web UI

- We can customize the Web UI to be more readable

.exercise[

- In the left column, move the mouse over the following
  columns, and click the "Add" button that appears:

  - host
  - container_name
  - short_message

]

---

## Removing the old deployment of DockerCoins

- Before redeploying DockerCoins, remove everything

.exercise[

- Stop all DockerCoins containers:
  <br/>`docker-compose kill`

- Remove them:
  <br/>`docker-compose rm -f`

- Reset the Compose file:
  <br/>`git checkout docker-compose.yml`

- Point the Docker API to a single node:
  <br/>`eval $(docker-machine env -u)`

]

---

## Add the logging driver to the Compose file

- We need to add the logging section to each container

- We need the GELF endpoint (host+port) that we
  got earlier with `docker-compose ps logstash`

.exercise[

- Edit the `docker-compose.yml` file,
  <br/>adding the following lines **to each container**:

  ```
  log_driver: gelf
  log_opt:
    gelf-address: "udp://X.X.X.X:XXXXX"
  ```

]

Shortcut: `docker-compose.yml-logging`
<br/>(But you still have to update `X.X.X.X:XXXXX`!)

---

## Start the DockerCoins app

.exercise[

- Use Compose normally:
  ```
  docker-compose up -d
  ```

]

If you look in the Kibana web UI, you will see log lines
refreshed every 5 seconds.

Note: to do interesting things (graphs, searches...) we
would need to create indexes. This is beyond the scope
of this workshop.

---

# Security upgrades

- This section is not hands-on

- Public Service Announcement

- We'll discuss:

  - how to upgrade the Docker daemon

  - how to upgrade container images

---

## Upgrading the Docker daemon

- Stop all containers cleanly
  <br/>(`docker ps -q | xargs docker stop`)

- Stop the Docker daemon

- Upgrade the Docker daemon

- Start the Docker daemon

- Start all containers

- This is like upgrading your Linux kernel,
  <br/>but it will get better

---

## Upgrading container images

- When a vulnerability is announced:

  - if it affects your base images,
    <br/>make sure they are fixed first

  - if it affects downloaded packages,
    <br/>make sure they are fixed first

  - re-pull base images

  - rebuild

  - restart containers

(The procedure is simple and plain, just follow it!)

---

# Network traffic analysis

- We still have `myredis` running

- We will use *shared network namespaces*
  <br/>to perform network analysis

- Two containers sharing the same network namespace...

  - have the same IP addresses

  - have the same network interfaces

- `eth0` is therefore the same in both containers

---

## Install and start `ngrep`

Ngrep uses libpcap (like tcpdump) to sniff network traffic.

.exercise[

- Start a container with the same network namespace:
  <br/>`docker run --net container:myredis -ti alpine sh`

- Install ngrep:
  <br/>`apk update && apk add ngrep`

- Run ngrep:
  <br/>`ngrep -tpd eth0 -Wbyline . tcp`

]

You should see a stream of Redis requests and responses.

---

class: title

# Dynamic orchestration

---

## Static vs Dynamic

- Static

  - you decide what goes where

  - simple to describe and implement

  - seems easy at first but doesn't scale efficiently

- Dynamic

  - the system decides what goes where

  - requires extra components (HA KV...)

  - scaling can be finer-grained, more efficient

---

## Mesos (overview)

- First presented in 2009

- Initial goal: resource scheduler
  <br/>(two-level/pessimistic)

  - top-level "master" knows the global cluster state

  - "slave" nodes report status and resources to master

  - master allocates resources to "frameworks"

- Container support added recently
  <br/>(had to fit existing model)

- Network and service discovery is complex

---

## Mesos (in practice)

- Easy to set up a test cluster (in containers!)

- Great to accommodate mixed workloads
  <br/>(see Marathon, Chronos, Aurora, and many more)

- "Meh" if you only want to run Docker containers

- In production on clusters of thousands of nodes

- Open source project; commercial support available

---

## Kubernetes (overview)

- 1 year old

- Designed specifically as a platform for containers
  <br/>("greenfield" design)

  - "pods" = groups of containers sharing network/storage

  - Scaling and HA managed by "replication controllers"

  - extensive use of "tags" instead of e.g. tree hierarchy

- Initially designed around Docker,
  <br/>but doesn't hesitate to diverge in a few places

---

## Kubernetes (in practice)

- Network and service discovery is powerful, but complex
  <br/>.small[(different mechanisms within pod, between pods, for inbound traffic...)]

- Initially designed around GCE
  <br/>.small[(currently relies on "native" features for fast networking and persistence)]

- Adaptation is needed when it differs from Docker
  <br/>.small[(need to learn new API, new tooling, new concepts)]

- Tends to be loved by ops more than devs
  <br/>.small[(but keep in mind that it's evolving just as fast as Docker)]

---

## Swarm (in theory)

- Consolidates multiple Docker hosts into a single one

- "Looks like" a Docker daemon, but it dispatches (schedules)
  your containers on multiple daemons

- Talks the Docker API front and back
  <br/>(leverages the Docker API and ecosystem)

- Open source and written in Go (like Docker)

- Started by two of the original Docker authors
  <br/>([@aluzzardi](https://twitter.com/aluzzardi) and [@vieux](https://twitter.com/vieux))

---

## Swarm (in practice)

- Stable since November 2015

- Tested with 1000 nodes + 50000 containers
  <br/>.small[(without particular tuning; see DockerCon EU opening keynotes!)]

- Perfect for some scenarios (Jenkins, grid...)

- Requires extra effort for Compose build, links...

- Requires a key/value store to achieve high availability

- We'll see it in action!

---

## PAAS on Docker

- The PAAS workflow: *just push code*
  <br/>(inspired by Heroku, dotCloud...)

- TL;DR: easier for devs, harder for ops,
  <br/>some very opinionated choices

- A few examples:
  <br/>(Non-exhaustive list!!!)

  - Cloud Foundry
  - Deis
  - Dokku
  - Flynn
  - Tsuru

---

## A few other tools

- Volume plugins (Convoy, Flocker...)

  - manage/migrate stateful containers (and more)

- Network plugins (Contiv, Weave...)

  - overlay network so that containers can ping each other

- Powerstrip

  - sits in front of the Docker API; great for experiments

- Tutum, Docker UCP (Universal Control Plane)

  - dashboards to manage fleets of Docker hosts

... And many more!

---

# Hands-on Swarm

![Swarm Logo](swarm.png)

---

## Setting up our Swarm cluster

- This can be done manually or with **Docker Machine**

- Manual deployment:

  - with TLS: certificate generation is painful
    <br/>(needs dual-use certs)

  - without TLS: easier, but insecure
    <br/>(unless you run on your internal/private network)

- Docker Machine deployment:

  - generates keys, certificates, and deploys them for you

  - can also create VMs

---

## The Way Of The Machine

- Install `docker-machine` (single binary download)

- Set a few environment variables (cloud credentials)

- Create one or more machines:
  <br/>`docker-machine create -d digitalocean node42`

- List machines and their status:
  <br/>`docker-machine ls`

- Select a machine for use:
  <br/>`eval $(docker-machine env node42)`
  <br/>(this will set a few environment variables)

- Execute regular commands with Docker, Compose, etc.
  <br/>(they will pick up remote host address from environment)

---

## Docker Machine `generic` driver

- Most drivers work the same way:

  - use cloud API to create instance

  - connect to instance over SSH

  - install Docker

- The `generic` driver skips the first step

- It can install Docker on any machine,
  <br/>as long as you have SSH access

- We will use that!

---

# Deploying Swarm

- Components involved:

  - service discovery mechanism
    <br/>(we'll use Docker's hosted system)

  - swarm manager
    <br/>(runs on `node1`, exposes Docker API)

  - swarm agent
    <br/>(runs on each node, registers it with service discovery)

---

# Cluster discovery

- Possible backends:

  - dynamic, self-hosted (zk, etcd, consul)

  - static (command-line or file)

  - hosted by Docker (token)

- We will use the token mechanism

---

## Generating our Swarm discovery token

The token is a unique identifier, corresponding to a bucket
in the discovery service hosted by Docker Inc.

(You can think of it as a rendezvous point for your cluster.)

.exercise[

- Create your token, saving it preciously to disk as well:

  ```
  TOKEN=$(docker run swarm create | tee token)
  ```

]

---

## Swarm agent

- Used only for dynamic discovery (zk, etcd, consul, token)

- Must run on each node

- Every 20s (by default), tells the discovery system:
  <br/>"Hello, there is a Swarm node at A.B.C.D:EFGH"

- Must know the node's IP address
  <br/>(sorry, it can't figure it out by itself, because
  <br/>it doesn't know whether to use public or private addresses)

- The node continues to work even if the agent dies

- Automatically started by Docker Machine
  <br/>(when the `--swarm` option is passed)

---

## Swarm manager

- Today: must run on the leader node

- Later: can run on multiple nodes, with leader election

- Automatically started by Docker Machine
  <br/>(when the `--swarm-master` option is passed)

.exercise[

- Connect to `node1`

- "Create" a node with Docker Machine

  .small[
  ```
  docker-machine create --driver generic \
    --swarm --swarm-master --swarm-discovery token://$TOKEN \
    --generic-ssh-user docker --generic-ip-address 1.2.3.4 node1
  ```
  ]

]

(Don't forget to replace 1.2.3.4 with the node IP address!)

---

## Check our node

Let's connect to the node *individually*.

.exercise[

- Select the node with Machine

  ```
  eval $(docker-machine env node1)
  ```

- Execute some Docker commands

  ```
  docker version
  docker info
  docker ps
  ```

]

Two containers should show up: the agent and the manager.

---

## Check our (single-node) Swarm cluster

Let's connect to the manager instead.

.exercise[

- Select the Swarm manager with Machine

  ```
  eval $(docker-machine env node1 --swarm)
  ```

- Execute some Docker commands

  ```
  docker version
  docker info
  docker ps
  ```

]

The output is different! Let's review this.

---

## `docker version`

Swarm identifies itself clearly:

```
Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:        Fri Nov 20 13:20:08 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      swarm/1.0.1
 API version:  1.21
 Go version:   go1.5.2
 Git commit:   744e3a3
 Built:
 OS/Arch:      linux/amd64
```

---

## `docker info`

Swarm gives cluster information, showing all nodes:

```
Containers: 3
Images: 6
Role: primary
Strategy: spread
Filters: affinity, health, constraint, port, dependency
Nodes: 1
 node: 52.89.117.68:2376
  └ Containers: 3
  └ Reserved CPUs: 0 / 2
  └ Reserved Memory: 0 B / 3.86 GiB
  └ Labels: executiondriver=native-0.2,
            kernelversion=3.13.0-53-generic,
            operatingsystem=Ubuntu 14.04.2 LTS,
            provider=generic, storagedriver=aufs
CPUs: 2
Total Memory: 3.86 GiB
Name: 2ec2e6c4054e
```

---

## `docker ps`

- This one should show nothing at this point.

- The Swarm containers are hidden.

- This avoids unneeded pollution.

- This also avoids killing them by mistake.

---

## Add other nodes to the cluster

- Let's use *almost* the same command line
  <br/>(but without `--swarm-master`)

.exercise[

- Stay on `node1` (it has keys and certificates now!)

- Add another node with Docker Machine

  .small[
  ```
  docker-machine create --driver generic \
    --swarm --swarm-discovery token://$TOKEN \
    --generic-ssh-user docker --generic-ip-address 1.2.3.4 node2
  ```
  ]
]

Remember to update the IP address correctly.

Repeat for each of the 4 other nodes (`node2` to `node5`).

Pro tip: look for name/address mapping in `/etc/hosts`.

---

## Scripting

To help you a little bit:

```
grep 'node[2345]' /etc/hosts | grep -v ^127 |
while read IPADDR NODENAME
do docker-machine create --driver generic \
   --swarm --swarm-discovery token://$TOKEN \
   --generic-ssh-user docker \
   --generic-ip-address $IPADDR $NODENAME
done
```
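
---

## Scripting (Python flavor)

If you prefer Python to shell pipelines, the same loop could look
like the sketch below. It only builds the command lines, using the
flags shown on the previous slide; `machine_create_commands` is a
hypothetical helper, and we assume that `/etc/hosts` lists
`IP-address name` pairs for `node2` to `node5`.

```python
def machine_create_commands(hosts_text, token):
    """Build one `docker-machine create` argv per node2..node5 entry."""
    commands = []
    for line in hosts_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and loopback entries (like `grep -v ^127`).
        if len(fields) < 2 or fields[0].startswith(("#", "127")):
            continue
        ip_address, node_name = fields[0], fields[1]
        if node_name not in ("node2", "node3", "node4", "node5"):
            continue
        commands.append([
            "docker-machine", "create", "--driver", "generic",
            "--swarm", "--swarm-discovery", "token://" + token,
            "--generic-ssh-user", "docker",
            "--generic-ip-address", ip_address, node_name,
        ])
    return commands
```

Each returned list can then be passed to `subprocess.run(..., check=True)`.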

---

## Running containers on Swarm

Try to run a few `busybox` containers.

Then, let's get serious:

.exercise[

- Start a Redis service:
  <br/>`docker run -dP redis`

- See the service address:
  <br/>`docker port $(docker ps -lq) 6379`

]

This can be any of your five nodes.

---

# Building our app on Swarm

Before trying to build our app, we will remove previous images.

.exercise[

- Delete all images with "dockercoins" in the name:

  ```
  docker images |
    grep dockercoins |
    awk '{print $3}' |
    xargs -r docker rmi -f
  ```

]

---

## Building our app on Swarm

- Swarm has partial support for builds

- .icon[![Warning](warning.png)] Older versions of Compose would crash on builds

.exercise[

- Run `docker-compose build` multiple times
  <br/>(until you get it to build twice)

- Loudly complain that caching doesn't work as expected!

- Run one container multiple times with a resource limit:
  <br/>`docker run -d -m 1G dockercoins_rng`

- Check where the containers are running with `docker ps`

]

---

## Caveats when building with Swarm

- Caching doesn't work all the time

  - cause: build nodes can be picked randomly

  - solution: always pin builds to the same node

- Containers are only scheduled on a few nodes

  - cause: images are not present on all nodes

  - solution: distribute images through a registry
    <br/>(e.g. Docker Hub)

---

## Why can't Swarm do this automatically for us?

- Let's step back and think for a minute ...

- What should `docker build` do on Swarm?

  - build on one machine

  - build everywhere ($$$)

- After the build, what should `docker run` do?

  - run where we built (how do we know where it is?)

  - run on any machine that has the image

- Could Compose+Swarm solve this automatically?

---

## A few words about "sane defaults"

- *It would be nice if Swarm could pick a node, and build there!*

  - but which node should it pick?
  - what if the build is very expensive?
  - what if we want to distribute the build across nodes?
  - what if we want to tag some builder nodes?
  - ok but what if no node has been tagged?

- *It would be nice if Swarm could automatically push images!*

  - using the Docker Hub is an easy choice
    <br/>(you just need an account)
  - but some of us can't/won't use Docker Hub
    <br/>(for compliance reasons or because no network access)

.small[("Sane" defaults are nice only if we agree on the definition of "sane")]

---

## The plan

- Build locally

- Tag images

- Upload them to the hub
  <br/>(Note: this part requires a Docker Hub account!)

- Update the Compose file to use those images

*That's the purpose of the `build-tag-push.py` script!*
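
---

## The plan, sketched in Python

To make the plan concrete, here is a rough sketch of the idea.
This is *not* the actual `build-tag-push.py`: the function name,
the `dockercoins_` image prefix, and the use of plain dicts
instead of a parsed YAML file are all assumptions made for illustration.

```python
def build_tag_push_plan(services, hub_user, tag):
    """Return (commands, new_services) for a dict of Compose services.

    `services` maps a service name to its Compose definition (a dict).
    Services with a `build` key get an image name on the Docker Hub.
    """
    commands = []
    new_services = {}
    for name, definition in sorted(services.items()):
        definition = dict(definition)  # shallow copy; leave the input untouched
        if "build" in definition:
            image = "{}/dockercoins_{}:{}".format(hub_user, name, tag)
            # `docker build -t` builds and tags in a single step.
            commands.append(["docker", "build", "-t", image, definition["build"]])
            commands.append(["docker", "push", image])
            # Swap `build` for `image` so that any node can pull it.
            del definition["build"]
            definition["image"] = image
        new_services[name] = definition
    return commands, new_services
```

Services that already use an `image` (like Redis) are left unchanged.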

---

## Docker Hub account

- You need a Docker Hub account for that part

- If you don't have one, create it

.exercise[

- Set the following environment variable:

  ```
  export DOCKERHUB_USER=jpetazzo
  ```

- (Use *your* Docker Hub login, of course!)

- Log into the Docker Hub:

  ```
  docker login
  ```

]

---

## Build, Tag, And Push

Let's inspect the source code of `build-tag-push.py` and run it.

.icon[![Warning](warning.png)] It is better to run it against a single node!

(There are some race conditions within Swarm when building+pushing too fast.)

.exercise[

- Point to a single node:
  <br/>`eval $(docker-machine env node1)`

- Run the script (from the `dockercoins` directory):
  <br/>`../build-tag-push.py`

- Inspect the `docker-compose.yml-XXX` file that it created

]

---

## Can we run this now?

Let's try!

.exercise[

- Switch back to the Swarm cluster:
  <br/>`eval $(docker-machine env node1 --swarm)`

- Protip - set the `COMPOSE_FILE` variable:
  <br/>`export COMPOSE_FILE=docker-compose.yml-XXX`

- Bring up the application:
  <br/>`docker-compose up`

]

--

It won't work, because Compose and Swarm do not collaborate
to establish *placement constraints*.

--

(╯°□°)╯︵ ┻━┻

---

## Simple container dependencies

- Container A has a link to container B

- Compose starts B first, then A

- Swarm translates the link into a placement constraint:

  - *"put A on the same node as B"*

- Alles gut

---

## Complex container dependencies

- Container A has a link to containers B and C

- Compose starts B and C first
  <br/>(but that can be on different nodes!)

- Compose starts A

- Swarm translates the links into placement constraints:

  - *"put A on the same node as B"*
  - *"put A on the same node as C"*

- If B and C are on different nodes, that's impossible

So, what do‽
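
---

## Why the constraints conflict

A toy model makes the problem visible. Each link means
"same node as the linked container", so the candidate nodes for A are
the intersection of all those constraints. This is only an illustration
(`feasible_nodes` is a made-up function, not Swarm's scheduler code).

```python
def feasible_nodes(links, placement, nodes):
    """Return the nodes where a container linking to `links` could run.

    `placement` maps each already-running container to its node.
    Every link narrows the candidates down to the linked container's node.
    """
    candidates = set(nodes)
    for target in links:
        candidates &= {placement[target]}
    return sorted(candidates)
```

With B on `node1` and C on `node2`, the result for A is an empty list:
no node can satisfy both constraints.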

---

## A word on placement constraints

- Swarm supports constraints

- We could tell Swarm to put all our containers together

- Linking would work

- But all containers would end up on the same node

--

- So having a cluster would be pointless!

---

# Network plumbing on Swarm

- We will use one-tier, dynamic ambassadors
  <br/>(as seen before)

- Other available options:

  - injecting service addresses in environment variables

  - implementing service discovery in the application

  - using an overlay network

---

## Revisiting `jpetazzo/hamba`

- Configuration is stored in a *volume*

- A watcher process looks for configuration updates,
  <br/>and restarts HAProxy when needed

- It can be started without configuration:

  ```
  docker run --name amba jpetazzo/hamba run
  ```

- There is a helper to inject a new configuration:

  ```
  docker run --rm --volumes-from amba jpetazzo/hamba \
         reconfigure 80 backend1 port1 backend2 port2 ...
  ```

.footnote[Note: configuration validation and error messages
will be logged by the ambassador, not the `reconfigure` container.]
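
---

## Building the `reconfigure` command

When there are many backends, typing the helper command by hand
gets old quickly. A small sketch that assembles it from a list of
`(address, port)` pairs could look like this. The function name is
hypothetical; the command itself is the one shown on the previous slide.

```python
def reconfigure_command(ambassador, frontend_port, backends):
    """Build the `docker run` argv that injects a new hamba configuration.

    `backends` is a list of (address, port) pairs.
    """
    command = [
        "docker", "run", "--rm", "--volumes-from", ambassador,
        "jpetazzo/hamba", "reconfigure", str(frontend_port),
    ]
    for address, port in backends:
        command += [address, str(port)]
    return command
```

The resulting list can be handed to `subprocess.run(..., check=True)`.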

---

## Should we use `links` for our ambassadors?

Technically, we could use links.

- Before starting an app container:

  start the ambassador(s) it needs

- When starting an app container:

  link it to its ambassador(s)

But we wouldn't be able to use `docker-compose scale` anymore.

---

## Network namespaces and `extra_hosts`

This is our plan:

- Replace each `link` with an `extra_host`,
  <br/>pointing to the `127.127.X.X` address space

- Start app containers normally
  <br/>(`docker-compose up`, `docker-compose scale`)

- Start ambassadors after app containers are up:

  - ambassadors bind to `127.127.X.X`

  - they share their client's network namespace

- Reconfigure ambassadors each time something changes

---

## Our plan for service discovery

- Replace all `links` with static `/etc/hosts` entries

- Those entries will map to `127.127.0.X`
  <br/>(with different `X` for each service)

- Example: `redis` will point to `127.127.0.2`
  <br/>(instead of a container address)

- Start all services; scale them if we want
  <br/>(at this point, they will all fail to connect)

- Start ambassadors in the services' namespace;
  <br/>each ambassador will listen on the right `127.127.0.X`

- Gather all backend addresses and configure ambassadors

.icon[![Warning](warning.png)] Services should try to reconnect!
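
---

## From `links` to `extra_hosts`, in code

The first step of the plan can be sketched as follows.
This is an illustration only, not the workshop's actual script:
the function name and the address allocation scheme (one
`127.127.0.X` per linked service, in alphabetical order, starting at 2)
are assumptions.

```python
def links_to_extra_hosts(services):
    """Replace each service's `links` with `extra_hosts` on 127.127.0.X."""
    # Give every linked service its own loopback address (assumed scheme).
    linked = sorted({link.split(":")[0]
                     for definition in services.values()
                     for link in definition.get("links", [])})
    addresses = {name: "127.127.0.{}".format(index)
                 for index, name in enumerate(linked, start=2)}
    converted = {}
    for name, definition in services.items():
        definition = dict(definition)  # shallow copy; leave the input untouched
        links = definition.pop("links", [])
        if links:
            hosts = list(definition.get("extra_hosts", []))
            for link in links:
                # A link is either "service" or "service:alias".
                target, _, alias = link.partition(":")
                hosts.append("{}:{}".format(alias or target, addresses[target]))
            definition["extra_hosts"] = hosts
        converted[name] = definition
    return converted
```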

---

## "Design for failure," they said

- When the containers are started, the network is not ready

- First connection attempts **will fail**

- App should try to reconnect

- It is OK to crash and restart

- Exponential back-off is nice
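
---

## Exponential back-off, sketched

Here is one way an app could retry its first connections.
This is a minimal sketch under our own assumptions: `connect` is any
callable that raises `OSError` (e.g. "connection refused") on failure,
and the default delays are arbitrary.

```python
import time


def connect_with_backoff(connect, max_attempts=6, base_delay=0.5,
                         sleep=time.sleep):
    """Call `connect()` until it succeeds, doubling the wait after each failure."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == max_attempts:
                raise  # give up; crashing and being restarted is OK too
            sleep(delay)
            delay *= 2
```

The `sleep` parameter is only there to make the function easy to test
without actually waiting.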

---

## Our tools

- `link-to-ambassadors.py`

  - replaces all `links` with `extra_hosts` entries

- `create-ambassadors.py`

  - scans running containers
  - allocates `127.127.X.X` addresses
  - starts (unconfigured) ambassadors

- `configure-ambassadors.py`

  - scans running containers
  - gathers backend addresses
  - sends configuration to ambassadors

---

## Convert links to ambassadors

- When we ran `build-tag-push.py` earlier,
  <br/>it generated a new `docker-compose.yml-XXX` file.

.exercise[

- Run the first script to create a new YAML file:
  <br/>`../link-to-ambassadors.py $COMPOSE_FILE new.yml`

- Look how the file was modified:
  <br/>`diff $COMPOSE_FILE new.yml`

]

---

## Change `$COMPOSE_FILE` in place

The script can take zero, one, or two file name arguments:

- two arguments indicate input and output files to use;
- with one argument, the file will be modified in place;
- with zero arguments, it will act on `$COMPOSE_FILE`.

For convenience, let's avoid having a bazillion files around.

.exercise[

- Remove the temporary Compose file we just created:
  <br/>`rm -f new.yml`

- Update `$COMPOSE_FILE` in place:
  <br/>`../link-to-ambassadors.py`

]

---

## Bring up the application

The application can now be started and scaled.

.exercise[

- Start the application:
  <br/>`docker-compose up -d`

- Scale the application:
  <br/>`docker-compose scale worker=5 rng=10`

]

Note: you can scale everything as you like, *except Redis*,
because it is stateful.

---

## Create the ambassadors

This has to be executed each time you create new services
or scale up existing ones.

After reading `$COMPOSE_FILE`, it will scan running containers, and compare:

- the list of app containers,
- the list of ambassadors.

It will create missing ambassadors.

.exercise[

- Run the script!
  <br/>`../create-ambassadors.py`

]
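
---

## Finding the missing ambassadors

The comparison described on the previous slide boils down to a set
difference. The sketch below is hypothetical: the real script may work
differently, and the `CONTAINER_amba_SERVICE` naming scheme is
invented for the example.

```python
def missing_ambassadors(app_containers, ambassadors):
    """Return the names of ambassadors that still need to be created.

    `app_containers` maps a container name to the services it talks to;
    `ambassadors` is the collection of ambassador names that already exist.
    """
    wanted = {"{}_amba_{}".format(container, service)
              for container, services in app_containers.items()
              for service in services}
    return sorted(wanted - set(ambassadors))
```

In this sketch, running the comparison again after creating the missing
ambassadors returns an empty list, so it is safe to repeat after each scale-up.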

---

## Configure the ambassadors

All ambassadors are created but they still need configuration.

That's the purpose of the last script.

It will read `$COMPOSE_FILE` and gather:

- the list of app backends,
- the list of ambassadors.

Then it configures all ambassadors with all found backends.

.exercise[

- Run it!
  <br/>`../configure-ambassadors.py`

]

---

## Check what we did

.exercise[


- Find out the address of the web UI:
  <br/>`docker-compose ps webui`

- Point your browser to it

- Check the logs:
  <br/>`docker-compose logs`

]

---

# Going further

Scaling the application (difficulty: easy)

- Run `docker-compose scale`

- Re-create ambassadors

- Re-configure ambassadors

- No downtime

---

## Going further

Deploying a new version (difficulty: easy)

- Just re-run all the steps!

- However, Compose will re-create the containers

- You will have to re-create ambassadors
  <br/>(and configure them)

- You will have to clean up old ambassadors
  <br/>(left as an exercise for the reader)

- You will experience a little bit of downtime

---

## Going further

Zero-downtime deployment (difficulty: medium)

- Isolate stateful services
  <br/>(like we did earlier for Redis)

- Do blue/green deployment:

  - deploy and scale version N

  - point a "top-level" load balancer to the app

  - deploy and scale version N+1

  - put both apps in the "top-level" balancer

  - slowly switch traffic over to app version N+1
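
---

## Switching traffic, step by step

"Slowly switch traffic" can mean adjusting backend weights in a few
steps. Here is a small sketch that computes such a schedule. It assumes
that the top-level load balancer supports weighted backends;
`traffic_schedule` is a made-up helper, not a feature of any tool used here.

```python
def traffic_schedule(steps):
    """Return (old_weight, new_weight) pairs moving 100% of traffic in `steps` steps."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    schedule = []
    for step in range(steps + 1):
        new_weight = round(100 * step / steps)
        schedule.append((100 - new_weight, new_weight))
    return schedule
```

With 4 steps, version N+1 receives 0%, 25%, 50%, 75%, and finally 100% of the traffic.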

---

## Going further

Use the new networking features (difficulty: medium)

- Create a key/value store (e.g. Consul cluster)

- Reconfigure all Engines to use the key/value store

- Load balancers can use DNS for backend discovery

Note: this is really easy to do with a 1-node Consul cluster.

---

## Going further

Harder projects:

- Two-tier or three-tier ambassador deployments

- Deploy to Mesos or Kubernetes

---

class: pic

![Here Be Dragons](dragons.jpg)

---

# Here be dragons

- So far, we've used stable products (versions 1.X)

- We're going to explore experimental software

- **Use at your own risk**

---

# Setting up Consul and overlay networks

- We will reconfigure our Swarm cluster to enable overlays

- We will deploy a Consul cluster

- We will connect containers running on different machines

---

## First, let's Clean All The Things!

- We need to remove the old containers
  <br/>(in particular the `swarm` agents and managers)

.exercise[

- The following snippet will nuke all containers on all hosts:

  ```
  for N in 1 2 3 4 5
  do
    ssh node$N "docker ps -qa | xargs -r docker rm -f"
  done
  ```

(If it asks you to confirm SSH keys, just do it!)

]

Note: our Swarm cluster is now broken.

---

## Remove old Machine information

- We will use `docker-machine rm`

- With the `generic` driver, this doesn't do anything
  <br/>(it just deletes local configuration)

- With cloud/VM drivers, this would actually delete VMs

.exercise[

- Remove our nodes from Docker Machine config database:

  ```
  for N in 1 2 3 4 5
  do
    docker-machine rm -f node$N
  done
  ```

]

---

## Add extra options to our Engines

- We need two new options for our engines:

  - `cluster-store` (to indicate which key/value store to use)

  - `cluster-advertise` (to indicate which IP address to register)

- `cluster-store` will be `consul://localhost:8500`
  <br/>(we will run one Consul node on each machine)

- `cluster-advertise` will be `eth0:2376`
  <br/>(Engine will automatically pick up eth0's IP address)

---

## Reconfiguring Swarm clusters, the Docker way

- The traditional way to reconfigure a service is to edit
  its configuration (or init script), then restart

- We can use Machine to make that easier

- Re-deploying with Machine's `generic` driver will reconfigure
  Engines with the new parameters

.exercise[

- Re-provision the manager node:

  .small[
  ```
  docker-machine create --driver generic \
    --engine-opt cluster-store=consul://localhost:8500 \
    --engine-opt cluster-advertise=eth0:2376 \
    --swarm --swarm-master --swarm-discovery consul://localhost:8500 \
    --generic-ssh-user docker --generic-ip-address 52.32.216.30 node1
  ```
  ]
]

---

## Reconfigure the other nodes

- Once again, scripting to the rescue!

.exercise[

```
grep 'node[2345]' /etc/hosts | grep -v ^127 |
while read IPADDR NODENAME
do docker-machine create --driver generic \
   --engine-opt cluster-store=consul://localhost:8500 \
   --engine-opt cluster-advertise=eth0:2376 \
   --swarm --swarm-discovery consul://localhost:8500 \
   --generic-ssh-user docker \
   --generic-ip-address $IPADDR $NODENAME
done
```

]

---

## Checking what we did

.exercise[

- Directly point the CLI to a node and check configuration:

  ```
  eval $(docker-machine env node1)
  docker info
  ```

  (should show `Cluster store` and `Cluster advertise`)

- Try to talk to the Swarm cluster:

  ```
  eval $(docker-machine env node1 --swarm)
  docker info
  ```

  (should show zero nodes)

]

---

## Why zero nodes?

- We haven't started Consul yet

- Swarm discovery is not operational

- Swarm can't discover the nodes

Note: good guy ~~Stevedore~~ Docker will start without K/V

(This lets us run Consul itself in a container!)

---

## Adding Consul

- We will run Consul in containers

- We will use a
  [custom consul image](https://hub.docker.com/r/jpetazzo/consul/)

- We will tell Docker to automatically restart it on reboots

- To simplify network setup, we will use `host` networking

---

## Starting the first Consul node

.exercise[

- Log into `node1`

- The first node must be started with the `-bootstrap` flag:

  ```
  CID=$(docker run --name consul_node1 \
        -d --restart=always --net host \
        jpetazzo/consul agent -server -bootstrap)
  ```

- Find the internal IP address of that node
  <br/>With This One Weird Trick:

  ```
  IPADDR=$(ip a ls dev eth0 |
           sed -n 's,.*inet \(.*\)/.*,\1,p')
  ```

]
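
To see what the `sed` expression keeps, here is a small sketch that runs it on sample `ip a` output instead of a live interface (the sample text and its addresses are made up for illustration):

```shell
# Sample `ip a ls dev eth0` output; addresses are made up for illustration.
SAMPLE='2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc pfifo_fast state UP
    link/ether 0a:1b:2c:3d:4e:5f brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.11/24 brd 10.0.0.255 scope global eth0
    inet6 fe80::81b:2cff:fe3d:4e5f/64 scope link'

# -n suppresses default printing; the trailing p prints only lines where
# the substitution matched, keeping what sits between "inet " and "/".
IPADDR=$(printf '%s\n' "$SAMPLE" | sed -n 's,.*inet \(.*\)/.*,\1,p')
echo "$IPADDR"
```

The `inet6` line is skipped because the pattern requires a space right after `inet`.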

---

## Starting the other Consul nodes

.exercise[

- The other nodes have to be started with the `-join IP.AD.DR.ESS` flag:

  ```
  for N in 2 3 4 5; do
  ssh node$N docker run --name consul_node$N \
             -d --restart=always --net host \
             jpetazzo/consul agent -server -join $IPADDR
  done
  ```

- With your browser, navigate to any instance on port 8500
  <br/>(in "NODES" you should see the five nodes)

]

---

## Check that our Consul cluster is up

- Let's run a couple of useful Consul commands

.exercise[

- Ask Consul for the list of members it knows:
  ```
  docker run --net host --rm jpetazzo/consul members
  ```

- Ask Consul which node is the current leader:
  ```
  curl localhost:8500/v1/status/leader
  ```

]
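
A small sketch for pulling the leader's IP address out of that response, assuming the endpoint answers with a quoted `address:port` string. The sample response below is made up; on a live node it would come from the `curl` command above:

```shell
# Sample response body; the address is made up for illustration.
RESPONSE='"10.0.0.11:8300"'
# On a live node: RESPONSE=$(curl -s localhost:8500/v1/status/leader)

LEADER=$(printf '%s' "$RESPONSE" | tr -d '"')   # strip the JSON quotes
LEADER_IP=${LEADER%:*}                          # drop the trailing :port
echo "leader IP: $LEADER_IP"
```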

---

## Check that our Swarm cluster is up

.exercise[

- Try the `docker info` from earlier again:

  ```
  eval $(docker-machine env --swarm node1)
  docker info
  ```

- Now all nodes should be visible
  <br/>(Give them a minute or two to register)

]
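
Instead of re-running the check by hand while the nodes register, a polling helper can do the waiting. This is a minimal sketch: `retry` and `flaky` are hypothetical names, and `flaky` is only a stand-in that succeeds on its third call so the example runs without a cluster:

```shell
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds,
# sleeping DELAY seconds between tries; fail after ATTEMPTS tries.
retry() {
  attempts=$1; delay=$2; shift 2
  n=1
  while ! "$@"; do
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Stand-in check that succeeds on its third call (illustration only).
COUNT_FILE=$(mktemp)
echo 0 > "$COUNT_FILE"
flaky() {
  c=$(($(cat "$COUNT_FILE") + 1))
  echo "$c" > "$COUNT_FILE"
  [ "$c" -ge 3 ]
}

if retry 5 0 flaky
then RESULT="ready after $(cat "$COUNT_FILE") tries"
else RESULT="gave up"
fi
echo "$RESULT"
rm -f "$COUNT_FILE"
```

On the cluster, the stand-in would be replaced by a real check of your choosing, with a delay of a few seconds.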

---

# Multi-host networking

- Docker 1.9 has the concept of *networks*

- By default, containers are on the default "bridge" network

- You can create additional networks

- Containers can be on multiple networks

- Containers can dynamically join/leave networks

- The "overlay" driver lets networks span multiple hosts

- Let's see that in action!

---

## Create a few networks and containers

.exercise[

```
docker network create --driver overlay jedi
docker network create --driver overlay darkside
docker network ls
```

]

--

(Don't worry, there won't be any spoiler here, I have
been so busy preparing this workshop that I haven't
seen the new movie yet!)

--

.exercise[

```
docker run -d --name luke --net jedi -m 3G redis
docker run -d --name vador --net jedi -m 3G redis
docker run -d --name palpatine --net darkside -m 3G redis
```

]

---

## Check connectivity within networks

.exercise[

- Check that our containers are on different networks:

  ```
  docker ps
  ```

- This will work:

  ```
  docker exec -ti vador ping luke
  ```

- This will not:

  ```
  docker exec -ti vador ping palpatine
  ```

]

---

## Dynamically connect containers

.exercise[

- ~~Connect `vador` to the `darkside`:~~
- To the `darkside`, connect `vador` we must:

  ```
  docker network connect darkside vador
  ```

- Now this will work:

  ```
  docker exec -ti vador ping palpatine
  ```

- Take a peek inside `vador`:

  ```
  docker exec -ti vador ip addr ls
  ```

]

---

## Dynamically disconnecting containers

.exercise[

- This works, right?

  ```
  docker exec -ti vador ping luke
  ```

- Let's disconnect `vador` from the `jedi` ~~order~~ network:

  ```
  docker network disconnect jedi vador
  ```

- And now:

  ```
  docker exec -ti vador ping luke
  ```

]

---

## Cleaning up

.exercise[

- Destroy containers:

  ```
  docker rm -f luke vador palpatine
  ```

- Destroy networks:

  ```
  docker network rm jedi
  docker network rm darkside
  ```

]

---

# Compose and multi-host networking

.icon[![Warning](warning.png)] Here be 7-headed flame-throwing hydras!

- This is super experimental

- Your cluster is likely to be blown to bits

- Situation is much better in Engine 1.10 and Compose 1.6
  <br/>(currently in RC; to be released circa February 2016!)

---

## Revisiting DockerCoins

.exercise[

- Go back to the `dockercoins` app:

  ```
  cd ~/orchestration-workshop/dockercoins
  ```

- Re-execute `build-tag-push` to get a fresh Compose file:

  ```
  eval $(docker-machine env -u)
  ../build-tag-push.py
  export COMPOSE_FILE=docker-compose.yml-XXX
  ```

]

---

## Add `container_name` to Compose file

.exercise[

- Edit the Compose file

- In the `hasher`, `rng`, and `redis` sections, add:
  <br/>`container_name: XXX`
  <br/>(where XXX is the name of the section)

- Also, comment out the `volumes` section

]

Note: by default, containers will be named `dockercoins_XXX_1`
(instead of `XXX`) and links will not work.

*This is no longer necessary with Compose 1.6!*
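
If you would rather script the edit, here is a minimal sketch using `awk`. It assumes a Compose v1 layout where each service name sits at the start of its own line; the sample file and its image names are hypothetical, and the `volumes` step is left to do by hand:

```shell
# Hypothetical Compose v1 file, trimmed for illustration.
COMPOSE=$(mktemp)
cat > "$COMPOSE" <<'EOF'
rng:
  image: example/rng
hasher:
  image: example/hasher
redis:
  image: redis
worker:
  image: example/worker
EOF

# Print every line; after a matching section header, also print
# a container_name entry set to the section's own name.
PATCHED=$(awk '
  { print }
  /^(hasher|rng|redis):$/ {
    name = $0
    sub(/:$/, "", name)
    print "  container_name: " name
  }
' "$COMPOSE")

echo "$PATCHED"
rm -f "$COMPOSE"
```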

---

## Run the app

.exercise[

- Add two custom experimental flags:

  ```
  docker-compose \
      --x-networking --x-network-driver=overlay \
      up -d
  ```

- Check the `webui` endpoint address:

  ```
  docker-compose ps webui
  ```

- Go to the webui with your browser!

]

---

## Scale the app

.exercise[

- Don't forget the custom experimental flags:

  ```
  docker-compose \
      --x-networking --x-network-driver=overlay \
      scale worker=2
  ```

- Look at the graph in your browser

]

Note: with Compose 1.6 and Engine 1.10, you can have
multiple containers with the same DNS name, thus
achieving "natural" load balancing through DNS round robin.

---

## Cleaning up

.exercise[

- Terminate containers and remove them:

  ```
  docker-compose kill
  docker-compose rm -f
  ```

]

Note: Compose 1.5 doesn't support changes to an
existing app (except basic scaling).

When trying to do `docker-compose --x-... up` on existing
apps, you might get errors like this one:
<br/>.small[`ERROR: unable to find a node that satisfies container==38aac...`]

If that happens, just kill+rm the app and try again.

---

# Highly available Swarm managers

- Until now, the Swarm manager was a SPOF
  <br/>(Single Point Of Failure)

- Swarm has experimental support for replication

- When replication is enabled, you deploy multiple (identical) managers

  - one will be "primary"
  - the other(s) will be "secondary"
  - this is determined automatically
    <br/>(through *leader election*)

---

## Swarm leader election

- The leader election mechanism relies on a key/value store
  <br/>(consul, etcd, zookeeper)

- There is no requirement on the number of replicas
  <br/>(the quorum is achieved through the key/value store)

- When the leader (or "primary") is unavailable,
  <br/>a new election happens automatically

- You can issue API requests to any manager:
  <br/>if you talk to a secondary, it forwards to the primary

.icon[![Warning](warning.png)] There is currently a bug when
the Consul cluster itself has a leader election; see [docker/swarm#1782](
https://github.com/docker/swarm/issues/1782).

---

## Swarm replication in practice

- We need to give two extra flags to the Swarm manager:

  - `--replication`

    *enables replication (duh!)*

  - `--advertise ip.ad.dr.ess:port`

    *address and port where this Swarm manager is reachable*

- Do you deploy with Docker Machine?
  <br/>Then you can use `--swarm-opt`
  to automatically pass flags to the Swarm manager

---

## Cleaning up our current Swarm containers

- We will use Docker Machine to re-provision Swarm

- We need to:

  - remove the nodes from the Machine registry

  - remove the Swarm containers

.exercise[

- Remove the current configuration:
  ```
  for N in 1 2 3 4 5; do
    ssh node$N docker rm -f swarm-agent swarm-agent-master
    docker-machine rm -f node$N
  done
  ```

]

---

## Re-deploy with the new configuration

- This time, we can deploy each node identically
  <br/>(instead of 1 manager + 4 non-managers)

.exercise[

- Deploy all five nodes with the previous options,
  and the new replication options:

  .small[
  ```
  grep 'node[12345]' /etc/hosts | grep -v '^127' |
  while read IPADDR NODENAME; do
    docker-machine create --driver generic \
      --engine-opt cluster-store=consul://localhost:8500 \
      --engine-opt cluster-advertise=eth0:2376 \
      --swarm --swarm-master \
      --swarm-discovery consul://localhost:8500  \
      --swarm-opt replication --swarm-opt advertise=$IPADDR:3376 \
      --generic-ssh-user docker --generic-ip-address $IPADDR $NODENAME
  done
  ```
  ]

]

.small[
Note: Consul is still running thanks to the `--restart=always` policy.
Other containers are now stopped, because the engines have been
reconfigured and restarted.
]

---

## Assess our new cluster health

- The output of `docker info` will tell us the status
  of the node that we are talking to (primary or replica)

- If we talk to a replica, it will tell us who is the primary

.exercise[

- Talk to a random node, and ask its view of the cluster:
  ```
  eval $(docker-machine env node3 --swarm)
  docker info | grep -e ^Name -e ^Role -e ^Primary
  ```

]

Note: `docker info` is one of the few commands that will
work even when there is no elected primary. This helps
with debugging.

---

## Test Swarm manager failover

- The previous command told us which node was the primary manager

  - if `Role` is `primary`,
    <br/>then the primary is indicated by `Name`

  - if `Role` is `replica`,
    <br/>then the primary is indicated by `Primary`

.exercise[

- Powercycle the primary manager:
  ```
  ssh XXX sudo reboot
  ```

]

Look at the output of `docker info` every few seconds.
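
The Role/Name/Primary rule above can be sketched as a small filter. `primary_of` is a hypothetical helper, and the sample `docker info` excerpts are made up; only the three field names come from the `grep` used earlier:

```shell
# primary_of reads `docker info` output on stdin and prints the primary:
# the Name field when Role is primary, the Primary field otherwise.
primary_of() {
  awk '
    /^Name:/    { sub(/^Name: */, "");    name = $0 }
    /^Role:/    { sub(/^Role: */, "");    role = $0 }
    /^Primary:/ { sub(/^Primary: */, ""); primary = $0 }
    END { print (role == "primary" ? name : primary) }
  '
}

# Hypothetical excerpts; real output has many more lines.
REPLICA_INFO='Role: replica
Primary: 10.0.0.12:3376
Name: node3'
PRIMARY_INFO='Role: primary
Name: node2'

FROM_REPLICA=$(printf '%s\n' "$REPLICA_INFO" | primary_of)
FROM_PRIMARY=$(printf '%s\n' "$PRIMARY_INFO" | primary_of)
echo "asked a replica: $FROM_REPLICA"
echo "asked the primary: $FROM_PRIMARY"
```

On the cluster, the input would be `docker info | primary_of` with the CLI pointed at Swarm.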

---

# Highly available containers

- Swarm has support for *rescheduling* on node failure

- It has to be explicitly enabled on a per-container basis

- When the primary manager detects that a node has gone down,
  <br/>the containers on that node are rescheduled elsewhere

- If the containers can't be rescheduled (constraints issue),
  <br/>they are lost (there is no reconciliation loop yet)

- As of Swarm 1.1.0, this is an *experimental* feature
  <br/>(To enable it, you must pass the
  `--experimental` flag when you start Swarm itself!)

---

## Working around flag order

- The flag must be *before* the Swarm command
  <br/>(i.e. `docker run swarm --experimental manage ...`)

- We cannot use Docker Machine to pass that flag ☹
  <br/>(Machine adds flags *after* the Swarm command)

- Instead, we will use the Swarm image `jpetazzo/swarm:experimental`:
  ```
  FROM swarm
  ENTRYPOINT ["/swarm", "--experimental"]
  ```

- We can tell Machine to use this with `--swarm-image`

---

## Reconfigure Swarm [one more time](https://www.youtube.com/watch?v=FGBhQbmPwH8)

.exercise[

- Redeploy Swarm with `--experimental`:

  .small[
  ```
  for N in 1 2 3 4 5; do
    ssh node$N docker rm -f swarm-agent swarm-agent-master
    docker-machine rm -f node$N
  done

  grep 'node[12345]' /etc/hosts | grep -v '^127' |
  while read IPADDR NODENAME; do
    docker-machine create --driver generic \
      --engine-opt cluster-store=consul://localhost:8500 \
      --engine-opt cluster-advertise=eth0:2376 \
      --swarm --swarm-master --swarm-image jpetazzo/swarm:experimental \
      --swarm-discovery consul://localhost:8500  \
      --swarm-opt replication --swarm-opt advertise=$IPADDR:3376 \
      --generic-ssh-user docker --generic-ip-address $IPADDR $NODENAME
  done
  ```
  ]

]

---

## Start a resilient container

- By default, containers will not be restarted when their node goes down

- You must pass an explicit *rescheduling policy* to make that happen

- For now, the only policy is "on-node-failure"

.exercise[

- Start a container with a rescheduling policy:

  .small[
  ```
  docker run -d --name highlander -e reschedule:on-node-failure redis
  ```
  ]

]

Check that the container is up and running.

---

## Simulate a node failure

- We will reboot the node running this container

- Swarm will reschedule it

.exercise[

- Check on which node the container is running:
  <br/>.small[`NODE=$(docker inspect --format '{{.Node.Name}}' highlander)`]

- Reboot that node:
  <br/>`ssh $NODE sudo reboot`

- Check that the container has been rescheduled:
  <br/>`docker ps`

]

---

## .icon[![Warning](warning.png)] Caveats

- There are some corner cases when the node is also
  the Swarm leader or the Consul leader; this is being improved
  right now!

- Swarm doesn't gracefully handle the fact that after the
  reboot, you have *two* containers named `highlander`,
  and attempts to manipulate the container by its name
  will not work. This will be improved too.

---

## A new hope

- Compose 1.5 + Engine 1.9 =
  <br/>first release with multi-host networking

- Compose 1.6 + Engine 1.10 =
  <br/>HUGE improvements

- I will deliver this workshop about twice a month

- Check out the GitHub repo for updated content!

---

class: title

# Thanks! <br/> Questions?

### [@jpetazzo](https://twitter.com/jpetazzo) <br/> [@docker](https://twitter.com/docker)

    </textarea>
    <script src="https://gnab.github.io/remark/downloads/remark-0.5.9.min.js" type="text/javascript">
    </script>
    <script type="text/javascript">
      var slideshow = remark.create();
    </script>
  </body>
</html>