container.training/www/htdocs/index.html

<!DOCTYPE html>
<html>
  <head>
    <base target="_blank">
    <title>Docker Orchestration Workshop</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <style type="text/css">
      @import url(https://fonts.googleapis.com/css?family=Yanone+Kaffeesatz);
      @import url(https://fonts.googleapis.com/css?family=Droid+Serif:400,700,400italic);
      @import url(https://fonts.googleapis.com/css?family=Ubuntu+Mono:400,700,400italic);

      body { font-family: 'Droid Serif'; font-size: 150%; }

      h1, h2, h3 {
        font-family: 'Yanone Kaffeesatz';
        font-weight: normal;
      }
      a {
        text-decoration: none;
        color: blue;
      }
      .remark-code, .remark-inline-code { font-family: 'Ubuntu Mono'; }
      .red { color: #fa0000; }
      .gray { color: #ccc; }
      .small { font-size: 70%; }
      .big { font-size: 140%; }
      .underline { text-decoration: underline; }
      .footnote {
        position: absolute;
        bottom: 3em;
      }
      .pic {
        vertical-align: middle;
        text-align: center;
        padding: 0 0 0 0 !important;
      }
      img {
        max-width: 100%;
        max-height: 450px;
      }
      .title {
        vertical-align: middle;
        text-align: center;
      }
      .title {
        font-size: 2em;
      }
      .title .remark-slide-number {
        font-size: 0.5em;
      }
      .quote {
        background: #eee;
        border-left: 10px solid #ccc;
        margin: 1.5em 10px;
        padding: 0.5em 10px;
        quotes: "\201C""\201D""\2018""\2019";
        font-style: italic;
      }
      .quote:before {
        color: #ccc;
        content: open-quote;
        font-size: 4em;
        line-height: 0.1em;
        margin-right: 0.25em;
        vertical-align: -0.4em;
      }
      .quote p {
        display: inline;
      }
      .icon img {
        height: 1em;
      }
      .exercise {
        background-color: #eee;
        background-image: url("keyboard.png");
        background-size: 1.4em;
        background-repeat: no-repeat;
        background-position: 0.2em 0.2em;
        border: 2px dotted black;
      }
      .exercise::before {
        content: "Exercise:";
        margin-left: 1.8em;
      }
      li p { line-height: 1.25em; }
    </style>
  </head>
  <body>
    <textarea id="source">

class: title

# Docker <br/> Orchestration <br/> Workshop

---

<!-- grep '^# ' index.html | grep -v '<br' | tr '#' '-'^C -->

## Outline (1/2)

- Pre-requirements
- VM environment
- Our sample application
- Running services independently
- Running the whole app on a single node
- Identifying bottlenecks
- Measuring latency under load
- Scaling HTTP on a single node
- Put a load balancer on it
- Connecting to containers on other hosts
- Abstracting remote services with ambassadors
- Various considerations about ambassadors

---

## Outline (2/2)

- Docker for ops
- Backups
- Logs
- Security upgrades
- Network traffic analysis
- Dynamic orchestration
- Hands-on Swarm
- Deploying Swarm
- Cluster discovery
- Building our app on Swarm
- Network plumbing on Swarm
- Going further

---

# Pre-requirements

- Computer with network connection and SSH client
  <br/>(on Windows, get [putty](http://www.putty.org/)
  or [Git BASH](https://msysgit.github.io/))
- GitHub account (recommended; not mandatory)
- Docker Hub account (only for Swarm hands-on section)
- Basic Docker knowledge

.exercise[

- This is the stuff you're supposed to do!
- Create [GitHub](https://github.com/) and
  [Docker Hub](https://hub.docker.com) accounts now if needed
- Go to [view.dckr.info](http://view.dckr.info) to view these slides

]

---

# VM environment

- Each person gets 5 VMs
- They are *your* VMs
- They'll be up until tomorrow
- You have a little card with login+password+IP addresses
- You can automatically SSH from one VM to another

.exercise[

- Log into the first VM (`node1`)
- Check that you can SSH (without password) to `node2`
- Check the version of docker with `docker version`

]

.footnote[Note: from now on, unless instructed, **all commands must
be run from the first VM, `node1`**.]

---

## Brand new versions!

- Engine 1.8.2

- Compose 1.4.2

- Swarm 0.4

- Machine 0.4.1

---

# Our sample application

- Let's look at the general layout of the
  [source code](https://github.com/jpetazzo/orchestration-workshop)

- Each directory = 1 microservice
  - `rng` = web service generating random bytes
  - `hasher` = web service computing hash of POSTed data
  - `worker` = background process using `rng` and `hasher`
  - `webui` = web interface to watch progress

.exercise[

- Clone the repository on `node1`:
  <br/>.small[`git clone git://github.com/jpetazzo/orchestration-workshop`]

]

(Bonus points for forking on GitHub and cloning your fork!)

---

## What's this application?

- It is a DockerCoin miner! 💰🐳📦🚢

- No, you can't buy coffee with DockerCoins

- How DockerCoins works:

  - `worker` asks to `rng` to give it random bytes
  - `worker` feeds those random bytes into `hasher`
  - each hash starting with `0` is a DockerCoin
  - DockerCoins are stored in `redis`
  - `redis` is also updated every second to track speed
  - you can see the progress with the `webui`

Next: we will inspect components independently.

---

# Running services independently

First, we will run the random number generator (`rng`).

.exercise[

- Go to the `dockercoins` directory (in the cloned repo)

- Run `docker-compose up rng`
  <br/>(Docker will pull `python` and build the microservice)

]

.icon[![Warning](warning.png)] Pay attention to the port mapping!

- The container log says:
  <br/>`Running on http://0.0.0.0:80/`

- But if you try `curl localhost:80`, you will get:
  <br/>`Connection refused`

---

## Understanding port mapping

- `node1`, the Docker host, has only one port 80

- If we give the one and only port 80 to the first
  container who asks for it, we are in trouble when
  another container needs it

- Default behavior: containers are not "exposed"
  <br/>(only reachable through their private address)

- Container network services can be exposed:

  - statically (you decide which host port to use)

  - dynamically (Docker allocates a host port)

---

## Declaring port mapping

- Directly with the Docker Engine:
  <br/>`docker run -P redis`
  <br/>`docker run -p 6379 redis`
  <br/>`docker run -p 1234:6379 redis`

- With Docker Compose, in the `docker-compose.yml` file:

```
rng:
  …
  ports:
    - "8001:80"
```

→ port 8001 *on the host* maps to
port 80 *in the container*

---

## Using the `rng` service

Let's get random bytes of data!

.exercise[

- Open a second terminal and connect to the same VM

- Check that the service is alive:
  <br/>`curl localhost:8001`

- Get 10 bytes of random data:
  <br/>`curl localhost:8001/10`

- If the binary data output messed up your terminal, fix it:
  <br/>`reset`

]

---

## Running the hasher

.exercise[

- Run `docker-compose up hasher`
  <br/>(it will pull `ruby` and do the build)

]

.icon[![Warning](warning.png)] Again, pay attention to the port mapping!

The container log says that it's listening on port 80,
but it's mapped to port 8002 on the host.

You can see the mapping in `docker-compose.yml`.

---

## Testing the hasher

.exercise[

- Open a third terminal window, and SSH to `node1`

- Run `curl localhost:8002`
  <br/>(it will say it's alive)

- Posting binary data requires some extra flags:

  ```
  curl \
    -H "Content-type: application/octet-stream" \
    --data-binary hello \
    localhost:8002
  ```

- Check that it computed the right hash:
  <br/>`echo -n hello | sha256sum`

]

---

## Stopping services

We have multiple options:

- Interrupt `docker-compose up` with `^C`

- Stop individual services with `docker-compose stop rng`

- Stop all services with `docker-compose stop`

- Kill all services with `docker-compose kill`
  <br/>(rude, but faster!)

.exercise[

- Use any of those methods to stop `rng` and `hasher`

]

---

# Running the whole app on a single node

.exercise[

- Run `docker-compose up` to start all components

]

- Aggregate output is shown

- Output is verbose
  <br/>(because the worker is constantly hitting other services)

- Now let's use the little web UI to see realtime progress


.exercise[

- Open http://[yourVMaddr]:8000/ (from a browser)

]

---

## Running in the background

- The logs are very verbose (and won't get better)

- Let's put them in the background for now!

.exercise[

- Stop the app (with `^C`)

- Start it again with `docker-compose up -d`

- Check on the web UI that the app is still making progress

]

---

## Looking at resource usage

- Let's look at CPU, memory, and I/O usage

.exercise[

- run `top` to see CPU and memory usage
  <br/>(you should see idle cycles)

- run `vmstat 3` to see I/O usage (si/so/bi/bo)
  <br/>(the 4 numbers should be almost zero,
  <br/>except `bo` for logging)

]

We have available resources.

- Why?
- How can we use them?

---

## Scaling workers on a single node

- Docker Compose supports scaling.red[*]
- Let's scale `worker` and see what happens!

.exercise[

- In one SSH session, run `docker-compose logs worker`

- In another, run `docker-compose scale worker=10`

- See the impact on CPU load (with top/htop),
  <br/>and on compute speed (with web UI)

]

You should see the compute speed increase ~3x (not 10x).

.footnote[.red[*]With some limitations, as we'll see later.]

---

# Identifying bottlenecks

- Adding workers didn't result in linear improvement

- *Something else* is slowing us down

--

- ... But what?

--

- The code doesn't have instrumentation

- We will use `ab` for individual load testing

- We will use `httping` to view latency under load

---

## Benchmarking our microservices

We will test microservices in isolation.

.exercise[

- Stop the application:
  `docker-compose kill`

- Remove old containers:
  `docker-compose rm`

- Start `hasher` and `rng`:
  `docker-compose up hasher rng`

]

Now let's hammer them with requests!

---

## Testing `rng`

Let's assess the raw performance of our RNG.

.exercise[

- Test the performance on one big request:
  <br/>`curl -o/dev/null localhost:8001/10000000`
  <br/>(should take ~1s, and show speed of ~10 MB/s)

]

If we were doing requests of 1000 bytes ...

... Could we get 10k req/s?

Let's test and see what happens!

---

## Concurrent requests

.exercise[

- Test 100 requests of 1000 bytes each:
  <br/>`ab -n 100 localhost:8001/1000`

- Test 100 requests, 10 requests in parallel:
  <br/>`ab -n 100 -c 10 localhost:8001/1000`
  <br/>(look how the latency has increased!)

- Try with 100 requests in parallel:
  <br/>`ab -n 100 -c 100 localhost:8001/1000`

]

--

Whatever we do, we get ~10 requests/second.

Increasing concurrency doesn't help:
it just increases latency.

---

## Discussion

- When serving requests sequentially, they each take 100ms

- When 10 requests arrive at the same time:

  - one request is served in 100ms
  - another is served in 200ms
  - another is served in 300ms
  - ...
  - another is served in 1000ms

- All requests are queued and served by a single thread

- It looks like `rng` doesn't handle concurrent requests

- What about `hasher`?

---

## Save some random data and stop the generator

Before testing the hasher, let's save some random
data that we will feed to the hasher later.

.exercise[

- Run `curl localhost:8001/1000000 > /tmp/random`

]

Now we can stop the generator.

.exercise[

- In the shell where you did `docker-compose up rng`,
  <br/>stop it by hitting `^C`

]

---

## Benchmarking the hasher

We will hash the data that we just got from `rng`.

.exercise[

- Posting binary data requires some extra flags:

  ```
  curl \
    -H "Content-type: application/octet-stream" \
    --data-binary @/tmp/random \
    localhost:8002
  ```

- Compute the hash locally to verify that it works fine:
  <br/>`sha256sum /tmp/random`
  <br/>(it should display the same hash)

]

---

## The hasher under load

The invocation of `ab` will be slightly more complex as well.

.exercise[

- Execute 100 requests in a row:

  ```
  ab -n 100 -T application/octet-stream \
     -p /tmp/random localhost:8002/
  ```

- Execute 100 requests with 10 requests in parallel:

  ```
  ab -c 10 -n 100 -T application/octet-stream \
     -p /tmp/random localhost:8002/
  ```

]

Take note of the performance numbers (requests/s).

---

## Benchmarking the hasher on smaller data

Here we hashed 1,000,000 bytes.

Later we will hash much smaller payloads.

Let's repeat the tests with smaller data.

.exercise[

- Run `truncate --size=10 /tmp/random`
- Repeat the `ab` tests

]

---

# Measuring latency under load

We will use `httping`.

.exercise[

- You need three SSH connections

- In the first one, let run `httping localhost:8001`

- In the second one, let run `httping localhost:8002`

- In the third one, run `docker-compose up -d`

]

Check the latency numbers.

- `hasher` should be very low (~1ms)
- `rng` should be low, with occasional spikes (10-100ms)

---

## Latency when scaling the worker

We will add workers and see what happens.

.exercise[

- Run `docker-compose scale worker=2`

- Check latency

- Increase number of workers and repeat

]

What happens?

- `hasher` remains low
- `rng` spikes up until it is reaches ~N*100ms
  <br/>(when you have N+1 workers)

---

class: title

Why?

---

## Why does everything take (at least) 100ms?

--

`rng` code:

![RNG code screenshot](delay-rng.png)

--

`hasher` code:

![HASHER code screenshot](delay-hasher.png)

---

class: title

But ...

WHY?!?

---

## Why did we sprinkle this sample app with sleeps?

- Deterministic performance
  <br/>(regardless of instance speed, CPUs, I/O...)

--

- Actual code sleeps all the time anyway

--

- When your code makes a remote API call:

  - it sends a request;

  - it sleeps until it gets the response;

  - it processes the response.

---

## Why do `rng` and `hasher` behave differently?

![Equations on a blackboard](equations.png)

--

(Synchronous vs. asynchronous event processing)

---

## How to make `rng` go faster

- Obvious solution: comment out the `sleep` instruction

--

- Real-world solution: use an asynchronous framework
  <br/>(e.g. use gunicorn with gevent)

--

- New rule: we can't change the code!

--

- Solution: scale out `rng`
  <br/>(dispatch `rng` requests on multiple instances)

---

# Scaling HTTP on a single node

- We could try to scale with Compose:

  ```
  docker-compose scale rng=3
  ```

- Compose doesn't deal with load balancing

- We would get 3 instances ...

- ... But only the first one would serve traffic

---

## The plan

- Stop the `rng` service first

- Create multiple identical `rng` containers

- Put a load balancer in front of them

- Point other services to the load balancer

---

## Stopping `rng`

- That's the easy part!

.exercise[

- Use `docker-compose` to stop `rng`:

  ```
  docker-compose stop rng
  ```

]

Note: we do this first because we are about to remove
`rng` from the Docker Compose file.

If we don't stop
`rng` now, it will remain up and running, with Compose
being unaware of its existence!

---

## Scaling `rng`

.exercise[

- Replace the `rng` service with multiple copies of it:

  ```
  rng1:
    build: rng

  rng2:
    build: rng

  rng3:
    build: rng
  ```

]

That's all!

Shortcut: `docker-compose.yml-scaled-rng`

---

## Introduction to `jpetazzo/hamba`

- Public image on the Docker Hub

- Load balancer based on HAProxy

- Expects the following arguments:
  <br/>`FE-port BE1-addr BE1-port BE2-addr BE2-port ...`
  <br/>*or*
  <br/>`FE-addr:FE-port BE1-addr BE1-port BE2-addr BE2-port ...`

  - FE=frontend (the thing other services connect to)

  - BE=backend (the multiple copies of your scaled service)

.small[
Example: listen to port 80 and balance traffic on www1:1234 + www2:2345

```
docker run -d -p 80 jpetazzo/hamba 80 www1 1234 www2 2345
```
]

---

# Put a load balancer on it

Let's add our load balancer to the Compose file.

.exercise[

- Add the following section to the Compose file:

  ```
  rng0:
      image: jpetazzo/hamba
      links:
        - rng1
        - rng2
        - rng3
      command: 80 rng1 80 rng2 80 rng3 80
      ports:
        - "8001:80"
  ```

]

Shortcut: `docker-compose.yml-scaled-rng`

---

## Point other services to the load balancer

- The only affected service is `worker`

- We have to replace the `rng` link with a link to `rng0`,
  but it should still be named `rng` (so we don't change the code)

.exercise[

- Update the `worker` section as follows:

  ```
  worker:
    build: worker
    links:
      - rng0:rng
      - hasher
      - redis
   ```

]

Shortcut: `docker-compose.yml-scaled-rng`

---

## Start the whole stack

.exercise[

- Run `docker-compose up -d`

- Check worker logs with `docker-compose logs worker`

- Check load balancer logs with `docker-compose logs rng0`

]

If you get errors about port 8001, make sure that
`rng` was stopped correctly and try again.

---

## The good, the bad, the ugly

- The good

  We scaled a service, added a load balancer -
  <br/>without changing a single line of code.

- The bad

  We manually copy-pasted sections in `docker-compose.yml`.

  Improvement: write scripts to transform the YAML file.

- The ugly

  If we scale up/down, we have to restart everything.

  Improvement: reconfigure the load balancer dynamically.

---

# Connecting to containers on other hosts

- So far, our whole stack is on a single machine

- We want to scale out (across multiple nodes)

- We will deploy the same stack multiple times

- But we want every stack to use the same Redis
  <br/>(in other words: Redis is our only *stateful* service here)

--

- And remember: we're not allowed to change the code!

  - the code connects to host `redis`
  - `redis` must resolve to the address of our Redis service
  - the Redis service must listen on the default port (6379)

---

## The plan

- Deploy our Redis service separately

  - use the same `redis` image

  - make sure that Redis server port (6379) is publicly accessible,
    using port 6379 on the Docker host

- Update our Docker Compose YAML file

  - remove the `redis` section

  - in the `links` section, remove `redis`

  - instead, put a `redis` entry in `extra_hosts`

---

## Making Redis available on its default port

There are two strategies.

- `docker run -p 6379:6379 redis`

  - the container has its own, isolated network stack
  - Docker creates a port mapping rule through iptables
  - slight performance overhead
  - port number is explicit (visible through Docker API)

- `docker run --net host redis`

  - the container uses the network stack of the host
  - when it binds to 6379/tcp, that's 6379/tcp on the host
  - allows raw speed (no overhead due to iptables/bridge)
  - port number is not visible through Docker API

Choose wisely!

---

## Deploy Redis

.exercise[

- Start a new redis container, mapping port 6379 to 6379:

  ```
  docker run -d -p 6379:6379 redis
  ```

- Check that it's running with `docker ps`

- Note the IP address of this Docker host

- Try to connect to it (from anywhere):

  ```
  telnet ip.ad.dr.ess 6379
  ```

]

To exit a telnet session: `Ctrl-] c ENTER`

---

## Update `docker-compose.yml` (1/3)

.exercise[

- Comment out `redis`:

  ```
  #redis:
  #    image: redis
  ```

]

---

## Update `docker-compose.yml` (2/3)

.exercise[

- Update `worker`:

  ```
  worker:
    build: worker
    extra_hosts:
      redis: A.B.C.D
    links:
      - rng0:rng
      - hasher
  ```

]

Replace `A.B.C.D` with the IP address noted earlier.

Shortcut: `docker-compose.yml-extra-hosts`
<br/>(But you still have to replace `A.B.C.D`!)

---

## Update `docker-compose.yml` (3/3)

.exercise[

- Update `webui`:

  ```
  webui:
    build: webui
    extra_hosts:
      redis: A.B.C.D
    ports:
      - "8000:80"
    #volumes:
    #  - "./webui/files/:/files/"
  ```

]

(Replace `A.B.C.D` with the IP address noted earlier)

.icon[![Warning](warning.png)] Don't forget to comment out the `volumes` section!

---

## Why did we comment out the `volumes` section?

- Volumes have multiple uses:

  - storing persistent stuff (database files...)

  - sharing files between containers (logs, configuration...)

  - sharing files between host and containers (source...)

- The `volumes` directive expands to an host path
  <br/>.small[(e.g. `/home/docker/orchestration-workshop/dockercoins/webui/files`)]

- This host path exists on the local machine
  <br/>(not on the others)

- This specific volume is used in development
  <br/>(not in production)

---

## Start the stack on the first machine

- Nothing special to do here

- Just bring up the application like we did before

.exercise[

- `docker-compose up -d`

]

- Check in the web browser that it's running correctly

---

## Start the stack on another machine

- We will set the `DOCKER_HOST` variable

- `docker-compose` will detect and use it

- Our Docker hosts are listening on port 55555

.exercise[

- Set the environment variable:
  <br/>`export DOCKER_HOST=tcp://node2:55555`

- Start the stack:
  <br/>`docker-compose up -d`

- Check that it's running:
  <br/>`docker-compose ps`

]

---

## Scale!

.exercise[

- Open the Web UI
  <br/>(on a node where it's deployed)

- Deploy one instance of the stack on each node

]

---

## Cleanup

- Let's remove what we did

.exercise[

- You can use the following scriptlet:

  ```
  for N in $(seq 1 5); do
    export DOCKER_HOST=tcp://node$N:55555
    docker ps -qa | xargs docker rm -f
  done
  unset DOCKER_HOST
  ```

]

---

# Abstracting remote services with ambassadors

- What if we can't/won't run Redis on its default port?

- What if we want to be able to move it more easily?

--

- We will use an ambassador

- Redis will run at an arbitrary location (host+port)

- The ambassador will be part of the scaled stack

- The ambassador will connect to Redis

- The ambassador will "act as" Redis in the stack

---

## Start redis

- This time, we will let Docker pick the port for Redis

.exercise[

- Run redis with a random public port:
  <br/>`docker run -d -P --name myredis redis`

- Check which port was allocated:
  <br/>`docker port myredis 6379`

]

- Note this IP address and port

---

## Update `docker-compose.yml`

.exercise[

- Restore `links` as they were before in `webui` and `worker`

- Replace `redis` with an ambassador using `jpetazzo/hamba`:

  ```
  redis:
    image: jpetazzo/hamba
    command: 6379 AA.BB.CC.DD EEEEE
  ```

]

Shortcut: `docker-compose.yml-ambassador`
<br/>(But you still have to update `AA.BB.CC.DD EEEE`!)

---

## Start the new stack

.exercise[

- Run `docker-compose up -d`

- Go to the web UI

- Start the stack on another node as previously,
  <br/>and confirm on the web UI that it's picking up

]

---

# Various considerations about ambassadors

- "But, ambassadors are adding an extra hop!"

--

- Yes, but if you need load balancing, you need that hop

- Ambassadors actually *save* one hop
  <br/>(they act as local load balancers)

  - traditional load balancer:
    <br/>client ⇒ external LB ⇒ server (2 physical hops)

  - ambassadors:
    <br/>client → ambassador ⇒ server (1 physical hop)

--

- Ambassadors are more reliable than traditional LBs
  <br/>(they are colocated with their clients)

---

## Inconvenients of ambassadors

- Generic issues
  <br/>(shared with any kind of load balancing / HA setup)

  - extra logical hop (not transparent to the client)

  - must assess backend health

  - one more thing to worry about (!)

- Specific issues

  - load balancing fairness

High-end load balancing solutions will rely on back pressure
from the backends. This addresses the fairness issue.

---

## There are many ways to deploy ambassadors

"Ambassador" is a design pattern.

There are many ways to implement it.

We will present three increasingly complex (but also powerful)
ways to deploy ambassadors.

---

## Single-tier ambassador deployment

- One-shot configuration process

- Must be executed manually after each scaling operation

- Scans current state, updates load balancer configuration

- Pros:
  <br/>- simple, robust, no extra moving part
  <br/>- easy to customize (thanks to simple design)
  <br/>- can deal efficiently with large changes

- Cons:
  <br/>- must be executed after each scaling operation
  <br/>- harder to compose different strategies

- Example: this workshop

---

## Two-tier ambassador deployment

- Daemon listens to Docker events API

- Reacts to container start/stop events

- Adds/removes back-ends to load balancers configuration

- Pros:
  <br/>- no extra step required when scaling up/down

- Cons:
  <br/>- extra process to run and maintain
  <br/>- deals with one event at a time (ordering matters)

- Hidden gotcha: load balancer creation

- Example: interlock

---

## Three-tier ambassador deployment


- Daemon listens to Docker events API

- Reacts to container start/stop events

- Adds/removes scaled services in distributed config DB
  <br/>(zookeeper, etcd, consul…)

- Another daemon listens to config DB events

- Adds/removes backends to load balancers configuration

- Pros:
  <br/>- more flexibility

- Cons:
  <br/>- three extra services to run and maintain

- Example: registrator

---

## Other multi-host communication mechanisms

- Overlay networks

  - weave, flannel, pipework ...

- Network plugins

  - check out Docker Experimental Engine
    <br/>(should land in stable releases soon)

- Allow a flat network for your containers

- Often requires an extra service to deal with BUM packets
  <br/>(broadcast/unknown/multicast)

- Load balancers and/or failover mechanisms still needed

---

class: title

# Interlude <br/>

# Docker for ops

---

# Backups

- Redis is still running (with name `myredis`)

- We want to enable backups without touching it

- We will use a special backup container:

  - sharing the same volumes

  - linked to it (to connect to it easily)

  - possibly containing our backup tools

- This works because the `redis` container image
  <br/>stores its data on a volume

---

## Starting the backup container

.exercise[

- Start the container:

  ```
  docker run --link myredis:redis \
             --volumes-from myredis \
             -v /tmp/myredis:/output \
             -ti ubuntu
  ```

- Look in `/data` in the container
  <br/>(That's where Redis puts its data dumps)
]

- We need to tell Redis to perform a data dump *now*

---

## Connecting to Redis

.exercise[

- `apt-get install telnet`

- `telnet redis 6379`

- issue `SAVE` then `QUIT`

- Look at `/data` again

]

- There should be a recent dump file now

---

## Getting the dump out of the container

- We could use many things:

  - s3cmd to copy to S3
  - SSH to copy to a remote host
  - gzip/bzip/etc before copying

- We'll just copy it to the Docker host

.exercise[

- Copy the file from `/data` to `/output`

- Exit the container

- Look into `/tmp/myredis` (on the host)

]

---

# Logs

- Sorry, this part won't be hands-on

- Two strategies:

  - log to plain files on volumes

  - log to stdout
    <br/>(and use a logging driver)

---

## Logging to plain files on volumes

- Start a container with `-v /logs`

- Make sure that all log files are in `/logs`

- To check logs, run e.g.

  ```
  docker run --volumes-from ... ubuntu sh -c \
         "grep WARN /logs/*.log"
  ```

- Or just go interactive:

  ```
  docker run --volumes-from ... -ti ubuntu
  ```

- You can (should) start a log shipper that way

---

## Logging to stdout

- All containers should write to stdout/stderr

- Docker will collect logs and pass them to a logging driver

- Available drivers:
  <br/>json-file (default), syslog, journald, gelf, fluentd

- Change driver by passing `--log-driver` option to daemon
  <br>(On Ubuntu, tweak `DOCKER_OPTS` in `/etc/default/docker`)

- For now, only json-files supports logs retrieval
  <br/>(i.e. `docker logs`)

- Warning: json-file doesn't rotate logs by default
  <br/>(but this can be changed with `--log-opt`)

See: https://docs.docker.com/reference/logging/overview/

---

# Security upgrades

- This section is not hands-on

- Public Service Announcement

- We'll discuss:

  - how to upgrade the Docker daemon

  - how to upgrade container images

---

## Upgrading the Docker daemon

- Stop all containers cleanly
  <br/>(`docker ps -q | xargs docker stop`)

- Stop the Docker daemon

- Upgrade the Docker daemon

- Start the Docker daemon

- Start all containers

- This is like upgrading your Linux kernel,
  <br/>but it will get better

---

## Upgrading container images

- When a vulnerability is announced:

  - if it affects your base images,
    <br/>make sure they are fixed first

  - if it affects downloaded packages,
    <br/>make sure they are fixed first

  - re-pull base images

  - rebuild

  - restart containers

(The procedure is simple and plain, just follow it!)

---

# Network traffic analysis

- We still have `myredis` running

- We will use *shared network namespaces*
  <br/>to perform network analysis

- Two containers sharing the same network namespace...

  - have the same IP addresses

  - have the same network interfaces

- `eth0` is therefore the same in both containers

---

## Install and start `ngrep`

Ngrep uses libpcap (like tcpdump) to sniff network traffic.

.exercise[

- Start a container with the same network namespace:
  <br/>`docker run --net container:myredis -ti alpine`

- Install ngrep:
  <br/>`apk update && apk add ngrep`

- Run ngrep:
  <br/>`ngrep -tpd eth0 -Wbyline . tcp`

]

You should see a stream of Redis requests and responses.

---

class: title

# Dynamic orchestration

---

## Static vs Dynamic

- Static

  - you decide what goes where

  - simple to describe and implement

  - seems easy at first but doesn't scale efficiently

- Dynamic

  - the system decides what goes where

  - requires extra components (HA KV...)

  - scaling can be finer-grained, more efficient

---

## Mesos (overview)

- First presented in 2009

- Initial goal: resource scheduler
  <br/>(two-level/pessimistic)

  - top-level "master" knows the global cluster state

  - "slave" nodes report status and resources to master

  - master allocates resources to "frameworks"

- Container support added recently
  <br/>(had to fit existing model)

- Network and service discovery is complex

---

## Mesos (in practice)

- Easy to setup a test cluster (in containers!)

- Great to accommodate mixed workloads
  <br/>(see Marathon, Chronos, Aurora, and many more)

- "Meh" if you only want to run Docker containers

- In production on clusters of thousands of nodes

- Open source project; commercial support available

---

## Kubernetes (overview)

- 1 year old

- Designed specifically as a platform for containers
  <br/>("greenfield" design)

  - "pods" = groups of containers sharing network/storage

  - Scaling and HA managed by "replication controllers"

  - extensive use of "tags" instead of e.g. tree hierarchy

- Initially designed around Docker,
  <br/>but doesn't hesitate to diverge in a few places

---

## Kubernetes (in practice)

- Network and service discovery is powerful, but complex
  <br/>.small[(different mechanisms within pod, between pods, for inbound traffic...)]

- Initially designed around GCE
  <br/>.small[(currently relies on "native" features for fast networking and persistence)]

- Adaptation is needed when it differs from Docker
  <br/>.small[(need to learn new API, new tooling, new concepts)]

- Great deployment platform ...
  <br/>but no developer experience yet

---

## Swarm (in theory)

- Consolidates multiple Docker hosts into a single one

- "Looks like" a Docker daemon, but it dispatches (schedules)
  your containers on multiple daemons

- Talks the Docker API front and back
  <br/>(leverages the Docker API and ecosystem)

- Open source and written in Go (like Docker)

- Started by two of the original Docker authors
  <br/>([@aluzzardi](https://twitter.com/aluzzardi) and [@vieux](https://twitter.com/vieux))

---

## Swarm (in practice)

- Not stable yet (version 0.4 right now)

- OK for some scenarios (Jenkins, grid...)

- Not OK (yet.red[*]) for Compose build, links...

- We'll see it (briefly) in action

.footnote[.red[*]By "not OK" we mean "requires extra elbow grease."]

---

## PAAS on Docker

- The PAAS workflow: *just push code*
  <br/>(inspired by Heroku, dotCloud...)

- TL,DR: easier for devs, harder for ops,
  <br/>some very opinionated choices

- A few examples:
  <br/>(Non-exhaustive list!!!)

  - Cloud Foundry
  - Deis
  - Dokku
  - Flynn
  - Tsuru

---

## A few other tools

- Flocker

  - manage/migrate stateful containers

- Powerstrip

  - sits in front of the Docker API

  - great to implement your own experiments

- Weave

  - overlay network so that containers can ping each other

... And many more!

---

class: pic

![Here Be Dragons](dragons.jpg)

---

## Warning: here be dragons

- So far, we've used stable products (versions 1.X)

- We're going to expore experimental software

- **Use at your own risk**

---

# Hands-on Swarm

![Swarm Logo](swarm.png)

---

## Setting up our Swarm cluster

- This can be done manually or with **Docker Machine**

- Manual deployment:

  - with TLS: certificate generation is painful
    <br/>(needs dual-use certs)

  - without TLS: easier, but insecure
    <br/>(unless you run on your internal/private network)

- Docker Machine deployment:

  - generates keys, certificates, and deploys them for you

  - can also create VMs

---

## The Way Of The Machine

- Install `docker-machine` (single binary download)

- Set a few environment variables (cloud credentials)

- Create one or more machines:
  <br/>`docker-machine create -d digitalocean node42`

- List machines and their status:
  <br/>`docker-machine ls`

- Select a machine for use:
  <br/>`eval $(docker-machine env node42)`
  <br/>(this will set a few environment variables)

- Execute regular commands with Docker, Compose, etc.
  <br/>(they will pick up remote host address from environment)

---

## Docker Machine `generic` driver

- Most drivers work the same way:

  - use cloud API to create instance

  - connect to instance over SSH

  - install Docker

- The `generic` driver skips the first step

- It can install Docker on any machine,
  <br/>as long as you have SSH access

- We will use that!

---

# Deploying Swarm

- Components involved:

  - service discovery mechanism
    <br/>(we'll use Docker's hosted system)

  - swarm manager
    <br/>(runs on `node1`, exposes Docker API)

  - swarm agent
    <br/>(runs on each node, registers it with service discovery)

---

# Cluster discovery

- Possible backends:

  - dynamic, self-hosted (zk, etcd, consul)

  - static (command-line or file)

  - hosted by Docker (token)

- We will use the token mechanism

.exercise[

- Run `TOKEN=$(docker run swarm create)`
- Save `$TOKEN` carefully: it's your token
  <br/>(it's the unique identifier for your cluster)

]

---

## Swarm agent

- Used only for dynamic discovery (zk, etcd, consul, token)

- Must run on each node

- Every 20s (by default), tells to the discovery system:
  </br>"Hello, there is a Swarm node at A.B.C.D:EFGH"

- Must know the node's IP address
  <br/>(sorry, it can't figure it out by itself, because
  <br/>it doesn't know whether to use public or private addresses)

- The node continues to work even if the agent dies

- Automatically started by Docker Machine
  <br/>(when the `--swarm` option is passed)

---

## Swarm manager

- Today: must run on the leader node

- Later: can run on multiple nodes, with leader election

- Automatically started by Docker Machine
  <br/>(when the `--swarm-master` option is passed)

.exercise[

- Connect to `node1`

- "Create" a node with Docker Machine

  .small[
  ```
  docker-machine create node1 --driver generic \
    --swarm --swarm-master --swarm-discovery token://$TOKEN \
    --generic-ssh-user docker --generic-ip-address 1.2.3.4
  ```
  ]

]

(Don't forget to replace 1.2.3.4 with the node IP address!)

---

## Check our node

Let's connect to the node *individually*.

.exercise[

- Select the node with Machine

  ```
  eval $(docker-machine env node1)
  ```

- Execute some Docker commands

  ```
  docker version
  docker info
  docker ps
  ```

]

Two containers should show up: the agent and the manager.

---

## Check our (single-node) Swarm cluster

Let's connect to the manager instead.

.exercise[

- Select the Swarm manager with Machine

  ```
  eval $(docker-machine env node1 --swarm)
  ```

- Execute some Docker commands

  ```
  docker version
  docker info
  docker ps
  ```

]

The output is different! Let's review this.

---

## `docker version`

Swarm identifies itself clearly:

```
Client:
 Version:      1.8.2
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   0a8c2e3
 Built:        Thu Sep 10 19:19:00 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      swarm/0.4.0
 API version:  1.16
 Go version:   go1.4.2
 Git commit:   d647d82
 Built:
 OS/Arch:      linux/amd64
```

---

## `docker info`

Swarm gives cluster information, showing all nodes:

```
Containers: 3
Images: 6
Role: primary
Strategy: spread
Filters: affinity, health, constraint, port, dependency
Nodes: 1
 node: 52.89.117.68:2376
  └ Containers: 3
  └ Reserved CPUs: 0 / 2
  └ Reserved Memory: 0 B / 3.86 GiB
  └ Labels: executiondriver=native-0.2,
            kernelversion=3.13.0-53-generic,
            operatingsystem=Ubuntu 14.04.2 LTS,
            provider=generic, storagedriver=aufs
CPUs: 2
Total Memory: 3.86 GiB
Name: 2ec2e6c4054e
```

---

## `docker ps`

- This one should show nothing at this point.

- The Swarm containers are hidden.

- This avoids unneeded pollution.

- This also avoids killing them by mistake.

---

## Add other nodes to the cluster

- Let's use *almost* the same command line
  <br/>(but without `--swarm-master`)

.exercise[

- Stay on `node1` (it has keys and certificates now!)

- Add another node with Docker Machine

  .small[
  ```
  docker-machine create node2 --driver generic \
    --swarm --swarm-discovery token://$TOKEN \
    --generic-ssh-user docker --generic-ip-address 1.2.3.4
  ```
  ]
]

Remember to update the IP address correctly.

Repeat for all 4 nodes.

Pro tip: look for name/address mapping in `/etc/hosts`.

---

## Scripting

To help you a little bit:

```
grep node[2345] /etc/hosts | grep -v ^127 |
while read IPADDR NODENAME
do docker-machine create $NODENAME --driver generic \
   --swarm --swarm-discovery token://$TOKEN \
   --generic-ssh-user docker \
   --generic-ip-address $IPADDR \
</dev/null
done
```

Fun fact: Machine drains stdin.

That's why we use `</dev/null` here.

<!---
Let's fix Markdown coloring with this one weird trick!
-->

---

## Running containers on Swarm

Try to run a few `busybox` containers.

Then, let's get serious:

.exercise[

- Start a Redis service:
  <br/>`docker run -dP redis`

- See the service address:
  <br/>`docker port $(docker ps -lq) 6379`

]

This can be any of your five nodes.

---

# Building our app on Swarm

- Swarm has partial support for builds

- .icon[![Warning](warning.png)] Older versions of Compose would crash on builds

- Try it!

.exercise[

- Run `docker-compose build` once ...

- Run `docker-compose build` twice ...

- What happened?

]

---

## Re-thinking the build process

- Let's step back and think for a minute ...

- What should `docker build` do on Swarm?

  - build on one machine

  - build everywhere ($$$)

- After the build, what should `docker run` do?

  - run where we built (how do we know where it is?)

  - run on any machine that has the image

- What do, what do‽

---

## The plan

- Build locally

- Tag images

- Upload them to the hub

- Update the Compose file to use those images

*That's the purpose of the `build-tag-push.py` script!*

---

## Build, Tag, And Push

Let's inspect the source code of `build-tag-push.py` and run it.

.icon[![Warning](warning.png)] It is better to run it against a single node!

(There are some race conditions within Swarm when building+pushing too fast.)

.exercise[

- Point to a single node:
  <br/>`eval $(docker-machine env node1)`

- Run the script (from the `dockercoins` directory):
  <br/>`../build-tag-push.py`

- Inspect the `docker-compose.yml-XXX` file that it created

]

---

## Can we run this now?

Let's try!

.exercise[

- Switch back to the Swarm cluster:
  <br/>`eval $(docker-machine env node1 --swarm)`

- Bring up the application:
  <br/>`docker-compose -f docker-compose.yml-XXX up`

]

--

It won't work, because Compose and Swarm do not collaborate
to establish *placement constraints*.

--

(╯°□°)╯︵ ┻━┻

---

## Simple container dependencies

- Container A has a link to container B

- Compose starts B first, then A

- Swarm translates the link into a placement constraint:

  - *"put A on the same node as B"*

- Alles gut

---

## Complex container dependencies

- Container A has a link to containers B and C

- Compose starts B and C first
  <br/>(but that can be on different nodes!)

- Compose starts A

- Swarm translates the links into placements contraints

  - *"put A on the same node as B"*
  - *"put A on the same node as C"*

- If B and C are on different nodes, that's impossible

So, what do‽

---

## A word on placement constraints

- Swarm supports constraints

- We could tell swarm to put all our containers together

- Linking would work

- But all containers would end up on the same node

--

- So having a cluster would be pointless!

---

# Network plumbing on Swarm

- We will use one-tier, dynamic ambassadors
  <br/>(as seen before)

- Other available options:

  - injecting service addresses in environment variables

  - implementing service discovery in the application

  - use Docker Engine Experimental + network plugins
    <br/>(or any other overlay network like Weave or Pipework)

---

## Revisiting `jpetazzo/hamba`

- Configuration is stored in a *volume*

- A watcher process looks for configuration updates,
  <br/>and restarts HAProxy when needed

- It can be started without configuration:

  ```
  docker run --name amba jpetazzo/hamba run
  ```

- There is a helper to inject a new configuration:

  ```
  docker run --rm --volumes-from amba jpetazzo/hamba \
         reconfigure 80 backend1 port1 backend2 port2 ...
  ```

.footnote[Note: configuration validation and error messages
will be logged by the ambassador, not the `reconfigure` container.]

---

## Should we use `links` for our ambassadors?

Technically, we could use links.

- Before starting an app container:

  start the ambassador(s) it needs

- When starting an app container:

  link it to its ambassador(s)

But we wouldn't be able to use `docker-compose scale` anymore.

---

## Network namespaces and `extra_hosts`

This is our plan:

- Replace each `link` with an `extra_host`,
  <br/>pointing to the `127.127.X.X` address space

- Start app containers normally
  <br/>(`docker-compose up`, `docker-compose scale`)

- Start ambassadors after app containers are up:

  - ambassadors bind to `127.127.X.X`

  - they share their client's network namespace

- Reconfigure ambassadors each time something changes

---

## Our plan for service discovery

- Replace all `links` with static `/etc/hosts` entries

- Those entries will map to `127.127.0.X`
  <br/>(with different `X` for each service)

- Example: `redis` will point to `127.127.0.2`
  <br/>(instead of a container address)

- Start all services; scale them if we want
  <br/>(at this point, they will all fail to connect)

- Start ambassadors in the services' namespace;
  <br/>each ambassador will listen on the right `127.127.0.X`

- Gather all backend addresses and configure ambassadors

.icon[![Warning](warning.png)] Services should try to reconnect!

---

## "Design for failure," they said

- When the containers are started, the network is not ready

- First connection attempts **will fail**

- App should try to reconnect

- It is OK to crash and restart

- Exponential back-off is nice

---

## Our tools

- `link-to-ambassadors.py`

  - replaces all `links` with `extra_hosts` entries

- `create-ambassadors.py`

  - scans running containers
  - allocates `127.127.X.X` addresses
  - starts (unconfigured) ambassadors

- `configure-ambassadors.py`

  - scans running containers
  - gathers backend addresses
  - sends configuration to ambassadors

---

## Convert links to ambassadors

- When we ran `build-tag-push.py` earlier,
  <br/>it generated a new `docker-compose.yml-XXX` file.

.exercise[

- Run the first script to create a new YAML file:
  <br/>`../link-to-ambassadors.py docker-compose.yml-XXX a.yml`

- Look how the file was modified:
  <br/>`diff docker-compose.yml-XXX a.yml`

]

The script can take one or two file name arguments:

- two arguments indicate input and output files to use;
- with one argument, the file will be modified in place.

---

## Bring up the application

The application can now be started and scaled.

Remember to use the *new* YAML file!

.exercise[

- Start the application:
  <br/>`docker-compose -f a.yml up -d`

- Scale the application:
  <br/>`docker-compose -f a.yml scale worker=5 rng=10`

]

Note: you can scale everything as you like, *except Redis*,
because it is stateful.

---

## Create the ambassadors

This has to be executed each time you create new services
or scale up existing ones.

The script takes the YAML file as its only argument.

It will scan and compare:

- the list of app containers,
- the list of ambassadors.

It will create missing ambassadors.

.exercise[

- Run the script!
  <br/>`../create-ambassadors.py a.yml`

]

---

## Configure the ambassadors

All ambassadors are created but they still need configuration.

That's the purpose of the last script.

It will gather:

- the list of app backends,
- the list of ambassadors.

Then it configures all ambassadors with all found backends.

.exercise[

- Run it!
  <br/>`../configure-ambassadors.py a.yml`

]

---

## Check what we did

.exercise[


- Find out the address of the web UI:
  <br/>`docker-compose ps webui`

- Point your browser to it

- Check the logs:
  <br/>`docker-compose logs`

]

---

# Going further

Scaling the application (difficulty: easy)

- Run `docker-compose scale`

- Re-create ambassadors

- Re-configure ambassadors

- No downtime

---

## Going further

Deploying a new version (difficulty: easy)

- Just re-run all the steps!

- However, Compose will re-create the containers

- You will have to re-create ambassadors
  <br/>(and configure them)

- You will have to cleanup old ambassadors
  <br/>(left as an exercise for the reader)

- You will experience a little bit of downtime

---

## Going further

Zero-downtime deployment (difficulty: medium)

- Isolate stateful services
  <br/>(like we did earlier for Redis)

- Do blue/green deployment:

  - deploy and scale version N

  - point a "top-level" load balancer to the app

  - deploy and scale version N+1

  - put both apps in the "top-level" balancer

  - slowly switch traffic over to app version N+1

---

## Going further

Harder projects:

- Try two-tier or three-tier ambassador deployments

- Try overlay networking instead of ambassadors

- Try to deploy Mesos or Kubernetes

---

class: title

# Thanks! <br/> Questions?

### [@jpetazzo](https://twitter.com/jpetazzo) <br/> [@docker](https://twitter.com/docker)

    </textarea>
    <script src="https://gnab.github.io/remark/downloads/remark-0.5.9.min.js" type="text/javascript">
    </script>
    <script type="text/javascript">
      var slideshow = remark.create();
    </script>
  </body>
</html>