Docker Orchestration Workshop

---

## Logistics

- Hello! We're `jerome at docker dot com` and `aj at soulshake dot net`

<!--
Reminder, when updating the agenda: when people are told to show
up at 9am, they usually trickle in until 9:30am (except for paid
training sessions). If you're not sure that people will be there
on time, it's a good idea to have a breakfast with the attendees
at e.g. 9am, and start at 9:30.

- Agenda:

.small[
- 08:00-09:00 hello and breakfast
- 09:00-10:25 part 1
- 10:25-10:35 coffee break
- 10:35-12:00 part 2
- 12:00-13:00 lunch break
- 13:00-14:25 part 3
- 14:25-14:35 coffee break
- 14:35-16:00 part 4
]

-->

- The tutorial will run from 1:20pm to 4:40pm

- There will be a break from 3:00pm to 3:15pm

- This will be FAST PACED, but DON'T PANIC!

- All the content is publicly available (slides, code samples, scripts)

- Live feedback, questions, help on 
  [Gitter](http://container.training/chat)

---

## Chapter 1: getting started

- Pre-requirements
- VM environment
- Our sample application
- Running the application
- Identifying bottlenecks
- Scaling out
- Connecting to containers on other hosts
- Abstracting remote services with ambassadors

---

## Chapter 2: Swarm setup and deployment

- Dynamic orchestration
- Deploying Swarm
- Picking a key/value store
- Running containers on Swarm
- Resource allocation
- Multi-host networking
- Building images with Swarm
- Deploying a local registry
- Scaling web services with Compose on Swarm

---

## Chapter 3: Docker for Ops

- Logs
- Setting up ELK to store container logs
- Network traffic analysis
- Backups
- Controlling Docker from a container
- Docker events stream
- Security upgrades

---

## Chapter 4: high availability (additional content)

- Distributing Machine credentials
- Highly available Swarm managers
- Highly available containers
- Conclusions

---

# Pre-requirements

- Computer with network connection and SSH client

- on Linux, OS X, FreeBSD... you are probably all set

- on Windows, get [putty](http://www.putty.org/),
  [Git BASH](https://git-for-windows.github.io/), or
  [MobaXterm](http://mobaxterm.mobatek.net/)

- Basic Docker knowledge
 (but that's OK if you're not a Docker expert!)

---

## Nice-to-haves

- [GitHub](https://github.com/join) account
 (if you want to fork the repo; also used to join Gitter)

- [Gitter](https://gitter.im/) account
 (to join the conversation during the workshop)

- [Docker Hub](https://hub.docker.com) account
 (it's one way to distribute images on your Swarm cluster)

---

## Hands-on sections

- The whole workshop is hands-on

- I will show Docker in action

- I invite you to reproduce what I do

- All hands-on sections are clearly identified, like the gray rectangle below

- This is the stuff you're supposed to do!
- Go to [container.training](http://container.training/) to view these slides
- Join the chat room on
  [Gitter](http://container.training/chat)

]

---

# VM environment

- Each person gets 5 private VMs (not shared with anybody else)
- They'll be up until tonight
- You have a little card with login+password+IP addresses
- You can automatically SSH from one VM to another

- Log into the first VM (`node1`)
- Check that you can SSH (without password) to `node2`:
  ```bash
  ssh node2
  ```
- Type `exit` or `^D` to come back to node1

]

---

## We will (mostly) interact with node1 only

- Unless instructed, **all commands must be run from the first VM, `node1`**

- We will only checkout/copy the code on `node1`

- When we use the other nodes, we will do it mostly through the Docker API

- We will use SSH only for a few "out of band" operations (mass-removing containers...)

---

## Terminals

Once in a while, the instructions will say:
 "Open a new terminal."

There are multiple ways to do this:

- create a new window or tab on your machine, and SSH into the VM;

- use screen or tmux on the VM and open a new window from there.

You are welcome to use the method that you feel the most comfortable with.

---

## Tmux cheatsheet

- Ctrl-b c → creates a new window
- Ctrl-b n → go to next window
- Ctrl-b p → go to previous window
- Ctrl-b " → split window top/bottom
- Ctrl-b % → split window left/right
- Ctrl-b Alt-1 → rearrange panes in columns
- Ctrl-b Alt-2 → rearrange panes in rows
- Ctrl-b arrows → navigate to other panes
- Ctrl-b d → detach session
- tmux attach → reattach to session

---

## Brand new versions!

- Engine 1.11
- Compose 1.7
- Swarm 1.2
- Machine 0.6

- Check all installed versions:
  ```bash
  docker version
  docker-compose -v
  docker run --rm swarm -version
  docker-machine -v
  ```

]

---

## Why are we not using the latest version of Machine?

- The latest version of Machine is 0.7

- The way it deploys Swarm is different from 0.6

- This causes a regression in the strategy that we will use later

- More details later!

---

# Our sample application

- Visit the GitHub repository with all the materials of this workshop:
 https://github.com/jpetazzo/orchestration-workshop

- The application is in the [dockercoins](
  https://github.com/jpetazzo/orchestration-workshop/tree/master/dockercoins)
  subdirectory

- Let's look at the general layout of the source code:

there is a Compose file [docker-compose.yml](
  https://github.com/jpetazzo/orchestration-workshop/blob/master/dockercoins/docker-compose.yml) ...

... and 4 other services, each in its own directory:

- `rng` = web service generating random bytes
- `hasher` = web service computing hash of POSTed data
- `worker` = background process using `rng` and `hasher`
- `webui` = web interface to watch progress

---

## Compose file format version

*Particularly relevant if you have used Compose before...*

- Compose 1.6 introduced support for a new Compose file format (aka "v2")

- Services are no longer at the top level, but under a `services` section

- There has to be a `version` key at the top level, with value `"2"` (as a string, not an integer)

- Containers are placed on a dedicated network, making links unnecessary

- There are other minor differences, but upgrade is easy and straightforward
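
As a rough illustration of the points above, the snippet below writes a minimal version 2 Compose file. It is a sketch, not the workshop's actual `docker-compose.yml`: the `write_compose_v2` helper name, the build paths, and the subset of services shown are assumptions made for the example.

```bash
# Hypothetical helper: write a minimal "v2" Compose file into the given directory.
# The service definitions are illustrative assumptions, not the repository's real file.
write_compose_v2() {
  mkdir -p "$1"
  cat > "$1/docker-compose.yml" <<'EOF'
version: "2"        # a string, not the integer 2

services:           # services now live under this key
  rng:
    build: rng
  worker:
    build: worker   # reaches the other services by name (no "links" needed)
  redis:
    image: redis
EOF
}
```

For instance, `write_compose_v2 /tmp/v2-demo` creates `/tmp/v2-demo/docker-compose.yml`, which can then be compared with the real file in the repository.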

---

## Links, naming, and service discovery

- Containers can have network aliases (resolvable through DNS)

- Compose file version 2 makes each container reachable through its service name

- Compose file version 1 requires "links" sections

- Our code can connect to services using their short name

(instead of e.g. IP address or FQDN)

---

## Example in `worker/worker.py`

![Service discovery](service-discovery.png)

---

## What's this application?

---

![DockerCoins logo](dockercoins.png)

(DockerCoins 2016 logo courtesy of @XtlCnslt and @ndeloof. Thanks!)

---

## What's this application?

- It is a DockerCoin miner! 💰🐳📦🚢

- No, you can't buy coffee with DockerCoins

- How DockerCoins works:

- `worker` asks `rng` to give it random bytes
- `worker` feeds those random bytes into `hasher`
- each hash starting with `0` is a DockerCoin
- DockerCoins are stored in `redis`
- `redis` is also updated every second to track speed
- you can see the progress with the `webui`

---

## Getting the application source code

- We will clone the GitHub repository

- The repository also contains scripts and tools that we will use throughout the workshop

- Clone the repository on `node1`:
  ```bash
  git clone https://github.com/jpetazzo/orchestration-workshop
  ```

]

(You can also fork the repository on GitHub and clone your fork if you prefer that.)

---

# Running the application

Without further ado, let's start our application.

- Go to the `dockercoins` directory, in the cloned repo:
  ```bash
  cd ~/orchestration-workshop/dockercoins
  ```

- Use Compose to build and run all containers:
  ```bash
  docker-compose up
  ```

]

Compose tells Docker to build all container images (pulling
the corresponding base images), then starts all containers,
and displays aggregated logs.

---

## Lots of logs

- The application continuously generates logs

- We can see the `worker` service making requests to `rng` and `hasher`

- Let's put that in the background

- Stop the application by hitting `^C`

]

- `^C` stops all containers by sending them the `TERM` signal

- Some containers exit immediately, others take longer
 (because they don't handle `SIGTERM` and end up being killed after a 10s timeout)
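
To make the difference concrete, here is a minimal sketch of a process that does handle `SIGTERM`, assuming the container's main process is a shell script. The `graceful_worker` function is a hypothetical stand-in, not code from DockerCoins.

```bash
# Hypothetical long-running loop that exits as soon as it receives SIGTERM.
graceful_worker() {
  trap 'echo "caught TERM, shutting down"; exit 0' TERM
  while true; do
    sleep 1 &   # run sleep in the background...
    wait $!     # ...so the trap can fire right away instead of after sleep ends
  done
}
```

A container whose main process behaves like this should stop right after `^C`, instead of waiting for the timeout and being killed.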

---

## Restarting in the background

- Many flags and commands of Compose are modeled after those of `docker`

- Start the app in the background with the `-d` option:
  ```bash
  docker-compose up -d
  ```

- Check that our app is running with the `ps` command:
  ```bash
  docker-compose ps
  ```

]

`docker-compose ps` also shows the ports exposed by the application.

---

## Viewing logs

- The `docker-compose logs` command works like `docker logs`

- View all logs since container creation and exit when done:
  ```bash
  docker-compose logs
  ```

- Stream container logs, starting at the last 10 lines for each container:
  ```bash
  docker-compose logs --tail 10 --follow
  ```

]

Tip: use `^S` and `^Q` to pause/resume log output.

???

## Upgrading from Compose 1.6

- Up to 1.6

- `docker-compose logs` is the equivalent of `logs --follow`

- `docker-compose logs` must be restarted if containers are added

- Since 1.7

- `--follow` must be specified explicitly

- new containers are automatically picked up by `docker-compose logs`

---

## Connecting to the web UI

- The `webui` container exposes a web dashboard; let's view it

- Open http://[yourVMaddr]:8000/ (from a browser)

]

- The app actually has a constant, steady speed (3.33 coins/second)

- The speed seems not-so-steady because:

- the worker doesn't update the counter after every loop, but up to once per second

- the speed is computed by the browser, checking the counter about once per second

- between two consecutive updates, the counter will increase either by 4, or by 0

---

## Scaling up the application

- Our goal is to make that performance graph go up (without changing a line of code!)

- Before trying to scale the application, we'll figure out if we need more resources

(CPU, RAM...)

- For that, we will use good old UNIX tools on our Docker node

---

## Looking at resource usage

- Let's look at CPU, memory, and I/O usage

- run `top` to see CPU and memory usage (you should see idle cycles)

- run `vmstat 3` to see I/O usage (si/so/bi/bo)
 (the 4 numbers should be almost zero, except `bo` for logging)

]

We have available resources.

- Why?
- How can we use them?

---

## Scaling workers on a single node

- Docker Compose supports scaling
- Let's scale `worker` and see what happens!

- Start one more `worker` container:
  ```bash
  docker-compose scale worker=2
  ```

- Look at the performance graph (it should show a x2 improvement)

- Look at the aggregated logs of our containers (`worker_2` should show up)

- Look at the impact on CPU load with e.g. top (it should be negligible)

]

---

## Adding more workers

- Great, let's add more workers and call it a day, then!

- Start eight more `worker` containers:
  ```bash
  docker-compose scale worker=10
  ```

- Look at the performance graph: does it show a x10 improvement?

- Look at the aggregated logs of our containers

- Look at the impact on CPU load and memory usage

]

---

# Identifying bottlenecks

- You should have seen a 3x speed bump (not 10x)

- Adding workers didn't result in linear improvement

- *Something else* is slowing us down

- ... But what?

- The code doesn't have instrumentation

- Let's use state-of-the-art HTTP performance analysis!
 (i.e. good old tools like `ab`, `httping`...)

---

## Measuring latency under load

We will use `httping`.

- Check the latency of `rng`:
  ```bash
  httping -c 10 localhost:8001
  ```

- Check the latency of `hasher`:
  ```bash
  httping -c 10 localhost:8002
  ```

]

`rng` has a much higher latency than `hasher`.

---

## Let's draw hasty conclusions

- The bottleneck seems to be `rng`

- *What if* we don't have enough entropy and can't generate enough random numbers?

- We need to scale out the `rng` service on multiple machines!

Note: this is a fiction! We have enough entropy. But we need a pretext to scale out.
 (In fact, the code of `rng` uses `/dev/urandom`, which doesn't need entropy.)

---

# Scaling out

---

# Connecting to containers on other hosts

- So far, our whole stack is on a single machine

- We want to scale out (across multiple nodes)

- We will deploy the same stack multiple times

- But we want every stack to use the same Redis
 (in other words: Redis is our only *stateful* service here)

- And remember: we're not allowed to change the code!

- the code connects to host `redis`
- `redis` must resolve to the address of our Redis service
- the Redis service must listen on the default port (6379)

???

## Using custom DNS mapping

- We could set up a Redis server on its default port

- And add a DNS entry mapping `redis` to this server

- See what happens if we run:
  ```bash
  docker run --add-host redis:1.2.3.4 alpine ping redis
  ```

]

There is a Compose file option for that: `extra_hosts`.

---

# Abstracting remote services with ambassadors

<!--

- What if we can't/won't run Redis on its default port?

- What if we want to be able to move it easily?

-->

- We will use an ambassador

- Redis will be started independently of our stack

- It will run at an arbitrary location (host+port)

- In our stack, we replace `redis` with an ambassador

- The ambassador will connect to Redis

- The ambassador will "act as" Redis in the stack

---

![Ambassador principle](static-orchestration-1-node-a.png)

---

![Ambassador principle](static-orchestration-1-node-b.png)

---

![Ambassador principle](static-orchestration-1-node-c.png)

---

![Ambassador principle](static-orchestration-2-nodes.png)

---

![Ambassador principle](static-orchestration-3-nodes.png)

---

![Ambassador principle](static-orchestration-4-nodes.png)

---

![Ambassador principle](static-orchestration-5-nodes.png)

---

## Start redis

- Start a standalone Redis container

- Let Docker expose it on a random port

- Run redis with a random public port:
 `docker run -d -P --name myredis redis`

- Check which port was allocated:
 `docker port myredis 6379`

]

- Note the IP address of the machine, and this port
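
If you would rather capture the port in a variable than copy it by hand, something along these lines should work. It assumes `docker port` prints mappings in the `ADDRESS:PORT` form (for example `0.0.0.0:32768`); `port_from_mapping` is a hypothetical helper name.

```bash
# Extract the port from one line of `docker port` output, e.g. "0.0.0.0:32768".
port_from_mapping() {
  printf '%s\n' "${1##*:}"   # keep everything after the last colon
}

# On node1 (requires the myredis container started above):
# REDIS_PORT=$(port_from_mapping "$(docker port myredis 6379 | head -n 1)")
# echo "Redis is published on port $REDIS_PORT"
```

Stripping up to the last colon also copes with IPv6-style mappings, where the address part itself contains colons.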

---

## Introduction to `jpetazzo/hamba`

- General purpose load balancer and traffic director

- [Source code is available on GitHub](
  https://github.com/jpetazzo/hamba)

- [Public image is available on the Docker Hub](
  https://hub.docker.com/r/jpetazzo/hamba/)

- Generates a configuration file for HAProxy, then starts HAProxy

- Parameters are provided on the command line; for instance:
  ```bash
  docker run -d -p 80 jpetazzo/hamba 80 www1:1234 www2:2345
  docker run -d -p 80 jpetazzo/hamba 80 www1 1234 www2 2345
  ```
  Those two commands do the same thing: they start a load balancer
  listening on port 80, and balancing traffic across www1:1234 and www2:2345

---

## Update `docker-compose.yml`

- Replace `redis` with an ambassador using `jpetazzo/hamba`:
  ```yaml
    redis:
      image: jpetazzo/hamba
      command: 6379 `AA.BB.CC.DD:EEEEE`
  ```

]

Shortcut: `docker-compose.yml-ambassador`
 (But you still have to update `AA.BB.CC.DD:EEEEE`!)

---

## Start the stack on the first machine

- Compose will detect the change in the `redis` service

- It will replace `redis` with a `jpetazzo/hamba` instance

- Just tell Compose to do its thing:
 `docker-compose up -d`

- Check that the stack is up and running:
 `docker-compose ps`

- Look at the web UI to make sure that it works fine

]

---

## Controlling other Docker Engines

- Many tools in the ecosystem will honor the `DOCKER_HOST` environment variable

- Those tools include (obviously!) the Docker CLI and Docker Compose

- Our training VMs have been set up to accept API requests on port 55555
 (without authentication - this is very insecure, by the way!)

- We will see later how to set up mutual authentication with certificates

---

## Setting the `DOCKER_HOST` environment variable

- Check how many containers are running on `node1`:
  ```bash
  docker ps
  ```

- Set the `DOCKER_HOST` variable to control `node2`, and compare:
  ```bash
  export DOCKER_HOST=tcp://node2:55555
  docker ps
  ```

]

You shouldn't see any container running on `node2` at this point.
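
Exporting `DOCKER_HOST` affects every later command in the shell. As an alternative sketch, a small wrapper can set it for a single command only. `on_node` is a hypothetical helper, and it assumes the `nodeN` names and port 55555 used by the training VMs.

```bash
# Hypothetical helper: run one command against a given node's Engine,
# without changing DOCKER_HOST for the rest of the shell session.
on_node() {
  node_number=$1; shift
  env DOCKER_HOST="tcp://node$node_number:55555" "$@"
}

# Example (on the training VMs): on_node 2 docker ps
```

With this approach there is nothing to `unset` afterwards, which helps avoid accidentally running a command against the wrong node.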

---

## Start the stack on another machine

- We will tell Compose to bring up our stack on the other node

- It will use the local code (we don't need to checkout the code on `node2`)

- Start the stack:
  ```bash
  docker-compose up -d
  ```

]

Note: this will build the container images on `node2`, resulting
in potentially different results from `node1`. We will see later
how to use the same images across the whole cluster.

---

## Run the application on every node

- We will repeat the previous step with a little shell loop

... but introduce parallelism to save some time

- Deploy one instance of the stack on each node:

  ```bash
  for N in 3 4 5; do
    DOCKER_HOST=tcp://node$N:55555 docker-compose up -d &
  done
  wait
  ```

]

Note: again, this will rebuild the container images on each node.
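
One caveat with the `&` plus `wait` pattern: a bare `wait` does not tell us whether any background job failed. The sketch below waits for each job individually and reports the nodes where the command failed. `parallel_on_nodes` is a hypothetical helper written for this example, assuming the same `nodeN:55555` endpoints.

```bash
# Hypothetical helper: run a command on several nodes in parallel,
# then print the nodes where it failed and return non-zero if any did.
parallel_on_nodes() {
  cmd=$1; shift
  pids=""; failed=""
  for N in "$@"; do
    DOCKER_HOST=tcp://node$N:55555 sh -c "$cmd" &
    pids="$pids $N:$!"                 # remember which PID belongs to which node
  done
  for entry in $pids; do
    wait "${entry#*:}" || failed="$failed node${entry%%:*}"
  done
  echo "failed:$failed"
  [ -z "$failed" ]
}

# Example (on the training VMs): parallel_on_nodes 'docker-compose up -d' 3 4 5
```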

---

## Scale!

- The app is built (and running!) everywhere

- Scaling can be done very quickly

- Add a bunch of workers all over the place:

  ```bash
  for N in 1 2 3 4 5; do
    DOCKER_HOST=tcp://node$N:55555 docker-compose scale worker=10
  done
  ```

- Admire the result in the web UI!

]

---

## A few words about development volumes

- Try to access the web UI on another node

- It doesn't work! Why?

- Static assets are masked by an empty volume

- We need to comment out the `volumes` section

---

## Why must we comment out the `volumes` section?

- Volumes have multiple uses:

- storing persistent stuff (database files...)

- sharing files between containers (logs, configuration...)

- sharing files between host and containers (source...)

- The `volumes` directive expands to a host path:

`/home/docker/orchestration-workshop/dockercoins/webui/files`

- This host path exists on the local machine (not on the others)

- This specific volume is used in development (not in production)

---

## Stop the app

- Let's use `docker-compose down`

- It will stop and remove the DockerCoins app (but leave other containers running)

- We can do another simple parallel shell loop:
  ```bash
    for N in $(seq 1 5); do
      export DOCKER_HOST=tcp://node$N:55555
      docker-compose down &
    done
    wait
  ```

]

---

## Clean up the redis container

- `docker-compose down` only removes containers defined with Compose

- Check that `myredis` is still there:
  ```bash
  unset DOCKER_HOST
  docker ps
  ```

- Remove it:
  ```bash
  docker rm -f myredis
  ```

]

---

## Considerations about ambassadors

"Ambassador" is a design pattern.

There are many ways to implement it.

Other implementations include:

- [interlock](https://github.com/ehazlett/interlock);
- [registrator](http://gliderlabs.com/registrator/latest/);
- [smartstack](http://nerds.airbnb.com/smartstack-service-discovery-cloud/);
- [zuul](https://github.com/Netflix/zuul/wiki);
- and more!

<!--

We will present three increasingly complex (but also powerful)
ways to deploy ambassadors.

-->

???

## Single-tier ambassador deployment

- One-shot configuration process

- Must be executed manually after each scaling operation

- Scans current state, updates load balancer configuration

- Pros:
 - simple, robust, no extra moving part
 - easy to customize (thanks to simple design)
 - can deal efficiently with large changes

- Cons:
 - must be executed after each scaling operation
 - harder to compose different strategies

- Example: this workshop

???

## Two-tier ambassador deployment

- Daemon listens to Docker events API

- Reacts to container start/stop events

- Adds/removes backends in the load balancer configuration

- Pros:
 - no extra step required when scaling up/down

- Cons:
 - extra process to run and maintain
 - deals with one event at a time (ordering matters)

- Hidden gotcha: load balancer creation

- Example: interlock

???

## Three-tier ambassador deployment

- Daemon listens to Docker events API

- Reacts to container start/stop events

- Adds/removes scaled services in distributed config DB (Zookeeper, etcd, Consul…)

- Another daemon listens to config DB events,
 adds/removes backends in the load balancer configuration

- Pros:
 - more flexibility

- Cons:
 - three extra services to run and maintain

- Example: registrator

---

## Ambassadors and overlay networks

- Overlay networks allow direct multi-host communication

- Ambassadors are still useful to implement other tasks:

- load balancing;

- credentials injection;

- instrumentation;

- fail-over;

- etc.

---

# Dynamic orchestration

---

## Static vs Dynamic

- Static

- you decide what goes where

- simple to describe and implement

- seems easy at first but doesn't scale efficiently

- Dynamic

- the system decides what goes where

- requires extra components (HA KV...)

- scaling can be finer-grained, more efficient

---

## Hands-on Swarm

![Swarm Logo](swarm.png)

---

## Swarm (in theory)

- Consolidates multiple Docker hosts into a single one

- You talk to Swarm using the Docker API

→ you can use all existing tools: Docker CLI, Docker Compose, etc.

- Swarm talks to your Docker Engines using the Docker API too

→ you can use existing Engines without modification

- Dispatches (schedules) your containers across the cluster, transparently

- Open source and written in Go (like the Docker Engine)

- Initial design and implementation by [@aluzzardi](https://twitter.com/aluzzardi) and [@vieux](https://twitter.com/vieux),
  who were also the authors of the first versions of the Docker Engine

---

## Swarm (in practice)

- Stable since November 2015

- Easy to set up (compared to other orchestrators)

- Tested with 1000 nodes + 50000 containers
 .small[(without particular tuning; see DockerCon EU opening keynotes!)]

- Requires a key/value store for advanced features

- Can use Consul, etcd, or Zookeeper

---

# Deploying Swarm

- Components involved:

- cluster discovery mechanism
 (so that the manager can learn about the nodes)

- Swarm manager
 (your frontend to the cluster)

- Swarm agent
 (runs on each node, registers it with service discovery)

---

## Cluster discovery

- Possible backends:

- dynamic, self-hosted
 (requires running a Consul/etcd/Zookeeper cluster)

- static, through command-line or file
 (great for testing, or for private subnets, see [this article](
 https://medium.com/on-docker/docker-swarm-flat-file-engine-discovery-2b23516c71d4#.6vp94h5wn))

- external, token-based
 (dynamic; nothing to operate; relies on external service operated by Docker Inc.)

---

## Swarm agent

- Used only for dynamic discovery (ZK, etcd, Consul, token)

- Must run on each node

- Every 20s (by default), tells the discovery system:
  
  *"Hello, there is a Swarm node at A.B.C.D:EFGH"*

- Must know the node's IP address
  
  (It cannot figure it out by itself, because it doesn't know whether to use public or private addresses)

- The node continues to work even if the agent dies

---

## Swarm manager

- Accepts Docker API requests

- Communicates with the cluster nodes

- Performs healthchecks, scheduling...

---

# Picking a key/value store

- We are going to use a key/value store, and use it for:

- cluster membership discovery

- overlay networks backend

- resilient storage of important credentials

- Swarm leader election

- We are going to use Consul, and run one Consul instance on each node

(That way, we can always access Consul over localhost)

---

## Do we really need a key/value store?

- Cluster membership discovery doesn't *require* a key/value store

(We could use the token mechanism instead)

- Network overlays don't *require* a key/value store

(We could use a plugin like Weave instead)

- Credentials can be distributed through other mechanisms

(E.g. copying them to a private S3 bucket)

- Swarm leader election, however, requires a key/value store

---

## Why are we using a key/value store, then?

- Each aforementioned mechanism requires some reliable, distributed storage

- If we don't use our own key/value store, we end up using *something else*:

- Docker Inc.'s centralized token discovery service

- [Weave's CRDT protocol](https://github.com/weaveworks/weave/wiki/IP-allocation-design)

- AWS S3 (or your cloud provider's equivalent, or some other file storage system)

- Each of those is one extra potential point of failure

- See for instance [Kyle Kingsbury's analysis of Chronos](https://aphyr.com/posts/326-jepsen-chronos) for an illustration of this problem

- By operating our own key/value store, we have 1 extra service instead of 3 (or more)

---

## Should we always use a key/value store?

- No!

- If you don't want to operate your own key/value store, don't do it

- You might be more comfortable using tokens + Weave + S3, for instance

- You can also use static discovery

- Maybe you don't even need overlay networks

---

## Why Consul?

- Consul is not the "official" or best way to do this

- This is an arbitrary decision made by Truly Yours

- I *personally* find Consul easier to set up for a workshop like this

- ... But etcd and Zookeeper will work too!

---

## Setting up our Swarm cluster

We need to:

- create certificates,

- distribute them on our nodes,

- run the Swarm agent on every node,

- run the Swarm manager on `node1`,

- reconfigure the Engine on each node to add extra flags (for overlay networks).

That's a lot of work, so we'll use Docker Machine to automate this.

---

## Using Docker Machine to set up a Swarm cluster

- Docker Machine has two primary uses:

- provisioning cloud instances running the Docker Engine

- managing local Docker VMs within e.g. VirtualBox

- It can also create Swarm clusters, and will:

- create and manage certificates

- automatically start swarm agent and manager containers

- It comes with a special driver, `generic`, to (re)configure existing machines

---

## Setting up Docker Machine

- Install `docker-machine` (single binary download)

(This is already done on your VMs!)

- Set a few environment variables (cloud credentials)
  ```bash
  export AWS_ACCESS_KEY_ID=AKI...
  export AWS_SECRET_ACCESS_KEY=...
  export AWS_DEFAULT_REGION=eu-west-2
  export DIGITALOCEAN_ACCESS_TOKEN=...
  export DIGITALOCEAN_SIZE=2gb
  export AZURE_SUBSCRIPTION_ID=...
  ```

(We already have 5 nodes, so we don't need to do this!)

---

## Creating nodes with Docker Machine

- The only two mandatory parameters are the driver to use, and the machine name:
  ```bash
  docker-machine create -d digitalocean node42
  ```

- *Tons* of parameters can be specified; see [Docker Machine driver documentation](https://docs.docker.com/machine/drivers/)

- To list machines and their status:
  ```bash
  docker-machine ls
  ```

- To destroy a machine:
  ```bash
  docker-machine rm node42
  ```

---

## Communicating with nodes managed by Docker Machine

- Select a machine for use:
  ```bash
  eval $(docker-machine env node42)
  ```
  This will set a few environment variables (at least `DOCKER_HOST`).

- Execute regular commands with Docker, Compose, etc.

(They will pick up the remote host address from the environment)

- If you need to go under the hood, you can get SSH access:
  ```bash
  docker-machine ssh node42
  ```
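
To see why `eval` is needed when selecting a machine, it helps to remember that `docker-machine env` only *prints* shell code. The sketch below imitates that with a stand-in function; the address and certificate path are made-up placeholders, and the exact variables printed by your version of Machine may differ.

```bash
# Stand-in for `docker-machine env node42`: prints export statements, runs nothing.
fake_machine_env() {
  cat <<'EOF'
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://203.0.113.10:2376"
export DOCKER_CERT_PATH="/home/docker/.docker/machine/machines/node42"
export DOCKER_MACHINE_NAME="node42"
EOF
}

# eval executes the printed statements in the current shell,
# so the variables are visible to later docker and docker-compose commands.
eval "$(fake_machine_env)"
echo "$DOCKER_HOST"
```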

---

## Docker Machine `generic` driver

- Most drivers work the same way:

- use cloud API to create instance

- connect to instance over SSH

- install Docker

- The `generic` driver skips the first step

- It can install Docker on any machine, as long as you have SSH access

- We will use that!

---

## Setting up Swarm with Docker Machine

When invoking Machine, we will provide three sets of parameters:

- the machine driver to use (`generic`) and the SSH connection information

- Swarm-specific options indicating the cluster membership discovery mechanism

- Extra flags to be passed to the Engine, to enable overlay networks

---

## Provisioning the first node

- Use the following command to provision the manager node:

  ```bash
  docker-machine create --driver generic \
    --engine-opt cluster-store=consul://localhost:8500 \
    --engine-opt cluster-advertise=eth0:2376 \
    --swarm --swarm-master --swarm-discovery consul://localhost:8500 \
    --generic-ssh-user docker --generic-ip-address `AA.BB.CC.DD` node1
  ```

]

---

## Provisioning the other nodes

- The command is almost the same, but without the `--swarm-master` flag

- We will use a shell snippet for convenience

```bash
  grep 'node[2345]' /etc/hosts | grep -v '^127' |
  while read IPADDR NODENAME
  do docker-machine create --driver generic \
     --engine-opt cluster-store=consul://localhost:8500 \
     --engine-opt cluster-advertise=eth0:2376 \
     --swarm --swarm-discovery consul://localhost:8500 \
     --generic-ssh-user docker \
     --generic-ip-address $IPADDR $NODENAME
  done
```

]
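
Before running the real loop, it can be reassuring to check what it would iterate over. The following dry-run sketch uses the same `grep` and `read` logic but only prints each node. `list_worker_nodes` is a hypothetical helper, and the hosts file is a parameter so that it can be tried on a copy.

```bash
# Dry run: print the node names and addresses that the loop would provision.
list_worker_nodes() {
  grep 'node[2345]' "${1:-/etc/hosts}" | grep -v '^127' |
  while read -r IPADDR NODENAME REST; do   # REST absorbs any extra aliases
    echo "would provision $NODENAME at $IPADDR"
  done
}

# Example: list_worker_nodes /etc/hosts
```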

---

## Check what we did

Let's connect to the first node *individually*.

- Select the node with Machine

  ```bash
  eval $(docker-machine env node1)
  ```

- Execute some Docker commands

  ```bash
  docker version
  docker info
  ```

]

In the output of `docker info`, we should see `Cluster store` and `Cluster advertise`.

---

## Interact with the node

Let's try a few basic Docker commands on this node.

- Run a simple container:
  ```bash
  docker run --rm busybox echo hello world
  ```

- See running containers:
  ```bash
  docker ps
  ```

]

Two containers should show up: the agent and the manager.

---

## Connect to the Swarm cluster

Now, let's try the same operations, but when talking to the Swarm manager.

- Select the Swarm manager with Machine:

  ```bash
  eval $(docker-machine env node1 --swarm)
  ```

- Execute some Docker commands

  ```bash
  docker version
  docker info
  docker ps
  ```

]

The output is different! Let's review this.

---

## `docker version`

Swarm identifies itself clearly:

```
Client:
 Version:      1.11.1
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   5604cbe
 Built:        Tue Apr 26 23:38:55 2016
 OS/Arch:      linux/amd64

Server:
 Version:      swarm/1.2.2
 API version:  1.22
 Go version:   go1.5.4
 Git commit:   34e3da3
 Built:        Mon May  9 17:03:22 UTC 2016
 OS/Arch:      linux/amd64
```

---

## `docker info`

The output of `docker info` on Swarm shows a number of differences from
the output on a single Engine:

.small[
```
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: swarm/1.2.2
Role: primary
Strategy: spread
Filters: health, port, containerslots, dependency, affinity, constraint
Nodes: 0
Plugins: 
 Volume: 
 Network: 
Kernel Version: 4.2.0-36-generic
Operating System: linux
Architecture: amd64
CPUs: 0
Total Memory: 0 B
Name: node1
Docker Root Dir: 
Debug mode (client): false
Debug mode (server): false
WARNING: No kernel memory limit support
```
]
---

## Why zero nodes?

- We haven't started Consul yet

- Swarm discovery is not operational

- Swarm can't discover the nodes

Note: Docker will start (and be functional) without a K/V store.

This lets us run Consul itself in a container.

---

## Adding Consul

- We will run Consul in containers

- We will use the [Consul official image](
  https://hub.docker.com/_/consul/) that was released *very recently*

- We will tell Docker to automatically restart it on reboots

- To simplify network setup, we will use `host` networking

---

## A few words about `host` networking

- Consul needs to be aware of its actual IP address (seen by other nodes)

- It also binds a bunch of different ports

- It makes sense (from a security point of view) to have Consul listening on localhost only

(and have "users", i.e. Engine, Swarm, etc. connect over localhost)

- Therefore, we will use `host` networking!

- Also: Docker Machine 0.6 starts the Swarm containers in `host` networking ...

- ... but Docker Machine 0.7 doesn't (which is why we stick to 0.6 for now)

---

## Consul fundamentals (if I must give you just one slide...)

- Consul nodes can be "just an agent" or "server"

- From the client's perspective, they behave the same

- Only servers take part in Raft consensus, leader election, etc.

(non-server agents forward requests to a server)

- All nodes must be told the address of at least one other node to join

(except for the first node, where this is optional)

- At least the first nodes must know how many nodes to expect to have quorum

- Consul can have only one "truth" at a time (hence the importance of quorum)
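
To make the quorum requirement concrete, here is a minimal sketch of the arithmetic. It assumes the usual Raft majority rule (more than half of the servers must agree); the `quorum` helper is made up for this slide and is not a Consul command.

```bash
# Hypothetical helper: majority needed among N Raft servers.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# How many server failures can each cluster size survive?
for N in 1 3 5; do
  echo "$N servers: quorum=$(quorum $N), tolerated failures=$(( N - $(quorum $N) ))"
done
```

With five servers (as in this workshop), three must be up for the cluster to keep making decisions.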

---

## Starting our Consul cluster

- Make sure you're logged into `node1`, and:

  ```bash
  IPADDR=$(ip a ls dev eth0 | sed -n 's,.*inet \(.*\)/.*,\1,p')
  for N in 1 2 3 4 5; do
    ssh node$N -- docker run -d --restart=always --name consul_node$N \
                  -e CONSUL_BIND_INTERFACE=eth0 --net host consul \
                  agent -server -retry-join $IPADDR -bootstrap-expect 5 \
                  -ui -client 0.0.0.0
  done
  ```

]

Note: in production, you probably want to remove `-client 0.0.0.0` since it
gives public access to your cluster! Also set `-bootstrap-expect` to the number
of Consul servers in your own cluster.

---

## Check that our Consul cluster is up

- With your browser, navigate to any instance on port 8500
 (in "NODES" you should see the five nodes)

- Let's run a couple of useful Consul commands

- Ask Consul for the list of members it knows:
  ```bash
  docker run --net host --rm consul members
  ```

- Ask Consul which node is the current leader:
  ```bash
  curl localhost:8500/v1/status/leader
  ```

]

---

## Check that our Swarm cluster is up

- Try the `docker info` command from earlier again:

```bash
  eval $(docker-machine env --swarm node1)
  docker info
  docker ps
  ```

]

All nodes should be visible. (If not, give them a minute or two to register.)

The Consul containers should be visible.

The Swarm containers, however, are hidden by Swarm (unless you use `docker ps -a`).

---

# Running containers on Swarm

Try to run a few `busybox` containers.

Then, let's get serious:

- Start a Redis service:
 `docker run -dP redis`

- See the service address:
 `docker port $(docker ps -lq) 6379`

]

This can be any of your five nodes.

---

## Scheduling strategies

- Random: pick a node at random
 (but honor resource constraints)

- Spread: pick the node with the fewest containers
 (including stopped containers)

- Binpack: try to maximize resource usage
 (in other words: use as few hosts as possible)
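
The difference between spread and binpack can be sketched with a few lines of shell. This is a deliberate simplification that only counts containers (real Swarm also weighs CPU and RAM); the cluster state in `NODES` and the helper names are made up for illustration.

```bash
# Made-up cluster state, one node per line: "name running capacity"
NODES="node1 3 4
node2 1 4
node3 4 4"

# Keep only the nodes that still have room for one more container.
with_room() {
  echo "$NODES" | awk '$2 < $3'
}

# Spread: among nodes with room, pick the one with the fewest containers.
spread() {
  with_room | sort -k2,2n | head -n 1 | cut -d' ' -f1
}

# Binpack: among nodes with room, pick the fullest one.
binpack() {
  with_room | sort -k2,2nr | head -n 1 | cut -d' ' -f1
}

echo "spread picks:  $(spread)"
echo "binpack picks: $(binpack)"
```

Here spread sends the next container to `node2`, while binpack keeps filling `node1` and leaves `node2` as empty as possible.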

---

# Resource allocation

- Swarm can honor resource reservations

- This requires containers to be started with resource limits

- Swarm refuses to schedule a container if it cannot honor a reservation

- Start Redis containers with 1 GB of RAM until Swarm refuses to start more:
  ```bash
  docker run -d -m 1G redis
  ```

]

On a cluster of 5 nodes with ~3.8 GB of RAM per node, Swarm will refuse to start the 16th container.
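
Where does that number come from? Here is a back-of-the-envelope sketch, assuming each node counts as roughly 3800 MB and that Swarm applies its default 5% overcommit. The variable names are ours, not Swarm settings.

```bash
NODES=5
NODE_MB=3800          # ~3.8 GB per node (rounded, assumption)
OVERCOMMIT_PCT=5      # Swarm's default overcommit
RESERVATION_MB=1024   # what "-m 1G" reserves

# Memory that Swarm is willing to hand out on each node.
USABLE_MB=$(( NODE_MB * (100 + OVERCOMMIT_PCT) / 100 ))

# Integer division: partial containers don't count.
PER_NODE=$(( USABLE_MB / RESERVATION_MB ))
TOTAL=$(( PER_NODE * NODES ))

echo "usable per node: $USABLE_MB MB"
echo "containers per node: $PER_NODE"
echo "cluster total: $TOTAL (container number $(( TOTAL + 1 )) is refused)"
```

Each node has room for three 1 GB reservations, so the cluster tops out at 15.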

---

## Removing our Redis containers

- Let's use a little bit of shell scripting

- Remove all containers using the redis image:
  ```bash
  docker ps | awk '/redis/ {print $1}' | xargs docker rm -f
  ```

]

???

## Things to know about resource allocation

- `docker info` shows resource allocation for each node

- Swarm allows a 5% resource overcommit (tunable)

- Containers without resource reservation can always be started

- Resources of stopped containers are still counted as being reserved

  - this guarantees that it will be possible to restart a stopped container

  - containers have to be deleted to free up their resources

- `docker update` can be used to change resource allocation on the fly

---

# Setting up overlay networks

---

# Multi-host networking

- Docker 1.9 has the concept of *networks*

- By default, containers are on the default "bridge" network

- You can create additional networks

- Containers can be on multiple networks

- Containers can dynamically join/leave networks

- The "overlay" driver lets networks span multiple hosts

- Containers can have "network aliases" resolvable through DNS

---

## Manipulating networks, names, and aliases

- The preferred method is to let Compose do the heavy lifting for us

(YAML-defined networking!)

- But if we really need to, we can use the Docker CLI, with:

`docker network ...`

`docker run --net ... --net-alias ...`

- The following slides illustrate those commands

---

## Create a few networks and containers

- Create two networks, *blue* and *green*:
  ```bash
  docker network create blue
  docker network create green
  docker network ls
  ```

- Create containers with names of blue and green
  things, on their respective networks:
  ```bash
  docker run -d --net-alias things --name sky --net blue -m 3G redis
  docker run -d --net-alias things --name navy --net blue -m 3G redis
  docker run -d --net-alias things --name grass --net green -m 3G redis
  docker run -d --net-alias things --name forest --net green -m 3G redis
  ```

]

---

## Check connectivity within networks

- Check that our containers are on different nodes:

```bash
  docker ps
  ```

- This will work:

```bash
  docker run --rm --net blue alpine ping -c 3 navy
  ```

- This will not:

```bash
  docker run --rm --net blue alpine ping -c 3 grass
  ```

]

???

## Containers connected to multiple networks

- Some colors aren't *quite* blue *nor* green

- Create a container that we want to be on both networks:
  ```bash
  docker run -d --net-alias things --net blue --name turquoise redis
  ```

- Check connectivity:
  ```bash
  docker exec -ti turquoise ping -c 3 navy
  docker exec -ti turquoise ping -c 3 grass
  ```
  (First works; second doesn't)

]

???

## Dynamically connecting containers

- This is achieved with the command:
 `docker network connect NETNAME CONTAINER`

- Dynamically connect to the green network:
  ```bash
  docker network connect green turquoise
  ```

- Check connectivity:
  ```bash
  docker exec -ti turquoise ping -c 3 navy
  docker exec -ti turquoise ping -c 3 grass
  ```
  (Both commands work now)

]

---

## Network aliases

- Each container was created with the network alias `things`

- Network aliases are scoped by network

- Resolve the `things` alias from both networks:
  ```bash
    docker run --rm --net blue alpine nslookup things
    docker run --rm --net green alpine nslookup things
  ```

]

???

## Under the hood

- Each network has an interface in the container

- There is also an interface for the default gateway

- View interfaces in our `turquoise` container:
  ```bash
  docker exec -ti turquoise ip addr ls
  ```

]

???

## Dynamically disconnecting containers

- There is a mirror command to `docker network connect`

- Disconnect the *turquoise* container from *blue*
  (its original network):
  ```bash
  docker network disconnect blue turquoise
  ```

- Check connectivity:
  ```bash
  docker exec -ti turquoise ping -c 3 navy
  docker exec -ti turquoise ping -c 3 grass
  ```
  (First command fails, second one works)

]

---

## Cleaning up

- Destroy containers:

```bash
  docker rm -f sky navy grass forest
  ```

- Destroy networks:

```bash
  docker network rm blue
  docker network rm green
  ```

]

---

## Cleaning up after an outage or a crash

- You cannot remove a network if it still has containers

- There is no `"rm -f"` for networks

- If a network still has stale endpoints, you can use `"disconnect -f"`

---

# Building images with Swarm

---

## Building images with Swarm

- Special care must be taken when building and running images

- We *can* build images on Swarm (with `docker build` or `docker-compose build`)

- One node will be picked at random, and the build will happen there

- At the end of the build, the image will be present *only on that node*

---

## Building on Swarm can yield inconsistent results

- Builds are scheduled on random nodes

- Multiple builds and rebuilds can happen on different nodes

- If a build happens on a different node, the cache of the previous build cannot be used

- Worse: you can have two different images with the same name on your cluster

---

## Scaling won't work as expected

Consider the following scenario:

- `docker-compose up`
 
 → each service is built on a node, and runs there

- `docker-compose scale`
 
 → additional containers for this service can only be spawned where the image was built

- `docker-compose up` (again)
 
 → services might be built (and started) on different nodes

- `docker-compose scale`
 
 → containers can be spawned with both the new and old images

---

## Scaling correctly with Swarm

- After building an image, it should be distributed to the cluster

(Or made available through a registry, so that nodes can download it automatically)

- Instead of referencing images with the `:latest` tag, unique tags should be used

(Using e.g. timestamps, version numbers, or VCS hashes)

---

## Why can't Swarm do this automatically for us?

- Let's step back and think for a minute ...

- What should `docker build` do on Swarm?

  - build on one machine

  - build everywhere ($$$)

- After the build, what should `docker run` do?

  - run where we built (how do we know where it is?)

  - run on any machine that has the image

- Could Compose+Swarm solve this automatically?

---

## A few words about "sane defaults"

- *It would be nice if Swarm could pick a node, and build there!*

  - but which node should it pick?
  - what if the build is very expensive?
  - what if we want to distribute the build across nodes?
  - what if we want to tag some builder nodes?
  - ok but what if no node has been tagged?

- *It would be nice if Swarm could automatically push images!*

  - using the Docker Hub is an easy choice
    (you just need an account)
  - but some of us can't/won't use Docker Hub
    (for compliance reasons, or because they have no network access)

---

## The plan

- Build on a single node (`node1`)

- Tag images with the current UNIX timestamp (for simplicity)

- Upload them to a registry

- Update the Compose file to use those images

This is all automated with the [`build-tag-push.py` script](https://github.com/jpetazzo/orchestration-workshop/blob/master/bin/build-tag-push.py).
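
As a rough illustration of the tagging step (not the actual code of `build-tag-push.py`, which may differ), here is how timestamp-tagged image names can be put together. The `image_ref` helper and the `dockercoins` project prefix are assumptions made for this sketch.

```bash
# Hypothetical helper: build "registry/project_service:tag".
image_ref() {
  echo "${1}/${2}_${3}:${4}"
}

DOCKER_REGISTRY=localhost:5000
TAG=$(date +%s)   # current UNIX timestamp, e.g. 1462000000

# One unique image name per service that has to be built.
for SERVICE in rng hasher webui worker; do
  image_ref "$DOCKER_REGISTRY" dockercoins "$SERVICE" "$TAG"
done
```

Because the tag changes with every build, two different builds can never end up sharing the same image name on the cluster.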

---

## Which registry do we want to use?

- **Docker Hub**

  - hosted by Docker Inc.
  - requires an account (free, no credit card needed)
  - images will be public (unless you pay)
  - located in AWS EC2 us-east-1

- **Docker Trusted Registry**

  - self-hosted commercial product
  - requires a subscription (free 30-day trial available)
  - images can be public or private
  - located wherever you want

- **Docker open source registry**

  - self-hosted barebones repository hosting
  - doesn't require anything
  - doesn't come with anything either
  - located wherever you want

]

---

## Using Docker Hub

- Set the `DOCKER_REGISTRY` environment variable to your Docker Hub user name
 (the `build-tag-push.py` script prefixes each image name with that variable)

- We will also see how to run the open source registry
 (so use whatever option you want!)

- Set the following environment variable:
 `export DOCKER_REGISTRY=jpetazzo`

- (Use *your* Docker Hub login, of course!)

- Log into the Docker Hub:
 `docker login`

]

---

## Using Docker Trusted Registry

If we wanted to use DTR, we would:

- make sure we have a Docker Hub account
- [activate a Docker Datacenter subscription](
  https://hub.docker.com/enterprise/trial/)
- install DTR on our machines
- set `DOCKER_REGISTRY` to `dtraddress:port/user`

*This is out of the scope of this workshop!*

---

## Using open source registry

- We need to run a `registry:2` container
 (make sure you specify tag `:2` to run the new version!)

- It will store images and layers to the local filesystem
 (but you can add a config file to use S3, Swift, etc.)

- Docker *requires* TLS when communicating with the registry,
  except for registries on `localhost` or registries allowed with
  the Engine flag `--insecure-registry`

- Our strategy: run a reverse proxy on `localhost:5000` on each node

---

## Registry frontends and backend

![Registry frontends](registry-frontends.png)

---

# Deploying a local registry

- There is a Compose file for that

- Go to the `registry` directory in the repository:
  ```bash
  cd ~/orchestration-workshop/registry
  ```

]

Let's examine the `docker-compose.yml` file.

---

## Running a local registry with Compose

```yaml
version: "2"

services:
  backend:
    image: registry:2
  frontend:
    image: jpetazzo/hamba
    command: 5000 backend:5000
    ports:
      - "127.0.0.1:5000:5000"
    depends_on:
      - backend
```

- *Backend* is the actual registry.
- *Frontend* is the ambassador that we deployed earlier.
 
It communicates with *backend* using an internal network
and network aliases.

---

## Starting a local registry with Compose

- We will bring up the registry

- Then we will ensure that one *frontend* is running
  on each node by scaling it to our number of nodes

- Start the registry:
  ```bash
  docker-compose up -d
  ```

]

---

## "Scaling" the local registry

- This is a particular kind of scaling

- We just want to ensure that one *frontend*
  is running on every single node of the cluster

- Scale the registry:
  ```bash
    for N in $(seq 1 5); do
      docker-compose scale frontend=$N
    done
  ```

]

Note: Swarm might do that automatically for us in the future.

---

## Testing our local registry

- We can retag a small image, and push it to the registry

- Make sure we have the busybox image, and retag it:
  ```bash
  docker pull busybox
  docker tag busybox localhost:5000/busybox
  ```

- Push it:
  ```bash
  docker push localhost:5000/busybox
  ```

]

---

## Checking what's on our local registry

- The registry API has endpoints to query what's there

- Ensure that our busybox image is now in the local registry:
  ```bash
  curl http://localhost:5000/v2/_catalog
  ```

]

The curl command should output:
```json
{"repositories":["busybox"]}
```

---

## Adapting our Compose file to run on Swarm

- We can get rid of all the `ports` sections, except for the web UI

- Go back to the dockercoins directory:
  ```bash
  cd ~/orchestration-workshop/dockercoins
  ```

]

---

## Our new Compose file

```yaml
version: '2'

services:
  rng:
    build: rng

  hasher:
    build: hasher

  webui:
    build: webui
    ports:
    - "8000:80"

  redis:
    image: redis

  worker:
    build: worker
```
]

Copy-paste this into `docker-compose.yml`
 (or you can `cp docker-compose.yml-v2 docker-compose.yml`)

---

## Use images, not builds

- We need to replace each `build` with an `image`

- We will use the `build-tag-push.py` script for that

- Set `DOCKER_REGISTRY` to use our local registry

- Make sure that you are building on `node1`

- Then run the script

```bash
  export DOCKER_REGISTRY=localhost:5000
  eval $(docker-machine env node1)
  ../bin/build-tag-push.py
  ```

]

---

## Run the application

- At this point, our app is ready to run

- Start the application:
  ```bash
  export COMPOSE_FILE=docker-compose.yml-`NNN`
  eval $(docker-machine env node1 --swarm)
  docker-compose up -d
  ```

- Observe that it's running on multiple nodes:
 (each container name is prefixed with the node it's running on)
 ```bash
 docker ps
 ```

]

---

## View the performance graph

- Load up the graph in the browser

- Check the `webui` service address and port:
  ```bash
  docker-compose port webui 80
  ```

- Open it in your browser

]

---

## Scaling workers

- Scaling the `worker` service works out of the box
  (like before)

- Scale `worker`:
  ```bash
  docker-compose scale worker=10
  ```

]

Check that workers are on different nodes.

However, we hit the same bottleneck as before.

How can we address that?

---

## Finding the real cause of the bottleneck

- If time permits, we can benchmark `rng` and `hasher` to find out more

- Otherwise, we'll fast-forward a bit

---

## Benchmarking in isolation

- If we want the benchmark to be accurate, we need to make sure that `rng` and `hasher` are not receiving traffic

- Stop the `worker` containers:
  ```bash
  docker-compose kill worker
  ```

]

---

## A better benchmarking tool

- Instead of `httping`, we will now use `ab` (Apache Bench)

- We will install it in an `alpine` container placed on the network used by our application

- Start an interactive `alpine` container on the `dockercoins_default` network:
  ```bash
  docker run -ti --net dockercoins_default alpine sh
  ```

- Install `ab` with the `apache2-utils` package:
  ```bash
  apk add --update apache2-utils
  ```

]

---

## Benchmarking `rng`

We will send 50 requests, but with various levels of concurrency.

- Send 50 requests, with a single sequential client:
  ```bash
  ab -c 1 -n 50 http://rng/10
  ```

- Send 50 requests, with ten parallel clients:
  ```bash
  ab -c 10 -n 50 http://rng/10
  ```

]

---

## Benchmark results for `rng`

- In both cases, the benchmark takes ~5 seconds to complete

- When serving requests sequentially, they each take 100ms

- In the parallel scenario, the latency increased dramatically:

  - one request is served in 100ms
  - another is served in 200ms
  - another is served in 300ms
  - ...
  - another is served in 1000ms

- What about `hasher`?
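
The `rng` numbers above are what you get from a server that handles one request at a time. A small sketch of that model, assuming exactly 100 ms per request and no other overhead (variable names are ours):

```bash
SERVICE_MS=100   # time spent on a single request
REQUESTS=50
CLIENTS=10

# One request at a time: the total does not depend on the number of clients.
TOTAL_MS=$(( REQUESTS * SERVICE_MS ))
echo "total: $TOTAL_MS ms"

# Ten requests arrive together and are answered one after the other.
for POS in $(seq 1 $CLIENTS); do
  echo "request $POS answered after $(( POS * SERVICE_MS )) ms"
done
```

Adding clients to such a server only makes the queue longer; throughput stays at 10 requests per second.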

---

## Benchmarking `hasher`

We will do the same tests for `hasher`.

The command is slightly more complex, since we need to post random data.

First, we need to put the POST payload in a temporary file.

- Install curl in the container, and generate 10 bytes of random data:
  ```bash
  apk add curl
  curl http://rng/10 >/tmp/random
  ```

]

---

## Benchmarking `hasher`

Once again, we will send 50 requests, with different levels of concurrency.

- Send 50 requests with a sequential client:
  ```bash
    ab -c 1 -n 50 -T application/octet-stream \
       -p /tmp/random http://hasher/
  ```

- Send 50 requests with 10 parallel clients:
  ```bash
    ab -c 10 -n 50 -T application/octet-stream \
       -p /tmp/random http://hasher/
  ```

]

---

## Benchmark results for `hasher`

- The sequential benchmark takes ~5 seconds to complete

- The parallel benchmark takes less than 1 second to complete

- In both cases, each request takes a bit more than 100ms to complete

- Requests are a bit slower in the parallel benchmark

- It looks like `hasher` is better equipped to deal with concurrency than `rng`
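
These `hasher` timings match an idealized server that works on all in-flight requests at once. A sketch of that model, assuming 100 ms per request and ignoring the small slowdown seen in practice (the `total_ms` helper is made up):

```bash
SERVICE_MS=100
REQUESTS=50

# Hypothetical helper: requests proceed in waves of $1 parallel clients.
total_ms() {
  CLIENTS=$1
  WAVES=$(( (REQUESTS + CLIENTS - 1) / CLIENTS ))   # ceiling division
  echo $(( WAVES * SERVICE_MS ))
}

echo "1 client:   $(total_ms 1) ms"
echo "10 clients: $(total_ms 10) ms"
```

Ten clients finish in five waves of 100 ms each, well under one second, while each individual request still takes about 100 ms.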

---

Why?

---

## Why does everything take (at least) 100ms?

`rng` code:

![RNG code screenshot](delay-rng.png)

`hasher` code:

![HASHER code screenshot](delay-hasher.png)

---

But ...

WHY?!?

---

## Why did we sprinkle this sample app with sleeps?

- Deterministic performance
 (regardless of instance speed, CPUs, I/O...)

- Actual code sleeps all the time anyway

- When your code makes a remote API call:

  - it sends a request;

  - it sleeps until it gets the response;

  - it processes the response.

---

## Why do `rng` and `hasher` behave differently?

![Equations on a blackboard](equations.png)

(Synchronous vs. asynchronous event processing)

---

## How to make `rng` go faster

- Obvious solution: comment out the `sleep` instruction

- Unfortunately, in the real world, network latency exists

- More realistic solution: use an asynchronous framework
 (e.g. use gunicorn with gevent)

- Reminder: we can't change the code!

- Solution: scale out `rng`
 (dispatch `rng` requests on multiple instances)

---

# Scaling web services with Compose on Swarm

- We *can* scale network services with Compose

- The result may or may not be satisfactory, though!

- Restart the `worker` service:
  ```bash
  docker-compose start worker
  ```

- Scale the `rng` service:
  ```bash
  docker-compose scale rng=5
  ```

]

---

## Results

- In the web UI, you might see a performance increase ... or maybe not

- Since Engine 1.11, we get round-robin DNS records

(i.e. resolving `rng` will yield the IP addresses of all 5 containers)

- Docker randomizes the records it sends

- But many resolvers will sort them in unexpected ways

- Depending on various factors, you could get:

  - all traffic on a single container
  - traffic perfectly balanced on all containers
  - traffic unevenly balanced across containers

---

## Assessing DNS randomness

- Let's see how our containers resolve DNS requests

- On each of our 10 scaled workers, execute 5 ping requests:
  ```bash
    for N in $(seq 1 10); do
      echo PING__________$N
      for I in $(seq 1 5); do
        docker exec -ti dockercoins_worker_$N ping -c1 rng
      done
    done | grep PING
  ```

]

(The 7th Might Surprise You!)

---

## DNS randomness

- Other programs can yield different results

- Same program on another distro can yield different results

- Same source code with another libc or resolver can yield different results

- Running the same test at different times can yield different results

- Did I mention that Your Results May Vary?

---

## Implementing fair load balancing

- Instead of relying on DNS round robin, let's use a proper load balancer

- Use Compose to create multiple copies of the `rng` service

- Put a load balancer in front of them

- Point other services to the load balancer

---

## Naming problem

- The service is called `rng`

- Therefore, it is reachable with the network name `rng`

- Our application code (the `worker` service) connects to `rng`

- So the name `rng` should resolve to the load balancer

- What should we do‽

---

## Naming is *per-network*

- Solution: put `rng` on its own network

- That way, it doesn't take the network name `rng`
 (at least not on the default network)

- Have the load balancer sit on both networks

- Add the name `rng` to the load balancer

---

Original DockerCoins

![](dockercoins-single-node.png)

---

Load-balanced DockerCoins

![](dockercoins-multi-node.png)

---

## Declaring networks

- Networks (other than the default one)
  *must* be declared
  in a top-level `networks` section,
  placed anywhere in the file

- Add the `rng` network to the Compose file, `docker-compose.yml-NNN`:
  ```yaml
  version: '2'

  networks:
    rng:

  services:
    rng:
      image: ...
    ...
  ```

]

---

## Putting the `rng` service in its network

- Services can have a `networks` section

- If they don't: they are placed in the default network

- If they do: they are placed only in the mentioned networks

- Change the `rng` service to put it in its network:
  ```yaml
    rng:
      image: localhost:5000/dockercoins_rng:…
      networks:
        rng:
  ```

]

---

## Adding the load balancer

- The load balancer has to be in both networks: `rng` and `default`
- In the `default` network, it must have the `rng` alias
- We will use the `jpetazzo/hamba` image

- Add the `rng-lb` service to the Compose file:
  ```yaml
    rng-lb:
      image: jpetazzo/hamba
      command: run
      networks:
        rng:
        default:
          aliases: [ rng ]
  ```
]

---

## Load balancer initial configuration

- We specified `run` as the initial command

- This tells `hamba` to wait for an initial configuration

- The load balancer will not be operational (until we feed it its configuration)

---

## Start the application

- Bring up DockerCoins:
  ```bash
  docker-compose up -d
  ```

- See that `worker` is complaining:
  ```bash
  docker-compose logs --tail 100 --follow worker
  ```
]

---

## Add one backend to the load balancer

- Multiple solutions:

  - look up the IP address of the `rng` backend
  - use the backend's network name
  - use the backend's container name (easiest!)

- Configure the load balancer:
  ```bash
    docker run --rm --volumes-from dockercoins_rng-lb_1 \
                    --net container:dockercoins_rng-lb_1 \
                    jpetazzo/hamba reconfigure 80 dockercoins_rng_1 80
  ```

]

The application should now be working correctly.

---

## Add all backends to the load balancer

- The command is similar to the one before

- We need to pass the list of all backends

- Reconfigure the load balancer:
  ```bash
    docker run --rm \
      --volumes-from dockercoins_rng-lb_1 \
      --net container:dockercoins_rng-lb_1 \
      jpetazzo/hamba reconfigure 80 \
      $(for N in $(seq 1 5); do
          echo dockercoins_rng_$N:80
        done)
  ```

]

---

## Automating the process

- Nobody loves artisan YAML handicraft

- This can be scripted very easily

- But can it be fully automated?

---

## Use DNS to discover the addresses of all the backends

- When multiple containers have the same network alias:

  - Engine 1.10 returns only one of them (the same one across the whole network)

  - Engine 1.11 returns all of them (in a random order)

- A "smart" client can use all records to implement load balancing

- We can compose `jpetazzo/hamba` with a special-purpose container,
  which will dynamically generate HAProxy's configuration when
  the DNS records are updated

---

## Introducing `jpetazzo/watchdns`

- [100 lines of pure POSIX scriptery](
  https://github.com/jpetazzo/watchdns/blob/master/watchdns)

- Resolves a given DNS name every second

- Each time the result changes, a new HAProxy configuration is generated

- When used together with `--volumes-from` and `jpetazzo/hamba`, it
  updates the configuration of an existing load balancer

- Comes with a companion script, [`add-load-balancer-v2.py`](https://github.com/jpetazzo/orchestration-workshop/blob/master/bin/add-load-balancer-v2.py), to update your Compose files
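
The "regenerate when the result changes" idea can be sketched in a few lines. This is not the actual `watchdns` code: the `gen_backend` helper, the backend name, and the addresses are made up, and a real HAProxy configuration needs more than a backend section.

```bash
# Hypothetical helper: print an HAProxy backend section for a port
# and a list of addresses.
gen_backend() {
  PORT=$1; shift
  echo "backend rng"
  N=1
  for ADDR in "$@"; do
    echo "  server rng$N $ADDR:$PORT"
    N=$(( N + 1 ))
  done
}

# Simulate three successive DNS lookups; regenerate only on change.
PREVIOUS=""
for RESULT in "10.0.0.2" "10.0.0.2" "10.0.0.2 10.0.0.3"; do
  if [ "$RESULT" != "$PREVIOUS" ]; then
    gen_backend 80 $RESULT
    PREVIOUS=$RESULT
  fi
done
```

The second lookup returns the same address, so nothing is printed for it; the third one adds a backend and triggers a new configuration.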

---

## Using `jpetazzo/watchdns`

- First, revert the Compose file to remove the load balancer

- Then, run `add-load-balancer-v2.py`:
  ```bash
  ../bin/add-load-balancer-v2.py rng
  ```

- Inspect the resulting Compose file

]

---

## Scaling with `watchdns`

- Start the application with the new sidekick containers:
  ```bash
  docker-compose up -d
  ```

- Scale `rng`:
  ```bash
  docker-compose scale rng=10
  ```

- Check logs:
  ```bash
  docker-compose logs rng-wd
  ```

]

---

## Comments

- This is a very crude implementation of the pattern

- A Go version would only be a bit longer, but would use far fewer resources

- When there are many backends, reacting quickly to change is less important

(i.e. it's not necessary to re-resolve records every second!)

---

# All things ops (logs, backups, and more)

---

# Logs

- Two strategies:

  - log to plain files on volumes

  - log to stdout
    (and use a logging driver)

---

## Logging to plain files on volumes

(Sorry, that part won't be hands-on!)

- Start a container with `-v /logs`

- Make sure that all log files are in `/logs`

- To check logs, run e.g.

```bash
  docker run --volumes-from ... ubuntu sh -c "grep WARN /logs/*.log"
  ```

- Or just go interactive:

```bash
  docker run --volumes-from ... -ti ubuntu
  ```

- You can (should) start a log shipper that way

---

## Logging to stdout

- All containers should write to stdout/stderr

- Docker will collect logs and pass them to a logging driver

- The logging driver can be specified globally, and per container
 (changing it for a container overrides the global setting)

- To change the global logging driver, pass extra flags to the daemon
 (requires a daemon restart)

- To override the logging driver for a container, pass extra flags to `docker run`

---

## Specifying logging flags

- `--log-driver`

*selects the driver*

- `--log-opt key=val`

*adds driver-specific options*
 *(can be repeated multiple times)*

- The flags are identical for `docker daemon` and `docker run`

---

## Logging flags in practice

- If you provision your nodes with Docker Machine,
  you can set global logging flags (which will apply to all
  containers started by a given Engine) like this:

```bash
  docker-machine create ... --engine-opt log-driver=...
  ```

- Otherwise, use your favorite method to edit or manage configuration files

- You can set per-container logging options in Compose files

---

## Available drivers

- json-file (default)

- syslog (can send to UDP, TCP, TCP+TLS, UNIX sockets)

- awslogs (AWS CloudWatch)

- journald

- gelf

- fluentd

- splunk

---

## About json-file ...

- It doesn't rotate logs by default, so your disks will fill up

(Unless you set the `max-size` *and* `max-file` log options.)

- It's the only one that supports log retrieval

(If you want to use `docker logs`, `docker-compose logs`,
  or fetch logs from the Docker API, you need json-file!)

- This might change in the future

(But it's complex since there is no standard protocol
  to *retrieve* log entries.)

All about logging in the documentation:
https://docs.docker.com/reference/logging/overview/

---

# Setting up ELK to store container logs

*Important foreword: this is not an "official" or "recommended"
setup; it is just an example. We do not endorse ELK, GELF,
or the other elements of the stack more than others!*

What we will do:

- Spin up an ELK stack, with Compose

- Gaze at the spiffy Kibana web UI

- Manually send a few log entries over GELF

- Reconfigure our DockerCoins app to send logs to ELK

---

## What's in an ELK stack?

- ELK is three components:

  - ElasticSearch (to store and index log entries)

  - Logstash (to receive log entries from various
    sources, process them, and forward them to various
    destinations)

  - Kibana (to view/search log entries with a nice UI)

- The only component that we will configure is Logstash

- We will accept log entries using the GELF protocol

- Log entries will be stored in ElasticSearch,
 and displayed on Logstash's stdout for debugging

---

## Starting our ELK stack

- We will use a *separate* Compose file

- The Compose file is in the `elk` directory

- Go to the `elk` directory:
  ```bash
  cd ~/orchestration-workshop/elk
  ```

- Start the ELK stack:
  ```bash
  unset COMPOSE_FILE
  docker-compose up -d
  ```

]

---

## Making sure that each node has a local logstash

- We will configure each container to send logs to `localhost:12201`

- We need to make sure that each node has a logstash container listening on port 12201

- Scale the `logstash` service to 5 instances (one per node):
  ```bash
    for N in $(seq 1 5); do
      docker-compose scale logstash=$N
    done
  ```

]

---

## Checking that our ELK stack works

- Our default Logstash configuration sends a test
  message every minute

- All messages are stored in ElasticSearch,
  but also shown on Logstash stdout

- Look at Logstash stdout:
  ```bash
  docker-compose logs logstash
  ```

]

After less than one minute, you should see a `"message" => "ok"`
in the output.

---

## Connect to Kibana

- Our ELK stack exposes two public services:
 the Kibana web server, and the GELF UDP socket

- They are both exposed on their default port numbers
 (5601 for Kibana, 12201 for GELF)

- Check the address of the node running kibana:
  ```bash
  docker-compose ps
  ```

- Open the UI in your browser: http://instance-address:5601/

]

---

## "Configuring" Kibana

- If you see a status page with a yellow item, wait a minute and reload
  (Kibana is probably still initializing)

- Kibana should prompt you to "Configure an index pattern";
  just click the "Create" button

- Then:

  - click "Discover" (in the top-left corner)
  - click "Last 15 minutes" (in the top-right corner)
  - click "Last 1 hour" (in the list in the middle)
  - click "Auto-refresh" (top-right corner)
  - click "5 seconds" (top-left of the list)

- You should see a series of green bars (with one new green bar every minute)

---

![Screenshot of Kibana](kibana.png)

---

## Sending container output to Kibana

- We will create a simple container displaying "hello world"

- We will override the container logging driver

- The GELF address is `127.0.0.1:12201`, because the Compose file
  explicitly exposes the GELF socket on port 12201

- Start our one-off container:

```bash
    docker run --rm --log-driver gelf \
           --log-opt gelf-address=udp://127.0.0.1:12201 \
           alpine echo hello world
  ```

]
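
If you expect to run several one-off containers this way, the logging flags can be wrapped in a small shell function. This is only a sketch: `run_with_gelf` and the `GELF_ADDRESS` variable are hypothetical names, and the default address is the one used on this slide.

```bash
# Assumed default: the GELF address used throughout this section.
GELF_ADDRESS="${GELF_ADDRESS:-udp://127.0.0.1:12201}"

# Hypothetical wrapper: run a one-off container with the GELF logging driver.
# All arguments are passed through to `docker run`.
run_with_gelf() {
  docker run --rm --log-driver gelf \
         --log-opt "gelf-address=${GELF_ADDRESS}" \
         "$@"
}

# Usage:
#   run_with_gelf alpine echo hello world
```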

---

## Visualizing container logs in Kibana

- Less than 5 seconds later (the refresh rate of the UI),
  the log line should be visible in the web UI

- We can customize the web UI to be more readable

- In the left column, move the mouse over the following
  columns, and click the "Add" button that appears:

  - host
  - container_name
  - message

]

---

## Switching back to the DockerCoins application

- Go back to the dockercoins directory:
  ```bash
  cd ~/orchestration-workshop/dockercoins
  ```

- Set the `COMPOSE_FILE` variable:
  ```bash
  export COMPOSE_FILE=docker-compose.yml-`NNN`
  ```

]

---

## Add the logging driver to the Compose file

- We need to add the logging section to each container

- Edit the `docker-compose.yml-NNN` file, adding the following lines **to each container**:

  ```yaml
    logging:
      driver: gelf
      options:
        gelf-address: "udp://127.0.0.1:12201"
  ```

]

There is also a script, [`../bin/add-logging.py`](https://github.com/jpetazzo/orchestration-workshop/blob/master/bin/add-logging.py), to do that automatically.
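
When editing by hand, it is easy to miss a service. Here is a rough check, assuming the Compose file has a top-level `services:` key, service names indented by two spaces, and their keys indented by four. `services_without_logging` is a hypothetical helper; it is not a YAML parser and will be fooled by other layouts.

```bash
# Hypothetical helper: list services that have no `logging:` section.
# $1 is the path to the Compose file.
services_without_logging() {
  awk '
    /^services:/ { in_services = 1; next }
    /^[^ #]/     { in_services = 0 }          # any other top-level key ends the section
    in_services && /^  [^ #][^:]*:/ {         # two-space indent: a service name
      if (name != "" && !has_logging) print name
      name = $1; sub(/:$/, "", name); has_logging = 0
      next
    }
    in_services && /^    logging:/ { has_logging = 1 }
    END { if (name != "" && !has_logging) print name }
  ' "$1"
}

# Usage:
#   services_without_logging docker-compose.yml-NNN
```

An empty output means every service has a `logging:` section.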

---

## Update the DockerCoins app

- Use Compose normally:
  ```bash
  docker-compose up -d
  ```

]

If you look in the Kibana web UI, you will see log lines
refreshed every 5 seconds.

Note: to do interesting things (graphs, searches...) we
would need to create indexes. This is beyond the scope
of this workshop.

---

## Logging in production

- If we were using an ELK stack:

  - scale Elasticsearch
  - interpose a Redis or Kafka queue to deal with bursts

- Configure your Engines to send all logs to ELK by default

- Start the logging containers with a different logging system
 (to avoid a logging loop)

- Make sure you don't end up writing *all logs* on the nodes running Logstash!
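
The "send all logs to ELK by default" item can be sketched as an Engine configuration file. This assumes an Engine that reads `/etc/docker/daemon.json` and accepts the `log-driver` and `log-opts` keys there; `gelf_daemon_config` is a hypothetical helper that only prints the file content.

```bash
# Hypothetical helper: print an Engine configuration that makes GELF
# the default logging driver. $1 is the GELF address (assumed default below).
gelf_daemon_config() {
  local address="${1:-udp://127.0.0.1:12201}"
  cat <<EOF
{
  "log-driver": "gelf",
  "log-opts": {
    "gelf-address": "${address}"
  }
}
EOF
}

# Usage (this overwrites any existing configuration, so review it first;
# the Engine must be restarted afterwards):
#   gelf_daemon_config | sudo tee /etc/docker/daemon.json
```

With such a default in place, the logging containers themselves would be started with another driver (for instance `--log-driver json-file`) to avoid the logging loop mentioned above.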

---

# Network traffic analysis

- We want to inspect the network traffic entering/leaving `dockercoins_redis_1`

- We will use *shared network namespaces* to perform network analysis

- Two containers sharing the same network namespace...

  - have the same IP addresses

  - have the same network interfaces

- `eth0` is therefore the same in both containers

---

## Install and start `ngrep`

Ngrep uses libpcap (like tcpdump) to sniff network traffic.

- Start a container with the same network namespace:
 `docker run --net container:dockercoins_redis_1 -ti alpine sh`

- Install ngrep:
 `apk update && apk add ngrep`

- Run ngrep:
 `ngrep -tpd eth0 -Wbyline . tcp`

]

You should see a stream of Redis requests and responses.
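
The three steps can be combined into a single command. This is a sketch under the same assumptions as the slide (the `alpine` image, the `ngrep` package, interface `eth0`); `sniff_container` is a hypothetical helper name.

```bash
# Hypothetical helper: sniff the TCP traffic of a running container.
# $1 is the target container name (defaults to the Redis container above).
sniff_container() {
  local target="${1:-dockercoins_redis_1}"
  docker run --rm -ti --net "container:${target}" alpine \
         sh -c 'apk update && apk add ngrep && ngrep -tpd eth0 -Wbyline . tcp'
}

# Usage (stop with Ctrl-C):
#   sniff_container dockercoins_redis_1
```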

---

# Backups

- We want to enable backups for `dockercoins_redis_1`

- We don't want to install extra software in this container

- We will use a special backup container:

  - sharing the same volumes

  - using the same network stack (to connect to it easily)

  - possibly containing our backup tools

- This works because the `redis` container image stores its data on a volume

---

## Starting the backup container

- We will use the `--net container:` option to be able to connect locally

- We will use the `--volumes-from` option to access the container's persistent data

- Start the container:

  ```bash
    docker run --net container:dockercoins_redis_1 \
               --volumes-from dockercoins_redis_1:ro \
               -v /tmp/myredis:/output \
               -ti alpine sh
  ```

- Look in `/data` in the container (that's where Redis puts its data dumps)
]

---

## Connecting to Redis

- We need to tell Redis to perform a data dump *now*

- Connect to Redis:
  ```bash
  telnet localhost 6379
  ```

- Issue the commands `SAVE`, then `QUIT`

- Look at `/data` again (notice the time stamps)

]

- There should be a recent dump file now!
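
The same exchange can be scripted instead of typed. This sketch assumes that `nc` is available in the backup container (BusyBox normally provides it in `alpine`) and that Redis accepts the plain-text commands used in the telnet session; `redis_save_request` and `redis_save` are hypothetical names.

```bash
# Print the two commands typed in the telnet session,
# one per line, each terminated with CRLF.
redis_save_request() {
  printf 'SAVE\r\nQUIT\r\n'
}

# Send them to the Redis server reachable on localhost
# (to be run inside the backup container).
redis_save() {
  redis_save_request | nc localhost 6379
}
```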

---

## Getting the dump out of the container

- We could use many things:

  - s3cmd to copy to S3
  - SSH to copy to a remote host
  - gzip/bzip2/etc. before copying

- We'll just copy it to the Docker host

- Copy the file from `/data` to `/output`

- Exit the container

- Look into `/tmp/myredis` (on the host)

]
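
The copy step can be scripted too. This is a minimal sketch, assuming the dump file is named `dump.rdb` (the Redis default; adjust if your server is configured differently). `copy_dump` is a hypothetical helper, and the timestamped file name is just one way to keep successive backups apart.

```bash
# Hypothetical helper: copy the Redis dump to the output directory,
# adding a timestamp to the file name.
# $1 is the data directory, $2 the output directory.
copy_dump() {
  local src_dir="${1:-/data}"
  local out_dir="${2:-/output}"
  local stamp
  stamp="$(date +%Y%m%d-%H%M%S)"
  cp "${src_dir}/dump.rdb" "${out_dir}/dump-${stamp}.rdb"
}

# Usage (inside the backup container):
#   copy_dump /data /output
```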

---

## Scheduling backups

In the "old world," we (generally) use cron.

With containers, what are our options?

- run `cron` on the Docker host, and put `docker run` in the crontab

- run `cron` in the backup container, and make sure it keeps running
 (e.g. with `docker run --restart=…`)

- run `cron` in a container, and start backup containers from there

- listen to the Docker events stream, automatically scheduling backups
 when database containers are started
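
As an illustration of the first option, here is what a crontab entry could look like. This is a sketch only: `backup_cron_line` is a hypothetical helper, the hourly schedule is arbitrary, and the entry copies whatever dump is already in `/data` (it does not trigger a `SAVE`). Note that there is no `-ti` flag, since cron jobs have no terminal.

```bash
# Hypothetical helper: print a crontab entry that copies the dump of the
# given container ($1) to /tmp/myredis on the Docker host, every hour.
backup_cron_line() {
  local target="${1:-dockercoins_redis_1}"
  printf '0 * * * * docker run --rm --net container:%s --volumes-from %s:ro -v /tmp/myredis:/output alpine sh -c %s\n' \
         "$target" "$target" "'cp /data/* /output/'"
}

# Usage: append the entry to the current user's crontab.
#   ( crontab -l 2>/dev/null; backup_cron_line ) | crontab -
```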

---

# Controlling Docker from a container

- In a local environment, just bind-mount the Docker control socket:
  ```bash
  docker run -ti -v /var/run/docker.sock:/var/run/docker.sock docker
  ```

- Otherwise, you have to:

  - set `DOCKER_HOST`,
  - set `DOCKER_TLS_VERIFY` and `DOCKER_CERT_PATH` (if you use TLS),
  - copy certificates to the container that will need API access.
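
These three steps could be chained as follows. This is a sketch, assuming TLS is in use, that `DOCKER_HOST`, `DOCKER_TLS_VERIFY` and `DOCKER_CERT_PATH` are already set in your shell, and that the official `docker` image is used. `run_with_api_access` and the `/certs` path inside the container are hypothetical choices.

```bash
# Hypothetical helper: create a container, copy the TLS certificates into it,
# then start it with the environment needed to reach the Docker API.
# Arguments are the command to run in the container.
run_with_api_access() {
  local cid
  cid="$(docker create -ti \
           -e DOCKER_HOST="${DOCKER_HOST}" \
           -e DOCKER_TLS_VERIFY="${DOCKER_TLS_VERIFY}" \
           -e DOCKER_CERT_PATH=/certs \
           docker "$@")" &&
  docker cp "${DOCKER_CERT_PATH}" "${cid}:/certs" &&
  docker start -ai "${cid}"
}

# Usage (the container is not removed automatically afterwards):
#   run_with_api_access docker version
```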