<!DOCTYPE html>
|
||
<html>
|
||
<head>
|
||
<base target="_blank">
|
||
<title>Docker Orchestration Workshop</title>
|
||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
|
||
<style type="text/css">
|
||
@import url(https://fonts.googleapis.com/css?family=Yanone+Kaffeesatz);
|
||
@import url(https://fonts.googleapis.com/css?family=Droid+Serif:400,700,400italic);
|
||
@import url(https://fonts.googleapis.com/css?family=Ubuntu+Mono:400,700,400italic);
|
||
|
||
body { font-family: 'Droid Serif'; }
|
||
|
||
h1, h2, h3 {
|
||
font-family: 'Yanone Kaffeesatz';
|
||
font-weight: normal;
|
||
margin-top: 0.5em;
|
||
}
|
||
a {
|
||
text-decoration: none;
|
||
color: blue;
|
||
}
|
||
.remark-slide-content { padding: 1em 2.5em 1em 2.5em; }
|
||
|
||
.remark-slide-content { font-size: 25px; }
|
||
.remark-slide-content h1 { font-size: 50px; }
|
||
.remark-slide-content h2 { font-size: 50px; }
|
||
.remark-slide-content h3 { font-size: 25px; }
|
||
.remark-code { font-size: 25px; }
|
||
.small .remark-code { font-size: 16px; }
|
||
|
||
.remark-code, .remark-inline-code { font-family: 'Ubuntu Mono'; }
|
||
.red { color: #fa0000; }
|
||
.gray { color: #ccc; }
|
||
.small { font-size: 70%; }
|
||
.big { font-size: 140%; }
|
||
.underline { text-decoration: underline; }
|
||
.strike { text-decoration: line-through; }
|
||
.pic {
|
||
vertical-align: middle;
|
||
text-align: center;
|
||
padding: 0 0 0 0 !important;
|
||
}
|
||
img {
|
||
max-width: 100%;
|
||
max-height: 550px;
|
||
}
|
||
.title {
|
||
vertical-align: middle;
|
||
text-align: center;
|
||
}
|
||
.title h1 { font-size: 100px; }
|
||
.title p { font-size: 100px; }
|
||
.quote {
|
||
background: #eee;
|
||
border-left: 10px solid #ccc;
|
||
margin: 1.5em 10px;
|
||
padding: 0.5em 10px;
|
||
quotes: "\201C""\201D""\2018""\2019";
|
||
font-style: italic;
|
||
}
|
||
.quote:before {
|
||
color: #ccc;
|
||
content: open-quote;
|
||
font-size: 4em;
|
||
line-height: 0.1em;
|
||
margin-right: 0.25em;
|
||
vertical-align: -0.4em;
|
||
}
|
||
.quote p {
|
||
display: inline;
|
||
}
|
||
.warning {
|
||
background-image: url("warning.png");
|
||
background-size: 1.5em;
|
||
background-repeat: no-repeat;
|
||
padding-left: 2em;
|
||
}
|
||
.exercise {
|
||
background-color: #eee;
|
||
background-image: url("keyboard.png");
|
||
background-size: 1.4em;
|
||
background-repeat: no-repeat;
|
||
background-position: 0.2em 0.2em;
|
||
border: 2px dotted black;
|
||
}
|
||
.exercise::before {
|
||
content: "Exercise";
|
||
margin-left: 1.8em;
|
||
}
|
||
li p { line-height: 1.25em; }
|
||
</style>
|
||
</head>
|
||
<body>
|
||
<textarea id="source">
|
||
|
||
class: title
|
||
|
||
Docker <br/> Orchestration <br/> Workshop
|
||
|
||
---
|
||
|
||
## Intros
|
||
|
||
- Hello! We are
|
||
AJ ([@s0ulshake](https://twitter.com/s0ulshake))
|
||
&
|
||
Jérôme ([@jpetazzo](https://twitter.com/jpetazzo))
|
||
|
||
--
|
||
|
||
- Who are you?
|
||
|
||
--
|
||
|
||
- first time attending SCALE?
|
||
|
||
--
|
||
|
||
- already attended Docker workshops at SCALE?
|
||
|
||
--
|
||
|
||
- Special thanks to SCALE ...
|
||
|
||
???
|
||
|
||
- This is our collective Docker knowledge:
|
||
|
||

|
||
|
||
<!--
|
||
Reminder, when updating the agenda: when people are told to show
|
||
up at 9am, they usually trickle in until 9:30am (except for paid
|
||
training sessions). If you're not sure that people will be there
|
||
on time, it's a good idea to have a breakfast with the attendees
|
||
at e.g. 9am, and start at 9:30.
|
||
-->
|
||
|
||
???
|
||
|
||
## Agenda
|
||
|
||
<!--
|
||
- Agenda:
|
||
-->
|
||
|
||
.small[
|
||
- 09:00-09:15 hello
|
||
- 09:15-10:45 part 1
|
||
- 10:45-11:00 coffee break
|
||
- 11:00-12:30 part 2
|
||
- 12:30-13:30 lunch break
|
||
- 13:30-15:00 part 3
|
||
- 15:00-15:15 coffee break
|
||
- 15:15-16:45 part 4
|
||
- 16:45-17:00 Q&A
|
||
]
|
||
|
||
<!--
|
||
- The tutorial will run from 1pm to 5pm
|
||
- This will be fast-paced, but DON'T PANIC!
|
||
- We will do short breaks for coffee + QA every hour
|
||
-->
|
||
|
||
- Feel free to interrupt for questions at any time
|
||
|
||
- Live feedback, questions, help on
|
||
[Slack](http://container.training/chat)
|
||
([get an invite](http://lisainvite.herokuapp.com/))
|
||
|
||
- All the content is publicly available (slides, code samples, scripts)
|
||
|
||
<!--
|
||
Remember to change:
|
||
- the link above
|
||
- the "tweet my speed" hashtag in DockerCoins HTML
|
||
-->
|
||
|
||
---
|
||
|
||
## Disclaimer
|
||
|
||
- This will be slightly different from the posted abstract
|
||
|
||
- Lots of things happened between the CFP and today
|
||
|
||
--
|
||
|
||
- Docker 1.13
|
||
|
||
--
|
||
|
||
- Docker 17.03
|
||
|
||
--
|
||
|
||
- There is enough content here for a whole day
|
||
|
||
- We will cover about half of the program
|
||
|
||
- And I'll give you ways to continue learning on your own, should you choose to!
|
||
|
||
???
|
||
|
||
## A brief introduction
|
||
|
||
- This was initially written to support in-person,
|
||
instructor-led workshops and tutorials
|
||
|
||
- You can also follow along on your own, at your own pace
|
||
|
||
- We included as much information as possible in these slides
|
||
|
||
- We recommend having a mentor to help you ...
|
||
|
||
- ... Or be comfortable spending some time reading the Docker
|
||
[documentation](https://docs.docker.com/) ...
|
||
|
||
- ... And looking for answers in the [Docker forums](https://forums.docker.com),
|
||
[StackOverflow](http://stackoverflow.com/questions/tagged/docker),
|
||
and other outlets
|
||
|
||
???
|
||
|
||
## Hands on, you shall practice
|
||
|
||
- Nobody ever became a Jedi by spending their lives reading Wookieepedia
|
||
|
||
- Likewise, it will take more than merely *reading* these slides
|
||
to make you an expert
|
||
|
||
- These slides include *tons* of exercises
|
||
|
||
- They assume that you have access to a cluster of Docker nodes
|
||
|
||
- If you are attending a workshop or tutorial:
|
||
<br/>you will be given specific instructions to access your cluster
|
||
|
||
- If you are doing this on your own:
|
||
<br/>you can use
|
||
[Play-With-Docker](http://www.play-with-docker.com/) and
|
||
read [these instructions](https://github.com/jpetazzo/orchestration-workshop#using-play-with-docker) for extra
|
||
details
|
||
|
||
???
|
||
|
||
<!--
|
||
grep '^# ' index.html | grep -v '<br' | tr '#' '-'
|
||
-->
|
||
|
||
## Chapter 1: getting started
|
||
|
||
- Pre-requirements
|
||
- VM environment
|
||
- Our sample application
|
||
- Running the application
|
||
- Identifying bottlenecks
|
||
- Introducing SwarmKit
|
||
|
||
???
|
||
|
||
## Chapter 2: scaling out our app on Swarm
|
||
|
||
- Creating our first Swarm
|
||
- Running our first Swarm service
|
||
- Deploying a local registry
|
||
- Overlay networks
|
||
- Breaking into an overlay network
|
||
- Securing overlay networks
|
||
|
||
???
|
||
|
||
## Chapter 3: operating the Swarm
|
||
|
||
- Rolling updates
|
||
- Secrets management and encryption at rest
|
||
- [Centralized logging](#logging)
|
||
- Metrics collection
|
||
|
||
???
|
||
|
||
## Chapter 4: deeper in Swarm
|
||
|
||
- Dealing with stateful services
|
||
- Scripting image building and pushing
|
||
- Integration with Compose
|
||
- Controlling Docker from a container
|
||
- Node management
|
||
- What's next?
|
||
|
||
---
|
||
|
||
# Pre-requirements
|
||
|
||
- Computer with internet connection and a web browser
|
||
|
||
- For instructor-led workshops: an SSH client to connect to remote machines
|
||
|
||
- on Linux, OS X, FreeBSD... you are probably all set
|
||
|
||
- on Windows, get [putty](http://www.putty.org/),
|
||
Microsoft [Win32 OpenSSH](https://github.com/PowerShell/Win32-OpenSSH/wiki/Install-Win32-OpenSSH),
|
||
[Git BASH](https://git-for-windows.github.io/), or
|
||
[MobaXterm](http://mobaxterm.mobatek.net/)
|
||
|
||
- For self-paced learning: SSH is not necessary if you use
|
||
[Play-With-Docker](http://www.play-with-docker.com/)
|
||
|
||
- Some Docker knowledge
|
||
|
||
(but that's OK if you're not a Docker expert!)
|
||
|
||
---
|
||
|
||
## Nice-to-haves
|
||
|
||
- [Mosh](https://mosh.org/) instead of SSH, if your internet connection tends to lose packets
|
||
<br/>(available with `(apt|yum|brew) install mosh`; then connect with `mosh user@host`)
|
||
|
||
- [GitHub](https://github.com/join) account
|
||
<br/>(if you want to fork the repo; also used to join Gitter)
|
||
|
||
<!--
|
||
|
||
- [Gitter](https://gitter.im/) account
|
||
<br/>(to join the conversation during the workshop)
|
||
|
||
-->
|
||
|
||
- [Slack](https://community.docker.com/registrations/groups/4316) account
|
||
<br/>(to join the conversation during the workshop)
|
||
|
||
- [Docker Hub](https://hub.docker.com) account
|
||
<br/>(it's one way to distribute images on your Swarm cluster)
|
||
|
||
---
|
||
|
||
## Hands-on sections
|
||
|
||
- The whole workshop is hands-on
|
||
|
||
- We will see Docker 17.03 in action
|
||
|
||
- You are invited to reproduce all the demos
|
||
|
||
- All hands-on sections are clearly identified, like the gray rectangle below
|
||
|
||
.exercise[
|
||
|
||
- This is the stuff you're supposed to do!
|
||
- Go to [container.training](http://container.training/) to view these slides
|
||
- Join the chat room on
|
||
[Gitter](http://container.training/chat)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
# VM environment
|
||
|
||
- To follow along, you need a cluster of five Docker Engines
|
||
|
||
<!--
|
||
|
||
- If you are doing this with an instructor, see next slide
|
||
|
||
- If you are doing (or re-doing) this on your own, you can:
|
||
|
||
-->
|
||
|
||
- You can ...
|
||
|
||
- create your own cluster (local or cloud VMs) with Docker Machine
|
||
([instructions](https://github.com/jpetazzo/orchestration-workshop/tree/master/prepare-machine))
|
||
|
||
- use [Play-With-Docker](http://play-with-docker.com) ([instructions](https://github.com/jpetazzo/orchestration-workshop#using-play-with-docker))
|
||
|
||
- create a bunch of clusters for you and your friends
|
||
([instructions](https://github.com/jpetazzo/orchestration-workshop/tree/master/prepare-vms))
|
||
|
||
???
|
||
|
||
class: pic
|
||
|
||

|
||
|
||
---
|
||
|
||
<!--
|
||
## You get five VMs
|
||
-->
|
||
|
||
## Some of you get five VMs
|
||
|
||
- Each person gets 5 private VMs (not shared with anybody else)
|
||
- They'll remain up until the day after the tutorial
|
||
- You should have a little card with login+password+IP addresses
|
||
- You can automatically SSH from one VM to another
|
||
|
||
.exercise[
|
||
|
||
<!--
|
||
```bash
|
||
for N in $(seq 1 5); do
|
||
ssh -o StrictHostKeyChecking=no node$N true
|
||
done
|
||
for N in $(seq 1 5); do
|
||
(
|
||
docker-machine rm -f node$N
|
||
ssh node$N "docker ps -aq | xargs -r docker rm -f"
|
||
ssh node$N sudo rm -f /etc/systemd/system/docker.service
|
||
ssh node$N sudo systemctl daemon-reload
|
||
echo Restarting node$N.
|
||
ssh node$N sudo systemctl restart docker
|
||
echo Restarted node$N.
|
||
) &
|
||
done
|
||
wait
|
||
```
|
||
-->
|
||
|
||
- Log into the first VM (`node1`) with SSH or MOSH
|
||
- Check that you can SSH (without password) to `node2`:
|
||
```bash
|
||
ssh node2
|
||
```
|
||
- Type `exit` or `^D` to come back to node1
|
||
|
||
<!--
|
||
```meta
|
||
^D
|
||
```
|
||
-->
|
||
|
||
]
|
||
|
||
---
|
||
|
||
<!--
|
||
## If doing or re-doing the workshop on your own ...
|
||
|
||
|
||
- If you use Play-With-Docker:
|
||
-->
|
||
|
||
## Everybody else: use Play-With-Docker!
|
||
|
||
- Main differences:
|
||
|
||
- you don't need to SSH to the machines
|
||
<br/>(just click on the node that you want to control in the left tab bar)
|
||
|
||
- Play-With-Docker automagically detects exposed ports
|
||
<br/>(and displays them as little badges with port numbers, above the terminal)
|
||
|
||
- You can access HTTP services by clicking on the port numbers
|
||
|
||
- exposing TCP services requires something like
|
||
[ngrok](https://ngrok.com/)
|
||
or [supergrok](https://github.com/jpetazzo/orchestration-workshop#using-play-with-docker)
|
||
|
||
<!--
|
||
|
||
- If you use VMs deployed with Docker Machine:
|
||
|
||
- you won't have pre-authorized SSH keys to bounce across machines
|
||
|
||
- you won't have host aliases
|
||
|
||
-->
|
||
|
||
---
|
||
|
||
## We will (mostly) interact with node1 only
|
||
|
||
- Unless instructed, **all commands must be run from the first VM, `node1`**
|
||
|
||
- We will only checkout/copy the code on `node1`
|
||
|
||
- When we will use the other nodes, we will do it mostly through the Docker API
|
||
|
||
- We will log into other nodes only for initial setup and a few "out of band" operations
|
||
<br/>(checking internal logs, debugging...)
|
||
|
||
---
|
||
|
||
## Terminals
|
||
|
||
Once in a while, the instructions will say:
|
||
<br/>"Open a new terminal."
|
||
|
||
There are multiple ways to do this:
|
||
|
||
- create a new window or tab on your machine, and SSH into the VM;
|
||
|
||
- use screen or tmux on the VM and open a new window from there.
|
||
|
||
You are welcome to use the method that you feel the most comfortable with.
|
||
|
||
---
|
||
|
||
## Tmux cheatsheet
|
||
|
||
- Ctrl-b c → creates a new window
|
||
- Ctrl-b n → go to next window
|
||
- Ctrl-b p → go to previous window
|
||
- Ctrl-b " → split window top/bottom
|
||
- Ctrl-b % → split window left/right
|
||
- Ctrl-b Alt-1 → rearrange windows in columns
|
||
- Ctrl-b Alt-2 → rearrange windows in rows
|
||
- Ctrl-b arrows → navigate to other windows
|
||
- Ctrl-b d → detach session
|
||
- tmux attach → reattach to session
|
||
|
||
---
|
||
|
||
## Brand new versions!
|
||
|
||
- Engine 17.03
|
||
- Compose 1.11
|
||
- Machine 0.9
|
||
|
||
.exercise[
|
||
|
||
- Check all installed versions:
|
||
```bash
|
||
docker version
|
||
docker-compose -v
|
||
docker-machine -v
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Wait, what, 17.03 ?!?
|
||
|
||
--
|
||
|
||
- Docker inc. [announced yesterday](https://blog.docker.com/2017/03/docker-enterprise-edition/)
|
||
Docker Enterprise Edition
|
||
|
||
--
|
||
|
||
- Docker EE:
|
||
|
||
- $$$
|
||
- certification for select distros, clouds, and plugins
|
||
- advanced management features (fine-grained access control, security scanning...)
|
||
|
||
- Docker CE:
|
||
|
||
- free
|
||
- available through Docker for Mac, Docker for Windows, and major Linux distros
|
||
- perfect for individuals and small organizations
|
||
|
||
---
|
||
|
||
## Why?
|
||
|
||
- More readable for enterprise users
|
||
|
||
(i.e. the very nice folks who are kind enough to pay us big $$$ for our stuff)
|
||
|
||
- No impact for the community
|
||
|
||
(beyond CE/EE suffix and version numbering change)
|
||
|
||
- Both trains leverage the same open source components
|
||
|
||
(containerd, libcontainer, SwarmKit...)
|
||
|
||
- More predictable release schedule (see next slide)
|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||

|
||
|
||
---
|
||
|
||
class: title
|
||
|
||
All right!
|
||
<br/>
|
||
We're all set.
|
||
<br/>
|
||
Let's do this.
|
||
|
||
---
|
||
|
||
name: part-1
|
||
|
||
class: title
|
||
|
||
Part 1
|
||
|
||
---
|
||
|
||
# Our sample application
|
||
|
||
- Visit the GitHub repository with all the materials of this workshop:
|
||
<br/>https://github.com/jpetazzo/orchestration-workshop
|
||
|
||
- The application is in the [dockercoins](
|
||
https://github.com/jpetazzo/orchestration-workshop/tree/master/dockercoins)
|
||
subdirectory
|
||
|
||
- Let's look at the general layout of the source code:
|
||
|
||
there is a Compose file [docker-compose.yml](
|
||
https://github.com/jpetazzo/orchestration-workshop/blob/master/dockercoins/docker-compose.yml) ...
|
||
|
||
... and 4 other services, each in its own directory:
|
||
|
||
- `rng` = web service generating random bytes
|
||
- `hasher` = web service computing hash of POSTed data
|
||
- `worker` = background process using `rng` and `hasher`
|
||
- `webui` = web interface to watch progress
|
||
|
||
???
|
||
---
|
||
|
||
## Compose file format version
|
||
|
||
*Particularly relevant if you have used Compose before...*
|
||
|
||
- Compose 1.6 introduced support for a new Compose file format (aka "v2")
|
||
|
||
- Services are no longer at the top level, but under a `services` section
|
||
|
||
- There has to be a `version` key at the top level, with value `"2"` (as a string, not an integer)
|
||
|
||
- Containers are placed on a dedicated network, making links unnecessary
|
||
|
||
- There are other minor differences, but upgrade is easy and straightforward
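
A minimal sketch of what a v2 Compose file looks like (the service names and images below are illustrative, not taken from DockerCoins):

```yaml
version: "2"

services:
  web:
    build: web          # built from ./web/Dockerfile
    ports:
      - "8000:80"
  redis:
    image: redis        # reachable from other services simply as "redis"
```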
|
||
|
||
---
|
||
|
||
## Links, naming, and service discovery
|
||
|
||
- Containers can have network aliases (resolvable through DNS)
|
||
|
||
- Compose file version 2 makes each container reachable through its service name
|
||
|
||
- Compose file version 1 requires "links" sections
|
||
|
||
- Our code can connect to services using their short name
|
||
|
||
(instead of e.g. IP address or FQDN)
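
For instance, once the application is up (we start it in the next section), you can check name resolution from inside a container. This is just a sketch, assuming the DockerCoins Compose project with its `worker` and `rng` services:

```bash
# Run from the dockercoins directory, while the app is up
docker-compose exec worker python -c 'import socket; print(socket.gethostbyname("rng"))'
```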
|
||
|
||
---
|
||
|
||
## Example in `worker/worker.py`
|
||
|
||

|
||
|
||
---
|
||
|
||
## What's this application?
|
||
|
||
---
|
||
|
||
class: pic
|
||
|
||

|
||
|
||
(DockerCoins 2016 logo courtesy of [@XtlCnslt](https://twitter.com/xtlcnslt) and [@ndeloof](https://twitter.com/ndeloof). Thanks!)
|
||
|
||
---
|
||
|
||
## What's this application?
|
||
|
||
- It is a DockerCoin miner! 💰🐳📦🚢
|
||
|
||
- No, you can't buy coffee with DockerCoins
|
||
|
||
- How DockerCoins works:
|
||
|
||
- `worker` asks `rng` to give it random bytes
|
||
- `worker` feeds those random bytes into `hasher`
|
||
- each hash starting with `0` is a DockerCoin
|
||
- DockerCoins are stored in `redis`
|
||
- `redis` is also updated every second to track speed
|
||
- you can see the progress with the `webui`
|
||
|
||
---
|
||
|
||
## Getting the application source code
|
||
|
||
- We will clone the GitHub repository
|
||
|
||
- The repository also contains scripts and tools that we will use through the workshop
|
||
|
||
.exercise[
|
||
|
||
<!--
|
||
```bash
|
||
[ -d orchestration-workshop ] && mv orchestration-workshop orchestration-workshop.$$
|
||
```
|
||
-->
|
||
|
||
- Clone the repository on `node1`:
|
||
```bash
|
||
git clone git://github.com/jpetazzo/orchestration-workshop
|
||
```
|
||
|
||
]
|
||
|
||
(You can also fork the repository on GitHub and clone your fork if you prefer that.)
|
||
|
||
---
|
||
|
||
# Running the application
|
||
|
||
Without further ado, let's start our application.
|
||
|
||
.exercise[
|
||
|
||
- Go to the `dockercoins` directory, in the cloned repo:
|
||
```bash
|
||
cd ~/orchestration-workshop/dockercoins
|
||
```
|
||
|
||
- Use Compose to build and run all containers:
|
||
```bash
|
||
docker-compose up
|
||
```
|
||
|
||
]
|
||
|
||
Compose tells Docker to build all container images (pulling
|
||
the corresponding base images), then starts all containers,
|
||
and displays aggregated logs.
|
||
|
||
---
|
||
|
||
## Lots of logs
|
||
|
||
- The application continuously generates logs
|
||
|
||
- We can see the `worker` service making requests to `rng` and `hasher`
|
||
|
||
- Let's put that in the background
|
||
|
||
.exercise[
|
||
|
||
- Stop the application by hitting `^C`
|
||
|
||
<!--
|
||
```meta
|
||
^C
|
||
```
|
||
-->
|
||
|
||
]
|
||
|
||
- `^C` stops all containers by sending them the `TERM` signal
|
||
|
||
- Some containers exit immediately, others take longer
|
||
<br/>(because they don't handle `SIGTERM` and end up being killed after a 10s timeout)
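
If you want a shorter grace period while experimenting, Compose lets you override that timeout; a quick sketch (`-t` is the shutdown timeout, in seconds):

```bash
# Give containers only 2 seconds to exit cleanly before they are killed
docker-compose stop -t 2
```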
|
||
|
||
---
|
||
|
||
## Restarting in the background
|
||
|
||
- Many flags and commands of Compose are modeled after those of `docker`
|
||
|
||
.exercise[
|
||
|
||
- Start the app in the background with the `-d` option:
|
||
```bash
|
||
docker-compose up -d
|
||
```
|
||
|
||
- Check that our app is running with the `ps` command:
|
||
```bash
|
||
docker-compose ps
|
||
```
|
||
|
||
]
|
||
|
||
`docker-compose ps` also shows the ports exposed by the application.
|
||
|
||
???
|
||
---
|
||
|
||
## Viewing logs
|
||
|
||
- The `docker-compose logs` command works like `docker logs`
|
||
|
||
.exercise[
|
||
|
||
- View all logs since container creation and exit when done:
|
||
```bash
|
||
docker-compose logs
|
||
```
|
||
|
||
- Stream container logs, starting at the last 10 lines for each container:
|
||
```bash
|
||
docker-compose logs --tail 10 --follow
|
||
```
|
||
|
||
<!--
|
||
```meta
|
||
^C
|
||
```
|
||
-->
|
||
|
||
]
|
||
|
||
Tip: use `^S` and `^Q` to pause/resume log output.
|
||
|
||
???
|
||
|
||
## Upgrading from Compose 1.6
|
||
|
||
.warning[The `logs` command has changed between Compose 1.6 and 1.7!]
|
||
|
||
- Up to 1.6
|
||
|
||
- `docker-compose logs` is the equivalent of `logs --follow`
|
||
|
||
- `docker-compose logs` must be restarted if containers are added
|
||
|
||
- Since 1.7
|
||
|
||
- `--follow` must be specified explicitly
|
||
|
||
- new containers are automatically picked up by `docker-compose logs`
|
||
|
||
---
|
||
|
||
## Connecting to the web UI
|
||
|
||
- The `webui` container exposes a web dashboard; let's view it
|
||
|
||
.exercise[
|
||
|
||
- With a web browser, connect to `node1` on port 8000
|
||
|
||
]
|
||
|
||
- The app actually has a constant, steady speed (3.33 coins/second)
|
||
|
||
- The speed seems not-so-steady because:
|
||
|
||
- the worker doesn't update the counter after every loop, but up to once per second
|
||
|
||
- the speed is computed by the browser, checking the counter about once per second
|
||
|
||
- between two consecutive updates, the counter will increase either by 4, or by 0
|
||
|
||
---
|
||
|
||
## Scaling up the application
|
||
|
||
- Our goal is to make that performance graph go up (without changing a line of code!)
|
||
|
||
--
|
||
|
||
- Before trying to scale the application, we'll figure out if we need more resources
|
||
|
||
(CPU, RAM...)
|
||
|
||
- For that, we will use good old UNIX tools on our Docker node
|
||
|
||
---
|
||
|
||
## Looking at resource usage
|
||
|
||
- Let's look at CPU, memory, and I/O usage
|
||
|
||
.exercise[
|
||
|
||
- run `top` to see CPU and memory usage (you should see idle cycles)
|
||
|
||
- run `vmstat 3` to see I/O usage (si/so/bi/bo)
|
||
<br/>(the 4 numbers should be almost zero, except `bo` for logging)
|
||
|
||
]
|
||
|
||
We have available resources.
|
||
|
||
- Why?
|
||
- How can we use them?
|
||
|
||
---
|
||
|
||
## Scaling workers on a single node
|
||
|
||
- Docker Compose supports scaling
|
||
- Let's scale `worker` and see what happens!
|
||
|
||
.exercise[
|
||
|
||
- Start one more `worker` container:
|
||
```bash
|
||
docker-compose scale worker=2
|
||
```
|
||
|
||
- Look at the performance graph (it should show a 2x improvement)
|
||
|
||
- Look at the aggregated logs of our containers (`worker_2` should show up)
|
||
|
||
- Look at the impact on CPU load with e.g. top (it should be negligible)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Adding more workers
|
||
|
||
- Great, let's add more workers and call it a day, then!
|
||
|
||
.exercise[
|
||
|
||
- Start eight more `worker` containers:
|
||
```bash
|
||
docker-compose scale worker=10
|
||
```
|
||
|
||
- Look at the performance graph: does it show a 10x improvement?
|
||
|
||
- Look at the aggregated logs of our containers
|
||
|
||
- Look at the impact on CPU load and memory usage
|
||
|
||
<!--
|
||
```bash
|
||
sleep 5
|
||
killall docker-compose
|
||
```
|
||
-->
|
||
|
||
]
|
||
|
||
---
|
||
|
||
# Identifying bottlenecks
|
||
|
||
- You should have seen a 3x speed bump (not 10x)
|
||
|
||
- Adding workers didn't result in linear improvement
|
||
|
||
- *Something else* is slowing us down
|
||
|
||
--
|
||
|
||
- ... But what?
|
||
|
||
--
|
||
|
||
- The code doesn't have instrumentation
|
||
|
||
- Let's use state-of-the-art HTTP performance analysis!
|
||
<br/>(i.e. good old tools like `ab`, `httping`...)
|
||
|
||
---
|
||
|
||
## Accessing internal services
|
||
|
||
- `rng` and `hasher` are exposed on ports 8001 and 8002
|
||
|
||
- This is declared in the Compose file:
|
||
|
||
```yaml
|
||
...
|
||
rng:
|
||
build: rng
|
||
ports:
|
||
- "8001:80"
|
||
|
||
hasher:
|
||
build: hasher
|
||
ports:
|
||
- "8002:80"
|
||
...
|
||
```
|
||
|
||
---
|
||
|
||
## Measuring latency under load
|
||
|
||
We will use `httping`.
|
||
|
||
.exercise[
|
||
|
||
- Check the latency of `rng`:
|
||
```bash
|
||
httping -c 10 localhost:8001
|
||
```
|
||
|
||
- Check the latency of `hasher`:
|
||
```bash
|
||
httping -c 10 localhost:8002
|
||
```
|
||
|
||
]
|
||
|
||
`rng` has a much higher latency than `hasher`.
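
To push the investigation a bit further, we can also put `rng` under concurrent load. A sketch, assuming `ab` (from the `apache2-utils` package) is available on `node1` and the app is still running:

```bash
# 100 requests, 10 at a time, against the rng service published on port 8001
ab -c 10 -n 100 http://localhost:8001/
```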
|
||
|
||
---
|
||
|
||
## Let's draw hasty conclusions
|
||
|
||
- The bottleneck seems to be `rng`
|
||
|
||
- *What if* we don't have enough entropy and can't generate enough random numbers?
|
||
|
||
- We need to scale out the `rng` service on multiple machines!
|
||
|
||
Note: this is a fiction! We have enough entropy. But we need a pretext to scale out.
|
||
|
||
(In fact, the code of `rng` uses `/dev/urandom`, which never runs out of entropy...
|
||
<br/>
|
||
...and is [just as good as `/dev/random`](http://www.slideshare.net/PacSecJP/filippo-plain-simple-reality-of-entropy).)
|
||
|
||
---
|
||
|
||
## Clean up
|
||
|
||
- Before moving on, let's remove those containers
|
||
|
||
.exercise[
|
||
|
||
- Tell Compose to remove everything:
|
||
```bash
|
||
docker-compose down
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
class: title
|
||
|
||
# Scaling out
|
||
|
||
---
|
||
|
||
# SwarmKit
|
||
|
||
- [SwarmKit](https://github.com/docker/swarmkit) is an open source
|
||
toolkit to build multi-node systems
|
||
|
||
- It is a reusable library, like libcontainer, libnetwork, vpnkit ...
|
||
|
||
- It is a plumbing part of the Docker ecosystem
|
||
|
||
- SwarmKit comes with two examples:
|
||
|
||
- `swarmctl` (a CLI tool to "speak" the SwarmKit API)
|
||
|
||
- `swarmd` (an agent that can federate existing Docker Engines into a Swarm)
|
||
|
||
- SwarmKit/swarmd/swarmctl → libcontainer/containerd/container-ctr
|
||
|
||
---
|
||
|
||
## SwarmKit features
|
||
|
||
- Highly-available, distributed store based on [Raft](
|
||
https://en.wikipedia.org/wiki/Raft_(computer_science))
|
||
<br/>(more on next slide)
|
||
|
||
- *Services* managed with a *declarative API*
|
||
<br/>(implementing *desired state* and *reconciliation loop*)
|
||
|
||
- Automatic TLS keying and signing
|
||
|
||
- Dynamic promotion/demotion of nodes, allowing to change
|
||
how many nodes are actively part of the Raft consensus
|
||
|
||
- Integration with overlay networks and load balancing
|
||
|
||
- And much more!
|
||
|
||
---
|
||
|
||
## Where is the key/value store?
|
||
|
||
- Many orchestration systems use a key/value store backed by a consensus algorithm
|
||
<br/>
|
||
(k8s→etcd→Raft, mesos→zookeeper→ZAB, etc.)
|
||
|
||
- SwarmKit implements the Raft algorithm directly
|
||
<br/>
|
||
(Nomad is similar; thanks [@cbednarski](https://twitter.com/@cbednarski),
|
||
[@diptanu](https://twitter.com/diptanu) and others for pointing it out!)
|
||
|
||
- Analogy courtesy of [@aluzzardi](https://twitter.com/aluzzardi):
|
||
|
||
*It's like B-Trees and RDBMS. They are different layers, often
|
||
associated. But you don't need to bring up a full SQL server when
|
||
all you need is to index some data.*
|
||
|
||
- As a result, the orchestrator has direct access to the data
|
||
<br/>
|
||
(the main copy of the data is stored in the orchestrator's memory)
|
||
|
||
- Simpler, easier to deploy and operate; also faster
|
||
|
||
---
|
||
|
||
## SwarmKit concepts (1/2)
|
||
|
||
- A *cluster* will be at least one *node* (preferably more)
|
||
|
||
- A *node* can be a *manager* or a *worker*
|
||
|
||
(Note: in SwarmKit, *managers* are also *workers*)
|
||
|
||
- A *manager* actively takes part in the Raft consensus
|
||
|
||
- You can talk to a *manager* using the SwarmKit API
|
||
|
||
- One *manager* is elected as the *leader*; other managers merely forward requests to it
|
||
|
||
---
|
||
|
||
## Illustration
|
||
|
||

|
||
|
||
---
|
||
|
||
## SwarmKit concepts (2/2)
|
||
|
||
- The *managers* expose the SwarmKit API
|
||
|
||
- Using the API, you can indicate that you want to run a *service*
|
||
|
||
- A *service* is specified by its *desired state*: which image, how many instances...
|
||
|
||
- The *leader* uses different subsystems to break down services into *tasks*:
|
||
<br/>orchestrator, scheduler, allocator, dispatcher
|
||
|
||
- A *task* corresponds to a specific container, assigned to a specific *node*
|
||
|
||
- *Nodes* know which *tasks* should be running, and will start or stop containers accordingly (through the Docker Engine API)
|
||
|
||
You can refer to the [NOMENCLATURE](https://github.com/docker/swarmkit/blob/master/design/nomenclature.md) in the SwarmKit repo for more details.
|
||
|
||
---
|
||
|
||
## Swarm Mode
|
||
|
||
- Since version 1.12, Docker Engine embeds SwarmKit
|
||
|
||
- The Docker CLI features three new commands:
|
||
|
||
- `docker swarm` (enable Swarm mode; join a Swarm; adjust cluster parameters)
|
||
|
||
- `docker node` (view nodes; promote/demote managers; manage nodes)
|
||
|
||
- `docker service` (create and manage services)
|
||
|
||
- The Docker API exposes the same concepts
|
||
|
||
- The SwarmKit API is also exposed (on a separate socket)
|
||
|
||
---
|
||
|
||
## You need to enable Swarm mode to use the new stuff
|
||
|
||
- By default, all this new code is inactive
|
||
|
||
- Swarm Mode can be enabled, "unlocking" SwarmKit functions
|
||
<br/>(services, out-of-the-box overlay networks, etc.)
|
||
|
||
.exercise[
|
||
|
||
- Try a Swarm-specific command:
|
||
```bash
|
||
docker node ls
|
||
```
|
||
|
||
]
|
||
|
||
--
|
||
|
||
You will get an error message:
|
||
```
|
||
Error response from daemon: This node is not a swarm manager. [...]
|
||
```
|
||
|
||
---
|
||
|
||
# Creating our first Swarm
|
||
|
||
- The cluster is initialized with `docker swarm init`
|
||
|
||
- This should be executed on a first, seed node
|
||
|
||
- .warning[DO NOT execute `docker swarm init` on multiple nodes!]
|
||
|
||
You would have multiple disjoint clusters.
|
||
|
||
.exercise[
|
||
|
||
- Create our cluster from node1:
|
||
```bash
|
||
docker swarm init
|
||
```
|
||
|
||
]
|
||
|
||
If Docker tells you that it `could not choose an IP address to advertise`, see next slide!
|
||
|
||
---
|
||
|
||
## IP address to advertise
|
||
|
||
- When running in Swarm mode, each node *advertises* its address to the others
|
||
<br/>
|
||
(i.e. it tells them *"you can contact me on 10.1.2.3:2377"*)
|
||
|
||
- If the node has only one IP address (other than 127.0.0.1), it is used automatically
|
||
|
||
- If the node has multiple IP addresses, you **must** specify which one to use
|
||
<br/>
|
||
(Docker refuses to pick one randomly)
|
||
|
||
- You can specify an IP address or an interface name
|
||
<br/>(in the latter case, Docker will read the IP address of the interface and use it)
|
||
|
||
- You can also specify a port number
|
||
<br/>(otherwise, the default port 2377 will be used)
|
||
|
||
---
|
||
|
||
## Which IP address should be advertised?
|
||
|
||
- If your nodes have only one IP address, it's safe to let autodetection do the job
|
||
|
||
.small[(Except if your instances have different private and public addresses, e.g.
|
||
on EC2, and you are building a Swarm involving nodes inside and outside the
|
||
private network: then you should advertise the public address.)]
|
||
|
||
- If your nodes have multiple IP addresses, pick an address which is reachable
|
||
*by every other node* of the Swarm
|
||
|
||
- If you are using [play-with-docker](http://play-with-docker.com/), use the IP
|
||
address shown next to the node name
|
||
|
||
.small[(This is the address of your node on your private internal overlay network.
|
||
The other address that you might see is the address of your node on the
|
||
`docker_gwbridge` network, which is used for outbound traffic.)]
|
||
|
||
Examples:
|
||
|
||
```bash
|
||
docker swarm init --advertise-addr 10.0.9.2
|
||
docker swarm init --advertise-addr eth0:7777
|
||
```
|
||
|
||
---
|
||
|
||
## Token generation
|
||
|
||
- In the output of `docker swarm init`, we have a message
|
||
confirming that our node is now the (single) manager:
|
||
|
||
```
|
||
Swarm initialized: current node (8jud...) is now a manager.
|
||
```
|
||
|
||
- Docker generated two security tokens (like passphrases or passwords) for our cluster
|
||
|
||
- The CLI shows us the command to use on other nodes to add them to the cluster using the "worker"
|
||
security token:
|
||
|
||
```
|
||
To add a worker to this swarm, run the following command:
|
||
docker swarm join \
|
||
--token SWMTKN-1-59fl4ak4nqjmao1ofttrc4eprhrola2l87... \
|
||
172.31.4.182:2377
|
||
```
|
||
|
||
---
|
||
|
||
## Checking that Swarm mode is enabled
|
||
|
||
.exercise[
|
||
|
||
- Run the traditional `docker info` command:
|
||
```bash
|
||
docker info
|
||
```
|
||
|
||
]
|
||
|
||
The output should include:
|
||
|
||
```
|
||
Swarm: active
|
||
NodeID: 8jud7o8dax3zxbags3f8yox4b
|
||
Is Manager: true
|
||
ClusterID: 2vcw2oa9rjps3a24m91xhvv0c
|
||
...
|
||
```
|
||
|
||
---
|
||
|
||
## Running our first Swarm mode command
|
||
|
||
- Let's retry the exact same command as earlier
|
||
|
||
.exercise[
|
||
|
||
- List the nodes (well, the only node) of our cluster:
|
||
```bash
|
||
docker node ls
|
||
```
|
||
|
||
]
|
||
|
||
The output should look like the following:
|
||
```
|
||
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
|
||
8jud...ox4b * node1 Ready Active Leader
|
||
```
|
||
|
||
---
|
||
|
||
## Adding nodes to the Swarm
|
||
|
||
- A cluster with one node is not a lot of fun
|
||
|
||
- Let's add `node2`!
|
||
|
||
- We need the token that was shown earlier
|
||
|
||
--
|
||
|
||
- You wrote it down, right?
|
||
|
||
--
|
||
|
||
- Don't panic, we can easily see it again 😏
|
||
|
||
---
|
||
|
||
## Adding nodes to the Swarm
|
||
|
||
.exercise[
|
||
|
||
- Show the token again:
|
||
```bash
|
||
docker swarm join-token worker
|
||
```
|
||
|
||
- Switch to `node2`
|
||
|
||
- Copy-paste the `docker swarm join ...` command
|
||
<br/>(that was displayed just before)
|
||
|
||
]
|
||
|
||
???
|
||
---
|
||
|
||
## Check that the node was added correctly
|
||
|
||
- Stay on `node2` for now!
|
||
|
||
.exercise[
|
||
|
||
- We can still use `docker info` to verify that the node is part of the Swarm:
|
||
```bash
|
||
docker info | grep ^Swarm
|
||
```
|
||
|
||
]
|
||
|
||
- However, Swarm commands will not work; try, for instance:
|
||
```
|
||
docker node ls
|
||
```
|
||
|
||
- This is because the node that we added is currently a *worker*
|
||
|
||
- Only *managers* can accept Swarm-specific commands
|
||
|
||
---
|
||
|
||
## View our two-node cluster
|
||
|
||
- Let's go back to `node1` and see what our cluster looks like
|
||
|
||
.exercise[
|
||
|
||
- Switch back to `node1`
|
||
|
||
- View the cluster from `node1`, which is a manager:
|
||
```bash
|
||
docker node ls
|
||
```
|
||
|
||
]
|
||
|
||
The output should be similar to the following:
|
||
```
|
||
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
|
||
8jud...ox4b * node1 Ready Active Leader
|
||
ehb0...4fvx node2 Ready Active
|
||
```
|
||
|
||
---
|
||
|
||
## Adding nodes using the Docker API
|
||
|
||
- We don't have to SSH into the other nodes, we can use the Docker API
|
||
|
||
- If you are using Play-With-Docker:
|
||
|
||
- the nodes expose the Docker API over port 2375/tcp, without authentication
|
||
|
||
- we will connect by setting the `DOCKER_HOST` environment variable
|
||
|
||
- Otherwise:
|
||
|
||
- the nodes expose the Docker API over port 2376/tcp, with TLS mutual authentication
|
||
|
||
- we will use Docker Machine to set the correct environment variables
|
||
<br/>(the nodes have been suitably pre-configured to be controlled through `node1`)
|
||
|
||
---
|
||
|
||
## Docker Machine
|
||
|
||
- Docker Machine has two primary uses:
|
||
|
||
- provisioning cloud instances running the Docker Engine
|
||
|
||
- managing local Docker VMs within e.g. VirtualBox
|
||
|
||
- Docker Machine is purely optional
|
||
|
||
- It makes it easy to create, upgrade, manage... Docker hosts:
|
||
|
||
- on your favorite cloud provider
|
||
|
||
- locally (e.g. to test clustering, or different versions)
|
||
|
||
- across different cloud providers
|
||
|
||
---
|
||
|
||
## Docker Machine basic usage
|
||
|
||
- We will learn two commands:
|
||
|
||
- `docker-machine ls` (list existing hosts)
|
||
|
||
- `docker-machine env` (switch to a specific host)
|
||
|
||
.exercise[
|
||
|
||
- List configured hosts:
|
||
```bash
|
||
docker-machine ls
|
||
```
|
||
|
||
]
|
||
|
||
You should see your 5 nodes.
|
||
|
||
---
|
||
|
||
## How did we make our 5 nodes show up there?
|
||
|
||
*For the curious...*
|
||
|
||
- This was done by our VM provisioning scripts
|
||
|
||
- After setting up everything else, `node1` adds the 5 nodes
|
||
to the local Docker Machine configuration
|
||
(located in `$HOME/.docker/machine`)
|
||
|
||
- Nodes are added using the [Docker Machine generic driver](https://docs.docker.com/machine/drivers/generic/)
|
||
|
||
(It skips machine provisioning and jumps straight to the configuration phase)
|
||
|
||
- Docker Machine creates TLS certificates and deploys them to the nodes through SSH
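
For reference, registering an existing machine with the generic driver looks roughly like this (the values are illustrative; the real commands used here live in the `prepare-vms` scripts):

```bash
# Register an already-provisioned machine under the name node2
docker-machine create --driver generic \
  --generic-ip-address=10.10.10.12 \
  --generic-ssh-user=docker \
  node2
```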
|
||
|
||
---
|
||
|
||
## Using Docker Machine to communicate with a node
|
||
|
||
- To select a node, use `eval $(docker-machine env nodeX)`
|
||
|
||
- This sets a number of environment variables
|
||
|
||
- To unset these variables, use `eval $(docker-machine env -u)`
|
||
|
||
.exercise[
|
||
|
||
- View the variables used by Docker Machine:
|
||
```bash
|
||
docker-machine env node3
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Getting the token
|
||
|
||
- First, let's store the join token in a variable
|
||
|
||
.exercise[
|
||
|
||
- Make sure we talk to the local node:
|
||
```bash
|
||
eval $(docker-machine env -u)
|
||
```
|
||
|
||
- Get the join token:
|
||
```bash
|
||
TOKEN=$(docker swarm join-token -q worker)
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Change the node targeted by the Docker CLI
|
||
|
||
- We need to set the right environment variables to communicate with `node3`
|
||
|
||
.exercise[
|
||
|
||
- If you're using Play-With-Docker:
|
||
```bash
|
||
export DOCKER_HOST=tcp://node3:2375
|
||
```
|
||
|
||
- Otherwise, use Docker Machine:
|
||
```bash
|
||
eval $(docker-machine env node3)
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Adding a node through the Docker API
|
||
|
||
Note: it can be useful to use a [custom shell prompt](
|
||
https://github.com/jpetazzo/orchestration-workshop/blob/master/prepare-vms/scripts/postprep.rc#L68)
|
||
reflecting the `DOCKER_HOST` variable.
|
||
|
||
.exercise[
|
||
|
||
- Check which node we're talking to:
|
||
```bash
|
||
docker info | grep ^Name
|
||
```
|
||
|
||
- Add `node3` to the Swarm:
|
||
```bash
|
||
docker swarm join --token $TOKEN node1:2377
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Going back to the local node
|
||
|
||
- We need to revert the environment variable(s) that we had set previously
|
||
|
||
.exercise[
|
||
|
||
- If you're using Play-With-Docker, just clear `DOCKER_HOST`:
|
||
```bash
|
||
unset DOCKER_HOST
|
||
```
|
||
|
||
- Otherwise, use Docker Machine to reset all the relevant variables:
|
||
```bash
|
||
eval $(docker-machine env -u)
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Checking the composition of our cluster
|
||
|
||
- Now that we're talking to `node1` again, we can use management commands
|
||
|
||
.exercise[
|
||
|
||
- Check that the node is here:
|
||
```bash
|
||
docker node ls
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Under the hood: docker swarm init
|
||
|
||
When we do `docker swarm init`:
|
||
|
||
- a keypair is created for the root CA of our Swarm
|
||
|
||
- a keypair is created for the first node
|
||
|
||
- a certificate is issued for this node
|
||
|
||
- the join tokens are created
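
If you are curious, you can peek at some of this material on the manager. A sketch, assuming the default state directory used by recent Engines (reading it requires root):

```bash
# Node certificate, node key, and root CA certificate
sudo ls /var/lib/docker/swarm/certificates/
# Inspect the node certificate issued by the Swarm root CA
sudo openssl x509 -in /var/lib/docker/swarm/certificates/swarm-node.crt -text -noout | head
```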
|
||
|
||
---
|
||
|
||
## Under the hood: join tokens
|
||
|
||
There is one token to *join as a worker*, and another to *join as a manager*.
|
||
|
||
The join tokens have two parts:
|
||
|
||
- a secret key (preventing unauthorized nodes from joining)
|
||
|
||
- a fingerprint of the root CA certificate (preventing MITM attacks)
|
||
|
||
If a token is compromised, it can be rotated instantly with:
|
||
```
|
||
docker swarm join-token --rotate <worker|manager>
|
||
```
|
||
|
||
---
|
||
|
||
## Under the hood: docker swarm join
|
||
|
||
When a node joins the Swarm:
|
||
|
||
- it is issued its own keypair, signed by the root CA
|
||
|
||
- if the node is a manager:
|
||
|
||
- it joins the Raft consensus
|
||
- it connects to the current leader
|
||
- it accepts connections from worker nodes
|
||
|
||
- if the node is a worker:
|
||
|
||
- it connects to one of the managers (leader or follower)
|
||
|
||
---
|
||
|
||
## Under the hood: cluster communication
|
||
|
||
- The *control plane* is encrypted with AES-GCM; keys are rotated every 12 hours
|
||
|
||
- Authentication is done with mutual TLS; certificates are rotated every 90 days
|
||
|
||
(`docker swarm update` allows to change this delay or to use an external CA)
|
||
|
||
- The *data plane* (communication between containers) is not encrypted by default
|
||
|
||
(but this can be activated on a by-network basis, using IPSEC,
|
||
leveraging hardware crypto if available)
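
For example (a sketch; we don't need to run this right now), the certificate rotation period and per-network encryption can be adjusted like this:

```bash
# Rotate node certificates every 30 days instead of 90
docker swarm update --cert-expiry 720h

# Create an overlay network whose data plane traffic is encrypted with IPSEC
docker network create --driver overlay --opt encrypted my-secure-net
```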
|
||
|
||
---
|
||
|
||
## Under the hood: I want to know more!
|
||
|
||
Revisit SwarmKit concepts:
|
||
|
||
- Docker 1.12 Swarm Mode Deep Dive Part 1: Topology
|
||
([video](https://www.youtube.com/watch?v=dooPhkXT9yI))
|
||
|
||
- Docker 1.12 Swarm Mode Deep Dive Part 2: Orchestration
|
||
([video](https://www.youtube.com/watch?v=_F6PSP-qhdA))
|
||
|
||
Some presentations from the Docker Distributed Systems Summit in Berlin:
|
||
|
||
- Heart of the SwarmKit: Topology Management
|
||
([slides](https://speakerdeck.com/aluzzardi/heart-of-the-swarmkit-topology-management))
|
||
|
||
- Heart of the SwarmKit: Store, Topology & Object Model
|
||
([slides](http://www.slideshare.net/Docker/heart-of-the-swarmkit-store-topology-object-model))
|
||
([video](https://www.youtube.com/watch?v=EmePhjGnCXY))
|
||
|
||
---
|
||
|
||
## Adding more manager nodes
|
||
|
||
- Right now, we have only one manager (node1)
|
||
|
||
- If we lose it, we're SOL
|
||
|
||
- Let's make our cluster highly available
|
||
|
||
- Can you write a tiny script to automatically retrieve the manager token,
|
||
<br/>and automatically add remaining nodes to the cluster?
|
||
|
||
--
|
||
|
||
- Hint: we want to use `for N in $(seq 4 5) ...`
|
||
|
||
---
|
||
|
||
## Adding more managers
|
||
|
||
With Play-With-Docker:
|
||
|
||
```bash
|
||
TOKEN=$(docker swarm join-token -q manager)
|
||
for N in $(seq 4 5); do
|
||
export DOCKER_HOST=tcp://node$N:2375
|
||
docker swarm join --token $TOKEN node1:2377
|
||
done
|
||
unset DOCKER_HOST
|
||
```
|
||
|
||
---
|
||
|
||
## Adding more managers
|
||
|
||
With Docker Machine:
|
||
|
||
```bash
|
||
TOKEN=$(docker swarm join-token -q manager)
|
||
for N in $(seq 4 5); do
|
||
eval $(docker-machine env node$N)
|
||
docker swarm join --token $TOKEN node1:2377
|
||
done
|
||
eval $(docker-machine env -u)
|
||
```
|
||
|
||
---
|
||
|
||
## You can control the Swarm from any manager node
|
||
|
||
.exercise[
|
||
|
||
- Try the following command on a few different nodes:
|
||
```bash
|
||
docker node ls
|
||
```
|
||
|
||
]
|
||
|
||
On manager nodes:
|
||
<br/>you will see the list of nodes, with a `*` denoting
|
||
the node you're talking to.
|
||
|
||
On non-manager nodes:
|
||
<br/>you will get an error message telling you that
|
||
the node is not a manager.
|
||
|
||
As we saw earlier, you can only control the Swarm through a manager node.
|
||
|
||
---
|
||
|
||
## Play-With-Docker node status icon
|
||
|
||
- If you're using Play-With-Docker, you get node status icons
|
||
|
||
- Node status icons are displayed left of the node name
|
||
|
||
- No icon = no Swarm mode detected
|
||
|
||
- Solid blue icon = Swarm manager detected
|
||
|
||
- Blue outline icon = Swarm worker detected
|
||
|
||
---
|
||
|
||
## Promoting nodes
|
||
|
||
- Instead of adding a manager node, we can also promote existing workers
|
||
|
||
- Nodes can be promoted (and demoted) at any time
|
||
|
||
.exercise[
|
||
|
||
- See the current list of nodes:
|
||
```
|
||
docker node ls
|
||
```
|
||
|
||
- Promote the two worker nodes to be managers:
|
||
```
|
||
docker node promote XXX YYY
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## How many managers do we need?
|
||
|
||
- 2N+1 nodes can (and will) tolerate N failures
|
||
<br/>(you can have an even number of managers, but there is no point)
|
||
|
||
--
|
||
|
||
- 1 manager = no failure
|
||
|
||
- 3 managers = 1 failure
|
||
|
||
- 5 managers = 2 failures (or 1 failure during 1 maintenance)
|
||
|
||
- 7 managers and more = now you might be overdoing it a little bit
|
||
|
||
---
|
||
|
||
## Why not have *all* nodes be managers?
|
||
|
||
- Intuitively, it's harder to reach consensus in larger groups
|
||
|
||
- With Raft, each write needs to be acknowledged by the majority of nodes
|
||
|
||
- More nodes = more chance that we will have to wait for some laggard
|
||
|
||
- Bigger network = more latency
|
||
|
||
---
|
||
|
||
## What would MacGyver do?
|
||
|
||
- If some of your machines are more than 10ms away from each other,
|
||
<br/>
|
||
try to break them down in multiple clusters
|
||
(keeping internal latency low)
|
||
|
||
- Groups of up to 9 nodes: all of them are managers
|
||
|
||
- Groups of 10 nodes and up: pick 5 "stable" nodes to be managers
|
||
|
||
- Groups of more than 100 nodes: watch your managers' CPU and RAM
|
||
|
||
- Groups of more than 1000 nodes:
|
||
|
||
- if you can afford to have fast, stable managers, add more of them
|
||
- otherwise, break down your nodes in multiple clusters
|
||
|
||
---
|
||
|
||
## What's the upper limit?
|
||
|
||
- We don't know!
|
||
|
||
- Internal testing at Docker Inc.: 1000-10000 nodes is fine
|
||
|
||
- deployed to a single cloud region
|
||
|
||
- one of the main take-aways was *"you're gonna need a bigger manager"*
|
||
|
||
- Testing by the community: [4700 heterogeneous nodes all over the 'net](https://sematext.com/blog/2016/11/14/docker-swarm-lessons-from-swarm3k/)
|
||
|
||
- it just works
|
||
|
||
- more nodes require more CPU; more containers require more RAM
|
||
|
||
- scheduling of large jobs (70000 containers) is slow, though (working on it!)
|
||
|
||
---
|
||
|
||
# Running our first Swarm service
|
||
|
||
- How do we run services? Simplified version:
|
||
|
||
`docker run` → `docker service create`
|
||
|
||
.exercise[
|
||
|
||
- Create a service featuring an Alpine container pinging Google resolvers:
|
||
```bash
|
||
docker service create alpine ping 8.8.8.8
|
||
```
|
||
|
||
- Check where the container was created:
|
||
```bash
|
||
docker service ps <serviceID>
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Checking container logs
|
||
|
||
- Right now, there is no direct way to check the logs of our container
|
||
<br/>(unless it was scheduled on the current node)
|
||
|
||
- Look up the `NODE` on which the container is running
|
||
<br/>(in the output of the `docker service ps` command)
|
||
|
||
.exercise[
|
||
|
||
- If you use Play-With-Docker, switch to that node's tab, or set `DOCKER_HOST`
|
||
|
||
- Otherwise, `ssh` into that node or use `eval $(docker-machine env node...)`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Viewing the logs of the container
|
||
|
||
.exercise[
|
||
|
||
- See that the container is running and check its ID:
|
||
```bash
|
||
docker ps
|
||
```
|
||
|
||
- View its logs:
|
||
```bash
|
||
docker logs <containerID>
|
||
```
|
||
|
||
- Go back to `node1` afterwards
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Scale our service
|
||
|
||
- Services can be scaled in a pinch with the `docker service update`
|
||
command
|
||
|
||
.exercise[
|
||
|
||
- Scale the service to ensure 2 copies per node:
|
||
```bash
|
||
docker service update <serviceID> --replicas 10
|
||
```
|
||
|
||
- Check that we have two containers on the current node:
|
||
```bash
|
||
docker ps
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Expose a service
|
||
|
||
- Services can be exposed, with two special properties:
|
||
|
||
- the public port is available on *every node of the Swarm*,
|
||
|
||
- requests coming on the public port are load balanced across all instances.
|
||
|
||
- This is achieved with option `-p/--publish`; as an approximation:
|
||
|
||
`docker run -p → docker service create -p`
|
||
|
||
- If you indicate a single port number, it will be mapped on a port
|
||
starting at 30000
|
||
<br/>(vs. 32768 for single container mapping)
|
||
|
||
- You can indicate two port numbers to set the public port number
|
||
<br/>(just like with `docker run -p`)
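
As a minimal illustration (a sketch using the stock `nginx` image, unrelated to our app):

```bash
# Publish port 80 of the nginx containers on port 8080 of every node
docker service create --name web --publish 8080:80 nginx

# After the image is pulled and the task starts, any node answers on port 8080
curl localhost:8080

# Clean up before moving on
docker service rm web
```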
|
||
|
||
---
|
||
|
||
## Expose ElasticSearch on its default port
|
||
|
||
.exercise[
|
||
|
||
- Create an ElasticSearch service (and give it a name while we're at it):
|
||
```bash
|
||
docker service create --name search --publish 9200:9200 --replicas 7 \
|
||
elasticsearch`:2`
|
||
```
|
||
|
||
- Check what's going on:
|
||
```bash
|
||
watch docker service ps search
|
||
```
|
||
|
||
]
|
||
|
||
Note: don't forget the **:2**!
|
||
|
||
The latest version of the ElasticSearch image won't start without mandatory configuration.
|
||
|
||
---
|
||
|
||
## Tasks lifecycle
|
||
|
||
- If you are fast enough, you will be able to see multiple states:
|
||
|
||
- assigned (the task has been assigned to a specific node)
|
||
|
||
- preparing (right now, this mostly means "pulling the image")
|
||
|
||
- running
|
||
|
||
- When a task is terminated (stopped, killed...) it cannot be restarted
|
||
|
||
(A replacement task will be created)
|
||
|
||
---
|
||
|
||

|
||
|
||
---
|
||
|
||
## Test our service
|
||
|
||
- We mapped port 9200 on the nodes, to port 9200 in the containers
|
||
|
||
- Let's try to reach that port!
|
||
|
||
.exercise[
|
||
|
||
- Try the following command:
|
||
```bash
|
||
curl localhost:9200
|
||
```
|
||
|
||
]
|
||
|
||
(If you get `Connection refused`: congratulations, you are very fast indeed! Just try again.)
|
||
|
||
ElasticSearch serves a little JSON document with some basic information
|
||
about this instance; including a randomly-generated super-hero name.
|
||
|
||
---
|
||
|
||
## Test the load balancing
|
||
|
||
- If we repeat our `curl` command multiple times, we will see different names
|
||
|
||
.exercise[
|
||
|
||
- Send 10 requests, and see which instances serve them:
|
||
```bash
|
||
for N in $(seq 1 10); do
|
||
curl -s localhost:9200 | jq .name
|
||
done
|
||
```
|
||
|
||
]
|
||
|
||
Note: if you don't have `jq` on your Play-With-Docker instance, just install it:
|
||
```bash
|
||
apk add --no-cache jq
|
||
```
|
||
|
||
---
|
||
|
||
## Load balancing results
|
||
|
||
Traffic is handled by our cluster's [TCP routing mesh](
|
||
https://docs.docker.com/engine/swarm/ingress/).
|
||
|
||
Each request is served by one of the 7 instances, in rotation.
|
||
|
||
Note: if you try to access the service from your browser,
|
||
you will probably see the same
|
||
instance name over and over, because your browser (unlike curl) will try
|
||
to re-use the same connection.
|
||
|
||
---
|
||
|
||
## Under the hood of the TCP routing mesh
|
||
|
||
- Load balancing is done by IPVS
|
||
|
||
- IPVS is a high-performance, in-kernel load balancer
|
||
|
||
- It's been around for a long time (merged in the kernel since 2.4)
|
||
|
||
- Each node runs a local load balancer
|
||
|
||
(Allowing connections to be routed directly to the destination,
|
||
without extra hops)
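
If you want to see IPVS at work, one way is to peek inside the ingress network namespace. This is only a sketch: it assumes `ipvsadm` is installed on the node and that your Engine names the namespace file `ingress_sbox` (details vary across versions):

```bash
# List the IPVS virtual services used by the routing mesh
sudo nsenter --net=/var/run/docker/netns/ingress_sbox ipvsadm -ln
```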
|
||
|
||
---
|
||
|
||
## Managing inbound traffic
|
||
|
||
There are many ways to deal with inbound traffic on a Swarm cluster.
|
||
|
||
- Put all (or a subset) of your nodes in a DNS `A` record
|
||
|
||
- Assign your nodes (or a subset) to an ELB
|
||
|
||
- Use a virtual IP and make sure that it is assigned to an "alive" node
|
||
|
||
- etc.
|
||
|
||
---
|
||
|
||
## Managing HTTP traffic
|
||
|
||
- The TCP routing mesh doesn't parse HTTP headers
|
||
|
||
- If you want to place multiple HTTP services on port 80, you need something more
|
||
|
||
- You can set up NGINX or HAProxy on port 80 to do the virtual host switching
|
||
|
||
- Docker Universal Control Plane provides its own [HTTP routing mesh](
|
||
https://docs.docker.com/datacenter/ucp/2.1/guides/admin/configure/use-domain-names-to-access-services/)
|
||
|
||
- add a specific label starting with `com.docker.ucp.mesh.http` to your services
|
||
|
||
- labels are detected automatically and dynamically update the configuration
|
||
|
||
---
|
||
|
||
## Terminate our services
|
||
|
||
- Before moving on, we will remove those services
|
||
|
||
- `docker service rm` can accept multiple services names or IDs
|
||
|
||
- `docker service ls` can accept the `-q` flag
|
||
|
||
- A Shell snippet a day keeps the cruft away
|
||
|
||
.exercise[
|
||
|
||
- Remove all services with this one liner:
|
||
```bash
|
||
docker service ls -q | xargs docker service rm
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
class: title
|
||
|
||
# Our app on Swarm
|
||
|
||
---
|
||
|
||
## What's on the menu?
|
||
|
||
In this part, we will cover:
|
||
|
||
- building images for our app,
|
||
|
||
- shipping those images with a registry,
|
||
|
||
- running them through the services concept,
|
||
|
||
- enabling inter-container communication with overlay networks.
|
||
|
||
---
|
||
|
||
## Why do we need to ship our images?
|
||
|
||
- When we do `docker-compose up`, images are built for our services
|
||
|
||
- Those images are present only on the local node
|
||
|
||
- We need those images to be distributed on the whole Swarm
|
||
|
||
- The easiest way to achieve that is to use a Docker registry
|
||
|
||
- Once our images are on a registry, we can reference them when
|
||
creating our services
|
||
|
||
---
|
||
|
||
## Build, ship, and run, for a single service
|
||
|
||
If we had only one service (built from a `Dockerfile` in the
|
||
current directory), our workflow could look like this:
|
||
|
||
```
|
||
docker build -t jpetazzo/doublerainbow:v0.1 .
|
||
docker push jpetazzo/doublerainbow:v0.1
|
||
docker service create jpetazzo/doublerainbow:v0.1
|
||
```
|
||
|
||
We just have to adapt this to our application, which has 4 services!
|
||
|
||
---
|
||
|
||
## The plan
|
||
|
||
- Build on our local node (`node1`)
|
||
|
||
- Tag images with a version number
|
||
|
||
(timestamp; git hash; semantic...)
|
||
|
||
- Upload them to a registry
|
||
|
||
- Create services using the images
|
||
|
||
---
|
||
|
||
## Which registry do we want to use?
|
||
|
||
.small[
|
||
|
||
- **Docker Hub**
|
||
|
||
- hosted by Docker Inc.
|
||
- requires an account (free, no credit card needed)
|
||
- images will be public (unless you pay)
|
||
- located in AWS EC2 us-east-1
|
||
|
||
- **Docker Trusted Registry**
|
||
|
||
- self-hosted commercial product
|
||
- requires a subscription (free 30-day trial available)
|
||
- images can be public or private
|
||
- located wherever you want
|
||
|
||
- **Docker open source registry**
|
||
|
||
- self-hosted barebones repository hosting
|
||
- doesn't require anything
|
||
- doesn't come with anything either
|
||
- located wherever you want
|
||
|
||
]
|
||
|
||
???
|
||
---
|
||
|
||
## Using Docker Hub
|
||
|
||
*If we wanted to use the Docker Hub...*
|
||
|
||
<!--
|
||
```meta
|
||
^{
|
||
```
|
||
-->
|
||
|
||
- We would log into the Docker Hub:
|
||
```bash
|
||
docker login
|
||
```
|
||
|
||
- And in the following slides, we would use our Docker Hub login
|
||
(e.g. `jpetazzo`) instead of the registry address (i.e. `127.0.0.1:5000`)
|
||
|
||
<!--
|
||
```meta
|
||
^}
|
||
```
|
||
-->
|
||
|
||
???
|
||
---
|
||
|
||
## Using Docker Trusted Registry
|
||
|
||
*If we wanted to use DTR, we would...*
|
||
|
||
- Make sure we have a Docker Hub account
|
||
|
||
- [Activate a Docker Datacenter subscription](
|
||
https://hub.docker.com/enterprise/trial/)
|
||
|
||
- Install DTR on our machines
|
||
|
||
- Use `dtraddress:port/user` instead of the registry address
|
||
|
||
*This is out of the scope of this workshop!*
|
||
|
||
---
|
||
|
||
## Using open source registry
|
||
|
||
- We need to run a `registry:2` container
|
||
<br/>(make sure you specify tag `:2` to run the new version!)
|
||
|
||
- It will store images and layers to the local filesystem
|
||
<br/>(but you can add a config file to use S3, Swift, etc.)
|
||
|
||
<!-- -->
|
||
|
||
- Docker *requires* TLS when communicating with the registry
|
||
|
||
- except for registries on `127.0.0.0/8` (i.e. `localhost`)
|
||
|
||
- or when the Engine runs with the `--insecure-registry` flag (see the note at the end of this slide)
|
||
|
||
<!-- -->
|
||
|
||
- Our strategy: publish the registry container on port 5000,
|
||
<br/>so that it's available through `127.0.0.1:5000` on each node
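
If you could not use this `127.0.0.1:5000` trick (e.g. the registry runs on another machine and you don't want to set up TLS), a minimal sketch of the Engine flag mentioned above would look like this; the address is purely an example, and this is not needed for this workshop:

```bash
# Example only: allow plain HTTP for a registry at this (hypothetical) address
dockerd --insecure-registry 10.0.0.99:5000
```

(The same setting can also go in the Engine's configuration file instead of the command line.)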
|
||
|
||
---
|
||
|
||
# Deploying a local registry
|
||
|
||
- We will create a single-instance service, publishing its port
|
||
on the whole cluster
|
||
|
||
.exercise[
|
||
|
||
- Create the registry service:
|
||
```bash
|
||
docker service create --name registry --publish 5000:5000 registry:2
|
||
```
|
||
|
||
- Try the following command, until it returns `{"repositories":[]}`:
|
||
```bash
|
||
curl 127.0.0.1:5000/v2/_catalog
|
||
```
|
||
|
||
]
|
||
|
||
(Retry a few times; it might take 10-20 seconds for the container to start. Patience.)
|
||
|
||
---
|
||
|
||
## Testing our local registry
|
||
|
||
- We can retag a small image, and push it to the registry
|
||
|
||
.exercise[
|
||
|
||
- Make sure we have the busybox image, and retag it:
|
||
```bash
|
||
docker pull busybox
|
||
docker tag busybox 127.0.0.1:5000/busybox
|
||
```
|
||
|
||
- Push it:
|
||
```bash
|
||
docker push 127.0.0.1:5000/busybox
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Checking what's on our local registry
|
||
|
||
- The registry API has endpoints to query what's there
|
||
|
||
.exercise[
|
||
|
||
- Ensure that our busybox image is now in the local registry:
|
||
```bash
|
||
curl http://127.0.0.1:5000/v2/_catalog
|
||
```
|
||
|
||
]
|
||
|
||
The curl command should now output:
|
||
```json
|
||
{"repositories":["busybox"]}
|
||
```
|
||
|
||
---
|
||
|
||
## Build, tag, and push our application container images
|
||
|
||
- Scriptery to the rescue!
|
||
|
||
.exercise[
|
||
|
||
- Set `DOCKER_REGISTRY` and `TAG` environment variables to use our local registry
|
||
|
||
- And run this little for loop:
|
||
```bash
|
||
DOCKER_REGISTRY=127.0.0.1:5000
|
||
TAG=v0.1
|
||
for SERVICE in hasher rng webui worker; do
|
||
docker-compose build $SERVICE
|
||
docker tag dockercoins_$SERVICE $DOCKER_REGISTRY/dockercoins_$SERVICE:$TAG
|
||
docker push $DOCKER_REGISTRY/dockercoins_$SERVICE
|
||
done
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
# Overlay networks
|
||
|
||
- SwarmKit integrates with overlay networks, without requiring
|
||
an extra key/value store
|
||
|
||
- Overlay networks are created the same way as before
|
||
|
||
.exercise[
|
||
|
||
- Create an overlay network for our application:
|
||
```bash
|
||
docker network create --driver overlay dockercoins
|
||
```
|
||
|
||
- Check existing networks:
|
||
```bash
|
||
docker network ls
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Can you spot the differences?
|
||
|
||
The networks `dockercoins` and `ingress` are different from the other ones.
|
||
|
||
Can you see how?
|
||
|
||
--
|
||
|
||
- They are using a different kind of ID, reflecting the fact that they
|
||
are SwarmKit objects instead of "classic" Docker Engine objects.
|
||
|
||
- Their *scope* is `swarm` instead of `local`.
|
||
|
||
- They are using the overlay driver.
|
||
|
||
---
|
||
|
||
## Caveats
|
||
|
||
.warning[In Docker 1.12, you cannot join an overlay network with `docker run --net ...`.]
|
||
|
||
Starting with version 1.13, you can, if the network was created with the `--attachable` flag.
|
||
|
||
*Why is that?*
|
||
|
||
Placing a container on a network requires allocating an IP address for this container.
|
||
|
||
The allocation must be done by a manager node (worker nodes cannot update Raft data).
|
||
|
||
As a result, `docker run --net ...` requires collaboration with manager nodes.
|
||
|
||
It alters the code path for `docker run`, so it is allowed only under strict circumstances.
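
For reference, here is a minimal sketch of what the `--attachable` workflow looks like with Docker 1.13 or later (the network name is just an example):

```bash
# Create an overlay network that standalone containers are allowed to join
docker network create --driver overlay --attachable mynetwork

# A one-off container can now be placed on it with plain `docker run`
docker run --rm --net mynetwork alpine ip addr
```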
|
||
|
||
---
|
||
|
||
## Run the application
|
||
|
||
- First, create the `redis` service; that one is using a Docker Hub image
|
||
|
||
.exercise[
|
||
|
||
- Create the `redis` service:
|
||
```bash
|
||
docker service create --network dockercoins --name redis redis
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Run the other services
|
||
|
||
- Then, start the other services one by one
|
||
|
||
- We will use the images pushed previously
|
||
|
||
.exercise[
|
||
|
||
- Start the other services:
|
||
```bash
|
||
DOCKER_REGISTRY=127.0.0.1:5000
|
||
TAG=v0.1
|
||
for SERVICE in hasher rng webui worker; do
|
||
docker service create --network dockercoins --name $SERVICE \
|
||
$DOCKER_REGISTRY/dockercoins_$SERVICE:$TAG
|
||
done
|
||
```
|
||
|
||
]
|
||
|
||
???
|
||
---
|
||
|
||
## Wait for our application to be up
|
||
|
||
- We will see later a way to watch progress for all the tasks of the cluster
|
||
|
||
- But for now, a scrappy Shell loop will do the trick
|
||
|
||
.exercise[
|
||
|
||
- Repeatedly display the status of all our services:
|
||
```bash
|
||
watch "docker service ls -q | xargs -n1 docker service ps"
|
||
```
|
||
|
||
- Stop it once everything is running
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Expose our application web UI
|
||
|
||
- We need to connect to the `webui` service, but it is not publishing any port
|
||
|
||
- Let's reconfigure it to publish a port
|
||
|
||
.exercise[
|
||
|
||
- Update `webui` so that we can connect to it from outside:
|
||
```bash
|
||
docker service update webui --publish-add 8000:80
|
||
```
|
||
|
||
]
|
||
|
||
Note: to "de-publish" a port, you would have to specify the container port.
|
||
<br/>(i.e. in that case, `--publish-rm 80`)
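
For example, the full command to "de-publish" the web UI would be the following (shown for reference only; don't run it if you want to keep using the web UI!):

```bash
docker service update webui --publish-rm 80
```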
|
||
|
||
---
|
||
|
||
## What happens when we modify a service?
|
||
|
||
- Let's find out what happened to our `webui` service
|
||
|
||
.exercise[
|
||
|
||
- Look at the tasks and containers associated to `webui`:
|
||
```bash
|
||
docker service ps webui
|
||
```
|
||
]
|
||
|
||
--
|
||
|
||
The first version of the service (the one that was not exposed) has been shut down.
|
||
|
||
It has been replaced by the new version, with port 80 accessible from outside.
|
||
|
||
(This will be discussed in more detail in the section about stateful services.)
|
||
|
||
---
|
||
|
||
## Connect to the web UI
|
||
|
||
- The web UI is now available on port 8000, *on all the nodes of the cluster*
|
||
|
||
.exercise[
|
||
|
||
- If you're using Play-With-Docker, just click on the `(8000)` badge
|
||
|
||
- Otherwise, point your browser to any node, on port 8000
|
||
|
||
]
|
||
|
||
You might have to wait a bit for the container to be up and running.
|
||
|
||
Check its status with `docker service ps webui`.
|
||
|
||
---
|
||
|
||
## Scaling the application
|
||
|
||
- We can change scaling parameters with `docker service update` as well
|
||
|
||
- We will do the equivalent of `docker-compose scale`
|
||
|
||
.exercise[
|
||
|
||
- Bring up more workers:
|
||
```bash
|
||
docker service update worker --replicas 10
|
||
```
|
||
|
||
- Check the result in the web UI
|
||
|
||
]
|
||
|
||
You should see the performance peaking at 10 hashes/s (like before).
|
||
|
||
---
|
||
|
||
## Global scheduling
|
||
|
||
- We want to make the best possible use of the entropy generators
|
||
on our nodes
|
||
|
||
- We want to run exactly one `rng` instance per node
|
||
|
||
- SwarmKit has a special scheduling mode for that, let's use it
|
||
|
||
- We cannot enable/disable global scheduling on an existing service
|
||
|
||
- We have to destroy and re-create the `rng` service
|
||
|
||
---
|
||
|
||
## Scaling the `rng` service
|
||
|
||
.exercise[
|
||
|
||
- Remove the existing `rng` service:
|
||
```bash
|
||
docker service rm rng
|
||
```
|
||
|
||
- Re-create the `rng` service with *global scheduling*:
|
||
```bash
|
||
docker service create --name rng --network dockercoins --mode global \
|
||
$DOCKER_REGISTRY/dockercoins_rng:$TAG
|
||
```
|
||
|
||
- Look at the result in the web UI
|
||
|
||
]
|
||
|
||
Note: if the hash rate goes to zero and doesn't climb back up, try to `rm` and `create` again.
|
||
|
||
---
|
||
|
||
## Why do we have to re-create the service to enable global scheduling?
|
||
|
||
- Enabling it dynamically would make rolling update semantics very complex
|
||
|
||
- This might change in the future (after all, it was possible in 1.12 RC!)
|
||
|
||
- As of Docker Engine 1.13, other parameters that require us to `rm`/`create` the service are:
|
||
|
||
- service name
|
||
|
||
- hostname
|
||
|
||
- network
|
||
|
||
---
|
||
|
||
## How did we make our app "Swarm-ready"?
|
||
|
||
This app was written in June 2015. (One year before Swarm mode was released.)
|
||
|
||
What did we change to make it compatible with Swarm mode?
|
||
|
||
--
|
||
|
||
.exercise[
|
||
|
||
- Go to the app directory:
|
||
```bash
|
||
cd ~/orchestration-workshop/dockercoins
|
||
```
|
||
|
||
- See modifications in the code:
|
||
```bash
|
||
git log -p --since "4-JUL-2015" -- . ':!*.yml*' ':!*.html'
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## What did we change in our app since its inception?
|
||
|
||
- Compose files
|
||
|
||
- HTML file (it contains an embedded contextual tweet)
|
||
|
||
- Dockerfiles (to switch to smaller images)
|
||
|
||
- That's it!
|
||
|
||
--
|
||
|
||
*We didn't change a single line of code in this app since it was written.*
|
||
|
||
--
|
||
|
||
*The images that were [built in June 2015](
|
||
https://hub.docker.com/r/jpetazzo/dockercoins_worker/tags/)
|
||
(when the app was written) can still run today ...
|
||
<br/>... in Swarm mode (distributed across a cluster, with load balancing) ...
|
||
<br/>... without any modification.*
|
||
|
||
---
|
||
|
||
## How did we design our app in the first place?
|
||
|
||
- [Twelve-Factor App](https://12factor.net/) principles
|
||
|
||
- Service discovery using DNS names
|
||
|
||
- Initially implemented as "links"
|
||
|
||
- Then "ambassadors"
|
||
|
||
- And now "services"
|
||
|
||
- Existing apps might require more changes!
|
||
|
||
---
|
||
|
||
# Scripting image building and pushing
|
||
|
||
- Earlier, we used some rather crude shell loops to build and push images
|
||
|
||
- Compose (and clever environment variables) can help us to make that easier
|
||
|
||
- When using Compose file version 2, you can specify *both* `build` and `image`:
|
||
|
||
```yaml
|
||
version: "2"
|
||
|
||
services:
|
||
|
||
webapp:
|
||
build: src/
|
||
image: jpetazzo/webapp:${TAG}
|
||
```
|
||
|
||
Note: Compose tolerates empty (or unset) environment variables, but in this example,
|
||
`TAG` *must* be set, because `jpetazzo/webapp:` is not a valid image name.
|
||
|
||
---
|
||
|
||
## Updating the Compose file to specify image tags
|
||
|
||
- Let's update the Compose file for DockerCoins to make it easier to push it to our registry
|
||
|
||
.exercise[
|
||
|
||
- Go back to the `dockercoins` directory:
|
||
```bash
|
||
cd ~/orchestration-workshop/dockercoins
|
||
```
|
||
|
||
- Edit `docker-compose.yml`, and update each service to add an `image` directive as follows:
|
||
```yaml
|
||
rng:
|
||
build: rng
|
||
`image: ${REGISTRY_SLASH}rng${COLON_TAG}`
|
||
```
|
||
|
||
]
|
||
|
||
You can also directly use the file `docker-compose.yml-images`.
|
||
|
||
---
|
||
|
||
## Use the new Compose file
|
||
|
||
- We need to set `REGISTRY_SLASH` and `COLON_TAG` variables
|
||
|
||
- Then we can use Compose to `build` and `push`
|
||
|
||
.exercise[
|
||
|
||
- Set environment variables:
|
||
```bash
|
||
export REGISTRY_SLASH=127.0.0.1:5000/
|
||
export COLON_TAG=:v0.01
|
||
```
|
||
|
||
- Build and push with Compose:
|
||
```bash
|
||
docker-compose build
|
||
docker-compose push
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Why the weird variable names?
|
||
|
||
- It would be more intuitive to have:
|
||
```bash
|
||
REGISTRY=127.0.0.1:5000
|
||
TAG=v0.01
|
||
```
|
||
|
||
- But then, when the variables are not set, the image names would be invalid
|
||
|
||
(they would look like .red[`/rng:`])
|
||
|
||
- Putting the slash and the colon in the variables allows us to use the Compose file
|
||
even when the variables are not set
|
||
|
||
- The variable names (might) remind you that you have to include the trailing slash and leading colon
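
To see how the substitution plays out, here is a quick sanity check you can run in a shell (the values are just examples):

```bash
REGISTRY_SLASH=127.0.0.1:5000/ COLON_TAG=:v0.01
echo "${REGISTRY_SLASH}rng${COLON_TAG}"   # -> 127.0.0.1:5000/rng:v0.01
unset REGISTRY_SLASH COLON_TAG
echo "${REGISTRY_SLASH}rng${COLON_TAG}"   # -> rng (still a valid image name)
```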
|
||
|
||
---
|
||
|
||
# Integration with Compose
|
||
|
||
- The previous section showed us how to streamline image build and push
|
||
|
||
- We will now see how to streamline service creation
|
||
|
||
(i.e. get rid of the `for SERVICE in ...; do docker service create ...` part)
|
||
|
||
---
|
||
|
||
## Compose file version 3
|
||
|
||
(New in Docker Engine 1.13)
|
||
|
||
- Almost identical to version 2
|
||
|
||
- Can be directly used by a Swarm cluster through `docker stack ...` commands
|
||
|
||
- Introduces a `deploy` section to pass Swarm-specific parameters
|
||
|
||
- Resource limits are moved to this `deploy` section
|
||
|
||
- See [here](https://github.com/aanand/docker.github.io/blob/8524552f99e5b58452fcb1403e1c273385988b71/compose/compose-file.md#upgrading) for the complete list of changes
|
||
|
||
- Supersedes *Distributed Application Bundles*
|
||
|
||
(JSON payload describing an application; could be generated from a Compose file)
|
||
|
||
---
|
||
|
||
## Removing everything
|
||
|
||
- Before deploying using "stacks," let's get a clean slate
|
||
|
||
.exercise[
|
||
|
||
- Remove *all* the services:
|
||
```bash
|
||
docker service ls -q | xargs docker service rm
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Our first stack
|
||
|
||
We need a registry to move images around.
|
||
|
||
Before, we deployed it with the following command:
|
||
|
||
```bash
|
||
docker service create --publish 5000:5000 registry:2
|
||
```
|
||
|
||
Now, we are going to deploy it with the following stack file:
|
||
|
||
```yaml
|
||
version: "3"
|
||
|
||
services:
|
||
registry:
|
||
image: registry:2
|
||
ports:
|
||
- "5000:5000"
|
||
```
|
||
|
||
---
|
||
|
||
## Checking our stack files
|
||
|
||
- All the stack files that we will use are in the `stacks` directory
|
||
|
||
.exercise[
|
||
|
||
- Go to the `stacks` directory:
|
||
```bash
|
||
cd ~/orchestration-workshop/stacks
|
||
```
|
||
|
||
- Check `registry.yml`:
|
||
```bash
|
||
cat registry.yml
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Deploying our first stack
|
||
|
||
- All stack manipulation commands start with `docker stack`
|
||
|
||
- Under the hood, they map to `docker service` commands
|
||
|
||
- Stacks have a *name* (which also serves as a namespace)
|
||
|
||
- Stacks are specified with the aforementioned Compose file format version 3
|
||
|
||
.exercise[
|
||
|
||
- Deploy our local registry:
|
||
```bash
|
||
docker stack deploy registry --compose-file registry.yml
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Inspecting stacks
|
||
|
||
- `docker stack ps` shows the detailed state of all services of a stack
|
||
|
||
.exercise[
|
||
|
||
- Check that our registry is running correctly:
|
||
```bash
|
||
docker stack ps registry
|
||
```
|
||
|
||
- Confirm that we get the same output with the following command:
|
||
```bash
|
||
docker service ps registry_registry
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Specifics of stack deployment
|
||
|
||
Our registry is not *exactly* identical to the one deployed with `docker service create`!
|
||
|
||
- Each stack gets its own overlay network
|
||
|
||
- Services of the stack are connected to this network
|
||
<br/>(unless specified differently in the Compose file)
|
||
|
||
- Services get network aliases matching their name in the Compose file
|
||
<br/>(just like when Compose brings up an app specified in a v2 file)
|
||
|
||
- Services are explicitly named `<stack_name>_<service_name>`
|
||
|
||
- Services and tasks also get an internal label indicating which stack they belong to
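
For instance, assuming that the label in question is `com.docker.stack.namespace`, we could list the services of the `registry` stack like this:

```bash
docker service ls --filter label=com.docker.stack.namespace=registry
```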
|
||
|
||
---
|
||
|
||
## Building and pushing stack services
|
||
|
||
- We are going to use the `build` + `image` trick that we showed earlier:
|
||
|
||
```bash
|
||
docker-compose -f my_stack_file.yml build
|
||
docker-compose -f my_stack_file.yml push
|
||
docker stack deploy my_stack --compose-file my_stack_file.yml
|
||
```
|
||
|
||
.exercise[
|
||
|
||
- Try it:
|
||
```bash
|
||
docker-compose -f dockercoins.yml build
|
||
```
|
||
|
||
]
|
||
|
||
--
|
||
|
||
If it doesn't work, make sure that you have Compose 1.10.
|
||
|
||
---
|
||
|
||
## Upgrading Compose
|
||
|
||
- This is only necessary if you have an older version of Compose
|
||
|
||
.exercise[
|
||
|
||
- Upgrade Compose, using the `master` branch:
|
||
```bash
|
||
pip install git+git://github.com/docker/compose
|
||
```
|
||
|
||
- Re-hash our `$PATH`:
|
||
```bash
|
||
hash docker-compose
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Trying again
|
||
|
||
- We can now build and push our container images
|
||
|
||
.exercise[
|
||
|
||
- Build our application:
|
||
```bash
|
||
docker-compose -f dockercoins.yml build
|
||
```
|
||
|
||
- Push the images to our registry:
|
||
```bash
|
||
docker-compose -f dockercoins.yml push
|
||
```
|
||
|
||
]
|
||
|
||
Let's have a look at the `dockercoins.yml` file while this is building and pushing.
|
||
|
||
---
|
||
|
||
```yaml
|
||
version: "3"
|
||
|
||
services:
|
||
rng:
|
||
build: dockercoins/rng
|
||
image: ${REGISTRY_SLASH-127.0.0.1:5000/}rng${COLON_TAG-:latest}
|
||
deploy:
|
||
mode: global
|
||
...
|
||
redis:
|
||
image: redis
|
||
...
|
||
worker:
|
||
build: dockercoins/worker
|
||
image: ${REGISTRY_SLASH-127.0.0.1:5000/}worker${COLON_TAG-:latest}
|
||
...
|
||
deploy:
|
||
replicas: 10
|
||
```
|
||
|
||
???
|
||
|
||
## What's this `logging` section?
|
||
|
||
- This application stack is set up to send logs to a local GELF receiver
|
||
|
||
- We will use another "ready-to-use" Compose file to deploy an ELK stack
|
||
|
||
- We won't give much more detail about the ELK stack right now
|
||
|
||
(But there is a chapter dedicated to it in another part!)
|
||
|
||
- A given container can have only one logging driver at a time (for now)
|
||
|
||
- As a result, the `gelf` driver supersedes the default `json-file` driver
|
||
|
||
- ... Which means that the output of these containers won't show up in `docker logs`
|
||
|
||
---
|
||
|
||
## Deploying the application
|
||
|
||
- Now that the images are on the registry, we can deploy our application stack
|
||
|
||
.exercise[
|
||
|
||
- Create the application stack:
|
||
```bash
|
||
docker stack deploy dockercoins --compose-file dockercoins.yml
|
||
```
|
||
|
||
]
|
||
|
||
We can now connect to any of our nodes on port 8000, and we will see the familiar hashing speed graph.
|
||
|
||
???
|
||
|
||
## Deploying the logging stack
|
||
|
||
- We are going to deploy an ELK stack
|
||
|
||
- We won't tell you much more about ELK (for now!)
|
||
|
||
.exercise[
|
||
|
||
- Deploy the logging stack:
|
||
```bash
|
||
docker stack deploy elk --compose-file elk.yml
|
||
```
|
||
|
||
]
|
||
|
||
???
|
||
|
||
## Accessing the logging stack
|
||
|
||
- At the end of `elk.yml`, we have:
|
||
|
||
```yaml
|
||
kibana:
|
||
image: kibana:4
|
||
ports:
|
||
- "5601:5601"
|
||
environment:
|
||
ELASTICSEARCH_URL: http://elasticsearch:9200
|
||
```
|
||
|
||
- Kibana (the web frontend for the logging stack) is exposed on port 5601
|
||
|
||
.exercise[
|
||
|
||
- With your web browser, connect to port 5601
|
||
|
||
- Click around until you see your containers' logs!
|
||
|
||
]
|
||
|
||
???
|
||
|
||
.exercise[
|
||
|
||
- Deploy Prometheus, cAdvisor, and the node exporter, just like we deployed DockerCoins:
|
||
```bash
|
||
docker-compose -f prometheus.yml build
|
||
docker-compose -f prometheus.yml push
|
||
docker stack deploy prometheus --compose-file prometheus.yml
|
||
```
|
||
|
||
]
|
||
|
||
Look at `prometheus.yml` while it's building and pushing.
|
||
|
||
???
|
||
|
||
```yaml
|
||
version: "3"
|
||
|
||
services:
|
||
|
||
prometheus:
|
||
build: ../prom
|
||
image: 127.0.0.1:5000/prom
|
||
ports:
|
||
- "9090:9090"
|
||
|
||
node:
|
||
...
|
||
|
||
cadvisor:
|
||
image: google/cadvisor
|
||
deploy:
|
||
mode: global
|
||
volumes:
|
||
- "/:/rootfs"
|
||
- "/var/run:/var/run"
|
||
- "/sys:/sys"
|
||
- "/var/lib/docker:/var/lib/docker"
|
||
```
|
||
|
||
???
|
||
|
||
## Accessing our new metrics stack
|
||
|
||
.exercise[
|
||
|
||
- Go to any node, port 9090
|
||
|
||
- Check that data scraping works (click on "status", then "targets")
|
||
|
||
- Select a metric from the "insert metric at cursor" dropdown
|
||
|
||
- Execute!
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Maintaining multiple environments
|
||
|
||
There are many ways to handle variations between environments.
|
||
|
||
- Compose loads `docker-compose.yml` and (if it exists) `docker-compose.override.yml`
|
||
|
||
- Compose can load alternate file(s), specified with the `-f` flag or the `COMPOSE_FILE` environment variable
|
||
|
||
- Compose files can *extend* other Compose files, selectively including services:
|
||
|
||
```yaml
|
||
web:
|
||
extends:
|
||
file: common-services.yml
|
||
service: webapp
|
||
```
|
||
|
||
See [this documentation page](https://docs.docker.com/compose/extends/) for more details about these techniques.
|
||
|
||
|
||
---
|
||
|
||
## Good to know ...
|
||
|
||
- Compose file version 3 adds the `deploy` section
|
||
|
||
- Compose file version 3.1 adds support for secrets
|
||
|
||
- You can re-run `docker stack deploy` to update a stack
|
||
|
||
- ... But unsupported features will be wiped each time you redeploy (!)
|
||
|
||
(This will likely be fixed/improved soon)
|
||
|
||
- `extends` doesn't work with `docker stack deploy`
|
||
|
||
(But you can use `docker-compose config` to "flatten" your configuration)
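
A minimal sketch of that "flattening" workaround (the file names are hypothetical):

```bash
# Merge the base file and an override into a single, flattened file ...
docker-compose -f docker-compose.yml -f docker-compose.prod.yml config > flat.yml
# ... then deploy the result as a stack
docker stack deploy myapp --compose-file flat.yml
```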
|
||
|
||
---
|
||
|
||
## Summary
|
||
|
||
- We've seen how to set up a Swarm
|
||
|
||
- We've used it to host our own registry
|
||
|
||
- We've built our app container images
|
||
|
||
- We've used the registry to host those images
|
||
|
||
- We've deployed and scaled our application
|
||
|
||
- We've seen how to use Compose to streamline deployments
|
||
|
||
- Awesome job, team!
|
||
|
||
---
|
||
|
||
name: part-2
|
||
|
||
class: title
|
||
|
||
Part 2
|
||
|
||
---
|
||
|
||
## Before we start ...
|
||
|
||
The following exercises assume that you have a 5-node Swarm cluster.
|
||
|
||
If you come here from a previous tutorial and still have your cluster: great!
|
||
|
||
Otherwise: check [part 1](#part-1) to learn how to set up your own cluster.
|
||
|
||
We pick up exactly where we left you, so we assume that you have:
|
||
|
||
- a five-node Swarm cluster,
|
||
|
||
- a self-hosted registry,
|
||
|
||
- DockerCoins up and running.
|
||
|
||
The next slide has a cheat sheet if you need to set that up in a pinch.
|
||
|
||
---
|
||
|
||
## Catching up
|
||
|
||
Assuming you have 5 nodes provided by
|
||
[Play-With-Docker](http://play-with-docker.com/), do this from `node1`:
|
||
|
||
```bash
|
||
docker swarm init --advertise-addr eth0
|
||
TOKEN=$(docker swarm join-token -q manager)
|
||
for N in $(seq 2 5); do
|
||
DOCKER_HOST=tcp://node$N:2375 docker swarm join --token $TOKEN node1:2377
|
||
done
|
||
git clone git://github.com/jpetazzo/orchestration-workshop
|
||
cd orchestration-workshop/stacks
|
||
docker stack deploy --compose-file registry.yml registry
|
||
docker-compose -f dockercoins.yml build
|
||
docker-compose -f dockercoins.yml push
|
||
docker stack deploy --compose-file dockercoins.yml dockercoins
|
||
```
|
||
|
||
You should now be able to connect to port 8000 and see the DockerCoins web UI.
|
||
|
||
---
|
||
|
||
## Troubleshooting overlay networks
|
||
|
||
<!--
|
||
|
||
## Finding the real cause of the bottleneck
|
||
|
||
- We want to debug our app as we scale `worker` up and down
|
||
|
||
-->
|
||
|
||
- We want to run tools like `ab` or `httping` on the internal network
|
||
|
||
--
|
||
|
||
- Ah, if only we had created our overlay network with the `--attachable` flag ...
|
||
|
||
--
|
||
|
||
- Oh well, let's use this as an excuse to introduce New Ways To Do Things
|
||
|
||
---
|
||
|
||
# Breaking into an overlay network
|
||
|
||
- We will create a dummy placeholder service on our network
|
||
|
||
- Then we will use `docker exec` to run more processes in this container
|
||
|
||
.exercise[
|
||
|
||
- Start a "do nothing" container using our favorite Swiss-Army distro:
|
||
```bash
|
||
docker service create --network dockercoins_default --name debug --mode global \
|
||
alpine sleep 1000000000
|
||
```
|
||
|
||
]
|
||
|
||
Why am I using global scheduling here? Because I'm lazy!
|
||
<br/>
|
||
With global scheduling, I'm *guaranteed* to have an instance on the local node.
|
||
<br/>
|
||
I don't need to SSH to another node.
|
||
|
||
---
|
||
|
||
## Entering the debug container
|
||
|
||
- Once our container is started (which should be really fast because the alpine image is small), we can enter it (from any node)
|
||
|
||
.exercise[
|
||
|
||
- Locate the container:
|
||
```bash
|
||
docker ps
|
||
```
|
||
|
||
- Enter it:
|
||
```bash
|
||
docker exec -ti <containerID> sh
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Labels
|
||
|
||
- We can also be fancy and find the ID of the container automatically
|
||
|
||
- SwarmKit places labels on containers
|
||
|
||
.exercise[
|
||
|
||
- Get the ID of the container:
|
||
```bash
|
||
CID=$(docker ps -q --filter label=com.docker.swarm.service.name=debug)
|
||
```
|
||
|
||
- And enter the container:
|
||
```bash
|
||
docker exec -ti $CID sh
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Installing our debugging tools
|
||
|
||
- Ideally, you would author your own image, with all your favorite tools, and use it instead of the base `alpine` image
|
||
|
||
- But we can also dynamically install whatever we need
|
||
|
||
.exercise[
|
||
|
||
- Install a few tools:
|
||
```bash
|
||
apk add --update curl apache2-utils drill
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Investigating the `rng` service
|
||
|
||
- First, let's check what `rng` resolves to
|
||
|
||
.exercise[
|
||
|
||
- Use drill or nslookup to resolve `rng`:
|
||
```bash
|
||
drill rng
|
||
```
|
||
|
||
]
|
||
|
||
This gives us one IP address. It is not the IP address of a container.
|
||
It is a virtual IP address (VIP) for the `rng` service.
|
||
|
||
---
|
||
|
||
## Investigating the VIP
|
||
|
||
.exercise[
|
||
|
||
- Try to ping the VIP:
|
||
```bash
|
||
ping rng
|
||
```
|
||
|
||
]
|
||
|
||
It *should* ping. (But this might change in the future.)
|
||
|
||
With Engine 1.12: VIPs respond to ping if a
|
||
backend is available on the same machine.
|
||
|
||
With Engine 1.13: VIPs respond to ping if a
|
||
backend is available anywhere.
|
||
|
||
(Again: this might change in the future.)
|
||
|
||
---
|
||
|
||
## What if I don't like VIPs?
|
||
|
||
- Services can be published using two modes: VIP and DNSRR.
|
||
|
||
- With VIP, you get a virtual IP for the service, and a load balancer
|
||
based on IPVS
|
||
|
||
(By the way, IPVS is totally awesome and if you want to learn more about it in the context of containers,
|
||
I highly recommend [this talk](https://www.youtube.com/watch?v=oFsJVV1btDU&index=5&list=PLkA60AVN3hh87OoVra6MHf2L4UR9xwJkv) by [@kobolog](https://twitter.com/kobolog) at DC15EU!)
|
||
|
||
- With DNSRR, you get the former behavior (from Engine 1.11), where
|
||
resolving the service yields the IP addresses of all the containers for
|
||
this service
|
||
|
||
- You change this with `docker service create --endpoint-mode [vip|dnsrr]`
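
For example, a hypothetical extra copy of the `rng` service using DNS round-robin instead of a VIP could be created like this (the service name is made up; the network and image names match what the stack deployment created earlier):

```bash
docker service create --name rng-dnsrr --network dockercoins_default \
       --endpoint-mode dnsrr \
       127.0.0.1:5000/rng:latest
```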
|
||
|
||
---
|
||
|
||
## Looking up VIP backends
|
||
|
||
- You can also resolve a special name: `tasks.<name>`
|
||
|
||
- It will give you the IP addresses of the containers for a given service
|
||
|
||
.exercise[
|
||
|
||
- Obtain the IP addresses of the containers for the `rng` service:
|
||
```bash
|
||
drill tasks.rng
|
||
```
|
||
|
||
]
|
||
|
||
This should list 5 IP addresses.
|
||
|
||
???
|
||
|
||
## Testing and benchmarking our service
|
||
|
||
- We will check that the `rng` service is up, then
|
||
benchmark it with `ab`
|
||
|
||
.exercise[
|
||
|
||
- Make a test request to the service:
|
||
```bash
|
||
curl rng
|
||
```
|
||
|
||
- Open another window, and stop the workers, to test in isolation:
|
||
```bash
|
||
docker service update dockercoins_worker --replicas 0
|
||
```
|
||
|
||
]
|
||
|
||
Wait until the workers are stopped (check with `docker service ls`)
|
||
before continuing.
|
||
|
||
???
|
||
|
||
## Benchmarking `rng`
|
||
|
||
We will send 50 requests, but with various levels of concurrency.
|
||
|
||
.exercise[
|
||
|
||
- Send 50 requests, with a single sequential client:
|
||
```bash
|
||
ab -c 1 -n 50 http://rng/10
|
||
```
|
||
|
||
- Send 50 requests, with fifty parallel clients:
|
||
```bash
|
||
ab -c 50 -n 50 http://rng/10
|
||
```
|
||
|
||
]
|
||
|
||
???
|
||
|
||
## Benchmark results for `rng`
|
||
|
||
- When serving requests sequentially, they each take 100ms
|
||
|
||
- In the parallel scenario, the latency increased dramatically
|
||
|
||
- What about `hasher`?
|
||
|
||
???
|
||
|
||
## Benchmarking `hasher`
|
||
|
||
We will do the same tests for `hasher`.
|
||
|
||
The command is slightly more complex, since we need to post random data.
|
||
|
||
First, we need to put the POST payload in a temporary file.
|
||
|
||
.exercise[
|
||
|
||
- Install curl in the container, and generate 10 bytes of random data:
|
||
```bash
|
||
curl http://rng/10 >/tmp/random
|
||
```
|
||
|
||
]
|
||
|
||
???
|
||
|
||
## Benchmarking `hasher`
|
||
|
||
Once again, we will send 50 requests, with different levels of concurrency.
|
||
|
||
.exercise[
|
||
|
||
- Send 50 requests with a sequential client:
|
||
```bash
|
||
ab -c 1 -n 50 -T application/octet-stream -p /tmp/random http://hasher/
|
||
```
|
||
|
||
- Send 50 requests with 50 parallel clients:
|
||
```bash
|
||
ab -c 50 -n 50 -T application/octet-stream -p /tmp/random http://hasher/
|
||
```
|
||
|
||
]
|
||
|
||
???
|
||
|
||
## Benchmark results for `hasher`
|
||
|
||
- The sequential benchmark takes ~5 seconds to complete
|
||
|
||
- The parallel benchmark takes less than 1 second to complete
|
||
|
||
- In both cases, each request takes a bit more than 100ms to complete
|
||
|
||
- Requests are a bit slower in the parallel benchmark
|
||
|
||
- It looks like `hasher` is better equipped to deal with concurrency than `rng`
|
||
|
||
???
|
||
|
||
class: title
|
||
|
||
Why?
|
||
|
||
???
|
||
|
||
## Why does everything take (at least) 100ms?
|
||
|
||
??
|
||
|
||
`rng` code:
|
||
|
||

|
||
|
||
??
|
||
|
||
`hasher` code:
|
||
|
||

|
||
|
||
???
|
||
|
||
class: title
|
||
|
||
But ...
|
||
|
||
WHY?!?
|
||
|
||
???
|
||
|
||
## Why did we sprinkle this sample app with sleeps?
|
||
|
||
- Deterministic performance
|
||
<br/>(regardless of instance speed, CPUs, I/O...)
|
||
|
||
??
|
||
|
||
- Actual code sleeps all the time anyway
|
||
|
||
??
|
||
|
||
- When your code makes a remote API call:
|
||
|
||
- it sends a request;
|
||
|
||
- it sleeps until it gets the response;
|
||
|
||
- it processes the response.
|
||
|
||
???
|
||
|
||
## Why do `rng` and `hasher` behave differently?
|
||
|
||

|
||
|
||
??
|
||
|
||
(Synchronous vs. asynchronous event processing)
|
||
|
||
---
|
||
|
||
## Global scheduling → global debugging
|
||
|
||
- Traditional approach:
|
||
|
||
- log into a node
|
||
- install our Swiss Army Knife (if necessary)
|
||
- troubleshoot things
|
||
|
||
- Proposed alternative:
|
||
|
||
- put our Swiss Army Knife in a container (e.g. [nicolaka/netshoot](https://hub.docker.com/r/nicolaka/netshoot/))
|
||
- run tests from multiple locations at the same time
|
||
|
||
(This becomes practical with the `docker service logs` command, available by enabling experimental features.)
|
||
|
||
---
|
||
|
||
# Securing overlay networks
|
||
|
||
- By default, overlay networks are using plain VXLAN encapsulation
|
||
|
||
(~Ethernet over UDP, using SwarmKit's control plane for ARP resolution)
|
||
|
||
- Encryption can be enabled on a per-network basis
|
||
|
||
(It will use IPSEC encryption provided by the kernel, leveraging hardware acceleration)
|
||
|
||
- This is only for the `overlay` driver
|
||
|
||
(Other drivers/plugins will use different mechanisms)
|
||
|
||
---
|
||
|
||
## Creating two networks: encrypted and not
|
||
|
||
- Let's create two networks for testing purposes
|
||
|
||
.exercise[
|
||
|
||
- Create an "insecure" network:
|
||
```bash
|
||
docker network create insecure --driver overlay --attachable
|
||
```
|
||
|
||
- Create a "secure" network:
|
||
```bash
|
||
docker network create secure --opt encrypted --driver overlay --attachable
|
||
```
|
||
|
||
]
|
||
|
||
.warning[Make sure that you don't typo that option; errors are silently ignored!]
|
||
|
||
---
|
||
|
||
## Deploying a web server sitting on both networks
|
||
|
||
- Let's use good old NGINX
|
||
|
||
- We will attach it to both networks
|
||
|
||
- We will use a placement constraint to make sure that it is on a different node
|
||
|
||
.exercise[
|
||
|
||
- Create a web server running somewhere else:
|
||
```bash
|
||
docker service create --name web \
|
||
--network secure --network insecure \
|
||
--constraint node.hostname!=node1 \
|
||
nginx
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Sniff HTTP traffic
|
||
|
||
- We will use `ngrep`, which lets us grep for network traffic
|
||
|
||
- We will run it in a container, using host networking to access the host's interfaces
|
||
|
||
.exercise[
|
||
|
||
- Sniff network traffic and display all packets containing "HTTP":
|
||
```bash
|
||
docker run --net host jpetazzo/netshoot ngrep -tpd eth0 HTTP
|
||
```
|
||
|
||
]
|
||
|
||
--
|
||
|
||
Seeing tons of HTTP requests? Shut down your DockerCoins workers:
|
||
```bash
|
||
docker service update dockercoins_worker --replicas=0
|
||
```
|
||
|
||
---
|
||
|
||
## Check that we are, indeed, sniffing traffic
|
||
|
||
- Let's see if we can intercept our traffic with Google!
|
||
|
||
.exercise[
|
||
|
||
- Open a new terminal
|
||
|
||
- Issue an HTTP request to Google (or anything you like):
|
||
```bash
|
||
curl google.com
|
||
```
|
||
|
||
]
|
||
|
||
The ngrep container will display one `#` per packet traversing the network interface.
|
||
|
||
When you do the `curl`, you should see the HTTP request in clear text in the output.
|
||
|
||
---
|
||
|
||
## Try to sniff traffic across overlay networks
|
||
|
||
- We will run `curl web` through both secure and insecure networks
|
||
|
||
.exercise[
|
||
|
||
- Access the web server through the insecure network:
|
||
```bash
|
||
docker run --rm --net insecure nicolaka/netshoot curl web
|
||
```
|
||
|
||
- Now do the same through the secure network:
|
||
```bash
|
||
docker run --rm --net secure nicolaka/netshoot curl web
|
||
```
|
||
|
||
]
|
||
|
||
When you run the first command, you will see HTTP fragments.
|
||
<br/>
|
||
However, when you run the second one, only `#` will show up.
|
||
|
||
---
|
||
|
||
# Rolling updates
|
||
|
||
- We want to release a new version of the worker
|
||
|
||
- We will edit the code ...
|
||
|
||
- ... build the new image ...
|
||
|
||
- ... push it to the registry ...
|
||
|
||
- ... update our service to use the new image
|
||
|
||
???
|
||
|
||
## But first...
|
||
|
||
- Restart the workers
|
||
|
||
.exercise[
|
||
|
||
- Just scale back to 10 replicas:
|
||
```bash
|
||
docker service update dockercoins_worker --replicas 10
|
||
```
|
||
|
||
- Check that they're running:
|
||
```bash
|
||
docker service ps dockercoins_worker
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Making changes
|
||
|
||
.exercise[
|
||
|
||
- Edit `~/orchestration-workshop/dockercoins/worker/worker.py`
|
||
|
||
- Locate the line that has a `sleep` instruction
|
||
|
||
- Reduce the `sleep` from `0.1` to `0.01`
|
||
|
||
- Save your changes and exit
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Building and pushing the new image
|
||
|
||
.exercise[
|
||
|
||
- Build the new image:
|
||
```bash
|
||
IMAGE=127.0.0.1:5000/worker:v0.01
|
||
docker build -t $IMAGE ~/orchestration-workshop/dockercoins/worker
|
||
```
|
||
|
||
- Push it to the registry:
|
||
```bash
|
||
docker push $IMAGE
|
||
```
|
||
|
||
]
|
||
|
||
Note how the build and push were fast (because caching).
|
||
|
||
---
|
||
|
||
## Watching the deployment process
|
||
|
||
- We will need to open a new window for this
|
||
|
||
.exercise[
|
||
|
||
- Look at our service status:
|
||
```bash
|
||
watch -n1 "docker service ps dockercoins_worker | grep -v Shutdown.*Shutdown"
|
||
```
|
||
|
||
]
|
||
|
||
- `docker service ps dockercoins_worker` gives us all tasks
|
||
<br/>(including the ones whose current or desired state is `Shutdown`)
|
||
|
||
- Then we filter out the tasks whose current **and** desired state is `Shutdown`
|
||
|
||
- Future versions might have fancy filters to make that less tinkerish
|
||
|
||
---
|
||
|
||
## Updating to our new image
|
||
|
||
- Keep the `watch ...` command running!
|
||
|
||
.exercise[
|
||
|
||
- In the other window, bring back the workers (if you had stopped them earlier):
|
||
```bash
|
||
docker service update dockercoins_worker --replicas 10
|
||
```
|
||
|
||
- Then, update the service to the new image:
|
||
```bash
|
||
docker service update dockercoins_worker --image $IMAGE
|
||
```
|
||
|
||
]
|
||
|
||
By default, SwarmKit does a rolling upgrade, one instance at a time.
|
||
|
||
---
|
||
|
||
## Changing the upgrade policy
|
||
|
||
- We can set upgrade parallelism (how many instances to update at the same time)
|
||
|
||
- And upgrade delay (how long to wait between two batches of instances)
|
||
|
||
.exercise[
|
||
|
||
- Change the parallelism to 2 and the delay to 5 seconds:
|
||
```bash
|
||
docker service update dockercoins_worker \
|
||
--update-parallelism 2 --update-delay 5s
|
||
```
|
||
|
||
]
|
||
|
||
The current upgrade will continue at a faster pace.
|
||
|
||
---
|
||
|
||
## Rolling back
|
||
|
||
- At any time (e.g. before the upgrade is complete), we can rollback
|
||
|
||
.exercise[
|
||
|
||
- Rollback to the previous image:
|
||
```bash
|
||
docker service update dockercoins_worker \
|
||
--image $DOCKER_REGISTRY/dockercoins_worker:v0.1
|
||
```
|
||
|
||
- With Docker 1.13, we can also revert to the previous service specification:
|
||
```bash
|
||
docker service update dockercoins_worker --rollback
|
||
```
|
||
|
||
]
|
||
|
||
Note: if you updated the roll-out parallelism, *rollback* will not roll back to the previous image; it will roll back to the previous roll-out cadence.
|
||
|
||
---
|
||
|
||
## Timeline of an upgrade
|
||
|
||
- SwarmKit will upgrade N instances at a time
|
||
<br/>(following the `update-parallelism` parameter)
|
||
|
||
- New tasks are created, and their desired state is set to `Ready`
|
||
<br/>.small[(this pulls the image if necessary, ensures resource availability, creates the container ... without starting it)]
|
||
|
||
- If the new tasks fail to get to `Ready` state, go back to the previous step
|
||
<br/>.small[(SwarmKit will try again and again, until the situation is addressed or desired state is updated)]
|
||
|
||
- When the new tasks are `Ready`, it sets the old tasks' desired state to `Shutdown`
|
||
|
||
- When the old tasks are `Shutdown`, it starts the new tasks
|
||
|
||
- Then it waits for the `update-delay`, and continues with the next batch of instances
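
To check the update parameters currently set on a service (and the status of an update in progress), `docker service inspect` helps; the `--pretty` flag gives a human-readable summary:

```bash
docker service inspect --pretty dockercoins_worker
```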
|
||
|
||
---
|
||
|
||
## Getting task information for a given node
|
||
|
||
- You can see all the tasks assigned to a node with `docker node ps`
|
||
|
||
- It shows the *desired state* and *current state* of each task
|
||
|
||
- `docker node ps` shows info about the current node
|
||
|
||
- `docker node ps <node_name_or_id>` shows info for another node
|
||
|
||
- `docker node ps -a` includes stopped and failed tasks
|
||
|
||
---
|
||
|
||
## Getting cluster-wide task information
|
||
|
||
- The Docker API doesn't expose this directly (yet)
|
||
|
||
- But the SwarmKit API does
|
||
|
||
- Let's see how to use it
|
||
|
||
- We will use `swarmctl`
|
||
|
||
- `swarmctl` is an example program showing how to
|
||
interact with the SwarmKit API
|
||
|
||
- First, we need to install `swarmctl`
|
||
|
||
---
|
||
|
||
## Building `swarmctl`
|
||
|
||
- We are going to use a `golang` container to download the SwarmKit source and build it
|
||
|
||
.exercise[
|
||
- Download, compile, and install SwarmKit with this one-liner:
|
||
```bash
|
||
docker run -v /usr/local/bin:/go/bin golang \
|
||
go get `-v` github.com/docker/swarmkit/...
|
||
```
|
||
|
||
]
|
||
|
||
Remove `-v` if you don't like verbose things.
|
||
|
||
Shameless promo: for more Go and Docker love, check
|
||
[this blog post](http://jpetazzo.github.io/2016/09/09/go-docker/)!
|
||
|
||
Note: in the unfortunate event of SwarmKit *master* branch being broken,
|
||
the build might fail. In that case, just skip the `swarmctl` section.
|
||
|
||
---
|
||
|
||
## Using `swarmctl`
|
||
|
||
- The Docker Engine places the SwarmKit control socket in a special path
|
||
|
||
- And you need root privileges to access it
|
||
|
||
.exercise[
|
||
|
||
- Set an alias so that swarmctl can run as root and use the right control socket:
|
||
```bash
|
||
alias \
|
||
swarmctl='sudo swarmctl --socket /var/run/docker/swarm/control.sock'
|
||
```
|
||
|
||
]
|
||
|
||
(Note: with Docker 1.12, that control socket was in `/var/lib/docker/swarm/control.sock`)
|
||
|
||
---
|
||
|
||
## `swarmctl` in action
|
||
|
||
- Let's review a few useful `swarmctl` commands
|
||
|
||
.exercise[
|
||
|
||
- List cluster nodes (that's equivalent to `docker node ls`):
|
||
```bash
|
||
swarmctl node ls
|
||
```
|
||
|
||
- View all tasks across all services:
|
||
```bash
|
||
swarmctl task ls
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Caveat
|
||
|
||
- SwarmKit is vendored into the Docker Engine
|
||
|
||
- If you want to use `swarmctl`, you need the exact version of
|
||
SwarmKit that was used in your Docker Engine
|
||
|
||
- Otherwise, you might get some errors like:
|
||
|
||
```
|
||
Error: grpc: failed to unmarshal the received message proto: wrong wireType = 0
|
||
```
|
||
|
||
---
|
||
|
||
# Secrets management and encryption at rest
|
||
|
||
(New in Docker Engine 1.13)
|
||
|
||
- Secrets management = selectively and securely bring secrets to services
|
||
|
||
- Encryption at rest = protect against storage theft or prying
|
||
|
||
- Remember:
|
||
|
||
- control plane is authenticated through mutual TLS, certs rotated every 90 days
|
||
|
||
- control plane is encrypted with AES-GCM, keys rotated every 12 hours
|
||
|
||
- data plane is not encrypted by default (for performance reasons),
|
||
<br/>but we saw earlier how to enable that with a single flag
|
||
|
||
---
|
||
|
||
## Secret management
|
||
|
||
- Docker has a "secret safe" (secure key→value store)
|
||
|
||
- You can create as many secrets as you like
|
||
|
||
- You can associate secrets to services
|
||
|
||
- Secrets are exposed as plain text files, but kept in memory only (using `tmpfs`)
|
||
|
||
- Secrets are immutable (at least in Engine 1.13)
|
||
|
||
- Secrets have a max size of 500 KB
|
||
|
||
---
|
||
|
||
## Creating secrets
|
||
|
||
- We must specify a name for the secret, and the secret itself
|
||
|
||
.exercise[
|
||
|
||
- Assign [one of the four most commonly used passwords](https://www.youtube.com/watch?v=0Jx8Eay5fWQ) to a secret called `hackme`:
|
||
```bash
|
||
echo love | docker secret create hackme -
|
||
```
|
||
|
||
]
|
||
|
||
If the secret is in a file, you can simply pass the path to the file.
|
||
|
||
(The special path `-` indicates to read from the standard input.)
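
For instance, assuming the password sits in a local file called `password.txt`, creating a secret from it would look like this (both names are hypothetical):

```bash
docker secret create anothersecret ./password.txt
```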
|
||
|
||
---
|
||
|
||
## Creating better secrets
|
||
|
||
- Picking lousy passwords always leads to security breaches
|
||
|
||
.exercise[
|
||
|
||
- Let's craft a better password, and assign it to another secret:
|
||
```bash
|
||
base64 /dev/urandom | head -c16 | docker secret create arewesecureyet -
|
||
```
|
||
|
||
]
|
||
|
||
Note: in the latter case, we don't even know the secret at this point. But Swarm does.
|
||
|
||
---
|
||
|
||
## Using secrets
|
||
|
||
- Secrets must be handed explicitly to services
|
||
|
||
.exercise[
|
||
|
||
- Create a dummy service with both secrets:
|
||
```bash
|
||
docker service create \
|
||
--secret hackme --secret arewesecureyet \
|
||
--name dummyservice --mode global \
|
||
alpine sleep 1000000000
|
||
```
|
||
|
||
]
|
||
|
||
We use a global service to make sure that there will be an instance on the local node.
|
||
|
||
---
|
||
|
||
## Accessing secrets
|
||
|
||
- Secrets are materialized on `/run/secrets` (which is an in-memory filesystem)
|
||
|
||
.exercise[
|
||
|
||
- Find the ID of the container for the dummy service:
|
||
```bash
|
||
CID=$(docker ps -q --filter label=com.docker.swarm.service.name=dummyservice)
|
||
```
|
||
|
||
- Enter the container:
|
||
```bash
|
||
docker exec -ti $CID sh
|
||
```
|
||
|
||
- Check the files in `/run/secrets`
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Rotating secrets
|
||
|
||
- You can't change a secret
|
||
|
||
(Sounds annoying at first; but allows clean rollbacks if a secret update goes wrong)
|
||
|
||
- You can add a secret to a service with `docker service update --secret-add`
|
||
|
||
(This will redeploy the service; it won't add the secret on the fly)
|
||
|
||
- You can remove a secret with `docker service update --secret-rm`
|
||
|
||
- Secrets can be mapped to different names by expressing them with a micro-format:
|
||
```bash
|
||
docker service create --secret source=secretname,target=filename
|
||
```
|
||
|
||
---
|
||
|
||
## Changing our insecure password
|
||
|
||
- We want to replace our `hackme` secret with a better one
|
||
|
||
.exercise[
|
||
|
||
- Remove the insecure `hackme` secret:
|
||
```bash
|
||
docker service update dummyservice --secret-rm hackme
|
||
```
|
||
|
||
- Add our better secret instead:
|
||
```bash
|
||
docker service update dummyservice \
|
||
--secret-add source=arewesecureyet,target=hackme
|
||
```
|
||
|
||
]
|
||
|
||
Wait for the service to be fully updated with e.g. `watch docker service ps dummyservice`.
|
||
|
||
---
|
||
|
||
## Checking that our password is now stronger
|
||
|
||
- We will use the power of `docker exec`!
|
||
|
||
.exercise[
|
||
|
||
- Get the ID of the new container:
|
||
```bash
|
||
CID=$(docker ps -q --filter label=com.docker.swarm.service.name=dummyservice)
|
||
```
|
||
|
||
- Check the contents of the secret files:
|
||
```bash
|
||
docker exec $CID grep -r . /run/secrets
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Secrets in practice
|
||
|
||
- Can be (ab)used to hold whole configuration files if needed
|
||
|
||
- If you intend to rotate secret `foo`, call it `foo.N` instead, and map it to `foo`
|
||
|
||
(N can be a serial, a timestamp...)
|
||
|
||
```bash
|
||
docker service create --secret source=foo.N,target=foo ...
|
||
```
|
||
|
||
- You can update (remove+add) a secret in a single command:
|
||
|
||
```bash
|
||
docker service update ... --secret-rm foo.M --secret-add source=foo.N,target=foo
|
||
```
|
||
|
||
- For more details and examples, [check the documentation](https://docs.docker.com/engine/swarm/secrets/)
|
||
|
||
---
|
||
|
||
## A reminder about *scope*
|
||
|
||
- Out of the box, Docker API access is "all or nothing"
|
||
|
||
- When someone has access to the Docker API, they can access *everything*
|
||
|
||
- If your developers are using the Docker API to deploy on the dev cluster ...
|
||
|
||
... and the dev cluster is the same as the prod cluster ...
|
||
|
||
... it means that your devs have access to your production data, passwords, etc.
|
||
|
||
- This can easily be avoided
|
||
|
||
---
|
||
|
||
## Fine-grained API access control
|
||
|
||
A few solutions, by increasing order of flexibility:
|
||
|
||
- Use separate clusters for different security perimeters
|
||
|
||
(And different credentials for each cluster)
|
||
|
||
--
|
||
|
||
- Add an extra layer of abstraction (sudo scripts, hooks, or full-blown PAAS)
|
||
|
||
--
|
||
|
||
- Enable [authorization plugins]
|
||
|
||
- each API request is vetted by your plugin(s)
|
||
|
||
- by default, the *subject name* in the client TLS certificate is used as user name
|
||
|
||
- example: [user and permission management] in [UCP]
|
||
|
||
[authorization plugins]: https://docs.docker.com/engine/extend/plugins_authorization/
|
||
[UCP]: https://docs.docker.com/datacenter/ucp/2.1/guides/
|
||
[user and permission management]: https://docs.docker.com/datacenter/ucp/2.1/guides/admin/manage-users/
|
||
|
||
|
||
---
|
||
|
||
## Encryption at rest
|
||
|
||
- Swarm data is always encrypted
|
||
|
||
- A Swarm cluster can be "locked"
|
||
|
||
- When a cluster is "locked", the encryption key is protected with a passphrase
|
||
|
||
- Starting or restarting a locked manager requires the passphrase
|
||
|
||
- This protects against:
|
||
|
||
- theft (stealing a physical machine, a disk, a backup tape...)
|
||
|
||
- unauthorized access (to e.g. a remote or virtual volume)
|
||
|
||
- some vulnerabilities (like path traversal)
|
||
|
||
---
|
||
|
||
## Locking a Swarm cluster
|
||
|
||
- This is achieved through the `docker swarm update` command
|
||
|
||
.exercise[
|
||
|
||
- Lock our cluster:
|
||
```bash
|
||
docker swarm update --autolock=true
|
||
```
|
||
|
||
]
|
||
|
||
This will display the unlock key. Copy-paste it somewhere safe.
|
||
|
||
---
|
||
|
||
## Locked state
|
||
|
||
- If we restart a manager, it will now be locked
|
||
|
||
.exercise[
|
||
|
||
- Restart the local Engine:
|
||
```bash
|
||
sudo systemctl restart docker
|
||
```
|
||
|
||
]
|
||
|
||
Note: if you are doing the workshop on your own, using nodes
|
||
that you [provisioned yourself](https://github.com/jpetazzo/orchestration-workshop/tree/master/prepare-machine) or with [Play-With-Docker](http://play-with-docker.com/), you might have to use a different method to restart the Engine.
|
||
|
||
---
|
||
|
||
## Checking that our node is locked
|
||
|
||
- Manager commands (requiring access to encrypted data) will fail
|
||
|
||
- Other commands are OK
|
||
|
||
.exercise[
|
||
|
||
- Try a few basic commands:
|
||
```bash
|
||
docker ps
|
||
docker run alpine echo ♥
|
||
docker node ls
|
||
```
|
||
|
||
]
|
||
|
||
(The last command should fail, and it will tell you how to unlock this node.)
|
||
|
||
---
|
||
|
||
## Checking the state of the node programmatically
|
||
|
||
- The state of the node shows up in the output of `docker info`
|
||
|
||
.exercise[
|
||
|
||
- Check the output of `docker info`:
|
||
```bash
|
||
docker info
|
||
```
|
||
|
||
- Can't see it? Too verbose? Grep to the rescue!
|
||
```bash
|
||
docker info | grep ^Swarm
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Unlocking a node
|
||
|
||
- You will need the secret token that we obtained when enabling auto-lock earlier
|
||
|
||
.exercise[
|
||
|
||
- Unlock the node:
|
||
```bash
|
||
docker swarm unlock
|
||
```
|
||
|
||
- Copy-paste the secret token that we got earlier
|
||
|
||
- Check that manager commands now work correctly:
|
||
```bash
|
||
docker node ls
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Managing the secret key
|
||
|
||
- If the key is compromised, you can change it and re-encrypt with a new key:
|
||
```bash
|
||
docker swarm unlock-key --rotate
|
||
```
|
||
|
||
- If you lost the key, you can get it as long as you have at least one unlocked node:
|
||
```bash
|
||
docker swarm unlock-key -q
|
||
```
|
||
|
||
Note: if you rotate the key while some nodes are locked, without saving the previous key, those nodes won't be able to rejoin.
|
||
|
||
Note: if somebody steals both your disks and your key, .strike[you're doomed! Doooooomed!]
|
||
<br/>you can block the compromised node with `docker node demote` and `docker node rm`.
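
A minimal sketch of that last resort (the node name is hypothetical):

```bash
docker node demote node3
docker node rm --force node3
```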
|
||
|
||
---
|
||
|
||
## Unlocking the cluster permanently
|
||
|
||
- If you want to remove the secret key, disable auto-lock
|
||
|
||
.exercise[
|
||
|
||
- Permanently unlock the cluster:
|
||
```bash
|
||
docker swarm update --autolock=false
|
||
```
|
||
|
||
]
|
||
|
||
Note: if some nodes are in locked state at that moment (or if they are offline/restarting
|
||
while you disabled autolock), they still need the previous unlock key to get back online.
|
||
|
||
For more information about locking, you can check the [upcoming documentation](https://github.com/docker/docker.github.io/pull/694).
|
||
|
||
---
|
||
|
||
name: logging
|
||
|
||
# Centralized logging
|
||
|
||
- We want to send all our container logs to a central place
|
||
|
||
- If that place could offer a nice web dashboard too, that'd be nice
|
||
|
||
--
|
||
|
||
- We are going to deploy an ELK stack
|
||
|
||
- It will accept logs over a GELF socket
|
||
|
||
- We will update our services to send logs through the GELF logging driver
|
||
|
||
---
|
||
|
||
# Setting up ELK to store container logs
|
||
|
||
*Important foreword: this is not an "official" or "recommended"
|
||
setup; it is just an example. We used ELK in this demo because
|
||
it's a popular setup and we keep being asked about it; but you
|
||
will have equal success with Fluent or other logging stacks!*
|
||
|
||
What we will do:
|
||
|
||
- Spin up an ELK stack with services
|
||
|
||
- Gaze at the spiffy Kibana web UI
|
||
|
||
- Manually send a few log entries using one-shot containers
|
||
|
||
- Set our containers up to send their logs to Logstash
|
||
|
||
---
|
||
|
||
## What's in an ELK stack?
|
||
|
||
- ELK is three components:
|
||
|
||
- ElasticSearch (to store and index log entries)
|
||
|
||
- Logstash (to receive log entries from various
|
||
sources, process them, and forward them to various
|
||
destinations)
|
||
|
||
- Kibana (to view/search log entries with a nice UI)
|
||
|
||
- The only component that we will configure is Logstash
|
||
|
||
- We will accept log entries using the GELF protocol
|
||
|
||
- Log entries will be stored in ElasticSearch,
|
||
<br/>and displayed on Logstash's stdout for debugging
|
||
|
||
---
|
||
|
||
## Setting up ELK
|
||
|
||
- We need three containers: ElasticSearch, Logstash, Kibana
|
||
|
||
- We will place them on a common network, `logging`
|
||
|
||
.exercise[
|
||
|
||
- Create the network:
|
||
```bash
|
||
docker network create --driver overlay logging
|
||
```
|
||
|
||
- Create the ElasticSearch service:
|
||
```bash
|
||
docker service create --network logging --name elasticsearch elasticsearch:2.4
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Setting up Kibana
|
||
|
||
- Kibana exposes the web UI
|
||
|
||
- Its default port (5601) needs to be published
|
||
|
||
- It needs a tiny bit of configuration: the address of the ElasticSearch service
|
||
|
||
- We don't want Kibana logs to show up in Kibana (it would create clutter)
|
||
<br/>so we tell Logspout to ignore them
|
||
|
||
.exercise[
|
||
|
||
- Create the Kibana service:
|
||
```bash
|
||
docker service create --network logging --name kibana --publish 5601:5601 \
|
||
-e ELASTICSEARCH_URL=http://elasticsearch:9200 kibana:4.6
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Setting up Logstash
|
||
|
||
- Logstash needs some configuration to listen to GELF messages and send them to ElasticSearch
|
||
|
||
- We could author a custom image bundling this configuration
|
||
|
||
- We can also pass the [configuration](https://github.com/jpetazzo/orchestration-workshop/blob/master/elk/logstash.conf) on the command line
|
||
|
||
.exercise[
|
||
|
||
- Create the Logstash service:
|
||
```bash
|
||
docker service create --network logging --name logstash -p 12201:12201/udp \
|
||
logstash:2.4 -e "$(cat ~/orchestration-workshop/elk/logstash.conf)"
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Checking Logstash
|
||
|
||
- Before proceeding, let's make sure that Logstash started properly
|
||
|
||
.exercise[
|
||
|
||
- Lookup the node running the Logstash container:
|
||
```bash
|
||
docker service ps logstash
|
||
```
|
||
|
||
- Connect to that node
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## View Logstash logs
|
||
|
||
.exercise[
|
||
|
||
- Get the ID of the Logstash container:
|
||
```bash
|
||
CID=$(docker ps -q --filter label=com.docker.swarm.service.name=logstash)
|
||
```
|
||
|
||
- View the logs:
|
||
```bash
|
||
docker logs --follow $CID
|
||
```
|
||
|
||
]
|
||
|
||
You should see the heartbeat messages:
|
||
.small[
|
||
```json
|
||
{ "message" => "ok",
|
||
"host" => "1a4cfb063d13",
|
||
"@version" => "1",
|
||
"@timestamp" => "2016-06-19T00:45:45.273Z"
|
||
}
|
||
```
|
||
]
|
||
|
||
---
|
||
|
||
## Testing the GELF receiver
|
||
|
||
- In a new window, we will generate a logging message
|
||
|
||
- We will use a one-off container, and Docker's GELF logging driver
|
||
|
||
.exercise[
|
||
|
||
- Send a test message:
|
||
```bash
|
||
docker run --log-driver gelf --log-opt gelf-address=udp://127.0.0.1:12201 \
|
||
--rm alpine echo hello
|
||
```
|
||
]
|
||
|
||
The test message should show up in the logstash container logs.
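
If you want the message to be easier to spot in Kibana later, you can attach a tag to it; this is a sketch using the GELF driver's standard `tag` log option (the tag value here is arbitrary):

```bash
docker run --log-driver gelf --log-opt gelf-address=udp://127.0.0.1:12201 \
  --log-opt tag=manual-test --rm alpine echo hello again
```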
|
||
|
||
???
|
||
---
|
||
|
||
## Sending logs from a service
|
||
|
||
- We were sending from a "classic" container so far; let's send logs from a service instead
|
||
|
||
- We're lucky: the parameters (`--log-driver` and `--log-opt`) are exactly the same!
|
||
|
||
|
||
.exercise[
|
||
|
||
- Send a test message:
|
||
```bash
|
||
docker service create \
|
||
--log-driver gelf --log-opt gelf-address=udp://127.0.0.1:12201 \
|
||
alpine echo hello
|
||
```
|
||
|
||
]
|
||
|
||
The test message should also show up in the logstash container logs.
|
||
|
||
--
|
||
|
||
In fact, *multiple messages will show up, and continue to show up every few seconds!*
|
||
|
||
---
|
||
|
||
## Restart conditions
|
||
|
||
- By default, if a container exits (or is killed with `docker kill`, or runs out of memory ...),
|
||
the Swarm will restart it (possibly on a different machine)
|
||
|
||
- This behavior can be changed by setting the *restart condition* parameter
|
||
|
||
.exercise[
|
||
|
||
- Change the restart condition so that Swarm doesn't try to restart our container forever:
|
||
```bash
|
||
docker service update `xxx` --restart-condition none
|
||
```
|
||
]
|
||
|
||
Available restart conditions are `none`, `on-failure`, and `any`.
|
||
|
||
You can also set `--restart-delay`, `--restart-max-attempts`, and `--restart-window`.
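
Alternatively, the restart condition can be set when creating the one-shot service in the first place; here is a sketch using the same image and logging options as before:

```bash
docker service create --restart-condition none \
  --log-driver gelf --log-opt gelf-address=udp://127.0.0.1:12201 \
  alpine echo hello
```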
|
||
|
||
---
|
||
|
||
## Connect to Kibana
|
||
|
||
- The Kibana web UI is exposed on cluster port 5601
|
||
|
||
.exercise[
|
||
|
||
- Connect to port 5601 of your cluster
|
||
|
||
- if you're using Play-With-Docker, click on the (5601) badge above the terminal
|
||
|
||
- otherwise, open http://(any-node-address):5601/ with your browser
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## "Configuring" Kibana
|
||
|
||
- If you see a status page with a yellow item, wait a minute and reload
|
||
(Kibana is probably still initializing)
|
||
|
||
- Kibana should prompt you to "Configure an index pattern":
|
||
<br/>in the "Time-field name" drop down, select "@timestamp", and hit the
|
||
"Create" button
|
||
|
||
- Then:
|
||
|
||
- click "Discover" (in the top-left corner)
|
||
- click "Last 15 minutes" (in the top-right corner)
|
||
- click "Last 1 hour" (in the list in the middle)
|
||
- click "Auto-refresh" (top-right corner)
|
||
- click "5 seconds" (top-left of the list)
|
||
|
||
- You should see a series of green bars (with one new green bar every minute)
|
||
|
||
---
|
||
|
||
## Updating our services to use GELF
|
||
|
||
- We will now tell our Swarm to add GELF logging to all our services
|
||
|
||
- This is done with the `docker service update` command
|
||
|
||
- The logging flags are the same as before
|
||
|
||
.exercise[
|
||
|
||
- Enable GELF logging for all our *stateless* services:
|
||
```bash
|
||
for SERVICE in hasher rng webui worker; do
|
||
docker service update dockercoins_$SERVICE \
|
||
--log-driver gelf --log-opt gelf-address=udp://127.0.0.1:12201
|
||
done
|
||
```
|
||
|
||
]
|
||
|
||
After ~15 seconds, you should see the log messages in Kibana.
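
To double-check that the update took effect, you can inspect one of the services; a sketch (the exact JSON structure may differ slightly between Docker versions):

```bash
# should print the gelf driver name and its options
docker service inspect dockercoins_worker \
  --format '{{json .Spec.TaskTemplate.LogDriver}}'
```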
|
||
|
||
---
|
||
|
||
## Viewing container logs
|
||
|
||
- Go back to Kibana
|
||
|
||
- Container logs should be showing up!
|
||
|
||
- We can customize the web UI to be more readable
|
||
|
||
.exercise[
|
||
|
||
- In the left column, move the mouse over the following
|
||
columns, and click the "Add" button that appears:
|
||
|
||
- host
|
||
- container_name
|
||
- message
|
||
|
||
<!--
|
||
- logsource
|
||
- program
|
||
- message
|
||
-->
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## .warning[Don't update stateful services!]
|
||
|
||
- Why didn't we update the Redis service as well?
|
||
|
||
- When a service changes, SwarmKit replaces existing containers with new ones
|
||
|
||
- This is fine for stateless services
|
||
|
||
- But if you update a stateful service, its data will be lost in the process
|
||
|
||
- If we updated our Redis service, all our DockerCoins would be lost
|
||
|
||
---
|
||
|
||
## Important afterword
|
||
|
||
**This is not a "production-grade" setup.**
|
||
|
||
It is just an educational example. We did set up a single
|
||
ElasticSearch instance and a single Logstash instance.
|
||
|
||
In a production setup, you need an ElasticSearch cluster
|
||
(both for capacity and availability reasons). You also
|
||
need multiple Logstash instances.
|
||
|
||
And if you want to withstand
|
||
bursts of logs, you need some kind of message queue:
|
||
Redis if you're cheap, Kafka if you want to make sure
|
||
that you don't drop messages on the floor. Good luck.
|
||
|
||
---
|
||
|
||
# Metrics collection
|
||
|
||
- We want to gather metrics in a central place
|
||
|
||
- We will gather node metrics and container metrics
|
||
|
||
- We want a nice interface to view them (graphs)
|
||
|
||
---
|
||
|
||
## Node metrics
|
||
|
||
- CPU, RAM, disk usage on the whole node
|
||
|
||
- Total number of processes running, and their states
|
||
|
||
- Number of open files, sockets, and their states
|
||
|
||
- I/O activity (disk, network), per operation or volume
|
||
|
||
- Physical/hardware (when applicable): temperature, fan speed ...
|
||
|
||
- ... and much more!
|
||
|
||
---
|
||
|
||
## Container metrics
|
||
|
||
- Similar to node metrics, but not totally identical
|
||
|
||
- RAM breakdown will be different
|
||
|
||
- active vs inactive memory
|
||
- some memory is *shared* between containers, and accounted specially
|
||
|
||
- I/O activity is also harder to track
|
||
|
||
- async writes can cause deferred "charges"
|
||
- some page-ins are also shared between containers
|
||
|
||
For details about container metrics, see:
|
||
<br/>
|
||
http://jpetazzo.github.io/2013/10/08/docker-containers-metrics/
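
For instance, on a host using cgroups v1 with the default cgroupfs driver, you can read a container's memory usage straight from the host; a minimal sketch (the cgroup paths are an assumption and differ with other cgroup drivers or distributions):

```bash
CID=$(docker ps -q | head -n1)                     # pick any running container
FULL_ID=$(docker inspect --format '{{.Id}}' $CID)  # cgroup directories use the full ID
cat /sys/fs/cgroup/memory/docker/$FULL_ID/memory.usage_in_bytes
```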
|
||
|
||
---
|
||
|
||
## Tools
|
||
|
||
We will use three open source Go projects for our metrics pipeline:
|
||
|
||
- Intel Snap
|
||
|
||
Collects, processes, and publishes metrics
|
||
|
||
- InfluxDB
|
||
|
||
Stores metrics
|
||
|
||
- Grafana
|
||
|
||
Displays metrics visually
|
||
|
||
---
|
||
|
||
## Snap
|
||
|
||
- [github.com/intelsdi-x/snap](https://github.com/intelsdi-x/snap)
|
||
|
||
- Can collect, process, and publish metric data
|
||
|
||
- Doesn’t store metrics
|
||
|
||
- Works as a daemon (snapd) controlled by a CLI (snapctl)
|
||
|
||
- Offloads collecting, processing, and publishing to plugins
|
||
|
||
- Does nothing out of the box; configuration required!
|
||
|
||
- Docs: https://github.com/intelsdi-x/snap/blob/master/docs/
|
||
|
||
---
|
||
|
||
## InfluxDB
|
||
|
||
- Snap doesn't store metrics data
|
||
|
||
- InfluxDB is specifically designed for time-series data
|
||
|
||
- CRud vs. CRUD (you rarely if ever update/delete data)
|
||
|
||
- orthogonal read and write patterns
|
||
|
||
- storage format optimization is key (for disk usage and performance)
|
||
|
||
- Snap has a plugin that can *publish* to InfluxDB
|
||
|
||
---
|
||
|
||
## Grafana
|
||
|
||
- Snap cannot show graphs
|
||
|
||
- InfluxDB cannot show graphs
|
||
|
||
- Grafana will take care of that
|
||
|
||
- Grafana can read data from InfluxDB and display it as graphs
|
||
|
||
---
|
||
|
||
## Getting and setting up Snap
|
||
|
||
- We will install Snap directly on the nodes
|
||
|
||
- Release tarballs are available from GitHub
|
||
|
||
- We will use a *global service*
|
||
<br/>(started on all nodes, including nodes added later)
|
||
|
||
- This service will download and unpack Snap in /opt and /usr/local
|
||
|
||
- /opt and /usr/local will be bind-mounted from the host
|
||
|
||
- This service will effectively install Snap on the hosts
|
||
|
||
---
|
||
|
||
## The Snap installer service
|
||
|
||
- This will get Snap on all nodes
|
||
|
||
.exercise[
|
||
|
||
```bash
|
||
docker service create --restart-condition=none --mode global \
|
||
--mount type=bind,source=/usr/local/bin,target=/usr/local/bin \
|
||
--mount type=bind,source=/opt,target=/opt centos sh -c '
|
||
SNAPVER=v0.16.1-beta
|
||
RELEASEURL=https://github.com/intelsdi-x/snap/releases/download/$SNAPVER
|
||
curl -sSL $RELEASEURL/snap-$SNAPVER-linux-amd64.tar.gz |
|
||
tar -C /opt -zxf-
|
||
curl -sSL $RELEASEURL/snap-plugins-$SNAPVER-linux-amd64.tar.gz |
|
||
tar -C /opt -zxf-
|
||
ln -s snap-$SNAPVER /opt/snap
|
||
for BIN in snapd snapctl; do ln -s /opt/snap/bin/$BIN /usr/local/bin/$BIN; done
|
||
' # If you copy-paste that block, do not forget that final quote ☺
|
||
```
|
||
|
||
]
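
Once the service has run on a node, a quick sanity check confirms that the tarballs were unpacked and the symlinks created:

```bash
ls -l /opt/snap /usr/local/bin/snapd /usr/local/bin/snapctl
```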
|
||
|
||
---
|
||
|
||
## First contact with `snapd`
|
||
|
||
- The core of Snap is `snapd`, the Snap daemon
|
||
|
||
- Application made up of a REST API, control module, and scheduler module
|
||
|
||
.exercise[
|
||
|
||
- Start `snapd` with plugin trust disabled and log level set to debug:
|
||
```bash
|
||
snapd -t 0 -l 1
|
||
```
|
||
|
||
]
|
||
|
||
- More resources:
|
||
|
||
https://github.com/intelsdi-x/snap/blob/master/docs/SNAPD.md
|
||
https://github.com/intelsdi-x/snap/blob/master/docs/SNAPD_CONFIGURATION.md
|
||
|
||
---
|
||
|
||
## Using `snapctl` to interact with `snapd`
|
||
|
||
- Let's load a *collector* plugin and a *publisher* plugin
|
||
|
||
.exercise[
|
||
|
||
- Open a new terminal
|
||
|
||
- Load the psutil collector plugin:
|
||
```bash
|
||
snapctl plugin load /opt/snap/plugin/snap-plugin-collector-psutil
|
||
```
|
||
|
||
- Load the file publisher plugin:
|
||
```bash
|
||
snapctl plugin load /opt/snap/plugin/snap-plugin-publisher-mock-file
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Checking what we've done
|
||
|
||
- Good to know: Docker CLI uses `ls`, Snap CLI uses `list`
|
||
|
||
.exercise[
|
||
|
||
- See your loaded plugins:
|
||
```bash
|
||
snapctl plugin list
|
||
```
|
||
|
||
- See the metrics you can collect:
|
||
```bash
|
||
snapctl metric list
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Actually collecting metrics: introducing *tasks*
|
||
|
||
- To start collecting/processing/publishing metric data, you need to create a *task*
|
||
|
||
- A *task* indicates:
|
||
|
||
- *what* to collect (which metrics)
|
||
- *when* to collect it (e.g. how often)
|
||
- *how* to process it (e.g. use it directly, or compute moving averages)
|
||
- *where* to publish it
|
||
|
||
- Tasks can be defined with manifests written in JSON or YAML
|
||
|
||
- Some plugins, such as the Docker collector, allow for wildcards (\*) in the metrics "path"
|
||
<br/>(see snap/docker-influxdb.json)
|
||
|
||
- More resources:
|
||
https://github.com/intelsdi-x/snap/blob/master/docs/TASKS.md
|
||
|
||
---
|
||
|
||
## Our first task manifest
|
||
|
||
```yaml
|
||
version: 1
|
||
schedule:
|
||
type: "simple" # collect on a set interval
|
||
interval: "1s" # of every 1s
|
||
max-failures: 10
|
||
workflow:
|
||
collect: # first collect
|
||
metrics: # metrics to collect
|
||
/intel/psutil/load/load1: {}
|
||
config: # there is no configuration
|
||
publish: # after collecting, publish
|
||
-
|
||
plugin_name: "file" # use the file publisher
|
||
config:
|
||
file: "/tmp/snap-psutil-file.log" # write to this file
|
||
```
|
||
|
||
---
|
||
|
||
## Creating our first task
|
||
|
||
- The task manifest shown on the previous slide is stored in `snap/psutil-file.yml`.
|
||
|
||
.exercise[
|
||
|
||
- Create a task using the manifest:
|
||
|
||
```bash
|
||
cd ~/orchestration-workshop/snap
|
||
snapctl task create -t psutil-file.yml
|
||
```
|
||
|
||
]
|
||
|
||
The output should look like the following:
|
||
```
|
||
Using task manifest to create task
|
||
Task created
|
||
ID: 240435e8-a250-4782-80d0-6fff541facba
|
||
Name: Task-240435e8-a250-4782-80d0-6fff541facba
|
||
State: Running
|
||
```
|
||
|
||
---
|
||
|
||
## Checking existing tasks
|
||
|
||
.exercise[
|
||
|
||
- This will confirm that our task is running correctly, and remind us of its task ID
|
||
|
||
```bash
|
||
snapctl task list
|
||
```
|
||
|
||
]
|
||
|
||
The output should look like the following:
|
||
```
|
||
ID NAME STATE HIT MISS FAIL CREATED
|
||
24043...acba Task-24043...acba Running 4 0 0 2:34PM 8-13-2016
|
||
```
|
||
---
|
||
|
||
## Viewing our task dollars at work
|
||
|
||
- The task is using a very simple publisher, `mock-file`
|
||
|
||
- That publisher just writes text lines in a file (one line per data point)
|
||
|
||
.exercise[
|
||
|
||
- Check that the data is flowing indeed:
|
||
```bash
|
||
tail -f /tmp/snap-psutil-file.log
|
||
```
|
||
|
||
]
|
||
|
||
To exit, hit `^C`
|
||
|
||
---
|
||
|
||
## Debugging tasks
|
||
|
||
- When a task is not directly writing to a local file, use `snapctl task watch`
|
||
|
||
- `snapctl task watch` will stream the metrics you are collecting to STDOUT
|
||
|
||
.exercise[
|
||
|
||
```bash
|
||
snapctl task watch <ID>
|
||
```
|
||
|
||
]
|
||
|
||
To exit, hit `^C`
|
||
|
||
---
|
||
|
||
## Stopping snap
|
||
|
||
- Our Snap deployment has a few flaws:
|
||
|
||
- snapd was started manually
|
||
|
||
- it is running on a single node
|
||
|
||
- the configuration is purely local
|
||
|
||
--
|
||
|
||
- We want to change that!
|
||
|
||
--
|
||
|
||
- But first, go back to the terminal where `snapd` is running, and hit `^C`
|
||
|
||
- All tasks will be stopped; all plugins will be unloaded; Snap will exit
|
||
|
||
---
|
||
|
||
## Snap Tribe Mode
|
||
|
||
- Tribe is Snap's clustering mechanism
|
||
|
||
- When tribe mode is enabled, nodes can join *agreements*
|
||
|
||
- When a node in an *agreement* does something (e.g. load a plugin or run a task),
|
||
<br/>other nodes of that agreement do the same thing
|
||
|
||
- We will use it to load the Docker collector and InfluxDB publisher on all nodes,
|
||
<br/>and run a task to use them
|
||
|
||
- Without tribe mode, we would have to load plugins and run tasks manually on every node
|
||
|
||
- More resources:
|
||
https://github.com/intelsdi-x/snap/blob/master/docs/TRIBE.md
|
||
|
||
---
|
||
|
||
## Running Snap itself on every node
|
||
|
||
- Snap runs in the foreground, so you need to use `&` or start it in tmux
|
||
|
||
.exercise[
|
||
|
||
- Run the following command *on every node:*
|
||
```bash
|
||
snapd -t 0 -l 1 --tribe --tribe-seed node1:6000
|
||
```
|
||
|
||
]
|
||
|
||
If you're *not* using Play-With-Docker, there is another way to start Snap!
|
||
|
||
---
|
||
|
||
## Starting a daemon through SSH
|
||
|
||
.warning[Hackety hack ahead!]
|
||
|
||
- We will create a *global service*
|
||
|
||
- That global service will install a SSH client
|
||
|
||
- With that SSH client, the service will connect back to its local node
|
||
<br/>(i.e. "break out" of the container, using the SSH key that we provide)
|
||
|
||
- Once logged on the node, the service starts snapd with Tribe Mode enabled
|
||
|
||
---
|
||
|
||
## Running Snap itself on every node
|
||
|
||
- I might go to hell for showing you this, but here it goes ...
|
||
|
||
.exercise[
|
||
|
||
- Start Snap all over the place:
|
||
```bash
|
||
docker service create --name snapd --mode global \
|
||
--mount type=bind,source=$HOME/.ssh/id_rsa,target=/sshkey \
|
||
alpine sh -c "
|
||
apk add --no-cache openssh-client &&
|
||
ssh -o StrictHostKeyChecking=no -i /sshkey docker@172.17.0.1 \
|
||
sudo snapd -t 0 -l 1 --tribe --tribe-seed node1:6000
|
||
" # If you copy-paste that block, don't forget that final quote :-)
|
||
```
|
||
|
||
]
|
||
|
||
Remember: this *does not work* with Play-With-Docker (which doesn't have SSH).
|
||
|
||
---
|
||
|
||
## Viewing the members of our tribe
|
||
|
||
- If everything went fine, Snap is now running in tribe mode
|
||
|
||
.exercise[
|
||
|
||
- View the members of our tribe:
|
||
```bash
|
||
snapctl member list
|
||
```
|
||
|
||
]
|
||
|
||
This should show the 5 nodes with their hostnames.
|
||
|
||
---
|
||
|
||
## Create an agreement
|
||
|
||
- We can now create an *agreement* for our plugins and tasks
|
||
|
||
.exercise[
|
||
|
||
- Create an agreement; make sure to use the same name all along:
|
||
```bash
|
||
snapctl agreement create docker-influxdb
|
||
```
|
||
|
||
]
|
||
|
||
The output should look like the following:
|
||
|
||
```
|
||
Name Number of Members plugins tasks
|
||
docker-influxdb 0 0 0
|
||
```
|
||
|
||
---
|
||
|
||
## Instruct all nodes to join the agreement
|
||
|
||
- We don't need another fancy global service!
|
||
|
||
- We can join nodes from any existing node of the cluster
|
||
|
||
.exercise[
|
||
|
||
- Add all nodes to the agreement:
|
||
```bash
|
||
snapctl member list | tail -n +2 |
|
||
xargs -n1 snapctl agreement join docker-influxdb
|
||
```
|
||
|
||
]
|
||
|
||
The last bit of output should look like the following:
|
||
```
|
||
Name Number of Members plugins tasks
|
||
docker-influxdb 5 0 0
|
||
```
|
||
|
||
---
|
||
|
||
## Start a container on every node
|
||
|
||
- The Docker plugin requires at least one container to be started
|
||
|
||
- Normally, at this point, you will have at least one container on each node
|
||
|
||
- But just in case you did things differently, let's create a dummy global service
|
||
|
||
.exercise[
|
||
|
||
- Create an alpine container on the whole cluster:
|
||
```bash
|
||
docker service create --name ping --mode global alpine ping 8.8.8.8
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Running InfluxDB
|
||
|
||
- We will create a service for InfluxDB
|
||
|
||
- We will use the official image
|
||
|
||
- InfluxDB uses multiple ports:
|
||
|
||
- 8086 (HTTP API; we need this)
|
||
|
||
- 8083 (admin interface; we need this)
|
||
|
||
- 8088 (cluster communication; not needed here)
|
||
|
||
- more ports for other protocols (graphite, collectd...)
|
||
|
||
- We will just publish the first two
|
||
|
||
---
|
||
|
||
## Creating the InfluxDB service
|
||
|
||
.exercise[
|
||
|
||
- Start an InfluxDB service, publishing ports 8083 and 8086:
|
||
```bash
|
||
docker service create --name influxdb \
|
||
--publish 8083:8083 \
|
||
--publish 8086:8086 \
|
||
influxdb:0.13
|
||
```
|
||
|
||
]
|
||
|
||
Note: this will allow any node to publish metrics data to `localhost:8086`,
and it will allow us to access the admin interface by connecting to any node
on port 8083.
|
||
|
||
.warning[Make sure to use InfluxDB 0.13; a few things changed in 1.0
|
||
(like, the name of the default retention policy is now "autogen") and
|
||
this breaks a few things.]
|
||
|
||
---
|
||
|
||
## Setting up InfluxDB
|
||
|
||
- We need to create the "snap" database
|
||
|
||
.exercise[
|
||
|
||
- Open port 8083 with your browser
|
||
|
||
- Enter the following query in the query box:
|
||
```
|
||
CREATE DATABASE "snap"
|
||
```
|
||
|
||
- In the top-right corner, select "Database: snap"
|
||
|
||
]
|
||
|
||
Note: the InfluxDB query language *looks like* SQL but it's not.
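
If you prefer the command line over the web UI, the same database can be created through InfluxDB's HTTP API; a sketch assuming the 0.13 API, and that port 8086 is reachable on localhost thanks to the routing mesh:

```bash
curl -s -XPOST http://localhost:8086/query --data-urlencode 'q=CREATE DATABASE "snap"'
```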
|
||
|
||
???
|
||
|
||
## Setting a retention policy
|
||
|
||
- When graduating to 1.0, InfluxDB changed the name of the default policy
|
||
|
||
- It used to be "default" and it is now "autogen"
|
||
|
||
- Snap still uses "default" and this results in errors
|
||
|
||
.exercise[
|
||
|
||
- Create a "default" retention policy by entering the following query in the box:
|
||
```
|
||
CREATE RETENTION POLICY "default" ON "snap" DURATION 1w REPLICATION 1
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Load Docker collector and InfluxDB publisher
|
||
|
||
- We will load plugins on the local node
|
||
|
||
- Since our local node is a member of the agreement, all other
|
||
nodes in the agreement will also load these plugins
|
||
|
||
.exercise[
|
||
|
||
- Load Docker collector:
|
||
|
||
```bash
|
||
snapctl plugin load /opt/snap/plugin/snap-plugin-collector-docker
|
||
```
|
||
|
||
- Load InfluxDB publisher:
|
||
|
||
```bash
|
||
snapctl plugin load /opt/snap/plugin/snap-plugin-publisher-influxdb
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Start a simple collection task
|
||
|
||
- Again, we will create a task on the local node
|
||
|
||
- The task will be replicated on other nodes members of the same agreement
|
||
|
||
.exercise[
|
||
|
||
- Load a task manifest file collecting a couple of metrics on all containers,
|
||
<br/>and sending them to InfluxDB:
|
||
```bash
|
||
snapctl task create -t docker-influxdb.json
|
||
```
|
||
|
||
]
|
||
|
||
Note: the task description sends metrics to the InfluxDB API endpoint
|
||
located at 127.0.0.1:8086. Since the InfluxDB container is published
|
||
on port 8086, 127.0.0.1:8086 always routes traffic to the InfluxDB
|
||
container.
|
||
|
||
---
|
||
|
||
## If things go wrong...
|
||
|
||
Note: if a task runs into a problem (e.g. it's trying to publish
|
||
to a metrics database, but the database is unreachable), the task
|
||
will be stopped.
|
||
|
||
You will have to restart it manually by running:
|
||
|
||
```bash
|
||
snapctl task enable <ID>
|
||
snapctl task start <ID>
|
||
```
|
||
|
||
This must be done *per node*. Alternatively, you can delete+re-create
|
||
the task (it will delete+re-create on all nodes).
|
||
|
||
---
|
||
|
||
## Check that metric data shows up in InfluxDB
|
||
|
||
- Let's check existing data with a few manual queries in the InfluxDB admin interface
|
||
|
||
.exercise[
|
||
|
||
- List "measurements":
|
||
```
|
||
SHOW MEASUREMENTS
|
||
```
|
||
(This should show two generic entries corresponding to the two collected metrics.)
|
||
|
||
- View time series data for one of the metrics:
|
||
```
|
||
SELECT * FROM "intel/docker/stats/cgroups/cpu_stats/cpu_usage/total_usage"
|
||
```
|
||
(This should show a list of data points with **time**, **docker_id**, **source**, and **value**.)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Deploy Grafana
|
||
|
||
- We will use an almost-official image, `grafana/grafana`
|
||
|
||
- We will publish Grafana's web interface on its default port (3000)
|
||
|
||
.exercise[
|
||
|
||
- Create the Grafana service:
|
||
```bash
|
||
docker service create --name grafana --publish 3000:3000 grafana/grafana:3.1.1
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Set up Grafana
|
||
|
||
.exercise[
|
||
|
||
- Open port 3000 with your browser
|
||
|
||
- Identify with "admin" as the username and password
|
||
|
||
- Click on the Grafana logo (the orange spiral in the top left corner)
|
||
|
||
- Click on "Data Sources"
|
||
|
||
- Click on "Add data source" (green button on the right)
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Add InfluxDB as a data source for Grafana
|
||
|
||
.small[
|
||
|
||
Fill the form exactly as follows:
|
||
- Name = "snap"
|
||
- Type = "InfluxDB"
|
||
|
||
In HTTP settings, fill as follows:
|
||
- Url = "http://(IP.address.of.any.node):8086"
|
||
- Access = "direct"
|
||
- Leave HTTP Auth untouched
|
||
|
||
In InfluxDB details, fill as follows:
|
||
- Database = "snap"
|
||
- Leave user and password blank
|
||
|
||
Finally, click on "add"; you should see a green message saying "Success - Data source is working".
|
||
If you see an orange box (sometimes without a message), it means that you got something wrong. Triple check everything again.
|
||
|
||
]
|
||
|
||
---
|
||
|
||

|
||
|
||
---
|
||
|
||
## Create a dashboard in Grafana
|
||
|
||
.exercise[
|
||
|
||
- Click on the Grafana logo again (the orange spiral in the top left corner)
|
||
|
||
- Hover over "Dashboards"
|
||
|
||
- Click "+ New"
|
||
|
||
- Click on the little green rectangle that appeared in the top left
|
||
|
||
- Hover over "Add Panel"
|
||
|
||
- Click on "Graph"
|
||
|
||
]
|
||
|
||
At this point, you should see a sample graph showing up.
|
||
|
||
---
|
||
|
||
## Setting up a graph in Grafana
|
||
|
||
.exercise[
|
||
|
||
- Panel data source: select snap
|
||
- Click on the SELECT metrics query to expand it
|
||
- Click on "select measurement" and pick CPU usage
|
||
- Click on the "+" right next to "WHERE"
|
||
- Select "docker_id"
|
||
- Select the ID of a container of your choice (e.g. the one running InfluxDB)
|
||
- Click on the "+" on the right of the "SELECT" line
|
||
- Add "derivative"
|
||
- In the "derivative" option, select "1s"
|
||
- In the top right corner, click on the clock, and pick "last 5 minutes"
|
||
|
||
]
|
||
|
||
Congratulations, you are viewing the CPU usage of a single container!
|
||
|
||
---
|
||
|
||

|
||
|
||
---
|
||
|
||
## Before moving on ...
|
||
|
||
- Leave that tab open!
|
||
|
||
- We are going to setup *another* metrics system
|
||
|
||
- ... And then compare both graphs side by side
|
||
|
||
---
|
||
|
||
## Prometheus
|
||
|
||
- Prometheus is another metrics collection system
|
||
|
||
- Snap *pushes* metrics; Prometheus *pulls* them
|
||
|
||
- The *Prometheus server* pulls, stores, and displays metrics
|
||
|
||
- Its configuration defines a list of *exporter* endpoints
|
||
<br/>(that list can be dynamic, using e.g. Consul, DNS, Etcd...)
|
||
|
||
- The exporters expose metrics over HTTP using a simple line-oriented format
|
||
|
||
(An optimized format using protobuf is also possible)
|
||
|
||
---
|
||
|
||
## It's all about the `/metrics`
|
||
|
||
- This is what the *node exporter* looks like:
|
||
|
||
http://demo.robustperception.io:9100/metrics
|
||
|
||
- Prometheus itself exposes its own internal metrics, too:
|
||
|
||
http://demo.robustperception.io:9090/metrics
|
||
|
||
- A *Prometheus server* will *scrape* URLs like these
|
||
|
||
(It can also use protobuf to avoid the overhead of parsing line-oriented formats!)
|
||
|
||
---
|
||
|
||
## Collecting metrics with Prometheus on Swarm
|
||
|
||
- We will run two *global services* (i.e. scheduled on all our nodes):
|
||
|
||
- the Prometheus *node exporter* to get node metrics
|
||
|
||
- Google's cAdvisor to get container metrics
|
||
|
||
- We will run a Prometheus server to scrape these exporters
|
||
|
||
- The Prometheus server will be configured to use DNS service discovery
|
||
|
||
- We will use `tasks.<servicename>` for service discovery
|
||
|
||
- All these services will be placed on a private internal network
|
||
|
||
---
|
||
|
||
## Creating an overlay network for Prometheus
|
||
|
||
- This is the easiest step ☺
|
||
|
||
.exercise[
|
||
|
||
- Create an overlay network:
|
||
```bash
|
||
docker network create --driver overlay prom
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Running the node exporter
|
||
|
||
- The node exporter *should* run directly on the hosts
|
||
- However, it can run from a container, if configured properly
|
||
<br/>
|
||
(it needs to access the host's filesystems, in particular /proc and /sys)
|
||
|
||
.exercise[
|
||
|
||
- Start the node exporter:
|
||
```bash
|
||
docker service create --name node --mode global --network prom \
|
||
--mount type=bind,source=/proc,target=/host/proc \
|
||
--mount type=bind,source=/sys,target=/host/sys \
|
||
--mount type=bind,source=/,target=/rootfs \
|
||
prom/node-exporter \
|
||
-collector.procfs /host/proc \
|
||
-collector.sysfs /host/sys \
|
||
-collector.filesystem.ignored-mount-points "^/(sys|proc|dev|host|etc)($|/)"
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Running cAdvisor
|
||
|
||
- Likewise, cAdvisor *should* run directly on the hosts
|
||
|
||
- But it can run in containers, if configured properly
|
||
|
||
.exercise[
|
||
|
||
- Start the cAdvisor collector:
|
||
```bash
|
||
docker service create --name cadvisor --network prom --mode global \
|
||
--mount type=bind,source=/,target=/rootfs \
|
||
--mount type=bind,source=/var/run,target=/var/run \
|
||
--mount type=bind,source=/sys,target=/sys \
|
||
--mount type=bind,source=/var/lib/docker,target=/var/lib/docker \
|
||
google/cadvisor:latest
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Configuring the Prometheus server
|
||
|
||
This will be our configuration file for Prometheus:
|
||
|
||
```yaml
|
||
global:
|
||
scrape_interval: 1s
|
||
scrape_configs:
|
||
- job_name: 'prometheus'
|
||
static_configs:
|
||
- targets: ['localhost:9090']
|
||
- job_name: 'node'
|
||
dns_sd_configs:
|
||
- names: ['tasks.node']
|
||
type: 'A'
|
||
port: 9100
|
||
- job_name: 'cadvisor'
|
||
dns_sd_configs:
|
||
- names: ['tasks.cadvisor']
|
||
type: 'A'
|
||
port: 8080
|
||
```
|
||
|
||
---
|
||
|
||
## Passing the configuration to the Prometheus server
|
||
|
||
- We need to provide our custom configuration to the Prometheus server
|
||
|
||
- The easiest solution is to create a custom image bundling this configuration
|
||
|
||
- We will use a very simple Dockerfile:
|
||
```dockerfile
|
||
FROM prom/prometheus:v1.4.1
|
||
COPY prometheus.yml /etc/prometheus/prometheus.yml
|
||
```
|
||
|
||
(The configuration file, and the Dockerfile, are in the `prom` subdirectory)
|
||
|
||
- We will build this image, and push it to our local registry
|
||
|
||
- Then we will create a service using this image
|
||
|
||
---
|
||
|
||
## Building our custom Prometheus image
|
||
|
||
- We will use the local registry started previously on 127.0.0.1:5000
|
||
|
||
.exercise[
|
||
|
||
- Build the image using the provided Dockerfile:
|
||
```bash
|
||
docker build -t 127.0.0.1:5000/prometheus ~/orchestration-workshop/prom
|
||
```
|
||
|
||
- Push the image to our local registry:
|
||
```bash
|
||
docker push 127.0.0.1:5000/prometheus
|
||
```
|
||
|
||
]
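
You can verify that the image really landed in the registry by querying the standard registry v2 API:

```bash
# should list at least the "latest" tag
curl -s http://127.0.0.1:5000/v2/prometheus/tags/list
```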
|
||
|
||
---
|
||
|
||
## Running our custom Prometheus image
|
||
|
||
- That's the only service that needs to be published
|
||
|
||
(If we want to access Prometheus from outside!)
|
||
|
||
.exercise[
|
||
|
||
- Start the Prometheus server:
|
||
```bash
|
||
docker service create --network prom --name prom \
|
||
--publish 9090:9090 127.0.0.1:5000/prometheus
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Checking our Prometheus server
|
||
|
||
- First, let's make sure that Prometheus is correctly scraping all metrics
|
||
|
||
.exercise[
|
||
|
||
- Open port 9090 with your browser
|
||
|
||
- Click on "status", then "targets"
|
||
|
||
]
|
||
|
||
You should see 11 endpoints (5 cadvisor, 5 node, 1 prometheus).
|
||
|
||
Their state should be "UP".
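
You can also check this from the command line through the Prometheus HTTP API; the built-in `up` metric has one sample per scraped target (1 = up). Run it on any cluster node, since port 9090 is published:

```bash
curl -s 'http://localhost:9090/api/v1/query?query=up'
```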
|
||
|
||
---
|
||
|
||
## Displaying metrics directly from Prometheus
|
||
|
||
- This is easy ... if you are familiar with PromQL
|
||
|
||
.exercise[
|
||
|
||
- Click on "Graph", and in "expression", paste the following:
|
||
```
|
||
sum without (cpu) (
|
||
irate(
|
||
container_cpu_usage_seconds_total{
|
||
container_label_com_docker_swarm_service_name="influxdb"
|
||
}[1m]
|
||
)
|
||
)
|
||
```
|
||
|
||
- Click on the blue "Execute" button and on the "Graph" tab just below
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Building the query from scratch
|
||
|
||
- We are going to build the same query from scratch
|
||
|
||
- This is not meant to be a detailed PromQL course
|
||
|
||
- This is merely so that you (I) can pretend to know how the previous query works
|
||
<br/>so that your coworkers (you) can be suitably impressed (or not)
|
||
|
||
(Or, so that we can build other queries if necessary, or adapt if cAdvisor,
|
||
Prometheus, or anything else changes and requires editing the query!)
|
||
|
||
---
|
||
|
||
## Displaying a raw metric for *all* containers
|
||
|
||
- Click on the "Graph" tab on top
|
||
|
||
*This takes us to a blank dashboard*
|
||
|
||
- Click on the "Insert metric at cursor" drop down, and select `container_cpu_usage_seconds_total`
|
||
|
||
*This puts the metric name in the query box*
|
||
|
||
- Click on "Execute"
|
||
|
||
*This fills a table of measurements below*
|
||
|
||
- Click on "Graph" (next to "Console")
|
||
|
||
*This replaces the table of measurements with a series of graphs (after a few seconds)*
|
||
|
||
---
|
||
|
||
## Selecting metrics for a specific service
|
||
|
||
- Hover over the lines in the graph
|
||
|
||
(Look for the ones that have labels like `container_label_com_docker_...`)
|
||
|
||
- Edit the query, adding a condition between curly braces:
|
||
|
||
.small[`container_cpu_usage_seconds_total{container_label_com_docker_swarm_service_name="influxdb"}`]
|
||
|
||
- Click on "Execute"
|
||
|
||
*Now we should see only one line per CPU*
|
||
|
||
- If you want to select by container ID, you can use a regex match: `id=~"/docker/c4bf.*"`
|
||
|
||
- You can also specify multiple conditions by separating them with commas
|
||
|
||
---
|
||
|
||
## Turn counters into rates
|
||
|
||
- What we see is the total amount of CPU used (in seconds)
|
||
|
||
- We want to see a *rate* (CPU time used / real time)
|
||
|
||
- To get a moving average over 1 minute periods, enclose the current expression within:
|
||
|
||
```
|
||
rate ( ... { ... } [1m] )
|
||
```
|
||
|
||
*This should turn our steadily-increasing CPU counter into a wavy graph*
|
||
|
||
- To get an instantaneous rate, use `irate` instead of `rate`
|
||
|
||
(The time window is then used to limit how far behind to look for data if data points
|
||
are missing in case of scrape failure; see [here](https://www.robustperception.io/irate-graphs-are-better-graphs/) for more details!)
|
||
|
||
*This should show spikes that were previously invisible because they were smoothed out*
|
||
|
||
---
|
||
|
||
## Aggregate multiple data series
|
||
|
||
- We have one graph per CPU; we want to sum them
|
||
|
||
- Enclose the whole expression within:
|
||
|
||
```
|
||
sum ( ... )
|
||
```
|
||
|
||
*We now see a single graph*
|
||
|
||
- If we have multiple containers we can also collapse just the CPU dimension:
|
||
|
||
```
|
||
sum without (cpu) ( ... )
|
||
```
|
||
|
||
*This shows the same graph, but preserves the other labels*
|
||
|
||
- Congratulations, you wrote your first PromQL expression from scratch!
|
||
|
||
(I'd like to thank [Johannes Ziemke](https://twitter.com/discordianfish) and
|
||
[Julius Volz](https://twitter.com/juliusvolz) for their help with Prometheus!)
|
||
|
||
---
|
||
|
||
## Comparing Snap and Prometheus data
|
||
|
||
- If you haven't setup Snap, InfluxDB, and Grafana, skip this section
|
||
|
||
- If you have closed the Grafana tab, you might have to re-setup a new dashboard
|
||
|
||
(Unless you saved it before navigating away from it)
|
||
|
||
- To re-do the setup, just follow again the instructions from the previous chapter
|
||
|
||
---
|
||
|
||
## Add Prometheus as a data source in Grafana
|
||
|
||
.exercise[
|
||
|
||
- In a new tab, connect to Grafana (port 3000)
|
||
|
||
- Click on the Grafana logo (the orange spiral in the top-left corner)
|
||
|
||
- Click on "Data Sources"
|
||
|
||
- Click on the green "Add data source" button
|
||
|
||
]
|
||
|
||
We see the same input form that we filled earlier to connect to InfluxDB.
|
||
|
||
---
|
||
|
||
## Connecting to Prometheus from Grafana
|
||
|
||
.exercise[
|
||
|
||
- Enter "prom" in the name field
|
||
|
||
- Select "Prometheus" as the source type
|
||
|
||
- Enter http://(IP.address.of.any.node):9090 in the Url field
|
||
|
||
- Select "direct" as the access method
|
||
|
||
- Click on "Save and test"
|
||
|
||
]
|
||
|
||
Again, we should see a green box telling us "Data source is working."
|
||
|
||
Otherwise, double-check every field and try again!
|
||
|
||
---
|
||
|
||
## Adding the Prometheus data to our dashboard
|
||
|
||
.exercise[
|
||
|
||
- Go back to the tab where we had our first Grafana dashboard
|
||
|
||
- Click on the blue "Add row" button in the lower right corner
|
||
|
||
- Click on the green tab on the left; select "Add panel" and "Graph"
|
||
|
||
]
|
||
|
||
This takes us to the graph editor that we used earlier.
|
||
|
||
---
|
||
|
||
## Querying Prometheus data from Grafana
|
||
|
||
The editor is a bit less friendly than the one we used for InfluxDB.
|
||
|
||
.exercise[
|
||
|
||
- Select "prom" as Panel data source
|
||
|
||
- Paste the query in the query field:
|
||
```
|
||
sum without (cpu, id) ( irate (
|
||
container_cpu_usage_seconds_total{
|
||
container_label_com_docker_swarm_service_name="influxdb"}[1m] ) )
|
||
```
|
||
|
||
- Click outside of the query field to confirm
|
||
|
||
- Close the row editor by clicking the "X" in the top right area
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Interpreting results
|
||
|
||
- The two graphs *should* be similar
|
||
|
||
- Protip: align the time references!
|
||
|
||
.exercise[
|
||
|
||
- Click on the clock in the top right corner
|
||
|
||
- Select "last 30 minutes"
|
||
|
||
- Click on "Zoom out"
|
||
|
||
- Now press the right arrow key (hold it down and watch the CPU usage increase!)
|
||
|
||
]
|
||
|
||
*Adjusting units is left as an exercise for the reader.*
|
||
|
||
---
|
||
|
||
## More resources on container metrics
|
||
|
||
- [Docker Swarm & Container Overview](https://grafana.net/dashboards/609),
|
||
a custom dashboard for Grafana
|
||
|
||
- [Gathering Container Metrics](http://jpetazzo.github.io/2013/10/08/docker-containers-metrics/),
|
||
a blog post about cgroups
|
||
|
||
- [The Prometheus Time Series Database](https://www.youtube.com/watch?v=HbnGSNEjhUc),
|
||
a talk explaining why custom data storage is necessary for metrics
|
||
|
||
---
|
||
|
||
# Dealing with stateful services
|
||
|
||
- First of all, you need to make sure that the data files are on a *volume*
|
||
|
||
- Volumes are host directories that are mounted to the container's filesystem
|
||
|
||
- These host directories can be backed by the ordinary, plain host filesystem ...
|
||
|
||
- ... Or by distributed/networked filesystems
|
||
|
||
- In the latter scenario, in case of node failure, the data is safe elsewhere ...
|
||
|
||
- ... And the container can be restarted on another node without data loss
|
||
|
||
---
|
||
|
||
## Building a stateful service experiment
|
||
|
||
- We will use Redis for this example
|
||
|
||
- We will expose it on port 10000 to access it easily
|
||
|
||
.exercise[
|
||
|
||
- Start the Redis service:
|
||
```bash
|
||
docker service create --name stateful -p 10000:6379 redis
|
||
```
|
||
|
||
- Check that we can connect to it (replace XX.XX.XX.XX with any node's IP address):
|
||
```bash
|
||
docker run --rm redis redis-cli -h `XX.XX.XX.XX` -p 10000 info server
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Accessing our Redis service easily
|
||
|
||
- Typing that whole command is going to be tedious
|
||
|
||
.exercise[
|
||
|
||
- Define a shell alias to make our lives easier:
|
||
```bash
|
||
alias redis='docker run --rm redis redis-cli -h `XX.XX.XX.XX` -p 10000'
|
||
```
|
||
|
||
- Try it:
|
||
```bash
|
||
redis info server
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Basic Redis commands
|
||
|
||
.exercise[
|
||
|
||
- Check that the `foo` key doesn't exist:
|
||
```bash
|
||
redis get foo
|
||
```
|
||
|
||
- Set it to `bar`:
|
||
```bash
|
||
redis set foo bar
|
||
```
|
||
|
||
- Check that it exists now:
|
||
```bash
|
||
redis get foo
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Local volumes vs. global volumes
|
||
|
||
- Global volumes exist in a single namespace
|
||
|
||
- A global volume can be mounted on any node
|
||
<br/>.small[(bar some restrictions specific to the volume driver in use; e.g. using an EBS-backed volume on a GCE/EC2 mixed cluster)]
|
||
|
||
- Attaching a global volume to a container lets you start the container anywhere
|
||
<br/>(and retain its data wherever you start it!)
|
||
|
||
- Global volumes require extra *plugins* (Flocker, Portworx...)
|
||
|
||
- Docker doesn't come with a default global volume driver at this point
|
||
|
||
- Therefore, we will fall back on *local volumes*
|
||
|
||
---
|
||
|
||
## Local volumes
|
||
|
||
- We will use the default volume driver, `local`
|
||
|
||
- As the name implies, the `local` volume driver manages *local* volumes
|
||
|
||
- Since local volumes are (duh!) *local*, we need to pin our container to a specific host
|
||
|
||
- We will do that with a *constraint*
|
||
|
||
.exercise[
|
||
|
||
- Add a placement constraint to our service:
|
||
```bash
|
||
docker service update stateful --constraint-add node.hostname==$HOSTNAME
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Where is our data?
|
||
|
||
- If we look for our `foo` key, it's gone!
|
||
|
||
.exercise[
|
||
|
||
- Check the `foo` key:
|
||
```bash
|
||
redis get foo
|
||
```
|
||
|
||
- Adding a constraint caused the service to be redeployed:
|
||
```bash
|
||
docker service ps stateful
|
||
```
|
||
|
||
]
|
||
|
||
Note: even if the constraint ends up being a no-op (i.e. not
|
||
moving the service), the service gets redeployed.
|
||
This ensures consistent behavior.
|
||
|
||
---
|
||
|
||
## Setting the key again
|
||
|
||
- Since our database was wiped out, let's populate it again
|
||
|
||
.exercise[
|
||
|
||
- Set `foo` again:
|
||
```bash
|
||
redis set foo bar
|
||
```
|
||
|
||
- Check that it's there:
|
||
```bash
|
||
redis get foo
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Service updates cause containers to be replaced
|
||
|
||
- Let's try to make a trivial update to the service and see what happens
|
||
|
||
.exercise[
|
||
|
||
- Set a memory limit to our Redis service:
|
||
```bash
|
||
docker service update stateful --limit-memory 100M
|
||
```
|
||
|
||
- Try to get the `foo` key one more time:
|
||
```bash
|
||
redis get foo
|
||
```
|
||
|
||
]
|
||
|
||
The key is blank again!
|
||
|
||
---
|
||
|
||
## Service volumes are ephemeral by default
|
||
|
||
- Let's highlight what's going on with volumes!
|
||
|
||
.exercise[
|
||
|
||
- Check the current list of volumes:
|
||
```bash
|
||
docker volume ls
|
||
```
|
||
|
||
- Carry a minor update to our Redis service:
|
||
```bash
|
||
docker service update stateful --limit-memory 200M
|
||
```
|
||
|
||
]
|
||
|
||
Again: all changes trigger the creation of a new task, and therefore a replacement of the existing container;
|
||
even when it is not strictly technically necessary.
|
||
|
||
---
|
||
|
||
## The data is gone again
|
||
|
||
- What happened to our data?
|
||
|
||
.exercise[
|
||
|
||
- The list of volumes is slightly different:
|
||
```bash
|
||
docker volume ls
|
||
```
|
||
|
||
]
|
||
|
||
(You should see one extra volume.)
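
To see what is going on, you can look at the volumes mounted by the current Redis container; a sketch, assuming you run it on the node where the task is scheduled (which is the case here, thanks to the placement constraint we added):

```bash
CID=$(docker ps -q --filter label=com.docker.swarm.service.name=stateful)
docker inspect --format '{{range .Mounts}}{{.Name}} -> {{.Destination}}{{"\n"}}{{end}}' $CID
```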
|
||
|
||
---
|
||
|
||
## Assigning a persistent volume to the container
|
||
|
||
- Let's add an explicit volume mount to our service, referencing a named volume
|
||
|
||
.exercise[
|
||
|
||
- Update the service with a volume mount:
|
||
```bash
|
||
docker service update stateful \
|
||
--mount-add type=volume,source=foobarstore,target=/data
|
||
```
|
||
|
||
- Check the new volume list:
|
||
```bash
|
||
docker volume ls
|
||
```
|
||
|
||
]
|
||
|
||
Note: the `local` volume driver automatically creates volumes.
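
If you are curious where that data lives on the host, ask the volume driver (on the node running the container, since the `local` driver is, well, local):

```bash
docker volume inspect foobarstore --format '{{.Mountpoint}}'
```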
|
||
|
||
---
|
||
|
||
## Checking that persistence actually works across service updates
|
||
|
||
.exercise[
|
||
|
||
- Store something in the `foo` key:
|
||
```bash
|
||
redis set foo barbar
|
||
```
|
||
|
||
- Update the service with yet another trivial change:
|
||
```bash
|
||
docker service update stateful --limit-memory 300M
|
||
```
|
||
|
||
- Check that `foo` is still set:
|
||
```bash
|
||
redis get foo
|
||
```
|
||
|
||
]
|
||
|
||
---
|
||
|
||
## Recap
|
||
|
||
- The service must commit its state to disk when being shut down.red[*]
|
||
|
||
(Shutdown = being sent a `TERM` signal)
|
||
|
||
- The state must be written on files located on a volume
|
||
|
||
- That volume must be specified to be persistent
|
||
|
||
- If using a local volume, the service must also be pinned to a specific node
|
||
|
||
(And losing that node means losing the data, unless there are other backups)
|
||
|
||
.footnote[<br/>.red[*]Until recently, the Redis image didn't automatically
|
||
persist data. Beware!]
|
||
|
||
---
|
||
|
||
## Cleaning up
|
||
|
||
.exercise[
|
||
|
||
- Remove the stateful service:
|
||
```bash
|
||
docker service rm stateful
|
||
```
|
||
|
||
- Remove the associated volume:
|
||
```bash
|
||
docker volume rm foobarstore
|
||
```
|
||
|
||
]
|
||
|
||
Note: we could keep the volume around if we wanted.
|
||
|
||
---
|
||
|
||
# Controlling Docker from a container
|
||
|
||
- In a local environment, just bind-mount the Docker control socket:
|
||
```bash
|
||
docker run -ti -v /var/run/docker.sock:/var/run/docker.sock docker
|
||
```
|
||
|
||
- Otherwise, you have to:
|
||
|
||
- set `DOCKER_HOST`,
|
||
- set `DOCKER_TLS_VERIFY` and `DOCKER_CERT_PATH` (if you use TLS),
|
||
- copy certificates to the container that will need API access.
|
||
|
||
More resources on this topic:
|
||
|
||
- [Do not use Docker-in-Docker for CI](
|
||
http://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/)
|
||
- [One container to rule them all](
|
||
http://jpetazzo.github.io/2016/04/03/one-container-to-rule-them-all/)
|
||
|
||
---
|
||
|
||
## Bind-mounting the Docker control socket
|
||
|
||
- In Swarm mode, bind-mounting the control socket gives you access to the whole cluster
|
||
|
||
- You can tell Docker to place a given service on a manager node, using constraints:
|
||
```bash
|
||
docker service create \
|
||
--mount source=/var/run/docker.sock,type=bind,target=/var/run/docker.sock \
|
||
--name autoscaler --constraint node.role==manager ...
|
||
```
|
||
|
||
---
|
||
|
||
## Constraints and global services
|
||
|
||
(New in Docker Engine 1.13)
|
||
|
||
- By default, global services run on *all* nodes
|
||
```bash
|
||
docker service create --mode global ...
|
||
```
|
||
|
||
- You can specify constraints for global services
|
||
|
||
- These services will run only on the nodes satisfying the constraints
|
||
|
||
- For instance, this service will run on all manager nodes:
|
||
```bash
|
||
docker service create --mode global --constraint node.role==manager ...
|
||
```
|
||
|
||
---
|
||
|
||
## Constraints and dynamic scheduling
|
||
|
||
(New in Docker Engine 1.13)
|
||
|
||
- If constraints change, services are started/stopped accordingly
|
||
|
||
(e.g., `--constraint node.role==manager` and nodes are promoted/demoted)
|
||
|
||
- This is particularly useful with labels:
|
||
```bash
|
||
docker node update node1 --label-add defcon=five
|
||
docker service create --constraint node.labels.defcon==five ...
|
||
docker node update node2 --label-add defcon=five
|
||
docker node update node1 --label-rm defcon=five
|
||
```
|
||
|
||
---
|
||
|
||
## Shortcomings of dynamic scheduling
|
||
|
||
.warning[If a service becomes "unschedulable" (constraints can't be satisfied):]
|
||
|
||
- It won't be scheduled automatically when constraints are satisfiable again
|
||
|
||
- You will have to update the service; you can do a no-op update with:
|
||
```bash
|
||
docker service update ... --force
|
||
```
|
||
|
||
.warning[Docker will silently ignore attempts to remove a non-existent label or constraint]
|
||
|
||
- It won't warn you if you typo when removing a label or constraint!
|
||
|
||
---
|
||
|
||
# Node management
|
||
|
||
- SwarmKit lets you change (almost?) everything on the fly
|
||
|
||
- Nothing should require a global restart
|
||
|
||
---
|
||
|
||
## Node availability
|
||
|
||
```bash
|
||
docker node update <node-name> --availability <active|pause|drain>
|
||
```
|
||
|
||
- Active = schedule tasks on this node (default)
|
||
|
||
- Pause = don't schedule new tasks on this node; existing tasks are not affected
|
||
|
||
You can use it to troubleshoot a node without disrupting existing tasks
|
||
|
||
It can also be used (in conjunction with labels) to reserve resources
|
||
|
||
- Drain = don't schedule new tasks on this node; existing tasks are moved away
|
||
|
||
This is just like crashing the node, but containers get a chance to shut down cleanly
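
For instance, to take a node out of rotation and bring it back (a sketch using the node names of our cluster; it assumes the DockerCoins services are still running):

```bash
docker node update node2 --availability drain    # tasks move away from node2
docker service ps dockercoins_worker             # check where the tasks went
docker node update node2 --availability active   # node2 can receive new tasks again
```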
|
||
|
||
---
|
||
|
||
## Managers and workers
|
||
|
||
- Nodes can be promoted to manager with `docker node promote`
|
||
|
||
- Nodes can be demoted to worker with `docker node demote`
|
||
|
||
- This can also be done with `docker node update <node> --role <manager|worker>`
|
||
|
||
- Reminder: this has to be done from a manager node
|
||
<br/>(workers cannot promote themselves)
|
||
|
||
---
|
||
|
||
## Removing nodes
|
||
|
||
- You can leave Swarm mode with `docker swarm leave`
|
||
|
||
- Nodes are drained before being removed (i.e. all tasks are rescheduled somewhere else)
|
||
|
||
- Managers cannot leave (they have to be demoted first)
|
||
|
||
- After leaving, a node still shows up in `docker node ls` (in `Down` state)
|
||
|
||
- When a node is `Down`, you can remove it with `docker node rm` (from a manager node)
|
||
|
||
---
|
||
|
||
## Join tokens and automation
|
||
|
||
- If you have used Docker 1.12-RC: join tokens are now mandatory!
|
||
|
||
- You cannot specify your own token (SwarmKit generates it)
|
||
|
||
- If you need to change the token: `docker swarm join-token --rotate ...`
|
||
|
||
- To automate cluster deployment:
|
||
|
||
- have a seed node do `docker swarm init` if it's not already in Swarm mode
|
||
|
||
- propagate the token to the other nodes (secure bucket, facter, ohai...)
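
A very rough sketch of that automation in shell form (`MANAGER_IP` is an assumption standing for your seed node's address; real provisioning tools would add error handling and idempotence):

```bash
# on the seed node
docker swarm init 2>/dev/null || true        # no-op if the node is already in Swarm mode
TOKEN=$(docker swarm join-token -q worker)   # retrieve the worker join token

# on every other node
docker swarm join --token $TOKEN $MANAGER_IP:2377
```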
|
||
|
||
---
|
||
|
||
## Disk space management: `docker system df`
|
||
|
||
- Shows disk usage for images, containers, and volumes
|
||
|
||
- Breaks down between *active* and *reclaimable* categories
|
||
|
||
.exercise[
|
||
|
||
- Check how much disk space is used at the end of the workshop:
|
||
```bash
|
||
docker system df
|
||
```
|
||
|
||
]
|
||
|
||
Note: `docker system` is new in Docker Engine 1.13.
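
For a per-image, per-container, and per-volume breakdown, there is also a verbose flag:

```bash
docker system df -v
```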
|
||
|
||
---
|
||
|
||
## Reclaiming unused resources: `docker system prune`
|
||
|
||
- Removes stopped containers
|
||
|
||
- Removes dangling images (that don't have a tag associated anymore)
|
||
|
||
- Removes orphaned volumes
|
||
|
||
- Removes empty networks
|
||
|
||
.exercise[
|
||
|
||
- Try it:
|
||
```bash
|
||
docker system prune -f
|
||
```
|
||
|
||
]
|
||
|
||
Note: `docker system prune -a` will also remove *unused* images.
|
||
|
||
---
|
||
|
||
class: title
|
||
|
||
# What's next?
|
||
|
||
## (What to expect in future versions of this workshop)
|
||
|
||
---
|
||
|
||
## Implemented and stable, but out of scope
|
||
|
||
- [Docker Content Trust](https://docs.docker.com/engine/security/trust/content_trust/) and
|
||
[Notary](https://github.com/docker/notary) (image signature and verification)
|
||
|
||
- Image security scanning (many products available, Docker Inc. and 3rd party)
|
||
|
||
- [Docker Cloud](https://cloud.docker.com/) and
|
||
[Docker Datacenter](https://www.docker.com/products/docker-datacenter)
|
||
(commercial offering with node management, secure registry, CI/CD pipelines, all the bells and whistles)
|
||
|
||
- Network and storage plugins
|
||
|
||
---
|
||
|
||
## Work in progress
|
||
|
||
- Stabilize Compose/Swarm integration
|
||
|
||
- Refine Snap deployment
|
||
|
||
- Healthchecks
|
||
|
||
- Demo at least one volume plugin
|
||
<br/>(bonus points if it's a distributed storage system)
|
||
|
||
- ..................................... (your favorite feature here)
|
||
|
||
Reminder: there is a tag for each iteration of the content
|
||
in the Github repository.
|
||
|
||
It makes it easy to come back later and check what has changed since you did it!
|
||
|
||
---
|
||
|
||
class: title
|
||
|
||
# Thanks! <!-- <br/> Questions? -->
|
||
|
||
<!--
|
||
## [@jpetazzo](https://twitter.com/jpetazzo) <br/> [@docker](https://twitter.com/docker)
|
||
-->
|
||
|
||
<!--
|
||
## AJ ([@s0ulshake](https://twitter.com/s0ulshake)) <br/> Jérôme ([@jpetazzo](https://twitter.com/jpetazzo)) <br/> Tiffany ([@tiffanyfayj](https://twitter.com/tiffanyfayj))
|
||
-->
|
||
|
||
</textarea>
|
||
<script src="remark-0.14.min.js" type="text/javascript">
|
||
</script>
|
||
<script type="text/javascript">
|
||
var slideshow = remark.create({
|
||
ratio: '16:9',
|
||
highlightSpans: true
|
||
});
|
||
</script>
|
||
</body>
|
||
</html>
|