diff --git a/docs/apiscope.md b/docs/apiscope.md new file mode 100644 index 00000000..b924a0e0 --- /dev/null +++ b/docs/apiscope.md @@ -0,0 +1,41 @@ +## A reminder about *scope* + +- Out of the box, Docker API access is "all or nothing" + +- When someone has access to the Docker API, they can access *everything* + +- If your developers are using the Docker API to deploy on the dev cluster ... + + ... and the dev cluster is the same as the prod cluster ... + + ... it means that your devs have access to your production data, passwords, etc. + +- This can easily be avoided + +--- + +## Fine-grained API access control + +A few solutions, by increasing order of flexibility: + +- Use separate clusters for different security perimeters + + (And different credentials for each cluster) + +-- + +- Add an extra layer of abstraction (sudo scripts, hooks, or full-blown PAAS) + +-- + +- Enable [authorization plugins] + + - each API request is vetted by your plugin(s) + + - by default, the *subject name* in the client TLS certificate is used as user name + + - example: [user and permission management] in [UCP] + +[authorization plugins]: https://docs.docker.com/engine/extend/plugins_authorization/ +[UCP]: https://docs.docker.com/datacenter/ucp/2.1/guides/ +[user and permission management]: https://docs.docker.com/datacenter/ucp/2.1/guides/admin/manage-users/ diff --git a/docs/docker-compose.yml b/docs/docker-compose.yml new file mode 100644 index 00000000..4de8a255 --- /dev/null +++ b/docs/docker-compose.yml @@ -0,0 +1,11 @@ +version: "3" + +services: + + www: + image: nginx + volumes: + - ".:/usr/share/nginx/html" + ports: + - "8888:80" + diff --git a/docs/encryptionatrest.md b/docs/encryptionatrest.md new file mode 100644 index 00000000..3df9797e --- /dev/null +++ b/docs/encryptionatrest.md @@ -0,0 +1,170 @@ +class: encryption-at-rest + +## Encryption at rest + +- Swarm data is always encrypted + +- A Swarm cluster can be "locked" + +- When a cluster is "locked", the encryption 
key is protected with a passphrase + +- Starting or restarting a locked manager requires the passphrase + +- This protects against: + + - theft (stealing a physical machine, a disk, a backup tape...) + + - unauthorized access (to e.g. a remote or virtual volume) + + - some vulnerabilities (like path traversal) + +--- + +class: encryption-at-rest + +## Locking a Swarm cluster + +- This is achieved through the `docker swarm update` command + +.exercise[ + +- Lock our cluster: + ```bash + docker swarm update --autolock=true + ``` + +] + +This will display the unlock key. Copy-paste it somewhere safe. + +--- + +class: encryption-at-rest + +## Locked state + +- If we restart a manager, it will now be locked + +.exercise[ + +- Restart the local Engine: + ```bash + sudo systemctl restart docker + ``` + +] + +Note: if you are doing the workshop on your own, using nodes +that you [provisioned yourself](https://github.com/jpetazzo/orchestration-workshop/tree/master/prepare-machine) or with [Play-With-Docker](http://play-with-docker.com/), you might have to use a different method to restart the Engine. + +--- + +class: encryption-at-rest + +## Checking that our node is locked + +- Manager commands (requiring access to encrypted data) will fail + +- Other commands are OK + +.exercise[ + +- Try a few basic commands: + ```bash + docker ps + docker run alpine echo ♥ + docker node ls + ``` + +] + +(The last command should fail, and it will tell you how to unlock this node.) + +--- + +class: encryption-at-rest + +## Checking the state of the node programmatically + +- The state of the node shows up in the output of `docker info` + +.exercise[ + +- Check the output of `docker info`: + ```bash + docker info + ``` + +- Can't see it? Too verbose? Grep to the rescue! 

+ ```bash + docker info | grep ^Swarm + ``` + +] + +--- + +class: encryption-at-rest + +## Unlocking a node + +- You will need the unlock key that we obtained when enabling auto-lock earlier + +.exercise[ + +- Unlock the node: + ```bash + docker swarm unlock + ``` + +- Copy-paste the unlock key that we got earlier + +- Check that manager commands now work correctly: + ```bash + docker node ls + ``` + +] + +--- + +class: encryption-at-rest + +## Managing the secret key + +- If the key is compromised, you can change it and re-encrypt with a new key: + ```bash + docker swarm unlock-key --rotate + ``` + +- If you lose the key, you can retrieve it as long as you have at least one unlocked node: + ```bash + docker swarm unlock-key -q + ``` + +Note: if you rotate the key while some nodes are locked, without saving the previous key, those nodes won't be able to rejoin. + +Note: if somebody steals both your disks and your key, .strike[you're doomed! Doooooomed!] +

you can block the compromised node with `docker node demote` and `docker node rm`. + +--- + +class: encryption-at-rest + +## Unlocking the cluster permanently + +- If you want to remove the secret key, disable auto-lock + +.exercise[ + +- Permanently unlock the cluster: + ```bash + docker swarm update --autolock=false + ``` + +] + +Note: if some nodes are in locked state at that moment (or if they are offline/restarting +while you disabled autolock), they still need the previous unlock key to get back online. + +For more information about locking, you can check the [upcoming documentation](https://github.com/docker/docker.github.io/pull/694). diff --git a/docs/end.md b/docs/end.md new file mode 100644 index 00000000..0c837e09 --- /dev/null +++ b/docs/end.md @@ -0,0 +1,64 @@ +class: title, extra-details + +# What's next? + +## (What to expect in future versions of this workshop) + +--- + +class: extra-details + +## Implemented and stable, but out of scope + +- [Docker Content Trust](https://docs.docker.com/engine/security/trust/content_trust/) and + [Notary](https://github.com/docker/notary) (image signature and verification) + +- Image security scanning (many products available, Docker Inc. and 3rd party) + +- [Docker Cloud](https://cloud.docker.com/) and + [Docker Datacenter](https://www.docker.com/products/docker-datacenter) + (commercial offering with node management, secure registry, CI/CD pipelines, all the bells and whistles) + +- Network and storage plugins + +--- + +class: extra-details + +## Work in progress + +- Demo at least one volume plugin +
(bonus points if it's a distributed storage system) + +- ..................................... (your favorite feature here) + +Reminder: there is a tag for each iteration of the content +in the GitHub repository. + +It makes it easy to come back later and check what has changed since you did it! + +--- + +class: title, self-paced + +Thank you! + +--- + +class: title, in-person + +That's all folks!
Questions? + +.small[.small[ + +Jérôme ([@jpetazzo](https://twitter.com/jpetazzo)) — [@docker](https://twitter.com/docker) + +AJ ([@s0ulshake](https://twitter.com/s0ulshake)) — *For hire!* +
+`curl cv.soulshake.net` + +]] + + diff --git a/docs/extract-section-titles.py b/docs/extract-section-titles.py deleted file mode 100755 index a5f27fc0..00000000 --- a/docs/extract-section-titles.py +++ /dev/null @@ -1,19 +0,0 @@ -#!/usr/bin/env python -""" -Extract and print level 1 and 2 titles from workshop slides. -""" - -separators = [ - "---", - "--" -] - -slide_count = 1 -for line in open("index.html"): - line = line.strip() - if line in separators: - slide_count += 1 - if line.startswith('# '): - print slide_count, '# #', line - elif line.startswith('# '): - print slide_count, line diff --git a/docs/extratips.md b/docs/extratips.md new file mode 100644 index 00000000..92193f9b --- /dev/null +++ b/docs/extratips.md @@ -0,0 +1,274 @@ +class: extra-details + +# Controlling Docker from a container + +- In a local environment, just bind-mount the Docker control socket: + ```bash + docker run -ti -v /var/run/docker.sock:/var/run/docker.sock docker + ``` + +- Otherwise, you have to: + + - set `DOCKER_HOST`, + - set `DOCKER_TLS_VERIFY` and `DOCKER_CERT_PATH` (if you use TLS), + - copy certificates to the container that will need API access. + +More resources on this topic: + +- [Do not use Docker-in-Docker for CI]( + http://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/) +- [One container to rule them all]( + http://jpetazzo.github.io/2016/04/03/one-container-to-rule-them-all/) + +--- + +class: extra-details + +## Bind-mounting the Docker control socket + +- In Swarm mode, bind-mounting the control socket gives you access to the whole cluster + +- You can tell Docker to place a given service on a manager node, using constraints: + ```bash + docker service create \ + --mount source=/var/run/docker.sock,type=bind,target=/var/run/docker.sock \ + --name autoscaler --constraint node.role==manager ... 
+ ``` + +--- + +class: extra-details + +## Constraints and global services + +(New in Docker Engine 1.13) + +- By default, global services run on *all* nodes + ```bash + docker service create --mode global ... + ``` + +- You can specify constraints for global services + +- These services will run only on the nodes satisfying the constraints + +- For instance, this service will run on all manager nodes: + ```bash + docker service create --mode global --constraint node.role==manager ... + ``` + +--- + +class: extra-details + +## Constraints and dynamic scheduling + +(New in Docker Engine 1.13) + +- If constraints change, services are started/stopped accordingly + + (e.g., `--constraint node.role==manager` and nodes are promoted/demoted) + +- This is particularly useful with labels: + ```bash + docker node update node1 --label-add defcon=five + docker service create --constraint node.labels.defcon==five ... + docker node update node2 --label-add defcon=five + docker node update node1 --label-rm defcon=five + ``` + +--- + +class: extra-details + +## Shortcomings of dynamic scheduling + +.warning[If a service becomes "unschedulable" (constraints can't be satisfied):] + +- It won't be scheduled automatically when constraints are satisfiable again + +- You will have to update the service; you can do a no-op update with: + ```bash + docker service update ... --force + ``` + +.warning[Docker will silently ignore attempts to remove a non-existent label or constraint] + +- It won't warn you if you make a typo when removing a label or constraint! + +--- + +class: extra-details + +# Node management + +- SwarmKit allows you to change (almost?) 

everything on-the-fly + +- Nothing should require a global restart + +--- + +class: extra-details + +## Node availability + +```bash +docker node update --availability <active|pause|drain> <node name> +``` + +- Active = schedule tasks on this node (default) + +- Pause = don't schedule new tasks on this node; existing tasks are not affected + + You can use it to troubleshoot a node without disrupting existing tasks + + It can also be used (in conjunction with labels) to reserve resources + +- Drain = don't schedule new tasks on this node; existing tasks are moved away + + This is just like crashing the node, but containers get a chance to shut down cleanly + +--- + +class: extra-details + +## Managers and workers + +- Nodes can be promoted to manager with `docker node promote` + +- Nodes can be demoted to worker with `docker node demote` + +- This can also be done with `docker node update --role <manager|worker> <node name>` + +- Reminder: this has to be done from a manager node +

(workers cannot promote themselves) + +--- + +class: extra-details + +## Removing nodes + +- You can leave Swarm mode with `docker swarm leave` + +- Nodes are drained before being removed (i.e. all tasks are rescheduled somewhere else) + +- Managers cannot leave (they have to be demoted first) + +- After leaving, a node still shows up in `docker node ls` (in `Down` state) + +- When a node is `Down`, you can remove it with `docker node rm` (from a manager node) + +--- + +class: extra-details + +## Join tokens and automation + +- If you have used Docker 1.12-RC: join tokens are now mandatory! + +- You cannot specify your own token (SwarmKit generates it) + +- If you need to change the token: `docker swarm join-token --rotate ...` + +- To automate cluster deployment: + + - have a seed node do `docker swarm init` if it's not already in Swarm mode + + - propagate the token to the other nodes (secure bucket, facter, ohai...) + +--- + +class: extra-details + +## Disk space management: `docker system df` + +- Shows disk usage for images, containers, and volumes + +- Breaks down between *active* and *reclaimable* categories + +.exercise[ + +- Check how much disk space is used at the end of the workshop: + ```bash + docker system df + ``` + +] + +Note: `docker system` is new in Docker Engine 1.13. + +--- + +class: extra-details + +## Reclaiming unused resources: `docker system prune` + +- Removes stopped containers + +- Removes dangling images (that don't have a tag associated anymore) + +- Removes orphaned volumes + +- Removes empty networks + +.exercise[ + +- Try it: + ```bash + docker system prune -f + ``` + +] + +Note: `docker system prune -a` will also remove *unused* images. + +--- + +class: extra-details + +## Events + +- You can get a real-time stream of events with `docker events` + +- This will report *local events* and *cluster events* + +- Local events = +
+ all activity related to containers, images, plugins, volumes, networks, *on this node* + +- Cluster events = +
Swarm Mode activity related to services, nodes, secrets, configs, *on the whole cluster* + +- `docker events` doesn't report *local events happening on other nodes* + +- Events can be filtered (by type, target, labels...) + +- Events can be formatted with Go's `text/template` or in JSON + +--- + +class: extra-details + +## Getting *all the events* + +- There is no built-in way to get a stream of *all the events* on *all the nodes* + +- This can be achieved with (for instance) the following four services working together: + + - a Redis container (used as a stateless, fan-in message queue) + + - a global service bind-mounting the Docker socket, pushing local events to the queue + + - a similar singleton service to push global events to the queue + + - a queue consumer fetching events and processing them as you please + +I'm not saying that you should implement it with Shell scripts, but you totally could. + +.small[ +(It might or might not be one of the initiating rites of the +[House of Bash](https://twitter.com/carmatrocity/status/676559402787282944)) +] + +For more information about event filters and types, check [the documentation](https://docs.docker.com/engine/reference/commandline/events/). diff --git a/docs/firstservice.md b/docs/firstservice.md new file mode 100644 index 00000000..47945367 --- /dev/null +++ b/docs/firstservice.md @@ -0,0 +1,472 @@ +# Running our first Swarm service + +- How do we run services? Simplified version: + + `docker run` → `docker service create` + +.exercise[ + +- Create a service featuring an Alpine container pinging Google resolvers: + ```bash + docker service create alpine ping 8.8.8.8 + ``` + +- Check the result: + ```bash + docker service ps <serviceID> + ``` + +] + +--- + +## `--detach` for service creation + +(New in Docker Engine 17.05) + +If you are running Docker 17.05 or later, you will see the following message: + +``` +Since --detach=false was not specified, tasks will be created in the background. 

+In a future release, --detach=false will become the default. +``` + +Let's ignore it for now; but we'll come back to it in just a few minutes! + +--- + +## Checking service logs + +(New in Docker Engine 17.05) + +- Just like `docker logs` shows the output of a specific local container ... + +- ... `docker service logs` shows the output of all the containers of a specific service + +.exercise[ + +- Check the output of our ping command: + ```bash + docker service logs + ``` + +] + +Flags `--follow` and `--tail` are available, as well as a few others. + +Note: by default, when a container is destroyed (e.g. when scaling down), its logs are lost. + +--- + +class: extra-details + +## Before Docker Engine 17.05 + +- Docker 1.13/17.03/17.04 have `docker service logs` as an experimental feature +
(available only when enabling the experimental feature flag) + +- We have to use `docker logs`, which only works on local containers + +- We will have to connect to the node running our container +
(unless it was scheduled locally, of course) + +--- + +class: extra-details + +## Looking up where our container is running + +- The `docker service ps` command told us where our container was scheduled + +.exercise[ + +- Look up the `NODE` on which the container is running: + ```bash + docker service ps <serviceID> + ``` + +- If you use Play-With-Docker, switch to that node's tab, or set `DOCKER_HOST` + +- Otherwise, `ssh` into that node or use `eval $(docker-machine env node...)` + +] + +--- + +class: extra-details + +## Viewing the logs of the container + +.exercise[ + +- See that the container is running and check its ID: + ```bash + docker ps + ``` + +- View its logs: + ```bash + docker logs <containerID> + ``` + +- Go back to `node1` afterwards + +] + +--- + +## Scale our service + +- Services can be scaled in a pinch with the `docker service update` command + +.exercise[ + +- Scale the service to ensure 2 copies per node: + ```bash + docker service update --replicas 10 <serviceID> + ``` + +- Check that we have two containers on the current node: + ```bash + docker ps + ``` + +] + +--- + +## View deployment progress + +(New in Docker Engine 17.05) + +- Commands that create/update/delete services can run with `--detach=false` + +- The CLI will show the status of the command, and exit once it's done working + +.exercise[ + +- Scale the service to ensure 3 copies per node: + ```bash + docker service update --replicas 15 --detach=false <serviceID> + ``` + +] + +Note: `--detach=false` will eventually become the default. + +With older versions, you can use e.g.: `watch docker service ps <serviceID>` + +--- + +## Expose a service + +- Services can be exposed, with two special properties: + + - the public port is available on *every node of the Swarm*, + + - requests coming on the public port are load balanced across all instances. 

+ +- This is achieved with option `-p/--publish`; as an approximation: + + `docker run -p → docker service create -p` + +- If you indicate a single port number, it will be mapped to a port + starting at 30000 +

(vs. 32768 for single container mapping) + +- You can indicate two port numbers to set the public port number +
(just like with `docker run -p`) + +--- + +## Expose ElasticSearch on its default port + +.exercise[ + +- Create an ElasticSearch service (and give it a name while we're at it): + ```bash + docker service create --name search --publish 9200:9200 --replicas 7 \ + --detach=false elasticsearch`:2` + ``` + +] + +Note: don't forget the **:2**! + +The latest version of the ElasticSearch image won't start without mandatory configuration. + +--- + +## Tasks lifecycle + +- During the deployment, you will be able to see multiple states: + + - assigned (the task has been assigned to a specific node) + + - preparing (this mostly means "pulling the image") + + - starting + + - running + +- When a task is terminated (stopped, killed...) it cannot be restarted + + (A replacement task will be created) + +--- + +class: extra-details + +![diagram showing what happens during docker service create, courtesy of @aluzzardi](docker-service-create.svg) + +--- + +## Test our service + +- We mapped port 9200 on the nodes, to port 9200 in the containers + +- Let's try to reach that port! + +.exercise[ + +- Try the following command: + ```bash + curl localhost:9200 + ``` + +] + +(If you get `Connection refused`: congratulations, you are very fast indeed! Just try again.) + +ElasticSearch serves a little JSON document with some basic information +about this instance, including a randomly-generated super-hero name. + +--- + +## Test the load balancing + +- If we repeat our `curl` command multiple times, we will see different names + +.exercise[ + +- Send 10 requests, and see which instances serve them: + ```bash + for N in $(seq 1 10); do + curl -s localhost:9200 | jq .name + done + ``` + +] + +Note: if you don't have `jq` on your Play-With-Docker instance, just install it: +```bash +apk add --no-cache jq +``` + +--- + +## Load balancing results + +Traffic is handled by our cluster's [TCP routing mesh]( +https://docs.docker.com/engine/swarm/ingress/). 

Each request is served by one of the 7 instances, in rotation. + +Note: if you try to access the service from your browser, +you will probably see the same +instance name over and over, because your browser (unlike curl) will try +to re-use the same connection. + +--- + +## Under the hood of the TCP routing mesh + +- Load balancing is done by IPVS + +- IPVS is a high-performance, in-kernel load balancer + +- It's been around for a long time (merged in the kernel since 2.4) + +- Each node runs a local load balancer + + (Allowing connections to be routed directly to the destination, + without extra hops) + +--- + +## Managing inbound traffic + +There are many ways to deal with inbound traffic on a Swarm cluster. + +- Put all (or a subset) of your nodes in a DNS `A` record + +- Assign your nodes (or a subset) to an ELB + +- Use a virtual IP and make sure that it is assigned to an "alive" node + +- etc. + +--- + +class: btw-labels + +## Managing HTTP traffic + +- The TCP routing mesh doesn't parse HTTP headers + +- If you want to place multiple HTTP services on port 80, you need something more + +- You can set up NGINX or HAProxy on port 80 to do the virtual host switching + +- Docker Universal Control Plane provides its own [HTTP routing mesh]( + https://docs.docker.com/datacenter/ucp/2.1/guides/admin/configure/use-domain-names-to-access-services/) + + - add a specific label starting with `com.docker.ucp.mesh.http` to your services + + - labels are detected automatically and dynamically update the configuration + +--- + +class: btw-labels + +## You should use labels + +- Labels are a great way to attach arbitrary information to services + +- Examples: + + - HTTP vhost of a web app or web service + + - backup schedule for a stateful service + + - owner of a service (for billing, paging...) + + - etc. 

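To make this concrete: a label is just an arbitrary key=value pair attached at creation time (e.g. `docker service create --label owner=alice ...`), and tools can select services by it later. The snippet below is a toy sketch in plain shell (no Docker daemon required; the service names and the `owner` label are made up) of the kind of selection that `docker service ls --filter label=owner=alice` performs:

```shell
# Made-up inventory of services and their "owner" label
# (illustrative only; real data would come from the Docker API)
services='webapp owner=alice
cronjobs owner=bob
db owner=alice'

# Keep only the services owned by alice, and print their names
echo "$services" | awk '$2 == "owner=alice" { print $1 }'
# prints:
#   webapp
#   db
```

The point is that the services themselves don't need to know anything about billing, paging, or backups; any tool can filter on the labels after the fact.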
+ +--- + +## Pro-tip for ingress traffic management + +- It is possible to use *local* networks with Swarm services + +- This means that you can do something like this: + ```bash + docker service create --network host --mode global traefik ... + ``` + + (This runs the `traefik` load balancer on each node of your cluster, in the `host` network) + +- This gives you native performance (no iptables, no proxy, no nothing!) + +- The load balancer will "see" the clients' IP addresses + +- But: a container cannot simultaneously be in the `host` network and another network + + (You will have to route traffic to containers using exposed ports or UNIX sockets) + +--- + +class: extra-details + +## Using local networks (`host`, `macvlan` ...) with Swarm services + +- Using the `host` network is fairly straightforward + + (With the caveats described on the previous slide) + +- It is also possible to use drivers like `macvlan` + + - see [this guide]( +https://docs.docker.com/engine/userguide/networking/get-started-macvlan/ +) to get started on `macvlan` + + - see [this PR](https://github.com/moby/moby/pull/32981) for more information about local network drivers in Swarm mode + +--- + +## Visualize container placement + +- Let's leverage the Docker API! 
+ +.exercise[ + +- Get the source code of this simple-yet-beautiful visualization app: + ```bash + cd ~ + git clone git://github.com/dockersamples/docker-swarm-visualizer + ``` + +- Build and run the Swarm visualizer: + ```bash + cd docker-swarm-visualizer + docker-compose up -d + ``` + +] + +--- + +## Connect to the visualization webapp + +- It runs a web server on port 8080 + +.exercise[ + +- Point your browser to port 8080 of your node1's public IP + + (If you use Play-With-Docker, click on the (8080) badge) + +] + +- The webapp updates the display automatically (you don't need to reload the page) + +- It only shows Swarm services (not standalone containers) + +- It shows when nodes go down + +- It has some glitches (it's not Carrier-Grade Enterprise-Compliant ISO-9001 software) + +--- + +## Why this is more important than you think + +- The visualizer accesses the Docker API *from within a container* + +- This is a common pattern: run container management tools *in containers* + +- Instead of viewing your cluster, this could take care of logging, metrics, autoscaling ... + +- We can run it within a service, too! We won't do it, but the command would look like: + + ```bash + docker service create \ + --mount source=/var/run/docker.sock,type=bind,target=/var/run/docker.sock \ + --name viz --constraint node.role==manager ... + ``` + +Credits: the visualization code was written by +[Francisco Miranda](https://github.com/maroshii). +

+[Mano Marks](https://twitter.com/manomarks) adapted +it to Swarm and maintains it. + +--- + +## Terminate our services + +- Before moving on, we will remove those services + +- `docker service rm` can accept multiple service names or IDs + +- `docker service ls` can accept the `-q` flag + +- A Shell snippet a day keeps the cruft away + +.exercise[ + +- Remove all services with this one-liner: + ```bash + docker service ls -q | xargs docker service rm + ``` + +] diff --git a/docs/healthchecks.md b/docs/healthchecks.md new file mode 100644 index 00000000..5435e1af --- /dev/null +++ b/docs/healthchecks.md @@ -0,0 +1,227 @@ +name: healthchecks + +class: healthchecks + +# Health checks + +(New in Docker Engine 1.12) + +- Commands that are executed at regular intervals in a container + +- Must return 0 or 1 to indicate "all is good" or "something's wrong" + +- Must execute quickly (timeouts = failures) + +- Example: + ```bash + curl -f http://localhost/_ping || false + ``` + - the `-f` flag ensures that `curl` returns non-zero for 404 and similar errors + - `|| false` ensures that any non-zero exit status gets mapped to 1 + - `curl` must be installed in the container that is being checked + +--- + +class: healthchecks + +## Defining health checks + +- In a Dockerfile, with the [HEALTHCHECK](https://docs.docker.com/engine/reference/builder/#healthcheck) instruction + ``` + HEALTHCHECK --interval=1s --timeout=3s CMD curl -f http://localhost/ || false + ``` + +- From the command line, when running containers or services + ``` + docker run --health-cmd "curl -f http://localhost/ || false" ... + docker service create --health-cmd "curl -f http://localhost/ || false" ... 

+ ``` + +- In Compose files, with a per-service [healthcheck](https://docs.docker.com/compose/compose-file/#healthcheck) section + ```yaml + www: + image: hellowebapp + healthcheck: + test: "curl -f http://localhost/ || false" + timeout: 3s + ``` + +--- + +class: healthcheck + +## Using health checks + +- With `docker run`, health checks are purely informative + + - `docker ps` shows health status + + - `docker inspect` has extra details (including health check command output) + +- With `docker service`: + + - unhealthy tasks are terminated (i.e. the service is restarted) + + - failed deployments can be rolled back automatically +

(by setting *at least* the flag `--update-failure-action rollback`) + +--- + +class: healthcheck + +## Automated rollbacks + +Here is a comprehensive example using the CLI: + +```bash +docker service update \ + --update-delay 5s \ + --update-failure-action rollback \ + --update-max-failure-ratio .25 \ + --update-monitor 5s \ + --update-parallelism 1 \ + --rollback-delay 5s \ + --rollback-failure-action pause \ + --rollback-max-failure-ratio .5 \ + --rollback-monitor 5s \ + --rollback-parallelism 0 \ + --health-cmd "curl -f http://localhost/ || exit 1" \ + --health-interval 2s \ + --health-retries 1 \ + --image yourimage:newversion \ + yourservice +``` + +--- + +class: healthcheck + +## Implementing auto-rollback in practice + +We will use the following Compose file (`stacks/dockercoins+healthchecks.yml`): + +```yaml +... + hasher: + build: dockercoins/hasher + image: ${REGISTRY-127.0.0.1:5000}/hasher:${TAG-latest} + deploy: + replicas: 7 + update_config: + delay: 5s + failure_action: rollback + max_failure_ratio: .5 + monitor: 5s + parallelism: 1 +... +``` + +--- + +class: healthcheck + +## Enabling auto-rollback + +.exercise[ + +- Go to the `stacks` directory: + ```bash + cd ~/orchestration-workshop/stacks + ``` + +- Deploy the updated stack: + ```bash + docker deploy dockercoins --compose-file dockercoins+healthchecks.yml + ``` + +] + +This will also scale the `hasher` service to 7 instances. + +--- + +class: healthcheck + +## Visualizing a rolling update + +First, let's make an "innocent" change and deploy it. 

+ +.exercise[ + +- Update the `sleep` delay in the code: + ```bash + sed -i "s/sleep 0.1/sleep 0.2/" dockercoins/hasher/hasher.rb + ``` + +- Build, ship, and run the new image: + ```bash + docker-compose -f dockercoins+healthchecks.yml build + docker-compose -f dockercoins+healthchecks.yml push + docker service update dockercoins_hasher \ + --detach=false --image=127.0.0.1:5000/hasher:latest + ``` + +] + +--- + +class: healthcheck + +## Visualizing an automated rollback + +And now, a breaking change that will cause the health check to fail: + +.exercise[ + +- Change the HTTP listening port: + ```bash + sed -i "s/80/81/" dockercoins/hasher/hasher.rb + ``` + +- Build, ship, and run the new image: + ```bash + docker-compose -f dockercoins+healthchecks.yml build + docker-compose -f dockercoins+healthchecks.yml push + docker service update dockercoins_hasher \ + --detach=false --image=127.0.0.1:5000/hasher:latest + ``` + +] + +--- + +class: healthcheck + +## Command-line options available for health checks, rollbacks, etc. 
+ +Batteries included, but swappable + +.small[ +``` +--health-cmd string Command to run to check health +--health-interval duration Time between running the check (ms|s|m|h) +--health-retries int Consecutive failures needed to report unhealthy +--health-start-period duration Start period for the container to initialize before counting retries towards unstable (ms|s|m|h) +--health-timeout duration Maximum time to allow one check to run (ms|s|m|h) +--no-healthcheck Disable any container-specified HEALTHCHECK +--restart-condition string Restart when condition is met ("none"|"on-failure"|"any") +--restart-delay duration Delay between restart attempts (ns|us|ms|s|m|h) +--restart-max-attempts uint Maximum number of restarts before giving up +--restart-window duration Window used to evaluate the restart policy (ns|us|ms|s|m|h) +--rollback Rollback to previous specification +--rollback-delay duration Delay between task rollbacks (ns|us|ms|s|m|h) +--rollback-failure-action string Action on rollback failure ("pause"|"continue") +--rollback-max-failure-ratio float Failure rate to tolerate during a rollback +--rollback-monitor duration Duration after each task rollback to monitor for failure (ns|us|ms|s|m|h) +--rollback-order string Rollback order ("start-first"|"stop-first") +--rollback-parallelism uint Maximum number of tasks rolled back simultaneously (0 to roll back all at once) +--update-delay duration Delay between updates (ns|us|ms|s|m|h) +--update-failure-action string Action on update failure ("pause"|"continue"|"rollback") +--update-max-failure-ratio float Failure rate to tolerate during an update +--update-monitor duration Duration after each task update to monitor for failure (ns|us|ms|s|m|h) +--update-order string Update order ("start-first"|"stop-first") +--update-parallelism uint Maximum number of tasks updated simultaneously (0 to update all at once) +``` +] + +Yup ... That's a lot of batteries! 
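Behind all these flags, the contract stays simple: a health check is just a command, and its exit status drives everything else (restarts, rollbacks). Here is a minimal sketch of that contract in plain shell; no Docker needed, and `probe_ok`/`probe_ko` are made-up stand-ins for a real probe like `curl -f http://localhost/`:

```shell
# A health check is any command: exit 0 means healthy, non-zero means unhealthy.
probe_ok() { true; }   # stand-in for a probe that succeeds
probe_ko() { false; }  # stand-in for a probe that fails (e.g. curl -f on a 404)

# Mirrors the "|| false" idiom from the slides: any non-zero exit
# status is collapsed to a failure, which Docker reports as "unhealthy".
run_check() {
  "$@" || return 1
}

run_check probe_ok && echo "healthy"    # prints "healthy"
run_check probe_ko || echo "unhealthy"  # prints "unhealthy"
```

Docker layers the timing flags on top of this contract: `--health-interval` decides how often the command runs, `--health-retries` how many consecutive failures flip the status, and a check exceeding `--health-timeout` counts as a failure.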
diff --git a/docs/intro.md b/docs/intro.md new file mode 100644 index 00000000..f0f73f7c --- /dev/null +++ b/docs/intro.md @@ -0,0 +1,142 @@ +class: title, self-paced + +Docker
Orchestration
Workshop + +--- + +class: title, in-person + +.small[ + +Deploy and scale containers with Docker native, open source orchestration + +.small[.small[ + +**Be kind to the WiFi!** + +*Use the 5 GHz network* +

+*Don't use your hotspot* +
+*Don't stream videos from YouTube, Netflix, etc. +
(if you're bored, watch local content instead)* + +Thank you! + +] +] +] + +--- + +class: in-person + +## Intros + +- Hello! We are + AJ ([@s0ulshake](https://twitter.com/s0ulshake)) + & + Jérôme ([@jpetazzo](https://twitter.com/jpetazzo)) + +-- + +class: in-person + +- This is our collective Docker knowledge: + + ![Bell Curve](bell-curve.jpg) + + + +--- + +class: in-person + +## Agenda + + + + + +- The tutorial will run from 9:00am to 12:20pm + +- This will be fast-paced, but DON'T PANIC! + +- All the content is publicly available (slides, code samples, scripts) + + Upstream URL: https://github.com/jpetazzo/orchestration-workshop + +- There will be a coffee break at 10:30am +
+
  (please remind me if I forget about it!)

- Feel free to interrupt for questions at any time

- Live feedback, questions, help on [Gitter](http://container.training/chat)

  http://container.training/chat

---

## A brief introduction

- This was initially written to support in-person,
  instructor-led workshops and tutorials

- You can also follow along on your own, at your own pace

- We included as much information as possible in these slides

- We recommend having a mentor to help you ...

- ... Or be comfortable spending some time reading the Docker
  [documentation](https://docs.docker.com/) ...

- ... And looking for answers in the [Docker forums](https://forums.docker.com/),
  [StackOverflow](http://stackoverflow.com/questions/tagged/docker),
  and other outlets

---

class: self-paced

## Hands on, you shall practice

- Nobody ever became a Jedi by spending their lives reading Wookieepedia

- Likewise, it will take more than merely *reading* these slides
  to make you an expert

- These slides include *tons* of exercises

- They assume that you have access to a cluster of Docker nodes

- If you are attending a workshop or tutorial:
  <br/>
you will be given specific instructions to access your cluster + +- If you are doing this on your own: +
  you can use
  [Play-With-Docker](http://www.play-with-docker.com/) and
  read [these instructions](https://github.com/jpetazzo/orchestration-workshop#using-play-with-docker) for extra
  details diff --git a/docs/ipsec.md b/docs/ipsec.md new file mode 100644 index 00000000..7feb0010 --- /dev/null +++ b/docs/ipsec.md @@ -0,0 +1,154 @@ +class: ipsec

# Securing overlay networks

- By default, overlay networks use plain VXLAN encapsulation

  (~Ethernet over UDP, using SwarmKit's control plane for ARP resolution)

- Encryption can be enabled on a per-network basis

  (It will use IPsec encryption provided by the kernel, leveraging hardware acceleration)

- This is only for the `overlay` driver

  (Other drivers/plugins will use different mechanisms)

---

class: ipsec

## Creating two networks: encrypted and not

- Let's create two networks for testing purposes

.exercise[

- Create an "insecure" network:
  ```bash
  docker network create insecure --driver overlay --attachable
  ```

- Create a "secure" network:
  ```bash
  docker network create secure --opt encrypted --driver overlay --attachable
  ```

]

.warning[Make sure that you don't typo that option; errors are silently ignored!]
+

---

class: ipsec

## Deploying a web server sitting on both networks

- Let's use good old NGINX

- We will attach it to both networks

- We will use a placement constraint to make sure that it is on a different node

.exercise[

- Create a web server running somewhere else:
  ```bash
  docker service create --name web \
     --network secure --network insecure \
     --constraint node.hostname!=node1 \
     nginx
  ```

]

---

class: ipsec

## Sniff HTTP traffic

- We will use `ngrep`, which lets us grep through network traffic

- We will run it in a container, using host networking to access the host's interfaces

.exercise[

- Sniff network traffic and display all packets containing "HTTP":
  ```bash
  docker run --net host nicolaka/netshoot ngrep -tpd eth0 HTTP
  ```

]

--

class: ipsec

Seeing tons of HTTP requests? Shut down your DockerCoins workers:
```bash
docker service update dockercoins_worker --replicas=0
```

---

class: ipsec

## Check that we are, indeed, sniffing traffic

- Let's see if we can intercept our traffic with Google!

.exercise[

- Open a new terminal

- Issue an HTTP request to Google (or anything you like):
  ```bash
  curl google.com
  ```

]

The ngrep container will display one `#` per packet traversing the network interface.

When you do the `curl`, you should see the HTTP request in clear text in the output.

---

class: ipsec, extra-details

## If you are using Play-With-Docker, Vagrant, etc.

- You will probably have *two* network interfaces

- One interface will be used for outbound traffic (to Google)

- The other one will be used for internode traffic

- You might have to adapt/relaunch the `ngrep` command to specify the right one!
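To figure out which interfaces a machine has before pointing `ngrep -d` at one, you can list them; this sketch works on any Linux host or container (the interface names you will see — `eth0`, `ens3`, etc. — depend on the platform):

```shell
# Network interfaces appear as entries under /sys/class/net.
# The loopback interface "lo" is always there; the others are the
# candidates for sniffing.
ls /sys/class/net
```

Then relaunch `ngrep` with `-d <interface>` for the interface carrying the traffic you care about.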
+ +--- + +class: ipsec + +## Try to sniff traffic across overlay networks + +- We will run `curl web` through both secure and insecure networks + +.exercise[ + +- Access the web server through the insecure network: + ```bash + docker run --rm --net insecure nicolaka/netshoot curl web + ``` + +- Now do the same through the secure network: + ```bash + docker run --rm --net secure nicolaka/netshoot curl web + ``` + +] + +When you run the first command, you will see HTTP fragments. +
+
However, when you run the second one, only `#` will show up. diff --git a/docs/leastprivilege.md b/docs/leastprivilege.md new file mode 100644 index 00000000..630c4236 --- /dev/null +++ b/docs/leastprivilege.md @@ -0,0 +1,47 @@ +# Least privilege model

- All the important data is stored in the "Raft log"

- Manager nodes have read/write access to this data

- Worker nodes have no access to this data

- Workers only receive the minimum amount of data that they need:

  - which services to run
  - network configuration information for these services
  - credentials for these services

- Compromising a worker node does not give access to the full cluster

---

## What can I do if I compromise a worker node?

- I can enter the containers running on that node

- I can access the configuration and credentials used by these containers

- I can inspect the network traffic of these containers

- I cannot inspect or disrupt the network traffic of other containers

  (network information is provided by manager nodes; ARP spoofing is not possible)

- I cannot infer the topology of the cluster or its number of nodes

- I can only learn the IP addresses of the manager nodes

---

## Guidelines for workload isolation leveraging least privilege model

- Define security levels

- Define security zones

- Put managers in the highest security zone

- Enforce that workloads of a given security level run in a given zone

- Enforcement can be done with [Authorization Plugins](https://docs.docker.com/engine/extend/plugins_authorization/) diff --git a/docs/logging.md b/docs/logging.md new file mode 100644 index 00000000..48b7215d --- /dev/null +++ b/docs/logging.md @@ -0,0 +1,431 @@ +name: logging

# Centralized logging

- We want to send all our container logs to a central place

- If that place could offer a nice web dashboard too, that'd be nice

--

- We are going to deploy an ELK stack

- It will accept logs over a GELF socket
+

- We will update our services to send logs through the GELF logging driver

---

# Setting up ELK to store container logs

*Important foreword: this is not an "official" or "recommended"
setup; it is just an example. We used ELK in this demo because
it's a popular setup and we keep being asked about it; but you
will have equal success with Fluentd or other logging stacks!*

What we will do:

- Spin up an ELK stack with services

- Gaze at the spiffy Kibana web UI

- Manually send a few log entries using one-shot containers

- Set our containers up to send their logs to Logstash

---

## What's in an ELK stack?

- ELK is three components:

  - ElasticSearch (to store and index log entries)

  - Logstash (to receive log entries from various
    sources, process them, and forward them to various
    destinations)

  - Kibana (to view/search log entries with a nice UI)

- The only component that we will configure is Logstash

- We will accept log entries using the GELF protocol

- Log entries will be stored in ElasticSearch,
  <br/>
and displayed on Logstash's stdout for debugging + +--- + +class: elk-manual + +## Setting up ELK + +- We need three containers: ElasticSearch, Logstash, Kibana + +- We will place them on a common network, `logging` + +.exercise[ + +- Create the network: + ```bash + docker network create --driver overlay logging + ``` + +- Create the ElasticSearch service: + ```bash + docker service create --network logging --name elasticsearch elasticsearch:2.4 + ``` + +] + +--- + +class: elk-manual + +## Setting up Kibana + +- Kibana exposes the web UI + +- Its default port (5601) needs to be published + +- It needs a tiny bit of configuration: the address of the ElasticSearch service + +- We don't want Kibana logs to show up in Kibana (it would create clutter) +
  so, if you use Logspout to collect container logs, tell it to ignore
  the Kibana container (e.g. by setting its `LOGSPOUT=ignore` environment variable)

.exercise[

- Create the Kibana service:
  ```bash
  docker service create --network logging --name kibana --publish 5601:5601 \
     -e ELASTICSEARCH_URL=http://elasticsearch:9200 kibana:4.6
  ```

]

---

class: elk-manual

## Setting up Logstash

- Logstash needs some configuration to listen to GELF messages and send them to ElasticSearch

- We could author a custom image bundling this configuration

- We can also pass the [configuration](https://github.com/jpetazzo/orchestration-workshop/blob/master/elk/logstash.conf) on the command line

.exercise[

- Create the Logstash service:
  ```bash
  docker service create --network logging --name logstash -p 12201:12201/udp \
     logstash:2.4 -e "$(cat ~/orchestration-workshop/elk/logstash.conf)"
  ```

]

---

class: elk-manual

## Checking Logstash

- Before proceeding, let's make sure that Logstash started properly

.exercise[

- Look up the node running the Logstash container:
  ```bash
  docker service ps logstash
  ```

- Connect to that node

]

---

class: elk-manual

## View Logstash logs

.exercise[

- Get the ID of the Logstash container:
  ```bash
  CID=$(docker ps -q --filter label=com.docker.swarm.service.name=logstash)
  ```

- View the logs:
  ```bash
  docker logs --follow $CID
  ```

]

You should see the heartbeat messages:
.small[
```json
{ "message" => "ok",
  "host" => "1a4cfb063d13",
  "@version" => "1",
  "@timestamp" => "2016-06-19T00:45:45.273Z"
}
```
]

---

class: elk-auto

## Deploying our ELK cluster

- We will use a stack file

.exercise[

- Build, ship, and run our ELK stack:
  ```bash
  docker-compose -f elk.yml build
  docker-compose -f elk.yml push
  docker stack deploy elk -c elk.yml
  ```

]

Note: the *build* and *push* steps are not strictly necessary, but they don't hurt!
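Since every log entry in this pipeline travels as GELF, it can help to see that a GELF datagram is nothing magic: just a zlib- or gzip-compressed JSON document sent over UDP. A little sketch you can run anywhere, without Docker (the field names come from the GELF spec; the values are invented):

```shell
# Build a GELF-style payload by hand ...
payload='{"version":"1.1","host":"node1","short_message":"hello","level":6}'
printf '%s' "$payload" | gzip > /tmp/gelf-datagram.bin

# ... and decode it again, like a GELF receiver would:
zcat /tmp/gelf-datagram.bin

# With bash, you could even deliver it to the GELF port published
# by our Logstash service (shown for illustration, not run here):
#   cat /tmp/gelf-datagram.bin > /dev/udp/127.0.0.1/12201
```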
+ +Let's have a look at the [Compose file]( +https://github.com/jpetazzo/orchestration-workshop/blob/master/stacks/elk.yml). + +--- + +class: elk-auto + +## Checking that our ELK stack works correctly + +- Let's view the logs of logstash + + (Who logs the loggers?) + +.exercise[ + +- Stream logstash's logs: + ```bash + docker service logs --follow --tail 1 elk_logstash + ``` + +] + +You should see the heartbeat messages: + +.small[ +```json +{ "message" => "ok", + "host" => "1a4cfb063d13", + "@version" => "1", + "@timestamp" => "2016-06-19T00:45:45.273Z" +} +``` +] + +--- + +## Testing the GELF receiver + +- In a new window, we will generate a logging message + +- We will use a one-off container, and Docker's GELF logging driver + +.exercise[ + +- Send a test message: + ```bash + docker run --log-driver gelf --log-opt gelf-address=udp://127.0.0.1:12201 \ + --rm alpine echo hello + ``` +] + +The test message should show up in the logstash container logs. + +--- + +## Sending logs from a service + +- We were sending from a "classic" container so far; let's send logs from a service instead + +- We're lucky: the parameters (`--log-driver` and `--log-opt`) are exactly the same! + + +.exercise[ + +- Send a test message: + ```bash + docker service create \ + --log-driver gelf --log-opt gelf-address=udp://127.0.0.1:12201 \ + alpine echo hello + ``` + +] + +The test message should show up as well in the logstash container logs. 
+

--

In fact, *multiple messages will show up, and continue to show up every few seconds!*

---

## Restart conditions

- By default, if a container exits (or is killed with `docker kill`, or runs out of memory ...),
  the Swarm will restart it (possibly on a different machine)

- This behavior can be changed by setting the *restart condition* parameter

.exercise[

- Change the restart condition so that Swarm doesn't try to restart our container forever:
  ```bash
  docker service update `xxx` --restart-condition none
  ```
]

Available restart conditions are `none`, `any`, and `on-failure`.

You can also set `--restart-delay`, `--restart-max-attempts`, and `--restart-window`.

---

## Connect to Kibana

- The Kibana web UI is exposed on cluster port 5601

.exercise[

- Connect to port 5601 of your cluster

  - if you're using Play-With-Docker, click on the (5601) badge above the terminal

  - otherwise, open http://(any-node-address):5601/ with your browser

]

---

## "Configuring" Kibana

- If you see a status page with a yellow item, wait a minute and reload
  (Kibana is probably still initializing)

- Kibana should prompt you to "Configure an index pattern":
  <br/>
  in the "Time-field name" drop-down, select "@timestamp", and hit the
  "Create" button

- Then:

  - click "Discover" (in the top-left corner)
  - click "Last 15 minutes" (in the top-right corner)
  - click "Last 1 hour" (in the list in the middle)
  - click "Auto-refresh" (top-right corner)
  - click "5 seconds" (top-left of the list)

- You should see a series of green bars (with one new green bar every minute)

---

## Updating our services to use GELF

- We will now inform our Swarm to add GELF logging to all our services

- This is done with the `docker service update` command

- The logging flags are the same as before

.exercise[

- Enable GELF logging for the `rng` service:
  ```bash
  docker service update dockercoins_rng \
     --log-driver gelf --log-opt gelf-address=udp://127.0.0.1:12201
  ```

]

After ~15 seconds, you should see the log messages in Kibana.

---

## Viewing container logs

- Go back to Kibana

- Container logs should be showing up!

- We can customize the web UI to be more readable

.exercise[

- In the left column, move the mouse over the following
  columns, and click the "Add" button that appears:

  - host
  - container_name
  - message

]

---

## .warning[Don't update stateful services!]

- What would have happened if we had updated the Redis service?

- When a service changes, SwarmKit replaces existing containers with new ones

- This is fine for stateless services

- But if you update a stateful service, its data will be lost in the process

- If we updated our Redis service, all our DockerCoins would be lost

---

## Important afterword

**This is not a "production-grade" setup.**

It is just an educational example. We did set up a single
ElasticSearch instance and a single Logstash instance.

In a production setup, you need an ElasticSearch cluster
(both for capacity and availability reasons). You also
need multiple Logstash instances.
+

And if you want to withstand
bursts of logs, you need some kind of message queue:
Redis if you're cheap, Kafka if you want to make sure
that you don't drop messages on the floor. Good luck.

If you want to learn more about the GELF driver,
have a look at [this blog post](
http://jpetazzo.github.io/2017/01/20/docker-logging-gelf/). diff --git a/docs/markmaker.py b/docs/markmaker.py new file mode 100755 index 00000000..08007c46 --- /dev/null +++ b/docs/markmaker.py @@ -0,0 +1,73 @@ +#!/usr/bin/env python
# transforms a YAML manifest into a MARKDOWN workshop file

import logging
import os
import re
import sys
import yaml


if os.environ.get("DEBUG") == "1":
    logging.basicConfig(level=logging.DEBUG)


class InvalidChapter(ValueError):

    def __init__(self, chapter):
        ValueError.__init__(self, "Invalid chapter: {!r}".format(chapter))


def yaml2markdown(inf, outf):
    # Use the safe loader: the manifest is plain data, and this avoids
    # executing arbitrary YAML tags (and the PyYAML warning about it).
    manifest = yaml.load(inf, Loader=yaml.SafeLoader)
    markdown, titles = processchapter(manifest["chapters"])
    logging.debug(titles)
    toc = gentoc(titles)
    markdown = markdown.replace("@@TOC@@", toc)
    outf.write(markdown)


def gentoc(titles, depth=0, chapter=0):
    if not titles:
        return ""
    if type(titles) == str:
        return " "*(depth-2) + "- " + titles + "\n"
    if type(titles) == list:
        if depth==0:
            sep = "\n\n---\n\n"
            head = ""
            tail = ""
        elif depth==1:
            sep = "\n"
            head = "## Chapter {}\n\n".format(chapter)
            tail = ""
        else:
            sep = "\n"
            head = ""
            tail = ""
        return head + sep.join(gentoc(t, depth+1, c+1) for (c,t) in enumerate(titles)) + tail


def findtitles(markdown):
    return re.findall("^# (.*)", markdown, re.MULTILINE)


# This takes a file name or a markdown snippet as argument.
# It returns (expandedmarkdown, [list of titles])
# The list of titles can be nested.
+def processchapter(chapter): + if type(chapter) == str: + if "\n" in chapter: + return (chapter, findtitles(chapter)) + if os.path.isfile(chapter): + return processchapter(open(chapter).read()) + if type(chapter) == list: + chapters = [processchapter(c) for c in chapter] + markdown = "\n---\n".join(c[0] for c in chapters) + titles = [t for (m,t) in chapters if t] + return (markdown, titles) + raise InvalidChapter(chapter) + + + +yaml2markdown(sys.stdin, sys.stdout) diff --git a/docs/metrics.md b/docs/metrics.md new file mode 100644 index 00000000..451566d9 --- /dev/null +++ b/docs/metrics.md @@ -0,0 +1,1631 @@ +# Metrics collection + +- We want to gather metrics in a central place + +- We will gather node metrics and container metrics + +- We want a nice interface to view them (graphs) + +--- + +## Node metrics + +- CPU, RAM, disk usage on the whole node + +- Total number of processes running, and their states + +- Number of open files, sockets, and their states + +- I/O activity (disk, network), per operation or volume + +- Physical/hardware (when applicable): temperature, fan speed ... + +- ... and much more! + +--- + +## Container metrics + +- Similar to node metrics, but not totally identical + +- RAM breakdown will be different + + - active vs inactive memory + - some memory is *shared* between containers, and accounted specially + +- I/O activity is also harder to track + + - async writes can cause deferred "charges" + - some page-ins are also shared between containers + +For details about container metrics, see: +
+

http://jpetazzo.github.io/2013/10/08/docker-containers-metrics/

---

class: snap, prom

## Tools

We will build *two* different metrics pipelines:

- One based on Intel Snap,

- Another based on Prometheus.

If you're using Play-With-Docker, skip the exercises
relevant to Intel Snap (we rely on an SSH server to deploy,
and PWD doesn't have that yet).

---

class: snap

## First metrics pipeline

We will use three open source Go projects for our first metrics pipeline:

- Intel Snap

  Collects, processes, and publishes metrics

- InfluxDB

  Stores metrics

- Grafana

  Displays metrics visually

---

class: snap

## Snap

- [github.com/intelsdi-x/snap](https://github.com/intelsdi-x/snap)

- Can collect, process, and publish metric data

- Doesn't store metrics

- Works as a daemon (snapd) controlled by a CLI (snapctl)

- Offloads collecting, processing, and publishing to plugins

- Does nothing out of the box; configuration required!

- Docs: https://github.com/intelsdi-x/snap/blob/master/docs/

---

class: snap

## InfluxDB

- Snap doesn't store metrics data

- InfluxDB is specifically designed for time-series data

  - CRud vs. CRUD (you rarely if ever update/delete data)

  - orthogonal read and write patterns

  - storage format optimization is key (for disk usage and performance)

- Snap has a plugin that can *publish* to InfluxDB

---

class: snap

## Grafana

- Snap cannot show graphs

- InfluxDB cannot show graphs

- Grafana will take care of that

- Grafana can read data from InfluxDB and display it as graphs

---

class: snap

## Getting and setting up Snap

- We will install Snap directly on the nodes

- Release tarballs are available from GitHub

- We will use a *global service*
  <br/>
(started on all nodes, including nodes added later) + +- This service will download and unpack Snap in /opt and /usr/local + +- /opt and /usr/local will be bind-mounted from the host + +- This service will effectively install Snap on the hosts + +--- + +class: snap + +## The Snap installer service + +- This will get Snap on all nodes + +.exercise[ + +```bash +docker service create --restart-condition=none --mode global \ + --mount type=bind,source=/usr/local/bin,target=/usr/local/bin \ + --mount type=bind,source=/opt,target=/opt centos sh -c ' +SNAPVER=v0.16.1-beta +RELEASEURL=https://github.com/intelsdi-x/snap/releases/download/$SNAPVER +curl -sSL $RELEASEURL/snap-$SNAPVER-linux-amd64.tar.gz | + tar -C /opt -zxf- +curl -sSL $RELEASEURL/snap-plugins-$SNAPVER-linux-amd64.tar.gz | + tar -C /opt -zxf- +ln -s snap-$SNAPVER /opt/snap +for BIN in snapd snapctl; do ln -s /opt/snap/bin/$BIN /usr/local/bin/$BIN; done +' # If you copy-paste that block, do not forget that final quote ☺ +``` + +] + +--- + +class: snap + +## First contact with `snapd` + +- The core of Snap is `snapd`, the Snap daemon + +- Application made up of a REST API, control module, and scheduler module + +.exercise[ + +- Start `snapd` with plugin trust disabled and log level set to debug: + ```bash + snapd -t 0 -l 1 + ``` + +] + +- More resources: + + https://github.com/intelsdi-x/snap/blob/master/docs/SNAPD.md + https://github.com/intelsdi-x/snap/blob/master/docs/SNAPD_CONFIGURATION.md + +--- + +class: snap + +## Using `snapctl` to interact with `snapd` + +- Let's load a *collector* and a *publisher* plugins + +.exercise[ + +- Open a new terminal + +- Load the psutil collector plugin: + ```bash + snapctl plugin load /opt/snap/plugin/snap-plugin-collector-psutil + ``` + +- Load the file publisher plugin: + ```bash + snapctl plugin load /opt/snap/plugin/snap-plugin-publisher-mock-file + ``` + +] + +--- + +class: snap + +## Checking what we've done + +- Good to know: Docker CLI uses `ls`, Snap CLI uses 
`list` + +.exercise[ + +- See your loaded plugins: + ```bash + snapctl plugin list + ``` + +- See the metrics you can collect: + ```bash + snapctl metric list + ``` + +] + +--- + +class: snap + +## Actually collecting metrics: introducing *tasks* + +- To start collecting/processing/publishing metric data, you need to create a *task* + +- A *task* indicates: + + - *what* to collect (which metrics) + - *when* to collect it (e.g. how often) + - *how* to process it (e.g. use it directly, or compute moving averages) + - *where* to publish it + +- Tasks can be defined with manifests written in JSON or YAML + +- Some plugins, such as the Docker collector, allow for wildcards (\*) in the metrics "path" +
(see snap/docker-influxdb.json) + +- More resources: + https://github.com/intelsdi-x/snap/blob/master/docs/TASKS.md + +--- + +class: snap + +## Our first task manifest + +```yaml + version: 1 + schedule: + type: "simple" # collect on a set interval + interval: "1s" # of every 1s + max-failures: 10 + workflow: + collect: # first collect + metrics: # metrics to collect + /intel/psutil/load/load1: {} + config: # there is no configuration + publish: # after collecting, publish + - + plugin_name: "file" # use the file publisher + config: + file: "/tmp/snap-psutil-file.log" # write to this file +``` + +--- + +class: snap + +## Creating our first task + +- The task manifest shown on the previous slide is stored in `snap/psutil-file.yml`. + +.exercise[ + +- Create a task using the manifest: + + ```bash + cd ~/orchestration-workshop/snap + snapctl task create -t psutil-file.yml + ``` + +] + + The output should look like the following: + ``` + Using task manifest to create task + Task created + ID: 240435e8-a250-4782-80d0-6fff541facba + Name: Task-240435e8-a250-4782-80d0-6fff541facba + State: Running + ``` + +--- + +class: snap + +## Checking existing tasks + +.exercise[ + +- This will confirm that our task is running correctly, and remind us of its task ID + + ```bash + snapctl task list + ``` + +] + +The output should look like the following: + ``` + ID NAME STATE HIT MISS FAIL CREATED + 24043...acba Task-24043...acba Running 4 0 0 2:34PM 8-13-2016 + ``` +--- + +class: snap + +## Viewing our task dollars at work + +- The task is using a very simple publisher, `mock-file` + +- That publisher just writes text lines in a file (one line per data point) + +.exercise[ + +- Check that the data is flowing indeed: + ```bash + tail -f /tmp/snap-psutil-file.log + ``` + +] + +To exit, hit `^C` + +--- + +class: snap + +## Debugging tasks + +- When a task is not directly writing to a local file, use `snapctl task watch` + +- `snapctl task watch` will stream the metrics you are 
collecting to STDOUT + +.exercise[ + +```bash +snapctl task watch +``` + +] + +To exit, hit `^C` + +--- + +class: snap + +## Stopping snap + +- Our Snap deployment has a few flaws: + + - snapd was started manually + + - it is running on a single node + + - the configuration is purely local + +-- + +class: snap + +- We want to change that! + +-- + +class: snap + +- But first, go back to the terminal where `snapd` is running, and hit `^C` + +- All tasks will be stopped; all plugins will be unloaded; Snap will exit + +--- + +class: snap + +## Snap Tribe Mode + +- Tribe is Snap's clustering mechanism + +- When tribe mode is enabled, nodes can join *agreements* + +- When a node in an *agreement* does something (e.g. load a plugin or run a task), +
other nodes of that agreement do the same thing + +- We will use it to load the Docker collector and InfluxDB publisher on all nodes, +
and run a task to use them + +- Without tribe mode, we would have to load plugins and run tasks manually on every node + +- More resources: + https://github.com/intelsdi-x/snap/blob/master/docs/TRIBE.md + +--- + +class: snap + +## Running Snap itself on every node + +- Snap runs in the foreground, so you need to use `&` or start it in tmux + +.exercise[ + +- Run the following command *on every node:* + ```bash + snapd -t 0 -l 1 --tribe --tribe-seed node1:6000 + ``` + +] + +If you're *not* using Play-With-Docker, there is another way to start Snap! + +--- + +class: snap + +## Starting a daemon through SSH + +.warning[Hackety hack ahead!] + +- We will create a *global service* + +- That global service will install a SSH client + +- With that SSH client, the service will connect back to its local node +
  (i.e. "break out" of the container, using the SSH key that we provide)

- Once logged on the node, the service starts snapd with Tribe Mode enabled

---

class: snap

## Running Snap itself on every node

- I might go to hell for showing you this, but here it goes ...

.exercise[

- Start Snap all over the place:
  ```bash
  docker service create --name snapd --mode global \
    --mount type=bind,source=$HOME/.ssh/id_rsa,target=/sshkey \
    alpine sh -c "
    apk add --no-cache openssh-client &&
    ssh -o StrictHostKeyChecking=no -i /sshkey docker@172.17.0.1 \
      sudo snapd -t 0 -l 1 --tribe --tribe-seed node1:6000
    " # If you copy-paste that block, don't forget that final quote :-)
  ```

]

Remember: this *does not work* with Play-With-Docker (which doesn't have SSH).

---

class: snap

## Viewing the members of our tribe

- If everything went fine, Snap is now running in tribe mode

.exercise[

- View the members of our tribe:
  ```bash
  snapctl member list
  ```

]

This should show the 5 nodes with their hostnames.

---

class: snap

## Create an agreement

- We can now create an *agreement* for our plugins and tasks

.exercise[

- Create an agreement; make sure to use the same name all along:
  ```bash
  snapctl agreement create docker-influxdb
  ```

]

The output should look like the following:

```
  Name             Number of Members       plugins         tasks
  docker-influxdb  0                       0               0
```

---

class: snap

## Instruct all nodes to join the agreement

- We don't need another fancy global service!
+

- We can join nodes from any existing node of the cluster

.exercise[

- Add all nodes to the agreement:
  ```bash
  snapctl member list | tail -n +2 |
    xargs -n1 snapctl agreement join docker-influxdb
  ```

]

The last bit of output should look like the following:
```
  Name             Number of Members       plugins         tasks
  docker-influxdb  5                       0               0
```

---

class: snap

## Start a container on every node

- The Docker plugin requires at least one container to be started

- Normally, at this point, you will have at least one container on each node

- But just in case you did things differently, let's create a dummy global service

.exercise[

- Create an alpine container on the whole cluster:
  ```bash
  docker service create --name ping --mode global alpine ping 8.8.8.8
  ```

]

---

class: snap

## Running InfluxDB

- We will create a service for InfluxDB

- We will use the official image

- InfluxDB uses multiple ports:

  - 8086 (HTTP API; we need this)

  - 8083 (admin interface; we need this)

  - 8088 (cluster communication; not needed here)

  - more ports for other protocols (graphite, collectd...)

- We will just publish the first two

---

class: snap

## Creating the InfluxDB service

.exercise[

- Start an InfluxDB service, publishing ports 8083 and 8086:
  ```bash
  docker service create --name influxdb \
     --publish 8083:8083 \
     --publish 8086:8086 \
     influxdb:0.13
  ```

]

Note: this will allow any node to publish metrics data to `localhost:8086`,
and it will allow us to access the admin interface by connecting to any node
on port 8083.

.warning[Make sure to use InfluxDB 0.13; a few things changed in 1.0
(like, the name of the default retention policy is now "autogen") and
this breaks a few things.]
+ +--- + +class: snap + +## Setting up InfluxDB + +- We need to create the "snap" database + +.exercise[ + +- Open port 8083 with your browser + +- Enter the following query in the query box: + ``` + CREATE DATABASE "snap" + ``` + +- In the top-right corner, select "Database: snap" + +] + +Note: the InfluxDB query language *looks like* SQL but it's not. + +??? + +## Setting a retention policy + +- When graduating to 1.0, InfluxDB changed the name of the default policy + +- It used to be "default" and it is now "autogen" + +- Snap still uses "default" and this results in errors + +.exercise[ + +- Create a "default" retention policy by entering the following query in the box: + ``` + CREATE RETENTION POLICY "default" ON "snap" DURATION 1w REPLICATION 1 + ``` + +] + +--- + +class: snap + +## Load Docker collector and InfluxDB publisher + +- We will load plugins on the local node + +- Since our local node is a member of the agreement, all other + nodes in the agreement will also load these plugins + +.exercise[ + +- Load Docker collector: + + ```bash + snapctl plugin load /opt/snap/plugin/snap-plugin-collector-docker + ``` + +- Load InfluxDB publisher: + + ```bash + snapctl plugin load /opt/snap/plugin/snap-plugin-publisher-influxdb + ``` + +] + +--- + +class: snap + +## Start a simple collection task + +- Again, we will create a task on the local node + +- The task will be replicated on other nodes members of the same agreement + +.exercise[ + +- Load a task manifest file collecting a couple of metrics on all containers, +
and sending them to InfluxDB:
+  ```bash
+  cd ~/orchestration-workshop/snap
+  snapctl task create -t docker-influxdb.json
+  ```
+
+]
+
+Note: the task description sends metrics to the InfluxDB API endpoint
+located at 127.0.0.1:8086. Since the InfluxDB container is published
+on port 8086, 127.0.0.1:8086 always routes traffic to the InfluxDB
+container.
+
+---
+
+class: snap
+
+## If things go wrong...
+
+Note: if a task runs into a problem (e.g. it's trying to publish
+to a metrics database, but the database is unreachable), the task
+will be stopped.
+
+You will have to restart it manually by running:
+
+```bash
+snapctl task enable <task ID>
+snapctl task start <task ID>
+```
+
+This must be done *per node*. Alternatively, you can delete+re-create
+the task (it will delete+re-create on all nodes).
+
+---
+
+class: snap
+
+## Check that metric data shows up in InfluxDB
+
+- Let's check existing data with a few manual queries in the InfluxDB admin interface
+
+.exercise[
+
+- List "measurements":
+  ```
+  SHOW MEASUREMENTS
+  ```
+  (This should show two generic entries corresponding to the two collected metrics.)
+
+- View time series data for one of the metrics:
+  ```
+  SELECT * FROM "intel/docker/stats/cgroups/cpu_stats/cpu_usage/total_usage"
+  ```
+  (This should show a list of data points with **time**, **docker_id**, **source**, and **value**.)
+
+]
+
+---
+
+class: snap
+
+## Deploy Grafana
+
+- We will use an almost-official image, `grafana/grafana`
+
+- We will publish Grafana's web interface on its default port (3000)
+
+.exercise[
+
+- Create the Grafana service:
+  ```bash
+  docker service create --name grafana --publish 3000:3000 grafana/grafana:3.1.1
+  ```
+
+]
+
+---
+
+class: snap
+
+## Set up Grafana
+
+.exercise[
+
+- Open port 3000 with your browser
+
+- Log in with "admin" as both the username and the password
+
+- Click on the Grafana logo (the orange spiral in the top left corner)
+
+- Click on "Data Sources"
+
+- Click on "Add data source" (green button on the right)
+
+]
+
+---
+
+class: snap
+
+## Add InfluxDB as a data source for Grafana
+
+.small[
+
+Fill the form exactly as follows:
+- Name = "snap"
+- Type = "InfluxDB"
+
+In HTTP settings, fill as follows:
+- Url = "http://(IP.address.of.any.node):8086"
+- Access = "direct"
+- Leave HTTP Auth untouched
+
+In InfluxDB details, fill as follows:
+- Database = "snap"
+- Leave user and password blank
+
+Finally, click on "add". You should see a green message saying "Success - Data source is working".
+If you see an orange box (sometimes without a message), it means that you got something wrong. Triple-check everything again.
+
+]
+
+---
+
+class: snap
+
+![Screenshot showing how to fill the form](grafana-add-source.png)
+
+---
+
+class: snap
+
+## Create a dashboard in Grafana
+
+.exercise[
+
+- Click on the Grafana logo again (the orange spiral in the top left corner)
+
+- Hover over "Dashboards"
+
+- Click "+ New"
+
+- Click on the little green rectangle that appeared in the top left
+
+- Hover over "Add Panel"
+
+- Click on "Graph"
+
+]
+
+At this point, you should see a sample graph showing up.
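By the way, everything we just clicked through can also be scripted: Grafana exposes an HTTP API, and the data source we filled in by hand can be created with a single POST. A rough sketch (assuming the default `admin`/`admin` credentials; `NODE_IP` is a placeholder you must set yourself):

```shell
# NODE_IP is a placeholder: set it to the address of any Swarm node.
NODE_IP=${NODE_IP:-127.0.0.1}

# Same fields as the form: name, type, access mode, URL, database.
payload='{"name":"snap","type":"influxdb","access":"direct",
          "url":"http://'$NODE_IP':8086","database":"snap"}'
echo "$payload"

# Would be submitted with (not run here):
echo "curl -su admin:admin -H 'Content-Type: application/json'" \
     "-d \"\$payload\" http://$NODE_IP:3000/api/datasources"
```

Handy if you re-create your monitoring stack often and are tired of the form.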
+ +--- + +class: snap + +## Setting up a graph in Grafana + +.exercise[ + +- Panel data source: select snap +- Click on the SELECT metrics query to expand it +- Click on "select measurement" and pick CPU usage +- Click on the "+" right next to "WHERE" +- Select "docker_id" +- Select the ID of a container of your choice (e.g. the one running InfluxDB) +- Click on the "+" on the right of the "SELECT" line +- Add "derivative" +- In the "derivative" option, select "1s" +- In the top right corner, click on the clock, and pick "last 5 minutes" + +] + +Congratulations, you are viewing the CPU usage of a single container! + +--- + +class: snap + +![Screenshot showing the end result](grafana-add-graph.png) + +--- + +class: snap, prom + +## Before moving on ... + +- Leave that tab open! + +- We are going to setup *another* metrics system + +- ... And then compare both graphs side by side + +--- + +class: snap, prom + +## Prometheus vs. Snap + +- Prometheus is another metrics collection system + +- Snap *pushes* metrics; Prometheus *pulls* them + +--- + +class: prom + +## Prometheus components + +- The *Prometheus server* pulls, stores, and displays metrics + +- Its configuration defines a list of *exporter* endpoints +
(that list can be dynamic, using e.g. Consul, DNS, Etcd...)
+
+- The exporters expose metrics over HTTP using a simple line-oriented format
+
+  (An optimized format using protobuf is also possible)
+
+---
+
+class: prom
+
+## It's all about the `/metrics`
+
+- This is what the *node exporter* looks like:
+
+  http://demo.robustperception.io:9100/metrics
+
+- Prometheus itself exposes its own internal metrics, too:
+
+  http://demo.robustperception.io:9090/metrics
+
+- A *Prometheus server* will *scrape* URLs like these
+
+  (It can also use protobuf to avoid the overhead of parsing line-oriented formats!)
+
+---
+
+class: prom-manual
+
+## Collecting metrics with Prometheus on Swarm
+
+- We will run two *global services* (i.e. scheduled on all our nodes):
+
+  - the Prometheus *node exporter* to get node metrics
+
+  - Google's cAdvisor to get container metrics
+
+- We will run a Prometheus server to scrape these exporters
+
+- The Prometheus server will be configured to use DNS service discovery
+
+- We will use `tasks.<service name>` for service discovery
+
+- All these services will be placed on a private internal network
+
+---
+
+class: prom-manual
+
+## Creating an overlay network for Prometheus
+
+- This is the easiest step ☺
+
+.exercise[
+
+- Create an overlay network:
+  ```bash
+  docker network create --driver overlay prom
+  ```
+
+]
+
+---
+
+class: prom-manual
+
+## Running the node exporter
+
+- The node exporter *should* run directly on the hosts
+- However, it can run from a container, if configured properly
+
+  (it needs to access the host's filesystems, in particular /proc and /sys)
+
+.exercise[
+
+- Start the node exporter:
+  ```bash
+  docker service create --name node --mode global --network prom \
+     --mount type=bind,source=/proc,target=/host/proc \
+     --mount type=bind,source=/sys,target=/host/sys \
+     --mount type=bind,source=/,target=/rootfs \
+     prom/node-exporter \
+      -collector.procfs /host/proc \
+      -collector.sysfs /host/sys \
+      -collector.filesystem.ignored-mount-points "^/(sys|proc|dev|host|etc)($|/)"
+  ```
+
+]
+
+---
+
+class: prom-manual
+
+## Running cAdvisor
+
+- Likewise, cAdvisor *should* run directly on the hosts
+
+- But it can run in containers, if configured properly
+
+.exercise[
+
+- Start the cAdvisor collector:
+  ```bash
+  docker service create --name cadvisor --network prom --mode global \
+      --mount type=bind,source=/,target=/rootfs \
+      --mount type=bind,source=/var/run,target=/var/run \
+      --mount type=bind,source=/sys,target=/sys \
+      --mount type=bind,source=/var/lib/docker,target=/var/lib/docker \
+      google/cadvisor:latest
+  ```
+
+]
+
+---
+
+class: prom-manual
+
+## Configuring the Prometheus server
+
+This will be our configuration file for Prometheus:
+
+```yaml
+global:
+  scrape_interval: 10s
+scrape_configs:
+  - job_name: 'prometheus'
+    static_configs:
+      - targets: ['localhost:9090']
+  - job_name: 'node'
+    dns_sd_configs:
+      - names: ['tasks.node']
+        type: 'A'
+        port: 9100
+  - job_name: 'cadvisor'
+    dns_sd_configs:
+      - names: ['tasks.cadvisor']
+        type: 'A'
+        port: 8080
+```
+
+---
+
+class: prom-manual
+
+## Passing the configuration to the Prometheus server
+
+- We need to provide our custom configuration to the Prometheus server
+
+- The easiest solution is to create a custom image bundling this configuration
+
+- We will use a very simple Dockerfile:
+  ```dockerfile
+  FROM prom/prometheus:v1.4.1
+  COPY prometheus.yml /etc/prometheus/prometheus.yml
+  ```
+
+  (The configuration file, and the Dockerfile, are in the `prom` subdirectory)
+
+- We will build this image, and push it to our local registry + +- Then we will create a service using this image + +Note: it is also possible to use a `config` to inject that configuration file +without having to create this ad-hoc image. + +--- + +class: prom-manual + +## Building our custom Prometheus image + +- We will use the local registry started previously on 127.0.0.1:5000 + +.exercise[ + +- Build the image using the provided Dockerfile: + ```bash + docker build -t 127.0.0.1:5000/prometheus ~/orchestration-workshop/prom + ``` + +- Push the image to our local registry: + ```bash + docker push 127.0.0.1:5000/prometheus + ``` + +] + +--- + +class: prom-manual + +## Running our custom Prometheus image + +- That's the only service that needs to be published + + (If we want to access Prometheus from outside!) + +.exercise[ + +- Start the Prometheus server: + ```bash + docker service create --network prom --name prom \ + --publish 9090:9090 127.0.0.1:5000/prometheus + ``` + +] + +--- + +class: prom-auto + +## Deploying Prometheus on our cluster + +- We will use a stack definition (once again) + +.exercise[ + +- Make sure we are in the stacks directory: + ```bash + cd ~/orchestration-workshop/stacks + ``` + +- Build, ship, and run the Prometheus stack: + ```bash + docker-compose -f prometheus.yml build + docker-compose -f prometheus.yml push + docker stack deploy -c prometheus.yml prometheus + ``` + +] + +--- + +class: prom + +## Checking our Prometheus server + +- First, let's make sure that Prometheus is correctly scraping all metrics + +.exercise[ + +- Open port 9090 with your browser + +- Click on "status", then "targets" + +] + +You should see 11 endpoints (5 cadvisor, 5 node, 1 prometheus). + +Their state should be "UP". 
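This check can also be scripted against the Prometheus HTTP API (recent Prometheus versions expose `/api/v1/targets`). The JSON below is a mocked, heavily trimmed response so the sketch is self-contained; on the cluster you would get the real thing with `curl localhost:9090/api/v1/targets`:

```shell
# Mocked excerpt of what /api/v1/targets returns (trimmed for illustration).
response='{"status":"success","data":{"activeTargets":[
  {"labels":{"job":"cadvisor"},"health":"up"},
  {"labels":{"job":"node"},"health":"up"},
  {"labels":{"job":"prometheus"},"health":"up"}]}}'

# Count how many scrape targets report as healthy
up=$(echo "$response" | grep -o '"health":"up"' | wc -l | tr -d ' ')
echo "targets up: $up"
```

On our cluster, the real response would report 11 active targets rather than the 3 mocked here.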
+
+---
+
+class: prom-auto, config
+
+## Injecting a configuration file
+
+(New in Docker Engine 17.06)
+
+- We are creating a custom image *just to inject a configuration*
+
+- Instead, we could use the base Prometheus image + a `config`
+
+- A `config` is a blob (usually, a configuration file) that:
+
+  - is created and managed through the Docker API (and CLI)
+
+  - gets persisted into the Raft log (i.e. safely)
+
+  - can be associated with a service
+
+  (this injects the blob as a plain file in the service's containers)
+
+---
+
+class: prom-auto, config
+
+## Differences between `config` and `secret`
+
+The two are very similar, but ...
+
+- `configs`:
+
+  - can be injected to any filesystem location
+
+  - can be viewed and extracted using the Docker API or CLI
+
+- `secrets`:
+
+  - can only be injected into `/run/secrets`
+
+  - are never stored in clear text on disk
+
+  - cannot be viewed or extracted with the Docker API or CLI
+
+---
+
+class: prom-auto, config
+
+## Deploying Prometheus with a `config`
+
+- The `config` can be created manually or declared in the Compose file
+
+- This is what our new Compose file looks like:
+
+.small[
+```yaml
+version: "3.3"
+
+services:
+
+  prometheus:
+    image: prom/prometheus:v1.4.1
+    ports:
+      - "9090:9090"
+    configs:
+      - source: prometheus
+        target: /etc/prometheus/prometheus.yml
+
+...
+
+configs:
+  prometheus:
+    file: ../prom/prometheus.yml
+```
+]
+
+(This is from `prometheus+config.yml`)
+
+---
+
+class: prom-auto, config
+
+## Specifying a `config` in a Compose file
+
+- In each service, an optional `configs` section can list as many configs as you want
+
+- Each config can specify:
+
+  - an optional `target` (path to inject the configuration; by default: root of the container)
+
+  - ownership and permissions (by default, the file will be owned by UID 0, i.e. `root`)
+
+- These configs reference top-level `configs` elements
+
+- The top-level configs can be declared as:
+
+  - *external*, meaning that it is supposed to be created before you deploy the stack
+
+  - referencing a file, whose content is used to initialize the config
+
+---
+
+class: prom-auto, config
+
+## Re-deploying Prometheus with a config
+
+- We will update the existing stack using `prometheus+config.yml`
+
+.exercise[
+
+- Redeploy the `prometheus` stack:
+  ```bash
+  docker stack deploy -c prometheus+config.yml prometheus
+  ```
+
+- Check that Prometheus still works as intended
+
+  (By connecting to any node of the cluster, on port 9090)
+
+]
+
+---
+
+class: prom-auto, config
+
+## Accessing the config object from the Docker CLI
+
+- Config objects can be viewed from the CLI (or API)
+
+.exercise[
+
+- List existing config objects:
+  ```bash
+  docker config ls
+  ```
+
+- View details about our config object:
+  ```bash
+  docker config inspect prometheus_prometheus
+  ```
+
+]
+
+Note: the content of the config blob is shown with BASE64 encoding.
+(It doesn't have to be text; it could be an image or any kind of binary content!) + +--- + +class: prom-auto, config + +## Extracting a config blob + +- Let's retrieve that Prometheus configuration! + +.exercise[ + +- Extract the BASE64 payload with `jq`: + ```bash + docker config inspect prometheus_prometheus | jq -r .[0].Spec.Data + ``` + +- Decode it with `base64 -d`: + ```bash + docker config inspect prometheus_prometheus | jq -r .[0].Spec.Data | base64 -d + ``` + +] + +--- + +class: prom + +## Displaying metrics directly from Prometheus + +- This is easy ... if you are familiar with PromQL + +.exercise[ + +- Click on "Graph", and in "expression", paste the following: + ``` + sum by (container_label_com_docker_swarm_node_id) ( + irate( + container_cpu_usage_seconds_total{ + container_label_com_docker_swarm_service_name="dockercoins_worker" + }[1m] + ) + ) + ``` + +- Click on the blue "Execute" button and on the "Graph" tab just below + +] + +--- + +class: prom + +## Building the query from scratch + +- We are going to build the same query from scratch + +- This doesn't intend to be a detailed PromQL course + +- This is merely so that you (I) can pretend to know how the previous query works +
so that your coworkers (you) can be suitably impressed (or not) + + (Or, so that we can build other queries if necessary, or adapt if cAdvisor, + Prometheus, or anything else changes and requires editing the query!) + +--- + +class: prom + +## Displaying a raw metric for *all* containers + +- Click on the "Graph" tab on top + + *This takes us to a blank dashboard* + +- Click on the "Insert metric at cursor" drop down, and select `container_cpu_usage_seconds_total` + + *This puts the metric name in the query box* + +- Click on "Execute" + + *This fills a table of measurements below* + +- Click on "Graph" (next to "Console") + + *This replaces the table of measurements with a series of graphs (after a few seconds)* + +--- + +class: prom + +## Selecting metrics for a specific service + +- Hover over the lines in the graph + + (Look for the ones that have labels like `container_label_com_docker_...`) + +- Edit the query, adding a condition between curly braces: + + .small[`container_cpu_usage_seconds_total{container_label_com_docker_swarm_service_name="dockercoins_worker"}`] + +- Click on "Execute" + + *Now we should see one line per CPU per container* + +- If you want to select by container ID, you can use a regex match: `id=~"/docker/c4bf.*"` + +- You can also specify multiple conditions by separating them with commas + +--- + +class: prom + +## Turn counters into rates + +- What we see is the total amount of CPU used (in seconds) + +- We want to see a *rate* (CPU time used / real time) + +- To get a moving average over 1 minute periods, enclose the current expression within: + + ``` + rate ( ... { ... 
} [1m] ) + ``` + + *This should turn our steadily-increasing CPU counter into a wavy graph* + +- To get an instantaneous rate, use `irate` instead of `rate` + + (The time window is then used to limit how far behind to look for data if data points + are missing in case of scrape failure; see [here](https://www.robustperception.io/irate-graphs-are-better-graphs/) for more details!) + + *This should show spikes that were previously invisible because they were smoothed out* + +--- + +class: prom + +## Aggregate multiple data series + +- We have one graph per CPU per container; we want to sum them + +- Enclose the whole expression within: + + ``` + sum ( ... ) + ``` + + *We now see a single graph* + +--- + +class: prom + +## Collapse dimensions + +- If we have multiple containers we can also collapse just the CPU dimension: + + ``` + sum without (cpu) ( ... ) + ``` + + *This shows the same graph, but preserves the other labels* + +- Congratulations, you wrote your first PromQL expression from scratch! + + (I'd like to thank [Johannes Ziemke](https://twitter.com/discordianfish) and + [Julius Volz](https://twitter.com/juliusvolz) for their help with Prometheus!) + +--- + +class: prom, snap + +## Comparing Snap and Prometheus data + +- If you haven't setup Snap, InfluxDB, and Grafana, skip this section + +- If you have closed the Grafana tab, you might have to re-setup a new dashboard + + (Unless you saved it before navigating it away) + +- To re-do the setup, just follow again the instructions from the previous chapter + +--- + +class: prom, snap + +## Add Prometheus as a data source in Grafana + +.exercise[ + +- In a new tab, connect to Grafana (port 3000) + +- Click on the Grafana logo (the orange spiral in the top-left corner) + +- Click on "Data Sources" + +- Click on the green "Add data source" button + +] + +We see the same input form that we filled earlier to connect to InfluxDB. 
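An aside on the PromQL we used: `rate()` simply turns an ever-increasing counter into a per-second figure. The toy calculation below does the same thing with awk on made-up counter samples (no Prometheus involved), which is handy for sanity-checking what a graph should show:

```shell
# Fake samples of a CPU counter: "timestamp total_cpu_seconds", scraped every 10s.
samples='0 1.0
10 1.5
20 2.5
30 2.8'

# rate() over the window is roughly (last - first) / (elapsed time)
echo "$samples" | awk '
  NR == 1 { t0 = $1; v0 = $2 }
  { t = $1; v = $2 }
  END { printf "average rate: %.3f CPU-seconds/second\n", (v - v0) / (t - t0) }'
# → average rate: 0.060 CPU-seconds/second
```

`irate()` would instead use only the last two samples: (2.8 − 2.5) / 10 = 0.03, which is why it reveals short spikes that `rate()` smooths out.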
+
+---
+
+class: prom, snap
+
+## Connecting to Prometheus from Grafana
+
+.exercise[
+
+- Enter "prom" in the name field
+
+- Select "Prometheus" as the source type
+
+- Enter http://(IP.address.of.any.node):9090 in the Url field
+
+- Select "direct" as the access method
+
+- Click on "Save and test"
+
+]
+
+Again, we should see a green box telling us "Data source is working."
+
+Otherwise, double-check every field and try again!
+
+---
+
+class: prom, snap
+
+## Adding the Prometheus data to our dashboard
+
+.exercise[
+
+- Go back to the tab where we had our first Grafana dashboard
+
+- Click on the blue "Add row" button in the lower right corner
+
+- Click on the green tab on the left; select "Add panel" and "Graph"
+
+]
+
+This takes us to the graph editor that we used earlier.
+
+---
+
+class: prom, snap
+
+## Querying Prometheus data from Grafana
+
+The editor is a bit less friendly than the one we used for InfluxDB.
+
+.exercise[
+
+- Select "prom" as Panel data source
+
+- Paste the query in the query field:
+  ```
+  sum without (cpu, id) ( irate (
+    container_cpu_usage_seconds_total{
+      container_label_com_docker_swarm_service_name="influxdb"}[1m] ) )
+  ```
+
+- Click outside of the query field to confirm
+
+- Close the row editor by clicking the "X" in the top right area
+
+]
+
+---
+
+class: prom, snap
+
+## Interpreting results
+
+- The two graphs *should* be similar
+
+- Protip: align the time references!
+
+.exercise[
+
+- Click on the clock in the top right corner
+
+- Select "last 30 minutes"
+
+- Click on "Zoom out"
+
+- Now press the right arrow key (hold it down and watch the CPU usage increase!)
+
+]
+
+*Adjusting units is left as an exercise for the reader.*
+
+---
+
+## More resources on container metrics
+
+- [Prometheus, a Whirlwind Tour](https://speakerdeck.com/copyconstructor/prometheus-a-whirlwind-tour),
+  an original overview of Prometheus
+
+- [Docker Swarm & Container Overview](https://grafana.net/dashboards/609),
+  a custom dashboard for Grafana
+
+- [Gathering Container Metrics](http://jpetazzo.github.io/2013/10/08/docker-containers-metrics/),
+  a blog post about cgroups
+
+- [The Prometheus Time Series Database](https://www.youtube.com/watch?v=HbnGSNEjhUc),
+  a talk explaining why custom data storage is necessary for metrics
diff --git a/docs/namespaces.md b/docs/namespaces.md
new file mode 100644
index 00000000..7d3712b4
--- /dev/null
+++ b/docs/namespaces.md
@@ -0,0 +1,236 @@
+class: namespaces
+name: namespaces
+
+# Improving isolation with User Namespaces
+
+- *Namespaces* are kernel mechanisms to compartmentalize the system
+
+- There are different kinds of namespaces: `pid`, `net`, `mnt`, `ipc`, `uts`, and `user`
+
+- For a primer, see "Anatomy of a Container"
+  ([video](https://www.youtube.com/watch?v=sK5i-N34im8))
+  ([slides](https://www.slideshare.net/jpetazzo/cgroups-namespaces-and-beyond-what-are-containers-made-from-dockercon-europe-2015))
+
+- The *user namespace* makes it possible to map UIDs between the containers and the host
+
+- As a result, `root` in a container can map to a non-privileged user on the host
+
+Note: even without user namespaces, `root` in a container cannot go wild on the host.
+It is mediated by capabilities, cgroups, namespaces, seccomp, LSMs... + +--- + +class: namespaces + +## User Namespaces in Docker + +- Optional feature added in Docker Engine 1.10 + +- Not enabled by default + +- Has to be enabled at Engine startup, and affects all containers + +- When enabled, `UID:GID` in containers are mapped to a different range on the host + +- Safer than switching to a non-root user (with `-u` or `USER`) in the container +
+  (Since with user namespaces, root escalation maps to a non-privileged user)
+
+- Can be selectively disabled per container by starting them with `--userns=host`
+
+---
+
+class: namespaces
+
+## User Namespaces Caveats
+
+When user namespaces are enabled, containers cannot:
+
+- Use the host's network namespace (with `docker run --network=host`)
+
+- Use the host's PID namespace (with `docker run --pid=host`)
+
+- Run in privileged mode (with `docker run --privileged`)
+
+... Unless user namespaces are disabled for the container, with the flag `--userns=host`
+
+External volume and graph drivers that don't support user mapping might not work.
+
+All containers are currently mapped to the same UID:GID range.
+
+Some of these limitations might be lifted in the future!
+
+---
+
+class: namespaces
+
+## Filesystem ownership details
+
+When enabling user namespaces:
+
+- the UID:GID on disk (in the images and containers) has to match the *mapped* UID:GID
+
+- existing images and containers cannot work (their UID:GID would have to be changed)
+
+For practical reasons, when enabling user namespaces, the Docker Engine places containers and images (and everything else) in a different directory.
+
+As a result, if you enable user namespaces on an existing installation:
+
+- all containers and images (and e.g. Swarm data) disappear
+
+- *if a node is a member of a Swarm, it is then kicked out of the Swarm*
+
+- everything will re-appear if you disable user namespaces again
+
+---
+
+class: namespaces
+
+## Picking a node
+
+- We will select a node where we will enable user namespaces
+
+- This node will have to be re-added to the Swarm
+
+- All containers and services running on this node will be rescheduled
+
+- Let's make sure that we do not pick the node running the registry!
+
+.exercise[
+
+- Check on which node the registry is running:
+  ```bash
+  docker service ps registry
+  ```
+
+]
+
+Pick any other node (noted `nodeX` in the next slides).
+ +--- + +class: namespaces + +## Logging into the right Engine + +.exercise[ + +- Log into the right node: + ```bash + ssh node`X` + ``` + +] + +--- + +class: namespaces + +## Configuring the Engine + +.exercise[ + +- Create a configuration file for the Engine: + ```bash + echo '{"userns-remap": "default"}' | sudo tee /etc/docker/daemon.json + ``` + +- Restart the Engine: + ```bash + kill $(pidof dockerd) + ``` + +] + +--- + +class: namespaces + +## Checking that User Namespaces are enabled + +.exercise[ + - Notice the new Docker path: + ```bash + docker info | grep var/lib + ``` + + - Notice the new UID:GID permissions: + ```bash + sudo ls -l /var/lib/docker + ``` + +] + +You should see a line like the following: +``` +drwx------ 11 296608 296608 4096 Aug 3 05:11 296608.296608 +``` + +--- + +class: namespaces + +## Add the node back to the Swarm + +.exercise[ + +- Get our manager token from another node: + ```bash + ssh node`Y` docker swarm join-token manager + ``` + +- Copy-paste the join command to the node + +] + +--- + +class: namespaces + +## Check the new UID:GID + +.exercise[ + +- Run a background container on the node: + ```bash + docker run -d --name lockdown alpine sleep 1000000 + ``` + +- Look at the processes in this container: + ```bash + docker top lockdown + ps faux + ``` + +] + +--- + +class: namespaces + +## Comparing on-disk ownership with/without User Namespaces + +.exercise[ + +- Compare the output of the two following commands: + ```bash + docker run alpine ls -l / + docker run --userns=host alpine ls -l / + ``` + +] + +-- + +class: namespaces + +In the first case, it looks like things belong to `root:root`. + +In the second case, we will see the "real" (on-disk) ownership. + +-- + +class: namespaces + +Remember to get back to `node1` when finished! 
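To see where the strange `296608.296608` numbers come from: with `userns-remap` set to `default`, the Engine creates a `dockremap` user and reads its subordinate ID range from `/etc/subuid` and `/etc/subgid`. The sketch below parses a sample entry (copied into a variable so it runs anywhere; on a real node, read `/etc/subuid` itself):

```shell
# Sample /etc/subuid entry, matching the listing we saw earlier.
entry='dockremap:296608:65536'

# Fields are colon-separated: user, first host UID, size of the range.
IFS=: read user base count <<EOF
$entry
EOF
echo "$user maps container UIDs 0..$((count - 1)) to host UIDs $base..$((base + count - 1))"
# → dockremap maps container UIDs 0..65535 to host UIDs 296608..362143
```

So container UID 0 (`root`) really lands on host UID 296608, which is exactly the owner we saw on the new `/var/lib/docker` subdirectory.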
diff --git a/docs/netshoot.md b/docs/netshoot.md new file mode 100644 index 00000000..a9dd87b9 --- /dev/null +++ b/docs/netshoot.md @@ -0,0 +1,393 @@ +class: netshoot, extra-details + +## Troubleshooting overlay networks + + + +- We want to run tools like `ab` or `httping` on the internal network + +-- + +class: netshoot, extra-details + +- Ah, if only we had created our overlay network with the `--attachable` flag ... + +-- + +class: netshoot, extra-details + +- Oh well, let's use this as an excuse to introduce New Ways To Do Things + +--- + +class: netshoot + +# Breaking into an overlay network + +- We will create a dummy placeholder service on our network + +- Then we will use `docker exec` to run more processes in this container + +.exercise[ + +- Start a "do nothing" container using our favorite Swiss-Army distro: + ```bash + docker service create --network dockercoins_default --name debug \ + --constraint node.hostname==$HOSTNAME alpine sleep 1000000000 + ``` + +] + +The `constraint` makes sure that the container will be created on the local node. 
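One subtlety worth spelling out: `node.hostname==$HOSTNAME` is expanded by *our* shell before Docker ever sees it, so the scheduler receives a literal hostname, not a variable. A quick way to convince yourself, without needing a Swarm at all:

```shell
# HOSTNAME is a bash-ism; fall back to hostname(1) if the shell doesn't set it.
HOSTNAME=${HOSTNAME:-$(hostname)}

# The constraint string is fully built locally, before the API call:
constraint="node.hostname==$HOSTNAME"
echo "the Docker CLI would send: --constraint $constraint"

# Verify that nothing is left for Docker to expand:
case "$constraint" in
  *'$'*) echo "unexpanded variable left in constraint!" ;;
  *)     echo "constraint fully resolved" ;;
esac
```

This also explains why the constraint pins the task to the node where you *typed* the command, not where it eventually runs.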
+
+---
+
+class: netshoot
+
+## Entering the debug container
+
+- Once our container is started (which should be really fast because the alpine image is small), we can enter it (from any node)
+
+.exercise[
+
+- Locate the container:
+  ```bash
+  docker ps
+  ```
+
+- Enter it:
+  ```bash
+  docker exec -ti <container ID> sh
+  ```
+
+]
+
+---
+
+class: netshoot
+
+## Labels
+
+- We can also be fancy and find the ID of the container automatically
+
+- SwarmKit places labels on containers
+
+.exercise[
+
+- Get the ID of the container:
+  ```bash
+  CID=$(docker ps -q --filter label=com.docker.swarm.service.name=debug)
+  ```
+
+- And enter the container:
+  ```bash
+  docker exec -ti $CID sh
+  ```
+
+]
+
+---
+
+class: netshoot
+
+## Installing our debugging tools
+
+- Ideally, you would author your own image, with all your favorite tools, and use it instead of the base `alpine` image
+
+- But we can also dynamically install whatever we need
+
+.exercise[
+
+- Install a few tools:
+  ```bash
+  apk add --update curl apache2-utils drill
+  ```
+
+]
+
+---
+
+class: netshoot
+
+## Investigating the `rng` service
+
+- First, let's check what `rng` resolves to
+
+.exercise[
+
+- Use drill or nslookup to resolve `rng`:
+  ```bash
+  drill rng
+  ```
+
+]
+
+This gives us one IP address. It is not the IP address of a container.
+It is a virtual IP address (VIP) for the `rng` service.
+
+---
+
+class: netshoot
+
+## Investigating the VIP
+
+.exercise[
+
+- Try to ping the VIP:
+  ```bash
+  ping rng
+  ```
+
+]
+
+It *should* ping. (But this might change in the future.)
+
+With Engine 1.12: VIPs respond to ping if a
+backend is available on the same machine.
+
+With Engine 1.13: VIPs respond to ping if a
+backend is available anywhere.
+
+(Again: this might change in the future.)
+
+---
+
+class: netshoot
+
+## What if I don't like VIPs?
+
+- Services can be published using two modes: VIP and DNSRR.
+
+- With VIP, you get a virtual IP for the service, and a load balancer
+  based on IPVS
+
+  (By the way, IPVS is totally awesome and if you want to learn more about it in the context of containers,
+  I highly recommend [this talk](https://www.youtube.com/watch?v=oFsJVV1btDU&index=5&list=PLkA60AVN3hh87OoVra6MHf2L4UR9xwJkv) by [@kobolog](https://twitter.com/kobolog) at DC15EU!)
+
+- With DNSRR, you get the former behavior (from Engine 1.11), where
+  resolving the service yields the IP addresses of all the containers for
+  this service
+
+- You change this with `docker service create --endpoint-mode [vip|dnsrr]`
+
+---
+
+class: netshoot
+
+## Looking up VIP backends
+
+- You can also resolve a special name: `tasks.<name of the service>`
+
+- It will give you the IP addresses of the containers for a given service
+
+.exercise[
+
+- Obtain the IP addresses of the containers for the `rng` service:
+  ```bash
+  drill tasks.rng
+  ```
+
+]
+
+This should list 5 IP addresses.
+
+---
+
+class: netshoot, extra-details
+
+## Testing and benchmarking our service
+
+- We will check that the service is up with `rng`, then
+  benchmark it with `ab`
+
+.exercise[
+
+- Make a test request to the service:
+  ```bash
+  curl rng
+  ```
+
+- Open another window, and stop the workers, to test in isolation:
+  ```bash
+  docker service update dockercoins_worker --replicas 0
+  ```
+
+]
+
+Wait until the workers are stopped (check with `docker service ls`)
+before continuing.
+
+---
+
+class: netshoot, extra-details
+
+## Benchmarking `rng`
+
+We will send 50 requests, but with various levels of concurrency.
+
+.exercise[
+
+- Send 50 requests, with a single sequential client:
+  ```bash
+  ab -c 1 -n 50 http://rng/10
+  ```
+
+- Send 50 requests, with fifty parallel clients:
+  ```bash
+  ab -c 50 -n 50 http://rng/10
+  ```
+
+]
+
+---
+
+class: netshoot, extra-details
+
+## Benchmark results for `rng`
+
+- When serving requests sequentially, they each take 100ms
+
+- In the parallel scenario, the latency increased dramatically
+
+- What about `hasher`?
+
+---
+
+class: netshoot, extra-details
+
+## Benchmarking `hasher`
+
+We will do the same tests for `hasher`.
+
+The command is slightly more complex, since we need to post random data.
+
+First, we need to put the POST payload in a temporary file.
+
+.exercise[
+
+- Install curl in the container, and generate 10 bytes of random data:
+  ```bash
+  curl http://rng/10 >/tmp/random
+  ```
+
+]
+
+---
+
+class: netshoot, extra-details
+
+## Benchmarking `hasher`
+
+Once again, we will send 50 requests, with different levels of concurrency.
+
+.exercise[
+
+- Send 50 requests with a sequential client:
+  ```bash
+  ab -c 1 -n 50 -T application/octet-stream -p /tmp/random http://hasher/
+  ```
+
+- Send 50 requests with 50 parallel clients:
+  ```bash
+  ab -c 50 -n 50 -T application/octet-stream -p /tmp/random http://hasher/
+  ```
+
+]
+
+---
+
+class: netshoot, extra-details
+
+## Benchmark results for `hasher`
+
+- The sequential benchmark takes ~5 seconds to complete
+
+- The parallel benchmark takes less than 1 second to complete
+
+- In both cases, each request takes a bit more than 100ms to complete
+
+- Requests are a bit slower in the parallel benchmark
+
+- It looks like `hasher` is better equipped to deal with concurrency than `rng`
+
+---
+
+class: netshoot, extra-details, title
+
+Why?
+
+---
+
+class: netshoot, extra-details
+
+## Why does everything take (at least) 100ms?
+ +`rng` code: + +![RNG code screenshot](delay-rng.png) + +`hasher` code: + +![HASHER code screenshot](delay-hasher.png) + +--- + +class: netshoot, extra-details, title + +But ... + +WHY?!? + +--- + +class: netshoot, extra-details + +## Why did we sprinkle this sample app with sleeps? + +- Deterministic performance +
(regardless of instance speed, CPUs, I/O...)
+
+- Actual code sleeps all the time anyway
+
+- When your code makes a remote API call:
+
+  - it sends a request;
+
+  - it sleeps until it gets the response;
+
+  - it processes the response.
+
+---
+
+class: netshoot, extra-details, in-person
+
+## Why do `rng` and `hasher` behave differently?
+
+![Equations on a blackboard](equations.png)
+
+(Synchronous vs. asynchronous event processing)
+
+---
+
+class: netshoot, extra-details
+
+## Global scheduling → global debugging
+
+- Traditional approach:
+
+  - log into a node
+  - install our Swiss Army Knife (if necessary)
+  - troubleshoot things
+
+- Proposed alternative:
+
+  - put our Swiss Army Knife in a container (e.g. [nicolaka/netshoot](https://hub.docker.com/r/nicolaka/netshoot/))
+  - run tests from multiple locations at the same time
+
+(This becomes very practical with the `docker service logs` command, available since 17.05.)
diff --git a/docs/nodeinfo.md b/docs/nodeinfo.md
new file mode 100644
index 00000000..e43352d5
--- /dev/null
+++ b/docs/nodeinfo.md
@@ -0,0 +1,13 @@
+class: node-info
+
+## Getting task information for a given node
+
+- You can see all the tasks assigned to a node with `docker node ps`
+
+- It shows the *desired state* and *current state* of each task
+
+- `docker node ps` shows info about the current node
+
+- `docker node ps <node name or ID>` shows info for another node
+
+- `docker node ps -a` includes stopped and failed tasks
diff --git a/docs/operatingswarm.md b/docs/operatingswarm.md
new file mode 100644
index 00000000..66567029
--- /dev/null
+++ b/docs/operatingswarm.md
@@ -0,0 +1,58 @@
+class: title, in-person
+
+Operating the Swarm
+
+---
+
+name: part-2
+
+class: title, self-paced
+
+Part 2
+
+---
+
+class: self-paced
+
+## Before we start ...
+
+The following exercises assume that you have a five-node Swarm cluster.
+
+If you come here from a previous tutorial and still have your cluster: great!
+
+Otherwise: check [part 1](#part-1) to learn how to set up your own cluster.
+
+We pick up exactly where we left you, so we assume that you have:
+
+- a five-node Swarm cluster,
+
+- a self-hosted registry,
+
+- DockerCoins up and running.
+
+The next slide has a cheat sheet if you need to set that up in a pinch.
+
+---
+
+class: self-paced
+
+## Catching up
+
+Assuming you have 5 nodes provided by
+[Play-With-Docker](http://www.play-with-docker.com/), do this from `node1`:
+
+```bash
+docker swarm init --advertise-addr eth0
+TOKEN=$(docker swarm join-token -q manager)
+for N in $(seq 2 5); do
+  DOCKER_HOST=tcp://node$N:2375 docker swarm join --token $TOKEN node1:2377
+done
+git clone git://github.com/jpetazzo/orchestration-workshop
+cd orchestration-workshop/stacks
+docker stack deploy --compose-file registry.yml registry
+docker-compose -f dockercoins.yml build
+docker-compose -f dockercoins.yml push
+docker stack deploy --compose-file dockercoins.yml dockercoins
+```
+
+You should now be able to connect to port 8000 and see the DockerCoins web UI.
diff --git a/docs/ourapponswarm.md b/docs/ourapponswarm.md
new file mode 100644
index 00000000..ccc090d2
--- /dev/null
+++ b/docs/ourapponswarm.md
@@ -0,0 +1,968 @@
+class: title
+
+Our app on Swarm
+
+---
+
+## What's on the menu?
+
+In this part, we will:
+
+- **build** images for our app,
+
+- **ship** these images with a registry,
+
+- **run** services using these images.
+
+---
+
+## Why do we need to ship our images?
+ +- When we do `docker-compose up`, images are built for our services + +- These images are present only on the local node + +- We need these images to be distributed on the whole Swarm + +- The easiest way to achieve that is to use a Docker registry + +- Once our images are on a registry, we can reference them when + creating our services + +--- + +class: extra-details + +## Build, ship, and run, for a single service + +If we had only one service (built from a `Dockerfile` in the +current directory), our workflow could look like this: + +``` +docker build -t jpetazzo/doublerainbow:v0.1 . +docker push jpetazzo/doublerainbow:v0.1 +docker service create jpetazzo/doublerainbow:v0.1 +``` + +We just have to adapt this to our application, which has 4 services! + +--- + +## The plan + +- Build on our local node (`node1`) + +- Tag images so that they are named `localhost:5000/servicename` + +- Upload them to a registry + +- Create services using the images + +--- + +## Which registry do we want to use? + +.small[ + +- **Docker Hub** + + - hosted by Docker Inc. + - requires an account (free, no credit card needed) + - images will be public (unless you pay) + - located in AWS EC2 us-east-1 + +- **Docker Trusted Registry** + + - self-hosted commercial product + - requires a subscription (free 30-day trial available) + - images can be public or private + - located wherever you want + +- **Docker open source registry** + + - self-hosted barebones repository hosting + - doesn't require anything + - doesn't come with anything either + - located wherever you want + +] + +--- + +class: extra-details + +## Using Docker Hub + +*If we wanted to use the Docker Hub...* + + + +- We would log into the Docker Hub: + ```bash + docker login + ``` + +- And in the following slides, we would use our Docker Hub login + (e.g. `jpetazzo`) instead of the registry address (i.e. 
`127.0.0.1:5000`) + + + +--- + +class: extra-details + +## Using Docker Trusted Registry + +*If we wanted to use DTR, we would...* + +- Make sure we have a Docker Hub account + +- [Activate a Docker Datacenter subscription]( + https://hub.docker.com/enterprise/trial/) + +- Install DTR on our machines + +- Use `dtraddress:port/user` instead of the registry address + +*This is out of the scope of this workshop!* + +--- + +## Using the open source registry + +- We need to run a `registry:2` container +
(make sure you specify tag `:2` to run the new version!) + +- It will store images and layers to the local filesystem +
(but you can add a config file to use S3, Swift, etc.) + +- Docker *requires* TLS when communicating with the registry + + - unless for registries on `127.0.0.0/8` (i.e. `localhost`) + + - or with the Engine flag `--insecure-registry` + + + +- Our strategy: publish the registry container on port 5000, +
so that it's available through `127.0.0.1:5000` on each node + +--- + +class: manual-btp + +# Deploying a local registry + +- We will create a single-instance service, publishing its port + on the whole cluster + +.exercise[ + +- Create the registry service: + ```bash + docker service create --name registry --publish 5000:5000 registry:2 + ``` + +- Try the following command, until it returns `{"repositories":[]}`: + ```bash + curl 127.0.0.1:5000/v2/_catalog + ``` + +] + +(Retry a few times, it might take 10-20 seconds for the container to be started. Patience.) + +--- + +class: manual-btp + +## Testing our local registry + +- We can retag a small image, and push it to the registry + +.exercise[ + +- Make sure we have the busybox image, and retag it: + ```bash + docker pull busybox + docker tag busybox 127.0.0.1:5000/busybox + ``` + +- Push it: + ```bash + docker push 127.0.0.1:5000/busybox + ``` + +] + +--- + +class: manual-btp + +## Checking what's on our local registry + +- The registry API has endpoints to query what's there + +.exercise[ + +- Ensure that our busybox image is now in the local registry: + ```bash + curl http://127.0.0.1:5000/v2/_catalog + ``` + +] + +The curl command should now output: +```json +{"repositories":["busybox"]} +``` + +--- + +class: manual-btp + +## Build, tag, and push our application container images + +- Compose has named our images `dockercoins_XXX` for each service + +- We need to retag them (to `127.0.0.1:5000/XXX:v1`) and push them + +.exercise[ + +- Set `REGISTRY` and `TAG` environment variables to use our local registry +- And run this little for loop: + ```bash + cd ~/orchestration-workshop/dockercoins + REGISTRY=127.0.0.1:5000 TAG=v1 + for SERVICE in hasher rng webui worker; do + docker tag dockercoins_$SERVICE $REGISTRY/$SERVICE:$TAG + docker push $REGISTRY/$SERVICE + done + ``` + +] + +--- + +class: manual-btp + +# Overlay networks + +- SwarmKit integrates with overlay networks + +- Networks are created with `docker 
network create` + +- Make sure to specify that you want an *overlay* network +
(otherwise you will get a local *bridge* network by default) + +.exercise[ + +- Create an overlay network for our application: + ```bash + docker network create --driver overlay dockercoins + ``` + +] + +--- + +class: manual-btp + +## Viewing existing networks + +- Let's confirm that our network was created + +.exercise[ + +- List existing networks: + ```bash + docker network ls + ``` + +] + +--- + +class: manual-btp + +## Can you spot the differences? + +The networks `dockercoins` and `ingress` are different from the other ones. + +Can you see how? + +-- + +class: manual-btp + +- They are using a different kind of ID, reflecting the fact that they + are SwarmKit objects instead of "classic" Docker Engine objects. + +- Their *scope* is `swarm` instead of `local`. + +- They are using the overlay driver. + +--- + +class: manual-btp, extra-details + +## Caveats + +.warning[In Docker 1.12, you cannot join an overlay network with `docker run --net ...`.] + +Starting with version 1.13, you can, if the network was created with the `--attachable` flag. + +*Why is that?* + +Placing a container on a network requires allocating an IP address for this container. + +The allocation must be done by a manager node (worker nodes cannot update Raft data). + +As a result, `docker run --net ...` requires collaboration with manager nodes. + +It alters the code path for `docker run`, so it is allowed only under strict circumstances. 
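
For illustration, here is what that looks like in practice; a hypothetical sketch (not part of the exercises), assuming Docker 1.13+ and a service named `redis` already running on the network:

```bash
# Create an overlay network that standalone containers are allowed to join
# (--attachable is what makes "docker run --net" possible here)
docker network create --driver overlay --attachable mynet

# A one-off container can now be placed on the overlay network,
# e.g. to troubleshoot name resolution and connectivity
docker run --rm -it --net mynet nicolaka/netshoot ping -c 3 redis
```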
+ +--- + +class: manual-btp + +## Run the application + +- First, create the `redis` service; that one is using a Docker Hub image + +.exercise[ + +- Create the `redis` service: + ```bash + docker service create --network dockercoins --name redis redis + ``` + +] + +--- + +class: manual-btp + +## Run the other services + +- Then, start the other services one by one + +- We will use the images pushed previously + +.exercise[ + +- Start the other services: + ```bash + REGISTRY=127.0.0.1:5000 + TAG=v1 + for SERVICE in hasher rng webui worker; do + docker service create --network dockercoins --detach=true \ + --name $SERVICE $REGISTRY/$SERVICE:$TAG + done + ``` + +] + +??? + +## Wait for our application to be up + +- We will see later a way to watch progress for all the tasks of the cluster + +- But for now, a scrappy Shell loop will do the trick + +.exercise[ + +- Repeatedly display the status of all our services: + ```bash + watch "docker service ls -q | xargs -n1 docker service ps" + ``` + +- Stop it once everything is running + +] + +--- + +class: manual-btp + +## Expose our application web UI + +- We need to connect to the `webui` service, but it is not publishing any port + +- Let's reconfigure it to publish a port + +.exercise[ + +- Update `webui` so that we can connect to it from outside: + ```bash + docker service update webui --publish-add 8000:80 --detach=false + ``` + +] + +Note: to "de-publish" a port, you would have to specify the container port. +
(i.e. in that case, `--publish-rm 80`)
+
+---
+
+class: manual-btp
+
+## What happens when we modify a service?
+
+- Let's find out what happened to our `webui` service
+
+.exercise[
+
+- Look at the tasks and containers associated with `webui`:
+  ```bash
+  docker service ps webui
+  ```
+]
+
+--
+
+class: manual-btp
+
+The first version of the service (the one that was not exposed) has been shut down.
+
+It has been replaced by the new version, with port 80 accessible from outside.
+
+(This will be discussed in more detail in the section about stateful services.)
+
+---
+
+class: manual-btp
+
+## Connect to the web UI
+
+- The web UI is now available on port 8000, *on all the nodes of the cluster*
+
+.exercise[
+
+- If you're using Play-With-Docker, just click on the `(8000)` badge
+
+- Otherwise, point your browser to any node, on port 8000
+
+]
+
+---
+
+## Scaling the application
+
+- We can change scaling parameters with `docker service update` as well
+
+- We will do the equivalent of `docker-compose scale`
+
+.exercise[
+
+- Bring up more workers:
+  ```bash
+  docker service update worker --replicas 10 --detach=false
+  ```
+
+- Check the result in the web UI
+
+]
+
+You should see the performance peaking at 10 hashes/s (like before).
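
Note: there is also a shorthand for this specific operation; a quick sketch (not part of the original exercises), assuming a Swarm with a `worker` service:

```bash
# Equivalent shorthand: set the desired number of replicas
docker service scale worker=10

# Confirm that the service converges to the desired state
docker service ls --filter name=worker
```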
+
+---
+
+class: manual-btp
+
+# Global scheduling
+
+- We want to make the best possible use of the entropy generators
+  on our nodes
+
+- We want to run exactly one `rng` instance per node
+
+- SwarmKit has a special scheduling mode for that, let's use it
+
+- We cannot enable/disable global scheduling on an existing service
+
+- We have to destroy and re-create the `rng` service
+
+---
+
+class: manual-btp
+
+## Scaling the `rng` service
+
+.exercise[
+
+- Remove the existing `rng` service:
+  ```bash
+  docker service rm rng
+  ```
+
+- Re-create the `rng` service with *global scheduling*:
+  ```bash
+  docker service create --name rng --network dockercoins --mode global \
+    --detach=false $REGISTRY/rng:$TAG
+  ```
+
+- Look at the result in the web UI
+
+]
+
+---
+
+class: extra-details, manual-btp
+
+## Why do we have to re-create the service to enable global scheduling?
+
+- Enabling it dynamically would make rolling update semantics very complex
+
+- This might change in the future (after all, it was possible in 1.12 RC!)
+
+- As of Docker Engine 17.05, other parameters that require us to `rm`/`create` the service are:
+
+  - service name
+
+  - hostname
+
+  - network
+
+---
+
+class: swarm-ready
+
+## How did we make our app "Swarm-ready"?
+
+This app was written in June 2015. (One year before Swarm mode was released.)
+
+What did we change to make it compatible with Swarm mode?
+
+--
+
+.exercise[
+
+- Go to the app directory:
+  ```bash
+  cd ~/orchestration-workshop/dockercoins
+  ```
+
+- See modifications in the code:
+  ```bash
+  git log -p --since "4-JUL-2015" -- . ':!*.yml*' ':!*.html'
+  ```
+
+]
+
+---
+
+class: swarm-ready
+
+## What did we change in our app since its inception?
+
+- Compose files
+
+- HTML file (it contains an embedded contextual tweet)
+
+- Dockerfiles (to switch to smaller images)
+
+- That's it!
+ +-- + +class: swarm-ready + +*We didn't change a single line of code in this app since it was written.* + +-- + +class: swarm-ready + +*The images that were [built in June 2015]( +https://hub.docker.com/r/jpetazzo/dockercoins_worker/tags/) +(when the app was written) can still run today ... +
... in Swarm mode (distributed across a cluster, with load balancing) ... +
... without any modification.* + +--- + +class: swarm-ready + +## How did we design our app in the first place? + +- [Twelve-Factor App](https://12factor.net/) principles + +- Service discovery using DNS names + + - Initially implemented as "links" + + - Then "ambassadors" + + - And now "services" + +- Existing apps might require more changes! + +--- + +class: manual-btp + +# Integration with Compose + +- The previous section showed us how to streamline image build and push + +- We will now see how to streamline service creation + + (i.e. get rid of the `for SERVICE in ...; do docker service create ...` part) + +--- + +## Compose file version 3 + +(New in Docker Engine 1.13) + +- Almost identical to version 2 + +- Can be directly used by a Swarm cluster through `docker stack ...` commands + +- Introduces a `deploy` section to pass Swarm-specific parameters + +- Resource limits are moved to this `deploy` section + +- See [here](https://github.com/aanand/docker.github.io/blob/8524552f99e5b58452fcb1403e1c273385988b71/compose/compose-file.md#upgrading) for the complete list of changes + +- Supersedes *Distributed Application Bundles* + + (JSON payload describing an application; could be generated from a Compose file) + +--- + +class: manual-btp + +## Removing everything + +- Before deploying using "stacks," let's get a clean slate + +.exercise[ + +- Remove *all* the services: + ```bash + docker service ls -q | xargs docker service rm + ``` + +] + +--- + +## Our first stack + +We need a registry to move images around. 
+
+Without a stack file, it would be deployed with the following command:
+
+```bash
+docker service create --publish 5000:5000 registry:2
+```
+
+Now, we are going to deploy it with the following stack file:
+
+```yaml
+version: "3"
+
+services:
+  registry:
+    image: registry:2
+    ports:
+      - "5000:5000"
+```
+
+---
+
+## Checking our stack files
+
+- All the stack files that we will use are in the `stacks` directory
+
+.exercise[
+
+- Go to the `stacks` directory:
+  ```bash
+  cd ~/orchestration-workshop/stacks
+  ```
+
+- Check `registry.yml`:
+  ```bash
+  cat registry.yml
+  ```
+
+]
+
+---
+
+## Deploying our first stack
+
+- All stack manipulation commands start with `docker stack`
+
+- Under the hood, they map to `docker service` commands
+
+- Stacks have a *name* (which also serves as a namespace)
+
+- Stacks are specified with the aforementioned Compose file format version 3
+
+.exercise[
+
+- Deploy our local registry:
+  ```bash
+  docker stack deploy registry --compose-file registry.yml
+  ```
+
+]
+
+---
+
+## Inspecting stacks
+
+- `docker stack ps` shows the detailed state of all services of a stack
+
+.exercise[
+
+- Check that our registry is running correctly:
+  ```bash
+  docker stack ps registry
+  ```
+
+- Confirm that we get the same output with the following command:
+  ```bash
+  docker service ps registry_registry
+  ```
+
+]
+
+---
+
+class: manual-btp
+
+## Specifics of stack deployment
+
+Our registry is not *exactly* identical to the one deployed with `docker service create`!
+
+- Each stack gets its own overlay network
+
+- Services of the stack are connected to this network
+
(unless specified differently in the Compose file) + +- Services get network aliases matching their name in the Compose file +
(just like when Compose brings up an app specified in a v2 file)
+
+- Services are explicitly named `<stack_name>_<service_name>`
+
+- Services and tasks also get an internal label indicating which stack they belong to
+
+---
+
+## Testing our local registry
+
+- Connecting to port 5000 *on any node of the cluster* routes us to the registry
+
+- Therefore, we can use `localhost:5000` or `127.0.0.1:5000` as our registry
+
+.exercise[
+
+- Issue the following API request to the registry:
+  ```bash
+  curl 127.0.0.1:5000/v2/_catalog
+  ```
+
+]
+
+It should return:
+
+```json
+{"repositories":[]}
+```
+
+If that doesn't work, retry a few times; perhaps the container is still starting.
+
+---
+
+## Pushing an image to our local registry
+
+- We can retag a small image, and push it to the registry
+
+.exercise[
+
+- Make sure we have the busybox image, and retag it:
+  ```bash
+  docker pull busybox
+  docker tag busybox 127.0.0.1:5000/busybox
+  ```
+
+- Push it:
+  ```bash
+  docker push 127.0.0.1:5000/busybox
+  ```
+
+]
+
+---
+
+## Checking what's on our local registry
+
+- The registry API has endpoints to query what's there
+
+.exercise[
+
+- Ensure that our busybox image is now in the local registry:
+  ```bash
+  curl http://127.0.0.1:5000/v2/_catalog
+  ```
+
+]
+
+The curl command should now output:
+```json
+{"repositories":["busybox"]}
+```
+
+---
+
+## Building and pushing stack services
+
+- When using Compose file version 2 and above, you can specify *both* `build` and `image`
+
+- When both keys are present:
+
+  - Compose does "business as usual" (uses `build`)
+
+  - but the resulting image is named as indicated by the `image` key
+
+  (instead of `<project_name>_<service_name>:latest`)
+
+  - it can be pushed to a registry with `docker-compose push`
+
+- Example:
+
+  ```yaml
+  webfront:
+    build: www
+    image: myregistry.company.net:5000/webfront
+  ```
+
+---
+
+## Using Compose to build and push images
+
+.exercise[
+
+- Try it:
+  ```bash
+  docker-compose -f dockercoins.yml build
+  docker-compose -f dockercoins.yml push
+  ```
+
+]
+
+Let's have a look at the `dockercoins.yml` file while this is building and pushing.
+
+---
+
+```yaml
+version: "3"
+
+services:
+  rng:
+    build: dockercoins/rng
+    image: ${REGISTRY-127.0.0.1:5000}/rng:${TAG-latest}
+    deploy:
+      mode: global
+  ...
+  redis:
+    image: redis
+  ...
+  worker:
+    build: dockercoins/worker
+    image: ${REGISTRY-127.0.0.1:5000}/worker:${TAG-latest}
+    ...
+    deploy:
+      replicas: 10
+```
+
+---
+
+## Deploying the application
+
+- Now that the images are on the registry, we can deploy our application stack
+
+.exercise[
+
+- Create the application stack:
+  ```bash
+  docker stack deploy dockercoins --compose-file dockercoins.yml
+  ```
+
+]
+
+We can now connect to any of our nodes on port 8000, and we will see the familiar hashing speed graph.
+
+---
+
+## Maintaining multiple environments
+
+There are many ways to handle variations between environments.
+
+- Compose loads `docker-compose.yml` and (if it exists) `docker-compose.override.yml`
+
+- Compose can load alternate file(s) by setting the `-f` flag or the `COMPOSE_FILE` environment variable
+
+- Compose files can *extend* other Compose files, selectively including services:
+
+  ```yaml
+  web:
+    extends:
+      file: common-services.yml
+      service: webapp
+  ```
+
+See [this documentation page](https://docs.docker.com/compose/extends/) for more details about these techniques.
+
+
+---
+
+class: extra-details
+
+## Good to know ...
+
+- Compose file version 3 adds the `deploy` section
+
+- Compose file version 3.1 adds support for secrets
+
+- You can re-run `docker stack deploy` to update a stack
+
+- ...
But unsupported features will be wiped each time you redeploy (!) + + (This will likely be fixed/improved soon) + +- `extends` doesn't work with `docker stack deploy` + + (But you can use `docker-compose config` to "flatten" your configuration) + +--- + +## Summary + +- We've seen how to set up a Swarm + +- We've used it to host our own registry + +- We've built our app container images + +- We've used the registry to host those images + +- We've deployed and scaled our application + +- We've seen how to use Compose to streamline deployments + +- Awesome job, team! diff --git a/docs/prereqs.md b/docs/prereqs.md new file mode 100644 index 00000000..fcadbefb --- /dev/null +++ b/docs/prereqs.md @@ -0,0 +1,255 @@ +# Pre-requirements + +- Computer with internet connection and a web browser + +- For instructor-led workshops: an SSH client to connect to remote machines + + - on Linux, OS X, FreeBSD... you are probably all set + + - on Windows, get [putty](http://www.putty.org/), + Microsoft [Win32 OpenSSH](https://github.com/PowerShell/Win32-OpenSSH/wiki/Install-Win32-OpenSSH), + [Git BASH](https://git-for-windows.github.io/), or + [MobaXterm](http://mobaxterm.mobatek.net/) + +- For self-paced learning: SSH is not necessary if you use + [Play-With-Docker](http://www.play-with-docker.com/) + +- Some Docker knowledge + + (but that's OK if you're not a Docker expert!) + +--- + +class: in-person, extra-details + +## Nice-to-haves + +- [Mosh](https://mosh.org/) instead of SSH, if your internet connection tends to lose packets +
(available with `(apt|yum|brew) install mosh`; then connect with `mosh user@host`) + +- [GitHub](https://github.com/join) account +
(if you want to fork the repo; also used to join Gitter) + +- [Gitter](https://gitter.im/) account +
(to join the conversation during the workshop) + +- [Slack](https://community.docker.com/registrations/groups/4316) account +
(to join the conversation after the workshop) + +- [Docker Hub](https://hub.docker.com) account +
(it's one way to distribute images on your Swarm cluster) + +--- + +class: extra-details + +## Extra details + +- This slide should have a little magnifying glass in the top right corner + + (If it doesn't, it's because CSS is hard — Jérôme is only a backend person, alas) + +- Slides with that magnifying glass indicate slides providing extra details + +- Feel free to skip them if you're in a hurry! + +--- + +## Hands-on sections + +- The whole workshop is hands-on + +- We will see Docker in action + +- You are invited to reproduce all the demos + +- All hands-on sections are clearly identified, like the gray rectangle below + +.exercise[ + +- This is the stuff you're supposed to do! +- Go to [container.training](http://container.training/) to view these slides +- Join the [chat room](chat) + +] + +--- + +class: in-person + +# VM environment + +- To follow along, you need a cluster of five Docker Engines + +- If you are doing this with an instructor, see next slide + +- If you are doing (or re-doing) this on your own, you can: + + - create your own cluster (local or cloud VMs) with Docker Machine + ([instructions](https://github.com/jpetazzo/orchestration-workshop/tree/master/prepare-machine)) + + - use [Play-With-Docker](http://play-with-docker.com) ([instructions](https://github.com/jpetazzo/orchestration-workshop#using-play-with-docker)) + + - create a bunch of clusters for you and your friends + ([instructions](https://github.com/jpetazzo/orchestration-workshop/tree/master/prepare-vms)) + +--- + +class: pic, in-person + +![You get five VMs](you-get-five-vms.jpg) + +--- + +class: in-person + +## You get five VMs + +- Each person gets 5 private VMs (not shared with anybody else) +- They'll remain up until the day after the tutorial +- You should have a little card with login+password+IP addresses +- You can automatically SSH from one VM to another + +.exercise[ + + + +- Log into the first VM (`node1`) with SSH or MOSH +- Check that you can SSH (without password) 
to `node2`: + ```bash + ssh node2 + ``` +- Type `exit` or `^D` to come back to node1 + + + +] + +--- + +class: in-person + +## If doing or re-doing the workshop on your own ... + +--- + +class: self-paced + +## How to get your own Docker nodes? + +- Use [Play-With-Docker](http://www.play-with-docker.com/)! + +-- + +- Main differences: + + - you don't need to SSH to the machines +
(just click on the node that you want to control in the left tab bar) + + - Play-With-Docker automagically detects exposed ports +
(and displays them as little badges with port numbers, above the terminal) + + - You can access HTTP services by clicking on the port numbers + + - exposing TCP services requires something like + [ngrok](https://ngrok.com/) + or [supergrok](https://github.com/jpetazzo/orchestration-workshop#using-play-with-docker) + + + +--- + +class: self-paced + +## Using Play-With-Docker + +- Open a new browser tab to [www.play-with-docker.com](http://www.play-with-docker.com/) + +- Confirm that you're not a robot + +- Click on "ADD NEW INSTANCE": congratulations, you have your first Docker node! + +- When you will need more nodes, just click on "ADD NEW INSTANCE" again + +- Note the countdown in the corner; when it expires, your instances are destroyed + +- If you give your URL to somebody else, they can access your nodes too +
+  (You can use that for pair programming, or to get help from a mentor)
+
+- Loving it? Not loving it? Tell it to the wonderful authors,
+  [@marcosnils](https://twitter.com/marcosnils) &
+  [@xetorthio](https://twitter.com/xetorthio)!
+
+---
+
+## We will (mostly) interact with node1 only
+
+- Unless instructed, **all commands must be run from the first VM, `node1`**
+
+- We will only check out/copy the code on `node1`
+
+- When we use the other nodes, we will do it mostly through the Docker API
+
+- We will log into other nodes only for initial setup and a few "out of band" operations
+
(checking internal logs, debugging...) + +--- + +## Terminals + +Once in a while, the instructions will say: +
"Open a new terminal." + +There are multiple ways to do this: + +- create a new window or tab on your machine, and SSH into the VM; + +- use screen or tmux on the VM and open a new window from there. + +You are welcome to use the method that you feel the most comfortable with. + +--- + +## Tmux cheatsheet + +- Ctrl-b c → creates a new window +- Ctrl-b n → go to next window +- Ctrl-b p → go to previous window +- Ctrl-b " → split window top/bottom +- Ctrl-b % → split window left/right +- Ctrl-b Alt-1 → rearrange windows in columns +- Ctrl-b Alt-2 → rearrange windows in rows +- Ctrl-b arrows → navigate to other windows +- Ctrl-b d → detach session +- tmux attach → reattach to session diff --git a/docs/sampleapp.md b/docs/sampleapp.md new file mode 100644 index 00000000..36c931c2 --- /dev/null +++ b/docs/sampleapp.md @@ -0,0 +1,468 @@ +# Our sample application + +- Visit the GitHub repository with all the materials of this workshop: +
https://github.com/jpetazzo/orchestration-workshop + +- The application is in the [dockercoins]( + https://github.com/jpetazzo/orchestration-workshop/tree/master/dockercoins) + subdirectory + +- Let's look at the general layout of the source code: + + there is a Compose file [docker-compose.yml]( + https://github.com/jpetazzo/orchestration-workshop/blob/master/dockercoins/docker-compose.yml) ... + + ... and 4 other services, each in its own directory: + + - `rng` = web service generating random bytes + - `hasher` = web service computing hash of POSTed data + - `worker` = background process using `rng` and `hasher` + - `webui` = web interface to watch progress + +--- + +class: extra-details + +## Compose file format version + +*Particularly relevant if you have used Compose before...* + +- Compose 1.6 introduced support for a new Compose file format (aka "v2") + +- Services are no longer at the top level, but under a `services` section + +- There has to be a `version` key at the top level, with value `"2"` (as a string, not an integer) + +- Containers are placed on a dedicated network, making links unnecessary + +- There are other minor differences, but upgrade is easy and straightforward + +--- + +## Links, naming, and service discovery + +- Containers can have network aliases (resolvable through DNS) + +- Compose file version 2+ makes each container reachable through its service name + +- Compose file version 1 did require "links" sections + +- Our code can connect to services using their short name + + (instead of e.g. IP address or FQDN) + +- Network aliases are automatically namespaced + + (i.e. you can have multiple apps declaring and using a service named `database`) + +--- + +## Example in `worker/worker.py` + +![Service discovery](service-discovery.png) + +--- + +## What's this application? 
+
+---
+
+class: pic
+
+![DockerCoins logo](dockercoins.png)
+
+(DockerCoins 2016 logo courtesy of [@XtlCnslt](https://twitter.com/xtlcnslt) and [@ndeloof](https://twitter.com/ndeloof). Thanks!)
+
+---
+
+## What's this application?
+
+- It is a DockerCoin miner! 💰🐳📦🚢
+
+--
+
+- No, you can't buy coffee with DockerCoins
+
+--
+
+- How DockerCoins works:
+
+  - `worker` asks `rng` to generate a few random bytes
+
+  - `worker` feeds these bytes into `hasher`
+
+  - and repeat forever!
+
+  - every second, `worker` updates `redis` to indicate how many loops were done
+
+  - `webui` queries `redis`, and computes and exposes "hashing speed" in your browser
+
+---
+
+## Getting the application source code
+
+- We will clone the GitHub repository
+
+- The repository also contains scripts and tools that we will use throughout the workshop
+
+.exercise[
+
+- Clone the repository on `node1`:
+  ```bash
+  git clone git://github.com/jpetazzo/orchestration-workshop
+  ```
+
+]
+
+(You can also fork the repository on GitHub and clone your fork if you prefer that.)
+
+---
+
+# Running the application
+
+Without further ado, let's start our application.
+
+.exercise[
+
+- Go to the `dockercoins` directory, in the cloned repo:
+  ```bash
+  cd ~/orchestration-workshop/dockercoins
+  ```
+
+- Use Compose to build and run all containers:
+  ```bash
+  docker-compose up
+  ```
+
+]
+
+Compose tells Docker to build all container images (pulling
+the corresponding base images), then starts all containers,
+and displays aggregated logs.
+
+---
+
+## Lots of logs
+
+- The application continuously generates logs
+
+- We can see the `worker` service making requests to `rng` and `hasher`
+
+- Let's put that in the background
+
+.exercise[
+
+- Stop the application by hitting `^C`
+
+]
+
+- `^C` stops all containers by sending them the `TERM` signal
+
+- Some containers exit immediately, others take longer
+
(because they don't handle `SIGTERM` and end up being killed after a 10s timeout) + +--- + +## Restarting in the background + +- Many flags and commands of Compose are modeled after those of `docker` + +.exercise[ + +- Start the app in the background with the `-d` option: + ```bash + docker-compose up -d + ``` + +- Check that our app is running with the `ps` command: + ```bash + docker-compose ps + ``` + +] + +`docker-compose ps` also shows the ports exposed by the application. + +--- + +class: extra-details + +## Viewing logs + +- The `docker-compose logs` command works like `docker logs` + +.exercise[ + +- View all logs since container creation and exit when done: + ```bash + docker-compose logs + ``` + +- Stream container logs, starting at the last 10 lines for each container: + ```bash + docker-compose logs --tail 10 --follow + ``` + + + +] + +Tip: use `^S` and `^Q` to pause/resume log output. + +--- + +class: extra-details + +## Upgrading from Compose 1.6 + +.warning[The `logs` command has changed between Compose 1.6 and 1.7!] + +- Up to 1.6 + + - `docker-compose logs` is the equivalent of `logs --follow` + + - `docker-compose logs` must be restarted if containers are added + +- Since 1.7 + + - `--follow` must be specified explicitly + + - new containers are automatically picked up by `docker-compose logs` + +--- + +## Connecting to the web UI + +- The `webui` container exposes a web dashboard; let's view it + +.exercise[ + +- With a web browser, connect to `node1` on port 8000 + +- Remember: the `nodeX` aliases are valid only on the nodes themselves + +- In your browser, you need to enter the IP address of your node + +] + +You should see a speed of approximately 4 hashes/second. + +More precisely: 4 hashes/second, with regular dips down to zero. +
This is because Jérôme is incapable of writing good frontend code. +
Don't ask. Seriously, don't ask. This is embarrassing. + +--- + +class: extra-details + +## Why does the speed seem irregular? + +- The app actually has a constant, steady speed: 3.33 hashes/second +
+ (which corresponds to 1 hash every 0.3 seconds, for *reasons*) + +- The worker doesn't update the counter after every loop, but up to once per second + +- The speed is computed by the browser, checking the counter about once per second + +- Between two consecutive updates, the counter will increase either by 4, or by 0 + +- The perceived speed will therefore be 4 - 4 - 4 - 0 - 4 - 4 - etc. + +*We told you to not ask!!!* + +--- + +## Scaling up the application + +- Our goal is to make that performance graph go up (without changing a line of code!) + +-- + +- Before trying to scale the application, we'll figure out if we need more resources + + (CPU, RAM...) + +- For that, we will use good old UNIX tools on our Docker node + +--- + +## Looking at resource usage + +- Let's look at CPU, memory, and I/O usage + +.exercise[ + +- run `top` to see CPU and memory usage (you should see idle cycles) + +- run `vmstat 3` to see I/O usage (si/so/bi/bo) +
(the 4 numbers should be almost zero, except `bo` for logging) + +] + +We have available resources. + +- Why? +- How can we use them? + +--- + +## Scaling workers on a single node + +- Docker Compose supports scaling +- Let's scale `worker` and see what happens! + +.exercise[ + +- Start one more `worker` container: + ```bash + docker-compose scale worker=2 + ``` + +- Look at the performance graph (it should show a x2 improvement) + +- Look at the aggregated logs of our containers (`worker_2` should show up) + +- Look at the impact on CPU load with e.g. top (it should be negligible) + +] + +--- + +## Adding more workers + +- Great, let's add more workers and call it a day, then! + +.exercise[ + +- Start eight more `worker` containers: + ```bash + docker-compose scale worker=10 + ``` + +- Look at the performance graph: does it show a x10 improvement? + +- Look at the aggregated logs of our containers + +- Look at the impact on CPU load and memory usage + + + +] + +--- + +# Identifying bottlenecks + +- You should have seen a 3x speed bump (not 10x) + +- Adding workers didn't result in linear improvement + +- *Something else* is slowing us down + +-- + +- ... But what? + +-- + +- The code doesn't have instrumentation + +- Let's use state-of-the-art HTTP performance analysis! +
(i.e. good old tools like `ab`, `httping`...) + +--- + +## Accessing internal services + +- `rng` and `hasher` are exposed on ports 8001 and 8002 + +- This is declared in the Compose file: + + ```yaml + ... + rng: + build: rng + ports: + - "8001:80" + + hasher: + build: hasher + ports: + - "8002:80" + ... + ``` + +--- + +## Measuring latency under load + +We will use `httping`. + +.exercise[ + +- Check the latency of `rng`: + ```bash + httping -c 10 localhost:8001 + ``` + +- Check the latency of `hasher`: + ```bash + httping -c 10 localhost:8002 + ``` + +] + +`rng` has a much higher latency than `hasher`. + +--- + +## Let's draw hasty conclusions + +- The bottleneck seems to be `rng` + +- *What if* we don't have enough entropy and can't generate enough random numbers? + +- We need to scale out the `rng` service on multiple machines! + +Note: this is a fiction! We have enough entropy. But we need a pretext to scale out. + +(In fact, the code of `rng` uses `/dev/urandom`, which never runs out of entropy... +
+...and is [just as good as `/dev/random`](http://www.slideshare.net/PacSecJP/filippo-plain-simple-reality-of-entropy).) + +--- + +## Clean up + +- Before moving on, let's remove those containers + +.exercise[ + +- Tell Compose to remove everything: + ```bash + docker-compose down + ``` + +] diff --git a/docs/secrets.md b/docs/secrets.md new file mode 100644 index 00000000..5260aaed --- /dev/null +++ b/docs/secrets.md @@ -0,0 +1,193 @@ +class: secrets + +## Secret management + +- Docker has a "secret safe" (secure key→value store) + +- You can create as many secrets as you like + +- You can associate secrets to services + +- Secrets are exposed as plain text files, but kept in memory only (using `tmpfs`) + +- Secrets are immutable (at least in Engine 1.13) + +- Secrets have a max size of 500 KB + +--- + +class: secrets + +## Creating secrets + +- Must specify a name for the secret; and the secret itself + +.exercise[ + +- Assign [one of the four most commonly used passwords](https://www.youtube.com/watch?v=0Jx8Eay5fWQ) to a secret called `hackme`: + ```bash + echo love | docker secret create hackme - + ``` + +] + +If the secret is in a file, you can simply pass the path to the file. + +(The special path `-` indicates to read from the standard input.) + +--- + +class: secrets + +## Creating better secrets + +- Picking lousy passwords always leads to security breaches + +.exercise[ + +- Let's craft a better password, and assign it to another secret: + ```bash + base64 /dev/urandom | head -c16 | docker secret create arewesecureyet - + ``` + +] + +Note: in the latter case, we don't even know the secret at this point. But Swarm does. 
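A program can read the same kernel CSPRNG directly; here is a minimal Python sketch (illustrative only, not part of the workshop tooling) that derives a short password in the spirit of `base64 /dev/urandom | head -c16`:

```python
import base64
import os

def random_password(length=16):
    """Base64-encode bytes from the kernel CSPRNG (the same source
    as /dev/urandom) and keep the first `length` characters."""
    return base64.b64encode(os.urandom(length)).decode()[:length]

print(random_password())  # a fresh 16-character password on every call
```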
+ +--- + +class: secrets + +## Using secrets + +- Secrets must be handed explicitly to services + +.exercise[ + +- Create a dummy service with both secrets: + ```bash + docker service create \ + --secret hackme --secret arewesecureyet \ + --name dummyservice --mode global \ + alpine sleep 1000000000 + ``` + +] + +We use a global service to make sure that there will be an instance on the local node. + +--- + +class: secrets + +## Accessing secrets + +- Secrets are materialized on `/run/secrets` (which is an in-memory filesystem) + +.exercise[ + +- Find the ID of the container for the dummy service: + ```bash + CID=$(docker ps -q --filter label=com.docker.swarm.service.name=dummyservice) + ``` + +- Enter the container: + ```bash + docker exec -ti $CID sh + ``` + +- Check the files in `/run/secrets` + +] + +--- + +class: secrets + +## Rotating secrets + +- You can't change a secret + + (Sounds annoying at first; but allows clean rollbacks if a secret update goes wrong) + +- You can add a secret to a service with `docker service update --secret-add` + + (This will redeploy the service; it won't add the secret on the fly) + +- You can remove a secret with `docker service update --secret-rm` + +- Secrets can be mapped to different names by expressing them with a micro-format: + ```bash + docker service create --secret source=secretname,target=filename + ``` + +--- + +class: secrets + +## Changing our insecure password + +- We want to replace our `hackme` secret with a better one + +.exercise[ + +- Remove the insecure `hackme` secret: + ```bash + docker service update dummyservice --secret-rm hackme + ``` + +- Add our better secret instead: + ```bash + docker service update dummyservice \ + --secret-add source=arewesecureyet,target=hackme + ``` + +] + +Wait for the service to be fully updated with e.g. `watch docker service ps dummyservice`. + +--- + +class: secrets + +## Checking that our password is now stronger + +- We will use the power of `docker exec`! 
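Inside a container, an application consumes these secrets simply by reading files. A Python sketch of what that looks like (illustrative only; the directory argument exists so the snippet can also run outside a container):

```python
import os

# Swarm materializes secrets as plain files under /run/secrets (tmpfs).
DEFAULT_SECRETS_DIR = "/run/secrets"

def load_secrets(directory=DEFAULT_SECRETS_DIR):
    """Return {secret_name: secret_value} for every file in `directory`."""
    secrets = {}
    if os.path.isdir(directory):
        for name in sorted(os.listdir(directory)):
            with open(os.path.join(directory, name)) as f:
                secrets[name] = f.read().strip()
    return secrets

print(load_secrets())  # {} when /run/secrets does not exist
```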
+ +.exercise[ + +- Get the ID of the new container: + ```bash + CID=$(docker ps -q --filter label=com.docker.swarm.service.name=dummyservice) + ``` + +- Check the contents of the secret files: + ```bash + docker exec $CID grep -r . /run/secrets + ``` + +] + +--- + +class: secrets + +## Secrets in practice + +- Can be (ab)used to hold whole configuration files if needed + +- If you intend to rotate secret `foo`, call it `foo.N` instead, and map it to `foo` + + (N can be a serial, a timestamp...) + + ```bash + docker service create --secret source=foo.N,target=foo ... + ``` + +- You can update (remove+add) a secret in a single command: + + ```bash + docker service update ... --secret-rm foo.M --secret-add source=foo.N,target=foo + ``` + +- For more details and examples, [check the documentation](https://docs.docker.com/engine/swarm/secrets/) diff --git a/docs/security.md b/docs/security.md new file mode 100644 index 00000000..49833a11 --- /dev/null +++ b/docs/security.md @@ -0,0 +1,16 @@ +# Secrets management and encryption at rest + +(New in Docker Engine 1.13) + +- Secrets management = selectively and securely bring secrets to services + +- Encryption at rest = protect against storage theft or prying + +- Remember: + + - control plane is authenticated through mutual TLS, certs rotated every 90 days + + - control plane is encrypted with AES-GCM, keys rotated every 12 hours + + - data plane is not encrypted by default (for performance reasons), +
but we saw earlier how to enable that with a single flag diff --git a/docs/stateful.md b/docs/stateful.md new file mode 100644 index 00000000..f8de8570 --- /dev/null +++ b/docs/stateful.md @@ -0,0 +1,344 @@ +# Dealing with stateful services + +- First of all, you need to make sure that the data files are on a *volume* + +- Volumes are host directories that are mounted to the container's filesystem + +- These host directories can be backed by the ordinary, plain host filesystem ... + +- ... Or by distributed/networked filesystems + +- In the latter scenario, in case of node failure, the data is safe elsewhere ... + +- ... And the container can be restarted on another node without data loss + +--- + +## Building a stateful service experiment + +- We will use Redis for this example + +- We will expose it on port 10000 to access it easily + +.exercise[ + +- Start the Redis service: + ```bash + docker service create --name stateful -p 10000:6379 redis + ``` + +- Check that we can connect to it: + ```bash + docker run --net host --rm redis redis-cli -p 10000 info server + ``` + +] + +--- + +## Accessing our Redis service easily + +- Typing that whole command is going to be tedious + +.exercise[ + +- Define a shell alias to make our lives easier: + ```bash + alias redis='docker run --net host --rm redis redis-cli -p 10000' + ``` + +- Try it: + ```bash + redis info server + ``` + +] + +--- + +## Basic Redis commands + +.exercise[ + +- Check that the `foo` key doesn't exist: + ```bash + redis get foo + ``` + +- Set it to `bar`: + ```bash + redis set foo bar + ``` + +- Check that it exists now: + ```bash + redis get foo + ``` + +] + +--- + +## Local volumes vs. global volumes + +- Global volumes exist in a single namespace + +- A global volume can be mounted on any node +
.small[(bar some restrictions specific to the volume driver in use; e.g. using an EBS-backed volume on a GCE/EC2 mixed cluster)]

- Attaching a global volume to a container allows us to start the container anywhere
(and retain its data wherever you start it!) + +- Global volumes require extra *plugins* (Flocker, Portworx...) + +- Docker doesn't come with a default global volume driver at this point + +- Therefore, we will fall back on *local volumes* + +--- + +## Local volumes + +- We will use the default volume driver, `local` + +- As the name implies, the `local` volume driver manages *local* volumes + +- Since local volumes are (duh!) *local*, we need to pin our container to a specific host + +- We will do that with a *constraint* + +.exercise[ + +- Add a placement constraint to our service: + ```bash + docker service update stateful --constraint-add node.hostname==$HOSTNAME + ``` + +] + +--- + +## Where is our data? + +- If we look for our `foo` key, it's gone! + +.exercise[ + +- Check the `foo` key: + ```bash + redis get foo + ``` + +- Adding a constraint caused the service to be redeployed: + ```bash + docker service ps stateful + ``` + +] + +Note: even if the constraint ends up being a no-op (i.e. not +moving the service), the service gets redeployed. +This ensures consistent behavior. + +--- + +## Setting the key again + +- Since our database was wiped out, let's populate it again + +.exercise[ + +- Set `foo` again: + ```bash + redis set foo bar + ``` + +- Check that it's there: + ```bash + redis get foo + ``` + +] + +--- + +## Service updates cause containers to be replaced + +- Let's try to make a trivial update to the service and see what happens + +.exercise[ + +- Set a memory limit to our Redis service: + ```bash + docker service update stateful --limit-memory 100M + ``` + +- Try to get the `foo` key one more time: + ```bash + redis get foo + ``` + +] + +The key is blank again! + +--- + +## Service volumes are ephemeral by default + +- Let's highlight what's going on with volumes! 
+ +.exercise[ + +- Check the current list of volumes: + ```bash + docker volume ls + ``` + +- Carry a minor update to our Redis service: + ```bash + docker service update stateful --limit-memory 200M + ``` + +] + +Again: all changes trigger the creation of a new task, and therefore a replacement of the existing container; +even when it is not strictly technically necessary. + +--- + +## The data is gone again + +- What happened to our data? + +.exercise[ + +- The list of volumes is slightly different: + ```bash + docker volume ls + ``` + +] + +(You should see one extra volume.) + +--- + +## Assigning a persistent volume to the container + +- Let's add an explicit volume mount to our service, referencing a named volume + +.exercise[ + +- Update the service with a volume mount: + ```bash + docker service update stateful \ + --mount-add type=volume,source=foobarstore,target=/data + ``` + +- Check the new volume list: + ```bash + docker volume ls + ``` + +] + +Note: the `local` volume driver automatically creates volumes. + +--- + +## Checking that persistence actually works across service updates + +.exercise[ + +- Store something in the `foo` key: + ```bash + redis set foo barbar + ``` + +- Update the service with yet another trivial change: + ```bash + docker service update stateful --limit-memory 300M + ``` + +- Check that `foo` is still set: + ```bash + redis get foo + ``` + +] + +--- + +## Recap + +- The service must commit its state to disk when being shutdown.red[*] + + (Shutdown = being sent a `TERM` signal) + +- The state must be written on files located on a volume + +- That volume must be specified to be persistent + +- If using a local volume, the service must also be pinned to a specific node + + (And losing that node means losing the data, unless there are other backups) + +.footnote[
+.red[*]If you customize Redis configuration, make sure you +persist data correctly! +
It's easy to make that mistake — __Trust me!__]

---

## Cleaning up

.exercise[

- Remove the stateful service:
  ```bash
  docker service rm stateful
  ```

- Remove the associated volume:
  ```bash
  docker volume rm foobarstore
  ```

]

Note: we could keep the volume around if we wanted.

---

## Should I run stateful services in containers?

--

Depending on whom you ask, they'll tell you:

--

- certainly not, heathen!

--

- we've been running a few thousand PostgreSQL instances in containers ...
for a few years now ... in production ... is that bad? + +-- + +- what's a container? + +-- + +Perhaps a better question would be: + +*"Should I run stateful services?"* + +-- + +- is it critical for my business? +- is it my value-add? +- or should I find somebody else to run them for me? diff --git a/docs/swarmkit.md b/docs/swarmkit.md new file mode 100644 index 00000000..d24a2d48 --- /dev/null +++ b/docs/swarmkit.md @@ -0,0 +1,997 @@ +# SwarmKit + +- [SwarmKit](https://github.com/docker/swarmkit) is an open source + toolkit to build multi-node systems + +- It is a reusable library, like libcontainer, libnetwork, vpnkit ... + +- It is a plumbing part of the Docker ecosystem + +--- + +## SwarmKit features + +- Highly-available, distributed store based on [Raft]( + https://en.wikipedia.org/wiki/Raft_%28computer_science%29) +
(avoids depending on an external store: easier to deploy; higher performance) + +- Dynamic reconfiguration of Raft without interrupting cluster operations + +- *Services* managed with a *declarative API* +
(implementing *desired state* and *reconciliation loop*) + +- Integration with overlay networks and load balancing + +- Strong emphasis on security: + + - automatic TLS keying and signing; automatic cert rotation + - full encryption of the data plane; automatic key rotation + - least privilege architecture (single-node compromise ≠ cluster compromise) + - on-disk encryption with optional passphrase + +--- + +class: extra-details + +## Where is the key/value store? + +- Many orchestration systems use a key/value store backed by a consensus algorithm +
+ (k8s→etcd→Raft, mesos→zookeeper→ZAB, etc.) + +- SwarmKit implements the Raft algorithm directly +

  (Nomad is similar; thanks [@cbednarski](https://twitter.com/cbednarski),
  [@diptanu](https://twitter.com/diptanu) and others for pointing it out!)

- Analogy courtesy of [@aluzzardi](https://twitter.com/aluzzardi):

  *It's like B-Trees and RDBMS. They are different layers, often
  associated. But you don't need to bring up a full SQL server when
  all you need is to index some data.*

- As a result, the orchestrator has direct access to the data
+ (the main copy of the data is stored in the orchestrator's memory) + +- Simpler, easier to deploy and operate; also faster + +--- + +## SwarmKit concepts (1/2) + +- A *cluster* will be at least one *node* (preferably more) + +- A *node* can be a *manager* or a *worker* + +- A *manager* actively takes part in the Raft consensus, and keeps the Raft log + +- You can talk to a *manager* using the SwarmKit API + +- One *manager* is elected as the *leader*; other managers merely forward requests to it + +- The *workers* get their instructions from the *managers* + +- Both *workers* and *managers* can run containers + +--- + +## Illustration + +![Illustration](swarm-mode.svg) + +--- + +## SwarmKit concepts (2/2) + +- The *managers* expose the SwarmKit API + +- Using the API, you can indicate that you want to run a *service* + +- A *service* is specified by its *desired state*: which image, how many instances... + +- The *leader* uses different subsystems to break down services into *tasks*: +
orchestrator, scheduler, allocator, dispatcher + +- A *task* corresponds to a specific container, assigned to a specific *node* + +- *Nodes* know which *tasks* should be running, and will start or stop containers accordingly (through the Docker Engine API) + +You can refer to the [NOMENCLATURE](https://github.com/docker/swarmkit/blob/master/design/nomenclature.md) in the SwarmKit repo for more details. + +--- + +## Swarm Mode + +- Since version 1.12, Docker Engine embeds SwarmKit + +- All the SwarmKit features are "asleep" until you enable "Swarm Mode" + +- Examples of Swarm Mode commands: + + - `docker swarm` (enable Swarm mode; join a Swarm; adjust cluster parameters) + + - `docker node` (view nodes; promote/demote managers; manage nodes) + + - `docker service` (create and manage services) + +??? + +- The Docker API exposes the same concepts + +- The SwarmKit API is also exposed (on a separate socket) + +--- + +## You need to enable Swarm mode to use the new stuff + +- By default, all this new code is inactive + +- Swarm Mode can be enabled, "unlocking" SwarmKit functions +
(services, out-of-the-box overlay networks, etc.) + +.exercise[ + +- Try a Swarm-specific command: + ```bash + docker node ls + ``` + +] + +-- + +You will get an error message: +``` +Error response from daemon: This node is not a swarm manager. [...] +``` + +--- + +# Creating our first Swarm + +- The cluster is initialized with `docker swarm init` + +- This should be executed on a first, seed node + +- .warning[DO NOT execute `docker swarm init` on multiple nodes!] + + You would have multiple disjoint clusters. + +.exercise[ + +- Create our cluster from node1: + ```bash + docker swarm init + ``` + +] + +-- + +class: advertise-addr + +If Docker tells you that it `could not choose an IP address to advertise`, see next slide! + +--- + +class: advertise-addr + +## IP address to advertise + +- When running in Swarm mode, each node *advertises* its address to the others +
+ (i.e. it tells them *"you can contact me on 10.1.2.3:2377"*) + +- If the node has only one IP address (other than 127.0.0.1), it is used automatically + +- If the node has multiple IP addresses, you **must** specify which one to use +
+ (Docker refuses to pick one randomly) + +- You can specify an IP address or an interface name +
(in the latter case, Docker will read the IP address of the interface and use it) + +- You can also specify a port number +
(otherwise, the default port 2377 will be used) + +--- + +class: advertise-addr + +## Which IP address should be advertised? + +- If your nodes have only one IP address, it's safe to let autodetection do the job + + .small[(Except if your instances have different private and public addresses, e.g. + on EC2, and you are building a Swarm involving nodes inside and outside the + private network: then you should advertise the public address.)] + +- If your nodes have multiple IP addresses, pick an address which is reachable + *by every other node* of the Swarm + +- If you are using [play-with-docker](http://play-with-docker.com/), use the IP + address shown next to the node name + + .small[(This is the address of your node on your private internal overlay network. + The other address that you might see is the address of your node on the + `docker_gwbridge` network, which is used for outbound traffic.)] + +Examples: + +```bash +docker swarm init --advertise-addr 10.0.9.2 +docker swarm init --advertise-addr eth0:7777 +``` + +--- + +class: extra-details + +## Using a separate interface for the data path + +- You can use different interfaces (or IP addresses) for control and data + +- You set the _control plane path_ with `--advertise-addr` + + (This will be used for SwarmKit manager/worker communication, leader election, etc.) + +- You set the _data plane path_ with `--data-path-addr` + + (This will be used for traffic between containers) + +- Both flags can accept either an IP address, or an interface name + + (When specifying an interface name, Docker will use its first IP address) + +--- + +## Token generation + +- In the output of `docker swarm init`, we have a message + confirming that our node is now the (single) manager: + + ``` + Swarm initialized: current node (8jud...) is now a manager. 
+ ``` + +- Docker generated two security tokens (like passphrases or passwords) for our cluster + +- The CLI shows us the command to use on other nodes to add them to the cluster using the "worker" + security token: + + ``` + To add a worker to this swarm, run the following command: + docker swarm join \ + --token SWMTKN-1-59fl4ak4nqjmao1ofttrc4eprhrola2l87... \ + 172.31.4.182:2377 + ``` + +--- + +class: extra-details + +## Checking that Swarm mode is enabled + +.exercise[ + +- Run the traditional `docker info` command: + ```bash + docker info + ``` + +] + +The output should include: + +``` +Swarm: active + NodeID: 8jud7o8dax3zxbags3f8yox4b + Is Manager: true + ClusterID: 2vcw2oa9rjps3a24m91xhvv0c + ... +``` + +--- + +## Running our first Swarm mode command + +- Let's retry the exact same command as earlier + +.exercise[ + +- List the nodes (well, the only node) of our cluster: + ```bash + docker node ls + ``` + +] + +The output should look like the following: +``` +ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS +8jud...ox4b * node1 Ready Active Leader +``` + +--- + +## Adding nodes to the Swarm + +- A cluster with one node is not a lot of fun + +- Let's add `node2`! + +- We need the token that was shown earlier + +-- + +- You wrote it down, right? + +-- + +- Don't panic, we can easily see it again 😏 + +--- + +## Adding nodes to the Swarm + +.exercise[ + +- Show the token again: + ```bash + docker swarm join-token worker + ``` + +- Switch to `node2` + +- Copy-paste the `docker swarm join ...` command +
(that was displayed just before) + +] + +--- + +class: extra-details + +## Check that the node was added correctly + +- Stay on `node2` for now! + +.exercise[ + +- We can still use `docker info` to verify that the node is part of the Swarm: + ```bash + docker info | grep ^Swarm + ``` + +] + +- However, Swarm commands will not work; try, for instance: + ``` + docker node ls + ``` + +- This is because the node that we added is currently a *worker* + +- Only *managers* can accept Swarm-specific commands + +--- + +## View our two-node cluster + +- Let's go back to `node1` and see what our cluster looks like + +.exercise[ + +- Switch back to `node1` + +- View the cluster from `node1`, which is a manager: + ```bash + docker node ls + ``` + +] + +The output should be similar to the following: +``` +ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS +8jud...ox4b * node1 Ready Active Leader +ehb0...4fvx node2 Ready Active +``` + +--- + +class: docker-machine + +## Adding nodes using the Docker API + +- We don't have to SSH into the other nodes, we can use the Docker API + +- If you are using Play-With-Docker: + + - the nodes expose the Docker API over port 2375/tcp, without authentication + + - we will connect by setting the `DOCKER_HOST` environment variable + +- Otherwise: + + - the nodes expose the Docker API over port 2376/tcp, with TLS mutual authentication + + - we will use Docker Machine to set the correct environment variables +
(the nodes have been suitably pre-configured to be controlled through `node1`) + +--- + +class: docker-machine + +# Docker Machine + +- Docker Machine has two primary uses: + + - provisioning cloud instances running the Docker Engine + + - managing local Docker VMs within e.g. VirtualBox + +- Docker Machine is purely optional + +- It makes it easy to create, upgrade, manage... Docker hosts: + + - on your favorite cloud provider + + - locally (e.g. to test clustering, or different versions) + + - across different cloud providers + +--- + +class: self-paced, docker-machine + +## If you're using Play-With-Docker ... + +- You won't need to use Docker Machine + +- Instead, to "talk" to another node, we'll just set `DOCKER_HOST` + +- You can skip the exercises telling you to do things with Docker Machine! + +--- + +class: docker-machine + +## Docker Machine basic usage + +- We will learn two commands: + + - `docker-machine ls` (list existing hosts) + + - `docker-machine env` (switch to a specific host) + +.exercise[ + +- List configured hosts: + ```bash + docker-machine ls + ``` + +] + +You should see your 5 nodes. + +--- + +class: in-person, docker-machine + +## How did we make our 5 nodes show up there? 
+ +*For the curious...* + +- This was done by our VM provisioning scripts + +- After setting up everything else, `node1` adds the 5 nodes + to the local Docker Machine configuration + (located in `$HOME/.docker/machine`) + +- Nodes are added using [Docker Machine generic driver](https://docs.docker.com/machine/drivers/generic/) + + (It skips machine provisioning and jumps straight to the configuration phase) + +- Docker Machine creates TLS certificates and deploys them to the nodes through SSH + +--- + +class: docker-machine + +## Using Docker Machine to communicate with a node + +- To select a node, use `eval $(docker-machine env nodeX)` + +- This sets a number of environment variables + +- To unset these variables, use `eval $(docker-machine env -u)` + +.exercise[ + +- View the variables used by Docker Machine: + ```bash + docker-machine env node3 + ``` + +] + +(This shows which variables *would* be set by Docker Machine; but it doesn't change them.) + +--- + +class: docker-machine + +## Getting the token + +- First, let's store the join token in a variable + +- This must be done from a manager + +.exercise[ + +- Make sure we talk to the local node, or `node1`: + ```bash + eval $(docker-machine env -u) + ``` + +- Get the join token: + ```bash + TOKEN=$(docker swarm join-token -q worker) + ``` + +] + +--- + +class: docker-machine + +## Change the node targeted by the Docker CLI + +- We need to set the right environment variables to communicate with `node3` + +.exercise[ + +- If you're using Play-With-Docker: + ```bash + export DOCKER_HOST=tcp://node3:2375 + ``` + +- Otherwise, use Docker Machine: + ```bash + eval $(docker-machine env node3) + ``` + +] + +--- + +class: docker-machine + +## Checking which node we're talking to + +- Let's use the Docker API to ask "who are you?" 
to the remote node + +.exercise[ + +- Extract the node name from the output of `docker info`: + ```bash + docker info | grep ^Name + ``` + +] + +This should tell us that we are talking to `node3`. + +Note: it can be useful to use a [custom shell prompt]( +https://github.com/jpetazzo/orchestration-workshop/blob/master/prepare-vms/scripts/postprep.rc#L68) +reflecting the `DOCKER_HOST` variable. + +--- + +class: docker-machine + +## Adding a node through the Docker API + +- We are going to use the same `docker swarm join` command as before + +.exercise[ + +- Add `node3` to the Swarm: + ```bash + docker swarm join --token $TOKEN node1:2377 + ``` + +] + +--- + +class: docker-machine + +## Going back to the local node + +- We need to revert the environment variable(s) that we had set previously + +.exercise[ + +- If you're using Play-With-Docker, just clear `DOCKER_HOST`: + ```bash + unset DOCKER_HOST + ``` + +- Otherwise, use Docker Machine to reset all the relevant variables: + ```bash + eval $(docker-machine env -u) + ``` + +] + +From that point, we are communicating with `node1` again. + +--- + +class: docker-machine + +## Checking the composition of our cluster + +- Now that we're talking to `node1` again, we can use management commands + +.exercise[ + +- Check that the node is here: + ```bash + docker node ls + ``` + +] + +--- + +class: under-the-hood + +## Under the hood: docker swarm init + +When we do `docker swarm init`: + +- a keypair is created for the root CA of our Swarm + +- a keypair is created for the first node + +- a certificate is issued for this node + +- the join tokens are created + +--- + +class: under-the-hood + +## Under the hood: join tokens + +There is one token to *join as a worker*, and another to *join as a manager*. 
+ +The join tokens have two parts: + +- a secret key (preventing unauthorized nodes from joining) + +- a fingerprint of the root CA certificate (preventing MITM attacks) + +If a token is compromised, it can be rotated instantly with: +``` +docker swarm join-token --rotate +``` + +--- + +class: under-the-hood + +## Under the hood: docker swarm join + +When a node joins the Swarm: + +- it is issued its own keypair, signed by the root CA + +- if the node is a manager: + + - it joins the Raft consensus + - it connects to the current leader + - it accepts connections from worker nodes + +- if the node is a worker: + + - it connects to one of the managers (leader or follower) + +--- + +class: under-the-hood + +## Under the hood: cluster communication + +- The *control plane* is encrypted with AES-GCM; keys are rotated every 12 hours + +- Authentication is done with mutual TLS; certificates are rotated every 90 days + + (`docker swarm update` allows to change this delay or to use an external CA) + +- The *data plane* (communication between containers) is not encrypted by default + + (but this can be activated on a by-network basis, using IPSEC, + leveraging hardware crypto if available) + +--- + +class: under-the-hood + +## Under the hood: I want to know more! 
+ +Revisit SwarmKit concepts: + +- Docker 1.12 Swarm Mode Deep Dive Part 1: Topology + ([video](https://www.youtube.com/watch?v=dooPhkXT9yI)) + +- Docker 1.12 Swarm Mode Deep Dive Part 2: Orchestration + ([video](https://www.youtube.com/watch?v=_F6PSP-qhdA)) + +Some presentations from the Docker Distributed Systems Summit in Berlin: + +- Heart of the SwarmKit: Topology Management + ([slides](https://speakerdeck.com/aluzzardi/heart-of-the-swarmkit-topology-management)) + +- Heart of the SwarmKit: Store, Topology & Object Model + ([slides](http://www.slideshare.net/Docker/heart-of-the-swarmkit-store-topology-object-model)) + ([video](https://www.youtube.com/watch?v=EmePhjGnCXY)) + +--- + +## Adding more manager nodes + +- Right now, we have only one manager (node1) + +- If we lose it, we lose quorum - and that's *very bad!* + +- Containers running on other nodes will be fine ... + +- But we won't be able to get or set anything related to the cluster + +- If the manager is permanently gone, we will have to do a manual repair! + +- Nobody wants to do that ... so let's make our cluster highly available + +--- + +class: self-paced + +## Adding more managers + +With Play-With-Docker: + +```bash +TOKEN=$(docker swarm join-token -q manager) +for N in $(seq 4 5); do + export DOCKER_HOST=tcp://node$N:2375 + docker swarm join --token $TOKEN node1:2377 +done +unset DOCKER_HOST +``` + +--- + +class: docker-machine + +## Adding more managers + +With Docker Machine: + +```bash +TOKEN=$(docker swarm join-token -q manager) +for N in $(seq 4 5); do + eval $(docker-machine env node$N) + docker swarm join --token $TOKEN node1:2377 +done +eval $(docker-machine env -u) +``` + +--- + +class: in-person + +## Building our full cluster + +- We could SSH to nodes 3, 4, 5; and copy-paste the command + +-- + +class: in-person + +- Or we could use the AWESOME POWER OF THE SHELL! 
+ +-- + +class: in-person + +![Mario Red Shell](mario-red-shell.png) + +-- + +class: in-person + +- No, not *that* shell + +--- + +class: in-person + +## Let's form like Swarm-tron + +- Let's get the token, and loop over the remaining nodes with SSH + +.exercise[ + +- Obtain the manager token: + ```bash + TOKEN=$(docker swarm join-token -q manager) + ``` + +- Loop over the 3 remaining nodes: + ```bash + for NODE in node3 node4 node5; do + ssh $NODE docker swarm join --token $TOKEN node1:2377 + done + ``` + +] + +[That was easy.](https://www.youtube.com/watch?v=3YmMNpbFjp0) + +--- + +## You can control the Swarm from any manager node + +.exercise[ + +- Try the following command on a few different nodes: + ```bash + docker node ls + ``` + +] + +On manager nodes: +
you will see the list of nodes, with a `*` denoting +the node you're talking to. + +On non-manager nodes: +
you will get an error message telling you that +the node is not a manager. + +As we saw earlier, you can only control the Swarm through a manager node. + +--- + +class: self-paced + +## Play-With-Docker node status icon + +- If you're using Play-With-Docker, you get node status icons + +- Node status icons are displayed left of the node name + + - No icon = no Swarm mode detected + - Solid blue icon = Swarm manager detected + - Blue outline icon = Swarm worker detected + +![Play-With-Docker icons](pwd-icons.png) + +--- + +## Dynamically changing the role of a node + +- We can change the role of a node on the fly: + + `docker node promote XXX` → make XXX a manager +
+  `docker node demote XXX` → make XXX a worker
+
+.exercise[
+
+- See the current list of nodes:
+  ```
+  docker node ls
+  ```
+
+- Promote any worker node XXX to be a manager:
+  ```
+  docker node promote XXX
+  ```
+
+]
+
+---
+
+## How many managers do we need?
+
+- 2N+1 nodes can (and will) tolerate N failures
+  
(you can have an even number of managers, but there is no point)
+
+--
+
+- 1 manager = no failure
+
+- 3 managers = 1 failure
+
+- 5 managers = 2 failures (or 1 failure during 1 maintenance)
+
+- 7 managers and more = now you might be overdoing it a little bit
+
+---
+
+## Why not have *all* nodes be managers?
+
+- Intuitively, it's harder to reach consensus in larger groups
+
+- With Raft, writes have to go to all nodes (and be acknowledged by a majority)
+
+- More nodes = more network traffic
+
+- Bigger network = more latency
+
+---
+
+## What would MacGyver do?
+
+- If some of your machines are more than 10ms away from each other,
+  
+  try to break them into multiple clusters
+  (keeping internal latency low)
+
+- Groups of up to 9 nodes: all of them are managers
+
+- Groups of 10 nodes and up: pick 5 "stable" nodes to be managers
+  
+  (Cloud pro-tip: use separate auto-scaling groups for managers and workers)
+
+- Groups of more than 100 nodes: watch your managers' CPU and RAM
+
+- Groups of more than 1000 nodes:
+
+  - if you can afford to have fast, stable managers, add more of them
+  - otherwise, break your nodes into multiple clusters
+
+---
+
+## What's the upper limit?
+
+- We don't know!
+
+- Internal testing at Docker Inc.: 1000-10000 nodes is fine
+
+  - deployed to a single cloud region
+
+  - one of the main takeaways was *"you're gonna need a bigger manager"*
+
+- Testing by the community: [4700 heterogeneous nodes all over the 'net](https://sematext.com/blog/2016/11/14/docker-swarm-lessons-from-swarm3k/)
+
+  - it just works
+
+  - more nodes require more CPU; more containers require more RAM
+
+  - scheduling of large jobs (70000 containers) is slow, though (working on it!)
+
+---
+
+## Real-life deployment methods
+
+- Running commands manually over SSH
+
+--
+
+  (lol jk)
+
+--
+
+- Using your favorite configuration management tool
+
+- [Docker for AWS](https://docs.docker.com/docker-for-aws/#quickstart)
+
+- [Docker for Azure](https://docs.docker.com/docker-for-azure/)
diff --git a/docs/swarmnbt.md b/docs/swarmnbt.md
new file mode 100644
index 00000000..9440a2c1
--- /dev/null
+++ b/docs/swarmnbt.md
@@ -0,0 +1,72 @@
+class: nbt, extra-details
+
+## Measuring network conditions on the whole cluster
+
+- Since we have built-in, cluster-wide discovery, it's relatively straightforward
+  to monitor the whole cluster automatically
+
+- [Alexandros Mavrogiannis](https://github.com/alexmavr) wrote
+  [Swarm NBT](https://github.com/alexmavr/swarm-nbt), a tool doing exactly that!
+
+.exercise[
+
+- Start Swarm NBT:
+  ```bash
+  docker run --rm -v inventory:/inventory \
+         -v /var/run/docker.sock:/var/run/docker.sock \
+         alexmavr/swarm-nbt start
+  ```
+
+]
+
+Note: in this mode, Swarm NBT connects to the Docker API socket,
+and issues additional API requests to start all the components it needs.
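Once it is running, the telemetry lands in Prometheus (next section). As a convenience, here is a tiny helper to build Prometheus query URLs; it is a sketch, not part of Swarm NBT — only the node names, the port 9090, and the metric name come from this workshop, and the `/api/v1/query` endpoint is Prometheus' standard HTTP API:

```shell
# Hypothetical helper (not part of Swarm NBT): build the URL to query
# a metric through the Prometheus HTTP API exposed on port 9090.
prom_query_url() {
  NODE="$1"
  METRIC="$2"
  echo "http://$NODE:9090/api/v1/query?query=$METRIC"
}

# The ICMP round-trip-time metric collected by Swarm NBT:
prom_query_url node1 icmp_rtt_gauge_seconds
# → http://node1:9090/api/v1/query?query=icmp_rtt_gauge_seconds
```

You can feed that URL to `curl` from any node once Swarm NBT's Prometheus is up.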
+
+---
+
+class: nbt, extra-details
+
+## Viewing network conditions with Prometheus
+
+- Swarm NBT relies on Prometheus to scrape and store data
+
+- We can directly consume the Prometheus endpoint to view telemetry data
+
+.exercise[
+
+- Point your browser to any Swarm node, on port 9090
+
+  (If you're using Play-With-Docker, click on the (9090) badge)
+
+- In the drop-down, select `icmp_rtt_gauge_seconds`
+
+- Click on "Graph"
+
+]
+
+You are now seeing ICMP latency across your cluster.
+
+---
+
+class: nbt, in-person, extra-details
+
+## Viewing network conditions with Grafana
+
+- If you are using a "real" cluster (not Play-With-Docker) you can use Grafana
+
+.exercise[
+
+- Start Grafana with `docker service create --name grafana -p 3000:3000 grafana/grafana`
+- Point your browser to Grafana, on port 3000 on any Swarm node
+- Login with username `admin` and password `admin`
+- Click on the top-left menu and browse to Data Sources
+- Create a Prometheus data source with any name
+- Point it to http://any-node-IP:9090
+- Set access to "direct" and leave credentials blank
+- Click on the top-left menu, highlight "Dashboards" and select the "Import" option
+- Copy-paste [this JSON payload](
+  https://raw.githubusercontent.com/alexmavr/swarm-nbt/master/grafana.json),
+  then use the Prometheus data source defined earlier
+- Poke around the dashboard that magically appeared!
+
+]
diff --git a/docs/swarmtools.md b/docs/swarmtools.md
new file mode 100644
index 00000000..ed747710
--- /dev/null
+++ b/docs/swarmtools.md
@@ -0,0 +1,204 @@
+class: swarmtools
+
+# SwarmKit debugging tools
+
+- The SwarmKit repository comes with debugging tools
+
+- They are *low level* tools; not for general use
+
+- We are going to see two of these tools:
+
+  - `swarmctl`, to communicate directly with the SwarmKit API
+
+  - `swarm-rafttool`, to inspect the content of the Raft log
+
+---
+
+class: swarmtools
+
+## Building the SwarmKit tools
+
+- We are going to install a Go compiler, then download SwarmKit source and build it
+
+.exercise[
+- Download, compile, and install SwarmKit with this one-liner:
+  ```bash
+  docker run -v /usr/local/bin:/go/bin golang \
+         go get -v github.com/docker/swarmkit/...
+  ```
+
+]
+
+Remove `-v` if you don't like verbose things.
+
+Shameless promo: for more Go and Docker love, check
+[this blog post](http://jpetazzo.github.io/2016/09/09/go-docker/)!
+
+Note: in the unfortunate event of the SwarmKit *master* branch being broken,
+the build might fail. In that case, just skip the Swarm tools section.
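A quick way to verify that the build actually produced the binaries, using a small helper sketch (the helper itself is illustrative; the two tool names are the ones installed by the command above):

```shell
# Small helper: verify that each command in the list is available in $PATH.
check_tools() {
  for TOOL in "$@"; do
    command -v "$TOOL" >/dev/null 2>&1 || { echo "$TOOL is missing"; return 1; }
  done
  echo "all tools present"
}

# If this prints "... is missing", the build probably failed (see note above):
check_tools swarmctl swarm-rafttool || echo "build incomplete"
```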
+ +--- + +class: swarmtools + +## Getting cluster-wide task information + +- The Docker API doesn't expose this directly (yet) + +- But the SwarmKit API does + +- We are going to query it with `swarmctl` + +- `swarmctl` is an example program showing how to + interact with the SwarmKit API + +--- + +class: swarmtools + +## Using `swarmctl` + +- The Docker Engine places the SwarmKit control socket in a special path + +- You need root privileges to access it + +.exercise[ + +- If you are using Play-With-Docker, set the following alias: + ```bash + alias swarmctl='/lib/ld-musl-x86_64.so.1 /usr/local/bin/swarmctl \ + --socket /var/run/docker/swarm/control.sock' + ``` + +- Otherwise, set the following alias: + ```bash + alias swarmctl='sudo swarmctl \ + --socket /var/run/docker/swarm/control.sock' + ``` + +] + +--- + +class: swarmtools + +## `swarmctl` in action + +- Let's review a few useful `swarmctl` commands + +.exercise[ + +- List cluster nodes (that's equivalent to `docker node ls`): + ```bash + swarmctl node ls + ``` + +- View all tasks across all services: + ```bash + swarmctl task ls + ``` + +] + +--- + +class: swarmtools + +## `swarmctl` notes + +- SwarmKit is vendored into the Docker Engine + +- If you want to use `swarmctl`, you need the exact version of + SwarmKit that was used in your Docker Engine + +- Otherwise, you might get some errors like: + + ``` + Error: grpc: failed to unmarshal the received message proto: wrong wireType = 0 + ``` + +- With Docker 1.12, the control socket was in `/var/lib/docker/swarm/control.sock` + +--- + +class: swarmtools + +## `swarm-rafttool` + +- SwarmKit stores all its important data in a distributed log using the Raft protocol + + (This log is also simply called the "Raft log") + +- You can decode that log with `swarm-rafttool` + +- This is a great tool to understand how SwarmKit works + +- It can also be used in forensics or troubleshooting + + (But consider it as a *very low level* tool!) 
+ +--- + +class: swarmtools + +## The powers of `swarm-rafttool` + +With `swarm-rafttool`, you can: + +- view the latest snapshot of the cluster state; + +- view the Raft log (i.e. changes to the cluster state); + +- view specific objects from the log or snapshot; + +- decrypt the Raft data (to analyze it with other tools). + +It *cannot* work on live files, so you must stop Docker or make a copy first. + +--- + +class: swarmtools + +## Using `swarm-rafttool` + +- First, let's make a copy of the current Swarm data + +.exercise[ + +- If you are using Play-With-Docker, the Docker data directory is `/graph`: + ```bash + cp -r /graph/swarm /swarmdata + ``` + +- Otherwise, it is in the default `/var/lib/docker`: + ```bash + sudo cp -r /var/lib/docker/swarm /swarmdata + ``` + +] + +--- + +class: swarmtools + +## Dumping the Raft log + +- We have to indicate the path holding the Swarm data + + (Otherwise `swarm-rafttool` will try to use the live data, and complain that it's locked!) + +.exercise[ + +- If you are using Play-With-Docker, you must use the musl linker: + ```bash + /lib/ld-musl-x86_64.so.1 /usr/local/bin/swarm-rafttool -d /swarmdata/ dump-wal + ``` + +- Otherwise, you don't need the musl linker but you need to get root: + ```bash + sudo swarm-rafttool -d /swarmdata/ dump-wal + ``` + +] + +Reminder: this is a very low-level tool, requiring a knowledge of SwarmKit's internals! diff --git a/docs/updatingservices.md b/docs/updatingservices.md new file mode 100644 index 00000000..2632994b --- /dev/null +++ b/docs/updatingservices.md @@ -0,0 +1,264 @@ +# Updating services + +- We want to make changes to the web UI + +- The process is as follows: + + - edit code + + - build new image + + - ship new image + + - run new image + +--- + +class: extra-details + +## But first... 
+
+- Restart the workers
+
+.exercise[
+
+- Just scale back to 10 replicas:
+  ```bash
+  docker service update dockercoins_worker --replicas 10
+  ```
+
+- Check that they're running:
+  ```bash
+  docker service ps dockercoins_worker
+  ```
+
+]
+
+---
+
+## Updating a single service the hard way
+
+- To update a single service, we could do the following:
+  ```bash
+  REGISTRY=localhost:5000 TAG=v0.2
+  IMAGE=$REGISTRY/dockercoins_webui:$TAG
+  docker build -t $IMAGE webui/
+  docker push $IMAGE
+  docker service update dockercoins_webui --image $IMAGE
+  ```
+
+- Make sure to tag your images properly: update the `TAG` at each iteration
+
+  (When you check which images are running, you want these tags to be uniquely identifiable)
+
+---
+
+## Updating services the easy way
+
+- With the Compose integration, all we have to do is:
+  ```bash
+  export TAG=v0.2
+  docker-compose -f composefile.yml build
+  docker-compose -f composefile.yml push
+  docker stack deploy -c composefile.yml nameofstack
+  ```
+
+--
+
+- That's exactly what we used earlier to deploy the app
+
+- We don't need to learn new commands!
+
+---
+
+## Updating the web UI
+
+- Let's make the numbers on the Y axis bigger!
+
+.exercise[
+
+- Edit the file `webui/files/index.html`
+
+- Locate the `font-size` CSS attribute and increase it (at least double it)
+
+- Save and exit
+
+- Build, ship, and run:
+  ```bash
+  export TAG=v0.2
+  docker-compose -f dockercoins.yml build
+  docker-compose -f dockercoins.yml push
+  docker stack deploy -c dockercoins.yml dockercoins
+  ```
+
+]
+
+---
+
+## Viewing our changes
+
+- Wait at least 10 seconds (for the new version to be deployed)
+
+- Then reload the web UI
+
+- Or just mash "reload" frantically
+
+- ... Eventually the legend on the left will be bigger!
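Instead of bumping `TAG` by hand at each iteration, you can derive it from something unique. This is a sketch, not part of the workshop tooling; the timestamp format is arbitrary, and a git commit hash works just as well:

```shell
# Generate a unique, sortable tag for each build, e.g. v20170901-123456.
# (A commit hash, e.g. "v$(git rev-parse --short HEAD)", is another option.)
TAG="v$(date +%Y%m%d-%H%M%S)"
export TAG
echo "building and deploying with TAG=$TAG"
```

Then the usual `docker-compose build` / `push` / `docker stack deploy` sequence picks up the new tag automatically.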
+
+---
+
+## Making changes
+
+.exercise[
+
+- Edit `~/orchestration-workshop/dockercoins/worker/worker.py`
+
+- Locate the line that has a `sleep` instruction
+
+- Increase the `sleep` from `0.1` to `1.0`
+
+- Save your changes and exit
+
+]
+
+---
+
+## Rolling updates
+
+- Let's change a scaled service: `worker`
+
+.exercise[
+
+- Edit `worker/worker.py`
+
+- Locate the `sleep` instruction and change the delay
+
+- Build, ship, and run our changes:
+  ```bash
+  export TAG=v0.3
+  docker-compose -f dockercoins.yml build
+  docker-compose -f dockercoins.yml push
+  docker stack deploy -c dockercoins.yml dockercoins
+  ```
+
+]
+
+---
+
+## Viewing our update as it rolls out
+
+.exercise[

+- Check the status of the `dockercoins_worker` service:
+  ```bash
+  watch docker service ps dockercoins_worker
+  ```
+
+- Hide the tasks that are shutdown:
+  ```bash
+  watch -n1 "docker service ps dockercoins_worker | grep -v Shutdown.*Shutdown"
+  ```
+
+]
+
+If you had stopped the workers earlier, this will automatically restart them.
+
+By default, SwarmKit does a rolling upgrade, one instance at a time.
+
+We should therefore see the workers being updated one by one.
+
+---
+
+## Changing the upgrade policy
+
+- We can set upgrade parallelism (how many instances to update at the same time)
+
+- And upgrade delay (how long to wait between two batches of instances)
+
+.exercise[
+
+- Change the parallelism to 2 and the delay to 5 seconds:
+  ```bash
+  docker service update dockercoins_worker \
+         --update-parallelism 2 --update-delay 5s
+  ```
+
+]
+
+The current upgrade will continue at a faster pace.
+
+---
+
+## Changing the policy in the Compose file
+
+- The policy can also be updated in the Compose file
+
+- This is done by adding an `update_config` key under the `deploy` key:
+
+  ```yaml
+  deploy:
+    replicas: 10
+    update_config:
+      parallelism: 2
+      delay: 10s
+  ```
+
+---
+
+## Rolling back
+
+- At any time (e.g. before the upgrade is complete), we can roll back:
+
+  - by editing the Compose file and redeploying;
+
+  - or with the special `--rollback` flag
+
+.exercise[
+
+- Try to roll back the service:
+  ```bash
+  docker service update dockercoins_worker --rollback
+  ```
+
+]
+
+What happens with the web UI graph?
+
+---
+
+## The fine print with rollback
+
+- Rollback reverts to the previous service definition
+
+- If we visualize successive updates as a stack:
+
+  - it doesn't "pop" the latest update
+
+  - it "pushes" a copy of the previous update on top
+
+  - ergo, rolling back twice does nothing
+
+- "Service definition" includes rollout cadence
+
+- Each `docker service update` command = a new service definition
+
+---
+
+class: extra-details
+
+## Timeline of an upgrade
+
+- SwarmKit will upgrade N instances at a time
+  
(following the `update-parallelism` parameter) + +- New tasks are created, and their desired state is set to `Ready` +
.small[(this pulls the image if necessary, ensures resource availability, creates the container ... without starting it)] + +- If the new tasks fail to get to `Ready` state, go back to the previous step +
.small[(SwarmKit will try again and again, until the situation is addressed or desired state is updated)]
+
+- When the new tasks are `Ready`, it sets the old tasks' desired state to `Shutdown`
+
+- When the old tasks are `Shutdown`, it starts the new tasks
+
+- Then it waits for the `update-delay`, and continues with the next batch of instances
diff --git a/docs/versions.md b/docs/versions.md
new file mode 100644
index 00000000..03e29f53
--- /dev/null
+++ b/docs/versions.md
@@ -0,0 +1,96 @@
+## Brand new versions!
+
+- Engine 17.10
+- Compose 1.16
+- Machine 0.12
+
+.exercise[
+
+- Check all installed versions:
+  ```bash
+  docker version
+  docker-compose -v
+  docker-machine -v
+  ```
+
+]
+
+---
+
+## Wait, what, 17.10 ?!?
+
+--
+
+- Docker 1.13 = Docker 17.03 (year.month, like Ubuntu)
+
+- Every month, there is a new "edge" release (with new features)
+
+- Every quarter, there is a new "stable" release
+
+- Docker CE releases are maintained 4+ months
+
+- Docker EE releases are maintained 12+ months
+
+- For more details, check the [Docker EE announcement blog post](https://blog.docker.com/2017/03/docker-enterprise-edition/)
+
+---
+
+class: extra-details
+
+## Docker CE vs Docker EE
+
+- Docker EE:
+
+  - $$$
+  - certification for select distros, clouds, and plugins
+  - advanced management features (fine-grained access control, security scanning...)
+
+- Docker CE:
+
+  - free
+  - available through Docker for Mac, Docker for Windows, and major Linux distros
+  - perfect for individuals and small organizations
+
+---
+
+class: extra-details
+
+## Why?
+
+- More readable for enterprise users
+
+  (i.e. the very nice folks who are kind enough to pay us big $$$ for our stuff)
+
+- No impact for the community
+
+  (beyond CE/EE suffix and version numbering change)
+
+- Both trains leverage the same open source components
+
+  (containerd, libcontainer, SwarmKit...)
+
+- More predictable release schedule (see next slide)
+
+---
+
+class: pic
+
+![Docker CE/EE release cycle](lifecycle.png)
+
+---
+
+## What was added when?
+
+| Year | Version | Features |
+| ---- | ----- | --- |
+| 2015 | 1.9 | Overlay (multi-host) networking, network/IPAM plugins
+| 2016 | 1.10 | Embedded dynamic DNS
+| 2016 | 1.11 | DNS round robin load balancing
+| 2016 | 1.12 | Swarm mode, routing mesh, encrypted networking, healthchecks
+| 2017 | 1.13 | Stacks, attachable overlays, image squash and compress
+| 2017 | 1.13 | Windows Server 2016 Swarm mode
+| 2017 | 17.03 | Secrets
+| 2017 | 17.04 | Update rollback, placement preferences (soft constraints)
+| 2017 | 17.05 | Multi-stage image builds, service logs
+| 2017 | 17.06 | Swarm configs, node/service events
+| 2017 | 17.06 | Windows Server 2016 Swarm overlay networks, secrets
diff --git a/docs/workshop.css b/docs/workshop.css
new file mode 100644
index 00000000..4361673b
--- /dev/null
+++ b/docs/workshop.css
@@ -0,0 +1,126 @@
+@import url(https://fonts.googleapis.com/css?family=Yanone+Kaffeesatz);
+@import url(https://fonts.googleapis.com/css?family=Droid+Serif:400,700,400italic);
+@import url(https://fonts.googleapis.com/css?family=Ubuntu+Mono:400,700,400italic);
+
+/* For print!
Borrowed from https://github.com/gnab/remark/issues/50 */ +@page { + size: 1210px 681px; + margin: 0; + } + +@media print { + .remark-slide-scaler { + width: 100% !important; + height: 100% !important; + transform: scale(1) !important; + top: 0 !important; + left: 0 !important; + } +} + +body { font-family: 'Droid Serif'; } + +h1, h2, h3 { + font-family: 'Yanone Kaffeesatz'; + font-weight: normal; + margin-top: 0.5em; +} + +a { + text-decoration: none; + color: blue; +} + +.remark-slide-content { padding: 1em 2.5em 1em 2.5em; } +.remark-slide-content { font-size: 25px; } +.remark-slide-content h1 { font-size: 50px; } +.remark-slide-content h2 { font-size: 50px; } +.remark-slide-content h3 { font-size: 25px; } + +.footnote { + position: absolute; + bottom: 3em; +} + +.remark-code { font-size: 25px; } +.small .remark-code { font-size: 16px; } +.remark-code, .remark-inline-code { font-family: 'Ubuntu Mono'; } +.remark-inline-code { + background-color: #ccc; +} + +.red { color: #fa0000; } +.gray { color: #ccc; } +.small { font-size: 70%; } +.big { font-size: 140%; } +.underline { text-decoration: underline; } +.strike { text-decoration: line-through; } + +.pic { + vertical-align: middle; + text-align: center; + padding: 0 0 0 0 !important; +} +img { + max-width: 100%; + max-height: 550px; +} +.title { + vertical-align: middle; + text-align: center; +} +.title h1 { font-size: 5em; } +.title p { font-size: 3em; } + +.quote { + background: #eee; + border-left: 10px solid #ccc; + margin: 1.5em 10px; + padding: 0.5em 10px; + quotes: "\201C""\201D""\2018""\2019"; + font-style: italic; +} +.quote:before { + color: #ccc; + content: open-quote; + font-size: 4em; + line-height: 0.1em; + margin-right: 0.25em; + vertical-align: -0.4em; +} +.quote p { + display: inline; +} + +.warning { + background-image: url("warning.png"); + background-size: 1.5em; + background-repeat: no-repeat; + padding-left: 2em; +} +.exercise { + background-color: #eee; + background-image: 
url("keyboard.png"); + background-size: 1.4em; + background-repeat: no-repeat; + background-position: 0.2em 0.2em; + border: 2px dotted black; +} +.exercise::before { + content: "Exercise"; + margin-left: 1.8em; +} + +li p { line-height: 1.25em; } + +div.extra-details { + background-image: url(extra-details.png); + background-position: 99.5% 1%; + background-size: 4%; +} + +/* This is used only for the history slide (the only table in this doc) */ +td { + padding: 0.1em 0.5em; + background: #eee; +} diff --git a/docs/workshop.html b/docs/workshop.html new file mode 100644 index 00000000..0a5c4f91 --- /dev/null +++ b/docs/workshop.html @@ -0,0 +1,33 @@ + + + + + Docker Orchestration Workshop + + + + +
+ The slides should show up here. If they don't, it might be + because you are accessing this file directly from your filesystem. + It needs to be served from a web server. You can try this: +
+        docker-compose up -d
+        open http://localhost:8888/workshop.html # on MacOS
+        xdg-open http://localhost:8888/workshop.html # on Linux
+      
+ Once the slides are loaded, this notice disappears when you + go full screen (e.g. by hitting "f"). +
+ + + + diff --git a/docs/workshop.yml b/docs/workshop.yml new file mode 100644 index 00000000..c55daa87 --- /dev/null +++ b/docs/workshop.yml @@ -0,0 +1,46 @@ +chapters: +- intro.md +- | + @@TOC@@ +- - prereqs.md + - versions.md + - | + class: title + + All right! +
+ We're all set. +
+ Let's do this. + - | + name: part-1 + + class: title, self-paced + + Part 1 + - sampleapp.md + - | + class: title + + Scaling out + - swarmkit.md +- - firstservice.md + - ourapponswarm.md +- - operatingswarm.md + - netshoot.md + - swarmnbt.md + - ipsec.md + - updatingservices.md + - healthchecks.md + - nodeinfo.md + - swarmtools.md +- - security.md + - secrets.md + - leastprivilege.md + - namespaces.md + - apiscope.md + - encryptionatrest.md + - metrics.md + - stateful.md + - extratips.md + - end.md