Compare commits

...

3 Commits

Author | SHA1 | Message | Date
Jérôme Petazzoni | 37242c0c72 | Update describe-instances for awscli 1.11 (thanks @mikegcoleman for finding that bug!) | 2017-04-27 14:56:15 -05:00
Jérôme Petazzoni | acab2f9074 | Last updates | 2017-04-17 13:48:58 -05:00
Jérôme Petazzoni | 4cd6235ab7 | add blackbelt speakers, toggle slides | 2017-04-17 13:27:24 -05:00
3 changed files with 315 additions and 40 deletions

docs/blackbelt.png (new binary file, 16 KiB; binary content not shown)

@@ -76,6 +76,12 @@
  background-repeat: no-repeat;
  padding-left: 2em;
}
.blackbelt {
  background-image: url("blackbelt.png");
  background-size: 1.5em;
  background-repeat: no-repeat;
  padding-left: 2em;
}
.exercise {
  background-color: #eee;
  background-image: url("keyboard.png");
@@ -104,10 +110,14 @@ class: in-person
## Intros
- Hello! We are
- Hello!
<!--
We are
AJ ([@s0ulshake](https://twitter.com/s0ulshake))
&
Jérôme ([@jpetazzo](https://twitter.com/jpetazzo))
-->
--
@@ -125,20 +135,25 @@ on time, it's a good idea to have a breakfast with the attendees
at e.g. 9am, and start at 9:30.
-->
---
class: in-person
## Agenda
<!--
- Agenda:
-->
.small[
- 09:00-09:15 hello
- 09:15-10:45 part 1
- 10:45-11:00 coffee break
- 11:00-12:30 part 2
- 12:30-13:30 lunch break
- 13:30-15:00 part 3
- 15:00-15:15 coffee break
- 15:15-16:45 part 4
- 16:45-17:00 Q&A
- 14:00-14:05 hello
- 14:05-14:50 part 1
- 14:50-15:00 tea/coffee break + Q&A
- 15:00-15:50 part 2
- 15:50-16:00 more tea/coffee + Q&A
- 16:00-16:50 part 3
- 16:50-17:00 no tea/coffee, still Q&A
- 17:00-23:59 kombucha, beers, and more
]
<!--
@@ -149,12 +164,26 @@ at e.g. 9am, and start at 9:30.
- Feel free to interrupt for questions at any time
- Live feedback, questions, help on [Gitter](chat)
- Live feedback, questions, help on [Slack](chat)
- All the content is publicly available (slides, code samples, scripts)
.blackbelt[GOTTA CATCH 'EM ALL! (The black belt track references!)]
---
## .blackbelt[Tuesday 16:15]
Cgroups in Go...
Namespaces in Go...
Reminder: *Cgroups + Namespaces = Containers*
Liz Rice will craft some artisanal, organic, non-GMO containers in Go! ✨
???
class: in-person
## Disclaimer
@@ -217,8 +246,6 @@ class: self-paced
read [these instructions](https://github.com/jpetazzo/orchestration-workshop#using-play-with-docker) for extra
details
???
<!--
grep '^# ' index.html | grep -v '<br' | tr '#' '-'
-->
@@ -267,25 +294,25 @@ class: in-person
class: in-person
## Chapter 3: operating the Swarm
## Chapter 3: operating the Swarm (advanced material)
- Breaking into an overlay network
- (Breaking into an overlay network)
- Securing overlay networks
- (Securing overlay networks)
- Rolling updates
- (Rolling updates)
- (Secrets management and encryption at rest)
- [Centralized logging](#logging)
- ([Centralized logging](#logging))
- Metrics collection
- (Metrics collection)
---
class: in-person
## Chapter 4: bonus material
## Chapter 4: useful Swarm-fu
- Dealing with stateful services
@@ -543,8 +570,8 @@ You are welcome to use the method that you feel the most comfortable with.
## Brand new versions!
- Engine 17.03
- Compose 1.11
- Engine 17.05
- Compose 1.12
- Machine 0.10
.exercise[
@@ -560,7 +587,7 @@ You are welcome to use the method that you feel the most comfortable with.
---
## Wait, what, 17.03 ?!?
## Wait, what, 17.05 ?!?
--
@@ -670,9 +697,9 @@ class: extra-details
- Containers can have network aliases (resolvable through DNS)
- Compose file version 2 makes each container reachable through its service name
- Compose file version 2+ makes each container reachable through its service name
- Compose file version 1 requires "links" sections
- Compose file version 1 required "links" sections
- Our code can connect to services using their short name
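
For instance, a quick sketch of a check (assuming the DockerCoins Compose file with services named `worker` and `rng`, and that the image ships `ping`):

```bash
# Resolve and reach the rng service by name from the worker container
docker-compose exec worker ping -c 1 rng
```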
@@ -990,6 +1017,20 @@ killall docker-compose
---
## .blackbelt[Wednesday 13:30]
Do you want to know exactly what your code is doing?
Down to the microsecond?
.small[(I want to say nanosecond but I don't want to be too presumptuous)]
Flame graphs! Performance counters! Kernel tracing!
Brendan Gregg will share with us the secrets of container performance analysis.
---
## Accessing internal services
- `rng` and `hasher` are exposed on ports 8001 and 8002
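
As a quick sanity check, a sketch (assuming the ports are published on the node you are logged into):

```bash
curl localhost:8001   # rng
curl localhost:8002   # hasher
```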
@@ -1755,6 +1796,20 @@ Some presentations from the Docker Distributed Systems Summit in Berlin:
---
## .blackbelt[Tuesday 14:55]
What is this "quorum" thing exactly?
How does raft *actually* work?
Can I blow up a Swarm cluster to bits and rebuild it from scratch?
Docker Captain Laura Frank will answer all these questions!
*"I dyed my hair red with the blood of offline manager nodes."*
---
## Adding more manager nodes
- Right now, we have only one manager (node1)
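
To add managers, we ask the current manager for the manager join token, then run the printed command on the other nodes (node names like `node2` are assumptions in this sketch):

```bash
# On node1: print the full "docker swarm join" command for managers
docker swarm join-token manager
# Then run that command on node2, node3, ... to add them as managers
```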
@@ -3484,7 +3539,7 @@ You should now be able to connect to port 8000 and see the DockerCoins web UI.
---
# Breaking into an overlay network
# (Breaking into an overlay network)
- We will create a dummy placeholder service on our network
@@ -3908,7 +3963,7 @@ class: in-person
---
# Securing overlay networks
# (Securing overlay networks)
- By default, overlay networks are using plain VXLAN encapsulation
@@ -4037,7 +4092,31 @@ However, when you run the second one, only `#` will show up.
---
# Rolling updates
## .blackbelt[Tuesday 11:45]
In a galaxy far, far away ...
--
The Death Star has a REST API, and the Empire protects it with L7 security policies ...
--
Wait, what?
--
Thomas Graf will present BPF and XDP, which are some *really cool* kernel tech.
Then he'll show how Cilium leverages them to implement L7 security policies.
Also he might or might not blow up Death Stars.
.small[(In other news, BPF and XDP are also used by Facebook to achieve 10x performance improvements over IPVS LB.)]
---
# (Rolling updates)
- We want to release a new version of the worker
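
A hedged sketch of what the rolling update command could look like (the image tag is hypothetical; `--update-parallelism` and `--update-delay` are standard `docker service update` flags):

```bash
docker service update \
    --image 127.0.0.1:5000/worker:v0.2 \
    --update-parallelism 2 \
    --update-delay 5s \
    worker
```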
@@ -4217,8 +4296,6 @@ Note: if you updated the roll-out parallelism, *rollback* will not rollback to t
---
class: swarmctl
## Getting task information for a given node
- You can see all the tasks assigned to a node with `docker node ps`
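
For example (the node name `node2` is an assumption):

```bash
docker node ps          # tasks scheduled on the current node
docker node ps node2    # tasks scheduled on another node
```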
@@ -4235,7 +4312,7 @@ class: swarmctl
class: swarmtools
## SwarmKit debugging tools
# SwarmKit debugging tools
- The SwarmKit repository comes with debugging tools
@@ -4459,6 +4536,26 @@ Reminder: this is a very low-level tool, requiring a knowledge of SwarmKit's int
---
## .blackbelt[Wednesday 11:15]
*"Docker takes security seriously."*
--
Well, *everybody* takes security seriously, don't they?
--
*"We don't give a 💩 about security!"* (Said no company ever)
--
Come see how we reaccommodate security bugs, *one patch at a time*.
With Michael Crosby, maintainer of the Engine, libcontainer, runc, containerd ...
---
class: secrets
## Secret management
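
A minimal sketch of the workflow (secret, service, and image names are made up for illustration):

```bash
# Create a secret from stdin
echo "s3cr3t" | docker secret create db_password -
# Give a service access to it; it shows up in the container as /run/secrets/db_password
docker service create --name webapp --secret db_password nginx
```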
@@ -5242,6 +5339,24 @@ http://jpetazzo.github.io/2017/01/20/docker-logging-gelf/).
---
## .blackbelt[Tuesday 17:10]
You want to implement surveillance?
I mean metrics collection?
Gather CPU, RAM, I/O, etc. for all your nodes and containers?
But also applications and business metrics?
Julius Volz will show you what Prometheus can do for you!
(Spoiler alert: a lot!)
---
class: metrics
# Metrics collection
- We want to gather metrics in a central place
@@ -5252,6 +5367,8 @@ http://jpetazzo.github.io/2017/01/20/docker-logging-gelf/).
---
class: metrics
## Node metrics
- CPU, RAM, disk usage on the whole node
@@ -5268,6 +5385,8 @@ http://jpetazzo.github.io/2017/01/20/docker-logging-gelf/).
---
class: metrics
## Container metrics
- Similar to node metrics, but not totally identical
@@ -5288,6 +5407,8 @@ http://jpetazzo.github.io/2013/10/08/docker-containers-metrics/
---
class: metrics
## Tools
We will build *two* different metrics pipelines:
@@ -5302,6 +5423,8 @@ and PWD doesn't have that yet).
---
class: metrics
## First metrics pipeline
We will use three open source Go projects for our first metrics pipeline:
@@ -5320,6 +5443,8 @@ We will use three open source Go projects for our first metrics pipeline:
---
class: metrics
## Snap
- [github.com/intelsdi-x/snap](https://github.com/intelsdi-x/snap)
@@ -5338,6 +5463,8 @@ We will use three open source Go projects for our first metrics pipeline:
---
class: metrics
## InfluxDB
- Snap doesn't store metrics data
@@ -5354,6 +5481,8 @@ We will use three open source Go projects for our first metrics pipeline:
---
class: metrics
## Grafana
- Snap cannot show graphs
@@ -5366,6 +5495,8 @@ We will use three open source Go projects for our first metrics pipeline:
---
class: metrics
## Getting and setting up Snap
- We will install Snap directly on the nodes
@@ -5383,6 +5514,8 @@ We will use three open source Go projects for our first metrics pipeline:
---
class: metrics
## The Snap installer service
- This will get Snap on all nodes
@@ -5408,6 +5541,8 @@ for BIN in snapd snapctl; do ln -s /opt/snap/bin/$BIN /usr/local/bin/$BIN; done
---
class: metrics
## First contact with `snapd`
- The core of Snap is `snapd`, the Snap daemon
@@ -5430,6 +5565,8 @@ for BIN in snapd snapctl; do ln -s /opt/snap/bin/$BIN /usr/local/bin/$BIN; done
---
class: metrics
## Using `snapctl` to interact with `snapd`
- Let's load a *collector* plugin and a *publisher* plugin
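
For example, a hedged sketch (the plugin paths depend on how Snap was installed on your nodes):

```bash
snapctl plugin load /opt/snap/plugins/snap-plugin-collector-psutil
snapctl plugin load /opt/snap/plugins/snap-plugin-publisher-mock-file
```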
@@ -5452,6 +5589,8 @@ for BIN in snapd snapctl; do ln -s /opt/snap/bin/$BIN /usr/local/bin/$BIN; done
---
class: metrics
## Checking what we've done
- Good to know: Docker CLI uses `ls`, Snap CLI uses `list`
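
For instance, a sketch of the equivalent commands (assuming the plugins loaded earlier):

```bash
snapctl plugin list    # loaded plugins
snapctl metric list    # metrics exposed by the loaded collectors
```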
@@ -5472,6 +5611,8 @@ for BIN in snapd snapctl; do ln -s /opt/snap/bin/$BIN /usr/local/bin/$BIN; done
---
class: metrics
## Actually collecting metrics: introducing *tasks*
- To start collecting/processing/publishing metric data, you need to create a *task*
@@ -5493,6 +5634,8 @@ for BIN in snapd snapctl; do ln -s /opt/snap/bin/$BIN /usr/local/bin/$BIN; done
---
class: metrics
## Our first task manifest
```yaml
@@ -5515,6 +5658,8 @@ for BIN in snapd snapctl; do ln -s /opt/snap/bin/$BIN /usr/local/bin/$BIN; done
---
class: metrics
## Creating our first task
- The task manifest shown on the previous slide is stored in `snap/psutil-file.yml`.
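
A hedged sketch of creating the task from that manifest (assuming `snapctl task create -t` and that you are in the directory containing `snap/`):

```bash
snapctl task create -t snap/psutil-file.yml
```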
@@ -5541,6 +5686,8 @@ for BIN in snapd snapctl; do ln -s /opt/snap/bin/$BIN /usr/local/bin/$BIN; done
---
class: metrics
## Checking existing tasks
.exercise[
@@ -5560,6 +5707,8 @@ The output should look like the following:
```
---
class: metrics
## Viewing our task dollars at work
- The task is using a very simple publisher, `mock-file`
@@ -5579,6 +5728,8 @@ To exit, hit `^C`
---
class: metrics
## Debugging tasks
- When a task is not directly writing to a local file, use `snapctl task watch`
@@ -5597,6 +5748,8 @@ To exit, hit `^C`
---
class: metrics
## Stopping snap
- Our Snap deployment has a few flaws:
@@ -5609,16 +5762,22 @@ To exit, hit `^C`
--
class: metrics
- We want to change that!
--
class: metrics
- But first, go back to the terminal where `snapd` is running, and hit `^C`
- All tasks will be stopped; all plugins will be unloaded; Snap will exit
---
class: metrics
## Snap Tribe Mode
- Tribe is Snap's clustering mechanism
@@ -5638,6 +5797,8 @@ To exit, hit `^C`
---
class: metrics
## Running Snap itself on every node
- Snap runs in the foreground, so you need to use `&` or start it in tmux
@@ -5655,6 +5816,8 @@ If you're *not* using Play-With-Docker, there is another way to start Snap!
---
class: metrics
## Starting a daemon through SSH
.warning[Hackety hack ahead!]
@@ -5670,6 +5833,8 @@ If you're *not* using Play-With-Docker, there is another way to start Snap!
---
class: metrics
## Running Snap itself on every node
- I might go to hell for showing you this, but here it goes ...
@@ -5693,6 +5858,8 @@ Remember: this *does not work* with Play-With-Docker (which doesn't have SSH).
---
class: metrics
## Viewing the members of our tribe
- If everything went fine, Snap is now running in tribe mode
@@ -5710,6 +5877,8 @@ This should show the 5 nodes with their hostnames.
---
class: metrics
## Create an agreement
- We can now create an *agreement* for our plugins and tasks
@@ -5732,6 +5901,8 @@ The output should look like the following:
---
class: metrics
## Instruct all nodes to join the agreement
- We don't need another fancy global service!
@@ -5756,6 +5927,8 @@ The last bit of output should look like the following:
---
class: metrics
## Start a container on every node
- The Docker plugin requires at least one container to be started
@@ -5775,6 +5948,8 @@ The last bit of output should look like the following:
---
class: metrics
## Running InfluxDB
- We will create a service for InfluxDB
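
A minimal sketch of such a service (the image tag and published port are assumptions; the actual exercise may differ):

```bash
docker service create --name influxdb --publish 8086:8086 influxdb:1.2
```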
@@ -5795,6 +5970,8 @@ The last bit of output should look like the following:
---
class: metrics
## Creating the InfluxDB service
.exercise[
@@ -5819,6 +5996,8 @@ this breaks a few things.]
---
class: metrics
## Setting up InfluxDB
- We need to create the "snap" database
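
If you prefer the command line to the admin interface, a sketch using InfluxDB's HTTP API (assuming InfluxDB is reachable on the default port 8086):

```bash
curl -XPOST "http://localhost:8086/query" --data-urlencode "q=CREATE DATABASE snap"
```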
@@ -5859,6 +6038,8 @@ Note: the InfluxDB query language *looks like* SQL but it's not.
---
class: metrics
## Load Docker collector and InfluxDB publisher
- We will load plugins on the local node
@@ -5884,6 +6065,8 @@ Note: the InfluxDB query language *looks like* SQL but it's not.
---
class: metrics
## Start a simple collection task
- Again, we will create a task on the local node
@@ -5907,6 +6090,8 @@ container.
---
class: metrics
## If things go wrong...
Note: if a task runs into a problem (e.g. it's trying to publish
@@ -5925,6 +6110,8 @@ the task (it will delete+re-create on all nodes).
---
class: metrics
## Check that metric data shows up in InfluxDB
- Let's check existing data with a few manual queries in the InfluxDB admin interface
@@ -5947,6 +6134,8 @@ the task (it will delete+re-create on all nodes).
---
class: metrics
## Deploy Grafana
- We will use an almost-official image, `grafana/grafana`
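
A minimal sketch of the service (Grafana listens on port 3000 by default; the exact options used in the exercise may differ):

```bash
docker service create --name grafana --publish 3000:3000 grafana/grafana
```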
@@ -5964,6 +6153,8 @@ the task (it will delete+re-create on all nodes).
---
class: metrics
## Set up Grafana
.exercise[
@@ -5982,6 +6173,8 @@ the task (it will delete+re-create on all nodes).
---
class: metrics
## Add InfluxDB as a data source for Grafana
.small[
@@ -6006,10 +6199,14 @@ If you see an orange box (sometimes without a message), it means that you got so
---
class: metrics
![Screenshot showing how to fill the form](grafana-add-source.png)
---
class: metrics
## Create a dashboard in Grafana
.exercise[
@@ -6032,6 +6229,8 @@ At this point, you should see a sample graph showing up.
---
class: metrics
## Setting up a graph in Grafana
.exercise[
@@ -6053,10 +6252,14 @@ Congratulations, you are viewing the CPU usage of a single container!
---
class: metrics
![Screenshot showing the end result](grafana-add-graph.png)
---
class: metrics
## Before moving on ...
- Leave that tab open!
@@ -6067,6 +6270,8 @@ Congratulations, you are viewing the CPU usage of a single container!
---
class: metrics
## Prometheus
- Prometheus is another metrics collection system
@@ -6084,6 +6289,8 @@ Congratulations, you are viewing the CPU usage of a single container!
---
class: metrics
## It's all about the `/metrics`
- This is what the *node exporter* looks like:
@@ -6100,6 +6307,8 @@ Congratulations, you are viewing the CPU usage of a single container!
---
class: metrics
## Collecting metrics with Prometheus on Swarm
- We will run two *global services* (i.e. scheduled on all our nodes):
@@ -6118,6 +6327,8 @@ Congratulations, you are viewing the CPU usage of a single container!
---
class: metrics
## Creating an overlay network for Prometheus
- This is the easiest step ☺
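
A sketch of that step (the network name `prom` is an assumption reused in the following sketches):

```bash
docker network create --driver overlay prom
```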
@@ -6133,6 +6344,8 @@ Congratulations, you are viewing the CPU usage of a single container!
---
class: metrics
## Running the node exporter
- The node exporter *should* run directly on the hosts
@@ -6158,6 +6371,8 @@ Congratulations, you are viewing the CPU usage of a single container!
---
class: metrics
## Running cAdvisor
- Likewise, cAdvisor *should* run directly on the hosts
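
A hedged sketch of running cAdvisor as a global service (the mounts give it access to Docker state on each host; the network name and mount list are assumptions):

```bash
docker service create --mode global --name cadvisor --network prom \
    --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
    --mount type=bind,source=/sys,target=/sys,readonly \
    --mount type=bind,source=/var/lib/docker,target=/var/lib/docker,readonly \
    google/cadvisor
```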
@@ -6180,6 +6395,8 @@ Congratulations, you are viewing the CPU usage of a single container!
---
class: metrics
## Configuring the Prometheus server
This will be our configuration file for Prometheus:
@@ -6205,6 +6422,8 @@ scrape_configs:
---
class: metrics
## Passing the configuration to the Prometheus server
- We need to provide our custom configuration to the Prometheus server
@@ -6225,6 +6444,8 @@ scrape_configs:
---
class: metrics
## Building our custom Prometheus image
- We will use the local registry started previously on 127.0.0.1:5000
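
A sketch of the build-and-push step (assuming a Dockerfile with our configuration in the current directory):

```bash
docker build -t 127.0.0.1:5000/prometheus .
docker push 127.0.0.1:5000/prometheus
```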
@@ -6245,6 +6466,8 @@ scrape_configs:
---
class: metrics
## Running our custom Prometheus image
- That's the only service that needs to be published
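
A hedged sketch (the network name and image come from the previous sketches):

```bash
docker service create --network prom --name prometheus \
    --publish 9090:9090 127.0.0.1:5000/prometheus
```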
@@ -6263,6 +6486,8 @@ scrape_configs:
---
class: metrics
## Checking our Prometheus server
- First, let's make sure that Prometheus is correctly scraping all metrics
@@ -6281,6 +6506,8 @@ Their state should be "UP".
---
class: metrics
## Displaying metrics directly from Prometheus
- This is easy ... if you are familiar with PromQL
@@ -6304,6 +6531,8 @@ Their state should be "UP".
---
class: metrics
## Building the query from scratch
- We are going to build the same query from scratch
@@ -6318,6 +6547,8 @@ Their state should be "UP".
---
class: metrics
## Displaying a raw metric for *all* containers
- Click on the "Graph" tab on top
@@ -6338,6 +6569,8 @@ Their state should be "UP".
---
class: metrics
## Selecting metrics for a specific service
- Hover over the lines in the graph
@@ -6358,6 +6591,8 @@ Their state should be "UP".
---
class: metrics
## Turn counters into rates
- What we see is the total amount of CPU used (in seconds)
@@ -6381,6 +6616,8 @@ Their state should be "UP".
---
class: metrics
## Aggregate multiple data series
- We have one graph per CPU; we want to sum them
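
If you prefer the command line to the web UI, a sketch of the same query against Prometheus' HTTP API (the metric comes from cAdvisor; label availability depends on your setup):

```bash
curl -sG http://localhost:9090/api/v1/query \
    --data-urlencode 'query=sum without (cpu) (rate(container_cpu_usage_seconds_total[1m]))'
```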
@@ -6408,6 +6645,8 @@ Their state should be "UP".
---
class: metrics
## Comparing Snap and Prometheus data
- If you haven't set up Snap, InfluxDB, and Grafana, skip this section
@@ -6420,6 +6659,8 @@ Their state should be "UP".
---
class: metrics
## Add Prometheus as a data source in Grafana
.exercise[
@@ -6438,6 +6679,8 @@ We see the same input form that we filled earlier to connect to InfluxDB.
---
class: metrics
## Connecting to Prometheus from Grafana
.exercise[
@@ -6460,6 +6703,8 @@ Otherwise, double-check every field and try again!
---
class: metrics
## Adding the Prometheus data to our dashboard
.exercise[
@@ -6476,6 +6721,8 @@ This takes us to the graph editor that we used earlier.
---
class: metrics
## Querying Prometheus data from Grafana
The editor is a bit less friendly than the one we used for InfluxDB.
@@ -6499,6 +6746,8 @@ The editor is a bit less friendly than the one we used for InfluxDB.
---
class: metrics
## Interpreting results
- The two graphs *should* be similar
@@ -6521,6 +6770,8 @@ The editor is a bit less friendly than the one we used for InfluxDB.
---
class: metrics
## More resources on container metrics
- [Docker Swarm & Container Overview](https://grafana.net/dashboards/609),
@@ -7071,11 +7322,31 @@ class: title
---
## .blackbelt[Wednesday 14:25]
Don't get hacked!
.small[(Especially by sketchy Russian groups, since it leads to privilege escalation and compromises Democracy. Just saying.)]
Sign images!
Require multiple keys for signature!
Revoke compromised keys and reissue new ones easily!
Prevent replay attacks and use of obsolete vulnerable software!
Justin Cappos will tell you all about *securing the software supply chain!*
.small[(This is Infosec! It's Very Important! Therefore I Am Allowed To Use Lots Of Exclamation Marks!)]
---
## Work in progress
- Stabilize Compose/Swarm integration
<!--
- Refine Snap deployment
-->
- Healthchecks
@@ -7110,7 +7381,7 @@ class: title
var slideshow = remark.create({
  ratio: '16:9',
  highlightSpans: true,
  excludedClasses: ["in-person"]
  excludedClasses: ["self-paced", "extra-details", "metrics", "swarmtools", "secrets", "encryption-at-rest"]
});
</script>
</body>


@@ -66,20 +66,24 @@ aws_display_instances_by_tag() {
    fi
}
aws_get_instance_ids_by_filter() {
    FILTER=$1
    aws ec2 describe-instances --filters $FILTER \
        --query Reservations[*].Instances[*].InstanceId \
        --output text | tr "\t" "\n"
}
aws_get_instance_ids_by_client_token() {
    TOKEN=$1
    need_tag $TOKEN
    aws ec2 describe-instances --filters "Name=client-token,Values=$TOKEN" \
        | grep ^INSTANCE \
        | awk '{print $8}'
    aws_get_instance_ids_by_filter Name=client-token,Values=$TOKEN
}
aws_get_instance_ids_by_tag() {
    TAG=$1
    need_tag $TAG
    aws ec2 describe-instances --filters "Name=tag:Name,Values=$TAG" \
        | grep ^INSTANCE \
        | awk '{print $8}'
    aws_get_instance_ids_by_filter Name=tag:Name,Values=$TAG
}
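# Hypothetical usage sketch (not part of this commit): both helpers above now
# delegate to aws_get_instance_ids_by_filter, e.g.:
#   aws_get_instance_ids_by_tag my-workshop-nodes
#   aws_get_instance_ids_by_client_token example-token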
aws_get_instance_ips_by_tag() {