class: extra-details

## Troubleshooting overlay networks

<!--

## Finding the real cause of the bottleneck

- We want to debug our app as we scale `worker` up and down

-->

- We want to run tools like `ab` or `httping` on the internal network

--

class: extra-details

- Ah, if only we had created our overlay network with the `--attachable` flag ...

--

class: extra-details

- Oh well, let's use this as an excuse to introduce New Ways To Do Things
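
For the record, here is what using an attachable network looks like (just a sketch; recreating the network here would mean tearing down the running stack, so we won't actually do it):

```bash
docker network create --driver overlay --attachable dockercoins_default
docker run --rm -ti --network dockercoins_default alpine sh
```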
---

# Breaking into an overlay network

- We will create a dummy placeholder service on our network

- Then we will use `docker exec` to run more processes in this container

.exercise[

- Start a "do nothing" container using our favorite Swiss-Army distro:
  ```bash
  docker service create --network dockercoins_default --name debug \
         --constraint node.hostname==$HOSTNAME alpine sleep 1000000000
  ```

]

The `constraint` makes sure that the container will be created on the local node.
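
We can check that the task was indeed scheduled on the local node:

```bash
docker service ps debug
```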
---

## Entering the debug container

- Once our container is started (which should be really fast because the alpine image is small), we can enter it (from any node)

.exercise[

- Locate the container:
  ```bash
  docker ps
  ```

- Enter it:
  ```bash
  docker exec -ti <containerID> sh
  ```

]
---

## Labels

- We can also be fancy and find the ID of the container automatically

- SwarmKit places labels on containers

.exercise[

- Get the ID of the container:
  ```bash
  CID=$(docker ps -q --filter label=com.docker.swarm.service.name=debug)
  ```

- And enter the container:
  ```bash
  docker exec -ti $CID sh
  ```

]
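
To see the full set of labels that SwarmKit placed on the container (the exact labels are an implementation detail and may vary across versions):

```bash
docker inspect --format '{{json .Config.Labels}}' $CID
```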
---

## Installing our debugging tools

- Ideally, you would author your own image, with all your favorite tools, and use it instead of the base `alpine` image

- But we can also dynamically install whatever we need

.exercise[

- Install a few tools:
  ```bash
  apk add --update curl apache2-utils drill
  ```

]
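
If we wanted to bake these tools into an image of our own, as suggested above, a minimal sketch could look like this (run on a node, not in the debug container; the image name `mydebugimage` is just an example):

```bash
# Build a small debugging image from a Dockerfile passed on stdin
docker build -t mydebugimage - <<'EOF'
FROM alpine
RUN apk add --update curl apache2-utils drill
EOF
```

To use it across the cluster, we would then push it to a registry that all nodes can pull from.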
---

## Investigating the `rng` service

- First, let's check what `rng` resolves to

.exercise[

- Use drill or nslookup to resolve `rng`:
  ```bash
  drill rng
  ```

]

This gives us one IP address. It is not the IP address of a container.
It is a virtual IP address (VIP) for the `rng` service.
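
From a manager node (not from inside the debug container), we can cross-check that address against the service definition; the VIP reported by `drill` should show up in the service's endpoint (assuming the stack is named `dockercoins`):

```bash
docker service inspect --format '{{json .Endpoint.VirtualIPs}}' dockercoins_rng
```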
---

## Investigating the VIP

.exercise[

- Try to ping the VIP:
  ```bash
  ping rng
  ```

]

It *should* ping. (But this might change in the future.)

With Engine 1.12: VIPs respond to ping if a backend is available on the same machine.

With Engine 1.13: VIPs respond to ping if a backend is available anywhere.

(Again: this might change in the future.)
---

## What if I don't like VIPs?

- Services can be published using two modes: VIP and DNSRR.

- With VIP, you get a virtual IP for the service, and a load balancer
  based on IPVS

  (By the way, IPVS is totally awesome and if you want to learn more about it in the context of containers,
  I highly recommend [this talk](https://www.youtube.com/watch?v=oFsJVV1btDU&index=5&list=PLkA60AVN3hh87OoVra6MHf2L4UR9xwJkv) by [@kobolog](https://twitter.com/kobolog) at DC15EU!)

- With DNSRR, you get the former behavior (from Engine 1.11), where
  resolving the service yields the IP addresses of all the containers for
  this service

- You change this with `docker service create --endpoint-mode [vip|dnsrr]`
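
For example, a DNSRR version of our debug service could be created like this (a sketch; the name `debug-dnsrr` is just for illustration):

```bash
docker service create --network dockercoins_default --endpoint-mode dnsrr \
       --name debug-dnsrr alpine sleep 1000000000
```

Resolving `debug-dnsrr` would then return the containers' IP addresses directly, instead of a VIP.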
---

## Looking up VIP backends

- You can also resolve a special name: `tasks.<name>`

- It will give you the IP addresses of the containers for a given service

.exercise[

- Obtain the IP addresses of the containers for the `rng` service:
  ```bash
  drill tasks.rng
  ```

]

This should list 5 IP addresses.
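
If we need these addresses one per line (e.g. to iterate over them in a shell loop), we can extract them from the answer section; this parsing is illustrative and depends on drill's output format:

```bash
drill tasks.rng | awk '$4 == "A" {print $5}'
```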
---

class: extra-details, benchmarking

## Testing and benchmarking our service

- We will check that the `rng` service is up with `curl`, then
  benchmark it with `ab`

.exercise[

- Make a test request to the service:
  ```bash
  curl rng
  ```

- Open another window, and stop the workers, to test in isolation:
  ```bash
  docker service update dockercoins_worker --replicas 0
  ```

]

Wait until the workers are stopped (check with `docker service ls`)
before continuing.
---

class: extra-details, benchmarking

## Benchmarking `rng`

We will send 50 requests, but with various levels of concurrency.

.exercise[

- Send 50 requests, with a single sequential client:
  ```bash
  ab -c 1 -n 50 http://rng/10
  ```

- Send 50 requests, with fifty parallel clients:
  ```bash
  ab -c 50 -n 50 http://rng/10
  ```

]
---

class: extra-details, benchmarking

## Benchmark results for `rng`

- When serving requests sequentially, they each take 100ms

- In the parallel scenario, latency increases dramatically

- What about `hasher`?
---

class: extra-details, benchmarking

## Benchmarking `hasher`

We will do the same tests for `hasher`.

The command is slightly more complex, since we need to post random data.

First, we need to put the POST payload in a temporary file.

.exercise[

- Generate 10 bytes of random data (curl is already installed in our debug container):
  ```bash
  curl http://rng/10 >/tmp/random
  ```

]
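
We can double-check that the payload has the expected size:

```bash
wc -c /tmp/random
```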
---

class: extra-details, benchmarking

## Benchmarking `hasher`

Once again, we will send 50 requests, with different levels of concurrency.

.exercise[

- Send 50 requests with a sequential client:
  ```bash
  ab -c 1 -n 50 -T application/octet-stream -p /tmp/random http://hasher/
  ```

- Send 50 requests with 50 parallel clients:
  ```bash
  ab -c 50 -n 50 -T application/octet-stream -p /tmp/random http://hasher/
  ```

]
---

class: extra-details, benchmarking

## Benchmark results for `hasher`

- The sequential benchmark takes ~5 seconds to complete

- The parallel benchmark takes less than 1 second to complete

- In both cases, each request takes a bit more than 100ms to complete

- Requests are a bit slower in the parallel benchmark

- It looks like `hasher` is better equipped to deal with concurrency than `rng`
---

class: extra-details, title, benchmarking

Why?

---

class: extra-details, benchmarking

## Why does everything take (at least) 100ms?

`rng` code:

![RNG code screenshot](images/delay-rng.png)

`hasher` code:

![HASHER code screenshot](images/delay-hasher.png)

---

class: extra-details, title, benchmarking

But ...

WHY?!?
---

class: extra-details, benchmarking

## Why did we sprinkle this sample app with sleeps?

- Deterministic performance
  <br/>(regardless of instance speed, CPUs, I/O...)

- Actual code sleeps all the time anyway

- When your code makes a remote API call:

  - it sends a request;

  - it sleeps until it gets the response;

  - it processes the response.
---

class: extra-details, in-person, benchmarking

## Why do `rng` and `hasher` behave differently?

![Equations on a blackboard](images/equations.png)

(Synchronous vs. asynchronous event processing)
---

class: extra-details

## Global scheduling → global debugging

- Traditional approach:

  - log into a node

  - install our Swiss Army Knife (if necessary)

  - troubleshoot things

- Proposed alternative:

  - put our Swiss Army Knife in a container (e.g. [nicolaka/netshoot](https://hub.docker.com/r/nicolaka/netshoot/))

  - run tests from multiple locations at the same time

(This becomes very practical with the `docker service logs` command, available since 17.05.)
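
For instance, we could run our toolbox on every node at once with a global service (a sketch; the name `debug-everywhere` is just an example):

```bash
docker service create --mode global --network dockercoins_default \
       --name debug-everywhere nicolaka/netshoot sleep 1000000000
```

Then `docker exec` into the local task on each node gives us the same tools everywhere.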
---

## More about overlay networks

.blackbelt[[Deep Dive in Docker Overlay Networks](https://www.youtube.com/watch?v=b3XDl0YsVsg&index=1&list=PLkA60AVN3hh-biQ6SCtBJ-WVTyBmmYho8) by Laurent Bernaille (DC17US)]

.blackbelt[Deeper Dive in Docker Overlay Networks by Laurent Bernaille (Wednesday 13:30)]