class: extra-details ## Troubleshooting overlay networks - We want to run tools like `ab` or `httping` on the internal network -- class: extra-details - Ah, if only we had created our overlay network with the `--attachable` flag ... -- class: extra-details - Oh well, let's use this as an excuse to introduce New Ways To Do Things --- # Breaking into an overlay network - We will create a dummy placeholder service on our network - Then we will use `docker exec` to run more processes in this container .exercise[ - Start a "do nothing" container using our favorite Swiss-Army distro: ```bash docker service create --network dockercoins_default --name debug \ --constraint node.hostname==$HOSTNAME alpine sleep 1000000000 ``` ] The `constraint` makes sure that the container will be created on the local node. --- ## Entering the debug container - Once our container is started (which should be really fast because the alpine image is small), we can enter it (from any node) .exercise[ - Locate the container: ```bash docker ps ``` - Enter it: ```bash docker exec -ti containerID sh ``` ] --- ## Labels - We can also be fancy and find the ID of the container automatically - SwarmKit places labels on containers .exercise[ - Get the ID of the container: ```bash CID=$(docker ps -q --filter label=com.docker.swarm.service.name=debug) ``` - And enter the container: ```bash docker exec -ti $CID sh ``` ] --- ## Installing our debugging tools - Ideally, you would author your own image, with all your favorite tools, and use it instead of the base `alpine` image - But we can also dynamically install whatever we need .exercise[ - Install a few tools: ```bash apk add --update curl apache2-utils drill ``` ] --- ## Investigating the `rng` service - First, let's check what `rng` resolves to .exercise[ - Use drill or nslookup to resolve `rng`: ```bash drill rng ``` ] This give us one IP address. It is not the IP address of a container. It is a virtual IP address (VIP) for the `rng` service. --- ## Investigating the VIP .exercise[ - Try to ping the VIP: ```bash ping -c 3 rng ``` ] It *should* ping. (But this might change in the future.) With Engine 1.12: VIPs respond to ping if a backend is available on the same machine. With Engine 1.13: VIPs respond to ping if a backend is available anywhere. (Again: this might change in the future.) --- ## What if I don't like VIPs? - Services can be published using two modes: VIP and DNSRR. - With VIP, you get a virtual IP for the service, and a load balancer based on IPVS (By the way, IPVS is totally awesome and if you want to learn more about it in the context of containers, I highly recommend [this talk](https://www.youtube.com/watch?v=oFsJVV1btDU&index=5&list=PLkA60AVN3hh87OoVra6MHf2L4UR9xwJkv) by [@kobolog](https://twitter.com/kobolog) at DC15EU!) - With DNSRR, you get the former behavior (from Engine 1.11), where resolving the service yields the IP addresses of all the containers for this service - You change this with `docker service create --endpoint-mode [VIP|DNSRR]` --- ## Looking up VIP backends - You can also resolve a special name: `tasks.` - It will give you the IP addresses of the containers for a given service .exercise[ - Obtain the IP addresses of the containers for the `rng` service: ```bash drill tasks.rng ``` ] This should list 5 IP addresses. --- class: extra-details, benchmarking ## Testing and benchmarking our service - We will check that the service is up with `rng`, then benchmark it with `ab` .exercise[ - Make a test request to the service: ```bash curl rng ``` - Open another window, and stop the workers, to test in isolation: ```bash docker service update dockercoins_worker --replicas 0 ``` ] Wait until the workers are stopped (check with `docker service ls`) before continuing. --- class: extra-details, benchmarking ## Benchmarking `rng` We will send 50 requests, but with various levels of concurrency. .exercise[ - Send 50 requests, with a single sequential client: ```bash ab -c 1 -n 50 http://rng/10 ``` - Send 50 requests, with fifty parallel clients: ```bash ab -c 50 -n 50 http://rng/10 ``` ] --- class: extra-details, benchmarking ## Benchmark results for `rng` - When serving requests sequentially, they each take 100ms - In the parallel scenario, the latency increased dramatically: - What about `hasher`? --- class: extra-details, benchmarking ## Benchmarking `hasher` We will do the same tests for `hasher`. The command is slightly more complex, since we need to post random data. First, we need to put the POST payload in a temporary file. .exercise[ - Generate 10 bytes of random data: ```bash curl http://rng/10 >/tmp/random ``` ] --- class: extra-details, benchmarking ## Benchmarking `hasher` Once again, we will send 50 requests, with different levels of concurrency. .exercise[ - Send 50 requests with a sequential client: ```bash ab -c 1 -n 50 -T application/octet-stream -p /tmp/random http://hasher/ ``` - Send 50 requests with 50 parallel clients: ```bash ab -c 50 -n 50 -T application/octet-stream -p /tmp/random http://hasher/ ``` ] --- class: extra-details, benchmarking ## Benchmark results for `hasher` - The sequential benchmarks takes ~5 seconds to complete - The parallel benchmark takes less than 1 second to complete - In both cases, each request takes a bit more than 100ms to complete - Requests are a bit slower in the parallel benchmark - It looks like `hasher` is better equipped to deal with concurrency than `rng` --- class: extra-details, title, benchmarking Why? --- class: extra-details, benchmarking ## Why does everything take (at least) 100ms? `rng` code: ![RNG code screenshot](images/delay-rng.png) `hasher` code: ![HASHER code screenshot](images/delay-hasher.png) --- class: extra-details, title, benchmarking But ... WHY?!? --- class: extra-details, benchmarking ## Why did we sprinkle the code with sleeps? - Deterministic performance
(regardless of instance speed, CPUs, I/O...) - Actual code sleeps all the time anyway - When your code makes a remote API call: - it sends a request; - it sleeps until it gets the response; - it processes the response. --- class: extra-details, in-person, benchmarking ## Why do `rng` and `hasher` behave differently? ![Equations on a blackboard](images/equations.png) (Synchronous vs. asynchronous event processing) --- class: extra-details ## Global scheduling → global debugging - Traditional approach: - log into a node - install our Swiss Army Knife (if necessary) - troubleshoot things - Proposed alternative: - put our Swiss Army Knife in a container (e.g. [nicolaka/netshoot](https://hub.docker.com/r/nicolaka/netshoot/)) - run tests from multiple locations at the same time (This becomes very practical with the `docker service log` command, available since 17.05.) --- ## More about overlay networks .blackbelt[DC17US: Deep Dive in Docker Overlay Networks ([video](https://www.youtube.com/watch?v=b3XDl0YsVsg&index=1&list=PLkA60AVN3hh-biQ6SCtBJ-WVTyBmmYho8))] .blackbelt[DC17EU: Deeper Dive in Docker Overlay Networks ([video](https://dockercon.docker.com/watch/XkRRA7Etsznv7uAk1UKsri))]