# Orchestration, an overview

In this chapter, we will:

* Explain what orchestration is and why we would need it.

* Present (from a high-level perspective) some orchestrators.

---

class: pic

## What's orchestration?

![Joana Carneiro (orchestra conductor)](images/conductor.jpg)

---

## What's orchestration?

According to Wikipedia:

*Orchestration describes the __automated__ arrangement,
coordination, and management of complex computer systems,
middleware, and services.*

--

*[...] orchestration is often discussed in the context of
__service-oriented architecture__, __virtualization__, provisioning,
Converged Infrastructure and __dynamic datacenter__ topics.*

--

What does that really mean?

---

## Example 1: dynamic cloud instances

--

- Q: do we always use 100% of our servers?

--

- A: obviously not!

.center[![Daily variations of traffic](images/traffic-graph.png)]

---

## Example 1: dynamic cloud instances

- Every night, scale down

  (by shutting down extraneous replicated instances)

- Every morning, scale up

  (by deploying new copies)

- "Pay for what you use"

  (i.e. save big $$$ here)

---

## Example 1: dynamic cloud instances

How do we implement this?

- Crontab

- Autoscaling (save even bigger $$$)

That's *relatively* easy.

Now, how are things for our IaaS provider?

---

## Example 2: dynamic datacenter

- Q: what's the #1 cost in a datacenter?

--

- A: electricity!

--

- Q: what uses electricity?

--

- A: servers, obviously

- A: ... and associated cooling

--

- Q: do we always use 100% of our servers?

--

- A: obviously not!

---

## Example 2: dynamic datacenter

- If only we could turn off unused servers during the night...

- Problem: we can only turn off a server if it's totally empty!

  (i.e. all VMs on it are stopped/moved)

- Solution: *migrate* VMs and shut down empty servers

  (e.g. combine two hypervisors with 40% load into 80%+0%,
  and shut down the one at 0%)

---

## Example 2: dynamic datacenter

How do we implement this?

- Shut down empty hosts (but keep some spare capacity)

- Start hosts again when capacity gets low

- Ability to "live migrate" VMs

  (Xen already did this 10+ years ago)

- Rebalance VMs on a regular basis

  - what if a VM is stopped while we move it?
  - should we allow provisioning on hosts involved in a migration?

*Scheduling* becomes more complex.

---

## What is scheduling?

According to Wikipedia (again):

*In computing, scheduling is the method by which threads,
processes or data flows are given access to system resources.*

The scheduler is concerned mainly with:

- throughput (total amount of work done per time unit);
- turnaround time (between submission and completion);
- response time (between submission and start);
- waiting time (between job readiness and execution);
- fairness (appropriate times according to priorities).

In practice, these goals often conflict.

**"Scheduling" = decide which resources to use.**

---

## Exercise 1

- You have:

  - 5 hypervisors (physical machines)

- Each server has:

  - 16 GB RAM, 8 cores, 1 TB disk

- Each week, your team requests:

  - one VM with X RAM, Y CPU, Z disk

Scheduling = deciding which hypervisor to use for each VM.

Difficulty: easy!

---

## Exercise 2

- You have:

  - 1000+ hypervisors (and counting!)

- Each server has different resources:

  - 8-500 GB of RAM, 4-64 cores, 1-100 TB disk

- Multiple times a day, a different team asks for:

  - up to 50 VMs with different characteristics

Scheduling = deciding which hypervisor to use for each VM.

Difficulty: ???

---

## Exercise 2

- You have:

  - 1000+ hypervisors (and counting!)

- Each server has different resources:

  - 8-500 GB of RAM, 4-64 cores, 1-100 TB disk

- Multiple times a day, a different team asks for:

  - up to 50 VMs with different characteristics

Scheduling = deciding which hypervisor to use for each VM.

![Troll face](images/trollface.png)

---

## Exercise 3

- You have machines (physical and/or virtual)

- You have containers

- You are trying to put the containers on the machines

- Sounds familiar?

---

class: pic

## Scheduling with one resource

.center[![Not-so-good bin packing](images/binpacking-1d-1.gif)]

## We can't fit a job of size 6 :(

---

class: pic

## Scheduling with one resource

.center[![Better bin packing](images/binpacking-1d-2.gif)]

## ... Now we can!

---

class: pic

## Scheduling with two resources

.center[![2D bin packing](images/binpacking-2d.gif)]

---

class: pic

## Scheduling with three resources

.center[![3D bin packing](images/binpacking-3d.gif)]

---

class: pic

## You need to be good at this

.center[![Tangram](images/tangram.gif)]

---

class: pic

## But also, you must be quick!

.center[![Tetris](images/tetris-1.png)]

---

class: pic

## And be web scale!

.center[![Big tetris](images/tetris-2.gif)]

---

class: pic

## And think outside (?) of the box!

.center[![3D tetris](images/tetris-3.png)]

---

class: pic

## Good luck!

.center[![FUUUUUU face](images/fu-face.jpg)]

---

## TL;DR

* Scheduling with multiple resources (dimensions) is hard.

* Don't expect to solve the problem with a Tiny Shell Script.

* There are tons of research papers written on this.

---

## But our orchestrator also needs to manage ...

* Network connectivity (or filtering) between containers.

* Load balancing (external and internal).

* Failure recovery (if a node or a whole datacenter fails).

* Rolling out new versions of our applications.

  (Canary deployments, blue/green deployments...)

---

## Some orchestrators

We are going to briefly present a few orchestrators.

There is no "absolute best" orchestrator.

It depends on:

- your applications,

- your requirements,

- your pre-existing skills...

---

## Nomad

- Open Source project by HashiCorp.

- Arbitrary scheduler (not just for containers).

- Great if you want to schedule mixed workloads.

  (VMs, containers, processes...)

- Less integration with the rest of the container ecosystem.

---

## Mesos

- Open Source project under the Apache Software Foundation.

- Arbitrary scheduler (not just for containers).

- Two-level scheduler.

- Top-level scheduler acts as a resource broker.

- Second-level schedulers (aka "frameworks") obtain resources from top-level.

- Frameworks implement various strategies.

  (Marathon = long running processes; Chronos = run at intervals; ...)

- Commercial offering through DC/OS by Mesosphere.

---

## Rancher

- Rancher 1 offered a simple interface for Docker hosts.

- Rancher 2 is a complete management platform for Docker and Kubernetes.

- Technically not an orchestrator, but it's a popular option.

---

## Swarm

- Tightly integrated with the Docker Engine.

- Extremely simple to deploy and set up, even in multi-manager (HA) mode.

- Secure by default.

- Strongly opinionated:

  - smaller set of features,

  - easier to operate.

---

## Kubernetes

- Open Source project initiated by Google.

- Contributions from many other actors.

- *De facto* standard for container orchestration.

- Many deployment options; some of them very complex.

- Reputation: steep learning curve.

- Reality:

  - true, if we try to understand *everything*;

  - false, if we focus on what matters.

???

:EN:- Orchestration overview
:FR:- Survol de techniques d'orchestration
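
---

## Appendix: a bin packing sketch

Going back to "Scheduling with one resource": here is a minimal sketch of the *first-fit decreasing* heuristic, assuming identical machines and a single resource per job.

The function name `first_fit_decreasing` and the job sizes are made up for illustration; this is not how any of the orchestrators above actually schedule.

```python
def first_fit_decreasing(jobs, capacity):
    """Pack jobs onto identical machines using first-fit decreasing.

    Hypothetical helper for illustration: one resource, no priorities,
    no constraints. Returns a list of machines, each a list of job sizes.
    """
    machines = []  # each entry is the list of job sizes placed on one machine
    for job in sorted(jobs, reverse=True):  # place the biggest jobs first
        if job > capacity:
            raise ValueError(f"job of size {job} cannot fit on any machine")
        for placed in machines:
            if sum(placed) + job <= capacity:
                placed.append(job)  # first machine with enough room wins
                break
        else:
            machines.append([job])  # no machine had room: start a new one
    return machines


if __name__ == "__main__":
    # Sample job sizes, chosen arbitrarily
    print(first_fit_decreasing([4, 6, 2, 5, 3], capacity=10))
```

With one resource, sorting the jobs first is often enough to get a decent packing. With two or three resources (RAM, CPU, disk), there is no single obvious sort order, which is one reason real schedulers are more involved.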