class: namespaces
name: namespaces

# Improving isolation with User Namespaces

- *Namespaces* are kernel mechanisms to compartmentalize the system

- There are different kinds of namespaces: `pid`, `net`, `mnt`, `ipc`, `uts`, and `user`

- For a primer, see "Anatomy of a Container"
  ([video](https://www.youtube.com/watch?v=sK5i-N34im8))
  ([slides](https://www.slideshare.net/jpetazzo/cgroups-namespaces-and-beyond-what-are-containers-made-from-dockercon-europe-2015))

- The *user namespace* allows mapping UIDs between the containers and the host

- As a result, `root` in a container can map to a non-privileged user on the host

Note: even without user namespaces, `root` in a container cannot go wild on the host.
It is mediated by capabilities, cgroups, namespaces, seccomp, LSMs...

---

class: namespaces

## User Namespaces in Docker

- Optional feature added in Docker Engine 1.10

- Not enabled by default

- Has to be enabled at Engine startup, and affects all containers

- When enabled, `UID:GID` in containers are mapped to a different range on the host

- Safer than switching to a non-root user (with `-u` or `USER`) in the container
  (with user namespaces, escalating to `root` in the container still only yields a non-privileged user on the host)

- Can be selectively disabled per container by starting it with `--userns=host`

---

class: namespaces

## User Namespaces Caveats

When user namespaces are enabled, containers cannot:

- Use the host's network namespace (with `docker run --network=host`)

- Use the host's PID namespace (with `docker run --pid=host`)

- Run in privileged mode (with `docker run --privileged`)

... Unless user namespaces are disabled for the container, with flag `--userns=host`

External volume and graph drivers that don't support user mapping might not work.

All containers are currently mapped to the same UID:GID range.

Some of these limitations might be lifted in the future!

---

class: namespaces

## Filesystem ownership details

When enabling user namespaces:

- the UID:GID on disk (in the images and containers) has to match the *mapped* UID:GID

- existing images and containers cannot work (their UID:GID would have to be changed)

For practical reasons, when enabling user namespaces, the Docker Engine places containers and images (and everything else) in a different directory.

As a result, if you enable user namespaces on an existing installation:

- all containers and images (and e.g. Swarm data) disappear

- *if a node is a member of a Swarm, it is then kicked out of the Swarm*

- everything will re-appear if you disable user namespaces again

---

class: namespaces

## Picking a node

- We will select a node where we will enable user namespaces

- This node will have to be re-added to the Swarm

- All containers and services running on this node will be rescheduled

- Let's make sure that we do not pick the node running the registry!

.exercise[

- Check on which node the registry is running:
  ```bash
  docker service ps registry
  ```

]

Pick any other node (noted `nodeX` in the next slides).
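
If you would rather script this check, here is a minimal sketch (it is not part of the original exercise). It assumes that it runs on a manager node. `candidate_nodes` and `list_candidates` are hypothetical helper names; the sketch relies on the `--filter` and `--format` options of `docker service ps` and `docker node ls`.

```bash
# Hypothetical helper: print the hostnames listed in $2 (one per line)
# that do not appear in $1 (one per line).
candidate_nodes() {
  printf '%s\n' "$2" | grep -vxF -- "$1"
}

# Hypothetical wrapper: has to run on a manager node.
list_candidates() {
  # Nodes currently running a task of the registry service
  registry_nodes=$(docker service ps registry \
    --filter desired-state=running --format '{{.Node}}')
  # All nodes known to the Swarm
  all_nodes=$(docker node ls --format '{{.Hostname}}')
  candidate_nodes "$registry_nodes" "$all_nodes"
}
```

Calling `list_candidates` should print the nodes that are not running the registry; any of them can be used as `nodeX`.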

---

class: namespaces

## Logging into the right Engine

.exercise[

- Log into the right node:
  ```bash
  ssh node`X`
  ```

]

---

class: namespaces

## Configuring the Engine

.exercise[

- Create a configuration file for the Engine:
  ```bash
  echo '{"userns-remap": "default"}' | sudo tee /etc/docker/daemon.json
  ```

- Restart the Engine:
  ```bash
  sudo kill $(pidof dockerd)
  ```

]

---

class: namespaces

## Checking that User Namespaces are enabled

.exercise[

- Notice the new Docker path:
  ```bash
  docker info | grep var/lib
  ```

- Notice the new UID:GID permissions:
  ```bash
  sudo ls -l /var/lib/docker
  ```

]

You should see a line like the following:
```
drwx------ 11 296608 296608 4096 Aug  3 05:11 296608.296608
```

---

class: namespaces

## Add the node back to the Swarm

.exercise[

- Get our manager token from another node:
  ```bash
  ssh node`Y` docker swarm join-token manager
  ```

- Copy-paste the join command and run it on the node

]

---

class: namespaces

## Check the new UID:GID

.exercise[

- Run a background container on the node:
  ```bash
  docker run -d --name lockdown alpine sleep 1000000
  ```

- Look at the processes in this container:
  ```bash
  docker top lockdown
  ps faux
  ```

]

---

class: namespaces

## Comparing on-disk ownership with/without User Namespaces

.exercise[

- Compare the output of the two following commands:
  ```bash
  docker run alpine ls -l /
  docker run --userns=host alpine ls -l /
  ```

]

--

class: namespaces

In the first case, it looks like things belong to `root:root`.

In the second case, we will see the "real" (on-disk) ownership.

--

class: namespaces

Remember to get back to `node1` when finished!
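
---

class: namespaces

## Sketch: from container UID to host UID

The on-disk ownership seen earlier follows from simple arithmetic: a UID inside the container is an offset into the range of host UIDs given to the Engine. The sketch below illustrates this with the `296608` value from the sample `ls` output. The range size of `65536` is an assumption (a common size for subordinate UID ranges), and `host_uid` is a hypothetical helper name, not a Docker command.

```bash
# Hypothetical helper: translate a UID inside the container into the UID
# seen on the host.
#   $1: first host UID of the mapped range
#   $2: size of the mapped range (assumed, check your own configuration)
#   $3: UID inside the container
host_uid() {
  local start=$1 count=$2 container_uid=$3
  if [ "$container_uid" -ge "$count" ]; then
    echo "UID $container_uid is outside the mapped range" >&2
    return 1
  fi
  echo $(( start + container_uid ))
}

host_uid 296608 65536 0      # root in the container → 296608
host_uid 296608 65536 1000   # UID 1000 in the container → 297608
```

This is why files owned by `root` in the container show up as owned by `296608` when looking at them from the host.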