container.training/slides/containers/Init_Systems.md

# Init systems and PID 1

In this chapter, we will consider:

- the role of PID 1 in the world of Docker,

- how to avoid some common pitfalls due to the misuse of init systems.

---

## What's an init system?

- On UNIX, the "init system" (or "init" in short) is PID 1.

- It is the first process started by the kernel when the system starts.

- It has multiple responsibilities:

  - start every other process on the machine,

  - reap orphaned zombie processes.

---

class: extra-details

## Orphaned zombie processes ?!?

- When a process exits (or "dies"), it becomes a "zombie".

  (Zombie processes show up in `ps` or `top` with the status code `Z`.)

- Its parent process must *reap* the zombie process.

  (This is done by calling `waitpid()` to retrieve the process' exit status.)

- When a process exits, if it has child processes, these processes are "orphaned."

- They are then re-parented to PID 1, init.

- Init therefore needs to take care of these orphaned processes when they exit.

---

## Don't use init systems in containers

- It's often tempting to use an init system or a process manager.

  (Examples: *systemd*, *supervisord*...)

- Our containers are then called "system containers".

  (By contrast with "application containers".)

- "System containers" are similar to lightweight virtual machines.

- They have multiple downsides:

  - when starting multiple processes, their logs get mixed on stdout,

  - if the application process dies, the container engine doesn't see it.

- Overall, they make it harder to operate troubleshoot containerized apps.

---

## Exceptions and workarounds

- Sometimes, it's convenient to run a real init system like *systemd*.

  (Example: a CI system whose goal is precisely to test an init script or unit file.)

- If we need to run multiple processes: can we use multiple containers?

  (Example: [this Compose file](https://github.com/jpetazzo/container.training/blob/master/compose/simple-k8s-control-plane/docker-compose.yaml) runs multiple processes together.)

- When deploying with Kubernetes:

  - a container belong to a pod,

  - a pod can have multiple containers.

---

## What about these zombie processes?

- Our application runs as PID 1 in the container.

- Our application may or may not be designed to reap zombie processes.

- If our application uses subprocesses and doesn't reap them ...

  ... this can lead to PID exhaustion!

  (Or, more realistically, to a confusing herd of zombie processes.)

- How can we solve this?

---

## Tini to the rescue

- Docker can automatically provide a minimal `init` process.

- This is enabled with `docker run --init ...`

- It uses a small init system ([tini](https://github.com/krallin/tini)) as PID 1:

  - it reaps zombies,

  - it forwards signals,

  - it exits when the child exits.

- It is totally transparent to our application.

- We should use it if our application creates subprocess but doesn't reap them.

---

class: extra-details

## What about Kubernetes?

- Kubernetes does not expose that `--init` option.

- However, we can achieve the same result with [Process Namespace Sharing](https://kubernetes.io/docs/tasks/configure-pod-container/share-process-namespace/).

- When Process Namespace Sharing is enabled, PID 1 will be `pause`.

- That `pause` process takes care of reaping zombies.

- Process Namespace Sharing is available since Kubernetes 1.16.

- If you're using an older version of Kubernetes ...

  ... you might have to add `tini` explicitly to your Docker image.