Merge pull request #201 from negz/patch-1

Document Draino remedy system
2026-05-09 02:37:18 +00:00 · 2018-09-05 02:22:33 -07:00
parent 94bd5b07ba 0868200b08
commit 9bb87cd209
1 changed file with 18 additions and 1 deletion
--- a/README.md
+++ b/README.md
@@ -26,7 +26,7 @@ stack, so Kubernetes will continue scheduling pods to the bad nodes.
 To solve this problem, we introduced this new daemon **node-problem-detector** to
 collect node problems from various daemons and make them visible to the upstream
 layers. Once upstream layers have the visibility to those problems, we can discuss the
-remedy system.
+[remedy system](#remedy-systems).

 # Problem API
 node-problem-detector uses `Event` and `NodeCondition` to report problems to
@@ -138,6 +138,23 @@ For example, to test [KernelMonitor](https://github.com/kubernetes/node-problem-
 - You can see more rule examples under [test/kernel_log_generator/problems](https://github.com/kubernetes/node-problem-detector/tree/master/test/kernel_log_generator/problems).
 - For [KernelMonitor](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json) message injection, all messages should have ```kernel: ``` prefix (also note there is a space after ```:```).

+# Remedy Systems
+A _remedy system_ is one or more processes that attempt to remedy problems
+detected by the node-problem-detector. Remedy systems observe events and/or node
+conditions emitted by the node-problem-detector and take action to return the
+Kubernetes cluster to a healthy state. The following remedy systems exist:
+
+* [**Draino**](https://github.com/negz/draino) automatically drains Kubernetes
+  nodes based on labels and node conditions. Nodes that match _all_ of the supplied
+  labels and _any_ of the supplied node conditions are cordoned (prevented from
+  accepting new pods) immediately, and
+  [drained](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/)
+  after a configurable time. Draino can be used in conjunction with the
+  [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
+  to automatically terminate drained nodes. Refer to
+  [this issue](https://github.com/kubernetes/node-problem-detector/issues/199)
+  for an example production use case for Draino.
+
 # Links
 * [Design Doc](https://docs.google.com/document/d/1cs1kqLziG-Ww145yN6vvlKguPbQQ0psrSBnEqpy0pzE/edit?usp=sharing)
 * [Slides](https://docs.google.com/presentation/d/1bkJibjwWXy8YnB5fna6p-Ltiy-N5p01zUsA22wCNkXA/edit?usp=sharing)
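
The Draino entry added in this change describes a selection rule: a node is acted on when it matches _all_ of the supplied labels and _any_ of the supplied node conditions. The Go sketch below illustrates only that rule. It is not Draino's code: the `Node` type and the functions `matchesAllLabels`, `hasAnyCondition`, and `shouldCordon` are hypothetical names invented for this example, and the label and condition values are illustrative.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a Kubernetes node:
// its labels plus the types of the node conditions currently set on it.
type Node struct {
	Labels     map[string]string
	Conditions []string
}

// matchesAllLabels reports whether the node carries every required label
// with the required value. An empty requirement matches every node.
func matchesAllLabels(n Node, required map[string]string) bool {
	for key, want := range required {
		if got, ok := n.Labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// hasAnyCondition reports whether at least one watched condition type
// is present on the node. An empty watch list matches no node.
func hasAnyCondition(n Node, watched []string) bool {
	for _, w := range watched {
		for _, c := range n.Conditions {
			if c == w {
				return true
			}
		}
	}
	return false
}

// shouldCordon applies the rule from the text: all labels AND any condition.
func shouldCordon(n Node, labels map[string]string, conditions []string) bool {
	return matchesAllLabels(n, labels) && hasAnyCondition(n, conditions)
}

func main() {
	// Illustrative selection criteria (assumed values, not defaults of any tool).
	labels := map[string]string{"pool": "workers"}
	conditions := []string{"KernelDeadlock", "ReadonlyFilesystem"}

	bad := Node{
		Labels:     map[string]string{"pool": "workers", "zone": "a"},
		Conditions: []string{"KernelDeadlock"},
	}
	healthy := Node{Labels: map[string]string{"pool": "workers"}}
	other := Node{
		Labels:     map[string]string{"pool": "system"},
		Conditions: []string{"KernelDeadlock"},
	}

	fmt.Println(shouldCordon(bad, labels, conditions))     // true: labels match, one condition set
	fmt.Println(shouldCordon(healthy, labels, conditions)) // false: no watched condition set
	fmt.Println(shouldCordon(other, labels, conditions))   // false: label value differs
}
```

In this sketch an empty label set matches every node, while an empty condition list matches none. That is a property of the example and may not reflect how Draino treats empty inputs.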