add doc for AlertmanagerClusterDown

2026-05-06 07:16:33 +00:00 · 2021-11-12 16:37:08 +01:00
parent 6dd0987395
commit cbf59bda0f
1 changed files with 28 additions and 0 deletions
--- a/content/runbooks/alertmanager/AlertmanagerClusterDown.md
+++ b/content/runbooks/alertmanager/AlertmanagerClusterDown.md
@@ -0,0 +1,28 @@
+---
+title: Alertmanager Cluster Down
+weight: 20
+---
+
+# AlertmanagerClusterDown
+
+## Meaning
+
+Half or more of the Alertmanager instances within the same cluster are down.
+
+## Impact
+
+The cluster is unstable. If the remaining instances also fail, you will lose the whole cluster.
+
+## Diagnosis
+
+Find out why the Alertmanager pods are not running.
+The pod `events` give a good overview of what happened.
+
+```bash
+$ kubectl get events --field-selector involvedObject.kind=Pod | grep alertmanager
+```
+
+## Mitigation
+
+There is no cheap way to mitigate this risk.
+Verifying new changes in a pre-production environment before rolling them out to production should improve stability.