add doc for AlertmanagerClusterDown

Tigran Tch
2021-11-12 16:37:08 +01:00
parent 6dd0987395
commit cbf59bda0f


---
title: Alertmanager Cluster Down
weight: 20
---
# AlertmanagerClusterDown
## Meaning
Half or more of the Alertmanager instances within the same cluster are down.
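
The "half or more" condition can be illustrated with a small shell function. This is a sketch, not the actual alerting rule. The function name `cluster_down` is hypothetical, and it assumes you already know how many instances exist and how many are up. It checks `down / total >= 1/2` as `2 * down >= total` so it stays in integer arithmetic.

```bash
# Hypothetical helper that mirrors the alert's "half or more down" condition.
# It is NOT the real alerting rule; it only shows the arithmetic.
# Usage: cluster_down TOTAL_INSTANCES UP_INSTANCES
# Returns 0 (true) when half or more of the instances are down.
cluster_down() {
  am_total=$1
  am_up=$2
  am_down=$((am_total - am_up))
  # down / total >= 1/2 is checked as 2 * down >= total (integer arithmetic)
  [ "$am_total" -gt 0 ] && [ $((2 * am_down)) -ge "$am_total" ]
}

# Example: 3 replicas with only 1 up means 2 of 3 are down, so the condition holds
if cluster_down 3 1; then
  echo "AlertmanagerClusterDown condition met"
fi
```

With 2 replicas, a single failed instance is already "half", so the condition holds. This is why small clusters trip this alert easily.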
## Impact
The Alertmanager cluster is degraded and unstable. If the remaining instances fail as well, you lose the whole cluster and no alert notifications are delivered.
## Diagnosis
Find out why the Alertmanager pods are not running.
Cluster events give a quick overview:
```bash
# List pod events in all namespaces and keep only those about Alertmanager pods
kubectl get events --all-namespaces --field-selector involvedObject.kind=Pod | grep alertmanager
```
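To narrow down which instances are affected, you can filter the pod list for Alertmanager pods that are not `Running` or not fully ready. The sketch below assumes the default `kubectl get pods` columns (`NAME READY STATUS RESTARTS AGE`, without `--all-namespaces`). The function name `unhealthy_alertmanagers`, the `monitoring` namespace and the `alertmanager` container name are assumptions; adjust them to your deployment.

```bash
# Hypothetical helper: reads `kubectl get pods` output (default columns
# NAME READY STATUS RESTARTS AGE) on stdin and prints Alertmanager pods
# that are not Running or whose containers are not all ready.
unhealthy_alertmanagers() {
  awk 'NR > 1 && $1 ~ /alertmanager/ {
    split($2, ready, "/")   # READY looks like "1/2": ready[1] of ready[2] containers
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Usage (namespace "monitoring" is an assumption):
#   kubectl get pods -n monitoring | unhealthy_alertmanagers
# Then inspect each reported pod (container name "alertmanager" is an assumption):
#   kubectl describe pod <pod-name> -n monitoring
#   kubectl logs <pod-name> -n monitoring -c alertmanager
```

Checking the `READY` column as well as `STATUS` also catches pods that are `Running` but have a failing container, which would otherwise look healthy.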
## Mitigation
There is no cheap way to mitigate this risk.
Validating changes in a pre-production environment before rolling them out to production should improve stability.