mirror of
https://github.com/prometheus-operator/runbooks.git
synced 2026-05-06 07:16:33 +00:00
add doc for AlertmanagerClusterDown
28
content/runbooks/alertmanager/AlertmanagerClusterDown.md
Normal file
@@ -0,0 +1,28 @@
---
title: Alertmanager Cluster Down
weight: 20
---

# AlertmanagerClusterDown

## Meaning

Half or more of the Alertmanager instances within the same cluster are down.
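
As a worked illustration of the "half or more" condition, the sketch below compares a replica count with the number of instances that are down. It is not the alert's rule expression, which Prometheus evaluates from its own metrics; the function name `half_or_more_down` and its two arguments are hypothetical and exist only for this example.

```bash
# Succeeds (exit 0) when at least half of the instances are down.
# $1 = total number of Alertmanager instances (hypothetical input)
# $2 = number of instances that are down (hypothetical input)
half_or_more_down() {
  [ "$1" -gt 0 ] && [ $(( $2 * 2 )) -ge "$1" ]
}

half_or_more_down 3 2 && echo "3 instances, 2 down: condition met"
half_or_more_down 3 1 || echo "3 instances, 1 down: condition not met"
half_or_more_down 2 1 && echo "2 instances, 1 down: condition met"
```

With three instances, losing one does not meet the condition, while losing two does. With two instances, losing either one already meets it.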

## Impact

The cluster is unstable. If the remaining instances also go down, you will lose the whole cluster.

## Diagnosis

Verify why pods are not running.
You can get the big picture from the `events`.

```bash
$ kubectl get events --field-selector involvedObject.kind=Pod | grep alertmanager
```
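
To narrow the events down to the affected pods, it can help to list which Alertmanager pods are not healthy. The following is a minimal sketch, not part of the original runbook: the helper name `not_running_pods` is hypothetical, and the `monitoring` namespace and `app.kubernetes.io/name=alertmanager` label in the usage comment are assumptions that may differ in your deployment.

```bash
# Print the names of pods that are not Running or not fully ready.
# Reads `kubectl get pods --no-headers` output on stdin, whose columns are
# NAME READY STATUS RESTARTS AGE (READY looks like "1/2").
not_running_pods() {
  awk '{
    split($2, ready, "/")                      # ready[1] = ready containers, ready[2] = total
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Usage (namespace and label selector are assumptions, adjust as needed):
#   kubectl get pods -n monitoring -l app.kubernetes.io/name=alertmanager --no-headers | not_running_pods
```

Each pod name it prints can then be inspected with `kubectl describe pod <name>` and `kubectl logs <name>` to see why it is not running.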

## Mitigation

There are no cheap options to mitigate this risk.
Verifying new changes in a pre-production environment before rolling them out to production should improve stability.