Expand SLOs profile to cover monitoring for more alerts

This commit:
- Also sets appropriate severity to avoid false failures for the
  test cases especially given that theses are monitored during the chaos
  vs post chaos. Critical alerts are all monitored post chaos with few
  monitored during the chaos that represent overall health and performance
  of the service.
- Renames Alerts to SLOs validation

Metrics reference: f09a492b13/cmd/kube-burner/ocp-config/alerts.yml
This commit is contained in:
Naga Ravi Chaitanya Elluri
2023-06-14 09:57:30 -04:00
parent 68dc17bc44
commit 0eb8d38596
3 changed files with 70 additions and 12 deletions

View File

@@ -1,16 +1,16 @@
## Alerts
## SLOs validation
Pass/fail based on metrics captured from the cluster is important in addition to checking the health status and recovery. Kraken supports:
### Checking for critical alerts
If enabled, the check runs at the end of each scenario and Kraken exits in case critical alerts are firing to allow user to debug. You can enable it in the config:
### Checking for critical alerts post chaos
If enabled, the check runs at the end of each scenario ( post chaos ) and Kraken exits in case critical alerts are firing to allow user to debug. You can enable it in the config:
```
performance_monitoring:
check_critical_alerts: False # When enabled will check prometheus for critical alerts firing post chaos
```
### Alerting based on the queries defined by the user
### Validation and alerting based on the queries defined by the user during chaos
Takes PromQL queries as input and modifies the return code of the run to determine pass/fail. It's especially useful in case of automated runs in CI where user won't be able to monitor the system. It uses [Kube-burner](https://kube-burner.readthedocs.io/en/latest/) under the hood. This feature can be enabled in the [config](https://github.com/redhat-chaos/krkn/blob/main/config/config.yaml) by setting the following:
```