mirror of
https://github.com/krkn-chaos/krkn.git
synced 2026-02-14 18:10:00 +00:00
Add support to alerting on metrics evaluation
This commit enables alerting in Kraken based on the Prometheus queries defined by the user and modifies the return code of the run to determine pass/fail for the run.
This commit is contained in:
@@ -36,19 +36,27 @@ It's important to make sure to check if the targeted component recovered from th
|
||||
### Performance monitoring
|
||||
Monitoring the Kubernetes/OpenShift cluster to observe the impact of Kraken chaos scenarios on various components is key to find out the bottlenecks as it's important to make sure the cluster is healthy in terms if both recovery as well as performance during/after the failure has been injected. Instructions on enabling it can be found [here](docs/performance_dashboards.md).
|
||||
|
||||
|
||||
### Scraping and storing metrics long term
|
||||
Kraken supports capturing metrics for the duration of the scenarios defined in the config and indexes then into Elasticsearch to be able to store and evaluate the state of the runs long term. The indexed metrics can be visualized with the help of Grafana. It uses [Kube-burner](https://github.com/cloud-bulldozer/kube-burner) under the hood. The metrics to capture need to be defined in a metrics profile which Kraken consumes to query prometheus ( installed by default in OpenShift ) with the start and end timestamp of the run. Information on enabling and leveraging this feature can be found [here](docs/metrics.md).
|
||||
|
||||
|
||||
### Alerts
|
||||
In addition to checking the recovery and health of the cluster and components under test, Kraken takes in a profile with the Prometheus expressions to validate and alerts, exits with a non-zero return code depending on the severity set. This feature can be used to determine pass/fail or alert on abnormalities observed in the cluster based on the metrics. Information on enabling and leveraging this feature can be found [here](docs/alerts.md).
|
||||
|
||||
|
||||
### Blogs and other useful resources
|
||||
- Blog post on introduction to Kraken: https://www.openshift.com/blog/introduction-to-kraken-a-chaos-tool-for-openshift/kubernetes
|
||||
- Discussion and demo on how Kraken can be leveraged to ensure OpenShift is reliable, performant and scalable: https://www.youtube.com/watch?v=s1PvupI5sD0&ab_channel=OpenShift
|
||||
- Blog post emphasizing the importance of making Chaos part of Performance and Scale runs to mimic the production environments: https://www.openshift.com/blog/making-chaos-part-of-kubernetes/openshift-performance-and-scalability-tests
|
||||
|
||||
|
||||
### Contributions
|
||||
We are always looking for more enhancements, fixes to make it better, any contributions are most welcome. Feel free to report or work on the issues filed on github.
|
||||
|
||||
[More information on how to Contribute](docs/contribute.md)
|
||||
|
||||
|
||||
### Community
|
||||
Key Members(slack_usernames): paigerube14, rook, mffiedler, mohit, dry923, rsevilla, ravielluri
|
||||
* [**#sig-scalability on Kubernetes Slack**](https://kubernetes.slack.com)
|
||||
|
||||
Reference in New Issue
Block a user