mirror of https://github.com/krkn-chaos/krkn.git synced 2026-02-14 18:10:00 +00:00


Naga Ravi Chaitanya Elluri d1ae298692 Update the workflow

This commit modifies the workflow diagram to add pieces that are
leveraged to determine pass/fail of the chaos scenarios.

2021-06-23 12:41:38 -04:00


Kraken

Chaos and resiliency testing tool for Kubernetes and OpenShift. Kraken injects deliberate failures into Kubernetes/OpenShift clusters to check whether they are resilient to turbulent conditions.

Workflow

Installation and usage

Instructions on how to set up, configure, and run Kraken can be found at Installation.

Config

Instructions on how to set up the config and the options it supports can be found at Config.

Kubernetes/OpenShift chaos scenarios supported

Kraken supports pod, node, time/date, and Litmus-based scenarios.

Kraken scenario pass/fail criteria and report

It is important to check both that the targeted component recovered from the chaos injection and that the Kubernetes/OpenShift cluster as a whole is healthy, since a failure in one component can adversely affect others. Kraken does this by:

Having built-in checks for pod- and node-based scenarios to ensure the expected number of replicas and nodes are up. It also supports running custom scripts with the checks.
Leveraging Cerberus to monitor the cluster under test and consuming its aggregated go/no-go signal to determine pass/fail. It is highly recommended to turn on the Cerberus health check feature available in Kraken. Instructions on installing and setting up Cerberus can be found here. Once Cerberus is up and running, set cerberus_enabled to True in the Kraken config file, and set cerberus_url to the URL where Cerberus publishes the go/no-go signal.
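The two criteria above can be combined into a single pass/fail decision. The following is a minimal Python sketch, not Kraken's actual implementation: the Deployment class and all helper functions are hypothetical, and it assumes Cerberus publishes the plain string "True" (go) or "False" (no-go) at the address configured as cerberus_url.

```python
import urllib.request
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical snapshot of a workload's replica status."""
    name: str
    desired_replicas: int
    ready_replicas: int


def replicas_recovered(deployments):
    """Hypothetical stand-in for the built-in check: every workload is back to its expected replica count."""
    return all(d.ready_replicas >= d.desired_replicas for d in deployments)


def parse_cerberus_signal(body):
    """Interpret the go/no-go body (assumption: the string "True" means go)."""
    return body.strip() == "True"


def fetch_cerberus_signal(cerberus_url, timeout=10.0):
    """Fetch the go/no-go signal from the URL set as cerberus_url in the Kraken config."""
    with urllib.request.urlopen(cerberus_url, timeout=timeout) as resp:
        return parse_cerberus_signal(resp.read().decode("utf-8"))


def scenario_passed(deployments, cerberus_enabled, cerberus_signal=None):
    """Pass only if the target recovered and, when Cerberus is enabled, the cluster is a go."""
    if not replicas_recovered(deployments):
        return False
    if cerberus_enabled:
        return bool(cerberus_signal)
    return True
```

Checking the component-level recovery first keeps the cheap, local check ahead of the cluster-wide signal; either one failing marks the scenario as failed.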

Performance monitoring

Monitoring the Kubernetes/OpenShift cluster to observe the impact of Kraken chaos scenarios on various components is key to finding bottlenecks, since the cluster needs to stay healthy in terms of both recovery and performance during and after the failure injection. Instructions on enabling it can be found here.

Scraping and storing metrics long term

Kraken supports capturing metrics for the duration of the scenarios defined in the config and indexing them into Elasticsearch, so that the state of the runs can be stored and evaluated long term. The indexed metrics can be visualized with Grafana. Under the hood, this uses Kube-burner. The metrics to capture are defined in a metrics profile, which Kraken consumes to query Prometheus (installed by default in OpenShift) over the window between the start and end timestamps of the run. Information on enabling and leveraging this feature can be found here.
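To illustrate querying Prometheus with the run's start and end timestamps, here is a minimal Python sketch that turns a metrics profile into requests against the standard Prometheus /api/v1/query_range endpoint. Kraken itself delegates this to Kube-burner; the MetricQuery shape (a PromQL query plus a metric name) and the helper names are assumptions, and authentication (OpenShift's Prometheus typically requires a bearer token) is omitted.

```python
import json
import urllib.request
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class MetricQuery:
    """One metrics-profile entry (assumed shape: a PromQL query plus a metric name)."""
    query: str
    metric_name: str


def build_range_queries(prometheus_url, profile, start, end, step="30s"):
    """Return one /api/v1/query_range URL per profile entry, keyed by metric name.

    start and end are the run's Unix timestamps; step is the query resolution.
    """
    base = prometheus_url.rstrip("/") + "/api/v1/query_range"
    urls = {}
    for entry in profile:
        params = urlencode({"query": entry.query, "start": start, "end": end, "step": step})
        urls[entry.metric_name] = f"{base}?{params}"
    return urls


def fetch_series(url, timeout=30.0):
    """Run one range query and return the list of series from the JSON response."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["data"]["result"]
```

Each returned series could then be tagged with its metric name and indexed into Elasticsearch, which is the step Kube-burner handles in Kraken.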

Alerts

In addition to checking the recovery and health of the cluster and components under test, Kraken takes in a profile of Prometheus expressions to validate, alerts on them, and exits with a non-zero return code depending on the severity set. This feature can be used to determine pass/fail or to alert on abnormalities observed in the cluster based on the metrics. Information on enabling and leveraging this feature can be found here.
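The severity-driven exit code can be sketched as follows. This is a hypothetical illustration rather than Kraken's code: the Alert fields mirror the description above (an expression plus a severity), the evaluation of each expression against Prometheus is not shown, and treating "error" and "critical" as failing severities is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """One alert-profile entry (assumed fields: PromQL expr, description, severity)."""
    expr: str
    description: str
    severity: str  # assumed values: "info", "warning", "error", "critical"


# Assumption: only these severities should fail the run.
FAILING_SEVERITIES = {"error", "critical"}


def evaluate_alerts(alerts, fired):
    """Print every fired alert and return the process exit code.

    fired maps an alert's expr to whether that expression returned any
    series during the run.
    """
    exit_code = 0
    for alert in alerts:
        if not fired.get(alert.expr, False):
            continue
        print(f"[{alert.severity}] {alert.description}")
        if alert.severity in FAILING_SEVERITIES:
            exit_code = 1
    return exit_code
```

A caller would end the run with sys.exit(evaluate_alerts(profile, fired)), so a CI job sees a non-zero return code only when a failing-severity alert fires, while lower-severity alerts are still reported.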

Blogs and other useful resources

Blog post on introduction to Kraken: https://www.openshift.com/blog/introduction-to-kraken-a-chaos-tool-for-openshift/kubernetes
Discussion and demo on how Kraken can be leveraged to ensure OpenShift is reliable, performant and scalable: https://www.youtube.com/watch?v=s1PvupI5sD0&ab_channel=OpenShift
Blog post emphasizing the importance of making Chaos part of Performance and Scale runs to mimic the production environments: https://www.openshift.com/blog/making-chaos-part-of-kubernetes/openshift-performance-and-scalability-tests

Contributions

We are always looking for enhancements and fixes to make Kraken better, and any contributions are most welcome. Feel free to report issues or work on the ones filed on GitHub.

More information on how to Contribute

Community

Key Members (Slack usernames): paigerube14, rook, mffiedler, mohit, dry923, rsevilla, ravielluri

Description

Chaos and resiliency testing tool for Kubernetes with a focus on improving performance under failure conditions. A CNCF sandbox project.

chaos-engineering containers kubernetes performance reliability resiliency scalability testing

License: Apache-2.0