Commit Graph

43 Commits

Author SHA1 Message Date
Naga Ravi Chaitanya Elluri
dad4039f27 Add chaos testing guide github pages link
Chaos testing guide is hosted using github pages at https://cloud-bulldozer.github.io/krkn/.
This commit adds a pointer to the readme for reference.
2022-04-22 10:20:55 -04:00
Naga Ravi Chaitanya Elluri
eceb846844 Add krkn reference in the readme 2022-04-21 16:02:50 -04:00
Sanja Bonic
0bd543a339 Add build container step to PRs, fix typos (#226) 2022-04-12 18:46:26 +02:00
Naga Ravi Chaitanya Elluri
8c7b19d37d Add roadmap for Kraken
This commit adds a roadmap which walks through the features and enhancements that
are going to be added to Kraken in the immediate future in order to help users
understand where we need help as well as where the project is going.
2022-01-31 09:39:07 -05:00
Naga Ravi Chaitanya Elluri
f10538abcb Add chaos testing guide
This commit:
- Adds information around test methodology that needs to be embraced and
  best practices that an OpenShift cluster, platform and applications running
  on top of it should take into account for best user experience, performance,
  resilience and reliability.
- Adds test environment recommendations as to how and where to run chaos tests.
2022-01-06 16:17:32 -05:00
yogananth-subramanian
50dd9873c1 Node egress traffic shaping
This patch adds a scenario to create variations in the egress traffic of a node's interface using tc and netem.
2021-12-16 12:54:53 -05:00
Paige Rubendall
67b0f2de8c Adding in image and link to demo 2021-12-06 19:41:38 -05:00
Paige Rubendall
f17ad062cf Ci tests (#184)
* Adding in working ci tests

* spacing in readme
2021-11-24 15:12:47 -05:00
Alejandro Gullón
baa812b7f0 Added new scenario to fill up a given volume (#182)
* Added new scenario to fill up a given volume

* fixing small issues and style

* adding PVC as input param instead of pod name

* small fix

* get container name and volume name
replace oc with kubectl commands

* adding yaml file to create a pv, pvc and pod to run pvc_scenario

* adding support to match both strings for describe command when looking for pod_name

* added support to find the pvc from a given pod

* small fix

* small fix
2021-11-24 12:18:49 -05:00
Paige Rubendall
6b865fc573 Adding server set up for kraken 2021-10-25 08:58:46 -04:00
Naga Ravi Chaitanya Elluri
d3f8e2dd35 Bake in azure cli needed for node scenarios
This commit also modifies the key members for folks to reach out in case
of any questions.
2021-10-19 16:31:18 -04:00
Naga Ravi Chaitanya Elluri
cdf3bc03d2 Add support to block traffic to an application
This commit enables users to simulate a downtime of an application
by blocking the traffic for the specified duration to see how
it/other components communicating with it behave in case of downtime.
2021-10-01 10:13:40 -04:00
Naga Ravi Chaitanya Elluri
5da0b259c5 Run all the litmus resources in a single namespace
- This eases the usage and debuggability by running the fault injection pods in
  the same namespace as other resources of litmus. This will also ease the
  deletion process and ensure that there are no leftover objects on the cluster.

- This commit also enables users to use the same rbac template for all the litmus
  scenarios without having to pull in a specific one for each of the scenarios.
2021-09-08 16:37:07 -04:00
Naga Ravi Chaitanya Elluri
9d9f564a3d Add badge for the container image 2021-08-27 20:32:43 -04:00
Naga Ravi Chaitanya Elluri
07ccfbf0aa Add pointer to Kraken-hub
This enables users to run Kraken with minimal configuration tweaks
and makes it especially easy for CI use cases.
2021-08-23 14:33:16 -04:00
Naga Ravi Chaitanya Elluri
6456eec76a Add zone outage scenarios
This commit adds support to create zone outage in AWS by denying both
ingress and egress traffic to the instances belonging to a particular
subnet belonging to the zone by tweaking the network acl. This creates
an outage of all the nodes in the zone - both master and workers.
2021-08-17 11:43:13 -04:00
Naga Ravi Chaitanya Elluri
716057eab6 Monitor user application availability during chaos
The current Kraken integration with Cerberus monitors the cluster as well as the
application health post chaos and fails the run if they are not healthy after chaos.
This commit adds the ability to monitor the user application health during the chaos
and fails the run in case of downtime, since it would potentially mean downtime in a
customer's environment as well. It is especially useful for control plane
failure scenarios, including API server, Etcd, Ingress etc.
2021-07-27 13:15:57 -04:00
Naga Ravi Chaitanya Elluri
c0b9cb46da Improve error handling
This commit:
- Adds timeout to avoid operations hanging for long durations.
- Improves exception handling and exits wherever needed.
- Sets KUBECONFIG env var globally to access the cluster.
2021-07-21 12:48:06 -04:00
Paige Rubendall
f051c1c30f Merge pull request #120 from paigerube14/container_kill
Container kill
2021-07-15 15:07:58 -04:00
prubenda
76efac8f9b Adding delete of namespaces 2021-07-13 13:31:45 -04:00
prubenda
46a1823291 Adding killing of specific containers in pods 2021-07-08 17:10:48 -04:00
Naga Ravi Chaitanya Elluri
d7ba19c382 Automate the infrastructure pieces
This commit:
- Adds support to automate the infrastructure pieces leveraged by Kraken
  including Cerberus and Elasticsearch
- Adds a Kraken config that can be used to discover all the infra pieces
  automatically without having to tweak the configuration.
2021-07-07 15:52:26 -04:00
prubenda
5456fce924 Adding getting started docs 2021-06-23 13:58:43 -04:00
prubenda
41bf815f98 Adding shut down scenario for gcp, az, aws, openstack 2021-06-23 09:00:58 -04:00
Naga Ravi Chaitanya Elluri
e30a4243f6 Add support to alerting on metrics evaluation
This commit enables alerting in Kraken based on the Prometheus queries defined
by the user and modifies the return code of the run to determine pass/fail for
the run.
2021-06-22 15:22:37 -04:00
Naga Ravi Chaitanya Elluri
7e8f0450d6 Add support to scrape and index metrics
This commit:
- Enables Kraken to leverage kube-burner to scrape metrics from
  Prometheus and index them into Elasticsearch. This way we can
  take a look at the metrics in Grafana long term even after the
  cluster is terminated.
- Enables separation of operations based on distribution with
  OpenShift as the default option. One of the use cases is to
  capture Prometheus instance details as it's installed by default
  while it's optional for Kubernetes.
2021-06-21 14:55:50 -04:00
Naga Ravi Chaitanya Elluri
5c2453b07e Refactor code base
This commit:
- Refactors the code base to be more modular by moving functions
  into respective modules to make it lean and reusable.
- Uses black to reformat the code to follow PEP 8 practices.
2021-06-14 17:41:10 -04:00
Amit Sagtani
d00d6ec69e Install pre-commit and use GitHub Actions (#94)
* added pre-commit and code-cleaning

* removed tox and TravisCI
2021-05-05 09:53:45 -04:00
Naga Ravi Chaitanya Elluri
70b14956c7 Docs: Add pointer to the litmus based scenarios 2021-05-03 10:01:07 -04:00
Naga Ravi Chaitanya Elluri
db42f054ba Add pointer to the new blog
This commit:
- Adds a pointer to a new blog which emphasizes the importance
  of making chaos part of Perf/Scale test runs.
- Bumps up the allowed max-line-length for the linters.
2021-03-22 19:37:20 -04:00
Naga Ravi Chaitanya Elluri
576227189d Fix the link in the docs 2021-03-16 09:00:49 -04:00
prubenda
387d6921a6 adding contribute doc 2021-02-25 10:33:55 -05:00
Naga Ravi Chaitanya Elluri
a7e28ca490 Add support to deploy performance dashboards
This commit enables performance monitoring on the cluster when
running Kraken to be able to observe how the cluster reacts to failures,
as it's important to make sure the cluster is healthy in terms of
both recovery and performance.
2021-02-10 16:06:55 -05:00
Naga Ravi Chaitanya Elluri
12201a32c7 Add pointers to helpful resources around Kraken 2021-02-02 21:17:49 -05:00
prubenda
6f31519e5f adding time scenario 2020-10-27 08:37:54 -04:00
Naga Ravi Chaitanya Elluri
82743230fe Modify documentation to improve readability
This commit:
- Converts various sections in the readme into individual documents.
- Adds pointers to the public blogs.
- Updates workflow/architecture diagram.
- Adds community info and contributing guidelines.
2020-10-21 15:01:54 -04:00
Yashashree Suresh
31f06b861a Added node scenarios to stop and terminate instance
This commit:
- Adds a node scenario to stop and start an instance
- Adds a node scenario to terminate an instance
- Adds a node scenario to reboot an instance
- Adds a node scenario to stop the kubelet
- Adds a node scenario to crash the node
2020-08-27 16:50:42 -04:00
Yashashree Suresh
c033aa434e Added support to kill prometheus pods 2020-08-13 10:30:04 -04:00
prubenda
9958a9753b Adding build own readme and linking 2020-08-11 12:28:20 -04:00
Naga Ravi Chaitanya Elluri
eec52cf613 Containerize kraken
This commit adds support to run the tool as a container on the host
with access to kubeconfig for better portability. The plan is to
trigger regular image builds on quay.io to make sure it has the
latest code.
2020-04-27 22:29:15 -04:00
Yashashree Suresh
f1c145e942 Integrated cerberus for checking cluster health 2020-04-22 23:30:21 -04:00
Naga Ravi Chaitanya Elluri
b745a0404f Update readme
This commit updates readme with the following:
- Information on how to use the tool.
- Information on adding new scenarios.
- Information on using Cerberus tool for pass/fail.
2020-04-20 11:44:29 -04:00
Naga Ravi Chaitanya Elluri
ae6c9b87e9 Initial commit 2020-04-19 15:33:55 -04:00