Commit Graph

65 Commits

Author SHA1 Message Date
Tullio Sebastiani
724068a978 Chaos recommender refactoring (#516)
* basic structure working

* config and options refactoring

nits and changes

* removed unused function with typo + fixed duration

* removed unused arguments

* minor fixes
2023-10-30 15:51:09 +01:00
Naga Ravi Chaitanya Elluri
fc6344176b Add pointer to the CNCF sandbox discussion (#517)
Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
2023-10-24 16:07:40 -04:00
Mudit Verma
5953e53b46 chaos recommendation entry in README (#510) 2023-10-16 11:26:32 -04:00
Naga Ravi Chaitanya Elluri
eb2eabe029 Update community slack channel 2023-10-06 17:47:25 -04:00
Paige Rubendall
f7f1b2dfb0 Service disruption (#494)
* adding service disruption

* fixing kill services

* service log changes

* removing extra logging

* adding daemon set

* adding service disruption name changes

* cerberus config back

* bad string
2023-10-06 12:51:10 -04:00
Pratyusha Thammineni
3a66f8a5a3 Added Docker image build workflow status badge
This allows users to track the docker-build action in README.md
without navigating to the Actions tab on GitHub
2023-09-11 15:16:28 -04:00
yogananth-subramanian
b2b5002f45 Pod egress network shaping Chaos scenario
The scenario introduces network latency, packet loss, and bandwidth restriction in the Pod's network interface.
The purpose of this scenario is to observe faults caused by random variations in the network.

The example config below applies egress traffic shaping to the OpenShift console.
````
- id: pod_egress_shaping
  config:
    namespace: openshift-console   # Required - Namespace of the pod to which the filter needs to be applied.
    label_selector: 'component=ui' # Applies traffic shaping to the OpenShift console pods.
    network_params:
        latency: 500ms             # Add 500ms latency to egress traffic from the pod.
````
2023-08-08 11:45:03 -04:00
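The `pod_egress_shaping` entry in the commit message above can be read as a plain mapping. What follows is a minimal sketch of checking such an entry before use. The helper `validate_pod_egress_config` and the duration pattern are hypothetical and not taken from the Krkn code base; only the keys shown in the example (`namespace`, `label_selector`, `network_params.latency`) are assumed.

```python
import re

# Hypothetical pattern for values such as "500ms" or "2s"; the formats the
# scenario really accepts are not stated in the commit message.
DURATION_RE = re.compile(r"\d+(ms|s)")


def validate_pod_egress_config(entry):
    """Return a list of problems found in one scenario entry (empty if none)."""
    problems = []
    config = entry.get("config", {})
    # The example config marks namespace as required.
    if not config.get("namespace"):
        problems.append("config.namespace is required")
    latency = config.get("network_params", {}).get("latency")
    if latency is not None and not DURATION_RE.fullmatch(str(latency)):
        problems.append(f"unexpected latency format: {latency!r}")
    return problems


# The same values as the example config, written as a Python dict.
example = {
    "id": "pod_egress_shaping",
    "config": {
        "namespace": "openshift-console",
        "label_selector": "component=ui",
        "network_params": {"latency": "500ms"},
    },
}
print(validate_pod_egress_config(example))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a scenario file at once.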
Naga Ravi Chaitanya Elluri
0eb8d38596 Expand SLOs profile to cover monitoring for more alerts
This commit:
- Also sets appropriate severity to avoid false failures for the
  test cases, especially given that these are monitored during the chaos
  vs post chaos. Critical alerts are all monitored post chaos, with a few
  monitored during the chaos that represent the overall health and performance
  of the service.
- Renames Alerts to SLOs validation

Metrics reference: f09a492b13/cmd/kube-burner/ocp-config/alerts.yml
2023-06-14 16:58:36 -04:00
Naga Ravi Chaitanya Elluri
572eeefaf4 Minor fixes
This commit fixes a few typos and duplicate logs
2023-06-12 21:05:27 -04:00
Naga Ravi Chaitanya Elluri
54ea98be9c Add enhancements being planned as part of the roadmap (#425) 2023-05-24 14:36:59 -04:00
Naga Ravi Chaitanya Elluri
d9f4607aa6 Add blogs and update roadmap 2023-05-15 11:50:16 -04:00
Tullio Sebastiani
83b811bee4 Arcaflow stress-ng hogs with parallelism support (#418)
* kubeconfig management for arcaflow + hogs scenario refactoring

  * kubeconfig authentication parsing refactored to support arcaflow kubernetes deployer
  * reimplemented all the hog scenarios to allow multiple parallel containers of the same scenarios
  (e.g. to stress two or more nodes in the same run simultaneously)
  * updated documentation
* removed sysbench scenarios


* recovered cpu hogs


* updated requirements.txt


* updated config.yaml

* added gitleaks file for test fixtures

* imported sys and logging

* removed config_arcaflow.yaml

* updated readme

* refactored arcaflow documentation entrypoint
2023-05-15 09:45:16 -04:00
Tullio Sebastiani
3627b5ba88 cpu hog scenario + basic arcaflow documentation (#391)
typo


typo


updated documentation


fixed workflow map issue
2023-03-15 16:52:20 +01:00
Paige Rubendall
93686ca736 new quay image reference 2023-01-31 17:21:45 -05:00
José Castillo Lema
d76ab31155 OCM/ACM integration (#370)
* OCM support for ManagedClusters

* Updated docs and general adjustments

* Improved docs

* Improved docs2

* Removed io packet import

Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>

* Removed time from imports

Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>

* Removed duplicate logging import

Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>

* Removed sys import

Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>

* Update run.py

Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>

Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>
2023-01-10 08:58:17 -05:00
Sandro Bonazzola
d0d289fb7c update references to github organization
Updated references from chaos-kubox to redhat-chaos.

Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>
2022-09-02 14:38:25 +02:00
Shreyas Anantha Ramaprasad
9421a0c2c2 Added support for ingress traffic shaping (#299)
* Added plugin for ingress network traffic shaping

* Documentation changes

* Minor changes

* Documentation and formatting fixes

* Added trap to sleep infinity command running in containers

* Removed shell injection threat for modprobe commands

* Added docstrings to cerberus functions

* Added checks to prevent shell injection

* Bug fix
2022-09-02 07:54:11 +02:00
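The commit above mentions removing a shell injection threat for modprobe commands, without saying how. One common approach, sketched here under that caveat, is to validate the value against an allow-list and pass arguments as a list so no shell ever parses them. `modprobe_argv` and the name pattern are hypothetical.

```python
import re

# Conservative, hypothetical allow-list for kernel module names.
MODULE_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def modprobe_argv(module):
    """Return an argv list for modprobe, rejecting suspicious module names."""
    if not MODULE_NAME_RE.fullmatch(module):
        raise ValueError(f"refusing unsafe module name: {module!r}")
    # A list (rather than a single shell string) means no shell parses the value.
    return ["modprobe", module]


print(modprobe_argv("sch_netem"))
```

The returned list is meant to be handed to something like `subprocess.run(argv)` without `shell=True`.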
Sam Doran
f4bc30d2a1 Update README (#284)
* Update link to documentation

* Update container status badge and link

Use the correct link to the status badge on Quay.
2022-08-07 02:20:32 -04:00
Naga Ravi Chaitanya Elluri
9208f39e06 Add support to run on Kubernetes
This commit:
- Leverages distribution flag in the config set by the user to skip
  things not supported on OpenShift to be able to run scenarios on
  Kubernetes.
- Adds sample config and scenario files that work on Kubernetes.
2022-06-01 07:27:06 -05:00
gsteeds
6280a39250 Fixed links within docs, read through the docs files, and corrected some spelling and grammar issues. 2022-05-04 09:35:50 +02:00
Naga Ravi Chaitanya Elluri
9a087de8e9 Add Krkn logo (#230)
Credits: Thanks to Kaliq Ray for designing the logo.

Fixes https://github.com/cloud-bulldozer/krkn/issues/195

Co-authored-by: Sanja <86982064+sanjacodes@users.noreply.github.com>
2022-05-03 14:48:38 -04:00
Paige Rubendall
c1fb82e245 adding new quay repo 2022-04-25 10:25:51 -04:00
Naga Ravi Chaitanya Elluri
dad4039f27 Add chaos testing guide github pages link
Chaos testing guide is hosted using github pages at https://cloud-bulldozer.github.io/krkn/.
This commit adds a pointer to the readme for reference.
2022-04-22 10:20:55 -04:00
Naga Ravi Chaitanya Elluri
eceb846844 Add krkn reference in the readme 2022-04-21 16:02:50 -04:00
Sanja Bonic
0bd543a339 Add build container step to PRs, fix typos (#226) 2022-04-12 18:46:26 +02:00
Naga Ravi Chaitanya Elluri
8c7b19d37d Add roadmap for Kraken
This commit adds a roadmap which walks through the features and enhancements that
are going to be added to Kraken in the immediate future in order to help users
understand where we need help as well as where the project is going.
2022-01-31 09:39:07 -05:00
Naga Ravi Chaitanya Elluri
f10538abcb Add chaos testing guide
This commit:
- Adds information around test methodology that needs to be embraced and
  best practices that an OpenShift cluster, platform and applications running
  on top of it should take into account for best user experience, performance,
  resilience and reliability.
- Adds test environment recommendations as to how and where to run chaos tests.
2022-01-06 16:17:32 -05:00
yogananth-subramanian
50dd9873c1 Node egress traffic shaping
Patch adds a scenario to create variations in the egress traffic of a Node's interface using tc and netem.
2021-12-16 12:54:53 -05:00
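For the node egress commit above, a rough sketch of how shaping parameters could be turned into a `tc` netem invocation. `delay`, `loss` and `rate` are netem options; the config key names (`latency`, `loss`, `bandwidth`), the interface name, and the helper `build_netem_command` are assumptions for illustration, not the scenario's actual code.

```python
import shlex

# Hypothetical config keys mapped to the netem options they would set.
NETEM_OPTIONS = {"latency": "delay", "loss": "loss", "bandwidth": "rate"}


def build_netem_command(interface, network_params):
    """Build the argv for `tc qdisc add dev <interface> root netem ...`."""
    argv = ["tc", "qdisc", "add", "dev", interface, "root", "netem"]
    for key, option in NETEM_OPTIONS.items():
        if key in network_params:
            argv += [option, str(network_params[key])]
    return argv


cmd = build_netem_command("eth0", {"latency": "500ms", "loss": "1%"})
# shlex.join renders the argv as a single shell-safe string for logging.
print(shlex.join(cmd))
```

The command is only built and printed here; running it requires root privileges on the target node.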
Paige Rubendall
67b0f2de8c Adding in image and link to demo 2021-12-06 19:41:38 -05:00
Paige Rubendall
f17ad062cf Ci tests (#184)
* Adding in working ci tests

* spacing in readme
2021-11-24 15:12:47 -05:00
Alejandro Gullón
baa812b7f0 Added new scenario to fill up a given volume (#182)
* Added new scenario to fill up a given volume

* fixing small issues and style

* adding PVC as input param instead of pod name

* small fix

* get container name and volume name
replace oc with kubectl commands

* adding yaml file to create a pv, pvc and pod to run pvc_scenario

* adding support to match both strings for describe command when looking for pod_name

* added support to find the pvc from a given pod

* small fix

* small fix
2021-11-24 12:18:49 -05:00
Paige Rubendall
6b865fc573 Adding server set up for kraken 2021-10-25 08:58:46 -04:00
Naga Ravi Chaitanya Elluri
d3f8e2dd35 Bake in azure cli needed for node scenarios
This commit also modifies the key members for folks to reach out in case
of any questions.
2021-10-19 16:31:18 -04:00
Naga Ravi Chaitanya Elluri
cdf3bc03d2 Add support to block traffic to an application
This commit enables users to simulate a downtime of an application
by blocking the traffic for the specified duration to see how
it/other components communicating with it behave in case of downtime.
2021-10-01 10:13:40 -04:00
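The commit above does not say how traffic is blocked. One standard Kubernetes mechanism is a NetworkPolicy that selects the application's pods and lists no ingress rules. A minimal sketch of building such a manifest follows; the function name, policy name, namespace and labels are all placeholders.

```python
import json


def deny_ingress_policy(name, namespace, pod_labels):
    """Build a NetworkPolicy manifest that allows no ingress to matching pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": pod_labels},
            # Listing "Ingress" with no ingress rules denies all inbound traffic.
            "policyTypes": ["Ingress"],
        },
    }


policy = deny_ingress_policy("block-app", "my-namespace", {"app": "my-app"})
print(json.dumps(policy, indent=2))
```

Applying the manifest for the chosen duration and then deleting it would restore traffic, which matches the "specified duration" behavior the commit describes.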
Naga Ravi Chaitanya Elluri
5da0b259c5 Run all the litmus resources in a single namespace
- This eases the usage and debuggability by running the fault injection pods in
  the same namespace as other resources of litmus. This will also ease the
  deletion process and ensure that there are no leftover objects on the cluster.

- This commit also enables users to use the same rbac template for all the litmus
  scenarios without having to pull in a specific one for each of the scenarios.
2021-09-08 16:37:07 -04:00
Naga Ravi Chaitanya Elluri
9d9f564a3d Add badge for the container image 2021-08-27 20:32:43 -04:00
Naga Ravi Chaitanya Elluri
07ccfbf0aa Add pointer to Kraken-hub
This enables users to run Kraken with minimal configuration tweaks
and makes it easy to use, especially for CI use cases.
2021-08-23 14:33:16 -04:00
Naga Ravi Chaitanya Elluri
6456eec76a Add zone outage scenarios
This commit adds support to create a zone outage in AWS by denying both
ingress and egress traffic to the instances in a particular
subnet of the zone by tweaking the network ACL. This creates
an outage of all the nodes in the zone - both master and workers.
2021-08-17 11:43:13 -04:00
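As an illustration of "denying both ingress and egress traffic" through a network ACL, the sketch below builds one deny-all entry per direction. The dictionary keys mirror the parameters of the EC2 `CreateNetworkAclEntry` API; the helper name, rule number and ACL ID are placeholders, and the commit may well tweak the ACL differently (for example by swapping in a different ACL).

```python
def deny_all_entries(network_acl_id, rule_number=1):
    """Return deny-all ACL entry parameters for ingress and egress."""
    entries = []
    for egress in (False, True):  # False = ingress rule, True = egress rule
        entries.append(
            {
                "NetworkAclId": network_acl_id,
                "RuleNumber": rule_number,  # low numbers are evaluated first
                "Protocol": "-1",  # -1 means all protocols
                "RuleAction": "deny",
                "Egress": egress,
                "CidrBlock": "0.0.0.0/0",
            }
        )
    return entries


# Placeholder ACL ID; nothing is sent to AWS in this sketch.
for entry in deny_all_entries("acl-0123456789abcdef0"):
    print(entry)
```

Each dictionary could be passed as keyword arguments to an EC2 client call; that step is left out so the sketch runs without credentials.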
Naga Ravi Chaitanya Elluri
716057eab6 Monitor user application availability during chaos
The current Kraken integration with Cerberus monitors the cluster as well as the
application health post chaos and fails the run if they are not healthy after chaos.
This commit adds the ability to monitor the user application health during the chaos
and fails the run in case of downtime, since that would potentially mean downtime in a
customer's environment as well. It is especially useful for control plane
failure scenarios including API server, Etcd, Ingress, etc.
2021-07-27 13:15:57 -04:00
Naga Ravi Chaitanya Elluri
c0b9cb46da Improve error handling
This commit:
- Adds timeout to avoid operations hanging for long durations.
- Improves exception handling and exits wherever needed.
- Sets KUBECONFIG env var globally to access the cluster.
2021-07-21 12:48:06 -04:00
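The three points in the commit above (timeouts, exception handling, a globally set `KUBECONFIG`) can be sketched with the standard library alone. The helper names and the kubeconfig path are hypothetical, and the current Python interpreter stands in for a cluster CLI so the example runs anywhere.

```python
import os
import subprocess
import sys


def set_kubeconfig(path):
    """Set KUBECONFIG for this process; child processes inherit it."""
    os.environ["KUBECONFIG"] = path


def run_with_timeout(argv, timeout_seconds):
    """Run a command and return (ok, output) without hanging past the timeout."""
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True,
            timeout=timeout_seconds, check=True,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_seconds}s"
    except subprocess.CalledProcessError as err:
        return False, err.stderr.strip()
    except OSError as err:  # e.g. the executable does not exist
        return False, str(err)
    return True, result.stdout.strip()


set_kubeconfig("/tmp/kubeconfig-example")  # placeholder path
child_code = "import os; print(os.environ['KUBECONFIG'])"
print(run_with_timeout([sys.executable, "-c", child_code], timeout_seconds=30))
```

Returning a status tuple leaves the decision to exit with the caller, which is one way to "exit wherever needed" without scattering `sys.exit` calls through helper code.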
Paige Rubendall
f051c1c30f Merge pull request #120 from paigerube14/container_kill
Container kill
2021-07-15 15:07:58 -04:00
prubenda
76efac8f9b Adding delete of namespaces 2021-07-13 13:31:45 -04:00
prubenda
46a1823291 Adding killing of specific containers in pods 2021-07-08 17:10:48 -04:00
Naga Ravi Chaitanya Elluri
d7ba19c382 Automate the infrastructure pieces
This commit:
- Adds support to automate the infrastructure pieces leveraged by Kraken
  including Cerberus and Elasticsearch
- Adds a Kraken config that can be used to discover all the infra pieces
  automatically without having to tweak the configuration.
2021-07-07 15:52:26 -04:00
prubenda
5456fce924 Adding getting started docs 2021-06-23 13:58:43 -04:00
prubenda
41bf815f98 Adding shut down scenario for gcp, az, aws, openstack 2021-06-23 09:00:58 -04:00
Naga Ravi Chaitanya Elluri
e30a4243f6 Add support to alerting on metrics evaluation
This commit enables alerting in Kraken based on the Prometheus queries defined
by the user and modifies the return code of the run to determine pass/fail for
the run.
2021-06-22 15:22:37 -04:00
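The alerting commit above ties user-defined Prometheus queries to the run's return code. A minimal sketch of that idea follows, with a stub in place of a real Prometheus client. The alert keys (`expr`, `severity`, `description`), the severity names, and both functions are assumptions made for illustration.

```python
def evaluate_alerts(alerts, query):
    """Return an exit code: 1 if any error-severity alert fires, else 0."""
    exit_code = 0
    for alert in alerts:
        samples = query(alert["expr"])
        if not samples:
            continue  # the expression returned nothing, so the alert is quiet
        print(f"[{alert['severity']}] {alert['description']}")
        if alert["severity"] == "error":
            exit_code = 1
    return exit_code


def fake_query(expr):
    """Stub that pretends only etcd-related expressions return samples."""
    return [{"value": 1}] if "etcd" in expr else []


alerts = [
    {"expr": "etcd_leader_changes > 0", "severity": "error",
     "description": "etcd leader changed"},
    {"expr": "api_latency > 1", "severity": "warning",
     "description": "API latency is high"},
]
print(evaluate_alerts(alerts, fake_query))
```

Passing the query function in as an argument keeps the pass/fail logic testable without a running Prometheus instance.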
Naga Ravi Chaitanya Elluri
7e8f0450d6 Add support to scrape and index metrics
This commit:
- Enables Kraken to leverage kube-burner to scrape metrics from
  Prometheus and index them into Elasticsearch. This way we can
  take a look at the metrics in Grafana long term even after the
  cluster is terminated.
- Enables separation of operations based on distribution with
  OpenShift as the default option. One of the use cases is to
  capture Prometheus instance details as it's installed by default
  while it's optional for Kubernetes.
2021-06-21 14:55:50 -04:00
Naga Ravi Chaitanya Elluri
5c2453b07e Refactor code base
This commit:
- Refactors the code base to be more modular by moving functions
  into respective modules to make it lean and reusable.
- Uses black to reformat the code to follow PEP 8 practices.
2021-06-14 17:41:10 -04:00
Amit Sagtani
d00d6ec69e Install pre-commit and use GitHub Actions (#94)
* added pre-commit and code-cleaning

* removed tox and TravisCI
2021-05-05 09:53:45 -04:00