
44 Commits

Author SHA1 Message Date
Tullio Sebastiani
83b811bee4 Arcaflow stress-ng hogs with parallelism support (#418)
* kubeconfig management for arcaflow + hogs scenario refactoring  

  * kubeconfig authentication parsing refactored to support arcaflow kubernetes deployer  
  * reimplemented all the hog scenarios to allow multiple parallel containers of the same scenarios 
  (eg. to stress two or more nodes in the same run simultaneously) 
  * updated documentation 
* removed sysbench scenarios

* recovered cpu hogs

* updated requirements.txt

* updated config.yaml

* added gitleaks file for test fixtures

* imported sys and logging

* removed config_arcaflow.yaml

* updated readme

* refactored arcaflow documentation entrypoint
2023-05-15 09:45:16 -04:00
Paige Rubendall
16ea18c718 Ibm plugin node scenario (#417)
* Node scenarios for ibmcloud

* adding openshift check info
2023-05-09 12:07:38 -04:00
Naga Ravi Chaitanya Elluri
bc863fa01f Add support to check for critical alerts
This commit enables users to opt in to checking for critical alerts firing
in the cluster after chaos, at the end of each scenario. A chaos scenario is
considered failed if the cluster is unhealthy, in which case the user can
start debugging to fix and harden the affected areas.

Fixes https://github.com/redhat-chaos/krkn/issues/410
2023-05-03 16:14:13 -04:00
Naga Ravi Chaitanya Elluri
6b17dbdbb3 Allow users to set the listening address
This commit provides an option for the user to set the listening address
for the signal. This also fixes a security vulnerability.

Fixes https://github.com/redhat-chaos/krkn/issues/307
2022-11-08 15:59:57 -05:00
Sandro Bonazzola
0c36903fff config: really default to ~ instead of /root
Documentation says we default to ~ for looking up the kubernetes config
but then we set everywhere /root. Fixed the config to really look for ~.

Should solve #327.

Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>
2022-09-13 12:01:16 +02:00
Shreyas Anantha Ramaprasad
9421a0c2c2 Added support for ingress traffic shaping (#299)
* Added plugin for ingress network traffic shaping

* Documentation changes

* Minor changes

* Documentation and formatting fixes

* Added trap to sleep infinity command running in containers

* Removed shell injection threat for modprobe commands

* Added docstrings to cerberus functions

* Added checks to prevent shell injection

* Bug fix
2022-09-02 07:54:11 +02:00
Naga Ravi Chaitanya Elluri
6c75d3dddb Add option to skip litmus installation
This commit adds an option for the user to pick whether to install
litmus or not depending on their use case. One use case is disconnected
environments where litmus is pre-installed instead of reaching out to the
internet.
2022-08-23 14:09:10 -04:00
Shreyas Anantha Ramaprasad
08deae63dd Added VMware Node Scenarios (#285)
* Added VMware node scenarios

* Made vmware plugin independent of Krkn

* Revert changes made to node status watch

* Fixed minor documentation changes
2022-08-15 23:35:16 +02:00
Janos Bonic
ccd902565e Fixes #265: Replace Powerfulseal and introduce Wolkenwalze SDK for plugin system 2022-08-02 16:25:03 +01:00
Naga Ravi Chaitanya Elluri
9208f39e06 Add support to run on Kubernetes
This commit:
- Leverages distribution flag in the config set by the user to skip
  things not supported on OpenShift to be able to run scenarios on
  Kubernetes.
- Adds sample config and scenario files that work on Kubernetes.
2022-06-01 07:27:06 -05:00
Adolfo Aguirrezabal
3adf5847b2 Add option to avoid litmus uninstall before chaos run (#242)
* Adds option to avoid litmus uninstall before chaos run

* Add new option to the config files
2022-05-05 09:02:25 -04:00
yogananth-subramanian
50dd9873c1 Node egress traffic shaping
This patch adds a scenario to create variations in the egress traffic of a node's interface using tc and Netem.
2021-12-16 12:54:53 -05:00
Alejandro Gullón
baa812b7f0 Added new scenario to fill up a given volume (#182)
* Added new scenario to fill up a given volume

* fixing small issues and style

* adding PVC as input param instead of pod name

* small fix

* get container name and volume name
replace oc with kubectl commands

* adding yaml file to create a pv, pvc and pod to run pvc_scenario

* adding support to match both strings for describe command when looking for pod_name

* added support to find the pvc from a given pod

* small fix

* small fix
2021-11-24 12:18:49 -05:00
Naga Ravi Chaitanya Elluri
674eb74a75 Expose setting the signal in the config
This commit enables users to start Kraken as a listener by setting
the signal to PAUSE in the config to get the cluster to a desired test or
run any setup before injecting chaos by setting the signal to RUN. This
helps in cases where we have test cases that need to coordinate the chaos
at a desired time depending on the state of the cluster/test run.
2021-10-26 09:05:25 -04:00
Paige Rubendall
6b865fc573 Adding server set up for kraken 2021-10-25 08:58:46 -04:00
Naga Ravi Chaitanya Elluri
cdf3bc03d2 Add support to block traffic to an application
This commit enables users to simulate a downtime of an application
by blocking the traffic for the specified duration to see how
it/other components communicating with it behave in case of downtime.
2021-10-01 10:13:40 -04:00
Paige Rubendall
22df024312 adding validation that namespace becomes active 2021-09-28 09:58:55 -04:00
Naga Ravi Chaitanya Elluri
036e51a6b1 Delete litmus crd's during the cleanup
This commit will ensure that the litmus resources installed on the
cluster get cleaned up and also creates the chaosengine in the
specified namespace.
2021-09-16 16:30:21 -04:00
Paige Rubendall
a9056ddf43 adding litmus logging 2021-09-08 17:11:49 -04:00
Naga Ravi Chaitanya Elluri
5da0b259c5 Run all the litmus resources in a single namespace
- This eases the usage and debuggability by running the fault injection pods in
  the same namespace as other resources of litmus. This will also ease the
  deletion process and ensure that there are no leftover objects on the cluster.

- This commit also enables users to use the same rbac template for all the litmus
  scenarios without having to pull in a specific one for each of the scenarios.
2021-09-08 16:37:07 -04:00
Naga Ravi Chaitanya Elluri
68a32666cd Update litmus docs with supported scenarios 2021-09-01 16:41:22 -04:00
prubenda
9b0bcdbf0e Adding node memory hog scenario 2021-08-20 14:02:00 -04:00
Naga Ravi Chaitanya Elluri
6456eec76a Add zone outage scenarios
This commit adds support to create zone outage in AWS by denying both
ingress and egress traffic to the instances belonging to a particular
subnet belonging to the zone by tweaking the network acl. This creates
an outage of all the nodes in the zone - both master and workers.
2021-08-17 11:43:13 -04:00
Naga Ravi Chaitanya Elluri
c56a8a5356 Add more tunables for cpu hog scenario
This commit exposes the flags to tweak the number of cores and node
count to hog during the node-cpu-hog scenario.
2021-07-28 17:07:40 -04:00
Naga Ravi Chaitanya Elluri
716057eab6 Monitor user application availability during chaos
The current Kraken integration with Cerberus monitors the cluster as well as the
application health post chaos and fails the run if they are not healthy after chaos.
This commit adds the ability to monitor the user application health during the chaos
and fails the run in case of downtime, as it is potentially a downtime in the
customer's environment as well. It is especially useful in case of control plane
failure scenarios including API server, Etcd, Ingress etc.
2021-07-27 13:15:57 -04:00
Paige Rubendall
f051c1c30f Merge pull request #120 from paigerube14/container_kill
Container kill
2021-07-15 15:07:58 -04:00
prubenda
76efac8f9b Adding delete of namespaces 2021-07-13 13:31:45 -04:00
prubenda
46a1823291 Adding killing of specific containers in pods 2021-07-08 17:10:48 -04:00
prubenda
41bf815f98 Adding shut down scenario for gcp, az, aws, openstack 2021-06-23 09:00:58 -04:00
Naga Ravi Chaitanya Elluri
e30a4243f6 Add support to alerting on metrics evaluation
This commit enables alerting in Kraken based on the Prometheus queries defined
by the user and modifies the return code of the run to determine pass/fail for
the run.
2021-06-22 15:22:37 -04:00
Naga Ravi Chaitanya Elluri
7e8f0450d6 Add support to scrape and index metrics
This commit:
- Enables Kraken to leverage kube-burner to scrape metrics from
  Prometheus and index them into Elasticsearch. This way we can
  take a look at the metrics in Grafana long term even after the
  cluster is terminated.
- Enables separation of operations based on distribution with
  OpenShift as the default option. One of the use cases is to
  capture Prometheus instance details as it's installed by default
  while it's optional for Kubernetes.
2021-06-21 14:55:50 -04:00
Naga Ravi Chaitanya Elluri
a7e28ca490 Add support to deploy performance dashboards
This commit enables performance monitoring on the cluster when
running Kraken to be able to observe how the cluster reacts to failures
as it's important to make sure the cluster is healthy in terms of
both recovery as well as performance.
2021-02-10 16:06:55 -05:00
prubenda
1fc9683c8c Adding litmus scenario options 2020-12-03 12:45:35 -05:00
Yashashree1997
47847d86cd Adds the ability to run a specific type of scenario multiple times
With the current implementation, all the scenarios of a specific type
(for example, pod scenario) have to be executed together. All
pod_scenarios are followed by node_scenarios and so on.
(pod_scenarios -> node_scenarios -> pod_scenarios is not possible)
This commit enables the user to run a specific type of scenario
multiple times. For example, a few pod_scenarios followed by
node_scenarios followed by a few pod_scenarios.
2020-10-30 10:40:42 -04:00
prubenda
6f31519e5f adding time scenario 2020-10-27 08:37:54 -04:00
Naga Ravi Chaitanya Elluri
82743230fe Modify documentation to improve readability
This commit:
- Converts various sections in the readme into individual documents.
- Adds pointers to the public blogs.
- Updates workflow/architecture diagram.
- Adds community info and contributing guidelines.
2020-10-21 15:01:54 -04:00
Mike Fiedler
2e5eac4550 Fix comment in config.yml 2020-10-09 13:20:26 -04:00
prubenda
8f5b688fba working on powerfulseal retry logic 2020-09-11 17:08:31 -04:00
Yashashree Suresh
31f06b861a Added node scenarios to stop and terminate instance
This commit:
- Adds a node scenario to stop and start an instance
- Adds a node scenario to terminate an instance
- Adds a node scenario to reboot an instance
- Adds a node scenario to stop the kubelet
- Adds a node scenario to crash the node
2020-08-27 16:50:42 -04:00
prubenda
0fc82090f2 Adding watch to see if components recovered 2020-08-18 16:26:04 -04:00
prubenda
44e753867f Adding random regex pod kill 2020-07-06 22:00:12 -04:00
prubenda
52e232d0e7 Adding iterations or infinite run of kraken 2020-06-09 10:55:24 -04:00
Yashashree Suresh
f1c145e942 Integrated cerberus for checking cluster health 2020-04-22 23:30:21 -04:00
Naga Ravi Chaitanya Elluri
649134e492 Add initial version of kraken
This commit:
- Adds support to run pod chaos scenarios including killing etcd,
  ApiServer and kube-apiserver using the powerfulseal tool.
- Adds support to create a report with the details about each chaos
  injection along with timestamps. The report is generated in the
  run directory.
- Adds kubernetes package with a bunch of functions which can be
  used later to talk to the kubernetes API to be able to know the
  status of the targeted components/nodes.
2020-04-20 08:57:00 -04:00