133 Commits

Author SHA1 Message Date
Tullio Sebastiani
a7e5ae6c80 Replaced oc debug command execution on node with a native version (#547)
* native time skew feature

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* fixed podname conflict issue

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* updated krkn-lib to v1.4.6

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* fixed pod conflict issue

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

---------

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-01-15 12:15:38 -05:00
Paige Rubendall
462f93ad87 updating scenarios to have deployers (#537)
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-01-10 12:06:15 +01:00
Tullio Sebastiani
f2d7f88cb8 Krkn lib prometheus client + kube_burner references removed
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-01-09 10:43:32 -05:00
Tullio Sebastiani
41f9573563 Fixes cluster shutdown issue with single entry in scenario config (#535)
* fixed cluster shutdown issue

* fixed config file list parsing
2023-12-15 14:22:25 -05:00
Naga Ravi Chaitanya Elluri
afe8d817a9 Print telemetry data location to stdout
This commit also deprecates litmus integration.
2023-11-13 10:01:17 -05:00
Naga Ravi Chaitanya Elluri
94bec8dc9b Add missing import to get values from yaml (#526)
* Add missing import to get values from yaml

* Update Dockerfile

* Update Dockerfile-ppc64le

---------

Co-authored-by: Tullio Sebastiani <tsebastiani@users.noreply.github.com>
2023-11-07 11:07:17 +01:00
yogananth-subramanian
2111bab9a4 Pod ingress network shaping Chaos scenario
The scenario introduces network latency, packet loss, and bandwidth restriction in the Pod's network interface. The purpose of this scenario is to observe faults caused by random variations in the network.

The example config below applies ingress traffic shaping to the OpenShift console.
````
- id: pod_ingress_shaping
  config:
    namespace: openshift-console   # Required - Namespace of the pod to which the filter needs to be applied.
    label_selector: 'component=ui' # Applies traffic shaping to access openshift console.
    network_params:
        latency: 500ms             # Add 500ms latency to ingress traffic to the pod.
````
2023-11-06 23:34:17 -05:00
Tullio Sebastiani
7a966a71d0 krkn integration of telemetry events collection (#523)
* function package refactoring in krkn-lib

* cluster events collection flag

* krkn-lib version bump

requirements

* dockerfile bump
2023-10-31 14:31:33 -04:00
Tullio Sebastiani
27fabfd4af OCP/K8S functionalities and packages splitting in krkn-lib (#507)
* krkn-lib ocp/k8s split adaptation

* library reference updated

* requirements update

* rebase with main + fix
2023-10-30 17:31:48 +01:00
Tullio Sebastiani
724068a978 Chaos recommender refactoring (#516)
* basic structure working

* config and options refactoring

nits and changes

* removed unused function with typo + fixed duration

* removed unused arguments

* minor fixes
2023-10-30 15:51:09 +01:00
jtydlack
86d1fda325 Fix container scenario to accept only signal number (#350) (#485)
2023-10-24 16:51:48 -04:00
jtydlack
ff469579e9 Use function get_yaml_item_value
Enables using default even though the value was loaded as None.
2023-10-24 14:55:49 -04:00
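The behavior described in commit ff469579e9 can be sketched as follows. This is a minimal illustration rather than the project's actual code: the signature and the sample keys are assumptions. A plain `dict.get(key, default)` returns `None` when the key is present in the YAML but left empty, so the helper checks the loaded value itself.

```python
def get_yaml_item_value(container, key, default):
    """Return container[key], falling back to default when the key is
    missing or when its value was loaded as None (an empty YAML field)."""
    value = container.get(key)
    return default if value is None else value


# Hypothetical scenario config: "wait_duration:" was left empty in the YAML,
# so the loader produced None for it.
scenario = {"namespace": "openshift-console", "wait_duration": None}

wait_duration = get_yaml_item_value(scenario, "wait_duration", 60)
runs = get_yaml_item_value(scenario, "runs", 1)
namespace = get_yaml_item_value(scenario, "namespace", "default")
```

With this shape, an empty field and a missing field are treated the same way, while any explicitly set value is returned unchanged.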
Paige Rubendall
f7f1b2dfb0 Service disruption (#494)
* adding service disruption

* fixing kill services

* service log changes

* removing extra logging

* adding daemon set

* adding service disruption name changes

* cerberus config back

* bad string
2023-10-06 12:51:10 -04:00
Sahil Shah
0ad4c11356 Fix for time scenario (#490)
2023-09-14 12:36:08 -04:00
Tullio Sebastiani
f868000ebd Switched from krkn_lib_kubernetes to krkn_lib v1.0.0 (#469)
* changed all the references to krkn_lib_kubernetes to the new krkn_lib


changed all the references

* added krkn-lib pointer in documentation
2023-08-22 12:41:40 -04:00
Sahil Shah
b569e6a9d5 Fixing pvc scenario
2023-08-16 16:05:18 -04:00
Tullio Sebastiani
39c0152b7b Krkn telemetry integration (#435)
* adapted config.yaml to the new feature

* temporarily pointing requirements.txt to the lib feature branch

* run_kraken.py + arcaflow scenarios refactoring


typo

* plugin scenario

* node scenarios


return failed scenarios

* container scenarios


fix

* time scenarios

* cluster shutdown scenarios

* namespace scenarios

* zone outage scenarios

* app outage scenarios

* pvc scenarios

* network chaos scenarios

* run_kraken.py adaptation to telemetry

* prometheus telemetry upload + config.yaml


some fixes


typos and logs


max retries in config


telemetry id with run_uuid


safe_logger

* catch send_telemetry exception

* scenario collection bug fixes

* telemetry enabled check

* telemetry run tag

* requirements pointing to main + archive_size

* requirements.txt and config.yaml update

* added telemetry config to common config

* fixed scenario array elements for telemetry
2023-08-10 14:42:53 -04:00
jtydlack
491dc17267 Slo via http (#459)
* Fix typo

* Enable loading SLO profile via URL (#438)
2023-08-10 11:02:33 -04:00
yogananth-subramanian
b2b5002f45 Pod egress network shaping Chaos scenario
The scenario introduces network latency, packet loss, and bandwidth restriction in the Pod's network interface.
The purpose of this scenario is to observe faults caused by random variations in the network.

The example config below applies egress traffic shaping to the OpenShift console.
````
- id: pod_egress_shaping
  config:
    namespace: openshift-console   # Required - Namespace of the pod to which the filter needs to be applied.
    label_selector: 'component=ui' # Applies traffic shaping to access openshift console.
    network_params:
        latency: 500ms             # Add 500ms latency to egress traffic from the pod.
````
2023-08-08 11:45:03 -04:00
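As a rough illustration of how the `network_params` in the egress shaping config above could map to Linux traffic control, the sketch below builds a `tc ... netem` command line. The mapping, the function name, the `loss` and `bandwidth` keys, and the interface name are all assumptions for illustration; the commit message does not show how the scenario applies the shaping.

```python
import shlex

# Hypothetical mapping from config keys to netem options. Only "latency"
# appears in the example config; "loss" and "bandwidth" are assumed names.
NETEM_OPTIONS = {"latency": "delay", "loss": "loss", "bandwidth": "rate"}


def build_netem_command(interface, network_params):
    """Build a `tc qdisc add ... netem` command for the given parameters."""
    unknown = set(network_params) - set(NETEM_OPTIONS)
    if unknown:
        raise ValueError(f"unsupported network_params: {sorted(unknown)}")
    command = ["tc", "qdisc", "add", "dev", interface, "root", "netem"]
    # Iterate over the mapping (not the input) so option order is deterministic.
    for key, option in NETEM_OPTIONS.items():
        if key in network_params:
            command += [option, str(network_params[key])]
    return command


command = build_netem_command("eth0", {"latency": "500ms"})
print(shlex.join(command))
```

A root qdisc on an interface shapes traffic leaving that interface, which is why this sketch is attached to the egress scenario.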
Sahil Shah
19cc2c047f Fix for pvc scenario
2023-07-21 15:41:28 -04:00
Tullio Sebastiani
68dc17bc44 krkn-lib-kubernetes refactoring proposal (#400)
* run_kraken.py updated + renamed kubernetes library folder


unstaged files


kubecli marker

* container scenarios updated

* node scenarios updated


typo


injected kubecli

* managed cluster scenarios updated

* time scenarios updated

* litmus scenarios updated

* cluster scenarios updated

* namespace scenarios updated

* pvc scenarios updated

* network chaos scenarios updated

* common_managed_cluster functions updated

* switched draft library to official one

* regression on rebase
2023-06-13 10:02:35 -04:00
Naga Ravi Chaitanya Elluri
572eeefaf4 Minor fixes
This commit fixes a few typos and duplicate logs.
2023-06-12 21:05:27 -04:00
José Castillo Lema
a7938e58d2 Allow kraken to run with environment variables instead of kubeconfig file (#429)
* Include check for inside k8s scenario

* Include check for inside k8s scenario (2)

* Include check for inside k8s scenario (3)

* Include check for inside k8s scenario (4)
2023-06-01 14:43:01 -04:00
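One way to picture the "inside k8s" check from commit a7938e58d2 is the sketch below. It relies on the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` variables that Kubernetes sets in every pod; the function names and the fallback order are assumptions and are not taken from the commit.

```python
import os

# Environment variables that Kubernetes injects into containers running in a pod.
IN_CLUSTER_VARS = ("KUBERNETES_SERVICE_HOST", "KUBERNETES_SERVICE_PORT")


def running_inside_cluster(environ=os.environ):
    """Return True when both in-cluster variables are set and non-empty."""
    return all(environ.get(name) for name in IN_CLUSTER_VARS)


def resolve_kubeconfig(kubeconfig_path, environ=os.environ):
    """Return the kubeconfig path to use, or None to rely on in-cluster settings.

    Hypothetical helper: prefer an existing kubeconfig file, fall back to the
    in-cluster environment, and fail loudly when neither is available.
    """
    expanded = os.path.expanduser(kubeconfig_path) if kubeconfig_path else ""
    if expanded and os.path.isfile(expanded):
        return expanded
    if running_inside_cluster(environ):
        return None
    raise FileNotFoundError(
        f"no kubeconfig at {kubeconfig_path!r} and not running inside a cluster"
    )
```

Passing the environment as a parameter keeps the check easy to exercise outside a cluster.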
yogananth-subramanian
8806781a4f Pod network outage Chaos scenario
The pod network outage chaos scenario blocks traffic at the pod level irrespective of the network policy used.
With the current network policies, it is not possible to explicitly block ports that are enabled
by an allowed network policy rule. This chaos scenario addresses this issue by using OVS flow rules
to block ports related to the pod. It supports OpenShiftSDN and OVNKubernetes based networks.

The example config below blocks access to the OpenShift console.
````
- id: pod_network_outage
  config:
    namespace: openshift-console
    direction:
        - ingress
    ingress_ports:
        - 8443
    label_selector: 'component=ui'
````
2023-05-15 10:43:58 -04:00
Tullio Sebastiani
83b811bee4 Arcaflow stress-ng hogs with parallelism support (#418)
* kubeconfig management for arcaflow + hogs scenario refactoring  

  * kubeconfig authentication parsing refactored to support arcaflow kubernetes deployer  
  * reimplemented all the hog scenarios to allow multiple parallel containers of the same scenarios 
  (e.g. to stress two or more nodes in the same run simultaneously)
  * updated documentation 
* removed sysbench scenarios


* recovered cpu hogs


* updated requirements.txt


* updated config.yaml

* added gitleaks file for test fixtures

* imported sys and logging

* removed config_arcaflow.yaml

* updated readme

* refactored arcaflow documentation entrypoint
2023-05-15 09:45:16 -04:00
Paige Rubendall
16ea18c718 Ibm plugin node scenario (#417)
* Node scenarios for ibmcloud

* adding openshift check info
2023-05-09 12:07:38 -04:00
Naga Ravi Chaitanya Elluri
bc863fa01f Add support to check for critical alerts
This commit enables users to opt in to checking for critical alerts firing
in the cluster post chaos at the end of each scenario. A chaos scenario is
considered failed if the cluster is unhealthy, in which case the user can
start debugging to fix and harden the respective areas.

Fixes https://github.com/redhat-chaos/krkn/issues/410
2023-05-03 16:14:13 -04:00
Tullio Sebastiani
691be66b0a kubeconfig_path in new_client_from_config
added clients in the same context of the config
2023-04-19 14:12:46 -04:00
Naga Ravi Chaitanya Elluri
17f61625e4 Exit on critical alert failures
This commit captures the return code and exits when it is non-zero, i.e. when
critical alerts are firing.

Fixes https://github.com/redhat-chaos/krkn/issues/396
2023-03-27 12:43:57 -04:00
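The pattern named in commit 17f61625e4, capturing a return code and exiting when it is non-zero, might look like the sketch below. The helper names are hypothetical, and the command passed in is whatever performs the alert check; nothing here is taken from the project's code.

```python
import subprocess
import sys


def run_check(command):
    """Run a check command and return its exit code, logging stderr on failure."""
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        print(
            f"check failed (rc={completed.returncode}): {completed.stderr.strip()}",
            file=sys.stderr,
        )
    return completed.returncode


def exit_on_failure(command):
    """Terminate the current process with the check's exit code if it failed."""
    return_code = run_check(command)
    if return_code != 0:
        sys.exit(return_code)
```

Propagating the same exit code lets a calling CI job distinguish a failed alert check from a clean run.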
Tullio Sebastiani
fee4f7d2bf arcaflow integration (#384)
arcaflow library version

Co-authored-by: Tullio Sebastiani <tsebasti@redhat.com>
2023-03-08 12:01:03 +01:00
Naga Ravi Chaitanya Elluri
64f4c234e9 Add prom token creation step
This enables compatibility with all OpenShift versions.
Reference PR by Paige in Cerberus: https://github.com/redhat-chaos/cerberus/pull/190.
2023-01-31 12:36:09 -05:00
José Castillo Lema
493a8a245f Docker provider for node actions (#369)
* Docker provider for node actions

* Adjusted dependencies and imports

* Update config_kind.yaml

Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>

Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>
2023-01-10 14:36:18 -05:00
José Castillo Lema
d76ab31155 OCM/ACM integration (#370)
* OCM support for ManagedClusters

* Updated docs and general adjustments

* Improved docs

* Improved docs2

* Removed io packet import

Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>

* Removed time from imports

Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>

* Removed duplicate logging import

Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>

* Removed sys import

Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>

* Update run.py

Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>

Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>
2023-01-10 08:58:17 -05:00
Paige Rubendall
4035f2724b Adding wait duration for pods (#368)
* adding wait duration for pods

* adding kube apiserver with plugin schema
2022-11-18 07:43:26 +05:30
Naga Ravi Chaitanya Elluri
1c207538b6 Use run dir instead of tmp
This commit also logs a message to handle the exception during the
node checks.

Fixes https://github.com/redhat-chaos/krkn/issues/356, https://github.com/redhat-chaos/krkn/issues/357
2022-11-08 15:46:08 -05:00
Naga Ravi Chaitanya Elluri
6ccc16a0ab Use autoescape=True to mitigate XSS vulnerabilities
Fixes https://github.com/redhat-chaos/krkn/issues/354
2022-11-08 14:34:06 -05:00
Naga Ravi Chaitanya Elluri
b9d5a7af4d Use safe loader for Yaml
This fixes security vulnerabilities: for example, it raises an
exception when opening a YAML file that contains code.

Fixes https://github.com/redhat-chaos/krkn/issues/352
2022-11-08 13:35:06 -05:00
Sandro Bonazzola
1c4a51cbfa refactor: use arcaflow plugin
Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>
2022-10-18 16:43:33 +02:00
Naga Ravi Chaitanya Elluri
9f23699cfa Document node scenario actions for VMware
This commit also updates the IDs for the VMware scenarios to be aligned
with other cloud providers.
2022-09-07 11:34:14 -04:00
Sandro Bonazzola
ec807e3b3a pycodestyle fixes: vmware_plugin.py
Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>
2022-09-05 14:15:38 +02:00
Sandro Bonazzola
b444854cb2 pycodestyle fixes: kraken/pvc/pvc_scenario.py
Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>
2022-09-05 13:36:16 +02:00
Sandro Bonazzola
1dc58d8721 pycodestyle fixes: ingress_shaping.py
Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>
2022-09-05 13:20:23 +02:00
Sandro Bonazzola
6112ba63c3 plugins/run_python_plugin.py: remove unused import
Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>
2022-09-05 13:20:23 +02:00
Sandro Bonazzola
6ba1e1ad8b waive bandit report on insecure random usage
Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>
2022-09-02 15:57:39 +02:00
Sandro Bonazzola
3b476b68f2 pycodestyle fixes: kraken/time_actions/common_time_functions.py
Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>
2022-09-02 15:57:39 +02:00
Sandro Bonazzola
e17ebd0e7b pycodestyle fixes: kraken/shut_down/common_shut_down_func.py
Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>
2022-09-02 15:44:42 +02:00
Sandro Bonazzola
d0d289fb7c update references to github organization
Updated references from chaos-kubox to redhat-chaos.

Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>
2022-09-02 14:38:25 +02:00
Sandro Bonazzola
66f88f5a78 pyflakes: fix imports for allowing analysis
Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>
2022-09-02 14:23:11 +02:00
Sandro Bonazzola
90b45538f2 pycodestyle fixes: kraken/cerberus/setup.py
Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>
2022-09-02 06:32:46 -04:00
Sandro Bonazzola
c94c2b22a9 pycodestyle fixes: kraken/zone_outage/actions.py
Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>
2022-09-02 09:15:59 +02:00