Tullio Sebastiani
fb3bbe4e26
replaced log syntax to allow objects to be printed
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-05-14 11:13:44 -04:00
Tullio Sebastiani
21b89a32a7
fixing missing import for log_exception
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-05-13 11:58:13 -04:00
Tullio Sebastiani
a142f6e7a4
Service hijacking scenario (#617)
* WIP: service hijacking scenario
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* wip
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* error handling
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
adapted run_kraken.py
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* restored config.yaml
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* added funtest
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
test fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fixed test
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fix test
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fixed funtest
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
funtest fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
minor nit
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
added explicit curl method
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
push
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
restored all funtests
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
added mime type test
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fixed pipeline
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
commented unit
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
utf-8
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
test restored
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fix test pipeline
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* documentation
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* krkn-lib 2.1.3
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* added other funtests to main merge to collect coverage
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
---------
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-05-13 10:04:06 +02:00
Tullio Sebastiani
2dfa5cb0cd
fixes missing data in telemetry.json
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-05-06 14:16:09 -04:00
Tullio Sebastiani
ab98e416a6
Integration of the new pod recovery monitoring strategy implemented in krkn-lib (#609)
* pod monitoring integration in plugin scenario
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* pod monitoring integration in container scenario
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* removed wait-for-pod step from plugin scenario config files
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* introduced global pod recovery time
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
nit
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* introduced krkn_pod_recovery_time in plugin scenario and removed all the references to wait-for-pods
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* functional test fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* main branch functional test fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* increased recovery times
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
---------
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-04-23 10:49:01 +02:00
jtydlcak
804d7cbf58
Accept list of namespaces in chaos recommender
Signed-off-by: jtydlack <139967002+jtydlack@users.noreply.github.com>
2024-04-09 23:32:17 -04:00
Paige Rubendall
b79e526cfd
adding app outage not creating file (#605)
Signed-off-by: Paige Rubendall <prubenda@redhat.com>
2024-03-29 14:35:14 -04:00
yogananth
a1b81bd382
Fix: Resolve ingress network chaos plugin issue
Added network_chaos to the plugin step, based the job wait time on the test duration, and set the default wait_time to 30s.
Signed-off-by: yogananth subramanian <ysubrama@redhat.com>
2024-03-22 14:48:17 -04:00
Tullio Sebastiani
b9c0bb39c7
checking post run alerts properties presence (#584)
added metric check
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-03-01 18:30:54 +01:00
Tullio Sebastiani
706a886151
checking alert properties presence (#583)
typo fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-03-01 17:58:21 +01:00
Tullio Sebastiani
1298f220a6
Critical alerts collection and upload (#577)
* added prometheus client method for critical alerts
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* adapted run_kraken to the new plugin method for critical_alerts collection + telemetry upload
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* requirements.txt pointing temporarily to git
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* fixed severity level
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* added functional tests
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* exit on post chaos critical alerts
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
log moved
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* removed noisy log
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fixed log
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* updated requirements.txt to krkn-lib 1.4.13
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* krkn lib
* added check on variable that makes kraken return 1 if post-chaos critical alerts are > 0
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
---------
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-02-28 09:48:29 -05:00
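The return-code rule in the last bullet of #577 (return 1 when post-chaos critical alerts are > 0) can be sketched as below. This is a minimal illustration under stated assumptions, not krkn's code: the helper name `exit_code_for_critical_alerts` and the representation of alerts as a list of alert names are hypothetical.

```python
import logging
from typing import Sequence


def exit_code_for_critical_alerts(post_chaos_alerts: Sequence[str]) -> int:
    """Hypothetical helper: map post-chaos critical alerts to a process exit code.

    Returns 1 when at least one critical alert is still firing after the
    chaos run, 0 otherwise.
    """
    if len(post_chaos_alerts) > 0:
        # Report which alerts caused the failure before signalling it.
        logging.error(
            "%d critical alert(s) firing after chaos: %s",
            len(post_chaos_alerts),
            ", ".join(post_chaos_alerts),
        )
        return 1
    return 0


# Hypothetical usage at the end of a run:
#   sys.exit(exit_code_for_critical_alerts(alerts))
```

Keeping the decision in a pure function (rather than calling `sys.exit` inside it) makes the rule easy to unit test and lets the caller decide when to terminate.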
jtydlcak
24059fb731
Add json output file option for recommender (#511)
Terminal output was changed to use a JSON structure.
The JSON output file names follow the format
recommender_namespace_YYYY-MM-DD_HH-MM-SS.
The path to the JSON file can be specified; the default path is
kraken/utils/chaos_recommender/recommender_output.
Signed-off-by: jtydlcak <139967002+jtydlack@users.noreply.github.com>
2024-02-27 11:09:00 -05:00
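The naming scheme in #511 can be illustrated with a short sketch. It assumes that `namespace` in the pattern stands for the analyzed namespace and that files get a `.json` extension; both are assumptions, and `recommender_output_path` is a hypothetical helper, not the recommender's actual function.

```python
import os
from datetime import datetime
from typing import Optional

# Default output directory named in the commit message.
DEFAULT_OUTPUT_DIR = "kraken/utils/chaos_recommender/recommender_output"


def recommender_output_path(
    namespace: str,
    output_dir: str = DEFAULT_OUTPUT_DIR,
    when: Optional[datetime] = None,
) -> str:
    """Hypothetical helper: build <output_dir>/recommender_<namespace>_YYYY-MM-DD_HH-MM-SS.json."""
    when = when or datetime.now()
    # Dashes and an underscore keep the timestamp filesystem-safe (no colons).
    timestamp = when.strftime("%Y-%m-%d_%H-%M-%S")
    return os.path.join(output_dir, f"recommender_{namespace}_{timestamp}.json")
```

Passing `when` explicitly makes the name reproducible in tests; in normal use it defaults to the current local time.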
Naga Ravi Chaitanya Elluri
ab951adb78
Expose thresholds config options (#574)
This commit allows users to edit the thresholds in the chaos-recommender
config so that outliers are identified according to their use case.
Fixes https://github.com/krkn-chaos/krkn/issues/509
Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
2024-02-26 09:43:34 -05:00
Tullio Sebastiani
a7e5ae6c80
Replaced oc debug command execution on node with a native version (#547)
* native time skew feature
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* fixed podname conflict issue
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* updated krkn-lib to v1.4.6
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* fixed pod conflict issue
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
---------
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-01-15 12:15:38 -05:00
Paige Rubendall
462f93ad87
updating scenarios to have deployers (#537)
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-01-10 12:06:15 +01:00
Tullio Sebastiani
f2d7f88cb8
Krkn lib prometheus client + kube_burner references removed
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-01-09 10:43:32 -05:00
Tullio Sebastiani
41f9573563
Fixes cluster shutdown issue with single entry in scenario config (#535)
* fixed cluster shutdown issue
* fixed config file list parsing
2023-12-15 14:22:25 -05:00
Naga Ravi Chaitanya Elluri
afe8d817a9
Print telemetry data location to stdout
This commit also deprecates litmus integration.
2023-11-13 10:01:17 -05:00
Naga Ravi Chaitanya Elluri
94bec8dc9b
Add missing import to get values from yaml (#526)
* Add missing import to get values from yaml
* Update Dockerfile
* Update Dockerfile-ppc64le
---------
Co-authored-by: Tullio Sebastiani <tsebastiani@users.noreply.github.com>
2023-11-07 11:07:17 +01:00
yogananth-subramanian
2111bab9a4
Pod ingress network shaping Chaos scenario
The scenario introduces network latency, packet loss, and bandwidth restriction in the Pod's network interface. The purpose of this scenario is to observe faults caused by random variations in the network.
The example config below applies ingress traffic shaping to the OpenShift console.
````
- id: pod_ingress_shaping
  config:
    namespace: openshift-console    # Required - namespace of the pod to which the filter is applied.
    label_selector: 'component=ui'  # Applies traffic shaping to the OpenShift console UI pods.
    network_params:
      latency: 500ms                # Adds 500ms latency to the pod's ingress traffic.
````
2023-11-06 23:34:17 -05:00
Tullio Sebastiani
7a966a71d0
krkn integration of telemetry events collection (#523)
* function package refactoring in krkn-lib
* cluster events collection flag
* krkn-lib version bump
requirements
* dockerfile bump
2023-10-31 14:31:33 -04:00
Tullio Sebastiani
27fabfd4af
OCP/K8S functionalities and packages splitting in krkn-lib (#507)
* krkn-lib ocp/k8s split adaptation
* library reference updated
* requirements update
* rebase with main + fix
2023-10-30 17:31:48 +01:00
Tullio Sebastiani
724068a978
Chaos recommender refactoring (#516)
* basic structure working
* config and options refactoring
nits and changes
* removed unused function with typo + fixed duration
* removed unused arguments
* minor fixes
2023-10-30 15:51:09 +01:00
jtydlack
86d1fda325
Fix container scenario to accept only signal numbers (#350) (#485)
2023-10-24 16:51:48 -04:00
jtydlack
ff469579e9
Use function get_yaml_item_value
...
Enables falling back to the default even when the value was loaded as None.
2023-10-24 14:55:49 -04:00
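The difference from a plain `dict.get(key, default)` lookup is that a YAML key written with no value (for example `duration:`) is loaded as `None`, and `dict.get` then returns `None` instead of the default. The sketch below illustrates the intended fallback with a hypothetical helper, `yaml_item_value`; it is not krkn-lib's `get_yaml_item_value`, whose exact signature and behavior may differ.

```python
from typing import Any, Mapping


def yaml_item_value(cont: Mapping[str, Any], item: str, default: Any) -> Any:
    """Hypothetical helper: return cont[item], or default when the key is
    missing or was loaded as None (e.g. `duration:` with no value)."""
    value = cont.get(item)
    # Only None triggers the fallback; falsy values such as 0 or False are kept.
    return default if value is None else value


# Parsed YAML where `duration:` was left empty.
scenario_config = {"duration": None, "iterations": 0}

scenario_config.get("duration", 60)                # None: key exists, so no default
yaml_item_value(scenario_config, "duration", 60)   # 60
yaml_item_value(scenario_config, "iterations", 1)  # 0: falsy but not None, kept
```

Testing `is None` rather than truthiness avoids silently replacing legitimate `0` or `False` settings with the default.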
Paige Rubendall
f7f1b2dfb0
Service disruption (#494)
* adding service disruption
* fixing kill services
* service log changes
* removing extra logging
* adding daemon set
* adding service disruption name changes
* cerberus config back
* bad string
2023-10-06 12:51:10 -04:00
Sahil Shah
0ad4c11356
Fix for time scenario (#490)
2023-09-14 12:36:08 -04:00
Tullio Sebastiani
f868000ebd
Switched from krkn_lib_kubernetes to krkn_lib v1.0.0 (#469)
* changed all the references to krkn_lib_kubernetes to the new krkn_lib
changed all the references
* added krkn-lib pointer in documentation
2023-08-22 12:41:40 -04:00
Sahil Shah
b569e6a9d5
Fixing pvc scenario
2023-08-16 16:05:18 -04:00
Tullio Sebastiani
39c0152b7b
Krkn telemetry integration (#435)
* adapted config.yaml to the new feature
* temporarily pointing requirements.txt to the lib feature branch
* run_kraken.py + arcaflow scenarios refactoring
typo
* plugin scenario
* node scenarios
return failed scenarios
* container scenarios
fix
* time scenarios
* cluster shutdown scenarios
* namespace scenarios
* zone outage scenarios
* app outage scenarios
* pvc scenarios
* network chaos scenarios
* run_kraken.py adaptation to telemetry
* prometheus telemetry upload + config.yaml
some fixes
typos and logs
max retries in config
telemetry id with run_uuid
safe_logger
* catch send_telemetry exception
* scenario collection bug fixes
* telemetry enabled check
* telemetry run tag
* requirements pointing to main + archive_size
* requirements.txt and config.yaml update
* added telemetry config to common config
* fixed scenario array elements for telemetry
2023-08-10 14:42:53 -04:00
jtydlack
491dc17267
SLO via HTTP (#459)
* Fix typo
* Enable loading SLO profile via URL (#438)
2023-08-10 11:02:33 -04:00
yogananth-subramanian
b2b5002f45
Pod egress network shaping Chaos scenario
The scenario introduces network latency, packet loss, and bandwidth restriction in the Pod's network interface.
The purpose of this scenario is to observe faults caused by random variations in the network.
The example config below applies egress traffic shaping to the OpenShift console.
````
- id: pod_egress_shaping
  config:
    namespace: openshift-console    # Required - namespace of the pod to which the filter is applied.
    label_selector: 'component=ui'  # Applies traffic shaping to the OpenShift console UI pods.
    network_params:
      latency: 500ms                # Adds 500ms latency to egress traffic from the pod.
````
2023-08-08 11:45:03 -04:00
Sahil Shah
19cc2c047f
Fix for pvc scenario
2023-07-21 15:41:28 -04:00
Tullio Sebastiani
68dc17bc44
krkn-lib-kubernetes refactoring proposal (#400)
* run_kraken.py updated + renamed kubernetes library folder
unstaged files
kubecli marker
* container scenarios updated
* node scenarios updated
typo
injected kubecli
* managed cluster scenarios updated
* time scenarios updated
* litmus scenarios updated
* cluster scenarios updated
* namespace scenarios updated
* pvc scenarios updated
* network chaos scenarios updated
* common_managed_cluster functions updated
* switched draft library to official one
* regression on rebase
2023-06-13 10:02:35 -04:00
Naga Ravi Chaitanya Elluri
572eeefaf4
Minor fixes
This commit fixes a few typos and duplicate logs.
2023-06-12 21:05:27 -04:00
José Castillo Lema
a7938e58d2
Allow kraken to run with environment variables instead of kubeconfig file (#429)
* Include check for inside k8s scenario
* Include check for inside k8s scenario (2)
* Include check for inside k8s scenario (3)
* Include check for inside k8s scenario (4)
2023-06-01 14:43:01 -04:00
yogananth-subramanian
8806781a4f
Pod network outage Chaos scenario
The pod network outage chaos scenario blocks traffic at the pod level, irrespective of the network policy in use.
With the current network policies, it is not possible to explicitly block ports that are opened
by an allow rule. This chaos scenario addresses the issue by using OVS flow rules
to block the ports related to the pod. It supports OpenShiftSDN- and OVNKubernetes-based networks.
The example config below blocks ingress access to the OpenShift console on port 8443.
````
- id: pod_network_outage
  config:
    namespace: openshift-console
    direction:
      - ingress
    ingress_ports:
      - 8443
    label_selector: 'component=ui'
````
2023-05-15 10:43:58 -04:00
Tullio Sebastiani
83b811bee4
Arcaflow stress-ng hogs with parallelism support (#418)
* kubeconfig management for arcaflow + hogs scenario refactoring
* kubeconfig authentication parsing refactored to support arcaflow kubernetes deployer
* reimplemented all the hog scenarios to allow multiple parallel containers of the same scenario
(e.g., to stress two or more nodes simultaneously in the same run)
* updated documentation
* removed sysbench scenarios
* recovered cpu hogs
* updated requirements.txt
* updated config.yaml
* added gitleaks file for test fixtures
* imported sys and logging
* removed config_arcaflow.yaml
* updated readme
* refactored arcaflow documentation entrypoint
2023-05-15 09:45:16 -04:00
Paige Rubendall
16ea18c718
IBM plugin node scenario (#417)
* Node scenarios for ibmcloud
* adding openshift check info
2023-05-09 12:07:38 -04:00
Naga Ravi Chaitanya Elluri
bc863fa01f
Add support to check for critical alerts
This commit enables users to opt in to checking for critical alerts firing
in the cluster post chaos at the end of each scenario. The chaos scenario is
considered failed if the cluster is unhealthy, in which case the user can
start debugging to fix and harden the respective areas.
Fixes https://github.com/redhat-chaos/krkn/issues/410
2023-05-03 16:14:13 -04:00
Tullio Sebastiani
691be66b0a
kubeconfig_path in new_client_from_config
added clients in the same context of the config
2023-04-19 14:12:46 -04:00
Naga Ravi Chaitanya Elluri
17f61625e4
Exit on critical alert failures
This commit captures the return code and exits with a non-zero code
when critical alerts are fired.
Fixes https://github.com/redhat-chaos/krkn/issues/396
2023-03-27 12:43:57 -04:00
Tullio Sebastiani
fee4f7d2bf
arcaflow integration (#384)
arcaflow library version
Co-authored-by: Tullio Sebastiani <tsebasti@redhat.com>
2023-03-08 12:01:03 +01:00
Naga Ravi Chaitanya Elluri
64f4c234e9
Add prom token creation step
This enables compatibility with all OpenShift versions.
Reference PR by Paige in Cerberus: https://github.com/redhat-chaos/cerberus/pull/190.
2023-01-31 12:36:09 -05:00
José Castillo Lema
493a8a245f
Docker provider for node actions (#369)
* Docker provider for node actions
* Adjusted dependencies and imports
* Update config_kind.yaml
Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>
2023-01-10 14:36:18 -05:00
José Castillo Lema
d76ab31155
OCM/ACM integration (#370)
* OCM support for ManagedClusters
* Updated docs and general adjustments
* Improved docs
* Improved docs2
* Removed io packet import
Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>
* Removed time from imports
Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>
* Removed duplicate logging import
Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>
* Removed sys import
Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>
* Update run.py
Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>
Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>
2023-01-10 08:58:17 -05:00
Paige Rubendall
4035f2724b
Adding wait duration for pods (#368)
* adding wait duration for pods
* adding kube apiserver with plugin schema
2022-11-18 07:43:26 +05:30
Naga Ravi Chaitanya Elluri
1c207538b6
Use run dir instead of tmp
This commit also logs a message to handle the exception during the
node checks.
Fixes https://github.com/redhat-chaos/krkn/issues/356, https://github.com/redhat-chaos/krkn/issues/357
2022-11-08 15:46:08 -05:00
Naga Ravi Chaitanya Elluri
6ccc16a0ab
Use autoescape=True to mitigate XSS vulnerabilities
Fixes https://github.com/redhat-chaos/krkn/issues/354
2022-11-08 14:34:06 -05:00
Naga Ravi Chaitanya Elluri
b9d5a7af4d
Use safe loader for Yaml
This fixes security vulnerabilities; for example, it raises an
exception when opening a YAML file that contains code.
Fixes https://github.com/redhat-chaos/krkn/issues/352
2022-11-08 13:35:06 -05:00