Tullio Sebastiani
d91172d9b2
Core Refactoring, Krkn Scenario Plugin API (#694)
* relocated shared libraries from `kraken` to `krkn` folder
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* AbstractScenarioPlugin and ScenarioPluginFactory
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* application_outage porting
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* arcaflow_scenarios porting
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* managedcluster_scenarios porting
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* network_chaos porting
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* node_actions porting
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* plugin_scenarios porting
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* pvc_scenarios porting
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* service_disruption porting
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* service_hijacking porting
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* cluster_shut_down_scenarios porting
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* syn_flood porting
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* time_scenarios porting
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* zone_outages porting
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* ScenarioPluginFactory tests
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* unit tests update
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* pod_scenarios and post actions deprecated
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
scenarios post_actions
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* funtests and config update
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* run_krkn.py update
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* utils porting
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* API Documentation
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* container_scenarios porting
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* funtest fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* document gif update
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* Documentation + tests update
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* removed example plugin
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* global renaming
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
test fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
test fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* config.yaml typos
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
typos
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* removed `plugin_scenarios` from NativeScenarioPlugin class
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* pod_network_scenarios type added
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* documentation update
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* krkn-lib update
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
typo
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
---------
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-10-03 20:48:04 +02:00
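The plugin API introduced in #694 pairs an `AbstractScenarioPlugin` base class with a `ScenarioPluginFactory`. The sketch below illustrates that base-class-plus-factory pattern under stated assumptions; it is not the krkn implementation. The method names `get_scenario_types` and `run`, their simplified signatures, and the `ExampleScenarioPlugin` class are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class AbstractScenarioPlugin(ABC):
    """Simplified, hypothetical base class for a scenario plugin."""

    @abstractmethod
    def get_scenario_types(self) -> list[str]:
        """Return the scenario type keys (as listed in config.yaml) this plugin handles."""

    @abstractmethod
    def run(self, scenario: str) -> int:
        """Run one scenario file; return 0 on success and non-zero on failure."""


class ScenarioPluginFactory:
    """Maps each scenario type key to the plugin instance that handles it."""

    def __init__(self, plugin_classes: list[type[AbstractScenarioPlugin]]):
        self._registry: dict[str, AbstractScenarioPlugin] = {}
        for plugin_class in plugin_classes:
            plugin = plugin_class()
            for scenario_type in plugin.get_scenario_types():
                # Two plugins claiming the same key would make dispatch ambiguous.
                if scenario_type in self._registry:
                    raise ValueError(f"duplicate scenario type: {scenario_type}")
                self._registry[scenario_type] = plugin

    def create_plugin(self, scenario_type: str) -> AbstractScenarioPlugin:
        """Return the plugin registered for scenario_type or raise ValueError."""
        if scenario_type not in self._registry:
            raise ValueError(f"no plugin registered for {scenario_type!r}")
        return self._registry[scenario_type]


class ExampleScenarioPlugin(AbstractScenarioPlugin):
    """Hypothetical plugin used only to show the registration flow."""

    def get_scenario_types(self) -> list[str]:
        return ["example_scenarios"]

    def run(self, scenario: str) -> int:
        print(f"running {scenario}")
        return 0


factory = ScenarioPluginFactory([ExampleScenarioPlugin])
exit_code = factory.create_plugin("example_scenarios").run("scenarios/example.yaml")
```

Rejecting duplicate scenario types when the factory is built surfaces configuration conflicts before any chaos is injected, rather than at dispatch time.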
Naga Ravi Chaitanya Elluri
5e7938ba4a
Update default configuration pointer for the node scenarios (#693)
Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
2024-09-09 22:10:25 -04:00
Tullio Sebastiani
6186555c15
Elasticsearch krkn-lib integration (#658)
* Elasticsearch krkn-lib integration
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
removed default urls
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* Fix alerts bug on Prometheus
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* fixed Prometheus object initialization bug
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* updated requirements to krkn-lib 2.1.8
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* disabled alerts and metrics by default
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* reverted requirement to elastic branch on krkn-lib
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* numpy downgrade
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* maximum retries added to hijacking funtest
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* added elastic settings to funtest config
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* krkn-lib 3.0.0 update
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
---------
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-08-28 10:46:42 -04:00
Naga Ravi Chaitanya Elluri
624f50acd1
Output rate of increase for the SLO queries
This commit:
- Also switches the severity of the rate queries to critical, as a 5%
threshold is already high for low scale/density clusters and needs to be flagged.
- Adds rate queries to the OpenShift alerts file
Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
2024-08-01 12:29:35 -04:00
Tullio Sebastiani
e02c6d1287
SYN flood scenario (#668)
* scenario config file
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* syn flood plugin
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* run_krkn.py updated
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* requirements.txt + documentation + config.yaml
* set node selector defaults to worker
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
---------
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-07-29 15:31:37 -04:00
jtydlack
04425a8d8a
Add alerts to alert.yaml
Signed-off-by: jtydlack <139967002+jtydlack@users.noreply.github.com>
2024-07-25 10:51:15 -04:00
Tullio Sebastiani
a142f6e7a4
Service hijacking scenario (#617)
* WIP: service hijacking scenario
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* wip
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* error handling
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
adapted run_kraken.py
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* restored config.yaml
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* added funtest
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
test fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fixed test
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fix test
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fixed funtest
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
funtest fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
minor nit
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
added explicit curl method
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
push
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fix
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
restored all funtests
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
added mime type test
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fixed pipeline
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
commented unit
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
utf-8
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
test restored
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
fix test pipeline
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* documentation
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* krkn-lib 2.1.3
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
* added other funtests to main merge to collect coverage
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
---------
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-05-13 10:04:06 +02:00
jtydlcak
804d7cbf58
Accept list of namespaces in chaos recommender
Signed-off-by: jtydlack <139967002+jtydlack@users.noreply.github.com>
2024-04-09 23:32:17 -04:00
Liangquan Li
8bf21392f1
fix doc's nit
Signed-off-by: Liangquan Li <liangli@redhat.com>
2024-03-13 15:21:57 -04:00
Tullio Sebastiani
c71ce31779
integrated new telemetry library for WS 2.0
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
updated krkn-lib version
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-02-28 22:58:54 -05:00
jtydlcak
24059fb731
Add json output file option for recommender (#511)
Terminal output now uses a JSON structure.
JSON output file names use the format
recommender_namespace_YYYY-MM-DD_HH-MM-SS.
The path to the JSON file can be specified; the default path is
kraken/utils/chaos_recommender/recommender_output.
Signed-off-by: jtydlcak <139967002+jtydlack@users.noreply.github.com>
2024-02-27 11:09:00 -05:00
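The file naming scheme described in #511 can be reproduced with `datetime.strftime`. A minimal sketch, assuming the `namespace` part is replaced by the analyzed namespace's name and that files carry a `.json` extension (the commit only gives the base name); the function name is hypothetical.

```python
import os
from datetime import datetime

# Default directory quoted from the commit message. The ".json" extension is an
# assumption; the message only gives the base name format.
DEFAULT_OUTPUT_DIR = "kraken/utils/chaos_recommender/recommender_output"


def recommender_output_path(
    namespace: str, when: datetime, output_dir: str = DEFAULT_OUTPUT_DIR
) -> str:
    """Build <output_dir>/recommender_<namespace>_YYYY-MM-DD_HH-MM-SS.json."""
    timestamp = when.strftime("%Y-%m-%d_%H-%M-%S")
    return os.path.join(output_dir, f"recommender_{namespace}_{timestamp}.json")


print(recommender_output_path("openshift-etcd", datetime(2024, 2, 27, 11, 9, 0)))
```

On a POSIX system this prints `kraken/utils/chaos_recommender/recommender_output/recommender_openshift-etcd_2024-02-27_11-09-00.json`. Zero-padded fields keep the names sortable by time.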
Paige Rubendall
fec0434ce1
adding upload to Elasticsearch
Signed-off-by: Paige Rubendall <prubenda@redhat.com>
2024-02-13 12:01:40 -05:00
Paige Rubendall
67d4ee9fa2
updating comment to match query (#568)
Signed-off-by: Paige Rubendall <prubenda@redhat.com>
2024-02-08 22:09:37 -05:00
Naga Ravi Chaitanya Elluri
487a9f464c
Deprecate long term metrics collection
This will be added back soon via native Prometheus integration.
Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
2024-01-10 15:08:58 -05:00
Tullio Sebastiani
f2d7f88cb8
Krkn lib prometheus client + kube_burner references removed
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-01-09 10:43:32 -05:00
Paige Rubendall
b03511850b
taking out more litmus references
2023-12-03 13:10:52 +05:30
Tullio Sebastiani
7a966a71d0
krkn integration of telemetry events collection (#523)
* function package refactoring in krkn-lib
* cluster events collection flag
* krkn-lib version bump
requirements
* dockerfile bump
2023-10-31 14:31:33 -04:00
Naga Ravi Chaitanya Elluri
43d891afd3
Bump telemetry archive default size to 500MB
This commit also removes litmus configs as they are not maintained.
2023-10-30 12:50:04 -04:00
Tullio Sebastiani
724068a978
Chaos recommender refactoring (#516)
* basic structure working
* config and options refactoring
nits and changes
* removed unused function with typo + fixed duration
* removed unused arguments
* minor fixes
2023-10-30 15:51:09 +01:00
Paige Rubendall
f7f1b2dfb0
Service disruption (#494)
* adding service disruption
* fixing kill services
* service log changes
* removing extra logging
* adding daemon set
* adding service disruption name changes
* cerberus config back
* bad string
2023-10-06 12:51:10 -04:00
Tullio Sebastiani
61356fd70b
Added log telemetry piece to Krkn (#500)
* config
* log collection and upload
dictionary key fix
* escape regex in config.yaml
* bump krkn-lib version
* updated funtest github cli command
* update krkn-lib version to 1.3.2
* fixed requirements.txt
2023-10-06 10:08:46 -04:00
Tullio Sebastiani
f6f686e8fe
fixed io-hog scenario
2023-09-13 09:57:00 -04:00
Sahil Shah
585d519687
Adding Prometheus Disruption Scenario (#484)
2023-09-11 11:18:29 -04:00
yogananth-subramanian
e40fedcd44
Update etcd metrics
2023-09-08 11:11:42 -04:00
Tullio Sebastiani
cee5259fd3
arcaflow scenarios removed from config.yaml
2023-08-23 08:50:19 -04:00
pratyusha
d2d80be241
Updated config.yaml file with more scenarios (#468)
2023-08-21 11:26:33 -04:00
Tullio Sebastiani
39c0152b7b
Krkn telemetry integration (#435)
* adapted config.yaml to the new feature
* temporarily pointing requirements.txt to the lib feature branch
* run_kraken.py + arcaflow scenarios refactoring
typo
* plugin scenario
* node scenarios
return failed scenarios
* container scenarios
fix
* time scenarios
* cluster shutdown scenarios
* namespace scenarios
* zone outage scenarios
* app outage scenarios
* pvc scenarios
* network chaos scenarios
* run_kraken.py adaptation to telemetry
* prometheus telemetry upload + config.yaml
some fixes
typos and logs
max retries in config
telemetry id with run_uuid
safe_logger
* catch send_telemetry exception
* scenario collection bug fixes
* telemetry enabled check
* telemetry run tag
* requirements pointing to main + archive_size
* requirements.txt and config.yaml update
* added telemetry config to common config
* fixed scenario array elements for telemetry
2023-08-10 14:42:53 -04:00
jtydlack
491dc17267
SLO via HTTP (#459)
* Fix typo
* Enable loading SLO profile via URL (#438)
2023-08-10 11:02:33 -04:00
yogananth-subramanian
b2b5002f45
Pod egress network shaping Chaos scenario
The scenario introduces network latency, packet loss, and bandwidth restriction on the pod's network interface.
The purpose of this scenario is to observe faults caused by random variations in the network.
The example config below applies egress traffic shaping to the OpenShift console.
````
- id: pod_egress_shaping
  config:
    namespace: openshift-console    # Required - Namespace of the pod to which the filter needs to be applied.
    label_selector: 'component=ui'  # Applies traffic shaping to access the OpenShift console.
    network_params:
      latency: 500ms                # Add 500ms latency to egress traffic from the pod.
````
2023-08-08 11:45:03 -04:00
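Egress latency, loss, and bandwidth limits of this kind are commonly applied with the Linux `tc` netem queueing discipline. The sketch below assumes that mechanism (the commit does not say how the plugin applies the rules) and assumes `eth0` as the interface inside the pod's network namespace. The `loss` and `bandwidth` keys are assumptions; only `latency` appears in the example config. It only builds the command and does not run it.

```python
# Hypothetical mapping from network_params keys to tc netem options. Only
# "latency" appears in the example config; "loss" and "bandwidth" are assumed.
NETEM_OPTIONS = {"latency": "delay", "loss": "loss", "bandwidth": "rate"}


def netem_command(interface: str, network_params: dict) -> list[str]:
    """Build a `tc qdisc replace ... root netem` argv for egress shaping."""
    unknown = set(network_params) - set(NETEM_OPTIONS)
    if unknown:
        raise ValueError(f"unsupported network_params: {sorted(unknown)}")
    # A root qdisc on an interface shapes packets leaving through it (egress).
    argv = ["tc", "qdisc", "replace", "dev", interface, "root", "netem"]
    for key, option in NETEM_OPTIONS.items():
        if key in network_params:
            argv += [option, str(network_params[key])]
    return argv


print(" ".join(netem_command("eth0", {"latency": "500ms"})))
```

This prints `tc qdisc replace dev eth0 root netem delay 500ms`. Running such a command needs root privileges in the target network namespace, and `tc qdisc del dev eth0 root` removes the shaping again.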
Naga Ravi Chaitanya Elluri
de0567b067
Tweak the etcd alert severity
2023-06-16 09:19:17 -04:00
Naga Ravi Chaitanya Elluri
ce409ea6fb
Update kube-burner dependency version to 1.7.0
2023-06-15 11:55:17 -04:00
Naga Ravi Chaitanya Elluri
0eb8d38596
Expand SLOs profile to cover monitoring for more alerts
This commit:
- Also sets appropriate severities to avoid false failures in the
test cases, especially since some of these are monitored during chaos
rather than post chaos. All critical alerts are monitored post chaos, and
a few that represent the overall health and performance of the service
are also monitored during chaos.
- Renames Alerts to SLOs validation
Metrics reference: f09a492b13/cmd/kube-burner/ocp-config/alerts.yml
2023-06-14 16:58:36 -04:00
Tullio Sebastiani
72b46f8393
temporarily removed io-hog scenario (#433)
* temporarily removed io-hog scenario
* removed litmus documentation & config
2023-06-05 11:03:44 -04:00
Naga Ravi Chaitanya Elluri
9858f96c78
Change the severity of the etcd leader election check to warning
This is the first step towards only tracking metrics that reflect
the overall health and performance of the component/cluster. For instance,
leader elections are expected during etcd disruption scenarios, so instead of
tracking leader elections we should track etcd leader availability and fsync
latency under the critical category.
2023-05-31 11:50:20 -04:00
yogananth-subramanian
8806781a4f
Pod network outage Chaos scenario
The pod network outage chaos scenario blocks traffic at the pod level, irrespective of the network policy in use.
With current network policies, it is not possible to explicitly block ports that are opened
by an allowing network policy rule. This chaos scenario addresses the issue by using OVS flow rules
to block ports related to the pod. It supports OpenShiftSDN and OVNKubernetes based networks.
The example config below blocks access to the OpenShift console.
````
- id: pod_network_outage
  config:
    namespace: openshift-console
    direction:
      - ingress
    ingress_ports:
      - 8443
    label_selector: 'component=ui'
````
2023-05-15 10:43:58 -04:00
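The OVS approach can be illustrated by the kind of flow rules it would need. This hypothetical sketch builds `ovs-ofctl`-style flow strings that drop IPv4 TCP traffic to or from a pod's IP on the configured ports. The function name, the priority value, matching by pod IP, and the bridge name `br0` in the printed command are assumptions; the commit does not specify the exact rules krkn installs, and bridge names differ between network plugins.

```python
import ipaddress
from typing import Sequence


def ovs_drop_flows(
    pod_ip: str,
    direction: Sequence[str],
    ingress_ports: Sequence[int] = (),
    egress_ports: Sequence[int] = (),
    priority: int = 65535,
) -> list[str]:
    """Build OpenFlow match/action strings that drop TCP traffic on given pod ports."""
    # nw_src/nw_dst are IPv4 match fields, so reject anything else up front.
    ipaddress.IPv4Address(pod_ip)
    flows = []
    if "ingress" in direction:
        # Traffic arriving at the pod on one of its listening ports.
        for port in ingress_ports:
            flows.append(f"priority={priority},tcp,nw_dst={pod_ip},tp_dst={port},actions=drop")
    if "egress" in direction:
        # Traffic leaving the pod towards a remote port.
        for port in egress_ports:
            flows.append(f"priority={priority},tcp,nw_src={pod_ip},tp_dst={port},actions=drop")
    return flows


for flow in ovs_drop_flows("10.128.2.15", ["ingress"], ingress_ports=[8443]):
    print(f'ovs-ofctl -O OpenFlow13 add-flow br0 "{flow}"')
```

Because the drop happens in the OVS datapath rather than in a NetworkPolicy, it applies even when a policy rule explicitly allows the port, which is the gap the scenario targets.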
Tullio Sebastiani
83b811bee4
Arcaflow stress-ng hogs with parallelism support (#418)
* kubeconfig management for arcaflow + hogs scenario refactoring
* kubeconfig authentication parsing refactored to support arcaflow kubernetes deployer
* reimplemented all the hog scenarios to allow multiple parallel containers of the same scenarios
(e.g., to stress two or more nodes in the same run simultaneously)
* updated documentation
* removed sysbench scenarios
* recovered cpu hogs
* updated requirements.txt
* updated config.yaml
* added gitleaks file for test fixtures
* imported sys and logging
* removed config_arcaflow.yaml
* updated readme
* refactored arcaflow documentation entrypoint
2023-05-15 09:45:16 -04:00
Paige Rubendall
16ea18c718
IBM plugin node scenario (#417)
* Node scenarios for ibmcloud
* adding openshift check info
2023-05-09 12:07:38 -04:00
Naga Ravi Chaitanya Elluri
bc863fa01f
Add support to check for critical alerts
This commit enables users to opt in to checking for critical alerts firing
in the cluster post chaos, at the end of each scenario. A chaos scenario is
considered failed if the cluster is unhealthy, in which case the user can
start debugging to fix and harden the respective areas.
Fixes https://github.com/redhat-chaos/krkn/issues/410
2023-05-03 16:14:13 -04:00
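The opt-in check can be sketched as a filter over the alerts Prometheus reports once a scenario finishes. The flag name `check_critical_alerts` and both helper functions are hypothetical; the alert dictionaries follow the shape of the Prometheus HTTP API `/api/v1/alerts` response (`labels`, `state`).

```python
def firing_critical_alerts(alerts: list[dict]) -> list[str]:
    """Names of alerts that are firing with severity=critical.

    `alerts` is assumed to be the `data.alerts` list returned by the Prometheus
    HTTP API endpoint /api/v1/alerts.
    """
    names = []
    for alert in alerts:
        labels = alert.get("labels", {})
        if alert.get("state") == "firing" and labels.get("severity") == "critical":
            names.append(labels.get("alertname", "<unnamed>"))
    return sorted(names)


def scenario_passed(alerts: list[dict], check_critical_alerts: bool) -> bool:
    """Fail the scenario when the opt-in check finds critical alerts firing."""
    if not check_critical_alerts:
        return True
    critical = firing_critical_alerts(alerts)
    for name in critical:
        print(f"critical alert firing post chaos: {name}")
    return not critical
```

Pending alerts are ignored here, so an alert that has not yet satisfied its `for` duration does not fail the run; whether krkn treats pending alerts the same way is not stated in the commit.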
Tullio Sebastiani
3627b5ba88
cpu hog scenario + basic arcaflow documentation (#391)
typo
typo
updated documentation
fixed workflow map issue
2023-03-15 16:52:20 +01:00
Tullio Sebastiani
fee4f7d2bf
arcaflow integration (#384)
arcaflow library version
Co-authored-by: Tullio Sebastiani <tsebasti@redhat.com>
2023-03-08 12:01:03 +01:00
José Castillo Lema
493a8a245f
Docker provider for node actions (#369)
* Docker provider for node actions
* Adjusted dependencies and imports
* Update config_kind.yaml
Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>
2023-01-10 14:36:18 -05:00
Naga Ravi Chaitanya Elluri
6b17dbdbb3
Allow users to set the listening address
This commit provides an option for the user to set the listening address
for the signal. This also fixes a security vulnerability.
Fixes https://github.com/redhat-chaos/krkn/issues/307
2022-11-08 15:59:57 -05:00
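A configurable listening address can be illustrated with Python's standard `http.server`. This is a hypothetical sketch, not krkn's signal server: the handler, the `RUN` state string, and the defaults `127.0.0.1` and `8081` are assumptions. Binding to a specific address such as `127.0.0.1` instead of `0.0.0.0` keeps the endpoint off the host's other interfaces; the commit does not describe the vulnerability it fixed.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class SignalHandler(BaseHTTPRequestHandler):
    # Hypothetical run state reported to clients, e.g. "RUN" or "PAUSE".
    state = "RUN"

    def do_GET(self):
        body = self.state.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_signal_server(address: str = "127.0.0.1", port: int = 8081) -> HTTPServer:
    """Bind the signal endpoint to a user-chosen address instead of all interfaces."""
    # HTTPServer binds and starts listening in its constructor.
    return HTTPServer((address, port), SignalHandler)
```

Calling `make_signal_server("0.0.0.0")` restores the listen-everywhere behaviour when remote clients genuinely need to reach the endpoint; `serve_forever()` then starts handling requests.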
Sandro Bonazzola
0c36903fff
config: really default to ~ instead of /root
Documentation says we default to ~ for looking up the Kubernetes config,
but then we set /root everywhere. Fixed the config to really look for ~.
Should solve #327.
Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>
2022-09-13 12:01:16 +02:00
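Defaulting to `~` only works if the path is expanded before use, because Python file APIs treat a literal `~` as an ordinary directory name. A minimal sketch, assuming a config key named `kubeconfig_path` and a `~/.kube/config` default (both assumptions for illustration):

```python
import os

# Assumed default, mirroring the conventional kubectl location.
DEFAULT_KUBECONFIG = "~/.kube/config"


def resolve_kubeconfig(config: dict) -> str:
    """Return an absolute kubeconfig path, expanding ~ to the current user's home."""
    raw = config.get("kubeconfig_path") or DEFAULT_KUBECONFIG
    # expanduser turns "~" into $HOME (or the passwd entry), so non-root users
    # are not pointed at /root.
    return os.path.abspath(os.path.expanduser(raw))
```

For a user whose home is `/home/alice`, `resolve_kubeconfig({})` returns `/home/alice/.kube/config`, while an explicit absolute path is returned unchanged.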
Shreyas Anantha Ramaprasad
9421a0c2c2
Added support for ingress traffic shaping (#299)
* Added plugin for ingress network traffic shaping
* Documentation changes
* Minor changes
* Documentation and formatting fixes
* Added trap to sleep infinity command running in containers
* Removed shell injection threat for modprobe commands
* Added docstrings to cerberus functions
* Added checks to prevent shell injection
* Bug fix
2022-09-02 07:54:11 +02:00
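The shell-injection fixes follow a standard pattern: validate untrusted values, then pass an argument list to `subprocess.run` instead of building a shell string. A minimal sketch under that assumption; the helper names are hypothetical, and loading the `ifb` module with a `numifbs` parameter is used only as an example, since the commit does not show the actual commands.

```python
import re
import subprocess
from typing import Optional

# Module and parameter names: letters, digits, '_' and '-' only.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def modprobe_argv(module: str, params: Optional[dict[str, int]] = None) -> list[str]:
    """Validate inputs and build an argv list for modprobe."""
    # fullmatch (not match with '$') so a trailing newline is rejected too.
    if not _SAFE_NAME.fullmatch(module):
        raise ValueError(f"invalid module name: {module!r}")
    argv = ["modprobe", module]
    for key, value in (params or {}).items():
        if not _SAFE_NAME.fullmatch(key) or isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"invalid module parameter: {key!r}={value!r}")
        argv.append(f"{key}={value}")
    return argv


def load_module(module: str, params: Optional[dict[str, int]] = None) -> None:
    # An argv list with the default shell=False means no shell parses the
    # arguments, so characters like ';' or '$(...)' are never interpreted.
    subprocess.run(modprobe_argv(module, params), check=True)
```

`modprobe_argv("ifb", {"numifbs": 1})` returns `["modprobe", "ifb", "numifbs=1"]`, while `modprobe_argv("ifb; reboot")` raises `ValueError` before anything is executed.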
Naga Ravi Chaitanya Elluri
6c75d3dddb
Add option to skip litmus installation
This commit adds an option for the user to choose whether or not to install
Litmus, depending on their use case. One use case is disconnected
environments where Litmus is pre-installed instead of being pulled from the
internet.
2022-08-23 14:09:10 -04:00
Shreyas Anantha Ramaprasad
08deae63dd
Added VMware Node Scenarios (#285)
* Added VMware node scenarios
* Made vmware plugin independent of Krkn
* Revert changes made to node status watch
* Fixed minor documentation changes
2022-08-15 23:35:16 +02:00
Janos Bonic
ccd902565e
Fixes #265: Replace PowerfulSeal and introduce Wolkenwalze SDK for plugin system
2022-08-02 16:25:03 +01:00
Naga Ravi Chaitanya Elluri
9208f39e06
Add support to run on Kubernetes
This commit:
- Uses the distribution flag set by the user in the config to skip
steps that are only supported on OpenShift, so that scenarios can run on
Kubernetes.
- Adds sample config and scenario files that work on Kubernetes.
2022-06-01 07:27:06 -05:00
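A distribution switch like the one described above can be sketched as a filter over run steps. The accepted `distribution` values, the step dictionaries, and the `openshift_only` marker are hypothetical; the sketch only illustrates skipping OpenShift-only work when targeting plain Kubernetes.

```python
SUPPORTED_DISTRIBUTIONS = {"openshift", "kubernetes"}


def select_steps(steps: list[dict], distribution: str) -> list[dict]:
    """Drop steps marked OpenShift-only when running against plain Kubernetes."""
    distribution = distribution.lower()
    if distribution not in SUPPORTED_DISTRIBUTIONS:
        raise ValueError(f"unsupported distribution: {distribution!r}")
    if distribution == "openshift":
        return list(steps)
    return [step for step in steps if not step.get("openshift_only", False)]


steps = [
    {"name": "inject chaos"},
    {"name": "collect OpenShift-specific metrics", "openshift_only": True},
]
print([step["name"] for step in select_steps(steps, "kubernetes")])
```

This prints `['inject chaos']`. Rejecting unknown distribution values early avoids silently running OpenShift-only steps because of a typo in the config.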
Adolfo Aguirrezabal
3adf5847b2
Add option to avoid litmus uninstall before chaos run (#242)
* Adds option to avoid litmus uninstall before chaos run
* Add new option to the config files
2022-05-05 09:02:25 -04:00
Steven Barre
3691bba5af
Update Litmus version in config_performance.yaml
v1.10 uses a different case for `Experimentstatus` than v1.13 and thus will always fail.
2022-03-21 09:06:39 -04:00