github/krkn - krkn

mirror of https://github.com/krkn-chaos/krkn.git synced 2026-04-15 06:57:28 +00:00

Author	SHA1	Message	Date
Naga Ravi Chaitanya Elluri	487a9f464c	Deprecate long term metrics collection This will be added back soon via native prometheus integration. Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>	2024-01-10 15:08:58 -05:00
Tullio Sebastiani	f2d7f88cb8	Krkn lib prometheus client + kube_burner references removed Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-01-09 10:43:32 -05:00
Paige Rubendall	b03511850b	taking out more litmus references	2023-12-03 13:10:52 +05:30
Tullio Sebastiani	7a966a71d0	krkn integration of telemetry events collection (#523 ) * function package refactoring in krkn-lib * cluster events collection flag * krkn-lib version bump requirements * dockerfile bump	2023-10-31 14:31:33 -04:00
Naga Ravi Chaitanya Elluri	43d891afd3	Bump telemetry archive default size to 500MB This commit also removes litmus configs as they are not maintained.	2023-10-30 12:50:04 -04:00
Tullio Sebastiani	724068a978	Chaos recommender refactoring (#516 ) * basic structure working * config and options refactoring nits and changes * removed unused function with typo + fixed duration * removed unused arguments * minor fixes	2023-10-30 15:51:09 +01:00
Paige Rubendall	f7f1b2dfb0	Service disruption (#494 ) * adding service disruption * fixing kill services * service log changes * removing extra logging * adding daemon set * adding service disruption name changes * cerberus config back * bad string	2023-10-06 12:51:10 -04:00
Tullio Sebastiani	61356fd70b	Added log telemetry piece to Krkn (#500 ) * config * log collection and upload dictionary key fix * escape regex in config.yaml * bump krkn-lib version * updated funtest github cli command * update krkn-lib version to 1.3.2 * fixed requirements.txt	2023-10-06 10:08:46 -04:00
Tullio Sebastiani	f6f686e8fe	fixed io-hog scenario	2023-09-13 09:57:00 -04:00
Sahil Shah	585d519687	Adding Prometheus Disruption Scenario (#484 )	2023-09-11 11:18:29 -04:00
yogananth-subramanian	e40fedcd44	Update etcd metrics	2023-09-08 11:11:42 -04:00
Tullio Sebastiani	cee5259fd3	arcaflow scenarios removed from config.yaml	2023-08-23 08:50:19 -04:00
pratyusha	d2d80be241	Updated config.yaml file with more scenarios (#468 )	2023-08-21 11:26:33 -04:00
Tullio Sebastiani	39c0152b7b	Krkn telemetry integration (#435 ) * adapted config.yaml to the new feature * temporarily pointing requirement.txt to the lib feature branch * run_kraken.py + arcaflow scenarios refactoring typo * plugin scenario * node scenarios return failed scenarios * container scenarios fix * time scenarios * cluster shutdown scenarios * namespace scenarios * zone outage scenarios * app outage scenarios * pvc scenarios * network chaos scenarios * run_kraken.py adaptation to telemetry * prometheus telemetry upload + config.yaml some fixes typos and logs max retries in config telemetry id with run_uuid safe_logger * catch send_telemetry exception * scenario collection bug fixes * telemetry enabled check * telemetry run tag * requirements pointing to main + archive_size * requirements.txt and config.yaml update * added telemetry config to common config * fixed scenario array elements for telemetry	2023-08-10 14:42:53 -04:00
jtydlack	491dc17267	Slo via http (#459 ) * Fix typo * Enable loading SLO profile via URL (#438)	2023-08-10 11:02:33 -04:00
yogananth-subramanian	b2b5002f45	Pod egress network shaping Chaos scenario The scenario introduces network latency, packet loss, and bandwidth restriction in the Pod's network interface. The purpose of this scenario is to observe faults caused by random variations in the network. The example config below applies egress traffic shaping to openshift console. ```` - id: pod_egress_shaping config: namespace: openshift-console # Required - Namespace of the pod to which filter need to be applied. label_selector: 'component=ui' # Applies traffic shaping to access openshift console. network_params: latency: 500ms # Add 500ms latency to egress traffic from the pod. ````	2023-08-08 11:45:03 -04:00
Naga Ravi Chaitanya Elluri	de0567b067	Tweak the etcd alert severity	2023-06-16 09:19:17 -04:00
Naga Ravi Chaitanya Elluri	ce409ea6fb	Update kube-burner dependency version to 1.7.0	2023-06-15 11:55:17 -04:00
Naga Ravi Chaitanya Elluri	0eb8d38596	Expand SLOs profile to cover monitoring for more alerts This commit: - Also sets appropriate severity to avoid false failures for the test cases especially given that these are monitored during the chaos vs post chaos. Critical alerts are all monitored post chaos with few monitored during the chaos that represent overall health and performance of the service. - Renames Alerts to SLOs validation Metrics reference: `f09a492b13/cmd/kube-burner/ocp-config/alerts.yml`	2023-06-14 16:58:36 -04:00
Tullio Sebastiani	72b46f8393	temporarily removed io-hog scenario (#433 ) * temporarily removed io-hog scenario * removed litmus documentation & config	2023-06-05 11:03:44 -04:00
Naga Ravi Chaitanya Elluri	9858f96c78	Change the severity of the etcd leader election check to warning This is the first step towards the goal to only have metrics tracking the overall health and performance of the component/cluster. For instance, for etcd disruption scenarios, leader elections are expected, we should instead track etcd leader availability and fsync latency under critical category vs leader elections.	2023-05-31 11:50:20 -04:00
yogananth-subramanian	8806781a4f	Pod network outage Chaos scenario Pod network outage chaos scenario blocks traffic at pod level irrespective of the network policy used. With the current network policies, it is not possible to explicitly block ports which are enabled by allowed network policy rule. This chaos scenario addresses this issue by using OVS flow rules to block ports related to the pod. It supports OpenShiftSDN and OVNKubernetes based networks. Below example config blocks access to openshift console. ```` - id: pod_network_outage config: namespace: openshift-console direction: - ingress ingress_ports: - 8443 label_selector: 'component=ui' ````	2023-05-15 10:43:58 -04:00
Tullio Sebastiani	83b811bee4	Arcaflow stress-ng hogs with parallelism support (#418 ) * kubeconfig management for arcaflow + hogs scenario refactoring * kubeconfig authentication parsing refactored to support arcaflow kubernetes deployer * reimplemented all the hog scenarios to allow multiple parallel containers of the same scenarios (eg. to stress two or more nodes in the same run simultaneously) * updated documentation * removed sysbench scenarios * recovered cpu hogs * updated requirements.txt * updated config.yaml * added gitleaks file for test fixtures * imported sys and logging * removed config_arcaflow.yaml * updated readme * refactored arcaflow documentation entrypoint	2023-05-15 09:45:16 -04:00
Paige Rubendall	16ea18c718	Ibm plugin node scenario (#417 ) * Node scenarios for ibmcloud * adding openshift check info	2023-05-09 12:07:38 -04:00
Naga Ravi Chaitanya Elluri	bc863fa01f	Add support to check for critical alerts This commit enables users to opt in to check for critical alerts firing in the cluster post chaos at the end of each scenario. Chaos scenario is considered as failed if the cluster is unhealthy in which case user can start debugging to fix and harden respective areas. Fixes https://github.com/redhat-chaos/krkn/issues/410	2023-05-03 16:14:13 -04:00
Tullio Sebastiani	3627b5ba88	cpu hog scenario + basic arcaflow documentation (#391 ) typo typo updated documentation fixed workflow map issue	2023-03-15 16:52:20 +01:00
Tullio Sebastiani	fee4f7d2bf	arcaflow integration (#384 ) arcaflow library version Co-authored-by: Tullio Sebastiani <tsebasti@redhat.com>	2023-03-08 12:01:03 +01:00
José Castillo Lema	493a8a245f	Docker provider for node actions (#369 ) * Docker provider for node actions * Adjusted dependencies and imports * Update config_kind.yaml Signed-off-by: José Castillo Lema <josecastillolema@gmail.com> Signed-off-by: José Castillo Lema <josecastillolema@gmail.com>	2023-01-10 14:36:18 -05:00
Naga Ravi Chaitanya Elluri	6b17dbdbb3	Allow users to set the listening address This commit provides an option for the user to set the listening address for the signal. This also fixes a security vulnerability. Fixes https://github.com/redhat-chaos/krkn/issues/307	2022-11-08 15:59:57 -05:00
Sandro Bonazzola	0c36903fff	config: really default to ~ instead of /root Documentation says we default to ~ for looking up the kubernetes config but then we set everywhere /root. Fixed the config to really look for ~. Should solve #327. Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>	2022-09-13 12:01:16 +02:00
Shreyas Anantha Ramaprasad	9421a0c2c2	Added support for ingress traffic shaping (#299 ) * Added plugin for ingress network traffic shaping * Documentation changes * Minor changes * Documentation and formatting fixes * Added trap to sleep infinity command running in containers * Removed shell injection threat for modprobe commands * Added docstrings to cerberus functions * Added checks to prevent shell injection * Bug fix	2022-09-02 07:54:11 +02:00
Naga Ravi Chaitanya Elluri	6c75d3dddb	Add option to skip litmus installation This commit adds an option for the user to pick whether to install litmus or not depending on their use case. One use case is disconnected environments where litmus is pre-installed instead of reaching out to the internet.	2022-08-23 14:09:10 -04:00
Shreyas Anantha Ramaprasad	08deae63dd	Added VMware Node Scenarios (#285 ) * Added VMware node scenarios * Made vmware plugin independent of Krkn * Revert changes made to node status watch * Fixed minor documentation changes	2022-08-15 23:35:16 +02:00
Janos Bonic	ccd902565e	Fixes #265 : Replace Powerfulseal and introduce Wolkenwalze SDK for plugin system	2022-08-02 16:25:03 +01:00
Naga Ravi Chaitanya Elluri	9208f39e06	Add support to run on Kubernetes This commit: - Leverages distribution flag in the config set by the user to skip things not supported on OpenShift to be able to run scenarios on Kubernetes. - Adds sample config and scenario files that work on Kubernetes.	2022-06-01 07:27:06 -05:00
Adolfo Aguirrezabal	3adf5847b2	Add option to avoid litmus uninstall before chaos run (#242 ) * Adds option to avoid litmus uninstall before chaos run * Add new option to the config files	2022-05-05 09:02:25 -04:00
Steven Barre	3691bba5af	Update Litmus version in config_performance.yaml v1.10 uses a different case for `Experimentstatus` than v1.13 and thus will always fail.	2022-03-21 09:06:39 -04:00
yogananth-subramanian	50dd9873c1	Node egress traffic shaping Patch adds a scenario to create variations in egress traffic of a Node's interface using the tc and Netem.	2021-12-16 12:54:53 -05:00
Alejandro Gullón	baa812b7f0	Added new scenario to fill up a given volume (#182 ) * Added new scenario to fill up a given volume * fixing small issues and style * adding PVC as input param instead of pod name * small fix * get container name and volume name replace oc with kubectl commands * adding yaml file to create a pv, pvc and pod to run pvc_scenario * adding support to match both string for describe command when looking for pod_name * added support to find the pvc from a given pod * small fix * small fix	2021-11-24 12:18:49 -05:00
Naga Ravi Chaitanya Elluri	674eb74a75	Expose setting the signal in the config This commit enables users to start Kraken to act as listener by setting the signal to PAUSE in the config to get the cluster to a desired test or run any setup before injecting chaos by setting the signal to RUN. This helps in cases where we have test cases that need to coordinate the chaos at a desired time depending on the state of the cluster/test run.	2021-10-26 09:05:25 -04:00
Paige Rubendall	6b865fc573	Adding server set up for kraken	2021-10-25 08:58:46 -04:00
Naga Ravi Chaitanya Elluri	970cd061f4	Set the location of cerberus config to match entrypoint Entrypoint for reference - https://github.com/cloud-bulldozer/cerberus/blob/master/containers/Dockerfile#L23.	2021-10-08 09:25:14 -04:00
Naga Ravi Chaitanya Elluri	cdf3bc03d2	Add support to block traffic to an application This commit enables users to simulate a downtime of an application by blocking the traffic for the specified duration to see how it/other components communicating with it behave in case of downtime.	2021-10-01 10:13:40 -04:00
Paige Rubendall	22df024312	adding validation that namespace becomes active	2021-09-28 09:58:55 -04:00
Naga Ravi Chaitanya Elluri	036e51a6b1	Delete litmus crd's during the cleanup This commit will ensure that the litmus resources installed on the cluster get cleaned up and also creates the chaosengine in the specified namespace.	2021-09-16 16:30:21 -04:00
Paige Rubendall	a9056ddf43	adding litmus logging	2021-09-08 17:11:49 -04:00
Naga Ravi Chaitanya Elluri	5da0b259c5	Run all the litmus resources in a single namespace - This eases the usage and debuggability by running the fault injection pods in the same namespace as other resources of litmus. This will also ease the deletion process and ensure that there are no leftover objects on the cluster. - This commit also enables users to use the same rbac template for all the litmus scenarios without having to pull in a specific one for each of the scenarios.	2021-09-08 16:37:07 -04:00
Naga Ravi Chaitanya Elluri	68a32666cd	Update litmus docs with supported scenarios	2021-09-01 16:41:22 -04:00
prubenda	9b0bcdbf0e	Adding node memory hog scenario	2021-08-20 14:02:00 -04:00
Naga Ravi Chaitanya Elluri	6456eec76a	Add zone outage scenarios This commit adds support to create zone outage in AWS by denying both ingress and egress traffic to the instances belonging to a particular subnet belonging to the zone by tweaking the network acl. This creates an outage of all the nodes in the zone - both master and workers.	2021-08-17 11:43:13 -04:00
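
The two example configs quoted in commits b2b5002f45 (pod egress shaping) and 8806781a4f (pod network outage) appear flattened onto a single line in the listing above. Laid out as YAML they might read as follows. The keys, values, and comments are taken from the commit messages; the indentation and nesting are reconstructed here and should be treated as an assumption, not as the exact content of the committed files.

```yaml
# Pod egress shaping, from commit b2b5002f45 (nesting is assumed)
- id: pod_egress_shaping
  config:
    namespace: openshift-console    # Required - Namespace of the pod to which filter need to be applied.
    label_selector: 'component=ui'  # Applies traffic shaping to access openshift console.
    network_params:
      latency: 500ms                # Add 500ms latency to egress traffic from the pod.

# Pod network outage, from commit 8806781a4f (nesting is assumed)
- id: pod_network_outage
  config:
    namespace: openshift-console
    direction:
      - ingress
    ingress_ports:
      - 8443
    label_selector: 'component=ui'
```

In both cases the `label_selector` of `component=ui` targets the OpenShift console pods, so the first entry adds latency to their egress traffic and the second blocks ingress on port 8443.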

73 Commits