Commit Graph

127 Commits

Author SHA1 Message Date
Naga Ravi Chaitanya Elluri
32fe0223ff Add recommendations around Pod Disruption Budgets
This commit adds recommendation to test and ensure Pod Disruption
Budgets are set for critical applications to avoid downtime.

Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
2025-03-06 07:56:02 -05:00
jtydlack
a25736ad08 Add autodetecting distribution (#753)
Used is_openshift function from krkn lib

Remove distribution from config

Remove distribution from documentation

Signed-off-by: jtydlack <139967002+jtydlack@users.noreply.github.com>
2025-02-13 15:45:08 -05:00
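The commit above replaces a configured distribution value with autodetection. As a minimal sketch of the idea, assuming detection is based on whether the cluster serves the OpenShift-only `config.openshift.io` API group: the function name `detect_distribution` and its list-of-strings input are hypothetical, and this is not the `is_openshift` implementation from krkn lib.

```python
# Hypothetical helper, NOT the krkn-lib is_openshift implementation.
# Assumption: an OpenShift cluster serves the "config.openshift.io" API group,
# while a plain Kubernetes cluster does not.
OPENSHIFT_API_GROUP = "config.openshift.io"


def detect_distribution(api_groups):
    """Return "openshift" if the OpenShift-only API group is present, else "kubernetes"."""
    if OPENSHIFT_API_GROUP in set(api_groups):
        return "openshift"
    return "kubernetes"


if __name__ == "__main__":
    # The group names would normally come from the cluster's API discovery endpoint.
    print(detect_distribution(["apps", "batch", "config.openshift.io"]))
    print(detect_distribution(["apps", "batch"]))
```

Keeping the check as a pure function over group names makes it testable without a live cluster.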
Meghana Katta
69bf20fc76 Fixed the spelling mistake
Signed-off-by: Meghana Katta <mkatta@mkatta-thinkpadt14gen4.bengluru.csb>
2025-02-05 12:53:30 -05:00
Paige Patton
21ab8d475d adding vsphere updates to non native
Signed-off-by: Paige Patton <prubenda@redhat.com>
2025-01-31 15:21:48 -05:00
Tullio Sebastiani
c7e068a562 Hog scenario porting from arcaflow to native (#748)
* added new native hog scenario

* removed arcaflow dependency + legacy hog scenarios

* config update

* changed hog configuration structure + added average samples

* fix on cpu count

* removes tripledes warning

* changed selector format

* changed selector syntax

* number of nodes option

* documentation

* functional tests

* exception handling on hog deployment thread
2025-01-31 17:01:26 +01:00
Pablo Méndez Hernández
667798d588 Change API from 'Google API Client' to 'Google Cloud Python Client' (#723)
* Document how to use Google's credentials associated with a user account

Signed-off-by: Pablo Méndez Hernández <pablomh@redhat.com>

* Change API from 'Google API Client' to 'Google Cloud Python Client'

According to the 'Google API Client' GH page:

```
This library is considered complete and is in maintenance mode. This means
that we will address critical bugs and security issues but will not add any
new features.

This library is officially supported by Google. However, the maintainers of
this repository recommend using Cloud Client Libraries for Python, where
possible, for new code development.
```

So change the code accordingly to adapt it to 'Google Cloud Python Client'.

Signed-off-by: Pablo Méndez Hernández <pablomh@redhat.com>

---------

Signed-off-by: Pablo Méndez Hernández <pablomh@redhat.com>
2024-12-12 22:34:45 -05:00
jtydlack
0c30d89a1b Add node_disk_detach_attach_scenario for aws under node scenarios
Resolves #678

Signed-off-by: jtydlack <139967002+jtydlack@users.noreply.github.com>

Add functions for aws detach disk scenario

Signed-off-by: jtydlack <139967002+jtydlack@users.noreply.github.com>

Add detach disk scenario in node scenario

Signed-off-by: jtydlack <139967002+jtydlack@users.noreply.github.com>

Add disk_detach_attach_scenario in docs

Signed-off-by: jtydlack <139967002+jtydlack@users.noreply.github.com>
2024-12-10 09:21:05 -05:00
Paige Patton
2ba20fa483 adding code block 2024-12-05 12:37:43 -05:00
Paige Patton
491f59d152 few small changes
Signed-off-by: Paige Patton <prubenda@redhat.com>
2024-11-12 10:34:09 -07:00
Henrick Goldwurm
949f1f09e0 Add support for user-provided default network ACL (#731)
* Add support for user-provided default network ACL

Signed-off-by: henrick <self@thehenrick.com>

* Add logs to notify user when their provided acl is used

Signed-off-by: henrick <self@thehenrick.com>

* Update docs to include optional default_acl_id parameter in zone_outage

Signed-off-by: henrick <self@thehenrick.com>

---------

Signed-off-by: henrick <self@thehenrick.com>
Co-authored-by: henrick <self@thehenrick.com>
2024-11-06 12:58:25 -05:00
Paige Patton
0e68dedb12 adding ibm shut down scenario (#697)
rh-pre-commit.version: 2.2.0
rh-pre-commit.check-secrets: ENABLED

Signed-off-by: Auto User <auto@users.noreply.github.com>
Signed-off-by: Paige Patton <prubenda@redhat.com>
2024-11-01 15:16:07 -04:00
Naga Ravi Chaitanya Elluri
e5c5b35db3 Update kube-burner references to krkn
Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
2024-10-28 11:03:52 -04:00
Pablo Méndez Hernández
93d2e60386 Fix typo in docs index
Replace "oraganization" with "organization" in table of contents.

Signed-off-by: Pablo Méndez Hernández <pablomh@redhat.com>
2024-10-24 15:10:55 -04:00
Tullio Sebastiani
d91172d9b2 Core Refactoring, Krkn Scenario Plugin API (#694)
* relocated shared libraries from `kraken` to `krkn` folder

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* AbstractScenarioPlugin and ScenarioPluginFactory

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* application_outage porting

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* arcaflow_scenarios porting

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* managedcluster_scenarios porting

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* network_chaos porting

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* node_actions porting

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* plugin_scenarios porting

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* pvc_scenarios porting

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* service_disruption porting

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* service_hijacking porting

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* cluster_shut_down_scenarios porting

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* syn_flood porting

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* time_scenarios porting

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* zone_outages porting

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* ScenarioPluginFactory tests

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* unit tests update

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* pod_scenarios and post actions deprecated

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

scenarios post_actions

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* funtests and config update

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* run_krkn.py update

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* utils porting

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* API Documentation

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* container_scenarios porting

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* funtest fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* document gif update

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* Documentation + tests update

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* removed example plugin

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* global renaming

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

test fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

test fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* config.yaml typos

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

typos

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* removed `plugin_scenarios` from NativScenarioPlugin class

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* pod_network_scenarios type added

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* documentation update

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* krkn-lib update

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

typo

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

---------

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-10-03 20:48:04 +02:00
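The refactoring above introduces `AbstractScenarioPlugin` and `ScenarioPluginFactory`. The sketch below shows the general plugin-and-factory pattern those names suggest. Only the two class names come from the commit message; every method name and signature here (`get_scenario_types`, `run`, `register`, `create`) and the `EchoScenarioPlugin` example are assumptions for illustration, not the Krkn Scenario Plugin API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class AbstractScenarioPlugin(ABC):
    """Base class each scenario implements (method signatures are assumed)."""

    @abstractmethod
    def get_scenario_types(self) -> List[str]:
        """Scenario type names this plugin handles, e.g. "pvc_scenarios"."""

    @abstractmethod
    def run(self, scenario_file: str) -> int:
        """Run one scenario file; return 0 on success, non-zero on failure."""


class ScenarioPluginFactory:
    """Maps scenario type names to plugin classes and instantiates them on demand."""

    def __init__(self) -> None:
        self._registry: Dict[str, Type[AbstractScenarioPlugin]] = {}

    def register(self, plugin_cls: Type[AbstractScenarioPlugin]) -> None:
        # Instantiate once only to ask the plugin which types it serves.
        for scenario_type in plugin_cls().get_scenario_types():
            self._registry[scenario_type] = plugin_cls

    def create(self, scenario_type: str) -> AbstractScenarioPlugin:
        if scenario_type not in self._registry:
            raise ValueError(f"no plugin registered for {scenario_type!r}")
        return self._registry[scenario_type]()


class EchoScenarioPlugin(AbstractScenarioPlugin):
    """Hypothetical plugin used only to exercise the factory."""

    def get_scenario_types(self) -> List[str]:
        return ["echo_scenarios"]

    def run(self, scenario_file: str) -> int:
        print(f"running {scenario_file}")
        return 0


if __name__ == "__main__":
    factory = ScenarioPluginFactory()
    factory.register(EchoScenarioPlugin)
    factory.create("echo_scenarios").run("scenarios/echo.yaml")
```

With this shape, a runner only needs the scenario type string from the config file to obtain the right plugin, which is one way a single entry point can dispatch to many ported scenarios.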
Paige Patton
b525f83261 restart kubelet (#688)
rh-pre-commit.version: 2.2.0
rh-pre-commit.check-secrets: ENABLED

Signed-off-by: Auto User <auto@users.noreply.github.com>
2024-09-09 21:57:53 -04:00
Naga Ravi Chaitanya Elluri
5484828b67 Deprecate running krkn as kubernetes app
This commit removes the instructions on running krkn as kubernetes
deployment as it is not supported/maintained and also not recommended.

Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
2024-08-09 13:44:43 -04:00
Naga Ravi Chaitanya Elluri
d18b6332e5 Improve node-scenario docs
This commit adds sample configuration files for each of the supported
platforms.

Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
2024-08-07 13:52:15 -04:00
Tullio Sebastiani
e02c6d1287 SYN flood scenario (#668)
* scenario config file

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* syn flood plugin

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* run_krkn.py updated

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* requirements.txt + documentation + config.yaml

* set node selector defaults to worker

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

---------

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-07-29 15:31:37 -04:00
Paige Rubendall
ef1a55438b taking out need for az cli to be installed
rh-pre-commit.version: 2.2.0
rh-pre-commit.check-secrets: ENABLED

Signed-off-by: Paige Rubendall <prubenda@redhat.com>
2024-07-05 15:18:06 -04:00
Tullio Sebastiani
052f83e7d9 added reference to webservice source code in the documentation (#630) 2024-05-14 17:58:06 +02:00
Tullio Sebastiani
a142f6e7a4 Service hijacking scenario (#617)
* WIP: service hijacking scenario

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* wip

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* error handling

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

adapted run_kraken.py

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* restored config.yaml

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* added funtest

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

test fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

fixed test

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

fix test

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

fixed funtest

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

funtest fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

minor nit

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

added explicit curl method

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

push

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

restored all funtests

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

added mime type test

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

fixed pipeline

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

commented unit

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

utf-8

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

test restored

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

fix test pipeline

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* documentation

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* krkn-lib 2.1.3

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* added other funtests to main merge to collect coverage

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

---------

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-05-13 10:04:06 +02:00
Tullio Sebastiani
ab98e416a6 Integration of the new pod recovery monitoring strategy implemented in krkn-lib (#609)
* pod monitoring integration in plugin scenario

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* pod monitoring integration in container scenario

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* removed wait-for-pod step from plugin scenario config files

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* introduced global pod recovery time

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

nit

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* introduced krkn_pod_recovery_time in plugin scenario and removed all the references to wait-for-pods

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* functional test fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* main branch functional test fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* increased recovery times

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

---------

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-04-23 10:49:01 +02:00
Liangquan Li
8bf21392f1 fix doc's nit
Signed-off-by: Liangquan Li <liangli@redhat.com>
2024-03-13 15:21:57 -04:00
Naga Ravi Chaitanya Elluri
2e651798fa Update redhat-chaos references with krkn-chaos
The tools are now hosted under https://github.com/krkn-chaos

Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
2024-01-24 13:40:39 -05:00
Naga Ravi Chaitanya Elluri
487a9f464c Deprecate long term metrics collection
This will be added back soon via native prometheus integration.

Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
2024-01-10 15:08:58 -05:00
Tullio Sebastiani
f2d7f88cb8 Krkn lib prometheus client + kube_burner references removed
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-01-09 10:43:32 -05:00
Naga Ravi Chaitanya Elluri
93f1f19411 Focus on Kubernetes in the chaos testing guide
Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
2024-01-08 20:09:12 -05:00
yogananth-subramanian
2111bab9a4 Pod ingress network shaping Chaos scenario
The scenario introduces network latency, packet loss, and bandwidth restriction in the Pod's network interface. The purpose of this scenario is to observe faults caused by random variations in the network.

The example config below applies ingress traffic shaping to the OpenShift console.
````
- id: pod_ingress_shaping
  config:
    namespace: openshift-console   # Required - Namespace of the pod to which the filter needs to be applied.
    label_selector: 'component=ui' # Applies traffic shaping to access openshift console.
    network_params:
        latency: 500ms             # Add 500ms latency to ingress traffic from the pod.
````
2023-11-06 23:34:17 -05:00
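To make the `network_params` block in the commit above more concrete, here is a rough sketch of how such parameters could map onto `tc` netem options (`delay`, `loss`, `rate`). This is an assumption about one possible mechanism, not a description of how the scenario applies shaping; the helper `netem_args` is hypothetical, and only the `latency` key appears in the commit (the `loss` and `bandwidth` keys are assumed from the description's mention of packet loss and bandwidth restriction).

```python
# Hypothetical mapping from scenario keys to tc-netem option names.
# Only "latency" is shown in the commit's example; "loss" and "bandwidth" are assumed.
NETEM_OPTIONS = {"latency": "delay", "loss": "loss", "bandwidth": "rate"}


def netem_args(interface, network_params):
    """Build an argv list for `tc qdisc add dev <interface> root netem ...`."""
    unknown = set(network_params) - set(NETEM_OPTIONS)
    if unknown:
        raise ValueError(f"unsupported network_params: {sorted(unknown)}")
    args = ["tc", "qdisc", "add", "dev", interface, "root", "netem"]
    # Iterate in a fixed order so the generated command is deterministic.
    for key, option in NETEM_OPTIONS.items():
        if key in network_params:
            args += [option, str(network_params[key])]
    return args


if __name__ == "__main__":
    # Mirrors the example config: 500ms of added latency.
    print(" ".join(netem_args("eth0", {"latency": "500ms"})))
```

Returning an argv list rather than a shell string avoids quoting problems if the command is later passed to a process runner.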
Kamesh Akella
b734f1dd05 Updating the chaos recommender README to point to accurate python version 2023-11-03 11:23:43 -04:00
Naga Ravi Chaitanya Elluri
0e852da7d4 Deprecate kubernetes method of deploying Krkn
This ensures users use the recommended methods (standalone or containerized)
of installing and running Krkn.
2023-10-25 12:32:46 -04:00
jtydlack
86d1fda325 Fix container scenario to accept only signal number (#350) (#485) 2023-10-24 16:51:48 -04:00
Paige Rubendall
f7f1b2dfb0 Service disruption (#494)
* adding service disruption

* fixing kil services

* service log changes

* removing extra logging

* adding daemon set

* adding service disruption name changes

* cerberus config back

* bad string
2023-10-06 12:51:10 -04:00
Tullio Sebastiani
5567c06cd0 reinstated io-hog documentation (#492) 2023-09-19 17:27:59 +02:00
Paige Rubendall
1bb5b8ad04 adding comment 2023-08-29 21:54:17 -04:00
Paige Rubendall
725d58c8ce adding docs update again 2023-08-25 14:37:07 -04:00
Paige Rubendall
c6058da7a7 adding comment 2023-08-25 12:19:03 -04:00
Tullio Sebastiani
f868000ebd Switched from krkn_lib_kubernetes to krkn_lib v1.0.0 (#469)
* changed all the references to krkn_lib_kubernetes to the new krkn_lib

changed all the references

* added krkn-lib pointer in documentation
2023-08-22 12:41:40 -04:00
jtydlack
491dc17267 Slo via http (#459)
* Fix typo

* Enable loading SLO profile via URL (#438)
2023-08-10 11:02:33 -04:00
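The commit above enables loading the SLO profile from a URL as well as from a local file. A minimal sketch of that behavior, using only the Python standard library: the function names `is_remote` and `read_profile` are hypothetical, and the actual loader in the project may differ.

```python
import urllib.request
from urllib.parse import urlparse


def is_remote(location):
    """True when the location is an http(s) URL rather than a filesystem path."""
    return urlparse(location).scheme in ("http", "https")


def read_profile(location, timeout=10.0):
    """Return the raw text of a profile from a URL or a local path."""
    if is_remote(location):
        # Fetch over HTTP(S); the timeout keeps a dead endpoint from hanging the run.
        with urllib.request.urlopen(location, timeout=timeout) as response:
            return response.read().decode("utf-8")
    with open(location, encoding="utf-8") as handle:
        return handle.read()
```

The returned text would then be parsed the same way regardless of where it came from, so the rest of the pipeline does not need to know the source.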
yogananth-subramanian
b2b5002f45 Pod egress network shaping Chaos scenario
The scenario introduces network latency, packet loss, and bandwidth restriction in the Pod's network interface.
The purpose of this scenario is to observe faults caused by random variations in the network.

The example config below applies egress traffic shaping to the OpenShift console.
````
- id: pod_egress_shaping
  config:
    namespace: openshift-console   # Required - Namespace of the pod to which the filter needs to be applied.
    label_selector: 'component=ui' # Applies traffic shaping to access openshift console.
    network_params:
        latency: 500ms             # Add 500ms latency to egress traffic from the pod.
````
2023-08-08 11:45:03 -04:00
Naga Ravi Chaitanya Elluri
0eb8d38596 Expand SLOs profile to cover monitoring for more alerts
This commit:
- Also sets appropriate severity to avoid false failures for the
  test cases, especially given that these are monitored during the chaos
  vs. post chaos. Critical alerts are all monitored post chaos, with a few
  monitored during the chaos that represent overall health and performance
  of the service.
- Renames Alerts to SLOs validation

Metrics reference: f09a492b13/cmd/kube-burner/ocp-config/alerts.yml
2023-06-14 16:58:36 -04:00
Tullio Sebastiani
72b46f8393 temporarily removed io-hog scenario (#433)
* temporarily removed io-hog scenario

* removed litmus documentation & config
2023-06-05 11:03:44 -04:00
Tullio Sebastiani
b9c08a45db extracted the namespace as scenario input (#419)
fixed sub-workflow and input

Co-authored-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
2023-05-15 18:24:23 +02:00
yogananth-subramanian
8806781a4f Pod network outage Chaos scenario
The pod network outage chaos scenario blocks traffic at the pod level irrespective of the network policy used.
With the current network policies, it is not possible to explicitly block ports that are enabled
by an allowed network policy rule. This chaos scenario addresses this issue by using OVS flow rules
to block ports related to the pod. It supports OpenShiftSDN and OVNKubernetes based networks.

Below example config blocks access to openshift console.
````
- id: pod_network_outage
  config:
    namespace: openshift-console
    direction:
        - ingress
    ingress_ports:
        - 8443
    label_selector: 'component=ui'
````
2023-05-15 10:43:58 -04:00
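The `pod_network_outage` config in the commit above can be read as a small typed record. The sketch below validates such a config before any flow rules would be created. The field names `namespace`, `direction`, `ingress_ports` and `label_selector` come from the example; the `egress_ports` field, the defaults, and the class name `PodNetworkOutageConfig` are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List

VALID_DIRECTIONS = {"ingress", "egress"}


@dataclass
class PodNetworkOutageConfig:
    """Hypothetical typed view of the `config` mapping shown in the commit."""

    namespace: str
    label_selector: str
    # Assumed default: block both directions when none is given.
    direction: List[str] = field(default_factory=lambda: ["ingress", "egress"])
    ingress_ports: List[int] = field(default_factory=list)
    egress_ports: List[int] = field(default_factory=list)  # assumed counterpart field

    def __post_init__(self):
        invalid = set(self.direction) - VALID_DIRECTIONS
        if invalid:
            raise ValueError(f"invalid direction(s): {sorted(invalid)}")
        for port in self.ingress_ports + self.egress_ports:
            if not 1 <= port <= 65535:
                raise ValueError(f"port out of range: {port}")


if __name__ == "__main__":
    # Same values as the example config in the commit message.
    config = PodNetworkOutageConfig(
        namespace="openshift-console",
        label_selector="component=ui",
        direction=["ingress"],
        ingress_ports=[8443],
    )
    print(config)
```

Rejecting a bad direction or port up front is cheaper than discovering it after rules have been partially applied to a node.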
Tullio Sebastiani
83b811bee4 Arcaflow stress-ng hogs with parallelism support (#418)
* kubeconfig management for arcaflow + hogs scenario refactoring  

  * kubeconfig authentication parsing refactored to support arcaflow kubernetes deployer  
  * reimplemented all the hog scenarios to allow multiple parallel containers of the same scenarios 
  (eg. to stress two or more nodes in the same run simultaneously) 
  * updated documentation 
* removed sysbench scenarios

* recovered cpu hogs

* updated requirements.txt

* updated config.yaml

* added gitleaks file for test fixtures

* imported sys and logging

* removed config_arcaflow.yaml

* updated readme

* refactored arcaflow documentation entrypoint
2023-05-15 09:45:16 -04:00
Paige Rubendall
16ea18c718 Ibm plugin node scenario (#417)
* Node scenarios for ibmcloud

* adding openshift check info
2023-05-09 12:07:38 -04:00
Naga Ravi Chaitanya Elluri
1ab94754e3 Add missing parameters supported by container scenarios (#415)
Also renames retry_wait to expected_recovery_time to make it clear that
Kraken will exit with code 1 if the container doesn't recover within the
expected time.
Fixes https://github.com/redhat-chaos/krkn/issues/414
2023-05-05 13:02:07 -04:00
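The `expected_recovery_time` semantics described above (exit 1 if the container does not recover in time) amount to polling with a deadline. A minimal sketch, assuming a caller-supplied `is_recovered` check: the function names, the polling interval, and the injectable `clock` and `sleep` parameters are hypothetical and exist only to make the example testable.

```python
import time


def wait_for_recovery(is_recovered, expected_recovery_time,
                      poll_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll is_recovered() until it returns True or the deadline passes.

    Returns True on recovery and False if expected_recovery_time (seconds) elapses.
    """
    deadline = clock() + expected_recovery_time
    while True:
        if is_recovered():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)


def exit_code(recovered):
    """Map the recovery result onto a process exit status (1 means failure)."""
    return 0 if recovered else 1


if __name__ == "__main__":
    # A container that is already healthy recovers immediately.
    print(exit_code(wait_for_recovery(lambda: True, expected_recovery_time=5)))
```

The check runs once more at the deadline before giving up, so a container that recovers exactly at the limit still counts as recovered.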
Naga Ravi Chaitanya Elluri
bc863fa01f Add support to check for critical alerts
This commit enables users to opt in to checking for critical alerts firing
in the cluster post chaos at the end of each scenario. A chaos scenario is
considered failed if the cluster is unhealthy, in which case the user can
start debugging to fix and harden the respective areas.

Fixes https://github.com/redhat-chaos/krkn/issues/410
2023-05-03 16:14:13 -04:00
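The opt-in check described above can be pictured as a filter over the alerts reported after a scenario. The sketch below assumes each alert is a dict with a `state` and a `labels` mapping that includes `severity`, similar in shape to what Prometheus reports; the function names and the `check_critical_alerts` flag name are hypothetical.

```python
def firing_critical_alerts(alerts):
    """Return the alerts that are firing with severity "critical"."""
    return [
        alert for alert in alerts
        if alert.get("state") == "firing"
        and alert.get("labels", {}).get("severity") == "critical"
    ]


def scenario_passed(alerts, check_critical_alerts):
    """A scenario fails only when the check is enabled and a critical alert fires."""
    if not check_critical_alerts:
        return True  # Users who did not opt in are never failed by alerts.
    return not firing_critical_alerts(alerts)


if __name__ == "__main__":
    # Sample alert data invented for illustration.
    sample = [
        {"state": "firing", "labels": {"alertname": "ExampleDown", "severity": "critical"}},
        {"state": "firing", "labels": {"alertname": "ExampleSlow", "severity": "warning"}},
    ]
    print(scenario_passed(sample, check_critical_alerts=True))
```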
Naga Ravi Chaitanya Elluri
900ca74d80 Reorganize the content from https://github.com/startx-lab (#346)
Moving the content around installing kraken using helm to the
chaos in practice section of the guide to showcase how startx-lab
is deploying and leveraging Kraken.
2023-04-24 13:51:49 -04:00
Tullio Sebastiani
3627b5ba88 cpu hog scenario + basic arcaflow documentation (#391)
typo

typo

updated documentation

fixed workflow map issue
2023-03-15 16:52:20 +01:00
Paige Rubendall
93686ca736 new quay image reference 2023-01-31 17:21:45 -05:00