* Hog scenario porting from arcaflow to native (#748)
* added new native hog scenario
* removed arcaflow dependency + legacy hog scenarios
* config update
* changed hog configuration structure + added average samples
* fix on cpu count
* removes tripledes warning
* changed selector format
* changed selector syntax
* number of nodes option
* documentation
* functional tests
* exception handling on hog deployment thread
Signed-off-by: Paige Patton <prubenda@redhat.com>
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* adding vsphere updates to non native
Signed-off-by: Paige Patton <prubenda@redhat.com>
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* adding node id to affected node
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* Fixed the spelling mistake
Signed-off-by: Meghana Katta <mkatta@mkatta-thinkpadt14gen4.bengluru.csb>
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* adding v4.0.8 version (#756)
Signed-off-by: Paige Patton <prubenda@redhat.com>
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* Add autodetecting distribution (#753)
Used the is_openshift function from krkn-lib
Remove distribution from config
Remove distribution from documentation
Signed-off-by: jtydlack <139967002+jtydlack@users.noreply.github.com>
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* initial version of health checks
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* Changes for appending success response and health check config format
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* Changes include health check doc and exit_on_failure config
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* Update config.yaml
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* Added the health check config in functional test config
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* Modified the health checks documentation
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* Changes for debugging the failing functional test
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* changed the code for debugging in run_test.sh
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* Debugging
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* Removed the functional test running line
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* Removing the health check config in common_test_config for debugging
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* Fixing functional test failure
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* Removing the changes that were added for debugging
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* A few modifications
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* Renamed timestamp
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* Changed the data type of the start and end timestamps to datetime
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* passing the health check response as a HealthCheck object
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* Updated the krkn-lib version in requirements.txt
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
* Changed the coverage
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
---------
Signed-off-by: kattameghana <meghanakatta8@gmail.com>
Signed-off-by: Paige Patton <prubenda@redhat.com>
Signed-off-by: Meghana Katta <mkatta@mkatta-thinkpadt14gen4.bengluru.csb>
Signed-off-by: jtydlack <139967002+jtydlack@users.noreply.github.com>
Co-authored-by: Tullio Sebastiani <tsebastiani@users.noreply.github.com>
Co-authored-by: Paige Patton <prubenda@redhat.com>
Co-authored-by: Meghana Katta <mkatta@mkatta-thinkpadt14gen4.bengluru.csb>
Co-authored-by: Paige Patton <64206430+paigerube14@users.noreply.github.com>
Co-authored-by: jtydlack <139967002+jtydlack@users.noreply.github.com>
Used the is_openshift function from krkn-lib
Remove distribution from config
Remove distribution from documentation
Signed-off-by: jtydlack <139967002+jtydlack@users.noreply.github.com>
This commit:
- Switches the severity of the rate queries to critical, as a 5%
threshold is already high for low-scale/low-density clusters and breaching it needs to be flagged.
- Adds rate queries to the OpenShift alerts file
Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
Changed the terminal output to use a JSON structure.
The JSON output file names use the format
recommender_namespace_YYYY-MM-DD_HH-MM-SS.
The path to the JSON file can be specified; the default path is
kraken/utils/chaos_recommender/recommender_output.
Signed-off-by: jtydlcak <139967002+jtydlack@users.noreply.github.com>
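A minimal Python sketch of the file naming described above, assuming the namespace value takes the place of "namespace" in the name and that a ".json" extension is appended (the message only gives the base format). The helper name and its parameters are hypothetical, not the recommender's actual code.

````python
import os
from datetime import datetime

# Default directory named in the commit message above.
DEFAULT_OUTPUT_DIR = "kraken/utils/chaos_recommender/recommender_output"


def recommender_output_path(namespace, output_dir=DEFAULT_OUTPUT_DIR, now=None):
    """Hypothetical helper: build recommender_<namespace>_YYYY-MM-DD_HH-MM-SS.json.

    Assumption: the ".json" extension is appended to the documented base name.
    """
    now = now or datetime.now()
    timestamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    return os.path.join(output_dir, f"recommender_{namespace}_{timestamp}.json")


if __name__ == "__main__":
    print(recommender_output_path("openshift-etcd", now=datetime(2024, 1, 2, 3, 4, 5)))
````

Passing output_dir mirrors the option to specify the JSON path; omitting it falls back to the default directory.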
* basic structure working
* config and options refactoring
nits and changes
* removed unused function with typo + fixed duration
* removed unused arguments
* minor fixes
* adding service disruption
* fixing kill services
* service log changes
* removing extra logging
* adding daemon set
* adding service disruption name changes
* cerberus config back
* bad string
The scenario introduces network latency, packet loss, and bandwidth restrictions on the pod's network interface.
The purpose of this scenario is to observe faults caused by random variations in the network.
The example config below applies egress traffic shaping to the OpenShift console.
````
- id: pod_egress_shaping
  config:
    namespace: openshift-console   # Required - Namespace of the pod to which the filter needs to be applied.
    label_selector: 'component=ui' # Applies traffic shaping to the OpenShift console pods.
    network_params:
      latency: 500ms               # Add 500ms latency to egress traffic from the pod.
````
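One way to picture what network_params translate to is a tc netem qdisc on the pod's interface. The Python sketch below builds such a command; it assumes the interface is eth0 and that packet loss and bandwidth would be given as hypothetical loss and bandwidth keys (only latency appears in the example above). It illustrates the technique and is not the scenario's actual implementation.

````python
import shlex

# Hypothetical mapping from network_params keys to tc-netem options.
# Only "latency" appears in the example config; "loss" and "bandwidth" are assumed.
NETEM_OPTIONS = {"latency": "delay", "loss": "loss", "bandwidth": "rate"}


def build_netem_command(network_params, interface="eth0"):
    """Return the tc argv that shapes egress traffic on `interface` with netem."""
    # A root qdisc on the interface applies to packets leaving it (egress).
    cmd = ["tc", "qdisc", "add", "dev", interface, "root", "netem"]
    for key, option in NETEM_OPTIONS.items():
        value = network_params.get(key)
        if value is not None:
            cmd += [option, str(value)]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_netem_command({"latency": "500ms"})))
````

Combining options in a single netem qdisc keeps latency, loss, and rate limiting on one egress queue, so they apply to the same packets.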
This commit:
- Sets appropriate severities to avoid false failures in the
test cases, especially given that these are monitored during the chaos
rather than post chaos. All critical alerts are monitored post chaos, and a few
that represent the overall health and performance of the service are also
monitored during the chaos.
- Renames Alerts to SLOs validation
Metrics reference: f09a492b13/cmd/kube-burner/ocp-config/alerts.yml
This is the first step towards the goal of only tracking metrics that reflect
the overall health and performance of the component/cluster. For instance,
leader elections are expected in etcd disruption scenarios, so we should instead
track etcd leader availability and fsync latency under the critical category rather than leader
elections.
The pod network outage chaos scenario blocks traffic at the pod level, irrespective of the network policy in use.
With the current network policies, it is not possible to explicitly block ports that are enabled
by an allow rule in a network policy. This chaos scenario addresses this issue by using OVS flow rules
to block ports related to the pod. It supports OpenShiftSDN and OVNKubernetes based networks.
The example config below blocks access to the OpenShift console.
````
- id: pod_network_outage
  config:
    namespace: openshift-console
    direction:
      - ingress
    ingress_ports:
      - 8443
    label_selector: 'component=ui'
````
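A rough Python sketch of how ingress ports could be blocked with OVS flow rules. It assumes the bridge is br-int on OVNKubernetes and br0 on OpenShiftSDN, that OpenFlow13 is used, and that a high-priority TCP match on the pod IP and destination port followed by a drop action is enough; the pod IP is made up, and the scenario's real flow rules may differ.

````python
# Assumed bridge names per network plugin; verify on your cluster before use.
BRIDGES = {"OVNKubernetes": "br-int", "OpenShiftSDN": "br0"}


def build_ingress_drop_flows(pod_ip, ingress_ports, network_type="OVNKubernetes", priority=65535):
    """Return one ovs-ofctl add-flow argv per port, dropping TCP traffic to pod_ip:port."""
    bridge = BRIDGES[network_type]
    commands = []
    for port in ingress_ports:
        # Match TCP packets destined to the pod IP and port, then drop them.
        flow = f"priority={priority},tcp,nw_dst={pod_ip},tp_dst={port},actions=drop"
        commands.append(["ovs-ofctl", "-O", "OpenFlow13", "add-flow", bridge, flow])
    return commands


if __name__ == "__main__":
    # 10.128.2.15 is a made-up pod IP; 8443 matches ingress_ports in the example config.
    for cmd in build_ingress_drop_flows("10.128.2.15", [8443]):
        print(" ".join(cmd))
````

Because the drop happens in the OVS datapath, it takes effect regardless of which network policy rules allow the port. UDP ports would need an equivalent udp match.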
* kubeconfig management for arcaflow + hogs scenario refactoring
* kubeconfig authentication parsing refactored to support arcaflow kubernetes deployer
* reimplemented all the hog scenarios to allow multiple parallel containers of the same scenarios
(e.g. to stress two or more nodes simultaneously in the same run)
* updated documentation
* removed sysbench scenarios
* recovered cpu hogs
* updated requirements.txt
* updated config.yaml
* added gitleaks file for test fixtures
* imported sys and logging
* removed config_arcaflow.yaml
* updated readme
* refactored arcaflow documentation entrypoint
This commit enables users to opt in to checking for critical alerts firing
in the cluster post chaos at the end of each scenario. A chaos scenario is
considered failed if the cluster is unhealthy, in which case the user can
start debugging to fix and harden the respective areas.
Fixes https://github.com/redhat-chaos/krkn/issues/410
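A minimal Python sketch of the opt-in post-chaos check. It assumes alerts are read through the Prometheus HTTP API instant-query endpoint (/api/v1/query) using the built-in ALERTS series. The helper names are hypothetical, authentication is omitted (an OpenShift Prometheus typically requires a bearer token), and this is not krkn's implementation.

````python
from urllib.parse import urlencode

# ALERTS is the series Prometheus exposes for pending and firing alerts.
CRITICAL_ALERTS_QUERY = 'ALERTS{severity="critical", alertstate="firing"}'


def build_query_url(prometheus_url):
    """Instant-query URL for the Prometheus HTTP API (authentication not handled)."""
    return prometheus_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": CRITICAL_ALERTS_QUERY})


def firing_critical_alerts(response):
    """Return sorted alert names from a decoded /api/v1/query JSON response."""
    if response.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {response.get('error')}")
    return sorted({r["metric"].get("alertname", "<unnamed>") for r in response["data"]["result"]})


def scenario_failed(response):
    """Treat the scenario as failed if any critical alert is firing after chaos."""
    return len(firing_critical_alerts(response)) > 0


if __name__ == "__main__":
    # Made-up sample response in the shape returned by /api/v1/query.
    sample = {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"alertname": "etcdNoLeader", "severity": "critical",
                            "alertstate": "firing"},
                 "value": [1700000000, "1"]},
            ],
        },
    }
    print(build_query_url("http://prometheus.example:9090"))
    print(firing_critical_alerts(sample), scenario_failed(sample))
````

Keeping the HTTP call separate from the decision logic lets the pass/fail rule be checked against canned responses.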
The documentation says we default to ~ when looking up the Kubernetes config,
but we then set /root everywhere. Fixed the config to actually look in ~.
Should solve #327.
Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>
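The fix above boils down to expanding ~ against the invoking user's home directory instead of hardcoding /root. A minimal Python sketch, assuming ~/.kube/config (the conventional kubeconfig location) as the default; the helper name is hypothetical.

````python
import os

# Conventional kubeconfig location; assumed here to be the documented default.
DEFAULT_KUBECONFIG = "~/.kube/config"


def resolve_kubeconfig(path=DEFAULT_KUBECONFIG):
    """Expand ~ to the current user's home directory instead of assuming /root."""
    return os.path.expanduser(path)


if __name__ == "__main__":
    print(resolve_kubeconfig())
````

Explicit absolute paths pass through unchanged, so users who configure a custom location are unaffected.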