The scenario introduces network latency, packet loss, and bandwidth restriction on the pod's network interface.
The purpose of this scenario is to observe how the application behaves under degraded and unreliable network conditions.
The example config below applies egress traffic shaping to the OpenShift console pod.
````
- id: pod_egress_shaping
  config:
    namespace: openshift-console   # Required - namespace of the pod to which the filter needs to be applied
    label_selector: 'component=ui' # Label selector of the target pod (openshift-console UI)
    network_params:
      latency: 500ms               # Add 500ms latency to egress traffic from the pod
````
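Since the scenario covers latency, packet loss, and bandwidth restriction, multiple impairments can be sketched in one config. The following is a hedged sketch only; the `loss` and `bandwidth` keys under `network_params` are assumed by analogy with `latency` and are not confirmed by this changelog:
````
- id: pod_egress_shaping
  config:
    namespace: openshift-console
    label_selector: 'component=ui'
    network_params:
      latency: 500ms    # delay egress packets by 500ms
      loss: '25%'       # assumed key: drop 25% of egress packets
      bandwidth: 10mbit # assumed key: cap egress bandwidth at 10mbit
````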
This makes sure the latest clients are installed and used:
- Avoids compatibility issues with the server
- Fixes security vulnerabilities and CVEs
This commit:
- Also sets appropriate severities to avoid false failures for the
test cases, especially given that these are monitored during the chaos
vs post chaos. Critical alerts are all monitored post chaos, with a few
monitored during the chaos that represent the overall health and performance
of the service.
- Renames Alerts to SLOs validation
Metrics reference: f09a492b13/cmd/kube-burner/ocp-config/alerts.yml
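For context, kube-burner alert-profile entries attach a severity to each PromQL expression. A hedged sketch of the tiering described above; the expressions, thresholds, and label values are illustrative and are not taken from the referenced profile:
````
# Checked during chaos: overall service health, non-fatal severity
- expr: up{job="apiserver"} == 0
  description: kube-apiserver instance down
  severity: warning
# Checked post chaos: critical alerts fail the run
- expr: histogram_quantile(0.99, rate(apiserver_request_duration_seconds_bucket{verb!="WATCH"}[2m])) > 1
  description: 99th percentile API request latency above 1s
  severity: critical
````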
* Include check for inside k8s scenario
* Include check for inside k8s scenario (2)
* Include check for inside k8s scenario (3)
* Include check for inside k8s scenario (4)
This is the first step towards the goal of only having metrics that track
the overall health and performance of the component/cluster. For instance,
leader elections are expected during etcd disruption scenarios, so we should
instead track etcd leader availability and fsync latency under the critical
category rather than leader elections.
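A hedged sketch of what that could look like in the kube-burner alert format, using the standard etcd metrics `etcd_server_has_leader` and `etcd_disk_wal_fsync_duration_seconds_bucket`; the thresholds are illustrative, not taken from the actual profile:
````
# Assumed alert-profile entries, not from the referenced alerts.yml
- expr: etcd_server_has_leader == 0
  description: etcd member reports no leader
  severity: critical
- expr: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[2m])) > 0.01
  description: 99th percentile etcd WAL fsync latency above 10ms
  severity: critical
````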
The pod network outage chaos scenario blocks traffic at the pod level irrespective of the network policy in use.
With the current network policies, it is not possible to explicitly block ports that are enabled
by an allow rule in the network policy. This chaos scenario addresses that by using OVS flow rules
to block ports related to the pod. It supports OpenShiftSDN and OVNKubernetes based networks.
The example config below blocks ingress access to the OpenShift console.
````
- id: pod_network_outage
  config:
    namespace: openshift-console   # Namespace of the pod whose traffic is blocked
    direction:                     # Direction of the traffic to block
      - ingress
    ingress_ports:                 # Ports on which ingress traffic is blocked
      - 8443
    label_selector: 'component=ui' # Targets the openshift-console UI pod
````
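Since `direction` is a list, blocking outbound traffic should be symmetric. A hedged sketch follows; the `egress_ports` key is assumed by analogy with `ingress_ports` and is not confirmed by this changelog:
````
- id: pod_network_outage
  config:
    namespace: openshift-console
    direction:
      - egress
    egress_ports:   # assumed key, mirroring ingress_ports
      - 443
    label_selector: 'component=ui'
````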
* kubeconfig management for arcaflow + hogs scenario refactoring
* kubeconfig authentication parsing refactored to support arcaflow kubernetes deployer
* reimplemented all the hog scenarios to allow multiple parallel containers of the same scenario
(e.g., to stress two or more nodes simultaneously in the same run)
* updated documentation
* removed sysbench scenarios
* recovered cpu hogs
* updated requirements.txt
* updated config.yaml
* added gitleaks file for test fixtures
* imported sys and logging
* removed config_arcaflow.yaml
* updated readme
* refactored arcaflow documentation entrypoint
Also renames retry_wait to expected_recovery_time to make it clear that
Kraken will exit with code 1 if the container doesn't recover within the
expected time.
Fixes https://github.com/redhat-chaos/krkn/issues/414
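A hedged sketch of how the renamed field might appear in a container scenario config; the surrounding keys (name, namespace, label_selector, container_name, action, count) are assumed from typical Kraken container-scenario configs, not from this commit:
````
scenarios:
- name: "kill etcd container"
  namespace: "openshift-etcd"   # assumed example target
  label_selector: "app=etcd"
  container_name: "etcd"
  action: 1                     # signal sent to the container process
  count: 1
  expected_recovery_time: 60    # exit 1 if the container has not recovered within 60s
````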
This commit enables users to opt in to a check for critical alerts firing
in the cluster post chaos, at the end of each scenario. A chaos scenario is
considered failed if the cluster is unhealthy, in which case the user can
start debugging to fix and harden the respective areas.
Fixes https://github.com/redhat-chaos/krkn/issues/410
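A hedged sketch of how the opt-in could look in config.yaml; the performance_monitoring section layout and the check_critical_alerts flag name are assumptions, not confirmed by this changelog:
````
performance_monitoring:
  prometheus_url: ''            # left empty to auto-discover on OpenShift (assumed behavior)
  prometheus_bearer_token: ''
  check_critical_alerts: True   # assumed flag: fail the scenario if critical alerts fire post chaos
````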
Moving the content around installing Kraken using Helm to the
Chaos in Practice section of the guide to showcase how startx-lab
is deploying and leveraging Kraken.