This commit adds a recommendation to test and ensure that Pod Disruption
Budgets are set for critical applications to avoid downtime.
Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
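To illustrate why the Pod Disruption Budget recommendation above matters, here is a minimal sketch of the arithmetic behind a budget expressed as an integer `minAvailable`. The function name and the integer-only handling are assumptions made for illustration; this is not code from krkn or Kubernetes.

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Return how many pods may be voluntarily evicted right now.

    Assumes an integer minAvailable (percentages are not handled here).
    """
    if healthy_pods < 0 or min_available < 0:
        raise ValueError("pod counts must be non-negative")
    # Evictions are only allowed while the healthy count stays at or above
    # the budget, so the headroom is the difference, floored at zero.
    return max(0, healthy_pods - min_available)


# Three healthy replicas with minAvailable=2 leave room for one eviction.
print(allowed_disruptions(healthy_pods=3, min_available=2))
# With only two healthy replicas the budget blocks further evictions.
print(allowed_disruptions(healthy_pods=2, min_available=2))
```

A critical application with no budget at all has no such floor, which is the situation the recommendation is meant to catch.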
Used is_openshift function from krkn lib
Remove distribution from config
Remove distribution from documentation
Signed-off-by: jtydlack <139967002+jtydlack@users.noreply.github.com>
* Document how to use Google's credentials associated with a user account
Signed-off-by: Pablo Méndez Hernández <pablomh@redhat.com>
* Change API from 'Google API Client' to 'Google Cloud Python Client'
According to the 'Google API Client' GH page:
```
This library is considered complete and is in maintenance mode. This means
that we will address critical bugs and security issues but will not add any
new features.
This library is officially supported by Google. However, the maintainers of
this repository recommend using Cloud Client Libraries for Python, where
possible, for new code development.
```
So change the code accordingly to adapt it to 'Google Cloud Python Client'.
Signed-off-by: Pablo Méndez Hernández <pablomh@redhat.com>
---------
Signed-off-by: Pablo Méndez Hernández <pablomh@redhat.com>
* Add support for user-provided default network ACL
Signed-off-by: henrick <self@thehenrick.com>
* Add logs to notify user when their provided acl is used
Signed-off-by: henrick <self@thehenrick.com>
* Update docs to include optional default_acl_id parameter in zone_outage
Signed-off-by: henrick <self@thehenrick.com>
---------
Signed-off-by: henrick <self@thehenrick.com>
Co-authored-by: henrick <self@thehenrick.com>
This commit removes the instructions on running krkn as a Kubernetes
deployment, as it is neither supported/maintained nor recommended.
Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
The scenario introduces network latency, packet loss, and bandwidth restriction in the Pod's network interface. The purpose of this scenario is to observe faults caused by random variations in the network.
The example config below applies ingress traffic shaping to the OpenShift console.
````
- id: pod_ingress_shaping
  config:
    namespace: openshift-console   # Required - Namespace of the pod to which the filter needs to be applied.
    label_selector: 'component=ui' # Applies traffic shaping to access openshift console.
    network_params:
        latency: 500ms             # Add 500ms latency to ingress traffic to the pod.
````
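As a rough illustration of how `network_params` such as the ones above could translate into traffic shaping, the sketch below builds a `tc ... netem` command string. It is not krkn's implementation: the function name, the interface name, and the `loss` and `bandwidth` keys are assumptions (only `latency` appears in the config above). Note also that a root `netem` qdisc shapes traffic leaving an interface, so shaping ingress traffic generally needs an extra redirection step that is not shown here.

```python
import shlex


def build_netem_command(interface: str, network_params: dict) -> str:
    """Build a `tc qdisc add ... netem` command from scenario-style parameters.

    Hypothetical helper: the mapping of keys to netem options is an assumption.
    """
    args = ["tc", "qdisc", "add", "dev", interface, "root", "netem"]
    if "latency" in network_params:
        args += ["delay", str(network_params["latency"])]
    if "loss" in network_params:
        args += ["loss", str(network_params["loss"])]
    if "bandwidth" in network_params:
        args += ["rate", str(network_params["bandwidth"])]
    # shlex.join quotes each argument so the string is safe to paste in a shell.
    return shlex.join(args)


print(build_netem_command("eth0", {"latency": "500ms"}))
```

Returning the command as a string rather than executing it keeps the sketch safe to run on any machine.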
* adding service disruption
* fixing kill services
* service log changes
* removing extra logging
* adding daemon set
* adding service disruption name changes
* cerberus config back
* bad string
The scenario introduces network latency, packet loss, and bandwidth restriction in the Pod's network interface.
The purpose of this scenario is to observe faults caused by random variations in the network.
The example config below applies egress traffic shaping to the OpenShift console.
````
- id: pod_egress_shaping
  config:
    namespace: openshift-console   # Required - Namespace of the pod to which the filter needs to be applied.
    label_selector: 'component=ui' # Applies traffic shaping to access openshift console.
    network_params:
        latency: 500ms             # Add 500ms latency to egress traffic from the pod.
````
This commit:
- Also sets appropriate severity to avoid false failures for the
test cases, especially given that these are monitored during the chaos
vs. post chaos. Critical alerts are all monitored post chaos, with a few
monitored during the chaos that represent the overall health and performance
of the service.
- Renames Alerts to SLOs validation
Metrics reference: f09a492b13/cmd/kube-burner/ocp-config/alerts.yml
The pod network outage chaos scenario blocks traffic at the pod level irrespective of the network policy used.
With the current network policies, it is not possible to explicitly block ports that are enabled
by an allowed network policy rule. This chaos scenario addresses this issue by using OVS flow rules
to block ports related to the pod. It supports OpenShiftSDN and OVNKubernetes based networks.
The example config below blocks access to the OpenShift console.
````
- id: pod_network_outage
  config:
    namespace: openshift-console
    direction:
        - ingress
    ingress_ports:
        - 8443
    label_selector: 'component=ui'
````
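The idea of blocking a pod's ports with OVS flow rules can be sketched as follows. This only builds `ovs-ofctl add-flow` argument lists and does not run them; the helper name, the bridge name, the pod IP, and the exact match fields are assumptions for illustration and may differ from what krkn actually installs.

```python
from typing import List


def build_drop_flows(bridge: str, pod_ip: str, ports: List[int],
                     direction: str = "ingress") -> List[List[str]]:
    """Return one `ovs-ofctl add-flow` command per port that drops TCP traffic.

    Hypothetical helper: bridge name and match fields are assumptions.
    """
    if direction not in ("ingress", "egress"):
        raise ValueError("direction must be 'ingress' or 'egress'")
    # Ingress traffic is addressed to the pod; egress traffic originates from it.
    ip_field = "nw_dst" if direction == "ingress" else "nw_src"
    commands = []
    for port in ports:
        flow = f"priority=65535,tcp,{ip_field}={pod_ip},tp_dst={port},actions=drop"
        commands.append(["ovs-ofctl", "-O", "OpenFlow13", "add-flow", bridge, flow])
    return commands


# Example values only: the bridge and pod IP below are placeholders.
for cmd in build_drop_flows("br-int", "10.128.0.15", [8443]):
    print(" ".join(cmd))
```

Because the rule is installed in the switch itself, it takes effect regardless of which network policies allow the port, which matches the motivation described above.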
* kubeconfig management for arcaflow + hogs scenario refactoring
* kubeconfig authentication parsing refactored to support arcaflow kubernetes deployer
* reimplemented all the hog scenarios to allow multiple parallel containers of the same scenarios
(eg. to stress two or more nodes in the same run simultaneously)
* updated documentation
* removed sysbench scenarios
* recovered cpu hogs
* updated requirements.txt
* updated config.yaml
* added gitleaks file for test fixtures
* imported sys and logging
* removed config_arcaflow.yaml
* updated readme
* refactored arcaflow documentation entrypoint
Also renames retry_wait to expected_recovery_time to make it clear that
Kraken will exit with status 1 if the container doesn't recover within the
expected time.
Fixes https://github.com/redhat-chaos/krkn/issues/414
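The semantics of `expected_recovery_time` described above can be sketched as a polling loop with a deadline. This is a minimal illustration, not Kraken's code: `wait_for_recovery`, `is_recovered`, and `poll_interval` are hypothetical names, and a real caller would supply its own container health check.

```python
import time
from typing import Callable


def wait_for_recovery(is_recovered: Callable[[], bool],
                      expected_recovery_time: float,
                      poll_interval: float = 1.0) -> bool:
    """Poll until the check passes or the expected recovery time elapses."""
    deadline = time.monotonic() + expected_recovery_time
    while True:
        if is_recovered():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_interval, remaining))


# A caller would map the result to the process exit status, for example:
#   sys.exit(0 if wait_for_recovery(check, expected_recovery_time=120) else 1)
print(wait_for_recovery(lambda: True, expected_recovery_time=1))
```

Naming the parameter after the expectation, rather than after the wait between retries, makes it clearer that exceeding it is treated as a failure.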
This commit enables users to opt in to check for critical alerts firing
in the cluster post chaos at the end of each scenario. A chaos scenario is
considered failed if the cluster is unhealthy, in which case the user can
start debugging to fix and harden the respective areas.
Fixes https://github.com/redhat-chaos/krkn/issues/410
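The post-chaos check described above amounts to filtering the cluster's alerts for ones that are both critical and firing. The sketch below assumes alerts shaped like entries from the Prometheus alerts API (a `labels` mapping and a `state` field); the function names and sample data are hypothetical and do not reflect Kraken's actual implementation.

```python
from typing import Dict, List


def firing_critical_alerts(alerts: List[Dict]) -> List[Dict]:
    """Keep only alerts that are firing with severity=critical."""
    return [
        alert for alert in alerts
        if alert.get("state") == "firing"
        and alert.get("labels", {}).get("severity") == "critical"
    ]


def scenario_passed(alerts: List[Dict]) -> bool:
    """Treat the scenario as failed if any critical alert is firing."""
    return not firing_critical_alerts(alerts)


# Made-up sample data for illustration only.
sample = [
    {"labels": {"alertname": "ExampleWarning", "severity": "warning"}, "state": "firing"},
    {"labels": {"alertname": "ExampleCritical", "severity": "critical"}, "state": "pending"},
]
print(scenario_passed(sample))
```

Pending alerts are ignored in this sketch; whether they should count is a policy choice left to the reader.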
Moves the content on installing Kraken using Helm to the
chaos in practice section of the guide to showcase how startx-lab
is deploying and leveraging Kraken.