* basic structure working
* config and options refactoring
nits and changes
* removed unused function with typo + fixed duration
* removed unused arguments
* minor fixes
* adding service disruption
* fixing kill services
* service log changes
* removing extra logging
* adding daemon set
* adding service disruption name changes
* cerberus config back
* bad string
The scenario introduces network latency, packet loss, and bandwidth restriction in the Pod's network interface.
The purpose of this scenario is to observe faults caused by random variations in the network.
The example config below applies egress traffic shaping to the OpenShift console.
````
- id: pod_egress_shaping
  config:
    namespace: openshift-console   # Required - Namespace of the pod to which the filter needs to be applied.
    label_selector: 'component=ui' # Applies traffic shaping to access to the OpenShift console.
    network_params:
      latency: 500ms               # Add 500ms latency to egress traffic from the pod.
````
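For illustration, shaping parameters like these are commonly applied on Linux with the `tc` netem qdisc. The sketch below only builds the command line and does not run it. The function name `build_netem_command`, the interface name `eth0`, and the `loss` and `bandwidth` keys are assumptions made for the example and are not necessarily what the scenario does internally.

```python
def build_netem_command(interface, network_params):
    """Return a `tc` argv list that would apply netem shaping to `interface`."""
    cmd = ["tc", "qdisc", "add", "dev", interface, "root", "netem"]
    # Map config keys (assumed names) to the corresponding netem options.
    option_names = {"latency": "delay", "loss": "loss", "bandwidth": "rate"}
    for key, netem_option in option_names.items():
        if key in network_params:
            cmd += [netem_option, str(network_params[key])]
    return cmd


print(" ".join(build_netem_command("eth0", {"latency": "500ms"})))
```

Building the argument list separately from executing it keeps the mapping easy to check before anything touches the pod's network interface.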
This commit:
- Sets appropriate severities to avoid false failures in the test cases,
since these alerts are monitored during the chaos rather than post chaos.
Critical alerts are all monitored post chaos, with a few that represent
the overall health and performance of the service also monitored during
the chaos.
- Renames Alerts to SLOs validation
Metrics reference: f09a492b13/cmd/kube-burner/ocp-config/alerts.yml
* kubeconfig management for arcaflow + hogs scenario refactoring
* kubeconfig authentication parsing refactored to support arcaflow kubernetes deployer
* reimplemented all the hog scenarios to allow multiple parallel containers of the same scenarios
(eg. to stress two or more nodes in the same run simultaneously)
* updated documentation
* removed sysbench scenarios
* recovered cpu hogs
* updated requirements.txt
* updated config.yaml
* added gitleaks file for test fixtures
* imported sys and logging
* removed config_arcaflow.yaml
* updated readme
* refactored arcaflow documentation entrypoint
This commit:
- Leverages distribution flag in the config set by the user to skip
things not supported on OpenShift to be able to run scenarios on
Kubernetes.
- Adds sample config and scenario files that work on Kubernetes.
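A minimal sketch of how a distribution flag can gate unsupported operations. The step names and the `select_steps` helper are hypothetical and only illustrate the idea; they are not taken from the code base.

```python
# Steps assumed (for illustration only) to be available on OpenShift alone.
OPENSHIFT_ONLY_STEPS = {"capture_prometheus_details"}


def select_steps(distribution, steps):
    """Return the steps that can run on the given distribution."""
    if distribution.lower() == "openshift":
        return list(steps)
    # On other distributions, drop anything that is OpenShift-specific.
    return [step for step in steps if step not in OPENSHIFT_ONLY_STEPS]


print(select_steps("kubernetes", ["run_scenario", "capture_prometheus_details"]))
```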
This commit adds a roadmap which walks through the features and enhancements that
are going to be added to Kraken in the immediate future in order to help users
understand where we need help as well as where the project is going.
This commit:
- Adds information around test methodology that needs to be embraced and
best practices that an OpenShift cluster, platform and applications running
on top of it should take into account for best user experience, performance,
resilience and reliability.
- Adds test environment recommendations as to how and where to run chaos tests.
* Added new scenario to fill up a given volume
* fixing small issues and style
* adding PVC as input param instead of pod name
* small fix
* get container name and volume name
replace oc with kubectl commands
* adding yaml file to create a pv, pvc and pod to run pvc_scenario
* adding support to match both strings for describe command when looking for pod_name
* added support to find the pvc from a given pod
* small fix
* small fix
This commit enables users to simulate a downtime of an application
by blocking the traffic for the specified duration to see how
it/other components communicating with it behave in case of downtime.
- This eases the usage and debuggability by running the fault injection pods in
the same namespace as other resources of litmus. This will also ease the
deletion process and ensure that there are no leftover objects on the cluster.
- This commit also enables users to use the same rbac template for all the litmus
scenarios without having to pull in a specific one for each of the scenarios.
This commit adds support to create zone outage in AWS by denying both
ingress and egress traffic to the instances belonging to a particular
subnet belonging to the zone by tweaking the network acl. This creates
an outage of all the nodes in the zone - both master and workers.
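As a rough sketch of the deny rules described above, the helper below builds one ingress and one egress deny-all entry for a network ACL. Each dictionary uses the keyword names accepted by the boto3 EC2 client's `create_network_acl_entry` call, but nothing is sent to AWS here. The helper name, the ACL ID, and the rule number are placeholders, and the actual commit may modify the ACL differently.

```python
def deny_all_acl_entries(network_acl_id, rule_number=1):
    """Build deny-all entries (ingress and egress) for one network ACL."""
    entries = []
    for egress in (False, True):  # False = ingress rule, True = egress rule
        entries.append({
            "NetworkAclId": network_acl_id,
            "RuleNumber": rule_number,   # low numbers are evaluated first
            "Protocol": "-1",            # all protocols
            "RuleAction": "deny",
            "Egress": egress,
            "CidrBlock": "0.0.0.0/0",    # all IPv4 traffic
        })
    return entries


for entry in deny_all_acl_entries("acl-0123456789abcdef0"):
    print(entry)
```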
Kraken's current integration with Cerberus monitors cluster and application
health post chaos and fails the run if they are not healthy after chaos.
This commit adds the ability to monitor user application health during the chaos
and fails the run in case of downtime, since the same downtime could occur in a
customer's environment as well. It is especially useful for control plane
failure scenarios, including API server, etcd, Ingress, etc.
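The idea of checking health while the chaos is running can be sketched as a simple polling loop. The `monitor_during_chaos` helper and its `is_healthy` callback are hypothetical; in practice the callback would query whatever health signal the monitoring tool exposes.

```python
import time


def monitor_during_chaos(is_healthy, duration, interval=1.0, sleep=time.sleep):
    """Poll is_healthy() for `duration` seconds; return False on the first failure."""
    elapsed = 0.0
    while elapsed < duration:
        if not is_healthy():
            return False  # downtime observed during the chaos window
        sleep(interval)
        elapsed += interval
    return True


# Example with a stubbed health check that fails on the third poll.
checks = iter([True, True, False])
print(monitor_during_chaos(lambda: next(checks), duration=5, sleep=lambda _: None))
```

A caller would fail the run when the function returns `False`, rather than waiting for the post-chaos check.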
This commit:
- Adds timeout to avoid operations hanging for long durations.
- Improves exception handling and exits wherever needed.
- Sets KUBECONFIG env var globally to access the cluster.
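A minimal sketch of the three points above using only the Python standard library. The `run_with_timeout` helper is hypothetical and the command in the example is a stand-in for a real cluster call.

```python
import os
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds, kubeconfig=None):
    """Run a command, exiting the program if it hangs or fails."""
    if kubeconfig:
        # Set once in the process environment so child commands inherit it.
        os.environ["KUBECONFIG"] = kubeconfig
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True,
            timeout=timeout_seconds, check=True,
        )
    except subprocess.TimeoutExpired:
        sys.exit(f"command timed out after {timeout_seconds}s: {argv}")
    except subprocess.CalledProcessError as err:
        sys.exit(f"command failed with exit code {err.returncode}: {argv}")
    return result.stdout


print(run_with_timeout([sys.executable, "-c", "print('ok')"], timeout_seconds=30))
```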
This commit:
- Adds support to automate the infrastructure pieces leveraged by Kraken
including Cerberus and Elasticsearch
- Adds a Kraken config that can be used to discover all the infra pieces
automatically without having to tweak the configuration.
This commit enables alerting in Kraken based on the Prometheus queries defined
by the user and modifies the return code of the run to determine pass/fail for
the run.
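One way to picture the pass/fail decision is a function that maps fired alerts to a process return code. The alert fields and the choice of which severities fail the run are assumptions made for this sketch, not the confirmed behavior.

```python
import sys

# Assumption: only these severities should fail the run.
FAILING_SEVERITIES = {"error", "critical"}


def exit_code_for_alerts(fired_alerts):
    """Return 1 if any fired alert has a failing severity, otherwise 0."""
    failing = [a for a in fired_alerts if a["severity"] in FAILING_SEVERITIES]
    for alert in failing:
        print(f"alert fired: {alert['description']} ({alert['severity']})", file=sys.stderr)
    return 1 if failing else 0


alerts = [
    {"description": "API latency is elevated", "severity": "warning"},
    {"description": "etcd leader changes observed", "severity": "error"},
]
print(exit_code_for_alerts(alerts))
```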
This commit:
- Enables Kraken to leverage kube-burner to scrape metrics from
Prometheus and index them into Elasticsearch. This way we can
take a look at the metrics in Grafana long term even after the
cluster is terminated.
- Enables separation of operations based on distribution with
OpenShift as the default option. One of the use cases is to
capture Prometheus instance details as it's installed by default
while it's optional for Kubernetes.
This commit:
- Refactors the code base to be more modular by moving functions
into respective modules to make it lean and reusable.
- Uses black to reformat the code to follow PEP 8 practices.