This commit fixes cases where targeting application pods in a
namespace with a pod selector to create an outage fails because
the selector cannot be validated.
Error message for reference:
error validating data: ValidationError(NetworkPolicy.spec.podSelector):
unknown field "app=dittybopper" in io.k8s.apimachinery.pkg.apis.meta.v1.LabelSelector
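The error above indicates that podSelector is a LabelSelector object, so a
raw key=value string is rejected as an unknown field. A minimal sketch of the
conversion, assuming a comma-separated list of simple key=value terms; the
helper name selector_to_match_labels is hypothetical and not taken from the
Kraken code base:

```python
def selector_to_match_labels(selector):
    """Convert 'k1=v1,k2=v2' into a LabelSelector-style mapping.

    Hypothetical helper: only plain key=value terms are handled here;
    set-based expressions such as 'env in (a, b)' are out of scope.
    """
    labels = {}
    for part in selector.split(","):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"unsupported selector term: {part!r}")
        labels[key.strip()] = value.strip()
    # NetworkPolicy.spec.podSelector expects this shape, not a raw string.
    return {"matchLabels": labels}
```

For example, selector_to_match_labels("app=dittybopper") yields a mapping
that can be placed under spec.podSelector in the NetworkPolicy manifest.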
This commit enables users to simulate downtime of an application
by blocking its traffic for the specified duration, to see how it
and other components communicating with it behave during the downtime.
This will avoid querying all namespaces for pods matching the label_selector
if defined as shown in the sample scenario config. This commit also prints a
pointer to the report generated at the end of the run.
- This eases the usage and debuggability by running the fault injection pods in
the same namespace as other resources of litmus. This will also ease the
deletion process and ensure that there are no leftover objects on the cluster.
- This commit also enables users to use the same RBAC template for all the Litmus
scenarios without having to pull in a specific one for each of the scenarios.
This commit adds support to create a zone outage in AWS by denying both
ingress and egress traffic for the instances in a particular subnet of
the zone by modifying the network ACL. This creates an outage of all
the nodes in the zone, both masters and workers.
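The commit message does not say exactly how the network ACL is changed. One
possible approach, sketched under that assumption, is to add deny-all entries
in both directions. The function below only builds the keyword arguments that
the boto3 EC2 client's create_network_acl_entry call accepts; the function
name, rule number, and ACL id are illustrative:

```python
def deny_all_acl_entries(network_acl_id, rule_number=100):
    """Build deny-all ACL entries for ingress and egress.

    Illustrative only: returns plain dicts and makes no AWS calls.
    Each dict could be passed as keyword arguments to an EC2 client's
    create_network_acl_entry.
    """
    entries = []
    for egress in (False, True):  # False = ingress rule, True = egress rule
        entries.append(
            {
                "NetworkAclId": network_acl_id,
                "RuleNumber": rule_number,
                "Protocol": "-1",  # -1 matches all protocols
                "RuleAction": "deny",
                "Egress": egress,
                "CidrBlock": "0.0.0.0/0",
            }
        )
    return entries
```

Restoring the original ACL rules after the specified duration would end the
simulated outage.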
This commit switches the object type from Deployment to Job so that the
status can be displayed after all the scenarios specified in the Kraken
config have run, instead of the pod crashing, which is expected with Deployments.
Fixes https://github.com/cloud-bulldozer/kraken/issues/135
The current Kraken integration with Cerberus monitors the cluster as well as the
application health post chaos and fails the run if they are not healthy after chaos.
This commit adds the ability to monitor user application health during the chaos
and fails the run on downtime, since it would potentially mean downtime in a
customer's environment as well. It is especially useful for control plane
failure scenarios including API server, etcd, Ingress, etc.
There are cases where the kubeconfig can be read-only, such as when running
Kraken as a Kubernetes deployment. This commit changes the affected calls to
use the -n flag instead of a namespace context switch.
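Switching the namespace context writes to the kubeconfig, whereas passing
-n on each invocation leaves the file untouched. A minimal sketch of that
idea; kubectl_command is a hypothetical helper, not a function from Kraken:

```python
def kubectl_command(args, namespace=None, kubeconfig=None):
    """Build a kubectl argument list scoped to a namespace via -n.

    Hypothetical helper: nothing is written to the kubeconfig, so it
    works when that file is mounted read-only.
    """
    cmd = ["kubectl"]
    if kubeconfig:
        cmd += ["--kubeconfig", kubeconfig]
    if namespace:
        cmd += ["-n", namespace]
    cmd += list(args)
    return cmd
```

The resulting list can be handed to subprocess.run without a shell.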
This commit:
- Adds timeout to avoid operations hanging for long durations.
- Improves exception handling and exits wherever needed.
- Sets KUBECONFIG env var globally to access the cluster.
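The points above can be sketched as follows. This is an illustration under
assumptions, not the code from this commit: the helper names set_kubeconfig
and run and the 45 second default are hypothetical.

```python
import os
import subprocess
import sys


def set_kubeconfig(path):
    """Export KUBECONFIG once so every child process inherits it."""
    os.environ["KUBECONFIG"] = path


def run(cmd, timeout=45):
    """Run a command, returning its stdout, or None on timeout or failure.

    The timeout (an assumed default) keeps a hung call from blocking
    the whole run; the caller decides whether a None result is fatal.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout, check=True
        )
    except subprocess.TimeoutExpired:
        print(f"timed out after {timeout}s: {cmd}", file=sys.stderr)
        return None
    except (subprocess.CalledProcessError, OSError) as exc:
        print(f"command failed: {cmd}: {exc}", file=sys.stderr)
        return None
    return result.stdout
```

A caller that cannot continue without the output can check for None and
exit with a non-zero status.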
This commit modifies the wait time from 60 seconds to 3 seconds between
each of the requests to the API to capture the components' state at a more
granular level by default.
This commit:
- Adds support to automate the infrastructure pieces leveraged by Kraken
including Cerberus and Elasticsearch
- Adds a Kraken config that can be used to discover all the infra pieces
automatically without having to tweak the configuration.
* Support for baremetal node scenarios
* Finished baremetal support
* Added documentation for baremetal
* Clarify limitations of implementation in documentation
* Add baremetal support to new run.py file
* Allow use on newer machines
Some older machines require lanplus instead of lan
* Setup to allow per-device user, pass, and bmc address
Also set min version for a dependency
* Fix linting issues
* More linting issue fixes
* More linter issues
* Account for linter standard non-conformity
* Added baremetal warning
Co-authored-by: jaredoconnell <jocnnel@redhat.com>
This commit enables alerting in Kraken based on the Prometheus queries defined
by the user and modifies the return code of the run to determine pass/fail for
the run.
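A minimal sketch of how query results might map to a return code, assuming
each user-defined query is written alert-style so that a non-empty result
means the alert fired. The function name and the input shape are
hypothetical, and no Prometheus client is used here:

```python
def evaluate_alerts(query_results):
    """Map alert query results to (return_code, fired_queries).

    query_results is an assumed shape: a dict from query expression to
    the list of samples returned for it. Any non-empty list counts as
    a fired alert and fails the run.
    """
    fired = [expr for expr, samples in query_results.items() if samples]
    return (1 if fired else 0), fired
```

The first element can be passed to sys.exit so that automation consuming
the run sees a pass or a fail.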
This commit:
- Enables Kraken to leverage kube-burner to scrape metrics from
Prometheus and index them into Elasticsearch. This way we can
take a look at the metrics in Grafana long term even after the
cluster is terminated.
- Enables separation of operations based on distribution with
OpenShift as the default option. One of the use cases is to
capture Prometheus instance details as it's installed by default
while it's optional for Kubernetes.
This commit:
- Refactors the code base to be more modular by moving functions
into respective modules to make it lean and reusable.
- Uses black to reformat the code to follow PEP 8 practices.