- This eases the usage and debuggability by running the fault injection pods in
the same namespace as other resources of litmus. This will also ease the
deletion process and ensure that there are no leftover objects on the cluster.
- This commit also enables users to use the same rbac template for all the litmus
scenarios without having to pull in a specic one for each of the scenarios.
This commit adds support to create zone outage in AWS by denying both
ingress and egress traffic to the instances belonging to a particular
subnet belonging to the zone by tweaking the network acl. This creates
an outage of all the nodes in the zone - both master and workers.
Current Kraken integration with Cerberus monitors the cluster as well as the
application health post chaos and pass/fails if they are not healthy after chaos.
This commit adds ability to monitor the user application health during the chaos
and fails the run in case of downtime as it's potentially a downtime in case of
customers environment as well. It is especially useful in case of control plane
failure scenarios including API server, Etcd, Ingress etc.
This commit modifies the wait time from 60 seconds to 3 seconds between
each of the requests to the API to capture the components state at a more
granular level by default.
This commit:
- Adds support to automate the infrastructure pieces leveraged by Kraken
including Cerberus and Elasticsearch
- Adds a Kraken config that can be used to discover all the infra pieces
automatically without having to tweak the configuration.
This commit enables alerting in Kraken based on the Prometheus queries defined
by the user and modifies the return code of the run to determine pass/fail for
the run.
This commit:
- Enables Kraken to leverage kube-burner to scrape metrics from
Prometheus and index them into Elasticsearch. This way we can
take a look at the metrics in Grafana long term even after the
cluster is terminated.
- Enables separation of operations based on distribution with
OpenShift as the default option. One of the use cases is to
capture Prometheus instance details as it's installed by default
while it's optional for Kubernetes.
This commit enables performance monitoring on the cluster when
running Kraken to be able to observe how cluster reacts to failures
as it's important to make sure the cluster is healthy in terms of
both recovery as well as performance.
With the current implementation, all the scenarios of specific type
(for example, pod scenario) has to be executed together. All
pod_scenarios are followed by node_scenarios and so on.
(pod_scenarios -> node_scenarios -> pod_scenarios is not possible)
This commit enables the user to run a specific type of scenario
multiple times. For example, few pod_scenarios followed by
node_scenarios followed by few_scenarios.
This commit:
- Converts various sections in the readme into individual documents.
- Adds pointers to the public blogs.
- Updates workflow/architecture diagram.
- Adds community info and contributing guidelines.
This commit:
- Adds a node scenario to stop and start an instance
- Adds a node scenario to terminate an instance
- Adds a node scenario to reboot an instance
- Adds a node scenario to stop the kubelet
- Adds a node scenario to crash the node
This commit:
- Adds support to run pod chaos scenarios including killing an etcd,
ApiServer and kube-apiserver using powerfulseal tool.
- Adds support to create a report with the details about each chaos
injection along with timestamps. The report is generated in the
run directory.
- Adds kubernetes package with a bunch of functions which can be
used later to talk to the kubernetes API to be able to know the
status of the targeted components/nodes.