Current Kraken integration with Cerberus monitors the cluster as well as the
application health post chaos and pass/fails if they are not healthy after chaos.
This commit adds ability to monitor the user application health during the chaos
and fails the run in case of downtime as it's potentially a downtime in case of
customers environment as well. It is especially useful in case of control plane
failure scenarios including API server, Etcd, Ingress etc.
This commit:
- Adds support to automate the infrastructure pieces leveraged by Kraken
including Cerberus and Elasticsearch
- Adds a Kraken config that can be used to discover all the infra pieces
automatically without having to tweak the configuration.
This commit enables performance monitoring on the cluster when
running Kraken to be able to observe how cluster reacts to failures
as it's important to make sure the cluster is healthy in terms of
both recovery as well as performance.
With the current implementation, all the scenarios of specific type
(for example, pod scenario) has to be executed together. All
pod_scenarios are followed by node_scenarios and so on.
(pod_scenarios -> node_scenarios -> pod_scenarios is not possible)
This commit enables the user to run a specific type of scenario
multiple times. For example, few pod_scenarios followed by
node_scenarios followed by few_scenarios.
This commit:
- Converts various sections in the readme into individual documents.
- Adds pointers to the public blogs.
- Updates workflow/architecture diagram.
- Adds community info and contributing guidelines.