Commit Graph

32 Commits

Author SHA1 Message Date
Naga Ravi Chaitanya Elluri
716057eab6 Monitor user application availability during chaos
The current Kraken integration with Cerberus monitors cluster and application
health after chaos and fails the run if either is unhealthy. This commit adds the
ability to monitor user application health during the chaos and fails the run on
any downtime, since the same downtime could occur in a customer's environment.
It is especially useful for control plane failure scenarios, including the API
server, etcd, and Ingress.
2021-07-27 13:15:57 -04:00
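
The check described in this commit message could look roughly like the sketch below. It is not Kraken's or Cerberus's actual code: `poll_availability`, `run_passed`, and the `check` callable are hypothetical names, and a real monitor would wait between samples and query a real endpoint.

```python
from typing import Callable, Iterable, List


def poll_availability(check: Callable[[], bool], samples: int) -> List[bool]:
    """Call the (hypothetical) health check `samples` times and record each result."""
    return [bool(check()) for _ in range(samples)]


def run_passed(results: Iterable[bool]) -> bool:
    """A run passes only if every sample taken during chaos was healthy."""
    return all(results)


if __name__ == "__main__":
    # Simulated health check: the application is down for the second sample.
    responses = iter([True, False, True])
    observed = poll_availability(lambda: next(responses), samples=3)
    print("pass" if run_passed(observed) else "fail: downtime observed during chaos")
```

Recording every sample, rather than stopping at the first failure, keeps a full picture of when the application was unavailable.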
Naga Ravi Chaitanya Elluri
e9f5961986 [Docs] Add instructions on how to mount custom scenarios 2021-07-26 09:57:11 -04:00
Paige Rubendall
f051c1c30f Merge pull request #120 from paigerube14/container_kill
Container kill
2021-07-15 15:07:58 -04:00
prubenda
76efac8f9b Adding delete of namespaces 2021-07-13 13:31:45 -04:00
prubenda
46a1823291 Adding killing of specific containers in pods 2021-07-08 17:10:48 -04:00
Naga Ravi Chaitanya Elluri
d7ba19c382 Automate the infrastructure pieces
This commit:
- Adds support to automate the infrastructure pieces leveraged by Kraken
  including Cerberus and Elasticsearch
- Adds a Kraken config that can be used to discover all the infra pieces
  automatically without having to tweak the configuration.
2021-07-07 15:52:26 -04:00
Naga Ravi Chaitanya Elluri
e195922504 Document pip version and add more logging 2021-07-07 09:49:52 -04:00
Jared O'Connell
9b83dbcf04 Baremetal Node Support (#74)
* Support for baremetal node scenarios

* Finished baremetal support

* Added documentation for baremetal

* Clarify limitations of implementation in documentation

* Add baremetal support to new run.py file

* Allow use on newer machines

Some older machines require lanplus instead of lan

* Setup to allow per-device user, pass, and bmc address

Also set min version for a dependency

* Fix linting issues

* More linting issue fixes

* More linter issues

* Account for linter standard non-conformity

* Added baremetal warning

Co-authored-by: jaredoconnell <jocnnel@redhat.com>
2021-07-02 17:31:40 -04:00
prubenda
5456fce924 Adding getting started docs 2021-06-23 13:58:43 -04:00
prubenda
41bf815f98 Adding shut down scenario for gcp, az, aws, openstack 2021-06-23 09:00:58 -04:00
Naga Ravi Chaitanya Elluri
e30a4243f6 Add support for alerting based on metrics evaluation
This commit enables alerting in Kraken based on the Prometheus queries defined
by the user and modifies the return code of the run to determine pass/fail for
the run.
2021-06-22 15:22:37 -04:00
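
One way to picture the pass/fail behavior in this commit message is the sketch below. It assumes an alert fires whenever its query returns any data; `evaluate_alerts`, the `expr` key, and the `run_query` callable are hypothetical and do not reflect Kraken's real alert profile format or its Prometheus client.

```python
from typing import Callable, Dict, List, Sequence


def evaluate_alerts(
    alerts: Sequence[Dict[str, str]],
    run_query: Callable[[str], List],
) -> int:
    """Return 0 if no alert query returns data, 1 otherwise."""
    # Assumption: a non-empty query result means the alert condition was met.
    fired = [alert["expr"] for alert in alerts if run_query(alert["expr"])]
    for expr in fired:
        print(f"alert fired: {expr}")
    return 1 if fired else 0


if __name__ == "__main__":
    # Stand-in for a Prometheus client: maps each query to canned results.
    canned = {"up == 0": [], "rate(errors[5m]) > 0": [{"value": 1}]}
    alerts = [{"expr": "up == 0"}, {"expr": "rate(errors[5m]) > 0"}]
    raise SystemExit(evaluate_alerts(alerts, canned.get))
```

Passing the query function in as a parameter keeps the pass/fail logic testable without a live Prometheus instance.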
Naga Ravi Chaitanya Elluri
7e8f0450d6 Add support to scrape and index metrics
This commit:
- Enables Kraken to leverage kube-burner to scrape metrics from
  Prometheus and index them into Elasticsearch. This way we can
  take a look at the metrics in Grafana long term even after the
  cluster is terminated.
- Enables separation of operations based on distribution with
  OpenShift as the default option. One of the use cases is to
  capture Prometheus instance details as it's installed by default
  while it's optional for Kubernetes.
2021-06-21 14:55:50 -04:00
Robert O'Brien
56de5c76a9 Added selinux label to the docker run install command 2021-06-17 08:30:58 -04:00
Amit Sagtani
d00d6ec69e Install pre-commit and use GitHub Actions (#94)
* added pre-commit and code-cleaning

* removed tox and TravisCI
2021-05-05 09:53:45 -04:00
Mike Fiedler
6dc06c1c57 Merge pull request #40 from paigerube14/az_nodes
Az nodes
2021-03-17 17:50:24 -04:00
prubenda
c7bb32f633 Adding azure to node scenarios 2021-03-17 17:41:07 -04:00
prubenda
387d6921a6 adding contribute doc 2021-02-25 10:33:55 -05:00
Pravin Dsilva
807d96ae9c Dockerfile for ppc64le
Signed-off-by: Pravin Dsilva <pravin.d-silva@ibm.com>
2021-02-17 12:29:27 -05:00
Pravin Dsilva
918b5fb6d3 Add node level chaos scenarios for bastion node
Signed-off-by: Pravin Dsilva <pravin.d-silva@ibm.com>
2021-02-16 09:04:55 -08:00
Naga Ravi Chaitanya Elluri
a7e28ca490 Add support to deploy performance dashboards
This commit enables performance monitoring on the cluster when
running Kraken, making it possible to observe how the cluster reacts
to failures. It is important to make sure the cluster is healthy in
terms of both recovery and performance.
2021-02-10 16:06:55 -05:00
mjulie
a42adf89e8 Add pod scenarios for custom app
Signed-off-by: mjulie <mjulie@in.ibm.com>
2021-02-04 11:24:05 -05:00
mjulie
488aa826e4 Add pod scenarios for custom app
Signed-off-by: mjulie <mjulie@in.ibm.com>
2021-02-04 11:24:05 -05:00
mjulie
9df350a189 Add pod scenarios for custom app
Signed-off-by: mjulie <mjulie@in.ibm.com>
2021-02-04 11:24:05 -05:00
arcprabh
8dd18af161 Enable support for OpenStack cloud.
Signed-off-by: arcprabh <arcprabh@in.ibm.com>

Incorporated first round of review comments

Signed-off-by: arcprabh <arcprabh@in.ibm.com>

Resolve multiple node name issue for single ip

Signed-off-by: arcprabh <arcprabh@in.ibm.com>
2021-02-02 20:47:30 +05:30
prubenda
1fc9683c8c Adding litmus scenario options 2020-12-03 12:45:35 -05:00
prubenda
d3d2cffffa Adding a couple of docs layout updates 2020-11-30 09:26:50 -05:00
prubenda
d3e01db574 adding start to fix for all other cloud types 2020-11-24 16:32:43 -05:00
prubenda
72fe662e05 Adding GCP node scenarios support 2020-11-17 09:57:39 -05:00
prubenda
c41241cd6d Adding specifications to set up for node scenarios in aws 2020-11-13 13:05:42 -05:00
Yashashree1997
47847d86cd Adds the ability to run a specific type of scenario multiple times
With the current implementation, all the scenarios of a specific type
(for example, pod scenarios) have to be executed together. All
pod_scenarios are followed by node_scenarios and so on
(pod_scenarios -> node_scenarios -> pod_scenarios is not possible).
This commit enables the user to run a specific type of scenario
multiple times. For example, a few pod_scenarios followed by
node_scenarios followed by a few more pod_scenarios.
2020-10-30 10:40:42 -04:00
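
The ordering this commit message describes can be illustrated with an ordered list of single-key entries, so the same scenario type may appear more than once. This is a sketch under that assumption; the file names and the `execution_order` helper are hypothetical and are not taken from Kraken's config.

```python
from typing import Dict, List, Tuple

# Hypothetical config: a list preserves order and allows a type to repeat.
chaos_scenarios: List[Dict[str, List[str]]] = [
    {"pod_scenarios": ["pod_a.yml", "pod_b.yml"]},
    {"node_scenarios": ["node_a.yml"]},
    {"pod_scenarios": ["pod_c.yml"]},
]


def execution_order(scenarios: List[Dict[str, List[str]]]) -> List[Tuple[str, str]]:
    """Flatten the config into (scenario_type, file) pairs in the order listed."""
    order: List[Tuple[str, str]] = []
    for entry in scenarios:
        for scenario_type, files in entry.items():
            for path in files:
                order.append((scenario_type, path))
    return order


if __name__ == "__main__":
    for scenario_type, path in execution_order(chaos_scenarios):
        print(scenario_type, path)
```

A plain mapping keyed by scenario type could not express this sequence, because each key can appear only once.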
prubenda
6f31519e5f adding time scenario 2020-10-27 08:37:54 -04:00
Naga Ravi Chaitanya Elluri
82743230fe Modify documentation to improve readability
This commit:
- Converts various sections in the readme into individual documents.
- Adds pointers to the public blogs.
- Updates workflow/architecture diagram.
- Adds community info and contributing guidelines.
2020-10-21 15:01:54 -04:00