Commit Graph

123 Commits

Author SHA1 Message Date
Paige Rubendall
6b865fc573 Adding server set up for kraken 2021-10-25 08:58:46 -04:00
Naga Ravi Chaitanya Elluri
d3f8e2dd35 Bake in azure cli needed for node scenarios
This commit also modifies the key members for folks to reach out in case
of any questions.
2021-10-19 16:31:18 -04:00
Naga Ravi Chaitanya Elluri
2674e09407 Ignore validation for network policy creation
This commit helps the cases where targeting application pods in a
namespace using pod-selector to create an outage fails because of
not being able to validate the selector.

Error message for reference:
error validating data: ValidationError(NetworkPolicy.spec.podSelector):
unknown field "app=dittybopper" in io.k8s.apimachinery.pkg.apis.meta.v1.LabelSelector
2021-10-14 19:31:38 -04:00
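A note on the error quoted in 2674e09407: a `LabelSelector` expects its labels under `matchLabels` (or `matchExpressions`), so a raw `app=dittybopper` string placed directly under `podSelector` is rejected as an unknown field. The sketch below is not what the commit does (the commit skips validation); it only illustrates, under that reading of the error, how a `key=value` selector string could be turned into a well-formed policy body. The function names and the policy name are hypothetical.

```python
def parse_label_selector(selector: str) -> dict:
    """Turn 'k1=v1,k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    labels = {}
    for part in selector.split(","):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"invalid label selector term: {part!r}")
        labels[key.strip()] = value.strip()
    return labels


def build_deny_all_policy(name: str, namespace: str, selector: str) -> dict:
    """Build a NetworkPolicy body (hypothetical helper) that selects pods by
    label and lists both policy types with no allow rules, which blocks all
    ingress and egress traffic for the selected pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Labels go under matchLabels, not directly under podSelector.
            "podSelector": {"matchLabels": parse_label_selector(selector)},
            "policyTypes": ["Ingress", "Egress"],
        },
    }
```

Serialising the returned dictionary to YAML or JSON gives a manifest whose `podSelector` should pass schema validation.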
Paige Rubendall
10e9b09819 Adding fix for openstack node name issue 2021-10-14 14:56:46 -04:00
Paige Rubendall
57ef98f728 adding more node clouds defined 2021-10-11 13:49:12 -04:00
Naga Ravi Chaitanya Elluri
970cd061f4 Set the location of cerberus config to match entrypoint
Entrypoint for reference - https://github.com/cloud-bulldozer/cerberus/blob/master/containers/Dockerfile#L23.
2021-10-08 09:25:14 -04:00
Naga Ravi Chaitanya Elluri
cdf3bc03d2 Add support to block traffic to an application
This commit enables users to simulate a downtime of an application
by blocking the traffic for the specified duration to see how
it/other components communicating with it behave in case of downtime.
2021-10-01 10:13:40 -04:00
Paige Rubendall
22df024312 adding validation that namespace becomes active 2021-09-28 09:58:55 -04:00
Naga Ravi Chaitanya Elluri
4a4033605b Pull images from quay instead of docker
This is needed to avoid getting rate limited. Build for reference -
https://recovery.quay.io/repository/openshift-scale/kraken/build/0cccc967-cfef-43d0-98ca-e3eccb698045.
2021-09-23 15:00:21 -04:00
Naga Ravi Chaitanya Elluri
f36da323e7 Prioritize filtering on namespace to improve performance
This will avoid querying all namespaces for pods matching the label_selector
if defined as shown in the sample scenario config. This commit also prints a
pointer to the report generated at the end of the run.
2021-09-22 15:03:39 -04:00
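The namespace-first filtering described in f36da323e7 can be sketched roughly as follows. This is an illustration only, not Kraken's code: `list_namespaces` and `list_pods` are hypothetical callables standing in for cluster queries, and pods are modelled as plain dictionaries.

```python
def pods_matching(label_selector, list_namespaces, list_pods, namespace=None):
    """Return (namespace, pod name) pairs whose labels match label_selector.

    When a namespace is given, only that namespace is queried; otherwise
    every namespace returned by list_namespaces() is scanned.
    """
    namespaces = [namespace] if namespace else list_namespaces()
    matches = []
    for ns in namespaces:
        for pod in list_pods(ns):
            labels = pod.get("labels", {})
            if all(labels.get(k) == v for k, v in label_selector.items()):
                matches.append((ns, pod["name"]))
    return matches
```

With a namespace supplied, the per-namespace query runs once instead of once per namespace in the cluster, which is where the saving comes from.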
Paige Rubendall
ad6d2982a3 Merge pull request #152 from paigerube14/time_spacing_fix
Time spacing fix
2021-09-22 09:57:33 -04:00
Naga Ravi Chaitanya Elluri
b736f87695 Bump Kubernetes python version 2021-09-22 09:26:14 -04:00
Paige Rubendall
8e09e0a61b Adding specific tag version of powerfulseal 2021-09-21 13:49:41 -04:00
Paige Rubendall
16b5214fdd Adding specific tag version of powerfulseal 2021-09-21 12:37:45 -04:00
Naga Ravi Chaitanya Elluri
036e51a6b1 Delete litmus crd's during the cleanup
This commit will ensure that the litmus resources installed on the
cluster get cleaned up and also creates the chaosengine in the
specified namespace.
2021-09-16 16:30:21 -04:00
Paige Rubendall
5015853f22 Merge pull request #149 from paigerube14/litmus_logging
adding litmus logging
2021-09-08 17:41:45 -04:00
Paige Rubendall
a9056ddf43 adding litmus logging 2021-09-08 17:11:49 -04:00
Naga Ravi Chaitanya Elluri
5da0b259c5 Run all the litmus resources in a single namespace
- This eases the usage and debuggability by running the fault injection pods in
  the same namespace as other resources of litmus. This will also ease the
  deletion process and ensure that there are no leftover objects on the cluster.

- This commit also enables users to use the same rbac template for all the litmus
  scenarios without having to pull in a specific one for each of the scenarios.
2021-09-08 16:37:07 -04:00
Naga Ravi Chaitanya Elluri
68a32666cd Update litmus docs with supported scenarios 2021-09-01 16:41:22 -04:00
Naga Ravi Chaitanya Elluri
b9493baf1d Add a note around node-scenarios compatibility
This commit adds a note around using the standalone version of Kraken to
inject node-scenarios until https://github.com/cloud-bulldozer/kraken/issues/106
gets fixed.
2021-08-30 08:40:20 -04:00
Naga Ravi Chaitanya Elluri
9d9f564a3d Add badge for the container image 2021-08-27 20:32:43 -04:00
Naga Ravi Chaitanya Elluri
adb465cab0 Add support for multi-zone disruption
This will enable users to disrupt multiple zones in the cluster simultaneously
to be able to understand the behaviour of various components.
2021-08-26 08:23:24 -04:00
Paige Rubendall
22fcab57f5 container checking in pod 2021-08-25 09:28:03 -04:00
Naga Ravi Chaitanya Elluri
07ccfbf0aa Add pointer to Kraken-hub
This enables users to run Kraken with minimal configuration tweaks
and makes it especially easy for CI use cases.
2021-08-23 14:33:16 -04:00
prubenda
9b0bcdbf0e Adding node memory hog scenario 2021-08-20 14:02:00 -04:00
Naga Ravi Chaitanya Elluri
6456eec76a Add zone outage scenarios
This commit adds support to create zone outage in AWS by denying both
ingress and egress traffic to the instances belonging to a particular
subnet belonging to the zone by tweaking the network acl. This creates
an outage of all the nodes in the zone - both master and workers.
2021-08-17 11:43:13 -04:00
Naga Ravi Chaitanya Elluri
06d052af48 Run tasks in pod using Job object type
This commit switches the object type from Deployment to Job to be able
to display the status after executing all the scenarios specified in
the Kraken config instead of crashing which is expected in Deployments.

Fixes https://github.com/cloud-bulldozer/kraken/issues/135
2021-08-09 11:50:41 -04:00
Naga Ravi Chaitanya Elluri
c56a8a5356 Add more tunables for cpu hog scenario
This commit exposes the flags to tweak the number of cores and node
count to hog during the node-cpu-hog scenario.
2021-07-28 17:07:40 -04:00
Naga Ravi Chaitanya Elluri
716057eab6 Monitor user application availability during chaos
The current Kraken integration with Cerberus monitors cluster and application
health after the chaos and fails the run if they are not healthy. This commit
adds the ability to monitor user application health during the chaos and fails
the run on downtime, since the same downtime could occur in a customer's
environment as well. It is especially useful for control plane failure
scenarios including API server, Etcd, Ingress etc.
2021-07-27 13:15:57 -04:00
Naga Ravi Chaitanya Elluri
590edff63b Avoid namespace context switch
There are cases where the kubeconfig is read-only, such as when running
Kraken as a Kubernetes deployment. This commit changes those instances to
use the -n flag instead of a namespace context switch.
2021-07-27 11:31:32 -04:00
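The approach in 590edff63b, passing the namespace per command rather than rewriting the context stored in the kubeconfig, might look like this in outline. `kubectl_command` is a hypothetical helper, not a function from Kraken; `-n` and `--kubeconfig` are standard kubectl flags.

```python
def kubectl_command(args, namespace=None, kubeconfig=None):
    """Build a kubectl argument list that targets a namespace with -n.

    Nothing is written to the kubeconfig, so this also works when the
    file is mounted read-only.
    """
    cmd = ["kubectl"]
    if kubeconfig:
        cmd += ["--kubeconfig", kubeconfig]
    if namespace:
        cmd += ["-n", namespace]
    return cmd + list(args)
```

For example, `kubectl_command(["get", "pods"], namespace="openshift-etcd")` yields `["kubectl", "-n", "openshift-etcd", "get", "pods"]`, which can be passed to `subprocess.run` as is.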
Naga Ravi Chaitanya Elluri
e9f5961986 [Docs] Add instructions on how to mount custom scenarios 2021-07-26 09:57:11 -04:00
koflerm
304f606b2b Use jsonpath to retrieve pod nodename (#129) 2021-07-23 20:08:06 -04:00
Naga Ravi Chaitanya Elluri
c0b9cb46da Improve error handling
This commit:
- Adds timeout to avoid operations hanging for long durations.
- Improves exception handling and exits wherever needed.
- Sets KUBECONFIG env var globally to access the cluster.
2021-07-21 12:48:06 -04:00
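The three points listed in c0b9cb46da (timeouts, explicit exception handling, and a KUBECONFIG environment variable) can be combined in a small wrapper. This is a minimal sketch assuming commands are run through `subprocess`; `run_with_timeout` is a hypothetical name and not taken from the Kraken code base.

```python
import os
import subprocess


def run_with_timeout(cmd, timeout_seconds, kubeconfig=None):
    """Run cmd, returning its stdout, and raise RuntimeError on failure.

    A hung command is stopped after timeout_seconds instead of blocking
    the run indefinitely.
    """
    env = dict(os.environ)
    if kubeconfig:
        # Child processes such as kubectl or oc read KUBECONFIG from the env.
        env["KUBECONFIG"] = kubeconfig
    try:
        completed = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            env=env,
            check=True,
        )
    except subprocess.TimeoutExpired as err:
        raise RuntimeError(
            f"{cmd!r} timed out after {timeout_seconds}s"
        ) from err
    except subprocess.CalledProcessError as err:
        raise RuntimeError(
            f"{cmd!r} failed with exit code {err.returncode}: {err.stderr}"
        ) from err
    return completed.stdout
```

A caller can catch `RuntimeError` at the top level, log it, and exit with a non-zero status so that a failed or hung operation ends the run instead of stalling it.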
Paige Rubendall
f051c1c30f Merge pull request #120 from paigerube14/container_kill
Container kill
2021-07-15 15:07:58 -04:00
prubenda
76efac8f9b Adding delete of namespaces 2021-07-13 13:31:45 -04:00
prubenda
46a1823291 Adding killing of specific containers in pods 2021-07-08 17:10:48 -04:00
Naga Ravi Chaitanya Elluri
b75b6e0042 Increase the granularity of cerberus checks
This commit modifies the wait time from 60 seconds to 3 seconds between
each of the requests to the API to capture the components state at a more
granular level by default.
2021-07-08 16:59:33 -04:00
Naga Ravi Chaitanya Elluri
d7ba19c382 Automate the infrastructure pieces
This commit:
- Adds support to automate the infrastructure pieces leveraged by Kraken
  including Cerberus and Elasticsearch
- Adds a Kraken config that can be used to discover all the infra pieces
  automatically without having to tweak the configuration.
2021-07-07 15:52:26 -04:00
Naga Ravi Chaitanya Elluri
e195922504 Document pip version and add more logging 2021-07-07 09:49:52 -04:00
Jared O'Connell
9b83dbcf04 Baremetal Node Support (#74)
* Support for baremetal node scenarios

* Finished baremetal support

* Added documentation for baremetal

* Clarify limitations of implementation in documentation

* Add baremetal support to new run.py file

* Allow use on newer machines

Some older machines require lanplus instead of lan

* Setup to allow per-device user, pass, and bmc address

Also set min version for a dependency

* Fix linting issues

* More linting issue fixes

* More linter issues

* Account for linter standard non-conformity

* Added baremetal warning

Co-authored-by: jaredoconnell <jocnnel@redhat.com>
2021-07-02 17:31:40 -04:00
Paige Rubendall
0afcd22f66 Merge pull request #115 from chaitanyaenr/workflow
Update the workflow
2021-06-23 14:00:04 -04:00
prubenda
5456fce924 Adding getting started docs 2021-06-23 13:58:43 -04:00
Naga Ravi Chaitanya Elluri
d1ae298692 Update the workflow
This commit modifies the workflow diagram to add pieces that are
leveraged to determine pass/fail of the chaos scenarios.
2021-06-23 12:41:38 -04:00
prubenda
41bf815f98 Adding shut down scenario for gcp, az, aws, openstack 2021-06-23 09:00:58 -04:00
Naga Ravi Chaitanya Elluri
e30a4243f6 Add support to alerting on metrics evaluation
This commit enables alerting in Kraken based on the Prometheus queries defined
by the user and modifies the return code of the run to determine pass/fail for
the run.
2021-06-22 15:22:37 -04:00
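The alerting flow described in e30a4243f6 (user-defined Prometheus queries that decide the run's return code) could be sketched as below. The alert fields `expr`, `description` and `severity`, the `run_query` callable, and the rule that only `error` severity fails the run are all assumptions made for illustration; the commit message does not specify them.

```python
def evaluate_alerts(alerts, run_query):
    """Evaluate each alert expression and return an exit code.

    run_query(expr) is assumed to return a list of matching series,
    empty when the alert condition does not hold.
    """
    exit_code = 0
    for alert in alerts:
        result = run_query(alert["expr"])
        if not result:
            continue
        severity = alert.get("severity", "info")
        print(f"[{severity}] alert fired: {alert.get('description', alert['expr'])}")
        if severity == "error":
            # Assumed rule: any firing error-level alert fails the run.
            exit_code = 1
    return exit_code
```

The returned value can be handed to `sys.exit` at the end of the run so that CI systems see the pass/fail result.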
Naga Ravi Chaitanya Elluri
7e8f0450d6 Add support to scrape and index metrics
This commit:
- Enables Kraken to leverage kube-burner to scrape metrics from
  Prometheus and index them into Elasticsearch. This way we can
  take a look at the metrics in Grafana long term even after the
  cluster is terminated.
- Enables separation of operations based on distribution with
  OpenShift as the default option. One of the use cases is to
  capture Prometheus instance details as it's installed by default
  while it's optional for Kubernetes.
2021-06-21 14:55:50 -04:00
Naga Ravi Chaitanya Elluri
871eb3d74e Avoid circular dependencies
This commit deletes unneeded imports and fixes the circular dependency
issues.
2021-06-17 11:18:34 -04:00
Robert O'Brien
56de5c76a9 Added selinux label to the docker run install command 2021-06-17 08:30:58 -04:00
Naga Ravi Chaitanya Elluri
5c2453b07e Refactor code base
This commit:
- Refactors the code base to be more modular by moving functions
  into respective modules to make it lean and reusable.
- Uses black to reformat the code to follow PEP 8 practices.
2021-06-14 17:41:10 -04:00
Ryan Drew
8d9faf7033 Correct license from MIT to Apache 2.0 2021-06-08 18:52:42 -04:00