135 Commits

Author SHA1 Message Date
Pravin Dsilva
38302e7d95 Add timeout for OpenStack node scenarios
Signed-off-by: Pravin Dsilva <pdsilva@redhat.com>
2021-11-25 20:56:59 -05:00
Paige Rubendall
f17ad062cf Ci tests (#184)
* Adding in working ci tests

* spacing in readme
2021-11-24 15:12:47 -05:00
Alejandro Gullón
baa812b7f0 Added new scenario to fill up a given volume (#182)
* Added new scenario to fill up a given volume

* fixing small issues and style

* adding PVC as input param instead of pod name

* small fix

* get container name and volume name
replace oc with kubectl commands

* adding yaml file to create a pv, pvc and pod to run pvc_scenario

* adding support to match both strings for describe command when looking for pod_name

* added support to find the pvc from a given pod

* small fix

* small fix
2021-11-24 12:18:49 -05:00
prubenda
8e0f4e63af Adding container for pod_exec in time scenarios 2021-11-23 11:33:49 -05:00
Paige Rubendall
2bb4686585 adding check for whether namespace exists 2021-11-16 09:45:04 -05:00
Paige Rubendall
5fc97b68c6 moving readme to show README overall on main page 2021-11-09 06:45:48 -08:00
Paige Rubendall
4a37659416 Adding docker image build and push to github actions 2021-11-04 22:33:26 -04:00
Paige Rubendall
0023d679f7 fixing killed container no reference issue 2021-11-03 10:06:13 -04:00
Naga Ravi Chaitanya Elluri
f3bbc85dd5 Fix issue with matching labels
This commit fixes the issue with application outages scenario where
the pod-selector is not being mapped properly.
2021-10-29 10:45:46 -04:00
Naga Ravi Chaitanya Elluri
43b1e5b727 Remove item only when the list is not empty
This commit fixes the case where the fault injected containers take
a longer time to recover, as the current checks fail if the
list is empty.
2021-10-28 12:12:45 -04:00
Paige Rubendall
87aa9eef4d Adding multiple node names and instance count for label selectors 2021-10-26 13:44:28 -04:00
Naga Ravi Chaitanya Elluri
674eb74a75 Expose setting the signal in the config
This commit enables users to start Kraken as a listener by setting
the signal to PAUSE in the config, to get the cluster ready for a desired test or
run any setup, and then inject chaos by setting the signal to RUN. This
helps in cases where we have test cases that need to coordinate the chaos
at a desired time depending on the state of the cluster/test run.
2021-10-26 09:05:25 -04:00
Paige Rubendall
6b865fc573 Adding server set up for kraken 2021-10-25 08:58:46 -04:00
Naga Ravi Chaitanya Elluri
d3f8e2dd35 Bake in azure cli needed for node scenarios
This commit also modifies the key members for folks to reach out in case
of any questions.
2021-10-19 16:31:18 -04:00
Naga Ravi Chaitanya Elluri
2674e09407 Ignore validation for network policy creation
This commit helps the cases where targeting application pods in a
namespace using pod-selector to create an outage fails because of
not being able to validate the selector.

Error message for reference:
error validating data: ValidationError(NetworkPolicy.spec.podSelector):
unknown field "app=dittybopper" in io.k8s.apimachinery.pkg.apis.meta.v1.LabelSelector
2021-10-14 19:31:38 -04:00
Paige Rubendall
10e9b09819 Adding fix for openstack node name issue 2021-10-14 14:56:46 -04:00
Paige Rubendall
57ef98f728 adding more node clouds defined 2021-10-11 13:49:12 -04:00
Naga Ravi Chaitanya Elluri
970cd061f4 Set the location of cerberus config to match entrypoint
Entrypoint for reference - https://github.com/cloud-bulldozer/cerberus/blob/master/containers/Dockerfile#L23.
2021-10-08 09:25:14 -04:00
Naga Ravi Chaitanya Elluri
cdf3bc03d2 Add support to block traffic to an application
This commit enables users to simulate a downtime of an application
by blocking the traffic for the specified duration to see how
it/other components communicating with it behave in case of downtime.
2021-10-01 10:13:40 -04:00
Paige Rubendall
22df024312 adding validation that namespace becomes active 2021-09-28 09:58:55 -04:00
Naga Ravi Chaitanya Elluri
4a4033605b Pull images from quay instead of docker
This is needed to avoid getting rate limited. Build for reference -
https://recovery.quay.io/repository/openshift-scale/kraken/build/0cccc967-cfef-43d0-98ca-e3eccb698045.
2021-09-23 15:00:21 -04:00
Naga Ravi Chaitanya Elluri
f36da323e7 Prioritize filtering on namespace to improve performance
This will avoid querying all namespaces for pods matching the label_selector
if defined as shown in the sample scenario config. This commit also prints a
pointer to the report generated at the end of the run.
2021-09-22 15:03:39 -04:00
Paige Rubendall
ad6d2982a3 Merge pull request #152 from paigerube14/time_spacing_fix
Time spacing fix
2021-09-22 09:57:33 -04:00
Naga Ravi Chaitanya Elluri
b736f87695 Bump Kubernetes python version 2021-09-22 09:26:14 -04:00
Paige Rubendall
8e09e0a61b Adding specific tag version of powerfulseal 2021-09-21 13:49:41 -04:00
Paige Rubendall
16b5214fdd Adding specific tag version of powerfulseal 2021-09-21 12:37:45 -04:00
Naga Ravi Chaitanya Elluri
036e51a6b1 Delete litmus CRDs during the cleanup
This commit will ensure that the litmus resources installed on the
cluster get cleaned up and also creates the chaosengine in the
specified namespace.
2021-09-16 16:30:21 -04:00
Paige Rubendall
5015853f22 Merge pull request #149 from paigerube14/litmus_logging
adding litmus logging
2021-09-08 17:41:45 -04:00
Paige Rubendall
a9056ddf43 adding litmus logging 2021-09-08 17:11:49 -04:00
Naga Ravi Chaitanya Elluri
5da0b259c5 Run all the litmus resources in a single namespace
- This eases the usage and debuggability by running the fault injection pods in
  the same namespace as other resources of litmus. This will also ease the
  deletion process and ensure that there are no leftover objects on the cluster.

- This commit also enables users to use the same rbac template for all the litmus
  scenarios without having to pull in a specific one for each of the scenarios.
2021-09-08 16:37:07 -04:00
Naga Ravi Chaitanya Elluri
68a32666cd Update litmus docs with supported scenarios 2021-09-01 16:41:22 -04:00
Naga Ravi Chaitanya Elluri
b9493baf1d Add a note around node-scenarios compatibility
This commit adds a note around using the standalone version of Kraken to
inject node-scenarios until https://github.com/cloud-bulldozer/kraken/issues/106
gets fixed.
2021-08-30 08:40:20 -04:00
Naga Ravi Chaitanya Elluri
9d9f564a3d Add badge for the container image 2021-08-27 20:32:43 -04:00
Naga Ravi Chaitanya Elluri
adb465cab0 Add support for multi-zone disruption
This will enable users to disrupt multiple zones in the cluster simultaneously
to be able to understand the behaviour of various components.
2021-08-26 08:23:24 -04:00
Paige Rubendall
22fcab57f5 container checking in pod 2021-08-25 09:28:03 -04:00
Naga Ravi Chaitanya Elluri
07ccfbf0aa Add pointer to Kraken-hub
This enables users to run Kraken with minimal configuration tweaks
and makes it especially easy for CI use cases.
2021-08-23 14:33:16 -04:00
prubenda
9b0bcdbf0e Adding node memory hog scenario 2021-08-20 14:02:00 -04:00
Naga Ravi Chaitanya Elluri
6456eec76a Add zone outage scenarios
This commit adds support to create zone outage in AWS by denying both
ingress and egress traffic to the instances belonging to a particular
subnet belonging to the zone by tweaking the network acl. This creates
an outage of all the nodes in the zone - both master and workers.
2021-08-17 11:43:13 -04:00
Naga Ravi Chaitanya Elluri
06d052af48 Run tasks in pod using Job object type
This commit switches the object type from Deployment to Job to be able
to display the status after executing all the scenarios specified in
the Kraken config instead of crashing which is expected in Deployments.

Fixes https://github.com/cloud-bulldozer/kraken/issues/135
2021-08-09 11:50:41 -04:00
Naga Ravi Chaitanya Elluri
c56a8a5356 Add more tunables for cpu hog scenario
This commit exposes the flags to tweak the number of cores and node
count to hog during the node-cpu-hog scenario.
2021-07-28 17:07:40 -04:00
Naga Ravi Chaitanya Elluri
716057eab6 Monitor user application availability during chaos
The current Kraken integration with Cerberus monitors the cluster as well as the
application health post chaos and passes or fails the run depending on whether they are healthy.
This commit adds the ability to monitor the user application health during the chaos
and fails the run in case of downtime, as it would potentially mean downtime in a
customer's environment as well. It is especially useful for control plane
failure scenarios including API server, Etcd, Ingress etc.
2021-07-27 13:15:57 -04:00
Naga Ravi Chaitanya Elluri
590edff63b Avoid namespace context switch
There are cases where the kubeconfig can be read-only, such as when running
Kraken as a Kubernetes deployment. This commit changes those instances to
use the -n flag instead of a namespace context switch.
2021-07-27 11:31:32 -04:00
Naga Ravi Chaitanya Elluri
e9f5961986 [Docs] Add instructions on how to mount custom scenarios 2021-07-26 09:57:11 -04:00
koflerm
304f606b2b Use jsonpath to retrieve pod nodename (#129) 2021-07-23 20:08:06 -04:00
Naga Ravi Chaitanya Elluri
c0b9cb46da Improve error handling
This commit:
- Adds timeout to avoid operations hanging for long durations.
- Improves exception handling and exits wherever needed.
- Sets KUBECONFIG env var globally to access the cluster.
2021-07-21 12:48:06 -04:00
Paige Rubendall
f051c1c30f Merge pull request #120 from paigerube14/container_kill
Container kill
2021-07-15 15:07:58 -04:00
prubenda
76efac8f9b Adding delete of namespaces 2021-07-13 13:31:45 -04:00
prubenda
46a1823291 Adding killing of specific containers in pods 2021-07-08 17:10:48 -04:00
Naga Ravi Chaitanya Elluri
b75b6e0042 Increase the granularity of cerberus checks
This commit modifies the wait time from 60 seconds to 3 seconds between
each of the requests to the API to capture the components state at a more
granular level by default.
2021-07-08 16:59:33 -04:00
Naga Ravi Chaitanya Elluri
d7ba19c382 Automate the infrastructure pieces
This commit:
- Adds support to automate the infrastructure pieces leveraged by Kraken
  including Cerberus and Elasticsearch
- Adds a Kraken config that can be used to discover all the infra pieces
  automatically without having to tweak the configuration.
2021-07-07 15:52:26 -04:00