This commit adds support for creating a zone outage in AWS by denying both
ingress and egress traffic to the instances in a particular subnet of the
zone, which is done by modifying the network ACL. This causes an outage of
all the nodes in the zone, both masters and workers.
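A minimal sketch of the idea, not the code in this commit: it builds one deny-all
ingress rule and one deny-all egress rule for a network ACL and hands them to an
EC2 client. The helper names (build_deny_all_entries, apply_entries) and the rule
number are hypothetical; the keyword arguments follow the boto3 EC2 client call
create_network_acl_entry. Whether the commit adds deny rules or swaps in a
different ACL is not stated here, so treat the approach as an assumption.

```python
def build_deny_all_entries(acl_id, rule_number=1):
    """Return kwargs for a deny-all ingress rule and a deny-all egress rule.

    Ingress and egress rules are numbered independently, so both entries
    can share the same (hypothetical) rule number.
    """
    entries = []
    for egress in (False, True):
        entries.append(
            {
                "NetworkAclId": acl_id,
                "RuleNumber": rule_number,
                "Protocol": "-1",  # -1 means all protocols
                "RuleAction": "deny",
                "Egress": egress,
                "CidrBlock": "0.0.0.0/0",
            }
        )
    return entries


def apply_entries(ec2_client, entries):
    """Create each entry through the given client.

    ec2_client is assumed to behave like boto3.client("ec2").
    """
    for entry in entries:
        ec2_client.create_network_acl_entry(**entry)
```

Keeping the rule construction separate from the API call makes it possible to
inspect the planned rules before anything is changed in the account.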
The current Kraken integration with Cerberus monitors cluster and application
health after the chaos and passes or fails the run based on whether they are healthy.
This commit adds the ability to monitor user application health during the chaos
and fail the run in case of downtime, since that would likely mean downtime in a
customer's environment as well. It is especially useful for control plane
failure scenarios, including the API server, etcd, ingress, etc.
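The monitoring described above can be sketched roughly as a polling loop that
runs while the chaos is in progress. This is an illustration under assumptions,
not the Cerberus or Kraken implementation: monitor_during_chaos and run_passed
are hypothetical names, and probe stands in for whatever health check the user
application exposes (for example an HTTP request to a route).

```python
import time


def monitor_during_chaos(probe, checks, interval_s=1.0, sleep=time.sleep):
    """Call probe() `checks` times and return the indices of failed checks.

    A probe that raises is counted as a failed check, since an unreachable
    application is downtime from the user's point of view.
    """
    failed = []
    for i in range(checks):
        try:
            healthy = bool(probe())
        except Exception:
            healthy = False
        if not healthy:
            failed.append(i)
        if i < checks - 1:
            sleep(interval_s)
    return failed


def run_passed(failed_checks):
    """The run passes only if no check failed during the chaos."""
    return len(failed_checks) == 0
```

The sleep function is injectable so the loop can be exercised without real
delays; a caller would normally leave it at the default.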
* Support for baremetal node scenarios
* Finished baremetal support
* Added documentation for baremetal
* Clarify limitations of implementation in documentation
* Add baremetal support to new run.py file
* Allow use on newer machines
Some older machines require lanplus instead of lan
* Set up to allow per-device user, password, and BMC address
Also set min version for a dependency
* Fix linting issues
* More linting issue fixes
* More linter issues
* Account for linter standard non-conformity
* Added baremetal warning
Co-authored-by: jaredoconnell <jocnnel@redhat.com>
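As a rough illustration of the per-device settings and the lan/lanplus choice
mentioned above, the sketch below assembles an ipmitool command line for one
node. Using the ipmitool CLI is an assumption made for the example; the commit
does not say which IPMI client it uses. The config layout, the node name, and
the function names are hypothetical.

```python
VALID_ACTIONS = ("status", "on", "off", "cycle")
VALID_INTERFACES = ("lan", "lanplus")


def resolve_bmc_settings(node, per_device, defaults):
    """Merge per-device settings over the defaults for one node."""
    settings = dict(defaults)
    settings.update(per_device.get(node, {}))
    return settings


def build_power_command(settings, action):
    """Return an ipmitool argument list; nothing is executed here."""
    interface = settings.get("interface", "lanplus")
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    if interface not in VALID_INTERFACES:
        raise ValueError(f"unsupported IPMI interface: {interface}")
    return [
        "ipmitool",
        "-I", interface,
        "-H", settings["bmc_addr"],
        "-U", settings["user"],
        "-P", settings["password"],
        "chassis", "power", action,
    ]
```

A caller could pass the returned list to subprocess.run; building the list
rather than a shell string avoids quoting problems with passwords.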
This commit:
- Refactors the code base to be more modular by moving functions
into respective modules to make it lean and reusable.
- Uses black to reformat the code to follow PEP 8 practices.
Signed-off-by: arcprabh <arcprabh@in.ibm.com>
Incorporated first round of review comments
Signed-off-by: arcprabh <arcprabh@in.ibm.com>
Resolve multiple node name issue for single ip
Signed-off-by: arcprabh <arcprabh@in.ibm.com>
This commit:
- Adds a node scenario to stop and start an instance
- Adds a node scenario to terminate an instance
- Adds a node scenario to reboot an instance
- Adds a node scenario to stop the kubelet
- Adds a node scenario to crash the node
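One way to picture the scenarios listed above is as a small plan table: the
instance-level scenarios map to cloud operations, and the kubelet and crash
scenarios map to commands run on the node. This is a sketch under assumptions,
not the code in this commit. The scenario keys and the plan_scenario helper are
hypothetical, and the sysrq trigger is only one common way to crash a Linux
node, not necessarily the one used here.

```python
# Scenarios carried out through the cloud provider, as ordered steps.
CLOUD_STEPS = {
    "stop_start": ["stop", "start"],
    "terminate": ["terminate"],
    "reboot": ["reboot"],
}

# Scenarios carried out by running a command on the node itself.
HOST_COMMANDS = {
    "stop_kubelet": ["systemctl", "stop", "kubelet"],
    # Writing "c" to sysrq-trigger forces a kernel crash (assumed method).
    "crash": ["sh", "-c", "echo c > /proc/sysrq-trigger"],
}


def plan_scenario(name):
    """Return ("cloud", steps) or ("host", command) for a scenario name."""
    if name in CLOUD_STEPS:
        return ("cloud", list(CLOUD_STEPS[name]))
    if name in HOST_COMMANDS:
        return ("host", list(HOST_COMMANDS[name]))
    raise KeyError(f"unknown node scenario: {name}")
```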