Added node scenarios to stop and terminate instance

This commit:
- Adds a node scenario to stop and start an instance
- Adds a node scenario to terminate an instance
- Adds a node scenario to reboot an instance
- Adds a node scenario to stop the kubelet
- Adds a node scenario to crash the node
This commit is contained in:
Yashashree Suresh
2020-04-29 22:25:23 +05:30
committed by Naga Ravi Chaitanya Elluri
parent aac254ce45
commit 31f06b861a
11 changed files with 430 additions and 33 deletions

View File

@@ -23,6 +23,8 @@ kraken:
- scenarios/etcd.yml
- scenarios/openshift-kube-apiserver.yml
- scenarios/openshift-apiserver.yml
node_scenarios: # List of chaos node scenarios to load
- scenarios/node_scenarios_example.yml
tunings:
wait_duration: 60 # Duration to wait between each chaos scenario
@@ -57,7 +59,51 @@ The report is generated in the run directory and it contains the information abo
[Cerberus](https://github.com/openshift-scale/cerberus) can be used to monitor the cluster under test and the aggregated go/no-go signal generated by it can be consumed by Kraken to determine pass/fail. This is to make sure the Kubernetes/OpenShift environments are healthy on a cluster level instead of just the targeted components level. It is highly recommended to turn on the Cerberus health check feature avaliable in Kraken after installing and setting up Cerberus. To do that, set cerberus_enabled to True and cerberus_url to the url where Cerberus publishes go/no-go signal in the config file.
### Kubernetes/OpenShift chaos scenarios supported
Following are the components of Kubernetes/OpenShift for which a basic chaos scenario config exists today. It currently just supports pod based scenarios, we will be adding more soon. Adding a new pod based scenario is as simple as adding a new config under scenarios directory and defining it in the config.
Kraken currently just supports pod and node based scenarios, we will be adding more soon.
#### Node chaos scenarios
Following node chaos scenarios are supported:
1. **node_start_scenario**: scenario to stop the node instance.
2. **node_stop_scenario**: scenario to stop the node instance.
3. **node_stop_start_scenario**: scenario to stop and then start the node instance.
4. **node_termination_scenario**: scenario to terminate the node instance.
5. **node_reboot_scenario**: scenario to reboot the node instance.
6. **stop_kubelet_scenario**: scenario to stop the kubelet of the node instance.
7. **stop_start_kubelet_scenario**: scenario to stop and start the kubelet of the node instance.
8. **node_crash_scenario**: scenario to crash the node instance.
**NOTE**: If the node doesn't recover from the node_crash_scenario injection, reboot the node to get it back to Ready state.
**NOTE**: node_start_scenario, node_stop_scenario, node_stop_start_scenario, node_termination_scenario, node_reboot_scenario and stop_start_kubelet_scenario are supported only on AWS as of now.
**NOTE**: With AWS as the cloud type, make sure [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) is installed.
Node scenarios can be injected by placing the node scenarios config files under node_scenarios option in the kraken config. Refer to [node_scenarios_example](https://github.com/openshift-scale/kraken/blob/master/scenarios/node_scenarios_example.yml) config file.
```
node_scenarios:
- actions: # node chaos scenarios to be injected
- node_stop_start_scenario
- stop_start_kubelet_scenario
- node_crash_scenario
node_name: # node on which scenario has to be injected
label_selector: node-role.kubernetes.io/worker # when node_name is not specified, a node with matching label_selector is selected for node chaos scenario injection
instance_kill_count: 1 # number of times to inject each scenario under actions
timeout: 120 # duration to wait for completion of node scenario injection
cloud_type: aws # cloud type on which Kubernetes/OpenShift runs
- actions:
- node_reboot_scenario
node_name:
label_selector: node-role.kubernetes.io/infra
instance_kill_count: 1
timeout: 120
cloud_type: aws
```
#### Pod chaos scenarios
Following are the components of Kubernetes/OpenShift for which a basic chaos scenario config exists today. Adding a new pod based scenario is as simple as adding a new config under scenarios directory and defining it in the config.
Component | Description | Working
------------------------ | ---------------------------------------------------------------------------------------------------| ------------------------- |