Chaos Recommendation Tool
This command-line tool, designed for Red Hat Kraken, recommends chaos tests. It analyzes the behavior of application services, assesses their susceptibility to specific fault types, and suggests the chaos test cases most likely to disrupt them.
The tool profiles an application by gathering telemetry data such as CPU, memory, and network usage, then analyzes that data to suggest probable chaos scenarios. For best results, run the utility while the application is under load.
Pre-requisites
- OpenShift or Kubernetes environment where the application is hosted
- Access to the telemetry data via the exposed Prometheus endpoint
- Python3
Usage
- To run:
    $ python3.9 -m venv chaos
    $ source chaos/bin/activate
    $ git clone https://github.com/redhat-chaos/krkn.git
    $ cd krkn
    $ pip3 install -r requirements.txt
    $ python3.9 utils/chaos_recommender/chaos_recommender.py
- Follow the prompts to provide the required information.
Configuration
To run the recommender with a config file, specify the config file path with the -c argument.
You can customize the default values by editing the krkn/config/recommender_config.yaml file. The configuration file contains the following options:
- application: Specify the application name.
- namespace: Specify the namespace name. If you want to profile
- labels: Specify the labels (not used).
- kubeconfig: Specify the location of the kubeconfig file (not used).
- prometheus_endpoint: Specify the Prometheus endpoint (required).
- auth_token: Auth token to connect to the Prometheus endpoint (required).
- scrape_duration: How long data should be fetched for, e.g., '1m' (required).
- chaos_library: "kraken" (currently only kraken is supported).
- chaos_tests: (for output purposes only; do not change unless needed)
    - GENERAL: list of general-purpose tests available in Krkn
    - MEM: list of memory-related tests available in Krkn
    - NETWORK: list of network-related tests available in Krkn
    - CPU: list of CPU-related tests available in Krkn
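Because three of the options above are mandatory, it can help to check a configuration before launching the recommender. The following is a minimal sketch, not part of Krkn: the helper name `missing_options` and the example values are hypothetical, and the dictionary stands in for an already parsed recommender_config.yaml.

```python
# Options the configuration list marks as mandatory.
REQUIRED_OPTIONS = ("prometheus_endpoint", "auth_token", "scrape_duration")


def missing_options(config):
    """Return the mandatory options that are absent or empty in config."""
    return [name for name in REQUIRED_OPTIONS if not config.get(name)]


# Hypothetical values standing in for a parsed recommender_config.yaml.
example_config = {
    "application": "my-app",
    "namespace": "my-namespace",
    "prometheus_endpoint": "https://prometheus.example.com",
    "auth_token": "",
    "scrape_duration": "1m",
}

print(missing_options(example_config))  # prints ['auth_token']
```

An empty result means every mandatory option has a value; it does not verify that the endpoint is reachable or that the token is valid.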
TIP: to collect the Prometheus endpoint and token from your OpenShift cluster, run the following commands:
    prometheus_url=$(kubectl get routes -n openshift-monitoring prometheus-k8s --no-headers | awk '{print $2}')
    # To use your current session token
    token=$(oc whoami -t)
    # To create a new token
    token=$(kubectl create token -n openshift-monitoring prometheus-k8s --duration=6h || oc sa new-token -n openshift-monitoring prometheus-k8s)
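Before running the recommender, you may want to confirm that the endpoint and token collected above actually work. The sketch below builds a request for the standard Prometheus instant-query API (`/api/v1/query`) with a bearer token. It is an illustration only and not how the recommender itself talks to Prometheus: the function names, the example host, the example token, and the `up` query are all assumptions.

```python
import json
import urllib.parse
import urllib.request


def build_query_request(prometheus_url, token, promql):
    """Build (but do not send) a request for the Prometheus instant-query API."""
    # The route column printed by kubectl is a bare host name, so add a scheme if missing.
    base = prometheus_url if prometheus_url.startswith("http") else "https://" + prometheus_url
    query = urllib.parse.urlencode({"query": promql})
    url = f"{base.rstrip('/')}/api/v1/query?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def run_query(request, timeout=10):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical values; substitute $prometheus_url and $token from your cluster.
request = build_query_request("prometheus.example.com", "example-token", "up")
print(request.full_url)
```

Calling `run_query(request)` against a real endpoint should return a JSON document whose `status` field is `success` when the token is accepted. Clusters with self-signed certificates may additionally need a custom SSL context, which this sketch leaves out.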
You can also provide the input values through command-line arguments by launching the recommender with the -o option:
-o, --options Evaluate command line options
-a APPLICATION, --application APPLICATION
Kubernetes application name
-n NAMESPACE, --namespace NAMESPACE
Kubernetes application namespace
-l LABELS, --labels LABELS
Kubernetes application labels
-p PROMETHEUS_ENDPOINT, --prometheus-endpoint PROMETHEUS_ENDPOINT
Prometheus endpoint URI
-k KUBECONFIG, --kubeconfig KUBECONFIG
Kubeconfig path
-t TOKEN, --token TOKEN
Kubernetes authentication token
-s SCRAPE_DURATION, --scrape-duration SCRAPE_DURATION
Prometheus scrape duration
-i LIBRARY, --library LIBRARY
Chaos library
-L LOG_LEVEL, --log-level LOG_LEVEL
log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
-M MEM [MEM ...], --MEM MEM [MEM ...]
Memory related chaos tests (space separated list)
-C CPU [CPU ...], --CPU CPU [CPU ...]
CPU related chaos tests (space separated list)
-N NETWORK [NETWORK ...], --NETWORK NETWORK [NETWORK ...]
Network related chaos tests (space separated list)
-G GENERIC [GENERIC ...], --GENERIC GENERIC [GENERIC ...]
Generic chaos tests (space separated list)
If you provide input values through command-line arguments, the corresponding config file inputs are ignored.
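One plausible reading of this precedence rule is that each value passed on the command line replaces the config file value of the same name, while everything else falls back to the file. The sketch below illustrates that reading with `argparse` and a small subset of the options listed above. It is not the recommender's actual code: `resolve_settings` is a hypothetical helper, and the tool's real merging logic may differ.

```python
import argparse


def resolve_settings(config, argv):
    """Return config values overridden by any options present in argv."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-a", "--application")
    parser.add_argument("-n", "--namespace")
    parser.add_argument("-p", "--prometheus-endpoint")
    parser.add_argument("-s", "--scrape-duration")
    args = parser.parse_args(argv)
    # Options that were not passed stay None and must not mask the config file.
    overrides = {key: value for key, value in vars(args).items() if value is not None}
    return {**config, **overrides}


# Hypothetical config file contents and command line.
file_config = {"application": "shop", "namespace": "default", "scrape_duration": "1m"}
print(resolve_settings(file_config, ["-n", "prod", "-s", "5m"]))
```

Here `namespace` and `scrape_duration` come from the command line, while `application` keeps its config file value.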
Podman & Docker image
To run the recommender image, please visit [krkn-hub](https://github.com/redhat-chaos/krkn-hub) for further information.
How it works
After obtaining telemetry data, sourced either locally or from Prometheus, the tool analyzes it to detect anomalies. Using the Z-score method and heatmaps, it identifies outliers by evaluating CPU, memory, and network usage against established limits. Services whose Z-scores exceed a specified threshold are categorized as outliers. Each outlier is then classified as network-, CPU-, or memory-sensitive, and the tool recommends the test cases relevant to that class.
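The Z-score step can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in analysis.py: the function names, the service names, the usage figures, and the default threshold of 1.5 are all hypothetical, only unusually high usage is flagged, and the heatmap part of the analysis is not covered.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the Z-score of each value relative to the whole sample."""
    if not values:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All services behave identically, so nothing stands out.
        return [0.0] * len(values)
    return [(value - mu) / sigma for value in values]


def find_outliers(usage_by_service, threshold):
    """Return the services whose usage Z-score exceeds the threshold."""
    services = list(usage_by_service)
    scores = z_scores([usage_by_service[name] for name in services])
    return [name for name, score in zip(services, scores) if score > threshold]


def classify(metrics, threshold=1.5):
    """Map each metric category to the services that are outliers for it."""
    return {category: find_outliers(usage, threshold) for category, usage in metrics.items()}


# Hypothetical per-service usage figures for one scrape window.
example_metrics = {
    "CPU": {"cart": 0.20, "catalog": 0.25, "checkout": 0.22, "payments": 0.21, "search": 0.90},
    "MEM": {"cart": 100, "catalog": 110, "checkout": 105, "payments": 400, "search": 108},
    "NETWORK": {"cart": 5, "catalog": 5, "checkout": 5, "payments": 5, "search": 5},
}

print(classify(example_metrics))
# prints {'CPU': ['search'], 'MEM': ['payments'], 'NETWORK': []}
```

In this example `search` would be treated as CPU-sensitive and `payments` as memory-sensitive. Note that with a population standard deviation the largest possible Z-score among n services is the square root of n - 1, so a small sample needs a correspondingly low threshold for anything to be flagged.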
Customizing Thresholds and Options
You can customize the thresholds and options used for data analysis by modifying the krkn/kraken/chaos_recommender/analysis.py file. For example, you can adjust the threshold for identifying outliers by changing the value of the threshold variable in the identify_outliers function.
Additional Files
config/recommender_config.yaml: The configuration file containing default values for application, namespace, labels, and kubeconfig.
Happy Chaos!