SYN flood scenario (#668)

* scenario config file
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* syn flood plugin
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* run_krkn.py updated
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* requirements.txt + documentation + config.yaml

* set node selector defaults to worker
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

---------

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
Add alerts to alert.yaml
2026-02-19 20:40:33 +00:00 · 2024-07-29 15:31:37 -04:00 · 2024-07-25 10:51:15 -04:00 · 2024-07-22 10:12:14 -04:00 · 2024-07-18 12:56:08 -04:00 · 2024-07-16 18:04:24 +02:00
95 changed files with 2642 additions and 632 deletions
--- a/.github/workflows/docker-image.yml
+++ b/.github/workflows/docker-image.yml
@@ -1,8 +1,7 @@
 name: Docker Image CI
 on:
  push:
-    branches:
-      - main
+    tags: ['v[0-9].[0-9]+.[0-9]+']
  pull_request:

 jobs:
@@ -12,30 +11,43 @@ jobs:
    - name: Check out code
      uses: actions/checkout@v3
    - name: Build the Docker images
+      if: startsWith(github.ref, 'refs/tags')
      run:  |
-        docker build --no-cache -t quay.io/krkn-chaos/krkn containers/
+        docker build --no-cache -t quay.io/krkn-chaos/krkn containers/ --build-arg TAG=${GITHUB_REF#refs/tags/}
        docker tag quay.io/krkn-chaos/krkn quay.io/redhat-chaos/krkn
+        docker tag quay.io/krkn-chaos/krkn quay.io/krkn-chaos/krkn:${GITHUB_REF#refs/tags/}
+        docker tag quay.io/krkn-chaos/krkn quay.io/redhat-chaos/krkn:${GITHUB_REF#refs/tags/}
+
+    - name: Test Build the Docker images
+      if: ${{ github.event_name == 'pull_request' }}
+      run: |
+        docker build --no-cache -t quay.io/krkn-chaos/krkn containers/ --build-arg PR_NUMBER=${{ github.event.pull_request.number }}
    - name: Login in quay
-      if: github.ref == 'refs/heads/main' && github.event_name == 'push'
+      if: startsWith(github.ref, 'refs/tags')
      run: docker login quay.io -u ${QUAY_USER} -p ${QUAY_TOKEN}
      env:
        QUAY_USER: ${{ secrets.QUAY_USERNAME }}
        QUAY_TOKEN: ${{ secrets.QUAY_PASSWORD }}
    - name: Push the KrknChaos Docker images
-      if: github.ref == 'refs/heads/main' && github.event_name == 'push'
-      run: docker push quay.io/krkn-chaos/krkn
+      if: startsWith(github.ref, 'refs/tags')
+      run: |
+        docker push quay.io/krkn-chaos/krkn
+        docker push quay.io/krkn-chaos/krkn:${GITHUB_REF#refs/tags/}
    - name: Login in to redhat-chaos quay
-      if: github.ref == 'refs/heads/main' && github.event_name == 'push'
+      if: startsWith(github.ref, 'refs/tags/v')
      run: docker login quay.io -u ${QUAY_USER} -p ${QUAY_TOKEN}
      env:
        QUAY_USER: ${{ secrets.QUAY_USER_1 }}
        QUAY_TOKEN: ${{ secrets.QUAY_TOKEN_1 }}
    - name: Push the RedHat Chaos Docker images
-      if: github.ref == 'refs/heads/main' && github.event_name == 'push'
-      run: docker push quay.io/redhat-chaos/krkn
+      if: startsWith(github.ref, 'refs/tags')
+      run: | 
+        docker push quay.io/redhat-chaos/krkn
+        docker push quay.io/redhat-chaos/krkn:${GITHUB_REF#refs/tags/}
    - name: Rebuild krkn-hub
-      if: github.ref == 'refs/heads/main' && github.event_name == 'push'
+      if: startsWith(github.ref, 'refs/tags')
      uses: redhat-chaos/actions/krkn-hub@main
      with:
        QUAY_USER: ${{ secrets.QUAY_USERNAME }}
        QUAY_TOKEN: ${{ secrets.QUAY_PASSWORD }}
+        AUTOPUSH: ${{ secrets.AUTOPUSH }}
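The tag-driven release flow in `docker-image.yml` can be summarized as a small Python sketch. It reproduces the `${GITHUB_REF#refs/tags/}` expansion used by the `docker tag` and `docker push` steps and approximates the `v[0-9].[0-9]+.[0-9]+` trigger filter with a regular expression; the function names and the regex translation are assumptions made for illustration, not part of the workflow.

```python
import re
from typing import List, Optional

# Approximation of the trigger filter 'v[0-9].[0-9]+.[0-9]+':
# a single-digit major version, then one or more digits for minor and patch.
RELEASE_TAG = re.compile(r"v[0-9]\.[0-9]+\.[0-9]+")

REPOSITORIES = ("quay.io/krkn-chaos/krkn", "quay.io/redhat-chaos/krkn")


def tag_from_ref(github_ref: str) -> Optional[str]:
    """Python equivalent of ${GITHUB_REF#refs/tags/}, restricted to tag refs."""
    prefix = "refs/tags/"
    if not github_ref.startswith(prefix):
        return None  # branch and pull-request refs carry no release tag
    return github_ref[len(prefix):]


def images_to_push(github_ref: str) -> List[str]:
    """Image references pushed for a release tag: the untagged image plus the version tag."""
    tag = tag_from_ref(github_ref)
    if tag is None or not RELEASE_TAG.fullmatch(tag):
        return []
    images = []
    for repo in REPOSITORIES:
        images.append(repo)  # built as '<repo>', i.e. implicitly ':latest'
        images.append(f"{repo}:{tag}")
    return images


print(images_to_push("refs/tags/v1.6.0"))
# ['quay.io/krkn-chaos/krkn', 'quay.io/krkn-chaos/krkn:v1.6.0',
#  'quay.io/redhat-chaos/krkn', 'quay.io/redhat-chaos/krkn:v1.6.0']
```

Under this reading of GitHub's filter syntax (`[0-9]` matches a single character and `+` repeats the preceding one), a two-digit major version such as `v10.0.0` would not trigger the workflow.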
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -61,6 +61,8 @@ jobs:
          kubectl create namespace namespace-scenario
          kubectl apply -f CI/templates/time_pod.yaml
          kubectl wait --for=condition=ready pod -l scenario=time-skew --timeout=300s
+          kubectl apply -f CI/templates/service_hijacking.yaml
+          kubectl wait --for=condition=ready pod -l "app.kubernetes.io/name=proxy" --timeout=300s
      - name: Get Kind nodes
        run: |
          kubectl get nodes --show-labels=true
@@ -70,12 +72,14 @@ jobs:
        run: python -m coverage run -a -m unittest discover -s tests -v

      - name: Setup Pull Request Functional Tests
-        if: github.event_name == 'pull_request'
+        if: |
+          github.event_name == 'pull_request'
        run: |
            yq -i '.kraken.port="8081"' CI/config/common_test_config.yaml
            yq -i '.kraken.signal_address="0.0.0.0"' CI/config/common_test_config.yaml
            yq -i '.kraken.performance_monitoring="localhost:9090"' CI/config/common_test_config.yaml
-            echo "test_app_outages" > ./CI/tests/functional_tests
+            echo "test_service_hijacking" > ./CI/tests/functional_tests
+            echo "test_app_outages" >> ./CI/tests/functional_tests
            echo "test_container"      >> ./CI/tests/functional_tests
            echo "test_namespace"      >> ./CI/tests/functional_tests
            echo "test_net_chaos"      >> ./CI/tests/functional_tests
@@ -84,7 +88,9 @@ jobs:
            echo "test_arca_memory_hog" >> ./CI/tests/functional_tests
            echo "test_arca_io_hog" >> ./CI/tests/functional_tests

-      # Push on main only steps
+
+      # Push on main only steps + all other functional to collect coverage
+      # for the badge
      - name: Configure AWS Credentials
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        uses: aws-actions/configure-aws-credentials@v4
@@ -101,6 +107,15 @@ jobs:
          yq -i '.telemetry.username="${{secrets.TELEMETRY_USERNAME}}"' CI/config/common_test_config.yaml
          yq -i '.telemetry.password="${{secrets.TELEMETRY_PASSWORD}}"' CI/config/common_test_config.yaml
          echo "test_telemetry" > ./CI/tests/functional_tests
+          echo "test_service_hijacking" >> ./CI/tests/functional_tests
+          echo "test_app_outages" >> ./CI/tests/functional_tests
+          echo "test_container"      >> ./CI/tests/functional_tests
+          echo "test_namespace"      >> ./CI/tests/functional_tests
+          echo "test_net_chaos"      >> ./CI/tests/functional_tests
+          echo "test_time"           >> ./CI/tests/functional_tests
+          echo "test_arca_cpu_hog" >> ./CI/tests/functional_tests
+          echo "test_arca_memory_hog" >> ./CI/tests/functional_tests
+          echo "test_arca_io_hog" >> ./CI/tests/functional_tests

      # Final common steps
      - name: Run Functional tests
@@ -119,6 +134,7 @@ jobs:
      - name: Collect coverage report
        run: |
          python -m coverage html
+          python -m coverage json
      - name: Publish coverage report to job summary
        run: |
          pip install html2text
@@ -129,6 +145,54 @@ jobs:
          name: coverage
          path: htmlcov
          if-no-files-found: error
+      - name: Upload json coverage
+        uses: actions/upload-artifact@v3
+        with:
+          name: coverage.json
+          path: coverage.json
+          if-no-files-found: error
      - name: Check CI results
        run: grep Fail CI/results.markdown && false || true
+  badge:
+    permissions:
+      contents: write
+    name: Generate Coverage Badge
+    runs-on: ubuntu-latest
+    needs:
+      - tests
+    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
+    steps:
+        - name: Check out doc repo
+          uses: actions/checkout@master
+          with:
+            repository: krkn-chaos/krkn-lib-docs
+            path: krkn-lib-docs
+            ssh-key: ${{ secrets.KRKN_LIB_DOCS_PRIV_KEY }}
+        - name: Download json coverage
+          uses: actions/download-artifact@v3
+          with:
+            name: coverage.json
+        - name: Set up Python
+          uses: actions/setup-python@v4
+          with:
+            python-version: 3.9
+        - name: Copy badge on GitHub Page Repo
+          env:
+            COLOR: yellow
+          run: |
+            # generate coverage badge on previously calculated total coverage
+            # and copy in the docs page
+            export TOTAL=$(python -c "import json;print(json.load(open('coverage.json'))['totals']['percent_covered_display'])")
+            [[ $TOTAL -gt 40 ]] && COLOR=green
+            echo "TOTAL: $TOTAL"
+            echo "COLOR: $COLOR"
+            curl "https://img.shields.io/badge/coverage-$TOTAL%25-$COLOR" > ./krkn-lib-docs/coverage_badge_krkn.svg
+        - name: Push updated Coverage Badge
+          run: |
+            cd krkn-lib-docs
+            git add .
+            git config user.name "krkn-chaos"
+            git config user.email "<>"
+            git commit -m "[KRKN] Coverage Badge ${GITHUB_REF##*/}" || echo "no changes to commit"
+            git push
      
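The badge job colors the coverage badge green when total coverage is above 40% and yellow otherwise. That comparison needs to be numeric: inside bash `[[ ]]`, `>` compares strings lexicographically, so `9 > 40` is true and `100 > 40` is false. A minimal Python sketch of the same step, assuming `coverage.json` exposes `totals.percent_covered_display` as the workflow's one-liner expects; the function names are hypothetical.

```python
import json

THRESHOLD = 40  # percent; same threshold as the workflow step


def badge_color(total: float, threshold: float = THRESHOLD) -> str:
    """Green above the threshold, yellow otherwise (the workflow's default COLOR)."""
    return "green" if total > threshold else "yellow"


def badge_url_from_report(report: dict) -> str:
    """Build the shields.io URL from a parsed coverage.json report."""
    total = report["totals"]["percent_covered_display"]  # a string, e.g. "42"
    color = badge_color(float(total))  # numeric, not lexicographic, comparison
    # '%25' is the URL-encoded '%' sign, exactly as in the workflow's curl call
    return f"https://img.shields.io/badge/coverage-{total}%25-{color}"


def badge_url(coverage_json_path: str) -> str:
    """Read coverage.json from disk and return the badge URL."""
    with open(coverage_json_path) as f:
        return badge_url_from_report(json.load(f))
```

After `python -m coverage json`, `badge_url("coverage.json")` would return the shields.io URL that the workflow downloads with `curl`.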
--- a/CI/templates/service_hijacking.yaml
+++ b/CI/templates/service_hijacking.yaml
@@ -0,0 +1,29 @@
+apiVersion: v1
+kind: Pod
+metadata:
+  name: nginx
+  labels:
+    app.kubernetes.io/name: proxy
+spec:
+  containers:
+  - name: nginx
+    image: nginx:stable
+    ports:
+      - containerPort: 80
+        name: http-web-svc
+
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: nginx-service
+spec:
+  selector:
+    app.kubernetes.io/name: proxy
+  type: NodePort
+  ports:
+  - name: name-of-service-port
+    protocol: TCP
+    port: 80
+    targetPort: http-web-svc
+    nodePort: 30036
--- a/CI/tests/common.sh
+++ b/CI/tests/common.sh
@@ -1,7 +1,7 @@
 ERRORED=false

 function finish {
-    if [ $? -eq 1 ] && [ $ERRORED != "true" ]
+    if [ $? != 0 ] && [ $ERRORED != "true" ]
    then
        error
    fi
@@ -13,8 +13,10 @@ function error {
    then
      echo "Error caught."
      ERRORED=true
-    else
-      echo "Exit code greater than zero detected: $exit_code"
+    elif [ $exit_code == 2 ]
+    then
+      echo "Run exited with code 2, which is expected; wrapping the exit code with 0 to avoid a pipeline failure"
+      exit 0
    fi
 }

--- a/CI/tests/test_container.sh
+++ b/CI/tests/test_container.sh
@@ -8,11 +8,11 @@ trap finish EXIT
 pod_file="CI/scenarios/hello_pod.yaml"

 function functional_test_container_crash {
-  yq -i '.scenarios[0].namespace="default"' scenarios/openshift/app_outage.yaml
-  yq -i '.scenarios[0].label_selector="scenario=container"' scenarios/openshift/app_outage.yaml
-  yq -i '.scenarios[0].container_name="fedtools"' scenarios/openshift/app_outage.yaml
+  yq -i '.scenarios[0].namespace="default"' scenarios/openshift/container_etcd.yml
+  yq -i '.scenarios[0].label_selector="scenario=container"' scenarios/openshift/container_etcd.yml
+  yq -i '.scenarios[0].container_name="fedtools"' scenarios/openshift/container_etcd.yml
  export scenario_type="container_scenarios"
-  export scenario_file="- scenarios/openshift/app_outage.yaml"
+  export scenario_file="- scenarios/openshift/container_etcd.yml"
  export post_config=""
  envsubst < CI/config/common_test_config.yaml > CI/config/container_config.yaml

--- a/CI/tests/test_service_hijacking.sh
+++ b/CI/tests/test_service_hijacking.sh
@@ -0,0 +1,107 @@
+set -xeEo pipefail
+
+source CI/tests/common.sh
+
+trap error ERR
+trap finish EXIT
+# port mapping has been configured in kind-config.yml
+SERVICE_URL=http://localhost:8888
+PAYLOAD_GET_1="{ \
+  \"status\":\"internal server error\" \
+}"
+STATUS_CODE_GET_1=500
+
+PAYLOAD_PATCH_1="resource patched"
+STATUS_CODE_PATCH_1=201
+
+PAYLOAD_POST_1="{ \
+  \"status\": \"unauthorized\" \
+}"
+STATUS_CODE_POST_1=401
+
+PAYLOAD_GET_2="{ \
+  \"status\":\"resource created\" \
+}"
+STATUS_CODE_GET_2=201
+
+PAYLOAD_PATCH_2="bad request"
+STATUS_CODE_PATCH_2=400
+
+PAYLOAD_POST_2="not found"
+STATUS_CODE_POST_2=404
+
+JSON_MIME="application/json"
+TEXT_MIME="text/plain; charset=utf-8"
+
+function functional_test_service_hijacking {
+
+  export scenario_type="service_hijacking"
+  export scenario_file="scenarios/kube/service_hijacking.yaml"
+  export post_config=""
+  envsubst < CI/config/common_test_config.yaml > CI/config/service_hijacking.yaml
+  python3 -m coverage run -a run_kraken.py -c CI/config/service_hijacking.yaml  > /dev/null 2>&1 &
+  PID=$!
+  # Waiting for the hijacking to take effect
+  while [ `curl -X GET -s -o /dev/null -I -w "%{http_code}" $SERVICE_URL/list/index.php` == 404 ]; do echo "waiting scenario to kick in."; sleep 1; done;
+
+  #Checking Step 1 GET on /list/index.php
+  OUT_GET="`curl -X GET -s $SERVICE_URL/list/index.php`"
+  OUT_CONTENT=`curl -X GET -s -o /dev/null -I -w "%{content_type}" $SERVICE_URL/list/index.php`
+  OUT_STATUS_CODE=`curl -X GET -s -o /dev/null -I -w "%{http_code}" $SERVICE_URL/list/index.php`
+  [ "${PAYLOAD_GET_1//[$'\t\r\n ']}" == "${OUT_GET//[$'\t\r\n ']}" ] && echo "Step 1 GET Payload OK" || (echo "Payload did not match. Test failed." && exit 1)
+  [ "$OUT_STATUS_CODE" == "$STATUS_CODE_GET_1" ] && echo "Step 1 GET Status Code OK" || (echo " Step 1 GET status code did not match. Test failed." && exit 1)
+  [ "$OUT_CONTENT" == "$JSON_MIME" ] && echo "Step 1 GET MIME OK" || (echo " Step 1 GET MIME did not match. Test failed." && exit 1)
+
+  #Checking Step 1 POST on /list/index.php
+  OUT_POST="`curl -s -X POST $SERVICE_URL/list/index.php`"
+  OUT_STATUS_CODE=`curl -X POST -s -o /dev/null -I -w "%{http_code}" $SERVICE_URL/list/index.php`
+  OUT_CONTENT=`curl -X POST -s -o /dev/null -I -w "%{content_type}" $SERVICE_URL/list/index.php`
+  [ "${PAYLOAD_POST_1//[$'\t\r\n ']}" == "${OUT_POST//[$'\t\r\n ']}" ] && echo "Step 1 POST Payload OK" || (echo "Payload did not match. Test failed." && exit 1)
+  [ "$OUT_STATUS_CODE" == "$STATUS_CODE_POST_1" ] && echo "Step 1 POST Status Code OK" || (echo "Step 1 POST status code did not match. Test failed." && exit 1)
+  [ "$OUT_CONTENT" == "$JSON_MIME" ] && echo "Step 1 POST MIME OK" || (echo " Step 1 POST MIME did not match. Test failed." && exit 1)
+
+  #Checking Step 1 PATCH on /patch
+  OUT_PATCH="`curl -s -X PATCH $SERVICE_URL/patch`"
+  OUT_STATUS_CODE=`curl -X PATCH -s -o /dev/null -I -w "%{http_code}" $SERVICE_URL/patch`
+  OUT_CONTENT=`curl -X PATCH -s -o /dev/null -I -w "%{content_type}" $SERVICE_URL/patch`
+  [ "${PAYLOAD_PATCH_1//[$'\t\r\n ']}" == "${OUT_PATCH//[$'\t\r\n ']}" ] && echo "Step 1 PATCH Payload OK" || (echo "Payload did not match. Test failed." && exit 1)
+  [ "$OUT_STATUS_CODE" == "$STATUS_CODE_PATCH_1" ] && echo "Step 1 PATCH Status Code OK" || (echo "Step 1 PATCH status code did not match. Test failed." && exit 1)
+  [ "$OUT_CONTENT" == "$TEXT_MIME" ] && echo "Step 1 PATCH MIME OK" || (echo " Step 1 PATCH MIME did not match. Test failed." && exit 1)
+  # wait for the next step
+  sleep 16
+
+  #Checking Step 2 GET on /list/index.php
+  OUT_GET="`curl -X GET -s $SERVICE_URL/list/index.php`"
+  OUT_CONTENT=`curl -X GET -s -o /dev/null -I -w "%{content_type}" $SERVICE_URL/list/index.php`
+  OUT_STATUS_CODE=`curl -X GET -s -o /dev/null -I -w "%{http_code}" $SERVICE_URL/list/index.php`
+  [ "${PAYLOAD_GET_2//[$'\t\r\n ']}" == "${OUT_GET//[$'\t\r\n ']}" ] && echo "Step 2 GET Payload OK" || (echo "Step 2 GET Payload did not match. Test failed." && exit 1)
+  [ "$OUT_STATUS_CODE" == "$STATUS_CODE_GET_2" ] && echo "Step 2 GET Status Code OK" || (echo "Step 2 GET status code did not match. Test failed." && exit 1)
+  [ "$OUT_CONTENT" == "$JSON_MIME" ] && echo "Step 2 GET MIME OK" || (echo " Step 2 GET MIME did not match. Test failed." && exit 1)
+
+  #Checking Step 2 POST on /list/index.php
+  OUT_POST="`curl -s -X POST $SERVICE_URL/list/index.php`"
+  OUT_CONTENT=`curl -X POST -s -o /dev/null -I -w "%{content_type}" $SERVICE_URL/list/index.php`
+  OUT_STATUS_CODE=`curl -X POST -s -o /dev/null -I -w "%{http_code}" $SERVICE_URL/list/index.php`
+  [ "${PAYLOAD_POST_2//[$'\t\r\n ']}" == "${OUT_POST//[$'\t\r\n ']}" ] && echo "Step 2 POST Payload OK" || (echo "Step 2 POST Payload did not match. Test failed." && exit 1)
+  [ "$OUT_STATUS_CODE" == "$STATUS_CODE_POST_2" ] && echo "Step 2 POST Status Code OK" || (echo "Step 2 POST status code did not match. Test failed." && exit 1)
+  [ "$OUT_CONTENT" == "$TEXT_MIME" ] && echo "Step 2 POST MIME OK" || (echo " Step 2 POST MIME did not match. Test failed." && exit 1)
+
+  #Checking Step 2 PATCH on /patch
+  OUT_PATCH="`curl -s -X PATCH $SERVICE_URL/patch`"
+  OUT_CONTENT=`curl -X PATCH -s -o /dev/null -I -w "%{content_type}" $SERVICE_URL/patch`
+  OUT_STATUS_CODE=`curl -X PATCH -s -o /dev/null -I -w "%{http_code}" $SERVICE_URL/patch`
+  [ "${PAYLOAD_PATCH_2//[$'\t\r\n ']}" == "${OUT_PATCH//[$'\t\r\n ']}" ] && echo "Step 2 PATCH Payload OK" || (echo "Step 2 PATCH Payload did not match. Test failed." && exit 1)
+  [ "$OUT_STATUS_CODE" == "$STATUS_CODE_PATCH_2" ] && echo "Step 2 PATCH Status Code OK" || (echo "Step 2 PATCH status code did not match. Test failed." && exit 1)
+  [ "$OUT_CONTENT" == "$TEXT_MIME" ] && echo "Step 2 PATCH MIME OK" || (echo " Step 2 PATCH MIME did not match. Test failed." && exit 1)
+  wait $PID
+
+  # now checking if the service has been restored correctly and nginx responds correctly
+  curl -s  $SERVICE_URL | grep nginx! && echo "BODY: Service restored!" || (echo "BODY: failed to restore service" && exit 1)
+  OUT_STATUS_CODE=`curl -X GET -s -o /dev/null -I -w "%{http_code}" $SERVICE_URL`
+  [ "$OUT_STATUS_CODE" == "200" ] && echo "STATUS_CODE: Service restored!" || (echo "STATUS_CODE: failed to restore service" && exit 1)
+
+  echo "Service Hijacking Chaos test: Success"
+}
+
+
+functional_test_service_hijacking
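Each step of the service hijacking test issues three separate `curl` calls to read the body, the status code, and the `Content-Type`, then compares payloads after stripping tabs, carriage returns, newlines, and spaces. A Python sketch that gets all three values from a single request using only the standard library; the expected values are modeled on the script, while `Response`, `fetch`, and `check_step` are hypothetical names and the sketch is not part of the PR.

```python
import re
import urllib.error
import urllib.request
from typing import List, NamedTuple

WHITESPACE = re.compile(r"[\t\r\n ]")  # same character set as ${VAR//[$'\t\r\n ']}


class Response(NamedTuple):
    status: int
    content_type: str
    body: str


def fetch(url: str, method: str = "GET") -> Response:
    """Issue one request; 4xx/5xx responses are returned instead of raised."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=5) as resp:
            return Response(resp.status, resp.headers.get("Content-Type", ""), resp.read().decode())
    except urllib.error.HTTPError as err:
        # HTTPError carries the status, headers, and body of the error response
        return Response(err.code, err.headers.get("Content-Type", ""), err.read().decode())


def check_step(resp: Response, payload: str, status: int, mime: str) -> List[str]:
    """Return a list of mismatch descriptions (empty when the step passed)."""
    problems = []
    if WHITESPACE.sub("", resp.body) != WHITESPACE.sub("", payload):
        problems.append("payload")
    if resp.status != status:
        problems.append(f"status {resp.status} != {status}")
    if resp.content_type != mime:
        problems.append(f"mime {resp.content_type!r} != {mime!r}")
    return problems
```

During step 1, `check_step(fetch("http://localhost:8888/list/index.php"), '{"status":"internal server error"}', 500, "application/json")` should return an empty list.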
--- a/CI/tests/test_telemetry.sh
+++ b/CI/tests/test_telemetry.sh
@@ -22,14 +22,14 @@ function functional_test_telemetry {
  export scenario_file="scenarios/arcaflow/cpu-hog/input.yaml"
  export post_config=""
  envsubst < CI/config/common_test_config.yaml > CI/config/telemetry.yaml
-  python3 -m coverage run -a run_kraken.py -c CI/config/telemetry.yaml
+  retval=$(python3 -m coverage run -a run_kraken.py -c CI/config/telemetry.yaml)
  RUN_FOLDER=`cat CI/out/test_telemetry.out | grep amazonaws.com | sed -rn "s#.*https:\/\/.*\/files/(.*)#\1#p"`
  $AWS_CLI s3 ls "s3://$AWS_BUCKET/$RUN_FOLDER/" | awk '{ print $4 }' > s3_remote_files
  echo "checking if telemetry files are uploaded on s3"
  cat s3_remote_files | grep events-00.json || ( echo "FAILED: events-00.json not uploaded" && exit 1 )
-  cat s3_remote_files | grep critical-alerts-00.json || ( echo "FAILED: critical-alerts-00.json not uploaded" && exit 1 )
-  cat s3_remote_files | grep prometheus-00.tar || ( echo "FAILED: prometheus backup not uploaded" && exit 1 )
-  cat s3_remote_files | grep telemetry.json || ( echo "FAILED: telemetry.json not uploaded" && exit 1 )
+  cat s3_remote_files | grep critical-alerts-00.log || ( echo "FAILED: critical-alerts-00.log not uploaded"  && exit 1 )
+  cat s3_remote_files | grep prometheus-00.tar || ( echo "FAILED: prometheus backup not uploaded"  && exit 1 )
+  cat s3_remote_files | grep telemetry.json || ( echo "FAILED: telemetry.json not uploaded"  && exit 1 )
  echo "all files uploaded!"
  echo "Telemetry Collection: Success"
 }
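The telemetry test derives the S3 run folder from the log line containing `amazonaws.com` with `sed -rn "s#.*https:\/\/.*\/files/(.*)#\1#p"` and then greps the bucket listing for four expected files. A Python sketch of both steps, assuming the log and the `aws s3 ls` listing are available as lists of lines; `run_folder` and `missing_files` are hypothetical names, and `run_folder` returns only the first matching line where the shell pipeline would print every match.

```python
import re
from typing import List, Optional

# The greedy '.*' before '/files/' keeps everything after the last '/files/',
# which is what the sed expression prints.
RUN_FOLDER = re.compile(r".*https://.*/files/(.*)")

EXPECTED_FILES = (
    "events-00.json",
    "critical-alerts-00.log",
    "prometheus-00.tar",
    "telemetry.json",
)


def run_folder(log_lines: List[str]) -> Optional[str]:
    """Extract the run folder from the first amazonaws.com URL in the log."""
    for line in log_lines:
        if "amazonaws.com" not in line:
            continue
        match = RUN_FOLDER.search(line)
        if match:
            return match.group(1)
    return None


def missing_files(remote_listing: List[str]) -> List[str]:
    """Expected files with no listing entry containing them (grep-style substring match)."""
    return [name for name in EXPECTED_FILES if not any(name in entry for entry in remote_listing)]
```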
--- a/README.md
+++ b/README.md
@@ -1,5 +1,7 @@
 # Krkn aka Kraken
 ![Workflow-Status](https://github.com/krkn-chaos/krkn/actions/workflows/docker-image.yml/badge.svg)
+![coverage](https://krkn-chaos.github.io/krkn-lib-docs/coverage_badge_krkn.svg)
+![action](https://github.com/krkn-chaos/krkn/actions/workflows/tests.yml/badge.svg)

 ![Krkn logo](media/logo.png)

@@ -39,18 +41,6 @@ After installation, refer back to the below sections for supported scenarios and
 #### Running Kraken with minimal configuration tweaks
 For cases where you want to run Kraken with minimal configuration changes, refer to [krkn-hub](https://github.com/krkn-chaos/krkn-hub). One use case is CI integration where you do not want to carry around different configuration files for the scenarios.

-### Setting up infrastructure dependencies
-Kraken indexes the metrics specified in the profile into Elasticsearch in addition to leveraging Cerberus for understanding the health of the Kubernetes cluster under test. More information on the features is documented below. The infrastructure pieces can be easily installed and uninstalled by running:
-
-```
-$ cd kraken
-$ podman-compose up or $ docker-compose up      # Spins up the containers specified in the docker-compose.yml file present in the run directory.
-$ podman-compose down or $ docker-compose down  # Delete the containers installed.
-```
-This will manage the Cerberus and Elasticsearch containers on the host on which you are running Kraken.
-
-**NOTE**: Make sure you have enough resources (memory and disk) on the machine on top of which the containers are running as Elasticsearch is resource intensive. Cerberus monitors the system components by default, the [config](config/cerberus.yaml) can be tweaked to add applications namespaces, routes and other components to monitor as well. The command will keep running until killed since detached mode is not supported as of now.
-

 ### Config
 Instructions on how to setup the config and the options supported can be found at [Config](docs/config.md).
@@ -73,6 +63,8 @@ Scenario type               | Kubernetes
 [PVC scenario](docs/pvc_scenario.md) | :heavy_check_mark: |
 [Network_Chaos](docs/network_chaos.md) | :heavy_check_mark: |
 [ManagedCluster Scenarios](docs/managedcluster_scenarios.md) | :heavy_check_mark: |
+[Service Hijacking Scenarios](docs/service_hijacking_scenarios.md) | :heavy_check_mark: |
+[SYN Flood Scenarios](docs/syn_flood_scenarios.md) | :heavy_check_mark: |


 ### Kraken scenario pass/fail criteria and report
--- a/config/alerts.yaml
+++ b/config/alerts.yaml
@@ -88,3 +88,42 @@
 - expr: ALERTS{severity="critical", alertstate="firing"} > 0
  description: Critical prometheus alert. {{$labels.alertname}}
  severity: warning
+
+# etcd CPU usage increase
+- expr: sum(rate(container_cpu_usage_seconds_total{image!='', namespace='openshift-etcd', container='etcd'}[1m])) * 100 / sum(machine_cpu_cores)  > 5
+  description: Etcd CPU usage increased significantly
+  severity: warning
+
+# etcd memory usage increase
+- expr: sum(deriv(container_memory_usage_bytes{image!='', namespace='openshift-etcd', container='etcd'}[5m])) * 100 / sum(node_memory_MemTotal_bytes) > 5
+  description: Etcd memory usage increased significantly
+  severity: warning
+
+# Openshift API server CPU and memory usage increase
+- expr: sum(rate(container_cpu_usage_seconds_total{image!='', namespace='openshift-apiserver', container='openshift-apiserver'}[1m])) * 100 / sum(machine_cpu_cores) > 5
+  description: openshift apiserver cpu usage increased significantly
+  severity: warning
+
+- expr: (sum(deriv(container_memory_usage_bytes{namespace='openshift-apiserver', container='openshift-apiserver'}[5m]))) * 100 / sum(node_memory_MemTotal_bytes) > 5
+  description: openshift apiserver memory usage increased significantly
+  severity: warning
+
+# Openshift kube API server CPU and memory usage increase
+- expr: sum(rate(container_cpu_usage_seconds_total{image!='', namespace='openshift-kube-apiserver', container='kube-apiserver'}[1m])) * 100 / sum(machine_cpu_cores) > 5
+  description: openshift kube-apiserver cpu usage increased significantly
+  severity: warning
+
+- expr: (sum(deriv(container_memory_usage_bytes{namespace='openshift-kube-apiserver', container='kube-apiserver'}[5m]))) * 100 / sum(node_memory_MemTotal_bytes) > 5
+  description: openshift kube-apiserver memory usage increased significantly
+  severity: warning
+
+# Master node CPU usage increase
+- expr: (sum((sum(deriv(pod:container_cpu_usage:sum{container="",pod!=""}[5m])) BY (namespace, pod) * on(pod, namespace) group_left(node) (node_namespace_pod:kube_pod_info:)  )  *  on(node) group_left(role) (max by (node) (kube_node_role{role="master"})))) * 100 / sum(machine_cpu_cores) > 5
+  description: master nodes cpu usage increased significantly
+  severity: warning
+
+# Master nodes memory usage increase
+- expr: (sum((sum(deriv(container_memory_usage_bytes{container="",pod!=""}[5m])) BY (namespace, pod) * on(pod, namespace) group_left(node) (node_namespace_pod:kube_pod_info:)  )  *  on(node) group_left(role) (max by (node) (kube_node_role{role="master"})))) * 100 / sum(node_memory_MemTotal_bytes) > 5
+  description: master nodes memory usage increased significantly
+  severity: warning
+
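Each entry in the alerts profile pairs a PromQL `expr` with a `description` and a `severity`. Because every new expression ends in a comparison without the `bool` modifier (for example `> 5`), Prometheus drops the series that fail the comparison, so a non-empty instant-query result means the condition currently holds. A Python sketch of that evaluation against the Prometheus HTTP API (`GET /api/v1/query`); the endpoint, the bearer-token handling, and the function names are assumptions, and this is not krkn's own implementation.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List


def instant_query(base_url: str, expr: str, token: str = "") -> List[dict]:
    """Run a Prometheus instant query and return the 'result' vector."""
    url = f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    request = urllib.request.Request(url)
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=30) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    return payload["data"]["result"]


def triggered(profile: List[Dict[str, str]], query: Callable[[str], List[dict]]) -> List[str]:
    """Return '<severity>: <description>' for each alert whose expression returns samples."""
    messages = []
    for alert in profile:
        if query(alert["expr"]):  # non-empty vector: the comparison held for some series
            messages.append(f"{alert['severity']}: {alert['description']}")
    return messages
```

With PyYAML installed, the profile could be loaded with `yaml.safe_load(open("config/alerts.yaml"))` and passed to `triggered` together with `lambda expr: instant_query(prometheus_url, expr, token)`. The sketch leaves templates such as `{{$labels.alertname}}` unrendered.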
--- a/config/config.yaml
+++ b/config/config.yaml
@@ -42,6 +42,10 @@ kraken:
            - scenarios/openshift/pvc_scenario.yaml
        - network_chaos:
            - scenarios/openshift/network_chaos.yaml
+        - service_hijacking:
+              - scenarios/kube/service_hijacking.yaml
+        - syn_flood:
+              - scenarios/kube/syn_flood.yaml

 cerberus:
    cerberus_enabled: False                                # Enable it when cerberus is previously installed
--- a/config/recommender_config.yaml
+++ b/config/recommender_config.yaml
@@ -1,5 +1,5 @@
 application: openshift-etcd
-namespace: openshift-etcd
+namespaces: openshift-etcd
 labels: app=openshift-etcd
 kubeconfig: ~/.kube/config.yaml
 prometheus_endpoint: <Prometheus_Endpoint>
--- a/containers/Dockerfile
+++ b/containers/Dockerfile
@@ -1,28 +1,54 @@
-# Dockerfile for kraken
-
-FROM mcr.microsoft.com/azure-cli:latest as azure-cli
-
-FROM registry.access.redhat.com/ubi8/ubi:latest
-
-ENV KUBECONFIG /root/.kube/config
-
-# Copy azure client binary from azure-cli image
-COPY --from=azure-cli /usr/local/bin/az /usr/bin/az
-
-# Install dependencies
-RUN yum install -y git python39 python3-pip jq gettext wget && \
-    python3.9 -m pip install -U pip && \
-    git clone https://github.com/krkn-chaos/krkn.git --branch v1.5.10 /root/kraken && \
-    mkdir -p /root/.kube && cd /root/kraken && \
-    pip3.9 install -r requirements.txt && \
-    pip3.9 install virtualenv && \
-    wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O /usr/bin/yq && chmod +x /usr/bin/yq
-
-# Get Kubernetes and OpenShift clients from stable releases
+# oc build
+FROM golang:1.22.4 AS oc-build
+RUN apt-get update && apt-get install -y --no-install-recommends libkrb5-dev
 WORKDIR /tmp
-RUN wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz && tar -xvf openshift-client-linux.tar.gz && cp oc /usr/local/bin/oc && cp oc /usr/bin/oc && cp kubectl /usr/local/bin/kubectl && cp kubectl /usr/bin/kubectl
+RUN git clone --branch release-4.18 https://github.com/openshift/oc.git
+WORKDIR /tmp/oc
+RUN go mod edit -go 1.22.3 &&\
+    go get github.com/moby/buildkit@v0.12.5 &&\
+    go get github.com/containerd/containerd@v1.7.11&&\
+    go get github.com/docker/docker@v25.0.5&&\
+    go mod tidy && go mod vendor
+RUN make GO_REQUIRED_MIN_VERSION:= oc

-WORKDIR /root/kraken
+FROM fedora:40
+ARG PR_NUMBER
+ARG TAG
+RUN groupadd -g 1001 krkn && useradd -m -u 1001 -g krkn krkn
+RUN dnf update -y

+ENV KUBECONFIG /home/krkn/.kube/config
+
+# install kubectl
+RUN curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" &&\
+    cp kubectl /usr/local/bin/kubectl && chmod +x /usr/local/bin/kubectl &&\
+    cp kubectl /usr/bin/kubectl && chmod +x /usr/bin/kubectl
+
+# install runtime dependencies
+RUN dnf update && dnf install -y --setopt=install_weak_deps=False \
+    git python39 jq yq gettext wget which &&\
+    dnf clean all
+
+# copy oc client binary from oc-build image
+COPY --from=oc-build /tmp/oc/oc /usr/bin/oc
+
+# krkn build
+RUN git clone https://github.com/krkn-chaos/krkn.git /home/krkn/kraken && \
+    mkdir -p /home/krkn/.kube
+
+WORKDIR /home/krkn/kraken
+
+# default behaviour will be to build main
+# if it is a PR trigger the PR itself will be checked out
+RUN if [ -n "$PR_NUMBER" ]; then git fetch origin pull/${PR_NUMBER}/head:pr-${PR_NUMBER} && git checkout pr-${PR_NUMBER};fi
+# if it is a TAG trigger checkout the tag
+RUN if [ -n "$TAG" ]; then git checkout "$TAG";fi
+
+RUN python3.9 -m ensurepip
+RUN pip3.9 install -r requirements.txt
+RUN pip3.9 install jsonschema
+
+RUN chown -R krkn:krkn /home/krkn && chmod 755 /home/krkn
+USER krkn
 ENTRYPOINT ["python3.9", "run_kraken.py"]
 CMD ["--config=config/config.yaml"]
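The `PR_NUMBER` and `TAG` build arguments decide which revision of the freshly cloned `main` branch ends up in the image. A Python sketch of that decision, returning the `git` commands the two `RUN` steps would execute; `checkout_commands` is a hypothetical name. Because the two steps are independent, a build given both arguments would end on the tag.

```python
from typing import List, Optional


def checkout_commands(pr_number: Optional[str] = None, tag: Optional[str] = None) -> List[List[str]]:
    """git commands run after cloning main, mirroring the two conditional RUN steps."""
    commands = []
    if pr_number:  # mirrors: if [ -n "$PR_NUMBER" ]
        branch = f"pr-{pr_number}"
        commands.append(["git", "fetch", "origin", f"pull/{pr_number}/head:{branch}"])
        commands.append(["git", "checkout", branch])
    if tag:  # mirrors: if [ -n "$TAG" ]; runs after the PR step, so it wins if both are set
        commands.append(["git", "checkout", tag])
    return commands
```

Each command could be executed with `subprocess.run(cmd, check=True, cwd="/home/krkn/kraken")`; with no build arguments the list is empty and the image keeps `main`.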
--- a/containers/Dockerfile-ppc64le
+++ b/containers/Dockerfile-ppc64le
@@ -1,29 +0,0 @@
-# Dockerfile for kraken
-
-FROM ppc64le/centos:8
-
-FROM mcr.microsoft.com/azure-cli:latest as azure-cli
-
-LABEL org.opencontainers.image.authors="Red Hat OpenShift Chaos Engineering"
-
-ENV KUBECONFIG /root/.kube/config
-
-# Copy azure client binary from azure-cli image
-COPY --from=azure-cli /usr/local/bin/az /usr/bin/az
-
-# Install dependencies
-RUN yum install -y git python39 python3-pip jq gettext wget && \
-    python3.9 -m pip install -U pip && \
-    git clone https://github.com/redhat-chaos/krkn.git --branch v1.5.10 /root/kraken && \
-    mkdir -p /root/.kube && cd /root/kraken && \
-    pip3.9 install -r requirements.txt && \
-    pip3.9 install virtualenv && \
-    wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O /usr/bin/yq && chmod +x /usr/bin/yq
-
-# Get Kubernetes and OpenShift clients from stable releases
-WORKDIR /tmp
-RUN wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz && tar -xvf openshift-client-linux.tar.gz && cp oc /usr/local/bin/oc && cp oc /usr/bin/oc && cp kubectl /usr/local/bin/kubectl && cp kubectl /usr/bin/kubectl
-
-WORKDIR /root/kraken
-
-ENTRYPOINT python3.9 run_kraken.py --config=config/config.yaml
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -1,31 +0,0 @@
-version: "3"
-services:
-  elastic:
-    image: docker.elastic.co/elasticsearch/elasticsearch:7.13.2
-    deploy:
-      replicas: 1
-      restart_policy:
-        condition: on-failure
-    network_mode: host
-    environment:
-      discovery.type: single-node
-  kibana:
-    image: docker.elastic.co/kibana/kibana:7.13.2
-    deploy:
-      replicas: 1
-      restart_policy:
-        condition: on-failure
-    network_mode: host
-    environment:
-      ELASTICSEARCH_HOSTS: "http://0.0.0.0:9200"
-  cerberus:
-    image: quay.io/openshift-scale/cerberus:latest
-    privileged: true
-    deploy:
-      replicas: 1
-      restart_policy:
-        condition: on-failure
-    network_mode: host
-    volumes:
-       - ./config/cerberus.yaml:/root/cerberus/config/config.yaml:Z  # Modify the config in case of the need to monitor additional components
-       - ${HOME}/.kube/config:/root/.kube/config:Z
--- a/docs/cloud_setup.md
+++ b/docs/cloud_setup.md
@@ -27,14 +27,12 @@ After creating the service account you will need to enable the account using the

 ## Azure

-**NOTE**: For Azure node killing scenarios, make sure [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) is installed.
-
-You will also need to create a service principal and give it the correct access, see [here](https://docs.openshift.com/container-platform/4.5/installing/installing_azure/installing-azure-account.html) for creating the service principal and setting the proper permissions.
+**NOTE**: You will need to create a service principal and grant it the correct access; see [here](https://docs.openshift.com/container-platform/4.5/installing/installing_azure/installing-azure-account.html) for how to create the service principal and set the proper permissions.

 To properly run the service principal requires “Azure Active Directory Graph/Application.ReadWrite.OwnedBy” api permission granted and “User Access Administrator”.

 Before running you will need to set the following:
-1. Login using ```az login```
+1. ```export AZURE_SUBSCRIPTION_ID=<subscription_id>```

 2. ```export AZURE_TENANT_ID=<tenant_id>```

--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -14,11 +14,7 @@ For example, for adding a pod level scenario for a new application, refer to the
    namespace_pattern: ^<namespace>$
    label_selector: <pod label>
    kill: <number of pods to kill>
- id: wait-for-pods
-  config:
-    namespace_pattern: ^<namespace>$
-    label_selector: <pod label>
-    count: <expected number of pods that match namespace and label>
+    krkn_pod_recovery_time: <expected time for the pod to become ready>
 ```

 #### Node Scenario Yaml Template
--- a/docs/pod_scenarios.md
+++ b/docs/pod_scenarios.md
@@ -17,11 +17,8 @@ You can then create the scenario file with the following contents:
  config:
    namespace_pattern: ^kube-system$
    label_selector: k8s-app=kube-scheduler
- id: wait-for-pods
-  config:
-    namespace_pattern: ^kube-system$
-    label_selector: k8s-app=kube-scheduler
-    count: 3
+    krkn_pod_recovery_time: 120
+    
 ```

 Please adjust the schema reference to point to the [schema file](../scenarios/plugin.schema.json). This file will give you code completion and documentation for the available options in your IDE.
--- a/docs/service_hijacking_scenarios.md
+++ b/docs/service_hijacking_scenarios.md
@@ -0,0 +1,80 @@
+### Service Hijacking Scenarios
+
+Service Hijacking Scenarios aim to simulate fake HTTP responses from a workload targeted by a 
+`Service` already deployed in the cluster. 
+This scenario is executed by deploying a custom-made web service and modifying the target `Service`
+selector to direct traffic to this web service for a specified duration.
+
+The web service's source code is available [here](https://github.com/krkn-chaos/krkn-service-hijacking). 
+It employs a time-based test plan from the scenario configuration file, which specifies the behavior of resources during the chaos scenario as follows:
+
+```yaml
+service_target_port: http-web-svc # The port of the service to be hijacked (can be named or numeric, based on the workload and service configuration).
+service_name: nginx-service # The name of the service that will be hijacked.
+service_namespace: default # The namespace where the target service is located.
+image: quay.io/krkn-chaos/krkn-service-hijacking:v0.1.3 # Image of the krkn web service to be deployed to receive traffic.
+chaos_duration: 30 # Total duration of the chaos scenario in seconds.
+plan:
+  - resource: "/list/index.php" # Specifies the resource or path to respond to in the scenario. For paths, both the path and query parameters are captured but ignored. For resources, only query parameters are captured.
+
+    steps:                      # A time-based plan consisting of steps can be defined for each resource.
+      GET:                      # One or more HTTP methods can be specified for each step. Note: Non-standard methods are supported for fully custom web services (e.g., using NONEXISTENT instead of POST).
+
+        - duration: 15          # Duration in seconds for this step before moving to the next one, if defined. Otherwise, this step will continue until the chaos scenario ends.
+
+          status: 500           # HTTP status code to be returned in this step.
+          mime_type: "application/json" # MIME type of the response for this step.
+          payload: |            # The response payload for this step.
+            {
+              "status":"internal server error"
+            }
+        - duration: 15
+          status: 201
+          mime_type: "application/json"
+          payload: |
+            {
+              "status":"resource created"
+            }
+      POST:
+        - duration: 15
+          status: 401
+          mime_type: "application/json"
+          payload: |
+            {
+               "status": "unauthorized"
+            }
+        - duration: 15
+          status: 404
+          mime_type: "text/plain"
+          payload: "not found"
+
+
+```
+The scenario targets the `service_name` Service in `service_namespace` and replaces its selector with a randomly generated one,
+which is also added as a label in the mock web service manifest.
+This allows multiple scenarios to run in the same namespace, each targeting a different service, without
+causing conflicts.
+
+The newly deployed mock web service will expose a `service_target_port`, 
+which can be either a named or numeric port based on the service configuration. 
+This ensures that the Service correctly routes HTTP traffic to the mock web service during the chaos run.
+
+Each step lasts `duration` seconds, with the timeline starting when the mock web service is deployed in the cluster.
+For each HTTP resource, defined by a `resource` entry of the `plan` list
+(either a specific resource, e.g. `/list/index.php`, or a path-based resource typical of MVC frameworks),
+one or more HTTP request methods can be specified. Both standard and custom request methods are supported.
+
+During this time frame, the web service will respond with:
+
+- `status`: The [HTTP status code](https://datatracker.ietf.org/doc/html/rfc7231#section-6) (can be standard or custom).
+- `mime_type`: The [MIME type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types) (can be standard or custom).
+- `payload`: The response body to be returned to the client.
+
+At the end of the step `duration`, the web service will proceed to the next step (if available) until 
+the global `chaos_duration` concludes. At this point, the original service will be restored, 
+and the custom web service and its resources will be undeployed.
+
+__NOTE__: Some clients (e.g., cURL, jQuery) may issue lightweight requests (such as HEAD or OPTIONS)
+to probe API behavior. If these methods are not defined in the test plan, the web service may respond with
+a `405` or `404` status code. If you see unexpected responses, check whether the client is sending such requests.
+
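The time-based plan described above can be sketched in Python as follows. This is not the krkn-service-hijacking implementation: the names `PLAN`, `active_step` and `resolve` are hypothetical, step timing is assumed to be measured from the moment the mock service is deployed, and the `405`/`404` fallbacks for undefined methods or resources, as well as the behavior once every timed step of a method has expired, are assumptions based on the note above.

```python
from typing import Optional

# Hypothetical plan, mirroring the YAML structure of the example scenario.
PLAN = [
    {
        "resource": "/list/index.php",
        "steps": {
            "GET": [
                {"duration": 15, "status": 500, "mime_type": "application/json",
                 "payload": '{"status":"internal server error"}'},
                {"duration": 15, "status": 201, "mime_type": "application/json",
                 "payload": '{"status":"resource created"}'},
            ],
            "POST": [
                {"duration": 15, "status": 401, "mime_type": "application/json",
                 "payload": '{"status": "unauthorized"}'},
                {"duration": 15, "status": 404, "mime_type": "text/plain",
                 "payload": "not found"},
            ],
        },
    }
]


def active_step(steps: list[dict], elapsed: float) -> Optional[dict]:
    """Return the step active `elapsed` seconds after the mock service was deployed."""
    start = 0.0
    for step in steps:
        duration = step.get("duration")
        if duration is None:
            # No duration: this step lasts until the chaos scenario ends.
            return step
        if elapsed < start + duration:
            return step
        start += duration
    # Every timed step has expired; this sketch makes no assumption about what follows.
    return None


def resolve(plan: list[dict], resource: str, method: str,
            elapsed: float) -> tuple[int, str, str]:
    """Pick (status, mime_type, payload) for a request; the fallbacks are assumptions."""
    for entry in plan:
        if entry["resource"] == resource:
            steps = entry["steps"].get(method)
            if steps is None:
                # Method (e.g. HEAD or OPTIONS) not in the plan: assumed 405.
                return 405, "text/plain", "method not allowed"
            step = active_step(steps, elapsed)
            if step is None:
                # Assumed 404 once the timed steps are exhausted.
                return 404, "text/plain", "not found"
            return step["status"], step["mime_type"], step["payload"]
    # Resource not in the plan: assumed 404.
    return 404, "text/plain", "not found"
```

For example, a `GET /list/index.php` arriving 20 seconds into the run falls in the second `GET` step and gets the `201` response, while a `HEAD` request to the same resource hits the undefined-method fallback.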
--- a/docs/syn_flood_scenarios.md
+++ b/docs/syn_flood_scenarios.md
@@ -0,0 +1,33 @@
+### SYN Flood Scenarios
+
+This scenario generates a substantial amount of TCP traffic directed at one or more Kubernetes services within 
+the cluster to test the server's resiliency under extreme traffic conditions. 
+It can also target hosts outside the cluster by specifying a reachable IP address or hostname. 
+This scenario leverages the distributed nature of Kubernetes clusters to instantiate multiple instances 
+of the same pod against a single host, significantly increasing the effectiveness of the attack. 
+The configuration also allows for the specification of multiple node selectors, enabling Kubernetes to schedule 
+the attacker pods on a user-defined subset of nodes to make the test more realistic.
+
+ ```yaml
+packet-size: 120 # hping3 packet size
+window-size: 64 # hping3 TCP window size
+duration: 10 # chaos scenario duration
+namespace: default # namespace where the target service(s) are deployed
+target-service: target-svc # target service name (if set, target-service-label must be empty)
+target-port: 80 # target service TCP port
+target-service-label: "" # target service label, can be used to target multiple services at the same time
+                          # if they share the same label (if set, target-service must be empty)
+number-of-pods: 2 # number of attacker pods instantiated per target
+image: quay.io/krkn-chaos/krkn-syn-flood # syn flood attacker container image
+attacker-nodes: # sets the node affinity used to schedule the attacker pods. Multiple values can be specified
+                # for each node label selector, so the kube scheduler can choose among the matching nodes
+                # when placing the attacker pods. Multiple labels can be specified.
+  kubernetes.io/hostname:
+    - host_1
+    - host_2
+  kubernetes.io/os:
+    - linux
+
+ ```
+
+The attacker container source code is available [here](https://github.com/krkn-chaos/krkn-syn-flood).
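The `attacker-nodes` mapping and the mutually exclusive target keys above could be handled roughly as in this Python sketch. It is not the krkn plugin code: the function names are hypothetical, requiring exactly one of the two target keys is inferred from the comments in the example, and placing every label key in a single `nodeSelectorTerm` under `requiredDuringSchedulingIgnoredDuringExecution` is an assumption. In Kubernetes, the expressions of one term are ANDed and an `In` expression accepts any of its listed values; the plugin may instead use preferred affinity or separate terms.

```python
def validate_targets(config: dict) -> None:
    """Require exactly one of target-service / target-service-label to be non-empty."""
    service = config.get("target-service", "")
    label = config.get("target-service-label", "")
    if bool(service) == bool(label):
        raise ValueError("set exactly one of target-service or target-service-label")


def node_affinity(attacker_nodes: dict[str, list[str]]) -> dict:
    """Build a pod `affinity` block from the attacker-nodes mapping.

    Every label key becomes one matchExpression of a single nodeSelectorTerm,
    so all keys must match, while any listed value of a key is accepted.
    """
    expressions = [
        {"key": key, "operator": "In", "values": list(values)}
        for key, values in attacker_nodes.items()
    ]
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{"matchExpressions": expressions}]
            }
        }
    }
```

With the example config, the pods would be restricted to `host_1` or `host_2`, and only if those nodes also carry `kubernetes.io/os=linux`.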
--- a/kind-config.yml
+++ b/kind-config.yml
@@ -2,6 +2,9 @@ kind: Cluster
 apiVersion: kind.x-k8s.io/v1alpha4
 nodes:
  - role: control-plane
+    extraPortMappings:
+      - containerPort: 30036
+        hostPort: 8888
  - role: control-plane
  - role: control-plane
  - role: worker
--- a/kraken/application_outage/actions.py
+++ b/kraken/application_outage/actions.py
@@ -4,6 +4,7 @@ import time
 import kraken.cerberus.setup as cerberus
 from jinja2 import Template
 import kraken.invoke.command as runcommand
+from krkn_lib.k8s import KrknKubernetes
 from krkn_lib.telemetry.k8s import KrknTelemetryKubernetes
 from krkn_lib.models.telemetry import ScenarioTelemetry
 from krkn_lib.utils.functions import get_yaml_item_value, log_exception
@@ -11,14 +12,14 @@ from krkn_lib.utils.functions import get_yaml_item_value, log_exception

 # Reads the scenario config, applies and deletes a network policy to
 # block the traffic for the specified duration
-def run(scenarios_list, config, wait_duration, telemetry: KrknTelemetryKubernetes) -> (list[str], list[ScenarioTelemetry]):
+def run(scenarios_list, config, wait_duration, kubecli: KrknKubernetes, telemetry: KrknTelemetryKubernetes) -> (list[str], list[ScenarioTelemetry]):
    failed_post_scenarios = ""
    scenario_telemetries: list[ScenarioTelemetry] = []
    failed_scenarios = []
    for app_outage_config in scenarios_list:
        scenario_telemetry = ScenarioTelemetry()
        scenario_telemetry.scenario = app_outage_config
-        scenario_telemetry.startTimeStamp = time.time()
+        scenario_telemetry.start_timestamp = time.time()
        telemetry.set_parameters_base64(scenario_telemetry, app_outage_config)
        if len(app_outage_config) > 1:
            try:
@@ -49,25 +50,22 @@ spec:
  podSelector:
    matchLabels: {{ pod_selector }}
  policyTypes: {{ traffic_type }}
-                    """
+"""
                    t = Template(network_policy_template)
                    rendered_spec = t.render(pod_selector=pod_selector, traffic_type=traffic_type)
-                    # Write the rendered template to a file
-                    with open("kraken_network_policy.yaml", "w") as f:
-                        f.write(rendered_spec)
+                    yaml_spec = yaml.safe_load(rendered_spec)
                    # Block the traffic by creating network policy
                    logging.info("Creating the network policy")
-                    runcommand.invoke(
-                        "kubectl create -f %s -n %s --validate=false" % ("kraken_network_policy.yaml", namespace)
-                    )

+                    kubecli.create_net_policy(yaml_spec, namespace)
+
                    # wait for the specified duration
                    logging.info("Waiting for the specified duration in the config: %s" % (duration))
                    time.sleep(duration)

                    # unblock the traffic by deleting the network policy
                    logging.info("Deleting the network policy")
-                    runcommand.invoke("kubectl delete -f %s -n %s" % ("kraken_network_policy.yaml", namespace))
+                    kubecli.delete_net_policy("kraken-deny", namespace)

                    logging.info("End of scenario. Waiting for the specified duration: %s" % (wait_duration))
                    time.sleep(wait_duration)
@@ -75,12 +73,12 @@ spec:
                    end_time = int(time.time())
                    cerberus.publish_kraken_status(config, failed_post_scenarios, start_time, end_time)
            except Exception as e :
-                scenario_telemetry.exitStatus = 1
+                scenario_telemetry.exit_status = 1
                failed_scenarios.append(app_outage_config)
                log_exception(app_outage_config)
            else:
-                scenario_telemetry.exitStatus = 0
-            scenario_telemetry.endTimeStamp = time.time()
+                scenario_telemetry.exit_status = 0
+            scenario_telemetry.end_timestamp = time.time()
            scenario_telemetries.append(scenario_telemetry)
    return failed_scenarios, scenario_telemetries

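The application outage change renders a Jinja template and parses it with `yaml.safe_load` before calling `kubecli.create_net_policy`. As a hedged alternative, the same kind of manifest could be built directly as a Python dict. In this sketch the name `build_deny_policy` is hypothetical, only the `podSelector` and `policyTypes` fields visible in the template are reproduced, the `kraken-deny` name is inferred from the `delete_net_policy` call, and whether `create_net_policy` accepts this exact dict has not been verified here.

```python
def build_deny_policy(pod_selector: dict[str, str], traffic_type: list[str]) -> dict:
    """Return a NetworkPolicy that selects the pods and allows none of the listed traffic types."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # Assumed name, matching the one passed to kubecli.delete_net_policy.
        "metadata": {"name": "kraken-deny"},
        "spec": {
            "podSelector": {"matchLabels": dict(pod_selector)},
            # A policy type listed without any ingress/egress rules denies all
            # traffic of that type for the selected pods.
            "policyTypes": list(traffic_type),
        },
    }
```

Building the dict directly avoids a string-template round trip, at the cost of diverging from the template-based approach used in the diff.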
--- a/kraken/arcaflow_plugin/arcaflow_plugin.py
+++ b/kraken/arcaflow_plugin/arcaflow_plugin.py
@@ -16,12 +16,12 @@ def run(scenarios_list: List[str], kubeconfig_path: str, telemetry: KrknTelemetr
    for scenario in scenarios_list:
        scenario_telemetry = ScenarioTelemetry()
        scenario_telemetry.scenario = scenario
-        scenario_telemetry.startTimeStamp = time.time()
+        scenario_telemetry.start_timestamp = time.time()
        telemetry.set_parameters_base64(scenario_telemetry,scenario)
        engine_args = build_args(scenario)
        status_code = run_workflow(engine_args, kubeconfig_path)
-        scenario_telemetry.endTimeStamp = time.time()
-        scenario_telemetry.exitStatus = status_code
+        scenario_telemetry.end_timestamp = time.time()
+        scenario_telemetry.exit_status = status_code
        scenario_telemetries.append(scenario_telemetry)
        if status_code != 0:
            failed_post_scenarios.append(scenario)
@@ -36,9 +36,10 @@ def run_workflow(engine_args: arcaflow.EngineArgs, kubeconfig_path: str) -> int:

 def build_args(input_file: str) -> arcaflow.EngineArgs:
    """sets the kubeconfig parsed by setArcaKubeConfig as an input to the arcaflow workflow"""
-    context = Path(input_file).parent
-    workflow = "{}/workflow.yaml".format(context)
-    config = "{}/config.yaml".format(context)
+    current_path = Path().resolve()
+    context = f"{current_path}/{Path(input_file).parent}"
+    workflow = f"{context}/workflow.yaml"
+    config = f"{context}/config.yaml"
    if not os.path.exists(context):
        raise Exception(
            "context folder for arcaflow workflow not found: {}".format(
@@ -61,7 +62,8 @@ def build_args(input_file: str) -> arcaflow.EngineArgs:
    engine_args = arcaflow.EngineArgs()
    engine_args.context = context
    engine_args.config = config
-    engine_args.input = input_file
+    engine_args.workflow = workflow
+    engine_args.input = f"{current_path}/{input_file}"
    return engine_args


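In `build_args`, the paths are joined with f-strings, so an absolute `input_file` such as `/opt/s/input.yaml` would produce `<cwd>//opt/s`, which resolves under the current directory rather than to `/opt/s`. A minimal sketch of the same resolution with `pathlib` follows; the helper name `workflow_paths` is hypothetical, the example scenario path is illustrative, and `cwd` is passed in explicitly (instead of calling `Path().resolve()`) only to make the sketch testable.

```python
from pathlib import PurePosixPath


def workflow_paths(input_file: str, cwd: str) -> dict[str, str]:
    """Resolve the arcaflow context, workflow, config and input paths for a scenario file.

    pathlib's `/` operator keeps absolute right-hand paths intact:
    PurePosixPath("/a") / "/b" is "/b", whereas f"/a//b" is not.
    """
    input_path = PurePosixPath(cwd) / input_file
    context = input_path.parent
    return {
        "context": str(context),
        "workflow": str(context / "workflow.yaml"),
        "config": str(context / "config.yaml"),
        "input": str(input_path),
    }
```

Relative inputs behave as in the diff, e.g. `scenarios/arcaflow/cpu-hog/input.yaml` resolves to a context directory under `cwd`.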
--- a/kraken/chaos_recommender/analysis.py
+++ b/kraken/chaos_recommender/analysis.py
@@ -19,6 +19,7 @@ def load_telemetry_data(file_path):

 def calculate_zscores(data):
    zscores = pd.DataFrame()
+    zscores["Namespace"] = data["namespace"]
    zscores["Service"] = data["service"]
    zscores["CPU"] = (data["CPU"] - data["CPU"].mean()) / data["CPU"].std()
    zscores["Memory"] = (data["MEM"] - data["MEM"].mean()) / data["MEM"].std()
@@ -46,28 +47,49 @@ def get_services_above_heatmap_threshold(dataframe, cpu_threshold, mem_threshold
    return cpu_services, mem_services


-def analysis(file_path, chaos_tests_config, threshold, heatmap_cpu_threshold, heatmap_mem_threshold):
+def analysis(file_path, namespaces, chaos_tests_config, threshold,
+             heatmap_cpu_threshold, heatmap_mem_threshold):
    # Load the telemetry data from file
-    logging.info("Fetching the Telemetry data")
+    logging.info("Fetching the Telemetry data...")
    data = load_telemetry_data(file_path)

    # Calculate Z-scores for CPU, Memory, and Network columns
    zscores = calculate_zscores(data)
+    # Dict for saving analysis data -- key is the namespace
+    analysis_data = {}

-    # Identify outliers
-    logging.info("Identifying outliers")
-    outliers_cpu, outliers_memory, outliers_network = identify_outliers(zscores, threshold)
-    cpu_services, mem_services = get_services_above_heatmap_threshold(data, heatmap_cpu_threshold, heatmap_mem_threshold)
+    # Identify outliers for each namespace
+    for namespace in namespaces:

-    analysis_data = analysis_json(outliers_cpu, outliers_memory,
-                                  outliers_network, cpu_services,
-                                  mem_services, chaos_tests_config)
+        logging.info(f"Identifying outliers for namespace {namespace}...")

-    if not cpu_services:
-        logging.info("There are no services that are using significant CPU compared to their assigned limits (infinite in case no limits are set).")
-    if not mem_services:
-        logging.info("There are no services that are using significant MEMORY compared to their assigned limits (infinite in case no limits are set).")
-    time.sleep(2)
+        namespace_zscores = zscores.loc[zscores["Namespace"] == namespace]
+        namespace_data = data.loc[data["namespace"] == namespace]
+        outliers_cpu, outliers_memory, outliers_network = identify_outliers(
+            namespace_zscores, threshold)
+        cpu_services, mem_services = get_services_above_heatmap_threshold(
+            namespace_data, heatmap_cpu_threshold, heatmap_mem_threshold)
+
+        analysis_data[namespace] = analysis_json(outliers_cpu, outliers_memory,
+                                                 outliers_network,
+                                                 cpu_services, mem_services,
+                                                 chaos_tests_config)
+
+        if cpu_services:
+            logging.info(f"These services use significant CPU compared to "
+                         f"their assigned limits: {cpu_services}")
+        else:
+            logging.info("There are no services that are using significant "
+                         "CPU compared to their assigned limits "
+                         "(infinite in case no limits are set).")
+        if mem_services:
+            logging.info(f"These services use significant MEMORY compared to "
+                         f"their assigned limits: {mem_services}")
+        else:
+            logging.info("There are no services that are using significant "
+                         "MEMORY compared to their assigned limits "
+                         "(infinite in case no limits are set).")
+        time.sleep(2)

    logging.info("Please check data in utilisation.txt for further analysis")

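In the updated `analysis`, `calculate_zscores` runs once over the telemetry of all namespaces, and the per-namespace loop then filters the result, so each service is compared against the cross-namespace mean and standard deviation. If a per-namespace baseline is wanted instead, a stdlib-only sketch could look like this. The function name and the row format (dicts with the `namespace`, `service`, and metric columns used in the diff) are hypothetical; the diff itself uses pandas.

```python
from statistics import mean, stdev


def zscores_by_namespace(rows: list[dict], column: str) -> dict[tuple[str, str], float]:
    """Z-score of `column` for each (namespace, service), using per-namespace mean/stdev."""
    by_ns: dict[str, list[dict]] = {}
    for row in rows:
        by_ns.setdefault(row["namespace"], []).append(row)

    scores: dict[tuple[str, str], float] = {}
    for ns, ns_rows in by_ns.items():
        values = [float(r[column]) for r in ns_rows]
        mu = mean(values)
        # statistics.stdev is the sample standard deviation (ddof=1), like
        # pandas' default Series.std; it needs at least two values.
        sd = stdev(values) if len(values) > 1 else 0.0
        for r in ns_rows:
            # Degenerate groups score 0.0 here (an assumption; pandas yields NaN/inf).
            scores[(ns, r["service"])] = (float(r[column]) - mu) / sd if sd else 0.0
    return scores
```

Whether global or per-namespace z-scores are preferable depends on whether outliers should be judged relative to the whole cluster or to their own namespace.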
--- a/kraken/chaos_recommender/prometheus.py
+++ b/kraken/chaos_recommender/prometheus.py
@@ -17,28 +17,60 @@ def convert_data_to_dataframe(data, label):


 def convert_data(data, service):
-
    result = {}
    for entry in data:
        pod_name = entry['metric']['pod']
        value = entry['value'][1]
        result[pod_name] = value
-    return result.get(service, '100000000000') # for those pods whose limits are not defined they can take as much resources, there assigning a very high value
+    return result.get(service)  # None when there is no sample for this pod


-def save_utilization_to_file(cpu_data, cpu_limits_result, mem_data, mem_limits_result, network_data, filename):
-    df_cpu = convert_data_to_dataframe(cpu_data, "CPU")
-    merged_df = pd.DataFrame(columns=['service','CPU','CPU_LIMITS','MEM','MEM_LIMITS','NETWORK'])
-    services = df_cpu.service.unique()
-    logging.info(services)
+def convert_data_limits(data, node_data, service, prometheus):
+    result = {}
+    for entry in data:
+        pod_name = entry['metric']['pod']
+        value = entry['value'][1]
+        result[pod_name] = value
+    return result[service] if service in result else get_node_capacity(node_data, service, prometheus)  # pods without limits fall back to their node's capacity; queried only when needed

-    for s in services:
+def get_node_capacity(node_data, pod_name, prometheus):

-        new_row_df = pd.DataFrame( {"service": s, "CPU" : convert_data(cpu_data, s),
-                    "CPU_LIMITS" : convert_data(cpu_limits_result, s),
-                    "MEM" : convert_data(mem_data, s), "MEM_LIMITS" : convert_data(mem_limits_result, s),
-                    "NETWORK" : convert_data(network_data, s)}, index=[0])
-        merged_df = pd.concat([merged_df, new_row_df], ignore_index=True)
+    # Get the node name on which the pod is running
+    query = f'kube_pod_info{{pod="{pod_name}"}}'
+    result = prometheus.custom_query(query)
+    if not result:
+        return None
+
+    node_name = result[0]['metric']['node']
+
+    for item in node_data:
+        if item['metric']['node'] == node_name:
+            return item['value'][1]
+
+    return '1000000000'
+
+
+def save_utilization_to_file(utilization, filename, prometheus):
+
+    merged_df = pd.DataFrame(columns=['namespace', 'service', 'CPU', 'CPU_LIMITS', 'MEM', 'MEM_LIMITS', 'NETWORK'])
+    for namespace in utilization:
+        # Loading utilization_data[] for namespace
+        # indexes -- 0 CPU, 1 CPU limits, 2 mem, 3 mem limits, 4 network, 5 node CPU capacity, 6 node mem capacity
+        utilization_data = utilization[namespace]
+        df_cpu = convert_data_to_dataframe(utilization_data[0], "CPU")
+        services = df_cpu.service.unique()
+        logging.info(f"Services for namespace {namespace}: {services}")
+
+        for s in services:
+
+            new_row_df = pd.DataFrame({
+                "namespace": namespace, "service": s,
+                "CPU": convert_data(utilization_data[0], s),
+                "CPU_LIMITS": convert_data_limits(utilization_data[1],utilization_data[5], s, prometheus),
+                "MEM": convert_data(utilization_data[2], s),
+                "MEM_LIMITS": convert_data_limits(utilization_data[3], utilization_data[6], s, prometheus),
+                "NETWORK": convert_data(utilization_data[4], s)}, index=[0])
+            merged_df = pd.concat([merged_df, new_row_df], ignore_index=True)

    # Convert columns to string
    merged_df['CPU'] = merged_df['CPU'].astype(str)
@@ -48,47 +80,65 @@ def save_utilization_to_file(cpu_data, cpu_limits_result, mem_data, mem_limits_r
    merged_df['NETWORK'] = merged_df['NETWORK'].astype(str)

    # Extract integer part before the decimal point
-    merged_df['CPU'] = merged_df['CPU'].str.split('.').str[0]
-    merged_df['MEM'] = merged_df['MEM'].str.split('.').str[0]
-    merged_df['CPU_LIMITS'] = merged_df['CPU_LIMITS'].str.split('.').str[0]
-    merged_df['MEM_LIMITS'] = merged_df['MEM_LIMITS'].str.split('.').str[0]
-    merged_df['NETWORK'] = merged_df['NETWORK'].str.split('.').str[0]
+    #merged_df['CPU'] = merged_df['CPU'].str.split('.').str[0]
+    #merged_df['MEM'] = merged_df['MEM'].str.split('.').str[0]
+    #merged_df['CPU_LIMITS'] = merged_df['CPU_LIMITS'].str.split('.').str[0]
+    #merged_df['MEM_LIMITS'] = merged_df['MEM_LIMITS'].str.split('.').str[0]
+    #merged_df['NETWORK'] = merged_df['NETWORK'].str.split('.').str[0]

    merged_df.to_csv(filename, sep='\t', index=False)


-def fetch_utilization_from_prometheus(prometheus_endpoint, auth_token, namespace, scrape_duration):
+def fetch_utilization_from_prometheus(prometheus_endpoint, auth_token,
+                                      namespaces, scrape_duration):
    urllib3.disable_warnings()
-    prometheus = PrometheusConnect(url=prometheus_endpoint, headers={'Authorization':'Bearer {}'.format(auth_token)}, disable_ssl=True)
+    prometheus = PrometheusConnect(url=prometheus_endpoint, headers={
+        'Authorization':'Bearer {}'.format(auth_token)}, disable_ssl=True)

-    # Fetch CPU utilization
-    logging.info("Fetching utilization")
-    cpu_query = 'sum (rate (container_cpu_usage_seconds_total{image!="", namespace="%s"}[%s])) by (pod) *1000' % (namespace,scrape_duration)
-    cpu_result = prometheus.custom_query(cpu_query)
+    # Dicts for saving utilisation and queries -- key is namespace
+    utilization = {}
+    queries = {}

-    cpu_limits_query = '(sum by (pod) (kube_pod_container_resource_limits{resource="cpu", namespace="%s"}))*1000' %(namespace)
-    cpu_limits_result = prometheus.custom_query(cpu_limits_query)
+    logging.info("Fetching utilization...")
+    for namespace in namespaces:

-    mem_query = 'sum by (pod) (avg_over_time(container_memory_usage_bytes{image!="", namespace="%s"}[%s]))' % (namespace, scrape_duration)
-    mem_result = prometheus.custom_query(mem_query)
+        # Fetch CPU utilization
+        cpu_query = 'sum (rate (container_cpu_usage_seconds_total{image!="", namespace="%s"}[%s])) by (pod) *1000' % (namespace,scrape_duration)
+        cpu_result = prometheus.custom_query(cpu_query)

-    mem_limits_query = 'sum by (pod) (kube_pod_container_resource_limits{resource="memory", namespace="%s"})  ' %(namespace)
-    mem_limits_result = prometheus.custom_query(mem_limits_query)
+        cpu_limits_query = '(sum by (pod) (kube_pod_container_resource_limits{resource="cpu", namespace="%s"}))*1000' %(namespace)
+        cpu_limits_result = prometheus.custom_query(cpu_limits_query)

-    network_query = 'sum by (pod) ((avg_over_time(container_network_transmit_bytes_total{namespace="%s"}[%s])) + \
-    (avg_over_time(container_network_receive_bytes_total{namespace="%s"}[%s])))' % (namespace, scrape_duration, namespace, scrape_duration)
-    network_result = prometheus.custom_query(network_query)
+        node_cpu_limits_query = 'kube_node_status_capacity{resource="cpu", unit="core"}*1000'
+        node_cpu_limits_result = prometheus.custom_query(node_cpu_limits_query)
+
+        mem_query = 'sum by (pod) (avg_over_time(container_memory_usage_bytes{image!="", namespace="%s"}[%s]))' % (namespace, scrape_duration)
+        mem_result = prometheus.custom_query(mem_query)
+
+        mem_limits_query = 'sum by (pod) (kube_pod_container_resource_limits{resource="memory", namespace="%s"})  ' %(namespace)
+        mem_limits_result = prometheus.custom_query(mem_limits_query)
+
+        node_mem_limits_query = 'kube_node_status_capacity{resource="memory", unit="byte"}'
+        node_mem_limits_result = prometheus.custom_query(node_mem_limits_query)
+
+        network_query = 'sum by (pod) ((avg_over_time(container_network_transmit_bytes_total{namespace="%s"}[%s])) + \
+        (avg_over_time(container_network_receive_bytes_total{namespace="%s"}[%s])))' % (namespace, scrape_duration, namespace, scrape_duration)
+        network_result = prometheus.custom_query(network_query)
+
+        utilization[namespace] = [cpu_result, cpu_limits_result, mem_result, mem_limits_result, network_result, node_cpu_limits_result, node_mem_limits_result ]
+        queries[namespace] = json_queries(cpu_query, cpu_limits_query, mem_query, mem_limits_query, network_query)
+
+    save_utilization_to_file(utilization, saved_metrics_path, prometheus)

-    save_utilization_to_file(cpu_result, cpu_limits_result, mem_result, mem_limits_result, network_result, saved_metrics_path)
-    queries = json_queries(cpu_query, cpu_limits_query, mem_query, mem_limits_query)
    return saved_metrics_path, queries


-def json_queries(cpu_query, cpu_limits_query, mem_query, mem_limits_query):
+def json_queries(cpu_query, cpu_limits_query, mem_query, mem_limits_query, network_query):
    queries = {
        "cpu_query": cpu_query,
        "cpu_limit_query": cpu_limits_query,
        "memory_query": mem_query,
-        "memory_limit_query": mem_limits_query
+        "memory_limit_query": mem_limits_query,
+        "network_query": network_query
    }
    return queries
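The fallback to node capacity only needs the extra `kube_pod_info` query for pods that have no limit sample. Python evaluates the default argument of `dict.get(key, default)` before the lookup, so passing a function call there runs it for every pod. A hedged sketch of a lazy variant follows, assuming the instant-vector shape the diff already relies on (`entry['metric'][...]`, `entry['value'][1]`). `pod_limit` and `node_capacity_lookup` are hypothetical names, and `pod_nodes` stands in for the pod-to-node mapping that the diff looks up per pod via `kube_pod_info`.

```python
from typing import Callable, Optional

# One element of a Prometheus instant-vector result, e.g.
# {"metric": {"pod": "web-1"}, "value": [1721000000.0, "500"]}
Sample = dict


def pod_limit(limits: list[Sample], pod: str,
              fallback: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the pod's limit value; call `fallback` only when no sample exists."""
    for sample in limits:
        if sample["metric"].get("pod") == pod:
            return sample["value"][1]
    return fallback(pod)


def node_capacity_lookup(pod_nodes: dict[str, str],
                         node_capacity: list[Sample]) -> Callable[[str], Optional[str]]:
    """Build a fallback mapping pod -> node -> capacity from pre-fetched samples."""
    capacity = {s["metric"]["node"]: s["value"][1] for s in node_capacity}

    def lookup(pod: str) -> Optional[str]:
        node = pod_nodes.get(pod)
        # Returns None for an unknown pod or node; the diff instead returns
        # '1000000000' when the node is missing from the capacity samples.
        return capacity.get(node) if node else None

    return lookup
```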
--- a/kraken/network_chaos/actions.py
+++ b/kraken/network_chaos/actions.py
@@ -23,7 +23,7 @@ def run(scenarios_list, config, wait_duration, kubecli: KrknKubernetes, telemetr
    for net_config in scenarios_list:
        scenario_telemetry = ScenarioTelemetry()
        scenario_telemetry.scenario = net_config
-        scenario_telemetry.startTimeStamp = time.time()
+        scenario_telemetry.start_timestamp = time.time()
        telemetry.set_parameters_base64(scenario_telemetry, net_config)
        try:
            with open(net_config, "r") as file:
@@ -114,11 +114,11 @@ def run(scenarios_list, config, wait_duration, kubecli: KrknKubernetes, telemetr
                    logging.info("Deleting jobs")
                    delete_job(joblst[:], kubecli)
        except (RuntimeError, Exception):
-            scenario_telemetry.exitStatus = 1
+            scenario_telemetry.exit_status = 1
            failed_scenarios.append(net_config)
            log_exception(net_config)
        else:
-            scenario_telemetry.exitStatus = 0
+            scenario_telemetry.exit_status = 0
        scenario_telemetries.append(scenario_telemetry)
    return failed_scenarios, scenario_telemetries

--- a/kraken/node_actions/az_node_scenarios.py
+++ b/kraken/node_actions/az_node_scenarios.py
@@ -1,6 +1,6 @@

 import time
-import yaml
+import os
 import kraken.invoke.command as runcommand
 import logging
 import kraken.node_actions.common_node_functions as nodeaction
@@ -17,9 +17,9 @@ class Azure:
        # Acquire a credential object using CLI-based authentication.
        credentials = DefaultAzureCredential()
        logging.info("credential " + str(credentials))
-        az_account = runcommand.invoke("az account list -o yaml")
-        az_account_yaml = yaml.safe_load(az_account, Loader=yaml.FullLoader)
-        subscription_id = az_account_yaml[0]["id"]
+        # az_account = runcommand.invoke("az account list -o yaml")
+        # az_account_yaml = yaml.safe_load(az_account, Loader=yaml.FullLoader)
+        subscription_id = os.getenv("AZURE_SUBSCRIPTION_ID")
        self.compute_client = ComputeManagementClient(credentials, subscription_id)

    # Get the instance ID of the node
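With this change, `os.getenv("AZURE_SUBSCRIPTION_ID")` returns `None` when the variable is missing, and that value is passed straight to `ComputeManagementClient`, so any resulting error comes from the SDK and does not name the missing variable. A minimal sketch of an explicit early check follows; the helper name is hypothetical, and raising `RuntimeError` is just one possible choice.

```python
import os
from typing import Mapping, Optional


def azure_subscription_id(env: Optional[Mapping[str, str]] = None) -> str:
    """Read AZURE_SUBSCRIPTION_ID, failing early with a clear message if it is unset."""
    env = os.environ if env is None else env
    subscription_id = env.get("AZURE_SUBSCRIPTION_ID", "").strip()
    if not subscription_id:
        raise RuntimeError(
            "AZURE_SUBSCRIPTION_ID is not set; export it before running Azure node scenarios"
        )
    return subscription_id
```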
--- a/kraken/node_actions/gcp_node_scenarios.py
+++ b/kraken/node_actions/gcp_node_scenarios.py
@@ -1,6 +1,8 @@
+import os
 import sys
 import time
 import logging
+import json
 import kraken.node_actions.common_node_functions as nodeaction
 from kraken.node_actions.abstract_node_scenarios import abstract_node_scenarios
 from googleapiclient import discovery
@@ -10,11 +12,19 @@ from krkn_lib.k8s import KrknKubernetes

 class GCP:
    def __init__(self):
+        try: 
+            gapp_creds = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
+            with open(gapp_creds, "r") as f:
+                f_str = f.read()
+                self.project = json.loads(f_str)['project_id']
+            #self.project = runcommand.invoke("gcloud config get-value project").split("/n")[0].strip()
+            logging.info("project " + str(self.project) + "!")
+            credentials = GoogleCredentials.get_application_default()
+            self.client = discovery.build("compute", "v1", credentials=credentials, cache_discovery=False)

-        self.project = runcommand.invoke("gcloud config get-value project").split("/n")[0].strip()
-        logging.info("project " + str(self.project) + "!")
-        credentials = GoogleCredentials.get_application_default()
-        self.client = discovery.build("compute", "v1", credentials=credentials, cache_discovery=False)
+        except Exception as e: 
+            logging.error("Error on setting up GCP connection: " + str(e))
+            sys.exit(1)

    # Get the instance ID of the node
    def get_instance_id(self, node):
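The GCP constructor now reads `project_id` from the service-account JSON file named by `GOOGLE_APPLICATION_CREDENTIALS`. If the variable is unset, `open(None)` raises a `TypeError`, which the broad `except` reports only as a generic connection-setup error. A hedged sketch that distinguishes the failure cases follows; the helper name `gcp_project_id` is hypothetical and not part of krkn.

```python
import json
import os
from typing import Mapping, Optional


def gcp_project_id(env: Optional[Mapping[str, str]] = None) -> str:
    """Return project_id from the service-account JSON named by GOOGLE_APPLICATION_CREDENTIALS."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    with open(path, "r") as f:
        creds = json.load(f)
    try:
        return creds["project_id"]
    except KeyError:
        raise RuntimeError(f"{path} has no project_id field") from None
```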
--- a/kraken/node_actions/run.py
+++ b/kraken/node_actions/run.py
@@ -15,7 +15,7 @@ import kraken.cerberus.setup as cerberus
 from krkn_lib.k8s import KrknKubernetes
 from krkn_lib.telemetry.k8s import KrknTelemetryKubernetes
 from krkn_lib.models.telemetry import ScenarioTelemetry
-from krkn_lib.utils.functions import get_yaml_item_value
+from krkn_lib.utils.functions import get_yaml_item_value, log_exception

 node_general = False

@@ -61,7 +61,7 @@ def run(scenarios_list, config, wait_duration, kubecli: KrknKubernetes, telemetr
    for node_scenario_config in scenarios_list:
        scenario_telemetry = ScenarioTelemetry()
        scenario_telemetry.scenario = node_scenario_config
-        scenario_telemetry.startTimeStamp = time.time()
+        scenario_telemetry.start_timestamp = time.time()
        telemetry.set_parameters_base64(scenario_telemetry, node_scenario_config)
        with open(node_scenario_config, "r") as f:
            node_scenario_config = yaml.full_load(f)
@@ -78,13 +78,13 @@ def run(scenarios_list, config, wait_duration, kubecli: KrknKubernetes, telemetr
                            cerberus.get_status(config, start_time, end_time)
                            logging.info("")
                        except (RuntimeError, Exception) as e:
-                            scenario_telemetry.exitStatus = 1
+                            scenario_telemetry.exit_status = 1
                            failed_scenarios.append(node_scenario_config)
                            log_exception(node_scenario_config)
                        else:
-                            scenario_telemetry.exitStatus = 0
+                            scenario_telemetry.exit_status = 0

-                        scenario_telemetry.endTimeStamp = time.time()
+                        scenario_telemetry.end_timestamp = time.time()
                        scenario_telemetries.append(scenario_telemetry)

    return failed_scenarios, scenario_telemetries
--- a/kraken/plugins/__init__.py
+++ b/kraken/plugins/__init__.py
@@ -2,11 +2,14 @@ import dataclasses
 import json
 import logging
 from os.path import abspath
-from typing import List, Dict
+from typing import List, Dict, Any
 import time

 from arcaflow_plugin_sdk import schema, serialization, jsonschema
 from arcaflow_plugin_kill_pod import kill_pods, wait_for_pods
+from krkn_lib.k8s import KrknKubernetes
+from krkn_lib.k8s.pods_monitor_pool import PodsMonitorPool
+
 import kraken.plugins.node_scenarios.vmware_plugin as vmware_plugin
 import kraken.plugins.node_scenarios.ibmcloud_plugin as ibmcloud_plugin
 from kraken.plugins.run_python_plugin import run_python_file
@@ -47,11 +50,14 @@ class Plugins:
                )
            self.steps_by_id[step.schema.id] = step

-    def run(self, file: str, kubeconfig_path: str, kraken_config: str):
+    def unserialize_scenario(self, file: str) -> Any:
+        return serialization.load_from_file(abspath(file))
+
+    def run(self, file: str, kubeconfig_path: str, kraken_config: str, run_uuid: str):
        """
        Run executes a series of steps
        """
-        data = serialization.load_from_file(abspath(file))
+        data = self.unserialize_scenario(file)
        if not isinstance(data, list):
            raise Exception(
                "Invalid scenario configuration file: {} expected list, found {}".format(file, type(data).__name__)
@@ -96,7 +102,8 @@ class Plugins:
                unserialized_input.kubeconfig_path = kubeconfig_path
            if "kraken_config" in step.schema.input.properties:
                unserialized_input.kraken_config = kraken_config
-            output_id, output_data = step.schema(unserialized_input)
+            output_id, output_data = step.schema(params=unserialized_input, run_id=run_uuid)
+
            logging.info(step.render_output(output_id, output_data) + "\n")
            if output_id in step.error_output_ids:
                raise Exception(
@@ -241,25 +248,73 @@ PLUGINS = Plugins(
 )


-def run(scenarios: List[str], kubeconfig_path: str, kraken_config: str, failed_post_scenarios: List[str], wait_duration: int, telemetry: KrknTelemetryKubernetes) -> (List[str], list[ScenarioTelemetry]):
+def run(scenarios: List[str],
+        kubeconfig_path: str,
+        kraken_config: str,
+        failed_post_scenarios: List[str],
+        wait_duration: int,
+        telemetry: KrknTelemetryKubernetes,
+        kubecli: KrknKubernetes,
+        run_uuid: str
+        ) -> (List[str], list[ScenarioTelemetry]):
+
    scenario_telemetries: list[ScenarioTelemetry] = []
    for scenario in scenarios:
        scenario_telemetry = ScenarioTelemetry()
        scenario_telemetry.scenario = scenario
-        scenario_telemetry.startTimeStamp = time.time()
+        scenario_telemetry.start_timestamp = time.time()
        telemetry.set_parameters_base64(scenario_telemetry, scenario)
        logging.info('scenario ' + str(scenario))
+        pool = PodsMonitorPool(kubecli)
+        kill_scenarios = [kill_scenario for kill_scenario in PLUGINS.unserialize_scenario(scenario) if kill_scenario["id"] == "kill-pods"]
+
        try:
-            PLUGINS.run(scenario, kubeconfig_path, kraken_config)
+            start_monitoring(pool, kill_scenarios)
+            PLUGINS.run(scenario, kubeconfig_path, kraken_config, run_uuid)
+            result = pool.join()
+            scenario_telemetry.affected_pods = result
+            if result.error:
+                raise Exception(f"unrecovered pods: {result.error}")
+
        except Exception as e:
-            scenario_telemetry.exitStatus = 1
+            logging.error(f"scenario exception: {str(e)}")
+            scenario_telemetry.exit_status = 1
+            pool.cancel()
            failed_post_scenarios.append(scenario)
            log_exception(scenario)
        else:
-            scenario_telemetry.exitStatus = 0
+            scenario_telemetry.exit_status = 0
            logging.info("Waiting for the specified duration: %s" % (wait_duration))
            time.sleep(wait_duration)
        scenario_telemetries.append(scenario_telemetry)
-        scenario_telemetry.endTimeStamp = time.time()
+        scenario_telemetry.end_timestamp = time.time()

    return failed_post_scenarios, scenario_telemetries
+
+
+def start_monitoring(pool: PodsMonitorPool, scenarios: list[Any]):
+    for kill_scenario in scenarios:
+        recovery_time = kill_scenario["config"]["krkn_pod_recovery_time"]
+        if ("namespace_pattern" in kill_scenario["config"] and
+                "label_selector" in kill_scenario["config"]):
+            namespace_pattern = kill_scenario["config"]["namespace_pattern"]
+            label_selector = kill_scenario["config"]["label_selector"]
+            pool.select_and_monitor_by_namespace_pattern_and_label(
+                namespace_pattern=namespace_pattern,
+                label_selector=label_selector,
+                max_timeout=recovery_time)
+            logging.info(
+                f"waiting {recovery_time} seconds for pod recovery, "
+                f"pod label selector: {label_selector} namespace pattern: {namespace_pattern}")
+
+        elif ("namespace_pattern" in kill_scenario["config"] and
+              "name_pattern" in kill_scenario["config"]):
+            namespace_pattern = kill_scenario["config"]["namespace_pattern"]
+            name_pattern = kill_scenario["config"]["name_pattern"]
+            pool.select_and_monitor_by_name_pattern_and_namespace_pattern(pod_name_pattern=name_pattern,
+                                                                          namespace_pattern=namespace_pattern,
+                                                                          max_timeout=recovery_time)
+            logging.info(f"waiting {recovery_time} seconds for pod recovery, "
+                         f"pod name pattern: {name_pattern} namespace pattern: {namespace_pattern}")
+        else:
+            raise Exception(f"impossible to determine monitor parameters, check {kill_scenario} configuration")
--- a/kraken/plugins/node_scenarios/vmware_plugin.py
+++ b/kraken/plugins/node_scenarios/vmware_plugin.py
@@ -119,11 +119,11 @@ class vSphere:
        vm = self.get_vm(instance_id)
        try:
            self.client.vcenter.vm.Power.stop(vm)
-            logging.info("Stopped VM -- '{}-({})'", instance_id, vm)
+            logging.info(f"Stopped VM -- '{instance_id}-({vm})'")
            return True
        except AlreadyInDesiredState:
            logging.info(
-                "VM '{}'-'({})' is already Powered Off", instance_id, vm
+                f"VM '{instance_id}'-'({vm})' is already Powered Off"
            )
            return False

@@ -136,11 +136,11 @@ class vSphere:
        vm = self.get_vm(instance_id)
        try:
            self.client.vcenter.vm.Power.start(vm)
-            logging.info("Started VM -- '{}-({})'", instance_id, vm)
+            logging.info(f"Started VM -- '{instance_id}-({vm})'")
            return True
        except AlreadyInDesiredState:
            logging.info(
-                "VM '{}'-'({})' is already Powered On", instance_id, vm
+                f"VM '{instance_id}'-'({vm})' is already Powered On"
            )
            return False

@@ -318,12 +318,12 @@ class vSphere:
        try:
            vm = self.get_vm(instance_id)
            state = self.client.vcenter.vm.Power.get(vm).state
-            logging.info("Check instance %s status", instance_id)
+            logging.info(f"Check instance {instance_id} status")
            return state
        except Exception as e:
            logging.error(
-                "Failed to get node instance status %s. Encountered following "
-                "exception: %s.", instance_id, e
+                f"Failed to get node instance status {instance_id}. Encountered following "
+                f"exception: {str(e)}."
            )
            return None

@@ -338,16 +338,14 @@ class vSphere:
        while vm is not None:
            vm = self.get_vm(instance_id)
            logging.info(
-                "VM %s is still being deleted, "
-                "sleeping for 5 seconds",
-                instance_id
+                f"VM {instance_id} is still being deleted, "
+                "sleeping for 5 seconds"
            )
            time.sleep(5)
            time_counter += 5
            if time_counter >= timeout:
                logging.info(
-                    "VM %s is still not deleted in allotted time",
-                    instance_id
+                    f"VM {instance_id} is still not deleted in allotted time"
                )
                return False
        return True
@@ -371,8 +369,7 @@ class vSphere:
            time_counter += 5
            if time_counter >= timeout:
                logging.info(
-                    "VM %s is still not ready in allotted time",
-                    instance_id
+                    f"VM {instance_id} is still not ready in allotted time"
                )
                return False
        return True
@@ -388,16 +385,14 @@ class vSphere:
        while status != Power.State.POWERED_OFF:
            status = self.get_vm_status(instance_id)
            logging.info(
-                "VM %s is still not running, "
-                "sleeping for 5 seconds",
-                instance_id
+                f"VM {instance_id} is still running, "
+                "sleeping for 5 seconds"
            )
            time.sleep(5)
            time_counter += 5
            if time_counter >= timeout:
                logging.info(
-                    "VM %s is still not ready in allotted time",
-                    instance_id
+                    f"VM {instance_id} is still not ready in allotted time"
                )
                return False
        return True
@@ -561,7 +556,7 @@ def node_start(
            try:
                for _ in range(cfg.runs):
                    logging.info("Starting node_start_scenario injection")
-                    logging.info("Starting the node %s ", name)
+                    logging.info(f"Starting the node {name} ")
                    vm_started = vsphere.start_instances(name)
                    if vm_started:
                        vsphere.wait_until_running(name, cfg.timeout)
@@ -571,7 +566,7 @@ def node_start(
                            )
                        nodes_started[int(time.time_ns())] = Node(name=name)
                    logging.info(
-                        "Node with instance ID: %s is in running state", name
+                        f"Node with instance ID: {name} is in running state"
                    )
                    logging.info(
                        "node_start_scenario has been successfully injected!"
@@ -579,8 +574,8 @@ def node_start(
            except Exception as e:
                logging.error("Failed to start node instance. Test Failed")
                logging.error(
-                    "node_start_scenario injection failed! "
-                    "Error was: %s", str(e)
+                    "node_start_scenario injection failed! "
+                    f"Error was: {str(e)}"
                )
                return "error", NodeScenarioErrorOutput(
                    format_exc(), kube_helper.Actions.START
@@ -620,7 +615,7 @@ def node_stop(
            try:
                for _ in range(cfg.runs):
                    logging.info("Starting node_stop_scenario injection")
-                    logging.info("Stopping the node %s ", name)
+                    logging.info(f"Stopping the node {name} ")
                    vm_stopped = vsphere.stop_instances(name)
                    if vm_stopped:
                        vsphere.wait_until_stopped(name, cfg.timeout)
@@ -630,7 +625,7 @@ def node_stop(
                            )
                        nodes_stopped[int(time.time_ns())] = Node(name=name)
                    logging.info(
-                        "Node with instance ID: %s is in stopped state", name
+                        f"Node with instance ID: {name} is in stopped state"
                    )
                    logging.info(
                        "node_stop_scenario has been successfully injected!"
@@ -638,8 +633,8 @@ def node_stop(
            except Exception as e:
                logging.error("Failed to stop node instance. Test Failed")
                logging.error(
-                    "node_stop_scenario injection failed! "
-                    "Error was: %s", str(e)
+                    "node_stop_scenario injection failed! "
+                    f"Error was: {str(e)}"
                )
                return "error", NodeScenarioErrorOutput(
                    format_exc(), kube_helper.Actions.STOP
@@ -679,7 +674,7 @@ def node_reboot(
            try:
                for _ in range(cfg.runs):
                    logging.info("Starting node_reboot_scenario injection")
-                    logging.info("Rebooting the node %s ", name)
+                    logging.info(f"Rebooting the node {name} ")
                    vsphere.reboot_instances(name)
                    if not cfg.skip_openshift_checks:
                        kube_helper.wait_for_unknown_status(
@@ -690,8 +685,8 @@ def node_reboot(
                        )
                    nodes_rebooted[int(time.time_ns())] = Node(name=name)
                    logging.info(
-                        "Node with instance ID: %s has rebooted "
-                        "successfully", name
+                        f"Node with instance ID: {name} has rebooted "
+                        "successfully"
                    )
                    logging.info(
                        "node_reboot_scenario has been successfully injected!"
@@ -699,8 +694,8 @@ def node_reboot(
            except Exception as e:
                logging.error("Failed to reboot node instance. Test Failed")
                logging.error(
-                    "node_reboot_scenario injection failed! "
-                    "Error was: %s", str(e)
+                    "node_reboot_scenario injection failed! "
+                    f"Error was: {str(e)}"
                )
                return "error", NodeScenarioErrorOutput(
                    format_exc(), kube_helper.Actions.REBOOT
@@ -739,13 +734,13 @@ def node_terminate(
                    vsphere.stop_instances(name)
                    vsphere.wait_until_stopped(name, cfg.timeout)
                    logging.info(
-                        "Releasing the node with instance ID: %s ", name
+                        f"Releasing the node with instance ID: {name} "
                    )
                    vsphere.release_instances(name)
                    vsphere.wait_until_released(name, cfg.timeout)
                    nodes_terminated[int(time.time_ns())] = Node(name=name)
                    logging.info(
-                        "Node with instance ID: %s has been released", name
+                        f"Node with instance ID: {name} has been released"
                    )
                    logging.info(
                        "node_terminate_scenario has been "
@@ -754,8 +749,8 @@ def node_terminate(
            except Exception as e:
                logging.error("Failed to terminate node instance. Test Failed")
                logging.error(
-                    "node_terminate_scenario injection failed! "
-                    "Error was: %s", str(e)
+                    "node_terminate_scenario injection failed! "
+                    f"Error was: {str(e)}"
                )
                return "error", NodeScenarioErrorOutput(
                    format_exc(), kube_helper.Actions.TERMINATE
--- a/kraken/pod_scenarios/setup.py
+++ b/kraken/pod_scenarios/setup.py
@@ -1,9 +1,13 @@
 import logging
 import time
+from typing import Any
+
 import yaml
 import sys
 import random
 import arcaflow_plugin_kill_pod
+from krkn_lib.k8s.pods_monitor_pool import PodsMonitorPool
+
 import kraken.cerberus.setup as cerberus
 import kraken.post_actions.actions as post_actions
 from krkn_lib.k8s import KrknKubernetes
@@ -79,11 +83,12 @@ def container_run(kubeconfig_path,

    failed_scenarios = []
    scenario_telemetries: list[ScenarioTelemetry] = []
+    pool = PodsMonitorPool(kubecli)

    for container_scenario_config in scenarios_list:
        scenario_telemetry = ScenarioTelemetry()
        scenario_telemetry.scenario = container_scenario_config[0]
-        scenario_telemetry.startTimeStamp = time.time()
+        scenario_telemetry.start_timestamp = time.time()
        telemetry.set_parameters_base64(scenario_telemetry, container_scenario_config[0])
        if len(container_scenario_config) > 1:
            pre_action_output = post_actions.run(kubeconfig_path, container_scenario_config[1])
@@ -91,23 +96,17 @@ def container_run(kubeconfig_path,
            pre_action_output = ""
        with open(container_scenario_config[0], "r") as f:
            cont_scenario_config = yaml.full_load(f)
+            start_monitoring(kill_scenarios=cont_scenario_config["scenarios"], pool=pool)
            for cont_scenario in cont_scenario_config["scenarios"]:
                # capture start time
                start_time = int(time.time())
                try:
                    killed_containers = container_killing_in_pod(cont_scenario, kubecli)
-                    if len(container_scenario_config) > 1:
-                        failed_post_scenarios = post_actions.check_recovery(
-                            kubeconfig_path,
-                            container_scenario_config,
-                            failed_post_scenarios,
-                            pre_action_output
-                        )
-                    else:
-                        failed_post_scenarios = check_failed_containers(
-                            killed_containers, cont_scenario.get("retry_wait", 120), kubecli
-                        )
-
+                    logging.info(f"killed containers: {str(killed_containers)}")
+                    result = pool.join()
+                    if result.error:
+                        raise Exception(f"pods failed to recover: {result.error}")
+                    scenario_telemetry.affected_pods = result
                    logging.info("Waiting for the specified duration: %s" % (wait_duration))
                    time.sleep(wait_duration)

@@ -117,18 +116,29 @@ def container_run(kubeconfig_path,
                    # publish cerberus status
                    cerberus.publish_kraken_status(config, failed_post_scenarios, start_time, end_time)
                except (RuntimeError, Exception):
+                    pool.cancel()
                    failed_scenarios.append(container_scenario_config[0])
                    log_exception(container_scenario_config[0])
-                    scenario_telemetry.exitStatus = 1
+                    scenario_telemetry.exit_status = 1
                    # removed_exit
                    # sys.exit(1)
                else:
-                    scenario_telemetry.exitStatus = 0
-                scenario_telemetry.endTimeStamp = time.time()
+                    scenario_telemetry.exit_status = 0
+                scenario_telemetry.end_timestamp = time.time()
                scenario_telemetries.append(scenario_telemetry)

    return failed_scenarios, scenario_telemetries

+def start_monitoring(kill_scenarios: list[Any], pool: PodsMonitorPool):
+    for kill_scenario in kill_scenarios:
+        namespace_pattern = f"^{kill_scenario['namespace']}$"
+        label_selector = kill_scenario["label_selector"]
+        recovery_time = kill_scenario["expected_recovery_time"]
+        pool.select_and_monitor_by_namespace_pattern_and_label(
+            namespace_pattern=namespace_pattern,
+            label_selector=label_selector,
+            max_timeout=recovery_time)
+

 def container_killing_in_pod(cont_scenario, kubecli: KrknKubernetes):
    scenario_name = get_yaml_item_value(cont_scenario, "name", "")
--- a/kraken/pvc/pvc_scenario.py
+++ b/kraken/pvc/pvc_scenario.py
@@ -11,7 +11,7 @@ from krkn_lib.utils.functions import get_yaml_item_value, log_exception


 # krkn_lib
-def run(scenarios_list, config, kubecli: KrknKubernetes, telemetry: KrknTelemetryKubernetes) -> (list[str], list[ScenarioTelemetry]):
+def run(scenarios_list, config, wait_duration, kubecli: KrknKubernetes, telemetry: KrknTelemetryKubernetes) -> (list[str], list[ScenarioTelemetry]):
    """
    Reads the scenario config and creates a temp file to fill up the PVC
    """
@@ -21,7 +21,7 @@ def run(scenarios_list, config, kubecli: KrknKubernetes, telemetry: KrknTelemetr
    for app_config in scenarios_list:
        scenario_telemetry = ScenarioTelemetry()
        scenario_telemetry.scenario = app_config
-        scenario_telemetry.startTimeStamp = time.time()
+        scenario_telemetry.start_timestamp = time.time()
        telemetry.set_parameters_base64(scenario_telemetry, app_config)
        try:
            if len(app_config) > 1:
@@ -305,7 +305,9 @@ def run(scenarios_list, config, kubecli: KrknKubernetes, telemetry: KrknTelemetr
                        file_size_kb,
                        kubecli
                    )
-
+                    logging.info("End of scenario. Waiting for the specified duration: %s" % (wait_duration))
+                    time.sleep(wait_duration)
+
                    end_time = int(time.time())
                    cerberus.publish_kraken_status(
                        config,
@@ -314,11 +316,11 @@ def run(scenarios_list, config, kubecli: KrknKubernetes, telemetry: KrknTelemetr
                        end_time
                    )
        except (RuntimeError, Exception):
-            scenario_telemetry.exitStatus = 1
+            scenario_telemetry.exit_status = 1
            failed_scenarios.append(app_config)
            log_exception(app_config)
        else:
-            scenario_telemetry.exitStatus = 0
+            scenario_telemetry.exit_status = 0
        scenario_telemetries.append(scenario_telemetry)

    return failed_scenarios, scenario_telemetries
--- a/kraken/service_disruption/common_service_disruption_functions.py
+++ b/kraken/service_disruption/common_service_disruption_functions.py
@@ -165,7 +165,7 @@ def run(
    for scenario_config in scenarios_list:
        scenario_telemetry = ScenarioTelemetry()
        scenario_telemetry.scenario = scenario_config[0]
-        scenario_telemetry.startTimeStamp = time.time()
+        scenario_telemetry.start_timestamp = time.time()
        telemetry.set_parameters_base64(scenario_telemetry, scenario_config[0])
        try:
            if len(scenario_config) > 1:
@@ -249,12 +249,12 @@ def run(
                    end_time = int(time.time())
                    cerberus.publish_kraken_status(config, failed_post_scenarios, start_time, end_time)
        except (Exception, RuntimeError):
-            scenario_telemetry.exitStatus = 1
+            scenario_telemetry.exit_status = 1
            failed_scenarios.append(scenario_config[0])
            log_exception(scenario_config[0])
        else:
-            scenario_telemetry.exitStatus = 0
-        scenario_telemetry.endTimeStamp = time.time()
+            scenario_telemetry.exit_status = 0
+        scenario_telemetry.end_timestamp = time.time()
        scenario_telemetries.append(scenario_telemetry)
    return failed_scenarios, scenario_telemetries

--- a/kraken/service_hijacking/__init__.py
+++ b/kraken/service_hijacking/__init__.py
--- a/kraken/service_hijacking/service_hijacking.py
+++ b/kraken/service_hijacking/service_hijacking.py
@@ -0,0 +1,90 @@
+import logging
+import time
+
+import yaml
+from krkn_lib.k8s import KrknKubernetes
+from krkn_lib.models.telemetry import ScenarioTelemetry
+from krkn_lib.telemetry.k8s import KrknTelemetryKubernetes
+
+
+def run(scenarios_list: list[str], wait_duration: int, krkn_lib: KrknKubernetes, telemetry: KrknTelemetryKubernetes) -> (list[str], list[ScenarioTelemetry]):
+    scenario_telemetries: list[ScenarioTelemetry] = []
+    failed_post_scenarios = []
+    for scenario in scenarios_list:
+        scenario_telemetry = ScenarioTelemetry()
+        scenario_telemetry.scenario = scenario
+        scenario_telemetry.start_timestamp = time.time()
+        telemetry.set_parameters_base64(scenario_telemetry, scenario)
+        with open(scenario) as stream:
+            scenario_config = yaml.safe_load(stream)
+
+        service_name = scenario_config['service_name']
+        service_namespace = scenario_config['service_namespace']
+        plan = scenario_config["plan"]
+        image = scenario_config["image"]
+        target_port = scenario_config["service_target_port"]
+        chaos_duration = scenario_config["chaos_duration"]
+
+        logging.info(f"checking service {service_name} in namespace: {service_namespace}")
+        if not krkn_lib.service_exists(service_name, service_namespace):
+            logging.error(f"service: {service_name} not found in namespace: {service_namespace}, failed to run scenario.")
+            fail(scenario_telemetry, scenario_telemetries)
+            failed_post_scenarios.append(scenario)
+            continue
+        try:
+            logging.info(f"service: {service_name} found in namespace: {service_namespace}")
+            logging.info("creating webservice and initializing test plan...")
+            # both named ports and port numbers can be used
+            if isinstance(target_port, int):
+                logging.info(f"webservice will listen on port {target_port}")
+                webservice = krkn_lib.deploy_service_hijacking(service_namespace, plan, image, port_number=target_port)
+            else:
+                logging.info(f"traffic will be redirected to named port: {target_port}")
+                webservice = krkn_lib.deploy_service_hijacking(service_namespace, plan, image, port_name=target_port)
+            logging.info(f"successfully deployed pod: {webservice.pod_name} "
+                         f"in namespace: {service_namespace} with selector {webservice.selector}!"
+                         )
+            logging.info(f"patching service: {service_name} to hijack traffic towards: {webservice.pod_name}")
+            original_service = krkn_lib.replace_service_selector([webservice.selector], service_name, service_namespace)
+            if original_service is None:
+                logging.error(f"failed to patch service: {service_name}, namespace: {service_namespace} with selector {webservice.selector}")
+                fail(scenario_telemetry, scenario_telemetries)
+                failed_post_scenarios.append(scenario)
+                continue
+
+            logging.info(f"service: {service_name} successfully patched!")
+            logging.info(f"original service manifest:\n\n{yaml.dump(original_service)}")
+            logging.info(f"waiting {chaos_duration} seconds before restoring the service")
+            time.sleep(chaos_duration)
+            selectors = ["=".join([key, original_service["spec"]["selector"][key]]) for key in original_service["spec"]["selector"].keys()]
+            logging.info(f"restoring the service selectors {selectors}")
+            original_service = krkn_lib.replace_service_selector(selectors, service_name, service_namespace)
+            if original_service is None:
+                logging.error(f"failed to restore original service: {service_name}, namespace: {service_namespace} with selectors: {selectors}")
+                fail(scenario_telemetry, scenario_telemetries)
+                failed_post_scenarios.append(scenario)
+                continue
+            logging.info("selectors successfully restored")
+            logging.info("undeploying service-hijacking resources...")
+            krkn_lib.undeploy_service_hijacking(webservice)
+
+            logging.info("End of scenario. Waiting for the specified duration: %s" % (wait_duration))
+            time.sleep(wait_duration)
+
+            scenario_telemetry.exit_status = 0
+            scenario_telemetry.end_timestamp = time.time()
+            scenario_telemetries.append(scenario_telemetry)
+            logging.info("success")
+        except Exception as e:
+            logging.error(f"scenario {scenario} failed with exception: {e}")
+            fail(scenario_telemetry, scenario_telemetries)
+            failed_post_scenarios.append(scenario)
+
+    return failed_post_scenarios, scenario_telemetries
+
+
+def fail(scenario_telemetry: ScenarioTelemetry, scenario_telemetries: list[ScenarioTelemetry]):
+    scenario_telemetry.exit_status = 1
+    scenario_telemetry.end_timestamp = time.time()
+    scenario_telemetries.append(scenario_telemetry)
+
--- a/kraken/shut_down/common_shut_down_func.py
+++ b/kraken/shut_down/common_shut_down_func.py
@@ -147,7 +147,7 @@ def run(scenarios_list, config, wait_duration, kubecli: KrknKubernetes, telemetr

        scenario_telemetry = ScenarioTelemetry()
        scenario_telemetry.scenario = config_path
-        scenario_telemetry.startTimeStamp = time.time()
+        scenario_telemetry.start_timestamp = time.time()
        telemetry.set_parameters_base64(scenario_telemetry, config_path)

        with open(config_path, "r") as f:
@@ -175,11 +175,11 @@ def run(scenarios_list, config, wait_duration, kubecli: KrknKubernetes, telemetr
            except (RuntimeError, Exception):
                log_exception(config_path)
                failed_scenarios.append(config_path)
-                scenario_telemetry.exitStatus = 1
+                scenario_telemetry.exit_status = 1
            else:
-                scenario_telemetry.exitStatus = 0
+                scenario_telemetry.exit_status = 0

-            scenario_telemetry.endTimeStamp = time.time()
+            scenario_telemetry.end_timestamp = time.time()
            scenario_telemetries.append(scenario_telemetry)

    return failed_scenarios, scenario_telemetries
--- a/kraken/syn_flood/__init__.py
+++ b/kraken/syn_flood/__init__.py
@@ -0,0 +1 @@
+from .syn_flood import *
--- a/kraken/syn_flood/syn_flood.py
+++ b/kraken/syn_flood/syn_flood.py
@@ -0,0 +1,132 @@
+import logging
+import os.path
+import time
+from typing import List
+
+import krkn_lib.utils
+import yaml
+from krkn_lib.k8s import KrknKubernetes
+from krkn_lib.models.telemetry import ScenarioTelemetry
+from krkn_lib.telemetry.k8s import KrknTelemetryKubernetes
+
+
+def run(scenarios_list: list[str], krkn_kubernetes: KrknKubernetes, telemetry: KrknTelemetryKubernetes) -> (list[str], list[ScenarioTelemetry]):
+    scenario_telemetries: list[ScenarioTelemetry] = []
+    failed_post_scenarios = []
+    for scenario in scenarios_list:
+        scenario_telemetry = ScenarioTelemetry()
+        scenario_telemetry.scenario = scenario
+        scenario_telemetry.start_timestamp = time.time()
+        telemetry.set_parameters_base64(scenario_telemetry, scenario)
+
+        try:
+            pod_names = []
+            config = parse_config(scenario)
+            if config["target-service-label"]:
+                target_services = krkn_kubernetes.select_service_by_label(config["namespace"], config["target-service-label"])
+            else:
+                target_services = [config["target-service"]]
+
+            for target in target_services:
+                if not krkn_kubernetes.service_exists(target, config["namespace"]):
+                    raise Exception(f"{target} service not found")
+                for _ in range(config["number-of-pods"]):
+                    pod_name = "syn-flood-" + krkn_lib.utils.get_random_string(10)
+                    krkn_kubernetes.deploy_syn_flood(pod_name,
+                                                     config["namespace"],
+                                                     config["image"],
+                                                     target,
+                                                     config["target-port"],
+                                                     config["packet-size"],
+                                                     config["window-size"],
+                                                     config["duration"],
+                                                     config["attacker-nodes"]
+                                                     )
+                    pod_names.append(pod_name)
+
+            logging.info("waiting for all the attackers to finish")
+            did_finish = False
+            finished_pods = []
+            while not did_finish:
+                for pod_name in pod_names:
+                    if pod_name not in finished_pods and not krkn_kubernetes.is_pod_running(pod_name, config["namespace"]):
+                        finished_pods.append(pod_name)
+                did_finish = set(pod_names) == set(finished_pods)
+                if not did_finish:
+                    time.sleep(1)
+
+        except Exception as e:
+            logging.error(f"Failed to run syn flood scenario {scenario}: {e}")
+            failed_post_scenarios.append(scenario)
+            scenario_telemetry.exit_status = 1
+        else:
+            scenario_telemetry.exit_status = 0
+        scenario_telemetry.end_timestamp = time.time()
+        scenario_telemetries.append(scenario_telemetry)
+    return failed_post_scenarios, scenario_telemetries
+
+def parse_config(scenario_file: str) -> dict:
+    if not os.path.exists(scenario_file):
+        raise Exception(f"failed to load scenario file {scenario_file}")
+
+    try:
+        with open(scenario_file) as stream:
+            config = yaml.safe_load(stream)
+    except Exception:
+        raise Exception(f"{scenario_file} is not a valid yaml file")
+
+    missing = []
+    if not check_key_value(config, "packet-size"):
+        missing.append("packet-size")
+    if not check_key_value(config, "window-size"):
+        missing.append("window-size")
+    if not check_key_value(config, "duration"):
+        missing.append("duration")
+    if not check_key_value(config, "namespace"):
+        missing.append("namespace")
+    if not check_key_value(config, "number-of-pods"):
+        missing.append("number-of-pods")
+    if not check_key_value(config, "target-port"):
+        missing.append("target-port")
+    if not check_key_value(config, "image"):
+        missing.append("image")
+    if "target-service" not in config.keys():
+        missing.append("target-service")
+    if "target-service-label" not in config.keys():
+        missing.append("target-service-label")
+
+
+
+
+    if len(missing) > 0:
+        raise Exception(f"{', '.join(missing)} parameter(s) are missing")
+
+    if not config["target-service"] and not config["target-service-label"]:
+        raise Exception("either target-service or target-service-label must be set")
+    if config["target-service"] and config["target-service-label"]:
+        raise Exception("you cannot select both target-service and target-service-label")
+
+    if "attacker-nodes" in config and not is_node_affinity_correct(config["attacker-nodes"]):
+        raise Exception("attacker-nodes format is not correct")
+    return config
+
+def check_key_value(dictionary, key):
+    if key in dictionary:
+        value = dictionary[key]
+        if value is not None and value != '':
+            return True
+    return False
+
+def is_node_affinity_correct(obj) -> bool:
+    if not isinstance(obj, dict):
+        return False
+    for key in obj.keys():
+        if not isinstance(key, str):
+            return False
+        if not isinstance(obj[key], list):
+            return False
+    return True
+
+
+
+
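The validation rules in `parse_config()` can be summarized as a standalone check. This is a minimal sketch and does not come from the PR. `validate_syn_flood_config`, `REQUIRED_VALUES` and `REQUIRED_KEYS` are hypothetical names. The function validates a dict that has already been loaded instead of reading a YAML file, and it returns error strings instead of raising. It also assumes that `attacker-nodes` is optional and checks it only when the key is present.

```python
from typing import Any

# Hypothetical sketch mirroring the checks in parse_config(); it validates an
# already-loaded dict and returns error strings instead of raising.

# Keys that must be present with a non-empty value.
REQUIRED_VALUES = ["packet-size", "window-size", "duration", "namespace",
                   "number-of-pods", "target-port", "image"]
# Keys that must be present but may be empty ("").
REQUIRED_KEYS = ["target-service", "target-service-label"]


def validate_syn_flood_config(config: dict[str, Any]) -> list[str]:
    missing = [k for k in REQUIRED_VALUES if config.get(k) in (None, "")]
    missing += [k for k in REQUIRED_KEYS if k not in config]
    if missing:
        return [f"{', '.join(missing)} parameter(s) are missing"]

    errors = []
    service = config["target-service"]
    label = config["target-service-label"]
    if not service and not label:
        errors.append("either target-service or target-service-label must be set")
    if service and label:
        errors.append("target-service and target-service-label are mutually exclusive")

    # Assumption: attacker-nodes is optional and is only checked when present.
    if "attacker-nodes" in config:
        nodes = config["attacker-nodes"]
        well_formed = isinstance(nodes, dict) and all(
            isinstance(key, str) and isinstance(values, list)
            for key, values in nodes.items()
        )
        if not well_formed:
            errors.append("attacker-nodes format is not correct")
    return errors


# The values from scenarios/kube/syn_flood.yaml, as a Python dict.
example = {
    "packet-size": 120, "window-size": 64, "duration": 10,
    "namespace": "default", "target-service": "elasticsearch",
    "target-port": 9200, "target-service-label": "", "number-of-pods": 2,
    "image": "quay.io/krkn-chaos/krkn-syn-flood:v1.0.0",
    "attacker-nodes": {"node-role.kubernetes.io/worker": [""]},
}
print(validate_syn_flood_config(example))  # []
```

An empty mapping (`attacker-nodes: {}`) passes this check, because `all()` over no items is true.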
--- a/kraken/time_actions/common_time_functions.py
+++ b/kraken/time_actions/common_time_functions.py
@@ -354,7 +354,7 @@ def run(scenarios_list, config, wait_duration, kubecli:KrknKubernetes, telemetry
    for time_scenario_config in scenarios_list:
        scenario_telemetry = ScenarioTelemetry()
        scenario_telemetry.scenario = time_scenario_config
-        scenario_telemetry.startTimeStamp = time.time()
+        scenario_telemetry.start_timestamp = time.time()
        telemetry.set_parameters_base64(scenario_telemetry, time_scenario_config)
        try:
            with open(time_scenario_config, "r") as f:
@@ -377,12 +377,12 @@ def run(scenarios_list, config, wait_duration, kubecli:KrknKubernetes, telemetry
                        end_time
                    )
        except (RuntimeError, Exception):
-            scenario_telemetry.exitStatus = 1
+            scenario_telemetry.exit_status = 1
            log_exception(time_scenario_config)
            failed_scenarios.append(time_scenario_config)
        else:
-            scenario_telemetry.exitStatus = 0
-        scenario_telemetry.endTimeStamp = time.time()
+            scenario_telemetry.exit_status = 0
+        scenario_telemetry.end_timestamp = time.time()
        scenario_telemetries.append(scenario_telemetry)

    return failed_scenarios, scenario_telemetries
--- a/kraken/zone_outage/actions.py
+++ b/kraken/zone_outage/actions.py
@@ -19,7 +19,7 @@ def run(scenarios_list, config, wait_duration, telemetry: KrknTelemetryKubernete
    for zone_outage_config in scenarios_list:
        scenario_telemetry = ScenarioTelemetry()
        scenario_telemetry.scenario = zone_outage_config
-        scenario_telemetry.startTimeStamp = time.time()
+        scenario_telemetry.start_timestamp = time.time()
        telemetry.set_parameters_base64(scenario_telemetry, zone_outage_config)
        try:
            if len(zone_outage_config) > 1:
@@ -110,12 +110,12 @@ def run(scenarios_list, config, wait_duration, telemetry: KrknTelemetryKubernete
                        end_time
                    )
        except (RuntimeError, Exception):
-            scenario_telemetry.exitStatus = 1
+            scenario_telemetry.exit_status = 1
            failed_scenarios.append(zone_outage_config)
            log_exception(zone_outage_config)
        else:
-            scenario_telemetry.exitStatus = 0
-        scenario_telemetry.endTimeStamp = time.time()
+            scenario_telemetry.exit_status = 0
+        scenario_telemetry.end_timestamp = time.time()
        scenario_telemetries.append(scenario_telemetry)
    return failed_scenarios, scenario_telemetries

--- a/requirements.txt
+++ b/requirements.txt
@@ -1,9 +1,9 @@
 aliyun-python-sdk-core==2.13.36
 aliyun-python-sdk-ecs==4.24.25
-arcaflow==0.9.0
-arcaflow-plugin-sdk==0.10.0
+arcaflow-plugin-sdk==0.14.0
+arcaflow==0.17.2
 boto3==1.28.61
-azure-identity==1.15.0
+azure-identity==1.16.1
 azure-keyvault==4.2.0
 azure-mgmt-compute==30.5.0
 itsdangerous==2.0.1
@@ -14,28 +14,27 @@ gitpython==3.1.41
 google-api-python-client==2.116.0
 ibm_cloud_sdk_core==3.18.0
 ibm_vpc==0.20.0
-jinja2==3.1.3
-krkn-lib==2.1.0
+jinja2==3.1.4
+krkn-lib==2.1.7
 lxml==5.1.0
-kubernetes==26.1.0
+kubernetes==28.1.0
 oauth2client==4.1.3
 pandas==2.2.0
 openshift-client==1.0.21
 paramiko==3.4.0
-podman-compose==1.0.6
 pyVmomi==8.0.2.0.1
 pyfiglet==1.0.2
 pytest==8.0.0
 python-ipmi==0.5.4
 python-openstackclient==6.5.0
-requests==2.31.0
+requests==2.32.2
 service_identity==24.1.0
-PyYAML==6.0
-setuptools==65.5.1
-werkzeug==3.0.1
+PyYAML==6.0.1
+setuptools==70.0.0
+werkzeug==3.0.3
 wheel==0.42.0
 zope.interface==5.4.0

-git+https://github.com/krkn-chaos/arcaflow-plugin-kill-pod.git
+git+https://github.com/krkn-chaos/arcaflow-plugin-kill-pod.git@v0.1.0
 git+https://github.com/vmware/vsphere-automation-sdk-python.git@v8.0.0.0
 cryptography>=42.0.4 # not directly required, pinned by Snyk to avoid a vulnerability
--- a/run_kraken.py
+++ b/run_kraken.py
@@ -25,8 +25,9 @@ import kraken.pvc.pvc_scenario as pvc_scenario
 import kraken.network_chaos.actions as network_chaos
 import kraken.arcaflow_plugin as arcaflow_plugin
 import kraken.prometheus as prometheus_plugin
+import kraken.service_hijacking.service_hijacking as service_hijacking_plugin
 import server as server
-from kraken import plugins
+from kraken import plugins, syn_flood
 from krkn_lib.k8s import KrknKubernetes
 from krkn_lib.ocp import KrknOpenshift
 from krkn_lib.telemetry.elastic import KrknElastic
@@ -264,7 +265,9 @@ def main(cfg):
                                kraken_config,
                                failed_post_scenarios,
                                wait_duration,
-                                telemetry_k8s
+                                telemetry_k8s,
+                                kubecli,
+                                run_uuid
                            )
                            chaos_telemetry.scenarios.extend(scenario_telemetries)
                        # krkn_lib
@@ -332,14 +335,14 @@ def main(cfg):
                        elif scenario_type == "application_outages":
                            logging.info("Injecting application outage")
                            failed_post_scenarios, scenario_telemetries = application_outage.run(
-                                scenarios_list, config, wait_duration, telemetry_k8s)
+                                scenarios_list, config, wait_duration, kubecli, telemetry_k8s)
                            chaos_telemetry.scenarios.extend(scenario_telemetries)

                        # PVC scenarios
                        # krkn_lib
                        elif scenario_type == "pvc_scenarios":
                            logging.info("Running PVC scenario")
-                            failed_post_scenarios, scenario_telemetries = pvc_scenario.run(scenarios_list, config, kubecli, telemetry_k8s)
+                            failed_post_scenarios, scenario_telemetries = pvc_scenario.run(scenarios_list, config, wait_duration, kubecli, telemetry_k8s)
                            chaos_telemetry.scenarios.extend(scenario_telemetries)

                        # Network scenarios
@@ -347,6 +350,14 @@ def main(cfg):
                        elif scenario_type == "network_chaos":
                            logging.info("Running Network Chaos")
                            failed_post_scenarios, scenario_telemetries = network_chaos.run(scenarios_list, config, wait_duration, kubecli, telemetry_k8s)
+                        elif scenario_type == "service_hijacking":
+                            logging.info("Running Service Hijacking Chaos")
+                            failed_post_scenarios, scenario_telemetries = service_hijacking_plugin.run(scenarios_list, wait_duration, kubecli, telemetry_k8s)
+                            chaos_telemetry.scenarios.extend(scenario_telemetries)
+                        elif scenario_type == "syn_flood":
+                            logging.info("Running Syn Flood Chaos")
+                            failed_post_scenarios, scenario_telemetries = syn_flood.run(scenarios_list, kubecli, telemetry_k8s)
+                            chaos_telemetry.scenarios.extend(scenario_telemetries)

                        # Check for critical alerts when enabled
                        post_critical_alerts = 0
--- a/scenarios/arcaflow/cpu-hog/input.yaml
+++ b/scenarios/arcaflow/cpu-hog/input.yaml
@@ -2,7 +2,7 @@ input_list:
  - cpu_count: 1
    cpu_load_percentage: 80
    cpu_method: all
-    duration: 1s
+    duration: 30
    kubeconfig: ''
    namespace: default
    # set the node selector as a key-value pair eg.
--- a/scenarios/arcaflow/cpu-hog/sub-workflow.yaml
+++ b/scenarios/arcaflow/cpu-hog/sub-workflow.yaml
@@ -1,9 +1,9 @@
 version: v0.2.0
 input:
-  root: RootObject
+  root: SubRootObject
  objects:
-    RootObject:
-      id: input_item
+    SubRootObject:
+      id: SubRootObject
      properties:
        kubeconfig:
          display:
@@ -35,7 +35,7 @@ input:
            description: stop stress test after T seconds. One can also specify the units of time in
              seconds, minutes, hours, days or years with the suffix s, m, h, d or y
          type:
-            type_id: string
+            type_id: integer
          required: true
        cpu_count:
          display:
@@ -68,18 +68,18 @@ steps:
      kubeconfig: !expr $.input.kubeconfig
  stressng:
    plugin: 
-      src: quay.io/arcalot/arcaflow-plugin-stressng:0.5.0
+      src: quay.io/arcalot/arcaflow-plugin-stressng:0.6.0
      deployment_type: image
    step: workload
    input:
      cleanup: "true"
-      StressNGParams:
-        timeout: !expr $.input.duration
-        stressors:
-          - stressor: cpu
-            cpu_count: !expr $.input.cpu_count
-            cpu_method: !expr $.input.cpu_method
-            cpu_load: !expr $.input.cpu_load_percentage
+
+      timeout: !expr $.input.duration
+      stressors:
+        - stressor: cpu
+          workers: !expr $.input.cpu_count
+          cpu-method: !expr $.input.cpu_method
+          cpu-load: !expr $.input.cpu_load_percentage
    deploy:
      deployer_name: kubernetes
      connection: !expr $.steps.kubeconfig.outputs.success.connection
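This sub-workflow, along with the matching io-hog and memory-hog changes, moves the stressng step input out of the nested `StressNGParams` object. It also renames each stressor's count field (`cpu_count`, `hdd`, `vm`) to `workers` and switches the remaining stressor keys from snake_case to kebab-case for the 0.6.0 plugin image. The helper below is hypothetical and is not part of krkn or arcaflow-plugin-stressng. It sketches that mapping using only the renames visible in this diff, so other stressors or fields may follow different rules.

```python
from typing import Any

# Hypothetical helper derived from the renames in this diff; it is not part of
# krkn or arcaflow-plugin-stressng. Maps stressor -> old worker-count key.
WORKER_KEYS = {"cpu": "cpu_count", "hdd": "hdd", "vm": "vm"}


def migrate_stressor(entry: dict[str, Any]) -> dict[str, Any]:
    """Rename one stressor entry from the 0.5.0 layout to the 0.6.0 layout."""
    worker_key = WORKER_KEYS.get(entry["stressor"])
    migrated: dict[str, Any] = {}
    for key, value in entry.items():
        if key == "stressor":
            migrated[key] = value
        elif key == worker_key:
            migrated["workers"] = value  # count field -> "workers"
        else:
            migrated[key.replace("_", "-")] = value  # snake_case -> kebab-case
    return migrated


def migrate_input(old: dict[str, Any]) -> dict[str, Any]:
    """Lift StressNGParams fields to the top level and migrate each stressor."""
    new = {k: v for k, v in old.items() if k != "StressNGParams"}
    for key, value in old.get("StressNGParams", {}).items():
        if key == "stressors":
            value = [migrate_stressor(s) for s in value]
        new[key] = value
    return new


old_vm_input = {
    "cleanup": "true",
    "StressNGParams": {
        "timeout": 30,
        "stressors": [{"stressor": "vm", "vm": 2, "vm_bytes": "10%"}],
    },
}
print(migrate_input(old_vm_input))
# {'cleanup': 'true', 'timeout': 30, 'stressors': [{'stressor': 'vm', 'workers': 2, 'vm-bytes': '10%'}]}
```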
--- a/scenarios/arcaflow/cpu-hog/workflow.yaml
+++ b/scenarios/arcaflow/cpu-hog/workflow.yaml
@@ -9,62 +9,10 @@ input:
          type:
            type_id: list
            items:
-              id: input_item
-              type_id: object
-              properties:
-                kubeconfig:
-                  display:
-                    description: The complete kubeconfig file as a string
-                    name: Kubeconfig file contents
-                  type:
-                    type_id: string
-                  required: true
-                namespace:
-                    display:
-                      description: The namespace where the container will be deployed
-                      name: Namespace
-                    type:
-                      type_id: string
-                    required: true
-                node_selector:
-                    display:
-                      description: kubernetes node name where the plugin must be deployed
-                    type:
-                      type_id: map
-                      values:
-                        type_id: string
-                      keys:
-                        type_id: string
-                    required: true
-                duration:
-                  display:
-                    name: duration the scenario expressed in seconds
-                    description: stop stress test after T seconds. One can also specify the units of time in
-                      seconds, minutes, hours, days or years with the suffix s, m, h, d or y
-                  type:
-                    type_id: string
-                  required: true
-                cpu_count:
-                  display:
-                    description: Number of CPU cores to be used (0 means all)
-                    name: number of CPUs
-                  type:
-                    type_id: integer
-                  required: true
-                cpu_method:
-                  display:
-                    description: CPU stress method
-                    name: fine grained control of which cpu stressors to use (ackermann, cfloat etc.)
-                  type:
-                    type_id: string
-                  required: true
-                cpu_load_percentage:
-                  display:
-                    description: load CPU by percentage
-                    name: CPU load
-                  type:
-                    type_id: integer
-                  required: true
+              id: SubRootObject
+              type_id: ref
+              namespace: $.steps.workload_loop.execute.inputs.items
+
 steps:
  workload_loop:
    kind: foreach
--- a/scenarios/arcaflow/io-hog/input.yaml
+++ b/scenarios/arcaflow/io-hog/input.yaml
@@ -1,5 +1,5 @@
 input_list:
- duration: 30s
+- duration: 30
  io_block_size: 1m
  io_workers: 1
  io_write_bytes: 10m
--- a/scenarios/arcaflow/io-hog/sub-workflow.yaml
+++ b/scenarios/arcaflow/io-hog/sub-workflow.yaml
@@ -1,6 +1,6 @@
 version: v0.2.0
 input:
-  root: RootObject
+  root: SubRootObject
  objects:
    hostPath:
      id: HostPathVolumeSource
@@ -18,8 +18,8 @@ input:
          type:
            id: hostPath
            type_id: ref
-    RootObject:
-      id: input_item
+    SubRootObject:
+      id: SubRootObject
      properties:
        kubeconfig:
          display:
@@ -51,7 +51,7 @@ input:
            description: stop  stress  test  after  T  seconds.  One  can  also specify the units of time in
              seconds, minutes, hours, days or years with the suffix s, m, h, d or  y
          type:
-            type_id: string
+            type_id: integer
          required: true
        io_workers:
          display:
@@ -102,19 +102,18 @@ steps:
      kubeconfig: !expr $.input.kubeconfig
  stressng:
    plugin: 
-      src: quay.io/arcalot/arcaflow-plugin-stressng:0.5.0
+      src: quay.io/arcalot/arcaflow-plugin-stressng:0.6.0
      deployment_type: image
    step: workload
    input:
      cleanup: "true"
-      StressNGParams:
-        timeout: !expr $.input.duration
-        workdir: !expr $.input.target_pod_folder
-        stressors:
-          - stressor: hdd
-            hdd: !expr $.input.io_workers
-            hdd_bytes: !expr $.input.io_write_bytes
-            hdd_write_size: !expr $.input.io_block_size
+      timeout: !expr $.input.duration
+      workdir: !expr $.input.target_pod_folder
+      stressors:
+        - stressor: hdd
+          workers: !expr $.input.io_workers
+          hdd-bytes: !expr $.input.io_write_bytes
+          hdd-write-size: !expr $.input.io_block_size

    deploy:
      deployer_name: kubernetes
--- a/scenarios/arcaflow/io-hog/workflow.yaml
+++ b/scenarios/arcaflow/io-hog/workflow.yaml
@@ -2,22 +2,6 @@ version: v0.2.0
 input:
  root: RootObject
  objects:
-    hostPath:
-      id: HostPathVolumeSource
-      properties:
-        path:
-          type:
-            type_id: string
-    Volume:
-      id: Volume
-      properties:
-        name:
-          type:
-            type_id: string
-        hostPath:
-          type:
-            id: hostPath
-            type_id: ref
    RootObject:
      id: RootObject
      properties:
@@ -25,80 +9,9 @@ input:
          type:
            type_id: list
            items:
-              id: input_item
-              type_id: object
-              properties:
-                kubeconfig:
-                  display:
-                    description: The complete kubeconfig file as a string
-                    name: Kubeconfig file contents
-                  type:
-                    type_id: string
-                  required: true
-                namespace:
-                  display:
-                    description: The namespace where the container will be deployed
-                    name: Namespace
-                  type:
-                    type_id: string
-                  required: true
-                node_selector:
-                  display:
-                    description: kubernetes node name where the plugin must be deployed
-                  type:
-                    type_id: map
-                    values:
-                      type_id: string
-                    keys:
-                      type_id: string
-                  required: true
-                duration:
-                  display:
-                    name: duration the scenario expressed in seconds
-                    description: stop  stress  test  after  T  seconds.  One  can  also specify the units of time in
-                      seconds, minutes, hours, days or years with the suffix s, m, h, d or  y
-                  type:
-                    type_id: string
-                  required: true
-                io_workers:
-                  display:
-                    description: number of workers
-                    name: start N workers continually writing, reading  and  removing  temporary  files
-                  type:
-                    type_id: integer
-                  required: true
-                io_block_size:
-                  display:
-                    description: single write size
-                    name: specify size of each write in bytes. Size can be from 1 byte to 4MB.
-                  type:
-                    type_id: string
-                  required: true
-                io_write_bytes:
-                  display:
-                    description: Total number of bytes written
-                    name: write  N  bytes for each hdd process, the default is 1 GB. One can specify the size
-                      as % of free space on the file system or in units  of  Bytes,  KBytes,  MBytes  and
-                      GBytes using the suffix b, k, m or g
-                  type:
-                    type_id: string
-                  required: true
-                target_pod_folder:
-                  display:
-                    description: Target Folder
-                    name: Folder in the pod where the test will be executed and the test files will be written
-                  type:
-                    type_id: string
-                  required: true
-                target_pod_volume:
-                  display:
-                    name: kubernetes volume definition
-                    description: the volume that will be attached to the pod. In order to stress
-                      the node storage only hosPath mode is currently supported
-                  type:
-                    type_id: ref
-                    id: Volume
-                  required: true
+              id: SubRootObject
+              type_id: ref
+              namespace: $.steps.workload_loop.execute.inputs.items
 steps:
  workload_loop:
    kind: foreach
--- a/scenarios/arcaflow/memory-hog/input.yaml
+++ b/scenarios/arcaflow/memory-hog/input.yaml
@@ -1,5 +1,5 @@
 input_list:
- duration: 30s
+- duration: 30
  vm_bytes: 10%
  vm_workers: 2
  # set the node selector as a key-value pair eg.
--- a/scenarios/arcaflow/memory-hog/sub-workflow.yaml
+++ b/scenarios/arcaflow/memory-hog/sub-workflow.yaml
@@ -1,9 +1,9 @@
 version: v0.2.0
 input:
-  root: RootObject
+  root: SubRootObject
  objects:
-    RootObject:
-      id: input_item
+    SubRootObject:
+      id: SubRootObject
      properties:
        kubeconfig:
          display:
@@ -34,7 +34,7 @@ input:
            name: duration the scenario expressed in seconds
            description: stop stress test after T seconds. One can also specify the units of time in seconds, minutes, hours, days or years with the suffix s, m, h, d or  y
          type:
-            type_id: string
+            type_id: integer
          required: true
        vm_workers:
          display:
@@ -60,17 +60,16 @@ steps:
      kubeconfig: !expr $.input.kubeconfig
  stressng:
    plugin: 
-      src: quay.io/arcalot/arcaflow-plugin-stressng:0.5.0
+      src: quay.io/arcalot/arcaflow-plugin-stressng:0.6.0
      deployment_type: image
    step: workload
    input:
      cleanup: "true"
-      StressNGParams:
-        timeout: !expr $.input.duration
-        stressors:
-          - stressor: vm
-            vm: !expr $.input.vm_workers
-            vm_bytes: !expr $.input.vm_bytes
+      timeout: !expr $.input.duration
+      stressors:
+        - stressor: vm
+          workers: !expr $.input.vm_workers
+          vm-bytes: !expr $.input.vm_bytes
    deploy:
      deployer_name: kubernetes
      connection: !expr $.steps.kubeconfig.outputs.success.connection
--- a/scenarios/arcaflow/memory-hog/workflow.yaml
+++ b/scenarios/arcaflow/memory-hog/workflow.yaml
@@ -9,54 +9,10 @@ input:
          type:
            type_id: list
            items:
-              id: input_item
-              type_id: object
-              properties:
-                kubeconfig:
-                  display:
-                    description: The complete kubeconfig file as a string
-                    name: Kubeconfig file contents
-                  type:
-                    type_id: string
-                  required: true
-                namespace:
-                    display:
-                      description: The namespace where the container will be deployed
-                      name: Namespace
-                    type:
-                      type_id: string
-                    required: true
-                node_selector:
-                  display:
-                    description: kubernetes node name where the plugin must be deployed
-                  type:
-                    type_id: map
-                    values:
-                      type_id: string
-                    keys:
-                      type_id: string
-                  required: true
-                duration:
-                  display:
-                    name: duration the scenario expressed in seconds
-                    description: stop stress test after T seconds. One can also specify the units of time in seconds, minutes, hours, days or years with the suffix s, m, h, d or  y
-                  type:
-                    type_id: string
-                  required: true
-                vm_workers:
-                  display:
-                    description: Number of VM stressors to be run (0 means 1 stressor per CPU)
-                    name: Number of VM stressors
-                  type:
-                    type_id: integer
-                  required: true
-                vm_bytes:
-                  display:
-                    description: N bytes per vm process, the default is 256MB. The size can be expressed in units of Bytes, KBytes, MBytes and GBytes using the suffix b, k, m or g.
-                    name: Kubeconfig file contents
-                  type:
-                    type_id: string
-                  required: true
+              id: SubRootObject
+              type_id: ref
+              namespace: $.steps.workload_loop.execute.inputs.items
+
 steps:
  workload_loop:
    kind: foreach
--- a/scenarios/kind/scheduler.yml
+++ b/scenarios/kind/scheduler.yml
@@ -3,8 +3,4 @@
  config:
    namespace_pattern: ^kube-system$
    label_selector: component=kube-scheduler
- id: wait-for-pods
-  config:
-    namespace_pattern: ^kube-system$
-    label_selector: component=kube-scheduler
-    count: 3
+    krkn_pod_recovery_time: 120
--- a/scenarios/kube/pod.yml
+++ b/scenarios/kube/pod.yml
@@ -4,3 +4,4 @@
    name_pattern: ^nginx-.*$
    namespace_pattern: ^default$
    kill: 1
+    krkn_pod_recovery_time: 120
--- a/scenarios/kube/scheduler.yml
+++ b/scenarios/kube/scheduler.yml
@@ -3,8 +3,4 @@
  config:
    namespace_pattern: ^kube-system$
    label_selector: k8s-app=kube-scheduler
- id: wait-for-pods
-  config:
-    namespace_pattern: ^kube-system$
-    label_selector: k8s-app=kube-scheduler
-    count: 3
+    krkn_pod_recovery_time: 120
--- a/scenarios/kube/service_hijacking.yaml
+++ b/scenarios/kube/service_hijacking.yaml
@@ -0,0 +1,56 @@
+# refer to the documentation for further information https://github.com/krkn-chaos/krkn/blob/main/docs/service_hijacking.md
+
+service_target_port: http-web-svc # The port of the service to be hijacked (can be named or numeric, based on the workload and service configuration).
+service_name: nginx-service # name of the service to be hijacked
+service_namespace: default # The namespace where the target service is located
+image: quay.io/krkn-chaos/krkn-service-hijacking:v0.1.3 # Image of the krkn web service to be deployed to receive traffic.
+chaos_duration: 30 # Total duration of the chaos scenario in seconds.
+plan:
+  - resource: "/list/index.php" # Specifies the resource or path to respond to in the scenario. For paths, both the path and query parameters are captured but ignored.
+                                # For resources, only query parameters are captured.
+
+    steps:                      # A time-based plan consisting of steps can be defined for each resource.
+      GET:                      # One or more HTTP methods can be specified for each step.
+                                # Note: Non-standard methods are supported
+                                # for fully custom web services (e.g., using NONEXISTENT instead of POST).
+
+        - duration: 15          # Duration in seconds for this step before moving to the next one, if defined. Otherwise,
+                                # this step will continue until the chaos scenario ends.
+
+          status: 500           # HTTP status code to be returned in this step.
+          mime_type: "application/json" # MIME type of the response for this step.
+          payload: |            # The response payload for this step.
+            {
+              "status":"internal server error"
+            }
+        - duration: 15
+          status: 201
+          mime_type: "application/json"
+          payload: |
+            {
+              "status":"resource created"
+            }
+      POST:
+        - duration: 15
+          status: 401
+          mime_type: "application/json"
+          payload: |
+            {
+               "status": "unauthorized"
+            }
+        - duration: 15
+          status: 404
+          mime_type: "text/plain"
+          payload: "not found"
+
+  - resource: "/patch"
+    steps:
+      PATCH:
+        - duration: 15
+          status: 201
+          mime_type: "text/plain"
+          payload: "resource patched"
+        - duration: 15
+          status: 400
+          mime_type: "text/plain"
+          payload: "bad request"
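The `steps` comments above describe a time-based plan: each step lasts `duration` seconds before the plan moves on to the next step. This sketch reads "if defined" as "if a next step is defined", so the last step stays active until the chaos scenario ends. `active_step` is a hypothetical name and is not part of krkn-service-hijacking.

```python
from typing import Any


def active_step(steps: list[dict[str, Any]], elapsed: float) -> dict[str, Any]:
    """Return the step active `elapsed` seconds after the chaos scenario starts.

    Assumption: each step hands over to the next one after `duration` seconds,
    and the last step stays active until chaos_duration runs out.
    """
    if not steps:
        raise ValueError("at least one step is required")
    boundary = 0.0
    for step in steps[:-1]:
        boundary += step["duration"]  # time at which this step ends
        if elapsed < boundary:
            return step
    return steps[-1]


# The GET plan for "/list/index.php" from the scenario above.
get_steps = [
    {"duration": 15, "status": 500, "mime_type": "application/json"},
    {"duration": 15, "status": 201, "mime_type": "application/json"},
]
for t in (0, 14, 15, 29):
    print(t, active_step(get_steps, t)["status"])
```

With `chaos_duration: 30`, the two 15-second steps cover the whole scenario. In this example, the end of the last step therefore coincides with the end of the scenario.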
--- a/scenarios/kube/syn_flood.yaml
+++ b/scenarios/kube/syn_flood.yaml
@@ -0,0 +1,16 @@
+packet-size: 120 # hping3 packet size
+window-size: 64 # hping3 TCP window size
+duration: 10 # chaos scenario duration
+namespace: default # namespace where the target service(s) are deployed
+target-service: elasticsearch # target service name (if set, target-service-label must be empty)
+target-port: 9200 # target service TCP port
+target-service-label: "" # target service label, used to target multiple services at the same time
+                         # when they share the same label (if set, target-service must be empty)
+number-of-pods: 2 # number of attacker pods instantiated per target
+image: quay.io/krkn-chaos/krkn-syn-flood:v1.0.0 # syn flood attacker container image
+attacker-nodes:                       # node affinity used to schedule the attacker pods. Multiple values can be
+    node-role.kubernetes.io/worker:   # specified for each node label selector, and multiple labels can be specified,
+      - ""                            # so the kube scheduler can place the attacker pods based on the provided labels.
+                                      # Set an empty value (`attacker-nodes: {}`) to let Kubernetes schedule the pods.
+
+
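The `attacker-nodes` mapping pairs each node label key with a list of accepted values. The sketch below shows one plausible translation into a pod `affinity` stanza. It assumes a required node affinity with one `In` expression per label. In Kubernetes, expressions inside a single `nodeSelectorTerm` are ANDed, and the values inside one expression are ORed. `to_node_affinity` is a hypothetical helper. The template that krkn-lib's `deploy_syn_flood` actually renders may differ; for example, it may use a preferred rather than required affinity.

```python
from typing import Any


def to_node_affinity(attacker_nodes: dict[str, list[str]]) -> dict[str, Any]:
    """Build a pod `affinity` stanza from an attacker-nodes mapping.

    Assumption: every label key becomes one `In` match expression inside a
    single required nodeSelectorTerm; krkn-lib's real template may differ.
    """
    if not attacker_nodes:
        return {}  # empty mapping: let the scheduler place the pods freely
    expressions = [
        {"key": key, "operator": "In", "values": list(values)}
        for key, values in attacker_nodes.items()
    ]
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{"matchExpressions": expressions}]
            }
        }
    }


# The default from scenarios/kube/syn_flood.yaml: target worker nodes.
print(to_node_affinity({"node-role.kubernetes.io/worker": [""]}))
```

The value `""` matches nodes that carry the `node-role.kubernetes.io/worker` label with an empty value.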
--- a/scenarios/openshift/container_etcd.yml
+++ b/scenarios/openshift/container_etcd.yml
@@ -5,4 +5,4 @@ scenarios:
  container_name: "etcd"
  action: 1
  count: 1
-  expected_recovery_time: 60
+  expected_recovery_time: 120
--- a/scenarios/openshift/customapp_pod.yaml
+++ b/scenarios/openshift/customapp_pod.yaml
@@ -3,8 +3,4 @@
  config:
    namespace_pattern: ^acme-air$
    name_pattern: .*
- id: wait-for-pods
-  config:
-    namespace_pattern: ^acme-air$
-    name_pattern: .*
-    count: 8
+    krkn_pod_recovery_time: 120
--- a/scenarios/openshift/etcd.yml
+++ b/scenarios/openshift/etcd.yml
@@ -3,8 +3,4 @@
  config:
    namespace_pattern: ^openshift-etcd$
    label_selector: k8s-app=etcd
- id: wait-for-pods
-  config:
-    namespace_pattern: ^openshift-etcd$
-    label_selector: k8s-app=etcd
-    count: 3
+    krkn_pod_recovery_time: 120
--- a/scenarios/openshift/openshift-apiserver.yml
+++ b/scenarios/openshift/openshift-apiserver.yml
@@ -3,8 +3,5 @@
  config:
    namespace_pattern: ^openshift-apiserver$
    label_selector: app=openshift-apiserver-a
- id: wait-for-pods
-  config:
-    namespace_pattern: ^openshift-apiserver$
-    label_selector: app=openshift-apiserver-a
-    count: 3
+    krkn_pod_recovery_time: 120
+
--- a/scenarios/openshift/openshift-kube-apiserver.yml
+++ b/scenarios/openshift/openshift-kube-apiserver.yml
@@ -3,8 +3,5 @@
  config:
    namespace_pattern: ^openshift-kube-apiserver$
    label_selector: app=openshift-kube-apiserver
- id: wait-for-pods
-  config:
-    namespace_pattern: ^openshift-kube-apiserver$
-    label_selector: app=openshift-kube-apiserver
-    count: 3
+    krkn_pod_recovery_time: 120
+
--- a/scenarios/openshift/post_action_prometheus.yml
+++ b/scenarios/openshift/post_action_prometheus.yml
@@ -3,8 +3,4 @@
  config:
    namespace_pattern: ^openshift-monitoring$
    label_selector: app=prometheus
- id: wait-for-pods
-  config:
-    namespace_pattern: ^openshift-monitoring$
-    label_selector: app=prometheus
-    count: 2
+    krkn_pod_recovery_time: 120
--- a/scenarios/openshift/prom_kill.yml
+++ b/scenarios/openshift/prom_kill.yml
@@ -2,8 +2,4 @@
  config:
    namespace_pattern: ^openshift-monitoring$
    label_selector: statefulset.kubernetes.io/pod-name=prometheus-k8s-0
- id: wait-for-pods
-  config:
-    namespace_pattern: ^openshift-monitoring$
-    label_selector: statefulset.kubernetes.io/pod-name=prometheus-k8s-0
-    count: 1
+    krkn_pod_recovery_time: 120
--- a/scenarios/openshift/prometheus.yml
+++ b/scenarios/openshift/prometheus.yml
@@ -3,9 +3,4 @@
  config:
    namespace_pattern: ^openshift-monitoring$
    label_selector: app=prometheus
- id: wait-for-pods
-  config:
-    namespace_pattern: ^openshift-monitoring$
-    label_selector: app=prometheus
-    count: 2
-    timeout: 180
+    krkn_pod_recovery_time: 120
--- a/scenarios/openshift/regex_openshift_pod_kill.yml
+++ b/scenarios/openshift/regex_openshift_pod_kill.yml
@@ -4,3 +4,4 @@
    namespace_pattern: ^openshift-.*$
    name_pattern: .*
    kill: 3
+    krkn_pod_recovery_time: 120
--- a/scenarios/plugin.schema.json
+++ b/scenarios/plugin.schema.json
@@ -60,7 +60,14 @@
 										"default": 1,
 										"title": "Backoff",
 										"description": "How many seconds to wait between checks for the target pod status."
+									},
+									"krkn_pod_recovery_time": {
+										"type": "integer",
+										"default": 30,
+										"title": "Recovery Time",
+										"description": "The expected recovery time of the pod (used by Krkn to monitor the pod lifecycle)."
 									}
+
 								},
 								"required": [
 									"namespace_pattern"
@@ -112,6 +119,12 @@
 								"default": 1,
 								"title": "Backoff",
 								"description": "How many seconds to wait between checks for the target pod status."
+							},
+							"krkn_pod_recovery_time": {
+								"type": "integer",
+								"default": 30,
+								"title": "Recovery Time",
+								"description": "The expected recovery time of the pod (used by Krkn to monitor the pod lifecycle)."
 							}
 						},
 						"required": [
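The scenario files above drop the explicit `wait-for-pods` step in favour of `krkn_pod_recovery_time`, which the schema defaults to 30 seconds. The Python sketch below shows how such a value could bound a recovery check. It is a hypothetical polling loop, not Krkn's actual monitor. The `is_recovered` callable, the `poll_interval` parameter and the injectable clock are assumptions, the last one added so the loop can be tested without sleeping.

```python
import time
from typing import Callable


def wait_for_recovery(
    is_recovered: Callable[[], bool],
    scenario_config: dict,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `is_recovered` until it returns True or the recovery window closes.

    The window comes from `krkn_pod_recovery_time`, falling back to the
    schema default of 30 seconds when the scenario does not set it.
    """
    recovery_time = scenario_config.get("krkn_pod_recovery_time", 30)
    deadline = clock() + recovery_time
    while True:
        if is_recovered():
            return True
        if clock() >= deadline:
            return False  # pods did not recover within the expected time
        sleep(poll_interval)
```

With `krkn_pod_recovery_time: 120`, as set in the OpenShift scenarios, this would wait up to two minutes before reporting a failed recovery.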
--- a/tests/test_ingress_network_plugin.py
+++ b/tests/test_ingress_network_plugin.py
@@ -39,7 +39,7 @@ class NetworkScenariosTest(unittest.TestCase):

    def test_network_chaos(self):
        output_id, output_data = ingress_shaping.network_chaos(
-            ingress_shaping.NetworkScenarioConfig(
+            params=ingress_shaping.NetworkScenarioConfig(
                label_selector="node-role.kubernetes.io/control-plane",
                instance_count=1,
                network_params={
@@ -47,7 +47,8 @@ class NetworkScenariosTest(unittest.TestCase):
                    "loss": "0.02",
                    "bandwidth": "100mbit"
                }
-            )
+            ),
+            run_id="network-shaping-test"
        )
        if output_id == "error":
            logging.error(output_data.error)
--- a/tests/test_run_python_plugin.py
+++ b/tests/test_run_python_plugin.py
@@ -10,7 +10,7 @@ class RunPythonPluginTest(unittest.TestCase):
        tmp_file = tempfile.NamedTemporaryFile()
        tmp_file.write(bytes("print('Hello world!')", 'utf-8'))
        tmp_file.flush()
-        output_id, output_data = run_python_file(RunPythonFileInput(tmp_file.name))
+        output_id, output_data = run_python_file(params=RunPythonFileInput(tmp_file.name), run_id="test-python-plugin-success")
        self.assertEqual("success", output_id)
        self.assertEqual("Hello world!\n", output_data.stdout)

@@ -18,7 +18,7 @@ class RunPythonPluginTest(unittest.TestCase):
        tmp_file = tempfile.NamedTemporaryFile()
        tmp_file.write(bytes("import sys\nprint('Hello world!')\nsys.exit(42)\n", 'utf-8'))
        tmp_file.flush()
-        output_id, output_data = run_python_file(RunPythonFileInput(tmp_file.name))
+        output_id, output_data = run_python_file(params=RunPythonFileInput(tmp_file.name), run_id="test-python-plugin-error")
        self.assertEqual("error", output_id)
        self.assertEqual(42, output_data.exit_code)
        self.assertEqual("Hello world!\n", output_data.stdout)
--- a/utils/chaos_ai/README.md
+++ b/utils/chaos_ai/README.md
@@ -0,0 +1,40 @@
+# aichaos
+Enhancing Chaos Engineering with AI-assisted fault injection for better resiliency and non-functional testing.
+
+## Generate the Python package wheel file
+```
+$ python3.9 generate_wheel_package.py sdist bdist_wheel
+$ cp dist/aichaos-0.0.1-py3-none-any.whl docker/
+```
+This builds the Python package file aichaos-0.0.1-py3-none-any.whl in the dist folder and copies it into docker/, where the Dockerfile expects it.
+
+## Build Image
+```
+$ cd docker
+$ podman build -t aichaos:1.0 .
+OR
+$ docker build -t aichaos:1.0 .
+```
+
+## Run Chaos AI
+```
+$ podman run -v $(pwd)/aichaos-config.json:/config/aichaos-config.json --privileged=true --name aichaos -p 5001:5001 aichaos:1.0
+OR
+$ docker run -v $(pwd)/aichaos-config.json:/config/aichaos-config.json --privileged -v /var/run/docker.sock:/var/run/docker.sock --name aichaos -p 5001:5001 aichaos:1.0
+```
+
+The output should look like:
+```
+$ podman run -v $(pwd)/aichaos-config.json:/config/aichaos-config.json --privileged=true --name aichaos -p 5001:5001 aichaos:1.0
+ * Serving Flask app 'swagger_api' (lazy loading)
+ * Environment: production
+   WARNING: This is a development server. Do not use it in a production deployment.
+   Use a production WSGI server instead.
+ * Debug mode: on
+WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
+ * Running on all addresses (0.0.0.0)
+ * Running on http://127.0.0.1:5001
+ * Running on http://172.17.0.2:5001
+```
+
+You can try out the APIs in a browser at http://<server-ip>:5001/apidocs (e.g. http://127.0.0.1:5001/apidocs). To test, call the `GenerateChaos` API with a kubeconfig file and the URLs of the application under test.
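The following Python sketch is a minimal client for the endpoints defined in `swagger_api.py`. It assumes the server from the example above is reachable at `127.0.0.1:5001`. It uses the third-party `requests` package, which is already listed in `requirements.txt`. The helper names are hypothetical. The form field names come from `config/yml/chaosGen.yml`. The reply formats (`Chaos ID: <n>`, `Completed`, `Running`, `Does not exist`) come from the route handlers.

```python
import re
from typing import Optional

API_BASE = "http://127.0.0.1:5001"  # assumption: local server from the example above


def parse_chaos_id(body: str) -> Optional[str]:
    """Extract the id from a GenerateChaos reply of the form 'Chaos ID: <n>'."""
    match = re.fullmatch(r"Chaos ID: (\d+)", body.strip())
    return match.group(1) if match else None


def generate_chaos(kubeconfig_path: str, namespace: str, podlabels: str, urls: str,
                   nodelabels: str = "", api: str = API_BASE) -> Optional[str]:
    """Upload a kubeconfig and start a run; returns the chaos id (hypothetical helper)."""
    import requests  # third-party, listed in requirements.txt

    form = {"namespace": namespace, "podlabels": podlabels, "urls": urls}
    if nodelabels:  # only needed for node-level faults
        form["nodelabels"] = nodelabels
    with open(kubeconfig_path, "rb") as fh:
        resp = requests.post(f"{api}/GenerateChaos/", files={"file": fh}, data=form, timeout=30)
    resp.raise_for_status()
    return parse_chaos_id(resp.text)


def get_status(chaos_id: str, api: str = API_BASE) -> str:
    """Return the server's status string: 'Completed', 'Running' or 'Does not exist'."""
    import requests

    resp = requests.get(f"{api}/GetStatus/{chaos_id}", timeout=30)
    resp.raise_for_status()
    return resp.text
```

A typical session calls `generate_chaos(...)` once and then polls `get_status(chaos_id)` until it returns `Completed`. Results can then be fetched from `/GetQTable/<id>`, `/GetEpisodes/<id>` or `/GetLog/<id>`.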
--- a/utils/chaos_ai/config/experiments/.gitkeep
+++ b/utils/chaos_ai/config/experiments/.gitkeep
--- a/utils/chaos_ai/docker/Dockerfile
+++ b/utils/chaos_ai/docker/Dockerfile
@@ -0,0 +1,21 @@
+FROM bitnami/kubectl:1.20.9 as kubectl
+FROM python:3.9
+WORKDIR /app
+RUN pip3 install --upgrade pip
+COPY config config/
+COPY requirements.txt .
+RUN mkdir -p /app/logs
+RUN pip3 install -r requirements.txt
+
+COPY --from=kubectl /opt/bitnami/kubectl/bin/kubectl /usr/local/bin/
+
+COPY swagger_api.py .
+ENV PYTHONUNBUFFERED=1
+
+RUN curl -fsSLO https://get.docker.com/builds/Linux/x86_64/docker-17.03.1-ce.tgz && tar --strip-components=1 -xvzf docker-17.03.1-ce.tgz -C /usr/local/bin
+
+RUN apt-get update && apt-get install -y podman
+
+COPY aichaos-0.0.1-py3-none-any.whl .
+RUN pip3 install aichaos-0.0.1-py3-none-any.whl
+CMD ["python3", "swagger_api.py"]
--- a/utils/chaos_ai/docker/aichaos-config.json
+++ b/utils/chaos_ai/docker/aichaos-config.json
@@ -0,0 +1,7 @@
+{
+  "command": "podman",
+  "chaosengine": "kraken",
+  "faults": "pod-delete",
+  "iterations": 1,
+  "maxfaults": 5
+}
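`startchaos()` in `swagger_api.py` first hard-codes defaults and then overwrites all five keys from `/config/aichaos-config.json` when that file exists. A config file that omits any key therefore raises `KeyError`. The Python sketch below shows a more tolerant merge under that assumption. It is a design alternative, not the current behaviour: partial files are accepted and unknown keys are ignored.

```python
import json
import os

# Defaults hard-coded in swagger_api.py's startchaos()
DEFAULTS = {
    "command": "podman",
    "chaosengine": "kraken",
    "faults": "pod-delete",
    "iterations": 1,
    "maxfaults": 5,
}


def load_aichaos_config(path: str = "/config/aichaos-config.json") -> dict:
    """Return DEFAULTS overlaid with any known keys present in the JSON file."""
    params = dict(DEFAULTS)
    if os.path.isfile(path):
        with open(path) as fh:
            overrides = json.load(fh)
        # keep only recognised keys so typos do not silently add new settings
        params.update({k: v for k, v in overrides.items() if k in DEFAULTS})
    return params
```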
--- a/utils/chaos_ai/docker/config/experiments/log.yml
+++ b/utils/chaos_ai/docker/config/experiments/log.yml
@@ -0,0 +1,15 @@
+
+    Get Log from the Chaos ID.---
+    tags:
+      - ChaosAI API Results
+    parameters:
+      - name: chaosid
+        in: path
+        type: string
+        required: true
+        description: Chaos-ID
+    responses:
+      500:
+        description: Error!
+      200:
+        description: Results for the given Chaos ID.
--- a/utils/chaos_ai/docker/config/pod-delete.json
+++ b/utils/chaos_ai/docker/config/pod-delete.json
@@ -0,0 +1,36 @@
+{
+  "apiVersion": "1.0",
+  "kind": "ChaosEngine",
+  "metadata": {
+    "name": "engine-cartns3"
+  },
+  "spec": {
+    "engineState": "active",
+    "annotationCheck": "false",
+    "appinfo": {
+      "appns": "robot-shop",
+      "applabel": "service=payment",
+      "appkind": "deployment"
+    },
+    "chaosServiceAccount": "pod-delete-sa",
+    "experiments": [
+      {
+        "name": "pod-delete",
+        "spec": {
+          "components": {
+            "env": [
+              {
+                "name": "FORCE",
+                "value": "true"
+              },
+              {
+                "name": "TOTAL_CHAOS_DURATION",
+                "value": "120"
+              }
+            ]
+          }
+        }
+      }
+    ]
+  }
+}
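`pod-delete.json` resembles a LitmusChaos-style ChaosEngine. The fault's tunables live in `spec.experiments[].spec.components.env` as name/value pairs whose values are strings. The Python sketch below flattens that list into a dictionary for inspection. The helper names are hypothetical. The path `config/pod-delete.json` assumes the `chaos_dir='config/'` setting and the `pod-delete.json` mapping used in `swagger_api.py`.

```python
import json


def experiment_env(engine: dict, experiment: str) -> dict:
    """Return the env entries of one experiment as a {name: value} dict."""
    for exp in engine["spec"]["experiments"]:
        if exp["name"] == experiment:
            env = exp["spec"]["components"].get("env", [])
            return {item["name"]: item["value"] for item in env}
    raise KeyError(f"experiment {experiment!r} not found")


def load_engine(path: str = "config/pod-delete.json") -> dict:
    """Read a ChaosEngine definition from disk."""
    with open(path) as fh:
        return json.load(fh)
```

For the file above, `experiment_env(load_engine(), "pod-delete")` would give `{"FORCE": "true", "TOTAL_CHAOS_DURATION": "120"}`. Numeric values such as the duration still need an explicit `int(...)`.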
--- a/utils/chaos_ai/docker/config/yml/chaosGen.yml
+++ b/utils/chaos_ai/docker/config/yml/chaosGen.yml
@@ -0,0 +1,40 @@
+
+Generate chaos on an application deployed on a cluster.
+---
+    tags:
+      - ChaosAI API
+    parameters:
+      - name: file
+        in: formData
+        type: file
+        required: true
+        description: Kube-config file
+      - name: namespace
+        in: formData
+        type: string
+        default: robot-shop
+        required: true
+        description: Namespace to test
+      - name: podlabels
+        in: formData
+        type: string
+        default: service=cart,service=payment
+        required: true
+        description: Pod labels to test
+      - name: nodelabels
+        in: formData
+        type: string
+        required: false
+        description: Node labels to test
+      - name: urls
+        in: formData
+        type: string
+        default: http://<application-url>:8097/api/cart/health,http://<application-url>:8097/api/payment/health
+        required: true
+        description: Application URLs to test
+
+    responses:
+      500:
+        description: Error!
+      200:
+        description: Chaos ID for the initiated chaos.
--- a/utils/chaos_ai/docker/config/yml/episodes.yml
+++ b/utils/chaos_ai/docker/config/yml/episodes.yml
@@ -0,0 +1,15 @@
+
+    Get Episodes from the Chaos ID.---
+    tags:
+      - ChaosAI API Results
+    parameters:
+      - name: chaosid
+        in: path
+        type: string
+        required: true
+        description: Chaos-ID
+    responses:
+      500:
+        description: Error!
+      200:
+        description: Results for the given Chaos ID.
--- a/utils/chaos_ai/docker/config/yml/log.yml
+++ b/utils/chaos_ai/docker/config/yml/log.yml
@@ -0,0 +1,15 @@
+
+    Get Log from the Chaos ID.---
+    tags:
+      - ChaosAI API Results
+    parameters:
+      - name: chaosid
+        in: path
+        type: string
+        required: true
+        description: Chaos-ID
+    responses:
+      500:
+        description: Error!
+      200:
+        description: Results for the given Chaos ID.
--- a/utils/chaos_ai/docker/config/yml/qtable.yml
+++ b/utils/chaos_ai/docker/config/yml/qtable.yml
@@ -0,0 +1,15 @@
+
+    Get QTable from the Chaos ID.---
+    tags:
+      - ChaosAI API Results
+    parameters:
+      - name: chaosid
+        in: path
+        type: string
+        required: true
+        description: Chaos-ID
+    responses:
+      500:
+        description: Error!
+      200:
+        description: Results for the given Chaos ID.
--- a/utils/chaos_ai/docker/config/yml/status.yml
+++ b/utils/chaos_ai/docker/config/yml/status.yml
@@ -0,0 +1,15 @@
+
+    Get the status of the given Chaos ID.---
+    tags:
+      - ChaosAI API
+    parameters:
+      - name: chaosid
+        in: path
+        type: string
+        required: true
+        description: Chaos-ID
+    responses:
+      500:
+        description: Error!
+      200:
+        description: Chaos for the given ID.
--- a/utils/chaos_ai/docker/requirements.txt
+++ b/utils/chaos_ai/docker/requirements.txt
@@ -0,0 +1,6 @@
+numpy
+pandas
+requests
+Flask==2.2.5
+Werkzeug==3.0.3
+flasgger==0.9.5
--- a/utils/chaos_ai/docker/swagger_api.py
+++ b/utils/chaos_ai/docker/swagger_api.py
@@ -0,0 +1,186 @@
+import json, os
+import logging
+# import numpy as np
+# import pandas as pd
+import threading
+from datetime import datetime
+from flask import Flask, request
+from flasgger import Swagger
+from flasgger.utils import swag_from
+# import zipfile
+import sys
+
+# sys.path.append("..")
+from src.aichaos_main import AIChaos
+
+app = Flask(__name__)
+Swagger(app)
+flaskdir = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "app", "logs") + '/'
+
+
+class AIChaosSwagger:
+    def __init__(self, flaskdir=''):
+        self.flaskdir = flaskdir
+
+    @app.route("/")
+    def empty(params=''):
+        return "AI Chaos Repository!"
+
+    def startchaos(self, kubeconfigfile, file_id, params):
+        print('[StartChaos]', file_id, kubeconfigfile)
+        dir = flaskdir
+        outfile = ''.join([dir, 'out-', file_id])
+        initfile = ''.join([dir, 'init-', file_id])
+        with open(initfile, 'w'):
+            pass
+        if os.path.exists(outfile):
+            os.remove(outfile)
+        # kubeconfigfile = params['file']
+        os.environ["KUBECONFIG"] = kubeconfigfile
+        # os.environ above already exports KUBECONFIG to child processes; "export" via os.system() would only affect its own subshell
+        os.system("echo $KUBECONFIG")
+        print('setting kubeconfig')
+        params['command'] = 'podman'
+        params['chaosengine'] = 'kraken'
+        params['faults'] = 'pod-delete'
+        params['iterations'] = 1
+        params['maxfaults'] = 5
+        if os.path.isfile('/config/aichaos-config.json'):
+            with open('/config/aichaos-config.json') as f:
+                config_params = json.load(f)
+                params['command'] = config_params['command']
+                params['chaosengine'] = config_params['chaosengine']
+                params['faults']= config_params['faults']
+                params['iterations'] = config_params['iterations']
+                params['maxfaults'] = config_params['maxfaults']
+        # faults = [f + ':' + p for f in params['faults'].split(',') for p in params['podlabels'].split(',')]
+        faults = []
+        for f in params['faults'].split(','):
+            if f in ['pod-delete']:
+                for p in params['podlabels'].split(','):
+                    faults.append(f + ':' + p)
+            elif f in ['network-chaos', 'node-memory-hog', 'node-cpu-hog']:
+                for p in filter(None, params.get('nodelabels', '').split(',')):
+                    faults.append(f + ':' + p)
+            else:
+                pass
+
+        print('#faults:', len(faults), faults)
+        states = {'200': 0, '500': 1, '501': 2, '502': 3, '503': 4, '504': 5,
+                  '401': 6,  '403': 7,  '404': 8,  '429': 9,
+                  'Timeout': 10, 'Other': 11}
+        rewards = {'200': -1, '500': 0.8, '501': 0.8, '502': 0.8, '503': 0.8, '504': 0.8,
+                   '401': 1,  '403': 1,  '404': 1,  '429': 1,
+                   'Timeout': 1, 'Other': 1}
+        logfile = self.flaskdir + 'log_' + str(file_id)
+        qfile = self.flaskdir + 'qfile_' + str(file_id) + '.csv'
+        efile = self.flaskdir + 'efile_' + str(file_id)
+        epfile = self.flaskdir + 'episodes_' + str(file_id) + '.json'
+        # probe_url = params['probeurl']
+        cexp = {'pod-delete': 'pod-delete.json', 'cpu-hog': 'pod-cpu-hog.json',
+                'disk-fill': 'disk-fill.json', 'network-loss': 'network-loss.json',
+                'network-corruption': 'network-corruption.json', 'io-stress': 'io-stress.json'}
+        aichaos = AIChaos(states=states, faults=faults, rewards=rewards,
+                          logfile=logfile, qfile=qfile, efile=efile, epfile=epfile,
+                          urls=params['urls'].split(','), namespace=params['namespace'],
+                          max_faults=int(params['maxfaults']),
+                          num_requests=10, timeout=2,
+                          chaos_engine=params['chaosengine'],
+                          chaos_dir='config/', kubeconfig=kubeconfigfile,
+                          loglevel=logging.DEBUG, chaos_experiment=cexp, iterations=int(params['iterations']),
+                          command=params['command'])
+        print('checking kubeconfig')
+        os.system("echo $KUBECONFIG")
+        aichaos.start_chaos()
+
+        file = open(outfile, "w")
+        file.write('done')
+        file.close()
+        os.remove(initfile)
+        # os.remove(csvfile)
+        # ConstraintsInference().remove_temp_files(dir, file_id)
+        return 'WRITE'
+
+    @app.route('/GenerateChaos/', methods=['POST'])
+    @swag_from('config/yml/chaosGen.yml')
+    def chaos_gen():
+        dir = flaskdir
+        sw = AIChaosSwagger(flaskdir=dir)
+        f = request.files['file']
+        existing = os.listdir(dir)
+        for i in range(10000):
+            fname = 'kubeconfig-'+str(i)
+            if fname not in existing:
+                break
+        kubeconfigfile = ''.join([dir, 'kubeconfig-', str(i)])
+        f.save(kubeconfigfile)
+        # make sure the file exists (f.save() above normally creates it already)
+        open(kubeconfigfile, 'a').close()
+        # print('HEADER:', f.headers)
+        print('[GenerateChaos] reqs:', request.form.to_dict())
+        # print('[GenerateChaos]', f.filename, datetime.now())
+        thread = threading.Thread(target=sw.startchaos, args=(kubeconfigfile, str(i), request.form.to_dict()))
+        thread.daemon = True
+        print(thread.name)
+        thread.start()
+        return 'Chaos ID: ' + str(i)
+
+    @app.route('/GetStatus/<chaosid>', methods=['GET'])
+    @swag_from('config/yml/status.yml')
+    def get_status(chaosid):
+        print('[GetStatus]', chaosid, flaskdir)
+        epfile = flaskdir + 'episodes_' + str(chaosid) + '.json'
+        initfile = ''.join([flaskdir, 'init-', chaosid])
+        if os.path.exists(epfile):
+            return 'Completed'
+        elif os.path.exists(initfile):
+            return 'Running'
+        else:
+            return 'Does not exist'
+
+    @app.route('/GetQTable/<chaosid>', methods=['GET'])
+    @swag_from('config/yml/qtable.yml')
+    def get_qtable(chaosid):
+        print('[GetQTable]', chaosid)
+        qfile = flaskdir + 'qfile_' + str(chaosid) + '.csv'
+        initfile = ''.join([flaskdir, 'init-', chaosid])
+        if os.path.exists(qfile):
+            with open(qfile, "r") as f:
+                return f.read()
+        elif os.path.exists(initfile):
+            return 'Running'
+        else:
+            return 'Invalid Chaos ID: ' + chaosid
+
+    @app.route('/GetEpisodes/<chaosid>', methods=['GET'])
+    @swag_from('config/yml/episodes.yml')
+    def get_episodes(chaosid):
+        print('[GetEpisodes]', chaosid)
+        epfile = flaskdir + 'episodes_' + str(chaosid) + '.json'
+        initfile = ''.join([flaskdir, 'init-', chaosid])
+        if os.path.exists(epfile):
+            with open(epfile, "r") as f:
+                return f.read()
+        elif os.path.exists(initfile):
+            return 'Running'
+        else:
+            return 'Invalid Chaos ID: ' + chaosid
+
+
+    @app.route('/GetLog/<chaosid>', methods=['GET'])
+    @swag_from('config/yml/log.yml')
+    def get_log(chaosid):
+        print('[GetLog]', chaosid)
+        logfile = flaskdir + 'log_' + str(chaosid)
+        initfile = ''.join([flaskdir, 'init-', chaosid])
+        if os.path.exists(logfile):
+            with open(logfile, "r") as f:
+                return f.read()
+        elif os.path.exists(initfile):
+            return 'Running'
+        else:
+            return 'Invalid Chaos ID: ' + chaosid
+
+
+if __name__ == '__main__':
+    app.run(debug=True, host='0.0.0.0', port=5001)
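`startchaos()` expands the comma-separated `faults` setting into `fault:label` strings. Pod-level faults are paired with every pod label and node-level faults with every node label; any other fault name is dropped. The Python sketch below restates that expansion as a standalone, testable function. The function name is hypothetical. Unlike the original loop, it ignores empty label strings, so a missing `nodelabels` field yields no node faults.

```python
from typing import List

POD_FAULTS = {"pod-delete"}
NODE_FAULTS = {"network-chaos", "node-memory-hog", "node-cpu-hog"}


def expand_faults(faults: str, podlabels: str = "", nodelabels: str = "") -> List[str]:
    """Pair every requested fault with every matching label as 'fault:label'."""
    pods = [p for p in podlabels.split(",") if p]
    nodes = [n for n in nodelabels.split(",") if n]
    expanded: List[str] = []
    for fault in faults.split(","):
        if fault in POD_FAULTS:
            expanded.extend(f"{fault}:{p}" for p in pods)
        elif fault in NODE_FAULTS:
            expanded.extend(f"{fault}:{n}" for n in nodes)
        # any other fault name is skipped, as in startchaos()
    return expanded
```

With the `chaosGen.yml` defaults, `expand_faults("pod-delete", "service=cart,service=payment")` produces one action per service. Each entry becomes one column of the Q-table.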
--- a/utils/chaos_ai/generate_wheel_package.py
+++ b/utils/chaos_ai/generate_wheel_package.py
@@ -0,0 +1,21 @@
+import setuptools
+# from setuptools_cythonize import get_cmdclass
+
+setuptools.setup(
+    # cmdclass=get_cmdclass(),
+    name="aichaos",
+    version="0.0.1",
+    author="Sandeep Hans",
+    author_email="shans001@in.ibm.com",
+    description="Chaos AI",
+    long_description="Chaos Engineering using AI",
+    long_description_content_type="text/markdown",
+    url="",
+    packages=setuptools.find_packages(),
+    classifiers=[
+        "Programming Language :: Python :: 3",
+        "License :: OSI Approved :: MIT License",
+        "Operating System :: OS Independent",
+    ],
+    python_requires='>=3.9',
+)
--- a/utils/chaos_ai/requirements.txt
+++ b/utils/chaos_ai/requirements.txt
@@ -0,0 +1,10 @@
+numpy
+pandas
+notebook
+jupyterlab
+jupyter
+seaborn
+requests
+wheel
+Flask==2.1.0
+flasgger==0.9.5
--- a/utils/chaos_ai/src/init.py
+++ b/utils/chaos_ai/src/init.py
--- a/utils/chaos_ai/src/aichaos.py
+++ b/utils/chaos_ai/src/aichaos.py
@@ -0,0 +1,213 @@
+import json
+import os
+import random
+import sys
+
+import numpy as np
+import logging
+
+
+class AIChaos:
+    def __init__(self, states=None, faults=None, rewards=None, pod_names=[], chaos_dir=None,
+                 chaos_experiment='experiment.json',
+                 chaos_journal='journal.json', iterations=1000, static_run=False):
+        self.faults = faults
+        self.pod_names = pod_names
+        self.states = states
+        self.rewards = rewards
+        self.episodes = []
+
+        self.chaos_dir = chaos_dir
+        self.chaos_experiment = chaos_experiment
+        self.chaos_journal = chaos_journal
+
+        self.iterations = iterations
+        # Initialize parameters
+        self.gamma = 0.75  # Discount factor
+        self.alpha = 0.9  # Learning rate
+
+        # Initializing Q-Values
+        # self.Q = np.array(np.zeros([9, 9]))
+        # self.Q = np.array(np.zeros([len(faults), len(faults)]))
+        # currently action is a single fault, later on we will do multiple faults together
+        # For multiple faults, the no of cols in q-matrix will be all combinations of faults (infinite)
+        # eg. {f1,f2},f3,f4,{f4,f5} - f1,f2  in parallel, then f3, then f4,  then f4,f5 in parallel produces end state
+        # self.Q = np.array(np.zeros([len(states), len(states)]))
+        self.Q = np.array(np.zeros([len(states), len(faults)]))
+        self.state_matrix = np.array(np.zeros([len(states), len(states)]))
+
+        # maybe Q should be a dictionary of dictionaries: for each state, a dictionary of faults
+        # Q = {'500': {'f1f2f4': 0.3, 'f1':  0.5}, '404': {'f2': 0.22}}
+
+        self.logger = logging.getLogger()
+        # run from old static experiment and journal files
+        self.static_run = static_run
+
+    # End state is reached when the system is down or returns an error code like '500', '404'
+    def get_next_state(self):
+        self.logger.info('[GET_NEXT_STATE]')
+        f = open(self.chaos_dir + self.chaos_journal)
+        data = json.load(f)
+
+        # before the experiment (if before steady state is false, after is null?)
+        for probe in data['steady_states']['before']['probes']:
+            if not probe['tolerance_met']:
+                # start_state = probe['activity']['tolerance']
+                # end_state = probe['status']
+                start_state, end_state = None, None
+                return start_state, end_state
+
+        # after the experiment
+        for probe in data['steady_states']['after']['probes']:
+            # if probe['output']['status'] == probe['activity']['tolerance']:
+            if not probe['tolerance_met']:
+                # print(probe)
+                start_state = probe['activity']['tolerance']
+                end_state = probe['output']['status']
+                # end_state = probe['status']
+                return start_state, end_state
+        # if tolerances for all probes are met
+        start_state = probe['activity']['tolerance']
+        end_state = probe['activity']['tolerance']
+        return start_state, end_state
+
+    def inject_faults(self, fault, pod_name):
+        self.logger.info('[INJECT_FAULT] ' + fault)
+        f = open(self.chaos_dir + self.chaos_experiment)
+        data = json.load(f)
+        for m in data['method']:
+            if 'provider' in m:
+                if fault == 'kill_microservice':
+                    m['name'] = 'kill-microservice'
+                    m['provider']['module'] = 'chaosk8s.actions'
+                    m['provider']['arguments']['name'] = pod_name
+                else:
+                    m['provider']['arguments']['name_pattern'] = pod_name
+                m['provider']['func'] = fault
+
+                print('[INJECT_FAULT] method:', m)
+                # self.logger.info('[INJECT_FAULT] ' + m['provider']['arguments']['name_pattern'])
+                # self.logger.info('[INJECT_FAULT] ' + str(m))
+
+        exp_file = self.chaos_dir + 'experiment_' + str(random.randint(1, 10)) + '.json'
+        with open(exp_file, 'w') as f:
+            json.dump(data, f)
+        exp_file = self.chaos_dir + 'experiment.json'
+        # execute faults
+        # cmd = 'cd ' + self.chaos_dir + ';chaos run ' + self.chaos_experiment
+        cmd = 'cd ' + self.chaos_dir + ';chaos run ' + exp_file
+        if not self.static_run:
+            os.system(cmd)
+
+    def create_episode(self):
+        self.logger.info('[CREATE_EPISODE]')
+        episode = []
+        while True:
+            # inject more faults
+            # TODO: model - choose faults based on q-learning ...
+            fault_pod = random.choice(self.faults)
+            fault = fault_pod.split(':')[0]
+            pod_name = fault_pod.split(':')[1]
+            # fault = random.choice(self.faults)
+            # pod_name = random.choice(self.pod_names)
+            # fault = lstm_model.get_next_fault(episode)
+            # fault = get_max_prob_fault(episode)
+
+            self.inject_faults(fault, pod_name)
+            start_state, next_state = self.get_next_state()
+            print('[CREATE EPISODE]', start_state, next_state)
+            # if before state tolerance is not met
+            if start_state is None and next_state is None:
+                continue
+
+            episode.append({'fault': fault, 'pod_name': pod_name})
+            self.update_q_fault(fault_pod, episode, start_state, next_state)
+            # self.update_q_fault(fault, episode, start_state, next_state)
+            # if an end_state is reached
+            # if next_state is not None:
+            if start_state != next_state:
+                self.logger.info('[CREATE_EPISODE] EPISODE CREATED:' + str(episode))
+                self.logger.info('[CREATE_EPISODE] END STATE:' + str(next_state))
+                return episode, start_state, next_state
+
+    def update_q_fault(self, fault, episode, start_state, end_state):
+        self.logger.info('[UPDATE_Q]')
+        print('[UPDATE_Q] ', str(start_state), str(end_state))
+        if end_state is None:
+            end_state = start_state
+
+        # reward is dependent on the error response (eg. '404') and length of episode
+        reward = self.rewards[str(end_state)] / len(episode)
+        current_state = self.states[str(start_state)]
+        next_state = self.states[str(end_state)]
+        fault_index = self.faults.index(fault)
+
+        TD = reward + \
+             self.gamma * self.Q[next_state, np.argmax(self.Q[next_state,])] - \
+             self.Q[current_state, fault_index]
+        self.Q[current_state, fault_index] += self.alpha * TD
+
+        # update state matrix
+        TD_state = reward + \
+                   self.gamma * self.state_matrix[next_state, np.argmax(self.state_matrix[next_state,])] - \
+                   self.state_matrix[current_state, next_state]
+        self.state_matrix[current_state, next_state] += self.alpha * TD_state
+
+    # def update_q(self, episode, start_state, end_state):
+    #     self.logger.info('[UPDATE_Q]')
+    #     if end_state is None:
+    #         end_state = start_state
+    #
+    #     # reward is dependent on the error response (eg. '404') and length of episode
+    #     reward = self.rewards[str(end_state)] / len(episode)
+    #     current_state = self.states[str(start_state)]
+    #     next_state = self.states[str(end_state)]
+    #     TD = reward + \
+    #          self.gamma * self.Q[next_state, np.argmax(self.Q[next_state,])] - \
+    #          self.Q[current_state, next_state]
+    #     self.Q[current_state, next_state] += self.alpha * TD
+
+    def start_chaos(self):
+        for i in range(self.iterations):
+            episode, start_state, end_state = self.create_episode()
+            # update Q matrix
+            # will do it with each fault injection
+            # self.update_q(episode, start_state, end_state)
+            print(self.Q)
+            print(self.state_matrix)
+
+
+def test_chaos():
+    svc_list = ['cart', 'catalogue', 'dispatch', 'mongodb', 'mysql', 'payment', 'rabbitmq', 'ratings', 'redis',
+                'shipping', 'user', 'web']
+    # Define faults
+    # faults = ['terminate_pods']
+    #     faults = ['terminate_pods:' + x for x in pod_names]
+    faults = ['kill_microservice:' + x for x in svc_list]
+    # Define the states
+    states = {
+        '200': 0,
+        '500': 1,
+        '404': 2
+    }
+    # Define rewards (used by update_q_fault to score each state transition)
+    rewards = {
+        '200': 0,
+        '500': 0.8,
+        '404': 1
+    }
+
+    # cdir = '/Users/sandeephans/Downloads/chaos/chaostoolkit-samples-master/service-down-not-visible-to-users/'
+    cdir = '/Users/sandeephans/Downloads/openshift/'
+    cexp = 'experiment.json'
+    cjournal = 'journal.json'
+
+    aichaos = AIChaos(states=states, faults=faults, rewards=rewards,
+                      chaos_dir=cdir, chaos_experiment=cexp, chaos_journal=cjournal,
+                      static_run=False)
+    aichaos.start_chaos()
+
+
+if __name__ == '__main__':
+    logging.basicConfig(stream=sys.stdout, level=logging.INFO)
+    test_chaos()
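`update_q_fault` is a tabular Q-learning step. Rows are HTTP-response states and columns are `fault:target` actions. The reward for the reached state is divided by the episode length, so shorter fault sequences that break the application score higher. The Python sketch below restates that update without NumPy so the arithmetic is easy to follow. The function name is hypothetical, and `alpha=0.9` and `gamma=0.75` are the values set in this file's `__init__`. Because the original converts states with `str()`, the keys here are strings.

```python
from typing import Dict, List


def q_update(Q: List[List[float]], states: Dict[str, int], rewards: Dict[str, float],
             faults: List[str], fault: str, start: str, end: str,
             episode_len: int, alpha: float = 0.9, gamma: float = 0.75) -> float:
    """Apply one Q-learning update in place and return the TD error."""
    reward = rewards[end] / episode_len  # shorter episodes earn more
    s, s_next = states[start], states[end]
    a = faults.index(fault)
    # TD error: r + gamma * max_a' Q[s', a'] - Q[s, a]
    td = reward + gamma * max(Q[s_next]) - Q[s][a]
    Q[s][a] += alpha * td
    return td
```

With `Q` all zeros, injecting a fault that moves the application from `'200'` to `'500'` (reward 0.8) in a one-step episode gives a TD error of 0.8. Q[200, fault] becomes 0.9 × 0.8 = 0.72. Repeating the same transition in a two-step episode halves the reward to 0.4. The TD error is then 0.4 − 0.72 = −0.32, which pulls the value down to 0.432.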
--- a/utils/chaos_ai/src/aichaos_main.py
+++ b/utils/chaos_ai/src/aichaos_main.py
@@ -0,0 +1,248 @@
+import json
+import os
+import random
+
+import numpy as np
+import pandas as pd
+import logging
+
+# sys.path.insert(1, os.path.join(sys.path[0], '..'))
+import src.utils as utils
+from src.kraken_utils import KrakenUtils
+from src.qlearning import QLearning
+from src.test_application import TestApplication
+
+
+class AIChaos:
+    def __init__(self, namespace='robot-shop', states=None, faults=None, rewards=None, urls=[], max_faults=5,
+                 service_weights=None, ctd_subsets=None, pod_names=[], chaos_dir='../config/', kubeconfig='~/.kube/config',
+                 chaos_experiment='experiment.json', logfile='log', qfile='qfile.csv', efile='efile', epfile='episodes.json',
+                 loglevel=logging.INFO,
+                 chaos_journal='journal.json', iterations=10, alpha=0.9, gamma=0.2, epsilon=0.3,
+                 num_requests=10, sleep_time=1, timeout=2, chaos_engine='kraken', dstk_probes=None,
+                 static_run=False, all_faults=False, command='podman'):
+        self.namespace = namespace
+        self.faults = faults
+        self.unused_faults = faults.copy()
+        self.all_faults = all_faults
+        self.pod_names = pod_names
+        self.states = states
+        self.rewards = rewards
+        self.urls = urls
+        self.max_faults = max_faults
+        self.episodes = []
+        self.service_weights = service_weights
+        self.ctd_subsets = ctd_subsets
+
+        self.kubeconfig = kubeconfig
+        self.chaos_dir = chaos_dir
+        self.chaos_experiment = chaos_experiment
+        self.chaos_journal = chaos_journal
+        self.command = command
+
+        if chaos_engine == 'kraken':
+            self.chaos_engine = KrakenUtils(namespace, kubeconfig=kubeconfig, chaos_dir=chaos_dir, chaos_experiment=chaos_experiment, command=self.command)
+        else:
+            self.chaos_engine = None
+
+        self.iterations = iterations
+        # Initialize RL parameters
+        self.epsilon = epsilon  # select_faults explores when random.random() > epsilon
+        # self.epsdecay = 0
+
+        # log files
+        self.logfile = logfile
+        self.qfile = qfile
+        self.efile = efile
+        self.epfile = epfile
+        # create (or truncate) the episode and log files
+        open(efile, 'w').close()
+        open(logfile, 'w').close()
+        logging.getLogger("requests").setLevel(logging.WARNING)
+        logging.getLogger("urllib3").setLevel(logging.WARNING)
+        logging.basicConfig(filename=logfile, filemode='w+', level=loglevel)
+        self.logger = logging.getLogger(logfile.replace('/',''))
+        self.logger.addHandler(logging.FileHandler(logfile))
+
+        self.testapp = TestApplication(num_requests, timeout, sleep_time)
+        self.ql = QLearning(gamma, alpha, faults, states, rewards, urls)
+
+        # run from old static experiment and journal files
+        self.static_run = static_run
+
+    def realistic(self, faults_pods):
+        self.logger.debug('[Realistic] ' + str(faults_pods))
+        fp = faults_pods.copy()
+        for f1 in faults_pods:
+            for f2 in faults_pods:
+                if f1 == f2:
+                    continue
+                if f1 in fp and f2 in fp:
+                    f1_fault, load_1 = utils.get_load(f1.split(':')[0])
+                    f1_pod = f1.split(':')[1]
+                    f2_fault, load_2 = utils.get_load(f2.split(':')[0])
+                    f2_pod = f2.split(':')[1]
+                    if f1_pod == f2_pod:
+                        if f1_fault == 'pod-delete':
+                            fp.remove(f2)
+                        elif f1_fault == f2_fault:
+                            # if int(load_1) > int(load_2):
+                            # drop one of two identical faults (different params) on the same pod
+                            fp.remove(f2)
+        if self.service_weights is None:
+            return fp
+
+        fp_copy = fp.copy()
+        for f in fp:
+            f_fault = f.split(':')[0]
+            f_pod = f.split(':')[1].replace('service=', '')
+            self.logger.debug('[ServiceWeights] ' + f + ' ' + str(self.service_weights[f_pod][f_fault]))
+            if self.service_weights[f_pod][f_fault] == 0:
+                fp_copy.remove(f)
+
+        self.logger.debug('[Realistic] ' + str(fp_copy))
+        return fp_copy
+
+    def select_faults(self):
+        max_faults = min(self.max_faults, len(self.unused_faults))
+        num_faults = random.randint(1, max_faults)
+        if self.all_faults:
+            num_faults = len(self.unused_faults)
+        if random.random() > self.epsilon:
+            self.logger.info('[Exploration]')
+            # faults_pods = random.sample(self.faults, k=num_faults)
+            # sample from the unused faults list to avoid starvation
+            faults_pods = random.sample(self.unused_faults, k=num_faults)
+            faults_pods = self.realistic(faults_pods)
+            for f in faults_pods:
+                self.unused_faults.remove(f)
+            if len(self.unused_faults) == 0:
+                self.unused_faults = self.faults.copy()
+        else:
+            self.logger.info('[Exploitation]')
+            first_row = self.ql.Q[:, 0, :][0]
+            top_k_indices = np.argpartition(first_row, -num_faults)[-num_faults:]
+            faults_pods = [self.faults[i] for i in top_k_indices]
+            faults_pods = self.realistic(faults_pods)
+
+        return faults_pods
+
+    def create_episode(self, ctd_subset=None):
+        self.logger.debug('[CREATE_EPISODE]')
+        episode = []
+
+        if ctd_subset is None:
+            faults_pods = self.select_faults()
+        else:
+            faults_pods = ctd_subset
+            self.logger.info('CTD Subset: ' + str(faults_pods))
+
+        # faults_pods = self.realistic(faults_pods)
+        if len(faults_pods) == 0:
+            return [], 200, 200
+
+        engines = []
+        for fp in faults_pods:
+            fault = fp.split(':')[0]
+            pod_name = fp.split(':')[1]
+            engine = self.chaos_engine.inject_faults(fault, pod_name)
+            engines.append(engine)
+            episode.append({'fault': fault, 'pod_name': pod_name})
+        self.logger.info('[create_episode]' + str(faults_pods))
+        engines_running = self.chaos_engine.wait_engines(engines)
+        self.logger.info('[create_episode] engines_running' + str(engines_running))
+        if not engines_running:
+            return None, None, None
+
+        # shuffle the urls so they are probed in random order
+        urls = random.sample(self.urls, len(self.urls))
+        ep_json, start_state, next_state = [], None, None
+        for url in urls:
+            start_state, next_state = self.testapp.test_load(url)
+            self.logger.info('[CREATE EPISODE]' + str(start_state) + ',' + str(next_state))
+            # if before state tolerance is not met
+            if start_state is None and next_state is None:
+                # self.cleanup()
+                self.chaos_engine.stop_engines()
+                continue
+
+                ### episode.append({'fault': fault, 'pod_name': pod_name})
+                # self.update_q_fault(fault_pod, episode, start_state, next_state)
+            url_index = self.urls.index(url)
+            self.logger.info('[CREATEEPISODE]' + str(url) + ':' + str(url_index))
+            for fp in faults_pods:
+                self.ql.update_q_fault(fp, episode, start_state, next_state, url_index)
+            ep_json.append({'start_state': start_state, 'next_state': next_state, 'url': url, 'faults': episode})
+
+        self.logger.debug('[CREATE_EPISODE] EPISODE CREATED:' + str(episode))
+        self.logger.debug('[CREATE_EPISODE] END STATE:' + str(next_state))
+
+        self.chaos_engine.print_result(engines)
+        self.chaos_engine.stop_engines(episode=episode)
+        # ep_json = {'start_state': start_state, 'next_state': next_state, 'faults': episode}
+
+        return ep_json, start_state, next_state
+
+    def start_chaos(self):
+        self.logger.info('[INITIALIZING]')
+        self.logger.info('Logfile: '+self.logfile)
+        self.logger.info('Loggerfile: '+self.logger.handlers[0].stream.name)
+        self.logger.info('Chaos Engine: ' + self.chaos_engine.get_name())
+        self.logger.debug('Faults:' + str(self.faults))
+
+        self.chaos_engine.cleanup()
+        if self.ctd_subsets is None:
+            for i in range(self.iterations):
+                episode, start_state, end_state = self.create_episode()
+                self.logger.debug('[start_chaos]' + str(i) + ' ' + str(episode))
+                if episode is None:
+                    continue
+                # update Q matrix
+                # will do it with each fault injection
+                # self.update_q(episode, start_state, end_state)
+                # if episode['next_state'] != '200':
+                self.episodes.extend(episode)
+                self.logger.info(str(i) + ' ' + str(self.ql.Q[:, 0]))
+                # print(i, self.state_matrix)
+                self.write_q()
+                self.write_episode(episode)
+        else:
+            for i, subset in enumerate(self.ctd_subsets):
+                episode, start_state, end_state = self.create_episode(subset)
+                self.logger.debug('[start_chaos]' + str(episode))
+                if episode is None:
+                    continue
+                self.episodes.extend(episode)
+                self.logger.info(str(i) + ' ' + str(self.ql.Q[:, 0]))
+                self.write_q()
+                self.write_episode(episode)
+
+        self.chaos_engine.cleanup()
+        # self.remove_temp_file()
+        with open(self.epfile, 'w', encoding='utf-8') as f:
+            json.dump(self.episodes, f, ensure_ascii=False, indent=4)
+        self.logger.info('COMPLETE!!!')
+
+    def write_q(self):
+        df = pd.DataFrame(self.ql.Q[:, 0, :], index=self.urls, columns=self.faults)
+        df.to_csv(self.qfile)
+        return df
+
+    def write_episode(self, episode):
+        with open(self.efile, "a") as outfile:
+            for ep in episode:
+                x = [e['fault'] + ':' + e['pod_name'] for e in ep['faults']]
+                x.append(ep['url'])
+                x.append(str(ep['next_state']))
+                outfile.write(','.join(x) + '\n')
+
+    def remove_temp_file(self):
+        mydir = self.chaos_dir + 'experiments'
+        print('Removing temp files from: '+mydir)
+        self.logger.debug('Removing temp files: '+mydir)
+        if not os.path.exists(mydir):
+            return
+        filelist = [f for f in os.listdir(mydir) if f.endswith(".json")]
+        for f in filelist:
+            print(f)
+            os.remove(os.path.join(mydir, f))
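The exploitation branch of `select_faults` takes the Q-table row for the first URL and state index 0 (`Q[:, 0, :][0]`) and keeps the `num_faults` highest-valued faults with `np.argpartition`, which returns the top entries in no particular order. A minimal standalone sketch of that step follows; `top_k_faults` and `choose_faults` are hypothetical helper names. It uses the textbook epsilon-greedy convention (explore with probability `epsilon`), whereas `select_faults` explores when `random.random() > self.epsilon`, i.e. with probability `1 - epsilon`.

```python
import random

import numpy as np


def top_k_faults(q_row, faults, k):
    """Return the k faults with the highest Q-values, best first."""
    q_row = np.asarray(q_row, dtype=float)
    # argpartition moves the k largest values into the last k slots (unordered)
    idx = np.argpartition(q_row, -k)[-k:]
    # order those k indices by descending Q-value
    idx = idx[np.argsort(q_row[idx])[::-1]]
    return [faults[i] for i in idx]


def choose_faults(q_row, faults, k, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # exploration: a uniformly random subset of k faults
        return rng.sample(faults, k)
    return top_k_faults(q_row, faults, k)


faults = ['pod-delete:app=cart', 'pod-delete:app=web', 'node-cpu-hog:node=w1']
print(top_k_faults([0.2, 0.9, 0.5], faults, 2))
# ['pod-delete:app=web', 'node-cpu-hog:node=w1']
```

Sorting the partitioned indices is optional for selection, but it makes logged fault lists easier to compare across episodes.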
--- a/utils/chaos_ai/src/experiments.py
+++ b/utils/chaos_ai/src/experiments.py
@@ -0,0 +1,56 @@
+import random
+
+
+class Experiments:
+    def __init__(self):
+        self.k = 0
+
+    def monotonic(self, aichaos, num_sets=3):
+        for i in range(num_sets):
+            faults_pods = random.sample(aichaos.faults, k=2)
+            faults_set = [[faults_pods[0]], [faults_pods[1]], [faults_pods[0], faults_pods[1]]]
+
+            resp1, resp2, resp_both = 0, 0, 0
+            for set_index, fl in enumerate(faults_set):
+                engines = []
+                for fp in fl:
+                    fault = fp.split(':')[0]
+                    pod_name = fp.split(':')[1]
+                    engine = aichaos.chaos_engine.inject_faults(fault, pod_name)
+                    engines.append(engine)
+                aichaos.chaos_engine.wait_engines(engines)
+
+                for index, url in enumerate(aichaos.urls):
+                    start_state, next_state = aichaos.testapp.test_load(url)
+                    print(i, fl, next_state)
+                    # self.write(str(fl), next_state)
+                    if set_index == 0:
+                        resp1 = next_state
+                    elif set_index == 1:
+                        resp2 = next_state
+                    else:
+                        resp_both = next_state
+
+                aichaos.chaos_engine.stop_engines()
+            self.write_resp(str(faults_set[2]), resp1, resp2, resp_both)
+        print('Experiment Complete!!!')
+
+    @staticmethod
+    def write(fault, next_state):
+        with open("experiment", "a") as outfile:
+            outfile.write(fault + ',' + str(next_state) + ',' + '\n')
+
+
+    @staticmethod
+    def write_resp(faults, resp1, resp2, resp3):
+        monotonic = True
+        if str(resp3) == '200':
+            if str(resp1) != '200' or str(resp2) != '200':
+                monotonic = False
+        else:
+            if str(resp1) == '200' and str(resp2) == '200':
+                monotonic = False
+
+        with open("experiment", "a") as outfile:
+            # outfile.write(faults + ',' + str(resp1) + ',' + '\n')
+            outfile.write(faults + ',' + str(resp1) + ',' + str(resp2) + ',' + str(resp3) + ',' + str(monotonic) + '\n')
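`write_resp` labels a fault pair as monotonic when the combined run agrees with the individual runs: if the pair leaves the service healthy, each fault alone must too, and if the pair breaks it, at least one fault alone must break it. A pure-function sketch of that rule (the name `is_monotonic` is hypothetical) normalizes responses with `str()` first, because `TestApplication.test_load` returns the string `'200'` on success but the integer status code on failure.

```python
def is_monotonic(resp1, resp2, resp_both):
    """Apply the write_resp rule to responses for fault A, fault B, and A+B."""
    ok1, ok2, ok_both = (str(r) == '200' for r in (resp1, resp2, resp_both))
    if ok_both:
        # the pair is harmless, so each fault alone should be harmless too
        return ok1 and ok2
    # the pair breaks the service, so at least one fault alone should break it
    return not (ok1 and ok2)


print(is_monotonic(500, '200', 503))
```

A `False` result in the second branch flags an interaction: two faults that are individually tolerated but fail together.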
--- a/utils/chaos_ai/src/kraken_utils.py
+++ b/utils/chaos_ai/src/kraken_utils.py
@@ -0,0 +1,99 @@
+import json
+import os
+import time
+import logging
+
+import src.utils as utils
+
+
+class KrakenUtils:
+    def __init__(self, namespace='robot-shop', chaos_dir='../config/',
+                 chaos_experiment='experiment.json', kubeconfig='~/.kube/config', wait_checks=60, command='podman'):
+        self.chaos_dir = chaos_dir
+        self.chaos_experiment = chaos_experiment
+        self.namespace = namespace
+        self.kubeconfig = kubeconfig
+        self.logger = logging.getLogger()
+        self.engines = []
+        self.wait_checks = wait_checks
+        self.command = command
+
+    def exp_status(self, engine='engine-cartns3'):
+        substring_list = ['Waiting for the specified duration','Waiting for wait_duration', 'Step workload started, waiting for response']
+        substr = '|'.join(substring_list)
+        # cmd = "docker logs "+engine+" 2>&1 | grep Waiting"
+        # cmd = "docker logs "+engine+" 2>&1 | grep -E '"+substr+"'"
+        cmd = self.command +" logs "+engine+" 2>&1 | grep -E '"+substr+"'"
+        line = os.popen(cmd).read()
+        self.logger.debug('[exp_status]'+line)
+        # if 'Waiting for the specified duration' in line:
+        # if 'Waiting for' in line or 'waiting for' in line:
+        # if 'Waiting for the specified duration' in line or 'Waiting for wait_duration' in line or 'Step workload started, waiting for response' in line:
+        if any(map(line.__contains__, substring_list)):
+            return 'Running'
+        return 'Not Running'
+
+    # log the engines that ran; unlike the litmus flow, no chaos result is inspected here
+    def print_result(self, engines):
+        # self.logger.debug('')
+        for e in engines:
+            # cmd = 'kubectl describe chaosresult ' + e + ' -n ' + self.namespace + ' | grep "Fail Step:"'
+            # line = os.popen(cmd).read()
+            # self.logger.debug('[Chaos Result] '+e+' : '+line)
+            self.logger.debug('[KRAKEN][Chaos Result] '+e)
+
+    def wait_engines(self, engines=[]):
+        status = 'Completed'
+        max_checks = self.wait_checks
+        for e in engines:
+            self.logger.info('[Wait Engines] ' + e)
+            for i in range(max_checks):
+                status = self.exp_status(e)
+                if status == 'Running':
+                    break
+                time.sleep(1)
+            # return False, if even one engine is not running
+            if status != 'Running':
+                return False
+
+        self.engines = engines
+        # return True if all engines are running
+        return True
+
+
+    def cleanup(self):
+        self.logger.debug('Removing previous engines')
+        # cmd = "docker rm $(docker ps -q -f 'status=exited')"
+        if len(self.engines) > 0:
+            cmd = self.command+" stop " + " ".join(self.engines) + " >> temp"
+            os.system(cmd)
+        self.engines = []
+
+        cmd = self.command+" container prune -f >> temp"
+        os.system(cmd)
+        self.logger.debug('Engines removed')
+
+    def stop_engines(self, episode=[]):
+        self.cleanup()
+
+    def get_name(self):
+        return 'kraken'
+
+    def inject_faults(self, fault, pod_name):
+        self.logger.debug('[KRAKEN][INJECT_FAULT] ' + fault + ':' + pod_name)
+        fault, load = utils.get_load(fault)
+        engine = 'engine-' + pod_name.replace('=', '-').replace('/','-') + '-' + fault
+        if fault == 'pod-delete':
+            cmd = self.command+' run  -d -e NAMESPACE='+self.namespace+' -e POD_LABEL='+pod_name+' --name='+engine+' --net=host -v '+self.kubeconfig+':/root/.kube/config:Z quay.io/redhat-chaos/krkn-hub:pod-scenarios >> temp'
+        elif fault == 'network-chaos':
+            # 'docker run -e NODE_NAME=minikube-m03 -e DURATION=10  --name=knetwork --net=host -v /home/chaos/.kube/kube-config-raw:/root/.kube/config:Z -d quay.io/redhat-chaos/krkn-hub:network-chaos >> temp'        
+            cmd = self.command+' run -d -e NODE_NAME='+pod_name+' -e DURATION=120 --name='+engine+' --net=host -v '+self.kubeconfig+':/root/.kube/config:Z quay.io/redhat-chaos/krkn-hub:network-chaos >> temp'
+        elif fault == 'node-memory-hog':
+            cmd = self.command+' run -d -e NODE_NAME='+pod_name+' -e DURATION=120 -e NODES_AFFECTED_PERC=100 --name='+engine+' --net=host -v '+self.kubeconfig+':/root/.kube/config:Z -d quay.io/redhat-chaos/krkn-hub:node-memory-hog >> temp'
+        elif fault == 'node-cpu-hog':
+            cmd = self.command+' run -e NODE_SELECTORS='+pod_name+' -e NODE_CPU_PERCENTAGE=100 -e NAMESPACE='+self.namespace+' -e TOTAL_CHAOS_DURATION=120 -e NODE_CPU_CORE=100 --name='+engine+' --net=host --env-host=true -v '+self.kubeconfig+':/root/.kube/config:Z -d quay.io/redhat-chaos/krkn-hub:node-cpu-hog >> temp'
+        else:
+            cmd = 'echo'
+        self.logger.debug('[KRAKEN][INJECT_FAULT] ' + cmd)
+        os.system(cmd)
+        return engine
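`inject_faults` assembles each `podman`/`docker` invocation by string concatenation and runs it through `os.system`, so a label containing shell metacharacters would be interpreted by the shell. A sketch that builds an argument list instead, keeping the same flags and image as the `pod-delete` branch; `pod_delete_argv` is a hypothetical helper, and only that one fault type is shown. Because no shell is involved, a `~` in the kubeconfig path would no longer be expanded, so the sketch calls `os.path.expanduser` explicitly.

```python
import os
import shlex


def pod_delete_argv(command, namespace, pod_label, kubeconfig,
                    image='quay.io/redhat-chaos/krkn-hub:pod-scenarios'):
    """Build (engine_name, argv) for the pod-delete container started by inject_faults."""
    # same naming scheme as inject_faults: '=' and '/' in the label become '-'
    engine = 'engine-' + pod_label.replace('=', '-').replace('/', '-') + '-pod-delete'
    # without a shell, '~' is not expanded, so expand it explicitly
    kubeconfig = os.path.expanduser(kubeconfig)
    argv = [
        command, 'run', '-d',
        '-e', 'NAMESPACE=' + namespace,
        '-e', 'POD_LABEL=' + pod_label,
        '--name=' + engine,
        '--net=host',
        '-v', kubeconfig + ':/root/.kube/config:Z',
        image,
    ]
    return engine, argv


engine, argv = pod_delete_argv('podman', 'robot-shop', 'service=cart', '/tmp/kubeconfig')
# shlex.join quotes arguments only where needed, which suits debug logging
print(shlex.join(argv))
```

To launch it, `subprocess.run(argv, check=True, stdout=log_file)` (with `log_file` an open file handle) would replace both `os.system` and the `>> temp` redirection.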
--- a/utils/chaos_ai/src/qlearning.py
+++ b/utils/chaos_ai/src/qlearning.py
@@ -0,0 +1,62 @@
+import logging
+
+import numpy as np
+
+
+class QLearning:
+    def __init__(self, gamma=None, alpha=None, faults=None, states=None, rewards=None, urls=None):
+        self.gamma = gamma  # Discount factor
+        self.alpha = alpha  # Learning rate
+        self.faults = faults
+        self.states = states
+        self.rewards = rewards
+
+        # Initializing Q-Values
+        # self.Q = np.array(np.zeros([len(states), len(states)]))
+        self.Q = np.zeros((len(urls), len(states), len(faults)))
+        self.state_matrix = np.zeros((len(states), len(states)))
+
+        self.logger = logging.getLogger()
+
+    def update_q_fault(self, fault, episode, start_state, end_state, url_index):
+        self.logger.info('[UPDATE_Q] ' + str(url_index) + ' ' + fault + ' ' + str(start_state) + '->' + str(end_state))
+        if end_state is None:
+            end_state = start_state
+        if str(end_state) not in self.states:
+            end_state = 'Other'
+        # reward depends on the error response (e.g. '404') and on the number of faults in the episode
+        reward = self.rewards[str(end_state)] / len(episode)
+        current_state = self.states[str(start_state)]
+        next_state = self.states[str(end_state)]
+        fault_index = self.faults.index(fault)
+        # self.logger.debug('[update_q]' + fault + ' ' + str(fault_index) + ' ' + str(reward))
+        # self.logger.debug('reward, gamma: ' + str(reward) + ' ' + str(self.gamma))
+        # self.logger.debug(
+        #     'gamma*val' + str(self.gamma * self.Q[url_index, next_state, np.argmax(self.Q[url_index, next_state,])]))
+        # self.logger.debug('current state val:' + str(self.Q[url_index, current_state, fault_index]))
+
+        TD = reward + \
+             self.gamma * np.max(self.Q[url_index, next_state]) - \
+             self.Q[url_index, current_state, fault_index]
+        self.Q[url_index, current_state, fault_index] += self.alpha * TD
+
+        # update state matrix
+        TD_state = reward + \
+                   self.gamma * np.max(self.state_matrix[next_state]) - \
+                   self.state_matrix[current_state, next_state]
+        self.state_matrix[current_state, next_state] += self.alpha * TD_state
+        # self.logger.debug('updated Q' + str(self.Q[url_index, current_state, fault_index]))
+
+    # def update_q(self, episode, start_state, end_state):
+    #     self.logger.info('[UPDATE_Q]')
+    #     if end_state is None:
+    #         end_state = start_state
+    #
+    #     # reward is dependent on the error response (eg. '404') and length of episode
+    #     reward = self.rewards[str(end_state)] / len(episode)
+    #     current_state = self.states[str(start_state)]
+    #     next_state = self.states[str(end_state)]
+    #     TD = reward + \
+    #          self.gamma * self.Q[next_state, np.argmax(self.Q[next_state,])] - \
+    #          self.Q[current_state, next_state]
+    #     self.Q[current_state, next_state] += self.alpha * TD
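`update_q_fault` applies the tabular Q-learning update to the cell for (url, start state, fault): `Q[u, s, f] += alpha * (r + gamma * max(Q[u, s_next]) - Q[u, s, f])`, where the reward `r` is the `rewards` entry for the end state divided by the number of faults in the episode. A small numeric sketch with the `AIChaos` defaults `alpha=0.9` and `gamma=0.2`, on a hypothetical 1-url, 2-state, 2-fault table (`td_update` is an illustrative name):

```python
import numpy as np


def td_update(Q, url_index, s, fault_index, s_next, reward, alpha=0.9, gamma=0.2):
    """One tabular Q-learning step on Q[url, state, fault], as in QLearning.update_q_fault."""
    # best value reachable from the next state for the same url
    target = reward + gamma * np.max(Q[url_index, s_next])
    # move the current estimate a fraction alpha towards the target
    Q[url_index, s, fault_index] += alpha * (target - Q[url_index, s, fault_index])
    return Q[url_index, s, fault_index]


# hypothetical table: 1 url, 2 states ('200' -> 0, '500' -> 1), 2 faults
Q = np.zeros((1, 2, 2))
# a '500' end state (reward 0.8) after an episode with 2 faults gives r = 0.8 / 2
first = td_update(Q, 0, 0, 1, 1, reward=0.8 / 2)   # 0.9 * 0.4, about 0.36
second = td_update(Q, 0, 0, 1, 1, reward=0.8 / 2)  # 0.36 + 0.9 * (0.4 - 0.36), about 0.396
```

Repeated updates converge towards the target (0.4 here, since the next-state row stays at zero), with `alpha` controlling how fast.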
--- a/utils/chaos_ai/src/swagger_api.py
+++ b/utils/chaos_ai/src/swagger_api.py
@@ -0,0 +1,171 @@
+import json, os
+import logging
+# import numpy as np
+# import pandas as pd
+import threading
+from datetime import datetime
+from flask import Flask, request
+from flasgger import Swagger
+from flasgger.utils import swag_from
+# import zipfile
+import sys
+
+sys.path.append("..")
+from aichaos_main import AIChaos
+
+app = Flask(__name__)
+Swagger(app)
+flaskdir = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "config", "experiments",
+                        "flask") + '/'
+
+
+class AIChaosSwagger:
+    def __init__(self, flaskdir=''):
+        self.flaskdir = flaskdir
+
+    @app.route("/")
+    def empty(params=''):
+        return "AI Chaos Repository!"
+
+    def startchaos(self, kubeconfigfile, file_id, params):
+        print('[StartChaos]', file_id, kubeconfigfile)
+        dir = flaskdir
+        outfile = ''.join([dir, 'out-', file_id])
+        initfile = ''.join([dir, 'init-', file_id])
+        with open(initfile, 'w'):
+            pass
+        if os.path.exists(outfile):
+            os.remove(outfile)
+        # cons = ConstraintsInference(outdir=dir).get_constraints(csvfile, file_id, params, verbose=False,
+        #                                                         write_local=False)
+        os.environ["KUBECONFIG"] = kubeconfigfile
+        params['command'] = 'podman'
+        params['chaos_engine'] = 'kraken'
+        params['faults'] = 'pod-delete'
+        params['iterations'] = 1
+        params['maxfaults'] = 5
+        if os.path.isfile('/config/aichaos-config.json'):
+            with open('/config/aichaos-config.json') as f:
+                config_params = json.load(f)
+                params['command'] = config_params['command']
+                params['chaos_engine'] = config_params['chaos_engine']
+                params['faults']= config_params['faults']
+                params['iterations'] = config_params['iterations']
+                params['maxfaults'] = config_params['maxfaults']
+        faults = [f + ':' + p for f in params['faults'].split(',') for p in params['podlabels'].split(',')]
+        print('#faults:', len(faults), faults)
+        states = {'200': 0, '500': 1, '502': 2, '503': 3, '404': 4, 'Timeout': 5}
+        rewards = {'200': -1, '500': 0.8, '502': 0.8, '503': 0.8, '404': 1, 'Timeout': 1}
+        logfile = self.flaskdir + 'log_' + str(file_id)
+        qfile = self.flaskdir + 'qfile_' + str(file_id) + '.csv'
+        efile = self.flaskdir + 'efile_' + str(file_id)
+        epfile = self.flaskdir + 'episodes_' + str(file_id) + '.json'
+        probe_url = params['probeurl']
+        probes = {'pod-delete': 'executeprobe', 'cpu-hog': 'wolffi/cpu_load', 'disk-fill': 'wolffi/memory_load',
+                  'io_load': 'wolffi/io_load', 'http_delay': 'wolffi/http_delay', 'packet_delay': 'wolffi/packet_delay',
+                  'packet_duplication': 'wolffi/packet_duplication', 'packet_loss': 'wolffi/packet_loss',
+                  'packet_corruption': 'wolffi/packet_corruption',
+                  'packet_reordering': 'wolffi/packet_reordering', 'network_load': 'wolffi/network_load',
+                  'http_bad_request': 'wolffi/http_bad_request',
+                  'http_unauthorized': 'wolffi/http_unauthorized', 'http_forbidden': 'wolffi/http_forbidden',
+                  'http_not_found': 'wolffi/http_not_found',
+                  'http_method_not_allowed': 'wolffi/http_method_not_allowed',
+                  'http_not_acceptable': 'wolffi/http_not_acceptable',
+                  'http_request_timeout': 'wolffi/http_request_timeout',
+                  'http_unprocessable_entity': 'wolffi/http_unprocessable_entity',
+                  'http_internal_server_error': 'wolffi/http_internal_server_error',
+                  'http_not_implemented': 'wolffi/http_not_implemented',
+                  'http_bad_gateway': 'wolffi/http_bad_gateway',
+                  'http_service_unavailable': 'wolffi/http_service_unavailable',
+                  'bandwidth_restrict': 'wolffi/bandwidth_restrict',
+                  'pod_cpu_load': 'wolffi/pod_cpu_load', 'pod_memory_load': 'wolffi/pod_memory_load',
+                  'pod_io_load': 'wolffi/pod_io_load',
+                  'pod_network_load': 'wolffi/pod_network_load'
+                  }
+        dstk_probes = {k: probe_url + v for k, v in probes.items()}
+        cexp = {'pod-delete': 'pod-delete.json', 'cpu-hog': 'pod-cpu-hog.json',
+                'disk-fill': 'disk-fill.json', 'network-loss': 'network-loss.json',
+                'network-corruption': 'network-corruption.json', 'io-stress': 'io-stress.json'}
+        aichaos = AIChaos(states=states, faults=faults, rewards=rewards,
+                          logfile=logfile, qfile=qfile, efile=efile, epfile=epfile,
+                          urls=params['urls'].split(','), namespace=params['namespace'],
+                          max_faults=params['maxfaults'],
+                          num_requests=10, timeout=2,
+                          chaos_engine=params['chaos_engine'], dstk_probes=dstk_probes, command=params['command'],
+                          loglevel=logging.DEBUG, chaos_experiment=cexp, iterations=params['iterations'])
+        aichaos.start_chaos()
+
+        file = open(outfile, "w")
+        file.write('done')
+        file.close()
+        os.remove(initfile)
+        # os.remove(csvfile)
+        # ConstraintsInference().remove_temp_files(dir, file_id)
+        return 'WRITE'
+
+    @app.route('/GenerateChaos/', methods=['POST'])
+    @swag_from('../config/yml/chaosGen.yml')
+    def chaos_gen():
+        dir = flaskdir
+        sw = AIChaosSwagger(flaskdir=dir)
+        f = request.files['file']
+        existing = os.listdir(dir)
+        for i in range(10000):
+            if str(i) not in existing:
+                break
+        kubeconfigfile = ''.join([dir, str(i)])
+        f.save(kubeconfigfile)
+        print('HEADER:', f.headers)
+        print('[GenerateChaos] reqs:', request.form.to_dict())
+        print('[GenerateChaos]', f.filename, datetime.now())
+        # thread = threading.Thread(target=sw.write_constraints, args=(csvfile, str(i), parameters))
+        thread = threading.Thread(target=sw.startchaos, args=(kubeconfigfile, str(i), request.form.to_dict()))
+        thread.daemon = True
+        print(thread.name)
+        thread.start()
+        return 'Chaos ID: ' + str(i)
+
+    @app.route('/GetStatus/<chaosid>', methods=['GET'])
+    @swag_from('../config/yml/status.yml')
+    def get_status(chaosid):
+        print('[GetStatus]', chaosid, flaskdir)
+        epfile = flaskdir + 'episodes_' + str(chaosid) + '.json'
+        initfile = ''.join([flaskdir, 'init-', chaosid])
+        if os.path.exists(epfile):
+            return 'Completed'
+        elif os.path.exists(initfile):
+            return 'Running'
+        else:
+            return 'Does not exist'
+
+    @app.route('/GetQTable/<chaosid>', methods=['GET'])
+    @swag_from('../config/yml/qtable.yml')
+    def get_qtable(chaosid):
+        print('[GetQTable]', chaosid)
+        qfile = flaskdir + 'qfile_' + str(chaosid) + '.csv'
+        initfile = ''.join([flaskdir, 'init-', chaosid])
+        if os.path.exists(qfile):
+            with open(qfile, "r") as f:
+                return f.read()
+        elif os.path.exists(initfile):
+            return 'Running'
+        else:
+            return 'Invalid Chaos ID: ' + chaosid
+
+    @app.route('/GetEpisodes/<chaosid>', methods=['GET'])
+    @swag_from('../config/yml/episodes.yml')
+    def get_episodes(chaosid):
+        print('[GetEpisodes]', chaosid)
+        epfile = flaskdir + 'episodes_' + str(chaosid) + '.json'
+        initfile = ''.join([flaskdir, 'init-', chaosid])
+        if os.path.exists(epfile):
+            with open(epfile, "r") as f:
+                return f.read()
+        elif os.path.exists(initfile):
+            return 'Running'
+        else:
+            return 'Invalid Chaos ID: ' + chaosid
+
+
+if __name__ == '__main__':
+    app.run(debug=True, host='0.0.0.0', port=5001)
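The service is asynchronous: `POST /GenerateChaos/` saves the uploaded kubeconfig, starts `startchaos` on a daemon thread and immediately returns `Chaos ID: <n>`, while `GET /GetStatus/<chaosid>` answers `Completed` once `episodes_<n>.json` exists, `Running` while the `init-<n>` marker exists, and `Does not exist` otherwise. A client-side polling sketch, assuming the server listens on port 5001 as in `app.run`; `parse_chaos_id`, `http_status` and `wait_for_chaos` are hypothetical helpers, and the status fetcher is passed in so the loop can be exercised without a live server.

```python
import time
import urllib.request


def parse_chaos_id(body):
    """Extract <n> from a /GenerateChaos/ response such as 'Chaos ID: 3'."""
    prefix = 'Chaos ID: '
    if not body.startswith(prefix):
        raise ValueError('unexpected response: ' + body)
    return int(body[len(prefix):])


def http_status(base_url, chaos_id):
    """Fetch the plain-text status of one run from /GetStatus/<chaosid>."""
    with urllib.request.urlopen(f'{base_url}/GetStatus/{chaos_id}') as resp:
        return resp.read().decode()


def wait_for_chaos(fetch_status, chaos_id, poll_seconds=5.0, max_polls=120):
    """Poll until the run completes; fail fast if the id is unknown."""
    for _ in range(max_polls):
        status = fetch_status(chaos_id)
        if status == 'Completed':
            return status
        if status == 'Does not exist':
            raise LookupError(f'chaos id {chaos_id} not found')
        time.sleep(poll_seconds)
    raise TimeoutError(f'chaos id {chaos_id} still running after {max_polls} polls')
```

Against a running server this would be called as `wait_for_chaos(lambda cid: http_status('http://localhost:5001', cid), chaos_id)`. Since the `init-<n>` marker is created by the background thread, a poll sent immediately after submission might briefly see `Does not exist`, so a caller polling right away may want to tolerate that on the first attempt.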
--- a/utils/chaos_ai/src/test_application.py
+++ b/utils/chaos_ai/src/test_application.py
@@ -0,0 +1,83 @@
+import json
+import logging
+import time
+import requests
+
+
+class TestApplication:
+    def __init__(self, num_requests=10, timeout=2, sleep_time=1):
+        self.num_requests = num_requests
+        self.timeout = timeout
+        self.sleep_time = sleep_time
+        self.logger = logging.getLogger()
+
+    def test_load(self, url=''):
+        # url = 'http://192.168.49.2:31902/api/cart/health'
+        timeout_count = 0
+        avg_lat = 0
+        for i in range(self.num_requests):
+            try:
+                r = requests.get(url, verify=False, timeout=self.timeout)
+                avg_lat += r.elapsed.total_seconds()
+                self.logger.info(
+                    url + ' ' + str(i) + ':' + str(r.status_code) + " {:.2f}".format(r.elapsed.total_seconds())
+                    + " {:.2f}".format(avg_lat))
+                if r.status_code != 200:
+                    return '200', r.status_code
+            # except requests.exceptions.Timeout as toe:
+            except Exception as toe:
+                self.logger.info(url + ' ' + str(i) + ': request failed, counted as timeout: ' + str(toe))
+                timeout_count += 1
+                if timeout_count > 3:
+                    return '200', 'Timeout'
+            # except Exception as e:
+            #   self.logger.debug('Connection refused!'+str(e))
+            time.sleep(self.sleep_time)
+        self.logger.info(url + " Avg: {:.2f}".format(avg_lat/self.num_requests))
+        return '200', '200'
+
+    # def test_load_hey(self):
+    #     cmd = 'hey -c 2 -z 20s http://192.168.49.2:31902/api/cart/health > temp'
+    #     os.system(cmd)
+    #     with open('temp') as f:
+    #         datafile = f.readlines()
+    #     found = False
+    #     for line in datafile:
+    #         if 'Status code distribution:' in line:
+    #             found = True
+    #         if found:
+    #             print('[test_load]', line)
+    #             m = re.search(r"\[([A-Za-z0-9_]+)\]", line)
+    #             if m is not None:
+    #                 resp_code = m.group(1)
+    #                 if resp_code != 200:
+    #                     return '200', resp_code
+    #     return '200', '200'
+
+    # # End state is reached when system is down or return error code like '500','404'
+    # def get_next_state(self):
+    #     self.logger.info('[GET_NEXT_STATE]')
+    #     f = open(self.chaos_dir + self.chaos_journal)
+    #     data = json.load(f)
+    #
+    #     # before the experiment (if before steady state is false, after is null?)
+    #     for probe in data['steady_states']['before']['probes']:
+    #         if not probe['tolerance_met']:
+    #             # start_state = probe['activity']['tolerance']
+    #             # end_state = probe['status']
+    #             start_state, end_state = None, None
+    #             return start_state, end_state
+    #
+    #     # after the experiment
+    #     for probe in data['steady_states']['after']['probes']:
+    #         # if probe['output']['status'] == probe['activity']['tolerance']:
+    #         if not probe['tolerance_met']:
+    #             # print(probe)
+    #             start_state = probe['activity']['tolerance']
+    #             end_state = probe['output']['status']
+    #             # end_state = probe['status']
+    #             return start_state, end_state
+    #     # if tolerances for all probes are met
+    #     start_state = probe['activity']['tolerance']
+    #     end_state = probe['activity']['tolerance']
+    #     return start_state, end_state
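`test_load` sends up to `num_requests` GET requests and returns as soon as one comes back with a non-200 status; a request that raises any exception (not only a timeout) is counted as a timeout, and the fourth such failure ends the probe as `'Timeout'`. The sketch below (hypothetical name `classify_probe`) replays that decision rule over a precomputed list of outcomes, with `None` standing for a raised exception, so the early exits can be checked offline. It returns strings throughout, matching the keys of the `states` and `rewards` tables, whereas `test_load` returns the integer status code on failure.

```python
def classify_probe(outcomes, max_failures=3):
    """Replay test_load: first non-200 status wins; more than max_failures errors is 'Timeout'."""
    failures = 0
    for status in outcomes:
        if status is None:
            # a request that raised (timeout, refused connection, ...)
            failures += 1
            if failures > max_failures:
                return 'Timeout'
            continue
        if status != 200:
            # the first error status ends the probe immediately
            return str(status)
    return '200'


print(classify_probe([200, 200, 500, 200]))
```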
--- a/utils/chaos_ai/src/utils.py
+++ b/utils/chaos_ai/src/utils.py
@@ -0,0 +1,10 @@
+import re
+
+
+def get_load(fault):
+    params = re.findall(r'\(.*?\)', fault)
+    load = 100
+    if len(params) > 0:
+        load = params[0].strip('()')
+        fault = fault.replace(params[0], '')
+    return fault, load
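`get_load` separates an optional parameter, such as the `80` in `node-cpu-hog(80)`, from the fault name. Trimming that suffix needs care: `str.strip` treats its argument as a set of characters to remove from both ends, not as a literal suffix. A sketch that anchors the suffix with a regular expression, assuming the parameter always comes last; `parse_fault` is a hypothetical name, and it keeps `get_load`'s return convention (an integer `100` when no parameter is given, otherwise the parameter text as a string).

```python
import re

# a fault name without parentheses, optionally followed by one '(params)' suffix
_FAULT_RE = re.compile(r'(?P<name>[^()]+?)(?:\((?P<load>[^()]*)\))?')


def parse_fault(fault):
    """Split 'name(load)' into (name, load); load defaults to 100."""
    m = _FAULT_RE.fullmatch(fault)
    if m is None:
        raise ValueError('malformed fault: ' + fault)
    load = m.group('load')
    return m.group('name'), (100 if load is None else load)


print('network-chaos2(2)'.strip('(2)'))  # prints network-chaos (character-set strip over-trims)
print(parse_fault('network-chaos2(2)'))  # prints ('network-chaos2', '2')
```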
--- a/utils/chaos_recommender/README.md
+++ b/utils/chaos_recommender/README.md
@@ -7,8 +7,8 @@ This tool profiles an application and gathers telemetry data such as CPU, Memory
 ## Pre-requisites

 - Openshift Or Kubernetes Environment where the application is hosted
-- Access to the telemetry data via the exposed Prometheus endpoint
-- Python3
+- Access to the metrics via the exposed Prometheus endpoint
+- Python3.9

 ## Usage

@@ -22,17 +22,17 @@ This tool profiles an application and gathers telemetry data such as CPU, Memory
    $ pip3 install -r requirements.txt
    Edit configuration file:
-    $ vi config/recommender_config.yaml
+    $ vi utils/chaos_recommender/recommender_config.yaml
-    $ python3.9 utils/chaos_recommender/chaos_recommender.py
+    $ python3.9 utils/chaos_recommender/chaos_recommender.py -c utils/chaos_recommender/recommender_config.yaml
    ```

 2. Follow the prompts to provide the required information.

 ## Configuration
 To run the recommender with a config file specify the config file path with the `-c` argument.
-You can customize the default values by editing the `krkn/config/recommender_config.yaml` file. The configuration file contains the following options:
+You can customize the default values by editing the `recommender_config.yaml` file. The configuration file contains the following options:

  - `application`: Specify the application name.
-  - `namespace`: Specify the namespace name. If you want to profile
+  - `namespaces`: Specify the namespace names (separated by commas or spaces). If you want to profile
  - `labels`: Specify the labels (not used).
  - `kubeconfig`: Specify the location of the kubeconfig file (not used).
  - `prometheus_endpoint`: Specify the prometheus endpoint (must).
@@ -65,8 +65,8 @@ You can also provide the input values through command-line arguments launching t
  -o, --options         Evaluate command line options
  -a APPLICATION, --application APPLICATION
                        Kubernetes application name
-  -n NAMESPACE, --namespace NAMESPACE
-                        Kubernetes application namespace
+  -n NAMESPACES [NAMESPACES ...], --namespaces NAMESPACES [NAMESPACES ...]
+                        Kubernetes application namespaces separated by space
  -l LABELS, --labels LABELS
                        Kubernetes application labels
  -p PROMETHEUS_ENDPOINT, --prometheus-endpoint PROMETHEUS_ENDPOINT
@@ -115,6 +115,6 @@ You can customize the thresholds and options used for data analysis and identify

 ## Additional Files

-- `config/recommender_config.yaml`: The configuration file containing default values for application, namespace, labels, and kubeconfig.
+- `recommender_config.yaml`: The configuration file containing default values for application, namespaces, labels, and kubeconfig.

 Happy Chaos!
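Because `-n/--namespaces` is declared with `nargs="+"`, argparse collects every value that follows the flag into a list. The sketch below declares only the `-o` and `-n` options; the empty-list default and the namespace names are illustrative assumptions.

```python
import argparse

# Minimal parser mirroring only the -o and -n options of the recommender CLI;
# all other flags are omitted from this sketch.
parser = argparse.ArgumentParser(description="Krkn Chaos Recommender Command-Line tool")
parser.add_argument("-o", "--options", action="store_true",
                    help="Evaluate command line options")
parser.add_argument("-n", "--namespaces", action="store", default=[], nargs="+",
                    help="Kubernetes application namespaces separated by space")

args = parser.parse_args(["-o", "-n", "openshift-etcd", "openshift-apiserver"])
print(args.namespaces)  # ['openshift-etcd', 'openshift-apiserver']
```

On the command line, namespaces are separated by spaces only: `-n a,b` yields the single element `'a,b'`, whereas the `namespaces` key in the config file also accepts commas.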
--- a/utils/chaos_recommender/chaos_recommender.py
+++ b/utils/chaos_recommender/chaos_recommender.py
@@ -2,6 +2,7 @@ import argparse
 import json
 import logging
 import os.path
+import re
 import sys
 import time
 import yaml
@@ -23,7 +24,7 @@ def parse_arguments(parser):
    # command line options
    parser.add_argument("-c", "--config-file", action="store", help="Config file path")
    parser.add_argument("-o", "--options", action="store_true", help="Evaluate command line options")
-    parser.add_argument("-n", "--namespace", action="store", default="", help="Kubernetes application namespace")
+    parser.add_argument("-n", "--namespaces", action="store", default=[], nargs="+", help="Kubernetes application namespaces separated by space")
    parser.add_argument("-p", "--prometheus-endpoint", action="store", default="", help="Prometheus endpoint URI")
    parser.add_argument("-k", "--kubeconfig", action="store", default=kube_config.KUBE_CONFIG_DEFAULT_LOCATION, help="Kubeconfig path")
    parser.add_argument("-t", "--token", action="store", default="", help="Kubernetes authentication token")
@@ -57,7 +58,8 @@ def read_configuration(config_file_path):
        config = yaml.safe_load(config_file)

    log_level = config.get("log level", "INFO")
-    namespace = config.get("namespace")
+    namespaces = config.get("namespaces") or ""
+    namespaces = [ns for ns in re.split(r"[,\s]+", namespaces) if ns]
    kubeconfig = get_yaml_item_value(config, "kubeconfig", kube_config.KUBE_CONFIG_DEFAULT_LOCATION)

    prometheus_endpoint = config.get("prometheus_endpoint")
@@ -72,9 +74,9 @@ def read_configuration(config_file_path):
    else:
        output_path = False
    chaos_tests = config.get("chaos_tests", {})
-    return (namespace, kubeconfig, prometheus_endpoint, auth_token, scrape_duration,
-            chaos_tests, log_level, threshold, heatmap_cpu_threshold,
-            heatmap_mem_threshold, output_path)
+    return (namespaces, kubeconfig, prometheus_endpoint, auth_token,
+            scrape_duration, chaos_tests, log_level, threshold,
+            heatmap_cpu_threshold, heatmap_mem_threshold, output_path)


 def prompt_input(prompt, default_value):
@@ -84,21 +86,18 @@ def prompt_input(prompt, default_value):
    return default_value


-def make_json_output(inputs, queries, analysis_data, output_path):
+def make_json_output(inputs, namespace_data, output_path):
    time_str = time.strftime("%Y-%m-%d_%H-%M-%S", time.localtime())

    data = {
        "inputs": inputs,
-        "queries": queries,
-        "profiling": analysis_data[0],
-        "heatmap_analysis": analysis_data[1],
-        "recommendations": analysis_data[2]
+        "analysis_outputs": namespace_data
    }

    logging.info(f"Summary\n{json.dumps(data, indent=4)}")

    if output_path is not False:
-        file = f"recommender_{inputs['namespace']}_{time_str}.json"
+        file = f"recommender_{time_str}.json"
        path = f"{os.path.expanduser(output_path)}/{file}"

        with open(path, "w") as json_output:
@@ -107,9 +106,11 @@ def make_json_output(inputs, queries, analysis_data, output_path):
            logging.info(f"Recommendation output saved in {file}.")


-def json_inputs(namespace, kubeconfig, prometheus_endpoint, scrape_duration, chaos_tests, threshold, heatmap_cpu_threshold, heatmap_mem_threshold):
+def json_inputs(namespaces, kubeconfig, prometheus_endpoint, scrape_duration,
+                chaos_tests, threshold, heatmap_cpu_threshold,
+                heatmap_mem_threshold):
    inputs = {
-        "namespace": namespace,
+        "namespaces": namespaces,
        "kubeconfig": kubeconfig,
        "prometheus_endpoint": prometheus_endpoint,
        "scrape_duration": scrape_duration,
@@ -121,6 +122,17 @@ def json_inputs(namespace, kubeconfig, prometheus_endpoint, scrape_duration, cha
    return inputs


+def json_namespace(namespace, queries, analysis_data):
+    data = {
+        "namespace": namespace,
+        "queries": queries,
+        "profiling": analysis_data[0],
+        "heatmap_analysis": analysis_data[1],
+        "recommendations": analysis_data[2]
+    }
+    return data
+
+
 def main():
    parser = argparse.ArgumentParser(description="Krkn Chaos Recommender Command-Line tool")
    args = parse_arguments(parser)
@@ -132,7 +144,7 @@ def main():

    if args.config_file is not None:
        (
-         namespace,
+         namespaces,
         kubeconfig,
         prometheus_endpoint,
         auth_token,
@@ -146,7 +158,7 @@ def main():
         ) = read_configuration(args.config_file)

    if args.options:
-        namespace = args.namespace
+        namespaces = args.namespaces
        kubeconfig = args.kubeconfig
        auth_token = args.token
        scrape_duration = args.scrape_duration
@@ -172,14 +184,26 @@ def main():
        if not os.path.exists(os.path.expanduser(output_path)):
            logging.error(f"Folder {output_path} for output not found.")
            sys.exit(1)
+
    logging.info("Loading inputs...")
-    inputs = json_inputs(namespace, kubeconfig, prometheus_endpoint, scrape_duration, chaos_tests, threshold, heatmap_cpu_threshold, heatmap_mem_threshold)
-    logging.info("Starting Analysis ...")
+    inputs = json_inputs(namespaces, kubeconfig, prometheus_endpoint,
+                         scrape_duration, chaos_tests, threshold,
+                         heatmap_cpu_threshold, heatmap_mem_threshold)
+    namespaces_data = []

-    file_path, queries = prometheus.fetch_utilization_from_prometheus(prometheus_endpoint, auth_token, namespace, scrape_duration)
-    analysis_data = analysis(file_path, chaos_tests, threshold, heatmap_cpu_threshold, heatmap_mem_threshold)
+    logging.info("Starting Analysis...")

-    make_json_output(inputs, queries, analysis_data, output_path)
+    file_path, queries = prometheus.fetch_utilization_from_prometheus(
+        prometheus_endpoint, auth_token, namespaces, scrape_duration)
+
+    analysis_data = analysis(file_path, namespaces, chaos_tests, threshold,
+                             heatmap_cpu_threshold, heatmap_mem_threshold)
+
+    for namespace in namespaces:
+        namespace_data = json_namespace(namespace, queries[namespace],
+                                        analysis_data[namespace])
+        namespaces_data.append(namespace_data)
+    make_json_output(inputs, namespaces_data, output_path)


 if __name__ == "__main__":
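With this change, `main()` builds one `json_namespace` entry per namespace and `make_json_output` nests the list under `analysis_outputs`, next to a single shared `inputs` block. The stdlib-only sketch below reproduces that layout. It assumes, as `main()` does, that queries and analysis results are dicts keyed by namespace. `build_summary` is a hypothetical helper, and the query and analysis values are placeholders rather than real Prometheus output.

```python
import json
from typing import Any, Dict, List


def json_namespace(namespace: str, queries: Any, analysis_data: tuple) -> Dict[str, Any]:
    # Same keys as the json_namespace() helper added in the diff.
    return {
        "namespace": namespace,
        "queries": queries,
        "profiling": analysis_data[0],
        "heatmap_analysis": analysis_data[1],
        "recommendations": analysis_data[2],
    }


def build_summary(inputs: Dict[str, Any], queries: Dict[str, Any],
                  analysis: Dict[str, tuple]) -> Dict[str, Any]:
    # One entry per namespace, in the order listed in inputs["namespaces"].
    per_ns: List[Dict[str, Any]] = [
        json_namespace(ns, queries[ns], analysis[ns]) for ns in inputs["namespaces"]
    ]
    return {"inputs": inputs, "analysis_outputs": per_ns}


# Placeholder values stand in for real queries and analysis results.
inputs = {"namespaces": ["ns-a", "ns-b"]}
queries = {"ns-a": {"cpu": "q1"}, "ns-b": {"cpu": "q2"}}
analysis = {"ns-a": ("p", "h", ["pod_failure"]), "ns-b": ("p", "h", [])}
summary = build_summary(inputs, queries, analysis)
print(json.dumps(summary, indent=4))
```

Keeping the per-namespace blocks in a list preserves the order the user supplied, and the output file name no longer needs a single namespace in it.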
--- a/utils/chaos_recommender/recommender_config.yaml
+++ b/utils/chaos_recommender/recommender_config.yaml
@@ -0,0 +1,35 @@
+application: openshift-etcd
+namespaces: openshift-etcd
+labels: app=openshift-etcd
+kubeconfig: ~/.kube/config
+prometheus_endpoint: <Prometheus_Endpoint>
+auth_token: <Auth_Token>
+scrape_duration: 10m
+chaos_library: "kraken"
+log_level: INFO
+json_output_file: False
+json_output_folder_path:
+
+# For output purposes only; do not change unless needed.
+chaos_tests:
+  GENERIC:
+    - pod_failure
+    - container_failure
+    - node_failure
+    - zone_outage
+    - time_skew
+    - namespace_failure
+    - power_outage
+  CPU:
+    - node_cpu_hog
+  NETWORK:
+    - application_outage
+    - node_network_chaos
+    - pod_network_chaos
+  MEM:
+    - node_memory_hog
+    - pvc_disk_fill
+
+threshold: .7
+cpu_threshold: .5
+mem_threshold: .5
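The `namespaces` key above holds one string, which `read_configuration` splits on commas and/or whitespace. The sketch below shows a tolerant splitter, assuming the value is a plain string or missing; `split_namespaces` is a hypothetical name. A pattern such as `r",+\s+|,+|\s+"` can yield empty strings (for example for `"ns-a ,ns-b"`), so the sketch uses a single character class and filters out empty items.

```python
import re
from typing import List, Optional


def split_namespaces(value: Optional[str]) -> List[str]:
    """Split a `namespaces` config value on commas and/or whitespace."""
    # Treat a missing or empty value as "no namespaces" instead of failing.
    if not value:
        return []
    # Any run of commas/whitespace is one separator; filtering removes the
    # empty strings produced by leading or trailing separators.
    return [ns for ns in re.split(r"[,\s]+", value) if ns]


print(split_namespaces("openshift-etcd"))          # ['openshift-etcd']
print(split_namespaces("ns-a, ns-b ns-c,,ns-d"))   # ['ns-a', 'ns-b', 'ns-c', 'ns-d']
print(re.split(r",+\s+|,+|\s+", "ns-a ,ns-b"))     # ['ns-a', '', 'ns-b']
```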
Author	SHA1	Message	Date
Tullio Sebastiani	e02c6d1287	SYN flood scenario (#668 ) * scenario config file Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * syn flood plugin Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * run_krkn.py updaated Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * requirements.txt + documentation + config.yaml * set node selector defaults to worker Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> --------- Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-07-29 15:31:37 -04:00
jtydlack	04425a8d8a	Add alerts to alert.yaml Signed-off-by: jtydlack <139967002+jtydlack@users.noreply.github.com>	2024-07-25 10:51:15 -04:00
Naga Ravi Chaitanya Elluri	f3933f0e62	fix: requirements.txt to reduce vulnerabilities (#673 ) The following vulnerabilities are fixed by pinning transitive dependencies: - https://snyk.io/vuln/SNYK-PYTHON-SETUPTOOLS-7448482 Co-authored-by: snyk-bot <snyk-bot@snyk.io>	2024-07-22 10:12:14 -04:00
Naga Ravi Chaitanya Elluri	56ff0a8c72	Deprecate setting release version in the container source file This commit also deprecates building container image for ppc64le as it is not actively maintained. We will add support if users request for it in the future. Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>	2024-07-18 12:56:08 -04:00
Tullio Sebastiani	9378cd74cd	krkn-lib update v2.1.6 to fix pod monitoring time calculations (#674 ) Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-07-16 18:04:24 +02:00
Paige Patton	4d3491da0f	adidng action token passing (#671 ) rh-pre-commit.version: 2.2.0 rh-pre-commit.check-secrets: ENABLED Signed-off-by: Paige Rubendall <prubenda@redhat.com>	2024-07-15 12:50:20 -04:00
Naga Ravi Chaitanya Elluri	d6ce66160b	Remove podman-compose dependency We are not using it in the krkn code base and removing it fixes one of the license issues reported by FOSSA. This commit also removes setting up dependencies using docker/podman compose as it not actively maintained. Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>	2024-07-10 17:25:33 -04:00
Paige Rubendall	ef1a55438b	taking out need for az cli to be installed rh-pre-commit.version: 2.2.0 rh-pre-commit.check-secrets: ENABLED Signed-off-by: Paige Rubendall <prubenda@redhat.com>	2024-07-05 15:18:06 -04:00
Tullio Sebastiani	d8f54b83a2	fixed image push issue Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-07-05 10:32:01 -04:00
Tullio Sebastiani	4870c86515	moves the krkn-hub build from push on main to tag (#660 ) * moves the krkn-hub build from push on main to tag + final image enhancement Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> fixed syntax Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> typo Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> typo Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * quotes Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> --------- Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-07-05 16:09:34 +02:00
Naga Ravi Chaitanya Elluri	6ae17cf678	Update dockerfile to install azure-cli using dnf Avoids architecture issues such as "bash: /usr/bin/az: cannot execute: required file not found" Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>	2024-07-03 18:35:45 -04:00
Tullio Sebastiani	ce9f8aa050	Dockerfile update v1.6.2 (#659 ) Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-07-03 16:34:37 +02:00
Paige Patton	05148317c1	taking out one glcoud call (#657 ) rh-pre-commit.version: 2.2.0 rh-pre-commit.check-secrets: ENABLED Signed-off-by: Paige Rubendall <prubenda@redhat.com>	2024-07-03 16:14:19 +02:00
Tullio Sebastiani	5f836f294b	Kill pod arca plugin update adaptation (#656 ) * new kill-pod interface adaptation Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * unit test fix Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * requirements update Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * fixed duplicate requirement Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * added conditional dockerfile build Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> fix Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> fix Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> fix Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> removed useless print Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> --------- Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-07-03 15:50:43 +02:00
snyk-bot	cfa1bb09a0	fix: requirements.txt to reduce vulnerabilities The following vulnerabilities are fixed by pinning transitive dependencies: - https://snyk.io/vuln/SNYK-PYTHON-REQUESTS-6928867	2024-06-24 10:23:37 -04:00
Naga Ravi Chaitanya Elluri	5ddfff5a85	Make krkn dir executable Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>	2024-06-20 14:32:20 -04:00
Tullio Sebastiani	7d18487228	Dockerfile update Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-06-12 14:36:38 -04:00
Naga Ravi Chaitanya Elluri	08de42c91a	Bump arcaflow version to 0.17.2 (#648 ) Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>	2024-06-12 20:29:32 +02:00
dependabot[bot]	dc7d5bb01b	Bump azure-identity from 1.15.0 to 1.16.1 Bumps [azure-identity](https://github.com/Azure/azure-sdk-for-python) from 1.15.0 to 1.16.1. - [Release notes](https://github.com/Azure/azure-sdk-for-python/releases) - [Changelog](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/esrp_release.md) - [Commits](https://github.com/Azure/azure-sdk-for-python/compare/azure-identity_1.15.0...azure-identity_1.16.1) --- updated-dependencies: - dependency-name: azure-identity dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2024-06-12 09:17:14 -04:00
Tullio Sebastiani	ea3444d375	added dependencies removed from the hub Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> jsonschema Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-06-11 12:07:28 -04:00
Tullio Sebastiani	7b660a0878	Fixes system and oc vulnerabilities detected by trivy (#644 ) * fixes system and oc vulnerabilities detected by trivy Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * updated base image to run as krkn user instead of root Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> --------- Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-06-10 14:26:03 -04:00
Tullio Sebastiani	5fe0655f22	libnghttp2 version update Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-06-06 08:21:08 -04:00
Tullio Sebastiani	5df343c183	dockerfile update Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-06-04 14:36:11 -04:00
Tullio Sebastiani	f364e9f283	Arcaflow upgrade to engine v0.17.1 (#639 ) * krkn plugin refactoring to match new engine context path management Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * cpu-hog new syntax Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * memory-hog new syntax Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> removed s from duration Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * io-hog new syntax Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> cpu-hog input Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * path management refactoring agreed with arca team Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> refactoring Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> --------- Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-06-04 14:13:33 -04:00
Tullio Sebastiani	86a7427606	Dockerfile refactoring to build oc together with krkn Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> added oc in /usr/local/bin as well Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> fixed dumb docker build copy Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-06-04 10:41:11 -04:00
Mudit Verma	31266fbc3e	support for node limits	2024-05-31 11:22:30 -04:00
Tullio Sebastiani	57de3769e7	ubi 9 base image + quay.io vulnerability fixes Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-05-31 10:58:52 -04:00
Paige Rubendall	42fc8eea40	adding wait in pvc scenarios and serivce hijack rh-pre-commit.version: 2.2.0 rh-pre-commit.check-secrets: ENABLED Signed-off-by: Paige Rubendall <prubenda@redhat.com>	2024-05-29 16:34:33 -04:00
dependabot[bot]	22d56e2cdc	--- updated-dependencies: - dependency-name: requests dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2024-05-22 17:12:46 -04:00
Matt Leader	a259b68221	Updates for Arcaflow Plugin Stress-NG 0.6.0 (#625 ) * change for cpu hog Signed-off-by: Matthew F Leader <mleader@redhat.com> * change for io hog Signed-off-by: Matthew F Leader <mleader@redhat.com> * change for memory hog Signed-off-by: Matthew F Leader <mleader@redhat.com> --------- Signed-off-by: Matthew F Leader <mleader@redhat.com>	2024-05-20 12:35:51 -04:00
Tullio Sebastiani	052f83e7d9	added reference to webservice source code in the documentation (#630 )	2024-05-14 17:58:06 +02:00
Tullio Sebastiani	fb3bbe4e26	replaced log syntax to allow objects to be printed Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-05-14 11:13:44 -04:00
Naga Ravi Chaitanya Elluri	96ba9be4b8	Add instructions to copy the python package file to docker dir (#616 ) Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>	2024-05-13 12:36:37 -04:00
Naga Ravi Chaitanya Elluri	58d5d1d8dc	Have a config in the chaos_recommender dir (#615 ) This will make it easy for the users to find, configure and run it. Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>	2024-05-13 12:33:41 -04:00
Tullio Sebastiani	3fe22a0d8f	fixing badgecommit fail when coverage doesn't change Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-05-13 12:30:59 -04:00
Tullio Sebastiani	21b89a32a7	fixing missing import for log_exception Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-05-13 11:58:13 -04:00
Tullio Sebastiani	dbe3ea9718	Dockerfiles update Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-05-13 10:56:58 -04:00
Tullio Sebastiani	a142f6e7a4	Service hijacking scenario (#617 ) * WIP: service hijacking scenario Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * wip Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * error handling Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> adapted run_raken.py Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * restored config.yaml Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * added funtest Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> test fix Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> fix Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> fixed test Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> fix Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> fix test Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> fixed funtest Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> funtest fix Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> minor nit Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> added explicit curl method Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> push Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> fix Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> restored all funtests Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> added mime type test Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> fixed pipeline Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> commented unit Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> utf-8 Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> test restored Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> fix test pipeline Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * documentation Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * krkn-lib 2.1.3 Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * added other funtests to main merge to collect coverage Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> --------- 
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-05-13 10:04:06 +02:00
Tullio Sebastiani	2610a7af67	added coverage badge and build badge to krkn Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> fix Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> nit Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> permission Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> if main Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-05-10 09:57:10 -04:00
dependabot[bot]	f827f65132	Bump werkzeug from 2.3.8 to 3.0.3 in /utils/chaos_ai/docker (#619 ) Bumps [werkzeug](https://github.com/pallets/werkzeug) from 2.3.8 to 3.0.3. - [Release notes](https://github.com/pallets/werkzeug/releases) - [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst) - [Commits](https://github.com/pallets/werkzeug/compare/2.3.8...3.0.3) --- updated-dependencies: - dependency-name: werkzeug dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>	2024-05-06 16:09:10 -04:00
dependabot[bot]	aa6cbbc11a	Bump werkzeug from 3.0.1 to 3.0.3 Bumps [werkzeug](https://github.com/pallets/werkzeug) from 3.0.1 to 3.0.3. - [Release notes](https://github.com/pallets/werkzeug/releases) - [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst) - [Commits](https://github.com/pallets/werkzeug/compare/3.0.1...3.0.3) --- updated-dependencies: - dependency-name: werkzeug dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2024-05-06 16:04:27 -04:00
dependabot[bot]	e17354e54d	Bump jinja2 from 3.1.3 to 3.1.4 Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.3 to 3.1.4. - [Release notes](https://github.com/pallets/jinja/releases) - [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst) - [Commits](https://github.com/pallets/jinja/compare/3.1.3...3.1.4) --- updated-dependencies: - dependency-name: jinja2 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2024-05-06 15:44:52 -04:00
Tullio Sebastiani	2dfa5cb0cd	fixes missing data in telemetry.json Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-05-06 14:16:09 -04:00
dependabot[bot]	0799008cd5	Bump flask from 2.1.0 to 2.2.5 in /utils/chaos_ai/docker (#611 ) Bumps [flask](https://github.com/pallets/flask) from 2.1.0 to 2.2.5. - [Release notes](https://github.com/pallets/flask/releases) - [Changelog](https://github.com/pallets/flask/blob/main/CHANGES.rst) - [Commits](https://github.com/pallets/flask/compare/2.1.0...2.2.5) --- updated-dependencies: - dependency-name: flask dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>	2024-04-25 09:11:50 -04:00
Tullio Sebastiani	2327531e46	Dockerfiles update (#614 ) Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-04-24 11:40:58 -04:00
dependabot[bot]	2c14c48a63	Bump werkzeug from 2.2.2 to 2.3.8 in /utils/chaos_ai/docker (#610 ) Bumps [werkzeug](https://github.com/pallets/werkzeug) from 2.2.2 to 2.3.8. - [Release notes](https://github.com/pallets/werkzeug/releases) - [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst) - [Commits](https://github.com/pallets/werkzeug/compare/2.2.2...2.3.8) --- updated-dependencies: - dependency-name: werkzeug dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-23 15:26:51 +02:00
Tullio Sebastiani	ab98e416a6	Integration of the new pod recovery monitoring strategy implemented in krkn-lib (#609 ) * pod monitoring integration in plugin scenario Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * pod monitoring integration in container scenario Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * removed wait-for-pod step from plugin scenario config files Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * introduced global pod recovery time Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> nit Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * introduced krkn_pod_recovery_time in plugin scenario and removed all the references to wait-for-pods Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> fix Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * functional test fix Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * main branch functional test fix Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * increased recovery times Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> --------- Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-04-23 10:49:01 +02:00
Sandeep Hans	19ad2d1a3d	initial version of Chaos AI (#606 ) * init push Signed-off-by: Sandeep Hans <shans001@in.ibm.com> * remove litmus + updated readme Signed-off-by: Sandeep Hans <shans001@in.ibm.com> * remove redundant files Signed-off-by: Sandeep Hans <shans001@in.ibm.com> * removed generated file+unused reference --------- Signed-off-by: Sandeep Hans <shans001@in.ibm.com> Co-authored-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>	2024-04-16 10:41:31 -04:00
jtydlcak	804d7cbf58	Accept list of namespaces in chaos recommender Signed-off-by: jtydlack <139967002+jtydlack@users.noreply.github.com>	2024-04-09 23:32:17 -04:00
Paige Rubendall	54af2fc6ff	adding v1.5.12 tag Signed-off-by: Paige Rubendall <prubenda@redhat.com>	2024-03-29 18:45:52 -04:00
Paige Rubendall	b79e526cfd	adding app outage not creating file (#605 ) Signed-off-by: Paige Rubendall <prubenda@redhat.com>	2024-03-29 14:35:14 -04:00
Naga Ravi Chaitanya Elluri	a5efd7d06c	Bump release version to v1.5.11 Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>	2024-03-22 15:24:04 -04:00