fix: use per-URL status_code in HealthChecker telemetry (#1091)

Signed-off-by: AR21SM <mahajanashishar21sm@gmail.com>
Co-authored-by: Paige Patton <64206430+paigerube14@users.noreply.github.com>
Network chaos NG porting: pod network chaos, node network chaos (#991)
2026-02-19 20:40:33 +00:00 · 2026-02-19 09:25:03 -05:00 · 2026-02-18 16:20:16 +01:00 · 2026-02-18 18:26:14 +05:30 · 2026-02-17 15:20:10 -05:00 · 2026-02-11 13:44:13 -05:00
166 changed files with 19063 additions and 2955 deletions
--- a/.coveragerc
+++ b/.coveragerc
@@ -0,0 +1,4 @@
+[run]
+omit =
+    tests/*
+    krkn/tests/**
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,27 +1,47 @@
-## Type of change
+# Type of change

 - [ ] Refactor
 - [ ] New feature
 - [ ] Bug fix
 - [ ] Optimization

-## Description  
-<!-- Provide a brief description of the changes made in this PR. -->  
+# Description  
+<!-- Provide a brief description of the changes made in this PR. -->

 ## Related Tickets & Documents
+If there is no related issue, please create one and start the conversation about the proposed change there.

- Related Issue #
- Closes #
+- Related Issue #: 
+- Closes #: 

-## Documentation  
+# Documentation  
 - [ ] **Is documentation needed for this update?**

 If checked, a documentation PR must be created and merged in the [website repository](https://github.com/krkn-chaos/website/).

 ## Related Documentation PR (if applicable)  
-<!-- Add the link to the corresponding documentation PR in the website repository -->  
+<!-- Add the link to the corresponding documentation PR in the website repository -->

-## Checklist before requesting a review
+# Checklist before requesting a review
+- [ ] Ensure the changes and proposed solution have been discussed in the relevant issue and have received acknowledgment from the community or maintainers. See the [contributing guidelines](https://krkn-chaos.dev/docs/contribution-guidelines/).
+See [testing your changes](https://krkn-chaos.dev/docs/developers-guide/testing-changes/) and run your changes on any Kubernetes or OpenShift cluster to validate them.
+- [ ] I have performed a self-review of my code by running krkn and the specific scenario
+- [ ] If it is a core feature, I have added thorough unit tests with above 80% coverage

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
+*REQUIRED*:
+Describe the combination of tests performed and include the output of the run.
+
+```bash
+python run_kraken.py
+...
+<---insert test results output--->
+```
+
+OR
+
+
+```bash
+python -m coverage run -a -m unittest discover -s tests -v
+...
+<---insert test results output--->
+```
--- a/.github/workflows/stale.yml
+++ b/.github/workflows/stale.yml
@@ -0,0 +1,52 @@
+name: Manage Stale Issues and Pull Requests
+
+on:
+  schedule:
+    # Run daily at 1:00 AM UTC
+    - cron: '0 1 * * *'
+  workflow_dispatch: 
+
+permissions:
+  issues: write
+  pull-requests: write
+
+jobs:
+  stale:
+    name: Mark and Close Stale Issues and PRs
+    runs-on: ubuntu-latest
+    steps:
+      - name: Mark and close stale issues and PRs
+        uses: actions/stale@v9
+        with:
+          days-before-issue-stale: 60
+          days-before-issue-close: 14
+          stale-issue-label: 'stale'
+          stale-issue-message: |
+            This issue has been automatically marked as stale because it has not had any activity in the last 60 days.
+            It will be closed in 14 days if no further activity occurs.
+            If this issue is still relevant, please leave a comment or remove the stale label.
+            Thank you for your contributions to krkn!
+          close-issue-message: |
+            This issue has been automatically closed due to inactivity.
+            If you believe this issue is still relevant, please feel free to reopen it or create a new issue with updated information.
+            Thank you for your understanding!
+          close-issue-reason: 'not_planned'
+
+          days-before-pr-stale: 90
+          days-before-pr-close: 14
+          stale-pr-label: 'stale'
+          stale-pr-message: |
+            This pull request has been automatically marked as stale because it has not had any activity in the last 90 days.
+            It will be closed in 14 days if no further activity occurs.
+            If this PR is still relevant, please rebase it, address any pending reviews, or leave a comment.
+            Thank you for your contributions to krkn!
+          close-pr-message: |
+            This pull request has been automatically closed due to inactivity.
+            If you believe this PR is still relevant, please feel free to reopen it or create a new pull request with updated changes.
+            Thank you for your understanding!
+
+          # Exempt labels
+          exempt-issue-labels: 'bug,enhancement,good first issue'
+          exempt-pr-labels: 'pending discussions,hold'
+
+          remove-stale-when-updated: true
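With these settings, an issue with no activity is marked `stale` after 60 days and closed 14 days after that, about 74 days of inactivity in total; pull requests take 90 + 14 = 104 days. Because the workflow runs once a day at 1:00 AM UTC, the actual dates can lag by up to a day. The sketch below estimates both dates from a last-activity date using GNU `date`; `add_days` and `stale_timeline` are hypothetical helpers for illustration, not part of the workflow or of `actions/stale`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for estimating actions/stale dates (GNU date required).

# add_days YYYY-MM-DD N: prints the date N days later, computed via UTC epoch seconds.
add_days() {
  local epoch
  epoch=$(date -u -d "$1" +%s)
  date -u -d "@$(( epoch + $2 * 86400 ))" +%F
}

# stale_timeline LAST_ACTIVITY DAYS_BEFORE_STALE DAYS_BEFORE_CLOSE
# Prints the earliest stale date and the earliest close date, assuming no
# further activity and a workflow run on the exact day each threshold is hit.
stale_timeline() {
  local stale_on close_on
  stale_on=$(add_days "$1" "$2")
  close_on=$(add_days "$stale_on" "$3")
  printf 'stale: %s close: %s\n' "$stale_on" "$close_on"
}

stale_timeline 2026-01-01 60 14   # issues: stale: 2026-03-02 close: 2026-03-16
stale_timeline 2026-01-01 90 14   # PRs:    stale: 2026-04-01 close: 2026-04-15
```

Epoch arithmetic is used instead of relative date strings so the result does not depend on how `date` parses phrases such as `+60 days`.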
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -32,21 +32,33 @@ jobs:
      - name: Install Python
        uses: actions/setup-python@v4
        with:
-          python-version: '3.9'
+          python-version: '3.11'
          architecture: 'x64'
      - name: Install environment
        run: |
          sudo apt-get install build-essential python3-dev
          pip install --upgrade pip
          pip install -r requirements.txt
+          pip install coverage

      - name: Deploy test workloads
        run: |
-          es_pod_name=$(kubectl get pods -l "app=elasticsearch-master" -o name)
-          echo "POD_NAME: $es_pod_name"
-          kubectl --namespace default port-forward $es_pod_name 9200 &
-          prom_name=$(kubectl get pods -n monitoring -l "app.kubernetes.io/name=prometheus" -o name)
-          kubectl --namespace monitoring port-forward $prom_name 9090 &
+          #          es_pod_name=$(kubectl get pods -l "app=elasticsearch-master" -o name)
+          #          echo "POD_NAME: $es_pod_name"
+          #          kubectl --namespace default port-forward $es_pod_name 9200 &
+          #          prom_name=$(kubectl get pods -n monitoring -l "app.kubernetes.io/name=prometheus" -o name)
+          #          kubectl --namespace monitoring port-forward $prom_name 9090 &
+
+          # Wait for Elasticsearch to be ready
+          echo "Waiting for Elasticsearch to be ready..."
+          for i in {1..30}; do
+            if curl -k -s -u elastic:$ELASTIC_PASSWORD https://localhost:9200/_cluster/health > /dev/null 2>&1; then
+              echo "Elasticsearch is ready!"
+              break
+            fi
+            echo "Attempt $i: Elasticsearch not ready yet, waiting..."
+            sleep 2
+          done
          kubectl apply -f CI/templates/outage_pod.yaml
          kubectl wait --for=condition=ready pod -l scenario=outage --timeout=300s
          kubectl apply -f CI/templates/container_scenario_pod.yaml
@@ -56,39 +68,43 @@ jobs:
          kubectl wait --for=condition=ready pod -l scenario=time-skew --timeout=300s
          kubectl apply -f CI/templates/service_hijacking.yaml
          kubectl wait --for=condition=ready pod -l "app.kubernetes.io/name=proxy" --timeout=300s
+          kubectl apply -f CI/legacy/scenarios/volume_scenario.yaml
+          kubectl wait --for=condition=ready pod kraken-test-pod -n kraken --timeout=300s
      - name: Get Kind nodes
        run: |
          kubectl get nodes --show-labels=true
      # Pull request only steps
      - name: Run unit tests
-        if: github.event_name == 'pull_request'
        run: python -m coverage run -a -m unittest discover -s tests -v

-      - name: Setup Pull Request Functional Tests
-        if: |
-          github.event_name == 'pull_request'
+      - name: Setup Functional Tests
        run: |
-            yq -i '.kraken.port="8081"' CI/config/common_test_config.yaml
-            yq -i '.kraken.signal_address="0.0.0.0"' CI/config/common_test_config.yaml
            yq -i '.kraken.performance_monitoring="localhost:9090"' CI/config/common_test_config.yaml
            yq -i '.elastic.elastic_port=9200' CI/config/common_test_config.yaml
            yq -i '.elastic.elastic_url="https://localhost"' CI/config/common_test_config.yaml
-            yq -i '.elastic.enable_elastic=True' CI/config/common_test_config.yaml
+            yq -i '.elastic.enable_elastic=False' CI/config/common_test_config.yaml
            yq -i '.elastic.password="${{env.ELASTIC_PASSWORD}}"' CI/config/common_test_config.yaml
            yq -i '.performance_monitoring.prometheus_url="http://localhost:9090"' CI/config/common_test_config.yaml
-            echo "test_service_hijacking" > ./CI/tests/functional_tests
-            echo "test_app_outages" >> ./CI/tests/functional_tests
-            echo "test_container"      >> ./CI/tests/functional_tests
-            echo "test_pod" >> ./CI/tests/functional_tests
-            echo "test_customapp_pod" >> ./CI/tests/functional_tests
-            echo "test_namespace"      >> ./CI/tests/functional_tests
-            echo "test_net_chaos"      >> ./CI/tests/functional_tests
-            echo "test_time"           >> ./CI/tests/functional_tests
+            echo "test_app_outages" > ./CI/tests/functional_tests
+            echo "test_container" >> ./CI/tests/functional_tests
            echo "test_cpu_hog" >> ./CI/tests/functional_tests
-            echo "test_memory_hog" >> ./CI/tests/functional_tests
+            echo "test_customapp_pod" >> ./CI/tests/functional_tests
            echo "test_io_hog" >> ./CI/tests/functional_tests
-            echo "test_pod_network_filter" >> ./CI/tests/functional_tests
+            echo "test_memory_hog" >> ./CI/tests/functional_tests
+            echo "test_namespace" >> ./CI/tests/functional_tests
+            echo "test_net_chaos" >> ./CI/tests/functional_tests
+            echo "test_node" >> ./CI/tests/functional_tests

+            echo "test_service_hijacking" >> ./CI/tests/functional_tests
+            echo "test_pod_network_filter" >> ./CI/tests/functional_tests
+            echo "test_pod_server" >> ./CI/tests/functional_tests
+            echo "test_time" >> ./CI/tests/functional_tests
+            echo "test_node_network_chaos" >> ./CI/tests/functional_tests
+            echo "test_pod_network_chaos" >> ./CI/tests/functional_tests
+            echo "test_pod_error" >> ./CI/tests/functional_tests
+            echo "test_pod" >> ./CI/tests/functional_tests
+            # echo "test_pvc" >> ./CI/tests/functional_tests
+          

      # Push on main only steps + all other functional to collect coverage
      # for the badge
@@ -102,30 +118,9 @@ jobs:
      - name: Setup Post Merge Request Functional Tests
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: |
-          yq -i '.kraken.port="8081"' CI/config/common_test_config.yaml
-          yq -i '.kraken.signal_address="0.0.0.0"' CI/config/common_test_config.yaml
-          yq -i '.kraken.performance_monitoring="localhost:9090"' CI/config/common_test_config.yaml
-          yq -i '.elastic.enable_elastic=True' CI/config/common_test_config.yaml
-          yq -i '.elastic.password="${{env.ELASTIC_PASSWORD}}"' CI/config/common_test_config.yaml
-          yq -i '.elastic.elastic_port=9200' CI/config/common_test_config.yaml
-          yq -i '.elastic.elastic_url="https://localhost"' CI/config/common_test_config.yaml
-          yq -i '.performance_monitoring.prometheus_url="http://localhost:9090"' CI/config/common_test_config.yaml
          yq -i '.telemetry.username="${{secrets.TELEMETRY_USERNAME}}"' CI/config/common_test_config.yaml
          yq -i '.telemetry.password="${{secrets.TELEMETRY_PASSWORD}}"' CI/config/common_test_config.yaml
-          echo "test_telemetry" > ./CI/tests/functional_tests
-          echo "test_service_hijacking" >> ./CI/tests/functional_tests
-          echo "test_app_outages" >> ./CI/tests/functional_tests
-          echo "test_container"      >> ./CI/tests/functional_tests
-          echo "test_pod" >> ./CI/tests/functional_tests
-          echo "test_customapp_pod" >> ./CI/tests/functional_tests
-          echo "test_namespace"      >> ./CI/tests/functional_tests
-          echo "test_net_chaos"      >> ./CI/tests/functional_tests
-          echo "test_time"           >> ./CI/tests/functional_tests
-          echo "test_cpu_hog" >> ./CI/tests/functional_tests
-          echo "test_memory_hog" >> ./CI/tests/functional_tests
-          echo "test_io_hog" >> ./CI/tests/functional_tests
-          echo "test_pod_network_filter" >> ./CI/tests/functional_tests
-
+          echo "test_telemetry" >> ./CI/tests/functional_tests
      # Final common steps
      - name: Run Functional tests
        env:
@@ -135,38 +130,38 @@ jobs:
          cat ./CI/results.markdown >> $GITHUB_STEP_SUMMARY
          echo >> $GITHUB_STEP_SUMMARY
      - name: Upload CI logs
-        if: ${{ success() || failure() }}
+        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: ci-logs
          path: CI/out
          if-no-files-found: error
      - name: Collect coverage report
-        if: ${{ success() || failure() }}
+        if: ${{ always() }}
        run: |
          python -m coverage html
          python -m coverage json
      - name: Publish coverage report to job summary
-        if: ${{ success() || failure() }}
+        if: ${{ always() }}
        run: |
          pip install html2text
          html2text --ignore-images --ignore-links -b 0 htmlcov/index.html >> $GITHUB_STEP_SUMMARY
      - name: Upload coverage data
-        if: ${{ success() || failure() }}
+        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: coverage
          path: htmlcov
          if-no-files-found: error
      - name: Upload json coverage
-        if: ${{ success() || failure() }}
+        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: coverage.json
          path: coverage.json
          if-no-files-found: error
      - name: Check CI results
-        if: ${{ success() || failure() }}
+        if: ${{ always() }}
        run: "! grep Fail CI/results.markdown"

  badge:
@@ -191,7 +186,7 @@ jobs:
        - name: Set up Python
          uses: actions/setup-python@v4
          with:
-            python-version: 3.9
+            python-version: '3.11'
        - name: Copy badge on GitHub Page Repo
          env:
            COLOR: yellow
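The Elasticsearch wait loop added to `tests.yml` stops polling after 30 attempts but then lets the step continue, so a cluster that never becomes healthy only shows up later as failing tests. A minimal sketch of a reusable retry helper that reports the outcome through its exit status, so the caller can choose to fail fast; `wait_until` is a hypothetical helper introduced for illustration.

```bash
#!/usr/bin/env bash
# wait_until ATTEMPTS INTERVAL CMD [ARGS...]
# Runs CMD up to ATTEMPTS times, sleeping INTERVAL seconds between tries.
# Returns 0 as soon as CMD succeeds and 1 if it never does, so the caller
# decides whether to fail the job or carry on.
wait_until() {
  local attempts="$1" interval="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@" > /dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    echo "attempt $i/$attempts: not ready yet"
    sleep "$interval"
  done
  return 1
}
```

If Elasticsearch is actually required, a call such as `wait_until 30 2 curl -k -s -f -u "elastic:$ELASTIC_PASSWORD" https://localhost:9200/_cluster/health || exit 1` would fail the step instead of continuing. The `-f` flag makes curl treat HTTP error responses (for example 401) as failures, which `-s` alone does not.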
--- a/CI/config/common_test_config.yaml
+++ b/CI/config/common_test_config.yaml
@@ -2,6 +2,10 @@ kraken:
    distribution: kubernetes                                # Distribution can be kubernetes or openshift.
    kubeconfig_path: ~/.kube/config                        # Path to kubeconfig.
    exit_on_failure: False                                 # Exit when a post action scenario fails.
+    publish_kraken_status: True                            # Can be accessed at http://0.0.0.0:8081
+    signal_state: RUN                                      # When set to PAUSE, waits for the RUN signal before running the scenarios; see docs/signal.md for details
+    signal_address: 0.0.0.0                                # Signal listening address
+    port: 8081                                             # Signal port
    auto_rollback: True                                    # Enable auto rollback for scenarios.
    rollback_versions_directory: /tmp/kraken-rollback      # Directory to store rollback version files.
    chaos_scenarios:                                       # List of policies/chaos scenarios to load.
@@ -38,7 +42,7 @@ telemetry:
    prometheus_backup: True                                 # enables/disables prometheus data collection
    full_prometheus_backup: False                           # if is set to False only the /prometheus/wal folder will be downloaded.
    backup_threads: 5                                       # number of telemetry download/upload threads
-    archive_path: /tmp                                      # local path where the archive files will be temporarly stored
+    archive_path: /tmp                                      # local path where the archive files will be temporarily stored
    max_retries: 0                                          # maximum number of upload retries (if 0 will retry forever)
    run_tag: ''                                             # if set, this will be appended to the run folder in the bucket (useful to group the runs)
    archive_size: 10000                                     # the size of the prometheus data archive size in KB. The lower the size of archive is
--- a/CI/legacy/scenarios/volume_scenario.yaml
+++ b/CI/legacy/scenarios/volume_scenario.yaml
@@ -45,15 +45,45 @@ metadata:
  name: kraken-test-pod
  namespace: kraken
 spec:
+  securityContext:
+    fsGroup: 1001
+  # initContainer to fix permissions on the mounted volume
+  initContainers:
+    - name: fix-permissions
+      image: 'quay.io/centos7/httpd-24-centos7:centos7'
+      command:
+        - sh
+        - -c
+        - |
+          echo "Setting up permissions for /home/kraken..."
+          # Create the directory if it doesn't exist
+          mkdir -p /home/kraken
+          # Set ownership to user 1001 and group 1001
+          chown -R 1001:1001 /home/kraken
+          # Owner (UID 1001) gets read/write/execute; group and others get read/execute
+          chmod -R 755 /home/kraken
+          rm -rf /home/kraken/*
+          echo "Permissions fixed. Current state:"
+          ls -la /home/kraken
+      volumeMounts:
+        - mountPath: "/home/kraken"
+          name: kraken-test-pv
+      securityContext:
+        runAsUser: 0  # Run as root to fix permissions
  volumes:
    - name: kraken-test-pv
      persistentVolumeClaim:
        claimName: kraken-test-pvc
  containers:
    - name: kraken-test-container
-      image: 'quay.io/centos7/httpd-24-centos7:latest'
-      volumeMounts:
-        - mountPath: "/home/krake-dir/"
-          name: kraken-test-pv
+      image: 'quay.io/centos7/httpd-24-centos7:centos7'
      securityContext:
-        privileged: true
+        runAsUser: 1001
+        runAsNonRoot: true
+        allowPrivilegeEscalation: false
+        capabilities:
+          drop:
+            - ALL
+      volumeMounts:
+        - mountPath: "/home/kraken"
+          name: kraken-test-pv
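In the init container, `chmod -R 755` keeps write access with the owner only: after the `chown`, UID 1001 (the user the main container runs as) can write, while group and other users can only read and traverse. A quick way to check what an octal mode grants, assuming GNU coreutils `stat`; `show_mode` is a hypothetical helper, not something the scenario runs.

```bash
#!/usr/bin/env bash
# show_mode OCTAL: applies OCTAL to a scratch directory and prints the
# numeric and symbolic permissions (GNU stat format strings).
show_mode() {
  local dir
  dir=$(mktemp -d)
  chmod "$1" "$dir"
  stat -c '%a %A' "$dir"
  rm -rf "$dir"
}

show_mode 755   # 755 drwxr-xr-x  (only the owner can write)
show_mode 775   # 775 drwxrwxr-x  (group members can also write)
```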
--- a/CI/tests/test_app_outages.sh
+++ b/CI/tests/test_app_outages.sh
@@ -19,6 +19,7 @@ function functional_test_app_outage {
  kubectl get pods 
  envsubst < CI/config/common_test_config.yaml > CI/config/app_outage.yaml
  cat $scenario_file
+  cat CI/config/app_outage.yaml
  python3 -m coverage run -a run_kraken.py -c CI/config/app_outage.yaml
  echo "App outage scenario test: Success"
 }
--- a/CI/tests/test_container.sh
+++ b/CI/tests/test_container.sh
@@ -16,8 +16,10 @@ function functional_test_container_crash {
  export post_config=""
  envsubst < CI/config/common_test_config.yaml > CI/config/container_config.yaml

-  python3 -m coverage run -a run_kraken.py -c CI/config/container_config.yaml
+  python3 -m coverage run -a run_kraken.py -c CI/config/container_config.yaml -d True
  echo "Container scenario test: Success"
+
+  kubectl get pods -n kube-system -l component=etcd
 }

 functional_test_container_crash
--- a/CI/tests/test_customapp_pod.sh
+++ b/CI/tests/test_customapp_pod.sh
@@ -11,7 +11,7 @@ function functional_test_customapp_pod_node_selector {
  export post_config=""
  envsubst < CI/config/common_test_config.yaml > CI/config/customapp_pod_config.yaml

-  python3 -m coverage run -a run_kraken.py -c CI/config/customapp_pod_config.yaml
+  python3 -m coverage run -a run_kraken.py -c CI/config/customapp_pod_config.yaml -d True
  echo "Pod disruption with node_label_selector test: Success"
 }

--- a/CI/tests/test_node.sh
+++ b/CI/tests/test_node.sh
@@ -0,0 +1,18 @@
+set -xeEo pipefail
+
+source CI/tests/common.sh
+
+trap error ERR
+trap finish EXIT
+
+function functional_test_node_stop_start {
+  export scenario_type="node_scenarios"
+  export scenario_file="scenarios/kind/node_scenarios_example.yml"
+  export post_config=""
+  envsubst < CI/config/common_test_config.yaml > CI/config/node_config.yaml
+  cat CI/config/node_config.yaml
+  python3 -m coverage run -a run_kraken.py -c CI/config/node_config.yaml
+  echo "Node Stop/Start scenario test: Success"
+}
+
+functional_test_node_stop_start
--- a/CI/tests/test_node_network_chaos.sh
+++ b/CI/tests/test_node_network_chaos.sh
@@ -0,0 +1,165 @@
+set -xeEo pipefail
+
+source CI/tests/common.sh
+
+trap error ERR
+trap finish EXIT
+
+function functional_test_node_network_chaos {
+  echo "Starting node network chaos functional test"
+
+  # Get a worker node
+  get_node
+  export TARGET_NODE=$(echo $WORKER_NODE | awk '{print $1}')
+  echo "Target node: $TARGET_NODE"
+
+  # Deploy nginx workload on the target node
+  echo "Deploying nginx workload on $TARGET_NODE..."
+  kubectl create deployment nginx-node-net-chaos --image=nginx:latest
+
+  # Add node selector to ensure pod runs on target node
+  kubectl patch deployment nginx-node-net-chaos -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"'$TARGET_NODE'"}}}}}'
+
+  # Expose service
+  kubectl expose deployment nginx-node-net-chaos --port=80 --target-port=80 --name=nginx-node-net-chaos-svc
+
+  # Wait for nginx to be ready
+  echo "Waiting for nginx pod to be ready on $TARGET_NODE..."
+  kubectl wait --for=condition=ready pod -l app=nginx-node-net-chaos --timeout=120s
+
+  # Verify pod is on correct node
+  export POD_NAME=$(kubectl get pods -l app=nginx-node-net-chaos -o jsonpath='{.items[0].metadata.name}')
+  export POD_NODE=$(kubectl get pod $POD_NAME -o jsonpath='{.spec.nodeName}')
+  echo "Pod $POD_NAME is running on node $POD_NODE"
+
+  if [ "$POD_NODE" != "$TARGET_NODE" ]; then
+    echo "ERROR: Pod is not on target node (expected $TARGET_NODE, got $POD_NODE)"
+    kubectl get pods -l app=nginx-node-net-chaos -o wide
+    exit 1
+  fi
+
+  # Setup port-forward to access nginx
+  echo "Setting up port-forward to nginx service..."
+  kubectl port-forward service/nginx-node-net-chaos-svc 8091:80 &
+  PORT_FORWARD_PID=$!
+  sleep 3  # Give port-forward time to start
+
+  # Test baseline connectivity
+  echo "Testing baseline connectivity..."
+  response=$(curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://localhost:8091) || response="000"
+  if [ "$response" != "200" ]; then
+    echo "ERROR: Nginx not responding correctly (got $response, expected 200)"
+    kubectl get pods -l app=nginx-node-net-chaos
+    kubectl describe pod $POD_NAME
+    exit 1
+  fi
+  echo "Baseline test passed: nginx responding with 200"
+
+  # Measure baseline latency
+  echo "Measuring baseline latency..."
+  baseline_start=$(date +%s%3N)
+  curl -s http://localhost:8091 > /dev/null || true
+  baseline_end=$(date +%s%3N)
+  baseline_latency=$((baseline_end - baseline_start))
+  echo "Baseline latency: ${baseline_latency}ms"
+
+  # Configure node network chaos scenario
+  echo "Configuring node network chaos scenario..."
+  yq -i '.[0].config.target="'$TARGET_NODE'"' scenarios/kube/node-network-chaos.yml
+  yq -i '.[0].config.namespace="default"' scenarios/kube/node-network-chaos.yml
+  yq -i '.[0].config.test_duration=20' scenarios/kube/node-network-chaos.yml
+  yq -i '.[0].config.latency="200ms"' scenarios/kube/node-network-chaos.yml
+  yq -i '.[0].config.loss=15' scenarios/kube/node-network-chaos.yml
+  yq -i '.[0].config.bandwidth="10mbit"' scenarios/kube/node-network-chaos.yml
+  yq -i '.[0].config.ingress=true' scenarios/kube/node-network-chaos.yml
+  yq -i '.[0].config.egress=true' scenarios/kube/node-network-chaos.yml
+  yq -i '.[0].config.force=false' scenarios/kube/node-network-chaos.yml
+  yq -i 'del(.[0].config.interfaces)' scenarios/kube/node-network-chaos.yml
+
+  # Prepare krkn config
+  export scenario_type="network_chaos_ng_scenarios"
+  export scenario_file="scenarios/kube/node-network-chaos.yml"
+  export post_config=""
+  envsubst < CI/config/common_test_config.yaml > CI/config/node_network_chaos_config.yaml
+
+  # Run krkn in background
+  echo "Starting krkn with node network chaos scenario..."
+  python3 -m coverage run -a run_kraken.py -c CI/config/node_network_chaos_config.yaml &
+  KRKN_PID=$!
+  echo "Krkn started with PID: $KRKN_PID"
+
+  # Wait for chaos to start (give it time to inject chaos)
+  echo "Waiting for chaos injection to begin..."
+  sleep 10
+
+  # Test during chaos - check for increased latency or packet loss effects
+  echo "Testing network behavior during chaos..."
+  chaos_test_count=0
+  chaos_success=0
+
+  for i in {1..5}; do
+    chaos_test_count=$((chaos_test_count + 1))
+    chaos_start=$(date +%s%3N)
+    response=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 http://localhost:8091) || response="000"
+    chaos_end=$(date +%s%3N)
+    chaos_latency=$((chaos_end - chaos_start))
+
+    echo "Attempt $i: HTTP $response, latency: ${chaos_latency}ms"
+
+    # We expect either increased latency or some failures due to packet loss
+    if [ "$response" == "200" ] || [ "$response" == "000" ]; then
+      chaos_success=$((chaos_success + 1))
+    fi
+
+    sleep 2
+  done
+
+  echo "Chaos test results: $chaos_success/$chaos_test_count requests processed"
+
+  # Node-level chaos should affect every pod on the node, including the nginx pod
+  echo "Node-level chaos targets all pods on $TARGET_NODE, including $POD_NAME"
+  # No separate per-pod assertion is made here; the probes above observe the effect
+
+  # Wait for krkn to complete
+  echo "Waiting for krkn to complete..."
+  wait $KRKN_PID || true
+  echo "Krkn completed"
+
+  # Wait a bit for cleanup
+  sleep 5
+
+  # Verify recovery - nginx should respond normally again
+  echo "Verifying service recovery..."
+  recovery_attempts=0
+  max_recovery_attempts=10
+
+  while [ $recovery_attempts -lt $max_recovery_attempts ]; do
+    recovery_attempts=$((recovery_attempts + 1))
+    response=$(curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://localhost:8091) || response="000"
+
+    if [ "$response" == "200" ]; then
+      echo "Recovery verified: nginx responding normally (attempt $recovery_attempts)"
+      break
+    fi
+
+    echo "Recovery attempt $recovery_attempts/$max_recovery_attempts: got $response, retrying..."
+    sleep 3
+  done
+
+  if [ "$response" != "200" ]; then
+    echo "ERROR: Service did not recover after chaos (got $response)"
+    kubectl get pods -l app=nginx-node-net-chaos
+    kubectl describe pod $POD_NAME
+    exit 1
+  fi
+
+  # Cleanup
+  echo "Cleaning up test resources..."
+  kill $PORT_FORWARD_PID 2>/dev/null || true
+  kubectl delete deployment nginx-node-net-chaos --ignore-not-found=true
+  kubectl delete service nginx-node-net-chaos-svc --ignore-not-found=true
+
+  echo "Node network chaos test: Success"
+}
+
+functional_test_node_network_chaos
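Probing the service with curl under `set -e` needs care: when a request fails, curl still writes `000` through `-w "%{http_code}"` and then exits non-zero, so appending `|| echo "000"` inside the command substitution prints `000000`. A minimal sketch of a probe that always yields exactly one status code plus the latency in milliseconds, using the same `date +%s%3N` clock as the test (GNU date); `http_status` and `timed_status` are hypothetical helpers, not functions from `CI/tests/common.sh`.

```bash
#!/usr/bin/env bash
# http_status URL [TIMEOUT_SECONDS]
# Prints exactly one HTTP status code for URL, or "000" when no HTTP response
# was received (connection refused, timeout, or curl not installed).
http_status() {
  local code
  code=$(curl -s -o /dev/null -w "%{http_code}" --max-time "${2:-5}" "$1") || code="000"
  printf '%s\n' "$code"
}

# timed_status URL [TIMEOUT_SECONDS]
# Prints "<status> <latency_ms>" for a single request.
timed_status() {
  local start end code
  start=$(date +%s%3N)
  code=$(http_status "$1" "${2:-5}")
  end=$(date +%s%3N)
  printf '%s %s\n' "$code" "$((end - start))"
}
```

Inside the chaos loop this would read as `read -r response chaos_latency <<< "$(timed_status http://localhost:8091 10)"`, which keeps the status comparison against `"000"` meaningful.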
--- a/CI/tests/test_pod.sh
+++ b/CI/tests/test_pod.sh
@@ -7,12 +7,15 @@ trap finish EXIT

 function functional_test_pod_crash {
  export scenario_type="pod_disruption_scenarios"
-  export scenario_file="scenarios/kind/pod_etcd.yml"
+  export scenario_file="scenarios/kind/pod_path_provisioner.yml"
+
  export post_config=""
  envsubst < CI/config/common_test_config.yaml > CI/config/pod_config.yaml

  python3 -m coverage run -a run_kraken.py -c CI/config/pod_config.yaml
  echo "Pod disruption scenario test: Success"
+  date
+  kubectl get pods -n local-path-storage -l app=local-path-provisioner -o yaml
 }

 functional_test_pod_crash
--- a/CI/tests/test_pod_error.sh
+++ b/CI/tests/test_pod_error.sh
@@ -0,0 +1,31 @@
+
+
+source CI/tests/common.sh
+
+trap error ERR
+trap finish EXIT
+
+function functional_test_pod_error {
+  export scenario_type="pod_disruption_scenarios"
+  export scenario_file="scenarios/kind/pod_etcd.yml"
+  export post_config=""
+  # this test will check if krkn exits with an error when too many pods are targeted
+  yq -i '.[0].config.kill=5' scenarios/kind/pod_etcd.yml
+  yq -i '.[0].config.krkn_pod_recovery_time=1' scenarios/kind/pod_etcd.yml
+  envsubst < CI/config/common_test_config.yaml > CI/config/pod_config.yaml
+  cat CI/config/pod_config.yaml
+
+  cat scenarios/kind/pod_etcd.yml
+  python3 -m coverage run -a run_kraken.py -c CI/config/pod_config.yaml
+
+  ret=$?
+  printf '\n\nret %s\n' "$ret"
+  if [[ $ret -ge 1 ]]; then
+    echo "Pod disruption error scenario test: Success"
+  else
+    echo "Pod disruption error scenario test: Failure"
+    exit 1
+  fi
+}
+
+functional_test_pod_error
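Unlike the other scripts, `test_pod_error.sh` does not start with `set -xeEo pipefail`, presumably so that the expected krkn failure does not abort the script before its exit code is checked. Note that `trap error ERR` still fires on a failing command even without `set -e`; only commands in an `if` test, under `!`, or in the non-final part of an `&&`/`||` list are exempt. A minimal sketch of an expected-failure check that also works with `set -eE`; `expect_failure` is a hypothetical helper, and its interaction with the `error`/`finish` traps in `CI/tests/common.sh` has not been verified.

```bash
#!/usr/bin/env bash
set -eEo pipefail

# expect_failure CMD [ARGS...]
# Succeeds only if CMD exits non-zero. CMD runs on the left of "||", so neither
# set -e nor an ERR trap fires when it fails as expected.
expect_failure() {
  local ret=0
  "$@" || ret=$?
  if (( ret >= 1 )); then
    echo "expected failure observed (exit code $ret)"
    return 0
  fi
  echo "command unexpectedly succeeded" >&2
  return 1
}

expect_failure sh -c 'exit 3'   # prints: expected failure observed (exit code 3)
```

In this test the call would look like `expect_failure python3 -m coverage run -a run_kraken.py -c CI/config/pod_config.yaml`.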
--- a/CI/tests/test_pod_network_chaos.sh
+++ b/CI/tests/test_pod_network_chaos.sh
@@ -0,0 +1,143 @@
+set -xeEo pipefail
+
+source CI/tests/common.sh
+
+trap error ERR
+trap finish EXIT
+
+function functional_test_pod_network_chaos {
+  echo "Starting pod network chaos functional test"
+
+  # Deploy nginx workload
+  echo "Deploying nginx workload..."
+  kubectl create deployment nginx-pod-net-chaos --image=nginx:latest
+  kubectl expose deployment nginx-pod-net-chaos --port=80 --target-port=80 --name=nginx-pod-net-chaos-svc
+
+  # Wait for nginx to be ready
+  echo "Waiting for nginx pod to be ready..."
+  kubectl wait --for=condition=ready pod -l app=nginx-pod-net-chaos --timeout=120s
+
+  # Get pod name
+  export POD_NAME=$(kubectl get pods -l app=nginx-pod-net-chaos -o jsonpath='{.items[0].metadata.name}')
+  echo "Target pod: $POD_NAME"
+
+  # Setup port-forward to access nginx
+  echo "Setting up port-forward to nginx service..."
+  kubectl port-forward service/nginx-pod-net-chaos-svc 8090:80 &
+  PORT_FORWARD_PID=$!
+  sleep 3  # Give port-forward time to start
+
+  # Test baseline connectivity
+  echo "Testing baseline connectivity..."
+  response=$(curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://localhost:8090) || response="000"
+  if [ "$response" != "200" ]; then
+    echo "ERROR: Nginx not responding correctly (got $response, expected 200)"
+    kubectl get pods -l app=nginx-pod-net-chaos
+    kubectl describe pod $POD_NAME
+    exit 1
+  fi
+  echo "Baseline test passed: nginx responding with 200"
+
+  # Measure baseline latency
+  echo "Measuring baseline latency..."
+  baseline_start=$(date +%s%3N)
+  curl -s http://localhost:8090 > /dev/null || true
+  baseline_end=$(date +%s%3N)
+  baseline_latency=$((baseline_end - baseline_start))
+  echo "Baseline latency: ${baseline_latency}ms"
+
+  # Configure pod network chaos scenario
+  echo "Configuring pod network chaos scenario..."
+  yq -i '.[0].config.target="'$POD_NAME'"' scenarios/kube/pod-network-chaos.yml
+  yq -i '.[0].config.namespace="default"' scenarios/kube/pod-network-chaos.yml
+  yq -i '.[0].config.test_duration=20' scenarios/kube/pod-network-chaos.yml
+  yq -i '.[0].config.latency="200ms"' scenarios/kube/pod-network-chaos.yml
+  yq -i '.[0].config.loss=15' scenarios/kube/pod-network-chaos.yml
+  yq -i '.[0].config.bandwidth="10mbit"' scenarios/kube/pod-network-chaos.yml
+  yq -i '.[0].config.ingress=true' scenarios/kube/pod-network-chaos.yml
+  yq -i '.[0].config.egress=true' scenarios/kube/pod-network-chaos.yml
+  yq -i 'del(.[0].config.interfaces)' scenarios/kube/pod-network-chaos.yml
+
+  # Prepare krkn config
+  export scenario_type="network_chaos_ng_scenarios"
+  export scenario_file="scenarios/kube/pod-network-chaos.yml"
+  export post_config=""
+  envsubst < CI/config/common_test_config.yaml > CI/config/pod_network_chaos_config.yaml
+
+  # Run krkn in background
+  echo "Starting krkn with pod network chaos scenario..."
+  python3 -m coverage run -a run_kraken.py -c CI/config/pod_network_chaos_config.yaml &
+  KRKN_PID=$!
+  echo "Krkn started with PID: $KRKN_PID"
+
+  # Wait for chaos to start (give it time to inject chaos)
+  echo "Waiting for chaos injection to begin..."
+  sleep 10
+
+  # Test during chaos - check for increased latency or packet loss effects
+  echo "Testing network behavior during chaos..."
+  chaos_test_count=0
+  chaos_success=0
+
+  for i in {1..5}; do
+    chaos_test_count=$((chaos_test_count + 1))
+    chaos_start=$(date +%s%3N)
+    response=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 http://localhost:8090 || true)  # curl -w already prints 000 on failure
+    chaos_end=$(date +%s%3N)
+    chaos_latency=$((chaos_end - chaos_start))
+
+    echo "Attempt $i: HTTP $response, latency: ${chaos_latency}ms"
+
+    # Under chaos, expect higher latency or dropped requests: count both 200 responses and curl failures (000) as expected outcomes
+    if [ "$response" == "200" ] || [ "$response" == "000" ]; then
+      chaos_success=$((chaos_success + 1))
+    fi
+
+    sleep 2
+  done
+
+  echo "Chaos test results: $chaos_success/$chaos_test_count requests processed"
+
+  # Wait for krkn to complete
+  echo "Waiting for krkn to complete..."
+  wait $KRKN_PID || true
+  echo "Krkn completed"
+
+  # Wait a bit for cleanup
+  sleep 5
+
+  # Verify recovery - nginx should respond normally again
+  echo "Verifying service recovery..."
+  recovery_attempts=0
+  max_recovery_attempts=10
+
+  while [ $recovery_attempts -lt $max_recovery_attempts ]; do
+    recovery_attempts=$((recovery_attempts + 1))
+    response=$(curl -s -o /dev/null -w "%{http_code}" --max-time 5 http://localhost:8090 || true)  # curl -w already prints 000 on failure
+
+    if [ "$response" == "200" ]; then
+      echo "Recovery verified: nginx responding normally (attempt $recovery_attempts)"
+      break
+    fi
+
+    echo "Recovery attempt $recovery_attempts/$max_recovery_attempts: got $response, retrying..."
+    sleep 3
+  done
+
+  if [ "$response" != "200" ]; then
+    echo "ERROR: Service did not recover after chaos (got $response)"
+    kubectl get pods -l app=nginx-pod-net-chaos
+    kubectl describe pod $POD_NAME
+    exit 1
+  fi
+
+  # Cleanup
+  echo "Cleaning up test resources..."
+  kill $PORT_FORWARD_PID 2>/dev/null || true
+  kubectl delete deployment nginx-pod-net-chaos --ignore-not-found=true
+  kubectl delete service nginx-pod-net-chaos-svc --ignore-not-found=true
+
+  echo "Pod network chaos test: Success"
+}
+
+functional_test_pod_network_chaos
--- a/CI/tests/test_pod_server.sh
+++ b/CI/tests/test_pod_server.sh
@@ -0,0 +1,35 @@
+set -xeEo pipefail
+
+source CI/tests/common.sh
+
+trap error ERR
+trap finish EXIT
+
+function functional_test_pod_server {
+  export scenario_type="pod_disruption_scenarios"
+  export scenario_file="scenarios/kind/pod_etcd.yml"
+  export post_config=""
+
+  envsubst < CI/config/common_test_config.yaml > CI/config/pod_config.yaml
+  yq -i '.[0].config.kill=1' scenarios/kind/pod_etcd.yml
+  
+  yq -i '.tunings.daemon_mode=True' CI/config/pod_config.yaml
+  cat CI/config/pod_config.yaml
+  python3 -m coverage run -a run_kraken.py -c CI/config/pod_config.yaml & 
+  sleep 15
+  curl -X POST http://0.0.0.0:8081/STOP
+
+  wait
+
+  yq -i '.kraken.signal_state="PAUSE"' CI/config/pod_config.yaml
+  yq -i '.tunings.daemon_mode=False' CI/config/pod_config.yaml
+  cat CI/config/pod_config.yaml
+  python3 -m coverage run -a run_kraken.py -c CI/config/pod_config.yaml & 
+  sleep 5
+  curl -X POST http://0.0.0.0:8081/RUN
+  wait
+
+  echo "Pod disruption with server scenario test: Success"
+}
+
+functional_test_pod_server
--- a/CI/tests/test_pvc.sh
+++ b/CI/tests/test_pvc.sh
@@ -0,0 +1,18 @@
+set -xeEo pipefail
+
+source CI/tests/common.sh
+
+trap error ERR
+trap finish EXIT
+
+function functional_test_pvc_fill {
+  export scenario_type="pvc_scenarios"
+  export scenario_file="scenarios/kind/pvc_scenario.yaml"
+  export post_config=""
+  envsubst < CI/config/common_test_config.yaml > CI/config/pvc_config.yaml
+  cat CI/config/pvc_config.yaml
+  python3 -m coverage run -a run_kraken.py -c CI/config/pvc_config.yaml --debug True
+  echo "PVC Fill scenario test: Success"
+}
+
+functional_test_pvc_fill
--- a/CI/tests/test_telemetry.sh
+++ b/CI/tests/test_telemetry.sh
@@ -18,9 +18,8 @@ function functional_test_telemetry {
  yq -i '.performance_monitoring.prometheus_url="http://localhost:9090"' CI/config/common_test_config.yaml
  yq -i '.telemetry.run_tag=env(RUN_TAG)' CI/config/common_test_config.yaml

-  export scenario_type="hog_scenarios"
-
-  export scenario_file="scenarios/kube/cpu-hog.yml"
+  export scenario_type="pod_disruption_scenarios"
+  export scenario_file="scenarios/kind/pod_etcd.yml"

  export post_config=""
  envsubst < CI/config/common_test_config.yaml > CI/config/telemetry.yaml
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,273 @@
+# CLAUDE.md - Krkn Chaos Engineering Framework
+
+## Project Overview
+
+Krkn (Kraken) is a chaos engineering tool for Kubernetes/OpenShift clusters. It injects deliberate failures to validate cluster resilience. It uses a plugin-based architecture and supports multiple clouds (AWS, Azure, GCP, IBM Cloud, VMware, Alibaba, OpenStack).
+
+## Repository Structure
+
+```
+krkn/
+├── krkn/
+│   ├── scenario_plugins/        # Chaos scenario plugins (pod, node, network, hogs, etc.)
+│   ├── utils/                   # Utility functions
+│   ├── rollback/                # Rollback management
+│   ├── prometheus/              # Prometheus integration
+│   └── cerberus/                # Health monitoring
+├── tests/                       # Unit tests (unittest framework)
+├── scenarios/                   # Example scenario configs (openshift/, kube/, kind/)
+├── config/                      # Configuration files
+└── CI/                          # CI/CD test scripts
+```
+
+## Quick Start
+
+```bash
+# Setup (ALWAYS use virtual environment)
+python3 -m venv venv
+source venv/bin/activate
+pip install -r requirements.txt
+
+# Run Krkn
+python run_kraken.py --config config/config.yaml
+
+# Note: Scenarios are specified in config.yaml under kraken.chaos_scenarios
+# There is no --scenario flag; edit config/config.yaml to select scenarios
+
+# Run tests
+python -m unittest discover -s tests -v
+python -m coverage run -a -m unittest discover -s tests -v
+```
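+
+Scenarios are selected by listing scenario files under their scenario type in `kraken.chaos_scenarios`. The excerpt below is a minimal sketch modeled on the entries in `config/config.yaml`; other `kraken` settings are omitted, and in practice you would keep only the scenarios you want to run:
+
+```yaml
+kraken:
+    chaos_scenarios:
+        - network_chaos_ng_scenarios:
+            - scenarios/kube/pod-network-chaos.yml
+            - scenarios/kube/node-network-chaos.yml
+        - kubevirt_vm_outage:
+            - scenarios/kubevirt/kubevirt-vm-outage.yaml
+```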
+
+## Critical Requirements
+
+### Python Environment
+- **Python 3.9+** required
+- **NEVER install packages globally** - always use virtual environment
+- **CRITICAL**: `docker` must be <7.0 and `requests` must be <2.32 (Unix socket compatibility)
+
+### Key Dependencies
+- **krkn-lib** (5.1.13): Core library for Kubernetes/OpenShift operations
+- **kubernetes** (34.1.0): Kubernetes Python client
+- **docker** (<7.0), **requests** (<2.32): DO NOT upgrade without verifying compatibility
+- Cloud SDKs: boto3 (AWS), azure-mgmt-* (Azure), google-cloud-compute (GCP), ibm_vpc (IBM), pyVmomi (VMware)
+
+## Plugin Architecture (CRITICAL)
+
+**Strictly enforced naming conventions:**
+
+### Naming Rules
+- **Module files**: Must end with `_scenario_plugin.py` and use snake_case
+  - Example: `pod_disruption_scenario_plugin.py`
+- **Class names**: Must be CamelCase and end with `ScenarioPlugin`
+  - Example: `PodDisruptionScenarioPlugin`
+  - Must match module filename (snake_case ↔ CamelCase)
+- **Directory structure**: Plugin dirs CANNOT contain "scenario" or "plugin"
+  - Location: `krkn/scenario_plugins/<plugin_name>/`
+
+### Plugin Implementation
+Every plugin MUST:
+1. Extend `AbstractScenarioPlugin`
+2. Implement `run()` method
+3. Implement `get_scenario_types()` method
+
+```python
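+from krkn.scenario_plugins.abstract_scenario_plugin import AbstractScenarioPlugin
+
+class PodDisruptionScenarioPlugin(AbstractScenarioPlugin):
+    def run(self, run_uuid, scenario, krkn_config, lib_telemetry, scenario_telemetry) -> int:
+        # Inject the chaos described by `scenario`; return 0 on success, 1 on failure
+        return 0
+
+    def get_scenario_types(self) -> list[str]:
+        # Must match the scenario type key used under kraken.chaos_scenarios
+        return ["pod_disruption_scenarios"]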
+```
+
+### Creating a New Plugin
+1. Create directory: `krkn/scenario_plugins/<plugin_name>/`
+2. Create module: `<plugin_name>_scenario_plugin.py`
+3. Create class: `<PluginName>ScenarioPlugin` extending `AbstractScenarioPlugin`
+4. Implement `run()` and `get_scenario_types()`
+5. Create unit test: `tests/test_<plugin_name>_scenario_plugin.py`
+6. Add example scenario: `scenarios/<platform>/<scenario>.yaml`
+
+**DO NOT**: Violate naming conventions (factory will reject), include "scenario"/"plugin" in directory names, create plugins without tests.
+
+## Testing
+
+### Unit Tests
+```bash
+# Run all tests
+python -m unittest discover -s tests -v
+
+# Specific test
+python -m unittest tests.test_pod_disruption_scenario_plugin
+
+# With coverage
+python -m coverage run -a -m unittest discover -s tests -v
+python -m coverage html
+```
+
+**Test requirements:**
+- Naming: `test_<module>_scenario_plugin.py`
+- Mock external dependencies (Kubernetes API, cloud providers)
+- Test success, failure, and edge cases
+- Keep tests isolated and independent
+
+### Functional Tests
+Located in `CI/tests/`. Can be run locally on a kind cluster with Prometheus and Elasticsearch set up.
+
+**Setup for local testing:**
+1. Deploy Prometheus and Elasticsearch on your kind cluster:
+   - Prometheus setup: https://krkn-chaos.dev/docs/developers-guide/testing-changes/#prometheus
+   - Elasticsearch setup: https://krkn-chaos.dev/docs/developers-guide/testing-changes/#elasticsearch
+
+2. Or disable monitoring features in `config/config.yaml`:
+   ```yaml
+   performance_monitoring:
+       enable_alerts: False
+       enable_metrics: False
+       check_critical_alerts: False
+   ```
+
+**Note:** Functional tests run automatically in CI with full monitoring enabled.
+
+## Cloud Provider Implementations
+
+Node chaos scenarios are cloud-specific. Each provider is implemented in `krkn/scenario_plugins/node_actions/<provider>_node_scenarios.py`:
+- AWS, Azure, GCP, IBM Cloud, VMware, Alibaba, OpenStack, Bare Metal
+
+Each implementation supports stopping, starting, rebooting, and terminating instances.
+
+**When modifying**: Maintain consistency with other providers, handle API errors, add logging, update tests.
+
+### Adding Cloud Provider Support
+1. Create: `krkn/scenario_plugins/node_actions/<provider>_node_scenarios.py`
+2. Extend: `abstract_node_scenarios.AbstractNodeScenarios`
+3. Implement: `stop_instances`, `start_instances`, `reboot_instances`, `terminate_instances`
+4. Add SDK to `requirements.txt`
+5. Create unit test with mocked SDK
+6. Add example scenario: `scenarios/openshift/<provider>_node_scenarios.yml`
+
+## Configuration
+
+**Main config**: `config/config.yaml`
+- `kraken`: Core settings
+- `cerberus`: Health monitoring
+- `performance_monitoring`: Prometheus
+- `elastic`: Elasticsearch telemetry
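+
+A trimmed sketch of the `cerberus` and `performance_monitoring` sections, using keys that appear in `config/config.yaml`. The Prometheus URL is an assumed local endpoint (the one the CI telemetry test uses); on OpenShift it is discovered automatically:
+
+```yaml
+cerberus:
+    cerberus_enabled: False           # Enable only when Cerberus is already installed
+    cerberus_url:                     # URL where Cerberus publishes its go/no-go signal
+    check_application_routes: False   # Fail the run if application routes become unavailable
+
+performance_monitoring:
+    prometheus_url: 'http://localhost:9090'   # assumed local Prometheus; set it when running on Kubernetes
+```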
+
+**Scenario configs**: `scenarios/` directory
+```yaml
+- config:
+    scenario_type: <type>  # Must match plugin's get_scenario_types()
+```
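+
+For instance, the pod network chaos functional test in `CI/tests/` overrides the following fields of `scenarios/kube/pod-network-chaos.yml` with `yq` before running it. The excerpt shows only those fields; `<pod-name>` is a placeholder, the units in the comments are assumptions, and the file's other fields are omitted:
+
+```yaml
+- config:
+    target: "<pod-name>"
+    namespace: default
+    test_duration: 20      # assumed to be seconds
+    latency: 200ms
+    loss: 15               # assumed to be a percentage
+    bandwidth: 10mbit
+    ingress: true
+    egress: true
+```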
+
+## Code Style
+
+- **Import order**: Standard library, third-party, local imports
+- **Naming**: snake_case (functions/variables), CamelCase (classes)
+- **Logging**: Use Python's `logging` module
+- **Error handling**: Return appropriate exit codes
+- **Docstrings**: Required for public functions/classes
+
+## Exit Codes
+
+Krkn uses specific exit codes to communicate execution status:
+
+- `0`: Success - all scenarios passed, no critical alerts
+- `1`: Scenario failure - one or more scenarios failed
+- `2`: Critical alerts fired during execution
+- `3+`: Health check failure (Cerberus monitoring detected issues)
+
+**When implementing scenarios:**
+- Return `0` on success
+- Return `1` on scenario-specific failures
+- Propagate health check failures appropriately
+- Log exit code reasons clearly
+
+## Container Support
+
+Krkn can run inside a container. See `containers/` directory.
+
+**Building custom image:**
+```bash
+cd containers
+./compile_dockerfile.sh  # Generates Dockerfile from template
+docker build -t krkn:latest .
+```
+
+**Running containerized:**
+```bash
+docker run -v ~/.kube/config:/home/krkn/.kube/config:Z \
+  -v "$(pwd)/config:/config:Z" \
+  -v "$(pwd)/scenarios:/scenarios:Z" \
+  krkn:latest
+```
+
+## Git Workflow
+
+- **NEVER commit directly to main**
+- **NEVER use `--force` without approval**
+- **ALWAYS create feature branches**: `git checkout -b feature/description`
+- **ALWAYS run tests before pushing**
+
+**Conventional commits**: `feat:`, `fix:`, `test:`, `docs:`, `refactor:`
+
+```bash
+git checkout main && git pull origin main
+git checkout -b feature/your-feature-name
+# Make changes, write tests
+python -m unittest discover -s tests -v
+git add <specific-files>
+git commit -m "feat: description"
+git push -u origin feature/your-feature-name
+```
+
+## Environment Variables
+
+- `KUBECONFIG`: Path to kubeconfig
+- `AWS_*`, `AZURE_*`, `GOOGLE_APPLICATION_CREDENTIALS`: Cloud credentials
+- `PROMETHEUS_URL`, `ELASTIC_URL`, `ELASTIC_PASSWORD`: Monitoring config
+
+**NEVER commit credentials or API keys.**
+
+## Common Pitfalls
+
+1. Missing virtual environment - always activate venv
+2. Running functional tests without cluster setup
+3. Ignoring exit codes
+4. Modifying krkn-lib directly (it's a separate package)
+5. Upgrading docker/requests beyond version constraints
+
+## Before Writing Code
+
+1. Check for existing implementations
+2. Review existing plugins as examples
+3. Maintain consistency with cloud provider patterns
+4. Plan rollback logic
+5. Write tests alongside code
+6. Update documentation
+
+## When Adding Dependencies
+
+1. Check if functionality exists in krkn-lib or current dependencies
+2. Verify compatibility with existing versions
+3. Pin specific versions in `requirements.txt`
+4. Check for security vulnerabilities
+5. Test thoroughly for conflicts
+
+## Common Development Tasks
+
+### Modifying Existing Plugin
+1. Read plugin code and corresponding test
+2. Make changes
+3. Update/add unit tests
+4. Run: `python -m unittest tests.test_<plugin>_scenario_plugin`
+
+### Writing Unit Tests
+1. Create: `tests/test_<module>_scenario_plugin.py`
+2. Import `unittest` and plugin class
+3. Mock external dependencies
+4. Test success, failure, and edge cases
+5. Run: `python -m unittest tests.test_<module>_scenario_plugin`
+
--- a/GOVERNANCE.md
+++ b/GOVERNANCE.md
@@ -26,7 +26,7 @@ Here is an excerpt:
 ## Maintainer Levels

 ### Contributor
-Contributors contributor to the community. Anyone can become a contributor by participating in discussions, reporting bugs, or contributing code or documentation.
+Contributors contribute to the community. Anyone can become a contributor by participating in discussions, reporting bugs, or contributing code or documentation.

 #### Responsibilities:

@@ -80,4 +80,4 @@ Represent the project in the broader open-source community.


 # Credits
-Sections of this documents have been borrowed from [Kubernetes governance](https://github.com/kubernetes/community/blob/master/governance.md)
+Sections of this document have been borrowed from [Kubernetes governance](https://github.com/kubernetes/community/blob/master/governance.md)
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -16,5 +16,5 @@ Following are a list of enhancements that we are planning to work on adding supp
 - [x] [Krknctl - client for running Krkn scenarios with ease](https://github.com/krkn-chaos/krknctl)
 - [x] [AI Chat bot to help get started with Krkn and commands](https://github.com/krkn-chaos/krkn-lightspeed)
 - [ ] [Ability to roll back cluster to original state if chaos fails](https://github.com/krkn-chaos/krkn/issues/804)
- [ ] Add recovery time metrics to each scenario for each better regression analysis
+- [ ] Add recovery time metrics to each scenario for better regression analysis
 - [ ] [Add resiliency scoring to chaos scenarios ran on cluster](https://github.com/krkn-chaos/krkn/issues/125)
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -40,4 +40,4 @@ The security team currently consists of the [Maintainers of Krkn](https://github

 ## Process and Supported Releases

-The Krkn security team will investigate and provide a fix in a timely mannner depending on the severity. The fix will be included in the new release of Krkn and details will be included in the release notes.
+The Krkn security team will investigate and provide a fix in a timely manner depending on the severity. The fix will be included in the new release of Krkn and details will be included in the release notes.
--- a/config/cerberus.yaml
+++ b/config/cerberus.yaml
@@ -39,7 +39,7 @@ cerberus:
        Sunday:
    slack_team_alias:                                    # The slack team alias to be tagged while reporting failures in the slack channel when no watcher is assigned

-    custom_checks:                                       # Relative paths of files conataining additional user defined checks
+    custom_checks:                                       # Relative paths of files containing additional user defined checks

 tunings:
    timeout: 3                                          # Number of seconds before requests fail
--- a/config/config.yaml
+++ b/config/config.yaml
@@ -50,13 +50,15 @@ kraken:
       - network_chaos_ng_scenarios:
               - scenarios/kube/pod-network-filter.yml
               - scenarios/kube/node-network-filter.yml
+               - scenarios/kube/node-network-chaos.yml
+               - scenarios/kube/pod-network-chaos.yml
       -  kubevirt_vm_outage:
              - scenarios/kubevirt/kubevirt-vm-outage.yaml

 cerberus:
    cerberus_enabled: False                                # Enable it when cerberus is previously installed
    cerberus_url:                                          # When cerberus_enabled is set to True, provide the url where cerberus publishes go/no-go signal
-    check_applicaton_routes: False                         # When enabled will look for application unavailability using the routes specified in the cerberus config and fails the run
+    check_application_routes: False                         # When enabled will look for application unavailability using the routes specified in the cerberus config and fails the run

 performance_monitoring:
    prometheus_url: ''                                    # The prometheus url/route is automatically obtained in case of OpenShift, please set it when the distribution is Kubernetes.
@@ -93,7 +95,7 @@ telemetry:
    prometheus_pod_name: ""                                 # name of the prometheus pod (if distribution is kubernetes)
    full_prometheus_backup: False                           # if is set to False only the /prometheus/wal folder will be downloaded.
    backup_threads: 5                                       # number of telemetry download/upload threads
-    archive_path: /tmp                                      # local path where the archive files will be temporarly stored
+    archive_path: /tmp                                      # local path where the archive files will be temporarily stored
    max_retries: 0                                          # maximum number of upload retries (if 0 will retry forever)
    run_tag: ''                                             # if set, this will be appended to the run folder in the bucket (useful to group the runs)
    archive_size: 500000
@@ -126,4 +128,6 @@ kubevirt_checks:                                            # Utilizing virt che
    name:                                                   # Regex Name style of VMI's to watch, optional, will watch all VMI names in the namespace if left blank
    only_failures: False                                    # Boolean of whether to show all VMI's failures and successful ssh connection (False), or only failure status' (True) 
    disconnected: False                                     # Boolean of how to try to connect to the VMIs; if True will use the ip_address to try ssh from within a node, if false will use the name and uses virtctl to try to connect; Default is False
-    ssh_node: ""                                            # If set, will be a backup way to ssh to a node. Will want to set to a node that isn't targeted in chaos
+    ssh_node: ""                                            # If set, will be a backup way to ssh to a node. Will want to set to a node that isn't targeted in chaos
+    node_names: ""                                          # If set, only VMIs on the given node name(s) are tracked
+    exit_on_failure:                                        # If True, the run returns a failure when VMIs are still failing after chaos; values can be True/False
--- a/config/config_kind.yaml
+++ b/config/config_kind.yaml
@@ -13,7 +13,7 @@ kraken:
 cerberus:
    cerberus_enabled: False                                # Enable it when cerberus is previously installed
    cerberus_url:                                          # When cerberus_enabled is set to True, provide the url where cerberus publishes go/no-go signal
-    check_applicaton_routes: False                         # When enabled will look for application unavailability using the routes specified in the cerberus config and fails the run
+    check_application_routes: False                         # When enabled will look for application unavailability using the routes specified in the cerberus config and fails the run

 performance_monitoring:
    prometheus_url:                                       # The prometheus url/route is automatically obtained in case of OpenShift, please set it when the distribution is Kubernetes.
@@ -32,7 +32,7 @@ tunings:

 telemetry:
    enabled: False                                         # enable/disables the telemetry collection feature
-    archive_path: /tmp                                     # local path where the archive files will be temporarly stored
+    archive_path: /tmp                                     # local path where the archive files will be temporarily stored
    events_backup: False                                   # enables/disables cluster events collection
    logs_backup: False

--- a/config/config_kubernetes.yaml
+++ b/config/config_kubernetes.yaml
@@ -14,7 +14,7 @@ kraken:
 cerberus:
    cerberus_enabled: False                                # Enable it when cerberus is previously installed
    cerberus_url:                                          # When cerberus_enabled is set to True, provide the url where cerberus publishes go/no-go signal
-    check_applicaton_routes: False                         # When enabled will look for application unavailability using the routes specified in the cerberus config and fails the run
+    check_application_routes: False                         # When enabled will look for application unavailability using the routes specified in the cerberus config and fails the run

 performance_monitoring:
    prometheus_url:                                       # The prometheus url/route is automatically obtained in case of OpenShift, please set it when the distribution is Kubernetes.
--- a/config/config_performance.yaml
+++ b/config/config_performance.yaml
@@ -35,7 +35,7 @@ kraken:
 cerberus:
    cerberus_enabled: True                                # Enable it when cerberus is previously installed
    cerberus_url: http://0.0.0.0:8080                     # When cerberus_enabled is set to True, provide the url where cerberus publishes go/no-go signal
-    check_applicaton_routes: False                        # When enabled will look for application unavailability using the routes specified in the cerberus config and fails the run
+    check_application_routes: False                        # When enabled will look for application unavailability using the routes specified in the cerberus config and fails the run

 performance_monitoring:
    deploy_dashboards: True                               # Install a mutable grafana and load the performance dashboards. Enable this only when running on OpenShift
@@ -61,7 +61,7 @@ telemetry:
    prometheus_backup: True                                 # enables/disables prometheus data collection
    full_prometheus_backup: False                           # if is set to False only the /prometheus/wal folder will be downloaded.
    backup_threads: 5                                       # number of telemetry download/upload threads
-    archive_path: /tmp                                      # local path where the archive files will be temporarly stored
+    archive_path: /tmp                                      # local path where the archive files will be temporarily stored
    max_retries: 0                                          # maximum number of upload retries (if 0 will retry forever)
    run_tag: ''                                             # if set, this will be appended to the run folder in the bucket (useful to group the runs)
    archive_size: 500000                                     # the size of the prometheus data archive size in KB. The lower the size of archive is
--- a/containers/Dockerfile.template
+++ b/containers/Dockerfile.template
@@ -1,22 +1,35 @@
 # oc build
-FROM golang:1.23.1 AS oc-build
+FROM golang:1.24.9 AS oc-build
 RUN apt-get update && apt-get install -y --no-install-recommends libkrb5-dev
 WORKDIR /tmp
+# oc build
 RUN git clone --branch release-4.18 https://github.com/openshift/oc.git
 WORKDIR /tmp/oc
-RUN go mod edit -go 1.23.1 &&\
-    go get github.com/moby/buildkit@v0.12.5 &&\
-    go get github.com/containerd/containerd@v1.7.11&&\
-    go get github.com/docker/docker@v25.0.6&&\
-    go get github.com/opencontainers/runc@v1.1.14&&\
-    go get github.com/go-git/go-git/v5@v5.13.0&&\
-    go get golang.org/x/net@v0.38.0&&\
-    go get github.com/containerd/containerd@v1.7.27&&\
-    go get golang.org/x/oauth2@v0.27.0&&\
-    go get golang.org/x/crypto@v0.35.0&&\
+RUN go mod edit -go 1.24.9 &&\
+    go mod edit -require github.com/moby/buildkit@v0.12.5 &&\
+    go mod edit -require github.com/containerd/containerd@v1.7.29&&\
+    go mod edit -require github.com/docker/docker@v27.5.1+incompatible&&\
+    go mod edit -require github.com/opencontainers/runc@v1.2.8&&\
+    go mod edit -require github.com/go-git/go-git/v5@v5.13.0&&\
+    go mod edit -require github.com/opencontainers/selinux@v1.13.0&&\
+    go mod edit -require github.com/ulikunitz/xz@v0.5.15&&\
+    go mod edit -require golang.org/x/net@v0.38.0&&\
+    go mod edit -require github.com/containerd/containerd@v1.7.27&&\
+    go mod edit -require golang.org/x/oauth2@v0.27.0&&\
+    go mod edit -require golang.org/x/crypto@v0.35.0&&\
+    go mod edit -replace github.com/containerd/containerd@v1.7.27=github.com/containerd/containerd@v1.7.29&&\
    go mod tidy && go mod vendor
+
 RUN make GO_REQUIRED_MIN_VERSION:= oc

+# virtctl build
+WORKDIR /tmp
+RUN git clone https://github.com/kubevirt/kubevirt.git
+WORKDIR /tmp/kubevirt
+RUN go mod edit -go 1.24.9 &&\
+    go work use &&\
+    go build -o virtctl ./cmd/virtctl/
+
 FROM fedora:40
 ARG PR_NUMBER
 ARG TAG
@@ -28,16 +41,12 @@ ENV KUBECONFIG /home/krkn/.kube/config

 # This overwrites any existing configuration in /etc/yum.repos.d/kubernetes.repo
 RUN dnf update && dnf install -y --setopt=install_weak_deps=False \
-    git python39 jq yq gettext wget which ipmitool openssh-server &&\
+    git python3.11 jq yq gettext wget which ipmitool openssh-server &&\
    dnf clean all

-# Virtctl 
-RUN export VERSION=$(curl https://storage.googleapis.com/kubevirt-prow/release/kubevirt/kubevirt/stable.txt) && \
-    wget https://github.com/kubevirt/kubevirt/releases/download/${VERSION}/virtctl-${VERSION}-linux-amd64 && \
-    chmod +x virtctl-${VERSION}-linux-amd64 && sudo mv virtctl-${VERSION}-linux-amd64 /usr/local/bin/virtctl
-
 # copy oc client binary from oc-build image
 COPY --from=oc-build /tmp/oc/oc /usr/bin/oc
+COPY --from=oc-build /tmp/kubevirt/virtctl /usr/bin/virtctl

 # krkn build
 RUN git clone https://github.com/krkn-chaos/krkn.git /home/krkn/kraken && \
@@ -54,10 +63,15 @@ RUN if [ -n "$PR_NUMBER" ]; then git fetch origin pull/${PR_NUMBER}/head:pr-${PR
 # if it is a TAG trigger checkout the tag
 RUN if [ -n "$TAG" ]; then git checkout "$TAG";fi

-RUN python3.9 -m ensurepip --upgrade --default-pip
-RUN python3.9 -m pip install --upgrade pip setuptools==78.1.1
-RUN pip3.9 install -r requirements.txt
-RUN pip3.9 install jsonschema
+RUN python3.11 -m ensurepip --upgrade --default-pip
+RUN python3.11 -m pip install --upgrade pip setuptools==78.1.1
+
+# remove the vulnerable bundled versions of setuptools and pip
+RUN rm -rf "$(pip cache dir)"
+RUN rm -rf /tmp/*
+RUN rm -rf /usr/local/lib/python3.11/ensurepip/_bundled
+RUN pip3.11 install -r requirements.txt
+RUN pip3.11 install jsonschema

 LABEL krknctl.title.global="Krkn Base Image"
 LABEL krknctl.description.global="This is the krkn base image."
--- a/containers/krknctl-input.json
+++ b/containers/krknctl-input.json
@@ -85,6 +85,24 @@
    "default": "False",
    "required": "false"
  },
+  {
+    "name": "prometheus-url",
+    "short_description": "Prometheus URL",
+    "description": "Prometheus URL to use when running on Kubernetes",
+    "variable": "PROMETHEUS_URL",
+    "type": "string",
+    "default": "",
+    "required": "false"
+  },
+  {
+    "name": "prometheus-token",
+    "short_description": "Prometheus bearer token",
+    "description": "Bearer token used to authenticate to the Prometheus URL",
+    "variable": "PROMETHEUS_TOKEN",
+    "type": "string",
+    "default": "",
+    "required": "false"
+  },
  {
    "name": "uuid",
    "short_description": "Sets krkn run uuid",
@@ -501,6 +519,26 @@
    "default": "",
    "required": "false"
  },
+  {
+    "name": "kubevirt-exit-on-failure",
+    "short_description": "Fail run if KubeVirt VMs are failing at the end",
+    "description": "Fails the run if any KubeVirt VMs still report a failed status at the end of the run",
+    "variable": "KUBE_VIRT_EXIT_ON_FAIL",
+    "type": "enum",
+    "allowed_values": "True,False,true,false",
+    "separator": ",",
+    "default": "False",
+    "required": "false"
+  },
+  {
+    "name": "kubevirt-node-name",
+    "short_description": "KubeVirt node to filter VMs on",
+    "description": "Only track KubeVirt VMs on the given node name",
+    "variable": "KUBE_VIRT_NODE_NAME",
+    "type": "string",
+    "default": "",
+    "required": "false"
+  },
  {
    "name": "krkn-debug",
    "short_description": "Krkn debug mode",
--- a/kind-config.yml
+++ b/kind-config.yml
@@ -3,10 +3,16 @@ apiVersion: kind.x-k8s.io/v1alpha4
 nodes:
  - role: control-plane
    extraPortMappings:
+      - containerPort: 30000
+        hostPort: 9090
+      - containerPort: 32766
+        hostPort: 9200
      - containerPort: 30036
        hostPort: 8888
      - containerPort: 30037
        hostPort: 8889
+      - containerPort: 30080
+        hostPort: 30080
  - role: control-plane
  - role: control-plane
  - role: worker
--- a/utils/chaos_ai/src/init.py
+++ b/utils/chaos_ai/src/init.py
--- a/krkn/cerberus/setup.py
+++ b/krkn/cerberus/setup.py
@@ -14,7 +14,7 @@ def get_status(config, start_time, end_time):
    if config["cerberus"]["cerberus_enabled"]:
        cerberus_url = config["cerberus"]["cerberus_url"]
        check_application_routes = \
-            config["cerberus"]["check_applicaton_routes"]
+            config["cerberus"]["check_application_routes"]
        if not cerberus_url:
            logging.error(
                "url where Cerberus publishes True/False signal "
--- a/krkn/invoke/command.py
+++ b/krkn/invoke/command.py
@@ -15,7 +15,7 @@ def invoke(command, timeout=None):


 # Invokes a given command and returns the stdout
-def invoke_no_exit(command, timeout=None):
+def invoke_no_exit(command, timeout=15):
    output = ""
    try:
        output = subprocess.check_output(command, shell=True, universal_newlines=True, timeout=timeout, stderr=subprocess.DEVNULL)
--- a/krkn/prometheus/client.py
+++ b/krkn/prometheus/client.py
@@ -214,7 +214,7 @@ def metrics(
                    end_time=datetime.datetime.fromtimestamp(end_time), granularity=30
                )
            else: 
-                logging.info('didnt match keys')
+                logging.info("didn't match keys")
                continue
            
            for returned_metric in metrics_result:
--- a/krkn/rollback/command.py
+++ b/krkn/rollback/command.py
@@ -3,7 +3,7 @@ import logging
 from typing import Optional, TYPE_CHECKING

 from krkn.rollback.config import RollbackConfig
-from krkn.rollback.handler import execute_rollback_version_files, cleanup_rollback_version_files
+from krkn.rollback.handler import execute_rollback_version_files



@@ -96,24 +96,16 @@ def execute_rollback(telemetry_ocp: "KrknTelemetryOpenshift", run_uuid: Optional
    :return: Exit code (0 for success, 1 for error)
    """
    logging.info("Executing rollback version files")
-    
-    if not run_uuid:
-        logging.error("run_uuid is required for execute-rollback command")
-        return 1
-    
-    if not scenario_type:
-        logging.warning("scenario_type is not specified, executing all scenarios in rollback directory")
-    
+    logging.info(f"Executing rollback for run_uuid={run_uuid or '*'}, scenario_type={scenario_type or '*'}")
+
    try:
        # Execute rollback version files
-        logging.info(f"Executing rollback for run_uuid={run_uuid}, scenario_type={scenario_type or '*'}")
-        execute_rollback_version_files(telemetry_ocp, run_uuid, scenario_type)
-        
-        # If execution was successful, cleanup the version files
-        logging.info("Rollback execution completed successfully, cleaning up version files")
-        cleanup_rollback_version_files(run_uuid, scenario_type)
-        
-        logging.info("Rollback execution and cleanup completed successfully")
+        execute_rollback_version_files(
+            telemetry_ocp,
+            run_uuid,
+            scenario_type,
+            ignore_auto_rollback_config=True
+        )
        return 0
        
    except Exception as e:
--- a/krkn/rollback/config.py
+++ b/krkn/rollback/config.py
@@ -108,7 +108,76 @@ class RollbackConfig(metaclass=SingletonMeta):
        return f"{cls().versions_directory}/{rollback_context}"
    
    @classmethod
-    def search_rollback_version_files(cls, run_uuid: str, scenario_type: str | None = None) -> list[str]:
+    def is_rollback_version_file_format(cls, file_name: str, expected_scenario_type: str | None = None) -> bool:
+        """
+        Validate the format of a rollback version file name.
+
+        Expected format: <scenario_type>_<timestamp>_<hash_suffix>.py
+        where:
+            - scenario_type: string (can include underscores)
+            - timestamp: integer (nanoseconds since epoch)
+            - hash_suffix: alphanumeric string (length 8)
+            - .py: file extension
+
+        :param file_name: The name of the file to validate.
+        :param expected_scenario_type: The expected scenario type (if any) to validate against.
+        :return: True if the file name matches the expected format, False otherwise.
+        """
+        if not file_name.endswith(".py"):
+            return False
+
+        parts = file_name.split("_")
+        if len(parts) < 3:
+            return False
+
+        scenario_type = "_".join(parts[:-2])
+        timestamp_str = parts[-2]
+        hash_suffix_with_ext = parts[-1]
+        hash_suffix = hash_suffix_with_ext[:-3]
+
+        if expected_scenario_type and scenario_type != expected_scenario_type:
+            return False
+
+        if not timestamp_str.isdigit():
+            return False
+
+        if len(hash_suffix) != 8 or not hash_suffix.isalnum():
+            return False
+
+        return True
+    
+    @classmethod
+    def is_rollback_context_directory_format(cls, directory_name: str, expected_run_uuid: str | None = None) -> bool:
+        """
+        Validate the format of a rollback context directory name.
+
+        Expected format: <timestamp>-<run_uuid>
+        where:
+            - timestamp: integer (nanoseconds since epoch)
+            - run_uuid: alphanumeric string
+
+        :param directory_name: The name of the directory to validate.
+        :param expected_run_uuid: The expected run UUID (if any) to validate against.
+        :return: True if the directory name matches the expected format, False otherwise.
+        """
+        parts = directory_name.split("-", 1)
+        if len(parts) != 2:
+            return False
+
+        timestamp_str, run_uuid = parts
+
+        # Validate timestamp is numeric
+        if not timestamp_str.isdigit():
+            return False
+
+        # Validate run_uuid
+        if expected_run_uuid and expected_run_uuid != run_uuid:
+            return False
+
+        return True
+
+    @classmethod
+    def search_rollback_version_files(cls, run_uuid: str | None = None, scenario_type: str | None = None) -> list[str]:
        """
        Search for rollback version files based on run_uuid and scenario_type.

@@ -123,34 +192,35 @@ class RollbackConfig(metaclass=SingletonMeta):
        if not os.path.exists(cls().versions_directory):
            return []

-        rollback_context_directories = [
-            dirname for dirname in os.listdir(cls().versions_directory) if run_uuid in dirname
-        ]
+        rollback_context_directories = []
+        for dir_name in os.listdir(cls().versions_directory):
+            if cls.is_rollback_context_directory_format(dir_name, run_uuid):
+                rollback_context_directories.append(dir_name)
+            else:
+                logger.warning(f"Directory {dir_name} does not match expected pattern of <timestamp>-<run_uuid>")
+
        if not rollback_context_directories:
            logger.warning(f"No rollback context directories found for run UUID {run_uuid}")
            return []

-        if len(rollback_context_directories) > 1:
-            logger.warning(
-                f"Expected one directory for run UUID {run_uuid}, found: {rollback_context_directories}"
-            )
-
-        rollback_context_directory = rollback_context_directories[0]

        version_files = []
-        scenario_rollback_versions_directory = os.path.join(
-            cls().versions_directory, rollback_context_directory
-        )
-        for file in os.listdir(scenario_rollback_versions_directory):
-            # assert all files start with scenario_type and end with .py
-            if file.endswith(".py") and (scenario_type is None or file.startswith(scenario_type)):
-                version_files.append(
-                    os.path.join(scenario_rollback_versions_directory, file)
-                )
-            else:
-                logger.warning(
-                    f"File {file} does not match expected pattern for scenario type {scenario_type}"
-                )
+        for rollback_context_dir in rollback_context_directories:
+            rollback_context_dir = os.path.join(cls().versions_directory, rollback_context_dir)
+
+            for file in os.listdir(rollback_context_dir):
+                # Skip known non-rollback files/directories
+                if file == "__pycache__" or file.endswith(".executed"):
+                    continue
+
+                if cls.is_rollback_version_file_format(file, scenario_type):
+                    version_files.append(
+                        os.path.join(rollback_context_dir, file)
+                    )
+                else:
+                    logger.warning(
+                        f"File {file} does not match expected pattern of <{scenario_type or '*'}>_<timestamp>_<hash_suffix>.py"
+                    )
        return version_files

@dataclass(frozen=True)
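The two validators added to `RollbackConfig` are pure string checks, so their logic can be exercised outside the class. The sketch below is a minimal standalone mirror of that logic. The function names `is_version_file_name` and `is_context_dir_name`, along with the sample file and directory names, are hypothetical and used for illustration only.

```python
from typing import Optional


def is_version_file_name(file_name: str, expected_scenario_type: Optional[str] = None) -> bool:
    """Check the <scenario_type>_<timestamp>_<hash_suffix>.py format."""
    if not file_name.endswith(".py"):
        return False
    parts = file_name.split("_")
    if len(parts) < 3:
        return False
    # scenario_type may itself contain underscores, so only the last two
    # underscore-separated parts are the timestamp and the hash suffix.
    scenario_type = "_".join(parts[:-2])
    timestamp_str = parts[-2]
    hash_suffix = parts[-1][: -len(".py")]
    if expected_scenario_type and scenario_type != expected_scenario_type:
        return False
    return timestamp_str.isdigit() and len(hash_suffix) == 8 and hash_suffix.isalnum()


def is_context_dir_name(directory_name: str, expected_run_uuid: Optional[str] = None) -> bool:
    """Check the <timestamp>-<run_uuid> format.

    Only the first '-' is a separator, because the run UUID itself contains dashes.
    """
    timestamp_str, sep, run_uuid = directory_name.partition("-")
    if not sep or not timestamp_str.isdigit():
        return False
    return not expected_run_uuid or run_uuid == expected_run_uuid
```

Because an executed file ends in `.py.executed`, it fails the `.py` suffix check even without the explicit skip in `search_rollback_version_files`.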
--- a/krkn/rollback/handler.py
+++ b/krkn/rollback/handler.py
@@ -117,23 +117,32 @@ def _parse_rollback_module(version_file_path: str) -> tuple[RollbackCallable, Ro
    return rollback_callable, rollback_content


-def execute_rollback_version_files(telemetry_ocp: "KrknTelemetryOpenshift", run_uuid: str, scenario_type: str | None = None):
+def execute_rollback_version_files(
+    telemetry_ocp: "KrknTelemetryOpenshift",
+    run_uuid: str | None = None,
+    scenario_type: str | None = None,
+    ignore_auto_rollback_config: bool = False
+):
    """
    Execute rollback version files for the given run_uuid and scenario_type.
    This function is called when a signal is received to perform rollback operations.
    
    :param run_uuid: Unique identifier for the run.
    :param scenario_type: Type of the scenario being rolled back.
+    :param ignore_auto_rollback_config: If True, execute the rollback even when auto rollback is disabled in the configuration. Set to True for manual execute-rollback calls.
    """
-    
+    if not ignore_auto_rollback_config and RollbackConfig().auto is False:
+        logger.warning(f"Auto rollback is disabled, skipping execution for run_uuid={run_uuid or '*'}, scenario_type={scenario_type or '*'}")
+        return
+
    # Get the rollback versions directory
    version_files = RollbackConfig.search_rollback_version_files(run_uuid, scenario_type)
    if not version_files:
-        logger.warning(f"Skip execution for run_uuid={run_uuid}, scenario_type={scenario_type or '*'}")
+        logger.warning(f"Skip execution for run_uuid={run_uuid or '*'}, scenario_type={scenario_type or '*'}")
        return

    # Execute all version files in the directory
-    logger.info(f"Executing rollback version files for run_uuid={run_uuid}, scenario_type={scenario_type or '*'}")
+    logger.info(f"Executing rollback version files for run_uuid={run_uuid or '*'}, scenario_type={scenario_type or '*'}")
    for version_file in version_files:
        try:
            logger.info(f"Executing rollback version file: {version_file}")
@@ -144,28 +153,37 @@ def execute_rollback_version_files(telemetry_ocp: "KrknTelemetryOpenshift", run_
            logger.info('Executing rollback callable...')
            rollback_callable(rollback_content, telemetry_ocp)
            logger.info('Rollback completed.')
-            
-            logger.info(f"Executed {version_file} successfully.")
+            success = True
        except Exception as e:
+            success = False
            logger.error(f"Failed to execute rollback version file {version_file}: {e}")
            raise

+        # Rename the version file with .executed suffix if successful
+        if success:
+            try:
+                executed_file = f"{version_file}.executed"
+                os.rename(version_file, executed_file)
+                logger.info(f"Renamed {version_file} to {executed_file} successfully.")
+            except Exception as e:
+                logger.error(f"Failed to rename rollback version file {version_file}: {e}")
+                raise

 def cleanup_rollback_version_files(run_uuid: str, scenario_type: str):
    """
    Cleanup rollback version files for the given run_uuid and scenario_type.
-    This function is called to remove the rollback version files after execution.
+    This function is called to remove the rollback version files after successful scenario execution in run_scenarios.
    
    :param run_uuid: Unique identifier for the run.
    :param scenario_type: Type of the scenario being rolled back.
    """
-    
+
    # Get the rollback versions directory
    version_files = RollbackConfig.search_rollback_version_files(run_uuid, scenario_type)
    if not version_files:
        logger.warning(f"Skip cleanup for run_uuid={run_uuid}, scenario_type={scenario_type or '*'}")
        return
-    
+
    # Remove all version files in the directory
    logger.info(f"Cleaning up rollback version files for run_uuid={run_uuid}, scenario_type={scenario_type}")
    for version_file in version_files:
@@ -176,7 +194,6 @@ def cleanup_rollback_version_files(run_uuid: str, scenario_type: str):
            logger.error(f"Failed to remove rollback version file {version_file}: {e}")
            raise

-
 class RollbackHandler:
    def __init__(
        self,
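Successful version files are now renamed instead of deleted. Since `search_rollback_version_files` skips names ending in `.executed`, a later `execute-rollback` run will not replay them. Below is a minimal sketch of the rename-on-success step. `run_and_mark_executed` and its `run` callback are hypothetical stand-ins for parsing the module and invoking its rollback callable.

```python
import os
from typing import Callable


def run_and_mark_executed(version_file: str, run: Callable[[str], None]) -> str:
    """Run a rollback version file, then rename it with an .executed suffix.

    If `run` raises, the exception propagates before the rename, so the file
    keeps its original name and a later execute-rollback can retry it.
    """
    run(version_file)
    executed_file = f"{version_file}.executed"
    os.rename(version_file, executed_file)
    return executed_file
```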
--- a/krkn/scenario_plugins/abstract_scenario_plugin.py
+++ b/krkn/scenario_plugins/abstract_scenario_plugin.py
@@ -115,14 +115,15 @@ class AbstractScenarioPlugin(ABC):
                    )
                    return_value = 1

-            # execute rollback files based on the return value
-            if return_value != 0:
+            if return_value == 0:
+                cleanup_rollback_version_files(
+                    run_uuid, scenario_telemetry.scenario_type
+                )
+            else:
+                # execute rollback files based on the return value
                execute_rollback_version_files(
                    telemetry, run_uuid, scenario_telemetry.scenario_type
                )
-            cleanup_rollback_version_files(
-                run_uuid, scenario_telemetry.scenario_type
-            )
            scenario_telemetry.exit_status = return_value
            scenario_telemetry.end_timestamp = time.time()
            utils.collect_and_put_ocp_logs(
@@ -145,7 +146,7 @@ class AbstractScenarioPlugin(ABC):
            if scenario_telemetry.exit_status != 0:
                failed_scenarios.append(scenario_config)
            scenario_telemetries.append(scenario_telemetry)
-            logging.info(f"wating {wait_duration} before running the next scenario")
+            logging.info(f"waiting {wait_duration} before running the next scenario")
            time.sleep(wait_duration)
        return failed_scenarios, scenario_telemetries

--- a/krkn/scenario_plugins/application_outage/application_outage_scenario_plugin.py
+++ b/krkn/scenario_plugins/application_outage/application_outage_scenario_plugin.py
@@ -34,6 +34,21 @@ class ApplicationOutageScenarioPlugin(AbstractScenarioPlugin):
                )
                namespace = get_yaml_item_value(scenario_config, "namespace", "")
                duration = get_yaml_item_value(scenario_config, "duration", 60)
+                exclude_label = get_yaml_item_value(
+                    scenario_config, "exclude_label", None
+                )
+                match_expressions = self._build_exclude_expressions(exclude_label)
+                if match_expressions:
+                    # Log the format being used for better clarity
+                    format_type = "dict" if isinstance(exclude_label, dict) else "list" if isinstance(exclude_label, list) else "string"
+                    logging.info(
+                        "Excluding pods with labels (%s format): %s",
+                        format_type,
+                        ", ".join(
+                            f"{expr['key']} NOT IN {expr['values']}"
+                            for expr in match_expressions
+                        ),
+                    )

                start_time = int(time.time())
                policy_name = f"krkn-deny-{get_random_string(5)}"
@@ -43,18 +58,30 @@ class ApplicationOutageScenarioPlugin(AbstractScenarioPlugin):
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        metadata:
-          name: """
-                    + policy_name
-                    + """
+          name: {{ policy_name }}
        spec:
          podSelector:
            matchLabels: {{ pod_selector }}
+{% if match_expressions %}
+            matchExpressions:
+{% for expression in match_expressions %}
+              - key: {{ expression["key"] }}
+                operator: NotIn
+                values:
+{% for value in expression["values"] %}
+                  - {{ value }}
+{% endfor %}
+{% endfor %}
+{% endif %}
          policyTypes: {{ traffic_type }}
        """
                )
                t = Template(network_policy_template)
                rendered_spec = t.render(
-                    pod_selector=pod_selector, traffic_type=traffic_type
+                    pod_selector=pod_selector,
+                    traffic_type=traffic_type,
+                    match_expressions=match_expressions,
+                    policy_name=policy_name,
                )
                yaml_spec = yaml.safe_load(rendered_spec)
                # Block the traffic by creating network policy
@@ -122,3 +149,63 @@ class ApplicationOutageScenarioPlugin(AbstractScenarioPlugin):

    def get_scenario_types(self) -> list[str]:
        return ["application_outages_scenarios"]
+
+    @staticmethod
+    def _build_exclude_expressions(exclude_label) -> list[dict]:
+        """
+        Build match expressions for NetworkPolicy from exclude_label.
+        
+        Supports multiple formats:
+        - Dict format (preferred, similar to pod_selector): {key1: value1, key2: [value2, value3]}
+          Example: {tier: "gold", env: ["prod", "staging"]}
+        - String format: "key1=value1,key2=value2" or "key1=value1|value2"
+          Example: "tier=gold,env=prod" or "tier=gold|platinum"
+        - List format (list of strings): ["key1=value1", "key2=value2"]
+          Example: ["tier=gold", "env=prod"]
+          Note: List elements must be strings in "key=value" format.
+        
+        :param exclude_label: Can be dict, string, list of strings, or None
+        :return: List of match expression dictionaries
+        """
+        expressions: list[dict] = []
+
+        if not exclude_label:
+            return expressions
+
+        def _append_expr(key: str, values):
+            if not key or values is None:
+                return
+            if not isinstance(values, list):
+                values = [values]
+            cleaned_values = [str(v).strip() for v in values if str(v).strip()]
+            if cleaned_values:
+                expressions.append({"key": key.strip(), "values": cleaned_values})
+
+        if isinstance(exclude_label, dict):
+            for k, v in exclude_label.items():
+                _append_expr(str(k), v)
+            return expressions
+
+        if isinstance(exclude_label, list):
+            selectors = exclude_label
+        else:
+            selectors = [sel.strip() for sel in str(exclude_label).split(",")]
+
+        for selector in selectors:
+            if not selector:
+                continue
+            if "=" not in selector:
+                logging.warning(
+                    "exclude_label entry '%s' is invalid, expected key=value format",
+                    selector,
+                )
+                continue
+            key, value = selector.split("=", 1)
+            value_items = (
+                [item.strip() for item in value.split("|") if item.strip()]
+                if value
+                else []
+            )
+            _append_expr(key, value_items or value)
+
+        return expressions
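The Jinja template adds one `NotIn` requirement per exclude expression next to `matchLabels`. Kubernetes ANDs `matchLabels` and `matchExpressions`, so a pod whose label value appears in an exclusion list is not selected, and its traffic is left alone. The sketch below is a minimal equivalent that builds the same structure as a plain dict without Jinja or YAML. `build_network_policy` and the sample policy name are hypothetical.

```python
from typing import Any


def build_network_policy(
    policy_name: str,
    pod_selector: dict[str, str],
    traffic_type: list[str],
    match_expressions: list[dict[str, Any]],
) -> dict[str, Any]:
    """Return the deny NetworkPolicy the template renders, as a dict."""
    selector: dict[str, Any] = {"matchLabels": dict(pod_selector)}
    if match_expressions:
        # Each exclusion becomes a NotIn requirement, ANDed with matchLabels.
        selector["matchExpressions"] = [
            {"key": expr["key"], "operator": "NotIn", "values": list(expr["values"])}
            for expr in match_expressions
        ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": policy_name},
        "spec": {"podSelector": selector, "policyTypes": list(traffic_type)},
    }
```

For example, `_build_exclude_expressions` parses `exclude_label: "tier=gold|platinum"` into `[{"key": "tier", "values": ["gold", "platinum"]}]`, which is the input used in the check below.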
--- a/krkn/scenario_plugins/container/container_scenario_plugin.py
+++ b/krkn/scenario_plugins/container/container_scenario_plugin.py
@@ -1,6 +1,7 @@
 import logging
 import random
 import time
+import traceback
 from asyncio import Future
 import yaml
 from krkn_lib.k8s import KrknKubernetes
@@ -41,6 +42,7 @@ class ContainerScenarioPlugin(AbstractScenarioPlugin):
                        logging.info("ContainerScenarioPlugin failed with unrecovered containers")
                        return 1
        except (RuntimeError, Exception) as e:
+            logging.error("Stack trace:\n%s", traceback.format_exc())
            logging.error("ContainerScenarioPlugin exiting due to Exception %s" % e)
            return 1
        else:
@@ -50,7 +52,6 @@ class ContainerScenarioPlugin(AbstractScenarioPlugin):
        return ["container_scenarios"]

    def start_monitoring(self, kill_scenario: dict, lib_telemetry: KrknTelemetryOpenshift) -> Future:
-        
        namespace_pattern = f"^{kill_scenario['namespace']}$"
        label_selector = kill_scenario["label_selector"]
        recovery_time = kill_scenario["expected_recovery_time"]
@@ -70,6 +71,7 @@ class ContainerScenarioPlugin(AbstractScenarioPlugin):
        container_name = get_yaml_item_value(cont_scenario, "container_name", "")
        kill_action = get_yaml_item_value(cont_scenario, "action", 1)
        kill_count = get_yaml_item_value(cont_scenario, "count", 1)
+        exclude_label = get_yaml_item_value(cont_scenario, "exclude_label", "")
        if not isinstance(kill_action, int):
            logging.error(
                "Please make sure the action parameter defined in the "
@@ -91,7 +93,19 @@ class ContainerScenarioPlugin(AbstractScenarioPlugin):
                pods = kubecli.get_all_pods(label_selector)
            else:
                # Only returns pod names
-                pods = kubecli.list_pods(namespace, label_selector)
+                # Use list_pods with exclude_label parameter to exclude pods
+                if exclude_label:
+                    logging.info(
+                        "Using exclude_label '%s' to exclude pods from container scenario %s in namespace %s",
+                        exclude_label,
+                        scenario_name,
+                        namespace,
+                    )
+                pods = kubecli.list_pods(
+                    namespace=namespace,
+                    label_selector=label_selector,
+                    exclude_label=exclude_label if exclude_label else None
+                )
        else:
            if namespace == "*":
                logging.error(
@@ -102,6 +116,7 @@ class ContainerScenarioPlugin(AbstractScenarioPlugin):
                # sys.exit(1)
                raise RuntimeError()
            pods = pod_names
+
        # get container and pod name
        container_pod_list = []
        for pod in pods:
@@ -218,4 +233,5 @@ class ContainerScenarioPlugin(AbstractScenarioPlugin):
            timer += 5
            logging.info("Waiting 5 seconds for containers to become ready")
            time.sleep(5)
+
        return killed_container_list
--- a/krkn/scenario_plugins/hogs/hogs_scenario_plugin.py
+++ b/krkn/scenario_plugins/hogs/hogs_scenario_plugin.py
@@ -53,7 +53,7 @@ class HogsScenarioPlugin(AbstractScenarioPlugin):
                    raise Exception("no available nodes to schedule workload")

                if not has_selector:
-                    available_nodes = [available_nodes[random.randint(0, len(available_nodes))]]
+                    available_nodes = [available_nodes[random.randint(0, len(available_nodes) - 1)]]

            if scenario_config.number_of_nodes and len(available_nodes) > scenario_config.number_of_nodes:
                available_nodes = random.sample(available_nodes, scenario_config.number_of_nodes)
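`random.randint(a, b)` is inclusive at both ends, so `randint(0, len(available_nodes))` could return an out-of-range index and raise `IndexError`. That is why the fix subtracts 1. `random.choice` is an equivalent form that avoids index arithmetic. A minimal sketch follows; `pick_one_node` is a hypothetical helper, not part of the plugin.

```python
import random


def pick_one_node(available_nodes: list[str]) -> list[str]:
    """Pick a single random node, returned as a one-element list."""
    if not available_nodes:
        raise ValueError("no available nodes to schedule workload")
    # random.choice always picks an index in range(len(available_nodes)).
    return [random.choice(available_nodes)]
```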
--- a/krkn/scenario_plugins/kubevirt_vm_outage/kubevirt_vm_outage_scenario_plugin.py
+++ b/krkn/scenario_plugins/kubevirt_vm_outage/kubevirt_vm_outage_scenario_plugin.py
@@ -25,6 +25,7 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
        super().__init__(scenario_type)
        self.k8s_client = None
        self.original_vmi = None
+        self.vmis_list = []
        
    # Scenario type is handled directly in execute_scenario
    def get_scenario_types(self) -> list[str]:
@@ -54,7 +55,8 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
                    pods_status.merge(single_pods_status)
            
            scenario_telemetry.affected_pods = pods_status
-                        
+            if len(scenario_telemetry.affected_pods.unrecovered) > 0:
+                return 1
            return 0
        except Exception as e:
            logging.error(f"KubeVirt VM Outage scenario failed: {e}")
@@ -106,20 +108,20 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
        :return: The VMI object if found, None otherwise
        """
        try:
-            vmis = self.custom_object_client.list_namespaced_custom_object(
-                group="kubevirt.io",
-                version="v1",
-                namespace=namespace,
-                plural="virtualmachineinstances",
-            )
+            namespaces = self.k8s_client.list_namespaces_by_regex(namespace)
+            for namespace in namespaces:
+                vmis = self.custom_object_client.list_namespaced_custom_object(
+                    group="kubevirt.io",
+                    version="v1",
+                    namespace=namespace,
+                    plural="virtualmachineinstances",
+                )

-            vmi_list = []
-            for vmi in vmis.get("items"):
-                vmi_name = vmi.get("metadata",{}).get("name")
-                match = re.match(regex_name, vmi_name)
-                if match:
-                    vmi_list.append(vmi)
-            return vmi_list
+                for vmi in vmis.get("items"):
+                    vmi_name = vmi.get("metadata",{}).get("name")
+                    match = re.match(regex_name, vmi_name)
+                    if match:
+                        self.vmis_list.append(vmi)
        except ApiException as e:
            if e.status == 404:
                logging.warning(f"VMI {regex_name} not found in namespace {namespace}")
@@ -152,21 +154,22 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
                logging.error("vm_name parameter is required")
                return 1
            self.pods_status = PodsStatus()
-            vmis_list = self.get_vmis(vm_name,namespace)
+            self.get_vmis(vm_name,namespace)
            for _ in range(kill_count):
                
-                rand_int = random.randint(0, len(vmis_list) - 1)
-                vmi = vmis_list[rand_int]
+                rand_int = random.randint(0, len(self.vmis_list) - 1)
+                vmi = self.vmis_list[rand_int]
                    
                logging.info(f"Starting KubeVirt VM outage scenario for VM: {vm_name} in namespace: {namespace}")
                vmi_name = vmi.get("metadata").get("name")
-                if not self.validate_environment(vmi_name, namespace):
+                vmi_namespace = vmi.get("metadata").get("namespace")
+                if not self.validate_environment(vmi_name, vmi_namespace):
                    return 1
                    
-                vmi = self.get_vmi(vmi_name, namespace)
+                vmi = self.get_vmi(vmi_name, vmi_namespace)
                self.affected_pod = AffectedPod(
                    pod_name=vmi_name,
-                    namespace=namespace,
+                    namespace=vmi_namespace,
                )
                if not vmi:
                    logging.error(f"VMI {vm_name} not found in namespace {namespace}")
@@ -174,12 +177,12 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
                    
                self.original_vmi = vmi
                logging.info(f"Captured initial state of VMI: {vm_name}")
-                result = self.delete_vmi(vmi_name, namespace, disable_auto_restart)
+                result = self.delete_vmi(vmi_name, vmi_namespace, disable_auto_restart)
                if result != 0:
                    self.pods_status.unrecovered.append(self.affected_pod)
                    continue

-                result = self.wait_for_running(vmi_name,namespace, timeout)
+                result = self.wait_for_running(vmi_name,vmi_namespace, timeout)
                if result != 0:
                    self.pods_status.unrecovered.append(self.affected_pod)
                    continue
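With this change, the plugin lists VMIs in every namespace that matches the `namespace` regex. It then uses each VMI's own `metadata.namespace` for validation, deletion, and the recovery wait. The sketch below is a minimal version of that selection over in-memory data. `select_vmis` and the `vmis_by_namespace` mapping are hypothetical. The namespace filter assumes `re.match` semantics, which may differ from what `list_namespaces_by_regex` does.

```python
import re
from typing import Any


def select_vmis(
    vmis_by_namespace: dict[str, list[dict[str, Any]]],
    namespace_regex: str,
    name_regex: str,
) -> list[tuple[str, str]]:
    """Return (namespace, name) for every VMI whose namespace and name match."""
    selected = []
    for namespace, vmis in vmis_by_namespace.items():
        if not re.match(namespace_regex, namespace):
            continue
        for vmi in vmis:
            name = vmi.get("metadata", {}).get("name", "")
            if re.match(name_regex, name):
                # Keep the VMI's own namespace so later calls target it directly.
                selected.append((namespace, name))
    return selected
```

This version returns a fresh list on every call instead of appending to an instance attribute such as `self.vmis_list`, so one invocation's results do not carry over into the next.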
--- a/krkn/scenario_plugins/native/network/cerberus.py
+++ b/krkn/scenario_plugins/native/network/cerberus.py
@@ -27,7 +27,7 @@ def get_status(config, start_time, end_time):
    application_routes_status = True
    if config["cerberus"]["cerberus_enabled"]:
        cerberus_url = config["cerberus"]["cerberus_url"]
-        check_application_routes = config["cerberus"]["check_applicaton_routes"]
+        check_application_routes = config["cerberus"]["check_application_routes"]
        if not cerberus_url:
            logging.error("url where Cerberus publishes True/False signal is not provided.")
            sys.exit(1)
--- a/krkn/scenario_plugins/native/pod_network_outage/cerberus.py
+++ b/krkn/scenario_plugins/native/pod_network_outage/cerberus.py
@@ -27,7 +27,7 @@ def get_status(config, start_time, end_time):
    application_routes_status = True
    if config["cerberus"]["cerberus_enabled"]:
        cerberus_url = config["cerberus"]["cerberus_url"]
-        check_application_routes = config["cerberus"]["check_applicaton_routes"]
+        check_application_routes = config["cerberus"]["check_application_routes"]
        if not cerberus_url:
            logging.error(
                "url where Cerberus publishes True/False signal is not provided.")
--- a/krkn/scenario_plugins/native/pod_network_outage/pod_network_outage_plugin.py
+++ b/krkn/scenario_plugins/native/pod_network_outage/pod_network_outage_plugin.py
@@ -36,7 +36,7 @@ def get_test_pods(
            - pods matching the label on which network policy
              need to be applied

-        namepsace (string)
+        namespace (string)
            - namespace in which the pod is present

        kubecli (KrknKubernetes)
--- a/krkn/scenario_plugins/network_chaos_ng/models.py
+++ b/krkn/scenario_plugins/network_chaos_ng/models.py
@@ -1,5 +1,7 @@
+import re
 from dataclasses import dataclass
 from enum import Enum
+from typing import TypeVar, Optional


 class NetworkChaosScenarioType(Enum):
@@ -9,16 +11,21 @@ class NetworkChaosScenarioType(Enum):

@dataclass
 class BaseNetworkChaosConfig:
-    supported_execution = ["serial", "parallel"]
    id: str
+    image: str
    wait_duration: int
    test_duration: int
    label_selector: str
    service_account: str
+    taints: list[str]
+    namespace: str
    instance_count: int
    execution: str
-    namespace: str
-    taints: list[str]
+    supported_execution = ["serial", "parallel"]
+    interfaces: list[str]
+    target: str
+    ingress: bool
+    egress: bool

    def validate(self) -> list[str]:
        errors = []
@@ -41,12 +48,7 @@ class BaseNetworkChaosConfig:

@dataclass
 class NetworkFilterConfig(BaseNetworkChaosConfig):
-    ingress: bool
-    egress: bool
-    interfaces: list[str]
-    target: str
    ports: list[int]
-    image: str
    protocols: list[str]

    def validate(self) -> list[str]:
@@ -58,3 +60,30 @@ class NetworkFilterConfig(BaseNetworkChaosConfig):
                f"{self.protocols} contains not allowed protocols only tcp and udp is allowed"
            )
        return errors
+
+
+@dataclass
+class NetworkChaosConfig(BaseNetworkChaosConfig):
+    latency: Optional[str] = None
+    loss: Optional[str] = None
+    bandwidth: Optional[str] = None
+    force: Optional[bool] = None
+
+    def validate(self) -> list[str]:
+        errors = super().validate()
+        latency_regex = re.compile(r"^(\d+)(us|ms|s)$")
+        bandwidth_regex = re.compile(r"^(\d+)(bit|kbit|mbit|gbit|tbit)$")
+        if self.latency:
+            if not (latency_regex.match(self.latency)):
+                errors.append(
+                    "latency must be a number followed by `us` (microseconds), `ms` (milliseconds), or `s` (seconds)"
+                )
+        if self.bandwidth:
+            if not (bandwidth_regex.match(self.bandwidth)):
+                errors.append(
+                    "bandwidth must be a number followed by `bit`, `kbit`, `mbit`, `gbit`, or `tbit`"
+                )
+        if self.loss:
+            if "%" in self.loss or not self.loss.isdigit():
+                errors.append("loss must be a whole number without the `%` symbol")
+        return errors
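The `NetworkChaosConfig.validate` checks accept only an integer followed by a lowercase unit suffix. Loss must be a bare integer, because `str.isdigit()` already rejects both `"10%"` and `"0.5"`. The sketch below is a minimal standalone version of those checks. `validate_shaping` and its error strings are hypothetical. The suffixes look like the units `tc` accepts, but that mapping is an assumption here.

```python
import re
from typing import Optional

LATENCY_RE = re.compile(r"^(\d+)(us|ms|s)$")
BANDWIDTH_RE = re.compile(r"^(\d+)(bit|kbit|mbit|gbit|tbit)$")


def validate_shaping(
    latency: Optional[str] = None,
    loss: Optional[str] = None,
    bandwidth: Optional[str] = None,
) -> list[str]:
    """Return one error string per invalid value; an empty list means valid."""
    errors = []
    if latency and not LATENCY_RE.match(latency):
        errors.append(f"invalid latency {latency!r}")
    if bandwidth and not BANDWIDTH_RE.match(bandwidth):
        errors.append(f"invalid bandwidth {bandwidth!r}")
    # isdigit() rejects "10%" and "0.5", so no separate % check is needed.
    if loss and not loss.isdigit():
        errors.append(f"invalid loss {loss!r}")
    return errors
```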
--- a/krkn/scenario_plugins/network_chaos_ng/modules/abstract_network_chaos_module.py
+++ b/krkn/scenario_plugins/network_chaos_ng/modules/abstract_network_chaos_module.py
@@ -1,6 +1,7 @@
 import abc
 import logging
 import queue
+from typing import Tuple

 from krkn_lib.telemetry.ocp import KrknTelemetryOpenshift
 from krkn.scenario_plugins.network_chaos_ng.models import (
@@ -27,7 +28,7 @@ class AbstractNetworkChaosModule(abc.ABC):
        pass

    @abc.abstractmethod
-    def get_config(self) -> (NetworkChaosScenarioType, BaseNetworkChaosConfig):
+    def get_config(self) -> Tuple[NetworkChaosScenarioType, BaseNetworkChaosConfig]:
        """
        returns the common subset of settings shared by all the scenarios `BaseNetworkChaosConfig` and the type of Network
        Chaos Scenario that is running (Pod Scenario or Node Scenario)
@@ -41,6 +42,42 @@ class AbstractNetworkChaosModule(abc.ABC):

        pass

+    def get_node_targets(self, config: BaseNetworkChaosConfig):
+        if self.base_network_config.label_selector:
+            return self.kubecli.get_lib_kubernetes().list_nodes(
+                self.base_network_config.label_selector
+            )
+        else:
+            if not config.target:
+                raise Exception(
+                    "neither node selector nor node_name (target) specified, aborting."
+                )
+            node_info = self.kubecli.get_lib_kubernetes().list_nodes()
+            if config.target not in node_info:
+                raise Exception(f"node {config.target} not found, aborting")
+
+            return [config.target]
+
+    def get_pod_targets(self, config: BaseNetworkChaosConfig):
+        if not config.namespace:
+            raise Exception("namespace not specified, aborting")
+        if config.label_selector:
+            return self.kubecli.get_lib_kubernetes().list_pods(
+                config.namespace, config.label_selector
+            )
+        else:
+            if not config.target:
+                raise Exception(
+                    "neither label selector nor pod name (target) specified, aborting."
+                )
+            if not self.kubecli.get_lib_kubernetes().check_if_pod_exists(
+                config.target, config.namespace
+            ):
+                raise Exception(
+                    f"pod {config.target} not found in namespace {config.namespace}"
+                )
+            return [config.target]
+
    def __init__(
        self,
        base_network_config: BaseNetworkChaosConfig,
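`get_node_targets` and `get_pod_targets` share one precedence rule: a label selector, when set, selects the targets; otherwise an explicit `target` is required and must exist. A cluster-free sketch of that rule follows; `resolve_targets`, the in-memory `nodes` map, and the `key=value` selector parsing in `by_selector` are hypothetical stand-ins for the `krkn_lib` calls (`list_nodes`, `list_pods`, `check_if_pod_exists`).

```python
from typing import Callable, Optional


def resolve_targets(
    label_selector: Optional[str],
    target: Optional[str],
    list_by_selector: Callable[[str], list[str]],
    target_exists: Callable[[str], bool],
) -> list[str]:
    """Label selector takes precedence; otherwise the explicit target must exist."""
    if label_selector:
        return list_by_selector(label_selector)
    if not target:
        raise ValueError("neither label selector nor target specified, aborting")
    if not target_exists(target):
        raise ValueError(f"target {target} not found, aborting")
    return [target]


# Hypothetical in-memory stand-in for the cluster: node name -> labels.
nodes = {"worker-0": {"role": "worker"}, "master-0": {"role": "master"}}


def by_selector(selector: str) -> list[str]:
    # Only handles a single "key=value" selector, enough for the sketch.
    key, value = selector.split("=", 1)
    return [name for name, labels in nodes.items() if labels.get(key) == value]


print(resolve_targets("role=worker", None, by_selector, nodes.__contains__))  # ['worker-0']
print(resolve_targets(None, "master-0", by_selector, nodes.__contains__))    # ['master-0']
```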
--- a/krkn/scenario_plugins/network_chaos_ng/modules/node_network_chaos.py
+++ b/krkn/scenario_plugins/network_chaos_ng/modules/node_network_chaos.py
@@ -0,0 +1,156 @@
+import queue
+import time
+from typing import Tuple
+
+from krkn_lib.telemetry.ocp import KrknTelemetryOpenshift
+from krkn_lib.utils import get_random_string
+
+from krkn.scenario_plugins.network_chaos_ng.models import (
+    NetworkChaosScenarioType,
+    BaseNetworkChaosConfig,
+    NetworkChaosConfig,
+)
+from krkn.scenario_plugins.network_chaos_ng.modules.abstract_network_chaos_module import (
+    AbstractNetworkChaosModule,
+)
+from krkn.scenario_plugins.network_chaos_ng.modules.utils import (
+    log_info,
+    setup_network_chaos_ng_scenario,
+    log_error,
+    log_warning,
+)
+from krkn.scenario_plugins.network_chaos_ng.modules.utils_network_chaos import (
+    common_set_limit_rules,
+    common_delete_limit_rules,
+    node_qdisc_is_simple,
+)
+
+
+class NodeNetworkChaosModule(AbstractNetworkChaosModule):
+
+    def __init__(self, config: NetworkChaosConfig, kubecli: KrknTelemetryOpenshift):
+        super().__init__(config, kubecli)
+        self.config = config
+
+    def run(self, target: str, error_queue: queue.Queue = None):
+        parallel = False
+        if error_queue:
+            parallel = True
+        try:
+            network_chaos_pod_name = f"node-network-chaos-{get_random_string(5)}"
+            container_name = f"fedora-container-{get_random_string(5)}"
+
+            log_info(
+                f"creating workload to inject network chaos in node {target} network: "
+                f"latency: {str(self.config.latency) if self.config.latency else '0'}, "
+                f"packet drop: {str(self.config.loss) if self.config.loss else '0'}, "
+                f"bandwidth restriction: {str(self.config.bandwidth) if self.config.bandwidth else '0'}",
+                parallel,
+                network_chaos_pod_name,
+            )
+
+            _, interfaces = setup_network_chaos_ng_scenario(
+                self.config,
+                target,
+                network_chaos_pod_name,
+                container_name,
+                self.kubecli.get_lib_kubernetes(),
+                target,
+                parallel,
+                True,
+            )
+
+            if len(self.config.interfaces) == 0:
+                if len(interfaces) == 0:
+                    log_error(
+                        "no network interface found on node, impossible to execute the network chaos scenario",
+                        parallel,
+                        network_chaos_pod_name,
+                    )
+                    return
+                log_info(
+                    f"detected network interfaces: {','.join(interfaces)}",
+                    parallel,
+                    network_chaos_pod_name,
+                )
+            else:
+                interfaces = self.config.interfaces
+
+            log_info(
+                f"targeting node {target}",
+                parallel,
+                network_chaos_pod_name,
+            )
+
+            complex_config_interfaces = []
+            for interface in interfaces:
+                is_simple = node_qdisc_is_simple(
+                    self.kubecli.get_lib_kubernetes(),
+                    network_chaos_pod_name,
+                    self.config.namespace,
+                    interface,
+                )
+                if not is_simple:
+                    complex_config_interfaces.append(interface)
+
+            if len(complex_config_interfaces) > 0 and not self.config.force:
+                log_warning(
+                    f"node already has tc rules set for {','.join(complex_config_interfaces)}, this action might damage the cluster; "
+                    "if you want to continue, set `force` to True in the node network "
+                    "chaos scenario config file and try again"
+                )
+            else:
+                if len(complex_config_interfaces) > 0 and self.config.force:
+                    log_warning(
+                        f"you are forcing node network configuration override for {','.join(complex_config_interfaces)}: "
+                        "this action might lead to unpredictable node behaviour "
+                        "and you're doing it at your own risk; "
+                        "waiting 10 seconds before continuing"
+                    )
+                    time.sleep(10)
+                common_set_limit_rules(
+                    self.config.egress,
+                    self.config.ingress,
+                    interfaces,
+                    self.config.bandwidth,
+                    self.config.latency,
+                    self.config.loss,
+                    parallel,
+                    network_chaos_pod_name,
+                    self.kubecli.get_lib_kubernetes(),
+                    network_chaos_pod_name,
+                    self.config.namespace,
+                    None,
+                )
+
+                time.sleep(self.config.test_duration)
+
+                log_info("removing tc rules", parallel, network_chaos_pod_name)
+
+                common_delete_limit_rules(
+                    self.config.egress,
+                    self.config.ingress,
+                    interfaces,
+                    network_chaos_pod_name,
+                    self.config.namespace,
+                    self.kubecli.get_lib_kubernetes(),
+                    None,
+                    parallel,
+                    network_chaos_pod_name,
+                )
+
+            self.kubecli.get_lib_kubernetes().delete_pod(
+                network_chaos_pod_name, self.config.namespace
+            )
+
+        except Exception as e:
+            if error_queue is None:
+                raise e
+            else:
+                error_queue.put(str(e))
+
+    def get_config(self) -> Tuple[NetworkChaosScenarioType, BaseNetworkChaosConfig]:
+        return NetworkChaosScenarioType.Node, self.config
+
+    def get_targets(self) -> list[str]:
+        return self.get_node_targets(self.config)
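Before touching a node, the module refuses to continue (unless `force` is set) when any interface already carries a non-trivial qdisc. `node_qdisc_is_simple` treats the output of `tc qdisc show dev <iface>` as simple only when it is a single line that mentions none of `htb`, `netem`, or `clsact`. The sketch below applies the same decision to captured output; `qdisc_is_simple`, `interfaces_needing_force`, and the sample strings are illustrative, not taken from a real node.

```python
def qdisc_is_simple(tc_output: str) -> bool:
    """Same decision as node_qdisc_is_simple, applied to already captured output."""
    lines = [line for line in tc_output.splitlines() if line.strip()]
    if len(lines) != 1:
        return False
    line = lines[0].lower()
    return not any(kind in line for kind in ("htb", "netem", "clsact"))


def interfaces_needing_force(outputs: dict[str, str]) -> list[str]:
    """Return the interfaces whose existing tc setup would require `force`."""
    return [iface for iface, out in outputs.items() if not qdisc_is_simple(out)]


# Illustrative `tc qdisc show` output, not captured from a real node.
sample = {
    "eth0": "qdisc fq_codel 0: root refcnt 2 limit 10240p flows 1024\n",
    "eth1": "qdisc htb 100: root refcnt 2 r2q 10 default 0x1\n"
            "qdisc netem 101: parent 100:1 limit 1000 delay 50ms\n",
}
print(interfaces_needing_force(sample))  # ['eth1']
```

Because the rule counts lines, an interface whose root is a multi-queue `mq` qdisc, which `tc` lists together with one child qdisc per queue, would also be reported as needing `force`.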
--- a/krkn/scenario_plugins/network_chaos_ng/modules/node_network_filter.py
+++ b/krkn/scenario_plugins/network_chaos_ng/modules/node_network_filter.py
@@ -1,5 +1,6 @@
 import queue
 import time
+from typing import Tuple

 from krkn_lib.telemetry.ocp import KrknTelemetryOpenshift
 from krkn_lib.utils import get_random_string
@@ -11,14 +12,16 @@ from krkn.scenario_plugins.network_chaos_ng.models import (
 from krkn.scenario_plugins.network_chaos_ng.modules.abstract_network_chaos_module import (
    AbstractNetworkChaosModule,
 )
-from krkn.scenario_plugins.network_chaos_ng.modules.utils import log_info
+from krkn.scenario_plugins.network_chaos_ng.modules.utils import (
+    log_info,
+    deploy_network_chaos_ng_pod,
+    get_pod_default_interface,
+)

 from krkn.scenario_plugins.network_chaos_ng.modules.utils_network_filter import (
-    deploy_network_filter_pod,
    apply_network_rules,
    clean_network_rules,
    generate_rules,
-    get_default_interface,
 )


@@ -41,7 +44,7 @@ class NodeNetworkFilterModule(AbstractNetworkChaosModule):
            )

            pod_name = f"node-filter-{get_random_string(5)}"
-            deploy_network_filter_pod(
+            deploy_network_chaos_ng_pod(
                self.config,
                target,
                pod_name,
@@ -50,7 +53,7 @@ class NodeNetworkFilterModule(AbstractNetworkChaosModule):

            if len(self.config.interfaces) == 0:
                interfaces = [
-                    get_default_interface(
+                    get_pod_default_interface(
                        pod_name,
                        self.config.namespace,
                        self.kubecli.get_lib_kubernetes(),
@@ -108,21 +111,8 @@ class NodeNetworkFilterModule(AbstractNetworkChaosModule):
        super().__init__(config, kubecli)
        self.config = config

-    def get_config(self) -> (NetworkChaosScenarioType, BaseNetworkChaosConfig):
+    def get_config(self) -> Tuple[NetworkChaosScenarioType, BaseNetworkChaosConfig]:
        return NetworkChaosScenarioType.Node, self.config

    def get_targets(self) -> list[str]:
-        if self.base_network_config.label_selector:
-            return self.kubecli.get_lib_kubernetes().list_nodes(
-                self.base_network_config.label_selector
-            )
-        else:
-            if not self.config.target:
-                raise Exception(
-                    "neither node selector nor node_name (target) specified, aborting."
-                )
-            node_info = self.kubecli.get_lib_kubernetes().list_nodes()
-            if self.config.target not in node_info:
-                raise Exception(f"node {self.config.target} not found, aborting")
-
-            return [self.config.target]
+        return self.get_node_targets(self.config)
--- a/krkn/scenario_plugins/network_chaos_ng/modules/pod_network_chaos.py
+++ b/krkn/scenario_plugins/network_chaos_ng/modules/pod_network_chaos.py
@@ -0,0 +1,159 @@
+import queue
+import time
+from typing import Tuple
+
+from krkn_lib.telemetry.ocp import KrknTelemetryOpenshift
+from krkn_lib.utils import get_random_string
+
+from krkn.scenario_plugins.network_chaos_ng.models import (
+    NetworkChaosScenarioType,
+    BaseNetworkChaosConfig,
+    NetworkChaosConfig,
+)
+from krkn.scenario_plugins.network_chaos_ng.modules.abstract_network_chaos_module import (
+    AbstractNetworkChaosModule,
+)
+from krkn.scenario_plugins.network_chaos_ng.modules.utils import (
+    log_info,
+    setup_network_chaos_ng_scenario,
+    log_error,
+)
+from krkn.scenario_plugins.network_chaos_ng.modules.utils_network_chaos import (
+    common_set_limit_rules,
+    common_delete_limit_rules,
+)
+
+
+class PodNetworkChaosModule(AbstractNetworkChaosModule):
+
+    def __init__(self, config: NetworkChaosConfig, kubecli: KrknTelemetryOpenshift):
+        super().__init__(config, kubecli)
+        self.config = config
+
+    def run(self, target: str, error_queue: queue.Queue = None):
+        parallel = False
+        if error_queue:
+            parallel = True
+        try:
+            network_chaos_pod_name = f"pod-network-chaos-{get_random_string(5)}"
+            container_name = f"fedora-container-{get_random_string(5)}"
+            pod_info = self.kubecli.get_lib_kubernetes().get_pod_info(
+                target, self.config.namespace
+            )
+
+            log_info(
+                f"creating workload to inject network chaos in pod {target} network: "
+                f"latency: {str(self.config.latency) if self.config.latency else '0'}, "
+                f"packet drop: {str(self.config.loss) if self.config.loss else '0'}, "
+                f"bandwidth restriction: {str(self.config.bandwidth) if self.config.bandwidth else '0'}",
+                parallel,
+                network_chaos_pod_name,
+            )
+
+            if not pod_info:
+                raise Exception(
+                    f"impossible to retrieve info for pod {target} namespace {self.config.namespace}"
+                )
+
+            container_ids, interfaces = setup_network_chaos_ng_scenario(
+                self.config,
+                pod_info.nodeName,
+                network_chaos_pod_name,
+                container_name,
+                self.kubecli.get_lib_kubernetes(),
+                target,
+                parallel,
+                False,
+            )
+
+            if len(self.config.interfaces) == 0:
+                if len(interfaces) == 0:
+                    log_error(
+                        "no network interface found in pod, impossible to execute the network chaos scenario",
+                        parallel,
+                        network_chaos_pod_name,
+                    )
+                    return
+                log_info(
+                    f"detected network interfaces: {','.join(interfaces)}",
+                    parallel,
+                    network_chaos_pod_name,
+                )
+            else:
+                interfaces = self.config.interfaces
+
+            if len(container_ids) == 0:
+                raise Exception(
+                    f"impossible to resolve container id for pod {target} namespace {self.config.namespace}"
+                )
+
+            log_info(
+                f"targeting container {container_ids[0]}",
+                parallel,
+                network_chaos_pod_name,
+            )
+
+            pids = self.kubecli.get_lib_kubernetes().get_pod_pids(
+                base_pod_name=network_chaos_pod_name,
+                base_pod_namespace=self.config.namespace,
+                base_pod_container_name=container_name,
+                pod_name=target,
+                pod_namespace=self.config.namespace,
+                pod_container_id=container_ids[0],
+            )
+
+            if not pids:
+                raise Exception(f"impossible to resolve pid for pod {target}")
+
+            log_info(
+                f"resolved pids {pids} in node {pod_info.nodeName} for pod {target}",
+                parallel,
+                network_chaos_pod_name,
+            )
+
+            common_set_limit_rules(
+                self.config.egress,
+                self.config.ingress,
+                interfaces,
+                self.config.bandwidth,
+                self.config.latency,
+                self.config.loss,
+                parallel,
+                network_chaos_pod_name,
+                self.kubecli.get_lib_kubernetes(),
+                network_chaos_pod_name,
+                self.config.namespace,
+                pids,
+            )
+
+            time.sleep(self.config.test_duration)
+
+            log_info("removing tc rules", parallel, network_chaos_pod_name)
+
+            common_delete_limit_rules(
+                self.config.egress,
+                self.config.ingress,
+                interfaces,
+                network_chaos_pod_name,
+                self.config.namespace,
+                self.kubecli.get_lib_kubernetes(),
+                pids,
+                parallel,
+                network_chaos_pod_name,
+            )
+
+            self.kubecli.get_lib_kubernetes().delete_pod(
+                network_chaos_pod_name, self.config.namespace
+            )
+
+        except Exception as e:
+            if error_queue is None:
+                raise e
+            else:
+                error_queue.put(str(e))
+
+    def get_config(self) -> Tuple[NetworkChaosScenarioType, BaseNetworkChaosConfig]:
+        return NetworkChaosScenarioType.Pod, self.config
+
+    def get_targets(self) -> list[str]:
+        return self.get_pod_targets(self.config)
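For pod targets, `common_set_limit_rules` and `common_delete_limit_rules` receive the resolved PIDs, and every `tc` command is wrapped by `namespaced_tc_commands` in `nsenter --target <pid> --net --` so that it runs in the target pod's network namespace rather than the host's. A standalone copy of that expansion, with made-up PIDs, shows the ordering: every command for the first PID, then every command for the next.

```python
def namespaced_tc_commands(pids: list[str], commands: list[str]) -> list[str]:
    # Outer loop over PIDs, inner loop over commands: each command is
    # replayed inside the network namespace of every target PID.
    return [
        f"nsenter --target {pid} --net -- {rule}" for pid in pids for rule in commands
    ]


cmds = namespaced_tc_commands(
    ["4242", "4343"],  # made-up PIDs for illustration
    ["tc qdisc del dev eth0 root handle 100:", "tc qdisc show dev eth0"],
)
for c in cmds:
    print(c)
```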
--- a/krkn/scenario_plugins/network_chaos_ng/modules/pod_network_filter.py
+++ b/krkn/scenario_plugins/network_chaos_ng/modules/pod_network_filter.py
@@ -1,6 +1,6 @@
-import logging
 import queue
 import time
+from typing import Tuple

 from krkn_lib.telemetry.ocp import KrknTelemetryOpenshift
 from krkn_lib.utils import get_random_string
@@ -13,12 +13,17 @@ from krkn.scenario_plugins.network_chaos_ng.models import (
 from krkn.scenario_plugins.network_chaos_ng.modules.abstract_network_chaos_module import (
    AbstractNetworkChaosModule,
 )
-from krkn.scenario_plugins.network_chaos_ng.modules.utils import log_info, log_error
+from krkn.scenario_plugins.network_chaos_ng.modules.utils import (
+    log_info,
+    log_error,
+    deploy_network_chaos_ng_pod,
+    get_pod_default_interface,
+    setup_network_chaos_ng_scenario,
+)
 from krkn.scenario_plugins.network_chaos_ng.modules.utils_network_filter import (
-    deploy_network_filter_pod,
-    generate_namespaced_rules,
    apply_network_rules,
    clean_network_rules_namespaced,
+    generate_namespaced_rules,
 )


@@ -50,22 +55,18 @@ class PodNetworkFilterModule(AbstractNetworkChaosModule):
                    f"impossible to retrieve infos for pod {self.config.target} namespace {self.config.namespace}"
                )

-            deploy_network_filter_pod(
+            container_ids, interfaces = setup_network_chaos_ng_scenario(
                self.config,
                pod_info.nodeName,
                pod_name,
-                self.kubecli.get_lib_kubernetes(),
                container_name,
-                host_network=False,
+                self.kubecli.get_lib_kubernetes(),
+                target,
+                parallel,
+                False,
            )

            if len(self.config.interfaces) == 0:
-                interfaces = (
-                    self.kubecli.get_lib_kubernetes().list_pod_network_interfaces(
-                        target, self.config.namespace
-                    )
-                )
-
                if len(interfaces) == 0:
                    log_error(
                        "no network interface found in pod, impossible to execute the network filter scenario",
@@ -157,26 +158,8 @@ class PodNetworkFilterModule(AbstractNetworkChaosModule):
        super().__init__(config, kubecli)
        self.config = config

-    def get_config(self) -> (NetworkChaosScenarioType, BaseNetworkChaosConfig):
+    def get_config(self) -> Tuple[NetworkChaosScenarioType, BaseNetworkChaosConfig]:
        return NetworkChaosScenarioType.Pod, self.config

    def get_targets(self) -> list[str]:
-        if not self.config.namespace:
-            raise Exception("namespace not specified, aborting")
-        if self.base_network_config.label_selector:
-            return self.kubecli.get_lib_kubernetes().list_pods(
-                self.config.namespace, self.config.label_selector
-            )
-        else:
-            if not self.config.target:
-                raise Exception(
-                    "neither node selector nor node_name (target) specified, aborting."
-                )
-            if not self.kubecli.get_lib_kubernetes().check_if_pod_exists(
-                self.config.target, self.config.namespace
-            ):
-                raise Exception(
-                    f"pod {self.config.target} not found in namespace {self.config.namespace}"
-                )
-
-            return [self.config.target]
+        return self.get_pod_targets(self.config)
--- a/krkn/scenario_plugins/network_chaos_ng/modules/utils.py
+++ b/krkn/scenario_plugins/network_chaos_ng/modules/utils.py
@@ -1,4 +1,15 @@
 import logging
+import os
+from typing import Tuple
+
+import yaml
+from jinja2 import FileSystemLoader, Environment
+from krkn_lib.k8s import KrknKubernetes
+from krkn_lib.models.k8s import Pod
+
+from krkn.scenario_plugins.network_chaos_ng.models import (
+    BaseNetworkChaosConfig,
+)


 def log_info(message: str, parallel: bool = False, node_name: str = ""):
@@ -29,3 +40,101 @@ def log_warning(message: str, parallel: bool = False, node_name: str = ""):
        logging.warning(f"[{node_name}]: {message}")
    else:
        logging.warning(message)
+
+
+def deploy_network_chaos_ng_pod(
+    config: BaseNetworkChaosConfig,
+    target_node: str,
+    pod_name: str,
+    kubecli: KrknKubernetes,
+    container_name: str = "fedora",
+    host_network: bool = True,
+):
+    file_loader = FileSystemLoader(os.path.abspath(os.path.dirname(__file__)))
+    env = Environment(loader=file_loader, autoescape=True)
+    pod_template = env.get_template("templates/network-chaos.j2")
+    tolerations = []
+
+    for taint in config.taints:
+        key_value_part, effect = taint.split(":", 1)
+        if "=" in key_value_part:
+            key, value = key_value_part.split("=", 1)
+            operator = "Equal"
+        else:
+            key = key_value_part
+            value = None
+            operator = "Exists"
+        toleration = {
+            "key": key,
+            "operator": operator,
+            "effect": effect,
+        }
+        if value is not None:
+            toleration["value"] = value
+        tolerations.append(toleration)
+
+    pod_body = yaml.safe_load(
+        pod_template.render(
+            pod_name=pod_name,
+            namespace=config.namespace,
+            host_network=host_network,
+            target=target_node,
+            container_name=container_name,
+            workload_image=config.image,
+            taints=tolerations,
+            service_account=config.service_account,
+        )
+    )
+
+    kubecli.create_pod(pod_body, config.namespace, 300)
+
+
+def get_pod_default_interface(
+    pod_name: str, namespace: str, kubecli: KrknKubernetes
+) -> str:
+    cmd = "ip r | grep default | awk '/default/ {print $5}'"
+    output = kubecli.exec_cmd_in_pod([cmd], pod_name, namespace)
+    return output.replace("\n", "")
+
+
+def setup_network_chaos_ng_scenario(
+    config: BaseNetworkChaosConfig,
+    node_name: str,
+    pod_name: str,
+    container_name: str,
+    kubecli: KrknKubernetes,
+    target: str,
+    parallel: bool,
+    host_network: bool,
+) -> Tuple[list[str], list[str]]:
+
+    deploy_network_chaos_ng_pod(
+        config,
+        node_name,
+        pod_name,
+        kubecli,
+        container_name,
+        host_network=host_network,
+    )
+
+    if len(config.interfaces) == 0:
+        interfaces = [
+            get_pod_default_interface(
+                pod_name,
+                config.namespace,
+                kubecli,
+            )
+        ]
+
+        log_info(f"detected default interface {interfaces[0]}", parallel, target)
+
+    else:
+        interfaces = config.interfaces
+    # without host networking the target is a pod, so its container IDs must be resolved;
+    # node targets run on the host network and do not need them
+    if not host_network:
+        container_ids = kubecli.get_container_ids(target, config.namespace)
+    else:
+        container_ids = []
+
+    return container_ids, interfaces
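`deploy_network_chaos_ng_pod` turns each configured taint of the form `key[=value]:Effect` into a toleration: operator `Equal` with a value when `=` is present, `Exists` otherwise. The same loop body, extracted into a hypothetical `taint_to_toleration` helper so it can be exercised on its own:

```python
from typing import Optional


def taint_to_toleration(taint: str) -> dict[str, str]:
    """Convert "key[=value]:Effect" into a Kubernetes toleration dict."""
    key_value_part, effect = taint.split(":", 1)
    value: Optional[str] = None
    if "=" in key_value_part:
        key, value = key_value_part.split("=", 1)
        operator = "Equal"
    else:
        key = key_value_part
        operator = "Exists"
    toleration = {"key": key, "operator": operator, "effect": effect}
    if value is not None:
        toleration["value"] = value
    return toleration


print(taint_to_toleration("node-role.kubernetes.io/master:NoSchedule"))
print(taint_to_toleration("dedicated=infra:NoExecute"))
```

A taint string without the `:Effect` suffix makes the tuple unpacking raise `ValueError`, exactly as in the loop inside `deploy_network_chaos_ng_pod`.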
--- a/krkn/scenario_plugins/network_chaos_ng/modules/utils_network_chaos.py
+++ b/krkn/scenario_plugins/network_chaos_ng/modules/utils_network_chaos.py
@@ -0,0 +1,263 @@
+import subprocess
+import logging
+from typing import Optional
+
+from krkn_lib.k8s import KrknKubernetes
+
+from krkn.scenario_plugins.network_chaos_ng.modules.utils import (
+    log_info,
+    log_warning,
+    log_error,
+)
+
+ROOT_HANDLE = "100:"
+CLASS_ID = "100:1"
+NETEM_HANDLE = "101:"
+
+
+def run(cmd: list[str], check: bool = True) -> subprocess.CompletedProcess:
+    return subprocess.run(cmd, check=check, text=True, capture_output=True)
+
+
+def tc_node(args: list[str]) -> subprocess.CompletedProcess:
+    return run(["tc"] + args)
+
+
+def get_build_tc_tree_commands(devs: list[str]) -> list[str]:
+    tree = []
+    for dev in devs:
+        tree.append(f"tc qdisc add dev {dev} root handle {ROOT_HANDLE} htb default 1")
+        tree.append(
+            f"tc class add dev {dev} parent {ROOT_HANDLE} classid {CLASS_ID} htb rate 1gbit",
+        )
+        tree.append(
+            f"tc qdisc add dev {dev} parent {CLASS_ID} handle {NETEM_HANDLE} netem delay 0ms loss 0%",
+        )
+
+    return tree
+
+
+def namespaced_tc_commands(pids: list[str], commands: list[str]) -> list[str]:
+    return [
+        f"nsenter --target {pid} --net -- {rule}" for pid in pids for rule in commands
+    ]
+
+
+def get_egress_shaping_comand(
+    devices: list[str],
+    rate_mbit: Optional[str],
+    delay_ms: Optional[str],
+    loss_pct: Optional[str],
+) -> list[str]:
+
+    rate_commands = []
+    rate = rate_mbit if rate_mbit else "1gbit"  # bandwidth already carries its unit, e.g. 10mbit
+    d = delay_ms if delay_ms else "0ms"  # latency already carries its unit, e.g. 100ms
+    l = loss_pct if loss_pct else 0
+    for dev in devices:
+        rate_commands.append(
+            f"tc class change dev {dev} parent {ROOT_HANDLE} classid {CLASS_ID} htb rate {rate}"
+        )
+        rate_commands.append(
+            f"tc qdisc change dev {dev} parent {CLASS_ID} handle {NETEM_HANDLE} netem delay {d} loss {l}%"
+        )
+    return rate_commands
+
+
+def get_clear_egress_shaping_commands(devices: list[str]) -> list[str]:
+    return [f"tc qdisc del dev {dev} root handle {ROOT_HANDLE}" for dev in devices]
+
+
+def get_ingress_shaping_commands(
+    devs: list[str],
+    rate_mbit: Optional[str],
+    delay_ms: Optional[str],
+    loss_pct: Optional[str],
+    ifb_dev: str = "ifb0",
+) -> list[str]:
+
+    rate_commands = [
+        "modprobe ifb || true",
+        f"ip link add {ifb_dev} type ifb || true",
+        f"ip link set {ifb_dev} up || true",
+    ]
+
+    for dev in devs:
+        rate_commands.append(f"tc qdisc add dev {dev} handle ffff: ingress || true")
+
+        rate_commands.append(
+            f"tc filter add dev {dev} parent ffff: protocol all prio 1 "
+            f"matchall action mirred egress redirect dev {ifb_dev} || true"
+        )
+
+    rate_commands.append(
+        f"tc qdisc add dev {ifb_dev} root handle {ROOT_HANDLE} htb default 1 || true"
+    )
+    rate_commands.append(
+        f"tc class add dev {ifb_dev} parent {ROOT_HANDLE} classid {CLASS_ID} "
+        f"htb rate {rate_mbit if rate_mbit else '1gbit'} || true"
+    )
+    rate_commands.append(
+        f"tc qdisc add dev {ifb_dev} parent {CLASS_ID} handle {NETEM_HANDLE} "
+        f"netem delay {delay_ms if delay_ms else '0ms'} "
+        f"loss {loss_pct if loss_pct else '0'}% || true"
+    )
+
+    return rate_commands
+
+
+def get_clear_ingress_shaping_commands(
+    devs: list[str],
+    ifb_dev: str = "ifb0",
+) -> list[str]:
+
+    cmds: list[str] = []
+    for dev in devs:
+        cmds.append(f"tc qdisc del dev {dev} ingress || true")
+
+    cmds.append(f"tc qdisc del dev {ifb_dev} root handle {ROOT_HANDLE} || true")
+
+    cmds.append(f"ip link set {ifb_dev} down || true")
+    cmds.append(f"ip link del {ifb_dev} || true")
+
+    return cmds
+
+
+def node_qdisc_is_simple(
+    kubecli: KrknKubernetes, pod_name: str, namespace: str, interface: str
+) -> bool:
+
+    result = kubecli.exec_cmd_in_pod(
+        [f"tc qdisc show dev {interface}"], pod_name, namespace
+    )
+    lines = [l for l in result.splitlines() if l.strip()]
+    if len(lines) != 1:
+        return False
+
+    line = lines[0].lower()
+    if "htb" in line or "netem" in line or "clsact" in line:
+        return False
+
+    return True
+
+
+def common_set_limit_rules(
+    egress: bool,
+    ingress: bool,
+    interfaces: list[str],
+    bandwidth: str,
+    latency: str,
+    loss: str,
+    parallel: bool,
+    target: str,
+    kubecli: KrknKubernetes,
+    network_chaos_pod_name: str,
+    namespace: str,
+    pids: Optional[list[str]] = None,
+):
+    if egress:
+        build_tree_commands = get_build_tc_tree_commands(interfaces)
+        if pids:
+            build_tree_commands = namespaced_tc_commands(pids, build_tree_commands)
+        egress_shaping_commands = get_egress_shaping_comand(
+            interfaces,
+            bandwidth,
+            latency,
+            loss,
+        )
+        if pids:
+            egress_shaping_commands = namespaced_tc_commands(
+                pids, egress_shaping_commands
+            )
+        error_counter = 0
+        for rule in build_tree_commands:
+            result = kubecli.exec_cmd_in_pod([rule], network_chaos_pod_name, namespace)
+            if not result:
+                log_info(f"created tc tree in pod: {rule}", parallel, target)
+            else:
+                error_counter += 1
+        if len(build_tree_commands) == error_counter:
+            log_error(
+                "failed to apply egress shaping rules on cluster", parallel, target
+            )
+
+        for rule in egress_shaping_commands:
+            result = kubecli.exec_cmd_in_pod([rule], network_chaos_pod_name, namespace)
+            if not result:
+                log_info(f"applied egress shaping rules: {rule}", parallel, target)
+    if ingress:
+        ingress_shaping_commands = get_ingress_shaping_commands(
+            interfaces,
+            bandwidth,
+            latency,
+            loss,
+        )
+        if pids:
+            ingress_shaping_commands = namespaced_tc_commands(
+                pids, ingress_shaping_commands
+            )
+        error_counter = 0
+        for rule in ingress_shaping_commands:
+
+            result = kubecli.exec_cmd_in_pod([rule], network_chaos_pod_name, namespace)
+            if not result:
+                log_info(
+                    f"applied ingress shaping rule: {rule}",
+                    parallel,
+                    network_chaos_pod_name,
+                )
+            else:
+                error_counter += 1
+
+        if len(ingress_shaping_commands) == error_counter:
+            log_error(
+                "failed to apply ingress shaping rules on cluster", parallel, target
+            )
+
+
+def common_delete_limit_rules(
+    egress: bool,
+    ingress: bool,
+    interfaces: list[str],
+    network_chaos_pod_name: str,
+    network_chaos_namespace: str,
+    kubecli: KrknKubernetes,
+    pids: Optional[list[str]],
+    parallel: bool,
+    target: str,
+):
+    if egress:
+        clear_commands = get_clear_egress_shaping_commands(interfaces)
+        if pids:
+            clear_commands = namespaced_tc_commands(pids, clear_commands)
+        error_counter = 0
+        for rule in clear_commands:
+            result = kubecli.exec_cmd_in_pod(
+                [rule], network_chaos_pod_name, network_chaos_namespace
+            )
+            if not result:
+                log_info(f"removed egress shaping rule: {rule}", parallel, target)
+            else:
+                error_counter += 1
+        if len(clear_commands) == error_counter:
+            log_error(
+                "failed to remove egress shaping rules on cluster", parallel, target
+            )
+
+    if ingress:
+        clear_commands = get_clear_ingress_shaping_commands(interfaces)
+        if pids:
+            clear_commands = namespaced_tc_commands(pids, clear_commands)
+        error_counter = 0
+        for rule in clear_commands:
+            result = kubecli.exec_cmd_in_pod(
+                [rule], network_chaos_pod_name, network_chaos_namespace
+            )
+            if not result:
+                log_info(f"removed ingress shaping rule: {rule}", parallel, target)
+            else:
+                error_counter += 1
+        if len(clear_commands) == error_counter:
+            log_error(
+                "failed to remove ingress shaping rules on cluster", parallel, target
+            )
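The egress path builds a fixed three-level tree per device (an HTB root `100:`, a single HTB class `100:1` that carries the rate limit, and a netem qdisc `101:` under that class for delay and loss) and then only `change`s the class and netem parameters. Ingress traffic cannot be shaped this way directly, so the ingress path redirects it with a `mirred` action to an `ifb` device and builds the same tree there. A compact sketch of the egress sequence for one device, assuming the values already carry their units (for example `10mbit` and `100ms`) as the config validation requires; `egress_commands` is a hypothetical helper, not part of the module:

```python
from typing import Optional

ROOT_HANDLE = "100:"
CLASS_ID = "100:1"
NETEM_HANDLE = "101:"


def egress_commands(
    dev: str,
    rate: Optional[str] = None,
    delay: Optional[str] = None,
    loss_pct: Optional[str] = None,
) -> list[str]:
    """Build the HTB -> netem tree on `dev`, then apply the requested shaping."""
    rate = rate or "1gbit"  # default cap when no bandwidth limit is requested
    delay = delay or "0ms"
    loss_pct = loss_pct or "0"
    return [
        # 1. root HTB qdisc; "default 1" sends unclassified traffic to class 100:1
        f"tc qdisc add dev {dev} root handle {ROOT_HANDLE} htb default 1",
        # 2. the single class that holds the rate limit
        f"tc class add dev {dev} parent {ROOT_HANDLE} classid {CLASS_ID} htb rate 1gbit",
        # 3. netem leaf under that class for delay and loss
        f"tc qdisc add dev {dev} parent {CLASS_ID} handle {NETEM_HANDLE} netem delay 0ms loss 0%",
        # 4-5. modify the existing objects in place with the scenario values
        f"tc class change dev {dev} parent {ROOT_HANDLE} classid {CLASS_ID} htb rate {rate}",
        f"tc qdisc change dev {dev} parent {CLASS_ID} handle {NETEM_HANDLE} "
        f"netem delay {delay} loss {loss_pct}%",
    ]


for cmd in egress_commands("eth0", rate="10mbit", delay="100ms", loss_pct="5"):
    print(cmd)

# Teardown: deleting the root qdisc removes the whole tree in one step.
print(f"tc qdisc del dev eth0 root handle {ROOT_HANDLE}")
```

Creating the tree once and then using `change` keeps the handles stable, so cleanup only has to delete the root qdisc.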
--- a/krkn/scenario_plugins/network_chaos_ng/modules/utils_network_filter.py
+++ b/krkn/scenario_plugins/network_chaos_ng/modules/utils_network_filter.py
@@ -1,7 +1,5 @@
-import os
+from typing import Tuple

-import yaml
-from jinja2 import FileSystemLoader, Environment
 from krkn_lib.k8s import KrknKubernetes

 from krkn.scenario_plugins.network_chaos_ng.models import NetworkFilterConfig
@@ -10,7 +8,7 @@ from krkn.scenario_plugins.network_chaos_ng.modules.utils import log_info

 def generate_rules(
    interfaces: list[str], config: NetworkFilterConfig
-) -> (list[str], list[str]):
+) -> Tuple[list[str], list[str]]:
    input_rules = []
    output_rules = []
    for interface in interfaces:
@@ -29,72 +27,6 @@ def generate_rules(
    return input_rules, output_rules


-def generate_namespaced_rules(
-    interfaces: list[str], config: NetworkFilterConfig, pids: list[str]
-) -> (list[str], list[str]):
-    namespaced_input_rules: list[str] = []
-    namespaced_output_rules: list[str] = []
-    input_rules, output_rules = generate_rules(interfaces, config)
-    for pid in pids:
-        ns_input_rules = [
-            f"nsenter --target {pid} --net -- {rule}" for rule in input_rules
-        ]
-        ns_output_rules = [
-            f"nsenter --target {pid} --net -- {rule}" for rule in output_rules
-        ]
-        namespaced_input_rules.extend(ns_input_rules)
-        namespaced_output_rules.extend(ns_output_rules)
-
-    return namespaced_input_rules, namespaced_output_rules
-
-
-def deploy_network_filter_pod(
-    config: NetworkFilterConfig,
-    target_node: str,
-    pod_name: str,
-    kubecli: KrknKubernetes,
-    container_name: str = "fedora",
-    host_network: bool = True,
-):
-    file_loader = FileSystemLoader(os.path.abspath(os.path.dirname(__file__)))
-    env = Environment(loader=file_loader, autoescape=True)
-    pod_template = env.get_template("templates/network-chaos.j2")
-    tolerations = []
-
-    for taint in config.taints:
-        key_value_part, effect = taint.split(":", 1)
-        if "=" in key_value_part:
-            key, value = key_value_part.split("=", 1)
-            operator = "Equal"
-        else:
-            key = key_value_part
-            value = None
-            operator = "Exists"
-        toleration = {
-            "key": key,
-            "operator": operator,
-            "effect": effect,
-        }
-        if value is not None:
-            toleration["value"] = value
-        tolerations.append(toleration)
-
-    pod_body = yaml.safe_load(
-        pod_template.render(
-            pod_name=pod_name,
-            namespace=config.namespace,
-            host_network=host_network,
-            target=target_node,
-            container_name=container_name,
-            workload_image=config.image,
-            taints=tolerations,
-            service_account=config.service_account,
-        )
-    )
-
-    kubecli.create_pod(pod_body, config.namespace, 300)
-
-
 def apply_network_rules(
    kubecli: KrknKubernetes,
    input_rules: list[str],
@@ -153,9 +85,20 @@ def clean_network_rules_namespaced(
            )


-def get_default_interface(
-    pod_name: str, namespace: str, kubecli: KrknKubernetes
-) -> str:
-    cmd = "ip r | grep default | awk '/default/ {print $5}'"
-    output = kubecli.exec_cmd_in_pod([cmd], pod_name, namespace)
-    return output.replace("\n", "")
+def generate_namespaced_rules(
+    interfaces: list[str], config: NetworkFilterConfig, pids: list[str]
+) -> Tuple[list[str], list[str]]:
+    namespaced_input_rules: list[str] = []
+    namespaced_output_rules: list[str] = []
+    input_rules, output_rules = generate_rules(interfaces, config)
+    for pid in pids:
+        ns_input_rules = [
+            f"nsenter --target {pid} --net -- {rule}" for rule in input_rules
+        ]
+        ns_output_rules = [
+            f"nsenter --target {pid} --net -- {rule}" for rule in output_rules
+        ]
+        namespaced_input_rules.extend(ns_input_rules)
+        namespaced_output_rules.extend(ns_output_rules)
+
+    return namespaced_input_rules, namespaced_output_rules
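The relocated `generate_namespaced_rules` wraps every generated rule in `nsenter` once per target PID, iterating PID-first. A minimal stand-alone sketch of that wrapping, useful for checking the exact command shape; `namespace_rules` and `namespace_rule_pair` are hypothetical helpers, not part of krkn, and the iptables rule in the usage is an illustrative placeholder rather than output of `generate_rules`.

```python
from typing import List, Tuple


def namespace_rules(pids: List[str], rules: List[str]) -> List[str]:
    """Prefix each rule with nsenter so it runs in every target's network namespace.

    Hypothetical helper mirroring generate_namespaced_rules: the outer loop is
    over PIDs, so all rules for the first PID come before those for the next.
    """
    return [
        f"nsenter --target {pid} --net -- {rule}" for pid in pids for rule in rules
    ]


def namespace_rule_pair(
    pids: List[str], input_rules: List[str], output_rules: List[str]
) -> Tuple[List[str], List[str]]:
    """Apply the same nsenter wrapping to input and output rule lists."""
    return namespace_rules(pids, input_rules), namespace_rules(pids, output_rules)


if __name__ == "__main__":
    # Placeholder rule for illustration only.
    ins, outs = namespace_rule_pair(["1234", "5678"], ["iptables -A INPUT -j DROP"], [])
    for rule in ins:
        print(rule)
```

Because the wrapping is purely textual, an empty rule list yields an empty namespaced list regardless of how many PIDs are targeted.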
--- a/krkn/scenario_plugins/network_chaos_ng/network_chaos_factory.py
+++ b/krkn/scenario_plugins/network_chaos_ng/network_chaos_factory.py
@@ -1,17 +1,31 @@
 from krkn_lib.telemetry.ocp import KrknTelemetryOpenshift

-from krkn.scenario_plugins.network_chaos_ng.models import NetworkFilterConfig
+from krkn.scenario_plugins.network_chaos_ng.models import (
+    NetworkFilterConfig,
+    NetworkChaosConfig,
+)
 from krkn.scenario_plugins.network_chaos_ng.modules.abstract_network_chaos_module import (
    AbstractNetworkChaosModule,
 )
+from krkn.scenario_plugins.network_chaos_ng.modules.node_network_chaos import (
+    NodeNetworkChaosModule,
+)
 from krkn.scenario_plugins.network_chaos_ng.modules.node_network_filter import (
    NodeNetworkFilterModule,
 )
+from krkn.scenario_plugins.network_chaos_ng.modules.pod_network_chaos import (
+    PodNetworkChaosModule,
+)
 from krkn.scenario_plugins.network_chaos_ng.modules.pod_network_filter import (
    PodNetworkFilterModule,
 )

-supported_modules = ["node_network_filter", "pod_network_filter"]
+supported_modules = [
+    "node_network_filter",
+    "pod_network_filter",
+    "pod_network_chaos",
+    "node_network_chaos",
+]


 class NetworkChaosFactory:
@@ -26,14 +40,28 @@ class NetworkChaosFactory:
            raise Exception(f"{config['id']} is not a supported network chaos module")

        if config["id"] == "node_network_filter":
-            config = NetworkFilterConfig(**config)
-            errors = config.validate()
+            scenario_config = NetworkFilterConfig(**config)
+            errors = scenario_config.validate()
            if len(errors) > 0:
                raise Exception(f"config validation errors: [{';'.join(errors)}]")
-            return NodeNetworkFilterModule(config, kubecli)
+            return NodeNetworkFilterModule(scenario_config, kubecli)
        if config["id"] == "pod_network_filter":
-            config = NetworkFilterConfig(**config)
-            errors = config.validate()
+            scenario_config = NetworkFilterConfig(**config)
+            errors = scenario_config.validate()
            if len(errors) > 0:
                raise Exception(f"config validation errors: [{';'.join(errors)}]")
-            return PodNetworkFilterModule(config, kubecli)
+            return PodNetworkFilterModule(scenario_config, kubecli)
+        if config["id"] == "pod_network_chaos":
+            scenario_config = NetworkChaosConfig(**config)
+            errors = scenario_config.validate()
+            if len(errors) > 0:
+                raise Exception(f"config validation errors: [{';'.join(errors)}]")
+            return PodNetworkChaosModule(scenario_config, kubecli)
+        if config["id"] == "node_network_chaos":
+            scenario_config = NetworkChaosConfig(**config)
+            errors = scenario_config.validate()
+            if len(errors) > 0:
+                raise Exception(f"config validation errors: [{';'.join(errors)}]")
+            return NodeNetworkChaosModule(scenario_config, kubecli)
+        else:
+            raise Exception(f"invalid network chaos id {config['id']}")
--- a/krkn/scenario_plugins/node_actions/abstract_node_scenarios.py
+++ b/krkn/scenario_plugins/node_actions/abstract_node_scenarios.py
@@ -18,20 +18,20 @@ class abstract_node_scenarios:
        self.node_action_kube_check = node_action_kube_check

    # Node scenario to start the node
-    def node_start_scenario(self, instance_kill_count, node, timeout):
+    def node_start_scenario(self, instance_kill_count, node, timeout, poll_interval):
        pass

    # Node scenario to stop the node
-    def node_stop_scenario(self, instance_kill_count, node, timeout):
+    def node_stop_scenario(self, instance_kill_count, node, timeout, poll_interval):
        pass

    # Node scenario to stop and then start the node
-    def node_stop_start_scenario(self, instance_kill_count, node, timeout, duration):
+    def node_stop_start_scenario(self, instance_kill_count, node, timeout, duration, poll_interval):
        logging.info("Starting node_stop_start_scenario injection")
-        self.node_stop_scenario(instance_kill_count, node, timeout)
+        self.node_stop_scenario(instance_kill_count, node, timeout, poll_interval)
        logging.info("Waiting for %s seconds before starting the node" % (duration))
        time.sleep(duration)
-        self.node_start_scenario(instance_kill_count, node, timeout)
+        self.node_start_scenario(instance_kill_count, node, timeout, poll_interval)
        self.affected_nodes_status.merge_affected_nodes()
        logging.info("node_stop_start_scenario has been successfully injected!")

@@ -56,7 +56,7 @@ class abstract_node_scenarios:
            logging.error("node_disk_detach_attach_scenario failed!")

    # Node scenario to terminate the node
-    def node_termination_scenario(self, instance_kill_count, node, timeout):
+    def node_termination_scenario(self, instance_kill_count, node, timeout, poll_interval):
        pass

    # Node scenario to reboot the node
@@ -76,7 +76,7 @@ class abstract_node_scenarios:
                nodeaction.wait_for_unknown_status(node, timeout, self.kubecli, affected_node)
                
                logging.info("The kubelet of the node %s has been stopped" % (node))
-                logging.info("stop_kubelet_scenario has been successfuly injected!")
+                logging.info("stop_kubelet_scenario has been successfully injected!")
            except Exception as e:
                logging.error(
                    "Failed to stop the kubelet of the node. Encountered following "
@@ -108,7 +108,7 @@ class abstract_node_scenarios:
                )
                nodeaction.wait_for_ready_status(node, timeout, self.kubecli,affected_node)
                logging.info("The kubelet of the node %s has been restarted" % (node))
-                logging.info("restart_kubelet_scenario has been successfuly injected!")
+                logging.info("restart_kubelet_scenario has been successfully injected!")
            except Exception as e:
                logging.error(
                    "Failed to restart the kubelet of the node. Encountered following "
@@ -128,7 +128,7 @@ class abstract_node_scenarios:
                    "oc debug node/" + node + " -- chroot /host "
                    "dd if=/dev/urandom of=/proc/sysrq-trigger"
                )
-                logging.info("node_crash_scenario has been successfuly injected!")
+                logging.info("node_crash_scenario has been successfully injected!")
            except Exception as e:
                logging.error(
                    "Failed to crash the node. Encountered following exception: %s. "
--- a/krkn/scenario_plugins/node_actions/alibaba_node_scenarios.py
+++ b/krkn/scenario_plugins/node_actions/alibaba_node_scenarios.py
@@ -234,7 +234,7 @@ class alibaba_node_scenarios(abstract_node_scenarios):
        

    # Node scenario to start the node
-    def node_start_scenario(self, instance_kill_count, node, timeout):
+    def node_start_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
@@ -260,7 +260,7 @@ class alibaba_node_scenarios(abstract_node_scenarios):
            self.affected_nodes_status.affected_nodes.append(affected_node)

    # Node scenario to stop the node
-    def node_stop_scenario(self, instance_kill_count, node, timeout):
+    def node_stop_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
@@ -286,7 +286,7 @@ class alibaba_node_scenarios(abstract_node_scenarios):

    # Might need to stop and then release the instance
    # Node scenario to terminate the node
-    def node_termination_scenario(self, instance_kill_count, node, timeout):
+    def node_termination_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
--- a/krkn/scenario_plugins/node_actions/aws_node_scenarios.py
+++ b/krkn/scenario_plugins/node_actions/aws_node_scenarios.py
@@ -77,10 +77,21 @@ class AWS:
    # until a successful state is reached. An error is returned after 40 failed checks
    # Setting timeout for consistency with other cloud functions
    # Wait until the node instance is running
-    def wait_until_running(self, instance_id, timeout=600, affected_node=None):
+    def wait_until_running(self, instance_id, timeout=600, affected_node=None, poll_interval=15):
        try:
            start_time = time.time()
-            self.boto_instance.wait_until_running(InstanceIds=[instance_id])
+            if timeout > 0:
+                max_attempts = max(1, int(timeout / poll_interval))
+            else:
+                max_attempts = 40
+
+            self.boto_instance.wait_until_running(
+                InstanceIds=[instance_id],
+                WaiterConfig={
+                    'Delay': poll_interval,
+                    'MaxAttempts': max_attempts
+                }
+            )
            end_time = time.time()
            if affected_node:
                affected_node.set_affected_node_status("running", end_time - start_time)
@@ -93,10 +104,21 @@ class AWS:
            return False

    # Wait until the node instance is stopped
-    def wait_until_stopped(self, instance_id, timeout=600, affected_node= None):
+    def wait_until_stopped(self, instance_id, timeout=600, affected_node=None, poll_interval=15):
        try:
            start_time = time.time()
-            self.boto_instance.wait_until_stopped(InstanceIds=[instance_id])
+            if timeout > 0:
+                max_attempts = max(1, int(timeout / poll_interval))
+            else:
+                max_attempts = 40
+
+            self.boto_instance.wait_until_stopped(
+                InstanceIds=[instance_id],
+                WaiterConfig={
+                    'Delay': poll_interval,
+                    'MaxAttempts': max_attempts
+                }
+            )
            end_time = time.time()
            if affected_node:
                affected_node.set_affected_node_status("stopped", end_time - start_time)
@@ -109,10 +131,21 @@ class AWS:
            return False

    # Wait until the node instance is terminated
-    def wait_until_terminated(self, instance_id, timeout=600, affected_node= None):
+    def wait_until_terminated(self, instance_id, timeout=600, affected_node=None, poll_interval=15):
        try:
            start_time = time.time()
-            self.boto_instance.wait_until_terminated(InstanceIds=[instance_id])
+            if timeout > 0:
+                max_attempts = max(1, int(timeout / poll_interval))
+            else:
+                max_attempts = 40
+
+            self.boto_instance.wait_until_terminated(
+                InstanceIds=[instance_id],
+                WaiterConfig={
+                    'Delay': poll_interval,
+                    'MaxAttempts': max_attempts
+                }
+            )
            end_time = time.time()
            if affected_node:
                affected_node.set_affected_node_status("terminated", end_time - start_time)
@@ -267,7 +300,7 @@ class aws_node_scenarios(abstract_node_scenarios):
        self.node_action_kube_check = node_action_kube_check

    # Node scenario to start the node
-    def node_start_scenario(self, instance_kill_count, node, timeout):
+    def node_start_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
@@ -278,7 +311,7 @@ class aws_node_scenarios(abstract_node_scenarios):
                    "Starting the node %s with instance ID: %s " % (node, instance_id)
                )
                self.aws.start_instances(instance_id)
-                self.aws.wait_until_running(instance_id, affected_node=affected_node)
+                self.aws.wait_until_running(instance_id, timeout=timeout, affected_node=affected_node, poll_interval=poll_interval)
                if self.node_action_kube_check: 
                    nodeaction.wait_for_ready_status(node, timeout, self.kubecli, affected_node)
                logging.info(
@@ -296,7 +329,7 @@ class aws_node_scenarios(abstract_node_scenarios):
            self.affected_nodes_status.affected_nodes.append(affected_node)

    # Node scenario to stop the node
-    def node_stop_scenario(self, instance_kill_count, node, timeout):
+    def node_stop_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
@@ -307,7 +340,7 @@ class aws_node_scenarios(abstract_node_scenarios):
                    "Stopping the node %s with instance ID: %s " % (node, instance_id)
                )
                self.aws.stop_instances(instance_id)
-                self.aws.wait_until_stopped(instance_id, affected_node=affected_node)
+                self.aws.wait_until_stopped(instance_id, timeout=timeout, affected_node=affected_node, poll_interval=poll_interval)
                logging.info(
                    "Node with instance ID: %s is in stopped state" % (instance_id)
                )
@@ -324,7 +357,7 @@ class aws_node_scenarios(abstract_node_scenarios):
            self.affected_nodes_status.affected_nodes.append(affected_node)

    # Node scenario to terminate the node
-    def node_termination_scenario(self, instance_kill_count, node, timeout):
+    def node_termination_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
@@ -336,7 +369,7 @@ class aws_node_scenarios(abstract_node_scenarios):
                    % (node, instance_id)
                )
                self.aws.terminate_instances(instance_id)
-                self.aws.wait_until_terminated(instance_id, affected_node=affected_node)
+                self.aws.wait_until_terminated(instance_id, timeout=timeout, affected_node=affected_node, poll_interval=poll_interval)
                for _ in range(timeout):
                    if node not in self.kubecli.list_nodes():
                        break
@@ -346,7 +379,7 @@ class aws_node_scenarios(abstract_node_scenarios):
                logging.info(
                    "Node with instance ID: %s has been terminated" % (instance_id)
                )
-                logging.info("node_termination_scenario has been successfuly injected!")
+                logging.info("node_termination_scenario has been successfully injected!")
            except Exception as e:
                logging.error(
                    "Failed to terminate node instance. Encountered following exception:"
@@ -375,7 +408,7 @@ class aws_node_scenarios(abstract_node_scenarios):
                logging.info(
                    "Node with instance ID: %s has been rebooted" % (instance_id)
                )
-                logging.info("node_reboot_scenario has been successfuly injected!")
+                logging.info("node_reboot_scenario has been successfully injected!")
            except Exception as e:
                logging.error(
                    "Failed to reboot node instance. Encountered following exception:"
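The AWS waiters above now size `WaiterConfig` from the scenario timeout: poll every `poll_interval` seconds for about `timeout` seconds, falling back to 40 attempts when no timeout is set. A minimal sketch of that arithmetic, assuming only the `Delay`/`MaxAttempts` keys already used in the diff; `waiter_config` is a hypothetical helper, not part of krkn. It also rejects a non-positive `poll_interval`, since the diff's `timeout / poll_interval` would raise `ZeroDivisionError` for 0. With the defaults (timeout=600, poll_interval=15) it yields 40 attempts, the same as boto3's built-in EC2 instance waiter defaults (15 s delay, 40 attempts).

```python
from typing import Any, Dict

# Fallback used in the diff when timeout <= 0.
DEFAULT_MAX_ATTEMPTS = 40


def waiter_config(timeout: int, poll_interval: int) -> Dict[str, Any]:
    """Build a boto3-style WaiterConfig dict (hypothetical helper).

    Total wait is roughly Delay * MaxAttempts seconds.
    """
    if poll_interval <= 0:
        # Guard the division below; the diff itself does not check this.
        raise ValueError("poll_interval must be a positive number of seconds")
    if timeout > 0:
        # At least one check, even when timeout < poll_interval.
        max_attempts = max(1, int(timeout / poll_interval))
    else:
        max_attempts = DEFAULT_MAX_ATTEMPTS
    return {"Delay": poll_interval, "MaxAttempts": max_attempts}


if __name__ == "__main__":
    # Defaults from the diff: 600 / 15 = 40 attempts, about 600 seconds in total.
    print(waiter_config(600, 15))
    # A timeout shorter than one interval still performs a single check.
    print(waiter_config(10, 15))
```

Note that the truncating `int()` can make the effective wait slightly shorter than `timeout` when it is not a multiple of `poll_interval`.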
--- a/krkn/scenario_plugins/node_actions/az_node_scenarios.py
+++ b/krkn/scenario_plugins/node_actions/az_node_scenarios.py
@@ -18,8 +18,6 @@ class Azure:
        logging.info("azure " + str(self))
        # Acquire a credential object using CLI-based authentication.
        credentials = DefaultAzureCredential()
-        # az_account = runcommand.invoke("az account list -o yaml")
-        # az_account_yaml = yaml.safe_load(az_account, Loader=yaml.FullLoader)
        logger = logging.getLogger("azure")
        logger.setLevel(logging.WARNING)
        subscription_id = os.getenv("AZURE_SUBSCRIPTION_ID")
@@ -218,7 +216,7 @@ class azure_node_scenarios(abstract_node_scenarios):
        

    # Node scenario to start the node
-    def node_start_scenario(self, instance_kill_count, node, timeout):
+    def node_start_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
@@ -246,7 +244,7 @@ class azure_node_scenarios(abstract_node_scenarios):
            self.affected_nodes_status.affected_nodes.append(affected_node)

    # Node scenario to stop the node
-    def node_stop_scenario(self, instance_kill_count, node, timeout):
+    def node_stop_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
@@ -273,7 +271,7 @@ class azure_node_scenarios(abstract_node_scenarios):
            self.affected_nodes_status.affected_nodes.append(affected_node)

    # Node scenario to terminate the node
-    def node_termination_scenario(self, instance_kill_count, node, timeout):
+    def node_termination_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
--- a/krkn/scenario_plugins/node_actions/bm_node_scenarios.py
+++ b/krkn/scenario_plugins/node_actions/bm_node_scenarios.py
@@ -153,7 +153,7 @@ class bm_node_scenarios(abstract_node_scenarios):
        self.node_action_kube_check = node_action_kube_check

    # Node scenario to start the node
-    def node_start_scenario(self, instance_kill_count, node, timeout):
+    def node_start_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
@@ -182,7 +182,7 @@ class bm_node_scenarios(abstract_node_scenarios):
            self.affected_nodes_status.affected_nodes.append(affected_node)

    # Node scenario to stop the node
-    def node_stop_scenario(self, instance_kill_count, node, timeout):
+    def node_stop_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
@@ -210,7 +210,7 @@ class bm_node_scenarios(abstract_node_scenarios):
            self.affected_nodes_status.affected_nodes.append(affected_node)

    # Node scenario to terminate the node
-    def node_termination_scenario(self, instance_kill_count, node, timeout):
+    def node_termination_scenario(self, instance_kill_count, node, timeout, poll_interval):
        logging.info("Node termination scenario is not supported on baremetal")

    # Node scenario to reboot the node
@@ -229,7 +229,7 @@ class bm_node_scenarios(abstract_node_scenarios):
                    nodeaction.wait_for_unknown_status(node, timeout, self.kubecli, affected_node)
                    nodeaction.wait_for_ready_status(node, timeout, self.kubecli, affected_node)
                logging.info("Node with bmc address: %s has been rebooted" % (bmc_addr))
-                logging.info("node_reboot_scenario has been successfuly injected!")
+                logging.info("node_reboot_scenario has been successfully injected!")
            except Exception as e:
                logging.error(
                    "Failed to reboot node instance. Encountered following exception:"
--- a/krkn/scenario_plugins/node_actions/common_node_functions.py
+++ b/krkn/scenario_plugins/node_actions/common_node_functions.py
@@ -11,7 +11,7 @@ def get_node_by_name(node_name_list, kubecli: KrknKubernetes):
    for node_name in node_name_list:
        if node_name not in killable_nodes:
            logging.info(
-                f"Node with provided ${node_name} does not exist or the node might "
+                f"Node with provided {node_name} does not exist or the node might "
                "be in NotReady state."
            )
            return
--- a/krkn/scenario_plugins/node_actions/docker_node_scenarios.py
+++ b/krkn/scenario_plugins/node_actions/docker_node_scenarios.py
@@ -2,49 +2,176 @@ import krkn.scenario_plugins.node_actions.common_node_functions as nodeaction
 from krkn.scenario_plugins.node_actions.abstract_node_scenarios import (
    abstract_node_scenarios,
 )
+import os
+import platform
 import logging
 import docker
 from krkn_lib.k8s import KrknKubernetes
 from krkn_lib.models.k8s import AffectedNode, AffectedNodeStatus

 class Docker:
+    """
+    Container runtime client wrapper supporting both Docker and Podman.
+
+    This class automatically detects and connects to either Docker or Podman
+    container runtimes using the Docker-compatible API. It tries multiple
+    connection methods in order of preference:
+
+    1. Docker Unix socket (unix:///var/run/docker.sock)
+    2. Platform-specific Podman sockets:
+       - macOS: ~/.local/share/containers/podman/machine/podman.sock
+       - Linux rootful: unix:///run/podman/podman.sock
+       - Linux rootless: unix:///run/user/<uid>/podman/podman.sock
+    3. Environment variables (DOCKER_HOST or CONTAINER_HOST)
+
+    The runtime type (docker/podman) is auto-detected and logged for debugging.
+    Supports Kind clusters running on Podman.
+
+    Assisted By: Claude Code
+    """
    def __init__(self):
-        self.client = docker.from_env()
+        self.client = None
+        self.runtime = 'unknown'
+        
+
+        # Try multiple connection methods in order of preference
+        # Supports both Docker and Podman
+        connection_methods = [
+            ('unix:///var/run/docker.sock', 'Docker Unix socket'),
+        ]
+
+        # Add platform-specific Podman sockets
+        if platform.system() == 'Darwin':  # macOS
+            # On macOS, Podman uses podman-machine with socket typically at:
+            # ~/.local/share/containers/podman/machine/podman.sock
+            # This is often symlinked to /var/run/docker.sock
+            podman_machine_sock = os.path.expanduser('~/.local/share/containers/podman/machine/podman.sock')
+            if os.path.exists(podman_machine_sock):
+                connection_methods.append((f'unix://{podman_machine_sock}', 'Podman machine socket (macOS)'))
+        else:  # Linux
+            connection_methods.extend([
+                ('unix:///run/podman/podman.sock', 'Podman Unix socket (rootful)'),
+                ('unix:///run/user/{uid}/podman/podman.sock', 'Podman Unix socket (rootless)'),
+            ])
+
+        # Always try from_env as last resort
+        connection_methods.append(('from_env', 'Environment variables (DOCKER_HOST/CONTAINER_HOST)'))
+
+        for method, description in connection_methods:
+            try:
+                # Handle rootless Podman socket path with {uid} placeholder
+                if '{uid}' in method:
+                    uid = os.getuid()
+                    method = method.format(uid=uid)
+                    # The resolved socket path is logged below, just before connecting
+
+                if method == 'from_env':
+                    logging.info(f'Attempting to connect using {description}')
+                    self.client = docker.from_env()
+                else:
+                    logging.info(f'Attempting to connect using {description}: {method}')
+                    self.client = docker.DockerClient(base_url=method)
+
+                # Test the connection
+                self.client.ping()
+
+                # Detect runtime type
+                try:
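+                    version_info = self.client.version()
+                    version_str = version_info.get('Version', '')
+                    # Podman's Docker-compatible API names itself in Components (e.g. "Podman Engine")
+                    components = ' '.join(c.get('Name', '') for c in version_info.get('Components') or [])
+                    detected = f'{components} {version_str}'.lower()
+                    self.runtime = 'podman' if 'podman' in detected else 'docker'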
+                    logging.debug(f'Runtime version info: {version_str}')
+                except Exception as version_err:
+                    logging.warning(f'Could not detect runtime version: {version_err}')
+                    self.runtime = 'unknown'
+
+                logging.info(f'Successfully connected to {self.runtime} using {description}')
+
+                # Log available containers for debugging
+                try:
+                    containers = self.client.containers.list(all=True)
+                    logging.info(f'Found {len(containers)} total containers')
+                    for container in containers[:5]:  # Log first 5
+                        logging.debug(f'  Container: {container.name} ({container.status})')
+                except Exception as list_err:
+                    logging.warning(f'Could not list containers: {list_err}')
+
+                break
+
+            except Exception as e:
+                logging.warning(f'Failed to connect using {description}: {e}')
+                continue
+
+        if self.client is None:
+            error_msg = 'Failed to initialize container runtime client (Docker/Podman) with any connection method'
+            logging.error(error_msg)
+            logging.error('Attempted connection methods:')
+            for method, desc in connection_methods:
+                logging.error(f'  - {desc}: {method}')
+            raise RuntimeError(error_msg)
+
+        logging.info(f'Container runtime client initialized successfully: {self.runtime}')

    def get_container_id(self, node_name):
+        """Get the container ID for a given node name."""
        container = self.client.containers.get(node_name)
+        logging.info(f'Found {self.runtime} container for node {node_name}: {container.id}')
        return container.id

    # Start the node instance
    def start_instances(self, node_name):
+        """Start a container instance (works with both Docker and Podman)."""
+        logging.info(f'Starting {self.runtime} container for node: {node_name}')
        container = self.client.containers.get(node_name)
        container.start()
+        logging.info(f'Container {container.id} started successfully')

    # Stop the node instance
    def stop_instances(self, node_name):
+        """Stop a container instance (works with both Docker and Podman)."""
+        logging.info(f'Stopping {self.runtime} container for node: {node_name}')
        container = self.client.containers.get(node_name)
        container.stop()
+        logging.info(f'Container {container.id} stopped successfully')

    # Reboot the node instance
    def reboot_instances(self, node_name):
+        """Restart a container instance (works with both Docker and Podman)."""
+        logging.info(f'Restarting {self.runtime} container for node: {node_name}')
        container = self.client.containers.get(node_name)
        container.restart()
+        logging.info(f'Container {container.id} restarted successfully')

    # Terminate the node instance
    def terminate_instances(self, node_name):
+        """Stop and remove a container instance (works with both Docker and Podman)."""
+        logging.info(f'Terminating {self.runtime} container for node: {node_name}')
        container = self.client.containers.get(node_name)
        container.stop()
        container.remove()
+        logging.info(f'Container {container.id} terminated and removed successfully')


 class docker_node_scenarios(abstract_node_scenarios):
+    """
+    Node chaos scenarios for containerized Kubernetes nodes.
+
+    Supports both Docker and Podman container runtimes. This class provides
+    methods to inject chaos into Kubernetes nodes running as containers
+    (e.g., Kind clusters, Podman-based clusters).
+    """
    def __init__(self, kubecli: KrknKubernetes, node_action_kube_check: bool, affected_nodes_status: AffectedNodeStatus):
+        logging.info('Initializing docker_node_scenarios (supports Docker and Podman)')
        super().__init__(kubecli, node_action_kube_check, affected_nodes_status)
        self.docker = Docker()
        self.node_action_kube_check = node_action_kube_check
+        logging.info(f'Node scenarios initialized successfully using {self.docker.runtime} runtime')

    # Node scenario to start the node
-    def node_start_scenario(self, instance_kill_count, node, timeout):
+    def node_start_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
@@ -71,7 +198,7 @@ class docker_node_scenarios(abstract_node_scenarios):
            self.affected_nodes_status.affected_nodes.append(affected_node)

    # Node scenario to stop the node
-    def node_stop_scenario(self, instance_kill_count, node, timeout):
+    def node_stop_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
@@ -97,7 +224,7 @@ class docker_node_scenarios(abstract_node_scenarios):
            self.affected_nodes_status.affected_nodes.append(affected_node)

    # Node scenario to terminate the node
-    def node_termination_scenario(self, instance_kill_count, node, timeout):
+    def node_termination_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            try:
                logging.info("Starting node_termination_scenario injection")
@@ -110,7 +237,7 @@ class docker_node_scenarios(abstract_node_scenarios):
                logging.info(
                    "Node with container ID: %s has been terminated" % (container_id)
                )
-                logging.info("node_termination_scenario has been successfuly injected!")
+                logging.info("node_termination_scenario has been successfully injected!")
            except Exception as e:
                logging.error(
                    "Failed to terminate node instance. Encountered following exception:"
@@ -137,7 +264,7 @@ class docker_node_scenarios(abstract_node_scenarios):
                logging.info(
                    "Node with container ID: %s has been rebooted" % (container_id)
                )
-                logging.info("node_reboot_scenario has been successfuly injected!")
+                logging.info("node_reboot_scenario has been successfully injected!")
            except Exception as e:
                logging.error(
                    "Failed to reboot node instance. Encountered following exception:"
--- a/krkn/scenario_plugins/node_actions/gcp_node_scenarios.py
+++ b/krkn/scenario_plugins/node_actions/gcp_node_scenarios.py
@@ -227,7 +227,7 @@ class gcp_node_scenarios(abstract_node_scenarios):
        self.node_action_kube_check = node_action_kube_check

    # Node scenario to start the node
-    def node_start_scenario(self, instance_kill_count, node, timeout):
+    def node_start_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
@@ -257,7 +257,7 @@ class gcp_node_scenarios(abstract_node_scenarios):
            self.affected_nodes_status.affected_nodes.append(affected_node)

    # Node scenario to stop the node
-    def node_stop_scenario(self, instance_kill_count, node, timeout):
+    def node_stop_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
@@ -286,7 +286,7 @@ class gcp_node_scenarios(abstract_node_scenarios):
            self.affected_nodes_status.affected_nodes.append(affected_node)

    # Node scenario to terminate the node
-    def node_termination_scenario(self, instance_kill_count, node, timeout):
+    def node_termination_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
@@ -309,7 +309,7 @@ class gcp_node_scenarios(abstract_node_scenarios):
                logging.info(
                    "Node with instance ID: %s has been terminated" % instance_id
                )
-                logging.info("node_termination_scenario has been successfuly injected!")
+                logging.info("node_termination_scenario has been successfully injected!")
            except Exception as e:
                logging.error(
                    "Failed to terminate node instance. Encountered following exception:"
@@ -341,7 +341,7 @@ class gcp_node_scenarios(abstract_node_scenarios):
                logging.info(
                    "Node with instance ID: %s has been rebooted" % instance_id
                )
-                logging.info("node_reboot_scenario has been successfuly injected!")
+                logging.info("node_reboot_scenario has been successfully injected!")
            except Exception as e:
                logging.error(
                    "Failed to reboot node instance. Encountered following exception:"
--- a/krkn/scenario_plugins/node_actions/general_cloud_node_scenarios.py
+++ b/krkn/scenario_plugins/node_actions/general_cloud_node_scenarios.py
@@ -18,21 +18,21 @@ class general_node_scenarios(abstract_node_scenarios):
        self.node_action_kube_check = node_action_kube_check

    # Node scenario to start the node
-    def node_start_scenario(self, instance_kill_count, node, timeout):
+    def node_start_scenario(self, instance_kill_count, node, timeout, poll_interval):
        logging.info(
            "Node start is not set up yet for this cloud type, "
            "no action is going to be taken"
        )

    # Node scenario to stop the node
-    def node_stop_scenario(self, instance_kill_count, node, timeout):
+    def node_stop_scenario(self, instance_kill_count, node, timeout, poll_interval):
        logging.info(
            "Node stop is not set up yet for this cloud type,"
            " no action is going to be taken"
        )

    # Node scenario to terminate the node
-    def node_termination_scenario(self, instance_kill_count, node, timeout):
+    def node_termination_scenario(self, instance_kill_count, node, timeout, poll_interval):
        logging.info(
            "Node termination is not set up yet for this cloud type, "
            "no action is going to be taken"
--- a/krkn/scenario_plugins/node_actions/ibmcloud_node_scenarios.py
+++ b/krkn/scenario_plugins/node_actions/ibmcloud_node_scenarios.py
@@ -284,7 +284,7 @@ class ibm_node_scenarios(abstract_node_scenarios):
        
        self.node_action_kube_check = node_action_kube_check

-    def node_start_scenario(self, instance_kill_count, node, timeout):
+    def node_start_scenario(self, instance_kill_count, node, timeout, poll_interval):
        try:
            instance_id = self.ibmcloud.get_instance_id( node)
            affected_node = AffectedNode(node, node_id=instance_id)
@@ -317,7 +317,7 @@ class ibm_node_scenarios(abstract_node_scenarios):
        self.affected_nodes_status.affected_nodes.append(affected_node)


-    def node_stop_scenario(self, instance_kill_count, node, timeout):
+    def node_stop_scenario(self, instance_kill_count, node, timeout, poll_interval):
        try:
            instance_id = self.ibmcloud.get_instance_id(node)
            for _ in range(instance_kill_count):
@@ -327,14 +327,20 @@ class ibm_node_scenarios(abstract_node_scenarios):
                vm_stopped = self.ibmcloud.stop_instances(instance_id)
                if vm_stopped:
                    self.ibmcloud.wait_until_stopped(instance_id, timeout, affected_node)
-                logging.info(
-                    "Node with instance ID: %s is in stopped state" % node
-                )
-                logging.info(
-                    "node_stop_scenario has been successfully injected!"
-                )
+                    logging.info(
+                        "Node %s (instance ID: %s) is in stopped state" % (node, instance_id)
+                    )
+                    logging.info(
+                        "node_stop_scenario has been successfully injected!"
+                    )
+                else:
+                    logging.error(
+                        "Failed to stop node instance %s. Stop command failed." % instance_id
+                    )
+                    raise Exception("Stop command failed for instance %s" % instance_id)
+                self.affected_nodes_status.affected_nodes.append(affected_node)
        except Exception as e:
-            logging.error("Failed to stop node instance. Test Failed")
+            logging.error("Failed to stop node instance. Test Failed: %s" % str(e))
            logging.error("node_stop_scenario injection failed!")


@@ -345,28 +351,35 @@ class ibm_node_scenarios(abstract_node_scenarios):
                affected_node = AffectedNode(node, node_id=instance_id)
                logging.info("Starting node_reboot_scenario injection")
                logging.info("Rebooting the node %s " % (node))
-                self.ibmcloud.reboot_instances(instance_id)
-                self.ibmcloud.wait_until_rebooted(instance_id, timeout, affected_node)
-                if self.node_action_kube_check:
-                    nodeaction.wait_for_unknown_status(
-                        node, timeout, affected_node
+                vm_rebooted = self.ibmcloud.reboot_instances(instance_id)
+                if vm_rebooted:
+                    self.ibmcloud.wait_until_rebooted(instance_id, timeout, affected_node)
+                    if self.node_action_kube_check:
+                        nodeaction.wait_for_unknown_status(
+                            node, timeout, self.kubecli, affected_node
+                        )
+                        nodeaction.wait_for_ready_status(
+                            node, timeout, self.kubecli, affected_node
+                        )
+                    logging.info(
+                        "Node %s (instance ID: %s) has rebooted successfully" % (node, instance_id)
                    )
-                    nodeaction.wait_for_ready_status(
-                        node, timeout, affected_node
+                    logging.info(
+                        "node_reboot_scenario has been successfully injected!"
                    )
-                logging.info(
-                    "Node with instance ID: %s has rebooted successfully" % node
-                )
-                logging.info(
-                    "node_reboot_scenario has been successfully injected!"
-                )
+                else:
+                    logging.error(
+                        "Failed to reboot node instance %s. Reboot command failed." % instance_id
+                    )
+                    raise Exception("Reboot command failed for instance %s" % instance_id)
+                self.affected_nodes_status.affected_nodes.append(affected_node)

        except Exception as e:
-            logging.error("Failed to reboot node instance. Test Failed")
+            logging.error("Failed to reboot node instance. Test Failed: %s" % str(e))
            logging.error("node_reboot_scenario injection failed!")


-    def node_terminate_scenario(self, instance_kill_count, node, timeout):
+    def node_terminate_scenario(self, instance_kill_count, node, timeout, poll_interval):
        try:
            instance_id = self.ibmcloud.get_instance_id(node)
            for _ in range(instance_kill_count):
@@ -383,7 +396,8 @@ class ibm_node_scenarios(abstract_node_scenarios):
                logging.info(
                    "node_terminate_scenario has been successfully injected!"
                )
+                self.affected_nodes_status.affected_nodes.append(affected_node)
        except Exception as e:
-            logging.error("Failed to terminate node instance. Test Failed")
+            logging.error("Failed to terminate node instance. Test Failed: %s" % str(e))
            logging.error("node_terminate_scenario injection failed!")

--- a/krkn/scenario_plugins/node_actions/ibmcloud_power_node_scenarios.py
+++ b/krkn/scenario_plugins/node_actions/ibmcloud_power_node_scenarios.py
@@ -298,7 +298,7 @@ class ibmcloud_power_node_scenarios(abstract_node_scenarios):
        
        self.node_action_kube_check = node_action_kube_check

-    def node_start_scenario(self, instance_kill_count, node, timeout):
+    def node_start_scenario(self, instance_kill_count, node, timeout, poll_interval):
        try:
            instance_id = self.ibmcloud_power.get_instance_id( node)
            affected_node = AffectedNode(node, node_id=instance_id)
@@ -331,7 +331,7 @@ class ibmcloud_power_node_scenarios(abstract_node_scenarios):
        self.affected_nodes_status.affected_nodes.append(affected_node)


-    def node_stop_scenario(self, instance_kill_count, node, timeout):
+    def node_stop_scenario(self, instance_kill_count, node, timeout, poll_interval):
        try:
            instance_id = self.ibmcloud_power.get_instance_id(node)
            for _ in range(instance_kill_count):
@@ -380,7 +380,7 @@ class ibmcloud_power_node_scenarios(abstract_node_scenarios):
            logging.error("node_reboot_scenario injection failed!")


-    def node_terminate_scenario(self, instance_kill_count, node, timeout):
+    def node_terminate_scenario(self, instance_kill_count, node, timeout, poll_interval):
        try:
            instance_id = self.ibmcloud_power.get_instance_id(node)
            for _ in range(instance_kill_count):
--- a/krkn/scenario_plugins/node_actions/node_actions_scenario_plugin.py
+++ b/krkn/scenario_plugins/node_actions/node_actions_scenario_plugin.py
@@ -196,13 +196,11 @@ class NodeActionsScenarioPlugin(AbstractScenarioPlugin):
                exclude_nodes = common_node_functions.get_node(
                    exclude_label, 0, kubecli
                )
-
-                for node in nodes:
-                    if node in exclude_nodes:
-                        logging.info(
-                            f"excluding node {node} with exclude label {exclude_nodes}"
-                        )
-                        nodes.remove(node)
+                if exclude_nodes:
+                    logging.info(
+                        f"excluding nodes {exclude_nodes} with exclude label {exclude_label}"
+                    )
+                nodes = [node for node in nodes if node not in exclude_nodes]

        # GCP api doesn't support multiprocessing calls, will only actually run 1
        if parallel_nodes:
@@ -236,7 +234,7 @@ class NodeActionsScenarioPlugin(AbstractScenarioPlugin):
        # Get the scenario specifics for running action nodes
        run_kill_count = get_yaml_item_value(node_scenario, "runs", 1)
        duration = get_yaml_item_value(node_scenario, "duration", 120)
-
+        poll_interval = get_yaml_item_value(node_scenario, "poll_interval", 15)
        timeout = get_yaml_item_value(node_scenario, "timeout", 120)
        service = get_yaml_item_value(node_scenario, "service", "")
        soft_reboot = get_yaml_item_value(node_scenario, "soft_reboot", False)
@@ -254,19 +252,19 @@ class NodeActionsScenarioPlugin(AbstractScenarioPlugin):
        else:
            if action == "node_start_scenario":
                node_scenario_object.node_start_scenario(
-                    run_kill_count, single_node, timeout
+                    run_kill_count, single_node, timeout, poll_interval
                )
            elif action == "node_stop_scenario":
                node_scenario_object.node_stop_scenario(
-                    run_kill_count, single_node, timeout
+                    run_kill_count, single_node, timeout, poll_interval
                )
            elif action == "node_stop_start_scenario":
                node_scenario_object.node_stop_start_scenario(
-                    run_kill_count, single_node, timeout, duration
+                    run_kill_count, single_node, timeout, duration, poll_interval
                )
            elif action == "node_termination_scenario":
                node_scenario_object.node_termination_scenario(
-                    run_kill_count, single_node, timeout
+                    run_kill_count, single_node, timeout, poll_interval
                )
            elif action == "node_reboot_scenario":
                node_scenario_object.node_reboot_scenario(
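The `nodes.remove(node)` loop replaced in `node_actions_scenario_plugin.py` above skips the element that follows each removal, because the list shifts left under the active iterator. Two adjacent excluded nodes are enough to leak one of them into the chaos target set. The following is a minimal, self-contained sketch contrasting the two patterns. The helper names and node names are hypothetical and are not part of krkn.

```python
def exclude_in_place(nodes, exclude_nodes):
    """Pre-fix pattern (hypothetical helper): mutates the list while iterating it."""
    for node in nodes:
        if node in exclude_nodes:
            # Removing shifts later items left, so the iterator skips the next one.
            nodes.remove(node)
    return nodes


def exclude_with_comprehension(nodes, exclude_nodes):
    """Post-fix pattern (hypothetical helper): builds a new list, input untouched."""
    return [node for node in nodes if node not in exclude_nodes]


if __name__ == "__main__":
    # Hypothetical node names; two adjacent nodes carry the exclude label.
    nodes = ["worker-0", "worker-1", "worker-2", "worker-3"]
    excluded = ["worker-1", "worker-2"]
    print(exclude_in_place(list(nodes), excluded))  # ['worker-0', 'worker-2', 'worker-3']
    print(exclude_with_comprehension(nodes, excluded))  # ['worker-0', 'worker-3']
```

Building a new list also avoids mutating a list that the caller may still hold a reference to.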
--- a/krkn/scenario_plugins/node_actions/openstack_node_scenarios.py
+++ b/krkn/scenario_plugins/node_actions/openstack_node_scenarios.py
@@ -122,7 +122,7 @@ class openstack_node_scenarios(abstract_node_scenarios):
        self.node_action_kube_check = node_action_kube_check
    
    # Node scenario to start the node
-    def node_start_scenario(self, instance_kill_count, node, timeout):
+    def node_start_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
@@ -147,7 +147,7 @@ class openstack_node_scenarios(abstract_node_scenarios):
            self.affected_nodes_status.affected_nodes.append(affected_node)

    # Node scenario to stop the node
-    def node_stop_scenario(self, instance_kill_count, node, timeout):
+    def node_stop_scenario(self, instance_kill_count, node, timeout, poll_interval):
        for _ in range(instance_kill_count):
            affected_node = AffectedNode(node)
            try:
@@ -184,7 +184,7 @@ class openstack_node_scenarios(abstract_node_scenarios):
                    nodeaction.wait_for_unknown_status(node, timeout, self.kubecli, affected_node)
                    nodeaction.wait_for_ready_status(node, timeout, self.kubecli, affected_node)
                logging.info("Node with instance name: %s has been rebooted" % (node))
-                logging.info("node_reboot_scenario has been successfuly injected!")
+                logging.info("node_reboot_scenario has been successfully injected!")
            except Exception as e:
                logging.error(
                    "Failed to reboot node instance. Encountered following exception:"
@@ -249,7 +249,7 @@ class openstack_node_scenarios(abstract_node_scenarios):
                node_ip.strip(), service, ssh_private_key, timeout
            )
            logging.info("Service status checked on %s" % (node_ip))
-            logging.info("Check service status is successfuly injected!")
+            logging.info("Check service status is successfully injected!")
        except Exception as e:
            logging.error(
                "Failed to check service status. Encountered following exception:"
--- a/krkn/scenario_plugins/node_actions/vmware_node_scenarios.py
+++ b/krkn/scenario_plugins/node_actions/vmware_node_scenarios.py
@@ -389,7 +389,7 @@ class vmware_node_scenarios(abstract_node_scenarios):
        self.vsphere = vSphere()
        self.node_action_kube_check = node_action_kube_check

-    def node_start_scenario(self, instance_kill_count, node, timeout):
+    def node_start_scenario(self, instance_kill_count, node, timeout, poll_interval):
        try:
            for _ in range(instance_kill_count):
                affected_node = AffectedNode(node)
@@ -409,7 +409,7 @@ class vmware_node_scenarios(abstract_node_scenarios):
                f"node_start_scenario injection failed! " f"Error was: {str(e)}"
            )

-    def node_stop_scenario(self, instance_kill_count, node, timeout):
+    def node_stop_scenario(self, instance_kill_count, node, timeout, poll_interval):
        try:
            for _ in range(instance_kill_count):
                affected_node = AffectedNode(node)
@@ -456,7 +456,7 @@ class vmware_node_scenarios(abstract_node_scenarios):
            )


-    def node_terminate_scenario(self, instance_kill_count, node, timeout):
+    def node_terminate_scenario(self, instance_kill_count, node, timeout, poll_interval):
        try:
            for _ in range(instance_kill_count):
                affected_node = AffectedNode(node)
--- a/krkn/scenario_plugins/pod_disruption/pod_disruption_scenario_plugin.py
+++ b/krkn/scenario_plugins/pod_disruption/pod_disruption_scenario_plugin.py
@@ -2,7 +2,7 @@ import logging
 import random
 import time
 from asyncio import Future
-
+import traceback
 import yaml
 from krkn_lib.k8s import KrknKubernetes
 from krkn_lib.k8s.pod_monitor import select_and_monitor_by_namespace_pattern_and_label, \
@@ -11,6 +11,7 @@ from krkn_lib.k8s.pod_monitor import select_and_monitor_by_namespace_pattern_and
 from krkn.scenario_plugins.pod_disruption.models.models import InputParams
 from krkn_lib.models.telemetry import ScenarioTelemetry
 from krkn_lib.telemetry.ocp import KrknTelemetryOpenshift
+from krkn_lib.models.pod_monitor.models import PodsSnapshot
 from datetime import datetime
 from dataclasses import dataclass

@@ -40,10 +41,27 @@ class PodDisruptionScenarioPlugin(AbstractScenarioPlugin):
                        kill_scenario_config,
                        lib_telemetry
                    )
-                    self.killing_pods(
+                    ret = self.killing_pods(
                        kill_scenario_config, lib_telemetry.get_lib_kubernetes()
                    )
+                    # killing_pods returns 2 on a configuration issue: stop monitoring and exit immediately
+                    if ret > 1:
+                        # Cancel the monitoring future since killing_pods already failed
+                        logging.info("Cancelling pod monitoring future")
+                        future_snapshot.cancel()
+                        # Wait for the future to finish (monitoring will stop when stop_event is set)
+                        while not future_snapshot.done():
+                            logging.info("waiting for future to finish")
+                            time.sleep(1)
+                        logging.info("future snapshot cancelled and finished")
+                        # Get the snapshot result (even if cancelled, it will have partial data)
+                        snapshot = future_snapshot.result()
+                        result = snapshot.get_pods_status()
+                        scenario_telemetry.affected_pods = result

+                        logging.error("PodDisruptionScenarioPlugin failed during setup: " + str(result))
+                        return 1
+                    
                    snapshot = future_snapshot.result()
                    result = snapshot.get_pods_status()
                    scenario_telemetry.affected_pods = result
@@ -51,7 +69,12 @@ class PodDisruptionScenarioPlugin(AbstractScenarioPlugin):
                        logging.info("PodDisruptionScenarioPlugin failed with unrecovered pods")
                        return 1

+                    if ret > 0:
+                        logging.error("PodDisruptionScenarioPlugin failed")
+                        return 1
+                    
        except (RuntimeError, Exception) as e:
+            logging.error("Stack trace:\n%s", traceback.format_exc())
            logging.error("PodDisruptionScenariosPlugin exiting due to Exception %s" % e)
            return 1
        else:
@@ -128,7 +151,7 @@ class PodDisruptionScenarioPlugin(AbstractScenarioPlugin):
                field_selector=combined_field_selector
            )

-    def get_pods(self, name_pattern, label_selector, namespace, kubecli: KrknKubernetes, field_selector: str = None, node_label_selector: str = None, node_names: list = None, quiet: bool = False): 
+    def get_pods(self, name_pattern, label_selector, namespace, kubecli: KrknKubernetes, field_selector: str = None, node_label_selector: str = None, node_names: list = None): 
        if label_selector and name_pattern: 
            logging.error('Only, one of name pattern or label pattern can be specified')
            return []
@@ -139,8 +162,7 @@ class PodDisruptionScenarioPlugin(AbstractScenarioPlugin):
        
        # If specific node names are provided, make multiple calls with field selector
        if node_names:
-            if not quiet:
-                logging.info(f"Targeting pods on {len(node_names)} specific nodes")
+            logging.debug(f"Targeting pods on {len(node_names)} specific nodes")
            all_pods = []
            for node_name in node_names:
                pods = self._select_pods_with_field_selector(
@@ -150,8 +172,7 @@ class PodDisruptionScenarioPlugin(AbstractScenarioPlugin):
                if pods:
                    all_pods.extend(pods)
            
-            if not quiet:
-                logging.info(f"Found {len(all_pods)} target pods across {len(node_names)} nodes")
+            logging.debug(f"Found {len(all_pods)} target pods across {len(node_names)} nodes")
            return all_pods
        
        #  Node label selector approach - use field selectors
@@ -159,11 +180,10 @@ class PodDisruptionScenarioPlugin(AbstractScenarioPlugin):
            # Get nodes matching the label selector first
            nodes_with_label = kubecli.list_nodes(label_selector=node_label_selector)
            if not nodes_with_label:
-                logging.info(f"No nodes found with label selector: {node_label_selector}")
+                logging.debug(f"No nodes found with label selector: {node_label_selector}")
                return []
            
-            if not quiet:
-                logging.info(f"Targeting pods on {len(nodes_with_label)} nodes with label: {node_label_selector}")
+            logging.debug(f"Targeting pods on {len(nodes_with_label)} nodes with label: {node_label_selector}")
            # Use field selector for each node
            all_pods = []
            for node_name in nodes_with_label:
@@ -174,8 +194,7 @@ class PodDisruptionScenarioPlugin(AbstractScenarioPlugin):
                if pods:
                    all_pods.extend(pods)
            
-            if not quiet:
-                logging.info(f"Found {len(all_pods)} target pods across {len(nodes_with_label)} nodes")
+            logging.debug(f"Found {len(all_pods)} target pods across {len(nodes_with_label)} nodes")
            return all_pods
        
        # Standard pod selection (no node targeting)
@@ -185,37 +204,40 @@ class PodDisruptionScenarioPlugin(AbstractScenarioPlugin):
    
    def killing_pods(self, config: InputParams, kubecli: KrknKubernetes):
        # region Select target pods
+        try:
+            namespace = config.namespace_pattern
+            if not namespace: 
+                logging.error('Namespace pattern must be specified')
+
+            pods = self.get_pods(config.name_pattern,config.label_selector,config.namespace_pattern, kubecli, field_selector="status.phase=Running", node_label_selector=config.node_label_selector, node_names=config.node_names)
+            exclude_pods = set()
+            if config.exclude_label:
+                _exclude_pods = self.get_pods("",config.exclude_label,config.namespace_pattern, kubecli, field_selector="status.phase=Running", node_label_selector=config.node_label_selector, node_names=config.node_names)
+                for pod in _exclude_pods:
+                    exclude_pods.add(pod[0])
+
+
+            pods_count = len(pods)
+            if len(pods) < config.kill:
+                logging.error("Not enough pods match the criteria, expected {} but found only {} pods".format(
+                        config.kill, len(pods)))
+                return 2
            
-        namespace = config.namespace_pattern
-        if not namespace: 
-            logging.error('Namespace pattern must be specified')
+            random.shuffle(pods)
+            for i in range(config.kill):
+                pod = pods[i]
+                logging.info(pod)
+                if pod[0] in exclude_pods:
+                    logging.info(f"Excluding {pod[0]} from chaos")
+                else:
+                    logging.info(f'Deleting pod {pod[0]}')
+                    kubecli.delete_pod(pod[0], pod[1])
+            
+            return_val = self.wait_for_pods(config.label_selector,config.name_pattern,config.namespace_pattern, pods_count, config.duration, config.timeout, kubecli, config.node_label_selector, config.node_names)
+        except Exception:
+            raise

-        pods = self.get_pods(config.name_pattern,config.label_selector,config.namespace_pattern, kubecli, field_selector="status.phase=Running", node_label_selector=config.node_label_selector, node_names=config.node_names)
-        exclude_pods = set()
-        if config.exclude_label:
-            _exclude_pods = self.get_pods("",config.exclude_label,config.namespace_pattern, kubecli, field_selector="status.phase=Running", node_label_selector=config.node_label_selector, node_names=config.node_names)
-            for pod in _exclude_pods:
-                exclude_pods.add(pod[0])
-
-
-        pods_count = len(pods)
-        if len(pods) < config.kill:
-            logging.error("Not enough pods match the criteria, expected {} but found only {} pods".format(
-                    config.kill, len(pods)))
-            return 1
-        
-        random.shuffle(pods)
-        for i in range(config.kill):
-            pod = pods[i]
-            logging.info(pod)
-            if pod[0] in exclude_pods:
-                logging.info(f"Excluding {pod[0]} from chaos")
-            else:
-                logging.info(f'Deleting pod {pod[0]}')
-                kubecli.delete_pod(pod[0], pod[1])
-        
-        self.wait_for_pods(config.label_selector,config.name_pattern,config.namespace_pattern, pods_count, config.duration, config.timeout, kubecli, config.node_label_selector, config.node_names)
-        return 0
+        return return_val

    def wait_for_pods(
        self, label_selector, pod_name, namespace, pod_count, duration, wait_timeout, kubecli: KrknKubernetes, node_label_selector, node_names
@@ -224,10 +246,10 @@ class PodDisruptionScenarioPlugin(AbstractScenarioPlugin):
        start_time = datetime.now()

        while not timeout:
-            pods = self.get_pods(name_pattern=pod_name, label_selector=label_selector,namespace=namespace, field_selector="status.phase=Running", kubecli=kubecli, node_label_selector=node_label_selector, node_names=node_names, quiet=True)
+            pods = self.get_pods(name_pattern=pod_name, label_selector=label_selector,namespace=namespace, field_selector="status.phase=Running", kubecli=kubecli, node_label_selector=node_label_selector, node_names=node_names)
            if pod_count == len(pods):
-                return
-               
+                return 0
+            
            time.sleep(duration)

            now_time = datetime.now()
@@ -236,4 +258,5 @@ class PodDisruptionScenarioPlugin(AbstractScenarioPlugin):
            if time_diff.seconds > wait_timeout:
                logging.error("timeout while waiting for pods to come up")
                return 1
+
        return 0
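The early-exit path in `run()` calls `future_snapshot.cancel()` and then `future_snapshot.result()`. The concrete future type comes from `krkn_lib` and is not visible in this diff. The file also imports `asyncio.Future`, whose cancellation semantics differ. Assuming it behaves like a standard `concurrent.futures.Future`, `cancel()` is a no-op once the monitor is already running, so only the monitor's own stop signal ends it. Conversely, `result()` on a future that was cancelled before it started raises `CancelledError` instead of returning partial data. The sketch below shows both cases. The `monitor` function and the `started`/`stop` events are hypothetical stand-ins, not krkn_lib APIs.

```python
import threading
import time
from concurrent.futures import CancelledError, ThreadPoolExecutor


def monitor(started: threading.Event, stop: threading.Event) -> list:
    """Hypothetical stand-in for a pod monitor: records snapshots until `stop` is set."""
    snapshots = [time.monotonic()]  # always capture at least one snapshot
    started.set()
    while not stop.is_set():
        snapshots.append(time.monotonic())
        time.sleep(0.01)
    return snapshots


def demo() -> dict:
    started, stop = threading.Event(), threading.Event()
    with ThreadPoolExecutor(max_workers=1) as pool:
        running = pool.submit(monitor, started, stop)
        started.wait()  # the first future is now RUNNING
        # The single worker is busy, so this second future stays PENDING.
        pending = pool.submit(monitor, threading.Event(), stop)

        outcome = {
            "running_cancelled": running.cancel(),  # False: already running
            "pending_cancelled": pending.cancel(),  # True: never started
        }
        stop.set()  # only the stop signal actually ends the running monitor
        outcome["partial_snapshots"] = len(running.result())
        try:
            pending.result()
            outcome["pending_raised"] = False
        except CancelledError:
            outcome["pending_raised"] = True
    return outcome


if __name__ == "__main__":
    print(demo())
```

Under this assumption, the `while not future_snapshot.done()` loop in `run()` only terminates because the monitor stops on its own. If the monitor future could still be pending at that point, wrapping `result()` in a `CancelledError` handler would be the defensive choice.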
--- a/krkn/scenario_plugins/pvc/pvc_scenario_plugin.py
+++ b/krkn/scenario_plugins/pvc/pvc_scenario_plugin.py
@@ -1,3 +1,5 @@
+import base64
+import json
 import logging
 import random
 import re
@@ -11,9 +13,12 @@ from krkn_lib.utils import get_yaml_item_value, log_exception

 from krkn import cerberus, utils
 from krkn.scenario_plugins.abstract_scenario_plugin import AbstractScenarioPlugin
+from krkn.rollback.config import RollbackContent
+from krkn.rollback.handler import set_rollback_context_decorator


 class PvcScenarioPlugin(AbstractScenarioPlugin):
+    @set_rollback_context_decorator
    def run(
        self,
        run_uuid: str,
@@ -229,6 +234,24 @@ class PvcScenarioPlugin(AbstractScenarioPlugin):
                logging.info("\n" + str(response))
                if str(file_name).lower() in str(response).lower():
                    logging.info("%s file successfully created" % (str(full_path)))
+                    
+                    # Set rollback callable to ensure temp file cleanup on failure or interruption
+                    rollback_data = {
+                        "pod_name": pod_name,
+                        "container_name": container_name,
+                        "mount_path": mount_path,
+                        "file_name": file_name,
+                        "full_path": full_path,
+                    }
+                    json_str = json.dumps(rollback_data)
+                    encoded_data = base64.b64encode(json_str.encode('utf-8')).decode('utf-8')
+                    self.rollback_handler.set_rollback_callable(
+                        self.rollback_temp_file,
+                        RollbackContent(
+                            namespace=namespace,
+                            resource_identifier=encoded_data,
+                        ),
+                    )
                else:
                    logging.error(
                        "PvcScenarioPlugin Failed to create tmp file with %s size"
@@ -313,5 +336,57 @@ class PvcScenarioPlugin(AbstractScenarioPlugin):
        res = int(value[:-2]) * (base**exp)
        return res

+    @staticmethod
+    def rollback_temp_file(
+        rollback_content: RollbackContent,
+        lib_telemetry: KrknTelemetryOpenshift,
+    ):
+        """Rollback function to remove temporary file created during the PVC scenario.
+
+        :param rollback_content: Rollback content containing namespace and encoded rollback data in resource_identifier.
+        :param lib_telemetry: Instance of KrknTelemetryOpenshift for Kubernetes operations.
+        """
+        try:
+            namespace = rollback_content.namespace
+            decoded_data = base64.b64decode(rollback_content.resource_identifier.encode('utf-8')).decode('utf-8')
+            rollback_data = json.loads(decoded_data)
+            pod_name = rollback_data["pod_name"]
+            container_name = rollback_data["container_name"]
+            full_path = rollback_data["full_path"]
+            file_name = rollback_data["file_name"]
+            mount_path = rollback_data["mount_path"]
+            
+            logging.info(
+                f"Rolling back PVC scenario: removing temp file {full_path} from pod {pod_name} in namespace {namespace}"
+            )
+            
+            # Remove the temp file
+            command = "rm -f %s" % (str(full_path))
+            logging.info("Remove temp file from the PVC command:\n %s" % command)
+            response = lib_telemetry.get_lib_kubernetes().exec_cmd_in_pod(
+                [command], pod_name, namespace, container_name
+            )
+            logging.info("\n" + str(response))
+            # Verify removal
+            command = "ls -lh %s" % (str(mount_path))
+            logging.info("Check temp file is removed command:\n %s" % command)
+            response = lib_telemetry.get_lib_kubernetes().exec_cmd_in_pod(
+                [command], pod_name, namespace, container_name
+            )
+            logging.info("\n" + str(response))
+            
+            if str(file_name).lower() not in str(response).lower():
+                logging.info("Temp file successfully removed during rollback")
+            else:
+                logging.warning(
+                    f"Temp file {file_name} may still exist after rollback attempt"
+                )
+            
+            logging.info("PVC scenario rollback completed successfully.")
+        except Exception as e:
+            logging.error(f"Failed to rollback PVC scenario temp file: {e}")
+
    def get_scenario_types(self) -> list[str]:
        return ["pvc_scenarios"]
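`RollbackContent` carries a single `resource_identifier` string, so the PVC plugin packs several fields into it. It JSON-encodes a dict, base64-encodes the UTF-8 bytes, and then reverses both steps inside `rollback_temp_file`. The base64 step presumably keeps the identifier free of quotes, braces, and other characters that could clash with however rollback state is persisted.

The sketch below shows that round trip. `encode_rollback_data` and `decode_rollback_data` are hypothetical helper names; the plugins in this diff inline the logic instead.

```python
import base64
import json
from typing import Any


def encode_rollback_data(data: dict[str, Any]) -> str:
    """Serialize a dict to JSON, then base64-encode it into an ASCII-only string."""
    json_str = json.dumps(data)
    return base64.b64encode(json_str.encode("utf-8")).decode("utf-8")


def decode_rollback_data(encoded: str) -> dict[str, Any]:
    """Reverse encode_rollback_data: base64-decode, then parse the JSON payload."""
    decoded = base64.b64decode(encoded.encode("utf-8")).decode("utf-8")
    return json.loads(decoded)
```

The service-hijacking and SYN-flood plugins later in this diff use the same scheme. A shared helper like this would therefore remove three copies of the encode/decode code.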
--- a/krkn/scenario_plugins/scenario_plugin_factory.py
+++ b/krkn/scenario_plugins/scenario_plugin_factory.py
@@ -1,7 +1,7 @@
 import importlib
 import inspect
 import pkgutil
-from typing import Type, Tuple, Optional
+from typing import Type, Tuple, Optional, Any
 from krkn.scenario_plugins.abstract_scenario_plugin import AbstractScenarioPlugin


@@ -11,7 +11,7 @@ class ScenarioPluginNotFound(Exception):

 class ScenarioPluginFactory:

-    loaded_plugins: dict[str, any] = {}
+    loaded_plugins: dict[str, Any] = {}
    failed_plugins: list[Tuple[str, str, str]] = []
    package_name = None

--- a/krkn/scenario_plugins/service_disruption/service_disruption_scenario_plugin.py
+++ b/krkn/scenario_plugins/service_disruption/service_disruption_scenario_plugin.py
@@ -209,7 +209,7 @@ class ServiceDisruptionScenarioPlugin(AbstractScenarioPlugin):
        try:
            statefulsets = kubecli.get_all_statefulset(namespace)
            for statefulset in statefulsets:
-                logging.info("Deleting statefulsets" + statefulsets)
+                logging.info(f"Deleting statefulset {statefulset}")
                kubecli.delete_statefulset(statefulset, namespace)
        except Exception as e:
            logging.error(
--- a/krkn/scenario_plugins/service_hijacking/service_hijacking_scenario_plugin.py
+++ b/krkn/scenario_plugins/service_hijacking/service_hijacking_scenario_plugin.py
@@ -1,13 +1,17 @@
+import json
 import logging
 import time
-
+import base64
 import yaml
 from krkn_lib.models.telemetry import ScenarioTelemetry
 from krkn_lib.telemetry.ocp import KrknTelemetryOpenshift
 from krkn.scenario_plugins.abstract_scenario_plugin import AbstractScenarioPlugin
 from krkn_lib.utils import get_yaml_item_value
+from krkn.rollback.config import RollbackContent
+from krkn.rollback.handler import set_rollback_context_decorator

 class ServiceHijackingScenarioPlugin(AbstractScenarioPlugin):
+    @set_rollback_context_decorator
    def run(
        self,
        run_uuid: str,
@@ -78,6 +82,24 @@ class ServiceHijackingScenarioPlugin(AbstractScenarioPlugin):

            logging.info(f"service: {service_name} successfully patched!")
            logging.info(f"original service manifest:\n\n{yaml.dump(original_service)}")
+            
+            # Set rollback callable to ensure service restoration and pod cleanup on failure or interruption
+            rollback_data = {
+                "service_name": service_name,
+                "service_namespace": service_namespace,
+                "original_selectors": original_service["spec"]["selector"],
+                "webservice_pod_name": webservice.pod_name,
+            }
+            json_str = json.dumps(rollback_data)
+            encoded_data = base64.b64encode(json_str.encode("utf-8")).decode("utf-8")
+            self.rollback_handler.set_rollback_callable(
+                self.rollback_service_hijacking,
+                RollbackContent(
+                    namespace=service_namespace,
+                    resource_identifier=encoded_data,
+                ),
+            )
+            
            logging.info(f"waiting {chaos_duration} before restoring the service")
            time.sleep(chaos_duration)
            selectors = [
@@ -106,5 +128,63 @@ class ServiceHijackingScenarioPlugin(AbstractScenarioPlugin):
            )
            return 1

+    @staticmethod
+    def rollback_service_hijacking(
+        rollback_content: RollbackContent,
+        lib_telemetry: KrknTelemetryOpenshift,
+    ):
+        """Rollback function to restore original service selectors and cleanup hijacker pod.
+
+        :param rollback_content: Rollback content containing namespace and encoded rollback data in resource_identifier.
+        :param lib_telemetry: Instance of KrknTelemetryOpenshift for Kubernetes operations.
+        """
+        try:
+            namespace = rollback_content.namespace
+            # Decode rollback data from resource_identifier
+            decoded_data = base64.b64decode(rollback_content.resource_identifier.encode("utf-8")).decode("utf-8")
+            rollback_data = json.loads(decoded_data)
+            service_name = rollback_data["service_name"]
+            service_namespace = rollback_data["service_namespace"]
+            original_selectors = rollback_data["original_selectors"]
+            webservice_pod_name = rollback_data["webservice_pod_name"]
+            
+            logging.info(
+                f"Rolling back service hijacking: restoring service {service_name} in namespace {service_namespace}"
+            )
+            
+            # Restore original service selectors
+            selectors = [
+                "=".join([key, original_selectors[key]])
+                for key in original_selectors.keys()
+            ]
+            logging.info(f"Restoring original service selectors: {selectors}")
+            
+            restored_service = lib_telemetry.get_lib_kubernetes().replace_service_selector(
+                selectors, service_name, service_namespace
+            )
+            
+            if restored_service is None:
+                logging.warning(
+                    f"Failed to restore service {service_name} in namespace {service_namespace}"
+                )
+            else:
+                logging.info(f"Successfully restored service {service_name}")
+            
+            # Delete the hijacker pod
+            logging.info(f"Deleting hijacker pod: {webservice_pod_name}")
+            try:
+                lib_telemetry.get_lib_kubernetes().delete_pod(
+                    webservice_pod_name, service_namespace
+                )
+                logging.info(f"Successfully deleted hijacker pod: {webservice_pod_name}")
+            except Exception as e:
+                logging.warning(f"Failed to delete hijacker pod {webservice_pod_name}: {e}")
+            
+            logging.info("Service hijacking rollback completed successfully.")
+        except Exception as e:
+            logging.error(f"Failed to rollback service hijacking: {e}")
+
    def get_scenario_types(self) -> list[str]:
        return ["service_hijacking_scenarios"]
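The rollback restores the service by converting the saved `spec.selector` mapping back into `key=value` strings before it calls `replace_service_selector`. The sketch below shows that conversion; the helper name `selectors_to_list` is hypothetical. Python dicts preserve insertion order, so the output follows the order of the stored selector.

```python
def selectors_to_list(selector: dict[str, str]) -> list[str]:
    """Turn a Service spec.selector mapping into a list of "key=value" strings."""
    return [f"{key}={value}" for key, value in selector.items()]
```

Joining the list with commas, e.g. `",".join(selectors_to_list(sel))`, yields the equality-based label-selector syntax (`app=web,tier=frontend`) that `kubectl -l` accepts.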
--- a/krkn/scenario_plugins/syn_flood/syn_flood_scenario_plugin.py
+++ b/krkn/scenario_plugins/syn_flood/syn_flood_scenario_plugin.py
@@ -1,3 +1,5 @@
+import base64
+import json
 import logging
 import os
 import time
@@ -7,9 +9,12 @@ from krkn_lib import utils as krkn_lib_utils
 from krkn_lib.models.telemetry import ScenarioTelemetry
 from krkn_lib.telemetry.ocp import KrknTelemetryOpenshift
 from krkn.scenario_plugins.abstract_scenario_plugin import AbstractScenarioPlugin
+from krkn.rollback.config import RollbackContent
+from krkn.rollback.handler import set_rollback_context_decorator


 class SynFloodScenarioPlugin(AbstractScenarioPlugin):
+    @set_rollback_context_decorator
    def run(
        self,
        run_uuid: str,
@@ -50,6 +55,16 @@ class SynFloodScenarioPlugin(AbstractScenarioPlugin):
                        config["attacker-nodes"],
                    )
                    pod_names.append(pod_name)
+                
+                # Set rollback callable to ensure pod cleanup on failure or interruption
+                rollback_data = base64.b64encode(json.dumps(pod_names).encode('utf-8')).decode('utf-8')
+                self.rollback_handler.set_rollback_callable(
+                    self.rollback_syn_flood_pods,
+                    RollbackContent(
+                        namespace=config["namespace"],
+                        resource_identifier=rollback_data,
+                    ),
+                )

            logging.info("waiting all the attackers to finish:")
            did_finish = False
@@ -137,3 +152,23 @@ class SynFloodScenarioPlugin(AbstractScenarioPlugin):

    def get_scenario_types(self) -> list[str]:
        return ["syn_flood_scenarios"]
+
+    @staticmethod
+    def rollback_syn_flood_pods(rollback_content: RollbackContent, lib_telemetry: KrknTelemetryOpenshift):
+        """
+        Rollback function to delete syn flood pods.
+
+        :param rollback_content: Rollback content containing namespace and resource_identifier.
+        :param lib_telemetry: Instance of KrknTelemetryOpenshift for Kubernetes operations.
+        """
+        try:
+            namespace = rollback_content.namespace
+            pod_names = json.loads(base64.b64decode(rollback_content.resource_identifier.encode('utf-8')).decode('utf-8'))
+            logging.info(f"Rolling back syn flood pods: {pod_names} in namespace: {namespace}")
+            for pod_name in pod_names:
+                lib_telemetry.get_lib_kubernetes().delete_pod(pod_name, namespace)
+            logging.info("Rollback of syn flood pods completed successfully.")
+        except Exception as e:
+            logging.error(f"Failed to rollback syn flood pods: {e}")
--- a/krkn/scenario_plugins/time_actions/time_actions_scenario_plugin.py
+++ b/krkn/scenario_plugins/time_actions/time_actions_scenario_plugin.py
@@ -43,7 +43,7 @@ class TimeActionsScenarioPlugin(AbstractScenarioPlugin):
                    cerberus.publish_kraken_status(
                        krkn_config, not_reset, start_time, end_time
                    )
-        except (RuntimeError, Exception):
+        except Exception as e:
            logging.error(
                f"TimeActionsScenarioPlugin scenario {scenario} failed with exception: {e}"
            )
@@ -144,6 +144,10 @@ class TimeActionsScenarioPlugin(AbstractScenarioPlugin):
                node_names = scenario["object_name"]
            elif "label_selector" in scenario.keys() and scenario["label_selector"]:
                node_names = kubecli.list_nodes(scenario["label_selector"])
+                # Filter out nodes matching exclude_label, if provided
+                if "exclude_label" in scenario.keys() and scenario["exclude_label"]:
+                    excluded_nodes = kubecli.list_nodes(scenario["exclude_label"])
+                    node_names = [node for node in node_names if node not in excluded_nodes]
            for node in node_names:
                self.skew_node(node, scenario["action"], kubecli)
                logging.info("Reset date/time on node " + str(node))
@@ -189,6 +193,10 @@ class TimeActionsScenarioPlugin(AbstractScenarioPlugin):
                    counter += 1
            elif "label_selector" in scenario.keys() and scenario["label_selector"]:
                pod_names = kubecli.get_all_pods(scenario["label_selector"])
+                # Filter out pods matching exclude_label, if provided
+                if "exclude_label" in scenario.keys() and scenario["exclude_label"]:
+                    excluded_pods = kubecli.get_all_pods(scenario["exclude_label"])
+                    pod_names = [pod for pod in pod_names if pod not in excluded_pods]

            if len(pod_names) == 0:
                logging.info(
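Both new `exclude_label` blocks drop every entry that also appears in the excluded list. The sketch below implements the same filter. It assumes entries are either plain names or small lists such as `[name, namespace]`. This hunk does not show which shape `get_all_pods` returns, so the sketch handles both by comparing lists as tuples.

`exclude_items` is hypothetical. Building a set of excluded keys once keeps the filter linear, instead of scanning the excluded list for every entry.

```python
from collections.abc import Hashable
from typing import Any


def _key(item: Any) -> Hashable:
    # Lists are unhashable, so compare them by their tuple form instead.
    return tuple(item) if isinstance(item, list) else item


def exclude_items(items: list[Any], excluded: list[Any]) -> list[Any]:
    """Return items minus anything that also appears in excluded, keeping order."""
    excluded_keys = {_key(entry) for entry in excluded}  # build the lookup set once
    return [item for item in items if _key(item) not in excluded_keys]
```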
--- a/krkn/scenario_plugins/zone_outage/zone_outage_scenario_plugin.py
+++ b/krkn/scenario_plugins/zone_outage/zone_outage_scenario_plugin.py
@@ -140,7 +140,7 @@ class ZoneOutageScenarioPlugin(AbstractScenarioPlugin):
                network_association_ids[0], acl_id
            )

-            # capture the orginal_acl_id, created_acl_id and
+            # capture the original_acl_id, created_acl_id and
            # new association_id to use during the recovery
            ids[new_association_id] = original_acl_id

@@ -156,7 +156,7 @@ class ZoneOutageScenarioPlugin(AbstractScenarioPlugin):
                new_association_id, original_acl_id
            )
        logging.info(
-            "Wating for 60 seconds to make sure " "the changes are in place"
+            "Waiting for 60 seconds to make sure " "the changes are in place"
        )
        time.sleep(60)

--- a/krkn/tests/test_plugin_factory.py
+++ b/krkn/tests/test_plugin_factory.py
@@ -1,10 +1,17 @@
+import json
+import tempfile
 import unittest
+from pathlib import Path
+from unittest.mock import Mock, patch

 from krkn.scenario_plugins.abstract_scenario_plugin import AbstractScenarioPlugin
 from krkn.scenario_plugins.scenario_plugin_factory import ScenarioPluginFactory
+from krkn.scenario_plugins.native.plugins import PluginStep, Plugins, PLUGINS
 from krkn.tests.test_classes.correct_scenario_plugin import (
    CorrectScenarioPlugin,
 )
+import yaml
+


 class TestPluginFactory(unittest.TestCase):
@@ -108,3 +115,437 @@ class TestPluginFactory(unittest.TestCase):
        self.assertEqual(
            message, "scenario plugin folder cannot contain `scenario` or `plugin` word"
        )
+
+
+class TestPluginStep(unittest.TestCase):
+    """Test cases for PluginStep class"""
+
+    def setUp(self):
+        """Set up test fixtures"""
+        # Create a mock schema
+        self.mock_schema = Mock()
+        self.mock_schema.id = "test_step"
+
+        # Create mock output
+        mock_output = Mock()
+        mock_output.serialize = Mock(return_value={"status": "success", "message": "test"})
+        self.mock_schema.outputs = {
+            "success": mock_output,
+            "error": mock_output
+        }
+
+        self.plugin_step = PluginStep(
+            schema=self.mock_schema,
+            error_output_ids=["error"]
+        )
+
+    def test_render_output(self):
+        """Test render_output method"""
+        output_id = "success"
+        output_data = {"status": "success", "message": "test output"}
+
+        result = self.plugin_step.render_output(output_id, output_data)
+
+        # Verify it returns a JSON string
+        self.assertIsInstance(result, str)
+
+        # Verify it can be parsed as JSON
+        parsed = json.loads(result)
+        self.assertEqual(parsed["output_id"], output_id)
+        self.assertIn("output_data", parsed)
+
+
+class TestPlugins(unittest.TestCase):
+    """Test cases for Plugins class"""
+
+    def setUp(self):
+        """Set up test fixtures"""
+        # Create mock steps with proper id attribute
+        self.mock_step1 = Mock()
+        self.mock_step1.id = "step1"
+
+        self.mock_step2 = Mock()
+        self.mock_step2.id = "step2"
+
+        self.plugin_step1 = PluginStep(schema=self.mock_step1, error_output_ids=["error"])
+        self.plugin_step2 = PluginStep(schema=self.mock_step2, error_output_ids=["error"])
+
+    def test_init_with_valid_steps(self):
+        """Test Plugins initialization with valid steps"""
+        plugins = Plugins([self.plugin_step1, self.plugin_step2])
+
+        self.assertEqual(len(plugins.steps_by_id), 2)
+        self.assertIn("step1", plugins.steps_by_id)
+        self.assertIn("step2", plugins.steps_by_id)
+
+    def test_init_with_duplicate_step_ids(self):
+        """Test Plugins initialization with duplicate step IDs raises exception"""
+        # Create two steps with the same ID
+        duplicate_step = PluginStep(schema=self.mock_step1, error_output_ids=["error"])
+
+        with self.assertRaises(Exception) as context:
+            Plugins([self.plugin_step1, duplicate_step])
+
+        self.assertIn("Duplicate step ID", str(context.exception))
+
+    def test_unserialize_scenario(self):
+        """Test unserialize_scenario method"""
+        # Create a temporary YAML file
+        test_data = [
+            {"id": "test_step", "config": {"param": "value"}}
+        ]
+
+        with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
+            yaml.dump(test_data, f)
+            temp_file = f.name
+
+        try:
+            plugins = Plugins([self.plugin_step1])
+            result = plugins.unserialize_scenario(temp_file)
+
+            self.assertIsInstance(result, list)
+        finally:
+            Path(temp_file).unlink()
+
+    def test_run_with_invalid_scenario_not_list(self):
+        """Test run method with scenario that is not a list"""
+        # Create a temporary YAML file with dict instead of list
+        test_data = {"id": "test_step", "config": {"param": "value"}}
+
+        with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
+            yaml.dump(test_data, f)
+            temp_file = f.name
+
+        try:
+            plugins = Plugins([self.plugin_step1])
+
+            with self.assertRaises(Exception) as context:
+                plugins.run(temp_file, "/path/to/kubeconfig", "/path/to/kraken_config", "test-uuid")
+
+            self.assertIn("expected list", str(context.exception))
+        finally:
+            Path(temp_file).unlink()
+
+    def test_run_with_invalid_entry_not_dict(self):
+        """Test run method with entry that is not a dict"""
+        # Create a temporary YAML file with list of strings instead of dicts
+        test_data = ["invalid", "entries"]
+
+        with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
+            yaml.dump(test_data, f)
+            temp_file = f.name
+
+        try:
+            plugins = Plugins([self.plugin_step1])
+
+            with self.assertRaises(Exception) as context:
+                plugins.run(temp_file, "/path/to/kubeconfig", "/path/to/kraken_config", "test-uuid")
+
+            self.assertIn("expected a list of dict's", str(context.exception))
+        finally:
+            Path(temp_file).unlink()
+
+    def test_run_with_missing_id_field(self):
+        """Test run method with missing 'id' field"""
+        # Create a temporary YAML file with missing id
+        test_data = [
+            {"config": {"param": "value"}}
+        ]
+
+        with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
+            yaml.dump(test_data, f)
+            temp_file = f.name
+
+        try:
+            plugins = Plugins([self.plugin_step1])
+
+            with self.assertRaises(Exception) as context:
+                plugins.run(temp_file, "/path/to/kubeconfig", "/path/to/kraken_config", "test-uuid")
+
+            self.assertIn("missing 'id' field", str(context.exception))
+        finally:
+            Path(temp_file).unlink()
+
+    def test_run_with_missing_config_field(self):
+        """Test run method with missing 'config' field"""
+        # Create a temporary YAML file with missing config
+        test_data = [
+            {"id": "step1"}
+        ]
+
+        with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
+            yaml.dump(test_data, f)
+            temp_file = f.name
+
+        try:
+            plugins = Plugins([self.plugin_step1])
+
+            with self.assertRaises(Exception) as context:
+                plugins.run(temp_file, "/path/to/kubeconfig", "/path/to/kraken_config", "test-uuid")
+
+            self.assertIn("missing 'config' field", str(context.exception))
+        finally:
+            Path(temp_file).unlink()
+
+    def test_run_with_invalid_step_id(self):
+        """Test run method with invalid step ID"""
+        # Create a proper mock schema with string ID
+        mock_schema = Mock()
+        mock_schema.id = "valid_step"
+        plugin_step = PluginStep(schema=mock_schema, error_output_ids=["error"])
+
+        # Create a temporary YAML file with unknown step ID
+        test_data = [
+            {"id": "unknown_step", "config": {"param": "value"}}
+        ]
+
+        with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
+            yaml.dump(test_data, f)
+            temp_file = f.name
+
+        try:
+            plugins = Plugins([plugin_step])
+
+            with self.assertRaises(Exception) as context:
+                plugins.run(temp_file, "/path/to/kubeconfig", "/path/to/kraken_config", "test-uuid")
+
+            self.assertIn("Invalid step", str(context.exception))
+            self.assertIn("expected one of", str(context.exception))
+        finally:
+            Path(temp_file).unlink()
+
+    @patch('krkn.scenario_plugins.native.plugins.logging')
+    def test_run_with_valid_scenario(self, mock_logging):
+        """Test run method with valid scenario"""
+        # Create mock schema with all necessary attributes
+        mock_schema = Mock()
+        mock_schema.id = "test_step"
+
+        # Mock input schema
+        mock_input = Mock()
+        mock_input.properties = {}
+        mock_input.unserialize = Mock(return_value=Mock(spec=[]))
+        mock_schema.input = mock_input
+
+        # Mock output
+        mock_output = Mock()
+        mock_output.serialize = Mock(return_value={"status": "success"})
+        mock_schema.outputs = {"success": mock_output}
+
+        # Mock schema call
+        mock_schema.return_value = ("success", {"status": "success"})
+
+        plugin_step = PluginStep(schema=mock_schema, error_output_ids=["error"])
+
+        # Create a temporary YAML file
+        test_data = [
+            {"id": "test_step", "config": {"param": "value"}}
+        ]
+
+        with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
+            yaml.dump(test_data, f)
+            temp_file = f.name
+
+        try:
+            plugins = Plugins([plugin_step])
+            plugins.run(temp_file, "/path/to/kubeconfig", "/path/to/kraken_config", "test-uuid")
+
+            # Verify schema was called
+            mock_schema.assert_called_once()
+        finally:
+            Path(temp_file).unlink()
+
+    @patch('krkn.scenario_plugins.native.plugins.logging')
+    def test_run_with_error_output(self, mock_logging):
+        """Test run method when step returns error output"""
+        # Create mock schema with error output
+        mock_schema = Mock()
+        mock_schema.id = "test_step"
+
+        # Mock input schema
+        mock_input = Mock()
+        mock_input.properties = {}
+        mock_input.unserialize = Mock(return_value=Mock(spec=[]))
+        mock_schema.input = mock_input
+
+        # Mock output
+        mock_output = Mock()
+        mock_output.serialize = Mock(return_value={"error": "test error"})
+        mock_schema.outputs = {"error": mock_output}
+
+        # Mock schema call to return error
+        mock_schema.return_value = ("error", {"error": "test error"})
+
+        plugin_step = PluginStep(schema=mock_schema, error_output_ids=["error"])
+
+        # Create a temporary YAML file
+        test_data = [
+            {"id": "test_step", "config": {"param": "value"}}
+        ]
+
+        with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
+            yaml.dump(test_data, f)
+            temp_file = f.name
+
+        try:
+            plugins = Plugins([plugin_step])
+
+            with self.assertRaises(Exception) as context:
+                plugins.run(temp_file, "/path/to/kubeconfig", "/path/to/kraken_config", "test-uuid")
+
+            self.assertIn("failed", str(context.exception))
+        finally:
+            Path(temp_file).unlink()
+
+    @patch('krkn.scenario_plugins.native.plugins.logging')
+    def test_run_with_kubeconfig_path_injection(self, mock_logging):
+        """Test run method injects kubeconfig_path when property exists"""
+        # Create mock schema with kubeconfig_path in input properties
+        mock_schema = Mock()
+        mock_schema.id = "test_step"
+
+        # Mock input schema with kubeconfig_path property
+        mock_input_instance = Mock()
+        mock_input = Mock()
+        mock_input.properties = {"kubeconfig_path": Mock()}
+        mock_input.unserialize = Mock(return_value=mock_input_instance)
+        mock_schema.input = mock_input
+
+        # Mock output
+        mock_output = Mock()
+        mock_output.serialize = Mock(return_value={"status": "success"})
+        mock_schema.outputs = {"success": mock_output}
+
+        # Mock schema call
+        mock_schema.return_value = ("success", {"status": "success"})
+
+        plugin_step = PluginStep(schema=mock_schema, error_output_ids=["error"])
+
+        # Create a temporary YAML file
+        test_data = [
+            {"id": "test_step", "config": {"param": "value"}}
+        ]
+
+        with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
+            yaml.dump(test_data, f)
+            temp_file = f.name
+
+        try:
+            plugins = Plugins([plugin_step])
+            plugins.run(temp_file, "/custom/kubeconfig", "/path/to/kraken_config", "test-uuid")
+
+            # Verify kubeconfig_path was set
+            self.assertEqual(mock_input_instance.kubeconfig_path, "/custom/kubeconfig")
+        finally:
+            Path(temp_file).unlink()
+
+    @patch('krkn.scenario_plugins.native.plugins.logging')
+    def test_run_with_kraken_config_injection(self, mock_logging):
+        """Test run method injects kraken_config when property exists"""
+        # Create mock schema with kraken_config in input properties
+        mock_schema = Mock()
+        mock_schema.id = "test_step"
+
+        # Mock input schema with kraken_config property
+        mock_input_instance = Mock()
+        mock_input = Mock()
+        mock_input.properties = {"kraken_config": Mock()}
+        mock_input.unserialize = Mock(return_value=mock_input_instance)
+        mock_schema.input = mock_input
+
+        # Mock output
+        mock_output = Mock()
+        mock_output.serialize = Mock(return_value={"status": "success"})
+        mock_schema.outputs = {"success": mock_output}
+
+        # Mock schema call
+        mock_schema.return_value = ("success", {"status": "success"})
+
+        plugin_step = PluginStep(schema=mock_schema, error_output_ids=["error"])
+
+        # Create a temporary YAML file
+        test_data = [
+            {"id": "test_step", "config": {"param": "value"}}
+        ]
+
+        with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
+            yaml.dump(test_data, f)
+            temp_file = f.name
+
+        try:
+            plugins = Plugins([plugin_step])
+            plugins.run(temp_file, "/path/to/kubeconfig", "/custom/kraken.yaml", "test-uuid")
+
+            # Verify kraken_config was set
+            self.assertEqual(mock_input_instance.kraken_config, "/custom/kraken.yaml")
+        finally:
+            Path(temp_file).unlink()
+
+    def test_json_schema(self):
+        """Test json_schema method"""
+        # Create mock schema with jsonschema support
+        mock_schema = Mock()
+        mock_schema.id = "test_step"
+
+        plugin_step = PluginStep(schema=mock_schema, error_output_ids=["error"])
+
+        with patch('krkn.scenario_plugins.native.plugins.jsonschema') as mock_jsonschema:
+            # Mock the step_input function
+            mock_jsonschema.step_input.return_value = {
+                "$id": "http://example.com",
+                "$schema": "http://json-schema.org/draft-07/schema#",
+                "title": "Test Schema",
+                "description": "Test description",
+                "type": "object",
+                "properties": {"param": {"type": "string"}}
+            }
+
+            plugins = Plugins([plugin_step])
+            result = plugins.json_schema()
+
+            # Verify it returns a JSON string
+            self.assertIsInstance(result, str)
+
+            # Parse and verify structure
+            parsed = json.loads(result)
+            self.assertEqual(parsed["$id"], "https://github.com/redhat-chaos/krkn/")
+            self.assertEqual(parsed["type"], "array")
+            self.assertEqual(parsed["minContains"], 1)
+            self.assertIn("items", parsed)
+            self.assertIn("oneOf", parsed["items"])
+
+            # Verify step is included
+            self.assertEqual(len(parsed["items"]["oneOf"]), 1)
+            step_schema = parsed["items"]["oneOf"][0]
+            self.assertEqual(step_schema["properties"]["id"]["const"], "test_step")
+
+
+class TestPLUGINSConstant(unittest.TestCase):
+    """Test cases for the PLUGINS constant"""
+
+    def test_plugins_initialized(self):
+        """Test that PLUGINS constant is properly initialized"""
+        self.assertIsInstance(PLUGINS, Plugins)
+
+        # Verify all expected steps are registered
+        expected_steps = [
+            "run_python",
+            "network_chaos",
+            "pod_network_outage",
+            "pod_egress_shaping",
+            "pod_ingress_shaping"
+        ]
+
+        for step_id in expected_steps:
+            self.assertIn(step_id, PLUGINS.steps_by_id)
+
+        # Ensure the registered id matches the decorator and no legacy alias is present
+        self.assertEqual(
+            PLUGINS.steps_by_id["pod_network_outage"].schema.id,
+            "pod_network_outage",
+        )
+        self.assertNotIn("pod_outage", PLUGINS.steps_by_id)
+
+    def test_plugins_step_count(self):
+        """Test that PLUGINS has the expected number of steps"""
+        self.assertEqual(len(PLUGINS.steps_by_id), 5)
--- a/krkn/utils/ErrorCollectionHandler.py
+++ b/krkn/utils/ErrorCollectionHandler.py
@@ -0,0 +1,71 @@
+import logging
+import threading
+from datetime import datetime, timezone
+from krkn.utils.ErrorLog import ErrorLog
+
+
+class ErrorCollectionHandler(logging.Handler):
+    """
+    Custom logging handler that captures ERROR and CRITICAL level logs
+    in structured format for telemetry collection.
+
+    Stores logs in memory as ErrorLog objects for later retrieval.
+    Thread-safe for concurrent logging operations.
+    """
+
+    def __init__(self, level=logging.ERROR):
+        """
+        Initialize the error collection handler.
+
+        Args:
+            level: Minimum log level to capture (default: ERROR)
+        """
+        super().__init__(level)
+        self.error_logs: list[ErrorLog] = []
+        self._lock = threading.Lock()
+
+    def emit(self, record: logging.LogRecord):
+        """
+        Capture ERROR and CRITICAL logs and store as ErrorLog objects.
+
+        Args:
+            record: LogRecord from Python logging framework
+        """
+        try:
+            # Only capture ERROR (40) and CRITICAL (50) levels
+            if record.levelno < logging.ERROR:
+                return
+
+            # Format timestamp as ISO 8601 UTC
+            timestamp = datetime.fromtimestamp(
+                record.created, tz=timezone.utc
+            ).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
+
+            # Create ErrorLog object
+            error_log = ErrorLog(
+                timestamp=timestamp,
+                message=record.getMessage()
+            )
+
+            # Thread-safe append
+            with self._lock:
+                self.error_logs.append(error_log)
+
+        except Exception:
+            # Handler should never raise exceptions (logging best practice)
+            self.handleError(record)
+
+    def get_error_logs(self) -> list[dict]:
+        """
+        Retrieve all collected error logs as list of dictionaries.
+
+        Returns:
+            List of error log dictionaries with timestamp and message
+        """
+        with self._lock:
+            return [log.to_dict() for log in self.error_logs]
+
+    def clear(self):
+        """Clear all collected error logs (useful for testing)"""
+        with self._lock:
+            self.error_logs.clear()
--- a/krkn/utils/ErrorLog.py
+++ b/krkn/utils/ErrorLog.py
@@ -0,0 +1,18 @@
+from dataclasses import dataclass, asdict
+
+
+@dataclass
+class ErrorLog:
+    """
+    Represents a single error log entry for telemetry collection.
+
+    Attributes:
+        timestamp: ISO 8601 formatted timestamp (UTC)
+        message: Full error message text
+    """
+    timestamp: str
+    message: str
+
+    def to_dict(self) -> dict:
+        """Convert to dictionary for JSON serialization"""
+        return asdict(self)
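To show how the two new classes fit together, here is a self-contained sketch that mirrors `ErrorCollectionHandler` and `ErrorLog` using only the standard library. `SketchErrorLog` and `SketchErrorCollector` are hypothetical stand-ins (not krkn APIs) so the snippet runs without krkn installed. The level filtering, millisecond UTC timestamp format, and lock-protected list follow the handler above.

```python
import logging
import threading
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class SketchErrorLog:
    # Hypothetical stand-in for krkn.utils.ErrorLog
    timestamp: str
    message: str

    def to_dict(self) -> dict:
        return asdict(self)


class SketchErrorCollector(logging.Handler):
    # Hypothetical stand-in for krkn.utils.ErrorCollectionHandler
    def __init__(self, level=logging.ERROR):
        super().__init__(level)
        self.error_logs: list[SketchErrorLog] = []
        self._lock = threading.Lock()

    def emit(self, record: logging.LogRecord) -> None:
        try:
            if record.levelno < logging.ERROR:
                return
            # ISO 8601 UTC truncated to milliseconds, e.g. 1970-01-01T00:00:00.000Z
            timestamp = datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
            with self._lock:
                self.error_logs.append(SketchErrorLog(timestamp, record.getMessage()))
        except Exception:
            # A handler must never raise; delegate to logging's error hook
            self.handleError(record)

    def get_error_logs(self) -> list[dict]:
        with self._lock:
            return [log.to_dict() for log in self.error_logs]


logger = logging.getLogger("sketch")
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo from also printing via the root logger
collector = SketchErrorCollector()
logger.addHandler(collector)

logger.info("scenario started")            # below ERROR: never reaches emit()
logger.error("pod %s not ready", "web-1")  # captured with args interpolated
logger.critical("cluster unreachable")     # captured

collected = collector.get_error_logs()
print([entry["message"] for entry in collected])
# → ['pod web-1 not ready', 'cluster unreachable']
```

Because the handler is added to the same `handlers` list as the file and stream handlers, error records are still printed normally; the collector only keeps a structured copy for telemetry.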
--- a/krkn/utils/HealthChecker.py
+++ b/krkn/utils/HealthChecker.py
@@ -77,7 +77,7 @@ class HealthChecker:
                success_response = {
                    "url": url,
                    "status": True,
-                    "status_code": response["status_code"],
+                    "status_code": health_check_tracker[url]["status_code"],
                    "start_timestamp": health_check_tracker[url]["start_timestamp"].isoformat(),
                    "end_timestamp": health_check_end_time_stamp.isoformat(),
                    "duration": duration
--- a/krkn/utils/VirtChecker.py
+++ b/krkn/utils/VirtChecker.py
@@ -1,6 +1,7 @@

 import time
 import logging
+import math
 import queue
 from datetime import datetime
 from krkn_lib.models.telemetry.models import VirtCheck
@@ -19,38 +20,57 @@ class VirtChecker:
        self.namespace = get_yaml_item_value(kubevirt_check_config, "namespace", "")
        self.vm_list = []
        self.threads = []
+        self.iteration_lock = threading.Lock()  # Lock to protect current_iterations
        self.threads_limit = threads_limit
-        if self.namespace == "":
-            logging.info("kube virt checks config is not defined, skipping them")
-            return
+        # default to 0 so batch_list() starts no threads if the checks are not configured or VMI lookup fails
+        self.batch_size = 0
+        self.ret_value = 0
        vmi_name_match = get_yaml_item_value(kubevirt_check_config, "name", ".*")
        self.krkn_lib = krkn_lib
        self.disconnected =  get_yaml_item_value(kubevirt_check_config, "disconnected", False)
        self.only_failures =  get_yaml_item_value(kubevirt_check_config, "only_failures", False)
        self.interval = get_yaml_item_value(kubevirt_check_config, "interval", 2)
        self.ssh_node = get_yaml_item_value(kubevirt_check_config, "ssh_node", "")
+        self.node_names = get_yaml_item_value(kubevirt_check_config, "node_names", "")
+        self.exit_on_failure = get_yaml_item_value(kubevirt_check_config, "exit_on_failure", False)
+        if self.namespace == "":
+            logging.info("kube virt checks config is not defined, skipping them")
+            return
        try:
            self.kube_vm_plugin = KubevirtVmOutageScenarioPlugin()
            self.kube_vm_plugin.init_clients(k8s_client=krkn_lib)
-            vmis = self.kube_vm_plugin.get_vmis(vmi_name_match,self.namespace)
+
+            self.kube_vm_plugin.get_vmis(vmi_name_match,self.namespace)
        except Exception as e:
            logging.error('Virt Check init exception: ' + str(e))
-            return 
-        
-        for vmi in vmis:
+            return
+        # See if multiple node names exist
+        node_name_list = [node_name for node_name in self.node_names.split(',') if node_name]
+        for vmi in self.kube_vm_plugin.vmis_list:
            node_name = vmi.get("status",{}).get("nodeName")
            vmi_name = vmi.get("metadata",{}).get("name")
-            ip_address = vmi.get("status",{}).get("interfaces",[])[0].get("ipAddress")
-            self.vm_list.append(VirtCheck({'vm_name':vmi_name, 'ip_address': ip_address, 'namespace':self.namespace, 'node_name':node_name, "new_ip_address":""}))
+            interfaces = vmi.get("status",{}).get("interfaces",[])
+            if not interfaces:
+                logging.warning(f"VMI {vmi_name} has no network interfaces, skipping")
+                continue
+            ip_address = interfaces[0].get("ipAddress")
+            namespace = vmi.get("metadata",{}).get("namespace")
+            # If node_name_list exists, only add if node name is in list
+
+            if len(node_name_list) > 0 and node_name in node_name_list:
+                self.vm_list.append(VirtCheck({'vm_name':vmi_name, 'ip_address': ip_address, 'namespace':namespace, 'node_name':node_name, "new_ip_address":""}))
+            elif len(node_name_list) == 0:
+                # If node_name_list is blank, add all vms
+                self.vm_list.append(VirtCheck({'vm_name':vmi_name, 'ip_address': ip_address, 'namespace':namespace, 'node_name':node_name, "new_ip_address":""}))
+        self.batch_size = math.ceil(len(self.vm_list)/self.threads_limit)

    def check_disconnected_access(self, ip_address: str, worker_name:str = '', vmi_name: str = ''):
        
-        virtctl_vm_cmd = f"ssh core@{worker_name} 'ssh -o BatchMode=yes -o ConnectTimeout=5 -o StrictHostKeyChecking=no root@{ip_address}'"
+        virtctl_vm_cmd = f"ssh core@{worker_name} -o ConnectTimeout=5 'ssh -o BatchMode=yes -o ConnectTimeout=5 -o StrictHostKeyChecking=no root@{ip_address}'"
        
        all_out = invoke_no_exit(virtctl_vm_cmd)
        logging.debug(f"Checking disconnected access for {ip_address} on {worker_name} output: {all_out}")
-        virtctl_vm_cmd = f"ssh core@{worker_name} 'ssh -o BatchMode=yes -o ConnectTimeout=5 -o StrictHostKeyChecking=no root@{ip_address} 2>&1 | grep Permission' && echo 'True' || echo 'False'"
-        logging.debug(f"Checking disconnected access for {ip_address} on {worker_name} with command: {virtctl_vm_cmd}")
+        virtctl_vm_cmd = f"ssh core@{worker_name} -o ConnectTimeout=5 'ssh -o BatchMode=yes -o ConnectTimeout=5 -o StrictHostKeyChecking=no root@{ip_address} 2>&1 | grep Permission' && echo 'True' || echo 'False'"
        output = invoke_no_exit(virtctl_vm_cmd)
        if 'True' in output:
            logging.debug(f"Disconnected access for {ip_address} on {worker_name} is successful: {output}")
@@ -58,20 +78,19 @@ class VirtChecker:
        else:
            logging.debug(f"Disconnected access for {ip_address} on {worker_name} is failed: {output}")
            vmi = self.kube_vm_plugin.get_vmi(vmi_name,self.namespace)
-            new_ip_address = vmi.get("status",{}).get("interfaces",[])[0].get("ipAddress")
+            interfaces = vmi.get("status",{}).get("interfaces",[])
+            new_ip_address = interfaces[0].get("ipAddress") if interfaces else None
            new_node_name = vmi.get("status",{}).get("nodeName")
            # if vm gets deleted, it'll start up with a new ip address
            if new_ip_address != ip_address:
-                virtctl_vm_cmd = f"ssh core@{worker_name} 'ssh -o BatchMode=yes -o ConnectTimeout=5 -o StrictHostKeyChecking=no root@{new_ip_address} 2>&1 | grep Permission' && echo 'True' || echo 'False'"
-                logging.debug(f"Checking disconnected access for {new_ip_address} on {worker_name} with command: {virtctl_vm_cmd}")
+                virtctl_vm_cmd = f"ssh core@{worker_name} -o ConnectTimeout=5 'ssh -o BatchMode=yes -o ConnectTimeout=5 -o StrictHostKeyChecking=no root@{new_ip_address} 2>&1 | grep Permission' && echo 'True' || echo 'False'"
                new_output = invoke_no_exit(virtctl_vm_cmd)
                logging.debug(f"Disconnected access for {ip_address} on {worker_name}: {new_output}")
                if 'True' in new_output:
                    return True, new_ip_address, None
            # if node gets stopped, vmis will start up with a new node (and with new ip)
            if new_node_name != worker_name:
-                virtctl_vm_cmd = f"ssh core@{new_node_name} 'ssh -o BatchMode=yes -o ConnectTimeout=5 -o StrictHostKeyChecking=no root@{new_ip_address} 2>&1 | grep Permission' && echo 'True' || echo 'False'"
-                logging.debug(f"Checking disconnected access for {new_ip_address} on {new_node_name} with command: {virtctl_vm_cmd}")
+                virtctl_vm_cmd = f"ssh core@{new_node_name} -o ConnectTimeout=5 'ssh -o BatchMode=yes -o ConnectTimeout=5 -o StrictHostKeyChecking=no root@{new_ip_address} 2>&1 | grep Permission' && echo 'True' || echo 'False'"
                new_output = invoke_no_exit(virtctl_vm_cmd)
                logging.debug(f"Disconnected access for {ip_address} on {new_node_name}: {new_output}")
                if 'True' in new_output:
@@ -79,8 +98,7 @@ class VirtChecker:
            # try to connect with a common "up" node as last resort
            if self.ssh_node:
                # using new_ip_address here since if it hasn't changed it'll match ip_address
-                virtctl_vm_cmd = f"ssh core@{self.ssh_node} 'ssh -o BatchMode=yes -o ConnectTimeout=5 -o StrictHostKeyChecking=no root@{new_ip_address} 2>&1 | grep Permission' && echo 'True' || echo 'False'"
-                logging.debug(f"Checking disconnected access for {new_ip_address} on {self.ssh_node} with command: {virtctl_vm_cmd}")
+                virtctl_vm_cmd = f"ssh core@{self.ssh_node} -o ConnectTimeout=5 'ssh -o BatchMode=yes -o ConnectTimeout=5 -o StrictHostKeyChecking=no root@{new_ip_address} 2>&1 | grep Permission' && echo 'True' || echo 'False'"
                new_output = invoke_no_exit(virtctl_vm_cmd)
                logging.debug(f"Disconnected access for {new_ip_address} on {self.ssh_node}: {new_output}")
                if 'True' in new_output:
@@ -89,7 +107,7 @@ class VirtChecker:

    def get_vm_access(self, vm_name: str = '', namespace: str = ''):
        """
-        This method returns True when the VM is access and an error message when it is not, using virtctl protocol
+        This method returns True when the VM is accessible and an error message when it is not, using virtctl protocol
        :param vm_name:
        :param namespace:
        :return: virtctl_status 'True' if successful, or an error message if it fails.
@@ -108,22 +126,36 @@ class VirtChecker:
        for thread in self.threads:
            thread.join()

-    def batch_list(self,  queue: queue.Queue, batch_size=20):
-        # Provided prints to easily visualize how the threads are processed.    
-        for i in range (0, len(self.vm_list),batch_size):
-            sub_list = self.vm_list[i: i+batch_size]
-            index = i
-            t = threading.Thread(target=self.run_virt_check,name=str(index), args=(sub_list,queue))
-            self.threads.append(t)
-            t.start()
+    def batch_list(self, queue: queue.SimpleQueue = None):
+        if self.batch_size > 0:
+            # Split vm_list into batches of batch_size and start one worker thread per batch.
+            for i in range (0, len(self.vm_list),self.batch_size):
+                if i+self.batch_size > len(self.vm_list):
+                    sub_list = self.vm_list[i:]
+                else:
+                    sub_list = self.vm_list[i: i+self.batch_size]
+                index = i
+                t = threading.Thread(target=self.run_virt_check,name=str(index), args=(sub_list,queue))
+                self.threads.append(t)
+                t.start()

-    
-    def run_virt_check(self, vm_list_batch, virt_check_telemetry_queue: queue.Queue):
+    def increment_iterations(self):
+        """Thread-safe method to increment current_iterations"""
+        with self.iteration_lock:
+            self.current_iterations += 1
+
+    def run_virt_check(self, vm_list_batch, virt_check_telemetry_queue: queue.SimpleQueue):
        
        virt_check_telemetry = []
        virt_check_tracker = {}
-        while self.current_iterations < self.iterations:
+        while True:
+            # Thread-safe read of current_iterations
+            with self.iteration_lock:
+                current = self.current_iterations
+            if current >= self.iterations:
+                break
            for vm in vm_list_batch:
+                start_time= datetime.now()
                try: 
                    if not self.disconnected: 
                        vm_status = self.get_vm_access(vm.vm_name, vm.namespace)
@@ -139,8 +171,9 @@ class VirtChecker:
                            if new_node_name and vm.node_name != new_node_name:
                                vm.node_name = new_node_name
                except Exception:
+                    logging.info('Exception in get vm status')
                    vm_status = False
-                
+
                if vm.vm_name not in virt_check_tracker:
                    start_timestamp = datetime.now()
                    virt_check_tracker[vm.vm_name] = {
@@ -153,6 +186,7 @@ class VirtChecker:
                        "new_ip_address": vm.new_ip_address
                    }
                else:
+                    
                    if vm_status != virt_check_tracker[vm.vm_name]["status"]:
                        end_timestamp = datetime.now()
                        start_timestamp = virt_check_tracker[vm.vm_name]["start_timestamp"]
@@ -181,4 +215,66 @@ class VirtChecker:
                    virt_check_telemetry.append(VirtCheck(virt_check_tracker[vm]))
            else:
                virt_check_telemetry.append(VirtCheck(virt_check_tracker[vm]))
-        virt_check_telemetry_queue.put(virt_check_telemetry)
+        try:
+            virt_check_telemetry_queue.put(virt_check_telemetry)
+        except Exception as e:
+            logging.error('Put queue error ' + str(e))
+    def run_post_virt_check(self, vm_list_batch, virt_check_telemetry, post_virt_check_queue: queue.SimpleQueue):
+        
+        virt_check_telemetry = []
+        virt_check_tracker = {}
+        start_timestamp = datetime.now()
+        for vm in vm_list_batch:
+            
+            try: 
+                if not self.disconnected: 
+                    vm_status = self.get_vm_access(vm.vm_name, vm.namespace)
+                else:
+                    vm_status, new_ip_address, new_node_name = self.check_disconnected_access(vm.ip_address, vm.node_name, vm.vm_name)
+                    if new_ip_address and vm.ip_address != new_ip_address:
+                        vm.new_ip_address = new_ip_address
+                    if new_node_name and vm.node_name != new_node_name:
+                        vm.node_name = new_node_name
+            except Exception:
+                vm_status = False
+            
+            if not vm_status:
+
+                virt_check_tracker= {
+                    "vm_name": vm.vm_name,
+                    "ip_address": vm.ip_address,
+                    "namespace": vm.namespace,
+                    "node_name": vm.node_name,
+                    "status": vm_status,
+                    "start_timestamp": start_timestamp.isoformat(),
+                    "new_ip_address": vm.new_ip_address,
+                    "duration": 0,
+                    "end_timestamp": start_timestamp.isoformat()
+                }
+                
+                virt_check_telemetry.append(VirtCheck(virt_check_tracker))
+        post_virt_check_queue.put(virt_check_telemetry)
+    
+
+    def gather_post_virt_checks(self, kubevirt_check_telem):
+
+        post_kubevirt_check_queue = queue.SimpleQueue()
+        post_threads = []
+
+        if self.batch_size > 0:
+            for i in range (0, len(self.vm_list),self.batch_size):
+                sub_list = self.vm_list[i: i+self.batch_size]
+                index = i
+                t = threading.Thread(target=self.run_post_virt_check,name=str(index), args=(sub_list,kubevirt_check_telem, post_kubevirt_check_queue))
+                post_threads.append(t)
+                t.start()
+
+            kubevirt_check_telem = []
+            for thread in post_threads:
+                thread.join()
+                if not post_kubevirt_check_queue.empty():
+                    kubevirt_check_telem.extend(post_kubevirt_check_queue.get_nowait())
+        
+        if self.exit_on_failure and len(kubevirt_check_telem) > 0:
+            self.ret_value = 2
+        return kubevirt_check_telem
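The new `VirtChecker` threading pattern (a `math.ceil` batch size, one thread per batch, and a lock around `current_iterations`) can be modelled in isolation. The sketch below is a hypothetical, trimmed-down class (`BatchedChecker` is not a krkn name) in which `time.sleep` stands in for the actual VM access checks; the batching arithmetic, the lock-protected counter, and the `SimpleQueue` draining follow the diff.

```python
import math
import queue
import threading
import time


class BatchedChecker:
    """Hypothetical, trimmed-down model of the VirtChecker threading pattern."""

    def __init__(self, items, threads_limit: int, iterations: int):
        self.items = list(items)
        self.iterations = iterations
        self.current_iterations = 0
        self.iteration_lock = threading.Lock()  # guards current_iterations
        self.threads: list[threading.Thread] = []
        # ceil() keeps the number of batches at or below threads_limit;
        # an empty item list yields batch_size 0, i.e. no worker threads
        self.batch_size = math.ceil(len(self.items) / threads_limit)

    def batches(self) -> list:
        # range() rejects a step of 0, hence the guard (cf. `if self.batch_size > 0`)
        if self.batch_size == 0:
            return []
        # Slicing past the end of a list is safe, so the last batch is just shorter
        return [
            self.items[i:i + self.batch_size]
            for i in range(0, len(self.items), self.batch_size)
        ]

    def increment_iterations(self) -> None:
        with self.iteration_lock:
            self.current_iterations += 1

    def worker(self, batch: list, out: queue.SimpleQueue) -> None:
        while True:
            with self.iteration_lock:  # thread-safe read, as in run_virt_check
                done = self.current_iterations >= self.iterations
            if done:
                break
            time.sleep(0.01)  # stand-in for checking every VM in `batch` once
        out.put(list(batch))  # one result list per thread

    def start(self, out: queue.SimpleQueue) -> None:
        for batch in self.batches():
            t = threading.Thread(target=self.worker, args=(batch, out))
            self.threads.append(t)
            t.start()


checker = BatchedChecker(range(10), threads_limit=4, iterations=3)
print(checker.batch_size, checker.batches())
# → 3 [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

results = queue.SimpleQueue()
checker.start(results)
for _ in range(checker.iterations):  # the main loop finishes one chaos iteration per pass
    time.sleep(0.02)
    checker.increment_iterations()
for t in checker.threads:
    t.join()

collected = []
while not results.empty():  # drain every batch result after all threads have joined
    collected.extend(results.get_nowait())
print(sorted(collected))
```

Because slicing clamps at the end of the list, the explicit `if i+self.batch_size > len(self.vm_list)` branch in `batch_list` behaves the same as the plain slice used in `gather_post_virt_checks`.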
--- a/krkn/utils/init.py
+++ b/krkn/utils/init.py
@@ -1,2 +1,4 @@
 from .TeeLogHandler import TeeLogHandler
+from .ErrorLog import ErrorLog
+from .ErrorCollectionHandler import ErrorCollectionHandler
 from .functions import *
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,24 +1,23 @@
 aliyun-python-sdk-core==2.13.36
 aliyun-python-sdk-ecs==4.24.25
 arcaflow-plugin-sdk==0.14.0
-boto3==1.28.61
+boto3>=1.34.0  # Updated to support urllib3 2.x
 azure-identity==1.16.1
 azure-keyvault==4.2.0
 azure-mgmt-compute==30.5.0
 azure-mgmt-network==27.0.0
-itsdangerous==2.0.1
 coverage==7.6.12
 datetime==5.4
-docker==7.0.0
+docker>=6.0,<7.0  # docker 7.0+ has breaking changes; works with requests<2.32
 gitpython==3.1.41
 google-auth==2.37.0
 google-cloud-compute==1.22.0
-ibm_cloud_sdk_core==3.18.0
-ibm_vpc==0.20.0
+ibm_cloud_sdk_core>=3.20.0  # Requires urllib3>=2.1.0 (compatible with updated boto3)
+ibm_vpc==0.26.3  # Requires ibm_cloud_sdk_core
 jinja2==3.1.6
-krkn-lib==5.1.11
 lxml==5.1.0
 kubernetes==34.1.0
+krkn-lib==6.0.3
 numpy==1.26.4
 pandas==2.2.0
 openshift-client==1.0.21
@@ -28,13 +27,15 @@ pyfiglet==1.0.2
 pytest==8.0.0
 python-ipmi==0.5.4
 python-openstackclient==6.5.0
-requests==2.32.4
+requests<2.32  # requests 2.32+ breaks Unix socket support (http+docker scheme)
+requests-unixsocket>=0.4.0  # Required for Docker Unix socket support
+urllib3>=2.1.0,<2.4.0  # Compatible with all dependencies
 service_identity==24.1.0
 PyYAML==6.0.1
 setuptools==78.1.1
-werkzeug==3.0.6
-wheel==0.42.0
-zope.interface==5.4.0
+wheel>=0.44.0
+zope.interface==6.1
+colorlog==6.10.1

 git+https://github.com/vmware/vsphere-automation-sdk-python.git@v8.0.0.0
 cryptography>=42.0.4 # not directly required, pinned by Snyk to avoid a vulnerability
--- a/run_kraken.py
+++ b/run_kraken.py
@@ -6,6 +6,7 @@ import sys
 import yaml
 import logging
 import optparse
+from colorlog import ColoredFormatter
 import pyfiglet
 import uuid
 import time
@@ -27,7 +28,7 @@ from krkn_lib.models.telemetry import ChaosRunTelemetry
 from krkn_lib.utils import SafeLogger
 from krkn_lib.utils.functions import get_yaml_item_value, get_junit_test_case

-from krkn.utils import TeeLogHandler
+from krkn.utils import TeeLogHandler, ErrorCollectionHandler
 from krkn.utils.HealthChecker import HealthChecker
 from krkn.utils.VirtChecker import VirtChecker
 from krkn.scenario_plugins.scenario_plugin_factory import (
@@ -133,7 +134,7 @@ def main(options, command: Optional[str]) -> int:
        telemetry_api_url = config["telemetry"].get("api_url")
        health_check_config = get_yaml_item_value(config, "health_checks",{})
        kubevirt_check_config = get_yaml_item_value(config, "kubevirt_checks", {})
-
+        
        # Initialize clients
        if not os.path.isfile(kubeconfig_path) and not os.path.isfile(
                "/var/run/secrets/kubernetes.io/serviceaccount/token"
@@ -141,7 +142,7 @@ def main(options, command: Optional[str]) -> int:
            logging.error(
                "Cannot read the kubeconfig file at %s, please check" % kubeconfig_path
            )
-            return 1
+            return -1
        logging.info("Initializing client to talk to the Kubernetes cluster")

        # Generate uuid for the run
@@ -184,10 +185,10 @@ def main(options, command: Optional[str]) -> int:
        # Set up kraken url to track signal
        if not 0 <= int(port) <= 65535:
            logging.error("%s isn't a valid port number, please check" % (port))
-            return 1
+            return -1
        if not signal_address:
            logging.error("Please set the signal address in the config")
-            return 1
+            return -1
        address = (signal_address, port)

        # If publish_running_status is False this should keep us going
@@ -220,7 +221,7 @@ def main(options, command: Optional[str]) -> int:
                        "invalid distribution selected, running openshift scenarios against kubernetes cluster."
                        "Please set 'kubernetes' in config.yaml krkn.platform and try again"
                    )
-                    return 1
+                    return -1
        if cv != "":
            logging.info(cv)
        else:
@@ -326,7 +327,7 @@ def main(options, command: Optional[str]) -> int:
                                               args=(health_check_config, health_check_telemetry_queue))
        health_check_worker.start()

-        kubevirt_check_telemetry_queue = queue.Queue()
+        kubevirt_check_telemetry_queue = queue.SimpleQueue()
        kubevirt_checker = VirtChecker(kubevirt_check_config, iterations=iterations, krkn_lib=kubecli)
        kubevirt_checker.batch_list(kubevirt_check_telemetry_queue)

@@ -361,7 +362,7 @@ def main(options, command: Optional[str]) -> int:
                            logging.error(
                                f"impossible to find scenario {scenario_type}, plugin not found. Exiting"
                            )
-                            sys.exit(1)
+                            sys.exit(-1)

                        failed_post_scenarios, scenario_telemetries = (
                            scenario_plugin.run_scenarios(
@@ -393,8 +394,7 @@ def main(options, command: Optional[str]) -> int:

            iteration += 1
            health_checker.current_iterations += 1
-            kubevirt_checker.current_iterations += 1
-
+            kubevirt_checker.increment_iterations()
        # telemetry
        # in order to print decoded telemetry data even if telemetry collection
        # is disabled, it's necessary to serialize the ChaosRunTelemetry object
@@ -408,15 +408,12 @@ def main(options, command: Optional[str]) -> int:
        
        kubevirt_checker.thread_join()
        kubevirt_check_telem = []
-        i =0
-        while i <= kubevirt_checker.threads_limit:
-            if not kubevirt_check_telemetry_queue.empty():
-                kubevirt_check_telem.extend(kubevirt_check_telemetry_queue.get_nowait())
-            else:
-                break
-            i+= 1
+        while not kubevirt_check_telemetry_queue.empty():
+            kubevirt_check_telem.extend(kubevirt_check_telemetry_queue.get_nowait())
        chaos_telemetry.virt_checks = kubevirt_check_telem
-
+        
+        post_kubevirt_check = kubevirt_checker.gather_post_virt_checks(kubevirt_check_telem)
+        chaos_telemetry.post_virt_checks = post_kubevirt_check
        # if platform is openshift will be collected
        # Cloud platform and network plugins metadata
        # through OCP specific APIs
@@ -429,16 +426,22 @@ def main(options, command: Optional[str]) -> int:
            logging.info("collecting Kubernetes cluster metadata....")
            telemetry_k8s.collect_cluster_metadata(chaos_telemetry)

+        # Collect error logs from handler
+        error_logs = error_collection_handler.get_error_logs()
+        if error_logs:
+            logging.info(f"Collected {len(error_logs)} error logs for telemetry")
+            chaos_telemetry.error_logs = error_logs
+        else:
+            logging.info("No error logs collected during chaos run")
+            chaos_telemetry.error_logs = []
+
        telemetry_json = chaos_telemetry.to_json()
        decoded_chaos_run_telemetry = ChaosRunTelemetry(json.loads(telemetry_json))
        chaos_output.telemetry = decoded_chaos_run_telemetry
        logging.info(f"Chaos data:\n{chaos_output.to_json()}")
        if enable_elastic:
-            elastic_telemetry = ElasticChaosRunTelemetry(
-                chaos_run_telemetry=decoded_chaos_run_telemetry
-            )
            result = elastic_search.push_telemetry(
-                elastic_telemetry, elastic_telemetry_index
+                decoded_chaos_run_telemetry, elastic_telemetry_index
            )
            if result == -1:
                safe_logger.error(
@@ -526,7 +529,7 @@ def main(options, command: Optional[str]) -> int:

            else:
                logging.error("Alert profile is not defined")
-                return 1
+                return -1
                # sys.exit(1)
        if enable_metrics:
            logging.info(f'Capturing metrics using file {metrics_profile}')
@@ -541,21 +544,28 @@ def main(options, command: Optional[str]) -> int:
                telemetry_json
            )

+        # return 1 first so a scenario failure is reported
+        # even when critical alerts are also firing
+        if failed_post_scenarios:
+            logging.error(
+                "Post scenarios are still failing at the end of all iterations"
+            )
+            # sys.exit(1)
+            return 1
+
        if post_critical_alerts > 0:
            logging.error("Critical alerts are firing, please check; exiting")
            # sys.exit(2)
            return 2

-        if failed_post_scenarios:
-            logging.error(
-                "Post scenarios are still failing at the end of all iterations"
-            )
-            # sys.exit(2)
-            return 2
        if health_checker.ret_value != 0:
            logging.error("Health check failed for the applications, Please check; exiting")
            return health_checker.ret_value

+        if kubevirt_checker.ret_value != 0:
+            logging.error("Kubevirt check still had failed VMIs at the end of the run, please check; exiting")
+            return kubevirt_checker.ret_value
+
        logging.info(
            "Successfully finished running Kraken. UUID for the run: "
            "%s. Report generated at %s. Exiting" % (run_uuid, report_file)
@@ -563,7 +573,7 @@ def main(options, command: Optional[str]) -> int:
    else:
        logging.error("Cannot find a config at %s, please check" % (cfg))
        # sys.exit(1)
-        return 2
+        return -1

    return 0

@@ -643,15 +653,30 @@ if __name__ == "__main__":
    # If no command or regular execution, continue with existing logic
    report_file = options.output
    tee_handler = TeeLogHandler()
+
+    fmt = "%(asctime)s [%(levelname)s] %(message)s"
+    plain = logging.Formatter(fmt)
+    colored = ColoredFormatter(
+        "%(asctime)s [%(log_color)s%(levelname)s%(reset)s] %(message)s",
+        log_colors={'DEBUG': 'white', 'INFO': 'white', 'WARNING': 'yellow', 'ERROR': 'red', 'CRITICAL': 'bold_red'},
+        reset=True, style='%'
+    )
+    file_handler = logging.FileHandler(report_file, mode="w")
+    file_handler.setFormatter(plain)
+    stream_handler = logging.StreamHandler()
+    stream_handler.setFormatter(colored)
+    tee_handler.setFormatter(plain)
+    error_collection_handler = ErrorCollectionHandler(level=logging.ERROR)
+
    handlers = [
-        logging.FileHandler(report_file, mode="w"),
-        logging.StreamHandler(),
+        file_handler,
+        stream_handler,
        tee_handler,
+        error_collection_handler,
    ]

    logging.basicConfig(
        level=logging.DEBUG if options.debug else logging.INFO,
-        format="%(asctime)s [%(levelname)s] %(message)s",
        handlers=handlers,
    )
    option_error = False
@@ -732,4 +757,4 @@ if __name__ == "__main__":
        with open(junit_testcase_file_path, "w") as stream:
            stream.write(junit_testcase_xml)

-    sys.exit(retval)
+    sys.exit(retval)
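The reordering at the end of `main()` changes which non-zero code wins when several checks fail at once: failing post scenarios now return 1 ahead of critical alerts (2), followed by the health-check and KubeVirt return values, while configuration errors now return -1. The sketch below models that precedence with a hypothetical `final_exit_code` helper that is not part of krkn. Note that on POSIX systems exit statuses are truncated to 8 bits, so a shell observes `sys.exit(-1)` as 255.

```python
def final_exit_code(failed_post_scenarios: list, post_critical_alerts: int,
                    health_ret: int, virt_ret: int) -> int:
    """Hypothetical helper mirroring the order of the final checks in main()."""
    if failed_post_scenarios:
        return 1  # scenario failure is reported even if critical alerts also fire
    if post_critical_alerts > 0:
        return 2
    if health_ret != 0:
        return health_ret
    if virt_ret != 0:
        return virt_ret  # VirtChecker sets 2 when exit_on_failure and VMIs still fail
    return 0


print(final_exit_code(["pod_scenario"], 3, 0, 0))  # → 1
print(final_exit_code([], 0, 0, 2))                 # → 2
print(final_exit_code([], 0, 0, 0))                 # → 0
print((-1) & 0xFF)  # → 255, the status a POSIX shell reports for sys.exit(-1)
```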
--- a/scenarios/kind/node_scenarios_example.yml
+++ b/scenarios/kind/node_scenarios_example.yml
@@ -1,16 +1,18 @@
 node_scenarios:
  - actions:                                                        # node chaos scenarios to be injected
    - node_stop_start_scenario
-    node_name: kind-worker                                          # node on which scenario has to be injected; can set multiple names separated by comma
-    # label_selector: node-role.kubernetes.io/worker                # when node_name is not specified, a node with matching label_selector is selected for node chaos scenario injection
+    # node_name: kind-control-plane                                        # node on which scenario has to be injected; can set multiple names separated by comma
+    label_selector: kubernetes.io/hostname=kind-worker              # when node_name is not specified, a node with matching label_selector is selected for node chaos scenario injection
    instance_count: 1                                               # Number of nodes to perform action/select that match the label selector
    runs: 1                                                         # number of times to inject each scenario under actions (will perform on same node each time)
    timeout: 120                                                    # duration to wait for completion of node scenario injection
    cloud_type: docker                                                # cloud type on which Kubernetes/OpenShift runs
+    duration: 10
  - actions:
    - node_reboot_scenario
-    node_name: kind-worker
-    # label_selector: node-role.kubernetes.io/infra
+    node_name: kind-control-plane
+    # label_selector: kubernetes.io/hostname=kind-worker
    instance_count: 1
    timeout: 120
    cloud_type: docker
+    kube_check: false
--- a/scenarios/kind/pod_etcd.yml
+++ b/scenarios/kind/pod_etcd.yml
@@ -3,3 +3,4 @@
    namespace_pattern: "kube-system"
    label_selector: "component=etcd"
    krkn_pod_recovery_time: 120
+    kill: 1
--- a/scenarios/kind/pod_path_provisioner.yml
+++ b/scenarios/kind/pod_path_provisioner.yml
@@ -0,0 +1,6 @@
+- id: kill-pods
+  config:
+    namespace_pattern: "local-path-storage"
+    label_selector: "app=local-path-provisioner"
+    krkn_pod_recovery_time: 20
+    kill: 1
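`pod_etcd.yml` now sets `kill: 1`, and the new `pod_path_provisioner.yml` uses the same shape: pods are chosen by `namespace_pattern` and `label_selector`, and `kill` of them are deleted. A minimal sketch of that selection, assuming `namespace_pattern` is a regular expression matched against the whole namespace name and `label_selector` is a single `key=value` pair as in both files. The `Pod` class and `select_pods_to_kill` are hypothetical, and the real plugin may choose among matching pods differently (for example at random).

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pod:
    name: str
    namespace: str
    labels: Dict[str, str] = field(default_factory=dict)


def select_pods_to_kill(
    pods: List[Pod], namespace_pattern: str, label_selector: str, kill: int
) -> List[Pod]:
    """Return at most `kill` pods whose namespace and labels match the scenario config."""
    ns_regex = re.compile(namespace_pattern)
    key, _, value = label_selector.partition("=")
    matching = [
        pod
        for pod in pods
        if ns_regex.fullmatch(pod.namespace) and pod.labels.get(key) == value
    ]
    # Sorted only so the example is deterministic.
    matching.sort(key=lambda pod: (pod.namespace, pod.name))
    return matching[:kill]
```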
--- a/scenarios/kind/pvc_scenario.yaml
+++ b/scenarios/kind/pvc_scenario.yaml
@@ -0,0 +1,7 @@
+pvc_scenario:
+  pvc_name: kraken-test-pvc     # Name of the target PVC
+  pod_name: kraken-test-pod     # Name of the pod where the PVC is mounted; ignored if pvc_name is defined
+  namespace: kraken             # Namespace of the PVC
+  fill_percentage: 98           # Target fill percentage of the PVC; must be higher than the current usage, valid values are between 0 and 99
+  duration: 10                  # Duration in seconds for the fault
+  block_size: 102400            # Block size in bytes, used only by dd when fallocate is not present in the container
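The `pvc_scenario` comments imply a small calculation: how much data must be written to move the volume from its current usage to `fill_percentage`, and, for the `dd` fallback, how many `block_size` blocks that is. A sketch of that arithmetic, assuming capacity and usage are known in bytes; `pvc_fill_plan` is a hypothetical helper, not krkn's implementation.

```python
def pvc_fill_plan(
    capacity_bytes: int, used_bytes: int, fill_percentage: int, block_size: int = 102400
) -> dict:
    """Hypothetical: bytes (and dd blocks) to write so the PVC reaches fill_percentage."""
    if not 0 <= fill_percentage <= 99:
        raise ValueError("fill_percentage must be between 0 and 99")
    current_percentage = used_bytes * 100 / capacity_bytes
    if fill_percentage <= current_percentage:
        raise ValueError(
            f"fill_percentage ({fill_percentage}) must be higher than "
            f"the current usage ({current_percentage:.1f}%)"
        )
    target_bytes = capacity_bytes * fill_percentage // 100
    bytes_to_write = target_bytes - used_bytes
    # Round down so the dd fallback never writes past the target.
    return {"bytes_to_write": bytes_to_write, "dd_count": bytes_to_write // block_size}
```

`fallocate` can reserve the whole amount in one call, which is why `block_size` only matters when the container lacks it.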
--- a/scenarios/kube/container_dns.yml
+++ b/scenarios/kube/container_dns.yml
@@ -6,3 +6,4 @@ scenarios:
  action: 1
  count: 1
  retry_wait: 60
+  exclude_label: ""
--- a/scenarios/kube/node-network-chaos.yml
+++ b/scenarios/kube/node-network-chaos.yml
@@ -0,0 +1,18 @@
+- id: node_network_chaos
+  image: "quay.io/krkn-chaos/krkn-network-chaos:latest"
+  wait_duration: 1
+  test_duration: 60
+  label_selector: ""
+  service_account: ""
+  taints: []
+  namespace: 'default'
+  instance_count: 1
+  target: "<node_name>"
+  execution: parallel
+  interfaces: []
+  ingress: true
+  egress: true
+  latency: 0s # supported units are us (microseconds), ms, s
+  loss: 10 # percentage
+  bandwidth: 1gbit # supported units are bit, kbit, mbit, gbit, tbit
+  force: false
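`node-network-chaos.yml` (like the new `pod-network-chaos.yml`) takes `latency` and `bandwidth` as strings with a unit suffix and `loss` as a percentage. A sketch of validating those fields before use, assuming the unit lists in the comments are exhaustive. The multipliers are decimal (SI), which is how tc(8) reads `kbit`, `mbit` and so on. `latency_to_us`, `bandwidth_to_bps` and `validate_loss` are hypothetical helpers, not part of the `krkn-network-chaos` image.

```python
import re

# Units listed in the scenario comments, with decimal (SI) multipliers.
LATENCY_TO_US = {"us": 1, "ms": 1_000, "s": 1_000_000}
BANDWIDTH_TO_BPS = {"bit": 1, "kbit": 10**3, "mbit": 10**6, "gbit": 10**9, "tbit": 10**12}

_QUANTITY = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*")


def _parse(value: str, units: dict) -> int:
    match = _QUANTITY.fullmatch(value)
    if not match or match.group(2).lower() not in units:
        raise ValueError(f"unsupported value {value!r}; expected a unit in {sorted(units)}")
    return round(float(match.group(1)) * units[match.group(2).lower()])


def latency_to_us(value: str) -> int:
    """Convert e.g. '250ms' to microseconds."""
    return _parse(value, LATENCY_TO_US)


def bandwidth_to_bps(value: str) -> int:
    """Convert e.g. '1gbit' to bits per second."""
    return _parse(value, BANDWIDTH_TO_BPS)


def validate_loss(loss: float) -> float:
    """loss is a packet-loss percentage."""
    if not 0 <= loss <= 100:
        raise ValueError("loss must be a percentage between 0 and 100")
    return loss
```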
--- a/scenarios/kube/node-network-filter.yml
+++ b/scenarios/kube/node-network-filter.yml
@@ -4,7 +4,7 @@
  test_duration: 10
  label_selector: "<node_selector>"
  service_account: ""
-  taints: [] # example ["node-role.kubernetes.io/master:NoSchedule"]
+  taints: []
  namespace: 'default'
  instance_count: 1
  execution: parallel
--- a/scenarios/kube/pod-network-chaos.yml
+++ b/scenarios/kube/pod-network-chaos.yml
@@ -0,0 +1,17 @@
+- id: pod_network_chaos
+  image: "quay.io/krkn-chaos/krkn-network-chaos:latest"
+  wait_duration: 1
+  test_duration: 60
+  label_selector: ""
+  service_account: ""
+  taints: []
+  namespace: 'default'
+  instance_count: 1
+  target: "<pod_name>"
+  execution: parallel
+  interfaces: []
+  ingress: true
+  egress: true
+  latency: 0s # supported units are us (microseconds), ms, s
+  loss: 10 # percentage
+  bandwidth: 1gbit # supported units are bit, kbit, mbit, gbit, tbit
--- a/scenarios/kube/pod-network-filter.yml
+++ b/scenarios/kube/pod-network-filter.yml
@@ -4,7 +4,7 @@
  test_duration: 60
  label_selector: "<pod_selector>"
  service_account: ""
-  taints: [] # example ["node-role.kubernetes.io/master:NoSchedule"]
+  taints: []
  namespace: 'default'
  instance_count: 1
  execution: parallel
--- a/scenarios/openshift/app_outage.yaml
+++ b/scenarios/openshift/app_outage.yaml
@@ -3,3 +3,4 @@ application_outage:                                  # Scenario to create an out
  namespace: <namespace-with-application>            # Namespace to target - all application routes will go inaccessible if pod selector is empty
  pod_selector: {app: foo}                            # Pods to target
  block: [Ingress, Egress]                           # It can be Ingress or Egress or Ingress, Egress
+  exclude_label: ""                                  # Optional label selector to exclude pods. Supports dict, string, or list format
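`exclude_label` (added here and in `container_dns.yml` and `container_etcd.yml`) is documented as accepting a dict, a string, or a list. One way to normalize the three forms, assuming the string form is a comma-separated `key=value` selector, each list entry is a `key=value` string, and a pod is excluded only when it carries every pair. `parse_exclude_label` and `is_excluded` are hypothetical helpers, not krkn's implementation.

```python
from typing import Dict, List, Union

ExcludeLabel = Union[None, str, Dict[str, str], List[str]]


def parse_exclude_label(exclude_label: ExcludeLabel) -> Dict[str, str]:
    """Normalize the dict, string and list forms into one {key: value} mapping."""
    if not exclude_label:
        return {}
    if isinstance(exclude_label, dict):
        items = [f"{key}={value}" for key, value in exclude_label.items()]
    elif isinstance(exclude_label, str):
        items = exclude_label.split(",")
    elif isinstance(exclude_label, list):
        items = [str(item) for item in exclude_label]
    else:
        raise TypeError(f"unsupported exclude_label type: {type(exclude_label).__name__}")
    requirements: Dict[str, str] = {}
    for item in items:
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        requirements[key.strip()] = value.strip()
    return requirements


def is_excluded(pod_labels: Dict[str, str], exclude_label: ExcludeLabel) -> bool:
    """Excluded only if the pod has every key=value pair; an empty selector excludes nothing."""
    requirements = parse_exclude_label(exclude_label)
    return bool(requirements) and all(pod_labels.get(k) == v for k, v in requirements.items())
```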
--- a/scenarios/openshift/aws_node_scenarios.yml
+++ b/scenarios/openshift/aws_node_scenarios.yml
@@ -10,6 +10,7 @@ node_scenarios:
    cloud_type: aws                                               # cloud type on which Kubernetes/OpenShift runs  
    parallel: true                                                # Run action on label or node name in parallel or sequential, defaults to sequential
    kube_check: true                                              # Run the kubernetes api calls to see if the node gets to a certain state during the node scenario
+    poll_interval: 15                                             # Time interval (in seconds) between checks of the node's status
  - actions:
    - node_reboot_scenario
    node_name:
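`poll_interval` sets how often the node's status is re-checked while the scenario waits for it to reach the expected state. A minimal polling sketch, assuming `check` is any callable that returns `True` once the node is in that state. `wait_for_node_state` is hypothetical, and `sleep`/`clock` are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def wait_for_node_state(
    check: Callable[[], bool],
    timeout: float,
    poll_interval: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `check` every `poll_interval` seconds until it returns True or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline, so the last check happens right at the timeout.
        sleep(min(poll_interval, remaining))
```

A larger interval means fewer API calls but a coarser measurement of when the node actually changed state.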
--- a/scenarios/openshift/container_etcd.yml
+++ b/scenarios/openshift/container_etcd.yml
@@ -6,3 +6,4 @@ scenarios:
  action: 1
  count: 1
  expected_recovery_time: 120
+  exclude_label: ""
--- a/tests/kubevirt_vm_outage/test_kubevirt_vm_outage.py
+++ b/tests/kubevirt_vm_outage/test_kubevirt_vm_outage.py
@@ -1,215 +0,0 @@
-import unittest
-import time
-from unittest.mock import MagicMock, patch
-
-import yaml
-from krkn_lib.k8s import KrknKubernetes
-from krkn_lib.models.telemetry import ScenarioTelemetry
-from krkn_lib.telemetry.ocp import KrknTelemetryOpenshift
-
-from krkn.scenario_plugins.kubevirt_vm_outage.kubevirt_vm_outage_scenario_plugin import KubevirtVmOutageScenarioPlugin
-
-
-class TestKubevirtVmOutageScenarioPlugin(unittest.TestCase):
-    
-    def setUp(self):
-        """
-        Set up test fixtures for KubevirtVmOutageScenarioPlugin
-        """
-        self.plugin = KubevirtVmOutageScenarioPlugin()
-        
-        # Create mock k8s client
-        self.k8s_client = MagicMock()
-        self.custom_object_client = MagicMock()
-        self.k8s_client.custom_object_client = self.custom_object_client
-        self.plugin.k8s_client = self.k8s_client
-        
-        # Mock methods needed for KubeVirt operations
-        self.k8s_client.list_custom_resource_definition = MagicMock()
-        
-        # Mock custom resource definition list with KubeVirt CRDs
-        crd_list = MagicMock()
-        crd_item = MagicMock()
-        crd_item.spec = MagicMock()
-        crd_item.spec.group = "kubevirt.io"
-        crd_list.items = [crd_item]
-        self.k8s_client.list_custom_resource_definition.return_value = crd_list
-        
-        # Mock VMI data
-        self.mock_vmi = {
-            "metadata": {
-                "name": "test-vm",
-                "namespace": "default"
-            },
-            "status": {
-                "phase": "Running"
-            }
-        }
-        
-        # Create test config
-        self.config = {
-            "scenarios": [
-                {
-                    "name": "kubevirt outage test",
-                    "scenario": "kubevirt_vm_outage",
-                    "parameters": {
-                        "vm_name": "test-vm",
-                        "namespace": "default",
-                        "duration": 0  
-                    }
-                }
-            ]
-        }
-        
-        # Create a temporary config file
-        import tempfile, os
-        temp_dir = tempfile.gettempdir()
-        self.scenario_file = os.path.join(temp_dir, "test_kubevirt_scenario.yaml")
-        with open(self.scenario_file, "w") as f:
-            yaml.dump(self.config, f)
-            
-        # Mock dependencies
-        self.telemetry = MagicMock(spec=KrknTelemetryOpenshift)
-        self.scenario_telemetry = MagicMock(spec=ScenarioTelemetry)
-        self.telemetry.get_lib_kubernetes.return_value = self.k8s_client
-        
-    def test_successful_injection_and_recovery(self):
-        """
-        Test successful deletion and recovery of a VMI
-        """
-        # Mock get_vmi to return our mock VMI
-        with patch.object(self.plugin, 'get_vmi', return_value=self.mock_vmi):
-            # Mock inject and recover to simulate success
-            with patch.object(self.plugin, 'inject', return_value=0) as mock_inject:
-                with patch.object(self.plugin, 'recover', return_value=0) as mock_recover:
-                    with patch("builtins.open", unittest.mock.mock_open(read_data=yaml.dump(self.config))):
-                        result = self.plugin.run("test-uuid", self.scenario_file, {}, self.telemetry, self.scenario_telemetry)
-                    
-        self.assertEqual(result, 0)
-        mock_inject.assert_called_once_with("test-vm", "default", False)
-        mock_recover.assert_called_once_with("test-vm", "default", False)
-        
-    def test_injection_failure(self):
-        """
-        Test failure during VMI deletion
-        """
-        # Mock get_vmi to return our mock VMI
-        with patch.object(self.plugin, 'get_vmi', return_value=self.mock_vmi):
-            # Mock inject to simulate failure
-            with patch.object(self.plugin, 'inject', return_value=1) as mock_inject:
-                with patch.object(self.plugin, 'recover', return_value=0) as mock_recover:
-                    with patch("builtins.open", unittest.mock.mock_open(read_data=yaml.dump(self.config))):
-                        result = self.plugin.run("test-uuid", self.scenario_file, {}, self.telemetry, self.scenario_telemetry)
-                    
-        self.assertEqual(result, 1)
-        mock_inject.assert_called_once_with("test-vm", "default", False)
-        mock_recover.assert_not_called()
-        
-    def test_disable_auto_restart(self):
-        """
-        Test VM auto-restart can be disabled
-        """
-        # Configure test with disable_auto_restart=True
-        self.config["scenarios"][0]["parameters"]["disable_auto_restart"] = True
-        
-        # Mock VM object for patching
-        mock_vm = {
-            "metadata": {"name": "test-vm", "namespace": "default"},
-            "spec": {}
-        }
-        
-        # Mock get_vmi to return our mock VMI
-        with patch.object(self.plugin, 'get_vmi', return_value=self.mock_vmi):
-            # Mock VM patch operation
-            with patch.object(self.plugin, 'patch_vm_spec') as mock_patch_vm:
-                mock_patch_vm.return_value = True
-                # Mock inject and recover
-                with patch.object(self.plugin, 'inject', return_value=0) as mock_inject:
-                    with patch.object(self.plugin, 'recover', return_value=0) as mock_recover:
-                        with patch("builtins.open", unittest.mock.mock_open(read_data=yaml.dump(self.config))):
-                            result = self.plugin.run("test-uuid", self.scenario_file, {}, self.telemetry, self.scenario_telemetry)
-        
-        self.assertEqual(result, 0)
-        # Should call patch_vm_spec to disable auto-restart
-        mock_patch_vm.assert_any_call("test-vm", "default", False)
-        # Should call patch_vm_spec to re-enable auto-restart during recovery
-        mock_patch_vm.assert_any_call("test-vm", "default", True)
-        mock_inject.assert_called_once_with("test-vm", "default", True)
-        mock_recover.assert_called_once_with("test-vm", "default", True)
-        
-    def test_recovery_when_vmi_does_not_exist(self):
-        """
-        Test recovery logic when VMI does not exist after deletion
-        """
-        # Store the original VMI in the plugin for recovery
-        self.plugin.original_vmi = self.mock_vmi.copy()
-        
-        # Create a cleaned vmi_dict as the plugin would
-        vmi_dict = self.mock_vmi.copy()
-        
-        # Set up running VMI data for after recovery
-        running_vmi = {
-            "metadata": {"name": "test-vm", "namespace": "default"},
-            "status": {"phase": "Running"}
-        }
-        
-        # Set up time.time to immediately exceed the timeout for auto-recovery
-        with patch('time.time', side_effect=[0, 301, 301, 301, 301, 310, 320]):
-            # Mock get_vmi to always return None (not auto-recovered)
-            with patch.object(self.plugin, 'get_vmi', side_effect=[None, None, running_vmi]):
-                # Mock the custom object API to return success
-                self.custom_object_client.create_namespaced_custom_object = MagicMock(return_value=running_vmi)
-                
-                # Run recovery with mocked time.sleep
-                with patch('time.sleep'):
-                    result = self.plugin.recover("test-vm", "default", False)
-        
-        self.assertEqual(result, 0)
-        # Verify create was called with the right arguments for our API version and kind
-        self.custom_object_client.create_namespaced_custom_object.assert_called_once_with(
-            group="kubevirt.io",
-            version="v1",
-            namespace="default",
-            plural="virtualmachineinstances",
-            body=vmi_dict
-        )
-    
-    def test_validation_failure(self):
-        """
-        Test validation failure when KubeVirt is not installed
-        """
-        # Mock empty CRD list (no KubeVirt CRDs)
-        empty_crd_list = MagicMock()
-        empty_crd_list.items = []
-        self.k8s_client.list_custom_resource_definition.return_value = empty_crd_list
-        
-        with patch("builtins.open", unittest.mock.mock_open(read_data=yaml.dump(self.config))):
-            result = self.plugin.run("test-uuid", self.scenario_file, {}, self.telemetry, self.scenario_telemetry)
-            
-        self.assertEqual(result, 1)
-        
-    def test_delete_vmi_timeout(self):
-        """
-        Test timeout during VMI deletion
-        """
-        # Mock successful delete operation
-        self.custom_object_client.delete_namespaced_custom_object = MagicMock(return_value={})
-        
-        # Mock that get_vmi always returns VMI (never gets deleted)
-        with patch.object(self.plugin, 'get_vmi', return_value=self.mock_vmi):
-            # Simulate timeout by making time.time return values that exceed the timeout
-            with patch('time.sleep'), patch('time.time', side_effect=[0, 10, 20, 130, 130, 130, 130, 140]):
-                result = self.plugin.inject("test-vm", "default", False)
-            
-        self.assertEqual(result, 1)
-        self.custom_object_client.delete_namespaced_custom_object.assert_called_once_with(
-            group="kubevirt.io",
-            version="v1",
-            namespace="default",
-            plural="virtualmachineinstances",
-            name="test-vm"
-        )
-
-
-if __name__ == "__main__":
-    unittest.main()
--- a/tests/run_python_plugin.py
+++ b/tests/run_python_plugin.py
@@ -0,0 +1,37 @@
+import tempfile
+import unittest
+
+from krkn.scenario_plugins.native.run_python_plugin import (
+    RunPythonFileInput,
+    run_python_file,
+)
+
+
+class RunPythonPluginTest(unittest.TestCase):
+    def test_success_execution(self):
+        with tempfile.NamedTemporaryFile() as tmp_file:
+            tmp_file.write(bytes("print('Hello world!')", "utf-8"))
+            tmp_file.flush()
+            output_id, output_data = run_python_file(
+                params=RunPythonFileInput(tmp_file.name),
+                run_id="test-python-plugin-success",
+            )
+        self.assertEqual("success", output_id)
+        self.assertEqual("Hello world!\n", output_data.stdout)
+
+    def test_error_execution(self):
+        with tempfile.NamedTemporaryFile() as tmp_file:
+            tmp_file.write(
+                bytes("import sys\nprint('Hello world!')\nsys.exit(42)\n", "utf-8")
+            )
+            tmp_file.flush()
+            output_id, output_data = run_python_file(
+                params=RunPythonFileInput(tmp_file.name), run_id="test-python-plugin-error"
+            )
+        self.assertEqual("error", output_id)
+        self.assertEqual(42, output_data.exit_code)
+        self.assertEqual("Hello world!\n", output_data.stdout)
+
+
+if __name__ == "__main__":
+    unittest.main()
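The two tests in `tests/run_python_plugin.py` pin down the contract of `run_python_file`: output id `success` with the script's stdout when it exits 0, and `error` with the exit code and stdout otherwise. A standalone sketch of that contract using `subprocess.run`. `run_python_file_sketch` and `PythonRunResult` are hypothetical names, not the implementation in `krkn.scenario_plugins.native.run_python_plugin`.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PythonRunResult:
    stdout: str
    exit_code: int


def run_python_file_sketch(path: str, timeout: float = 60.0) -> Tuple[str, PythonRunResult]:
    """Run `path` with the current interpreter; report 'success' on exit code 0, else 'error'."""
    completed = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=timeout
    )
    result = PythonRunResult(stdout=completed.stdout, exit_code=completed.returncode)
    return ("success" if completed.returncode == 0 else "error"), result
```

Writing the script with `NamedTemporaryFile(delete=False)` and closing it before running also works on platforms where an open temporary file cannot be reopened by another process.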
--- a/tests/test_abstract_node_scenarios.py
+++ b/tests/test_abstract_node_scenarios.py
@@ -0,0 +1,415 @@
+"""
+Test suite for AbstractNode Scenarios
+
+Usage:
+    python -m coverage run -a -m unittest tests/test_abstract_node_scenarios.py
+
+Assisted By: Claude Code
+"""
+
+import unittest
+from unittest.mock import Mock, patch
+from krkn.scenario_plugins.node_actions.abstract_node_scenarios import abstract_node_scenarios
+from krkn_lib.k8s import KrknKubernetes
+from krkn_lib.models.k8s import AffectedNode, AffectedNodeStatus
+
+
+class TestAbstractNodeScenarios(unittest.TestCase):
+    """Test suite for abstract_node_scenarios class"""
+
+    def setUp(self):
+        """Set up test fixtures before each test method"""
+        self.mock_kubecli = Mock(spec=KrknKubernetes)
+        self.mock_affected_nodes_status = Mock(spec=AffectedNodeStatus)
+        self.mock_affected_nodes_status.affected_nodes = []
+        self.node_action_kube_check = True
+
+        self.scenarios = abstract_node_scenarios(
+            kubecli=self.mock_kubecli,
+            node_action_kube_check=self.node_action_kube_check,
+            affected_nodes_status=self.mock_affected_nodes_status
+        )
+
+    def test_init(self):
+        """Test initialization of abstract_node_scenarios"""
+        self.assertEqual(self.scenarios.kubecli, self.mock_kubecli)
+        self.assertEqual(self.scenarios.affected_nodes_status, self.mock_affected_nodes_status)
+        self.assertTrue(self.scenarios.node_action_kube_check)
+
+    @patch('time.sleep')
+    @patch('logging.info')
+    def test_node_stop_start_scenario(self, mock_logging, mock_sleep):
+        """Test node_stop_start_scenario calls stop and start in sequence"""
+        # Arrange
+        instance_kill_count = 1
+        node = "test-node"
+        timeout = 300
+        duration = 60
+        poll_interval = 10
+
+        self.scenarios.node_stop_scenario = Mock()
+        self.scenarios.node_start_scenario = Mock()
+
+        # Act
+        self.scenarios.node_stop_start_scenario(
+            instance_kill_count, node, timeout, duration, poll_interval
+        )
+
+        # Assert
+        self.scenarios.node_stop_scenario.assert_called_once_with(
+            instance_kill_count, node, timeout, poll_interval
+        )
+        mock_sleep.assert_called_once_with(duration)
+        self.scenarios.node_start_scenario.assert_called_once_with(
+            instance_kill_count, node, timeout, poll_interval
+        )
+        self.mock_affected_nodes_status.merge_affected_nodes.assert_called_once()
+
+    @patch('logging.info')
+    def test_helper_node_stop_start_scenario(self, mock_logging):
+        """Test helper_node_stop_start_scenario calls helper stop and start"""
+        # Arrange
+        instance_kill_count = 1
+        node = "helper-node"
+        timeout = 300
+
+        self.scenarios.helper_node_stop_scenario = Mock()
+        self.scenarios.helper_node_start_scenario = Mock()
+
+        # Act
+        self.scenarios.helper_node_stop_start_scenario(instance_kill_count, node, timeout)
+
+        # Assert
+        self.scenarios.helper_node_stop_scenario.assert_called_once_with(
+            instance_kill_count, node, timeout
+        )
+        self.scenarios.helper_node_start_scenario.assert_called_once_with(
+            instance_kill_count, node, timeout
+        )
+
+    @patch('time.sleep')
+    @patch('logging.info')
+    def test_node_disk_detach_attach_scenario_success(self, mock_logging, mock_sleep):
+        """Test disk detach/attach scenario with valid disk attachment"""
+        # Arrange
+        instance_kill_count = 1
+        node = "test-node"
+        timeout = 300
+        duration = 60
+        disk_details = {"disk_id": "disk-123", "device": "/dev/sdb"}
+
+        self.scenarios.get_disk_attachment_info = Mock(return_value=disk_details)
+        self.scenarios.disk_detach_scenario = Mock()
+        self.scenarios.disk_attach_scenario = Mock()
+
+        # Act
+        self.scenarios.node_disk_detach_attach_scenario(
+            instance_kill_count, node, timeout, duration
+        )
+
+        # Assert
+        self.scenarios.get_disk_attachment_info.assert_called_once_with(
+            instance_kill_count, node
+        )
+        self.scenarios.disk_detach_scenario.assert_called_once_with(
+            instance_kill_count, node, timeout
+        )
+        mock_sleep.assert_called_once_with(duration)
+        self.scenarios.disk_attach_scenario.assert_called_once_with(
+            instance_kill_count, disk_details, timeout
+        )
+
+    @patch('logging.error')
+    @patch('logging.info')
+    def test_node_disk_detach_attach_scenario_no_disk(self, mock_info, mock_error):
+        """Test disk detach/attach scenario when only root disk exists"""
+        # Arrange
+        instance_kill_count = 1
+        node = "test-node"
+        timeout = 300
+        duration = 60
+
+        self.scenarios.get_disk_attachment_info = Mock(return_value=None)
+        self.scenarios.disk_detach_scenario = Mock()
+        self.scenarios.disk_attach_scenario = Mock()
+
+        # Act
+        self.scenarios.node_disk_detach_attach_scenario(
+            instance_kill_count, node, timeout, duration
+        )
+
+        # Assert
+        self.scenarios.disk_detach_scenario.assert_not_called()
+        self.scenarios.disk_attach_scenario.assert_not_called()
+        mock_error.assert_any_call("Node %s has only root disk attached" % node)
+
+    @patch('krkn.scenario_plugins.node_actions.abstract_node_scenarios.nodeaction.wait_for_unknown_status')
+    @patch('krkn.scenario_plugins.node_actions.abstract_node_scenarios.runcommand.run')
+    @patch('logging.info')
+    def test_stop_kubelet_scenario_success(self, mock_logging, mock_run, mock_wait):
+        """Test successful kubelet stop scenario"""
+        # Arrange
+        instance_kill_count = 2
+        node = "test-node"
+        timeout = 300
+        mock_affected_node = Mock(spec=AffectedNode)
+        mock_wait.return_value = None
+
+        # Act
+        with patch('krkn.scenario_plugins.node_actions.abstract_node_scenarios.AffectedNode') as mock_affected_node_class:
+            mock_affected_node_class.return_value = mock_affected_node
+            self.scenarios.stop_kubelet_scenario(instance_kill_count, node, timeout)
+
+        # Assert
+        self.assertEqual(mock_run.call_count, 2)
+        expected_command = "oc debug node/" + node + " -- chroot /host systemctl stop kubelet"
+        mock_run.assert_called_with(expected_command)
+        self.assertEqual(mock_wait.call_count, 2)
+        self.assertEqual(len(self.mock_affected_nodes_status.affected_nodes), 2)
+
+    @patch('krkn.scenario_plugins.node_actions.abstract_node_scenarios.nodeaction.wait_for_unknown_status')
+    @patch('krkn.scenario_plugins.node_actions.abstract_node_scenarios.runcommand.run')
+    @patch('logging.error')
+    @patch('logging.info')
+    def test_stop_kubelet_scenario_failure(self, mock_info, mock_error, mock_run, mock_wait):
+        """Test kubelet stop scenario when command fails"""
+        # Arrange
+        instance_kill_count = 1
+        node = "test-node"
+        timeout = 300
+        error_msg = "Command failed"
+        mock_run.side_effect = Exception(error_msg)
+
+        # Act & Assert
+        with self.assertRaises(Exception):
+            with patch('krkn.scenario_plugins.node_actions.abstract_node_scenarios.AffectedNode'):
+                self.scenarios.stop_kubelet_scenario(instance_kill_count, node, timeout)
+
+        mock_error.assert_any_call(
+            "Failed to stop the kubelet of the node. Encountered following "
+            "exception: %s. Test Failed" % error_msg
+        )
+
+    @patch('logging.info')
+    def test_stop_start_kubelet_scenario(self, mock_logging):
+        """Test stop/start kubelet scenario"""
+        # Arrange
+        instance_kill_count = 1
+        node = "test-node"
+        timeout = 300
+
+        self.scenarios.stop_kubelet_scenario = Mock()
+        self.scenarios.node_reboot_scenario = Mock()
+
+        # Act
+        self.scenarios.stop_start_kubelet_scenario(instance_kill_count, node, timeout)
+
+        # Assert
+        self.scenarios.stop_kubelet_scenario.assert_called_once_with(
+            instance_kill_count, node, timeout
+        )
+        self.scenarios.node_reboot_scenario.assert_called_once_with(
+            instance_kill_count, node, timeout
+        )
+        self.mock_affected_nodes_status.merge_affected_nodes.assert_called_once()
+
+    @patch('krkn.scenario_plugins.node_actions.abstract_node_scenarios.nodeaction.wait_for_ready_status')
+    @patch('krkn.scenario_plugins.node_actions.abstract_node_scenarios.runcommand.run')
+    @patch('logging.info')
+    def test_restart_kubelet_scenario_success(self, mock_logging, mock_run, mock_wait):
+        """Test successful kubelet restart scenario"""
+        # Arrange
+        instance_kill_count = 2
+        node = "test-node"
+        timeout = 300
+        mock_affected_node = Mock(spec=AffectedNode)
+        mock_wait.return_value = None
+
+        # Act
+        with patch('krkn.scenario_plugins.node_actions.abstract_node_scenarios.AffectedNode') as mock_affected_node_class:
+            mock_affected_node_class.return_value = mock_affected_node
+            self.scenarios.restart_kubelet_scenario(instance_kill_count, node, timeout)
+
+        # Assert
+        self.assertEqual(mock_run.call_count, 2)
+        expected_command = "oc debug node/" + node + " -- chroot /host systemctl restart kubelet &"
+        mock_run.assert_called_with(expected_command)
+        self.assertEqual(mock_wait.call_count, 2)
+        self.assertEqual(len(self.mock_affected_nodes_status.affected_nodes), 2)
+
+    @patch('krkn.scenario_plugins.node_actions.abstract_node_scenarios.nodeaction.wait_for_ready_status')
+    @patch('krkn.scenario_plugins.node_actions.abstract_node_scenarios.runcommand.run')
+    @patch('logging.error')
+    @patch('logging.info')
+    def test_restart_kubelet_scenario_failure(self, mock_info, mock_error, mock_run, mock_wait):
+        """Test kubelet restart scenario when command fails"""
+        # Arrange
+        instance_kill_count = 1
+        node = "test-node"
+        timeout = 300
+        error_msg = "Restart failed"
+        mock_run.side_effect = Exception(error_msg)
+
+        # Act & Assert
+        with self.assertRaises(Exception):
+            with patch('krkn.scenario_plugins.node_actions.abstract_node_scenarios.AffectedNode'):
+                self.scenarios.restart_kubelet_scenario(instance_kill_count, node, timeout)
+
+        mock_error.assert_any_call(
+            "Failed to restart the kubelet of the node. Encountered following "
+            "exception: %s. Test Failed" % error_msg
+        )
+
+    @patch('krkn.scenario_plugins.node_actions.abstract_node_scenarios.runcommand.run')
+    @patch('logging.info')
+    def test_node_crash_scenario_success(self, mock_logging, mock_run):
+        """Test successful node crash scenario"""
+        # Arrange
+        instance_kill_count = 2
+        node = "test-node"
+        timeout = 300
+
+        # Act
+        result = self.scenarios.node_crash_scenario(instance_kill_count, node, timeout)
+
+        # Assert
+        self.assertEqual(mock_run.call_count, 2)
+        expected_command = (
+            "oc debug node/" + node + " -- chroot /host "
+            "dd if=/dev/urandom of=/proc/sysrq-trigger"
+        )
+        mock_run.assert_called_with(expected_command)
+        self.assertIsNone(result)
+
+    @patch('krkn.scenario_plugins.node_actions.abstract_node_scenarios.runcommand.run')
+    @patch('logging.error')
+    @patch('logging.info')
+    def test_node_crash_scenario_failure(self, mock_info, mock_error, mock_run):
+        """Test node crash scenario when command fails"""
+        # Arrange
+        instance_kill_count = 1
+        node = "test-node"
+        timeout = 300
+        error_msg = "Crash command failed"
+        mock_run.side_effect = Exception(error_msg)
+
+        # Act
+        result = self.scenarios.node_crash_scenario(instance_kill_count, node, timeout)
+
+        # Assert
+        self.assertEqual(result, 1)
+        mock_error.assert_any_call(
+            "Failed to crash the node. Encountered following exception: %s. "
+            "Test Failed" % error_msg
+        )
+
+    def test_node_start_scenario_not_implemented(self):
+        """Test that node_start_scenario returns None (not implemented)"""
+        result = self.scenarios.node_start_scenario(1, "test-node", 300, 10)
+        self.assertIsNone(result)
+
+    def test_node_stop_scenario_not_implemented(self):
+        """Test that node_stop_scenario returns None (not implemented)"""
+        result = self.scenarios.node_stop_scenario(1, "test-node", 300, 10)
+        self.assertIsNone(result)
+
+    def test_node_termination_scenario_not_implemented(self):
+        """Test that node_termination_scenario returns None (not implemented)"""
+        result = self.scenarios.node_termination_scenario(1, "test-node", 300, 10)
+        self.assertIsNone(result)
+
+    def test_node_reboot_scenario_not_implemented(self):
+        """Test that node_reboot_scenario returns None (not implemented)"""
+        result = self.scenarios.node_reboot_scenario(1, "test-node", 300)
+        self.assertIsNone(result)
+
+    def test_node_service_status_not_implemented(self):
+        """Test that node_service_status returns None (not implemented)"""
+        result = self.scenarios.node_service_status("test-node", "service", "key", 300)
+        self.assertIsNone(result)
+
+    def test_node_block_scenario_not_implemented(self):
+        """Test that node_block_scenario returns None (not implemented)"""
+        result = self.scenarios.node_block_scenario(1, "test-node", 300, 60)
+        self.assertIsNone(result)
+
+
+class TestAbstractNodeScenariosIntegration(unittest.TestCase):
+    """Integration tests for abstract_node_scenarios workflows"""
+
+    def setUp(self):
+        """Set up test fixtures before each test method"""
+        self.mock_kubecli = Mock(spec=KrknKubernetes)
+        self.mock_affected_nodes_status = Mock(spec=AffectedNodeStatus)
+        self.mock_affected_nodes_status.affected_nodes = []
+
+        self.scenarios = abstract_node_scenarios(
+            kubecli=self.mock_kubecli,
+            node_action_kube_check=True,
+            affected_nodes_status=self.mock_affected_nodes_status
+        )
+
+    @patch('time.sleep')
+    @patch('krkn.scenario_plugins.node_actions.abstract_node_scenarios.nodeaction.wait_for_unknown_status')
+    @patch('krkn.scenario_plugins.node_actions.abstract_node_scenarios.runcommand.run')
+    def test_complete_stop_start_kubelet_workflow(self, mock_run, mock_wait, mock_sleep):
+        """Test complete workflow of stop/start kubelet scenario"""
+        # Arrange
+        instance_kill_count = 1
+        node = "test-node"
+        timeout = 300
+
+        self.scenarios.node_reboot_scenario = Mock()
+
+        # Act
+        with patch('krkn.scenario_plugins.node_actions.abstract_node_scenarios.AffectedNode'):
+            self.scenarios.stop_start_kubelet_scenario(instance_kill_count, node, timeout)
+
+        # Assert - verify stop kubelet was called
+        expected_stop_command = "oc debug node/" + node + " -- chroot /host systemctl stop kubelet"
+        mock_run.assert_any_call(expected_stop_command)
+
+        # Verify reboot was called
+        self.scenarios.node_reboot_scenario.assert_called_once_with(
+            instance_kill_count, node, timeout
+        )
+
+        # Verify merge was called
+        self.mock_affected_nodes_status.merge_affected_nodes.assert_called_once()
+
+    @patch('time.sleep')
+    def test_node_stop_start_scenario_workflow(self, mock_sleep):
+        """Test complete workflow of node stop/start scenario"""
+        # Arrange
+        instance_kill_count = 1
+        node = "test-node"
+        timeout = 300
+        duration = 60
+        poll_interval = 10
+
+        self.scenarios.node_stop_scenario = Mock()
+        self.scenarios.node_start_scenario = Mock()
+        # Attach the mocks to one parent so their relative call order is recorded
+        call_tracker = Mock()
+        call_tracker.attach_mock(self.scenarios.node_stop_scenario, "stop")
+        call_tracker.attach_mock(mock_sleep, "sleep")
+        call_tracker.attach_mock(self.scenarios.node_start_scenario, "start")
+
+        # Act
+        self.scenarios.node_stop_start_scenario(
+            instance_kill_count, node, timeout, duration, poll_interval
+        )
+
+        # Assert - stop runs first, then the sleep, then start
+        self.assertEqual(
+            [name for name, _, _ in call_tracker.mock_calls],
+            ["stop", "sleep", "start"],
+        )
+        mock_sleep.assert_called_once_with(duration)
+
+        # Verify merge was called
+        self.mock_affected_nodes_status.merge_affected_nodes.assert_called_once()
+
+
+if __name__ == '__main__':
+    unittest.main()
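`test_node_stop_start_scenario_workflow` checks that stop, sleep and start each ran once, but not the order in which they ran. Below is a minimal sketch of one way to assert ordering with `unittest.mock`: attach the individual mocks to a single parent `Mock` with `attach_mock`, then compare the parent's `mock_calls` list. The `stop_start` function is a hypothetical stand-in for `node_stop_start_scenario`, not the krkn implementation.

```python
import unittest
from unittest.mock import Mock, call


def stop_start(stop, sleep, start, node, duration):
    """Hypothetical stand-in for a stop -> wait -> start node workflow."""
    stop(node)
    sleep(duration)
    start(node)


class TestStopStartOrder(unittest.TestCase):
    def test_stop_sleep_start_order(self):
        parent = Mock()
        stop, sleep, start = Mock(), Mock(), Mock()
        # attach_mock records each child's calls on parent.mock_calls, in call order
        parent.attach_mock(stop, "stop")
        parent.attach_mock(sleep, "sleep")
        parent.attach_mock(start, "start")

        stop_start(stop, sleep, start, "test-node", 60)

        # Comparing lists fails if the calls happened in any other order
        self.assertEqual(
            parent.mock_calls,
            [call.stop("test-node"), call.sleep(60), call.start("test-node")],
        )
```

Run it with `python -m unittest`. In the real test, the same idea would mean attaching the patched `time.sleep` mock and the `node_stop_scenario` / `node_start_scenario` mocks to one parent before calling the scenario.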
--- a/tests/test_alibaba_node_scenarios.py
+++ b/tests/test_alibaba_node_scenarios.py
@@ -0,0 +1,680 @@
+#!/usr/bin/env python3
+
+"""
+Test suite for alibaba_node_scenarios class
+
+Usage:
+    python -m coverage run -a -m unittest tests/test_alibaba_node_scenarios.py -v
+
+Assisted By: Claude Code
+"""
+
+import unittest
+from unittest.mock import Mock, patch
+import logging
+import json
+
+from krkn_lib.k8s import KrknKubernetes
+from krkn_lib.models.k8s import AffectedNode, AffectedNodeStatus
+
+from krkn.scenario_plugins.node_actions.alibaba_node_scenarios import Alibaba, alibaba_node_scenarios
+
+
+class TestAlibaba(unittest.TestCase):
+    """Test suite for Alibaba class"""
+
+    def setUp(self):
+        """Set up test fixtures"""
+        # Mock environment variables
+        self.env_patcher = patch.dict('os.environ', {
+            'ALIBABA_ID': 'test-access-key',
+            'ALIBABA_SECRET': 'test-secret-key',
+            'ALIBABA_REGION_ID': 'cn-hangzhou'
+        })
+        self.env_patcher.start()
+
+    def tearDown(self):
+        """Clean up after tests"""
+        self.env_patcher.stop()
+
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_alibaba_init_success(self, mock_acs_client, mock_logging):
+        """Test Alibaba class initialization"""
+        mock_client = Mock()
+        mock_acs_client.return_value = mock_client
+
+        alibaba = Alibaba()
+
+        mock_acs_client.assert_called_once_with('test-access-key', 'test-secret-key', 'cn-hangzhou')
+        self.assertEqual(alibaba.compute_client, mock_client)
+
+    @patch('logging.error')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_alibaba_init_failure(self, mock_acs_client, mock_logging):
+        """Test Alibaba initialization handles errors"""
+        mock_acs_client.side_effect = Exception("Credential error")
+
+        alibaba = Alibaba()
+
+        mock_logging.assert_called()
+        self.assertIn("Initializing alibaba", str(mock_logging.call_args))
+
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_send_request_success(self, mock_acs_client):
+        """Test _send_request successfully sends request"""
+        alibaba = Alibaba()
+
+        mock_request = Mock()
+        mock_response = {'Instances': {'Instance': []}}
+        alibaba.compute_client.do_action.return_value = json.dumps(mock_response).encode('utf-8')
+
+        result = alibaba._send_request(mock_request)
+
+        mock_request.set_accept_format.assert_called_once_with('json')
+        alibaba.compute_client.do_action.assert_called_once_with(mock_request)
+        self.assertEqual(result, mock_response)
+
+    @patch('logging.error')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_send_request_failure(self, mock_acs_client, mock_logging):
+        """Test _send_request handles errors"""
+        alibaba = Alibaba()
+
+        mock_request = Mock()
+        alibaba.compute_client.do_action.side_effect = Exception("API error")
+
+        # The code under test has a bug in its error format string (%S instead of %s),
+        # so formatting the message raises ValueError, which this test expects
+        with self.assertRaises(ValueError):
+            alibaba._send_request(mock_request)
+
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_list_instances_success(self, mock_acs_client):
+        """Test list_instances returns instance list"""
+        alibaba = Alibaba()
+
+        mock_instances = [
+            {'InstanceId': 'i-123', 'InstanceName': 'node1'},
+            {'InstanceId': 'i-456', 'InstanceName': 'node2'}
+        ]
+        mock_response = {'Instances': {'Instance': mock_instances}}
+        alibaba.compute_client.do_action.return_value = json.dumps(mock_response).encode('utf-8')
+
+        result = alibaba.list_instances()
+
+        self.assertEqual(result, mock_instances)
+
+    @patch('logging.error')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_list_instances_no_instances_key(self, mock_acs_client, mock_logging):
+        """Test list_instances handles missing Instances key"""
+        alibaba = Alibaba()
+
+        mock_response = {'SomeOtherKey': 'value'}
+        alibaba.compute_client.do_action.return_value = json.dumps(mock_response).encode('utf-8')
+
+        with self.assertRaises(RuntimeError):
+            alibaba.list_instances()
+
+        mock_logging.assert_called()
+
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_list_instances_none_response(self, mock_acs_client):
+        """Test list_instances handles None response"""
+        alibaba = Alibaba()
+        alibaba._send_request = Mock(return_value=None)
+
+        result = alibaba.list_instances()
+
+        self.assertEqual(result, [])
+
+    @patch('logging.error')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_list_instances_exception(self, mock_acs_client, mock_logging):
+        """Test list_instances handles exceptions"""
+        alibaba = Alibaba()
+        alibaba._send_request = Mock(side_effect=Exception("Network error"))
+
+        with self.assertRaises(Exception):
+            alibaba.list_instances()
+
+        mock_logging.assert_called()
+
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_get_instance_id_found(self, mock_acs_client):
+        """Test get_instance_id when instance is found"""
+        alibaba = Alibaba()
+
+        mock_instances = [
+            {'InstanceId': 'i-123', 'InstanceName': 'test-node'},
+            {'InstanceId': 'i-456', 'InstanceName': 'other-node'}
+        ]
+        alibaba.list_instances = Mock(return_value=mock_instances)
+
+        result = alibaba.get_instance_id('test-node')
+
+        self.assertEqual(result, 'i-123')
+
+    @patch('logging.error')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_get_instance_id_not_found(self, mock_acs_client, mock_logging):
+        """Test get_instance_id when instance is not found"""
+        alibaba = Alibaba()
+
+        alibaba.list_instances = Mock(return_value=[])
+
+        with self.assertRaises(RuntimeError):
+            alibaba.get_instance_id('nonexistent-node')
+
+        mock_logging.assert_called()
+        self.assertIn("Couldn't find vm", str(mock_logging.call_args))
+
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_start_instances_success(self, mock_acs_client, mock_logging):
+        """Test start_instances successfully starts instance"""
+        alibaba = Alibaba()
+        alibaba._send_request = Mock(return_value={'RequestId': 'req-123'})
+
+        alibaba.start_instances('i-123')
+
+        alibaba._send_request.assert_called_once()
+        mock_logging.assert_called()
+        call_str = str(mock_logging.call_args_list)
+        self.assertTrue('started' in call_str or 'submit successfully' in call_str)
+
+    @patch('logging.error')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_start_instances_failure(self, mock_acs_client, mock_logging):
+        """Test start_instances handles failure"""
+        alibaba = Alibaba()
+        alibaba._send_request = Mock(side_effect=Exception("Start failed"))
+
+        with self.assertRaises(Exception):
+            alibaba.start_instances('i-123')
+
+        mock_logging.assert_called()
+        self.assertIn("Failed to start", str(mock_logging.call_args))
+
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_stop_instances_success(self, mock_acs_client, mock_logging):
+        """Test stop_instances successfully stops instance"""
+        alibaba = Alibaba()
+        alibaba._send_request = Mock(return_value={'RequestId': 'req-123'})
+
+        alibaba.stop_instances('i-123', force_stop=True)
+
+        alibaba._send_request.assert_called_once()
+        mock_logging.assert_called()
+
+    @patch('logging.error')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_stop_instances_failure(self, mock_acs_client, mock_logging):
+        """Test stop_instances handles failure"""
+        alibaba = Alibaba()
+        alibaba._send_request = Mock(side_effect=Exception("Stop failed"))
+
+        with self.assertRaises(Exception):
+            alibaba.stop_instances('i-123')
+
+        mock_logging.assert_called()
+        self.assertIn("Failed to stop", str(mock_logging.call_args))
+
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_release_instance_success(self, mock_acs_client, mock_logging):
+        """Test release_instance successfully releases instance"""
+        alibaba = Alibaba()
+        alibaba._send_request = Mock(return_value={'RequestId': 'req-123'})
+
+        alibaba.release_instance('i-123', force_release=True)
+
+        alibaba._send_request.assert_called_once()
+        mock_logging.assert_called()
+        self.assertIn("released", str(mock_logging.call_args))
+
+    @patch('logging.error')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_release_instance_failure(self, mock_acs_client, mock_logging):
+        """Test release_instance handles failure"""
+        alibaba = Alibaba()
+        alibaba._send_request = Mock(side_effect=Exception("Release failed"))
+
+        with self.assertRaises(Exception):
+            alibaba.release_instance('i-123')
+
+        mock_logging.assert_called()
+        self.assertIn("Failed to terminate", str(mock_logging.call_args))
+
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_reboot_instances_success(self, mock_acs_client, mock_logging):
+        """Test reboot_instances successfully reboots instance"""
+        alibaba = Alibaba()
+        alibaba._send_request = Mock(return_value={'RequestId': 'req-123'})
+
+        alibaba.reboot_instances('i-123', force_reboot=True)
+
+        alibaba._send_request.assert_called_once()
+        mock_logging.assert_called()
+        self.assertIn("rebooted", str(mock_logging.call_args))
+
+    @patch('logging.error')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_reboot_instances_failure(self, mock_acs_client, mock_logging):
+        """Test reboot_instances handles failure"""
+        alibaba = Alibaba()
+        alibaba._send_request = Mock(side_effect=Exception("Reboot failed"))
+
+        with self.assertRaises(Exception):
+            alibaba.reboot_instances('i-123')
+
+        mock_logging.assert_called()
+        self.assertIn("Failed to reboot", str(mock_logging.call_args))
+
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_get_vm_status_success(self, mock_acs_client, mock_logging):
+        """Test get_vm_status returns instance status"""
+        alibaba = Alibaba()
+
+        mock_response = {
+            'Instances': {
+                'Instance': [{'Status': 'Running'}]
+            }
+        }
+        alibaba._send_request = Mock(return_value=mock_response)
+
+        result = alibaba.get_vm_status('i-123')
+
+        self.assertEqual(result, 'Running')
+
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_get_vm_status_no_instances(self, mock_acs_client, mock_logging):
+        """Test get_vm_status when no instances found"""
+        alibaba = Alibaba()
+
+        mock_response = {
+            'Instances': {
+                'Instance': []
+            }
+        }
+        alibaba._send_request = Mock(return_value=mock_response)
+
+        result = alibaba.get_vm_status('i-123')
+
+        self.assertIsNone(result)
+
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_get_vm_status_none_response(self, mock_acs_client, mock_logging):
+        """Test get_vm_status with None response"""
+        alibaba = Alibaba()
+        alibaba._send_request = Mock(return_value=None)
+
+        result = alibaba.get_vm_status('i-123')
+
+        self.assertEqual(result, 'Unknown')
+
+    @patch('logging.error')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_get_vm_status_exception(self, mock_acs_client, mock_logging):
+        """Test get_vm_status handles exceptions"""
+        alibaba = Alibaba()
+        alibaba._send_request = Mock(side_effect=Exception("API error"))
+
+        result = alibaba.get_vm_status('i-123')
+
+        self.assertIsNone(result)
+        mock_logging.assert_called()
+
+    @patch('time.sleep')
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_wait_until_running_success(self, mock_acs_client, mock_logging, mock_sleep):
+        """Test wait_until_running waits for instance to be running"""
+        alibaba = Alibaba()
+
+        alibaba.get_vm_status = Mock(side_effect=['Starting', 'Running'])
+        mock_affected_node = Mock(spec=AffectedNode)
+
+        result = alibaba.wait_until_running('i-123', 300, mock_affected_node)
+
+        self.assertTrue(result)
+        mock_affected_node.set_affected_node_status.assert_called_once()
+        args = mock_affected_node.set_affected_node_status.call_args[0]
+        self.assertEqual(args[0], 'running')
+
+    @patch('time.sleep')
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_wait_until_running_timeout(self, mock_acs_client, mock_logging, mock_sleep):
+        """Test wait_until_running returns False on timeout"""
+        alibaba = Alibaba()
+
+        alibaba.get_vm_status = Mock(return_value='Starting')
+
+        result = alibaba.wait_until_running('i-123', 10, None)
+
+        self.assertFalse(result)
+
+    @patch('time.sleep')
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_wait_until_stopped_success(self, mock_acs_client, mock_logging, mock_sleep):
+        """Test wait_until_stopped waits for instance to be stopped"""
+        alibaba = Alibaba()
+
+        alibaba.get_vm_status = Mock(side_effect=['Stopping', 'Stopped'])
+        mock_affected_node = Mock(spec=AffectedNode)
+
+        result = alibaba.wait_until_stopped('i-123', 300, mock_affected_node)
+
+        self.assertTrue(result)
+        mock_affected_node.set_affected_node_status.assert_called_once()
+
+    @patch('time.sleep')
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_wait_until_stopped_timeout(self, mock_acs_client, mock_logging, mock_sleep):
+        """Test wait_until_stopped returns False on timeout"""
+        alibaba = Alibaba()
+
+        alibaba.get_vm_status = Mock(return_value='Stopping')
+
+        result = alibaba.wait_until_stopped('i-123', 10, None)
+
+        self.assertFalse(result)
+
+    @patch('time.sleep')
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_wait_until_released_success(self, mock_acs_client, mock_logging, mock_sleep):
+        """Test wait_until_released waits for instance to be released"""
+        alibaba = Alibaba()
+
+        alibaba.get_vm_status = Mock(side_effect=['Deleting', 'Released'])
+        mock_affected_node = Mock(spec=AffectedNode)
+
+        result = alibaba.wait_until_released('i-123', 300, mock_affected_node)
+
+        self.assertTrue(result)
+        mock_affected_node.set_affected_node_status.assert_called_once()
+        args = mock_affected_node.set_affected_node_status.call_args[0]
+        self.assertEqual(args[0], 'terminated')
+
+    @patch('time.sleep')
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_wait_until_released_timeout(self, mock_acs_client, mock_logging, mock_sleep):
+        """Test wait_until_released returns False on timeout"""
+        alibaba = Alibaba()
+
+        alibaba.get_vm_status = Mock(return_value='Deleting')
+
+        result = alibaba.wait_until_released('i-123', 10, None)
+
+        self.assertFalse(result)
+
+    @patch('time.sleep')
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.AcsClient')
+    def test_wait_until_released_none_status(self, mock_acs_client, mock_logging, mock_sleep):
+        """Test wait_until_released when status becomes None"""
+        alibaba = Alibaba()
+
+        alibaba.get_vm_status = Mock(side_effect=['Deleting', None])
+        mock_affected_node = Mock(spec=AffectedNode)
+
+        result = alibaba.wait_until_released('i-123', 300, mock_affected_node)
+
+        self.assertTrue(result)
+
+
+class TestAlibabaNodeScenarios(unittest.TestCase):
+    """Test suite for alibaba_node_scenarios class"""
+
+    def setUp(self):
+        """Set up test fixtures"""
+        self.env_patcher = patch.dict('os.environ', {
+            'ALIBABA_ID': 'test-access-key',
+            'ALIBABA_SECRET': 'test-secret-key',
+            'ALIBABA_REGION_ID': 'cn-hangzhou'
+        })
+        self.env_patcher.start()
+
+        self.mock_kubecli = Mock(spec=KrknKubernetes)
+        self.affected_nodes_status = AffectedNodeStatus()
+
+    def tearDown(self):
+        """Clean up after tests"""
+        self.env_patcher.stop()
+
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.Alibaba')
+    def test_init(self, mock_alibaba_class, mock_logging):
+        """Test alibaba_node_scenarios initialization"""
+        mock_alibaba_instance = Mock()
+        mock_alibaba_class.return_value = mock_alibaba_instance
+
+        scenarios = alibaba_node_scenarios(self.mock_kubecli, True, self.affected_nodes_status)
+
+        self.assertEqual(scenarios.kubecli, self.mock_kubecli)
+        self.assertTrue(scenarios.node_action_kube_check)
+        self.assertEqual(scenarios.alibaba, mock_alibaba_instance)
+
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.nodeaction')
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.Alibaba')
+    def test_node_start_scenario_success(self, mock_alibaba_class, mock_logging, mock_nodeaction):
+        """Test node_start_scenario successfully starts node"""
+        mock_alibaba = Mock()
+        mock_alibaba_class.return_value = mock_alibaba
+        mock_alibaba.get_instance_id.return_value = 'i-123'
+        mock_alibaba.wait_until_running.return_value = True
+
+        scenarios = alibaba_node_scenarios(self.mock_kubecli, True, self.affected_nodes_status)
+
+        scenarios.node_start_scenario(1, 'test-node', 300, 15)
+
+        mock_alibaba.get_instance_id.assert_called_once_with('test-node')
+        mock_alibaba.start_instances.assert_called_once_with('i-123')
+        mock_alibaba.wait_until_running.assert_called_once()
+        mock_nodeaction.wait_for_ready_status.assert_called_once()
+        self.assertEqual(len(self.affected_nodes_status.affected_nodes), 1)
+
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.nodeaction')
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.Alibaba')
+    def test_node_start_scenario_no_kube_check(self, mock_alibaba_class, mock_logging, mock_nodeaction):
+        """Test node_start_scenario without Kubernetes check"""
+        mock_alibaba = Mock()
+        mock_alibaba_class.return_value = mock_alibaba
+        mock_alibaba.get_instance_id.return_value = 'i-123'
+        mock_alibaba.wait_until_running.return_value = True
+
+        scenarios = alibaba_node_scenarios(self.mock_kubecli, False, self.affected_nodes_status)
+
+        scenarios.node_start_scenario(1, 'test-node', 300, 15)
+
+        mock_alibaba.start_instances.assert_called_once()
+        mock_nodeaction.wait_for_ready_status.assert_not_called()
+
+    @patch('logging.error')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.Alibaba')
+    def test_node_start_scenario_failure(self, mock_alibaba_class, mock_logging):
+        """Test node_start_scenario handles failure"""
+        mock_alibaba = Mock()
+        mock_alibaba_class.return_value = mock_alibaba
+        mock_alibaba.get_instance_id.return_value = 'i-123'
+        mock_alibaba.start_instances.side_effect = Exception('Start failed')
+
+        scenarios = alibaba_node_scenarios(self.mock_kubecli, False, self.affected_nodes_status)
+
+        with self.assertRaises(Exception):
+            scenarios.node_start_scenario(1, 'test-node', 300, 15)
+
+        mock_logging.assert_called()
+
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.nodeaction')
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.Alibaba')
+    def test_node_start_scenario_multiple_runs(self, mock_alibaba_class, mock_logging, mock_nodeaction):
+        """Test node_start_scenario with multiple runs"""
+        mock_alibaba = Mock()
+        mock_alibaba_class.return_value = mock_alibaba
+        mock_alibaba.get_instance_id.return_value = 'i-123'
+        mock_alibaba.wait_until_running.return_value = True
+
+        scenarios = alibaba_node_scenarios(self.mock_kubecli, True, self.affected_nodes_status)
+
+        scenarios.node_start_scenario(3, 'test-node', 300, 15)
+
+        self.assertEqual(mock_alibaba.start_instances.call_count, 3)
+        self.assertEqual(len(self.affected_nodes_status.affected_nodes), 3)
+
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.nodeaction')
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.Alibaba')
+    def test_node_stop_scenario_success(self, mock_alibaba_class, mock_logging, mock_nodeaction):
+        """Test node_stop_scenario successfully stops node"""
+        mock_alibaba = Mock()
+        mock_alibaba_class.return_value = mock_alibaba
+        mock_alibaba.get_instance_id.return_value = 'i-123'
+        mock_alibaba.wait_until_stopped.return_value = True
+
+        scenarios = alibaba_node_scenarios(self.mock_kubecli, True, self.affected_nodes_status)
+
+        scenarios.node_stop_scenario(1, 'test-node', 300, 15)
+
+        mock_alibaba.get_instance_id.assert_called_once_with('test-node')
+        mock_alibaba.stop_instances.assert_called_once_with('i-123')
+        mock_alibaba.wait_until_stopped.assert_called_once()
+        mock_nodeaction.wait_for_unknown_status.assert_called_once()
+        self.assertEqual(len(self.affected_nodes_status.affected_nodes), 1)
+
+    @patch('logging.error')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.Alibaba')
+    def test_node_stop_scenario_failure(self, mock_alibaba_class, mock_logging):
+        """Test node_stop_scenario handles failure"""
+        mock_alibaba = Mock()
+        mock_alibaba_class.return_value = mock_alibaba
+        mock_alibaba.get_instance_id.return_value = 'i-123'
+        mock_alibaba.stop_instances.side_effect = Exception('Stop failed')
+
+        scenarios = alibaba_node_scenarios(self.mock_kubecli, False, self.affected_nodes_status)
+
+        with self.assertRaises(Exception):
+            scenarios.node_stop_scenario(1, 'test-node', 300, 15)
+
+        mock_logging.assert_called()
+
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.Alibaba')
+    def test_node_termination_scenario_success(self, mock_alibaba_class, mock_logging):
+        """Test node_termination_scenario successfully terminates node"""
+        mock_alibaba = Mock()
+        mock_alibaba_class.return_value = mock_alibaba
+        mock_alibaba.get_instance_id.return_value = 'i-123'
+        mock_alibaba.wait_until_stopped.return_value = True
+        mock_alibaba.wait_until_released.return_value = True
+
+        scenarios = alibaba_node_scenarios(self.mock_kubecli, False, self.affected_nodes_status)
+
+        scenarios.node_termination_scenario(1, 'test-node', 300, 15)
+
+        mock_alibaba.stop_instances.assert_called_once_with('i-123')
+        mock_alibaba.wait_until_stopped.assert_called_once()
+        mock_alibaba.release_instance.assert_called_once_with('i-123')
+        mock_alibaba.wait_until_released.assert_called_once()
+        self.assertEqual(len(self.affected_nodes_status.affected_nodes), 1)
+
+    @patch('logging.error')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.Alibaba')
+    def test_node_termination_scenario_failure(self, mock_alibaba_class, mock_logging):
+        """Test node_termination_scenario handles failure"""
+        mock_alibaba = Mock()
+        mock_alibaba_class.return_value = mock_alibaba
+        mock_alibaba.get_instance_id.return_value = 'i-123'
+        mock_alibaba.stop_instances.side_effect = Exception('Stop failed')
+
+        scenarios = alibaba_node_scenarios(self.mock_kubecli, False, self.affected_nodes_status)
+
+        with self.assertRaises(Exception):
+            scenarios.node_termination_scenario(1, 'test-node', 300, 15)
+
+        mock_logging.assert_called()
+
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.nodeaction')
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.Alibaba')
+    def test_node_reboot_scenario_success(self, mock_alibaba_class, mock_logging, mock_nodeaction):
+        """Test node_reboot_scenario successfully reboots node"""
+        mock_alibaba = Mock()
+        mock_alibaba_class.return_value = mock_alibaba
+        mock_alibaba.get_instance_id.return_value = 'i-123'
+
+        scenarios = alibaba_node_scenarios(self.mock_kubecli, True, self.affected_nodes_status)
+
+        scenarios.node_reboot_scenario(1, 'test-node', 300, soft_reboot=False)
+
+        mock_alibaba.reboot_instances.assert_called_once_with('i-123')
+        mock_nodeaction.wait_for_unknown_status.assert_called_once()
+        mock_nodeaction.wait_for_ready_status.assert_called_once()
+        self.assertEqual(len(self.affected_nodes_status.affected_nodes), 1)
+
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.nodeaction')
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.Alibaba')
+    def test_node_reboot_scenario_no_kube_check(self, mock_alibaba_class, mock_logging, mock_nodeaction):
+        """Test node_reboot_scenario without Kubernetes check"""
+        mock_alibaba = Mock()
+        mock_alibaba_class.return_value = mock_alibaba
+        mock_alibaba.get_instance_id.return_value = 'i-123'
+
+        scenarios = alibaba_node_scenarios(self.mock_kubecli, False, self.affected_nodes_status)
+
+        scenarios.node_reboot_scenario(1, 'test-node', 300)
+
+        mock_alibaba.reboot_instances.assert_called_once()
+        mock_nodeaction.wait_for_unknown_status.assert_not_called()
+        mock_nodeaction.wait_for_ready_status.assert_not_called()
+
+    @patch('logging.error')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.Alibaba')
+    def test_node_reboot_scenario_failure(self, mock_alibaba_class, mock_logging):
+        """Test node_reboot_scenario handles failure"""
+        mock_alibaba = Mock()
+        mock_alibaba_class.return_value = mock_alibaba
+        mock_alibaba.get_instance_id.return_value = 'i-123'
+        mock_alibaba.reboot_instances.side_effect = Exception('Reboot failed')
+
+        scenarios = alibaba_node_scenarios(self.mock_kubecli, False, self.affected_nodes_status)
+
+        with self.assertRaises(Exception):
+            scenarios.node_reboot_scenario(1, 'test-node', 300)
+
+        mock_logging.assert_called()
+
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.nodeaction')
+    @patch('logging.info')
+    @patch('krkn.scenario_plugins.node_actions.alibaba_node_scenarios.Alibaba')
+    def test_node_reboot_scenario_multiple_runs(self, mock_alibaba_class, mock_logging, mock_nodeaction):
+        """Test node_reboot_scenario with multiple runs"""
+        mock_alibaba = Mock()
+        mock_alibaba_class.return_value = mock_alibaba
+        mock_alibaba.get_instance_id.return_value = 'i-123'
+
+        scenarios = alibaba_node_scenarios(self.mock_kubecli, True, self.affected_nodes_status)
+
+        scenarios.node_reboot_scenario(2, 'test-node', 300)
+
+        self.assertEqual(mock_alibaba.reboot_instances.call_count, 2)
+        self.assertEqual(len(self.affected_nodes_status.affected_nodes), 2)
+
+
+if __name__ == "__main__":
+    unittest.main()
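`test_send_request_failure` expects `ValueError` because printf-style `%` formatting rejects the unknown conversion `%S`. The snippet below reproduces that behavior in isolation; `format_error` and the message text are illustrative and are not taken from the Alibaba plugin.

```python
def format_error(template: str, err: Exception) -> str:
    """Format an error message with printf-style % formatting (illustrative helper)."""
    return template % err


# %s converts the exception with str(), as intended
print(format_error("Failed to send request: %s", Exception("API error")))
# prints: Failed to send request: API error

# %S is not a valid conversion type, so % formatting raises ValueError
try:
    format_error("Failed to send request: %S", Exception("API error"))
except ValueError as exc:
    print("ValueError:", exc)
```

Because the exception is raised while the message is being built, the (patched) logger is never reached, which is why the test asserts on `ValueError` rather than on a logged error.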