Compare commits


12 Commits
v5.0.1 ... main

Author SHA1 Message Date
Paige Patton
71bd34b020 adding better logging for when scenario file can't be found (#1203)
Signed-off-by: Paige Patton <prubenda@redhat.com>
2026-03-27 13:47:49 -04:00
Paige Patton
6da7c9dec6 adding governance template from cncf (#926)
Signed-off-by: Paige Patton <prubenda@redhat.com>
2026-03-27 09:33:00 -04:00
Tullio Sebastiani
4d5aea146d Run method fixes (#1202)
* kubevirt plugin fixes

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* managed_cluster plugin fixes

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* unit tests fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

---------

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2026-03-27 14:31:19 +01:00
Yashasvi Yadav
62f500fb2e feat: add GCP zone outage rollback support (#1200)
Add rollback functionality for GCP zone outage scenarios following the
established rollback pattern (Service Hijacking, PVC, Syn Flood).

- Add @set_rollback_context_decorator to run()
- Set rollback callable before stopping nodes with base64/JSON encoded data
- Add rollback_gcp_zone_outage() static method with per-node error handling
- Fix missing poll_interval argument in starmap calls
- Add unit tests for rollback and run methods

Closes #915

Signed-off-by: YASHASVIYADAV30 <yashasviydv30@gmail.com>
Co-authored-by: Paige Patton <64206430+paigerube14@users.noreply.github.com>
2026-03-26 14:42:45 -04:00
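The base64/JSON encoding pattern this commit describes can be sketched roughly as follows. The helper names here are illustrative, not actual krkn functions; in the real scenario the encoded string is stored via `RollbackContent(resource_identifier=...)` and decoded inside `rollback_gcp_zone_outage()`:

```python
import base64
import json


def encode_rollback_data(nodes, timeout, kube_check):
    """Pack the node list and settings into a base64-wrapped JSON string."""
    payload = {"nodes": nodes, "timeout": timeout, "kube_check": kube_check}
    return base64.b64encode(json.dumps(payload).encode("utf-8")).decode("utf-8")


def decode_rollback_data(encoded):
    """Recover the rollback payload from its base64/JSON form."""
    return json.loads(base64.b64decode(encoded.encode("utf-8")).decode("utf-8"))


encoded = encode_rollback_data(["node-a", "node-b"], 180, True)
print(decode_rollback_data(encoded)["nodes"])  # ['node-a', 'node-b']
```

Wrapping the JSON in base64 keeps the payload a single opaque string, which is convenient when the rollback context only carries one `resource_identifier` field.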
Arpit Raj
ec241d35d6 fix: improve logging reliability and code quality (#1199)
- Fix typo 'wating' -> 'waiting' in scenario wait log message
- Replace print() with logging.debug() for pod metrics in prometheus client
- Replace star import with explicit imports in utils/__init__.py
- Remove unnecessary global declaration in main()
- Log VM status exceptions at ERROR level with exception details

Include unit tests in tests/test_logging_and_code_quality.py covering all fixes.

Signed-off-by: 1PoPTRoN <vrxn.arp1traj@gmail.com>
Co-authored-by: Paige Patton <64206430+paigerube14@users.noreply.github.com>
2026-03-26 13:08:56 -04:00
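The `print()`-to-`logging.debug()` change replaces eager string concatenation with lazy `%`-style formatting that respects log levels and handlers. A minimal sketch (the named logger and buffer are only for demonstration; the prometheus client uses the module-level `logging.debug`):

```python
import io
import logging

# Route records to a buffer so the change is observable.
buffer = io.StringIO()
logger = logging.getLogger("prometheus_client_demo")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(buffer))

metric = {"pod_name": "etcd-0", "timestamp": "2026-03-26"}

# Before: print('adding pod' + str(metric)) -- always formats eagerly and
# writes to stdout, bypassing log levels and handlers.
# After: lazy %-style formatting -- the message string is only built if a
# handler actually emits the record.
logger.debug("adding pod %s", metric)

print(buffer.getvalue().strip())
```

Besides respecting the configured level, the lazy form avoids paying the `str(metric)` cost when DEBUG output is disabled.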
Arpit Raj
59e10d5a99 fix: bind exception variable in except handlers to prevent NameError (#1198)
Signed-off-by: 1PoPTRoN <vrxn.arp1traj@gmail.com>
Co-authored-by: Paige Patton <64206430+paigerube14@users.noreply.github.com>
2026-03-26 09:43:37 -04:00
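The fix is the standard Python 3 `as e` binding: without it, referencing `e` inside the handler raises `NameError` and masks the original failure. A minimal sketch mirroring the handler shape from this PR (the tuple `(RuntimeError, Exception)` is redundant since `Exception` covers `RuntimeError`, but it matches the code being fixed):

```python
def run_scenario():
    """Stand-in for a scenario step that fails; the name is illustrative."""
    raise RuntimeError("boom")


try:
    run_scenario()
except (RuntimeError, Exception) as e:  # `as e` binds the exception for use below
    # Without the `as e` binding, this line would raise NameError.
    message = f"scenario failed with exception: {e}"

print(message)  # scenario failed with exception: boom
```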
Paige Patton
c8aa959df2 controller -> detailed (#1201)
Signed-off-by: Paige Patton <prubenda@redhat.com>
2026-03-26 08:47:06 -04:00
Paige Patton
3db5e1abbe no rebuild image (#1197)
Signed-off-by: Paige Patton <prubenda@redhat.com>
2026-03-20 12:54:45 -04:00
Paige Patton
1e699c6cc9 different quay users (#1196)
Signed-off-by: Paige Patton <prubenda@redhat.com>
2026-03-20 17:30:42 +01:00
Paige Patton
0ebda3e101 test multi platform (#1194)
Signed-off-by: Paige Patton <prubenda@redhat.com>
2026-03-20 11:09:33 -04:00
Tullio Sebastiani
8a5be0dd2f Resiliency Score krknctl compatibility fixes (#1195)
* added console log of the resiliency score when mode is "detailed"

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* base image krknctl input

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

resiliency score flag

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* removed json print in run_krkn.py

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* unit test fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

---------

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2026-03-20 11:09:07 -04:00
Tullio Sebastiani
62dadfe25c Resiliency Score krknctl compatibility fixes (#1195)
* added console log of the resiliency score when mode is "detailed"

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* base image krknctl input

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

resiliency score flag

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* removed json print in run_krkn.py

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* unit test fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

---------

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2026-03-20 11:08:56 -04:00
22 changed files with 1092 additions and 148 deletions

View File

@@ -6,48 +6,117 @@ on:
jobs:
build:
-runs-on: ubuntu-latest
+runs-on: ${{ matrix.runner }}
strategy:
matrix:
include:
- platform: amd64
runner: ubuntu-latest
- platform: arm64
runner: ubuntu-24.04-arm
steps:
- name: Check out code
uses: actions/checkout@v3
- name: Build the Docker images
if: startsWith(github.ref, 'refs/tags')
run: |
./containers/compile_dockerfile.sh
docker build --no-cache -t quay.io/krkn-chaos/krkn containers/ --build-arg TAG=${GITHUB_REF#refs/tags/}
docker tag quay.io/krkn-chaos/krkn quay.io/redhat-chaos/krkn
docker tag quay.io/krkn-chaos/krkn quay.io/krkn-chaos/krkn:${GITHUB_REF#refs/tags/}
docker tag quay.io/krkn-chaos/krkn quay.io/redhat-chaos/krkn:${GITHUB_REF#refs/tags/}
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Test Build the Docker images
-if: ${{ github.event_name == 'pull_request' }}
+if: github.event_name == 'pull_request'
run: |
./containers/compile_dockerfile.sh
docker build --no-cache -t quay.io/krkn-chaos/krkn containers/ --build-arg PR_NUMBER=${{ github.event.pull_request.number }}
- name: Login in quay
docker buildx build --no-cache \
--platform linux/${{ matrix.platform }} \
-t quay.io/krkn-chaos/krkn \
-t quay.io/redhat-chaos/krkn \
containers/ \
--build-arg PR_NUMBER=${{ github.event.pull_request.number }}
- name: Login to krkn-chaos quay
if: startsWith(github.ref, 'refs/tags')
run: docker login quay.io -u ${QUAY_USER} -p ${QUAY_TOKEN}
env:
QUAY_USER: ${{ secrets.QUAY_USERNAME }}
QUAY_TOKEN: ${{ secrets.QUAY_PASSWORD }}
- name: Push the KrknChaos Docker images
uses: docker/login-action@v3
with:
registry: quay.io
username: ${{ secrets.QUAY_USERNAME }}
password: ${{ secrets.QUAY_PASSWORD }}
- name: Build and push krkn-chaos images
if: startsWith(github.ref, 'refs/tags')
run: |
docker push quay.io/krkn-chaos/krkn
docker push quay.io/krkn-chaos/krkn:${GITHUB_REF#refs/tags/}
- name: Login in to redhat-chaos quay
if: startsWith(github.ref, 'refs/tags/v')
run: docker login quay.io -u ${QUAY_USER} -p ${QUAY_TOKEN}
env:
QUAY_USER: ${{ secrets.QUAY_USER_1 }}
QUAY_TOKEN: ${{ secrets.QUAY_TOKEN_1 }}
- name: Push the RedHat Chaos Docker images
./containers/compile_dockerfile.sh
TAG=${GITHUB_REF#refs/tags/}
docker buildx build --no-cache \
--platform linux/${{ matrix.platform }} \
--provenance=false \
-t quay.io/krkn-chaos/krkn:latest-${{ matrix.platform }} \
-t quay.io/krkn-chaos/krkn:${TAG}-${{ matrix.platform }} \
containers/ \
--build-arg TAG=${TAG} \
--push --load
- name: Login to redhat-chaos quay
if: startsWith(github.ref, 'refs/tags')
run: |
docker push quay.io/redhat-chaos/krkn
docker push quay.io/redhat-chaos/krkn:${GITHUB_REF#refs/tags/}
uses: docker/login-action@v3
with:
registry: quay.io
username: ${{ secrets.QUAY_USER_1 }}
password: ${{ secrets.QUAY_TOKEN_1 }}
- name: Push redhat-chaos images
if: startsWith(github.ref, 'refs/tags')
run: |
TAG=${GITHUB_REF#refs/tags/}
docker tag quay.io/krkn-chaos/krkn:${TAG}-${{ matrix.platform }} quay.io/redhat-chaos/krkn:${TAG}-${{ matrix.platform }}
docker tag quay.io/krkn-chaos/krkn:${TAG}-${{ matrix.platform }} quay.io/redhat-chaos/krkn:latest-${{ matrix.platform }}
docker push quay.io/redhat-chaos/krkn:${TAG}-${{ matrix.platform }}
docker push quay.io/redhat-chaos/krkn:latest-${{ matrix.platform }}
manifest:
runs-on: ubuntu-latest
needs: build
if: startsWith(github.ref, 'refs/tags')
steps:
- name: Login to krkn-chaos quay
uses: docker/login-action@v3
with:
registry: quay.io
username: ${{ secrets.QUAY_USERNAME }}
password: ${{ secrets.QUAY_PASSWORD }}
- name: Create and push KrknChaos manifests
run: |
TAG=${GITHUB_REF#refs/tags/}
docker manifest create quay.io/krkn-chaos/krkn:${TAG} \
quay.io/krkn-chaos/krkn:${TAG}-amd64 \
quay.io/krkn-chaos/krkn:${TAG}-arm64
docker manifest push quay.io/krkn-chaos/krkn:${TAG}
docker manifest create quay.io/krkn-chaos/krkn:latest \
quay.io/krkn-chaos/krkn:latest-amd64 \
quay.io/krkn-chaos/krkn:latest-arm64
docker manifest push quay.io/krkn-chaos/krkn:latest
- name: Login to redhat-chaos quay
uses: docker/login-action@v3
with:
registry: quay.io
username: ${{ secrets.QUAY_USER_1 }}
password: ${{ secrets.QUAY_TOKEN_1 }}
- name: Create and push RedHat Chaos manifests
run: |
TAG=${GITHUB_REF#refs/tags/}
docker manifest create quay.io/redhat-chaos/krkn:${TAG} \
quay.io/redhat-chaos/krkn:${TAG}-amd64 \
quay.io/redhat-chaos/krkn:${TAG}-arm64
docker manifest push quay.io/redhat-chaos/krkn:${TAG}
docker manifest create quay.io/redhat-chaos/krkn:latest \
quay.io/redhat-chaos/krkn:latest-amd64 \
quay.io/redhat-chaos/krkn:latest-arm64
docker manifest push quay.io/redhat-chaos/krkn:latest
- name: Rebuild krkn-hub
if: startsWith(github.ref, 'refs/tags')
uses: redhat-chaos/actions/krkn-hub@main
with:
QUAY_USER: ${{ secrets.QUAY_USERNAME }}

View File

@@ -1,83 +1,148 @@
# Krkn Project Governance
Krkn is a chaos and resiliency testing tool for Kubernetes that injects deliberate failures into clusters to validate their resilience under turbulent conditions. This governance document explains how the project is run.
- [Values](#values)
- [Community Roles](#community-roles)
- [Becoming a Maintainer](#becoming-a-maintainer)
- [Removing a Maintainer](#removing-a-maintainer)
- [Meetings](#meetings)
- [CNCF Resources](#cncf-resources)
- [Code of Conduct](#code-of-conduct)
- [Security Response Team](#security-response-team)
- [Voting](#voting)
- [Modifying this Charter](#modifying-this-charter)
The governance model adopted here is heavily influenced by a set of CNCF projects, especially drew
reference from [Kubernetes governance](https://github.com/kubernetes/community/blob/master/governance.md).
*For similar structures some of the same wordings from kubernetes governance are borrowed to adhere
to the originally construed meaning.*
## Values
## Principles
Krkn and its leadership embrace the following values:
- **Open**: Krkn is open source community.
- **Welcoming and respectful**: See [Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md).
- **Transparent and accessible**: Work and collaboration should be done in public.
Changes to the Krkn organization, Krkn code repositories, and CNCF related activities (e.g.
level, involvement, etc) are done in public.
- **Merit**: Ideas and contributions are accepted according to their technical merit
and alignment with project objectives, scope and design principles.
* **Openness**: Communication and decision-making happens in the open and is discoverable for future reference. As much as possible, all discussions and work take place in public forums and open repositories.
* **Fairness**: All stakeholders have the opportunity to provide feedback and submit contributions, which will be considered on their merits.
* **Community over Product or Company**: Sustaining and growing our community takes priority over shipping code or sponsors' organizational goals. Each contributor participates in the project as an individual.
* **Inclusivity**: We innovate through different perspectives and skill sets, which can only be accomplished in a welcoming and respectful environment.
* **Participation**: Responsibilities within the project are earned through participation, and there is a clear path up the contributor ladder into leadership positions.
## Community Roles
Krkn uses a tiered contributor model. Each level comes with increasing responsibilities and privileges.
### Contributor
Anyone can become a contributor by participating in discussions, reporting bugs, or submitting code or documentation.
**Responsibilities:**
- Adhere to the [Code of Conduct](CODE_OF_CONDUCT.md)
- Report bugs and suggest new features
- Contribute high-quality code and documentation
### Member
Members are active contributors who have demonstrated a solid understanding of the project's codebase and conventions.
**Responsibilities:**
- Review pull requests for correctness, quality, and adherence to project standards
- Provide constructive and timely feedback to contributors
- Ensure contributions are well-tested and documented
- Work with maintainers to support a smooth release process
### Maintainer
Maintainers are responsible for the overall health and direction of the project. They have write access to the [project GitHub repository](https://github.com/krkn-chaos/krkn) and can merge patches from themselves or others. The current maintainers are listed in [MAINTAINERS.md](./MAINTAINERS.md).
Maintainers collectively form the **Maintainer Council**, the governing body for the project.
A maintainer is not just someone who can make changes — they are someone who has demonstrated the ability to collaborate with the team, get the right people to review code and docs, contribute high-quality work, and follow through to fix issues.
**Responsibilities:**
- Set the technical direction and vision for the project
- Manage releases and ensure stability of the main branch
- Make decisions on feature inclusion and project priorities
- Mentor contributors and help grow the community
- Resolve disputes and make final decisions when consensus cannot be reached
### Owner
Owners have administrative access to the project and are the final decision-makers.
**Responsibilities:**
- Manage the core team of maintainers
- Set the overall vision and strategy for the project
- Handle administrative tasks such as managing the repository and other resources
- Represent the project in the broader open-source community
## Becoming a Maintainer
To become a Maintainer you need to demonstrate the following:
- **Commitment to the project:**
- Participate in discussions, contributions, code and documentation reviews for 3 months or more
- Perform reviews for at least 5 non-trivial pull requests
- Contribute at least 3 non-trivial pull requests that have been merged
- Ability to write quality code and/or documentation
- Ability to collaborate effectively with the team
- Understanding of how the team works (policies, processes for testing and code review, etc.)
- Understanding of the project's codebase and coding and documentation style
A new Maintainer must be proposed by an existing Maintainer by sending a message to the [maintainer mailing list](mailto:krkn.maintainers@gmail.com). A simple majority vote of existing Maintainers approves the application. Nominations will be evaluated without prejudice to employer or demographics.
Maintainers who are approved will be granted the necessary GitHub rights and invited to the [maintainer mailing list](mailto:krkn.maintainers@gmail.com).
## Removing a Maintainer
Maintainers may resign at any time if they feel they will not be able to continue fulfilling their project duties.
Maintainers may also be removed for inactivity, failure to fulfill their responsibilities, violating the Code of Conduct, or other reasons. Inactivity is defined as a period of very low or no activity in the project for a year or more, with no definite schedule to return to full Maintainer activity.
A Maintainer may be removed at any time by a 2/3 vote of the remaining Maintainers.
Depending on the reason for removal, a Maintainer may be converted to **Emeritus** status. Emeritus Maintainers will still be consulted on some project matters and can be rapidly returned to Maintainer status if their availability changes.
## Meetings
Maintainers are expected to participate in the public developer meeting, which occurs **once a month via Zoom**. Meeting details (link, agenda, and notes) are posted in the [#krkn channel on Kubernetes Slack](https://kubernetes.slack.com/messages/C05SFMHRWK1) prior to each meeting.
Maintainers will also hold closed meetings to discuss security reports or Code of Conduct violations. Such meetings should be scheduled by any Maintainer on receipt of a security issue or CoC report. All current Maintainers must be invited to such closed meetings, except for any Maintainer who is accused of a CoC violation.
## CNCF Resources
Any Maintainer may suggest a request for CNCF resources, either on the [mailing list](mailto:krkn.maintainers@gmail.com) or during a monthly meeting. A simple majority of Maintainers approves the request. The Maintainers may also choose to delegate working with the CNCF to non-Maintainer community members, who will then be added to the [CNCF's Maintainer List](https://github.com/cncf/foundation/blob/main/project-maintainers.csv) for that purpose.
## Code of Conduct
Krkn follows the [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md).
Here is an excerpt:
> As contributors and maintainers of this project, and in the interest of fostering an open and welcoming community, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
> As contributors and maintainers of this project, and in the interest of fostering an open and welcoming community, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
## Maintainer Levels
Code of Conduct violations by community members will be discussed and resolved on the [private maintainer mailing list](mailto:krkn.maintainers@gmail.com). If a Maintainer is directly involved in the report, two Maintainers will instead be designated to work with the CNCF Code of Conduct Committee in resolving it.
### Contributor
Contributors contribute to the community. Anyone can become a contributor by participating in discussions, reporting bugs, or contributing code or documentation.
## Security Response Team
#### Responsibilities:
The Maintainers will appoint a Security Response Team to handle security reports. This committee may consist of the Maintainer Council itself. If this responsibility is delegated, the Maintainers will appoint a team of at least two contributors to handle it. The Maintainers will review the composition of this team at least once a year.
Be active in the community and adhere to the Code of Conduct.
The Security Response Team is responsible for handling all reports of security holes and breaches according to the [security policy](SECURITY.md).
Report bugs and suggest new features.
To report a security vulnerability, please follow the process outlined in [SECURITY.md](SECURITY.md) rather than filing a public GitHub issue.
Contribute high-quality code and documentation.
## Voting
While most business in Krkn is conducted by "[lazy consensus](https://community.apache.org/committers/lazyConsensus.html)", periodically the Maintainers may need to vote on specific actions or changes. Any Maintainer may demand a vote be taken.
### Member
Members are active contributors to the community. Members have demonstrated a strong understanding of the project's codebase and conventions.
Votes on general project matters may be raised on the [maintainer mailing list](mailto:krkn.maintainers@gmail.com) or during a monthly meeting. Votes on security vulnerabilities or Code of Conduct violations must be conducted exclusively on the [private maintainer mailing list](mailto:krkn.maintainers@gmail.com) or in a closed Maintainer meeting, in order to prevent accidental public disclosure of sensitive information.
#### Responsibilities:
Most votes require a **simple majority** of all Maintainers to succeed, except where otherwise noted. Two-thirds majority votes mean at least two-thirds of all existing Maintainers.
Review pull requests for correctness, quality, and adherence to project standards.
| Action | Required Vote |
|--------|--------------|
| Adding a new Maintainer | Simple majority |
| Removing a Maintainer | 2/3 majority |
| Approving CNCF resource requests | Simple majority |
| Modifying this charter | 2/3 majority |
Provide constructive and timely feedback to contributors.
## Modifying this Charter
Ensure that all contributions are well-tested and documented.
Work with maintainers to ensure a smooth and efficient release process.
### Maintainer
Maintainers are responsible for the overall health and direction of the project. They are long-standing contributors who have shown a deep commitment to the project's success.
#### Responsibilities:
Set the technical direction and vision for the project.
Manage releases and ensure the stability of the main branch.
Make decisions on feature inclusion and project priorities.
Mentor other contributors and help grow the community.
Resolve disputes and make final decisions when consensus cannot be reached.
### Owner
Owners have administrative access to the project and are the final decision-makers.
#### Responsibilities:
Manage the core team of maintainers and approvers.
Set the overall vision and strategy for the project.
Handle administrative tasks, such as managing the project's repository and other resources.
Represent the project in the broader open-source community.
# Credits
Sections of this document have been borrowed from [Kubernetes governance](https://github.com/kubernetes/community/blob/master/governance.md)
Changes to this Governance document and its supporting documents may be approved by a 2/3 vote of the Maintainers.

View File

@@ -15,7 +15,7 @@ For detailed description of the roles, see [Governance](./GOVERNANCE.md) page.
| Pradeep Surisetty | [psuriset](https://github.com/psuriset) | psuriset@redhat.com | Owner |
| Paige Patton | [paigerube14](https://github.com/paigerube14) | prubenda@redhat.com | Maintainer |
| Tullio Sebastiani | [tsebastiani](https://github.com/tsebastiani) | tsebasti@redhat.com | Maintainer |
| Yogananth Subramanian | [yogananth-subramanian](https://github.com/yogananth-subramanian) | ysubrama@redhat.com |Maintainer |
| Yogananth Subramanian | [yogananth-subramanian](https://github.com/yogananth-subramanian) | ysubrama@redhat.com | Maintainer |
| Sahil Shah | [shahsahil264](https://github.com/shahsahil264) | sahshah@redhat.com | Member |
@@ -32,3 +32,64 @@ The roles are:
* Maintainer: A contributor who is responsible for the overall health and direction of the project.
* Owner: A contributor who has administrative ownership of the project.
## Maintainer Levels
### Contributor
Contributors contribute to the community. Anyone can become a contributor by participating in discussions, reporting bugs, or contributing code or documentation.
#### Responsibilities:
Be active in the community and adhere to the Code of Conduct.
Report bugs and suggest new features.
Contribute high-quality code and documentation.
### Member
Members are active contributors to the community. Members have demonstrated a strong understanding of the project's codebase and conventions.
#### Responsibilities:
Review pull requests for correctness, quality, and adherence to project standards.
Provide constructive and timely feedback to contributors.
Ensure that all contributions are well-tested and documented.
Work with maintainers to ensure a smooth and efficient release process.
### Maintainer
Maintainers are responsible for the overall health and direction of the project. They are long-standing contributors who have shown a deep commitment to the project's success.
#### Responsibilities:
Set the technical direction and vision for the project.
Manage releases and ensure the stability of the main branch.
Make decisions on feature inclusion and project priorities.
Mentor other contributors and help grow the community.
Resolve disputes and make final decisions when consensus cannot be reached.
### Owner
Owners have administrative access to the project and are the final decision-makers.
#### Responsibilities:
Manage the core team of maintainers and approvers.
Set the overall vision and strategy for the project.
Handle administrative tasks, such as managing the project's repository and other resources.
Represent the project in the broader open-source community.
## Email
If you'd like to contact the krkn maintainers about a specific issue you're having, please reach out to us at krkn.maintainers@gmail.com.

View File

@@ -56,7 +56,7 @@ kraken:
- scenarios/kubevirt/kubevirt-vm-outage.yaml
resiliency:
-resiliency_run_mode: standalone # Options: standalone, controller, disabled
+resiliency_run_mode: standalone # Options: standalone, detailed, disabled
resiliency_file: config/alerts.yaml # Path to SLO definitions, will resolve to performance_monitoring: alert_profile: if not specified
cerberus:

View File

@@ -558,5 +558,31 @@
"separator": ",",
"default": "False",
"required": "false"
},
{
"name": "resiliency-score",
"short_description": "Enable resiliency score calculation",
"description": "The system outputs a detailed resiliency score as a single-line JSON object, facilitating easy aggregation across multiple test scenarios.",
"variable": "RESILIENCY_SCORE",
"type": "boolean",
"required": "false"
},
{
"name": "disable-resiliency-score",
"short_description": "Disable resiliency score calculation",
"description": "Disable resiliency score calculation",
"variable": "DISABLE_RESILIENCY_SCORE",
"type": "boolean",
"required": "false"
},
{
"name": "resiliency-file",
"short_description": "Resiliency Score metrics file",
"description": "Custom Resiliency score file",
"variable": "RESILIENCY_FILE",
"type": "file",
"required": "false",
"mount_path": "/home/krkn/resiliency-file.yaml"
}
]

View File

@@ -251,7 +251,7 @@ def metrics(
for k,v in pod.items():
metric[k] = v
metric['timestamp'] = str(datetime.datetime.now())
-print('adding pod' + str(metric))
+logging.debug("adding pod %s", metric)
metrics_list.append(metric.copy())
for affected_node in scenario["affected_nodes"]:
metric_name = "affected_nodes_recovery"

View File

@@ -306,7 +306,7 @@ class Resiliency:
prom_cli: Pre-configured KrknPrometheus instance.
total_start_time: Start time for the full test window.
total_end_time: End time for the full test window.
-run_mode: "controller" or "standalone" mode.
+run_mode: "detailed" or "standalone" mode.
Returns:
(detailed_report)
@@ -320,7 +320,7 @@ class Resiliency:
)
detailed = self.get_detailed_report()
-if run_mode == "controller":
+if run_mode == "detailed":
# krknctl expects the detailed report on stdout in a special format
try:
detailed_json = json.dumps(detailed)
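The "special format" krknctl expects is simply the detailed report serialized as a single-line JSON object on stdout, so it can be grepped and aggregated across runs. A minimal sketch (the report keys here are hypothetical; the real structure comes from `get_detailed_report()`):

```python
import json

# Hypothetical detailed report; the actual keys are produced by
# Resiliency.get_detailed_report() and are not shown in this diff.
detailed = {"resiliency_score": 0.92, "scenarios_run": 3}

detailed_json = json.dumps(detailed)  # compact, single-line serialization
print(detailed_json)
```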

View File

@@ -1,4 +1,5 @@
import logging
import os
import time
from abc import ABC, abstractmethod
from krkn_lib.models.telemetry import ScenarioTelemetry
@@ -86,6 +87,16 @@ class AbstractScenarioPlugin(ABC):
scenario_telemetry.scenario = scenario_config
scenario_telemetry.scenario_type = self.get_scenario_types()[0]
scenario_telemetry.start_timestamp = time.time()
if not os.path.exists(scenario_config):
logging.error(
f"scenario file not found: '{scenario_config}' -- "
f"check that the path is correct relative to the working directory: {os.getcwd()}"
)
failed_scenarios.append(scenario_config)
scenario_telemetry.exit_status = 1
scenario_telemetry.end_timestamp = time.time()
scenario_telemetries.append(scenario_telemetry)
continue
parsed_scenario_config = telemetry.set_parameters_base64(
scenario_telemetry, scenario_config
)
@@ -147,7 +158,7 @@ class AbstractScenarioPlugin(ABC):
failed_scenarios.append(scenario_config)
scenario_telemetries.append(scenario_telemetry)
cerberus.publish_kraken_status(start_time,end_time)
-logging.info(f"wating {wait_duration} before running the next scenario")
+logging.info(f"waiting {wait_duration} before running the next scenario")
time.sleep(wait_duration)
return failed_scenarios, scenario_telemetries

View File

@@ -1,8 +1,7 @@
import logging
import time
-from typing import Dict, Any, Optional
+from typing import Dict, Any
import random
import re
import yaml
from kubernetes.client.rest import ApiException
from krkn_lib.k8s import KrknKubernetes
@@ -35,7 +34,6 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
self,
run_uuid: str,
scenario: str,
-krkn_config: dict[str, any],
lib_telemetry: KrknTelemetryOpenshift,
scenario_telemetry: ScenarioTelemetry,
) -> int:
@@ -60,7 +58,7 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
return 0
except Exception as e:
logging.error(f"KubeVirt VM Outage scenario failed: {e}")
-log_exception(e)
+log_exception(str(e))
return 1
def init_clients(self, k8s_client: KrknKubernetes):
@@ -143,7 +141,7 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
except Exception as e:
logging.error(f"Error executing KubeVirt VM outage scenario: {e}")
-log_exception(e)
+log_exception(str(e))
return self.pods_status
def validate_environment(self, vm_name: str, namespace: str) -> bool:
@@ -243,7 +241,7 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
except Exception as e:
logging.error(f"Error deleting VMI {vm_name}: {e}")
-log_exception(e)
+log_exception(str(e))
self.pods_status.unrecovered.append(self.affected_pod)
return 1
@@ -304,7 +302,7 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
except Exception as e:
logging.error(f"Error recreating VMI {vm_name}: {e}")
-log_exception(e)
+log_exception(str(e))
return 1
else:
logging.error(f"Failed to recover VMI {vm_name}: No original state captured and auto-recovery did not occur")
@@ -312,5 +310,5 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
except Exception as e:
logging.error(f"Unexpected error recovering VMI {vm_name}: {e}")
-log_exception(e)
+log_exception(str(e))
return 1

View File

@@ -1,5 +1,4 @@
import logging
import time
import yaml
from krkn_lib.k8s import KrknKubernetes
@@ -28,7 +27,6 @@ class ManagedClusterScenarioPlugin(AbstractScenarioPlugin):
)
if managedcluster_scenario["actions"]:
for action in managedcluster_scenario["actions"]:
-start_time = int(time.time())
try:
self.inject_managedcluster_scenario(
action,
@@ -44,6 +42,7 @@ class ManagedClusterScenarioPlugin(AbstractScenarioPlugin):
return 1
else:
return 0
+return 0
def inject_managedcluster_scenario(
self,

View File

@@ -36,7 +36,7 @@ class TimeActionsScenarioPlugin(AbstractScenarioPlugin):
)
if len(not_reset) > 0:
logging.info("Object times were not reset")
-except (RuntimeError, Exception):
+except (RuntimeError, Exception) as e:
logging.error(
f"TimeActionsScenarioPlugin scenario {scenario} failed with exception: {e}"
)

View File

@@ -1,3 +1,5 @@
+import base64
+import json
import logging
import time
@@ -13,11 +15,15 @@ from krkn_lib.telemetry.ocp import KrknTelemetryOpenshift
from krkn.scenario_plugins.abstract_scenario_plugin import AbstractScenarioPlugin
from krkn_lib.utils import get_yaml_item_value
from krkn.rollback.config import RollbackContent
from krkn.rollback.handler import set_rollback_context_decorator
from krkn.scenario_plugins.node_actions.aws_node_scenarios import AWS
from krkn.scenario_plugins.node_actions.gcp_node_scenarios import gcp_node_scenarios
class ZoneOutageScenarioPlugin(AbstractScenarioPlugin):
@set_rollback_context_decorator
def run(
self,
run_uuid: str,
@@ -40,7 +46,9 @@ class ZoneOutageScenarioPlugin(AbstractScenarioPlugin):
if cloud_type.lower() == "gcp":
affected_nodes_status = AffectedNodeStatus()
self.cloud_object = gcp_node_scenarios(kubecli, kube_check, affected_nodes_status)
-self.node_based_zone(scenario_config, kubecli)
+result = self.node_based_zone(scenario_config, kubecli)
+if result != 0:
+return result
affected_nodes_status = self.cloud_object.affected_nodes_status
scenario_telemetry.affected_nodes.extend(affected_nodes_status.affected_nodes)
else:
@@ -57,22 +65,37 @@ class ZoneOutageScenarioPlugin(AbstractScenarioPlugin):
return 1
else:
return 0
-def node_based_zone(self, scenario_config: dict[str, any], kubecli: KrknKubernetes ):
+def node_based_zone(self, scenario_config: dict[str, any], kubecli: KrknKubernetes):
zone = scenario_config["zone"]
duration = get_yaml_item_value(scenario_config, "duration", 60)
timeout = get_yaml_item_value(scenario_config, "timeout", 180)
kube_check = get_yaml_item_value(scenario_config, "kube_check", True)
label_selector = f"topology.kubernetes.io/zone={zone}"
try:
try:
# get list of nodes in zone/region
nodes = kubecli.list_killable_nodes(label_selector)
# stop nodes in parallel
pool = ThreadPool(processes=len(nodes))
pool.starmap(
self.cloud_object.node_stop_scenario,zip(repeat(1), nodes, repeat(timeout))
# set rollback callable before stopping nodes
rollback_data = {
"nodes": nodes,
"timeout": timeout,
"kube_check": kube_check,
}
encoded = base64.b64encode(
json.dumps(rollback_data).encode("utf-8")
).decode("utf-8")
self.rollback_handler.set_rollback_callable(
self.rollback_gcp_zone_outage,
RollbackContent(resource_identifier=encoded),
)
# stop nodes in parallel
pool = ThreadPool(processes=len(nodes))
pool.starmap(
self.cloud_object.node_stop_scenario,
zip(repeat(1), nodes, repeat(timeout), repeat(None)),
)
pool.close()
logging.info(
@@ -80,10 +103,11 @@ class ZoneOutageScenarioPlugin(AbstractScenarioPlugin):
)
time.sleep(duration)
# start nodes in parallel
# start nodes in parallel
pool = ThreadPool(processes=len(nodes))
pool.starmap(
self.cloud_object.node_start_scenario,zip(repeat(1), nodes, repeat(timeout))
self.cloud_object.node_start_scenario,
zip(repeat(1), nodes, repeat(timeout), repeat(None)),
)
pool.close()
except Exception as e:
@@ -94,6 +118,58 @@ class ZoneOutageScenarioPlugin(AbstractScenarioPlugin):
else:
return 0
@staticmethod
def rollback_gcp_zone_outage(
rollback_content: RollbackContent,
lib_telemetry: KrknTelemetryOpenshift,
):
"""Rollback function to restart stopped nodes after a GCP zone outage
scenario failure.
:param rollback_content: Rollback content containing encoded node
list and config.
:param lib_telemetry: Instance of KrknTelemetryOpenshift for
Kubernetes operations.
"""
try:
import json
import base64
from krkn_lib.models.k8s import AffectedNodeStatus
from krkn.scenario_plugins.node_actions.gcp_node_scenarios import (
gcp_node_scenarios,
)
decoded = base64.b64decode(
rollback_content.resource_identifier.encode("utf-8")
).decode("utf-8")
rollback_data = json.loads(decoded)
nodes = rollback_data["nodes"]
timeout = rollback_data["timeout"]
kube_check = rollback_data["kube_check"]
kubecli = lib_telemetry.get_lib_kubernetes()
affected_nodes_status = AffectedNodeStatus()
cloud_object = gcp_node_scenarios(
kubecli, kube_check, affected_nodes_status
)
logging.info(
"Rolling back GCP zone outage: starting %d stopped nodes"
% len(nodes)
)
for node in nodes:
try:
cloud_object.node_start_scenario(1, node, timeout, None)
except Exception as node_error:
logging.error(
"Failed to start node %s during rollback: %s"
% (node, node_error)
)
logging.info("GCP zone outage rollback completed.")
except Exception as e:
logging.error("Failed to rollback GCP zone outage: %s" % e)
raise
def network_based_zone(self, scenario_config: dict[str, any]):
vpc_id = scenario_config["vpc_id"]
@@ -118,12 +194,12 @@ class ZoneOutageScenarioPlugin(AbstractScenarioPlugin):
"Network association ids associated with "
"the subnet %s: %s" % (subnet_id, network_association_ids)
)
# Use provided default ACL if available, otherwise create a new one
if default_acl_id:
acl_id = default_acl_id
logging.info(
"Using provided default ACL ID %s - this ACL will not be deleted after the scenario",
"Using provided default ACL ID %s - this ACL will not be deleted after the scenario",
default_acl_id
)
# Don't add to acl_ids_created since we don't want to delete user-provided ACLs at cleanup
@@ -160,6 +236,5 @@ class ZoneOutageScenarioPlugin(AbstractScenarioPlugin):
for acl_id in acl_ids_created:
self.cloud_object.delete_network_acl(acl_id)
def get_scenario_types(self) -> list[str]:
return ["zone_outages_scenarios"]
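The rollback payload in the plugin above travels through `RollbackContent.resource_identifier` as base64-encoded JSON. A minimal sketch of that round trip, independent of krkn (the `encode_rollback`/`decode_rollback` names are illustrative, not part of the codebase):

```python
import base64
import json

def encode_rollback(nodes, timeout, kube_check):
    # Serialize the rollback parameters to JSON, then base64-encode so the
    # payload fits in a single opaque string field.
    data = {"nodes": nodes, "timeout": timeout, "kube_check": kube_check}
    return base64.b64encode(json.dumps(data).encode("utf-8")).decode("utf-8")

def decode_rollback(encoded):
    # Reverse the encoding: base64 -> JSON -> dict.
    return json.loads(base64.b64decode(encoded.encode("utf-8")).decode("utf-8"))

payload = encode_rollback(["node-1", "node-2"], 180, True)
restored = decode_rollback(payload)
```

A malformed identifier typically fails either at base64 decoding (`binascii.Error`) or at JSON parsing (`json.JSONDecodeError`), which is why the plugin's rollback wraps decoding in a try/except, logs, and re-raises.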

View File

@@ -171,7 +171,7 @@ class VirtChecker:
if new_node_name and vm.node_name != new_node_name:
vm.node_name = new_node_name
except Exception:
logging.info('Exception in get vm status')
logging.exception("Exception in get vm status")
vm_status = False
if vm.vm_name not in virt_check_tracker:
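The change above matters because `logging.exception()` logs at ERROR level and automatically appends the traceback of the exception currently being handled, while `logging.info()` records neither. A self-contained sketch capturing the difference in an in-memory stream (requires Python 3.8+ for `force=True`):

```python
import io
import logging

# Capture root-logger output in memory to inspect it.
stream = io.StringIO()
logging.basicConfig(stream=stream, level=logging.DEBUG, force=True)

try:
    raise RuntimeError("vmi not reachable")
except Exception:
    # Inside an except block, logging.exception() behaves like
    # logging.error() plus the formatted traceback.
    logging.exception("Exception in get vm status")

output = stream.getvalue()
```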

View File

@@ -1,4 +1,10 @@
from .TeeLogHandler import TeeLogHandler
from .ErrorLog import ErrorLog
from .ErrorCollectionHandler import ErrorCollectionHandler
from .functions import *
from .functions import (
populate_cluster_events,
collect_and_put_ocp_logs,
KrknKubernetes,
ScenarioTelemetry,
KrknTelemetryOpenshift
)

View File

@@ -65,8 +65,6 @@ def main(options, command: Optional[str]) -> int:
if os.path.isfile(cfg):
with open(cfg, "r") as f:
config = yaml.full_load(f)
global kubeconfig_path, wait_duration, kraken_config
kubeconfig_path = os.path.expanduser(
get_yaml_item_value(config["kraken"], "kubeconfig_path", "")
)
@@ -95,7 +93,7 @@ def main(options, command: Optional[str]) -> int:
run_signal = get_yaml_item_value(config["kraken"], "signal_state", "RUN")
resiliency_config = get_yaml_item_value(config,"resiliency",{})
# Determine execution mode (standalone, controller, or disabled)
# Determine execution mode (standalone, detailed, or disabled)
run_mode = get_yaml_item_value(resiliency_config, "resiliency_run_mode", "standalone")
valid_run_modes = {"standalone", "detailed", "disabled"}
if run_mode not in valid_run_modes:
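The run-mode check above validates a config value against a closed set after reading it with a default. A standalone sketch of the pattern; `get_item_value` only approximates krkn_lib's `get_yaml_item_value`, and since the hunk does not show what krkn does for an invalid mode, this sketch raises for illustration:

```python
def get_item_value(cfg, key, default):
    # Approximation of get_yaml_item_value: return the key's value when
    # present and non-empty, otherwise fall back to the default.
    value = cfg.get(key, default) if isinstance(cfg, dict) else default
    return value if value not in (None, "") else default

def resolve_run_mode(resiliency_config):
    run_mode = get_item_value(resiliency_config, "resiliency_run_mode", "standalone")
    valid_run_modes = {"standalone", "detailed", "disabled"}
    if run_mode not in valid_run_modes:
        # Reject unknown modes rather than guessing at intent.
        raise ValueError(f"invalid resiliency_run_mode: {run_mode!r}")
    return run_mode
```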

View File

@@ -1,4 +1,4 @@
duration: 60
duration: 10
workers: '' # leave it empty '' node cpu auto-detection
hog-type: cpu
image: quay.io/krkn-chaos/krkn-hog

View File

@@ -66,9 +66,10 @@ class TestAbstractScenarioPluginCerberusIntegration(unittest.TestCase):
@patch('krkn.scenario_plugins.abstract_scenario_plugin.cleanup_rollback_version_files')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.utils.collect_and_put_ocp_logs')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.signal_handler.signal_context')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.os.path.exists', return_value=True)
@patch('time.sleep')
def test_cerberus_publish_called_after_successful_scenario(
self, mock_sleep, mock_signal_ctx, mock_collect_logs, mock_cleanup, mock_cerberus_publish
self, mock_sleep, mock_exists, mock_signal_ctx, mock_collect_logs, mock_cleanup, mock_cerberus_publish
):
"""Test that cerberus.publish_kraken_status is called after a successful scenario"""
mock_signal_ctx.return_value.__enter__ = Mock()
@@ -97,9 +98,10 @@ class TestAbstractScenarioPluginCerberusIntegration(unittest.TestCase):
@patch('krkn.scenario_plugins.abstract_scenario_plugin.execute_rollback_version_files')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.utils.collect_and_put_ocp_logs')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.signal_handler.signal_context')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.os.path.exists', return_value=True)
@patch('time.sleep')
def test_cerberus_publish_called_after_failed_scenario(
self, mock_sleep, mock_signal_ctx, mock_collect_logs, mock_rollback, mock_cerberus_publish
self, mock_sleep, mock_exists, mock_signal_ctx, mock_collect_logs, mock_rollback, mock_cerberus_publish
):
"""Test that cerberus.publish_kraken_status is called even after a failed scenario"""
mock_signal_ctx.return_value.__enter__ = Mock()
@@ -122,9 +124,10 @@ class TestAbstractScenarioPluginCerberusIntegration(unittest.TestCase):
@patch('krkn.scenario_plugins.abstract_scenario_plugin.cleanup_rollback_version_files')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.utils.collect_and_put_ocp_logs')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.signal_handler.signal_context')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.os.path.exists', return_value=True)
@patch('time.sleep')
def test_cerberus_publish_called_for_multiple_scenarios(
self, mock_sleep, mock_signal_ctx, mock_collect_logs, mock_cleanup, mock_cerberus_publish
self, mock_sleep, mock_exists, mock_signal_ctx, mock_collect_logs, mock_cleanup, mock_cerberus_publish
):
"""Test that cerberus.publish_kraken_status is called for each scenario"""
mock_signal_ctx.return_value.__enter__ = Mock()
@@ -148,10 +151,11 @@ class TestAbstractScenarioPluginCerberusIntegration(unittest.TestCase):
@patch('krkn.scenario_plugins.abstract_scenario_plugin.execute_rollback_version_files')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.utils.collect_and_put_ocp_logs')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.signal_handler.signal_context')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.os.path.exists', return_value=True)
@patch('time.sleep')
@patch('time.time')
def test_cerberus_publish_timing(
self, mock_time, mock_sleep, mock_signal_ctx, mock_collect_logs,
self, mock_time, mock_sleep, mock_exists, mock_signal_ctx, mock_collect_logs,
mock_rollback, mock_cleanup, mock_cerberus_publish
):
"""Test that cerberus.publish_kraken_status receives correct timestamps"""
@@ -181,9 +185,10 @@ class TestAbstractScenarioPluginCerberusIntegration(unittest.TestCase):
@patch('krkn.scenario_plugins.abstract_scenario_plugin.cleanup_rollback_version_files')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.utils.collect_and_put_ocp_logs')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.signal_handler.signal_context')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.os.path.exists', return_value=True)
@patch('time.sleep')
def test_cerberus_publish_exception_does_not_break_flow(
self, mock_sleep, mock_signal_ctx, mock_collect_logs, mock_cleanup, mock_cerberus_publish
self, mock_sleep, mock_exists, mock_signal_ctx, mock_collect_logs, mock_cleanup, mock_cerberus_publish
):
"""Test that exceptions in cerberus.publish_kraken_status don't break scenario execution"""
mock_signal_ctx.return_value.__enter__ = Mock()
@@ -210,9 +215,10 @@ class TestAbstractScenarioPluginCerberusIntegration(unittest.TestCase):
@patch('krkn.scenario_plugins.abstract_scenario_plugin.execute_rollback_version_files')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.utils.collect_and_put_ocp_logs')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.signal_handler.signal_context')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.os.path.exists', return_value=True)
@patch('time.sleep')
def test_cerberus_publish_called_for_mixed_success_and_failure(
self, mock_sleep, mock_signal_ctx, mock_collect_logs, mock_rollback,
self, mock_sleep, mock_exists, mock_signal_ctx, mock_collect_logs, mock_rollback,
mock_cleanup, mock_cerberus_publish
):
"""Test cerberus publish is called for both successful and failed scenarios"""
@@ -250,9 +256,10 @@ class TestAbstractScenarioPluginCerberusIntegration(unittest.TestCase):
@patch('krkn.scenario_plugins.abstract_scenario_plugin.cerberus.publish_kraken_status')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.utils.collect_and_put_ocp_logs')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.signal_handler.signal_context')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.os.path.exists', return_value=True)
@patch('time.sleep')
def test_cerberus_not_called_for_deprecated_post_scenarios(
self, mock_sleep, mock_signal_ctx, mock_collect_logs, mock_cerberus_publish
self, mock_sleep, mock_exists, mock_signal_ctx, mock_collect_logs, mock_cerberus_publish
):
"""Test that cerberus is not called for deprecated post scenarios (list format)"""
mock_signal_ctx.return_value.__enter__ = Mock()
@@ -277,9 +284,10 @@ class TestAbstractScenarioPluginCerberusIntegration(unittest.TestCase):
@patch('krkn.scenario_plugins.abstract_scenario_plugin.utils.collect_and_put_ocp_logs')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.utils.populate_cluster_events')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.signal_handler.signal_context')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.os.path.exists', return_value=True)
@patch('time.sleep')
def test_cerberus_called_with_events_backup_enabled(
self, mock_sleep, mock_signal_ctx, mock_populate_events,
self, mock_sleep, mock_exists, mock_signal_ctx, mock_populate_events,
mock_collect_logs, mock_cleanup, mock_cerberus_publish
):
"""Test that cerberus is called even when events_backup is enabled"""
@@ -308,9 +316,10 @@ class TestAbstractScenarioPluginCerberusIntegration(unittest.TestCase):
@patch('krkn.scenario_plugins.abstract_scenario_plugin.execute_rollback_version_files')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.utils.collect_and_put_ocp_logs')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.signal_handler.signal_context')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.os.path.exists', return_value=True)
@patch('time.sleep')
def test_cerberus_called_after_exception_in_run(
self, mock_sleep, mock_signal_ctx, mock_collect_logs,
self, mock_sleep, mock_exists, mock_signal_ctx, mock_collect_logs,
mock_rollback, mock_cerberus_publish
):
"""Test that cerberus is called even if run() raises an uncaught exception"""
@@ -345,5 +354,73 @@ class TestAbstractScenarioPluginCerberusIntegration(unittest.TestCase):
self.assertEqual(telemetries[0].exit_status, 1)
@patch('krkn.scenario_plugins.abstract_scenario_plugin.cerberus.publish_kraken_status')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.os.path.exists', return_value=False)
@patch('time.sleep')
def test_missing_scenario_file_logs_error_and_marks_failed(
self, mock_sleep, mock_exists, mock_cerberus_publish
):
"""Test that a missing scenario file logs a clear error and is marked as failed without crashing"""
scenarios_list = ["scenarios/openshift/cnv.yml"]
with self.assertLogs('root', level='ERROR') as log_ctx:
failed_scenarios, telemetries = self.plugin.run_scenarios(
"test-uuid",
scenarios_list,
self.krkn_config,
self.mock_telemetry,
)
# scenario is marked failed and returned in failed list
self.assertEqual(len(failed_scenarios), 1)
self.assertEqual(failed_scenarios[0], "scenarios/openshift/cnv.yml")
# telemetry recorded with exit_status=1
self.assertEqual(len(telemetries), 1)
self.assertEqual(telemetries[0].exit_status, 1)
# error message contains the missing path
self.assertTrue(
any("scenarios/openshift/cnv.yml" in msg for msg in log_ctx.output),
f"Expected file path in error log, got: {log_ctx.output}",
)
# set_parameters_base64 and cerberus should not be called
self.mock_telemetry.set_parameters_base64.assert_not_called()
mock_cerberus_publish.assert_not_called()
@patch('krkn.scenario_plugins.abstract_scenario_plugin.cerberus.publish_kraken_status')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.cleanup_rollback_version_files')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.utils.collect_and_put_ocp_logs')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.signal_handler.signal_context')
@patch('krkn.scenario_plugins.abstract_scenario_plugin.os.path.exists')
@patch('time.sleep')
def test_missing_scenario_file_skipped_others_continue(
self, mock_sleep, mock_exists, mock_signal_ctx, mock_collect_logs,
mock_cleanup, mock_cerberus_publish
):
"""Test that a missing file is skipped and remaining scenarios still run"""
mock_signal_ctx.return_value.__enter__ = Mock()
mock_signal_ctx.return_value.__exit__ = Mock(return_value=False)
# first file missing, second exists
mock_exists.side_effect = [False, True]
scenarios_list = ["missing.yml", "scenario2.yaml"]
with self.assertLogs('root', level='ERROR'):
failed_scenarios, telemetries = self.plugin.run_scenarios(
"test-uuid",
scenarios_list,
self.krkn_config,
self.mock_telemetry,
)
self.assertIn("missing.yml", failed_scenarios)
self.assertNotIn("scenario2.yaml", failed_scenarios)
self.assertEqual(len(telemetries), 2)
# cerberus called only for the scenario that ran
self.assertEqual(mock_cerberus_publish.call_count, 1)
if __name__ == '__main__':
unittest.main()

View File

@@ -176,7 +176,7 @@ class TestKubevirtVmOutageScenarioPlugin(unittest.TestCase):
self.k8s_client.delete_vmi.return_value = None
with patch("builtins.open", unittest.mock.mock_open(read_data=yaml.dump(self.config))):
result = self.plugin.run("test-uuid", self.scenario_file, {}, self.telemetry, self.scenario_telemetry)
result = self.plugin.run("test-uuid", self.scenario_file, self.telemetry, self.scenario_telemetry)
self.assertEqual(result, 0)
self.k8s_client.delete_vmi.assert_called_once_with("test-vm", "default")
@@ -196,7 +196,7 @@ class TestKubevirtVmOutageScenarioPlugin(unittest.TestCase):
self.k8s_client.delete_vmi.side_effect = ApiException(status=500)
with patch("builtins.open", unittest.mock.mock_open(read_data=yaml.dump(self.config))):
result = self.plugin.run("test-uuid", self.scenario_file, {}, self.telemetry, self.scenario_telemetry)
result = self.plugin.run("test-uuid", self.scenario_file, self.telemetry, self.scenario_telemetry)
self.assertEqual(result, 1)
self.k8s_client.delete_vmi.assert_called_once_with("test-vm", "default")
@@ -234,7 +234,7 @@ class TestKubevirtVmOutageScenarioPlugin(unittest.TestCase):
self.k8s_client.delete_vmi.return_value = None
with patch("builtins.open", unittest.mock.mock_open(read_data=yaml.dump(self.config))):
result = self.plugin.run("test-uuid", self.scenario_file, {}, self.telemetry, self.scenario_telemetry)
result = self.plugin.run("test-uuid", self.scenario_file, self.telemetry, self.scenario_telemetry)
self.assertEqual(result, 0)
# Verify patch_vm was called to disable auto-restart
@@ -278,7 +278,7 @@ class TestKubevirtVmOutageScenarioPlugin(unittest.TestCase):
self.k8s_client.get_vmi.return_value = None
with patch("builtins.open", unittest.mock.mock_open(read_data=yaml.dump(self.config))):
result = self.plugin.run("test-uuid", self.scenario_file, {}, self.telemetry, self.scenario_telemetry)
result = self.plugin.run("test-uuid", self.scenario_file, self.telemetry, self.scenario_telemetry)
# When validation fails, run() returns 1 due to exception handling
self.assertEqual(result, 1)

View File

@@ -0,0 +1,323 @@
#!/usr/bin/env python3
"""
Tests for fixes introduced in issues #24 to #28.
Stubs all external dependencies (krkn_lib, kubernetes, broken urllib3)
so tests run without any additional installs.
Usage (run from repo root):
python3 -m coverage run -a -m unittest tests/test_fixes_24_to_28.py -v
"""
import queue
import sys
import types
import unittest
from unittest.mock import MagicMock, patch
# ---------------------------------------------------------------------------
# Inject minimal stubs for every external dependency
# ---------------------------------------------------------------------------
def _inject(name, **attrs):
mod = types.ModuleType(name)
for k, v in attrs.items():
setattr(mod, k, v)
sys.modules.setdefault(name, mod)
return sys.modules[name]
# -- krkn_lib ----------------------------------------------------------------
_inject("krkn_lib")
_inject("krkn_lib.utils", deep_get_attribute=MagicMock(return_value=[]))
_inject("krkn_lib.utils.functions",
get_yaml_item_value=MagicMock(
side_effect=lambda cfg, key, default: (
cfg.get(key, default) if isinstance(cfg, dict) else default
)
))
_inject("krkn_lib.models.telemetry",
ScenarioTelemetry=MagicMock(), ChaosRunTelemetry=MagicMock())
class _VirtCheck:
def __init__(self, d):
for k, v in d.items():
setattr(self, k, v)
_inject("krkn_lib.models.telemetry.models", VirtCheck=_VirtCheck)
_inject("krkn_lib.models.krkn",
ChaosRunAlertSummary=MagicMock(), ChaosRunAlert=MagicMock())
_inject("krkn_lib.models.elastic.models", ElasticAlert=MagicMock())
_inject("krkn_lib.models.elastic", ElasticChaosRunTelemetry=MagicMock())
_inject("krkn_lib.models.k8s", ResiliencyReport=MagicMock())
_inject("krkn_lib.elastic.krkn_elastic", KrknElastic=MagicMock())
_inject("krkn_lib.prometheus.krkn_prometheus", KrknPrometheus=MagicMock())
_inject("krkn_lib.telemetry.ocp", KrknTelemetryOpenshift=MagicMock())
_inject("krkn_lib.telemetry.k8s", KrknTelemetryKubernetes=MagicMock())
_inject("krkn_lib.k8s", KrknKubernetes=MagicMock())
_inject("krkn_lib.ocp", KrknOpenshift=MagicMock())
# -- broken third-party ------------------------------------------------------
# urllib3.exceptions doesn't export HTTPError on this Python version
import urllib3.exceptions # noqa: E402 (real module, just patch the attr)
if not hasattr(urllib3.exceptions, "HTTPError"):
urllib3.exceptions.HTTPError = Exception
# kubernetes stub the whole chain before anything imports it
_inject("kubernetes")
_inject("kubernetes.client")
_inject("kubernetes.client.rest", ApiException=type("ApiException", (Exception,), {}))
# -- other stubs needed by krkn internals ------------------------------------
_inject("tzlocal")
_inject("tzlocal.unix", get_localzone=MagicMock(return_value="UTC"))
# kubevirt plugin (imports kubernetes.client.rest)
_KubevirtPlugin = MagicMock()
_inject(
"krkn.scenario_plugins.kubevirt_vm_outage"
".kubevirt_vm_outage_scenario_plugin",
KubevirtVmOutageScenarioPlugin=_KubevirtPlugin,
)
# -- yaml (real or stub) -----------------------------------------------------
try:
import yaml as _yaml # noqa: F401
except ImportError:
_inject("yaml")
# ---------------------------------------------------------------------------
# Now import the actual krkn modules under test
# ---------------------------------------------------------------------------
from krkn.prometheus import client # noqa: E402
from krkn.utils import VirtChecker as VirtCheckerModule # noqa: E402
from krkn.utils.VirtChecker import VirtChecker # noqa: E402
# ===========================================================================
# #1 — Typo "wating" -> "waiting"
# ===========================================================================
class TestIssue24TypoFix(unittest.TestCase):
"""#24: Log message must spell 'waiting' correctly."""
def test_no_wating_typo_in_source(self):
import pathlib
src = pathlib.Path("krkn/scenario_plugins/abstract_scenario_plugin.py").read_text()
self.assertNotIn('"wating ', src,
"Typo 'wating' still present in abstract_scenario_plugin.py")
def test_waiting_present_in_source(self):
import pathlib
src = pathlib.Path("krkn/scenario_plugins/abstract_scenario_plugin.py").read_text()
self.assertIn('"waiting ', src,
"'waiting' not found in abstract_scenario_plugin.py")
# ===========================================================================
# #2 — print() replaced by logging.debug()
# ===========================================================================
class TestIssue25NoPrintInClient(unittest.TestCase):
"""#25: client.py must not use print() for pod metric messages."""
def test_no_print_adding_pod(self):
import pathlib
src = pathlib.Path("krkn/prometheus/client.py").read_text()
self.assertNotIn("print('adding pod'", src)
self.assertNotIn('print("adding pod"', src)
def test_logging_debug_used(self):
import pathlib
src = pathlib.Path("krkn/prometheus/client.py").read_text()
self.assertIn('logging.debug("adding pod', src)
def test_metrics_does_not_write_to_stdout(self):
"""metrics() must not emit to stdout for pod telemetry entries."""
import io, json, os, tempfile
prom_cli = MagicMock()
prom_cli.process_prom_query_in_range.return_value = []
prom_cli.process_query.return_value = []
telemetry_data = {
"scenarios": [{
"affected_pods": {
"disrupted": [{"name": "pod-1", "namespace": "default"}]
},
"affected_nodes": [],
}],
"health_checks": [],
"virt_checks": [],
}
profile = tempfile.NamedTemporaryFile(
mode="w", suffix=".yaml", delete=False
)
profile.write("metrics:\n - query: up\n metricName: uptime\n")
profile.close()
elastic = MagicMock()
elastic.upload_metrics_to_elasticsearch.return_value = 0
captured = io.StringIO()
sys.stdout, orig = captured, sys.stdout
try:
client.metrics(
prom_cli, elastic, "uuid-1",
1_000_000.0, 1_000_060.0,
profile.name, "idx",
json.dumps(telemetry_data),
)
finally:
sys.stdout = orig
os.unlink(profile.name)
self.assertEqual(
captured.getvalue(), "",
f"stdout was not empty: {captured.getvalue()!r}",
)
# ===========================================================================
# #3 — Star import removed
# ===========================================================================
class TestIssue26NoStarImport(unittest.TestCase):
"""#26: utils/__init__.py must use explicit imports, not star import."""
def test_no_star_import(self):
import pathlib
src = pathlib.Path("krkn/utils/__init__.py").read_text()
self.assertNotIn("import *", src)
def test_explicit_names_present(self):
import pathlib
src = pathlib.Path("krkn/utils/__init__.py").read_text()
self.assertIn("populate_cluster_events", src)
self.assertIn("collect_and_put_ocp_logs", src)
self.assertIn("KrknKubernetes", src)
self.assertIn("ScenarioTelemetry", src)
self.assertIn("KrknTelemetryOpenshift", src)
def test_functions_accessible_from_package(self):
from krkn import utils
self.assertTrue(hasattr(utils, "populate_cluster_events"))
self.assertTrue(hasattr(utils, "collect_and_put_ocp_logs"))
self.assertTrue(hasattr(utils, "KrknKubernetes"))
self.assertTrue(hasattr(utils, "ScenarioTelemetry"))
self.assertTrue(hasattr(utils, "KrknTelemetryOpenshift"))
# ===========================================================================
# #4 — global declaration removed from main()
# ===========================================================================
class TestIssue27NoGlobalInMain(unittest.TestCase):
"""#27: main() in run_kraken.py must not declare global variables."""
def test_no_global_statement_in_main(self):
import ast, pathlib
src = pathlib.Path("run_kraken.py").read_text()
tree = ast.parse(src)
found = []
for node in ast.walk(tree):
if isinstance(node, ast.FunctionDef) and node.name == "main":
for child in ast.walk(node):
if isinstance(child, ast.Global):
found.extend(child.names)
self.assertEqual(found, [],
f"Global declarations found in main(): {found}")
# ===========================================================================
# #5 — Exception logged at ERROR level, not INFO
# ===========================================================================
class TestIssue28ExceptionLogLevel(unittest.TestCase):
"""#28: VirtChecker must log VM status exceptions at ERROR, not INFO."""
def test_no_info_for_vm_exception_in_source(self):
import pathlib
src = pathlib.Path("krkn/utils/VirtChecker.py").read_text()
self.assertNotIn(
"logging.info('Exception in get vm status')", src
)
def test_error_level_present_in_source(self):
import pathlib
src = pathlib.Path("krkn/utils/VirtChecker.py").read_text()
self.assertIn(
'logging.exception("Exception in get vm status")', src
)
def test_runtime_exception_triggers_error_log(self):
"""When get_vm_access raises, the handler must call logging.error."""
config = {}
mock_krkn = MagicMock()
with patch(
"krkn.utils.VirtChecker.get_yaml_item_value",
side_effect=lambda cfg, key, default: (
cfg.get(key, default) if isinstance(cfg, dict) else default
),
):
checker = VirtChecker(config, iterations=1, krkn_lib=mock_krkn)
checker.batch_size = 1
checker.interval = 0
checker.disconnected = False
vm = _VirtCheck({
"vm_name": "vm-1",
"ip_address": "1.2.3.4",
"namespace": "ns",
"node_name": "w1",
"new_ip_address": "",
})
error_calls, info_calls, exception_calls = [], [], []
with (
patch.object(
checker, "get_vm_access",
side_effect=RuntimeError("connection refused"),
),
patch("krkn.utils.VirtChecker.logging") as mock_log,
patch("krkn.utils.VirtChecker.time") as mock_time,
):
mock_log.error.side_effect = (
lambda msg, *a, **kw: error_calls.append(msg % a if a else msg)
)
mock_log.info.side_effect = (
lambda msg, *a, **kw: info_calls.append(msg % a if a else msg)
)
mock_log.exception.side_effect = (
lambda msg, *a, **kw: exception_calls.append(msg % a if a else msg)
)
# End loop after first sleep
mock_time.sleep.side_effect = (
lambda _: setattr(checker, "current_iterations", 999)
)
checker.current_iterations = 0
q = queue.SimpleQueue()
checker.run_virt_check([vm], q)
vm_infos = [m for m in info_calls if "Exception in get vm status" in m]
err_vm_msgs = [m for m in error_calls + exception_calls if "Exception in get vm status" in m]
self.assertEqual(
vm_infos, [],
"Exception still logged at INFO level at runtime",
)
self.assertGreater(
len(err_vm_msgs), 0,
"Exception not logged at ERROR level at runtime",
)
if __name__ == "__main__":
unittest.main()
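The `_inject` helper in the test file above registers stub modules in `sys.modules` so that later imports of unavailable dependencies succeed. The core of that technique, reduced to a minimal self-contained sketch (`fake_cloud_sdk` is a made-up module name):

```python
import sys
import types

def inject_stub(name, **attrs):
    # Create an empty module object and attach the requested attributes.
    mod = types.ModuleType(name)
    for key, value in attrs.items():
        setattr(mod, key, value)
    # setdefault: never clobber a module that is genuinely importable.
    sys.modules.setdefault(name, mod)
    return sys.modules[name]

# Stub a dependency that is not installed, then import it normally.
inject_stub("fake_cloud_sdk", connect=lambda: "connected")
import fake_cloud_sdk
```

Because Python's import machinery consults `sys.modules` first, the stub must be registered before any module under test executes its own `import` statements, which is why the test file performs all injection at the top, ahead of the krkn imports.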

View File

@@ -597,7 +597,7 @@ class TestFinalizeAndSave(unittest.TestCase):
prom_cli=self.mock_prom,
total_start_time=self.start,
total_end_time=self.end,
run_mode="controller",
run_mode="detailed",
)
mock_print.assert_called()

View File

@@ -10,7 +10,7 @@ Assisted By: Claude Code
"""
import unittest
from unittest.mock import MagicMock
from unittest.mock import MagicMock, patch
from krkn_lib.k8s import KrknKubernetes
from krkn_lib.telemetry.ocp import KrknTelemetryOpenshift
@@ -35,6 +35,22 @@ class TestTimeActionsScenarioPlugin(unittest.TestCase):
self.assertEqual(result, ["time_scenarios"])
self.assertEqual(len(result), 1)
@patch("krkn.scenario_plugins.time_actions.time_actions_scenario_plugin.logging")
@patch("builtins.open", side_effect=RuntimeError("disk quota exceeded"))
def test_exception_variable_bound_in_except_handler(self, mock_open, mock_logging):
"""run() must bind exception variable so logging shows actual error, not NameError"""
result = self.plugin.run(
run_uuid="test-uuid",
scenario="fake_scenario.yaml",
lib_telemetry=MagicMock(),
scenario_telemetry=MagicMock(),
)
self.assertEqual(result, 1)
logged_msg = mock_logging.error.call_args[0][0]
self.assertIn("disk quota exceeded", logged_msg)
self.assertNotIn("NameError", logged_msg)
if __name__ == "__main__":
unittest.main()

View File

@@ -4,18 +4,26 @@
Test suite for ZoneOutageScenarioPlugin class
Usage:
python -m coverage run -a -m unittest tests/test_zone_outage_scenario_plugin.py -v
python -m coverage run -a -m unittest \
tests/test_zone_outage_scenario_plugin.py -v
Assisted By: Claude Code
"""
import base64
import json
import tempfile
import unittest
from unittest.mock import MagicMock
import uuid
from pathlib import Path
from unittest.mock import MagicMock, patch
from krkn_lib.k8s import KrknKubernetes
from krkn_lib.telemetry.ocp import KrknTelemetryOpenshift
import yaml
from krkn.scenario_plugins.zone_outage.zone_outage_scenario_plugin import ZoneOutageScenarioPlugin
from krkn.rollback.config import RollbackContent
from krkn.scenario_plugins.zone_outage.zone_outage_scenario_plugin import (
ZoneOutageScenarioPlugin,
)
class TestZoneOutageScenarioPlugin(unittest.TestCase):
@@ -36,5 +44,217 @@ class TestZoneOutageScenarioPlugin(unittest.TestCase):
self.assertEqual(len(result), 1)
class TestRollbackGcpZoneOutage(unittest.TestCase):
"""Tests for the GCP zone outage rollback functionality"""
@patch(
"krkn.scenario_plugins.node_actions."
"gcp_node_scenarios.gcp_node_scenarios"
)
def test_rollback_gcp_zone_outage_success(self, mock_gcp_class):
"""
Test successful rollback starts all stopped nodes
"""
rollback_data = {
"nodes": ["node-1", "node-2", "node-3"],
"timeout": 180,
"kube_check": True,
}
encoded = base64.b64encode(
json.dumps(rollback_data).encode("utf-8")
).decode("utf-8")
rollback_content = RollbackContent(
resource_identifier=encoded,
)
mock_lib_telemetry = MagicMock()
mock_kubecli = MagicMock()
mock_lib_telemetry.get_lib_kubernetes.return_value = mock_kubecli
mock_cloud_instance = MagicMock()
mock_gcp_class.return_value = mock_cloud_instance
ZoneOutageScenarioPlugin.rollback_gcp_zone_outage(
rollback_content, mock_lib_telemetry
)
self.assertEqual(
mock_cloud_instance.node_start_scenario.call_count, 3
)
mock_cloud_instance.node_start_scenario.assert_any_call(
1, "node-1", 180, None
)
mock_cloud_instance.node_start_scenario.assert_any_call(
1, "node-2", 180, None
)
mock_cloud_instance.node_start_scenario.assert_any_call(
1, "node-3", 180, None
)

    @patch(
        "krkn.scenario_plugins.node_actions."
        "gcp_node_scenarios.gcp_node_scenarios"
    )
    def test_rollback_gcp_zone_outage_partial_failure(self, mock_gcp_class):
        """
        Test rollback continues when one node fails to start
        """
        rollback_data = {
            "nodes": ["node-1", "node-2"],
            "timeout": 180,
            "kube_check": True,
        }
        encoded = base64.b64encode(
            json.dumps(rollback_data).encode("utf-8")
        ).decode("utf-8")
        rollback_content = RollbackContent(
            resource_identifier=encoded,
        )
        mock_lib_telemetry = MagicMock()
        mock_kubecli = MagicMock()
        mock_lib_telemetry.get_lib_kubernetes.return_value = mock_kubecli
        mock_cloud_instance = MagicMock()
        mock_gcp_class.return_value = mock_cloud_instance
        mock_cloud_instance.node_start_scenario.side_effect = [
            Exception("GCP API error"),
            None,
        ]

        ZoneOutageScenarioPlugin.rollback_gcp_zone_outage(
            rollback_content, mock_lib_telemetry
        )

        self.assertEqual(
            mock_cloud_instance.node_start_scenario.call_count, 2
        )
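The partial-failure test hinges on `MagicMock`'s list-valued `side_effect`: each call consumes the next item of the list, and an item that is an exception instance is raised rather than returned. A minimal standalone sketch of that behavior (illustration only, no plugin code involved):

```python
# Standalone sketch: a list-valued side_effect drives the two calls in
# the partial-failure test -- the first raises, the second returns None.
from unittest.mock import MagicMock

mock_start = MagicMock(side_effect=[Exception("GCP API error"), None])

results = []
for node in ["node-1", "node-2"]:
    try:
        results.append(mock_start(node))
    except Exception as exc:
        # Per-node error handling: record the failure and keep going,
        # mirroring what the rollback is expected to do.
        results.append(f"failed: {exc}")

assert mock_start.call_count == 2
assert results == ["failed: GCP API error", None]
```

Because the loop swallows the first exception, the mock still records both calls, which is exactly what the `call_count == 2` assertion above verifies.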

    def test_rollback_gcp_zone_outage_invalid_data(self):
        """
        Test rollback raises exception for invalid base64 data
        """
        rollback_content = RollbackContent(
            resource_identifier="invalid_base64_data",
        )
        mock_lib_telemetry = MagicMock()
        with self.assertRaises(Exception):
            ZoneOutageScenarioPlugin.rollback_gcp_zone_outage(
                rollback_content, mock_lib_telemetry
            )
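The rollback tests build their `RollbackContent` payload by JSON-serializing the node data and base64-encoding the result. A minimal sketch of that round-trip; the helper names here are illustrative, not taken from the plugin:

```python
import base64
import json


def encode_rollback_data(data: dict) -> str:
    # JSON-serialize, then base64-encode, yielding a single opaque
    # string suitable for RollbackContent.resource_identifier.
    return base64.b64encode(json.dumps(data).encode("utf-8")).decode("utf-8")


def decode_rollback_data(encoded: str) -> dict:
    # Reverse operation; malformed input raises (binascii.Error or
    # json.JSONDecodeError), which is why the invalid-data test above
    # expects rollback to raise.
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


payload = {"nodes": ["node-1", "node-2"], "timeout": 180, "kube_check": True}
assert decode_rollback_data(encode_rollback_data(payload)) == payload
```

The round-trip is lossless for JSON-compatible dicts, so the rollback handler can recover exactly the node list and timeout that the run phase recorded.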


class TestZoneOutageRun(unittest.TestCase):
    """Tests for the run method of ZoneOutageScenarioPlugin"""

    def setUp(self):
        self.temp_dir = tempfile.TemporaryDirectory()
        self.tmp_path = Path(self.temp_dir.name)

    def tearDown(self):
        self.temp_dir.cleanup()

    def _create_scenario_file(self, config=None):
        """Helper to create a temporary scenario YAML file"""
        default_config = {
            "zone_outage": {
                "cloud_type": "gcp",
                "zone": "us-central1-a",
                "duration": 1,
                "timeout": 10,
                "kube_check": True,
            }
        }
        if config:
            default_config["zone_outage"].update(config)
        scenario_file = self.tmp_path / "test_scenario.yaml"
        with open(scenario_file, "w") as f:
            yaml.dump(default_config, f)
        return str(scenario_file)
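The helper serializes the scenario with `yaml.dump`; the plugin presumably reads it back with a YAML loader. A small standalone round-trip of the same config shape (requires PyYAML, which this test module already imports; no plugin code involved):

```python
import tempfile
from pathlib import Path

import yaml  # PyYAML, third-party

config = {
    "zone_outage": {
        "cloud_type": "gcp",
        "zone": "us-central1-a",
        "duration": 1,
        "timeout": 10,
        "kube_check": True,
    }
}

with tempfile.TemporaryDirectory() as tmp:
    # Write the scenario the same way _create_scenario_file does...
    scenario_file = Path(tmp) / "test_scenario.yaml"
    scenario_file.write_text(yaml.dump(config))
    # ...and load it back with the safe loader.
    loaded = yaml.safe_load(scenario_file.read_text())

assert loaded == config
```

Scalars and nested dicts survive the round-trip unchanged, so overrides applied via `update()` before dumping are what the plugin sees when it parses the file.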

    def _create_mocks(self):
        """Helper to create mock objects for testing"""
        mock_lib_telemetry = MagicMock()
        mock_lib_kubernetes = MagicMock()
        mock_lib_telemetry.get_lib_kubernetes.return_value = (
            mock_lib_kubernetes
        )
        mock_scenario_telemetry = MagicMock()
        return mock_lib_telemetry, mock_lib_kubernetes, mock_scenario_telemetry

    @patch("time.sleep")
    @patch(
        "krkn.scenario_plugins.zone_outage."
        "zone_outage_scenario_plugin.gcp_node_scenarios"
    )
    def test_run_gcp_success(self, mock_gcp_class, mock_sleep):
        """Test successful GCP zone outage scenario execution"""
        scenario_file = self._create_scenario_file()
        mock_lib_telemetry, mock_lib_kubernetes, mock_scenario_telemetry = (
            self._create_mocks()
        )
        mock_lib_kubernetes.list_killable_nodes.return_value = ["node-1"]
        mock_cloud = MagicMock()
        mock_gcp_class.return_value = mock_cloud

        plugin = ZoneOutageScenarioPlugin()
        result = plugin.run(
            run_uuid=str(uuid.uuid4()),
            scenario=scenario_file,
            lib_telemetry=mock_lib_telemetry,
            scenario_telemetry=mock_scenario_telemetry,
        )

        self.assertEqual(result, 0)
        mock_lib_kubernetes.list_killable_nodes.assert_called_once()
        mock_cloud.node_stop_scenario.assert_called()
        mock_cloud.node_start_scenario.assert_called()

    def test_run_unsupported_cloud_type(self):
        """Test run returns 1 for unsupported cloud type"""
        scenario_file = self._create_scenario_file(
            {"cloud_type": "unsupported"}
        )
        mock_lib_telemetry, mock_lib_kubernetes, mock_scenario_telemetry = (
            self._create_mocks()
        )

        plugin = ZoneOutageScenarioPlugin()
        result = plugin.run(
            run_uuid=str(uuid.uuid4()),
            scenario=scenario_file,
            lib_telemetry=mock_lib_telemetry,
            scenario_telemetry=mock_scenario_telemetry,
        )

        self.assertEqual(result, 1)

    def test_run_gcp_exception(self):
        """Test run handles exceptions gracefully"""
        scenario_file = self._create_scenario_file()
        mock_lib_telemetry, mock_lib_kubernetes, mock_scenario_telemetry = (
            self._create_mocks()
        )
        mock_lib_telemetry.get_lib_kubernetes.side_effect = Exception(
            "Connection error"
        )

        plugin = ZoneOutageScenarioPlugin()
        result = plugin.run(
            run_uuid=str(uuid.uuid4()),
            scenario=scenario_file,
            lib_telemetry=mock_lib_telemetry,
            scenario_telemetry=mock_scenario_telemetry,
        )

        self.assertEqual(result, 1)


if __name__ == "__main__":
    unittest.main()