Dockerfile v1.5.4 (#552 )

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
Replaced oc debug command execution on node with a native version (#547 )
2026-02-18 20:09:55 +00:00 · 2024-01-15 19:22:52 +01:00 · 2024-01-15 12:15:38 -05:00 · 2024-01-15 11:11:45 -05:00 · 2024-01-12 13:58:50 -05:00 · 2024-01-12 13:21:37 -05:00
12 changed files with 112 additions and 102 deletions
--- a/.github/workflows/docker-image.yml
+++ b/.github/workflows/docker-image.yml
@@ -12,14 +12,25 @@ jobs:
    - name: Check out code
      uses: actions/checkout@v3
    - name: Build the Docker images
-      run: docker build --no-cache -t quay.io/redhat-chaos/krkn containers/
+      run: |
+        docker build --no-cache -t quay.io/krkn-chaos/krkn containers/
+        docker tag quay.io/krkn-chaos/krkn quay.io/redhat-chaos/krkn
    - name: Login in quay
+      if: github.ref == 'refs/heads/main' && github.event_name == 'push'
+      run: docker login quay.io -u ${QUAY_USER} -p ${QUAY_TOKEN}
+      env:
+        QUAY_USER: ${{ secrets.QUAY_USERNAME }}
+        QUAY_TOKEN: ${{ secrets.QUAY_PASSWORD }}
+    - name: Push the KrknChaos Docker images
+      if: github.ref == 'refs/heads/main' && github.event_name == 'push'
+      run: docker push quay.io/krkn-chaos/krkn
+    - name: Log in to redhat-chaos quay
      if: github.ref == 'refs/heads/main' && github.event_name == 'push'
      run: docker login quay.io -u ${QUAY_USER} -p ${QUAY_TOKEN}
      env:
        QUAY_USER: ${{ secrets.QUAY_USER_1 }}
        QUAY_TOKEN: ${{ secrets.QUAY_TOKEN_1 }}
-    - name: Push the Docker images
+    - name: Push the RedHat Chaos Docker images
      if: github.ref == 'refs/heads/main' && github.event_name == 'push'
      run: docker push quay.io/redhat-chaos/krkn
    - name: Rebuild krkn-hub
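
The workflow change above builds the image once as `quay.io/krkn-chaos/krkn`, tags the same image as `quay.io/redhat-chaos/krkn`, and only logs in and pushes to each organization on a push to `main`. The following is a minimal Python sketch, not part of the repository, that models that ordering. `planned_commands` and `should_publish` are hypothetical names, and the `-u`/`-p` credentials are deliberately omitted from the `docker login` invocations.

```python
from typing import List

# Image names as they appear in .github/workflows/docker-image.yml.
PRIMARY = "quay.io/krkn-chaos/krkn"
MIRROR = "quay.io/redhat-chaos/krkn"


def should_publish(ref: str, event_name: str) -> bool:
    """Same condition the workflow puts on every login and push step."""
    return ref == "refs/heads/main" and event_name == "push"


def planned_commands(ref: str, event_name: str) -> List[List[str]]:
    """Return the docker invocations in the order the workflow runs them."""
    # Build once under the new org, then alias the same image to the old org.
    commands = [
        ["docker", "build", "--no-cache", "-t", PRIMARY, "containers/"],
        ["docker", "tag", PRIMARY, MIRROR],
    ]
    if should_publish(ref, event_name):
        # Each org uses its own secrets, so a login precedes each push.
        # Credentials (-u/-p) are intentionally left out of this sketch.
        commands += [
            ["docker", "login", "quay.io"],
            ["docker", "push", PRIMARY],
            ["docker", "login", "quay.io"],
            ["docker", "push", MIRROR],
        ]
    return commands


if __name__ == "__main__":
    for argv in planned_commands("refs/heads/main", "push"):
        print(" ".join(argv))
```

On pull requests or other branches only the build and tag steps run, so nothing is published from unreviewed code.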
--- a/README.md
+++ b/README.md
@@ -1,11 +1,11 @@
-# KrknChaos aka Kraken
+# Krkn aka Kraken
 [![Docker Repository on Quay](https://quay.io/repository/redhat-chaos/krkn/status "Docker Repository on Quay")](https://quay.io/repository/redhat-chaos/krkn?tab=tags&tag=latest)
 ![Workflow-Status](https://github.com/redhat-chaos/krkn/actions/workflows/docker-image.yml/badge.svg)

 ![Krkn logo](media/logo.png)

-Chaos and resiliency testing tool for Kubernetes and OpenShift.
-Kraken injects deliberate failures into Kubernetes/OpenShift clusters to check if it is resilient to turbulent conditions.
+Chaos and resiliency testing tool for Kubernetes.
+Kraken injects deliberate failures into Kubernetes clusters to check if they are resilient to turbulent conditions.


 ### Workflow
@@ -18,13 +18,13 @@ Kraken injects deliberate failures into Kubernetes/OpenShift clusters to check i
 ### Chaos Testing Guide
 [Guide](docs/index.md) encapsulates:
 - Test methodology that needs to be embraced.
- Best practices that an OpenShift cluster, platform and applications running on top of it should take into account for best user experience, performance, resilience and reliability.
+- Best practices that a Kubernetes cluster, platform and applications running on top of it should take into account for best user experience, performance, resilience and reliability.
 - Tooling.
 - Scenarios supported.
 - Test environment recommendations as to how and where to run chaos tests.
 - Chaos testing in practice.

-The guide is hosted at https://redhat-chaos.github.io/krknChoas.
+The guide is hosted at https://krkn-chaos.github.io/krkn.


 ### How to Get Started
@@ -57,29 +57,29 @@ This will manage the Cerberus and Elasticsearch containers on the host on which
 Instructions on how to setup the config and the options supported can be found at [Config](docs/config.md).


-### Kubernetes/OpenShift chaos scenarios supported
+### Kubernetes chaos scenarios supported

-Scenario type               | Kubernetes    | OpenShift          
--------------------------- | ------------- |--------------------|  
-[Pod Scenarios](docs/pod_scenarios.md) | :heavy_check_mark: | :heavy_check_mark: |
-[Pod Network Scenarios](docs/pod_network_scenarios.md) | :x: | :heavy_check_mark: |
-[Container Scenarios](docs/container_scenarios.md) | :heavy_check_mark: | :heavy_check_mark: |
-[Node Scenarios](docs/node_scenarios.md) | :heavy_check_mark: | :heavy_check_mark: |
-[Time Scenarios](docs/time_scenarios.md) | :x: | :heavy_check_mark: |
-[Hog Scenarios: CPU, Memory](docs/arcaflow_scenarios.md) | :heavy_check_mark: | :heavy_check_mark: |
-[Cluster Shut Down Scenarios](docs/cluster_shut_down_scenarios.md) | :heavy_check_mark: | :heavy_check_mark: |
-[Service Disruption Scenarios](docs/service_disruption_scenarios.md.md) | :heavy_check_mark: | :heavy_check_mark: |
-[Zone Outage Scenarios](docs/zone_outage.md) | :heavy_check_mark: | :heavy_check_mark: |
-[Application_outages](docs/application_outages.md) | :heavy_check_mark: | :heavy_check_mark: |
-[PVC scenario](docs/pvc_scenario.md) | :heavy_check_mark: | :heavy_check_mark: |
-[Network_Chaos](docs/network_chaos.md) | :heavy_check_mark: | :heavy_check_mark: |
-[ManagedCluster Scenarios](docs/managedcluster_scenarios.md) | :heavy_check_mark: | :question:         |
+Scenario type               | Kubernetes    |
+--------------------------- | ------------- |
+[Pod Scenarios](docs/pod_scenarios.md) | :heavy_check_mark: |
+[Pod Network Scenarios](docs/pod_network_scenarios.md) | :x: |
+[Container Scenarios](docs/container_scenarios.md) | :heavy_check_mark: |
+[Node Scenarios](docs/node_scenarios.md) | :heavy_check_mark: |
+[Time Scenarios](docs/time_scenarios.md) | :x: |
+[Hog Scenarios: CPU, Memory](docs/arcaflow_scenarios.md) | :heavy_check_mark: |
+[Cluster Shut Down Scenarios](docs/cluster_shut_down_scenarios.md) | :heavy_check_mark: |
+[Service Disruption Scenarios](docs/service_disruption_scenarios.md) | :heavy_check_mark: |
+[Zone Outage Scenarios](docs/zone_outage.md) | :heavy_check_mark: |
+[Application_outages](docs/application_outages.md) | :heavy_check_mark: |
+[PVC scenario](docs/pvc_scenario.md) | :heavy_check_mark: |
+[Network_Chaos](docs/network_chaos.md) | :heavy_check_mark: |
+[ManagedCluster Scenarios](docs/managedcluster_scenarios.md) | :heavy_check_mark: |


 ### Kraken scenario pass/fail criteria and report
 It is important to make sure to check if the targeted component recovered from the chaos injection and also if the Kubernetes/OpenShift cluster is healthy as failures in one component can have an adverse impact on other components. Kraken does this by:
 - Having built in checks for pod and node based scenarios to ensure the expected number of replicas and nodes are up. It also supports running custom scripts with the checks.
- Leveraging [Cerberus](https://github.com/openshift-scale/cerberus) to monitor the cluster under test and consuming the aggregated go/no-go signal to determine pass/fail post chaos. It is highly recommended to turn on the Cerberus health check feature available in Kraken. Instructions on installing and setting up Cerberus can be found [here](https://github.com/openshift-scale/cerberus#installation) or can be installed from Kraken using the [instructions](https://github.com/redhat-chaos/krkn#setting-up-infrastructure-dependencies). Once Cerberus is up and running, set cerberus_enabled to True and cerberus_url to the url where Cerberus publishes go/no-go signal in the Kraken config file. Cerberus can monitor [application routes](https://github.com/redhat-chaos/cerberus/blob/main/docs/config.md#watch-routes) during the chaos and fails the run if it encounters downtime as it is a potential downtime in a customers, or users environment as well. It is especially important during the control plane chaos scenarios including the API server, Etcd, Ingress etc. It can be enabled by setting `check_applicaton_routes: True` in the [Kraken config](https://github.com/redhat-chaos/krkn/blob/main/config/config.yaml) provided application routes are being monitored in the [cerberus config](https://github.com/redhat-chaos/krkn/blob/main/config/cerberus.yaml).
+- Leveraging [Cerberus](https://github.com/redhat-chaos/cerberus) to monitor the cluster under test and consuming the aggregated go/no-go signal to determine pass/fail post chaos. It is highly recommended to turn on the Cerberus health check feature available in Kraken. Instructions on installing and setting up Cerberus can be found [here](https://github.com/openshift-scale/cerberus#installation), or Cerberus can be installed from Kraken using the [instructions](https://github.com/redhat-chaos/krkn#setting-up-infrastructure-dependencies). Once Cerberus is up and running, set `cerberus_enabled` to `True` and `cerberus_url` to the URL where Cerberus publishes the go/no-go signal in the Kraken config file. Cerberus can monitor [application routes](https://github.com/redhat-chaos/cerberus/blob/main/docs/config.md#watch-routes) during the chaos and fails the run if it encounters downtime, since that would represent potential downtime in a customer's or user's environment as well. This is especially important during control plane chaos scenarios targeting the API server, Etcd, Ingress, etc. It can be enabled by setting `check_applicaton_routes: True` in the [Kraken config](https://github.com/redhat-chaos/krkn/blob/main/config/config.yaml), provided application routes are being monitored in the [cerberus config](https://github.com/redhat-chaos/krkn/blob/main/config/cerberus.yaml).
 - Leveraging built-in alert collection feature to fail the runs in case of critical alerts.

 ### Signaling
@@ -94,10 +94,6 @@ More detailed information on enabling and leveraging this feature can be found [
 Monitoring the Kubernetes/OpenShift cluster to observe the impact of Kraken chaos scenarios on various components is key to find out the bottlenecks as it is important to make sure the cluster is healthy in terms if both recovery as well as performance during/after the failure has been injected. Instructions on enabling it can be found [here](docs/performance_dashboards.md).


-### Scraping and storing metrics long term
-Kraken supports capturing metrics for the duration of the scenarios defined in the config and indexes then into Elasticsearch to be able to store and evaluate the state of the runs long term. The indexed metrics can be visualized with the help of Grafana. It uses [Kube-burner](https://github.com/kube-burner/kube-burner) under the hood. The metrics to capture need to be defined in a metrics profile which Kraken consumes to query prometheus ( installed by default in OpenShift ) with the start and end timestamp of the run. Information on enabling and leveraging this feature can be found [here](docs/metrics.md).
-
-
 ### SLOs validation during and post chaos
 - In addition to checking the recovery and health of the cluster and components under test, Kraken takes in a profile with the Prometheus expressions to validate and alerts, exits with a non-zero return code depending on the severity set. This feature can be used to determine pass/fail or alert on abnormalities observed in the cluster based on the metrics. 
 - Kraken also provides ability to check if any critical alerts are firing in the cluster post chaos and pass/fail's. 
@@ -116,7 +112,8 @@ Kraken supports injecting faults into [Open Cluster Management (OCM)](https://op
 - Blog post emphasizing the importance of making Chaos part of Performance and Scale runs to mimic the production environments: https://www.openshift.com/blog/making-chaos-part-of-kubernetes/openshift-performance-and-scalability-tests
 - Blog post on findings from Chaos test runs: https://cloud.redhat.com/blog/openshift/kubernetes-chaos-stories
 - Discussion with CNCF TAG App Delivery on Krkn workflow, features and addition to CNCF sandbox: [Github](https://github.com/cncf/sandbox/issues/44), [Tracker](https://github.com/cncf/tag-app-delivery/issues/465), [recording](https://www.youtube.com/watch?v=nXQkBFK_MWc&t=722s)
-
+- Blog post on supercharging chaos testing using AI integration in Krkn: https://www.redhat.com/en/blog/supercharging-chaos-testing-using-ai
+- Blog post announcing Krkn joining CNCF Sandbox: https://www.redhat.com/en/blog/krknchaos-joining-cncf-sandbox

 ### Roadmap
 Enhancements being planned can be found in the [roadmap](ROADMAP.md).
--- a/config/config.yaml
+++ b/config/config.yaml
@@ -51,8 +51,6 @@ cerberus:
 performance_monitoring:
    deploy_dashboards: False                              # Install a mutable grafana and load the performance dashboards. Enable this only when running on OpenShift
    repo: "https://github.com/cloud-bulldozer/performance-dashboards.git"
-    capture_metrics: False
-    metrics_profile_path: config/metrics-aggregated.yaml
    prometheus_url:                                       # The prometheus url/route is automatically obtained in case of OpenShift, please set it when the distribution is Kubernetes.
    prometheus_bearer_token:                              # The bearer token is automatically obtained in case of OpenShift, please set it when the distribution is Kubernetes. This is needed to authenticate with prometheus.
    uuid:                                                 # uuid for the run is generated by default if not set
--- a/config/config_kind.yaml
+++ b/config/config_kind.yaml
@@ -20,8 +20,6 @@ cerberus:
 performance_monitoring:
    deploy_dashboards: False                              # Install a mutable grafana and load the performance dashboards. Enable this only when running on OpenShift
    repo: "https://github.com/cloud-bulldozer/performance-dashboards.git"
-    capture_metrics: False
-    metrics_profile_path: config/metrics-aggregated.yaml
    prometheus_url:                                       # The prometheus url/route is automatically obtained in case of OpenShift, please set it when the distribution is Kubernetes.
    prometheus_bearer_token:                              # The bearer token is automatically obtained in case of OpenShift, please set it when the distribution is Kubernetes. This is needed to authenticate with prometheus.
    uuid:                                                 # uuid for the run is generated by default if not set
--- a/config/config_kubernetes.yaml
+++ b/config/config_kubernetes.yaml
@@ -19,8 +19,6 @@ cerberus:
 performance_monitoring:
    deploy_dashboards: False                              # Install a mutable grafana and load the performance dashboards. Enable this only when running on OpenShift
    repo: "https://github.com/cloud-bulldozer/performance-dashboards.git"
-    capture_metrics: False
-    metrics_profile_path: config/metrics-aggregated.yaml
    prometheus_url:                                       # The prometheus url/route is automatically obtained in case of OpenShift, please set it when the distribution is Kubernetes.
    prometheus_bearer_token:                              # The bearer token is automatically obtained in case of OpenShift, please set it when the distribution is Kubernetes. This is needed to authenticate with prometheus.
    uuid:                                                 # uuid for the run is generated by default if not set
--- a/containers/Dockerfile
+++ b/containers/Dockerfile
@@ -4,8 +4,6 @@ FROM mcr.microsoft.com/azure-cli:latest as azure-cli

 FROM registry.access.redhat.com/ubi8/ubi:latest

-LABEL org.opencontainers.image.authors="Red Hat OpenShift Chaos Engineering"
-
 ENV KUBECONFIG /root/.kube/config

 # Copy azure client binary from azure-cli image
@@ -14,7 +12,7 @@ COPY --from=azure-cli /usr/local/bin/az /usr/bin/az
 # Install dependencies
 RUN yum install -y git python39 python3-pip jq gettext wget && \
    python3.9 -m pip install -U pip && \
-    git clone https://github.com/redhat-chaos/krkn.git --branch v1.5.3 /root/kraken && \
+    git clone https://github.com/krkn-chaos/krkn.git --branch v1.5.4 /root/kraken && \
    mkdir -p /root/.kube && cd /root/kraken && \
    pip3.9 install -r requirements.txt && \
    pip3.9 install virtualenv && \
--- a/containers/Dockerfile-ppc64le
+++ b/containers/Dockerfile-ppc64le
@@ -14,7 +14,7 @@ COPY --from=azure-cli /usr/local/bin/az /usr/bin/az
 # Install dependencies
 RUN yum install -y git python39 python3-pip jq gettext wget && \
    python3.9 -m pip install -U pip && \
-    git clone https://github.com/redhat-chaos/krkn.git --branch v1.5.3 /root/kraken && \
+    git clone https://github.com/redhat-chaos/krkn.git --branch v1.5.4 /root/kraken && \
    mkdir -p /root/.kube && cd /root/kraken && \
    pip3.9 install -r requirements.txt && \
    pip3.9 install virtualenv && \
--- a/docs/metrics.md
+++ b/docs/metrics.md
@@ -1,31 +0,0 @@
-## Scraping and storing metrics for the run
-
-There are cases where the state of the cluster and metrics on the cluster during the chaos test run need to be stored long term to review after the cluster is terminated, for example CI and automation test runs. To help with this, Kraken supports capturing metrics for the duration of the scenarios defined in the config.
-
-The metrics to capture need to be defined in a metrics profile which Kraken consumes to query prometheus with the start and end timestamp of the run. Each run has a unique identifier ( uuid ). The uuid is generated automatically if not set in the config. This feature can be enabled in the [config](https://github.com/redhat-chaos/krkn/blob/main/config/config.yaml) by setting the following:
-
-```
-performance_monitoring:
-    capture_metrics: True
-    metrics_profile_path: config/metrics-aggregated.yaml
-    prometheus_url:                                       # The prometheus url/route is automatically obtained in case of OpenShift, please set it when the distribution is Kubernetes.
-    prometheus_bearer_token:                              # The bearer token is automatically obtained in case of OpenShift, please set it when the distribution is Kubernetes. This is needed to authenticate with prometheus.
-    uuid:                                                 # uuid for the run is generated by default if not set.
-```
-
-### Metrics profile
-A couple of [metric profiles](https://github.com/redhat-chaos/krkn/tree/main/config), [metrics.yaml](https://github.com/redhat-chaos/krkn/blob/main/config/metrics.yaml), and [metrics-aggregated.yaml](https://github.com/redhat-chaos/krkn/blob/main/config/metrics-aggregated.yaml) are shipped by default and can be tweaked to add more metrics to capture during the run. The following are the API server metrics for example:
-
-```
-metrics:
-# API server
-  - query: histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{apiserver="kube-apiserver", verb!~"WATCH", subresource!="log"}[2m])) by (verb,resource,subresource,instance,le)) > 0
-    metricName: API99thLatency
-
-  - query: sum(irate(apiserver_request_total{apiserver="kube-apiserver",verb!="WATCH",subresource!="log"}[2m])) by (verb,instance,resource,code) > 0
-    metricName: APIRequestRate
-
-  - query: sum(apiserver_current_inflight_requests{}) by (request_kind) > 0
-    metricName: APIInflightRequests
-```
-
--- a/kraken/time_actions/common_time_functions.py
+++ b/kraken/time_actions/common_time_functions.py
@@ -2,14 +2,18 @@ import datetime
 import time
 import logging
 import re
+
 import yaml
 import random
+
+from krkn_lib import utils
+from kubernetes.client import ApiException
+
 from ..cerberus import setup as cerberus
-from ..invoke import command as runcommand
 from krkn_lib.k8s import KrknKubernetes
 from krkn_lib.telemetry.k8s import KrknTelemetryKubernetes
 from krkn_lib.models.telemetry import ScenarioTelemetry
-from krkn_lib.utils.functions import get_yaml_item_value, log_exception
+from krkn_lib.utils.functions import get_yaml_item_value, log_exception, get_random_string


 # krkn_lib
@@ -35,13 +39,6 @@ def pod_exec(pod_name, command, namespace, container_name, kubecli:KrknKubernete
    return response


-def node_debug(node_name, command):
-    response = runcommand.invoke(
-        "oc debug node/" + node_name + " -- chroot /host " + command
-    )
-    return response
-
-
 # krkn_lib
 def get_container_name(pod_name, namespace, kubecli:KrknKubernetes, container_name=""):

@@ -65,15 +62,46 @@ def get_container_name(pod_name, namespace, kubecli:KrknKubernetes, container_na
        return container_name


+
+def skew_node(node_name: str, action: str, kubecli: KrknKubernetes):
+    pod_namespace = "default"
+    status_pod_name = f"time-skew-pod-{get_random_string(5)}"
+    skew_pod_name = f"time-skew-pod-{get_random_string(5)}"
+    ntp_enabled = True
+    logging.info(f'Creating pod to skew {"time" if action == "skew_time" else "date"} on node {node_name}')
+    status_command = ["timedatectl"]
+    param = "01:01:01" if action == "skew_time" else "2001-01-01"
+    skew_command = ["timedatectl", "set-time"]
+    if action == "skew_time":
+        skew_command.append("01:01:01")
+    else:
+        skew_command.append("2001-01-01")
+
+    try:
+        status_response = kubecli.exec_command_on_node(node_name, status_command, status_pod_name, pod_namespace)
+        if "Network time on: no" in status_response:
+            ntp_enabled = False
+
+            logging.warning(f'NTP inactive on node {node_name}, skewing {"time" if action == "skew_time" else "date"} to {param}')
+            kubecli.exec_command_on_node(node_name, skew_command, skew_pod_name, pod_namespace)
+        else:
+            logging.info(f'ntp active in cluster node, {"time" if action == "skew_time" else "date"} skewing will have no effect, skipping')
+    except ApiException:
+        pass
+    except Exception as e:
+        logging.error(f"failed to execute skew command in pod: {e}")
+    finally:
+        kubecli.delete_pod(status_pod_name, pod_namespace)
+        if not ntp_enabled:
+            kubecli.delete_pod(skew_pod_name, pod_namespace)
+
+
+
 # krkn_lib
 def skew_time(scenario, kubecli:KrknKubernetes):
-    skew_command = "date --date "
-    if scenario["action"] == "skew_date":
-        skewed_date = "00-01-01"
-        skew_command += skewed_date
-    elif scenario["action"] == "skew_time":
-        skewed_time = "01:01:01"
-        skew_command += skewed_time
+    if scenario["action"] not in ["skew_date","skew_time"]:
+        raise RuntimeError(f'{scenario["action"]} is not a valid time skew action')
+
    if "node" in scenario["object_type"]:
        node_names = []
        if "object_name" in scenario.keys() and scenario["object_name"]:
@@ -83,13 +111,19 @@ def skew_time(scenario, kubecli:KrknKubernetes):
            scenario["label_selector"]
        ):
            node_names = kubecli.list_nodes(scenario["label_selector"])
-
        for node in node_names:
-            node_debug(node, skew_command)
+            skew_node(node, scenario["action"], kubecli)
            logging.info("Reset date/time on node " + str(node))
        return "node", node_names

    elif "pod" in scenario["object_type"]:
+        skew_command = "date --date "
+        if scenario["action"] == "skew_date":
+            skewed_date = "00-01-01"
+            skew_command += skewed_date
+        elif scenario["action"] == "skew_time":
+            skewed_time = "01:01:01"
+            skew_command += skewed_time
        container_name = get_yaml_item_value(scenario, "container_name", "")
        pod_names = []
        if "object_name" in scenario.keys() and scenario["object_name"]:
@@ -241,7 +275,8 @@ def check_date_time(object_type, names, kubecli:KrknKubernetes):
    if object_type == "node":
        for node_name in names:
            first_date_time = datetime.datetime.utcnow()
-            node_datetime_string = node_debug(node_name, skew_command)
+            check_pod_name = f"time-skew-pod-{get_random_string(5)}"
+            node_datetime_string = kubecli.exec_command_on_node(node_name, [skew_command], check_pod_name)
            node_datetime = string_to_date(node_datetime_string)
            counter = 0
            while not (
@@ -252,7 +287,8 @@ def check_date_time(object_type, names, kubecli:KrknKubernetes):
                    "Date/time on node %s still not reset, "
                    "waiting 10 seconds and retrying" % node_name
                )
-                node_datetime_string = node_debug(node_name, skew_command)
+
+                node_datetime_string = kubecli.exec_cmd_in_pod([skew_command], check_pod_name, "default")
                node_datetime = string_to_date(node_datetime_string)
                counter += 1
                if counter > max_retries:
@@ -266,6 +302,8 @@ def check_date_time(object_type, names, kubecli:KrknKubernetes):
                logging.info(
                    "Date in node " + str(node_name) + " reset properly"
                )
+            kubecli.delete_pod(check_pod_name)
+
    elif object_type == "pod":
        for pod_name in names:
            first_date_time = datetime.datetime.utcnow()
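
The native time skew path replaces `oc debug node/... -- chroot /host date --date ...` with `timedatectl` run through `kubecli.exec_command_on_node`. `timedatectl set-time` generally refuses to change the clock while automatic time synchronization is enabled, which is presumably why `skew_node` first inspects the `timedatectl` status and skips the skew when NTP is on. The sketch below isolates that decision and the retry loop of `check_date_time` as pure functions so they can be exercised without a cluster. All function names are hypothetical, and because the exact comparison inside `check_date_time` is elided from this diff, the "node clock is back inside `[started_at, now()]`" condition is an assumption.

```python
import datetime
import time
from typing import Callable, List


def build_skew_command(action: str) -> List[str]:
    """timedatectl argv matching what skew_node sends for each action."""
    if action not in ("skew_date", "skew_time"):
        raise RuntimeError(f"{action} is not a valid time skew action")
    # skew_time only changes the clock; skew_date jumps to a fixed date.
    value = "01:01:01" if action == "skew_time" else "2001-01-01"
    return ["timedatectl", "set-time", value]


def ntp_disabled(timedatectl_output: str) -> bool:
    """Same string check skew_node applies to the `timedatectl` status."""
    return "Network time on: no" in timedatectl_output


def wait_for_reset(
    read_node_time: Callable[[], datetime.datetime],
    started_at: datetime.datetime,
    now: Callable[[], datetime.datetime],
    max_retries: int = 3,
    interval_s: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the node clock is back inside [started_at, now()].

    Returns False once max_retries extra reads have been spent.
    """
    node_time = read_node_time()
    retries = 0
    while not (started_at <= node_time <= now()):
        if retries >= max_retries:
            return False
        sleep(interval_s)  # check_date_time waits 10 seconds between reads
        node_time = read_node_time()
        retries += 1
    return True
```

Injecting `read_node_time`, `now` and `sleep` keeps the polling logic testable; in Kraken the reader would correspond to the `exec_command_on_node` / `exec_cmd_in_pod` call followed by `string_to_date`.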
--- a/media/kraken-workflow.png
+++ b/media/kraken-workflow.png
--- a/requirements.txt
+++ b/requirements.txt
@@ -18,8 +18,8 @@ google-api-python-client
 ibm_cloud_sdk_core
 ibm_vpc
 itsdangerous==2.0.1
-jinja2==3.0.3
-krkn-lib >= 1.4.5
+jinja2==3.1.3
+krkn-lib >= 1.4.6
 kubernetes
 lxml >= 4.3.0
 oauth2client>=4.1.3
--- a/run_kraken.py
+++ b/run_kraken.py
@@ -170,7 +170,10 @@ def main(cfg):
        # KrknTelemetry init
        telemetry_k8s = KrknTelemetryKubernetes(safe_logger, kubecli)
        telemetry_ocp = KrknTelemetryOpenshift(safe_logger, ocpcli)
-        prometheus = KrknPrometheus(prometheus_url, prometheus_bearer_token)
+
+
+        if enable_alerts:
+            prometheus = KrknPrometheus(prometheus_url, prometheus_bearer_token)

        logging.info("Server URL: %s" % kubecli.get_host())

@@ -200,6 +203,7 @@ def main(cfg):

        # Capture the start time
        start_time = int(time.time())
+
        chaos_telemetry = ChaosRunTelemetry()
        chaos_telemetry.run_uuid = run_uuid
        # Loop to run the chaos starts here
@@ -280,16 +284,9 @@ def main(cfg):
                        # in the config
                        # krkn_lib
                        elif scenario_type == "time_scenarios":
-                            if distribution == "openshift":
                                logging.info("Running time skew scenarios")
                                failed_post_scenarios, scenario_telemetries = time_actions.run(scenarios_list, config, wait_duration, kubecli, telemetry_k8s)
                                chaos_telemetry.scenarios.extend(scenario_telemetries)
-                            else:
-                                logging.error(
-                                    "Litmus scenarios are currently "
-                                    "supported only on openshift"
-                                )
-                                sys.exit(1)
                        # Inject cluster shutdown scenarios
                        # krkn_lib
                        elif scenario_type == "cluster_shut_down_scenarios":
@@ -337,12 +334,18 @@ def main(cfg):
                            failed_post_scenarios, scenario_telemetries = network_chaos.run(scenarios_list, config, wait_duration, kubecli, telemetry_k8s)

                        # Check for critical alerts when enabled
-                        if check_critical_alerts:
+                        if enable_alerts and check_critical_alerts:
                            logging.info("Checking for critical alerts firing post choas")

                            ##PROM
                            query = r"""ALERTS{severity="critical"}"""
-                            critical_alerts = prometheus.process_prom_query_in_range(query, datetime.datetime.fromtimestamp(start_time))
+                            end_time = datetime.datetime.now()
+                            critical_alerts = prometheus.process_prom_query_in_range(
+                                query,
+                                start_time=datetime.datetime.fromtimestamp(start_time),
+                                end_time=end_time,
+                            )
                            critical_alerts_count = len(critical_alerts)
                            if critical_alerts_count > 0:
                                logging.error("Critical alerts are firing: %s", critical_alerts)
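
The follow-up fix for the critical alerts check now passes both `start_time` and `end_time` to `process_prom_query_in_range`, where previously only the start was passed; the commit message attributes the earlier exception to `start_time > end_time`. Below is a minimal sketch of building that window explicitly, assuming the same naive local-time convention the new code uses (`datetime.datetime.fromtimestamp` for the start, `datetime.datetime.now` for the end). `alert_query_window` is a hypothetical helper, and raising `ValueError` on an inverted window is a choice made for illustration, not Kraken's behavior.

```python
import datetime
from typing import Optional, Tuple


def alert_query_window(
    start_epoch: int, now: Optional[datetime.datetime] = None
) -> Tuple[datetime.datetime, datetime.datetime]:
    """Explicit (start, end) range for the ALERTS{severity="critical"} query."""
    # Both ends are naive local times, mirroring fromtimestamp() and now()
    # in the updated run_kraken.py.
    start = datetime.datetime.fromtimestamp(start_epoch)
    end = now if now is not None else datetime.datetime.now()
    if start > end:
        # Fail loudly here instead of sending an inverted range to Prometheus.
        raise ValueError(f"start_time {start} is after end_time {end}")
    return start, end
```

The returned pair would then feed the call shown in the diff, e.g. `prometheus.process_prom_query_in_range(query, start_time=start, end_time=end)`, still guarded by `enable_alerts and check_critical_alerts` so that `prometheus` is only used when it was created.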
Author	SHA1	Message	Date
Tullio Sebastiani	4f7c58106d	Dockerfile v1.5.4 (#552 ) Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-01-15 19:22:52 +01:00
Tullio Sebastiani	a7e5ae6c80	Replaced `oc debug` command execution on node with a native version (#547 ) * native time skew feature Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * fixed podname conflict issue Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * updated krkn-lib to v1.4.6 Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> * fixed pod conflict issue Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com> --------- Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-01-15 12:15:38 -05:00
Tullio Sebastiani	aa030a21d3	Fixes the critical alerts exception with the start_time > end_time Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-01-15 11:11:45 -05:00
Paige Rubendall	631f12bdff	Adding push to both red hat and krkn chaos quay (#550 ) * adding push to both red hat and krkn chaos quay * tag redhat chaos from krkn-chaos image * login to both quays	2024-01-12 13:58:50 -05:00
Naga Ravi Chaitanya Elluri	2525982c55	Rename repo name and update workflow This commit also removes OpenShift references and updates source in the dockerfile. Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>	2024-01-12 13:21:37 -05:00
dependabot[bot]	9760d7d97d	Bump jinja2 from 3.0.3 to 3.1.3 Bumps [jinja2](https://github.com/pallets/jinja) from 3.0.3 to 3.1.3. - [Release notes](https://github.com/pallets/jinja/releases) - [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst) - [Commits](https://github.com/pallets/jinja/compare/3.0.3...3.1.3) --- updated-dependencies: - dependency-name: jinja2 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2024-01-11 15:40:09 -05:00
Naga Ravi Chaitanya Elluri	720488c159	Add new blogs to the useful resources list (#546 ) Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>	2024-01-10 15:45:36 -05:00
Naga Ravi Chaitanya Elluri	487a9f464c	Deprecate long term metrics collection This will be added back soon via native prometheus integration. Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>	2024-01-10 15:08:58 -05:00
Tullio Sebastiani	d9e137e85a	fixes prometheus url check on Kubernetes Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>	2024-01-10 11:23:02 -05:00