Compare commits

...

17 Commits

Author SHA1 Message Date
Naga Ravi Chaitanya Elluri
4084ffd9c6 Bake in virtualenv in krkn images
This is needed to pin the Python version being used in case multiple
versions are installed.
2023-07-24 12:52:20 -04:00
Sahil Shah
19cc2c047f Fix for pvc scenario 2023-07-21 15:41:28 -04:00
Paige Rubendall
6197fc6722 separating build and test workflows (#448)
* separating build and test workflows

* only run build on pull request
2023-07-20 16:01:50 -04:00
Naga Ravi Chaitanya Elluri
2a8ac41ebf Bump release version to v1.3.5 2023-07-20 15:24:56 -04:00
Naga Ravi Chaitanya Elluri
b4d235d31c Bake in yq dependency in Kraken container images (#450)
This commit also updates the ppc64le image to have the latest bits.
2023-07-20 13:17:52 -04:00
Naga Ravi Chaitanya Elluri
e4e4620d10 Bump release version to 1.3.4 (#447) 2023-06-28 16:30:28 -04:00
Naga Ravi Chaitanya Elluri
a2c24ab7ed Install latest version of krkn-lib-kubernetes (#446) 2023-06-28 15:21:19 -04:00
Naga Ravi Chaitanya Elluri
fe892fd9bf Switch from centos to redhat ubi base image
This replaces the base image for Kraken container images with the
Red Hat UBI image to make them more secure and stable.
2023-06-22 12:10:51 -04:00
Naga Ravi Chaitanya Elluri
74613fdb4b Install oc and kubectl clients from stable releases
This makes sure the latest clients are installed and used:
- This avoids compatibility issues with the server
- Fixes security vulnerabilities and CVEs
2023-06-20 15:39:53 -04:00
Naga Ravi Chaitanya Elluri
28c37c9353 Bump release version to v1.3.3 2023-06-16 09:42:46 -04:00
Naga Ravi Chaitanya Elluri
de0567b067 Tweak the etcd alert severity 2023-06-16 09:19:17 -04:00
Naga Ravi Chaitanya Elluri
83486557f1 Bump release version to v1.3.2 (#439) 2023-06-15 12:12:42 -04:00
Naga Ravi Chaitanya Elluri
ce409ea6fb Update kube-burner dependency version to 1.7.0 2023-06-15 11:55:17 -04:00
Naga Ravi Chaitanya Elluri
0eb8d38596 Expand SLOs profile to cover monitoring for more alerts
This commit:
- Also sets appropriate severity to avoid false failures for the
  test cases, especially given that these are monitored during the chaos
  vs post chaos. Critical alerts are all monitored post chaos, with a few
  monitored during the chaos that represent the overall health and performance
  of the service.
- Renames Alerts to SLOs validation

Metrics reference: f09a492b13/cmd/kube-burner/ocp-config/alerts.yml
2023-06-14 16:58:36 -04:00
Tullio Sebastiani
68dc17bc44 krkn-lib-kubernetes refactoring proposal (#400)
* run_kraken.py updated + renamed kubernetes library folder


unstaged files


kubecli marker

* container scenarios updated

* node scenarios updated


typo


injected kubecli

* managed cluster scenarios updated

* time scenarios updated

* litmus scenarios updated

* cluster scenarios updated

* namespace scenarios updated

* pvc scenarios updated

* network chaos scenarios updated

* common_managed_cluster functions updated

* switched draft library to official one

* regression on rebase
2023-06-13 10:02:35 -04:00
Naga Ravi Chaitanya Elluri
572eeefaf4 Minor fixes
This commit fixes a few typos and duplicate logs
2023-06-12 21:05:27 -04:00
Naga Ravi Chaitanya Elluri
81376bad56 Bump release version to v1.3.1
This updates the Krkn container images to use the latest v1.3.1
minor release: https://github.com/redhat-chaos/krkn/releases.
2023-06-07 14:41:09 -04:00
36 changed files with 415 additions and 252 deletions

View File

@@ -1,8 +1,5 @@
name: Build Krkn
on:
push:
branches:
- main
pull_request:
jobs:
@@ -51,20 +48,4 @@ jobs:
if-no-files-found: error
- name: Check CI results
run: grep Fail CI/results.markdown && false || true
- name: Build the Docker images
run: docker build --no-cache -t quay.io/redhat-chaos/krkn containers/
- name: Login in quay
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
run: docker login quay.io -u ${QUAY_USER} -p ${QUAY_TOKEN}
env:
QUAY_USER: ${{ secrets.QUAY_USER_1 }}
QUAY_TOKEN: ${{ secrets.QUAY_TOKEN_1 }}
- name: Push the Docker images
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
run: docker push quay.io/redhat-chaos/krkn
- name: Rebuild krkn-hub
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
uses: redhat-chaos/actions/krkn-hub@main
with:
QUAY_USER: ${{ secrets.QUAY_USER_1 }}
QUAY_TOKEN: ${{ secrets.QUAY_TOKEN_1 }}

.github/workflows/docker-image.yml (new file, 30 lines)
View File

@@ -0,0 +1,30 @@
name: Docker Image CI
on:
push:
branches:
- main
pull_request:
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Check out code
uses: actions/checkout@v3
- name: Build the Docker images
run: docker build --no-cache -t quay.io/redhat-chaos/krkn containers/
- name: Login in quay
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
run: docker login quay.io -u ${QUAY_USER} -p ${QUAY_TOKEN}
env:
QUAY_USER: ${{ secrets.QUAY_USER_1 }}
QUAY_TOKEN: ${{ secrets.QUAY_TOKEN_1 }}
- name: Push the Docker images
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
run: docker push quay.io/redhat-chaos/krkn
- name: Rebuild krkn-hub
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
uses: redhat-chaos/actions/krkn-hub@main
with:
QUAY_USER: ${{ secrets.QUAY_USER_1 }}
QUAY_TOKEN: ${{ secrets.QUAY_TOKEN_1 }}

View File

@@ -62,7 +62,7 @@ Scenario type | Kubernetes | OpenShift
[Container Scenarios](docs/container_scenarios.md) | :heavy_check_mark: | :heavy_check_mark: |
[Node Scenarios](docs/node_scenarios.md) | :heavy_check_mark: | :heavy_check_mark: |
[Time Scenarios](docs/time_scenarios.md) | :x: | :heavy_check_mark: |
[Hog Scenarios](docs/arcaflow_scenarios.md) | :heavy_check_mark: | :heavy_check_mark: |
[Hog Scenarios: CPU, Memory](docs/arcaflow_scenarios.md) | :heavy_check_mark: | :heavy_check_mark: |
[Cluster Shut Down Scenarios](docs/cluster_shut_down_scenarios.md) | :heavy_check_mark: | :heavy_check_mark: |
[Namespace Scenarios](docs/namespace_scenarios.md) | :heavy_check_mark: | :heavy_check_mark: |
[Zone Outage Scenarios](docs/zone_outage.md) | :heavy_check_mark: | :heavy_check_mark: |
@@ -94,8 +94,12 @@ Monitoring the Kubernetes/OpenShift cluster to observe the impact of Kraken chao
Kraken supports capturing metrics for the duration of the scenarios defined in the config and indexes them into Elasticsearch so that the state of the runs can be stored and evaluated long term. The indexed metrics can be visualized with the help of Grafana. It uses [Kube-burner](https://github.com/cloud-bulldozer/kube-burner) under the hood. The metrics to capture need to be defined in a metrics profile which Kraken consumes to query Prometheus (installed by default in OpenShift) with the start and end timestamp of the run. Information on enabling and leveraging this feature can be found [here](docs/metrics.md).
### Alerts
In addition to checking the recovery and health of the cluster and components under test, Kraken takes in a profile with the Prometheus expressions to validate and alerts, exits with a non-zero return code depending on the severity set. This feature can be used to determine pass/fail or alert on abnormalities observed in the cluster based on the metrics. Information on enabling and leveraging this feature can be found [here](docs/alerts.md).
### SLOs validation during and post chaos
- In addition to checking the recovery and health of the cluster and components under test, Kraken takes in a profile with Prometheus expressions to validate, alerts on them, and exits with a non-zero return code depending on the severity set. This feature can be used to determine pass/fail or alert on abnormalities observed in the cluster based on the metrics.
- Kraken also provides the ability to check whether any critical alerts are firing in the cluster post chaos and passes/fails the run accordingly.
Information on enabling and leveraging this feature can be found [here](docs/SLOs_validation.md).
### OCM / ACM integration

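To make the post-chaos critical-alert check described above concrete, here is a minimal sketch of how such a check could be implemented against the Prometheus HTTP API. The `critical_alerts_firing` helper, the bearer-token handling, and the exit code are illustrative assumptions, not Kraken's actual implementation.

```
# Minimal sketch of a post-chaos critical-alert check (illustrative only).
# The Prometheus URL, token handling, and exit behaviour are assumptions;
# Kraken's real implementation may differ.
import logging
import sys

import requests


def critical_alerts_firing(prometheus_url: str, bearer_token: str) -> list:
    """Return the list of critical alerts currently firing."""
    resp = requests.get(
        f"{prometheus_url}/api/v1/query",
        params={"query": 'ALERTS{severity="critical", alertstate="firing"}'},
        headers={"Authorization": f"Bearer {bearer_token}"},
        verify=False,  # illustrative; cluster certs are often self-signed
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["result"]


def check_critical_alerts(prometheus_url: str, bearer_token: str) -> None:
    alerts = critical_alerts_firing(prometheus_url, bearer_token)
    for alert in alerts:
        logging.error("Critical alert firing: %s", alert["metric"].get("alertname"))
    if alerts:
        sys.exit(2)  # non-zero return code signals a failed run
```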
View File

@@ -1,11 +1,65 @@
- expr: avg_over_time(histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[2m]))[5m:]) > 0.01
description: 5 minutes avg. etcd fsync latency on {{$labels.pod}} higher than 10ms {{$value}}
# etcd
- expr: avg_over_time(histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[2m]))[10m:]) > 0.01
description: 10 minutes avg. 99th etcd fsync latency on {{$labels.pod}} higher than 10ms. {{$value}}s
severity: warning
- expr: avg_over_time(histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[2m]))[10m:]) > 1
description: 10 minutes avg. 99th etcd fsync latency on {{$labels.pod}} higher than 1s. {{$value}}s
severity: error
- expr: avg_over_time(histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[5m]))[5m:]) > 0.1
description: 5 minutes avg. etcd network peer round trip on {{$labels.pod}} higher than 100ms {{$value}}
severity: info
- expr: avg_over_time(histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket[2m]))[10m:]) > 0.03
description: 10 minutes avg. 99th etcd commit latency on {{$labels.pod}} higher than 30ms. {{$value}}s
severity: warning
- expr: increase(etcd_server_leader_changes_seen_total[2m]) > 0
- expr: rate(etcd_server_leader_changes_seen_total[2m]) > 0
description: etcd leader changes observed
severity: warning
# API server
- expr: avg_over_time(histogram_quantile(0.99, sum(irate(apiserver_request_duration_seconds_bucket{apiserver="kube-apiserver", verb=~"POST|PUT|DELETE|PATCH", subresource!~"log|exec|portforward|attach|proxy"}[2m])) by (le, resource, verb))[10m:]) > 1
description: 10 minutes avg. 99th mutating API call latency for {{$labels.verb}}/{{$labels.resource}} higher than 1 second. {{$value}}s
severity: error
- expr: avg_over_time(histogram_quantile(0.99, sum(irate(apiserver_request_duration_seconds_bucket{apiserver="kube-apiserver", verb=~"LIST|GET", subresource!~"log|exec|portforward|attach|proxy", scope="resource"}[2m])) by (le, resource, verb, scope))[5m:]) > 1
description: 5 minutes avg. 99th read-only API call latency for {{$labels.verb}}/{{$labels.resource}} in scope {{$labels.scope}} higher than 1 second. {{$value}}s
severity: error
- expr: avg_over_time(histogram_quantile(0.99, sum(irate(apiserver_request_duration_seconds_bucket{apiserver="kube-apiserver", verb=~"LIST|GET", subresource!~"log|exec|portforward|attach|proxy", scope="namespace"}[2m])) by (le, resource, verb, scope))[5m:]) > 5
description: 5 minutes avg. 99th read-only API call latency for {{$labels.verb}}/{{$labels.resource}} in scope {{$labels.scope}} higher than 5 seconds. {{$value}}s
severity: error
- expr: avg_over_time(histogram_quantile(0.99, sum(irate(apiserver_request_duration_seconds_bucket{apiserver="kube-apiserver", verb=~"LIST|GET", subresource!~"log|exec|portforward|attach|proxy", scope="cluster"}[2m])) by (le, resource, verb, scope))[5m:]) > 30
description: 5 minutes avg. 99th read-only API call latency for {{$labels.verb}}/{{$labels.resource}} in scope {{$labels.scope}} higher than 30 seconds. {{$value}}s
severity: error
# Control plane pods
- expr: up{apiserver=~"kube-apiserver|openshift-apiserver"} == 0
description: "{{$labels.apiserver}} {{$labels.instance}} down"
severity: warning
- expr: up{namespace=~"openshift-etcd"} == 0
description: "{{$labels.namespace}}/{{$labels.pod}} down"
severity: warning
- expr: up{namespace=~"openshift-.*(kube-controller-manager|scheduler|controller-manager|sdn|ovn-kubernetes|dns)"} == 0
description: "{{$labels.namespace}}/{{$labels.pod}} down"
severity: warning
- expr: up{job=~"crio|kubelet"} == 0
description: "{{$labels.node}}/{{$labels.job}} down"
severity: warning
- expr: up{job="ovnkube-node"} == 0
description: "{{$labels.instance}}/{{$labels.pod}} {{$labels.job}} down"
severity: warning
# Service sync latency
- expr: histogram_quantile(0.99, sum(rate(kubeproxy_network_programming_duration_seconds_bucket[2m])) by (le)) > 10
description: 99th Kubeproxy network programming latency higher than 10 seconds. {{$value}}s
severity: warning
# Prometheus alerts
- expr: ALERTS{severity="critical", alertstate="firing"} > 0
description: Critical prometheus alert. {{$labels.alertname}}
severity: warning

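Each entry in the profile above pairs a PromQL expression with a description and a severity, which is what allows warnings to be reported without failing the run while error-level breaches flip the return code (per the "Expand SLOs profile" commit message). A minimal sketch of that mapping, assuming a hypothetical `evaluate(expr)` callback in place of the real query execution done by kube-burner:

```
# Sketch of how severity levels in the SLO profile could map to a run verdict.
# `evaluate(expr)` is a hypothetical helper that returns the samples matching
# the expression; the real evaluation is done by kube-burner under the hood.
import logging

import yaml


def validate_slos(profile_path: str, evaluate) -> int:
    with open(profile_path) as f:
        profile = yaml.safe_load(f)  # list of {expr, description, severity}

    return_code = 0
    for entry in profile:
        matches = evaluate(entry["expr"])
        if not matches:
            continue
        message = entry.get("description", entry["expr"])
        severity = entry.get("severity", "info")
        if severity == "error":
            logging.error(message)
            return_code = 1           # error severity fails the run
        elif severity == "warning":
            logging.warning(message)  # warnings are reported but do not fail
        else:
            logging.info(message)
    return return_code
```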
View File

@@ -50,7 +50,7 @@ cerberus:
performance_monitoring:
deploy_dashboards: False # Install a mutable grafana and load the performance dashboards. Enable this only when running on OpenShift
repo: "https://github.com/cloud-bulldozer/performance-dashboards.git"
kube_burner_binary_url: "https://github.com/cloud-bulldozer/kube-burner/releases/download/v0.9.1/kube-burner-0.9.1-Linux-x86_64.tar.gz"
kube_burner_binary_url: "https://github.com/cloud-bulldozer/kube-burner/releases/download/v1.7.0/kube-burner-1.7.0-Linux-x86_64.tar.gz"
capture_metrics: False
config_path: config/kube_burner.yaml # Define the Elasticsearch url and index name in this config
metrics_profile_path: config/metrics-aggregated.yaml

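The `kube_burner_binary_url` above points at a release tarball that is downloaded and unpacked before metrics are captured. A minimal sketch of that download-and-extract step, assuming the archive contains a `kube-burner` binary at its root (destination path and permission handling are illustrative):

```
# Sketch of fetching the kube-burner release tarball referenced in the config.
# The archive layout, destination directory, and chmod bits are assumptions.
import os
import stat
import tarfile
import urllib.request

KUBE_BURNER_URL = (
    "https://github.com/cloud-bulldozer/kube-burner/releases/download/"
    "v1.7.0/kube-burner-1.7.0-Linux-x86_64.tar.gz"
)


def fetch_kube_burner(url: str = KUBE_BURNER_URL, dest_dir: str = ".") -> str:
    archive, _ = urllib.request.urlretrieve(url)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extract("kube-burner", path=dest_dir)  # assumes member name "kube-burner"
    binary = os.path.join(dest_dir, "kube-burner")
    os.chmod(binary, os.stat(binary).st_mode | stat.S_IXUSR)  # mark executable
    return binary
```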
View File

@@ -1,29 +1,28 @@
# Dockerfile for kraken
FROM quay.io/openshift/origin-tests:latest as origintests
FROM mcr.microsoft.com/azure-cli:latest as azure-cli
FROM quay.io/centos/centos:stream9
FROM registry.access.redhat.com/ubi8/ubi:latest
LABEL org.opencontainers.image.authors="Red Hat OpenShift Chaos Engineering"
ENV KUBECONFIG /root/.kube/config
# Copy OpenShift CLI, Kubernetes CLI from origin-tests image
COPY --from=origintests /usr/bin/oc /usr/bin/oc
COPY --from=origintests /usr/bin/kubectl /usr/bin/kubectl
# Copy azure client binary from azure-cli image
COPY --from=azure-cli /usr/local/bin/az /usr/bin/az
# Install dependencies
RUN yum install epel-release -y && \
yum install -y git python39 python3-pip jq gettext && \
RUN yum install -y git python39 python3-pip jq gettext wget && \
python3.9 -m pip install -U pip && \
git clone https://github.com/redhat-chaos/krkn.git --branch v1.3.0 /root/kraken && \
git clone https://github.com/redhat-chaos/krkn.git --branch v1.3.5 /root/kraken && \
mkdir -p /root/.kube && cd /root/kraken && \
pip3.9 install -r requirements.txt
pip3.9 install -r requirements.txt && \
pip3.9 install virtualenv && \
wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O /usr/bin/yq && chmod +x /usr/bin/yq
# Get Kubernetes and OpenShift clients from stable releases
WORKDIR /tmp
RUN wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz && tar -xvf openshift-client-linux.tar.gz && cp oc /usr/local/bin/oc && cp kubectl /usr/local/bin/kubectl
WORKDIR /root/kraken

View File

@@ -2,24 +2,28 @@
FROM ppc64le/centos:8
MAINTAINER Red Hat OpenShift Performance and Scale
FROM mcr.microsoft.com/azure-cli:latest as azure-cli
LABEL org.opencontainers.image.authors="Red Hat OpenShift Chaos Engineering"
ENV KUBECONFIG /root/.kube/config
RUN curl -L -o kubernetes-client-linux-ppc64le.tar.gz https://dl.k8s.io/v1.19.0/kubernetes-client-linux-ppc64le.tar.gz \
&& tar xf kubernetes-client-linux-ppc64le.tar.gz && mv kubernetes/client/bin/kubectl /usr/bin/ && rm -rf kubernetes-client-linux-ppc64le.tar.gz
RUN curl -L -o openshift-client-linux.tar.gz https://mirror.openshift.com/pub/openshift-v4/ppc64le/clients/ocp/stable/openshift-client-linux.tar.gz \
&& tar xf openshift-client-linux.tar.gz -C /usr/bin && rm -rf openshift-client-linux.tar.gz
# Copy azure client binary from azure-cli image
COPY --from=azure-cli /usr/local/bin/az /usr/bin/az
# Install dependencies
RUN yum install epel-release -y && \
yum install -y git python36 python3-pip gcc libffi-devel python36-devel openssl-devel gcc-c++ make jq gettext && \
git clone https://github.com/redhat-chaos/krkn.git --branch main /root/kraken && \
mkdir -p /root/.kube && cd /root/kraken && \
pip3 install cryptography==3.3.2 && \
pip3 install -r requirements.txt setuptools==40.3.0 urllib3==1.25.4
RUN yum install -y git python39 python3-pip jq gettext wget && \
python3.9 -m pip install -U pip && \
git clone https://github.com/redhat-chaos/krkn.git --branch v1.3.5 /root/kraken && \
mkdir -p /root/.kube && cd /root/kraken && \
pip3.9 install -r requirements.txt && \
pip3.9 install virtualenv && \
wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O /usr/bin/yq && chmod +x /usr/bin/yq
# Get Kubernetes and OpenShift clients from stable releases
WORKDIR /tmp
RUN wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz && tar -xvf openshift-client-linux.tar.gz && cp oc /usr/local/bin/oc && cp kubectl /usr/local/bin/kubectl
WORKDIR /root/kraken
ENTRYPOINT python3 run_kraken.py --config=config/config.yaml
ENTRYPOINT python3.9 run_kraken.py --config=config/config.yaml

View File

@@ -1,16 +1,16 @@
## Alerts
## SLOs validation
Pass/fail based on metrics captured from the cluster is important in addition to checking the health status and recovery. Kraken supports:
### Checking for critical alerts
If enabled, the check runs at the end of each scenario and Kraken exits in case critical alerts are firing to allow user to debug. You can enable it in the config:
### Checking for critical alerts post chaos
If enabled, the check runs at the end of each scenario (post chaos) and Kraken exits in case critical alerts are firing to allow the user to debug. You can enable it in the config:
```
performance_monitoring:
check_critical_alerts: False # When enabled will check prometheus for critical alerts firing post chaos
```
### Alerting based on the queries defined by the user
### Validation and alerting based on the queries defined by the user during chaos
Takes PromQL queries as input and modifies the return code of the run to determine pass/fail. It's especially useful in case of automated runs in CI where the user won't be able to monitor the system. It uses [Kube-burner](https://kube-burner.readthedocs.io/en/latest/) under the hood. This feature can be enabled in the [config](https://github.com/redhat-chaos/krkn/blob/main/config/config.yaml) by setting the following:
```

View File

@@ -40,7 +40,7 @@ def scrape_metrics(
distribution, prometheus_url, prometheus_bearer_token
)
else:
logging.error("Looks like proemtheus url is not defined, exiting")
logging.error("Looks like prometheus url is not defined, exiting")
sys.exit(1)
command = (
"./kube-burner index --uuid "

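For context, the snippet above assembles a `kube-burner index` invocation with the run's UUID, Prometheus endpoint, and time window. The sketch below shows one plausible shape of that command; every flag other than `index --uuid` is an assumption and may not match what Kraken actually passes:

```
# Illustrative sketch of invoking kube-burner to index metrics for a run.
# Flag names besides "index --uuid" are assumptions and may differ from the
# options Kraken actually builds.
import subprocess


def index_metrics(uuid: str, prometheus_url: str, token: str,
                  metrics_profile: str, start: int, end: int) -> None:
    command = (
        "./kube-burner index"
        f" --uuid {uuid}"
        f" -u {prometheus_url}"
        f" -t {token}"
        f" -m {metrics_profile}"
        f" --start {start}"
        f" --end {end}"
    )
    subprocess.run(command, shell=True, check=True)
```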
View File

@@ -1,5 +1,5 @@
import kraken.invoke.command as runcommand
import kraken.kubernetes.client as kubecli
import krkn_lib_kubernetes
import logging
import time
import sys
@@ -8,8 +8,16 @@ import yaml
import kraken.cerberus.setup as cerberus
# krkn_lib_kubernetes
# Inject litmus scenarios defined in the config
def run(scenarios_list, config, litmus_uninstall, wait_duration, litmus_namespace):
def run(
scenarios_list,
config,
litmus_uninstall,
wait_duration,
litmus_namespace,
kubecli: krkn_lib_kubernetes.KrknLibKubernetes
):
# Loop to run the scenarios starts here
for l_scenario in scenarios_list:
start_time = int(time.time())
@@ -35,16 +43,16 @@ def run(scenarios_list, config, litmus_uninstall, wait_duration, litmus_namespac
sys.exit(1)
for expr in experiment_names:
expr_name = expr["name"]
experiment_result = check_experiment(engine_name, expr_name, litmus_namespace)
experiment_result = check_experiment(engine_name, expr_name, litmus_namespace, kubecli)
if experiment_result:
logging.info("Scenario: %s has been successfully injected!" % item)
else:
logging.info("Scenario: %s was not successfully injected, please check" % item)
if litmus_uninstall:
delete_chaos(litmus_namespace)
delete_chaos(litmus_namespace, kubecli)
sys.exit(1)
if litmus_uninstall:
delete_chaos(litmus_namespace)
delete_chaos(litmus_namespace, kubecli)
logging.info("Waiting for the specified duration: %s" % wait_duration)
time.sleep(wait_duration)
end_time = int(time.time())
@@ -86,7 +94,8 @@ def deploy_all_experiments(version_string, namespace):
)
def wait_for_initialized(engine_name, experiment_name, namespace):
# krkn_lib_kubernetes
def wait_for_initialized(engine_name, experiment_name, namespace, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
chaos_engine = kubecli.get_litmus_chaos_object(kind='chaosengine', name=engine_name,
namespace=namespace).engineStatus
@@ -110,10 +119,17 @@ def wait_for_initialized(engine_name, experiment_name, namespace):
return True
def wait_for_status(engine_name, expected_status, experiment_name, namespace):
# krkn_lib_kubernetes
def wait_for_status(
engine_name,
expected_status,
experiment_name,
namespace,
kubecli: krkn_lib_kubernetes.KrknLibKubernetes
):
if expected_status == "running":
response = wait_for_initialized(engine_name, experiment_name, namespace)
response = wait_for_initialized(engine_name, experiment_name, namespace, kubecli)
if not response:
logging.info("Chaos engine never initialized, exiting")
return False
@@ -140,12 +156,13 @@ def wait_for_status(engine_name, expected_status, experiment_name, namespace):
# Check status of experiment
def check_experiment(engine_name, experiment_name, namespace):
# krkn_lib_kubernetes
def check_experiment(engine_name, experiment_name, namespace, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
wait_response = wait_for_status(engine_name, "running", experiment_name, namespace)
wait_response = wait_for_status(engine_name, "running", experiment_name, namespace, kubecli)
if wait_response:
wait_for_status(engine_name, "completed", experiment_name, namespace)
wait_for_status(engine_name, "completed", experiment_name, namespace, kubecli)
else:
sys.exit(1)
@@ -166,7 +183,8 @@ def check_experiment(engine_name, experiment_name, namespace):
# Delete all chaos engines in a given namespace
def delete_chaos_experiments(namespace):
# krkn_lib_kubernetes
def delete_chaos_experiments(namespace, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
if kubecli.check_if_namespace_exists(namespace):
chaos_exp_exists = runcommand.invoke_no_exit("kubectl get chaosexperiment")
@@ -176,7 +194,8 @@ def delete_chaos_experiments(namespace):
# Delete all chaos engines in a given namespace
def delete_chaos(namespace):
# krkn_lib_kubernetes
def delete_chaos(namespace, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
if kubecli.check_if_namespace_exists(namespace):
logging.info("Deleting all litmus run objects")
@@ -190,7 +209,8 @@ def delete_chaos(namespace):
logging.info(namespace + " namespace doesn't exist")
def uninstall_litmus(version, litmus_namespace):
# krkn_lib_kubernetes
def uninstall_litmus(version, litmus_namespace, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
if kubecli.check_if_namespace_exists(litmus_namespace):
logging.info("Uninstalling Litmus operator")

View File

@@ -1,10 +1,15 @@
import random
import logging
import kraken.kubernetes.client as kubecli
import krkn_lib_kubernetes
# krkn_lib_kubernetes
# Pick a random managedcluster with specified label selector
def get_managedcluster(managedcluster_name, label_selector, instance_kill_count):
def get_managedcluster(
managedcluster_name,
label_selector,
instance_kill_count,
kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
if managedcluster_name in kubecli.list_killable_managedclusters():
return [managedcluster_name]
elif managedcluster_name:
@@ -25,10 +30,12 @@ def get_managedcluster(managedcluster_name, label_selector, instance_kill_count)
# Wait until the managedcluster status becomes Available
def wait_for_available_status(managedcluster, timeout):
# krkn_lib_kubernetes
def wait_for_available_status(managedcluster, timeout, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
kubecli.watch_managedcluster_status(managedcluster, "True", timeout)
# Wait until the managedcluster status becomes Not Available
def wait_for_unavailable_status(managedcluster, timeout):
# krkn_lib_kubernetes
def wait_for_unavailable_status(managedcluster, timeout, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
kubecli.watch_managedcluster_status(managedcluster, "Unknown", timeout)

View File

@@ -5,7 +5,7 @@ import logging
import sys
import yaml
import html
import kraken.kubernetes.client as kubecli
import krkn_lib_kubernetes
import kraken.managedcluster_scenarios.common_managedcluster_functions as common_managedcluster_functions
@@ -13,9 +13,11 @@ class GENERAL:
def __init__(self):
pass
# krkn_lib_kubernetes
class managedcluster_scenarios():
def __init__(self):
kubecli: krkn_lib_kubernetes.KrknLibKubernetes
def __init__(self, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
self.kubecli = kubecli
self.general = GENERAL()
# managedcluster scenario to start the managedcluster
@@ -31,16 +33,16 @@ class managedcluster_scenarios():
args="""kubectl scale deployment.apps/klusterlet --replicas 3 &
kubectl scale deployment.apps/klusterlet-registration-agent --replicas 1 -n open-cluster-management-agent""")
)
kubecli.create_manifestwork(body, managedcluster)
self.kubecli.create_manifestwork(body, managedcluster)
logging.info("managedcluster_start_scenario has been successfully injected!")
logging.info("Waiting for the specified timeout: %s" % timeout)
common_managedcluster_functions.wait_for_available_status(managedcluster, timeout)
common_managedcluster_functions.wait_for_available_status(managedcluster, timeout, self.kubecli)
except Exception as e:
logging.error("managedcluster scenario exiting due to Exception %s" % e)
sys.exit(1)
finally:
logging.info("Deleting manifestworks")
kubecli.delete_manifestwork(managedcluster)
self.kubecli.delete_manifestwork(managedcluster)
# managedcluster scenario to stop the managedcluster
def managedcluster_stop_scenario(self, instance_kill_count, managedcluster, timeout):
@@ -55,16 +57,16 @@ class managedcluster_scenarios():
args="""kubectl scale deployment.apps/klusterlet --replicas 0 &&
kubectl scale deployment.apps/klusterlet-registration-agent --replicas 0 -n open-cluster-management-agent""")
)
kubecli.create_manifestwork(body, managedcluster)
self.kubecli.create_manifestwork(body, managedcluster)
logging.info("managedcluster_stop_scenario has been successfully injected!")
logging.info("Waiting for the specified timeout: %s" % timeout)
common_managedcluster_functions.wait_for_unavailable_status(managedcluster, timeout)
common_managedcluster_functions.wait_for_unavailable_status(managedcluster, timeout, self.kubecli)
except Exception as e:
logging.error("managedcluster scenario exiting due to Exception %s" % e)
sys.exit(1)
finally:
logging.info("Deleting manifestworks")
kubecli.delete_manifestwork(managedcluster)
self.kubecli.delete_manifestwork(managedcluster)
# managedcluster scenario to stop and then start the managedcluster
def managedcluster_stop_start_scenario(self, instance_kill_count, managedcluster, timeout):
@@ -94,7 +96,7 @@ class managedcluster_scenarios():
template.render(managedcluster_name=managedcluster,
args="""kubectl scale deployment.apps/klusterlet --replicas 3""")
)
kubecli.create_manifestwork(body, managedcluster)
self.kubecli.create_manifestwork(body, managedcluster)
logging.info("start_klusterlet_scenario has been successfully injected!")
time.sleep(30) # until https://github.com/open-cluster-management-io/OCM/issues/118 gets solved
except Exception as e:
@@ -102,7 +104,7 @@ class managedcluster_scenarios():
sys.exit(1)
finally:
logging.info("Deleting manifestworks")
kubecli.delete_manifestwork(managedcluster)
self.kubecli.delete_manifestwork(managedcluster)
# managedcluster scenario to stop the klusterlet
def stop_klusterlet_scenario(self, instance_kill_count, managedcluster, timeout):
@@ -116,7 +118,7 @@ class managedcluster_scenarios():
template.render(managedcluster_name=managedcluster,
args="""kubectl scale deployment.apps/klusterlet --replicas 0""")
)
kubecli.create_manifestwork(body, managedcluster)
self.kubecli.create_manifestwork(body, managedcluster)
logging.info("stop_klusterlet_scenario has been successfully injected!")
time.sleep(30) # until https://github.com/open-cluster-management-io/OCM/issues/118 gets solved
except Exception as e:
@@ -124,7 +126,7 @@ class managedcluster_scenarios():
sys.exit(1)
finally:
logging.info("Deleting manifestworks")
kubecli.delete_manifestwork(managedcluster)
self.kubecli.delete_manifestwork(managedcluster)
# managedcluster scenario to stop and start the klusterlet
def stop_start_klusterlet_scenario(self, instance_kill_count, managedcluster, timeout):

View File

@@ -1,26 +1,29 @@
import yaml
import logging
import time
import krkn_lib_kubernetes
from kraken.managedcluster_scenarios.managedcluster_scenarios import managedcluster_scenarios
import kraken.managedcluster_scenarios.common_managedcluster_functions as common_managedcluster_functions
import kraken.cerberus.setup as cerberus
# Get the managedcluster scenarios object of specfied cloud type
def get_managedcluster_scenario_object(managedcluster_scenario):
return managedcluster_scenarios()
# krkn_lib_kubernetes
def get_managedcluster_scenario_object(managedcluster_scenario, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
return managedcluster_scenarios(kubecli)
# Run defined scenarios
def run(scenarios_list, config, wait_duration):
# krkn_lib_kubernetes
def run(scenarios_list, config, wait_duration, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
for managedcluster_scenario_config in scenarios_list:
with open(managedcluster_scenario_config, "r") as f:
managedcluster_scenario_config = yaml.full_load(f)
for managedcluster_scenario in managedcluster_scenario_config["managedcluster_scenarios"]:
managedcluster_scenario_object = get_managedcluster_scenario_object(managedcluster_scenario)
managedcluster_scenario_object = get_managedcluster_scenario_object(managedcluster_scenario, kubecli)
if managedcluster_scenario["actions"]:
for action in managedcluster_scenario["actions"]:
start_time = int(time.time())
inject_managedcluster_scenario(action, managedcluster_scenario, managedcluster_scenario_object)
inject_managedcluster_scenario(action, managedcluster_scenario, managedcluster_scenario_object, kubecli)
logging.info("Waiting for the specified duration: %s" % (wait_duration))
time.sleep(wait_duration)
end_time = int(time.time())
@@ -29,7 +32,8 @@ def run(scenarios_list, config, wait_duration):
# Inject the specified managedcluster scenario
def inject_managedcluster_scenario(action, managedcluster_scenario, managedcluster_scenario_object):
# krkn_lib_kubernetes
def inject_managedcluster_scenario(action, managedcluster_scenario, managedcluster_scenario_object, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
# Get the managedcluster scenario configurations
run_kill_count = managedcluster_scenario.get("runs", 1)
instance_kill_count = managedcluster_scenario.get("instance_count", 1)
@@ -42,7 +46,7 @@ def inject_managedcluster_scenario(action, managedcluster_scenario, managedclust
else:
managedcluster_name_list = [managedcluster_name]
for single_managedcluster_name in managedcluster_name_list:
managedclusters = common_managedcluster_functions.get_managedcluster(single_managedcluster_name, label_selector, instance_kill_count)
managedclusters = common_managedcluster_functions.get_managedcluster(single_managedcluster_name, label_selector, instance_kill_count, kubecli)
for single_managedcluster in managedclusters:
if action == "managedcluster_start_scenario":
managedcluster_scenario_object.managedcluster_start_scenario(run_kill_count, single_managedcluster, timeout)

View File

@@ -1,14 +1,23 @@
import time
import random
import logging
import kraken.kubernetes.client as kubecli
import krkn_lib_kubernetes
import kraken.cerberus.setup as cerberus
import kraken.post_actions.actions as post_actions
import yaml
import sys
def run(scenarios_list, config, wait_duration, failed_post_scenarios, kubeconfig_path):
# krkn_lib_kubernetes
def run(
scenarios_list,
config,
wait_duration,
failed_post_scenarios,
kubeconfig_path,
kubecli: krkn_lib_kubernetes.KrknLibKubernetes
):
for scenario_config in scenarios_list:
if len(scenario_config) > 1:
pre_action_output = post_actions.run(kubeconfig_path, scenario_config[1])
@@ -69,12 +78,12 @@ def run(scenarios_list, config, wait_duration, failed_post_scenarios, kubeconfig
logging.error("Failed to run post action checks: %s" % e)
sys.exit(1)
else:
failed_post_scenarios = check_active_namespace(killed_namespaces, wait_time)
failed_post_scenarios = check_active_namespace(killed_namespaces, wait_time, kubecli)
end_time = int(time.time())
cerberus.publish_kraken_status(config, failed_post_scenarios, start_time, end_time)
def check_active_namespace(killed_namespaces, wait_time):
# krkn_lib_kubernetes
def check_active_namespace(killed_namespaces, wait_time, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
active_namespace = []
timer = 0
while timer < wait_time and killed_namespaces:

View File

@@ -4,14 +4,15 @@ import time
import sys
import os
import random
import krkn_lib_kubernetes
from jinja2 import Environment, FileSystemLoader
import kraken.cerberus.setup as cerberus
import kraken.kubernetes.client as kubecli
import kraken.node_actions.common_node_functions as common_node_functions
# krkn_lib_kubernetes
# Reads the scenario config and introduces traffic variations in Node's host network interface.
def run(scenarios_list, config, wait_duration):
def run(scenarios_list, config, wait_duration, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
failed_post_scenarios = ""
logging.info("Runing the Network Chaos tests")
for net_config in scenarios_list:
@@ -32,11 +33,11 @@ def run(scenarios_list, config, wait_duration):
node_name_list = [test_node]
nodelst = []
for single_node_name in node_name_list:
nodelst.extend(common_node_functions.get_node(single_node_name, test_node_label, test_instance_count))
nodelst.extend(common_node_functions.get_node(single_node_name, test_node_label, test_instance_count, kubecli))
file_loader = FileSystemLoader(os.path.abspath(os.path.dirname(__file__)))
env = Environment(loader=file_loader, autoescape=True)
pod_template = env.get_template("pod.j2")
test_interface = verify_interface(test_interface, nodelst, pod_template)
test_interface = verify_interface(test_interface, nodelst, pod_template, kubecli)
joblst = []
egress_lst = [i for i in param_lst if i in test_egress]
chaos_config = {
@@ -68,7 +69,7 @@ def run(scenarios_list, config, wait_duration):
if test_execution == "serial":
logging.info("Waiting for serial job to finish")
start_time = int(time.time())
wait_for_job(joblst[:], test_duration + 300)
wait_for_job(joblst[:], kubecli, test_duration + 300)
logging.info("Waiting for wait_duration %s" % wait_duration)
time.sleep(wait_duration)
end_time = int(time.time())
@@ -78,7 +79,7 @@ def run(scenarios_list, config, wait_duration):
if test_execution == "parallel":
logging.info("Waiting for parallel job to finish")
start_time = int(time.time())
wait_for_job(joblst[:], test_duration + 300)
wait_for_job(joblst[:], kubecli, test_duration + 300)
logging.info("Waiting for wait_duration %s" % wait_duration)
time.sleep(wait_duration)
end_time = int(time.time())
@@ -88,10 +89,11 @@ def run(scenarios_list, config, wait_duration):
sys.exit(1)
finally:
logging.info("Deleting jobs")
delete_job(joblst[:])
delete_job(joblst[:], kubecli)
def verify_interface(test_interface, nodelst, template):
# krkn_lib_kubernetes
def verify_interface(test_interface, nodelst, template, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
pod_index = random.randint(0, len(nodelst) - 1)
pod_body = yaml.safe_load(template.render(nodename=nodelst[pod_index]))
logging.info("Creating pod to query interface on node %s" % nodelst[pod_index])
@@ -115,14 +117,16 @@ def verify_interface(test_interface, nodelst, template):
kubecli.delete_pod("fedtools", "default")
def get_job_pods(api_response):
# krkn_lib_kubernetes
def get_job_pods(api_response, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
controllerUid = api_response.metadata.labels["controller-uid"]
pod_label_selector = "controller-uid=" + controllerUid
pods_list = kubecli.list_pods(label_selector=pod_label_selector, namespace="default")
return pods_list[0]
def wait_for_job(joblst, timeout=300):
# krkn_lib_kubernetes
def wait_for_job(joblst, kubecli: krkn_lib_kubernetes.KrknLibKubernetes, timeout=300):
waittime = time.time() + timeout
count = 0
joblen = len(joblst)
@@ -134,25 +138,26 @@ def wait_for_job(joblst, timeout=300):
count += 1
joblst.remove(jobname)
except Exception:
logging.warn("Exception in getting job status")
logging.warning("Exception in getting job status")
if time.time() > waittime:
raise Exception("Starting pod failed")
time.sleep(5)
def delete_job(joblst):
# krkn_lib_kubernetes
def delete_job(joblst, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
for jobname in joblst:
try:
api_response = kubecli.get_job_status(jobname, namespace="default")
if api_response.status.failed is not None:
pod_name = get_job_pods(api_response)
pod_name = get_job_pods(api_response, kubecli)
pod_stat = kubecli.read_pod(name=pod_name, namespace="default")
logging.error(pod_stat.status.container_statuses)
pod_log_response = kubecli.get_pod_log(name=pod_name, namespace="default")
pod_log = pod_log_response.data.decode("utf-8")
logging.error(pod_log)
except Exception:
logging.warn("Exception in getting job status")
logging.warning("Exception in getting job status")
api_response = kubecli.delete_job(name=jobname, namespace="default")

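The `wait_for_job` and `delete_job` changes above keep the existing deadline-based polling while threading the injected `kubecli` through. As a standalone illustration of that polling pattern, here is a small sketch with a hypothetical `is_finished(name)` callback standing in for `kubecli.get_job_status`:

```
# Generic sketch of the deadline-based polling used by wait_for_job.
# `is_finished(name)` is a hypothetical callback standing in for
# kubecli.get_job_status(); the 5-second poll interval mirrors the scenario code.
import logging
import time


def wait_for_jobs(job_names, is_finished, timeout: int = 300) -> None:
    deadline = time.time() + timeout
    pending = list(job_names)
    while pending:
        for name in list(pending):
            try:
                if is_finished(name):
                    pending.remove(name)
            except Exception:
                logging.warning("Exception in getting job status for %s", name)
        if time.time() > deadline:
            raise Exception("Jobs %s did not finish before the timeout" % pending)
        time.sleep(5)
```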
View File

@@ -2,10 +2,13 @@ import sys
import logging
import kraken.invoke.command as runcommand
import kraken.node_actions.common_node_functions as nodeaction
import krkn_lib_kubernetes
# krkn_lib_kubernetes
class abstract_node_scenarios:
kubecli: krkn_lib_kubernetes.KrknLibKubernetes
def __init__(self, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
self.kubecli = kubecli
# Node scenario to start the node
def node_start_scenario(self, instance_kill_count, node, timeout):
pass
@@ -42,7 +45,7 @@ class abstract_node_scenarios:
logging.info("Starting stop_kubelet_scenario injection")
logging.info("Stopping the kubelet of the node %s" % (node))
runcommand.run("oc debug node/" + node + " -- chroot /host systemctl stop kubelet")
nodeaction.wait_for_unknown_status(node, timeout)
nodeaction.wait_for_unknown_status(node, timeout, self.kubecli)
logging.info("The kubelet of the node %s has been stopped" % (node))
logging.info("stop_kubelet_scenario has been successfuly injected!")
except Exception as e:

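The diff above is representative of the krkn-lib-kubernetes refactoring: scenario classes now receive a `KrknLibKubernetes` instance through their constructor instead of importing a module-level client. A minimal sketch of that wiring, assuming the client can be built from a kubeconfig path (the constructor signature is an assumption; only `list_nodes()` is taken from calls shown in the diffs):

```
# Sketch of the dependency-injection pattern introduced by the refactoring.
# The KrknLibKubernetes constructor arguments are an assumption; only
# list_nodes() is taken from the calls shown in the diffs.
import krkn_lib_kubernetes


class example_node_scenarios:
    def __init__(self, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
        self.kubecli = kubecli  # injected client, no module-level global

    def node_count(self) -> int:
        return len(self.kubecli.list_nodes())


def build_scenarios(kubeconfig_path: str) -> example_node_scenarios:
    kubecli = krkn_lib_kubernetes.KrknLibKubernetes(kubeconfig_path=kubeconfig_path)
    return example_node_scenarios(kubecli)
```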
View File

@@ -1,5 +1,6 @@
import sys
import time
import krkn_lib_kubernetes
from aliyunsdkcore.client import AcsClient
from aliyunsdkecs.request.v20140526 import DescribeInstancesRequest, DeleteInstanceRequest
from aliyunsdkecs.request.v20140526 import StopInstanceRequest, StartInstanceRequest, RebootInstanceRequest
@@ -179,9 +180,9 @@ class Alibaba:
logging.info("ECS %s is released" % instance_id)
return True
# krkn_lib_kubernetes
class alibaba_node_scenarios(abstract_node_scenarios):
def __init__(self):
def __init__(self,kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
self.alibaba = Alibaba()
# Node scenario to start the node
@@ -193,7 +194,7 @@ class alibaba_node_scenarios(abstract_node_scenarios):
logging.info("Starting the node %s with instance ID: %s " % (node, vm_id))
self.alibaba.start_instances(vm_id)
self.alibaba.wait_until_running(vm_id, timeout)
nodeaction.wait_for_ready_status(node, timeout)
nodeaction.wait_for_ready_status(node, timeout, self.kubecli)
logging.info("Node with instance ID: %s is in running state" % node)
logging.info("node_start_scenario has been successfully injected!")
except Exception as e:
@@ -213,7 +214,7 @@ class alibaba_node_scenarios(abstract_node_scenarios):
self.alibaba.stop_instances(vm_id)
self.alibaba.wait_until_stopped(vm_id, timeout)
logging.info("Node with instance ID: %s is in stopped state" % vm_id)
nodeaction.wait_for_unknown_status(node, timeout)
nodeaction.wait_for_unknown_status(node, timeout, self.kubecli)
except Exception as e:
logging.error("Failed to stop node instance. Encountered following exception: %s. " "Test Failed" % e)
logging.error("node_stop_scenario injection failed!")
@@ -248,8 +249,8 @@ class alibaba_node_scenarios(abstract_node_scenarios):
instance_id = self.alibaba.get_instance_id(node)
logging.info("Rebooting the node with instance ID: %s " % (instance_id))
self.alibaba.reboot_instances(instance_id)
nodeaction.wait_for_unknown_status(node, timeout)
nodeaction.wait_for_ready_status(node, timeout)
nodeaction.wait_for_unknown_status(node, timeout, self.kubecli)
nodeaction.wait_for_ready_status(node, timeout, self.kubecli)
logging.info("Node with instance ID: %s has been rebooted" % (instance_id))
logging.info("node_reboot_scenario has been successfully injected!")
except Exception as e:

View File

@@ -2,7 +2,7 @@ import sys
import time
import boto3
import logging
import kraken.kubernetes.client as kubecli
import krkn_lib_kubernetes
import kraken.node_actions.common_node_functions as nodeaction
from kraken.node_actions.abstract_node_scenarios import abstract_node_scenarios
@@ -150,9 +150,10 @@ class AWS:
)
sys.exit(1)
# krkn_lib_kubernetes
class aws_node_scenarios(abstract_node_scenarios):
def __init__(self):
def __init__(self, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
super().__init__(kubecli)
self.aws = AWS()
# Node scenario to start the node
@@ -164,7 +165,7 @@ class aws_node_scenarios(abstract_node_scenarios):
logging.info("Starting the node %s with instance ID: %s " % (node, instance_id))
self.aws.start_instances(instance_id)
self.aws.wait_until_running(instance_id)
nodeaction.wait_for_ready_status(node, timeout)
nodeaction.wait_for_ready_status(node, timeout, self.kubecli)
logging.info("Node with instance ID: %s is in running state" % (instance_id))
logging.info("node_start_scenario has been successfully injected!")
except Exception as e:
@@ -184,7 +185,7 @@ class aws_node_scenarios(abstract_node_scenarios):
self.aws.stop_instances(instance_id)
self.aws.wait_until_stopped(instance_id)
logging.info("Node with instance ID: %s is in stopped state" % (instance_id))
nodeaction.wait_for_unknown_status(node, timeout)
nodeaction.wait_for_unknown_status(node, timeout, self.kubecli)
except Exception as e:
logging.error("Failed to stop node instance. Encountered following exception: %s. " "Test Failed" % (e))
logging.error("node_stop_scenario injection failed!")
@@ -200,10 +201,10 @@ class aws_node_scenarios(abstract_node_scenarios):
self.aws.terminate_instances(instance_id)
self.aws.wait_until_terminated(instance_id)
for _ in range(timeout):
if node not in kubecli.list_nodes():
if node not in self.kubecli.list_nodes():
break
time.sleep(1)
if node in kubecli.list_nodes():
if node in self.kubecli.list_nodes():
raise Exception("Node could not be terminated")
logging.info("Node with instance ID: %s has been terminated" % (instance_id))
logging.info("node_termination_scenario has been successfuly injected!")
@@ -222,8 +223,8 @@ class aws_node_scenarios(abstract_node_scenarios):
instance_id = self.aws.get_instance_id(node)
logging.info("Rebooting the node %s with instance ID: %s " % (node, instance_id))
self.aws.reboot_instances(instance_id)
nodeaction.wait_for_unknown_status(node, timeout)
nodeaction.wait_for_ready_status(node, timeout)
nodeaction.wait_for_unknown_status(node, timeout, self.kubecli)
nodeaction.wait_for_ready_status(node, timeout, self.kubecli)
logging.info("Node with instance ID: %s has been rebooted" % (instance_id))
logging.info("node_reboot_scenario has been successfuly injected!")
except Exception as e:

View File

@@ -3,7 +3,7 @@ import time
from azure.mgmt.compute import ComputeManagementClient
from azure.identity import DefaultAzureCredential
import logging
import kraken.kubernetes.client as kubecli
import krkn_lib_kubernetes
import kraken.node_actions.common_node_functions as nodeaction
from kraken.node_actions.abstract_node_scenarios import abstract_node_scenarios
import kraken.invoke.command as runcommand
@@ -121,9 +121,10 @@ class Azure:
logging.info("Vm %s is terminated" % vm_name)
return True
# krkn_lib_kubernetes
class azure_node_scenarios(abstract_node_scenarios):
def __init__(self):
def __init__(self, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
super().__init__(kubecli)
logging.info("init in azure")
self.azure = Azure()
@@ -136,7 +137,7 @@ class azure_node_scenarios(abstract_node_scenarios):
logging.info("Starting the node %s with instance ID: %s " % (vm_name, resource_group))
self.azure.start_instances(resource_group, vm_name)
self.azure.wait_until_running(resource_group, vm_name, timeout)
nodeaction.wait_for_ready_status(vm_name, timeout)
nodeaction.wait_for_ready_status(vm_name, timeout,self.kubecli)
logging.info("Node with instance ID: %s is in running state" % node)
logging.info("node_start_scenario has been successfully injected!")
except Exception as e:
@@ -156,7 +157,7 @@ class azure_node_scenarios(abstract_node_scenarios):
self.azure.stop_instances(resource_group, vm_name)
self.azure.wait_until_stopped(resource_group, vm_name, timeout)
logging.info("Node with instance ID: %s is in stopped state" % vm_name)
nodeaction.wait_for_unknown_status(vm_name, timeout)
nodeaction.wait_for_unknown_status(vm_name, timeout, self.kubecli)
except Exception as e:
logging.error("Failed to stop node instance. Encountered following exception: %s. " "Test Failed" % e)
logging.error("node_stop_scenario injection failed!")
@@ -172,10 +173,10 @@ class azure_node_scenarios(abstract_node_scenarios):
self.azure.terminate_instances(resource_group, vm_name)
self.azure.wait_until_terminated(resource_group, vm_name, timeout)
for _ in range(timeout):
if vm_name not in kubecli.list_nodes():
if vm_name not in self.kubecli.list_nodes():
break
time.sleep(1)
if vm_name in kubecli.list_nodes():
if vm_name in self.kubecli.list_nodes():
raise Exception("Node could not be terminated")
logging.info("Node with instance ID: %s has been terminated" % node)
logging.info("node_termination_scenario has been successfully injected!")
@@ -194,8 +195,8 @@ class azure_node_scenarios(abstract_node_scenarios):
vm_name, resource_group = self.azure.get_instance_id(node)
logging.info("Rebooting the node %s with instance ID: %s " % (vm_name, resource_group))
self.azure.reboot_instances(resource_group, vm_name)
nodeaction.wait_for_unknown_status(vm_name, timeout)
nodeaction.wait_for_ready_status(vm_name, timeout)
nodeaction.wait_for_unknown_status(vm_name, timeout, self.kubecli)
nodeaction.wait_for_ready_status(vm_name, timeout, self.kubecli)
logging.info("Node with instance ID: %s has been rebooted" % (vm_name))
logging.info("node_reboot_scenario has been successfully injected!")
except Exception as e:

View File

@@ -1,5 +1,6 @@
import kraken.node_actions.common_node_functions as nodeaction
from kraken.node_actions.abstract_node_scenarios import abstract_node_scenarios
import krkn_lib_kubernetes
import logging
import openshift as oc
import pyipmi
@@ -104,9 +105,10 @@ class BM:
while self.get_ipmi_connection(bmc_addr, node_name).get_chassis_status().power_on:
time.sleep(1)
# krkn_lib_kubernetes
class bm_node_scenarios(abstract_node_scenarios):
def __init__(self, bm_info, user, passwd):
def __init__(self, bm_info, user, passwd, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
super().__init__(kubecli)
self.bm = BM(bm_info, user, passwd)
# Node scenario to start the node
@@ -118,7 +120,7 @@ class bm_node_scenarios(abstract_node_scenarios):
logging.info("Starting the node %s with bmc address: %s " % (node, bmc_addr))
self.bm.start_instances(bmc_addr, node)
self.bm.wait_until_running(bmc_addr, node)
nodeaction.wait_for_ready_status(node, timeout)
nodeaction.wait_for_ready_status(node, timeout, self.kubecli)
logging.info("Node with bmc address: %s is in running state" % (bmc_addr))
logging.info("node_start_scenario has been successfully injected!")
except Exception as e:
@@ -140,7 +142,7 @@ class bm_node_scenarios(abstract_node_scenarios):
self.bm.stop_instances(bmc_addr, node)
self.bm.wait_until_stopped(bmc_addr, node)
logging.info("Node with bmc address: %s is in stopped state" % (bmc_addr))
nodeaction.wait_for_unknown_status(node, timeout)
nodeaction.wait_for_unknown_status(node, timeout, self.kubecli)
except Exception as e:
logging.error(
"Failed to stop node instance. Encountered following exception: %s. "
@@ -163,8 +165,8 @@ class bm_node_scenarios(abstract_node_scenarios):
logging.info("BMC Addr: %s" % (bmc_addr))
logging.info("Rebooting the node %s with bmc address: %s " % (node, bmc_addr))
self.bm.reboot_instances(bmc_addr, node)
nodeaction.wait_for_unknown_status(node, timeout)
nodeaction.wait_for_ready_status(node, timeout)
nodeaction.wait_for_unknown_status(node, timeout, self.kubecli)
nodeaction.wait_for_ready_status(node, timeout, self.kubecli)
logging.info("Node with bmc address: %s has been rebooted" % (bmc_addr))
logging.info("node_reboot_scenario has been successfuly injected!")
except Exception as e:

View File

@@ -2,14 +2,14 @@ import time
import random
import logging
import paramiko
import kraken.kubernetes.client as kubecli
import krkn_lib_kubernetes
import kraken.invoke.command as runcommand
node_general = False
# Pick a random node with specified label selector
def get_node(node_name, label_selector, instance_kill_count):
def get_node(node_name, label_selector, instance_kill_count, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
if node_name in kubecli.list_killable_nodes():
return [node_name]
elif node_name:
@@ -29,20 +29,21 @@ def get_node(node_name, label_selector, instance_kill_count):
return nodes_to_return
# krkn_lib_kubernetes
# Wait until the node status becomes Ready
def wait_for_ready_status(node, timeout):
def wait_for_ready_status(node, timeout, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
resource_version = kubecli.get_node_resource_version(node)
kubecli.watch_node_status(node, "True", timeout, resource_version)
# krkn_lib_kubernetes
# Wait until the node status becomes Not Ready
def wait_for_not_ready_status(node, timeout):
def wait_for_not_ready_status(node, timeout, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
resource_version = kubecli.get_node_resource_version(node)
kubecli.watch_node_status(node, "False", timeout, resource_version)
# krkn_lib_kubernetes
# Wait until the node status becomes Unknown
def wait_for_unknown_status(node, timeout):
def wait_for_unknown_status(node, timeout, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
resource_version = kubecli.get_node_resource_version(node)
kubecli.watch_node_status(node, "Unknown", timeout, resource_version)

View File

@@ -1,5 +1,6 @@
import kraken.node_actions.common_node_functions as nodeaction
from kraken.node_actions.abstract_node_scenarios import abstract_node_scenarios
import krkn_lib_kubernetes
import logging
import sys
import docker
@@ -36,7 +37,8 @@ class Docker:
class docker_node_scenarios(abstract_node_scenarios):
def __init__(self):
def __init__(self, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
super().__init__(kubecli)
self.docker = Docker()
# Node scenario to start the node
@@ -47,7 +49,7 @@ class docker_node_scenarios(abstract_node_scenarios):
container_id = self.docker.get_container_id(node)
logging.info("Starting the node %s with container ID: %s " % (node, container_id))
self.docker.start_instances(node)
nodeaction.wait_for_ready_status(node, timeout)
nodeaction.wait_for_ready_status(node, timeout, self.kubecli)
logging.info("Node with container ID: %s is in running state" % (container_id))
logging.info("node_start_scenario has been successfully injected!")
except Exception as e:
@@ -66,7 +68,7 @@ class docker_node_scenarios(abstract_node_scenarios):
logging.info("Stopping the node %s with container ID: %s " % (node, container_id))
self.docker.stop_instances(node)
logging.info("Node with container ID: %s is in stopped state" % (container_id))
nodeaction.wait_for_unknown_status(node, timeout)
nodeaction.wait_for_unknown_status(node, timeout, self.kubecli)
except Exception as e:
logging.error("Failed to stop node instance. Encountered following exception: %s. " "Test Failed" % (e))
logging.error("node_stop_scenario injection failed!")
@@ -97,8 +99,8 @@ class docker_node_scenarios(abstract_node_scenarios):
container_id = self.docker.get_container_id(node)
logging.info("Rebooting the node %s with container ID: %s " % (node, container_id))
self.docker.reboot_instances(node)
nodeaction.wait_for_unknown_status(node, timeout)
nodeaction.wait_for_ready_status(node, timeout)
nodeaction.wait_for_unknown_status(node, timeout, self.kubecli)
nodeaction.wait_for_ready_status(node, timeout, self.kubecli)
logging.info("Node with container ID: %s has been rebooted" % (container_id))
logging.info("node_reboot_scenario has been successfuly injected!")
except Exception as e:

View File

@@ -1,7 +1,7 @@
import sys
import time
import logging
import kraken.kubernetes.client as kubecli
import krkn_lib_kubernetes
import kraken.node_actions.common_node_functions as nodeaction
from kraken.node_actions.abstract_node_scenarios import abstract_node_scenarios
from googleapiclient import discovery
@@ -133,8 +133,10 @@ class GCP:
return True
# krkn_lib_kubernetes
class gcp_node_scenarios(abstract_node_scenarios):
def __init__(self):
def __init__(self, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
super().__init__(kubecli)
self.gcp = GCP()
# Node scenario to start the node
@@ -146,7 +148,7 @@ class gcp_node_scenarios(abstract_node_scenarios):
logging.info("Starting the node %s with instance ID: %s " % (node, instance_id))
self.gcp.start_instances(zone, instance_id)
self.gcp.wait_until_running(zone, instance_id, timeout)
nodeaction.wait_for_ready_status(node, timeout)
nodeaction.wait_for_ready_status(node, timeout, self.kubecli)
logging.info("Node with instance ID: %s is in running state" % instance_id)
logging.info("node_start_scenario has been successfully injected!")
except Exception as e:
@@ -167,7 +169,7 @@ class gcp_node_scenarios(abstract_node_scenarios):
self.gcp.stop_instances(zone, instance_id)
self.gcp.wait_until_stopped(zone, instance_id, timeout)
logging.info("Node with instance ID: %s is in stopped state" % instance_id)
nodeaction.wait_for_unknown_status(node, timeout)
nodeaction.wait_for_unknown_status(node, timeout, self.kubecli)
except Exception as e:
logging.error("Failed to stop node instance. Encountered following exception: %s. " "Test Failed" % (e))
logging.error("node_stop_scenario injection failed!")
@@ -183,10 +185,10 @@ class gcp_node_scenarios(abstract_node_scenarios):
self.gcp.terminate_instances(zone, instance_id)
self.gcp.wait_until_terminated(zone, instance_id, timeout)
for _ in range(timeout):
if node not in kubecli.list_nodes():
if node not in self.kubecli.list_nodes():
break
time.sleep(1)
if node in kubecli.list_nodes():
if node in self.kubecli.list_nodes():
raise Exception("Node could not be terminated")
logging.info("Node with instance ID: %s has been terminated" % instance_id)
logging.info("node_termination_scenario has been successfuly injected!")
@@ -205,7 +207,7 @@ class gcp_node_scenarios(abstract_node_scenarios):
instance_id, zone = self.gcp.get_instance_id(node)
logging.info("Rebooting the node %s with instance ID: %s " % (node, instance_id))
self.gcp.reboot_instances(zone, instance_id)
nodeaction.wait_for_ready_status(node, timeout)
nodeaction.wait_for_ready_status(node, timeout, self.kubecli)
logging.info("Node with instance ID: %s has been rebooted" % instance_id)
logging.info("node_reboot_scenario has been successfuly injected!")
except Exception as e:

View File

@@ -1,4 +1,5 @@
import logging
import krkn_lib_kubernetes
from kraken.node_actions.abstract_node_scenarios import abstract_node_scenarios
@@ -6,9 +7,10 @@ class GENERAL:
def __init__(self):
pass
# krkn_lib_kubernetes
class general_node_scenarios(abstract_node_scenarios):
def __init__(self):
def __init__(self, kubecli: krkn_lib_kubernetes.KrknLibKubernetes ):
super().__init__(kubecli)
self.general = GENERAL()
# Node scenario to start the node

View File

@@ -1,6 +1,7 @@
import sys
import time
import logging
import krkn_lib_kubernetes
import kraken.invoke.command as runcommand
import kraken.node_actions.common_node_functions as nodeaction
from kraken.node_actions.abstract_node_scenarios import abstract_node_scenarios
@@ -86,9 +87,9 @@ class OPENSTACKCLOUD:
return node_name
counter += 1
# krkn_lib_kubernetes
class openstack_node_scenarios(abstract_node_scenarios):
def __init__(self):
def __init__(self, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
self.openstackcloud = OPENSTACKCLOUD()
# Node scenario to start the node
@@ -100,7 +101,7 @@ class openstack_node_scenarios(abstract_node_scenarios):
openstack_node_name = self.openstackcloud.get_instance_id(node)
self.openstackcloud.start_instances(openstack_node_name)
self.openstackcloud.wait_until_running(openstack_node_name, timeout)
nodeaction.wait_for_ready_status(node, timeout)
nodeaction.wait_for_ready_status(node, timeout, self.kubecli)
logging.info("Node with instance ID: %s is in running state" % (node))
logging.info("node_start_scenario has been successfully injected!")
except Exception as e:
@@ -120,7 +121,7 @@ class openstack_node_scenarios(abstract_node_scenarios):
self.openstackcloud.stop_instances(openstack_node_name)
self.openstackcloud.wait_until_stopped(openstack_node_name, timeout)
logging.info("Node with instance name: %s is in stopped state" % (node))
nodeaction.wait_for_ready_status(node, timeout)
nodeaction.wait_for_ready_status(node, timeout, self.kubecli)
except Exception as e:
logging.error("Failed to stop node instance. Encountered following exception: %s. " "Test Failed" % (e))
logging.error("node_stop_scenario injection failed!")
@@ -134,8 +135,8 @@ class openstack_node_scenarios(abstract_node_scenarios):
logging.info("Rebooting the node %s" % (node))
openstack_node_name = self.openstackcloud.get_instance_id(node)
self.openstackcloud.reboot_instances(openstack_node_name)
nodeaction.wait_for_unknown_status(node, timeout)
nodeaction.wait_for_ready_status(node, timeout)
nodeaction.wait_for_unknown_status(node, timeout, self.kubecli)
nodeaction.wait_for_ready_status(node, timeout, self.kubecli)
logging.info("Node with instance name: %s has been rebooted" % (node))
logging.info("node_reboot_scenario has been successfuly injected!")
except Exception as e:

View File

@@ -2,6 +2,7 @@ import yaml
import logging
import sys
import time
import krkn_lib_kubernetes
from kraken.node_actions.aws_node_scenarios import aws_node_scenarios
from kraken.node_actions.general_cloud_node_scenarios import general_node_scenarios
from kraken.node_actions.az_node_scenarios import azure_node_scenarios
@@ -18,27 +19,29 @@ node_general = False
# Get the node scenarios object of specified cloud type
def get_node_scenario_object(node_scenario):
# krkn_lib_kubernetes
def get_node_scenario_object(node_scenario, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
if "cloud_type" not in node_scenario.keys() or node_scenario["cloud_type"] == "generic":
global node_general
node_general = True
return general_node_scenarios()
return general_node_scenarios(kubecli)
if node_scenario["cloud_type"] == "aws":
return aws_node_scenarios()
return aws_node_scenarios(kubecli)
elif node_scenario["cloud_type"] == "gcp":
return gcp_node_scenarios()
return gcp_node_scenarios(kubecli)
elif node_scenario["cloud_type"] == "openstack":
return openstack_node_scenarios()
return openstack_node_scenarios(kubecli)
elif node_scenario["cloud_type"] == "azure" or node_scenario["cloud_type"] == "az":
return azure_node_scenarios()
return azure_node_scenarios(kubecli)
elif node_scenario["cloud_type"] == "alibaba" or node_scenario["cloud_type"] == "alicloud":
return alibaba_node_scenarios()
return alibaba_node_scenarios(kubecli)
elif node_scenario["cloud_type"] == "bm":
return bm_node_scenarios(
node_scenario.get("bmc_info"), node_scenario.get("bmc_user", None), node_scenario.get("bmc_password", None)
node_scenario.get("bmc_info"), node_scenario.get("bmc_user", None), node_scenario.get("bmc_password", None),
kubecli
)
elif node_scenario["cloud_type"] == "docker":
return docker_node_scenarios()
return docker_node_scenarios(kubecli)
else:
logging.error(
"Cloud type " + node_scenario["cloud_type"] + " is not currently supported; "
@@ -49,16 +52,17 @@ def get_node_scenario_object(node_scenario):
# Run defined scenarios
def run(scenarios_list, config, wait_duration):
# krkn_lib_kubernetes
def run(scenarios_list, config, wait_duration, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
for node_scenario_config in scenarios_list:
with open(node_scenario_config, "r") as f:
node_scenario_config = yaml.full_load(f)
for node_scenario in node_scenario_config["node_scenarios"]:
node_scenario_object = get_node_scenario_object(node_scenario)
node_scenario_object = get_node_scenario_object(node_scenario, kubecli)
if node_scenario["actions"]:
for action in node_scenario["actions"]:
start_time = int(time.time())
inject_node_scenario(action, node_scenario, node_scenario_object)
inject_node_scenario(action, node_scenario, node_scenario_object, kubecli)
logging.info("Waiting for the specified duration: %s" % (wait_duration))
time.sleep(wait_duration)
end_time = int(time.time())
@@ -67,7 +71,7 @@ def run(scenarios_list, config, wait_duration):
# Inject the specified node scenario
def inject_node_scenario(action, node_scenario, node_scenario_object):
def inject_node_scenario(action, node_scenario, node_scenario_object, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
generic_cloud_scenarios = ("stop_kubelet_scenario", "node_crash_scenario")
# Get the node scenario configurations
run_kill_count = node_scenario.get("runs", 1)
@@ -83,7 +87,7 @@ def inject_node_scenario(action, node_scenario, node_scenario_object):
else:
node_name_list = [node_name]
for single_node_name in node_name_list:
nodes = common_node_functions.get_node(single_node_name, label_selector, instance_kill_count)
nodes = common_node_functions.get_node(single_node_name, label_selector, instance_kill_count, kubecli)
for single_node in nodes:
if node_general and action not in generic_cloud_scenarios:
logging.info("Scenario: " + action + " is not set up for generic cloud type, skipping action")

View File

@@ -5,7 +5,7 @@ import arcaflow_plugin_kill_pod
import kraken.cerberus.setup as cerberus
import kraken.post_actions.actions as post_actions
import kraken.kubernetes.client as kubecli
import krkn_lib_kubernetes
import time
import yaml
import sys
@@ -66,8 +66,8 @@ def run(kubeconfig_path, scenarios_list, config, failed_post_scenarios, wait_dur
cerberus.publish_kraken_status(config, failed_post_scenarios, start_time, end_time)
return failed_post_scenarios
def container_run(kubeconfig_path, scenarios_list, config, failed_post_scenarios, wait_duration):
# krkn_lib_kubernetes
def container_run(kubeconfig_path, scenarios_list, config, failed_post_scenarios, wait_duration, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
for container_scenario_config in scenarios_list:
if len(container_scenario_config) > 1:
pre_action_output = post_actions.run(kubeconfig_path, container_scenario_config[1])
@@ -78,7 +78,7 @@ def container_run(kubeconfig_path, scenarios_list, config, failed_post_scenarios
for cont_scenario in cont_scenario_config["scenarios"]:
# capture start time
start_time = int(time.time())
killed_containers = container_killing_in_pod(cont_scenario)
killed_containers = container_killing_in_pod(cont_scenario, kubecli)
if len(container_scenario_config) > 1:
try:
@@ -90,7 +90,7 @@ def container_run(kubeconfig_path, scenarios_list, config, failed_post_scenarios
sys.exit(1)
else:
failed_post_scenarios = check_failed_containers(
killed_containers, cont_scenario.get("retry_wait", 120)
killed_containers, cont_scenario.get("retry_wait", 120), kubecli
)
logging.info("Waiting for the specified duration: %s" % (wait_duration))
@@ -104,7 +104,7 @@ def container_run(kubeconfig_path, scenarios_list, config, failed_post_scenarios
logging.info("")
def container_killing_in_pod(cont_scenario):
def container_killing_in_pod(cont_scenario, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
scenario_name = cont_scenario.get("name", "")
namespace = cont_scenario.get("namespace", "*")
label_selector = cont_scenario.get("label_selector", None)
@@ -153,11 +153,11 @@ def container_killing_in_pod(cont_scenario):
if container_name != "":
if c_name == container_name:
killed_container_list.append([selected_container_pod[0], selected_container_pod[1], c_name])
retry_container_killing(kill_action, selected_container_pod[0], selected_container_pod[1], c_name)
retry_container_killing(kill_action, selected_container_pod[0], selected_container_pod[1], c_name, kubecli)
break
else:
killed_container_list.append([selected_container_pod[0], selected_container_pod[1], c_name])
retry_container_killing(kill_action, selected_container_pod[0], selected_container_pod[1], c_name)
retry_container_killing(kill_action, selected_container_pod[0], selected_container_pod[1], c_name, kubecli)
break
container_pod_list.remove(selected_container_pod)
killed_count += 1
@@ -165,7 +165,7 @@ def container_killing_in_pod(cont_scenario):
return killed_container_list
def retry_container_killing(kill_action, podname, namespace, container_name):
def retry_container_killing(kill_action, podname, namespace, container_name, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
i = 0
while i < 5:
logging.info("Killing container %s in pod %s (ns %s)" % (str(container_name), str(podname), str(namespace)))
@@ -181,7 +181,7 @@ def retry_container_killing(kill_action, podname, namespace, container_name):
continue
def check_failed_containers(killed_container_list, wait_time):
def check_failed_containers(killed_container_list, wait_time, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
container_ready = []
timer = 0
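retry_container_killing above now receives the client as well; a minimal sketch of a single kill attempt, assuming the kill is issued through exec_cmd_in_pod and that "kill 1" is the configured action (the actual kill command is not shown in these hunks):

import logging
import krkn_lib_kubernetes

def kill_container_once(kill_action, podname, namespace, container_name,
                        kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
    # hedged sketch: run the configured kill command inside the target container
    logging.info("Killing container %s in pod %s (ns %s)", container_name, podname, namespace)
    return kubecli.exec_cmd_in_pod(kill_action or "kill 1", podname, namespace, container_name)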

View File

@@ -31,7 +31,7 @@ def process_prom_query(query):
logging.error("Failed to get the metrics: %s" % e)
sys.exit(1)
else:
logging.info("Skipping the prometheus query as the prometheus client couldn't " "be initilized\n")
logging.info("Skipping the prometheus query as the prometheus client couldn't " "be initialized\n")
# Get prometheus details
def instance(distribution, prometheus_url, prometheus_bearer_token):

View File

@@ -3,14 +3,14 @@ import random
import re
import sys
import time
import krkn_lib_kubernetes
import yaml
from ..cerberus import setup as cerberus
from ..kubernetes import client as kubecli
def run(scenarios_list, config):
# krkn_lib_kubernetes
def run(scenarios_list, config, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
"""
Reads the scenario config and creates a temp file to fill up the PVC
"""
@@ -128,7 +128,6 @@ def run(scenarios_list, config):
pod_name,
namespace,
container_name,
"sh"
)
).split()
pvc_used_kb = int(command_output[2])
@@ -186,14 +185,14 @@ def run(scenarios_list, config):
"Create temp file in the PVC command:\n %s" % command
)
kubecli.exec_cmd_in_pod(
command, pod_name, namespace, container_name, "sh"
command, pod_name, namespace, container_name
)
# Check if file is created
command = "ls -lh %s" % (str(mount_path))
logging.debug("Check file is created command:\n %s" % command)
response = kubecli.exec_cmd_in_pod(
command, pod_name, namespace, container_name, "sh"
command, pod_name, namespace, container_name
)
logging.info("\n" + str(response))
if str(file_name).lower() in str(response).lower():
@@ -213,7 +212,8 @@ def run(scenarios_list, config):
namespace,
container_name,
mount_path,
file_size_kb
file_size_kb,
kubecli
)
sys.exit(1)
@@ -233,7 +233,8 @@ def run(scenarios_list, config):
namespace,
container_name,
mount_path,
file_size_kb
file_size_kb,
kubecli
)
end_time = int(time.time())
@@ -245,6 +246,7 @@ def run(scenarios_list, config):
)
# krkn_lib_kubernetes
def remove_temp_file(
file_name,
full_path,
@@ -252,19 +254,19 @@ def remove_temp_file(
namespace,
container_name,
mount_path,
file_size_kb
file_size_kb,
kubecli: krkn_lib_kubernetes.KrknLibKubernetes
):
command = "rm -f %s" % (str(full_path))
logging.debug("Remove temp file from the PVC command:\n %s" % command)
kubecli.exec_cmd_in_pod(command, pod_name, namespace, container_name, "sh")
kubecli.exec_cmd_in_pod(command, pod_name, namespace, container_name)
command = "ls -lh %s" % (str(mount_path))
logging.debug("Check temp file is removed command:\n %s" % command)
response = kubecli.exec_cmd_in_pod(
command,
pod_name,
namespace,
container_name,
"sh"
container_name
)
logging.info("\n" + str(response))
if not (str(file_name).lower() in str(response).lower()):
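The hunks above drop the trailing "sh" argument from exec_cmd_in_pod; a minimal sketch of creating the temp file through the injected client after that change (the dd command and helper name are assumptions; the real fill command is built earlier in the scenario and not shown here):

import krkn_lib_kubernetes

def create_temp_file(kubecli: krkn_lib_kubernetes.KrknLibKubernetes, pod_name, namespace,
                     container_name, full_path, file_size_kb):
    # hedged sketch: write file_size_kb kilobytes of zeros at full_path inside the pod
    command = "dd if=/dev/zero of=%s bs=1024 count=%s" % (full_path, file_size_kb)
    return kubecli.exec_cmd_in_pod(command, pod_name, namespace, container_name)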

View File

@@ -4,10 +4,10 @@ import sys
import yaml
import logging
import time
import krkn_lib_kubernetes
from multiprocessing.pool import ThreadPool
from ..cerberus import setup as cerberus
from ..kubernetes import client as kubecli
from ..post_actions import actions as post_actions
from ..node_actions.aws_node_scenarios import AWS
from ..node_actions.openstack_node_scenarios import OPENSTACKCLOUD
@@ -40,7 +40,8 @@ def multiprocess_nodes(cloud_object_function, nodes):
# Inject the cluster shut down scenario
def cluster_shut_down(shut_down_config):
# krkn_lib_kubernetes
def cluster_shut_down(shut_down_config, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
runs = shut_down_config["runs"]
shut_down_duration = shut_down_config["shut_down_duration"]
cloud_type = shut_down_config["cloud_type"]
@@ -125,8 +126,9 @@ def cluster_shut_down(shut_down_config):
logging.info("Successfully injected cluster_shut_down scenario!")
# krkn_lib_kubernetes
def run(scenarios_list, config, wait_duration):
def run(scenarios_list, config, wait_duration, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
failed_post_scenarios = []
for shut_down_config in scenarios_list:
if len(shut_down_config) > 1:
@@ -138,7 +140,7 @@ def run(scenarios_list, config, wait_duration):
shut_down_config_scenario = \
shut_down_config_yaml["cluster_shut_down_scenario"]
start_time = int(time.time())
cluster_shut_down(shut_down_config_scenario)
cluster_shut_down(shut_down_config_scenario, kubecli)
logging.info(
"Waiting for the specified duration: %s" % (wait_duration)
)
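multiprocess_nodes, referenced in the context above, fans a cloud start/stop call out across the selected nodes; a minimal sketch of that helper, assuming a plain ThreadPool.map over the node list (only the signature is shown in the hunk):

from multiprocessing.pool import ThreadPool

def multiprocess_nodes(cloud_object_function, nodes):
    # hedged sketch: run the same cloud operation against every node concurrently
    with ThreadPool(processes=len(nodes) or 1) as pool:
        return pool.map(cloud_object_function, nodes)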

View File

@@ -5,14 +5,12 @@ import re
import sys
import yaml
import random
import krkn_lib_kubernetes
from ..cerberus import setup as cerberus
from ..kubernetes import client as kubecli
from ..invoke import command as runcommand
def pod_exec(pod_name, command, namespace, container_name):
i = 0
# krkn_lib_kubernetes
def pod_exec(pod_name, command, namespace, container_name, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
for i in range(5):
response = kubecli.exec_cmd_in_pod(
command,
@@ -41,7 +39,8 @@ def node_debug(node_name, command):
return response
def get_container_name(pod_name, namespace, container_name=""):
# krkn_lib_kubernetes
def get_container_name(pod_name, namespace, kubecli: krkn_lib_kubernetes.KrknLibKubernetes, container_name=""):
container_names = kubecli.get_containers_in_pod(pod_name, namespace)
if container_name != "":
@@ -63,7 +62,8 @@ def get_container_name(pod_name, namespace, container_name=""):
return container_name
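pod_exec and get_container_name above now take the client explicitly; a minimal sketch of the retry wrapper (hypothetical name pod_exec_with_retry), assuming exec_cmd_in_pod returns a falsy value on failure since only the loop header is visible in the hunk:

import logging
import krkn_lib_kubernetes

def pod_exec_with_retry(pod_name, command, namespace, container_name,
                        kubecli: krkn_lib_kubernetes.KrknLibKubernetes, retries=5):
    # hedged sketch: retry the exec a few times before giving up
    for attempt in range(retries):
        response = kubecli.exec_cmd_in_pod(command, pod_name, namespace, container_name)
        if response:
            return response
        logging.warning("exec attempt %d failed in pod %s", attempt + 1, pod_name)
    return False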
def skew_time(scenario):
# krkn_lib_kubernetes
def skew_time(scenario, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
skew_command = "date --set "
if scenario["action"] == "skew_date":
skewed_date = "00-01-01"
@@ -134,13 +134,17 @@ def skew_time(scenario):
selected_container_name = get_container_name(
pod[0],
pod[1],
container_name
kubecli,
container_name,
)
pod_exec_response = pod_exec(
pod[0],
skew_command,
pod[1],
selected_container_name
selected_container_name,
kubecli,
)
if pod_exec_response is False:
logging.error(
@@ -154,13 +158,15 @@ def skew_time(scenario):
selected_container_name = get_container_name(
pod,
scenario["namespace"],
kubecli,
container_name
)
pod_exec_response = pod_exec(
pod,
skew_command,
scenario["namespace"],
selected_container_name
selected_container_name,
kubecli
)
if pod_exec_response is False:
logging.error(
@@ -216,7 +222,8 @@ def string_to_date(obj_datetime):
return datetime.datetime(datetime.MINYEAR, 1, 1)
def check_date_time(object_type, names):
# krkn_lib_kubernetes
def check_date_time(object_type, names, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
skew_command = "date"
not_reset = []
max_retries = 30
@@ -256,7 +263,8 @@ def check_date_time(object_type, names):
pod_name[0],
skew_command,
pod_name[1],
pod_name[2]
pod_name[2],
kubecli
)
pod_datetime = string_to_date(pod_datetime_string)
while not (
@@ -271,7 +279,8 @@ def check_date_time(object_type, names):
pod_name[0],
skew_command,
pod_name[1],
pod_name[2]
pod_name[2],
kubecli
)
pod_datetime = string_to_date(pod_datetime)
counter += 1
@@ -289,14 +298,15 @@ def check_date_time(object_type, names):
return not_reset
def run(scenarios_list, config, wait_duration):
# krkn_lib_kubernetes
def run(scenarios_list, config, wait_duration, kubecli: krkn_lib_kubernetes.KrknLibKubernetes):
for time_scenario_config in scenarios_list:
with open(time_scenario_config, "r") as f:
scenario_config = yaml.full_load(f)
for time_scenario in scenario_config["time_scenarios"]:
start_time = int(time.time())
object_type, object_names = skew_time(time_scenario)
not_reset = check_date_time(object_type, object_names)
object_type, object_names = skew_time(time_scenario, kubecli)
not_reset = check_date_time(object_type, object_names, kubecli)
if len(not_reset) > 0:
logging.info("Object times were not reset")
logging.info(

View File

@@ -37,3 +37,4 @@ prometheus_api_client
ibm_cloud_sdk_core
ibm_vpc
pytest
krkn-lib-kubernetes > 0.1.1

View File

@@ -8,7 +8,6 @@ import optparse
import pyfiglet
import uuid
import time
import kraken.kubernetes.client as kubecli
import kraken.litmus.common_litmus as common_litmus
import kraken.time_actions.common_time_functions as time_actions
import kraken.performance_dashboards.setup as performance_dashboards
@@ -26,12 +25,13 @@ import kraken.arcaflow_plugin as arcaflow_plugin
import server as server
import kraken.prometheus.client as promcli
from kraken import plugins
from krkn_lib_kubernetes import KrknLibKubernetes
KUBE_BURNER_URL = (
"https://github.com/cloud-bulldozer/kube-burner/"
"releases/download/v{version}/kube-burner-{version}-Linux-x86_64.tar.gz"
)
KUBE_BURNER_VERSION = "0.9.1"
KUBE_BURNER_VERSION = "1.7.0"
# Main function
@@ -101,7 +101,7 @@ def main(cfg):
try:
kubeconfig_path
os.environ["KUBECONFIG"] = str(kubeconfig_path)
kubecli.initialize_clients(kubeconfig_path)
kubecli = KrknLibKubernetes(kubeconfig_path=kubeconfig_path)
except NameError:
kubecli.initialize_clients(None)
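The hunk above swaps the module-level kubecli client for a KrknLibKubernetes instance, while the NameError fallback still calls the old initialize_clients helper; a minimal sketch of the initialization after the change, assuming the library accepts kubeconfig_path=None and resolves a default config itself:

import os
from krkn_lib_kubernetes import KrknLibKubernetes

def init_client(kubeconfig_path=None):
    # hedged sketch: prefer an explicit kubeconfig, otherwise let the library pick one up
    if kubeconfig_path:
        os.environ["KUBECONFIG"] = str(kubeconfig_path)
    return KrknLibKubernetes(kubeconfig_path=kubeconfig_path)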
@@ -125,9 +125,6 @@ def main(cfg):
logging.info(
"Publishing kraken status at http://%s:%s" % (server_address, port)
)
logging.info(
"Publishing kraken status at http://%s:%s" % (server_address, port)
)
server.start_server(address, run_signal)
# Cluster info
@@ -218,6 +215,7 @@ def main(cfg):
failed_post_scenarios,
wait_duration,
)
# krkn_lib_kubernetes
elif scenario_type == "container_scenarios":
logging.info("Running container scenarios")
failed_post_scenarios = pod_scenarios.container_run(
@@ -226,26 +224,30 @@ def main(cfg):
config,
failed_post_scenarios,
wait_duration,
kubecli
)
# Inject node chaos scenarios specified in the config
# krkn_lib_kubernetes
elif scenario_type == "node_scenarios":
logging.info("Running node scenarios")
nodeaction.run(scenarios_list, config, wait_duration)
nodeaction.run(scenarios_list, config, wait_duration, kubecli)
# Inject managedcluster chaos scenarios specified in the config
# krkn_lib_kubernetes
elif scenario_type == "managedcluster_scenarios":
logging.info("Running managedcluster scenarios")
managedcluster_scenarios.run(
scenarios_list, config, wait_duration
scenarios_list, config, wait_duration, kubecli
)
# Inject time skew chaos scenarios specified
# in the config
# krkn_lib_kubernetes
elif scenario_type == "time_scenarios":
if distribution == "openshift":
logging.info("Running time skew scenarios")
time_actions.run(scenarios_list, config, wait_duration)
time_actions.run(scenarios_list, config, wait_duration, kubecli)
else:
logging.error(
"Litmus scenarios are currently "
@@ -261,13 +263,14 @@ def main(cfg):
if litmus_install:
# Remove Litmus resources
# before running the scenarios
common_litmus.delete_chaos(litmus_namespace)
common_litmus.delete_chaos(litmus_namespace, kubecli)
common_litmus.delete_chaos_experiments(
litmus_namespace
litmus_namespace,
kubecli
)
if litmus_uninstall_before_run:
common_litmus.uninstall_litmus(
litmus_version, litmus_namespace
litmus_version, litmus_namespace, kubecli
)
common_litmus.install_litmus(
litmus_version, litmus_namespace
@@ -282,6 +285,7 @@ def main(cfg):
litmus_uninstall,
wait_duration,
litmus_namespace,
kubecli
)
else:
logging.error(
@@ -291,10 +295,12 @@ def main(cfg):
sys.exit(1)
# Inject cluster shutdown scenarios
# krkn_lib_kubernetes
elif scenario_type == "cluster_shut_down_scenarios":
shut_down.run(scenarios_list, config, wait_duration)
shut_down.run(scenarios_list, config, wait_duration, kubecli)
# Inject namespace chaos scenarios
# krkn_lib_kubernetes
elif scenario_type == "namespace_scenarios":
logging.info("Running namespace scenarios")
namespace_actions.run(
@@ -303,6 +309,7 @@ def main(cfg):
wait_duration,
failed_post_scenarios,
kubeconfig_path,
kubecli
)
# Inject zone failures
@@ -318,14 +325,16 @@ def main(cfg):
)
# PVC scenarios
# krkn_lib_kubernetes
elif scenario_type == "pvc_scenarios":
logging.info("Running PVC scenario")
pvc_scenario.run(scenarios_list, config)
pvc_scenario.run(scenarios_list, config, kubecli)
# Network scenarios
# krkn_lib_kubernetes
elif scenario_type == "network_chaos":
logging.info("Running Network Chaos")
network_chaos.run(scenarios_list, config, wait_duration)
network_chaos.run(scenarios_list, config, wait_duration, kubecli)
# Check for critical alerts when enabled
if check_critical_alerts:
@@ -380,9 +389,9 @@ def main(cfg):
sys.exit(1)
if litmus_uninstall and litmus_installed:
common_litmus.delete_chaos(litmus_namespace)
common_litmus.delete_chaos_experiments(litmus_namespace)
common_litmus.uninstall_litmus(litmus_version, litmus_namespace)
common_litmus.delete_chaos(litmus_namespace, kubecli)
common_litmus.delete_chaos_experiments(litmus_namespace, kubecli)
common_litmus.uninstall_litmus(litmus_version, litmus_namespace, kubecli)
if failed_post_scenarios:
logging.error(