Compare commits

...

12 Commits

Author SHA1 Message Date
Tullio Sebastiani
9378cd74cd krkn-lib update v2.1.6 to fix pod monitoring time calculations (#674)
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-07-16 18:04:24 +02:00
Paige Patton
4d3491da0f adidng action token passing (#671)
rh-pre-commit.version: 2.2.0
rh-pre-commit.check-secrets: ENABLED

Signed-off-by: Paige Rubendall <prubenda@redhat.com>
2024-07-15 12:50:20 -04:00
Naga Ravi Chaitanya Elluri
d6ce66160b Remove podman-compose dependency
We are not using it in the krkn code base and removing it fixes one
of the license issues reported by FOSSA. This commit also removes
setting up dependencies using docker/podman compose as it not actively
maintained.

Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
2024-07-10 17:25:33 -04:00
Paige Rubendall
ef1a55438b taking out need for az cli to be installed
rh-pre-commit.version: 2.2.0
rh-pre-commit.check-secrets: ENABLED

Signed-off-by: Paige Rubendall <prubenda@redhat.com>
2024-07-05 15:18:06 -04:00
Tullio Sebastiani
d8f54b83a2 fixed image push issue
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-07-05 10:32:01 -04:00
Tullio Sebastiani
4870c86515 moves the krkn-hub build from push on main to tag (#660)
* moves the krkn-hub build from push on main to tag + final image enhancement

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

fixed syntax

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

typo

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

typo

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* quotes

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

---------

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-07-05 16:09:34 +02:00
Naga Ravi Chaitanya Elluri
6ae17cf678 Update dockerfile to install azure-cli using dnf
Avoids architecture issues such as "bash: /usr/bin/az: cannot execute: required file not found"

Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
2024-07-03 18:35:45 -04:00
Tullio Sebastiani
ce9f8aa050 Dockerfile update v1.6.2 (#659)
Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-07-03 16:34:37 +02:00
Paige Patton
05148317c1 taking out one glcoud call (#657)
rh-pre-commit.version: 2.2.0
rh-pre-commit.check-secrets: ENABLED

Signed-off-by: Paige Rubendall <prubenda@redhat.com>
2024-07-03 16:14:19 +02:00
Tullio Sebastiani
5f836f294b Kill pod arca plugin update adaptation (#656)
* new kill-pod interface adaptation

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* unit test fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* requirements update

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* fixed duplicate requirement

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* added conditional dockerfile build

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

removed useless print

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

---------

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2024-07-03 15:50:43 +02:00
snyk-bot
cfa1bb09a0 fix: requirements.txt to reduce vulnerabilities
The following vulnerabilities are fixed by pinning transitive dependencies:
- https://snyk.io/vuln/SNYK-PYTHON-REQUESTS-6928867
2024-06-24 10:23:37 -04:00
Naga Ravi Chaitanya Elluri
5ddfff5a85 Make krkn dir executable
Signed-off-by: Naga Ravi Chaitanya Elluri <nelluri@redhat.com>
2024-06-20 14:32:20 -04:00
13 changed files with 79 additions and 97 deletions

View File

@@ -1,8 +1,7 @@
name: Docker Image CI
on:
push:
branches:
- main
tags: ['v[0-9].[0-9]+.[0-9]+']
pull_request:
jobs:
@@ -12,30 +11,43 @@ jobs:
- name: Check out code
uses: actions/checkout@v3
- name: Build the Docker images
if: startsWith(github.ref, 'refs/tags')
run: |
docker build --no-cache -t quay.io/krkn-chaos/krkn containers/
docker build --no-cache -t quay.io/krkn-chaos/krkn containers/ --build-arg TAG=${GITHUB_REF#refs/tags/}
docker tag quay.io/krkn-chaos/krkn quay.io/redhat-chaos/krkn
docker tag quay.io/krkn-chaos/krkn quay.io/krkn-chaos/krkn:${GITHUB_REF#refs/tags/}
docker tag quay.io/krkn-chaos/krkn quay.io/redhat-chaos/krkn:${GITHUB_REF#refs/tags/}
- name: Test Build the Docker images
if: ${{ github.event_name == 'pull_request' }}
run: |
docker build --no-cache -t quay.io/krkn-chaos/krkn containers/ --build-arg PR_NUMBER=${{ github.event.pull_request.number }}
- name: Login in quay
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
if: startsWith(github.ref, 'refs/tags')
run: docker login quay.io -u ${QUAY_USER} -p ${QUAY_TOKEN}
env:
QUAY_USER: ${{ secrets.QUAY_USERNAME }}
QUAY_TOKEN: ${{ secrets.QUAY_PASSWORD }}
- name: Push the KrknChaos Docker images
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
run: docker push quay.io/krkn-chaos/krkn
if: startsWith(github.ref, 'refs/tags')
run: |
docker push quay.io/krkn-chaos/krkn
docker push quay.io/krkn-chaos/krkn:${GITHUB_REF#refs/tags/}
- name: Login in to redhat-chaos quay
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
if: startsWith(github.ref, 'refs/tags/v')
run: docker login quay.io -u ${QUAY_USER} -p ${QUAY_TOKEN}
env:
QUAY_USER: ${{ secrets.QUAY_USER_1 }}
QUAY_TOKEN: ${{ secrets.QUAY_TOKEN_1 }}
- name: Push the RedHat Chaos Docker images
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
run: docker push quay.io/redhat-chaos/krkn
if: startsWith(github.ref, 'refs/tags')
run: |
docker push quay.io/redhat-chaos/krkn
docker push quay.io/redhat-chaos/krkn:${GITHUB_REF#refs/tags/}
- name: Rebuild krkn-hub
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
if: startsWith(github.ref, 'refs/tags')
uses: redhat-chaos/actions/krkn-hub@main
with:
QUAY_USER: ${{ secrets.QUAY_USERNAME }}
QUAY_TOKEN: ${{ secrets.QUAY_PASSWORD }}
AUTOPUSH: ${{ secrets.AUTOPUSH }}

View File

@@ -41,18 +41,6 @@ After installation, refer back to the below sections for supported scenarios and
#### Running Kraken with minimal configuration tweaks
For cases where you want to run Kraken with minimal configuration changes, refer to [krkn-hub](https://github.com/krkn-chaos/krkn-hub). One use case is CI integration where you do not want to carry around different configuration files for the scenarios.
### Setting up infrastructure dependencies
Kraken indexes the metrics specified in the profile into Elasticsearch in addition to leveraging Cerberus for understanding the health of the Kubernetes cluster under test. More information on the features is documented below. The infrastructure pieces can be easily installed and uninstalled by running:
```
$ cd kraken
$ podman-compose up or $ docker-compose up # Spins up the containers specified in the docker-compose.yml file present in the run directory.
$ podman-compose down or $ docker-compose down # Delete the containers installed.
```
This will manage the Cerberus and Elasticsearch containers on the host on which you are running Kraken.
**NOTE**: Make sure you have enough resources (memory and disk) on the machine on top of which the containers are running as Elasticsearch is resource intensive. Cerberus monitors the system components by default, the [config](config/cerberus.yaml) can be tweaked to add applications namespaces, routes and other components to monitor as well. The command will keep running until killed since detached mode is not supported as of now.
### Config
Instructions on how to setup the config and the options supported can be found at [Config](docs/config.md).

View File

@@ -1,9 +1,6 @@
# azure-client
FROM mcr.microsoft.com/azure-cli:latest as azure-cli
# oc build
FROM golang:1.22.4 AS oc-build
RUN apt-get update && apt-get install -y libkrb5-dev
RUN apt-get update && apt-get install -y --no-install-recommends libkrb5-dev
WORKDIR /tmp
RUN git clone --branch release-4.18 https://github.com/openshift/oc.git
WORKDIR /tmp/oc
@@ -15,11 +12,13 @@ RUN go mod edit -go 1.22.3 &&\
RUN make GO_REQUIRED_MIN_VERSION:= oc
FROM fedora:40
ARG PR_NUMBER
ARG TAG
RUN groupadd -g 1001 krkn && useradd -m -u 1001 -g krkn krkn
RUN dnf update -y
# krkn version that will be built
ENV KRKN_VERSION v1.6.1
ENV KRKN_VERSION v1.6.2
ENV KUBECONFIG /home/krkn/.kube/config
@@ -29,22 +28,30 @@ RUN curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/s
cp kubectl /usr/bin/kubectl && chmod +x /usr/bin/kubectl
# This overwrites any existing configuration in /etc/yum.repos.d/kubernetes.repo
RUN dnf update && dnf install -y git python39 jq yq gettext wget which
# copy azure client binary from azure-cli image
COPY --from=azure-cli /usr/local/bin/az /usr/bin/az
RUN dnf update && dnf install -y --setopt=install_weak_deps=False \
git python39 jq yq gettext wget which &&\
dnf clean all
# copy oc client binary from oc-build image
COPY --from=oc-build /tmp/oc/oc /usr/bin/oc
# krkn build
RUN git clone https://github.com/krkn-chaos/krkn.git --branch $KRKN_VERSION /home/krkn/kraken && \
RUN git clone https://github.com/krkn-chaos/krkn.git /home/krkn/kraken && \
mkdir -p /home/krkn/.kube
WORKDIR /home/krkn/kraken
# default behaviour will be to build main
# if it is a PR trigger the PR itself will be checked out
RUN if [ -n "$PR_NUMBER" ]; then git fetch origin pull/${PR_NUMBER}/head:pr-${PR_NUMBER} && git checkout pr-${PR_NUMBER};fi
# if it is a TAG trigger checkout the tag
RUN if [ -n "$TAG" ]; then git checkout "$TAG";fi
RUN python3.9 -m ensurepip
RUN pip3.9 install -r requirements.txt
RUN pip3.9 install jsonschema
RUN chown -R krkn:krkn /home/krkn
RUN chown -R krkn:krkn /home/krkn && chmod 755 /home/krkn
USER krkn
ENTRYPOINT ["python3.9", "run_kraken.py"]
CMD ["--config=config/config.yaml"]
CMD ["--config=config/config.yaml"]

View File

@@ -2,15 +2,10 @@
FROM ppc64le/centos:8
FROM mcr.microsoft.com/azure-cli:latest as azure-cli
LABEL org.opencontainers.image.authors="Red Hat OpenShift Chaos Engineering"
ENV KUBECONFIG /root/.kube/config
# Copy azure client binary from azure-cli image
COPY --from=azure-cli /usr/local/bin/az /usr/bin/az
# Install dependencies
RUN yum install -y git python39 python3-pip jq gettext wget && \
python3.9 -m pip install -U pip && \

View File

@@ -1,31 +0,0 @@
version: "3"
services:
elastic:
image: docker.elastic.co/elasticsearch/elasticsearch:7.13.2
deploy:
replicas: 1
restart_policy:
condition: on-failure
network_mode: host
environment:
discovery.type: single-node
kibana:
image: docker.elastic.co/kibana/kibana:7.13.2
deploy:
replicas: 1
restart_policy:
condition: on-failure
network_mode: host
environment:
ELASTICSEARCH_HOSTS: "http://0.0.0.0:9200"
cerberus:
image: quay.io/openshift-scale/cerberus:latest
privileged: true
deploy:
replicas: 1
restart_policy:
condition: on-failure
network_mode: host
volumes:
- ./config/cerberus.yaml:/root/cerberus/config/config.yaml:Z # Modify the config in case of the need to monitor additional components
- ${HOME}/.kube/config:/root/.kube/config:Z

View File

@@ -27,14 +27,12 @@ After creating the service account you will need to enable the account using the
## Azure
**NOTE**: For Azure node killing scenarios, make sure [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) is installed.
You will also need to create a service principal and give it the correct access, see [here](https://docs.openshift.com/container-platform/4.5/installing/installing_azure/installing-azure-account.html) for creating the service principal and setting the proper permissions.
**NOTE**: You will need to create a service principal and give it the correct access, see [here](https://docs.openshift.com/container-platform/4.5/installing/installing_azure/installing-azure-account.html) for creating the service principal and setting the proper permissions.
To properly run the service principal requires “Azure Active Directory Graph/Application.ReadWrite.OwnedBy” api permission granted and “User Access Administrator”.
Before running you will need to set the following:
1. Login using ```az login```
1. ```export AZURE_SUBSCRIPTION_ID=<subscription_id>```
2. ```export AZURE_TENANT_ID=<tenant_id>```

View File

@@ -1,6 +1,6 @@
import time
import yaml
import os
import kraken.invoke.command as runcommand
import logging
import kraken.node_actions.common_node_functions as nodeaction
@@ -17,9 +17,9 @@ class Azure:
# Acquire a credential object using CLI-based authentication.
credentials = DefaultAzureCredential()
logging.info("credential " + str(credentials))
az_account = runcommand.invoke("az account list -o yaml")
az_account_yaml = yaml.safe_load(az_account, Loader=yaml.FullLoader)
subscription_id = az_account_yaml[0]["id"]
# az_account = runcommand.invoke("az account list -o yaml")
# az_account_yaml = yaml.safe_load(az_account, Loader=yaml.FullLoader)
subscription_id = os.getenv("AZURE_SUBSCRIPTION_ID")
self.compute_client = ComputeManagementClient(credentials, subscription_id)
# Get the instance ID of the node

View File

@@ -1,6 +1,8 @@
import os
import sys
import time
import logging
import json
import kraken.node_actions.common_node_functions as nodeaction
from kraken.node_actions.abstract_node_scenarios import abstract_node_scenarios
from googleapiclient import discovery
@@ -10,11 +12,19 @@ from krkn_lib.k8s import KrknKubernetes
class GCP:
def __init__(self):
try:
gapp_creds = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
with open(gapp_creds, "r") as f:
f_str = f.read()
self.project = json.loads(f_str)['project_id']
#self.project = runcommand.invoke("gcloud config get-value project").split("/n")[0].strip()
logging.info("project " + str(self.project) + "!")
credentials = GoogleCredentials.get_application_default()
self.client = discovery.build("compute", "v1", credentials=credentials, cache_discovery=False)
self.project = runcommand.invoke("gcloud config get-value project").split("/n")[0].strip()
logging.info("project " + str(self.project) + "!")
credentials = GoogleCredentials.get_application_default()
self.client = discovery.build("compute", "v1", credentials=credentials, cache_discovery=False)
except Exception as e:
logging.error("Error on setting up GCP connection: " + str(e))
sys.exit(1)
# Get the instance ID of the node
def get_instance_id(self, node):

View File

@@ -53,7 +53,7 @@ class Plugins:
def unserialize_scenario(self, file: str) -> Any:
return serialization.load_from_file(abspath(file))
def run(self, file: str, kubeconfig_path: str, kraken_config: str):
def run(self, file: str, kubeconfig_path: str, kraken_config: str, run_uuid:str):
"""
Run executes a series of steps
"""
@@ -102,7 +102,8 @@ class Plugins:
unserialized_input.kubeconfig_path = kubeconfig_path
if "kraken_config" in step.schema.input.properties:
unserialized_input.kraken_config = kraken_config
output_id, output_data = step.schema(unserialized_input)
output_id, output_data = step.schema(params=unserialized_input, run_id=run_uuid)
logging.info(step.render_output(output_id, output_data) + "\n")
if output_id in step.error_output_ids:
raise Exception(
@@ -253,7 +254,8 @@ def run(scenarios: List[str],
failed_post_scenarios: List[str],
wait_duration: int,
telemetry: KrknTelemetryKubernetes,
kubecli: KrknKubernetes
kubecli: KrknKubernetes,
run_uuid: str
) -> (List[str], list[ScenarioTelemetry]):
scenario_telemetries: list[ScenarioTelemetry] = []
@@ -268,7 +270,7 @@ def run(scenarios: List[str],
try:
start_monitoring(pool, kill_scenarios)
PLUGINS.run(scenario, kubeconfig_path, kraken_config)
PLUGINS.run(scenario, kubeconfig_path, kraken_config, run_uuid)
result = pool.join()
scenario_telemetry.affected_pods = result
if result.error:

View File

@@ -1,7 +1,7 @@
aliyun-python-sdk-core==2.13.36
aliyun-python-sdk-ecs==4.24.25
arcaflow-plugin-sdk==0.14.0
arcaflow==0.17.2
arcaflow-plugin-sdk==0.10.0
boto3==1.28.61
azure-identity==1.16.1
azure-keyvault==4.2.0
@@ -15,27 +15,26 @@ google-api-python-client==2.116.0
ibm_cloud_sdk_core==3.18.0
ibm_vpc==0.20.0
jinja2==3.1.4
krkn-lib==2.1.3
krkn-lib==2.1.6
lxml==5.1.0
kubernetes==26.1.0
kubernetes==28.1.0
oauth2client==4.1.3
pandas==2.2.0
openshift-client==1.0.21
paramiko==3.4.0
podman-compose==1.0.6
pyVmomi==8.0.2.0.1
pyfiglet==1.0.2
pytest==8.0.0
python-ipmi==0.5.4
python-openstackclient==6.5.0
requests==2.32.0
requests==2.32.2
service_identity==24.1.0
PyYAML==6.0
PyYAML==6.0.1
setuptools==65.5.1
werkzeug==3.0.3
wheel==0.42.0
zope.interface==5.4.0
git+https://github.com/krkn-chaos/arcaflow-plugin-kill-pod.git
git+https://github.com/krkn-chaos/arcaflow-plugin-kill-pod.git@v0.1.0
git+https://github.com/vmware/vsphere-automation-sdk-python.git@v8.0.0.0
cryptography>=42.0.4 # not directly required, pinned by Snyk to avoid a vulnerability

View File

@@ -266,7 +266,8 @@ def main(cfg):
failed_post_scenarios,
wait_duration,
telemetry_k8s,
kubecli
kubecli,
run_uuid
)
chaos_telemetry.scenarios.extend(scenario_telemetries)
# krkn_lib

View File

@@ -39,7 +39,7 @@ class NetworkScenariosTest(unittest.TestCase):
def test_network_chaos(self):
output_id, output_data = ingress_shaping.network_chaos(
ingress_shaping.NetworkScenarioConfig(
params=ingress_shaping.NetworkScenarioConfig(
label_selector="node-role.kubernetes.io/control-plane",
instance_count=1,
network_params={
@@ -47,7 +47,8 @@ class NetworkScenariosTest(unittest.TestCase):
"loss": "0.02",
"bandwidth": "100mbit"
}
)
),
run_id="network-shaping-test"
)
if output_id == "error":
logging.error(output_data.error)

View File

@@ -10,7 +10,7 @@ class RunPythonPluginTest(unittest.TestCase):
tmp_file = tempfile.NamedTemporaryFile()
tmp_file.write(bytes("print('Hello world!')", 'utf-8'))
tmp_file.flush()
output_id, output_data = run_python_file(RunPythonFileInput(tmp_file.name))
output_id, output_data = run_python_file(params=RunPythonFileInput(tmp_file.name), run_id="test-python-plugin-success")
self.assertEqual("success", output_id)
self.assertEqual("Hello world!\n", output_data.stdout)
@@ -18,7 +18,7 @@ class RunPythonPluginTest(unittest.TestCase):
tmp_file = tempfile.NamedTemporaryFile()
tmp_file.write(bytes("import sys\nprint('Hello world!')\nsys.exit(42)\n", 'utf-8'))
tmp_file.flush()
output_id, output_data = run_python_file(RunPythonFileInput(tmp_file.name))
output_id, output_data = run_python_file(params=RunPythonFileInput(tmp_file.name), run_id="test-python-plugin-error")
self.assertEqual("error", output_id)
self.assertEqual(42, output_data.exit_code)
self.assertEqual("Hello world!\n", output_data.stdout)