Compare commits

20 Commits

Author SHA1 Message Date
Sahil Shah
dfc3a1d716 Adding http load scenario (#1160)
Signed-off-by: Sahil Shah <sahshah@redhat.com>
2026-04-09 10:47:50 -04:00
Paige Patton
0777ef924f changing pod recovery to vmi recovery
Signed-off-by: Paige Patton <prubenda@redhat.com>
2026-04-08 16:37:25 -04:00
Arpit Raj
1623dbac53 fix: resolve Python version mismatch in container entrypoint (#1222)
Signed-off-by: 1PoPTRoN <vrxn.arp1traj@gmail.com>
2026-04-08 09:00:37 -04:00
Arpit Raj
daa6dc4df9 fix: replace hardcoded /tmp paths with secure tempfile.mkdtemp() (#1223)
Signed-off-by: 1PoPTRoN <vrxn.arp1traj@gmail.com>
2026-04-07 10:55:46 -04:00
NITESH SINGH
9c064d888a fix(scenarios): fix network_chaos_ng variable shadowing and instance_count condition (#1219)
Signed-off-by: NETIZEN-11 <niteshkumar121411@gmail.com>
Co-authored-by: Paige Patton <64206430+paigerube14@users.noreply.github.com>
2026-04-01 08:54:29 -04:00
NITESH SINGH
b3e9ea1c3b fix(utils): fix HealthChecker bool comparisons and add missing return value (#1216)
- Replace '!= False' with 'is not False' and '== True' with 'is True'
  for idiomatic Python bool identity checks
- Add missing 'return self.ret_value' so callers receive the exit code
  instead of always getting None
- Add Apache 2.0 license header

Signed-off-by: NETIZEN-11 <niteshkumar121411@gmail.com>
Co-authored-by: Paige Patton <64206430+paigerube14@users.noreply.github.com>
2026-04-01 08:53:21 -04:00
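The two fixes named in this commit message (identity checks instead of equality checks on booleans, and the missing `return`) can be illustrated with a minimal sketch. The class and function names below are hypothetical stand-ins for illustration, not the project's `HealthChecker` code.

```python
# Hypothetical stand-in for illustration only; not the project's HealthChecker.
class HealthCheckerSketch:
    def __init__(self, ret_value):
        self.ret_value = ret_value

    def run_without_return(self):
        # Bug pattern: the value is evaluated but never returned,
        # so every caller receives None.
        self.ret_value

    def run_with_return(self):
        # Fixed pattern: callers receive the stored exit code.
        return self.ret_value


def is_false_by_equality(status):
    # In Python, 0 == False is True, so an integer status of 0
    # is indistinguishable from the bool False here.
    return status == False  # noqa: E712


def is_false_by_identity(status):
    # Only the bool singleton False matches; the integer 0 does not.
    return status is False
```

With an integer status of `0`, the equality check reports a match while the identity check does not, which is why the two forms are not interchangeable when a value may be either an `int` or a `bool`.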
Arpit Raj
9f417d8f1a fix: log exception details in delete_job to surface get_job_pods errors (#1220)
Signed-off-by: 1PoPTRoN <vrxn.arp1traj@gmail.com>
2026-04-01 08:52:51 -04:00
Paige Patton
ef50aa8c83 adding licsense to files (#1215)
Signed-off-by: Paige Patton <prubenda@redhat.com>
2026-03-31 15:19:19 -04:00
Paige Patton
357889196a Adding node interface down/up scenario' (#1192)
* Adding node interface down/up scenario'

Signed-off-by: Paige Patton <prubenda@redhat.com>

* Trigger CI

---------

Signed-off-by: Paige Patton <prubenda@redhat.com>
2026-03-31 13:59:41 -04:00
Paige Patton
35ee9d7bae adding changes to properly pass/fail a scenario if errors occur (#1065)
Signed-off-by: Paige Patton <prubenda@redhat.com>
2026-03-31 12:31:25 -04:00
Paige Patton
626e203d33 removing kubernetes functions (#1205)
Signed-off-by: Paige Patton <prubenda@redhat.com>
2026-03-31 08:46:40 -04:00
NITESH SINGH
8c57b0956b fix(rollback): use == instead of is for boolean check in execute_roll… (#1207)
* fix(rollback): use == instead of is for boolean check in execute_rollback_version_files

Signed-off-by: Nitesh <nitesh@example.com>

* fix(rollback): fix indentation in execute_rollback_version_files and add license header

Signed-off-by: Nitesh <nitesh@example.com>

---------

Signed-off-by: Nitesh <nitesh@example.com>
Co-authored-by: Nitesh <nitesh@example.com>
Co-authored-by: Paige Patton <64206430+paigerube14@users.noreply.github.com>
2026-03-30 10:56:36 -04:00
Paige Patton
d55695f7c4 adding pre commit hook (#1206)
Signed-off-by: Paige Patton <prubenda@redhat.com>
2026-03-30 09:49:16 -04:00
Paige Patton
71bd34b020 adding better logging for when sceanrio file cant be found (#1203)
Signed-off-by: Paige Patton <prubenda@redhat.com>
2026-03-27 13:47:49 -04:00
Paige Patton
6da7c9dec6 adding governance template from cncf (#926)
Signed-off-by: Paige Patton <prubenda@redhat.com>
2026-03-27 09:33:00 -04:00
Tullio Sebastiani
4d5aea146d Run method fixes (#1202)
* kubevirt plugin fixes

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* managed_cluster plugin fixes

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

* unit tests fix

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>

---------

Signed-off-by: Tullio Sebastiani <tsebasti@redhat.com>
2026-03-27 14:31:19 +01:00
Yashasvi Yadav
62f500fb2e feat: add GCP zone outage rollback support (#1200)
Add rollback functionality for GCP zone outage scenarios following the
established rollback pattern (Service Hijacking, PVC, Syn Flood).

- Add @set_rollback_context_decorator to run()
- Set rollback callable before stopping nodes with base64/JSON encoded data
- Add rollback_gcp_zone_outage() static method with per-node error handling
- Fix missing poll_interval argument in starmap calls
- Add unit tests for rollback and run methods

Closes #915

Signed-off-by: YASHASVIYADAV30 <yashasviydv30@gmail.com>
Co-authored-by: Paige Patton <64206430+paigerube14@users.noreply.github.com>
2026-03-26 14:42:45 -04:00
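The commit message mentions base64/JSON encoded rollback data and per-node error handling. A rough sketch of that general idea follows. The payload fields (`zone`, `nodes`), the function names, and the `start_node` callback are all assumptions made for illustration; the actual `rollback_gcp_zone_outage()` implementation is not shown on this page.

```python
import base64
import json
import logging


def encode_rollback_payload(data: dict) -> str:
    """Serialize rollback data to JSON, then base64-encode it as ASCII text."""
    raw = json.dumps(data, sort_keys=True).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_rollback_payload(encoded: str) -> dict:
    """Reverse of encode_rollback_payload."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


def rollback_nodes(encoded: str, start_node) -> list:
    """Try to restart every node; one failure does not stop the others.

    `start_node` is a hypothetical callable taking a node name.
    Returns the list of nodes that could not be restarted.
    """
    payload = decode_rollback_payload(encoded)
    failed = []
    for node in payload["nodes"]:
        try:
            start_node(node)
        except Exception as exc:
            logging.error("rollback failed for node %s: %s", node, exc)
            failed.append(node)
    return failed
```

Encoding the payload as a single ASCII string keeps it easy to store in a rollback version file, and collecting failures per node lets the rollback continue past an individual error.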
Arpit Raj
ec241d35d6 fix: improve logging reliability and code quality (#1199)
- Fix typo 'wating' -> 'waiting' in scenario wait log message
- Replace print() with logging.debug() for pod metrics in prometheus client
- Replace star import with explicit imports in utils/__init__.py
- Remove unnecessary global declaration in main()
- Log VM status exceptions at ERROR level with exception details

Include unit tests in tests/test_logging_and_code_quality.py covering all fixes.

Signed-off-by: 1PoPTRoN <vrxn.arp1traj@gmail.com>
Co-authored-by: Paige Patton <64206430+paigerube14@users.noreply.github.com>
2026-03-26 13:08:56 -04:00
Arpit Raj
59e10d5a99 fix: bind exception variable in except handlers to prevent NameError (#1198)
Signed-off-by: 1PoPTRoN <vrxn.arp1traj@gmail.com>
Co-authored-by: Paige Patton <64206430+paigerube14@users.noreply.github.com>
2026-03-26 09:43:37 -04:00
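The failure mode this commit title describes can be reproduced in a few lines. The functions below are hypothetical and exist only to show the pattern: an `except` clause that does not bind the exception, followed by a reference to a name that was never defined.

```python
import logging


def risky():
    raise ValueError("boom")


def handler_unbound():
    try:
        risky()
    except ValueError:
        # Bug pattern: 'e' was never bound by the except clause,
        # so this line raises NameError and hides the original error.
        logging.error("operation failed: %s", e)


def handler_bound():
    try:
        risky()
    except ValueError as e:
        # Fixed pattern: the exception is bound and can be logged.
        logging.error("operation failed: %s", e)
        return str(e)
```

Calling `handler_unbound()` raises `NameError` from inside the handler, whereas `handler_bound()` logs the original error message.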
Paige Patton
c8aa959df2 controller -> detailed (#1201)
Signed-off-by: Paige Patton <prubenda@redhat.com>
2026-03-26 08:47:06 -04:00
148 changed files with 5083 additions and 1200 deletions

.gitignore vendored
@@ -55,7 +55,7 @@ MANIFEST
# Per-project virtualenvs
.venv*/
venv*/
kraken.report
*.report
collected-metrics/*
inspect.local.*

.pre-commit-config.yaml Normal file
@@ -0,0 +1,9 @@
repos:
- repo: local
hooks:
- id: check-license-header
name: Check Apache 2.0 license header
language: python
entry: python scripts/check_license.py
types: [python]
exclude: ^tests/|/test_|^CI/
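The hook above runs `scripts/check_license.py`, whose contents are not shown in this diff. As a rough idea of what such a check could look like, here is a minimal sketch. The marker string, the 20-line window, and the function names are assumptions; pre-commit passes the matching file names as command-line arguments and treats a non-zero exit code as a failure.

```python
from pathlib import Path

# Assumption: one distinctive line of the Apache 2.0 header is enough to detect it.
LICENSE_MARKER = "Licensed under the Apache License, Version 2.0"


def has_license_header(text: str, max_lines: int = 20) -> bool:
    """Return True if the marker appears within the first `max_lines` lines."""
    head = "\n".join(text.splitlines()[:max_lines])
    return LICENSE_MARKER in head


def main(paths) -> int:
    """Return 1 if any file lacks the header, else 0 (the hook's exit code)."""
    missing = [
        p for p in paths
        if not has_license_header(Path(p).read_text(encoding="utf-8"))
    ]
    for p in missing:
        print(f"missing license header: {p}")
    return 1 if missing else 0
```

A real script would end by calling `sys.exit(main(sys.argv[1:]))` so that the hook fails when a header is missing.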

@@ -7,7 +7,7 @@ kraken:
signal_address: 0.0.0.0 # Signal listening address
port: 8081 # Signal port
auto_rollback: True # Enable auto rollback for scenarios.
rollback_versions_directory: /tmp/kraken-rollback # Directory to store rollback version files.
rollback_versions_directory: # Directory to store rollback version files. If empty, a secure temp directory is created automatically.
chaos_scenarios: # List of policies/chaos scenarios to load.
- $scenario_type: # List of chaos pod scenarios to load.
- $scenario_file
@@ -42,7 +42,7 @@ telemetry:
prometheus_backup: True # enables/disables prometheus data collection
full_prometheus_backup: False # if is set to False only the /prometheus/wal folder will be downloaded.
backup_threads: 5 # number of telemetry download/upload threads
archive_path: /tmp # local path where the archive files will be temporarily stored
archive_path: # local path where the archive files will be temporarily stored. If empty, a secure temp directory is created automatically.
max_retries: 0 # maximum number of upload retries (if 0 will retry forever)
run_tag: '' # if set, this will be appended to the run folder in the bucket (useful to group the runs)
archive_size: 10000 # the size of the prometheus data archive size in KB. The lower the size of archive is
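The new comments state that an empty `rollback_versions_directory` or `archive_path` results in an automatically created secure temp directory. A minimal sketch of that fallback is shown below, assuming the empty YAML value is loaded as `None` (or an empty string). The function name and prefix are hypothetical; the project's actual resolution code is not part of this hunk.

```python
import os
import tempfile


def resolve_directory(configured, prefix="krkn-"):
    """Return the configured directory, or a fresh private temp directory.

    `configured` is the value read from the config file; an empty YAML
    value loads as None, which is treated the same as an empty string.
    """
    if configured:
        os.makedirs(configured, exist_ok=True)
        return configured
    # mkdtemp creates a uniquely named directory that is accessible
    # only to the creating user, unlike a fixed path under /tmp.
    return tempfile.mkdtemp(prefix=prefix)
```

Because `tempfile.mkdtemp()` picks an unpredictable name and restricts permissions, it avoids the collisions and pre-creation attacks that a hardcoded path such as `/tmp/kraken-rollback` is exposed to. The caller remains responsible for deleting the directory.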

@@ -7,7 +7,7 @@ kraken:
signal_address: 0.0.0.0 # Signal listening address
port: 8081 # Signal port
auto_rollback: True # Enable auto rollback for scenarios.
rollback_versions_directory: /tmp/kraken-rollback # Directory to store rollback version files.
rollback_versions_directory: # Directory to store rollback version files. If empty, a secure temp directory is created automatically.
chaos_scenarios: # List of policies/chaos scenarios to load.
- $scenario_type: # List of chaos pod scenarios to load.
- $scenario_file
@@ -42,7 +42,7 @@ telemetry:
prometheus_backup: True # enables/disables prometheus data collection
full_prometheus_backup: False # if is set to False only the /prometheus/wal folder will be downloaded.
backup_threads: 5 # number of telemetry download/upload threads
archive_path: /tmp # local path where the archive files will be temporarily stored
archive_path: # local path where the archive files will be temporarily stored. If empty, a secure temp directory is created automatically.
max_retries: 0 # maximum number of upload retries (if 0 will retry forever)
run_tag: '' # if set, this will be appended to the run folder in the bucket (useful to group the runs)
archive_size: 10000 # the size of the prometheus data archive size in KB. The lower the size of archive is

@@ -1,83 +1,148 @@
# Krkn Project Governance
Krkn is a chaos and resiliency testing tool for Kubernetes that injects deliberate failures into clusters to validate their resilience under turbulent conditions. This governance document explains how the project is run.
- [Values](#values)
- [Community Roles](#community-roles)
- [Becoming a Maintainer](#becoming-a-maintainer)
- [Removing a Maintainer](#removing-a-maintainer)
- [Meetings](#meetings)
- [CNCF Resources](#cncf-resources)
- [Code of Conduct](#code-of-conduct)
- [Security Response Team](#security-response-team)
- [Voting](#voting)
- [Modifying this Charter](#modifying-this-charter)
The governance model adopted here is heavily influenced by a set of CNCF projects, especially drew
reference from [Kubernetes governance](https://github.com/kubernetes/community/blob/master/governance.md).
*For similar structures some of the same wordings from kubernetes governance are borrowed to adhere
to the originally construed meaning.*
## Values
## Principles
Krkn and its leadership embrace the following values:
- **Open**: Krkn is open source community.
- **Welcoming and respectful**: See [Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md).
- **Transparent and accessible**: Work and collaboration should be done in public.
Changes to the Krkn organization, Krkn code repositories, and CNCF related activities (e.g.
level, involvement, etc) are done in public.
- **Merit**: Ideas and contributions are accepted according to their technical merit
and alignment with project objectives, scope and design principles.
* **Openness**: Communication and decision-making happens in the open and is discoverable for future reference. As much as possible, all discussions and work take place in public forums and open repositories.
* **Fairness**: All stakeholders have the opportunity to provide feedback and submit contributions, which will be considered on their merits.
* **Community over Product or Company**: Sustaining and growing our community takes priority over shipping code or sponsors' organizational goals. Each contributor participates in the project as an individual.
* **Inclusivity**: We innovate through different perspectives and skill sets, which can only be accomplished in a welcoming and respectful environment.
* **Participation**: Responsibilities within the project are earned through participation, and there is a clear path up the contributor ladder into leadership positions.
## Community Roles
Krkn uses a tiered contributor model. Each level comes with increasing responsibilities and privileges.
### Contributor
Anyone can become a contributor by participating in discussions, reporting bugs, or submitting code or documentation.
**Responsibilities:**
- Adhere to the [Code of Conduct](CODE_OF_CONDUCT.md)
- Report bugs and suggest new features
- Contribute high-quality code and documentation
### Member
Members are active contributors who have demonstrated a solid understanding of the project's codebase and conventions.
**Responsibilities:**
- Review pull requests for correctness, quality, and adherence to project standards
- Provide constructive and timely feedback to contributors
- Ensure contributions are well-tested and documented
- Work with maintainers to support a smooth release process
### Maintainer
Maintainers are responsible for the overall health and direction of the project. They have write access to the [project GitHub repository](https://github.com/krkn-chaos/krkn) and can merge patches from themselves or others. The current maintainers are listed in [MAINTAINERS.md](./MAINTAINERS.md).
Maintainers collectively form the **Maintainer Council**, the governing body for the project.
A maintainer is not just someone who can make changes — they are someone who has demonstrated the ability to collaborate with the team, get the right people to review code and docs, contribute high-quality work, and follow through to fix issues.
**Responsibilities:**
- Set the technical direction and vision for the project
- Manage releases and ensure stability of the main branch
- Make decisions on feature inclusion and project priorities
- Mentor contributors and help grow the community
- Resolve disputes and make final decisions when consensus cannot be reached
### Owner
Owners have administrative access to the project and are the final decision-makers.
**Responsibilities:**
- Manage the core team of maintainers
- Set the overall vision and strategy for the project
- Handle administrative tasks such as managing the repository and other resources
- Represent the project in the broader open-source community
## Becoming a Maintainer
To become a Maintainer you need to demonstrate the following:
- **Commitment to the project:**
- Participate in discussions, contributions, code and documentation reviews for 3 months or more
- Perform reviews for at least 5 non-trivial pull requests
- Contribute at least 3 non-trivial pull requests that have been merged
- Ability to write quality code and/or documentation
- Ability to collaborate effectively with the team
- Understanding of how the team works (policies, processes for testing and code review, etc.)
- Understanding of the project's codebase and coding and documentation style
A new Maintainer must be proposed by an existing Maintainer by sending a message to the [maintainer mailing list](mailto:krkn.maintainers@gmail.com). A simple majority vote of existing Maintainers approves the application. Nominations will be evaluated without prejudice to employer or demographics.
Maintainers who are approved will be granted the necessary GitHub rights and invited to the [maintainer mailing list](mailto:krkn.maintainers@gmail.com).
## Removing a Maintainer
Maintainers may resign at any time if they feel they will not be able to continue fulfilling their project duties.
Maintainers may also be removed for inactivity, failure to fulfill their responsibilities, violating the Code of Conduct, or other reasons. Inactivity is defined as a period of very low or no activity in the project for a year or more, with no definite schedule to return to full Maintainer activity.
A Maintainer may be removed at any time by a 2/3 vote of the remaining Maintainers.
Depending on the reason for removal, a Maintainer may be converted to **Emeritus** status. Emeritus Maintainers will still be consulted on some project matters and can be rapidly returned to Maintainer status if their availability changes.
## Meetings
Maintainers are expected to participate in the public developer meeting, which occurs **once a month via Zoom**. Meeting details (link, agenda, and notes) are posted in the [#krkn channel on Kubernetes Slack](https://kubernetes.slack.com/messages/C05SFMHRWK1) prior to each meeting.
Maintainers will also hold closed meetings to discuss security reports or Code of Conduct violations. Such meetings should be scheduled by any Maintainer on receipt of a security issue or CoC report. All current Maintainers must be invited to such closed meetings, except for any Maintainer who is accused of a CoC violation.
## CNCF Resources
Any Maintainer may suggest a request for CNCF resources, either on the [mailing list](mailto:krkn.maintainers@gmail.com) or during a monthly meeting. A simple majority of Maintainers approves the request. The Maintainers may also choose to delegate working with the CNCF to non-Maintainer community members, who will then be added to the [CNCF's Maintainer List](https://github.com/cncf/foundation/blob/main/project-maintainers.csv) for that purpose.
## Code of Conduct
Krkn follows the [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md).
Here is an excerpt:
> As contributors and maintainers of this project, and in the interest of fostering an open and welcoming community, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
> As contributors and maintainers of this project, and in the interest of fostering an open and welcoming community, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
## Maintainer Levels
Code of Conduct violations by community members will be discussed and resolved on the [private maintainer mailing list](mailto:krkn.maintainers@gmail.com). If a Maintainer is directly involved in the report, two Maintainers will instead be designated to work with the CNCF Code of Conduct Committee in resolving it.
### Contributor
Contributors contribute to the community. Anyone can become a contributor by participating in discussions, reporting bugs, or contributing code or documentation.
## Security Response Team
#### Responsibilities:
The Maintainers will appoint a Security Response Team to handle security reports. This committee may consist of the Maintainer Council itself. If this responsibility is delegated, the Maintainers will appoint a team of at least two contributors to handle it. The Maintainers will review the composition of this team at least once a year.
Be active in the community and adhere to the Code of Conduct.
The Security Response Team is responsible for handling all reports of security holes and breaches according to the [security policy](SECURITY.md).
Report bugs and suggest new features.
To report a security vulnerability, please follow the process outlined in [SECURITY.md](SECURITY.md) rather than filing a public GitHub issue.
Contribute high-quality code and documentation.
## Voting
While most business in Krkn is conducted by "[lazy consensus](https://community.apache.org/committers/lazyConsensus.html)", periodically the Maintainers may need to vote on specific actions or changes. Any Maintainer may demand a vote be taken.
### Member
Members are active contributors to the community. Members have demonstrated a strong understanding of the project's codebase and conventions.
Votes on general project matters may be raised on the [maintainer mailing list](mailto:krkn.maintainers@gmail.com) or during a monthly meeting. Votes on security vulnerabilities or Code of Conduct violations must be conducted exclusively on the [private maintainer mailing list](mailto:krkn.maintainers@gmail.com) or in a closed Maintainer meeting, in order to prevent accidental public disclosure of sensitive information.
#### Responsibilities:
Most votes require a **simple majority** of all Maintainers to succeed, except where otherwise noted. Two-thirds majority votes mean at least two-thirds of all existing Maintainers.
Review pull requests for correctness, quality, and adherence to project standards.
| Action | Required Vote |
|--------|--------------|
| Adding a new Maintainer | Simple majority |
| Removing a Maintainer | 2/3 majority |
| Approving CNCF resource requests | Simple majority |
| Modifying this charter | 2/3 majority |
Provide constructive and timely feedback to contributors.
## Modifying this Charter
Ensure that all contributions are well-tested and documented.
Work with maintainers to ensure a smooth and efficient release process.
### Maintainer
Maintainers are responsible for the overall health and direction of the project. They are long-standing contributors who have shown a deep commitment to the project's success.
#### Responsibilities:
Set the technical direction and vision for the project.
Manage releases and ensure the stability of the main branch.
Make decisions on feature inclusion and project priorities.
Mentor other contributors and help grow the community.
Resolve disputes and make final decisions when consensus cannot be reached.
### Owner
Owners have administrative access to the project and are the final decision-makers.
#### Responsibilities:
Manage the core team of maintainers and approvers.
Set the overall vision and strategy for the project.
Handle administrative tasks, such as managing the project's repository and other resources.
Represent the project in the broader open-source community.
# Credits
Sections of this document have been borrowed from [Kubernetes governance](https://github.com/kubernetes/community/blob/master/governance.md)
Changes to this Governance document and its supporting documents may be approved by a 2/3 vote of the Maintainers.

@@ -15,7 +15,7 @@ For detailed description of the roles, see [Governance](./GOVERNANCE.md) page.
| Pradeep Surisetty | [psuriset](https://github.com/psuriset) | psuriset@redhat.com | Owner |
| Paige Patton | [paigerube14](https://github.com/paigerube14) | prubenda@redhat.com | Maintainer |
| Tullio Sebastiani | [tsebastiani](https://github.com/tsebastiani) | tsebasti@redhat.com | Maintainer |
| Yogananth Subramanian | [yogananth-subramanian](https://github.com/yogananth-subramanian) | ysubrama@redhat.com |Maintainer |
| Yogananth Subramanian | [yogananth-subramanian](https://github.com/yogananth-subramanian) | ysubrama@redhat.com | Maintainer |
| Sahil Shah | [shahsahil264](https://github.com/shahsahil264) | sahshah@redhat.com | Member |
@@ -32,3 +32,64 @@ The roles are:
* Maintainer: A contributor who is responsible for the overall health and direction of the project.
* Owner: A contributor who has administrative ownership of the project.
## Maintainer Levels
### Contributor
Contributors contribute to the community. Anyone can become a contributor by participating in discussions, reporting bugs, or contributing code or documentation.
#### Responsibilities:
Be active in the community and adhere to the Code of Conduct.
Report bugs and suggest new features.
Contribute high-quality code and documentation.
### Member
Members are active contributors to the community. Members have demonstrated a strong understanding of the project's codebase and conventions.
#### Responsibilities:
Review pull requests for correctness, quality, and adherence to project standards.
Provide constructive and timely feedback to contributors.
Ensure that all contributions are well-tested and documented.
Work with maintainers to ensure a smooth and efficient release process.
### Maintainer
Maintainers are responsible for the overall health and direction of the project. They are long-standing contributors who have shown a deep commitment to the project's success.
#### Responsibilities:
Set the technical direction and vision for the project.
Manage releases and ensure the stability of the main branch.
Make decisions on feature inclusion and project priorities.
Mentor other contributors and help grow the community.
Resolve disputes and make final decisions when consensus cannot be reached.
### Owner
Owners have administrative access to the project and are the final decision-makers.
#### Responsibilities:
Manage the core team of maintainers and approvers.
Set the overall vision and strategy for the project.
Handle administrative tasks, such as managing the project's repository and other resources.
Represent the project in the broader open-source community.
## Email
If you'd like to contact the krkn maintainers about a specific issue you're having, please reach out to us at krkn.maintainers@gmail.com.

@@ -2,7 +2,7 @@ kraken:
kubeconfig_path: ~/.kube/config # Path to kubeconfig
exit_on_failure: False # Exit when a post action scenario fails
auto_rollback: True # Enable auto rollback for scenarios.
rollback_versions_directory: /tmp/kraken-rollback # Directory to store rollback version files.
rollback_versions_directory: # Directory to store rollback version files. If empty, a secure temp directory is created automatically.
publish_kraken_status: True # Can be accessed at http://0.0.0.0:8081
signal_state: RUN # Will wait for the RUN signal when set to PAUSE before running the scenarios, refer docs/signal.md for more details
signal_address: 0.0.0.0 # Signal listening address
@@ -52,11 +52,14 @@ kraken:
- scenarios/kube/node-network-filter.yml
- scenarios/kube/node-network-chaos.yml
- scenarios/kube/pod-network-chaos.yml
- scenarios/kube/node_interface_down.yaml
- kubevirt_vm_outage:
- scenarios/kubevirt/kubevirt-vm-outage.yaml
- http_load_scenarios:
- scenarios/kube/http_load_scenario.yml
resiliency:
resiliency_run_mode: standalone # Options: standalone, controller, disabled
resiliency_run_mode: standalone # Options: standalone, detailed, disabled
resiliency_file: config/alerts.yaml # Path to SLO definitions, will resolve to performance_monitoring: alert_profile: if not specified
cerberus:
@@ -100,7 +103,7 @@ telemetry:
prometheus_pod_name: "" # name of the prometheus pod (if distribution is kubernetes)
full_prometheus_backup: False # if is set to False only the /prometheus/wal folder will be downloaded.
backup_threads: 5 # number of telemetry download/upload threads
archive_path: /tmp # local path where the archive files will be temporarily stored
archive_path: # local path where the archive files will be temporarily stored. If empty, a secure temp directory is created automatically.
max_retries: 0 # maximum number of upload retries (if 0 will retry forever)
run_tag: '' # if set, this will be appended to the run folder in the bucket (useful to group the runs)
archive_size: 500000

@@ -32,7 +32,7 @@ tunings:
telemetry:
enabled: False # enable/disables the telemetry collection feature
archive_path: /tmp # local path where the archive files will be temporarily stored
archive_path: # local path where the archive files will be temporarily stored. If empty, a secure temp directory is created automatically.
events_backup: False # enables/disables cluster events collection
logs_backup: False

@@ -61,7 +61,7 @@ telemetry:
prometheus_backup: True # enables/disables prometheus data collection
full_prometheus_backup: False # if is set to False only the /prometheus/wal folder will be downloaded.
backup_threads: 5 # number of telemetry download/upload threads
archive_path: /tmp # local path where the archive files will be temporarily stored
archive_path: # local path where the archive files will be temporarily stored. If empty, a secure temp directory is created automatically.
max_retries: 0 # maximum number of upload retries (if 0 will retry forever)
run_tag: '' # if set, this will be appended to the run folder in the bucket (useful to group the runs)
archive_size: 500000 # the size of the prometheus data archive size in KB. The lower the size of archive is

@@ -33,6 +33,8 @@ RUN go mod edit -go 1.24.9 &&\
FROM fedora:40
ARG PR_NUMBER
ARG TAG
ARG PYTHON_VERSION=3.11
ENV PYTHON_CMD=python${PYTHON_VERSION}
RUN groupadd -g 1001 krkn && useradd -m -u 1001 -g krkn krkn
RUN dnf update -y
@@ -41,7 +43,7 @@ ENV KUBECONFIG /home/krkn/.kube/config
# This overwrites any existing configuration in /etc/yum.repos.d/kubernetes.repo
RUN dnf update && dnf install -y --setopt=install_weak_deps=False \
git python3.11 jq yq gettext wget which ipmitool openssh-server &&\
git python${PYTHON_VERSION} jq yq gettext wget which ipmitool openssh-server &&\
dnf clean all
# copy oc client binary from oc-build image
@@ -63,15 +65,15 @@ RUN if [ -n "$PR_NUMBER" ]; then git fetch origin pull/${PR_NUMBER}/head:pr-${PR
# if it is a TAG trigger checkout the tag
RUN if [ -n "$TAG" ]; then git checkout "$TAG";fi
RUN python3.11 -m ensurepip --upgrade --default-pip
RUN python3.11 -m pip install --upgrade pip setuptools==78.1.1
RUN ${PYTHON_CMD} -m ensurepip --upgrade --default-pip
RUN ${PYTHON_CMD} -m pip install --upgrade pip setuptools==78.1.1
# removes the vulnerable versions of setuptools and pip
RUN rm -rf "$(pip cache dir)"
RUN rm -rf /tmp/*
RUN rm -rf /usr/local/lib/python3.11/ensurepip/_bundled
RUN pip3.11 install -r requirements.txt
RUN pip3.11 install jsonschema
RUN rm -rf /usr/local/lib/${PYTHON_CMD}/ensurepip/_bundled
RUN ${PYTHON_CMD} -m pip install -r requirements.txt
RUN ${PYTHON_CMD} -m pip install jsonschema
LABEL krknctl.title.global="Krkn Base Image"
LABEL krknctl.description.global="This is the krkn base image."

@@ -5,4 +5,4 @@ set -e
# Change to kraken directory
# Execute the main command
exec python3.9 run_kraken.py "$@"
exec "${PYTHON_CMD:-python3}" run_kraken.py "$@"
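The entrypoint previously hardcoded `python3.9`, while the Dockerfile hunk above installs `python3.11` by default. The replacement line uses the shell expansion `${PYTHON_CMD:-python3}`, which falls back to `python3` when `PYTHON_CMD` is unset or empty. The same selection rule, written in Python purely for illustration (the function name is hypothetical):

```python
import os


def resolve_python_cmd(env=None):
    """Mirror the shell expansion ${PYTHON_CMD:-python3}.

    The ':-' form uses the default when the variable is unset
    or set to an empty string, so both cases fall back here.
    """
    env = os.environ if env is None else env
    return env.get("PYTHON_CMD") or "python3"
```

Inside the image, `ENV PYTHON_CMD=python${PYTHON_VERSION}` sets the variable, so the entrypoint runs the same interpreter the dependencies were installed for. The fallback only applies when the script runs outside that environment.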

@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

@@ -1 +1,14 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .setup import *

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import requests
import sys

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .analysis import *
from .kraken_tests import *
from .prometheus import *

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import pandas as pd

View File

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
def get_entries_by_category(filename, category):
# Read the file
with open(filename, "r") as file:

View File

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from prometheus_api_client import PrometheusConnect

View File

@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import subprocess
import logging
import sys

View File

@@ -1 +1,14 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .client import *

View File

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
import datetime
@@ -251,7 +264,17 @@ def metrics(
for k,v in pod.items():
metric[k] = v
metric['timestamp'] = str(datetime.datetime.now())
- print('adding pod' + str(metric))
+ logging.debug("adding pod %s", metric)
metrics_list.append(metric.copy())
for k,v in scenario.get("affected_vmis", {}).items():
metric_name = "affected_vmis_recovery"
metric = {"metricName": metric_name, "type": k}
if type(v) is list:
for vmi in v:
for k,v in vmi.items():
metric[k] = v
metric['timestamp'] = str(datetime.datetime.now())
logging.debug("adding vmi %s", metric)
metrics_list.append(metric.copy())
for affected_node in scenario["affected_nodes"]:
metric_name = "affected_nodes_recovery"
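Review note: the added `affected_vmis` loop turns each status bucket into one metric record per VMI. A minimal standalone sketch of that flattening follows. The bucket names (`recovered`, `unrecovered`, `error`) and the `vmi_name` / `namespace` fields are assumptions made for illustration, not taken from this diff. Unlike the hunk, which reuses one `metric` dict per bucket, the sketch builds a fresh dict for every VMI so keys cannot carry over from one VMI to the next.

```python
import datetime
from typing import Any, Dict, List


def flatten_affected_vmis(scenario: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one metric record per VMI, tagged with its status bucket."""
    records: List[Dict[str, Any]] = []
    for status, vmis in scenario.get("affected_vmis", {}).items():
        if not isinstance(vmis, list):
            continue  # non-list values are skipped, as in the hunk
        for vmi in vmis:
            record = {"metricName": "affected_vmis_recovery", "type": status}
            record.update(vmi)  # copy every VMI field into the record
            record["timestamp"] = str(datetime.datetime.now())
            records.append(record)
    return records


# Hypothetical telemetry fragment, for illustration only.
sample = {
    "affected_vmis": {
        "recovered": [{"vmi_name": "vm-a", "namespace": "demo"}],
        "unrecovered": [],
        "error": None,
    }
}
records = flatten_affected_vmis(sample)
```

Using `scenario.get("affected_vmis", {})` keeps the loop a no-op for scenarios that report no VMIs, which matches the hunk.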

View File

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
import datetime

View File

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""krkn.resiliency package public interface."""
from .resiliency import Resiliency # noqa: F401

View File

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Resiliency evaluation orchestrator for Krkn chaos runs.
This module provides the `Resiliency` class which loads the canonical
@@ -306,7 +319,7 @@ class Resiliency:
prom_cli: Pre-configured KrknPrometheus instance.
total_start_time: Start time for the full test window.
total_end_time: End time for the full test window.
- run_mode: "controller" or "standalone" mode.
+ run_mode: "detailed" or "standalone" mode.
Returns:
(detailed_report)

View File

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
from typing import Dict, List, Tuple

View File

@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import logging
from typing import Optional, TYPE_CHECKING

View File

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
from dataclasses import dataclass

View File

@@ -1,3 +1,19 @@
#!/usr/bin/env python
#
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations
import logging
@@ -132,8 +148,8 @@ def execute_rollback_version_files(
:param ignore_auto_rollback_config: Flag to ignore auto rollback configuration. Will be set to True for manual execute-rollback calls.
"""
if not ignore_auto_rollback_config and RollbackConfig().auto is False:
- logger.warning(f"Auto rollback is disabled, skipping execution for run_uuid={run_uuid or '*'}, scenario_type={scenario_type or '*'}")
- return
+ logger.warning(f"Auto rollback is disabled, skipping execution for run_uuid={run_uuid or '*'}, scenario_type={scenario_type or '*'}")
+ return
# Get the rollback versions directory
version_files = RollbackConfig.search_rollback_version_files(run_uuid, scenario_type)

View File

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import inspect
import os
import logging

View File

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Dict, Any, Optional
import threading
import signal

View File

@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -1,4 +1,18 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import os
import time
from abc import ABC, abstractmethod
from krkn_lib.models.telemetry import ScenarioTelemetry
@@ -86,6 +100,16 @@ class AbstractScenarioPlugin(ABC):
scenario_telemetry.scenario = scenario_config
scenario_telemetry.scenario_type = self.get_scenario_types()[0]
scenario_telemetry.start_timestamp = time.time()
if not os.path.exists(scenario_config):
logging.error(
f"scenario file not found: '{scenario_config}' -- "
f"check that the path is correct relative to the working directory: {os.getcwd()}"
)
failed_scenarios.append(scenario_config)
scenario_telemetry.exit_status = 1
scenario_telemetry.end_timestamp = time.time()
scenario_telemetries.append(scenario_telemetry)
continue
parsed_scenario_config = telemetry.set_parameters_base64(
scenario_telemetry, scenario_config
)
@@ -147,7 +171,7 @@ class AbstractScenarioPlugin(ABC):
failed_scenarios.append(scenario_config)
scenario_telemetries.append(scenario_telemetry)
cerberus.publish_kraken_status(start_time,end_time)
- logging.info(f"wating {wait_duration} before running the next scenario")
+ logging.info(f"waiting {wait_duration} before running the next scenario")
time.sleep(wait_duration)
return failed_scenarios, scenario_telemetries
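Review note: the new guard in `run_scenarios` records a missing scenario file as a failure and moves on to the next one instead of raising. The sketch below isolates only the existence check. The helper name `split_existing` is hypothetical and the telemetry bookkeeping from the hunk is deliberately left out.

```python
import logging
import os
import tempfile
from typing import List, Tuple


def split_existing(paths: List[str]) -> Tuple[List[str], List[str]]:
    """Partition scenario paths into (runnable, missing)."""
    runnable: List[str] = []
    missing: List[str] = []
    for path in paths:
        if os.path.exists(path):
            runnable.append(path)
        else:
            # Same information as the hunk's message: the path and the cwd.
            logging.error(
                "scenario file not found: '%s' (working directory: %s)",
                path,
                os.getcwd(),
            )
            missing.append(path)
    return runnable, missing


# One real temporary file and one path that does not exist.
with tempfile.NamedTemporaryFile(suffix=".yaml") as tmp:
    runnable, missing = split_existing([tmp.name, tmp.name + ".does-not-exist"])
```

Reporting the working directory alongside the path helps when a relative path in the config is resolved from an unexpected location.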

View File

@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import time
import yaml

View File

@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import random
import time

View File

@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import copy
import logging
import queue

View File

@@ -0,0 +1,563 @@
import base64
import json
import logging
import time
from typing import Dict, List, Any
import yaml
from krkn_lib.models.telemetry import ScenarioTelemetry
from krkn_lib.telemetry.ocp import KrknTelemetryOpenshift
from krkn_lib.utils import get_random_string
from krkn.scenario_plugins.abstract_scenario_plugin import AbstractScenarioPlugin
from krkn.rollback.config import RollbackContent
from krkn.rollback.handler import set_rollback_context_decorator
class HttpLoadScenarioPlugin(AbstractScenarioPlugin):
"""
HTTP Load Testing Scenario Plugin using Vegeta.
Deploys Vegeta load testing pods inside the Kubernetes cluster for distributed
HTTP load testing. Supports multiple concurrent pods, node affinity, authentication,
and comprehensive results collection.
"""
def __init__(self, scenario_type: str = "http_load_scenarios"):
super().__init__(scenario_type=scenario_type)
@set_rollback_context_decorator
def run(
self,
run_uuid: str,
scenario: str,
lib_telemetry: KrknTelemetryOpenshift,
scenario_telemetry: ScenarioTelemetry,
) -> int:
"""
Main entry point for HTTP load scenario execution.
Deploys Vegeta load testing pods inside the cluster for distributed load testing.
:param run_uuid: Unique identifier for this chaos run
:param scenario: Path to scenario configuration file
:param lib_telemetry: Telemetry object for Kubernetes operations
:param scenario_telemetry: Telemetry object for this scenario
:return: 0 on success, 1 on failure
"""
try:
# Load scenario configuration
with open(scenario, "r") as f:
scenario_configs = yaml.full_load(f)
if not scenario_configs:
logging.error("Empty scenario configuration file")
return 1
# Process each scenario configuration
for scenario_config in scenario_configs:
if not isinstance(scenario_config, dict):
logging.error(f"Invalid scenario configuration format: {scenario_config}")
return 1
# Get the http_load_scenario configuration
config = scenario_config.get("http_load_scenario", scenario_config)
# Validate configuration
if not self._validate_config(config):
return 1
# Execute the load test (deploy pods)
result = self._execute_distributed_load_test(
config,
lib_telemetry,
scenario_telemetry
)
if result != 0:
return result
logging.info("HTTP load test completed successfully")
return 0
except Exception as e:
logging.error(f"HTTP load scenario failed with exception: {e}")
import traceback
logging.error(traceback.format_exc())
return 1
def get_scenario_types(self) -> list[str]:
"""Return the scenario types this plugin handles."""
return ["http_load_scenarios"]
def _validate_config(self, config: Dict[str, Any]) -> bool:
"""
Validate scenario configuration.
:param config: Scenario configuration dictionary
:return: True if valid, False otherwise
"""
# Check for required fields
if "targets" not in config:
logging.error("Missing required field: targets")
return False
targets = config["targets"]
# Validate targets configuration
if "endpoints" not in targets:
logging.error("targets must contain 'endpoints'")
return False
if "endpoints" in targets:
endpoints = targets["endpoints"]
if not isinstance(endpoints, list) or len(endpoints) == 0:
logging.error("endpoints must be a non-empty list")
return False
# Validate each endpoint
for idx, endpoint in enumerate(endpoints):
if not isinstance(endpoint, dict):
logging.error(f"Endpoint {idx} must be a dictionary")
return False
if "url" not in endpoint:
logging.error(f"Endpoint {idx} missing required field: url")
return False
if "method" not in endpoint:
logging.error(f"Endpoint {idx} missing required field: method")
return False
# Validate rate format
if "rate" in config:
rate = config["rate"]
if not isinstance(rate, (str, int)):
logging.error("rate must be a string (e.g., '200/1s') or integer")
return False
# Validate duration format
if "duration" in config:
duration = config["duration"]
if not isinstance(duration, (str, int)):
logging.error("duration must be a string (e.g., '30s') or integer")
return False
return True
def _execute_distributed_load_test(
self,
config: Dict[str, Any],
lib_telemetry: KrknTelemetryOpenshift,
scenario_telemetry: ScenarioTelemetry
) -> int:
"""
Execute distributed HTTP load test by deploying Vegeta pods.
:param config: Scenario configuration
:param lib_telemetry: Telemetry object for Kubernetes operations
:param scenario_telemetry: Telemetry object for recording results
:return: 0 on success, 1 on failure
"""
pod_names = []
namespace = config.get("namespace", "default")
try:
# Get number of pods to deploy
number_of_pods = config.get("number-of-pods", 1)
# Get container image
image = config.get("image", "quay.io/krkn-chaos/krkn-http-load:latest")
# Get endpoints
endpoints = config.get("targets", {}).get("endpoints", [])
if not endpoints:
logging.error("No endpoints specified in targets")
return 1
# Build Vegeta JSON targets for all endpoints (round-robin)
targets_json = self._build_vegeta_json_targets(endpoints)
targets_json_base64 = base64.b64encode(targets_json.encode()).decode()
target_urls = [ep["url"] for ep in endpoints]
logging.info(f"Targeting {len(endpoints)} endpoint(s): {target_urls}")
# Get node selectors for pod placement
node_selectors = config.get("attacker-nodes")
# Deploy multiple Vegeta pods
logging.info(f"Deploying {number_of_pods} HTTP load testing pod(s)")
for i in range(number_of_pods):
pod_name = f"http-load-{get_random_string(10)}"
logging.info(f"Deploying pod {i+1}/{number_of_pods}: {pod_name}")
# Deploy pod using krkn-lib
lib_telemetry.get_lib_kubernetes().deploy_http_load(
name=pod_name,
namespace=namespace,
image=image,
targets_json_base64=targets_json_base64,
duration=config.get("duration", "30s"),
rate=config.get("rate", "50/1s"),
workers=config.get("workers", 10),
max_workers=config.get("max_workers", 100),
connections=config.get("connections", 100),
timeout=config.get("timeout", "10s"),
keepalive=config.get("keepalive", True),
http2=config.get("http2", True),
insecure=config.get("insecure", False),
node_selectors=node_selectors,
timeout_sec=500
)
pod_names.append(pod_name)
# Set rollback callable for pod cleanup
rollback_data = base64.b64encode(json.dumps(pod_names).encode('utf-8')).decode('utf-8')
self.rollback_handler.set_rollback_callable(
self.rollback_http_load_pods,
RollbackContent(
namespace=namespace,
resource_identifier=rollback_data,
),
)
logging.info(f"Successfully deployed {len(pod_names)} HTTP load pod(s)")
# Wait for all pods to complete
logging.info("Waiting for all HTTP load pods to complete...")
self._wait_for_pods_completion(pod_names, namespace, lib_telemetry, config)
# Collect and aggregate results from all pods
metrics = self._collect_and_aggregate_results(pod_names, namespace, lib_telemetry)
if metrics:
# Log metrics summary
self._log_metrics_summary(metrics)
# Store metrics in telemetry
scenario_telemetry.additional_telemetry = metrics
logging.info("HTTP load test completed successfully")
return 0
except Exception as e:
logging.error(f"Error executing distributed load test: {e}")
import traceback
logging.error(traceback.format_exc())
return 1
def _build_vegeta_json_targets(self, endpoints: List[Dict[str, Any]]) -> str:
"""
Build newline-delimited Vegeta JSON targets from all endpoints.
Vegeta round-robins across targets when multiple are provided.
Each line is a JSON object: {"method":"GET","url":"...","header":{...},"body":"base64..."}
:param endpoints: List of endpoint configurations
:return: Newline-delimited JSON string
"""
lines = []
for ep in endpoints:
target = {
"method": ep.get("method", "GET"),
"url": ep["url"],
}
# Add headers
if "headers" in ep and ep["headers"]:
target["header"] = {k: [v] for k, v in ep["headers"].items()}
# Add body (base64 encoded as Vegeta JSON format expects)
if "body" in ep and ep["body"]:
target["body"] = base64.b64encode(ep["body"].encode()).decode()
lines.append(json.dumps(target, separators=(",", ":")))
return "\n".join(lines)
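Review note: a standalone round trip can make the target encoding easier to check. The sketch below mirrors the method as a free function (`build_targets` is a hypothetical name) and decodes its own output again. The example URLs and payload are invented for illustration.

```python
import base64
import json
from typing import Any, Dict, List


def build_targets(endpoints: List[Dict[str, Any]]) -> str:
    """Encode endpoints as newline-delimited Vegeta JSON targets."""
    lines = []
    for ep in endpoints:
        target: Dict[str, Any] = {"method": ep.get("method", "GET"), "url": ep["url"]}
        if ep.get("headers"):
            # Vegeta expects each header value as a list of strings.
            target["header"] = {k: [v] for k, v in ep["headers"].items()}
        if ep.get("body"):
            # The JSON target format carries the body base64-encoded.
            target["body"] = base64.b64encode(ep["body"].encode()).decode()
        lines.append(json.dumps(target, separators=(",", ":")))
    return "\n".join(lines)


endpoints = [
    {"url": "http://example.invalid/health", "method": "GET"},
    {
        "url": "http://example.invalid/items",
        "method": "POST",
        "headers": {"Content-Type": "application/json"},
        "body": '{"name":"x"}',
    },
]
ndjson = build_targets(endpoints)
decoded = [json.loads(line) for line in ndjson.splitlines()]
```

Because each target is serialized onto a single line, the whole string can be base64-encoded once more and handed to the pod as one value, as the caller does with `targets_json_base64`.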
def _wait_for_pods_completion(
self,
pod_names: List[str],
namespace: str,
lib_telemetry: KrknTelemetryOpenshift,
config: Dict[str, Any]
):
"""
Wait for all HTTP load pods to complete.
:param pod_names: List of pod names to wait for
:param namespace: Namespace where pods are running
:param lib_telemetry: Telemetry object for Kubernetes operations
:param config: Scenario configuration
"""
lib_k8s = lib_telemetry.get_lib_kubernetes()
finished_pods = []
did_finish = False
# Calculate max wait time (duration + buffer)
duration_str = config.get("duration", "30s")
max_wait = self._parse_duration_to_seconds(duration_str) + 60 # Add 60s buffer
start_time = time.time()
while not did_finish:
for pod_name in pod_names:
if pod_name not in finished_pods:
if not lib_k8s.is_pod_running(pod_name, namespace):
finished_pods.append(pod_name)
logging.info(f"Pod {pod_name} has completed")
if set(pod_names) == set(finished_pods):
did_finish = True
break
# Check timeout
if time.time() - start_time > max_wait:
logging.warning(f"Timeout waiting for pods to complete (waited {max_wait}s)")
break
time.sleep(5)
logging.info(f"{len(finished_pods)}/{len(pod_names)} pods have completed")
def _collect_and_aggregate_results(
self,
pod_names: List[str],
namespace: str,
lib_telemetry: KrknTelemetryOpenshift
) -> Dict[str, Any]:
"""
Collect results from all pods and aggregate metrics.
:param pod_names: List of pod names
:param namespace: Namespace where pods ran
:param lib_telemetry: Telemetry object for Kubernetes operations
:return: Aggregated metrics dictionary
"""
lib_k8s = lib_telemetry.get_lib_kubernetes()
all_metrics = []
logging.info("Collecting results from HTTP load pods...")
for pod_name in pod_names:
try:
# Read pod logs to get results
log_response = lib_k8s.get_pod_log(pod_name, namespace)
# Handle HTTPResponse object from kubernetes client
if hasattr(log_response, 'data'):
logs = log_response.data.decode('utf-8') if isinstance(log_response.data, bytes) else str(log_response.data)
elif hasattr(log_response, 'read'):
logs = log_response.read().decode('utf-8')
else:
logs = str(log_response)
# Parse JSON report from logs
metrics = self._parse_metrics_from_logs(logs)
if metrics:
all_metrics.append(metrics)
logging.info(f"Collected metrics from pod: {pod_name}")
else:
logging.warning(f"No metrics found in logs for pod: {pod_name}")
except Exception as e:
logging.warning(f"Failed to collect results from pod {pod_name}: {e}")
if not all_metrics:
logging.warning("No metrics collected from any pods")
return {}
# Aggregate metrics from all pods
aggregated = self._aggregate_metrics(all_metrics)
logging.info(f"Aggregated metrics from {len(all_metrics)} pod(s)")
return aggregated
def _parse_metrics_from_logs(self, logs: str) -> Dict[str, Any]:
"""
Parse Vegeta JSON metrics from pod logs.
:param logs: Pod logs
:return: Metrics dictionary or None
"""
try:
# Look for JSON report section in logs
for line in logs.split('\n'):
line = line.strip()
if line.startswith('{') and '"latencies"' in line:
return json.loads(line)
return None
except Exception as e:
logging.warning(f"Failed to parse metrics from logs: {e}")
return None
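Review note: the report is located by scanning the pod log for a line that starts with `{` and mentions `"latencies"`. The sketch below shows that scan on made-up log text. The log lines are assumptions, not real output of the load image. It differs from the method in one respect: the `try` wraps each line, so a single malformed line does not stop the scan.

```python
import json
from typing import Any, Dict, Optional


def find_report(logs: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON line that looks like a Vegeta report, else None."""
    for line in logs.splitlines():
        line = line.strip()
        if line.startswith("{") and '"latencies"' in line:
            try:
                return json.loads(line)
            except json.JSONDecodeError:
                continue  # malformed candidate, keep scanning
    return None


# Invented log text for illustration.
sample_logs = "\n".join(
    [
        "starting attack",
        '{"note": "not a report"}',
        '{"latencies": {"mean": 1500000}, "requests": 10, "success": 1.0}',
    ]
)
report = find_report(sample_logs)
```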
def _aggregate_metrics(self, metrics_list: List[Dict[str, Any]]) -> Dict[str, Any]:
"""
Aggregate metrics from multiple pods.
:param metrics_list: List of metrics dictionaries from each pod
:return: Aggregated metrics
"""
if not metrics_list:
return {}
# Sum totals
total_requests = sum(m.get("requests", 0) for m in metrics_list)
total_rate = sum(m.get("rate", 0) for m in metrics_list)
total_throughput = sum(m.get("throughput", 0) for m in metrics_list)
# Average latencies (weighted by request count)
latencies = {}
if total_requests > 0:
for percentile in ["mean", "50th", "95th", "99th", "max", "min"]:
weighted_sum = sum(
m.get("latencies", {}).get(percentile, 0) * m.get("requests", 0)
for m in metrics_list
)
latencies[percentile] = weighted_sum / total_requests
# Average success rate (weighted by request count)
total_success = sum(
m.get("success", 0) * m.get("requests", 0)
for m in metrics_list
)
success_rate = total_success / total_requests if total_requests > 0 else 0
# Aggregate status codes
status_codes = {}
for metrics in metrics_list:
for code, count in metrics.get("status_codes", {}).items():
status_codes[code] = status_codes.get(code, 0) + count
# Aggregate bytes
bytes_in_total = sum(m.get("bytes_in", {}).get("total", 0) for m in metrics_list)
bytes_out_total = sum(m.get("bytes_out", {}).get("total", 0) for m in metrics_list)
# Aggregate errors
all_errors = []
for metrics in metrics_list:
all_errors.extend(metrics.get("errors", []))
return {
"requests": total_requests,
"rate": total_rate,
"throughput": total_throughput,
"latencies": latencies,
"success": success_rate,
"status_codes": status_codes,
"bytes_in": {"total": bytes_in_total},
"bytes_out": {"total": bytes_out_total},
"errors": all_errors[:10], # First 10 errors only
"pod_count": len(metrics_list)
}
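Review note: a small worked example of the request-weighted averaging used for latencies. The numbers are invented. The sketch also applies the same weighting to `max`, as the method does, and compares it with the true maximum across pods. That suggests the aggregated `max`, `min` and percentile values are best read as approximations.

```python
from typing import Any, Dict, List


def weighted_latency(reports: List[Dict[str, Any]], key: str) -> float:
    """Average one latency field across pods, weighted by request count."""
    total = sum(r.get("requests", 0) for r in reports)
    if total == 0:
        return 0.0
    weighted_sum = sum(
        r.get("latencies", {}).get(key, 0) * r.get("requests", 0) for r in reports
    )
    return weighted_sum / total


# Two hypothetical per-pod reports; latencies are in nanoseconds.
reports = [
    {"requests": 100, "latencies": {"mean": 2_000_000, "max": 9_000_000}},
    {"requests": 300, "latencies": {"mean": 4_000_000, "max": 5_000_000}},
]

mean_ns = weighted_latency(reports, "mean")  # (100*2e6 + 300*4e6) / 400
weighted_max_ns = weighted_latency(reports, "max")  # (100*9e6 + 300*5e6) / 400
true_max_ns = max(r["latencies"]["max"] for r in reports)
```

Here the weighted mean is 3.5 ms, which is exact for a mean, while the weighted `max` is 6 ms against a true maximum of 9 ms.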
def _parse_duration_to_seconds(self, duration: str) -> int:
"""
Parse duration string to seconds.
:param duration: Duration string like "30s", "5m", "1h"
:return: Duration in seconds
"""
import re
match = re.match(r'^(\d+)(s|m|h)$', str(duration))
if not match:
logging.warning(f"Invalid duration format: {duration}, defaulting to 30s")
return 30
value = int(match.group(1))
unit = match.group(2)
multipliers = {
"s": 1,
"m": 60,
"h": 3600,
}
return value * multipliers.get(unit, 1)
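Review note: the sketch below applies the same regular expression to a few inputs as a free function (`duration_to_seconds` is a hypothetical name). It shows which strings are accepted and which fall back to the 30 second default. Bare integers, which `_validate_config` accepts, and compound values such as `1m30s` both take the fallback path with this pattern.

```python
import re
from typing import Union

_UNITS = {"s": 1, "m": 60, "h": 3600}


def duration_to_seconds(duration: Union[str, int], default: int = 30) -> int:
    """Convert '30s', '5m' or '1h' to seconds; anything else yields the default."""
    match = re.match(r"^(\d+)(s|m|h)$", str(duration))
    if not match:
        return default
    return int(match.group(1)) * _UNITS[match.group(2)]


examples = {d: duration_to_seconds(d) for d in ("30s", "5m", "1h", "90", "1m30s")}
integer_input = duration_to_seconds(120)  # no unit suffix, so the default applies
```

Since the wait loop uses this value plus a 60 second buffer as its timeout, a duration that takes the fallback path caps the wait at 90 seconds regardless of how long the load pods actually run.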
@staticmethod
def rollback_http_load_pods(
rollback_content: RollbackContent,
lib_telemetry: KrknTelemetryOpenshift
):
"""
Rollback function to delete HTTP load pods.
:param rollback_content: Rollback content containing namespace and pod names
:param lib_telemetry: Instance of KrknTelemetryOpenshift for Kubernetes operations
"""
try:
namespace = rollback_content.namespace
pod_names = json.loads(
base64.b64decode(rollback_content.resource_identifier.encode('utf-8')).decode('utf-8')
)
logging.info(f"Rolling back HTTP load pods: {pod_names} in namespace: {namespace}")
for pod_name in pod_names:
try:
lib_telemetry.get_lib_kubernetes().delete_pod(pod_name, namespace)
logging.info(f"Deleted pod: {pod_name}")
except Exception as e:
logging.warning(f"Failed to delete pod {pod_name}: {e}")
logging.info("Rollback of HTTP load pods completed")
except Exception as e:
logging.error(f"Failed to rollback HTTP load pods: {e}")
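The rollback above expects resource_identifier to hold a base64-encoded JSON list of pod names. The encoding side is not shown in this part of the diff, so the sketch below assumes it is the exact mirror of the decode step; encode_pod_names and decode_pod_names are hypothetical helper names.

```python
import base64
import json
from typing import List


def encode_pod_names(pod_names: List[str]) -> str:
    """Serialize pod names to a base64 string (assumed mirror of the decode step)."""
    raw = json.dumps(pod_names).encode("utf-8")
    return base64.b64encode(raw).decode("utf-8")


def decode_pod_names(resource_identifier: str) -> List[str]:
    """Recover the pod name list, using the same steps as the rollback function."""
    raw = base64.b64decode(resource_identifier.encode("utf-8"))
    return json.loads(raw.decode("utf-8"))
```

Keeping the identifier as a single opaque string lets one rollback entry cover any number of load generator pods.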
def _log_metrics_summary(self, metrics: Dict[str, Any]):
"""Log summary of test metrics."""
logging.info("=" * 60)
logging.info("HTTP Load Test Results Summary (Aggregated)")
logging.info("=" * 60)
# Pod count
pod_count = metrics.get("pod_count", 1)
logging.info(f"Load Generator Pods: {pod_count}")
# Request statistics
requests = metrics.get("requests", 0)
logging.info(f"Total Requests: {requests}")
# Success rate
success = metrics.get("success", 0.0)
logging.info(f"Success Rate: {success * 100:.2f}%")
# Latency statistics
latencies = metrics.get("latencies", {})
if latencies:
logging.info(f"Latency Mean: {latencies.get('mean', 0) / 1e6:.2f} ms")
logging.info(f"Latency P50: {latencies.get('50th', 0) / 1e6:.2f} ms")
logging.info(f"Latency P95: {latencies.get('95th', 0) / 1e6:.2f} ms")
logging.info(f"Latency P99: {latencies.get('99th', 0) / 1e6:.2f} ms")
logging.info(f"Latency Max: {latencies.get('max', 0) / 1e6:.2f} ms")
# Throughput
throughput = metrics.get("throughput", 0.0)
logging.info(f"Total Throughput: {throughput:.2f} req/s")
# Bytes
bytes_in = metrics.get("bytes_in", {})
bytes_out = metrics.get("bytes_out", {})
if bytes_in:
logging.info(f"Bytes In (total): {bytes_in.get('total', 0) / 1024 / 1024:.2f} MB")
if bytes_out:
logging.info(f"Bytes Out (total): {bytes_out.get('total', 0) / 1024 / 1024:.2f} MB")
# Status codes
status_codes = metrics.get("status_codes", {})
if status_codes:
logging.info("Status Code Distribution:")
for code, count in sorted(status_codes.items()):
logging.info(f" {code}: {count}")
# Errors
errors = metrics.get("errors", [])
if errors:
logging.warning(f"Errors encountered: {len(errors)}")
for error in errors[:5]: # Show first 5 errors
logging.warning(f" - {error}")
logging.info("=" * 60)


@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


@@ -1,15 +1,27 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import time
from typing import Dict, Any, Optional
from typing import Dict, Any
import random
import re
import yaml
from kubernetes.client.rest import ApiException
from krkn_lib.k8s import KrknKubernetes
from krkn_lib.models.telemetry import ScenarioTelemetry
from krkn_lib.telemetry.ocp import KrknTelemetryOpenshift
from krkn_lib.utils import log_exception
from krkn_lib.models.k8s import AffectedPod, PodsStatus
from krkn_lib.models.k8s import AffectedVMI, VmisStatus
from krkn.scenario_plugins.abstract_scenario_plugin import AbstractScenarioPlugin
@@ -35,7 +47,6 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
self,
run_uuid: str,
scenario: str,
krkn_config: dict[str, any],
lib_telemetry: KrknTelemetryOpenshift,
scenario_telemetry: ScenarioTelemetry,
) -> int:
@@ -48,19 +59,19 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
scenario_config = yaml.full_load(f)
self.init_clients(lib_telemetry.get_lib_kubernetes())
pods_status = PodsStatus()
vmis_status = VmisStatus()
for config in scenario_config["scenarios"]:
if config.get("scenario") == "kubevirt_vm_outage":
single_pods_status = self.execute_scenario(config, scenario_telemetry)
pods_status.merge(single_pods_status)
single_vmis_status = self.execute_scenario(config, scenario_telemetry)
vmis_status.merge(single_vmis_status)
scenario_telemetry.affected_pods = pods_status
if len(scenario_telemetry.affected_pods.unrecovered) > 0:
scenario_telemetry.affected_vmis = vmis_status
if len(scenario_telemetry.affected_vmis.unrecovered) > 0:
return 1
return 0
except Exception as e:
logging.error(f"KubeVirt VM Outage scenario failed: {e}")
log_exception(e)
log_exception(str(e))
return 1
def init_clients(self, k8s_client: KrknKubernetes):
@@ -72,15 +83,15 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
logging.info("Successfully initialized Kubernetes client for KubeVirt operations")
def execute_scenario(self, config: Dict[str, Any], scenario_telemetry: ScenarioTelemetry) -> PodsStatus:
def execute_scenario(self, config: Dict[str, Any], scenario_telemetry: ScenarioTelemetry) -> VmisStatus:
"""
Execute a KubeVirt VM outage scenario based on the provided configuration.
:param config: The scenario configuration
:param scenario_telemetry: The telemetry object for recording metrics
:return: PodsStatus object containing recovered and unrecovered pods
:return: VmisStatus object containing recovered and unrecovered VMIs
"""
self.pods_status = PodsStatus()
self.vmis_status = VmisStatus()
try:
params = config.get("parameters", {})
vm_name = params.get("vm_name")
@@ -91,8 +102,8 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
if not vm_name:
logging.error("vm_name parameter is required")
return self.pods_status
self.pods_status = PodsStatus()
return self.vmis_status
self.vmis_status = VmisStatus()
self.vmis_list = self.k8s_client.get_vmis(vm_name,namespace)
for _ in range(kill_count):
@@ -103,48 +114,48 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
vmi_name = vmi.get("metadata").get("name")
vmi_namespace = vmi.get("metadata").get("namespace")
# Create affected_pod early so we can track failures
self.affected_pod = AffectedPod(
pod_name=vmi_name,
# Create affected_vmi early so we can track failures
self.affected_vmi = AffectedVMI(
vmi_name=vmi_name,
namespace=vmi_namespace,
)
if not self.validate_environment(vmi_name, vmi_namespace):
self.pods_status.unrecovered.append(self.affected_pod)
self.vmis_status.unrecovered.append(self.affected_vmi)
continue
vmi = self.k8s_client.get_vmi(vmi_name, vmi_namespace)
if not vmi:
logging.error(f"VMI {vm_name} not found in namespace {namespace}")
self.pods_status.unrecovered.append(self.affected_pod)
self.vmis_status.unrecovered.append(self.affected_vmi)
continue
self.original_vmi = vmi
logging.info(f"Captured initial state of VMI: {vm_name}")
result = self.delete_vmi(vmi_name, vmi_namespace, disable_auto_restart)
if result != 0:
self.pods_status.unrecovered.append(self.affected_pod)
self.vmis_status.unrecovered.append(self.affected_vmi)
continue
result = self.wait_for_running(vmi_name,vmi_namespace, timeout)
if result != 0:
self.pods_status.unrecovered.append(self.affected_pod)
self.vmis_status.unrecovered.append(self.affected_vmi)
continue
self.affected_pod.total_recovery_time = (
self.affected_pod.pod_readiness_time
+ self.affected_pod.pod_rescheduling_time
self.affected_vmi.total_recovery_time = (
self.affected_vmi.vmi_readiness_time
+ self.affected_vmi.vmi_rescheduling_time
)
self.pods_status.recovered.append(self.affected_pod)
self.vmis_status.recovered.append(self.affected_vmi)
logging.info(f"Successfully completed KubeVirt VM outage scenario for VM: {vm_name}")
return self.pods_status
return self.vmis_status
except Exception as e:
logging.error(f"Error executing KubeVirt VM outage scenario: {e}")
log_exception(e)
return self.pods_status
log_exception(str(e))
return self.vmis_status
def validate_environment(self, vm_name: str, namespace: str) -> bool:
"""
@@ -231,20 +242,20 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
if deleted_vmi:
if start_creation_time != deleted_vmi.get('metadata', {}).get('creationTimestamp'):
logging.info(f"VMI {vm_name} successfully recreated")
self.affected_pod.pod_rescheduling_time = time.time() - start_time
self.affected_vmi.vmi_rescheduling_time = time.time() - start_time
return 0
else:
logging.info(f"VMI {vm_name} successfully deleted")
time.sleep(1)
logging.error(f"Timed out waiting for VMI {vm_name} to be deleted")
self.pods_status.unrecovered.append(self.affected_pod)
self.vmis_status.unrecovered.append(self.affected_vmi)
return 1
except Exception as e:
logging.error(f"Error deleting VMI {vm_name}: {e}")
log_exception(e)
self.pods_status.unrecovered.append(self.affected_pod)
log_exception(str(e))
self.vmis_status.unrecovered.append(self.affected_vmi)
return 1
def wait_for_running(self, vm_name: str, namespace: str, timeout: int = 120) -> int:
@@ -257,7 +268,7 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
if vmi:
if vmi.get('status', {}).get('phase') == "Running":
end_time = time.time()
self.affected_pod.pod_readiness_time = end_time - start_time
self.affected_vmi.vmi_readiness_time = end_time - start_time
logging.info(f"VMI {vm_name} is already running")
return 0
@@ -304,7 +315,7 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
except Exception as e:
logging.error(f"Error recreating VMI {vm_name}: {e}")
log_exception(e)
log_exception(str(e))
return 1
else:
logging.error(f"Failed to recover VMI {vm_name}: No original state captured and auto-recovery did not occur")
@@ -312,5 +323,5 @@ class KubevirtVmOutageScenarioPlugin(AbstractScenarioPlugin):
except Exception as e:
logging.error(f"Unexpected error recovering VMI {vm_name}: {e}")
log_exception(e)
log_exception(str(e))
return 1
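The bookkeeping in this file follows one pattern: each VMI is recorded as recovered or unrecovered, the recovery time is the sum of the rescheduling and readiness times, and the scenario returns 1 if anything is left unrecovered. The sketch below illustrates that pattern with stand-in dataclasses. AffectedVmiSketch and VmisStatusSketch are hypothetical and only mimic the fields used in the diff; they are not the krkn_lib models.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AffectedVmiSketch:
    """Stand-in with the fields the diff uses; not the krkn_lib AffectedVMI."""
    vmi_name: str
    namespace: str
    vmi_rescheduling_time: float = 0.0
    vmi_readiness_time: float = 0.0
    total_recovery_time: float = 0.0


@dataclass
class VmisStatusSketch:
    """Stand-in for the status container; not the krkn_lib VmisStatus."""
    recovered: List[AffectedVmiSketch] = field(default_factory=list)
    unrecovered: List[AffectedVmiSketch] = field(default_factory=list)

    def merge(self, other: "VmisStatusSketch") -> None:
        self.recovered.extend(other.recovered)
        self.unrecovered.extend(other.unrecovered)


def mark_recovered(status: VmisStatusSketch, vmi: AffectedVmiSketch) -> None:
    # Total recovery time is readiness time plus rescheduling time, as in the diff.
    vmi.total_recovery_time = vmi.vmi_readiness_time + vmi.vmi_rescheduling_time
    status.recovered.append(vmi)


def exit_code(status: VmisStatusSketch) -> int:
    # Any unrecovered VMI makes the scenario fail.
    return 1 if len(status.unrecovered) > 0 else 0
```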


@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import random
import logging
from krkn_lib.k8s import KrknKubernetes


@@ -1,5 +1,17 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import time
import yaml
from krkn_lib.k8s import KrknKubernetes
@@ -27,23 +39,27 @@ class ManagedClusterScenarioPlugin(AbstractScenarioPlugin):
lib_telemetry.get_lib_kubernetes()
)
if managedcluster_scenario["actions"]:
for action in managedcluster_scenario["actions"]:
start_time = int(time.time())
try:
self.inject_managedcluster_scenario(
action,
managedcluster_scenario,
managedcluster_scenario_object,
lib_telemetry.get_lib_kubernetes(),
)
except Exception as e:
logging.error(
"ManagedClusterScenarioPlugin exiting due to Exception %s"
% e
)
return 1
else:
return 0
try:
self.inject_managedcluster_scenario(
action,
managedcluster_scenario,
managedcluster_scenario_object,
lib_telemetry.get_lib_kubernetes(),
)
except Exception as e:
logging.error(
"ManagedClusterScenarioPlugin exiting due to Exception %s"
% e
)
return 1
else:
logging.error(
"ManagedClusterScenarioPlugin: 'actions' must be defined and non-empty in the scenario config"
)
return 1
return 0
def inject_managedcluster_scenario(
self,
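The behavioral change in this hunk is that a missing or empty actions list is now treated as a configuration error and returns 1, instead of returning 0 without doing anything. The sketch below isolates that control flow. run_actions and the inject callable are hypothetical names, and the sketch uses dict.get so that a missing key is handled the same way as an empty list, which is an assumption rather than what the diff does.

```python
import logging
from typing import Any, Callable, Dict


def run_actions(scenario: Dict[str, Any], inject: Callable[[str], None]) -> int:
    """Run every configured action; return 1 on bad config or on the first failure."""
    actions = scenario.get("actions")
    if not actions:
        logging.error("'actions' must be defined and non-empty in the scenario config")
        return 1
    for action in actions:
        try:
            inject(action)
        except Exception as e:
            logging.error("exiting due to Exception %s", e)
            return 1
    return 0
```

Returning 0 only after the loop finishes means every action is attempted in order and the first failure stops the run.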


@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from jinja2 import Environment, FileSystemLoader
import os
import time


@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from krkn.scenario_plugins.abstract_scenario_plugin import AbstractScenarioPlugin
from krkn.scenario_plugins.native.plugins import PLUGINS
from krkn_lib.models.telemetry import ScenarioTelemetry


@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass, field
import yaml
import logging
@@ -8,11 +21,9 @@ import re
import random
from traceback import format_exc
from jinja2 import Environment, FileSystemLoader
from . import kubernetes_functions as kube_helper
from krkn_lib.k8s import KrknKubernetes
import typing
from arcaflow_plugin_sdk import validation, plugin
from kubernetes.client.api.core_v1_api import CoreV1Api as CoreV1Api
from kubernetes.client.api.batch_v1_api import BatchV1Api as BatchV1Api
@dataclass
@@ -150,7 +161,7 @@ class NetworkScenarioErrorOutput:
)
def get_default_interface(node: str, pod_template, cli: CoreV1Api, image: str) -> str:
def get_default_interface(node: str, pod_template, kubecli: KrknKubernetes, image: str) -> str:
"""
Function that returns a random interface from a node
@@ -162,20 +173,20 @@ def get_default_interface(node: str, pod_template, cli: CoreV1Api, image: str) -
- The YAML template used to instantiate a pod to query
the node's interface
cli (CoreV1Api)
- Object to interact with Kubernetes Python client's CoreV1 API
kubecli (KrknKubernetes)
- Object to interact with Kubernetes Python client
Returns:
Default interface (string) belonging to the node
"""
pod_name_regex = str(random.randint(0, 10000))
pod_body = yaml.safe_load(pod_template.render(regex_name=pod_name_regex,nodename=node, image=image))
pod_body = yaml.safe_load(pod_template.render(regex_name=pod_name_regex, nodename=node, image=image))
logging.info("Creating pod to query interface on node %s" % node)
kube_helper.create_pod(cli, pod_body, "default", 300)
kubecli.create_pod(pod_body, "default", 300)
pod_name = f"fedtools-{pod_name_regex}"
try:
cmd = ["ip", "r"]
output = kube_helper.exec_cmd_in_pod(cli, cmd, pod_name, "default")
output = kubecli.exec_cmd_in_pod(cmd, pod_name, "default")
if not output:
logging.error("Exception occurred while executing command in pod")
@@ -191,13 +202,13 @@ def get_default_interface(node: str, pod_template, cli: CoreV1Api, image: str) -
finally:
logging.info("Deleting pod to query interface on node")
kube_helper.delete_pod(cli, pod_name, "default")
kubecli.delete_pod(pod_name, "default")
return interfaces
def verify_interface(
input_interface_list: typing.List[str], node: str, pod_template, cli: CoreV1Api, image: str
input_interface_list: typing.List[str], node: str, pod_template, kubecli: KrknKubernetes, image: str
) -> typing.List[str]:
"""
Function that verifies whether a list of interfaces is present in the node.
@@ -214,21 +225,21 @@ def verify_interface(
- The YAML template used to instantiate a pod to query
the node's interfaces
cli (CoreV1Api)
- Object to interact with Kubernetes Python client's CoreV1 API
kubecli (KrknKubernetes)
- Object to interact with Kubernetes Python client
Returns:
The interface list for the node
"""
pod_name_regex = str(random.randint(0, 10000))
pod_body = yaml.safe_load(pod_template.render(regex_name=pod_name_regex,nodename=node, image=image))
pod_body = yaml.safe_load(pod_template.render(regex_name=pod_name_regex, nodename=node, image=image))
logging.info("Creating pod to query interface on node %s" % node)
kube_helper.create_pod(cli, pod_body, "default", 300)
kubecli.create_pod(pod_body, "default", 300)
pod_name = f"fedtools-{pod_name_regex}"
try:
if input_interface_list == []:
cmd = ["ip", "r"]
output = kube_helper.exec_cmd_in_pod(cli, cmd, pod_name, "default")
output = kubecli.exec_cmd_in_pod(cmd, pod_name, "default")
if not output:
logging.error("Exception occurred while executing command in pod")
@@ -244,7 +255,7 @@ def verify_interface(
else:
cmd = ["ip", "-br", "addr", "show"]
output = kube_helper.exec_cmd_in_pod(cli, cmd, pod_name, "default")
output = kubecli.exec_cmd_in_pod(cmd, pod_name, "default")
if not output:
logging.error("Exception occurred while executing command in pod")
@@ -267,7 +278,7 @@ def verify_interface(
)
finally:
logging.info("Deleting pod to query interface on node")
kube_helper.delete_pod(cli, pod_name, "default")
kubecli.delete_pod(pod_name, "default")
return input_interface_list
@@ -277,7 +288,7 @@ def get_node_interfaces(
label_selector: str,
instance_count: int,
pod_template,
cli: CoreV1Api,
kubecli: KrknKubernetes,
image: str
) -> typing.Dict[str, typing.List[str]]:
"""
@@ -305,8 +316,8 @@ def get_node_interfaces(
- The YAML template used to instantiate a pod to query
the node's interfaces
cli (CoreV1Api)
- Object to interact with Kubernetes Python client's CoreV1 API
kubecli (KrknKubernetes)
- Object to interact with Kubernetes Python client
Returns:
Filtered dictionary containing the test nodes and their test interfaces
@@ -317,22 +328,22 @@ def get_node_interfaces(
"If node names and interfaces aren't provided, "
"then the label selector must be provided"
)
nodes = kube_helper.get_node(None, label_selector, instance_count, cli)
nodes = kubecli.get_node(None, label_selector, instance_count)
node_interface_dict = {}
for node in nodes:
node_interface_dict[node] = get_default_interface(node, pod_template, cli, image)
node_interface_dict[node] = get_default_interface(node, pod_template, kubecli, image)
else:
node_name_list = node_interface_dict.keys()
filtered_node_list = []
for node in node_name_list:
filtered_node_list.extend(
kube_helper.get_node(node, label_selector, instance_count, cli)
kubecli.get_node(node, label_selector, instance_count)
)
for node in filtered_node_list:
node_interface_dict[node] = verify_interface(
node_interface_dict[node], node, pod_template, cli, image
node_interface_dict[node], node, pod_template, kubecli, image
)
return node_interface_dict
@@ -344,11 +355,10 @@ def apply_ingress_filter(
node: str,
pod_template,
job_template,
batch_cli: BatchV1Api,
cli: CoreV1Api,
kubecli: KrknKubernetes,
create_interfaces: bool = True,
param_selector: str = "all",
image:str = "quay.io/krkn-chaos/krkn:tools",
image: str = "quay.io/krkn-chaos/krkn:tools",
) -> str:
"""
Function that applies the filters to shape incoming traffic to
@@ -374,11 +384,8 @@ def apply_ingress_filter(
- The YAML template used to instantiate a job to apply and remove
the filters on the interfaces
batch_cli
- Object to interact with Kubernetes Python client's BatchV1 API
cli (CoreV1Api)
- Object to interact with Kubernetes Python client's CoreV1 API
kubecli (KrknKubernetes)
- Object to interact with Kubernetes Python client
param_selector (string)
- Used to specify what kind of filter to apply. Useful during
@@ -394,7 +401,7 @@ def apply_ingress_filter(
network_params = {param_selector: cfg.network_params[param_selector]}
if create_interfaces:
create_virtual_interfaces(cli, interface_list, node, pod_template, image)
create_virtual_interfaces(kubecli, interface_list, node, pod_template, image)
exec_cmd = get_ingress_cmd(
interface_list, network_params, duration=cfg.test_duration
@@ -403,7 +410,7 @@ def apply_ingress_filter(
job_body = yaml.safe_load(
job_template.render(jobname=str(hash(node))[:5], nodename=node, image=image, cmd=exec_cmd)
)
api_response = kube_helper.create_job(batch_cli, job_body)
api_response = kubecli.create_job(job_body)
if api_response is None:
raise Exception("Error creating job")
@@ -412,15 +419,15 @@ def apply_ingress_filter(
def create_virtual_interfaces(
cli: CoreV1Api, interface_list: typing.List[str], node: str, pod_template, image: str
kubecli: KrknKubernetes, interface_list: typing.List[str], node: str, pod_template, image: str
) -> None:
"""
Function that creates a privileged pod and uses it to create
virtual interfaces on the node
Args:
cli (CoreV1Api)
- Object to interact with Kubernetes Python client's CoreV1 API
kubecli (KrknKubernetes)
- Object to interact with Kubernetes Python client
interface_list (List of strings)
- The list of interfaces on the node for which virtual interfaces
@@ -434,37 +441,34 @@ def create_virtual_interfaces(
virtual interfaces on the node
"""
pod_name_regex = str(random.randint(0, 10000))
pod_body = yaml.safe_load(pod_template.render(regex_name=pod_name_regex,nodename=node, image=image))
kube_helper.create_pod(cli, pod_body, "default", 300)
pod_body = yaml.safe_load(pod_template.render(regex_name=pod_name_regex, nodename=node, image=image))
kubecli.create_pod(pod_body, "default", 300)
logging.info(
"Creating {0} virtual interfaces on node {1} using a pod".format(
len(interface_list), node
)
)
pod_name = f"modtools-{pod_name_regex}"
create_ifb(cli, len(interface_list), pod_name)
create_ifb(kubecli, len(interface_list), pod_name)
logging.info("Deleting pod used to create virtual interfaces")
kube_helper.delete_pod(cli, pod_name, "default")
kubecli.delete_pod(pod_name, "default")
def delete_virtual_interfaces(
cli: CoreV1Api, node_list: typing.List[str], pod_template, image: str
kubecli: KrknKubernetes, node_list: typing.List[str], pod_template, image: str
):
"""
Function that creates a privileged pod and uses it to delete all
virtual interfaces on the specified nodes
Args:
cli (CoreV1Api)
- Object to interact with Kubernetes Python client's CoreV1 API
kubecli (KrknKubernetes)
- Object to interact with Kubernetes Python client
node_list (List of strings)
- The list of nodes on which the list of virtual interfaces are
to be deleted
node (string)
- The node on which the virtual interfaces are created
pod_template (jinja2.environment.Template)
- The YAML template used to instantiate a pod to delete
virtual interfaces on the node
@@ -472,46 +476,45 @@ def delete_virtual_interfaces(
for node in node_list:
pod_name_regex = str(random.randint(0, 10000))
pod_body = yaml.safe_load(pod_template.render(regex_name=pod_name_regex,nodename=node, image=image))
kube_helper.create_pod(cli, pod_body, "default", 300)
pod_body = yaml.safe_load(pod_template.render(regex_name=pod_name_regex, nodename=node, image=image))
kubecli.create_pod(pod_body, "default", 300)
logging.info("Deleting all virtual interfaces on node {0}".format(node))
pod_name = f"modtools-{pod_name_regex}"
delete_ifb(cli, pod_name)
kube_helper.delete_pod(cli, pod_name, "default")
delete_ifb(kubecli, pod_name)
kubecli.delete_pod(pod_name, "default")
def create_ifb(cli: CoreV1Api, number: int, pod_name: str):
def create_ifb(kubecli: KrknKubernetes, number: int, pod_name: str):
"""
Function that creates virtual interfaces in a pod.
Makes use of modprobe commands
"""
exec_command = ["chroot", "/host", "modprobe", "ifb", "numifbs=" + str(number)]
kube_helper.exec_cmd_in_pod(cli, exec_command, pod_name, "default")
exec_command = ["/host", "modprobe", "ifb", "numifbs=" + str(number)]
kubecli.exec_cmd_in_pod(exec_command, pod_name, "default", base_command="chroot")
for i in range(0, number):
exec_command = ["chroot", "/host", "ip", "link", "set", "dev"]
exec_command += ["ifb" + str(i), "up"]
kube_helper.exec_cmd_in_pod(cli, exec_command, pod_name, "default")
exec_command = ["/host", "ip", "link", "set", "dev", "ifb" + str(i), "up"]
kubecli.exec_cmd_in_pod(exec_command, pod_name, "default", base_command="chroot")
def delete_ifb(cli: CoreV1Api, pod_name: str):
def delete_ifb(kubecli: KrknKubernetes, pod_name: str):
"""
Function that deletes all virtual interfaces in a pod.
Makes use of modprobe command
"""
exec_command = ["chroot", "/host", "modprobe", "-r", "ifb"]
kube_helper.exec_cmd_in_pod(cli, exec_command, pod_name, "default")
exec_command = ["/host", "modprobe", "-r", "ifb"]
kubecli.exec_cmd_in_pod(exec_command, pod_name, "default", base_command="chroot")
def get_job_pods(cli: CoreV1Api, api_response):
def get_job_pods(kubecli: KrknKubernetes, api_response):
"""
Function that gets the pod corresponding to the job
Args:
cli (CoreV1Api)
- Object to interact with Kubernetes Python client's CoreV1 API
kubecli (KrknKubernetes)
- Object to interact with Kubernetes Python client
api_response
- The API response for the job status
@@ -522,22 +525,22 @@ def get_job_pods(cli: CoreV1Api, api_response):
controllerUid = api_response.metadata.labels["controller-uid"]
pod_label_selector = "controller-uid=" + controllerUid
pods_list = kube_helper.list_pods(
cli, label_selector=pod_label_selector, namespace="default"
pods_list = kubecli.list_pods(
label_selector=pod_label_selector, namespace="default"
)
return pods_list[0]
def wait_for_job(
batch_cli: BatchV1Api, job_list: typing.List[str], timeout: int = 300
kubecli: KrknKubernetes, job_list: typing.List[str], timeout: int = 300
) -> None:
"""
Function that waits for a list of jobs to finish within a time period
Args:
batch_cli (BatchV1Api)
- Object to interact with Kubernetes Python client's BatchV1 API
kubecli (KrknKubernetes)
- Object to interact with Kubernetes Python client
job_list (List of strings)
- The list of jobs to check for completion
@@ -552,9 +555,7 @@ def wait_for_job(
while count != job_len:
for job_name in job_list:
try:
api_response = kube_helper.get_job_status(
batch_cli, job_name, namespace="default"
)
api_response = kubecli.get_job_status(job_name, namespace="default")
if (
api_response.status.succeeded is not None
or api_response.status.failed is not None
@@ -571,16 +572,13 @@ def wait_for_job(
time.sleep(5)
def delete_jobs(cli: CoreV1Api, batch_cli: BatchV1Api, job_list: typing.List[str]):
def delete_jobs(kubecli: KrknKubernetes, job_list: typing.List[str]):
"""
Function that deletes jobs
Args:
cli (CoreV1Api)
- Object to interact with Kubernetes Python client's CoreV1 API
batch_cli (BatchV1Api)
- Object to interact with Kubernetes Python client's BatchV1 API
kubecli (KrknKubernetes)
- Object to interact with Kubernetes Python client
job_list (List of strings)
- The list of jobs to delete
@@ -588,23 +586,19 @@ def delete_jobs(cli: CoreV1Api, batch_cli: BatchV1Api, job_list: typing.List[str
for job_name in job_list:
try:
api_response = kube_helper.get_job_status(
batch_cli, job_name, namespace="default"
)
api_response = kubecli.get_job_status(job_name, namespace="default")
if api_response.status.failed is not None:
pod_name = get_job_pods(cli, api_response)
pod_stat = kube_helper.read_pod(cli, name=pod_name, namespace="default")
pod_name = get_job_pods(kubecli, api_response)
pod_stat = kubecli.read_pod(name=pod_name, namespace="default")
logging.error(pod_stat.status.container_statuses)
pod_log_response = kube_helper.get_pod_log(
cli, name=pod_name, namespace="default"
pod_log_response = kubecli.get_pod_log(
name=pod_name, namespace="default"
)
pod_log = pod_log_response.data.decode("utf-8")
logging.error(pod_log)
except Exception as e:
logging.warning("Exception in getting job status: %s" % str(e))
api_response = kube_helper.delete_job(
batch_cli, name=job_name, namespace="default"
)
kubecli.delete_job(name=job_name, namespace="default")
def get_ingress_cmd(
@@ -715,7 +709,7 @@ def network_chaos(
job_template = env.get_template("job.j2")
pod_interface_template = env.get_template("pod_interface.j2")
pod_module_template = env.get_template("pod_module.j2")
cli, batch_cli = kube_helper.setup_kubernetes(cfg.kubeconfig_path)
kubecli = KrknKubernetes(kubeconfig_path=cfg.kubeconfig_path)
test_image = cfg.image
logging.info("Starting Ingress Network Chaos")
try:
@@ -724,7 +718,7 @@ def network_chaos(
cfg.label_selector,
cfg.instance_count,
pod_interface_template,
cli,
kubecli,
test_image
)
except Exception:
@@ -741,13 +735,12 @@ def network_chaos(
node,
pod_module_template,
job_template,
batch_cli,
cli,
test_image
kubecli,
image=test_image
)
)
logging.info("Waiting for parallel job to finish")
wait_for_job(batch_cli, job_list[:], cfg.test_duration + 100)
wait_for_job(kubecli, job_list[:], cfg.test_duration + 100)
elif cfg.execution_type == "serial":
create_interfaces = True
@@ -760,22 +753,20 @@ def network_chaos(
node,
pod_module_template,
job_template,
batch_cli,
cli,
kubecli,
create_interfaces=create_interfaces,
param_selector=param,
image=test_image
)
)
logging.info("Waiting for serial job to finish")
wait_for_job(batch_cli, job_list[:], cfg.test_duration + 100)
wait_for_job(kubecli, job_list[:], cfg.test_duration + 100)
logging.info("Deleting jobs")
delete_jobs(cli, batch_cli, job_list[:])
delete_jobs(kubecli, job_list[:])
job_list = []
create_interfaces = False
else:
return "error", NetworkScenarioErrorOutput(
"Invalid execution type - serial and parallel are "
"the only accepted types"
@@ -790,6 +781,6 @@ def network_chaos(
logging.error("Ingress Network Chaos exiting due to Exception - %s" % e)
return "error", NetworkScenarioErrorOutput(format_exc())
finally:
delete_virtual_interfaces(cli, node_interface_dict.keys(), pod_module_template, test_image)
delete_virtual_interfaces(kubecli, node_interface_dict.keys(), pod_module_template, test_image)
logging.info("Deleting jobs(if any)")
delete_jobs(cli, batch_cli, job_list[:])
delete_jobs(kubecli, job_list[:])

@@ -1,284 +0,0 @@
from kubernetes import config, client
from kubernetes.client.rest import ApiException
from kubernetes.stream import stream
import sys
import time
import logging
import random
def setup_kubernetes(kubeconfig_path):
"""
Sets up the Kubernetes client
"""
if kubeconfig_path is None:
kubeconfig_path = config.KUBE_CONFIG_DEFAULT_LOCATION
config.load_kube_config(kubeconfig_path)
cli = client.CoreV1Api()
batch_cli = client.BatchV1Api()
return cli, batch_cli
def create_job(batch_cli, body, namespace="default"):
"""
Function used to create a job from a YAML config
"""
try:
api_response = batch_cli.create_namespaced_job(body=body, namespace=namespace)
return api_response
except ApiException as api:
logging.warning(
"Exception when calling \
BatchV1Api->create_job: %s"
% api
)
if api.status == 409:
logging.warning("Job already present")
except Exception as e:
logging.error(
"Exception when calling \
BatchV1Api->create_namespaced_job: %s"
% e
)
raise
def delete_pod(cli, name, namespace):
"""
Function that deletes a pod and waits until deletion is complete
"""
try:
cli.delete_namespaced_pod(name=name, namespace=namespace)
while cli.read_namespaced_pod(name=name, namespace=namespace):
time.sleep(1)
except ApiException as e:
if e.status == 404:
logging.info("Pod deleted")
else:
logging.error("Failed to delete pod %s" % e)
raise e
def create_pod(cli, body, namespace, timeout=120):
"""
Function used to create a pod from a YAML config
"""
try:
pod_stat = None
pod_stat = cli.create_namespaced_pod(body=body, namespace=namespace)
end_time = time.time() + timeout
while True:
pod_stat = cli.read_namespaced_pod(name=body["metadata"]["name"], namespace=namespace)
if pod_stat.status.phase == "Running":
break
if time.time() > end_time:
raise Exception("Starting pod failed")
time.sleep(1)
except Exception as e:
logging.error("Pod creation failed %s" % e)
if pod_stat:
logging.error(pod_stat.status.container_statuses)
delete_pod(cli, body["metadata"]["name"], namespace)
sys.exit(1)
def exec_cmd_in_pod(cli, command, pod_name, namespace, container=None):
"""
Function used to execute a command in a running pod
"""
exec_command = command
try:
if container:
ret = stream(
cli.connect_get_namespaced_pod_exec,
pod_name,
namespace,
container=container,
command=exec_command,
stderr=True,
stdin=False,
stdout=True,
tty=False,
)
else:
ret = stream(
cli.connect_get_namespaced_pod_exec,
pod_name,
namespace,
command=exec_command,
stderr=True,
stdin=False,
stdout=True,
tty=False,
)
except Exception as e:
return False
return ret
def create_ifb(cli, number, pod_name):
"""
Function that creates virtual interfaces in a pod. Makes use of modprobe commands
"""
exec_command = ['chroot', '/host', 'modprobe', 'ifb','numifbs=' + str(number)]
resp = exec_cmd_in_pod(cli, exec_command, pod_name, 'default')
for i in range(0, number):
exec_command = ['chroot', '/host','ip','link','set','dev']
exec_command+= ['ifb' + str(i), 'up']
resp = exec_cmd_in_pod(cli, exec_command, pod_name, 'default')
def delete_ifb(cli, pod_name):
"""
Function that deletes all virtual interfaces in a pod. Makes use of modprobe command
"""
exec_command = ['chroot', '/host', 'modprobe', '-r', 'ifb']
resp = exec_cmd_in_pod(cli, exec_command, pod_name, 'default')
def list_pods(cli, namespace, label_selector=None):
"""
Function used to list pods in a given namespace and having a certain label
"""
pods = []
try:
if label_selector:
ret = cli.list_namespaced_pod(namespace, pretty=True, label_selector=label_selector)
else:
ret = cli.list_namespaced_pod(namespace, pretty=True)
except ApiException as e:
logging.error(
"Exception when calling \
CoreV1Api->list_namespaced_pod: %s\n"
% e
)
raise e
for pod in ret.items:
pods.append(pod.metadata.name)
return pods
def get_job_status(batch_cli, name, namespace="default"):
"""
Function that retrieves the status of a running job in a given namespace
"""
try:
return batch_cli.read_namespaced_job_status(name=name, namespace=namespace)
except Exception as e:
logging.error(
"Exception when calling \
BatchV1Api->read_namespaced_job_status: %s"
% e
)
raise
def get_pod_log(cli, name, namespace="default"):
"""
Function that retrieves the logs of a running pod in a given namespace
"""
return cli.read_namespaced_pod_log(
name=name, namespace=namespace, _return_http_data_only=True, _preload_content=False
)
def read_pod(cli, name, namespace="default"):
"""
Function that retrieves the info of a running pod in a given namespace
"""
return cli.read_namespaced_pod(name=name, namespace=namespace)
def delete_job(batch_cli, name, namespace="default"):
"""
Deletes a job with the input name and namespace
"""
try:
api_response = batch_cli.delete_namespaced_job(
name=name,
namespace=namespace,
body=client.V1DeleteOptions(propagation_policy="Foreground", grace_period_seconds=0),
)
logging.debug("Job deleted. status='%s'" % str(api_response.status))
return api_response
except ApiException as api:
logging.warning(
"Exception when calling \
BatchV1Api->create_namespaced_job: %s"
% api
)
logging.warning("Job already deleted\n")
except Exception as e:
logging.error(
"Exception when calling \
BatchV1Api->delete_namespaced_job: %s\n"
% e
)
sys.exit(1)
def list_ready_nodes(cli, label_selector=None):
"""
Returns a list of ready nodes
"""
nodes = []
try:
if label_selector:
ret = cli.list_node(pretty=True, label_selector=label_selector)
else:
ret = cli.list_node(pretty=True)
except ApiException as e:
logging.error("Exception when calling CoreV1Api->list_node: %s\n" % e)
raise e
for node in ret.items:
for cond in node.status.conditions:
if str(cond.type) == "Ready" and str(cond.status) == "True":
nodes.append(node.metadata.name)
return nodes
def get_node(node_name, label_selector, instance_kill_count, cli):
"""
Returns active node(s) on which the scenario can be performed
"""
if node_name in list_ready_nodes(cli):
return [node_name]
elif node_name:
logging.info(
"Node with provided node_name does not exist or the node might "
"be in NotReady state."
)
nodes = list_ready_nodes(cli, label_selector)
if not nodes:
raise Exception("Ready nodes with the provided label selector do not exist")
logging.info(
"Ready nodes with the label selector %s: %s" % (label_selector, nodes)
)
number_of_nodes = len(nodes)
if instance_kill_count == number_of_nodes:
return nodes
nodes_to_return = []
for i in range(instance_kill_count):
node_to_add = nodes[random.randint(0, len(nodes) - 1)]
nodes_to_return.append(node_to_add)
nodes.remove(node_to_add)
return nodes_to_return

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import dataclasses
import json
import logging

@@ -1,274 +0,0 @@
from kubernetes import config, client
from kubernetes.client.rest import ApiException
from kubernetes.stream import stream
import sys
import time
import logging
import random
def setup_kubernetes(kubeconfig_path) -> client.ApiClient:
"""
Sets up the Kubernetes client
"""
if kubeconfig_path is None:
kubeconfig_path = config.KUBE_CONFIG_DEFAULT_LOCATION
client_config = config.load_kube_config(kubeconfig_path)
return client.ApiClient(client_config)
def create_job(batch_cli, body, namespace="default"):
"""
Function used to create a job from a YAML config
"""
try:
api_response = batch_cli.create_namespaced_job(body=body, namespace=namespace)
return api_response
except ApiException as api:
logging.warning(
"Exception when calling \
BatchV1Api->create_job: %s"
% api
)
if api.status == 409:
logging.warning("Job already present")
except Exception as e:
logging.error(
"Exception when calling \
BatchV1Api->create_namespaced_job: %s"
% e
)
raise
def delete_pod(cli, name, namespace):
"""
Function that deletes a pod and waits until deletion is complete
"""
try:
cli.delete_namespaced_pod(name=name, namespace=namespace)
while cli.read_namespaced_pod(name=name, namespace=namespace):
time.sleep(1)
except ApiException as e:
if e.status == 404:
logging.info("Pod deleted")
else:
logging.error("Failed to delete pod %s" % e)
raise e
def create_pod(cli, body, namespace, timeout=120):
"""
Function used to create a pod from a YAML config
"""
try:
pod_stat = None
pod_stat = cli.create_namespaced_pod(body=body, namespace=namespace)
end_time = time.time() + timeout
while True:
pod_stat = cli.read_namespaced_pod(
name=body["metadata"]["name"], namespace=namespace
)
if pod_stat.status.phase == "Running":
break
if time.time() > end_time:
raise Exception("Starting pod failed")
time.sleep(1)
except Exception as e:
logging.error("Pod creation failed %s" % e)
if pod_stat:
logging.error(pod_stat.status.container_statuses)
delete_pod(cli, body["metadata"]["name"], namespace)
sys.exit(1)
def exec_cmd_in_pod(cli, command, pod_name, namespace, container=None):
"""
Function used to execute a command in a running pod
"""
exec_command = command
try:
if container:
ret = stream(
cli.connect_get_namespaced_pod_exec,
pod_name,
namespace,
container=container,
command=exec_command,
stderr=True,
stdin=False,
stdout=True,
tty=False,
)
else:
ret = stream(
cli.connect_get_namespaced_pod_exec,
pod_name,
namespace,
command=exec_command,
stderr=True,
stdin=False,
stdout=True,
tty=False,
)
except BaseException:
return False
return ret
def list_pods(cli, namespace, label_selector=None, exclude_label=None):
"""
Function used to list pods in a given namespace that have a certain label, excluding pods with exclude_label
"""
pods = []
try:
if label_selector:
ret = cli.list_namespaced_pod(
namespace, pretty=True, label_selector=label_selector
)
else:
ret = cli.list_namespaced_pod(namespace, pretty=True)
except ApiException as e:
logging.error(
"Exception when calling \
CoreV1Api->list_namespaced_pod: %s\n"
% e
)
raise e
for pod in ret.items:
# Skip pods with the exclude label if specified
if exclude_label and pod.metadata.labels:
exclude_key, exclude_value = exclude_label.split("=", 1)
if (
exclude_key in pod.metadata.labels
and pod.metadata.labels[exclude_key] == exclude_value
):
continue
pods.append(pod.metadata.name)
return pods
def get_job_status(batch_cli, name, namespace="default"):
"""
Function that retrieves the status of a running job in a given namespace
"""
try:
return batch_cli.read_namespaced_job_status(name=name, namespace=namespace)
except Exception as e:
logging.error(
"Exception when calling \
BatchV1Api->read_namespaced_job_status: %s"
% e
)
raise
def get_pod_log(cli, name, namespace="default"):
"""
Function that retrieves the logs of a running pod in a given namespace
"""
return cli.read_namespaced_pod_log(
name=name,
namespace=namespace,
_return_http_data_only=True,
_preload_content=False,
)
def read_pod(cli, name, namespace="default"):
"""
Function that retrieves the info of a running pod in a given namespace
"""
return cli.read_namespaced_pod(name=name, namespace=namespace)
def delete_job(batch_cli, name, namespace="default"):
"""
Deletes a job with the input name and namespace
"""
try:
api_response = batch_cli.delete_namespaced_job(
name=name,
namespace=namespace,
body=client.V1DeleteOptions(
propagation_policy="Foreground", grace_period_seconds=0
),
)
logging.debug("Job deleted. status='%s'" % str(api_response.status))
return api_response
except ApiException as api:
logging.warning(
"Exception when calling \
BatchV1Api->create_namespaced_job: %s"
% api
)
logging.warning("Job already deleted\n")
except Exception as e:
logging.error(
"Exception when calling \
BatchV1Api->delete_namespaced_job: %s\n"
% e
)
sys.exit(1)
def list_ready_nodes(cli, label_selector=None):
"""
Returns a list of ready nodes
"""
nodes = []
try:
if label_selector:
ret = cli.list_node(pretty=True, label_selector=label_selector)
else:
ret = cli.list_node(pretty=True)
except ApiException as e:
logging.error("Exception when calling CoreV1Api->list_node: %s\n" % e)
raise e
for node in ret.items:
for cond in node.status.conditions:
if str(cond.type) == "Ready" and str(cond.status) == "True":
nodes.append(node.metadata.name)
return nodes
def get_node(node_name, label_selector, instance_kill_count, cli):
"""
Returns active node(s) on which the scenario can be performed
"""
if node_name in list_ready_nodes(cli):
return [node_name]
elif node_name:
logging.info(
"Node with provided node_name does not exist or the node might "
"be in NotReady state."
)
nodes = list_ready_nodes(cli, label_selector)
if not nodes:
raise Exception("Ready nodes with the provided label selector do not exist")
logging.info("Ready nodes with the label selector %s: %s" % (label_selector, nodes))
number_of_nodes = len(nodes)
if instance_kill_count == number_of_nodes:
return nodes
nodes_to_return = []
for i in range(instance_kill_count):
node_to_add = nodes[random.randint(0, len(nodes) - 1)]
nodes_to_return.append(node_to_add)
nodes.remove(node_to_add)
return nodes_to_return

@@ -1,4 +1,17 @@
#!/usr/bin/env python3
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os
import typing

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import dataclasses
import subprocess
import sys

@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import os
import random
@@ -179,6 +192,12 @@ class NetworkChaosScenarioPlugin(AbstractScenarioPlugin):
pods_list = kubecli.list_pods(
label_selector=pod_label_selector, namespace="default"
)
if not pods_list:
raise Exception(
f"No pods found matching label selector '{pod_label_selector}' "
f"in namespace 'default'. The job pod may not have started or "
f"the label selector may be incorrect."
)
return pods_list[0]
# krkn_lib
@@ -218,8 +237,8 @@ class NetworkChaosScenarioPlugin(AbstractScenarioPlugin):
)
pod_log = pod_log_response.data.decode("utf-8")
logging.error(pod_log)
except Exception:
logging.warning("Exception in getting job status")
except Exception as e:
logging.warning(f"Exception in getting job status: {e}")
kubecli.delete_job(name=jobname, namespace="default")
def get_egress_cmd(self, execution, test_interface, mod, vallst, duration=30):
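
The hunk above adds a guard so that an empty pod list produces a descriptive error instead of an `IndexError` on `pods_list[0]`. A minimal sketch of that guard in isolation follows; the function name `first_pod_name` and its parameters are hypothetical and are not part of the plugin.

```python
def first_pod_name(pods_list, label_selector, namespace="default"):
    """Return the first pod name, or raise a descriptive error if none matched.

    Hypothetical helper: it mirrors the guard added in the diff, but the name
    and signature are illustrative only.
    """
    if not pods_list:
        # An empty list would otherwise surface as an opaque IndexError.
        raise Exception(
            f"No pods found matching label selector '{label_selector}' "
            f"in namespace '{namespace}'. The job pod may not have started or "
            f"the label selector may be incorrect."
        )
    return pods_list[0]
```

Called with a non-empty list it simply returns the first element; called with `[]` or `None` it raises with the selector and namespace in the message.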

@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import re
from dataclasses import dataclass
from enum import Enum
@@ -43,6 +56,10 @@ class BaseNetworkChaosConfig:
errors.append("wait_duration must be an int")
if not isinstance(self.test_duration, int):
errors.append("test_duration must be an int")
if not isinstance(self.instance_count, int):
errors.append("instance_count must be an int")
elif self.instance_count < 0:
errors.append("instance_count must be >= 0")
return errors
@@ -62,6 +79,19 @@ class NetworkFilterConfig(BaseNetworkChaosConfig):
return errors
@dataclass
class InterfaceDownConfig(BaseNetworkChaosConfig):
ingress: bool = True
egress: bool = True
recovery_time: int = 0
def validate(self) -> list[str]:
errors = super().validate()
if not isinstance(self.recovery_time, int) or self.recovery_time < 0:
errors.append("recovery_time must be a non-negative integer (seconds)")
return errors
@dataclass
class NetworkChaosConfig(BaseNetworkChaosConfig):
latency: Optional[str] = None
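
The `recovery_time` check added to `InterfaceDownConfig.validate()` can be exercised on its own. The sketch below uses a hypothetical stand-in class, `RecoveryTimeConfig`, that models only that one field; the real class inherits further fields and checks from `BaseNetworkChaosConfig`, which are omitted here.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTimeConfig:
    # Hypothetical stand-in for InterfaceDownConfig; only recovery_time is modelled.
    recovery_time: int = 0

    def validate(self) -> list[str]:
        errors = []
        # The isinstance test runs first, so a non-int value never reaches
        # the "< 0" comparison (the "or" short-circuits).
        if not isinstance(self.recovery_time, int) or self.recovery_time < 0:
            errors.append("recovery_time must be a non-negative integer (seconds)")
        return errors
```

One detail worth knowing when reading this check: `bool` is a subclass of `int` in Python, so `recovery_time=True` passes the `isinstance` test.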

@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import abc
import logging
import queue
@@ -44,7 +57,7 @@ class AbstractNetworkChaosModule(abc.ABC):
def get_node_targets(self, config: BaseNetworkChaosConfig):
if self.base_network_config.label_selector:
return self.kubecli.get_lib_kubernetes().list_nodes(
return self.kubecli.get_lib_kubernetes().list_ready_nodes(
self.base_network_config.label_selector
)
else:
@@ -52,9 +65,9 @@ class AbstractNetworkChaosModule(abc.ABC):
raise Exception(
"neither node selector nor node_name (target) specified, aborting."
)
node_info = self.kubecli.get_lib_kubernetes().list_nodes()
if config.target not in node_info:
raise Exception(f"node {config.target} not found, aborting")
ready_nodes = self.kubecli.get_lib_kubernetes().list_ready_nodes()
if config.target not in ready_nodes:
raise Exception(f"node {config.target} not found or not Ready, aborting")
return [config.target]
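
The change above switches target resolution from all nodes to Ready nodes only. A rough sketch of the resulting decision flow, assuming a caller-supplied `list_ready_nodes` callable in place of the krkn-lib client (the function name `resolve_node_targets` and its signature are hypothetical):

```python
from typing import Callable, List, Optional


def resolve_node_targets(
    list_ready_nodes: Callable[[Optional[str]], List[str]],
    label_selector: Optional[str] = None,
    target: Optional[str] = None,
) -> List[str]:
    """Pick target nodes, considering only nodes that report Ready.

    list_ready_nodes is an assumed callable standing in for the library call;
    it takes an optional label selector and returns node names.
    """
    if label_selector:
        return list_ready_nodes(label_selector)
    if not target:
        raise Exception(
            "neither node selector nor node_name (target) specified, aborting."
        )
    if target not in list_ready_nodes(None):
        raise Exception(f"node {target} not found or not Ready, aborting")
    return [target]
```

With this shape, a node that exists but is NotReady is rejected up front rather than failing later when the workload pod cannot be scheduled on it.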

@@ -0,0 +1,155 @@
import queue
import time
from typing import Tuple
from krkn_lib.telemetry.ocp import KrknTelemetryOpenshift
from krkn_lib.utils import get_random_string
from krkn.scenario_plugins.network_chaos_ng.models import (
NetworkChaosScenarioType,
BaseNetworkChaosConfig,
InterfaceDownConfig,
)
from krkn.scenario_plugins.network_chaos_ng.modules.abstract_network_chaos_module import (
AbstractNetworkChaosModule,
)
from krkn.scenario_plugins.network_chaos_ng.modules.utils import (
log_info,
log_error,
deploy_network_chaos_ng_pod,
get_pod_default_interface,
)
class NodeInterfaceDownModule(AbstractNetworkChaosModule):
config: InterfaceDownConfig
kubecli: KrknTelemetryOpenshift
def __init__(self, config: InterfaceDownConfig, kubecli: KrknTelemetryOpenshift):
super().__init__(config, kubecli)
self.config = config
def run(self, target: str, error_queue: queue.Queue = None):
parallel = False
if error_queue:
parallel = True
try:
pod_name = f"node-iface-down-{get_random_string(5)}"
log_info(
f"creating workload pod on node {target} to bring interface(s) down",
parallel,
target,
)
deploy_network_chaos_ng_pod(
self.config,
target,
pod_name,
self.kubecli.get_lib_kubernetes(),
)
if len(self.config.interfaces) == 0:
interfaces = [
get_pod_default_interface(
pod_name,
self.config.namespace,
self.kubecli.get_lib_kubernetes(),
)
]
if not interfaces[0]:
log_error(
"could not detect default network interface, aborting",
parallel,
target,
)
self.kubecli.get_lib_kubernetes().delete_pod(
pod_name, self.config.namespace
)
return
log_info(
f"detected default interface: {interfaces[0]}", parallel, target
)
else:
interfaces = self.config.interfaces
log_info(
f"scheduling recovery and bringing down interface(s): {', '.join(interfaces)} on node {target}",
parallel,
target,
)
# Pre-schedule recovery as a background process on the node before bringing
# the interface down. Once the interface is down the node loses connectivity
# to the control plane, so exec_cmd_in_pod can no longer reach the pod.
# The background process runs entirely on the node and fires regardless of
# control-plane connectivity.
recovery_cmds = " && ".join(
[f"ip link set {iface} up" for iface in interfaces]
)
down_cmds = " && ".join(
[f"ip link set {iface} down" for iface in interfaces]
)
cmd = f"(sleep {self.config.test_duration} && {recovery_cmds}) & {down_cmds}"
self.kubecli.get_lib_kubernetes().exec_cmd_in_pod(
[cmd], pod_name, self.config.namespace
)
log_info(
f"interface(s) {', '.join(interfaces)} are down on node {target}, "
f"recovery scheduled in {self.config.test_duration}s",
parallel,
target,
)
log_info(
f"waiting {self.config.test_duration} seconds for interface(s) to recover",
parallel,
target,
)
time.sleep(self.config.test_duration)
log_info(
f"waiting for node {target} to become Ready after interface recovery",
parallel,
target,
)
node_ready = False
for _ in range(60):
time.sleep(5)
ready_nodes = self.kubecli.get_lib_kubernetes().list_ready_nodes()
if target in ready_nodes:
node_ready = True
break
if not node_ready:
log_error(
f"node {target} did not become Ready within 5 minutes after interface recovery",
parallel,
target,
)
else:
log_info(f"node {target} is Ready", parallel, target)
if self.config.recovery_time > 0:
log_info(
f"waiting {self.config.recovery_time} seconds for node to stabilize",
parallel,
target,
)
time.sleep(self.config.recovery_time)
self.kubecli.get_lib_kubernetes().delete_pod(
pod_name, self.config.namespace
)
except Exception as e:
if error_queue is None:
raise e
else:
error_queue.put(str(e))
def get_config(self) -> Tuple[NetworkChaosScenarioType, BaseNetworkChaosConfig]:
return NetworkChaosScenarioType.Node, self.config
def get_targets(self) -> list[str]:
return self.get_node_targets(self.config)
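
Two pieces of the module above are easy to check in isolation: the shell command that schedules recovery before taking the interfaces down, and the bounded polling loop that waits for the node to become Ready. The sketch below reproduces both under assumptions: the helper names `build_interface_down_command` and `wait_until_ready` are hypothetical, the `sleep` parameter exists only so the loop can be tested without real delays, and how the command string is executed inside the pod is left to the library and is not modelled here.

```python
import time


def build_interface_down_command(interfaces, test_duration):
    """Build the shell string: background recovery first, then bring links down."""
    recovery_cmds = " && ".join(f"ip link set {iface} up" for iface in interfaces)
    down_cmds = " && ".join(f"ip link set {iface} down" for iface in interfaces)
    # The parenthesised subshell is backgrounded with "&", so the recovery timer
    # is already running on the node when the interfaces go down.
    return f"(sleep {test_duration} && {recovery_cmds}) & {down_cmds}"


def wait_until_ready(is_ready, attempts=60, interval=5, sleep=time.sleep):
    """Poll is_ready() up to `attempts` times, sleeping `interval` seconds before each check."""
    for _ in range(attempts):
        sleep(interval)
        if is_ready():
            return True
    return False
```

With the defaults, the loop gives up after 60 checks spaced 5 seconds apart, which matches the five-minute limit mentioned in the module's error message.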

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import queue
import time
from typing import Tuple

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import queue
import time
from typing import Tuple

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import queue
import time
from typing import Tuple

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import queue
import time
from typing import Tuple

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import os
from typing import Tuple

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import subprocess
import logging
from typing import Optional

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Tuple
from krkn_lib.k8s import KrknKubernetes

@@ -1,12 +1,29 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from krkn_lib.telemetry.ocp import KrknTelemetryOpenshift
from krkn.scenario_plugins.network_chaos_ng.models import (
NetworkFilterConfig,
NetworkChaosConfig,
InterfaceDownConfig,
)
from krkn.scenario_plugins.network_chaos_ng.modules.abstract_network_chaos_module import (
AbstractNetworkChaosModule,
)
from krkn.scenario_plugins.network_chaos_ng.modules.node_interface_down import (
NodeInterfaceDownModule,
)
from krkn.scenario_plugins.network_chaos_ng.modules.node_network_chaos import (
NodeNetworkChaosModule,
)
@@ -25,6 +42,7 @@ supported_modules = [
"pod_network_filter",
"pod_network_chaos",
"node_network_chaos",
"node_interface_down",
]
@@ -63,5 +81,11 @@ class NetworkChaosFactory:
if len(errors) > 0:
raise Exception(f"config validation errors: [{';'.join(errors)}]")
return NodeNetworkChaosModule(scenario_config, kubecli)
if config["id"] == "node_interface_down":
scenario_config = InterfaceDownConfig(**config)
errors = scenario_config.validate()
if len(errors) > 0:
raise Exception(f"config validation errors: [{';'.join(errors)}]")
return NodeInterfaceDownModule(scenario_config, kubecli)
else:
raise Exception(f"invalid network chaos id {config['id']}")
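
Each factory branch follows the same three steps: build the config object from the scenario dict, validate it, and construct the module only if validation returned no errors. A minimal sketch of that pattern with toy classes follows; `build_module`, `ToyConfig`, and `ToyModule` are hypothetical names used for illustration and do not exist in the plugin.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    # Hypothetical config with a single validated field.
    id: str
    test_duration: int = 0

    def validate(self):
        errors = []
        if not isinstance(self.test_duration, int):
            errors.append("test_duration must be an int")
        return errors


class ToyModule:
    # Hypothetical module that just stores what it was given.
    def __init__(self, config, kubecli):
        self.config = config
        self.kubecli = kubecli


def build_module(config_cls, module_cls, config, kubecli=None):
    """Validate first, then construct; errors are joined with ';' as in the diff."""
    scenario_config = config_cls(**config)
    errors = scenario_config.validate()
    if len(errors) > 0:
        raise Exception(f"config validation errors: [{';'.join(errors)}]")
    return module_cls(scenario_config, kubecli)
```

Because the config is created with `**config`, a key in the scenario dict that the dataclass does not declare raises a `TypeError` before `validate()` is ever reached.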

@@ -1,3 +1,19 @@
#!/usr/bin/env python
#
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import queue
import random
@@ -52,8 +68,8 @@ class NetworkChaosNgScenarioPlugin(AbstractScenarioPlugin):
)
if (
- network_chaos_config.instance_count != 0
- and network_chaos_config.instance_count > len(targets)
+ network_chaos_config.instance_count > 0
+ and len(targets) > network_chaos_config.instance_count
):
targets = random.sample(
targets, network_chaos_config.instance_count
@@ -63,7 +79,7 @@ class NetworkChaosNgScenarioPlugin(AbstractScenarioPlugin):
self.run_parallel(targets, network_chaos)
else:
self.run_serial(targets, network_chaos)
- if len(config) > 1:
+ if len(scenario_config) > 1:
logging.info(
f"waiting {network_chaos_config.wait_duration} seconds before running the next "
f"Network Chaos NG Module"
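The `instance_count` change in the first hunk above reverses the comparison that guards `random.sample`. A short sketch of why that matters follows, assuming the hunk's first pair of condition lines is the removed version and the second pair is the added one (consistent with the hunk line counts). `select_targets_old` and `select_targets_new` are hypothetical helper names used only for this illustration.

```python
import random


def select_targets_old(targets: list, instance_count: int) -> list:
    # Pre-fix condition: samples only when MORE instances are requested than
    # exist, which is exactly the case random.sample cannot satisfy.
    if instance_count != 0 and instance_count > len(targets):
        targets = random.sample(targets, instance_count)
    return targets


def select_targets_new(targets: list, instance_count: int) -> list:
    # Post-fix condition: samples only when there are more targets than
    # requested, so the sample size never exceeds the population.
    if instance_count > 0 and len(targets) > instance_count:
        targets = random.sample(targets, instance_count)
    return targets


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]

# New behaviour: the target list is trimmed to the requested count.
trimmed = select_targets_new(nodes, 2)

# Old behaviour: the limit is silently ignored when targets outnumber it...
untrimmed = select_targets_old(nodes, 2)

# ...and requesting more than exist raises ValueError from random.sample.
try:
    select_targets_old(nodes[:2], 5)
    old_raised = False
except ValueError:
    old_raised = True
```

With the corrected condition, an `instance_count` of 0 (or any value at least as large as the number of targets) leaves the target list unchanged.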

@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import logging
import time

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import time
import logging

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import time
import boto3

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import os
import logging

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import krkn.scenario_plugins.node_actions.common_node_functions as nodeaction
from krkn.scenario_plugins.node_actions.abstract_node_scenarios import (
abstract_node_scenarios,

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import random
import logging

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import krkn.scenario_plugins.node_actions.common_node_functions as nodeaction
from krkn.scenario_plugins.node_actions.abstract_node_scenarios import (
abstract_node_scenarios,

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import logging
import google.auth

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from krkn.scenario_plugins.node_actions.abstract_node_scenarios import (
abstract_node_scenarios,

@@ -1,4 +1,17 @@
#!/usr/bin/env python
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import typing
from os import environ

@@ -1,4 +1,17 @@
#!/usr/bin/env python
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
from os import environ
from dataclasses import dataclass

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import time
from multiprocessing.pool import ThreadPool

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import time
import logging

@@ -1,4 +1,17 @@
#!/usr/bin/env python
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import random
import sys

@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass
@dataclass

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import random
import time

@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import base64
import json
import logging

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import importlib
import inspect
import pkgutil

@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import random
import time

@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json
import logging
import time

@@ -0,0 +1,13 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

@@ -1,3 +1,16 @@
# Copyright 2025 The Krkn Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import time
from multiprocessing.pool import ThreadPool

Some files were not shown because too many files have changed in this diff.