Compare commits

1 Commits

Author | SHA1 | Message | Date
Paige Patton | 717cb72f79 | Update tests.yml | 2026-01-16 13:51:27 -05:00
25 changed files with 43 additions and 529 deletions

@@ -23,7 +23,6 @@ If checked, a documentation PR must be created and merged in the [website reposi
<-- Add the link to the corresponding documentation PR in the website repository -->
# Checklist before requesting a review
[ ] Ensure the changes and proposed solution have been discussed in the relevant issue and have received acknowledgment from the community or maintainers. See [contributing guidelines](https://krkn-chaos.dev/docs/contribution-guidelines/)
See [testing your changes](https://krkn-chaos.dev/docs/developers-guide/testing-changes/) and run on any Kubernetes or OpenShift cluster to validate your changes
- [ ] I have performed a self-review of my code by running krkn and specific scenario
- [ ] If it is a core feature, I have added thorough unit tests with above 80% coverage
@@ -44,4 +43,4 @@ OR
python -m coverage run -a -m unittest discover -s tests -v
...
<---insert test results output--->
```
```

@@ -1,52 +0,0 @@
name: Manage Stale Issues and Pull Requests
on:
schedule:
# Run daily at 1:00 AM UTC
- cron: '0 1 * * *'
workflow_dispatch:
permissions:
issues: write
pull-requests: write
jobs:
stale:
name: Mark and Close Stale Issues and PRs
runs-on: ubuntu-latest
steps:
- name: Mark and close stale issues and PRs
uses: actions/stale@v9
with:
days-before-issue-stale: 60
days-before-issue-close: 14
stale-issue-label: 'stale'
stale-issue-message: |
This issue has been automatically marked as stale because it has not had any activity in the last 60 days.
It will be closed in 14 days if no further activity occurs.
If this issue is still relevant, please leave a comment or remove the stale label.
Thank you for your contributions to krkn!
close-issue-message: |
This issue has been automatically closed due to inactivity.
If you believe this issue is still relevant, please feel free to reopen it or create a new issue with updated information.
Thank you for your understanding!
close-issue-reason: 'not_planned'
days-before-pr-stale: 90
days-before-pr-close: 14
stale-pr-label: 'stale'
stale-pr-message: |
This pull request has been automatically marked as stale because it has not had any activity in the last 90 days.
It will be closed in 14 days if no further activity occurs.
If this PR is still relevant, please rebase it, address any pending reviews, or leave a comment.
Thank you for your contributions to krkn!
close-pr-message: |
This pull request has been automatically closed due to inactivity.
If you believe this PR is still relevant, please feel free to reopen it or create a new pull request with updated changes.
Thank you for your understanding!
# Exempt labels
exempt-issue-labels: 'bug,enhancement,good first issue'
exempt-pr-labels: 'pending discussions,hold'
remove-stale-when-updated: true

@@ -32,14 +32,13 @@ jobs:
- name: Install Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
python-version: '3.9'
architecture: 'x64'
- name: Install environment
run: |
sudo apt-get install build-essential python3-dev
pip install --upgrade pip
pip install -r requirements.txt
pip install coverage
- name: Deploy test workloads
run: |
@@ -96,7 +95,7 @@ jobs:
echo "test_node" >> ./CI/tests/functional_tests
echo "test_pod" >> ./CI/tests/functional_tests
echo "test_pod_error" >> ./CI/tests/functional_tests
echo "test_service_hijacking" >> ./CI/tests/functional_tests
echo "test_service_hijacking" > ./CI/tests/functional_tests
echo "test_pod_network_filter" >> ./CI/tests/functional_tests
echo "test_pod_server" >> ./CI/tests/functional_tests
echo "test_time" >> ./CI/tests/functional_tests
@@ -182,7 +181,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
python-version: 3.9
- name: Copy badge on GitHub Page Repo
env:
COLOR: yellow

CLAUDE.md
@@ -1,273 +0,0 @@
# CLAUDE.md - Krkn Chaos Engineering Framework
## Project Overview
Krkn (Kraken) is a chaos engineering tool for Kubernetes/OpenShift clusters. It injects deliberate failures to validate cluster resilience. Plugin-based architecture with multi-cloud support (AWS, Azure, GCP, IBM Cloud, VMware, Alibaba, OpenStack).
## Repository Structure
```
krkn/
├── krkn/
│ ├── scenario_plugins/ # Chaos scenario plugins (pod, node, network, hogs, etc.)
│ ├── utils/ # Utility functions
│ ├── rollback/ # Rollback management
│ ├── prometheus/ # Prometheus integration
│ └── cerberus/ # Health monitoring
├── tests/ # Unit tests (unittest framework)
├── scenarios/ # Example scenario configs (openshift/, kube/, kind/)
├── config/ # Configuration files
└── CI/ # CI/CD test scripts
```
## Quick Start
```bash
# Setup (ALWAYS use virtual environment)
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Run Krkn
python run_kraken.py --config config/config.yaml
# Note: Scenarios are specified in config.yaml under kraken.chaos_scenarios
# There is no --scenario flag; edit config/config.yaml to select scenarios
# Run tests
python -m unittest discover -s tests -v
python -m coverage run -a -m unittest discover -s tests -v
```
## Critical Requirements
### Python Environment
- **Python 3.9+** required
- **NEVER install packages globally** - always use virtual environment
- **CRITICAL**: `docker` must be <7.0 and `requests` must be <2.32 (Unix socket compatibility)
### Key Dependencies
- **krkn-lib** (5.1.13): Core library for Kubernetes/OpenShift operations
- **kubernetes** (34.1.0): Kubernetes Python client
- **docker** (<7.0), **requests** (<2.32): DO NOT upgrade without verifying compatibility
- Cloud SDKs: boto3 (AWS), azure-mgmt-* (Azure), google-cloud-compute (GCP), ibm_vpc (IBM), pyVmomi (VMware)
## Plugin Architecture (CRITICAL)
**Strictly enforced naming conventions:**
### Naming Rules
- **Module files**: Must end with `_scenario_plugin.py` and use snake_case
- Example: `pod_disruption_scenario_plugin.py`
- **Class names**: Must be CamelCase and end with `ScenarioPlugin`
- Example: `PodDisruptionScenarioPlugin`
- Must match module filename (snake_case ↔ CamelCase)
- **Directory structure**: Plugin dirs CANNOT contain "scenario" or "plugin"
- Location: `krkn/scenario_plugins/<plugin_name>/`
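The naming rules above can be checked mechanically. The following is a minimal sketch, not the project's plugin factory: the helper names `expected_class_name` and `check_plugin_names` are hypothetical, and the checks only mirror the rules as they are worded in this section.

```python
import re


def expected_class_name(module_filename: str) -> str:
    """Derive the CamelCase class name from a snake_case module filename."""
    stem = module_filename.removesuffix(".py")
    return "".join(part.capitalize() for part in stem.split("_"))


def check_plugin_names(directory: str, module_filename: str, class_name: str) -> list[str]:
    """Return a list of rule violations (empty when all rules hold).

    Hypothetical helper for illustration; the real factory may check differently.
    """
    problems = []
    stem = module_filename.removesuffix(".py")
    if not module_filename.endswith("_scenario_plugin.py"):
        problems.append("module file must end with _scenario_plugin.py")
    if not re.fullmatch(r"[a-z0-9]+(_[a-z0-9]+)*", stem):
        problems.append("module file must be snake_case")
    if not class_name.endswith("ScenarioPlugin"):
        problems.append("class name must end with ScenarioPlugin")
    if class_name != expected_class_name(module_filename):
        problems.append("class name must match the module filename")
    if "scenario" in directory.lower() or "plugin" in directory.lower():
        problems.append('directory name must not contain "scenario" or "plugin"')
    return problems


# Example taken from the rules above; the directory name is an assumption.
print(check_plugin_names(
    "pod_disruption",
    "pod_disruption_scenario_plugin.py",
    "PodDisruptionScenarioPlugin",
))
```

Returning a list of problems instead of raising on the first one lets a contributor see every violation in a single run.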
### Plugin Implementation
Every plugin MUST:
1. Extend `AbstractScenarioPlugin`
2. Implement `run()` method
3. Implement `get_scenario_types()` method
```python
from krkn.scenario_plugins import AbstractScenarioPlugin
class PodDisruptionScenarioPlugin(AbstractScenarioPlugin):
def run(self, config, scenarios_list, kubeconfig_path, wait_duration):
pass
def get_scenario_types(self):
return ["pod_scenarios", "pod_outage"]
```
### Creating a New Plugin
1. Create directory: `krkn/scenario_plugins/<plugin_name>/`
2. Create module: `<plugin_name>_scenario_plugin.py`
3. Create class: `<PluginName>ScenarioPlugin` extending `AbstractScenarioPlugin`
4. Implement `run()` and `get_scenario_types()`
5. Create unit test: `tests/test_<plugin_name>_scenario_plugin.py`
6. Add example scenario: `scenarios/<platform>/<scenario>.yaml`
**DO NOT**: Violate naming conventions (factory will reject), include "scenario"/"plugin" in directory names, create plugins without tests.
## Testing
### Unit Tests
```bash
# Run all tests
python -m unittest discover -s tests -v
# Specific test
python -m unittest tests.test_pod_disruption_scenario_plugin
# With coverage
python -m coverage run -a -m unittest discover -s tests -v
python -m coverage html
```
**Test requirements:**
- Naming: `test_<module>_scenario_plugin.py`
- Mock external dependencies (Kubernetes API, cloud providers)
- Test success, failure, and edge cases
- Keep tests isolated and independent
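A test that follows these requirements might look like the sketch below. The function `delete_pod` and the client method it calls are hypothetical stand-ins for plugin code and a Kubernetes client; only `unittest` and `unittest.mock` are real APIs here.

```python
import unittest
from unittest.mock import MagicMock


def delete_pod(client, name: str, namespace: str) -> int:
    """Hypothetical code under test: returns 0 on success, 1 on failure."""
    try:
        client.delete_pod(name, namespace)
        return 0
    except Exception:
        return 1


class TestDeletePod(unittest.TestCase):
    def setUp(self):
        # A fresh mock per test keeps the tests isolated from each other.
        self.client = MagicMock()

    def test_success(self):
        self.assertEqual(delete_pod(self.client, "web-0", "default"), 0)
        self.client.delete_pod.assert_called_once_with("web-0", "default")

    def test_api_error(self):
        # Simulate the external dependency failing.
        self.client.delete_pod.side_effect = RuntimeError("API unavailable")
        self.assertEqual(delete_pod(self.client, "web-0", "default"), 1)
```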
### Functional Tests
Located in `CI/tests/`. Can be run locally on a kind cluster with Prometheus and Elasticsearch set up.
**Setup for local testing:**
1. Deploy Prometheus and Elasticsearch on your kind cluster:
- Prometheus setup: https://krkn-chaos.dev/docs/developers-guide/testing-changes/#prometheus
- Elasticsearch setup: https://krkn-chaos.dev/docs/developers-guide/testing-changes/#elasticsearch
2. Or disable monitoring features in `config/config.yaml`:
```yaml
performance_monitoring:
enable_alerts: False
enable_metrics: False
check_critical_alerts: False
```
**Note:** Functional tests run automatically in CI with full monitoring enabled.
## Cloud Provider Implementations
Node chaos scenarios are cloud-specific. Each in `krkn/scenario_plugins/node_actions/<provider>_node_scenarios.py`:
- AWS, Azure, GCP, IBM Cloud, VMware, Alibaba, OpenStack, Bare Metal
Implement: stop, start, reboot, terminate instances.
**When modifying**: Maintain consistency with other providers, handle API errors, add logging, update tests.
### Adding Cloud Provider Support
1. Create: `krkn/scenario_plugins/node_actions/<provider>_node_scenarios.py`
2. Extend: `abstract_node_scenarios.AbstractNodeScenarios`
3. Implement: `stop_instances`, `start_instances`, `reboot_instances`, `terminate_instances`
4. Add SDK to `requirements.txt`
5. Create unit test with mocked SDK
6. Add example scenario: `scenarios/openshift/<provider>_node_scenarios.yml`
## Configuration
**Main config**: `config/config.yaml`
- `kraken`: Core settings
- `cerberus`: Health monitoring
- `performance_monitoring`: Prometheus
- `elastic`: Elasticsearch telemetry
**Scenario configs**: `scenarios/` directory
```yaml
- config:
scenario_type: <type> # Must match plugin's get_scenario_types()
```
## Code Style
- **Import order**: Standard library, third-party, local imports
- **Naming**: snake_case (functions/variables), CamelCase (classes)
- **Logging**: Use Python's `logging` module
- **Error handling**: Return appropriate exit codes
- **Docstrings**: Required for public functions/classes
## Exit Codes
Krkn uses specific exit codes to communicate execution status:
- `0`: Success - all scenarios passed, no critical alerts
- `1`: Scenario failure - one or more scenarios failed
- `2`: Critical alerts fired during execution
- `3+`: Health check failure (Cerberus monitoring detected issues)
**When implementing scenarios:**
- Return `0` on success
- Return `1` on scenario-specific failures
- Propagate health check failures appropriately
- Log exit code reasons clearly
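As an illustration of the exit code table above, a wrapper script could translate the code into a log message. This is a sketch under the assumption that the meanings are exactly as listed; `describe_exit_code` is a hypothetical helper, not part of Krkn.

```python
def describe_exit_code(code: int) -> str:
    """Map an exit code to the meaning listed in the table above."""
    if code < 0:
        raise ValueError(f"unexpected exit code: {code}")
    if code == 0:
        return "success: all scenarios passed, no critical alerts"
    if code == 1:
        return "scenario failure: one or more scenarios failed"
    if code == 2:
        return "critical alerts fired during execution"
    # Everything from 3 upward is described as a health check failure.
    return "health check failure"


for code in (0, 1, 2, 3):
    print(code, "->", describe_exit_code(code))
```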
## Container Support
Krkn can run inside a container. See `containers/` directory.
**Building custom image:**
```bash
cd containers
./compile_dockerfile.sh # Generates Dockerfile from template
docker build -t krkn:latest .
```
**Running containerized:**
```bash
docker run -v ~/.kube:/root/.kube:Z \
-v $(pwd)/config:/config:Z \
-v $(pwd)/scenarios:/scenarios:Z \
krkn:latest
```
## Git Workflow
- **NEVER commit directly to main**
- **NEVER use `--force` without approval**
- **ALWAYS create feature branches**: `git checkout -b feature/description`
- **ALWAYS run tests before pushing**
**Conventional commits**: `feat:`, `fix:`, `test:`, `docs:`, `refactor:`
```bash
git checkout main && git pull origin main
git checkout -b feature/your-feature-name
# Make changes, write tests
python -m unittest discover -s tests -v
git add <specific-files>
git commit -m "feat: description"
git push -u origin feature/your-feature-name
```
## Environment Variables
- `KUBECONFIG`: Path to kubeconfig
- `AWS_*`, `AZURE_*`, `GOOGLE_APPLICATION_CREDENTIALS`: Cloud credentials
- `PROMETHEUS_URL`, `ELASTIC_URL`, `ELASTIC_PASSWORD`: Monitoring config
**NEVER commit credentials or API keys.**
## Common Pitfalls
1. Missing virtual environment - always activate venv
2. Running functional tests without cluster setup
3. Ignoring exit codes
4. Modifying krkn-lib directly (it's a separate package)
5. Upgrading docker/requests beyond version constraints
## Before Writing Code
1. Check for existing implementations
2. Review existing plugins as examples
3. Maintain consistency with cloud provider patterns
4. Plan rollback logic
5. Write tests alongside code
6. Update documentation
## When Adding Dependencies
1. Check if functionality exists in krkn-lib or current dependencies
2. Verify compatibility with existing versions
3. Pin specific versions in `requirements.txt`
4. Check for security vulnerabilities
5. Test thoroughly for conflicts
## Common Development Tasks
### Modifying Existing Plugin
1. Read plugin code and corresponding test
2. Make changes
3. Update/add unit tests
4. Run: `python -m unittest tests.test_<plugin>_scenario_plugin`
### Writing Unit Tests
1. Create: `tests/test_<module>_scenario_plugin.py`
2. Import `unittest` and plugin class
3. Mock external dependencies
4. Test success, failure, and edge cases
5. Run: `python -m unittest tests.test_<module>_scenario_plugin`

@@ -41,7 +41,7 @@ ENV KUBECONFIG /home/krkn/.kube/config
# This overwrites any existing configuration in /etc/yum.repos.d/kubernetes.repo
RUN dnf update && dnf install -y --setopt=install_weak_deps=False \
git python3.11 jq yq gettext wget which ipmitool openssh-server &&\
git python39 jq yq gettext wget which ipmitool openssh-server &&\
dnf clean all
# copy oc client binary from oc-build image
@@ -63,15 +63,15 @@ RUN if [ -n "$PR_NUMBER" ]; then git fetch origin pull/${PR_NUMBER}/head:pr-${PR
# if it is a TAG trigger checkout the tag
RUN if [ -n "$TAG" ]; then git checkout "$TAG";fi
RUN python3.11 -m ensurepip --upgrade --default-pip
RUN python3.11 -m pip install --upgrade pip setuptools==78.1.1
RUN python3.9 -m ensurepip --upgrade --default-pip
RUN python3.9 -m pip install --upgrade pip setuptools==78.1.1
# removes the the vulnerable versions of setuptools and pip
RUN rm -rf "$(pip cache dir)"
RUN rm -rf /tmp/*
RUN rm -rf /usr/local/lib/python3.11/ensurepip/_bundled
RUN pip3.11 install -r requirements.txt
RUN pip3.11 install jsonschema
RUN rm -rf /usr/local/lib/python3.9/ensurepip/_bundled
RUN pip3.9 install -r requirements.txt
RUN pip3.9 install jsonschema
LABEL krknctl.title.global="Krkn Base Image"
LABEL krknctl.description.global="This is the krkn base image."

@@ -146,7 +146,7 @@ class AbstractScenarioPlugin(ABC):
if scenario_telemetry.exit_status != 0:
failed_scenarios.append(scenario_config)
scenario_telemetries.append(scenario_telemetry)
logging.info(f"waiting {wait_duration} before running the next scenario")
logging.info(f"wating {wait_duration} before running the next scenario")
time.sleep(wait_duration)
return failed_scenarios, scenario_telemetries

@@ -53,7 +53,7 @@ class HogsScenarioPlugin(AbstractScenarioPlugin):
raise Exception("no available nodes to schedule workload")
if not has_selector:
available_nodes = [available_nodes[random.randint(0, len(available_nodes) - 1)]]
available_nodes = [available_nodes[random.randint(0, len(available_nodes))]]
if scenario_config.number_of_nodes and len(available_nodes) > scenario_config.number_of_nodes:
available_nodes = random.sample(available_nodes, scenario_config.number_of_nodes)
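Review note: the two variants of the index expression in this hunk differ only in the upper bound passed to `random.randint`. Because `random.randint(a, b)` includes both endpoints, the variant without `- 1` can produce an index equal to `len(available_nodes)` and raise `IndexError`. The sketch below uses hypothetical node names to show the difference; `random.choice` is shown as an alternative that avoids the index arithmetic entirely.

```python
import random

nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
rng = random.Random(0)  # seeded so the demonstration is repeatable

# Upper bound len - 1: every index is valid.
safe_indexes = {rng.randint(0, len(nodes) - 1) for _ in range(1000)}

# Upper bound len: the out-of-range index len(nodes) can appear.
wide_indexes = {rng.randint(0, len(nodes)) for _ in range(1000)}

# random.choice picks an element directly, with no index to get wrong.
picked = rng.choice(nodes)

print(sorted(safe_indexes), sorted(wide_indexes), picked)
```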

@@ -76,7 +76,7 @@ class abstract_node_scenarios:
nodeaction.wait_for_unknown_status(node, timeout, self.kubecli, affected_node)
logging.info("The kubelet of the node %s has been stopped" % (node))
logging.info("stop_kubelet_scenario has been successfully injected!")
logging.info("stop_kubelet_scenario has been successfuly injected!")
except Exception as e:
logging.error(
"Failed to stop the kubelet of the node. Encountered following "
@@ -108,7 +108,7 @@ class abstract_node_scenarios:
)
nodeaction.wait_for_ready_status(node, timeout, self.kubecli,affected_node)
logging.info("The kubelet of the node %s has been restarted" % (node))
logging.info("restart_kubelet_scenario has been successfully injected!")
logging.info("restart_kubelet_scenario has been successfuly injected!")
except Exception as e:
logging.error(
"Failed to restart the kubelet of the node. Encountered following "
@@ -128,7 +128,7 @@ class abstract_node_scenarios:
"oc debug node/" + node + " -- chroot /host "
"dd if=/dev/urandom of=/proc/sysrq-trigger"
)
logging.info("node_crash_scenario has been successfully injected!")
logging.info("node_crash_scenario has been successfuly injected!")
except Exception as e:
logging.error(
"Failed to crash the node. Encountered following exception: %s. "

@@ -379,7 +379,7 @@ class aws_node_scenarios(abstract_node_scenarios):
logging.info(
"Node with instance ID: %s has been terminated" % (instance_id)
)
logging.info("node_termination_scenario has been successfully injected!")
logging.info("node_termination_scenario has been successfuly injected!")
except Exception as e:
logging.error(
"Failed to terminate node instance. Encountered following exception:"
@@ -408,7 +408,7 @@ class aws_node_scenarios(abstract_node_scenarios):
logging.info(
"Node with instance ID: %s has been rebooted" % (instance_id)
)
logging.info("node_reboot_scenario has been successfully injected!")
logging.info("node_reboot_scenario has been successfuly injected!")
except Exception as e:
logging.error(
"Failed to reboot node instance. Encountered following exception:"

@@ -229,7 +229,7 @@ class bm_node_scenarios(abstract_node_scenarios):
nodeaction.wait_for_unknown_status(node, timeout, self.kubecli, affected_node)
nodeaction.wait_for_ready_status(node, timeout, self.kubecli, affected_node)
logging.info("Node with bmc address: %s has been rebooted" % (bmc_addr))
logging.info("node_reboot_scenario has been successfully injected!")
logging.info("node_reboot_scenario has been successfuly injected!")
except Exception as e:
logging.error(
"Failed to reboot node instance. Encountered following exception:"

@@ -237,7 +237,7 @@ class docker_node_scenarios(abstract_node_scenarios):
logging.info(
"Node with container ID: %s has been terminated" % (container_id)
)
logging.info("node_termination_scenario has been successfully injected!")
logging.info("node_termination_scenario has been successfuly injected!")
except Exception as e:
logging.error(
"Failed to terminate node instance. Encountered following exception:"
@@ -264,7 +264,7 @@ class docker_node_scenarios(abstract_node_scenarios):
logging.info(
"Node with container ID: %s has been rebooted" % (container_id)
)
logging.info("node_reboot_scenario has been successfully injected!")
logging.info("node_reboot_scenario has been successfuly injected!")
except Exception as e:
logging.error(
"Failed to reboot node instance. Encountered following exception:"

@@ -309,7 +309,7 @@ class gcp_node_scenarios(abstract_node_scenarios):
logging.info(
"Node with instance ID: %s has been terminated" % instance_id
)
logging.info("node_termination_scenario has been successfully injected!")
logging.info("node_termination_scenario has been successfuly injected!")
except Exception as e:
logging.error(
"Failed to terminate node instance. Encountered following exception:"
@@ -341,7 +341,7 @@ class gcp_node_scenarios(abstract_node_scenarios):
logging.info(
"Node with instance ID: %s has been rebooted" % instance_id
)
logging.info("node_reboot_scenario has been successfully injected!")
logging.info("node_reboot_scenario has been successfuly injected!")
except Exception as e:
logging.error(
"Failed to reboot node instance. Encountered following exception:"

@@ -184,7 +184,7 @@ class openstack_node_scenarios(abstract_node_scenarios):
nodeaction.wait_for_unknown_status(node, timeout, self.kubecli, affected_node)
nodeaction.wait_for_ready_status(node, timeout, self.kubecli, affected_node)
logging.info("Node with instance name: %s has been rebooted" % (node))
logging.info("node_reboot_scenario has been successfully injected!")
logging.info("node_reboot_scenario has been successfuly injected!")
except Exception as e:
logging.error(
"Failed to reboot node instance. Encountered following exception:"
@@ -249,7 +249,7 @@ class openstack_node_scenarios(abstract_node_scenarios):
node_ip.strip(), service, ssh_private_key, timeout
)
logging.info("Service status checked on %s" % (node_ip))
logging.info("Check service status is successfully injected!")
logging.info("Check service status is successfuly injected!")
except Exception as e:
logging.error(
"Failed to check service status. Encountered following exception:"

@@ -43,7 +43,7 @@ class TimeActionsScenarioPlugin(AbstractScenarioPlugin):
cerberus.publish_kraken_status(
krkn_config, not_reset, start_time, end_time
)
except (RuntimeError, Exception) as e:
except (RuntimeError, Exception):
logging.error(
f"TimeActionsScenarioPlugin scenario {scenario} failed with exception: {e}"
)
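Review note: the log message in this hunk interpolates `e`, so the `except` variant without `as e` leaves that name unbound and the f-string itself raises `NameError` when the handler runs. `Exception` also already covers `RuntimeError`, so the tuple is redundant. A minimal, self-contained sketch of the behavior (function names are hypothetical):

```python
def message_with_binding() -> str:
    try:
        raise RuntimeError("boom")
    except Exception as e:  # Exception already includes RuntimeError
        return f"failed with exception: {e}"


def message_without_binding() -> str:
    try:
        raise RuntimeError("boom")
    except (RuntimeError, Exception):
        try:
            # `e` was never bound, so evaluating the f-string fails.
            return f"failed with exception: {e}"
        except NameError as name_error:
            return f"NameError: {name_error}"


print(message_with_binding())
print(message_without_binding())
```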

@@ -140,7 +140,7 @@ class ZoneOutageScenarioPlugin(AbstractScenarioPlugin):
network_association_ids[0], acl_id
)
# capture the original_acl_id, created_acl_id and
# capture the orginal_acl_id, created_acl_id and
# new association_id to use during the recovery
ids[new_association_id] = original_acl_id
@@ -156,7 +156,7 @@ class ZoneOutageScenarioPlugin(AbstractScenarioPlugin):
new_association_id, original_acl_id
)
logging.info(
"Waiting for 60 seconds to make sure " "the changes are in place"
"Wating for 60 seconds to make sure " "the changes are in place"
)
time.sleep(60)

@@ -1,71 +0,0 @@
import logging
import threading
from datetime import datetime, timezone
from krkn.utils.ErrorLog import ErrorLog
class ErrorCollectionHandler(logging.Handler):
"""
Custom logging handler that captures ERROR and CRITICAL level logs
in structured format for telemetry collection.
Stores logs in memory as ErrorLog objects for later retrieval.
Thread-safe for concurrent logging operations.
"""
def __init__(self, level=logging.ERROR):
"""
Initialize the error collection handler.
Args:
level: Minimum log level to capture (default: ERROR)
"""
super().__init__(level)
self.error_logs: list[ErrorLog] = []
self._lock = threading.Lock()
def emit(self, record: logging.LogRecord):
"""
Capture ERROR and CRITICAL logs and store as ErrorLog objects.
Args:
record: LogRecord from Python logging framework
"""
try:
# Only capture ERROR (40) and CRITICAL (50) levels
if record.levelno < logging.ERROR:
return
# Format timestamp as ISO 8601 UTC
timestamp = datetime.fromtimestamp(
record.created, tz=timezone.utc
).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
# Create ErrorLog object
error_log = ErrorLog(
timestamp=timestamp,
message=record.getMessage()
)
# Thread-safe append
with self._lock:
self.error_logs.append(error_log)
except Exception:
# Handler should never raise exceptions (logging best practice)
self.handleError(record)
def get_error_logs(self) -> list[dict]:
"""
Retrieve all collected error logs as list of dictionaries.
Returns:
List of error log dictionaries with timestamp and message
"""
with self._lock:
return [log.to_dict() for log in self.error_logs]
def clear(self):
"""Clear all collected error logs (useful for testing)"""
with self._lock:
self.error_logs.clear()
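Review note: for readers who want to see what the removed handler collected, here is a simplified stand-in, not the removed class itself. `ListHandler`, `iso_utc_millis`, and the logger name are hypothetical; the timestamp expression is the same one used in `emit()` above, and the locking is omitted for brevity.

```python
import logging
from datetime import datetime, timezone


def iso_utc_millis(epoch_seconds: float) -> str:
    """Format an epoch timestamp as ISO 8601 UTC with millisecond precision."""
    stamp = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # %f yields microseconds; dropping the last three digits leaves milliseconds.
    return stamp.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


class ListHandler(logging.Handler):
    """Simplified stand-in that stores ERROR and above as dictionaries."""

    def __init__(self, level=logging.ERROR):
        super().__init__(level)
        self.entries = []

    def emit(self, record: logging.LogRecord):
        self.entries.append(
            {"timestamp": iso_utc_millis(record.created), "message": record.getMessage()}
        )


logger = logging.getLogger("error_collection_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

logger.info("this record is below ERROR and is not stored")
logger.error("node %s unreachable", "worker-0")
print(handler.entries)
```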

@@ -1,18 +0,0 @@
from dataclasses import dataclass, asdict
@dataclass
class ErrorLog:
"""
Represents a single error log entry for telemetry collection.
Attributes:
timestamp: ISO 8601 formatted timestamp (UTC)
message: Full error message text
"""
timestamp: str
message: str
def to_dict(self) -> dict:
"""Convert to dictionary for JSON serialization"""
return asdict(self)

@@ -1,4 +1,2 @@
from .TeeLogHandler import TeeLogHandler
from .ErrorLog import ErrorLog
from .ErrorCollectionHandler import ErrorCollectionHandler
from .functions import *

@@ -6,6 +6,7 @@ azure-identity==1.16.1
azure-keyvault==4.2.0
azure-mgmt-compute==30.5.0
azure-mgmt-network==27.0.0
itsdangerous==2.0.1
coverage==7.6.12
datetime==5.4
docker>=6.0,<7.0 # docker 7.0+ has breaking changes with Unix sockets
@@ -15,7 +16,7 @@ google-cloud-compute==1.22.0
ibm_cloud_sdk_core==3.18.0
ibm_vpc==0.20.0
jinja2==3.1.6
krkn-lib==6.0.1
krkn-lib==5.1.13
lxml==5.1.0
kubernetes==34.1.0
numpy==1.26.4
@@ -32,8 +33,9 @@ requests-unixsocket>=0.4.0 # Required for Docker Unix socket support
service_identity==24.1.0
PyYAML==6.0.1
setuptools==78.1.1
wheel>=0.44.0
zope.interface==6.1
werkzeug==3.1.4
wheel==0.42.0
zope.interface==5.4.0
git+https://github.com/vmware/vsphere-automation-sdk-python.git@v8.0.0.0
cryptography>=42.0.4 # not directly required, pinned by Snyk to avoid a vulnerability

@@ -27,7 +27,7 @@ from krkn_lib.models.telemetry import ChaosRunTelemetry
from krkn_lib.utils import SafeLogger
from krkn_lib.utils.functions import get_yaml_item_value, get_junit_test_case
from krkn.utils import TeeLogHandler, ErrorCollectionHandler
from krkn.utils import TeeLogHandler
from krkn.utils.HealthChecker import HealthChecker
from krkn.utils.VirtChecker import VirtChecker
from krkn.scenario_plugins.scenario_plugin_factory import (
@@ -425,22 +425,16 @@ def main(options, command: Optional[str]) -> int:
logging.info("collecting Kubernetes cluster metadata....")
telemetry_k8s.collect_cluster_metadata(chaos_telemetry)
# Collect error logs from handler
error_logs = error_collection_handler.get_error_logs()
if error_logs:
logging.info(f"Collected {len(error_logs)} error logs for telemetry")
chaos_telemetry.error_logs = error_logs
else:
logging.info("No error logs collected during chaos run")
chaos_telemetry.error_logs = []
telemetry_json = chaos_telemetry.to_json()
decoded_chaos_run_telemetry = ChaosRunTelemetry(json.loads(telemetry_json))
chaos_output.telemetry = decoded_chaos_run_telemetry
logging.info(f"Chaos data:\n{chaos_output.to_json()}")
if enable_elastic:
elastic_telemetry = ElasticChaosRunTelemetry(
chaos_run_telemetry=decoded_chaos_run_telemetry
)
result = elastic_search.push_telemetry(
decoded_chaos_run_telemetry, elastic_telemetry_index
elastic_telemetry, elastic_telemetry_index
)
if result == -1:
safe_logger.error(
@@ -652,13 +646,10 @@ if __name__ == "__main__":
# If no command or regular execution, continue with existing logic
report_file = options.output
tee_handler = TeeLogHandler()
error_collection_handler = ErrorCollectionHandler(level=logging.ERROR)
handlers = [
logging.FileHandler(report_file, mode="w"),
logging.StreamHandler(),
tee_handler,
error_collection_handler,
]
logging.basicConfig(

@@ -43,15 +43,14 @@ class TestGCP(unittest.TestCase):
def setUp(self):
"""Set up test fixtures"""
# Mock google.auth before creating GCP instance
self.auth_patcher = patch('krkn.scenario_plugins.node_actions.gcp_node_scenarios.google.auth.default')
self.compute_patcher = patch('krkn.scenario_plugins.node_actions.gcp_node_scenarios.compute_v1.InstancesClient')
self.auth_patcher = patch('google.auth.default')
self.compute_patcher = patch('google.cloud.compute_v1.InstancesClient')
self.mock_auth = self.auth_patcher.start()
self.mock_compute_client = self.compute_patcher.start()
# Configure auth mock to return credentials and project_id
mock_credentials = MagicMock()
self.mock_auth.return_value = (mock_credentials, 'test-project-123')
self.mock_auth.return_value = (MagicMock(), 'test-project-123')
# Create GCP instance with mocked dependencies
self.gcp = GCP()
@@ -68,7 +67,7 @@ class TestGCP(unittest.TestCase):
def test_gcp_init_failure(self):
"""Test GCP class initialization failure"""
with patch('krkn.scenario_plugins.node_actions.gcp_node_scenarios.google.auth.default', side_effect=Exception("Auth error")):
with patch('google.auth.default', side_effect=Exception("Auth error")):
with self.assertRaises(Exception):
GCP()

@@ -1,31 +0,0 @@
# ⚠️ DEPRECATED
This directory is **no longer actively maintained** and will not accept new changes.
## Migration Notice
All development efforts have been moved to:
**[github.com/krkn-chaos/krkn-ai](https://github.com/krkn-chaos/krkn-ai)**
## What This Means
- ❌ No new features will be added here
- ❌ Bug fixes will not be accepted
- ❌ Pull requests will be closed and redirected
- Existing code remains for historical reference only
## Next Steps
If you're looking to:
- **Use** chaos engineering AI features → Visit [krkn-chaos/krkn-ai](https://github.com/krkn-chaos/krkn-ai)
- **Contribute** improvements → Submit to [krkn-chaos/krkn-ai](https://github.com/krkn-chaos/krkn-ai)
- **Report issues** → Open issues at [krkn-chaos/krkn-ai](https://github.com/krkn-chaos/krkn-ai/issues)
## Questions?
Please visit the new repository for documentation, examples, and community support.
---
**Last Updated:** January 2026

@@ -1,17 +1,3 @@
# ⚠️ DEPRECATED - This project has moved
> **All development has moved to [github.com/krkn-chaos/krkn-ai](https://github.com/krkn-chaos/krkn-ai)**
>
> This directory is no longer maintained. Please visit the new repository for:
> - Latest features and updates
> - Active development and support
> - Bug fixes and improvements
> - Documentation and examples
>
> See [../README.md](../README.md) for more information.
---
# aichaos
Enhancing Chaos Engineering with AI-assisted fault injection for better resiliency and non-functional testing.

@@ -3,9 +3,8 @@ pandas
notebook
jupyterlab
jupyter
seaborn==0.13.2
seaborn
requests
wheel
Flask==2.2.5
Flask==2.1.0
flasgger==0.9.5
pillow==10.3.0

@@ -1,17 +1,3 @@
# ⚠️ DEPRECATED - This project has moved
> **All development has moved to [github.com/krkn-chaos/krkn-ai](https://github.com/krkn-chaos/krkn-ai)**
>
> This directory is no longer maintained. Please visit the new repository for:
> - Latest features and updates
> - Active development and support
> - Bug fixes and improvements
> - Documentation and examples
>
> See [../README.md](../README.md) for more information.
---
# Chaos Recommendation Tool
This tool, designed for Redhat Kraken, operates through the command line and offers recommendations for chaos testing. It suggests probable chaos test cases that can disrupt application services by analyzing their behavior and assessing their susceptibility to specific fault types.