# CLAUDE.md - Krkn Chaos Engineering Framework

## Project Overview
Krkn (Kraken) is a chaos engineering tool for Kubernetes/OpenShift clusters. It injects deliberate failures to validate cluster resilience, using a plugin-based architecture with multi-cloud support (AWS, Azure, GCP, IBM Cloud, VMware, Alibaba, OpenStack).
## Repository Structure

```
krkn/
├── krkn/
│   ├── scenario_plugins/   # Chaos scenario plugins (pod, node, network, hogs, etc.)
│   ├── utils/              # Utility functions
│   ├── rollback/           # Rollback management
│   ├── prometheus/         # Prometheus integration
│   └── cerberus/           # Health monitoring
├── tests/                  # Unit tests (unittest framework)
├── scenarios/              # Example scenario configs (openshift/, kube/, kind/)
├── config/                 # Configuration files
└── CI/                     # CI/CD test scripts
```
## Quick Start

```bash
# Setup (ALWAYS use a virtual environment)
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run Krkn
python run_kraken.py --config config/config.yaml
# Note: scenarios are specified in config.yaml under kraken.chaos_scenarios.
# There is no --scenario flag; edit config/config.yaml to select scenarios.

# Run tests
python -m unittest discover -s tests -v
python -m coverage run -a -m unittest discover -s tests -v
```
## Critical Requirements

### Python Environment
- Python 3.9+ required
- NEVER install packages globally - always use a virtual environment
- CRITICAL: `docker` must be <7.0 and `requests` must be <2.32 (Unix socket compatibility)
### Key Dependencies
- `krkn-lib` (5.1.13): Core library for Kubernetes/OpenShift operations
- `kubernetes` (34.1.0): Kubernetes Python client
- `docker` (<7.0), `requests` (<2.32): DO NOT upgrade without verifying compatibility
- Cloud SDKs: `boto3` (AWS), `azure-mgmt-*` (Azure), `google-cloud-compute` (GCP), `ibm_vpc` (IBM), `pyVmomi` (VMware)
## Plugin Architecture (CRITICAL)
Naming conventions are strictly enforced.

### Naming Rules
- Module files: Must end with `_scenario_plugin.py` and use snake_case
  - Example: `pod_disruption_scenario_plugin.py`
- Class names: Must be CamelCase and end with `ScenarioPlugin`
  - Example: `PodDisruptionScenarioPlugin`
  - Must match the module filename (snake_case ↔ CamelCase; see the sketch after this list)
- Directory structure: Plugin directories CANNOT contain "scenario" or "plugin"
  - Location: `krkn/scenario_plugins/<plugin_name>/`
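The filename ↔ class name pairing amounts to a snake_case-to-CamelCase conversion. A minimal sketch of that check, with a hypothetical helper name (the actual validation lives in krkn's plugin factory and may differ):

```python
def expected_class_name(module_filename: str) -> str:
    """Hypothetical helper: derive the CamelCase class name expected
    for a snake_case plugin module."""
    stem = module_filename.removesuffix(".py")
    return "".join(part.capitalize() for part in stem.split("_"))


# pod_disruption_scenario_plugin.py must define PodDisruptionScenarioPlugin
assert expected_class_name("pod_disruption_scenario_plugin.py") == "PodDisruptionScenarioPlugin"
```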
### Plugin Implementation
Every plugin MUST:
- Extend `AbstractScenarioPlugin`
- Implement the `run()` method
- Implement the `get_scenario_types()` method
```python
from krkn.scenario_plugins import AbstractScenarioPlugin


class PodDisruptionScenarioPlugin(AbstractScenarioPlugin):
    def run(self, config, scenarios_list, kubeconfig_path, wait_duration):
        pass

    def get_scenario_types(self):
        return ["pod_scenarios", "pod_outage"]
```
### Creating a New Plugin
- Create directory: `krkn/scenario_plugins/<plugin_name>/`
- Create module: `<plugin_name>_scenario_plugin.py`
- Create class: `<PluginName>ScenarioPlugin` extending `AbstractScenarioPlugin`
- Implement `run()` and `get_scenario_types()`
- Create unit test: `tests/test_<plugin_name>_scenario_plugin.py`
- Add example scenario: `scenarios/<platform>/<scenario>.yaml`

DO NOT: violate naming conventions (the factory will reject the plugin), include "scenario"/"plugin" in directory names, or create plugins without tests.
## Testing

### Unit Tests

```bash
# Run all tests
python -m unittest discover -s tests -v

# Specific test
python -m unittest tests.test_pod_disruption_scenario_plugin

# With coverage
python -m coverage run -a -m unittest discover -s tests -v
python -m coverage html
```
Test requirements:
- Naming: `test_<module>_scenario_plugin.py`
- Mock external dependencies (Kubernetes API, cloud providers)
- Test success, failure, and edge cases
- Keep tests isolated and independent
### Functional Tests
Located in `CI/tests/`. They can be run locally on a kind cluster with Prometheus and Elasticsearch set up.

Setup for local testing:
- Deploy Prometheus and Elasticsearch on your kind cluster:
  - Prometheus setup: https://krkn-chaos.dev/docs/developers-guide/testing-changes/#prometheus
  - Elasticsearch setup: https://krkn-chaos.dev/docs/developers-guide/testing-changes/#elasticsearch
- Or disable monitoring features in `config/config.yaml`:

```yaml
performance_monitoring:
  enable_alerts: False
  enable_metrics: False
  check_critical_alerts: False
```

Note: Functional tests run automatically in CI with full monitoring enabled.
## Cloud Provider Implementations
Node chaos scenarios are cloud-specific. Each provider lives in `krkn/scenario_plugins/node_actions/<provider>_node_scenarios.py`:
- AWS, Azure, GCP, IBM Cloud, VMware, Alibaba, OpenStack, Bare Metal

Each implementation covers stopping, starting, rebooting, and terminating instances.

When modifying a provider: maintain consistency with the other providers, handle API errors, add logging, and update tests.
### Adding Cloud Provider Support
- Create: `krkn/scenario_plugins/node_actions/<provider>_node_scenarios.py`
- Extend: `abstract_node_scenarios.AbstractNodeScenarios`
- Implement: `stop_instances`, `start_instances`, `reboot_instances`, `terminate_instances` (see the sketch after this list)
- Add the SDK to `requirements.txt`
- Create a unit test with a mocked SDK
- Add an example scenario: `scenarios/openshift/<provider>_node_scenarios.yml`
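A minimal sketch of the four operations for a hypothetical AWS-style provider, using boto3 EC2 calls purely as an illustration. In krkn the class would extend `abstract_node_scenarios.AbstractNodeScenarios` as listed above; the base class and real method signatures are not reproduced here and may differ:

```python
import logging

import boto3


class ExampleAwsNodeScenarios:  # hypothetical name; real providers extend the abstract base
    def __init__(self, region: str):
        self.ec2 = boto3.client("ec2", region_name=region)

    def stop_instances(self, instance_ids):
        logging.info("Stopping instances %s", instance_ids)
        self.ec2.stop_instances(InstanceIds=instance_ids)

    def start_instances(self, instance_ids):
        logging.info("Starting instances %s", instance_ids)
        self.ec2.start_instances(InstanceIds=instance_ids)

    def reboot_instances(self, instance_ids):
        logging.info("Rebooting instances %s", instance_ids)
        self.ec2.reboot_instances(InstanceIds=instance_ids)

    def terminate_instances(self, instance_ids):
        logging.info("Terminating instances %s", instance_ids)
        self.ec2.terminate_instances(InstanceIds=instance_ids)
```

Whatever the provider, keep error handling and logging consistent with the existing implementations.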
## Configuration
Main config: `config/config.yaml`
- `kraken`: Core settings
- `cerberus`: Health monitoring
- `performance_monitoring`: Prometheus
- `elastic`: Elasticsearch telemetry

Scenario configs live in the `scenarios/` directory:

```yaml
- config:
    scenario_type: <type>  # Must match the plugin's get_scenario_types()
```
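A hedged sketch of how `scenario_type` ties a scenario file to a plugin, assuming only the two-line shape shown above (the actual matching is done inside krkn's plugin factory):

```python
import yaml

scenario_yaml = """
- config:
    scenario_type: pod_scenarios
"""

scenarios = yaml.safe_load(scenario_yaml)
scenario_type = scenarios[0]["config"]["scenario_type"]

# The scenario is routed to a plugin whose get_scenario_types() contains this value
# (compare with the PodDisruptionScenarioPlugin example above).
assert scenario_type in ["pod_scenarios", "pod_outage"]
```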
## Code Style
- Import order: standard library, third-party, local imports
- Naming: snake_case (functions/variables), CamelCase (classes)
- Logging: use Python's `logging` module (illustrated below)
- Error handling: return appropriate exit codes
- Docstrings: required for public functions/classes
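A short illustration of these conventions; the function name and file handling are made up for the example:

```python
# Standard library imports first
import logging
import os

# Third-party imports next
import yaml


def load_scenario_config(path: str):
    """Load a scenario YAML file and return its parsed contents."""
    logging.info("Loading scenario config from %s", path)
    if not os.path.exists(path):
        logging.error("Scenario config not found: %s", path)
        raise FileNotFoundError(path)
    with open(path) as f:
        return yaml.safe_load(f)
```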
## Exit Codes
Krkn uses specific exit codes to communicate execution status:
- `0`: Success - all scenarios passed, no critical alerts
- `1`: Scenario failure - one or more scenarios failed
- `2`: Critical alerts fired during execution
- `3+`: Health check failure (Cerberus monitoring detected issues)

When implementing scenarios:
- Return `0` on success
- Return `1` on scenario-specific failures (see the sketch after this list)
- Propagate health check failures appropriately
- Log exit code reasons clearly
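A hedged sketch of mapping scenario outcomes to these codes; the function name is hypothetical and krkn's runner may handle return values differently:

```python
import logging


def run_example_scenario(scenario_config: dict) -> int:
    """Illustrative only: return 0 on success, 1 on a scenario-specific failure."""
    try:
        # ... inject the failure and verify the cluster recovers here ...
        logging.info("Scenario succeeded, exit code 0")
        return 0
    except Exception as exc:
        logging.error("Scenario failed, exit code 1: %s", exc)
        return 1
```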
## Container Support
Krkn can run inside a container. See the `containers/` directory.

Building a custom image:

```bash
cd containers
./compile_dockerfile.sh  # Generates Dockerfile from template
docker build -t krkn:latest .
```

Running containerized:

```bash
docker run -v ~/.kube:/root/.kube:Z \
  -v $(pwd)/config:/config:Z \
  -v $(pwd)/scenarios:/scenarios:Z \
  krkn:latest
```
## Git Workflow
- NEVER commit directly to main
- NEVER use `--force` without approval
- ALWAYS create feature branches: `git checkout -b feature/description`
- ALWAYS run tests before pushing

Conventional commits: `feat:`, `fix:`, `test:`, `docs:`, `refactor:`

```bash
git checkout main && git pull origin main
git checkout -b feature/your-feature-name
# Make changes, write tests
python -m unittest discover -s tests -v
git add <specific-files>
git commit -m "feat: description"
git push -u origin feature/your-feature-name
```
## Environment Variables
- `KUBECONFIG`: Path to kubeconfig
- `AWS_*`, `AZURE_*`, `GOOGLE_APPLICATION_CREDENTIALS`: Cloud credentials
- `PROMETHEUS_URL`, `ELASTIC_URL`, `ELASTIC_PASSWORD`: Monitoring config

NEVER commit credentials or API keys.
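A minimal sketch of reading these variables with `os.environ`; whether krkn reads them directly or through its config is not specified here, so treat the default values as assumptions:

```python
import os

kubeconfig = os.environ.get("KUBECONFIG", os.path.expanduser("~/.kube/config"))
prometheus_url = os.environ.get("PROMETHEUS_URL")      # monitoring endpoints are optional
elastic_url = os.environ.get("ELASTIC_URL")
elastic_password = os.environ.get("ELASTIC_PASSWORD")  # never hard-code or commit this value
```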
## Common Pitfalls
- Missing virtual environment - always activate venv
- Running functional tests without cluster setup
- Ignoring exit codes
- Modifying krkn-lib directly (it's a separate package)
- Upgrading docker/requests beyond version constraints
## Before Writing Code
- Check for existing implementations
- Review existing plugins as examples
- Maintain consistency with cloud provider patterns
- Plan rollback logic
- Write tests alongside code
- Update documentation
## When Adding Dependencies
- Check if functionality exists in krkn-lib or current dependencies
- Verify compatibility with existing versions
- Pin specific versions in `requirements.txt`
- Check for security vulnerabilities
- Test thoroughly for conflicts
## Common Development Tasks

### Modifying Existing Plugin
- Read the plugin code and its corresponding test
- Make changes
- Update/add unit tests
- Run: `python -m unittest tests.test_<plugin>_scenario_plugin`
### Writing Unit Tests
- Create: `tests/test_<module>_scenario_plugin.py`
- Import `unittest` and the plugin class
- Mock external dependencies (see the sketch after this list)
- Test success, failure, and edge cases
- Run: `python -m unittest tests.test_<module>_scenario_plugin`
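A minimal skeleton of such a test; the plugin import is omitted, and the mocked `kube_client` with its `list_pods` method is hypothetical, standing in for whatever Kubernetes or cloud client the plugin actually uses:

```python
import unittest
from unittest.mock import MagicMock


class TestExampleScenarioPlugin(unittest.TestCase):
    """Skeleton only: replace the mock with the plugin's real dependencies."""

    def setUp(self):
        # In a real test, instantiate the plugin and swap its external
        # dependencies (Kubernetes API, cloud SDKs) for mocks.
        self.kube_client = MagicMock()
        self.kube_client.list_pods.return_value = ["pod-a", "pod-b"]

    def test_success_path(self):
        # Exercise the happy path against the mocked client
        self.assertEqual(self.kube_client.list_pods("default"), ["pod-a", "pod-b"])

    def test_failure_path(self):
        # Simulate an API error to cover the failure path
        self.kube_client.list_pods.side_effect = RuntimeError("API unavailable")
        with self.assertRaises(RuntimeError):
            self.kube_client.list_pods("default")


if __name__ == "__main__":
    unittest.main()
```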