feat: Load tests

TheiLLeniumStudios
2026-01-06 11:03:26 +01:00
parent a5d1012570
commit 9a3edf13d2
17 changed files with 6191 additions and 82 deletions

544
test/loadtest/README.md Normal file

@@ -0,0 +1,544 @@
# Reloader Load Test Framework
This framework provides A/B comparison testing between two Reloader container images.
## Overview
The load test framework:
1. Creates a local kind cluster (1 control-plane + 6 worker nodes)
2. Deploys Prometheus for metrics collection
3. Loads the provided Reloader container images into the cluster
4. Runs standardized test scenarios (S1-S13)
5. Collects metrics via Prometheus scraping
6. Generates comparison reports with pass/fail criteria
## Prerequisites
- Docker or Podman
- kind (Kubernetes in Docker)
- kubectl
- Go 1.22+
## Building
```bash
cd test/loadtest
go build -o loadtest ./cmd/loadtest
```
## Quick Start
```bash
# Compare two published images (e.g., different versions)
./loadtest run \
--old-image=stakater/reloader:v1.0.0 \
--new-image=stakater/reloader:v1.1.0
# Run a specific scenario
./loadtest run \
--old-image=stakater/reloader:v1.0.0 \
--new-image=stakater/reloader:v1.1.0 \
--scenario=S2 \
--duration=120
# Test only a single image (no comparison)
./loadtest run --new-image=myregistry/reloader:dev
# Use local images built with docker/podman
./loadtest run \
--old-image=localhost/reloader:baseline \
--new-image=localhost/reloader:feature-branch
# Skip cluster creation (use existing kind cluster)
./loadtest run \
--old-image=stakater/reloader:v1.0.0 \
--new-image=stakater/reloader:v1.1.0 \
--skip-cluster
# Run all scenarios in parallel on 4 clusters (faster execution)
./loadtest run \
--new-image=localhost/reloader:dev \
--parallelism=4
# Run all 13 scenarios in parallel (one cluster per scenario)
./loadtest run \
--new-image=localhost/reloader:dev \
--parallelism=13
# Generate report from existing results
./loadtest report --scenario=S2 --results-dir=./results
```
## Command Line Options
### Run Command
| Option | Description | Default |
|--------|-------------|---------|
| `--old-image=IMAGE` | Container image for "old" version | - |
| `--new-image=IMAGE` | Container image for "new" version | - |
| `--scenario=ID` | Test scenario: S1-S13 or "all" | all |
| `--duration=SECONDS` | Test duration in seconds | 60 |
| `--parallelism=N` | Run N scenarios in parallel on N kind clusters | 1 |
| `--skip-cluster` | Skip kind cluster creation (use existing, only for parallelism=1) | false |
| `--results-dir=DIR` | Directory for results | ./results |
**Note:** At least one of `--old-image` or `--new-image` is required. Provide both for A/B comparison.
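
The constraints in the option table and the note above can be sketched as a small validation step. This is an illustration only, not the framework's actual code: the `runOptions` struct, the `validate` and `isComparison` helpers, and the error messages are hypothetical, and the lower bound on `--parallelism` is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// runOptions is a hypothetical struct mirroring the documented run flags.
type runOptions struct {
	OldImage    string
	NewImage    string
	Parallelism int
	SkipCluster bool
}

// validate applies the documented constraints on the run command.
func validate(o runOptions) error {
	if o.OldImage == "" && o.NewImage == "" {
		return errors.New("at least one of --old-image or --new-image is required")
	}
	// Assumption: a parallelism below 1 is rejected rather than silently corrected.
	if o.Parallelism < 1 {
		return errors.New("--parallelism must be at least 1")
	}
	if o.SkipCluster && o.Parallelism > 1 {
		return errors.New("--skip-cluster is only supported with --parallelism=1")
	}
	return nil
}

// isComparison reports whether both images were given (A/B mode).
func isComparison(o runOptions) bool {
	return o.OldImage != "" && o.NewImage != ""
}

func main() {
	opts := runOptions{NewImage: "localhost/reloader:dev", Parallelism: 1}
	fmt.Println("error:", validate(opts), "comparison:", isComparison(opts))
}
```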
### Report Command
| Option | Description | Default |
|--------|-------------|---------|
| `--scenario=ID` | Scenario to report on (required) | - |
| `--results-dir=DIR` | Directory containing results | ./results |
| `--output=FILE` | Output file (default: stdout) | - |
## Test Scenarios
| ID | Name | Description |
|-----|-----------------------|-------------------------------------------------|
| S1 | Burst Updates | Many ConfigMap/Secret updates in quick succession |
| S2 | Fan-Out | One ConfigMap used by many (50) workloads |
| S3 | High Cardinality | Many CMs/Secrets across many namespaces |
| S4 | No-Op Updates | Updates that don't change data (annotation only)|
| S5 | Workload Churn | Deployments created/deleted rapidly |
| S6 | Controller Restart | Restart controller pod under load |
| S7 | API Pressure | Many concurrent update requests |
| S8 | Large Objects | ConfigMaps > 100KB |
| S9 | Multi-Workload Types | Tests all workload types (Deploy, STS, DS) |
| S10 | Secrets + Mixed | Secrets and mixed ConfigMap+Secret workloads |
| S11 | Annotation Strategy | Tests `--reload-strategy=annotations` |
| S12 | Pause & Resume | Tests pause-period during rapid updates |
| S13 | Complex References | Init containers, valueFrom, projected volumes |
## Metrics Reference
This section explains each metric collected during load tests, what it measures, and what different values might indicate.
### Counter Metrics (Totals)
#### `reconcile_total`
**What it measures:** The total number of reconciliation loops executed by the controller.
**What it indicates:**
- **Higher in new vs old:** The new controller-runtime implementation may batch events differently. On its own this is often expected behavior, not a problem.
- **Lower in new vs old:** Better event batching/deduplication. Controller-runtime's work queue naturally deduplicates events.
- **Expected behavior:** The new implementation typically has *fewer* reconciles because the work queue deduplicates events.
#### `action_total`
**What it measures:** The total number of reload actions triggered (rolling restarts of Deployments/StatefulSets/DaemonSets).
**What it indicates:**
- **Should match expected value:** Both implementations should trigger the same number of reloads for the same workload.
- **Lower than expected:** Some updates were missed - potential bug or race condition.
- **Higher than expected:** Duplicate reloads triggered - inefficiency but not data loss.
#### `reload_executed_total`
**What it measures:** Reload operations executed, labeled by outcome (`success=true/false`).
**What it indicates:**
- **`success=true` count:** Number of workloads successfully restarted.
- **`success=false` count:** Failed restart attempts (API errors, permission issues).
- **Should match `action_total`:** If the `success=true` count is significantly lower, reloads are failing.
#### `workloads_scanned_total`
**What it measures:** Number of workloads (Deployments, etc.) scanned when checking for ConfigMap/Secret references.
**What it indicates:**
- **High count:** Controller is scanning many workloads per reconcile.
- **Expected behavior:** Should roughly match the number of workloads × number of reconciles.
- **Optimization signal:** If very high, namespace filtering or label selectors could help.
#### `workloads_matched_total`
**What it measures:** Number of workloads that matched (reference the changed ConfigMap/Secret).
**What it indicates:**
- **Should match `reload_executed_total`:** Every matched workload should be reloaded.
- **Higher than reloads:** Some matched workloads weren't reloaded (potential issue).
#### `errors_total`
**What it measures:** Total errors encountered, labeled by error type.
**What it indicates:**
- **Should be 0:** Any errors indicate problems.
- **Common causes:** API server timeouts, RBAC issues, resource conflicts.
- **Critical metric:** Non-zero errors in production should be investigated.
### API Efficiency Metrics (REST Client)
These metrics track Kubernetes API server calls made by Reloader. Lower values indicate more efficient operation with less API server load.
#### `rest_client_requests_total`
**What it measures:** Total number of HTTP requests made to the Kubernetes API server.
**What it indicates:**
- **Lower is better:** Fewer API calls means less load on the API server.
- **High count:** May indicate inefficient caching or excessive reconciles.
- **Comparison use:** Shows overall API efficiency between implementations.
#### `rest_client_requests_get`
**What it measures:** Number of GET requests (fetching individual resources or listings).
**What it indicates:**
- **Includes:** Fetching ConfigMaps, Secrets, Deployments, etc.
- **Higher count:** More frequent resource fetching, possibly due to cache misses.
- **Expected behavior:** Controller-runtime's caching should reduce GET requests compared to direct API calls.
#### `rest_client_requests_patch`
**What it measures:** Number of PATCH requests (partial updates to resources).
**What it indicates:**
- **Used for:** Rolling restart annotations on workloads.
- **Should correlate with:** `reload_executed_total` - each reload typically requires one PATCH.
- **Lower is better:** Fewer patches means more efficient batching or deduplication.
#### `rest_client_requests_put`
**What it measures:** Number of PUT requests (full resource updates).
**What it indicates:**
- **Used for:** Full object replacements (less common than PATCH).
- **Should be low:** Most updates use PATCH for efficiency.
- **High count:** May indicate suboptimal update strategy.
#### `rest_client_requests_errors`
**What it measures:** Number of failed API requests (4xx/5xx responses).
**What it indicates:**
- **Should be 0:** Errors indicate API server issues or permission problems.
- **Common causes:** Rate limiting, RBAC issues, resource conflicts, network issues.
- **Non-zero:** Investigate API server logs and Reloader permissions.
### Latency Metrics (Percentiles)
All latency metrics are reported in **seconds**. The report shows p50 (median), p95, and p99 percentiles.
#### `reconcile_duration (s)`
**What it measures:** Time spent inside each reconcile loop, from start to finish.
**What it indicates:**
- **p50 (median):** Typical reconcile time. Should be < 0.1s (100ms) for good performance.
- **p95:** 95th percentile - only 5% of reconciles take longer than this.
- **p99:** 99th percentile - indicates worst-case performance.
**Interpreting differences:**
- **New higher than old:** Controller-runtime reconciles may do more work per loop but run fewer times. Check `reconcile_total` - if it's lower, this is expected.
- **Minor differences (< 0.5s absolute):** Not significant for sub-second values.
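
To make the percentile terms concrete, here is a minimal sketch that computes p50, p95, and p99 from raw duration samples using the nearest-rank method. It is illustrative only: the `percentile` helper and the sample values are hypothetical, and the framework itself collects these values through Prometheus rather than from raw samples.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of samples.
// It returns NaN for an empty input.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return math.NaN()
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	// Hypothetical reconcile durations in seconds, with one slow outlier.
	durations := []float64{0.01, 0.02, 0.02, 0.03, 0.05, 0.04, 0.02, 0.90, 0.03, 0.02}
	for _, p := range []float64{50, 95, 99} {
		fmt.Printf("p%.0f = %.2fs\n", p, percentile(durations, p))
	}
}
```

A single slow reconcile barely moves the median but dominates p95 and p99, which is why the report shows all three.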
#### `action_latency (s)`
**What it measures:** End-to-end time from ConfigMap/Secret change detection to workload restart triggered.
**What it indicates:**
- **This is the user-facing latency:** How long users wait for their config changes to take effect.
- **p50 < 1s:** Excellent - most changes apply within a second.
- **p95 < 5s:** Good - even under load, changes apply quickly.
- **p99 > 10s:** May need investigation - some changes take too long.
**What affects this:**
- API server responsiveness
- Number of workloads to scan
- Concurrent updates competing for resources
### Understanding the Report
#### Report Columns
```
Metric Old New Expected Old✓ New✓ Status
------ --- --- -------- ---- ---- ------
action_total 100.00 100.00 100 ✓ ✓ pass
action_latency_p95 (s) 0.15 0.04 - - - pass
```
- **Old/New:** Measured values from each implementation
- **Expected:** Known expected value (for throughput metrics)
- **Old✓/New✓:** Whether the value is within 15% of expected (✓ = yes, ✗ = no, - = no expected value)
- **Status:** pass/fail based on comparison thresholds
#### Pass/Fail Logic
| Metric Type | Pass Condition |
|-------------|----------------|
| Throughput (action_total, reload_executed_total) | New value within 15% of expected |
| Latency (p50, p95, p99) | New is no more than the percentage threshold worse than old, OR the absolute difference is below the minimum threshold |
| Errors | New ≤ Old (ideally both 0) |
| API Efficiency (rest_client_requests_*) | New is not more than 50% higher than Old (lower is better) |
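
The non-latency rows of the table can be sketched as follows. This is a simplified illustration under stated assumptions, not the report generator's code: the function names are hypothetical, and treating an expected value of 0 as "must be exactly 0" is an assumption made here to avoid a division by zero.

```go
package main

import (
	"fmt"
	"math"
)

// withinTolerance reports whether got is within tol (a fraction such as 0.15)
// of expected. Assumption: an expected value of 0 requires got to be exactly 0.
func withinTolerance(got, expected, tol float64) bool {
	if expected == 0 {
		return got == 0
	}
	return math.Abs(got-expected) <= tol*math.Abs(expected)
}

// throughputPasses: the new value must be within 15% of the expected value.
func throughputPasses(newVal, expected float64) bool {
	return withinTolerance(newVal, expected, 0.15)
}

// errorsPass: the new error count must not exceed the old one.
func errorsPass(oldVal, newVal float64) bool {
	return newVal <= oldVal
}

// apiEfficiencyPasses: the new request count may be at most 50% higher than the old one.
func apiEfficiencyPasses(oldVal, newVal float64) bool {
	return newVal <= oldVal*1.5
}

func main() {
	fmt.Println("action_total:", throughputPasses(100, 100))
	fmt.Println("errors_total:", errorsPass(0, 0))
	fmt.Println("rest_client_requests_total:", apiEfficiencyPasses(850, 720))
}
```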
#### Latency Thresholds
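Latency comparisons use both a percentage AND an absolute threshold to avoid false failures. A metric fails only when the new value is worse by more than the percentage limit and the absolute difference is at least the minimum shown below: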
| Metric | Max % Worse | Min Absolute Diff |
|--------|-------------|-------------------|
| p50 | 100% | 0.5s |
| p95 | 100% | 1.0s |
| p99 | 100% | 1.0s |
**Example:** If old p50 = 0.01s and new p50 = 0.08s:
- Percentage difference: +700% (would fail % check)
- Absolute difference: 0.07s (< 0.5s threshold)
- **Result: PASS** (both values are fast enough that the difference doesn't matter)
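
The combined check can be sketched like this. It is a minimal illustration using the threshold values from the table above; the `latencyThreshold` type and `latencyPasses` function are hypothetical names, not the framework's API.

```go
package main

import "fmt"

// latencyThreshold holds the two limits from the thresholds table.
type latencyThreshold struct {
	MaxPctWorse float64 // maximum allowed regression, in percent
	MinAbsDiff  float64 // differences below this many seconds never fail
}

// thresholds copies the documented values for each percentile.
var thresholds = map[string]latencyThreshold{
	"p50": {MaxPctWorse: 100, MinAbsDiff: 0.5},
	"p95": {MaxPctWorse: 100, MinAbsDiff: 1.0},
	"p99": {MaxPctWorse: 100, MinAbsDiff: 1.0},
}

// latencyPasses returns true unless the new latency is worse than the old one
// by both more than the percentage limit and at least the absolute minimum.
func latencyPasses(oldVal, newVal float64, t latencyThreshold) bool {
	diff := newVal - oldVal
	if diff <= 0 {
		return true // new is not slower than old
	}
	if diff < t.MinAbsDiff {
		return true // too small in absolute terms to matter
	}
	// If oldVal is 0, the division yields +Inf and the check fails.
	return diff/oldVal*100 <= t.MaxPctWorse
}

func main() {
	// The worked example: +700% but only 0.07s slower.
	fmt.Println("p50 0.01s -> 0.08s passes:", latencyPasses(0.01, 0.08, thresholds["p50"]))
	// A regression that is large in both relative and absolute terms.
	fmt.Println("p95 2.0s -> 4.5s passes:", latencyPasses(2.0, 4.5, thresholds["p95"]))
}
```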
### Resource Consumption Metrics
These metrics track CPU, memory, and Go runtime resource usage. Lower values generally indicate more efficient operation.
#### Memory Metrics
| Metric | Description | Unit |
|--------|-------------|------|
| `memory_rss_mb_avg` | Average RSS (resident set size) memory | MB |
| `memory_rss_mb_max` | Peak RSS memory during test | MB |
| `memory_heap_mb_avg` | Average Go heap allocation | MB |
| `memory_heap_mb_max` | Peak Go heap allocation | MB |
**What to watch for:**
- **High RSS:** May indicate memory leaks or inefficient caching
- **High heap:** Many objects being created (check GC metrics)
- **Growing over time:** Potential memory leak
#### CPU Metrics
| Metric | Description | Unit |
|--------|-------------|------|
| `cpu_cores_avg` | Average CPU usage rate | cores |
| `cpu_cores_max` | Peak CPU usage rate | cores |
**What to watch for:**
- **High CPU:** Inefficient algorithms or excessive reconciles
- **Spiky max:** May indicate burst handling issues
#### Go Runtime Metrics
| Metric | Description | Unit |
|--------|-------------|------|
| `goroutines_avg` | Average goroutine count | count |
| `goroutines_max` | Peak goroutine count | count |
| `gc_pause_p99_ms` | 99th percentile GC pause time | ms |
**What to watch for:**
- **High goroutines:** Potential goroutine leak or unbounded concurrency
- **High GC pause:** Large heap or allocation pressure
### Scenario-Specific Expectations
| Scenario | Key Metrics to Watch | Expected Behavior |
|----------|---------------------|-------------------|
| S1 (Burst) | action_latency_p99, cpu_cores_max, goroutines_max | Should handle bursts without queue backup |
| S2 (Fan-Out) | reconcile_total, workloads_matched, memory_rss_mb_max | One CM change → 50 workload reloads |
| S3 (High Cardinality) | reconcile_duration, memory_heap_mb_avg | Many namespaces shouldn't increase memory |
| S4 (No-Op) | action_total (expect 0), cpu_cores_avg (expect low) | Minimal resource usage for no-op |
| S5 (Churn) | errors_total, goroutines_avg | Graceful handling, no goroutine leak |
| S6 (Restart) | All metrics captured | Metrics survive controller restart |
| S7 (API Pressure) | errors_total, cpu_cores_max, goroutines_max | No errors under concurrent load |
| S8 (Large Objects) | memory_rss_mb_max, gc_pause_p99_ms | Large ConfigMaps don't cause OOM or GC issues |
| S9 (Multi-Workload) | reload_executed_total per type | All workload types (Deploy, STS, DS) reload |
| S10 (Secrets) | reload_executed_total, workloads_matched | Both Secrets and ConfigMaps trigger reloads |
| S11 (Annotation) | workload annotations present | Deployments get `last-reloaded-from` annotation |
| S12 (Pause) | reload_executed_total << updates | Pause-period reduces reload frequency |
| S13 (Complex) | reload_executed_total | All reference types trigger reloads |
### Troubleshooting
#### New implementation shows 0 for all metrics
- Check if Prometheus is scraping the new Reloader pod
- Verify pod annotations: `prometheus.io/scrape: "true"`
- Check Prometheus targets: `http://localhost:9091/targets`
#### Metrics don't match expected values
- Verify test ran to completion (check logs)
- Ensure Prometheus scraped final metrics (18s wait after test)
- Check for pod restarts during the test (counters reset to zero on restart; the `increase()` queries account for this)
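
Why a pod restart does not corrupt counter totals can be shown with a simplified sketch of reset-aware counting. This is not Prometheus's implementation: the real `increase()` function also extrapolates to the edges of the query window, which is omitted here, and `counterIncrease` is a hypothetical helper.

```go
package main

import "fmt"

// counterIncrease returns the total increase across time-ordered samples of a
// counter. Any drop in value is treated as a reset to zero, so the value seen
// after the drop counts in full as new increase.
func counterIncrease(samples []float64) float64 {
	if len(samples) < 2 {
		return 0
	}
	total := 0.0
	prev := samples[0]
	for _, v := range samples[1:] {
		if v < prev {
			total += v // counter restarted from zero
		} else {
			total += v - prev
		}
		prev = v
	}
	return total
}

func main() {
	// Hypothetical scrapes of a counter; the pod restarts between 50 and 5.
	scrapes := []float64{10, 30, 50, 5, 20}
	fmt.Println("naive last-first:", scrapes[len(scrapes)-1]-scrapes[0])
	fmt.Println("reset-aware increase:", counterIncrease(scrapes))
}
```

Subtracting the first sample from the last undercounts badly after a restart, while the reset-aware sum keeps the work done before the restart.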
#### High latency in new implementation
- Check Reloader pod resource limits
- Look for API server throttling in logs
- Compare `reconcile_total` - fewer reconciles with higher duration may be normal
#### REST client errors are non-zero
- **Common causes:**
- Optional CRD schemes registered but CRDs not installed (e.g., Argo Rollouts, OpenShift DeploymentConfig)
- API server rate limiting under high load
- RBAC permissions missing for certain resource types
- **Argo Rollouts errors:** If you see ~4 errors per test and are not using Argo Rollouts, set `--enable-argo-rollouts=false`
- **OpenShift errors:** Similarly, ensure DeploymentConfig support is disabled on non-OpenShift clusters
#### REST client requests much higher in new implementation
- Check if caching is working correctly
- Look for excessive re-queuing in controller logs
- Compare `reconcile_total` - more reconciles naturally means more API calls
## Report Format
The report generator produces a comparison table with units and expected value indicators:
```
================================================================================
RELOADER A/B COMPARISON REPORT
================================================================================
Scenario: S2
Generated: 2026-01-03 14:30:00
Status: PASS
Summary: All metrics within acceptable thresholds
Test: S2: Fan-out test - 1 CM update triggers 50 deployment reloads
--------------------------------------------------------------------------------
METRIC COMPARISONS
--------------------------------------------------------------------------------
(Old✓/New✓ = meets expected value within 15%)
Metric Old New Expected Old✓ New✓ Status
------ --- --- -------- ---- ---- ------
reconcile_total 50.00 25.00 - - - pass
reconcile_duration_p50 (s) 0.01 0.05 - - - pass
reconcile_duration_p95 (s) 0.02 0.15 - - - pass
action_total 50.00 50.00 50 ✓ ✓ pass
action_latency_p50 (s) 0.05 0.03 - - - pass
action_latency_p95 (s) 0.12 0.08 - - - pass
errors_total 0.00 0.00 - - - pass
reload_executed_total 50.00 50.00 50 ✓ ✓ pass
workloads_scanned_total 50.00 50.00 50 ✓ ✓ pass
workloads_matched_total 50.00 50.00 50 ✓ ✓ pass
rest_client_requests_total 850 720 - - - pass
rest_client_requests_get 500 420 - - - pass
rest_client_requests_patch 300 250 - - - pass
rest_client_requests_errors 0 0 - - - pass
```
Reports are saved to `results/<scenario>/report.txt` after each test.
## Directory Structure
```
test/loadtest/
├── cmd/
│ └── loadtest/ # Unified CLI (run + report)
│ └── main.go
├── internal/
│ ├── cluster/ # Kind cluster management
│ │ └── kind.go
│ ├── prometheus/ # Prometheus deployment & querying
│ │ └── prometheus.go
│ ├── reloader/ # Reloader deployment
│ │ └── deploy.go
│ └── scenarios/ # Test scenario implementations
│ └── scenarios.go
├── manifests/
│ └── prometheus.yaml # Prometheus deployment manifest
├── results/ # Generated after tests
│ └── <scenario>/
│ ├── old/ # Old version data
│ │ ├── *.json # Prometheus metric snapshots
│ │ └── reloader.log # Reloader pod logs
│ ├── new/ # New version data
│ │ ├── *.json # Prometheus metric snapshots
│ │ └── reloader.log # Reloader pod logs
│ ├── expected.json # Expected values from test
│ └── report.txt # Comparison report
├── go.mod
├── go.sum
└── README.md
```
## Building Local Images for Testing
If you want to test local code changes:
```bash
# Build the new Reloader image from current source
docker build -t localhost/reloader:dev -f Dockerfile .
# Build from a different branch/commit
git checkout feature-branch
docker build -t localhost/reloader:feature -f Dockerfile .
# Then run comparison
./loadtest run \
--old-image=stakater/reloader:v1.0.0 \
--new-image=localhost/reloader:feature
```
## Interpreting Results
### PASS
All metrics are within acceptable thresholds. The new implementation is comparable to or better than the old one.
### FAIL
One or more metrics exceeded thresholds. Review the specific metrics:
- **Latency degradation**: p95/p99 latencies are significantly higher
- **Missed reloads**: `reload_executed_total` differs significantly
- **Errors increased**: `errors_total` is higher in new version
### Investigation
If tests fail, check:
1. Pod logs: `kubectl logs -n reloader-new deployment/reloader` (or check `results/<scenario>/new/reloader.log`)
2. Resource usage: `kubectl top pods -n reloader-new`
3. Events: `kubectl get events -n reloader-test`
## Parallel Execution
The `--parallelism` option enables running scenarios on multiple kind clusters simultaneously, significantly reducing total test time.
### How It Works
1. **Multiple Clusters**: Creates N kind clusters named `reloader-loadtest-0`, `reloader-loadtest-1`, etc.
2. **Separate Prometheus**: Each cluster gets its own Prometheus instance with a unique port (9091, 9092, etc.)
3. **Worker Pool**: Scenarios are distributed to workers via a channel, with each worker running on its own cluster
4. **Independent Execution**: Each scenario runs in complete isolation with no resource contention
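
The steps above can be sketched as a channel-fed worker pool. This is a minimal illustration, not the framework's code: the `worker` type and the `newWorkers` and `runAll` functions are hypothetical, the `run` callback stands in for real scenario execution, and only the cluster naming and port numbering are taken from the list above.

```go
package main

import (
	"fmt"
	"sync"
)

// worker describes one kind cluster and its Prometheus port.
type worker struct {
	ClusterName    string
	PrometheusPort int
}

// newWorkers builds n workers named reloader-loadtest-0..n-1 on ports 9091, 9092, ...
func newWorkers(n int) []worker {
	ws := make([]worker, n)
	for i := range ws {
		ws[i] = worker{
			ClusterName:    fmt.Sprintf("reloader-loadtest-%d", i),
			PrometheusPort: 9091 + i,
		}
	}
	return ws
}

// runAll feeds scenarios to the workers through a channel and returns which
// cluster ran each scenario.
func runAll(scenarios []string, parallelism int, run func(w worker, scenario string)) map[string]string {
	// Never start more workers than there are scenarios, and always start one.
	if parallelism > len(scenarios) {
		parallelism = len(scenarios)
	}
	if parallelism < 1 {
		parallelism = 1
	}

	jobs := make(chan string)
	assigned := make(map[string]string)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for _, w := range newWorkers(parallelism) {
		wg.Add(1)
		go func(w worker) {
			defer wg.Done()
			// Each worker pulls the next scenario as soon as it is free.
			for s := range jobs {
				run(w, s)
				mu.Lock()
				assigned[s] = w.ClusterName
				mu.Unlock()
			}
		}(w)
	}

	for _, s := range scenarios {
		jobs <- s
	}
	close(jobs)
	wg.Wait()
	return assigned
}

func main() {
	scenarios := []string{"S1", "S2", "S3", "S4", "S5"}
	result := runAll(scenarios, 2, func(w worker, s string) {
		fmt.Printf("%s running on %s (Prometheus port %d)\n", s, w.ClusterName, w.PrometheusPort)
	})
	fmt.Println("scenarios completed:", len(result))
}
```

Because workers pull from a shared channel, a slow scenario occupies only its own cluster while the other workers keep draining the queue.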
### Usage
```bash
# Run 4 scenarios at a time (creates 4 clusters)
./loadtest run --new-image=my-image:tag --parallelism=4
# Run all 13 scenarios in parallel (creates 13 clusters)
./loadtest run --new-image=my-image:tag --parallelism=13 --scenario=all
```
### Resource Requirements
Parallel execution requires significant system resources:
| Parallelism | Clusters | Est. Memory | Est. CPU |
|-------------|----------|-------------|----------|
| 1 (default) | 1 | ~4GB | 2-4 cores |
| 4 | 4 | ~16GB | 8-16 cores |
| 13 | 13 | ~52GB | 26-52 cores |
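
The table scales linearly from the single-cluster figures, which a small helper can reproduce for other parallelism values. The `estimate` type and `estimateResources` function are hypothetical, and the per-cluster numbers (about 4GB and 2-4 cores) are the rough estimates from the table, not measurements.

```go
package main

import "fmt"

// estimate holds rough host requirements for a given parallelism.
type estimate struct {
	MemoryGB int
	MinCores int
	MaxCores int
}

// estimateResources scales the single-cluster estimate (~4GB, 2-4 cores)
// linearly with the number of clusters.
func estimateResources(parallelism int) estimate {
	return estimate{
		MemoryGB: 4 * parallelism,
		MinCores: 2 * parallelism,
		MaxCores: 4 * parallelism,
	}
}

func main() {
	for _, n := range []int{1, 4, 8, 13} {
		e := estimateResources(n)
		fmt.Printf("parallelism=%d: ~%dGB, %d-%d cores\n", n, e.MemoryGB, e.MinCores, e.MaxCores)
	}
}
```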
### Notes
- The `--skip-cluster` option is not supported with parallelism > 1
- Each worker loads images independently, so initial setup takes longer
- All results are written to the same `--results-dir` with per-scenario subdirectories
- If a cluster setup fails, remaining workers continue with available clusters
- If `--parallelism` is set higher than the number of scenarios, it is automatically reduced to the scenario count
## CI Integration
### GitHub Actions
Load tests can be triggered on pull requests by commenting `/loadtest`:
```
/loadtest
```
This will:
1. Build a container image from the PR branch
2. Run all load test scenarios against it
3. Post results as a PR comment
4. Upload detailed results as artifacts
### Make Target
Run load tests locally or in CI:
```bash
# From repository root
make loadtest
```
This builds the container image and runs all scenarios with a 60-second duration.