# Reloader Load Test Framework

This framework provides A/B comparison testing between two Reloader container images.

## Overview

The load test framework:

1. Creates a local kind cluster (1 control-plane + 6 worker nodes)
2. Deploys Prometheus for metrics collection
3. Loads the provided Reloader container images into the cluster
4. Runs standardized test scenarios (S1-S13)
5. Collects metrics via Prometheus scraping
6. Generates comparison reports with pass/fail criteria

## Prerequisites

- Docker or Podman
- kind (Kubernetes in Docker)
- kubectl
- Go 1.22+

## Building

```bash
cd test/loadtest
go build -o loadtest ./cmd/loadtest
```

## Quick Start

```bash
# Compare two published images (e.g., different versions)
./loadtest run \
  --old-image=stakater/reloader:v1.0.0 \
  --new-image=stakater/reloader:v1.1.0

# Run a specific scenario
./loadtest run \
  --old-image=stakater/reloader:v1.0.0 \
  --new-image=stakater/reloader:v1.1.0 \
  --scenario=S2 \
  --duration=120

# Test only a single image (no comparison)
./loadtest run --new-image=myregistry/reloader:dev

# Use local images built with docker/podman
./loadtest run \
  --old-image=localhost/reloader:baseline \
  --new-image=localhost/reloader:feature-branch

# Skip cluster creation (use existing kind cluster)
./loadtest run \
  --old-image=stakater/reloader:v1.0.0 \
  --new-image=stakater/reloader:v1.1.0 \
  --skip-cluster

# Run all scenarios in parallel on 4 clusters (faster execution)
./loadtest run \
  --new-image=localhost/reloader:dev \
  --parallelism=4

# Run all 13 scenarios in parallel (one cluster per scenario)
./loadtest run \
  --new-image=localhost/reloader:dev \
  --parallelism=13

# Generate report from existing results
./loadtest report --scenario=S2 --results-dir=./results
```

## Command Line Options

### Run Command

| Option | Description | Default |
|--------|-------------|---------|
| `--old-image=IMAGE` | Container image for "old" version | - |
| `--new-image=IMAGE` | Container image for "new" version | - |
| `--scenario=ID` | Test scenario: S1-S13 or "all" | all |
| `--duration=SECONDS` | Test duration in seconds | 60 |
| `--parallelism=N` | Run N scenarios in parallel on N kind clusters | 1 |
| `--skip-cluster` | Skip kind cluster creation (use existing, only for parallelism=1) | false |
| `--results-dir=DIR` | Directory for results | ./results |

**Note:** At least one of `--old-image` or `--new-image` is required. Provide both for A/B comparison.
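For orientation, the table above maps one-to-one onto ordinary Go flag definitions. The sketch below is hypothetical wiring with the standard `flag` package; the variable names and error messages are illustrative, not the framework's actual code:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

func main() {
	// Hypothetical flag wiring mirroring the Run Command table above.
	oldImage := flag.String("old-image", "", `container image for the "old" version`)
	newImage := flag.String("new-image", "", `container image for the "new" version`)
	scenario := flag.String("scenario", "all", `test scenario: S1-S13 or "all"`)
	duration := flag.Int("duration", 60, "test duration in seconds")
	parallelism := flag.Int("parallelism", 1, "run N scenarios in parallel on N kind clusters")
	skipCluster := flag.Bool("skip-cluster", false, "use an existing kind cluster (parallelism=1 only)")
	resultsDir := flag.String("results-dir", "./results", "directory for results")
	flag.Parse()

	// At least one image is required; providing both enables A/B comparison.
	if *oldImage == "" && *newImage == "" {
		fmt.Fprintln(os.Stderr, "error: at least one of --old-image or --new-image is required")
		os.Exit(1)
	}
	// --skip-cluster only makes sense for a single, serial cluster.
	if *skipCluster && *parallelism > 1 {
		fmt.Fprintln(os.Stderr, "error: --skip-cluster requires --parallelism=1")
		os.Exit(1)
	}
	fmt.Printf("scenario=%s duration=%ds parallelism=%d results=%s\n",
		*scenario, *duration, *parallelism, *resultsDir)
}
```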
### Report Command

| Option | Description | Default |
|--------|-------------|---------|
| `--scenario=ID` | Scenario to report on (required) | - |
| `--results-dir=DIR` | Directory containing results | ./results |
| `--output=FILE` | Output file (default: stdout) | - |

## Test Scenarios

| ID | Name | Description |
|-----|-----------------------|-------------------------------------------------|
| S1 | Burst Updates | Many ConfigMap/Secret updates in quick succession |
| S2 | Fan-Out | One ConfigMap used by many (50) workloads |
| S3 | High Cardinality | Many CMs/Secrets across many namespaces |
| S4 | No-Op Updates | Updates that don't change data (annotation only) |
| S5 | Workload Churn | Deployments created/deleted rapidly |
| S6 | Controller Restart | Restart controller pod under load |
| S7 | API Pressure | Many concurrent update requests |
| S8 | Large Objects | ConfigMaps > 100KB |
| S9 | Multi-Workload Types | Tests all workload types (Deploy, STS, DS) |
| S10 | Secrets + Mixed | Secrets and mixed ConfigMap+Secret workloads |
| S11 | Annotation Strategy | Tests `--reload-strategy=annotations` |
| S12 | Pause & Resume | Tests pause-period during rapid updates |
| S13 | Complex References | Init containers, valueFrom, projected volumes |

## Metrics Reference

This section explains each metric collected during load tests, what it measures, and what different values might indicate.

### Counter Metrics (Totals)

#### `reconcile_total`

**What it measures:** The total number of reconciliation loops executed by the controller.

**What it indicates:**

- **Higher in new vs old:** The new controller-runtime implementation may batch events differently. This is often expected behavior, not a problem.
- **Lower in new vs old:** Better event batching/deduplication. Controller-runtime's work queue naturally deduplicates events.
- **Expected behavior:** The new implementation typically has *fewer* reconciles due to intelligent event batching.

#### `action_total`

**What it measures:** The total number of reload actions triggered (rolling restarts of Deployments/StatefulSets/DaemonSets).

**What it indicates:**

- **Should match expected value:** Both implementations should trigger the same number of reloads for the same workload.
- **Lower than expected:** Some updates were missed - potential bug or race condition.
- **Higher than expected:** Duplicate reloads triggered - an inefficiency, but not data loss.

#### `reload_executed_total`

**What it measures:** Successful reload operations executed, labeled by `success=true/false`.

**What it indicates:**

- **`success=true` count:** Number of workloads successfully restarted.
- **`success=false` count:** Failed restart attempts (API errors, permission issues).
- **Should match `action_total`:** If significantly lower, reloads are failing.

#### `workloads_scanned_total`

**What it measures:** Number of workloads (Deployments, etc.) scanned when checking for ConfigMap/Secret references.

**What it indicates:**

- **High count:** Controller is scanning many workloads per reconcile.
- **Expected behavior:** Should roughly match the number of workloads × number of reconciles.
- **Optimization signal:** If very high, namespace filtering or label selectors could help.

#### `workloads_matched_total`

**What it measures:** Number of workloads that matched (reference the changed ConfigMap/Secret).

**What it indicates:**

- **Should match `reload_executed_total`:** Every matched workload should be reloaded.
- **Higher than reloads:** Some matched workloads weren't reloaded (a potential issue).

#### `errors_total`

**What it measures:** Total errors encountered, labeled by error type.

**What it indicates:**

- **Should be 0:** Any errors indicate problems.
- **Common causes:** API server timeouts, RBAC issues, resource conflicts.
- **Critical metric:** Non-zero errors in production should be investigated.
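Because these are Prometheus counters, they reset to zero whenever a pod restarts; the troubleshooting section below notes that the framework compensates with `increase()`. As a rough illustration of how such a counter can be read over a test window, here is a minimal sketch against Prometheus's standard `/api/v1/query` endpoint (the port comes from this README; the 5m window and the struct are assumptions, not the framework's actual collector code):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// promResponse covers just the fields we need from Prometheus's
// /api/v1/query instant-query response.
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]any            `json:"value"` // [timestamp, "value-as-string"]
		} `json:"result"`
	} `json:"data"`
}

func main() {
	// increase() tolerates counter resets (e.g., a controller restart
	// mid-test), so the total survives pod restarts.
	query := `increase(action_total[5m])`
	u := "http://localhost:9091/api/v1/query?query=" + url.QueryEscape(query)

	resp, err := http.Get(u)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var pr promResponse
	if err := json.NewDecoder(resp.Body).Decode(&pr); err != nil {
		panic(err)
	}
	for _, r := range pr.Data.Result {
		fmt.Printf("%v => %v\n", r.Metric, r.Value[1])
	}
}
```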
### API Efficiency Metrics (REST Client)

These metrics track Kubernetes API server calls made by Reloader. Lower values indicate more efficient operation with less API server load.

#### `rest_client_requests_total`

**What it measures:** Total number of HTTP requests made to the Kubernetes API server.

**What it indicates:**

- **Lower is better:** Fewer API calls means less load on the API server.
- **High count:** May indicate inefficient caching or excessive reconciles.
- **Comparison use:** Shows overall API efficiency between implementations.

#### `rest_client_requests_get`

**What it measures:** Number of GET requests (fetching individual resources or listings).

**What it indicates:**

- **Includes:** Fetching ConfigMaps, Secrets, Deployments, etc.
- **Higher count:** More frequent resource fetching, possibly due to cache misses.
- **Expected behavior:** Controller-runtime's caching should reduce GET requests compared to direct API calls.

#### `rest_client_requests_patch`

**What it measures:** Number of PATCH requests (partial updates to resources).

**What it indicates:**

- **Used for:** Rolling-restart annotations on workloads.
- **Should correlate with:** `reload_executed_total` - each reload typically requires one PATCH.
- **Lower is better:** Fewer patches means more efficient batching or deduplication.

#### `rest_client_requests_put`

**What it measures:** Number of PUT requests (full resource updates).

**What it indicates:**

- **Used for:** Full object replacements (less common than PATCH).
- **Should be low:** Most updates use PATCH for efficiency.
- **High count:** May indicate a suboptimal update strategy.

#### `rest_client_requests_errors`

**What it measures:** Number of failed API requests (4xx/5xx responses).

**What it indicates:**

- **Should be 0:** Errors indicate API server issues or permission problems.
- **Common causes:** Rate limiting, RBAC issues, resource conflicts, network issues.
- **Non-zero:** Investigate API server logs and Reloader permissions.

### Latency Metrics (Percentiles)

All latency metrics are reported in **seconds**. The report shows p50 (median), p95, and p99 percentiles.

#### `reconcile_duration (s)`

**What it measures:** Time spent inside each reconcile loop, from start to finish.

**What it indicates:**

- **p50 (median):** Typical reconcile time. Should be < 100ms for good performance.
- **p95:** 95th percentile - only 5% of reconciles take longer than this.
- **p99:** 99th percentile - indicates worst-case performance.

**Interpreting differences:**

- **New higher than old:** Controller-runtime reconciles may do more work per loop but run fewer times. Check `reconcile_total` - if it's lower, this is expected.
- **Minor differences (< 0.5s absolute):** Not significant for sub-second values.

#### `action_latency (s)`

**What it measures:** End-to-end time from ConfigMap/Secret change detection to workload restart triggered.

**What it indicates:**

- **This is the user-facing latency:** How long users wait for their config changes to take effect.
- **p50 < 1s:** Excellent - most changes apply within a second.
- **p95 < 5s:** Good - even under load, changes apply quickly.
- **p99 > 10s:** May need investigation - some changes take too long.

**What affects this:**

- API server responsiveness
- Number of workloads to scan
- Concurrent updates competing for resources
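These percentiles are the standard Prometheus quantile computation. Assuming the durations are exported as Prometheus histograms, the report's p50/p95/p99 columns would correspond to `histogram_quantile()` queries like the ones generated below; the `reconcile_duration_bucket` series name is an assumption, since this README only lists the logical metric names:

```go
package main

import "fmt"

func main() {
	// Hypothetical PromQL for the latency percentile columns. Assumes the
	// metric is exported as a histogram with a conventional *_bucket
	// series; adjust the name to match the actual export.
	for _, q := range []float64{0.50, 0.95, 0.99} {
		fmt.Printf(
			"histogram_quantile(%.2f, sum(rate(reconcile_duration_bucket[5m])) by (le))\n",
			q)
	}
}
```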
### Understanding the Report

#### Report Columns

```
Metric                        Old      New      Expected  Old✓  New✓  Status
------                        ---      ---      --------  ----  ----  ------
action_total                  100.00   100.00   100       ✓     ✓     pass
action_latency_p95 (s)        0.15     0.04     -         -     -     pass
```

- **Old/New:** Measured values from each implementation
- **Expected:** Known expected value (for throughput metrics)
- **Old✓/New✓:** Whether the value is within 15% of expected (✓ = yes, ✗ = no, - = no expected value)
- **Status:** pass/fail based on comparison thresholds

#### Pass/Fail Logic

| Metric Type | Pass Condition |
|-------------|----------------|
| Throughput (`action_total`, `reload_executed_total`) | New value within 15% of expected |
| Latency (p50, p95, p99) | New not more than threshold% worse than old, OR absolute difference < minimum threshold |
| Errors | New ≤ Old (ideally both 0) |
| API Efficiency (`rest_client_requests_*`) | New ≤ Old (lower is better), or New not more than 50% higher |

#### Latency Thresholds

Latency comparisons use both percentage AND absolute thresholds to avoid false failures:

| Metric | Max % Worse | Min Absolute Diff |
|--------|-------------|-------------------|
| p50 | 100% | 0.5s |
| p95 | 100% | 1.0s |
| p99 | 100% | 1.0s |

**Example:** If old p50 = 0.01s and new p50 = 0.08s:

- Percentage difference: +700% (would fail the % check)
- Absolute difference: 0.07s (< 0.5s threshold)
- **Result: PASS** (both values are fast enough that the difference doesn't matter)
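In code, the dual-threshold rule reduces to a short predicate. A minimal sketch; the function name and argument order are illustrative, not the framework's actual implementation:

```go
package main

import "fmt"

// latencyPass implements the rule above: the new value passes if it is
// not more than maxPctWorse percent worse than the old value, OR if the
// absolute regression is smaller than minAbsDiff seconds.
func latencyPass(oldSec, newSec, maxPctWorse, minAbsDiff float64) bool {
	if newSec <= oldSec {
		return true // not worse at all
	}
	absDiff := newSec - oldSec
	pctWorse := absDiff / oldSec * 100
	return pctWorse <= maxPctWorse || absDiff < minAbsDiff
}

func main() {
	// The p50 example above: +700% relative, but only +0.07s absolute.
	fmt.Println(latencyPass(0.01, 0.08, 100, 0.5)) // true => PASS
}
```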
### Resource Consumption Metrics

These metrics track CPU, memory, and Go runtime resource usage. Lower values generally indicate more efficient operation.

#### Memory Metrics

| Metric | Description | Unit |
|--------|-------------|------|
| `memory_rss_mb_avg` | Average RSS (resident set size) memory | MB |
| `memory_rss_mb_max` | Peak RSS memory during test | MB |
| `memory_heap_mb_avg` | Average Go heap allocation | MB |
| `memory_heap_mb_max` | Peak Go heap allocation | MB |

**What to watch for:**

- **High RSS:** May indicate memory leaks or inefficient caching
- **High heap:** Many objects being created (check GC metrics)
- **Growing over time:** Potential memory leak

#### CPU Metrics

| Metric | Description | Unit |
|--------|-------------|------|
| `cpu_cores_avg` | Average CPU usage rate | cores |
| `cpu_cores_max` | Peak CPU usage rate | cores |

**What to watch for:**

- **High CPU:** Inefficient algorithms or excessive reconciles
- **Spiky max:** May indicate burst-handling issues

#### Go Runtime Metrics

| Metric | Description | Unit |
|--------|-------------|------|
| `goroutines_avg` | Average goroutine count | count |
| `goroutines_max` | Peak goroutine count | count |
| `gc_pause_p99_ms` | 99th percentile GC pause time | ms |

**What to watch for:**

- **High goroutines:** Potential goroutine leak or unbounded concurrency
- **High GC pause:** Large heap or allocation pressure

### Scenario-Specific Expectations

| Scenario | Key Metrics to Watch | Expected Behavior |
|----------|---------------------|-------------------|
| S1 (Burst) | action_latency_p99, cpu_cores_max, goroutines_max | Should handle bursts without queue backup |
| S2 (Fan-Out) | reconcile_total, workloads_matched, memory_rss_mb_max | One CM change → 50 workload reloads |
| S3 (High Cardinality) | reconcile_duration, memory_heap_mb_avg | Many namespaces shouldn't increase memory |
| S4 (No-Op) | action_total = 0, cpu_cores_avg should be low | Minimal resource usage for no-op |
| S5 (Churn) | errors_total, goroutines_avg | Graceful handling, no goroutine leak |
| S6 (Restart) | All metrics captured | Metrics survive controller restart |
| S7 (API Pressure) | errors_total, cpu_cores_max, goroutines_max | No errors under concurrent load |
| S8 (Large Objects) | memory_rss_mb_max, gc_pause_p99_ms | Large ConfigMaps don't cause OOM or GC issues |
| S9 (Multi-Workload) | reload_executed_total per type | All workload types (Deploy, STS, DS) reload |
| S10 (Secrets) | reload_executed_total, workloads_matched | Both Secrets and ConfigMaps trigger reloads |
| S11 (Annotation) | workload annotations present | Deployments get `last-reloaded-from` annotation |
| S12 (Pause) | reload_executed_total << updates | Pause-period reduces reload frequency |
| S13 (Complex) | reload_executed_total | All reference types trigger reloads |

### Troubleshooting

#### New implementation shows 0 for all metrics

- Check if Prometheus is scraping the new Reloader pod (see the sketch after this section)
- Verify pod annotations: `prometheus.io/scrape: "true"`
- Check Prometheus targets: `http://localhost:9091/targets`

#### Metrics don't match expected values

- Verify the test ran to completion (check logs)
- Ensure Prometheus scraped final metrics (18s wait after the test)
- Check for pod restarts during the test (metrics reset on restart - handled by `increase()`)

#### High latency in new implementation

- Check Reloader pod resource limits
- Look for API server throttling in logs
- Compare `reconcile_total` - fewer reconciles with higher duration may be normal

#### REST client errors are non-zero

- **Common causes:**
  - Optional CRD schemes registered but CRDs not installed (e.g., Argo Rollouts, OpenShift DeploymentConfig)
  - API server rate limiting under high load
  - RBAC permissions missing for certain resource types
- **Argo Rollouts errors:** If you see ~4 errors per test, set `--enable-argo-rollouts=false` when not using Argo Rollouts
- **OpenShift errors:** Similarly, ensure DeploymentConfig support is disabled on non-OpenShift clusters

#### REST client requests much higher in new implementation

- Check whether caching is working correctly
- Look for excessive re-queuing in controller logs
- Compare `reconcile_total` - more reconciles naturally means more API calls
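For the first troubleshooting item, scrape status can also be checked programmatically rather than in the browser. A minimal sketch against Prometheus's documented `/api/v1/targets` endpoint, assuming the port-forwarded instance on `localhost:9091` mentioned in this README:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// targetsResponse covers just the fields we need from /api/v1/targets.
type targetsResponse struct {
	Data struct {
		ActiveTargets []struct {
			Labels    map[string]string `json:"labels"`
			ScrapeURL string            `json:"scrapeUrl"`
			Health    string            `json:"health"` // "up", "down", or "unknown"
			LastError string            `json:"lastError"`
		} `json:"activeTargets"`
	} `json:"data"`
}

func main() {
	resp, err := http.Get("http://localhost:9091/api/v1/targets")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var tr targetsResponse
	if err := json.NewDecoder(resp.Body).Decode(&tr); err != nil {
		panic(err)
	}
	// A Reloader pod that reports 0 for every metric is usually a target
	// that is missing from this list or whose health is not "up".
	for _, t := range tr.Data.ActiveTargets {
		fmt.Printf("%-8s %s %s\n", t.Health, t.ScrapeURL, t.LastError)
	}
}
```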
## Report Format

The report generator produces a comparison table with units and expected-value indicators:

```
================================================================================
                     RELOADER A/B COMPARISON REPORT
================================================================================
Scenario:  S2
Generated: 2026-01-03 14:30:00
Status:    PASS
Summary:   All metrics within acceptable thresholds
Test:      S2: Fan-out test - 1 CM update triggers 50 deployment reloads

--------------------------------------------------------------------------------
METRIC COMPARISONS
--------------------------------------------------------------------------------
(Old✓/New✓ = meets expected value within 15%)

Metric                        Old      New      Expected  Old✓  New✓  Status
------                        ---      ---      --------  ----  ----  ------
reconcile_total               50.00    25.00    -         -     -     pass
reconcile_duration_p50 (s)    0.01     0.05     -         -     -     pass
reconcile_duration_p95 (s)    0.02     0.15     -         -     -     pass
action_total                  50.00    50.00    50        ✓     ✓     pass
action_latency_p50 (s)        0.05     0.03     -         -     -     pass
action_latency_p95 (s)        0.12     0.08     -         -     -     pass
errors_total                  0.00     0.00     -         -     -     pass
reload_executed_total         50.00    50.00    50        ✓     ✓     pass
workloads_scanned_total       50.00    50.00    50        ✓     ✓     pass
workloads_matched_total       50.00    50.00    50        ✓     ✓     pass
rest_client_requests_total    850      720      -         -     -     pass
rest_client_requests_get      500      420      -         -     -     pass
rest_client_requests_patch    300      250      -         -     -     pass
rest_client_requests_errors   0        0        -         -     -     pass
```

Reports are saved to `results/<scenario>/report.txt` after each test.

## Directory Structure

```
test/loadtest/
├── cmd/
│   └── loadtest/          # Unified CLI (run + report)
│       └── main.go
├── internal/
│   ├── cluster/           # Kind cluster management
│   │   └── kind.go
│   ├── prometheus/        # Prometheus deployment & querying
│   │   └── prometheus.go
│   ├── reloader/          # Reloader deployment
│   │   └── deploy.go
│   └── scenarios/         # Test scenario implementations
│       └── scenarios.go
├── manifests/
│   └── prometheus.yaml    # Prometheus deployment manifest
├── results/               # Generated after tests
│   └── <scenario>/
│       ├── old/           # Old version data
│       │   ├── *.json     # Prometheus metric snapshots
│       │   └── reloader.log  # Reloader pod logs
│       ├── new/           # New version data
│       │   ├── *.json     # Prometheus metric snapshots
│       │   └── reloader.log  # Reloader pod logs
│       ├── expected.json  # Expected values from test
│       └── report.txt     # Comparison report
├── go.mod
├── go.sum
└── README.md
```

## Building Local Images for Testing

If you want to test local code changes:

```bash
# Build the new Reloader image from current source
docker build -t localhost/reloader:dev -f Dockerfile .

# Build from a different branch/commit
git checkout feature-branch
docker build -t localhost/reloader:feature -f Dockerfile .

# Then run comparison
./loadtest run \
  --old-image=stakater/reloader:v1.0.0 \
  --new-image=localhost/reloader:feature
```

## Interpreting Results

### PASS

All metrics are within acceptable thresholds. The new implementation is comparable to or better than the old one.

### FAIL

One or more metrics exceeded thresholds. Review the specific metrics:

- **Latency degradation:** p95/p99 latencies are significantly higher
- **Missed reloads:** `reload_executed_total` differs significantly
- **Errors increased:** `errors_total` is higher in the new version

### Investigation

If tests fail, check:

1. Pod logs: `kubectl logs -n reloader-new deployment/reloader` (or check `results/<scenario>/new/reloader.log`)
2. Resource usage: `kubectl top pods -n reloader-new`
3. Events: `kubectl get events -n reloader-test`

## Parallel Execution

The `--parallelism` option enables running scenarios on multiple kind clusters simultaneously, significantly reducing total test time.

### How It Works

1. **Multiple Clusters:** Creates N kind clusters named `reloader-loadtest-0`, `reloader-loadtest-1`, etc.
2. **Separate Prometheus:** Each cluster gets its own Prometheus instance with a unique port (9091, 9092, etc.)
3. **Worker Pool:** Scenarios are distributed to workers via a channel, with each worker running on its own cluster (sketched below)
4. **Independent Execution:** Each scenario runs in complete isolation with no resource contention
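Step 3 is a conventional Go worker pool. The sketch below shows the shape of that distribution using the naming conventions from steps 1 and 2 (cluster names, Prometheus ports); the loop body is a placeholder, not the framework's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	scenarios := []string{"S1", "S2", "S3", "S4", "S5", "S6", "S7",
		"S8", "S9", "S10", "S11", "S12", "S13"}
	parallelism := 4 // one kind cluster per worker

	// Scenarios are distributed to workers via a channel; each worker
	// owns one cluster and drains the channel until it is empty.
	work := make(chan string, len(scenarios))
	for _, s := range scenarios {
		work <- s
	}
	close(work)

	var wg sync.WaitGroup
	for i := 0; i < parallelism; i++ {
		wg.Add(1)
		go func(worker int) {
			defer wg.Done()
			cluster := fmt.Sprintf("reloader-loadtest-%d", worker)
			promPort := 9091 + worker // unique Prometheus port per cluster
			for scenario := range work {
				// Placeholder for: deploy Reloader, run the scenario,
				// scrape metrics from promPort, write results.
				fmt.Printf("worker %d: running %s on %s (prometheus :%d)\n",
					worker, scenario, cluster, promPort)
			}
		}(i)
	}
	wg.Wait()
}
```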
### Usage

```bash
# Run 4 scenarios at a time (creates 4 clusters)
./loadtest run --new-image=my-image:tag --parallelism=4

# Run all 13 scenarios in parallel (creates 13 clusters)
./loadtest run --new-image=my-image:tag --parallelism=13 --scenario=all
```

### Resource Requirements

Parallel execution requires significant system resources:

| Parallelism | Clusters | Est. Memory | Est. CPU |
|-------------|----------|-------------|----------|
| 1 (default) | 1 | ~4GB | 2-4 cores |
| 4 | 4 | ~16GB | 8-16 cores |
| 13 | 13 | ~52GB | 26-52 cores |

### Notes

- The `--skip-cluster` option is not supported with parallelism > 1
- Each worker loads images independently, so initial setup takes longer
- All results are written to the same `--results-dir` with per-scenario subdirectories
- If a cluster setup fails, remaining workers continue with available clusters
- Parallelism automatically reduces to match the scenario count if set higher

## CI Integration

### GitHub Actions

Load tests can be triggered on pull requests by commenting `/loadtest`:

```
/loadtest
```

This will:

1. Build a container image from the PR branch
2. Run all load test scenarios against it
3. Post results as a PR comment
4. Upload detailed results as artifacts

### Make Target

Run load tests locally or in CI:

```bash
# From repository root
make loadtest
```

This builds the container image and runs all scenarios with a 60-second duration.