Reloader Load Test Framework

This framework provides A/B comparison testing between two Reloader container images.

Overview

The load test framework:

  1. Creates a local kind cluster (1 control-plane + 6 worker nodes)
  2. Deploys Prometheus for metrics collection
  3. Loads the provided Reloader container images into the cluster
  4. Runs standardized test scenarios (S1-S13)
  5. Collects metrics via Prometheus scraping
  6. Generates comparison reports with pass/fail criteria

Prerequisites

  • Docker or Podman
  • kind (Kubernetes in Docker)
  • kubectl
  • Go 1.22+
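
A quick sanity check before building, assuming the tools are on your PATH:

docker version --format '{{.Server.Version}}'   # or: podman version
kind version
kubectl version --client
go version                                      # should report go1.22 or newer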

Building

cd test/loadtest
go build -o loadtest ./cmd/loadtest

Quick Start

# Compare two published images (e.g., different versions)
./loadtest run \
  --old-image=stakater/reloader:v1.0.0 \
  --new-image=stakater/reloader:v1.1.0

# Run a specific scenario
./loadtest run \
  --old-image=stakater/reloader:v1.0.0 \
  --new-image=stakater/reloader:v1.1.0 \
  --scenario=S2 \
  --duration=120

# Test only a single image (no comparison)
./loadtest run --new-image=myregistry/reloader:dev

# Use local images built with docker/podman
./loadtest run \
  --old-image=localhost/reloader:baseline \
  --new-image=localhost/reloader:feature-branch

# Skip cluster creation (use existing kind cluster)
./loadtest run \
  --old-image=stakater/reloader:v1.0.0 \
  --new-image=stakater/reloader:v1.1.0 \
  --skip-cluster

# Run scenarios 4 at a time across 4 clusters (faster execution)
./loadtest run \
  --new-image=localhost/reloader:dev \
  --parallelism=4

# Run all 13 scenarios in parallel (one cluster per scenario)
./loadtest run \
  --new-image=localhost/reloader:dev \
  --parallelism=13

# Generate report from existing results
./loadtest report --scenario=S2 --results-dir=./results

Command Line Options

Run Command

Option              Description                                                     Default
------              -----------                                                     -------
--old-image=IMAGE   Container image for "old" version                               -
--new-image=IMAGE   Container image for "new" version                               -
--scenario=ID       Test scenario: S1-S13 or "all"                                  all
--duration=SECONDS  Test duration in seconds                                        60
--parallelism=N     Run N scenarios in parallel on N kind clusters                  1
--skip-cluster      Skip kind cluster creation (use existing; parallelism=1 only)   false
--results-dir=DIR   Directory for results                                           ./results

Note: At least one of --old-image or --new-image is required. Provide both for A/B comparison.

Report Command

Option             Description                        Default
------             -----------                        -------
--scenario=ID      Scenario to report on (required)   -
--results-dir=DIR  Directory containing results       ./results
--output=FILE      Output file                        stdout

Test Scenarios

ID   Name                  Description
--   ----                  -----------
S1   Burst Updates         Many ConfigMap/Secret updates in quick succession
S2   Fan-Out               One ConfigMap used by many (50) workloads
S3   High Cardinality      Many ConfigMaps/Secrets across many namespaces
S4   No-Op Updates         Updates that don't change data (annotation only)
S5   Workload Churn        Deployments created/deleted rapidly
S6   Controller Restart    Restart controller pod under load
S7   API Pressure          Many concurrent update requests
S8   Large Objects         ConfigMaps > 100KB
S9   Multi-Workload Types  Tests all workload types (Deploy, STS, DS)
S10  Secrets + Mixed       Secrets and mixed ConfigMap+Secret workloads
S11  Annotation Strategy   Tests --reload-strategy=annotations
S12  Pause & Resume        Tests pause-period during rapid updates
S13  Complex References    Init containers, valueFrom, projected volumes

Metrics Reference

This section explains each metric collected during load tests, what it measures, and what different values might indicate.

Counter Metrics (Totals)

reconcile_total

What it measures: The total number of reconciliation loops executed by the controller.

What it indicates:

  • Higher in new vs old: The new controller-runtime implementation may batch events differently. This is often expected behavior, not a problem.
  • Lower in new vs old: Better event batching/deduplication. Controller-runtime's work queue naturally deduplicates events.
  • Expected behavior: The new implementation typically has fewer reconciles due to intelligent event batching.
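
The framework collects these values automatically, but a counter can also be inspected by hand. A minimal sketch, assuming the test Prometheus is reachable on localhost:9091 (as in the Troubleshooting section) and jq is installed:

# Show the raw counter for every labeled series
curl -s 'http://localhost:9091/api/v1/query?query=reconcile_total' \
  | jq '.data.result[] | {labels: .metric, value: .value[1]}'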

action_total

What it measures: The total number of reload actions triggered (rolling restarts of Deployments/StatefulSets/DaemonSets).

What it indicates:

  • Should match expected value: Both implementations should trigger the same number of reloads for the same workload.
  • Lower than expected: Some updates were missed - potential bug or race condition.
  • Higher than expected: Duplicate reloads triggered - inefficiency but not data loss.

reload_executed_total

What it measures: Successful reload operations executed, labeled by success=true/false.

What it indicates:

  • success=true count: Number of workloads successfully restarted.
  • success=false count: Failed restart attempts (API errors, permission issues).
  • Should match action_total: If significantly lower, reloads are failing.
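
Because the counter is labeled, failures can be queried separately. A hedged example against the same localhost:9091 endpoint:

# Failed reloads only; this should normally be 0 or return no series
curl -sG 'http://localhost:9091/api/v1/query' \
  --data-urlencode 'query=reload_executed_total{success="false"}'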

workloads_scanned_total

What it measures: Number of workloads (Deployments, etc.) scanned when checking for ConfigMap/Secret references.

What it indicates:

  • High count: Controller is scanning many workloads per reconcile.
  • Expected behavior: Should roughly match the number of workloads × number of reconciles.
  • Optimization signal: If very high, namespace filtering or label selectors could help.

workloads_matched_total

What it measures: Number of workloads that matched (reference the changed ConfigMap/Secret).

What it indicates:

  • Should match reload_executed_total: Every matched workload should be reloaded.
  • Higher than reloads: Some matched workloads weren't reloaded (potential issue).

errors_total

What it measures: Total errors encountered, labeled by error type.

What it indicates:

  • Should be 0: Any errors indicate problems.
  • Common causes: API server timeouts, RBAC issues, resource conflicts.
  • Critical metric: Non-zero errors in production should be investigated.

API Efficiency Metrics (REST Client)

These metrics track Kubernetes API server calls made by Reloader. Lower values indicate more efficient operation with less API server load.

rest_client_requests_total

What it measures: Total number of HTTP requests made to the Kubernetes API server.

What it indicates:

  • Lower is better: Fewer API calls means less load on the API server.
  • High count: May indicate inefficient caching or excessive reconciles.
  • Comparison use: Shows overall API efficiency between implementations.
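
If these values are derived from client-go's standard rest_client_requests_total metric (which labels requests by method, code, and host), the total can also be broken down per HTTP verb directly. A sketch against the same Prometheus endpoint:

curl -sG 'http://localhost:9091/api/v1/query' \
  --data-urlencode 'query=sum by (method) (rest_client_requests_total)'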

rest_client_requests_get

What it measures: Number of GET requests (fetching individual resources or listings).

What it indicates:

  • Includes: Fetching ConfigMaps, Secrets, Deployments, etc.
  • Higher count: More frequent resource fetching, possibly due to cache misses.
  • Expected behavior: Controller-runtime's caching should reduce GET requests compared to direct API calls.

rest_client_requests_patch

What it measures: Number of PATCH requests (partial updates to resources).

What it indicates:

  • Used for: Rolling restart annotations on workloads.
  • Should correlate with: reload_executed_total - each reload typically requires one PATCH.
  • Lower is better: Fewer patches means more efficient batching or deduplication.

rest_client_requests_put

What it measures: Number of PUT requests (full resource updates).

What it indicates:

  • Used for: Full object replacements (less common than PATCH).
  • Should be low: Most updates use PATCH for efficiency.
  • High count: May indicate suboptimal update strategy.

rest_client_requests_errors

What it measures: Number of failed API requests (4xx/5xx responses).

What it indicates:

  • Should be 0: Errors indicate API server issues or permission problems.
  • Common causes: Rate limiting, RBAC issues, resource conflicts, network issues.
  • Non-zero: Investigate API server logs and Reloader permissions.

Latency Metrics (Percentiles)

All latency metrics are reported in seconds. The report shows p50 (median), p95, and p99 percentiles.

reconcile_duration (s)

What it measures: Time spent inside each reconcile loop, from start to finish.

What it indicates:

  • p50 (median): Typical reconcile time. Should be < 100ms for good performance.
  • p95: 95th percentile - only 5% of reconciles take longer than this.
  • p99: 99th percentile - indicates worst-case performance.

Interpreting differences:

  • New higher than old: Controller-runtime reconciles may do more work per loop but run fewer times. Check reconcile_total - if it's lower, this is expected.
  • Minor differences (< 0.5s absolute): Not significant for sub-second values.

action_latency (s)

What it measures: End-to-end time from ConfigMap/Secret change detection to workload restart triggered.

What it indicates:

  • This is the user-facing latency: How long users wait for their config changes to take effect.
  • p50 < 1s: Excellent - most changes apply within a second.
  • p95 < 5s: Good - even under load, changes apply quickly.
  • p99 > 10s: May need investigation - some changes take too long.

What affects this:

  • API server responsiveness
  • Number of workloads to scan
  • Concurrent updates competing for resources
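
If the latency series are exported as Prometheus histograms (the usual pattern behind percentile reporting; the exact series name and _bucket suffix are assumptions here), a percentile can be recomputed manually:

# Hypothetical histogram query - adjust the series name to what the pod actually exports
curl -sG 'http://localhost:9091/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.95, rate(action_latency_bucket[5m]))'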

Understanding the Report

Report Columns

Metric                           Old          New   Expected  Old✓  New✓   Status
------                           ---          ---   --------  ----  ----   ------
action_total                  100.00       100.00        100     ✓     ✓     pass
action_latency_p95 (s)          0.15         0.04          -     -     -     pass

  • Old/New: Measured values from each implementation
  • Expected: Known expected value (for throughput metrics)
  • Old✓/New✓: Whether the value is within 15% of expected (✓ = yes, ✗ = no, - = no expected value)
  • Status: pass/fail based on comparison thresholds

Pass/Fail Logic

Metric Type                                       Pass Condition
-----------                                       --------------
Throughput (action_total, reload_executed_total)  New value within 15% of expected
Latency (p50, p95, p99)                           New not more than threshold% worse than old, OR absolute difference below the minimum threshold
Errors                                            New ≤ Old (ideally both 0)
API Efficiency (rest_client_requests_*)           New ≤ Old (lower is better), or New not more than 50% higher

Latency Thresholds

Latency comparisons use both percentage AND absolute thresholds to avoid false failures:

Metric  Max % Worse  Min Absolute Diff
------  -----------  -----------------
p50     100%         0.5s
p95     100%         1.0s
p99     100%         1.0s

Example: If old p50 = 0.01s and new p50 = 0.08s:

  • Percentage difference: +700% (would fail % check)
  • Absolute difference: 0.07s (< 0.5s threshold)
  • Result: PASS (both values are fast enough that the difference doesn't matter)
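
The dual-threshold rule is small enough to sketch directly. A hypothetical helper (not part of the CLI; values in seconds):

# pass if the absolute regression is small OR the percentage regression is in bounds
latency_check() {
  awk -v o="$1" -v n="$2" -v pct="$3" -v abs="$4" 'BEGIN {
    if (n - o < abs)            { print "pass"; exit }  # absolute difference guard
    if (n <= o * (1 + pct/100)) { print "pass"; exit }  # percentage check
    print "fail"
  }'
}
latency_check 0.01 0.08 100 0.5   # -> pass: 0.07s is under the 0.5s absolute threshold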

Resource Consumption Metrics

These metrics track CPU, memory, and Go runtime resource usage. Lower values generally indicate more efficient operation.

Memory Metrics

Metric              Description                              Unit
------              -----------                              ----
memory_rss_mb_avg   Average RSS (resident set size) memory   MB
memory_rss_mb_max   Peak RSS memory during test              MB
memory_heap_mb_avg  Average Go heap allocation               MB
memory_heap_mb_max  Peak Go heap allocation                  MB

What to watch for:

  • High RSS: May indicate memory leaks or inefficient caching
  • High heap: Many objects being created (check GC metrics)
  • Growing over time: Potential memory leak

CPU Metrics

Metric         Description             Unit
------         -----------             ----
cpu_cores_avg  Average CPU usage rate  cores
cpu_cores_max  Peak CPU usage rate     cores

What to watch for:

  • High CPU: Inefficient algorithms or excessive reconciles
  • Spiky max: May indicate burst handling issues

Go Runtime Metrics

Metric           Description                    Unit
------           -----------                    ----
goroutines_avg   Average goroutine count        count
goroutines_max   Peak goroutine count           count
gc_pause_p99_ms  99th percentile GC pause time  ms

What to watch for:

  • High goroutines: Potential goroutine leak or unbounded concurrency
  • High GC pause: Large heap or allocation pressure
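
Assuming the standard Go collectors are enabled, these figures map to the runtime series that every Prometheus-instrumented Go binary exposes (go_goroutines, go_gc_duration_seconds), so they can be spot-checked directly:

curl -sG 'http://localhost:9091/api/v1/query' --data-urlencode 'query=go_goroutines'
curl -sG 'http://localhost:9091/api/v1/query' --data-urlencode 'query=go_gc_duration_seconds{quantile="1"}'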

Scenario-Specific Expectations

Scenario               Key Metrics to Watch                                    Expected Behavior
--------               --------------------                                    -----------------
S1 (Burst)             action_latency_p99, cpu_cores_max, goroutines_max       Should handle bursts without queue backup
S2 (Fan-Out)           reconcile_total, workloads_matched, memory_rss_mb_max   One CM change → 50 workload reloads
S3 (High Cardinality)  reconcile_duration, memory_heap_mb_avg                  Many namespaces shouldn't increase memory
S4 (No-Op)             action_total = 0, low cpu_cores_avg                     Minimal resource usage for no-op updates
S5 (Churn)             errors_total, goroutines_avg                            Graceful handling, no goroutine leak
S6 (Restart)           All metrics captured                                    Metrics survive controller restart
S7 (API Pressure)      errors_total, cpu_cores_max, goroutines_max             No errors under concurrent load
S8 (Large Objects)     memory_rss_mb_max, gc_pause_p99_ms                      Large ConfigMaps don't cause OOM or GC issues
S9 (Multi-Workload)    reload_executed_total per type                          All workload types (Deploy, STS, DS) reload
S10 (Secrets)          reload_executed_total, workloads_matched                Both Secrets and ConfigMaps trigger reloads
S11 (Annotation)       workload annotations present                            Deployments get last-reloaded-from annotation
S12 (Pause)            reload_executed_total << updates                        Pause-period reduces reload frequency
S13 (Complex)          reload_executed_total                                   All reference types trigger reloads

Troubleshooting

New implementation shows 0 for all metrics

  • Check if Prometheus is scraping the new Reloader pod
  • Verify pod annotations: prometheus.io/scrape: "true"
  • Check Prometheus targets: http://localhost:9091/targets
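
A few commands that help narrow this down (the namespace assumes the reloader-new layout used later in this README):

# Confirm the scrape annotation is present on the pod
kubectl -n reloader-new get pods \
  -o jsonpath='{.items[*].metadata.annotations.prometheus\.io/scrape}'

# Check target health without opening a browser
curl -s http://localhost:9091/api/v1/targets \
  | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'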

Metrics don't match expected values

  • Verify test ran to completion (check logs)
  • Ensure Prometheus scraped final metrics (18s wait after test)
  • Check for pod restarts during test (metrics reset on restart - handled by increase())
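
The increase() handling can be verified directly: a counter that reset when the pod restarted should still report a sensible delta over the test window. A sketch, assuming a 10-minute range covers the run:

curl -sG 'http://localhost:9091/api/v1/query' \
  --data-urlencode 'query=increase(action_total[10m])'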

High latency in new implementation

  • Check Reloader pod resource limits
  • Look for API server throttling in logs
  • Compare reconcile_total - fewer reconciles with higher duration may be normal

REST client errors are non-zero

  • Common causes:
    • Optional CRD schemes registered but CRDs not installed (e.g., Argo Rollouts, OpenShift DeploymentConfig)
    • API server rate limiting under high load
    • RBAC permissions missing for certain resource types
  • Argo Rollouts errors: If you see ~4 errors per test, ensure --enable-argo-rollouts=false if not using Argo Rollouts
  • OpenShift errors: Similarly, ensure DeploymentConfig support is disabled on non-OpenShift clusters

REST client requests much higher in new implementation

  • Check if caching is working correctly
  • Look for excessive re-queuing in controller logs
  • Compare reconcile_total - more reconciles naturally means more API calls

Report Format

The report generator produces a comparison table with units and expected value indicators:

================================================================================
                     RELOADER A/B COMPARISON REPORT
================================================================================

Scenario:     S2
Generated:    2026-01-03 14:30:00
Status:       PASS
Summary:      All metrics within acceptable thresholds

Test:         S2: Fan-out test - 1 CM update triggers 50 deployment reloads

--------------------------------------------------------------------------------
                           METRIC COMPARISONS
--------------------------------------------------------------------------------
(Old✓/New✓ = meets expected value within 15%)

Metric                                   Old          New   Expected  Old✓  New✓   Status
------                                   ---          ---   --------  ----  ----   ------
reconcile_total                        50.00        25.00          -     -     -     pass
reconcile_duration_p50 (s)              0.01         0.05          -     -     -     pass
reconcile_duration_p95 (s)              0.02         0.15          -     -     -     pass
action_total                           50.00        50.00         50     ✓     ✓     pass
action_latency_p50 (s)                  0.05         0.03          -     -     -     pass
action_latency_p95 (s)                  0.12         0.08          -     -     -     pass
errors_total                            0.00         0.00          -     -     -     pass
reload_executed_total                  50.00        50.00         50     ✓     ✓     pass
workloads_scanned_total                50.00        50.00         50     ✓     ✓     pass
workloads_matched_total                50.00        50.00         50     ✓     ✓     pass
rest_client_requests_total              850         720            -     -     -     pass
rest_client_requests_get                500         420            -     -     -     pass
rest_client_requests_patch              300         250            -     -     -     pass
rest_client_requests_errors               0           0            -     -     -     pass

Reports are saved to results/<scenario>/report.txt after each test.

Directory Structure

test/loadtest/
├── cmd/
│   └── loadtest/              # Unified CLI (run + report)
│       └── main.go
├── internal/
│   ├── cluster/               # Kind cluster management
│   │   └── kind.go
│   ├── prometheus/            # Prometheus deployment & querying
│   │   └── prometheus.go
│   ├── reloader/              # Reloader deployment
│   │   └── deploy.go
│   └── scenarios/             # Test scenario implementations
│       └── scenarios.go
├── manifests/
│   └── prometheus.yaml        # Prometheus deployment manifest
├── results/                   # Generated after tests
│   └── <scenario>/
│       ├── old/               # Old version data
│       │   ├── *.json         # Prometheus metric snapshots
│       │   └── reloader.log   # Reloader pod logs
│       ├── new/               # New version data
│       │   ├── *.json         # Prometheus metric snapshots
│       │   └── reloader.log   # Reloader pod logs
│       ├── expected.json      # Expected values from test
│       └── report.txt         # Comparison report
├── go.mod
├── go.sum
└── README.md

Building Local Images for Testing

If you want to test local code changes:

# Build the new Reloader image from current source
docker build -t localhost/reloader:dev -f Dockerfile .

# Build from a different branch/commit
git checkout feature-branch
docker build -t localhost/reloader:feature -f Dockerfile .

# Then run comparison
./loadtest run \
  --old-image=stakater/reloader:v1.0.0 \
  --new-image=localhost/reloader:feature

Interpreting Results

PASS

All metrics are within acceptable thresholds. The new implementation performs comparably to, or better than, the old one.

FAIL

One or more metrics exceeded thresholds. Review the specific metrics:

  • Latency degradation: p95/p99 latencies are significantly higher
  • Missed reloads: reload_executed_total differs significantly
  • Errors increased: errors_total is higher in new version

Investigation

If tests fail, check:

  1. Pod logs: kubectl logs -n reloader-new deployment/reloader (or check results/<scenario>/new/reloader.log)
  2. Resource usage: kubectl top pods -n reloader-new
  3. Events: kubectl get events -n reloader-test

Parallel Execution

The --parallelism option enables running scenarios on multiple kind clusters simultaneously, significantly reducing total test time.

How It Works

  1. Multiple Clusters: Creates N kind clusters named reloader-loadtest-0, reloader-loadtest-1, etc.
  2. Separate Prometheus: Each cluster gets its own Prometheus instance with a unique port (9091, 9092, etc.)
  3. Worker Pool: Scenarios are distributed to workers via a channel, with each worker running on its own cluster
  4. Independent Execution: Each scenario runs in complete isolation with no resource contention
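
After a parallel run starts, the per-worker clusters are visible with kind (names as described above):

kind get clusters
# reloader-loadtest-0
# reloader-loadtest-1
# ...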

Usage

# Run 4 scenarios at a time (creates 4 clusters)
./loadtest run --new-image=my-image:tag --parallelism=4

# Run all 13 scenarios in parallel (creates 13 clusters)
./loadtest run --new-image=my-image:tag --parallelism=13 --scenario=all

Resource Requirements

Parallel execution requires significant system resources:

Parallelism  Clusters  Est. Memory  Est. CPU
-----------  --------  -----------  --------
1 (default)  1         ~4GB         2-4 cores
4            4         ~16GB        8-16 cores
13           13        ~52GB        26-52 cores

Notes

  • The --skip-cluster option is not supported with parallelism > 1
  • Each worker loads images independently, so initial setup takes longer
  • All results are written to the same --results-dir with per-scenario subdirectories
  • If a cluster setup fails, remaining workers continue with available clusters
  • Parallelism automatically reduces to match scenario count if set higher

CI Integration

GitHub Actions

Load tests can be triggered on pull requests by commenting /loadtest:

/loadtest

This will:

  1. Build a container image from the PR branch
  2. Run all load test scenarios against it
  3. Post results as a PR comment
  4. Upload detailed results as artifacts

Make Target

Run load tests locally or in CI:

# From repository root
make loadtest

This builds the container image and runs all scenarios with a 60-second duration.