Reloader Load Test Framework

This framework provides A/B comparison testing between two Reloader container images.

Overview

The load test framework:

  1. Creates a local kind cluster (1 control-plane + 6 worker nodes)
  2. Deploys Prometheus for metrics collection
  3. Loads the provided Reloader container images into the cluster
  4. Runs standardized test scenarios (S1-S13)
  5. Collects metrics via Prometheus scraping
  6. Generates comparison reports with pass/fail criteria

Prerequisites

  • Docker or Podman
  • kind (Kubernetes in Docker)
  • kubectl
  • Go 1.22+
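
A quick sanity check before building, assuming the tools are on your PATH:

docker version --format '{{.Server.Version}}'   # or: podman version
kind version
kubectl version --client
go version                                      # should report go1.22 or newer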

Building

cd test/loadtest
go build -o loadtest ./cmd/loadtest

Quick Start

# Compare two published images (e.g., different versions)
./loadtest run \
  --old-image=stakater/reloader:v1.0.0 \
  --new-image=stakater/reloader:v1.1.0

# Run a specific scenario
./loadtest run \
  --old-image=stakater/reloader:v1.0.0 \
  --new-image=stakater/reloader:v1.1.0 \
  --scenario=S2 \
  --duration=120

# Test only a single image (no comparison)
./loadtest run --new-image=myregistry/reloader:dev

# Use local images built with docker/podman
./loadtest run \
  --old-image=localhost/reloader:baseline \
  --new-image=localhost/reloader:feature-branch

# Skip cluster creation (use existing kind cluster)
./loadtest run \
  --old-image=stakater/reloader:v1.0.0 \
  --new-image=stakater/reloader:v1.1.0 \
  --skip-cluster

# Run scenarios 4 at a time across 4 clusters (faster execution)
./loadtest run \
  --new-image=localhost/reloader:dev \
  --parallelism=4

# Run all 13 scenarios in parallel (one cluster per scenario)
./loadtest run \
  --new-image=localhost/reloader:dev \
  --parallelism=13

# Generate report from existing results
./loadtest report --scenario=S2 --results-dir=./results

Command Line Options

Run Command

Option              Description                                                     Default
------              -----------                                                     -------
--old-image=IMAGE   Container image for "old" version                               -
--new-image=IMAGE   Container image for "new" version                               -
--scenario=ID       Test scenario: S1-S13 or "all"                                  all
--duration=SECONDS  Test duration in seconds                                        60
--parallelism=N     Run N scenarios in parallel on N kind clusters                  1
--skip-cluster      Skip kind cluster creation (use existing; parallelism=1 only)   false
--results-dir=DIR   Directory for results                                           ./results

Note: At least one of --old-image or --new-image is required. Provide both for A/B comparison.

Report Command

Option             Description                        Default
------             -----------                        -------
--scenario=ID      Scenario to report on (required)   -
--results-dir=DIR  Directory containing results       ./results
--output=FILE      Output file                        stdout

Test Scenarios

ID   Name                  Description
--   ----                  -----------
S1   Burst Updates         Many ConfigMap/Secret updates in quick succession
S2   Fan-Out               One ConfigMap used by many (50) workloads
S3   High Cardinality      Many ConfigMaps/Secrets across many namespaces
S4   No-Op Updates         Updates that don't change data (annotation only)
S5   Workload Churn        Deployments created/deleted rapidly
S6   Controller Restart    Restart controller pod under load
S7   API Pressure          Many concurrent update requests
S8   Large Objects         ConfigMaps > 100KB
S9   Multi-Workload Types  Tests all workload types (Deploy, STS, DS)
S10  Secrets + Mixed       Secrets and mixed ConfigMap+Secret workloads
S11  Annotation Strategy   Tests --reload-strategy=annotations
S12  Pause & Resume        Tests pause-period during rapid updates
S13  Complex References    Init containers, valueFrom, projected volumes

Metrics Reference

This section explains each metric collected during load tests, what it measures, and what different values might indicate.

Counter Metrics (Totals)

reconcile_total

What it measures: The total number of reconciliation loops executed by the controller.

What it indicates:

  • Higher in new vs old: The new controller-runtime implementation may batch events differently. This is often expected behavior, not a problem.
  • Lower in new vs old: Better event batching/deduplication. Controller-runtime's work queue naturally deduplicates events.
  • Expected behavior: The new implementation typically has fewer reconciles due to intelligent event batching.
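
The framework collects these values automatically, but a counter can also be inspected by hand. A minimal sketch, assuming the test Prometheus is reachable on localhost:9091 (as in the Troubleshooting section) and jq is installed:

# Show the raw counter for every labeled series
curl -s 'http://localhost:9091/api/v1/query?query=reconcile_total' \
  | jq '.data.result[] | {labels: .metric, value: .value[1]}'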

action_total

What it measures: The total number of reload actions triggered (rolling restarts of Deployments/StatefulSets/DaemonSets).

What it indicates:

  • Should match expected value: Both implementations should trigger the same number of reloads for the same workload.
  • Lower than expected: Some updates were missed - potential bug or race condition.
  • Higher than expected: Duplicate reloads triggered - inefficiency but not data loss.

reload_executed_total

What it measures: Successful reload operations executed, labeled by success=true/false.

What it indicates:

  • success=true count: Number of workloads successfully restarted.
  • success=false count: Failed restart attempts (API errors, permission issues).
  • Should match action_total: If significantly lower, reloads are failing.
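
Because the counter is labeled, failures can be queried separately. A hedged example against the same localhost:9091 endpoint:

# Failed reloads only; this should normally be 0 or return no series
curl -sG 'http://localhost:9091/api/v1/query' \
  --data-urlencode 'query=reload_executed_total{success="false"}'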

workloads_scanned_total

What it measures: Number of workloads (Deployments, etc.) scanned when checking for ConfigMap/Secret references.

What it indicates:

  • High count: Controller is scanning many workloads per reconcile.
  • Expected behavior: Should roughly match the number of workloads × number of reconciles.
  • Optimization signal: If very high, namespace filtering or label selectors could help.

workloads_matched_total

What it measures: Number of workloads that matched (reference the changed ConfigMap/Secret).

What it indicates:

  • Should match reload_executed_total: Every matched workload should be reloaded.
  • Higher than reloads: Some matched workloads weren't reloaded (potential issue).

errors_total

What it measures: Total errors encountered, labeled by error type.

What it indicates:

  • Should be 0: Any errors indicate problems.
  • Common causes: API server timeouts, RBAC issues, resource conflicts.
  • Critical metric: Non-zero errors in production should be investigated.

API Efficiency Metrics (REST Client)

These metrics track Kubernetes API server calls made by Reloader. Lower values indicate more efficient operation with less API server load.

rest_client_requests_total

What it measures: Total number of HTTP requests made to the Kubernetes API server.

What it indicates:

  • Lower is better: Fewer API calls means less load on the API server.
  • High count: May indicate inefficient caching or excessive reconciles.
  • Comparison use: Shows overall API efficiency between implementations.
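
If these values are derived from client-go's standard rest_client_requests_total metric (which labels requests by method, code, and host), the total can also be broken down per HTTP verb directly. A sketch against the same Prometheus endpoint:

curl -sG 'http://localhost:9091/api/v1/query' \
  --data-urlencode 'query=sum by (method) (rest_client_requests_total)'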

rest_client_requests_get

What it measures: Number of GET requests (fetching individual resources or listings).

What it indicates:

  • Includes: Fetching ConfigMaps, Secrets, Deployments, etc.
  • Higher count: More frequent resource fetching, possibly due to cache misses.
  • Expected behavior: Controller-runtime's caching should reduce GET requests compared to direct API calls.

rest_client_requests_patch

What it measures: Number of PATCH requests (partial updates to resources).

What it indicates:

  • Used for: Rolling restart annotations on workloads.
  • Should correlate with: reload_executed_total - each reload typically requires one PATCH.
  • Lower is better: Fewer patches means more efficient batching or deduplication.

rest_client_requests_put

What it measures: Number of PUT requests (full resource updates).

What it indicates:

  • Used for: Full object replacements (less common than PATCH).
  • Should be low: Most updates use PATCH for efficiency.
  • High count: May indicate suboptimal update strategy.

rest_client_requests_errors

What it measures: Number of failed API requests (4xx/5xx responses).

What it indicates:

  • Should be 0: Errors indicate API server issues or permission problems.
  • Common causes: Rate limiting, RBAC issues, resource conflicts, network issues.
  • Non-zero: Investigate API server logs and Reloader permissions.

Latency Metrics (Percentiles)

All latency metrics are reported in seconds. The report shows p50 (median), p95, and p99 percentiles.

reconcile_duration (s)

What it measures: Time spent inside each reconcile loop, from start to finish.

What it indicates:

  • p50 (median): Typical reconcile time. Should be < 100ms for good performance.
  • p95: 95th percentile - only 5% of reconciles take longer than this.
  • p99: 99th percentile - indicates worst-case performance.

Interpreting differences:

  • New higher than old: Controller-runtime reconciles may do more work per loop but run fewer times. Check reconcile_total - if it's lower, this is expected.
  • Minor differences (< 0.5s absolute): Not significant for sub-second values.

action_latency (s)

What it measures: End-to-end time from ConfigMap/Secret change detection to workload restart triggered.

What it indicates:

  • This is the user-facing latency: How long users wait for their config changes to take effect.
  • p50 < 1s: Excellent - most changes apply within a second.
  • p95 < 5s: Good - even under load, changes apply quickly.
  • p99 > 10s: May need investigation - some changes take too long.

What affects this:

  • API server responsiveness
  • Number of workloads to scan
  • Concurrent updates competing for resources
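
If the latency series are exported as Prometheus histograms (the usual pattern behind percentile reporting; the exact series name and _bucket suffix are assumptions here), a percentile can be recomputed manually:

# Hypothetical histogram query - adjust the series name to what the pod actually exports
curl -sG 'http://localhost:9091/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.95, rate(action_latency_bucket[5m]))'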

Understanding the Report

Report Columns

Metric                           Old          New   Expected  Old✓  New✓   Status
------                           ---          ---   --------  ----  ----   ------
action_total                  100.00       100.00        100     ✓     ✓     pass
action_latency_p95 (s)          0.15         0.04          -     -     -     pass

  • Old/New: Measured values from each implementation
  • Expected: Known expected value (for throughput metrics)
  • Old✓/New✓: Whether the value is within 15% of expected (✓ = yes, ✗ = no, - = no expected value)
  • Status: pass/fail based on comparison thresholds

Pass/Fail Logic

Metric Type                                       Pass Condition
-----------                                       --------------
Throughput (action_total, reload_executed_total)  New value within 15% of expected
Latency (p50, p95, p99)                           New not more than threshold% worse than old, OR absolute difference below the minimum threshold
Errors                                            New ≤ Old (ideally both 0)
API Efficiency (rest_client_requests_*)           New ≤ Old (lower is better), or New not more than 50% higher

Latency Thresholds

Latency comparisons use both percentage AND absolute thresholds to avoid false failures:

Metric  Max % Worse  Min Absolute Diff
------  -----------  -----------------
p50     100%         0.5s
p95     100%         1.0s
p99     100%         1.0s

Example: If old p50 = 0.01s and new p50 = 0.08s:

  • Percentage difference: +700% (would fail % check)
  • Absolute difference: 0.07s (< 0.5s threshold)
  • Result: PASS (both values are fast enough that the difference doesn't matter)
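
The dual-threshold rule is small enough to sketch directly. A hypothetical helper (not part of the CLI; values in seconds):

# pass if the absolute regression is small OR the percentage regression is in bounds
latency_check() {
  awk -v o="$1" -v n="$2" -v pct="$3" -v abs="$4" 'BEGIN {
    if (n - o < abs)            { print "pass"; exit }  # absolute difference guard
    if (n <= o * (1 + pct/100)) { print "pass"; exit }  # percentage check
    print "fail"
  }'
}
latency_check 0.01 0.08 100 0.5   # -> pass: 0.07s is under the 0.5s absolute threshold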

Resource Consumption Metrics

These metrics track CPU, memory, and Go runtime resource usage. Lower values generally indicate more efficient operation.

Memory Metrics

Metric              Description                              Unit
------              -----------                              ----
memory_rss_mb_avg   Average RSS (resident set size) memory   MB
memory_rss_mb_max   Peak RSS memory during test              MB
memory_heap_mb_avg  Average Go heap allocation               MB
memory_heap_mb_max  Peak Go heap allocation                  MB

What to watch for:

  • High RSS: May indicate memory leaks or inefficient caching
  • High heap: Many objects being created (check GC metrics)
  • Growing over time: Potential memory leak

CPU Metrics

Metric         Description             Unit
------         -----------             ----
cpu_cores_avg  Average CPU usage rate  cores
cpu_cores_max  Peak CPU usage rate     cores

What to watch for:

  • High CPU: Inefficient algorithms or excessive reconciles
  • Spiky max: May indicate burst handling issues

Go Runtime Metrics

Metric           Description                    Unit
------           -----------                    ----
goroutines_avg   Average goroutine count        count
goroutines_max   Peak goroutine count           count
gc_pause_p99_ms  99th percentile GC pause time  ms

What to watch for:

  • High goroutines: Potential goroutine leak or unbounded concurrency
  • High GC pause: Large heap or allocation pressure
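
Assuming the standard Go collectors are enabled, these figures map to the runtime series that every Prometheus-instrumented Go binary exposes (go_goroutines, go_gc_duration_seconds), so they can be spot-checked directly:

curl -sG 'http://localhost:9091/api/v1/query' --data-urlencode 'query=go_goroutines'
curl -sG 'http://localhost:9091/api/v1/query' --data-urlencode 'query=go_gc_duration_seconds{quantile="1"}'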

Scenario-Specific Expectations

Scenario               Key Metrics to Watch                                    Expected Behavior
--------               --------------------                                    -----------------
S1 (Burst)             action_latency_p99, cpu_cores_max, goroutines_max       Should handle bursts without queue backup
S2 (Fan-Out)           reconcile_total, workloads_matched, memory_rss_mb_max   One CM change → 50 workload reloads
S3 (High Cardinality)  reconcile_duration, memory_heap_mb_avg                  Many namespaces shouldn't increase memory
S4 (No-Op)             action_total = 0, low cpu_cores_avg                     Minimal resource usage for no-op updates
S5 (Churn)             errors_total, goroutines_avg                            Graceful handling, no goroutine leak
S6 (Restart)           All metrics captured                                    Metrics survive controller restart
S7 (API Pressure)      errors_total, cpu_cores_max, goroutines_max             No errors under concurrent load
S8 (Large Objects)     memory_rss_mb_max, gc_pause_p99_ms                      Large ConfigMaps don't cause OOM or GC issues
S9 (Multi-Workload)    reload_executed_total per type                          All workload types (Deploy, STS, DS) reload
S10 (Secrets)          reload_executed_total, workloads_matched                Both Secrets and ConfigMaps trigger reloads
S11 (Annotation)       workload annotations present                            Deployments get last-reloaded-from annotation
S12 (Pause)            reload_executed_total << updates                        Pause-period reduces reload frequency
S13 (Complex)          reload_executed_total                                   All reference types trigger reloads

Troubleshooting

New implementation shows 0 for all metrics

  • Check if Prometheus is scraping the new Reloader pod
  • Verify pod annotations: prometheus.io/scrape: "true"
  • Check Prometheus targets: http://localhost:9091/targets
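
A few commands that help narrow this down (the namespace assumes the reloader-new layout used later in this README):

# Confirm the scrape annotation is present on the pod
kubectl -n reloader-new get pods \
  -o jsonpath='{.items[*].metadata.annotations.prometheus\.io/scrape}'

# Check target health without opening a browser
curl -s http://localhost:9091/api/v1/targets \
  | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'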

Metrics don't match expected values

  • Verify test ran to completion (check logs)
  • Ensure Prometheus scraped final metrics (18s wait after test)
  • Check for pod restarts during test (metrics reset on restart - handled by increase())
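
The increase() handling can be verified directly: a counter that reset when the pod restarted should still report a sensible delta over the test window. A sketch, assuming a 10-minute range covers the run:

curl -sG 'http://localhost:9091/api/v1/query' \
  --data-urlencode 'query=increase(action_total[10m])'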

High latency in new implementation

  • Check Reloader pod resource limits
  • Look for API server throttling in logs
  • Compare reconcile_total - fewer reconciles with higher duration may be normal

REST client errors are non-zero

  • Common causes:
    • Optional CRD schemes registered but CRDs not installed (e.g., Argo Rollouts, OpenShift DeploymentConfig)
    • API server rate limiting under high load
    • RBAC permissions missing for certain resource types
  • Argo Rollouts errors: If you see ~4 errors per test, ensure --enable-argo-rollouts=false if not using Argo Rollouts
  • OpenShift errors: Similarly, ensure DeploymentConfig support is disabled on non-OpenShift clusters

REST client requests much higher in new implementation

  • Check if caching is working correctly
  • Look for excessive re-queuing in controller logs
  • Compare reconcile_total - more reconciles naturally means more API calls

Report Format

The report generator produces a comparison table with units and expected value indicators:

================================================================================
                     RELOADER A/B COMPARISON REPORT
================================================================================

Scenario:     S2
Generated:    2026-01-03 14:30:00
Status:       PASS
Summary:      All metrics within acceptable thresholds

Test:         S2: Fan-out test - 1 CM update triggers 50 deployment reloads

--------------------------------------------------------------------------------
                           METRIC COMPARISONS
--------------------------------------------------------------------------------
(Old✓/New✓ = meets expected value within 15%)

Metric                                   Old          New   Expected  Old✓  New✓   Status
------                                   ---          ---   --------  ----  ----   ------
reconcile_total                        50.00        25.00          -     -     -     pass
reconcile_duration_p50 (s)              0.01         0.05          -     -     -     pass
reconcile_duration_p95 (s)              0.02         0.15          -     -     -     pass
action_total                           50.00        50.00         50     ✓     ✓     pass
action_latency_p50 (s)                  0.05         0.03          -     -     -     pass
action_latency_p95 (s)                  0.12         0.08          -     -     -     pass
errors_total                            0.00         0.00          -     -     -     pass
reload_executed_total                  50.00        50.00         50     ✓     ✓     pass
workloads_scanned_total                50.00        50.00         50     ✓     ✓     pass
workloads_matched_total                50.00        50.00         50     ✓     ✓     pass
rest_client_requests_total              850         720            -     -     -     pass
rest_client_requests_get                500         420            -     -     -     pass
rest_client_requests_patch              300         250            -     -     -     pass
rest_client_requests_errors               0           0            -     -     -     pass

Reports are saved to results/<scenario>/report.txt after each test.

Directory Structure

test/loadtest/
├── cmd/
│   └── loadtest/              # Unified CLI (run + report)
│       └── main.go
├── internal/
│   ├── cluster/               # Kind cluster management
│   │   └── kind.go
│   ├── prometheus/            # Prometheus deployment & querying
│   │   └── prometheus.go
│   ├── reloader/              # Reloader deployment
│   │   └── deploy.go
│   └── scenarios/             # Test scenario implementations
│       └── scenarios.go
├── manifests/
│   └── prometheus.yaml        # Prometheus deployment manifest
├── results/                   # Generated after tests
│   └── <scenario>/
│       ├── old/               # Old version data
│       │   ├── *.json         # Prometheus metric snapshots
│       │   └── reloader.log   # Reloader pod logs
│       ├── new/               # New version data
│       │   ├── *.json         # Prometheus metric snapshots
│       │   └── reloader.log   # Reloader pod logs
│       ├── expected.json      # Expected values from test
│       └── report.txt         # Comparison report
├── go.mod
├── go.sum
└── README.md

Building Local Images for Testing

If you want to test local code changes:

# Build the new Reloader image from current source
docker build -t localhost/reloader:dev -f Dockerfile .

# Build from a different branch/commit
git checkout feature-branch
docker build -t localhost/reloader:feature -f Dockerfile .

# Then run comparison
./loadtest run \
  --old-image=stakater/reloader:v1.0.0 \
  --new-image=localhost/reloader:feature

Interpreting Results

PASS

All metrics are within acceptable thresholds. The new implementation performs comparably to, or better than, the old one.

FAIL

One or more metrics exceeded thresholds. Review the specific metrics:

  • Latency degradation: p95/p99 latencies are significantly higher
  • Missed reloads: reload_executed_total differs significantly
  • Errors increased: errors_total is higher in new version

Investigation

If tests fail, check:

  1. Pod logs: kubectl logs -n reloader-new deployment/reloader (or check results/<scenario>/new/reloader.log)
  2. Resource usage: kubectl top pods -n reloader-new
  3. Events: kubectl get events -n reloader-test

Parallel Execution

The --parallelism option enables running scenarios on multiple kind clusters simultaneously, significantly reducing total test time.

How It Works

  1. Multiple Clusters: Creates N kind clusters named reloader-loadtest-0, reloader-loadtest-1, etc.
  2. Separate Prometheus: Each cluster gets its own Prometheus instance with a unique port (9091, 9092, etc.)
  3. Worker Pool: Scenarios are distributed to workers via a channel, with each worker running on its own cluster
  4. Independent Execution: Each scenario runs in complete isolation with no resource contention
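
After a parallel run starts, the per-worker clusters are visible with kind (names as described above):

kind get clusters
# reloader-loadtest-0
# reloader-loadtest-1
# ...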

Usage

# Run 4 scenarios at a time (creates 4 clusters)
./loadtest run --new-image=my-image:tag --parallelism=4

# Run all 13 scenarios in parallel (creates 13 clusters)
./loadtest run --new-image=my-image:tag --parallelism=13 --scenario=all

Resource Requirements

Parallel execution requires significant system resources:

Parallelism  Clusters  Est. Memory  Est. CPU
-----------  --------  -----------  --------
1 (default)  1         ~4GB         2-4 cores
4            4         ~16GB        8-16 cores
13           13        ~52GB        26-52 cores

Notes

  • The --skip-cluster option is not supported with parallelism > 1
  • Each worker loads images independently, so initial setup takes longer
  • All results are written to the same --results-dir with per-scenario subdirectories
  • If a cluster setup fails, remaining workers continue with available clusters
  • Parallelism automatically reduces to match scenario count if set higher

CI Integration

GitHub Actions

Load tests can be triggered on pull requests by commenting /loadtest:

/loadtest

This will:

  1. Build a container image from the PR branch
  2. Run all load test scenarios against it
  3. Post results as a PR comment
  4. Upload detailed results as artifacts

Make Target

Run load tests locally or in CI:

# From repository root
make loadtest

This builds the container image and runs all scenarios with a 60-second duration.