Stakater Reloader Project Memory
Project Purpose
Reloader is a Kubernetes operator that automatically triggers rolling restarts of workloads when the ConfigMaps or Secrets they reference are updated. Without it, Kubernetes does not restart pods when configuration changes — operators must do it manually or rely on GitOps pipelines.
What it watches: ConfigMaps, Secrets, Namespaces, and (optionally) SecretProviderClassPodStatus (CSI-mounted secrets).
Workload types it can reload: Deployment, StatefulSet, DaemonSet, CronJob, Job, Argo Rollout, and OpenShift DeploymentConfig.
How restarts are triggered: Two strategies (selected via --reload-strategy):
- env-vars (default): injects an environment variable (STAKATER_{NAME}_{TYPE}) into every container with the SHA1 hash of the resource's data. A change in data changes the env var value, causing Kubernetes to restart pods.
- annotations: writes the SHA1 hash into the pod template's annotations, which also forces a rollout.
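The hash-and-name idea behind both strategies can be sketched in a few lines. This is a minimal illustration, not Reloader's code: hashData and envVarName are hypothetical helpers, and the name-conversion rule (upper-case, non-alphanumerics to underscores) and the way keys and values are serialized before hashing are assumptions.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// hashData returns a SHA1 hex digest over the key/value pairs in sorted key
// order, so the result does not depend on map iteration order.
// The "key=value;" serialization is an assumption made for this sketch.
func hashData(data map[string]string) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha1.New()
	for _, k := range keys {
		h.Write([]byte(k + "=" + data[k] + ";"))
	}
	return hex.EncodeToString(h.Sum(nil))
}

// envVarName builds STAKATER_{NAME}_{TYPE}. Upper-casing and replacing every
// non-alphanumeric character with an underscore is an assumed conversion rule.
func envVarName(name, kind string) string {
	var b strings.Builder
	for _, r := range strings.ToUpper(name) {
		if (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9') {
			b.WriteRune(r)
		} else {
			b.WriteRune('_')
		}
	}
	return "STAKATER_" + b.String() + "_" + strings.ToUpper(kind)
}

func main() {
	before := hashData(map[string]string{"LOG_LEVEL": "info"})
	after := hashData(map[string]string{"LOG_LEVEL": "debug"})
	// A changed value yields a different digest, which is what forces the rollout.
	fmt.Println(envVarName("app-config", "configmap"), before != after)
}
```

Sorting the keys matters: Go map iteration order is randomized, so hashing in iteration order would produce spurious "changes".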
The core problem it solves: ConfigMaps and Secrets are decoupled from pod lifecycle in Kubernetes. Applications reading config at startup see stale data after a config update unless pods are restarted. Reloader closes that gap automatically and selectively.
Potential improvements observed:
- Duplicate reload suppression: If a workload references both a ConfigMap and a Secret that are updated in the same controller reconcile cycle, it may get reloaded twice. Could be solved with a per-workload debounce map keyed by namespace/name/resourceVersion, flushed after a short TTL.
- CronJob/Job reload is destructive: Jobs are deleted and recreated on change, which loses run history. Could instead only annotate the CronJob template without spawning a new Job.
- No per-resource reload rate limiting: A rapid-fire ConfigMap update (e.g., from a CI pipeline) can trigger many restarts. A cooldown window per resource would help.
- CSI integration gap: CSI volumes are watched at the SecretProviderClassPodStatus level, but the link back to the workload is indirect and may miss edge cases. Needs a direct map from SecretProviderClass → workloads that mount it.
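The debounce and cooldown ideas in the list above could look roughly like the sketch below. Nothing here exists in Reloader today: Debouncer, NewDebouncer, Allow, and the demo helper at are all hypothetical names, and the key format ("namespace/name") is an assumption.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Debouncer remembers when each key was last allowed through and suppresses
// repeats that arrive within ttl. Keys are assumed to look like "namespace/name".
type Debouncer struct {
	mu   sync.Mutex
	ttl  time.Duration
	seen map[string]time.Time
}

// NewDebouncer returns a Debouncer with the given suppression window.
func NewDebouncer(ttl time.Duration) *Debouncer {
	return &Debouncer{ttl: ttl, seen: make(map[string]time.Time)}
}

// Allow reports whether a reload for key should proceed at time now.
// Expired entries are dropped on every call so the map stays bounded.
func (d *Debouncer) Allow(key string, now time.Time) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	for k, t := range d.seen {
		if now.Sub(t) >= d.ttl {
			delete(d.seen, k)
		}
	}
	if _, recent := d.seen[key]; recent {
		return false
	}
	d.seen[key] = now
	return true
}

// at returns a fixed instant sec seconds after the Unix epoch, which keeps
// the demo deterministic.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	d := NewDebouncer(5 * time.Second)
	fmt.Println(d.Allow("default/web", at(0))) // first event in the window
	fmt.Println(d.Allow("default/web", at(1))) // repeat inside the window
	fmt.Println(d.Allow("default/web", at(6))) // window has elapsed
}
```

One design caveat: simply dropping the second event would lose the second resource's new hash. A real implementation would more likely coalesce pending changes and apply them in a single update once the window closes.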
Repo Map
| Path | Owns | Inspect when |
|---|---|---|
| main.go | Entry point, delegates to app.Run() | Never needs changes |
| internal/pkg/app/ | Run() bootstrap, Cobra command wiring | Startup sequence changes |
| internal/pkg/cmd/ | CLI flags parsing, startReloader(), controller/HA wiring | Adding new flags or startup behavior |
| internal/pkg/controller/ | Informer/queue per resource type, event handlers (Add/Update/Delete) | Watching new resource types, queue tuning |
| internal/pkg/handler/ | Per-event handlers (create, update, delete), doRollingUpgrade(), pause deployment | Core reload logic changes |
| internal/pkg/callbacks/ | Workload-specific get/list/update/patch functions, RollingUpgradeFuncs struct | Adding new workload types |
| internal/pkg/options/ | All CLI flag variables, defaults, ArgoRolloutStrategy type | Adding or renaming flags |
| internal/pkg/constants/ | Constants: env var postfixes, annotation prefix, strategy names, HA lock name | Renaming global identifiers |
| internal/pkg/metrics/ | Prometheus Collectors struct, all metric registration and recording helpers | Adding metrics |
| internal/pkg/alerts/ | Slack/Teams/GChat/raw webhook alerting, env var config | Alert sink changes |
| internal/pkg/util/ | SHA generation via crypto/sha.go, env var name conversion, namespace/label utilities | Utility/hash changes |
| internal/pkg/crypto/ | GenerateSHA(data): SHA1 hex digest | Hash algorithm changes |
| internal/pkg/leadership/ | Leader election via Kubernetes Lease, HA stop/start of controllers | HA behavior changes |
| internal/pkg/testutil/ | Fake Kubernetes objects for unit tests | Writing new tests |
| pkg/common/ | ReloadCheckResult, ReloaderOptions, ShouldReload() logic, Config struct | Reload decision logic, annotation precedence |
| pkg/kube/ | Clients struct (k8s + OpenShift + Argo + CSI), GetKubernetesClient(), ResourceMap | Client initialization, new CRD clients |
| deployments/ | Helm chart (deployments/kubernetes/chart/reloader/), Kustomize manifests | Helm values, RBAC, deployment config |
| docs/ | User-facing annotation documentation, architecture notes | Writing docs or confirming annotation behavior |
| scripts/ | Shell scripts used by CI and Makefile | Build/release pipeline |
| test/loadtest/ | Load test CLI (cmd/loadtest), 13 scenarios (S1–S13), Kind cluster setup | Performance testing, regression benchmarks |
| .github/ | CI workflows: lint, test, Kind e2e, multi-arch Docker build, release | CI changes |
Core Runtime Flow
1. Entry — main.go:10 calls app.Run().
2. CLI Init — internal/pkg/app/app.go calls cmd.NewReloaderCommand() which registers all Cobra flags from options/flags.go and runs startReloader().
3. Client Setup — pkg/kube/client.go: builds kube.Clients with:
- kubernetes.Interface: standard k8s client
- appsclient.Interface: OpenShift client (auto-detected by probing deploymentconfigs)
- argorollout.Interface: if --is-Argo-Rollouts=true
- csiclient.Interface: if --enable-csi-integration
4. Controller Creation — startReloader() iterates kube.ResourceMap (configmaps, secrets, namespaces, and optionally secretproviderclasspodstatuses) and calls controller.NewController() for each resource in each watched namespace.
5. Informer/Queue — controller.NewController():
- Creates a cache.NewFilteredListWatchFromClient with label/field selectors.
- Registers Add, Update, Delete event handlers.
- Creates a workqueue.TypedRateLimitingQueue for async processing.
6. Event Detection:
- Add: enqueues only if ReloadOnCreate is enabled (skips during initial sync unless SyncAfterRestart).
- Update: compares SHA of old vs new object data; enqueues only on real changes.
- Delete: enqueues only if ReloadOnDelete is enabled.
- Namespace events update selectedNamespacesCache for namespace-selector filtering.
7. Handler Dispatch — The queue worker calls handler.Handle() on the dequeued item. Three handler types:
- ResourceCreatedHandler (create.go): fires doRollingUpgrade or sends webhook.
- ResourceUpdatedHandler (update.go): fires doRollingUpgrade or sends webhook.
- ResourceDeleteHandler (delete.go): calls invokeDeleteStrategy (removes env vars or clears annotation).
8. Workload Discovery — doRollingUpgrade() (upgrade.go:181) calls rollingUpgrade() for each workload type. For each type, ItemsFunc lists all workloads in the namespace, then pkg/common.ShouldReload() checks annotations to decide which ones need reloading.
9. Reload Execution — invokeReloadStrategy() either:
- env-vars: mutates container env vars; uses JSON patch if SupportsPatch=true, full update otherwise.
- annotations: writes SHA to pod template annotations; same patch/update split.
10. Post-reload — optionally pauses the Deployment via pause_deployment.go, records Kubernetes Events via recorder, updates Prometheus metrics, sends alert webhooks.
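The event-detection rules in step 6 reduce to a small decision function. The sketch below is a simplified model, not the controller's actual code: EventKind, Options, and shouldEnqueue are hypothetical names, and the two hash arguments stand in for the SHA of the old and new object data.

```go
package main

import "fmt"

// EventKind identifies which informer callback fired.
type EventKind int

const (
	Add EventKind = iota
	Update
	Delete
)

// Options mirrors the three settings named in step 6; the struct itself is
// hypothetical.
type Options struct {
	ReloadOnCreate   bool
	ReloadOnDelete   bool
	SyncAfterRestart bool
}

// shouldEnqueue decides whether an event reaches the work queue.
// initialSyncDone is false while the informer is still replaying objects
// that already existed at startup.
func shouldEnqueue(kind EventKind, opts Options, initialSyncDone bool, oldHash, newHash string) bool {
	switch kind {
	case Add:
		if !opts.ReloadOnCreate {
			return false
		}
		// Adds seen during the initial list are skipped unless SyncAfterRestart is set.
		return initialSyncDone || opts.SyncAfterRestart
	case Update:
		// Metadata-only updates leave the data hash unchanged and are ignored.
		return oldHash != newHash
	case Delete:
		return opts.ReloadOnDelete
	}
	return false
}

func main() {
	opts := Options{ReloadOnCreate: true}
	fmt.Println(shouldEnqueue(Add, opts, false, "", ""))         // still syncing
	fmt.Println(shouldEnqueue(Update, opts, true, "abc", "abc")) // no data change
	fmt.Println(shouldEnqueue(Update, opts, true, "abc", "def")) // data changed
}
```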
HA Mode: if --enable-ha, internal/pkg/leadership/ runs Kubernetes Lease-based leader election. Only the leader runs controllers; losing leadership stops them and marks the pod unhealthy.
HTTP Server: port :9090 serves /metrics (Prometheus) and liveness/readiness probes.
Reload Behavior And Annotations
All annotation names are configurable via CLI flags; the values below are defaults.
Trigger Annotations (on workloads)
| Annotation | Value | Behavior |
|---|---|---|
| reloader.stakater.com/auto | "true" | Reload on change to any ConfigMap or Secret referenced by the workload (via envFrom, env valueFrom, or volumes) |
| configmap.reloader.stakater.com/auto | "true" | Reload on change to any referenced ConfigMap only |
| secret.reloader.stakater.com/auto | "true" | Reload on change to any referenced Secret only |
| secretproviderclass.reloader.stakater.com/auto | "true" | Reload on change to any referenced SecretProviderClass only |
| configmap.reloader.stakater.com/reload | "cm1,cm2" | Reload only when the named ConfigMaps change (regex supported) |
| secret.reloader.stakater.com/reload | "sec1,sec2" | Reload only when the named Secrets change (regex supported) |
| secretproviderclass.reloader.stakater.com/reload | "spc1" | Reload only when the named SecretProviderClass changes |
| reloader.stakater.com/search | "true" | Reload when any ConfigMap/Secret tagged with reloader.stakater.com/match: "true" changes |
Exclude Annotations (on workloads)
| Annotation | Value | Behavior |
|---|---|---|
| reloader.stakater.com/ignore | "true" | Skip this workload entirely |
| configmaps.exclude.reloader.stakater.com/reload | "cm1,cm2" | Exclude these named ConfigMaps from triggering reload |
| secrets.exclude.reloader.stakater.com/reload | "sec1,sec2" | Exclude these named Secrets |
| secretproviderclasses.exclude.reloader.stakater.com/reload | "spc1" | Exclude these named SecretProviderClasses |
Behavior Annotations (on workloads)
| Annotation | Value | Behavior |
|---|---|---|
| reloader.stakater.com/rollout-strategy | "restart" or "rollout" | For Argo Rollouts: "restart" uses restartAt, "rollout" (default) uses full rollout update |
| deployment.reloader.stakater.com/pause-period | Go duration e.g. "30s" | Pause Deployment for this duration after reload |
| deployment.reloader.stakater.com/paused-at | RFC3339 timestamp | Set by Reloader to track pause start time; do not set manually |
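The two pause annotations combine into a resume time: paused-at plus pause-period. The following sketch shows that arithmetic with the standard library only; resumeAt is a hypothetical helper and is not taken from pause_deployment.go.

```go
package main

import (
	"fmt"
	"time"
)

// resumeAt returns the RFC3339 instant at which a paused Deployment should be
// resumed, given the pause-period value (a Go duration string) and the
// paused-at value (an RFC3339 timestamp).
func resumeAt(pausePeriod, pausedAt string) (string, error) {
	period, err := time.ParseDuration(pausePeriod)
	if err != nil {
		return "", fmt.Errorf("invalid pause-period %q: %w", pausePeriod, err)
	}
	start, err := time.Parse(time.RFC3339, pausedAt)
	if err != nil {
		return "", fmt.Errorf("invalid paused-at %q: %w", pausedAt, err)
	}
	return start.Add(period).UTC().Format(time.RFC3339), nil
}

func main() {
	resume, err := resumeAt("30s", "2024-01-01T12:00:00Z")
	fmt.Println(resume, err) // prints "2024-01-01T12:00:30Z <nil>"
}
```

Values such as "30" (no unit) are rejected by time.ParseDuration, so an annotation without a unit suffix would be an error under this sketch.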
Search/Match Pattern
The reloader.stakater.com/search annotation on a workload pairs with reloader.stakater.com/match: "true" on a ConfigMap or Secret. Any workload with search: true will reload when any match: true resource changes.
Global Flag Overrides
- --auto-reload-all: reload all workloads on any ConfigMap/Secret change; annotation not required.
- --resources-to-ignore=configMaps or =secrets: skip one type entirely.
- --ignored-workload-types=jobs,cronjobs: skip Job and CronJob reload.
- --namespaces-to-ignore: comma-separated namespace names to skip.
- --namespace-selector: only watch namespaces with matching labels.
- --resource-label-selector: only watch ConfigMaps/Secrets with matching labels.
Precedence Rules
- reloader.stakater.com/ignore: "true" wins everything; the workload is skipped.
- Exclude annotations override include annotations for specific named resources.
- Named annotations (.../reload) are checked before auto annotations.
- --auto-reload-all is the lowest-priority fallback (only applies if no annotation matches).
- Annotations are checked on both the workload and its pod template (pod template takes precedence in some paths; verify in pkg/common/common.go:ShouldReload()).
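Read top to bottom, the precedence list amounts to a short chain of early returns. The sketch below models that chain for ConfigMaps and Secrets only; it is not the real ShouldReload(). Several points are assumptions: patterns are treated as fully anchored regular expressions, exclude lists use the same matching as include lists, --auto-reload-all still requires the workload to reference the resource, and the search/match pair and pod-template annotations are left out.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const prefix = "reloader.stakater.com"

// matchesAny reports whether name matches one of the comma-separated patterns.
// Each pattern is tried as a fully anchored regular expression (an assumption);
// a pattern that fails to compile is compared literally.
func matchesAny(list, name string) bool {
	for _, p := range strings.Split(list, ",") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		re, err := regexp.Compile("^(?:" + p + ")$")
		if err != nil {
			if p == name {
				return true
			}
			continue
		}
		if re.MatchString(name) {
			return true
		}
	}
	return false
}

// shouldReload applies the precedence list in order. kind is "configmap" or
// "secret"; referenced says whether the workload mounts or imports the resource.
func shouldReload(ann map[string]string, kind, name string, referenced, autoReloadAll bool) bool {
	if ann[prefix+"/ignore"] == "true" {
		return false // ignore wins over everything
	}
	if matchesAny(ann[kind+"s.exclude."+prefix+"/reload"], name) {
		return false // exclude overrides include
	}
	if matchesAny(ann[kind+"."+prefix+"/reload"], name) {
		return true // named annotations come before auto
	}
	if !referenced {
		return false
	}
	if ann[prefix+"/auto"] == "true" || ann[kind+"."+prefix+"/auto"] == "true" {
		return true
	}
	return autoReloadAll // lowest-priority fallback
}

func main() {
	ann := map[string]string{
		"reloader.stakater.com/auto":                      "true",
		"configmaps.exclude.reloader.stakater.com/reload": "noisy-config",
	}
	fmt.Println(shouldReload(ann, "configmap", "app-config", true, false))
	fmt.Println(shouldReload(ann, "configmap", "noisy-config", true, false))
}
```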
Workload Support
| Workload | SupportsPatch | Update Mechanism | Key files |
|---|---|---|---|
| Deployment | Yes | JSON patch or full update | callbacks/rolling_upgrade.go, handler/upgrade.go:38 |
| StatefulSet | Yes | JSON patch or full update | callbacks/rolling_upgrade.go, handler/upgrade.go:109 |
| DaemonSet | Yes | JSON patch or full update | callbacks/rolling_upgrade.go, handler/upgrade.go:91 |
| CronJob | No | Creates a new Job from CronJob spec (adds cronjob.kubernetes.io/instantiate: manual) | callbacks.CreateJobFromCronjob, handler/upgrade.go:55 |
| Job | No | Deletes old Job, creates new one (strips ResourceVersion, UID, Status, controller labels) | callbacks.ReCreateJobFromjob, handler/upgrade.go:73 |
| Argo Rollout | No | Full update via Argo Rollouts client | callbacks.UpdateRollout, handler/upgrade.go:127; requires --is-Argo-Rollouts=true |
| DeploymentConfig | Yes | OpenShift DeploymentConfigs API | callbacks/rolling_upgrade.go; auto-detected by probing deploymentconfigs |
Reload flow per workload: doRollingUpgrade() → rollingUpgrade() per type → ItemsFunc lists workloads → ShouldReload() filters → invokeReloadStrategy() patches or updates → optional pause + metrics + alert.
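The env-vars branch of invokeReloadStrategy() in the flow above comes down to "set one variable on every container and report whether anything changed". A minimal sketch with stand-in types follows; EnvVar, Container, and setEnv are simplified, hypothetical versions of the Kubernetes types and Reloader helpers.

```go
package main

import "fmt"

// EnvVar and Container are pared-down stand-ins for the corev1 types.
type EnvVar struct {
	Name  string
	Value string
}

type Container struct {
	Name string
	Env  []EnvVar
}

// setEnv sets name=value on every container and reports whether any container
// was modified. A false result means no update or patch needs to be sent.
func setEnv(containers []Container, name, value string) bool {
	changed := false
	for i := range containers {
		found := false
		for j := range containers[i].Env {
			if containers[i].Env[j].Name != name {
				continue
			}
			found = true
			if containers[i].Env[j].Value != value {
				containers[i].Env[j].Value = value
				changed = true
			}
		}
		if !found {
			containers[i].Env = append(containers[i].Env, EnvVar{Name: name, Value: value})
			changed = true
		}
	}
	return changed
}

func main() {
	pod := []Container{{Name: "app"}, {Name: "sidecar"}}
	fmt.Println(setEnv(pod, "STAKATER_APP_CONFIG_CONFIGMAP", "3c9a..."))
	fmt.Println(setEnv(pod, "STAKATER_APP_CONFIG_CONFIGMAP", "3c9a..."))
}
```

Returning a "changed" flag keeps the operation idempotent: replaying the same event produces no second rollout.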
CSI Support
Enabled by: --enable-csi-integration
What is watched: SecretProviderClassPodStatus resources (from sigs.k8s.io/secrets-store-csi-driver). Resource name constant: constants.SecretProviderClassController = "secretproviderclasspodstatuses".
How it works:
- The CSI driver injects secrets into pods as volume mounts and tracks injection state via SecretProviderClassPodStatus objects.
- Reloader watches these objects for version changes.
- When a version change is detected, it computes a SHA of the object's IDs and versions.
- It then looks up the referenced SecretProviderClass and treats the event like a Secret update, triggering workload reloads.
Workload annotation: secretproviderclass.reloader.stakater.com/reload: "my-spc" or secretproviderclass.reloader.stakater.com/auto: "true".
Required: CSI CRDs must be installed in the cluster. Reloader auto-detects their presence at startup.
Env var postfix: STAKATER_{NAME}_SECRETPROVIDERCLASS.
Known limitations:
- Only works for secrets mounted as volumes via CSI, not env-var-based CSI injection.
- The link from SecretProviderClassPodStatus → workload is indirect; edge cases may be missed.
- Requires the CSI driver CRDs to be pre-installed; Reloader won't start CSI controller if CRDs are absent.
Build, Test, And Run Commands
Go version: go 1.26.2 (from go.mod)
| Purpose | Command |
|---|---|
| Run locally | go run ./main.go |
| Build binary | make build → go build -o Reloader |
| Unit tests | make test → go test -timeout 1800s -v ./... |
| Lint | make lint → golangci-lint run ./... (v2.6.1) |
| Docker build (single arch) | make build-image ARCH=amd64 |
| Docker push | make push |
| Full release (build+push+manifest) | make release ARCH=amd64 |
| Multi-arch release | make release-all |
| Generate k8s manifests | make k8s-manifests (Kustomize v5.3.0) |
| Load test (quick) | make loadtest-quick LOADTEST_OLD_IMAGE=... LOADTEST_NEW_IMAGE=... (runs S1, S4, S6) |
| Load test (full) | make loadtest-full LOADTEST_OLD_IMAGE=... LOADTEST_NEW_IMAGE=... |
| Load test (custom) | make loadtest LOADTEST_SCENARIOS=S1,S3 LOADTEST_DURATION=120 |
Docker image: ghcr.io/stakater/reloader — multi-arch (amd64, arm64, arm), distroless nonroot base.
Helm chart: deployments/kubernetes/chart/reloader/ — install via Helm or kubectl apply -f deployments/kubernetes/reloader.yaml.
Coding Conventions
Package boundaries: Each internal/pkg/<name> package has a single clear responsibility. Cross-package access goes through exported types/functions only.
Error handling: logrus.Errorf(...) for non-fatal, logrus.Fatalf(...) for startup failures. Errors are returned up the call stack and logged at the point of action, not at every layer. Retry uses k8s.io/client-go/util/retry.RetryOnConflict.
Logging: logrus with structured fields. Format controlled by --log-format=json flag. Log level controlled by --log-level. Messages follow the pattern: "Changes detected in '%s' of type '%s' in namespace '%s'".
Kubernetes client patterns: All k8s operations go through the kube.Clients struct. Use context.TODO() for context (no request-scoped contexts). List/watch via informers, not polling.
Callback pattern: Workload-specific logic is encapsulated in callbacks.RollingUpgradeFuncs structs returned by handler.Get*RollingUpgradeFuncs(). Adding a new workload type = add a new RollingUpgradeFuncs factory function and call it in doRollingUpgrade().
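To make the callback pattern concrete, here is a simplified model of how a per-workload function table might be driven. Only the names RollingUpgradeFuncs, ItemsFunc, and SupportsPatch come from this document; the Workload type, UpdateFunc, all signatures, and the in-memory factory are assumptions made so the example runs without a cluster.

```go
package main

import "fmt"

// Workload is a stand-in for a Kubernetes object.
type Workload struct {
	Namespace   string
	Name        string
	Annotations map[string]string
}

// RollingUpgradeFuncs is a pared-down version of the struct named above.
// Field signatures here are assumptions, not the real definitions.
type RollingUpgradeFuncs struct {
	ResourceType  string
	SupportsPatch bool
	ItemsFunc     func(namespace string) []Workload
	UpdateFunc    func(w Workload) error
}

// GetInMemoryRollingUpgradeFuncs is a hypothetical factory in the style of
// GetXxxRollingUpgradeFuncs(). It lists workloads from a slice and records the
// names it was asked to update.
func GetInMemoryRollingUpgradeFuncs(store []Workload, updated *[]string) RollingUpgradeFuncs {
	return RollingUpgradeFuncs{
		ResourceType:  "InMemory",
		SupportsPatch: false,
		ItemsFunc: func(namespace string) []Workload {
			var out []Workload
			for _, w := range store {
				if w.Namespace == namespace {
					out = append(out, w)
				}
			}
			return out
		},
		UpdateFunc: func(w Workload) error {
			*updated = append(*updated, w.Name)
			return nil
		},
	}
}

// rollingUpgrade lists workloads, filters them, and updates the ones selected.
// It returns how many workloads were updated.
func rollingUpgrade(namespace string, funcs RollingUpgradeFuncs, shouldReload func(Workload) bool) (int, error) {
	reloaded := 0
	for _, w := range funcs.ItemsFunc(namespace) {
		if !shouldReload(w) {
			continue
		}
		if err := funcs.UpdateFunc(w); err != nil {
			return reloaded, fmt.Errorf("update %s %s/%s: %w", funcs.ResourceType, w.Namespace, w.Name, err)
		}
		reloaded++
	}
	return reloaded, nil
}

func main() {
	auto := map[string]string{"reloader.stakater.com/auto": "true"}
	store := []Workload{
		{Namespace: "default", Name: "web", Annotations: auto},
		{Namespace: "default", Name: "batch"},
		{Namespace: "other", Name: "api", Annotations: auto},
	}
	var updated []string
	funcs := GetInMemoryRollingUpgradeFuncs(store, &updated)
	n, err := rollingUpgrade("default", funcs, func(w Workload) bool {
		return w.Annotations["reloader.stakater.com/auto"] == "true"
	})
	fmt.Println(n, updated, err)
}
```

Because the loop only talks to the function table, supporting another workload type means supplying another table rather than touching the loop.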
Test style: Standard testing.T, testify/assert. Fake k8s objects via testutil/kube.go. Tests live alongside source in the same package. Large integration-style tests in handler/upgrade_test.go.
Naming patterns:
- Annotation variables: XxxUpdateOnChangeAnnotation, XxxReloaderAutoAnnotation
- Callback funcs: GetXxxItem, GetXxxItems, UpdateXxx, PatchXxx
- Handler factories: GetXxxRollingUpgradeFuncs()
Adding new behavior: Add flag to options/flags.go + common.ReloaderOptions struct → wire in cmd/reloader.go → implement logic in handler/ or callbacks/ → add metrics recording → write tests in *_test.go.
Gotchas And Risks
Duplicate reloads: If a workload references multiple ConfigMaps/Secrets and all change simultaneously, each change event fires a separate reload. No deduplication exists within a reconcile window. This can cause unnecessary rolling restarts.
Controller init guard: secretControllerInitialized and configmapControllerInitialized booleans in controller/controller.go prevent processing Add events during the initial list/sync (to avoid reloading everything on startup). If --sync-after-restart is set, both are pre-set to true, bypassing the guard. Be careful when this interacts with --reload-on-create.
Namespace filtering: --namespaces-to-ignore does a name match; --namespace-selector watches namespaces by label and caches them in selectedNamespacesCache. The cache is updated on Namespace Add/Update/Delete events. A race between cache population and first ConfigMap event could cause missed reloads on startup in label-selected deployments.
RBAC: Reloader requires get/list/watch on secrets and configmaps, and get/list/watch/update/patch on all workload types it manages. Missing RBAC silently causes no reloads (not an error — just empty lists). Check ClusterRole in deployments/kubernetes/chart/reloader/templates/.
GitOps drift: If a GitOps tool (Flux, ArgoCD) manages the same Deployments, changes made by Reloader can be detected as drift and reverted. Both strategies mutate the pod template (injected env vars or pod-template annotations), so neither is invisible to a diff. The Reloader documentation recommends --reload-strategy=annotations for ArgoCD-style setups because annotation changes are less likely to be flagged as drift than injected env vars; use the default env-vars strategy with care there.
Annotation precedence edge case: Annotations are checked first on the workload object, then on the pod template. If both are set to conflicting values, the behavior depends on which path ShouldReload() hits first. Verify in pkg/common/common.go.
CronJob/Job destructive reload: Job recreation deletes the old Job. Any in-flight pod from that Job will be terminated. This is intentional but surprising. There is no protection for long-running jobs.
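The "strip and recreate" step for Jobs can be pictured as below. This is a sketch with a stand-in Job type, not callbacks.ReCreateJobFromjob. The exact set of controller labels Reloader removes is an assumption; the four keys used are the labels the Kubernetes Job controller is known to add.

```go
package main

import "fmt"

// Job is a pared-down stand-in for batchv1.Job.
type Job struct {
	Name            string
	ResourceVersion string
	UID             string
	Status          string
	Labels          map[string]string
}

// controllerLabels lists labels added by the Job controller. Which of these
// Reloader actually strips is an assumption in this sketch.
var controllerLabels = map[string]bool{
	"controller-uid":                     true,
	"batch.kubernetes.io/controller-uid": true,
	"job-name":                           true,
	"batch.kubernetes.io/job-name":       true,
}

// cleanForRecreate returns a copy of old that is safe to submit as a new
// object: server-assigned fields are cleared and controller labels removed.
// The input is left untouched.
func cleanForRecreate(old Job) Job {
	fresh := old
	fresh.ResourceVersion = ""
	fresh.UID = ""
	fresh.Status = ""
	fresh.Labels = make(map[string]string, len(old.Labels))
	for k, v := range old.Labels {
		if controllerLabels[k] {
			continue
		}
		fresh.Labels[k] = v
	}
	return fresh
}

func main() {
	old := Job{
		Name:            "migrate",
		ResourceVersion: "42",
		UID:             "abc",
		Status:          "Complete",
		Labels:          map[string]string{"app": "migrate", "controller-uid": "abc"},
	}
	fmt.Printf("%+v\n", cleanForRecreate(old))
}
```

Nothing in this step checks whether the old Job still has running pods, which is the gap described above.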
OpenShift DeploymentConfig: Auto-detected by probing for the deploymentconfigs resource. If the probe fails at startup, OpenShift support is silently disabled. Check pkg/kube/client.go.
Argo Rollouts: Must be explicitly enabled via --is-Argo-Rollouts=true. Without it, Rollout objects are never listed. The SupportsPatch=false means full object updates are used — be aware of potential conflicts with Argo's own controller.
CSI rotation behavior: SecretProviderClassPodStatus is updated by the CSI driver when secrets rotate. Reloader reacts to those updates. However, if the CSI driver updates the status in a way that doesn't change the versions Reloader tracks, the reload will be missed.
Backward compatibility: Annotation names are configurable, so changing defaults would break existing clusters. Never change default annotation values without a migration path.
Tests to update for risky changes: handler/upgrade_test.go (large suite covering all workload types), controller/controller_test.go (event handling), pkg/common/common_test.go (reload decision logic).
Open Questions
- Exact ShouldReload() precedence: The code in pkg/common/common.go checks annotations in a specific order. The exact tie-breaking when both workload-level and pod-template-level annotations are set should be verified by reading that function fully before making annotation behavior changes.
- CSI → workload mapping: How exactly does Reloader map a SecretProviderClassPodStatus change back to workloads? Is it via the SecretProviderClass name matching an annotation on the workload, or via volume reference scanning? Needs confirmation before adding CSI-related features.
- ContainerPatchPathFunc field: RollingUpgradeFuncs has a ContainerPatchPathFunc field, but it is not documented; it is unclear if/how it differs from ContainersFunc in patch scenarios.
- Webhook vs alert: --webhook-url replaces reloading with a POST request. ALERT_WEBHOOK_URL env var sends an alert after reloading. These are two different mechanisms; the naming is confusing and easy to conflate.
- Load test scenarios S7–S13: Only S1, S4, and S6 are confirmed from CI. The behavior and coverage of the remaining scenarios is unknown without reading test/loadtest/ in full.
- SyncAfterRestart semantics: Flag docs say it "syncs add events after restart" but only if ReloadOnCreate is also true. The interaction between these two flags in HA mode (where controllers restart on leader change) needs verification.
Important Files
| File | Description |
|---|---|
| internal/pkg/cmd/reloader.go | startReloader(): main wiring of clients, controllers, HA, and HTTP server |
| internal/pkg/handler/upgrade.go | doRollingUpgrade() + all Get*RollingUpgradeFuncs() factories |
| internal/pkg/callbacks/rolling_upgrade.go | All workload-specific get/update/patch implementations |
| pkg/common/common.go | ShouldReload(): the annotation decision tree |
| internal/pkg/options/flags.go | Every configurable option with defaults |
| internal/pkg/controller/controller.go | Informer setup, queue, event handlers |
| pkg/kube/client.go | Multi-client initialization and OpenShift/CSI detection |
| internal/pkg/handler/pause_deployment.go | Pause/resume deployment logic with timers |
| internal/pkg/leadership/leadership.go | HA leader election |
| internal/pkg/metrics/prometheus.go | All Prometheus collector definitions |
| internal/pkg/alerts/alert.go | Slack/Teams/GChat alerting |
| internal/pkg/constants/constants.go | Global constants (env var prefixes, annotation prefix, strategy names) |
| deployments/kubernetes/chart/reloader/values.yaml | Helm chart defaults; source of truth for production config |
| handler/upgrade_test.go | Largest test suite; must be updated for any reload logic change |
| Makefile | All build/test/release/loadtest commands |