Add claude init

Signed-off-by: faizanahmad055 <faizan.ahmad55@outlook.com>
faizanahmad055
2026-05-11 01:17:39 +02:00
parent 9294193f86
commit c83699cfd0

CLAUDE.md Normal file

@@ -0,0 +1,300 @@
# Stakater Reloader Project Memory
## Project Purpose
Reloader is a Kubernetes operator that automatically triggers rolling restarts of workloads when the ConfigMaps or Secrets they reference are updated. Without it, Kubernetes does not restart pods when configuration changes; cluster administrators must restart them manually or rely on GitOps pipelines.
**What it watches**: ConfigMaps, Secrets, Namespaces, and (optionally) `SecretProviderClassPodStatus` (CSI-mounted secrets).
**Workload types it can reload**: Deployment, StatefulSet, DaemonSet, CronJob, Job, Argo Rollout, and OpenShift DeploymentConfig.
**How restarts are triggered**: Two strategies (selected via `--reload-strategy`):
1. **env-vars** (default) — injects an environment variable (`STAKATER_{NAME}_{TYPE}`) into every container with the SHA1 hash of the resource's data. A change in data changes the env var value, causing Kubernetes to restart pods.
2. **annotations** — writes the SHA1 hash into the pod template's annotations, which also forces a rollout.
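Both strategies depend on a hash that is stable for unchanged data and different for changed data. The following is a minimal sketch of that idea, not the project's actual code (which lives in `internal/pkg/crypto/` and `internal/pkg/util/`). The helper names `hashData` and `envVarName`, the key/value serialization, and the name normalization rules are all assumptions made for illustration.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// hashData returns a SHA1 hex digest over the key/value pairs in sorted key
// order, so the result does not depend on Go's map iteration order.
// The "key=value;" serialization is an assumption for this sketch.
func hashData(data map[string]string) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha1.New()
	for _, k := range keys {
		h.Write([]byte(k + "=" + data[k] + ";"))
	}
	return hex.EncodeToString(h.Sum(nil))
}

// envVarName builds a STAKATER_{NAME}_{TYPE} style name. The normalization
// (upper-casing, replacing '-' and '.' with '_') is an assumption.
func envVarName(resourceName, resourceType string) string {
	r := strings.NewReplacer("-", "_", ".", "_")
	return "STAKATER_" + strings.ToUpper(r.Replace(resourceName)) + "_" + strings.ToUpper(resourceType)
}

func main() {
	before := map[string]string{"log.level": "info"}
	after := map[string]string{"log.level": "debug"}
	fmt.Println(envVarName("app-config", "configmap")) // STAKATER_APP_CONFIG_CONFIGMAP
	fmt.Println(hashData(before) != hashData(after))   // true
}
```

Because the hash ends up in the pod template (as an env var value or an annotation), any change to it changes the template, and that is what makes the workload controller roll the pods.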
**The core problem it solves**: ConfigMaps and Secrets are decoupled from pod lifecycle in Kubernetes. Applications reading config at startup see stale data after a config update unless pods are restarted. Reloader closes that gap automatically and selectively.
**Potential improvements observed**:
- **Duplicate reload suppression**: If a workload references both a ConfigMap and a Secret that are updated in the same controller reconcile cycle, it may get reloaded twice. Could be solved with a per-workload debounce map keyed by namespace/name/resourceVersion, flushed after a short TTL.
- **CronJob/Job reload is destructive**: Jobs are deleted and recreated on change, which loses run history. Could instead only annotate the CronJob template without spawning a new Job.
- **No per-resource reload rate limiting**: A rapid-fire ConfigMap update (e.g., from a CI pipeline) can trigger many restarts. A cooldown window per resource would help.
- **CSI integration gap**: CSI volumes are watched at the `SecretProviderClassPodStatus` level but the link back to the workload is indirect and may miss edge cases. Needs a direct map from SecretProviderClass → workloads that mount it.
---
## Repo Map
| Path | Owns | Inspect when |
|---|---|---|
| `main.go` | Entry point, delegates to `app.Run()` | Never needs changes |
| `internal/pkg/app/` | `Run()` bootstrap, Cobra command wiring | Startup sequence changes |
| `internal/pkg/cmd/` | CLI flags parsing, `startReloader()`, controller/HA wiring | Adding new flags or startup behavior |
| `internal/pkg/controller/` | Informer/queue per resource type, event handlers (Add/Update/Delete) | Watching new resource types, queue tuning |
| `internal/pkg/handler/` | Per-event handlers (create, update, delete), `doRollingUpgrade()`, pause deployment | Core reload logic changes |
| `internal/pkg/callbacks/` | Workload-specific get/list/update/patch functions, `RollingUpgradeFuncs` struct | Adding new workload types |
| `internal/pkg/options/` | All CLI flag variables, defaults, `ArgoRolloutStrategy` type | Adding or renaming flags |
| `internal/pkg/constants/` | Constants: env var postfixes, annotation prefix, strategy names, HA lock name | Renaming global identifiers |
| `internal/pkg/metrics/` | Prometheus `Collectors` struct, all metric registration and recording helpers | Adding metrics |
| `internal/pkg/alerts/` | Slack/Teams/GChat/raw webhook alerting, env var config | Alert sink changes |
| `internal/pkg/util/` | SHA generation via `crypto/sha.go`, env var name conversion, namespace/label utilities | Utility/hash changes |
| `internal/pkg/crypto/` | `GenerateSHA(data)` — SHA1 hex digest | Hash algorithm changes |
| `internal/pkg/leadership/` | Leader election via Kubernetes Lease, HA stop/start of controllers | HA behavior changes |
| `internal/pkg/testutil/` | Fake Kubernetes objects for unit tests | Writing new tests |
| `pkg/common/` | `ReloadCheckResult`, `ReloaderOptions`, `ShouldReload()` logic, `Config` struct | Reload decision logic, annotation precedence |
| `pkg/kube/` | `Clients` struct (k8s + OpenShift + Argo + CSI), `GetKubernetesClient()`, `ResourceMap` | Client initialization, new CRD clients |
| `deployments/` | Helm chart (`deployments/kubernetes/chart/reloader/`), Kustomize manifests | Helm values, RBAC, deployment config |
| `docs/` | User-facing annotation documentation, architecture notes | Writing docs or confirming annotation behavior |
| `scripts/` | Shell scripts used by CI and Makefile | Build/release pipeline |
| `test/loadtest/` | Load test CLI (`cmd/loadtest`), 13 scenarios (S1-S13), Kind cluster setup | Performance testing, regression benchmarks |
| `.github/` | CI workflows: lint, test, Kind e2e, multi-arch Docker build, release | CI changes |
---
## Core Runtime Flow
**1. Entry**: `main.go:10` calls `app.Run()`.
**2. CLI Init**: `internal/pkg/app/app.go` calls `cmd.NewReloaderCommand()`, which registers all Cobra flags from `options/flags.go` and runs `startReloader()`.
**3. Client Setup**: `pkg/kube/client.go` builds `kube.Clients` with:
- `kubernetes.Interface` — standard k8s client
- `appsclient.Interface` — OpenShift client (auto-detected by probing `deploymentconfigs`)
- `argorollout.Interface` — if `--is-Argo-Rollouts=true`
- `csiclient.Interface` — if `--enable-csi-integration`
**4. Controller Creation**: `startReloader()` iterates `kube.ResourceMap` (configmaps, secrets, namespaces, and optionally secretproviderclasspodstatuses) and calls `controller.NewController()` for each resource in each watched namespace.
**5. Informer/Queue**: `controller.NewController()` does the following:
- Creates a `cache.NewFilteredListWatchFromClient` with label/field selectors.
- Registers `Add`, `Update`, `Delete` event handlers.
- Creates a `workqueue.TypedRateLimitingQueue` for async processing.
**6. Event Detection**:
- `Add` — enqueues only if `ReloadOnCreate` is enabled (skips during initial sync unless `SyncAfterRestart`).
- `Update` — compares SHA of old vs new object data; enqueues only on real changes.
- `Delete` — enqueues only if `ReloadOnDelete` is enabled.
- Namespace events update `selectedNamespacesCache` for namespace-selector filtering.
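The enqueue rules for Add, Update, and Delete events can be summarized as a single predicate. This is a sketch of the rules as described above, not the controller's real code: `eventKind`, `options`, and `shouldEnqueue` are hypothetical names, and the hashes are assumed to be precomputed digests of the old and new object data.

```go
package main

import "fmt"

type eventKind int

const (
	addEvent eventKind = iota
	updateEvent
	deleteEvent
)

// options mirrors only the flags that affect event filtering.
type options struct {
	ReloadOnCreate   bool
	ReloadOnDelete   bool
	SyncAfterRestart bool
}

// shouldEnqueue decides whether an informer event is worth queueing.
// initialSyncDone is false while the informer replays existing objects.
func shouldEnqueue(kind eventKind, opts options, initialSyncDone bool, oldHash, newHash string) bool {
	switch kind {
	case addEvent:
		// Adds seen during the initial list are skipped unless SyncAfterRestart is set.
		return opts.ReloadOnCreate && (initialSyncDone || opts.SyncAfterRestart)
	case updateEvent:
		// Metadata-only updates leave the data hash unchanged and are ignored.
		return oldHash != newHash
	case deleteEvent:
		return opts.ReloadOnDelete
	}
	return false
}

func main() {
	opts := options{ReloadOnCreate: true}
	fmt.Println(shouldEnqueue(addEvent, opts, false, "", ""))       // false: still in initial sync
	fmt.Println(shouldEnqueue(updateEvent, opts, true, "a1", "b2")) // true: data changed
	fmt.Println(shouldEnqueue(deleteEvent, opts, true, "", ""))     // false: ReloadOnDelete is off
}
```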
**7. Handler Dispatch** — The queue worker calls `handler.Handle()` on the dequeued item. Three handler types:
- `ResourceCreatedHandler` (`create.go`) — fires `doRollingUpgrade` or sends webhook.
- `ResourceUpdatedHandler` (`update.go`) — fires `doRollingUpgrade` or sends webhook.
- `ResourceDeleteHandler` (`delete.go`) — calls `invokeDeleteStrategy` (removes env vars or clears annotation).
**8. Workload Discovery**: `doRollingUpgrade()` (`upgrade.go:181`) calls `rollingUpgrade()` for each workload type. For each type, `ItemsFunc` lists all workloads in the namespace, then `pkg/common.ShouldReload()` checks annotations to decide which ones need reloading.
**9. Reload Execution**: `invokeReloadStrategy()` either:
- **env-vars**: mutates container env vars; uses JSON patch if `SupportsPatch=true`, full update otherwise.
- **annotations**: writes SHA to pod template annotations; same patch/update split.
**10. Post-reload** — optionally pauses the Deployment via `pause_deployment.go`, records Kubernetes Events via `recorder`, updates Prometheus metrics, sends alert webhooks.
**HA Mode**: if `--enable-ha`, `internal/pkg/leadership/` runs Kubernetes Lease-based leader election. Only the leader runs controllers; losing leadership stops them and marks the pod unhealthy.
**HTTP Server**: port `:9090` serves `/metrics` (Prometheus) and liveness/readiness probes.
---
## Reload Behavior And Annotations
All annotation names are configurable via CLI flags; the values below are defaults.
### Trigger Annotations (on workloads)
| Annotation | Value | Behavior |
|---|---|---|
| `reloader.stakater.com/auto` | `"true"` | Reload on change to **any** ConfigMap or Secret referenced by the workload (via envFrom, env valueFrom, or volumes) |
| `configmap.reloader.stakater.com/auto` | `"true"` | Reload on change to **any referenced ConfigMap** only |
| `secret.reloader.stakater.com/auto` | `"true"` | Reload on change to **any referenced Secret** only |
| `secretproviderclass.reloader.stakater.com/auto` | `"true"` | Reload on change to **any referenced SecretProviderClass** only |
| `configmap.reloader.stakater.com/reload` | `"cm1,cm2"` | Reload only when the **named ConfigMaps** change (regex supported) |
| `secret.reloader.stakater.com/reload` | `"sec1,sec2"` | Reload only when the **named Secrets** change (regex supported) |
| `secretproviderclass.reloader.stakater.com/reload` | `"spc1"` | Reload only when the **named SecretProviderClass** changes |
| `reloader.stakater.com/search` | `"true"` | Reload when any ConfigMap/Secret tagged with `reloader.stakater.com/match: "true"` changes |
### Exclude Annotations (on workloads)
| Annotation | Value | Behavior |
|---|---|---|
| `reloader.stakater.com/ignore` | `"true"` | Skip this workload entirely |
| `configmaps.exclude.reloader.stakater.com/reload` | `"cm1,cm2"` | Exclude these named ConfigMaps from triggering reload |
| `secrets.exclude.reloader.stakater.com/reload` | `"sec1,sec2"` | Exclude these named Secrets |
| `secretproviderclasses.exclude.reloader.stakater.com/reload` | `"spc1"` | Exclude these named SecretProviderClasses |
### Behavior Annotations (on workloads)
| Annotation | Value | Behavior |
|---|---|---|
| `reloader.stakater.com/rollout-strategy` | `"restart"` or `"rollout"` | For Argo Rollouts: `"restart"` uses restartAt, `"rollout"` (default) uses full rollout update |
| `deployment.reloader.stakater.com/pause-period` | Go duration e.g. `"30s"` | Pause Deployment for this duration after reload |
| `deployment.reloader.stakater.com/paused-at` | RFC3339 timestamp | Set by Reloader to track pause start time; do not set manually |
### Search/Match Pattern
The `reloader.stakater.com/search` annotation on a workload pairs with `reloader.stakater.com/match: "true"` on a ConfigMap or Secret. Any workload with `search: true` will reload when any `match: true` resource changes.
### Global Flag Overrides
- `--auto-reload-all` — reload all workloads on any ConfigMap/Secret change; annotation not required.
- `--resources-to-ignore=configMaps` or `=secrets` — skip one type entirely.
- `--ignored-workload-types=jobs,cronjobs` — skip Job and CronJob reload.
- `--namespaces-to-ignore` — comma-separated namespace names to skip.
- `--namespace-selector` — only watch namespaces with matching labels.
- `--resource-label-selector` — only watch ConfigMaps/Secrets with matching labels.
### Precedence Rules
1. `reloader.stakater.com/ignore: "true"` overrides everything else: the workload is skipped.
2. Exclude annotations override include annotations for specific named resources.
3. Named annotations (`.../reload`) are checked before auto annotations.
4. `--auto-reload-all` is the lowest-priority fallback (only applies if no annotation matches).
5. Annotations are checked on both the workload and its pod template (pod template takes precedence in some paths — verify in `pkg/common/common.go:ShouldReload()`).
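Rules 1 to 4 can be read as an ordered decision function. The sketch below encodes them for a changed ConfigMap only. It is an illustration of the rules as listed, not a copy of `ShouldReload()`: the function names are hypothetical, pod-template annotations (rule 5) are ignored, and two points are assumptions, namely that list entries are treated as anchored regular expressions and that the auto and `--auto-reload-all` paths only apply when the workload actually references the ConfigMap.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchesAny reports whether name fully matches one of the comma-separated
// patterns in list. Anchored regex matching is an assumption of this sketch.
func matchesAny(list, name string) bool {
	for _, p := range strings.Split(list, ",") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		if re, err := regexp.Compile("^(?:" + p + ")$"); err == nil && re.MatchString(name) {
			return true
		}
	}
	return false
}

// shouldReloadConfigMap applies the precedence rules to a changed ConfigMap.
// ann holds the workload annotations, referenced says whether the workload
// uses the ConfigMap, and autoReloadAll mirrors --auto-reload-all.
func shouldReloadConfigMap(ann map[string]string, cmName string, referenced, autoReloadAll bool) bool {
	if ann["reloader.stakater.com/ignore"] == "true" {
		return false // rule 1: ignore overrides everything
	}
	if matchesAny(ann["configmaps.exclude.reloader.stakater.com/reload"], cmName) {
		return false // rule 2: exclude beats include
	}
	if matchesAny(ann["configmap.reloader.stakater.com/reload"], cmName) {
		return true // rule 3: named annotations are checked before auto
	}
	auto := ann["reloader.stakater.com/auto"] == "true" ||
		ann["configmap.reloader.stakater.com/auto"] == "true"
	// rule 4: --auto-reload-all is the fallback when no annotation matched
	return referenced && (auto || autoReloadAll)
}

func main() {
	ann := map[string]string{
		"configmap.reloader.stakater.com/reload":          "app-.*",
		"configmaps.exclude.reloader.stakater.com/reload": "app-debug",
	}
	fmt.Println(shouldReloadConfigMap(ann, "app-settings", false, false)) // true
	fmt.Println(shouldReloadConfigMap(ann, "app-debug", false, false))    // false
}
```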
---
## Workload Support
| Workload | SupportsPatch | Update Mechanism | Key files |
|---|---|---|---|
| **Deployment** | Yes | JSON patch or full update | `callbacks/rolling_upgrade.go`, `handler/upgrade.go:38` |
| **StatefulSet** | Yes | JSON patch or full update | `callbacks/rolling_upgrade.go`, `handler/upgrade.go:109` |
| **DaemonSet** | Yes | JSON patch or full update | `callbacks/rolling_upgrade.go`, `handler/upgrade.go:91` |
| **CronJob** | No | Creates a new Job from CronJob spec (adds `cronjob.kubernetes.io/instantiate: manual`) | `callbacks.CreateJobFromCronjob`, `handler/upgrade.go:55` |
| **Job** | No | Deletes old Job, creates new one (strips ResourceVersion, UID, Status, controller labels) | `callbacks.ReCreateJobFromjob`, `handler/upgrade.go:73` |
| **Argo Rollout** | No | Full update via Argo Rollouts client | `callbacks.UpdateRollout`, `handler/upgrade.go:127`; requires `--is-Argo-Rollouts=true` |
| **DeploymentConfig** | Yes | OpenShift DeploymentConfigs API | `callbacks/rolling_upgrade.go`; auto-detected by probing `deploymentconfigs` |
**Reload flow per workload**: `doRollingUpgrade()` → `rollingUpgrade()` per type → `ItemsFunc` lists workloads → `ShouldReload()` filters → `invokeReloadStrategy()` patches or updates → optional pause + metrics + alert.
---
## CSI Support
**Enabled by**: `--enable-csi-integration`
**What is watched**: `SecretProviderClassPodStatus` resources (from `sigs.k8s.io/secrets-store-csi-driver`). Resource name constant: `constants.SecretProviderClassController = "secretproviderclasspodstatuses"`.
**How it works**:
1. The CSI driver injects secrets into pods as volume mounts and tracks injection state via `SecretProviderClassPodStatus` objects.
2. Reloader watches these objects for version changes.
3. When a version change is detected, it computes a SHA of the object's IDs and versions.
4. It then looks up the referenced `SecretProviderClass` and treats the event like a Secret update, triggering workload reloads.
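Step 3 can be pictured as hashing the list of object IDs and versions reported in the status. A minimal sketch under stated assumptions: the local `object` type stands in for the CSI driver's status entries, and the `id=version` serialization and the sorting are choices made here, not confirmed details of Reloader's implementation.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// object is a stand-in for one entry in a SecretProviderClassPodStatus.
type object struct {
	ID      string
	Version string
}

// objectsHash returns a SHA1 hex digest over all ID/version pairs. Pairs are
// sorted first so that reordering in the status does not look like a change.
func objectsHash(objs []object) string {
	parts := make([]string, 0, len(objs))
	for _, o := range objs {
		parts = append(parts, o.ID+"="+o.Version)
	}
	sort.Strings(parts)
	sum := sha1.Sum([]byte(strings.Join(parts, ";")))
	return hex.EncodeToString(sum[:])
}

func main() {
	v1 := []object{{ID: "secret/db-password", Version: "1"}}
	v2 := []object{{ID: "secret/db-password", Version: "2"}}
	fmt.Println(objectsHash(v1) != objectsHash(v2)) // true: a rotation changes the hash
}
```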
**Workload annotation**: `secretproviderclass.reloader.stakater.com/reload: "my-spc"` or `secretproviderclass.reloader.stakater.com/auto: "true"`.
**Required**: CSI CRDs must be installed in the cluster. Reloader auto-detects their presence at startup.
**Env var postfix**: `STAKATER_{NAME}_SECRETPROVIDERCLASS`.
**Known limitations**:
- Only works for secrets mounted as volumes via CSI, not env-var-based CSI injection.
- The link from `SecretProviderClassPodStatus` → workload is indirect; edge cases may be missed.
- Requires the CSI driver CRDs to be pre-installed; Reloader won't start CSI controller if CRDs are absent.
---
## Build, Test, And Run Commands
**Go version**: `go 1.26.2` (from `go.mod`)
| Purpose | Command |
|---|---|
| Run locally | `go run ./main.go` |
| Build binary | `make build` → `go build -o Reloader` |
| Unit tests | `make test` → `go test -timeout 1800s -v ./...` |
| Lint | `make lint` → `golangci-lint run ./...` (v2.6.1) |
| Docker build (single arch) | `make build-image ARCH=amd64` |
| Docker push | `make push` |
| Full release (build+push+manifest) | `make release ARCH=amd64` |
| Multi-arch release | `make release-all` |
| Generate k8s manifests | `make k8s-manifests` (Kustomize v5.3.0) |
| Load test (quick) | `make loadtest-quick LOADTEST_OLD_IMAGE=... LOADTEST_NEW_IMAGE=...` (runs S1, S4, S6) |
| Load test (full) | `make loadtest-full LOADTEST_OLD_IMAGE=... LOADTEST_NEW_IMAGE=...` |
| Load test (custom) | `make loadtest LOADTEST_SCENARIOS=S1,S3 LOADTEST_DURATION=120` |
**Docker image**: `ghcr.io/stakater/reloader` — multi-arch (amd64, arm64, arm), distroless nonroot base.
**Helm chart**: `deployments/kubernetes/chart/reloader/` — install via Helm or `kubectl apply -f deployments/kubernetes/reloader.yaml`.
---
## Coding Conventions
**Package boundaries**: Each `internal/pkg/<name>` package has a single clear responsibility. Cross-package access goes through exported types/functions only.
**Error handling**: `logrus.Errorf(...)` for non-fatal, `logrus.Fatalf(...)` for startup failures. Errors are returned up the call stack and logged at the point of action, not at every layer. Retry uses `k8s.io/client-go/util/retry.RetryOnConflict`.
**Logging**: `logrus` with structured fields. Format controlled by `--log-format=json` flag. Log level controlled by `--log-level`. Messages follow the pattern: `"Changes detected in '%s' of type '%s' in namespace '%s'"`.
**Kubernetes client patterns**: All k8s operations go through the `kube.Clients` struct. Use `context.TODO()` for context (no request-scoped contexts). List/watch via informers, not polling.
**Callback pattern**: Workload-specific logic is encapsulated in `callbacks.RollingUpgradeFuncs` structs returned by `handler.Get*RollingUpgradeFuncs()`. Adding a new workload type = add a new `RollingUpgradeFuncs` factory function and call it in `doRollingUpgrade()`.
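The callback pattern amounts to a struct of function fields that a generic loop drives. The sketch below is a simplified, hypothetical analogue: `workload`, `rollingUpgradeFuncs`, and `reloadAll` are illustrative names, and the real `callbacks.RollingUpgradeFuncs` has more fields and operates on Kubernetes objects through `kube.Clients`.

```go
package main

import "fmt"

// workload is a stand-in for a Kubernetes workload object.
type workload struct {
	Namespace, Name string
	Env             map[string]string
}

// rollingUpgradeFuncs bundles the per-workload-type behavior as function
// fields, so the reload loop stays generic. Hypothetical, simplified type.
type rollingUpgradeFuncs struct {
	ResourceType  string
	SupportsPatch bool
	ItemsFunc     func(namespace string) []*workload
	UpdateFunc    func(w *workload) error
	PatchFunc     func(w *workload) error
}

// reloadAll sets envName=hash on every listed workload whose value differs
// and persists the change via patch (when supported) or full update.
func reloadAll(funcs rollingUpgradeFuncs, namespace, envName, hash string) (int, error) {
	apply := funcs.UpdateFunc
	if funcs.SupportsPatch && funcs.PatchFunc != nil {
		apply = funcs.PatchFunc
	}
	reloaded := 0
	for _, w := range funcs.ItemsFunc(namespace) {
		if w.Env[envName] == hash {
			continue // already up to date, no rollout needed
		}
		if w.Env == nil {
			w.Env = make(map[string]string)
		}
		w.Env[envName] = hash
		if err := apply(w); err != nil {
			return reloaded, fmt.Errorf("update %s %s/%s: %w", funcs.ResourceType, w.Namespace, w.Name, err)
		}
		reloaded++
	}
	return reloaded, nil
}

func main() {
	store := []*workload{{Namespace: "default", Name: "web"}}
	funcs := rollingUpgradeFuncs{
		ResourceType: "Deployment",
		ItemsFunc:    func(string) []*workload { return store },
		UpdateFunc:   func(*workload) error { return nil },
	}
	n, err := reloadAll(funcs, "default", "STAKATER_APP_CONFIG_CONFIGMAP", "abc123")
	fmt.Println(n, err) // 1 <nil>
}
```

Supporting a new workload type then means supplying one more set of functions, without touching the loop.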
**Test style**: Standard `testing.T`, `testify/assert`. Fake k8s objects via `testutil/kube.go`. Tests live alongside source in the same package. Large integration-style tests in `handler/upgrade_test.go`.
**Naming patterns**:
- Annotation variables: `XxxUpdateOnChangeAnnotation`, `XxxReloaderAutoAnnotation`
- Callback funcs: `GetXxxItem`, `GetXxxItems`, `UpdateXxx`, `PatchXxx`
- Handler factories: `GetXxxRollingUpgradeFuncs()`
**Adding new behavior**: Add flag to `options/flags.go` + `common.ReloaderOptions` struct → wire in `cmd/reloader.go` → implement logic in `handler/` or `callbacks/` → add metrics recording → write tests in `*_test.go`.
---
## Gotchas And Risks
**Duplicate reloads**: If a workload references multiple ConfigMaps/Secrets and all change simultaneously, each change event fires a separate reload. No deduplication exists within a reconcile window. This can cause unnecessary rolling restarts.
**Controller init guard**: `secretControllerInitialized` and `configmapControllerInitialized` booleans in `controller/controller.go` prevent processing Add events during the initial list/sync (to avoid reloading everything on startup). If `--sync-after-restart` is set, both are pre-set to `true`, bypassing the guard. Be careful when this interacts with `--reload-on-create`.
**Namespace filtering**: `--namespaces-to-ignore` does a name match; `--namespace-selector` watches namespaces by label and caches them in `selectedNamespacesCache`. The cache is updated on Namespace Add/Update/Delete events. A race between cache population and first ConfigMap event could cause missed reloads on startup in label-selected deployments.
**RBAC**: Reloader requires get/list/watch on secrets and configmaps, and get/list/watch/update/patch on all workload types it manages. Missing RBAC silently causes no reloads (not an error — just empty lists). Check ClusterRole in `deployments/kubernetes/chart/reloader/templates/`.
**GitOps drift**: If a GitOps tool (Flux, ArgoCD) manages the same Deployments, changes made by Reloader can be detected as drift and reverted. Both strategies modify the pod template (container env vars for `env-vars`, pod template annotations for `annotations`), so neither is invisible to a sync tool. The upstream README describes the `annotations` strategy as the one intended for sync tools such as ArgoCD; prefer it there, and verify the result against the tool's diff and ignore settings.
**Annotation precedence edge case**: Annotations are checked first on the workload object, then on the pod template. If both are set to conflicting values, the behavior depends on which path `ShouldReload()` hits first. Verify in `pkg/common/common.go`.
**CronJob/Job destructive reload**: Job recreation deletes the old Job. Any in-flight pod from that Job will be terminated. This is intentional but surprising. There is no protection for long-running jobs.
**OpenShift DeploymentConfig**: Auto-detected by probing for the `deploymentconfigs` resource. If the probe fails at startup, OpenShift support is silently disabled. Check `pkg/kube/client.go`.
**Argo Rollouts**: Must be explicitly enabled via `--is-Argo-Rollouts=true`. Without it, Rollout objects are never listed. Because `SupportsPatch=false`, full object updates are used, which can conflict with writes from Argo's own controller.
**CSI rotation behavior**: `SecretProviderClassPodStatus` is updated by the CSI driver when secrets rotate. Reloader reacts to those updates. However, if the CSI driver updates the status in a way that doesn't change the versions Reloader tracks, the reload will be missed.
**Backward compatibility**: Annotation names are configurable, so changing defaults would break existing clusters. Never change default annotation values without a migration path.
**Tests to update for risky changes**: `handler/upgrade_test.go` (large suite covering all workload types), `controller/controller_test.go` (event handling), `pkg/common/common_test.go` (reload decision logic).
---
## Open Questions
- **Exact `ShouldReload()` precedence**: The code in `pkg/common/common.go` checks annotations in a specific order. The exact tie-breaking when both workload-level and pod-template-level annotations are set should be verified by reading that function fully before making annotation behavior changes.
- **CSI → workload mapping**: How exactly does Reloader map a `SecretProviderClassPodStatus` change back to workloads? Is it via the SecretProviderClass name matching an annotation on the workload, or via volume reference scanning? Needs confirmation before adding CSI-related features.
- **`ContainerPatchPathFunc` field**: `RollingUpgradeFuncs` has a `ContainerPatchPathFunc` field but it is not documented — unclear if/how it differs from `ContainersFunc` in patch scenarios.
- **Webhook vs alert**: `--webhook-url` replaces reloading with a POST request. `ALERT_WEBHOOK_URL` env var sends an alert *after* reloading. These are two different mechanisms; the naming is confusing and easy to conflate.
- **Load test scenarios S7-S13**: Only S1, S4, and S6 are confirmed from CI. The behavior and coverage of the remaining scenarios are unknown without reading `test/loadtest/` in full.
- **`SyncAfterRestart` semantics**: Flag docs say it "syncs add events after restart" but only if `ReloadOnCreate` is also true. The interaction between these two flags in HA mode (where controllers restart on leader change) needs verification.
---
## Important Files
| File | Description |
|---|---|
| `internal/pkg/cmd/reloader.go` | `startReloader()` — main wiring of clients, controllers, HA, and HTTP server |
| `internal/pkg/handler/upgrade.go` | `doRollingUpgrade()` + all `Get*RollingUpgradeFuncs()` factories |
| `internal/pkg/callbacks/rolling_upgrade.go` | All workload-specific get/update/patch implementations |
| `pkg/common/common.go` | `ShouldReload()` — the annotation decision tree |
| `internal/pkg/options/flags.go` | Every configurable option with defaults |
| `internal/pkg/controller/controller.go` | Informer setup, queue, event handlers |
| `pkg/kube/client.go` | Multi-client initialization and OpenShift/CSI detection |
| `internal/pkg/handler/pause_deployment.go` | Pause/resume deployment logic with timers |
| `internal/pkg/leadership/leadership.go` | HA leader election |
| `internal/pkg/metrics/prometheus.go` | All Prometheus collector definitions |
| `internal/pkg/alerts/alert.go` | Slack/Teams/GChat alerting |
| `internal/pkg/constants/constants.go` | Global constants (env var prefixes, annotation prefix, strategy names) |
| `deployments/kubernetes/chart/reloader/values.yaml` | Helm chart defaults — source of truth for production config |
| `internal/pkg/handler/upgrade_test.go` | Largest test suite; must be updated for any reload logic change |
| `Makefile` | All build/test/release/loadtest commands |