* refactor: migrate error packages from pkg/errors to stdlib
Replace github.com/pkg/errors with Go standard library error handling
in foundation error packages:
- internal/datastore/errors: errors.Wrap -> fmt.Errorf with %w
- internal/errors: errors.As -> stdlib errors.As
- controllers/soot/controllers/errors: errors.New -> stdlib errors.New
Part 1 of 4 in the pkg/errors migration.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor: migrate datastore package from pkg/errors to stdlib
Replace github.com/pkg/errors with Go standard library error handling
in the datastore layer:
- connection.go: errors.Wrap -> fmt.Errorf with %w
- datastore.go: errors.Wrap -> fmt.Errorf with %w
- etcd.go: goerrors alias removed, use stdlib errors.As
- nats.go: errors.Wrap/Is/New -> stdlib equivalents
- postgresql.go: goerrors.Wrap -> fmt.Errorf with %w
Part 2 of 4 in the pkg/errors migration.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor: migrate internal packages from pkg/errors to stdlib (partial)
Replace github.com/pkg/errors with Go standard library error handling
in internal packages:
- internal/builders/controlplane: errors.Wrap -> fmt.Errorf
- internal/crypto: errors.Wrap -> fmt.Errorf
- internal/kubeadm: errors.Wrap/Wrapf -> fmt.Errorf
- internal/upgrade: errors.Wrap -> fmt.Errorf
- internal/webhook: errors.Wrap -> fmt.Errorf
Part 3 of 4 in the pkg/errors migration.
Remaining files: internal/resources/*.go (8 files, 42 occurrences)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor(resources): migrate from pkg/errors to stdlib
Replace github.com/pkg/errors with Go standard library:
- errors.Wrap(err, msg) → fmt.Errorf("msg: %w", err)
- errors.New(msg) → errors.New(msg) (stdlib)
Files migrated:
- internal/resources/kubeadm_phases.go
- internal/resources/kubeadm_upgrade.go
- internal/resources/kubeadm_utils.go
- internal/resources/datastore/datastore_multitenancy.go
- internal/resources/datastore/datastore_setup.go
- internal/resources/datastore/datastore_storage_config.go
- internal/resources/addons/coredns.go
- internal/resources/addons/kube_proxy.go
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor(controllers): migrate from pkg/errors to stdlib
Replace github.com/pkg/errors with Go standard library:
- errors.Wrap(err, msg) → fmt.Errorf("msg: %w", err)
- errors.New(msg) → errors.New(msg) (stdlib)
- errors.Is/As → errors.Is/As (stdlib)
Files migrated:
- controllers/datastore_controller.go
- controllers/kubeconfiggenerator_controller.go
- controllers/tenantcontrolplane_controller.go
- controllers/telemetry_controller.go
- controllers/certificate_lifecycle_controller.go
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor(soot): migrate from pkg/errors to stdlib
Replace github.com/pkg/errors with Go standard library:
- errors.Is() now uses stdlib errors.Is()
Files migrated:
- controllers/soot/controllers/kubeproxy.go
- controllers/soot/controllers/migrate.go
- controllers/soot/controllers/coredns.go
- controllers/soot/controllers/konnectivity.go
- controllers/soot/controllers/kubeadm_phase.go
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor(api,cmd): migrate from pkg/errors to stdlib
Replace github.com/pkg/errors with Go standard library:
- errors.Wrap(err, msg) → fmt.Errorf("msg: %w", err)
Files migrated:
- api/v1alpha1/tenantcontrolplane_funcs.go
- cmd/utils/k8s_version.go
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* chore: run go mod tidy after pkg/errors migration
The github.com/pkg/errors package moved from direct to indirect
dependency. It remains as an indirect dependency because other
packages in the dependency tree still use it.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(datastore): use errors.Is for sentinel error comparison
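The stdlib errors.As matches by type, not by value: its target must be a
pointer to a variable whose type implements error (or is an interface).
Passing the address of an error-typed variable, such as a sentinel like
rpctypes.ErrGRPCUserNotFound, makes errors.As succeed for any non-nil
error, so sentinel comparisons must use errors.Is instead.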
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: resolve golangci-lint errors
- Fix GCI import formatting (remove extra blank lines between groups)
- Use errors.Is instead of errors.As for mutex sentinel errors
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(errors): use proper variable declarations for errors.As
The errors.As function stores the matched error in its target, so the
target should be the address of a named variable. The previous pattern
`errors.As(err, &SomeError{})` passes a pointer to an unnamed temporary,
so the matched value is unreachable by the caller; and when the errors
are returned as pointers (`*SomeError`), it asks errors.As to look for a
`SomeError` value and never matches.
This fix declares a variable for each error type and passes its address
to errors.As, ensuring correct error chain matching and making the
matched error available after the call.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(datastore/etcd): use rpctypes.Error() for gRPC error comparison
The etcd gRPC status errors (ErrGRPCUserNotFound, ErrGRPCRoleNotFound)
cannot be compared directly using errors.Is() because they are wrapped
in gRPC status errors during transmission.
The etcd rpctypes package provides:
- ErrGRPC* variables: server-side gRPC status errors
- Err* variables (without the GRPC prefix): client-side comparable errors
- Error() function: converts gRPC errors to comparable EtcdError values
The correct pattern is to use rpctypes.Error(err) to normalize the
received error, then compare against client-side error constants
like rpctypes.ErrUserNotFound.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* feat: add ObservedGeneration to all status types
Add ObservedGeneration field to DataStoreStatus, KubeconfigGeneratorStatus,
and TenantControlPlaneStatus to track which generation the controller has
processed. This enables clients and tools like kstatus to determine if the
controller has reconciled the latest spec changes.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor: follow Cluster API pattern for ObservedGeneration
Move ObservedGeneration setting for TenantControlPlane from intermediate
status updates to the final successful reconciliation completion. This
follows Cluster API conventions where ObservedGeneration indicates the
controller has fully processed the given generation.
Previously, ObservedGeneration was set on every status update during
resource processing, which could mislead clients into thinking the spec
was fully reconciled when the controller was still mid-reconciliation
or had hit a transient error.
Now:
- DataStore: Sets ObservedGeneration before single status update (simple controller)
- KubeconfigGenerator: Sets ObservedGeneration before single status update (simple controller)
- TenantControlPlane: Sets ObservedGeneration only after ALL resources processed successfully
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* test: verify ObservedGeneration equals Generation after reconciliation
Add assertion to e2e test to verify that status.observedGeneration
equals metadata.generation after a TenantControlPlane is successfully
reconciled.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* chore: regenerate CRDs with ObservedGeneration field
Run make crds to regenerate CRDs with the new ObservedGeneration
field in status types.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* Run make manifests
* Run make apidoc
* Remove rbac role
* Remove webhook manifest
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This change extends Gateway API support to Konnectivity addons.
When `spec.controlPlane.gateway` is configured and the Konnectivity addon is
enabled, Kamaji automatically creates two TLSRoutes:
1. A control plane TLSRoute (port 6443, sectionName "kube-apiserver")
2. A Konnectivity TLSRoute (port 8132, sectionName "konnectivity-server")
Both routes use the hostname specified in `gateway.hostname` and reference
the same Gateway resource via `parentRefs`, with `port` and `sectionName`
set automatically by Kamaji.
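A sketch of how the per-route values derive from the single gateway hostname. `routeSpec` and `routesFor` are hypothetical simplifications rather than Gateway API types, and the sketch assumes that only the control plane route is created when Konnectivity is disabled:

```go
package main

import "fmt"

// routeSpec is a hypothetical, simplified view of the per-route values;
// it is not the Gateway API TLSRoute type.
type routeSpec struct {
	Hostname    string
	Port        int32
	SectionName string
}

// routesFor derives the TLSRoute settings from the gateway hostname.
// Port and sectionName are fixed by the controller, never taken from users.
func routesFor(hostname string, konnectivityEnabled bool) []routeSpec {
	routes := []routeSpec{
		{Hostname: hostname, Port: 6443, SectionName: "kube-apiserver"},
	}
	if konnectivityEnabled {
		routes = append(routes, routeSpec{Hostname: hostname, Port: 8132, SectionName: "konnectivity-server"})
	}
	return routes
}

func main() {
	// Prints:
	// tcp.example.com:6443 (kube-apiserver)
	// tcp.example.com:8132 (konnectivity-server)
	for _, r := range routesFor("tcp.example.com", true) {
		fmt.Printf("%s:%d (%s)\n", r.Hostname, r.Port, r.SectionName)
	}
}
```

Keeping port and sectionName out of user input is what the CEL validation below enforces at the API level.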
This patch also adds CEL validation to prevent users from specifying
`port` or `sectionName` in Gateway `parentRefs`, as these fields are now
managed automatically by Kamaji.
Signed-off-by: Parth Yadav <parth@coredge.io>
* feat: add support for multiple Datastores
* docs: add guide for datastore overrides
* feat(datastore): add e2e test for dataStoreOverrides
* ci: reclaim disk space from runner to fix flaky tests
* Feat: Gateway Routes Specs, plus resource and status init progress
* Generated content, RBAC and start of e2e
* latest code POC Working but e2e fails
* Use Gateway API v1.2.0
* Remove draft comment
* Use TCPRoute
* Revert the charts folder to reduce noise
* Use the correct controller-gen version
* Rename fields and fix tcp/tls typos
* Rename TLSRouteSpec to GatewayRouteSpec
* Remove last instance of tcproute
* Renaming more fields to match the gateway api naming
* Remove ownership of the gateway
* Revert Ko to 0.14.1 and makefile comments
* service discovery, webhooks, and deadcode removal.
* add conditional check for gateway api resources and mark it as owned!
* removing duplicated code and note for maybe a refactor later
* E2E now works!
* e2e suite modifications to support Gateway API v1alpha2 TLSRoute
* Suggestions commit, naming and other related.
* First pass at the status update
* Rename route to gateway
* Only allow one hostname in gateway
* Update status types
* WIP: testing conditions
* Update status API
* Add tests
* Detect endpoint
* Update manifests
* Remove old code and use proper condition check
* Fix compilation error
* Watch the Gateway resources
* Rename fields
* Add missing port
* Add ingress endpoint to the kubeadm
* Error if access points are empty
* Check the spec and status to delay the creation of the kubeadm
* Use the spec for the hostname
* Update api/v1alpha1/tenantcontrolplane_types.go
Co-authored-by: Dario Tranchitella <dario@tranchitella.eu>
* PR fixes, CEL k8s validations, proper status updates checks
* more context and separation of functions
* resolve all pr comments, with indexer
* merge master - go {sum,mod} updates dependabot
* Feat: Gateway Routes Specs, plus resource and status init progress
* Use Gateway API v1.2.0
* merge master - go {sum,mod} updates dependabot
* sum go mod tidy
* leftover comments
* clean go.sum
* fix: missing generated crds spec
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
* docs: gateway api support
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
* golint comments
* linting and test fix.
* Gateway API resource watching was made conditional to prevent crashes when CRDs are absent, and TLSRoute creation now returns an error when the service isn't ready instead of creating invalid resources with empty rules.
* unit test was incorrect after all the fixes we did; graceful errors are not expected due to conditional adds
* fix(conditional-indexer): Gateway Indexer should also be conditional
* fix(conditional-indexer): Gateway Indexer should also be conditional
---------
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
Co-authored-by: Hadrien Kohl <hadrien.kohl@gmail.com>
Co-authored-by: Dario Tranchitella <dario@tranchitella.eu>
* feat: pausing reconciliation of controlled objects
Objects such as TenantControlPlane and Secret can be annotated with
kamaji.clastix.io/paused to prevent controllers from processing them.
This stops the reconciliation of those objects, e.g. for debugging purposes.
The annotation value is irrelevant: only the presence of the key is evaluated.
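A sketch of the key-presence check. `isPaused` is a hypothetical helper; in a controller-runtime reconciler the map would typically come from the object's `GetAnnotations()`:

```go
package main

import "fmt"

// pausedAnnotation is the annotation key named in this commit.
const pausedAnnotation = "kamaji.clastix.io/paused"

// isPaused reports whether reconciliation should be skipped. Only the key's
// presence matters: an empty value or "false" still pauses the object.
func isPaused(annotations map[string]string) bool {
	// Lookups on a nil map are safe and report ok == false.
	_, ok := annotations[pausedAnnotation]
	return ok
}

func main() {
	fmt.Println(isPaused(map[string]string{pausedAnnotation: "false"})) // true
	fmt.Println(isPaused(nil))                                           // false
}
```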
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
* docs: pausing reconciliation of controlled objects
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
* chore(logs): typo for deleted resources
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
---------
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
* feat: buffered channels for generic events
Channels used to feed GenericEvents for cross-controller triggers
are now buffered according to --max-concurrent-tcp-reconciles: this
is required to avoid channel-full errors when dealing with large
management clusters serving a sizeable number of Tenant Control Planes.
Increasing this value will put more pressure on memory (mostly for GC)
and CPU (provisioning multiple certificates at the same time).
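A sketch of sizing the trigger channel from the concurrency flag. `genericEvent` is a hypothetical stand-in for controller-runtime's `event.GenericEvent`, and `newTriggerChannel` is illustrative:

```go
package main

import "fmt"

// genericEvent is a hypothetical stand-in for controller-runtime's
// event.GenericEvent.
type genericEvent struct{ Name string }

// newTriggerChannel sizes the buffer from the reconcile concurrency, so up to
// maxConcurrentReconciles triggers can be queued without a ready receiver.
func newTriggerChannel(maxConcurrentReconciles int) chan genericEvent {
	return make(chan genericEvent, maxConcurrentReconciles)
}

func main() {
	ch := newTriggerChannel(3)
	for i := 0; i < 3; i++ {
		ch <- genericEvent{Name: fmt.Sprintf("tcp-%d", i)} // does not block: buffer has room
	}
	fmt.Println(len(ch), cap(ch)) // 3 3
}
```

Each buffered slot holds an event value, which is where the extra memory pressure mentioned above comes from.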
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
* refactor: retrying datastore status update
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
* feat(performance): reducing memory consumption for channel triggers
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
* feat(datastore): reconcile events only for root object changes
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
* feat: waiting soot manager exit before termination
This change introduces a grace period of 10 seconds before abruptly
terminating the Tenant Control Plane deployment, allowing the soot
manager to complete its exit procedure and avoiding false positive errors
caused by the API Server becoming unresponsive after user deletion.
The aim of this change is to reduce the number of false positive errors
upon mass deletion of Tenant Control Plane objects.
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
* refactor: unbuffered channel with timeout
WatchesRawSource is non-blocking, so there is no need to check whether
the channel is full. To prevent deadlocks, a WithTimeout check has been
introduced.
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
---------
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
* feat(api): introducing sleeping status
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
* chore(helm)!: introducing sleeping status
Marking this commit as breaking: users dealing with scale to zero must
update the CustomResourceDefinition, since a new enum has been
introduced for the status field.
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
* docs: introducing sleeping status
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
---------
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>