46 Commits

Author SHA1 Message Date
Matteo Ruina
a9c2c0de89 fix(style): migrate from deprecated github.com/pkg/errors package (#1071)
* refactor: migrate error packages from pkg/errors to stdlib

Replace github.com/pkg/errors with Go standard library error handling
in foundation error packages:

- internal/datastore/errors: errors.Wrap -> fmt.Errorf with %w
- internal/errors: errors.As -> stdlib errors.As
- controllers/soot/controllers/errors: errors.New -> stdlib errors.New

Part 1 of 4 in the pkg/errors migration.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: migrate datastore package from pkg/errors to stdlib

Replace github.com/pkg/errors with Go standard library error handling
in the datastore layer:

- connection.go: errors.Wrap -> fmt.Errorf with %w
- datastore.go: errors.Wrap -> fmt.Errorf with %w
- etcd.go: goerrors alias removed, use stdlib errors.As
- nats.go: errors.Wrap/Is/New -> stdlib equivalents
- postgresql.go: goerrors.Wrap -> fmt.Errorf with %w

Part 2 of 4 in the pkg/errors migration.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: migrate internal packages from pkg/errors to stdlib (partial)

Replace github.com/pkg/errors with Go standard library error handling
in internal packages:

- internal/builders/controlplane: errors.Wrap -> fmt.Errorf
- internal/crypto: errors.Wrap -> fmt.Errorf
- internal/kubeadm: errors.Wrap/Wrapf -> fmt.Errorf
- internal/upgrade: errors.Wrap -> fmt.Errorf
- internal/webhook: errors.Wrap -> fmt.Errorf

Part 3 of 4 in the pkg/errors migration.

Remaining files: internal/resources/*.go (8 files, 42 occurrences)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(resources): migrate from pkg/errors to stdlib

Replace github.com/pkg/errors with Go standard library:
- errors.Wrap(err, msg) → fmt.Errorf("msg: %w", err)
- errors.New(msg) → errors.New(msg)

Files migrated:
- internal/resources/kubeadm_phases.go
- internal/resources/kubeadm_upgrade.go
- internal/resources/kubeadm_utils.go
- internal/resources/datastore/datastore_multitenancy.go
- internal/resources/datastore/datastore_setup.go
- internal/resources/datastore/datastore_storage_config.go
- internal/resources/addons/coredns.go
- internal/resources/addons/kube_proxy.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(controllers): migrate from pkg/errors to stdlib

Replace github.com/pkg/errors with Go standard library:
- errors.Wrap(err, msg) → fmt.Errorf("msg: %w", err)
- errors.New(msg) → errors.New(msg) (stdlib)
- errors.Is/As → errors.Is/As (stdlib)

Files migrated:
- controllers/datastore_controller.go
- controllers/kubeconfiggenerator_controller.go
- controllers/tenantcontrolplane_controller.go
- controllers/telemetry_controller.go
- controllers/certificate_lifecycle_controller.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(soot): migrate from pkg/errors to stdlib

Replace github.com/pkg/errors with Go standard library:
- errors.Is() now uses stdlib errors.Is()

Files migrated:
- controllers/soot/controllers/kubeproxy.go
- controllers/soot/controllers/migrate.go
- controllers/soot/controllers/coredns.go
- controllers/soot/controllers/konnectivity.go
- controllers/soot/controllers/kubeadm_phase.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor(api,cmd): migrate from pkg/errors to stdlib

Replace github.com/pkg/errors with Go standard library:
- errors.Wrap(err, msg) → fmt.Errorf("msg: %w", err)

Files migrated:
- api/v1alpha1/tenantcontrolplane_funcs.go
- cmd/utils/k8s_version.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: run go mod tidy after pkg/errors migration

The github.com/pkg/errors package moved from direct to indirect
dependency. It remains as an indirect dependency because other
packages in the dependency tree still use it.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(datastore): use errors.Is for sentinel error comparison

The stdlib errors.As expects a pointer to a concrete error type, not
a pointer to an error value. For comparing against sentinel errors
like rpctypes.ErrGRPCUserNotFound, errors.Is should be used instead.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: resolve golangci-lint errors

- Fix GCI import formatting (remove extra blank lines between groups)
- Use errors.Is instead of errors.As for mutex sentinel errors

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(errors): use proper variable declarations for errors.As

The errors.As function requires a pointer to an assignable variable,
not a pointer to a composite literal. The previous pattern
`errors.As(err, &SomeError{})` creates a pointer to a temporary value
which errors.As cannot reliably use for assignment.

This fix declares proper variables for each error type and passes
their addresses to errors.As, ensuring correct error chain matching.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix(datastore/etcd): use rpctypes.Error() for gRPC error comparison

The etcd gRPC status errors (ErrGRPCUserNotFound, ErrGRPCRoleNotFound)
cannot be compared directly using errors.Is() because they are wrapped
in gRPC status errors during transmission.

The etcd rpctypes package provides:
- ErrGRPC* constants: server-side gRPC status errors
- Err* constants (without GRPC prefix): client-side comparable errors
- Error() function: converts gRPC errors to comparable EtcdError values

The correct pattern is to use rpctypes.Error(err) to normalize the
received error, then compare against client-side error constants
like rpctypes.ErrUserNotFound.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 06:11:14 +01:00
Matteo Ruina
b40428825d feat: add ObservedGeneration (#1069)
* feat: add ObservedGeneration to all status types

Add ObservedGeneration field to DataStoreStatus, KubeconfigGeneratorStatus,
and TenantControlPlaneStatus to track which generation the controller has
processed. This enables clients and tools like kstatus to determine if the
controller has reconciled the latest spec changes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* refactor: follow Cluster API pattern for ObservedGeneration

Move ObservedGeneration setting for TenantControlPlane from intermediate
status updates to the final successful reconciliation completion. This
follows Cluster API conventions where ObservedGeneration indicates the
controller has fully processed the given generation.

Previously, ObservedGeneration was set on every status update during
resource processing, which could mislead clients into thinking the spec
was fully reconciled when the controller was still mid-reconciliation
or had hit a transient error.

Now:
- DataStore: Sets ObservedGeneration before single status update (simple controller)
- KubeconfigGenerator: Sets ObservedGeneration before single status update (simple controller)
- TenantControlPlane: Sets ObservedGeneration only after ALL resources processed successfully

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* test: verify ObservedGeneration equals Generation after reconciliation

Add assertion to e2e test to verify that status.observedGeneration
equals metadata.generation after a TenantControlPlane is successfully
reconciled.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: regenerate CRDs with ObservedGeneration field

Run make crds to regenerate CRDs with the new ObservedGeneration
field in status types.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Run make manifests

* Run make apidoc

* Remove rbac role

* Remove webhook manifest

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 12:18:46 +01:00
Léonard Suslian
d3fb03a752 feat: add support for multiple Datastores (#961)
* feat: add support for multiple Datastores

* docs: add guide for datastore overrides

* feat(datastore): add e2e test for dataStoreOverrides

* ci: reclaim disk space from runner to fix flaky tests
2025-12-12 12:10:02 +01:00
Alfredo Suarez
880b36e0fa feat: gateway api support (#1000)
* Feat: Gateway Routes Specs, plus resource and status init progress

* Generated content, RBAC and start of e2e

* latest code POC Working but e2e fails

* Use Gateway API v1.2.0

* Remove draft comment

* Use TCPRoute

* Revert the charts folder to reduce noise

* Use the correct controller-gen version

* Rename fields and fix tcp/tls typos

* Rename TLSRouteSpec to GatewayRouteSpec

* Remove last instance of tcproute

* Renaming more fields to match the gateway api naming

* Remove ownership of the gateway

* Revert Ko to 0.14.1 and makefile comments

* service discovery, webhooks, and deadcode removal.

* add conditional check for gateway api resources and mark is as owned!

* removing duplicated code and note for maybe a refactor later

* E2E now works!

* e2e suite modifications to support Gateway API v1alpha2 TLSRoute

* Suggestions commit, naming and other related.

* First pass at the status update

* Rename route to gateway

* Only allow one hostname in gateway

* Update status types

* WIP: testing conditions

* Update status API

* Add tests

* Detect endpoint

* Update manifests

* Remove old code and use proper condition check

* Fix compilation error

* Watch the Gateway resources

* Rename fields

* Add missing port

* Add ingress endpoint to the kubeadm

* Error if access points are empty

* Check the spec and status to delay the creation of the kubeadm

* Use the spec for the hostname

* Update api/v1alpha1/tenantcontrolplane_types.go

Co-authored-by: Dario Tranchitella <dario@tranchitella.eu>

* PR fixes, CEL k8s validations, proper status updates checks

* more context and separation of functions

* resolve all pr comments, with indexer

* merge master - go {sum,mod} updates dependabot

* Feat: Gateway Routes Specs, plus resource and status init progress

* Use Gateway API v1.2.0

* merge master - go {sum,mod} updates dependabot

* sum go mod tidy

* leftover comments

* clean go.sum

* fix: missing generated crds spec

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

* docs: gateway api support

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

* golint comments

* linting and test fix.

* Gateway API resource watching was made conditional to prevent crashes when CRDs are absent, and TLSRoute creation now returns an error when the service isn't ready instead of creating invalid resources with empty rules.

* unit test was incorrect after all the fixes we did, gracefull errors are not expected due to conditional adds

* fix(conditional-indexer): Gateway Indexer should also be conditional

* fix(conditional-indexer): Gateway Indexer should also be conditional

---------

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
Co-authored-by: Hadrien Kohl <hadrien.kohl@gmail.com>
Co-authored-by: Dario Tranchitella <dario@tranchitella.eu>
2025-11-26 10:34:09 +01:00
Dario Tranchitella
5e68fd8fe0 fix: honouring certificate expiratin threshold (#886)
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
2025-07-28 09:40:16 +02:00
Dario Tranchitella
e366dc3959 feat: pausing reconciliation of controlled objects (#874)
* feat: pausing reconciliation of controlled objects

Objects such as TenantControlPlane and Secret can be annotated with
kamaji.clastix.io/paused to prevent controllers from processing them.

This will stop reconciling objects for debugging or other purposes.
Annotation value is irrelevant, just the key presence is evaluated.

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

* docs: pausing reconciliation of controlled objects

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

* chore(logs): typo for deleted resources

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

---------

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
2025-07-16 10:44:48 +02:00
Dario Tranchitella
ce8d5f2516 refactor: requeue deprecated, migrating to requeue after (#837)
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
2025-06-10 17:27:45 +02:00
Dario Tranchitella
b027e23b99 feat: enhancing concurrent reconciliations (#790)
* feat: buffered channels for generic events

Channels used for GenericEvent feeding for cross controllers triggers
are now buffered according to the --max-concurrent-tcp-reconciles: this
is required to avoid channel full errors when dealing with large
management clusters serving a sizeable amount of Tenant Control Planes.

Increasing this value will put more pressure on memory (mostly for GC)
and CPU (provisioning multiple certificates at the same time).

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

* refactor: retrying datastore status update

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

* feat(performance): reducing memory consumption for channel triggers

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

* feat(datastore): reconcile events only for root object changes

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

* feat: waiting soot manager exit before termination

This change introduces a grace period of 10 seconds before abruptly
terminating the Tenant Control Plane deployment, allowing the soot
manager to complete its exit procedure and avoid false positive errors
due to API Server being unresponsive due to user deletion.

Aim of this change is reducing the amount of false positive errors upon
mass deletion of Tenant COntrol Plane objects.

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

* refactor: unbuffered channel with timeout

WatchesRawSource is non blocking, no need to check if channel is full.
To prevent deadlocks a WithTimeout check has been introduced.

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

---------

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
2025-04-23 21:00:29 +02:00
Dario Tranchitella
1d72802abd refactor: avoid logging error and sentinel for status (#673)
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
2025-01-22 11:08:01 +01:00
Dario Tranchitella
0c0111094e feat: making default datastore optional (#597)
* feat: making default datastore optional

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

* feat(helm): making default datastore optional

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

* docs: making default datastore optional

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

---------

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
2024-10-30 20:23:34 +01:00
Dario Tranchitella
66d96a138d feat(deps): bump sigs.k8s.io/controller-runtime from 0.18.5 to 0.19.0 (#551)
* feat(deps): bump sigs.k8s.io/controller-runtime from 0.18.5 to 0.19.0

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

* feat: bumping up k8s supported version to v1.30.0

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

* feat(deps): aligning code to controlle-runtime v0.19.0

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

* docs: clastix subscription plans info

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

* chore: bumping up controller-gen to v0.16.1

* chore(kustomize): updating manifests for k8s v1.31.0 support

* chore(helm): updating manifests for k8s v1.31.0 support

* docs(api): updating api for k8s v1.31.0 support

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

* fix(test): worker nodes join support from v1.29 onwards

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

* chore(ci): disabling swap

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>

---------

Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
2024-08-20 17:37:18 +02:00
Dario Tranchitella
d57d5b5a56 feat(deps): bumping up sigs.k8s.io/controller-runtime to v0.18.4
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
2024-06-27 11:38:41 +02:00
Mathieu Cesbron
eff68db336 fix(certificate_lifecycle_controller): blocking reconciliation in case of error
Signed-off-by: Mathieu Cesbron <mathieu.cesbron@protonmail.com>
2024-02-26 21:27:17 +01:00
Dario Tranchitella
ddb700f4f0 refactor: upgrading to new dependencies
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
2023-12-15 13:02:49 +01:00
Dario Tranchitella
0f1a4f28de fix: blocking datastore secret deletion with finalizer 2023-09-29 10:56:28 +02:00
Dario Tranchitella
7e94ecdbab feat: kubeconfig and certificates rotation 2023-08-03 18:03:54 +02:00
Dario Tranchitella
f831f385c4 feat(cli): controller reconcile timeout flag with 30s default value 2023-08-01 13:51:09 +02:00
Dario Tranchitella
4110b688c9 feat: configurable max concurrent tcp reconciles 2023-02-06 22:12:50 +01:00
Dario Tranchitella
830d86a38a feat: introducing enqueueback reconciliation status
Required for the changes introduced with 74f7157e8b
2023-02-06 22:12:50 +01:00
Dario Tranchitella
44d1f3fa7f refactor: updating local tcp instance to avoid 2nd retrieval 2023-02-06 22:12:50 +01:00
Dario Tranchitella
18c60461e5 refactor: conforming finalizers management 2022-12-16 22:44:42 +01:00
Dario Tranchitella
3f7fa08871 refactor: removing unused scheme 2022-12-15 15:50:30 +01:00
Dario Tranchitella
4c51eafc90 feat(konnectivity): reconciliation performed by soot manager 2022-12-12 16:22:36 +01:00
Dario Tranchitella
28c47d9d13 refactor: moving migrate webhook handling from tcp to soot manager 2022-12-12 16:22:36 +01:00
Dario Tranchitella
e25f95d7eb feat(migrate): making image configurable 2022-12-08 14:33:20 +01:00
Dario Tranchitella
723fef5336 feat(migrate): injecting webhook into tcp 2022-12-08 14:13:45 +01:00
Dario Tranchitella
9e899379f4 feat: support to datastore migration w/ the same driver 2022-12-03 12:04:04 +01:00
Dario Tranchitella
ece1a4e7ee fix: avoiding inconsistency upon tcp retrieval and status update 2022-12-03 12:04:04 +01:00
Dario Tranchitella
eb2440ae62 refactor: abstracting datastore configuration retrieval 2022-12-03 12:04:04 +01:00
Dario Tranchitella
0d607dfe5d refactor: adding finalizer upon datastore setu 2022-11-27 17:26:34 +01:00
Dario Tranchitella
11502bf359 refactor: retry on conflict for the status update 2022-11-27 17:26:34 +01:00
Dario Tranchitella
f853f25195 refactor: adding further context to error reporting 2022-08-30 16:22:06 +02:00
Dario Tranchitella
d59f494a69 feat: support for tcp specific data store 2022-08-30 16:22:06 +02:00
Dario Tranchitella
8273d7c7b4 chore(golangci-lint): updating to v1.49.0 2022-08-27 15:16:31 +02:00
Dario Tranchitella
1ddaeccc94 feat: storage homogeneity 2022-08-27 15:16:31 +02:00
Dario Tranchitella
2c963881ab feat: using datastore api for backing storage driver 2022-08-26 22:05:59 +02:00
Dario Tranchitella
31b25f7c78 feat: postgresql kine driver 2022-08-23 08:48:56 +02:00
Dario Tranchitella
d290e73307 feat: making kine container image configurable via kamaji flag
A new CLI flag (`--kine-container`) has been introduced, with the
default value of `rancher/kine:v0.9.2-amd64`. It can be overridden also
using the kamaji configuration file (`kamaji.yaml`) using the key
`kine-image`.
2022-07-21 13:53:42 +00:00
Dario Tranchitella
eb699051a1 feat: logging resource deletion 2022-07-11 09:01:53 +00:00
mendrugory
9e3173676e feat: kine 2022-07-07 12:39:42 +00:00
mendrugory
8f59de6e13 refactor: adaption for kine 2022-07-07 12:39:42 +00:00
mendrugory
5b4de76229 refactor:
* cleaning code
    * group of resources and code improvements
    * addons
    * manifest for helm
2022-06-20 15:53:02 +02:00
Dario Tranchitella
8be787adc5 reorg: marking loadbalancer errors as debug 2022-05-29 16:41:39 +00:00
mendrugory
258b1ff48f feat: addons 2022-05-26 10:16:02 +02:00
Dario Tranchitella
0be08a0099 reorg: upgrade phase as first resource
Ensuring that upon a Tenant Control Plane upgrade the additional printer
columns are reporting a coherent status.
2022-05-23 10:19:43 +00:00
Gonzalo Gabriel Jiménez Fuentes
432c50b081 feat: releasing kamaji 2022-05-17 14:44:38 +02:00