Commit Graph

369 Commits

Author SHA1 Message Date
Jian Qiu
9b010ef622 🌱 build object reader to get resource object from spoke (#1324)
Some checks failed
Post / images (amd64, addon-manager) (push) Failing after 51s
Post / images (amd64, placement) (push) Failing after 46s
Post / images (amd64, registration) (push) Failing after 43s
Post / images (amd64, registration-operator) (push) Failing after 44s
Post / images (amd64, work) (push) Failing after 44s
Post / images (arm64, addon-manager) (push) Failing after 43s
Post / images (arm64, placement) (push) Failing after 43s
Post / images (arm64, registration) (push) Failing after 42s
Post / images (arm64, registration-operator) (push) Failing after 43s
Post / images (arm64, work) (push) Failing after 41s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Scorecard supply-chain security / Scorecard analysis (push) Failing after 8m56s
Post / coverage (push) Failing after 13m3s
Close stale issues and PRs / stale (push) Successful in 42s
* A resource informer code to watch resources

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Use object reader in controller

Signed-off-by: Jian Qiu <jqiu@redhat.com>

---------

Signed-off-by: Jian Qiu <jqiu@redhat.com>
2026-01-23 07:32:13 +00:00
Wei Liu
5740147dba add token request service (#1339)
Some checks failed
Post / images (amd64, placement) (push) Failing after 46s
Post / images (amd64, registration) (push) Failing after 42s
Post / images (amd64, registration-operator) (push) Failing after 39s
Post / images (amd64, work) (push) Failing after 39s
Post / images (arm64, addon-manager) (push) Failing after 41s
Post / images (arm64, placement) (push) Failing after 40s
Post / images (arm64, registration) (push) Failing after 41s
Post / images (arm64, registration-operator) (push) Failing after 40s
Post / images (arm64, work) (push) Failing after 42s
Post / images (amd64, addon-manager) (push) Failing after 7m44s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Post / coverage (push) Failing after 10m19s
Scorecard supply-chain security / Scorecard analysis (push) Failing after 23s
Close stale issues and PRs / stale (push) Successful in 44s
Signed-off-by: Wei Liu <liuweixa@redhat.com>
2026-01-20 03:19:41 +00:00
Zhiwei Yin
9a1e925112 ensure immediate requeue for transient errors when work spec is changed (#1335)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 22s
Post / images (amd64, addon-manager) (push) Failing after 51s
Post / images (amd64, placement) (push) Failing after 46s
Post / images (amd64, registration) (push) Failing after 44s
Post / images (amd64, registration-operator) (push) Failing after 44s
Post / images (amd64, work) (push) Failing after 46s
Post / images (arm64, placement) (push) Failing after 45s
Post / images (arm64, registration) (push) Failing after 45s
Post / images (arm64, registration-operator) (push) Failing after 44s
Post / images (arm64, work) (push) Failing after 45s
Post / images (arm64, addon-manager) (push) Failing after 16m21s
Post / coverage (push) Failing after 39m14s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Signed-off-by: Zhiwei Yin <zyin@redhat.com>
2026-01-19 07:57:39 +00:00
xuezhao
d83c822129 Add duplicate manifest detection in ManifestWork webhook validation (#1310)
This commit adds validation to detect and reject duplicate manifests
in ManifestWork resources. A manifest is considered duplicate when
it has the same apiVersion, kind, namespace, and name as another
manifest in the same ManifestWork.

This prevents issues where duplicate manifests with different specs
can cause state inconsistency, as the Work Agent applies manifests
sequentially and later entries would overwrite earlier ones.

The validation returns a clear error message indicating the duplicate
manifest's index and the index of its first occurrence.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: xuezhaojun <zxue@redhat.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 06:09:25 +00:00
Jian Zhu
c69a2586e5 fix: ensure immediate eviction after grace period expires (#1330)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m3s
Post / images (amd64, addon-manager) (push) Failing after 7m31s
Post / coverage (push) Failing after 9m30s
Post / images (amd64, registration-operator) (push) Failing after 57s
Post / images (amd64, work) (push) Failing after 52s
Post / images (arm64, addon-manager) (push) Failing after 50s
Post / images (arm64, placement) (push) Failing after 52s
Post / images (arm64, registration) (push) Failing after 50s
Post / images (arm64, registration-operator) (push) Failing after 52s
Post / images (arm64, work) (push) Failing after 49s
Post / images (amd64, registration) (push) Failing after 7m6s
Post / images (amd64, placement) (push) Failing after 27m47s
Post / image manifest (addon-manager) (push) Has been cancelled
Post / image manifest (placement) (push) Has been cancelled
Post / image manifest (registration) (push) Has been cancelled
Post / image manifest (registration-operator) (push) Has been cancelled
Post / image manifest (work) (push) Has been cancelled
Post / trigger clusteradm e2e (push) Has been cancelled
Close stale issues and PRs / stale (push) Successful in 3s
Fixed a bug where AppliedManifestWorks were not evicted immediately
after the appliedmanifestwork-eviction-grace-period expired.

Root cause: The controller used an exponential backoff rate limiter
to schedule requeue delays, which caused:
1. Exponentially increasing delays during grace period (1min -> 2min -> 4min...)
2. Unpredictable delays after grace period expired

Solution: Replace rate limiter with direct time calculation. Now the
controller calculates the exact remaining time until eviction and
schedules the next sync for that precise moment:
  remainingTime := evictionTime.Sub(now)

Changes:
- Removed rateLimiter field and workqueue import
- Calculate exact remaining time instead of using exponential backoff
- Added V(4) logging to show scheduled eviction time and remaining time
- Updated unit test expectations (queue length 0 for delayed items)

Impact: AppliedManifestWorks are now evicted immediately when the
grace period expires, instead of being delayed by minutes due to
exponential backoff.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: zhujian <jiazhu@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-16 09:48:50 +00:00
Anne Lau
4dc99cd621 Progressing status conditions, true wins (#1332)
Signed-off-by: annelau <annelau@salesforce.com>
Co-authored-by: annelau <annelau@salesforce.com>
2026-01-16 06:53:14 +00:00
Zhiwei Yin
40de7f2ed1 refactor(registration): preserve ClusterRole/ClusterRoleBinding when managed cluster is denied (#1328)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m24s
Post / coverage (push) Failing after 7m11s
Post / images (amd64, registration) (push) Failing after 45s
Post / images (amd64, registration-operator) (push) Failing after 42s
Post / images (amd64, placement) (push) Failing after 7m50s
Post / images (amd64, work) (push) Failing after 42s
Post / images (arm64, placement) (push) Failing after 42s
Post / images (arm64, registration) (push) Failing after 40s
Post / images (arm64, registration-operator) (push) Failing after 38s
Post / images (arm64, work) (push) Failing after 42s
Post / images (amd64, addon-manager) (push) Failing after 14m28s
Post / images (arm64, addon-manager) (push) Failing after 7m10s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Refactored the removeClusterRbac function into separate functions to handle
different RBAC resource cleanup scenarios:

- removeClusterRBACResources: orchestrates full RBAC cleanup when cluster is deleted
- removeClusterSpecificRBAC: removes ClusterRole and ClusterRoleBinding
- removeClusterSpecificRoleBindings: removes registration and work RoleBindings

When hubAcceptsClient is false (cluster denied), only RoleBindings are removed
while ClusterRole and ClusterRoleBinding are preserved and updated. This ensures
proper RBAC state for denied clusters without deleting cluster-scoped resources.

Added unit test to verify that when a cluster is denied, only RoleBindings are
deleted while ClusterRole and ClusterRoleBinding remain intact.

Signed-off-by: Zhiwei Yin <zyin@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-13 10:16:00 +00:00
Wei Liu
d5e677414c add options to grpc broker (#1326)
Signed-off-by: Wei Liu <liuweixa@redhat.com>
2026-01-12 12:50:14 +00:00
Guilhem Lettron
ac5f34839d feat(manager): implement import-renderers (#1317)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m40s
Post / images (amd64, addon-manager) (push) Failing after 52s
Post / images (amd64, placement) (push) Failing after 46s
Post / images (amd64, registration-operator) (push) Failing after 47s
Post / images (amd64, work) (push) Failing after 47s
Post / images (arm64, addon-manager) (push) Failing after 49s
Post / images (arm64, placement) (push) Failing after 48s
Post / images (arm64, registration) (push) Failing after 46s
Post / images (arm64, registration-operator) (push) Failing after 48s
Post / images (arm64, work) (push) Failing after 49s
Post / images (amd64, registration) (push) Failing after 14m11s
Post / coverage (push) Failing after 40m4s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Successful in 3s
Signed-off-by: Guilhem Lettron <glettron@akamai.com>
2026-01-09 07:38:35 +00:00
Qing Hao
c6aa931619 fix(addon): remove internal annotation from v1alpha1 after conversion (#1321)
When converting ManagedClusterAddOn from v1beta1 to v1alpha1, the
internal annotation 'addon.open-cluster-management.io/v1alpha1-install-namespace'
should be removed after being converted to Spec.InstallNamespace field.

This annotation is only used internally for v1beta1 storage to preserve
the InstallNamespace field which was removed in v1beta1. It should not
appear in v1alpha1 API responses.

Fixes: ACM-28133

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Qing Hao <qhao@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-09 01:34:23 +00:00
Érico GR
ad89f05351 🐛 Fix work rolebinding cleanup when hubAcceptsClient is set to false (#1318)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 13s
Post / images (amd64, addon-manager) (push) Failing after 48s
Post / images (amd64, placement) (push) Failing after 1m22s
Post / images (amd64, registration) (push) Failing after 42s
Post / images (amd64, work) (push) Failing after 41s
Post / images (arm64, addon-manager) (push) Failing after 42s
Post / images (arm64, placement) (push) Failing after 41s
Post / images (arm64, registration) (push) Failing after 41s
Post / images (arm64, registration-operator) (push) Failing after 41s
Post / images (arm64, work) (push) Failing after 42s
Post / images (amd64, registration-operator) (push) Failing after 21m14s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Post / coverage (push) Failing after 39m11s
Close stale issues and PRs / stale (push) Successful in 50s
* Fix work rolebinding cleanup when hubAcceptsClient is set to false
Signed-off-by: Erico G. Rimoli <erico.rimoli@totvs.com.br>

* Adds error handling to the removeClusterRbac call within the controller synchronization function
Signed-off-by: Erico G. Rimoli <erico.rimoli@totvs.com.br>
2026-01-08 13:46:13 +00:00
Anne Lau
635b0ff7e9 PlacementRollout to reflect Ready status (#1281)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 20s
Post / images (amd64, placement) (push) Failing after 45s
Post / images (amd64, registration) (push) Failing after 42s
Post / images (amd64, registration-operator) (push) Failing after 40s
Post / images (amd64, work) (push) Failing after 41s
Post / images (arm64, addon-manager) (push) Failing after 41s
Post / images (arm64, placement) (push) Failing after 40s
Post / images (arm64, registration) (push) Failing after 39s
Post / images (arm64, registration-operator) (push) Failing after 39s
Post / images (arm64, work) (push) Failing after 41s
Post / images (amd64, addon-manager) (push) Failing after 7m30s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Post / coverage (push) Failing after 9m44s
Update with success count

Remove status references

Add unit tests

Fix unit tests

Update unit tests
Test fix

Fix tests for lastTransitionTime

Fix integration tests

Signed-off-by: annelau <annelau@salesforce.com>
Co-authored-by: annelau <annelau@salesforce.com>
2026-01-08 01:53:14 +00:00
Carlos Cardeñosa
1b40e72e0b Fix race condition: wait for CA bundle ConfigMap before applying CRDs (#1309)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 14s
Post / images (amd64, addon-manager) (push) Failing after 7m59s
Post / coverage (push) Failing after 8m58s
Post / images (amd64, registration) (push) Failing after 52s
Post / images (amd64, registration-operator) (push) Failing after 50s
Post / images (amd64, work) (push) Failing after 48s
Post / images (arm64, placement) (push) Failing after 48s
Post / images (arm64, registration) (push) Failing after 47s
Post / images (arm64, registration-operator) (push) Failing after 46s
Post / images (arm64, work) (push) Failing after 45s
Post / images (amd64, placement) (push) Failing after 7m34s
Post / images (arm64, addon-manager) (push) Failing after 9m56s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Successful in 1m3s
The cluster manager controller was silently using a literal "placeholder"
string as the CA bundle when the ca-bundle-configmap ConfigMap didn't exist
yet. This caused CRDs to be created with an invalid caBundle field
(cGxhY2Vob2xkZXI= which is base64 of "placeholder"), resulting in:

1. CRD conversion webhooks failing with "InvalidCABundle" error
2. CRDs not becoming Established
3. API endpoints not being registered
4. Dependent components (like MultiClusterHub) failing with:
   "no matches for kind ClusterManagementAddOn"

The bug was a race condition between the cert rotation controller (which
creates the ca-bundle-configmap) and the cluster manager controller (which
reads it). When the ConfigMap was not found, the code did "// do nothing"
and silently continued with the placeholder value.

This fix:
1. Creates the hub namespace FIRST (before waiting for the CA bundle)
   to allow the cert rotation controller to create the ca-bundle-configmap
2. Then waits for the CA bundle ConfigMap to exist before proceeding
3. Requeues via AddAfter if the ConfigMap is not found, allowing the
   controller to gracefully retry until the cert rotation controller
   has created it

This ensures CRDs are always created with valid CA bundles while avoiding
the deadlock where clusterManagerController waited for CA bundle but
certRotationController needed the namespace first.

Changes based on review feedback:
- Use requeue (AddAfter) instead of returning error (@elgnay)
- Use contextual logging instead of klog.V(4).Infof (@qiujian16)

The issue was discovered in OpenShift CI Prow jobs for ZTP hub deployment:
- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-kni-eco-ci-cd-ztp-left-shifting-kpi-ci-4.21-telcov10n-virtualised-single-node-hub-ztp/2005051399989104640
- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-kni-eco-ci-cd-ztp-left-shifting-kpi-ci-4.21-telcov10n-virtualised-single-node-hub-ztp/2005219283428184064

Affected versions: ACM 2.16.0-113/114, MCE 2.11.0-142/143 on OCP 4.21.0-rc.0

Signed-off-by: Carlos Cardenosa <ccardeno@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-01-07 14:35:55 +00:00
Anne Lau
ff9f801aa0 Fix transition time for Applied + StatusFeedbackSynced (#1282)
Some checks failed
Post / coverage (push) Failing after 7m10s
Post / images (amd64, addon-manager) (push) Failing after 43s
Post / images (amd64, placement) (push) Failing after 36s
Post / images (amd64, registration) (push) Failing after 36s
Post / images (amd64, registration-operator) (push) Failing after 36s
Post / images (amd64, work) (push) Failing after 38s
Post / images (arm64, placement) (push) Failing after 37s
Post / images (arm64, registration) (push) Failing after 37s
Post / images (arm64, registration-operator) (push) Failing after 38s
Post / images (arm64, work) (push) Failing after 38s
Post / images (arm64, addon-manager) (push) Failing after 14m20s
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m28s
Post / image manifest (addon-manager) (push) Has been cancelled
Post / image manifest (placement) (push) Has been cancelled
Post / image manifest (registration) (push) Has been cancelled
Post / image manifest (registration-operator) (push) Has been cancelled
Post / image manifest (work) (push) Has been cancelled
Post / trigger clusteradm e2e (push) Has been cancelled
Close stale issues and PRs / stale (push) Successful in 4s
Update code changes to only update observed generation without lastTransitionTime

Update with simple tests

Update with the latest PR changes

Add unit test changes

Add integration test generated by cursor

Fix unit tests

Signed-off-by: annelau <annelau@salesforce.com>
Co-authored-by: annelau <annelau@salesforce.com>
2025-12-31 02:27:59 +00:00
Qing Hao
c516beffa6 Add addon conversion webhook for v1alpha1/v1beta1 API migration (#1289)
Some checks failed
Post / images (amd64, addon-manager) (push) Failing after 46s
Post / images (amd64, placement) (push) Failing after 41s
Post / images (amd64, registration-operator) (push) Failing after 39s
Post / images (amd64, work) (push) Failing after 42s
Post / images (arm64, addon-manager) (push) Failing after 39s
Post / images (arm64, placement) (push) Failing after 39s
Post / images (arm64, registration) (push) Failing after 40s
Post / images (arm64, registration-operator) (push) Failing after 42s
Post / images (arm64, work) (push) Failing after 39s
Post / images (amd64, registration) (push) Failing after 7m46s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Post / coverage (push) Failing after 14m33s
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m25s
Close stale issues and PRs / stale (push) Successful in 46s
* Add addon conversion webhook for v1alpha1/v1beta1 API migration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Qing Hao <qhao@redhat.com>

* Fix GroupVersion compatibility issues after API dependency update

This commit fixes compilation and test errors introduced by updating
the API dependency to use native conversion functions from PR #411.

Changes include:

1. Fix GroupVersion type mismatches across the codebase:
   - Updated OwnerReference creation to use schema.GroupVersion
   - Fixed webhook scheme registration to use proper GroupVersion type
   - Applied fixes to addon, placement, migration, work, and registration controllers

2. Enhance addon conversion webhook:
   - Use native API conversion functions from addon/v1beta1/conversion.go
   - Fix InstallNamespace annotation key to match expected format
   - Add custom logic to populate deprecated ConfigReferent field in ConfigReferences
   - Properly preserve annotations during v1alpha1 <-> v1beta1 conversion

3. Remove duplicate conversion code:
   - Deleted pkg/addon/webhook/conversion/ directory (~500 lines)
   - Now using native conversion functions from the API repository

4. Patch vendored addon-framework:
   - Fixed GroupVersion errors in agentdeploy utils

All unit tests pass successfully (97 packages, 0 failures).

Signed-off-by: Qing Hao <qhao@redhat.com>

---------

Signed-off-by: Qing Hao <qhao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-24 08:26:35 +00:00
Jian Qiu
78daf0d2ae fix: skip GC for ManifestWorks managed by ManifestWorkReplicaSet (#1299)
Skip garbage collection for ManifestWorks that have the
ManifestWorkReplicaSet controller label, as these should be
managed exclusively by the ManifestWorkReplicaSet controller.

Changes:
- Fix logic bug in controller to properly check for ReplicaSet label
- Add unit tests for label-based GC skip behavior
- Add integration test to verify GC skip for ReplicaSet-managed works

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Jian Qiu <jqiu@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-16 08:56:07 +00:00
Jian Qiu
99265f6113 Refactor to contextual logging (#1283)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m25s
Post / coverage (push) Failing after 36m59s
Post / images (amd64, addon-manager) (push) Failing after 7m34s
Post / images (amd64, placement) (push) Failing after 7m4s
Post / images (amd64, registration) (push) Failing after 7m8s
Post / images (amd64, registration-operator) (push) Failing after 7m3s
Post / images (amd64, work) (push) Failing after 6m59s
Post / images (arm64, addon-manager) (push) Failing after 7m0s
Post / images (arm64, placement) (push) Failing after 6m54s
Post / images (arm64, registration) (push) Failing after 6m55s
Post / images (arm64, registration-operator) (push) Failing after 6m55s
Post / images (arm64, work) (push) Failing after 7m16s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-12-08 08:14:30 +00:00
Jian Qiu
a06e37e65c 🌱 Integrate SDK logging tracing into work agent controllers (#1277)
This change adds log tracing support to the work agent controllers by:
- Upgrading SDK to version with logging.SetLogTracingByObject helper
- Setting tracing keys from ManifestWork objects in all work controllers
- Adding clusterName to the base logger for better log context
- Propagating tracing context through cloud events

The tracing keys enable better correlation of logs across the work
lifecycle from source to agent.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Jian Qiu <jqiu@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-04 13:12:38 +00:00
Jian Qiu
33310619d9 🌱 use SDK basecontroller for better logging. (#1269)
* Use basecontroller in sdk-go instead for better logging

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Rename to fakeSyncContext

Signed-off-by: Jian Qiu <jqiu@redhat.com>

---------

Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-12-01 03:07:02 +00:00
Qing Hao
26edb9423a fix: Check Applied condition before evaluating rollout status (#1243)
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Qing Hao <qhao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-01 02:14:46 +00:00
Jian Qiu
8f8cd01b52 Update dependencies: k8s 0.34.1, controller-runtime 0.22.3, and OCM libs (#1267)
Some checks failed
Post / coverage (push) Failing after 37m28s
Post / images (amd64, addon-manager) (push) Failing after 7m29s
Post / images (amd64, placement) (push) Failing after 7m1s
Post / images (amd64, registration) (push) Failing after 7m7s
Post / images (amd64, registration-operator) (push) Failing after 7m22s
Post / images (amd64, work) (push) Failing after 7m25s
Post / images (arm64, addon-manager) (push) Failing after 7m5s
Post / images (arm64, placement) (push) Failing after 7m4s
Post / images (arm64, registration) (push) Failing after 7m20s
Post / images (arm64, registration-operator) (push) Failing after 7m9s
Post / images (arm64, work) (push) Failing after 7m12s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Scorecard supply-chain security / Scorecard analysis (push) Failing after 59s
Close stale issues and PRs / stale (push) Successful in 29s
- Update k8s.io/* libraries to v0.34.1
- Update sigs.k8s.io/controller-runtime to v0.22.3
- Update open-cluster-management.io/api to 2337d27c3b7f
- Update open-cluster-management.io/sdk-go to a185f88d7b1b
- Update open-cluster-management.io/addon-framework to 1a0a9be61322
- Update openshift libraries (api, client-go, library-go) to latest commits
  for structured-merge-diff v6 compatibility
- Add Recorder() method to FakeSDKSyncContext with adapter pattern to bridge
  openshift/library-go and SDK event recorder interfaces
- Update vendor directory and regenerate CRDs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Jian Qiu <jqiu@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-11-26 05:56:58 +00:00
Jian Qiu
eb033993c2 🌱 Use base controller in sdk-go (#1251)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m11s
Post / coverage (push) Failing after 37m30s
Post / images (amd64, addon-manager) (push) Failing after 7m29s
Post / images (amd64, placement) (push) Failing after 6m57s
Post / images (amd64, registration) (push) Failing after 7m5s
Post / images (amd64, registration-operator) (push) Failing after 7m5s
Post / images (amd64, work) (push) Failing after 7m2s
Post / images (arm64, addon-manager) (push) Failing after 7m18s
Post / images (arm64, placement) (push) Failing after 7m7s
Post / images (arm64, registration) (push) Failing after 7m13s
Post / images (arm64, registration-operator) (push) Failing after 7m6s
Post / images (arm64, work) (push) Failing after 7m2s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Successful in 45s
* Use base controller in sdk-go

We can leverage contextual logger in base controller.

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Fix integration test error

Signed-off-by: Jian Qiu <jqiu@redhat.com>

---------

Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-11-20 07:53:42 +00:00
Wei Liu
b928d9f2a9 update sdk-go (#1257)
Some checks failed
Post / coverage (push) Failing after 38m23s
Post / images (amd64, addon-manager) (push) Failing after 7m53s
Post / images (amd64, placement) (push) Failing after 6m57s
Post / images (amd64, registration) (push) Failing after 7m7s
Post / images (amd64, registration-operator) (push) Failing after 7m1s
Post / images (amd64, work) (push) Failing after 7m8s
Post / images (arm64, addon-manager) (push) Failing after 7m10s
Post / images (arm64, placement) (push) Failing after 7m11s
Post / images (arm64, registration) (push) Failing after 6m58s
Post / images (arm64, registration-operator) (push) Failing after 7m17s
Post / images (arm64, work) (push) Failing after 7m18s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m15s
Close stale issues and PRs / stale (push) Successful in 41s
Signed-off-by: Wei Liu <liuweixa@redhat.com>
2025-11-19 04:08:16 +00:00
Zhiwei Yin
76449f862c support loadBalancer for grpc endpoint type (#1255)
Signed-off-by: Zhiwei Yin <zyin@redhat.com>
2025-11-19 02:39:54 +00:00
Jian Qiu
5528aff6d3 🌱 Add contextual logging for work agent (#1242)
* Add contextual logging for work agent

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Resolve comments

Signed-off-by: Jian Qiu <jqiu@redhat.com>

---------

Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-11-07 05:28:13 +00:00
Zhiwei Yin
d80ec55608 add server configuration for clusterManager helm chart (#1239)
Signed-off-by: Zhiwei Yin <zyin@redhat.com>
2025-11-05 06:44:23 +00:00
Yang Le
1384645b10 🌱 optimize the requeue timing for lease controller (#1236)
Signed-off-by: Yang Le <yangle@redhat.com>
2025-10-31 04:21:55 +00:00
Wei Liu
8d46ca188a only init cluster and csr client in bootstrap phase (#1224)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 19s
Post / coverage (push) Failing after 18s
Post / images (amd64, addon-manager) (push) Failing after 14s
Post / images (amd64, placement) (push) Failing after 19s
Post / images (amd64, registration) (push) Failing after 19s
Post / images (amd64, registration-operator) (push) Failing after 19s
Post / images (amd64, work) (push) Failing after 20s
Post / images (arm64, addon-manager) (push) Failing after 17s
Post / images (arm64, placement) (push) Failing after 16s
Post / images (arm64, registration) (push) Failing after 14s
Post / images (arm64, registration-operator) (push) Failing after 19s
Post / images (arm64, work) (push) Failing after 20s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Failing after 33s
Signed-off-by: Wei Liu <liuweixa@redhat.com>
2025-10-28 14:17:36 +00:00
Qing Hao
34cd9a2549 Update rollout logic to use Progressing condition instead of WorkApplied (#1207)
Some checks failed
Post / coverage (push) Failing after 22s
Post / images (amd64, addon-manager) (push) Failing after 17s
Post / images (amd64, placement) (push) Failing after 25s
Post / images (amd64, registration) (push) Failing after 17s
Post / images (amd64, registration-operator) (push) Failing after 18s
Post / images (amd64, work) (push) Failing after 26s
Post / images (arm64, addon-manager) (push) Failing after 16s
Post / images (arm64, placement) (push) Failing after 21s
Post / images (arm64, registration) (push) Failing after 25s
Post / images (arm64, registration-operator) (push) Failing after 27s
Post / images (arm64, work) (push) Failing after 23s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Scorecard supply-chain security / Scorecard analysis (push) Failing after 20s
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Qing Hao <qhao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-10-22 02:31:40 +00:00
Jian Zhu
8a2a4a5e6b Add test for empty agentInstallNamespace using template namespace (#1213)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 24s
Post / coverage (push) Failing after 24s
Post / images (amd64, addon-manager) (push) Failing after 27s
Post / images (amd64, placement) (push) Failing after 22s
Post / images (amd64, registration) (push) Failing after 17s
Post / images (amd64, registration-operator) (push) Failing after 27s
Post / images (amd64, work) (push) Failing after 17s
Post / images (arm64, addon-manager) (push) Failing after 19s
Post / images (arm64, placement) (push) Failing after 27s
Post / images (arm64, registration) (push) Failing after 26s
Post / images (arm64, registration-operator) (push) Failing after 33s
Post / images (arm64, work) (push) Failing after 19s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Failing after 35s
Add a test case to verify that when agentInstallNamespace is explicitly
set to an empty string in AddOnDeploymentConfig, the namespace defined
in the addonTemplate is used instead of being overridden.

This test validates the fix for issue #1209 where AddOnDeploymentConfig
was silently overriding the addonTemplate namespace even when
agentInstallNamespace was not intended to be set.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: zhujian <jiazhu@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-10-20 01:58:57 +00:00
Jian Qiu
daa9b2fa54 🐛 Avoid redundant apply and get operation in work controller (#1196)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 28s
Post / coverage (push) Failing after 22s
Post / images (amd64, addon-manager) (push) Failing after 30s
Post / images (amd64, placement) (push) Failing after 25s
Post / images (amd64, registration) (push) Failing after 16s
Post / images (amd64, registration-operator) (push) Failing after 23s
Post / images (amd64, work) (push) Failing after 17s
Post / images (arm64, addon-manager) (push) Failing after 14s
Post / images (arm64, placement) (push) Failing after 19s
Post / images (arm64, registration) (push) Failing after 23s
Post / images (arm64, registration-operator) (push) Failing after 17s
Post / images (arm64, work) (push) Failing after 19s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Failing after 31s
* Remove event after apply and add jitter when requeue

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Change event handler to avoid redundant reconciles

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Add unit tests for onAdd and onUpdate function

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Fix interegation test fail

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Set resync interval to 4-6 mins

Signed-off-by: Jian Qiu <jqiu@redhat.com>

---------

Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-10-17 02:04:40 +00:00
Wei Liu
f1e7905b16 using mw finalizer instead of resource finalizer (#1211)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 32s
Post / coverage (push) Failing after 25s
Post / images (amd64, addon-manager) (push) Failing after 30s
Post / images (amd64, placement) (push) Failing after 15s
Post / images (amd64, registration) (push) Failing after 20s
Post / images (amd64, registration-operator) (push) Failing after 27s
Post / images (amd64, work) (push) Failing after 14s
Post / images (arm64, addon-manager) (push) Failing after 15s
Post / images (arm64, placement) (push) Failing after 19s
Post / images (arm64, registration) (push) Failing after 18s
Post / images (arm64, registration-operator) (push) Failing after 19s
Post / images (arm64, work) (push) Failing after 17s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Failing after 34s
Signed-off-by: Wei Liu <liuweixa@redhat.com>
2025-10-16 09:30:26 +00:00
Zhiwei Yin
4a8b2ddb5e check registration webhook deployment for HubRegistrationDegraded condidtion (#1043)
Signed-off-by: Zhiwei Yin <zyin@redhat.com>
2025-10-16 08:19:09 +00:00
Jian Qiu
eed705a038 Fix ManifestWorkReplicaSet not deleting ManifestWorks from old placement (#1206)
When a ManifestWorkReplicaSet's placementRef was changed, the
ManifestWorks created for the old placement were not deleted,
causing orphaned resources.

The deployReconciler only processed placements currently in the spec
and never cleaned up ManifestWorks from removed placements.

This commit adds cleanup logic that:
- Builds a set of current placement names from the spec
- Lists all ManifestWorks belonging to the ManifestWorkReplicaSet
- Deletes any ManifestWorks with placement labels not in current spec

Also adds comprehensive tests:
- Integration test verifying placement change cleanup
- Unit tests for single and multiple placement change scenarios

Fixes #1203

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Jian Qiu <jqiu@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-10-13 06:31:38 +00:00
Yang Le
db92ed79d4 support managed namespaces (#1193)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m6s
Post / coverage (push) Failing after 30s
Post / images (amd64, addon-manager) (push) Failing after 19s
Post / images (amd64, placement) (push) Failing after 24s
Post / images (amd64, registration) (push) Failing after 18s
Post / images (amd64, registration-operator) (push) Failing after 14s
Post / images (amd64, work) (push) Failing after 14s
Post / images (arm64, addon-manager) (push) Failing after 22s
Post / images (arm64, placement) (push) Failing after 16s
Post / images (arm64, registration) (push) Failing after 21s
Post / images (arm64, registration-operator) (push) Failing after 16s
Post / images (arm64, work) (push) Failing after 17s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Failing after 45s
Signed-off-by: Yang Le <yangle@redhat.com>
2025-09-25 08:19:30 +00:00
Zhiwei Yin
35bab4476a add grpc config into the bootstrap secret (#1194)
Some checks failed
Post / coverage (push) Failing after 27s
Post / images (amd64, addon-manager) (push) Failing after 21s
Post / images (amd64, placement) (push) Failing after 23s
Post / images (amd64, registration) (push) Failing after 18s
Post / images (amd64, registration-operator) (push) Failing after 22s
Post / images (amd64, work) (push) Failing after 23s
Post / images (arm64, addon-manager) (push) Failing after 25s
Post / images (arm64, placement) (push) Failing after 21s
Post / images (arm64, registration) (push) Failing after 27s
Post / images (arm64, registration-operator) (push) Failing after 28s
Post / images (arm64, work) (push) Failing after 21s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Scorecard supply-chain security / Scorecard analysis (push) Failing after 33s
Close stale issues and PRs / stale (push) Failing after 31s
Signed-off-by: Zhiwei Yin <zyin@redhat.com>
2025-09-24 03:35:31 +00:00
xuezhao
010f5efe6d Add startTime initialization and wait 10s in hubTimeoutController (#1191)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 23s
Post / coverage (push) Failing after 29s
Post / images (amd64, addon-manager) (push) Failing after 27s
Post / images (amd64, placement) (push) Failing after 29s
Post / images (amd64, registration) (push) Failing after 21s
Post / images (amd64, registration-operator) (push) Failing after 20s
Post / images (amd64, work) (push) Failing after 23s
Post / images (arm64, addon-manager) (push) Failing after 26s
Post / images (arm64, placement) (push) Failing after 24s
Post / images (arm64, registration) (push) Failing after 19s
Post / images (arm64, registration-operator) (push) Failing after 26s
Post / images (arm64, work) (push) Failing after 33s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Failing after 46s
Signed-off-by: xuezhaojun <zxue@redhat.com>
2025-09-23 07:26:48 +00:00
Jian Qiu
2f04992d6c Deleted manifestwork when it is completed for ttl seconds. (#1158)
* Delete manifestwork when it is completed after ttl

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Fix integration test

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Update operator and e2e tests

Signed-off-by: Jian Qiu <jqiu@redhat.com>

---------

Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-09-23 02:23:47 +00:00
Jian Qiu
8cb4f13e74 Improve unit test coverage for low-coverage packages (#1188)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 42s
Post / coverage (push) Failing after 37s
Post / images (amd64, addon-manager) (push) Failing after 37s
Post / images (amd64, placement) (push) Failing after 35s
Post / images (amd64, registration) (push) Failing after 33s
Post / images (amd64, registration-operator) (push) Failing after 36s
Post / images (amd64, work) (push) Failing after 33s
Post / images (arm64, addon-manager) (push) Failing after 33s
Post / images (arm64, placement) (push) Failing after 34s
Post / images (arm64, registration) (push) Failing after 36s
Post / images (arm64, registration-operator) (push) Failing after 38s
Post / images (arm64, work) (push) Failing after 39s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Failing after 40s
This commit enhances unit test coverage for packages with the lowest
test coverage, focusing on previously untested methods and edge cases.

Changes:
- pkg/server/grpc: Increased coverage from 31.6% to 81.6%
  - Added comprehensive tests for Clients.Run() method
  - Added tests for GRPCServerOptions.Run() method
  - Covered error handling, configuration validation, and context cancellation

- pkg/singleton/spoke: Enhanced test suite with additional edge cases
  - Added method signature validation tests
  - Added configuration setup and struct initialization tests
  - Fixed race condition issues in existing tests

- pkg/server/grpc coverage improvements:
  - Clients.Run(): 0% → 100% coverage
  - GRPCServerOptions.Run(): 0% → 88.2% coverage

The new tests cover normal operation, error conditions, edge cases,
and defensive programming scenarios, significantly improving overall
code quality and test reliability.

🤖 Generated with [Claude Code](https://claude.ai/code)

Signed-off-by: Jian Qiu <jqiu@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-09-19 06:22:38 +00:00
Suvaansh
6056c04893 Adding labels to the resources created by work controller (#1176)
Signed-off-by: suvaanshkumar <suvaanshkumar@gmail.com>
2025-09-19 02:24:46 +00:00
Zhiwei Yin
dab97728e2 support cluster import config secret (#1170)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 35s
Post / coverage (push) Failing after 27s
Post / images (amd64, addon-manager) (push) Failing after 34s
Post / images (amd64, placement) (push) Failing after 29s
Post / images (amd64, registration) (push) Failing after 27s
Post / images (amd64, registration-operator) (push) Failing after 27s
Post / images (amd64, work) (push) Failing after 33s
Post / images (arm64, addon-manager) (push) Failing after 29s
Post / images (arm64, placement) (push) Failing after 28s
Post / images (arm64, registration) (push) Failing after 27s
Post / images (arm64, registration-operator) (push) Failing after 29s
Post / images (arm64, work) (push) Failing after 29s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Signed-off-by: Zhiwei Yin <zyin@redhat.com>
2025-09-18 06:47:16 +00:00
Jian Zhu
cca5ba8b42 🌱 Improve addon template controller logging and template mode support (#1184)
Some checks failed
Post / coverage (push) Failing after 29s
Post / images (amd64, addon-manager) (push) Failing after 29s
Post / images (amd64, placement) (push) Failing after 24s
Post / images (amd64, registration) (push) Failing after 41s
Post / images (amd64, registration-operator) (push) Failing after 23s
Post / images (amd64, work) (push) Failing after 36s
Post / images (arm64, addon-manager) (push) Failing after 33s
Post / images (arm64, placement) (push) Failing after 29s
Post / images (arm64, registration) (push) Failing after 31s
Post / images (arm64, registration-operator) (push) Failing after 28s
Post / images (arm64, work) (push) Failing after 31s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Scorecard supply-chain security / Scorecard analysis (push) Failing after 41s
Close stale issues and PRs / stale (push) Failing after 31s
* Upgrade addon template to the latest version

Signed-off-by: zhujian <jiazhu@redhat.com>

* Improve addon template controller logging and template mode

- Reduce log verbosity for duplicate manager check to V(4)
- Enable template mode in addon manager for better template support

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

---------

Signed-off-by: zhujian <jiazhu@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-09-16 06:41:01 +00:00
Jian Zhu
1125e4c33d 🌱 Remove resolved TODO comments (#1177)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 50s
Post / coverage (push) Failing after 55s
Post / images (amd64, addon-manager) (push) Failing after 36s
Post / images (amd64, placement) (push) Failing after 40s
Post / images (amd64, registration) (push) Failing after 28s
Post / images (amd64, registration-operator) (push) Failing after 20s
Post / images (amd64, work) (push) Failing after 29s
Post / images (arm64, addon-manager) (push) Failing after 33s
Post / images (arm64, placement) (push) Failing after 32s
Post / images (arm64, registration) (push) Failing after 25s
Post / images (arm64, registration-operator) (push) Failing after 33s
Post / images (arm64, work) (push) Failing after 29s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
* 🧹 Remove resolved TODO comments

- Remove TODO comment about confirming subject in CustomSignerConfigurations
- Remove TODO comment about namespace value in manifestwork validating webhook

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* Ensure addon namespace and image pull secret sync

- Remove outdated TODO comment about addon deployment in Hosted mode
- Namespace creation is responsibility of addon developers
- Image pull secrets are synced to namespaces with addon.open-cluster-management.io/namespace label by the AddonPullImageSecretController

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

---------

Signed-off-by: zhujian <jiazhu@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-09-15 02:20:53 +00:00
Jian Qiu
df5b456f88 Add more unit test to improve coverage (#1183)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 35s
Post / coverage (push) Failing after 28s
Post / images (amd64, addon-manager) (push) Failing after 29s
Post / images (amd64, placement) (push) Failing after 27s
Post / images (amd64, registration) (push) Failing after 31s
Post / images (amd64, registration-operator) (push) Failing after 28s
Post / images (amd64, work) (push) Failing after 31s
Post / images (arm64, addon-manager) (push) Failing after 32s
Post / images (arm64, placement) (push) Failing after 32s
Post / images (arm64, registration) (push) Failing after 32s
Post / images (arm64, registration-operator) (push) Failing after 34s
Post / images (arm64, work) (push) Failing after 31s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Failing after 43s
Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-09-12 08:30:33 +00:00
Jian Zhu
ab41b86b8d 🐛 Use specific addon template instead of default in CSR functions (#1180)
* Upgrade addon framework

Signed-off-by: zhujian <jiazhu@redhat.com>

* Use specific addon template instead of default in CSR functions

- Pass real ManagedClusterAddOn to GetDesiredAddOnTemplate instead of nil
- Enable per-addon template selection using addon.Status.ConfigReferences
- Replace utilruntime.HandleError with explicit error returns
- Update CSRConfigurationsFunc to return ([]RegistrationConfig, error)
- Update CSRSignerFunc to return ([]byte, error)
- Add addon parameter to CSR functions for better context
- Convert runtime errors to structured logging with cluster/addon context
- Update tests to verify error conditions

This allows each ManagedClusterAddOn instance to use its specific template
configuration rather than falling back to the ClusterManagementAddon default.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* Fix error assertion logic in registration tests and improve error handling

- Fix inverted error assertion logic in TestTemplateCSRConfigurationsFunc and TestTemplateCSRSignFunc
- Change tests to properly check if expectedErr is empty vs non-empty
- When no error expected, assert err == nil; when error expected, assert err != nil and contains substring
- Fix strings.Contains argument order to check if actual error contains expected substring
- Add nil template checks with proper error messages in CSRSign and PermissionConfig functions
- Improve logging consistency with clusterName/addonName format across CSR functions
- Guard against nil pointer access by checking err == nil before calling err.Error()

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

---------

Signed-off-by: zhujian <jiazhu@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-09-12 06:42:44 +00:00
Yang Le
6d4561ca9a 🐛 fix addon namespace deletion in multi-hub scenarios (#1153)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 33s
Post / coverage (push) Failing after 40s
Post / images (amd64, addon-manager) (push) Failing after 33s
Post / images (amd64, placement) (push) Failing after 34s
Post / images (amd64, registration) (push) Failing after 33s
Post / images (amd64, registration-operator) (push) Failing after 32s
Post / images (amd64, work) (push) Failing after 38s
Post / images (arm64, addon-manager) (push) Failing after 39s
Post / images (arm64, placement) (push) Failing after 40s
Post / images (arm64, registration) (push) Failing after 35s
Post / images (arm64, registration-operator) (push) Failing after 15s
Post / images (arm64, work) (push) Failing after 18s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Failing after 30s
Signed-off-by: Yang Le <yangle@redhat.com>
2025-09-11 07:54:39 +00:00
Jian Qiu
f9d4628f17 Enhance test coverage (#1174)
Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-09-11 07:26:59 +00:00
Jian Zhu
b506d16cf8 🐛 Fix ManagedClusterAddons not removed when ClusterManagementAddon is deleted (#1160)
Some checks failed
Post / coverage (push) Failing after 38s
Post / images (amd64, addon-manager) (push) Failing after 33s
Post / images (amd64, placement) (push) Failing after 41s
Post / images (amd64, registration) (push) Failing after 40s
Post / images (amd64, registration-operator) (push) Failing after 38s
Post / images (amd64, work) (push) Failing after 36s
Post / images (arm64, addon-manager) (push) Failing after 35s
Post / images (arm64, placement) (push) Failing after 39s
Post / images (arm64, registration) (push) Failing after 34s
Post / images (arm64, registration-operator) (push) Failing after 33s
Post / images (arm64, work) (push) Failing after 35s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Scorecard supply-chain security / Scorecard analysis (push) Failing after 41s
Close stale issues and PRs / stale (push) Failing after 27s
* Fix ManagedClusterAddons not removed when ClusterManagementAddon is deleted

The addon template controller was stopping addon managers immediately when
ClusterManagementAddon was deleted, without waiting for pre-delete jobs
to complete or ManagedClusterAddons to be cleaned up via owner reference
cascading deletion.

This change implements the TODO at line 105 by checking if all
ManagedClusterAddons are deleted before stopping the manager. The controller
now uses field selectors to efficiently query for remaining ManagedClusterAddons
and requeues after 10 seconds if any still exist, allowing time for proper
cleanup.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* add e2e test

Signed-off-by: zhujian <jiazhu@redhat.com>

* return err when stopUnusedManagers failed

Signed-off-by: zhujian <jiazhu@redhat.com>

* Address review comments for addon manager deletion fix

- Use lister instead of API client for better performance
- Add named constant for requeue delay
- Fix test cache synchronization issues
- Improve test coverage from 74.7% to 75.6%

Addresses review feedback from Qiujian16 and CodeRabbit.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* Fix e2e test timeout for configmap deletion check

Add explicit 180s timeout for pre-delete job configmap cleanup.
The default 90s timeout was insufficient for the deletion workflow.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* Improve error logging in template agent

- Replace utilruntime.HandleError with structured logging in CSR functions
- Add more context to error messages for better debugging
- Use logger.Info for template retrieval errors to provide better visibility

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* Use ManagedClusterAddonByName index for efficient lookup

- Replace inefficient list-and-filter with indexed lookup
- Add managedClusterAddonIndexer field to controller struct
- Update comment to accurately describe functionality
- Fix unit tests to properly set up the required index

This addresses the PR review feedback to use the existing index
instead of listing all ManagedClusterAddOns and filtering by name.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* Remove unused mcaLister field

Since we now use managedClusterAddonIndexer for efficient lookup,
the mcaLister field is no longer needed. This cleanup reduces
memory usage and simplifies the controller structure.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* Replace inefficient list-and-filter with indexed lookup in runController

Use managedClusterAddonIndexer.ByIndex() instead of listing all ManagedClusterAddOns
and filtering by name. This provides O(1) indexed lookup instead of O(n) linear scan.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* Fix review comments for addon manager deletion

- Fix closure capture bug in controller test by using captured variables
- Fix typo 'copyiedConfig' to 'copiedConfig' in e2e tests

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* Optimize ManagedClusterAddOn event handling in addon template controller

Replace filtered event handling with custom event handlers that only trigger
reconciliation when AddOnTemplate configReferences actually change. This
reduces unnecessary reconciliation cycles by using reflect.DeepEqual to
compare config references between old and new objects.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* Revert "Optimize ManagedClusterAddOn event handling in addon template controller"

This reverts commit 4649d1b9ac.

Signed-off-by: zhujian <jiazhu@redhat.com>

---------

Signed-off-by: zhujian <jiazhu@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-09-10 01:30:19 +00:00
Jian Qiu
b4b42aa0b5 Requeue ssar check if only hubKubeConfigSecret is unauthorized (#1169) (#1164)
Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-09-08 07:11:44 +00:00
Jian Qiu
7d42f5f9f6 Requeue ssar check if only hubKubeConfigSecret is unauthorized (#1169)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 39s
Post / coverage (push) Failing after 28s
Post / images (amd64, addon-manager) (push) Failing after 27s
Post / images (amd64, placement) (push) Failing after 29s
Post / images (amd64, registration) (push) Failing after 27s
Post / images (amd64, registration-operator) (push) Failing after 33s
Post / images (amd64, work) (push) Failing after 29s
Post / images (arm64, addon-manager) (push) Failing after 27s
Post / images (arm64, placement) (push) Failing after 26s
Post / images (arm64, registration) (push) Failing after 32s
Post / images (arm64, registration-operator) (push) Failing after 31s
Post / images (arm64, work) (push) Failing after 34s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Successful in 50s
Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-09-05 02:26:30 +00:00