107 Commits

Author SHA1 Message Date
Jian Qiu
63d9574ca2 Add watch-based feedback with dynamic informer lifecycle management (#1350)
* Add watch-based feedback with dynamic informer lifecycle management

Implements dynamic informer registration and cleanup for resources
configured with watch-based status feedback (FeedbackScrapeType=Watch).
This enables real-time status updates for watched resources while
efficiently managing resource lifecycle.

Features:
- Automatically register informers for resources with FeedbackWatchType
- Skip informer registration for FeedbackPollType or when not configured
- Clean up informers when resources are removed from manifestwork
- Clean up informers during applied manifestwork finalization
- Clean up informers when feedback type changes from watch to poll

Implementation:
- Refactored ObjectReader to interface for better modularity
- Added UnRegisterInformerFromAppliedManifestWork helper for bulk cleanup
- Enhanced AvailableStatusController to conditionally register informers
- Updated finalization controllers to unregister informers on cleanup
- Added nil safety checks to prevent panics during cleanup

Testing:
- Unit tests for informer registration based on feedback type
- Unit tests for bulk unregistration and nil safety
- Integration test for end-to-end watch-based feedback workflow
- Integration test for informer cleanup on manifestwork deletion
- All existing tests updated and passing

This feature improves performance by using watch-based updates for
real-time status feedback while maintaining efficient resource cleanup.

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Fallback to get from client when informer is not synced

Signed-off-by: Jian Qiu <jqiu@redhat.com>

---------

Signed-off-by: Jian Qiu <jqiu@redhat.com>
2026-01-29 06:46:21 +00:00
Jian Qiu
9b010ef622 🌱 build object reader to get resource object from spoke (#1324)
Some checks failed
Post / images (amd64, addon-manager) (push) Failing after 51s
Post / images (amd64, placement) (push) Failing after 46s
Post / images (amd64, registration) (push) Failing after 43s
Post / images (amd64, registration-operator) (push) Failing after 44s
Post / images (amd64, work) (push) Failing after 44s
Post / images (arm64, addon-manager) (push) Failing after 43s
Post / images (arm64, placement) (push) Failing after 43s
Post / images (arm64, registration) (push) Failing after 42s
Post / images (arm64, registration-operator) (push) Failing after 43s
Post / images (arm64, work) (push) Failing after 41s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Scorecard supply-chain security / Scorecard analysis (push) Failing after 8m56s
Post / coverage (push) Failing after 13m3s
Close stale issues and PRs / stale (push) Successful in 42s
* A resource informer code to watch resources

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Use object reader in controller

Signed-off-by: Jian Qiu <jqiu@redhat.com>

---------

Signed-off-by: Jian Qiu <jqiu@redhat.com>
2026-01-23 07:32:13 +00:00
Zhiwei Yin
9a1e925112 ensure immediate requeue for transient errors when work spec is changed (#1335)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 22s
Post / images (amd64, addon-manager) (push) Failing after 51s
Post / images (amd64, placement) (push) Failing after 46s
Post / images (amd64, registration) (push) Failing after 44s
Post / images (amd64, registration-operator) (push) Failing after 44s
Post / images (amd64, work) (push) Failing after 46s
Post / images (arm64, placement) (push) Failing after 45s
Post / images (arm64, registration) (push) Failing after 45s
Post / images (arm64, registration-operator) (push) Failing after 44s
Post / images (arm64, work) (push) Failing after 45s
Post / images (arm64, addon-manager) (push) Failing after 16m21s
Post / coverage (push) Failing after 39m14s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Signed-off-by: Zhiwei Yin <zyin@redhat.com>
2026-01-19 07:57:39 +00:00
xuezhao
d83c822129 Add duplicate manifest detection in ManifestWork webhook validation (#1310)
This commit adds validation to detect and reject duplicate manifests
in ManifestWork resources. A manifest is considered duplicate when
it has the same apiVersion, kind, namespace, and name as another
manifest in the same ManifestWork.

This prevents issues where duplicate manifests with different specs
can cause state inconsistency, as the Work Agent applies manifests
sequentially and later entries would overwrite earlier ones.

The validation returns a clear error message indicating the duplicate
manifest's index and the index of its first occurrence.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: xuezhaojun <zxue@redhat.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 06:09:25 +00:00
Jian Zhu
c69a2586e5 fix: ensure immediate eviction after grace period expires (#1330)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m3s
Post / images (amd64, addon-manager) (push) Failing after 7m31s
Post / coverage (push) Failing after 9m30s
Post / images (amd64, registration-operator) (push) Failing after 57s
Post / images (amd64, work) (push) Failing after 52s
Post / images (arm64, addon-manager) (push) Failing after 50s
Post / images (arm64, placement) (push) Failing after 52s
Post / images (arm64, registration) (push) Failing after 50s
Post / images (arm64, registration-operator) (push) Failing after 52s
Post / images (arm64, work) (push) Failing after 49s
Post / images (amd64, registration) (push) Failing after 7m6s
Post / images (amd64, placement) (push) Failing after 27m47s
Post / image manifest (addon-manager) (push) Has been cancelled
Post / image manifest (placement) (push) Has been cancelled
Post / image manifest (registration) (push) Has been cancelled
Post / image manifest (registration-operator) (push) Has been cancelled
Post / image manifest (work) (push) Has been cancelled
Post / trigger clusteradm e2e (push) Has been cancelled
Close stale issues and PRs / stale (push) Successful in 3s
Fixed a bug where AppliedManifestWorks were not evicted immediately
after the appliedmanifestwork-eviction-grace-period expired.

Root cause: The controller used an exponential backoff rate limiter
to schedule requeue delays, which caused:
1. Exponentially increasing delays during grace period (1min -> 2min -> 4min...)
2. Unpredictable delays after grace period expired

Solution: Replace rate limiter with direct time calculation. Now the
controller calculates the exact remaining time until eviction and
schedules the next sync for that precise moment:
  remainingTime := evictionTime.Sub(now)

Changes:
- Removed rateLimiter field and workqueue import
- Calculate exact remaining time instead of using exponential backoff
- Added V(4) logging to show scheduled eviction time and remaining time
- Updated unit test expectations (queue length 0 for delayed items)

Impact: AppliedManifestWorks are now evicted immediately when the
grace period expires, instead of being delayed by minutes due to
exponential backoff.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: zhujian <jiazhu@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-16 09:48:50 +00:00
Anne Lau
4dc99cd621 Progressing status conditions, true wins (#1332)
Signed-off-by: annelau <annelau@salesforce.com>
Co-authored-by: annelau <annelau@salesforce.com>
2026-01-16 06:53:14 +00:00
Anne Lau
635b0ff7e9 PlacementRollout to reflect Ready status (#1281)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 20s
Post / images (amd64, placement) (push) Failing after 45s
Post / images (amd64, registration) (push) Failing after 42s
Post / images (amd64, registration-operator) (push) Failing after 40s
Post / images (amd64, work) (push) Failing after 41s
Post / images (arm64, addon-manager) (push) Failing after 41s
Post / images (arm64, placement) (push) Failing after 40s
Post / images (arm64, registration) (push) Failing after 39s
Post / images (arm64, registration-operator) (push) Failing after 39s
Post / images (arm64, work) (push) Failing after 41s
Post / images (amd64, addon-manager) (push) Failing after 7m30s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Post / coverage (push) Failing after 9m44s
Update with success count

Remove status references

Add unit tests

Fix unit tests

Update unit tests
Test fix

Fix tests for lastTransitionTime

Fix integration tests

Signed-off-by: annelau <annelau@salesforce.com>
Co-authored-by: annelau <annelau@salesforce.com>
2026-01-08 01:53:14 +00:00
Anne Lau
ff9f801aa0 Fix transition time for Applied + StatusFeedbackSynced (#1282)
Some checks failed
Post / coverage (push) Failing after 7m10s
Post / images (amd64, addon-manager) (push) Failing after 43s
Post / images (amd64, placement) (push) Failing after 36s
Post / images (amd64, registration) (push) Failing after 36s
Post / images (amd64, registration-operator) (push) Failing after 36s
Post / images (amd64, work) (push) Failing after 38s
Post / images (arm64, placement) (push) Failing after 37s
Post / images (arm64, registration) (push) Failing after 37s
Post / images (arm64, registration-operator) (push) Failing after 38s
Post / images (arm64, work) (push) Failing after 38s
Post / images (arm64, addon-manager) (push) Failing after 14m20s
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m28s
Post / image manifest (addon-manager) (push) Has been cancelled
Post / image manifest (placement) (push) Has been cancelled
Post / image manifest (registration) (push) Has been cancelled
Post / image manifest (registration-operator) (push) Has been cancelled
Post / image manifest (work) (push) Has been cancelled
Post / trigger clusteradm e2e (push) Has been cancelled
Close stale issues and PRs / stale (push) Successful in 4s
Update code changes to only update observed generation without lastTransitionTime

Update with simple tests

Update with the latest PR changes

Add unit test changes

Add integration test generated by cursor

Fix unit tests

Signed-off-by: annelau <annelau@salesforce.com>
Co-authored-by: annelau <annelau@salesforce.com>
2025-12-31 02:27:59 +00:00
Qing Hao
c516beffa6 Add addon conversion webhook for v1alpha1/v1beta1 API migration (#1289)
Some checks failed
Post / images (amd64, addon-manager) (push) Failing after 46s
Post / images (amd64, placement) (push) Failing after 41s
Post / images (amd64, registration-operator) (push) Failing after 39s
Post / images (amd64, work) (push) Failing after 42s
Post / images (arm64, addon-manager) (push) Failing after 39s
Post / images (arm64, placement) (push) Failing after 39s
Post / images (arm64, registration) (push) Failing after 40s
Post / images (arm64, registration-operator) (push) Failing after 42s
Post / images (arm64, work) (push) Failing after 39s
Post / images (amd64, registration) (push) Failing after 7m46s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Post / coverage (push) Failing after 14m33s
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m25s
Close stale issues and PRs / stale (push) Successful in 46s
* Add addon conversion webhook for v1alpha1/v1beta1 API migration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Qing Hao <qhao@redhat.com>

* Fix GroupVersion compatibility issues after API dependency update

This commit fixes compilation and test errors introduced by updating
the API dependency to use native conversion functions from PR #411.

Changes include:

1. Fix GroupVersion type mismatches across the codebase:
   - Updated OwnerReference creation to use schema.GroupVersion
   - Fixed webhook scheme registration to use proper GroupVersion type
   - Applied fixes to addon, placement, migration, work, and registration controllers

2. Enhance addon conversion webhook:
   - Use native API conversion functions from addon/v1beta1/conversion.go
   - Fix InstallNamespace annotation key to match expected format
   - Add custom logic to populate deprecated ConfigReferent field in ConfigReferences
   - Properly preserve annotations during v1alpha1 <-> v1beta1 conversion

3. Remove duplicate conversion code:
   - Deleted pkg/addon/webhook/conversion/ directory (~500 lines)
   - Now using native conversion functions from the API repository

4. Patch vendored addon-framework:
   - Fixed GroupVersion errors in agentdeploy utils

All unit tests pass successfully (97 packages, 0 failures).

Signed-off-by: Qing Hao <qhao@redhat.com>

---------

Signed-off-by: Qing Hao <qhao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-24 08:26:35 +00:00
Jian Qiu
78daf0d2ae fix: skip GC for ManifestWorks managed by ManifestWorkReplicaSet (#1299)
Skip garbage collection for ManifestWorks that have the
ManifestWorkReplicaSet controller label, as these should be
managed exclusively by the ManifestWorkReplicaSet controller.

Changes:
- Fix logic bug in controller to properly check for ReplicaSet label
- Add unit tests for label-based GC skip behavior
- Add integration test to verify GC skip for ReplicaSet-managed works

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Jian Qiu <jqiu@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-16 08:56:07 +00:00
Jian Qiu
a06e37e65c 🌱 Integrate SDK logging tracing into work agent controllers (#1277)
This change adds log tracing support to the work agent controllers by:
- Upgrading SDK to version with logging.SetLogTracingByObject helper
- Setting tracing keys from ManifestWork objects in all work controllers
- Adding clusterName to the base logger for better log context
- Propagating tracing context through cloud events

The tracing keys enable better correlation of logs across the work
lifecycle from source to agent.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Jian Qiu <jqiu@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-04 13:12:38 +00:00
Jian Qiu
33310619d9 🌱 use SDK basecontroller for better logging. (#1269)
* Use basecontroller in sdk-go instead for better logging

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Rename to fakeSyncContext

Signed-off-by: Jian Qiu <jqiu@redhat.com>

---------

Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-12-01 03:07:02 +00:00
Qing Hao
26edb9423a fix: Check Applied condition before evaluating rollout status (#1243)
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Qing Hao <qhao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-01 02:14:46 +00:00
Jian Qiu
8f8cd01b52 Update dependencies: k8s 0.34.1, controller-runtime 0.22.3, and OCM libs (#1267)
Some checks failed
Post / coverage (push) Failing after 37m28s
Post / images (amd64, addon-manager) (push) Failing after 7m29s
Post / images (amd64, placement) (push) Failing after 7m1s
Post / images (amd64, registration) (push) Failing after 7m7s
Post / images (amd64, registration-operator) (push) Failing after 7m22s
Post / images (amd64, work) (push) Failing after 7m25s
Post / images (arm64, addon-manager) (push) Failing after 7m5s
Post / images (arm64, placement) (push) Failing after 7m4s
Post / images (arm64, registration) (push) Failing after 7m20s
Post / images (arm64, registration-operator) (push) Failing after 7m9s
Post / images (arm64, work) (push) Failing after 7m12s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Scorecard supply-chain security / Scorecard analysis (push) Failing after 59s
Close stale issues and PRs / stale (push) Successful in 29s
- Update k8s.io/* libraries to v0.34.1
- Update sigs.k8s.io/controller-runtime to v0.22.3
- Update open-cluster-management.io/api to 2337d27c3b7f
- Update open-cluster-management.io/sdk-go to a185f88d7b1b
- Update open-cluster-management.io/addon-framework to 1a0a9be61322
- Update openshift libraries (api, client-go, library-go) to latest commits
  for structured-merge-diff v6 compatibility
- Add Recorder() method to FakeSDKSyncContext with adapter pattern to bridge
  openshift/library-go and SDK event recorder interfaces
- Update vendor directory and regenerate CRDs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Jian Qiu <jqiu@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-11-26 05:56:58 +00:00
Jian Qiu
eb033993c2 🌱 Use base controller in sdk-go (#1251)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m11s
Post / coverage (push) Failing after 37m30s
Post / images (amd64, addon-manager) (push) Failing after 7m29s
Post / images (amd64, placement) (push) Failing after 6m57s
Post / images (amd64, registration) (push) Failing after 7m5s
Post / images (amd64, registration-operator) (push) Failing after 7m5s
Post / images (amd64, work) (push) Failing after 7m2s
Post / images (arm64, addon-manager) (push) Failing after 7m18s
Post / images (arm64, placement) (push) Failing after 7m7s
Post / images (arm64, registration) (push) Failing after 7m13s
Post / images (arm64, registration-operator) (push) Failing after 7m6s
Post / images (arm64, work) (push) Failing after 7m2s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Successful in 45s
* Use base controller in sdk-go

We can leverage contextual logger in base controller.

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Fix integration test error

Signed-off-by: Jian Qiu <jqiu@redhat.com>

---------

Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-11-20 07:53:42 +00:00
Wei Liu
b928d9f2a9 update sdk-go (#1257)
Some checks failed
Post / coverage (push) Failing after 38m23s
Post / images (amd64, addon-manager) (push) Failing after 7m53s
Post / images (amd64, placement) (push) Failing after 6m57s
Post / images (amd64, registration) (push) Failing after 7m7s
Post / images (amd64, registration-operator) (push) Failing after 7m1s
Post / images (amd64, work) (push) Failing after 7m8s
Post / images (arm64, addon-manager) (push) Failing after 7m10s
Post / images (arm64, placement) (push) Failing after 7m11s
Post / images (arm64, registration) (push) Failing after 6m58s
Post / images (arm64, registration-operator) (push) Failing after 7m17s
Post / images (arm64, work) (push) Failing after 7m18s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m15s
Close stale issues and PRs / stale (push) Successful in 41s
Signed-off-by: Wei Liu <liuweixa@redhat.com>
2025-11-19 04:08:16 +00:00
Jian Qiu
5528aff6d3 🌱 Add contextual logging for work agent (#1242)
* Add contextual logging for work agent

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Resolve comments

Signed-off-by: Jian Qiu <jqiu@redhat.com>

---------

Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-11-07 05:28:13 +00:00
Qing Hao
34cd9a2549 Update rollout logic to use Progressing condition instead of WorkApplied (#1207)
Some checks failed
Post / coverage (push) Failing after 22s
Post / images (amd64, addon-manager) (push) Failing after 17s
Post / images (amd64, placement) (push) Failing after 25s
Post / images (amd64, registration) (push) Failing after 17s
Post / images (amd64, registration-operator) (push) Failing after 18s
Post / images (amd64, work) (push) Failing after 26s
Post / images (arm64, addon-manager) (push) Failing after 16s
Post / images (arm64, placement) (push) Failing after 21s
Post / images (arm64, registration) (push) Failing after 25s
Post / images (arm64, registration-operator) (push) Failing after 27s
Post / images (arm64, work) (push) Failing after 23s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Scorecard supply-chain security / Scorecard analysis (push) Failing after 20s
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Qing Hao <qhao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-10-22 02:31:40 +00:00
Jian Qiu
daa9b2fa54 🐛 Avoid redundant apply and get operation in work controller (#1196)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 28s
Post / coverage (push) Failing after 22s
Post / images (amd64, addon-manager) (push) Failing after 30s
Post / images (amd64, placement) (push) Failing after 25s
Post / images (amd64, registration) (push) Failing after 16s
Post / images (amd64, registration-operator) (push) Failing after 23s
Post / images (amd64, work) (push) Failing after 17s
Post / images (arm64, addon-manager) (push) Failing after 14s
Post / images (arm64, placement) (push) Failing after 19s
Post / images (arm64, registration) (push) Failing after 23s
Post / images (arm64, registration-operator) (push) Failing after 17s
Post / images (arm64, work) (push) Failing after 19s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Failing after 31s
* Remove event after apply and add jitter when requeue

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Change event handler to avoid redundant reconciles

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Add unit tests for onAdd and onUpdate function

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Fix interegation test fail

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Set resync interval to 4-6 mins

Signed-off-by: Jian Qiu <jqiu@redhat.com>

---------

Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-10-17 02:04:40 +00:00
Jian Qiu
eed705a038 Fix ManifestWorkReplicaSet not deleting ManifestWorks from old placement (#1206)
When a ManifestWorkReplicaSet's placementRef was changed, the
ManifestWorks created for the old placement were not deleted,
causing orphaned resources.

The deployReconciler only processed placements currently in the spec
and never cleaned up ManifestWorks from removed placements.

This commit adds cleanup logic that:
- Builds a set of current placement names from the spec
- Lists all ManifestWorks belonging to the ManifestWorkReplicaSet
- Deletes any ManifestWorks with placement labels not in current spec

Also adds comprehensive tests:
- Integration test verifying placement change cleanup
- Unit tests for single and multiple placement change scenarios

Fixes #1203

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Jian Qiu <jqiu@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-10-13 06:31:38 +00:00
Jian Qiu
2f04992d6c Deleted manifestwork when it is completed for ttl seconds. (#1158)
* Delete manifestwork when it is completed after ttl

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Fix integration test

Signed-off-by: Jian Qiu <jqiu@redhat.com>

* Update operator and e2e tests

Signed-off-by: Jian Qiu <jqiu@redhat.com>

---------

Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-09-23 02:23:47 +00:00
Suvaansh
6056c04893 Adding labels to the resources created by work controller (#1176)
Signed-off-by: suvaanshkumar <suvaanshkumar@gmail.com>
2025-09-19 02:24:46 +00:00
Jian Zhu
1125e4c33d 🌱 Remove resolved TODO comments (#1177)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 50s
Post / coverage (push) Failing after 55s
Post / images (amd64, addon-manager) (push) Failing after 36s
Post / images (amd64, placement) (push) Failing after 40s
Post / images (amd64, registration) (push) Failing after 28s
Post / images (amd64, registration-operator) (push) Failing after 20s
Post / images (amd64, work) (push) Failing after 29s
Post / images (arm64, addon-manager) (push) Failing after 33s
Post / images (arm64, placement) (push) Failing after 32s
Post / images (arm64, registration) (push) Failing after 25s
Post / images (arm64, registration-operator) (push) Failing after 33s
Post / images (arm64, work) (push) Failing after 29s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
* 🧹 Remove resolved TODO comments

- Remove TODO comment about confirming subject in CustomSignerConfigurations
- Remove TODO comment about namespace value in manifestwork validating webhook

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* Ensure addon namespace and image pull secret sync

- Remove outdated TODO comment about addon deployment in Hosted mode
- Namespace creation is responsibility of addon developers
- Image pull secrets are synced to namespaces with addon.open-cluster-management.io/namespace label by the AddonPullImageSecretController

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

---------

Signed-off-by: zhujian <jiazhu@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-09-15 02:20:53 +00:00
Jian Qiu
b4b42aa0b5 Requeue ssar check if only hubKubeConfigSecret is unauthorized (#1169) (#1164)
Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-09-08 07:11:44 +00:00
Jian Qiu
e2be403132 Update grpc configuration in operator API (#1159)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 32s
Post / coverage (push) Failing after 43s
Post / images (amd64, addon-manager) (push) Failing after 41s
Post / images (amd64, placement) (push) Failing after 21s
Post / images (amd64, registration) (push) Failing after 23s
Post / images (amd64, registration-operator) (push) Failing after 30s
Post / images (amd64, work) (push) Failing after 28s
Post / images (arm64, addon-manager) (push) Failing after 28s
Post / images (arm64, placement) (push) Failing after 26s
Post / images (arm64, registration) (push) Failing after 35s
Post / images (arm64, registration-operator) (push) Failing after 28s
Post / images (arm64, work) (push) Failing after 35s
Post / image manifest (addon-manager) (push) Has been skipped
Post / image manifest (placement) (push) Has been skipped
Post / image manifest (registration) (push) Has been skipped
Post / image manifest (registration-operator) (push) Has been skipped
Post / image manifest (work) (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Successful in 38s
Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-09-04 11:15:15 +00:00
Wei Liu
c5e7e0711a adjust base and max delay for appliedmanifestwork deletion (#1120)
Signed-off-by: Wei Liu <liuweixa@redhat.com>
2025-08-11 07:07:00 +00:00
Jian Zhu
aa660678a4 ⚠️ Remove crd apiextensions v1beta1 (#1095)
Some checks failed
Post / coverage (push) Failing after 39m34s
Post / images (amd64) (push) Failing after 8m31s
Post / images (arm64) (push) Failing after 7m55s
Post / image manifest (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m53s
Close stale issues and PRs / stale (push) Successful in 56s
* Remove crd apiextensions v1beta1

Signed-off-by: zhujian <jiazhu@redhat.com>

* fix unit test

Signed-off-by: zhujian <jiazhu@redhat.com>

---------

Signed-off-by: zhujian <jiazhu@redhat.com>
2025-07-30 01:59:42 +00:00
Jian Qiu
588f82f48b Refactor webhook to use a common webhook option (#1096)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m26s
Post / coverage (push) Failing after 39m1s
Post / images (amd64) (push) Failing after 8m21s
Post / images (arm64) (push) Failing after 7m47s
Post / image manifest (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Successful in 47s
Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-07-29 07:38:59 +00:00
Jian Zhu
f989a37e1a 🌱 Upgrade golang to 1.24 and helm to 3.18.4 (#1085)
* Upgrade golang to 1.24

Signed-off-by: zhujian <jiazhu@redhat.com>

* Fix lint errors

Co-authored-by: gemini <gemini@google.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* upgrade sdk-go to latest

Signed-off-by: zhujian <jiazhu@redhat.com>

---------

Signed-off-by: zhujian <jiazhu@redhat.com>
Co-authored-by: gemini <gemini@google.com>
2025-07-28 02:13:36 +00:00
Jian Qiu
334710ce0e Set deleting condition when mw is deleting (#1084)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m54s
Post / coverage (push) Failing after 36m57s
Post / images (amd64) (push) Failing after 8m41s
Post / images (arm64) (push) Failing after 8m7s
Post / image manifest (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Successful in 49s
Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-07-24 01:11:02 +00:00
Ben Perry
c5e776cdd9 Manifest completion (#1033)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m58s
Post / coverage (push) Failing after 36m24s
Post / images (amd64) (push) Failing after 9m7s
Post / images (arm64) (push) Failing after 8m30s
Post / image manifest (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Successful in 57s
* Skip manifests in work reconcile that are marked Complete

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Aggregate Complete condition to work from manifests

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Delete work that is complete and satisfies configured TTL

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* tests

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* lint

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* go.mod

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Helper funcs for conditions

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Generic condition aggregation

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Support integration test args

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Remove work deletion from spoke, will be moved to hub GC

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Cleanup

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* update api

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Wait for NS to exist before testing

Signed-off-by: Ben Perry <bhperry94@gmail.com>

---------

Signed-off-by: Ben Perry <bhperry94@gmail.com>
2025-07-14 04:53:04 +00:00
Jian Zhu
71236758c1 🌱 Send an event for evicting appliedmanifestwork (#1066)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Has been cancelled
Post / coverage (push) Has been cancelled
Post / images (amd64) (push) Has been cancelled
Post / images (arm64) (push) Has been cancelled
Post / image manifest (push) Has been cancelled
Post / trigger clusteradm e2e (push) Has been cancelled
Close stale issues and PRs / stale (push) Successful in 56s
* Increase the log level for evicting appliedmanifestwork

Signed-off-by: zhujian <jiazhu@redhat.com>

* Add an event for evicting appliedmanifestwork

Signed-off-by: zhujian <jiazhu@redhat.com>

---------

Signed-off-by: zhujian <jiazhu@redhat.com>
2025-07-13 13:47:30 +00:00
Ben Perry
cbff56ad4b Add bhperry as approver/reviewer of work API (#1065)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 5m29s
Post / coverage (push) Failing after 34m7s
Post / images (amd64) (push) Failing after 8m16s
Post / images (arm64) (push) Failing after 7m38s
Post / image manifest (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Signed-off-by: Ben Perry <bhperry94@gmail.com>
2025-07-09 01:05:19 +00:00
Ben Perry
377ba25c26 Workload conditions (#910)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m40s
Post / coverage (push) Failing after 35m43s
Post / images (amd64) (push) Failing after 8m36s
Post / images (arm64) (push) Failing after 8m8s
Post / image manifest (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Successful in 48s
* Import OCM API changes for workload conditions

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Implement condition rule evaluator

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Evaluate manifest condition rules after apply

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* note to self

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Cleanup

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Return config option if rules are set

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* update api

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Always return an error to inform user about the state of their condition rule

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Condition rule errors should not result in retrying apply

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Test condition rule reconciliation

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Return condition status Unknown when an internal CEL error occurs

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Update api

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Switch to common CEL lib

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Update to simplified celExpressions format

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Formatting

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* tidy

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Update ocm api

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Update sdk-go

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Switch to sdk-go ConditionLib

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Update API

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Switch to WellKnownConditions with required Condition field

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Support CEL evaluation budget

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Update sdk-go

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Update API

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* lint

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Update go.mod

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Tests and comments

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Move condition reader to status controller for more frequent updates

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Ignore missing WellKnownCondition

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Fix test

Signed-off-by: Ben Perry <bhperry94@gmail.com>

* Update condition tests

Signed-off-by: Ben Perry <bhperry94@gmail.com>

---------

Signed-off-by: Ben Perry <bhperry94@gmail.com>
2025-06-11 15:47:35 +00:00
Jian Zhu
fb5ba3acaf 🐛 Use syncmap for the resource cache (#1023)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m33s
Post / coverage (push) Failing after 32m30s
Post / images (amd64) (push) Failing after 8m6s
Post / images (arm64) (push) Failing after 7m35s
Post / image manifest (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Successful in 43s
* Use syncmap for the resource cache

Signed-off-by: zhujian <jiazhu@redhat.com>

* update unit tests

Signed-off-by: zhujian <jiazhu@redhat.com>

* fix unit test

Signed-off-by: zhujian <jiazhu@redhat.com>

* use sync.map directly

Signed-off-by: zhujian <jiazhu@redhat.com>

---------

Signed-off-by: zhujian <jiazhu@redhat.com>
2025-06-05 01:58:40 +00:00
ivanscai
e753bd6e81 add hub QPS/Burst to hub work client,for talking with hub cluster apiserver (#1012)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m7s
Post / coverage (push) Failing after 27m40s
Post / images (amd64) (push) Failing after 3m26s
Post / images (arm64) (push) Failing after 2m55s
Post / image manifest (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Successful in 36s
Signed-off-by: caijing <caijing.cai@alibaba-inc.com>
2025-05-28 13:41:55 +00:00
Zhiwei Yin
e78a3a6d3d add deletionPolicy for manifestworkReplicaset (#996)
Some checks failed
Post / coverage (push) Failing after 26m38s
Post / images (amd64) (push) Failing after 3m24s
Post / images (arm64) (push) Failing after 2m59s
Post / image manifest (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m13s
Signed-off-by: Zhiwei Yin <zyin@redhat.com>
2025-05-28 01:12:21 +00:00
Jian Qiu
4eda44f2b9 Add jitter in requeue for status controller (#991)
Some checks failed
Post / coverage (push) Failing after 27m51s
Post / images (amd64) (push) Failing after 3m27s
Post / images (arm64) (push) Failing after 3m13s
Post / image manifest (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m9s
Close stale issues and PRs / stale (push) Successful in 40s
Instead of requeue all each resyncInterval, we requeue
for each item separately with a jitter to avoud bursty request

Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-05-14 07:09:27 +00:00
Ankit Kurmi
cd8827572e feat: updated golang to v1.23.6 and related k8s.io packages (#870)
Signed-off-by: Ankit152 <ankitkurmi152@gmail.com>
2025-04-09 07:46:27 +00:00
Wei Liu
0c5377c34b upgrade go-sdk (#914)
Signed-off-by: Wei Liu <liuweixa@redhat.com>
2025-03-27 07:06:09 +00:00
Wei Liu
73150dea19 reduce unnecessary log (#890)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 4m14s
Post / images (amd64) (push) Failing after 6m28s
Post / images (arm64) (push) Failing after 5m18s
Post / image manifest (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Post / coverage (push) Failing after 25m50s
Close stale issues and PRs / stale (push) Successful in 7s
Signed-off-by: Wei Liu <liuweixa@redhat.com>
2025-03-14 01:19:08 +00:00
Jian Qiu
453b775170 Bump api/sdk-go/addon-framework to v0.16 (#879)
Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-03-10 13:52:52 +00:00
Jian Qiu
741bf1c60f Apply ownerref eventhough other field is ignored (#847)
Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-02-25 02:16:45 +00:00
Jian Qiu
2746226037 Regactor hub driver interface and remove approver (#846)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m27s
Post / coverage (push) Failing after 8m0s
Post / images (amd64) (push) Failing after 7m17s
Post / images (arm64) (push) Failing after 5m47s
Post / image manifest (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Successful in 34s
Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-02-24 13:18:47 +00:00
Zhiwei Yin
568789fef4 refactor to use common HasFinalizer func (#830)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 2m33s
Post / coverage (push) Failing after 26m11s
Post / images (amd64) (push) Failing after 7m0s
Post / images (arm64) (push) Failing after 6m47s
Post / image manifest (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Successful in 28s
Signed-off-by: Zhiwei Yin <zyin@redhat.com>
2025-02-13 02:48:46 +00:00
Jian Qiu
11896ccda1 Fix the issue that ownerref is not set with ignorefields (#794)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 45s
Post / images (amd64) (push) Failing after 5m38s
Post / images (arm64) (push) Failing after 5m35s
Post / image manifest (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Post / coverage (push) Failing after 26m35s
Close stale issues and PRs / stale (push) Successful in 25s
Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-01-10 03:19:59 +00:00
Jian Qiu
2397c4e911 Add cache for applyUnstructured (#769)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 59s
Post / images (amd64) (push) Failing after 14m29s
Post / coverage (push) Failing after 26m17s
Post / images (arm64) (push) Failing after 6m52s
Post / image manifest (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Successful in 39s
This could reduce the number of calls to the spoke cluster

Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-01-08 15:46:53 +00:00
Yang Le
9af100f427 🐛 fix work agent performance issue (#785)
Some checks failed
Post / images (amd64) (push) Failing after 5m31s
Post / images (arm64) (push) Failing after 5m31s
Post / image manifest (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Post / coverage (push) Failing after 27m33s
Scorecard supply-chain security / Scorecard analysis (push) Failing after 1m11s
Signed-off-by: Yang Le <yangle@redhat.com>
2025-01-08 10:01:24 +00:00
Jian Qiu
037aa3ccfa Ignore field should not be honored when creating the resource (#784)
Some checks failed
Scorecard supply-chain security / Scorecard analysis (push) Failing after 12m44s
Post / images (amd64) (push) Failing after 8m35s
Post / coverage (push) Failing after 26m36s
Post / images (arm64) (push) Failing after 9m10s
Post / image manifest (push) Has been skipped
Post / trigger clusteradm e2e (push) Has been skipped
Close stale issues and PRs / stale (push) Successful in 30s
Signed-off-by: Jian Qiu <jqiu@redhat.com>
2025-01-03 06:09:48 +00:00
Jian Qiu
0897da69da Implement ignoreFields in server side apply (#726)
Signed-off-by: Jian Qiu <jqiu@redhat.com>
2024-12-10 02:56:55 +00:00