* Support default mode webhook networking configuration
Signed-off-by: Ben Perry <bhperry94@gmail.com>
* Share common webhook config between hosted and default mode
Signed-off-by: Ben Perry <bhperry94@gmail.com>
* Nest all related bind configuration together
Signed-off-by: Ben Perry <bhperry94@gmail.com>
* Disable surge with hostNetwork to prevent port conflicts
Signed-off-by: Ben Perry <bhperry94@gmail.com>
* Remove dev dependency
Signed-off-by: Ben Perry <bhperry94@gmail.com>
* Set defaults in one place
Signed-off-by: Ben Perry <bhperry94@gmail.com>
---------
Signed-off-by: Ben Perry <bhperry94@gmail.com>
The TestHubTimeoutController_Sync test was failing intermittently in CI
due to timing issues with time.Sleep() and real-time execution overhead.
Changes:
- Removed time.Sleep() dependency that caused flakiness
- Set lease renew time in the past using time.Now().Add(-duration)
to deterministically simulate aged leases
- Made timeout threshold configurable per test case
- Increased safety margin from 2s to 3s in "not timeout" case
(2s lease age vs 5s timeout, previously 1s wait vs 3s timeout)
- Set startTime in the past to bypass the 10s grace period check
that was added to handle stale lease scenarios
This eliminates race conditions in CI environments where execution
overhead could push the test beyond the timeout threshold.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Signed-off-by: zhujian <jiazhu@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The TestNewAgentOptions test was failing in CI because it expected
ComponentNamespace to always be "open-cluster-management-agent", but
NewAgentOptions() reads from /var/run/secrets/kubernetes.io/serviceaccount/namespace
when running in a Kubernetes pod (which exists in CI environment).
Updated the test to accept either the default value (when running locally)
or the actual pod namespace (when running in CI), while ensuring the
namespace is never empty.
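
The relaxed assertion can be sketched as a small predicate. `namespaceAcceptable` and its parameters are hypothetical names for illustration; the real test may be structured differently.

```go
package main

import "fmt"

// namespaceAcceptable is a hypothetical check mirroring the relaxed test:
// the namespace must never be empty, must equal the pod namespace when one
// was read from the service account file, and otherwise must be the default.
func namespaceAcceptable(got, defaultNS, podNS string) bool {
	if got == "" {
		return false
	}
	if podNS != "" {
		return got == podNS
	}
	return got == defaultNS
}

func main() {
	const defaultNS = "open-cluster-management-agent"

	// Running locally: no service account file, so the default is expected.
	fmt.Println(namespaceAcceptable(defaultNS, defaultNS, ""))
	// Running in a pod: the pod namespace ("ci-namespace" is made up) is expected.
	fmt.Println(namespaceAcceptable("ci-namespace", defaultNS, "ci-namespace"))
	// An empty namespace is rejected in every environment.
	fmt.Println(namespaceAcceptable("", defaultNS, ""))
}
```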
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Signed-off-by: zhujian <jiazhu@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* ✨ Support token-based authentication for template addons
This change enables template type addons to work with both CSR-based
and token-based authentication through dynamic subject binding.
Changes:
- Modified createPermissionBinding() to extract dynamic subjects from
addon.Status.Registrations instead of using hardcoded groups
- Added buildSubjectsFromRegistration() helper to extract user/groups
from registration status
- Returns SubjectNotReadyError when subjects are not ready (enables retry)
- Removed clusterAddonGroup() function (no longer needed)
- Updated addon-framework dependency to v1.2.0 for SubjectNotReadyError
- Added comprehensive tests for buildSubjectsFromRegistration
- Updated test helpers to include registration status with proper subjects
The implementation now supports:
- CSR-based authentication (existing)
- Token-based authentication (new)
- Any future authentication method that populates Status.Registrations
Related: 14af2a2eeb/enhancements/sig-architecture/167-token-based-addon-registration/README.md
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>
* test: add unit test for system:authenticated group filtering
Add a test case to verify that buildSubjectsFromRegistration correctly
filters out the system:authenticated group from the list of groups when
building RBAC subjects. This covers the filtering logic in
registration.go lines 560-562.
Also update the expected groups in TestTemplateCSRConfigurationsFunc
to match the implementation that includes both cluster-specific and
addon-wide groups for token-based authentication.
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>
* feat: add addon-wide group and filter system:authenticated
Add support for addon-wide group in defaultGroups() to support
token-based authentication for template addons. This adds the
system:open-cluster-management:addon:{addonName} group in addition
to the cluster-specific group.
Also add filtering logic in buildSubjectsFromRegistration() to
exclude the system:authenticated group from RBAC subjects, as this
is a special Kubernetes group automatically added to all authenticated
users and should not be explicitly included in RoleBindings.
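
A simplified sketch of the filtering step. The `subject` type stands in for `rbacv1.Subject`, and `buildSubjects` is a hypothetical reduction of `buildSubjectsFromRegistration`; the cluster and addon names in the example are made up.

```go
package main

import "fmt"

// subject is a simplified stand-in for rbacv1.Subject (assumption for this sketch).
type subject struct {
	Kind string
	Name string
}

const systemAuthenticated = "system:authenticated"

// buildSubjects is a hypothetical, reduced version of the helper described
// above: it turns a user and a list of groups into RBAC subjects and drops
// the system:authenticated group.
func buildSubjects(user string, groups []string) []subject {
	var subjects []subject
	if user != "" {
		subjects = append(subjects, subject{Kind: "User", Name: user})
	}
	for _, g := range groups {
		if g == systemAuthenticated {
			continue // implicit for every authenticated user; never bind it
		}
		subjects = append(subjects, subject{Kind: "Group", Name: g})
	}
	return subjects
}

func main() {
	subs := buildSubjects("", []string{
		"system:open-cluster-management:cluster:cluster1:addon:helloworld",
		"system:open-cluster-management:addon:helloworld",
		systemAuthenticated,
	})
	for _, s := range subs {
		fmt.Println(s.Kind, s.Name)
	}
}
```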
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>
* refactor: implement custom CSR approver with flexible org validation
Replace addon-framework's DefaultCSRApprover with a custom implementation
that supports both legacy and new CSR organization structures.
Key changes:
- Implement defaultCSRApprover function that accepts 2 or 3 organization units
- 3 orgs: legacy behavior including system:authenticated group in CSRs
- 2 orgs: new behavior where system:authenticated is filtered out
- Add support for gRPC-based CSR requests by checking CSRUsernameAnnotation
- Validate all required default addon groups are present in CSR
- Add necessary imports: k8s.io/apimachinery/pkg/util/sets and operatorapiv1
This enables backward compatibility while supporting the new token-based
authentication flow where system:authenticated is excluded from CSR orgs
but included in registration configs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>
* refactor: use addon-framework's updated KubeClientSignerConfigurations
Remove custom implementations and use addon-framework's native functions
which now include system:authenticated group by default.
Changes:
- Remove custom kubeClientSignerConfigurations function
- Remove custom defaultGroups function
- Remove custom defaultCSRApprover function
- Use agent.KubeClientSignerConfigurations from addon-framework
- Use utils.DefaultCSRApprover from addon-framework
- Remove unused imports: k8s.io/apimachinery/pkg/util/sets and operatorapiv1
The addon-framework has been updated to include system:authenticated in
DefaultGroups(), eliminating the need for custom implementations.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>
---------
Signed-off-by: zhujian <jiazhu@redhat.com>
Signed-off-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* check namespace existence and state in clusterprofile lifecycle controller.
Signed-off-by: Morven Cao <lcao@redhat.com>
* optimize the queue key and log for clusterprofile controller.
Signed-off-by: Morven Cao <lcao@redhat.com>
---------
Signed-off-by: Morven Cao <lcao@redhat.com>
* Add watch-based feedback with dynamic informer lifecycle management
Implements dynamic informer registration and cleanup for resources
configured with watch-based status feedback (FeedbackScrapeType=Watch).
This enables real-time status updates for watched resources while
efficiently managing resource lifecycle.
Features:
- Automatically register informers for resources with FeedbackWatchType
- Skip informer registration for FeedbackPollType or when not configured
- Clean up informers when resources are removed from manifestwork
- Clean up informers during applied manifestwork finalization
- Clean up informers when feedback type changes from watch to poll
Implementation:
- Refactored ObjectReader to interface for better modularity
- Added UnRegisterInformerFromAppliedManifestWork helper for bulk cleanup
- Enhanced AvailableStatusController to conditionally register informers
- Updated finalization controllers to unregister informers on cleanup
- Added nil safety checks to prevent panics during cleanup
Testing:
- Unit tests for informer registration based on feedback type
- Unit tests for bulk unregistration and nil safety
- Integration test for end-to-end watch-based feedback workflow
- Integration test for informer cleanup on manifestwork deletion
- All existing tests updated and passing
This feature improves performance by using watch-based updates for
real-time status feedback while maintaining efficient resource cleanup.
Signed-off-by: Jian Qiu <jqiu@redhat.com>
* Fallback to get from client when informer is not synced
Signed-off-by: Jian Qiu <jqiu@redhat.com>
---------
Signed-off-by: Jian Qiu <jqiu@redhat.com>
* sync clusterprofile based on managedclusterset and managedclustersetbinding
Co-authored-by: Claude <claude@anthropic.com>
Signed-off-by: Morven Cao <lcao@redhat.com>
* Refactor ClusterProfile controller into two separate controllers.
Signed-off-by: Morven Cao <lcao@redhat.com>
* address comments.
Signed-off-by: Morven Cao <lcao@redhat.com>
* fix lint issues.
Signed-off-by: Morven Cao <lcao@redhat.com>
* address comments.
Signed-off-by: Morven Cao <lcao@redhat.com>
* address comments.
Signed-off-by: Morven Cao <lcao@redhat.com>
---------
Signed-off-by: Morven Cao <lcao@redhat.com>
This commit adds validation to detect and reject duplicate manifests
in ManifestWork resources. A manifest is considered duplicate when
it has the same apiVersion, kind, namespace, and name as another
manifest in the same ManifestWork.
This prevents issues where duplicate manifests with different specs
can cause state inconsistency, as the Work Agent applies manifests
sequentially and later entries would overwrite earlier ones.
The validation returns a clear error message indicating the duplicate
manifest's index and the index of its first occurrence.
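
One way such a check could look, as a sketch only: `manifestID` and `findDuplicate` are hypothetical names, and the error wording is illustrative rather than the validator's actual message.

```go
package main

import "fmt"

// manifestID holds the four fields that identify a manifest in this sketch.
type manifestID struct {
	APIVersion, Kind, Namespace, Name string
}

// findDuplicate returns an error naming the index of the first duplicate
// manifest and the index of its first occurrence, or nil if all are unique.
func findDuplicate(ids []manifestID) error {
	seen := make(map[manifestID]int, len(ids))
	for i, id := range ids {
		if first, ok := seen[id]; ok {
			return fmt.Errorf("manifest at index %d duplicates manifest at index %d (%s %s %s/%s)",
				i, first, id.APIVersion, id.Kind, id.Namespace, id.Name)
		}
		seen[id] = i
	}
	return nil
}

func main() {
	ids := []manifestID{
		{"v1", "ConfigMap", "default", "cm1"},
		{"v1", "Secret", "default", "cm1"},    // different kind: not a duplicate
		{"v1", "ConfigMap", "default", "cm1"}, // duplicate of index 0
	}
	fmt.Println(findDuplicate(ids))
}
```

Using the struct itself as the map key keeps the comparison exact across all four fields.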
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Signed-off-by: xuezhaojun <zxue@redhat.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Fixed a bug where AppliedManifestWorks were not evicted immediately
after the appliedmanifestwork-eviction-grace-period expired.
Root cause: The controller used an exponential backoff rate limiter
to schedule requeue delays, which caused:
1. Exponentially increasing delays during grace period (1min -> 2min -> 4min...)
2. Unpredictable delays after grace period expired
Solution: Replace rate limiter with direct time calculation. Now the
controller calculates the exact remaining time until eviction and
schedules the next sync for that precise moment:
remainingTime := evictionTime.Sub(now)
Changes:
- Removed rateLimiter field and workqueue import
- Calculate exact remaining time instead of using exponential backoff
- Added V(4) logging to show scheduled eviction time and remaining time
- Updated unit test expectations (queue length 0 for delayed items)
Impact: AppliedManifestWorks are now evicted immediately when the
grace period expires, instead of being delayed by minutes due to
exponential backoff.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Signed-off-by: zhujian <jiazhu@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Refactored the removeClusterRbac function into separate functions to handle
different RBAC resource cleanup scenarios:
- removeClusterRBACResources: orchestrates full RBAC cleanup when cluster is deleted
- removeClusterSpecificRBAC: removes ClusterRole and ClusterRoleBinding
- removeClusterSpecificRoleBindings: removes registration and work RoleBindings
When hubAcceptsClient is false (cluster denied), only RoleBindings are removed
while ClusterRole and ClusterRoleBinding are preserved and updated. This ensures
proper RBAC state for denied clusters without deleting cluster-scoped resources.
Added unit test to verify that when a cluster is denied, only RoleBindings are
deleted while ClusterRole and ClusterRoleBinding remain intact.
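
The split can be summarized as a decision table. The sketch below uses hypothetical names (`cleanupPlan`, `planRBACCleanup`), and the "accepted cluster, nothing removed" branch is an assumption rather than something stated above.

```go
package main

import "fmt"

// cleanupPlan says which groups of RBAC resources a sync should remove.
type cleanupPlan struct {
	RemoveClusterScoped bool // ClusterRole and ClusterRoleBinding
	RemoveRoleBindings  bool // registration and work RoleBindings
}

// planRBACCleanup is a hypothetical summary of the behavior described above.
func planRBACCleanup(clusterDeleting, hubAcceptsClient bool) cleanupPlan {
	switch {
	case clusterDeleting:
		// Cluster deleted: full cleanup.
		return cleanupPlan{RemoveClusterScoped: true, RemoveRoleBindings: true}
	case !hubAcceptsClient:
		// Cluster denied: keep cluster-scoped resources, drop RoleBindings.
		return cleanupPlan{RemoveClusterScoped: false, RemoveRoleBindings: true}
	default:
		// Assumption: an accepted, live cluster keeps everything.
		return cleanupPlan{}
	}
}

func main() {
	fmt.Printf("deleted:  %+v\n", planRBACCleanup(true, true))
	fmt.Printf("denied:   %+v\n", planRBACCleanup(false, false))
	fmt.Printf("accepted: %+v\n", planRBACCleanup(false, true))
}
```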
Signed-off-by: Zhiwei Yin <zyin@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
When converting ManagedClusterAddOn from v1beta1 to v1alpha1, the
internal annotation 'addon.open-cluster-management.io/v1alpha1-install-namespace'
should be removed after being converted to Spec.InstallNamespace field.
This annotation is only used internally for v1beta1 storage to preserve
the InstallNamespace field which was removed in v1beta1. It should not
appear in v1alpha1 API responses.
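
A sketch of the annotation handling, assuming a hypothetical helper `restoreInstallNamespace` that works on a plain annotation map; the real conversion operates on the full API objects.

```go
package main

import "fmt"

// installNamespaceAnnotation is the internal key named in the message above.
const installNamespaceAnnotation = "addon.open-cluster-management.io/v1alpha1-install-namespace"

// restoreInstallNamespace is a hypothetical helper: it returns the install
// namespace stored in the internal annotation together with a copy of the
// annotations that no longer contains the internal key.
func restoreInstallNamespace(annotations map[string]string) (string, map[string]string) {
	ns := annotations[installNamespaceAnnotation]
	out := make(map[string]string, len(annotations))
	for k, v := range annotations {
		if k != installNamespaceAnnotation {
			out[k] = v
		}
	}
	return ns, out
}

func main() {
	ns, rest := restoreInstallNamespace(map[string]string{
		installNamespaceAnnotation: "my-addon-ns", // made-up namespace
		"example.com/owner":        "team-a",      // made-up user annotation
	})
	fmt.Println("Spec.InstallNamespace:", ns)
	fmt.Println("remaining annotations:", rest)
}
```

Copying rather than mutating the input keeps the stored v1beta1 object untouched during conversion.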
Fixes: ACM-28133
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Signed-off-by: Qing Hao <qhao@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* Fix work rolebinding cleanup when hubAcceptsClient is set to false
Signed-off-by: Erico G. Rimoli <erico.rimoli@totvs.com.br>
* Adds error handling to the removeClusterRbac call within the controller synchronization function
Signed-off-by: Erico G. Rimoli <erico.rimoli@totvs.com.br>
Update with success count
Remove status references
Add unit tests
Fix unit tests
Update unit tests
Test fix
Fix tests for lastTransitionTime
Fix integration tests
Signed-off-by: annelau <annelau@salesforce.com>
Co-authored-by: annelau <annelau@salesforce.com>
The cluster manager controller was silently using a literal "placeholder"
string as the CA bundle when the ca-bundle-configmap ConfigMap didn't exist
yet. This caused CRDs to be created with an invalid caBundle field
(cGxhY2Vob2xkZXI= which is base64 of "placeholder"), resulting in:
1. CRD conversion webhooks failing with "InvalidCABundle" error
2. CRDs not becoming Established
3. API endpoints not being registered
4. Dependent components (like MultiClusterHub) failing with:
"no matches for kind ClusterManagementAddOn"
The bug was a race condition between the cert rotation controller (which
creates the ca-bundle-configmap) and the cluster manager controller (which
reads it). When the ConfigMap was not found, the code did "// do nothing"
and silently continued with the placeholder value.
This fix:
1. Creates the hub namespace FIRST (before waiting for the CA bundle)
to allow the cert rotation controller to create the ca-bundle-configmap
2. Then waits for the CA bundle ConfigMap to exist before proceeding
3. Requeues via AddAfter if the ConfigMap is not found, allowing the
controller to gracefully retry until the cert rotation controller
has created it
This ensures CRDs are always created with valid CA bundles while avoiding
the deadlock where clusterManagerController waited for CA bundle but
certRotationController needed the namespace first.
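
The guard can be sketched as follows. `caBundleReady` and `encodedPlaceholder` are hypothetical helpers for illustration; the actual controller reads the ConfigMap and requeues via AddAfter as described above.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// placeholderCABundle is the literal value that previously leaked into CRDs.
const placeholderCABundle = "placeholder"

// encodedPlaceholder shows what the invalid caBundle field looked like.
func encodedPlaceholder() string {
	return base64.StdEncoding.EncodeToString([]byte(placeholderCABundle))
}

// caBundleReady is a hypothetical guard: the controller should proceed only
// when the ConfigMap was found and holds a real, non-placeholder bundle.
// When it returns false, the caller would requeue instead of continuing.
func caBundleReady(bundle []byte, found bool) bool {
	return found && len(bundle) > 0 && string(bundle) != placeholderCABundle
}

func main() {
	fmt.Println(encodedPlaceholder()) // cGxhY2Vob2xkZXI=

	fmt.Println(caBundleReady(nil, false))                       // ConfigMap missing: requeue
	fmt.Println(caBundleReady([]byte(placeholderCABundle), true)) // placeholder: requeue
	fmt.Println(caBundleReady([]byte("-----BEGIN CERTIFICATE-----"), true))
}
```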
Changes based on review feedback:
- Use requeue (AddAfter) instead of returning error (@elgnay)
- Use contextual logging instead of klog.V(4).Infof (@qiujian16)
The issue was discovered in OpenShift CI Prow jobs for ZTP hub deployment:
- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-kni-eco-ci-cd-ztp-left-shifting-kpi-ci-4.21-telcov10n-virtualised-single-node-hub-ztp/2005051399989104640
- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-kni-eco-ci-cd-ztp-left-shifting-kpi-ci-4.21-telcov10n-virtualised-single-node-hub-ztp/2005219283428184064
Affected versions: ACM 2.16.0-113/114, MCE 2.11.0-142/143 on OCP 4.21.0-rc.0
Signed-off-by: Carlos Cardenosa <ccardeno@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Update only the observed generation, leaving lastTransitionTime unchanged
Update with simple tests
Update with the latest PR changes
Add unit test changes
Add integration test generated by cursor
Fix unit tests
Signed-off-by: annelau <annelau@salesforce.com>
Co-authored-by: annelau <annelau@salesforce.com>
* Add addon conversion webhook for v1alpha1/v1beta1 API migration
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Qing Hao <qhao@redhat.com>
* Fix GroupVersion compatibility issues after API dependency update
This commit fixes compilation and test errors introduced by updating
the API dependency to use native conversion functions from PR #411.
Changes include:
1. Fix GroupVersion type mismatches across the codebase:
- Updated OwnerReference creation to use schema.GroupVersion
- Fixed webhook scheme registration to use proper GroupVersion type
- Applied fixes to addon, placement, migration, work, and registration controllers
2. Enhance addon conversion webhook:
- Use native API conversion functions from addon/v1beta1/conversion.go
- Fix InstallNamespace annotation key to match expected format
- Add custom logic to populate deprecated ConfigReferent field in ConfigReferences
- Properly preserve annotations during v1alpha1 <-> v1beta1 conversion
3. Remove duplicate conversion code:
- Deleted pkg/addon/webhook/conversion/ directory (~500 lines)
- Now using native conversion functions from the API repository
4. Patch vendored addon-framework:
- Fixed GroupVersion errors in agentdeploy utils
All unit tests pass successfully (97 packages, 0 failures).
Signed-off-by: Qing Hao <qhao@redhat.com>
---------
Signed-off-by: Qing Hao <qhao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Skip garbage collection for ManifestWorks that have the
ManifestWorkReplicaSet controller label, as these should be
managed exclusively by the ManifestWorkReplicaSet controller.
Changes:
- Fix logic bug in controller to properly check for ReplicaSet label
- Add unit tests for label-based GC skip behavior
- Add integration test to verify GC skip for ReplicaSet-managed works
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Signed-off-by: Jian Qiu <jqiu@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
This change adds log tracing support to the work agent controllers by:
- Upgrading SDK to version with logging.SetLogTracingByObject helper
- Setting tracing keys from ManifestWork objects in all work controllers
- Adding clusterName to the base logger for better log context
- Propagating tracing context through cloud events
The tracing keys enable better correlation of logs across the work
lifecycle from source to agent.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Signed-off-by: Jian Qiu <jqiu@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
* Use base controller in sdk-go
We can leverage contextual logger in base controller.
Signed-off-by: Jian Qiu <jqiu@redhat.com>
* Fix integration test error
Signed-off-by: Jian Qiu <jqiu@redhat.com>
---------
Signed-off-by: Jian Qiu <jqiu@redhat.com>
Add a test case to verify that when agentInstallNamespace is explicitly
set to an empty string in AddOnDeploymentConfig, the namespace defined
in the addonTemplate is used instead of being overridden.
This test validates the fix for issue #1209 where AddOnDeploymentConfig
was silently overriding the addonTemplate namespace even when
agentInstallNamespace was not intended to be set.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Signed-off-by: zhujian <jiazhu@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
When a ManifestWorkReplicaSet's placementRef was changed, the
ManifestWorks created for the old placement were not deleted,
causing orphaned resources.
The deployReconciler only processed placements currently in the spec
and never cleaned up ManifestWorks from removed placements.
This commit adds cleanup logic that:
- Builds a set of current placement names from the spec
- Lists all ManifestWorks belonging to the ManifestWorkReplicaSet
- Deletes any ManifestWorks with placement labels not in current spec
Also adds comprehensive tests:
- Integration test verifying placement change cleanup
- Unit tests for single and multiple placement change scenarios
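
The cleanup steps above amount to a set difference. A sketch under stated assumptions: `work` is a simplified stand-in for a ManifestWork carrying its placement label value, and `orphanedWorks` is a hypothetical helper, not the reconciler's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// work is a simplified stand-in for a ManifestWork: its name and the value
// of its placement label.
type work struct {
	Name      string
	Placement string
}

// orphanedWorks returns the names of works whose placement label is not among
// the placements currently referenced by the spec.
func orphanedWorks(currentPlacements []string, works []work) []string {
	current := make(map[string]struct{}, len(currentPlacements))
	for _, p := range currentPlacements {
		current[p] = struct{}{}
	}
	var orphans []string
	for _, w := range works {
		if _, ok := current[w.Placement]; !ok {
			orphans = append(orphans, w.Name)
		}
	}
	sort.Strings(orphans) // stable order for logging and tests
	return orphans
}

func main() {
	works := []work{
		{Name: "mwrs-cluster1", Placement: "old-placement"},
		{Name: "mwrs-cluster2", Placement: "new-placement"},
	}
	// The spec now references only new-placement, so the first work is orphaned.
	fmt.Println(orphanedWorks([]string{"new-placement"}, works))
}
```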
Fixes #1203
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Signed-off-by: Jian Qiu <jqiu@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>