Jian Zhu fc55a5df7c 🌱 Add TLS ConfigMap watch and restart for cluster-manager operator (#1452)
* 🌱 Add TLS profile configuration support via flags and ConfigMap

Add pkg/common/tls library to support TLS profile compliance
for OCM components. This enables components to receive TLS
configuration via command-line flags (--tls-min-version and
--tls-cipher-suites) from operators, aligning with the upstream
enhancement proposal for TLS profile configuration.

Key features:
- TLS version and cipher suite parsing from flags or ConfigMap
- ConfigMap-based TLS configuration for operator use
- ConfigMap watcher for operators to detect profile changes
- OpenSSL cipher name mapping to Go crypto/tls constants
- Safe defaults (TLS 1.2) when no configuration provided

Updated pkg/common/options/webhook.go to use TLS library instead
of hardcoded TLS 1.2, enabling webhook components to respect
TLS flags injected by operators.
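
For illustration, a minimal Go sketch of the flag-to-crypto/tls mapping
(helper names are hypothetical, not the actual pkg/common/tls API):

    package main

    import (
        "crypto/tls"
        "fmt"
    )

    // versionFromString maps a --tls-min-version value onto a crypto/tls
    // constant, falling back to the safe default (TLS 1.2) when unset.
    func versionFromString(v string) (uint16, error) {
        switch v {
        case "", "VersionTLS12":
            return tls.VersionTLS12, nil
        case "VersionTLS13":
            return tls.VersionTLS13, nil
        default:
            return 0, fmt.Errorf("unsupported TLS version %q", v)
        }
    }

    func main() {
        min, _ := versionFromString("VersionTLS13")
        fmt.Println(min == tls.VersionTLS13) // true
    }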

This is the foundation for OCM TLS profile compliance, keeping
upstream code OpenShift-agnostic while supporting dynamic TLS
configuration.

Related: open-cluster-management-io/enhancements#175

Signed-off-by: Jia Zhu <jiazhu@redhat.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* 🌱 Add TLS ConfigMap watch and restart to cluster-manager operator

Implement ConfigMap-based TLS profile compliance for cluster-manager operator
with hash comparison to prevent infinite restart loops.

Changes:
- Add TLS ConfigMap informer to watch ocm-tls-profile ConfigMap
- Load current TLS config at startup and compute hash
- Add event handlers that compare ConfigMap hash with current hash
- Only restart if ConfigMap content actually differs from current config
- Add comprehensive logging for all scenarios

Scenarios handled (hash guard sketched below):
- ConfigMap exists at startup (hash matches) → no restart
- ConfigMap created after startup (hash differs) → restart to apply
- ConfigMap updated (new hash differs) → restart to apply
- ConfigMap deleted (was using it) → restart to use defaults
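
A minimal sketch of that hash guard (helper names are hypothetical, not
the actual controller code):

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "os"
        "sort"
    )

    // hashConfigMapData produces a stable hash of ConfigMap data so event
    // handlers can tell real profile changes from no-op resyncs.
    func hashConfigMapData(data map[string]string) string {
        keys := make([]string, 0, len(data))
        for k := range data {
            keys = append(keys, k)
        }
        sort.Strings(keys)
        h := sha256.New()
        for _, k := range keys {
            h.Write([]byte(k))
            h.Write([]byte(data[k]))
        }
        return hex.EncodeToString(h.Sum(nil))
    }

    func main() {
        current := hashConfigMapData(map[string]string{"minTLSVersion": "VersionTLS12"})
        observed := hashConfigMapData(map[string]string{"minTLSVersion": "VersionTLS13"})
        if observed != current {
            fmt.Println("TLS profile changed; exiting so the pod restarts with new config")
            os.Exit(0)
        }
    }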

Leader election behavior:
- This code only runs on the leader pod (due to controllercmd framework)
- Non-leader pods wait idle until they acquire leadership
- New leaders load current ConfigMap state when they start, ensuring latest config
- Only the active leader monitors ConfigMap changes and restarts

🤖 Generated with Claude Code

Signed-off-by: zhujian <jiazhu@redhat.com>

* 🌱 Inject TLS config flags into addon-webhook deployment

Implement Case 2 pattern for addon-webhook TLS configuration:
cluster-manager-operator loads TLS config from ConfigMap and injects
it as flags into the addon-webhook deployment.

Changes:
- Add AddonWebhookTLSMinVersion and AddonWebhookTLSCipherSuites fields to HubConfig
- Load TLS config once when creating ClusterManagerController
- Pass TLS config strings as parameters to controller
- Inject --tls-min-version and --tls-cipher-suites flags into addon-webhook deployment template

This approach ensures addon-webhook receives TLS configuration via flags
without needing to watch the ConfigMap itself. When the ConfigMap changes,
cluster-manager-operator restarts, reloads the config, and updates the
deployment with new flags.
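
A hedged sketch of the template pattern (container name and surrounding
manifest fields are illustrative, not the actual deployment file):

    spec:
      template:
        spec:
          containers:
          - name: addon-manager-webhook   # illustrative container name
            args:
            - "/addon"
            - "webhook"
            {{- if .AddonWebhookTLSMinVersion }}
            - "--tls-min-version={{ .AddonWebhookTLSMinVersion }}"
            {{- end }}
            {{- if .AddonWebhookTLSCipherSuites }}
            - "--tls-cipher-suites={{ .AddonWebhookTLSCipherSuites }}"
            {{- end }}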

🤖 Generated with Claude Code

Signed-off-by: zhujian <jiazhu@redhat.com>

* 🌱 Log TLS min version and cipher suites on startup

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* 🌱 Move TLS library to sdk-go and update vendor dependencies

Relocates TLS config and cipher helpers from pkg/common/tls into the
vendored open-cluster-management.io/sdk-go/pkg/tls package, adds a
generic watcher utility, and updates all import references accordingly.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* 🌱 Inject TLS flags into all hub component deployments

Extend TLS flag injection from addon-webhook-only to all seven
hub deployments managed by cluster-manager-operator:

Manifests (operator → deployment args):
- Rename HubConfig.AddonWebhookTLS* → TLS* so the same fields
  drive all deployments rather than only the addon webhook
- Add {{- if .TLSMinVersion }} blocks to all six remaining
  deployment manifests (registration/work/placement controllers
  and registration/work webhook servers)

Controller binaries (registration, work, placement, addon-manager):
- Add --tls-min-version and --tls-cipher-suites flags to the
  common Options struct so the binaries accept the injected flags
  without failing; the flags are stored for future use (sketched below)
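
A minimal sketch of that flag registration, assuming a pflag-based
Options struct (names illustrative):

    package options

    import "github.com/spf13/pflag"

    // Options holds the injected TLS settings; the binaries parse and
    // store them even though only the webhooks consume them today.
    type Options struct {
        TLSMinVersion   string
        TLSCipherSuites []string
    }

    func (o *Options) AddFlags(fs *pflag.FlagSet) {
        fs.StringVar(&o.TLSMinVersion, "tls-min-version", o.TLSMinVersion,
            "Minimum TLS version, e.g. VersionTLS12 or VersionTLS13.")
        fs.StringSliceVar(&o.TLSCipherSuites, "tls-cipher-suites", o.TLSCipherSuites,
            "Comma-separated list of cipher suite names.")
    }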

Note: library-go's NewCommandWithContext uses cmd.Run (not RunE),
so there is no clean programmatic hook to inject TLS into the 8443
health server without bypassing library-go's own boilerplate
(signal handling, log init, profiling). Upstream library-go also
has no native TLS configuration API on ControllerCommandConfig or
ControllerBuilder. The 8443 health server defaults to TLS 1.2 via
SetRecommendedHTTPServingInfoDefaults; configuring it further
requires an upstream library-go enhancement.

Webhook binaries already fully support these flags via WebhookOptions;
no binary changes are needed there.

Signed-off-by: Jian Zhu <zhujian@redhat.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* 🌱 Wire --tls-min-version to library-go 8443 health server via WithServingTLSConfig

Now that library-go has WithServingTLSConfig (ServingMinTLSVersion /
ServingCipherSuites fields + injection in StartController before
WithServer is called), wire the --tls-min-version and
--tls-cipher-suites flags from Options into it.

ApplyTLSToCommand installs a PersistentPreRunE hook that calls
CmdConfig.WithServingTLSConfig after cobra flag parsing completes.
PersistentPreRunE runs before cmd.Run, so all library-go boilerplate
(signal handling, logging, profiling) is preserved - unlike the
previous approach of replacing RunE which silently bypassed it.
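
For reference, a self-contained cobra sketch of the ordering this relies
on (the real hook calls CmdConfig.WithServingTLSConfig; here it just logs):

    package main

    import (
        "fmt"

        "github.com/spf13/cobra"
    )

    func main() {
        var minVersion string
        cmd := &cobra.Command{
            Use: "controller",
            // Runs after flag parsing and before Run, so Run's
            // library-go boilerplate is left untouched.
            PersistentPreRunE: func(cmd *cobra.Command, args []string) error {
                fmt.Println("PersistentPreRunE: applying", minVersion)
                return nil
            },
            Run: func(cmd *cobra.Command, args []string) {
                fmt.Println("Run: boilerplate proceeds unmodified")
            },
        }
        cmd.Flags().StringVar(&minVersion, "tls-min-version", "", "minimum TLS version")
        cmd.SetArgs([]string{"--tls-min-version=VersionTLS13"})
        _ = cmd.Execute()
    }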

Uses go mod replace → /Users/jiazhu/go/src/github.com/openshift/library-go
for local development/testing; replace directive to be removed once the
library-go PR is merged and vendored.

Signed-off-by: Jian Zhu <zhujian@redhat.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* 🌱 Switch to --config file for controller 8443 TLS configuration

Replace the WithServingTLSConfig approach with library-go's native
--config flag mechanism:

ApplyTLSToCommand now installs a PersistentPreRunE hook that:
1. Writes a minimal GenericOperatorConfig YAML to a temp file under
   /tmp (which is mounted as an emptyDir in all hub controller
   deployments, so writing is safe even with readOnlyRootFilesystem)
2. Sets --config to point at the temp file before cmd.Run executes

All library-go boilerplate in cmd.Run (signal handling, log init,
profiling, basicFlags.Validate) is fully preserved because
PersistentPreRunE runs before Run, not replacing it.

Inside StartController, Config() reads the temp file; the TLS values
survive SetRecommendedHTTPServingInfoDefaults because DefaultString
only sets fields that are currently empty.
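
A hedged sketch of that hook (the GenericOperatorConfig YAML shape and
the guard for a user-supplied --config are assumptions, not the exact
ApplyTLSToCommand code):

    package options

    import (
        "fmt"
        "os"

        "github.com/spf13/cobra"
    )

    // applyTLSToCommand sketches the hook; it assumes the command already
    // registers a --config flag (library-go's basic flags do).
    func applyTLSToCommand(cmd *cobra.Command, minTLSVersion string) {
        cmd.PersistentPreRunE = func(cmd *cobra.Command, args []string) error {
            if cmd.Flags().Changed("config") {
                return nil // respect a user-supplied --config
            }
            yaml := fmt.Sprintf("apiVersion: operator.openshift.io/v1alpha1\n"+
                "kind: GenericOperatorConfig\nservingInfo:\n  minTLSVersion: %s\n",
                minTLSVersion)
            // /tmp is an emptyDir in the hub deployments, so this write
            // succeeds even with readOnlyRootFilesystem.
            f, err := os.CreateTemp("/tmp", "tls-config-*.yaml")
            if err != nil {
                return err
            }
            defer f.Close()
            if _, err := f.WriteString(yaml); err != nil {
                return err
            }
            // Point --config at the temp file before cmd.Run executes.
            return cmd.Flags().Set("config", f.Name())
        }
    }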

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* 🌱 Add tests for TLS profile compliance

Unit tests (pkg/common/options):
- TestApplyTLSToCommand: table-driven test covering all flag combinations:
  no flags (no-op), min-version only, cipher-suites only, both set,
  and --config pre-set by user (injection skipped).

Unit tests (clustermanager_controller):
- TestSyncDeployWithTLSConfig: verifies that when tlsMinVersion /
  tlsCipherSuites are set on the controller, the --tls-min-version and
  --tls-cipher-suites flags appear in the args of every managed hub
  deployment (registration, registration-webhook, placement, work-webhook).
  Also verifies the flags are absent when TLS config is not set.

Integration tests (test/integration/operator):
- "should inject tls-min-version into all hub deployments when
  ocm-tls-profile ConfigMap exists": creates the ocm-tls-profile
  ConfigMap with minTLSVersion=VersionTLS13 in the operator namespace
  and verifies all six hub deployments gain --tls-min-version=VersionTLS13
  in their container args.

Signed-off-by: Jian Zhu <zhujian@redhat.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* 🌱 Switch TLS cipher suite format from OpenSSL to IANA

Update vendored sdk-go to use IANA cipher suite names (e.g.
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) instead of OpenSSL names
(e.g. ECDHE-RSA-AES128-GCM-SHA256).

IANA is the canonical format used by Go's crypto/tls, the Kubernetes
apiserver --tls-cipher-suites flag, and library-go's ServingInfo.CipherSuites.
Using IANA names end-to-end eliminates the format mismatch that caused
library-go's 8443 health server to reject cipher suite names written by
ApplyTLSToCommand.
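
To illustrate why IANA names line up end-to-end: Go's standard library
publishes the IANA name for every suite, so resolving a name is a scan:

    package main

    import (
        "crypto/tls"
        "fmt"
    )

    // cipherSuiteID resolves an IANA cipher suite name (e.g.
    // TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) to its uint16 constant.
    func cipherSuiteID(name string) (uint16, bool) {
        for _, s := range tls.CipherSuites() {
            if s.Name == name {
                return s.ID, true
            }
        }
        for _, s := range tls.InsecureCipherSuites() {
            if s.Name == name {
                return s.ID, true
            }
        }
        return 0, false
    }

    func main() {
        id, ok := cipherSuiteID("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256")
        fmt.Printf("found=%v id=0x%04x\n", ok, id) // found=true id=0xc02f
    }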

The ocm-tls-profile ConfigMap now accepts IANA names only. The downstream
tls-profile-sync sidecar is responsible for converting OpenShift
TLSSecurityProfile (OpenSSL-style) names to IANA before writing the ConfigMap.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

* 🌱 Fix TLS ConfigMap test: create ConfigMap before operator startup

The previous test created ocm-tls-profile ConfigMap after the operator
started, which triggered the watcher's hash-change detection and called
os.Exit(0), killing the test process. Move the test into a dedicated
Describe with BeforeEach that creates the ConfigMap before starting the
operator so the watcher seeds its hash at startup and no restart is
triggered.

Also add hubWorkControllerDeployment to the tlsDeployments list since
its manifest includes tls-min-version injection.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: zhujian <jiazhu@redhat.com>

---------

Signed-off-by: Jia Zhu <jiazhu@redhat.com>
Signed-off-by: zhujian <jiazhu@redhat.com>
Signed-off-by: Jian Zhu <zhujian@redhat.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>


Open Cluster Management (OCM) is a Cloud Native Computing Foundation (CNCF) sandbox project focused on multicluster and multicloud scenarios for Kubernetes applications. At its core, OCM exists to make running, managing, and utilizing heterogeneous, multicluster environments simpler.

Project Goals and Objectives

We believe that no one runs just one cluster, and making multicluster management as easy as possible encourages community growth across all projects, empowering our users and their customers. Our core goal is to "multicluster everything."

OCM provides a flexible, extensible, and open model that allows any software project to easily understand how to run in a multicluster way. We align with and contribute to the SIG-Multicluster APIs for multicluster management while developing and sharing key concepts and components that make it easier and more accessible for any project, large or small, to "do multicluster."

Differentiation in the Cloud Native Landscape

Open Cluster Management differentiates itself in several key ways:

  • Vendor Neutral: Avoid vendor lock-in by using APIs that are not tied to any cloud providers or proprietary platforms
  • Standards-Based: We provide both a reference implementation of the APIs put forward by SIG-Multicluster as well as extended capabilities to accentuate them for numerous additional use cases
  • Plug-and-Play Architecture: By standardizing on a single PlacementDecision API, any scheduler can publish its choices and any consumer can subscribe to them, eliminating friction and enabling true interoperability
  • Extensible Framework: Our add-on structure allows projects to gain the benefits of simple, secure, and vendor-neutral multicloud management with minimal code changes

What OCM Does

OCM utilizes open APIs and an easy-to-use add-on structure to provide comprehensive multicluster management capabilities. We do this because we believe that managing multiple clusters should be simple and easy. With the variety of both free and proprietary cloud and on-premises offerings, it is important to help the community utilize these heterogeneous environments to their fullest potential.

Our architecture is modular and extensible, built to follow Kubernetes principles. We use open, declarative APIs and build with DevOps and automation in mind. OCM runs on a hub-spoke model that can survive failure and recovery of components without interrupting workloads. It is containerized, vendor-neutral, and focused on a declarative management style aligned with GitOps principles.

Core Architecture

The OCM architecture uses a hub-spoke model where the hub centralizes control of all managed clusters. An agent, called the klusterlet, resides on each managed cluster to manage registration to the hub and run instructions from the hub.


Core Components

Cluster Inventory

Registration of multiple clusters to a hub cluster brings them under management. For comprehensive details on cluster inventory management, see the cluster inventory documentation. OCM implements a secure "double opt-in handshaking" protocol that requires explicit consent from both hub cluster administrators and managed cluster administrators to establish the connection.

The registration process creates an mTLS connection between the registration-agent on the managed cluster (Klusterlet) and the registration-controller on the hub (Cluster Manager). Once registered, each managed cluster is assigned a dedicated namespace on the hub for isolation and security. The registration-controller continuously monitors cluster health and manages cluster lifecycle operations including certificate rotation and permission management.

Work Distribution

The ManifestWork API enables resources to be applied to managed clusters from a hub cluster. ManifestWork acts as a wrapper containing Kubernetes resource manifests that need to be distributed and applied to specific managed clusters.

The work-agent running on each managed cluster actively pulls ManifestWork resources from its dedicated namespace on the hub and applies them locally. This pull-based approach ensures scalability and eliminates the need to store managed cluster credentials on the hub. The API supports sophisticated features including update strategies, conflict resolution, resource ordering, and status feedback to provide comprehensive resource distribution capabilities.
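
A minimal (illustrative) ManifestWork wrapping a single ConfigMap:

    apiVersion: work.open-cluster-management.io/v1
    kind: ManifestWork
    metadata:
      name: example-work
      namespace: cluster1   # the managed cluster's dedicated namespace on the hub
    spec:
      workload:
        manifests:
        - apiVersion: v1
          kind: ConfigMap
          metadata:
            name: example-config
            namespace: default
          data:
            key: value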

Content Placement

Dynamic placement of content and behavior across multiple clusters. The Placement API is a sophisticated scheduler for multicluster environments that operates on a Hub-Spoke model and is deeply integrated with the Work API.

The Placement API allows administrators to use vendor-neutral selectors to dynamically choose clusters based on labels, resource capacity, or custom criteria. The PlacementDecision resource provides a list of selected clusters that can be consumed by any scheduler or controller.
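
A minimal (illustrative) Placement selecting up to three
production-labeled clusters (it assumes a ManagedClusterSetBinding
exists in the namespace):

    apiVersion: cluster.open-cluster-management.io/v1beta1
    kind: Placement
    metadata:
      name: example-placement
      namespace: default
    spec:
      numberOfClusters: 3
      predicates:
      - requiredClusterSelector:
          labelSelector:
            matchLabels:
              environment: production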

Add-on Framework

OCM Add-ons provide a clear framework that allows developers to easily add their software to OCM and make it multicluster aware. Add-ons are simple to write and are fully documented and maintained according to specifications in the addon-framework repository.

The framework handles the complex aspects of multicluster deployment including registration, lifecycle management, configuration distribution, and status reporting. Add-ons are used by multiple projects such as Argo, Submariner, KubeVela, and more. They are also used extensively for internal components of OCM, ensuring the framework is actively developed and maintained. With add-ons, any project can plug into OCM with minimal code and almost instantaneously become multicluster aware.

Built-in Extensions

These optional but valuable capabilities extend OCM's core functionality for specific use cases.

Cluster Lifecycle Management

OCM integrates seamlessly with Cluster API to provide comprehensive cluster lifecycle management. Cluster API is a Kubernetes sub-project focused on providing declarative APIs and tooling to simplify provisioning, upgrading, and operating multiple Kubernetes clusters.

The integration allows OCM to work alongside Cluster API management planes, where clusters provisioned through Cluster API can be automatically registered with OCM for multicluster management. OCM's hub cluster can run in the same cluster as the Cluster API management plane, enabling seamless workflows from cluster creation to workload deployment and policy enforcement. See the Cluster API integration guide for detailed setup instructions.

Application Lifecycle Management

Leverage the Argo CD add-on for OCM to enable decentralized, pull-based application deployment to managed clusters. The OCM Argo CD add-on uses a hub-spoke architecture to deliver Argo CD Applications from the OCM hub cluster to registered managed clusters.

Unlike traditional push-based deployment models, this pull mechanism provides several advantages:

  • Scalability: Hub-spoke pattern offers better scalability
  • Security: Cluster credentials don't have to be stored in a centralized environment
  • Resilience: Reduces the impact of a single point of centralized failure

See Argo CD OCM add-on for details on installing the add-on and deploying applications across multiple clusters.

Governance, Risk, and Compliance (GRC)

Use prebuilt security and configuration controllers to enforce policies on Kubernetes configuration across your clusters. Policy controllers allow the declarative expression of desired conditions that can be audited or enforced against a set of managed clusters.

Cloud Native Use Cases

Everything OCM does is Cloud Native by definition. Our comprehensive User Scenarios section provides detailed examples of how OCM enables various multicluster and multicloud use cases.

Getting Started

You can use the clusteradm CLI to bootstrap a control plane for multicluster management. To set up a multicluster environment with OCM enabled on your local machine, follow the instructions in setup dev environment.
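
A typical bootstrap flow looks roughly like this (flag values are
placeholders; see the clusteradm documentation for the exact command
each step prints):

    clusteradm init --wait                     # on the hub: install the cluster manager
    clusteradm join --hub-token <token> \
      --hub-apiserver <hub-api-url> \
      --cluster-name cluster1 --wait           # on the managed cluster: install the klusterlet
    clusteradm accept --clusters cluster1      # back on the hub: approve the registration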

For developers looking to contribute to OCM, see our comprehensive Development Guide which covers development environment setup, code standards, testing, and contribution workflows.

External Integrations

We constantly work with other open-source projects to make multicluster management easier:

  • Argo CD (CNCF): OCM supplies Argo CD with ClusterDecision resources via Argo CD's Cluster Decision Resource Generator, enabling it to select target clusters for GitOps deployments
  • Clusternet (CNCF): Multicluster orchestration that can easily plug into OCM via the clusternet addon
  • KubeVela (CNCF): KubeVela develops an integration addon to work with OCM supporting application deployment across multiple clusters
  • KubeStellar (CNCF): KubeStellar uses OCM as the backend of multicluster management
  • Kueue (Kubernetes SIGs): OCM supplies Kueue with a streamlined MultiKueue setup process, automated generation of MultiKueue specific Kubeconfig, and enhanced multicluster scheduling capabilities
  • Submariner (CNCF): Provides multicluster networking connectivity with automated deployment and management

Documentation and Resources

For comprehensive information about OCM, visit our website with detailed sections on key concepts.

Community

Connect with the OCM community through the channels listed on the OCM website.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
