🌱 OCM Kueue Admission Check Controller (#601)

* Deliver proposal for OCM Kueue admission check controller.

Signed-off-by: Zhe Shen <xxtale02591@gmail.com>

* add more explanation in doc, delete unused permissions

Signed-off-by: Zhe Shen <xxtale02591@gmail.com>

* add section in doc

Signed-off-by: Zhe Shen <xxtale02591@gmail.com>

---------

Signed-off-by: Zhe Shen <xxtale02591@gmail.com>
Zhe Shen
2024-09-02 10:41:01 +02:00
committed by GitHub
parent fc3ee21f54
commit 3e3d4eade1
19 changed files with 1627 additions and 0 deletions


@@ -0,0 +1,641 @@
# Set up Multikueue with OCM Kueue Admission Check Controller
This guide demonstrates how to use the external OCM [Kueue Admission Check Controller](https://kueue.sigs.k8s.io/docs/concepts/admission_check/) which integrates OCM `Placement` results with [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) for intelligent multi-cluster job scheduling.
The controller reads OCM `Placement` decisions and generates corresponding `MultiKueueConfig` and `MultiKueueCluster` resources, streamlining the setup of the [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) environment and enabling users to select clusters based on custom criteria.
We'll walk through different user stories that showcase the power and flexibility of this integration.
## Background
### Existing Components
1. **OCM Placement and AddonPlacementScore**:
- `Placement` is used to dynamically select a set of `ManagedClusters` in one or multiple `ManagedClusterSet`s to achieve multi-cluster scheduling.
- `AddOnPlacementScore` is an API introduced by `Placement` to support scheduling based on customized scores.
2. **Kueue MultiKueue and AdmissionChecks**:
- [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) is a feature of Kueue for job dispatching across multiple clusters.
- The [AdmissionChecks](https://kueue.sigs.k8s.io/docs/concepts/admission_check/) are a mechanism that allows Kueue to consider additional criteria before admitting a workload. Kueue only proceeds with a workload if all associated AdmissionChecks return a positive signal.
REF: [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/), [Admission Check](https://kueue.sigs.k8s.io/docs/concepts/admission_check/), [Placement](https://open-cluster-management.io/concepts/placement/).
## Motivation
- Setting up a [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) environment for multiple clusters is a complex and manual process, often requiring users to create `MultiKueueCluster` and `MultiKueueConfig` resources for each worker cluster individually.
- Driven by the growing need for optimal compute resource utilization, particularly in AI/ML workloads, multi-cluster users increasingly seek to leverage the OCM framework with [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) for intelligent cluster selection.
REF: [Setup a MultiKueue environment](https://kueue.sigs.k8s.io/docs/tasks/manage/setup_multikueue/#multikueue-specific-kubeconfig)
## Prerequisites
1. A Kubernetes environment with OCM installed on a hub cluster and at least three managed clusters.
2. [Kueue](https://kueue.sigs.k8s.io/docs/installation/) deployed across all clusters.
3. [Managed-serviceaccount](https://github.com/open-cluster-management-io/managed-serviceaccount), [cluster-permission](https://github.com/open-cluster-management-io/cluster-permission) and [resource-usage-collect-addon](https://github.com/open-cluster-management-io/addon-contrib/tree/main/resource-usage-collect-addon) installed on managed clusters.
- You can set up all of the above by running:
```bash
./setup-env.sh
```
**Notice**: Currently, this functionality relies on `ClusterProfile` support and a manual installation of the Admission Check Controller.
OCM achieves this by replacing some OCM images in `setup-env.sh`. In the future, we plan to address the items listed in the [TODO section](#todo).
After that, you can verify your setup.
- Check the managed clusters.
```bash
kubectl get mcl
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE
cluster1 true https://cluster1-control-plane:6443 True True 116s
cluster2 true https://cluster2-control-plane:6443 True True 94s
cluster3 true https://cluster3-control-plane:6443 True True 73s
```
- Verify the installed addons.
```bash
kubectl get mca -A
NAMESPACE NAME AVAILABLE DEGRADED PROGRESSING
cluster1 managed-serviceaccount True False
cluster1 resource-usage-collect True False
cluster2 managed-serviceaccount True False
cluster2 resource-usage-collect True False
cluster3 managed-serviceaccount True False
cluster3 resource-usage-collect True False
```
- Confirm Kueue is running on the clusters.
```bash
kubectl get pods -n kueue-system --context kind-hub # Same for managed clusters.
NAME READY STATUS RESTARTS AGE
kueue-controller-manager-87bd7888b-gqk4g 2/2 Running 0 69s
```
- On the hub cluster, check `ClusterProfiles`.
```bash
kubectl get clusterprofile -A
NAMESPACE NAME AGE
open-cluster-management cluster1 23s
open-cluster-management cluster2 23s
open-cluster-management cluster3 23s
```
- The `ClusterProfile` status contains credentials that Kueue can use.
```bash
kubectl get clusterprofile -A -ojson | jq '.items[] | .metadata.name, .status.credentials[]'
"cluster1"
{
"accessRef": {
"kind": "Secret",
"name": "kueue-admin-cluster1-kubeconfig",
"namespace": "kueue-system"
},
"consumer": "kueue-admin"
}
"cluster2"
{
"accessRef": {
"kind": "Secret",
"name": "kueue-admin-cluster2-kubeconfig",
"namespace": "kueue-system"
},
"consumer": "kueue-admin"
}
"cluster3"
{
"accessRef": {
"kind": "Secret",
"name": "kueue-admin-cluster3-kubeconfig",
"namespace": "kueue-system"
},
"consumer": "kueue-admin"
}
```
- On the hub cluster, check that secrets containing a `kubeconfig` for each managed cluster were created under the `kueue-system` namespace.
```bash
kubectl get secret -n kueue-system
NAME TYPE DATA AGE
kueue-admin-cluster1-kubeconfig Opaque 1 4m4s
kueue-admin-cluster2-kubeconfig Opaque 1 4m4s
kueue-admin-cluster3-kubeconfig Opaque 1 4m4s
kueue-webhook-server-cert Opaque 4 5m27s
```
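- As a final sanity check, you can decode one of the secrets and confirm it holds a usable kubeconfig; per the MultiKueue convention used later in this guide, the data is expected under the `kubeconfig` key.
```bash
kubectl get secret kueue-admin-cluster1-kubeconfig -n kueue-system \
  -o jsonpath='{.data.kubeconfig}' | base64 -d | head -n 5
```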
## User Stories
#### Story 1
As an admin, I want to automate [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) configuration across multiple clusters, so that I can streamline the setup process without manual intervention.
- With the help of the `ClusterProfile` API, we can easily set up a MultiKueue environment.
```bash
kubectl apply -f ./multikueue-setup-demo1.yaml
```
- After that, check the status of the `MultiKueueCluster`, `AdmissionCheck` and `ClusterQueue` resources.
```bash
kubectl get multikueuecluster -A -ojson | jq '.items[] | .metadata.name, .status.conditions'
kubectl get admissionchecks -ojson | jq '.items[] | .metadata.name, .status.conditions'
kubectl get clusterqueues -ojson | jq '.items[] | .metadata.name, .status.conditions'
```
Success is indicated when `"status": "True"` and reasons like `Active` or `Ready` are present in the conditions.
```bash
"multikueue-demo1-cluster1"
[
{
"lastTransitionTime": "2024-08-31T20:41:41Z",
"message": "Connected",
"observedGeneration": 1,
"reason": "Active",
"status": "True",
"type": "Active"
}
]
"multikueue-demo1-cluster2"
[
{
"lastTransitionTime": "2024-08-31T20:41:41Z",
"message": "Connected",
"observedGeneration": 1,
"reason": "Active",
"status": "True",
"type": "Active"
}
]
"multikueue-demo1"
[
{
"lastTransitionTime": "2024-08-31T20:41:41Z",
"message": "The admission check is active",
"observedGeneration": 1,
"reason": "Active",
"status": "True",
"type": "Active"
},
{
"lastTransitionTime": "2024-08-31T20:41:41Z",
"message": "only one multikueue managed admission check can be used in one ClusterQueue",
"observedGeneration": 1,
"reason": "MultiKueue",
"status": "True",
"type": "SingleInstanceInClusterQueue"
},
{
"lastTransitionTime": "2024-08-31T20:41:41Z",
"message": "admission check cannot be applied at ResourceFlavor level",
"observedGeneration": 1,
"reason": "MultiKueue",
"status": "True",
"type": "FlavorIndependent"
}
]
"cluster-queue-demo1"
[
{
"lastTransitionTime": "2024-08-31T20:41:41Z",
"message": "Can admit new workloads",
"observedGeneration": 1,
"reason": "Ready",
"status": "True",
"type": "Active"
}
]
```
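- (Optional) Before submitting a job, confirm the queues created by `multikueue-setup-demo1.yaml` are in place; both resources below are defined in that manifest.
```bash
kubectl get clusterqueue cluster-queue-demo1
kubectl get localqueue user-queue-demo1 -n default
```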
- Deploy a job to the MultiKueue.
```bash
kubectl create -f ./job-demo1.yaml
```
- Check the workload on the managed clusters. When the job's Workload receives a QuotaReservation in the manager cluster, a copy of that Workload is created in all configured worker clusters.
Once `kind-cluster1` admits the workload, the manager removes the corresponding copies from the other clusters (`kind-cluster2`).
```bash
kubectl get workload --context kind-cluster1
NAME QUEUE RESERVED IN ADMITTED AGE
job-demo1-jobnktc6-6c5f3 user-queue-demo1 cluster-queue-demo1 True 5s
kubectl get workload --context kind-cluster2
No resources found in default namespace. # After cluster1 admitted the workload, no workload should show up here.
```
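- (Optional) The job name is generated, so to follow it on the admitting cluster you can select by the Kueue queue-name label it was submitted with (this sketch assumes MultiKueue copies the job's labels verbatim to the worker cluster).
```bash
kubectl get jobs -n default --context kind-cluster1 \
  -l kueue.x-k8s.io/queue-name=user-queue-demo1
```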
#### Story 2
As an admin, I want to use OCM `Placement` results for scheduling, so that clusters with specific attributes, like those with the `nvidia-t4` GPU accelerator label, are automatically selected and converted into a [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) for targeted workload deployment.
- You can manually label the accelerators on the clusters.
```bash
kubectl label managedcluster cluster2 accelerator=nvidia-tesla-t4
kubectl label managedcluster cluster3 accelerator=nvidia-tesla-t4
```
The `placement-demo2-1.yaml` selects clusters with the `nvidia-tesla-t4` accelerator label.
```yaml
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
name: placement-demo2
namespace: kueue-system
spec:
clusterSets:
- spoke
tolerations:
- key: cluster.open-cluster-management.io/unreachable
operator: Exists
- key: cluster.open-cluster-management.io/unavailable
operator: Exists
predicates:
- requiredClusterSelector:
labelSelector:
matchLabels:
accelerator: nvidia-tesla-t4
```
- Bind the cluster set to the Kueue namespace and verify the bindings.
```bash
clusteradm clusterset bind spoke --namespace kueue-system
clusteradm get clustersets
<ManagedClusterSet>
└── <spoke>
└── <BoundNamespace> default,kueue-system
└── <Status> 3 ManagedClusters selected
└── <Clusters> [cluster1 cluster2 cluster3]
```
- Apply the placement policy.
```bash
kubectl apply -f placement-demo2-1.yaml
```
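- (Optional) Verify the resulting `PlacementDecision`; OCM labels each decision with the name of its owning `Placement`, so the selected clusters (here cluster2 and cluster3) can be listed directly.
```bash
kubectl get placementdecisions -n kueue-system \
  -l cluster.open-cluster-management.io/placement=placement-demo2 \
  -o jsonpath='{.items[0].status.decisions[*].clusterName}'
```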
- Apply the MultiKueue setup configuration.
```bash
kubectl apply -f ./multikueue-setup-demo2.yaml
```
- Check the generated `MultiKueueConfig` and `MultiKueueCluster` resources.
```bash
kubectl get multikueueconfig
NAME AGE
placement-demo2 60s
kubectl get multikueuecluster
NAME AGE
placement-demo2-cluster2 60s
placement-demo2-cluster3 60s
```
- After that, check the status of the `MultiKueueCluster`, `AdmissionCheck` and `ClusterQueue` resources.
```bash
kubectl get multikueuecluster -A -ojson | jq '.items[] | .metadata.name, .status.conditions'
kubectl get admissionchecks -ojson | jq '.items[] | .metadata.name, .status.conditions'
kubectl get clusterqueues -ojson | jq '.items[] | .metadata.name, .status.conditions'
```
On success, the conditions should show `"status": "True"` with reasons like `Active` or `Ready`.
```bash
"placement-demo2-cluster2"
[
{
"lastTransitionTime": "2024-08-31T22:03:16Z",
"message": "Connected",
"observedGeneration": 1,
"reason": "Active",
"status": "True",
"type": "Active"
}
]
"placement-demo2-cluster3"
[
{
"lastTransitionTime": "2024-08-31T22:03:16Z",
"message": "Connected",
"observedGeneration": 1,
"reason": "Active",
"status": "True",
"type": "Active"
}
]
"multikueue-demo2" # The status of the admissioncheck `multikueue-demo2`
[
{
"lastTransitionTime": "2024-08-31T22:03:16Z",
"message": "The admission check is active",
"observedGeneration": 1,
"reason": "Active",
"status": "True",
"type": "Active"
},
{
"lastTransitionTime": "2024-08-31T22:03:16Z",
"message": "only one multikueue managed admission check can be used in one ClusterQueue",
"observedGeneration": 1,
"reason": "MultiKueue",
"status": "True",
"type": "SingleInstanceInClusterQueue"
},
{
"lastTransitionTime": "2024-08-31T22:03:16Z",
"message": "admission check cannot be applied at ResourceFlavor level",
"observedGeneration": 1,
"reason": "MultiKueue",
"status": "True",
"type": "FlavorIndependent"
}
]
"placement-demo2" # The status of the admissioncheck `placement-demo2`
[
{
"lastTransitionTime": "2024-08-31T22:03:16Z",
"message": "MultiKueueConfig and MultiKueueCluster generated",
"reason": "Active",
"status": "True",
"type": "Active"
}
]
"cluster-queue-demo2"
[
{
"lastTransitionTime": "2024-08-31T22:03:16Z",
"message": "Can admit new workloads",
"observedGeneration": 1,
"reason": "Ready",
"status": "True",
"type": "Active"
}
]
```
- Create a job requesting GPU resources and submit it to the MultiKueue.
```bash
kubectl create -f ./job-demo2.yaml
```
- Check the workload on the managed clusters. As explained in story 1, once one cluster (here `kind-cluster3`) admits the workload, the manager removes the corresponding copies from the other clusters (here `kind-cluster2`).
```bash
kubectl get workload --context kind-cluster2
No resources found in default namespace.
kubectl get workload --context kind-cluster3
NAME QUEUE RESERVED IN ADMITTED AGE
job-demo2-jobl2t6d-a8cdd user-queue-demo2 cluster-queue-demo2 True 3s
```
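- (Optional) From the hub, you can also inspect the per-check states that Kueue records on the Workload; in Kueue v1beta1 these live under `.status.admissionChecks`.
```bash
kubectl get workloads -n default -ojson | \
  jq '.items[] | .metadata.name, .status.admissionChecks'
```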
#### Story 3
As an admin, I want to leverage OCM's `AddonPlacementScore` for dynamic workload scheduling, so that clusters with higher GPU scores, indicating clusters with more GPU resources, are selected and converted into a [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/), which automatically adjusts by adding or removing clusters as scores change.
`placement-demo2-2.yaml` selects clusters with the `nvidia-tesla-t4` accelerator label, and then picks the single cluster with the highest GPU score, i.e. the one with the most GPU resources available.
```yaml
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
name: placement-demo2
namespace: kueue-system
spec:
clusterSets:
- spoke
tolerations:
- key: cluster.open-cluster-management.io/unreachable
operator: Exists
- key: cluster.open-cluster-management.io/unavailable
operator: Exists
predicates:
- requiredClusterSelector:
labelSelector:
matchLabels:
accelerator: nvidia-tesla-t4
numberOfClusters: 1
prioritizerPolicy:
mode: Exact
configurations:
- scoreCoordinate:
type: AddOn
addOn:
resourceName: resource-usage-score
scoreName: gpuClusterAvailable
weight: 1
```
- You can manually edit the GPU resources on the managed clusters for testing; for example, on `kind-cluster2`, set 3 fake GPUs on the node `cluster2-control-plane`.
```bash
kubectl edit-status node cluster2-control-plane --context kind-cluster2 # Same operation with other clusters/nodes.
```
- Edit the `status` of the node `cluster2-control-plane`:
```yaml
allocatable:
cpu: "8"
ephemeral-storage: 61202244Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
hugepages-32Mi: "0"
hugepages-64Ki: "0"
memory: 8027168Ki
nvidia.com/gpu: "3" # Add 3 fake GPUs in allocatable
pods: "110"
capacity:
cpu: "8"
ephemeral-storage: 61202244Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
hugepages-32Mi: "0"
hugepages-64Ki: "0"
memory: 8027168Ki
nvidia.com/gpu: "3" # Add 3 fake GPUs in capacity
pods: "110"
```
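- If you prefer a non-interactive alternative to the `edit-status` plugin, the same change can be sketched as a merge patch against the node's `status` subresource (assumes your `kubectl` supports the `--subresource` flag, available since v1.24).
```bash
kubectl patch node cluster2-control-plane --context kind-cluster2 \
  --subresource=status --type=merge \
  -p '{"status":{"capacity":{"nvidia.com/gpu":"3"},"allocatable":{"nvidia.com/gpu":"3"}}}'
```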
- In this environment, cluster1 has no GPUs, while cluster2 and cluster3 each have 3 GPUs.
Check `AddonPlacementScore`: scores range from -100 to 100, and clusters with more resources available score higher.
Here cluster1, which has no GPUs, should have a score of -100, and the cluster already running a workload (from story 2, one workload runs on `kind-cluster3`) will have a lower score than an idle one.
```bash
kubectl get addonplacementscore -A -ojson | jq '.items[] | .metadata.name, .status.scores[5]'
"resource-usage-score"
{
"name": "gpuClusterAvailable",
"value": -100
}
"resource-usage-score" # kind-cluster2 has no workload.
{
"name": "gpuClusterAvailable",
"value": -70
}
"resource-usage-score" # kind-cluster3 has a workload from story 2, so it has fewer GPU available, thus lower score.
{
"name": "gpuClusterAvailable",
"value": -80
}
```
- Apply the changes in the `Placement` to update MultiKueue dynamically.
```bash
kubectl apply -f ./placement-demo2-2.yaml
```
- Review the update in the `MultiKueueConfig`.
```bash
kubectl get multikueueconfig
NAME AGE
placement-demo2 22m
kubectl get multikueueconfig placement-demo2 -oyaml
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueConfig
metadata:
creationTimestamp: "2024-08-31T22:03:16Z"
generation: 5
name: placement-demo2
resourceVersion: "18109"
uid: 3c16af72-94bf-4444-bf79-7e896165aabc
spec:
clusters:
- placement-demo2-cluster2 # cluster2 has a higher GPU score, so it got selected by the placement decision.
```
- Create a job for the updated MultiKueue and check the workload. This time the workload is admitted by `kind-cluster2`; `kind-cluster3` only shows the old workload from story 2.
```bash
kubectl create -f ./job-demo2.yaml
kubectl get workload --context kind-cluster2
NAME QUEUE RESERVED IN ADMITTED AGE
job-demo2-jobxn888-4b91e user-queue-demo2 cluster-queue-demo2 True 6s
kubectl get workload --context kind-cluster3
NAME QUEUE RESERVED IN ADMITTED AGE
job-demo2-jobl2t6d-a8cdd user-queue-demo2 cluster-queue-demo2 True 9m13s
```
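- Since `AddonPlacementScore` values are refreshed over time, the generated configuration adjusts on its own as scores change; leaving a watch running makes the adjustment visible.
```bash
kubectl get multikueueconfig placement-demo2 -oyaml -w
```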
## Design Details
### OCM Admission Check Controller
The OCM Admission Check Controller will integrate OCM `Placement` results into MultiKueue by reading `Placement` decisions and generating the necessary `MultiKueueConfig` and `MultiKueueCluster` resources.
- `controllerName`: Identifies the controller that processes the Admission Check; currently set to `open-cluster-management.io/placement`.
- `parameters`: Identifies a configuration with additional parameters for the check; here it references the existing OCM `Placement`. Clusters specified in the `Placement` will be bound to the `kueue-system` namespace.
Example OCM Admission Check Controller design:
```yaml
# OCM implements an admissioncheck controller to automate the MultiKueue setup process.
# MultiKueueConfigs and MultiKueueClusters are generated dynamically based on OCM placement decisions.
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
name: placement-demo2
spec:
controllerName: open-cluster-management.io/placement
parameters:
apiGroup: cluster.open-cluster-management.io
kind: Placement
name: placement-demo2
# Leverages OCM's placement mechanism to select clusters based on specific criteria.
  # For example, `placement-demo2-1` selects clusters with the `nvidia-tesla-t4` accelerator label.
```
### Changes in the Configuration Process with OCM Admission Check Controller
Using the OCM Admission Check Controller significantly simplifies the configuration process for system administrators by automating several manual tasks.
#### Before Using OCM Admission Check Controller
In the traditional setup, administrators must manually configure both `MultiKueueConfig` and `MultiKueueCluster` resources:
- **MultiKueueConfig**: Defines which clusters are part of the [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) environment. Admins need to specify each cluster manually.
- **MultiKueueCluster**: Each cluster requires a `MultiKueueCluster` resource, which includes a kubeconfig secret that administrators must create manually for secure communication.
```yaml
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueConfig
metadata:
name: multikueue-config
spec:
clusters:
- multikueue-cluster1
- multikueue-cluster2
---
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueCluster
metadata:
name: multikueue-cluster1
spec:
kubeConfig:
locationType: Secret
location: kueue-admin-cluster1-kubeconfig
---
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueCluster
metadata:
name: multikueue-cluster2
spec:
kubeConfig:
locationType: Secret
location: kueue-admin-cluster2-kubeconfig
```
#### After Using OCM Admission Check Controller
With the OCM Admission Check Controller, manual configuration of `MultiKueueConfig` and `MultiKueueCluster` is no longer needed. Instead, the administrator only needs to configure two admission checks in the `ClusterQueue` resource:
`multikueue-demo2` and `placement-demo2` (see `multikueue-setup-demo2.yaml`), which leverage OCM's placement mechanism to select clusters based on specific criteria and automate the setup of `MultiKueueConfig` and `MultiKueueCluster`.
```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
name: "cluster-queue-demo2"
spec:
namespaceSelector: {} # match all.
resourceGroups:
- coveredResources: ["cpu", "memory","nvidia.com/gpu"]
flavors:
- name: "default-flavor-demo2"
resources:
- name: "cpu"
nominalQuota: 9
- name: "memory"
nominalQuota: 36Gi
- name: "nvidia.com/gpu"
nominalQuota: 3
admissionChecks:
- multikueue-demo2
- placement-demo2
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
name: multikueue-demo2
spec:
controllerName: kueue.x-k8s.io/multikueue
parameters:
apiGroup: kueue.x-k8s.io
kind: MultiKueueConfig
name: placement-demo2
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
name: placement-demo2
spec:
controllerName: open-cluster-management.io/placement
parameters:
apiGroup: cluster.open-cluster-management.io
kind: Placement
name: placement-demo2
```
#### OCM Admission Check Controller Workflow
- The OCM Admission Check Controller retrieves the OCM `Placement` associated with an AdmissionCheck in the `kueue-system` namespace.
- It uses a `PlacementDecisionTracker` to gather the selected clusters and retrieves each cluster's `ClusterProfile` for credentials.
- The controller creates or updates `MultiKueueCluster` resources with the kubeconfig details for each cluster, and then lists these clusters in a `MultiKueueConfig` resource.
- Finally, it updates the AdmissionCheck condition to true, indicating successful generation of the `MultiKueueConfig` and `MultiKueueCluster`, readying the [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) environment for job scheduling.
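The whole chain can be traced manually with the demo 2 names; the commands below are a sketch assuming those resources are still present (OCM sets the standard `cluster.open-cluster-management.io/placement` label on each `PlacementDecision`).
```bash
# AdmissionCheck -> referenced Placement name
kubectl get admissioncheck placement-demo2 -o jsonpath='{.spec.parameters.name}'
# Placement -> clusters selected by its PlacementDecision
kubectl get placementdecisions -n kueue-system \
  -l cluster.open-cluster-management.io/placement=placement-demo2 \
  -o jsonpath='{.items[0].status.decisions[*].clusterName}'
# generated MultiKueueConfig -> MultiKueueCluster list
kubectl get multikueueconfig placement-demo2 -o jsonpath='{.spec.clusters}'
```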
## TODO
- In the future, the Admission Check Controller may be added to `featureGates` as a user-enabled feature, or possibly developed into an individual component running as a pod on the hub.
- Users may also need to enable the `ClusterProfile` feature in the `featureGates` to utilize the OCM Admission Check. This can be done by configuring the `ClusterManager` on the hub.
```yaml
apiVersion: operator.open-cluster-management.io/v1
kind: ClusterManager
metadata:
name: cluster-manager
spec:
registrationConfiguration:
featureGates:
- feature: ClusterProfile
mode: Enable
...
```
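If the `featureGates` list already exists on your `ClusterManager`, the gate can also be toggled non-interactively in the same JSON-patch style `setup-env.sh` uses; the path below assumes `registrationConfiguration.featureGates` is present.
```bash
kubectl patch clustermanager cluster-manager --type=json -p='[
  {"op": "add",
   "path": "/spec/registrationConfiguration/featureGates/-",
   "value": {"feature": "ClusterProfile", "mode": "Enable"}}
]'
```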


@@ -0,0 +1,63 @@
apiVersion: rbac.open-cluster-management.io/v1alpha1
kind: ClusterPermission
metadata:
name: kueue-admin-cluster1
namespace: cluster1
spec:
clusterRole:
rules:
- apiGroups:
- batch
resources:
- jobs
verbs:
- create
- delete
- get
- list
- watch
- apiGroups:
- batch
resources:
- jobs/status
verbs:
- get
- apiGroups:
- jobset.x-k8s.io
resources:
- jobsets
verbs:
- create
- delete
- get
- list
- watch
- apiGroups:
- jobset.x-k8s.io
resources:
- jobsets/status
verbs:
- get
- apiGroups:
- kueue.x-k8s.io
resources:
- workloads
verbs:
- create
- delete
- get
- list
- watch
- apiGroups:
- kueue.x-k8s.io
resources:
- workloads/status
verbs:
- get
- patch
- update
clusterRoleBinding:
subject:
kind: ServiceAccount
name: kueue-admin-cluster1
namespace: open-cluster-management-agent-addon


@@ -0,0 +1,63 @@
apiVersion: rbac.open-cluster-management.io/v1alpha1
kind: ClusterPermission
metadata:
name: kueue-admin-cluster2
namespace: cluster2
spec:
clusterRole:
rules:
- apiGroups:
- batch
resources:
- jobs
verbs:
- create
- delete
- get
- list
- watch
- apiGroups:
- batch
resources:
- jobs/status
verbs:
- get
- apiGroups:
- jobset.x-k8s.io
resources:
- jobsets
verbs:
- create
- delete
- get
- list
- watch
- apiGroups:
- jobset.x-k8s.io
resources:
- jobsets/status
verbs:
- get
- apiGroups:
- kueue.x-k8s.io
resources:
- workloads
verbs:
- create
- delete
- get
- list
- watch
- apiGroups:
- kueue.x-k8s.io
resources:
- workloads/status
verbs:
- get
- patch
- update
clusterRoleBinding:
subject:
kind: ServiceAccount
name: kueue-admin-cluster2
namespace: open-cluster-management-agent-addon


@@ -0,0 +1,63 @@
apiVersion: rbac.open-cluster-management.io/v1alpha1
kind: ClusterPermission
metadata:
name: kueue-admin-cluster3
namespace: cluster3
spec:
clusterRole:
rules:
- apiGroups:
- batch
resources:
- jobs
verbs:
- create
- delete
- get
- list
- watch
- apiGroups:
- batch
resources:
- jobs/status
verbs:
- get
- apiGroups:
- jobset.x-k8s.io
resources:
- jobsets
verbs:
- create
- delete
- get
- list
- watch
- apiGroups:
- jobset.x-k8s.io
resources:
- jobsets/status
verbs:
- get
- apiGroups:
- kueue.x-k8s.io
resources:
- workloads
verbs:
- create
- delete
- get
- list
- watch
- apiGroups:
- kueue.x-k8s.io
resources:
- workloads/status
verbs:
- get
- patch
- update
clusterRoleBinding:
subject:
kind: ServiceAccount
name: kueue-admin-cluster3
namespace: open-cluster-management-agent-addon


@@ -0,0 +1,7 @@
apiVersion: authentication.open-cluster-management.io/v1beta1
kind: ManagedServiceAccount
metadata:
name: kueue-admin-cluster1
namespace: cluster1
spec:
rotation: {}


@@ -0,0 +1,7 @@
apiVersion: authentication.open-cluster-management.io/v1beta1
kind: ManagedServiceAccount
metadata:
name: kueue-admin-cluster2
namespace: cluster2
spec:
rotation: {}


@@ -0,0 +1,7 @@
apiVersion: authentication.open-cluster-management.io/v1beta1
kind: ManagedServiceAccount
metadata:
name: kueue-admin-cluster3
namespace: cluster3
spec:
rotation: {}


@@ -0,0 +1,219 @@
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.14.0
name: clusterprofiles.multicluster.x-k8s.io
spec:
group: multicluster.x-k8s.io
names:
kind: ClusterProfile
listKind: ClusterProfileList
plural: clusterprofiles
singular: clusterprofile
scope: Namespaced
versions:
- name: v1alpha1
schema:
openAPIV3Schema:
description: ClusterProfile represents a single cluster in a multi-cluster
deployment.
properties:
apiVersion:
description: |-
APIVersion defines the versioned schema of this representation of an object.
Servers should convert recognized schemas to the latest internal value, and
may reject unrecognized values.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
type: string
kind:
description: |-
Kind is a string value representing the REST resource this object represents.
Servers may infer this from the endpoint the client submits requests to.
Cannot be updated.
In CamelCase.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
type: string
metadata:
type: object
spec:
description: ClusterProfileSpec defines the desired state of ClusterProfile.
properties:
clusterManager:
description: ClusterManager defines which cluster manager owns this
ClusterProfile resource
properties:
name:
description: Name defines the name of the cluster manager
type: string
required:
- name
type: object
x-kubernetes-validations:
- message: ClusterManager is immutable
rule: self == oldSelf
displayName:
description: DisplayName defines a human-readable name of the ClusterProfile
type: string
required:
- clusterManager
type: object
status:
description: ClusterProfileStatus defines the observed state of ClusterProfile.
properties:
conditions:
description: Conditions contains the different condition statuses
for this cluster.
items:
description: "Condition contains details for one aspect of the current
state of this API Resource.\n---\nThis struct is intended for
direct use as an array at the field path .status.conditions. For
example,\n\n\n\ttype FooStatus struct{\n\t // Represents the
observations of a foo's current state.\n\t // Known .status.conditions.type
are: \"Available\", \"Progressing\", and \"Degraded\"\n\t //
+patchMergeKey=type\n\t // +patchStrategy=merge\n\t // +listType=map\n\t
\ // +listMapKey=type\n\t Conditions []metav1.Condition `json:\"conditions,omitempty\"
patchStrategy:\"merge\" patchMergeKey:\"type\" protobuf:\"bytes,1,rep,name=conditions\"`\n\n\n\t
\ // other fields\n\t}"
properties:
lastTransitionTime:
description: |-
lastTransitionTime is the last time the condition transitioned from one status to another.
This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
format: date-time
type: string
message:
description: |-
message is a human readable message indicating details about the transition.
This may be an empty string.
maxLength: 32768
type: string
observedGeneration:
description: |-
observedGeneration represents the .metadata.generation that the condition was set based upon.
For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date
with respect to the current state of the instance.
format: int64
minimum: 0
type: integer
reason:
description: |-
reason contains a programmatic identifier indicating the reason for the condition's last transition.
Producers of specific condition types may define expected values and meanings for this field,
and whether the values are considered a guaranteed API.
The value should be a CamelCase string.
This field may not be empty.
maxLength: 1024
minLength: 1
pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
type: string
status:
description: status of the condition, one of True, False, Unknown.
enum:
- "True"
- "False"
- Unknown
type: string
type:
description: |-
type of condition in CamelCase or in foo.example.com/CamelCase.
---
Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be
useful (see .node.status.conditions), the ability to deconflict is important.
The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt)
maxLength: 316
pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
type: string
required:
- lastTransitionTime
- message
- reason
- status
- type
type: object
type: array
credentials:
description: |-
TokenRequests describes a list of token requests on this cluster and its
approval status.
items:
properties:
accessRef:
description: RequestRef points to a specific AuthTokenRequest
object.
properties:
kind:
description: Kind is the kind of the referred token request
object.
type: string
name:
description: Name is the name of the referred token request
object.
type: string
namespace:
description: Namespace is the namespace of the referred
token request object.
type: string
required:
- kind
- name
- namespace
type: object
consumer:
type: string
required:
- accessRef
- consumer
type: object
type: array
properties:
description: |-
Properties defines name/value pairs to represent properties of a cluster.
It could be a collection of ClusterProperty (KEP-2149) resources,
but could also be info based on other implementations.
The names of the properties can be predefined names from ClusterProperty resources
and is allowed to be customized by different cluster managers.
items:
description: |-
Property defines a name/value pair to represent a property of a cluster.
It could be a ClusterProperty (KEP-2149) resource,
but could also be info based on other implementations.
The name of the property can be predefined name from a ClusterProperty resource
and is allowed to be customized by different cluster managers.
This property can store various configurable details and metrics of a cluster,
which may include information such as the number of nodes, total and free CPU,
and total and free memory, among other potential attributes.
properties:
name:
description: |-
Name is the name of a property resource on cluster. It's a well-known
or customized name to identify the property.
maxLength: 253
minLength: 1
type: string
value:
description: Value is a property-dependent string
maxLength: 1024
minLength: 1
type: string
required:
- name
- value
type: object
type: array
version:
description: Version defines the version information of the cluster.
properties:
kubernetes:
description: Kubernetes is the kubernetes version of the cluster.
type: string
type: object
type: object
required:
- spec
type: object
served: true
storage: true
subresources:
status: {}


@@ -0,0 +1,83 @@
[
{
"op": "add",
"path": "/rules/-",
"value": {
"apiGroups": ["multicluster.x-k8s.io"],
"resources": ["clusterprofiles"],
"verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]
}
},
{
"op": "add",
"path": "/rules/-",
"value": {
"apiGroups": ["multicluster.x-k8s.io"],
"resources": ["clusterprofiles/status"],
"verbs": ["update", "patch"]
}
},
{
"op": "add",
"path": "/rules/-",
"value": {
"apiGroups": ["rbac.open-cluster-management.io"],
"resources": ["clusterpermissions"],
"verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]
}
},
{
"op": "add",
"path": "/rules/-",
"value": {
"apiGroups": ["authentication.open-cluster-management.io"],
"resources": ["managedserviceaccounts"],
"verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]
}
},
{
"op": "add",
"path": "/rules/-",
"value": {
"apiGroups": ["kueue.x-k8s.io"],
"resources": ["multikueueconfigs"],
"verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]
}
},
{
"op": "add",
"path": "/rules/-",
"value": {
"apiGroups": ["kueue.x-k8s.io"],
"resources": ["multikueueclusters"],
"verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]
}
},
{
"op": "add",
"path": "/rules/-",
"value": {
"apiGroups": ["kueue.x-k8s.io"],
"resources": ["admissionchecks"],
"verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]
}
},
{
"op": "add",
"path": "/rules/-",
"value": {
"apiGroups": ["kueue.x-k8s.io"],
"resources": ["admissionchecks/status"],
"verbs": ["update", "patch"]
}
},
{
"op": "add",
"path": "/rules/-",
"value": {
"apiGroups": [""],
"resources": ["secrets"],
"verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]
}
}
]


@@ -0,0 +1,18 @@
[
{
"op": "replace",
"path": "/spec/installStrategy",
"value": {
"placements": [
{
"name": "placement-spoke",
"namespace": "default",
"rolloutStrategy": {
"type": "All"
}
}
],
"type": "Placements"
}
}
]


@@ -0,0 +1,14 @@
# clusteradm clusterset bind global --namespace default
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
name: placement-spoke
namespace: default
spec:
clusterSets:
- spoke
tolerations:
- key: cluster.open-cluster-management.io/unreachable
operator: Exists
- key: cluster.open-cluster-management.io/unavailable
operator: Exists


@@ -0,0 +1,90 @@
apiVersion: work.open-cluster-management.io/v1alpha1
kind: ManifestWorkReplicaSet
metadata:
name: single-clusterqueue
namespace: default
spec:
placementRefs:
- name: placement-spoke
manifestWorkTemplate:
workload:
manifests:
- apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: kueue-manager-ocm-rolebinding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: kueue-manager-role
subjects:
- kind: ServiceAccount
name: klusterlet-work-sa
namespace: open-cluster-management-agent
- apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: kueue-batch-admin-ocm-rolebinding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: kueue-batch-admin-role
subjects:
- kind: ServiceAccount
name: klusterlet-work-sa
namespace: open-cluster-management-agent
- apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
name: "default-flavor-demo1"
- apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
name: "cluster-queue-demo1"
spec:
namespaceSelector: {} # match all.
resourceGroups:
- coveredResources: ["cpu", "memory"]
flavors:
- name: "default-flavor-demo1"
resources:
- name: "cpu"
nominalQuota: 9
- name: "memory"
nominalQuota: 36Gi
- apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
namespace: "default"
name: "user-queue-demo1"
spec:
clusterQueue: "cluster-queue-demo1"
- apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
name: "default-flavor-demo2"
- apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
name: "cluster-queue-demo2"
spec:
namespaceSelector: {} # match all.
resourceGroups:
- coveredResources: ["cpu", "memory","nvidia.com/gpu"]
flavors:
- name: "default-flavor-demo2"
resources:
- name: "cpu"
nominalQuota: 9
- name: "memory"
nominalQuota: 36Gi
- name: "nvidia.com/gpu"
nominalQuota: 3
- apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
namespace: "default"
name: "user-queue-demo2"
spec:
clusterQueue: "cluster-queue-demo2"


@@ -0,0 +1,25 @@
apiVersion: batch/v1
kind: Job
metadata:
generateName: demo1-job
namespace: default
labels:
kueue.x-k8s.io/queue-name: user-queue-demo1
spec:
parallelism: 1
completions: 1
suspend: true
template:
spec:
containers:
- name: dummy-job
image: gcr.io/k8s-staging-perf-tests/sleep:v0.1.0
args: ["30s"]
resources:
requests:
cpu: "1"
memory: "200Mi"
limits:
cpu: "1"
memory: "200Mi"
restartPolicy: Never


@@ -0,0 +1,27 @@
apiVersion: batch/v1
kind: Job
metadata:
generateName: demo2-job
namespace: default
labels:
kueue.x-k8s.io/queue-name: "user-queue-demo2"
spec:
parallelism: 1
completions: 1
suspend: true
template:
spec:
containers:
- name: dummy-job
image: gcr.io/k8s-staging-perf-tests/sleep:v0.1.0
args: ["600s"]
resources:
requests:
cpu: "1"
memory: "200Mi"
nvidia.com/gpu: "1"
limits:
cpu: "1"
memory: "200Mi"
nvidia.com/gpu: "1" # This job requires one GPU.
restartPolicy: Never


@@ -0,0 +1,71 @@
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
name: "default-flavor-demo1"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
name: "cluster-queue-demo1"
spec:
namespaceSelector: {} # match all.
resourceGroups:
- coveredResources: ["cpu", "memory"]
flavors:
- name: "default-flavor-demo1"
resources:
- name: "cpu"
nominalQuota: 9
- name: "memory"
nominalQuota: 36Gi
admissionChecks:
- multikueue-demo1
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
namespace: "default"
name: "user-queue-demo1"
spec:
clusterQueue: "cluster-queue-demo1"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
name: multikueue-demo1
spec:
controllerName: kueue.x-k8s.io/multikueue
parameters:
apiGroup: kueue.x-k8s.io
kind: MultiKueueConfig
name: multikueue-config-demo1
---
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueConfig
metadata:
name: multikueue-config-demo1
spec:
clusters:
- multikueue-demo1-cluster1
- multikueue-demo1-cluster2
---
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueCluster
metadata:
name: multikueue-demo1-cluster1
spec:
kubeConfig:
locationType: Secret
location: kueue-admin-cluster1-kubeconfig
# a secret called "kueue-admin-cluster1-kubeconfig" must exist in the namespace the kueue
# controller manager runs in, holding the kubeconfig needed to connect to the
# worker cluster under the "kubeconfig" key.
---
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueCluster
metadata:
name: multikueue-demo1-cluster2
spec:
kubeConfig:
locationType: Secret
location: kueue-admin-cluster2-kubeconfig


@@ -0,0 +1,57 @@
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
name: "default-flavor-demo2"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
name: "cluster-queue-demo2"
spec:
namespaceSelector: {} # match all.
resourceGroups:
- coveredResources: ["cpu", "memory","nvidia.com/gpu"]
flavors:
- name: "default-flavor-demo2"
resources:
- name: "cpu"
nominalQuota: 9
- name: "memory"
nominalQuota: 36Gi
- name: "nvidia.com/gpu"
nominalQuota: 3
admissionChecks:
- multikueue-demo2
- placement-demo2
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
namespace: "default"
name: "user-queue-demo2"
spec:
clusterQueue: "cluster-queue-demo2"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
name: multikueue-demo2
spec:
controllerName: kueue.x-k8s.io/multikueue
parameters:
apiGroup: kueue.x-k8s.io
kind: MultiKueueConfig
name: placement-demo2
---
# OCM implements an admissioncheck controller to automate the MultiKueue setup process.
# MultiKueueConfigs and MultiKueueClusters are generated dynamically based on OCM placement decisions.
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
name: placement-demo2
spec:
controllerName: open-cluster-management.io/placement
parameters:
apiGroup: cluster.open-cluster-management.io
kind: Placement
name: placement-demo2


@@ -0,0 +1,18 @@
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
name: placement-demo2
namespace: kueue-system
spec:
clusterSets:
- spoke
tolerations:
- key: cluster.open-cluster-management.io/unreachable
operator: Exists
- key: cluster.open-cluster-management.io/unavailable
operator: Exists
predicates:
- requiredClusterSelector:
labelSelector:
matchLabels:
accelerator: nvidia-tesla-t4


@@ -0,0 +1,28 @@
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
name: placement-demo2
namespace: kueue-system
spec:
clusterSets:
- spoke
tolerations:
- key: cluster.open-cluster-management.io/unreachable
operator: Exists
- key: cluster.open-cluster-management.io/unavailable
operator: Exists
predicates:
- requiredClusterSelector:
labelSelector:
matchLabels:
accelerator: nvidia-tesla-t4
numberOfClusters: 1
prioritizerPolicy:
mode: Exact
configurations:
- scoreCoordinate:
type: AddOn
addOn:
resourceName: resource-usage-score
scoreName: gpuClusterAvailable
weight: 1


@@ -0,0 +1,126 @@
#!/bin/bash
cd "$(dirname "${BASH_SOURCE[0]}")"
set -e
hub=${HUB:-hub}
c1=${CLUSTER1:-cluster1}
c2=${CLUSTER2:-cluster2}
c3=${CLUSTER3:-cluster3}
hubctx="kind-${hub}"
c1ctx="kind-${c1}"
c2ctx="kind-${c2}"
c3ctx="kind-${c3}"
kind create cluster --name "${hub}" --image kindest/node:v1.29.0@sha256:eaa1450915475849a73a9227b8f201df25e55e268e5d619312131292e324d570
kind create cluster --name "${c1}" --image kindest/node:v1.29.0@sha256:eaa1450915475849a73a9227b8f201df25e55e268e5d619312131292e324d570
kind create cluster --name "${c2}" --image kindest/node:v1.29.0@sha256:eaa1450915475849a73a9227b8f201df25e55e268e5d619312131292e324d570
kind create cluster --name "${c3}" --image kindest/node:v1.29.0@sha256:eaa1450915475849a73a9227b8f201df25e55e268e5d619312131292e324d570
echo "Initialize the ocm hub cluster"
clusteradm init --feature-gates="ManifestWorkReplicaSet=true,ManagedClusterAutoApproval=true" --bundle-version="latest" --wait --context ${hubctx}
joincmd=$(clusteradm get token --context ${hubctx} | grep clusteradm)
echo "Join cluster1 to hub"
$(echo ${joincmd} --force-internal-endpoint-lookup --wait --context ${c1ctx} | sed "s/<cluster_name>/$c1/g")
echo "Join cluster2 to hub"
$(echo ${joincmd} --force-internal-endpoint-lookup --wait --context ${c2ctx} | sed "s/<cluster_name>/$c2/g")
echo "Join cluster3 to hub"
$(echo ${joincmd} --force-internal-endpoint-lookup --wait --context ${c3ctx} | sed "s/<cluster_name>/$c3/g")
echo "Accept join of cluster1 and cluster2"
clusteradm accept --context ${hubctx} --clusters ${c1},${c2},${c3} --wait
kubectl get managedclusters --all-namespaces --context ${hubctx}
echo "Install Kueue (this can be replaced with OCM Manifestwork in the future)"
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.7.1/manifests.yaml --context ${hubctx}
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.7.1/manifests.yaml --context ${c1ctx}
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.7.1/manifests.yaml --context ${c2ctx}
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.7.1/manifests.yaml --context ${c3ctx}
echo "Install Jobset for MultiKueue (this can be replaced with OCM Manifestwork in the future)"
kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.5.2/manifests.yaml --context ${hubctx}
kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.5.2/manifests.yaml --context ${c1ctx}
kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.5.2/manifests.yaml --context ${c2ctx}
kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.5.2/manifests.yaml --context ${c3ctx}
kubectl config use-context ${hubctx}
echo "Patch permission"
kubectl patch clusterrole cluster-manager --type='json' -p "$(cat env/patch-clusterrole.json)"
echo "Patch image"
kubectl patch deployment cluster-manager -n open-cluster-management --type=json -p='[
{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value": "quay.io/haoqing/registration-operator:latest"},
{"op": "replace", "path": "/spec/template/spec/containers/0/imagePullPolicy", "value": "Always"}
]'
kubectl patch clustermanager cluster-manager --type=json -p='[{"op": "replace", "path": "/spec/registrationImagePullSpec", "value": "quay.io/haoqing/registration:latest"}]'
kubectl patch clustermanager cluster-manager --type=json -p='[{"op": "replace", "path": "/spec/placementImagePullSpec", "value": "quay.io/haoqing/placement:latest"}]'
echo "Install CRDs"
kubectl create -f env/multicluster.x-k8s.io_clusterprofiles.yaml
echo "Install managed-serviceaccount"
git clone git@github.com:open-cluster-management-io/managed-serviceaccount.git || true
cd managed-serviceaccount
helm uninstall -n open-cluster-management-addon managed-serviceaccount || true
helm install \
-n open-cluster-management-addon --create-namespace \
managed-serviceaccount charts/managed-serviceaccount/ \
--set tag=latest \
--set featureGates.ephemeralIdentity=true \
--set enableAddOnDeploymentConfig=true \
--set hubDeployMode=AddOnTemplate
cd -
rm -rf managed-serviceaccount
echo "Install managed-serviceaccount mca"
clusteradm create clusterset spoke
clusteradm clusterset set spoke --clusters ${c1},${c2},${c3}
clusteradm clusterset bind spoke --namespace default
kubectl apply -f env/placement.yaml || true
kubectl patch clustermanagementaddon managed-serviceaccount --type='json' -p="$(cat env/patch-mg-sa-cma.json)" || true
echo "Install cluster-permission"
git clone git@github.com:open-cluster-management-io/cluster-permission.git || true
cd cluster-permission
kubectl apply -f config/crds
kubectl apply -f config/rbac
kubectl apply -f config/deploy
cd -
rm -rf cluster-permission
echo "Install resource-usage-collect-addon"
git clone git@github.com:open-cluster-management-io/addon-contrib.git || true
cd addon-contrib/resource-usage-collect-addon
make deploy
cd -
rm -rf addon-contrib
echo "Enable MultiKueue on the hub"
kubectl patch deployment kueue-controller-manager -n kueue-system --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args", "value": ["--config=/controller_manager_config.yaml", "--zap-log-level=2", "--feature-gates=MultiKueue=true"]}]'
echo "Setup queue on the spoke"
kubectl apply -f env/single-clusterqueue-setup-mwrs.yaml
echo "Setup credentials for clusterprofile"
kubectl apply -f env/cp-c1.yaml
kubectl apply -f env/cp-c2.yaml
kubectl apply -f env/cp-c3.yaml
kubectl apply -f env/msa-c1.yaml
kubectl apply -f env/msa-c2.yaml
kubectl apply -f env/msa-c3.yaml
echo "Setup faked GPU on the spoke"
kubectl label managedcluster cluster2 accelerator=nvidia-tesla-t4
kubectl label managedcluster cluster3 accelerator=nvidia-tesla-t4
echo "IMPORTANT: RUN BELOW COMMAND MANUALLY on cluster2 and cluster3 !!!"
echo "kubectl edit-status node cluster2-control-plane --context ${c2ctx}" with nvidia.com/gpu: "3"
echo "kubectl edit-status node cluster3-control-plane --context ${c3ctx}" with nvidia.com/gpu: "3"