🌱 OCM Kueue Admission Check Controller (#601)
* Deliver proposal for OCM Kueue admission check controller.
* Add more explanation in doc, delete unused permissions.
* Add section in doc.

Signed-off-by: Zhe Shen <xxtale02591@gmail.com>
solutions/kueue-admission-check/README.md (new file, 641 lines)
# Set up MultiKueue with OCM Kueue Admission Check Controller

This guide demonstrates how to use the external OCM [Kueue Admission Check Controller](https://kueue.sigs.k8s.io/docs/concepts/admission_check/), which integrates OCM `Placement` results with [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) for intelligent multi-cluster job scheduling.

The controller reads OCM `Placement` decisions and generates the corresponding `MultiKueueConfig` and `MultiKueueCluster` resources, streamlining the setup of the [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) environment and enabling users to select clusters based on custom criteria.

We'll walk through different user stories that showcase the power and flexibility of this integration.

## Background

### Existing Components

1. **OCM Placement and AddonPlacementScore**:

   - `Placement` dynamically selects a set of `ManagedClusters` from one or more `ManagedClusterSets` to achieve multi-cluster scheduling.
   - `AddOnPlacementScore` is an API introduced by `Placement` to support scheduling based on customized scores.

2. **Kueue MultiKueue and AdmissionChecks**:

   - [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) is a feature of Kueue for job dispatching across multiple clusters.
   - [AdmissionChecks](https://kueue.sigs.k8s.io/docs/concepts/admission_check/) are a mechanism that allows Kueue to consider additional criteria before admitting a workload. Kueue only proceeds with a workload if all associated AdmissionChecks return a positive signal.

REF: [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/), [Admission Check](https://kueue.sigs.k8s.io/docs/concepts/admission_check/), [Placement](https://open-cluster-management.io/concepts/placement/).

## Motivation

- Setting up a [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) environment for multiple clusters is a complex and manual process, often requiring users to create `MultiKueueCluster` and `MultiKueueConfig` resources for each worker cluster individually.

- Driven by the growing need for optimal compute resource utilization, particularly in AI/ML workloads, multi-cluster users increasingly seek to leverage the OCM framework with [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) for intelligent cluster selection.

REF: [Setup a MultiKueue environment](https://kueue.sigs.k8s.io/docs/tasks/manage/setup_multikueue/#multikueue-specific-kubeconfig)

## Prerequisites

1. A Kubernetes environment with OCM installed on a hub cluster and at least three managed clusters.
2. [Kueue](https://kueue.sigs.k8s.io/docs/installation/) deployed across all clusters.
3. [Managed-serviceaccount](https://github.com/open-cluster-management-io/managed-serviceaccount), [cluster-permission](https://github.com/open-cluster-management-io/cluster-permission) and [resource-usage-collect-addon](https://github.com/open-cluster-management-io/addon-contrib/tree/main/resource-usage-collect-addon) installed on the managed clusters.

- You can set all of the above up by running:
```bash
./setup-env.sh
```
**Notice**: Currently, this functionality relies on `ClusterProfile` support and on the user manually installing the Admission Check Controller.
OCM achieves this by replacing some OCM images in `setup-env.sh`. In the future, we plan to address the items listed in the [TODO section](#todo).

After that, you can verify your setup.

- Check the managed clusters.

```bash
kubectl get mcl
NAME       HUB ACCEPTED   MANAGED CLUSTER URLS                  JOINED   AVAILABLE   AGE
cluster1   true           https://cluster1-control-plane:6443   True     True        116s
cluster2   true           https://cluster2-control-plane:6443   True     True        94s
cluster3   true           https://cluster3-control-plane:6443   True     True        73s
```
- Verify the installed addons.
```bash
kubectl get mca -A
NAMESPACE   NAME                     AVAILABLE   DEGRADED   PROGRESSING
cluster1    managed-serviceaccount   True                   False
cluster1    resource-usage-collect   True                   False
cluster2    managed-serviceaccount   True                   False
cluster2    resource-usage-collect   True                   False
cluster3    managed-serviceaccount   True                   False
cluster3    resource-usage-collect   True                   False
```
- Confirm Kueue is running on the clusters.
```bash
kubectl get pods -n kueue-system --context kind-hub # Same for the managed clusters.
NAME                                       READY   STATUS    RESTARTS   AGE
kueue-controller-manager-87bd7888b-gqk4g   2/2     Running   0          69s
```

- On the hub cluster, check the `ClusterProfiles`.
```bash
kubectl get clusterprofile -A
NAMESPACE                 NAME       AGE
open-cluster-management   cluster1   23s
open-cluster-management   cluster2   23s
open-cluster-management   cluster3   23s
```
- The `ClusterProfile` status contains credentials that Kueue can use.
```bash
kubectl get clusterprofile -A -ojson | jq '.items[] | .metadata.name, .status.credentials[]'
"cluster1"
{
  "accessRef": {
    "kind": "Secret",
    "name": "kueue-admin-cluster1-kubeconfig",
    "namespace": "kueue-system"
  },
  "consumer": "kueue-admin"
}
"cluster2"
{
  "accessRef": {
    "kind": "Secret",
    "name": "kueue-admin-cluster2-kubeconfig",
    "namespace": "kueue-system"
  },
  "consumer": "kueue-admin"
}
"cluster3"
{
  "accessRef": {
    "kind": "Secret",
    "name": "kueue-admin-cluster3-kubeconfig",
    "namespace": "kueue-system"
  },
  "consumer": "kueue-admin"
}
```
- On the hub cluster, check that secrets containing a `kubeconfig` for each managed cluster have been created under the `kueue-system` namespace.
```bash
kubectl get secret -n kueue-system
NAME                              TYPE     DATA   AGE
kueue-admin-cluster1-kubeconfig   Opaque   1      4m4s
kueue-admin-cluster2-kubeconfig   Opaque   1      4m4s
kueue-admin-cluster3-kubeconfig   Opaque   1      4m4s
kueue-webhook-server-cert         Opaque   4      5m27s
```
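If you want to inspect one of these kubeconfigs, you can decode it from the secret; a small sketch, assuming the secret stores the data under the `kubeconfig` key (the key MultiKueue reads, as noted in `multikueue-setup-demo1.yaml`):

```bash
# Decode the kubeconfig that MultiKueue will use for cluster1 and peek at it.
kubectl get secret kueue-admin-cluster1-kubeconfig -n kueue-system \
  -o jsonpath='{.data.kubeconfig}' | base64 -d | head -n 20
```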
## User Stories

#### Story 1

As an admin, I want to automate [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) configuration across multiple clusters, so that I can streamline the setup process without manual intervention.

- With the help of the `ClusterProfile` API, we can easily set up the MultiKueue environment.
```bash
kubectl apply -f ./multikueue-setup-demo1.yaml
```
- After that, check the status of the `MultiKueueClusters`, `AdmissionChecks` and `ClusterQueues`.

```bash
kubectl get multikueuecluster -A -ojson | jq '.items[] | .metadata.name, .status.conditions'
kubectl get admissionchecks -ojson | jq '.items[] | .metadata.name, .status.conditions'
kubectl get clusterqueues -ojson | jq '.items[] | .metadata.name, .status.conditions'
```
Success is indicated when the conditions show `"status": "True"` with reasons like `"Active"` or `"Ready"`.

```bash
"multikueue-demo1-cluster1"
[
  {
    "lastTransitionTime": "2024-08-31T20:41:41Z",
    "message": "Connected",
    "observedGeneration": 1,
    "reason": "Active",
    "status": "True",
    "type": "Active"
  }
]
"multikueue-demo1-cluster2"
[
  {
    "lastTransitionTime": "2024-08-31T20:41:41Z",
    "message": "Connected",
    "observedGeneration": 1,
    "reason": "Active",
    "status": "True",
    "type": "Active"
  }
]
"multikueue-demo1"
[
  {
    "lastTransitionTime": "2024-08-31T20:41:41Z",
    "message": "The admission check is active",
    "observedGeneration": 1,
    "reason": "Active",
    "status": "True",
    "type": "Active"
  },
  {
    "lastTransitionTime": "2024-08-31T20:41:41Z",
    "message": "only one multikueue managed admission check can be used in one ClusterQueue",
    "observedGeneration": 1,
    "reason": "MultiKueue",
    "status": "True",
    "type": "SingleInstanceInClusterQueue"
  },
  {
    "lastTransitionTime": "2024-08-31T20:41:41Z",
    "message": "admission check cannot be applied at ResourceFlavor level",
    "observedGeneration": 1,
    "reason": "MultiKueue",
    "status": "True",
    "type": "FlavorIndependent"
  }
]
"cluster-queue-demo1"
[
  {
    "lastTransitionTime": "2024-08-31T20:41:41Z",
    "message": "Can admit new workloads",
    "observedGeneration": 1,
    "reason": "Ready",
    "status": "True",
    "type": "Active"
  }
]
```
- Deploy a job to the MultiKueue.

```bash
kubectl create -f ./job-demo1.yaml
```
- Check the workload on the managed clusters. When the job's Workload receives a QuotaReservation in the manager cluster, a copy of the Workload is created in all configured worker clusters.
Once `kind-cluster1` admits the workload, the manager removes the corresponding copies from the other clusters (here `kind-cluster2`).
```bash
kubectl get workload --context kind-cluster1
NAME                       QUEUE              RESERVED IN           ADMITTED   AGE
job-demo1-jobnktc6-6c5f3   user-queue-demo1   cluster-queue-demo1   True       5s

kubectl get workload --context kind-cluster2
No resources found in default namespace. # After cluster1 admitted the workload, no workload should show up here.
```
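To double-check that the dispatched job actually runs on the admitting cluster, you can also look at the Job and its pods there; a sketch (the generated job name will differ in your environment, and the demo jobs live in the `default` namespace per `job-demo1.yaml`):

```bash
# Inspect the job and its pods on the cluster that admitted the workload.
kubectl get jobs -n default --context kind-cluster1
kubectl get pods -n default --context kind-cluster1
```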
#### Story 2

As an admin, I want to use OCM `Placement` results for scheduling, so that clusters with specific attributes, like those with the `nvidia-t4` GPU accelerator label, are automatically selected and converted into a [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) environment for targeted workload deployment.

- You can manually label the accelerators on the clusters.
```bash
kubectl label managedcluster cluster2 accelerator=nvidia-tesla-t4
kubectl label managedcluster cluster3 accelerator=nvidia-tesla-t4
```
`placement-demo2-1.yaml` selects clusters with the `nvidia-tesla-t4` accelerator label.
```yaml
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: placement-demo2
  namespace: kueue-system
spec:
  clusterSets:
  - spoke
  tolerations:
  - key: cluster.open-cluster-management.io/unreachable
    operator: Exists
  - key: cluster.open-cluster-management.io/unavailable
    operator: Exists
  predicates:
  - requiredClusterSelector:
      labelSelector:
        matchLabels:
          accelerator: nvidia-tesla-t4
```
- Bind the cluster set to the Kueue namespace and verify the bindings.

```bash
clusteradm clusterset bind spoke --namespace kueue-system
clusteradm get clustersets
<ManagedClusterSet>
└── <spoke>
    └── <BoundNamespace> default,kueue-system
    └── <Status> 3 ManagedClusters selected
    └── <Clusters> [cluster1 cluster2 cluster3]
```

- Apply the placement policy.

```bash
kubectl apply -f placement-demo2-1.yaml
```

- Apply the MultiKueue setup configuration.

```bash
kubectl apply -f ./multikueue-setup-demo2.yaml
```

- Check the `MultiKueueConfig` and `MultiKueueClusters`.

```bash
kubectl get multikueueconfig
NAME              AGE
placement-demo2   60s

kubectl get multikueuecluster
NAME                       AGE
placement-demo2-cluster2   60s
placement-demo2-cluster3   60s
```
- After that, check the status of the `MultiKueueClusters`, `AdmissionChecks` and `ClusterQueues`.
```bash
kubectl get multikueuecluster -A -ojson | jq '.items[] | .metadata.name, .status.conditions'
kubectl get admissionchecks -ojson | jq '.items[] | .metadata.name, .status.conditions'
kubectl get clusterqueues -ojson | jq '.items[] | .metadata.name, .status.conditions'
```
On success, the conditions show `"status": "True"` with reasons like `"Active"` or `"Ready"`.
```bash
"placement-demo2-cluster2"
[
  {
    "lastTransitionTime": "2024-08-31T22:03:16Z",
    "message": "Connected",
    "observedGeneration": 1,
    "reason": "Active",
    "status": "True",
    "type": "Active"
  }
]
"placement-demo2-cluster3"
[
  {
    "lastTransitionTime": "2024-08-31T22:03:16Z",
    "message": "Connected",
    "observedGeneration": 1,
    "reason": "Active",
    "status": "True",
    "type": "Active"
  }
]
"multikueue-demo2" # The status of the admissioncheck `multikueue-demo2`
[
  {
    "lastTransitionTime": "2024-08-31T22:03:16Z",
    "message": "The admission check is active",
    "observedGeneration": 1,
    "reason": "Active",
    "status": "True",
    "type": "Active"
  },
  {
    "lastTransitionTime": "2024-08-31T22:03:16Z",
    "message": "only one multikueue managed admission check can be used in one ClusterQueue",
    "observedGeneration": 1,
    "reason": "MultiKueue",
    "status": "True",
    "type": "SingleInstanceInClusterQueue"
  },
  {
    "lastTransitionTime": "2024-08-31T22:03:16Z",
    "message": "admission check cannot be applied at ResourceFlavor level",
    "observedGeneration": 1,
    "reason": "MultiKueue",
    "status": "True",
    "type": "FlavorIndependent"
  }
]
"placement-demo2" # The status of the admissioncheck `placement-demo2`
[
  {
    "lastTransitionTime": "2024-08-31T22:03:16Z",
    "message": "MultiKueueConfig and MultiKueueCluster generated",
    "reason": "Active",
    "status": "True",
    "type": "Active"
  }
]
"cluster-queue-demo2"
[
  {
    "lastTransitionTime": "2024-08-31T22:03:16Z",
    "message": "Can admit new workloads",
    "observedGeneration": 1,
    "reason": "Ready",
    "status": "True",
    "type": "Active"
  }
]
```
- Submit a job requesting GPU resources to the MultiKueue.
```bash
kubectl create -f ./job-demo2.yaml
```
- Check the workloads on the managed clusters. As explained in Story 1, once one cluster (here `kind-cluster3`) has admitted the workload, the manager removes the corresponding copies from the other clusters (here `kind-cluster2`).
```bash
kubectl get workload --context kind-cluster2
No resources found in default namespace.

kubectl get workload --context kind-cluster3
NAME                       QUEUE              RESERVED IN           ADMITTED   AGE
job-demo2-jobl2t6d-a8cdd   user-queue-demo2   cluster-queue-demo2   True       3s
```
#### Story 3

As an admin, I want to leverage OCM's `AddonPlacementScore` for dynamic workload scheduling, so that clusters with higher GPU scores (indicating more available GPU resources) are selected and converted into a [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) environment, which automatically adjusts by adding or removing clusters as scores change.

`placement-demo2-2.yaml` selects clusters with the `nvidia-tesla-t4` accelerator label, and picks the one cluster with the highest GPU score, i.e. the one with the most available GPU resources.

```yaml
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: placement-demo2
  namespace: kueue-system
spec:
  clusterSets:
  - spoke
  tolerations:
  - key: cluster.open-cluster-management.io/unreachable
    operator: Exists
  - key: cluster.open-cluster-management.io/unavailable
    operator: Exists
  predicates:
  - requiredClusterSelector:
      labelSelector:
        matchLabels:
          accelerator: nvidia-tesla-t4
  numberOfClusters: 1
  prioritizerPolicy:
    mode: Exact
    configurations:
    - scoreCoordinate:
        type: AddOn
        addOn:
          resourceName: resource-usage-score
          scoreName: gpuClusterAvailable
      weight: 1
```
- You can manually edit the GPU resources on the managed clusters for testing; for example, on `kind-cluster2`, set 3 fake GPUs on the control-plane node. (Note: `edit-status` is a kubectl plugin, not a built-in subcommand.)
```bash
kubectl edit-status node cluster2-control-plane --context kind-cluster2 # Same operation for the other clusters/nodes.
```
- Edit the `status` of the node `cluster2-control-plane`:
```yaml
allocatable:
  cpu: "8"
  ephemeral-storage: 61202244Ki
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  hugepages-32Mi: "0"
  hugepages-64Ki: "0"
  memory: 8027168Ki
  nvidia.com/gpu: "3" # Add 3 fake GPUs in allocatable
  pods: "110"
capacity:
  cpu: "8"
  ephemeral-storage: 61202244Ki
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  hugepages-32Mi: "0"
  hugepages-64Ki: "0"
  memory: 8027168Ki
  nvidia.com/gpu: "3" # Add 3 fake GPUs in capacity
  pods: "110"
```

- In this environment, cluster1 has no GPUs, while cluster2 and cluster3 each have 3 GPUs.
Check the `AddOnPlacementScores`: scores range from -100 to 100, and clusters with more available resources get higher scores.
Here cluster1, which has no GPUs, has a score of -100, and the cluster already running a workload (from Story 2, one workload runs on `kind-cluster3`) gets a lower score than the idle one.
```bash
kubectl get addonplacementscore -A -ojson | jq '.items[] | .metadata.name, .status.scores[5]'
"resource-usage-score" # kind-cluster1 has no GPUs.
{
  "name": "gpuClusterAvailable",
  "value": -100
}
"resource-usage-score" # kind-cluster2 has no workload.
{
  "name": "gpuClusterAvailable",
  "value": -70
}
"resource-usage-score" # kind-cluster3 has a workload from Story 2, so it has fewer GPUs available and thus a lower score.
{
  "name": "gpuClusterAvailable",
  "value": -80
}
```
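Since every `AddOnPlacementScore` object is named `resource-usage-score`, it can be easier to print the cluster namespace next to the GPU score; the variation below selects the score by name instead of relying on its index in the list:

```bash
# One line per cluster: "<cluster-namespace>: <gpuClusterAvailable score>"
kubectl get addonplacementscore -A -ojson \
  | jq -r '.items[] | "\(.metadata.namespace): \(.status.scores[] | select(.name == "gpuClusterAvailable") | .value)"'
```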
- Apply the updated `Placement` to reconfigure the MultiKueue dynamically.
```bash
kubectl apply -f ./placement-demo2-2.yaml
```

- Review the update in the `MultiKueueConfig`.
```bash
kubectl get multikueueconfig
NAME              AGE
placement-demo2   22m

kubectl get multikueueconfig placement-demo2 -oyaml
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueConfig
metadata:
  creationTimestamp: "2024-08-31T22:03:16Z"
  generation: 5
  name: placement-demo2
  resourceVersion: "18109"
  uid: 3c16af72-94bf-4444-bf79-7e896165aabc
spec:
  clusters:
  - placement-demo2-cluster2 # cluster2 has a higher GPU score, so it got selected by the placement decision.
```
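Optionally, you can watch the cluster list being rewritten as placement decisions change; a sketch using the Story 2/3 resource names:

```bash
# Prints the .spec.clusters list on every update event.
kubectl get multikueueconfig placement-demo2 -w -o jsonpath='{.spec.clusters}{"\n"}'
```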
- Create a job for the updated MultiKueue and check the workloads. This time the workload is admitted by `kind-cluster2`, while `kind-cluster3` only shows the old workload from Story 2.
```bash
kubectl create -f ./job-demo2.yaml
kubectl get workload --context kind-cluster2
NAME                       QUEUE              RESERVED IN           ADMITTED   AGE
job-demo2-jobxn888-4b91e   user-queue-demo2   cluster-queue-demo2   True       6s

kubectl get workload --context kind-cluster3
NAME                       QUEUE              RESERVED IN           ADMITTED   AGE
job-demo2-jobl2t6d-a8cdd   user-queue-demo2   cluster-queue-demo2   True       9m13s
```

## Design Details

### OCM Admission Check Controller

The OCM Admission Check Controller integrates OCM `Placement` results into MultiKueue by reading `Placement` decisions and generating the necessary `MultiKueueConfig` and `MultiKueueCluster` resources.

- `controllerName`: Identifies the controller that processes the Admission Check, currently set to `open-cluster-management.io/placement`.
- `parameters`: Identifies a configuration with additional parameters for the check; here we reference the existing OCM `Placement`. Clusters specified in the `Placement` are bound to the `kueue-system` namespace.

Example OCM Admission Check Controller design:

```yaml
# OCM implements an admissioncheck controller to automate the MultiKueue setup process.
# MultiKueueConfigs and MultiKueueClusters are generated dynamically based on OCM placement decisions.
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: placement-demo2
spec:
  controllerName: open-cluster-management.io/placement
  parameters:
    apiGroup: cluster.open-cluster-management.io
    kind: Placement
    name: placement-demo2
# Leverages OCM's placement mechanism to select clusters based on specific criteria.
# For example, `placement-demo2-1.yaml` selects clusters with the `nvidia-tesla-t4` accelerator label.
```

### Changes in the Configuration Process with OCM Admission Check Controller

Using the OCM Admission Check Controller significantly simplifies the configuration process for system administrators by automating several manual tasks.

#### Before Using OCM Admission Check Controller

In the traditional setup, administrators must manually configure both `MultiKueueConfig` and `MultiKueueCluster` resources:

- **MultiKueueConfig**: Defines which clusters are part of the [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) environment. Admins need to specify each cluster manually.
- **MultiKueueCluster**: Each cluster requires a `MultiKueueCluster` resource, which references a kubeconfig secret that administrators must create manually for secure communication (see the sketch after the example below).

```yaml
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueConfig
metadata:
  name: multikueue-config
spec:
  clusters:
  - multikueue-cluster1
  - multikueue-cluster2
---
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueCluster
metadata:
  name: multikueue-cluster1
spec:
  kubeConfig:
    locationType: Secret
    location: kueue-admin-cluster1-kubeconfig
---
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueCluster
metadata:
  name: multikueue-cluster2
spec:
  kubeConfig:
    locationType: Secret
    location: kueue-admin-cluster2-kubeconfig
```
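For reference, creating such a kubeconfig secret by hand would look roughly like this; a sketch, where `worker1.kubeconfig` is a hypothetical file you exported from the worker cluster, and the data goes under the `kubeconfig` key that MultiKueue reads:

```bash
# Hypothetical manual secret creation for the first worker cluster.
kubectl create secret generic kueue-admin-cluster1-kubeconfig \
  -n kueue-system \
  --from-file=kubeconfig=worker1.kubeconfig
```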
#### After Using OCM Admission Check Controller

With the OCM Admission Check Controller, the need for manual configuration of `MultiKueueConfig` and `MultiKueueCluster` is eliminated. Instead, the administrator only needs to configure two additional admission checks in the `ClusterQueue` resource:
`multikueue-demo2` and `placement-demo2` (see `multikueue-setup-demo2.yaml`), which leverage OCM's placement mechanism to select clusters based on specific criteria and automate the setup of `MultiKueueConfig` and `MultiKueueCluster`.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue-demo2"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "default-flavor-demo2"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 3
  admissionChecks:
  - multikueue-demo2
  - placement-demo2
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: multikueue-demo2
spec:
  controllerName: kueue.x-k8s.io/multikueue
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: MultiKueueConfig
    name: placement-demo2
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: placement-demo2
spec:
  controllerName: open-cluster-management.io/placement
  parameters:
    apiGroup: cluster.open-cluster-management.io
    kind: Placement
    name: placement-demo2
```

#### OCM Admission Check Controller Workflow

- The OCM Admission Check Controller retrieves the OCM `Placement` associated with an AdmissionCheck in the `kueue-system` namespace.
- It uses a `PlacementDecisionTracker` to gather the selected clusters and retrieves their `ClusterProfiles` for credentials.
- The controller creates or updates `MultiKueueCluster` resources with the kubeconfig details for each cluster, and then lists these clusters in a `MultiKueueConfig` resource.
- Finally, it sets the AdmissionCheck condition to true, indicating successful generation of the `MultiKueueConfig` and `MultiKueueClusters` and readying the [MultiKueue](https://kueue.sigs.k8s.io/docs/concepts/multikueue/) environment for job scheduling. You can trace this chain with the commands shown below.
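A sketch tracing the chain on the hub once an AdmissionCheck is active, using the Story 2 names (the PlacementDecision label shown is the one OCM conventionally sets; adjust if it differs in your version):

```bash
# 1. Placement decision: which clusters were selected.
kubectl get placementdecision -n kueue-system \
  -l cluster.open-cluster-management.io/placement=placement-demo2 -o yaml
# 2. Generated MultiKueueConfig, named after the Placement.
kubectl get multikueueconfig placement-demo2 -o yaml
# 3. Generated MultiKueueClusters, one per selected cluster.
kubectl get multikueuecluster
# 4. AdmissionCheck condition flipped to Active.
kubectl get admissioncheck placement-demo2 \
  -o jsonpath='{.status.conditions[?(@.type=="Active")].message}'
```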
## TODO

- In the future, the AdmissionCheck controller may be added to `featureGates` as a user-enabled feature, or possibly developed into an individual component running as a pod on the hub.
- Users may also need to enable the `ClusterProfile` feature gate to utilize the OCM Admission Check. This can be done by configuring the `ClusterManager` on the hub.
```yaml
apiVersion: operator.open-cluster-management.io/v1
kind: ClusterManager
metadata:
  name: cluster-manager
spec:
  registrationConfiguration:
    featureGates:
    - feature: ClusterProfile
      mode: Enable
  ...
```
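If the `ClusterManager` already exists, the same change can be applied in place; a sketch using a merge patch (note that a merge patch replaces the whole `featureGates` list, so include any other gates you rely on):

```bash
# Enable the ClusterProfile feature gate on an existing ClusterManager.
kubectl patch clustermanager cluster-manager --type=merge \
  -p '{"spec":{"registrationConfiguration":{"featureGates":[{"feature":"ClusterProfile","mode":"Enable"}]}}}'
```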
solutions/kueue-admission-check/env/cp-c1.yaml (vendored, new file, 63 lines)
```yaml
apiVersion: rbac.open-cluster-management.io/v1alpha1
kind: ClusterPermission
metadata:
  name: kueue-admin-cluster1
  namespace: cluster1
spec:
  clusterRole:
    rules:
    - apiGroups:
      - batch
      resources:
      - jobs
      verbs:
      - create
      - delete
      - get
      - list
      - watch
    - apiGroups:
      - batch
      resources:
      - jobs/status
      verbs:
      - get
    - apiGroups:
      - jobset.x-k8s.io
      resources:
      - jobsets
      verbs:
      - create
      - delete
      - get
      - list
      - watch
    - apiGroups:
      - jobset.x-k8s.io
      resources:
      - jobsets/status
      verbs:
      - get
    - apiGroups:
      - kueue.x-k8s.io
      resources:
      - workloads
      verbs:
      - create
      - delete
      - get
      - list
      - watch
    - apiGroups:
      - kueue.x-k8s.io
      resources:
      - workloads/status
      verbs:
      - get
      - patch
      - update
  clusterRoleBinding:
    subject:
      kind: ServiceAccount
      name: kueue-admin-cluster1
      namespace: open-cluster-management-agent-addon
```
solutions/kueue-admission-check/env/cp-c2.yaml (vendored, new file, 63 lines)
```yaml
apiVersion: rbac.open-cluster-management.io/v1alpha1
kind: ClusterPermission
metadata:
  name: kueue-admin-cluster2
  namespace: cluster2
spec:
  clusterRole:
    rules:
    - apiGroups:
      - batch
      resources:
      - jobs
      verbs:
      - create
      - delete
      - get
      - list
      - watch
    - apiGroups:
      - batch
      resources:
      - jobs/status
      verbs:
      - get
    - apiGroups:
      - jobset.x-k8s.io
      resources:
      - jobsets
      verbs:
      - create
      - delete
      - get
      - list
      - watch
    - apiGroups:
      - jobset.x-k8s.io
      resources:
      - jobsets/status
      verbs:
      - get
    - apiGroups:
      - kueue.x-k8s.io
      resources:
      - workloads
      verbs:
      - create
      - delete
      - get
      - list
      - watch
    - apiGroups:
      - kueue.x-k8s.io
      resources:
      - workloads/status
      verbs:
      - get
      - patch
      - update
  clusterRoleBinding:
    subject:
      kind: ServiceAccount
      name: kueue-admin-cluster2
      namespace: open-cluster-management-agent-addon
```
solutions/kueue-admission-check/env/cp-c3.yaml (vendored, new file, 63 lines)
```yaml
apiVersion: rbac.open-cluster-management.io/v1alpha1
kind: ClusterPermission
metadata:
  name: kueue-admin-cluster3
  namespace: cluster3
spec:
  clusterRole:
    rules:
    - apiGroups:
      - batch
      resources:
      - jobs
      verbs:
      - create
      - delete
      - get
      - list
      - watch
    - apiGroups:
      - batch
      resources:
      - jobs/status
      verbs:
      - get
    - apiGroups:
      - jobset.x-k8s.io
      resources:
      - jobsets
      verbs:
      - create
      - delete
      - get
      - list
      - watch
    - apiGroups:
      - jobset.x-k8s.io
      resources:
      - jobsets/status
      verbs:
      - get
    - apiGroups:
      - kueue.x-k8s.io
      resources:
      - workloads
      verbs:
      - create
      - delete
      - get
      - list
      - watch
    - apiGroups:
      - kueue.x-k8s.io
      resources:
      - workloads/status
      verbs:
      - get
      - patch
      - update
  clusterRoleBinding:
    subject:
      kind: ServiceAccount
      name: kueue-admin-cluster3
      namespace: open-cluster-management-agent-addon
```
solutions/kueue-admission-check/env/msa-c1.yaml (vendored, new file, 7 lines)
```yaml
apiVersion: authentication.open-cluster-management.io/v1beta1
kind: ManagedServiceAccount
metadata:
  name: kueue-admin-cluster1
  namespace: cluster1
spec:
  rotation: {}
```
solutions/kueue-admission-check/env/msa-c2.yaml (vendored, new file, 7 lines)
```yaml
apiVersion: authentication.open-cluster-management.io/v1beta1
kind: ManagedServiceAccount
metadata:
  name: kueue-admin-cluster2
  namespace: cluster2
spec:
  rotation: {}
```
solutions/kueue-admission-check/env/msa-c3.yaml (vendored, new file, 7 lines)
```yaml
apiVersion: authentication.open-cluster-management.io/v1beta1
kind: ManagedServiceAccount
metadata:
  name: kueue-admin-cluster3
  namespace: cluster3
spec:
  rotation: {}
```
solutions/kueue-admission-check/env/multicluster.x-k8s.io_clusterprofiles.yaml (vendored, new file, 219 lines)
```yaml
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.14.0
  name: clusterprofiles.multicluster.x-k8s.io
spec:
  group: multicluster.x-k8s.io
  names:
    kind: ClusterProfile
    listKind: ClusterProfileList
    plural: clusterprofiles
    singular: clusterprofile
  scope: Namespaced
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        description: ClusterProfile represents a single cluster in a multi-cluster
          deployment.
        properties:
          apiVersion:
            description: |-
              APIVersion defines the versioned schema of this representation of an object.
              Servers should convert recognized schemas to the latest internal value, and
              may reject unrecognized values.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
            type: string
          kind:
            description: |-
              Kind is a string value representing the REST resource this object represents.
              Servers may infer this from the endpoint the client submits requests to.
              Cannot be updated.
              In CamelCase.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
            type: string
          metadata:
            type: object
          spec:
            description: ClusterProfileSpec defines the desired state of ClusterProfile.
            properties:
              clusterManager:
                description: ClusterManager defines which cluster manager owns this
                  ClusterProfile resource
                properties:
                  name:
                    description: Name defines the name of the cluster manager
                    type: string
                required:
                - name
                type: object
                x-kubernetes-validations:
                - message: ClusterManager is immutable
                  rule: self == oldSelf
              displayName:
                description: DisplayName defines a human-readable name of the ClusterProfile
                type: string
            required:
            - clusterManager
            type: object
          status:
            description: ClusterProfileStatus defines the observed state of ClusterProfile.
            properties:
              conditions:
                description: Conditions contains the different condition statuses
                  for this cluster.
                items:
                  description: "Condition contains details for one aspect of the current
                    state of this API Resource.\n---\nThis struct is intended for
                    direct use as an array at the field path .status.conditions. For
                    example,\n\n\n\ttype FooStatus struct{\n\t    // Represents the
                    observations of a foo's current state.\n\t    // Known .status.conditions.type
                    are: \"Available\", \"Progressing\", and \"Degraded\"\n\t    //
                    +patchMergeKey=type\n\t    // +patchStrategy=merge\n\t    // +listType=map\n\t
                    \   // +listMapKey=type\n\t    Conditions []metav1.Condition `json:\"conditions,omitempty\"
                    patchStrategy:\"merge\" patchMergeKey:\"type\" protobuf:\"bytes,1,rep,name=conditions\"`\n\n\n\t
                    \   // other fields\n\t}"
                  properties:
                    lastTransitionTime:
                      description: |-
                        lastTransitionTime is the last time the condition transitioned from one status to another.
                        This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
                      format: date-time
                      type: string
                    message:
                      description: |-
                        message is a human readable message indicating details about the transition.
                        This may be an empty string.
                      maxLength: 32768
                      type: string
                    observedGeneration:
                      description: |-
                        observedGeneration represents the .metadata.generation that the condition was set based upon.
                        For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date
                        with respect to the current state of the instance.
                      format: int64
                      minimum: 0
                      type: integer
                    reason:
                      description: |-
                        reason contains a programmatic identifier indicating the reason for the condition's last transition.
                        Producers of specific condition types may define expected values and meanings for this field,
                        and whether the values are considered a guaranteed API.
                        The value should be a CamelCase string.
                        This field may not be empty.
                      maxLength: 1024
                      minLength: 1
                      pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
                      type: string
                    status:
                      description: status of the condition, one of True, False, Unknown.
                      enum:
                      - "True"
                      - "False"
                      - Unknown
                      type: string
                    type:
                      description: |-
                        type of condition in CamelCase or in foo.example.com/CamelCase.
                        ---
                        Many .condition.type values are consistent across resources like Available, but because arbitrary conditions can be
                        useful (see .node.status.conditions), the ability to deconflict is important.
                        The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt)
                      maxLength: 316
                      pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
                      type: string
                  required:
                  - lastTransitionTime
                  - message
                  - reason
                  - status
                  - type
                  type: object
                type: array
              credentials:
                description: |-
                  TokenRequests describes a list of token requests on this cluster and its
                  approval status.
                items:
                  properties:
                    accessRef:
                      description: RequestRef points to a specific AuthTokenRequest
                        object.
                      properties:
                        kind:
                          description: Kind is the kind of the referred token request
                            object.
                          type: string
                        name:
                          description: Name is the name of the referred token request
                            object.
                          type: string
                        namespace:
                          description: Namespace is the namespace of the referred
                            token request object.
                          type: string
                      required:
                      - kind
                      - name
                      - namespace
                      type: object
                    consumer:
                      type: string
                  required:
                  - accessRef
                  - consumer
                  type: object
                type: array
              properties:
                description: |-
                  Properties defines name/value pairs to represent properties of a cluster.
                  It could be a collection of ClusterProperty (KEP-2149) resources,
                  but could also be info based on other implementations.
                  The names of the properties can be predefined names from ClusterProperty resources
                  and is allowed to be customized by different cluster managers.
                items:
                  description: |-
                    Property defines a name/value pair to represent a property of a cluster.
                    It could be a ClusterProperty (KEP-2149) resource,
                    but could also be info based on other implementations.
                    The name of the property can be predefined name from a ClusterProperty resource
                    and is allowed to be customized by different cluster managers.
                    This property can store various configurable details and metrics of a cluster,
                    which may include information such as the number of nodes, total and free CPU,
                    and total and free memory, among other potential attributes.
                  properties:
                    name:
                      description: |-
                        Name is the name of a property resource on cluster. It's a well-known
                        or customized name to identify the property.
                      maxLength: 253
                      minLength: 1
                      type: string
                    value:
                      description: Value is a property-dependent string
                      maxLength: 1024
                      minLength: 1
                      type: string
                  required:
                  - name
                  - value
                  type: object
                type: array
              version:
                description: Version defines the version information of the cluster.
                properties:
                  kubernetes:
                    description: Kubernetes is the kubernetes version of the cluster.
                    type: string
                type: object
            type: object
        required:
        - spec
        type: object
    served: true
    storage: true
    subresources:
      status: {}
```
solutions/kueue-admission-check/env/patch-clusterrole.json (vendored, new file, 83 lines)
```json
[
  {
    "op": "add",
    "path": "/rules/-",
    "value": {
      "apiGroups": ["multicluster.x-k8s.io"],
      "resources": ["clusterprofiles"],
      "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]
    }
  },
  {
    "op": "add",
    "path": "/rules/-",
    "value": {
      "apiGroups": ["multicluster.x-k8s.io"],
      "resources": ["clusterprofiles/status"],
      "verbs": ["update", "patch"]
    }
  },
  {
    "op": "add",
    "path": "/rules/-",
    "value": {
      "apiGroups": ["rbac.open-cluster-management.io"],
      "resources": ["clusterpermissions"],
      "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]
    }
  },
  {
    "op": "add",
    "path": "/rules/-",
    "value": {
      "apiGroups": ["authentication.open-cluster-management.io"],
      "resources": ["managedserviceaccounts"],
      "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]
    }
  },
  {
    "op": "add",
    "path": "/rules/-",
    "value": {
      "apiGroups": ["kueue.x-k8s.io"],
      "resources": ["multikueueconfigs"],
      "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]
    }
  },
  {
    "op": "add",
    "path": "/rules/-",
    "value": {
      "apiGroups": ["kueue.x-k8s.io"],
      "resources": ["multikueueclusters"],
      "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]
    }
  },
  {
    "op": "add",
    "path": "/rules/-",
    "value": {
      "apiGroups": ["kueue.x-k8s.io"],
      "resources": ["admissionchecks"],
      "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]
    }
  },
  {
    "op": "add",
    "path": "/rules/-",
    "value": {
      "apiGroups": ["kueue.x-k8s.io"],
      "resources": ["admissionchecks/status"],
      "verbs": ["update", "patch"]
    }
  },
  {
    "op": "add",
    "path": "/rules/-",
    "value": {
      "apiGroups": [""],
      "resources": ["secrets"],
      "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]
    }
  }
]
```
solutions/kueue-admission-check/env/patch-mg-sa-cma.json (vendored, new file, 18 lines)
```json
[
  {
    "op": "replace",
    "path": "/spec/installStrategy",
    "value": {
      "placements": [
        {
          "name": "placement-spoke",
          "namespace": "default",
          "rolloutStrategy": {
            "type": "All"
          }
        }
      ],
      "type": "Placements"
    }
  }
]
```
solutions/kueue-admission-check/env/placement.yaml (vendored, new file, 14 lines)
```yaml
# clusteradm clusterset bind global --namespace default
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: placement-spoke
  namespace: default
spec:
  clusterSets:
  - spoke
  tolerations:
  - key: cluster.open-cluster-management.io/unreachable
    operator: Exists
  - key: cluster.open-cluster-management.io/unavailable
    operator: Exists
```
solutions/kueue-admission-check/env/single-clusterqueue-setup-mwrs.yaml (vendored, new file, 90 lines)
```yaml
apiVersion: work.open-cluster-management.io/v1alpha1
kind: ManifestWorkReplicaSet
metadata:
  name: single-clusterqueue
  namespace: default
spec:
  placementRefs:
  - name: placement-spoke
  manifestWorkTemplate:
    workload:
      manifests:
      - apiVersion: rbac.authorization.k8s.io/v1
        kind: ClusterRoleBinding
        metadata:
          name: kueue-manager-ocm-rolebinding
        roleRef:
          apiGroup: rbac.authorization.k8s.io
          kind: ClusterRole
          name: kueue-manager-role
        subjects:
        - kind: ServiceAccount
          name: klusterlet-work-sa
          namespace: open-cluster-management-agent
      - apiVersion: rbac.authorization.k8s.io/v1
        kind: ClusterRoleBinding
        metadata:
          name: kueue-batch-admin-ocm-rolebinding
        roleRef:
          apiGroup: rbac.authorization.k8s.io
          kind: ClusterRole
          name: kueue-batch-admin-role
        subjects:
        - kind: ServiceAccount
          name: klusterlet-work-sa
          namespace: open-cluster-management-agent
      - apiVersion: kueue.x-k8s.io/v1beta1
        kind: ResourceFlavor
        metadata:
          name: "default-flavor-demo1"
      - apiVersion: kueue.x-k8s.io/v1beta1
        kind: ClusterQueue
        metadata:
          name: "cluster-queue-demo1"
        spec:
          namespaceSelector: {} # match all.
          resourceGroups:
          - coveredResources: ["cpu", "memory"]
            flavors:
            - name: "default-flavor-demo1"
              resources:
              - name: "cpu"
                nominalQuota: 9
              - name: "memory"
                nominalQuota: 36Gi
      - apiVersion: kueue.x-k8s.io/v1beta1
        kind: LocalQueue
        metadata:
          namespace: "default"
          name: "user-queue-demo1"
        spec:
          clusterQueue: "cluster-queue-demo1"
      - apiVersion: kueue.x-k8s.io/v1beta1
        kind: ResourceFlavor
        metadata:
          name: "default-flavor-demo2"
      - apiVersion: kueue.x-k8s.io/v1beta1
        kind: ClusterQueue
        metadata:
          name: "cluster-queue-demo2"
        spec:
          namespaceSelector: {} # match all.
          resourceGroups:
          - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
            flavors:
            - name: "default-flavor-demo2"
              resources:
              - name: "cpu"
                nominalQuota: 9
              - name: "memory"
                nominalQuota: 36Gi
              - name: "nvidia.com/gpu"
                nominalQuota: 3
      - apiVersion: kueue.x-k8s.io/v1beta1
        kind: LocalQueue
        metadata:
          namespace: "default"
          name: "user-queue-demo2"
        spec:
          clusterQueue: "cluster-queue-demo2"
```
solutions/kueue-admission-check/job-demo1.yaml (new file, 25 lines)
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: demo1-job
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: user-queue-demo1
spec:
  parallelism: 1
  completions: 1
  suspend: true
  template:
    spec:
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:v0.1.0
        args: ["30s"]
        resources:
          requests:
            cpu: "1"
            memory: "200Mi"
          limits:
            cpu: "1"
            memory: "200Mi"
      restartPolicy: Never
```
solutions/kueue-admission-check/job-demo2.yaml (new file, 27 lines)
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: demo2-job
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: "user-queue-demo2"
spec:
  parallelism: 1
  completions: 1
  suspend: true
  template:
    spec:
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:v0.1.0
        args: ["600s"]
        resources:
          requests:
            cpu: "1"
            memory: "200Mi"
            nvidia.com/gpu: "1"
          limits:
            cpu: "1"
            memory: "200Mi"
            nvidia.com/gpu: "1" # This job requires one GPU.
      restartPolicy: Never
```
solutions/kueue-admission-check/multikueue-setup-demo1.yaml (new file, 71 lines)
```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "default-flavor-demo1"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue-demo1"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor-demo1"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
  admissionChecks:
  - multikueue-demo1
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue-demo1"
spec:
  clusterQueue: "cluster-queue-demo1"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: multikueue-demo1
spec:
  controllerName: kueue.x-k8s.io/multikueue
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: MultiKueueConfig
    name: multikueue-config-demo1
---
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueConfig
metadata:
  name: multikueue-config-demo1
spec:
  clusters:
  - multikueue-demo1-cluster1
  - multikueue-demo1-cluster2
---
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueCluster
metadata:
  name: multikueue-demo1-cluster1
spec:
  kubeConfig:
    locationType: Secret
    location: kueue-admin-cluster1-kubeconfig
    # A secret called "kueue-admin-cluster1-kubeconfig" should be created in the
    # namespace the kueue controller manager runs in, holding the kubeconfig
    # needed to connect to the worker cluster under the "kubeconfig" key.
---
apiVersion: kueue.x-k8s.io/v1alpha1
kind: MultiKueueCluster
metadata:
  name: multikueue-demo1-cluster2
spec:
  kubeConfig:
    locationType: Secret
    location: kueue-admin-cluster2-kubeconfig
```
solutions/kueue-admission-check/multikueue-setup-demo2.yaml (new file, 57 lines)
```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "default-flavor-demo2"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue-demo2"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "default-flavor-demo2"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 3
  admissionChecks:
  - multikueue-demo2
  - placement-demo2
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue-demo2"
spec:
  clusterQueue: "cluster-queue-demo2"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: multikueue-demo2
spec:
  controllerName: kueue.x-k8s.io/multikueue
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: MultiKueueConfig
    name: placement-demo2
---
# OCM implements an admissioncheck controller to automate the MultiKueue setup process.
# MultiKueueConfigs and MultiKueueClusters are generated dynamically based on OCM placement decisions.
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: placement-demo2
spec:
  controllerName: open-cluster-management.io/placement
  parameters:
    apiGroup: cluster.open-cluster-management.io
    kind: Placement
    name: placement-demo2
```
solutions/kueue-admission-check/placement-demo2-1.yaml (new file, 18 lines)
```yaml
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: placement-demo2
  namespace: kueue-system
spec:
  clusterSets:
  - spoke
  tolerations:
  - key: cluster.open-cluster-management.io/unreachable
    operator: Exists
  - key: cluster.open-cluster-management.io/unavailable
    operator: Exists
  predicates:
  - requiredClusterSelector:
      labelSelector:
        matchLabels:
          accelerator: nvidia-tesla-t4
```
solutions/kueue-admission-check/placement-demo2-2.yaml (new file, 28 lines)
```yaml
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: placement-demo2
  namespace: kueue-system
spec:
  clusterSets:
  - spoke
  tolerations:
  - key: cluster.open-cluster-management.io/unreachable
    operator: Exists
  - key: cluster.open-cluster-management.io/unavailable
    operator: Exists
  predicates:
  - requiredClusterSelector:
      labelSelector:
        matchLabels:
          accelerator: nvidia-tesla-t4
  numberOfClusters: 1
  prioritizerPolicy:
    mode: Exact
    configurations:
    - scoreCoordinate:
        type: AddOn
        addOn:
          resourceName: resource-usage-score
          scoreName: gpuClusterAvailable
      weight: 1
```
solutions/kueue-admission-check/setup-env.sh (executable, new file, 126 lines)
```bash
#!/bin/bash

cd "$(dirname "${BASH_SOURCE[0]}")"

set -e

hub=${HUB:-hub}
c1=${CLUSTER1:-cluster1}
c2=${CLUSTER2:-cluster2}
c3=${CLUSTER3:-cluster3}

hubctx="kind-${hub}"
c1ctx="kind-${c1}"
c2ctx="kind-${c2}"
c3ctx="kind-${c3}"

kind create cluster --name "${hub}" --image kindest/node:v1.29.0@sha256:eaa1450915475849a73a9227b8f201df25e55e268e5d619312131292e324d570
kind create cluster --name "${c1}" --image kindest/node:v1.29.0@sha256:eaa1450915475849a73a9227b8f201df25e55e268e5d619312131292e324d570
kind create cluster --name "${c2}" --image kindest/node:v1.29.0@sha256:eaa1450915475849a73a9227b8f201df25e55e268e5d619312131292e324d570
kind create cluster --name "${c3}" --image kindest/node:v1.29.0@sha256:eaa1450915475849a73a9227b8f201df25e55e268e5d619312131292e324d570

echo "Initialize the ocm hub cluster"

clusteradm init --feature-gates="ManifestWorkReplicaSet=true,ManagedClusterAutoApproval=true" --bundle-version="latest" --wait --context ${hubctx}
joincmd=$(clusteradm get token --context ${hubctx} | grep clusteradm)

echo "Join cluster1 to hub"
$(echo ${joincmd} --force-internal-endpoint-lookup --wait --context ${c1ctx} | sed "s/<cluster_name>/$c1/g")

echo "Join cluster2 to hub"
$(echo ${joincmd} --force-internal-endpoint-lookup --wait --context ${c2ctx} | sed "s/<cluster_name>/$c2/g")

echo "Join cluster3 to hub"
$(echo ${joincmd} --force-internal-endpoint-lookup --wait --context ${c3ctx} | sed "s/<cluster_name>/$c3/g")

echo "Accept join of cluster1, cluster2 and cluster3"
clusteradm accept --context ${hubctx} --clusters ${c1},${c2},${c3} --wait

kubectl get managedclusters --all-namespaces --context ${hubctx}

echo "Install Kueue (this can be replaced with OCM Manifestwork in the future)"
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.7.1/manifests.yaml --context ${hubctx}
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.7.1/manifests.yaml --context ${c1ctx}
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.7.1/manifests.yaml --context ${c2ctx}
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.7.1/manifests.yaml --context ${c3ctx}

echo "Install Jobset for MultiKueue (this can be replaced with OCM Manifestwork in the future)"
kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.5.2/manifests.yaml --context ${hubctx}
kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.5.2/manifests.yaml --context ${c1ctx}
kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.5.2/manifests.yaml --context ${c2ctx}
kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.5.2/manifests.yaml --context ${c3ctx}

kubectl config use-context ${hubctx}

echo "Patch permission"
kubectl patch clusterrole cluster-manager --type='json' -p "$(cat env/patch-clusterrole.json)"

echo "Patch image"
kubectl patch deployment cluster-manager -n open-cluster-management --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/image", "value": "quay.io/haoqing/registration-operator:latest"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/imagePullPolicy", "value": "Always"}
]'
kubectl patch clustermanager cluster-manager --type=json -p='[{"op": "replace", "path": "/spec/registrationImagePullSpec", "value": "quay.io/haoqing/registration:latest"}]'
kubectl patch clustermanager cluster-manager --type=json -p='[{"op": "replace", "path": "/spec/placementImagePullSpec", "value": "quay.io/haoqing/placement:latest"}]'

echo "Install CRDs"
kubectl create -f env/multicluster.x-k8s.io_clusterprofiles.yaml

echo "Install managed-serviceaccount"
git clone git@github.com:open-cluster-management-io/managed-serviceaccount.git || true
cd managed-serviceaccount
helm uninstall -n open-cluster-management-addon managed-serviceaccount || true
helm install \
    -n open-cluster-management-addon --create-namespace \
    managed-serviceaccount charts/managed-serviceaccount/ \
    --set tag=latest \
    --set featureGates.ephemeralIdentity=true \
    --set enableAddOnDeploymentConfig=true \
    --set hubDeployMode=AddOnTemplate
cd -
rm -rf managed-serviceaccount

echo "Install managed-serviceaccount mca"
clusteradm create clusterset spoke
clusteradm clusterset set spoke --clusters ${c1},${c2},${c3}
clusteradm clusterset bind spoke --namespace default
kubectl apply -f env/placement.yaml || true
kubectl patch clustermanagementaddon managed-serviceaccount --type='json' -p="$(cat env/patch-mg-sa-cma.json)" || true

echo "Install cluster-permission"
git clone git@github.com:open-cluster-management-io/cluster-permission.git || true
cd cluster-permission
kubectl apply -f config/crds
kubectl apply -f config/rbac
kubectl apply -f config/deploy
cd -
rm -rf cluster-permission

echo "Install resource-usage-collect-addon"
git clone git@github.com:open-cluster-management-io/addon-contrib.git || true
cd addon-contrib/resource-usage-collect-addon
make deploy
cd -
rm -rf addon-contrib

echo "Enable MultiKueue on the hub"
kubectl patch deployment kueue-controller-manager -n kueue-system --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args", "value": ["--config=/controller_manager_config.yaml", "--zap-log-level=2", "--feature-gates=MultiKueue=true"]}]'

echo "Setup queue on the spoke"
kubectl apply -f env/single-clusterqueue-setup-mwrs.yaml

echo "Setup credentials for clusterprofile"
kubectl apply -f env/cp-c1.yaml
kubectl apply -f env/cp-c2.yaml
kubectl apply -f env/cp-c3.yaml
kubectl apply -f env/msa-c1.yaml
kubectl apply -f env/msa-c2.yaml
kubectl apply -f env/msa-c3.yaml

echo "Setup faked GPU on the spoke"
kubectl label managedcluster cluster2 accelerator=nvidia-tesla-t4
kubectl label managedcluster cluster3 accelerator=nvidia-tesla-t4

echo "IMPORTANT: RUN THE COMMANDS BELOW MANUALLY on cluster2 and cluster3 !!!"
echo "kubectl edit-status node cluster2-control-plane --context ${c2ctx}  # set nvidia.com/gpu: \"3\""
echo "kubectl edit-status node cluster3-control-plane --context ${c3ctx}  # set nvidia.com/gpu: \"3\""
```