KubeVirt v1.8.2 publishes VMI status checksum fields (uint32 in Go) as
format: int32 in its generated CRD schema. k8s 1.36 enables strict
numeric format validation in CRDs via
https://github.com/kubernetes/kubernetes/pull/136582, which now rejects
the legacy schema and causes virt-handler to enter an infinite VMI
status update re-enqueue loop. Live migrations never complete and the
descheduler e2e TestLiveMigrationInBackground times out.
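A minimal Go sketch of why a uint32 field published as format: int32 can be rejected. It assumes the strict validator enforces the signed 32-bit range for format: int32; fitsInt32 is a hypothetical helper for illustration, not the API server's actual validation code.

```go
package main

import (
	"fmt"
	"math"
)

// fitsInt32 reports whether a uint32 value is representable as an int32.
// Hypothetical helper: it only models the range check that a strict
// "format: int32" validator is assumed to apply.
func fitsInt32(v uint32) bool {
	return v <= math.MaxInt32
}

func main() {
	// Roughly half of all possible uint32 checksums exceed math.MaxInt32.
	samples := []uint32{42, math.MaxInt32, math.MaxInt32 + 1, math.MaxUint32}
	for _, checksum := range samples {
		fmt.Printf("checksum=%d fitsInt32=%t\n", checksum, fitsInt32(checksum))
	}
}
```

Under that assumption, any checksum above math.MaxInt32 would make the status update fail validation on every retry, which is consistent with the re-enqueue loop described above.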
The schema fix landed upstream in
https://github.com/kubevirt/kubevirt/pull/17469 (merged to main on
2026-04-18, included in v1.9.0-alpha.0 tagged 2026-05-11) but was not
backported to release-1.8, so no v1.8.x release contains it. Bump the
default KUBEVIRT_VERSION to v1.9.0-alpha.0 so the e2e suite consumes a
release whose generated CRDs are compatible with k8s 1.36's stricter
validator.
Tracked in https://github.com/kubevirt/kubevirt/issues/17858.
Reverts b767b9c0f. The helper was added to work around what looked like
the virt-handler containerdisk-socket race on k8s 1.36, but the actual
root cause is unrelated: k8s 1.36's stricter CRD numeric format
validation (kubernetes/kubernetes#136582) rejects VMI status updates
with the pre-fix uint32 Checksum schema. See
https://github.com/kubevirt/kubevirt/issues/17858 for the upstream
context and kubevirt/kubevirt#17469 for the upstream fix (merged to
main, included in v1.9.0-alpha.0, not in v1.8.x).
The follow-up commit bumps KUBEVIRT_VERSION so the test consumes a
KubeVirt release that contains the fix, which removes the need for any
test-side retry.
Adds ensureVMIsLiveMigratable in TestLiveMigrationInBackground. After
the existing wait for virt-launcher pods to reach Running, poll each
VMI for the LiveMigratable=True condition. If a VMI fails to become
migratable within 120s, delete and recreate it (up to 3 attempts).
This works around an upstream KubeVirt race where virt-handler computes
the containerdisk checksum before the disk socket is ready, fails, and
never retries. The recreated VMI lands on a node that has already
cached the containerdisk image, so the socket comes up before
virt-handler's first attempt.
The race surfaces consistently with the kind v1.36.1 node image,
causing TestLiveMigrationInBackground to fail with "Expected at least
3 finished live migrations, got less: 0".
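The wait-then-recreate logic described above can be sketched roughly as follows. This is not the helper from the commit: vmiOps, waitMigratable and ensureMigratable are hypothetical names, the cluster calls are replaced by function fields, and "up to 3 attempts" is assumed to mean three waits in total.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// vmiOps is a hypothetical abstraction over the cluster calls the real test
// would make through the KubeVirt client.
type vmiOps struct {
	isLiveMigratable func(name string) bool // true once LiveMigratable=True
	recreate         func(name string) error // delete and recreate the VMI
}

// waitMigratable polls until the VMI is live-migratable or the timeout expires.
func waitMigratable(ops vmiOps, name string, timeout, interval time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if ops.isLiveMigratable(name) {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

// ensureMigratable waits for the VMI and, on timeout, recreates it and waits
// again. It gives up after maxAttempts waits (assumed reading of the commit).
func ensureMigratable(ops vmiOps, name string, maxAttempts int, timeout, interval time.Duration) error {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if waitMigratable(ops, name, timeout, interval) {
			return nil
		}
		if attempt == maxAttempts {
			break // no point recreating if we will not wait again
		}
		if err := ops.recreate(name); err != nil {
			return fmt.Errorf("recreating VMI %s: %w", name, err)
		}
	}
	return errors.New("VMI " + name + " never became live-migratable")
}

func main() {
	recreated := 0
	ops := vmiOps{
		// Fake cluster: the VMI only becomes migratable after one recreation.
		isLiveMigratable: func(string) bool { return recreated >= 1 },
		recreate:         func(string) error { recreated++; return nil },
	}
	// Short timeouts for the demo; the commit uses 120s per attempt.
	err := ensureMigratable(ops, "vmi-0", 3, 20*time.Millisecond, 5*time.Millisecond)
	fmt.Println("err:", err, "recreations:", recreated)
}
```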
- Update kubevirt.io/api from v1.3.0 to v1.8.2
- Update kubevirt.io/client-go from v1.3.0 to v1.8.2
- Update kubevirt.io/containerized-data-importer-api from v1.57.0-alpha1 to v1.64.0
- Migrate e2e test from deprecated generated clientset path
  (kubevirt.io/client-go/generated/kubevirt/clientset/versioned)
  to new kubevirt.io/client-go/kubevirt client package
- Update vendor and dependencies for Kubernetes 1.36 compatibility
When KubeVirt sets EvictionInProgressAnnotationKey before returning
TooManyRequests, the informer's UpdateFunc can call addPod
(evictionAssumed=false) before evictPod's assumePod call arrives.
Previously, assumePod found the entry already present and returned
early, leaving evictionAssumed=false, so DeleteFunc skipped the
"success" metric.
Fix: if the existing entry has evictionAssumed=false (added by addPod),
upgrade it in place without double-counting the pod in the counters.
Adds TestEvictionInBackgroundMetrics_InformerRace to reproduce the race
deterministically.
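A simplified sketch of the in-place upgrade, assuming a map-based cache with a single counter. The tracker and trackedPod types and the deletePod method are hypothetical and single-goroutine; the real cache and its locking are not shown in this message.

```go
package main

import "fmt"

// trackedPod is a hypothetical cache entry for a pod being evicted in background.
type trackedPod struct {
	evictionAssumed bool
}

// tracker is a simplified stand-in for the eviction cache (no locking).
type tracker struct {
	pods  map[string]*trackedPod
	total int // number of tracked pods
}

func newTracker() *tracker {
	return &tracker{pods: map[string]*trackedPod{}}
}

// addPod models the informer path: it records the pod without assuming
// that this descheduler requested the eviction.
func (t *tracker) addPod(uid string) {
	if _, ok := t.pods[uid]; ok {
		return
	}
	t.pods[uid] = &trackedPod{evictionAssumed: false}
	t.total++
}

// assumePod models the evictPod path. If addPod won the race, the existing
// entry is upgraded in place and the counter is left untouched.
func (t *tracker) assumePod(uid string) {
	if p, ok := t.pods[uid]; ok {
		p.evictionAssumed = true
		return
	}
	t.pods[uid] = &trackedPod{evictionAssumed: true}
	t.total++
}

// deletePod models DeleteFunc and reports whether "success" should be recorded.
func (t *tracker) deletePod(uid string) bool {
	p, ok := t.pods[uid]
	if !ok {
		return false
	}
	delete(t.pods, uid)
	t.total--
	return p.evictionAssumed
}

func main() {
	t := newTracker()
	t.addPod("pod-a")    // informer UpdateFunc arrives first
	t.assumePod("pod-a") // evictPod arrives second and upgrades the entry
	fmt.Println("tracked:", t.total)
	fmt.Println("record success:", t.deletePod("pod-a"))
}
```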
Signed-off-by: Simone Tiraboschi <stirabos@redhat.com>
Profile creation was moved outside the descheduling cycle in b214c147,
but reconcileInClusterSAToken() still runs only in runFnc(), after
newDescheduler() returns. This leaves the prometheus client nil when
LowNodeUtilization's New() runs, causing "prometheus client not
initialized" at startup.
Avoid failing at plugin creation time if the prometheus
client is not yet available. Instead, usageClientForMetrics() is now
called at the start of every extension point via a resetUsageClient()
helper, so each descheduling cycle picks up the latest client regardless
of when the SA token is reconciled or rotated.
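The idea of deferring the client lookup can be sketched as below. Only the resetUsageClient name comes from this message; promClient, clientSource, newPlugin and balance are hypothetical stand-ins, and the real plugin's types and extension points differ.

```go
package main

import (
	"errors"
	"fmt"
)

// promClient is a hypothetical stand-in for the Prometheus client.
type promClient struct{ endpoint string }

// clientSource returns the current client, or nil if the SA token has not
// been reconciled yet.
type clientSource func() *promClient

type plugin struct {
	source clientSource
	client *promClient // refreshed at every extension point
}

// newPlugin does not fail when the client is missing at creation time.
func newPlugin(source clientSource) *plugin {
	return &plugin{source: source}
}

// resetUsageClient re-reads the client so each cycle sees the latest one.
func (p *plugin) resetUsageClient() error {
	c := p.source()
	if c == nil {
		return errors.New("prometheus client not initialized")
	}
	p.client = c
	return nil
}

// balance models an extension point: it refreshes the client before use.
func (p *plugin) balance() error {
	if err := p.resetUsageClient(); err != nil {
		return err
	}
	fmt.Println("querying", p.client.endpoint)
	return nil
}

func main() {
	var current *promClient
	p := newPlugin(func() *promClient { return current })

	fmt.Println("first cycle:", p.balance()) // client not ready yet

	current = &promClient{endpoint: "http://prometheus.example"} // token reconciled
	fmt.Println("second cycle:", p.balance())
}
```

With this shape, a missing client surfaces as a per-cycle error rather than a startup failure, and a rotated client is picked up on the next cycle.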
Fixes: https://github.com/kubernetes-sigs/descheduler/issues/1840
Signed-off-by: Simone Tiraboschi <stirabos@redhat.com>
Background evictions were completely invisible in metrics: the ignore=true
path caused EvictPod to return before incrementing any counter, leaving
operators with no signal that a background eviction had been triggered or
completed.
Add a "background" result label emitted at eviction request time and a
"success" label emitted from the informer DeleteFunc when the pod is
actually gone. Together the two labels give a complete picture: a
"background" sample may have no matching "success" if the descheduler
restarts before the pod is deleted, while "success" confirms that the
eviction completed within the same descheduler lifecycle.
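A small sketch of how the two labels relate, using an in-memory map in place of the real counter metric. evictionCounter, requestBackgroundEviction and onPodDeleted are hypothetical names for illustration only.

```go
package main

import "fmt"

// evictionCounter is a hypothetical in-memory stand-in for a counter metric
// keyed by its "result" label.
type evictionCounter map[string]int

func (c evictionCounter) inc(result string) { c[result]++ }

// requestBackgroundEviction models the eviction request path: the pod is
// remembered as pending and "background" is counted immediately.
func requestBackgroundEviction(c evictionCounter, pending map[string]bool, pod string) {
	pending[pod] = true
	c.inc("background")
}

// onPodDeleted models the informer DeleteFunc: "success" is counted only for
// pods whose eviction was requested in this process lifetime.
func onPodDeleted(c evictionCounter, pending map[string]bool, pod string) {
	if pending[pod] {
		delete(pending, pod)
		c.inc("success")
	}
}

func main() {
	counter := evictionCounter{}
	pending := map[string]bool{}

	requestBackgroundEviction(counter, pending, "pod-a")
	requestBackgroundEviction(counter, pending, "pod-b")
	onPodDeleted(counter, pending, "pod-a") // pod-b is still migrating

	fmt.Println("background:", counter["background"], "success:", counter["success"])
}
```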
Signed-off-by: Simone Tiraboschi <stirabos@redhat.com>
Updates aquasecurity/trivy-action from a mutable reference to a SHA-pinned
version to address security vulnerabilities.
- Updates to v0.35.0 (57a97c7e)
- Pins to specific SHA for immutability
- Addresses issue: aquasecurity/trivy#10425
Signed-off-by: Priyanka Saggu <priyankasaggu11929@gmail.com>
Move container waiting/terminated state checking from PodLifeTime and
RemovePodsHavingTooManyRestarts into podutil as separate exported helpers:
HasMatchingContainerWaitingState and HasMatchingContainerTerminatedState.
Each plugin composes only the helpers it needs.
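A rough sketch of what such helpers could look like. The signatures here are assumptions, not the ones in podutil, and the local types only mirror a small part of the Kubernetes core/v1 container status so that the example is self-contained.

```go
package main

import "fmt"

// Stand-in types mirroring a small part of the core/v1 pod status.
type containerStateWaiting struct{ Reason string }
type containerStateTerminated struct{ Reason string }

type containerState struct {
	Waiting    *containerStateWaiting
	Terminated *containerStateTerminated
}

type containerStatus struct {
	Name  string
	State containerState
}

// hasMatchingContainerWaitingState reports whether any container is waiting
// with one of the given reasons (hypothetical signature).
func hasMatchingContainerWaitingState(statuses []containerStatus, reasons map[string]struct{}) bool {
	for _, s := range statuses {
		if s.State.Waiting == nil {
			continue
		}
		if _, ok := reasons[s.State.Waiting.Reason]; ok {
			return true
		}
	}
	return false
}

// hasMatchingContainerTerminatedState is the terminated-state counterpart.
func hasMatchingContainerTerminatedState(statuses []containerStatus, reasons map[string]struct{}) bool {
	for _, s := range statuses {
		if s.State.Terminated == nil {
			continue
		}
		if _, ok := reasons[s.State.Terminated.Reason]; ok {
			return true
		}
	}
	return false
}

func main() {
	statuses := []containerStatus{
		{Name: "app", State: containerState{Waiting: &containerStateWaiting{Reason: "CrashLoopBackOff"}}},
		{Name: "sidecar", State: containerState{Terminated: &containerStateTerminated{Reason: "OOMKilled"}}},
	}
	waiting := map[string]struct{}{"CrashLoopBackOff": {}}
	terminated := map[string]struct{}{"Error": {}}

	// A plugin would call only the helper it needs.
	fmt.Println("waiting match:", hasMatchingContainerWaitingState(statuses, waiting))
	fmt.Println("terminated match:", hasMatchingContainerTerminatedState(statuses, terminated))
}
```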
CodeQL Action v1 and v2 have been deprecated. Update
upload-sarif to v4, remove unnecessary strategy block
(missing required matrix property), and remove invalid
exit-code input from the upload-sarif step.
This commit adds support for init containers in the descheduler Helm chart,
allowing users to run initialization tasks before the main descheduler
container starts.
Changes:
- Add initContainers field to values.yaml with example usage
- Update deployment.yaml template to render init containers
- Update cronjob.yaml template to render init containers
- Bump chart version from 0.34.0 to 0.34.1
Init containers can be used for various purposes such as:
- Pre-loading configuration from external sources
- Waiting for dependencies to be ready
- Setting up required files or permissions
- Running security scans or compliance checks
Example usage in values.yaml:
  initContainers:
    - name: init-config
      image: busybox:1.28
      command: ['sh', '-c', 'echo Initializing && sleep 5']
Signed-off-by: kjoshi <kjoshi@egnyte.com>
* Synchronize helm clusterrole RBAC with base yaml
I noticed this error in v0.35:
```
E0219 23:53:57.761596 1 reflector.go:204] "Failed to watch" err="failed to list *v1.PersistentVolumeClaim: persistentvolumeclaims is forbidden: User \"system:serviceaccount:kube-system:descheduler\" cannot list resource \"persistentvolumeclaims\" in API group \"\" at the cluster scope" logger="UnhandledError" reflector="k8s.io/client-go/informers/factory.go:161" type="*v1.PersistentVolumeClaim"
```
I saw the matching rule in the base rbac.yaml: bec9cd38d0/kubernetes/base/rbac.yaml (L38-L40)
So I figured this just needed a bump
* remove dupe
* undo version change
The helm-unittest plugin install was failing with:
    error unmarshaling JSON: while decoding JSON: json: unknown field "platformHooks"
Pin helm-unittest to v1.0.3 and bump chart-testing-action to v2.8.0.