Merge pull request #23 from nvtkaszpir/runbook-kubernetes-2

2026-05-20 22:02:45 +00:00 · 2023-02-13 19:05:28 +01:00
parent ff3bf6b3e2 cc9715aca8
commit e2bb5286ea
10 changed files with 396 additions and 0 deletions
--- a/content/runbooks/kubernetes/KubeCPUOvercommit.md
+++ b/content/runbooks/kubernetes/KubeCPUOvercommit.md
@@ -0,0 +1,45 @@
+---
+title: Kube CPU Overcommit
+weight: 20
+---
+
+# KubeCPUOvercommit
+
+## Meaning
+
+The cluster has overcommitted CPU resource requests for Pods
+and cannot tolerate node failure.
+
+<details>
+<summary>Full context</summary>
+
+The sum of CPU requests of all Pods exceeds the allocatable CPU that would
+remain after losing one node. If a node fails, some Pods will not fit on the remaining nodes.
+
+</details>
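The N-1 reasoning above can be illustrated with a small hypothetical shell helper. It is not part of kubectl or of any alert rule. It assumes you already have the total CPU requests and each node's allocatable CPU in millicores, and it checks whether the requests still fit after the largest node is lost. It only compares sums: real scheduling also depends on whether each individual Pod fits on a single node.

```shell
# Hypothetical helper: succeeds if all CPU requests still fit after the
# largest node is lost. All values are in millicores.
# Usage: fits_after_node_loss TOTAL_REQUESTS NODE_ALLOCATABLE...
fits_after_node_loss() {
  local requests=$1 total=0 largest=0 alloc remaining
  shift
  for alloc in "$@"; do
    total=$((total + alloc))
    if [ "$alloc" -gt "$largest" ]; then
      largest=$alloc
    fi
  done
  # Capacity that remains if the biggest node fails
  remaining=$((total - largest))
  echo "requests=${requests}m remaining_after_node_loss=${remaining}m"
  [ "$requests" -le "$remaining" ]
}
```

Take three nodes with 4000m allocatable each and 9000m of requests. `fits_after_node_loss 9000 4000 4000 4000` prints `requests=9000m remaining_after_node_loss=8000m` and returns a non-zero status, so the cluster is overcommitted. Uneven node sizes make the problem worse: `fits_after_node_loss 3000 8000 2000` also fails, even though the total capacity is 10000m.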
+
+## Impact
+
+The cluster cannot tolerate node failure. In the event of a node failure, some Pods will be in `Pending` state.
+
+## Diagnosis
+
+- Check whether CPU resource requests match the actual app usage.
+- Check whether all nodes are available and not cordoned.
+- Check whether cluster-autoscaler has issues adding new nodes.
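The checks above can be bundled into one read-only helper. This is a sketch only, and the function name is hypothetical. It assumes `kubectl` access. `kubectl top` needs metrics-server. The ConfigMap name `cluster-autoscaler-status` in `kube-system` is the cluster-autoscaler default and may differ in your installation.

```shell
# Hypothetical read-only helper bundling the KubeCPUOvercommit checks.
# Assumes kubectl access; "kubectl top" needs metrics-server, and the
# cluster-autoscaler ConfigMap name/namespace are defaults that may differ.
check_cpu_overcommit() {
  # CPU/memory requests vs allocatable per node
  kubectl describe nodes | grep -A 5 'Allocated resources'
  # Cordoned nodes show "SchedulingDisabled" in the STATUS column
  kubectl get nodes
  # Actual CPU usage per Pod, to compare with requests
  kubectl top pods --all-namespaces --sort-by=cpu
  # cluster-autoscaler status, if it is installed
  kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml
}
```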
+
+## Mitigation
+
+- Add more nodes to the cluster. It is usually better to have more, smaller
+  nodes than a few bigger ones, because losing one small node removes less capacity.
+
+- Add node pools with different instance types, so that capacity does not
+  depend on the availability of a single instance type in the cloud.
+
+- Use Pod priorities to protect important services from losing performance;
+  see [pod priority and preemption](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/)
+
+- Fine-tune priority settings for special Pods used with [cluster-autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-cluster-autoscaler-work-with-pod-priority-and-preemption)
+
+- Run performance tests for the expected workload and plan cluster capacity
+  accordingly.
--- a/content/runbooks/kubernetes/KubeCPUQuotaOvercommit.md
+++ b/content/runbooks/kubernetes/KubeCPUQuotaOvercommit.md
@@ -0,0 +1,39 @@
+---
+title: Kube CPU Quota Overcommit
+weight: 20
+---
+
+# KubeCPUQuotaOvercommit
+
+## Meaning
+
+The cluster has overcommitted the CPU resource requests allowed by Namespace quotas and cannot tolerate node failure.
+
+## Impact
+
+In the event of a node failure, some Pods will be in `Pending` state due to a lack of available CPU resources.
+
+## Diagnosis
+
+- Check whether CPU resource requests match the actual app usage.
+- Check whether all nodes are available and not cordoned.
+- Check whether cluster-autoscaler has issues adding new nodes.
+- Check whether usage in the given namespace grows over time faster than expected.
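A rough way to compare the quotas with cluster capacity is to sum two things: the `requests.cpu` hard limits of all ResourceQuotas and the allocatable CPU of all nodes. The sketch below assumes POSIX `sh` and `awk`, and the function names are hypothetical. Quotas that use the equivalent plain `cpu` key are not counted here.

```shell
# Convert a Kubernetes CPU quantity ("2", "1.5", "500m") to millicores.
cpu_to_millicores() {
  awk -v q="$1" 'BEGIN {
    if (sub(/m$/, "", q)) printf "%d\n", q + 0.5   # already millicores
    else printf "%d\n", q * 1000 + 0.5             # whole or fractional cores
  }'
}

# Sum CPU quantities read one per line from stdin, in millicores.
sum_cpu() {
  while read -r q; do
    if [ -n "$q" ]; then cpu_to_millicores "$q"; fi
  done | awk '{ total += $1 } END { print total + 0 }'
}

# requests.cpu hard limits of all ResourceQuotas (the "cpu" key is not read)
quota_cpu() {
  kubectl get resourcequota --all-namespaces \
    -o jsonpath='{range .items[*]}{.spec.hard.requests\.cpu}{"\n"}{end}' | sum_cpu
}

# Allocatable CPU of all nodes
allocatable_cpu() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.status.allocatable.cpu}{"\n"}{end}' | sum_cpu
}
```

For example, `echo "quota=$(quota_cpu)m allocatable=$(allocatable_cpu)m"` prints both totals side by side.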
+
+## Mitigation
+
+- Review the existing quota for the given namespace and adjust it accordingly.
+
+- Add more nodes to the cluster. It is usually better to have more, smaller
+  nodes than a few bigger ones, because losing one small node removes less capacity.
+
+- Add node pools with different instance types, so that capacity does not
+  depend on the availability of a single instance type in the cloud.
+
+- Use Pod priorities to protect important services from losing performance;
+  see [pod priority and preemption](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/)
+
+- Fine-tune priority settings for special Pods used with [cluster-autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-cluster-autoscaler-work-with-pod-priority-and-preemption)
+
+- Run performance tests for the expected workload and plan cluster capacity
+  accordingly.
--- a/content/runbooks/kubernetes/KubeDaemonSetMisScheduled.md
+++ b/content/runbooks/kubernetes/KubeDaemonSetMisScheduled.md
@@ -0,0 +1,35 @@
+---
+title: Kube DaemonSet MisScheduled
+weight: 20
+---
+
+# KubeDaemonSetMisScheduled
+
+## Meaning
+
+Some Pods of a DaemonSet are running on nodes where they are not supposed to run.
+
+## Impact
+
+Service degradation or unavailability.
+Excessive resource usage on nodes where those resources could be used by other apps.
+
+## Diagnosis
+
+This usually happens when the Pod nodeSelector, tolerations, or affinities are wrong, or when
+nodes (node pools) were tainted or relabeled and existing Pods were not evicted.
+
+- Check the DaemonSet status via `kubectl -n $NAMESPACE describe daemonset $NAME`.
+- Check the [DaemonSet update strategy](https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/).
+- Check the status of the Pods that belong to the DaemonSet.
+- Check Pod template parameters such as:
+  - Pod priority: the Pod may have been preempted by more important Pods
+  - affinity rules: with these affinities and too few matching nodes, the Pods
+    may be impossible to schedule
+- Check node taints and labels.
+- Check the logs of [node-feature-discovery](https://kubernetes-sigs.github.io/node-feature-discovery/master/get-started/index.html)
+  and other supporting tools that manage node labels, such as gpu-feature-discovery.
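The comparison between the DaemonSet's scheduling constraints and the node taints and labels can be sketched as a read-only helper. The function name is hypothetical. The helper reads `status.numberMisscheduled`, which is the count the DaemonSet controller reports for Pods running on nodes they should not run on.

```shell
# Hypothetical read-only helper for KubeDaemonSetMisScheduled.
# Usage: ds_misscheduled_report NAMESPACE DAEMONSET
ds_misscheduled_report() {
  local ns=$1 name=$2
  # Number of Pods the DaemonSet controller considers misscheduled
  kubectl -n "$ns" get daemonset "$name" \
    -o jsonpath='{.status.numberMisscheduled}{"\n"}'
  # Scheduling constraints from the Pod template
  kubectl -n "$ns" get daemonset "$name" \
    -o jsonpath='{.spec.template.spec.nodeSelector}{"\n"}{.spec.template.spec.affinity}{"\n"}{.spec.template.spec.tolerations}{"\n"}'
  # Node labels and taint keys, to compare with the constraints above
  kubectl get nodes --show-labels
  kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}
```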
+
+## Mitigation
+
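+Fix the DaemonSet spec (or the node labels and taints) and apply the change, then delete
+the misscheduled Pods manually if they are not removed automatically.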
--- a/content/runbooks/kubernetes/KubeDaemonSetNotScheduled.md
+++ b/content/runbooks/kubernetes/KubeDaemonSetNotScheduled.md
@@ -0,0 +1,45 @@
+---
+title: Kube DaemonSet Not Scheduled
+weight: 20
+---
+
+# KubeDaemonSetNotScheduled
+
+## Meaning
+
+Some Pods of a DaemonSet are not scheduled.
+
+## Impact
+
+Service degradation or unavailability.
+
+## Diagnosis
+
+This usually happens when Pod tolerations or affinities are wrong, or when the nodes
+lack resources.
+
+- Check the DaemonSet status via `kubectl -n $NAMESPACE describe daemonset $NAME`.
+- Check the [DaemonSet update strategy](https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/).
+- Check the status of the Pods that belong to the DaemonSet.
+- Check Pod template parameters such as:
+  - Pod priority: the Pod may have been preempted by more important Pods
+  - resources: the Pod may request a scarce resource, such as a GPU, when
+    only a limited number of nodes have GPUs
+  - affinity rules: with these affinities and too few matching nodes, the Pods
+    may be impossible to schedule
+- Check that the Horizontal Pod Autoscaler (HPA) is not being triggered by
+  untested `requests` values.
+- Check whether cluster-autoscaler is able to create new nodes; see its logs or
+  the cluster-autoscaler status ConfigMap.
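The scheduling side of these checks can be sketched as a read-only helper. The function name is hypothetical. It compares `desiredNumberScheduled` with `currentNumberScheduled` and lists the scheduler's `FailedScheduling` events, which usually name the cause, such as insufficient CPU or an untolerated taint.

```shell
# Hypothetical read-only helper for KubeDaemonSetNotScheduled.
# Usage: ds_not_scheduled_report NAMESPACE DAEMONSET
ds_not_scheduled_report() {
  local ns=$1 name=$2
  # Desired vs currently scheduled Pods
  kubectl -n "$ns" get daemonset "$name" \
    -o jsonpath='desired={.status.desiredNumberScheduled} scheduled={.status.currentNumberScheduled}{"\n"}'
  # Scheduler explanations for Pods that could not be placed
  kubectl -n "$ns" get events --field-selector reason=FailedScheduling
  # Pods still waiting for a node
  kubectl -n "$ns" get pods -o wide --field-selector status.phase=Pending
}
```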
+
+## Mitigation
+
+Set the priority class of important DaemonSets to `system-node-critical`.
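Setting the priority class could look like the sketch below. The helper name is hypothetical. Changing the Pod template starts a rolling update of the DaemonSet. Depending on the Kubernetes version and admission configuration, the `system-*-critical` classes may be restricted to the `kube-system` namespace.

```shell
# Hypothetical helper: set priorityClassName on a DaemonSet's Pod template.
# Changing the template starts a rolling update of the DaemonSet.
# Usage: ds_set_priority NAMESPACE DAEMONSET [PRIORITY_CLASS]
ds_set_priority() {
  local ns=$1 name=$2 class=${3:-system-node-critical}
  kubectl -n "$ns" patch daemonset "$name" --type merge \
    -p "{\"spec\":{\"template\":{\"spec\":{\"priorityClassName\":\"$class\"}}}}"
}
```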
+
+See [DaemonSet rolling update is stuck](https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/#daemonset-rolling-update-is-stuck)
+
+In some rare cases you may need to change node affinities or delete a Pod
+manually, if this is a special DaemonSet that has a specific Pod priority class
+and is limited to a single replica (so it runs on one specific node only).
+
+See [Debugging Pods](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/#debugging-pods)
--- a/content/runbooks/kubernetes/KubeDaemonSetRolloutStuck.md
+++ b/content/runbooks/kubernetes/KubeDaemonSetRolloutStuck.md
@@ -0,0 +1,44 @@
+---
+title: Kube DaemonSet Rollout Stuck
+weight: 20
+---
+
+# KubeDaemonSetRolloutStuck
+
+## Meaning
+
+A DaemonSet rolling update is stuck waiting for replaced Pods to become available.
+
+## Impact
+
+Service degradation or unavailability.
+
+## Diagnosis
+
+- Check the DaemonSet status via `kubectl -n $NAMESPACE describe daemonset $NAME`.
+- Check the [DaemonSet update strategy](https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/).
+- Check the status of the Pods that belong to the DaemonSet.
+- Check Pod template parameters such as:
+  - Pod priority: the Pod may have been preempted by more important Pods
+  - resources: the Pod may request a scarce resource, such as a GPU, when
+    only a limited number of nodes have GPUs
+  - affinity rules: with these affinities and too few matching nodes, the Pods
+    may be impossible to schedule
+  - Pod termination grace period: if it is too long, Pods may stay in the
+    `Terminating` state for too long
+- Check that the Horizontal Pod Autoscaler (HPA) is not being triggered by
+  untested `requests` values.
+- Check whether cluster-autoscaler is able to create new nodes; see its logs or
+  the cluster-autoscaler status ConfigMap.
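A quick way to see how far the rollout got is to compare the DaemonSet status counters. The sketch below uses a hypothetical helper name and assumes POSIX `sh`. It treats the rollout as complete when every desired Pod is both updated and available. Empty counters are read as 0 because the API omits zero-valued fields.

```shell
# Hypothetical read-only helper for KubeDaemonSetRolloutStuck.
# Usage: ds_rollout_check NAMESPACE DAEMONSET
ds_rollout_check() {
  local ns=$1 name=$2 status desired rest updated available
  status=$(kubectl -n "$ns" get daemonset "$name" \
    -o jsonpath='{.status.desiredNumberScheduled},{.status.updatedNumberScheduled},{.status.numberAvailable}')
  # Split "desired,updated,available"; fields may be empty when 0
  desired=${status%%,*}
  rest=${status#*,}
  updated=${rest%%,*}
  available=${rest#*,}
  updated=${updated:-0}
  available=${available:-0}
  echo "desired=$desired updated=$updated available=$available"
  # Complete when every desired Pod is updated and available
  [ "$updated" -eq "$desired" ] && [ "$available" -eq "$desired" ]
}
```

For an interactive view, `kubectl -n $NAMESPACE rollout status daemonset $NAME` reports progress until the rollout finishes.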
+
+## Mitigation
+
+See [DaemonSet rolling update is stuck](https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/#daemonset-rolling-update-is-stuck)
+
+In some rare cases you may need to change node affinities or delete a Pod
+manually, if this is a special DaemonSet
+that has the Pod priority class `system-cluster-critical` and is limited to a single
+replica (so it runs on one specific node only).
+
+See [Debugging Pods](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/#debugging-pods)
--- a/content/runbooks/kubernetes/KubeDeploymentGenerationMismatch.md
+++ b/content/runbooks/kubernetes/KubeDeploymentGenerationMismatch.md
@@ -0,0 +1,51 @@
+---
+title: Kube Deployment Generation Mismatch
+weight: 20
+---
+
+# KubeDeploymentGenerationMismatch
+
+## Meaning
+
+The observed generation of a Deployment does not match its metadata generation,
+possibly due to a roll-back.
+
+## Impact
+
+Service degradation or unavailability.
+
+## Diagnosis
+
+See [Kubernetes Docs - Failed Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#failed-deployment)
+
+- Check the rollout history via `kubectl -n $NAMESPACE rollout history deployment $NAME`.
+- Check the rollout status and whether the rollout is paused.
+- Check the Deployment status via `kubectl -n $NAMESPACE describe deployment $NAME`.
+- Check how many replicas are declared.
+- Investigate whether the new Pods are crashing.
+- Check the status of the Pods that belong to the ReplicaSets under the Deployment.
+- Check Pod template parameters such as:
+  - Pod priority: the Pod may have been preempted by more important Pods
+  - resources: the Pod may request a scarce resource, such as a GPU,
+    when only a limited number of nodes have GPUs
+  - affinity rules: with these affinities and too few matching nodes, the
+    Pods may be impossible to schedule
+  - Pod termination grace period: if it is too long, Pods may stay in the
+    `Terminating` state for too long
+- Check that the Horizontal Pod Autoscaler (HPA) is not being triggered by
+  untested `requests` values.
+- Check whether cluster-autoscaler is able to create new nodes; see its logs or
+  the cluster-autoscaler status ConfigMap.
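The mismatch itself can be confirmed by comparing `metadata.generation` with `status.observedGeneration`. The sketch below uses a hypothetical helper name. It succeeds only when the two values are equal, which means the Deployment controller has processed the latest spec.

```shell
# Hypothetical read-only helper for KubeDeploymentGenerationMismatch.
# Usage: deploy_generation_check NAMESPACE DEPLOYMENT
deploy_generation_check() {
  local ns=$1 name=$2 gens generation observed
  gens=$(kubectl -n "$ns" get deployment "$name" \
    -o jsonpath='{.metadata.generation},{.status.observedGeneration}')
  generation=${gens%%,*}
  observed=${gens#*,}
  echo "generation=$generation observedGeneration=${observed:-unset}"
  # A mismatch means the controller has not processed the latest spec yet
  [ "$generation" = "$observed" ]
}
```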
+
+## Mitigation
+
+Depending on the conditions, adding new nodes usually solves the issue.
+
+Otherwise, the Deployment or HPA definition probably needs to be fixed.
+If you cannot add nodes, you can change the Deployment strategy from `RollingUpdate`
+to `Recreate`; note that `Recreate` terminates all old Pods before creating new ones, which causes downtime.
+Sometimes manually deleting a Pod helps :)
+
+In rare cases, roll back to the previous version; see [Kubernetes Docs - Rolling Back](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-back-to-a-previous-revision)
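A roll-back typically follows the sequence below. The helper name is hypothetical. It prints the revision history first, then runs `kubectl rollout undo`, which goes back to the previous revision when no revision is given, and finally waits for the result.

```shell
# Hypothetical helper: roll a Deployment back and wait for the result.
# Usage: deploy_rollback NAMESPACE DEPLOYMENT [REVISION]
# Without REVISION, kubectl rolls back to the previous revision.
deploy_rollback() {
  local ns=$1 name=$2 revision=${3:-}
  # Show available revisions before changing anything
  kubectl -n "$ns" rollout history deployment "$name"
  if [ -n "$revision" ]; then
    kubectl -n "$ns" rollout undo deployment "$name" --to-revision="$revision"
  else
    kubectl -n "$ns" rollout undo deployment "$name"
  fi
  # Block until the rolled-back version is available (or the rollout fails)
  kubectl -n "$ns" rollout status deployment "$name"
}
```

In practice, inspect the history output before choosing `REVISION`, rather than running the whole helper blindly.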
+
+In extremely rare situations, scale the oldest ReplicaSets to 0 and delete them.
+
+See [Debugging Pods](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/#debugging-pods)
--- a/content/runbooks/kubernetes/KubeDeploymentReplicasMismatch.md
+++ b/content/runbooks/kubernetes/KubeDeploymentReplicasMismatch.md
@@ -0,0 +1,52 @@
+---
+title: Kube Deployment Replicas Mismatch
+weight: 20
+---
+
+# KubeDeploymentReplicasMismatch
+
+## Meaning
+
+Deployment has not matched the expected number of replicas.
+
+<details>
+<summary>Full context</summary>
+
+The Kubernetes Deployment does not have the number of replicas that were
+declared to be in operation.
+For example, the Deployment is expected to have 3 replicas, but it has fewer than
+that for a noticeable period of time.
+
+On rare occasions there may be more replicas than declared, and the system did
+not clean them up.
+</details>
+
+## Impact
+
+Service degradation or unavailability.
+
+## Diagnosis
+
+- Check the Deployment status via `kubectl -n $NAMESPACE describe deployment $NAME`.
+- Check how many replicas are declared.
+- Check the status of the Pods that belong to the ReplicaSets under the Deployment.
+- Check Pod template parameters such as:
+  - Pod priority: the Pod may have been preempted by more important Pods
+  - resources: the Pod may request a scarce resource, such as a GPU,
+    when only a limited number of nodes have GPUs
+  - affinity rules: with these affinities and too few matching nodes, the
+    Pods may be impossible to schedule
+  - Pod termination grace period: if it is too long, Pods may stay in the
+    `Terminating` state for too long
+- Check that the Horizontal Pod Autoscaler (HPA) is not being triggered by
+  untested `requests` values.
+- Check whether cluster-autoscaler is able to create new nodes; see its logs or
+  the cluster-autoscaler status ConfigMap.
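The replica counts and the Pods that are not running can be pulled together in one read-only sketch. The function name is hypothetical. Pods in `CrashLoopBackOff` usually remain in phase `Running`, so the phase filter does not show them. Check the `RESTARTS` column of `kubectl get pods` for those.

```shell
# Hypothetical read-only helper for KubeDeploymentReplicasMismatch.
# Usage: deploy_replicas_report NAMESPACE DEPLOYMENT
deploy_replicas_report() {
  local ns=$1 name=$2
  # Declared vs ready vs available replicas (empty fields mean 0)
  kubectl -n "$ns" get deployment "$name" \
    -o jsonpath='spec={.spec.replicas} ready={.status.readyReplicas} available={.status.availableReplicas}{"\n"}'
  # An HPA, if present, sets spec.replicas of its target
  kubectl -n "$ns" get hpa
  # Pods that are not Running (Pending, Failed, ...); CrashLoopBackOff
  # Pods are usually still in phase Running and will not appear here
  kubectl -n "$ns" get pods -o wide --field-selector 'status.phase!=Running'
}
```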
+
+## Mitigation
+
+Depending on the conditions, adding new nodes usually solves the issue.
+
+Otherwise, the Deployment or HPA definition probably needs to be fixed.
+
+See [Debugging Pods](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/#debugging-pods)
--- a/content/runbooks/kubernetes/KubeHpaMaxedOut.md
+++ b/content/runbooks/kubernetes/KubeHpaMaxedOut.md
@@ -0,0 +1,32 @@
+---
+title: Kube HPA Maxed Out
+weight: 20
+---
+
+# KubeHpaMaxedOut
+
+## Meaning
+
+Horizontal Pod Autoscaler has been running at max replicas for longer
+than 15 minutes.
+
+## Impact
+
+The Horizontal Pod Autoscaler cannot add new Pods, so the application cannot scale any further.
+**Notice:** for some services, running the HPA at its maximum is in fact desired.
+
+## Diagnosis
+
+Check why HPA was unable to scale:
+
+- `maxReplicas` is set too low
+- resource `requests` (such as CPU) are set too low, so utilization measured against them looks high
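The Kubernetes documentation gives the HPA scaling rule as `desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]`. For CPU utilization targets, the metric is a percentage of the Pod's CPU `requests`. The sketch below is a hypothetical helper that applies this rule with integer arithmetic and caps the result at `maxReplicas`. It deliberately ignores the scaling tolerance, stabilization windows, `minReplicas`, and unready Pods.

```shell
# Hypothetical helper: HPA replica estimate, capped at maxReplicas.
# Usage: hpa_desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC MAX_REPLICAS
hpa_desired_replicas() {
  local current=$1 metric=$2 target=$3 max=$4 desired
  # ceil(current * metric / target) using integer arithmetic
  desired=$(( (current * metric + target - 1) / target ))
  if [ "$desired" -gt "$max" ]; then
    echo "desired=$desired capped_at_max=$max"
  else
    echo "desired=$desired"
  fi
}
```

As a worked example, take Pods that request 100m CPU but use 180m. That is 180% utilization, and with a 60% target, 4 replicas and `maxReplicas: 10`, `hpa_desired_replicas 4 180 60 10` prints `desired=12 capped_at_max=10`, so the HPA is maxed out. With requests of 300m, the same usage is 60% utilization, and `hpa_desired_replicas 4 60 60 10` prints `desired=4`.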
+
+## Mitigation
+
+If you use basic metrics such as CPU or memory, make sure to set proper values for
+`requests`.
+For memory-based scaling, make sure there are no memory leaks.
+If you use custom metrics, fine-tune how the app scales according to them.
+
+Use performance tests to see how the app scales.
--- a/content/runbooks/kubernetes/KubeHpaReplicasMismatch.md
+++ b/content/runbooks/kubernetes/KubeHpaReplicasMismatch.md
@@ -0,0 +1,28 @@
+---
+title: Kube HPA Replicas Mismatch
+weight: 20
+---
+
+# KubeHpaReplicasMismatch
+
+## Meaning
+
+Horizontal Pod Autoscaler has not matched the desired number of replicas for
+longer than 15 minutes.
+
+## Impact
+
+The HPA was unable to schedule the desired number of Pods.
+
+## Diagnosis
+
+Check why HPA was unable to scale:
+
+- not enough nodes in the cluster
+- hitting resource quotas in the cluster
+- pods evicted due to pod priority
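The three causes above can be checked with a read-only sketch. The function name is hypothetical. The HPA conditions (`AbleToScale`, `ScalingActive`, `ScalingLimited`) appear in `kubectl describe hpa`. When a ReplicaSet hits a ResourceQuota, it records `FailedCreate` events.

```shell
# Hypothetical read-only helper for KubeHpaReplicasMismatch.
# Usage: hpa_mismatch_report NAMESPACE HPA
hpa_mismatch_report() {
  local ns=$1 name=$2
  # Current vs desired replicas as computed by the HPA
  kubectl -n "$ns" get hpa "$name" \
    -o jsonpath='current={.status.currentReplicas} desired={.status.desiredReplicas}{"\n"}'
  # Conditions (AbleToScale, ScalingActive, ScalingLimited) and events
  kubectl -n "$ns" describe hpa "$name"
  # Quota usage in the namespace
  kubectl -n "$ns" describe resourcequota
  # Pod creation failures, e.g. "exceeded quota"
  kubectl -n "$ns" get events --field-selector reason=FailedCreate
}
```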
+
+## Mitigation
+
+When using cluster-autoscaler, you may need to set up overprovisioning with low-priority,
+preemptible placeholder Pods so that spare nodes are created before they are needed.
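The cluster-autoscaler FAQ describes this overprovisioning approach. The sketch below is a hypothetical helper, and every value in it is an example: the `overprovisioning` name, the `registry.k8s.io/pause:3.9` image, two replicas, and `cpu=1,memory=1Gi` requests. The priority of `-1` assumes cluster-autoscaler's default `--expendable-pods-priority-cutoff` of `-10`. Pods below that cutoff do not trigger scale-up.

```shell
# Hypothetical sketch of cluster-autoscaler overprovisioning: low-priority
# placeholder Pods reserve spare capacity and are preempted by real Pods.
# Names, image, replica count and requests are example values.
setup_overprovisioning() {
  local ns=${1:-default}
  # Negative priority, but above the default expendable-pods cutoff (-10),
  # so pending placeholders still make cluster-autoscaler add nodes
  kubectl create priorityclass overprovisioning --value=-1 \
    --description="Placeholder pods for cluster-autoscaler overprovisioning"
  kubectl -n "$ns" create deployment overprovisioning \
    --image=registry.k8s.io/pause:3.9 --replicas=2
  kubectl -n "$ns" set resources deployment overprovisioning \
    --requests=cpu=1,memory=1Gi
  kubectl -n "$ns" patch deployment overprovisioning --type merge \
    -p '{"spec":{"template":{"spec":{"priorityClassName":"overprovisioning"}}}}'
}
```

Size the placeholder requests to roughly match the headroom you want. When real Pods preempt the placeholders, the now-pending placeholders cause cluster-autoscaler to add a node.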
--- a/content/runbooks/kubernetes/KubeJobCompletion.md
+++ b/content/runbooks/kubernetes/KubeJobCompletion.md
@@ -0,0 +1,25 @@
+---
+title: Kube Job Completion
+weight: 20
+---
+
+# KubeJobCompletion
+
+## Meaning
+
+Job is taking more than 1h to complete.
+
+## Impact
+
+- Long processing of batch jobs.
+- Possible issues with scheduling the next Job.
+
+## Diagnosis
+
+- Check the Job via `kubectl -n $NAMESPACE describe jobs $JOB`.
+- Check the Job's Pods and their events via `kubectl -n $NAMESPACE describe pods -l job-name=$JOB`.
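A read-only sketch of the Job checks follows. The function name is hypothetical. It relies on the `job-name` label that the Job controller adds to its Pods and on `kubectl logs job/NAME`, which shows the logs of one of the Job's Pods.

```shell
# Hypothetical read-only helper for KubeJobCompletion.
# Usage: job_report NAMESPACE JOB
job_report() {
  local ns=$1 job=$2
  # Start time and Pod counters; empty counters mean 0
  kubectl -n "$ns" get job "$job" \
    -o jsonpath='start={.status.startTime} active={.status.active} succeeded={.status.succeeded} failed={.status.failed}{"\n"}'
  # Pods created by the Job carry the job-name label
  kubectl -n "$ns" get pods -l job-name="$job" -o wide
  # Recent logs from one of the Job's Pods
  kubectl -n "$ns" logs "job/$job" --tail=50
}
```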
+
+## Mitigation
+
+- Give it more resources so it finishes faster, if applicable.
+- See [Job patterns](https://kubernetes.io/docs/tasks/job/)