Merge pull request #23 from nvtkaszpir/runbook-kubernetes-2

Paweł Krupa
2023-02-13 19:05:28 +01:00
committed by GitHub
10 changed files with 396 additions and 0 deletions

@@ -0,0 +1,45 @@
---
title: Kube CPU Overcommit
weight: 20
---
# KubeCPUOvercommit
## Meaning
Cluster has overcommitted CPU resource requests for Pods
and cannot tolerate node failure.
<details>
<summary>Full context</summary>
Total CPU requests for pods exceed the capacity the cluster would have left after losing a node.
In case of node failure some pods will not fit on the remaining nodes.
</details>
## Impact
The cluster cannot tolerate node failure. In the event of a node failure, some Pods will be in `Pending` state.
## Diagnosis
- Check if CPU resource requests are adjusted to the app usage
- Check if some nodes are available and not cordoned
- Check if cluster-autoscaler has issues with adding new nodes
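
The first two checks can be roughed out from the command line. The sketch below is an approximation, not the alert's actual rule: it assumes that "cannot tolerate node failure" means losing the largest node and that only `Running` pods count. All function names are hypothetical helpers made up for this example.

```shell
# Hypothetical helpers, not part of kubectl.

# Convert CPU quantities read from stdin ("2", "0.5", "500m"; separated by
# spaces or newlines) to millicores, one value per line.
# Other quantity suffixes are not handled.
cpu_to_millicores() {
  tr ' ' '\n' | awk '/m$/ { sub(/m$/, ""); print $0 + 0; next } NF { print $0 * 1000 }'
}

# Allocatable CPU of every node, in millicores.
node_allocatable_cpu() {
  kubectl get nodes -o jsonpath='{.items[*].status.allocatable.cpu}' | cpu_to_millicores
}

# CPU requests of all containers in Running pods, in millicores.
pod_cpu_requests() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Running \
    -o jsonpath='{.items[*].spec.containers[*].resources.requests.cpu}' | cpu_to_millicores
}

# Assumption: all requests must still fit after the largest node is lost.
# Reads per-node allocatable millicores from stdin.
# Usage: node_allocatable_cpu | cpu_headroom REQUESTED_MILLICORES
cpu_headroom() {
  awk -v req="$1" '
    { sum += $1; if ($1 + 0 > max + 0) max = $1 }
    END {
      spare = sum - max - req
      printf "%s spare=%dm\n", (spare < 0 ? "OVERCOMMITTED" : "OK"), spare
    }'
}
```

A possible invocation is `node_allocatable_cpu | cpu_headroom "$(pod_cpu_requests | awk '{s += $1} END {print s + 0}')"`. A negative `spare` value suggests that some pods would stay `Pending` after a node failure.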
## Mitigation
- Add more nodes to the cluster - it is usually better to have many smaller
nodes than a few bigger ones.
- Add node pools with different instance types to avoid capacity problems
that come from relying on only one instance type in the cloud.
- Use pod priorities to keep important services from losing performance,
see [pod priority and preemption](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/)
- Fine tune settings for special pods used with [cluster-autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-cluster-autoscaler-work-with-pod-priority-and-preemption)
- Prepare performance tests for the expected workload and plan cluster capacity
accordingly.

@@ -0,0 +1,39 @@
---
title: Kube CPU Quota Overcommit
weight: 20
---
# KubeCPUQuotaOvercommit
## Meaning
Cluster has overcommitted CPU resource requests for Namespaces and cannot tolerate node failure.
## Impact
In the event of a node failure, some Pods will be in `Pending` state due to a lack of available CPU resources.
## Diagnosis
- Check if CPU resource requests are adjusted to the app usage
- Check if some nodes are available and not cordoned
- Check if cluster-autoscaler has issues with adding new nodes
- Check if usage in the given namespace is growing faster than expected
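
To see how much CPU the quotas promise, the checks above could start from the sketch below. The function names are hypothetical, and the ratio at which the alert fires depends on the rule configuration, which this document does not state.

```shell
# Hypothetical helpers, not part of kubectl.

# List every ResourceQuota in the cluster.
all_quotas() {
  kubectl get resourcequota --all-namespaces
}

# Show hard limits and current usage of the quotas in one namespace.
namespace_quota() {
  kubectl -n "$1" describe resourcequota
}

# Ratio of CPU promised by quotas to CPU the cluster can allocate.
# Both arguments are in millicores and ALLOCATABLE must be non-zero.
# Usage: quota_ratio QUOTA_MILLICORES ALLOCATABLE_MILLICORES
quota_ratio() {
  awk -v q="$1" -v a="$2" 'BEGIN { printf "%.2f\n", q / a }'
}
```

A ratio above 1 means the namespaces together are allowed to request more CPU than the cluster has.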
## Mitigation
- Review the existing quota for the given namespace and adjust it accordingly.
- Add more nodes to the cluster - it is usually better to have many smaller
nodes than a few bigger ones.
- Add node pools with different instance types to avoid capacity problems
that come from relying on only one instance type in the cloud.
- Use pod priorities to keep important services from losing performance,
see [pod priority and preemption](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/)
- Fine tune settings for special pods used with [cluster-autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-cluster-autoscaler-work-with-pod-priority-and-preemption)
- Prepare performance tests for the expected workload and plan cluster capacity
accordingly.

@@ -0,0 +1,35 @@
---
title: Kube DaemonSet MisScheduled
weight: 20
---
# KubeDaemonSetMisScheduled
## Meaning
A number of daemonset pods are running on nodes where they are not supposed to run.
## Impact
Service degradation or unavailability.
Misscheduled pods consume resources that could be used by other apps.
## Diagnosis
This usually happens when the pod nodeSelector, tolerations or affinities are wrong, or
when nodes (node pools) were tainted and existing pods were not scheduled for eviction.
- Check daemonset status via `kubectl -n $NAMESPACE describe daemonset $NAME`.
- Check [DaemonSet update strategy](https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/)
- Check the status of the pods which belong to the daemonset.
- Check pod template parameters such as:
  - pod priority - maybe it was evicted by other more important pods
  - affinity rules - maybe due to affinities and not enough nodes it is not
    possible to schedule pods
- Check node taints and labels
- Check logs for [node-feature-discovery](https://kubernetes-sigs.github.io/node-feature-discovery/master/get-started/index.html)
and other supporting tools such as gpu-feature-discovery
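
The status, taint and label checks above might look like this on the command line. This is a sketch with hypothetical helper names; the status fields it reads (`desiredNumberScheduled`, `currentNumberScheduled`, `numberMisscheduled`) are part of the DaemonSet status.

```shell
# Hypothetical helpers, not part of kubectl.

# Scheduling counters from the DaemonSet status.
# Usage: daemonset_counts NAMESPACE NAME
daemonset_counts() {
  kubectl -n "$1" get daemonset "$2" \
    -o jsonpath='desired={.status.desiredNumberScheduled} current={.status.currentNumberScheduled} misscheduled={.status.numberMisscheduled}{"\n"}'
}

# Taint keys of every node, to compare against the pod tolerations.
node_taints() {
  kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Labels of every node, to compare against nodeSelector and affinity rules.
node_labels() {
  kubectl get nodes --show-labels
}
```

A non-zero `misscheduled` count, read together with the taints and labels of the affected nodes, usually points to the selector or toleration that needs fixing.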
## Mitigation
Update the DaemonSet and apply the change, then delete the misscheduled pods manually.

@@ -0,0 +1,45 @@
---
title: Kube DaemonSet Not Scheduled
weight: 20
---
# KubeDaemonSetNotScheduled
## Meaning
A number of daemonset pods are not scheduled.
## Impact
Service degradation or unavailability.
## Diagnosis
This usually happens when the pod tolerations or affinities are wrong, or when
the nodes lack resources.
- Check daemonset status via `kubectl -n $NAMESPACE describe daemonset $NAME`.
- Check [DaemonSet update strategy](https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/)
- Check the status of the pods which belong to the daemonset.
- Check pod template parameters such as:
  - pod priority - maybe it was evicted by other more important pods
  - resources - maybe it tries to use an unavailable resource, such as GPU, but
    there is a limited number of nodes with GPU
  - affinity rules - maybe due to affinities and not enough nodes it is not
    possible to schedule pods
- Check if Horizontal Pod Autoscaler (HPA) is not triggered due to untested
values (requests values).
- Check if cluster-autoscaler is able to create new nodes - see its logs or
cluster-autoscaler status configmap.
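
A minimal sketch for finding unscheduled pods and the reason the scheduler gave. The helper names are hypothetical, and the ConfigMap name `cluster-autoscaler-status` in `kube-system` is an assumption based on the autoscaler's default setup; your installation may use a different name or namespace.

```shell
# Hypothetical helpers, not part of kubectl.

# Pods in the namespace that are still waiting for a node.
pending_pods() {
  kubectl -n "$1" get pods --field-selector=status.phase=Pending -o wide
}

# Scheduler events explaining why pods could not be placed.
scheduling_failures() {
  kubectl -n "$1" get events --field-selector=reason=FailedScheduling \
    --sort-by=.lastTimestamp
}

# Assumption: cluster-autoscaler writes its status to this ConfigMap.
autoscaler_status() {
  kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml
}
```

The `FailedScheduling` messages typically name the constraint that failed, such as insufficient CPU, an untolerated taint or an unmatched node affinity.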
## Mitigation
Set the priority class of important daemonsets to `system-node-critical`.
See [DaemonSet rolling update is stuck](https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/#daemonset-rolling-update-is-stuck).
In some rare cases you may need to change node affinities or delete the pod
manually, if this is a special daemonset which has a specific pod priority class
and is limited to only 1 replica (so it runs on a specific node only).
See [Debugging Pods](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/#debugging-pods).
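
As a sketch of the priority class change, the hypothetical helper below patches the pod template in place. Changing the pod template starts a new rollout, and the change is lost on the next deploy unless the source manifest or chart is updated too.

```shell
# Hypothetical helper, not part of kubectl.
# Usage: set_node_critical NAMESPACE NAME
set_node_critical() {
  kubectl -n "$1" patch daemonset "$2" --type merge \
    -p '{"spec":{"template":{"spec":{"priorityClassName":"system-node-critical"}}}}'
}
```

Depending on the cluster setup, use of this priority class outside `kube-system` may be restricted, so treat the patch as something to test first.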

@@ -0,0 +1,44 @@
---
title: Kube DaemonSet Rollout Stuck
weight: 20
---
# KubeDaemonSetRolloutStuck
## Meaning
DaemonSet update is stuck waiting for replaced pod.
## Impact
Service degradation or unavailability.
## Diagnosis
- Check daemonset status via `kubectl -n $NAMESPACE describe daemonset $NAME`.
- Check [DaemonSet update strategy](https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/)
- Check the status of the pods which belong to the daemonset.
- Check pod template parameters such as:
  - pod priority - maybe it was evicted by other more important pods
  - resources - maybe it tries to use an unavailable resource, such as GPU, but
    there is a limited number of nodes with GPU
  - affinity rules - maybe due to affinities and not enough nodes it is not
    possible to schedule pods
  - pod termination grace period - if it is too long, pods may stay in the
    terminating state for too long
- Check if Horizontal Pod Autoscaler (HPA) is not triggered due to untested
values (requests values).
- Check if cluster-autoscaler is able to create new nodes - see its logs or
cluster-autoscaler status configmap.
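
To see how far a rollout has progressed, something like the following could be used. The helper names are hypothetical, and the 60 second default timeout is an arbitrary choice for this sketch.

```shell
# Hypothetical helpers, not part of kubectl.

# Wait for the rollout, giving up after TIMEOUT (default 60s).
# Usage: rollout_progress NAMESPACE NAME [TIMEOUT]
rollout_progress() {
  kubectl -n "$1" rollout status "daemonset/$2" --timeout="${3:-60s}"
}

# Update strategy currently configured on the DaemonSet.
update_strategy() {
  kubectl -n "$1" get daemonset "$2" -o jsonpath='{.spec.updateStrategy}{"\n"}'
}

# How many pods already run the new template and how many are unavailable.
updated_counts() {
  kubectl -n "$1" get daemonset "$2" \
    -o jsonpath='updated={.status.updatedNumberScheduled} desired={.status.desiredNumberScheduled} unavailable={.status.numberUnavailable}{"\n"}'
}
```

If `updated` stays below `desired` while `unavailable` is non-zero, the pod being replaced on one of the nodes is the place to look next.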
## Mitigation
See [DaemonSet rolling update is stuck](https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/#daemonset-rolling-update-is-stuck).
In some rare cases you may need to change node affinities or delete the pod
manually, if this is a special daemonset
which has pod priority class `system-cluster-critical` and is limited to only
1 replica (so it runs on a specific node only).
See [Debugging Pods](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/#debugging-pods).

@@ -0,0 +1,51 @@
---
title: Kube Deployment Generation Mismatch
weight: 20
---
# KubeDeploymentGenerationMismatch
## Meaning
Deployment generation mismatch due to possible roll-back.
## Impact
Service degradation or unavailability.
## Diagnosis
See [Kubernetes Docs - Failed Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#failed-deployment)
- Check rollout history via `kubectl -n $NAMESPACE rollout history deployment $NAME`
- Check rollout status to make sure the rollout is not paused
- Check deployment status via `kubectl -n $NAMESPACE describe deployment $NAME`.
- Check how many replicas are declared.
- Investigate whether new pods are crashing.
- Check the status of the pods which belong to the replica sets under the deployment.
- Check pod template parameters such as:
  - pod priority - maybe it was evicted by other more important pods
  - resources - maybe it tries to use unavailable resource, such as GPU
    but there is limited number of nodes with GPU
  - affinity rules - maybe due to affinities and not enough nodes it is
    not possible to schedule pods
  - pod termination grace period - if too long then pods may be for too long
    in terminating state
- Check if Horizontal Pod Autoscaler (HPA) is not triggered due to untested
values (requests values).
- Check if cluster-autoscaler is able to create new nodes - see its logs or
cluster-autoscaler status configmap.
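
The mismatch itself can be inspected directly. The sketch below assumes the alert compares `.metadata.generation` with `.status.observedGeneration`; the helper names are hypothetical.

```shell
# Hypothetical helpers, not part of kubectl.

# Print "GENERATION OBSERVED_GENERATION" for a deployment.
# Usage: deployment_generations NAMESPACE NAME
deployment_generations() {
  kubectl -n "$1" get deployment "$2" \
    -o jsonpath='{.metadata.generation} {.status.observedGeneration}{"\n"}'
}

# Read that line from stdin and say whether the controller has caught up.
generation_state() {
  awk '{ print ($1 == $2 ? "in sync" : "mismatch: spec generation " $1 ", observed " $2) }'
}
```

For example, `deployment_generations "$NAMESPACE" "$NAME" | generation_state` reports a mismatch while the deployment controller has not yet processed the latest spec.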
## Mitigation
Depending on the conditions, adding new nodes usually solves the issue.
Otherwise the deployment or HPA definition probably needs to be fixed.
If you cannot add nodes, you can change the deployment strategy from `RollingUpdate` to `Recreate`.
Sometimes manually deleting a pod helps :)
In rare cases roll back to the previous version - see [Kubernetes Docs - Rolling Back](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-back-to-a-previous-revision).
In extremely rare situations scale the oldest ReplicaSets to 0 and delete them.
See [Debugging Pods](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/#debugging-pods).
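
The rollback and the strategy change could be scripted roughly as follows. The helper names are hypothetical. Setting `rollingUpdate` to `null` in the patch is an assumption about what the API server needs when switching strategy types, so try it on a non-critical deployment first.

```shell
# Hypothetical helpers, not part of kubectl.

# Roll back to the previous revision, or to REVISION if one is given.
# Usage: rollback_deployment NAMESPACE NAME [REVISION]
rollback_deployment() {
  if [ -n "$3" ]; then
    kubectl -n "$1" rollout undo "deployment/$2" --to-revision="$3"
  else
    kubectl -n "$1" rollout undo "deployment/$2"
  fi
}

# Switch the strategy to Recreate: old pods are removed before new ones start,
# which causes downtime but needs no spare capacity.
use_recreate_strategy() {
  kubectl -n "$1" patch deployment "$2" \
    -p '{"spec":{"strategy":{"type":"Recreate","rollingUpdate":null}}}'
}
```

Revision numbers for `rollback_deployment` come from `kubectl -n $NAMESPACE rollout history deployment $NAME`.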

@@ -0,0 +1,52 @@
---
title: Kube Deployment Replicas Mismatch
weight: 20
---
# KubeDeploymentReplicasMismatch
## Meaning
Deployment has not matched the expected number of replicas.
<details>
<summary>Full context</summary>
The Kubernetes Deployment resource does not have the number of replicas that
were declared to be in operation.
For example, a deployment is expected to have 3 replicas, but it has fewer than
that for a noticeable period of time.
On rare occasions there may be more replicas than there should be because the
system did not clean them up.
</details>
## Impact
Service degradation or unavailability.
## Diagnosis
- Check deployment status via `kubectl -n $NAMESPACE describe deployment $NAME`.
- Check how many replicas are declared.
- Check the status of the pods which belong to the replica sets under the deployment.
- Check pod template parameters such as:
  - pod priority - maybe it was evicted by other more important pods
  - resources - maybe it tries to use unavailable resource, such as GPU
    but there is limited number of nodes with GPU
  - affinity rules - maybe due to affinities and not enough nodes it is
    not possible to schedule pods
  - pod termination grace period - if too long then pods may be for too long
    in terminating state
- Check if Horizontal Pod Autoscaler (HPA) is not triggered due to untested
values (requests values).
- Check if cluster-autoscaler is able to create new nodes - see its logs or
cluster-autoscaler status configmap.
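
A quick way to compare declared and actual replicas is sketched below, with hypothetical helper names. Status fields that are zero may be omitted by the API and then print as empty.

```shell
# Hypothetical helpers, not part of kubectl.

# Declared replicas next to the counters reported in the deployment status.
# Usage: replica_counts NAMESPACE NAME
replica_counts() {
  kubectl -n "$1" get deployment "$2" \
    -o jsonpath='desired={.spec.replicas} ready={.status.readyReplicas} available={.status.availableReplicas} unavailable={.status.unavailableReplicas}{"\n"}'
}

# Pods in the namespace that are not in the Running phase.
not_running_pods() {
  kubectl -n "$1" get pods --field-selector='status.phase!=Running' -o wide
}
```

Pods listed by `not_running_pods` are good candidates for `kubectl describe pod`, which shows the events explaining why they are not running.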
## Mitigation
Depending on the conditions usually adding new nodes solves the issue.
Otherwise probably deployment or HPA definition needs to be fixed.
See [Debugging Pods](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-application/#debugging-pods)

@@ -0,0 +1,32 @@
---
title: Kube HPA Maxed Out
weight: 20
---
# KubeHpaMaxedOut
## Meaning
Horizontal Pod Autoscaler has been running at max replicas for longer
than 15 minutes.
## Impact
Horizontal Pod Autoscaler won't be able to add new pods and thus cannot scale the application.
**Notice**: for some services running HPA at its maximum is in fact desired.
## Diagnosis
Check why HPA was unable to scale:
- max replicas too low
- too low value for requests such as CPU?
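
To check both points, the HPA limits and its current state can be read as in this sketch. The helper names are hypothetical.

```shell
# Hypothetical helpers, not part of kubectl.

# Configured limits next to the current and desired replica counts.
# Usage: hpa_limits NAMESPACE NAME
hpa_limits() {
  kubectl -n "$1" get hpa "$2" \
    -o jsonpath='min={.spec.minReplicas} max={.spec.maxReplicas} current={.status.currentReplicas} desired={.status.desiredReplicas}{"\n"}'
}

# Metrics, conditions and recent events of the HPA.
hpa_details() {
  kubectl -n "$1" describe hpa "$2"
}
```

When `current` equals `max` and the metrics shown by `hpa_details` are still above their targets, the HPA would scale further if it were allowed to.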
## Mitigation
If using basic metrics like CPU/Memory, make sure to set proper values for
`requests`.
For memory-based scaling ensure there are no memory leaks.
If using custom metrics, fine tune how the app scales according to them.
Use performance tests to see how the app scales.
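
When tuning `requests` or targets, it can help to estimate what the HPA will ask for. The hypothetical helper below applies the documented scaling formula, desired = ceil(current replicas x current metric / target metric). It is a simplification that ignores the HPA tolerance, the min/max clamping and pods that are not ready.

```shell
# Hypothetical helper: estimate the replica count the HPA would compute.
# Usage: hpa_desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC
hpa_desired_replicas() {
  awk -v r="$1" -v c="$2" -v t="$3" 'BEGIN {
    d = r * c / t
    n = int(d)
    if (n < d) n++   # round up, valid for positive values
    print n
  }'
}
```

For instance, 3 replicas at 80% average CPU utilization with a 50% target give `hpa_desired_replicas 3 80 50`, which prints 5. If that number is above `maxReplicas`, the HPA stays maxed out.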

@@ -0,0 +1,28 @@
---
title: Kube HPA Replicas Mismatch
weight: 20
---
# KubeHpaReplicasMismatch
## Meaning
Horizontal Pod Autoscaler has not matched the desired number of replicas for
longer than 15 minutes.
## Impact
HPA was unable to schedule desired number of pods.
## Diagnosis
Check why HPA was unable to scale:
- not enough nodes in the cluster
- hitting resource quotas in the cluster
- pods evicted due to pod priority
## Mitigation
When using cluster-autoscaler you may need to set up preemptive pod pools to
ensure nodes are created in time.

@@ -0,0 +1,25 @@
---
title: Kube Job Completion
weight: 20
---
# KubeJobCompletion
## Meaning
Job is taking more than 1h to complete.
## Impact
- Long processing of batch jobs.
- Possible issues with scheduling next Job
## Diagnosis
- Check job via `kubectl -n $NAMESPACE describe jobs $JOB`.
- Check pod events via `kubectl -n $NAMESPACE describe pods -l job-name=$JOB`.
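
Beyond `describe`, the job status, pods and logs can be pulled as in this sketch. The helper names are hypothetical, and the 50-line default for logs is an arbitrary choice.

```shell
# Hypothetical helpers, not part of kubectl.

# Start time and pod counters from the Job status.
# Usage: job_status NAMESPACE JOB
job_status() {
  kubectl -n "$1" get job "$2" \
    -o jsonpath='started={.status.startTime} active={.status.active} succeeded={.status.succeeded} failed={.status.failed}{"\n"}'
}

# Pods created by the Job, found through the job-name label.
job_pods() {
  kubectl -n "$1" get pods -l "job-name=$2" -o wide
}

# Last LINES log lines (default 50) from one of the Job's pods.
# Usage: job_logs NAMESPACE JOB [LINES]
job_logs() {
  kubectl -n "$1" logs "job/$2" --tail="${3:-50}"
}
```

A growing `failed` count suggests the job keeps retrying, while a single long-lived `active` pod suggests the work itself is slow.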
## Mitigation
- Give it more resources so it finishes faster, if applicable.
- See [Job patterns](https://kubernetes.io/docs/tasks/job/)