🔥 Add prometheus-stack + Grafana content (from LKE workshop) and update metrics-server section
@@ -1,69 +1,182 @@
# Checking pod and node resource usage

# Checking Node and Pod resource usage

- Since Kubernetes 1.8, metrics are collected by the [resource metrics pipeline](https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/)

- We've installed a few things on our cluster so far

- The resource metrics pipeline is:

- How many resources (CPU, RAM) are we using?

  - optional (Kubernetes can function without it)

  - necessary for some features (like the Horizontal Pod Autoscaler)

  - exposed through the Kubernetes API using the [aggregation layer](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/)

  - usually implemented by the "metrics server"

---
## How to know if the metrics server is running?

- The easiest way to know is to run `kubectl top`

- We need metrics!

.exercise[

- Check if the core metrics pipeline is available:

- Let's try the following command:
  ```bash
  kubectl top nodes
  ```

]

---
## Is metrics-server installed?

- If we see a list of nodes, with CPU and RAM usage:

  *great, metrics-server is installed!*

- If we see `error: Metrics API not available`:

  *metrics-server isn't installed, so we'll install it!*

---
## The resource metrics pipeline

- The `kubectl top` command relies on the Metrics API

- The Metrics API is part of the "[resource metrics pipeline]"

- The Metrics API isn't served by (i.e., built into) the Kubernetes API server

- It is made available through the [aggregation layer]

- It is usually served by a component called metrics-server

- It is optional (Kubernetes can function without it)

- It is necessary for some features (like the Horizontal Pod Autoscaler)

[resource metrics pipeline]: https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/

[aggregation layer]: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/
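
- To see this wiring for ourselves, we can query the aggregation layer and the Metrics API directly (a quick sketch; it assumes the Metrics API is exposed under its usual `metrics.k8s.io/v1beta1` group/version):
  ```bash
  # Is a Metrics API server registered with the aggregation layer?
  kubectl get apiservices | grep metrics.k8s.io

  # Query the Metrics API directly through the Kubernetes API server
  kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
  ```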

---

## Other ways to get metrics

- We could use a SaaS like Datadog, New Relic...

- We could use a self-hosted solution like Prometheus

- Or we could use metrics-server

- What's special about metrics-server?

---
## Pros/cons

Cons:

- no data retention (no historical data, just instant numbers)

- only CPU and RAM of nodes and pods (no disk or network usage or I/O...)

Pros:

- very lightweight

- doesn't require storage

- used by Kubernetes autoscaling
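
- For instance, the Horizontal Pod Autoscaler consumes these instant numbers (a sketch; `web` is a hypothetical Deployment name):
  ```bash
  # Scale the (hypothetical) "web" Deployment between 1 and 10 replicas,
  # targeting 80% CPU utilization as reported by the Metrics API
  kubectl autoscale deployment web --cpu-percent=80 --min=1 --max=10

  # Without a working Metrics API, the TARGETS column stays at <unknown>
  kubectl get hpa web
  ```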

---

## Why metrics-server

- We may install something fancier later

  (think: Prometheus with Grafana)

- But metrics-server will work in *minutes*

- It will barely use resources on our cluster

- It's required for autoscaling anyway

---
## How metrics-server works

- It runs a single Pod

- That Pod will fetch metrics from all our Nodes

- It will expose them through the Kubernetes API aggregation layer

  (we won't say much more about that aggregation layer; that's fairly advanced stuff!)
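
- If we're curious about what metrics-server reads from each node, we can hit a kubelet's resource metrics endpoint through the API server proxy (a sketch; the `/metrics/resource` path exists on reasonably recent kubelets):
  ```bash
  # Pick the name of the first node in the cluster
  NODE=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')

  # Fetch that node's CPU and memory metrics from its kubelet,
  # proxied through the Kubernetes API server
  kubectl get --raw /api/v1/nodes/$NODE/proxy/metrics/resource
  ```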

---

## Installing metrics-server

- In a lot of places, this is done with a little bit of custom YAML

  (derived from the [official installation instructions](https://github.com/kubernetes-sigs/metrics-server#installation))

- We're going to use Helm one more time:
  ```bash
  helm upgrade --install metrics-server bitnami/metrics-server \
    --create-namespace --namespace metrics-server \
    --set apiService.create=true \
    --set extraArgs.kubelet-insecure-tls=true \
    --set extraArgs.kubelet-preferred-address-types=InternalIP
  ```

- What are these options for?

---
## Installation options

- `apiService.create=true`

  register `metrics-server` with the Kubernetes aggregation layer

  (create an entry that will show up in `kubectl get apiservices`)

- `extraArgs.kubelet-insecure-tls=true`

  when connecting to nodes to collect their metrics, don't check kubelet TLS certs

  (because most kubelet certs include the node name, but not its IP address)

- `extraArgs.kubelet-preferred-address-types=InternalIP`

  when connecting to nodes, use their internal IP address instead of node name

  (because the latter requires an internal DNS, which is rarely configured)
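
- We can check that the APIService registration ends up healthy (a sketch; `v1beta1.metrics.k8s.io` is the name the aggregation layer typically uses for metrics-server):
  ```bash
  # The APIService should eventually report AVAILABLE=True
  kubectl get apiservice v1beta1.metrics.k8s.io

  # If it stays unavailable, its conditions usually explain why
  kubectl describe apiservice v1beta1.metrics.k8s.io
  ```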

---

## Testing metrics-server

- After a minute or two, metrics-server should be up

- We should now be able to check Nodes resource usage:
  ```bash
  kubectl top nodes
  ```

]

If it shows our nodes and their CPU and memory load, we're good!
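
If `kubectl top nodes` still fails after a few minutes, the metrics-server Pod and its logs are the first things to check (a sketch; it assumes the chart created a Deployment named `metrics-server` in the `metrics-server` namespace, which is its usual default):
  ```bash
  # Is the metrics-server Pod running and ready?
  kubectl get pods --namespace metrics-server

  # Look for TLS or node-address errors in its logs
  kubectl logs --namespace metrics-server deployment/metrics-server
  ```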

---

## Installing metrics server

- The metrics server doesn't have any particular requirements

  (it doesn't need persistence, as it doesn't *store* metrics)

- It has its own repository, [kubernetes-incubator/metrics-server](https://github.com/kubernetes-incubator/metrics-server)

- The repository comes with [YAML files for deployment](https://github.com/kubernetes-incubator/metrics-server/tree/master/deploy/1.8%2B)

- These files may not work on some clusters

  (e.g. if your node names are not in DNS)

- The container.training repository has a [metrics-server.yaml](https://github.com/jpetazzo/container.training/blob/master/k8s/metrics-server.yaml#L90) file to help with that

  (we can `kubectl apply -f` that file if needed)

---
## Showing container resource usage

- Once the metrics server is running, we can check container resource usage

.exercise[

- Show resource usage across all containers:

- And Pods resource usage, too:
  ```bash
  kubectl top pods --containers --all-namespaces
  kubectl top pods --all-namespaces
  ```

]

- We can also use selectors (`-l app=...`)
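
- For example (a sketch; `app=web` is just a hypothetical label), we can filter and sort the output:
  ```bash
  # Only show Pods carrying the (hypothetical) app=web label
  kubectl top pods -l app=web

  # Sort all Pods by memory usage to spot the hungriest ones
  kubectl top pods --all-namespaces --sort-by=memory
  ```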

---

## Keep some padding

- The RAM usage that we see should correspond more or less to the Resident Set Size

- Our pods also need some extra space for buffers, caches...

- Do not aim for 100% memory usage!

- Some more realistic targets:

  50% (for workloads with disk I/O and leveraging caching)

  90% (on very big nodes with mostly CPU-bound workloads)

  75% (anywhere in between!)
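
- For example, on a hypothetical node with 16 GiB of RAM, aiming for 75% means leaving about 4 GiB of headroom for caches, buffers, and usage spikes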

---

@@ -83,5 +196,8 @@ If it shows our nodes and their CPU and memory load, we're good!

???

:EN:- The *core metrics pipeline*
:FR:- Le *core metrics pipeline*
:EN:- The resource metrics pipeline
:EN:- Installing metrics-server

:FR:- Le *resource metrics pipeline*
:FR:- Installation de metrics-server

slides/k8s/prometheus-stack.md (new file, 123 lines)

@@ -0,0 +1,123 @@
# Prometheus and Grafana

- What if we want to retain metrics, view graphs, and spot trends?

- A very popular combo is Prometheus+Grafana:

  - Prometheus as the "metrics engine"

  - Grafana to display comprehensive dashboards

- Prometheus also has an Alertmanager component to trigger alerts

  (we won't talk about that one)

---
## Installing Prometheus and Grafana

- A complete metrics stack needs at least:

  - the Prometheus server (collects metrics and stores them efficiently)

  - a collection of *exporters* (exposing metrics to Prometheus)

  - Grafana

  - a collection of Grafana dashboards (building them from scratch is tedious)

- The Helm chart `kube-prometheus-stack` combines all these elements

- ... So we're going to use it to deploy our metrics stack!

---
## Installing `kube-prometheus-stack`

- Let's install that stack *directly* from its repo

  (without doing `helm repo add` first)

- Otherwise, keep the same naming strategy:
  ```bash
  helm upgrade --install kube-prometheus-stack kube-prometheus-stack \
    --namespace kube-prometheus-stack --create-namespace \
    --repo https://prometheus-community.github.io/helm-charts
  ```

- This will take a minute...

- Then check what was installed:
  ```bash
  kubectl get all --namespace kube-prometheus-stack
  ```
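
- Note that `kubectl get all` doesn't show everything: the chart also installs custom resources (a sketch; it assumes the chart's CRDs were installed with the default settings):
  ```bash
  # List the CRDs contributed by the Prometheus operator
  kubectl get crds | grep monitoring.coreos.com

  # The Prometheus and Alertmanager instances themselves are custom resources
  kubectl get prometheus,alertmanager --namespace kube-prometheus-stack
  ```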

---

## Exposing Grafana

- Let's create an Ingress for Grafana
  ```bash
  kubectl create ingress --namespace kube-prometheus-stack grafana \
    --rule=grafana.`cloudnative.party`/*=kube-prometheus-stack-grafana:80
  ```

  (as usual, make sure to use *your* domain name above)
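
- If you don't have an Ingress controller or a domain at hand, one alternative sketch is to port-forward to the Grafana Service (its name comes from the Ingress rule above):
  ```bash
  # Forward local port 3000 to the Grafana Service's port 80,
  # then browse to http://localhost:3000
  kubectl port-forward --namespace kube-prometheus-stack \
    service/kube-prometheus-stack-grafana 3000:80
  ```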

- Connect to Grafana

  (remember that the DNS record might take a few minutes to come up)

---
## Grafana credentials

- What could the login and password be?

- Let's look at the Secrets available in the namespace:
  ```bash
  kubectl get secrets --namespace kube-prometheus-stack
  ```

- There is a `kube-prometheus-stack-grafana` that looks promising!

- Decode the Secret:
  ```bash
  kubectl get secret --namespace kube-prometheus-stack \
    kube-prometheus-stack-grafana -o json | jq '.data | map_values(@base64d)'
  ```

- If you don't have the `jq` tool mentioned above, don't worry...
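
- ... `kubectl` and `base64` can do the job too (a sketch; it assumes the Secret uses the Grafana chart's usual `admin-user` / `admin-password` keys):
  ```bash
  # Decode just the admin user and password from the Grafana Secret
  kubectl get secret --namespace kube-prometheus-stack kube-prometheus-stack-grafana \
    -o jsonpath='{.data.admin-user}' | base64 -d; echo
  kubectl get secret --namespace kube-prometheus-stack kube-prometheus-stack-grafana \
    -o jsonpath='{.data.admin-password}' | base64 -d; echo
  ```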

--

- The login/password is hardcoded to `admin`/`prom-operator` 😬

---
## Grafana dashboards

- Once logged in, click on the "Dashboards" icon on the left

  (it's the one that looks like four squares)

- Then click on the "Manage" entry

- Then click on "Kubernetes / Compute Resources / Cluster"

- This gives us a breakdown of resource usage by Namespace

- Feel free to explore the other dashboards!

???

:EN:- Installing Prometheus and Grafana
:FR:- Installer Prometheus et Grafana

:T: Observing our cluster with Prometheus and Grafana

:Q: What's the relationship between Prometheus and Grafana?
:A: Prometheus collects and graphs metrics; Grafana sends alerts
:A: ✔️Prometheus collects metrics; Grafana displays them on dashboards
:A: Prometheus collects and graphs metrics; Grafana is its configuration interface
:A: Grafana collects and graphs metrics; Prometheus sends alerts

@@ -64,6 +64,7 @@ content:
- k8s/cluster-sizing.md
- k8s/horizontal-pod-autoscaler.md
- - k8s/prometheus.md
#- k8s/prometheus-stack.md
- k8s/extending-api.md
- k8s/crd.md
- k8s/operators.md

@@ -69,6 +69,7 @@ content:
- k8s/aggregation-layer.md
- k8s/metrics-server.md
- k8s/prometheus.md
- k8s/prometheus-stack.md
- k8s/hpa-v2.md
- #9
- k8s/operators-design.md

@@ -106,6 +106,7 @@ content:
#- k8s/build-with-kaniko.md
#- k8s/logs-centralized.md
#- k8s/prometheus.md
#- k8s/prometheus-stack.md
#- k8s/statefulsets.md
#- k8s/local-persistent-volumes.md
#- k8s/portworx.md

@@ -116,6 +116,7 @@ content:
-
- k8s/logs-centralized.md
- k8s/prometheus.md
- k8s/prometheus-stack.md
- k8s/resource-limits.md
- k8s/metrics-server.md
- k8s/cluster-sizing.md

@@ -104,7 +104,8 @@ content:
- k8s/configuration.md
- k8s/secrets.md
- k8s/logs-centralized.md
- k8s/prometheus.md
#- k8s/prometheus.md
#- k8s/prometheus-stack.md
-
- k8s/statefulsets.md
- k8s/local-persistent-volumes.md