🔥 Add prometheus-stack + Grafana content (from LKE workshop) and update metrics-server section

This commit is contained in:
Jérôme Petazzoni
2021-05-04 17:19:59 +02:00
parent bbf65f7433
commit 98429e14f0
7 changed files with 300 additions and 56 deletions

View File

@@ -1,69 +1,182 @@
# Checking Node and Pod resource usage
- We've installed a few things on our cluster so far
- How many resources (CPU, RAM) are we using?
- We need metrics!
.exercise[
- Let's check if the resource metrics pipeline is available, with the following command:
```bash
kubectl top nodes
```
]
---
## Is metrics-server installed?
- If we see a list of nodes, with CPU and RAM usage:
*great, metrics-server is installed!*
- If we see `error: Metrics API not available`:
*metrics-server isn't installed, so we'll install it!*
---
## The resource metrics pipeline
- The `kubectl top` command relies on the Metrics API
- The Metrics API is part of the "[resource metrics pipeline]"
(introduced in Kubernetes 1.8)
- The Metrics API isn't served by the Kubernetes API server itself
- It is made available through the [aggregation layer]
- It is usually served by a component called metrics-server
(we can check how it is registered, as shown below)
- It is optional (Kubernetes can function without it)
- It is necessary for some features (like the Horizontal Pod Autoscaler)
[resource metrics pipeline]: https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/
[aggregation layer]: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/
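A quick way to see whether the Metrics API is plugged into the aggregation layer is to list the registered APIServices:
```bash
# List the APIs registered through the aggregation layer;
# once metrics-server is installed, the metrics.k8s.io group shows up here
kubectl get apiservices | grep metrics.k8s.io
```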
---
## Other ways to get metrics
- We could use a SaaS like Datadog, New Relic...
- We could use a self-hosted solution like Prometheus
- Or we could use metrics-server
- What's special about metrics-server?
---
## Pros/cons
Cons:
- no data retention (no historical data, just instant values)
- only CPU and RAM of nodes and pods (no disk or network usage or I/O...)
Pros:
- very lightweight
- doesn't require storage
- used by Kubernetes autoscaling
---
## Why metrics-server
- We may install something fancier later
(think: Prometheus with Grafana)
- But metrics-server will work in *minutes*
- It will barely use resources on our cluster
- It's required for autoscaling anyway
---
## How metrics-server works
- It runs a single Pod
- That Pod will fetch metrics from all our Nodes
- It will expose them through the Kubernetes API aggregation layer
(we won't say much more about that aggregation layer; that's fairly advanced stuff, but see the example below for a peek!)
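Once metrics-server is registered, its data can be read through the regular Kubernetes API, under the `metrics.k8s.io` group (that's what `kubectl top` does behind the scenes); for instance:
```bash
# Read node metrics straight from the aggregated Metrics API
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
# Same idea for the Pods of the default Namespace
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods
```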
---
## Installing metrics-server
- In a lot of places, this is done with a little bit of custom YAML
(derived from the [official installation instructions](https://github.com/kubernetes-sigs/metrics-server#installation))
- We're going to use Helm one more time:
```bash
helm upgrade --install metrics-server bitnami/metrics-server \
--create-namespace --namespace metrics-server \
--set apiService.create=true \
--set extraArgs.kubelet-insecure-tls=true \
--set extraArgs.kubelet-preferred-address-types=InternalIP
```
- What are these options for?
---
## Installation options
- `apiService.create=true`
register `metrics-server` with the Kubernetes aggregation layer
(create an entry that will show up in `kubectl get apiservices`)
- `extraArgs.kubelet-insecure-tls=true`
when connecting to nodes to collect their metrics, don't check kubelet TLS certs
(because most kubelet certs include the node name, but not its IP address)
- `extraArgs.kubelet-preferred-address-types=InternalIP`
when connecting to nodes, use their internal IP address instead of node name
(because the latter requires an internal DNS, which is rarely configured)
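To see where these values end up, we can inspect the flags passed to the metrics-server container (a sketch, assuming the chart created a Deployment named `metrics-server` in the `metrics-server` Namespace):
```bash
# Show the container arguments; we should see --kubelet-insecure-tls
# and --kubelet-preferred-address-types=InternalIP in the output
kubectl --namespace metrics-server get deployment metrics-server \
        -o jsonpath='{.spec.template.spec.containers[0].args}'
```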
---
## Testing metrics-server
- After a minute or two, metrics-server should be up
- We should now be able to check Nodes resource usage:
.exercise[
```bash
kubectl top nodes
```
]
If it shows our nodes and their CPU and memory load, we're good!
---
## Installing metrics-server with YAML
- Instead of using Helm, we can also deploy metrics-server with plain YAML manifests
- The metrics server doesn't have any particular requirements
(it doesn't need persistence, as it doesn't *store* metrics)
- It has its own repository, [kubernetes-incubator/metrics-server](https://github.com/kubernetes-incubator/metrics-server)
- The repository comes with [YAML files for deployment](https://github.com/kubernetes-incubator/metrics-server/tree/master/deploy/1.8%2B)
- These files may not work on some clusters
(e.g. if your node names are not in DNS)
- The container.training repository has a [metrics-server.yaml](https://github.com/jpetazzo/container.training/blob/master/k8s/metrics-server.yaml#L90) file to help with that
(we can `kubectl apply -f` that file if needed; see the example below)
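A possible invocation (a sketch; the raw URL below is derived from the repository link above and assumes the default branch):
```bash
# Deploy metrics-server with the container.training manifest
kubectl apply -f https://raw.githubusercontent.com/jpetazzo/container.training/master/k8s/metrics-server.yaml
```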
---
## Showing container resource usage
- Once the metrics server is running, we can check container resource usage
.exercise[
- Show resource usage across all containers:
```bash
kubectl top pods --containers --all-namespaces
```
- And Pods resource usage, too:
```bash
kubectl top pods --all-namespaces
```
]
- We can also use selectors (`-l app=...`)
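For instance, to see only the Pods carrying a given label (the `app=web` selector below is a hypothetical example):
```bash
# Show resource usage only for Pods matching a label selector
kubectl top pods -l app=web
```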
---
## Keep some padding
- The RAM usage that we see should correspond more or less to the Resident Set Size
- Our pods also need some extra space for buffers, caches...
- Do not aim for 100% memory usage!
- Some more realistic targets:
50% (for workloads with disk I/O and leveraging caching)
90% (on very big nodes with mostly CPU-bound workloads)
75% (anywhere in between!)
---
@@ -83,5 +196,8 @@ If it shows our nodes and their CPU and memory load, we're good!
???
:EN:- The resource metrics pipeline
:EN:- Installing metrics-server
:FR:- Le *resource metrics pipeline*
:FR:- Installation de metrics-server

View File

@@ -0,0 +1,123 @@
# Prometheus and Grafana
- What if we want to retain metrics over time, view graphs, and spot trends?
- A very popular combo is Prometheus+Grafana:
- Prometheus as the "metrics engine"
- Grafana to display comprehensive dashboards
- Prometheus also has an Alertmanager component to trigger alerts
(we won't talk about that one)
---
## Installing Prometheus and Grafana
- A complete metrics stack needs at least:
- the Prometheus server (collects metrics and stores them efficiently)
- a collection of *exporters* (exposing metrics to Prometheus)
- Grafana
- a collection of Grafana dashboards (building them from scratch is tedious)
- The Helm chart `kube-prometheus-stack` combines all these elements
- ... So we're going to use it to deploy our metrics stack!
---
## Installing `kube-prometheus-stack`
- Let's install that stack *directly* from its repo
(without doing `helm repo add` first)
- Otherwise, keep the same naming strategy:
```bash
helm upgrade --install kube-prometheus-stack kube-prometheus-stack \
--namespace kube-prometheus-stack --create-namespace \
--repo https://prometheus-community.github.io/helm-charts
```
- This will take a minute...
- Then check what was installed:
```bash
kubectl get all --namespace kube-prometheus-stack
```
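We can also wait for Grafana to be fully rolled out before moving on (the Deployment name below assumes the chart's default naming, i.e. release name + `-grafana`):
```bash
# Wait until the Grafana Deployment is ready
kubectl rollout status deployment kube-prometheus-stack-grafana \
        --namespace kube-prometheus-stack
```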
---
## Exposing Grafana
- Let's create an Ingress for Grafana
```bash
kubectl create ingress --namespace kube-prometheus-stack grafana \
--rule=grafana.`cloudnative.party`/*=kube-prometheus-stack-grafana:80
```
(as usual, make sure to use *your* domain name above)
- Connect to Grafana
(remember that the DNS record might take a few minutes to come up)
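We can also double-check that the Ingress was created with our host name:
```bash
# The Ingress should show the grafana host name we just configured
kubectl get ingress grafana --namespace kube-prometheus-stack
```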
---
## Grafana credentials
- What could the login and password be?
- Let's look at the Secrets available in the namespace:
```bash
kubectl get secrets --namespace kube-prometheus-stack
```
- There is a `kube-prometheus-stack-grafana` that looks promising!
- Decode the Secret:
```bash
kubectl get secret --namespace kube-prometheus-stack \
kube-prometheus-stack-grafana -o json | jq '.data | map_values(@base64d)'
```
- If you don't have the `jq` tool mentioned above, don't worry...
--
- The login/password is hardcoded to `admin`/`prom-operator` 😬
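For the record, without `jq` we could also extract a single value with `kubectl` and decode it by hand (a sketch, assuming the Secret uses the usual `admin-user` and `admin-password` keys):
```bash
# Decode just the password field of the Grafana Secret
kubectl get secret --namespace kube-prometheus-stack \
        kube-prometheus-stack-grafana \
        -o jsonpath='{.data.admin-password}' | base64 -d; echo
```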
---
## Grafana dashboards
- Once logged in, click on the "Dashboards" icon on the left
(it's the one that looks like four squares)
- Then click on the "Manage" entry
- Then click on "Kubernetes / Compute Resources / Cluster"
- This gives us a breakdown of resource usage by Namespace
- Feel free to explore the other dashboards!
???
:EN:- Installing Prometheus and Grafana
:FR:- Installer Prometheus et Grafana
:T: Observing our cluster with Prometheus and Grafana
:Q: What's the relationship between Prometheus and Grafana?
:A: Prometheus collects and graphs metrics; Grafana sends alerts
:A: ✔Prometheus collects metrics; Grafana displays them on dashboards
:A: Prometheus collects and graphs metrics; Grafana is its configuration interface
:A: Grafana collects and graphs metrics; Prometheus sends alerts

View File

@@ -64,6 +64,7 @@ content:
- k8s/cluster-sizing.md
- k8s/horizontal-pod-autoscaler.md
- - k8s/prometheus.md
#- k8s/prometheus-stack.md
- k8s/extending-api.md
- k8s/crd.md
- k8s/operators.md

View File

@@ -69,6 +69,7 @@ content:
- k8s/aggregation-layer.md
- k8s/metrics-server.md
- k8s/prometheus.md
- k8s/prometheus-stack.md
- k8s/hpa-v2.md
- #9
- k8s/operators-design.md

View File

@@ -106,6 +106,7 @@ content:
#- k8s/build-with-kaniko.md
#- k8s/logs-centralized.md
#- k8s/prometheus.md
#- k8s/prometheus-stack.md
#- k8s/statefulsets.md
#- k8s/local-persistent-volumes.md
#- k8s/portworx.md

View File

@@ -116,6 +116,7 @@ content:
-
- k8s/logs-centralized.md
- k8s/prometheus.md
- k8s/prometheus-stack.md
- k8s/resource-limits.md
- k8s/metrics-server.md
- k8s/cluster-sizing.md

View File

@@ -104,7 +104,8 @@ content:
- k8s/configuration.md
- k8s/secrets.md
- k8s/logs-centralized.md
- k8s/prometheus.md
#- k8s/prometheus.md
#- k8s/prometheus-stack.md
-
- k8s/statefulsets.md
- k8s/local-persistent-volumes.md