🔥 Add prometheus-stack + Grafana content (from LKE workshop) and update metrics-server section

This commit is contained in:
Jérôme Petazzoni
2021-05-04 17:19:59 +02:00
parent bbf65f7433
commit 98429e14f0
7 changed files with 300 additions and 56 deletions

View File

@@ -1,69 +1,182 @@
# Checking Node and Pod resource usage
- We've installed a few things on our cluster so far
- How many resources (CPU, RAM) are we using?
- We need metrics!
.exercise[
- Let's check if the resource metrics pipeline is available, with the following command:
```bash
kubectl top nodes
```
]
---
## Is metrics-server installed?
- If we see a list of nodes, with CPU and RAM usage:
*great, metrics-server is installed!*
- If we see `error: Metrics API not available`:
*metrics-server isn't installed, so we'll install it!*
---
## The resource metrics pipeline
- The `kubectl top` command relies on the Metrics API
- The Metrics API is part of the "[resource metrics pipeline]"
(introduced in Kubernetes 1.8)
- The Metrics API isn't served by the Kubernetes API server itself
- It is made available through the [aggregation layer]
- It is usually served by a component called metrics-server
(we can check how it is registered, as shown below)
- It is optional (Kubernetes can function without it)
- It is necessary for some features (like the Horizontal Pod Autoscaler)
[resource metrics pipeline]: https://kubernetes.io/docs/tasks/debug-application-cluster/resource-metrics-pipeline/
[aggregation layer]: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/
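A quick way to see whether the Metrics API is plugged into the aggregation layer is to list the registered APIServices:
```bash
# List the APIs registered through the aggregation layer;
# once metrics-server is installed, the metrics.k8s.io group shows up here
kubectl get apiservices | grep metrics.k8s.io
```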
---
## Other ways to get metrics
- We could use a SaaS like Datadog, New Relic...
- We could use a self-hosted solution like Prometheus
- Or we could use metrics-server
- What's special about metrics-server?
---
## Pros/cons
Cons:
- no data retention (no historical data, just instant values)
- only CPU and RAM of nodes and pods (no disk or network usage or I/O...)
Pros:
- very lightweight
- doesn't require storage
- used by Kubernetes autoscaling
---
## Why metrics-server
- We may install something fancier later
(think: Prometheus with Grafana)
- But metrics-server will work in *minutes*
- It will barely use resources on our cluster
- It's required for autoscaling anyway
---
## How metrics-server works
- It runs a single Pod
- That Pod will fetch metrics from all our Nodes
- It will expose them through the Kubernetes API aggregation layer
(we won't say much more about that aggregation layer; that's fairly advanced stuff, but see the example below for a peek!)
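Once metrics-server is registered, its data can be read through the regular Kubernetes API, under the `metrics.k8s.io` group (that's what `kubectl top` does behind the scenes); for instance:
```bash
# Read node metrics straight from the aggregated Metrics API
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
# Same idea for the Pods of the default Namespace
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods
```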
---
## Installing metrics-server
- In a lot of places, this is done with a little bit of custom YAML
(derived from the [official installation instructions](https://github.com/kubernetes-sigs/metrics-server#installation))
- We're going to use Helm one more time:
```bash
helm upgrade --install metrics-server bitnami/metrics-server \
--create-namespace --namespace metrics-server \
--set apiService.create=true \
--set extraArgs.kubelet-insecure-tls=true \
--set extraArgs.kubelet-preferred-address-types=InternalIP
```
- What are these options for?
---
## Installation options
- `apiService.create=true`
register `metrics-server` with the Kubernetes aggregation layer
(create an entry that will show up in `kubectl get apiservices`)
- `extraArgs.kubelet-insecure-tls=true`
when connecting to nodes to collect their metrics, don't check kubelet TLS certs
(because most kubelet certs include the node name, but not its IP address)
- `extraArgs.kubelet-preferred-address-types=InternalIP`
when connecting to nodes, use their internal IP address instead of node name
(because the latter requires an internal DNS, which is rarely configured)
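To see where these values end up, we can inspect the flags passed to the metrics-server container (a sketch, assuming the chart created a Deployment named `metrics-server` in the `metrics-server` Namespace):
```bash
# Show the container arguments; we should see --kubelet-insecure-tls
# and --kubelet-preferred-address-types=InternalIP in the output
kubectl --namespace metrics-server get deployment metrics-server \
        -o jsonpath='{.spec.template.spec.containers[0].args}'
```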
---
## Testing metrics-server
- After a minute or two, metrics-server should be up
- We should now be able to check Nodes resource usage:
.exercise[
```bash
kubectl top nodes
```
]
If it shows our nodes and their CPU and memory load, we're good!
---
## Installing metrics-server with YAML
- Instead of using Helm, we can also deploy metrics-server with plain YAML manifests
- The metrics server doesn't have any particular requirements
(it doesn't need persistence, as it doesn't *store* metrics)
- It has its own repository, [kubernetes-incubator/metrics-server](https://github.com/kubernetes-incubator/metrics-server)
- The repository comes with [YAML files for deployment](https://github.com/kubernetes-incubator/metrics-server/tree/master/deploy/1.8%2B)
- These files may not work on some clusters
(e.g. if your node names are not in DNS)
- The container.training repository has a [metrics-server.yaml](https://github.com/jpetazzo/container.training/blob/master/k8s/metrics-server.yaml#L90) file to help with that
(we can `kubectl apply -f` that file if needed; see the example below)
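A possible invocation (a sketch; the raw URL below is derived from the repository link above and assumes the default branch):
```bash
# Deploy metrics-server with the container.training manifest
kubectl apply -f https://raw.githubusercontent.com/jpetazzo/container.training/master/k8s/metrics-server.yaml
```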
---
## Showing container resource usage
- Once the metrics server is running, we can check container resource usage
.exercise[
- Show resource usage across all containers:
```bash
kubectl top pods --containers --all-namespaces
```
- And Pods resource usage, too:
```bash
kubectl top pods --all-namespaces
```
]
- We can also use selectors (`-l app=...`)
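For instance, to see only the Pods carrying a given label (the `app=web` selector below is a hypothetical example):
```bash
# Show resource usage only for Pods matching a label selector
kubectl top pods -l app=web
```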
---
## Keep some padding
- The RAM usage that we see should correspond more or less to the Resident Set Size
- Our pods also need some extra space for buffers, caches...
- Do not aim for 100% memory usage!
- Some more realistic targets:
50% (for workloads with disk I/O and leveraging caching)
90% (on very big nodes with mostly CPU-bound workloads)
75% (anywhere in between!)
---
@@ -83,5 +196,8 @@ If it shows our nodes and their CPU and memory load, we're good!
???
:EN:- The resource metrics pipeline
:EN:- Installing metrics-server
:FR:- Le *resource metrics pipeline*
:FR:- Installation de metrics-server

View File

@@ -0,0 +1,123 @@
# Prometheus and Grafana
- What if we want to retain metrics over time, view graphs, and spot trends?
- A very popular combo is Prometheus+Grafana:
- Prometheus as the "metrics engine"
- Grafana to display comprehensive dashboards
- Prometheus also has an Alertmanager component to trigger alerts
(we won't talk about that one)
---
## Installing Prometheus and Grafana
- A complete metrics stack needs at least:
- the Prometheus server (collects metrics and stores them efficiently)
- a collection of *exporters* (exposing metrics to Prometheus)
- Grafana
- a collection of Grafana dashboards (building them from scratch is tedious)
- The Helm chart `kube-prometheus-stack` combines all these elements
- ... So we're going to use it to deploy our metrics stack!
---
## Installing `kube-prometheus-stack`
- Let's install that stack *directly* from its repo
(without doing `helm repo add` first)
- Otherwise, keep the same naming strategy:
```bash
helm upgrade --install kube-prometheus-stack kube-prometheus-stack \
--namespace kube-prometheus-stack --create-namespace \
--repo https://prometheus-community.github.io/helm-charts
```
- This will take a minute...
- Then check what was installed:
```bash
kubectl get all --namespace kube-prometheus-stack
```
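We can also wait for Grafana to be fully rolled out before moving on (the Deployment name below assumes the chart's default naming, i.e. release name + `-grafana`):
```bash
# Wait until the Grafana Deployment is ready
kubectl rollout status deployment kube-prometheus-stack-grafana \
        --namespace kube-prometheus-stack
```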
---
## Exposing Grafana
- Let's create an Ingress for Grafana
```bash
kubectl create ingress --namespace kube-prometheus-stack grafana \
--rule=grafana.`cloudnative.party`/*=kube-prometheus-stack-grafana:80
```
(as usual, make sure to use *your* domain name above)
- Connect to Grafana
(remember that the DNS record might take a few minutes to come up)
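We can also double-check that the Ingress was created with our host name:
```bash
# The Ingress should show the grafana host name we just configured
kubectl get ingress grafana --namespace kube-prometheus-stack
```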
---
## Grafana credentials
- What could the login and password be?
- Let's look at the Secrets available in the namespace:
```bash
kubectl get secrets --namespace kube-prometheus-stack
```
- There is a `kube-prometheus-stack-grafana` that looks promising!
- Decode the Secret:
```bash
kubectl get secret --namespace kube-prometheus-stack \
kube-prometheus-stack-grafana -o json | jq '.data | map_values(@base64d)'
```
- If you don't have the `jq` tool mentioned above, don't worry...
--
- The login/password is hardcoded to `admin`/`prom-operator` 😬
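For the record, without `jq` we could also extract a single value with `kubectl` and decode it by hand (a sketch, assuming the Secret uses the usual `admin-user` and `admin-password` keys):
```bash
# Decode just the password field of the Grafana Secret
kubectl get secret --namespace kube-prometheus-stack \
        kube-prometheus-stack-grafana \
        -o jsonpath='{.data.admin-password}' | base64 -d; echo
```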
---
## Grafana dashboards
- Once logged in, click on the "Dashboards" icon on the left
(it's the one that looks like four squares)
- Then click on the "Manage" entry
- Then click on "Kubernetes / Compute Resources / Cluster"
- This gives us a breakdown of resource usage by Namespace
- Feel free to explore the other dashboards!
???
:EN:- Installing Prometheus and Grafana
:FR:- Installer Prometheus et Grafana
:T: Observing our cluster with Prometheus and Grafana
:Q: What's the relationship between Prometheus and Grafana?
:A: Prometheus collects and graphs metrics; Grafana sends alerts
:A: ✔Prometheus collects metrics; Grafana displays them on dashboards
:A: Prometheus collects and graphs metrics; Grafana is its configuration interface
:A: Grafana collects and graphs metrics; Prometheus sends alerts

View File

@@ -64,6 +64,7 @@ content:
- k8s/cluster-sizing.md
- k8s/horizontal-pod-autoscaler.md
- - k8s/prometheus.md
#- k8s/prometheus-stack.md
- k8s/extending-api.md
- k8s/crd.md
- k8s/operators.md

View File

@@ -69,6 +69,7 @@ content:
- k8s/aggregation-layer.md
- k8s/metrics-server.md
- k8s/prometheus.md
- k8s/prometheus-stack.md
- k8s/hpa-v2.md
- #9
- k8s/operators-design.md

View File

@@ -106,6 +106,7 @@ content:
#- k8s/build-with-kaniko.md
#- k8s/logs-centralized.md
#- k8s/prometheus.md
#- k8s/prometheus-stack.md
#- k8s/statefulsets.md
#- k8s/local-persistent-volumes.md
#- k8s/portworx.md

View File

@@ -116,6 +116,7 @@ content:
-
- k8s/logs-centralized.md
- k8s/prometheus.md
- k8s/prometheus-stack.md
- k8s/resource-limits.md
- k8s/metrics-server.md
- k8s/cluster-sizing.md

View File

@@ -104,7 +104,8 @@ content:
- k8s/configuration.md
- k8s/secrets.md
- k8s/logs-centralized.md
- k8s/prometheus.md
#- k8s/prometheus.md
#- k8s/prometheus-stack.md
-
- k8s/statefulsets.md
- k8s/local-persistent-volumes.md