mirror of https://github.com/jpetazzo/container.training.git synced 2026-05-04 07:56:37 +00:00

Files

Jerome Petazzoni 6593f4ad42 Chart → chart

As per https://helm.sh/docs/chart_best_practices/#usage-of-the-words-helm-tiller-and-chart

2019-05-25 17:44:28 -05:00

14 KiB

Raw Blame History

Collecting metrics with Prometheus

Prometheus is an open-source monitoring system including:
- multiple service discovery backends to figure out which metrics to collect
- a scraper to collect these metrics
- an efficient time series database to store these metrics
- a specific query language (PromQL) to query these time series
- an alert manager to notify us according to metrics values or trends
We are going to use it to collect and query some metrics on our Kubernetes cluster

Why Prometheus?

We don't endorse Prometheus more or less than any other system
It's relatively well integrated within the Cloud Native ecosystem
It can be self-hosted (this is useful for tutorials like this)
It can be used for deployments of varying complexity:
- one binary and 10 lines of configuration to get started
- all the way to thousands of nodes and millions of metrics

Exposing metrics to Prometheus

Prometheus obtains metrics and their values by querying exporters
An exporter serves metrics over HTTP, in plain text
This is what the node exporter looks like:

http://demo.robustperception.io:9100/metrics
Prometheus itself exposes its own internal metrics, too:

http://demo.robustperception.io:9090/metrics
If you want to expose custom metrics to Prometheus:
- serve a text page like these, and you're good to go
- libraries are available in various languages to help with quantiles etc.

How Prometheus gets these metrics

The Prometheus server will scrape URLs like these at regular intervals

(by default: every minute; can be more/less frequent)
If you're worried about parsing overhead: exporters can also use protobuf
The list of URLs to scrape (the scrape targets) is defined in configuration

Defining scrape targets

This is maybe the simplest configuration file for Prometheus:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

In this configuration, Prometheus collects its own internal metrics
A typical configuration file will have multiple scrape_configs
In this configuration, the list of targets is fixed
A typical configuration file will use dynamic service discovery

Service discovery

This configuration file will leverage existing DNS A records:

scrape_configs:
  - ...
  - job_name: 'node'
    dns_sd_configs:
      - names: ['api-backends.dc-paris-2.enix.io']
        type: 'A'
        port: 9100

In this configuration, Prometheus resolves the provided name(s)

(here, api-backends.dc-paris-2.enix.io)
Each resulting IP address is added as a target on port 9100

Dynamic service discovery

In the DNS example, the names are re-resolved at regular intervals
As DNS records are created/updated/removed, scrape targets change as well
Existing data (previously collected metrics) is not deleted
Other service discovery backends work in a similar fashion

Other service discovery mechanisms

Prometheus can connect to e.g. a cloud API to list instances
Or to the Kubernetes API to list nodes, pods, services ...
Or a service like Consul, Zookeeper, etcd, to list applications
The resulting configurations files are way more complex

(but don't worry, we won't need to write them ourselves)

Time series database

We could wonder, "why do we need a specialized database?"
One metrics data point = metrics ID + timestamp + value
With a classic SQL or noSQL data store, that's at least 160 bits of data + indexes
Prometheus is way more efficient, without sacrificing performance

(it will even be gentler on the I/O subsystem since it needs to write less)
Would you like to know more? Check this video:

Storage in Prometheus 2.0 by Goutham V at DC17EU

Checking if Prometheus is installed

Before trying to install Prometheus, let's check if it's already there

Look for services with a label app=prometheus across all namespaces:
```
kubectl get services --selector=app=prometheus --all-namespaces
```

]

If we see a NodePort service called prometheus-server, we're good!

(We can then skip to "Connecting to the Prometheus web UI".)

Running Prometheus on our cluster

We need to:

Run the Prometheus server in a pod

(using e.g. a Deployment to ensure that it keeps running)
Expose the Prometheus server web UI (e.g. with a NodePort)
Run the node exporter on each node (with a Daemon Set)
Setup a Service Account so that Prometheus can query the Kubernetes API
Configure the Prometheus server

(storing the configuration in a Config Map for easy updates)

Helm charts to the rescue

To make our lives easier, we are going to use a Helm chart
The Helm chart will take care of all the steps explained above

(including some extra features that we don't need, but won't hurt)

Step 1: install Helm

If we already installed Helm earlier, these commands won't break anything

Install Tiller (Helm's server-side component) on our cluster:
```
helm init
```

Give Tiller permission to deploy things on our cluster:

kubectl create clusterrolebinding add-on-cluster-admin \
    --clusterrole=cluster-admin --serviceaccount=kube-system:default

]

Step 2: install Prometheus

Skip this if we already installed Prometheus earlier

(in doubt, check with helm list)

Install Prometheus on our cluster:

  helm upgrade prometheus stable/prometheus \
      --install \
      --namespace kube-system \
      --set server.service.type=NodePort \
      --set server.service.nodePort=30090 \
      --set server.persistentVolume.enabled=false \
      --set alertmanager.enabled=false

]

Curious about all these flags? They're explained in the next slide.

Explaining all the Helm flags

helm upgrade prometheus → upgrade release "prometheus" to the latest version ...

(a "release" is a unique name given to an app deployed with Helm)
stable/prometheus → ... of the chart prometheus in repo stable
--install → if the app doesn't exist, create it
--namespace kube-system → put it in that specific namespace
And set the following values when rendering the chart's templates:
- server.service.type=NodePort → expose the Prometheus server with a NodePort
- server.service.nodePort=30090 → set the specific NodePort number to use
- server.persistentVolume.enabled=false → do not use a PersistentVolumeClaim
- alertmanager.enabled=false → disable the alert manager entirely

Connecting to the Prometheus web UI

Let's connect to the web UI and see what we can do

Figure out the NodePort that was allocated to the Prometheus server:
```
kubectl get svc --all-namespaces | grep prometheus-server
```
With your browser, connect to that port

]

Querying some metrics

This is easy ... if you are familiar with PromQL

Click on "Graph", and in "expression", paste the following:

  sum by (instance) (
    irate(
      container_cpu_usage_seconds_total{
        pod_name=~"worker.*"
        }[5m]
    )
  )

]

Click on the blue "Execute" button and on the "Graph" tab just below
We see the cumulated CPU usage of worker pods for each node
(if we just deployed Prometheus, there won't be much data to see, though)

Getting started with PromQL

We can't learn PromQL in just 5 minutes
But we can cover the basics to get an idea of what is possible

(and have some keywords and pointers)
We are going to break down the query above

(building it one step at a time)

Graphing one metric across all tags

This query will show us CPU usage across all containers:

container_cpu_usage_seconds_total

The suffix of the metrics name tells us:
- the unit (seconds of CPU)
- that it's the total used since the container creation
Since it's a "total", it is an increasing quantity

(we need to compute the derivative if we want e.g. CPU % over time)
We see that the metrics retrieved have tags attached to them

Selecting metrics with tags

This query will show us only metrics for worker containers:

container_cpu_usage_seconds_total{pod_name=~"worker.*"}

The =~ operator allows regex matching
We select all the pods with a name starting with worker

(it would be better to use labels to select pods; more on that later)
The result is a smaller set of containers

Transforming counters in rates

This query will show us CPU usage % instead of total seconds used:

100*irate(container_cpu_usage_seconds_total{pod_name=~"worker.*"}[5m])

The irate operator computes the "per-second instant rate of increase"
- rate is similar but allows decreasing counters and negative values
- with irate, if a counter goes back to zero, we don't get a negative spike
The [5m] tells how far to look back if there is a gap in the data
And we multiply with 100* to get CPU % usage

Aggregation operators

This query sums the CPU usage per node:

sum by (instance) (
  irate(container_cpu_usage_seconds_total{pod_name=~"worker.*"}[5m])
)

instance corresponds to the node on which the container is running
sum by (instance) (...) computes the sum for each instance
Note: all the other tags are collapsed

(in other words, the resulting graph only shows the instance tag)
PromQL supports many more aggregation operators

What kind of metrics can we collect?

Node metrics (related to physical or virtual machines)
Container metrics (resource usage per container)
Databases, message queues, load balancers, ...

(check out this list of exporters!)
Instrumentation (=deluxe printf for our code)
Business metrics (customers served, revenue, ...)

Node metrics

CPU, RAM, disk usage on the whole node
Total number of processes running, and their states
Number of open files, sockets, and their states
I/O activity (disk, network), per operation or volume
Physical/hardware (when applicable): temperature, fan speed ...
... and much more!

Container metrics

Similar to node metrics, but not totally identical
RAM breakdown will be different
- active vs inactive memory
- some memory is shared between containers, and accounted specially
I/O activity is also harder to track
- async writes can cause deferred "charges"
- some page-ins are also shared between containers

For details about container metrics, see:
http://jpetazzo.github.io/2013/10/08/docker-containers-metrics/

Application metrics

Arbitrary metrics related to your application and business
System performance: request latency, error rate ...
Volume information: number of rows in database, message queue size ...
Business data: inventory, items sold, revenue ...

Detecting scrape targets

Prometheus can leverage Kubernetes service discovery

(with proper configuration)
Services or pods can be annotated with:
- prometheus.io/scrape: true to enable scraping
- prometheus.io/port: 9090 to indicate the port number
- prometheus.io/path: /metrics to indicate the URI (/metrics by default)
Prometheus will detect and scrape these (without needing a restart or reload)

Querying labels

What if we want to get metrics for containers belong to pod tagged worker?
The cAdvisor exporter does not give us Kubernetes labels
Kubernetes labels are exposed through another exporter
We can see Kubernetes labels through metrics kube_pod_labels

(each container appears as a time series with constant value of 1)
Prometheus kind of supports "joins" between time series
But only if the names of the tags match exactly

Unfortunately ...

The cAdvisor exporter uses tag pod_name for the name of a pod
The Kubernetes service endpoints exporter uses tag pod instead
See this blog post or this other one to see how to perform "joins"
Alas, Prometheus cannot "join" time series with different labels

(see Prometheus issue #2204 for the rationale)
There is a workaround involving relabeling, but it's "not cheap"
- see this comment for an overview
- or this blog post for a complete description of the process

In practice

Grafana is a beautiful (and useful) frontend to display all kinds of graphs
Not everyone needs to know Prometheus, PromQL, Grafana, etc.
But in a team, it is valuable to have at least one person who know them
That person can set up queries and dashboards for the rest of the team
It's a little bit likeknowing how to optimize SQL queries, Dockerfiles ...

Don't panic if you don't know these tools!

... But make sure at least one person in your team is on it 💯

14 KiB Raw Blame History