# Canary analysis with Prometheus Operator

This guide shows you how to use
[Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator) for canary analysis.

## Prerequisites

Flagger requires a Kubernetes cluster **v1.16** or newer and Prometheus Operator **v0.40** or newer.

Install Prometheus Operator with Helm v3:

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

kubectl create ns monitoring
helm upgrade -i prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
  --set fullnameOverride=prometheus
```

The `prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false`
option lets Prometheus select `ServiceMonitor` objects that are not labelled by the Helm release,
including those created outside of its own namespace.

Install Flagger by setting the metrics server to Prometheus:

```bash
helm repo add flagger https://flagger.app

kubectl create ns flagger-system
helm upgrade -i flagger flagger/flagger \
  --namespace flagger-system \
  --set metricsServer=http://prometheus-prometheus.monitoring:9090 \
  --set meshProvider=kubernetes
```
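
The `metricsServer` address follows the usual in-cluster DNS pattern `http://<service>.<namespace>:<port>`. As a minimal sketch, assuming the chart names the Prometheus service `prometheus-prometheus` (which is what the address above implies with `fullnameOverride=prometheus`) and that it listens on port 9090, the address can be assembled like this. `prometheus_url` is a hypothetical helper, not part of Flagger or Helm:

```bash
# Hypothetical helper: build the in-cluster URL of a Service
# from its name ($1), namespace ($2) and port ($3).
prometheus_url() {
  printf 'http://%s.%s:%s\n' "$1" "$2" "$3"
}

# Service name, namespace and port as used in this guide.
prometheus_url prometheus-prometheus monitoring 9090
```

If you change `fullnameOverride` or the namespace of the Prometheus installation, the value passed to `--set metricsServer` most likely has to change accordingly.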

Install Flagger's load tester:

```bash
helm upgrade -i loadtester flagger/loadtester \
  --namespace flagger-system
```

Install the [podinfo](https://github.com/stefanprodan/podinfo) demo app:

```bash
helm repo add podinfo https://stefanprodan.github.io/podinfo

kubectl create ns test
helm upgrade -i podinfo podinfo/podinfo \
  --namespace test \
  --set service.enabled=false
```

## Service monitors

The demo app is instrumented with Prometheus,
so you can create `ServiceMonitor` objects to scrape podinfo's metrics endpoint:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: podinfo-canary
  namespace: test
spec:
  endpoints:
  - path: /metrics
    port: http
    interval: 5s
  selector:
    matchLabels:
      app: podinfo-canary
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: podinfo-primary
  namespace: test
spec:
  endpoints:
  - path: /metrics
    port: http
    interval: 5s
  selector:
    matchLabels:
      app: podinfo
```

We set `interval: 5s` to scrape the endpoints more aggressively.
If you omit it, Prometheus falls back to its default scrape interval, so you should use a longer
metric interval in the Canary object to make sure each query window contains enough samples.

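
As a rough rule of thumb, a PromQL `rate()` needs at least two samples inside its range window to return a value. The sketch below estimates how many samples a window contains. `samples_per_window` is a hypothetical helper, and both arguments are in seconds:

```bash
# Hypothetical helper: estimate how many scrapes fall inside a query window.
# $1 = metric interval (query window) in seconds
# $2 = scrape interval in seconds
samples_per_window() {
  echo $(( $1 / $2 ))
}

samples_per_window 30 5    # 30s window, 5s scrape interval
samples_per_window 30 30   # 30s window, 30s scrape interval
```

With a 5s scrape interval, a 30s window holds about six samples. With a 30s scrape interval it would hold about one, which is not enough for `rate()`, so the window would need to be widened.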

## Metric templates

Create a metric template to measure the HTTP request error rate:

```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: test
spec:
  provider:
    address: http://prometheus-prometheus.monitoring:9090
    type: prometheus
  # sum() aggregates away the status label, so the ratio compares
  # all non-5xx requests against all requests.
  query: |
    100 - sum(
      rate(
        http_requests_total{
          namespace="{{ namespace }}",
          job="{{ target }}-canary",
          status!~"5.*"
        }[{{ interval }}]
      )
    )
    /
    sum(
      rate(
        http_requests_total{
          namespace="{{ namespace }}",
          job="{{ target }}-canary"
        }[{{ interval }}]
      )
    ) * 100
```

Create a metric template to measure the 99th percentile of the HTTP request duration:

```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: latency
  namespace: test
spec:
  provider:
    address: http://prometheus-prometheus.monitoring:9090
    type: prometheus
  query: |
    histogram_quantile(0.99,
      sum(
        rate(
          http_request_duration_seconds_bucket{
            namespace="{{ namespace }}",
            job="{{ target }}-canary"
          }[{{ interval }}]
        )
      ) by (le)
    )
```
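
Flagger fills in the `{{ namespace }}`, `{{ target }}` and `{{ interval }}` placeholders before sending a query to Prometheus. The following sketch imitates that substitution with `sed` for the values used in this guide, to show what a rendered selector looks like. `render_query` is a hypothetical helper, not Flagger's actual implementation:

```bash
# Hypothetical helper: substitute the template variables in a query read from stdin.
# $1 = namespace, $2 = target name, $3 = metric interval
render_query() {
  sed -e "s/{{ namespace }}/$1/g" \
      -e "s/{{ target }}/$2/g" \
      -e "s/{{ interval }}/$3/g"
}

# Render a shortened selector for the podinfo canary in the test namespace.
echo 'http_requests_total{namespace="{{ namespace }}",job="{{ target }}-canary"}[{{ interval }}]' \
  | render_query test podinfo 30s
```

The rendered `job="podinfo-canary"` matcher should line up with the `job` label that Prometheus Operator derives by default from the name of the scraped service.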

## Canary analysis

Using the metric templates, you can configure the canary analysis with HTTP error rate and latency checks:

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  provider: kubernetes
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  progressDeadlineSeconds: 60
  service:
    port: 80
    targetPort: http
    name: podinfo
  analysis:
    interval: 30s
    iterations: 10
    threshold: 2
    metrics:
    - name: error-rate
      templateRef:
        name: error-rate
      thresholdRange:
        max: 1
      interval: 30s
    - name: latency
      templateRef:
        name: latency
      thresholdRange:
        max: 0.5
      interval: 30s
    webhooks:
      - name: load-test
        type: rollout
        url: "http://loadtester.flagger-system/"
        timeout: 5s
        metadata:
          type: cmd
          cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test/"
```

Based on the above specification, Flagger creates the primary and canary Kubernetes ClusterIP services.

During the canary analysis, Prometheus will scrape the canary service and Flagger will use the HTTP error rate
and latency queries to determine if the release should be promoted or rolled back.

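
With `interval: 30s`, `iterations: 10` and `threshold: 2`, the timing of the analysis can be estimated with some quick arithmetic. This is a back-of-the-envelope sketch: it assumes a successful analysis takes roughly interval × iterations and a rollback is triggered after roughly interval × threshold, and it ignores the time spent on initialization and promotion:

```bash
interval=30     # analysis interval in seconds
iterations=10   # number of checks before promotion
threshold=2     # failed checks tolerated before rollback

analysis_seconds=$(( interval * iterations ))
rollback_seconds=$(( interval * threshold ))

echo "minimum analysis time: ${analysis_seconds}s"
echo "rollback after about: ${rollback_seconds}s of failed checks"
```

Under these assumptions a healthy release is analysed for about five minutes, while a failing one is rolled back after about one minute.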