mirror of
https://github.com/fluxcd/flagger.git
synced 2026-03-02 17:51:00 +00:00
196 lines
4.4 KiB
Markdown
196 lines
4.4 KiB
Markdown
# Canary analysis with Prometheus Operator
|
|
|
|
This guide show you how to use Prometheus Operator for canary analysis.
|
|
|
|
## Prerequisites
|
|
|
|
Install Prometheus Operator with Helm v3:
|
|
|
|
```bash
|
|
helm repo add stable https://kubernetes-charts.storage.googleapis.com
|
|
|
|
kubectl create ns monitoring
|
|
helm upgrade -i prometheus stable/prometheus-operator \
|
|
--namespace monitoring \
|
|
--set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
|
|
--set fullnameOverride=prometheus
|
|
```
|
|
|
|
The `prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false` option allows Prometheus operator to watch serviceMonitors outside of his namespace.
|
|
|
|
Install Flagger by setting the metrics server to Prometheus:
|
|
|
|
```bash
|
|
helm repo add flagger https://flagger.app
|
|
|
|
kubectl create ns flagger-system
|
|
helm upgrade -i flagger flagger/flagger \
|
|
--namespace flagger-system \
|
|
--set metricsServer=http://prometheus-prometheus.monitoring:9090 \
|
|
--set meshProvider=kubernetes
|
|
```
|
|
|
|
Install Flagger's tester:
|
|
|
|
```bash
|
|
helm upgrade -i loadtester flagger/loadtester \
|
|
--namespace flagger-system
|
|
```
|
|
|
|
Install podinfo demo app:
|
|
|
|
```bash
|
|
helm repo add podinfo https://stefanprodan.github.io/podinfo
|
|
|
|
kubectl create ns test
|
|
helm upgrade -i podinfo podinfo/podinfo \
|
|
--namespace test \
|
|
--set service.enabled=false
|
|
```
|
|
|
|
## Service monitors
|
|
|
|
The demo app is instrumented with Prometheus so you can create service monitors to scrape podinfo's metrics endpoint:
|
|
|
|
```yaml
|
|
apiVersion: monitoring.coreos.com/v1
|
|
kind: ServiceMonitor
|
|
metadata:
|
|
name: podinfo-primary
|
|
namespace: test
|
|
spec:
|
|
endpoints:
|
|
- path: /metrics
|
|
port: http
|
|
interval: 5s
|
|
selector:
|
|
matchLabels:
|
|
app: podinfo
|
|
```
|
|
|
|
```yaml
|
|
apiVersion: monitoring.coreos.com/v1
|
|
kind: ServiceMonitor
|
|
metadata:
|
|
name: podinfo-canary
|
|
namespace: test
|
|
spec:
|
|
endpoints:
|
|
- path: /metrics
|
|
port: http
|
|
interval: 5s
|
|
selector:
|
|
matchLabels:
|
|
app: podinfo-canary
|
|
```
|
|
|
|
We are setting `interval: 5s` to have a more aggressive scraping. If you do not define it, you must to use a longer interval in the Canary object.
|
|
|
|
## Metric templates
|
|
|
|
Create a metric template to measure the HTTP requests error rate:
|
|
|
|
```yaml
|
|
apiVersion: flagger.app/v1beta1
|
|
kind: MetricTemplate
|
|
metadata:
|
|
name: error-rate
|
|
namespace: test
|
|
spec:
|
|
provider:
|
|
address: http://prometheus-prometheus.monitoring:9090
|
|
type: prometheus
|
|
query: |
|
|
100 - rate(
|
|
http_requests_total{
|
|
namespace="{{ namespace }}",
|
|
job="{{ target }}-canary",
|
|
status!~"5.*"
|
|
}[{{ interval }}])
|
|
/
|
|
rate(
|
|
http_requests_total{
|
|
namespace="{{ namespace }}",
|
|
job="{{ target }}-canary"
|
|
}[{{ interval }}]
|
|
) * 100
|
|
```
|
|
|
|
Create a metric template to measure the HTTP requests average duration:
|
|
|
|
```yaml
|
|
apiVersion: flagger.app/v1beta1
|
|
kind: MetricTemplate
|
|
metadata:
|
|
name: latency
|
|
namespace: test
|
|
spec:
|
|
provider:
|
|
address: http://prometheus-prometheus.monitoring:9090
|
|
type: prometheus
|
|
query: |
|
|
histogram_quantile(0.99,
|
|
sum(
|
|
rate(
|
|
http_request_duration_seconds_bucket{
|
|
namespace="{{ namespace }}",
|
|
job="{{ target }}-canary"
|
|
}[{{ interval }}]
|
|
)
|
|
) by (le)
|
|
)
|
|
```
|
|
|
|
## Canary analysis
|
|
|
|
Using the metrics template you can configure the canary analysis with HTTP error rate and latency checks:
|
|
|
|
```yaml
|
|
apiVersion: flagger.app/v1beta1
|
|
kind: Canary
|
|
metadata:
|
|
name: podinfo
|
|
namespace: test
|
|
spec:
|
|
provider: kubernetes
|
|
targetRef:
|
|
apiVersion: apps/v1
|
|
kind: Deployment
|
|
name: podinfo
|
|
progressDeadlineSeconds: 60
|
|
service:
|
|
port: 80
|
|
targetPort: http
|
|
name: podinfo
|
|
analysis:
|
|
interval: 30s
|
|
iterations: 10
|
|
threshold: 2
|
|
metrics:
|
|
- name: error-rate
|
|
templateRef:
|
|
name: error-rate
|
|
thresholdRange:
|
|
max: 1
|
|
interval: 30s
|
|
- name: latency
|
|
templateRef:
|
|
name: latency
|
|
thresholdRange:
|
|
max: 0.5
|
|
interval: 30s
|
|
webhooks:
|
|
- name: load-test
|
|
type: rollout
|
|
url: "http://loadtester.flagger-system/"
|
|
timeout: 5s
|
|
metadata:
|
|
type: cmd
|
|
cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test/"
|
|
```
|
|
|
|
Based on the above specification, Flagger creates the primary and canary Kubernetes ClusterIP service.
|
|
|
|
During the canary analysis, Prometheus will scrape the canary service and Flagger will use the HTTP error rate and latency queries to determine if the release should be promoted or rolled back.
|
|
|