mirror of
https://github.com/fluxcd/flagger.git
synced 2026-03-02 17:51:00 +00:00
Format docs markdown
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
@@ -1,349 +0,0 @@
|
||||
# Crossover Canary Deployments
|
||||
|
||||
This guide shows you how to use Envoy, [Crossover](https://github.com/mumoshu/crossover) and Flagger to automate canary deployments.
|
||||
|
||||
Crossover is a minimal Envoy xDS implementation that supports the [Service Mesh Interface](https://smi-spec.io/).
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Flagger requires a Kubernetes cluster **v1.16** or newer and Envoy paired with the [Crossover](https://github.com/mumoshu/crossover) sidecar.
|
||||
|
||||
Create a test namespace:
|
||||
|
||||
```bash
|
||||
kubectl create ns test
|
||||
```
|
||||
|
||||
Install Envoy along with the Crossover sidecar with Helm:
|
||||
|
||||
```bash
helm repo add crossover https://mumoshu.github.io/crossover

helm upgrade --install envoy crossover/envoy \
  --namespace test \
  -f <(cat <<EOF
smi:
  apiVersions:
    trafficSplits: v1alpha1
upstreams:
  podinfo:
    smi:
      enabled: true
    backends:
      podinfo-primary:
        port: 9898
        weight: 100
      podinfo-canary:
        port: 9898
        weight: 0
EOF
)
```
|
||||
|
||||
Install Flagger and the Prometheus add-on in the same namespace as Envoy:
|
||||
|
||||
```bash
helm repo add flagger https://flagger.app

helm upgrade -i flagger flagger/flagger \
  --namespace test \
  --set prometheus.install=true \
  --set meshProvider=smi:crossover
```
|
||||
|
||||
## Bootstrap
|
||||
|
||||
Flagger takes a Kubernetes deployment and optionally a horizontal pod autoscaler \(HPA\), then creates a series of objects \(Kubernetes deployments, ClusterIP services, SMI traffic splits\). These objects expose the application on the mesh and drive the canary analysis and promotion. You do not need to create any SMI objects yourself.
|
||||
|
||||
Create a deployment and a horizontal pod autoscaler:
|
||||
|
||||
```bash
|
||||
kubectl apply -k https://github.com/fluxcd/flagger//kustomize/podinfo?ref=main
|
||||
```
|
||||
|
||||
Deploy the load testing service to generate traffic during the canary analysis:
|
||||
|
||||
```bash
helm upgrade -i flagger-loadtester flagger/loadtester \
  --namespace=test
```
|
||||
|
||||
Create a metric template to measure the HTTP requests error rate:
|
||||
|
||||
```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: test
spec:
  provider:
    address: http://flagger-prometheus:9090
    type: prometheus
  # Percentage of requests that returned a 5xx response.
  # sum() collapses the per-response-code series before dividing.
  query: |
    100 - sum(
      rate(
        envoy_cluster_upstream_rq{
          kubernetes_namespace="{{ namespace }}",
          envoy_cluster_name="{{ target }}-canary",
          envoy_response_code!~"5.*"
        }[{{ interval }}]
      )
    )
    /
    sum(
      rate(
        envoy_cluster_upstream_rq{
          kubernetes_namespace="{{ namespace }}",
          envoy_cluster_name="{{ target }}-canary"
        }[{{ interval }}]
      )
    ) * 100
```
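
To make the arithmetic in the query concrete, here is a minimal sketch that applies the same formula to made-up request rates. The function name `error_rate` and the sample numbers are assumptions for illustration; they are not part of Flagger or Envoy.

```bash
#!/usr/bin/env bash
# Hypothetical helper: error rate (%) = 100 - non_5xx / total * 100,
# mirroring the shape of the PromQL expression above.
error_rate() {
  local non_5xx="$1" total="$2"
  awk -v ok="$non_5xx" -v all="$total" 'BEGIN {
    if (all == 0) { print "NaN"; exit }   # no traffic, nothing to measure
    printf "%.2f\n", 100 - ok / all * 100
  }'
}

error_rate 99 100   # 99 of 100 requests were not 5xx
error_rate 50 50    # no 5xx responses at all
```

The zero-traffic branch is there because a division by zero in the real query yields no usable sample, which is why the load test webhook matters during analysis.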
|
||||
|
||||
Create a metric template to measure the HTTP requests average duration:
|
||||
|
||||
```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: latency
  namespace: test
spec:
  provider:
    address: http://flagger-prometheus:9090
    type: prometheus
  query: |
    histogram_quantile(0.99,
      sum(
        rate(
          envoy_cluster_upstream_rq_time_bucket{
            kubernetes_namespace="{{ namespace }}",
            envoy_cluster_name="{{ target }}-canary"
          }[{{ interval }}]
        )
      ) by (le)
    )
```
|
||||
|
||||
Create a canary custom resource:
|
||||
|
||||
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  provider: "smi:crossover"
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    port: 9898
  # define the canary analysis timing and KPIs
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 5
    metrics:
    - name: error-rate
      templateRef:
        name: error-rate
      thresholdRange:
        max: 1
      interval: 30s
    - name: latency
      templateRef:
        name: latency
      thresholdRange:
        max: 0.5
      interval: 30s
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 30s
        metadata:
          type: bash
          cmd: "curl -sd 'test' http://podinfo-canary.test:9898/token | grep token"
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 -H 'Host: podinfo.test' http://envoy.test:10000/"
```
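
As a rough guide to how long this analysis runs, Flagger's documentation gives the minimum promotion time as `interval * (maxWeight / stepWeight)` and the rollback time as `interval * threshold`. The sketch below plugs in the values from the Canary above; the variable names are illustrative only.

```bash
#!/usr/bin/env bash
# Values copied from the Canary analysis section above.
interval_min=1   # interval: 1m
max_weight=50    # maxWeight
step_weight=5    # stepWeight
threshold=5      # threshold

# Minimum time to validate and promote: interval * (maxWeight / stepWeight)
promote_min=$(( interval_min * max_weight / step_weight ))
# Time until rollback when every check fails: interval * threshold
rollback_min=$(( interval_min * threshold ))

echo "minimum promotion time: ${promote_min}m"
echo "rollback after failures: ${rollback_min}m"
```

Raising `stepWeight` shortens the rollout at the cost of exposing more traffic to the canary per step.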
|
||||
|
||||
Save the above resource as podinfo-canary.yaml and then apply it:
|
||||
|
||||
```bash
|
||||
kubectl apply -f ./podinfo-canary.yaml
|
||||
```
|
||||
|
||||
After a couple of seconds Flagger will create the canary objects:
|
||||
|
||||
```bash
|
||||
# applied
|
||||
deployment.apps/podinfo
|
||||
horizontalpodautoscaler.autoscaling/podinfo
|
||||
canary.flagger.app/podinfo
|
||||
|
||||
# generated
|
||||
deployment.apps/podinfo-primary
|
||||
horizontalpodautoscaler.autoscaling/podinfo-primary
|
||||
service/podinfo
|
||||
service/podinfo-canary
|
||||
service/podinfo-primary
|
||||
trafficsplits.split.smi-spec.io/podinfo
|
||||
```
|
||||
|
||||
After the bootstrap, the podinfo deployment will be scaled to zero and the traffic to `podinfo.test` will be routed to the primary pods. During the canary analysis, the `podinfo-canary.test` address can be used to target the canary pods directly.
|
||||
|
||||
## Automated canary promotion
|
||||
|
||||
Flagger implements a control loop that gradually shifts traffic to the canary while measuring key performance indicators like HTTP requests success rate, requests average duration and pod health. Based on analysis of the KPIs a canary is promoted or aborted, and the analysis result is published to Slack.
|
||||
|
||||

|
||||
|
||||
A canary deployment is triggered by changes in any of the following objects:
|
||||
|
||||
* Deployment PodSpec \(container image, command, ports, env, resources, etc\)
|
||||
* ConfigMaps and Secrets mounted as volumes or mapped to environment variables
|
||||
|
||||
Trigger a canary deployment by updating the container image:
|
||||
|
||||
```bash
kubectl -n test set image deployment/podinfo \
  podinfod=stefanprodan/podinfo:3.1.5
```
|
||||
|
||||
Flagger detects that the deployment revision changed and starts a new rollout:
|
||||
|
||||
```text
|
||||
kubectl -n test describe canary/podinfo
|
||||
|
||||
Status:
|
||||
Canary Weight: 0
|
||||
Failed Checks: 0
|
||||
Phase: Succeeded
|
||||
Events:
|
||||
New revision detected! Scaling up podinfo.test
|
||||
Waiting for podinfo.test rollout to finish: 0 of 1 updated replicas are available
|
||||
Pre-rollout check acceptance-test passed
|
||||
Advance podinfo.test canary weight 5
|
||||
Advance podinfo.test canary weight 10
|
||||
Advance podinfo.test canary weight 15
|
||||
Advance podinfo.test canary weight 20
|
||||
Advance podinfo.test canary weight 25
|
||||
Advance podinfo.test canary weight 30
|
||||
Advance podinfo.test canary weight 35
|
||||
Advance podinfo.test canary weight 40
|
||||
Advance podinfo.test canary weight 45
|
||||
Advance podinfo.test canary weight 50
|
||||
Copying podinfo.test template spec to podinfo-primary.test
|
||||
Waiting for podinfo-primary.test rollout to finish: 1 of 2 updated replicas are available
|
||||
Routing all traffic to primary
|
||||
Promotion completed! Scaling down podinfo.test
|
||||
```
|
||||
|
||||
When the canary analysis starts, Flagger will call the pre-rollout webhooks before routing traffic to the canary.
|
||||
|
||||
**Note** that if you apply new changes to the deployment during the canary analysis, Flagger will restart the analysis.
|
||||
|
||||
During the analysis the canary’s progress can be monitored with Grafana.
|
||||
|
||||
Flagger comes with a Grafana dashboard made for canary analysis. Install Grafana with Helm:
|
||||
|
||||
```bash
helm upgrade -i flagger-grafana flagger/grafana \
  --namespace=test \
  --set url=http://flagger-prometheus:9090
```
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
kubectl port-forward --namespace test svc/flagger-grafana 3000:80
|
||||
```
|
||||
|
||||
The Envoy dashboard URL is [http://localhost:3000/d/flagger-envoy/envoy-canary?refresh=10s&orgId=1&var-namespace=test&var-target=podinfo](http://localhost:3000/d/flagger-envoy/envoy-canary?refresh=10s&orgId=1&var-namespace=test&var-target=podinfo)
|
||||
|
||||

|
||||
|
||||
You can monitor all canaries with:
|
||||
|
||||
```bash
|
||||
watch kubectl get canaries --all-namespaces
|
||||
|
||||
NAMESPACE NAME STATUS WEIGHT LASTTRANSITIONTIME
|
||||
test podinfo Progressing 15 2019-10-02T14:05:07Z
|
||||
prod frontend Succeeded 0 2019-10-02T16:15:07Z
|
||||
prod backend Failed 0 2019-10-02T17:05:07Z
|
||||
```
|
||||
|
||||
If you’ve enabled the Slack notifications, you should receive the following messages:
|
||||
|
||||

|
||||
|
||||
## Automated rollback
|
||||
|
||||
During the canary analysis you can generate HTTP 500 errors or high latency to test if Flagger pauses the rollout.
|
||||
|
||||
Trigger a canary deployment:
|
||||
|
||||
```bash
kubectl -n test set image deployment/podinfo \
  podinfod=stefanprodan/podinfo:3.1.2
```
|
||||
|
||||
Exec into the load tester pod with:
|
||||
|
||||
```bash
|
||||
kubectl -n test exec -it deploy/flagger-loadtester -- bash
|
||||
```
|
||||
|
||||
Generate HTTP 500 errors:
|
||||
|
||||
```bash
|
||||
hey -z 1m -c 5 -q 5 -H 'Host: podinfo.test' http://envoy.test:10000/status/500
|
||||
```
|
||||
|
||||
Generate latency:
|
||||
|
||||
```bash
|
||||
watch -n 1 curl -H 'Host: podinfo.test' http://envoy.test:10000/delay/1
|
||||
```
|
||||
|
||||
When the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary, the canary is scaled to zero and the rollout is marked as failed.
|
||||
|
||||
```text
|
||||
kubectl -n test logs deploy/flagger -f | jq .msg
|
||||
|
||||
New revision detected! progressing canary analysis for podinfo.test
|
||||
Pre-rollout check acceptance-test passed
|
||||
Advance podinfo.test canary weight 5
|
||||
Advance podinfo.test canary weight 10
|
||||
Advance podinfo.test canary weight 15
|
||||
Halt podinfo.test advancement success rate 69.17% < 99%
|
||||
Halt podinfo.test advancement success rate 61.39% < 99%
|
||||
Halt podinfo.test advancement success rate 55.06% < 99%
|
||||
Halt podinfo.test advancement request duration 1.20s > 0.5s
|
||||
Halt podinfo.test advancement request duration 1.45s > 0.5s
|
||||
Rolling back podinfo.test failed checks threshold reached 5
|
||||
Canary failed! Scaling down podinfo.test
|
||||
```
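
The log above follows a simple pattern: each passing check advances the canary weight by `stepWeight`, each failing check increments a counter, and the rollout is aborted once that counter reaches `threshold`. The sketch below replays that pattern over a hard-coded list of check results. It is a simplified model for illustration, not Flagger's actual control loop, and the function name `simulate_rollout` is hypothetical.

```bash
#!/usr/bin/env bash
# Simplified model: args are step weight, max weight, threshold, then
# a sequence of check results ("ok" or "fail").
simulate_rollout() {
  local step="$1" max="$2" threshold="$3"; shift 3
  local weight=0 failed=0 result
  for result in "$@"; do
    if [ "$result" = "ok" ]; then
      weight=$(( weight + step ))
      echo "advance weight ${weight}"
      if [ "$weight" -ge "$max" ]; then
        echo "promote"
        return 0
      fi
    else
      failed=$(( failed + 1 ))
      echo "halt (failed checks ${failed})"
      if [ "$failed" -ge "$threshold" ]; then
        echo "rollback"
        return 1
      fi
    fi
  done
  echo "in progress"
}

# Three passing checks followed by five failures, as in the log above.
# The function returns non-zero on rollback, hence the "|| true".
simulate_rollout 5 50 5 ok ok ok fail fail fail fail fail || true
```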
|
||||
|
||||
If you’ve enabled the Slack notifications, you’ll receive a message if the progress deadline is exceeded, or if the analysis reached the maximum number of failed checks:
|
||||
|
||||

|
||||
|
||||
@@ -1,16 +1,18 @@
|
||||
# Gloo Canary Deployments
|
||||
|
||||
This guide shows you how to use the [Gloo Edge](https://gloo.solo.io/) ingress controller and Flagger to automate canary deployments.
|
||||
This guide shows you how to use the [Gloo Edge](https://gloo.solo.io/) ingress controller
|
||||
and Flagger to automate canary releases and A/B testing.
|
||||
|
||||

|
||||
|
||||
## Prerequisites
|
||||
|
||||
This guide was written for Flagger version **1.5.0** or higher. Prior versions of Flagger used Gloo upstream groups to handle
|
||||
canaries, but newer versions of Flagger use Gloo route tables to handle canaries as well as A/B testing.
|
||||
|
||||
Flagger requires a Kubernetes cluster **v1.16** or newer and Gloo Edge ingress **1.6.0** or newer.
|
||||
|
||||
This guide was written for Flagger version **1.6.0** or higher. Prior versions of Flagger
|
||||
used Gloo upstream groups to handle canaries, but newer versions of Flagger use Gloo
|
||||
route tables to handle canaries as well as A/B testing.
|
||||
|
||||
Install Gloo with Helm v3:
|
||||
|
||||
```bash
|
||||
@@ -33,7 +35,9 @@ helm upgrade -i flagger flagger/flagger \
|
||||
|
||||
## Bootstrap
|
||||
|
||||
Flagger takes a Kubernetes deployment and optionally a horizontal pod autoscaler \(HPA\), then creates a series of objects \(Kubernetes deployments, ClusterIP services and Gloo route tables groups\). These objects expose the application outside the cluster and drive the canary analysis and promotion.
|
||||
Flagger takes a Kubernetes deployment and optionally a horizontal pod autoscaler (HPA),
|
||||
then creates a series of objects (Kubernetes deployments, ClusterIP services and Gloo route tables groups).
|
||||
These objects expose the application outside the cluster and drive the canary analysis and promotion.
|
||||
|
||||
Create a test namespace:
|
||||
|
||||
@@ -53,7 +57,8 @@ Deploy the load testing service to generate traffic during the canary analysis:
|
||||
kubectl -n test apply -k https://github.com/fluxcd/flagger//kustomize/tester?ref=main
|
||||
```
|
||||
|
||||
Create a virtual service definition that references a route table that will be generated by Flagger \(replace `app.example.com` with your own domain\):
|
||||
Create a virtual service definition that references a route table that will be generated by Flagger
|
||||
(replace `app.example.com` with your own domain):
|
||||
|
||||
```yaml
|
||||
apiVersion: gateway.solo.io/v1
|
||||
@@ -80,7 +85,7 @@ Save the above resource as podinfo-virtualservice.yaml and then apply it:
|
||||
kubectl apply -f ./podinfo-virtualservice.yaml
|
||||
```
|
||||
|
||||
Create a canary custom resource \(replace `app.example.com` with your own domain\):
|
||||
Create a canary custom resource (replace `app.example.com` with your own domain):
|
||||
|
||||
```yaml
|
||||
apiVersion: flagger.app/v1beta1
|
||||
@@ -182,7 +187,9 @@ podinfo Initialized 0 2019-05-17T08:09:51Z
|
||||
|
||||
## Automated canary promotion
|
||||
|
||||
Flagger implements a control loop that gradually shifts traffic to the canary while measuring key performance indicators like HTTP requests success rate, requests average duration and pod health. Based on analysis of the KPIs a canary is promoted or aborted, and the analysis result is published to Slack.
|
||||
Flagger implements a control loop that gradually shifts traffic to the canary while measuring
|
||||
key performance indicators like HTTP requests success rate, requests average duration and pod health.
|
||||
Based on analysis of the KPIs a canary is promoted or aborted, and the analysis result is published to Slack.
|
||||
|
||||

|
||||
|
||||
@@ -238,7 +245,8 @@ prod backend Failed 0 2019-05-17T17:05:07Z
|
||||
|
||||
## Automated rollback
|
||||
|
||||
During the canary analysis you can generate HTTP 500 errors and high latency to test if Flagger pauses and rolls back the faulted version.
|
||||
During the canary analysis you can generate HTTP 500 errors and high latency to test if
|
||||
Flagger pauses and rolls back the faulted version.
|
||||
|
||||
Trigger another canary deployment:
|
||||
|
||||
@@ -259,7 +267,8 @@ Generate high latency:
|
||||
watch curl -H 'Host: app.example.com' http://gateway-proxy.gloo-system/delay/2
|
||||
```
|
||||
|
||||
When the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary, the canary is scaled to zero and the rollout is marked as failed.
|
||||
When the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary,
|
||||
the canary is scaled to zero and the rollout is marked as failed.
|
||||
|
||||
```text
|
||||
kubectl -n test describe canary/podinfo
|
||||
@@ -288,7 +297,8 @@ Events:
|
||||
|
||||
The canary analysis can be extended with Prometheus queries.
|
||||
|
||||
The demo app is instrumented with Prometheus so you can create a custom check that will use the HTTP request duration histogram to validate the canary.
|
||||
The demo app is instrumented with Prometheus so you can create a custom check that will use the HTTP request
|
||||
duration histogram to validate the canary.
|
||||
|
||||
Create a metric template and apply it on the cluster:
|
||||
|
||||
@@ -336,7 +346,8 @@ Edit the canary analysis and add the following metric:
|
||||
interval: 1m
|
||||
```
|
||||
|
||||
The above configuration validates the canary by checking if the HTTP 404 req/sec percentage is below 5 percent of the total traffic. If the 404s rate reaches the 5% threshold, then the canary fails.
|
||||
The above configuration validates the canary by checking if the HTTP 404 req/sec percentage
|
||||
is below 5 percent of the total traffic. If the 404s rate reaches the 5% threshold, then the canary fails.
|
||||
|
||||
Trigger a canary deployment by updating the container image:
|
||||
|
||||
@@ -369,7 +380,8 @@ Rolling back podinfo.test failed checks threshold reached 5
|
||||
Canary failed! Scaling down podinfo.test
|
||||
```
|
||||
|
||||
If you have [alerting](../usage/alerting.md) configured, Flagger will send a notification with the reason why the canary failed.
|
||||
If you have [alerting](../usage/alerting.md) configured,
|
||||
Flagger will send a notification with the reason why the canary failed.
|
||||
|
||||
For an in-depth look at the analysis process read the [usage docs](../usage/how-it-works.md).
|
||||
|
||||
|
||||
@@ -1,10 +1,11 @@
|
||||
# Canary analysis with Prometheus Operator
|
||||
|
||||
This guide shows you how to use Prometheus Operator for canary analysis.
|
||||
This guide shows you how to use
|
||||
[Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator) for canary analysis.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Flagger requires a Kubernetes cluster **v1.16** or newer.
|
||||
Flagger requires a Kubernetes cluster **v1.16** or newer and Prometheus Operator **v0.40** or newer.
|
||||
|
||||
Install Prometheus Operator with Helm v3:
|
||||
|
||||
@@ -19,7 +20,7 @@ helm upgrade -i prometheus prometheus-community/kube-prometheus-stack \
|
||||
```
|
||||
|
||||
The `prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false`
|
||||
option allows Prometheus operator to watch serviceMonitors outside of his namespace.
|
||||
option allows Prometheus Operator to watch serviceMonitors outside of its namespace.
|
||||
|
||||
Install Flagger by setting the metrics server to Prometheus:
|
||||
|
||||
@@ -40,7 +41,7 @@ helm upgrade -i loadtester flagger/loadtester \
|
||||
--namespace flagger-system
|
||||
```
|
||||
|
||||
Install podinfo demo app:
|
||||
Install [podinfo](https://github.com/stefanprodan/podinfo) demo app:
|
||||
|
||||
```bash
|
||||
helm repo add podinfo https://stefanprodan.github.io/podinfo
|
||||
@@ -53,23 +54,8 @@ helm upgrade -i podinfo podinfo/podinfo \
|
||||
|
||||
## Service monitors
|
||||
|
||||
The demo app is instrumented with Prometheus so you can create service monitors to scrape podinfo's metrics endpoint:
|
||||
|
||||
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: podinfo-primary
  namespace: test
spec:
  endpoints:
  - path: /metrics
    port: http
    interval: 5s
  selector:
    matchLabels:
      app: podinfo
```
|
||||
The demo app is instrumented with Prometheus,
|
||||
so you can create `ServiceMonitor` objects to scrape podinfo's metrics endpoint:
|
||||
|
||||
```yaml
|
||||
apiVersion: monitoring.coreos.com/v1
|
||||
@@ -85,10 +71,24 @@ spec:
  selector:
    matchLabels:
      app: podinfo-canary
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: podinfo-primary
  namespace: test
spec:
  endpoints:
  - path: /metrics
    port: http
    interval: 5s
  selector:
    matchLabels:
      app: podinfo
```
|
||||
|
||||
We are setting `interval: 5s` so that Prometheus scrapes the endpoints more aggressively.
|
||||
If you do not define it, you must to use a longer interval in the Canary object.
|
||||
If you do not define it, you should use a longer interval in the Canary object.
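
The reason is that a Prometheus `rate()` over a window needs at least two samples inside that window, so the metric interval used by the Canary has to span more than one scrape. The check below is a minimal sketch of that rule of thumb; the function name and the factor of two are assumptions for illustration, and a larger margin is often safer in practice.

```bash
#!/usr/bin/env bash
# Hypothetical check: does a metric window (seconds) cover at least
# two scrapes at the given scrape interval (seconds)?
window_covers_scrapes() {
  local window_s="$1" scrape_s="$2"
  [ "$window_s" -ge $(( 2 * scrape_s )) ]
}

window_covers_scrapes 30 5  && echo "30s window, 5s scrape: ok"
window_covers_scrapes 30 30 || echo "30s window, 30s scrape: too short"
```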
|
||||
|
||||
## Metric templates
|
||||
|
||||
@@ -197,4 +197,3 @@ Based on the above specification, Flagger creates the primary and canary Kuberne
|
||||
|
||||
During the canary analysis, Prometheus will scrape the canary service and Flagger will use the HTTP error rate
|
||||
and latency queries to determine if the release should be promoted or rolled back.
|
||||
|
||||
|
||||
@@ -1,2 +0,0 @@
|
||||
# Rollout Weights
|
||||
|
||||