# Istio Canary Deployments

This guide shows you how to use Istio and Flagger to automate canary deployments.

![Flagger Canary Stages](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/diagrams/flagger-canary-steps.png)

## Prerequisites
Flagger requires a Kubernetes cluster **v1.11** or newer and Istio **v1.5** or newer.

Install Istio with telemetry support and Prometheus:

```bash
istioctl manifest apply --set profile=default
```
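
To confirm the installation before continuing, you can check that the Istio control plane and the Prometheus add-on are running (a quick sanity check; pod names vary by Istio version):

```bash
# list the Istio system pods; istiod and prometheus should be Running
kubectl -n istio-system get pods
```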
Install Flagger using Kustomize (kubectl >= 1.14) in the `istio-system` namespace:

```bash
kubectl apply -k github.com/weaveworks/flagger//kustomize/istio
```
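
You can wait for Flagger to become ready before moving on (this assumes the Kustomize install creates a deployment named `flagger`, which you can confirm with `kubectl -n istio-system get deployments`):

```bash
# block until the Flagger deployment finishes rolling out
kubectl -n istio-system rollout status deployment/flagger
```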
Create an ingress gateway to expose the demo app outside of the mesh:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
```
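
Save the gateway definition (for example as `public-gateway.yaml`, a file name chosen here for illustration) and apply it:

```bash
kubectl apply -f ./public-gateway.yaml
```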
## Bootstrap

Flagger takes a Kubernetes deployment and optionally a horizontal pod autoscaler (HPA), then creates a series of objects (Kubernetes deployments, ClusterIP services, Istio destination rules and virtual services). These objects expose the application inside the mesh and drive the canary analysis and promotion.

Create a test namespace with Istio sidecar injection enabled:

```bash
kubectl create ns test
kubectl label namespace test istio-injection=enabled
```
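
You can confirm that sidecar injection is enabled for the namespace:

```bash
# the output should include the istio-injection=enabled label
kubectl get namespace test --show-labels
```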
Create a deployment and a horizontal pod autoscaler:

```bash
kubectl apply -k github.com/weaveworks/flagger//kustomize/podinfo
```
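
Before creating the canary, you can wait for the podinfo rollout to complete:

```bash
kubectl -n test rollout status deployment/podinfo
```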
Deploy the load testing service to generate traffic during the canary analysis:

```bash
kubectl apply -k github.com/weaveworks/flagger//kustomize/tester
```
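
Assuming the tester manifests create a deployment named `flagger-loadtester` (matching the service URL used in the webhooks below), you can verify it the same way:

```bash
# deployment name is an assumption based on the flagger-loadtester service URL
kubectl -n test rollout status deployment/flagger-loadtester
```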
Create a canary custom resource (replace example.com with your own domain):

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # service port number
    port: 9898
    # container port number or name (optional)
    targetPort: 9898
    # Istio gateways (optional)
    gateways:
    - public-gateway.istio-system.svc.cluster.local
    # Istio virtual service host names (optional)
    hosts:
    - app.example.com
    # Istio traffic policy (optional)
    trafficPolicy:
      tls:
        # use ISTIO_MUTUAL when mTLS is enabled
        mode: DISABLE
    # Istio retry policy (optional)
    retries:
      attempts: 3
      perTryTimeout: 1s
      retryOn: "gateway-error,connect-failure,refused-stream"
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      # maximum req duration P99
      # milliseconds
      thresholdRange:
        max: 500
      interval: 30s
    # testing (optional)
    webhooks:
    - name: acceptance-test
      type: pre-rollout
      url: http://flagger-loadtester.test/
      timeout: 30s
      metadata:
        type: bash
        cmd: "curl -sd 'test' http://podinfo-canary:9898/token | grep token"
    - name: load-test
      url: http://flagger-loadtester.test/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"
```

**Note** that when using Istio 1.4 you have to replace the `request-duration` with a [metric template](https://docs.flagger.app/dev/upgrade-guide#istio-telemetry-v2).

Save the above resource as `podinfo-canary.yaml` and then apply it:

```bash
kubectl apply -f ./podinfo-canary.yaml
```
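
Flagger will first initialize the canary (creating the primary workload and routing all traffic to it); you can watch the status, which should eventually report `Initialized`:

```bash
# watch the canary resource until initialization completes
kubectl -n test get canary/podinfo -w
```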

When the canary analysis starts, Flagger will call the pre-rollout webhooks before routing traffic to the canary. The canary analysis will run for five minutes (a 10% step weight takes five one-minute intervals to reach the 50% maximum weight) while validating the HTTP metrics and rollout hooks every minute.



After a couple of seconds Flagger will create the canary objects:

```bash
# applied
deployment.apps/podinfo
horizontalpodautoscaler.autoscaling/podinfo
canary.flagger.app/podinfo

# generated
deployment.apps/podinfo-primary
horizontalpodautoscaler.autoscaling/podinfo-primary
service/podinfo
service/podinfo-canary
service/podinfo-primary
destinationrule.networking.istio.io/podinfo-canary
destinationrule.networking.istio.io/podinfo-primary
virtualservice.networking.istio.io/podinfo
```
## Automated canary promotion

Trigger a canary deployment by updating the container image:

```bash
kubectl -n test set image deployment/podinfo \
podinfod=stefanprodan/podinfo:3.1.1
```
Flagger detects that the deployment revision changed and starts a new rollout:

```text
kubectl -n test describe canary/podinfo

Status:
  Canary Weight:  0
  Failed Checks:  0
  Phase:          Succeeded
Events:
  Type     Reason  Age   From     Message
  ----     ------  ----  ----     -------
  Normal   Synced  3m    flagger  New revision detected podinfo.test
  Normal   Synced  3m    flagger  Scaling up podinfo.test
  Warning  Synced  3m    flagger  Waiting for podinfo.test rollout to finish: 0 of 1 updated replicas are available
  Normal   Synced  3m    flagger  Advance podinfo.test canary weight 5
  Normal   Synced  3m    flagger  Advance podinfo.test canary weight 10
  Normal   Synced  3m    flagger  Advance podinfo.test canary weight 15
  Normal   Synced  2m    flagger  Advance podinfo.test canary weight 20
  Normal   Synced  2m    flagger  Advance podinfo.test canary weight 25
  Normal   Synced  1m    flagger  Advance podinfo.test canary weight 30
  Normal   Synced  1m    flagger  Advance podinfo.test canary weight 35
  Normal   Synced  55s   flagger  Advance podinfo.test canary weight 40
  Normal   Synced  45s   flagger  Advance podinfo.test canary weight 45
  Normal   Synced  35s   flagger  Advance podinfo.test canary weight 50
  Normal   Synced  25s   flagger  Copying podinfo.test template spec to podinfo-primary.test
  Warning  Synced  15s   flagger  Waiting for podinfo-primary.test rollout to finish: 1 of 2 updated replicas are available
  Normal   Synced  5s    flagger  Promotion completed! Scaling down podinfo.test
```
**Note** that if you apply new changes to the deployment during the canary analysis, Flagger will restart the analysis.

A canary deployment is triggered by changes in any of the following objects:

* Deployment PodSpec (container image, command, ports, env, resources, etc.)
* ConfigMaps mounted as volumes or mapped to environment variables
* Secrets mounted as volumes or mapped to environment variables

You can monitor all canaries with:

```bash
watch kubectl get canaries --all-namespaces

NAMESPACE   NAME       STATUS        WEIGHT   LASTTRANSITIONTIME
test        podinfo    Progressing   15       2019-01-16T14:05:07Z
prod        frontend   Succeeded     0        2019-01-15T16:15:07Z
prod        backend    Failed        0        2019-01-14T17:05:07Z
```
## Automated rollback

During the canary analysis you can generate HTTP 500 errors and high latency to test if Flagger pauses the rollout.

Trigger another canary deployment:

```bash
kubectl -n test set image deployment/podinfo \
podinfod=stefanprodan/podinfo:3.1.2
```
Exec into the load tester pod with:

```bash
kubectl -n test exec -it flagger-loadtester-xx-xx sh
```
Generate HTTP 500 errors:

```bash
watch curl http://podinfo-canary:9898/status/500
```
Generate latency:

```bash
watch curl http://podinfo-canary:9898/delay/1
```

When the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary, the canary is scaled to zero and the rollout is marked as failed.

```text
kubectl -n test describe canary/podinfo

Status:
  Canary Weight:  0
  Failed Checks:  10
  Phase:          Failed
Events:
  Type     Reason  Age   From     Message
  ----     ------  ----  ----     -------
  Normal   Synced  3m    flagger  Starting canary deployment for podinfo.test
  Normal   Synced  3m    flagger  Advance podinfo.test canary weight 5
  Normal   Synced  3m    flagger  Advance podinfo.test canary weight 10
  Normal   Synced  3m    flagger  Advance podinfo.test canary weight 15
  Normal   Synced  3m    flagger  Halt podinfo.test advancement success rate 69.17% < 99%
  Normal   Synced  2m    flagger  Halt podinfo.test advancement success rate 61.39% < 99%
  Normal   Synced  2m    flagger  Halt podinfo.test advancement success rate 55.06% < 99%
  Normal   Synced  2m    flagger  Halt podinfo.test advancement success rate 47.00% < 99%
  Normal   Synced  2m    flagger  (combined from similar events): Halt podinfo.test advancement success rate 38.08% < 99%
  Warning  Synced  1m    flagger  Rolling back podinfo.test failed checks threshold reached 10
  Warning  Synced  1m    flagger  Canary failed! Scaling down podinfo.test
```
## Traffic mirroring

![Flagger Canary Traffic Shadowing](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/diagrams/flagger-canary-traffic-mirroring.png)

For applications that perform read operations, Flagger can be configured to drive canary releases with traffic mirroring. Istio traffic mirroring will copy each incoming request, sending one request to the primary and one to the canary service. The response from the primary is sent back to the user and the response from the canary is discarded. Metrics are collected on both requests so that the deployment will only proceed if the canary metrics are within the threshold values.

Note that mirroring should be used for requests that are **idempotent** or capable of being processed twice (once by the primary and once by the canary).

You can enable mirroring by replacing `stepWeight/maxWeight` with `iterations` and by setting `analysis.mirror` to `true`:

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  analysis:
    # schedule interval
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 5
    # total number of iterations
    iterations: 10
    # enable traffic shadowing
    mirror: true
    # weight of the traffic mirrored to your canary (defaults to 100%)
    mirrorWeight: 100
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500
      interval: 1m
    webhooks:
    - name: acceptance-test
      type: pre-rollout
      url: http://flagger-loadtester.test/
      timeout: 30s
      metadata:
        type: bash
        cmd: "curl -sd 'test' http://podinfo-canary:9898/token | grep token"
    - name: load-test
      url: http://flagger-loadtester.test/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://podinfo.test:9898/"
```
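
Apply the updated manifest with `kubectl apply -f ./podinfo-canary.yaml`. As an illustrative check (the exact field layout varies by Istio version), once an analysis is running you can inspect the mirror settings on the generated virtual service:

```bash
# the podinfo VirtualService is generated by Flagger, as listed in the Bootstrap section
kubectl -n test get virtualservice podinfo -o yaml
```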
With the above configuration, Flagger will run a canary release with the following steps:

* detect new revision (deployment spec, secrets or configmaps changes)
* scale from zero the canary deployment
* wait for the HPA to set the canary minimum replicas
* check canary pods health
* run the acceptance tests
* abort the canary release if tests fail
* start the load tests
* mirror 100% of the traffic from primary to canary
* check request success rate and request duration every minute
* abort the canary release if the metrics check failure threshold is reached
* stop traffic mirroring after the number of iterations is reached
* route live traffic to the canary pods
* promote the canary (update the primary secrets, configmaps and deployment spec)
* wait for the primary deployment rollout to finish
* wait for the HPA to set the primary minimum replicas
* check primary pods health
* switch live traffic back to primary
* scale to zero the canary
* send notification with the canary analysis result

The above procedure can be extended with [custom metrics](../usage/metrics.md) checks, [webhooks](../usage/webhooks.md), [manual promotion](../usage/webhooks.md#manual-gating) approval and [Slack or MS Teams](../usage/alerting.md) notifications.