Add canary rollback scenario

This commit is contained in:
stefanprodan
2019-02-13 12:48:26 +02:00
parent adaf4c99c0
commit bb0175aebf
3 changed files with 30 additions and 14 deletions


@@ -1,10 +1,11 @@
# Canary Deployments with Helm charts
This guide shows you how to package a web app into a Helm chart, trigger canary deployments on Helm upgrade
and automate the chart release process with Weave Flux.
### Packaging
You'll be using the [podinfo](https://github.com/stefanprodan/k8s-podinfo) chart.
This chart packages a web app written in Go, its configuration, a horizontal pod autoscaler (HPA)
and the canary configuration file.
@@ -21,6 +22,8 @@ and the canary configuration file.
└── values.yaml
```
You can find the chart source [here](https://github.com/stefanprodan/flagger/tree/master/charts/podinfo).
### Install
Create a test namespace with Istio sidecar injection enabled:
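Sidecar injection is turned on by labeling the namespace with `istio-injection: enabled`; expressed as a manifest, the equivalent would be:

```yaml
# test namespace with automatic Istio sidecar injection
apiVersion: v1
kind: Namespace
metadata:
  name: test
  labels:
    istio-injection: enabled
```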
@@ -50,7 +53,9 @@ helm upgrade -i frontend flagger/podinfo \
--set canary.istioIngress.host=frontend.istio.example.com
```
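The `--set` flags above override the chart's canary values; a hypothetical excerpt of the chart's `values.yaml` showing the shape those flags assume (the `enabled` and `gateway` keys are assumptions, check the chart source for the exact schema):

```yaml
# assumed shape of the podinfo chart's canary values
canary:
  enabled: true
  istioIngress:
    enabled: true
    gateway: public-gateway.istio-system.svc.cluster.local
    host: frontend.istio.example.com
```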
Flagger takes a Kubernetes deployment and a horizontal pod autoscaler (HPA),
then creates a series of objects (Kubernetes deployments, ClusterIP services and Istio virtual services).
These objects expose the application on the mesh and drive the canary analysis and promotion.
```bash
# generated by Helm
@@ -74,7 +79,7 @@ Flagger will route all traffic to the primary pods and scale to zero the `fronte
Open your browser and navigate to the frontend URL:
![Podinfo Frontend](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/screens/demo-frontend.png)
Now let's install the `backend` release without exposing it outside the mesh:
@@ -99,7 +104,7 @@ frontend Initialized 0 2019-02-12T17:50:50Z
Click on the ping button in the `frontend` UI to trigger an HTTP POST request
that will reach the `backend` app:
![Jaeger Tracing](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/screens/demo-frontend-jaeger.png)
We'll use the `/echo` endpoint (same as the one the ping button calls)
to generate load on both apps during a canary deployment.
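One way to drive that traffic during the analysis is a load-test webhook executed by the loadtester service; a sketch, assuming Flagger's webhook schema with a `metadata.cmd` field and the `hey` load generator (service names and ports are illustrative):

```yaml
# sketch of a canary webhook that hits /echo during the analysis
canaryAnalysis:
  webhooks:
    - name: load-test
      url: http://flagger-loadtester.test/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 -m POST -d 'test' http://backend.test:9898/echo"
```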
@@ -203,8 +208,8 @@ Advance backend.test canary weight 45
Halt backend.test advancement request duration 2.415s > 500ms
Halt backend.test advancement request duration 2.42s > 500ms
Advance backend.test canary weight 50
ConfigMap backend-primary synced
Copying backend.test template spec to backend-primary.test
Promotion completed! Scaling down backend.test
```
@@ -223,7 +228,6 @@ frontend Failed 0 2019-02-12T19:47:20Z
If you've enabled the Slack notifications, you'll receive an alert with the reason why the `backend` promotion failed.
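If Flagger itself was installed with Helm, the Slack alerts are configured through chart values; a hypothetical override (key names assumed from the Flagger chart, the webhook URL is a placeholder):

```yaml
# hypothetical Flagger chart values enabling Slack notifications
slack:
  url: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
  channel: general
  user: flagger
```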
### GitOps automation
Instead of using the Helm CLI from a CI tool to perform the install and upgrade, you could use a Git-based approach.
@@ -242,6 +246,8 @@ Create a git repository with the following content:
└── loadtester.yaml
```
You can find the git source [here](https://github.com/stefanprodan/flagger/tree/master/artifacts/cluster).
Define the `frontend` release using Flux `HelmRelease` custom resource:
```yaml
@@ -278,7 +284,7 @@ In the `chart` section I've defined the release source by specifying the Helm re
In the `values` section I've overwritten the defaults set in values.yaml.
With the `flux.weave.works` annotations I instruct Flux to automate this release.
When an image tag in the sem ver range of `1.4.0 - 1.4.99` is pushed to Quay,
Flux will upgrade the Helm release and from there Flagger will pick up the change and start a canary deployment.
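Putting those pieces together, a hypothetical `HelmRelease` along these lines would match the description above (annotation keys follow the `flux.weave.works/v1beta1` API; the repository URL, chart version and image tag are illustrative, not taken from the elided manifest):

```yaml
# illustrative HelmRelease with Flux automation annotations
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  name: frontend
  namespace: test
  annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.chart-image: semver:~1.4
spec:
  releaseName: frontend
  chart:
    repository: https://flagger.app/
    name: podinfo
    version: 2.0.0
  values:
    image: quay.io/stefanprodan/podinfo:1.4.0
```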
A CI/CD pipeline for the frontend release could look like this:
@@ -296,3 +302,14 @@ A CI/CD pipeline for the frontend release could look like this:
If the canary fails, fix the bug, do another patch release, e.g. `1.4.2`, and the whole process will run again.
There are a couple of reasons why a canary deployment fails:
* the container image can't be downloaded
* the deployment replica set is stuck for more than ten minutes (e.g. due to a container crash loop)
* the webhooks (acceptance tests, load tests, etc.) are returning a non-2xx response
* the HTTP success rate (non-5xx responses) metric drops under the threshold
* the HTTP average duration metric goes over the threshold
* the Istio telemetry service is unable to collect traffic metrics
* the metrics server (Prometheus) can't be reached
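Most of these conditions map onto settings in the canary custom resource; a sketch of an analysis section consistent with the log output above (weight stepping to 50 in increments of 5, a 500ms latency limit), with field names assumed from Flagger's early `canaryAnalysis` schema:

```yaml
# sketch of the analysis settings behind the checks above
canaryAnalysis:
  interval: 1m        # how often the metrics are evaluated
  threshold: 10       # max failed checks before rollback
  maxWeight: 50       # max traffic percentage routed to the canary
  stepWeight: 5       # traffic increment per interval
  metrics:
    - name: istio_requests_total                   # HTTP success rate (non-5xx)
      threshold: 99                                # minimum success rate, percent
      interval: 1m
    - name: istio_request_duration_seconds_bucket  # request duration
      threshold: 500                               # maximum latency, milliseconds
      interval: 1m
```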