Compare commits


265 Commits
0.0.1 ... 0.6.0

Author SHA1 Message Date
Stefan Prodan
04a56a3591 Merge pull request #57 from stefanprodan/release-0.6.0
Release v0.6.0
2019-02-26 01:45:10 +02:00
stefanprodan
4a354e74d4 Update roadmap 2019-02-25 23:45:54 +02:00
stefanprodan
1e3e6427d5 Add link to virtual service docs 2019-02-25 23:22:49 +02:00
stefanprodan
38826108c8 Add changelog for v0.6.0 2019-02-25 23:01:35 +02:00
stefanprodan
4c4752f907 Release v0.6.0 2019-02-25 20:10:33 +02:00
Stefan Prodan
94dcd6c94d Merge pull request #55 from stefanprodan/http-match
Add HTTP match and rewrite to Canary service spec
2019-02-25 20:04:12 +02:00
stefanprodan
eabef3db30 Router improvements
- change virtual service route to canary service
- keep the existing destination weights on virtual service updates
- set the match conditions and URI rewrite when changing the traffic weight
2019-02-25 03:14:45 +02:00
stefanprodan
6750f10ffa Add HTTP match and rewrite docs 2019-02-25 03:07:39 +02:00
stefanprodan
56cb888cbf Add HTTP match and rewrite to virtual service 2019-02-25 00:08:06 +02:00
stefanprodan
b3e7fb3417 Add HTTP match and rewrite to Canary service spec 2019-02-25 00:06:14 +02:00
stefanprodan
2c6e1baca2 Update istio client 2019-02-25 00:05:09 +02:00
Stefan Prodan
c8358929d1 Merge pull request #54 from stefanprodan/vsvc
Refactor virtual service sync
2019-02-24 21:18:01 +02:00
stefanprodan
1dc7677dfb Add tests for virtual service sync 2019-02-24 19:58:01 +02:00
stefanprodan
8e699a7543 Detect changes in virtual service
- ignore destination weight when comparing the two specs
2019-02-24 18:25:12 +02:00
Stefan Prodan
cbbabdfac0 Merge pull request #53 from stefanprodan/kind
Add CircleCI workflow for end-to-end testing with Kubernetes Kind
2019-02-24 12:44:20 +02:00
stefanprodan
9d92de234c Increase promotion e2e wait time to 10s 2019-02-24 11:55:37 +02:00
stefanprodan
ba65975fb5 Add e2e testing docs 2019-02-24 11:41:22 +02:00
stefanprodan
ef423b2078 Move Flagger e2e build to a dedicated job 2019-02-24 03:10:50 +02:00
stefanprodan
f451b4e36c Split e2e prerequisites 2019-02-24 02:52:25 +02:00
stefanprodan
0856e13ee6 Use kind kubeconfig 2019-02-24 02:35:36 +02:00
stefanprodan
87b9fa8ca7 Move cluster init to prerequisites 2019-02-24 02:24:23 +02:00
stefanprodan
5b43d3d314 Use local docker image for e2e testing 2019-02-24 02:11:32 +02:00
stefanprodan
ac4972dd8d Fix e2e paths 2019-02-24 02:09:45 +02:00
stefanprodan
8a8f68af5d Test CircleCI 2019-02-24 02:02:37 +02:00
stefanprodan
c669dc0c4b Run e2e tests with CircleCI 2019-02-24 01:58:18 +02:00
stefanprodan
863a5466cc Add e2e prerequisites 2019-02-24 01:58:03 +02:00
stefanprodan
e2347c84e3 Use absolute paths in e2e tests 2019-02-24 01:11:04 +02:00
stefanprodan
e0e673f565 Install e2e deps and run tests 2019-02-24 01:03:39 +02:00
stefanprodan
30cbf2a741 Add e2e tests
- create Kubernetes cluster with Kind
- install Istio and Prometheus
- install Flagger
- test canary init and promotion
2019-02-24 01:02:15 +02:00
stefanprodan
f58de3801c Add Istio install values for e2e testing 2019-02-24 01:00:03 +02:00
Stefan Prodan
7c6b88d4c1 Merge pull request #51 from carlossg/update-virtualservice
Update VirtualService when the Canary service spec changes
2019-02-20 09:07:27 +00:00
Carlos Sanchez
0c0ebaecd5 Compare only hosts and gateways 2019-02-19 19:54:38 +01:00
Carlos Sanchez
1925f99118 If generated VirtualService already exists update it
Only if spec has changed
2019-02-19 19:40:46 +01:00
Stefan Prodan
6f2a22a1cc Merge pull request #47 from stefanprodan/release-0.5.1
Release v0.5.1
2019-02-14 12:12:11 +01:00
stefanprodan
ee04082cd7 Release v0.5.1 2019-02-13 18:59:34 +02:00
Stefan Prodan
efd901ac3a Merge pull request #46 from stefanprodan/skip-canary
Add option to skip the canary analysis
2019-02-13 17:28:07 +01:00
stefanprodan
e565789ae8 Add link to Helm GitOps repo 2019-02-13 18:18:37 +02:00
stefanprodan
d3953004f6 Add docs links and trim down the readme 2019-02-13 16:39:48 +02:00
stefanprodan
df1d9e3011 Add skip analysis test 2019-02-13 15:56:40 +02:00
stefanprodan
631c55fa6e Document how to skip the canary analysis 2019-02-13 15:31:01 +02:00
stefanprodan
29cdd43288 Implement skip analysis
When skip analysis is enabled, Flagger checks if the canary deployment is healthy and promotes it without analysing it. If an analysis is underway, Flagger cancels it and runs the promotion.
2019-02-13 15:30:29 +02:00
stefanprodan
9b79af9fcd Add skipAnalysis field to Canary CRD 2019-02-13 15:27:45 +02:00
stefanprodan
2c9c1adb47 Fix docs summary 2019-02-13 13:05:57 +02:00
Stefan Prodan
5dfb5808c4 Merge pull request #44 from stefanprodan/helm-docs
Add Helm and Weave Flux GitOps article
2019-02-13 11:51:38 +01:00
stefanprodan
bb0175aebf Add canary rollback scenario 2019-02-13 12:48:26 +02:00
stefanprodan
adaf4c99c0 Add GitOps example to Helm guide 2019-02-13 02:14:40 +02:00
stefanprodan
bed6ed09d5 Add tutorial for canaries with Helm 2019-02-13 00:52:49 +02:00
stefanprodan
4ff67a85ce Add configmap demo to podinfo 2019-02-13 00:51:44 +02:00
stefanprodan
702f4fcd14 Add configmap demo to podinfo 2019-02-12 19:12:10 +02:00
Stefan Prodan
8a03ae153d Merge pull request #43 from stefanprodan/app-validation
Add validation for label selectors
2019-02-11 10:55:34 +01:00
stefanprodan
434c6149ab Package all charts 2019-02-11 11:47:46 +02:00
stefanprodan
97fc4a90ae Add validation for label selectors
- Reject deployment if the pod label selector doesn't match 'app: <DEPLOYMENT_NAME>'
2019-02-11 11:46:59 +02:00
Stefan Prodan
217ef06930 Merge pull request #41 from stefanprodan/demo
Add canary deployment demo Helm chart
2019-02-11 10:20:48 +01:00
stefanprodan
71057946e6 Fix podinfo helm tests 2019-02-10 17:38:33 +02:00
stefanprodan
a74ad52c72 Add dashboard screens 2019-02-10 12:07:44 +02:00
stefanprodan
12d26874f8 Add canary deployment demo chart based on podinfo 2019-02-10 11:48:51 +02:00
stefanprodan
27de9ce151 Session affinity incompatible with destinations weight
- consistent hashing does not apply across multiple subsets
2019-02-10 11:47:01 +02:00
stefanprodan
9e7cd5a8c5 Disable Stackdriver monitoring
- Istio add-on v1.0.3 stackdriver adapter is missing the zone label
2019-02-10 11:37:01 +02:00
stefanprodan
38cb487b64 Allow Grafana anonymous access 2019-02-09 23:45:42 +02:00
stefanprodan
05ca266c5e Add HPA add-on to GKE docs 2019-02-04 16:52:03 +02:00
Stefan Prodan
5cc26de645 Merge pull request #40 from stefanprodan/gke
Flagger install docs revamp
2019-02-02 12:43:15 +01:00
stefanprodan
2b9a195fa3 Add cert-manager diagram to docs 2019-02-02 13:36:51 +02:00
stefanprodan
4454749eec Add load tester install instructions to docs 2019-02-02 13:01:48 +02:00
stefanprodan
b435a03fab Document Istio requirements 2019-02-02 12:16:16 +02:00
stefanprodan
7c166e2b40 Restructure the install docs 2019-02-02 02:20:02 +02:00
stefanprodan
f7a7963dcf Add Flagger install guide for GKE 2019-02-02 02:19:25 +02:00
stefanprodan
9c77c0d69c Add GKE Istio diagram 2019-02-02 02:18:31 +02:00
stefanprodan
e8a9555346 Add GKE Istio Gateway and Prometheus definitions 2019-02-02 02:17:55 +02:00
Stefan Prodan
59751dd007 Merge pull request #39 from stefanprodan/changelog
Add changelog
2019-01-31 17:29:47 +01:00
stefanprodan
9c4d4d16b6 Add PR links to changelog 2019-01-31 12:17:52 +02:00
stefanprodan
0e3d1b3e8f Improve changelog formatting 2019-01-31 12:11:47 +02:00
stefanprodan
f119b78940 Add features and fixes to changelog 2019-01-31 12:08:32 +02:00
stefanprodan
456d914c35 Release v0.5.0 2019-01-30 14:54:03 +02:00
Stefan Prodan
737507b0fe Merge pull request #37 from stefanprodan/track-configs
Track changes in ConfigMaps and Secrets
2019-01-30 13:46:56 +01:00
stefanprodan
4bcf82d295 Copy annotations from canary to primary on promotion 2019-01-28 11:02:33 +02:00
stefanprodan
e9cd7afc8a Add configs track changes to docs 2019-01-28 10:50:30 +02:00
stefanprodan
0830abd51d Trigger a rolling update when configs change
- generate a unique pod annotation on promotion
2019-01-28 10:49:43 +02:00
stefanprodan
5b296e01b3 Detect changes in configs and trigger canary analysis
- restart analysis if a ConfigMap or Secret changes during rollout
- add tests for tracked changes
2019-01-26 12:36:27 +02:00
stefanprodan
3fd039afd1 Add tracked configs checksum to canary status 2019-01-26 12:33:15 +02:00
stefanprodan
5904348ba5 Refactor tests
- consolidate fake clients and mock objects
2019-01-26 00:39:33 +02:00
stefanprodan
1a98e93723 Add config and secret volumes tests 2019-01-25 23:47:50 +02:00
stefanprodan
c9685fbd13 Add ConfigMap env from source tests 2019-01-25 18:58:23 +02:00
stefanprodan
dc347e273d Add secrets from env tests 2019-01-25 18:27:05 +02:00
stefanprodan
8170916897 Add ConfigMap tracking tests 2019-01-25 18:03:36 +02:00
stefanprodan
71cd4e0cb7 Include ConfigMaps and Secrets in promotion
- create primary configs and secrets at bootstrap
- copy configs and secrets from canary to primary and update the pod spec on promotion
2019-01-25 16:03:51 +02:00
stefanprodan
0109788ccc Discover config maps and secrets
- scan target deployment volumes and containers for configmaps and secrets
2019-01-25 13:20:46 +02:00
stefanprodan
1649dea468 Add config maps and secrets manifests for testing 2019-01-25 11:19:34 +02:00
Stefan Prodan
b8a7ea8534 Merge pull request #35 from stefanprodan/gh-actions
Publish charts with GitHub Actions
2019-01-24 11:52:54 +01:00
stefanprodan
afe4d59d5a Move Helm repository to gh-pages branch 2019-01-24 12:47:36 +02:00
stefanprodan
0f2697df23 Publish charts with GitHub Actions 2019-01-24 12:38:45 +02:00
stefanprodan
05664fa648 Release v0.4.1 2019-01-24 12:17:37 +02:00
Stefan Prodan
3b2564f34b Merge pull request #33 from stefanprodan/loadtest
Add load testing service
2019-01-24 11:04:31 +01:00
stefanprodan
dd0cf2d588 Add load tester dockerfile to docs 2019-01-23 15:12:23 +02:00
stefanprodan
7c66f23c6a Add load tester Helm chart 2019-01-21 21:02:40 +02:00
stefanprodan
a9f034de1a Add load testing diagram 2019-01-21 18:02:44 +02:00
stefanprodan
6ad2dca57a Add load testing setup to docs 2019-01-21 17:29:04 +02:00
stefanprodan
e8353c110b Release load tester v0.0.2 2019-01-21 13:37:26 +02:00
stefanprodan
dbf26ddf53 Add load tester flag to log the cmd output 2019-01-21 13:36:08 +02:00
stefanprodan
acc72d207f Change container image tag format 2019-01-20 17:27:08 +02:00
stefanprodan
a784f83464 Add loadtester manifests 2019-01-20 15:59:41 +02:00
stefanprodan
07d8355363 Rename load testing service to flagger-loadtester 2019-01-20 14:28:45 +02:00
stefanprodan
f7a439274e Go format API types 2019-01-20 14:10:10 +02:00
stefanprodan
bd6d446cb8 Go format scheduler 2019-01-20 14:04:10 +02:00
stefanprodan
385d0e0549 Add load test runner service
- embed rakyll/hey in the runner container image
2019-01-20 14:00:14 +02:00
stefanprodan
02236374d8 Run the webhooks before the metrics checks
- log warning when no values are found for Istio metric due to lack of traffic
2019-01-20 13:54:44 +02:00
stefanprodan
c46fe55ad0 Release v0.4.0 2019-01-18 12:49:36 +02:00
Stefan Prodan
36a54fbf2a Merge pull request #31 from stefanprodan/reset
Restart analysis if revision changes during validation
2019-01-18 10:25:38 +01:00
stefanprodan
60f6b05397 Refactor scheduler tests 2019-01-18 11:14:27 +02:00
stefanprodan
6d8a7343b7 Add tests for analysis restart and canary promotion 2019-01-18 11:05:40 +02:00
stefanprodan
aff8b117d4 Restart validation if revision changes during analysis 2019-01-17 15:13:59 +02:00
Stefan Prodan
1b3c3b22b3 Merge pull request #29 from stefanprodan/status
Use Kubernetes 1.11 CRD status sub-resource
2019-01-17 13:06:28 +01:00
stefanprodan
1d31b5ed90 Add canary name and namespace to controller logs
- zap key-value: canary=name.namespace
2019-01-17 13:58:10 +02:00
stefanprodan
1ef310f00d Add traffic weight to canary status
- show current weight on kubectl get canaries and kubectl get all
2019-01-16 16:29:59 +02:00
stefanprodan
acdd2c46d5 Refactor Canary status
- add status phases (Initialized, Progressing, Succeeded, Failed)
- rename status revision to LastAppliedSpec
2019-01-16 15:06:38 +02:00
stefanprodan
9872e6bc16 Skip readiness checks if canary analysis finished 2019-01-16 13:18:53 +02:00
stefanprodan
10c2bdec86 Use deep copy when updating the virtual service routes 2019-01-16 13:13:07 +02:00
stefanprodan
4bf3b70048 Use CRD UpdateStatus for Canary status updates
- requires Kubernetes >=1.11
2019-01-16 01:00:39 +02:00
stefanprodan
ada446bbaa Drop compatibility with Kubernetes 1.10 2019-01-16 00:58:51 +02:00
stefanprodan
c4981ef4db Add status and additional printer columns to CRD 2019-01-16 00:57:46 +02:00
Stefan Prodan
d1b84cd31d Merge pull request #28 from stefanprodan/naming
Fix for when canary name is different to the target name
2019-01-15 23:32:41 +01:00
stefanprodan
9232c8647a Check if multiple canaries have the same target
- log an error on target duplication ref #13
2019-01-15 21:43:05 +02:00
stefanprodan
23e8c7d616 Fix for when canary name is different to the target name
- use the target name consistently at bootstrap
2019-01-15 21:18:46 +02:00
Stefan Prodan
42607fbd64 Merge pull request #26 from carlossg/service-name
Fix VirtualService routes
2019-01-15 19:38:38 +01:00
stefanprodan
28781a5f02 Use deep copy when updating the deployment object
- fix canary status update logs
2019-01-15 20:37:14 +02:00
stefanprodan
3589e11244 Bump dev version 2019-01-15 20:36:59 +02:00
Carlos Sanchez
5e880d3942 Wrong VirtualService routes
If deployment name is different from canary name
the virtual service routes are created with canary name
but the services are created with deployment name

Note that canary name should match deployment name
2019-01-15 18:44:50 +01:00
stefanprodan
f7e675144d Release v0.3.0 2019-01-11 20:10:41 +02:00
Stefan Prodan
3bff2c339b Merge pull request #20 from stefanprodan/scheduler
Add canary analysis schedule interval to CRD
2019-01-11 19:06:17 +01:00
Stefan Prodan
b035c1e7fb Merge pull request #25 from carlossg/virtualservice-naming
Tries to create VirtualService that already exists
2019-01-11 18:03:57 +01:00
Carlos Sanchez
7ae0d49e80 Tries to create VirtualService that already exists
When canary name is different than deployment name

VirtualService croc-hunter-jenkinsx.jx-staging create error virtualservices.networking.istio.io "croc-hunter-jenkinsx" already exists
2019-01-11 17:47:52 +01:00
Stefan Prodan
07f66e849d Merge branch 'master' into scheduler 2019-01-11 15:07:03 +01:00
Stefan Prodan
06c29051eb Merge pull request #24 from carlossg/log-fix
Fix bad error message
2019-01-11 15:05:37 +01:00
stefanprodan
83118faeb3 Fix autoscalerRef tests 2019-01-11 13:51:44 +02:00
stefanprodan
aa2c28c733 Make autoscalerRef optional
- use anyOf as a workaround for the openAPI object validation not accepting empty values
- fix #23
2019-01-11 13:42:32 +02:00
stefanprodan
10185407f6 Use httpbin.org for webhook testing 2019-01-11 13:12:53 +02:00
Carlos Sanchez
c1bde57c17 Fix bad error message
"controller/scheduler.go:217","msg":"deployment . update error Canary.flagger.app \"jx-staging-croc-hunter-jenkinsx\" is invalid: []: Invalid value: map[string]interface {}{\"metadata\":map[string]interface {}{\"name\":\"jx-staging-croc-hunter-jenkinsx\", \"namespace\":\"jx-staging\", \"selfLink\":\"/apis/flagger.app/v1alpha2/namespaces/jx-staging/canaries/jx-staging-croc-hunter-jenkinsx\", \"uid\":\"b248877e-1406-11e9-bf64-42010a8000c6\", \"resourceVersion\":\"30650895\", \"generation\":1, \"creationTimestamp\":\"2019-01-09T12:04:20Z\"}, \"spec\":map[string]interface {}{\"canaryAnalysis\":map[string]interface {}{\"threshold\":5, \"maxWeight\":50, \"stepWeight\":10, \"metrics\":[]interface {}{map[string]interface {}{\"name\":\"istio_requests_total\", \"interval\":\"1m\", \"threshold\":99}, map[string]interface {}{\"name\":\"istio_request_duration_seconds_bucket\", \"interval\":\"30s\"istio-system/flagger-b486d78c8-fkmbr[flagger]: {"level":"info","ts":"2019-01-09T12:14:05.158Z","caller":"controller/deployer.go:228","msg":"Scaling down jx-staging-croc-hunter-jenkinsx.jx-staging"}
2019-01-09 13:17:17 +01:00
stefanprodan
882b4b2d23 Update the control loop interval flag description 2019-01-08 13:15:10 +02:00
Stefan Prodan
cac585157f Merge pull request #21 from carlossg/patch-1
Qualify letsencrypt api version
2019-01-07 15:45:07 +02:00
Carlos Sanchez
cc2860a49f Qualify letsencrypt api version
Otherwise getting

    error: unable to recognize "./letsencrypt-issuer.yaml": no matches for kind "Issuer" in version "v1alpha2"
2019-01-07 14:38:53 +01:00
stefanprodan
bec96356ec Bump CRD version to v1alpha3
- new field canaryAnalysis.interval
2019-01-07 01:03:31 +02:00
stefanprodan
b5c648ea54 Bump version to 0.3.0-beta.1 2019-01-07 00:30:09 +02:00
stefanprodan
e6e3e500be Schedule canary analysis based on interval 2019-01-07 00:26:01 +02:00
stefanprodan
537e8fdaf7 Add canary analysis interval to CRD 2019-01-07 00:24:43 +02:00
stefanprodan
322c83bdad Add docs site link to chart 2019-01-06 18:18:47 +02:00
stefanprodan
41f0ba0247 Document the CRD target ref and control loop interval 2019-01-05 10:22:00 +02:00
stefanprodan
b67b49fde6 Change the default analysis interval to 1m 2019-01-05 01:05:27 +02:00
stefanprodan
f90ba560b7 Release v0.2.0 2019-01-04 13:44:50 +02:00
Stefan Prodan
2a9641fd68 Merge pull request #18 from stefanprodan/webhooks
Add external checks to canary analysis
2019-01-04 13:24:49 +02:00
stefanprodan
13fffe1323 Document webhooks status codes 2019-01-03 18:46:36 +02:00
stefanprodan
083556baae Document the canary analysis timespan 2019-01-03 18:27:49 +02:00
stefanprodan
5d0939af7d Add webhook docs 2019-01-03 16:11:30 +02:00
stefanprodan
d26255070e Copyright Weaveworks 2019-01-03 14:42:21 +02:00
stefanprodan
b008abd4a7 Fix metrics server offline test 2018-12-27 12:43:43 +02:00
stefanprodan
cbf9e1011d Add tests for metrics server check 2018-12-27 12:42:12 +02:00
stefanprodan
6ec3d7a76f Format observer tests 2018-12-27 12:21:33 +02:00
stefanprodan
ab52752d57 Add observer histogram test 2018-12-27 12:16:10 +02:00
stefanprodan
df3951a7ef Add observer tests 2018-12-27 12:15:16 +02:00
stefanprodan
722d36a8cc Add webhook tests 2018-12-26 17:58:35 +02:00
stefanprodan
e86c02d600 Implement canary external check
- do a HTTP POST for each webhook registered in the canary analysis
- increment the failed checks counter if a webhook returns a non-2xx status code and log the error and the response body if it exists
2018-12-26 14:41:35 +02:00
stefanprodan
53546878d5 Make service port mandatory in CRD v1alpha2 2018-12-26 13:55:34 +02:00
stefanprodan
199e3b36c6 Upgrade CRD to v1alpha2
- add required fields for deployment and hpa targets
- make service port mandatory
- add webhooks validation
2018-12-26 13:46:59 +02:00
stefanprodan
0d96bedfee Add webhooks to Canary CRD v1alpha2 2018-12-26 13:42:36 +02:00
Stefan Prodan
9753820579 GitBook: [master] 3 pages modified 2018-12-19 14:32:51 +00:00
Stefan Prodan
197f218ba4 GitBook: [master] one page modified 2018-12-19 14:10:49 +00:00
Stefan Prodan
b4b1a36aba GitBook: [master] 8 pages modified 2018-12-19 13:45:12 +00:00
stefanprodan
cfc848bfa9 Link to docs website 2018-12-19 15:42:16 +02:00
stefanprodan
fcf6f96912 Add overview diagram 2018-12-19 15:30:43 +02:00
Stefan Prodan
1504dcab74 GitBook: [master] 5 pages modified 2018-12-19 13:24:16 +00:00
Stefan Prodan
4e4bc0c4f0 GitBook: [master] 4 pages modified 2018-12-19 13:21:33 +00:00
Stefan Prodan
36ce610465 GitBook: [master] 5 pages modified 2018-12-19 12:46:06 +00:00
stefanprodan
1dc2aa147b Ignore gitbook for GitHub pages 2018-12-19 13:31:18 +02:00
Stefan Prodan
8cc7e4adbb GitBook: [master] 4 pages modified 2018-12-19 11:25:30 +00:00
Stefan Prodan
978f7256a8 GitBook: [master] 2 pages modified 2018-12-19 10:08:59 +00:00
stefanprodan
e799e63e3f Set gitbook root 2018-12-19 12:00:18 +02:00
stefanprodan
5b35854464 init gitbook 2018-12-19 11:56:42 +02:00
stefanprodan
d485498a14 Add email field to charts 2018-12-18 18:38:33 +02:00
stefanprodan
dfa974cf57 Change Grafana chart title 2018-12-18 18:37:04 +02:00
stefanprodan
ee1e2e6fd9 Upgrade Grafana to v5.4.2 2018-12-18 12:58:14 +02:00
Stefan Prodan
eeb3b1ba4d Merge pull request #15 from stefanprodan/chart
Add service account option to Helm chart
2018-12-18 12:12:05 +02:00
Stefan Prodan
b510f0ee02 Merge branch 'master' into chart 2018-12-18 11:56:06 +02:00
stefanprodan
c34737b9ce Use app.kubernetes.io labels 2018-12-18 11:53:42 +02:00
stefanprodan
e4ea4f3994 Make the service account optional 2018-12-18 11:06:53 +02:00
stefanprodan
07359192e7 Add chart prerequisites and icon 2018-12-18 10:31:47 +02:00
stefanprodan
4dd23c42a2 Add Flagger logo and icons 2018-12-18 10:31:05 +02:00
Stefan Prodan
f281021abf Add Slack notifications screen 2018-12-06 16:18:38 +07:00
Stefan Prodan
71137ba3bb Release 0.1.2 2018-12-06 14:00:12 +07:00
Stefan Prodan
6372c7dfcc Merge pull request #14 from stefanprodan/slack
Add details to Slack messages
2018-12-06 13:53:20 +07:00
Stefan Prodan
4584733f6f Change coverage threshold 2018-12-06 13:48:06 +07:00
Stefan Prodan
03408683c0 Add details to Slack messages
- attach canary analysis metadata to init/start messages
- add rollback reason to failed canary messages
2018-12-06 12:51:02 +07:00
Stefan Prodan
29137ae75b Add Alertmanager example 2018-12-06 12:49:41 +07:00
Stefan Prodan
6bf85526d0 Add Slack screens with successful and failed canaries 2018-12-06 12:49:15 +07:00
stefanprodan
9f6a30f43e Bump dev version 2018-11-28 15:08:24 +02:00
stefanprodan
11bc0390c4 Release v0.1.1 2018-11-28 14:56:34 +02:00
stefanprodan
9a29ea69d7 Change progress deadline default to 10 minutes 2018-11-28 14:53:12 +02:00
Stefan Prodan
2d8adbaca4 Merge pull request #10 from stefanprodan/deadline
Rollback canary based on the deployment progress deadline check
2018-11-28 14:48:17 +02:00
stefanprodan
f3904ea099 Use canary state constants in recorder 2018-11-27 17:34:48 +02:00
stefanprodan
1b2b13e77f Disable patch coverage 2018-11-27 17:11:57 +02:00
stefanprodan
8878f15806 Clean up isDeploymentReady 2018-11-27 17:11:35 +02:00
stefanprodan
5977ff9bae Add rollback test based on failed checks threshold 2018-11-27 17:00:13 +02:00
stefanprodan
11ef6bdf37 Add progressDeadlineSeconds to canary example 2018-11-27 16:58:21 +02:00
stefanprodan
9c342e35be Add progressDeadlineSeconds validation 2018-11-27 16:35:39 +02:00
stefanprodan
c7e7785b06 Fix canary deployer is ready test 2018-11-27 15:55:04 +02:00
stefanprodan
4cb5ceb48b Rollback canary based on the deployment progress deadline check
- determine if the canary deployment is stuck by checking if there is a minimum replicas unavailable condition and if the last update time exceeds the deadline
- set progress deadline default value to 60 seconds
2018-11-27 15:44:15 +02:00
stefanprodan
5a79402a73 Add canary status state constants 2018-11-27 15:29:06 +02:00
stefanprodan
c24b11ff8b Add ProgressDeadlineSeconds to Canary CRD 2018-11-27 12:16:20 +02:00
stefanprodan
042d3c1a5b Set ProgressDeadlineSeconds for primary deployment on init/promote 2018-11-27 12:10:14 +02:00
stefanprodan
f8821cf30b bump dev version 2018-11-27 11:56:11 +02:00
stefanprodan
8c12cdb21d Release v0.1.0 2018-11-25 21:05:16 +02:00
stefanprodan
923799dce7 Keep CRD on Helm release delete 2018-11-25 20:13:40 +02:00
stefanprodan
ebc932fba5 Add Slack configuration to Helm readme 2018-11-25 20:07:32 +02:00
Stefan Prodan
3d8d30db47 Merge pull request #6 from stefanprodan/quay
Switch to Quay and Go 1.11
2018-11-25 19:35:53 +02:00
stefanprodan
1022c3438a Use go 1.11 for docker build 2018-11-25 19:21:42 +02:00
stefanprodan
9159855df2 Use Quay as container registry in Helm and YAML manifests 2018-11-25 19:20:29 +02:00
Stefan Prodan
7927ac0a5d Push container image to Quay 2018-11-25 18:52:18 +02:00
Stefan Prodan
f438e9a4b2 Merge pull request #4 from stefanprodan/slack
Add Slack notifications
2018-11-25 11:54:15 +02:00
Stefan Prodan
4c70a330d4 Add Slack notifications configuration to readme 2018-11-25 11:46:18 +02:00
Stefan Prodan
d8875a3da1 Add Slack flags to Helm chart 2018-11-25 11:45:38 +02:00
Stefan Prodan
769aff57cb Add Slack notifications for canary events 2018-11-25 11:44:45 +02:00
Stefan Prodan
4138f37f9a Add Slack notifier component 2018-11-25 11:40:35 +02:00
stefanprodan
583c9cc004 Rename Istio client set 2018-11-25 00:05:43 +02:00
Stefan Prodan
c5930e6f70 Update deployment strategy on promotion
- include spec strategy, min ready seconds and revision history limit to initialization and promotion
2018-11-24 20:03:02 +02:00
stefanprodan
423d9bbbb3 Use go 1.11 in Travis 2018-11-24 16:23:20 +02:00
Stefan Prodan
07771f500f Release 0.1.0-beta.7 2018-11-24 15:58:17 +02:00
Stefan Prodan
65bd77c88f Add last transition time to Canary CRD status 2018-11-24 15:48:35 +02:00
Stefan Prodan
82bf63f89b Change website URL 2018-11-15 12:20:53 +02:00
Stefan Prodan
7f735ead07 Set site banner 2018-11-15 10:50:58 +02:00
Stefan Prodan
56ffd618d6 Increase flagger probes timeout to 5s (containerd fix) 2018-11-15 10:38:20 +02:00
Stefan Prodan
19cb34479e Increase probes timeout to 5s (containerd fix) 2018-11-14 15:39:44 +02:00
Stefan Prodan
2d906f0b71 Add Grafana install to helm-up cmd 2018-11-14 15:38:35 +02:00
Stefan Prodan
3eaeec500e Clean coverage artifacts (fix goreleaser) 2018-10-29 21:52:09 +02:00
Stefan Prodan
df98de7d11 Release v0.1.0-beta.6 2018-10-29 21:46:54 +02:00
Stefan Prodan
580924e63b Record canary duration and total
- add Prometheus metrics canary_duration_seconds and canary_total
2018-10-29 21:44:43 +02:00
Stefan Prodan
1b2108001f Add Prometheus registry flag to recorder
- fix tests
2018-10-29 14:04:45 +02:00
Stefan Prodan
3a28768bf9 Update website docs 2018-10-29 13:56:17 +02:00
Stefan Prodan
53c09f40eb Add Prometheus metrics docs
- ref #2
2018-10-29 13:44:20 +02:00
Stefan Prodan
074e57aa12 Add recorder to revision tests 2018-10-29 13:43:54 +02:00
Stefan Prodan
e16dde809d Add recorder to mock controller 2018-10-29 13:34:28 +02:00
Stefan Prodan
188e4ea82e Release v0.1.0-beta.5 2018-10-29 11:26:56 +02:00
Stefan Prodan
4a8aa3b547 Add recorder component
- records the canary analysis status and current weight as Prometheus metrics
2018-10-29 11:25:36 +02:00
Stefan Prodan
6bf4a8f95b Rename user to flagger 2018-10-23 16:58:32 +03:00
Stefan Prodan
c5ea947899 Add codecov badge 2018-10-23 16:44:25 +03:00
Stefan Prodan
344c7db968 Make golint happy and add codecov 2018-10-23 16:36:48 +03:00
Stefan Prodan
65b908e702 Release v0.1.0-beta.2 2018-10-23 13:42:43 +03:00
Stefan Prodan
8e66baa0e7 Update the artifacts yamls to match the naming conventions 2018-10-23 13:39:10 +03:00
Stefan Prodan
667e915700 Update canary dashboard to latest CRD naming conventions 2018-10-23 13:21:57 +03:00
Stefan Prodan
7af103f112 Update Grafana to v5.3.1 2018-10-23 11:21:04 +03:00
Stefan Prodan
8e2f538e4c Add scheduler tests for initialization and revision 2018-10-22 20:14:09 +03:00
Stefan Prodan
be289ef7ce Add router tests 2018-10-22 17:21:06 +03:00
Stefan Prodan
4a074e50c4 Add Istio fake clientset 2018-10-22 17:18:33 +03:00
Stefan Prodan
fa13c92a15 Add deployer status and scaling tests 2018-10-22 16:29:59 +03:00
Stefan Prodan
dbd0908313 Add deployer promote tests 2018-10-22 16:03:06 +03:00
Stefan Prodan
9b5c4586b9 Add deployer sync tests 2018-10-22 16:02:01 +03:00
Stefan Prodan
bfbb272c88 Add Kubernetes fake clientset package 2018-10-22 16:00:50 +03:00
Stefan Prodan
4b4a88cbe5 Publish Helm chart 0.1.0-alpha.2 2018-10-15 11:11:41 +02:00
Stefan Prodan
b022124415 bump version 2018-10-15 11:05:39 +02:00
Stefan Prodan
663dc82574 Controller refactoring part two
- share components between loops
2018-10-11 20:51:12 +03:00
Stefan Prodan
baeee62a26 Controller refactoring
- split controller logic into components (deployer, observer, router and scheduler)
- set the canary analysis final state (failed or finished) in a single run
2018-10-11 19:59:40 +03:00
Stefan Prodan
56f2ee9078 Add contributing and code of conduct docs 2018-10-11 14:33:28 +03:00
Stefan Prodan
a4f890c8b2 Add autoscaling support
- add HorizontalPodAutoscaler reference to CRD
- create primary HPA on canary bootstrap
2018-10-11 11:16:56 +03:00
Stefan Prodan
a03cf43a1d Update CRD in readme and chart 2018-10-11 02:03:04 +03:00
Stefan Prodan
302de10fec Canary CRD refactoring
- set canaries.flagger.app version to v1alpha1
- replace old Canary spec with CanaryDeployment
2018-10-11 01:43:53 +03:00
Stefan Prodan
5a1412549d Merge pull request #1 from stefanprodan/crd-deployment
Add CanaryDeployment kind
2018-10-10 16:57:58 +03:00
Stefan Prodan
e2be4fdaed Add CanaryDeployment kind
- bootstrap the deployments, services and Istio virtual service
- use google/go-cmp to detect changes in the deployment pod spec
2018-10-10 16:57:12 +03:00
Stefan Prodan
3eb60a8447 Add alternative canary routing 2018-10-09 18:18:48 +03:00
Stefan Prodan
276fdfc0ff Rename dashboard 2018-10-09 18:18:04 +03:00
406 changed files with 22794 additions and 6811 deletions

.circleci/config.yml (new file, +16)

@@ -0,0 +1,16 @@
version: 2.1
jobs:
e2e-testing:
machine: true
steps:
- checkout
- run: test/e2e-kind.sh
- run: test/e2e-istio.sh
- run: test/e2e-build.sh
- run: test/e2e-tests.sh
workflows:
version: 2
build-and-test:
jobs:
- e2e-testing

.codecov.yml (new file, +8)

@@ -0,0 +1,8 @@
coverage:
status:
project:
default:
target: auto
threshold: 50
base: auto
patch: off

.gitbook.yaml (new file, +1)

@@ -0,0 +1 @@
root: ./docs/gitbook

.github/main.workflow (vendored, new file, +17)

@@ -0,0 +1,17 @@
workflow "Publish Helm charts" {
on = "push"
resolves = ["helm-push"]
}
action "helm-lint" {
uses = "stefanprodan/gh-actions/helm@master"
args = ["lint charts/*"]
}
action "helm-push" {
needs = ["helm-lint"]
uses = "stefanprodan/gh-actions/helm-gh-pages@master"
args = ["charts/*","https://flagger.app"]
secrets = ["GITHUB_TOKEN"]
}

.gitignore (vendored, +3)

@@ -11,3 +11,6 @@
# Output of the go coverage tool, specifically when used with LiteIDE
*.out
.DS_Store
bin/
artifacts/gcloud/

.travis.yml

@@ -2,7 +2,7 @@ sudo: required
language: go
go:
- 1.10.x
- 1.11.x
services:
- docker
@@ -12,27 +12,36 @@ addons:
packages:
- docker-ce
#before_script:
# - go get -u sigs.k8s.io/kind
# - curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
# - curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl && chmod +x kubectl && sudo mv kubectl /usr/local/bin/
script:
- set -e
- make test
- make build
- set -e
- make test-fmt
- make test-codegen
- go test -race -coverprofile=coverage.txt -covermode=atomic ./pkg/controller/
- make build
after_success:
- if [ -z "$DOCKER_USER" ]; then
echo "PR build, skipping Docker Hub push";
echo "PR build, skipping image push";
else
docker tag stefanprodan/flagger:latest stefanprodan/flagger:${TRAVIS_COMMIT};
echo $DOCKER_PASS | docker login -u=$DOCKER_USER --password-stdin;
docker push stefanprodan/flagger:${TRAVIS_COMMIT};
BRANCH_COMMIT=${TRAVIS_BRANCH}-$(echo ${TRAVIS_COMMIT} | head -c7);
docker tag stefanprodan/flagger:latest quay.io/stefanprodan/flagger:${BRANCH_COMMIT};
echo $DOCKER_PASS | docker login -u=$DOCKER_USER --password-stdin quay.io;
docker push quay.io/stefanprodan/flagger:${BRANCH_COMMIT};
fi
- if [ -z "$TRAVIS_TAG" ]; then
echo "Not a release, skipping Docker Hub push";
echo "Not a release, skipping image push";
else
docker tag stefanprodan/flagger:latest stefanprodan/flagger:$TRAVIS_TAG;
echo $DOCKER_PASS | docker login -u=$DOCKER_USER --password-stdin;
docker push stefanprodan/flagger:latest;
docker push stefanprodan/flagger:$TRAVIS_TAG;
docker tag stefanprodan/flagger:latest quay.io/stefanprodan/flagger:${TRAVIS_TAG};
echo $DOCKER_PASS | docker login -u=$DOCKER_USER --password-stdin quay.io;
docker push quay.io/stefanprodan/flagger:$TRAVIS_TAG;
fi
- bash <(curl -s https://codecov.io/bash)
- rm coverage.txt
deploy:
- provider: script

CHANGELOG.md (new file, +155)

@@ -0,0 +1,155 @@
# Changelog
All notable changes to this project are documented in this file.
## 0.6.0 (2019-02-25)
Allows for [HTTPMatchRequests](https://istio.io/docs/reference/config/istio.networking.v1alpha3/#HTTPMatchRequest)
and [HTTPRewrite](https://istio.io/docs/reference/config/istio.networking.v1alpha3/#HTTPRewrite)
to be customized in the service spec of the canary custom resource.
#### Features
- Add HTTP match conditions and URI rewrite to the canary service spec [#55](https://github.com/stefanprodan/flagger/pull/55)
- Update virtual service when the canary service spec changes
[#54](https://github.com/stefanprodan/flagger/pull/54)
[#51](https://github.com/stefanprodan/flagger/pull/51)
#### Improvements
- Run e2e testing on [Kubernetes Kind](https://github.com/kubernetes-sigs/kind) for canary promotion
[#53](https://github.com/stefanprodan/flagger/pull/53)
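As a sketch, the new match and rewrite fields can be set in the canary service spec like this (field names come from PR #55; the `/api` prefix is a hypothetical value):

```yaml
  service:
    port: 9898
    # Istio virtual service HTTP match conditions (optional)
    match:
      - uri:
          prefix: /api
    # Istio virtual service HTTP rewrite (optional)
    rewrite:
      uri: /
```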
## 0.5.1 (2019-02-14)
Allows skipping the analysis phase to ship changes directly to production
#### Features
- Add option to skip the canary analysis [#46](https://github.com/stefanprodan/flagger/pull/46)
#### Fixes
- Reject deployment if the pod label selector doesn't match `app: <DEPLOYMENT_NAME>` [#43](https://github.com/stefanprodan/flagger/pull/43)
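For example, a deployment named `podinfo` is only accepted when its pod label selector follows that convention (a minimal sketch):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
spec:
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
```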
## 0.5.0 (2019-01-30)
Track changes in ConfigMaps and Secrets [#37](https://github.com/stefanprodan/flagger/pull/37)
#### Features
- Promote configmaps and secrets changes from canary to primary
- Detect changes in configmaps and/or secrets and (re)start canary analysis
- Add configs checksum to Canary CRD status
- Create primary configmaps and secrets at bootstrap
- Scan canary volumes and containers for configmaps and secrets
#### Fixes
- Copy deployment labels from canary to primary at bootstrap and promotion
## 0.4.1 (2019-01-24)
Load testing webhook [#35](https://github.com/stefanprodan/flagger/pull/35)
#### Features
- Add the load tester chart to the Flagger Helm repository
- Implement a load test runner based on [rakyll/hey](https://github.com/rakyll/hey)
- Log warning when no values are found for Istio metric due to lack of traffic
#### Fixes
- Run webhooks before the metrics checks to avoid failures when using a load tester
## 0.4.0 (2019-01-18)
Restart canary analysis if revision changes [#31](https://github.com/stefanprodan/flagger/pull/31)
#### Breaking changes
- Drop support for Kubernetes 1.10
#### Features
- Detect changes during canary analysis and reset advancement
- Add status and additional printer columns to CRD
- Add canary name and namespace to controller structured logs
#### Fixes
- Allow the canary name to be different from the target name
- Check if multiple canaries have the same target and log error
- Use deep copy when updating Kubernetes objects
- Skip readiness checks if canary analysis has finished
## 0.3.0 (2019-01-11)
Configurable canary analysis duration [#20](https://github.com/stefanprodan/flagger/pull/20)
#### Breaking changes
- Helm chart: flag `controlLoopInterval` has been removed
#### Features
- CRD: canaries.flagger.app v1alpha3
- Schedule canary analysis independently based on `canaryAnalysis.interval`
- Add analysis interval to Canary CRD (defaults to one minute)
- Make autoscaler (HPA) reference optional
## 0.2.0 (2019-01-04)
Webhooks [#18](https://github.com/stefanprodan/flagger/pull/18)
#### Features
- CRD: canaries.flagger.app v1alpha2
- Implement canary external checks based on webhooks HTTP POST calls
- Add webhooks to Canary CRD
- Move docs to gitbook [docs.flagger.app](https://docs.flagger.app)
## 0.1.2 (2018-12-06)
Improve Slack notifications [#14](https://github.com/stefanprodan/flagger/pull/14)
#### Features
- Add canary analysis metadata to init and start Slack messages
- Add rollback reason to failed canary Slack messages
## 0.1.1 (2018-11-28)
Canary progress deadline [#10](https://github.com/stefanprodan/flagger/pull/10)
#### Features
- Rollback canary based on the deployment progress deadline check
- Add progress deadline to Canary CRD (defaults to 10 minutes)
## 0.1.0 (2018-11-25)
First stable release
#### Features
- CRD: canaries.flagger.app v1alpha1
- Notifications: post canary events to Slack
- Instrumentation: expose Prometheus metrics for canary status and traffic weight percentage
- Autoscaling: add HPA reference to CRD and create primary HPA at bootstrap
- Bootstrap: create primary deployment, ClusterIP services and Istio virtual service based on CRD spec
## 0.0.1 (2018-10-07)
Initial semver release
#### Features
- Implement canary rollback based on failed checks threshold
- Scale up the deployment when canary revision changes
- Add OpenAPI v3 schema validation to Canary CRD
- Use CRD status for canary state persistence
- Add Helm charts for Flagger and Grafana
- Add canary analysis Grafana dashboard

CONTRIBUTING.md Normal file

@@ -0,0 +1,72 @@
# How to Contribute
Flagger is [Apache 2.0 licensed](LICENSE) and accepts contributions via GitHub
pull requests. This document outlines some of the conventions on development
workflow, commit message formatting, contact points and other resources to make
it easier to get your contribution accepted.
We gratefully welcome improvements to documentation as well as to code.
## Certificate of Origin
By contributing to this project you agree to the Developer Certificate of
Origin (DCO). This document was created by the Linux Kernel community and is a
simple statement that you, as a contributor, have the legal right to make the
contribution.
## Chat
The project uses Slack: to join the conversation, sign up for the
[Weave community](https://slack.weave.works/) Slack workspace.
## Getting Started
- Fork the repository on GitHub
- If you want to contribute as a developer, continue reading this document for further instructions
- If you have questions, concerns, get stuck or need a hand, let us know
on the Slack channel. We are happy to help and look forward to having
you as part of the team, in whatever capacity.
- Play with the project, submit bugs, submit pull requests!
## Contribution workflow
This is a rough outline of how to prepare a contribution:
- Create a topic branch from where you want to base your work (usually branched from master).
- Make commits of logical units.
- Make sure your commit messages are in the proper format (see below).
- Push your changes to a topic branch in your fork of the repository.
- If you changed code:
- add automated tests to cover your changes
- Submit a pull request to the original repository.
## Acceptance policy
These things will make a PR more likely to be accepted:
- a well-described requirement
- new code and tests follow the conventions in old code and tests
- a good commit message (see below)
- All code must abide by the [Go Code Review Comments](https://github.com/golang/go/wiki/CodeReviewComments)
- Names should abide by [What's in a name](https://talks.golang.org/2014/names.slide#1)
- Code must build on both Linux and Darwin, via plain `go build`
- Code should have appropriate test coverage and tests should be written
to work with `go test`
In general, we will merge a PR once one maintainer has endorsed it.
For substantial changes, more people may become involved, and you might
get asked to resubmit the PR or divide the changes into more than one PR.
### Format of the Commit Message
For Flagger we prefer the following rules for good commit messages:
- Limit the subject to 50 characters and write as the continuation
of the sentence "If applied, this commit will ..."
- Explain what and why in the body, if more than a trivial change;
wrap it at 72 characters.
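A hypothetical commit message that follows these rules:

```
Add HTTP rewrite to the canary service spec

Without a rewrite rule the generated virtual service forwards the
original URI, which breaks apps served under a different base path.
```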
The [following article](https://chris.beams.io/posts/git-commit/#seven-rules)
has some more helpful advice on documenting your work.
This doc is adapted from the [Weaveworks Flux](https://github.com/weaveworks/flux/blob/master/CONTRIBUTING.md)


@@ -1,4 +1,4 @@
FROM golang:1.10
FROM golang:1.11
RUN mkdir -p /go/src/github.com/stefanprodan/flagger/
@@ -13,17 +13,17 @@ RUN GIT_COMMIT=$(git rev-list -1 HEAD) && \
FROM alpine:3.8
RUN addgroup -S app \
&& adduser -S -g app app \
RUN addgroup -S flagger \
&& adduser -S -g flagger flagger \
&& apk --no-cache add ca-certificates
WORKDIR /home/app
WORKDIR /home/flagger
COPY --from=0 /go/src/github.com/stefanprodan/flagger/flagger .
RUN chown -R app:app ./
RUN chown -R flagger:flagger ./
USER app
USER flagger
ENTRYPOINT ["./flagger"]

Dockerfile.loadtester Normal file

@@ -0,0 +1,44 @@
FROM golang:1.11 AS hey-builder
RUN mkdir -p /go/src/github.com/rakyll/hey/
WORKDIR /go/src/github.com/rakyll/hey
ADD https://github.com/rakyll/hey/archive/v0.1.1.tar.gz .
RUN tar xzf v0.1.1.tar.gz --strip 1
RUN go get ./...
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
go install -ldflags '-w -extldflags "-static"' \
/go/src/github.com/rakyll/hey
FROM golang:1.11 AS builder
RUN mkdir -p /go/src/github.com/stefanprodan/flagger/
WORKDIR /go/src/github.com/stefanprodan/flagger
COPY . .
RUN go test -race ./pkg/loadtester/
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o loadtester ./cmd/loadtester/*
FROM alpine:3.8
RUN addgroup -S app \
&& adduser -S -g app app \
&& apk --no-cache add ca-certificates curl
WORKDIR /home/app
COPY --from=hey-builder /go/bin/hey /usr/local/bin/hey
COPY --from=builder /go/src/github.com/stefanprodan/flagger/loadtester .
RUN chown -R app:app ./
USER app
ENTRYPOINT ["./loadtester"]

Gopkg.lock generated

@@ -25,14 +25,6 @@
revision = "8991bc29aa16c548c550c7ff78260e27b9ab7c73"
version = "v1.1.1"
[[projects]]
digest = "1:ade392a843b2035effb4b4a2efa2c3bab3eb29b992e98bacf9c898b0ecb54e45"
name = "github.com/fatih/color"
packages = ["."]
pruneopts = "NUT"
revision = "5b77d2a35fb0ede96d138fc9a99f5c9b6aef11b4"
version = "v1.7.0"
[[projects]]
digest = "1:81466b4218bf6adddac2572a30ac733a9255919bc2f470b4827a317bd4ee1756"
name = "github.com/ghodss/yaml"
@@ -92,10 +84,11 @@
revision = "4030bb1f1f0c35b30ca7009e9ebd06849dd45306"
[[projects]]
digest = "1:2e3c336fc7fde5c984d2841455a658a6d626450b1754a854b3b32e7a8f49a07a"
digest = "1:d2754cafcab0d22c13541618a8029a70a8959eb3525ff201fe971637e2274cd0"
name = "github.com/google/go-cmp"
packages = [
"cmp",
"cmp/cmpopts",
"cmp/internal/diff",
"cmp/internal/function",
"cmp/internal/value",
@@ -170,42 +163,25 @@
revision = "f2b4162afba35581b6d4a50d3b8f34e33c144682"
[[projects]]
digest = "1:555e31114bd0e89c6340c47ab73162e8c8d873e4d88914310923566f487bfcd5"
digest = "1:05ddd9088c0cfb8eaa3adf3626977caa6d96b3959a3bd8c91fef932fd1696c34"
name = "github.com/knative/pkg"
packages = [
"apis",
"apis/duck",
"apis/duck/v1alpha1",
"apis/istio",
"apis/istio/authentication",
"apis/istio/authentication/v1alpha1",
"apis/istio/common/v1alpha1",
"apis/istio/v1alpha3",
"client/clientset/versioned",
"client/clientset/versioned/fake",
"client/clientset/versioned/scheme",
"client/clientset/versioned/typed/authentication/v1alpha1",
"client/clientset/versioned/typed/duck/v1alpha1",
"client/clientset/versioned/typed/authentication/v1alpha1/fake",
"client/clientset/versioned/typed/istio/v1alpha3",
"client/clientset/versioned/typed/istio/v1alpha3/fake",
"signals",
]
pruneopts = "NUT"
revision = "c15d7c8f2220a7578b33504df6edefa948c845ae"
[[projects]]
digest = "1:08c231ec84231a7e23d67e4b58f975e1423695a32467a362ee55a803f9de8061"
name = "github.com/mattn/go-colorable"
packages = ["."]
pruneopts = "NUT"
revision = "167de6bfdfba052fa6b2d3664c8f5272e23c9072"
version = "v0.0.9"
[[projects]]
digest = "1:bffa444ca07c69c599ae5876bc18b25bfd5fa85b297ca10a25594d284a7e9c5d"
name = "github.com/mattn/go-isatty"
packages = ["."]
pruneopts = "NUT"
revision = "6ca4dbf54d38eea1a992b3c722a76a5d1c4cb25c"
version = "v0.0.4"
revision = "f9612ef73847258e381e749c4f45b0f5e03b66e9"
[[projects]]
digest = "1:5985ef4caf91ece5d54817c11ea25f182697534f8ae6521eadcd628c142ac4b6"
@@ -495,10 +471,9 @@
version = "kubernetes-1.11.0"
[[projects]]
digest = "1:4b0d523ee389c762d02febbcfa0734c4530ebe87abe925db18f05422adcb33e8"
digest = "1:83b01e3d6f85c4e911de84febd69a2d3ece614c5a4a518fbc2b5d59000645980"
name = "k8s.io/apimachinery"
packages = [
"pkg/api/equality",
"pkg/api/errors",
"pkg/api/meta",
"pkg/api/resource",
@@ -547,42 +522,72 @@
version = "kubernetes-1.11.0"
[[projects]]
digest = "1:29e55bcff61dd3d1f768724450a3933ea76e6277684796eb7c315154f41db902"
digest = "1:c7d6cf5e28c377ab4000b94b6b9ff562c4b13e7e8b948ad943f133c5104be011"
name = "k8s.io/client-go"
packages = [
"discovery",
"discovery/fake",
"kubernetes",
"kubernetes/fake",
"kubernetes/scheme",
"kubernetes/typed/admissionregistration/v1alpha1",
"kubernetes/typed/admissionregistration/v1alpha1/fake",
"kubernetes/typed/admissionregistration/v1beta1",
"kubernetes/typed/admissionregistration/v1beta1/fake",
"kubernetes/typed/apps/v1",
"kubernetes/typed/apps/v1/fake",
"kubernetes/typed/apps/v1beta1",
"kubernetes/typed/apps/v1beta1/fake",
"kubernetes/typed/apps/v1beta2",
"kubernetes/typed/apps/v1beta2/fake",
"kubernetes/typed/authentication/v1",
"kubernetes/typed/authentication/v1/fake",
"kubernetes/typed/authentication/v1beta1",
"kubernetes/typed/authentication/v1beta1/fake",
"kubernetes/typed/authorization/v1",
"kubernetes/typed/authorization/v1/fake",
"kubernetes/typed/authorization/v1beta1",
"kubernetes/typed/authorization/v1beta1/fake",
"kubernetes/typed/autoscaling/v1",
"kubernetes/typed/autoscaling/v1/fake",
"kubernetes/typed/autoscaling/v2beta1",
"kubernetes/typed/autoscaling/v2beta1/fake",
"kubernetes/typed/batch/v1",
"kubernetes/typed/batch/v1/fake",
"kubernetes/typed/batch/v1beta1",
"kubernetes/typed/batch/v1beta1/fake",
"kubernetes/typed/batch/v2alpha1",
"kubernetes/typed/batch/v2alpha1/fake",
"kubernetes/typed/certificates/v1beta1",
"kubernetes/typed/certificates/v1beta1/fake",
"kubernetes/typed/core/v1",
"kubernetes/typed/core/v1/fake",
"kubernetes/typed/events/v1beta1",
"kubernetes/typed/events/v1beta1/fake",
"kubernetes/typed/extensions/v1beta1",
"kubernetes/typed/extensions/v1beta1/fake",
"kubernetes/typed/networking/v1",
"kubernetes/typed/networking/v1/fake",
"kubernetes/typed/policy/v1beta1",
"kubernetes/typed/policy/v1beta1/fake",
"kubernetes/typed/rbac/v1",
"kubernetes/typed/rbac/v1/fake",
"kubernetes/typed/rbac/v1alpha1",
"kubernetes/typed/rbac/v1alpha1/fake",
"kubernetes/typed/rbac/v1beta1",
"kubernetes/typed/rbac/v1beta1/fake",
"kubernetes/typed/scheduling/v1alpha1",
"kubernetes/typed/scheduling/v1alpha1/fake",
"kubernetes/typed/scheduling/v1beta1",
"kubernetes/typed/scheduling/v1beta1/fake",
"kubernetes/typed/settings/v1alpha1",
"kubernetes/typed/settings/v1alpha1/fake",
"kubernetes/typed/storage/v1",
"kubernetes/typed/storage/v1/fake",
"kubernetes/typed/storage/v1alpha1",
"kubernetes/typed/storage/v1alpha1/fake",
"kubernetes/typed/storage/v1beta1",
"kubernetes/typed/storage/v1beta1/fake",
"pkg/apis/clientauthentication",
"pkg/apis/clientauthentication/v1alpha1",
"pkg/apis/clientauthentication/v1beta1",
@@ -675,24 +680,30 @@
analyzer-name = "dep"
analyzer-version = 1
input-imports = [
"github.com/fatih/color",
"github.com/google/go-cmp/cmp",
"github.com/google/go-cmp/cmp/cmpopts",
"github.com/istio/glog",
"github.com/knative/pkg/apis/istio/v1alpha3",
"github.com/knative/pkg/client/clientset/versioned",
"github.com/knative/pkg/client/clientset/versioned/fake",
"github.com/knative/pkg/signals",
"github.com/prometheus/client_golang/prometheus",
"github.com/prometheus/client_golang/prometheus/promhttp",
"go.uber.org/zap",
"go.uber.org/zap/zapcore",
"k8s.io/api/apps/v1",
"k8s.io/api/autoscaling/v1",
"k8s.io/api/autoscaling/v2beta1",
"k8s.io/api/core/v1",
"k8s.io/apimachinery/pkg/api/errors",
"k8s.io/apimachinery/pkg/api/resource",
"k8s.io/apimachinery/pkg/apis/meta/v1",
"k8s.io/apimachinery/pkg/labels",
"k8s.io/apimachinery/pkg/runtime",
"k8s.io/apimachinery/pkg/runtime/schema",
"k8s.io/apimachinery/pkg/runtime/serializer",
"k8s.io/apimachinery/pkg/types",
"k8s.io/apimachinery/pkg/util/intstr",
"k8s.io/apimachinery/pkg/util/runtime",
"k8s.io/apimachinery/pkg/util/sets/types",
"k8s.io/apimachinery/pkg/util/wait",
@@ -700,6 +711,7 @@
"k8s.io/client-go/discovery",
"k8s.io/client-go/discovery/fake",
"k8s.io/client-go/kubernetes",
"k8s.io/client-go/kubernetes/fake",
"k8s.io/client-go/kubernetes/scheme",
"k8s.io/client-go/kubernetes/typed/core/v1",
"k8s.io/client-go/plugin/pkg/client/auth/gcp",


@@ -47,7 +47,7 @@ required = [
[[constraint]]
name = "github.com/knative/pkg"
revision = "c15d7c8f2220a7578b33504df6edefa948c845ae"
revision = "f9612ef73847258e381e749c4f45b0f5e03b66e9"
[[override]]
name = "github.com/golang/glog"


@@ -186,7 +186,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Copyright 2018 Weaveworks. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.


@@ -3,14 +3,20 @@ VERSION?=$(shell grep 'VERSION' pkg/version/version.go | awk '{ print $$4 }' | t
VERSION_MINOR:=$(shell grep 'VERSION' pkg/version/version.go | awk '{ print $$4 }' | tr -d '"' | rev | cut -d'.' -f2- | rev)
PATCH:=$(shell grep 'VERSION' pkg/version/version.go | awk '{ print $$4 }' | tr -d '"' | awk -F. '{print $$NF}')
SOURCE_DIRS = cmd pkg/apis pkg/controller pkg/server pkg/logging pkg/version
LT_VERSION?=$(shell grep 'VERSION' cmd/loadtester/main.go | awk '{ print $$4 }' | tr -d '"' | head -n1)
run:
go run cmd/flagger/* -kubeconfig=$$HOME/.kube/config -log-level=info -metrics-server=https://prometheus.istio.weavedx.com
go run cmd/flagger/* -kubeconfig=$$HOME/.kube/config -log-level=info \
-metrics-server=https://prometheus.istio.weavedx.com \
-slack-url=https://hooks.slack.com/services/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 \
-slack-channel="devops-alerts"
build:
docker build -t stefanprodan/flagger:$(TAG) . -f Dockerfile
push:
docker push stefanprodan/flagger:$(TAG)
docker tag stefanprodan/flagger:$(TAG) quay.io/stefanprodan/flagger:$(VERSION)
docker push quay.io/stefanprodan/flagger:$(VERSION)
fmt:
gofmt -l -s -w $(SOURCE_DIRS)
@@ -25,12 +31,13 @@ test: test-fmt test-codegen
go test ./...
helm-package:
cd charts/ && helm package flagger/ && helm package podinfo-flagger/ && helm package grafana/
cd charts/ && helm package ./*
mv charts/*.tgz docs/
helm repo index docs --url https://stefanprodan.github.io/flagger --merge ./docs/index.yaml
helm-up:
helm upgrade --install flagger ./charts/flagger --namespace=istio-system --set crd.create=false
helm upgrade --install flagger-grafana ./charts/grafana --namespace=istio-system
version-set:
@next="$(TAG)" && \
@@ -39,6 +46,7 @@ version-set:
sed -i '' "s/flagger:$$current/flagger:$$next/g" artifacts/flagger/deployment.yaml && \
sed -i '' "s/tag: $$current/tag: $$next/g" charts/flagger/values.yaml && \
sed -i '' "s/appVersion: $$current/appVersion: $$next/g" charts/flagger/Chart.yaml && \
sed -i '' "s/version: $$current/version: $$next/g" charts/flagger/Chart.yaml && \
echo "Version $$next set in code, deployment and charts"
version-up:
@@ -52,10 +60,10 @@ version-up:
dev-up: version-up
@echo "Starting build/push/deploy pipeline for $(VERSION)"
docker build -t stefanprodan/flagger:$(VERSION) . -f Dockerfile
docker push stefanprodan/flagger:$(VERSION)
docker build -t quay.io/stefanprodan/flagger:$(VERSION) . -f Dockerfile
docker push quay.io/stefanprodan/flagger:$(VERSION)
kubectl apply -f ./artifacts/flagger/crd.yaml
helm upgrade --install flagger ./charts/flagger --namespace=istio-system --set crd.create=false
helm upgrade -i flagger ./charts/flagger --namespace=istio-system --set crd.create=false
release:
git tag $(VERSION)
@@ -68,3 +76,11 @@ release-set: fmt version-set helm-package
git tag $(VERSION)
git push origin $(VERSION)
reset-test:
kubectl delete -f ./artifacts/namespaces
kubectl apply -f ./artifacts/namespaces
kubectl apply -f ./artifacts/canaries
loadtester-push:
docker build -t quay.io/stefanprodan/flagger-loadtester:$(LT_VERSION) . -f Dockerfile.loadtester
docker push quay.io/stefanprodan/flagger-loadtester:$(LT_VERSION)

README.md

@@ -2,13 +2,42 @@
[![build](https://travis-ci.org/stefanprodan/flagger.svg?branch=master)](https://travis-ci.org/stefanprodan/flagger)
[![report](https://goreportcard.com/badge/github.com/stefanprodan/flagger)](https://goreportcard.com/report/github.com/stefanprodan/flagger)
[![codecov](https://codecov.io/gh/stefanprodan/flagger/branch/master/graph/badge.svg)](https://codecov.io/gh/stefanprodan/flagger)
[![license](https://img.shields.io/github/license/stefanprodan/flagger.svg)](https://github.com/stefanprodan/flagger/blob/master/LICENSE)
[![release](https://img.shields.io/github/release/stefanprodan/flagger/all.svg)](https://github.com/stefanprodan/flagger/releases)
Flagger is a Kubernetes operator that automates the promotion of canary deployments
using Istio routing for traffic shifting and Prometheus metrics for canary analysis.
The project is currently in experimental phase and it is expected that breaking changes
to the API will be made in the upcoming releases.
using Istio routing for traffic shifting and Prometheus metrics for canary analysis.
The canary analysis can be extended with webhooks for running acceptance tests,
load tests or any other custom validation.
Flagger implements a control loop that gradually shifts traffic to the canary while measuring key performance
indicators like HTTP request success rate, average request duration and pod health.
Based on the analysis of the KPIs, a canary is promoted or aborted, and the analysis result is published to Slack.
![flagger-overview](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/diagrams/flagger-canary-overview.png)
### Documentation
Flagger documentation can be found at [docs.flagger.app](https://docs.flagger.app)
* Install
* [Flagger install on Kubernetes](https://docs.flagger.app/install/flagger-install-on-kubernetes)
* [Flagger install on GKE](https://docs.flagger.app/install/flagger-install-on-google-cloud)
* How it works
* [Canary custom resource](https://docs.flagger.app/how-it-works#canary-custom-resource)
* [Virtual Service](https://docs.flagger.app/how-it-works#virtual-service)
* [Canary deployment stages](https://docs.flagger.app/how-it-works#canary-deployment)
* [Canary analysis](https://docs.flagger.app/how-it-works#canary-analysis)
* [HTTP metrics](https://docs.flagger.app/how-it-works#http-metrics)
* [Webhooks](https://docs.flagger.app/how-it-works#webhooks)
* [Load testing](https://docs.flagger.app/how-it-works#load-testing)
* Usage
* [Canary promotions and rollbacks](https://docs.flagger.app/usage/progressive-delivery)
* [Monitoring](https://docs.flagger.app/usage/monitoring)
* [Alerting](https://docs.flagger.app/usage/alerting)
* Tutorials
* [Canary deployments with Helm charts and Weave Flux](https://docs.flagger.app/tutorials/canary-helm-gitops)
### Install
@@ -19,120 +48,70 @@ Deploy Flagger in the `istio-system` namespace using Helm:
```bash
# add the Helm repository
helm repo add flagger https://stefanprodan.github.io/flagger
helm repo add flagger https://flagger.app
# install or upgrade
helm upgrade -i flagger flagger/flagger \
--namespace=istio-system \
--set metricsServer=http://prometheus.istio-system:9090 \
--set controlLoopInterval=1m
--set metricsServer=http://prometheus.istio-system:9090
```
Flagger is compatible with Kubernetes >1.10.0 and Istio >1.0.0.
Flagger is compatible with Kubernetes >1.11.0 and Istio >1.0.0.
### Usage
### Canary CRD
Flagger requires two Kubernetes [deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/):
one for the version you want to upgrade called _primary_ and one for the _canary_.
Each deployment must have a corresponding ClusterIP [service](https://kubernetes.io/docs/concepts/services-networking/service/)
that exposes a port named http or https. These services are used as destinations in an Istio [virtual service](https://istio.io/docs/reference/config/istio.networking.v1alpha3/#VirtualService).
Flagger takes a Kubernetes deployment and optionally a horizontal pod autoscaler (HPA),
then creates a series of objects (Kubernetes deployments, ClusterIP services and Istio virtual services).
These objects expose the application on the mesh and drive the canary analysis and promotion.
![flagger-overview](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/diagrams/flagger-overview.png)
Flagger keeps track of ConfigMaps and Secrets referenced by a Kubernetes Deployment and triggers a canary analysis if any of those objects change.
When promoting a workload to production, both code (container images) and configuration (config maps and secrets) are synchronised.
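For example, a canary deployment that references ConfigMaps in an environment source or a volume will have those objects tracked, so editing them restarts the analysis (a minimal sketch with hypothetical names):

```yaml
    spec:
      containers:
        - name: podinfo
          envFrom:
            - configMapRef:
                name: podinfo-config
          volumeMounts:
            - name: settings
              mountPath: /etc/podinfo
      volumes:
        - name: settings
          configMap:
            name: podinfo-settings
```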
Gated canary promotion stages:
* scan for canary deployments
* check Istio virtual service routes are mapped to primary and canary ClusterIP services
* check primary and canary deployments status
* halt rollout if a rolling update is underway
* halt rollout if pods are unhealthy
* increase canary traffic weight percentage from 0% to 5% (step weight)
* check canary HTTP request success rate and latency
* halt rollout if any metric is under the specified threshold
* increment the failed checks counter
* check if the number of failed checks reached the threshold
* route all traffic to primary
* scale to zero the canary deployment and mark it as failed
* wait for the canary deployment to be updated (revision bump) and start over
* increase canary traffic weight by 5% (step weight) till it reaches 50% (max weight)
* halt rollout while canary request success rate is under the threshold
* halt rollout while canary request duration P99 is over the threshold
* halt rollout if the primary or canary deployment becomes unhealthy
* halt rollout while canary deployment is being scaled up/down by HPA
* promote canary to primary
* copy canary deployment spec template over primary
* wait for primary rolling update to finish
* halt rollout if pods are unhealthy
* route all traffic to primary
* scale to zero the canary deployment
* mark rollout as finished
* wait for the canary deployment to be updated (revision bump) and start over
You can change the canary analysis _max weight_ and the _step weight_ percentage in the Flagger's custom resource.
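For example, reaching 50% in increments of 10% instead of 5% would look like this in the `canaryAnalysis` section (a hypothetical tuning):

```yaml
  canaryAnalysis:
    # max traffic percentage routed to canary
    maxWeight: 50
    # canary increment step
    stepWeight: 10
```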
Assuming the primary deployment is named _podinfo_ and the canary one _podinfo-canary_, Flagger will require
a virtual service configured with weight-based routing:
For a deployment named _podinfo_, a canary promotion can be defined using Flagger's custom resource:
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: podinfo
spec:
hosts:
- podinfo
http:
- route:
- destination:
host: podinfo
port:
number: 9898
weight: 100
- destination:
host: podinfo-canary
port:
number: 9898
weight: 0
```
Primary and canary services should expose a port named http:
```yaml
apiVersion: v1
kind: Service
metadata:
name: podinfo-canary
spec:
type: ClusterIP
selector:
app: podinfo-canary
ports:
- name: http
port: 9898
targetPort: 9898
```
Based on the two deployments, services and virtual service, a canary promotion can be defined using Flagger's custom resource:
```yaml
apiVersion: flagger.app/v1beta1
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
name: podinfo
namespace: test
spec:
targetKind: Deployment
virtualService:
# deployment reference
targetRef:
apiVersion: apps/v1
kind: Deployment
name: podinfo
primary:
# the maximum time in seconds for the canary deployment
# to make progress before it is rolled back (default 600s)
progressDeadlineSeconds: 60
# HPA reference (optional)
autoscalerRef:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
name: podinfo
host: podinfo
canary:
name: podinfo-canary
host: podinfo-canary
service:
# container port
port: 9898
# Istio gateways (optional)
gateways:
- public-gateway.istio-system.svc.cluster.local
# Istio virtual service host names (optional)
hosts:
- podinfo.example.com
# Istio virtual service HTTP match conditions (optional)
match:
- uri:
prefix: /
# Istio virtual service HTTP rewrite (optional)
rewrite:
uri: /
# for emergency cases when you want to ship changes
# in production without analysing the canary (default false)
skipAnalysis: false
canaryAnalysis:
# max number of failed checks
# before rolling back the canary
# schedule interval (default 60s)
interval: 1m
# max number of failed metric checks before rollback
threshold: 10
# max traffic percentage routed to canary
# percentage (0-100)
@@ -140,6 +119,7 @@ spec:
# canary increment step
# percentage (0-100)
stepWeight: 5
# Istio Prometheus checks
metrics:
- name: istio_requests_total
# minimum req success rate (non 5xx responses)
@@ -150,256 +130,23 @@ spec:
# maximum req duration P99
# milliseconds
threshold: 500
interval: 1m
interval: 30s
# external checks (optional)
webhooks:
- name: load-test
url: http://flagger-loadtester.test/
timeout: 5s
metadata:
cmd: "hey -z 1m -q 10 -c 2 http://podinfo.test:9898/"
```
The canary analysis uses the following PromQL queries:
_HTTP requests success rate percentage_
```sql
sum(
rate(
istio_requests_total{
reporter="destination",
destination_workload_namespace=~"$namespace",
destination_workload=~"$workload",
response_code!~"5.*"
}[$interval]
)
)
/
sum(
rate(
istio_requests_total{
reporter="destination",
destination_workload_namespace=~"$namespace",
destination_workload=~"$workload"
}[$interval]
)
)
```
_HTTP requests milliseconds duration P99_
```sql
histogram_quantile(0.99,
sum(
irate(
istio_request_duration_seconds_bucket{
reporter="destination",
destination_workload=~"$workload",
destination_workload_namespace=~"$namespace"
}[$interval]
)
) by (le)
)
```
### Automated canary analysis, promotions and rollbacks
![flagger-canary](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/diagrams/flagger-canary-hpa.png)
Create a test namespace with Istio sidecar injection enabled:
```bash
export REPO=https://raw.githubusercontent.com/stefanprodan/flagger/master
kubectl apply -f ${REPO}/artifacts/namespaces/test.yaml
```
Create the primary deployment, service and hpa:
```bash
kubectl apply -f ${REPO}/artifacts/workloads/primary-deployment.yaml
kubectl apply -f ${REPO}/artifacts/workloads/primary-service.yaml
kubectl apply -f ${REPO}/artifacts/workloads/primary-hpa.yaml
```
Create the canary deployment, service and hpa:
```bash
kubectl apply -f ${REPO}/artifacts/workloads/canary-deployment.yaml
kubectl apply -f ${REPO}/artifacts/workloads/canary-service.yaml
kubectl apply -f ${REPO}/artifacts/workloads/canary-hpa.yaml
```
Create a virtual service (replace the Istio gateway and the internet domain with your own):
```bash
kubectl apply -f ${REPO}/artifacts/workloads/virtual-service.yaml
```
Create a canary promotion custom resource:
```bash
kubectl apply -f ${REPO}/artifacts/rollouts/podinfo.yaml
```
Canary promotion output:
```
kubectl -n test describe canary/podinfo
Status:
Canary Revision: 16271121
Failed Checks: 6
State: finished
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Synced 3m flagger Starting canary deployment for podinfo.test
Normal Synced 3m flagger Advance podinfo.test canary weight 5
Normal Synced 3m flagger Advance podinfo.test canary weight 10
Normal Synced 3m flagger Advance podinfo.test canary weight 15
Warning Synced 3m flagger Halt podinfo.test advancement request duration 2.525s > 500ms
Warning Synced 3m flagger Halt podinfo.test advancement request duration 1.567s > 500ms
Warning Synced 3m flagger Halt podinfo.test advancement request duration 823ms > 500ms
Normal Synced 2m flagger Advance podinfo.test canary weight 20
Normal Synced 2m flagger Advance podinfo.test canary weight 25
Normal Synced 1m flagger Advance podinfo.test canary weight 30
Warning Synced 1m flagger Halt podinfo.test advancement success rate 82.33% < 99%
Warning Synced 1m flagger Halt podinfo.test advancement success rate 87.22% < 99%
Warning Synced 1m flagger Halt podinfo.test advancement success rate 94.74% < 99%
Normal Synced 1m flagger Advance podinfo.test canary weight 35
Normal Synced 55s flagger Advance podinfo.test canary weight 40
Normal Synced 45s flagger Advance podinfo.test canary weight 45
Normal Synced 35s flagger Advance podinfo.test canary weight 50
Normal Synced 25s flagger Copying podinfo-canary.test template spec to podinfo.test
Warning Synced 15s flagger Waiting for podinfo.test rollout to finish: 1 of 2 updated replicas are available
Normal Synced 5s flagger Promotion completed! Scaling down podinfo-canary.test
```
During the canary analysis you can generate HTTP 500 errors and high latency to test whether Flagger pauses the rollout.
Create a tester pod and exec into it:
```bash
kubectl -n test run tester --image=quay.io/stefanprodan/podinfo:1.2.1 -- ./podinfo --port=9898
kubectl -n test exec -it tester-xx-xx sh
```
Generate HTTP 500 errors:
```bash
watch curl http://podinfo-canary:9898/status/500
```
Generate latency:
```bash
watch curl http://podinfo-canary:9898/delay/1
```
When the number of failed checks reaches the canary analysis threshold, traffic is routed back to the primary,
the canary is scaled to zero, and the rollout is marked as failed.
```
kubectl -n test describe canary/podinfo
Status:
Canary Revision: 16695041
Failed Checks: 10
State: failed
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Synced 3m flagger Starting canary deployment for podinfo.test
Normal Synced 3m flagger Advance podinfo.test canary weight 5
Normal Synced 3m flagger Advance podinfo.test canary weight 10
Normal Synced 3m flagger Advance podinfo.test canary weight 15
Normal Synced 3m flagger Halt podinfo.test advancement success rate 69.17% < 99%
Normal Synced 2m flagger Halt podinfo.test advancement success rate 61.39% < 99%
Normal Synced 2m flagger Halt podinfo.test advancement success rate 55.06% < 99%
Normal Synced 2m flagger Halt podinfo.test advancement success rate 47.00% < 99%
Normal Synced 2m flagger (combined from similar events): Halt podinfo.test advancement success rate 38.08% < 99%
Warning Synced 1m flagger Rolling back podinfo-canary.test failed checks threshold reached 10
Warning Synced 1m flagger Canary failed! Scaling down podinfo-canary.test
```
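The promotion and rollback walkthroughs above follow a simple state machine: each interval the weight advances by the step if the metric checks pass, a failed check halts the advancement, and the canary is rolled back once the failed checks reach the threshold. A simplified model of that loop (not Flagger's actual implementation; the metric checks are stubbed out as booleans):

```python
# Simplified model of Flagger's canary analysis loop. checks is an
# iterable of booleans, one metric-check result per interval.
def run_analysis(checks, step_weight=5, max_weight=50, threshold=10):
    weight = 0
    failed = 0
    for ok in checks:
        if ok:
            weight += step_weight            # advance canary traffic
            if weight >= max_weight:
                return ("promoted", weight)  # copy canary spec to primary
        else:
            failed += 1                      # halt advancement this interval
            if failed >= threshold:
                return ("rolled back", weight)  # route traffic to primary
    return ("in progress", weight)

# Three transient failures still promote, like the first walkthrough above.
print(run_analysis([True] * 3 + [False] * 3 + [True] * 7))   # ('promoted', 50)
# Ten failed checks trigger a rollback, like the second walkthrough.
print(run_analysis([True] * 3 + [False] * 10))               # ('rolled back', 15)
```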
Trigger a new canary deployment by updating the canary image:
```bash
kubectl -n test set image deployment/podinfo-canary \
podinfod=quay.io/stefanprodan/podinfo:1.2.1
```
Flagger detects that the canary revision has changed and starts a new rollout:
```
kubectl -n test describe canary/podinfo
Status:
Canary Revision: 19871136
Failed Checks: 0
State: finished
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Synced 3m flagger New revision detected podinfo-canary.test old 17211012 new 17246876
Normal Synced 3m flagger Scaling up podinfo.test
Warning Synced 3m flagger Waiting for podinfo.test rollout to finish: 0 of 1 updated replicas are available
Normal Synced 3m flagger Advance podinfo.test canary weight 5
Normal Synced 3m flagger Advance podinfo.test canary weight 10
Normal Synced 3m flagger Advance podinfo.test canary weight 15
Normal Synced 2m flagger Advance podinfo.test canary weight 20
Normal Synced 2m flagger Advance podinfo.test canary weight 25
Normal Synced 1m flagger Advance podinfo.test canary weight 30
Normal Synced 1m flagger Advance podinfo.test canary weight 35
Normal Synced 55s flagger Advance podinfo.test canary weight 40
Normal Synced 45s flagger Advance podinfo.test canary weight 45
Normal Synced 35s flagger Advance podinfo.test canary weight 50
Normal Synced 25s flagger Copying podinfo-canary.test template spec to podinfo.test
Warning Synced 15s flagger Waiting for podinfo.test rollout to finish: 1 of 2 updated replicas are available
Normal Synced 5s flagger Promotion completed! Scaling down podinfo-canary.test
```
### Monitoring
Flagger comes with a Grafana dashboard made for canary analysis.
Install Grafana with Helm:
```bash
helm upgrade -i flagger-grafana flagger/grafana \
--namespace=istio-system \
--set url=http://prometheus.istio-system:9090
```
The dashboard shows the RED and USE metrics for the primary and canary workloads:
![flagger-grafana](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/screens/grafana-canary-analysis.png)
The canary errors and latency spikes have been recorded as Kubernetes events and logged by Flagger in JSON format:
```
kubectl -n istio-system logs deployment/flagger --tail=100 | jq .msg
Starting canary deployment for podinfo.test
Advance podinfo.test canary weight 5
Advance podinfo.test canary weight 10
Advance podinfo.test canary weight 15
Advance podinfo.test canary weight 20
Advance podinfo.test canary weight 25
Advance podinfo.test canary weight 30
Advance podinfo.test canary weight 35
Halt podinfo.test advancement success rate 98.69% < 99%
Advance podinfo.test canary weight 40
Halt podinfo.test advancement request duration 1.515s > 500ms
Advance podinfo.test canary weight 45
Advance podinfo.test canary weight 50
Copying podinfo-canary.test template spec to podinfo-primary.test
Scaling down podinfo-canary.test
Promotion completed! podinfo-canary.test revision 81289
```
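The `jq .msg` pipeline above simply extracts the `msg` field from each JSON log line. The same filtering can be done in a few lines of Python (the sample records below are illustrative; only the `msg` field is shown in the output above, the `level` field is an assumption):

```python
import json

# Extract the "msg" field from Flagger's JSON logs, as `jq .msg` does,
# then keep only the halt events.
log_lines = [
    '{"level":"info","msg":"Advance podinfo.test canary weight 35"}',
    '{"level":"info","msg":"Halt podinfo.test advancement success rate 98.69% < 99%"}',
    '{"level":"info","msg":"Halt podinfo.test advancement request duration 1.515s > 500ms"}',
]

messages = [json.loads(line)["msg"] for line in log_lines]
halts = [m for m in messages if m.startswith("Halt")]
print(halts)
```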
For more details on how the canary analysis and promotion works please [read the docs](https://docs.flagger.app/how-it-works).
### Roadmap
* Extend the canary analysis and promotion to workload types other than Kubernetes deployments, such as Flux Helm releases or OpenFaaS functions
* Extend the validation mechanism to support metrics other than HTTP success rate and latency
* Add A/B testing capabilities using fixed routing based on HTTP headers and cookies match conditions
* Add support for comparing the canary metrics to the primary ones and doing the validation based on the deviation between the two
* Alerting: Trigger Alertmanager on successful or failed promotions (Prometheus instrumentation of the canary analysis)
* Reporting: publish canary analysis results to Slack/Jira/etc
### Contributing


@@ -0,0 +1,68 @@
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # container port
    port: 9898
    # Istio gateways (optional)
    gateways:
    - public-gateway.istio-system.svc.cluster.local
    # Istio virtual service host names (optional)
    hosts:
    - app.istio.weavedx.com
    # Istio virtual service HTTP match conditions (optional)
    match:
    - uri:
        prefix: /
    # Istio virtual service HTTP rewrite (optional)
    rewrite:
      uri: /
  # for emergency cases when you want to ship changes
  # in production without analysing the canary
  skipAnalysis: false
  canaryAnalysis:
    # schedule interval (default 60s)
    interval: 10s
    # max number of failed metric checks before rollback
    threshold: 10
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 5
    # Istio Prometheus checks
    metrics:
    - name: istio_requests_total
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      threshold: 99
      interval: 1m
    - name: istio_request_duration_seconds_bucket
      # maximum req duration P99
      # milliseconds
      threshold: 500
      interval: 30s
    # external checks (optional)
    webhooks:
    - name: load-test
      url: http://flagger-loadtester.test/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://podinfo.test:9898/"


@@ -0,0 +1,67 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
  namespace: test
  labels:
    app: podinfo
spec:
  minReadySeconds: 5
  revisionHistoryLimit: 5
  progressDeadlineSeconds: 60
  strategy:
    rollingUpdate:
      maxUnavailable: 0
    type: RollingUpdate
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
      labels:
        app: podinfo
    spec:
      containers:
      - name: podinfod
        image: quay.io/stefanprodan/podinfo:1.4.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9898
          name: http
          protocol: TCP
        command:
        - ./podinfo
        - --port=9898
        - --level=info
        - --random-delay=false
        - --random-error=false
        env:
        - name: PODINFO_UI_COLOR
          value: blue
        livenessProbe:
          exec:
            command:
            - podcli
            - check
            - http
            - localhost:9898/healthz
          initialDelaySeconds: 5
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - podcli
            - check
            - http
            - localhost:9898/readyz
          initialDelaySeconds: 5
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 2000m
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 64Mi


@@ -0,0 +1,19 @@
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      # scale up if usage is above
      # 99% of the requested CPU (100m)
      targetAverageUtilization: 99


@@ -0,0 +1,6 @@
apiVersion: v1
kind: Namespace
metadata:
  name: test
  labels:
    istio-injection: enabled


@@ -0,0 +1,26 @@
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  name: backend
  namespace: test
  annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.chart-image: regexp:^1.4.*
spec:
  releaseName: backend
  chart:
    repository: https://flagger.app/
    name: podinfo
    version: 2.0.0
  values:
    image:
      repository: quay.io/stefanprodan/podinfo
      tag: 1.4.0
    httpServer:
      timeout: 30s
    canary:
      enabled: true
      istioIngress:
        enabled: false
      loadtest:
        enabled: true


@@ -0,0 +1,27 @@
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  name: frontend
  namespace: test
  annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.chart-image: semver:~1.4
spec:
  releaseName: frontend
  chart:
    repository: https://flagger.app/
    name: podinfo
    version: 2.0.0
  values:
    image:
      repository: quay.io/stefanprodan/podinfo
      tag: 1.4.0
    backend: http://backend-podinfo:9898/echo
    canary:
      enabled: true
      istioIngress:
        enabled: true
        gateway: public-gateway.istio-system.svc.cluster.local
        host: frontend.istio.example.com
      loadtest:
        enabled: true


@@ -0,0 +1,18 @@
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  name: loadtester
  namespace: test
  annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.chart-image: glob:0.*
spec:
  releaseName: flagger-loadtester
  chart:
    repository: https://flagger.app/
    name: loadtester
    version: 0.1.0
  values:
    image:
      repository: quay.io/stefanprodan/flagger-loadtester
      tag: 0.1.0


@@ -0,0 +1,58 @@
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # container port
    port: 9898
    # Istio gateways (optional)
    gateways:
    - public-gateway.istio-system.svc.cluster.local
    # Istio virtual service host names (optional)
    hosts:
    - app.iowa.weavedx.com
  canaryAnalysis:
    # schedule interval (default 60s)
    interval: 10s
    # max number of failed metric checks before rollback
    threshold: 10
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 5
    # Istio Prometheus checks
    metrics:
    - name: istio_requests_total
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      threshold: 99
      interval: 1m
    - name: istio_request_duration_seconds_bucket
      # maximum req duration P99
      # milliseconds
      threshold: 500
      interval: 30s
    # external checks (optional)
    webhooks:
    - name: load-test
      url: http://flagger-loadtester.test/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://podinfo.test:9898/"


@@ -0,0 +1,16 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: podinfo-config-env
  namespace: test
data:
  color: blue
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: podinfo-config-vol
  namespace: test
data:
  output: console
  textmode: "true"


@@ -0,0 +1,89 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
  namespace: test
  labels:
    app: podinfo
spec:
  minReadySeconds: 5
  revisionHistoryLimit: 5
  progressDeadlineSeconds: 60
  strategy:
    rollingUpdate:
      maxUnavailable: 0
    type: RollingUpdate
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
      labels:
        app: podinfo
    spec:
      containers:
      - name: podinfod
        image: quay.io/stefanprodan/podinfo:1.3.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9898
          name: http
          protocol: TCP
        command:
        - ./podinfo
        - --port=9898
        - --level=info
        - --random-delay=false
        - --random-error=false
        env:
        - name: PODINFO_UI_COLOR
          valueFrom:
            configMapKeyRef:
              name: podinfo-config-env
              key: color
        - name: SECRET_USER
          valueFrom:
            secretKeyRef:
              name: podinfo-secret-env
              key: user
        livenessProbe:
          exec:
            command:
            - podcli
            - check
            - http
            - localhost:9898/healthz
          initialDelaySeconds: 5
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - podcli
            - check
            - http
            - localhost:9898/readyz
          initialDelaySeconds: 5
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 2000m
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 64Mi
        volumeMounts:
        - name: configs
          mountPath: /etc/podinfo/configs
          readOnly: true
        - name: secrets
          mountPath: /etc/podinfo/secrets
          readOnly: true
      volumes:
      - name: configs
        configMap:
          name: podinfo-config-vol
      - name: secrets
        secret:
          secretName: podinfo-secret-vol


@@ -0,0 +1,19 @@
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      # scale up if usage is above
      # 99% of the requested CPU (100m)
      targetAverageUtilization: 99


@@ -0,0 +1,16 @@
apiVersion: v1
kind: Secret
metadata:
  name: podinfo-secret-env
  namespace: test
data:
  password: cGFzc3dvcmQ=
  user: YWRtaW4=
---
apiVersion: v1
kind: Secret
metadata:
  name: podinfo-secret-vol
  namespace: test
data:
  key: cGFzc3dvcmQ=


@@ -4,47 +4,82 @@ metadata:
name: canaries.flagger.app
spec:
group: flagger.app
version: v1beta1
version: v1alpha3
versions:
- name: v1beta1
- name: v1alpha3
served: true
storage: true
- name: v1alpha2
served: true
storage: false
- name: v1alpha1
served: true
storage: false
names:
plural: canaries
singular: canary
kind: Canary
categories:
- all
scope: Namespaced
subresources:
status: {}
additionalPrinterColumns:
- name: Status
type: string
JSONPath: .status.phase
- name: Weight
type: string
JSONPath: .status.canaryWeight
- name: LastTransitionTime
type: string
JSONPath: .status.lastTransitionTime
validation:
openAPIV3Schema:
properties:
spec:
required:
- targetKind
- virtualService
- primary
- canary
- canaryAnalysis
- targetRef
- service
- canaryAnalysis
properties:
targetKind:
type: string
virtualService:
progressDeadlineSeconds:
type: number
targetRef:
type: object
required: ['apiVersion', 'kind', 'name']
properties:
apiVersion:
type: string
kind:
type: string
name:
type: string
primary:
autoscalerRef:
anyOf:
- type: string
- type: object
required: ['apiVersion', 'kind', 'name']
properties:
apiVersion:
type: string
kind:
type: string
name:
type: string
host:
type: string
canary:
service:
type: object
required: ['port']
properties:
name:
type: string
host:
type: string
port:
type: number
skipAnalysis:
type: boolean
canaryAnalysis:
properties:
interval:
type: string
pattern: "^[0-9]+(m|s)"
threshold:
type: number
maxWeight:
@@ -56,12 +91,27 @@ spec:
properties:
items:
type: object
required: ['name', 'interval', 'threshold']
properties:
name:
type: string
interval:
type: string
pattern: "^[0-9]+(m)"
pattern: "^[0-9]+(m|s)"
threshold:
type: number
webhooks:
type: array
properties:
items:
type: object
required: ['name', 'url', 'timeout']
properties:
name:
type: string
url:
type: string
format: url
timeout:
type: string
pattern: "^[0-9]+(m|s)"


@@ -22,8 +22,8 @@ spec:
       serviceAccountName: flagger
       containers:
       - name: flagger
-        image: stefanprodan/flagger:0.0.1
-        imagePullPolicy: Always
+        image: quay.io/stefanprodan/flagger:0.6.0
+        imagePullPolicy: IfNotPresent
         ports:
         - name: http
           containerPort: 8080
@@ -41,6 +41,7 @@ spec:
             - --timeout=2
             - --spider
             - http://localhost:8080/healthz
+          timeoutSeconds: 5
         readinessProbe:
           exec:
             command:
@@ -50,6 +51,7 @@ spec:
             - --timeout=2
             - --spider
             - http://localhost:8080/healthz
+          timeoutSeconds: 5
         resources:
           limits:
             memory: "512Mi"


@@ -0,0 +1,27 @@
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
    tls:
      httpsRedirect: true
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - "*"
    tls:
      mode: SIMPLE
      privateKey: /etc/istio/ingressgateway-certs/tls.key
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt


@@ -0,0 +1,443 @@
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: prometheus
labels:
app: prometheus
rules:
- apiGroups: [""]
resources:
- nodes
- services
- endpoints
- pods
- nodes/proxy
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources:
- configmaps
verbs: ["get"]
- nonResourceURLs: ["/metrics"]
verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: prometheus
labels:
app: prometheus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: prometheus
namespace: istio-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus
namespace: istio-system
labels:
app: prometheus
---
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus
namespace: istio-system
labels:
app: prometheus
data:
prometheus.yml: |-
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'istio-mesh'
# Override the global default and scrape targets from this job every 5 seconds.
scrape_interval: 5s
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-telemetry;prometheus
# Scrape config for envoy stats
- job_name: 'envoy-stats'
metrics_path: /stats/prometheus
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_container_port_name]
action: keep
regex: '.*-envoy-prom'
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:15090
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: pod_name
metric_relabel_configs:
# Exclude some of the envoy metrics that have massive cardinality
# This list may need to be pruned further moving forward, as informed
# by performance and scalability testing.
- source_labels: [ cluster_name ]
regex: '(outbound|inbound|prometheus_stats).*'
action: drop
- source_labels: [ tcp_prefix ]
regex: '(outbound|inbound|prometheus_stats).*'
action: drop
- source_labels: [ listener_address ]
regex: '(.+)'
action: drop
- source_labels: [ http_conn_manager_listener_prefix ]
regex: '(.+)'
action: drop
- source_labels: [ http_conn_manager_prefix ]
regex: '(.+)'
action: drop
- source_labels: [ __name__ ]
regex: 'envoy_tls.*'
action: drop
- source_labels: [ __name__ ]
regex: 'envoy_tcp_downstream.*'
action: drop
- source_labels: [ __name__ ]
regex: 'envoy_http_(stats|admin).*'
action: drop
- source_labels: [ __name__ ]
regex: 'envoy_cluster_(lb|retry|bind|internal|max|original).*'
action: drop
- job_name: 'istio-policy'
# Override the global default and scrape targets from this job every 5 seconds.
scrape_interval: 5s
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-policy;http-monitoring
- job_name: 'istio-telemetry'
# Override the global default and scrape targets from this job every 5 seconds.
scrape_interval: 5s
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-telemetry;http-monitoring
- job_name: 'pilot'
# Override the global default and scrape targets from this job every 5 seconds.
scrape_interval: 5s
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-pilot;http-monitoring
- job_name: 'galley'
# Override the global default and scrape targets from this job every 5 seconds.
scrape_interval: 5s
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-galley;http-monitoring
# scrape config for API servers
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- default
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: kubernetes;https
# scrape config for nodes (kubelet)
- job_name: 'kubernetes-nodes'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
# Scrape config for Kubelet cAdvisor.
#
# This is required for Kubernetes 1.7.3 and later, where cAdvisor metrics
# (those whose names begin with 'container_') have been removed from the
# Kubelet metrics endpoint. This job scrapes the cAdvisor endpoint to
# retrieve those metrics.
#
# In Kubernetes 1.7.0-1.7.2, these metrics are only exposed on the cAdvisor
# HTTP endpoint; use "replacement: /api/v1/nodes/${1}:4194/proxy/metrics"
# in that case (and ensure cAdvisor's HTTP server hasn't been disabled with
# the --cadvisor-port=0 Kubelet flag).
#
# This job is not necessary and should be removed in Kubernetes 1.6 and
# earlier versions, or it will cause the metrics to be scraped twice.
- job_name: 'kubernetes-cadvisor'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
# scrape config for service endpoints.
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs: # If first two labels are present, pod should be scraped by the istio-secure job.
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_sidecar_istio_io_status]
action: drop
regex: (.+)
- source_labels: [__meta_kubernetes_pod_annotation_istio_mtls]
action: drop
regex: (true)
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: pod_name
- job_name: 'kubernetes-pods-istio-secure'
scheme: https
tls_config:
ca_file: /etc/istio-certs/root-cert.pem
cert_file: /etc/istio-certs/cert-chain.pem
key_file: /etc/istio-certs/key.pem
insecure_skip_verify: true # prometheus does not support secure naming.
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
# sidecar status annotation is added by sidecar injector and
# istio_workload_mtls_ability can be specifically placed on a pod to indicate its ability to receive mtls traffic.
- source_labels: [__meta_kubernetes_pod_annotation_sidecar_istio_io_status, __meta_kubernetes_pod_annotation_istio_mtls]
action: keep
regex: (([^;]+);([^;]*))|(([^;]*);(true))
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__] # Only keep address that is host:port
action: keep # otherwise an extra target with ':443' is added for https scheme
regex: ([^:]+):(\d+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: pod_name
---
# Source: istio/charts/prometheus/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
name: prometheus
namespace: istio-system
annotations:
prometheus.io/scrape: 'true'
labels:
name: prometheus
spec:
selector:
app: prometheus
ports:
- name: http-prometheus
protocol: TCP
port: 9090
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus
namespace: istio-system
labels:
app: prometheus
spec:
replicas: 1
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
annotations:
sidecar.istio.io/inject: "false"
scheduler.alpha.kubernetes.io/critical-pod: ""
spec:
serviceAccountName: prometheus
containers:
- name: prometheus
image: "docker.io/prom/prometheus:v2.7.1"
imagePullPolicy: IfNotPresent
args:
- '--storage.tsdb.retention=6h'
- '--config.file=/etc/prometheus/prometheus.yml'
ports:
- containerPort: 9090
name: http
livenessProbe:
httpGet:
path: /-/healthy
port: 9090
readinessProbe:
httpGet:
path: /-/ready
port: 9090
resources:
requests:
cpu: 10m
volumeMounts:
- name: config-volume
mountPath: /etc/prometheus
- mountPath: /etc/istio-certs
name: istio-certs
volumes:
- name: config-volume
configMap:
name: prometheus
- name: istio-certs
secret:
defaultMode: 420
optional: true
secretName: istio.default


@@ -0,0 +1,60 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flagger-loadtester
  labels:
    app: flagger-loadtester
spec:
  selector:
    matchLabels:
      app: flagger-loadtester
  template:
    metadata:
      labels:
        app: flagger-loadtester
      annotations:
        prometheus.io/scrape: "true"
    spec:
      containers:
      - name: loadtester
        image: quay.io/stefanprodan/flagger-loadtester:0.1.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8080
        command:
        - ./loadtester
        - -port=8080
        - -log-level=info
        - -timeout=1h
        - -log-cmd-output=true
        livenessProbe:
          exec:
            command:
            - wget
            - --quiet
            - --tries=1
            - --timeout=4
            - --spider
            - http://localhost:8080/healthz
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - wget
            - --quiet
            - --tries=1
            - --timeout=4
            - --spider
            - http://localhost:8080/healthz
          timeoutSeconds: 5
        resources:
          limits:
            memory: "512Mi"
            cpu: "1000m"
          requests:
            memory: "32Mi"
            cpu: "10m"
        securityContext:
          readOnlyRootFilesystem: true
          runAsUser: 10001


@@ -0,0 +1,15 @@
apiVersion: v1
kind: Service
metadata:
  name: flagger-loadtester
  labels:
    app: flagger-loadtester
spec:
  type: ClusterIP
  selector:
    app: flagger-loadtester
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http


@@ -1,42 +0,0 @@
# monitor events: watch "kubectl -n test describe rollout/podinfo | sed -n 35,1000p"
# run tester: kubectl run -n test tester --image=quay.io/stefanprodan/podinfo:1.2.1 -- ./podinfo --port=9898
# generate latency: watch curl http://podinfo-canary:9898/delay/1
# generate errors: watch curl http://podinfo-canary:9898/status/500
# run load test: kubectl run -n test -it --rm --restart=Never hey --image=stefanprodan/loadtest -- sh
# generate load: hey -z 2m -h2 -m POST -d '{test: 1}' -c 10 -q 5 http://podinfo:9898/api/echo
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  targetKind: Deployment
  virtualService:
    name: podinfo
  primary:
    name: podinfo
    host: podinfo
  canary:
    name: podinfo-canary
    host: podinfo-canary
  canaryAnalysis:
    # max number of failed metric checks
    # before rolling back the canary
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
    - name: istio_requests_total
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      threshold: 99
      interval: 1m
    - name: istio_request_duration_seconds_bucket
      # maximum req duration P99
      # milliseconds
      threshold: 500
      interval: 30s


@@ -1,36 +0,0 @@
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfoc
  namespace: test
spec:
  targetKind: Deployment
  virtualService:
    name: podinfoc
  primary:
    name: podinfoc-primary
    host: podinfoc-primary
  canary:
    name: podinfoc-canary
    host: podinfoc-canary
  canaryAnalysis:
    # max number of failed metric checks
    # before rolling back the canary
    threshold: 10
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
    - name: istio_requests_total
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      threshold: 99
      interval: 1m
    - name: istio_request_duration_seconds_bucket
      # maximum req duration P99
      # milliseconds
      threshold: 500
      interval: 30s


@@ -0,0 +1,45 @@
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: podinfo
  namespace: test
spec:
  gateways:
  - public-gateway.istio-system.svc.cluster.local
  - mesh
  hosts:
  - podinfo.istio.weavedx.com
  - podinfo
  http:
  - route:
    - destination:
        host: podinfo
        subset: primary
      weight: 50
    - destination:
        host: podinfo
        subset: canary
      weight: 50
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: podinfo-destination
  namespace: test
spec:
  host: podinfo
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpCookie:
          name: istiouser
          ttl: 30s
  subsets:
  - name: primary
    labels:
      app: podinfo
      role: primary
  - name: canary
    labels:
      app: podinfo
      role: canary

View File

@@ -0,0 +1,34 @@
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: podinfo
namespace: test
spec:
gateways:
- public-gateway.istio-system.svc.cluster.local
- mesh
hosts:
- podinfo.iowa.weavedx.com
- podinfo
http:
- match:
- headers:
user-agent:
regex: ^(?!.*Chrome)(?=.*\bSafari\b).*$
route:
- destination:
host: podinfo-primary
port:
number: 9898
weight: 0
- destination:
host: podinfo
port:
number: 9898
weight: 100
- route:
- destination:
host: podinfo-primary
port:
number: 9898
weight: 100

View File

@@ -0,0 +1,25 @@
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: podinfo
namespace: test
labels:
app: podinfo
spec:
gateways:
- public-gateway.istio-system.svc.cluster.local
- mesh
hosts:
- podinfo.iowa.weavedx.com
- podinfo
http:
- route:
- destination:
host: podinfo-primary
port:
number: 9898
weight: 100
mirror:
host: podinfo
port:
number: 9898

View File

@@ -0,0 +1,26 @@
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: podinfo
namespace: test
labels:
app: podinfo
spec:
gateways:
- public-gateway.istio-system.svc.cluster.local
- mesh
hosts:
- podinfo.iowa.weavedx.com
- podinfo
http:
- route:
- destination:
host: podinfo-primary
port:
number: 9898
weight: 100
- destination:
host: podinfo
port:
number: 9898
weight: 0

View File

@@ -1,10 +1,10 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: podinfo-canary
name: podinfo
namespace: test
labels:
app: podinfo-canary
app: podinfo
spec:
replicas: 1
strategy:
@@ -13,13 +13,13 @@ spec:
type: RollingUpdate
selector:
matchLabels:
app: podinfo-canary
app: podinfo
template:
metadata:
annotations:
prometheus.io/scrape: "true"
labels:
app: podinfo-canary
app: podinfo
spec:
containers:
- name: podinfod

View File

@@ -1,13 +1,13 @@
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: podinfo-canary
name: podinfo
namespace: test
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: podinfo-canary
name: podinfo
minReplicas: 2
maxReplicas: 3
metrics:

View File

@@ -1,14 +1,14 @@
apiVersion: v1
kind: Service
metadata:
name: podinfo-canary
name: podinfo
namespace: test
labels:
app: podinfo-canary
app: podinfo
spec:
type: ClusterIP
selector:
app: podinfo-canary
app: podinfo
ports:
- name: http
port: 9898

View File

@@ -1,10 +1,10 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: podinfo
name: podinfo-primary
namespace: test
labels:
app: podinfo
app: podinfo-primary
spec:
replicas: 1
strategy:
@@ -13,13 +13,13 @@ spec:
type: RollingUpdate
selector:
matchLabels:
app: podinfo
app: podinfo-primary
template:
metadata:
annotations:
prometheus.io/scrape: "true"
labels:
app: podinfo
app: podinfo-primary
spec:
containers:
- name: podinfod

View File

@@ -1,13 +1,13 @@
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: podinfo
name: podinfo-primary
namespace: test
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: podinfo
name: podinfo-primary
minReplicas: 2
maxReplicas: 4
metrics:

View File

@@ -1,14 +1,14 @@
apiVersion: v1
kind: Service
metadata:
name: podinfo
name: podinfo-primary
namespace: test
labels:
app: podinfo
app: podinfo-primary
spec:
type: ClusterIP
selector:
app: podinfo
app: podinfo-primary
ports:
- name: http
port: 9898

View File

@@ -8,18 +8,19 @@ metadata:
spec:
gateways:
- public-gateway.istio-system.svc.cluster.local
- mesh
hosts:
- app.istio.weavedx.com
- podinfo.istio.weavedx.com
- podinfo
http:
- route:
- destination:
host: podinfo
host: podinfo-primary
port:
number: 9898
weight: 100
- destination:
host: podinfo-canary
host: podinfo
port:
number: 9898
weight: 0

View File

@@ -1,6 +1,19 @@
apiVersion: v1
name: flagger
version: 0.0.1
appVersion: 0.0.1
version: 0.6.0
appVersion: 0.6.0
kubeVersion: ">=1.11.0-0"
engine: gotpl
description: Flagger is a Kubernetes operator that automates the promotion of canary deployments using Istio routing for traffic shifting and Prometheus metrics for canary analysis.
home: https://github.com/stefanprodan/flagger
home: https://docs.flagger.app
icon: https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/logo/flagger-icon.png
sources:
- https://github.com/stefanprodan/flagger
maintainers:
- name: stefanprodan
url: https://github.com/stefanprodan
email: stefanprodan@users.noreply.github.com
keywords:
- canary
- istio
- gitops

View File

@@ -1,14 +1,29 @@
# Flagger
Flagger is a Kubernetes operator that automates the promotion of canary deployments
using Istio routing for traffic shifting and Prometheus metrics for canary analysis.
[Flagger](https://github.com/stefanprodan/flagger) is a Kubernetes operator that automates the promotion of
canary deployments using Istio routing for traffic shifting and Prometheus metrics for canary analysis.
Flagger implements a control loop that gradually shifts traffic to the canary while measuring key performance indicators
such as HTTP request success rate, average request duration and pod health.
Based on the KPI analysis, a canary is promoted or aborted, and the analysis result is published to Slack.
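A minimal sketch of the per-iteration decision described above, assuming the thresholds used in the example specs (99% minimum success rate, 500ms maximum P99 duration, 5 failed checks before rollback); the function and its signature are illustrative, not Flagger's actual code:

```python
def analyze(success_rate, duration_p99_ms, failed_checks,
            min_success=99, max_duration=500, threshold=5):
    """One iteration of canary analysis: a metric miss increments
    the failure counter; reaching the threshold aborts the canary.
    (Hypothetical helper for illustration only.)"""
    if success_rate < min_success or duration_p99_ms > max_duration:
        failed_checks += 1
    if failed_checks >= threshold:
        return "rollback", failed_checks
    return "advance", failed_checks

print(analyze(99.5, 400, 0))  # ('advance', 0)
print(analyze(95.0, 400, 4))  # ('rollback', 5)
```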
## Prerequisites
* Kubernetes >= 1.11
* Istio >= 1.0
* Prometheus >= 2.6
## Installing the Chart
Add Flagger Helm repository:
```console
helm repo add flagger https://flagger.app
```
To install the chart with the release name `flagger`:
```console
$ helm upgrade --install flagger ./charts/flagger --namespace=istio-system
$ helm install --name flagger --namespace istio-system flagger/flagger
```
The command deploys Flagger on the Kubernetes cluster in the istio-system namespace.
@@ -30,9 +45,15 @@ The following tables lists the configurable parameters of the Flagger chart and
Parameter | Description | Default
--- | --- | ---
`image.repository` | image repository | `stefanprodan/flagger`
`image.repository` | image repository | `quay.io/stefanprodan/flagger`
`image.tag` | image tag | `<VERSION>`
`image.pullPolicy` | image pull policy | `IfNotPresent`
`metricsServer` | Prometheus URL | `http://prometheus.istio-system:9090`
`slack.url` | Slack incoming webhook | None
`slack.channel` | Slack channel | None
`slack.user` | Slack username | `flagger`
`rbac.create` | if `true`, create and use RBAC resources | `true`
`crd.create` | if `true`, create Flagger's CRDs | `true`
`resources.requests/cpu` | pod CPU request | `10m`
`resources.requests/memory` | pod memory request | `32Mi`
`resources.limits/cpu` | pod CPU limit | `1000m`
@@ -44,19 +65,20 @@ Parameter | Description | Default
Specify each parameter using the `--set key=value[,key=value]` argument to `helm upgrade`. For example,
```console
$ helm upgrade --install flagger ./charts/flagger \
--namespace=istio-system \
--set=image.tag=0.0.2
$ helm upgrade -i flagger flagger/flagger \
--namespace istio-system \
--set slack.url=https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK \
--set slack.channel=general
```
Alternatively, a YAML file that specifies the values for the above parameters can be provided while installing the chart. For example,
```console
$ helm upgrade --install flagger ./charts/flagger \
--namespace=istio-system \
$ helm upgrade -i flagger flagger/flagger \
--namespace istio-system \
-f values.yaml
```
> **Tip**: You can use the default [values.yaml](values.yaml)

View File

@@ -1,4 +1,10 @@
{{/* vim: set filetype=mustache: */}}
{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "flagger.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{/*
Expand the name of the chart.
*/}}
@@ -25,8 +31,12 @@ If release name contains chart name it will be used as a full name.
{{- end -}}
{{/*
Create chart name and version as used by the chart label.
Create the name of the service account to use
*/}}
{{- define "flagger.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" -}}
{{- define "flagger.serviceAccountName" -}}
{{- if .Values.serviceAccount.create -}}
{{ default (include "flagger.fullname" .) .Values.serviceAccount.name }}
{{- else -}}
{{ default "default" .Values.serviceAccount.name }}
{{- end -}}
{{- end -}}

View File

@@ -1,9 +1,11 @@
{{- if .Values.serviceAccount.create }}
apiVersion: v1
kind: ServiceAccount
metadata:
name: {{ template "flagger.name" . }}
name: {{ template "flagger.serviceAccountName" . }}
labels:
app: {{ template "flagger.name" . }}
chart: {{ template "flagger.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
helm.sh/chart: {{ template "flagger.chart" . }}
app.kubernetes.io/name: {{ template "flagger.name" . }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

View File

@@ -5,47 +5,82 @@ metadata:
name: canaries.flagger.app
spec:
group: flagger.app
version: v1beta1
version: v1alpha3
versions:
- name: v1beta1
- name: v1alpha3
served: true
storage: true
- name: v1alpha2
served: true
storage: false
- name: v1alpha1
served: true
storage: false
names:
plural: canaries
singular: canary
kind: Canary
categories:
- all
scope: Namespaced
subresources:
status: {}
additionalPrinterColumns:
- name: Status
type: string
JSONPath: .status.phase
- name: Weight
type: string
JSONPath: .status.canaryWeight
- name: LastTransitionTime
type: string
JSONPath: .status.lastTransitionTime
validation:
openAPIV3Schema:
properties:
spec:
required:
- targetKind
- virtualService
- primary
- canary
- targetRef
- service
- canaryAnalysis
properties:
targetKind:
type: string
virtualService:
progressDeadlineSeconds:
type: number
targetRef:
type: object
required: ['apiVersion', 'kind', 'name']
properties:
apiVersion:
type: string
kind:
type: string
name:
type: string
primary:
autoscalerRef:
anyOf:
- type: string
- type: object
required: ['apiVersion', 'kind', 'name']
properties:
apiVersion:
type: string
kind:
type: string
name:
type: string
host:
type: string
canary:
service:
type: object
required: ['port']
properties:
name:
type: string
host:
type: string
port:
type: number
skipAnalysis:
type: boolean
canaryAnalysis:
properties:
interval:
type: string
pattern: "^[0-9]+(m|s)"
threshold:
type: number
maxWeight:
@@ -57,12 +92,28 @@ spec:
properties:
items:
type: object
required: ['name', 'interval', 'threshold']
properties:
name:
type: string
interval:
type: string
pattern: "^[0-9]+(m)"
pattern: "^[0-9]+(m|s)"
threshold:
type: number
webhooks:
type: array
properties:
items:
type: object
required: ['name', 'url', 'timeout']
properties:
name:
type: string
url:
type: string
format: url
timeout:
type: string
pattern: "^[0-9]+(m|s)"
{{- end }}
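The `interval` and `timeout` patterns in the schema above (`^[0-9]+(m|s)`) can be exercised directly; note the pattern is not anchored at the end, so a value like `30seconds` would also pass validation:

```python
import re

# interval/timeout pattern from the CRD schema above
pattern = re.compile(r"^[0-9]+(m|s)")

for value in ["1m", "30s", "10", "fast", "30seconds"]:
    print(value, bool(pattern.match(value)))
```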

View File

@@ -3,25 +3,25 @@ kind: Deployment
metadata:
name: {{ include "flagger.fullname" . }}
labels:
app: {{ include "flagger.name" . }}
chart: {{ include "flagger.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
helm.sh/chart: {{ template "flagger.chart" . }}
app.kubernetes.io/name: {{ template "flagger.name" . }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
app.kubernetes.io/instance: {{ .Release.Name }}
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: {{ include "flagger.name" . }}
release: {{ .Release.Name }}
app.kubernetes.io/name: {{ template "flagger.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
template:
metadata:
labels:
app: {{ include "flagger.name" . }}
release: {{ .Release.Name }}
app.kubernetes.io/name: {{ template "flagger.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
spec:
serviceAccountName: flagger
serviceAccountName: {{ template "flagger.serviceAccountName" . }}
containers:
- name: flagger
securityContext:
@@ -35,26 +35,32 @@ spec:
command:
- ./flagger
- -log-level=info
- -control-loop-interval={{ .Values.controlLoopInterval }}
- -metrics-server={{ .Values.metricsServer }}
{{- if .Values.slack.url }}
- -slack-url={{ .Values.slack.url }}
- -slack-user={{ .Values.slack.user }}
- -slack-channel={{ .Values.slack.channel }}
{{- end }}
livenessProbe:
exec:
command:
- wget
- --quiet
- --tries=1
- --timeout=2
- --timeout=4
- --spider
- http://localhost:8080/healthz
timeoutSeconds: 5
readinessProbe:
exec:
command:
- wget
- --quiet
- --tries=1
- --timeout=2
- --timeout=4
- --spider
- http://localhost:8080/healthz
timeoutSeconds: 5
resources:
{{ toYaml .Values.resources | indent 12 }}
{{- with .Values.nodeSelector }}

View File

@@ -4,10 +4,10 @@ kind: ClusterRole
metadata:
name: {{ template "flagger.fullname" . }}
labels:
app: {{ template "flagger.name" . }}
chart: {{ template "flagger.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
helm.sh/chart: {{ template "flagger.chart" . }}
app.kubernetes.io/name: {{ template "flagger.name" . }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
app.kubernetes.io/instance: {{ .Release.Name }}
rules:
- apiGroups: ['*']
resources: ['*']
@@ -20,16 +20,16 @@ kind: ClusterRoleBinding
metadata:
name: {{ template "flagger.fullname" . }}
labels:
app: {{ template "flagger.name" . }}
chart: {{ template "flagger.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
helm.sh/chart: {{ template "flagger.chart" . }}
app.kubernetes.io/name: {{ template "flagger.name" . }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
app.kubernetes.io/instance: {{ .Release.Name }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: {{ template "flagger.fullname" . }}
subjects:
- name: {{ template "flagger.name" . }}
namespace: {{ .Release.Namespace | quote }}
- name: {{ template "flagger.serviceAccountName" . }}
namespace: {{ .Release.Namespace }}
kind: ServiceAccount
{{- end }}

View File

@@ -1,17 +1,30 @@
# Default values for flagger.
image:
repository: stefanprodan/flagger
tag: 0.0.1
repository: quay.io/stefanprodan/flagger
tag: 0.6.0
pullPolicy: IfNotPresent
controlLoopInterval: "10s"
metricsServer: "http://prometheus.istio-system.svc.cluster.local:9090"
crd:
slack:
user: flagger
channel:
# incoming webhook https://api.slack.com/incoming-webhooks
url:
serviceAccount:
# serviceAccount.create: Whether to create a service account or not
create: true
# serviceAccount.name: The name of the service account to create or use
name: ""
rbac:
# rbac.create: `true` if rbac resources should be created
create: true
crd:
# crd.create: `true` if custom resource definitions should be created
create: true
nameOverride: ""

View File

@@ -1,6 +1,13 @@
apiVersion: v1
name: grafana
version: 5.2.4
appVersion: 5.2.0
description: A Helm chart for monitoring progressive deployments powered by Istio and Flagger
home: https://github.com/stefanprodan/flagger
version: 1.0.0
appVersion: 5.4.3
description: Grafana dashboards for monitoring Flagger canary deployments
icon: https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/logo/flagger-icon.png
home: https://flagger.app
sources:
- https://github.com/stefanprodan/flagger
maintainers:
- name: stefanprodan
url: https://github.com/stefanprodan
email: stefanprodan@users.noreply.github.com

View File

@@ -1,16 +1,31 @@
# Weave Cloud Grafana
# Flagger Grafana
Grafana v5 with Kubernetes dashboards and Prometheus and Weave Cloud data sources.
Grafana dashboards for monitoring progressive deployments powered by Istio, Prometheus and Flagger.
![flagger-grafana](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/screens/grafana-canary-analysis.png)
## Prerequisites
* Kubernetes >= 1.11
* Istio >= 1.0
* Prometheus >= 2.6
## Installing the Chart
To install the chart with the release name `my-release`:
Add Flagger Helm repository:
```console
$ helm install stable/grafana --name my-release \
--set service.type=NodePort \
--set token=WEAVE-TOKEN \
--set password=admin
helm repo add flagger https://flagger.app
```
To install the chart with the release name `flagger-grafana`:
```console
helm upgrade -i flagger-grafana flagger/grafana \
--namespace=istio-system \
--set url=http://prometheus:9090 \
--set user=admin \
--set password=admin
```
The command deploys Grafana on the Kubernetes cluster in the istio-system namespace.
@@ -18,10 +33,10 @@ The [configuration](#configuration) section lists the parameters that can be con
## Uninstalling the Chart
To uninstall/delete the `my-release` deployment:
To uninstall/delete the `flagger-grafana` deployment:
```console
$ helm delete --purge my-release
helm delete --purge flagger-grafana
```
The command removes all the Kubernetes components associated with the chart and deletes the release.
@@ -34,32 +49,31 @@ Parameter | Description | Default
--- | --- | ---
`image.repository` | Image repository | `grafana/grafana`
`image.pullPolicy` | Image pull policy | `IfNotPresent`
`image.tag` | Image tag | `5.0.1`
`image.tag` | Image tag | `<VERSION>`
`replicaCount` | desired number of pods | `1`
`resources` | pod resources | `none`
`tolerations` | List of node taints to tolerate | `[]`
`affinity` | node/pod affinities | `node`
`nodeSelector` | node labels for pod assignment | `{}`
`service.type` | type of service | `LoadBalancer`
`url` | Prometheus URL, used when Weave token is empty | `http://prometheus:9090`
`service.type` | type of service | `ClusterIP`
`url` | Prometheus URL, used when Weave Cloud token is empty | `http://prometheus:9090`
`token` | Weave Cloud token | `none`
`user` | Grafana admin username | `admin`
`password` | Grafana admin password | `none`
`password` | Grafana admin password | `admin`
Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`. For example,
```console
$ helm install stable/grafana --name my-release \
--set=token=WEAVE-TOKEN \
--set password=admin
helm install flagger/grafana --name flagger-grafana \
--set token=WEAVE-CLOUD-TOKEN
```
Alternatively, a YAML file that specifies the values for the above parameters can be provided while installing the chart. For example,
```console
$ helm install stable/grafana --name my-release -f values.yaml
helm install flagger/grafana --name flagger-grafana -f values.yaml
```
> **Tip**: You can use the default [values.yaml](values.yaml)

View File

@@ -2,7 +2,6 @@
"annotations": {
"list": [
{
"$$hashKey": "object:1587",
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
@@ -16,12 +15,12 @@
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"iteration": 1534587617141,
"id": 1,
"iteration": 1549736611069,
"links": [],
"panels": [
{
"content": "<div class=\"dashboard-header text-center\">\n<span>RED: $primary.$namespace</span>\n</div>",
"content": "<div class=\"dashboard-header text-center\">\n<span>RED: $canary.$namespace</span>\n</div>",
"gridPos": {
"h": 3,
"w": 24,
@@ -179,7 +178,6 @@
"tableColumn": "",
"targets": [
{
"$$hashKey": "object:2857",
"expr": "sum(irate(istio_requests_total{reporter=\"destination\",destination_workload_namespace=~\"$namespace\",destination_workload=~\"$primary\",response_code!~\"5.*\"}[30s])) / sum(irate(istio_requests_total{reporter=\"destination\",destination_workload_namespace=~\"$namespace\",destination_workload=~\"$primary\"}[30s]))",
"format": "time_series",
"intervalFactor": 1,
@@ -344,7 +342,6 @@
"tableColumn": "",
"targets": [
{
"$$hashKey": "object:2810",
"expr": "sum(irate(istio_requests_total{reporter=\"destination\",destination_workload_namespace=~\"$namespace\",destination_workload=~\"$canary\",response_code!~\"5.*\"}[30s])) / sum(irate(istio_requests_total{reporter=\"destination\",destination_workload_namespace=~\"$namespace\",destination_workload=~\"$canary\"}[30s]))",
"format": "time_series",
"intervalFactor": 1,
@@ -363,7 +360,7 @@
"value": "null"
}
],
"valueName": "avg"
"valueName": "current"
},
{
"aliasColors": {},
@@ -432,6 +429,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Primary: Request Duration",
"tooltip": {
@@ -464,7 +462,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -533,6 +535,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Canary: Request Duration",
"tooltip": {
@@ -565,10 +568,14 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"content": "<div class=\"dashboard-header text-center\">\n<span>USE: $primary.$namespace</span>\n</div>",
"content": "<div class=\"dashboard-header text-center\">\n<span>USE: $canary.$namespace</span>\n</div>",
"gridPos": {
"h": 3,
"w": 24,
@@ -623,7 +630,6 @@
"steppedLine": false,
"targets": [
{
"$$hashKey": "object:1685",
"expr": "sum(rate(container_cpu_usage_seconds_total{cpu=\"total\",namespace=\"$namespace\",pod_name=~\"$primary.*\", container_name!~\"POD|istio-proxy\"}[1m])) by (pod_name)",
"format": "time_series",
"hide": false,
@@ -634,6 +640,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Primary: CPU Usage by Pod",
"tooltip": {
@@ -651,7 +658,6 @@
},
"yaxes": [
{
"$$hashKey": "object:1845",
"format": "s",
"label": "CPU seconds / second",
"logBase": 1,
@@ -660,7 +666,6 @@
"show": true
},
{
"$$hashKey": "object:1846",
"format": "short",
"label": null,
"logBase": 1,
@@ -668,7 +673,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -711,8 +720,7 @@
"steppedLine": false,
"targets": [
{
"$$hashKey": "object:1685",
"expr": "sum(rate(container_cpu_usage_seconds_total{cpu=\"total\",namespace=\"$namespace\",pod_name=~\"$canary.*\", container_name!~\"POD|istio-proxy\"}[1m])) by (pod_name)",
"expr": "sum(rate(container_cpu_usage_seconds_total{cpu=\"total\",namespace=\"$namespace\",pod_name=~\"$canary.*\", pod_name!~\"$primary.*\", container_name!~\"POD|istio-proxy\"}[1m])) by (pod_name)",
"format": "time_series",
"hide": false,
"intervalFactor": 1,
@@ -722,6 +730,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Canary: CPU Usage by Pod",
"tooltip": {
@@ -739,7 +748,6 @@
},
"yaxes": [
{
"$$hashKey": "object:1845",
"format": "s",
"label": "CPU seconds / second",
"logBase": 1,
@@ -748,7 +756,6 @@
"show": true
},
{
"$$hashKey": "object:1846",
"format": "short",
"label": null,
"logBase": 1,
@@ -756,7 +763,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -799,7 +810,6 @@
"steppedLine": false,
"targets": [
{
"$$hashKey": "object:1685",
"expr": "sum(container_memory_working_set_bytes{namespace=\"$namespace\",pod_name=~\"$primary.*\", container_name!~\"POD|istio-proxy\"}) by (pod_name)",
"format": "time_series",
"hide": false,
@@ -811,6 +821,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Primary: Memory Usage by Pod",
"tooltip": {
@@ -828,7 +839,6 @@
},
"yaxes": [
{
"$$hashKey": "object:1845",
"decimals": null,
"format": "bytes",
"label": "",
@@ -838,7 +848,6 @@
"show": true
},
{
"$$hashKey": "object:1846",
"format": "short",
"label": null,
"logBase": 1,
@@ -846,7 +855,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -889,8 +902,7 @@
"steppedLine": false,
"targets": [
{
"$$hashKey": "object:1685",
"expr": "sum(container_memory_working_set_bytes{namespace=\"$namespace\",pod_name=~\"$canary.*\", container_name!~\"POD|istio-proxy\"}) by (pod_name)",
"expr": "sum(container_memory_working_set_bytes{namespace=\"$namespace\",pod_name=~\"$canary.*\", pod_name!~\"$primary.*\", container_name!~\"POD|istio-proxy\"}) by (pod_name)",
"format": "time_series",
"hide": false,
"interval": "",
@@ -901,6 +913,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Canary: Memory Usage by Pod",
"tooltip": {
@@ -918,7 +931,6 @@
},
"yaxes": [
{
"$$hashKey": "object:1845",
"decimals": null,
"format": "bytes",
"label": "",
@@ -928,7 +940,6 @@
"show": true
},
{
"$$hashKey": "object:1846",
"format": "short",
"label": null,
"logBase": 1,
@@ -936,7 +947,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -975,12 +990,10 @@
"renderer": "flot",
"seriesOverrides": [
{
"$$hashKey": "object:3641",
"alias": "received",
"color": "#f9d9f9"
},
{
"$$hashKey": "object:3649",
"alias": "transmited",
"color": "#f29191"
}
@@ -990,7 +1003,6 @@
"steppedLine": false,
"targets": [
{
"$$hashKey": "object:2598",
"expr": "sum(rate (container_network_receive_bytes_total{namespace=\"$namespace\",pod_name=~\"$primary.*\"}[1m])) ",
"format": "time_series",
"intervalFactor": 1,
@@ -998,7 +1010,6 @@
"refId": "A"
},
{
"$$hashKey": "object:3245",
"expr": "-sum (rate (container_network_transmit_bytes_total{namespace=\"$namespace\",pod_name=~\"$primary.*\"}[1m]))",
"format": "time_series",
"intervalFactor": 1,
@@ -1008,6 +1019,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Primary: Network I/O",
"tooltip": {
@@ -1025,7 +1037,6 @@
},
"yaxes": [
{
"$$hashKey": "object:1845",
"decimals": null,
"format": "Bps",
"label": "",
@@ -1035,7 +1046,6 @@
"show": true
},
{
"$$hashKey": "object:1846",
"format": "short",
"label": null,
"logBase": 1,
@@ -1043,7 +1053,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -1082,12 +1096,10 @@
"renderer": "flot",
"seriesOverrides": [
{
"$$hashKey": "object:3641",
"alias": "received",
"color": "#f9d9f9"
},
{
"$$hashKey": "object:3649",
"alias": "transmited",
"color": "#f29191"
}
@@ -1097,16 +1109,14 @@
"steppedLine": false,
"targets": [
{
"$$hashKey": "object:2598",
"expr": "sum(rate (container_network_receive_bytes_total{namespace=\"$namespace\",pod_name=~\"$canary.*\"}[1m])) ",
"expr": "sum(rate (container_network_receive_bytes_total{namespace=\"$namespace\",pod_name=~\"$canary.*\",pod_name!~\"$primary.*\"}[1m])) ",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "received",
"refId": "A"
},
{
"$$hashKey": "object:3245",
"expr": "-sum (rate (container_network_transmit_bytes_total{namespace=\"$namespace\",pod_name=~\"$canary.*\"}[1m]))",
"expr": "-sum (rate (container_network_transmit_bytes_total{namespace=\"$namespace\",pod_name=~\"$canary.*\",pod_name!~\"$primary.*\"}[1m]))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "transmited",
@@ -1115,6 +1125,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Canary: Network I/O",
"tooltip": {
@@ -1132,7 +1143,6 @@
},
"yaxes": [
{
"$$hashKey": "object:1845",
"decimals": null,
"format": "Bps",
"label": "",
@@ -1142,7 +1152,6 @@
"show": true
},
{
"$$hashKey": "object:1846",
"format": "short",
"label": null,
"logBase": 1,
@@ -1150,10 +1159,14 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"content": "<div class=\"dashboard-header text-center\">\n<span>IN/OUTBOUND: $primary.$namespace</span>\n</div>",
"content": "<div class=\"dashboard-header text-center\">\n<span>IN/OUTBOUND: $canary.$namespace</span>\n</div>",
"gridPos": {
"h": 3,
"w": 24,
@@ -1205,7 +1218,6 @@
"steppedLine": false,
"targets": [
{
"$$hashKey": "object:1953",
"expr": "round(sum(irate(istio_requests_total{connection_security_policy=\"mutual_tls\", destination_workload_namespace=~\"$namespace\", destination_workload=~\"$primary\", reporter=\"destination\"}[30s])) by (source_workload, source_workload_namespace, response_code), 0.001)",
"format": "time_series",
"hide": false,
@@ -1215,7 +1227,6 @@
"step": 2
},
{
"$$hashKey": "object:1954",
"expr": "round(sum(irate(istio_requests_total{connection_security_policy!=\"mutual_tls\", destination_workload_namespace=~\"$namespace\", destination_workload=~\"$primary\", reporter=\"destination\"}[30s])) by (source_workload, source_workload_namespace, response_code), 0.001)",
"format": "time_series",
"hide": false,
@@ -1227,6 +1238,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Primary: Incoming Requests by Source And Response Code",
"tooltip": {
@@ -1246,7 +1258,6 @@
},
"yaxes": [
{
"$$hashKey": "object:1999",
"format": "ops",
"label": null,
"logBase": 1,
@@ -1255,7 +1266,6 @@
"show": true
},
{
"$$hashKey": "object:2000",
"format": "short",
"label": null,
"logBase": 1,
@@ -1263,7 +1273,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -1323,6 +1337,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Canary: Incoming Requests by Source And Response Code",
"tooltip": {
@@ -1357,7 +1372,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -1416,6 +1435,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Primary: Outgoing Requests by Destination And Response Code",
"tooltip": {
@@ -1450,7 +1470,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -1509,6 +1533,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Canary: Outgoing Requests by Destination And Response Code",
"tooltip": {
@@ -1543,7 +1568,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"refresh": "10s",
@@ -1555,10 +1584,12 @@
{
"allValue": null,
"current": {
"text": "demo",
"value": "demo"
"selected": true,
"text": "test",
"value": "test"
},
"datasource": "prometheus",
"definition": "",
"hide": 0,
"includeAll": false,
"label": "Namespace",
@@ -1568,6 +1599,7 @@
"query": "query_result(sum(istio_requests_total) by (destination_workload_namespace) or sum(istio_tcp_sent_bytes_total) by (destination_workload_namespace))",
"refresh": 1,
"regex": "/.*_namespace=\"([^\"]*).*/",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
@@ -1578,10 +1610,12 @@
{
"allValue": null,
"current": {
"text": "primary",
"value": "primary"
"selected": false,
"text": "backend-primary",
"value": "backend-primary"
},
"datasource": "prometheus",
"definition": "",
"hide": 0,
"includeAll": false,
"label": "Primary",
@@ -1591,6 +1625,7 @@
"query": "query_result(sum(istio_requests_total{destination_workload_namespace=~\"$namespace\"}) by (destination_service_name))",
"refresh": 1,
"regex": "/.*destination_service_name=\"([^\"]*).*/",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [],
@@ -1601,10 +1636,12 @@
{
"allValue": null,
"current": {
"text": "canary",
"value": "canary"
"selected": true,
"text": "backend",
"value": "backend"
},
"datasource": "prometheus",
"definition": "",
"hide": 0,
"includeAll": false,
"label": "Canary",
@@ -1614,6 +1651,7 @@
"query": "query_result(sum(istio_requests_total{destination_workload_namespace=~\"$namespace\"}) by (destination_service_name))",
"refresh": 1,
"regex": "/.*destination_service_name=\"([^\"]*).*/",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [],
@@ -1653,7 +1691,7 @@
]
},
"timezone": "",
"title": "Flagger",
"title": "Flagger canary",
"uid": "RdykD7tiz",
"version": 2
}
"version": 3
}

View File

@@ -38,12 +38,21 @@ spec:
# path: /
# port: http
env:
- name: GF_PATHS_PROVISIONING
value: /etc/grafana/provisioning/
{{- if .Values.password }}
- name: GF_SECURITY_ADMIN_USER
value: {{ .Values.user }}
- name: GF_SECURITY_ADMIN_PASSWORD
value: {{ .Values.password }}
- name: GF_PATHS_PROVISIONING
value: /etc/grafana/provisioning/
{{- else }}
- name: GF_AUTH_BASIC_ENABLED
value: "false"
- name: GF_AUTH_ANONYMOUS_ENABLED
value: "true"
- name: GF_AUTH_ANONYMOUS_ORG_ROLE
value: Admin
{{- end }}
volumeMounts:
- name: grafana
mountPath: /var/lib/grafana

View File

@@ -6,7 +6,7 @@ replicaCount: 1
image:
repository: grafana/grafana
tag: 5.2.4
tag: 5.4.3
pullPolicy: IfNotPresent
service:
@@ -28,7 +28,7 @@ tolerations: []
affinity: {}
user: admin
password: admin
password:
# Istio Prometheus instance
url: http://prometheus:9090

View File

@@ -0,0 +1,22 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/

View File

@@ -0,0 +1,20 @@
apiVersion: v1
name: loadtester
version: 0.1.0
appVersion: 0.1.0
kubeVersion: ">=1.11.0-0"
engine: gotpl
description: Flagger's load testing service based on rakyll/hey that generates traffic during canary analysis when configured as a webhook.
home: https://docs.flagger.app
icon: https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/logo/flagger-icon.png
sources:
- https://github.com/stefanprodan/flagger
maintainers:
- name: stefanprodan
url: https://github.com/stefanprodan
email: stefanprodan@users.noreply.github.com
keywords:
- canary
- istio
- gitops
- load testing

View File

@@ -0,0 +1,78 @@
# Flagger load testing service
[Flagger's](https://github.com/stefanprodan/flagger) load testing service is based on
[rakyll/hey](https://github.com/rakyll/hey)
and can be used to generate traffic during canary analysis when configured as a webhook.
## Prerequisites
* Kubernetes >= 1.11
* Istio >= 1.0
## Installing the Chart
Add Flagger Helm repository:
```console
helm repo add flagger https://flagger.app
```
To install the chart with the release name `flagger-loadtester`:
```console
helm upgrade -i flagger-loadtester flagger/loadtester
```
The command deploys the load tester on the Kubernetes cluster in the default namespace.
> **Tip**: The namespace where you deploy the load tester should have Istio sidecar injection enabled.
The [configuration](#configuration) section lists the parameters that can be configured during installation.
## Uninstalling the Chart
To uninstall/delete the `flagger-loadtester` deployment:
```console
helm delete --purge flagger-loadtester
```
The command removes all the Kubernetes components associated with the chart and deletes the release.
## Configuration
The following table lists the configurable parameters of the load tester chart and their default values.
Parameter | Description | Default
--- | --- | ---
`image.repository` | Image repository | `quay.io/stefanprodan/flagger-loadtester`
`image.pullPolicy` | Image pull policy | `IfNotPresent`
`image.tag` | Image tag | `<VERSION>`
`replicaCount` | desired number of pods | `1`
`resources.requests.cpu` | CPU requests | `10m`
`resources.requests.memory` | memory requests | `64Mi`
`tolerations` | List of node taints to tolerate | `[]`
`affinity` | node/pod affinities | `{}`
`nodeSelector` | node labels for pod assignment | `{}`
`service.type` | type of service | `ClusterIP`
`service.port` | ClusterIP port | `80`
`cmd.logOutput` | Log the command output to stderr | `true`
`cmd.timeout` | Command execution timeout | `1h`
`logLevel` | Log level can be debug, info, warning, error or panic | `info`
Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`. For example,
```console
helm install flagger/loadtester --name flagger-loadtester \
--set cmd.logOutput=false
```
Alternatively, a YAML file that specifies the values for the above parameters can be provided while installing the chart. For example,
```console
helm install flagger/loadtester --name flagger-loadtester -f values.yaml
```
> **Tip**: You can use the default [values.yaml](values.yaml)
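Once deployed, the load tester can be wired into a canary analysis as a webhook. A minimal sketch, assuming the service is reachable at `http://flagger-loadtester.test/` (the chart default) and the target app is `podinfo` on port 9898, following the webhook format used by the podinfo demo chart:

```yaml
  canaryAnalysis:
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          # hey command executed by the load tester for the duration of the analysis step
          cmd: "hey -z 1m -q 5 -c 2 http://podinfo.test:9898/"
```

During each analysis interval, Flagger calls the webhook and the load tester runs the `hey` command, so the canary pods receive traffic while the metrics are evaluated.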

View File

@@ -0,0 +1 @@
Flagger's load testing service is available at http://{{ include "loadtester.fullname" . }}.{{ .Release.Namespace }}/

View File

@@ -2,35 +2,31 @@
{{/*
Expand the name of the chart.
*/}}
{{- define "podinfo-flagger.name" -}}
{{- define "loadtester.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
The release name is used as a full name.
If release name contains chart name it will be used as a full name.
*/}}
{{- define "podinfo-flagger.fullname" -}}
{{- define "loadtester.fullname" -}}
{{- if .Values.fullnameOverride -}}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- $name := default .Chart.Name .Values.nameOverride -}}
{{- if contains $name .Release.Name -}}
{{- .Release.Name | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- end -}}
{{- define "podinfo-flagger.primary" -}}
{{- printf "%s-%s" .Release.Name "primary" | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- define "podinfo-flagger.canary" -}}
{{- printf "%s-%s" .Release.Name "canary" | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "podinfo-flagger.chart" -}}
{{- define "loadtester.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" -}}
{{- end -}}

View File

@@ -0,0 +1,66 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "loadtester.fullname" . }}
labels:
app.kubernetes.io/name: {{ include "loadtester.name" . }}
helm.sh/chart: {{ include "loadtester.chart" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
spec:
replicas: {{ .Values.replicaCount }}
selector:
matchLabels:
app: {{ include "loadtester.name" . }}
template:
metadata:
labels:
app: {{ include "loadtester.name" . }}
spec:
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- name: http
containerPort: 8080
command:
- ./loadtester
- -port=8080
- -log-level={{ .Values.logLevel }}
- -timeout={{ .Values.cmd.timeout }}
- -log-cmd-output={{ .Values.cmd.logOutput }}
livenessProbe:
exec:
command:
- wget
- --quiet
- --tries=1
- --timeout=4
- --spider
- http://localhost:8080/healthz
timeoutSeconds: 5
readinessProbe:
exec:
command:
- wget
- --quiet
- --tries=1
- --timeout=4
- --spider
- http://localhost:8080/healthz
timeoutSeconds: 5
resources:
{{- toYaml .Values.resources | nindent 12 }}
{{- with .Values.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}

View File

@@ -0,0 +1,18 @@
apiVersion: v1
kind: Service
metadata:
name: {{ include "loadtester.fullname" . }}
labels:
app.kubernetes.io/name: {{ include "loadtester.name" . }}
helm.sh/chart: {{ include "loadtester.chart" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
spec:
type: {{ .Values.service.type }}
ports:
- port: {{ .Values.service.port }}
targetPort: http
protocol: TCP
name: http
selector:
app: {{ include "loadtester.name" . }}

View File

@@ -0,0 +1,29 @@
replicaCount: 1
image:
repository: quay.io/stefanprodan/flagger-loadtester
tag: 0.1.0
pullPolicy: IfNotPresent
logLevel: info
cmd:
logOutput: true
timeout: 1h
nameOverride: ""
fullnameOverride: ""
service:
type: ClusterIP
port: 80
resources:
requests:
cpu: 10m
memory: 64Mi
nodeSelector: {}
tolerations: []
affinity: {}

View File

@@ -1,56 +0,0 @@
# Podinfo Istio
Podinfo is a tiny web application made with Go
that showcases best practices of running microservices in Kubernetes.
## Installing the Chart
Create an Istio enabled namespace:
```console
kubectl create namespace test
kubectl label namespace test istio-injection=enabled
```
Create an Istio Gateway in the `istio-system` namespace named `public-gateway`:
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
name: public-gateway
namespace: istio-system
spec:
selector:
istio: ingressgateway
servers:
- port:
number: 80
name: http
protocol: HTTP
hosts:
- "*"
tls:
httpsRedirect: true
- port:
number: 443
name: https
protocol: HTTPS
hosts:
- "*"
tls:
mode: SIMPLE
privateKey: /etc/istio/ingressgateway-certs/tls.key
serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
```
Create the `frontend` release by specifying the external domain name:
```console
helm upgrade frontend -i ./charts/podinfo-flagger \
--namespace=test \
--set gateway.enabled=true \
--set gateway.host=podinfo.example.com
```

View File

@@ -1 +0,0 @@
{{ template "podinfo-flagger.fullname" . }} has been deployed successfully!

View File

@@ -1,63 +0,0 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ template "podinfo-flagger.canary" . }}
labels:
app: {{ template "podinfo-flagger.fullname" . }}
chart: {{ template "podinfo-flagger.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
selector:
matchLabels:
app: {{ template "podinfo-flagger.canary" . }}
template:
metadata:
labels:
app: {{ template "podinfo-flagger.canary" . }}
annotations:
prometheus.io/scrape: "true"
spec:
terminationGracePeriodSeconds: 30
containers:
- name: podinfod
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
imagePullPolicy: {{ .Values.imagePullPolicy }}
command:
- ./podinfo
- --port={{ .Values.containerPort }}
- --level={{ .Values.logLevel }}
env:
- name: PODINFO_UI_COLOR
value: green
ports:
- name: http
containerPort: {{ .Values.containerPort }}
protocol: TCP
livenessProbe:
exec:
command:
- podcli
- check
- http
- localhost:{{ .Values.containerPort }}/healthz
readinessProbe:
exec:
command:
- podcli
- check
- http
- localhost:{{ .Values.containerPort }}/readyz
periodSeconds: 3
volumeMounts:
- name: data
mountPath: /data
resources:
{{ toYaml .Values.resources | indent 12 }}
volumes:
- name: data
emptyDir: {}

View File

@@ -1,21 +0,0 @@
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: {{ template "podinfo-flagger.canary" . }}
labels:
app: {{ template "podinfo-flagger.fullname" . }}
chart: {{ template "podinfo-flagger.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: {{ template "podinfo-flagger.canary" . }}
minReplicas: {{ .Values.hpa.minReplicas }}
maxReplicas: {{ .Values.hpa.maxReplicas }}
metrics:
- type: Resource
resource:
name: cpu
targetAverageUtilization: {{ .Values.hpa.targetAverageUtilization }}

View File

@@ -1,18 +0,0 @@
apiVersion: v1
kind: Service
metadata:
name: {{ template "podinfo-flagger.canary" . }}
labels:
app: {{ template "podinfo-flagger.fullname" . }}
chart: {{ template "podinfo-flagger.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
type: ClusterIP
ports:
- port: {{ .Values.containerPort }}
targetPort: http
protocol: TCP
name: http
selector:
app: {{ template "podinfo-flagger.canary" . }}

View File

@@ -1,63 +0,0 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ template "podinfo-flagger.primary" . }}
labels:
app: {{ template "podinfo-flagger.fullname" . }}
chart: {{ template "podinfo-flagger.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
selector:
matchLabels:
app: {{ template "podinfo-flagger.primary" . }}
template:
metadata:
labels:
app: {{ template "podinfo-flagger.primary" . }}
annotations:
prometheus.io/scrape: "true"
spec:
terminationGracePeriodSeconds: 30
containers:
- name: podinfod
image: "quay.io/stefanprodan/podinfo:1.1.1"
imagePullPolicy: {{ .Values.imagePullPolicy }}
command:
- ./podinfo
- --port={{ .Values.containerPort }}
- --level={{ .Values.logLevel }}
env:
- name: PODINFO_UI_COLOR
value: blue
ports:
- name: http
containerPort: {{ .Values.containerPort }}
protocol: TCP
livenessProbe:
exec:
command:
- podcli
- check
- http
- localhost:{{ .Values.containerPort }}/healthz
readinessProbe:
exec:
command:
- podcli
- check
- http
- localhost:{{ .Values.containerPort }}/readyz
periodSeconds: 3
volumeMounts:
- name: data
mountPath: /data
resources:
{{ toYaml .Values.resources | indent 12 }}
volumes:
- name: data
emptyDir: {}

View File

@@ -1,21 +0,0 @@
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: {{ template "podinfo-flagger.primary" . }}
labels:
app: {{ template "podinfo-flagger.fullname" . }}
chart: {{ template "podinfo-flagger.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: {{ template "podinfo-flagger.primary" . }}
minReplicas: {{ .Values.hpa.minReplicas }}
maxReplicas: {{ .Values.hpa.maxReplicas }}
metrics:
- type: Resource
resource:
name: cpu
targetAverageUtilization: {{ .Values.hpa.targetAverageUtilization }}

View File

@@ -1,18 +0,0 @@
apiVersion: v1
kind: Service
metadata:
name: {{ template "podinfo-flagger.primary" . }}
labels:
app: {{ template "podinfo-flagger.fullname" . }}
chart: {{ template "podinfo-flagger.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
type: ClusterIP
ports:
- port: {{ .Values.containerPort }}
targetPort: http
protocol: TCP
name: http
selector:
app: {{ template "podinfo-flagger.primary" . }}

View File

@@ -1,30 +0,0 @@
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: {{ template "podinfo-flagger.fullname" . }}
labels:
app: {{ template "podinfo-flagger.fullname" . }}
chart: {{ template "podinfo-flagger.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
hosts:
- {{ template "podinfo-flagger.fullname" . }}
{{- if .Values.gateway.enabled }}
- {{ .Values.gateway.host }}
gateways:
- {{ .Values.gateway.name }}
{{- end }}
http:
- route:
- destination:
host: {{ template "podinfo-flagger.primary" . }}
port:
number: {{ .Values.containerPort }}
weight: 100
- destination:
host: {{ template "podinfo-flagger.canary" . }}
port:
number: {{ .Values.containerPort }}
weight: 0
timeout: {{ .Values.timeout }}

View File

@@ -1,30 +0,0 @@
# Default values for podinfo-flagger.
image:
repository: quay.io/stefanprodan/podinfo
tag: "1.2.0"
# enable the gateway when exposing the service outside the cluster
gateway:
enabled: false
name: public-gateway.istio-system.svc.cluster.local
# external domain name
host:
hpa:
minReplicas: 2
maxReplicas: 4
targetAverageUtilization: 99
timeout: 30s
logLevel: info
containerPort: 9898
imagePullPolicy: IfNotPresent
resources:
limits:
cpu: 2000m
memory: 512Mi
requests:
cpu: 100m
memory: 64Mi

View File

@@ -1,10 +1,10 @@
apiVersion: v1
version: 2.0.0
appVersion: 1.2.1
appVersion: 1.4.0
name: podinfo
engine: gotpl
name: podinfo-flagger
description: Podinfo Helm chart for Flagger progressive delivery
home: https://github.com/stefanprodan/k8s-podinfo
description: Flagger canary deployment demo chart
home: https://github.com/stefanprodan/flagger
maintainers:
- email: stefanprodan@users.noreply.github.com
name: stefanprodan

charts/podinfo/README.md Normal file
View File

@@ -0,0 +1,79 @@
# Podinfo
Podinfo is a tiny web application made with Go
that showcases best practices of running canary deployments with Flagger and Istio.
## Installing the Chart
Add Flagger Helm repository:
```console
helm repo add flagger https://flagger.app
```
To install the chart with the release name `frontend`:
```console
helm upgrade -i frontend flagger/podinfo \
--namespace test \
--set nameOverride=frontend \
--set backend=http://backend.test:9898/echo \
--set canary.enabled=true \
--set canary.istioIngress.enabled=true \
--set canary.istioIngress.gateway=public-gateway.istio-system.svc.cluster.local \
--set canary.istioIngress.host=frontend.istio.example.com
```
To install the chart as `backend`:
```console
helm upgrade -i backend flagger/podinfo \
--namespace test \
--set nameOverride=backend \
--set canary.enabled=true
```
## Uninstalling the Chart
To uninstall/delete the `frontend` deployment:
```console
$ helm delete --purge frontend
```
The command removes all the Kubernetes components associated with the chart and deletes the release.
## Configuration
The following table lists the configurable parameters of the podinfo chart and their default values.
Parameter | Description | Default
--- | --- | ---
`image.repository` | image repository | `quay.io/stefanprodan/podinfo`
`image.tag` | image tag | `<VERSION>`
`image.pullPolicy` | image pull policy | `IfNotPresent`
`hpa.enabled` | enables HPA | `true`
`hpa.cpu` | target CPU usage per pod | `80`
`hpa.memory` | target memory usage per pod | `512Mi`
`hpa.minReplicas` | minimum pod replicas | `2`
`hpa.maxReplicas` | maximum pod replicas | `4`
`resources.requests/cpu` | pod CPU request | `100m`
`resources.requests/memory` | pod memory request | `32Mi`
`backend` | backend URL | None
`faults.delay` | random HTTP response delays between 0 and 5 seconds | `false`
`faults.error` | 1/3 chance of a random HTTP response error | `false`
Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`. For example,
```console
$ helm install flagger/podinfo --name frontend \
--set=image.tag=1.4.1,hpa.enabled=false
```
Alternatively, a YAML file that specifies the values for the above parameters can be provided while installing the chart. For example,
```console
$ helm install flagger/podinfo --name frontend -f values.yaml
```

View File

@@ -0,0 +1 @@
podinfo {{ .Release.Name }} deployed!

View File

@@ -0,0 +1,43 @@
{{/* vim: set filetype=mustache: */}}
{{/*
Expand the name of the chart.
*/}}
{{- define "podinfo.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "podinfo.fullname" -}}
{{- if .Values.fullnameOverride -}}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- $name := default .Chart.Name .Values.nameOverride -}}
{{- if contains $name .Release.Name -}}
{{- .Release.Name | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- end -}}
{{- end -}}
{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "podinfo.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{/*
Create chart name suffix.
*/}}
{{- define "podinfo.suffix" -}}
{{- if .Values.canary.enabled -}}
{{- "-primary" -}}
{{- else -}}
{{- "" -}}
{{- end -}}
{{- end -}}

View File

@@ -0,0 +1,54 @@
{{- if .Values.canary.enabled }}
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
name: {{ template "podinfo.fullname" . }}
labels:
app: {{ template "podinfo.name" . }}
chart: {{ template "podinfo.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: {{ template "podinfo.fullname" . }}
progressDeadlineSeconds: 60
autoscalerRef:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
name: {{ template "podinfo.fullname" . }}
service:
port: {{ .Values.service.port }}
{{- if .Values.canary.istioIngress.enabled }}
gateways:
- {{ .Values.canary.istioIngress.gateway }}
hosts:
- {{ .Values.canary.istioIngress.host }}
{{- end }}
canaryAnalysis:
interval: {{ .Values.canary.analysis.interval }}
threshold: {{ .Values.canary.analysis.threshold }}
maxWeight: {{ .Values.canary.analysis.maxWeight }}
stepWeight: {{ .Values.canary.analysis.stepWeight }}
metrics:
- name: istio_requests_total
threshold: {{ .Values.canary.thresholds.successRate }}
interval: 1m
- name: istio_request_duration_seconds_bucket
threshold: {{ .Values.canary.thresholds.latency }}
interval: 1m
{{- if .Values.canary.loadtest.enabled }}
webhooks:
- name: load-test-get
url: {{ .Values.canary.loadtest.url }}
timeout: 5s
metadata:
cmd: "hey -z 1m -q 5 -c 2 http://{{ template "podinfo.fullname" . }}.{{ .Release.Namespace }}:{{ .Values.service.port }}"
- name: load-test-post
url: {{ .Values.canary.loadtest.url }}
timeout: 5s
metadata:
cmd: "hey -z 1m -q 5 -c 2 -m POST -d '{\"test\": true}' http://{{ template "podinfo.fullname" . }}.{{ .Release.Namespace }}:{{ .Values.service.port }}/echo"
{{- end }}
{{- end }}

View File

@@ -0,0 +1,15 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ template "podinfo.fullname" . }}
labels:
app: {{ template "podinfo.name" . }}
chart: {{ template "podinfo.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
data:
config.yaml: |-
# http settings
http-client-timeout: 1m
http-server-timeout: {{ .Values.httpServer.timeout }}
http-server-shutdown-timeout: 5s

View File

@@ -0,0 +1,93 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ template "podinfo.fullname" . }}
labels:
app: {{ template "podinfo.name" . }}
chart: {{ template "podinfo.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
selector:
matchLabels:
app: {{ template "podinfo.fullname" . }}
template:
metadata:
labels:
app: {{ template "podinfo.fullname" . }}
annotations:
prometheus.io/scrape: 'true'
spec:
terminationGracePeriodSeconds: 30
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
command:
- ./podinfo
- --port={{ .Values.service.port }}
- --level={{ .Values.logLevel }}
- --random-delay={{ .Values.faults.delay }}
- --random-error={{ .Values.faults.error }}
- --config-path=/podinfo/config
env:
{{- if .Values.message }}
- name: PODINFO_UI_MESSAGE
value: {{ .Values.message }}
{{- end }}
{{- if .Values.backend }}
- name: PODINFO_BACKEND_URL
value: {{ .Values.backend }}
{{- end }}
ports:
- name: http
containerPort: {{ .Values.service.port }}
protocol: TCP
livenessProbe:
exec:
command:
- podcli
- check
- http
- localhost:{{ .Values.service.port }}/healthz
initialDelaySeconds: 5
timeoutSeconds: 5
readinessProbe:
exec:
command:
- podcli
- check
- http
- localhost:{{ .Values.service.port }}/readyz
initialDelaySeconds: 5
timeoutSeconds: 5
volumeMounts:
- name: data
mountPath: /data
- name: config
mountPath: /podinfo/config
readOnly: true
resources:
{{ toYaml .Values.resources | indent 12 }}
{{- with .Values.nodeSelector }}
nodeSelector:
{{ toYaml . | indent 8 }}
{{- end }}
{{- with .Values.affinity }}
affinity:
{{ toYaml . | indent 8 }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
{{ toYaml . | indent 8 }}
{{- end }}
volumes:
- name: data
emptyDir: {}
- name: config
configMap:
name: {{ template "podinfo.fullname" . }}

View File

@@ -0,0 +1,37 @@
{{- if .Values.hpa.enabled -}}
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: {{ template "podinfo.fullname" . }}
labels:
app: {{ template "podinfo.name" . }}
chart: {{ template "podinfo.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
scaleTargetRef:
apiVersion: apps/v1beta2
kind: Deployment
name: {{ template "podinfo.fullname" . }}
minReplicas: {{ .Values.hpa.minReplicas }}
maxReplicas: {{ .Values.hpa.maxReplicas }}
metrics:
{{- if .Values.hpa.cpu }}
- type: Resource
resource:
name: cpu
targetAverageUtilization: {{ .Values.hpa.cpu }}
{{- end }}
{{- if .Values.hpa.memory }}
- type: Resource
resource:
name: memory
targetAverageValue: {{ .Values.hpa.memory }}
{{- end }}
{{- if .Values.hpa.requests }}
- type: Pod
pods:
metricName: http_requests
targetAverageValue: {{ .Values.hpa.requests }}
{{- end }}
{{- end }}
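The HPA template above is driven entirely by the `hpa` values block. A sketch of the corresponding values, matching the chart defaults; the `requests` key is optional and, when set, adds the `http_requests` pods-metric target (it is not set in the default values.yaml):

```yaml
hpa:
  enabled: true
  minReplicas: 2
  maxReplicas: 4
  cpu: 80         # target average CPU utilization, percent
  memory: 512Mi   # target average memory value per pod
  # requests: 10  # optional: target average http_requests custom metric per pod
```

Leaving `cpu`, `memory`, or `requests` unset omits the corresponding metric from the HPA spec.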

View File

@@ -0,0 +1,20 @@
{{- if not .Values.canary.enabled }}
apiVersion: v1
kind: Service
metadata:
name: {{ template "podinfo.fullname" . }}
labels:
app: {{ template "podinfo.name" . }}
chart: {{ template "podinfo.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
type: {{ .Values.service.type }}
ports:
- port: {{ .Values.service.port }}
targetPort: http
protocol: TCP
name: http
selector:
app: {{ template "podinfo.fullname" . }}
{{- end }}

View File

@@ -0,0 +1,22 @@
{{- $url := printf "%s%s.%s:%v" (include "podinfo.fullname" .) (include "podinfo.suffix" .) .Release.Namespace .Values.service.port -}}
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ template "podinfo.fullname" . }}-tests
labels:
heritage: {{ .Release.Service }}
release: {{ .Release.Name }}
chart: {{ .Chart.Name }}-{{ .Chart.Version }}
app: {{ template "podinfo.name" . }}
data:
run.sh: |-
@test "HTTP POST /echo" {
run curl --retry 3 --connect-timeout 2 -sSX POST -d 'test' {{ $url }}/echo
[ "$output" = "test" ]
}
@test "HTTP POST /store" {
curl --retry 3 --connect-timeout 2 -sSX POST -d 'test' {{ $url }}/store
}
@test "HTTP GET /" {
curl --retry 3 --connect-timeout 2 -sS {{ $url }} | grep hostname
}

View File

@@ -0,0 +1,43 @@
apiVersion: v1
kind: Pod
metadata:
name: {{ template "podinfo.fullname" . }}-tests-{{ randAlphaNum 5 | lower }}
annotations:
"helm.sh/hook": test-success
sidecar.istio.io/inject: "false"
labels:
heritage: {{ .Release.Service }}
release: {{ .Release.Name }}
chart: {{ .Chart.Name }}-{{ .Chart.Version }}
app: {{ template "podinfo.name" . }}
spec:
initContainers:
- name: "test-framework"
image: "dduportal/bats:0.4.0"
command:
- "bash"
- "-c"
- |
set -ex
# copy bats to tools dir
cp -R /usr/local/libexec/ /tools/bats/
volumeMounts:
- mountPath: /tools
name: tools
containers:
- name: {{ .Release.Name }}-ui-test
image: dduportal/bats:0.4.0
command: ["/tools/bats/bats", "-t", "/tests/run.sh"]
volumeMounts:
- mountPath: /tests
name: tests
readOnly: true
- mountPath: /tools
name: tools
volumes:
- name: tests
configMap:
name: {{ template "podinfo.fullname" . }}-tests
- name: tools
emptyDir: {}
restartPolicy: Never

View File

@@ -0,0 +1,73 @@
# Default values for podinfo.
image:
repository: quay.io/stefanprodan/podinfo
tag: 1.4.0
pullPolicy: IfNotPresent
service:
type: ClusterIP
port: 9898
hpa:
enabled: true
minReplicas: 2
maxReplicas: 2
cpu: 80
memory: 512Mi
canary:
enabled: true
istioIngress:
enabled: false
# Istio ingress gateway name
gateway: public-gateway.istio-system.svc.cluster.local
# external host name eg. podinfo.example.com
host:
analysis:
# schedule interval (default 60s)
interval: 15s
# max number of failed metric checks before rollback
threshold: 10
# max traffic percentage routed to canary
# percentage (0-100)
maxWeight: 50
# canary increment step
# percentage (0-100)
stepWeight: 5
thresholds:
# minimum req success rate (non 5xx responses)
# percentage (0-100)
successRate: 99
# maximum req duration P99
# milliseconds
latency: 500
loadtest:
enabled: false
# load tester address
url: http://flagger-loadtester.test/
resources:
limits:
requests:
cpu: 100m
memory: 32Mi
nodeSelector: {}
tolerations: []
affinity: {}
nameOverride: ""
fullnameOverride: ""
logLevel: info
backend: #http://backend-podinfo:9898/echo
message: #UI greetings
faults:
delay: false
error: false
httpServer:
timeout: 30s
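The `faults` toggles above make it possible to exercise a rollback: with errors injected, the canary's success rate drops below the `canary.thresholds.successRate` target and Flagger aborts the promotion. A sketch of a values override for such a test:

```yaml
faults:
  delay: true   # random response delays between 0 and 5 seconds
  error: true   # 1/3 chance of a random HTTP response error
```

Deploying a canary revision with these flags set should trip the metric checks within `canary.analysis.threshold` intervals and trigger a rollback.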

View File

@@ -6,12 +6,13 @@ import (
"time"
_ "github.com/istio/glog"
sharedclientset "github.com/knative/pkg/client/clientset/versioned"
istioclientset "github.com/knative/pkg/client/clientset/versioned"
"github.com/knative/pkg/signals"
clientset "github.com/stefanprodan/flagger/pkg/client/clientset/versioned"
informers "github.com/stefanprodan/flagger/pkg/client/informers/externalversions"
"github.com/stefanprodan/flagger/pkg/controller"
"github.com/stefanprodan/flagger/pkg/logging"
"github.com/stefanprodan/flagger/pkg/notifier"
"github.com/stefanprodan/flagger/pkg/server"
"github.com/stefanprodan/flagger/pkg/version"
"k8s.io/client-go/kubernetes"
@@ -27,15 +28,21 @@ var (
controlLoopInterval time.Duration
logLevel string
port string
slackURL string
slackUser string
slackChannel string
)
func init() {
flag.StringVar(&kubeconfig, "kubeconfig", "", "Path to a kubeconfig. Only required if out-of-cluster.")
flag.StringVar(&masterURL, "master", "", "The address of the Kubernetes API server. Overrides any value in kubeconfig. Only required if out-of-cluster.")
flag.StringVar(&metricsServer, "metrics-server", "http://prometheus:9090", "Prometheus URL")
flag.DurationVar(&controlLoopInterval, "control-loop-interval", 10*time.Second, "wait interval between rollouts")
flag.DurationVar(&controlLoopInterval, "control-loop-interval", 10*time.Second, "Kubernetes API sync interval")
flag.StringVar(&logLevel, "log-level", "debug", "Log level can be: debug, info, warning, error.")
flag.StringVar(&port, "port", "8080", "Port to listen on.")
flag.StringVar(&slackURL, "slack-url", "", "Slack hook URL.")
flag.StringVar(&slackUser, "slack-user", "flagger", "Slack user name.")
flag.StringVar(&slackChannel, "slack-channel", "", "Slack channel.")
}
func main() {
@@ -59,18 +66,18 @@ func main() {
logger.Fatalf("Error building kubernetes clientset: %v", err)
}
sharedClient, err := sharedclientset.NewForConfig(cfg)
istioClient, err := istioclientset.NewForConfig(cfg)
if err != nil {
logger.Fatalf("Error building shared clientset: %v", err)
logger.Fatalf("Error building istio clientset: %v", err)
}
rolloutClient, err := clientset.NewForConfig(cfg)
flaggerClient, err := clientset.NewForConfig(cfg)
if err != nil {
logger.Fatalf("Error building example clientset: %s", err.Error())
}
rolloutInformerFactory := informers.NewSharedInformerFactory(rolloutClient, time.Second*30)
rolloutInformer := rolloutInformerFactory.Flagger().V1beta1().Canaries()
flaggerInformerFactory := informers.NewSharedInformerFactory(flaggerClient, time.Second*30)
canaryInformer := flaggerInformerFactory.Flagger().V1alpha3().Canaries()
logger.Infof("Starting flagger version %s revision %s", version.VERSION, version.REVISION)
@@ -88,24 +95,35 @@ func main() {
logger.Errorf("Metrics server %s unreachable %v", metricsServer, err)
}
var slack *notifier.Slack
if slackURL != "" {
slack, err = notifier.NewSlack(slackURL, slackUser, slackChannel)
if err != nil {
logger.Errorf("Notifier %v", err)
} else {
logger.Infof("Slack notifications enabled for channel %s", slack.Channel)
}
}
// start HTTP server
go server.ListenAndServe(port, 3*time.Second, logger, stopCh)
c := controller.NewController(
kubeClient,
sharedClient,
rolloutClient,
rolloutInformer,
istioClient,
flaggerClient,
canaryInformer,
controlLoopInterval,
metricsServer,
logger,
slack,
)
rolloutInformerFactory.Start(stopCh)
flaggerInformerFactory.Start(stopCh)
logger.Info("Waiting for informer caches to sync")
for _, synced := range []cache.InformerSynced{
rolloutInformer.Informer().HasSynced,
canaryInformer.Informer().HasSynced,
} {
if ok := cache.WaitForCacheSync(stopCh, synced); !ok {
logger.Fatalf("Failed to wait for cache sync")

cmd/loadtester/main.go Normal file
View File

@@ -0,0 +1,44 @@
package main
import (
"flag"
"github.com/knative/pkg/signals"
"github.com/stefanprodan/flagger/pkg/loadtester"
"github.com/stefanprodan/flagger/pkg/logging"
"log"
"time"
)
var VERSION = "0.1.0"
var (
logLevel string
port string
timeout time.Duration
logCmdOutput bool
)
func init() {
flag.StringVar(&logLevel, "log-level", "debug", "Log level can be: debug, info, warning, error.")
flag.StringVar(&port, "port", "9090", "Port to listen on.")
flag.DurationVar(&timeout, "timeout", time.Hour, "Command exec timeout.")
flag.BoolVar(&logCmdOutput, "log-cmd-output", true, "Log command output to stderr")
}
func main() {
flag.Parse()
logger, err := logging.NewLogger(logLevel)
if err != nil {
log.Fatalf("Error creating logger: %v", err)
}
defer logger.Sync()
stopCh := signals.SetupSignalHandler()
taskRunner := loadtester.NewTaskRunner(logger, timeout, logCmdOutput)
go taskRunner.Start(100*time.Millisecond, stopCh)
logger.Infof("Starting load tester v%s API on port %s", VERSION, port)
loadtester.ListenAndServe(port, time.Minute, logger, taskRunner, stopCh)
}

code-of-conduct.md Normal file
View File

@@ -0,0 +1,73 @@
# Contributor Covenant Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, gender identity and expression, level of experience,
education, socio-economic status, nationality, personal appearance, race,
religion, or sexual identity and orientation.
## Our Standards
Examples of behavior that contributes to creating a positive environment
include:
* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Our Responsibilities
Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
## Scope
This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior
may be reported by contacting stefan.prodan(at)gmail.com.
All complaints will be reviewed and investigated and will result in a response that is deemed
necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of
an incident. Further details of specific enforcement policies may be
posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org), version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html


@@ -1 +0,0 @@
flagger.app


@@ -1,413 +0,0 @@
# flagger
[![build](https://travis-ci.org/stefanprodan/flagger.svg?branch=master)](https://travis-ci.org/stefanprodan/flagger)
[![report](https://goreportcard.com/badge/github.com/stefanprodan/flagger)](https://goreportcard.com/report/github.com/stefanprodan/flagger)
[![license](https://img.shields.io/github/license/stefanprodan/flagger.svg)](https://github.com/stefanprodan/flagger/blob/master/LICENSE)
[![release](https://img.shields.io/github/release/stefanprodan/flagger/all.svg)](https://github.com/stefanprodan/flagger/releases)
Flagger is a Kubernetes operator that automates the promotion of canary deployments
using Istio routing for traffic shifting and Prometheus metrics for canary analysis.
The project is currently in an experimental phase and breaking changes to the API
are expected in upcoming releases.
### Install
Before installing Flagger, make sure you have Istio set up with Prometheus enabled.
If you are new to Istio you can follow my [Istio service mesh walk-through](https://github.com/stefanprodan/istio-gke).
Deploy Flagger in the `istio-system` namespace using Helm:
```bash
# add the Helm repository
helm repo add flagger https://stefanprodan.github.io/flagger
# install or upgrade
helm upgrade -i flagger flagger/flagger \
--namespace=istio-system \
--set metricsServer=http://prometheus.istio-system:9090 \
--set controlLoopInterval=1m
```
Flagger is compatible with Kubernetes >1.10.0 and Istio >1.0.0.
### Usage
Flagger requires two Kubernetes [deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/):
one for the version you want to upgrade called _primary_ and one for the _canary_.
Each deployment must have a corresponding ClusterIP [service](https://kubernetes.io/docs/concepts/services-networking/service/)
that exposes a port named http or https. These services are used as destinations in an Istio [virtual service](https://istio.io/docs/reference/config/istio.networking.v1alpha3/#VirtualService).
![flagger-overview](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/diagrams/flagger-overview.png)
Gated canary promotion stages:
* scan for canary deployments
* check Istio virtual service routes are mapped to primary and canary ClusterIP services
* check primary and canary deployments status
* halt rollout if a rolling update is underway
* halt rollout if pods are unhealthy
* increase canary traffic weight percentage from 0% to 5% (step weight)
* check canary HTTP request success rate and latency
* halt rollout if any metric is under the specified threshold
* increment the failed checks counter
* check if the number of failed checks reached the threshold
* route all traffic to primary
* scale to zero the canary deployment and mark it as failed
* wait for the canary deployment to be updated (revision bump) and start over
* increase canary traffic weight by 5% (step weight) till it reaches 50% (max weight)
* halt rollout while canary request success rate is under the threshold
* halt rollout while canary request duration P99 is over the threshold
* halt rollout if the primary or canary deployment becomes unhealthy
* halt rollout while canary deployment is being scaled up/down by HPA
* promote canary to primary
* copy canary deployment spec template over primary
* wait for primary rolling update to finish
* halt rollout if pods are unhealthy
* route all traffic to primary
* scale to zero the canary deployment
* mark rollout as finished
* wait for the canary deployment to be updated (revision bump) and start over
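The stepping behaviour described above (advance on healthy metrics, halt otherwise, promote at max weight) can be sketched as a pure function; the names and signature are illustrative, not Flagger's internals:

```go
package main

import "fmt"

// advance returns the next canary weight and whether the canary should be
// promoted, given the analysis verdict for the current interval.
// Illustrative only; Flagger's real scheduler also handles HPA and rollout gates.
func advance(current, stepWeight, maxWeight int, checksPassed bool) (next int, promote bool) {
	if !checksPassed {
		return current, false // halt: keep the current weight
	}
	if current >= maxWeight {
		return current, true // max weight reached: promote canary to primary
	}
	next = current + stepWeight
	if next > maxWeight {
		next = maxWeight // never overshoot the configured max weight
	}
	return next, false
}

func main() {
	w := 0
	for {
		next, promote := advance(w, 5, 50, true)
		if promote {
			fmt.Println("promoting at weight", w)
			break
		}
		w = next
		fmt.Println("advance canary weight", w)
	}
}
```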
You can change the canary analysis _max weight_ and the _step weight_ percentage in the Flagger's custom resource.
Assuming the primary deployment is named _podinfo_ and the canary one _podinfo-canary_, Flagger requires
a virtual service configured with weight-based routing:
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: podinfo
spec:
hosts:
- podinfo
http:
- route:
- destination:
host: podinfo
port:
number: 9898
weight: 100
- destination:
host: podinfo-canary
port:
number: 9898
weight: 0
```
Primary and canary services should expose a port named http:
```yaml
apiVersion: v1
kind: Service
metadata:
name: podinfo-canary
spec:
type: ClusterIP
selector:
app: podinfo-canary
ports:
- name: http
port: 9898
targetPort: 9898
```
Based on the two deployments, services and virtual service, a canary promotion can be defined using Flagger's custom resource:
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
name: podinfo
namespace: test
spec:
targetKind: Deployment
virtualService:
name: podinfo
primary:
name: podinfo
host: podinfo
canary:
name: podinfo-canary
host: podinfo-canary
canaryAnalysis:
# max number of failed checks
# before rolling back the canary
threshold: 10
# max traffic percentage routed to canary
# percentage (0-100)
maxWeight: 50
# canary increment step
# percentage (0-100)
stepWeight: 5
metrics:
- name: istio_requests_total
# minimum req success rate (non 5xx responses)
# percentage (0-100)
threshold: 99
interval: 1m
- name: istio_request_duration_seconds_bucket
# maximum req duration P99
# milliseconds
threshold: 500
interval: 1m
```
The canary analysis uses the following PromQL queries:
_HTTP requests success rate percentage_
```promql
sum(
rate(
istio_requests_total{
reporter="destination",
destination_workload_namespace=~"$namespace",
destination_workload=~"$workload",
response_code!~"5.*"
}[$interval]
)
)
/
sum(
rate(
istio_requests_total{
reporter="destination",
destination_workload_namespace=~"$namespace",
destination_workload=~"$workload"
}[$interval]
)
)
```
_HTTP requests milliseconds duration P99_
```promql
histogram_quantile(0.99,
sum(
irate(
istio_request_duration_seconds_bucket{
reporter="destination",
destination_workload=~"$workload",
destination_workload_namespace=~"$namespace"
}[$interval]
)
) by (le)
)
```
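The success-rate query divides the non-5xx request rate by the total request rate; the resulting percentage is then compared against the metric threshold from the Canary spec. The same check can be sketched as a pure function (a sketch with assumed names, not Flagger's observer code):

```go
package main

import "fmt"

// successRatePasses mirrors the PromQL above: the non-5xx request rate divided
// by the total request rate, expressed as a percentage and compared against the
// metric threshold from the Canary spec (99 in the example).
func successRatePasses(non5xx, total, thresholdPct float64) (float64, bool) {
	if total == 0 {
		return 100, true // no traffic observed yet: nothing to judge
	}
	rate := non5xx / total * 100
	return rate, rate >= thresholdPct
}

func main() {
	rate, ok := successRatePasses(9474, 10000, 99)
	fmt.Printf("success rate %.2f%% pass=%v\n", rate, ok)
}
```

A failing check here corresponds to the `Halt podinfo.test advancement success rate 94.74% < 99%` events shown later in this README.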
### Automated canary analysis, promotions and rollbacks
![flagger-canary](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/diagrams/flagger-canary-hpa.png)
Create a test namespace with Istio sidecar injection enabled:
```bash
export REPO=https://raw.githubusercontent.com/stefanprodan/flagger/master
kubectl apply -f ${REPO}/artifacts/namespaces/test.yaml
```
Create the primary deployment, service and hpa:
```bash
kubectl apply -f ${REPO}/artifacts/workloads/primary-deployment.yaml
kubectl apply -f ${REPO}/artifacts/workloads/primary-service.yaml
kubectl apply -f ${REPO}/artifacts/workloads/primary-hpa.yaml
```
Create the canary deployment, service and hpa:
```bash
kubectl apply -f ${REPO}/artifacts/workloads/canary-deployment.yaml
kubectl apply -f ${REPO}/artifacts/workloads/canary-service.yaml
kubectl apply -f ${REPO}/artifacts/workloads/canary-hpa.yaml
```
Create a virtual service (replace the Istio gateway and the internet domain with your own):
```bash
kubectl apply -f ${REPO}/artifacts/workloads/virtual-service.yaml
```
Create a canary promotion custom resource:
```bash
kubectl apply -f ${REPO}/artifacts/rollouts/podinfo.yaml
```
Canary promotion output:
```
kubectl -n test describe canary/podinfo
Status:
Canary Revision: 16271121
Failed Checks: 6
State: finished
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Synced 3m flagger Starting canary deployment for podinfo.test
Normal Synced 3m flagger Advance podinfo.test canary weight 5
Normal Synced 3m flagger Advance podinfo.test canary weight 10
Normal Synced 3m flagger Advance podinfo.test canary weight 15
Warning Synced 3m flagger Halt podinfo.test advancement request duration 2.525s > 500ms
Warning Synced 3m flagger Halt podinfo.test advancement request duration 1.567s > 500ms
Warning Synced 3m flagger Halt podinfo.test advancement request duration 823ms > 500ms
Normal Synced 2m flagger Advance podinfo.test canary weight 20
Normal Synced 2m flagger Advance podinfo.test canary weight 25
Normal Synced 1m flagger Advance podinfo.test canary weight 30
Warning Synced 1m flagger Halt podinfo.test advancement success rate 82.33% < 99%
Warning Synced 1m flagger Halt podinfo.test advancement success rate 87.22% < 99%
Warning Synced 1m flagger Halt podinfo.test advancement success rate 94.74% < 99%
Normal Synced 1m flagger Advance podinfo.test canary weight 35
Normal Synced 55s flagger Advance podinfo.test canary weight 40
Normal Synced 45s flagger Advance podinfo.test canary weight 45
Normal Synced 35s flagger Advance podinfo.test canary weight 50
Normal Synced 25s flagger Copying podinfo-canary.test template spec to podinfo.test
Warning Synced 15s flagger Waiting for podinfo.test rollout to finish: 1 of 2 updated replicas are available
Normal Synced 5s flagger Promotion completed! Scaling down podinfo-canary.test
```
During the canary analysis you can generate HTTP 500 errors and high latency to test if Flagger pauses the rollout.
Create a tester pod and exec into it:
```bash
kubectl -n test run tester --image=quay.io/stefanprodan/podinfo:1.2.1 -- ./podinfo --port=9898
kubectl -n test exec -it tester-xx-xx sh
```
Generate HTTP 500 errors:
```bash
watch curl http://podinfo-canary:9898/status/500
```
Generate latency:
```bash
watch curl http://podinfo-canary:9898/delay/1
```
When the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary,
the canary is scaled to zero and the rollout is marked as failed.
```
kubectl -n test describe canary/podinfo
Status:
Canary Revision: 16695041
Failed Checks: 10
State: failed
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Synced 3m flagger Starting canary deployment for podinfo.test
Normal Synced 3m flagger Advance podinfo.test canary weight 5
Normal Synced 3m flagger Advance podinfo.test canary weight 10
Normal Synced 3m flagger Advance podinfo.test canary weight 15
Normal Synced 3m flagger Halt podinfo.test advancement success rate 69.17% < 99%
Normal Synced 2m flagger Halt podinfo.test advancement success rate 61.39% < 99%
Normal Synced 2m flagger Halt podinfo.test advancement success rate 55.06% < 99%
Normal Synced 2m flagger Halt podinfo.test advancement success rate 47.00% < 99%
Normal Synced 2m flagger (combined from similar events): Halt podinfo.test advancement success rate 38.08% < 99%
Warning Synced 1m flagger Rolling back podinfo-canary.test failed checks threshold reached 10
Warning Synced 1m flagger Canary failed! Scaling down podinfo-canary.test
```
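The rollback gate shown in that output amounts to a counter over the analysis intervals; a minimal sketch (the `checker` type and `record` method are illustrative names, not Flagger's internals):

```go
package main

import "fmt"

// checker accumulates failed metric checks and signals a rollback once the
// canaryAnalysis threshold is reached (threshold: 10 in the spec above).
type checker struct {
	failed    int
	threshold int
}

// record registers one analysis interval's verdict and reports whether the
// failed-checks threshold has been reached.
func (c *checker) record(passed bool) (rollback bool) {
	if passed {
		return false
	}
	c.failed++
	return c.failed >= c.threshold
}

func main() {
	c := checker{threshold: 10}
	for i := 0; i < 10; i++ {
		if c.record(false) {
			fmt.Println("rolling back: failed checks threshold reached", c.failed)
		}
	}
}
```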
Trigger a new canary deployment by updating the canary image:
```bash
kubectl -n test set image deployment/podinfo-canary \
podinfod=quay.io/stefanprodan/podinfo:1.2.1
```
Flagger detects that the canary revision has changed and starts a new rollout:
```
kubectl -n test describe canary/podinfo
Status:
Canary Revision: 19871136
Failed Checks: 0
State: finished
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Synced 3m flagger New revision detected podinfo-canary.test old 17211012 new 17246876
Normal Synced 3m flagger Scaling up podinfo.test
Warning Synced 3m flagger Waiting for podinfo.test rollout to finish: 0 of 1 updated replicas are available
Normal Synced 3m flagger Advance podinfo.test canary weight 5
Normal Synced 3m flagger Advance podinfo.test canary weight 10
Normal Synced 3m flagger Advance podinfo.test canary weight 15
Normal Synced 2m flagger Advance podinfo.test canary weight 20
Normal Synced 2m flagger Advance podinfo.test canary weight 25
Normal Synced 1m flagger Advance podinfo.test canary weight 30
Normal Synced 1m flagger Advance podinfo.test canary weight 35
Normal Synced 55s flagger Advance podinfo.test canary weight 40
Normal Synced 45s flagger Advance podinfo.test canary weight 45
Normal Synced 35s flagger Advance podinfo.test canary weight 50
Normal Synced 25s flagger Copying podinfo-canary.test template spec to podinfo.test
Warning Synced 15s flagger Waiting for podinfo.test rollout to finish: 1 of 2 updated replicas are available
Normal Synced 5s flagger Promotion completed! Scaling down podinfo-canary.test
```
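The "New revision detected" event above boils down to fingerprinting the canary pod template and comparing it against the last applied fingerprint. A minimal sketch; FNV is an arbitrary choice here and not necessarily the hash Flagger uses:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// specHash returns a short fingerprint of a serialized pod template spec;
// a new rollout starts whenever the stored hash differs from the current one.
func specHash(spec string) string {
	h := fnv.New32a()
	h.Write([]byte(spec))
	return fmt.Sprintf("%d", h.Sum32())
}

func main() {
	old := specHash("image: quay.io/stefanprodan/podinfo:1.2.0")
	cur := specHash("image: quay.io/stefanprodan/podinfo:1.2.1")
	if old != cur {
		fmt.Printf("New revision detected old %s new %s\n", old, cur)
	}
}
```

Hashing the whole template (rather than just the image tag) means any spec change, such as an env var or resource limit update, also triggers a new canary analysis.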
### Monitoring
Flagger comes with a Grafana dashboard made for canary analysis.
Install Grafana with Helm:
```bash
helm upgrade -i flagger-grafana flagger/grafana \
--namespace=istio-system \
--set url=http://prometheus.istio-system:9090
```
The dashboard shows the RED and USE metrics for the primary and canary workloads:
![flagger-grafana](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/screens/grafana-canary-analysis.png)
The canary errors and latency spikes are recorded as Kubernetes events and logged by Flagger in JSON format:
```
kubectl -n istio-system logs deployment/flagger --tail=100 | jq .msg
Starting canary deployment for podinfo.test
Advance podinfo.test canary weight 5
Advance podinfo.test canary weight 10
Advance podinfo.test canary weight 15
Advance podinfo.test canary weight 20
Advance podinfo.test canary weight 25
Advance podinfo.test canary weight 30
Advance podinfo.test canary weight 35
Halt podinfo.test advancement success rate 98.69% < 99%
Advance podinfo.test canary weight 40
Halt podinfo.test advancement request duration 1.515s > 500ms
Advance podinfo.test canary weight 45
Advance podinfo.test canary weight 50
Copying podinfo-canary.test template spec to podinfo-primary.test
Scaling down podinfo-canary.test
Promotion completed! podinfo-canary.test revision 81289
```
### Roadmap
* Extend the canary analysis and promotion to workload types other than Kubernetes deployments, such as Flux Helm releases or OpenFaaS functions
* Extend the validation mechanism to support metrics other than HTTP success rate and latency
* Add support for comparing the canary metrics to the primary ones and doing the validation based on the deviation between the two
* Alerting: Trigger Alertmanager on successful or failed promotions (Prometheus instrumentation of the canary analysis)
* Reporting: publish canary analysis results to Slack/Jira/etc
### Contributing
Flagger is Apache 2.0 licensed and accepts contributions via GitHub pull requests.
When submitting bug reports please include as many details as possible:
* which Flagger version
* which Flagger CRD version
* which Kubernetes/Istio version
* what configuration (canary, virtual service and workloads definitions)
* what happened (Flagger, Istio Pilot and Proxy logs)


@@ -1,64 +0,0 @@
title: Flagger - Istio Progressive Delivery Kubernetes Operator
remote_theme: errordeveloper/simple-project-homepage
repository: stefanprodan/flagger
by_weaveworks: true
url: "https://stefanprodan.github.io/flagger"
baseurl: "/"
twitter:
username: "stefanprodan"
author:
twitter: "stefanprodan"
# Set default og:image
defaults:
- scope: {path: ""}
values: {image: "logo/logo-flagger.png"}
# See: https://material.io/guidelines/style/color.html
# Use color-name-value, like pink-200 or deep-purple-100
brand_color: "amber-400"
# How article URLs are structured.
# See: https://jekyllrb.com/docs/permalinks/
permalink: posts/:title/
# "UA-NNNNNNNN-N"
google_analytics: ""
# Language. For example, if you write in Japanese, use "ja"
lang: "en"
# How many posts are visible on the home page without clicking "View More"
num_posts_visible_initially: 5
# Date format: See http://strftime.net/
date_format: "%b %-d, %Y"
plugins:
- jekyll-feed
- jekyll-readme-index
- jekyll-seo-tag
- jekyll-sitemap
- jemoji
# # required for local builds with starefossen/github-pages
# - jekyll-github-metadata
# - jekyll-mentions
# - jekyll-redirect-from
# - jekyll-remote-theme
exclude:
- CNAME
- Dockerfile
- Gopkg.lock
- Gopkg.toml
- LICENSE
- Makefile
- add-model.sh
- build
- cmd
- pkg
- tag_release.sh
- vendor

Four binary image files added, not shown (207 KiB, 130 KiB, 196 KiB, 159 KiB).
Some files were not shown because too many files have changed in this diff.