Compare commits

...

376 Commits

Author SHA1 Message Date
Stefan Prodan
242d79e49d Merge pull request #159 from weaveworks/release-v0.12.0
Prepare release v0.12.0
2019-04-29 17:08:16 +03:00
stefanprodan
4f01ecde5a Update changelog 2019-04-29 16:41:26 +03:00
stefanprodan
61141c7479 Release v0.12.0 2019-04-29 16:37:48 +03:00
Stefan Prodan
62429ff710 Merge pull request #158 from weaveworks/docs-supergloo
Add SuperGloo install docs
2019-04-29 16:35:58 +03:00
stefanprodan
82a1f45cc1 Fix load tester image repo 2019-04-29 11:17:19 +03:00
stefanprodan
1a95fc2a9c Add SuperGloo install docs 2019-04-26 19:51:09 +03:00
Stefan Prodan
13816eeafa Merge pull request #151 from yuval-k/supergloo
Supergloo Support
2019-04-25 23:18:22 +03:00
Yuval Kohavi
5279f73c17 use name.namespace instead of namespace.name 2019-04-25 11:10:23 -04:00
Yuval Kohavi
d196bb2856 e2e test 2019-04-24 16:00:55 -04:00
Yuval Kohavi
3f8f634a1b add e2e tests 2019-04-23 18:06:46 -04:00
Yuval Kohavi
5ba27c898e remove todo 2019-04-23 07:42:52 -04:00
Stefan Prodan
57f1b63fa1 Merge pull request #156 from weaveworks/docs-fix
Fix Tiller-less install command
2019-04-22 20:15:11 +03:00
stefanprodan
d69e203479 Fix Tiller-less install command 2019-04-22 20:08:56 +03:00
Yuval Kohavi
4d7fae39a8 add retries and cors 2019-04-19 14:41:50 -04:00
Yuval Kohavi
2dc554c92a dep ensure twice 2019-04-19 11:23:32 -04:00
Yuval Kohavi
21c394ef7f pin supergloo\solo-kit 2019-04-19 11:20:28 -04:00
Yuval Kohavi
2173bfc1a0 Merge remote-tracking branch 'origin/master' into supergloo-updated 2019-04-19 11:17:37 -04:00
Yuval Kohavi
a19d016e14 more rules 2019-04-19 10:59:04 -04:00
Stefan Prodan
8f1b5df9e2 Merge pull request #154 from weaveworks/dep-update
Disable bats in load tester artifacts
2019-04-19 13:03:39 +03:00
stefanprodan
2d6b8ecfdf Disable bats in load tester artifacts 2019-04-19 13:02:20 +03:00
Stefan Prodan
8093612011 Merge pull request #153 from weaveworks/dep-update
Update Kubernetes packages to 1.13.1
2019-04-18 20:07:33 +03:00
stefanprodan
39dc761e32 Make codegen work with the klog shim 2019-04-18 19:56:48 +03:00
stefanprodan
0c68983c62 Update deps to Kubernetes 1.13.1 2019-04-18 19:30:55 +03:00
Stefan Prodan
c7539f6e4b Merge pull request #152 from weaveworks/release-v0.11.1
Prepare release v0.11.1
2019-04-18 16:14:55 +03:00
stefanprodan
8cebc0acee Update changelog for v0.11.1 2019-04-18 15:40:48 +03:00
stefanprodan
f60c4d60cf Release v0.11.1 2019-04-18 14:50:26 +03:00
stefanprodan
662f9cba2e Add bats tests to load tester artifacts 2019-04-18 14:34:25 +03:00
stefanprodan
4a82e1e223 Use the builtin metrics in docs 2019-04-18 14:25:55 +03:00
stefanprodan
b60b912bf8 Use the builtin metrics in artifacts 2019-04-18 13:53:13 +03:00
stefanprodan
093348bc60 Release loadtester 0.3.0 with bats support 2019-04-18 13:45:32 +03:00
Yuval Kohavi
37ebbf14f9 fix compile 2019-04-17 18:44:33 -04:00
Yuval Kohavi
156488c8d5 Merge remote-tracking branch 'origin/master' into supergloo-updated 2019-04-17 18:24:41 -04:00
Yuval Kohavi
68d1f583cc more tests 2019-04-17 13:04:02 -04:00
Stefan Prodan
3492b07d9a Merge pull request #150 from weaveworks/release-v0.11.0
Release v0.11.0
2019-04-17 11:25:16 +03:00
stefanprodan
d0b582048f Add change log for v0.11.0 2019-04-17 11:22:02 +03:00
stefanprodan
a82eb7b01f Release v0.11.0 2019-04-17 11:13:31 +03:00
stefanprodan
cd08afcbeb Add bash bats task runner
- run bats tests (blocking requests)
2019-04-17 11:05:10 +03:00
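The bats task runner above can be wired into the canary analysis as a blocking acceptance test. A minimal sketch, assuming a pre-rollout webhook pointed at the load tester service; the URL and the metadata keys (`type`, `cmd`) are illustrative assumptions, not taken from this repository:

```yaml
webhooks:
  - name: acceptance-test
    type: pre-rollout                      # blocks the rollout until the command succeeds
    url: http://flagger-loadtester.test/   # hypothetical load tester service address
    timeout: 30s
    metadata:
      type: bash                           # assumed task type for the bats runner
      cmd: "bats /bats/acceptance.bats"    # assumed path to a bats test suite
```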
stefanprodan
331942a4ed Switch to Docker Hub from Quay 2019-04-17 11:03:38 +03:00
Yuval Kohavi
aa24d6ff7e minor change 2019-04-16 19:16:49 -04:00
Yuval Kohavi
58c2c19f1e re generate code 2019-04-16 19:16:37 -04:00
Yuval Kohavi
2a91149211 starting adding tests 2019-04-16 19:16:15 -04:00
Yuval Kohavi
868482c240 basics seem working! 2019-04-16 15:10:08 -04:00
Stefan Prodan
4e387fa943 Merge pull request #149 from weaveworks/e2e
Add end-to-end tests for A/B testing and pre/post hooks
2019-04-16 11:11:01 +03:00
stefanprodan
15484363d6 Add A/B testing and hooks to e2e readme 2019-04-16 11:00:03 +03:00
stefanprodan
50b7b74480 Speed up e2e tests by reducing the number of iterations 2019-04-16 10:41:35 +03:00
stefanprodan
adb53c63dd Add e2e tests for A/B testing and pre/post hooks 2019-04-16 10:26:00 +03:00
Stefan Prodan
bdc3a32e96 Merge pull request #148 from weaveworks/selectors
Make the pod selector label configurable
2019-04-16 09:51:05 +03:00
stefanprodan
65f716182b Add default selectors to docs 2019-04-15 13:30:58 +03:00
stefanprodan
6ef72e2550 Make the pod selector configurable
- default labels: app, name and app.kubernetes.io/name
2019-04-15 12:57:25 +03:00
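In practice the target Deployment has to carry one of those selector labels. A minimal sketch, assuming a demo workload named podinfo; the name and the chosen label are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: podinfo   # any of the defaults (app, name, app.kubernetes.io/name) works
  template:
    metadata:
      labels:
        app.kubernetes.io/name: podinfo
```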
stefanprodan
60f51ad7d5 Move deployer and config tracker to canary package 2019-04-15 11:27:08 +03:00
stefanprodan
a09dc2cbd8 Rename logging package 2019-04-15 11:25:45 +03:00
Stefan Prodan
825d07aa54 Merge pull request #147 from weaveworks/pre-post-rollout-hooks
Add pre/post-rollout webhooks
2019-04-14 19:52:19 +03:00
stefanprodan
f46882c778 Update cert-manager to v0.7 in GKE docs 2019-04-14 12:24:35 +03:00
stefanprodan
663fa08cc1 Add hook type and status to CRD schema validation 2019-04-13 21:21:51 +03:00
stefanprodan
19e625d38e Add pre/post rollout webhooks to docs 2019-04-13 20:30:19 +03:00
stefanprodan
edcff9cd15 Execute pre/post rollout webhooks
- halt the canary advancement if pre-rollout hooks are failing
- include the canary status (Succeeded/Failed) in the post-rollout webhook payload
- ignore post-rollout webhook failures
- log pre/post rollout webhook response result
2019-04-13 15:43:23 +03:00
stefanprodan
e0fc5ecb39 Add hook type to CRD
- pre-rollout execute webhook before routing traffic to canary
- rollout execute webhook during the canary analysis on each iteration
- post-rollout execute webhook after the canary has been promoted or rolled back
Add canary phase to webhook payload
2019-04-13 15:37:41 +03:00
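A minimal sketch of the three hook types described in the commit above, assuming they live under the canary analysis webhooks list; names and URLs are placeholders:

```yaml
canaryAnalysis:
  webhooks:
    - name: smoke-test
      type: pre-rollout     # runs before traffic is routed to the canary
      url: http://flagger-loadtester.test/
    - name: load-test
      type: rollout         # runs on every iteration of the canary analysis
      url: http://flagger-loadtester.test/
    - name: notify
      type: post-rollout    # runs after the canary has been promoted or rolled back
      url: http://event-receiver.test/
```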
stefanprodan
4ac6629969 Exclude docs branches from CI 2019-04-13 15:33:03 +03:00
Stefan Prodan
68d8dad7c8 Merge pull request #146 from weaveworks/metrics
Unify App Mesh and Istio builtin metric checks
2019-04-13 00:47:35 +03:00
stefanprodan
4ab9ceafc1 Use metrics alias in e2e tests 2019-04-12 17:43:47 +03:00
stefanprodan
352ed898d4 Add request success rate and duration metrics alias 2019-04-12 17:00:04 +03:00
stefanprodan
e091d6a50d Set Envoy request duration to ms 2019-04-12 16:25:34 +03:00
stefanprodan
c651ef00c9 Add Envoy request duration test 2019-04-12 16:09:00 +03:00
stefanprodan
4b17788a77 Add Envoy request duration P99 query 2019-04-12 16:02:23 +03:00
Yuval Kohavi
e5612bca50 dep ensure 2019-04-10 20:20:10 -04:00
Yuval Kohavi
d21fb1afe8 initial supergloo code 2019-04-10 20:19:49 -04:00
Yuval Kohavi
89d0a533e2 dep + vendor 2019-04-10 20:03:53 -04:00
Stefan Prodan
db673dddd9 Merge pull request #141 from weaveworks/default-mesh
Set default mesh gateway only if no gateway is specified
2019-04-08 10:26:33 +03:00
stefanprodan
88ad457e87 Add default mesh gateway to docs and examples 2019-04-06 12:26:11 +03:00
stefanprodan
126b68559e Set default mesh gateway if no gateway is specified 2019-04-06 12:21:30 +03:00
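A minimal sketch of the behaviour described above, assuming the gateways list sits under the canary service spec; the gateway names are illustrative:

```yaml
service:
  port: 9898
  gateways:     # when this list is omitted, Flagger falls back to the built-in "mesh" gateway
    - public-gateway.istio-system.svc.cluster.local
    - mesh
```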
stefanprodan
2cd3fe47e6 Remove FAQ from docs site index 2019-04-03 19:33:37 +03:00
Stefan Prodan
15eb7cce55 Merge pull request #139 from dholbach/add-faq-stub
Add faq stub
2019-04-03 18:33:16 +03:00
Daniel Holbach
13f923aabf remove FAQ from main page for now, as requested by Stefan 2019-04-03 17:25:45 +02:00
Daniel Holbach
0ffb112063 move FAQ into separate file, add links 2019-04-03 14:49:21 +02:00
Daniel Holbach
b4ea6af110 add FAQ stub
closes: #132
2019-04-03 11:10:55 +02:00
Daniel Holbach
611c8f7374 make markdownlint happy
Signed-off-by: Daniel Holbach <daniel@weave.works>
2019-04-03 10:56:04 +02:00
Daniel Holbach
1cc73f37e7 go OCD on the feature table
Signed-off-by: Daniel Holbach <daniel@weave.works>
2019-04-03 10:52:40 +02:00
Stefan Prodan
ca37fc0eb5 Merge pull request #136 from weaveworks/feature-list
Add Istio vs App Mesh feature list
2019-04-02 13:37:20 +03:00
stefanprodan
5380624da9 Add CORS, retries and timeouts to feature list 2019-04-02 13:27:32 +03:00
stefanprodan
aaece0bd44 Add acceptance tests to feature list 2019-04-02 13:05:01 +03:00
stefanprodan
de7cc17f5d Add Istio vs App Mesh feature list 2019-04-02 12:57:15 +03:00
Stefan Prodan
66efa39d27 Merge pull request #134 from dholbach/add-features-section
Seed a features section
2019-04-02 12:44:17 +03:00
Stefan Prodan
ff7c0a105d Merge pull request #130 from weaveworks/metrics-refactor
Refactor the metrics observer
2019-04-02 12:41:59 +03:00
Daniel Holbach
7b29253df4 use white-check-mark instead - hope this work this time 2019-04-01 18:51:44 +02:00
Daniel Holbach
7ef63b341e use white-heavy-check-mark instead 2019-04-01 18:47:10 +02:00
Daniel Holbach
e7bfaa4f1a factor in Stefan's feedback - use a table instead 2019-04-01 18:44:51 +02:00
Daniel Holbach
3a9a408941 Seed a features section
Closes: #133

Signed-off-by: Daniel Holbach <daniel@weave.works>
2019-04-01 18:09:57 +02:00
stefanprodan
3e43963daa Wait for Istio pods to be ready 2019-04-01 14:52:03 +03:00
stefanprodan
69a6e260f5 Bring down the Istio e2e CPU requests 2019-04-01 14:40:11 +03:00
stefanprodan
664e7ad555 Debug e2e load test 2019-04-01 14:33:47 +03:00
stefanprodan
ee4a009a06 Print Istio e2e status 2019-04-01 14:09:19 +03:00
stefanprodan
36dfd4dd35 Bring down Istio Pilot memory requests 2019-04-01 13:09:52 +03:00
stefanprodan
dbf36082b2 Revert Istio e2e to 1.1.0 2019-04-01 12:55:00 +03:00
stefanprodan
3a1018cff6 Change Istio e2e limits 2019-03-31 18:00:03 +03:00
stefanprodan
fc10745a1a Upgrade Istio e2e to v1.1.1 2019-03-31 14:43:08 +03:00
stefanprodan
347cfd06de Upgrade Kubernetes Kind to v0.2.1 2019-03-31 14:30:27 +03:00
stefanprodan
ec759ce467 Add Envoy success rate test 2019-03-31 14:17:39 +03:00
stefanprodan
f211e0fe31 Use go templates to render the builtin promql queries 2019-03-31 13:55:14 +03:00
stefanprodan
c91a128b65 Fix observer mock init 2019-03-30 11:55:41 +02:00
stefanprodan
6a080f3032 Rename observer and recorder 2019-03-30 11:49:43 +02:00
stefanprodan
b2c12c1131 Move observer to metrics package 2019-03-30 11:45:39 +02:00
Stefan Prodan
b945b37089 Merge pull request #127 from weaveworks/metrics
Refactor Prometheus recorder
2019-03-28 15:49:08 +02:00
stefanprodan
9a5529a0aa Add flagger_info metric to docs 2019-03-28 12:11:25 +02:00
stefanprodan
025785389d Refactor Prometheus recorder
- add flagger_info gauge metric
- expose the version and mesh provider as labels
- move the recorder to the metrics package
2019-03-28 11:58:19 +02:00
stefanprodan
48d9a0dede Ensure the status metric is set after a restart 2019-03-28 11:52:13 +02:00
Stefan Prodan
fbdf38e990 Merge pull request #124 from weaveworks/release-v0.10.0
Release v0.10.0 (AWS App Mesh edition)
2019-03-27 14:20:50 +02:00
stefanprodan
ef5bf70386 Update go to 1.12 2019-03-27 13:02:30 +02:00
stefanprodan
274c1469b4 Update changelog v0.10.0 2019-03-27 09:46:06 +02:00
stefanprodan
960d506360 Upgrade mesh definition to v1beta1 2019-03-27 09:40:11 +02:00
stefanprodan
79a6421178 Move load tester to Weaveworks Quay 2019-03-27 09:35:34 +02:00
Stefan Prodan
8b5c004860 Merge pull request #123 from weaveworks/appmesh-v1beta1
Update App Mesh to v1beta1
2019-03-26 21:53:34 +02:00
stefanprodan
f54768772e Fix App Mesh success rate graph 2019-03-26 21:30:18 +02:00
stefanprodan
b9075dc6f9 Update App Mesh to v1beta1 2019-03-26 20:29:40 +02:00
Stefan Prodan
107596ad54 Merge pull request #122 from weaveworks/prep-0.10.0
Reconcile ClusterIP services and prep v0.10.0
2019-03-26 17:25:57 +02:00
stefanprodan
3c6a2b1508 Update changelog for v0.10.0 2019-03-26 17:12:46 +02:00
stefanprodan
f996cba354 Set Grafana dashboards readable IDs 2019-03-26 17:12:46 +02:00
stefanprodan
bdd864fbdd Add port and mesh name to CRD validation 2019-03-26 17:12:46 +02:00
stefanprodan
ca074ef13f Rename router sync to reconcile 2019-03-26 17:12:46 +02:00
stefanprodan
ddd3a8251e Reconcile ClusterIP services
- add svc update tests fix: #114
2019-03-26 17:12:46 +02:00
stefanprodan
f5b862dc1b Add App Mesh docs to readme 2019-03-26 17:12:45 +02:00
stefanprodan
d45d475f61 Add Grafana dashboard link and update screen 2019-03-26 17:12:45 +02:00
stefanprodan
d0f72ea3fa Bump Flagger version to 0.10.0 2019-03-26 17:12:45 +02:00
stefanprodan
5ed5d1e5b6 Fix load tester install command for zsh 2019-03-26 11:57:01 +02:00
stefanprodan
311b14026e Release loadtester chart v0.2.0 2019-03-26 11:33:07 +02:00
stefanprodan
67cd722b54 Release Grafana chart v1.1.0 2019-03-26 11:16:42 +02:00
stefanprodan
7f6247eb7b Add jq requirement to App Mesh installer 2019-03-26 10:14:44 +02:00
Stefan Prodan
f3fd515521 Merge pull request #121 from weaveworks/stats
Fix canary status prom metric
2019-03-26 09:58:54 +02:00
stefanprodan
9db5dd0d7f Update changelog 2019-03-26 09:58:40 +02:00
stefanprodan
d07925d79d Fix canary status prom metrics 2019-03-25 17:26:22 +02:00
Stefan Prodan
662d0f31ff Merge pull request #119 from weaveworks/appmesh-ref
App Mesh docs
2019-03-25 15:13:35 +02:00
stefanprodan
2c5ad0bf8f Disable App Mesh ingress for load tester 2019-03-25 15:00:57 +02:00
stefanprodan
0ea76b986a Prevent the CRD from being removed by Helm 2019-03-25 15:00:23 +02:00
stefanprodan
3c4253c336 Docs fixes 2019-03-25 14:59:55 +02:00
stefanprodan
77ba28e91c Use App Mesh install script 2019-03-25 09:41:56 +02:00
stefanprodan
6399e7586c Update overview diagram 2019-03-24 15:10:13 +02:00
Stefan Prodan
1caa62adc8 Merge pull request #118 from weaveworks/appmesh
App Mesh refactoring + docs
2019-03-24 13:53:40 +02:00
stefanprodan
8fa558f124 Add intro do App Mesh tutorial 2019-03-24 13:38:00 +02:00
stefanprodan
191228633b Add App Mesh backend example 2019-03-24 13:11:03 +02:00
stefanprodan
8a981f935a Add App Mesh GitOps diagram 2019-03-24 12:26:35 +02:00
stefanprodan
a8ea9adbcc Add the Slack notifications to App Mesh tutorial 2019-03-23 17:27:53 +02:00
stefanprodan
685d94c44b Add Grafana screen to App Mesh docs 2019-03-23 16:26:24 +02:00
stefanprodan
153ed1b044 Add the App Mesh ingress to docs 2019-03-23 16:09:30 +02:00
stefanprodan
7788f3a1ba Add virtual node for App Mesh ingress 2019-03-23 15:56:52 +02:00
stefanprodan
cd99225f9b Add App Mesh canary deployments tutorial 2019-03-23 15:41:04 +02:00
stefanprodan
33ba3b8d4a Add App Mesh canary demo definitions 2019-03-23 15:40:27 +02:00
stefanprodan
d222dd1069 Change GitHub raw URLs to Weaveworks org 2019-03-23 13:38:57 +02:00
stefanprodan
39fd3d46ba Add App Mesh ingress gateway deployment 2019-03-23 13:22:33 +02:00
stefanprodan
419c1804b6 Add App Mesh telemetry deployment 2019-03-23 13:21:52 +02:00
stefanprodan
ae0351ddad Exclude the namespace from AppMesh object names
ref: https://github.com/aws/aws-app-mesh-controller-for-k8s/issues/14
2019-03-23 11:25:39 +02:00
stefanprodan
941be15762 Fix typo in comments 2019-03-23 11:25:31 +02:00
stefanprodan
578ebcf6ed Use pod name filter in Envoy metrics query
Add support for Weave Cloud Prometheus agent
2019-03-23 11:20:51 +02:00
Stefan Prodan
27ab4b08f9 Merge pull request #112 from weaveworks/appmesh-grafana
Add AppMesh Grafana dashboard
2019-03-21 16:19:05 +02:00
stefanprodan
428b2208ba Add Flagger logo svg format (CNCF landscape) 2019-03-21 16:05:51 +02:00
Stefan Prodan
438c553d60 Merge pull request #113 from tanordheim/service-port-name
Allow setting name of ports in generated services
2019-03-21 15:38:21 +02:00
Trond Nordheim
90cb293182 Add GRPC canary analysis custom query example 2019-03-21 14:27:11 +01:00
Trond Nordheim
1f9f93ebe4 Add portName information to how it works-guide 2019-03-21 12:41:44 +01:00
Trond Nordheim
f5b97fbb74 Add support for naming generated service ports 2019-03-21 12:37:02 +01:00
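A minimal sketch of a named service port, e.g. for gRPC traffic; apart from `portName`, which the commit above introduces, the layout and values are illustrative:

```yaml
service:
  port: 9898
  portName: grpc   # name applied to the port of the generated ClusterIP services
```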
stefanprodan
ce79244126 Add maintainers 2019-03-21 12:34:28 +02:00
stefanprodan
3af5d767d8 Add AppMesh dashboard screen 2019-03-21 12:19:04 +02:00
stefanprodan
3ce3efd2f2 Fix Grafana port forward command 2019-03-21 12:11:11 +02:00
stefanprodan
8108edea31 Add AppMesh Grafana dashboard 2019-03-21 12:06:50 +02:00
Stefan Prodan
5d80087ab3 Merge pull request #109 from weaveworks/appmesh-docs
Add EKS App Mesh install docs
2019-03-21 11:02:35 +02:00
stefanprodan
6593be584d Remove namespace from Prometheus URL 2019-03-21 10:52:21 +02:00
stefanprodan
a0f63f858f Update changelog and roadmap 2019-03-21 09:57:49 +02:00
stefanprodan
49914f3bd5 Add EKS App Mesh install link to readme 2019-03-20 23:46:14 +02:00
stefanprodan
71988a8b98 Add EKS App Mesh install guide 2019-03-20 23:40:09 +02:00
stefanprodan
d65be6ef58 Add AppMesh virtual node to load tester chart 2019-03-20 23:39:34 +02:00
Stefan Prodan
38c40d02e7 Merge pull request #108 from weaveworks/ww
Move to weaveworks org
2019-03-20 19:10:56 +02:00
stefanprodan
9e071b9d60 Move to weaveworks Quay 2019-03-20 18:52:27 +02:00
stefanprodan
b4ae060122 Move to weaveworks org 2019-03-20 18:26:04 +02:00
Stefan Prodan
436656e81b Merge pull request #107 from stefanprodan/appmesh
AWS App Mesh integration
2019-03-20 17:57:25 +02:00
stefanprodan
d7e111b7d4 Add mesh provider option to Helm chart 2019-03-20 12:59:35 +02:00
stefanprodan
4b6126dd1a Add Envoy HTTP success rate metric check 2019-03-19 15:52:26 +02:00
stefanprodan
89faa70196 Fix canary virtual node DNS discovery 2019-03-19 15:51:17 +02:00
stefanprodan
6ed9d4a1db Add AppMesh CRDs to Flagger's RBAC 2019-03-18 10:13:46 +02:00
stefanprodan
9d0e38c2e1 Disable primary readiness check in tests 2019-03-17 12:50:48 +02:00
stefanprodan
8b758fd616 Set primary min replicas 2019-03-17 12:30:11 +02:00
stefanprodan
14369d8be3 Fix virtual node backends 2019-03-17 12:29:04 +02:00
stefanprodan
7b4153113e Fix router tests 2019-03-17 11:14:47 +02:00
stefanprodan
7d340c5e61 Change mesh providers based on cmd flag 2019-03-17 10:52:52 +02:00
stefanprodan
337c94376d Add AppMesh routing tests 2019-03-17 10:29:40 +02:00
stefanprodan
0ef1d0b2f1 Implement AppMesh routing ops 2019-03-17 10:29:21 +02:00
stefanprodan
5cf67bd4e0 Add AppMesh router sync tests 2019-03-16 14:14:24 +02:00
stefanprodan
f22be17852 Add AppMesh router sync implementation
Sync virtual nodes and virtual services
2019-03-16 14:13:58 +02:00
stefanprodan
48e79d5dd4 Add mesh provider flag 2019-03-16 14:12:23 +02:00
stefanprodan
59f5a0654a Add AppMesh fields to Canary CRD 2019-03-16 14:11:24 +02:00
stefanprodan
6da2e11683 Add AppMesh CRDs and Kubernetes client 2019-03-16 14:10:09 +02:00
stefanprodan
802c087a4b Fix Istio Gateway certificate
fix: #102
2019-03-14 17:52:58 +02:00
Stefan Prodan
ed2048e9f3 Merge pull request #105 from stefanprodan/svc
Copy pod labels from canary to primary
2019-03-14 01:59:20 +02:00
stefanprodan
437b1d30c0 Copy labels from canary to primary 2019-03-14 01:49:07 +02:00
stefanprodan
ba1788cbc5 Change default ClusterIP to point to primary
- ensures that the routing works without a service mesh
2019-03-14 01:48:10 +02:00
Stefan Prodan
773094a20d Merge pull request #99 from stefanprodan/loadtester-v0.2.0
Release loadtester v0.2.0
2019-03-12 12:07:47 +02:00
stefanprodan
5aa39106a0 Update loadtester cmd for e2e testing 2019-03-12 11:58:16 +02:00
stefanprodan
a9167801ba Release loadtester v0.2.0 2019-03-12 11:55:31 +02:00
Stefan Prodan
62f4a6cb96 Merge pull request #90 from cloudang/ngrinder
Support delegation to external load testing tools
2019-03-12 11:47:30 +02:00
Stefan Prodan
ea2b41e96e Merge pull request #98 from peterj/master
Upgrade Alpine to 3.9
2019-03-12 10:18:16 +02:00
Alex Wong
d28ce650e9 fix typo 2019-03-12 15:40:20 +08:00
Alex Wong
1bfcdba499 update vendor 2019-03-12 15:00:55 +08:00
Alex Wong
e48faa9144 add docs for ngrinder load testing 2019-03-12 14:59:55 +08:00
Alex Wong
33fbe99561 remove logCmdOutput from docs and k8s resources definition 2019-03-12 14:35:39 +08:00
Alex Wong
989925b484 update canary spec example, cmd flag logCmdOutput moved here 2019-03-12 14:33:23 +08:00
Alex Wong
7dd66559e7 add metadata field 'cmd' 2019-03-12 14:31:30 +08:00
Alex Wong
2ef1c5608e remove logCmdOutput flag 2019-03-12 14:31:00 +08:00
Alex Wong
b5932e8905 support time duration literal 2019-03-12 14:29:50 +08:00
Peter Jausovec
37999d3250 Upgrade Alpine to 3.9. Fixes #89 2019-03-11 20:17:15 -07:00
Stefan Prodan
83985ae482 Merge pull request #93 from stefanprodan/release-v0.9.0
Release v0.9.0
2019-03-11 15:26:00 +02:00
Stefan Prodan
3adfcc837e Merge pull request #94 from stefanprodan/fix-abtest-routing
Fix A/B Testing HTTP URI match conditions
2019-03-11 15:15:42 +02:00
stefanprodan
c720fee3ab Target the canary header in the load test 2019-03-11 15:04:01 +02:00
stefanprodan
881387e522 Fix HTTP URI match conditions 2019-03-11 14:54:17 +02:00
stefanprodan
d9f3378e29 Add change log for v0.9.0 2019-03-11 14:03:55 +02:00
stefanprodan
ba87620225 Release v0.9.0 2019-03-11 13:57:10 +02:00
Stefan Prodan
1cd0c49872 Merge pull request #88 from stefanprodan/ab-testing
A/B testing - canary with session affinity
2019-03-11 13:55:06 +02:00
stefanprodan
12ac96deeb Document how to enable A/B testing 2019-03-11 12:58:33 +02:00
Alex Wong
17e6f35785 add gock.v1 dependency 2019-03-11 10:07:50 +08:00
Stefan Prodan
bd115633a3 Merge pull request #91 from huydinhle/update-analysis-interval
Sync job when canary's interval changes
fix #86
2019-03-09 19:58:34 +02:00
stefanprodan
86ea172380 Fix weight metric report 2019-03-08 23:28:45 +02:00
stefanprodan
d87bbbbc1e Add A/B testing tutorial 2019-03-08 21:26:52 +02:00
Huy Le
6196f69f4d Create New Job when Canary's Interval changes
- Currently whenever the Canary analysis interval changes, flagger does
not reflect this into canary's job.
- This change will make sure the canary analysis interval got updated whenever
the Canary object's interval changes
2019-03-08 10:27:34 -08:00
Alex Wong
be31bcf22f mocked test 2019-03-08 22:20:29 +08:00
Alex Wong
cba2135c69 add comments 2019-03-08 22:20:16 +08:00
Alex Wong
2e52573499 add gock dep 2019-03-08 22:20:02 +08:00
Alex Wong
b2ce1ed1fb test for ngrinder task 2019-03-08 21:30:26 +08:00
Alex Wong
77a485af74 poll ngrinder task status 2019-03-08 21:29:58 +08:00
stefanprodan
d8b847a973 Mention session affinity in docs 2019-03-08 15:05:44 +02:00
stefanprodan
e80a3d3232 Add A/B testing scheduling unit tests 2019-03-08 13:06:39 +02:00
stefanprodan
780ba82385 Log namespace restriction if one exists 2019-03-08 13:05:25 +02:00
stefanprodan
6ba69dce0a Add iterations field to CRD validation 2019-03-08 12:31:35 +02:00
stefanprodan
3c7a561db8 Add Istio routes A/B testing unit tests 2019-03-08 12:24:43 +02:00
stefanprodan
49c942bea0 Add A/B testing examples 2019-03-08 11:55:04 +02:00
stefanprodan
bf1ca293dc Implement fix routing for canary analysis
Allow A/B testing scenarios where instead of weighted routing the traffic is split between the primary and canary based on HTTP headers or cookies.
2019-03-08 11:54:41 +02:00
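A minimal sketch of the header/cookie based split described above, assuming the match conditions and iterations sit in the canary analysis spec; the header names and regex are illustrative:

```yaml
canaryAnalysis:
  iterations: 10          # fixed number of iterations instead of weight stepping
  match:
    - headers:
        x-canary:
          exact: "insider"                         # route requests with this header to the canary
    - headers:
        cookie:
          regex: "^(.*?;)?(canary=always)(;.*)?$"  # or requests carrying this cookie
```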
stefanprodan
62b906d30b Add canary HTTP match conditions and iterations 2019-03-08 11:49:32 +02:00
Alex Wong
65bf048189 add ngrinder support 2019-03-08 15:50:44 +08:00
Alex Wong
a498ed8200 move original cmd tester to standalone source 2019-03-08 15:50:26 +08:00
Alex Wong
9f12bbcd98 refactoring loadtester to support external testing platform 2019-03-08 15:49:35 +08:00
Stefan Prodan
fcd520787d Merge pull request #84 from stefanprodan/release-v0.8.0
Release v0.8.0
2019-03-06 21:30:09 +02:00
stefanprodan
e2417e4e40 Skip e2e tests for release branches 2019-03-06 21:21:48 +02:00
stefanprodan
70a2cbf1c6 Add change log for v0.8.0 2019-03-06 21:17:37 +02:00
stefanprodan
fa0c6af6aa Release v0.8.0 2019-03-06 21:17:13 +02:00
Stefan Prodan
4f1abd0c8d Merge pull request #83 from stefanprodan/cors-policy
Add CORS policy support
2019-03-06 20:31:37 +02:00
stefanprodan
41e839aa36 Fix virtual service example 2019-03-06 15:56:20 +02:00
stefanprodan
2fd1593ad2 Use service headers to set Envoy timeout 2019-03-06 15:38:14 +02:00
stefanprodan
27b601c5aa Add CORS policy example 2019-03-06 15:37:28 +02:00
stefanprodan
5fc69134e3 Add CORS policy test 2019-03-06 15:34:51 +02:00
stefanprodan
9adc0698bb Add CORS policy to Istio router 2019-03-06 15:34:36 +02:00
stefanprodan
119c2ff464 Add CORS policy to Canary CRD 2019-03-06 15:33:53 +02:00
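A minimal sketch of a CORS policy attached to the canary service, assuming it mirrors Istio's CorsPolicy fields; the origins and values are illustrative:

```yaml
service:
  port: 9898
  corsPolicy:
    allowOrigin:
      - example.com
    allowMethods:
      - GET
      - POST
    allowHeaders:
      - x-some-header
    maxAge: 24h
```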
Stefan Prodan
f3a4201c7d Merge pull request #82 from stefanprodan/headers-ops
Add support for HTTP request header manipulation rules
2019-03-06 14:58:05 +02:00
stefanprodan
8b6aa73df0 Fix request header test 2019-03-06 13:51:04 +02:00
stefanprodan
1d4dfb0883 Add request header add test 2019-03-06 13:46:19 +02:00
stefanprodan
eab7f126a6 Use request.add for header append operation 2019-03-06 13:45:46 +02:00
stefanprodan
fe7547d83e Update Envoy headers example 2019-03-06 12:42:34 +02:00
stefanprodan
7d0df82861 Add header manipulation rules to Canary CRD 2019-03-06 12:41:53 +02:00
stefanprodan
7f0cd27591 Add Header manipulation rules to Istio Virtual Service 2019-03-06 12:17:41 +02:00
Stefan Prodan
e094c2ae14 Merge pull request #80 from stefanprodan/istio
Add Istio k8s client
2019-03-06 11:55:27 +02:00
Stefan Prodan
a5d438257f Merge pull request #78 from huydinhle/namespace-watcher
Add namespace flag
2019-03-06 11:10:17 +02:00
Huy Le
d8cb8f1064 Added Namespace Flag for Flagger
- introduce the namespace flag for flagger to watch a single namespace
for Canary Objects
2019-03-05 20:57:00 -08:00
stefanprodan
a8d8bb2d6f Fix go fmt 2019-03-06 01:54:31 +02:00
stefanprodan
a76ea5917c Remove knative pkg
CORS and RetryOn are missing from the knative pkg.
Until Istio has an official k8s client, we'll maintain our own.
2019-03-06 01:47:13 +02:00
stefanprodan
b0b6198ec8 Add Istio virtual service and signal packages 2019-03-06 01:43:09 +02:00
Stefan Prodan
eda97f35d2 Merge pull request #73 from huydinhle/fined-grained-rbac
Fine-grained RBAC
2019-03-06 00:06:40 +02:00
Huy Le
2b6507d35a fine-grained rbac for flagger helm 2019-03-05 11:29:34 -08:00
stefanprodan
f7c4d5aa0b Disable PR comments when coverage doesn't change 2019-03-05 16:25:30 +02:00
Stefan Prodan
74f07cffa6 Merge pull request #72 from stefanprodan/router
Refactor routing management
2019-03-05 12:28:11 +02:00
Stefan Prodan
79c8ff0af8 Merge pull request #74 from cloudang/options
Command line options for easier debugging
2019-03-05 12:07:03 +02:00
stefanprodan
ac544eea4b Extend test coverage to all packages 2019-03-05 11:59:40 +02:00
Alex Wong
231a32331b move flags to main packages 2019-03-05 17:48:55 +08:00
Alex Wong
104e8ef050 Add options for customizing threadiness, logger encoding, and global logger level 2019-03-05 14:30:23 +08:00
Alex Wong
296015faff update .gitignore 2019-03-05 12:15:27 +08:00
stefanprodan
9a9964c968 Add ClusterIP host to virtual service 2019-03-05 02:27:56 +02:00
stefanprodan
0d05d86e32 Add Istio routing tests 2019-03-05 02:18:07 +02:00
stefanprodan
9680ca98f2 Rename service router to Kubernetes router 2019-03-05 02:12:52 +02:00
stefanprodan
42b850ca52 Replace controller routing management with router pkg 2019-03-05 02:04:55 +02:00
stefanprodan
3f5c22d863 Extract routing to dedicated package
- split routing management into Kubernetes service router and Istio Virtual service router
2019-03-05 02:02:58 +02:00
Stefan Prodan
535a92e871 Merge pull request #70 from stefanprodan/append-headers
Allow headers to be appended to HTTP requests
2019-03-04 10:39:43 +02:00
stefanprodan
3411a6a981 Add delay Envoy shutdown tip to docs 2019-03-03 14:03:34 +02:00
stefanprodan
b5adee271c Add zero downtime deployments tutorial 2019-03-03 13:24:15 +02:00
stefanprodan
e2abcd1323 Add append headers PR to changelog 2019-03-03 10:33:08 +02:00
Stefan Prodan
25fbe7ecb6 Merge pull request #71 from huydinhle/namepace-typo
Fixed namepace typo in the repo
2019-03-03 10:29:29 +02:00
Huy Le
6befee79c2 Fixed namepace typo in the repo 2019-03-02 13:49:42 -08:00
stefanprodan
f09c5a60f1 Add Envoy headers to e2e tests 2019-03-02 14:26:17 +02:00
stefanprodan
52e89ff509 Add Envoy timeout and retry policy to docs 2019-03-02 13:48:19 +02:00
stefanprodan
35e20406ef Append HTTP headers when configuring routing 2019-03-02 13:35:36 +02:00
stefanprodan
c6e96ff1bb Add append headers field to Canary CRD 2019-03-02 13:33:03 +02:00
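A minimal sketch of the header manipulation rules, assuming they mirror Istio's Headers API under the canary service spec; the Envoy header values are illustrative:

```yaml
service:
  port: 9898
  headers:
    request:
      add:
        x-envoy-upstream-rq-timeout-ms: "15000"
        x-envoy-max-retries: "10"
```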
Stefan Prodan
793ab524b0 Merge pull request #68 from stefanprodan/fix-docs
Add Getting Help section to readme
2019-03-02 10:36:40 +02:00
stefanprodan
5a479d0187 Add Weaveworks Slack links 2019-03-02 10:26:54 +02:00
stefanprodan
a23e4f1d2a Add timeout and reties example to docs 2019-03-02 10:26:34 +02:00
Stefan Prodan
bd35a3f61c Merge pull request #66 from stefanprodan/fix-mesh
Avoid mesh gateway duplicates
2019-03-02 01:27:00 +02:00
stefanprodan
197e987d5f Avoid mesh gateway duplicates 2019-03-01 13:09:27 +02:00
stefanprodan
7f29beb639 Don't run e2e tests for docs branches 2019-02-28 18:55:58 +02:00
Stefan Prodan
1140af8dc7 Merge pull request #63 from stefanprodan/release-0.7.0
Release v0.7.0
2019-02-28 17:12:27 +02:00
stefanprodan
a2688c3910 Add link to custom metrics docs 2019-02-28 16:58:26 +02:00
stefanprodan
75b27ab3f3 Add change log for v0.7.0 2019-02-28 16:56:49 +02:00
stefanprodan
59d3f55fb2 Release v0.7.0 2019-02-28 16:05:48 +02:00
Stefan Prodan
f34739f334 Merge pull request #62 from stefanprodan/retries
Add timeout and retries
2019-02-28 15:36:46 +02:00
stefanprodan
90c71ec18f Update roadmap with alternatives to Istio 2019-02-28 15:09:24 +02:00
stefanprodan
395234d7c8 Add promql custom check to readme 2019-02-28 00:33:47 +02:00
stefanprodan
e322ba0065 Add timeout and retries to router 2019-02-28 00:05:40 +02:00
stefanprodan
6db8b96f72 Add timeout and retries example to docs 2019-02-28 00:02:48 +02:00
stefanprodan
44d7e96e96 Add timeout and retries fields to Canary CRD 2019-02-28 00:02:01 +02:00
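A minimal sketch of the timeout and retry fields on the canary service spec; the values are illustrative:

```yaml
service:
  port: 9898
  timeout: 30s         # HTTP request timeout applied by the mesh
  retries:
    attempts: 3
    perTryTimeout: 5s
```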
Stefan Prodan
1662479c8d Merge pull request #60 from stefanprodan/custom-metrics
Add support for custom metrics
2019-02-27 23:31:05 +02:00
stefanprodan
2e351fcf0d Add a custom metric example to docs 2019-02-27 16:37:42 +02:00
stefanprodan
5d81876d07 Make the metric interval optional
- set default value to 1m
2019-02-27 16:03:56 +02:00
stefanprodan
c81e6989ec Add e2e tests for custom metrics 2019-02-27 15:49:09 +02:00
stefanprodan
4d61a896c3 Add custom promql queries support 2019-02-27 15:48:31 +02:00
stefanprodan
d148933ab3 Add metric query field to Canary CRD 2019-02-27 15:46:09 +02:00
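A minimal sketch of a custom metric check using an inline promql query; the query below is a placeholder written against Istio's telemetry, not taken from this repository:

```yaml
canaryAnalysis:
  metrics:
    - name: "404s percentage"
      threshold: 3       # halt the analysis if more than 3% of requests return 404
      interval: 1m       # optional, defaults to one minute
      query: |
        sum(rate(istio_requests_total{reporter="destination",response_code="404"}[1m]))
        /
        sum(rate(istio_requests_total{reporter="destination"}[1m])) * 100
```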
Stefan Prodan
04a56a3591 Merge pull request #57 from stefanprodan/release-0.6.0
Release v0.6.0
2019-02-26 01:45:10 +02:00
stefanprodan
4a354e74d4 Update roadmap 2019-02-25 23:45:54 +02:00
stefanprodan
1e3e6427d5 Add link to virtual service docs 2019-02-25 23:22:49 +02:00
stefanprodan
38826108c8 Add changelog for v0.6.0 2019-02-25 23:01:35 +02:00
stefanprodan
4c4752f907 Release v0.6.0 2019-02-25 20:10:33 +02:00
Stefan Prodan
94dcd6c94d Merge pull request #55 from stefanprodan/http-match
Add HTTP match and rewrite to Canary service spec
2019-02-25 20:04:12 +02:00
stefanprodan
eabef3db30 Router improvements
- change virtual service route to canary service
- keep the existing destination weights on virtual service updates
- set the match conditions and URI rewrite when changing the traffic weight
2019-02-25 03:14:45 +02:00
stefanprodan
6750f10ffa Add HTTP match and rewrite docs 2019-02-25 03:07:39 +02:00
stefanprodan
56cb888cbf Add HTTP match and rewrite to virtual service 2019-02-25 00:08:06 +02:00
stefanprodan
b3e7fb3417 Add HTTP match and rewrite to Canary service spec 2019-02-25 00:06:14 +02:00
stefanprodan
2c6e1baca2 Update istio client 2019-02-25 00:05:09 +02:00
Stefan Prodan
c8358929d1 Merge pull request #54 from stefanprodan/vsvc
Refactor virtual service sync
2019-02-24 21:18:01 +02:00
stefanprodan
1dc7677dfb Add tests for virtual service sync 2019-02-24 19:58:01 +02:00
stefanprodan
8e699a7543 Detect changes in virtual service
- ignore destination weight when comparing the two specs
2019-02-24 18:25:12 +02:00
Stefan Prodan
cbbabdfac0 Merge pull request #53 from stefanprodan/kind
Add CircleCI workflow for end-to-end testing with Kubernetes Kind
2019-02-24 12:44:20 +02:00
stefanprodan
9d92de234c Increase promotion e2e wait time to 10s 2019-02-24 11:55:37 +02:00
stefanprodan
ba65975fb5 Add e2e testing docs 2019-02-24 11:41:22 +02:00
stefanprodan
ef423b2078 Move Flagger e2e build to a dedicated job 2019-02-24 03:10:50 +02:00
stefanprodan
f451b4e36c Split e2e prerequisites 2019-02-24 02:52:25 +02:00
stefanprodan
0856e13ee6 Use kind kubeconfig 2019-02-24 02:35:36 +02:00
stefanprodan
87b9fa8ca7 Move cluster init to prerequisites 2019-02-24 02:24:23 +02:00
stefanprodan
5b43d3d314 Use local docker image for e2e testing 2019-02-24 02:11:32 +02:00
stefanprodan
ac4972dd8d Fix e2e paths 2019-02-24 02:09:45 +02:00
stefanprodan
8a8f68af5d Test CircleCI 2019-02-24 02:02:37 +02:00
stefanprodan
c669dc0c4b Run e2e tests with CircleCI 2019-02-24 01:58:18 +02:00
stefanprodan
863a5466cc Add e2e prerequisites 2019-02-24 01:58:03 +02:00
stefanprodan
e2347c84e3 Use absolute paths in e2e tests 2019-02-24 01:11:04 +02:00
stefanprodan
e0e673f565 Install e2e deps and run tests 2019-02-24 01:03:39 +02:00
stefanprodan
30cbf2a741 Add e2e tests
- create Kubernetes cluster with Kind
- install Istio and Prometheus
- install Flagger
- test canary init and promotion
2019-02-24 01:02:15 +02:00
stefanprodan
f58de3801c Add Istio install values for e2e testing 2019-02-24 01:00:03 +02:00
Stefan Prodan
7c6b88d4c1 Merge pull request #51 from carlossg/update-virtualservice
Update VirtualService when the Canary service spec changes
2019-02-20 09:07:27 +00:00
Carlos Sanchez
0c0ebaecd5 Compare only hosts and gateways 2019-02-19 19:54:38 +01:00
Carlos Sanchez
1925f99118 If generated VirtualService already exists update it
Only if spec has changed
2019-02-19 19:40:46 +01:00
Stefan Prodan
6f2a22a1cc Merge pull request #47 from stefanprodan/release-0.5.1
Release v0.5.1
2019-02-14 12:12:11 +01:00
stefanprodan
ee04082cd7 Release v0.5.1 2019-02-13 18:59:34 +02:00
Stefan Prodan
efd901ac3a Merge pull request #46 from stefanprodan/skip-canary
Add option to skip the canary analysis
2019-02-13 17:28:07 +01:00
stefanprodan
e565789ae8 Add link to Helm GitOps repo 2019-02-13 18:18:37 +02:00
stefanprodan
d3953004f6 Add docs links and trim down the readme 2019-02-13 16:39:48 +02:00
stefanprodan
df1d9e3011 Add skip analysis test 2019-02-13 15:56:40 +02:00
stefanprodan
631c55fa6e Document how to skip the canary analysis 2019-02-13 15:31:01 +02:00
stefanprodan
29cdd43288 Implement skip analysis
When skip analysis is enabled, Flagger checks if the canary deployment is healthy and promotes it without analysing it. If an analysis is underway, Flagger cancels it and runs the promotion.
2019-02-13 15:30:29 +02:00
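A minimal sketch of the switch described above; `skipAnalysis` is the field name added to the CRD in the commit that follows, everything else is illustrative:

```yaml
spec:
  skipAnalysis: true   # promote a healthy canary directly, without running the analysis
```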
stefanprodan
9b79af9fcd Add skipAnalysis field to Canary CRD 2019-02-13 15:27:45 +02:00
stefanprodan
2c9c1adb47 Fix docs summary 2019-02-13 13:05:57 +02:00
Stefan Prodan
5dfb5808c4 Merge pull request #44 from stefanprodan/helm-docs
Add Helm and Weave Flux GitOps article
2019-02-13 11:51:38 +01:00
stefanprodan
bb0175aebf Add canary rollback scenario 2019-02-13 12:48:26 +02:00
stefanprodan
adaf4c99c0 Add GitOps example to Helm guide 2019-02-13 02:14:40 +02:00
stefanprodan
bed6ed09d5 Add tutorial for canaries with Helm 2019-02-13 00:52:49 +02:00
stefanprodan
4ff67a85ce Add configmap demo to podinfo 2019-02-13 00:51:44 +02:00
stefanprodan
702f4fcd14 Add configmap demo to podinfo 2019-02-12 19:12:10 +02:00
Stefan Prodan
8a03ae153d Merge pull request #43 from stefanprodan/app-validation
Add validation for label selectors
2019-02-11 10:55:34 +01:00
stefanprodan
434c6149ab Package all charts 2019-02-11 11:47:46 +02:00
stefanprodan
97fc4a90ae Add validation for label selectors
- Reject deployment if the pod label selector doesn't match 'app: <DEPLOYMENT_NAME>'
2019-02-11 11:46:59 +02:00
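A minimal sketch of a Deployment that passes this validation, assuming a deployment named podinfo; the name is illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
spec:
  selector:
    matchLabels:
      app: podinfo      # must match app: <DEPLOYMENT_NAME>
  template:
    metadata:
      labels:
        app: podinfo
```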
Stefan Prodan
217ef06930 Merge pull request #41 from stefanprodan/demo
Add canary deployment demo Helm chart
2019-02-11 10:20:48 +01:00
stefanprodan
71057946e6 Fix podinfo helm tests 2019-02-10 17:38:33 +02:00
stefanprodan
a74ad52c72 Add dashboard screens 2019-02-10 12:07:44 +02:00
stefanprodan
12d26874f8 Add canary deployment demo chart based on podinfo 2019-02-10 11:48:51 +02:00
stefanprodan
27de9ce151 Session affinity incompatible with destinations weight
- consistent hashing does not apply across multiple subsets
2019-02-10 11:47:01 +02:00
stefanprodan
9e7cd5a8c5 Disable Stackdriver monitoring
- Istio add-on v1.0.3 stackdriver adapter is missing the zone label
2019-02-10 11:37:01 +02:00
stefanprodan
38cb487b64 Allow Grafana anonymous access 2019-02-09 23:45:42 +02:00
stefanprodan
05ca266c5e Add HPA add-on to GKE docs 2019-02-04 16:52:03 +02:00
Stefan Prodan
5cc26de645 Merge pull request #40 from stefanprodan/gke
Flagger install docs revamp
2019-02-02 12:43:15 +01:00
stefanprodan
2b9a195fa3 Add cert-manager diagram to docs 2019-02-02 13:36:51 +02:00
stefanprodan
4454749eec Add load tester install instructions to docs 2019-02-02 13:01:48 +02:00
stefanprodan
b435a03fab Document Istio requirements 2019-02-02 12:16:16 +02:00
stefanprodan
7c166e2b40 Restructure the install docs 2019-02-02 02:20:02 +02:00
stefanprodan
f7a7963dcf Add Flagger install guide for GKE 2019-02-02 02:19:25 +02:00
stefanprodan
9c77c0d69c Add GKE Istio diagram 2019-02-02 02:18:31 +02:00
stefanprodan
e8a9555346 Add GKE Istio Gateway and Prometheus definitions 2019-02-02 02:17:55 +02:00
Stefan Prodan
59751dd007 Merge pull request #39 from stefanprodan/changelog
Add changelog
2019-01-31 17:29:47 +01:00
stefanprodan
9c4d4d16b6 Add PR links to changelog 2019-01-31 12:17:52 +02:00
stefanprodan
0e3d1b3e8f Improve changelog formatting 2019-01-31 12:11:47 +02:00
stefanprodan
f119b78940 Add features and fixes to changelog 2019-01-31 12:08:32 +02:00
1957 changed files with 278404 additions and 48935 deletions

.circleci/config.yml (new file, 38 lines)

@@ -0,0 +1,38 @@
version: 2.1
jobs:
e2e-testing:
machine: true
steps:
- checkout
- run: test/e2e-kind.sh
- run: test/e2e-istio.sh
- run: test/e2e-build.sh
- run: test/e2e-tests.sh
e2e-supergloo-testing:
machine: true
steps:
- checkout
- run: test/e2e-kind.sh
- run: test/e2e-supergloo.sh
- run: test/e2e-build.sh supergloo:test.supergloo-system
- run: test/e2e-tests.sh canary
workflows:
version: 2
build-and-test:
jobs:
- e2e-testing:
filters:
branches:
ignore:
- /gh-pages.*/
- /docs-.*/
- /release-.*/
- e2e-supergloo-testing:
filters:
branches:
ignore:
- /gh-pages.*/
- /docs-.*/
- /release-.*/


@@ -6,3 +6,6 @@ coverage:
threshold: 50
base: auto
patch: off
comment:
require_changes: yes

.github/CODEOWNERS (new file, 1 line)

@@ -0,0 +1 @@
* @stefanprodan

.gitignore (4 lines changed)

@@ -11,3 +11,7 @@
# Output of the go coverage tool, specifically when used with LiteIDE
*.out
.DS_Store
bin/
artifacts/gcloud/
.idea


@@ -1,7 +1,7 @@
 builds:
   - main: ./cmd/flagger
     binary: flagger
-    ldflags: -s -w -X github.com/stefanprodan/flagger/pkg/version.REVISION={{.Commit}}
+    ldflags: -s -w -X github.com/weaveworks/flagger/pkg/version.REVISION={{.Commit}}
     goos:
       - linux
     goarch:


@@ -1,8 +1,13 @@
sudo: required
language: go
branches:
except:
- /gh-pages.*/
- /docs-.*/
go:
- 1.11.x
- 1.12.x
services:
- docker
@@ -13,27 +18,26 @@ addons:
- docker-ce
script:
- set -e
- make test-fmt
- make test-codegen
- go test -race -coverprofile=coverage.txt -covermode=atomic ./pkg/controller/
- make build
- make test-fmt
- make test-codegen
- go test -race -coverprofile=coverage.txt -covermode=atomic $(go list ./pkg/...)
- make build
after_success:
- if [ -z "$DOCKER_USER" ]; then
echo "PR build, skipping image push";
else
BRANCH_COMMIT=${TRAVIS_BRANCH}-$(echo ${TRAVIS_COMMIT} | head -c7);
docker tag stefanprodan/flagger:latest quay.io/stefanprodan/flagger:${BRANCH_COMMIT};
echo $DOCKER_PASS | docker login -u=$DOCKER_USER --password-stdin quay.io;
docker push quay.io/stefanprodan/flagger:${BRANCH_COMMIT};
docker tag weaveworks/flagger:latest weaveworks/flagger:${BRANCH_COMMIT};
echo $DOCKER_PASS | docker login -u=$DOCKER_USER --password-stdin;
docker push weaveworks/flagger:${BRANCH_COMMIT};
fi
- if [ -z "$TRAVIS_TAG" ]; then
echo "Not a release, skipping image push";
else
docker tag stefanprodan/flagger:latest quay.io/stefanprodan/flagger:${TRAVIS_TAG};
echo $DOCKER_PASS | docker login -u=$DOCKER_USER --password-stdin quay.io;
docker push quay.io/stefanprodan/flagger:$TRAVIS_TAG;
docker tag weaveworks/flagger:latest weaveworks/flagger:${TRAVIS_TAG};
echo $DOCKER_PASS | docker login -u=$DOCKER_USER --password-stdin;
docker push weaveworks/flagger:$TRAVIS_TAG;
fi
- bash <(curl -s https://codecov.io/bash)
- rm coverage.txt

CHANGELOG.md (new file, 245 lines)

@@ -0,0 +1,245 @@
# Changelog
All notable changes to this project are documented in this file.
## 0.12.0 (2019-04-29)
Adds support for [SuperGloo](https://docs.flagger.app/install/flagger-install-with-supergloo)
#### Features
- Supergloo support for canary deployment (weighted traffic) [#151](https://github.com/weaveworks/flagger/pull/151)
## 0.11.1 (2019-04-18)
Move Flagger and the load tester container images to Docker Hub
#### Features
- Add Bash Automated Testing System support to Flagger tester for running acceptance tests as pre-rollout hooks
## 0.11.0 (2019-04-17)
Adds pre/post rollout [webhooks](https://docs.flagger.app/how-it-works#webhooks)
#### Features
- Add `pre-rollout` and `post-rollout` webhook types [#147](https://github.com/weaveworks/flagger/pull/147)
#### Improvements
- Unify App Mesh and Istio builtin metric checks [#146](https://github.com/weaveworks/flagger/pull/146)
- Make the pod selector label configurable [#148](https://github.com/weaveworks/flagger/pull/148)
#### Breaking changes
- Set default `mesh` Istio gateway only if no gateway is specified [#141](https://github.com/weaveworks/flagger/pull/141)
## 0.10.0 (2019-03-27)
Adds support for App Mesh
#### Features
- AWS App Mesh integration
[#107](https://github.com/weaveworks/flagger/pull/107)
[#123](https://github.com/weaveworks/flagger/pull/123)
#### Improvements
- Reconcile Kubernetes ClusterIP services [#122](https://github.com/weaveworks/flagger/pull/122)
#### Fixes
- Preserve pod labels on canary promotion [#105](https://github.com/weaveworks/flagger/pull/105)
- Fix canary status Prometheus metric [#121](https://github.com/weaveworks/flagger/pull/121)
## 0.9.0 (2019-03-11)
Allows A/B testing scenarios where instead of weighted routing, the traffic is split between the
primary and canary based on HTTP headers or cookies.
#### Features
- A/B testing - canary with session affinity [#88](https://github.com/weaveworks/flagger/pull/88)
#### Fixes
- Update the analysis interval when the custom resource changes [#91](https://github.com/weaveworks/flagger/pull/91)
## 0.8.0 (2019-03-06)
Adds support for CORS policy and HTTP request headers manipulation
#### Features
- CORS policy support [#83](https://github.com/weaveworks/flagger/pull/83)
- Allow headers to be appended to HTTP requests [#82](https://github.com/weaveworks/flagger/pull/82)
#### Improvements
- Refactor the routing management
[#72](https://github.com/weaveworks/flagger/pull/72)
[#80](https://github.com/weaveworks/flagger/pull/80)
- Fine-grained RBAC [#73](https://github.com/weaveworks/flagger/pull/73)
- Add option to limit Flagger to a single namespace [#78](https://github.com/weaveworks/flagger/pull/78)
## 0.7.0 (2019-02-28)
Adds support for custom metric checks, HTTP timeouts and HTTP retries
#### Features
- Allow custom promql queries in the canary analysis spec [#60](https://github.com/weaveworks/flagger/pull/60)
- Add HTTP timeout and retries to canary service spec [#62](https://github.com/weaveworks/flagger/pull/62)
## 0.6.0 (2019-02-25)
Allows for [HTTPMatchRequests](https://istio.io/docs/reference/config/istio.networking.v1alpha3/#HTTPMatchRequest)
and [HTTPRewrite](https://istio.io/docs/reference/config/istio.networking.v1alpha3/#HTTPRewrite)
to be customized in the service spec of the canary custom resource.
#### Features
- Add HTTP match conditions and URI rewrite to the canary service spec [#55](https://github.com/weaveworks/flagger/pull/55)
- Update virtual service when the canary service spec changes
[#54](https://github.com/weaveworks/flagger/pull/54)
[#51](https://github.com/weaveworks/flagger/pull/51)
#### Improvements
- Run e2e testing on [Kubernetes Kind](https://github.com/kubernetes-sigs/kind) for canary promotion
[#53](https://github.com/weaveworks/flagger/pull/53)
## 0.5.1 (2019-02-14)
Allows skipping the analysis phase to ship changes directly to production
#### Features
- Add option to skip the canary analysis [#46](https://github.com/weaveworks/flagger/pull/46)
#### Fixes
- Reject deployment if the pod label selector doesn't match `app: <DEPLOYMENT_NAME>` [#43](https://github.com/weaveworks/flagger/pull/43)
## 0.5.0 (2019-01-30)
Track changes in ConfigMaps and Secrets [#37](https://github.com/weaveworks/flagger/pull/37)
#### Features
- Promote configmaps and secrets changes from canary to primary
- Detect changes in configmaps and/or secrets and (re)start canary analysis
- Add configs checksum to Canary CRD status
- Create primary configmaps and secrets at bootstrap
- Scan canary volumes and containers for configmaps and secrets
#### Fixes
- Copy deployment labels from canary to primary at bootstrap and promotion
## 0.4.1 (2019-01-24)
Load testing webhook [#35](https://github.com/weaveworks/flagger/pull/35)
#### Features
- Add the load tester chart to Flagger Helm repository
- Implement a load test runner based on [rakyll/hey](https://github.com/rakyll/hey)
- Log warning when no values are found for Istio metric due to lack of traffic
#### Fixes
- Run webhooks before the metrics checks to avoid failures when using a load tester
## 0.4.0 (2019-01-18)
Restart canary analysis if revision changes [#31](https://github.com/weaveworks/flagger/pull/31)
#### Breaking changes
- Drop support for Kubernetes 1.10
#### Features
- Detect changes during canary analysis and reset advancement
- Add status and additional printer columns to CRD
- Add canary name and namespace to controller structured logs
#### Fixes
- Allow canary name to be different to the target name
- Check if multiple canaries have the same target and log error
- Use deep copy when updating Kubernetes objects
- Skip readiness checks if canary analysis has finished
## 0.3.0 (2019-01-11)
Configurable canary analysis duration [#20](https://github.com/weaveworks/flagger/pull/20)
#### Breaking changes
- Helm chart: flag `controlLoopInterval` has been removed
#### Features
- CRD: canaries.flagger.app v1alpha3
- Schedule canary analysis independently based on `canaryAnalysis.interval`
- Add analysis interval to Canary CRD (defaults to one minute)
- Make autoscaler (HPA) reference optional
## 0.2.0 (2019-01-04)
Webhooks [#18](https://github.com/weaveworks/flagger/pull/18)
#### Features
- CRD: canaries.flagger.app v1alpha2
- Implement canary external checks based on webhooks HTTP POST calls
- Add webhooks to Canary CRD
- Move docs to gitbook [docs.flagger.app](https://docs.flagger.app)
## 0.1.2 (2018-12-06)
Improve Slack notifications [#14](https://github.com/weaveworks/flagger/pull/14)
#### Features
- Add canary analysis metadata to init and start Slack messages
- Add rollback reason to failed canary Slack messages
## 0.1.1 (2018-11-28)
Canary progress deadline [#10](https://github.com/weaveworks/flagger/pull/10)
#### Features
- Rollback canary based on the deployment progress deadline check
- Add progress deadline to Canary CRD (defaults to 10 minutes)
## 0.1.0 (2018-11-25)
First stable release
#### Features
- CRD: canaries.flagger.app v1alpha1
- Notifications: post canary events to Slack
- Instrumentation: expose Prometheus metrics for canary status and traffic weight percentage
- Autoscaling: add HPA reference to CRD and create primary HPA at bootstrap
- Bootstrap: create primary deployment, ClusterIP services and Istio virtual service based on CRD spec
## 0.0.1 (2018-10-07)
Initial semver release
#### Features
- Implement canary rollback based on failed checks threshold
- Scale up the deployment when canary revision changes
- Add OpenAPI v3 schema validation to Canary CRD
- Use CRD status for canary state persistence
- Add Helm charts for Flagger and Grafana
- Add canary analysis Grafana dashboard
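Tying the changelog entries above together, a minimal Canary resource sketch; the API version and the analysis interval are named in the changelog, while the demo app, namespace, values and field names not listed there (targetRef, autoscalerRef, maxWeight, stepWeight) are illustrative assumptions:

```yaml
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  targetRef:                  # the deployment Flagger controls
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  autoscalerRef:              # optional HPA reference
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    port: 9898
  canaryAnalysis:
    interval: 1m              # analysis is scheduled on this interval
    threshold: 5              # failed checks before rollback
    maxWeight: 50
    stepWeight: 10
```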


@@ -1,17 +1,17 @@
-FROM golang:1.11
+FROM golang:1.12
-RUN mkdir -p /go/src/github.com/stefanprodan/flagger/
+RUN mkdir -p /go/src/github.com/weaveworks/flagger/
-WORKDIR /go/src/github.com/stefanprodan/flagger
+WORKDIR /go/src/github.com/weaveworks/flagger
 COPY . .
 RUN GIT_COMMIT=$(git rev-list -1 HEAD) && \
   CGO_ENABLED=0 GOOS=linux go build -ldflags "-s -w \
-  -X github.com/stefanprodan/flagger/pkg/version.REVISION=${GIT_COMMIT}" \
+  -X github.com/weaveworks/flagger/pkg/version.REVISION=${GIT_COMMIT}" \
   -a -installsuffix cgo -o flagger ./cmd/flagger/*
-FROM alpine:3.8
+FROM alpine:3.9
 RUN addgroup -S flagger \
   && adduser -S -g flagger flagger \
@@ -19,7 +19,7 @@ RUN addgroup -S flagger \
 WORKDIR /home/flagger
-COPY --from=0 /go/src/github.com/stefanprodan/flagger/flagger .
+COPY --from=0 /go/src/github.com/weaveworks/flagger/flagger .
 RUN chown -R flagger:flagger ./


@@ -1,24 +1,8 @@
FROM golang:1.11 AS hey-builder
FROM golang:1.12 AS builder
RUN mkdir -p /go/src/github.com/rakyll/hey/
RUN mkdir -p /go/src/github.com/weaveworks/flagger/
WORKDIR /go/src/github.com/rakyll/hey
ADD https://github.com/rakyll/hey/archive/v0.1.1.tar.gz .
RUN tar xzf v0.1.1.tar.gz --strip 1
RUN go get ./...
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
go install -ldflags '-w -extldflags "-static"' \
/go/src/github.com/rakyll/hey
FROM golang:1.11 AS builder
RUN mkdir -p /go/src/github.com/stefanprodan/flagger/
WORKDIR /go/src/github.com/stefanprodan/flagger
WORKDIR /go/src/github.com/weaveworks/flagger
COPY . .
@@ -26,16 +10,18 @@ RUN go test -race ./pkg/loadtester/
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o loadtester ./cmd/loadtester/*
FROM alpine:3.8
FROM bats/bats:v1.1.0
RUN addgroup -S app \
&& adduser -S -g app app \
&& apk --no-cache add ca-certificates curl
&& apk --no-cache add ca-certificates curl jq
WORKDIR /home/app
COPY --from=hey-builder /go/bin/hey /usr/local/bin/hey
COPY --from=builder /go/src/github.com/stefanprodan/flagger/loadtester .
RUN curl -sSLo hey "https://storage.googleapis.com/jblabs/dist/hey_linux_v0.1.2" && \
chmod +x hey && mv hey /usr/local/bin/hey
COPY --from=builder /go/src/github.com/weaveworks/flagger/loadtester .
RUN chown -R app:app ./

Gopkg.lock (generated, 694 lines changed)

File diff suppressed because it is too large.


@@ -11,31 +11,37 @@ required = [
name = "go.uber.org/zap"
version = "v1.9.1"
[[constraint]]
name = "gopkg.in/h2non/gock.v1"
version = "v1.0.14"
[[override]]
name = "gopkg.in/yaml.v2"
version = "v2.2.1"
[[override]]
name = "k8s.io/api"
version = "kubernetes-1.11.0"
version = "kubernetes-1.13.1"
[[override]]
name = "k8s.io/apimachinery"
version = "kubernetes-1.11.0"
version = "kubernetes-1.13.1"
[[override]]
name = "k8s.io/code-generator"
version = "kubernetes-1.11.0"
version = "kubernetes-1.13.1"
[[override]]
name = "k8s.io/client-go"
version = "kubernetes-1.11.0"
version = "kubernetes-1.13.1"
[[override]]
name = "github.com/json-iterator/go"
# This is the commit at which k8s depends on this in 1.11
# It seems to be broken at HEAD.
revision = "f2b4162afba35581b6d4a50d3b8f34e33c144682"
name = "k8s.io/apiextensions-apiserver"
version = "kubernetes-1.13.1"
[[override]]
name = "k8s.io/apiserver"
version = "kubernetes-1.13.1"
[[constraint]]
name = "github.com/prometheus/client_golang"
@@ -45,13 +51,9 @@ required = [
name = "github.com/google/go-cmp"
version = "v0.2.0"
[[constraint]]
name = "github.com/knative/pkg"
revision = "c15d7c8f2220a7578b33504df6edefa948c845ae"
[[override]]
name = "github.com/golang/glog"
source = "github.com/istio/glog"
name = "k8s.io/klog"
source = "github.com/stefanprodan/klog"
[prune]
go-tests = true
@@ -62,3 +64,11 @@ required = [
name = "k8s.io/code-generator"
unused-packages = false
non-go = false
[[constraint]]
name = "github.com/solo-io/supergloo"
version = "v0.3.11"
[[constraint]]
name = "github.com/solo-io/solo-kit"
version = "v0.6.3"

MAINTAINERS (new file, 5 lines)

@@ -0,0 +1,5 @@
The maintainers are generally available in Slack at
https://weave-community.slack.com/messages/flagger/ (obtain an invitation
at https://slack.weave.works/).
Stefan Prodan, Weaveworks <stefan@weave.works> (Slack: @stefan Twitter: @stefanprodan)


@@ -4,19 +4,26 @@ VERSION_MINOR:=$(shell grep 'VERSION' pkg/version/version.go | awk '{ print $$4
PATCH:=$(shell grep 'VERSION' pkg/version/version.go | awk '{ print $$4 }' | tr -d '"' | awk -F. '{print $$NF}')
SOURCE_DIRS = cmd pkg/apis pkg/controller pkg/server pkg/logging pkg/version
LT_VERSION?=$(shell grep 'VERSION' cmd/loadtester/main.go | awk '{ print $$4 }' | tr -d '"' | head -n1)
TS=$(shell date +%Y-%m-%d_%H-%M-%S)
run:
go run cmd/flagger/* -kubeconfig=$$HOME/.kube/config -log-level=info \
-metrics-server=https://prometheus.iowa.weavedx.com \
-metrics-server=https://prometheus.istio.weavedx.com \
-slack-url=https://hooks.slack.com/services/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 \
-slack-channel="devops-alerts"
run-appmesh:
go run cmd/flagger/* -kubeconfig=$$HOME/.kube/config -log-level=info -mesh-provider=appmesh \
-metrics-server=http://acfc235624ca911e9a94c02c4171f346-1585187926.us-west-2.elb.amazonaws.com:9090 \
-slack-url=https://hooks.slack.com/services/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 \
-slack-channel="devops-alerts"
build:
docker build -t stefanprodan/flagger:$(TAG) . -f Dockerfile
docker build -t weaveworks/flagger:$(TAG) . -f Dockerfile
push:
docker tag stefanprodan/flagger:$(TAG) quay.io/stefanprodan/flagger:$(VERSION)
docker push quay.io/stefanprodan/flagger:$(VERSION)
docker tag weaveworks/flagger:$(TAG) weaveworks/flagger:$(VERSION)
docker push weaveworks/flagger:$(VERSION)
fmt:
gofmt -l -s -w $(SOURCE_DIRS)
@@ -31,9 +38,9 @@ test: test-fmt test-codegen
go test ./...
helm-package:
cd charts/ && helm package flagger/ && helm package grafana/ && helm package loadtester/
cd charts/ && helm package ./*
mv charts/*.tgz docs/
helm repo index docs --url https://stefanprodan.github.io/flagger --merge ./docs/index.yaml
helm repo index docs --url https://weaveworks.github.io/flagger --merge ./docs/index.yaml
helm-up:
helm upgrade --install flagger ./charts/flagger --namespace=istio-system --set crd.create=false
@@ -82,5 +89,5 @@ reset-test:
kubectl apply -f ./artifacts/canaries
loadtester-push:
docker build -t quay.io/stefanprodan/flagger-loadtester:$(LT_VERSION) . -f Dockerfile.loadtester
docker push quay.io/stefanprodan/flagger-loadtester:$(LT_VERSION)
docker build -t weaveworks/flagger-loadtester:$(LT_VERSION) . -f Dockerfile.loadtester
docker push weaveworks/flagger-loadtester:$(LT_VERSION)

README.md (473 lines changed)

@@ -1,79 +1,58 @@
# flagger
[![build](https://travis-ci.org/stefanprodan/flagger.svg?branch=master)](https://travis-ci.org/stefanprodan/flagger)
[![report](https://goreportcard.com/badge/github.com/stefanprodan/flagger)](https://goreportcard.com/report/github.com/stefanprodan/flagger)
[![codecov](https://codecov.io/gh/stefanprodan/flagger/branch/master/graph/badge.svg)](https://codecov.io/gh/stefanprodan/flagger)
[![license](https://img.shields.io/github/license/stefanprodan/flagger.svg)](https://github.com/stefanprodan/flagger/blob/master/LICENSE)
[![release](https://img.shields.io/github/release/stefanprodan/flagger/all.svg)](https://github.com/stefanprodan/flagger/releases)
[![build](https://travis-ci.org/weaveworks/flagger.svg?branch=master)](https://travis-ci.org/weaveworks/flagger)
[![report](https://goreportcard.com/badge/github.com/weaveworks/flagger)](https://goreportcard.com/report/github.com/weaveworks/flagger)
[![codecov](https://codecov.io/gh/weaveworks/flagger/branch/master/graph/badge.svg)](https://codecov.io/gh/weaveworks/flagger)
[![license](https://img.shields.io/github/license/weaveworks/flagger.svg)](https://github.com/weaveworks/flagger/blob/master/LICENSE)
[![release](https://img.shields.io/github/release/weaveworks/flagger/all.svg)](https://github.com/weaveworks/flagger/releases)
Flagger is a Kubernetes operator that automates the promotion of canary deployments
using Istio routing for traffic shifting and Prometheus metrics for canary analysis.
The canary analysis can be extended with webhooks for running integration tests,
using Istio or App Mesh routing for traffic shifting and Prometheus metrics for canary analysis.
The canary analysis can be extended with webhooks for running acceptance tests,
load tests or any other custom validation.
### Install
Flagger implements a control loop that gradually shifts traffic to the canary while measuring key performance
indicators like HTTP requests success rate, requests average duration and pods health.
Based on analysis of the KPIs a canary is promoted or aborted, and the analysis result is published to Slack.
Before installing Flagger make sure you have Istio setup up with Prometheus enabled.
If you are new to Istio you can follow my [Istio service mesh walk-through](https://github.com/stefanprodan/istio-gke).
![flagger-overview](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/diagrams/flagger-canary-overview.png)
Deploy Flagger in the `istio-system` namespace using Helm:
## Documentation
```bash
# add the Helm repository
helm repo add flagger https://flagger.app
Flagger documentation can be found at [docs.flagger.app](https://docs.flagger.app)
# install or upgrade
helm upgrade -i flagger flagger/flagger \
--namespace=istio-system \
--set metricsServer=http://prometheus.istio-system:9090
```
* Install
* [Flagger install on Kubernetes](https://docs.flagger.app/install/flagger-install-on-kubernetes)
* [Flagger install on GKE Istio](https://docs.flagger.app/install/flagger-install-on-google-cloud)
* [Flagger install on EKS App Mesh](https://docs.flagger.app/install/flagger-install-on-eks-appmesh)
* [Flagger install with SuperGloo](https://docs.flagger.app/install/flagger-install-with-supergloo)
* How it works
* [Canary custom resource](https://docs.flagger.app/how-it-works#canary-custom-resource)
* [Routing](https://docs.flagger.app/how-it-works#istio-routing)
* [Canary deployment stages](https://docs.flagger.app/how-it-works#canary-deployment)
* [Canary analysis](https://docs.flagger.app/how-it-works#canary-analysis)
* [HTTP metrics](https://docs.flagger.app/how-it-works#http-metrics)
* [Custom metrics](https://docs.flagger.app/how-it-works#custom-metrics)
* [Webhooks](https://docs.flagger.app/how-it-works#webhooks)
* [Load testing](https://docs.flagger.app/how-it-works#load-testing)
* Usage
* [Istio canary deployments](https://docs.flagger.app/usage/progressive-delivery)
* [Istio A/B testing](https://docs.flagger.app/usage/ab-testing)
* [App Mesh canary deployments](https://docs.flagger.app/usage/appmesh-progressive-delivery)
* [Monitoring](https://docs.flagger.app/usage/monitoring)
* [Alerting](https://docs.flagger.app/usage/alerting)
* Tutorials
* [Canary deployments with Helm charts and Weave Flux](https://docs.flagger.app/tutorials/canary-helm-gitops)
Flagger is compatible with Kubernetes >1.11.0 and Istio >1.0.0.
## Canary CRD
### Usage
Flagger takes a Kubernetes deployment and optionally a horizontal pod autoscaler (HPA),
then creates a series of objects (Kubernetes deployments, ClusterIP services and Istio or App Mesh virtual services).
These objects expose the application on the mesh and drive the canary analysis and promotion.
Flagger takes a Kubernetes deployment and creates a series of objects
(Kubernetes [deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/),
ClusterIP [services](https://kubernetes.io/docs/concepts/services-networking/service/) and
Istio [virtual services](https://istio.io/docs/reference/config/istio.networking.v1alpha3/#VirtualService))
to drive the canary analysis and promotion.
Flagger keeps track of ConfigMaps and Secrets referenced by a Kubernetes Deployment and triggers a canary analysis if any of those objects change.
When promoting a workload in production, both code (container images) and configuration (config maps and secrets) are being synchronised.
![flagger-overview](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/diagrams/flagger-canary-overview.png)
Gated canary promotion stages:
* scan for canary deployments
* check Istio virtual service routes are mapped to primary and canary ClusterIP services
* check primary and canary deployments status
* halt advancement if a rolling update is underway
* halt advancement if pods are unhealthy
* increase canary traffic weight percentage from 0% to 5% (step weight)
* call webhooks and check results
* check canary HTTP request success rate and latency
* halt advancement if any metric is under the specified threshold
* increment the failed checks counter
* check if the number of failed checks reached the threshold
* route all traffic to primary
* scale to zero the canary deployment and mark it as failed
* wait for the canary deployment to be updated and start over
* increase canary traffic weight by 5% (step weight) till it reaches 50% (max weight)
* halt advancement while canary request success rate is under the threshold
* halt advancement while canary request duration P99 is over the threshold
* halt advancement if the primary or canary deployment becomes unhealthy
* halt advancement while canary deployment is being scaled up/down by HPA
* promote canary to primary
* copy ConfigMaps and Secrets from canary to primary
* copy canary deployment spec template over primary
* wait for primary rolling update to finish
* halt advancement if pods are unhealthy
* route all traffic to primary
* scale to zero the canary deployment
* mark rollout as finished
* wait for the canary deployment to be updated and start over
For a deployment named _podinfo_, a canary promotion can be defined using Flagger's custom resource:
```yaml
@@ -102,9 +81,31 @@ spec:
# Istio gateways (optional)
gateways:
- public-gateway.istio-system.svc.cluster.local
- mesh
# Istio virtual service host names (optional)
hosts:
- podinfo.example.com
# HTTP match conditions (optional)
match:
- uri:
prefix: /
# HTTP rewrite (optional)
rewrite:
uri: /
# Envoy timeout and retry policy (optional)
headers:
request:
add:
x-envoy-upstream-rq-timeout-ms: "15000"
x-envoy-max-retries: "10"
x-envoy-retry-on: "gateway-error,connect-failure,refused-stream"
# cross-origin resource sharing policy (optional)
corsPolicy:
allowOrigin:
- example.com
# promote the canary without analysing it (default false)
skipAnalysis: false
# define the canary analysis timing and KPIs
canaryAnalysis:
# schedule interval (default 60s)
interval: 1m
@@ -118,16 +119,27 @@ spec:
stepWeight: 5
# Istio Prometheus checks
metrics:
- name: istio_requests_total
# builtin checks
- name: request-success-rate
# minimum req success rate (non 5xx responses)
# percentage (0-100)
threshold: 99
interval: 1m
- name: istio_request_duration_seconds_bucket
- name: request-duration
# maximum req duration P99
# milliseconds
threshold: 500
interval: 30s
# custom check
- name: "kafka lag"
threshold: 100
query: |
avg_over_time(
kafka_consumergroup_lag{
consumergroup=~"podinfo-consumer-.*",
topic="podinfo"
}[1m]
)
# external checks (optional)
webhooks:
- name: load-test
@@ -137,322 +149,47 @@ spec:
cmd: "hey -z 1m -q 10 -c 2 http://podinfo.test:9898/"
```
The canary analysis uses the following PromQL queries:
_HTTP requests success rate percentage_
For more details on how the canary analysis and promotion works please [read the docs](https://docs.flagger.app/how-it-works).
```sql
sum(
rate(
istio_requests_total{
reporter="destination",
destination_workload_namespace=~"$namespace",
destination_workload=~"$workload",
response_code!~"5.*"
}[$interval]
)
)
/
sum(
rate(
istio_requests_total{
reporter="destination",
destination_workload_namespace=~"$namespace",
destination_workload=~"$workload"
}[$interval]
)
)
```
## Features
_HTTP request duration P99 in milliseconds_
| Feature | Istio | App Mesh | SuperGloo |
| -------------------------------------------- | ------------------ | ------------------ |------------------ |
| Canary deployments (weighted traffic) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| A/B testing (headers and cookies filters) | :heavy_check_mark: | :heavy_minus_sign: | :heavy_minus_sign: |
| Load testing | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Webhooks (custom acceptance tests) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Request success rate check (Envoy metric) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Request duration check (Envoy metric) | :heavy_check_mark: | :heavy_minus_sign: | :heavy_check_mark: |
| Custom promql checks | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Ingress gateway (CORS, retries and timeouts) | :heavy_check_mark: | :heavy_minus_sign: | :heavy_check_mark: |
```sql
histogram_quantile(0.99,
sum(
irate(
istio_request_duration_seconds_bucket{
reporter="destination",
destination_workload=~"$workload",
destination_workload_namespace=~"$namespace"
}[$interval]
)
) by (le)
)
```
## Roadmap
The canary analysis can be extended with webhooks.
Flagger will call the webhooks (HTTP POST) and determine from the response status code (HTTP 2xx) if the canary is failing or not.
Webhook payload:
```json
{
"name": "podinfo",
"namespace": "test",
"metadata": {
"test": "all",
"token": "16688eb5e9f289f1991c"
}
}
```
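As a sanity check, the webhook contract can be exercised by hand; a minimal sketch, assuming the load tester used in the examples below is reachable in-cluster at flagger-loadtester.test (the exact response body depends on the receiver):

```bash
# Send the same payload Flagger would POST and print the HTTP status code;
# a 2xx code is interpreted as a passing check, anything else as a failure
curl -sS -o /dev/null -w "%{http_code}\n" \
  -X POST http://flagger-loadtester.test/ \
  -H 'Content-Type: application/json' \
  -d '{"name": "podinfo", "namespace": "test", "metadata": {"test": "all", "token": "16688eb5e9f289f1991c"}}'
```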
### Automated canary analysis, promotions and rollbacks
Create a test namespace with Istio sidecar injection enabled:
```bash
export REPO=https://raw.githubusercontent.com/stefanprodan/flagger/master
kubectl apply -f ${REPO}/artifacts/namespaces/test.yaml
```
Create a deployment and a horizontal pod autoscaler:
```bash
kubectl apply -f ${REPO}/artifacts/canaries/deployment.yaml
kubectl apply -f ${REPO}/artifacts/canaries/hpa.yaml
```
Deploy the load testing service to generate traffic during the canary analysis:
```bash
kubectl -n test apply -f ${REPO}/artifacts/loadtester/deployment.yaml
kubectl -n test apply -f ${REPO}/artifacts/loadtester/service.yaml
```
Create a canary promotion custom resource (replace the Istio gateway and the internet domain with your own):
```bash
kubectl apply -f ${REPO}/artifacts/canaries/canary.yaml
```
After a couple of seconds Flagger will create the canary objects:
```bash
# applied
deployment.apps/podinfo
horizontalpodautoscaler.autoscaling/podinfo
canary.flagger.app/podinfo
# generated
deployment.apps/podinfo-primary
horizontalpodautoscaler.autoscaling/podinfo-primary
service/podinfo
service/podinfo-canary
service/podinfo-primary
virtualservice.networking.istio.io/podinfo
```
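A quick way to confirm that the applied and generated objects are in place (a minimal sketch, assuming everything was applied in the test namespace as above):

```bash
# List the deployments, autoscalers, services and the Istio virtual service created by Flagger
kubectl -n test get deployments,hpa,services
kubectl -n test get virtualservice podinfo
kubectl -n test get canary podinfo
```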
![flagger-canary-steps](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/diagrams/flagger-canary-steps.png)
Trigger a canary deployment by updating the container image:
```bash
kubectl -n test set image deployment/podinfo \
podinfod=quay.io/stefanprodan/podinfo:1.4.0
```
**Note** that Flagger tracks changes in the deployment `PodSpec` as well as in the `ConfigMaps` and `Secrets`
that are referenced in the pod's volumes and in the containers' environment variables.
Flagger detects that the deployment revision changed and starts a new canary analysis:
```
kubectl -n test describe canary/podinfo
Status:
Canary Weight: 0
Failed Checks: 0
Last Transition Time: 2019-01-16T13:47:16Z
Phase: Succeeded
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Synced 3m flagger New revision detected podinfo.test
Normal Synced 3m flagger Scaling up podinfo.test
Warning Synced 3m flagger Waiting for podinfo.test rollout to finish: 0 of 1 updated replicas are available
Normal Synced 3m flagger Advance podinfo.test canary weight 5
Normal Synced 3m flagger Advance podinfo.test canary weight 10
Normal Synced 3m flagger Advance podinfo.test canary weight 15
Normal Synced 2m flagger Advance podinfo.test canary weight 20
Normal Synced 2m flagger Advance podinfo.test canary weight 25
Normal Synced 1m flagger Advance podinfo.test canary weight 30
Normal Synced 1m flagger Advance podinfo.test canary weight 35
Normal Synced 55s flagger Advance podinfo.test canary weight 40
Normal Synced 45s flagger Advance podinfo.test canary weight 45
Normal Synced 35s flagger Advance podinfo.test canary weight 50
Normal Synced 25s flagger Copying podinfo.test template spec to podinfo-primary.test
Warning Synced 15s flagger Waiting for podinfo-primary.test rollout to finish: 1 of 2 updated replicas are available
Normal Synced 5s flagger Promotion completed! Scaling down podinfo.test
```
You can monitor all canaries with:
```bash
watch kubectl get canaries --all-namespaces
NAMESPACE NAME STATUS WEIGHT LASTTRANSITIONTIME
test podinfo Progressing 5 2019-01-16T14:05:07Z
```
During the canary analysis you can generate HTTP 500 errors and high latency to test if Flagger pauses the rollout.
Create a tester pod and exec into it:
```bash
kubectl -n test run tester --image=quay.io/stefanprodan/podinfo:1.2.1 -- ./podinfo --port=9898
kubectl -n test exec -it tester-xx-xx sh
```
Generate HTTP 500 errors:
```bash
watch curl http://podinfo-canary:9898/status/500
```
Generate latency:
```bash
watch curl http://podinfo-canary:9898/delay/1
```
When the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary,
the canary is scaled to zero and the rollout is marked as failed.
```
kubectl -n test describe canary/podinfo
Status:
Canary Weight: 0
Failed Checks: 10
Last Transition Time: 2019-01-16T13:47:16Z
Phase: Failed
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Synced 3m flagger Starting canary deployment for podinfo.test
Normal Synced 3m flagger Advance podinfo.test canary weight 5
Normal Synced 3m flagger Advance podinfo.test canary weight 10
Normal Synced 3m flagger Advance podinfo.test canary weight 15
Normal Synced 3m flagger Halt podinfo.test advancement success rate 69.17% < 99%
Normal Synced 2m flagger Halt podinfo.test advancement success rate 61.39% < 99%
Normal Synced 2m flagger Halt podinfo.test advancement success rate 55.06% < 99%
Normal Synced 2m flagger Halt podinfo.test advancement success rate 47.00% < 99%
Normal Synced 2m flagger (combined from similar events): Halt podinfo.test advancement success rate 38.08% < 99%
Warning Synced 1m flagger Rolling back podinfo.test failed checks threshold reached 10
Warning Synced 1m flagger Canary failed! Scaling down podinfo.test
```
**Note** that if you apply new changes to the deployment during the canary analysis, Flagger will restart the analysis.
### Monitoring
Flagger comes with a Grafana dashboard made for canary analysis.
Install Grafana with Helm:
```bash
helm upgrade -i flagger-grafana flagger/grafana \
--namespace=istio-system \
--set url=http://prometheus.istio-system:9090
```
The dashboard shows the RED and USE metrics for the primary and canary workloads:
![flagger-grafana](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/screens/grafana-canary-analysis.png)
The canary errors and latency spikes have been recorded as Kubernetes events and logged by Flagger in json format:
```
kubectl -n istio-system logs deployment/flagger --tail=100 | jq .msg
Starting canary deployment for podinfo.test
Advance podinfo.test canary weight 5
Advance podinfo.test canary weight 10
Advance podinfo.test canary weight 15
Advance podinfo.test canary weight 20
Advance podinfo.test canary weight 25
Advance podinfo.test canary weight 30
Advance podinfo.test canary weight 35
Halt podinfo.test advancement success rate 98.69% < 99%
Advance podinfo.test canary weight 40
Halt podinfo.test advancement request duration 1.515s > 500ms
Advance podinfo.test canary weight 45
Advance podinfo.test canary weight 50
Copying podinfo.test template spec to podinfo-primary.test
Halt podinfo-primary.test advancement waiting for rollout to finish: 1 old replicas are pending termination
Scaling down podinfo.test
Promotion completed! podinfo.test
```
Flagger exposes Prometheus metrics that can be used to determine the canary analysis status and the destination weight values:
```bash
# Canaries total gauge
flagger_canary_total{namespace="test"} 1
# Canary promotion last known status gauge
# 0 - running, 1 - successful, 2 - failed
flagger_canary_status{name="podinfo",namespace="test"} 1
# Canary traffic weight gauge
flagger_canary_weight{workload="podinfo-primary",namespace="test"} 95
flagger_canary_weight{workload="podinfo",namespace="test"} 5
# Seconds spent performing canary analysis histogram
flagger_canary_duration_seconds_bucket{name="podinfo",namespace="test",le="10"} 6
flagger_canary_duration_seconds_bucket{name="podinfo",namespace="test",le="+Inf"} 6
flagger_canary_duration_seconds_sum{name="podinfo",namespace="test"} 17.3561329
flagger_canary_duration_seconds_count{name="podinfo",namespace="test"} 6
```
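These series can also be queried directly; a minimal sketch against the Prometheus HTTP API, assuming the Istio Prometheus service used in the install example:

```bash
# Query the canary status gauge (0 running, 1 successful, 2 failed) for the test namespace
kubectl -n istio-system port-forward svc/prometheus 9090:9090 &
sleep 2
curl -sS -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=flagger_canary_status{namespace="test"}' | jq '.data.result'
```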
### Alerting
Flagger can be configured to send Slack notifications:
```bash
helm upgrade -i flagger flagger/flagger \
--namespace=istio-system \
--set slack.url=https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK \
--set slack.channel=general \
--set slack.user=flagger
```
Once configured with a Slack incoming webhook, Flagger will post messages when a canary deployment has been initialized,
when a new revision has been detected and if the canary analysis failed or succeeded.
![flagger-slack](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/screens/slack-canary-notifications.png)
A canary deployment will be rolled back if the progress deadline is exceeded or if the analysis
reaches the maximum number of failed checks:
![flagger-slack-errors](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/screens/slack-canary-failed.png)
Besides Slack, you can use Alertmanager to trigger alerts when a canary deployment failed:
```yaml
- alert: canary_rollback
expr: flagger_canary_status > 1
for: 1m
labels:
severity: warning
annotations:
summary: "Canary failed"
description: "Workload {{ $labels.name }} namespace {{ $labels.namespace }}"
```
### Roadmap
* Extend the validation mechanism to support metrics other than HTTP success rate and latency
* Integrate with other service mesh technologies like Linkerd v2
* Add support for comparing the canary metrics to the primary ones and do the validation based on the deviation between the two
* Extend the canary analysis and promotion to other types than Kubernetes deployments such as Flux Helm releases or OpenFaaS functions
### Contributing
## Contributing
Flagger is Apache 2.0 licensed and accepts contributions via GitHub pull requests.
When submitting bug reports please include as many details as possible:
* which Flagger version
* which Flagger CRD version
* which Kubernetes/Istio version
* what configuration (canary, virtual service and workloads definitions)
* what happened (Flagger, Istio Pilot and Proxy logs)
## Getting Help
If you have any questions about Flagger and progressive delivery:
* Read the Flagger [docs](https://docs.flagger.app).
* Invite yourself to the [Weave community slack](https://slack.weave.works/)
and join the [#flagger](https://weave-community.slack.com/messages/flagger/) channel.
* Join the [Weave User Group](https://www.meetup.com/pro/Weave/) and get invited to online talks,
hands-on training and meetups in your area.
* File an [issue](https://github.com/weaveworks/flagger/issues/new).
Your feedback is always welcome!


@@ -0,0 +1,62 @@
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
name: abtest
namespace: test
spec:
# deployment reference
targetRef:
apiVersion: apps/v1
kind: Deployment
name: abtest
# the maximum time in seconds for the canary deployment
# to make progress before it is rolled back (default 600s)
progressDeadlineSeconds: 60
# HPA reference (optional)
autoscalerRef:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
name: abtest
service:
# container port
port: 9898
# Istio gateways (optional)
gateways:
- public-gateway.istio-system.svc.cluster.local
- mesh
# Istio virtual service host names (optional)
hosts:
- abtest.istio.weavedx.com
canaryAnalysis:
# schedule interval (default 60s)
interval: 10s
# max number of failed metric checks before rollback
threshold: 10
# total number of iterations
iterations: 10
# canary match condition
match:
- headers:
user-agent:
regex: "^(?!.*Chrome)(?=.*\bSafari\b).*$"
- headers:
cookie:
regex: "^(.*?;)?(type=insider)(;.*)?$"
metrics:
- name: request-success-rate
# minimum req success rate (non 5xx responses)
# percentage (0-100)
threshold: 99
interval: 1m
- name: request-duration
# maximum req duration P99
# milliseconds
threshold: 500
interval: 30s
# external checks (optional)
webhooks:
- name: load-test
url: http://flagger-loadtester.test/
timeout: 5s
metadata:
cmd: "hey -z 1m -q 10 -c 2 -H 'Cookie: type=insider' http://podinfo.test:9898/"


@@ -0,0 +1,67 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: abtest
namespace: test
labels:
app: abtest
spec:
minReadySeconds: 5
revisionHistoryLimit: 5
progressDeadlineSeconds: 60
strategy:
rollingUpdate:
maxUnavailable: 0
type: RollingUpdate
selector:
matchLabels:
app: abtest
template:
metadata:
annotations:
prometheus.io/scrape: "true"
labels:
app: abtest
spec:
containers:
- name: podinfod
image: quay.io/stefanprodan/podinfo:1.4.0
imagePullPolicy: IfNotPresent
ports:
- containerPort: 9898
name: http
protocol: TCP
command:
- ./podinfo
- --port=9898
- --level=info
- --random-delay=false
- --random-error=false
env:
- name: PODINFO_UI_COLOR
value: blue
livenessProbe:
exec:
command:
- podcli
- check
- http
- localhost:9898/healthz
initialDelaySeconds: 5
timeoutSeconds: 5
readinessProbe:
exec:
command:
- podcli
- check
- http
- localhost:9898/readyz
initialDelaySeconds: 5
timeoutSeconds: 5
resources:
limits:
cpu: 2000m
memory: 512Mi
requests:
cpu: 100m
memory: 64Mi


@@ -0,0 +1,19 @@
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: abtest
namespace: test
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: abtest
minReplicas: 2
maxReplicas: 4
metrics:
- type: Resource
resource:
name: cpu
# scale up if usage is above
# 99% of the requested CPU (100m)
targetAverageUtilization: 99


@@ -0,0 +1,50 @@
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
name: podinfo
namespace: test
spec:
# deployment reference
targetRef:
apiVersion: apps/v1
kind: Deployment
name: podinfo
# the maximum time in seconds for the canary deployment
# to make progress before it is rolled back (default 600s)
progressDeadlineSeconds: 60
# HPA reference (optional)
autoscalerRef:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
name: podinfo
service:
# container port
port: 9898
# App Mesh reference
meshName: global
# define the canary analysis timing and KPIs
canaryAnalysis:
# schedule interval (default 60s)
interval: 10s
# max number of failed metric checks before rollback
threshold: 10
# max traffic percentage routed to canary
# percentage (0-100)
maxWeight: 50
# canary increment step
# percentage (0-100)
stepWeight: 5
# App Mesh Prometheus checks
metrics:
- name: request-success-rate
# minimum req success rate (non 5xx responses)
# percentage (0-100)
threshold: 99
interval: 1m
# external checks (optional)
webhooks:
- name: load-test
url: http://flagger-loadtester.test/
timeout: 5s
metadata:
cmd: "hey -z 1m -q 10 -c 2 http://podinfo.test:9898/"


@@ -0,0 +1,65 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: podinfo
namespace: test
labels:
app: podinfo
spec:
minReadySeconds: 5
revisionHistoryLimit: 5
progressDeadlineSeconds: 60
strategy:
rollingUpdate:
maxUnavailable: 0
type: RollingUpdate
selector:
matchLabels:
app: podinfo
template:
metadata:
annotations:
prometheus.io/scrape: "true"
labels:
app: podinfo
spec:
containers:
- name: podinfod
image: quay.io/stefanprodan/podinfo:1.4.0
imagePullPolicy: IfNotPresent
ports:
- containerPort: 9898
name: http
protocol: TCP
command:
- ./podinfo
- --port=9898
- --level=info
env:
- name: PODINFO_UI_COLOR
value: blue
livenessProbe:
exec:
command:
- podcli
- check
- http
- localhost:9898/healthz
initialDelaySeconds: 5
timeoutSeconds: 5
readinessProbe:
exec:
command:
- podcli
- check
- http
- localhost:9898/readyz
initialDelaySeconds: 5
timeoutSeconds: 5
resources:
limits:
cpu: 2000m
memory: 512Mi
requests:
cpu: 100m
memory: 64Mi


@@ -0,0 +1,6 @@
apiVersion: appmesh.k8s.aws/v1beta1
kind: Mesh
metadata:
name: global
spec:
serviceDiscoveryType: dns


@@ -0,0 +1,19 @@
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: podinfo
namespace: test
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: podinfo
minReplicas: 2
maxReplicas: 4
metrics:
- type: Resource
resource:
name: cpu
# scale up if usage is above
# 99% of the requested CPU (100m)
targetAverageUtilization: 99


@@ -0,0 +1,177 @@
---
kind: ConfigMap
apiVersion: v1
metadata:
name: ingress-config
namespace: test
labels:
app: ingress
data:
envoy.yaml: |
static_resources:
listeners:
- address:
socket_address:
address: 0.0.0.0
port_value: 80
filter_chains:
- filters:
- name: envoy.http_connection_manager
config:
access_log:
- name: envoy.file_access_log
config:
path: /dev/stdout
codec_type: auto
stat_prefix: ingress_http
http_filters:
- name: envoy.router
config: {}
route_config:
name: local_route
virtual_hosts:
- name: local_service
domains: ["*"]
routes:
- match:
prefix: "/"
route:
cluster: podinfo
host_rewrite: podinfo.test
timeout: 15s
retry_policy:
retry_on: "gateway-error,connect-failure,refused-stream"
num_retries: 10
per_try_timeout: 5s
clusters:
- name: podinfo
connect_timeout: 0.30s
type: strict_dns
lb_policy: round_robin
http2_protocol_options: {}
hosts:
- socket_address:
address: podinfo.test
port_value: 9898
admin:
access_log_path: /dev/null
address:
socket_address:
address: 0.0.0.0
port_value: 9999
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: ingress
namespace: test
labels:
app: ingress
spec:
replicas: 1
selector:
matchLabels:
app: ingress
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
template:
metadata:
labels:
app: ingress
annotations:
prometheus.io/path: "/stats/prometheus"
prometheus.io/port: "9999"
prometheus.io/scrape: "true"
# dummy port to exclude ingress from mesh traffic
# only egress should go over the mesh
appmesh.k8s.aws/ports: "444"
spec:
terminationGracePeriodSeconds: 30
containers:
- name: ingress
image: "envoyproxy/envoy-alpine:d920944aed67425f91fc203774aebce9609e5d9a"
securityContext:
capabilities:
drop:
- ALL
add:
- NET_BIND_SERVICE
command:
- /usr/bin/dumb-init
- --
args:
- /usr/local/bin/envoy
- --base-id 30
- --v2-config-only
- -l
- $loglevel
- -c
- /config/envoy.yaml
ports:
- name: admin
containerPort: 9999
protocol: TCP
- name: http
containerPort: 80
protocol: TCP
- name: https
containerPort: 443
protocol: TCP
livenessProbe:
initialDelaySeconds: 5
tcpSocket:
port: admin
readinessProbe:
initialDelaySeconds: 5
tcpSocket:
port: admin
resources:
requests:
cpu: 100m
memory: 64Mi
volumeMounts:
- name: config
mountPath: /config
volumes:
- name: config
configMap:
name: ingress-config
---
kind: Service
apiVersion: v1
metadata:
name: ingress
namespace: test
spec:
selector:
app: ingress
ports:
- protocol: TCP
name: http
port: 80
targetPort: 80
- protocol: TCP
name: https
port: 443
targetPort: 443
type: LoadBalancer
---
apiVersion: appmesh.k8s.aws/v1beta1
kind: VirtualNode
metadata:
name: ingress
namespace: test
spec:
meshName: global
listeners:
- portMapping:
port: 80
protocol: http
serviceDiscovery:
dns:
hostName: ingress.test
backends:
- virtualService:
virtualServiceName: podinfo.test
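Once the LoadBalancer above is provisioned, traffic can be sent through the Envoy ingress into the mesh; a minimal sketch (use .ip instead of .hostname on providers that return an IP address):

```bash
# Grab the ingress address and send a request that Envoy forwards to podinfo.test:9898
INGRESS=$(kubectl -n test get svc ingress -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
curl -sS "http://${INGRESS}/"
```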


@@ -23,9 +23,26 @@ spec:
# Istio gateways (optional)
gateways:
- public-gateway.istio-system.svc.cluster.local
- mesh
# Istio virtual service host names (optional)
hosts:
- app.iowa.weavedx.com
- app.istio.weavedx.com
# HTTP match conditions (optional)
match:
- uri:
prefix: /
# HTTP rewrite (optional)
rewrite:
uri: /
# Envoy timeout and retry policy (optional)
headers:
request:
add:
x-envoy-upstream-rq-timeout-ms: "15000"
x-envoy-max-retries: "10"
x-envoy-retry-on: "gateway-error,connect-failure,refused-stream"
# promote the canary without analysing it (default false)
skipAnalysis: false
canaryAnalysis:
# schedule interval (default 60s)
interval: 10s
@@ -39,12 +56,12 @@ spec:
stepWeight: 5
# Istio Prometheus checks
metrics:
- name: istio_requests_total
- name: request-success-rate
# minimum req success rate (non 5xx responses)
# percentage (0-100)
threshold: 99
interval: 1m
- name: istio_request_duration_seconds_bucket
- name: request-duration
# maximum req duration P99
# milliseconds
threshold: 500
@@ -55,4 +72,6 @@ spec:
url: http://flagger-loadtester.test/
timeout: 5s
metadata:
type: cmd
cmd: "hey -z 1m -q 10 -c 2 http://podinfo.test:9898/"
logCmdOutput: "true"
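With logCmdOutput enabled, the output of the hey command should show up in the load tester logs during the analysis; a minimal sketch, assuming the load tester deployment from the artifacts is named flagger-loadtester:

```bash
# Tail the load tester logs to see the webhook commands and their output
kubectl -n test logs deployment/flagger-loadtester --tail=20
```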


@@ -25,7 +25,7 @@ spec:
spec:
containers:
- name: podinfod
image: quay.io/stefanprodan/podinfo:1.3.0
image: quay.io/stefanprodan/podinfo:1.4.0
imagePullPolicy: IfNotPresent
ports:
- containerPort: 9898


@@ -0,0 +1,6 @@
apiVersion: v1
kind: Namespace
metadata:
name: test
labels:
istio-injection: enabled


@@ -0,0 +1,26 @@
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
name: backend
namespace: test
annotations:
flux.weave.works/automated: "true"
flux.weave.works/tag.chart-image: regexp:^1.4.*
spec:
releaseName: backend
chart:
repository: https://flagger.app/
name: podinfo
version: 2.0.0
values:
image:
repository: quay.io/stefanprodan/podinfo
tag: 1.4.0
httpServer:
timeout: 30s
canary:
enabled: true
istioIngress:
enabled: false
loadtest:
enabled: true


@@ -0,0 +1,27 @@
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
name: frontend
namespace: test
annotations:
flux.weave.works/automated: "true"
flux.weave.works/tag.chart-image: semver:~1.4
spec:
releaseName: frontend
chart:
repository: https://flagger.app/
name: podinfo
version: 2.0.0
values:
image:
repository: quay.io/stefanprodan/podinfo
tag: 1.4.0
backend: http://backend-podinfo:9898/echo
canary:
enabled: true
istioIngress:
enabled: true
gateway: public-gateway.istio-system.svc.cluster.local
host: frontend.istio.example.com
loadtest:
enabled: true


@@ -0,0 +1,18 @@
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
name: loadtester
namespace: test
annotations:
flux.weave.works/automated: "true"
flux.weave.works/tag.chart-image: glob:0.*
spec:
releaseName: flagger-loadtester
chart:
repository: https://flagger.app/
name: loadtester
version: 0.1.0
values:
image:
repository: quay.io/stefanprodan/flagger-loadtester
tag: 0.1.0


@@ -23,6 +23,7 @@ spec:
# Istio gateways (optional)
gateways:
- public-gateway.istio-system.svc.cluster.local
- mesh
# Istio virtual service host names (optional)
hosts:
- app.iowa.weavedx.com
@@ -39,12 +40,12 @@ spec:
stepWeight: 5
# Istio Prometheus checks
metrics:
- name: istio_requests_total
- name: request-success-rate
# minimum req success rate (non 5xx responses)
# percentage (0-100)
threshold: 99
interval: 1m
- name: istio_request_duration_seconds_bucket
- name: request-duration
# maximum req duration P99
# milliseconds
threshold: 500


@@ -0,0 +1,264 @@
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: prometheus
labels:
app: prometheus
rules:
- apiGroups: [""]
resources:
- nodes
- services
- endpoints
- pods
- nodes/proxy
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources:
- configmaps
verbs: ["get"]
- nonResourceURLs: ["/metrics"]
verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: prometheus
labels:
app: prometheus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: prometheus
namespace: appmesh-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus
namespace: appmesh-system
labels:
app: prometheus
---
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus
namespace: appmesh-system
labels:
app: prometheus
data:
prometheus.yml: |-
global:
scrape_interval: 5s
scrape_configs:
# Scrape config for AppMesh Envoy sidecar
- job_name: 'appmesh-envoy'
metrics_path: /stats/prometheus
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_container_name]
action: keep
regex: '^envoy$'
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: ${1}:9901
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
# Exclude high cardinality metrics
metric_relabel_configs:
- source_labels: [ cluster_name ]
regex: '(outbound|inbound|prometheus_stats).*'
action: drop
- source_labels: [ tcp_prefix ]
regex: '(outbound|inbound|prometheus_stats).*'
action: drop
- source_labels: [ listener_address ]
regex: '(.+)'
action: drop
- source_labels: [ http_conn_manager_listener_prefix ]
regex: '(.+)'
action: drop
- source_labels: [ http_conn_manager_prefix ]
regex: '(.+)'
action: drop
- source_labels: [ __name__ ]
regex: 'envoy_tls.*'
action: drop
- source_labels: [ __name__ ]
regex: 'envoy_tcp_downstream.*'
action: drop
- source_labels: [ __name__ ]
regex: 'envoy_http_(stats|admin).*'
action: drop
- source_labels: [ __name__ ]
regex: 'envoy_cluster_(lb|retry|bind|internal|max|original).*'
action: drop
# Scrape config for API servers
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- default
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: kubernetes;https
# Scrape config for nodes
- job_name: 'kubernetes-nodes'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
# scrape config for cAdvisor
- job_name: 'kubernetes-cadvisor'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
# scrape config for pods
- job_name: kubernetes-pods
kubernetes_sd_configs:
- role: pod
relabel_configs:
- action: keep
regex: true
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_scrape
- source_labels: [ __address__ ]
regex: '.*9901.*'
action: drop
- action: replace
regex: (.+)
source_labels:
- __meta_kubernetes_pod_annotation_prometheus_io_path
target_label: __metrics_path__
- action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
source_labels:
- __address__
- __meta_kubernetes_pod_annotation_prometheus_io_port
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- action: replace
source_labels:
- __meta_kubernetes_namespace
target_label: kubernetes_namespace
- action: replace
source_labels:
- __meta_kubernetes_pod_name
target_label: kubernetes_pod_name
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus
namespace: appmesh-system
labels:
app: prometheus
spec:
replicas: 1
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
annotations:
version: "appmesh-v1alpha1"
spec:
serviceAccountName: prometheus
containers:
- name: prometheus
image: "docker.io/prom/prometheus:v2.7.1"
imagePullPolicy: IfNotPresent
args:
- '--storage.tsdb.retention=6h'
- '--config.file=/etc/prometheus/prometheus.yml'
ports:
- containerPort: 9090
name: http
livenessProbe:
httpGet:
path: /-/healthy
port: 9090
readinessProbe:
httpGet:
path: /-/ready
port: 9090
resources:
requests:
cpu: 10m
memory: 128Mi
volumeMounts:
- name: config-volume
mountPath: /etc/prometheus
volumes:
- name: config-volume
configMap:
name: prometheus
---
apiVersion: v1
kind: Service
metadata:
name: prometheus
namespace: appmesh-system
labels:
name: prometheus
spec:
selector:
app: prometheus
ports:
- name: http
protocol: TCP
port: 9090
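To verify that the scrape configuration above is picking up the Envoy sidecars, the Prometheus targets API can be inspected; a minimal sketch:

```bash
# Port-forward the Prometheus service and count the active appmesh-envoy targets
kubectl -n appmesh-system port-forward svc/prometheus 9090:9090 &
sleep 2
curl -sS 'http://localhost:9090/api/v1/targets' \
  | jq '[.data.activeTargets[] | select(.labels.job=="appmesh-envoy")] | length'
```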


@@ -13,11 +13,50 @@ metadata:
labels:
app: flagger
rules:
- apiGroups: ['*']
resources: ['*']
verbs: ['*']
- nonResourceURLs: ['*']
verbs: ['*']
- apiGroups:
- ""
resources:
- events
- configmaps
- secrets
- services
verbs: ["*"]
- apiGroups:
- apps
resources:
- deployments
verbs: ["*"]
- apiGroups:
- autoscaling
resources:
- horizontalpodautoscalers
verbs: ["*"]
- apiGroups:
- flagger.app
resources:
- canaries
- canaries/status
verbs: ["*"]
- apiGroups:
- networking.istio.io
resources:
- virtualservices
- virtualservices/status
verbs: ["*"]
- apiGroups:
- appmesh.k8s.aws
resources:
- meshes
- meshes/status
- virtualnodes
- virtualnodes/status
- virtualservices
- virtualservices/status
verbs: ["*"]
- nonResourceURLs:
- /version
verbs:
- get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding


@@ -2,6 +2,8 @@ apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: canaries.flagger.app
annotations:
helm.sh/resource-policy: keep
spec:
group: flagger.app
version: v1alpha3
@@ -39,9 +41,9 @@ spec:
properties:
spec:
required:
- targetRef
- service
- canaryAnalysis
- targetRef
- service
- canaryAnalysis
properties:
progressDeadlineSeconds:
type: number
@@ -73,11 +75,21 @@ spec:
properties:
port:
type: number
portName:
type: string
meshName:
type: string
timeout:
type: string
skipAnalysis:
type: boolean
canaryAnalysis:
properties:
interval:
type: string
pattern: "^[0-9]+(m|s)"
iterations:
type: number
threshold:
type: number
maxWeight:
@@ -89,7 +101,7 @@ spec:
properties:
items:
type: object
required: ['name', 'interval', 'threshold']
required: ['name', 'threshold']
properties:
name:
type: string
@@ -98,6 +110,8 @@ spec:
pattern: "^[0-9]+(m|s)"
threshold:
type: number
query:
type: string
webhooks:
type: array
properties:
@@ -107,9 +121,36 @@ spec:
properties:
name:
type: string
type:
type: string
enum:
- ""
- pre-rollout
- rollout
- post-rollout
url:
type: string
format: url
timeout:
type: string
pattern: "^[0-9]+(m|s)"
status:
properties:
phase:
type: string
enum:
- ""
- Initialized
- Progressing
- Succeeded
- Failed
canaryWeight:
type: number
failedChecks:
type: number
iterations:
type: number
lastAppliedSpec:
type: string
lastTransitionTime:
type: string
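The status sub-resource defined above can be read back once a canary is running; a minimal sketch using the podinfo example from the README:

```bash
# Print the phase, traffic weight and failed checks tracked in the Canary status
kubectl -n test get canary podinfo \
  -o jsonpath='{.status.phase} {.status.canaryWeight} {.status.failedChecks}{"\n"}'
```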


@@ -22,8 +22,8 @@ spec:
serviceAccountName: flagger
containers:
- name: flagger
image: quay.io/stefanprodan/flagger:0.5.0
imagePullPolicy: Always
image: weaveworks/flagger:0.12.0
imagePullPolicy: IfNotPresent
ports:
- name: http
containerPort: 8080
@@ -31,6 +31,7 @@ spec:
- ./flagger
- -log-level=info
- -control-loop-interval=10s
- -mesh-provider=$(MESH_PROVIDER)
- -metrics-server=http://prometheus.istio-system.svc.cluster.local:9090
livenessProbe:
exec:


@@ -0,0 +1,27 @@
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
name: public-gateway
namespace: istio-system
spec:
selector:
istio: ingressgateway
servers:
- port:
number: 80
name: http
protocol: HTTP
hosts:
- "*"
tls:
httpsRedirect: true
- port:
number: 443
name: https
protocol: HTTPS
hosts:
- "*"
tls:
mode: SIMPLE
privateKey: /etc/istio/ingressgateway-certs/tls.key
serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
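Because the HTTP server above sets httpsRedirect, plain HTTP requests should be answered with a redirect; a minimal sketch against a stock Istio ingress gateway (the service name and host are assumptions):

```bash
# Expect a 301 redirect to HTTPS from the public gateway
GATEWAY=$(kubectl -n istio-system get svc istio-ingressgateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
curl -s -o /dev/null -w '%{http_code}\n' -H 'Host: app.istio.weavedx.com' "http://${GATEWAY}/"
```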


@@ -0,0 +1,443 @@
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: prometheus
labels:
app: prometheus
rules:
- apiGroups: [""]
resources:
- nodes
- services
- endpoints
- pods
- nodes/proxy
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources:
- configmaps
verbs: ["get"]
- nonResourceURLs: ["/metrics"]
verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: prometheus
labels:
app: prometheus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: prometheus
namespace: istio-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus
namespace: istio-system
labels:
app: prometheus
---
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus
namespace: istio-system
labels:
app: prometheus
data:
prometheus.yml: |-
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'istio-mesh'
# Override the global default and scrape targets from this job every 5 seconds.
scrape_interval: 5s
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-telemetry;prometheus
# Scrape config for envoy stats
- job_name: 'envoy-stats'
metrics_path: /stats/prometheus
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_container_port_name]
action: keep
regex: '.*-envoy-prom'
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:15090
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: pod_name
metric_relabel_configs:
# Exclude some of the envoy metrics that have massive cardinality
# This list may need to be pruned further moving forward, as informed
# by performance and scalability testing.
- source_labels: [ cluster_name ]
regex: '(outbound|inbound|prometheus_stats).*'
action: drop
- source_labels: [ tcp_prefix ]
regex: '(outbound|inbound|prometheus_stats).*'
action: drop
- source_labels: [ listener_address ]
regex: '(.+)'
action: drop
- source_labels: [ http_conn_manager_listener_prefix ]
regex: '(.+)'
action: drop
- source_labels: [ http_conn_manager_prefix ]
regex: '(.+)'
action: drop
- source_labels: [ __name__ ]
regex: 'envoy_tls.*'
action: drop
- source_labels: [ __name__ ]
regex: 'envoy_tcp_downstream.*'
action: drop
- source_labels: [ __name__ ]
regex: 'envoy_http_(stats|admin).*'
action: drop
- source_labels: [ __name__ ]
regex: 'envoy_cluster_(lb|retry|bind|internal|max|original).*'
action: drop
- job_name: 'istio-policy'
# Override the global default and scrape targets from this job every 5 seconds.
scrape_interval: 5s
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-policy;http-monitoring
- job_name: 'istio-telemetry'
# Override the global default and scrape targets from this job every 5 seconds.
scrape_interval: 5s
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-telemetry;http-monitoring
- job_name: 'pilot'
# Override the global default and scrape targets from this job every 5 seconds.
scrape_interval: 5s
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-pilot;http-monitoring
- job_name: 'galley'
# Override the global default and scrape targets from this job every 5 seconds.
scrape_interval: 5s
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-galley;http-monitoring
# scrape config for API servers
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- default
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: kubernetes;https
# scrape config for nodes (kubelet)
- job_name: 'kubernetes-nodes'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
# Scrape config for Kubelet cAdvisor.
#
# This is required for Kubernetes 1.7.3 and later, where cAdvisor metrics
# (those whose names begin with 'container_') have been removed from the
# Kubelet metrics endpoint. This job scrapes the cAdvisor endpoint to
# retrieve those metrics.
#
# In Kubernetes 1.7.0-1.7.2, these metrics are only exposed on the cAdvisor
# HTTP endpoint; use "replacement: /api/v1/nodes/${1}:4194/proxy/metrics"
# in that case (and ensure cAdvisor's HTTP server hasn't been disabled with
# the --cadvisor-port=0 Kubelet flag).
#
# This job is not necessary and should be removed in Kubernetes 1.6 and
# earlier versions, or it will cause the metrics to be scraped twice.
- job_name: 'kubernetes-cadvisor'
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
# scrape config for service endpoints.
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs: # If first two labels are present, pod should be scraped by the istio-secure job.
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_sidecar_istio_io_status]
action: drop
regex: (.+)
- source_labels: [__meta_kubernetes_pod_annotation_istio_mtls]
action: drop
regex: (true)
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: pod_name
- job_name: 'kubernetes-pods-istio-secure'
scheme: https
tls_config:
ca_file: /etc/istio-certs/root-cert.pem
cert_file: /etc/istio-certs/cert-chain.pem
key_file: /etc/istio-certs/key.pem
insecure_skip_verify: true # prometheus does not support secure naming.
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
# sidecar status annotation is added by sidecar injector and
# istio_workload_mtls_ability can be specifically placed on a pod to indicate its ability to receive mtls traffic.
- source_labels: [__meta_kubernetes_pod_annotation_sidecar_istio_io_status, __meta_kubernetes_pod_annotation_istio_mtls]
action: keep
regex: (([^;]+);([^;]*))|(([^;]*);(true))
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__] # Only keep address that is host:port
action: keep # otherwise an extra target with ':443' is added for https scheme
regex: ([^:]+):(\d+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: pod_name
---
# Source: istio/charts/prometheus/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
name: prometheus
namespace: istio-system
annotations:
prometheus.io/scrape: 'true'
labels:
name: prometheus
spec:
selector:
app: prometheus
ports:
- name: http-prometheus
protocol: TCP
port: 9090
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus
namespace: istio-system
labels:
app: prometheus
spec:
replicas: 1
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
annotations:
sidecar.istio.io/inject: "false"
scheduler.alpha.kubernetes.io/critical-pod: ""
spec:
serviceAccountName: prometheus
containers:
- name: prometheus
image: "docker.io/prom/prometheus:v2.7.1"
imagePullPolicy: IfNotPresent
args:
- '--storage.tsdb.retention=6h'
- '--config.file=/etc/prometheus/prometheus.yml'
ports:
- containerPort: 9090
name: http
livenessProbe:
httpGet:
path: /-/healthy
port: 9090
readinessProbe:
httpGet:
path: /-/ready
port: 9090
resources:
requests:
cpu: 10m
volumeMounts:
- name: config-volume
mountPath: /etc/prometheus
- mountPath: /etc/istio-certs
name: istio-certs
volumes:
- name: config-volume
configMap:
name: prometheus
- name: istio-certs
secret:
defaultMode: 420
optional: true
secretName: istio.default


@@ -0,0 +1,19 @@
---
apiVersion: v1
kind: ConfigMap
metadata:
name: flagger-loadtester-bats
data:
tests: |
#!/usr/bin/env bats
@test "check message" {
curl -sS http://${URL} | jq -r .message | {
run cut -d $' ' -f1
[ $output = "greetings" ]
}
}
@test "check headers" {
curl -sS http://${URL}/headers | grep X-Request-Id
}
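These checks can also be run outside the load tester while debugging; a hypothetical invocation, assuming bats, curl and jq are installed, the ConfigMap was applied to the current namespace, and the commands run from a pod that can reach podinfo.test:

```bash
# Extract the test file from the ConfigMap and run it against podinfo
kubectl get configmap flagger-loadtester-bats -o jsonpath='{.data.tests}' > podinfo.bats
URL=podinfo.test:9898 bats podinfo.bats
```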


@@ -17,7 +17,7 @@ spec:
spec:
containers:
- name: loadtester
image: quay.io/stefanprodan/flagger-loadtester:0.1.0
image: weaveworks/flagger-loadtester:0.3.0
imagePullPolicy: IfNotPresent
ports:
- name: http
@@ -27,7 +27,6 @@ spec:
- -port=8080
- -log-level=info
- -timeout=1h
- -log-cmd-output=true
livenessProbe:
exec:
command:
@@ -58,3 +57,11 @@ spec:
securityContext:
readOnlyRootFilesystem: true
runAsUser: 10001
# volumeMounts:
# - name: tests
# mountPath: /bats
# readOnly: true
# volumes:
# - name: tests
# configMap:
# name: flagger-loadtester-bats


@@ -4,3 +4,4 @@ metadata:
name: test
labels:
istio-injection: enabled
appmesh.k8s.aws/sidecarInjectorWebhook: enabled


@@ -0,0 +1,45 @@
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: podinfo
namespace: test
spec:
gateways:
- public-gateway.istio-system.svc.cluster.local
- mesh
hosts:
- podinfo.istio.weavedx.com
- podinfo
http:
- route:
- destination:
host: podinfo
subset: primary
weight: 50
- destination:
host: podinfo
subset: canary
weight: 50
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: podinfo-destination
namespace: test
spec:
host: podinfo
trafficPolicy:
loadBalancer:
consistentHash:
httpCookie:
name: istiouser
ttl: 30s
subsets:
- name: primary
labels:
app: podinfo
role: primary
- name: canary
labels:
app: podinfo
role: canary


@@ -8,13 +8,17 @@ spec:
- public-gateway.istio-system.svc.cluster.local
- mesh
hosts:
- podinfo.iowa.weavedx.com
- app.istio.weavedx.com
- podinfo
http:
- match:
- headers:
user-agent:
regex: ^(?!.*Chrome)(?=.*\bSafari\b).*$
uri:
prefix: "/version/"
rewrite:
uri: /api/info
route:
- destination:
host: podinfo-primary
@@ -26,7 +30,12 @@ spec:
port:
number: 9898
weight: 100
- route:
- match:
- uri:
prefix: "/version/"
rewrite:
uri: /api/info
route:
- destination:
host: podinfo-primary
port:


@@ -23,7 +23,7 @@ spec:
spec:
containers:
- name: podinfod
image: quay.io/stefanprodan/podinfo:1.2.0
image: quay.io/stefanprodan/podinfo:1.4.0
imagePullPolicy: IfNotPresent
ports:
- containerPort: 9898
@@ -67,9 +67,3 @@ spec:
requests:
cpu: 100m
memory: 16Mi
volumeMounts:
- mountPath: /data
name: data
volumes:
- emptyDir: {}
name: data


@@ -1,10 +1,8 @@
apiVersion: v1
kind: Service
metadata:
name: podinfo
name: podinfo-canary
namespace: test
labels:
app: podinfo
spec:
type: ClusterIP
selector:


@@ -23,7 +23,7 @@ spec:
spec:
containers:
- name: podinfod
image: quay.io/stefanprodan/podinfo:1.1.1
image: quay.io/stefanprodan/podinfo:1.4.1
imagePullPolicy: IfNotPresent
ports:
- containerPort: 9898


@@ -0,0 +1,14 @@
apiVersion: v1
kind: Service
metadata:
name: podinfo
namespace: test
spec:
type: ClusterIP
selector:
app: podinfo-primary
ports:
- name: http
port: 9898
protocol: TCP
targetPort: http


@@ -1,14 +1,14 @@
apiVersion: v1
name: flagger
version: 0.5.0
appVersion: 0.5.0
version: 0.12.0
appVersion: 0.12.0
kubeVersion: ">=1.11.0-0"
engine: gotpl
description: Flagger is a Kubernetes operator that automates the promotion of canary deployments using Istio routing for traffic shifting and Prometheus metrics for canary analysis.
home: https://docs.flagger.app
icon: https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/logo/flagger-icon.png
icon: https://raw.githubusercontent.com/weaveworks/flagger/master/docs/logo/flagger-icon.png
sources:
- https://github.com/stefanprodan/flagger
- https://github.com/weaveworks/flagger
maintainers:
- name: stefanprodan
url: https://github.com/stefanprodan
@@ -16,4 +16,5 @@ maintainers:
keywords:
- canary
- istio
- appmesh
- gitops


@@ -1,6 +1,6 @@
# Flagger
[Flagger](https://github.com/stefanprodan/flagger) is a Kubernetes operator that automates the promotion of
[Flagger](https://github.com/weaveworks/flagger) is a Kubernetes operator that automates the promotion of
canary deployments using Istio routing for traffic shifting and Prometheus metrics for canary analysis.
Flagger implements a control loop that gradually shifts traffic to the canary while measuring key performance indicators
like HTTP requests success rate, requests average duration and pods health.


@@ -3,6 +3,8 @@ apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: canaries.flagger.app
annotations:
helm.sh/resource-policy: keep
spec:
group: flagger.app
version: v1alpha3
@@ -74,11 +76,21 @@ spec:
properties:
port:
type: number
portName:
type: string
meshName:
type: string
timeout:
type: string
skipAnalysis:
type: boolean
canaryAnalysis:
properties:
interval:
type: string
pattern: "^[0-9]+(m|s)"
iterations:
type: number
threshold:
type: number
maxWeight:
@@ -90,7 +102,7 @@ spec:
properties:
items:
type: object
required: ['name', 'interval', 'threshold']
required: ['name', 'threshold']
properties:
name:
type: string
@@ -99,6 +111,8 @@ spec:
pattern: "^[0-9]+(m|s)"
threshold:
type: number
query:
type: string
webhooks:
type: array
properties:
@@ -108,10 +122,37 @@ spec:
properties:
name:
type: string
type:
type: string
enum:
- ""
- pre-rollout
- rollout
- post-rollout
url:
type: string
format: url
timeout:
type: string
pattern: "^[0-9]+(m|s)"
status:
properties:
phase:
type: string
enum:
- ""
- Initialized
- Progressing
- Succeeded
- Failed
canaryWeight:
type: number
failedChecks:
type: number
iterations:
type: number
lastAppliedSpec:
type: string
lastTransitionTime:
type: string
{{- end }}


@@ -35,7 +35,13 @@ spec:
command:
- ./flagger
- -log-level=info
{{- if .Values.meshProvider }}
- -mesh-provider={{ .Values.meshProvider }}
{{- end }}
- -metrics-server={{ .Values.metricsServer }}
{{- if .Values.namespace }}
- -namespace={{ .Values.namespace }}
{{- end }}
{{- if .Values.slack.url }}
- -slack-url={{ .Values.slack.url }}
- -slack-user={{ .Values.slack.user }}

View File

@@ -9,11 +9,50 @@ metadata:
app.kubernetes.io/managed-by: {{ .Release.Service }}
app.kubernetes.io/instance: {{ .Release.Name }}
rules:
- apiGroups: ['*']
resources: ['*']
verbs: ['*']
- nonResourceURLs: ['*']
verbs: ['*']
- apiGroups:
- ""
resources:
- events
- configmaps
- secrets
- services
verbs: ["*"]
- apiGroups:
- apps
resources:
- deployments
verbs: ["*"]
- apiGroups:
- autoscaling
resources:
- horizontalpodautoscalers
verbs: ["*"]
- apiGroups:
- flagger.app
resources:
- canaries
- canaries/status
verbs: ["*"]
- apiGroups:
- networking.istio.io
resources:
- virtualservices
- virtualservices/status
verbs: ["*"]
- apiGroups:
- appmesh.k8s.aws
resources:
- meshes
- meshes/status
- virtualnodes
- virtualnodes/status
- virtualservices
- virtualservices/status
verbs: ["*"]
- nonResourceURLs:
- /version
verbs:
- get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding

View File

@@ -1,11 +1,17 @@
# Default values for flagger.
image:
repository: quay.io/stefanprodan/flagger
tag: 0.5.0
repository: weaveworks/flagger
tag: 0.12.0
pullPolicy: IfNotPresent
metricsServer: "http://prometheus.istio-system.svc.cluster.local:9090"
metricsServer: "http://prometheus:9090"
# accepted values are istio or appmesh (defaults to istio)
meshProvider: ""
# single namespace restriction
namespace: ""
slack:
user: flagger

View File

@@ -1,12 +1,12 @@
apiVersion: v1
name: grafana
version: 0.1.0
appVersion: 5.4.2
version: 1.1.0
appVersion: 5.4.3
description: Grafana dashboards for monitoring Flagger canary deployments
icon: https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/logo/flagger-icon.png
icon: https://raw.githubusercontent.com/weaveworks/flagger/master/docs/logo/flagger-icon.png
home: https://flagger.app
sources:
- https://github.com/stefanprodan/flagger
- https://github.com/weaveworks/flagger
maintainers:
- name: stefanprodan
url: https://github.com/stefanprodan

View File

@@ -2,7 +2,7 @@
Grafana dashboards for monitoring progressive deployments powered by Istio, Prometheus and Flagger.
![flagger-grafana](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/screens/grafana-canary-analysis.png)
![flagger-grafana](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/screens/grafana-canary-analysis.png)
## Prerequisites

File diff suppressed because it is too large

View File

@@ -2,7 +2,6 @@
"annotations": {
"list": [
{
"$$hashKey": "object:1587",
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
@@ -16,8 +15,8 @@
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"iteration": 1534587617141,
"id": 1,
"iteration": 1549736611069,
"links": [],
"panels": [
{
@@ -179,7 +178,6 @@
"tableColumn": "",
"targets": [
{
"$$hashKey": "object:2857",
"expr": "sum(irate(istio_requests_total{reporter=\"destination\",destination_workload_namespace=~\"$namespace\",destination_workload=~\"$primary\",response_code!~\"5.*\"}[30s])) / sum(irate(istio_requests_total{reporter=\"destination\",destination_workload_namespace=~\"$namespace\",destination_workload=~\"$primary\"}[30s]))",
"format": "time_series",
"intervalFactor": 1,
@@ -344,7 +342,6 @@
"tableColumn": "",
"targets": [
{
"$$hashKey": "object:2810",
"expr": "sum(irate(istio_requests_total{reporter=\"destination\",destination_workload_namespace=~\"$namespace\",destination_workload=~\"$canary\",response_code!~\"5.*\"}[30s])) / sum(irate(istio_requests_total{reporter=\"destination\",destination_workload_namespace=~\"$namespace\",destination_workload=~\"$canary\"}[30s]))",
"format": "time_series",
"intervalFactor": 1,
@@ -363,7 +360,7 @@
"value": "null"
}
],
"valueName": "avg"
"valueName": "current"
},
{
"aliasColors": {},
@@ -432,6 +429,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Primary: Request Duration",
"tooltip": {
@@ -464,7 +462,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -533,6 +535,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Canary: Request Duration",
"tooltip": {
@@ -565,7 +568,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"content": "<div class=\"dashboard-header text-center\">\n<span>USE: $canary.$namespace</span>\n</div>",
@@ -623,7 +630,6 @@
"steppedLine": false,
"targets": [
{
"$$hashKey": "object:1685",
"expr": "sum(rate(container_cpu_usage_seconds_total{cpu=\"total\",namespace=\"$namespace\",pod_name=~\"$primary.*\", container_name!~\"POD|istio-proxy\"}[1m])) by (pod_name)",
"format": "time_series",
"hide": false,
@@ -634,6 +640,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Primary: CPU Usage by Pod",
"tooltip": {
@@ -651,7 +658,6 @@
},
"yaxes": [
{
"$$hashKey": "object:1845",
"format": "s",
"label": "CPU seconds / second",
"logBase": 1,
@@ -660,7 +666,6 @@
"show": true
},
{
"$$hashKey": "object:1846",
"format": "short",
"label": null,
"logBase": 1,
@@ -668,7 +673,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -711,7 +720,6 @@
"steppedLine": false,
"targets": [
{
"$$hashKey": "object:1685",
"expr": "sum(rate(container_cpu_usage_seconds_total{cpu=\"total\",namespace=\"$namespace\",pod_name=~\"$canary.*\", pod_name!~\"$primary.*\", container_name!~\"POD|istio-proxy\"}[1m])) by (pod_name)",
"format": "time_series",
"hide": false,
@@ -722,6 +730,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Canary: CPU Usage by Pod",
"tooltip": {
@@ -739,7 +748,6 @@
},
"yaxes": [
{
"$$hashKey": "object:1845",
"format": "s",
"label": "CPU seconds / second",
"logBase": 1,
@@ -748,7 +756,6 @@
"show": true
},
{
"$$hashKey": "object:1846",
"format": "short",
"label": null,
"logBase": 1,
@@ -756,7 +763,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -799,7 +810,6 @@
"steppedLine": false,
"targets": [
{
"$$hashKey": "object:1685",
"expr": "sum(container_memory_working_set_bytes{namespace=\"$namespace\",pod_name=~\"$primary.*\", container_name!~\"POD|istio-proxy\"}) by (pod_name)",
"format": "time_series",
"hide": false,
@@ -811,6 +821,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Primary: Memory Usage by Pod",
"tooltip": {
@@ -828,7 +839,6 @@
},
"yaxes": [
{
"$$hashKey": "object:1845",
"decimals": null,
"format": "bytes",
"label": "",
@@ -838,7 +848,6 @@
"show": true
},
{
"$$hashKey": "object:1846",
"format": "short",
"label": null,
"logBase": 1,
@@ -846,7 +855,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -889,7 +902,6 @@
"steppedLine": false,
"targets": [
{
"$$hashKey": "object:1685",
"expr": "sum(container_memory_working_set_bytes{namespace=\"$namespace\",pod_name=~\"$canary.*\", pod_name!~\"$primary.*\", container_name!~\"POD|istio-proxy\"}) by (pod_name)",
"format": "time_series",
"hide": false,
@@ -901,6 +913,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Canary: Memory Usage by Pod",
"tooltip": {
@@ -918,7 +931,6 @@
},
"yaxes": [
{
"$$hashKey": "object:1845",
"decimals": null,
"format": "bytes",
"label": "",
@@ -928,7 +940,6 @@
"show": true
},
{
"$$hashKey": "object:1846",
"format": "short",
"label": null,
"logBase": 1,
@@ -936,7 +947,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -975,12 +990,10 @@
"renderer": "flot",
"seriesOverrides": [
{
"$$hashKey": "object:3641",
"alias": "received",
"color": "#f9d9f9"
},
{
"$$hashKey": "object:3649",
"alias": "transmited",
"color": "#f29191"
}
@@ -990,7 +1003,6 @@
"steppedLine": false,
"targets": [
{
"$$hashKey": "object:2598",
"expr": "sum(rate (container_network_receive_bytes_total{namespace=\"$namespace\",pod_name=~\"$primary.*\"}[1m])) ",
"format": "time_series",
"intervalFactor": 1,
@@ -998,7 +1010,6 @@
"refId": "A"
},
{
"$$hashKey": "object:3245",
"expr": "-sum (rate (container_network_transmit_bytes_total{namespace=\"$namespace\",pod_name=~\"$primary.*\"}[1m]))",
"format": "time_series",
"intervalFactor": 1,
@@ -1008,6 +1019,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Primary: Network I/O",
"tooltip": {
@@ -1025,7 +1037,6 @@
},
"yaxes": [
{
"$$hashKey": "object:1845",
"decimals": null,
"format": "Bps",
"label": "",
@@ -1035,7 +1046,6 @@
"show": true
},
{
"$$hashKey": "object:1846",
"format": "short",
"label": null,
"logBase": 1,
@@ -1043,7 +1053,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -1082,12 +1096,10 @@
"renderer": "flot",
"seriesOverrides": [
{
"$$hashKey": "object:3641",
"alias": "received",
"color": "#f9d9f9"
},
{
"$$hashKey": "object:3649",
"alias": "transmited",
"color": "#f29191"
}
@@ -1097,7 +1109,6 @@
"steppedLine": false,
"targets": [
{
"$$hashKey": "object:2598",
"expr": "sum(rate (container_network_receive_bytes_total{namespace=\"$namespace\",pod_name=~\"$canary.*\",pod_name!~\"$primary.*\"}[1m])) ",
"format": "time_series",
"intervalFactor": 1,
@@ -1105,7 +1116,6 @@
"refId": "A"
},
{
"$$hashKey": "object:3245",
"expr": "-sum (rate (container_network_transmit_bytes_total{namespace=\"$namespace\",pod_name=~\"$canary.*\",pod_name!~\"$primary.*\"}[1m]))",
"format": "time_series",
"intervalFactor": 1,
@@ -1115,6 +1125,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Canary: Network I/O",
"tooltip": {
@@ -1132,7 +1143,6 @@
},
"yaxes": [
{
"$$hashKey": "object:1845",
"decimals": null,
"format": "Bps",
"label": "",
@@ -1142,7 +1152,6 @@
"show": true
},
{
"$$hashKey": "object:1846",
"format": "short",
"label": null,
"logBase": 1,
@@ -1150,7 +1159,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"content": "<div class=\"dashboard-header text-center\">\n<span>IN/OUTBOUND: $canary.$namespace</span>\n</div>",
@@ -1205,7 +1218,6 @@
"steppedLine": false,
"targets": [
{
"$$hashKey": "object:1953",
"expr": "round(sum(irate(istio_requests_total{connection_security_policy=\"mutual_tls\", destination_workload_namespace=~\"$namespace\", destination_workload=~\"$primary\", reporter=\"destination\"}[30s])) by (source_workload, source_workload_namespace, response_code), 0.001)",
"format": "time_series",
"hide": false,
@@ -1215,7 +1227,6 @@
"step": 2
},
{
"$$hashKey": "object:1954",
"expr": "round(sum(irate(istio_requests_total{connection_security_policy!=\"mutual_tls\", destination_workload_namespace=~\"$namespace\", destination_workload=~\"$primary\", reporter=\"destination\"}[30s])) by (source_workload, source_workload_namespace, response_code), 0.001)",
"format": "time_series",
"hide": false,
@@ -1227,6 +1238,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Primary: Incoming Requests by Source And Response Code",
"tooltip": {
@@ -1246,7 +1258,6 @@
},
"yaxes": [
{
"$$hashKey": "object:1999",
"format": "ops",
"label": null,
"logBase": 1,
@@ -1255,7 +1266,6 @@
"show": true
},
{
"$$hashKey": "object:2000",
"format": "short",
"label": null,
"logBase": 1,
@@ -1263,7 +1273,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -1323,6 +1337,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Canary: Incoming Requests by Source And Response Code",
"tooltip": {
@@ -1357,7 +1372,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -1416,6 +1435,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Primary: Outgoing Requests by Destination And Response Code",
"tooltip": {
@@ -1450,7 +1470,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
@@ -1509,6 +1533,7 @@
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Canary: Outgoing Requests by Destination And Response Code",
"tooltip": {
@@ -1543,7 +1568,11 @@
"min": null,
"show": false
}
]
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"refresh": "10s",
@@ -1554,11 +1583,9 @@
"list": [
{
"allValue": null,
"current": {
"text": "demo",
"value": "demo"
},
"current": null,
"datasource": "prometheus",
"definition": "",
"hide": 0,
"includeAll": false,
"label": "Namespace",
@@ -1568,6 +1595,7 @@
"query": "query_result(sum(istio_requests_total) by (destination_workload_namespace) or sum(istio_tcp_sent_bytes_total) by (destination_workload_namespace))",
"refresh": 1,
"regex": "/.*_namespace=\"([^\"]*).*/",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
@@ -1577,11 +1605,9 @@
},
{
"allValue": null,
"current": {
"text": "primary",
"value": "primary"
},
"current": null,
"datasource": "prometheus",
"definition": "",
"hide": 0,
"includeAll": false,
"label": "Primary",
@@ -1591,6 +1617,7 @@
"query": "query_result(sum(istio_requests_total{destination_workload_namespace=~\"$namespace\"}) by (destination_service_name))",
"refresh": 1,
"regex": "/.*destination_service_name=\"([^\"]*).*/",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [],
@@ -1600,11 +1627,9 @@
},
{
"allValue": null,
"current": {
"text": "canary",
"value": "canary"
},
"current": null,
"datasource": "prometheus",
"definition": "",
"hide": 0,
"includeAll": false,
"label": "Canary",
@@ -1614,6 +1639,7 @@
"query": "query_result(sum(istio_requests_total{destination_workload_namespace=~\"$namespace\"}) by (destination_service_name))",
"refresh": 1,
"regex": "/.*destination_service_name=\"([^\"]*).*/",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [],
@@ -1653,7 +1679,7 @@
]
},
"timezone": "",
"title": "Canary analysis",
"uid": "RdykD7tiz",
"version": 2
}
"title": "Istio Canary",
"uid": "flagger-istio",
"version": 3
}

View File

@@ -1,15 +1,7 @@
1. Get the application URL by running these commands:
{{- if contains "NodePort" .Values.service.type }}
export NODE_PORT=$(kubectl get --namespace {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}" services {{ template "grafana.fullname" . }})
export NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
{{- else if contains "LoadBalancer" .Values.service.type }}
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
You can watch the status of by running 'kubectl get svc -w {{ template "grafana.fullname" . }}'
export SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ template "grafana.fullname" . }} -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo http://$SERVICE_IP:{{ .Values.service.port }}
{{- else if contains "ClusterIP" .Values.service.type }}
export POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app={{ template "grafana.name" . }},release={{ .Release.Name }}" -o jsonpath="{.items[0].metadata.name}")
echo "Visit http://127.0.0.1:8080 to use your application"
kubectl port-forward $POD_NAME 8080:80
{{- end }}
1. Run the port forward command:
kubectl -n {{ .Release.Namespace }} port-forward svc/{{ .Release.Name }} 3000:80
2. Navigate to:
http://localhost:3000

View File

@@ -38,12 +38,21 @@ spec:
# path: /
# port: http
env:
- name: GF_PATHS_PROVISIONING
value: /etc/grafana/provisioning/
{{- if .Values.password }}
- name: GF_SECURITY_ADMIN_USER
value: {{ .Values.user }}
- name: GF_SECURITY_ADMIN_PASSWORD
value: {{ .Values.password }}
- name: GF_PATHS_PROVISIONING
value: /etc/grafana/provisioning/
{{- else }}
- name: GF_AUTH_BASIC_ENABLED
value: "false"
- name: GF_AUTH_ANONYMOUS_ENABLED
value: "true"
- name: GF_AUTH_ANONYMOUS_ORG_ROLE
value: Admin
{{- end }}
volumeMounts:
- name: grafana
mountPath: /var/lib/grafana

View File

@@ -6,7 +6,7 @@ replicaCount: 1
image:
repository: grafana/grafana
tag: 5.4.2
tag: 5.4.3
pullPolicy: IfNotPresent
service:
@@ -28,7 +28,7 @@ tolerations: []
affinity: {}
user: admin
password: admin
password:
# Istio Prometheus instance
url: http://prometheus:9090

View File

@@ -1,14 +1,14 @@
apiVersion: v1
name: loadtester
version: 0.1.0
appVersion: 0.1.0
version: 0.4.0
appVersion: 0.3.0
kubeVersion: ">=1.11.0-0"
engine: gotpl
description: Flagger's load testing service, based on rakyll/hey, generates traffic during canary analysis when configured as a webhook.
home: https://docs.flagger.app
icon: https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/logo/flagger-icon.png
icon: https://raw.githubusercontent.com/weaveworks/flagger/master/docs/logo/flagger-icon.png
sources:
- https://github.com/stefanprodan/flagger
- https://github.com/weaveworks/flagger
maintainers:
- name: stefanprodan
url: https://github.com/stefanprodan
@@ -16,5 +16,6 @@ maintainers:
keywords:
- canary
- istio
- appmesh
- gitops
- load testing

View File

@@ -1,6 +1,6 @@
# Flagger load testing service
[Flagger's](https://github.com/stefanprodan/flagger) load testing service is based on
[Flagger's](https://github.com/weaveworks/flagger) load testing service is based on
[rakyll/hey](https://github.com/rakyll/hey)
and can be used to generate traffic during canary analysis when configured as a webhook.
@@ -56,15 +56,15 @@ Parameter | Description | Default
`nodeSelector` | node labels for pod assignment | `{}`
`service.type` | type of service | `ClusterIP`
`service.port` | ClusterIP port | `80`
`cmd.logOutput` | Log the command output to stderr | `true`
`cmd.timeout` | Command execution timeout | `1h`
`logLevel` | Log level can be debug, info, warning, error or panic | `info`
`meshName` | AWS App Mesh name | `none`
`backends` | AWS App Mesh virtual services | `none`
Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`. For example,
```console
helm install flagger/loadtester --name flagger-loadtester \
--set cmd.logOutput=false
helm install flagger/loadtester --name flagger-loadtester
```
Alternatively, a YAML file that specifies the values for the above parameters can be provided while installing the chart. For example,

View File

@@ -16,6 +16,8 @@ spec:
metadata:
labels:
app: {{ include "loadtester.name" . }}
annotations:
appmesh.k8s.aws/ports: "444"
spec:
containers:
- name: {{ .Chart.Name }}
@@ -29,7 +31,6 @@ spec:
- -port=8080
- -log-level={{ .Values.logLevel }}
- -timeout={{ .Values.cmd.timeout }}
- -log-cmd-output={{ .Values.cmd.logOutput }}
livenessProbe:
exec:
command:

View File

@@ -0,0 +1,27 @@
{{- if .Values.meshName }}
apiVersion: appmesh.k8s.aws/v1beta1
kind: VirtualNode
metadata:
name: {{ include "loadtester.fullname" . }}
labels:
app.kubernetes.io/name: {{ include "loadtester.name" . }}
helm.sh/chart: {{ include "loadtester.chart" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
spec:
meshName: {{ .Values.meshName }}
listeners:
- portMapping:
port: 444
protocol: http
serviceDiscovery:
dns:
hostName: {{ include "loadtester.fullname" . }}.{{ .Release.Namespace }}
{{- if .Values.backends }}
backends:
{{- range .Values.backends }}
- virtualService:
virtualServiceName: {{ . }}
{{- end }}
{{- end }}
{{- end }}

View File

@@ -1,13 +1,12 @@
replicaCount: 1
image:
repository: quay.io/stefanprodan/flagger-loadtester
tag: 0.1.0
repository: weaveworks/flagger-loadtester
tag: 0.3.0
pullPolicy: IfNotPresent
logLevel: info
cmd:
logOutput: true
timeout: 1h
nameOverride: ""
@@ -27,3 +26,9 @@ nodeSelector: {}
tolerations: []
affinity: {}
# App Mesh virtual node settings
meshName: ""
#backends:
# - app1.namespace
# - app2.namespace

View File

@@ -0,0 +1,21 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*~
# Various IDEs
.project
.idea/
*.tmproj

12
charts/podinfo/Chart.yaml Normal file
View File

@@ -0,0 +1,12 @@
apiVersion: v1
version: 2.0.1
appVersion: 1.4.0
name: podinfo
engine: gotpl
description: Flagger canary deployment demo chart
home: https://github.com/weaveworks/flagger
maintainers:
- email: stefanprodan@users.noreply.github.com
name: stefanprodan
sources:
- https://github.com/weaveworks/flagger

79
charts/podinfo/README.md Normal file
View File

@@ -0,0 +1,79 @@
# Podinfo
Podinfo is a tiny web application made with Go
that showcases best practices of running canary deployments with Flagger and Istio.
## Installing the Chart
Add Flagger Helm repository:
```console
helm repo add flagger https://flagger.app
```
To install the chart with the release name `frontend`:
```console
helm upgrade -i frontend flagger/podinfo \
--namespace test \
--set nameOverride=frontend \
--set backend=http://backend.test:9898/echo \
--set canary.enabled=true \
--set canary.istioIngress.enabled=true \
--set canary.istioIngress.gateway=public-gateway.istio-system.svc.cluster.local \
--set canary.istioIngress.host=frontend.istio.example.com
```
To install the chart as `backend`:
```console
helm upgrade -i backend flagger/podinfo \
--namespace test \
--set nameOverride=backend \
--set canary.enabled=true
```
## Uninstalling the Chart
To uninstall/delete the `frontend` deployment:
```console
$ helm delete --purge frontend
```
The command removes all the Kubernetes components associated with the chart and deletes the release.
## Configuration
The following tables lists the configurable parameters of the podinfo chart and their default values.
Parameter | Description | Default
--- | --- | ---
`image.repository` | image repository | `quay.io/stefanprodan/podinfo`
`image.tag` | image tag | `<VERSION>`
`image.pullPolicy` | image pull policy | `IfNotPresent`
`hpa.enabled` | enables HPA | `true`
`hpa.cpu` | target CPU usage per pod | `80`
`hpa.memory` | target memory usage per pod | `512Mi`
`hpa.minReplicas` | minimum pod replicas | `2`
`hpa.maxReplicas` | maximum pod replicas | `4`
`resources.requests/cpu` | pod CPU request | `1m`
`resources.requests/memory` | pod memory request | `16Mi`
`backend` | backend URL | None
`faults.delay` | random HTTP response delays between 0 and 5 seconds | `false`
`faults.error` | 1/3 chance of a random HTTP response error | `false`
Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`. For example,
```console
$ helm install flagger/podinfo --name frontend \
--set=image.tag=1.4.1,hpa.enabled=false
```
Alternatively, a YAML file that specifies the values for the above parameters can be provided while installing the chart. For example,
```console
$ helm install flagger/podinfo --name frontend -f values.yaml
```

View File

@@ -0,0 +1 @@
podinfo {{ .Release.Name }} deployed!

View File

@@ -0,0 +1,43 @@
{{/* vim: set filetype=mustache: */}}
{{/*
Expand the name of the chart.
*/}}
{{- define "podinfo.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "podinfo.fullname" -}}
{{- if .Values.fullnameOverride -}}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- $name := default .Chart.Name .Values.nameOverride -}}
{{- if contains $name .Release.Name -}}
{{- .Release.Name | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- end -}}
{{- end -}}
{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "podinfo.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{/*
Create chart name suffix.
*/}}
{{- define "podinfo.suffix" -}}
{{- if .Values.canary.enabled -}}
{{- "-primary" -}}
{{- else -}}
{{- "" -}}
{{- end -}}
{{- end -}}

View File

@@ -0,0 +1,54 @@
{{- if .Values.canary.enabled }}
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
name: {{ template "podinfo.fullname" . }}
labels:
app: {{ template "podinfo.name" . }}
chart: {{ template "podinfo.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: {{ template "podinfo.fullname" . }}
progressDeadlineSeconds: 60
autoscalerRef:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
name: {{ template "podinfo.fullname" . }}
service:
port: {{ .Values.service.port }}
{{- if .Values.canary.istioIngress.enabled }}
gateways:
- {{ .Values.canary.istioIngress.gateway }}
hosts:
- {{ .Values.canary.istioIngress.host }}
{{- end }}
canaryAnalysis:
interval: {{ .Values.canary.analysis.interval }}
threshold: {{ .Values.canary.analysis.threshold }}
maxWeight: {{ .Values.canary.analysis.maxWeight }}
stepWeight: {{ .Values.canary.analysis.stepWeight }}
metrics:
- name: request-success-rate
threshold: {{ .Values.canary.thresholds.successRate }}
interval: 1m
- name: request-duration
threshold: {{ .Values.canary.thresholds.latency }}
interval: 1m
{{- if .Values.canary.loadtest.enabled }}
webhooks:
- name: load-test-get
url: {{ .Values.canary.loadtest.url }}
timeout: 5s
metadata:
cmd: "hey -z 1m -q 5 -c 2 http://{{ template "podinfo.fullname" . }}.{{ .Release.Namespace }}:{{ .Values.service.port }}"
- name: load-test-post
url: {{ .Values.canary.loadtest.url }}
timeout: 5s
metadata:
cmd: "hey -z 1m -q 5 -c 2 -m POST -d '{\"test\": true}' http://{{ template "podinfo.fullname" . }}.{{ .Release.Namespace }}:{{ .Values.service.port }}/echo"
{{- end }}
{{- end }}

View File

@@ -0,0 +1,15 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ template "podinfo.fullname" . }}
labels:
app: {{ template "podinfo.name" . }}
chart: {{ template "podinfo.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
data:
config.yaml: |-
# http settings
http-client-timeout: 1m
http-server-timeout: {{ .Values.httpServer.timeout }}
http-server-shutdown-timeout: 5s

View File

@@ -0,0 +1,93 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ template "podinfo.fullname" . }}
labels:
app: {{ template "podinfo.name" . }}
chart: {{ template "podinfo.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
selector:
matchLabels:
app: {{ template "podinfo.fullname" . }}
template:
metadata:
labels:
app: {{ template "podinfo.fullname" . }}
annotations:
prometheus.io/scrape: 'true'
spec:
terminationGracePeriodSeconds: 30
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
command:
- ./podinfo
- --port={{ .Values.service.port }}
- --level={{ .Values.logLevel }}
- --random-delay={{ .Values.faults.delay }}
- --random-error={{ .Values.faults.error }}
- --config-path=/podinfo/config
env:
{{- if .Values.message }}
- name: PODINFO_UI_MESSAGE
value: {{ .Values.message }}
{{- end }}
{{- if .Values.backend }}
- name: PODINFO_BACKEND_URL
value: {{ .Values.backend }}
{{- end }}
ports:
- name: http
containerPort: {{ .Values.service.port }}
protocol: TCP
livenessProbe:
exec:
command:
- podcli
- check
- http
- localhost:{{ .Values.service.port }}/healthz
initialDelaySeconds: 5
timeoutSeconds: 5
readinessProbe:
exec:
command:
- podcli
- check
- http
- localhost:{{ .Values.service.port }}/readyz
initialDelaySeconds: 5
timeoutSeconds: 5
volumeMounts:
- name: data
mountPath: /data
- name: config
mountPath: /podinfo/config
readOnly: true
resources:
{{ toYaml .Values.resources | indent 12 }}
{{- with .Values.nodeSelector }}
nodeSelector:
{{ toYaml . | indent 8 }}
{{- end }}
{{- with .Values.affinity }}
affinity:
{{ toYaml . | indent 8 }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
{{ toYaml . | indent 8 }}
{{- end }}
volumes:
- name: data
emptyDir: {}
- name: config
configMap:
name: {{ template "podinfo.fullname" . }}

View File

@@ -0,0 +1,37 @@
{{- if .Values.hpa.enabled -}}
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: {{ template "podinfo.fullname" . }}
labels:
app: {{ template "podinfo.name" . }}
chart: {{ template "podinfo.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
scaleTargetRef:
apiVersion: apps/v1beta2
kind: Deployment
name: {{ template "podinfo.fullname" . }}
minReplicas: {{ .Values.hpa.minReplicas }}
maxReplicas: {{ .Values.hpa.maxReplicas }}
metrics:
{{- if .Values.hpa.cpu }}
- type: Resource
resource:
name: cpu
targetAverageUtilization: {{ .Values.hpa.cpu }}
{{- end }}
{{- if .Values.hpa.memory }}
- type: Resource
resource:
name: memory
targetAverageValue: {{ .Values.hpa.memory }}
{{- end }}
{{- if .Values.hpa.requests }}
- type: Pod
pods:
metricName: http_requests
targetAverageValue: {{ .Values.hpa.requests }}
{{- end }}
{{- end }}

View File

@@ -0,0 +1,20 @@
{{- if not .Values.canary.enabled }}
apiVersion: v1
kind: Service
metadata:
name: {{ template "podinfo.fullname" . }}
labels:
app: {{ template "podinfo.name" . }}
chart: {{ template "podinfo.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
type: {{ .Values.service.type }}
ports:
- port: {{ .Values.service.port }}
targetPort: http
protocol: TCP
name: http
selector:
app: {{ template "podinfo.fullname" . }}
{{- end }}

View File

@@ -0,0 +1,22 @@
{{- $url := printf "%s%s.%s:%v" (include "podinfo.fullname" .) (include "podinfo.suffix" .) .Release.Namespace .Values.service.port -}}
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ template "podinfo.fullname" . }}-tests
labels:
heritage: {{ .Release.Service }}
release: {{ .Release.Name }}
chart: {{ .Chart.Name }}-{{ .Chart.Version }}
app: {{ template "podinfo.name" . }}
data:
run.sh: |-
@test "HTTP POST /echo" {
run curl --retry 3 --connect-timeout 2 -sSX POST -d 'test' {{ $url }}/echo
[ "$output" = "test" ]
}
@test "HTTP POST /store" {
curl --retry 3 --connect-timeout 2 -sSX POST -d 'test' {{ $url }}/store
}
@test "HTTP GET /" {
curl --retry 3 --connect-timeout 2 -sS {{ $url }} | grep hostname
}

View File

@@ -0,0 +1,43 @@
apiVersion: v1
kind: Pod
metadata:
name: {{ template "podinfo.fullname" . }}-tests-{{ randAlphaNum 5 | lower }}
annotations:
"helm.sh/hook": test-success
sidecar.istio.io/inject: "false"
labels:
heritage: {{ .Release.Service }}
release: {{ .Release.Name }}
chart: {{ .Chart.Name }}-{{ .Chart.Version }}
app: {{ template "podinfo.name" . }}
spec:
initContainers:
- name: "test-framework"
image: "dduportal/bats:0.4.0"
command:
- "bash"
- "-c"
- |
set -ex
# copy bats to tools dir
cp -R /usr/local/libexec/ /tools/bats/
volumeMounts:
- mountPath: /tools
name: tools
containers:
- name: {{ .Release.Name }}-ui-test
image: dduportal/bats:0.4.0
command: ["/tools/bats/bats", "-t", "/tests/run.sh"]
volumeMounts:
- mountPath: /tests
name: tests
readOnly: true
- mountPath: /tools
name: tools
volumes:
- name: tests
configMap:
name: {{ template "podinfo.fullname" . }}-tests
- name: tools
emptyDir: {}
restartPolicy: Never

View File

@@ -0,0 +1,73 @@
# Default values for podinfo.
image:
repository: quay.io/stefanprodan/podinfo
tag: 1.4.0
pullPolicy: IfNotPresent
service:
type: ClusterIP
port: 9898
hpa:
enabled: true
minReplicas: 2
maxReplicas: 2
cpu: 80
memory: 512Mi
canary:
enabled: true
istioIngress:
enabled: false
# Istio ingress gateway name
gateway: public-gateway.istio-system.svc.cluster.local
# external host name eg. podinfo.example.com
host:
analysis:
# schedule interval (default 60s)
interval: 15s
# max number of failed metric checks before rollback
threshold: 10
# max traffic percentage routed to canary
# percentage (0-100)
maxWeight: 50
# canary increment step
# percentage (0-100)
stepWeight: 5
thresholds:
# minimum req success rate (non 5xx responses)
# percentage (0-100)
successRate: 99
# maximum req duration P99
# milliseconds
latency: 500
loadtest:
enabled: false
# load tester address
url: http://flagger-loadtester.test/
resources:
limits:
requests:
cpu: 100m
memory: 32Mi
nodeSelector: {}
tolerations: []
affinity: {}
nameOverride: ""
fullnameOverride: ""
logLevel: info
backend: #http://backend-podinfo:9898/echo
message: #UI greetings
faults:
delay: false
error: false
httpServer:
timeout: 30s

View File

@@ -3,18 +3,21 @@ package main
import (
"flag"
"log"
"strings"
"time"
_ "github.com/istio/glog"
istioclientset "github.com/knative/pkg/client/clientset/versioned"
"github.com/knative/pkg/signals"
clientset "github.com/stefanprodan/flagger/pkg/client/clientset/versioned"
informers "github.com/stefanprodan/flagger/pkg/client/informers/externalversions"
"github.com/stefanprodan/flagger/pkg/controller"
"github.com/stefanprodan/flagger/pkg/logging"
"github.com/stefanprodan/flagger/pkg/notifier"
"github.com/stefanprodan/flagger/pkg/server"
"github.com/stefanprodan/flagger/pkg/version"
_ "github.com/stefanprodan/klog"
clientset "github.com/weaveworks/flagger/pkg/client/clientset/versioned"
informers "github.com/weaveworks/flagger/pkg/client/informers/externalversions"
"github.com/weaveworks/flagger/pkg/controller"
"github.com/weaveworks/flagger/pkg/logger"
"github.com/weaveworks/flagger/pkg/metrics"
"github.com/weaveworks/flagger/pkg/notifier"
"github.com/weaveworks/flagger/pkg/router"
"github.com/weaveworks/flagger/pkg/server"
"github.com/weaveworks/flagger/pkg/signals"
"github.com/weaveworks/flagger/pkg/version"
"go.uber.org/zap"
"k8s.io/client-go/kubernetes"
_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
"k8s.io/client-go/tools/cache"
@@ -31,6 +34,12 @@ var (
slackURL string
slackUser string
slackChannel string
threadiness int
zapReplaceGlobals bool
zapEncoding string
namespace string
meshProvider string
selectorLabels string
)
func init() {
@@ -43,15 +52,25 @@ func init() {
flag.StringVar(&slackURL, "slack-url", "", "Slack hook URL.")
flag.StringVar(&slackUser, "slack-user", "flagger", "Slack user name.")
flag.StringVar(&slackChannel, "slack-channel", "", "Slack channel.")
flag.IntVar(&threadiness, "threadiness", 2, "Worker concurrency.")
flag.BoolVar(&zapReplaceGlobals, "zap-replace-globals", false, "Whether to change the logging level of the global zap logger.")
flag.StringVar(&zapEncoding, "zap-encoding", "json", "Zap logger encoding.")
flag.StringVar(&namespace, "namespace", "", "Namespace that flagger would watch canary object")
flag.StringVar(&meshProvider, "mesh-provider", "istio", "Service mesh provider, can be istio or appmesh")
flag.StringVar(&selectorLabels, "selector-labels", "app,name,app.kubernetes.io/name", "List of pod labels that Flagger uses to create pod selectors")
}
func main() {
flag.Parse()
logger, err := logging.NewLogger(logLevel)
logger, err := logger.NewLoggerWithEncoding(logLevel, zapEncoding)
if err != nil {
log.Fatalf("Error creating logger: %v", err)
}
if zapReplaceGlobals {
zap.ReplaceGlobals(logger.Desugar())
}
defer logger.Sync()
stopCh := signals.SetupSignalHandler()
@@ -66,7 +85,7 @@ func main() {
logger.Fatalf("Error building kubernetes clientset: %v", err)
}
istioClient, err := istioclientset.NewForConfig(cfg)
meshClient, err := clientset.NewForConfig(cfg)
if err != nil {
logger.Fatalf("Error building istio clientset: %v", err)
}
@@ -76,7 +95,8 @@ func main() {
logger.Fatalf("Error building example clientset: %s", err.Error())
}
flaggerInformerFactory := informers.NewSharedInformerFactory(flaggerClient, time.Second*30)
flaggerInformerFactory := informers.NewSharedInformerFactoryWithOptions(flaggerClient, time.Second*30, informers.WithNamespace(namespace))
canaryInformer := flaggerInformerFactory.Flagger().V1alpha3().Canaries()
logger.Infof("Starting flagger version %s revision %s", version.VERSION, version.REVISION)
@@ -86,9 +106,17 @@ func main() {
logger.Fatalf("Error calling Kubernetes API: %v", err)
}
logger.Infof("Connected to Kubernetes API %s", ver)
labels := strings.Split(selectorLabels, ",")
if len(labels) < 1 {
logger.Fatalf("At least one selector label is required")
}
ok, err := controller.CheckMetricsServer(metricsServer)
logger.Infof("Connected to Kubernetes API %s", ver)
if namespace != "" {
logger.Infof("Watching namespace %s", namespace)
}
ok, err := metrics.CheckMetricsServer(metricsServer)
if ok {
logger.Infof("Connected to metrics server %s", metricsServer)
} else {
@@ -108,15 +136,21 @@ func main() {
// start HTTP server
go server.ListenAndServe(port, 3*time.Second, logger, stopCh)
routerFactory := router.NewFactory(cfg, kubeClient, flaggerClient, logger, meshClient)
c := controller.NewController(
kubeClient,
istioClient,
meshClient,
flaggerClient,
canaryInformer,
controlLoopInterval,
metricsServer,
logger,
slack,
routerFactory,
meshProvider,
version.VERSION,
labels,
)
flaggerInformerFactory.Start(stopCh)
@@ -132,7 +166,7 @@ func main() {
// start controller
go func(ctrl *controller.Controller) {
if err := ctrl.Run(2, stopCh); err != nil {
if err := ctrl.Run(threadiness, stopCh); err != nil {
logger.Fatalf("Error running controller: %v", err)
}
}(c)

View File

@@ -2,40 +2,47 @@ package main
import (
"flag"
"github.com/knative/pkg/signals"
"github.com/stefanprodan/flagger/pkg/loadtester"
"github.com/stefanprodan/flagger/pkg/logging"
"github.com/weaveworks/flagger/pkg/loadtester"
"github.com/weaveworks/flagger/pkg/logger"
"github.com/weaveworks/flagger/pkg/signals"
"go.uber.org/zap"
"log"
"time"
)
var VERSION = "0.1.0"
var VERSION = "0.3.0"
var (
logLevel string
port string
timeout time.Duration
logCmdOutput bool
logLevel string
port string
timeout time.Duration
zapReplaceGlobals bool
zapEncoding string
)
func init() {
flag.StringVar(&logLevel, "log-level", "debug", "Log level can be: debug, info, warning, error.")
flag.StringVar(&port, "port", "9090", "Port to listen on.")
flag.DurationVar(&timeout, "timeout", time.Hour, "Command exec timeout.")
flag.BoolVar(&logCmdOutput, "log-cmd-output", true, "Log command output to stderr")
flag.DurationVar(&timeout, "timeout", time.Hour, "Load test exec timeout.")
flag.BoolVar(&zapReplaceGlobals, "zap-replace-globals", false, "Whether to change the logging level of the global zap logger.")
flag.StringVar(&zapEncoding, "zap-encoding", "json", "Zap logger encoding.")
}
func main() {
flag.Parse()
logger, err := logging.NewLogger(logLevel)
logger, err := logger.NewLoggerWithEncoding(logLevel, zapEncoding)
if err != nil {
log.Fatalf("Error creating logger: %v", err)
}
if zapReplaceGlobals {
zap.ReplaceGlobals(logger.Desugar())
}
defer logger.Sync()
stopCh := signals.SetupSignalHandler()
taskRunner := loadtester.NewTaskRunner(logger, timeout, logCmdOutput)
taskRunner := loadtester.NewTaskRunner(logger, timeout)
go taskRunner.Start(100*time.Millisecond, stopCh)

Binary files not shown (seven docs images added or updated; previews omitted).

View File

@@ -1,21 +1,21 @@
---
description: Flagger is an Istio progressive delivery Kubernetes operator
description: Flagger is a progressive delivery Kubernetes operator
---
# Introduction
[Flagger](https://github.com/stefanprodan/flagger) is a **Kubernetes** operator that automates the promotion of canary
deployments using **Istio** routing for traffic shifting and **Prometheus** metrics for canary analysis.
The canary analysis can be extended with webhooks for running integration tests,
load tests or any other custom validation.
[Flagger](https://github.com/weaveworks/flagger) is a **Kubernetes** operator that automates the promotion of canary
deployments using **Istio** or **App Mesh** routing for traffic shifting and **Prometheus** metrics for canary analysis.
The canary analysis can be extended with webhooks for running
system integration/acceptance tests, load tests, or any other custom validation.
Flagger implements a control loop that gradually shifts traffic to the canary while measuring key performance
indicators like HTTP requests success rate, requests average duration and pods health.
Based on the **KPIs** analysis a canary is promoted or aborted and the analysis result is published to **Slack**.
Based on analysis of the **KPIs** a canary is promoted or aborted, and the analysis result is published to **Slack**.
![Flagger overview diagram](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/diagrams/flagger-canary-overview.png)
![Flagger overview diagram](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/diagrams/flagger-canary-overview.png)
Flagger can be configured with Kubernetes custom resources \(canaries.flagger.app kind\) and is compatible with
Flagger can be configured with Kubernetes custom resources and is compatible with
any CI/CD solutions made for Kubernetes. Since Flagger is declarative and reacts to Kubernetes events,
it can be used in **GitOps** pipelines together with Weave Flux or JenkinsX.

View File

@@ -5,13 +5,20 @@
## Install
* [Install Flagger](install/install-flagger.md)
* [Install Grafana](install/install-grafana.md)
* [Install Istio](install/install-istio.md)
* [Flagger Install on Kubernetes](install/flagger-install-on-kubernetes.md)
* [Flagger Install on GKE Istio](install/flagger-install-on-google-cloud.md)
* [Flagger Install on EKS App Mesh](install/flagger-install-on-eks-appmesh.md)
* [Flagger Install with SuperGloo](install/flagger-install-with-supergloo.md)
## Usage
* [Canary Deployments](usage/progressive-delivery.md)
* [Istio Canary Deployments](usage/progressive-delivery.md)
* [Istio A/B Testing](usage/ab-testing.md)
* [App Mesh Canary Deployments](usage/appmesh-progressive-delivery.md)
* [Monitoring](usage/monitoring.md)
* [Alerting](usage/alerting.md)
## Tutorials
* [Canaries with Helm charts and GitOps](tutorials/canary-helm-gitops.md)
* [Zero downtime deployments](tutorials/zero-downtime-deployments.md)

21
docs/gitbook/faq.md Normal file
View File

@@ -0,0 +1,21 @@
# Frequently asked questions
**Can Flagger be part of my integration tests?**
> Yes, Flagger supports webhooks to do integration testing.
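For illustration, a minimal sketch of an integration test wired up as a pre-rollout webhook (the URL and metadata values are placeholders, not a prescribed setup):
```yaml
canaryAnalysis:
  webhooks:
    - name: integration-tests
      type: pre-rollout
      url: http://podinfo.test:9898/echo
      timeout: 30s
      metadata:
        test: "all"
```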
**What if I only want to target beta testers?**
> That's a feature in Flagger, not in App Mesh. It's on the App Mesh roadmap.
**When do I use A/B testing instead of a canary deployment?**
> One advantage of using A/B testing is that each version remains separated and routes aren't mixed.
>
> Using a Canary deployment can lead to behaviour like this one observed by a
> user:
>
> [..] during a canary deployment of our nodejs app, the version that is being served <50% traffic reports mime type mismatch errors in the browser (js as "text/html")
> When the deployment Passes/ Fails (doesn't really matter) the version that stays alive works as expected. If anyone has any tips or direction I would greatly appreciate it. Even if its as simple as I'm looking in the wrong place. Thanks in advance!
>
> The issue was that we were not maintaining session affinity while serving files for our frontend. Which resulted in any redirects or refreshes occasionally returning a mismatched app.*.js file (generated from vue)
>
> Read up on [A/B testing](https://docs.flagger.app/usage/ab-testing).

View File

@@ -1,10 +1,10 @@
# How it works
[Flagger](https://github.com/stefanprodan/flagger) takes a Kubernetes deployment and optionally
[Flagger](https://github.com/weaveworks/flagger) takes a Kubernetes deployment and optionally
a horizontal pod autoscaler \(HPA\) and creates a series of objects
\(Kubernetes deployments, ClusterIP services and Istio virtual services\) to drive the canary analysis and promotion.
\(Kubernetes deployments, ClusterIP services and Istio or App Mesh virtual services\) to drive the canary analysis and promotion.
![Flagger Canary Process](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/diagrams/flagger-canary-hpa.png)
![Flagger Canary Process](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/diagrams/flagger-canary-hpa.png)
### Canary Custom Resource
@@ -33,12 +33,18 @@ spec:
service:
# container port
port: 9898
# service port name (optional, will default to "http")
portName: http-podinfo
# Istio gateways (optional)
gateways:
- public-gateway.istio-system.svc.cluster.local
- mesh
# Istio virtual service host names (optional)
hosts:
- podinfo.example.com
# promote the canary without analysing it (default false)
skipAnalysis: false
# define the canary analysis timing and KPIs
canaryAnalysis:
# schedule interval (default 60s)
interval: 1m
@@ -50,14 +56,14 @@ spec:
# canary increment step
# percentage (0-100)
stepWeight: 5
# Istio Prometheus checks
# Prometheus checks
metrics:
- name: istio_requests_total
- name: request-success-rate
# minimum req success rate (non 5xx responses)
# percentage (0-100)
threshold: 99
interval: 1m
- name: istio_request_duration_seconds_bucket
- name: request-duration
# maximum req duration P99
# milliseconds
threshold: 500
@@ -90,12 +96,167 @@ spec:
app: podinfo
```
Besides `app` Flagger supports `name` and `app.kubernetes.io/name` selectors. If you use a different
convention you can specify your label with the `-selector-labels` flag.
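As a hedged sketch based on the chart's deployment command, the custom label list would be passed to the Flagger container like this (the `team` label is an illustrative example of your own convention):
```yaml
containers:
  - name: flagger
    command:
      - ./flagger
      - -log-level=info
      - -metrics-server=http://prometheus:9090
      # add your own pod label convention to the selector list
      - -selector-labels=app,name,app.kubernetes.io/name,team
```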
The target deployment should expose a TCP port that will be used by Flagger to create the ClusterIP Service and
the Istio Virtual Service. The container port from the target deployment should match the `service.port` value.
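A minimal sketch of how the two line up (values are illustrative); the `containerPort` in the target deployment and the canary `service.port` must be the same:
```yaml
# target deployment (fragment)
containers:
  - name: podinfod
    ports:
      - name: http
        containerPort: 9898
---
# canary (fragment)
spec:
  service:
    port: 9898  # matches the container port above
```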
### Canary Deployment
### Istio routing
![Flagger Canary Stages](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/diagrams/flagger-canary-steps.png)
Flagger creates an Istio Virtual Service based on the Canary service spec. The service configuration lets you expose
an app inside or outside the mesh.
You can also define HTTP match conditions, URI rewrite rules, CORS policies, timeout and retries.
The following spec exposes the `frontend` workload inside the mesh on `frontend.test.svc.cluster.local:9898`
and outside the mesh on `frontend.example.com`. You'll have to specify an Istio ingress gateway for external hosts.
```yaml
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
name: frontend
namespace: test
spec:
service:
# container port
port: 9898
# service port name (optional, will default to "http")
portName: http-frontend
# Istio gateways (optional)
gateways:
- public-gateway.istio-system.svc.cluster.local
- mesh
# Istio virtual service host names (optional)
hosts:
- frontend.example.com
# HTTP match conditions (optional)
match:
- uri:
prefix: /
# HTTP rewrite (optional)
rewrite:
uri: /
# Envoy timeout and retry policy (optional)
headers:
request:
add:
x-envoy-upstream-rq-timeout-ms: "15000"
x-envoy-max-retries: "10"
x-envoy-retry-on: "gateway-error,connect-failure,refused-stream"
# cross-origin resource sharing policy (optional)
corsPolicy:
allowOrigin:
- example.com
allowMethods:
- GET
allowCredentials: false
allowHeaders:
- x-some-header
maxAge: 24h
```
For the above spec Flagger will generate the following virtual service:
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: frontend
namespace: test
ownerReferences:
- apiVersion: flagger.app/v1alpha3
blockOwnerDeletion: true
controller: true
kind: Canary
name: podinfo
uid: 3a4a40dd-3875-11e9-8e1d-42010a9c0fd1
spec:
gateways:
- public-gateway.istio-system.svc.cluster.local
- mesh
hosts:
- frontend.example.com
- frontend
http:
- appendHeaders:
x-envoy-max-retries: "10"
x-envoy-retry-on: gateway-error,connect-failure,refused-stream
x-envoy-upstream-rq-timeout-ms: "15000"
corsPolicy:
allowHeaders:
- x-some-header
allowMethods:
- GET
allowOrigin:
- example.com
maxAge: 24h
match:
- uri:
prefix: /
rewrite:
uri: /
route:
- destination:
host: podinfo-primary
port:
number: 9898
weight: 100
- destination:
host: podinfo-canary
port:
number: 9898
weight: 0
```
Flagger keeps the virtual service in sync with the canary service spec. Any direct modification to the virtual
service spec will be overwritten.
To expose a workload inside the mesh on `http://backend.test.svc.cluster.local:9898`,
the service spec can contain only the container port:
```yaml
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
name: backend
namespace: test
spec:
service:
port: 9898
```
Based on the above spec, Flagger will create several ClusterIP services like:
```yaml
apiVersion: v1
kind: Service
metadata:
name: backend-primary
ownerReferences:
- apiVersion: flagger.app/v1alpha3
blockOwnerDeletion: true
controller: true
kind: Canary
name: backend
uid: 2ca1a9c7-2ef6-11e9-bd01-42010a9c0145
spec:
type: ClusterIP
ports:
- name: http
port: 9898
protocol: TCP
targetPort: 9898
selector:
app: backend-primary
```
Flagger works for user facing apps exposed outside the cluster via an ingress gateway
and for backend HTTP APIs that are accessible only from inside the mesh.
### Canary Stages
![Flagger Canary Stages](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/diagrams/flagger-canary-steps.png)
A canary deployment is triggered by changes in any of the following objects:
@@ -110,16 +271,22 @@ Gated canary promotion stages:
* check primary and canary deployments status
* halt advancement if a rolling update is underway
* halt advancement if pods are unhealthy
* call pre-rollout webhooks and check results
* halt advancement if any hook returned a non-2xx HTTP result
* increment the failed checks counter
* increase canary traffic weight percentage from 0% to 5% (step weight)
* call webhooks and check results
* call rollout webhooks and check results
* check canary HTTP request success rate and latency
* halt advancement if any metric is under the specified threshold
* increment the failed checks counter
* check if the number of failed checks reached the threshold
* route all traffic to primary
* scale to zero the canary deployment and mark it as failed
* call post-rollout webhooks
* post the analysis result to Slack
* wait for the canary deployment to be updated and start over
* increase canary traffic weight by 5% (step weight) till it reaches 50% (max weight)
* halt advancement if any webhook call fails
* halt advancement while canary request success rate is under the threshold
* halt advancement while canary request duration P99 is over the threshold
* halt advancement if the primary or canary deployment becomes unhealthy
@@ -132,6 +299,8 @@ Gated canary promotion stages:
* route all traffic to primary
* scale to zero the canary deployment
* mark rollout as finished
* call post-rollout webhooks
* post the analysis result to Slack
* wait for the canary deployment to be updated and start over
### Canary Analysis
@@ -152,6 +321,9 @@ Spec:
# canary increment step
# percentage (0-100)
stepWeight: 2
# deploy straight to production without
# the metrics and webhook checks
skipAnalysis: false
```
The above analysis, if it succeeds, will run for 25 minutes while validating the HTTP metrics and webhooks every minute.
@@ -167,6 +339,54 @@ And the time it takes for a canary to be rollback when the metrics or webhook ch
interval * threshold
```
In emergency cases, you may want to skip the analysis phase and ship changes directly to production.
At any time you can set `spec.skipAnalysis: true`.
When skip analysis is enabled, Flagger checks if the canary deployment is healthy and
promotes it without analysing it. If an analysis is underway, Flagger cancels it and runs the promotion.
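A minimal sketch of the field in context (only the relevant part of the spec is shown):
```yaml
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # promote the canary without running the metric and webhook checks
  skipAnalysis: true
```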
### A/B Testing
Besides weighted routing, Flagger can be configured to route traffic to the canary based on HTTP match conditions.
In an A/B testing scenario, you'll be using HTTP headers or cookies to target a certain segment of your users.
This is particularly useful for frontend applications that require session affinity.
You can enable A/B testing by specifying the HTTP match conditions and the number of iterations:
```yaml
canaryAnalysis:
# schedule interval (default 60s)
interval: 1m
# total number of iterations
iterations: 10
# max number of failed iterations before rollback
threshold: 2
# canary match condition
match:
- headers:
user-agent:
regex: "^(?!.*Chrome).*Safari.*"
- headers:
cookie:
regex: "^(.*?;)?(user=test)(;.*)?$"
```
If Flagger finds an HTTP match condition, it will ignore the `maxWeight` and `stepWeight` settings.
The above configuration will run an analysis for ten minutes targeting the Safari users and those that have a test cookie.
You can determine the minimum time that it takes to validate and promote a canary deployment using this formula:
```
interval * iterations
```
And the time it takes for a canary to be rolled back when the metrics or webhook checks are failing:
```
interval * threshold
```
Make sure that the analysis threshold is lower than the number of iterations.
### HTTP Metrics
The canary analysis uses the following Prometheus queries:
@@ -178,14 +398,14 @@ Spec:
```yaml
canaryAnalysis:
metrics:
- name: istio_requests_total
- name: request-success-rate
# minimum req success rate (non 5xx responses)
# percentage (0-100)
threshold: 99
interval: 1m
```
Query:
Istio query:
```javascript
sum(
@@ -210,6 +430,29 @@ sum(
)
```
App Mesh query:
```javascript
sum(
rate(
envoy_cluster_upstream_rq{
kubernetes_namespace="$namespace",
kubernetes_pod_name=~"$workload",
response_code!~"5.*"
}[$interval]
)
)
/
sum(
rate(
envoy_cluster_upstream_rq{
kubernetes_namespace="$namespace",
kubernetes_pod_name=~"$workload"
}[$interval]
)
)
```
**HTTP requests milliseconds duration P99**
Spec:
@@ -217,14 +460,14 @@ Spec:
```yaml
canaryAnalysis:
metrics:
- name: istio_request_duration_seconds_bucket
- name: request-duration
# maximum req duration P99
# milliseconds
threshold: 500
interval: 1m
```
Query:
Istio query:
```javascript
histogram_quantile(0.99,
@@ -240,40 +483,145 @@ histogram_quantile(0.99,
)
```
App Mesh query:
```javascript
histogram_quantile(0.99,
sum(
irate(
envoy_cluster_upstream_rq_time_bucket{
kubernetes_pod_name=~"$workload",
kubernetes_namespace=~"$namespace"
}[$interval]
)
) by (le)
)
```
> **Note** that the metric interval should be lower or equal to the control loop interval.
### Custom Metrics
The canary analysis can be extended with custom Prometheus queries.
```yaml
canaryAnalysis:
threshold: 1
maxWeight: 50
stepWeight: 5
metrics:
- name: "404s percentage"
threshold: 5
query: |
100 - sum(
rate(
istio_requests_total{
reporter="destination",
destination_workload_namespace="test",
destination_workload="podinfo",
response_code!="404"
}[1m]
)
)
/
sum(
rate(
istio_requests_total{
reporter="destination",
destination_workload_namespace="test",
destination_workload="podinfo"
}[1m]
)
) * 100
```
The above configuration validates the canary by checking
if the HTTP 404 req/sec percentage is below 5 percent of the total traffic.
If the 404s rate reaches the 5% threshold, then the canary fails.
```yaml
canaryAnalysis:
threshold: 1
maxWeight: 50
stepWeight: 5
metrics:
- name: "rpc error rate"
threshold: 5
query: |
100 - sum(
rate(
grpc_server_handled_total{
grpc_service="my.TestService",
grpc_code!="OK"
}[1m]
)
)
/
sum(
rate(
grpc_server_started_total{
grpc_service="my.TestService"
}[1m]
)
) * 100
```
The above configuration validates the canary by checking if the percentage of
non-OK GRPC req/sec is below 5 percent of the total requests. If the non-OK
rate reaches the 5% threshold, then the canary fails.
When specifying a query, Flagger will run the PromQL query and convert the result to float64.
Then it compares the query result value with the metric threshold value.
### Webhooks
The canary analysis can be extended with webhooks.
Flagger will call each webhook URL and determine from the response status code (HTTP 2xx) if the canary is failing or not.
The canary analysis can be extended with webhooks. Flagger will call each webhook URL and
determine from the response status code (HTTP 2xx) if the canary is failing or not.
There are three types of hooks:
* Pre-rollout hooks are executed before routing traffic to canary.
The canary advancement is paused if a pre-rollout hook fails and, if the number of failures reaches the
threshold, the canary will be rolled back.
* Rollout hooks are executed during the analysis on each iteration before the metric checks.
If a rollout hook call fails the canary advancement is paused and eventually rolled back.
* Post-rollout hooks are executed after the canary has been promoted or rolled back.
If a post-rollout hook fails the error is logged.
Spec:
```yaml
canaryAnalysis:
webhooks:
- name: integration-tests
url: http://podinfo.test:9898/echo
timeout: 30s
metadata:
test: "all"
token: "16688eb5e9f289f1991c"
- name: load-tests
url: http://podinfo.test:9898/echo
- name: "smoke test"
type: pre-rollout
url: http://migration-check.db/query
timeout: 30s
metadata:
key1: "val1"
key2: "val2"
- name: "load test"
type: rollout
url: http://flagger-loadtester.test/
timeout: 15s
metadata:
cmd: "hey -z 1m -q 5 -c 2 http://podinfo-canary.test:9898/"
- name: "notify"
type: post-rollout
url: http://telegram.bot:8080/
timeout: 5s
metadata:
some: "message"
```
> **Note** that the sum of all rollout webhooks timeouts should be lower than the analysis interval.
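As an illustrative sketch (hook names and timeouts are arbitrary), an analysis that runs every minute leaves room for rollout hooks whose combined timeouts stay under that interval:
```yaml
canaryAnalysis:
  # analysis interval
  interval: 1m
  webhooks:
    - name: "acceptance test"
      type: rollout
      url: http://flagger-loadtester.test/
      timeout: 15s
    - name: "load test"
      type: rollout
      url: http://flagger-loadtester.test/
      # 15s + 30s stays below the 1m analysis interval
      timeout: 30s
```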
Webhook payload (HTTP POST):
```json
{
"name": "podinfo",
"namespace": "test",
"namespace": "test",
"phase": "Progressing",
"metadata": {
"test": "all",
"token": "16688eb5e9f289f1991c"
@@ -298,12 +646,12 @@ Flagger metric checks will fail with "no values found for metric istio_requests_
Flagger comes with a load testing service based on [rakyll/hey](https://github.com/rakyll/hey)
that generates traffic during analysis when configured as a webhook.
![Flagger Load Testing Webhook](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/diagrams/flagger-load-testing.png)
First you need to deploy the load test runner in a namespace with Istio sidecar injection enabled:
```bash
export REPO=https://raw.githubusercontent.com/weaveworks/flagger/master
kubectl -n test apply -f ${REPO}/artifacts/loadtester/deployment.yaml
kubectl -n test apply -f ${REPO}/artifacts/loadtester/service.yaml
@@ -315,8 +663,7 @@ Or by using Helm:
helm repo add flagger https://flagger.app
helm upgrade -i flagger-loadtester flagger/loadtester \
--namespace=test \
--set cmd.timeout=1h
```
@@ -330,11 +677,13 @@ webhooks:
url: http://flagger-loadtester.test/
timeout: 5s
metadata:
type: cmd
cmd: "hey -z 1m -q 10 -c 2 http://podinfo.test:9898/"
- name: load-test-post
url: http://flagger-loadtester.test/
timeout: 5s
metadata:
type: cmd
cmd: "hey -z 1m -q 10 -c 2 -m POST -d '{test: 2}' http://podinfo.test:9898/echo"
```
@@ -351,6 +700,7 @@ webhooks:
url: http://flagger-loadtester.test/
timeout: 5s
metadata:
type: cmd
cmd: "hey -z 1m -q 10 -c 2 -h2 https://podinfo.example.com/"
```
@@ -363,3 +713,32 @@ FROM quay.io/stefanprodan/flagger-loadtester:<VER>
RUN curl -Lo /usr/local/bin/my-cli https://github.com/user/repo/releases/download/ver/my-cli \
&& chmod +x /usr/local/bin/my-cli
```
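Once the extended image is deployed, the custom binary can be invoked through the usual `cmd` metadata. A hypothetical sketch (`my-cli` and its arguments are just the placeholders from the Dockerfile above):
```yaml
webhooks:
  - name: "custom check"
    url: http://flagger-loadtester.test/
    timeout: 30s
    metadata:
      type: cmd
      cmd: "my-cli check --host podinfo-canary.test:9898"
```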
### Load Testing Delegation
The load tester can also forward testing tasks to external tools; currently [nGrinder](https://github.com/naver/ngrinder)
is supported.
To use this feature, add a load test task of type 'ngrinder' to the canary analysis spec:
```yaml
webhooks:
- name: load-test-post
url: http://flagger-loadtester.test/
timeout: 5s
metadata:
# type of this load test task, cmd or ngrinder
type: ngrinder
# base url of your nGrinder controller server
server: http://ngrinder-server:port
# id of the test to clone from, the test must have been defined.
clone: 100
# user name and base64 encoded password to authenticate against the nGrinder server
username: admin
passwd: YWRtaW4=
# the interval between nGrinder test status polls, defaults to 1s
pollInterval: 5s
```
When the canary analysis starts, the load tester will initiate a [clone_and_start request](https://github.com/naver/ngrinder/wiki/REST-API-PerfTest)
to the nGrinder server and start a new performance test. The load tester will periodically poll the nGrinder server
for the status of the test, and prevent duplicate requests from being sent in subsequent analysis loops.
View File
@@ -0,0 +1,188 @@
# Flagger install on AWS
This guide walks you through setting up Flagger and AWS App Mesh on EKS.
### App Mesh
The App Mesh integration with EKS is made out of the following components:
* Kubernetes custom resources
* `mesh.appmesh.k8s.aws` defines a logical boundary for network traffic between the services
* `virtualnode.appmesh.k8s.aws` defines a logical pointer to a Kubernetes workload
* `virtualservice.appmesh.k8s.aws` defines the routing rules for a workload inside the mesh
* CRD controller - keeps the custom resources in sync with the App Mesh control plane
* Admission controller - injects the Envoy sidecar and assigns Kubernetes pods to App Mesh virtual nodes
* Metrics server - Prometheus instance that collects and stores Envoy's metrics
Prerequisites:
* jq
* homebrew
* openssl
* kubectl
* AWS CLI (default region us-west-2)
### Create a Kubernetes cluster
In order to create an EKS cluster you can use [eksctl](https://eksctl.io).
Eksctl is an open source command-line utility made by Weaveworks in collaboration with Amazon;
it's a Kubernetes-native tool written in Go.
On MacOS you can install eksctl with Homebrew:
```bash
brew tap weaveworks/tap
brew install weaveworks/tap/eksctl
```
Create an EKS cluster:
```bash
eksctl create cluster --name=appmesh \
--region=us-west-2 \
--appmesh-access
```
The above command will create a two-node cluster with the App Mesh
[IAM policy](https://docs.aws.amazon.com/app-mesh/latest/userguide/MESH_IAM_user_policies.html)
attached to the EKS node instance role.
Verify the install with:
```bash
kubectl get nodes
```
### Install Helm
Install the [Helm](https://docs.helm.sh/using_helm/#installing-helm) command-line tool:
```text
brew install kubernetes-helm
```
Create a service account and a cluster role binding for Tiller:
```bash
kubectl -n kube-system create sa tiller
kubectl create clusterrolebinding tiller-cluster-rule \
--clusterrole=cluster-admin \
--serviceaccount=kube-system:tiller
```
Deploy Tiller in the `kube-system` namespace:
```bash
helm init --service-account tiller
```
You should consider using SSL between Helm and Tiller, for more information on securing your Helm
installation see [docs.helm.sh](https://docs.helm.sh/using_helm/#securing-your-helm-installation).
### Enable horizontal pod auto-scaling
Install the Horizontal Pod Autoscaler (HPA) metrics provider:
```bash
helm upgrade -i metrics-server stable/metrics-server \
--namespace kube-system
```
After a minute, the metrics API should report CPU and memory usage for pods.
You can verify the metrics API with:
```bash
kubectl -n kube-system top pods
```
### Install the App Mesh components
Run the App Mesh installer:
```bash
curl -fsSL https://git.io/get-app-mesh-eks.sh | bash -
```
The installer does the following:
* creates the `appmesh-system` namespace
* generates a certificate signed by Kubernetes CA
* registers the App Mesh mutating webhook
* deploys the App Mesh webhook in `appmesh-system` namespace
* deploys the App Mesh CRDs
* deploys the App Mesh controller in `appmesh-system` namespace
* creates a mesh called `global`
Verify that the global mesh is active:
```bash
kubectl describe mesh
Status:
Mesh Condition:
Status: True
Type: MeshActive
```
### Install Prometheus
In order to collect the App Mesh metrics that Flagger needs to run the canary analysis,
you'll need to set up a Prometheus instance to scrape the Envoy sidecars.
Deploy Prometheus in the `appmesh-system` namespace:
```bash
REPO=https://raw.githubusercontent.com/weaveworks/flagger/master
kubectl apply -f ${REPO}/artifacts/eks/appmesh-prometheus.yaml
```
### Install Flagger and Grafana
Add Flagger Helm repository:
```bash
helm repo add flagger https://flagger.app
```
Deploy Flagger in the _**appmesh-system**_ namespace:
```bash
helm upgrade -i flagger flagger/flagger \
--namespace=appmesh-system \
--set meshProvider=appmesh \
--set metricsServer=http://prometheus.appmesh-system:9090
```
You can install Flagger in any namespace as long as it can talk to the Prometheus service on port 9090.
You can enable **Slack** notifications with:
```bash
helm upgrade -i flagger flagger/flagger \
--namespace=appmesh-system \
--set meshProvider=appmesh \
--set metricsServer=http://prometheus.appmesh-system:9090 \
--set slack.url=https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK \
--set slack.channel=general \
--set slack.user=flagger
```
Flagger comes with a Grafana dashboard made for monitoring the canary analysis.
Deploy Grafana in the _**appmesh-system**_ namespace:
```bash
helm upgrade -i flagger-grafana flagger/grafana \
--namespace=appmesh-system \
--set url=http://prometheus.appmesh-system:9090
```
You can access Grafana using port forwarding:
```bash
kubectl -n appmesh-system port-forward svc/flagger-grafana 3000:80
```
Now that you have Flagger running you can try the
[App Mesh canary deployments tutorial](https://docs.flagger.app/usage/appmesh-progressive-delivery).
View File
@@ -1,16 +1,15 @@
# Flagger install on Google Cloud
This guide walks you through setting up Flagger and Istio on Google Kubernetes Engine.
![GKE Cluster Overview](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/diagrams/flagger-gke-istio.png)
### Prerequisites
You will be creating a cluster on Google's Kubernetes Engine \(GKE\),
if you don't have an account you can sign up [here](https://cloud.google.com/free/) for free credits.
Log in to Google Cloud, create a project and enable billing for it.
Install the [gcloud](https://cloud.google.com/sdk/) command line utility and configure your project with `gcloud init`.
@@ -23,8 +22,8 @@ gcloud config set project PROJECT_ID
Set the default compute region and zone:
```text
gcloud config set compute/region us-central1
gcloud config set compute/zone us-central1-a
```
Enable the Kubernetes and Cloud DNS services for your project:
@@ -34,46 +33,42 @@ gcloud services enable container.googleapis.com
gcloud services enable dns.googleapis.com
```
Install the `kubectl` command-line tool:
```text
gcloud components install kubectl
```
Install the `helm` command-line tool:
```text
brew install kubernetes-helm
```
### GKE cluster setup
Create a cluster with the Istio add-on:
```bash
K8S_VERSION=$(gcloud container get-server-config --format=json \
| jq -r '.validMasterVersions[0]')
gcloud beta container clusters create istio \
--cluster-version=${K8S_VERSION} \
--zone=us-central1-a \
--num-nodes=2 \
--machine-type=n1-highcpu-4 \
--preemptible \
--no-enable-cloud-logging \
--no-enable-cloud-monitoring \
--disk-size=30 \
--enable-autorepair \
--scopes=gke-default,compute-rw,storage-rw \
--addons=HorizontalPodAutoscaling,Istio \
--istio-config=auth=MTLS_PERMISSIVE
```
The above command will create a default node pool consisting of two `n1-highcpu-4` \(vCPU: 4, RAM 3.60GB, DISK: 30GB\)
preemptible VMs. Preemptible VMs are up to 80% cheaper than regular instances and are terminated and replaced
after a maximum of 24 hours.
Set up credentials for `kubectl`:
```bash
gcloud container clusters get-credentials istio
```
Create a cluster admin role binding:
@@ -87,9 +82,11 @@ kubectl create clusterrolebinding "cluster-admin-$(whoami)" \
Validate your setup with:
```bash
kubectl get nodes -o wide
kubectl -n istio-system get svc
```
In a couple of seconds GCP should allocate an external IP to the `istio-ingressgateway` service.
### Cloud DNS setup
You will need an internet domain and access to the registrar to change the name servers to Google Cloud DNS.
@@ -116,34 +113,30 @@ Wait for the name servers to change \(replace `example.com` with your domain\):
watch dig +short NS example.com
```
Create a static IP address named `istio-gateway` using the Istio ingress IP:
```bash
export GATEWAY_IP=$(kubectl -n istio-system get svc/istio-ingressgateway -ojson \
| jq -r .status.loadBalancer.ingress[0].ip)
gcloud compute addresses create istio-gateway --addresses ${GATEWAY_IP} --region us-central1
```
Create the following DNS records \(replace `example.com` with your domain\):
```bash
DOMAIN="example.com"
gcloud dns record-sets transaction start --zone=istio
gcloud dns record-sets transaction add --zone=istio \
--name="${DOMAIN}" --ttl=300 --type=A ${GATEWAYIP}
--name="${DOMAIN}" --ttl=300 --type=A ${GATEWAY_IP}
gcloud dns record-sets transaction add --zone=istio \
--name="www.${DOMAIN}" --ttl=300 --type=A ${GATEWAYIP}
--name="www.${DOMAIN}" --ttl=300 --type=A ${GATEWAY_IP}
gcloud dns record-sets transaction add --zone=istio \
--name="*.${DOMAIN}" --ttl=300 --type=A ${GATEWAYIP}
--name="*.${DOMAIN}" --ttl=300 --type=A ${GATEWAY_IP}
gcloud dns record-sets transaction execute --zone istio
```
@@ -154,31 +147,22 @@ Verify that the wildcard DNS is working \(replace `example.com` with your domain
watch host test.example.com
```
### Install Helm
Install the [Helm](https://docs.helm.sh/using_helm/#installing-helm) command-line tool:
```text
brew install kubernetes-helm
```
Create a service account and a cluster role binding for Tiller:
```bash
kubectl -n kube-system create sa tiller
kubectl create clusterrolebinding tiller-cluster-rule \
--clusterrole=cluster-admin \
--serviceaccount=kube-system:tiller
```
Deploy Tiller in the `kube-system` namespace:
@@ -187,125 +171,53 @@ Deploy Tiller in the `kube-system` namespace:
helm init --service-account tiller
```
You should consider using SSL between Helm and Tiller, for more information on securing your Helm
installation see [docs.helm.sh](https://docs.helm.sh/using_helm/#securing-your-helm-installation).
### Install cert-manager
Jetstack's [cert-manager](https://github.com/jetstack/cert-manager)
is a Kubernetes operator that automatically creates and manages TLS certs issued by Let's Encrypt.
You'll be using cert-manager to provision a wildcard certificate for the Istio ingress gateway.
Install cert-manager's CRDs:
```bash
CERT_REPO=https://raw.githubusercontent.com/jetstack/cert-manager
kubectl apply -f ${CERT_REPO}/release-0.7/deploy/manifests/00-crds.yaml
```
Create the cert-manager namespace and disable resource validation:
```bash
kubectl create namespace cert-manager
kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true
```
Install cert-manager with Helm:
```bash
helm repo add jetstack https://charts.jetstack.io && \
helm repo update && \
helm upgrade -i cert-manager \
--namespace cert-manager \
--version v0.7.0 \
jetstack/cert-manager
```
### Istio Gateway TLS setup
![Istio Let&apos;s Encrypt diagram](https://raw.githubusercontent.com/stefanprodan/istio-gke/master/docs/screens/istio-cert-manager-gcp.png)
![Istio Let&apos;s Encrypt](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/diagrams/istio-cert-manager-gke.png)
Create a generic Istio Gateway to expose services outside the mesh on HTTPS:
```bash
REPO=https://raw.githubusercontent.com/weaveworks/flagger/master
kubectl apply -f ${REPO}/artifacts/gke/istio-gateway.yaml
```
Create a service account with Cloud DNS admin role \(replace `my-gcp-project` with your project ID\):
@@ -374,7 +286,7 @@ metadata:
name: istio-gateway
namespace: istio-system
spec:
secretName: istio-ingressgateway-certs
issuerRef:
name: letsencrypt-prod
commonName: "*.example.com"
@@ -387,37 +299,76 @@ spec:
- "example.com"
```
Save the above resource as istio-gateway-cert.yaml and then apply it:
```text
kubectl apply -f ./istio-gateway-cert.yaml
```
In a couple of seconds cert-manager should fetch a wildcard certificate from letsencrypt.org:
```text
kubectl -n istio-system describe certificate istio-gateway
Certificate istio-system/istio-gateway scheduled for renewal in 1438 hours
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal CertIssued 1m52s cert-manager Certificate issued successfully
```
Recreate Istio ingress gateway pods:
```bash
kubectl -n istio-system delete pods -l istio=ingressgateway
kubectl -n istio-system get pods -l istio=ingressgateway
```
Note that Istio gateway doesn't reload the certificates from the TLS secret on cert-manager renewal.
Since the GKE cluster is made out of preemptible VMs the gateway pods will be replaced once every 24h,
if you're not using preemptible nodes then you need to manually delete the gateway pods every two months
before the certificate expires.
### Install Prometheus
The GKE Istio add-on does not include a Prometheus instance that scrapes the Istio telemetry service.
Because Flagger uses the Istio HTTP metrics to run the canary analysis you have to deploy the following
Prometheus configuration that's similar to the one that comes with the official Istio Helm chart.
```bash
REPO=https://raw.githubusercontent.com/weaveworks/flagger/master
kubectl apply -f ${REPO}/artifacts/gke/istio-prometheus.yaml
```
### Install Flagger and Grafana
Add Flagger Helm repository:
```bash
helm repo add flagger https://flagger.app
```
Deploy Flagger in the `istio-system` namespace with Slack notifications enabled:
```bash
helm upgrade -i flagger flagger/flagger \
--namespace=istio-system \
--set metricsServer=http://prometheus.istio-system:9090 \
--set slack.url=https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK \
--set slack.channel=general \
--set slack.user=flagger
```
Deploy Grafana in the `istio-system` namespace:
```bash
helm upgrade -i flagger-grafana flagger/grafana \
--namespace=istio-system \
--set url=http://prometheus.istio-system:9090 \
--set user=admin \
--set password=replace-me
```
Expose Grafana through the public gateway by creating a virtual service \(replace `example.com` with your domain\):
```yaml
apiVersion: networking.istio.io/v1alpha3
@@ -433,8 +384,7 @@ spec:
http:
- route:
- destination:
host: flagger-grafana
```
Save the above resource as grafana-virtual-service.yaml and then apply it:
@@ -444,17 +394,3 @@ kubectl apply -f ./grafana-virtual-service.yaml
```
Navigate to `http://grafana.example.com` in your browser and you should be redirected to the HTTPS version.
Check that HTTP2 is enabled:
```bash
curl -I --http2 https://grafana.example.com
HTTP/2 200
content-type: text/html; charset=UTF-8
x-envoy-upstream-service-time: 3
server: envoy
```
View File
@@ -0,0 +1,145 @@
# Flagger install on Kubernetes
This guide walks you through setting up Flagger on a Kubernetes cluster.
### Prerequisites
Flagger requires a Kubernetes cluster **v1.11** or newer with the following admission controllers enabled:
* MutatingAdmissionWebhook
* ValidatingAdmissionWebhook
Flagger depends on [Istio](https://istio.io/docs/setup/kubernetes/quick-start/) **v1.0.3** or newer
with traffic management, telemetry and Prometheus enabled.
A minimal Istio installation should contain the following services:
* istio-pilot
* istio-ingressgateway
* istio-sidecar-injector
* istio-telemetry
* prometheus
### Install Flagger
Add Flagger Helm repository:
```bash
helm repo add flagger https://flagger.app
```
Deploy Flagger in the _**istio-system**_ namespace:
```bash
helm upgrade -i flagger flagger/flagger \
--namespace=istio-system \
--set metricsServer=http://prometheus.istio-system:9090
```
You can install Flagger in any namespace as long as it can talk to the Istio Prometheus service on port 9090.
Enable **Slack** notifications:
```bash
helm upgrade -i flagger flagger/flagger \
--namespace=istio-system \
--set slack.url=https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK \
--set slack.channel=general \
--set slack.user=flagger
```
If you don't have Tiller you can use the helm template command and apply the generated yaml with kubectl:
```bash
# generate
helm fetch --untar --untardir . flagger/flagger &&
helm template flagger \
--name flagger \
--namespace=istio-system \
--set metricsServer=http://prometheus.istio-system:9090 \
> $HOME/flagger.yaml
# apply
kubectl apply -f $HOME/flagger.yaml
```
To uninstall the Flagger release with Helm run:
```text
helm delete --purge flagger
```
The command removes all the Kubernetes components associated with the chart and deletes the release.
> **Note** that on uninstall the Canary CRD will not be removed.
Deleting the CRD will make Kubernetes remove all the objects owned by Flagger like Istio virtual services,
Kubernetes deployments and ClusterIP services.
If you want to remove all the objects created by Flagger you have to delete the Canary CRD with kubectl:
```text
kubectl delete crd canaries.flagger.app
```
### Install Grafana
Flagger comes with a Grafana dashboard made for monitoring the canary analysis.
Deploy Grafana in the _**istio-system**_ namespace:
```bash
helm upgrade -i flagger-grafana flagger/grafana \
--namespace=istio-system \
--set url=http://prometheus.istio-system:9090 \
--set user=admin \
--set password=change-me
```
Or use helm template command and apply the generated yaml with kubectl:
```bash
# generate
helm fetch --untar --untardir . flagger/grafana &&
helm template grafana \
--name flagger-grafana \
--namespace=istio-system \
> $HOME/flagger-grafana.yaml
# apply
kubectl apply -f $HOME/flagger-grafana.yaml
```
You can access Grafana using port forwarding:
```bash
kubectl -n istio-system port-forward svc/flagger-grafana 3000:80
```
### Install Load Tester
Flagger comes with an optional load testing service that generates traffic
during canary analysis when configured as a webhook.
Deploy the load test runner with Helm:
```bash
helm upgrade -i flagger-loadtester flagger/loadtester \
--namespace=test \
--set cmd.timeout=1h
```
Deploy with kubectl:
```bash
helm fetch --untar --untardir . flagger/loadtester &&
helm template loadtester \
--name flagger-loadtester \
--namespace=test \
> $HOME/flagger-loadtester.yaml
# apply
kubectl apply -f $HOME/flagger-loadtester.yaml
```
> **Note** that the load tester should be deployed in a namespace with Istio sidecar injection enabled.
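For reference, a namespace with the standard Istio injection label looks roughly like this (the name matches the `--namespace=test` flag used above):
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: test
  labels:
    istio-injection: enabled
```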
View File
@@ -0,0 +1,135 @@
# Flagger install on Kubernetes with SuperGloo
This guide walks you through setting up Flagger on a Kubernetes cluster using [SuperGloo](https://github.com/solo-io/supergloo).
SuperGloo by [Solo.io](https://solo.io) is an opinionated abstraction layer that will simplify the installation, management, and operation of your service mesh.
It supports running multiple ingresses with multiple meshes (Istio, App Mesh, Consul Connect and Linkerd 2) in the same cluster.
### Prerequisites
Flagger requires a Kubernetes cluster **v1.11** or newer with the following admission controllers enabled:
* MutatingAdmissionWebhook
* ValidatingAdmissionWebhook
### Install Istio with SuperGloo
Download SuperGloo CLI and add it to your path:
```bash
curl -sL https://run.solo.io/supergloo/install | sh
export PATH=$HOME/.supergloo/bin:$PATH
```
Deploy the SuperGloo controller in the `supergloo-system` namespace:
```bash
supergloo init
```
Create the `istio-system` namespace and install Istio with traffic management, telemetry and Prometheus enabled:
```bash
ISTIO_VER="1.0.6"
kubectl create ns istio-system
supergloo install istio --name istio \
--namespace=supergloo-system \
--auto-inject=true \
--installation-namespace=istio-system \
--mtls=false \
--prometheus=true \
--version ${ISTIO_VER}
```
Create a cluster role binding so that Flagger can manipulate SuperGloo custom resources:
```bash
kubectl create clusterrolebinding flagger-supergloo \
--clusterrole=mesh-discovery \
--serviceaccount=istio-system:flagger
```
Wait for the Istio control plane to become available:
```bash
kubectl -n istio-system rollout status deployment/istio-sidecar-injector
kubectl -n istio-system rollout status deployment/prometheus
```
### Install Flagger
Add Flagger Helm repository:
```bash
helm repo add flagger https://flagger.app
```
Deploy Flagger in the _**istio-system**_ namespace and set the service mesh provider to SuperGloo:
```bash
helm upgrade -i flagger flagger/flagger \
--namespace=istio-system \
--set metricsServer=http://prometheus.istio-system:9090 \
--set meshProvider=supergloo:istio.supergloo-system
```
When using SuperGloo the mesh provider format is `supergloo:<MESH-NAME>.<SUPERGLOO-NAMESPACE>`.
Optionally you can enable **Slack** notifications:
```bash
helm upgrade -i flagger flagger/flagger \
--reuse-values \
--namespace=istio-system \
--set slack.url=https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK \
--set slack.channel=general \
--set slack.user=flagger
```
### Install Grafana
Flagger comes with a Grafana dashboard made for monitoring the canary analysis.
Deploy Grafana in the _**istio-system**_ namespace:
```bash
helm upgrade -i flagger-grafana flagger/grafana \
--namespace=istio-system \
--set url=http://prometheus.istio-system:9090
```
You can access Grafana using port forwarding:
```bash
kubectl -n istio-system port-forward svc/flagger-grafana 3000:80
```
### Install Load Tester
Flagger comes with an optional load testing service that generates traffic
during canary analysis when configured as a webhook.
Deploy the load test runner with Helm:
```bash
helm upgrade -i flagger-loadtester flagger/loadtester \
--namespace=test \
--set cmd.timeout=1h
```
Deploy with kubectl:
```bash
helm fetch --untar --untardir . flagger/loadtester &&
helm template loadtester \
--name flagger-loadtester \
--namespace=test \
> $HOME/flagger-loadtester.yaml
# apply
kubectl apply -f $HOME/flagger-loadtester.yaml
```
> **Note** that the load tester should be deployed in a namespace with Istio sidecar injection enabled.
View File
@@ -1,75 +0,0 @@
# Install Flagger
Before installing Flagger make sure you have [Istio](https://istio.io) running with Prometheus enabled.
If you are new to Istio you can follow this GKE guide
[Istio service mesh walk-through](https://docs.flagger.app/install/install-istio).
**Prerequisites**
* Kubernetes &gt;= 1.11
* Istio &gt;= 1.0
* Prometheus &gt;= 2.6
### Install with Helm and Tiller
Add Flagger Helm repository:
```bash
helm repo add flagger https://flagger.app
```
Deploy Flagger in the _**istio-system**_ namespace:
```bash
helm upgrade -i flagger flagger/flagger \
--namespace=istio-system \
--set metricsServer=http://prometheus.istio-system:9090
```
Enable **Slack** notifications:
```bash
helm upgrade -i flagger flagger/flagger \
--namespace=istio-system \
--set slack.url=https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK \
--set slack.channel=general \
--set slack.user=flagger
```
### Install with kubectl
If you don't have Tiller you can use the helm template command and apply the generated yaml with kubectl:
```bash
# generate
helm template flagger/flagger \
--name flagger \
--namespace=istio-system \
--set metricsServer=http://prometheus.istio-system:9090 \
--set controlLoopInterval=1m > $HOME/flagger.yaml
# apply
kubectl apply -f $HOME/flagger.yaml
```
### Uninstall
To uninstall/delete the flagger release with Helm run:
```text
helm delete --purge flagger
```
The command removes all the Kubernetes components associated with the chart and deletes the release.
> **Note** that on uninstall the Canary CRD will not be removed.
Deleting the CRD will make Kubernetes remove all the objects owned by Flagger like Istio virtual services,
Kubernetes deployments and ClusterIP services.
If you want to remove all the objects created by Flagger you have to delete the Canary CRD with kubectl:
```text
kubectl delete crd canaries.flagger.app
```
View File
@@ -1,48 +0,0 @@
# Install Grafana
Flagger comes with a Grafana dashboard made for monitoring the canary analysis.
### Install with Helm and Tiller
Add Flagger Helm repository:
```bash
helm repo add flagger https://flagger.app
```
Deploy Grafana in the _**istio-system**_ namespace:
```bash
helm upgrade -i flagger-grafana flagger/grafana \
--namespace=istio-system \
--set url=http://prometheus:9090 \
--set user=admin \
--set password=admin
```
### Install with kubectl
If you don't have Tiller you can use the helm template command and apply the generated yaml with kubectl:
```bash
# generate
helm template flagger/grafana \
--name flagger-grafana \
--namespace=istio-system \
--set user=admin \
--set password=admin > $HOME/flagger-grafana.yaml
# apply
kubectl apply -f $HOME/flagger-grafana.yaml
```
### Uninstall
To uninstall/delete the Grafana release with Helm run:
```text
helm delete --purge flagger-grafana
```
The command removes all the Kubernetes components associated with the chart and deletes the release.
View File
@@ -0,0 +1,353 @@
# Canary Deployments with Helm Charts and GitOps
This guide shows you how to package a web app into a Helm chart, trigger canary deployments on Helm upgrade
and automate the chart release process with Weave Flux.
### Packaging
You'll be using the [podinfo](https://github.com/stefanprodan/k8s-podinfo) chart.
This chart packages a web app made with Go, its configuration, a horizontal pod autoscaler (HPA)
and the canary configuration file.
```
├── Chart.yaml
├── README.md
├── templates
│   ├── NOTES.txt
│   ├── _helpers.tpl
│   ├── canary.yaml
│   ├── configmap.yaml
│   ├── deployment.yaml
│   └── hpa.yaml
└── values.yaml
```
You can find the chart source [here](https://github.com/stefanprodan/flagger/tree/master/charts/podinfo).
### Install
Create a test namespace with Istio sidecar injection enabled:
```bash
export REPO=https://raw.githubusercontent.com/weaveworks/flagger/master
kubectl apply -f ${REPO}/artifacts/namespaces/test.yaml
```
Add Flagger Helm repository:
```bash
helm repo add flagger https://flagger.app
```
Install podinfo with the release name `frontend` (replace `example.com` with your own domain):
```bash
helm upgrade -i frontend flagger/podinfo \
--namespace test \
--set nameOverride=frontend \
--set backend=http://backend.test:9898/echo \
--set canary.enabled=true \
--set canary.istioIngress.enabled=true \
--set canary.istioIngress.gateway=public-gateway.istio-system.svc.cluster.local \
--set canary.istioIngress.host=frontend.istio.example.com
```
Flagger takes a Kubernetes deployment and a horizontal pod autoscaler (HPA),
then creates a series of objects (Kubernetes deployments, ClusterIP services and Istio virtual services).
These objects expose the application on the mesh and drive the canary analysis and promotion.
```bash
# generated by Helm
configmap/frontend
deployment.apps/frontend
horizontalpodautoscaler.autoscaling/frontend
canary.flagger.app/frontend
# generated by Flagger
configmap/frontend-primary
deployment.apps/frontend-primary
horizontalpodautoscaler.autoscaling/frontend-primary
service/frontend
service/frontend-canary
service/frontend-primary
virtualservice.networking.istio.io/frontend
```
When the `frontend-primary` deployment comes online,
Flagger will route all traffic to the primary pods and scale to zero the `frontend` deployment.
Open your browser and navigate to the frontend URL:
![Podinfo Frontend](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/screens/demo-frontend.png)
Now let's install the `backend` release without exposing it outside the mesh:
```bash
helm upgrade -i backend flagger/podinfo \
--namespace test \
--set nameOverride=backend \
--set canary.enabled=true \
--set canary.istioIngress.enabled=false
```
Check if Flagger has successfully deployed the canaries:
```
kubectl -n test get canaries
NAME STATUS WEIGHT LASTTRANSITIONTIME
backend Initialized 0 2019-02-12T18:53:18Z
frontend Initialized 0 2019-02-12T17:50:50Z
```
Click on the ping button in the `frontend` UI to trigger an HTTP POST request
that will reach the `backend` app:
![Jaeger Tracing](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/screens/demo-frontend-jaeger.png)
We'll use the `/echo` endpoint (same as the one the ping button calls)
to generate load on both apps during a canary deployment.
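When `canary.loadtest.enabled` is set (as in the upgrade step below), the chart adds a load test webhook to the generated Canary. Expressed directly in a Canary spec it corresponds roughly to the following sketch; the command and target are illustrative, not the chart's exact output:
```yaml
canaryAnalysis:
  webhooks:
    - name: load-test
      url: http://flagger-loadtester.test/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://frontend-canary.test:9898/echo"
```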
### Upgrade
First let's install a load testing service that will generate traffic during analysis:
```bash
helm upgrade -i flagger-loadtester flagger/loadtester \
--namespace=test
```
Enable the load tester and deploy a new `frontend` version:
```bash
helm upgrade -i frontend flagger/podinfo \
--namespace test \
--reuse-values \
--set canary.loadtest.enabled=true \
--set image.tag=1.4.1
```
Flagger detects that the deployment revision changed and starts the canary analysis along with the load test:
```
kubectl -n istio-system logs deployment/flagger -f | jq .msg
New revision detected! Scaling up frontend.test
Halt advancement frontend.test waiting for rollout to finish: 0 of 2 updated replicas are available
Starting canary analysis for frontend.test
Advance frontend.test canary weight 5
Advance frontend.test canary weight 10
Advance frontend.test canary weight 15
Advance frontend.test canary weight 20
Advance frontend.test canary weight 25
Advance frontend.test canary weight 30
Advance frontend.test canary weight 35
Advance frontend.test canary weight 40
Advance frontend.test canary weight 45
Advance frontend.test canary weight 50
Copying frontend.test template spec to frontend-primary.test
Halt advancement frontend-primary.test waiting for rollout to finish: 1 old replicas are pending termination
Promotion completed! Scaling down frontend.test
```
You can monitor the canary deployment with Grafana. Open the Flagger dashboard,
select `test` from the namespace dropdown, `frontend-primary` from the primary dropdown and `frontend` from the
canary dropdown.
![Flagger Grafana Dashboard](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/screens/demo-frontend-dashboard.png)
Now trigger a canary deployment for the `backend` app, but this time you'll change a value in the configmap:
```bash
helm upgrade -i backend flagger/podinfo \
--namespace test \
--reuse-values \
--set canary.loadtest.enabled=true \
--set httpServer.timeout=25s
```
Generate HTTP 500 errors:
```bash
kubectl -n test exec -it flagger-loadtester-xxx-yyy sh
watch curl http://backend-canary:9898/status/500
```
Generate latency:
```bash
kubectl -n test exec -it flagger-loadtester-xxx-yyy sh
watch curl http://backend-canary:9898/delay/1
```
Flagger detects the config map change and starts a canary analysis. Flagger will pause the advancement
when the HTTP success rate drops under 99% or when the average request duration in the last minute is over 500ms:
```
kubectl -n test describe canary backend
Events:
ConfigMap backend has changed
New revision detected! Scaling up backend.test
Starting canary analysis for backend.test
Advance backend.test canary weight 5
Advance backend.test canary weight 10
Advance backend.test canary weight 15
Advance backend.test canary weight 20
Advance backend.test canary weight 25
Advance backend.test canary weight 30
Advance backend.test canary weight 35
Halt backend.test advancement success rate 62.50% < 99%
Halt backend.test advancement success rate 88.24% < 99%
Advance backend.test canary weight 40
Advance backend.test canary weight 45
Halt backend.test advancement request duration 2.415s > 500ms
Halt backend.test advancement request duration 2.42s > 500ms
Advance backend.test canary weight 50
ConfigMap backend-primary synced
Copying backend.test template spec to backend-primary.test
Promotion completed! Scaling down backend.test
```
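The 99% success rate and 500ms duration gates seen above come from the chart's canary settings; written out as Canary metrics they roughly correspond to the built-in checks used throughout these docs (the interval value is illustrative):
```yaml
canaryAnalysis:
  metrics:
    - name: request-success-rate
      # halt if the success rate drops under 99%
      threshold: 99
      interval: 1m
    - name: request-duration
      # halt if the request duration goes over 500ms
      threshold: 500
      interval: 1m
```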
![Flagger Grafana Dashboard](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/screens/demo-backend-dashboard.png)
If the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary,
the canary is scaled to zero and the rollout is marked as failed.
```bash
kubectl -n test get canary
NAME STATUS WEIGHT LASTTRANSITIONTIME
backend Succeeded 0 2019-02-12T19:33:11Z
frontend Failed 0 2019-02-12T19:47:20Z
```
If you've enabled the Slack notifications, you'll receive an alert with the reason why the `backend` promotion failed.
### GitOps automation
Instead of using the Helm CLI from a CI tool to perform the install and upgrade,
you could use a Git based approach. GitOps is a way to do Continuous Delivery:
it works by using Git as a source of truth for declarative infrastructure and workloads.
In the [GitOps model](https://www.weave.works/technologies/gitops/),
any change to production must be committed in source control
prior to being applied on the cluster. This way rollback and audit logs are provided by Git.
![Helm GitOps Canary Deployment](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/diagrams/flagger-flux-gitops.png)
In order to apply the GitOps pipeline model to Flagger canary deployments you'll need
a Git repository with your workloads definitions in YAML format,
a container registry where your CI system pushes immutable images and
an operator that synchronizes the Git repo with the cluster state.
Create a git repository with the following content:
```
├── namespaces
│   └── test.yaml
└── releases
└── test
├── backend.yaml
├── frontend.yaml
└── loadtester.yaml
```
You can find the git source [here](https://github.com/stefanprodan/flagger/tree/master/artifacts/cluster).
Define the `frontend` release using Flux `HelmRelease` custom resource:
```yaml
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
name: frontend
namespace: test
annotations:
flux.weave.works/automated: "true"
flux.weave.works/tag.chart-image: semver:~1.4
spec:
releaseName: frontend
chart:
repository: https://stefanprodan.github.io/flagger/
name: podinfo
version: 2.0.0
values:
image:
repository: quay.io/stefanprodan/podinfo
tag: 1.4.0
backend: http://backend-podinfo:9898/echo
canary:
enabled: true
istioIngress:
enabled: true
gateway: public-gateway.istio-system.svc.cluster.local
host: frontend.istio.example.com
loadtest:
enabled: true
```
In the `chart` section I've defined the release source by specifying the Helm repository (hosted on GitHub Pages), chart name and version.
In the `values` section I've overwritten the defaults set in values.yaml.
With the `flux.weave.works` annotations I instruct Flux to automate this release.
When an image tag in the semver range of `1.4.0 - 1.4.99` is pushed to Quay,
Flux will upgrade the Helm release and from there Flagger will pick up the change and start a canary deployment.
Install [Weave Flux](https://github.com/weaveworks/flux) and its Helm Operator by specifying your Git repo URL:
```bash
helm repo add weaveworks https://weaveworks.github.io/flux
helm install --name flux \
--set helmOperator.create=true \
--set git.url=git@github.com:<USERNAME>/<REPOSITORY> \
--namespace flux \
weaveworks/flux
```
At startup Flux generates an SSH key and logs the public key. Find the SSH public key with:
```bash
kubectl -n flux logs deployment/flux | grep identity.pub | cut -d '"' -f2
```
In order to sync your cluster state with Git you need to copy the public key and create a
deploy key with write access on your GitHub repository.
Open GitHub, navigate to your fork, go to _Settings > Deploy keys_, click on _Add deploy key_,
check _Allow write access_, paste the Flux public key and click _Add key_.
After a couple of seconds Flux will apply the Kubernetes resources from Git and Flagger will
launch the `frontend` and `backend` apps.
A CI/CD pipeline for the `frontend` release could look like this:
* cut a release from the master branch of the podinfo code repo with the git tag `1.4.1`
* CI builds the image and pushes the `podinfo:1.4.1` image to the container registry
* Flux scans the registry and updates the Helm release `image.tag` to `1.4.1`
* Flux commits and pushes the change to the cluster repo
* Flux applies the updated Helm release on the cluster
* Flux Helm Operator picks up the change and calls Tiller to upgrade the release
* Flagger detects a revision change and scales up the `frontend` deployment
* Flagger starts the load test and runs the canary analysis
* Based on the analysis result the canary deployment is promoted to production or rolled back
* Flagger sends a Slack notification with the canary result
If the canary fails, fix the bug, do another patch release e.g. `1.4.2` and the whole process will run again.
A canary deployment can fail due to any of the following reasons:
* the container image can't be downloaded
* the deployment replica set is stuck for more than ten minutes (e.g. due to a container crash loop)
* the webhooks (acceptance tests, load tests, etc.) are returning a non-2xx response
* the HTTP success rate (non 5xx responses) metric drops under the threshold
* the HTTP average duration metric goes over the threshold
* the Istio telemetry service is unable to collect traffic metrics
* the metrics server (Prometheus) can't be reached
If you want to find out more about managing Helm releases with Flux here is an in-depth guide
[github.com/stefanprodan/gitops-helm](https://github.com/stefanprodan/gitops-helm).
View File
@@ -0,0 +1,206 @@
# Zero downtime deployments
This is a list of things you should consider when dealing with a high traffic production environment if you want to
minimise the impact of rolling updates and downscaling.
### Deployment strategy
Limit the number of unavailable pods during a rolling update:
```yaml
apiVersion: apps/v1
kind: Deployment
spec:
progressDeadlineSeconds: 120
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
```
The default progress deadline for a deployment is ten minutes.
You should consider adjusting this value to make the deployment process fail faster.
### Liveness health check
Your application should expose an HTTP endpoint that Kubernetes can call to determine if
your app transitioned to a broken state from which it can't recover and needs to be restarted.
```yaml
livenessProbe:
exec:
command:
- wget
- --quiet
- --tries=1
- --timeout=4
- --spider
- http://localhost:8080/healthz
timeoutSeconds: 5
initialDelaySeconds: 5
```
If you've enabled mTLS, you'll have to use `exec` for liveness and readiness checks since
kubelet is not part of the service mesh and doesn't have access to the TLS cert.
### Readiness health check
Your application should expose an HTTP endpoint that Kubernetes can call to determine if
your app is ready to receive traffic.
```yaml
readinessProbe:
exec:
command:
- wget
- --quiet
- --tries=1
- --timeout=4
- --spider
- http://localhost:8080/readyz
timeoutSeconds: 5
initialDelaySeconds: 5
periodSeconds: 5
```
If your app depends on external services, you should check if those services are available before allowing Kubernetes
to route traffic to an app instance. Keep in mind that the Envoy sidecar can have a slower startup than your app.
This means that on application start you should retry any external connection for at least a couple of seconds.
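A hedged sketch of a readiness probe tuned for that situation, assuming the app's `/readyz` handler also verifies its external dependencies (the delay and thresholds are illustrative):
```yaml
readinessProbe:
  exec:
    command:
      - wget
      - --quiet
      - --tries=1
      - --timeout=4
      - --spider
      - http://localhost:8080/readyz
  # give the Envoy sidecar a head start before the first check
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 5
  # tolerate a few failed checks while external dependencies come up
  failureThreshold: 6
```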
### Graceful shutdown
Before a pod gets terminated, Kubernetes sends a `SIGTERM` signal to every container and waits for a period of
time (30s by default) for all containers to exit gracefully. If your app doesn't handle the `SIGTERM` signal or if it
doesn't exit within the grace period, Kubernetes will kill the container and any inflight requests that your app is
processing will fail.
```yaml
apiVersion: apps/v1
kind: Deployment
spec:
template:
spec:
terminationGracePeriodSeconds: 60
containers:
- name: app
lifecycle:
preStop:
exec:
command:
- sleep
- "10"
```
Your app container should have a `preStop` hook that delays the container shutdown.
This will allow the service mesh to drain the traffic and remove this pod from all other Envoy sidecars before your app
becomes unavailable.
### Delay Envoy shutdown
Even if your app reacts to `SIGTERM` and tries to complete the inflight requests before shutdown, that
doesn't mean that the response will make it back to the caller. If the Envoy sidecar shuts down before your app, then
the caller will receive a 503 error.
To mitigate this issue you can add a `preStop` hook to the Istio proxy and wait for the main app to exit before Envoy exits.
```bash
#!/bin/bash
set -e
if ! pidof envoy &>/dev/null; then
exit 0
fi
if ! pidof pilot-agent &>/dev/null; then
exit 0
fi
while [ $(netstat -plunt | grep tcp | grep -v envoy | wc -l | xargs) -ne 0 ]; do
sleep 1;
done
exit 0
```
You'll have to build your own Envoy docker image with the above script and
modify the Istio injection webhook with the `preStop` directive.
Thanks to Stono for his excellent [tips](https://github.com/istio/istio/issues/12183) on minimising 503s.
### Resource requests and limits
Setting CPU and memory requests/limits for all workloads is a mandatory step if you're running a production system.
Without limits your nodes could run out of memory or become unresponsive due to CPU exhaustion.
Without CPU and memory requests,
the Kubernetes scheduler will not be able to make decisions about which nodes to place pods on.
```yaml
apiVersion: apps/v1
kind: Deployment
spec:
template:
spec:
containers:
- name: app
resources:
limits:
cpu: 1000m
memory: 1Gi
requests:
cpu: 100m
memory: 128Mi
```
Note that without resource requests the horizontal pod autoscaler can't determine when to scale your app.
### Autoscaling
A production environment should be able to handle traffic bursts without impacting the quality of service.
This can be achieved with Kubernetes autoscaling capabilities.
Autoscaling in Kubernetes has two dimensions: the Cluster Autoscaler that deals with node scaling operations and
the Horizontal Pod Autoscaler that automatically scales the number of pods in a deployment.
```yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: app
minReplicas: 2
maxReplicas: 4
metrics:
- type: Resource
resource:
name: cpu
targetAverageValue: 900m
- type: Resource
resource:
name: memory
targetAverageValue: 768Mi
```
The above HPA ensures your app will be scaled up before the pods reach the CPU or memory limits.
### Ingress retries
To minimise the impact of downscaling operations you can make use of Envoy retry capabilities.
```yaml
apiVersion: flagger.app/v1alpha3
kind: Canary
spec:
service:
port: 9898
gateways:
- public-gateway.istio-system.svc.cluster.local
hosts:
- app.example.com
appendHeaders:
x-envoy-upstream-rq-timeout-ms: "15000"
x-envoy-max-retries: "10"
x-envoy-retry-on: "gateway-error,connect-failure,refused-stream"
```
When the HPA scales down your app, your users could run into 503 errors.
The above configuration will make Envoy retry the HTTP requests that failed due to gateway errors.
View File
@@ -0,0 +1,214 @@
# Istio A/B Testing
This guide shows you how to automate A/B testing with Istio and Flagger.
Besides weighted routing, Flagger can be configured to route traffic to the canary based on HTTP match conditions.
In an A/B testing scenario, you'll be using HTTP headers or cookies to target a certain segment of your users.
This is particularly useful for frontend applications that require session affinity.
![Flagger A/B Testing Stages](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/diagrams/flagger-abtest-steps.png)
### Bootstrap
Create a test namespace with Istio sidecar injection enabled:
```bash
export REPO=https://raw.githubusercontent.com/weaveworks/flagger/master
kubectl apply -f ${REPO}/artifacts/namespaces/test.yaml
```
Create a deployment and a horizontal pod autoscaler:
```bash
kubectl apply -f ${REPO}/artifacts/ab-testing/deployment.yaml
kubectl apply -f ${REPO}/artifacts/ab-testing/hpa.yaml
```
Deploy the load testing service to generate traffic during the canary analysis:
```bash
kubectl -n test apply -f ${REPO}/artifacts/loadtester/deployment.yaml
kubectl -n test apply -f ${REPO}/artifacts/loadtester/service.yaml
```
Create a canary custom resource (replace example.com with your own domain):
```yaml
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
name: abtest
namespace: test
spec:
# deployment reference
targetRef:
apiVersion: apps/v1
kind: Deployment
name: abtest
# the maximum time in seconds for the canary deployment
# to make progress before it is rolled back (default 600s)
progressDeadlineSeconds: 60
# HPA reference (optional)
autoscalerRef:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
name: abtest
service:
# container port
port: 9898
# Istio gateways (optional)
gateways:
- public-gateway.istio-system.svc.cluster.local
- mesh
# Istio virtual service host names (optional)
hosts:
- app.example.com
canaryAnalysis:
# schedule interval (default 60s)
interval: 1m
# total number of iterations
iterations: 10
# max number of failed iterations before rollback
threshold: 2
# canary match condition
match:
- headers:
user-agent:
regex: "^(?!.*Chrome).*Safari.*"
- headers:
cookie:
regex: "^(.*?;)?(type=insider)(;.*)?$"
metrics:
- name: request-success-rate
# minimum req success rate (non 5xx responses)
# percentage (0-100)
threshold: 99
interval: 1m
- name: request-duration
# maximum req duration P99
# milliseconds
threshold: 500
interval: 30s
# generate traffic during analysis
webhooks:
- name: load-test
url: http://flagger-loadtester.test/
timeout: 5s
metadata:
cmd: "hey -z 1m -q 10 -c 2 -H 'Cookie: type=insider' http://podinfo.test:9898/"
```
The above configuration will run an analysis for ten minutes targeting Safari users and those that have an insider cookie.
Save the above resource as podinfo-abtest.yaml and then apply it:
```bash
kubectl apply -f ./podinfo-abtest.yaml
```
After a couple of seconds Flagger will create the canary objects:
```bash
# applied
deployment.apps/abtest
horizontalpodautoscaler.autoscaling/abtest
canary.flagger.app/abtest
# generated
deployment.apps/abtest-primary
horizontalpodautoscaler.autoscaling/abtest-primary
service/abtest
service/abtest-canary
service/abtest-primary
virtualservice.networking.istio.io/abtest
```
### Automated canary promotion
Trigger a canary deployment by updating the container image:
```bash
kubectl -n test set image deployment/abtest \
podinfod=quay.io/stefanprodan/podinfo:1.4.1
```
Flagger detects that the deployment revision changed and starts a new rollout:
```text
kubectl -n test describe canary/abtest
Status:
Failed Checks: 0
Phase: Succeeded
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Synced 3m flagger New revision detected abtest.test
Normal Synced 3m flagger Scaling up abtest.test
Warning Synced 3m flagger Waiting for abtest.test rollout to finish: 0 of 1 updated replicas are available
Normal Synced 3m flagger Advance abtest.test canary iteration 1/10
Normal Synced 3m flagger Advance abtest.test canary iteration 2/10
Normal Synced 3m flagger Advance abtest.test canary iteration 3/10
Normal Synced 2m flagger Advance abtest.test canary iteration 4/10
Normal Synced 2m flagger Advance abtest.test canary iteration 5/10
Normal Synced 1m flagger Advance abtest.test canary iteration 6/10
Normal Synced 1m flagger Advance abtest.test canary iteration 7/10
Normal Synced 55s flagger Advance abtest.test canary iteration 8/10
Normal Synced 45s flagger Advance abtest.test canary iteration 9/10
Normal Synced 35s flagger Advance abtest.test canary iteration 10/10
Normal Synced 25s flagger Copying abtest.test template spec to abtest-primary.test
Warning Synced 15s flagger Waiting for abtest-primary.test rollout to finish: 1 of 2 updated replicas are available
Normal Synced 5s flagger Promotion completed! Scaling down abtest.test
```
**Note** that if you apply new changes to the deployment during the canary analysis, Flagger will restart the analysis.
You can monitor all canaries with:
```bash
watch kubectl get canaries --all-namespaces
NAMESPACE NAME STATUS WEIGHT LASTTRANSITIONTIME
test abtest Progressing 100 2019-03-16T14:05:07Z
prod frontend Succeeded 0 2019-03-15T16:15:07Z
prod backend Failed 0 2019-03-14T17:05:07Z
```
### Automated rollback
During the canary analysis you can generate HTTP 500 errors and high latency to test Flagger's rollback.
Generate HTTP 500 errors:
```bash
watch curl -b 'type=insider' http://app.example.com/status/500
```
Generate latency:
```bash
watch curl -b 'type=insider' http://app.example.com/delay/1
```
When the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary,
the canary is scaled to zero and the rollout is marked as failed.
```text
kubectl -n test describe canary/abtest
Status:
Failed Checks: 2
Phase: Failed
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Synced 3m flagger Starting canary deployment for abtest.test
Normal Synced 3m flagger Advance abtest.test canary iteration 1/10
Normal Synced 3m flagger Advance abtest.test canary iteration 2/10
Normal Synced 3m flagger Advance abtest.test canary iteration 3/10
Normal Synced 3m flagger Halt abtest.test advancement success rate 69.17% < 99%
Normal Synced 2m flagger Halt abtest.test advancement success rate 61.39% < 99%
Warning Synced 2m flagger Rolling back abtest.test failed checks threshold reached 2
Warning Synced 1m flagger Canary failed! Scaling down abtest.test
```
View File
@@ -15,12 +15,12 @@ helm upgrade -i flagger flagger/flagger \
Once configured with a Slack incoming **webhook**, Flagger will post messages when a canary deployment
has been initialised, when a new revision has been detected, and when the canary analysis fails or succeeds.
![flagger-slack](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/screens/slack-canary-notifications.png)
![Slack Notifications](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/screens/slack-canary-notifications.png)
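For reference, a minimal sketch of enabling the Slack alerts on the Helm release referenced in the hunk header above; the `slack.*` values follow the Flagger chart's alerting options, and the webhook URL is a placeholder:
```bash
helm upgrade -i flagger flagger/flagger \
  --namespace=istio-system \
  --set slack.url=https://hooks.slack.com/services/YOUR/HOOK/URL \
  --set slack.channel=general \
  --set slack.user=flagger
```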
A canary deployment will be rolled back if the progress deadline is exceeded or if the analysis reaches the
maximum number of failed checks:
![flagger-slack-errors](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/screens/slack-canary-failed.png)
![Slack Notifications](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/screens/slack-canary-failed.png)
### Prometheus Alert Manager


@@ -0,0 +1,286 @@
# App Mesh Canary Deployments
This guide shows you how to use App Mesh and Flagger to automate canary deployments.
You'll need an EKS cluster configured with App Mesh; you can find the install guide
[here](https://docs.flagger.app/install/flagger-install-on-eks-appmesh).
### Bootstrap
Flagger takes a Kubernetes deployment and optionally a horizontal pod autoscaler (HPA),
then creates a series of objects (Kubernetes deployments, ClusterIP services, App Mesh virtual nodes and services).
These objects expose the application on the mesh and drive the canary analysis and promotion.
The only App Mesh object you need to create by yourself is the mesh resource.
Create a mesh called `global`:
```bash
export REPO=https://raw.githubusercontent.com/weaveworks/flagger/master
kubectl apply -f ${REPO}/artifacts/appmesh/global-mesh.yaml
```
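For reference, `global-mesh.yaml` should contain little more than a named mesh resource; a minimal sketch, assuming the App Mesh controller's `v1beta1` API of that era:
```yaml
apiVersion: appmesh.k8s.aws/v1beta1
kind: Mesh
metadata:
  name: global
```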
Create a test namespace with App Mesh sidecar injection enabled:
```bash
kubectl apply -f ${REPO}/artifacts/namespaces/test.yaml
```
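For reference, the test namespace boils down to a namespace carrying the sidecar injector opt-in label; a minimal sketch (the label name is assumed from the App Mesh injector, check `${REPO}/artifacts/namespaces/test.yaml` for the exact manifest):
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: test
  labels:
    # assumed App Mesh sidecar injector opt-in label
    appmesh.k8s.aws/sidecarInjectorWebhook: enabled
```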
Create a deployment and a horizontal pod autoscaler:
```bash
kubectl apply -f ${REPO}/artifacts/appmesh/deployment.yaml
kubectl apply -f ${REPO}/artifacts/appmesh/hpa.yaml
```
Deploy the load testing service to generate traffic during the canary analysis:
```bash
helm upgrade -i flagger-loadtester flagger/loadtester \
--namespace=test \
--set meshName=global.appmesh-system \
--set "backends[0]=podinfo.test"
```
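The Helm commands in this guide assume the Flagger chart repository has already been added; if it hasn't, add it first:
```bash
helm repo add flagger https://flagger.app
helm repo update
```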
Create a canary custom resource:
```yaml
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # container port
    port: 9898
    # App Mesh reference
    meshName: global.appmesh-system
    # App Mesh egress (optional)
    backends:
    - backend.test
  # define the canary analysis timing and KPIs
  canaryAnalysis:
    # schedule interval (default 60s)
    interval: 10s
    # max number of failed metric checks before rollback
    threshold: 10
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 5
    # App Mesh Prometheus checks
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      threshold: 99
      interval: 1m
    # external checks (optional)
    webhooks:
    - name: load-test
      url: http://flagger-loadtester.test/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://podinfo.test:9898/"
```
Save the above resource as `podinfo-canary.yaml` and then apply it:
```bash
kubectl apply -f ./podinfo-canary.yaml
```
After a couple of seconds Flagger will create the canary objects:
```bash
# applied
deployment.apps/podinfo
horizontalpodautoscaler.autoscaling/podinfo
canary.flagger.app/podinfo
# generated Kubernetes objects
deployment.apps/podinfo-primary
horizontalpodautoscaler.autoscaling/podinfo-primary
service/podinfo
service/podinfo-canary
service/podinfo-primary
# generated App Mesh objects
virtualnode.appmesh.k8s.aws/podinfo
virtualnode.appmesh.k8s.aws/podinfo-canary
virtualnode.appmesh.k8s.aws/podinfo-primary
virtualservice.appmesh.k8s.aws/podinfo.test
```
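To double-check that the mesh objects were generated, you can list them by their full resource names (taken from the output above):
```bash
kubectl -n test get virtualnodes.appmesh.k8s.aws,virtualservices.appmesh.k8s.aws
```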
The App Mesh-specific settings are:
```yaml
service:
  port: 9898
  meshName: global.appmesh-system
  backends:
  - backend1.test
  - backend2.test
```
App Mesh blocks all egress traffic by default. If your application needs to call another service, you have to create an
App Mesh virtual service for it and add the virtual service name to the `backends` list.
### Setup App Mesh ingress (optional)
To expose the podinfo app outside the mesh you'll use an Envoy ingress and an AWS Classic Load Balancer.
The ingress binds to an internet domain and forwards the calls into the mesh through the App Mesh sidecar.
If podinfo becomes unavailable due to an HPA downscale or a node restart,
the ingress will retry the calls for a short period of time.
Deploy the ingress and the AWS ELB service:
```bash
kubectl apply -f ${REPO}/artifacts/appmesh/ingress.yaml
```
Find the ingress public address:
```bash
kubectl -n test describe svc/ingress | grep Ingress
LoadBalancer Ingress: yyy-xx.us-west-2.elb.amazonaws.com
```
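The commands that follow assume the ELB hostname is exported as `INGRESS_URL`; one way to capture it:
```bash
# read the ELB hostname from the ingress service status
export INGRESS_URL=http://$(kubectl -n test get svc/ingress \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
```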
Wait for the ELB to become active:
```bash
watch curl -sS ${INGRESS_URL}
```
Open your browser and navigate to the ingress address to access the podinfo UI.
### Automated canary promotion
Trigger a canary deployment by updating the container image:
```bash
kubectl -n test set image deployment/podinfo \
podinfod=quay.io/stefanprodan/podinfo:1.4.1
```
Flagger detects that the deployment revision changed and starts a new rollout:
```text
kubectl -n test describe canary/podinfo
Status:
Canary Weight: 0
Failed Checks: 0
Phase: Succeeded
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Synced 3m flagger New revision detected podinfo.test
Normal Synced 3m flagger Scaling up podinfo.test
Warning Synced 3m flagger Waiting for podinfo.test rollout to finish: 0 of 1 updated replicas are available
Normal Synced 3m flagger Advance podinfo.test canary weight 5
Normal Synced 3m flagger Advance podinfo.test canary weight 10
Normal Synced 3m flagger Advance podinfo.test canary weight 15
Normal Synced 2m flagger Advance podinfo.test canary weight 20
Normal Synced 2m flagger Advance podinfo.test canary weight 25
Normal Synced 1m flagger Advance podinfo.test canary weight 30
Normal Synced 1m flagger Advance podinfo.test canary weight 35
Normal Synced 55s flagger Advance podinfo.test canary weight 40
Normal Synced 45s flagger Advance podinfo.test canary weight 45
Normal Synced 35s flagger Advance podinfo.test canary weight 50
Normal Synced 25s flagger Copying podinfo.test template spec to podinfo-primary.test
Warning Synced 15s flagger Waiting for podinfo-primary.test rollout to finish: 1 of 2 updated replicas are available
Normal Synced 5s flagger Promotion completed! Scaling down podinfo.test
```
**Note** that if you apply new changes to the deployment during the canary analysis, Flagger will restart the analysis.
During the analysis, the canary's progress can be monitored with Grafana. The App Mesh dashboard URL is
http://localhost:3000/d/flagger-appmesh/appmesh-canary?refresh=10s&orgId=1&var-namespace=test&var-primary=podinfo-primary&var-canary=podinfo
![App Mesh Canary Dashboard](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/screens/flagger-grafana-appmesh.png)
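The localhost URL above assumes Grafana is reachable on port 3000; if you installed it with the monitoring guide's Helm release, a port-forward along these lines should work (the release name and namespace are assumptions based on that guide):
```bash
# forward local port 3000 to the Grafana container port
kubectl -n appmesh-system port-forward deploy/flagger-grafana 3000:3000
```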
You can monitor all canaries with:
```bash
watch kubectl get canaries --all-namespaces
NAMESPACE   NAME       STATUS        WEIGHT   LASTTRANSITIONTIME
test        podinfo    Progressing   15       2019-03-16T14:05:07Z
prod        frontend   Succeeded     0        2019-03-15T16:15:07Z
prod        backend    Failed        0        2019-03-14T17:05:07Z
```
If you've enabled the Slack notifications, you should receive the following messages:
![Flagger Slack Notifications](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/screens/slack-canary-notifications.png)
### Automated rollback
During the canary analysis you can generate HTTP 500 errors to test if Flagger pauses the rollout.
Trigger a canary deployment:
```bash
kubectl -n test set image deployment/podinfo \
podinfod=quay.io/stefanprodan/podinfo:1.4.2
```
Exec into the load tester pod with:
```bash
kubectl -n test exec -it flagger-loadtester-xx-xx sh
```
Generate HTTP 500 errors:
```bash
hey -z 1m -c 5 -q 5 http://podinfo.test:9898/status/500
```
When the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary,
the canary is scaled to zero and the rollout is marked as failed.
```text
kubectl -n test describe canary/podinfo
Status:
Canary Weight: 0
Failed Checks: 10
Phase: Failed
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Synced 3m flagger Starting canary deployment for podinfo.test
Normal Synced 3m flagger Advance podinfo.test canary weight 5
Normal Synced 3m flagger Advance podinfo.test canary weight 10
Normal Synced 3m flagger Advance podinfo.test canary weight 15
Normal Synced 3m flagger Halt podinfo.test advancement success rate 69.17% < 99%
Normal Synced 2m flagger Halt podinfo.test advancement success rate 61.39% < 99%
Normal Synced 2m flagger Halt podinfo.test advancement success rate 55.06% < 99%
Normal Synced 2m flagger Halt podinfo.test advancement success rate 47.00% < 99%
Normal Synced 2m flagger (combined from similar events): Halt podinfo.test advancement success rate 38.08% < 99%
Warning Synced 1m flagger Rolling back podinfo.test failed checks threshold reached 10
Warning Synced 1m flagger Canary failed! Scaling down podinfo.test
```
If you've enabled the Slack notifications, you'll receive a message if the progress deadline is exceeded,
or if the analysis reaches the maximum number of failed checks:
![Flagger Slack Notifications](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/screens/slack-canary-failed.png)


@@ -6,15 +6,13 @@ Flagger comes with a Grafana dashboard made for canary analysis. Install Grafana
```bash
helm upgrade -i flagger-grafana flagger/grafana \
--namespace=istio-system \
--set url=http://prometheus:9090 \
--set user=admin \
--set password=admin
--namespace=istio-system \ # or appmesh-system
--set url=http://prometheus:9090
```
The dashboard shows the RED and USE metrics for the primary and canary workloads:
![canary dashboard](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/screens/grafana-canary-analysis.png)
![Canary Dashboard](https://raw.githubusercontent.com/weaveworks/flagger/master/docs/screens/grafana-canary-analysis.png)
### Logging
@@ -48,6 +46,9 @@ Flagger exposes Prometheus metrics that can be used to determine the canary anal
the destination weight values:
```bash
# Flagger version and mesh provider gauge
flagger_info{version="0.10.0", mesh_provider="istio"} 1
# Canaries total gauge
flagger_canary_total{namespace="test"} 1
```

Some files were not shown because too many files have changed in this diff.