It can take some time for route changes to propagate on cloud load balancers,
so Flagger should ensure the route changes have taken effect before proceeding
with any more.
Signed-off-by: Steven Davidovitz <sdavidovitz@groq.com>
These are labels and annotations that should be ignored by Flagger
(i.e. not overwritten upon reconciliation).
See: github.com/fluxcd/flagger/issues/1573
Signed-off-by: Brian Sonnenberg <bsonnenberg@google.com>
When cross-namespace references are disabled, ensure that UpstreamRef,
MetricTemplateRef, and AlertProviderRef default to the canary's namespace
if their namespace field is empty. This aligns the validation logic with
the rest of the controller and prevents false positives when the namespace
is omitted.
Fixes #1827
Signed-off-by: Barrera, Angel <angelbarrerasanchez@protonmail.com>
Implement flagger_canary_successes_total and flagger_canary_failures_total
counter metrics with deployment strategy detection and analysis status
tracking for better observability of canary deployment outcomes.
Signed-off-by: cappyzawa <cappyzawa@gmail.com>
Enhance existing scheduler tests for deployments, daemonsets, and
services by adding prometheus metrics verification using testutil.
This ensures that status metrics are correctly recorded during
canary promotion workflows and provides better test coverage for
the metrics recording functionality.
Signed-off-by: cappyzawa <cappyzawa@gmail.com>
`apisix` Helm chart has dependency on `etcd` chart which uses a pinned Bitnami image. These became unavailable on August 28, 2025: https://github.com/bitnami/containers/issues/83267
The image is still available in the `bitnamilegacy` repository.
Signed-off-by: Kevin Snyder <kevin.snyder@gusto.com>
Add a new field `.spec.service.headless` which if set to true results in
Flagger generating headless Services, i.e. with the Service's
`.spec.clusterIP` set to None.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
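A minimal sketch of how the new field above could look in a Canary spec (values are illustrative):
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  service:
    port: 9898
    # generate headless Services (spec.clusterIP set to None)
    headless: true
```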
`nginx_ingress_controller_ingress_upstream_latency_seconds_sum` measures the connection latency, not the time it takes the backend to respond.
Fixes #1685
Signed-off-by: Federico Nafria <federiconafria@gmail.com>
Add a Keptn metrics provider for two resources:
* KeptnMetric: Verify the value of a single metric.
* Analysis (via AnalysisDefinition): Run a Keptn analysis over an
interval validating SLOs.
Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
This should avoid frequent "Operation cannot be fulfilled" errors from
polluting Canary resource events and logs.
Signed-off-by: Aurel Canciu <aurel.canciu@nexhealth.com>
Modify `canary.IsPrimaryReady()` and `canary.Initialize()` to return a
boolean indicating if the error is retriable. Modify the scheduler to
rollback the analysis and mark the Canary object as failed if the above
two functions or `canary.IsCanaryReady()` returns false along with an
error.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Add support for v1 of Gateway API `HTTPRoute`. Drop support for v1alpha2
as it was deprecated almost a year ago.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Add a new field `.spec.webhooks[].retries` to specify the number of
retries when calling a webhook.
Signed-off-by: Joseph Kwasniewski <kwasniewski@gmail.com>
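A sketch of how the new field above might be used in a webhook definition (names and values are illustrative):
```yaml
  analysis:
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        # number of retries when calling the webhook (the new field)
        retries: 3
```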
Fix the waiting logic to actually wait for the canary deployment to be
ready before continuing with the rest of the finalization logic.
Previously, the canary deployment was not being checked for a ready
status due to the absence of the `Steps` field in the specified
backoff.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Add support for mirroring requests while performing B/G deployments with
Gateway API. A `RequestMirror` filter pointing to the canary service is
added to the HTTPRoute during a Canary run. During the Canary run, drift
correction for `.spec.rules[].filters` is disabled to avoid removing the
mirror filter.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
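For illustration, the kind of `RequestMirror` filter described above could look like this on an HTTPRoute rule (service names and ports are illustrative):
```yaml
  rules:
    - backendRefs:
        - name: podinfo-primary
          port: 9898
      filters:
        # added by Flagger for the duration of the Canary run
        - type: RequestMirror
          requestMirror:
            backendRef:
              name: podinfo-canary
              port: 9898
```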
This adds a new Checksum field to the canary webhook body, which is a
hash of the LastAppliedSpec and TrackedConfigs.
This can be used to identify the rollout of a specific configuration,
and differentiate between webhooks being sent for different
configuration and deployment versions.
Signed-off-by: Kevin McDermott <kevin@weave.works>
Skipper's installation requires the creation of a PodSecurityPolicy
object. Since PSP was removed in k8s 1.25, we need to run tests for
Skipper on k8s 1.24.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
When the ingress annotations are not set, the returned value is nil
(not an empty map). Assigning to this nil map leads to a panic.
Signed-off-by: Jiří Pinkava <j-pi@seznam.cz>
Suspend, if set to true, will suspend the Canary, disabling any canary runs
regardless of any changes to its target, services, etc. Note that if the
Canary is suspended during an analysis, the analysis is paused until the Canary is unsuspended.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
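A minimal sketch, assuming the field lives at `.spec.suspend`:
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # disables canary runs regardless of changes to the target, services, etc.
  suspend: true
```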
Resume target scaler during finalization so that targetRef deployment
does not get stuck at 0 replicas after canary has been deleted.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Run the `confirm-rollout` webhook check right before scaling up the
deployment only, instead of running it on every loop.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
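For reference, a `confirm-rollout` webhook is declared in the canary analysis; a sketch with an illustrative gate URL:
```yaml
  analysis:
    webhooks:
      - name: ask-for-confirmation
        type: confirm-rollout
        url: http://flagger-loadtester.test/gate/check
```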
In Linkerd 2.13 the Prometheus instance in
the `linkerd-viz` namespace is now locked behind an
[_AuthorizationPolicy_](https://github.com/linkerd/linkerd2/blob/stable-2.13.1/viz/charts/linkerd-viz/templates/prometheus-policy.yaml)
that only allows access to the `metrics-api` _ServiceAccount_.
This adds an extra _AuthorizationPolicy_ to authorize the `flagger`
_ServiceAccount_. It's created by default when using Kustomize, but
needs to be opted-in when using Helm via the new
`linkerdAuthPolicy.create` value. This also implies that the Flagger
workload has to be injected by the Linkerd proxy, and that can't happen
in the same `linkerd` namespace where the control plane lives, so we're
moving Flagger into the new injected `flagger-system` namespace.
The `namespace` field in `kustomization.yml` was resetting the namespace
for the new _AuthorizationPolicy_ resource, so that gets restored back
to `linkerd-viz` using a `patchesJson6902` entry. A better way to do
this would have been to use the `unsetOnly` field in a
_NamespaceTransformer_ (see kubernetes-sigs/kustomize#4708) but for
the life of me I couldn't make that work...
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
Use regex filtering to match against session affinity cookie headers
when using Istio instead of an exact match.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
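A sketch of the kind of header match this implies on the generated VirtualService (the regex and service name are illustrative; the cookie name comes from the session affinity settings):
```yaml
  http:
    - match:
        - headers:
            cookie:
              # regex match instead of an exact match on the affinity cookie
              regex: "^(.*?;)?(flagger-cookie=.*)(;.*)?$"
      route:
        - destination:
            host: podinfo-canary
```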
Adding support for overriding the primary scaler replica count via
.spec.autoscalerRef.primaryScalerReplicas, a feature which would enable
users to define a different scaling configuration for the primary.
This can be useful in the situation where the user does not want to
scale the canary workload to the exact same size as the primary,
especially when opting for a canary deployment pattern where only a
small portion of traffic is routed to the canary workload pods.
Signed-off-by: Aurel Canciu <aurelcanciu@gmail.com>
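A sketch of the override described above, assuming min/max sub-fields (values are illustrative):
```yaml
  autoscalerRef:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    name: podinfo
    # scaling overrides applied only to the primary scaler
    primaryScalerReplicas:
      minReplicas: 5
      maxReplicas: 10
```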
Add `.spec.analysis.sessionAffinity` to configure session affinity for
weighted routing. Add support for session affinity in the Istio router,
using the `Set-Cookie` and `Cookie` headers.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
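A sketch of the new analysis field, assuming `cookieName` and `maxAge` sub-fields (values are illustrative):
```yaml
  analysis:
    interval: 1m
    stepWeight: 10
    maxWeight: 50
    sessionAffinity:
      # cookie used to pin users to the version they were first served
      cookieName: flagger-cookie
      # cookie lifetime in seconds
      maxAge: 21600
```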
- Push Flagger Helm chart to `ghcr.io/fluxcd/charts/flagger`
- Sign Flagger Helm chart with Cosign and GitHub OIDC
- Push install manifests and overlays from `./kustomize` with Flux CLI to `ghcr.io/fluxcd/flagger-manifests`
- Sign Flagger manifests with Cosign and GitHub OIDC
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
- Replace the Cosign static key with GitHub Actions OIDC when signing the flagger container image
- Sign the GitHub release assets checksums with Cosign keyless
- Sign the load-tester container image with Cosign keyless
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
Without this, the canary replicas are updated twice:
to 1 replica then after a few seconds to the value of HPA minReplicas.
In some cases, when updated to 1 replica (before updated by HPA
controller to the minReplicas), it's considered ready: 1 of 1 (readyThreshold 100%),
and the canary weight is advanced to receive traffic with less capacity
than expected.
Co-Authored-By: Joshua Gibeon <joshuagibeon7719@gmail.com>
Co-authored-by: Sanskar Jaiswal <hey@aryan.lol>
Signed-off-by: Andy Librian <andylibrian@gmail.com>
Without this change, the HTTPProxy `podinfo.test` was not getting created due to the following warning:
```
test 4m11s Warning Synced canary/podinfo HTTPProxy podinfo.test create error: HTTPProxy.projectcontour.io "podinfo" is invalid: spec.routes.retryPolicy.retryOn: Unsupported value: "": supported values: "5xx", "gateway-error", "reset", "connect-failure", "retriable-4xx", "refused-stream", "retriable-status-codes", "retriable-headers", "cancelled", "deadline-exceeded", "internal", "resource-exhausted", "unavailable"
```
Signed-off-by: Mae Anne Large <Mpluya@users.noreply.github.com>
Signed-off-by: Mae Large <mlarge@vmware.com>
Prevents the canary from getting triggered when a canary deploy is
updated to match the primary deploy after an analysis fails.
Signed-off-by: Sanskar Jaiswal <sanskar.jaiswal@weave.works>
Adds Gateway API as a provider for progressive traffic shifting, A/B
testing and Blue-Green testing. Adds a new field in the Canary
`spec.service.gatewayRefs` which specifies the Gateway that Flagger
should use.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
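A sketch of the new field (gateway name and namespace are illustrative):
```yaml
  service:
    port: 9898
    # Gateway API gateways that the generated HTTPRoute should be attached to
    gatewayRefs:
      - name: gateway
        namespace: istio-system
```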
This allows forbidding access from canaries in non-whitelisted
namespaces.
In a multi-tenant context, this can be combined with network policies to
maintain isolation between namespaces.
Signed-off-by: Cédric Connes <cedric.connes@gmail.com>
Signed-off-by: Author Name <johnzhengaz@gmail.com>
It is easy to hit "Halt advancement no values found for istio metric request-success-rate probably podinfo.test is not receiving traffic: running query failed: no values found"
if there is an inconsistency between the Prometheus version and the Istio version.
Signed-off-by: John Zheng <john.zheng@hp.com>
Signed-off-by: Karl Heins <karlheins@northwesternmutual.com>
Support updating primary Deployment/DaemonSet/HPA/Service labels and annotations after first-time rollout
Also bring it up to the same format as all the other
MAINTAINERS files (needed for fluxcd/community#155).
Signed-off-by: Daniel Holbach <daniel@weave.works>
In my own project I reference this nice script; the go.sum looks like this (after running `go mod tidy` and
`go mod download`):
```
k8s.io/apimachinery v0.22.1/go.mod h1:O3oNtNadZdeOMxHFVxOreoznohCpy0z6mocxbZr7oJ0=
k8s.io/apiserver v0.22.1/go.mod h1:2mcM6dzSt+XndzVQJX21Gx0/Klo7Aen7i0Ai6tIa400=
k8s.io/client-go v0.22.1 h1:jW0ZSHi8wW260FvcXHkIa0NLxFBQszTlhiAVsU5mopw=
k8s.io/client-go v0.22.1/go.mod h1:BquC5A4UOo4qVDUtoc04/+Nxp1MeHcVc1HJm1KmG8kk=
k8s.io/code-generator v0.22.1/go.mod h1:eV77Y09IopzeXOJzndrDyCI88UBok2h6WxAlBwpxa+o=
k8s.io/component-base v0.22.1 h1:SFqIXsEN3v3Kkr1bS6rstrs1wd45StJqbtgbQ4nRQdo=
```
As you can see, sometimes the go.sum only has a `version/go.mod` line;
if we run the script, it will fail like this:
chmod: cannot access '/home/longkai/pkg/mod/k8s.io/code-generator@v0.22.1/go.mod/generate-groups.sh': Not a directory
so this PR fixes this.
Finally, the list is sorted by version in ascending order, and we want to choose the newer one.
Signed-off-by: longkai <im.longkai@gmail.com>
If a "primary" ConfigMap or Secret already exists, keep the list of
ownerReferences and append the updating Canary as ownerReference if it's
not already in the list. This will prevent the GC from deleting primary
ConfigMaps and Secrets used by multiple primary deployments when one is
deleted.
Signed-off-by: Zacharias Taubert <zacharias.taubert@gmail.com>
Some third party software relies on annotations and labels on Istio's VirtualServices. For instance, external-dns makes use of the `external-dns.alpha.kubernetes.io/controller` annotation. Currently there is no way to set labels and annotations on the VirtualService resource.
This change takes the metadata from the `canary.Spec.Service.Apex` property to replicate exactly what is already possible for a traefik resource:
c36a13ccff/pkg/router/traefik.go (L59-L68)
Fixes #854
Signed-off-by: Jonny Langefeld <jonny.langefeld@gmail.com>
A minor issue I stumbled across while learning how to drive Flagger, is that the docs still use `istio_request_duration_seconds_bucket` to illustrate the query behind the `request-duration` metric. I understand that this changed with Istio 1.5 (https://github.com/fluxcd/flagger/issues/478), but it seems that in the current version of flagger, the correct metric must already be used, since I'm getting duration metrics out of Istio 1.10 :)
This change simply makes the docs clearer for those of us trying to understand exactly what `request-duration` entails!
Signed-off-by: David Young <davidy@funkypenguin.co.nz>
In the SMI TrafficSplit spec, Weight and Service are
required values for TrafficSplit Backend.
In flagger's SMI v1alpha2 implementation,
Service and Weight have the omitempty json option.
During canary analysis, flagger initially creates
a SMI TrafficSplit custom resource in which the
canary backend service has a Weight of 0.
The omitempty option causes Go to omit Weight
when it sends the custom resource to Kubernetes.
This throws an error during canary analysis.
Signed-off-by: Johnson Shi <Johnson.Shi@microsoft.com>
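To illustrate the failure mode, a sketch of the TrafficSplit that results when the zero weight is dropped by `omitempty` (names are illustrative; field types depend on the SMI version in use):
```yaml
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: podinfo
spec:
  service: podinfo
  backends:
    - service: podinfo-primary
      weight: 100
    - service: podinfo-canary
      # weight: 0 was omitted by omitempty, so a required field is missing
```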
- breaking change: drop support for Ingress `k8s.io/api/networking/v1beta1`
- routing: use Ingress `k8s.io/api/networking/v1` for NGINX and Skipper routers
- e2e: update ingress-nginx v0.46.0 and skipper to v0.13.61
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
The Deployment of the Flagger loadtester did not contain the correct
label app.kubernetes.io/name. This label is used for the Flagger
deployment and it is also used in the PodDisruptionBudget for
the Flagger operator. I added the same label to the Flagger
load tester to make the PodDisruptionBudget work correctly
for the Flagger loadtester.
Signed-off-by: Marcus Rodan <marcusrodan@gmail.com>
Use curly braces to specify an array value in helm set.
The latest versions of the chart need to have the additional arguments specified as a list or they error out:
```
Error: template: traefik/templates/_podtemplate.tpl:199:20: executing "traefik.podTemplate" at <.>: range can't iterate over --metrics.prometheus=true
```
Signed-off-by: Carson Anderson <carson.anderson@getweave.com>
This commit updates the linkerd version to `2.10`, along with
the install script to download the arm version.
It also updates the install script and metricsTemplate to install
and use the viz Prometheus respectively.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
In Linkerd 2.10, the Prometheus instance moved into the `viz`
extension, which is installed separately from the core
control plane. This means that Prometheus now exists in
the `linkerd-viz` namespace by default unless overridden.
This PR updates the URL to reflect this.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
The Prometheus deployment created by the Helm chart is missing a pull secret;
this variable is necessary to pull the Prometheus image from a private repository.
Signed-off-by: Joseph Villarreal Lopez <lapeyus@gmail.com>
and integrates with various Kubernetes ingress controllers, service mesh and monitoring solutions.
Flagger works with service mesh solutions (Istio, Linkerd, AWS App Mesh) and with Kubernetes ingress controllers (NGINX, Skipper, Gloo, Contour, Traefik).
Flagger can be configured to send alerts to various chat platforms such as Slack, Microsoft Teams, Discord and Rocket.
Flagger is a [Cloud Native Computing Foundation](https://cncf.io/) project
and part of the [Flux](https://fluxcd.io) family of GitOps tools.
`metricsServer` | Prometheus URL, used when `prometheus.install` is `false` | `http://prometheus.istio-system:9090`
`prometheus.install` | If `true`, installs Prometheus configured to scrape all pods in the cluster | `false`
`prometheus.retention`| Prometheus data retention | `2h`
`selectorLabels` | List of labels that Flagger uses to create pod selectors | `app,name,app.kubernetes.io/name`
`configTracking.enabled` | If `true`, flagger will track changes in Secrets and ConfigMaps referenced in the target deployment | `true`
`eventWebhook` | If set, Flagger will publish events to the given webhook | None
`slack.url` | Slack incoming webhook | None
`slack.channel` | Slack channel | None
`slack.user` | Slack username | `flagger`
`msteams.url` | Microsoft Teams incoming webhook | None
`podMonitor.enabled` | If `true`, create a PodMonitor for [monitoring the metrics](https://docs.flagger.app/usage/monitoring#metrics) | `false`
`podMonitor.namespace` | Namespace where the PodMonitor is created | the same namespace
`podMonitor.interval` | Interval at which metrics should be scraped | `15s`
`podMonitor.podMonitor` | Additional labels to add to the PodMonitor | `{}`
`leaderElection.enabled` | If `true`, Flagger will run in HA mode | `false`
`leaderElection.replicaCount` | Number of replicas | `1`
`serviceAccount.create` | If `true`, Flagger will create service account | `true`
`serviceAccount.name` | The name of the service account to create or use. If not set and `serviceAccount.create` is `true`, a name is generated using the Flagger fullname | `""`
`serviceAccount.annotations` | Annotations for service account | `{}`
`ingressAnnotationsPrefix` | Annotations prefix for ingresses | `custom.ingress.kubernetes.io`
`includeLabelPrefix` | List of prefixes of labels that are copied when creating primary deployments or daemonsets. Use * to include all | `""`
`rbac.create` | If `true`, create and use RBAC resources | `true`
`rbac.pspEnabled` | If `true`, create and use a restricted pod security policy | `false`
`crd.create` | If `true`, create Flagger's CRDs (should be enabled for Helm v2 only) | `false`
`resources.requests/cpu` | Pod CPU request | `10m`
`resources.requests/memory` | Pod memory request | `32Mi`
`resources.limits/cpu` | Pod CPU limit | `1000m`
`resources.limits/memory` | Pod memory limit | `512Mi`
`affinity` | Node/pod affinities | None
`nodeSelector` | Node labels for pod assignment | `{}`
`threadiness` | Number of controller workers | `2`
`tolerations` | List of node taints to tolerate | `[]`
`istio.kubeconfig.secretName` | The name of the Kubernetes secret containing the Istio shared control plane kubeconfig | None
`istio.kubeconfig.key` | The name of Kubernetes secret data key that contains the Istio control plane kubeconfig | `kubeconfig`
`ingressAnnotationsPrefix` | Annotations prefix for NGINX ingresses | None
`ingressClass` | Ingress class used for annotating HTTPProxy objects, e.g. `contour` | None
`podPriorityClassName` | PriorityClass name for pod priority configuration | ""
`podDisruptionBudget.enabled` | A PodDisruptionBudget will be created if `true` | `false`
`podDisruptionBudget.minAvailable` | The minimal number of available replicas that will be set in the PodDisruptionBudget | `1`
| `metricsServer` | Prometheus URL, used when `prometheus.install` is `false` | `http://prometheus.istio-system:9090` |
| `prometheus.install` | If `true`, installs Prometheus configured to scrape all pods in the cluster | `false` |
| `prometheus.retention` | Prometheus data retention | `2h` |
| `selectorLabels` | List of labels that Flagger uses to create pod selectors | `app,name,app.kubernetes.io/name` |
| `serviceMonitor.enabled` | If `true`, creates service and serviceMonitor for monitoring Flagger metrics | `false` |
| `serviceMonitor.honorLabels` | If `true`, label conflicts are resolved by keeping label values from the scraped data and ignoring the conflicting server-side labels | `false` |
| `serviceMonitor.namespace` | Namespace Servicemonitor is installed in | the same namespace |
| `serviceMonitor.labels` | labels for the ServiceMonitor passed to Prometheus Operator | `{}` |
| `configTracking.enabled` | If `true`, flagger will track changes in Secrets and ConfigMaps referenced in the target deployment | `true` |
| `eventWebhook` | If set, Flagger will publish events to the given webhook | None |
| `slack.url` | Slack incoming webhook | None |
| `slack.proxyUrl` | Slack proxy url | None |
| `slack.channel` | Slack channel | None |
| `slack.user` | Slack username | `flagger` |
| `msteams.url` | Microsoft Teams incoming webhook | None |
| `msteams.proxyUrl` | Microsoft Teams proxy url | None |
| `clusterName` | When specified, Flagger will add the cluster name to alerts | `""` |
| `podMonitor.enabled` | If `true`, create a PodMonitor for [monitoring the metrics](https://docs.flagger.app/usage/monitoring#metrics) | `false` |
| `podMonitor.namespace` | Namespace where the PodMonitor is created | the same namespace |
| `podMonitor.interval` | Interval at which metrics should be scraped | `15s` |
| `podMonitor.podMonitor` | Additional labels to add to the PodMonitor | `{}` |
| `podMonitor.honorLabels` | If `true`, label conflicts are resolved by keeping label values from the scraped data and ignoring the conflicting server-side labels | `false` |
| `leaderElection.enabled` | If `true`, Flagger will run in HA mode | `false` |
| `leaderElection.replicaCount` | Number of replicas | `1` |
| `serviceAccount.create` | If `true`, Flagger will create service account | `true` |
| `serviceAccount.name` | The name of the service account to create or use. If not set and `serviceAccount.create` is `true`, a name is generated using the Flagger fullname | `""` |
| `serviceAccount.annotations` | Annotations for service account | `{}` |
| `ingressAnnotationsPrefix` | Annotations prefix for ingresses | `custom.ingress.kubernetes.io` |
| `includeLabelPrefix` | List of prefixes of labels that are copied when creating primary deployments or daemonsets. Use * to include all | `""` |
| `rbac.create` | If `true`, create and use RBAC resources | `true` |
| `rbac.pspEnabled` | If `true`, create and use a restricted pod security policy | `false` |
| `crd.create` | If `true`, create Flagger's CRDs (should be enabled for Helm v2 only) | `false` |
| `resources.requests/cpu` | Pod CPU request | `10m` |
| `resources.requests/memory` | Pod memory request | `32Mi` |
| `resources.limits/cpu` | Pod CPU limit | `1000m` |
| `resources.limits/memory` | Pod memory limit | `512Mi` |
"expr":"histogram_quantile(0.50, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$primary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le))",
"expr":"histogram_quantile(0.50, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$primary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le)) / 1000",
"format":"time_series",
"interval":"",
"intervalFactor":1,
@@ -411,7 +411,7 @@
"refId":"A"
},
{
"expr":"histogram_quantile(0.90, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$primary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le))",
"expr":"histogram_quantile(0.90, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$primary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le)) / 1000",
"format":"time_series",
"hide":false,
"intervalFactor":1,
@@ -419,7 +419,7 @@
"refId":"B"
},
{
"expr":"histogram_quantile(0.99, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$primary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le))",
"expr":"histogram_quantile(0.99, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$primary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le)) / 1000",
"format":"time_series",
"hide":false,
"intervalFactor":1,
@@ -509,7 +509,7 @@
"steppedLine":false,
"targets":[
{
"expr":"histogram_quantile(0.50, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$canary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le))",
"expr":"histogram_quantile(0.50, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$canary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le)) / 1000",
"format":"time_series",
"interval":"",
"intervalFactor":1,
@@ -517,7 +517,7 @@
"refId":"A"
},
{
"expr":"histogram_quantile(0.90, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$canary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le))",
"expr":"histogram_quantile(0.90, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$canary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le)) / 1000",
"format":"time_series",
"hide":false,
"intervalFactor":1,
@@ -525,7 +525,7 @@
"refId":"B"
},
{
"expr":"histogram_quantile(0.99, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$canary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le))",
"expr":"histogram_quantile(0.99, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$canary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le)) / 1000",
description: Flagger's load testing services based on rakyll/hey and bojand/ghz that generates traffic during canary analysis when configured as a webhook.
flag.StringVar(&eventWebhook, "event-webhook", "", "Webhook for publishing flagger events")
flag.StringVar(&msteamsURL, "msteams-url", "", "MS Teams incoming webhook URL.")
flag.StringVar(&msteamsProxyURL, "msteams-proxy-url", "", "MS Teams proxy URL.")
flag.StringVar(&includeLabelPrefix, "include-label-prefix", "", "List of prefixes of labels that are copied when creating primary deployments or daemonsets. Use * to include all.")
flag.StringVar(&namespace, "namespace", "", "Namespace that flagger would watch canary object.")
flag.StringVar(&meshProvider, "mesh-provider", "istio", "Service mesh provider, can be istio, linkerd, appmesh, contour, gloo, nginx, skipper or traefik.")
flag.StringVar(&meshProvider, "mesh-provider", "istio", "Service mesh provider, can be istio, linkerd, appmesh, contour, knative, gloo, nginx, skipper, traefik, apisix, osm or kuma.")
flag.StringVar(&selectorLabels, "selector-labels", "app,name,app.kubernetes.io/name", "List of pod labels that Flagger uses to create pod selectors.")
flag.StringVar(&ingressAnnotationsPrefix, "ingress-annotations-prefix", "nginx.ingress.kubernetes.io", "Annotations prefix for NGINX ingresses.")
flag.StringVar(&ingressClass, "ingress-class", "", "Ingress class used for annotating HTTPProxy objects.")
@@ -109,9 +121,12 @@ func init() {
flag.BoolVar(&enableConfigTracking, "enable-config-tracking", true, "Enable secrets and configmaps tracking.")
@@ -4,27 +4,38 @@ description: Flagger is a progressive delivery Kubernetes operator
# Introduction
[Flagger](https://github.com/weaveworks/flagger) is a **Kubernetes** operator that automates the promotion of canary deployments using **Istio**, **Linkerd**, **App Mesh**, **NGINX**, **Skipper**, **Contour**, **Gloo** or **Traefik** routing for traffic shifting and **Prometheus** metrics for canary analysis. The canary analysis can be extended with webhooks for running system integration/acceptance tests, load tests, or any other custom validation.
[Flagger](https://github.com/fluxcd/flagger) is a progressive delivery tool that automates the release
process for applications running on Kubernetes. It reduces the risk of introducing a new software
version in production by gradually shifting traffic to the new version while measuring metrics
and running conformance tests.
Flagger implements a control loop that gradually shifts traffic to the canary while measuring key performance indicators like HTTP requests success rate, requests average duration and pods health. Based on analysis of the **KPIs** a canary is promoted or aborted, and the analysis result is published to **Slack** or **MS Teams**.
Flagger can be configured with Kubernetes custom resources and is compatible with any CI/CD solutions made for Kubernetes. Since Flagger is declarative and reacts to Kubernetes events, it can be used in **GitOps** pipelines together with Flux CD or JenkinsX.
Flagger can be configured with Kubernetes custom resources and is compatible with
any CI/CD solutions made for Kubernetes. Since Flagger is declarative and reacts to Kubernetes events,
it can be used in **GitOps** pipelines together with tools like [Flux CD](install/flagger-install-with-flux.md).
This project is sponsored by [Weaveworks](https://www.weave.works/)
Flagger is a [Cloud Native Computing Foundation](https://cncf.io/) graduated project
and part of the [Flux](https://fluxcd.io) family of GitOps tools.
## Getting started
To get started with Flagger, chose one of the supported routing providers and [install](install/flagger-install-on-kubernetes.md) Flagger with Helm or Kustomize.
To get started with Flagger, choose one of the supported routing providers and
[install](install/flagger-install-with-flux.md) Flagger with Flux CD.
After install Flagger, you can follow one of the tutorials:
After installing Flagger, you can follow one of these tutorials to get started:
@@ -8,16 +8,13 @@ Flagger is written in Go and uses Go modules for dependency management.
On your dev machine install the following tools:
* go >= 1.14
* git >= 2.20
* bash >= 5.0
* make >= 3.81
* kubectl >= 1.16
* kustomize >= 3.5
* helm >= 3.0
* docker >= 19.03
* go >= 1.25
* kubectl >= 1.30
* kustomize >= 5.0
* helm >= 3.0
You'll also need a Kubernetes cluster for testing Flagger. You can use Minikube, Kind, Docker desktop or any remote cluster \(AKS/EKS/GKE/etc\) Kubernetes version 1.14 or newer.
You'll also need a Kubernetes cluster for testing Flagger.
You can use Minikube, Kind, Docker desktop or any remote cluster (AKS/EKS/GKE/etc).
To start contributing to Flagger, fork the [repository](https://github.com/fluxcd/flagger) on GitHub.
@@ -99,6 +96,8 @@ make codegen
Run code formatters:
```bash
go install golang.org/x/tools/cmd/goimports@latest
make fmt
```
@@ -126,7 +125,9 @@ Note that any change to the CRDs must be accompanied by an update to the Open AP
## Manual testing
Install a service mesh and/or an ingress controller on your cluster and deploy Flagger using one of the install options [listed here](https://docs.flagger.app/install/flagger-install-on-kubernetes).
Install a service mesh and/or an ingress controller on your cluster
and deploy Flagger using one of the install options
**When should I use A/B testing instead of progressive traffic shifting?**
#### When should I use A/B testing instead of progressive traffic shifting?
For frontend applications that require session affinity you should use HTTP headers or cookies match conditions to ensure a set of users will stay on the same version for the whole duration of the canary analysis.
For frontend applications that require session affinity, you should use HTTP headers or
cookie match conditions to ensure a set of users will stay on the same version for
the whole duration of the canary analysis.
**Can I use Flagger to manage applications that live outside of a service mesh?**
#### Can I use Flagger to manage applications that live outside of a service mesh?
For applications that are not deployed on a service mesh, Flagger can orchestrate Blue/Green style deployments with Kubernetes L4 networking.
For applications that are not deployed on a service mesh,
Flagger can orchestrate Blue/Green style deployments with Kubernetes L4 networking.
**When can I use traffic mirroring?**
#### When can I use traffic mirroring?
Traffic mirroring can be used for Blue/Green deployment strategy or a pre-stage in a Canary release. Traffic mirroring will copy each incoming request, sending one request to the primary and one to the canary service. Mirroring should be used for requests that are **idempotent** or capable of being processed twice \(once by the primary and once by the canary\).
Traffic mirroring can be used for Blue/Green deployment strategy or a pre-stage in a Canary release.
Traffic mirroring will copy each incoming request, sending one request to the primary and one to the canary service.
Mirroring should be used for requests that are **idempotent**
or capable of being processed twice (once by the primary and once by the canary).
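A sketch of a canary analysis that uses mirroring instead of weight shifting, assuming `mirror` and `mirrorWeight` analysis fields (values are illustrative):
```yaml
  analysis:
    interval: 1m
    iterations: 10
    threshold: 2
    # copy incoming requests to the canary instead of shifting traffic weight
    mirror: true
    # percentage of traffic to mirror
    mirrorWeight: 100
```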
**How to retry a failed release?**
#### How to retry a failed release?
A canary analysis is triggered by changes in any of the following objects:
* ConfigMaps mounted as volumes or mapped to environment variables
* Secrets mounted as volumes or mapped to environment variables
@@ -43,11 +49,45 @@ spec:
timestamp:"2020-03-10T14:24:48+0000"
```
#### How to change replicas for a deployment when not using HPA?
To change replicas for a deployment when not using HPA, you have to update the canary deployment with the desired replica count
and trigger an analysis by annotating the template. After the analysis finishes, Flagger will promote the `spec.replicas` changes to the primary deployment.
Example:
```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 4 # update replicas
  template:
    metadata:
      annotations:
        timestamp: "2022-02-10T14:24:48+0000" # add annotation to trigger analysis
```
#### Why is there a window of downtime during the canary initializing process when analysis is disabled?
A window of downtime is the intended behavior when the analysis is disabled. This allows instant rollback and also mimics the way
a Kubernetes deployment initialization works. To avoid this, enable the analysis (`skipAnalysis: false`), wait for the initialization
to finish, and disable it afterward (`skipAnalysis: true`).
#### How to disable cross namespace references?
Flagger by default can access resources across namespaces (`AlertProvider`, `MetricProvider` and Gloo `Upstream`).
If you're in a multi-tenant environment and wish to disable this, you can do so through the `no-cross-namespace-refs` flag.
```
flagger \
-no-cross-namespace-refs=true \
...
```
## Kubernetes services
**How is an application exposed inside the cluster?**
#### How is an application exposed inside the cluster?
Assuming the app name is podinfo you can define a canary like:
Assuming the app name is `podinfo`, you can define a canary like:
```yaml
apiVersion: flagger.app/v1beta1
@@ -71,23 +111,26 @@ spec:
portName: http
```
If the `service.name` is not specified, then `targetRef.name` is used for the apex domain and canary/primary services name prefix. You should treat the service name as an immutable field, changing it could result in routing conflicts.
If the `service.name` is not specified, then `targetRef.name` is used for
the apex domain and canary/primary services name prefix.
You should treat the service name as an immutable field; changing it could result in routing conflicts.
Based on the canary spec service, Flagger generates the following Kubernetes ClusterIP service:
This ensures that traffic coming from a namespace outside the mesh to `podinfo.test:9898` will be routed to the latest stable release of your app.
This ensures that traffic coming from a namespace outside the mesh to `podinfo.test:9898`
will be routed to the latest stable release of your app.
```yaml
apiVersion: v1
@@ -133,13 +176,16 @@ spec:
targetPort: http
```
The `podinfo-canary.test:9898` address is available only during the canary analysis and can be used for conformance testing or load testing.
The `podinfo-canary.test:9898` address is available only during the canary analysis
and can be used for conformance testing or load testing.
## Multiple ports
**My application listens on multiple ports, how can I expose them inside the cluster?**
#### My application listens on multiple ports. How can I expose them inside the cluster?
If port discovery is enabled, Flagger scans the deployment spec and extracts the containers ports excluding the port specified in the canary service and Envoy sidecar ports. These ports will be used when generating the ClusterIP services.
If port discovery is enabled, Flagger scans the deployment spec and extracts the container ports excluding
the port specified in the canary service and Envoy sidecar ports.
These ports will be used when generating the ClusterIP services.
For a deployment that exposes two ports:
@@ -183,7 +229,7 @@ Both port `8080` and `9090` will be added to the ClusterIP services.
## Label selectors
**What labels selectors are supported by Flagger?**
#### What label selectors are supported by Flagger?
The target deployment must have a single label selector in the format `app: <DEPLOYMENT-NAME>`:
@@ -202,13 +248,93 @@ spec:
app: podinfo
```
Besides `app` Flagger supports `name` and `app.kubernetes.io/name` selectors. If you use a different convention you can specify your label with the `-selector-labels` flag.
Besides `app`, Flagger supports `name` and `app.kubernetes.io/name` selectors.
If you use a different convention, you can specify your label with the `-selector-labels` flag.
> **Note** that the metric interval should be lower than or equal to the control loop interval.
**Can I use custom metrics?**
#### Can I use custom metrics?
The analysis can be extended with metrics provided by Prometheus, Datadog and AWS CloudWatch. For more details on how custom metrics can be used please read the [metrics docs](usage/metrics.md).
The analysis can be extended with metrics provided by Prometheus, Datadog, AWS CloudWatch, New Relic and Graphite.
For more details on how custom metrics can be used, please read the [metrics docs](usage/metrics.md).
#### Istio Gateway API
If you're using Istio with Gateway API, the Prometheus query needs to include `reporter="source"`. For example, to calculate HTTP requests error percentage, the query would be:
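The query itself is not shown here; a sketch of what such an error-percentage query could look like, wrapped in a MetricTemplate (label names follow standard Istio telemetry and should be treated as illustrative):
```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: istio-system
spec:
  provider:
    type: prometheus
    address: http://prometheus.istio-system:9090
  query: |
    100 - sum(
      rate(
        istio_requests_total{
          reporter="source",
          destination_workload_namespace="{{ namespace }}",
          destination_workload="{{ target }}",
          response_code!~"5.*"
        }[{{ interval }}]
      )
    )
    /
    sum(
      rate(
        istio_requests_total{
          reporter="source",
          destination_workload_namespace="{{ namespace }}",
          destination_workload="{{ target }}"
        }[{{ interval }}]
      )
    ) * 100
```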
Flagger creates an Istio Virtual Service and Destination Rules based on the Canary service spec. The service configuration lets you expose an app inside or outside the mesh. You can also define traffic policies, HTTP match conditions, URI rewrite rules, CORS policies, timeout and retries.
Flagger creates an Istio Virtual Service and Destination Rules based on the Canary service spec.
The service configuration lets you expose an app inside or outside the mesh. You can also define traffic policies,
HTTP match conditions, URI rewrite rules, CORS policies, timeout and retries.
The following spec exposes the `frontend` workload inside the mesh on `frontend.test.svc.cluster.local:9898` and outside the mesh on `frontend.example.com`. You'll have to specify an Istio ingress gateway for external hosts.
The following spec exposes the `frontend` workload inside the mesh on `frontend.test.svc.cluster.local:9898`
and outside the mesh on `frontend.example.com`. You'll have to specify an Istio ingress gateway for external hosts.
```yaml
apiVersion: flagger.app/v1beta1
@@ -401,7 +558,7 @@ spec:
portName: http-frontend
# Istio gateways (optional)
gateways:
- public-gateway.istio-system.svc.cluster.local
- istio-system/public-gateway
- mesh
# Istio virtual service host names (optional)
hosts:
@@ -443,7 +600,7 @@ spec:
For the above spec Flagger will generate the following virtual service:
```yaml
apiVersion: networking.istio.io/v1alpha3
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: frontend
@@ -457,7 +614,7 @@ metadata:
uid:3a4a40dd-3875-11e9-8e1d-42010a9c0fd1
spec:
gateways:
- public-gateway.istio-system.svc.cluster.local
- istio-system/public-gateway
- mesh
hosts:
- frontend.example.com
@@ -496,7 +653,7 @@ spec:
For each destination in the virtual service a rule is generated:
```yaml
apiVersion: networking.istio.io/v1alpha3
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: frontend-primary
@@ -507,7 +664,7 @@ spec:
tls:
mode: DISABLE
---
apiVersion: networking.istio.io/v1alpha3
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: frontend-canary
@@ -519,9 +676,11 @@ spec:
mode: DISABLE
```
Flagger keeps in sync the virtual service and destination rules with the canary service spec. Any direct modification to the virtual service spec will be overwritten.
Flagger keeps in sync the virtual service and destination rules with the canary service spec.
Any direct modification to the virtual service spec will be overwritten.
To expose a workload inside the mesh on `http://backend.test.svc.cluster.local:9898`, the service spec can contain only the container port and the traffic policy:
To expose a workload inside the mesh on `http://backend.test.svc.cluster.local:9898`,
the service spec can contain only the container port and the traffic policy:
```yaml
apiVersion: flagger.app/v1beta1
@@ -562,9 +721,11 @@ spec:
app: backend-primary
```
Flagger works for user facing apps exposed outside the cluster via an ingress gateway and for backend HTTP APIs that are accessible only from inside the mesh.
Flagger works for user facing apps exposed outside the cluster via an ingress gateway and for backend HTTP APIs
that are accessible only from inside the mesh.
If `Delegation` is enabled, Flagger would generate Istio VirtualService without hosts and gateway, making the service compatible with Istio delegation.
If `Delegation` is enabled, Flagger would generate Istio VirtualService without hosts and gateway,
making the service compatible with Istio delegation.
```yaml
apiVersion: flagger.app/v1beta1
@@ -590,7 +751,7 @@ spec:
Based on the above spec, Flagger will create the following virtual service:
```yaml
apiVersion: networking.istio.io/v1alpha3
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: backend
@@ -613,17 +774,17 @@ spec:
weight: 0
```
Therefore, The following virtual service forward the traffic to `/podinfo` by the above delegate VirtualService.
Therefore, the following virtual service forwards the traffic to `/podinfo` by the above delegate VirtualService.
```yaml
apiVersion: networking.istio.io/v1alpha3
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: frontend
namespace: test
spec:
gateways:
- public-gateway.istio-system.svc.cluster.local
- istio-system/public-gateway
- mesh
hosts:
- frontend.example.com
@@ -639,13 +800,17 @@ spec:
namespace: test
```
Note that pilot env `PILOT_ENABLE_VIRTUAL_SERVICE_DELEGATE` must also be set. \(For the use of Istio Delegation, you can refer to the documentation of [Virtual Service](https://istio.io/latest/docs/reference/config/networking/virtual-service/#Delegate) and [pilot environment variables](https://istio.io/latest/docs/reference/commands/pilot-discovery/#envvars).\)
Note that pilot env `PILOT_ENABLE_VIRTUAL_SERVICE_DELEGATE` must also be set.
For the use of Istio Delegation, you can refer to the documentation of
[Virtual Service](https://istio.io/latest/docs/reference/config/networking/virtual-service/#Delegate) and [pilot environment variables](https://istio.io/latest/docs/reference/commands/pilot-discovery/#envvars).
## Istio Ingress Gateway
**How can I expose multiple canaries on the same external domain?**
#### How can I expose multiple canaries on the same external domain?
Assuming you have two apps, one that servers the main website and one that serves the REST API. For each app you can define a canary object as:
Assuming you have two apps -- one that serves the main website and one that serves its REST API --
you can define a canary object for each app as:
```yaml
apiVersion: flagger.app/v1beta1
@@ -656,7 +821,7 @@ spec:
service:
port: 8080
gateways:
- public-gateway.istio-system.svc.cluster.local
- istio-system/public-gateway
hosts:
- my-site.com
match:
@@ -673,7 +838,7 @@ spec:
service:
port: 8080
gateways:
- public-gateway.istio-system.svc.cluster.local
- istio-system/public-gateway
hosts:
- my-site.com
match:
@@ -683,13 +848,17 @@ spec:
uri:/
```
Based on the above configuration, Flagger will create two virtual services bounded to the same ingress gateway and external host. Istio Pilot will [merge](https://istio.io/help/ops/traffic-management/deploy-guidelines/#multiple-virtual-services-and-destination-rules-for-the-same-host) the two services and the website rule will be moved to the end of the list in the merged configuration.
Based on the above configuration, Flagger will create two virtual services bounded
This guide walks you through setting up Flagger and AWS App Mesh on EKS.
## App Mesh
The App Mesh integration with EKS is made out of the following components:
* Kubernetes custom resources
* `mesh.appmesh.k8s.aws` defines a logical boundary for network traffic between the services
* `virtualnode.appmesh.k8s.aws` defines a logical pointer to a Kubernetes workload
* `virtualservice.appmesh.k8s.aws` defines the routing rules for a workload inside the mesh
* CRD controller - keeps the custom resources in sync with the App Mesh control plane
* Admission controller - injects the Envoy sidecar and assigns Kubernetes pods to App Mesh virtual nodes
* Telemetry service - Prometheus instance that collects and stores Envoy's metrics
## Create a Kubernetes cluster
In order to create an EKS cluster you can use [eksctl](https://eksctl.io). Eksctl is an open source command-line utility made by Weaveworks in collaboration with Amazon.
On MacOS you can install eksctl with Homebrew:
```bash
brew tap weaveworks/tap
brew install weaveworks/tap/eksctl
```
Create an EKS cluster with:
```bash
eksctl create cluster --name=appmesh \
--region=us-west-2 \
--nodes 3 \
--node-volume-size=120 \
--appmesh-access
```
The above command will create a two nodes cluster with App Mesh [IAM policy](https://docs.aws.amazon.com/app-mesh/latest/userguide/MESH_IAM_user_policies.html) attached to the EKS node instance role.
Verify the install with:
```bash
kubectl get nodes
```
## Install Helm
Install the [Helm](https://docs.helm.sh/using_helm/#installing-helm) v3 command-line tool:
In order to collect the App Mesh metrics that Flagger needs to run the canary analysis, you'll need to set up a Prometheus instance to scrape the Envoy sidecars.
Now that you have Flagger running, you can try the [App Mesh canary deployments tutorial](https://docs.flagger.app/usage/appmesh-progressive-delivery).
You will be creating a cluster on Google’s Kubernetes Engine \(GKE\), if you don’t have an account you can sign up [here](https://cloud.google.com/free/) for free credits.
Login into Google Cloud, create a project and enable billing for it.
Install the [gcloud](https://cloud.google.com/sdk/) command line utility and configure your project with `gcloud init`.
Set the default project \(replace `PROJECT_ID` with your own project\):
```text
gcloud config set project PROJECT_ID
```
Set the default compute region and zone:
```text
gcloud config set compute/region us-central1
gcloud config set compute/zone us-central1-a
```
Enable the Kubernetes and Cloud DNS services for your project:
The above command will create a default node pool consisting of two `n1-highcpu-4` \(vCPU: 4, RAM 3.60GB, DISK: 30GB\) preemptible VMs. Preemptible VMs are up to 80% cheaper than regular instances and are terminated and replaced after a maximum of 24 hours.
You should consider using SSL between Helm and Tiller, for more information on securing your Helm installation see [docs.helm.sh](https://docs.helm.sh/using_helm/#securing-your-helm-installation).
## Install cert-manager
Jetstack's [cert-manager](https://github.com/jetstack/cert-manager) is a Kubernetes operator that automatically creates and manages TLS certs issued by Let’s Encrypt.
You'll be using cert-manager to provision a wildcard certificate for the Istio ingress gateway.
Normal CertIssued 1m52s cert-manager Certificate issued successfully
```
Recreate Istio ingress gateway pods:
```bash
kubectl -n istio-system get pods -l istio=ingressgateway
```
Note that the Istio gateway doesn't reload the certificates from the TLS secret on cert-manager renewal. Since the GKE cluster is made out of preemptible VMs, the gateway pods will be replaced once every 24h. If you're not using preemptible nodes, then you need to manually delete the gateway pods every two months before the certificate expires.
## Install Prometheus
The GKE Istio add-on does not include a Prometheus instance that scrapes the Istio telemetry service. Because Flagger uses the Istio HTTP metrics to run the canary analysis you have to deploy the following Prometheus configuration that's similar to the one that comes with the official Istio Helm chart.
Find the GKE Istio version with:
```bash
kubectl -n istio-system get deploy istio-pilot -oyaml | grep image:
Note that Flagger depends on Istio telemetry and Prometheus, if you're installing Istio with istioctl then you should be using the [default profile](https://istio.io/docs/setup/additional-setup/config-profiles/).
Note that Flagger depends on Istio telemetry and Prometheus, if you're installing
Note that the Istio kubeconfig must be stored in a Kubernetes secret with a data key named `kubeconfig`. For more details on how to configure Istio multi-cluster credentials read the [Istio docs](https://istio.io/docs/setup/install/multicluster/shared-vpn/#credentials).
Note that the Istio kubeconfig must be stored in a Kubernetes secret with a data key named `kubeconfig`.
For more details on how to configure Istio multi-cluster
credentials read the [Istio docs](https://istio.io/docs/setup/install/multicluster/shared-vpn/#credentials).
The command removes all the Kubernetes components associated with the chart and deletes the release.
> **Note** that on uninstall the Canary CRD will not be removed. Deleting the CRD will make Kubernetes remove all the objects owned by Flagger like Istio virtual services, Kubernetes deployments and ClusterIP services.
> **Note** that on uninstall the Canary CRD will not be removed. Deleting the CRD will make Kubernetes
> remove all the objects owned by Flagger like Istio virtual services, Kubernetes deployments and ClusterIP services.
If you want to remove all the objects created by Flagger you have delete the Canary CRD with kubectl:
If you want to remove all the objects created by Flagger you have to delete the Canary CRD with kubectl:
This deploys Flagger and Prometheus in the `flagger-system` namespace, sets the metrics server URL to `http://flagger-prometheus.flagger-system:9090` and the mesh provider to `kubernetes`.
The Prometheus instance has a data retention of two hours and is configured to scrape all pods in your cluster that have the `prometheus.io/scrape: "true"` annotation.
To target a different provider you can specify it in the canary custom resource:
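A sketch of overriding the provider per canary via `spec.provider` (the value is illustrative):
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # overrides the mesh provider configured at the Flagger level for this canary only
  provider: istio
```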
Install Flagger for Istio with Slack notifications:
```bash
kustomize build . | kubectl apply -f -
```
If you want to use MS Teams instead of Slack, replace `-slack-url` with `-msteams-url` and set the webhook address to `https://outlook.office.com/webhook/YOUR/TEAMS/WEBHOOK`.
Flagger requires a Kubernetes cluster **v1.19** or newer, Apache APISIX **v2.15** or newer, and Apache APISIX Ingress Controller **v1.5.0** or newer.
Install Apache APISIX and Apache APISIX Ingress Controller with Helm v3:
Install Flagger and the Prometheus add-on in the same namespace as Apache APISIX:
```bash
helm repo add flagger https://flagger.app
helm upgrade -i flagger flagger/flagger \
--namespace apisix \
--set prometheus.install=true \
--set meshProvider=apisix
```
## Bootstrap
Flagger takes a Kubernetes deployment and optionally a horizontal pod autoscaler \(HPA\), then creates a series of objects \(Kubernetes deployments, ClusterIP services and an ApisixRoute\). These objects expose the application outside the cluster and drive the canary analysis and promotion.
Create a test namespace:
```bash
kubectl create ns test
```
Create a deployment and a horizontal pod autoscaler:
Create an Apache APISIX `ApisixRoute`; Flagger will reference it and generate the canary Apache APISIX `ApisixRoute` \(replace `app.example.com` with your own domain\):
```yaml
apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: podinfo
  namespace: test
spec:
  http:
    - backends:
        - serviceName: podinfo
          servicePort: 80
      match:
        hosts:
          - app.example.com
        methods:
          - GET
        paths:
          - /*
      name: method
      plugins:
        - name: prometheus
          enable: true
          config:
            disable: false
            prefer_name: true
```
Save the above resource as podinfo-apisixroute.yaml and then apply it:
```bash
kubectl apply -f ./podinfo-apisixroute.yaml
```
Create a canary custom resource \(replace `app.example.com` with your own domain\):
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  provider: apisix
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # apisix route reference
  routeRef:
    apiVersion: apisix.apache.org/v2
    kind: ApisixRoute
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  service:
    # ClusterIP port number
    port: 80
    # container port number or name
    targetPort: 9898
  analysis:
    # schedule interval (default 60s)
    interval: 10s
    # max number of failed metric checks before rollback
Flagger implements a control loop that gradually shifts traffic to the canary while measuring key performance indicators like HTTP requests success rate, requests average duration and pod health. Based on analysis of the KPIs a canary is promoted or aborted, and the analysis result is published to Slack or MS Teams.
Warning Synced 2m59s flagger podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less than desired generation
Warning Synced 2m50s flagger podinfo-primary.test not ready: waiting for rollout to finish: 0 of 1 (readyThreshold 100%) updated replicas are available
Normal Synced 2m40s (x3 over 2m59s) flagger all the metrics providers are available!
Normal Synced 2m39s flagger Initialization done! podinfo.test
Normal Synced 2m20s flagger New revision detected! Scaling up podinfo.test
Warning Synced 2m (x2 over 2m10s) flagger canary deployment podinfo.test not ready: waiting for rollout to finish: 0 of 1 (readyThreshold 100%) updated replicas are available
Normal Synced 110s flagger Starting canary analysis for podinfo.test
Normal Synced 109s flagger Advance podinfo.test canary weight 10
Warning Synced 100s flagger Halt advancement no values found for apisix metric request-success-rate probably podinfo.test is not receiving traffic: running query failed: no values found
Normal Synced 90s flagger Advance podinfo.test canary weight 20
Normal Synced 80s flagger Advance podinfo.test canary weight 30
Normal Synced 69s flagger Advance podinfo.test canary weight 40
Normal Synced 59s flagger Advance podinfo.test canary weight 50
Warning Synced 30s (x2 over 40s) flagger podinfo-primary.test not ready: waiting for rollout to finish: 1 old replicas are pending termination
Normal Synced 9s (x3 over 50s) flagger (combined from similar events): Promotion completed! Scaling down podinfo.test
```
**Note** that if you apply new changes to the deployment during the canary analysis, Flagger will restart the analysis.
You can monitor all canaries with:
```bash
watch kubectl get canaries --all-namespaces
NAMESPACE NAME STATUS WEIGHT LASTTRANSITIONTIME
test podinfo-2 Progressing 10 2022-11-23T05:00:54Z
test podinfo Succeeded 0 2022-11-23T06:00:54Z
```
## Automated rollback
During the canary analysis you can generate HTTP 500 errors to test if Flagger pauses and rolls back the faulted version.
When the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary, the canary is scaled to zero and the rollout is marked as failed.
Edit the canary analysis and add the not found error rate check:
```yaml
analysis:
  metrics:
    - name: "404s percentage"
      templateRef:
        name: not-found-percentage
      thresholdRange:
        max: 5
      interval: 1m
```
The above configuration validates the canary by checking if the HTTP 404 req/sec percentage is below 5 percent of the total traffic. If the 404s rate reaches the 5% threshold, then the canary fails.
The above procedures can be extended with more [custom metrics](../usage/metrics.md) checks, [webhooks](../usage/webhooks.md), [manual promotion](../usage/webhooks.md#manual-gating) approval and [Slack or MS Teams](../usage/alerting.md) notifications.
This guide shows you how to use App Mesh and Flagger to automate canary deployments. You'll need an EKS cluster configured with App Mesh; you can find the installation guide [here](https://docs.flagger.app/install/flagger-install-on-eks-appmesh).
## Bootstrap
Flagger takes a Kubernetes deployment and optionally a horizontal pod autoscaler \(HPA\), then creates a series of objects \(Kubernetes deployments, ClusterIP services, App Mesh virtual nodes and services\). These objects expose the application on the mesh and drive the canary analysis and promotion. The only App Mesh object you need to create by yourself is the mesh resource.
Create a mesh called `global`:
```bash
cat << EOF | kubectl apply -f -
apiVersion: appmesh.k8s.aws/v1beta2
kind: Mesh
metadata:
name: global
spec:
namespaceSelector:
matchLabels:
appmesh.k8s.aws/sidecarInjectorWebhook: enabled
EOF
```
Create a test namespace with App Mesh sidecar injection enabled:
```bash
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
name: test
labels:
appmesh.k8s.aws/sidecarInjectorWebhook: enabled
EOF
```
Create a deployment and a horizontal pod autoscaler:
After the bootstrap, the podinfo deployment will be scaled to zero and the traffic to `podinfo.test` will be routed to the primary pods. During the canary analysis, the `podinfo-canary.test` address can be used to directly target the canary pods.
App Mesh blocks all egress traffic by default. If your application needs to call another service, you have to create an App Mesh virtual service for it and add the virtual service name to the backend list.
* ConfigMaps and Secrets mounted as volumes or mapped to environment variables
Trigger a canary deployment by updating the container image:
```bash
kubectl -n test set image deployment/podinfo \
podinfod=stefanprodan/podinfo:3.1.1
```
Flagger detects that the deployment revision changed and starts a new rollout:
```text
kubectl -n test describe canary/podinfo
Status:
Canary Weight: 0
Failed Checks: 0
Phase: Succeeded
Events:
New revision detected! Scaling up podinfo.test
Waiting for podinfo.test rollout to finish: 0 of 1 updated replicas are available
Pre-rollout check acceptance-test passed
Advance podinfo.test canary weight 5
Advance podinfo.test canary weight 10
Advance podinfo.test canary weight 15
Advance podinfo.test canary weight 20
Advance podinfo.test canary weight 25
Advance podinfo.test canary weight 30
Advance podinfo.test canary weight 35
Advance podinfo.test canary weight 40
Advance podinfo.test canary weight 45
Advance podinfo.test canary weight 50
Copying podinfo.test template spec to podinfo-primary.test
Waiting for podinfo-primary.test rollout to finish: 1 of 2 updated replicas are available
Routing all traffic to primary
Promotion completed! Scaling down podinfo.test
```
When the canary analysis starts, Flagger will call the pre-rollout webhooks before routing traffic to the canary.
**Note** that if you apply new changes to the deployment during the canary analysis, Flagger will restart the analysis.
During the analysis the canary’s progress can be monitored with Grafana. The App Mesh dashboard URL is [http://localhost:3000/d/flagger-appmesh/appmesh-canary?refresh=10s&orgId=1&var-namespace=test&var-primary=podinfo-primary&var-canary=podinfo](http://localhost:3000/d/flagger-appmesh/appmesh-canary?refresh=10s&orgId=1&var-namespace=test&var-primary=podinfo-primary&var-canary=podinfo).
When the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary, the canary is scaled to zero and the rollout is marked as failed.
New revision detected! progressing canary analysis for podinfo.test
Pre-rollout check acceptance-test passed
Advance podinfo.test canary weight 5
Advance podinfo.test canary weight 10
Advance podinfo.test canary weight 15
Halt podinfo.test advancement success rate 69.17% < 99%
Halt podinfo.test advancement success rate 61.39% < 99%
Halt podinfo.test advancement success rate 55.06% < 99%
Halt podinfo.test advancement request duration 1.20s > 0.5s
Halt podinfo.test advancement request duration 1.45s > 0.5s
Rolling back podinfo.test failed checks threshold reached 5
Canary failed! Scaling down podinfo.test
```
If you’ve enabled the Slack notifications, you’ll receive a message if the progress deadline is exceeded, or if the analysis reached the maximum number of failed checks:
Besides weighted routing, Flagger can be configured to route traffic to the canary based on HTTP match conditions. In an A/B testing scenario, you'll be using HTTP headers or cookies to target a certain segment of your users. This is particularly useful for frontend applications that require session affinity.
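A sketch of an A/B testing analysis driven by an HTTP header match, assuming the `analysis.match` field (header name and value are illustrative):
```yaml
  analysis:
    interval: 1m
    threshold: 5
    iterations: 10
    # route only requests carrying this header to the canary
    match:
      - headers:
          x-canary:
            exact: "insider"
```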
New revision detected! progressing canary analysis for podinfo.test
Advance podinfo.test canary iteration 1/10
Advance podinfo.test canary iteration 2/10
Advance podinfo.test canary iteration 3/10
Advance podinfo.test canary iteration 4/10
Advance podinfo.test canary iteration 5/10
Advance podinfo.test canary iteration 6/10
Advance podinfo.test canary iteration 7/10
Advance podinfo.test canary iteration 8/10
Advance podinfo.test canary iteration 9/10
Advance podinfo.test canary iteration 10/10
Copying podinfo.test template spec to podinfo-primary.test
Waiting for podinfo-primary.test rollout to finish: 1 of 2 updated replicas are available
Routing all traffic to primary
Promotion completed! Scaling down podinfo.test
```
For an in-depth look at the analysis process read the [usage docs](../usage/how-it-works.md).