Add support for v1 of Gateway API `HTTPRoute`. Drop support for v1alpha2
as it was deprecated almost a year ago.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Add a new field `.spec.webhooks[].retries` to specify the number of
retries when calling a webhook.
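For illustration, a sketch of how the field might be set, assuming it lives on each webhook entry under the canary analysis (webhook name and URL are illustrative):
```yaml
  analysis:
    webhooks:
      - name: load-test            # illustrative webhook name
        url: http://flagger-loadtester.test/
        timeout: 5s
        # number of retries when calling the webhook
        retries: 3
```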
Signed-off-by: Joseph Kwasniewski <kwasniewski@gmail.com>
Fix the waiting logic to actually wait for the canary deployment to be
ready before continuing with the rest of the finalization logic.
Previously, the canary deployment was not being checked for a ready
status due to the absence of the `Steps` field in the specified
backoff.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Add support for mirroring requests while performing B/G deployments with
Gateway API. A `RequestMirror` filter pointing to the canary service is
added to the HTTPRoute during a Canary run. During the Canary run, drift
correction for `.spec.rules[].filters` is disabled to avoid removing the
mirror filter.
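As a sketch, the filter added to the HTTPRoute rule is based on the Gateway API `RequestMirror` filter; the canary service name and port below are assumptions:
```yaml
    filters:
      - type: RequestMirror
        requestMirror:
          backendRef:
            name: podinfo-canary   # assumed canary service name
            port: 9898             # assumed service port
```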
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
This adds a new Checksum field to the canary webhook body, which is a
hash of the LastAppliedSpec and TrackedConfigs.
This can be used to identify the rollout of a specific configuration,
and differentiate between webhooks being sent for different
configuration and deployment versions.
Signed-off-by: Kevin McDermott <kevin@weave.works>
Skipper's installation requires the creation of a PodSecurityPolicy
object. Since PSP was removed from k8s 1.25, we need to run tests for
skipper on k8s 1.24.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
When the ingress annotations map is not set, the returned value is nil
(not an empty map). Assigning to this nil map leads to a panic.
Signed-off-by: Jiří Pinkava <j-pi@seznam.cz>
Suspend, if set to true, will suspend the Canary, disabling any canary runs
regardless of any changes to its target, services, etc. Note that if the
Canary is suspended during an analysis, the analysis is paused until the Canary is unsuspended.
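A minimal sketch of how this might look on a Canary resource, assuming the field is set at the top level of the spec:
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
spec:
  # no canary runs are triggered while this is true
  suspend: true
```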
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Resume target scaler during finalization so that targetRef deployment
does not get stuck at 0 replicas after canary has been deleted.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Run the `confirm-rollout` webhook check right before scaling up the
deployment only, instead of running it on every loop.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
In Linkerd 2.13 the Prometheus instance in
the `linkerd-viz` namespace is now locked behind an
[_AuthorizationPolicy_](https://github.com/linkerd/linkerd2/blob/stable-2.13.1/viz/charts/linkerd-viz/templates/prometheus-policy.yaml)
that only allows access to the `metrics-api` _ServiceAccount_.
This adds an extra _AuthorizationPolicy_ to authorize the `flagger`
_ServiceAccount_. It's created by default when using Kustomize, but
needs to be opted-in when using Helm via the new
`linkerdAuthPolicy.create` value. This also implies that the Flagger
workload has to be injected by the Linkerd proxy, and that can't happen
in the same `linkerd` namespace where the control plane lives, so we're
moving Flagger into the new injected `flagger-system` namespace.
The `namespace` field in `kustomization.yml` was resetting the namespace
for the new _AuthorizationPolicy_ resource, so that gets restored back
to `linkerd-viz` using a `patchesJson6902` entry. A better way to do
this would have been to use the `unsetOnly` field in a
_NamespaceTransformer_ (see kubernetes-sigs/kustomize#4708) but for
the life of me I couldn't make that work...
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
Use regex filtering to match against session affinity cookie headers
when using Istio instead of an exact match.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Add support for overriding the primary scaler replica count via
`.spec.autoscalerRef.primaryScalerReplicas`, a feature which enables
users to define a different scaling configuration for the primary.
This can be useful in the situation where the user does not want to
scale the canary workload to the exact same size as the primary,
especially when opting for a canary deployment pattern where only a
small portion of traffic is routed to the canary workload pods.
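A sketch of the new field on the autoscaler reference (replica values are illustrative):
```yaml
  autoscalerRef:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    name: podinfo
    # override the scaler replica range for the primary workload only
    primaryScalerReplicas:
      minReplicas: 2
      maxReplicas: 5
```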
Signed-off-by: Aurel Canciu <aurelcanciu@gmail.com>
Add `.spec.analysis.sessionAffinity` to configure session affinity for
weighted routing. Add support for session affinity in the Istio router,
using the `Set-Cookie` and `Cookie` headers.
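A sketch of the new analysis field (cookie name and max age are illustrative):
```yaml
  analysis:
    interval: 1m
    stepWeight: 10
    maxWeight: 50
    sessionAffinity:
      # name of the cookie used to provide session affinity
      cookieName: flagger-cookie
      # TTL of the cookie in seconds
      maxAge: 21600
```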
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
- Push Flagger Helm chart to `ghcr.io/fluxcd/charts/flagger`
- Sign Flagger Helm chart with Cosign and GitHub OIDC
- Push install manifests and overlays from `./kustomize` with Flux CLI to `ghcr.io/fluxcd/flagger-manifests`
- Sign Flagger manifests with Cosign and GitHub OIDC
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
- Replace the Cosign static key with GitHub Actions OIDC when signing the flagger container image
- Sign the GitHub release assets checksums with Cosign keyless
- Sign the load-tester container image with Cosign keyless
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
Without this, the canary replicas are updated twice:
to 1 replica and then, a few seconds later, to the value of the HPA's minReplicas.
In some cases, when updated to 1 replica (before the HPA controller
updates it to minReplicas), the canary is considered ready: 1 of 1 (readyThreshold 100%),
and the canary weight is advanced, so the canary receives traffic with less capacity
than expected.
Co-Authored-By: Joshua Gibeon <joshuagibeon7719@gmail.com>
Co-authored-by: Sanskar Jaiswal <hey@aryan.lol>
Signed-off-by: Andy Librian <andylibrian@gmail.com>
Without this change, the HTTPProxy `podinfo.test` was not getting created due to the following warning:
```
test 4m11s Warning Synced canary/podinfo HTTPProxy podinfo.test create error: HTTPProxy.projectcontour.io "podinfo" is invalid: spec.routes.retryPolicy.retryOn: Unsupported value: "": supported values: "5xx", "gateway-error", "reset", "connect-failure", "retriable-4xx", "refused-stream", "retriable-status-codes", "retriable-headers", "cancelled", "deadline-exceeded", "internal", "resource-exhausted", "unavailable"
```
Signed-off-by: Mae Anne Large <Mpluya@users.noreply.github.com>
Signed-off-by: Mae Large <mlarge@vmware.com>
Prevents the canary from being triggered when the canary deployment is
updated to match the primary deployment after an analysis fails.
Signed-off-by: Sanskar Jaiswal <sanskar.jaiswal@weave.works>
Adds Gateway API as a provider for progressive traffic shifting, A/B
testing and Blue-Green testing. Adds a new field in the Canary
`spec.service.gatewayRefs` which specifies the Gateway that Flagger
should use.
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
This makes it possible to forbid access from canaries in non-whitelisted
namespaces.
In a multi-tenant context, this can be combined with network policies to
maintain isolation between namespaces.
Signed-off-by: Cédric Connes <cedric.connes@gmail.com>
Signed-off-by: Author Name <johnzhengaz@gmail.com>
It is easy to run into the error `Halt advancement no values found for istio metric request-success-rate probably podinfo.test is not receiving traffic: running query failed: no values found`
when there is an inconsistency between the Prometheus version and the Istio version.
Signed-off-by: John Zheng <john.zheng@hp.com>
Signed-off-by: Karl Heins <karlheins@northwesternmutual.com>
Support updating primary Deployment/DaemonSet/HPA/Service labels and annotations after first-time rollout
Also bring it up to the same format as all the other
MAINTAINERS files (needed for fluxcd/community#155).
Signed-off-by: Daniel Holbach <daniel@weave.works>
In my own project I reference this nice script; the go.sum looks like this (after running go mod tidy and
go mod download):
```
k8s.io/apimachinery v0.22.1/go.mod h1:O3oNtNadZdeOMxHFVxOreoznohCpy0z6mocxbZr7oJ0=
k8s.io/apiserver v0.22.1/go.mod h1:2mcM6dzSt+XndzVQJX21Gx0/Klo7Aen7i0Ai6tIa400=
k8s.io/client-go v0.22.1 h1:jW0ZSHi8wW260FvcXHkIa0NLxFBQszTlhiAVsU5mopw=
k8s.io/client-go v0.22.1/go.mod h1:BquC5A4UOo4qVDUtoc04/+Nxp1MeHcVc1HJm1KmG8kk=
k8s.io/code-generator v0.22.1/go.mod h1:eV77Y09IopzeXOJzndrDyCI88UBok2h6WxAlBwpxa+o=
k8s.io/component-base v0.22.1 h1:SFqIXsEN3v3Kkr1bS6rstrs1wd45StJqbtgbQ4nRQdo=
```
As you can see, sometimes the go.sum only has a `version/go.mod` line;
if we run the script, it will fail like this:
chmod: cannot access '/home/longkai/pkg/mod/k8s.io/code-generator@v0.22.1/go.mod/generate-groups.sh': Not a directory
So this PR fixes that.
Finally, the list is sorted by version in ascending order, so we want to choose the newer one.
Signed-off-by: longkai <im.longkai@gmail.com>
If a "primary" ConfigMap or Secret already exists, keep the list of
ownerReferences and append the updating Canary as ownerReference if it's
not already in the list. This will prevent the GC from deleting primary
ConfigMaps and Secrets used by multiple primary deployments when one is
deleted.
Signed-off-by: Zacharias Taubert <zacharias.taubert@gmail.com>
Some third-party software relies on annotations and labels on Istio's VirtualServices. For instance, external-dns makes use of the `external-dns.alpha.kubernetes.io/controller` annotation. Currently there is no way to set labels and annotations on the VirtualService resource.
This change takes the metadata from the `canary.Spec.Service.Apex` property to replicate exactly what is already possible for a Traefik resource:
c36a13ccff/pkg/router/traefik.go (L59-L68). Fixes #854.
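As an illustration, the metadata would be set under the apex service in the Canary spec (annotation and label values are examples):
```yaml
  service:
    port: 9898
    apex:
      annotations:
        external-dns.alpha.kubernetes.io/controller: dns-controller  # example value
      labels:
        team: frontend  # example label
```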
Signed-off-by: Jonny Langefeld <jonny.langefeld@gmail.com>
A minor issue I stumbled across while learning how to drive Flagger is that the docs still use `istio_request_duration_seconds_bucket` to illustrate the query behind the `request-duration` metric. I understand that this changed with Istio 1.5 (https://github.com/fluxcd/flagger/issues/478), but it seems that in the current version of Flagger, the correct metric must already be used, since I'm getting duration metrics out of Istio 1.10 :)
This change simply makes the docs clearer for those of us trying to understand exactly what `request-duration` entails!
Signed-off-by: David Young <davidy@funkypenguin.co.nz>
In the SMI TrafficSplit spec, Weight and Service are
required values for TrafficSplit Backend.
In flagger's SMI v1alpha2 implementation,
Service and Weight have the omitempty json option.
During canary analysis, flagger initially creates
a SMI TrafficSplit custom resource in which the
canary backend service has a Weight of 0.
The omitempty option causes Go to omit Weight
when it sends the custom resource to Kubernetes.
This throws an error during canary analysis.
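For context, a sketch of the TrafficSplit Flagger creates at the start of an analysis; with `omitempty`, the zero canary weight below is dropped from the serialized object (resource names are illustrative):
```yaml
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: podinfo
spec:
  service: podinfo
  backends:
    - service: podinfo-primary
      weight: 100
    - service: podinfo-canary
      # zero weight gets omitted when the field is marked omitempty
      weight: 0
```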
Signed-off-by: Johnson Shi <Johnson.Shi@microsoft.com>
- breaking change: drop support for Ingress `k8s.io/api/networking/v1beta1`
- routing: use Ingress `k8s.io/api/networking/v1` for NGINX and Skipper routers
- e2e: update ingress-nginx v0.46.0 and skipper to v0.13.61
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
The Deployment of the Flagger loadtester did not contain the correct
label `app.kubernetes.io/name`. This label is used for the Flagger
deployment and it is also used in the PodDisruptionBudget for
the Flagger operator. I added the same label to the Flagger
load tester to make the PodDisruptionBudget work correctly
for the Flagger loadtester.
Signed-off-by: Marcus Rodan <marcusrodan@gmail.com>
Use curly braces to specify an array value in helm set.
The latest versions of the chart need to have the additional arguments specified as a list or they error out:
```
Error: template: traefik/templates/_podtemplate.tpl:199:20: executing "traefik.podTemplate" at <.>: range can't iterate over --metrics.prometheus=true
```
Signed-off-by: Carson Anderson <carson.anderson@getweave.com>
This commit updates the Linkerd version to `2.10`, along with
the install script to download the arm version.
It also updates the install script and metricsTemplate to install
and use the viz Prometheus, respectively.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
In Linkerd 2.10, the Prometheus instance moved into the `viz`
extension, which is installed separately from the core
control plane. This means that Prometheus now exists in
the `linkerd-viz` namespace by default unless overridden.
This PR updates the URL to reflect this.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
The Prometheus deployment created by the Helm chart is missing a pull secret;
the variable is necessary to pull the Prometheus image from a private repository.
Signed-off-by: Joseph Villarreal Lopez <lapeyus@gmail.com>
| Parameter | Description | Default |
| --- | --- | --- |
| `metricsServer` | Prometheus URL, used when `prometheus.install` is `false` | `http://prometheus.istio-system:9090` |
| `prometheus.install` | If `true`, installs Prometheus configured to scrape all pods in the cluster | `false` |
| `prometheus.retention` | Prometheus data retention | `2h` |
| `selectorLabels` | List of labels that Flagger uses to create pod selectors | `app,name,app.kubernetes.io/name` |
| `configTracking.enabled` | If `true`, flagger will track changes in Secrets and ConfigMaps referenced in the target deployment | `true` |
| `eventWebhook` | If set, Flagger will publish events to the given webhook | None |
| `slack.url` | Slack incoming webhook | None |
| `slack.channel` | Slack channel | None |
| `slack.user` | Slack username | `flagger` |
| `msteams.url` | Microsoft Teams incoming webhook | None |
| `podMonitor.enabled` | If `true`, create a PodMonitor for [monitoring the metrics](https://docs.flagger.app/usage/monitoring#metrics) | `false` |
| `podMonitor.namespace` | Namespace where the PodMonitor is created | the same namespace |
| `podMonitor.interval` | Interval at which metrics should be scraped | `15s` |
| `podMonitor.podMonitor` | Additional labels to add to the PodMonitor | `{}` |
| `leaderElection.enabled` | If `true`, Flagger will run in HA mode | `false` |
| `leaderElection.replicaCount` | Number of replicas | `1` |
| `serviceAccount.create` | If `true`, Flagger will create service account | `true` |
| `serviceAccount.name` | The name of the service account to create or use. If not set and `serviceAccount.create` is `true`, a name is generated using the Flagger fullname | `""` |
| `serviceAccount.annotations` | Annotations for service account | `{}` |
| `ingressAnnotationsPrefix` | Annotations prefix for ingresses | `custom.ingress.kubernetes.io` |
| `includeLabelPrefix` | List of prefixes of labels that are copied when creating primary deployments or daemonsets. Use * to include all | `""` |
| `rbac.create` | If `true`, create and use RBAC resources | `true` |
| `rbac.pspEnabled` | If `true`, create and use a restricted pod security policy | `false` |
| `crd.create` | If `true`, create Flagger's CRDs (should be enabled for Helm v2 only) | `false` |
| `resources.requests/cpu` | Pod CPU request | `10m` |
| `resources.requests/memory` | Pod memory request | `32Mi` |
| `resources.limits/cpu` | Pod CPU limit | `1000m` |
| `resources.limits/memory` | Pod memory limit | `512Mi` |
| `affinity` | Node/pod affinities | None |
| `nodeSelector` | Node labels for pod assignment | `{}` |
| `threadiness` | Number of controller workers | `2` |
| `tolerations` | List of node taints to tolerate | `[]` |
| `istio.kubeconfig.secretName` | The name of the Kubernetes secret containing the Istio shared control plane kubeconfig | None |
| `istio.kubeconfig.key` | The name of Kubernetes secret data key that contains the Istio control plane kubeconfig | `kubeconfig` |
| `ingressAnnotationsPrefix` | Annotations prefix for NGINX ingresses | None |
| `ingressClass` | Ingress class used for annotating HTTPProxy objects, e.g. `contour` | None |
| `podPriorityClassName` | PriorityClass name for pod priority configuration | "" |
| `podDisruptionBudget.enabled` | A PodDisruptionBudget will be created if `true` | `false` |
| `podDisruptionBudget.minAvailable` | The minimal number of available replicas that will be set in the PodDisruptionBudget | `1` |
| Parameter | Description | Default |
| --- | --- | --- |
| `metricsServer` | Prometheus URL, used when `prometheus.install` is `false` | `http://prometheus.istio-system:9090` |
| `prometheus.install` | If `true`, installs Prometheus configured to scrape all pods in the cluster | `false` |
| `prometheus.retention` | Prometheus data retention | `2h` |
| `selectorLabels` | List of labels that Flagger uses to create pod selectors | `app,name,app.kubernetes.io/name` |
| `serviceMonitor.enabled` | If `true`, creates service and serviceMonitor for monitoring Flagger metrics | `false` |
| `serviceMonitor.honorLabels` | If `true`, label conflicts are resolved by keeping label values from the scraped data and ignoring the conflicting server-side labels | `false` |
| `serviceMonitor.namespace` | Namespace Servicemonitor is installed in | the same namespace |
| `serviceMonitor.labels` | labels for the ServiceMonitor passed to Prometheus Operator | `{}` |
| `configTracking.enabled` | If `true`, flagger will track changes in Secrets and ConfigMaps referenced in the target deployment | `true` |
| `eventWebhook` | If set, Flagger will publish events to the given webhook | None |
| `slack.url` | Slack incoming webhook | None |
| `slack.proxyUrl` | Slack proxy url | None |
| `slack.channel` | Slack channel | None |
| `slack.user` | Slack username | `flagger` |
| `msteams.url` | Microsoft Teams incoming webhook | None |
| `msteams.proxyUrl` | Microsoft Teams proxy url | None |
| `clusterName` | When specified, Flagger will add the cluster name to alerts | `""` |
| `podMonitor.enabled` | If `true`, create a PodMonitor for [monitoring the metrics](https://docs.flagger.app/usage/monitoring#metrics) | `false` |
| `podMonitor.namespace` | Namespace where the PodMonitor is created | the same namespace |
| `podMonitor.interval` | Interval at which metrics should be scraped | `15s` |
| `podMonitor.podMonitor` | Additional labels to add to the PodMonitor | `{}` |
| `leaderElection.enabled` | If `true`, Flagger will run in HA mode | `false` |
| `leaderElection.replicaCount` | Number of replicas | `1` |
| `serviceAccount.create` | If `true`, Flagger will create service account | `true` |
| `serviceAccount.name` | The name of the service account to create or use. If not set and `serviceAccount.create` is `true`, a name is generated using the Flagger fullname | `""` |
| `serviceAccount.annotations` | Annotations for service account | `{}` |
| `ingressAnnotationsPrefix` | Annotations prefix for ingresses | `custom.ingress.kubernetes.io` |
| `includeLabelPrefix` | List of prefixes of labels that are copied when creating primary deployments or daemonsets. Use * to include all | `""` |
| `rbac.create` | If `true`, create and use RBAC resources | `true` |
| `rbac.pspEnabled` | If `true`, create and use a restricted pod security policy | `false` |
| `crd.create` | If `true`, create Flagger's CRDs (should be enabled for Helm v2 only) | `false` |
| `resources.requests/cpu` | Pod CPU request | `10m` |
| `resources.requests/memory` | Pod memory request | `32Mi` |
| `resources.limits/cpu` | Pod CPU limit | `1000m` |
| `resources.limits/memory` | Pod memory limit | `512Mi` |
"expr":"histogram_quantile(0.50, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$primary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le))",
"expr":"histogram_quantile(0.50, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$primary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le)) / 1000",
"format":"time_series",
"interval":"",
"intervalFactor":1,
@@ -411,7 +411,7 @@
"refId":"A"
},
{
"expr":"histogram_quantile(0.90, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$primary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le))",
"expr":"histogram_quantile(0.90, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$primary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le)) / 1000",
"format":"time_series",
"hide":false,
"intervalFactor":1,
@@ -419,7 +419,7 @@
"refId":"B"
},
{
"expr":"histogram_quantile(0.99, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$primary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le))",
"expr":"histogram_quantile(0.99, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$primary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le)) / 1000",
"format":"time_series",
"hide":false,
"intervalFactor":1,
@@ -509,7 +509,7 @@
"steppedLine":false,
"targets":[
{
"expr":"histogram_quantile(0.50, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$canary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le))",
"expr":"histogram_quantile(0.50, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$canary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le)) / 1000",
"format":"time_series",
"interval":"",
"intervalFactor":1,
@@ -517,7 +517,7 @@
"refId":"A"
},
{
"expr":"histogram_quantile(0.90, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$canary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le))",
"expr":"histogram_quantile(0.90, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$canary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le)) / 1000",
"format":"time_series",
"hide":false,
"intervalFactor":1,
@@ -525,7 +525,7 @@
"refId":"B"
},
{
"expr":"histogram_quantile(0.99, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$canary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le))",
"expr":"histogram_quantile(0.99, sum(irate(istio_request_duration_milliseconds_bucket{reporter=\"destination\",destination_workload=~\"$canary\", destination_workload_namespace=~\"$namespace\"}[1m])) by (le)) / 1000",
description: Flagger's load testing services based on rakyll/hey and bojand/ghz that generates traffic during canary analysis when configured as a webhook.
flag.StringVar(&eventWebhook, "event-webhook", "", "Webhook for publishing flagger events")
flag.StringVar(&msteamsURL, "msteams-url", "", "MS Teams incoming webhook URL.")
flag.StringVar(&msteamsProxyURL, "msteams-proxy-url", "", "MS Teams proxy URL.")
flag.StringVar(&includeLabelPrefix, "include-label-prefix", "", "List of prefixes of labels that are copied when creating primary deployments or daemonsets. Use * to include all.")
flag.StringVar(&namespace, "namespace", "", "Namespace that flagger would watch canary object.")
flag.StringVar(&meshProvider, "mesh-provider", "istio", "Service mesh provider, can be istio, linkerd, appmesh, contour, gloo, nginx, skipper, traefik, apisix, osm or kuma.")
flag.StringVar(&selectorLabels, "selector-labels", "app,name,app.kubernetes.io/name", "List of pod labels that Flagger uses to create pod selectors.")
flag.StringVar(&ingressAnnotationsPrefix, "ingress-annotations-prefix", "nginx.ingress.kubernetes.io", "Annotations prefix for NGINX ingresses.")
flag.StringVar(&ingressClass, "ingress-class", "", "Ingress class used for annotating HTTPProxy objects.")
flag.BoolVar(&enableConfigTracking, "enable-config-tracking", true, "Enable secrets and configmaps tracking.")
#### When should I use A/B testing instead of progressive traffic shifting?
For frontend applications that require session affinity, you should use HTTP headers or
cookie match conditions to ensure a set of users will stay on the same version for
the whole duration of the canary analysis.
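A sketch of such a match condition on the canary analysis, assuming Istio-style matchers; the header name, value and cookie pattern are illustrative:
```yaml
  analysis:
    # total number of iterations instead of weight stepping
    iterations: 10
    threshold: 2
    match:
      - headers:
          x-canary:
            exact: "insider"       # illustrative header value
      - headers:
          cookie:
            regex: "^(.*?;)?(canary=always)(;.*)?$"
```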
#### Can I use Flagger to manage applications that live outside of a service mesh?
For applications that are not deployed on a service mesh,
Flagger can orchestrate Blue/Green style deployments with Kubernetes L4 networking.
#### When can I use traffic mirroring?
Traffic mirroring can be used for Blue/Green deployment strategy or a pre-stage in a Canary release.
Traffic mirroring will copy each incoming request, sending one request to the primary and one to the canary service.
Mirroring should be used for requests that are **idempotent**
or capable of being processed twice (once by the primary and once by the canary).
#### How to retry a failed release?
A canary analysis is triggered by changes in any of the following objects:
* Deployment PodSpec (container image, command, ports, env, resources, etc)
* ConfigMaps mounted as volumes or mapped to environment variables
* Secrets mounted as volumes or mapped to environment variables
#### How to change replicas for a deployment when not using HPA?
To change replicas for a deployment when not using HPA, you have to update the canary deployment with the desired replica count
and trigger an analysis by annotating the template. After the analysis finishes, Flagger will promote the `spec.replicas` changes to the primary deployment.
Example:
```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 4 # update replicas
  template:
    metadata:
      annotations:
        timestamp: "2022-02-10T14:24:48+0000" # add annotation to trigger analysis
```
#### Why is there a window of downtime during the canary initializing process when analysis is disabled?
A window of downtime is the intended behavior when the analysis is disabled. This allows instant rollback and also mimics the way
a Kubernetes deployment initialization works. To avoid this, enable the analysis (`skipAnalysis: false`), wait for the initialization
to finish, and disable it afterward (`skipAnalysis: true`).
#### How to disable cross namespace references?
Flagger by default can access resources across namespaces (`AlertProvider`, `MetricProvider` and Gloo `Upstream`).
If you're in a multi-tenant environment and wish to disable this, you can do so through the `no-cross-namespace-refs` flag.
```
flagger \
-no-cross-namespace-refs=true \
...
```
## Kubernetes services
#### How is an application exposed inside the cluster?
Assuming the app name is `podinfo`, you can define a canary like:
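A minimal sketch of such a Canary (ports, thresholds and intervals are illustrative):
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  service:
    # port used by the generated ClusterIP services
    port: 9898
  analysis:
    interval: 1m
    threshold: 10
    maxWeight: 50
    stepWeight: 5
```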
If the `service.name` is not specified, then `targetRef.name` is used for
the apex domain and canary/primary services name prefix.
You should treat the service name as an immutable field; changing it could result in routing conflicts.
Based on the canary spec service, Flagger generates the following Kubernetes ClusterIP service:
> **Note** that the metric interval should be lower than or equal to the control loop interval.
#### Can I use custom metrics?
The analysis can be extended with metrics provided by Prometheus, Datadog, AWS CloudWatch, New Relic and Graphite.
For more details on how custom metrics can be used, please read the [metrics docs](usage/metrics.md).
#### Istio Gateway API
If you're using Istio with Gateway API, the Prometheus query needs to include `reporter="source"`. For example, to calculate HTTP requests error percentage, the query would be:
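A sketch of such a query wrapped in a MetricTemplate; the metric name and template variables follow the usual Flagger/Istio conventions and are assumptions here:
```yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: istio-system
spec:
  provider:
    type: prometheus
    address: http://prometheus.istio-system:9090
  query: |
    100 - sum(
      rate(
        istio_requests_total{
          reporter="source",
          destination_workload_namespace="{{ namespace }}",
          destination_workload="{{ target }}",
          response_code!~"5.*"
        }[{{ interval }}]
      )
    )
    /
    sum(
      rate(
        istio_requests_total{
          reporter="source",
          destination_workload_namespace="{{ namespace }}",
          destination_workload="{{ target }}"
        }[{{ interval }}]
      )
    ) * 100
```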
This guide walks you through setting up Flagger on Alibaba ServiceMesh.
## Prerequisites
- Create an ACK ([Alibaba Cloud Container Service for Kubernetes](https://cs.console.aliyun.com)) cluster instance.
- Create an ASM ([Alibaba Cloud Service Mesh](https://servicemesh.console.aliyun.com)) enterprise instance and add the ACK cluster to it.
### Variables declaration
- `$ACK_CONFIG`: the kubeconfig file path of ACK, which is treated as `$HOME/.kube/config` in the rest of this guide.
- `$MESH_CONFIG`: the kubeconfig file path of ASM.
### Enable Data-plane KubeAPI access in ASM
In the Alibaba Cloud Service Mesh (ASM) console, on the basic information page, make sure Data-plane KubeAPI access is enabled. When enabled, the Istio resources of the control plane can be managed through the Kubeconfig of the data plane cluster.
## Enable Prometheus
In the Alibaba Cloud Service Mesh (ASM) console, click Settings to enable the collection of Prometheus monitoring metrics. You can use a self-managed Prometheus instance, or you can use the Alibaba Cloud ARMS Prometheus plug-in installed in the ACK cluster and let ARMS Prometheus collect the monitoring metrics.
### Add data plane cluster to Alibaba Cloud Service Mesh (ASM)
In the Alibaba Cloud Service Mesh (ASM) console, click Cluster & Workload Management, select the Kubernetes cluster, select the target ACK cluster, and add it to ASM.
### Prometheus address
If you are using Alibaba Cloud Container Service for Kubernetes (ACK) ARMS Prometheus monitoring, replace {Region-ID} in the link below with your region ID, such as cn-hangzhou. {ACKID} is the ACK ID of the data plane cluster that you added to Alibaba Cloud Service Mesh (ASM). Visit the following links to query the public and intranet addresses monitored by ACK's ARMS Prometheus:
Flagger requires a Kubernetes cluster **v1.19** or newer, Apache APISIX **v2.15** or newer, and Apache APISIX Ingress Controller **v1.5.0** or newer.
Install Apache APISIX and Apache APISIX Ingress Controller with Helm v3:
Install Flagger and the Prometheus add-on in the same namespace as Apache APISIX:
```bash
helm repo add flagger https://flagger.app
helm upgrade -i flagger flagger/flagger \
--namespace apisix \
--set prometheus.install=true \
--set meshProvider=apisix
```
## Bootstrap
Flagger takes a Kubernetes deployment and optionally a horizontal pod autoscaler \(HPA\), then creates a series of objects \(Kubernetes deployments, ClusterIP services and an ApisixRoute\). These objects expose the application outside the cluster and drive the canary analysis and promotion.
Create a test namespace:
```bash
kubectl create ns test
```
Create a deployment and a horizontal pod autoscaler:
Create an Apache APISIX `ApisixRoute`; Flagger will reference it and generate the canary Apache APISIX `ApisixRoute` \(replace `app.example.com` with your own domain\):
```yaml
apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: podinfo
  namespace: test
spec:
  http:
    - backends:
        - serviceName: podinfo
          servicePort: 80
      match:
        hosts:
          - app.example.com
        methods:
          - GET
        paths:
          - /*
      name: method
      plugins:
        - name: prometheus
          enable: true
          config:
            disable: false
            prefer_name: true
```
Save the above resource as podinfo-apisixroute.yaml and then apply it:
```bash
kubectl apply -f ./podinfo-apisixroute.yaml
```
Create a canary custom resource \(replace `app.example.com` with your own domain\):
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  provider: apisix
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # apisix route reference
  routeRef:
    apiVersion: apisix.apache.org/v2
    kind: ApisixRoute
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  service:
    # ClusterIP port number
    port: 80
    # container port number or name
    targetPort: 9898
  analysis:
    # schedule interval (default 60s)
    interval: 10s
    # max number of failed metric checks before rollback
```
Flagger implements a control loop that gradually shifts traffic to the canary while measuring key performance indicators like HTTP requests success rate, requests average duration and pod health. Based on analysis of the KPIs a canary is promoted or aborted, and the analysis result is published to Slack or MS Teams.
```text
Warning Synced 2m59s flagger podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less than desired generation
Warning Synced 2m50s flagger podinfo-primary.test not ready: waiting for rollout to finish: 0 of 1 (readyThreshold 100%) updated replicas are available
Normal Synced 2m40s (x3 over 2m59s) flagger all the metrics providers are available!
Normal Synced 2m39s flagger Initialization done! podinfo.test
Normal Synced 2m20s flagger New revision detected! Scaling up podinfo.test
Warning Synced 2m (x2 over 2m10s) flagger canary deployment podinfo.test not ready: waiting for rollout to finish: 0 of 1 (readyThreshold 100%) updated replicas are available
Normal Synced 110s flagger Starting canary analysis for podinfo.test
Normal Synced 109s flagger Advance podinfo.test canary weight 10
Warning Synced 100s flagger Halt advancement no values found for apisix metric request-success-rate probably podinfo.test is not receiving traffic: running query failed: no values found
Normal Synced 90s flagger Advance podinfo.test canary weight 20
Normal Synced 80s flagger Advance podinfo.test canary weight 30
Normal Synced 69s flagger Advance podinfo.test canary weight 40
Normal Synced 59s flagger Advance podinfo.test canary weight 50
Warning Synced 30s (x2 over 40s) flagger podinfo-primary.test not ready: waiting for rollout to finish: 1 old replicas are pending termination
Normal Synced 9s (x3 over 50s) flagger (combined from similar events): Promotion completed! Scaling down podinfo.test
```
**Note** that if you apply new changes to the deployment during the canary analysis, Flagger will restart the analysis.
You can monitor all canaries with:
```bash
watch kubectl get canaries --all-namespaces
NAMESPACE NAME STATUS WEIGHT LASTTRANSITIONTIME
test podinfo-2 Progressing 10 2022-11-23T05:00:54Z
test podinfo Succeeded 0 2022-11-23T06:00:54Z
```
## Automated rollback
During the canary analysis you can generate HTTP 500 errors to test if Flagger pauses and rolls back the faulted version.
When the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary, the canary is scaled to zero and the rollout is marked as failed.
Edit the canary analysis and add the not found error rate check:
```yaml
  analysis:
    metrics:
      - name: "404s percentage"
        templateRef:
          name: not-found-percentage
        thresholdRange:
          max: 5
        interval: 1m
```
The above configuration validates the canary by checking if the HTTP 404 req/sec percentage is below 5 percent of the total traffic. If the 404s rate reaches the 5% threshold, then the canary fails.
The above procedures can be extended with more [custom metrics](../usage/metrics.md) checks, [webhooks](../usage/webhooks.md), [manual promotion](../usage/webhooks.md#manual-gating) approval and [Slack or MS Teams](../usage/alerting.md) notifications.
# supported values for retryOn - https://projectcontour.io/docs/main/config/api/#projectcontour.io/v1.RetryOn
retryOn:"5xx"
# define the canary analysis timing and KPIs
analysis:
# schedule interval (default 60s)
After the bootstrap, the podinfo deployment will be scaled to zero and the traffic to `podinfo.test` will be routed to the primary pods. During the canary analysis, the `podinfo-canary.test` address can be used to target directly the canary pods.
## Expose the app outside the cluster
Trigger a canary deployment by updating the container image:
```bash
kubectl -n test set image deployment/podinfo \
podinfod=ghcr.io/stefanprodan/podinfo:6.0.1
```
Flagger detects that the deployment revision changed and starts a new rollout:
Trigger a canary deployment:
```bash
kubectl -n test set image deployment/podinfo \
podinfod=ghcr.io/stefanprodan/podinfo:6.0.2
```
Exec into the load tester pod with:
Trigger a canary deployment by updating the container image:
```bash
kubectl -n test set image deployment/podinfo \
podinfod=ghcr.io/stefanprodan/podinfo:6.0.3
```
Flagger detects that the deployment revision changed and starts the A/B test:
> Note: The above installation sets the mesh provider to be `gatewayapi:v1`. If your Gateway API implementation uses the `v1beta1` CRDs, then
set the `--meshProvider` value to `gatewayapi:v1beta1`.
Create a namespace for the `Gateway`:
```bash
kubectl create ns istio-ingress
```
Create a `Gateway` that configures load balancing, traffic ACL, etc:
```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: gateway
  namespace: istio-ingress
spec:
  gatewayClassName: istio
  listeners:
    - name: default
      hostname: "*.example.com"
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: All
```
## Bootstrap
Flagger takes a Kubernetes deployment and optionally a horizontal pod autoscaler \(HPA\), then creates a series of objects \(Kubernetes deployments, ClusterIP services, HTTPRoutes for the Gateway\). These objects expose the application inside the mesh and drive the canary analysis and promotion.
Create a test namespace:
```bash
kubectl create ns test
```
Create a deployment and a horizontal pod autoscaler:
Create metric templates targeting the Prometheus server in the `flagger-system` namespace. The PromQL queries below are meant for `Envoy`, but you can [change it to your ingress/mesh provider](https://docs.flagger.app/faq#metrics) accordingly.
Save the above resource as podinfo-canary.yaml and then apply it:
```bash
kubectl apply -f ./podinfo-canary.yaml
```
When the canary analysis starts, Flagger will call the pre-rollout webhooks before routing traffic to the canary. The canary analysis will run for five minutes while validating the HTTP metrics and rollout hooks every minute.
After a couple of seconds Flagger will create the canary objects:
* ConfigMaps mounted as volumes or mapped to environment variables
* Secrets mounted as volumes or mapped to environment variables
You can monitor how Flagger progressively changes the weights of the HTTPRoute object that is attached to the Gateway with:
```bash
watch kubectl get httproute -n test podinfo -o=jsonpath='{.spec.rules}'
```
You can monitor all canaries with:
```bash
watch kubectl get canaries --all-namespaces
NAMESPACE NAME STATUS WEIGHT LASTTRANSITIONTIME
test podinfo Progressing 15 2022-01-16T14:05:07Z
prod frontend Succeeded 0 2022-01-15T16:15:07Z
prod backend Failed 0 2022-01-14T17:05:07Z
```
## Automated rollback
During the canary analysis you can generate HTTP 500 errors and high latency to test if Flagger pauses the rollout.
Trigger another canary deployment:
```bash
kubectl -n test set image deployment/podinfo \
podinfod=stefanprodan/podinfo:6.0.2
```
Exec into the load tester pod with:
```bash
kubectl -n test exec -it flagger-loadtester-xx-xx sh
```
Generate HTTP 500 errors:
```bash
watch curl http://podinfo-canary:9898/status/500
```
Generate latency:
```bash
watch curl http://podinfo-canary:9898/delay/1
```
When the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary, the canary is scaled to zero and the rollout is marked as failed.
```text
kubectl -n test describe canary/podinfo
Status:
Canary Weight: 0
Failed Checks: 10
Phase: Failed
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Synced 3m flagger Starting canary deployment for podinfo.test
Normal Synced 3m flagger Advance podinfo.test canary weight 5
Normal Synced 3m flagger Advance podinfo.test canary weight 10
Normal Synced 3m flagger Advance podinfo.test canary weight 15
Normal Synced 3m flagger Halt podinfo.test advancement error rate 69.17% > 1%
Normal Synced 2m flagger Halt podinfo.test advancement error rate 61.39% > 1%
Normal Synced 2m flagger Halt podinfo.test advancement error rate 55.06% > 1%
Normal Synced 2m flagger Halt podinfo.test advancement error rate 47.00% > 1%
Normal Synced 2m flagger (combined from similar events): Halt podinfo.test advancement error rate 38.08% > 1%
Warning Synced 1m flagger Rolling back podinfo.test failed checks threshold reached 10
Warning Synced 1m flagger Canary failed! Scaling down podinfo.test
```
## Session Affinity
While Flagger can perform weighted routing and A/B testing individually, with Gateway API it can combine the two,
leading to a Canary release with session affinity.
For more information you can read the [deployment strategies docs](../usage/deployment-strategies.md#canary-release-with-session-affinity).
> **Note:** The implementation must have support for the [`ResponseHeaderModifier`](https://github.com/kubernetes-sigs/gateway-api/blob/3d22aa5a08413222cb79e6b2e245870360434614/apis/v1beta1/httproute_types.go#L651) API.
Create a canary custom resource \(replace www.example.com with your own domain\):
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # service port number
    port: 9898
    # container port number or name (optional)
    targetPort: 9898
    # Gateway API HTTPRoute host names
    hosts:
      - www.example.com
    # Reference to the Gateway that the generated HTTPRoute would attach to.
    gatewayRefs:
      - name: gateway
        namespace: istio-ingress
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
```
Trigger a canary deployment by updating the container image:
```bash
kubectl -n test set image deployment/podinfo \
podinfod=ghcr.io/stefanprodan/podinfo:6.0.1
```
You can load `www.example.com` in your browser and refresh it until you see the requests being served by `podinfo:6.0.1`.
All subsequent requests after that will be served by `podinfo:6.0.1` and not `podinfo:6.0.0` because of the session affinity
configured by Flagger in the HTTPRoute object.
# A/B Testing
Besides weighted routing, Flagger can be configured to route traffic to the canary based on HTTP match conditions. In an A/B testing scenario, you'll be using HTTP headers or cookies to target a certain segment of your users. This is particularly useful for frontend applications that require session affinity.
For applications that perform read operations, Flagger can be configured to do B/G tests with traffic mirroring.
Gateway API traffic mirroring will copy each incoming request, sending one request to the primary and one to the canary service.
The response from the primary is sent back to the user and the response from the canary is discarded.
Metrics are collected on both requests so that the deployment will only proceed if the canary metrics are within the threshold values.
Note that mirroring should be used for requests that are **idempotent** or capable of being processed twice \(once by the primary and once by the canary\).
You can enable mirroring by replacing `stepWeight` with `iterations` and by setting `analysis.mirror` to `true`:
```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  service:
    # service port number
    port: 9898
    # container port number or name (optional)
    targetPort: 9898
    # Gateway API HTTPRoute host names
    hosts:
      - www.example.com
    # Reference to the Gateway that the generated HTTPRoute would attach to.
    gatewayRefs:
      - name: gateway
        namespace: istio-ingress
  analysis:
    # schedule interval
    interval: 1m
    # max number of failed metric checks before rollback
```
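Under the truncated `analysis` section above, the mirror-specific fields would look roughly like this (iteration count and threshold are illustrative):
```yaml
  analysis:
    interval: 1m
    # total number of iterations
    iterations: 10
    # max number of failed metric checks before rollback
    threshold: 2
    # enable traffic mirroring
    mirror: true
```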
With the above configuration, Flagger will run a canary release with the following steps:
* detect new revision \(deployment spec, secrets or configmaps changes\)
* scale from zero the canary deployment
* wait for the HPA to set the canary minimum replicas
* check canary pods health
* run the acceptance tests
* abort the canary release if tests fail
* start the load tests
* mirror 100% of the traffic from primary to canary
* check request success rate and request duration every minute
* abort the canary release if the metrics check failure threshold is reached
* stop traffic mirroring after the number of iterations is reached
* route live traffic to the canary pods
* promote the canary \(update the primary secrets, configmaps and deployment spec\)
* wait for the primary deployment rollout to finish
* wait for the HPA to set the primary minimum replicas
* check primary pods health
* switch live traffic back to primary
* scale to zero the canary
* send notification with the canary analysis result
The above procedures can be extended with [custom metrics](../usage/metrics.md) checks, [webhooks](../usage/webhooks.md), [manual promotion](../usage/webhooks.md#manual-gating) approval and [Slack or MS Teams](../usage/alerting.md) notifications.