mirror of
https://github.com/fluxcd/flagger.git
synced 2026-03-06 11:41:16 +00:00
272 lines
8.1 KiB
Markdown
# steerer

[](https://travis-ci.org/stefanprodan/steerer)

Steerer is a Kubernetes operator that automates the promotion of canary deployments
using Istio routing for traffic shifting and Prometheus metrics for canary analysis.

### Install

Before installing Steerer, make sure you have Istio set up with Prometheus enabled.
If you are new to Istio, you can follow my [GKE service mesh walk-through](https://github.com/stefanprodan/istio-gke).

Deploy Steerer in the `istio-system` namespace using Helm:

```bash
# add Steerer Helm repo
helm repo add steerer https://stefanprodan.github.io/steerer

# install or upgrade Steerer
helm upgrade --install steerer steerer/steerer \
  --namespace=istio-system \
  --set metricsServer=http://prometheus.istio-system:9090 \
  --set controlLoopInterval=1m
```

### Usage

Steerer requires two Kubernetes deployments: one for the version you want to upgrade, called _primary_, and one for the _canary_.
Each deployment must have a corresponding ClusterIP service that exposes a port named `http` or `https`.
These services are used as destinations in an Istio virtual service.

Gated rollout stages:

* scan for deployments marked for rollout
* check Istio virtual service routes are mapped to primary and canary ClusterIP services
* check primary and canary deployments status
    * halt rollout if a rolling update is underway
    * halt rollout if pods are unhealthy
* increase canary traffic weight percentage from 0% to 10%
* check canary HTTP success rate
    * halt rollout if percentage is under the specified threshold
* increase canary traffic weight by 10% (step weight) until it reaches 100% (max weight)
    * halt rollout while canary request success rate is under the threshold
    * halt rollout while canary request duration P99 is over the threshold
    * halt rollout if the primary or canary deployment becomes unhealthy
    * halt rollout while canary deployment is being scaled up/down by HPA
* promote canary to primary
    * copy canary deployment spec template over primary
* wait for primary rolling update to finish
    * halt rollout if pods are unhealthy
* route all traffic to primary
* scale to zero the canary deployment
* mark rollout as finished
* wait for the canary deployment to be updated (revision bump) and start over

You can change the canary analysis max weight and the step weight size in the rollout custom resource.

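
The weight-stepping stages above can be sketched as a small shell loop. This is only an illustration of the arithmetic, not Steerer's actual code: `next_weight` and `weight_schedule` are hypothetical helpers, and the arguments stand in for the step weight and max weight from the rollout custom resource.

```bash
# Hypothetical helper: compute the next canary weight, capped at the max weight.
# Usage: next_weight CURRENT STEP MAX
next_weight() {
  next=$(( $1 + $2 ))
  if [ "$next" -gt "$3" ]; then
    next=$3
  fi
  echo "$next"
}

# Hypothetical helper: print the canary/primary split for every step.
# Usage: weight_schedule STEP MAX
weight_schedule() {
  step=$1
  max=$2
  weight=0
  # A non-positive step would never reach the max weight, so refuse it.
  [ "$step" -gt 0 ] || return 1
  while [ "$weight" -lt "$max" ]; do
    weight=$(next_weight "$weight" "$step" "$max")
    echo "canary=${weight}% primary=$(( 100 - weight ))%"
  done
}
```

Calling `weight_schedule 10 100` prints ten lines, from `canary=10% primary=90%` to `canary=100% primary=0%`. In the real control loop each step would only be taken when the health and metric checks pass.
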
Assuming the primary deployment is named _podinfo_ and the canary one _podinfo-canary_, Steerer will require
a virtual service configured with weight-based routing:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: podinfo
spec:
  hosts:
  - podinfo
  http:
  - route:
    - destination:
        host: podinfo
        port:
          number: 9898
      weight: 100
    - destination:
        host: podinfo-canary
        port:
          number: 9898
      weight: 0
```

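
To make the weight-based routing concrete, here is a minimal sketch of how the two weights in the virtual service above could be changed by hand. Steerer does this for you during a rollout, so the snippet is illustrative only. `weights_patch` is a hypothetical helper, and the JSON paths assume the route order shown above (primary first, canary second) with both `weight` fields present.

```bash
# Hypothetical helper: build a JSON patch that sets the primary and canary
# weights of the first HTTP route so that they add up to 100.
# Usage: weights_patch CANARY_WEIGHT
weights_patch() {
  canary=$1
  primary=$(( 100 - canary ))
  printf '[{"op":"replace","path":"/spec/http/0/route/0/weight","value":%d},' "$primary"
  printf '{"op":"replace","path":"/spec/http/0/route/1/weight","value":%d}]\n' "$canary"
}
```

The patch could then be applied with `kubectl patch virtualservice podinfo --type=json -p "$(weights_patch 10)"`, adding `-n` with the namespace that holds the virtual service. Doing this while Steerer is managing a rollout may conflict with its own updates.
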
Primary and canary services should expose a port named `http`:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: podinfo-canary
spec:
  type: ClusterIP
  selector:
    app: podinfo-canary
  ports:
  - name: http
    port: 9898
    targetPort: 9898
```

Based on the two deployments, services and virtual service, a rollout can be defined using Steerer's custom resource:

```yaml
apiVersion: apps.weave.works/v1beta1
kind: Rollout
metadata:
  name: podinfo
  namespace: test
spec:
  targetKind: Deployment
  virtualService:
    name: podinfo
  primary:
    name: podinfo
    host: podinfo
  canary:
    name: podinfo-canary
    host: podinfo-canary
  canaryAnalysis:
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 100
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
  metrics:
  - name: istio_requests_total
    # minimum req success rate (non 5xx responses)
    # percentage (0-100)
    threshold: 99
    interval: 1m
  - name: istio_request_duration_seconds_bucket
    # maximum req duration P99
    # milliseconds
    threshold: 500
    interval: 1m
```

The canary analysis uses the following PromQL queries:

HTTP requests success rate percentage:

```sql
sum(
  rate(
    istio_requests_total{
      reporter="destination",
      destination_workload_namespace=~"$namespace",
      destination_workload=~"$workload",
      response_code!~"5.*"
    }[$interval]
  )
)
/
sum(
  rate(
    istio_requests_total{
      reporter="destination",
      destination_workload_namespace=~"$namespace",
      destination_workload=~"$workload"
    }[$interval]
  )
)
```

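
If you want to run the success rate query by hand, the sketch below fills in the `$namespace`, `$workload` and `$interval` placeholders and sends the result to the Prometheus HTTP API. `success_rate_query` and `prom_query` are hypothetical helpers written for this example, not part of Steerer.

```bash
# Hypothetical helper: fill in the template variables of the success rate query.
# Usage: success_rate_query NAMESPACE WORKLOAD INTERVAL
success_rate_query() {
  sel=$(printf 'reporter="destination",destination_workload_namespace=~"%s",destination_workload=~"%s"' "$1" "$2")
  printf 'sum(rate(istio_requests_total{%s,response_code!~"5.*"}[%s])) / sum(rate(istio_requests_total{%s}[%s]))\n' \
    "$sel" "$3" "$sel" "$3"
}

# Run an instant query against the Prometheus HTTP API (/api/v1/query).
# Usage: prom_query PROMETHEUS_URL QUERY
prom_query() {
  curl -sG "$1/api/v1/query" --data-urlencode "query=$2"
}
```

From a pod that can reach Prometheus, `prom_query http://prometheus.istio-system:9090 "$(success_rate_query test podinfo-canary 1m)"` returns a JSON document. The division yields a ratio between 0 and 1, so I assume it is multiplied by 100 before being compared with the 0-100 threshold.
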
HTTP request duration P99 (milliseconds):

```sql
histogram_quantile(0.99,
  sum(
    irate(
      istio_request_duration_seconds_bucket{
        reporter="destination",
        destination_workload=~"$workload",
        destination_workload_namespace=~"$namespace"
      }[$interval]
    )
  ) by (le)
)
```

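
The metric name ends in `_seconds_bucket`, so the raw result of the P99 query is presumably in seconds, while the threshold in the rollout resource is given in milliseconds. The sketch below shows the unit conversion and comparison under that assumption. `check_duration` is a hypothetical helper, not Steerer's code.

```bash
# Hypothetical helper: compare a P99 duration in seconds with a threshold in
# milliseconds. Prints "halt" when the duration is over the threshold and
# "advance" otherwise.
# Usage: check_duration SECONDS THRESHOLD_MS
check_duration() {
  awk -v s="$1" -v t="$2" 'BEGIN {
    ms = s * 1000
    if (ms > t) print "halt"; else print "advance"
  }'
}
```

For example, `check_duration 2.525 500` prints `halt`, while `check_duration 0.2 500` prints `advance`.
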
### Example

Create a test namespace with Istio sidecar injection enabled:

```bash
kubectl apply -f ./artifacts/namespaces/
```

Create the primary deployment and service:

```bash
kubectl apply -f ./artifacts/workloads/deployment.yaml
kubectl apply -f ./artifacts/workloads/service.yaml
```

Create the canary deployment, service and horizontal pod autoscaler:

```bash
kubectl apply -f ./artifacts/workloads/deployment-canary.yaml
kubectl apply -f ./artifacts/workloads/service-canary.yaml
kubectl apply -f ./artifacts/workloads/hpa-canary.yaml
```

Create a virtual service (replace the gateway and the internet domain with your own):

```bash
kubectl apply -f ./artifacts/workloads/virtual-service.yaml
```

Create a rollout custom resource:

```bash
kubectl apply -f ./artifacts/rollouts/podinfo.yaml
```

Rollout output:

```
kubectl -n test describe rollout/podinfo

Events:
  Type     Reason  Age   From     Message
  ----     ------  ----  ----     -------
  Normal   Synced  3m    steerer  Starting rollout for podinfo.test
  Normal   Synced  3m    steerer  Advance rollout podinfo.test weight 10
  Normal   Synced  3m    steerer  Advance rollout podinfo.test weight 20
  Normal   Synced  2m    steerer  Advance rollout podinfo.test weight 30
  Warning  Synced  3m    steerer  Halt rollout podinfo.test request duration 2.525s > 500ms
  Warning  Synced  3m    steerer  Halt rollout podinfo.test request duration 1.567s > 500ms
  Warning  Synced  3m    steerer  Halt rollout podinfo.test request duration 823ms > 500ms
  Normal   Synced  2m    steerer  Advance rollout podinfo.test weight 40
  Normal   Synced  2m    steerer  Advance rollout podinfo.test weight 50
  Normal   Synced  1m    steerer  Advance rollout podinfo.test weight 60
  Warning  Synced  1m    steerer  Halt rollout podinfo.test success rate 82.33% < 99%
  Warning  Synced  1m    steerer  Halt rollout podinfo.test success rate 87.22% < 99%
  Warning  Synced  1m    steerer  Halt rollout podinfo.test success rate 94.74% < 99%
  Normal   Synced  1m    steerer  Advance rollout podinfo.test weight 70
  Normal   Synced  55s   steerer  Advance rollout podinfo.test weight 80
  Normal   Synced  45s   steerer  Advance rollout podinfo.test weight 90
  Normal   Synced  35s   steerer  Advance rollout podinfo.test weight 100
  Normal   Synced  25s   steerer  Copying podinfo-canary.test template spec to podinfo.test
  Warning  Synced  15s   steerer  Waiting for podinfo.test rollout to finish: 1 of 2 updated replicas are available
  Normal   Synced  5s    steerer  Promotion complete! Scaling down podinfo-canary.test
```

During the rollout you can generate HTTP 500 errors and high latency to test whether Steerer pauses the rollout.

Create a tester pod and exec into it:

```bash
kubectl -n test run tester --image=quay.io/stefanprodan/podinfo:1.2.1 -- ./podinfo --port=9898
kubectl -n test exec -it tester-xx-xx -- sh
```

Generate HTTP 500 errors:

```bash
watch curl http://podinfo-canary:9898/status/500
```

Generate latency:

```bash
watch curl http://podinfo-canary:9898/delay/1
```

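
To get a rough client-side view of what the success rate check reacts to, you could tally the status codes of a batch of requests. The sketch below is an assumption-laden illustration: `probe` and `success_rate` are hypothetical helpers, and it assumes `curl` and `awk` are available in the shell you run it from.

```bash
# Hypothetical helper: read HTTP status codes (one per line) from stdin and
# print the percentage of non-5xx responses, rounded to two decimals.
success_rate() {
  awk '
    /^[1-5][0-9][0-9]$/ { total++; if ($0 !~ /^5/) ok++ }
    END { if (total > 0) printf "%.2f\n", ok * 100 / total; else print "n/a" }
  '
}

# Hypothetical helper: send N requests to a URL and print one status code per line.
# Usage: probe URL N
probe() {
  i=0
  while [ "$i" -lt "$2" ]; do
    curl -s -o /dev/null -w '%{http_code}\n' "$1"
    i=$(( i + 1 ))
  done
}
```

For example, `probe http://podinfo-canary:9898/status/500 20 | success_rate` should report a rate well below the 99% threshold. Steerer itself relies on the Istio metrics in Prometheus, so the numbers will not match exactly.
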