Compare commits

264 Commits
1.6.0 ... 1.91c

Author SHA1 Message Date
David Shay
e1db60b2b5 Requested changes to multi-arch build 2022-01-14 09:39:48 -05:00
David Shay
f3295b99ef Added support for multi-arch image build 2022-01-12 10:31:26 -05:00
Daniel Simionato
178ba93b5a Add ability to define ds annotations in helm chart 2022-01-12 07:25:11 +01:00
Christian Kotzbauer
f3ed0087d2 Merge pull request #493 from weaveworks/dependabot/github_actions/helm/chart-testing-action-2.2.0
build(deps): bump helm/chart-testing-action from 2.1.0 to 2.2.0
2022-01-07 20:41:40 +01:00
dependabot[bot]
71a273a14c build(deps): bump helm/chart-testing-action from 2.1.0 to 2.2.0
Bumps [helm/chart-testing-action](https://github.com/helm/chart-testing-action) from 2.1.0 to 2.2.0.
- [Release notes](https://github.com/helm/chart-testing-action/releases)
- [Commits](https://github.com/helm/chart-testing-action/compare/v2.1.0...v2.2.0)

---
updated-dependencies:
- dependency-name: helm/chart-testing-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-01-07 17:02:55 +00:00
Christian Kotzbauer
2b36eab0f8 Merge pull request #492 from weaveworks/feature/release-1.9.1
Prepare release 1.9.1
2022-01-06 19:13:05 +01:00
Christian Kotzbauer
aefd901b4e prepare release 1.9.1
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
2022-01-06 10:06:45 +01:00
Christian Kotzbauer
91b01b5524 Merge pull request #489 from dkulchinsky/dannyk/remove_env_values_from_logs
don't print env variable values in the logs (some are sensitive)
2022-01-05 05:55:28 +01:00
Christian Kotzbauer
f1255bff91 Merge pull request #490 from dkulchinsky/dannyk/deprecation_fix
small fix in deprecation log messages
2022-01-04 19:03:46 +01:00
Danny Kulchinsky
22a76f0da2 small fix in deprecation log messages 2022-01-04 12:23:22 -05:00
Danny Kulchinsky
b52a9587f3 don't print env variable values in the logs (some are sensitive) 2022-01-04 10:55:46 -05:00
Christian Kotzbauer
a6e1cf8191 Merge pull request #487 from weaveworks/release-1.9.0
Release 1.9.0
2021-12-17 14:14:42 +01:00
Christian Kotzbauer
d7576dce0f Merge pull request #456 from span/jsonlogging-chart
Jsonlogging chart
2021-12-17 10:33:58 +01:00
Christian Kotzbauer
661af3b042 prepare 1.9.0 2021-12-17 10:32:21 +01:00
Daniel Holbach
eec8ca1f9b Merge pull request #485 from weaveworks/dependabot/go_modules/github.com/spf13/viper-1.10.1
build(deps): bump github.com/spf13/viper from 1.10.0 to 1.10.1
2021-12-15 19:16:38 +01:00
dependabot[bot]
15356fa26d build(deps): bump github.com/spf13/viper from 1.10.0 to 1.10.1
Bumps [github.com/spf13/viper](https://github.com/spf13/viper) from 1.10.0 to 1.10.1.
- [Release notes](https://github.com/spf13/viper/releases)
- [Commits](https://github.com/spf13/viper/compare/v1.10.0...v1.10.1)

---
updated-dependencies:
- dependency-name: github.com/spf13/viper
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-12-15 17:55:30 +00:00
Daniel Holbach
7e3565a565 Merge pull request #484 from weaveworks/dependabot/go_modules/github.com/spf13/cobra-1.3.0
build(deps): bump github.com/spf13/cobra from 1.2.1 to 1.3.0
2021-12-15 18:45:36 +01:00
dependabot[bot]
a3bc03b4b9 build(deps): bump github.com/spf13/cobra from 1.2.1 to 1.3.0
Bumps [github.com/spf13/cobra](https://github.com/spf13/cobra) from 1.2.1 to 1.3.0.
- [Release notes](https://github.com/spf13/cobra/releases)
- [Changelog](https://github.com/spf13/cobra/blob/master/CHANGELOG.md)
- [Commits](https://github.com/spf13/cobra/compare/v1.2.1...v1.3.0)

---
updated-dependencies:
- dependency-name: github.com/spf13/cobra
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-12-15 17:12:47 +00:00
Daniel Holbach
22ce5a2628 Merge pull request #483 from weaveworks/dependabot/go_modules/github.com/spf13/viper-1.10.0
build(deps): bump github.com/spf13/viper from 1.9.0 to 1.10.0
2021-12-14 18:33:53 +01:00
dependabot[bot]
0f80b70478 build(deps): bump github.com/spf13/viper from 1.9.0 to 1.10.0
Bumps [github.com/spf13/viper](https://github.com/spf13/viper) from 1.9.0 to 1.10.0.
- [Release notes](https://github.com/spf13/viper/releases)
- [Commits](https://github.com/spf13/viper/compare/v1.9.0...v1.10.0)

---
updated-dependencies:
- dependency-name: github.com/spf13/viper
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-12-14 17:12:51 +00:00
Daniel Holbach
28be690849 Merge pull request #480 from weaveworks/dependabot/github_actions/nick-invision/retry-2.6.0
build(deps): bump nick-invision/retry from 2.5.1 to 2.6.0
2021-12-10 19:12:53 +01:00
dependabot[bot]
84292cc8c3 build(deps): bump nick-invision/retry from 2.5.1 to 2.6.0
Bumps [nick-invision/retry](https://github.com/nick-invision/retry) from 2.5.1 to 2.6.0.
- [Release notes](https://github.com/nick-invision/retry/releases)
- [Changelog](https://github.com/nick-invision/retry/blob/master/.releaserc.js)
- [Commits](https://github.com/nick-invision/retry/compare/v2.5.1...v2.6.0)

---
updated-dependencies:
- dependency-name: nick-invision/retry
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-12-10 17:02:55 +00:00
Christian Kotzbauer
21b54227a7 Merge pull request #479 from weaveworks/dependabot/go_modules/github.com/spf13/viper-1.9.0
build(deps): bump github.com/spf13/viper from 1.8.1 to 1.9.0
2021-12-09 18:42:24 +01:00
dependabot[bot]
8e3fb55ec4 build(deps): bump github.com/spf13/viper from 1.8.1 to 1.9.0
Bumps [github.com/spf13/viper](https://github.com/spf13/viper) from 1.8.1 to 1.9.0.
- [Release notes](https://github.com/spf13/viper/releases)
- [Commits](https://github.com/spf13/viper/compare/v1.8.1...v1.9.0)

---
updated-dependencies:
- dependency-name: github.com/spf13/viper
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-12-09 17:11:56 +00:00
Christian Kotzbauer
1a6592851e Merge pull request #459 from georgekaz/patch-1
Exclude terminated pods from the blocking mechanism
2021-12-09 14:02:49 +01:00
Christian Kotzbauer
bba3b8d83f Merge pull request #464 from dkulchinsky/viper_env_vars
bind environment variables to cobra flags with viper
2021-12-09 14:00:11 +01:00
Daniel Holbach
9c6d6a6d82 Merge pull request #476 from dholbach/fix-474
update to test against k8s 1.2{1,2,3} kind images
2021-12-08 10:34:12 +01:00
Daniel Holbach
997794eaac update to test against k8s 1.2{1,2,3} kind images
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-12-08 09:59:01 +01:00
Daniel Holbach
0763cdd95a Merge pull request #475 from dholbach/fix-473
Update k8s dependencies to 0.22.4
2021-12-07 08:40:35 +01:00
Daniel Holbach
c004566e97 ensure go version for tests
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-12-07 08:07:21 +01:00
Daniel Holbach
077ef2488e Update k8s dependencies to 0.22.4
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-12-06 15:08:54 +01:00
Daniel Holbach
06093ab53b Merge pull request #472 from dholbach/chart-1.8.2-update
update image tag to 1.8.2
2021-12-06 15:04:01 +01:00
Daniel Holbach
4d2019c07f update image tag to 1.8.2 2021-12-06 14:40:51 +01:00
Danny Kulchinsky
687aeda813 use sprintf for value in log 2021-12-02 12:05:07 -05:00
Danny Kulchinsky
acddd6b675 minor restructure and adding log for flag to env var binding 2021-12-01 20:59:12 -05:00
Danny Kulchinsky
54e7d93902 dedup const block 2021-12-01 14:50:53 -05:00
Danny Kulchinsky
2666b49d01 address review comments 2021-12-01 11:14:19 -05:00
Daniel Holbach
ff1a27ba8b Merge pull request #468 from weaveworks/fix-ghcr-login
fix ghcr.io login
2021-11-29 20:29:49 +01:00
Daniel Holbach
38ed636ecf fix ghcr.io login
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-11-29 16:59:36 +01:00
Daniel Holbach
8324b09bb9 Merge pull request #446 from weaveworks/revert-445-revert-439-feature/quay-registry
Add ghcr.io as second registry
2021-11-29 16:54:28 +01:00
Daniel Holbach
fb8677e7ac Move to GHCR as a backup for Docker Hub 2021-11-29 16:29:47 +01:00
Daniel Holbach
bdd16d4e01 Merge pull request #467 from weaveworks/dependabot/docker/cmd/kured/alpine-3.15.0
build(deps): bump alpine from 3.14 to 3.15.0 in /cmd/kured
2021-11-29 11:12:38 +01:00
dependabot[bot]
16e6d3c4d3 build(deps): bump alpine from 3.14 to 3.15.0 in /cmd/kured
Bumps alpine from 3.14 to 3.15.0.

---
updated-dependencies:
- dependency-name: alpine
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-11-29 09:51:54 +00:00
Daniel Holbach
af824bfd6a Merge pull request #466 from dholbach/follow-up-to-465
follow up to #465
2021-11-29 10:51:35 +01:00
Daniel Holbach
8264a529d6 follow up to #465
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-11-29 10:29:16 +01:00
Jean-Philippe Evrard
cd25017d67 Merge pull request #462 from jackfrancis/helm-chart-2.10.1
feat: update chart to 2.10.1 w/ 1.8.1 kured image
2021-11-27 11:18:49 +01:00
Daniel Holbach
4c1a23a047 Merge pull request #465 from dholbach/add-docker-dependabot
update docker images too
2021-11-26 09:45:14 +01:00
Daniel Holbach
8f86e1d4f8 update docker images too
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-11-26 09:12:52 +01:00
Danny Kulchinsky
79e19d84ba bind environment variables to cobra flags with viper 2021-11-25 13:53:30 -05:00
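The commit above binds environment variables to command-line flags. kured does this in Go with cobra and viper; the Python sketch below only illustrates the general idea of deriving an environment variable name from a flag name and letting an explicit flag win over the environment. The `KURED` prefix, the dash-to-underscore rule, and the helper names `env_var_name` and `resolve_flag` are assumptions made for illustration, not taken from the commit.

```python
import os

# Assumed prefix for illustration; the commit message does not state it.
ENV_PREFIX = "KURED"


def env_var_name(flag_name, prefix=ENV_PREFIX):
    """Map a flag such as 'reboot-delay' to a name such as 'KURED_REBOOT_DELAY'."""
    return f"{prefix}_{flag_name.replace('-', '_').upper()}"


def resolve_flag(flag_name, cli_value, default, environ=os.environ):
    """Return the effective value: explicit flag first, then environment, then default."""
    if cli_value is not None:
        return cli_value
    return environ.get(env_var_name(flag_name), default)


if __name__ == "__main__":
    fake_env = {"KURED_REBOOT_DELAY": "90s"}
    # No flag given on the command line, so the environment value is used.
    print(resolve_flag("reboot-delay", None, "0s", environ=fake_env))
    # An explicit flag value takes precedence over the environment.
    print(resolve_flag("reboot-delay", "30s", "0s", environ=fake_env))
```

Passing `environ` as a parameter keeps the lookup testable without touching the real process environment.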
Jack
01396db3d1 feat: update chart to 2.10.1 w/ 1.8.1 kured image 2021-11-19 09:08:57 -08:00
georgekaz
d3b59b8922 Exclude terminated pods from the blocking mechanism
Terminated pods should be excluded from blocking a reboot, as per https://github.com/weaveworks/kured/issues/227

This adds status filters to the fieldSelector to do that. I've not updated tests here but have successfully tested the exact same filter using kubectl.
2021-11-05 16:48:36 +00:00
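The commit above adds status filters to the pod field selector. As a rough illustration of what such a selector string can look like, the Python sketch below builds one from a node name and a list of phases to exclude. The function name `blocking_pod_field_selector` and the choice of excluded phases (`Succeeded`, `Failed`) are assumptions; the commit message does not list the exact filters it adds.

```python
def blocking_pod_field_selector(node_name, excluded_phases=("Succeeded", "Failed")):
    """Build a Kubernetes field selector for pods on one node,
    skipping pods whose status.phase is in excluded_phases."""
    terms = [f"spec.nodeName={node_name}"]
    # Field selectors support '!=' and combine terms with commas (logical AND).
    terms += [f"status.phase!={phase}" for phase in excluded_phases]
    return ",".join(terms)


if __name__ == "__main__":
    print(blocking_pod_field_selector("node-1"))
```

The resulting string can be tried out with `kubectl get pods --all-namespaces --field-selector <selector>`, in the same way the commit author describes testing the filter.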
Daniel Kvist
eafe2c3d98 Update README.md
Add default value for logformat.
2021-10-30 04:35:53 +02:00
Daniel Kvist
e4f1c7358c Add chart configuration for json logging 2021-10-28 10:49:44 +02:00
Daniel Holbach
348b5b4c96 Merge pull request #368 from atighineanu/proto_removed_slack
removed notifications/slack package [Merge after 1.7.0 release]
2021-10-28 08:43:27 +02:00
Christian Kotzbauer
c8a3a6ff9d Merge pull request #455 from span/jsonlogging
Support json logformatter
2021-10-27 18:24:02 +02:00
Daniel Holbach
c196d4e97f Merge pull request #457 from weaveworks/dependabot/github_actions/nick-invision/retry-2.5.1
build(deps): bump nick-invision/retry from 2.5.0 to 2.5.1
2021-10-25 19:26:47 +02:00
dependabot[bot]
efc98c8813 build(deps): bump nick-invision/retry from 2.5.0 to 2.5.1
Bumps [nick-invision/retry](https://github.com/nick-invision/retry) from 2.5.0 to 2.5.1.
- [Release notes](https://github.com/nick-invision/retry/releases)
- [Changelog](https://github.com/nick-invision/retry/blob/master/.releaserc.js)
- [Commits](https://github.com/nick-invision/retry/compare/v2.5.0...v2.5.1)

---
updated-dependencies:
- dependency-name: nick-invision/retry
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-25 17:02:51 +00:00
Daniel Kvist
b108aa4d2d Support json logformatter
This commit introduces a new flag '--log-format' that allows a user
to configure json logging on the pods. If the log-format
is not specified, the formatter will default to the existing
text formatter.
2021-10-25 14:38:53 +02:00
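The behaviour described in the commit above (a `--log-format` flag that falls back to the text formatter when unset) can be sketched as follows. This is a Python illustration using only the standard library, not kured's Go code; the accepted values `text` and `json`, the JSON field names, and the helper names are assumptions.

```python
import argparse
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (field names are illustrative)."""

    def format(self, record):
        return json.dumps({"level": record.levelname.lower(), "msg": record.getMessage()})


def build_formatter(log_format):
    """Pick a formatter by name; 'text' mirrors the pre-existing default."""
    if log_format == "json":
        return JsonFormatter()
    if log_format == "text":
        return logging.Formatter("%(levelname)s %(message)s")
    raise ValueError(f"unsupported log format: {log_format}")


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # When the flag is omitted, the existing text format stays in effect.
    parser.add_argument("--log-format", default="text", choices=["text", "json"])
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--log-format", "json"])
    handler = logging.StreamHandler()
    handler.setFormatter(build_formatter(args.log_format))
    logger = logging.getLogger("demo")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("reboot required")
```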
Christian Kotzbauer
2ae0a82510 Merge pull request #454 from weaveworks/dependabot/go_modules/github.com/prometheus/common-0.32.1
build(deps): bump github.com/prometheus/common from 0.32.0 to 0.32.1
2021-10-21 19:31:53 +02:00
dependabot[bot]
f95664156d build(deps): bump github.com/prometheus/common from 0.32.0 to 0.32.1
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.32.0 to 0.32.1.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.32.0...v0.32.1)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-21 17:11:08 +00:00
Christian Kotzbauer
891afda596 Merge pull request #453 from weaveworks/dependabot/go_modules/github.com/prometheus/common-0.32.0
build(deps): bump github.com/prometheus/common from 0.31.1 to 0.32.0
2021-10-21 09:21:07 +02:00
dependabot[bot]
2b89170417 build(deps): bump github.com/prometheus/common from 0.31.1 to 0.32.0
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.31.1 to 0.32.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.31.1...v0.32.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-20 17:10:49 +00:00
Daniel Holbach
de59c2614d Merge pull request #450 from weaveworks/dependabot/go_modules/github.com/containrrr/shoutrrr-0.5.2
build(deps): bump github.com/containrrr/shoutrrr from 0.5.1 to 0.5.2
2021-10-11 19:32:29 +02:00
dependabot[bot]
2e5cb81b4c build(deps): bump github.com/containrrr/shoutrrr from 0.5.1 to 0.5.2
Bumps [github.com/containrrr/shoutrrr](https://github.com/containrrr/shoutrrr) from 0.5.1 to 0.5.2.
- [Release notes](https://github.com/containrrr/shoutrrr/releases)
- [Changelog](https://github.com/containrrr/shoutrrr/blob/main/goreleaser.yml)
- [Commits](https://github.com/containrrr/shoutrrr/compare/v0.5.1...v0.5.2)

---
updated-dependencies:
- dependency-name: github.com/containrrr/shoutrrr
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-11 17:10:43 +00:00
Christian Kotzbauer
fde91041d5 Merge pull request #449 from weaveworks/feature/helm-1.8.0
helm: Prepare release for 1.8.0
2021-10-08 16:01:52 +02:00
Christian Kotzbauer
8a3f486ad9 feat: update to 1.8.0
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-10-08 15:40:57 +02:00
Christian Kotzbauer
513db7ce8c Merge pull request #448 from weaveworks/feature/release-1.8.0
docs: updated version table
2021-10-08 15:06:09 +02:00
Christian Kotzbauer
938cbd428c feat: also add missing prefer-no-schedule-taint
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-10-08 15:05:18 +02:00
Christian Kotzbauer
fa28b550b2 feat: add reboot-sentinel-command to helm-chart
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-10-08 14:56:30 +02:00
Christian Kotzbauer
164183e1bc fix: correct indent
ref: #447

Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-10-08 14:53:12 +02:00
Christian Kotzbauer
7d0499cc0a Merge pull request #430 from amorey/reboot-delay-documentation
Add `reboot-delay` CLI argument to docs, helm charts and manifests
2021-10-08 14:49:05 +02:00
Christian Kotzbauer
5e32864e0b Merge pull request #415 from MattJeanes/prometheus-alert-firing-option-chart
Add --alert-firing-only parameter to chart
2021-10-08 14:48:18 +02:00
Christian Kotzbauer
718faf4d31 Merge branch 'feature/helm-1.8.0' into prometheus-alert-firing-option-chart 2021-10-08 14:47:57 +02:00
Christian Kotzbauer
ac9e669b52 docs: updated version table
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-10-08 14:44:04 +02:00
Daniel Holbach
7c33ad8b6e Merge pull request #436 from weaveworks/dependabot/github_actions/guyarb/golang-test-annoations-0.5.0
Bump guyarb/golang-test-annoations from 0.4.0 to 0.5.0
2021-10-08 10:47:31 +02:00
Daniel Holbach
6f8d36e8db Merge pull request #445 from weaveworks/revert-439-feature/quay-registry
Revert "Add quay.io as second registry"
2021-10-08 10:12:13 +02:00
Daniel Holbach
688346e811 Revert "[WIP] Add quay.io as second registry" 2021-10-08 09:51:04 +02:00
Daniel Holbach
079425349d Merge pull request #444 from weaveworks/dependabot/github_actions/nick-invision/retry-2.5.0
Bump nick-invision/retry from 2.4.1 to 2.5.0
2021-10-08 09:35:50 +02:00
dependabot[bot]
d7589b16d7 Bump nick-invision/retry from 2.4.1 to 2.5.0
Bumps [nick-invision/retry](https://github.com/nick-invision/retry) from 2.4.1 to 2.5.0.
- [Release notes](https://github.com/nick-invision/retry/releases)
- [Changelog](https://github.com/nick-invision/retry/blob/master/.releaserc.js)
- [Commits](https://github.com/nick-invision/retry/compare/v2.4.1...v2.5.0)

---
updated-dependencies:
- dependency-name: nick-invision/retry
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-07 17:03:28 +00:00
atighineanu
bab1425e1a removed notifications/slack package
In this PR the slack-hook-url is translated into shoutrrr syntax.
Therefore, the slack package as well as the checks for slack-hook-url
in the drain and reboot functions are removed.
Also added a unit test for flagCheck(); this function also checks the
(slack) URL syntax.
2021-10-07 10:37:47 +02:00
Daniel Holbach
4e1c05c5e3 Merge pull request #443 from weaveworks/feature/contrib-docs
doc: some clarification of release-docs
2021-10-01 08:45:17 +02:00
Christian Kotzbauer
2c7ca8261f doc: some clarification for release-docs
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-09-30 16:52:40 +02:00
Daniel Holbach
6ebf9a96f9 Merge pull request #439 from weaveworks/feature/quay-registry
[WIP] Add quay.io as second registry
2021-09-29 13:34:50 +02:00
Daniel Holbach
adffa11796 Merge pull request #440 from jackfrancis/maintainers-add-jackfrancis
Add jackfrancis to MAINTAINERS
2021-09-29 12:05:25 +02:00
Daniel Holbach
1152d72d51 Merge pull request #441 from weaveworks/dependabot/go_modules/github.com/prometheus/common-0.31.1
Bump github.com/prometheus/common from 0.31.0 to 0.31.1
2021-09-29 10:13:46 +02:00
dependabot[bot]
fb6a224f66 Bump github.com/prometheus/common from 0.31.0 to 0.31.1
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.31.0 to 0.31.1.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.31.0...v0.31.1)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-09-28 17:10:41 +00:00
Jack
c671dce161 Add jackfrancis to MAINTAINERS 2021-09-28 09:08:05 -07:00
Christian Kotzbauer
f8fc6e5017 build: add quay.io as second registry
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-09-28 17:42:49 +02:00
Daniel Holbach
effbf62987 Merge pull request #428 from weaveworks/k8s-1.21
Updated Kubernetes to 1.21
2021-09-28 10:15:50 +02:00
Daniel Holbach
6423bf0069 update to go 1.16 (follow the lead of k8s 1.21)
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-09-28 09:06:35 +02:00
Christian Kotzbauer
9c81caa92e build: added k8s@1.22 and dropped k8s@1.19
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-09-28 09:06:35 +02:00
Christian Kotzbauer
978acba030 feat: updated to k8s@1.21
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-09-28 09:06:35 +02:00
Daniel Holbach
acef34e916 Merge pull request #437 from weaveworks/dependabot/go_modules/github.com/prometheus/common-0.31.0
Bump github.com/prometheus/common from 0.30.0 to 0.31.0
2021-09-28 08:59:22 +02:00
Daniel Holbach
f72ef8c2ca Merge pull request #438 from jackfrancis/kubectl-cordon-context
fix: don't use nil context in drain helper
2021-09-28 08:56:02 +02:00
Jack
3c2508050d fix: don't use nil context in drain helper 2021-09-27 12:43:20 -07:00
dependabot[bot]
483a5d8211 Bump github.com/prometheus/common from 0.30.0 to 0.31.0
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.30.0 to 0.31.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.30.0...v0.31.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-09-27 17:09:51 +00:00
dependabot[bot]
9b89a8c0fc Bump guyarb/golang-test-annoations from 0.4.0 to 0.5.0
Bumps [guyarb/golang-test-annoations](https://github.com/guyarb/golang-test-annoations) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/guyarb/golang-test-annoations/releases)
- [Commits](https://github.com/guyarb/golang-test-annoations/compare/v0.4.0...v0.5.0)

---
updated-dependencies:
- dependency-name: guyarb/golang-test-annoations
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-09-27 17:02:45 +00:00
Christian Kotzbauer
b5a4bf432c Merge pull request #360 from cnmcavoy/cnmcavoy/force-reboot-timeout-helm
Add force-reboot and drain timeouts to chart config and ds
2021-09-15 18:45:37 +02:00
Cameron McAvoy
cee15cfc32 Add force-reboot and drain timeouts to chart config and ds 2021-09-15 10:42:50 -05:00
Christian Kotzbauer
b2b1940435 fix: do not use array for stale action (#433) 2021-09-10 09:52:44 +02:00
Daniel Holbach
a9eb139f60 Merge pull request #431 from weaveworks/dependabot/go_modules/github.com/containrrr/shoutrrr-0.5.1
Bump github.com/containrrr/shoutrrr from 0.5.0 to 0.5.1
2021-09-02 08:01:45 +02:00
dependabot[bot]
d6e478ec6b Bump github.com/containrrr/shoutrrr from 0.5.0 to 0.5.1
Bumps [github.com/containrrr/shoutrrr](https://github.com/containrrr/shoutrrr) from 0.5.0 to 0.5.1.
- [Release notes](https://github.com/containrrr/shoutrrr/releases)
- [Changelog](https://github.com/containrrr/shoutrrr/blob/main/goreleaser.yml)
- [Commits](https://github.com/containrrr/shoutrrr/compare/v0.5.0...v0.5.1)

---
updated-dependencies:
- dependency-name: github.com/containrrr/shoutrrr
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-09-01 17:11:21 +00:00
Daniel Holbach
0955403470 Merge pull request #429 from weaveworks/alpine-3.14
build: updated to alpine@3.14
2021-08-30 10:54:35 +02:00
Andres Morey
a3f9796305 Add reboot-delay CLI argument to docs, manifests and helm charts 2021-08-26 16:26:21 +03:00
Christian Kotzbauer
9473f831be build: updated to alpine@3.14
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-08-25 20:19:03 +02:00
Daniel Holbach
3682eb36de Merge pull request #418 from amorey/reboot-delay
Add `reboot-delay` command line argument
2021-08-25 18:12:03 +02:00
Daniel Holbach
3900ee8876 Merge pull request #422 from weaveworks/dependabot/go_modules/github.com/containrrr/shoutrrr-0.5.0
Bump github.com/containrrr/shoutrrr from 0.4.4 to 0.5.0
2021-08-23 11:37:37 +02:00
dependabot[bot]
4c31084be8 Bump github.com/containrrr/shoutrrr from 0.4.4 to 0.5.0
Bumps [github.com/containrrr/shoutrrr](https://github.com/containrrr/shoutrrr) from 0.4.4 to 0.5.0.
- [Release notes](https://github.com/containrrr/shoutrrr/releases)
- [Changelog](https://github.com/containrrr/shoutrrr/blob/main/goreleaser.yml)
- [Commits](https://github.com/containrrr/shoutrrr/compare/v0.4.4...v0.5.0)

---
updated-dependencies:
- dependency-name: github.com/containrrr/shoutrrr
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-08-16 17:08:06 +00:00
David Höld
6c9ee57dc1 Change default updateStrategy to RollingUpdate (#420)
Incrementally update Pods by default when changing the DaemonSet spec.

Fixes #413

Co-authored-by: David Hoeld <david.hoeld@fujitsu.com>
2021-08-06 09:38:37 +02:00
Andres Morey
3c5eb968d3 Add reboot-delay command line argument
Currently, kured issues the system reboot command immediately after
kubectl drain finishes.

This is a problem for processes that need extra time to finish but aren't
running on pods and therefore aren't controlled by kubectl drain (e.g.
de-registering nodes from external load balancers).

This patch solves the problem by introducing a `reboot-delay` command
line argument that can be used to add a delay after kubectl drain
finishes but before the reboot command is issued.
2021-08-03 16:48:25 +03:00
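The ordering described in the commit above (drain, then an optional pause, then the reboot command) can be sketched like this. It is a Python illustration of the sequence only, not kured's Go implementation; the function name `drain_then_reboot` and its parameters are hypothetical, and the sleep function is injectable so the example runs instantly.

```python
import time


def drain_then_reboot(drain, reboot, reboot_delay_seconds=0.0, sleep=time.sleep):
    """Run drain, wait for the configured delay, then issue the reboot.

    A delay of 0 keeps the earlier behaviour: reboot right after the drain.
    """
    drain()
    if reboot_delay_seconds > 0:
        # Gives out-of-cluster work (e.g. load balancer de-registration) time to finish.
        sleep(reboot_delay_seconds)
    reboot()


if __name__ == "__main__":
    events = []
    drain_then_reboot(
        drain=lambda: events.append("drain"),
        reboot=lambda: events.append("reboot"),
        reboot_delay_seconds=30,
        sleep=lambda seconds: events.append(("sleep", seconds)),  # fake sleep for the demo
    )
    print(events)
```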
Jean-Philippe Evrard
54c0e4e25f Merge pull request #410 from MattJeanes/prometheus-alert-firing-option
Add --alert-firing-only parameter to only consider firing alerts
2021-07-28 09:02:44 +02:00
Matt Jeanes
afac9d435a Add --alert-firing-only parameter to chart 2021-07-27 11:27:08 +01:00
Matt Jeanes
6af3f1abc1 Add --alert-firing-only parameter to only consider firing alerts 2021-07-27 11:23:10 +01:00
dependabot[bot]
a48da239bc Bump github.com/prometheus/common from 0.29.0 to 0.30.0 (#414)
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.29.0 to 0.30.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.29.0...v0.30.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-07-27 08:03:42 +02:00
SimeonPoot
c7d5810503 Restructuring Prometheus client, added unit-tests to regex-queries active alerts (#386)
* prometheus labels incl tests

* enable label in main, add log, docs

* revert the option to query by label

* revert the option to query by label

* instantiate PromClient by func, whitespace removal

* revert whitespace fix for readability.

* revert removal of newlines for readability

* rename New to NewPromClient to improve readability

Co-authored-by: simp <simp@saxobank.com>
2021-07-27 07:09:46 +02:00
Renaud Hager
6e16e993d9 Added possibility to mount volumes (#407)
* Added possibility to mount volumes

* Added a new line at the end of the file.

* Added a new line at the end of the file.

* Updated README.md
2021-07-26 13:19:02 +02:00
Daniel Holbach
24f4925b3f Merge pull request #408 from jackfrancis/chart-2.7.1-reboot-default
fix: common default reboot command for code and chart
2021-07-16 09:55:33 +02:00
Jack Francis
c0333d186e fix: common default reboot command for code and chart 2021-07-15 12:34:32 -07:00
Jean-Philippe Evrard
7a2b4a6a1a Merge pull request #405 from weaveworks/dependabot/github_actions/actions/stale-4
Bump actions/stale from 3.0.19 to 4
2021-07-14 19:28:23 +02:00
dependabot[bot]
fb7a7feb15 Bump actions/stale from 3.0.19 to 4
Bumps [actions/stale](https://github.com/actions/stale) from 3.0.19 to 4.
- [Release notes](https://github.com/actions/stale/releases)
- [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/stale/compare/v3.0.19...v4)

---
updated-dependencies:
- dependency-name: actions/stale
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-07-14 17:02:48 +00:00
Daniel Holbach
ffddfd7add Merge pull request #402 from piksel/patch-1
link to versioned shoutrrr docs
2021-07-05 11:33:49 +02:00
Daniel Holbach
a0bc7daa32 Merge pull request #401 from weaveworks/dependabot/go_modules/github.com/spf13/cobra-1.2.1
Bump github.com/spf13/cobra from 1.1.3 to 1.2.1
2021-07-05 10:13:39 +02:00
nils måsén
fd6f520b6e link to versioned shoutrrr docs
shoutrrr now has versioned docs, which allow linking directly to the version that matches the one you use.
Changes should always be backwards compatible, but not the other way around.
2021-07-04 03:19:25 +02:00
dependabot[bot]
c2f275ebd0 Bump github.com/spf13/cobra from 1.1.3 to 1.2.1
Bumps [github.com/spf13/cobra](https://github.com/spf13/cobra) from 1.1.3 to 1.2.1.
- [Release notes](https://github.com/spf13/cobra/releases)
- [Changelog](https://github.com/spf13/cobra/blob/master/CHANGELOG.md)
- [Commits](https://github.com/spf13/cobra/compare/v1.1.3...v1.2.1)

---
updated-dependencies:
- dependency-name: github.com/spf13/cobra
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-07-02 17:11:01 +00:00
Daniel Holbach
01b0ca8cea Merge pull request #399 from weaveworks/dependabot/github_actions/helm/kind-action-1.2.0
Bump helm/kind-action from 1.1.0 to 1.2.0
2021-07-01 08:21:23 +02:00
dependabot[bot]
aa45139b80 Bump helm/kind-action from 1.1.0 to 1.2.0
Bumps [helm/kind-action](https://github.com/helm/kind-action) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/helm/kind-action/releases)
- [Commits](https://github.com/helm/kind-action/compare/v1.1.0...v1.2.0)

---
updated-dependencies:
- dependency-name: helm/kind-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-06-30 17:01:53 +00:00
Jean-Philippe Evrard
1654b75ec4 Merge pull request #396 from dholbach/fix-stale
our 'good first issue' issue label has no '-', add 'keep'
2021-06-23 18:11:30 +02:00
Daniel Holbach
e4da44a774 our 'good first issue' issue label has no '-', add 'keep'
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-06-22 15:33:27 +02:00
Jean-Philippe Evrard
e301908ae8 Merge pull request #391 from weaveworks/dependabot/go_modules/github.com/prometheus/common-0.29.0
Bump github.com/prometheus/common from 0.25.0 to 0.29.0
2021-06-20 11:11:45 +02:00
Renaud Hager
f442c6b632 Added rebootCommand values (#394)
* Added rebootCommand values

* Increased chart version from 2.6.0 to 2.7.0

* Updated README.md

* Added a space before a comment.
2021-06-17 18:14:09 +02:00
Daniel Holbach
8fc0a9daf2 Merge pull request #392 from weaveworks/dependabot/github_actions/nick-invision/retry-2.4.1
Bump nick-invision/retry from 2.4.0 to 2.4.1
2021-06-14 16:23:33 +02:00
dependabot[bot]
4d783e4321 Bump nick-invision/retry from 2.4.0 to 2.4.1
Bumps [nick-invision/retry](https://github.com/nick-invision/retry) from 2.4.0 to 2.4.1.
- [Release notes](https://github.com/nick-invision/retry/releases)
- [Changelog](https://github.com/nick-invision/retry/blob/master/.releaserc.js)
- [Commits](https://github.com/nick-invision/retry/compare/v2.4.0...v2.4.1)

---
updated-dependencies:
- dependency-name: nick-invision/retry
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-06-11 06:44:51 +00:00
dependabot[bot]
11f077f689 Bump github.com/prometheus/common from 0.25.0 to 0.29.0
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.25.0 to 0.29.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.25.0...v0.29.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-06-11 05:26:37 +00:00
Daniel Holbach
807b727ab3 Merge pull request #382 from dkulchinsky/fix_deprecation
fix slack deprecation notice
2021-05-31 10:03:04 +02:00
Danny Kulchinsky
c826d73695 fix slack deprecation notice 2021-05-28 13:52:01 -04:00
Daniel Holbach
5193f2de16 Merge pull request #379 from weaveworks/dependabot/github_actions/helm/chart-testing-action-2.1.0
Bump helm/chart-testing-action from 2.0.1 to 2.1.0
2021-05-26 08:59:12 +02:00
dependabot[bot]
310c6c114d Bump helm/chart-testing-action from 2.0.1 to 2.1.0
Bumps [helm/chart-testing-action](https://github.com/helm/chart-testing-action) from 2.0.1 to 2.1.0.
- [Release notes](https://github.com/helm/chart-testing-action/releases)
- [Commits](https://github.com/helm/chart-testing-action/compare/v2.0.1...v2.1.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-05-26 05:13:46 +00:00
Christian Kotzbauer
e1017f47fb Merge pull request #353 from spingel/release-lock-delay-chart
Add lockReleaseDelay parameter to helm chart
2021-05-20 13:55:54 +02:00
Steffen Pingel
42f69c7b1e sort parameters alphabetically 2021-05-20 13:28:12 +02:00
Steffen Pingel
e3f4a88a07 Add documentation for lockReleaseDelay parameter 2021-05-20 13:26:53 +02:00
Steffen Pingel
48dc84b3e6 Add lockReleaseDelay parameter to helm chart 2021-05-19 22:06:25 +02:00
Christian Kotzbauer
816c732f39 Merge pull request #338 from atighineanu/master
update chart definition to include --notify-url
2021-05-19 19:09:53 +02:00
Christian Kotzbauer
0bd22c7c56 Merge branch 'main' into master 2021-05-19 18:49:37 +02:00
Christian Kotzbauer
2850417e48 doc: update image-version 2021-05-19 18:48:51 +02:00
Daniel Holbach
4f8e9a0761 Merge pull request #377 from weaveworks/release-1.7.0
Release 1.7.0: Compatibility docs
2021-05-19 16:01:50 +02:00
Christian Kotzbauer
0cbc2d58d2 doc: add compat-line for 1.7.0 2021-05-19 15:17:02 +02:00
Daniel Holbach
11a62c8ce8 Merge pull request #349 from dholbach/fix-347
Update test matrix to latest 3 sets of k8s releases
2021-05-19 10:43:01 +02:00
Daniel Holbach
89d1fe497c use latest kind
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-05-19 10:20:06 +02:00
Daniel Holbach
870329c7b4 Bounce kubernetes testing versions
This updates the test matrix to the latest set of 3 minor k8s releases

Fixes: #347

Co-Authored-By: Jean-Philippe Evrard <open-source@a.spamming.party>
2021-05-19 10:17:46 +02:00
Daniel Holbach
78bb9d6c14 Merge pull request #376 from weaveworks/dependabot/go_modules/github.com/prometheus/common-0.25.0
Bump github.com/prometheus/common from 0.24.0 to 0.25.0
2021-05-19 10:16:49 +02:00
Daniel Holbach
c035259d0a Merge pull request #374 from weaveworks/dependabot/github_actions/actions/stale-3.0.19
Bump actions/stale from 3.0.18 to 3.0.19
2021-05-19 10:16:26 +02:00
dependabot[bot]
d08b42933d Bump github.com/prometheus/common from 0.24.0 to 0.25.0
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.24.0 to 0.25.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.24.0...v0.25.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-05-19 05:45:01 +00:00
dependabot[bot]
729fa658dc Bump actions/stale from 3.0.18 to 3.0.19
Bumps [actions/stale](https://github.com/actions/stale) from 3.0.18 to 3.0.19.
- [Release notes](https://github.com/actions/stale/releases)
- [Commits](https://github.com/actions/stale/compare/v3.0.18...v3.0.19)

Signed-off-by: dependabot[bot] <support@github.com>
2021-05-18 07:50:09 +00:00
Daniel Holbach
d7377bff1b update golang.org/x/crypto - break out of #349
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-05-18 09:38:38 +02:00
Daniel Holbach
42e4c317ae Merge pull request #369 from weaveworks/dependabot/go_modules/github.com/prometheus/common-0.24.0
Bump github.com/prometheus/common from 0.23.0 to 0.24.0
2021-05-11 09:08:50 +02:00
dependabot[bot]
5061a611a8 Bump github.com/prometheus/common from 0.23.0 to 0.24.0
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.23.0 to 0.24.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.23.0...v0.24.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-05-11 05:17:02 +00:00
Jean-Philippe Evrard
eca6da173c Clarify and simplify tests
Without this, we get multiple questions about our testing.
This should help clarify the tests and our coverage by:
- Simplifying our coverage
- Documenting better the purpose of each workflow file
- Documenting our testing and development activities better.
2021-05-04 11:24:20 +02:00
Jean-Philippe Evrard
7582e166be Merge pull request #367 from weaveworks/dependabot/go_modules/github.com/prometheus/common-0.23.0
Bump github.com/prometheus/common from 0.18.0 to 0.23.0
2021-05-04 08:43:27 +02:00
Jean-Philippe Evrard
de23444a5f Merge pull request #366 from papanito/papanito/update-docu-for-ms-teams
docu: update url for ms teams notifications, fixes #362
2021-05-04 08:42:54 +02:00
dependabot[bot]
4d5ea21db3 Bump github.com/prometheus/common from 0.18.0 to 0.23.0
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.18.0 to 0.23.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.18.0...v0.23.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-04-28 16:16:32 +00:00
papanito
bb56c731bb docu: update url for ms teams notifications, fixes #362 2021-04-28 09:48:23 +02:00
Daniel Holbach
ea6844d315 Merge pull request #365 from evrardjp/fix-kind-action
Use stable kind-action
2021-04-28 08:52:08 +02:00
Jean-Philippe Evrard
247e6f6c70 Use stable kind-action
We are relying on master, which might break anytime (or in this
case, moved to another branch).

Instead we should rely on a stable version, and unfreeze if
necessary. Dependabot helps us maintain those releases anyway.
2021-04-27 10:11:16 +02:00
Jean-Philippe Evrard
43a7a1a1ca Merge pull request #352 from spingel/release-lock-delay
Add parameter for delaying release of lock
2021-04-21 11:42:09 +02:00
Jean-Philippe Evrard
803ecef1de Merge pull request #324 from weaveworks/dependabot/go_modules/github.com/prometheus/client_golang-1.10.0
Bump github.com/prometheus/client_golang from 1.8.0 to 1.10.0
2021-04-21 11:33:49 +02:00
dependabot[bot]
0eb318c1b2 Bump github.com/prometheus/client_golang from 1.8.0 to 1.10.0
Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.8.0 to 1.10.0.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/master/CHANGELOG.md)
- [Commits](https://github.com/prometheus/client_golang/compare/v1.8.0...v1.10.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-04-21 08:22:04 +00:00
Daniel Holbach
6a7494fda5 Merge pull request #363 from weaveworks/dependabot/go_modules/github.com/containrrr/shoutrrr-0.4.4
Bump github.com/containrrr/shoutrrr from 0.4.3 to 0.4.4
2021-04-21 08:27:55 +02:00
dependabot[bot]
7b44fd2eb8 Bump github.com/containrrr/shoutrrr from 0.4.3 to 0.4.4
Bumps [github.com/containrrr/shoutrrr](https://github.com/containrrr/shoutrrr) from 0.4.3 to 0.4.4.
- [Release notes](https://github.com/containrrr/shoutrrr/releases)
- [Changelog](https://github.com/containrrr/shoutrrr/blob/main/goreleaser.yml)
- [Commits](https://github.com/containrrr/shoutrrr/compare/v0.4.3...v0.4.4)

Signed-off-by: dependabot[bot] <support@github.com>
2021-04-21 05:48:57 +00:00
Daniel Holbach
3f322dfbb2 Merge pull request #361 from weaveworks/dependabot/go_modules/github.com/containrrr/shoutrrr-0.4.3
Bump github.com/containrrr/shoutrrr from 0.4.2 to 0.4.3
2021-04-20 08:37:34 +02:00
dependabot[bot]
4a11a95b86 Bump github.com/containrrr/shoutrrr from 0.4.2 to 0.4.3
Bumps [github.com/containrrr/shoutrrr](https://github.com/containrrr/shoutrrr) from 0.4.2 to 0.4.3.
- [Release notes](https://github.com/containrrr/shoutrrr/releases)
- [Changelog](https://github.com/containrrr/shoutrrr/blob/main/goreleaser.yml)
- [Commits](https://github.com/containrrr/shoutrrr/compare/v0.4.2...v0.4.3)

Signed-off-by: dependabot[bot] <support@github.com>
2021-04-20 05:49:19 +00:00
Jean-Philippe Evrard
0b759a9ff6 Update kured-ds.yaml
Without this patch, it's not clear that we added command line
arguments recently. This should expose our latest changes in the
future released manifest.
2021-04-14 19:52:25 +02:00
Daniel Holbach
496d2b26d8 Merge pull request #354 from evrardjp/test-prom
Add prometheus export metrics functional testing
2021-04-14 10:11:26 +02:00
Daniel Holbach
c1a9de6622 Merge pull request #355 from evrardjp/fix-linter-false-positive
Reduce false positives
2021-04-14 10:10:54 +02:00
Jean-Philippe Evrard
79f22cee67 Merge branch 'main' into release-lock-delay 2021-04-14 09:48:28 +02:00
Jean-Philippe Evrard
83415d0e59 Reduce false positives in chart testing
Without this change, the "Test helm chart (install) action" will
rightfully succeed when our helm chart gets installed and has
no syntax issues. However, it doesn't test if kured is properly
installed. For example, the helm chart can try to install a
yet unpublished image, and our test will succeed, as the syntax
is still valid.

This is a problem, as everything looks green, but it's not
effectively working. Our other jobs are focusing on code changes,
so they rightfully override the image tag, which is not what
we want in this "Test helm chart" action.

This fixes it by adding an extra job in the workflow, depending
on the chart testing.
2021-04-13 17:20:06 +02:00
Jean-Philippe Evrard
8046977d1b Merge pull request #341 from cnmcavoy/cnmcavoy/force-reboot-timeout
Add force-reboot after force-timeout duration has been exceeded
2021-04-13 16:47:41 +02:00
Jean-Philippe Evrard
240a669727 Add prometheus export metrics functional testing
Without this, we can't know if the exposed prometheus metrics
behave properly.

This is a problem, as the only way we can evaluate the success
(right now), is a compilation success or failure from kured.
While this is a good start, it doesn't translate to what we
claim to offer: A boolean showing if a reboot is required.

This fixes it by creating a new github action workflow testing
if the float64 gauge is properly showing 0 for no reboot, 1 for reboot.
This is done by exposing the metrics endpoint through a node port.
A helm chart change was required to have the ability to expose
the service on a node port. We connect to the kind node through
docker in the `tests/test-metrics.sh`, where we curl the nodeport,
extract the only relevant metric, and compare it to the expected result.
2021-04-13 16:17:42 +02:00
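The check described above (fetch the metrics, extract the single relevant gauge, compare it to the expected value) could be sketched roughly as follows. This is not the contents of `tests/test-metrics.sh`: the metric name `kured_reboot_required`, the sample payload, and the helper `metric_value` are assumptions for illustration, and the payload is inlined so no curl against a node port is needed.

```shell
# Hypothetical helper: reads Prometheus text exposition format on stdin and
# prints the value of the first sample whose line starts with the given name.
# Note: a prefix match would also catch longer metric names sharing the prefix.
metric_value() {
  awk -v name="$1" '$0 !~ /^#/ && index($0, name) == 1 { print $NF; exit }'
}

# Assumed sample payload; a real test would fetch this from the node port.
sample='# HELP kured_reboot_required OS requires reboot due to software updates.
# TYPE kured_reboot_required gauge
kured_reboot_required{node="kind-worker"} 1'

value=$(printf '%s\n' "$sample" | metric_value kured_reboot_required)

# The gauge is expected to be 0 for "no reboot" and 1 for "reboot".
if [ "$value" = "1" ]; then
  echo "reboot required"
else
  echo "no reboot required"
fi
```

Keeping the extraction in a small function makes it easy to run the same comparison against both expected states (0 and 1).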
Steffen Pingel
f7b3de36a6 Add parameter for delaying release of lock
This supports throttling of reboots across the cluster
and allows rebooted nodes to reschedule pods, e.g.
to synchronize replicated state before rebooting the next node.
2021-04-13 10:14:14 +02:00
Jean-Philippe Evrard
4c4508a586 Merge pull request #342 from jackfrancis/retry-daemonset-get
chore: retry daemonset get operations
2021-04-13 09:50:45 +02:00
Jean-Philippe Evrard
4e4c29aec0 Merge pull request #350 from dholbach/update-k8s-deps
update to latest k8s deps of 1.20 branch
2021-04-12 11:29:05 +02:00
Daniel Holbach
59d5266005 update to latest k8s deps of 1.20 branch
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-04-12 11:05:07 +02:00
Cameron McAvoy
25dcf3cb12 Expose SkipWaitForDeleteTimeoutSeconds and explicitly return when cordoning fails 2021-04-08 09:52:15 -05:00
Cameron McAvoy
5a86ef40e8 Update the default drain timeout to be infinite 2021-04-07 17:17:33 -05:00
Cameron McAvoy
2400f34cc0 Don't panic if the cordon fails and force-reboot is true 2021-04-07 14:58:21 -05:00
Cameron McAvoy
8db5650510 Refactor force-drain to be a drain-timeout in general 2021-04-07 12:57:01 -05:00
Jack Francis
390f6e9f99 chore: retry daemonset get operations 2021-04-07 09:27:05 -07:00
Cameron McAvoy
65292983f2 Add force-reboot after force-timeout duration has been exceeded 2021-04-07 09:39:01 -05:00
atighineanu
120bf713c0 update chart definition to include --notify-url 2021-04-07 13:26:02 +02:00
Daniel Holbach
d2c9ef8cba Merge pull request #336 from weaveworks/dependabot/go_modules/github.com/containrrr/shoutrrr-0.4.2
Bump github.com/containrrr/shoutrrr from 0.4.1 to 0.4.2
2021-04-07 11:14:55 +02:00
dependabot[bot]
9030f56648 Bump github.com/containrrr/shoutrrr from 0.4.1 to 0.4.2
Bumps [github.com/containrrr/shoutrrr](https://github.com/containrrr/shoutrrr) from 0.4.1 to 0.4.2.
- [Release notes](https://github.com/containrrr/shoutrrr/releases)
- [Changelog](https://github.com/containrrr/shoutrrr/blob/main/goreleaser.yml)
- [Commits](https://github.com/containrrr/shoutrrr/compare/v0.4.1...v0.4.2)

Signed-off-by: dependabot[bot] <support@github.com>
2021-04-07 08:48:02 +00:00
Jean-Philippe Evrard
1c13476b49 Update deps
This is the result of a go mod tidy.
It should clarify our dependencies.
2021-04-07 10:43:59 +02:00
Jean-Philippe Evrard
cd7976ce4f Add chart-testing target-branch
Without this patch, chart-testing is using the branch named
"master" by default.

This is a problem, as we just renamed our development branch
"main" instead of "master".

This should fix it by pointing to the right branch.
2021-04-07 10:43:43 +02:00
Jean-Philippe Evrard
8dfe5f2486 Merge pull request #340 from dholbach/update-dev-docs
Update dev docs
2021-04-06 17:14:58 +02:00
Daniel Holbach
f1c5608bcd Merge pull request #339 from evrardjp/fix-gh-action-cancelling
Update github actions
2021-04-06 17:12:41 +02:00
Daniel Holbach
c2122f3924 update Dev docs to latest
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-04-06 16:40:41 +02:00
Jean-Philippe Evrard
babc9095ef Update github actions
Without this patch, github actions are lagging behind.
This should improve our coverage.
2021-04-06 15:26:33 +02:00
Daniel Holbach
5305d7b34d Merge pull request #337 from dholbach/change-to-main-branch
Change default branch to 'main'.
2021-04-06 15:00:54 +02:00
atighineanu
9583df2e50 update chart definition to include --notify-url 2021-04-06 13:19:38 +02:00
Daniel Holbach
56a26a2f25 Change default branch to 'main'.
- Make markdownlint happier in a couple of places.
	- Rename '*-master-*' files
	- Change default branches of some other projects
	  we rely on. They moved to 'main' as well.
	- Standardise version of actions/checkout.
	- Update last release in README to 1.6.1.
	- Bump chart version.

	Eventually closes: #252

Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-04-06 12:46:12 +02:00
Jean-Philippe Evrard
3fa1f3feec Merge pull request #335 from weaveworks/helm-app-version
Use chart appVersion as default image-tag
2021-04-02 10:06:06 +02:00
Christian Kotzbauer
21fdba4ef0 feat: use chart appVersion as default image-tag
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-04-02 09:41:37 +02:00
Jean-Philippe Evrard
4d45fa8bdb Fix invoke reboot for custom commands
Without this patch, the rebootCommand passed to invokeReboot is
ignored, and the command used for reboot is always systemctl reboot.

This is a problem, as we are aiming for flexible commands for this
release.

This fixes it by restoring the previous behaviour before commit
[1] happened.

[1]: 694957d56e
2021-04-02 09:15:59 +02:00
Jean-Philippe Evrard
e09359e46c Merge pull request #330 from weaveworks/dependabot/github_actions/guyarb/golang-test-annoations-v0.4.0
Bump guyarb/golang-test-annoations from v0.3.0 to v0.4.0
2021-03-29 15:53:11 +02:00
Daniel Holbach
770eb1e4f8 Merge pull request #315 from atighineanu/master
Implement universal notification mechanism (NEW)
2021-03-29 15:21:12 +02:00
atighineanu
694957d56e Implement universal notification mechanism
This patch gives the possibility to send notifications
 across different technologies. Also, this patch makes
 slack-hook-url, slack-username and slack-channel
 deprecated (informed by a warning).
 Also, updated the documentation (Readme).
2021-03-29 11:26:18 +02:00
dependabot[bot]
85c42fdb81 Bump guyarb/golang-test-annoations from v0.3.0 to v0.4.0
Bumps [guyarb/golang-test-annoations](https://github.com/guyarb/golang-test-annoations) from v0.3.0 to v0.4.0.
- [Release notes](https://github.com/guyarb/golang-test-annoations/releases)
- [Commits](https://github.com/guyarb/golang-test-annoations/compare/v0.3.0...48645c385003e0c362bf954d4018895be76f1d3d)

Signed-off-by: dependabot[bot] <support@github.com>
2021-03-29 09:19:36 +00:00
Jean-Philippe Evrard
3671c27e37 Add go tests
Without this patch, go test bugs can appear without getting caught,
neither in periodics, nor in PRs.

This should fix it.
2021-03-29 10:26:38 +02:00
Jean-Philippe Evrard
5930d733f8 Fix the Fatal calls using formatting
Without this, go test will rightfully fail.

This is a problem, as we don't have go test enabled, but we want
to have this in the future.

This should fix it.
2021-03-29 09:50:56 +02:00
Jean-Philippe Evrard
fd63e9a74b Add flexible commands parameters
Without this patch, you cannot configure the reboot
command to use, or use another command to trigger
a reboot.

This is a problem, as multiple users have asked for
it in the past, and we are lacking flexibility.

This fixes it by introducing two new parameters,
- one to provide a custom reboot command.
  This should help people running kured on
  non systemd OS
- one to provide a custom sentinel command.
  This should help people running non Ubuntu OS,
  as they can directly use their command instead of
  generating a file (useful for CentOS/SUSE)

For this, several refactors had to be done, to
remove global state in some functions. Making those
functions closer to "pure functions" helps us
increase our test coverage here and later.

As commandReboot was very close to rebootCommand,
the function to reboot the node has been renamed
to invokeReboot.
2021-03-29 09:50:56 +02:00
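A minimal sketch of the idea behind the two new parameters, assuming that a sentinel command signals "reboot required" through an exit status of 0. The function names, the sentinel file path, and the dry-run behaviour are illustrative assumptions, not kured's actual Go code or flags; only `systemctl reboot` is named in this commit log.

```shell
# Runs the sentinel command given as arguments; in this sketch an exit
# status of 0 is taken to mean "a reboot is required" (an assumption).
reboot_required() {
  "$@" >/dev/null 2>&1
}

# Dry-run stand-in for the reboot step: it prints the configured command
# instead of executing it, so the sketch is safe to run anywhere.
invoke_reboot() {
  echo "would run: $*"
}

# Example wiring; both commands are illustrative, not kured's defaults.
if reboot_required test -f /var/run/reboot-required; then
  invoke_reboot /bin/systemctl reboot
else
  echo "no reboot required"
fi
```

Because both commands are passed in as arguments rather than read from global state, each function can be exercised on its own, which is the testing benefit the commit message points at.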
Jean-Philippe Evrard
837bd4eb2a Refactor reboot blocks
Without this patch, we rely on global state in many functions for
which we check the reboot blockers.

This is a problem, as it's harder to test.

This patch fixes it by refactoring the reboot blockers. This also
includes a first series of unit tests for our main.
2021-03-29 09:50:56 +02:00
Jean-Philippe Evrard
2a95f0b6c8 Fix periodic jobs
Without this patch, the version of 1.20 is taken in jobs as 1.2.
This is a problem, as it breaks all jobs, because there is no
file to provision a cluster with kubernetes 1.2 (and we shouldn't
do this!)

This fixes it by ensuring there is no mangling of the version
strings, and therefore the right file is used.
2021-03-24 14:29:26 +01:00
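A minimal sketch of the kind of version mangling this commit describes, assuming the version was interpreted as a number somewhere along the way (the commit message does not say where). The helper `kind_config_for` is hypothetical; the file-name pattern mirrors the `.github/kind-cluster-1.23.yaml` file added in this comparison.

```shell
# Hypothetical helper: maps a Kubernetes minor version to its kind config file.
kind_config_for() {
  printf '.github/kind-cluster-%s.yaml\n' "$1"
}

# Numeric interpretation drops the trailing zero ...
as_number=$(awk 'BEGIN { print 1.20 + 0 }')
# ... whereas a quoted string is passed through untouched.
as_string="1.20"

kind_config_for "$as_number"   # points at a 1.2 file, which does not exist
kind_config_for "$as_string"   # points at the intended 1.20 file
```

Quoting version strings wherever they are declared is the usual way to avoid this class of problem.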
Jean-Philippe Evrard
15c57927c8 Update the deprecated DeleteLocalData
DeleteLocalData was deprecated for users of kubectl in 0.20 [1].
At the same time of the deprecation, the relevant code was also
removed [2] without warning: The DeleteLocalData from the helper
structure was simply renamed DeleteEmptyDirData, without shims
on the exposed pkg.

This is a problem, as it completely breaks kured.

This should fix it, by using the new field name.

[1]:
56ea9621b7
[2]:
56ea9621b7 (diff-041bdcdedca650a38a8d82cf15ab6f3665b7b84a0fb44a8bb5dcdc5cd944c63d)
2021-03-22 14:28:17 +01:00
Jean-Philippe Evrard
20cbf6112d Bouncing go.mod with latest kubernetes packages
Without this patch, go.mod will lag behind for the kubernetes
packages, as it's not automatically tested by dependabot.

We should bump versions with each new minor release of kured.

This should fix it.
2021-03-22 14:28:17 +01:00
Christian Kotzbauer
f668bdb1ba Merge pull request #325 from weaveworks/stale-duration
Extend close-duration for stale issues and prs
2021-03-19 11:36:18 +01:00
Christian Kotzbauer
8209647e69 change comment accordingly
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-03-19 10:20:32 +01:00
Christian Kotzbauer
46354837f9 extend close-duration for stale issues and prs
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-03-19 08:26:11 +01:00
Jean-Philippe Evrard
de2e0bb2c8 Merge pull request #321 from dholbach/add-maintainers
Adding a MAINTAINERS file
2021-03-11 14:41:49 +01:00
Daniel Holbach
2b88b72d38 Merge pull request #318 from jackfrancis/node-annotations-chart
update chart definition to include --annotate-nodes
2021-03-11 12:04:39 +01:00
Jack Francis
87e610c25f update chart definition to include --annotate-nodes 2021-03-10 16:03:46 -08:00
Daniel Holbach
fe4ad73c2d Adding a MAINTAINERS file
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-03-10 18:16:11 +01:00
Daniel Holbach
f6ada05c5d Merge pull request #320 from dholbach/alpine-3.13
update to alpine 3.13
2021-03-10 08:50:42 +01:00
Daniel Holbach
355813de30 update to alpine 3.13
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-03-10 08:10:36 +01:00
Daniel Holbach
8a5f69480b Merge pull request #319 from weaveworks/dependabot/go_modules/github.com/sirupsen/logrus-1.8.1
Bump github.com/sirupsen/logrus from 1.8.0 to 1.8.1
2021-03-10 08:07:11 +01:00
Daniel Holbach
1e0fc11b01 Merge pull request #316 from weaveworks/dependabot/github_actions/actions/stale-v3.0.18
Bump actions/stale from v3.0.17 to v3.0.18
2021-03-10 07:55:11 +01:00
dependabot[bot]
2218e29504 Bump github.com/sirupsen/logrus from 1.8.0 to 1.8.1
Bumps [github.com/sirupsen/logrus](https://github.com/sirupsen/logrus) from 1.8.0 to 1.8.1.
- [Release notes](https://github.com/sirupsen/logrus/releases)
- [Changelog](https://github.com/sirupsen/logrus/blob/master/CHANGELOG.md)
- [Commits](https://github.com/sirupsen/logrus/compare/v1.8.0...v1.8.1)

Signed-off-by: dependabot[bot] <support@github.com>
2021-03-10 05:55:36 +00:00
Daniel Holbach
250b9bad05 Merge pull request #296 from jackfrancis/node-annotations
add node annotations to identify kured reboot operations
2021-03-09 10:14:46 +01:00
Daniel Holbach
32e01a8417 Merge pull request #294 from jackfrancis/always-drain
always drain before reboot
2021-03-09 10:13:36 +01:00
Jack Francis
baf83408b8 add node annotations
adds a new --annotate-nodes daemonset runtime argument, which does the following when enabled:

- adds a new node annotation "weave.works/kured-most-recent-reboot-needed" with a value of the current RFC3339 timestamp as soon as kured identifies that a node needs to be rebooted
- adds a new node annotation "weave.works/kured-reboot-in-progress" with a value of the current RFC3339 timestamp as soon as kured identifies that a node needs to be rebooted
- removes the annotation "weave.works/kured-reboot-in-progress" when kured has successfully rebooted the node
2021-03-08 17:22:47 -08:00
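For illustration, the annotation lifecycle described above can be mimicked with kubectl. This is only a sketch: kured sets these annotations through the Kubernetes API rather than the CLI, the helper names and the node name `worker-1` are hypothetical, and the commands are printed rather than executed so that no cluster is needed.

```shell
# RFC 3339 timestamp in UTC, for example 2021-03-08T17:22:47Z.
rfc3339_now() {
  date -u +%Y-%m-%dT%H:%M:%SZ
}

# Prints (does not run) the kubectl command that would set an annotation.
annotate_cmd() {
  node=$1
  key=$2
  value=$3
  echo "kubectl annotate node $node --overwrite $key=$value"
}

# Prints the command that would remove an annotation (note the trailing dash).
unannotate_cmd() {
  echo "kubectl annotate node $1 $2-"
}

# Set when a reboot is identified as needed, removed after a successful reboot.
annotate_cmd worker-1 weave.works/kured-reboot-in-progress "$(rfc3339_now)"
unannotate_cmd worker-1 weave.works/kured-reboot-in-progress
```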
Jack Francis
93c8242b89 always drain before reboot
This changes the pre-reboot drain functionality so that it always runs, regardless of the value of the Unschedulable node property.

Because kubectl drain is idempotent, we shouldn't have to worry about whether the node has already been set to Unschedulable (perhaps due to a prior, unsuccessful loop of the kured reboot cycle): we can run it over and over again. And because this drain func actually does a cordon + drain (and it only performs the drain if a cordon is successful), we can be sure that we aren't going to be thrashing this node w/ respect to scheduled pods.

This also fixes an edge case: if the node has been marked Unschedulable out-of-band, but workloads remain Running on this node, kured will no longer reboot the node's underlying VM/machine while it is actively running pods.
2021-03-08 17:20:31 -08:00
dependabot[bot]
c3d4c36493 Bump actions/stale from v3.0.17 to v3.0.18
Bumps [actions/stale](https://github.com/actions/stale) from v3.0.17 to v3.0.18.
- [Release notes](https://github.com/actions/stale/releases)
- [Commits](https://github.com/actions/stale/compare/v3.0.17...3b3c3f03cd4d8e2b61e179ef744a0d20efbe90b4)

Signed-off-by: dependabot[bot] <support@github.com>
2021-03-08 06:35:26 +00:00
Daniel Holbach
1fd09dd572 Merge pull request #310 from weaveworks/dependabot/go_modules/github.com/sirupsen/logrus-1.8.0
Bump github.com/sirupsen/logrus from 1.7.0 to 1.8.0
2021-03-02 10:48:41 +01:00
Daniel Holbach
d21a438197 Merge pull request #311 from weaveworks/dependabot/github_actions/actions/stale-v3.0.17
Bump actions/stale from v3.0.16 to v3.0.17
2021-03-02 10:48:15 +01:00
dependabot[bot]
3fdd1cf6f7 Bump actions/stale from v3.0.16 to v3.0.17
Bumps [actions/stale](https://github.com/actions/stale) from v3.0.16 to v3.0.17.
- [Release notes](https://github.com/actions/stale/releases)
- [Commits](https://github.com/actions/stale/compare/v3.0.16...996798eb71ef485dc4c7b4d3285842d714040c4a)

Signed-off-by: dependabot[bot] <support@github.com>
2021-02-19 05:49:06 +00:00
dependabot[bot]
48688044d5 Bump github.com/sirupsen/logrus from 1.7.0 to 1.8.0
Bumps [github.com/sirupsen/logrus](https://github.com/sirupsen/logrus) from 1.7.0 to 1.8.0.
- [Release notes](https://github.com/sirupsen/logrus/releases)
- [Changelog](https://github.com/sirupsen/logrus/blob/master/CHANGELOG.md)
- [Commits](https://github.com/sirupsen/logrus/compare/v1.7.0...v1.8.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-02-18 05:49:25 +00:00
Daniel Holbach
640613565d Merge pull request #305 from weaveworks/dependabot/go_modules/github.com/spf13/cobra-1.1.3
Bump github.com/spf13/cobra from 1.1.2 to 1.1.3
2021-02-16 12:18:40 +01:00
dependabot[bot]
763695de5c Bump github.com/spf13/cobra from 1.1.2 to 1.1.3
Bumps [github.com/spf13/cobra](https://github.com/spf13/cobra) from 1.1.2 to 1.1.3.
- [Release notes](https://github.com/spf13/cobra/releases)
- [Changelog](https://github.com/spf13/cobra/blob/master/CHANGELOG.md)
- [Commits](https://github.com/spf13/cobra/compare/v1.1.2...v1.1.3)

Signed-off-by: dependabot[bot] <support@github.com>
2021-02-11 05:52:43 +00:00
Daniel Holbach
6ff5722728 Merge pull request #304 from weaveworks/dependabot/go_modules/github.com/spf13/cobra-1.1.2
Bump github.com/spf13/cobra from 1.1.1 to 1.1.2
2021-02-10 12:40:27 +01:00
dependabot[bot]
472934e958 Bump github.com/spf13/cobra from 1.1.1 to 1.1.2
Bumps [github.com/spf13/cobra](https://github.com/spf13/cobra) from 1.1.1 to 1.1.2.
- [Release notes](https://github.com/spf13/cobra/releases)
- [Changelog](https://github.com/spf13/cobra/blob/master/CHANGELOG.md)
- [Commits](https://github.com/spf13/cobra/compare/v1.1.1...v1.1.2)

Signed-off-by: dependabot[bot] <support@github.com>
2021-02-10 05:53:05 +00:00
Daniel Holbach
b7f29c76ce Merge pull request #302 from weaveworks/coc
Point to CNCF Code of Conduct
2021-02-08 17:40:40 +01:00
Daniel Holbach
fa4e458f1f Merge pull request #300 from t3mi/master
add podLabels parameter
2021-02-08 16:05:24 +01:00
Daniel Holbach
4fc93d550d Merge pull request #301 from weaveworks/dependabot/github_actions/actions/stale-v3.0.16
Bump actions/stale from v3.0.15 to v3.0.16
2021-02-08 16:04:16 +01:00
Daniel Holbach
6eb9050156 Point to CNCF Code of Conduct 2021-02-08 11:35:50 +01:00
dependabot[bot]
d8b7669ab4 Bump actions/stale from v3.0.15 to v3.0.16
Bumps [actions/stale](https://github.com/actions/stale) from v3.0.15 to v3.0.16.
- [Release notes](https://github.com/actions/stale/releases)
- [Commits](https://github.com/actions/stale/compare/v3.0.15...9d6f46564a515a9ea11e7762ab3957ee58ca50da)

Signed-off-by: dependabot[bot] <support@github.com>
2021-02-08 06:26:07 +00:00
t3mi
d52d78a303 add podLabels parameter 2021-02-07 23:58:55 +02:00
Daniel Holbach
6a8e3f1e98 Merge pull request #298 from weaveworks/dependabot/github_actions/actions/stale-v3.0.15
Bump actions/stale from v3.0.14 to v3.0.15
2021-01-25 10:05:12 +01:00
dependabot[bot]
b39c9011ea Bump actions/stale from v3.0.14 to v3.0.15
Bumps [actions/stale](https://github.com/actions/stale) from v3.0.14 to v3.0.15.
- [Release notes](https://github.com/actions/stale/releases)
- [Commits](https://github.com/actions/stale/compare/v3.0.14...86561461b92875de77a8b2d2e75f004c826e8f45)

Signed-off-by: dependabot[bot] <support@github.com>
2021-01-25 06:54:10 +00:00
Daniel Holbach
fade706cbf Merge pull request #250 from damoon/19-PreferNoSchedule
implement issue-19 add prefer no schedule taint to avoid double draining of pods
2021-01-12 14:28:23 +01:00
David Sauer
5a4e197d27 change taint config to be disabled by default 2021-01-11 18:24:17 +01:00
Daniel Holbach
1320c5d318 Merge pull request #293 from evrardjp/fix-make-helm-chart
Update helm chart README using Make
2021-01-11 16:39:23 +01:00
Jean-Philippe Evrard
0640683fbb Update helm chart README using Make
Without this, it's possible that the helm chart documentation
contains the `image tag` version which might not be equal to
the version in the helm chart, as it's only an example.

This is confusing, so instead we should use make to edit the
application version everywhere.

This fixes it by updating the Makefile to modify text of the
chart's README using a regex looking for something similar to
a version; then I used the updated makefile to edit the README,
which in turn requires a bump of the version of the chart
itself.
2021-01-11 16:14:18 +01:00
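The regex-based README update could look roughly like the sketch below. It is not the Makefile rule from this commit: the helper `set_image_tag`, the `image.tag` marker, and the sample lines are assumptions chosen to show the idea of replacing "something similar to a version" only where the image tag is documented.

```shell
# Hypothetical helper: rewrites the first version-like string (X.Y.Z) on
# every stdin line that mentions image.tag, leaving other lines untouched.
set_image_tag() {
  sed -E "/image\.tag/ s/[0-9]+\.[0-9]+\.[0-9]+/$1/"
}

# Only the first line is changed; the chart version line is left alone.
printf '%s\n' '| `image.tag` | Image tag | `1.5.1` |' 'Chart version 2.3.0' \
  | set_image_tag 1.6.1
```

Restricting the substitution to lines that mention the image tag avoids rewriting unrelated version numbers, such as the chart's own version.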
Daniel Holbach
ec1a931a39 Merge pull request #292 from evrardjp/update-helm-chart
Update helm chart
2021-01-11 15:18:50 +01:00
Jean-Philippe Evrard
36308cee91 Update helm chart
Bumping the helm chart with minor version bump, due to minor
version bump of the kured appVersion.
2021-01-11 14:57:42 +01:00
Daniel Holbach
b733d00550 Merge pull request #280 from cnmcavoy/cnmcavoy/helm-updates
Expose the service name and maxUnavailable for rolling updates in helm chart
2021-01-11 14:53:53 +01:00
Daniel Holbach
56e2c12d38 Merge pull request #291 from evrardjp/fix-tagging
Fix automated tagging
2021-01-11 14:29:28 +01:00
Jean-Philippe Evrard
48e7ff28bf Fix automated tagging
Without this patch, the name of the image is not templated, which
causes the action to fail.

This should fix it, by ensuring the image scan action uses a
templated value, instead of incorrectly relying on shell templating,
which doesn't run in the action.
2021-01-11 14:23:14 +01:00
David Sauer
3a35d6a46c remove taint in case the reboot is not needed anymore 2021-01-06 22:21:41 +01:00
David Sauer
e430b1442a updated README 2021-01-06 21:59:53 +01:00
David Sauer
b3e39418ba cache taint state to avoid unnecessary API calls 2021-01-06 21:51:43 +01:00
David Sauer
34446f949e Allow to disable tainting during pending node reboot by setting the taint name to an empty string. 2021-01-06 21:39:32 +01:00
David Sauer
10d95c426f fixed type & renamed variable 2021-01-06 21:29:35 +01:00
David Sauer
e4c684c3af taint node with PreferNoSchedule to prevent receiving (and double draining) additional pods from other rebooting nodes 2021-01-06 21:23:40 +01:00
David Sauer
204a06ca38 fixed call of log.Fatal instead of log.Fatalf 2021-01-06 21:23:40 +01:00
David Sauer
48897eb0ab avoid indentations to ease readability 2021-01-06 21:23:40 +01:00
Cameron McAvoy
d4893d7bd7 Expose the service name and maxUnavailable for rolling updates in the helm chart 2020-12-17 18:31:28 -05:00
37 changed files with 2575 additions and 729 deletions

1
.github/ct.yaml vendored

@@ -1,5 +1,6 @@
 # See https://github.com/helm/chart-testing#configuration
 remote: origin
+target-branch: main
 chart-dirs:
 - charts
 chart-repos: []

@@ -15,3 +15,7 @@ updates:
 - dependency-name: "k8s.io/apimachinery"
 - dependency-name: "k8s.io/client-go"
 - dependency-name: "k8s.io/kubectl"
+- package-ecosystem: "docker"
+directory: "cmd/kured"
+schedule:
+interval: "daily"

@@ -1,13 +0,0 @@
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
image: "kindest/node:v1.20.0"
- role: control-plane
image: "kindest/node:v1.20.0"
- role: control-plane
image: "kindest/node:v1.20.0"
- role: worker
image: "kindest/node:v1.20.0"
- role: worker
image: "kindest/node:v1.20.0"

@@ -2,12 +2,12 @@ kind: Cluster
 apiVersion: kind.x-k8s.io/v1alpha4
 nodes:
 - role: control-plane
-image: kindest/node:v1.18.8
+image: kindest/node:v1.21.2
 - role: control-plane
-image: kindest/node:v1.18.8
+image: kindest/node:v1.21.2
 - role: control-plane
-image: kindest/node:v1.18.8
+image: kindest/node:v1.21.2
 - role: worker
-image: kindest/node:v1.18.8
+image: kindest/node:v1.21.2
 - role: worker
-image: kindest/node:v1.18.8
+image: kindest/node:v1.21.2

@@ -2,12 +2,12 @@ kind: Cluster
 apiVersion: kind.x-k8s.io/v1alpha4
 nodes:
 - role: control-plane
-image: kindest/node:v1.19.4
+image: kindest/node:v1.22.4
 - role: control-plane
-image: kindest/node:v1.19.4
+image: kindest/node:v1.22.4
 - role: control-plane
-image: kindest/node:v1.19.4
+image: kindest/node:v1.22.4
 - role: worker
-image: kindest/node:v1.19.4
+image: kindest/node:v1.22.4
 - role: worker
-image: kindest/node:v1.19.4
+image: kindest/node:v1.22.4

13
.github/kind-cluster-1.23.yaml vendored Normal file

@@ -0,0 +1,13 @@
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
image: "kindest/node:v1.23.0"
- role: control-plane
image: "kindest/node:v1.23.0"
- role: control-plane
image: "kindest/node:v1.23.0"
- role: worker
image: "kindest/node:v1.23.0"
- role: worker
image: "kindest/node:v1.23.0"
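
The per-version kind configs in this change differ only in the node image tag. As a minimal sketch (not part of this change set), a small shell helper could generate `.github/kind-cluster-<version>.yaml` for a new Kubernetes release. The function name `gen_kind_config` and its argument are hypothetical, and the topology of three control-plane and two worker nodes simply mirrors the existing files.

```shell
#!/bin/sh
# Hypothetical helper: print a kind cluster config for the given node image tag.
# Example: gen_kind_config v1.23.0 > .github/kind-cluster-1.23.yaml
gen_kind_config() {
  tag="$1"
  printf 'kind: Cluster\n'
  printf 'apiVersion: kind.x-k8s.io/v1alpha4\n'
  printf 'nodes:\n'
  # Three control-plane nodes followed by two workers, as in the existing configs.
  for role in control-plane control-plane control-plane worker worker; do
    printf '%s role: %s\n' '-' "$role"
    printf '  image: "kindest/node:%s"\n' "$tag"
  done
}
```

After generating a file this way, the new version would still need to be added to the `kubernetes` matrix of the workflows that reference `.github/kind-cluster-${{ matrix.kubernetes }}.yaml`.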


@@ -2,7 +2,7 @@ name: Publish helm chart
on:
push:
branches:
- "master"
- "main"
paths:
- "charts/**"

59
.github/workflows/on-main-push.yaml vendored Normal file

@@ -0,0 +1,59 @@
# We publish every merged commit in the form of an image
# named kured:<branch>-<short tag>
name: Push image of latest main
on:
push:
branches:
- main
jobs:
tag-scan-and-push-final-image:
name: "Build, scan, and publish tagged image"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Find go version
run: |
GO_VERSION=$(awk '/^go/ {print $2};' go.mod)
echo "::set-output name=version::${GO_VERSION}"
id: awk_gomod
- name: Ensure go version
uses: actions/setup-go@v2
with:
go-version: "${{ steps.awk_gomod.outputs.version }}"
- name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME_WEAVEWORKSKUREDCI }}
password: ${{ secrets.DOCKERHUB_TOKEN_WEAVEWORKSKUREDCI }}
- name: Login to ghcr.io
uses: docker/login-action@v1
with:
registry: ghcr.io
username: weave-ghcr-bot
password: ${{ secrets.KURED_WEAVE_GHCR_BOT_TOKEN }}
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@v1
- name: Find current tag version
run: echo "::set-output name=sha_short::$(git rev-parse --short HEAD)"
id: tags
- name: Build image
uses: docker/build-push-action@v2
with:
context: .
file: cmd/kured/Dockerfile.multi
platforms: linux/arm64, linux/amd64
push: true
tags: |
docker.io/${{ GITHUB.REPOSITORY }}:main-${{ steps.tags.outputs.sha_short }}
ghcr.io/${{ GITHUB.REPOSITORY }}:main-${{ steps.tags.outputs.sha_short }}


@@ -1,38 +0,0 @@
# We publish every merged commit in the form of an image
# named kured:<branch>-<short tag>
name: Push image of latest master
on:
push:
branches:
- master
jobs:
tag-scan-and-push-final-image:
name: "Build, scan, and publish tagged image"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@master
- name: Find go version
run: |
GO_VERSION=$(awk '/^go/ {print $2};' go.mod)
echo "::set-output name=version::${GO_VERSION}"
id: awk_gomod
- name: Ensure go version
uses: actions/setup-go@v2
with:
go-version: "${{ steps.awk_gomod.outputs.version }}"
- name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME_WEAVEWORKSKUREDCI }}
password: ${{ secrets.DOCKERHUB_TOKEN_WEAVEWORKSKUREDCI }}
- name: Build image
run: |
make DH_ORG="${{ github.repository_owner }}" image
- name: Publish image
run: |
make DH_ORG="${{ github.repository_owner }}" publish-image


@@ -1,5 +1,6 @@
#This is just extra testing, for lint check, and basic installation
#If those fail, no need to test the rest of the PR (github will cancel the rest of the builds)
#Those can fail earlier than functional tests (shorter tests)
# and give developer feedback soon if they didn't test themselves
name: PR - charts
on:
pull_request:
@@ -11,7 +12,7 @@ jobs:
# tackling that for us.
# Fail-fast ensures that if one of those matrix jobs fails, the other one gets cancelled.
test-chart:
name: Test helm chart
name: Test helm chart changes
runs-on: ubuntu-latest
strategy:
fail-fast: true
@@ -31,11 +32,47 @@ jobs:
# Helm is already present in github actions, so do not re-install it
- name: Setup chart testing
uses: helm/chart-testing-action@v2.0.1
uses: helm/chart-testing-action@v2.2.0
- name: Create default kind cluster
uses: helm/kind-action@v1.1.0
uses: helm/kind-action@v1.2.0
with:
version: v0.11.0
if: ${{ matrix.test-action == 'install' }}
- name: Run chart tests
- name: Run chart tests
run: ct ${{ matrix.test-action }} --config .github/ct.yaml
# This doesn't re-use the ct actions, due to many limitations (auto tear down, no real testing)
deploy-chart:
name: Functional test of helm chart in its current state (needs published image of the helm chart)
runs-on: ubuntu-latest
needs: test-chart
steps:
- uses: actions/checkout@v2
# Default name for helm/kind-action kind clusters is "chart-testing"
- name: Create 1 node kind cluster
uses: helm/kind-action@v1.2.0
with:
version: v0.11.0
- name: Deploy kured on default namespace with its helm chart
run: |
# Documented in official helm doc to live on the edge
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Refresh bins
hash -r
helm install kured ./charts/kured/ --set configuration.period=1m --wait
kubectl config set-context kind-chart-testing
kubectl get ds --all-namespaces
kubectl describe ds kured
- name: Test if successful deploy
uses: nick-invision/retry@v2.6.0
with:
timeout_minutes: 10
max_attempts: 10
retry_wait_seconds: 10
# DESIRED CURRENT READY UP-TO-DATE AVAILABLE should all be = to cluster_size
command: "kubectl get ds kured | grep -E 'kured.*1.*1.*1.*1.*1'"
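
The readiness check above greps for five `1`s anywhere on the `kured` line, so it can also match digits in other columns (an age of `11m`, for instance). A stricter variant, sketched here as an assumption rather than what the workflow runs, compares the five count columns of `kubectl get ds` output field by field. `ds_matches_size` is a hypothetical name, and the column order is the one given in the workflow comment (DESIRED CURRENT READY UP-TO-DATE AVAILABLE).

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get ds <name>` output on stdin and succeed
# only if DESIRED, CURRENT, READY, UP-TO-DATE and AVAILABLE all equal the expected size.
ds_matches_size() {
  name="$1"
  want="$2"
  awk -v name="$name" -v want="$want" '
    # Only the row whose first column is the daemonset name is checked;
    # the header row and any other daemonsets are ignored.
    $1 == name {
      found = 1
      if ($2 != want || $3 != want || $4 != want || $5 != want || $6 != want) bad = 1
    }
    END { if (found && !bad) exit 0; exit 1 }
  '
}
```

It could be used in the retry step as `kubectl get ds kured | ds_matches_size kured 1`, assuming the output has no leading namespace column (that is, `--all-namespaces` is not passed).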


@@ -4,6 +4,29 @@ on:
push:
jobs:
pr-gotest:
name: Run go tests
runs-on: ubuntu-18.04
steps:
- name: checkout
uses: actions/checkout@v2
- name: Find go version
run: |
GO_VERSION=$(awk '/^go/ {print $2};' go.mod)
echo "::set-output name=version::${GO_VERSION}"
id: awk_gomod
- name: Ensure go version
uses: actions/setup-go@v2
with:
go-version: "${{ steps.awk_gomod.outputs.version }}"
- name: run tests
run: go test -json ./... > test.json
- name: Annotate tests
if: always()
uses: guyarb/golang-test-annoations@v0.5.0
with:
test-results: test.json
pr-shellcheck:
name: Lint bash code with shellcheck
runs-on: ubuntu-latest
@@ -39,7 +62,7 @@ jobs:
name: Check docs for incorrect links
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v1
- uses: actions/checkout@v2
- name: Link Checker
id: lc
uses: peter-evans/link-checker@v1
@@ -70,16 +93,93 @@ jobs:
with:
image-name: docker.io/${{ github.repository_owner }}/kured:${{ github.sha }}
# If the PRs don't break the behaviour in the helm chart, we can simply publish the helm chart at the time of the release.
e2e-helm:
name: "Functional test of helm chart, e2e testing"
# This ensures the latest code works with the manifests built from tree.
# It is useful for two things:
# - Test manifests changes (obviously), ensuring they don't break existing clusters
# - Ensure manifests work with the latest versions even with no manifest change
# (compared to helm charts, manifests cannot easily template changes based on versions)
# Helm charts are _trailing_ releases, while manifests are done during development.
e2e-manifests:
name: End-to-End test with kured with code and manifests from HEAD
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
kubernetes:
- "1.21"
- "1.22"
- "1.23"
steps:
- uses: actions/checkout@v2
- name: Find go version
run: |
GO_VERSION=$(awk '/^go/ {print $2};' go.mod)
echo "::set-output name=version::${GO_VERSION}"
id: awk_gomod
- name: Ensure go version
uses: actions/setup-go@v2
with:
go-version: "${{ steps.awk_gomod.outputs.version }}"
- name: Build artifacts
run: |
make DH_ORG="${{ github.repository_owner }}" VERSION="${{ github.sha }}" image
make DH_ORG="${{ github.repository_owner }}" VERSION="${{ github.sha }}" manifest
- name: Workaround "Failed to attach 1 to compat systemd cgroup /actions_job/..." on gh actions
run: |
sudo bash << EOF
cp /etc/docker/daemon.json /etc/docker/daemon.json.old
echo '{}' > /etc/docker/daemon.json
systemctl restart docker || journalctl --no-pager -n 500
systemctl status docker
EOF
# Default name for helm/kind-action kind clusters is "chart-testing"
- name: Create kind cluster with 5 nodes
uses: helm/kind-action@v1.2.0
with:
config: .github/kind-cluster-${{ matrix.kubernetes }}.yaml
version: v0.11.0
- name: Preload previously built images onto kind cluster
run: kind load docker-image docker.io/${{ github.repository_owner }}/kured:${{ github.sha }} --name chart-testing
- name: Do not wait for an hour before detecting the rebootSentinel
run: |
sed -i 's/#\(.*\)--period=1h/\1--period=30s/g' kured-ds.yaml
- name: Install kured with kubectl
run: |
kubectl apply -f kured-rbac.yaml && kubectl apply -f kured-ds.yaml
- name: Ensure kured is ready
uses: nick-invision/retry@v2.6.0
with:
timeout_minutes: 10
max_attempts: 10
retry_wait_seconds: 60
# DESIRED CURRENT READY UP-TO-DATE AVAILABLE should all be = to cluster_size
command: "kubectl get ds -n kube-system kured | grep -E 'kured.*5.*5.*5.*5.*5'"
- name: Create reboot sentinel files
run: |
./tests/kind/create-reboot-sentinels.sh
- name: Follow reboot until success
env:
DEBUG: true
run: |
./tests/kind/follow-coordinated-reboot.sh
scenario-prom-helm:
name: Test prometheus with latest code from HEAD (=overrides image of the helm chart)
runs-on: ubuntu-latest
# only build with oldest and newest supported, it should be good enough.
strategy:
fail-fast: false
matrix:
kubernetes:
- "1.18"
- "1.20"
- "1.21"
steps:
- uses: actions/checkout@v2
- name: Find go version
@@ -96,7 +196,7 @@ jobs:
make DH_ORG="${{ github.repository_owner }}" VERSION="${{ github.sha }}" image
make DH_ORG="${{ github.repository_owner }}" VERSION="${{ github.sha }}" helm-chart
- name: "Workaround 'Failed to attach 1 to compat systemd cgroup /actions_job/...' on gh actions"
- name: Workaround 'Failed to attach 1 to compat systemd cgroup /actions_job/...' on gh actions
run: |
sudo bash << EOF
cp /etc/docker/daemon.json /etc/docker/daemon.json.old
@@ -106,10 +206,10 @@ jobs:
EOF
# Default name for helm/kind-action kind clusters is "chart-testing"
- name: Create 5 node kind cluster
uses: helm/kind-action@master
- name: Create 1 node kind cluster
uses: helm/kind-action@v1.2.0
with:
config: .github/kind-cluster-${{ matrix.kubernetes }}.yaml
version: v0.11.0
- name: Preload previously built images onto kind cluster
run: kind load docker-image docker.io/${{ github.repository_owner }}/kured:${{ github.sha }} --name chart-testing
@@ -117,81 +217,120 @@ jobs:
- name: Deploy kured on default namespace with its helm chart
run: |
# Documented in official helm doc to live on the edge
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Refresh bins
hash -r
helm install kured ./charts/kured/ --set configuration.period=1m
helm install kured ./charts/kured/ --wait --values ./charts/kured/ci/prometheus-values.yaml
kubectl config set-context kind-chart-testing
kubectl get ds --all-namespaces
kubectl describe ds kured
- name: Ensure kured is ready
uses: nick-invision/retry@v2.4.0
uses: nick-invision/retry@v2.6.0
with:
timeout_minutes: 10
max_attempts: 10
retry_wait_seconds: 60
# DESIRED CURRENT READY UP-TO-DATE AVAILABLE should all be = 5
command: "kubectl get ds kured | grep -E 'kured.*5.*5.*5.*5.*5' "
# DESIRED CURRENT READY UP-TO-DATE AVAILABLE
command: "kubectl get ds kured | grep -E 'kured.*1.*1.*1.*1.*1' "
- name: Get metrics (healthy)
uses: nick-invision/retry@v2.6.0
with:
timeout_minutes: 2
max_attempts: 12
retry_wait_seconds: 5
command: "./tests/kind/test-metrics.sh 0"
- name: Create reboot sentinel files
run: |
./tests/kind/create-reboot-sentinels.sh
- name: Follow reboot until success
env:
DEBUG: true
run: |
./tests/kind/follow-coordinated-reboot.sh
# This workflow is useful when introducing new versions, to ensure our manifests
# still work (even if there might be no manifest 'code' change).
# The version used here is what hasn't been tested with the helm chart
deploy-manifests:
name: Deploy kured with current manifests
runs-on: ubuntu-latest
strategy:
matrix:
kubernetes:
- 1.18
steps:
- uses: actions/checkout@v2
- name: Find go version
run: |
GO_VERSION=$(awk '/^go/ {print $2};' go.mod)
echo "::set-output name=version::${GO_VERSION}"
id: awk_gomod
- name: Ensure go version
uses: actions/setup-go@v2
- name: Get metrics (need reboot)
uses: nick-invision/retry@v2.6.0
with:
go-version: "${{ steps.awk_gomod.outputs.version }}"
- name: Build artifacts
run: |
make DH_ORG="${{ github.repository_owner }}" VERSION="${{ github.sha }}" image
make DH_ORG="${{ github.repository_owner }}" VERSION="${{ github.sha }}" manifest
- name: Workaround "Failed to attach 1 to compat systemd cgroup /actions_job/..." on gh actions
run: |
sudo bash << EOF
cp /etc/docker/daemon.json /etc/docker/daemon.json.old
echo '{}' > /etc/docker/daemon.json
systemctl restart docker || journalctl --no-pager -n 500
systemctl status docker
EOF
# Default name for helm/kind-action kind clusters is "chart-testing"
- name: Create kind cluster
uses: helm/kind-action@master
with:
config: .github/kind-cluster-${{ matrix.kubernetes }}.yaml
- name: Preload previously built images onto kind cluster
run: kind load docker-image docker.io/${{ github.repository_owner }}/kured:${{ github.sha }} --name chart-testing
- name: Install kured with kubectl
run: |
kubectl apply -f kured-rbac.yaml && kubectl apply -f kured-ds.yaml
- name: Ensure kured is ready
uses: nick-invision/retry@v2.4.0
with:
timeout_minutes: 10
timeout_minutes: 15
max_attempts: 10
retry_wait_seconds: 60
# DESIRED CURRENT READY UP-TO-DATE AVAILABLE should all be = to cluster_size
command: "kubectl get ds -n kube-system kured | grep -E 'kured.*5.*5.*5.*5.*5'"
command: "./tests/kind/test-metrics.sh 1"
# TEMPLATE Scenario testing.
# Note: keep in mind that the helm chart's appVersion is overridden to test your HEAD of the branch,
# if you `make helm-chart`.
# This will allow you to test properly your scenario and not use an existing image which will not
# contain your feature.
# scenario-<REPLACETHIS>-helm:
# #example: Testing <REPLACETHIS> with helm chart and code from HEAD"
# name: "<REPLACETHIS>"
# runs-on: ubuntu-latest
# strategy:
# fail-fast: false
# # You can define your own kubernetes versions. For example if your helm chart change should behave differently with different kubernetes versions.
# matrix:
# kubernetes:
# - "1.20"
# steps:
# - uses: actions/checkout@v2
# - name: Find go version
# run: |
# GO_VERSION=$(awk '/^go/ {print $2};' go.mod)
# echo "::set-output name=version::${GO_VERSION}"
# id: awk_gomod
# - name: Ensure go version
# uses: actions/setup-go@v2
# with:
# go-version: "${{ steps.awk_gomod.outputs.version }}"
# - name: Build artifacts
# run: |
# make DH_ORG="${{ github.repository_owner }}" VERSION="${{ github.sha }}" image
# make DH_ORG="${{ github.repository_owner }}" VERSION="${{ github.sha }}" helm-chart
#
# - name: "Workaround 'Failed to attach 1 to compat systemd cgroup /actions_job/...' on gh actions"
# run: |
# sudo bash << EOF
# cp /etc/docker/daemon.json /etc/docker/daemon.json.old
# echo '{}' > /etc/docker/daemon.json
# systemctl restart docker || journalctl --no-pager -n 500
# systemctl status docker
# EOF
#
# # Default name for helm/kind-action kind clusters is "chart-testing"
# - name: Create 5 node kind cluster
# uses: helm/kind-action@master
# with:
# config: .github/kind-cluster-${{ matrix.kubernetes }}.yaml
#
# - name: Preload previously built images onto kind cluster
# run: kind load docker-image docker.io/${{ github.repository_owner }}/kured:${{ github.sha }} --name chart-testing
#
# - name: Deploy kured on default namespace with its helm chart
# run: |
# # Documented in official helm doc to live on the edge
# curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# # Refresh bins
# hash -r
# helm install kured ./charts/kured/ --wait --values ./charts/kured/ci/<REPLACETHIS>-values.yaml
# kubectl config set-context kind-chart-testing
# kubectl get ds --all-namespaces
# kubectl describe ds kured
#
# - name: Ensure kured is ready
# uses: nick-invision/retry@v2.6.0
# with:
# timeout_minutes: 10
# max_attempts: 10
# retry_wait_seconds: 60
# # DESIRED CURRENT READY UP-TO-DATE AVAILABLE should all be = 5
# command: "kubectl get ds kured | grep -E 'kured.*5.*5.*5.*5.*5' "
#
# - name: Create reboot sentinel files
# run: |
# ./tests/kind/create-reboot-sentinels.sh
#
# - name: Test <REPLACETHIS>
# env:
# DEBUG: true
# run: |
# <TODO>


@@ -12,7 +12,7 @@ jobs:
name: "Build, scan, and publish tagged image"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@master
- uses: actions/checkout@v2
- name: Find go version
run: |
GO_VERSION=$(awk '/^go/ {print $2};' go.mod)
@@ -22,12 +22,14 @@ jobs:
uses: actions/setup-go@v2
with:
go-version: "${{ steps.awk_gomod.outputs.version }}"
- name: Find current tag version
run: echo "::set-output name=version::${GITHUB_REF#refs/tags/}"
id: tags
- run: |
make DH_ORG="${{ github.repository_owner }}" VERSION="${GITHUB_REF#refs/tags/}" image
make DH_ORG="${{ github.repository_owner }}" VERSION="${{ steps.tags.outputs.version }}" image
- uses: Azure/container-scan@v0
with:
image-name: docker.io/${{ github.repository_owner }}/kured:${GITHUB_REF#refs/tags/}
image-name: docker.io/${{ github.repository_owner }}/kured:${{ steps.tags.outputs.version }}
- name: Login to DockerHub
uses: docker/login-action@v1
@@ -35,6 +37,27 @@ jobs:
username: ${{ secrets.DOCKERHUB_USERNAME_WEAVEWORKSKUREDCI }}
password: ${{ secrets.DOCKERHUB_TOKEN_WEAVEWORKSKUREDCI }}
- name: Publish image
run: |
make DH_ORG="${{ github.repository_owner }}" VERSION="${GITHUB_REF#refs/tags/}" publish-image
- name: Login to ghcr.io
uses: docker/login-action@v1
with:
registry: ghcr.io
username: weave-ghcr-bot
password: ${{ secrets.KURED_WEAVE_GHCR_BOT_TOKEN }}
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@v1
- name: Build image
uses: docker/build-push-action@v2
with:
context: .
file: cmd/kured/Dockerfile.multi
platforms: linux/arm64, linux/amd64, linux/arm/v7, linux/arm/v6, linux/386
push: true
tags: |
docker.io/${{ GITHUB.REPOSITORY }}:${{ steps.tags.outputs.version }}
ghcr.io/${{ GITHUB.REPOSITORY }}:${{ steps.tags.outputs.version }}
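
The release workflow now derives the tag once, in a `run` step, and reuses it through `steps.tags.outputs.version`. A plausible reason is that `${GITHUB_REF#refs/tags/}` is shell syntax, which is expanded inside `run` scripts but not inside `with:` inputs. The sketch below shows that prefix-stripping expansion on its own; the sample ref value and the `tag_from_ref` helper name are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: strip the "refs/tags/" prefix from a Git ref.
tag_from_ref() {
  ref="$1"
  # ${var#pattern} removes the shortest matching prefix;
  # refs without that prefix pass through unchanged.
  printf '%s\n' "${ref#refs/tags/}"
}

# Sample value for illustration; GitHub Actions sets GITHUB_REF on real runs.
GITHUB_REF="refs/tags/1.9.1"
tag_from_ref "$GITHUB_REF"
```

Because a branch ref such as `refs/heads/main` passes through unchanged, this expansion only yields a usable version when the workflow is triggered by a tag push.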


@@ -5,26 +5,41 @@ on:
- cron: "30 1 * * *"
jobs:
periodics-gotest:
name: Run go tests
runs-on: ubuntu-18.04
steps:
- name: checkout
uses: actions/checkout@v2
- name: run tests
run: go test -json ./... > test.json
- name: Annotate tests
if: always()
uses: guyarb/golang-test-annoations@v0.5.0
with:
test-results: test.json
periodics-mark-stale:
name: Mark stale issues and PRs
runs-on: ubuntu-latest
steps:
# Stale by default waits for 60 days before marking PR/issues as stale, and closes them after 7 days.
# Stale by default waits for 60 days before marking PR/issues as stale, and closes them after 21 days.
# Do not expire the first issues that would allow the community to grow.
- uses: actions/stale@v3.0.14
- uses: actions/stale@v4
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
stale-issue-message: 'This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 7 days).'
stale-pr-message: 'This PR was automatically considered stale due to lack of activity. Please refresh it and/or join our slack channels to highlight it, before it automatically closes (in 7 days).'
stale-issue-label: 'no-issue-activity'
stale-pr-label: 'no-pr-activity'
exempt-issue-labels: 'good-first-issue'
exempt-issue-labels: 'good first issue,keep'
days-before-close: 21
check-docs-links:
name: Check docs for incorrect links
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v1
- uses: actions/checkout@v2
- name: Link Checker
id: lc
uses: peter-evans/link-checker@v1
@@ -53,15 +68,15 @@ jobs:
image-name: docker.io/${{ github.repository_owner }}/kured:${{ github.sha }}
deploy-helm:
name: Ensure a kubernetes change didn't break our code
name: Ensure our currently released helm chart works on all kubernetes versions
runs-on: ubuntu-latest
# only build with oldest and newest supported, it should be good enough.
strategy:
matrix:
kubernetes:
- 1.18
- 1.19
- 1.20
- "1.21"
- "1.22"
- "1.23"
steps:
- uses: actions/checkout@v2
- name: Find go version
@@ -73,10 +88,6 @@ jobs:
uses: actions/setup-go@v2
with:
go-version: "${{ steps.awk_gomod.outputs.version }}"
- name: Build artifacts
run: |
make DH_ORG="${{ github.repository_owner }}" VERSION="master" image
make DH_ORG="${{ github.repository_owner }}" VERSION="master" helm-chart
- name: "Workaround 'Failed to attach 1 to compat systemd cgroup /actions_job/...' on gh actions"
run: |
@@ -89,17 +100,15 @@ jobs:
# Default name for helm/kind-action kind clusters is "chart-testing"
- name: Create 5 node kind cluster
uses: helm/kind-action@master
uses: helm/kind-action@v1.2.0
with:
config: .github/kind-cluster-${{ matrix.kubernetes }}.yaml
- name: Preload previously built images onto kind cluster
run: kind load docker-image docker.io/${{ github.repository_owner }}/kured:master --name chart-testing
version: v0.11.0
- name: Deploy kured on default namespace with its helm chart
run: |
# Documented in official helm doc to live on the edge
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Refresh bins
hash -r
helm install kured ./charts/kured/ --set configuration.period=1m
@@ -108,7 +117,7 @@ jobs:
kubectl describe ds kured
- name: Ensure kured is ready
uses: nick-invision/retry@v2.4.0
uses: nick-invision/retry@v2.6.0
with:
timeout_minutes: 10
max_attempts: 10


@@ -13,7 +13,9 @@ you are planning to contribute code.
[issues]: https://github.com/weaveworks/kured/issues
[readme]: README.md
## Updating k8s support
## Regular development activities
### Updating k8s support
Whenever we want to update e.g. the `kubectl` or `client-go` dependencies,
some RBAC changes might be necessary too.
@@ -24,15 +26,90 @@ This is what it took to support Kubernetes 1.14:
That the process can be more involved than that can be seen in
<https://github.com/weaveworks/kured/commits/support-k8s-1.10>
Please update our .github/workflows with the new k8s images, starting by
the creation of a .github/kind-cluster-<version>.yaml, then updating
our workflows with the new versions.
Once you updated everything, make sure you update the support matrix on
the main [README][readme] as well.
## Release testing
### Updating other dependencies
Dependabot proposes changes in our go.mod/go.sum.
Some of those changes are covered by CI testing, some are not.
Please make sure to test those not covered by CI (mostly the integration
with other tools) manually before merging.
### Review periodic jobs
We run periodic jobs (see also Automated testing section of this documentation).
Those should be monitored for failures.
If a failure happens in periodics, something terribly wrong must have happened
(or github is failing at the creation of a kind cluster). Please monitor those
failures carefully.
### Introducing new features
When you introduce a new feature, the kured team expects you to have tested
your change thoroughly. If possible, include all the necessary testing in your change.
If your change involves a user facing change (change in flags of kured for example),
please expose your new feature in our default manifest (`kured-ds.yaml`),
as a comment.
Do not update the helm chart directly.
Helm charts and our release manifests (see below) are our stable interfaces.
Any user facing changes will therefore have to wait for a while before being
exposed to our users.
This also means that when you expose a new feature, you should create another PR
for your changes in `charts/` to make your feature available for our next kured version.
In this change, you can directly bump the appVersion to the next minor version.
(for example, if current appVersion is 1.6.x, make sure you update your appVersion
to 1.7.0). It allows us to have an easy view of what we land each release.
Do not hesitate to increase the test coverage for your feature, whether it's unit
testing or full functional testing (even using helm charts).
### Increasing test coverage
We are welcoming any change to increase our test coverage.
See also our github issues for the label `testing`.
### Updating helm charts
Helm charts are continuously published. Any change in `charts/` will be immediately
pushed to production.
## Automated testing
Our CI is covered by github actions.
You can see their contents in .github/workflows.
We currently run:
- go tests and lint
- shellcheck
- a check for dead links in our docs
- a security check against our base image (alpine)
- a deep functional test using our manifests on all supported k8s versions
- basic deployment using our helm chart on any chart change
Changes in helm charts are not functionally tested on PRs. We assume that
the PRs to implement the feature are properly tested by our users and
contributors before merge.
To test your code manually, follow the section Manual testing.
## Manual (release) testing
Before `kured` is released, we want to make sure it still works fine on the
previous, current and next minor version of Kubernetes (with respect to the
embedded `client-go` & `kubectl`). For local testing e.g. `minikube` or
`kind` can be sufficient.
`client-go` & `kubectl` dependencies in use). For local testing e.g.
`minikube` or `kind` can be sufficient. This will allow you to catch issues
that might not have been tested in our CI, like integration with other tools,
or your specific use case.
Deploy kured in your test scenario, make sure you pass the right `image`,
update the e.g. `period` and `reboot-days` options, so you get immediate
@@ -42,7 +119,11 @@ results, if you login to a node and run:
sudo touch /var/run/reboot-required
```
### Testing with `minikube`
### Example of golang testing
Please run `make test`. You should have golint installed.
### Example of testing with `minikube`
A test-run with `minikube` could look like this:
@@ -82,16 +163,16 @@ If all the tests ran well, kured maintainers can reach out to the Weaveworks
team to get an upcoming `kured` release tested in the Dev environment for
real life testing.
### Testing with `kind`
### Example of testing with `kind`
A test-run with `kind` could look like this:
```console
# create kind cluster
kind create cluster --config .github/kind-cluster.yaml
kind create cluster --config .github/kind-cluster-<k8s-version>.yaml
# create reboot required files on pre-defined kind nodes
./tests/create-reboot-sentinels.sh
./tests/kind/create-reboot-sentinels.sh
# check if reboot is working fine
./tests/kind/follow-coordinated-reboot.sh
@@ -101,27 +182,20 @@ kind create cluster --config .github/kind-cluster.yaml
## Publishing a new kured release
### Prepare Documentation
Check that `README.md` has an updated compatibility matrix and that the
url in the `kubectl` incantation (under "Installation") is updated to the
new version you want to release.
### Create a tag on the repo and publish the image
### Create a tag on the repo
Before going further, we should freeze the code for a release, by
tagging the code, and publishing its immutable artifact: the kured
docker image.
tagging the code. The Github-Action should start a new job and push
the new image to the registry.
```sh
make DH_ORG="weaveworks" VERSION="1.3.0" image
```
Then docker push the image. In the future, that might be automatically
done when creating a tag on the repository, with the help of github
actions.
### Create the combined manifest
Now create the `kured-<release>-dockerhub.yaml` for e.g. `1.3.0`:
```sh
@@ -131,6 +205,7 @@ make DH_ORG="weaveworks" VERSION="${VERSION}" manifest
cat kured-rbac.yaml > "$MANIFEST"
cat kured-ds.yaml >> "$MANIFEST"
```
### Publish release artifacts
Now you can head to the Github UI, use the version number as tag and upload the
@@ -155,3 +230,6 @@ A change in the helm chart requires a bump of the `version`
in `charts/kured/Chart.yaml` (following the versioning rules).
Update it, and issue a PR. Upon merge, that PR will automatically
publish the chart to the gh-pages branch.
When there are open helm-chart PRs which are on hold until the helm-chart has been updated
with the new kured version, they can be merged now (unless a rebase is needed from the contributor).

5
MAINTAINERS Normal file

@@ -0,0 +1,5 @@
Christian Kotzbauer <christian.kotzbauer@gmail.com> (@ckotzbauer)
Daniel Holbach <daniel@weave.works> (@dholbach)
Hidde Beydals <hidde@weave.works> (@hiddeco)
Jean-Philippe Evrard <jean-philippe.evrard@suse.com> (@evrardjp)
Jack Francis <jackfrancis@gmail.com> (@jackfrancis)


@@ -1,5 +1,5 @@
.DEFAULT: all
.PHONY: all clean image publish-image minikube-publish manifest helm-chart
.PHONY: all clean image publish-image minikube-publish manifest helm-chart test tests
DH_ORG=weaveworks
VERSION=$(shell git symbolic-ref --short HEAD)-$(shell git rev-parse --short HEAD)
@@ -24,12 +24,14 @@ build/.image.done: cmd/kured/Dockerfile cmd/kured/kured
cp $^ build
$(SUDO) docker build -t docker.io/$(DH_ORG)/kured -f build/Dockerfile ./build
$(SUDO) docker tag docker.io/$(DH_ORG)/kured docker.io/$(DH_ORG)/kured:$(VERSION)
$(SUDO) docker tag docker.io/$(DH_ORG)/kured ghcr.io/$(DH_ORG)/kured:$(VERSION)
touch $@
image: build/.image.done
publish-image: image
$(SUDO) docker push docker.io/$(DH_ORG)/kured:$(VERSION)
$(SUDO) docker push ghcr.io/$(DH_ORG)/kured:$(VERSION)
minikube-publish: image
$(SUDO) docker save docker.io/$(DH_ORG)/kured | (eval $$(minikube docker-env) && docker load)
@@ -40,6 +42,14 @@ manifest:
helm-chart:
sed -i "s#repository:.*/kured#repository: $(DH_ORG)/kured#g" charts/kured/values.yaml
sed -i "s#tag:.*#tag: $(VERSION)#g" charts/kured/values.yaml
sed -i "s#appVersion:.*#appVersion: \"$(VERSION)\"#g" charts/kured/Chart.yaml
sed -i "s#\`[0-9]*\.[0-9]*\.[0-9]*\`#\`$(VERSION)\`#g" charts/kured/README.md
echo "Please bump version in charts/kured/Chart.yaml"
test: tests
echo "Running go tests"
go test ./...
echo "Running golint on pkg"
golint ./pkg/...
echo "Running golint on cmd"
golint ./cmd/...

147
README.md

@@ -1,27 +1,29 @@
# kured - Kubernetes Reboot Daemon
<img src="https://github.com/weaveworks/kured/raw/master/img/logo.png" align="right"/>
<img src="https://github.com/weaveworks/kured/raw/main/img/logo.png" align="right"/>
* [Introduction](#introduction)
* [Kubernetes & OS Compatibility](#kubernetes-&-os-compatibility)
* [Installation](#installation)
* [Configuration](#configuration)
* [Reboot Sentinel File & Period](#reboot-sentinel-file-&-period)
* [Setting a schedule](#setting-a-schedule)
* [Blocking Reboots via Alerts](#blocking-reboots-via-alerts)
* [Blocking Reboots via Pods](#blocking-reboots-via-pods)
* [Prometheus Metrics](#prometheus-metrics)
* [Slack Notifications](#slack-notifications)
* [Overriding Lock Configuration](#overriding-lock-configuration)
* [Operation](#operation)
* [Testing](#testing)
* [Disabling Reboots](#disabling-reboots)
* [Manual Unlock](#manual-unlock)
* [Automatic Unlock](#automatic-unlock)
* [Building](#building)
* [Frequently Asked/Anticipated Questions](#frequently-askedanticipated-questions)
* [Getting Help](#getting-help)
- [Introduction](#introduction)
- [Kubernetes & OS Compatibility](#kubernetes--os-compatibility)
- [Installation](#installation)
- [Configuration](#configuration)
- [Reboot Sentinel File & Period](#reboot-sentinel-file--period)
- [Setting a schedule](#setting-a-schedule)
- [Blocking Reboots via Alerts](#blocking-reboots-via-alerts)
- [Blocking Reboots via Pods](#blocking-reboots-via-pods)
- [Prometheus Metrics](#prometheus-metrics)
- [Notifications](#notifications)
- [Overriding Lock Configuration](#overriding-lock-configuration)
- [Operation](#operation)
- [Testing](#testing)
- [Disabling Reboots](#disabling-reboots)
- [Manual Unlock](#manual-unlock)
- [Automatic Unlock](#automatic-unlock)
- [Delaying Lock Release](#delaying-lock-release)
- [Building](#building)
- [Frequently Asked/Anticipated Questions](#frequently-askedanticipated-questions)
- [Why is there no `latest` tag on Docker Hub?](#why-is-there-no-latest-tag-on-docker-hub)
- [Getting Help](#getting-help)
## Introduction
@@ -29,7 +31,8 @@ Kured (KUbernetes REboot Daemon) is a Kubernetes daemonset that
performs safe automatic node reboots when the need to do so is
indicated by the package management system of the underlying OS.
* Watches for the presence of a reboot sentinel e.g. `/var/run/reboot-required`
* Watches for the presence of a reboot sentinel file e.g. `/var/run/reboot-required`
or the successful run of a sentinel command.
* Utilises a lock in the API server to ensure only one node reboots at
a time
* Optionally defers reboots in the presence of active Prometheus alerts or selected pods
@@ -43,16 +46,19 @@ maintaining the lock and draining worker nodes. Kubernetes aims to provide
forwards and backwards compatibility of one minor version between client and
server:
| kured | kubectl | k8s.io/client-go | k8s.io/apimachinery | expected kubernetes compatibility |
|--------|---------|------------------|---------------------|-----------------------------------|
| master | 1.19.4 | v0.19.4 | v0.19.4 | 1.18.x, 1.19.x, 1.20.x |
| 1.6.0 | 1.19.4 | v0.19.4 | v0.19.4 | 1.18.x, 1.19.x, 1.20.x |
| 1.5.1 | 1.18.8 | v0.18.8 | v0.18.8 | 1.17.x, 1.18.x, 1.19.x |
| 1.4.4 | 1.17.7 | v0.17.0 | v0.17.0 | 1.16.x, 1.17.x, 1.18.x |
| 1.3.0 | 1.15.10 | v12.0.0 | release-1.15 | 1.15.x, 1.16.x, 1.17.x |
| 1.2.0 | 1.13.6 | v10.0.0 | release-1.13 | 1.12.x, 1.13.x, 1.14.x |
| 1.1.0 | 1.12.1 | v9.0.0 | release-1.12 | 1.11.x, 1.12.x, 1.13.x |
| 1.0.0 | 1.7.6 | v4.0.0 | release-1.7 | 1.6.x, 1.7.x, 1.8.x |
| kured | kubectl | k8s.io/client-go | k8s.io/apimachinery | expected kubernetes compatibility |
|-------|---------|------------------|---------------------|-----------------------------------|
| main | 1.22.4 | v0.22.4 | v0.22.4 | 1.21.x, 1.22.x, 1.23.x |
| 1.9.1 | 1.22.4 | v0.22.4 | v0.22.4 | 1.21.x, 1.22.x, 1.23.x |
| 1.8.1 | 1.21.4 | v0.21.4 | v0.21.4 | 1.20.x, 1.21.x, 1.22.x |
| 1.7.0 | 1.20.5 | v0.20.5 | v0.20.5 | 1.19.x, 1.20.x, 1.21.x |
| 1.6.1 | 1.19.4 | v0.19.4 | v0.19.4 | 1.18.x, 1.19.x, 1.20.x |
| 1.5.1 | 1.18.8 | v0.18.8 | v0.18.8 | 1.17.x, 1.18.x, 1.19.x |
| 1.4.4 | 1.17.7 | v0.17.0 | v0.17.0 | 1.16.x, 1.17.x, 1.18.x |
| 1.3.0 | 1.15.10 | v12.0.0 | release-1.15 | 1.15.x, 1.16.x, 1.17.x |
| 1.2.0 | 1.13.6 | v10.0.0 | release-1.13 | 1.12.x, 1.13.x, 1.14.x |
| 1.1.0 | 1.12.1 | v9.0.0 | release-1.12 | 1.11.x, 1.12.x, 1.13.x |
| 1.0.0 | 1.7.6 | v4.0.0 | release-1.7 | 1.6.x, 1.7.x, 1.8.x |
See the [release notes](https://github.com/weaveworks/kured/releases)
for specific version compatibility information, including which
@@ -80,25 +86,37 @@ The following arguments can be passed to kured via the daemonset pod template:
```console
Flags:
--lock-ttl time force clean annotation after this ammount of time (default 0, disabled)
--alert-filter-regexp regexp.Regexp alert names to ignore when checking for active alerts
--alert-firing-only bool only consider firing alerts when checking for active alerts
--blocking-pod-selector stringArray label selector identifying pods whose presence should prevent reboots
--drain-grace-period int time in seconds given to each pod to terminate gracefully, if negative, the default value specified in the pod will be used (default: -1)
--skip-wait-for-delete-timeout int when seconds is greater than zero, skip waiting for the pods whose deletion timestamp is older than N seconds while draining a node (default: 0)
--ds-name string name of daemonset on which to place lock (default "kured")
--ds-namespace string namespace containing daemonset on which to place lock (default "kube-system")
--end-time string only reboot before this time of day (default "23:59")
--end-time string schedule reboot only before this time of day (default "23:59:59")
--force-reboot bool force a reboot even if the drain is still running (default: false)
--drain-timeout duration timeout after which the drain is aborted (default: 0, infinite time)
-h, --help help for kured
--lock-annotation string annotation in which to record locking node (default "weave.works/kured-node-lock")
--period duration reboot check period (default 1h0m0s)
--prometheus-url string Prometheus instance to probe for active alerts
--reboot-days strings only reboot on these days (default [su,mo,tu,we,th,fr,sa])
--reboot-sentinel string path to file whose existence signals need to reboot (default "/var/run/reboot-required")
--slack-channel string slack channel for reboot notfications
--slack-hook-url string slack hook URL for reboot notfications
--slack-username string slack username for reboot notfications (default "kured")
--lock-release-delay duration hold lock after reboot by this duration (default: 0, disabled)
--lock-ttl duration expire lock annotation after this duration (default: 0, disabled)
--message-template-drain string message template used to notify about a node being drained (default "Draining node %s")
--message-template-reboot string message template used to notify about a node being rebooted (default "Rebooting node %s")
--start-time string only reboot after this time of day (default "0:00")
--time-zone string use this timezone to calculate allowed reboot time (default "UTC")
--notify-url url for reboot notifications (cannot use with --slack-hook-url flags)
--period duration reboot check period (default 1h0m0s)
--prefer-no-schedule-taint string Taint name applied during pending node reboot (to prevent receiving additional pods from other rebooting nodes). Disabled by default. Set e.g. to "weave.works/kured-node-reboot" to enable tainting.
--prometheus-url string Prometheus instance to probe for active alerts
--reboot-command string command to run when a reboot is required by the sentinel (default "/sbin/systemctl reboot")
--reboot-days strings schedule reboot on these days (default [su,mo,tu,we,th,fr,sa])
--reboot-delay duration add a delay after drain finishes but before the reboot command is issued (default 0, no time)
--reboot-sentinel string path to file whose existence signals need to reboot (default "/var/run/reboot-required")
--reboot-sentinel-command string command for which a successful run signals need to reboot (default ""). If non-empty, sentinel file will be ignored.
--slack-channel string slack channel for reboot notfications
--slack-hook-url string slack hook URL for reboot notfications [deprecated in favor of --notify-url]
--slack-username string slack username for reboot notfications (default "kured")
--start-time string schedule reboot only after this time of day (default "0:00")
--time-zone string use this timezone for schedule inputs (default "UTC")
--log-format string log format specified as text or json, defaults to "text"
```
### Reboot Sentinel File & Period
@@ -109,6 +127,10 @@ values with `--reboot-sentinel` and `--period`. Each replica of the
daemon uses a random offset derived from the period on startup so that
nodes don't all contend for the lock simultaneously.
Alternatively, a reboot sentinel command can be used. If a reboot
sentinel command is used, the reboot sentinel file presence will be
ignored.
### Setting a schedule
By default, kured will reboot any time it detects the sentinel, but this
@@ -149,6 +171,11 @@ will block reboots, however you can ignore specific alerts:
--alert-filter-regexp=^(RebootRequired|AnotherBenignAlert|...)$
```
You can also only block reboots for firing alerts:
```console
--alert-firing-only=true
```
See the section on Prometheus metrics for an important application of this
filter.
@@ -211,21 +238,34 @@ If you choose to employ such an alert and have configured kured to
probe for active alerts before rebooting, be sure to specify
`--alert-filter-regexp=^RebootRequired$` to avoid deadlock!
### Slack Notifications
### Notifications
If you specify a Slack hook via `--slack-hook-url`, kured will notify
you immediately prior to rebooting a node:
When you specify a formatted URL using `--notify-url`, kured will notify
about draining and rebooting nodes across a list of technologies.
![Notification](img/slack-notification.png)
We recommend setting `--slack-username` to be the name of the
environment, e.g. `dev` or `prod`.
Alternatively, you can use the `--message-template-drain` and `--message-template-reboot` flags to customize the text of the message, e.g.
```
```cli
--message-template-drain="Draining node %s part of *my-cluster* in region *xyz*"
```
Here is the syntax:
- slack: `slack://tokenA/tokenB/tokenC`
(`--slack-hook-url` is deprecated but still works)
- rocketchat: `rocketchat://[username@]rocketchat-host/token[/channel|@recipient]`
- teams: `teams://tName/token-a/token-b/token-c`
> **Attention**: because the [format of the URL has changed](https://github.com/containrrr/shoutrrr/issues/138), you also have to specify a `tName`.
- Email: `smtp://username:password@host:port/?fromAddress=fromAddress&toAddresses=recipient1[,recipient2,...]`
More details here: [containrrr.dev/shoutrrr/v0.4/services/overview](https://containrrr.dev/shoutrrr/v0.4/services/overview)
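To make the template and URL syntax concrete, here is a small sketch using only the Go standard library. The helpers `drainMessage` and `notifyService` are hypothetical names for illustration; they do not send anything and are not part of kured or shoutrrr.

```go
package main

import (
	"fmt"
	"net/url"
)

// drainMessage (hypothetical helper) fills the single %s placeholder of a
// message template with the node name.
func drainMessage(template, nodeID string) string {
	return fmt.Sprintf(template, nodeID)
}

// notifyService (hypothetical helper) extracts the service name, i.e. the
// URL scheme, from a notification URL such as slack://tokenA/tokenB/tokenC.
func notifyService(notifyURL string) (string, error) {
	u, err := url.Parse(notifyURL)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" {
		return "", fmt.Errorf("notify URL %q has no service scheme", notifyURL)
	}
	return u.Scheme, nil
}

func main() {
	msg := drainMessage("Draining node %s part of *my-cluster* in region *xyz*", "node-1")
	service, err := notifyService("slack://tokenA/tokenB/tokenC")
	if err != nil {
		fmt.Println("invalid notify URL:", err)
		return
	}
	fmt.Printf("would send via %s: %s\n", service, msg)
}
```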
### Overriding Lock Configuration
The `--ds-name` and `--ds-namespace` arguments should match the name and
@@ -282,6 +322,11 @@ which holds lock might be killed thus annotation will stay there for ever.
Using `--lock-ttl=30m` will allow other nodes to take over if TTL has expired (in this case 30min) and continue reboot process.
### Delaying Lock Release
Using `--lock-release-delay=30m` will cause nodes to hold the lock for the specified time frame (in this case 30min) before it is released and the reboot process continues. This can be used to throttle reboots across the cluster.
## Building
Kured now uses [Go
@@ -314,7 +359,7 @@ our [development][development] docs.
Use of `latest` for production deployments is bad practice - see
[here](https://kubernetes.io/docs/concepts/configuration/overview) for
details. The manifest on `master` refers to `latest` for local
details. The manifest on `main` refers to `latest` for local
development testing with minikube only; for production use choose a
versioned manifest from the [release page](https://github.com/weaveworks/kured/releases/).
@@ -328,4 +373,6 @@ If you have any questions about, feedback for or problems with `kured`:
* Join us in [our monthly meeting](https://docs.google.com/document/d/1bsHTjHhqaaZ7yJnXF6W8c89UB_yn-OoSZEmDnIP34n8/edit#),
every fourth Wednesday of the month at 16:00 UTC.
We follow the [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md).
Your feedback is always welcome!


@@ -1,8 +1,8 @@
apiVersion: v1
appVersion: "1.5.1"
appVersion: "1.9.1"
description: A Helm chart for kured
name: kured
version: 2.2.3
version: 2.11.2
home: https://github.com/weaveworks/kured
maintainers:
- name: ckotzbauer
@@ -11,4 +11,4 @@ maintainers:
email: david@davidkarlsen.com
sources:
- https://github.com/weaveworks/kured
icon: https://raw.githubusercontent.com/weaveworks/kured/master/img/logo.png
icon: https://raw.githubusercontent.com/weaveworks/kured/main/img/logo.png


@@ -36,29 +36,44 @@ The following changes have been made compared to the stable chart:
| Config | Description | Default |
| ------ | ----------- | ------- |
| `image.repository` | Image repository | `weaveworks/kured` |
| `image.tag` | Image tag | `1.5.1` |
| `image.tag` | Image tag | `1.9.1` |
| `image.pullPolicy` | Image pull policy | `IfNotPresent` |
| `image.pullSecrets` | Image pull secrets | `[]` |
| `updateStrategy` | Daemonset update strategy | `OnDelete` |
| `updateStrategy` | Daemonset update strategy | `RollingUpdate` |
| `maxUnavailable` | The max pods unavailable during a rolling update | `1` |
| `podAnnotations` | Annotations to apply to pods (eg to add Prometheus annotations) | `{}` |
| `dsAnnotations` | Annotations to apply to the kured DaemonSet | `{}` |
| `extraArgs` | Extra arguments to pass to `/usr/bin/kured`. See below. | `{}` |
| `extraEnvVars` | Array of environment variables to pass to the daemonset. | `{}` |
| `configuration.lockTtl` | cli-parameter `--lock-ttl` | `0` |
| `configuration.lockReleaseDelay` | cli-parameter `--lock-release-delay` | `0` |
| `configuration.alertFilterRegexp` | cli-parameter `--alert-filter-regexp` | `""` |
| `configuration.alertFiringOnly` | cli-parameter `--alert-firing-only` | `false` |
| `configuration.blockingPodSelector` | Array of selectors for multiple cli-parameters `--blocking-pod-selector` | `[]` |
| `configuration.endTime` | cli-parameter `--end-time` | `""` |
| `configuration.lockAnnotation` | cli-parameter `--lock-annotation` | `""` |
| `configuration.period` | cli-parameter `--period` | `""` |
| `configuration.forceReboot` | cli-parameter `--force-reboot` | `false` |
| `configuration.drainGracePeriod` | cli-parameter `--drain-grace-period` | `""` |
| `configuration.drainTimeout` | cli-parameter `--drain-timeout` | `""` |
| `configuration.skipWaitForDeleteTimeout` | cli-parameter `--skip-wait-for-delete-timeout` | `""` |
| `configuration.prometheusUrl` | cli-parameter `--prometheus-url` | `""` |
| `configuration.rebootDays` | Array of days for multiple cli-parameters `--reboot-days` | `[]` |
| `configuration.rebootSentinel` | cli-parameter `--reboot-sentinel` | `""` |
| `configuration.rebootSentinelCommand` | cli-parameter `--reboot-sentinel-command` | `""` |
| `configuration.rebootCommand` | cli-parameter `--reboot-command` | `""` |
| `configuration.rebootDelay` | cli-parameter `--reboot-delay` | `""` |
| `configuration.slackChannel` | cli-parameter `--slack-channel` | `""` |
| `configuration.slackHookUrl` | cli-parameter `--slack-hook-url` | `""` |
| `configuration.slackUsername` | cli-parameter `--slack-username` | `""` |
| `configuration.notifyUrl` | cli-parameter `--notify-url` | `""` |
| `configuration.messageTemplateDrain` | cli-parameter `--message-template-drain` | `""` |
| `configuration.messageTemplateReboot` | cli-parameter `--message-template-reboot` | `""` |
| `configuration.startTime` | cli-parameter `--start-time` | `""` |
| `configuration.timeZone` | cli-parameter `--time-zone` | `""` |
| `configuration.annotateNodes` | cli-parameter `--annotate-nodes` | `false` |
| `configuration.logFormat` | cli-parameter `--log-format` | `"text"` |
| `configuration.preferNoScheduleTaint` | Taint name applied during pending node reboot | `""` |
| `rbac.create` | Create RBAC roles | `true` |
| `serviceAccount.create` | Create a service account | `true` |
| `serviceAccount.name` | Service account name to create (or use if `serviceAccount.create` is false) | (chart fullname) |
@@ -70,13 +85,16 @@ The following changes have been made compared to the stable chart:
| `metrics.interval` | Interval prometheus should scrape the endpoint | `60s` |
| `metrics.scrapeTimeout` | A custom scrapeTimeout for prometheus | `""` |
| `service.create` | Create a Service for the metrics endpoint | `false` |
| `service.name` | Service name for the metrics endpoint | `""` |
| `service.port` | Port of the service to expose | `8080` |
| `service.annotations` | Annotations to apply to the service (eg to add Prometheus annotations) | `{}` |
| `podLabels` | Additional labels for pods (e.g. CostCenter=IT) | `{}` |
| `priorityClassName` | Priority Class to be used by the pods | `""` |
| `tolerations` | Tolerations to apply to the daemonset (eg to allow running on master) | `[{"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}]`|
| `affinity` | Affinity for the daemonset (ie, restrict which nodes kured runs on) | `{}` |
| `nodeSelector` | Node Selector for the daemonset (ie, restrict which nodes kured runs on) | `{}` |
| `volumeMounts` | Maps of volume mounts to mount | `{}` |
| `volumes` | Maps of volumes to mount | `{}` |
See https://github.com/weaveworks/kured#configuration for values (not contained in the `configuration` object) for `extraArgs`. Note that
```yaml
extraArgs:


@@ -0,0 +1,13 @@
# This is tested twice:
# Basic install test with chart-testing (on charts PRs)
# Functional testing in PRs (other PRs)
service:
create: true
name: kured-prometheus-endpoint
port: 8080
type: NodePort
nodePort: 30000
# Do not override configuration.period here, so that
# we can test the Prometheus-exposed metrics without rebooting.


@@ -5,9 +5,19 @@ metadata:
namespace: {{ .Release.Namespace }}
labels:
{{- include "kured.labels" . | nindent 4 }}
{{- if .Values.dsAnnotations }}
annotations:
{{- range $key, $value := .Values.dsAnnotations }}
{{ $key }}: {{ $value | quote }}
{{- end }}
{{- end }}
spec:
updateStrategy:
type: {{ .Values.updateStrategy }}
{{- if eq .Values.updateStrategy "RollingUpdate"}}
rollingUpdate:
maxUnavailable: {{ .Values.maxUnavailable }}
{{- end}}
selector:
matchLabels:
{{- include "kured.matchLabels" . | nindent 6 }}
@@ -15,6 +25,9 @@ spec:
metadata:
labels:
{{- include "kured.labels" . | nindent 8 }}
{{- if .Values.podLabels }}
{{- toYaml .Values.podLabels | nindent 8 }}
{{- end }}
{{- if .Values.podAnnotations }}
annotations:
{{- range $key, $value := .Values.podAnnotations }}
@@ -34,7 +47,7 @@ spec:
{{- end }}
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
securityContext:
privileged: true # Give permission to nsenter /proc/1/ns/mnt
@@ -48,9 +61,15 @@ spec:
{{- if .Values.configuration.lockTtl }}
- --lock-ttl={{ .Values.configuration.lockTtl }}
{{- end }}
{{- if .Values.configuration.lockReleaseDelay }}
- --lock-release-delay={{ .Values.configuration.lockReleaseDelay }}
{{- end }}
{{- if .Values.configuration.alertFilterRegexp }}
- --alert-filter-regexp={{ .Values.configuration.alertFilterRegexp }}
{{- end }}
{{- if .Values.configuration.alertFiringOnly }}
- --alert-firing-only={{ .Values.configuration.alertFiringOnly }}
{{- end }}
{{- range .Values.configuration.blockingPodSelector }}
- --blocking-pod-selector={{ . }}
{{- end }}
@@ -63,6 +82,18 @@ spec:
{{- if .Values.configuration.period }}
- --period={{ .Values.configuration.period }}
{{- end }}
{{- if .Values.configuration.forceReboot }}
- --force-reboot
{{- end }}
{{- if .Values.configuration.drainGracePeriod }}
- --drain-grace-period={{ .Values.configuration.drainGracePeriod }}
{{- end }}
{{- if .Values.configuration.drainTimeout }}
- --drain-timeout={{ .Values.configuration.drainTimeout }}
{{- end }}
{{- if .Values.configuration.skipWaitForDeleteTimeout }}
- --skip-wait-for-delete-timeout={{ .Values.configuration.skipWaitForDeleteTimeout }}
{{- end }}
{{- if .Values.configuration.prometheusUrl }}
- --prometheus-url={{ .Values.configuration.prometheusUrl }}
{{- end }}
@@ -72,6 +103,15 @@ spec:
{{- if .Values.configuration.rebootSentinel }}
- --reboot-sentinel={{ .Values.configuration.rebootSentinel }}
{{- end }}
{{- if .Values.configuration.rebootSentinelCommand }}
- --reboot-sentinel-command={{ .Values.configuration.rebootSentinelCommand }}
{{- end }}
{{- if .Values.configuration.rebootCommand }}
- --reboot-command={{ .Values.configuration.rebootCommand }}
{{- end }}
{{- if .Values.configuration.rebootDelay }}
- --reboot-delay={{ .Values.configuration.rebootDelay }}
{{- end }}
{{- if .Values.configuration.slackChannel }}
- --slack-channel={{ .Values.configuration.slackChannel }}
{{- end }}
@@ -81,6 +121,9 @@ spec:
{{- if .Values.configuration.slackUsername }}
- --slack-username={{ .Values.configuration.slackUsername }}
{{- end }}
{{- if .Values.configuration.notifyUrl }}
- --notify-url={{ .Values.configuration.notifyUrl }}
{{- end }}
{{- if .Values.configuration.messageTemplateDrain }}
- --message-template-drain={{ .Values.configuration.messageTemplateDrain }}
{{- end }}
@@ -93,6 +136,15 @@ spec:
{{- if .Values.configuration.timeZone }}
- --time-zone={{ .Values.configuration.timeZone }}
{{- end }}
{{- if .Values.configuration.annotateNodes }}
- --annotate-nodes={{ .Values.configuration.annotateNodes }}
{{- end }}
{{- if .Values.configuration.preferNoScheduleTaint }}
- --prefer-no-schedule-taint={{ .Values.configuration.preferNoScheduleTaint }}
{{- end }}
{{- if .Values.configuration.logFormat }}
- --log-format={{ .Values.configuration.logFormat }}
{{- end }}
{{- range $key, $value := .Values.extraArgs }}
{{- if $value }}
- --{{ $key }}={{ $value }}
@@ -100,6 +152,10 @@ spec:
- --{{ $key }}
{{- end }}
{{- end }}
{{- if .Values.volumeMounts }}
volumeMounts:
{{- toYaml .Values.volumeMounts | nindent 12 }}
{{- end }}
ports:
- containerPort: 8080
name: metrics
@@ -125,3 +181,7 @@ spec:
affinity:
{{ toYaml . | indent 8 }}
{{- end }}
{{- if .Values.volumes }}
volumes:
{{- toYaml .Values.volumes | nindent 8 }}
{{- end }}


@@ -2,7 +2,11 @@
apiVersion: v1
kind: Service
metadata:
{{- if .Values.service.name }}
name: {{ .Values.service.name }}
{{- else }}
name: {{ template "kured.fullname" . }}
{{- end }}
labels:
{{- include "kured.labels" . | nindent 4 }}
{{- if .Values.service.annotations }}
@@ -12,11 +16,14 @@ metadata:
{{- end }}
{{- end }}
spec:
type: ClusterIP
type: {{ .Values.service.type }}
ports:
- name: metrics
port: {{ .Values.service.port }}
targetPort: 8080
{{- if eq .Values.service.type "NodePort" }}
nodePort: {{ .Values.service.nodePort }}
{{- end }}
selector:
{{- include "kured.matchLabels" . | nindent 4 }}
{{- end }}
{{- end }}


@@ -3,19 +3,29 @@ image:
tag: latest
configuration:
# annotationTtl: 0 # force clean annotation after this ammount of time (default 0, disabled)
# annotationTtl: 0 # force clean annotation after this amount of time (default 0, disabled)
# alertFilterRegexp: "" # alert names to ignore when checking for active alerts
# alertFiringOnly: false # only consider firing alerts when checking for active alerts
# blockingPodSelector: [] # label selector identifying pods whose presence should prevent reboots
# endTime: "" # only reboot before this time of day (default "23:59")
# lockAnnotation: "" # annotation in which to record locking node (default "weave.works/kured-node-lock")
period: "1m" # reboot check period (default 1h0m0s)
# forceReboot: false # force a reboot even if the drain fails or times out (default: false)
# drainGracePeriod: "" # time in seconds given to each pod to terminate gracefully, if negative, the default value specified in the pod will be used (default: -1)
# drainTimeout: "" # timeout after which the drain is aborted (default: 0, infinite time)
# skipWaitForDeleteTimeout: "" # when time is greater than zero, skip waiting for the pods whose deletion timestamp is older than N seconds while draining a node (default: 0)
# prometheusUrl: "" # Prometheus instance to probe for active alerts
# rebootDays: [] # only reboot on these days (default [su,mo,tu,we,th,fr,sa])
# rebootSentinel: "" # path to file whose existence signals need to reboot (default "/var/run/reboot-required")
# rebootSentinelCommand: "" # command for which a successful run signals need to reboot (default ""). If non-empty, sentinel file will be ignored.
# slackChannel: "" # slack channel for reboot notifications
# slackHookUrl: "" # slack hook URL for reboot notifications
# slackUsername: "" # slack username for reboot notifications (default "kured")
# notifyUrl: "" # notification URL with the syntax as follows: https://containrrr.dev/shoutrrr/services/overview/
# messageTemplateDrain: "" # slack message template when notifying about a node being drained (default "Draining node %s")
# messageTemplateReboot: "" # slack message template when notifying about a node being rebooted (default "Rebooting node %s")
# startTime: "" # only reboot after this time of day (default "0:00")
# timeZone: "" # time-zone to use (valid zones from "time" golang package)
# annotateNodes: false # enable 'weave.works/kured-reboot-in-progress' and 'weave.works/kured-most-recent-reboot-needed' node annotations to signify kured reboot operations
# lockReleaseDelay: "5m" # hold lock after reboot by this amount of time (default 0, disabled)
# logFormat: "text" # log format specified as text or json, defaults to text


@@ -1,12 +1,15 @@
image:
repository: weaveworks/kured
tag: 1.5.1
tag: "" # will default to the appVersion in Chart.yaml
pullPolicy: IfNotPresent
pullSecrets: []
updateStrategy: OnDelete
updateStrategy: RollingUpdate
# requires RollingUpdate updateStrategy
maxUnavailable: 1
podAnnotations: {}
dsAnnotations: {}
extraArgs: {}
@@ -20,22 +23,35 @@ extraEnvVars:
# value: 123
configuration:
lockTtl: 0 # force clean annotation after this ammount of time (default 0, disabled)
lockTtl: 0 # force clean annotation after this amount of time (default 0, disabled)
alertFilterRegexp: "" # alert names to ignore when checking for active alerts
alertFiringOnly: false # only consider firing alerts when checking for active alerts
blockingPodSelector: [] # label selector identifying pods whose presence should prevent reboots
endTime: "" # only reboot before this time of day (default "23:59")
lockAnnotation: "" # annotation in which to record locking node (default "weave.works/kured-node-lock")
period: "" # reboot check period (default 1h0m0s)
forceReboot: false # force a reboot even if the drain fails or times out (default: false)
drainGracePeriod: "" # time in seconds given to each pod to terminate gracefully, if negative, the default value specified in the pod will be used (default: -1)
drainTimeout: "" # timeout after which the drain is aborted (default: 0, infinite time)
skipWaitForDeleteTimeout: "" # when time is greater than zero, skip waiting for the pods whose deletion timestamp is older than N seconds while draining a node (default: 0)
prometheusUrl: "" # Prometheus instance to probe for active alerts
rebootDays: [] # only reboot on these days (default [su,mo,tu,we,th,fr,sa])
rebootSentinel: "" # path to file whose existence signals need to reboot (default "/var/run/reboot-required")
rebootSentinelCommand: "" # command for which a successful run signals need to reboot (default ""). If non-empty, sentinel file will be ignored.
rebootCommand: "/bin/systemctl reboot" # command to run when a reboot is required by the sentinel
rebootDelay: "" # add a delay after drain finishes but before the reboot command is issued
slackChannel: "" # slack channel for reboot notifications
slackHookUrl: "" # slack hook URL for reboot notifications
slackUsername: "" # slack username for reboot notifications (default "kured")
notifyUrl: "" # notification URL with the syntax as follows: https://containrrr.dev/shoutrrr/services/overview/
messageTemplateDrain: "" # slack message template when notifying about a node being drained (default "Draining node %s")
messageTemplateReboot: "" # slack message template when notifying about a node being rebooted (default "Rebooting node %s")
startTime: "" # only reboot after this time of day (default "0:00")
timeZone: "" # time-zone to use (valid zones from "time" golang package)
annotateNodes: false # enable 'weave.works/kured-reboot-in-progress' and 'weave.works/kured-most-recent-reboot-needed' node annotations to signify kured reboot operations
lockReleaseDelay: 0 # hold lock after reboot by this amount of time (default 0, disabled)
preferNoScheduleTaint: "" # Taint name applied during pending node reboot (to prevent receiving additional pods from other rebooting nodes). Disabled by default. Set e.g. to "weave.works/kured-node-reboot" to enable tainting.
logFormat: "text" # log format specified as text or json, defaults to text
rbac:
create: true
@@ -60,6 +76,10 @@ service:
create: false
port: 8080
annotations: {}
name: ""
type: ClusterIP
podLabels: {}
priorityClassName: ""
@@ -70,3 +90,7 @@ tolerations:
affinity: {}
nodeSelector: {}
volumeMounts: []
volumes: []


@@ -1,4 +1,4 @@
FROM alpine:3.12
FROM alpine:3.15.0
RUN apk update --no-cache && apk upgrade --no-cache && apk add --no-cache ca-certificates tzdata
COPY ./kured /usr/bin/kured
ENTRYPOINT ["/usr/bin/kured"]


@@ -0,0 +1,19 @@
FROM --platform=$BUILDPLATFORM golang:bullseye AS build
ARG TARGETOS
ARG TARGETARCH
ARG TARGETVARIANT
ENV GOOS=$TARGETOS
ENV GOARCH=$TARGETARCH
ENV GOVARIANT=$TARGETVARIANT
WORKDIR /src
COPY . .
RUN go list -f '{{join .Deps "\n"}}' ./cmd/kured | grep -v /vendor/ | xargs go list -f '{{if not .Standard}}{{ $dep := . }}{{range .GoFiles}}{{$dep.Dir}}/{{.}} {{end}}{{end}}'
RUN CGO_ENABLED=0 go build -o cmd/kured/kured cmd/kured/*.go
FROM --platform=$TARGETPLATFORM alpine:3.15 as bin
RUN apk update --no-cache && apk upgrade --no-cache && apk add --no-cache ca-certificates tzdata
COPY --from=build /src/cmd/kured/kured /usr/bin/kured
ENTRYPOINT ["/usr/bin/kured"]


@@ -2,28 +2,38 @@ package main
import (
"context"
"encoding/json"
"fmt"
"math/rand"
"net/http"
"net/url"
"os"
"os/exec"
"regexp"
"strings"
"time"
papi "github.com/prometheus/client_golang/api"
log "github.com/sirupsen/logrus"
"github.com/spf13/cobra"
"github.com/spf13/pflag"
"github.com/spf13/viper"
v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/rest"
kubectldrain "k8s.io/kubectl/pkg/drain"
"github.com/google/shlex"
shoutrrr "github.com/containrrr/shoutrrr"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promhttp"
"github.com/weaveworks/kured/pkg/alerts"
"github.com/weaveworks/kured/pkg/daemonsetlock"
"github.com/weaveworks/kured/pkg/delaytick"
"github.com/weaveworks/kured/pkg/notifications/slack"
"github.com/weaveworks/kured/pkg/taints"
"github.com/weaveworks/kured/pkg/timewindow"
)
@@ -31,25 +41,39 @@ var (
version = "unreleased"
// Command line flags
period time.Duration
dsNamespace string
dsName string
lockAnnotation string
lockTTL time.Duration
prometheusURL string
alertFilter *regexp.Regexp
rebootSentinel string
slackHookURL string
slackUsername string
slackChannel string
messageTemplateDrain string
messageTemplateReboot string
podSelectors []string
forceReboot bool
drainTimeout time.Duration
rebootDelay time.Duration
period time.Duration
drainGracePeriod int
skipWaitForDeleteTimeoutSeconds int
dsNamespace string
dsName string
lockAnnotation string
lockTTL time.Duration
lockReleaseDelay time.Duration
prometheusURL string
preferNoScheduleTaintName string
alertFilter *regexp.Regexp
alertFiringOnly bool
rebootSentinelFile string
rebootSentinelCommand string
notifyURL string
slackHookURL string
slackUsername string
slackChannel string
messageTemplateDrain string
messageTemplateReboot string
podSelectors []string
rebootCommand string
logFormat string
nodeID string
rebootDays []string
rebootStart string
rebootEnd string
timezone string
rebootDays []string
rebootStart string
rebootEnd string
timezone string
annotateNodes bool
// Metrics
rebootRequiredGauge = prometheus.NewGaugeVec(prometheus.GaugeOpts{
@@ -59,39 +83,85 @@ var (
}, []string{"node"})
)
const (
// KuredNodeLockAnnotation is the canonical string value for the kured node-lock annotation
KuredNodeLockAnnotation string = "weave.works/kured-node-lock"
// KuredRebootInProgressAnnotation is the canonical string value for the kured reboot-in-progress annotation
KuredRebootInProgressAnnotation string = "weave.works/kured-reboot-in-progress"
// KuredMostRecentRebootNeededAnnotation is the canonical string value for the kured most-recent-reboot-needed annotation
KuredMostRecentRebootNeededAnnotation string = "weave.works/kured-most-recent-reboot-needed"
// EnvPrefix is the environment variable prefix for all environment variables bound to our command line flags.
EnvPrefix = "KURED"
)
func init() {
prometheus.MustRegister(rebootRequiredGauge)
}
func main() {
rootCmd := &cobra.Command{
Use: "kured",
Short: "Kubernetes Reboot Daemon",
Run: root}
cmd := NewRootCommand()
if err := cmd.Execute(); err != nil {
log.Fatal(err)
}
}
// NewRootCommand constructs the Cobra root command
func NewRootCommand() *cobra.Command {
rootCmd := &cobra.Command{
Use: "kured",
Short: "Kubernetes Reboot Daemon",
PersistentPreRunE: bindViper,
PreRun: flagCheck,
Run: root}
rootCmd.PersistentFlags().StringVar(&nodeID, "node-id", "",
"node name kured runs on, should be passed down from spec.nodeName via KURED_NODE_ID environment variable")
rootCmd.PersistentFlags().BoolVar(&forceReboot, "force-reboot", false,
"force a reboot even if the drain fails or times out (default: false)")
rootCmd.PersistentFlags().IntVar(&drainGracePeriod, "drain-grace-period", -1,
"time in seconds given to each pod to terminate gracefully, if negative, the default value specified in the pod will be used (default: -1)")
rootCmd.PersistentFlags().IntVar(&skipWaitForDeleteTimeoutSeconds, "skip-wait-for-delete-timeout", 0,
"when seconds is greater than zero, skip waiting for the pods whose deletion timestamp is older than N seconds while draining a node (default: 0)")
rootCmd.PersistentFlags().DurationVar(&drainTimeout, "drain-timeout", 0,
"timeout after which the drain is aborted (default: 0, infinite time)")
rootCmd.PersistentFlags().DurationVar(&rebootDelay, "reboot-delay", 0,
"delay reboot for this duration (default: 0, disabled)")
rootCmd.PersistentFlags().DurationVar(&period, "period", time.Minute*60,
"reboot check period")
"sentinel check period")
rootCmd.PersistentFlags().StringVar(&dsNamespace, "ds-namespace", "kube-system",
"namespace containing daemonset on which to place lock")
rootCmd.PersistentFlags().StringVar(&dsName, "ds-name", "kured",
"name of daemonset on which to place lock")
rootCmd.PersistentFlags().StringVar(&lockAnnotation, "lock-annotation", "weave.works/kured-node-lock",
rootCmd.PersistentFlags().StringVar(&lockAnnotation, "lock-annotation", KuredNodeLockAnnotation,
"annotation in which to record locking node")
rootCmd.PersistentFlags().DurationVar(&lockTTL, "lock-ttl", 0,
"expire lock annotation after this duration (default: 0, disabled)")
rootCmd.PersistentFlags().DurationVar(&lockReleaseDelay, "lock-release-delay", 0,
"delay lock release for this duration (default: 0, disabled)")
rootCmd.PersistentFlags().StringVar(&prometheusURL, "prometheus-url", "",
"Prometheus instance to probe for active alerts")
rootCmd.PersistentFlags().Var(&regexpValue{&alertFilter}, "alert-filter-regexp",
"alert names to ignore when checking for active alerts")
rootCmd.PersistentFlags().StringVar(&rebootSentinel, "reboot-sentinel", "/var/run/reboot-required",
"path to file whose existence signals need to reboot")
rootCmd.PersistentFlags().BoolVar(&alertFiringOnly, "alert-firing-only", false,
"only consider firing alerts when checking for active alerts (default: false)")
rootCmd.PersistentFlags().StringVar(&rebootSentinelFile, "reboot-sentinel", "/var/run/reboot-required",
"path to file whose existence triggers the reboot command")
rootCmd.PersistentFlags().StringVar(&preferNoScheduleTaintName, "prefer-no-schedule-taint", "",
"Taint name applied during pending node reboot (to prevent receiving additional pods from other rebooting nodes). Disabled by default. Set e.g. to \"weave.works/kured-node-reboot\" to enable tainting.")
rootCmd.PersistentFlags().StringVar(&rebootSentinelCommand, "reboot-sentinel-command", "",
"command for which a zero return code will trigger a reboot command")
rootCmd.PersistentFlags().StringVar(&rebootCommand, "reboot-command", "/bin/systemctl reboot",
"command to run when a reboot is required")
rootCmd.PersistentFlags().StringVar(&slackHookURL, "slack-hook-url", "",
"slack hook URL for reboot notfications")
"slack hook URL for notifications")
rootCmd.PersistentFlags().StringVar(&slackUsername, "slack-username", "kured",
"slack username for reboot notfications")
"slack username for notifications")
rootCmd.PersistentFlags().StringVar(&slackChannel, "slack-channel", "",
"slack channel for reboot notifications")
rootCmd.PersistentFlags().StringVar(&notifyURL, "notify-url", "",
"notify URL for reboot notifications")
rootCmd.PersistentFlags().StringVar(&messageTemplateDrain, "message-template-drain", "Draining node %s",
"message template used to notify about a node being drained")
rootCmd.PersistentFlags().StringVar(&messageTemplateReboot, "message-template-reboot", "Rebooting node %s",
@@ -109,15 +179,71 @@ func main() {
rootCmd.PersistentFlags().StringVar(&timezone, "time-zone", "UTC",
"use this timezone for schedule inputs")
if err := rootCmd.Execute(); err != nil {
log.Fatal(err)
rootCmd.PersistentFlags().BoolVar(&annotateNodes, "annotate-nodes", false,
"if set, the annotations 'weave.works/kured-reboot-in-progress' and 'weave.works/kured-most-recent-reboot-needed' will be given to nodes undergoing kured reboots")
rootCmd.PersistentFlags().StringVar(&logFormat, "log-format", "text",
"use text or json log format")
return rootCmd
}
// temporary func that checks for deprecated slack-notification-related flags
func flagCheck(cmd *cobra.Command, args []string) {
if slackHookURL != "" && notifyURL != "" {
log.Warnf("Cannot use both --notify-url and --slack-hook-url flags. Kured will use --notify-url flag only...")
}
if slackHookURL != "" {
log.Warnf("Deprecated flag(s). Please use --notify-url flag instead.")
trataURL, err := url.Parse(slackHookURL)
if err != nil {
log.Warnf("slack-hook-url is not properly formatted... no notification will be sent: %v\n", err)
}
if len(strings.Split(strings.Trim(trataURL.Path, "/services/"), "/")) != 3 {
log.Warnf("slack-hook-url is not properly formatted... no notification will be sent: unexpected number of / in URL\n")
} else {
notifyURL = fmt.Sprintf("slack://%s", strings.Trim(trataURL.Path, "/services/"))
}
}
}
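Note on the conversion in flagCheck: strings.Trim treats its second argument as a set of characters, not as a literal prefix, so a token that starts or ends with any of the characters in "/services/" would be shortened. The sketch below shows one way the same Slack-hook-to-shoutrrr conversion could be written with strings.TrimPrefix, returning early on a parse error. The helper name slackHookToNotifyURL is hypothetical and not part of kured; the expected URL layout (three tokens after /services/) is taken from the check above.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// slackHookToNotifyURL is a hypothetical helper (not part of kured). It turns
// https://hooks.slack.com/services/A/B/C into the shoutrrr form slack://A/B/C.
func slackHookToNotifyURL(hook string) (string, error) {
	u, err := url.Parse(hook)
	if err != nil {
		return "", fmt.Errorf("parsing slack hook URL: %w", err)
	}
	const prefix = "/services/"
	if !strings.HasPrefix(u.Path, prefix) {
		return "", fmt.Errorf("path %q does not start with %s", u.Path, prefix)
	}
	// TrimPrefix removes the literal prefix only; TrimSuffix drops one trailing slash.
	tokens := strings.TrimSuffix(strings.TrimPrefix(u.Path, prefix), "/")
	parts := strings.Split(tokens, "/")
	if len(parts) != 3 {
		return "", fmt.Errorf("expected 3 tokens after %s, got %d", prefix, len(parts))
	}
	for _, p := range parts {
		if p == "" {
			return "", fmt.Errorf("empty token in %q", tokens)
		}
	}
	return "slack://" + tokens, nil
}

func main() {
	notify, err := slackHookToNotifyURL("https://hooks.slack.com/services/BLABLABA12345/IAM931A0VERY/COMPLICATED711854TOKEN1SET")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(notify)
}
```

With the URL used in Test_flagCheck this yields the same slack:// value the test expects; the difference only shows up for tokens whose first or last characters are among those in "/services/".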
// bindViper initializes viper and binds command flags with environment variables
func bindViper(cmd *cobra.Command, args []string) error {
v := viper.New()
v.SetEnvPrefix(EnvPrefix)
v.AutomaticEnv()
bindFlags(cmd, v)
return nil
}
// bindFlags binds each cobra flag to its associated viper configuration (environment variable)
func bindFlags(cmd *cobra.Command, v *viper.Viper) {
cmd.Flags().VisitAll(func(f *pflag.Flag) {
// Environment variables can't have dashes in them, so bind them to their equivalent keys with underscores
if strings.Contains(f.Name, "-") {
v.BindEnv(f.Name, flagToEnvVar(f.Name))
}
// Apply the viper config value to the flag when the flag is not set and viper has a value
if !f.Changed && v.IsSet(f.Name) {
val := v.Get(f.Name)
log.Infof("Binding %s command flag to environment variable: %s", f.Name, flagToEnvVar(f.Name))
cmd.Flags().Set(f.Name, fmt.Sprintf("%v", val))
}
})
}
// flagToEnvVar converts command flag name to equivalent environment variable name
func flagToEnvVar(flag string) string {
envVarSuffix := strings.ToUpper(strings.ReplaceAll(flag, "-", "_"))
return fmt.Sprintf("%s_%s", EnvPrefix, envVarSuffix)
}
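A small standalone illustration of the flag-to-environment-variable mapping implemented by flagToEnvVar above. The function body mirrors the diff; the lowercase constant envPrefix and the list of flag names in main are chosen here for the example only.

```go
package main

import (
	"fmt"
	"strings"
)

// envPrefix mirrors the EnvPrefix constant from the diff.
const envPrefix = "KURED"

// flagToEnvVar upper-cases the flag name, replaces dashes with underscores,
// and prepends the prefix, e.g. "lock-ttl" -> "KURED_LOCK_TTL".
func flagToEnvVar(flag string) string {
	suffix := strings.ToUpper(strings.ReplaceAll(flag, "-", "_"))
	return fmt.Sprintf("%s_%s", envPrefix, suffix)
}

func main() {
	// Example flag names taken from the flags registered in NewRootCommand.
	for _, f := range []string{"node-id", "lock-ttl", "reboot-sentinel-command", "period"} {
		fmt.Printf("--%s -> %s\n", f, flagToEnvVar(f))
	}
}
```

In a DaemonSet manifest this would mean that setting an env entry such as KURED_LOCK_TTL has the same effect as passing --lock-ttl, provided the flag was not also given on the command line (the !f.Changed check in bindFlags).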
// newCommand creates a new Command with stdout/stderr wired to our standard logger
func newCommand(name string, arg ...string) *exec.Cmd {
cmd := exec.Command(name, arg...)
cmd.Stdout = log.NewEntry(log.StandardLogger()).
WithField("cmd", cmd.Args[0]).
WithField("std", "out").
@@ -131,10 +257,19 @@ func newCommand(name string, arg ...string) *exec.Cmd {
return cmd
}
func sentinelExists() bool {
// Relies on hostPID:true and privileged:true to enter host mount space
sentinelCmd := newCommand("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "--", "/usr/bin/test", "-f", rebootSentinel)
if err := sentinelCmd.Run(); err != nil {
// buildHostCommand builds a command that runs in the host mount namespace.
// Rancher-based setups need a different PID.
func buildHostCommand(pid int, command []string) []string {
// From the container, we nsenter into the proper PID to run the hostCommand.
// For this, the kured daemonset needs to be configured with hostPID:true and privileged:true
cmd := []string{"/usr/bin/nsenter", fmt.Sprintf("-m/proc/%d/ns/mnt", pid), "--"}
cmd = append(cmd, command...)
return cmd
}
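To make the effect of buildHostCommand concrete, the sketch below prints the final argument vectors for the default sentinel check and the default reboot command. The function body follows the diff; the two example commands are the defaults named in the flag definitions, already split into arguments by hand for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// buildHostCommand prefixes a command with nsenter so that it runs in the
// mount namespace of the given PID (PID 1 is the host's init when hostPID is true).
func buildHostCommand(pid int, command []string) []string {
	cmd := []string{"/usr/bin/nsenter", fmt.Sprintf("-m/proc/%d/ns/mnt", pid), "--"}
	return append(cmd, command...)
}

func main() {
	sentinel := []string{"test", "-f", "/var/run/reboot-required"}
	reboot := []string{"/bin/systemctl", "reboot"}

	fmt.Println(strings.Join(buildHostCommand(1, sentinel), " "))
	fmt.Println(strings.Join(buildHostCommand(1, reboot), " "))
}
```

The "--" separator keeps nsenter from interpreting options of the wrapped command (such as -f) as its own.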
func rebootRequired(sentinelCommand []string) bool {
if err := newCommand(sentinelCommand[0], sentinelCommand[1:]...).Run(); err != nil {
switch err := err.(type) {
case *exec.ExitError:
// We assume a non-zero exit code means 'reboot not required', but of course
@@ -151,35 +286,56 @@ func sentinelExists() bool {
return true
}
func rebootRequired() bool {
if sentinelExists() {
log.Infof("Reboot required")
// RebootBlocker interface should be implemented by types
// to know if their instantiations should block a reboot
type RebootBlocker interface {
isBlocked() bool
}
// PrometheusBlockingChecker contains info for connecting
// to prometheus, and can give info about whether a reboot should be blocked
type PrometheusBlockingChecker struct {
// prometheusClient to make prometheus-go-client and api config available
// into the PrometheusBlockingChecker struct
promClient *alerts.PromClient
// regexp used to get alerts
filter *regexp.Regexp
// bool to indicate if only firing alerts should be considered
firingOnly bool
}
// KubernetesBlockingChecker contains info for connecting
// to k8s, and can give info about whether a reboot should be blocked
type KubernetesBlockingChecker struct {
// client used to contact kubernetes API
client *kubernetes.Clientset
nodename string
// list used to filter pods (podSelector)
filter []string
}
func (pb PrometheusBlockingChecker) isBlocked() bool {
alertNames, err := pb.promClient.ActiveAlerts(pb.filter, pb.firingOnly)
if err != nil {
log.Warnf("Reboot blocked: prometheus query error: %v", err)
return true
}
count := len(alertNames)
if count > 10 {
alertNames = append(alertNames[:10], "...")
}
if count > 0 {
log.Warnf("Reboot blocked: %d active alerts: %v", count, alertNames)
return true
}
log.Infof("Reboot not required")
return false
}
func rebootBlocked(client *kubernetes.Clientset, nodeID string) bool {
if prometheusURL != "" {
alertNames, err := alerts.PrometheusActiveAlerts(prometheusURL, alertFilter)
if err != nil {
log.Warnf("Reboot blocked: prometheus query error: %v", err)
return true
}
count := len(alertNames)
if count > 10 {
alertNames = append(alertNames[:10], "...")
}
if count > 0 {
log.Warnf("Reboot blocked: %d active alerts: %v", count, alertNames)
return true
}
}
fieldSelector := fmt.Sprintf("spec.nodeName=%s", nodeID)
for _, labelSelector := range podSelectors {
podList, err := client.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
func (kb KubernetesBlockingChecker) isBlocked() bool {
fieldSelector := fmt.Sprintf("spec.nodeName=%s,status.phase!=Succeeded,status.phase!=Failed,status.phase!=Unknown", kb.nodename)
for _, labelSelector := range kb.filter {
podList, err := kb.client.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
LabelSelector: labelSelector,
FieldSelector: fieldSelector,
Limit: 10})
@@ -200,7 +356,15 @@ func rebootBlocked(client *kubernetes.Clientset, nodeID string) bool {
return true
}
}
return false
}
func rebootBlocked(blockers ...RebootBlocker) bool {
for _, blocker := range blockers {
if blocker.isBlocked() {
return true
}
}
return false
}
@@ -230,6 +394,13 @@ func acquire(lock *daemonsetlock.DaemonSetLock, metadata interface{}, TTL time.D
}
}
func throttle(releaseDelay time.Duration) {
if releaseDelay > 0 {
log.Infof("Delaying lock release by %v", releaseDelay)
time.Sleep(releaseDelay)
}
}
func release(lock *daemonsetlock.DaemonSetLock) {
log.Infof("Releasing lock")
if err := lock.Release(); err != nil {
@@ -242,27 +413,39 @@ func drain(client *kubernetes.Clientset, node *v1.Node) {
log.Infof("Draining node %s", nodename)
if slackHookURL != "" {
if err := slack.NotifyDrain(slackHookURL, slackUsername, slackChannel, messageTemplateDrain, nodename); err != nil {
log.Warnf("Error notifying slack: %v", err)
if notifyURL != "" {
if err := shoutrrr.Send(notifyURL, fmt.Sprintf(messageTemplateDrain, nodename)); err != nil {
log.Warnf("Error notifying: %v", err)
}
}
drainer := &kubectldrain.Helper{
Client: client,
GracePeriodSeconds: -1,
Force: true,
DeleteLocalData: true,
IgnoreAllDaemonSets: true,
ErrOut: os.Stderr,
Out: os.Stdout,
Client: client,
Ctx: context.Background(),
GracePeriodSeconds: drainGracePeriod,
SkipWaitForDeleteTimeoutSeconds: skipWaitForDeleteTimeoutSeconds,
Force: true,
DeleteEmptyDirData: true,
IgnoreAllDaemonSets: true,
ErrOut: os.Stderr,
Out: os.Stdout,
Timeout: drainTimeout,
}
if err := kubectldrain.RunCordonOrUncordon(drainer, node, true); err != nil {
log.Fatal("Error cordonning %s: %v", nodename, err)
if !forceReboot {
log.Fatalf("Error cordonning %s: %v", nodename, err)
}
log.Errorf("Error cordonning %s: %v, continuing with reboot anyway", nodename, err)
return
}
if err := kubectldrain.RunNodeDrain(drainer, nodename); err != nil {
log.Fatal("Error draining %s: %v", nodename, err)
if !forceReboot {
log.Fatalf("Error draining %s: %v", nodename, err)
}
log.Errorf("Error draining %s: %v, continuing with reboot anyway", nodename, err)
return
}
}
@@ -273,31 +456,30 @@ func uncordon(client *kubernetes.Clientset, node *v1.Node) {
Client: client,
ErrOut: os.Stderr,
Out: os.Stdout,
Ctx: context.Background(),
}
if err := kubectldrain.RunCordonOrUncordon(drainer, node, false); err != nil {
log.Fatal("Error uncordonning %s: %v", nodename, err)
log.Fatalf("Error uncordonning %s: %v", nodename, err)
}
}
func commandReboot(nodeID string) {
log.Infof("Commanding reboot for node: %s", nodeID)
func invokeReboot(nodeID string, rebootCommand []string) {
log.Infof("Running command: %s for node: %s", rebootCommand, nodeID)
if slackHookURL != "" {
if err := slack.NotifyReboot(slackHookURL, slackUsername, slackChannel, messageTemplateReboot, nodeID); err != nil {
log.Warnf("Error notifying slack: %v", err)
if notifyURL != "" {
if err := shoutrrr.Send(notifyURL, fmt.Sprintf(messageTemplateReboot, nodeID)); err != nil {
log.Warnf("Error notifying: %v", err)
}
}
// Relies on hostPID:true and privileged:true to enter host mount space
rebootCmd := newCommand("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "/bin/systemctl", "reboot")
if err := rebootCmd.Run(); err != nil {
if err := newCommand(rebootCommand[0], rebootCommand[1:]...).Run(); err != nil {
log.Fatalf("Error invoking reboot command: %v", err)
}
}
func maintainRebootRequiredMetric(nodeID string) {
func maintainRebootRequiredMetric(nodeID string, sentinelCommand []string) {
for {
if sentinelExists() {
if rebootRequired(sentinelCommand) {
rebootRequiredGauge.WithLabelValues(nodeID).Set(1)
} else {
rebootRequiredGauge.WithLabelValues(nodeID).Set(0)
@@ -311,7 +493,45 @@ type nodeMeta struct {
Unschedulable bool `json:"unschedulable"`
}
func rebootAsRequired(nodeID string, window *timewindow.TimeWindow, TTL time.Duration) {
func addNodeAnnotations(client *kubernetes.Clientset, nodeID string, annotations map[string]string) {
node, err := client.CoreV1().Nodes().Get(context.TODO(), nodeID, metav1.GetOptions{})
if err != nil {
log.Fatalf("Error retrieving node object via k8s API: %s", err)
}
for k, v := range annotations {
node.Annotations[k] = v
log.Infof("Adding node %s annotation: %s=%s", node.GetName(), k, v)
}
bytes, err := json.Marshal(node)
if err != nil {
log.Fatalf("Error marshalling node object into JSON: %v", err)
}
_, err = client.CoreV1().Nodes().Patch(context.TODO(), node.GetName(), types.StrategicMergePatchType, bytes, metav1.PatchOptions{})
if err != nil {
var annotationsErr string
for k, v := range annotations {
annotationsErr += fmt.Sprintf("%s=%s ", k, v)
}
log.Fatalf("Error adding node annotations %s via k8s API: %v", annotationsErr, err)
}
}
func deleteNodeAnnotation(client *kubernetes.Clientset, nodeID, key string) {
log.Infof("Deleting node %s annotation %s", nodeID, key)
// JSON Patch takes as path input a JSON Pointer, defined in RFC6901
// So we replace all instances of "/" with "~1" as per:
// https://tools.ietf.org/html/rfc6901#section-3
patch := []byte(fmt.Sprintf("[{\"op\":\"remove\",\"path\":\"/metadata/annotations/%s\"}]", strings.ReplaceAll(key, "/", "~1")))
_, err := client.CoreV1().Nodes().Patch(context.TODO(), nodeID, types.JSONPatchType, patch, metav1.PatchOptions{})
if err != nil {
log.Fatalf("Error deleting node annotation %s via k8s API: %v", key, err)
}
}
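The JSON Pointer escaping used by deleteNodeAnnotation can be sketched on its own as follows. RFC 6901 defines two escapes: "~" becomes "~0" and "/" becomes "~1", with "~" handled first so that the "~1" produced for a slash is not escaped again. The diff only replaces "/", which is enough for the weave.works/... keys used here; the helper names escapeJSONPointer and removeAnnotationPatch and the patchOp type are hypothetical and introduced only for this example, which also builds the patch with encoding/json instead of string formatting.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// escapeJSONPointer escapes one reference token as described in RFC 6901, section 3.
// The order matters: "~" must be escaped before "/".
func escapeJSONPointer(token string) string {
	token = strings.ReplaceAll(token, "~", "~0")
	return strings.ReplaceAll(token, "/", "~1")
}

// patchOp is a single JSON Patch operation without a value (enough for "remove").
type patchOp struct {
	Op   string `json:"op"`
	Path string `json:"path"`
}

// removeAnnotationPatch builds a JSON Patch document that removes one annotation.
func removeAnnotationPatch(key string) ([]byte, error) {
	ops := []patchOp{{
		Op:   "remove",
		Path: "/metadata/annotations/" + escapeJSONPointer(key),
	}}
	return json.Marshal(ops)
}

func main() {
	patch, err := removeAnnotationPatch("weave.works/kured-reboot-in-progress")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

The resulting bytes could then be passed to the Nodes().Patch call with types.JSONPatchType, as in the function above.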
func rebootAsRequired(nodeID string, rebootCommand []string, sentinelCommand []string, window *timewindow.TimeWindow, TTL time.Duration, releaseDelay time.Duration) {
config, err := rest.InClusterConfig()
if err != nil {
log.Fatal(err)
@@ -326,44 +546,138 @@ func rebootAsRequired(nodeID string, window *timewindow.TimeWindow, TTL time.Dur
nodeMeta := nodeMeta{}
if holding(lock, &nodeMeta) {
node, err := client.CoreV1().Nodes().Get(context.TODO(), nodeID, metav1.GetOptions{})
if err != nil {
log.Fatalf("Error retrieving node object via k8s API: %v", err)
}
if !nodeMeta.Unschedulable {
node, err := client.CoreV1().Nodes().Get(context.TODO(), nodeID, metav1.GetOptions{})
if err != nil {
log.Fatal(err)
}
uncordon(client, node)
}
// If we're holding the lock we know we've tried, in a prior run, to reboot
// So (1) we want to confirm that the reboot succeeded practically ( !rebootRequired() )
// And (2) check if we previously annotated the node that it was in the process of being rebooted,
// And finally (3) if it has that annotation, to delete it.
// This indicates to other node tools running on the cluster that this node may be a candidate for maintenance
if annotateNodes && !rebootRequired(sentinelCommand) {
if _, ok := node.Annotations[KuredRebootInProgressAnnotation]; ok {
deleteNodeAnnotation(client, nodeID, KuredRebootInProgressAnnotation)
}
}
throttle(releaseDelay)
release(lock)
}
preferNoScheduleTaint := taints.New(client, nodeID, preferNoScheduleTaintName, v1.TaintEffectPreferNoSchedule)
// Remove taint immediately during startup to quickly allow scheduling again.
if !rebootRequired(sentinelCommand) {
preferNoScheduleTaint.Disable()
}
// instantiate prometheus client
promClient, err := alerts.NewPromClient(papi.Config{Address: prometheusURL})
if err != nil {
log.Fatal("Unable to create prometheus client: ", err)
}
source := rand.NewSource(time.Now().UnixNano())
tick := delaytick.New(source, period)
for range tick {
if window.Contains(time.Now()) && rebootRequired() && !rebootBlocked(client, nodeID) {
node, err := client.CoreV1().Nodes().Get(context.TODO(), nodeID, metav1.GetOptions{})
if err != nil {
log.Fatal(err)
}
nodeMeta.Unschedulable = node.Spec.Unschedulable
if !window.Contains(time.Now()) {
// Remove taint outside the reboot time window to allow for normal operation.
preferNoScheduleTaint.Disable()
continue
}
if acquire(lock, &nodeMeta, TTL) {
if !nodeMeta.Unschedulable {
drain(client, node)
}
commandReboot(nodeID)
for {
log.Infof("Waiting for reboot")
time.Sleep(time.Minute)
}
if !rebootRequired(sentinelCommand) {
log.Infof("Reboot not required")
preferNoScheduleTaint.Disable()
continue
}
log.Infof("Reboot required")
var blockCheckers []RebootBlocker
if prometheusURL != "" {
blockCheckers = append(blockCheckers, PrometheusBlockingChecker{promClient: promClient, filter: alertFilter, firingOnly: alertFiringOnly})
}
if podSelectors != nil {
blockCheckers = append(blockCheckers, KubernetesBlockingChecker{client: client, nodename: nodeID, filter: podSelectors})
}
if rebootBlocked(blockCheckers...) {
continue
}
node, err := client.CoreV1().Nodes().Get(context.TODO(), nodeID, metav1.GetOptions{})
if err != nil {
log.Fatalf("Error retrieving node object via k8s API: %v", err)
}
nodeMeta.Unschedulable = node.Spec.Unschedulable
var timeNowString string
if annotateNodes {
if _, ok := node.Annotations[KuredRebootInProgressAnnotation]; !ok {
timeNowString = time.Now().Format(time.RFC3339)
// Annotate this node to indicate that "I am going to be rebooted!"
// so that other node maintenance tools running on the cluster are aware that this node is in the process of a "state transition"
annotations := map[string]string{KuredRebootInProgressAnnotation: timeNowString}
// & annotate this node with a timestamp so that other node maintenance tools know how long it's been since this node has been marked for reboot
annotations[KuredMostRecentRebootNeededAnnotation] = timeNowString
addNodeAnnotations(client, nodeID, annotations)
}
}
if !acquire(lock, &nodeMeta, TTL) {
// Prefer to not schedule pods onto this node to avoid draining the same pod multiple times.
preferNoScheduleTaint.Enable()
continue
}
drain(client, node)
if rebootDelay > 0 {
log.Infof("Delaying reboot for %v", rebootDelay)
time.Sleep(rebootDelay)
}
invokeReboot(nodeID, rebootCommand)
for {
log.Infof("Waiting for reboot")
time.Sleep(time.Minute)
}
}
}
// buildSentinelCommand creates the shell command line which will need wrapping to escape
// the container boundaries
func buildSentinelCommand(rebootSentinelFile string, rebootSentinelCommand string) []string {
if rebootSentinelCommand != "" {
cmd, err := shlex.Split(rebootSentinelCommand)
if err != nil {
log.Fatalf("Error parsing provided sentinel command: %v", err)
}
return cmd
}
return []string{"test", "-f", rebootSentinelFile}
}
// parseRebootCommand creates the shell command line which will need wrapping to escape
// the container boundaries
func parseRebootCommand(rebootCommand string) []string {
command, err := shlex.Split(rebootCommand)
if err != nil {
log.Fatalf("Error parsing provided reboot command: %v", err)
}
return command
}
func root(cmd *cobra.Command, args []string) {
if logFormat == "json" {
log.SetFormatter(&log.JSONFormatter{})
}
log.Infof("Kubernetes Reboot Daemon: %s", version)
nodeID := os.Getenv("KURED_NODE_ID")
if nodeID == "" {
log.Fatal("KURED_NODE_ID environment variable required")
}
@@ -373,6 +687,9 @@ func root(cmd *cobra.Command, args []string) {
log.Fatalf("Failed to build time window: %v", err)
}
sentinelCommand := buildSentinelCommand(rebootSentinelFile, rebootSentinelCommand)
restartCommand := parseRebootCommand(rebootCommand)
log.Infof("Node ID: %s", nodeID)
log.Infof("Lock Annotation: %s/%s:%s", dsNamespace, dsName, lockAnnotation)
if lockTTL > 0 {
@@ -380,12 +697,28 @@ func root(cmd *cobra.Command, args []string) {
} else {
log.Info("Lock TTL not set, lock will remain until being released")
}
log.Infof("Reboot Sentinel: %s every %v", rebootSentinel, period)
if lockReleaseDelay > 0 {
log.Infof("Lock release delay set, lock release will be delayed by: %v", lockReleaseDelay)
} else {
log.Info("Lock release delay not set, lock will be released immediately after rebooting")
}
log.Infof("PreferNoSchedule taint: %s", preferNoScheduleTaintName)
log.Infof("Blocking Pod Selectors: %v", podSelectors)
log.Infof("Reboot on: %v", window)
log.Infof("Reboot schedule: %v", window)
log.Infof("Reboot check command: %s every %v", sentinelCommand, period)
log.Infof("Reboot command: %s", restartCommand)
if annotateNodes {
log.Infof("Will annotate nodes during kured reboot operations")
}
go rebootAsRequired(nodeID, window, lockTTL)
go maintainRebootRequiredMetric(nodeID)
// To run these commands as if on the host, we use nsenter.
// This relies on hostPID:true and privileged:true to enter the host mount namespace.
// PID is set to 1 until we have a better discovery mechanism.
hostSentinelCommand := buildHostCommand(1, sentinelCommand)
hostRestartCommand := buildHostCommand(1, restartCommand)
go rebootAsRequired(nodeID, hostRestartCommand, hostSentinelCommand, window, lockTTL, lockReleaseDelay)
go maintainRebootRequiredMetric(nodeID, hostSentinelCommand)
http.Handle("/metrics", promhttp.Handler())
log.Fatal(http.ListenAndServe(":8080", nil))

235
cmd/kured/main_test.go Normal file

@@ -0,0 +1,235 @@
package main
import (
"reflect"
"testing"
log "github.com/sirupsen/logrus"
"github.com/spf13/cobra"
"github.com/weaveworks/kured/pkg/alerts"
assert "gotest.tools/v3/assert"
papi "github.com/prometheus/client_golang/api"
)
type BlockingChecker struct {
blocking bool
}
func (fbc BlockingChecker) isBlocked() bool {
return fbc.blocking
}
var _ RebootBlocker = BlockingChecker{} // Verify that Type implements Interface.
var _ RebootBlocker = (*BlockingChecker)(nil) // Verify that *Type implements Interface.
func Test_flagCheck(t *testing.T) {
var cmd *cobra.Command
var args []string
slackHookURL = "https://hooks.slack.com/services/BLABLABA12345/IAM931A0VERY/COMPLICATED711854TOKEN1SET"
flagCheck(cmd, args)
if notifyURL != "slack://BLABLABA12345/IAM931A0VERY/COMPLICATED711854TOKEN1SET" {
t.Errorf("Slack URL Parsing is wrong: expecting %s but got %s\n", "slack://BLABLABA12345/IAM931A0VERY/COMPLICATED711854TOKEN1SET", notifyURL)
}
}
func Test_rebootBlocked(t *testing.T) {
noCheckers := []RebootBlocker{}
nonblockingChecker := BlockingChecker{blocking: false}
blockingChecker := BlockingChecker{blocking: true}
// Instantiate a prometheusClient with a broken_url
promClient, err := alerts.NewPromClient(papi.Config{Address: "broken_url"})
if err != nil {
log.Fatal("Can't create prometheusClient: ", err)
}
brokenPrometheusClient := PrometheusBlockingChecker{promClient: promClient, filter: nil, firingOnly: false}
type args struct {
blockers []RebootBlocker
}
tests := []struct {
name string
args args
want bool
}{
{
name: "Do not block on no blocker defined",
args: args{blockers: noCheckers},
want: false,
},
{
name: "Ensure a blocker blocks",
args: args{blockers: []RebootBlocker{blockingChecker}},
want: true,
},
{
name: "Ensure a non-blocker doesn't block",
args: args{blockers: []RebootBlocker{nonblockingChecker}},
want: false,
},
{
name: "Ensure one blocker is enough to block",
args: args{blockers: []RebootBlocker{nonblockingChecker, blockingChecker}},
want: true,
},
{
name: "Do block on error contacting prometheus API",
args: args{blockers: []RebootBlocker{brokenPrometheusClient}},
want: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := rebootBlocked(tt.args.blockers...); got != tt.want {
t.Errorf("rebootBlocked() = %v, want %v", got, tt.want)
}
})
}
}
func Test_buildHostCommand(t *testing.T) {
type args struct {
pid int
command []string
}
tests := []struct {
name string
args args
want []string
}{
{
name: "Ensure command will run with nsenter",
args: args{pid: 1, command: []string{"ls", "-Fal"}},
want: []string{"/usr/bin/nsenter", "-m/proc/1/ns/mnt", "--", "ls", "-Fal"},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := buildHostCommand(tt.args.pid, tt.args.command); !reflect.DeepEqual(got, tt.want) {
t.Errorf("buildHostCommand() = %v, want %v", got, tt.want)
}
})
}
}
func Test_buildSentinelCommand(t *testing.T) {
type args struct {
rebootSentinelFile string
rebootSentinelCommand string
}
tests := []struct {
name string
args args
want []string
}{
{
name: "Ensure a sentinelFile generates a shell 'test' command with the right file",
args: args{
rebootSentinelFile: "/test1",
rebootSentinelCommand: "",
},
want: []string{"test", "-f", "/test1"},
},
{
name: "Ensure a sentinelCommand has priority over a sentinelFile if both are provided (because sentinelFile is always provided)",
args: args{
rebootSentinelFile: "/test1",
rebootSentinelCommand: "/sbin/reboot-required -r",
},
want: []string{"/sbin/reboot-required", "-r"},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := buildSentinelCommand(tt.args.rebootSentinelFile, tt.args.rebootSentinelCommand); !reflect.DeepEqual(got, tt.want) {
t.Errorf("buildSentinelCommand() = %v, want %v", got, tt.want)
}
})
}
}
func Test_parseRebootCommand(t *testing.T) {
type args struct {
rebootCommand string
}
tests := []struct {
name string
args args
want []string
}{
{
name: "Ensure a reboot command is properly parsed",
args: args{
rebootCommand: "/sbin/systemctl reboot",
},
want: []string{"/sbin/systemctl", "reboot"},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := parseRebootCommand(tt.args.rebootCommand); !reflect.DeepEqual(got, tt.want) {
t.Errorf("parseRebootCommand() = %v, want %v", got, tt.want)
}
})
}
}
func Test_rebootRequired(t *testing.T) {
type args struct {
sentinelCommand []string
}
tests := []struct {
name string
args args
want bool
}{
{
name: "Ensure rc = 0 means reboot required",
args: args{
sentinelCommand: []string{"true"},
},
want: true,
},
{
name: "Ensure rc != 0 means reboot NOT required",
args: args{
sentinelCommand: []string{"false"},
},
want: false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := rebootRequired(tt.args.sentinelCommand); got != tt.want {
t.Errorf("rebootRequired() = %v, want %v", got, tt.want)
}
})
}
}
func Test_rebootRequired_fatals(t *testing.T) {
cases := []struct {
param []string
expectFatal bool
}{
{
param: []string{"true"},
expectFatal: false,
},
{
param: []string{"./babar"},
expectFatal: true,
},
}
defer func() { log.StandardLogger().ExitFunc = nil }()
var fatal bool
log.StandardLogger().ExitFunc = func(int) { fatal = true }
for _, c := range cases {
fatal = false
rebootRequired(c.param)
assert.Equal(t, c.expectFatal, fatal)
}
}

24
go.mod

@@ -1,14 +1,20 @@
module github.com/weaveworks/kured
go 1.15
go 1.16
require (
github.com/prometheus/client_golang v1.8.0
github.com/prometheus/common v0.15.0
github.com/sirupsen/logrus v1.7.0
github.com/spf13/cobra v1.1.1
k8s.io/api v0.19.4
k8s.io/apimachinery v0.19.4
k8s.io/client-go v0.19.4
k8s.io/kubectl v0.19.4
github.com/containrrr/shoutrrr v0.5.2
github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510
github.com/prometheus/client_golang v1.11.0
github.com/prometheus/common v0.32.1
github.com/sirupsen/logrus v1.8.1
github.com/spf13/cobra v1.3.0
github.com/spf13/pflag v1.0.5
github.com/spf13/viper v1.10.1
github.com/stretchr/testify v1.7.0
gotest.tools/v3 v3.0.3
k8s.io/api v0.22.4
k8s.io/apimachinery v0.22.4
k8s.io/client-go v0.22.4
k8s.io/kubectl v0.22.4
)

990
go.sum

File diff suppressed because it is too large


@@ -29,7 +29,7 @@ spec:
restartPolicy: Always
containers:
- name: kured
image: docker.io/weaveworks/kured
image: docker.io/weaveworks/kured:1.9.1
# If you find yourself here wondering why there is no
# :latest tag on Docker Hub, see the FAQ in the README
imagePullPolicy: IfNotPresent
@@ -44,22 +44,35 @@ spec:
fieldPath: spec.nodeName
command:
- /usr/bin/kured
# - --alert-filter-regexp=^RebootRequired$
# - --blocking-pod-selector=runtime=long,cost=expensive
# - --blocking-pod-selector=name=temperamental
# - --blocking-pod-selector=...
# - --ds-name=kured
# - --ds-namespace=kube-system
# - --end-time=23:59:59
# - --lock-annotation=weave.works/kured-node-lock
# - --force-reboot=false
# - --drain-grace-period=-1
# - --skip-wait-for-delete-timeout=0
# - --drain-timeout=0
# - --period=1h
# - --ds-namespace=kube-system
# - --ds-name=kured
# - --lock-annotation=weave.works/kured-node-lock
# - --lock-ttl=0
# - --prometheus-url=http://prometheus.monitoring.svc.cluster.local
# - --reboot-days=sun,mon,tue,wed,thu,fri,sat
# - --alert-filter-regexp=^RebootRequired$
# - --alert-firing-only=false
# - --reboot-sentinel=/var/run/reboot-required
# - --prefer-no-schedule-taint=""
# - --reboot-sentinel-command=""
# - --slack-hook-url=https://hooks.slack.com/...
# - --slack-username=prod
# - --slack-channel=alerting
# - --notify-url="" # See also shoutrrr url format
# - --message-template-drain=Draining node %s
# - --message-template-reboot=Rebooting node %s
# - --blocking-pod-selector=runtime=long,cost=expensive
# - --blocking-pod-selector=name=temperamental
# - --blocking-pod-selector=...
# - --reboot-days=sun,mon,tue,wed,thu,fri,sat
# - --reboot-delay=90s
# - --start-time=0:00
# - --end-time=23:59:59
# - --time-zone=UTC
# - --annotate-nodes=false
# - --lock-release-delay=30m
# - --log-format=text


@@ -7,22 +7,39 @@ import (
"sort"
"time"
"github.com/prometheus/client_golang/api"
papi "github.com/prometheus/client_golang/api"
v1 "github.com/prometheus/client_golang/api/prometheus/v1"
"github.com/prometheus/common/model"
)
// PrometheusActiveAlerts returns a list of names of active (e.g. pending or firing) alerts, filtered
// by the supplied regexp.
func PrometheusActiveAlerts(prometheusURL string, filter *regexp.Regexp) ([]string, error) {
client, err := api.NewClient(api.Config{Address: prometheusURL})
// PromClient is a wrapper around the Prometheus Client interface and implements the api
// This way, the PromClient can be instantiated with the configuration the Client needs, and
// the ability to use the methods the api has, like Query and so on.
type PromClient struct {
papi papi.Client
api v1.API
}
// NewPromClient creates a new client to the Prometheus API.
// It returns an error on any problem.
func NewPromClient(conf papi.Config) (*PromClient, error) {
promClient, err := papi.NewClient(conf)
if err != nil {
return nil, err
}
client := PromClient{papi: promClient, api: v1.NewAPI(promClient)}
return &client, nil
}
queryAPI := v1.NewAPI(client)
// ActiveAlerts is a method of PromClient. It returns a sorted list of names of active alerts
// (pending or firing, or firing only when firingOnly is true), filtered by the supplied regexp.
// Filtering by regexp means: when the regexp matches the alert name, the alert is excluded from
// the block-list and will NOT block rebooting. Every other active alert is included in the
// block-list and WILL block rebooting.
func (p *PromClient) ActiveAlerts(filter *regexp.Regexp, firingOnly bool) ([]string, error) {
value, _, err := queryAPI.Query(context.Background(), "ALERTS", time.Now())
// get all alerts from prometheus
value, _, err := p.api.Query(context.Background(), "ALERTS", time.Now())
if err != nil {
return nil, err
}
@@ -32,7 +49,7 @@ func PrometheusActiveAlerts(prometheusURL string, filter *regexp.Regexp) ([]stri
activeAlertSet := make(map[string]bool)
for _, sample := range vector {
if alertName, isAlert := sample.Metric[model.AlertNameLabel]; isAlert && sample.Value != 0 {
if filter == nil || !filter.MatchString(string(alertName)) {
if (filter == nil || !filter.MatchString(string(alertName))) && (!firingOnly || sample.Metric["alertstate"] == "firing") {
activeAlertSet[string(alertName)] = true
}
}
@@ -42,7 +59,7 @@ func PrometheusActiveAlerts(prometheusURL string, filter *regexp.Regexp) ([]stri
for activeAlert := range activeAlertSet {
activeAlerts = append(activeAlerts, activeAlert)
}
sort.Sort(sort.StringSlice(activeAlerts))
sort.Strings(activeAlerts)
return activeAlerts, nil
}


@@ -0,0 +1,141 @@
package alerts
import (
"log"
"net/http"
"net/http/httptest"
"regexp"
"testing"
"github.com/prometheus/client_golang/api"
"github.com/stretchr/testify/assert"
)
type MockResponse struct {
StatusCode int
Body []byte
}
// MockServerProperties ties a mock response to a url and a method
type MockServerProperties struct {
URI string
HTTPMethod string
Response MockResponse
}
// NewMockServer sets up a new MockServer with properties and starts the server.
func NewMockServer(props ...MockServerProperties) *httptest.Server {
handler := http.HandlerFunc(
func(w http.ResponseWriter, r *http.Request) {
for _, proc := range props {
_, err := w.Write(proc.Response.Body)
if err != nil {
log.Fatal(err)
}
}
})
return httptest.NewServer(handler)
}
func TestActiveAlerts(t *testing.T) {
responsebody := `{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"ALERTS","alertname":"GatekeeperViolations","alertstate":"firing","severity":"warning","team":"platform-infra"},"value":[1622472933.973,"1"]},{"metric":{"__name__":"ALERTS","alertname":"PodCrashing-dev","alertstate":"firing","container":"deployment","instance":"1.2.3.4:8080","job":"kube-state-metrics","namespace":"dev","pod":"dev-deployment-78dcbmf25v","severity":"critical","team":"dev"},"value":[1622472933.973,"1"]},{"metric":{"__name__":"ALERTS","alertname":"PodRestart-dev","alertstate":"firing","container":"deployment","instance":"1.2.3.4:1234","job":"kube-state-metrics","namespace":"qa","pod":"qa-job-deployment-78dcbmf25v","severity":"warning","team":"qa"},"value":[1622472933.973,"1"]},{"metric":{"__name__":"ALERTS","alertname":"PrometheusTargetDown","alertstate":"firing","job":"kubernetes-pods","severity":"warning","team":"platform-infra"},"value":[1622472933.973,"1"]},{"metric":{"__name__":"ALERTS","alertname":"ScheduledRebootFailing","alertstate":"pending","severity":"warning","team":"platform-infra"},"value":[1622472933.973,"1"]}]}}`
addr := "http://localhost:10001"
for _, tc := range []struct {
it string
rFilter string
respBody string
aName string
wantN int
firingOnly bool
}{
{
it: "should return no active alerts",
respBody: responsebody,
rFilter: "",
wantN: 0,
firingOnly: false,
},
{
it: "should return a subset of all alerts",
respBody: responsebody,
rFilter: "Pod",
wantN: 3,
firingOnly: false,
},
{
it: "should return all active alerts by regex",
respBody: responsebody,
rFilter: "*",
wantN: 5,
firingOnly: false,
},
{
it: "should return all active alerts by regex filter",
respBody: responsebody,
rFilter: "*",
wantN: 5,
firingOnly: false,
},
{
it: "should return only firing alerts if firingOnly is true",
respBody: responsebody,
rFilter: "*",
wantN: 4,
firingOnly: true,
},
{
it: "should return ScheduledRebootFailing active alerts",
respBody: `{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"ALERTS","alertname":"ScheduledRebootFailing","alertstate":"pending","severity":"warning","team":"platform-infra"},"value":[1622472933.973,"1"]}]}}`,
aName: "ScheduledRebootFailing",
rFilter: "*",
wantN: 1,
firingOnly: false,
},
{
it: "should not return an active alert if RebootRequired is firing (regex filter)",
respBody: `{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"ALERTS","alertname":"RebootRequired","alertstate":"pending","severity":"warning","team":"platform-infra"},"value":[1622472933.973,"1"]}]}}`,
rFilter: "RebootRequired",
wantN: 0,
firingOnly: false,
},
} {
// Start mockServer
mockServer := NewMockServer(MockServerProperties{
URI: addr,
HTTPMethod: http.MethodPost,
Response: MockResponse{
Body: []byte(tc.respBody),
},
})
// Close mockServer after all connections are gone
defer mockServer.Close()
t.Run(tc.it, func(t *testing.T) {
// regex filter
regex, _ := regexp.Compile(tc.rFilter)
// instantiate the prometheus client with the mockserver-address
p, err := NewPromClient(api.Config{Address: mockServer.URL})
if err != nil {
log.Fatal(err)
}
result, err := p.ActiveAlerts(regex, tc.firingOnly)
if err != nil {
log.Fatal(err)
}
// assert
assert.Equal(t, tc.wantN, len(result), "expected amount of alerts %v, got %v", tc.wantN, len(result))
if tc.aName != "" {
assert.Equal(t, tc.aName, result[0], "expected active alert %v, got %v", tc.aName, result[0])
}
})
}
}


@@ -6,11 +6,18 @@ import (
"fmt"
"time"
v1 "k8s.io/api/apps/v1"
"k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/wait"
"k8s.io/client-go/kubernetes"
)
const (
k8sAPICallRetrySleep = 5 * time.Second // How much time to wait in between retrying a k8s API call
k8sAPICallRetryTimeout = 5 * time.Minute // How long to wait until we determine that the k8s API is definitively unavailable
)
// DaemonSetLock holds all necessary information to do actions
// on the kured ds which holds lock info through annotations.
type DaemonSetLock struct {
@@ -34,11 +41,11 @@ func New(client *kubernetes.Clientset, nodeID, namespace, name, annotation strin
}
// Acquire attempts to annotate the kured daemonset with lock info from instantiated DaemonSetLock using client-go
func (dsl *DaemonSetLock) Acquire(metadata interface{}, TTL time.Duration) (acquired bool, owner string, err error) {
func (dsl *DaemonSetLock) Acquire(metadata interface{}, TTL time.Duration) (bool, string, error) {
for {
ds, err := dsl.client.AppsV1().DaemonSets(dsl.namespace).Get(context.TODO(), dsl.name, metav1.GetOptions{})
ds, err := dsl.GetDaemonSet(k8sAPICallRetrySleep, k8sAPICallRetryTimeout)
if err != nil {
return false, "", err
return false, "", fmt.Errorf("timed out trying to get daemonset %s in namespace %s: %w", dsl.name, dsl.namespace, err)
}
valueString, exists := ds.ObjectMeta.Annotations[dsl.annotation]
@@ -78,10 +85,10 @@ func (dsl *DaemonSetLock) Acquire(metadata interface{}, TTL time.Duration) (acqu
}
// Test attempts to check the kured daemonset lock status (existence, expiry) from instantiated DaemonSetLock using client-go
func (dsl *DaemonSetLock) Test(metadata interface{}) (holding bool, err error) {
ds, err := dsl.client.AppsV1().DaemonSets(dsl.namespace).Get(context.TODO(), dsl.name, metav1.GetOptions{})
func (dsl *DaemonSetLock) Test(metadata interface{}) (bool, error) {
ds, err := dsl.GetDaemonSet(k8sAPICallRetrySleep, k8sAPICallRetryTimeout)
if err != nil {
return false, err
return false, fmt.Errorf("timed out trying to get daemonset %s in namespace %s: %w", dsl.name, dsl.namespace, err)
}
valueString, exists := ds.ObjectMeta.Annotations[dsl.annotation]
@@ -102,9 +109,9 @@ func (dsl *DaemonSetLock) Test(metadata interface{}) (holding bool, err error) {
// Release attempts to remove the lock data from the kured ds annotations using client-go
func (dsl *DaemonSetLock) Release() error {
for {
ds, err := dsl.client.AppsV1().DaemonSets(dsl.namespace).Get(context.TODO(), dsl.name, metav1.GetOptions{})
ds, err := dsl.GetDaemonSet(k8sAPICallRetrySleep, k8sAPICallRetryTimeout)
if err != nil {
return err
return fmt.Errorf("timed out trying to get daemonset %s in namespace %s: %w", dsl.name, dsl.namespace, err)
}
valueString, exists := ds.ObjectMeta.Annotations[dsl.annotation]
@@ -137,6 +144,24 @@ func (dsl *DaemonSetLock) Release() error {
}
}
// GetDaemonSet returns the named DaemonSet resource from the DaemonSetLock's configured client
func (dsl *DaemonSetLock) GetDaemonSet(sleep, timeout time.Duration) (*v1.DaemonSet, error) {
var ds *v1.DaemonSet
var lastError error
err := wait.PollImmediate(sleep, timeout, func() (bool, error) {
ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()
if ds, lastError = dsl.client.AppsV1().DaemonSets(dsl.namespace).Get(ctx, dsl.name, metav1.GetOptions{}); lastError != nil {
return false, nil
}
return true, nil
})
if err != nil {
return nil, fmt.Errorf("Timed out trying to get daemonset %s in namespace %s: %v", dsl.name, dsl.namespace, lastError)
}
return ds, nil
}
func ttlExpired(created time.Time, ttl time.Duration) bool {
if ttl > 0 && time.Since(created) >= ttl {
return true


@@ -1,54 +0,0 @@
package slack
import (
"bytes"
"encoding/json"
"fmt"
"net/http"
"time"
)
var (
httpClient = &http.Client{Timeout: 5 * time.Second}
)
type body struct {
Text string `json:"text,omitempty"`
Username string `json:"username,omitempty"`
Channel string `json:"channel,omitempty"`
}
func notify(hookURL, username, channel, message string) error {
msg := body{
Text: message,
Username: username,
Channel: channel,
}
var buf bytes.Buffer
if err := json.NewEncoder(&buf).Encode(&msg); err != nil {
return err
}
resp, err := httpClient.Post(hookURL, "application/json", &buf)
if err != nil {
return err
}
defer resp.Body.Close()
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
return fmt.Errorf(resp.Status)
}
return nil
}
// NotifyDrain is the exposed way to notify of a drain event onto a slack chan
func NotifyDrain(hookURL, username, channel, messageTemplate, nodeID string) error {
return notify(hookURL, username, channel, fmt.Sprintf(messageTemplate, nodeID))
}
// NotifyReboot is the exposed way to notify of a reboot event onto a slack chan
func NotifyReboot(hookURL, username, channel, messageTemplate, nodeID string) error {
return notify(hookURL, username, channel, fmt.Sprintf(messageTemplate, nodeID))
}

pkg/taints/taints.go Normal file

@@ -0,0 +1,166 @@
package taints
import (
"context"
"encoding/json"
"fmt"
log "github.com/sirupsen/logrus"
v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/kubernetes"
)
// Taint allows setting soft and hard limitations for scheduling and executing pods on nodes.
type Taint struct {
client *kubernetes.Clientset
nodeID string
taintName string
effect v1.TaintEffect
exists bool
}
// New provides a new taint.
func New(client *kubernetes.Clientset, nodeID, taintName string, effect v1.TaintEffect) *Taint {
exists, _, _ := taintExists(client, nodeID, taintName)
return &Taint{
client: client,
nodeID: nodeID,
taintName: taintName,
effect: effect,
exists: exists,
}
}
// Enable creates the taint for a node. Creating an existing taint is a noop.
func (t *Taint) Enable() {
if t.taintName == "" {
return
}
if t.exists {
return
}
preferNoSchedule(t.client, t.nodeID, t.taintName, t.effect, true)
t.exists = true
}
// Disable removes the taint for a node. Removing a missing taint is a noop.
func (t *Taint) Disable() {
if t.taintName == "" {
return
}
if !t.exists {
return
}
preferNoSchedule(t.client, t.nodeID, t.taintName, t.effect, false)
t.exists = false
}
func taintExists(client *kubernetes.Clientset, nodeID, taintName string) (bool, int, *v1.Node) {
updatedNode, err := client.CoreV1().Nodes().Get(context.TODO(), nodeID, metav1.GetOptions{})
if err != nil || updatedNode == nil {
log.Fatalf("Error reading node %s: %v", nodeID, err)
}
for i, taint := range updatedNode.Spec.Taints {
if taint.Key == taintName {
return true, i, updatedNode
}
}
return false, 0, updatedNode
}
func preferNoSchedule(client *kubernetes.Clientset, nodeID, taintName string, effect v1.TaintEffect, shouldExists bool) {
taintExists, offset, updatedNode := taintExists(client, nodeID, taintName)
if taintExists && shouldExists {
log.Debugf("Taint %v exists already for node %v.", taintName, nodeID)
return
}
if !taintExists && !shouldExists {
log.Debugf("Taint %v already missing for node %v.", taintName, nodeID)
return
}
type patchTaints struct {
Op string `json:"op"`
Path string `json:"path"`
Value interface{} `json:"value,omitempty"`
}
taint := v1.Taint{
Key: taintName,
Effect: effect,
}
var patches []patchTaints
if len(updatedNode.Spec.Taints) == 0 {
// add first taint and ensure to keep current taints
patches = []patchTaints{
{
Op: "test",
Path: "/spec",
Value: updatedNode.Spec,
},
{
Op: "add",
Path: "/spec/taints",
Value: []v1.Taint{},
},
{
Op: "add",
Path: "/spec/taints/-",
Value: taint,
},
}
} else if taintExists {
// remove taint and ensure to test against race conditions
patches = []patchTaints{
{
Op: "test",
Path: fmt.Sprintf("/spec/taints/%d", offset),
Value: taint,
},
{
Op: "remove",
Path: fmt.Sprintf("/spec/taints/%d", offset),
},
}
} else {
// add missing taint to existing list
patches = []patchTaints{
{
Op: "add",
Path: "/spec/taints/-",
Value: taint,
},
}
}
patchBytes, err := json.Marshal(patches)
if err != nil {
log.Fatalf("Error encoding taint patch for node %s: %v", nodeID, err)
}
_, err = client.CoreV1().Nodes().Patch(context.TODO(), nodeID, types.JSONPatchType, patchBytes, metav1.PatchOptions{})
if err != nil {
log.Fatalf("Error patching taint for node %s: %v", nodeID, err)
}
if shouldExists {
log.Info("Node taint added")
} else {
log.Info("Node taint removed")
}
}

tests/kind/test-metrics.sh Executable file

@@ -0,0 +1,19 @@
#!/usr/bin/env bash
expected="$1"
if [[ "$expected" != "0" && "$expected" != "1" ]]; then
echo "You should give an argument to this script, the gauge value (0 or 1)"
exit 1
fi
HOST="${HOST:-localhost}"
PORT="${PORT:-30000}"
NODENAME="${NODENAME-chart-testing-control-plane}"
reboot_required=$(docker exec "$NODENAME" curl "http://$HOST:$PORT/metrics" | awk '/^kured_reboot_required/{print $2}')
if [[ "$reboot_required" == "$expected" ]]; then
echo "Test success"
else
echo "Test failed"
exit 1
fi