Compare commits

415 Commits
1.4.2 ... 1.91c

Author SHA1 Message Date
David Shay
e1db60b2b5 Requested changes to multi-arch build 2022-01-14 09:39:48 -05:00
David Shay
f3295b99ef Added support for multi-arch image build 2022-01-12 10:31:26 -05:00
Daniel Simionato
178ba93b5a Add ability to define ds annotations in helm chart 2022-01-12 07:25:11 +01:00
Christian Kotzbauer
f3ed0087d2 Merge pull request #493 from weaveworks/dependabot/github_actions/helm/chart-testing-action-2.2.0
build(deps): bump helm/chart-testing-action from 2.1.0 to 2.2.0
2022-01-07 20:41:40 +01:00
dependabot[bot]
71a273a14c build(deps): bump helm/chart-testing-action from 2.1.0 to 2.2.0
Bumps [helm/chart-testing-action](https://github.com/helm/chart-testing-action) from 2.1.0 to 2.2.0.
- [Release notes](https://github.com/helm/chart-testing-action/releases)
- [Commits](https://github.com/helm/chart-testing-action/compare/v2.1.0...v2.2.0)

---
updated-dependencies:
- dependency-name: helm/chart-testing-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-01-07 17:02:55 +00:00
Christian Kotzbauer
2b36eab0f8 Merge pull request #492 from weaveworks/feature/release-1.9.1
Prepare release 1.9.1
2022-01-06 19:13:05 +01:00
Christian Kotzbauer
aefd901b4e prepare release 1.9.1
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
2022-01-06 10:06:45 +01:00
Christian Kotzbauer
91b01b5524 Merge pull request #489 from dkulchinsky/dannyk/remove_env_values_from_logs
don't print env variable values in the logs (some are sensitive)
2022-01-05 05:55:28 +01:00
Christian Kotzbauer
f1255bff91 Merge pull request #490 from dkulchinsky/dannyk/deprecation_fix
small fix in deprecation log messages
2022-01-04 19:03:46 +01:00
Danny Kulchinsky
22a76f0da2 small fix in deprecation log messages 2022-01-04 12:23:22 -05:00
Danny Kulchinsky
b52a9587f3 don't print env variable values in the logs (some are sensitive) 2022-01-04 10:55:46 -05:00
Christian Kotzbauer
a6e1cf8191 Merge pull request #487 from weaveworks/release-1.9.0
Release 1.9.0
2021-12-17 14:14:42 +01:00
Christian Kotzbauer
d7576dce0f Merge pull request #456 from span/jsonlogging-chart
Jsonlogging chart
2021-12-17 10:33:58 +01:00
Christian Kotzbauer
661af3b042 prepare 1.9.0 2021-12-17 10:32:21 +01:00
Daniel Holbach
eec8ca1f9b Merge pull request #485 from weaveworks/dependabot/go_modules/github.com/spf13/viper-1.10.1
build(deps): bump github.com/spf13/viper from 1.10.0 to 1.10.1
2021-12-15 19:16:38 +01:00
dependabot[bot]
15356fa26d build(deps): bump github.com/spf13/viper from 1.10.0 to 1.10.1
Bumps [github.com/spf13/viper](https://github.com/spf13/viper) from 1.10.0 to 1.10.1.
- [Release notes](https://github.com/spf13/viper/releases)
- [Commits](https://github.com/spf13/viper/compare/v1.10.0...v1.10.1)

---
updated-dependencies:
- dependency-name: github.com/spf13/viper
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-12-15 17:55:30 +00:00
Daniel Holbach
7e3565a565 Merge pull request #484 from weaveworks/dependabot/go_modules/github.com/spf13/cobra-1.3.0
build(deps): bump github.com/spf13/cobra from 1.2.1 to 1.3.0
2021-12-15 18:45:36 +01:00
dependabot[bot]
a3bc03b4b9 build(deps): bump github.com/spf13/cobra from 1.2.1 to 1.3.0
Bumps [github.com/spf13/cobra](https://github.com/spf13/cobra) from 1.2.1 to 1.3.0.
- [Release notes](https://github.com/spf13/cobra/releases)
- [Changelog](https://github.com/spf13/cobra/blob/master/CHANGELOG.md)
- [Commits](https://github.com/spf13/cobra/compare/v1.2.1...v1.3.0)

---
updated-dependencies:
- dependency-name: github.com/spf13/cobra
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-12-15 17:12:47 +00:00
Daniel Holbach
22ce5a2628 Merge pull request #483 from weaveworks/dependabot/go_modules/github.com/spf13/viper-1.10.0
build(deps): bump github.com/spf13/viper from 1.9.0 to 1.10.0
2021-12-14 18:33:53 +01:00
dependabot[bot]
0f80b70478 build(deps): bump github.com/spf13/viper from 1.9.0 to 1.10.0
Bumps [github.com/spf13/viper](https://github.com/spf13/viper) from 1.9.0 to 1.10.0.
- [Release notes](https://github.com/spf13/viper/releases)
- [Commits](https://github.com/spf13/viper/compare/v1.9.0...v1.10.0)

---
updated-dependencies:
- dependency-name: github.com/spf13/viper
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-12-14 17:12:51 +00:00
Daniel Holbach
28be690849 Merge pull request #480 from weaveworks/dependabot/github_actions/nick-invision/retry-2.6.0
build(deps): bump nick-invision/retry from 2.5.1 to 2.6.0
2021-12-10 19:12:53 +01:00
dependabot[bot]
84292cc8c3 build(deps): bump nick-invision/retry from 2.5.1 to 2.6.0
Bumps [nick-invision/retry](https://github.com/nick-invision/retry) from 2.5.1 to 2.6.0.
- [Release notes](https://github.com/nick-invision/retry/releases)
- [Changelog](https://github.com/nick-invision/retry/blob/master/.releaserc.js)
- [Commits](https://github.com/nick-invision/retry/compare/v2.5.1...v2.6.0)

---
updated-dependencies:
- dependency-name: nick-invision/retry
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-12-10 17:02:55 +00:00
Christian Kotzbauer
21b54227a7 Merge pull request #479 from weaveworks/dependabot/go_modules/github.com/spf13/viper-1.9.0
build(deps): bump github.com/spf13/viper from 1.8.1 to 1.9.0
2021-12-09 18:42:24 +01:00
dependabot[bot]
8e3fb55ec4 build(deps): bump github.com/spf13/viper from 1.8.1 to 1.9.0
Bumps [github.com/spf13/viper](https://github.com/spf13/viper) from 1.8.1 to 1.9.0.
- [Release notes](https://github.com/spf13/viper/releases)
- [Commits](https://github.com/spf13/viper/compare/v1.8.1...v1.9.0)

---
updated-dependencies:
- dependency-name: github.com/spf13/viper
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-12-09 17:11:56 +00:00
Christian Kotzbauer
1a6592851e Merge pull request #459 from georgekaz/patch-1
Exclude terminated pods from the blocking mechanism
2021-12-09 14:02:49 +01:00
Christian Kotzbauer
bba3b8d83f Merge pull request #464 from dkulchinsky/viper_env_vars
bind environment variables to cobra flags with viper
2021-12-09 14:00:11 +01:00
Daniel Holbach
9c6d6a6d82 Merge pull request #476 from dholbach/fix-474
update to test against k8s 1.2{1,2,3} kind images
2021-12-08 10:34:12 +01:00
Daniel Holbach
997794eaac update to test against k8s 1.2{1,2,3} kind images
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-12-08 09:59:01 +01:00
Daniel Holbach
0763cdd95a Merge pull request #475 from dholbach/fix-473
Update k8s dependencies to 0.22.4
2021-12-07 08:40:35 +01:00
Daniel Holbach
c004566e97 ensure go version for tests
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-12-07 08:07:21 +01:00
Daniel Holbach
077ef2488e Update k8s dependencies to 0.22.4
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-12-06 15:08:54 +01:00
Daniel Holbach
06093ab53b Merge pull request #472 from dholbach/chart-1.8.2-update
update image tag to 1.8.2
2021-12-06 15:04:01 +01:00
Daniel Holbach
4d2019c07f update image tag to 1.8.2 2021-12-06 14:40:51 +01:00
Danny Kulchinsky
687aeda813 use sprintf for value in log 2021-12-02 12:05:07 -05:00
Danny Kulchinsky
acddd6b675 minor restructure and adding log for flag to env var binding 2021-12-01 20:59:12 -05:00
Danny Kulchinsky
54e7d93902 dedup const block 2021-12-01 14:50:53 -05:00
Danny Kulchinsky
2666b49d01 address review comments 2021-12-01 11:14:19 -05:00
Daniel Holbach
ff1a27ba8b Merge pull request #468 from weaveworks/fix-ghcr-login
fix ghcr.io login
2021-11-29 20:29:49 +01:00
Daniel Holbach
38ed636ecf fix ghcr.io login
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-11-29 16:59:36 +01:00
Daniel Holbach
8324b09bb9 Merge pull request #446 from weaveworks/revert-445-revert-439-feature/quay-registry
Add ghcr.io as second registry
2021-11-29 16:54:28 +01:00
Daniel Holbach
fb8677e7ac Move to GHCR as a backup for Docker Hub 2021-11-29 16:29:47 +01:00
Daniel Holbach
bdd16d4e01 Merge pull request #467 from weaveworks/dependabot/docker/cmd/kured/alpine-3.15.0
build(deps): bump alpine from 3.14 to 3.15.0 in /cmd/kured
2021-11-29 11:12:38 +01:00
dependabot[bot]
16e6d3c4d3 build(deps): bump alpine from 3.14 to 3.15.0 in /cmd/kured
Bumps alpine from 3.14 to 3.15.0.

---
updated-dependencies:
- dependency-name: alpine
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-11-29 09:51:54 +00:00
Daniel Holbach
af824bfd6a Merge pull request #466 from dholbach/follow-up-to-465
follow up to #465
2021-11-29 10:51:35 +01:00
Daniel Holbach
8264a529d6 follow up to #465
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-11-29 10:29:16 +01:00
Jean-Philippe Evrard
cd25017d67 Merge pull request #462 from jackfrancis/helm-chart-2.10.1
feat: update chart to 2.10.1 w/ 1.8.1 kured image
2021-11-27 11:18:49 +01:00
Daniel Holbach
4c1a23a047 Merge pull request #465 from dholbach/add-docker-dependabot
update docker images too
2021-11-26 09:45:14 +01:00
Daniel Holbach
8f86e1d4f8 update docker images too
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-11-26 09:12:52 +01:00
Danny Kulchinsky
79e19d84ba bind environment variables to cobra flags with viper 2021-11-25 13:53:30 -05:00
Jack
01396db3d1 feat: update chart to 2.10.1 w/ 1.8.1 kured image 2021-11-19 09:08:57 -08:00
georgekaz
d3b59b8922 Exclude terminated pods from the blocking mechanism
Terminated pods should be excluded from blocking a reboot, as per https://github.com/weaveworks/kured/issues/227

This adds status filters to the fieldSelector in order to do that. I've not updated tests here but have successfully tested the exact same filter using kubectl
2021-11-05 16:48:36 +00:00
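To make the idea of "status filters in the fieldSelector" concrete, here is a minimal Go sketch of how such a selector string could be assembled. It is not taken from the kured source: the helper name `blockingPodFieldSelector`, the `spec.nodeName` prefix and the choice of excluded phases are assumptions for illustration. The only thing it relies on is that Kubernetes field selectors for pods accept `spec.nodeName` and `status.phase` with the `=` and `!=` operators.

```go
package main

import (
	"fmt"
	"strings"
)

// blockingPodFieldSelector builds a field selector that matches pods scheduled
// on nodeName while skipping pods whose phase is listed in excludedPhases.
// The function name and the parameters are hypothetical, for illustration only.
func blockingPodFieldSelector(nodeName string, excludedPhases ...string) string {
	parts := []string{"spec.nodeName=" + nodeName}
	for _, phase := range excludedPhases {
		// Each excluded phase becomes its own "!=" clause; clauses are ANDed.
		parts = append(parts, "status.phase!="+phase)
	}
	return strings.Join(parts, ",")
}

func main() {
	// "Succeeded" and "Failed" are the terminal pod phases in Kubernetes;
	// which phases the actual change filters out is an assumption here.
	fmt.Println(blockingPodFieldSelector("node-1", "Succeeded", "Failed"))
}
```

The resulting string can be checked by hand the same way the commit author describes, by passing it to `kubectl get pods --all-namespaces --field-selector <selector>`.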
Daniel Kvist
eafe2c3d98 Update README.md
Add default value for logformat.
2021-10-30 04:35:53 +02:00
Daniel Kvist
e4f1c7358c Add chart configuration for json logging 2021-10-28 10:49:44 +02:00
Daniel Holbach
348b5b4c96 Merge pull request #368 from atighineanu/proto_removed_slack
removed notifications/slack package [Merge after 1.7.0 release]
2021-10-28 08:43:27 +02:00
Christian Kotzbauer
c8a3a6ff9d Merge pull request #455 from span/jsonlogging
Support json logformatter
2021-10-27 18:24:02 +02:00
Daniel Holbach
c196d4e97f Merge pull request #457 from weaveworks/dependabot/github_actions/nick-invision/retry-2.5.1
build(deps): bump nick-invision/retry from 2.5.0 to 2.5.1
2021-10-25 19:26:47 +02:00
dependabot[bot]
efc98c8813 build(deps): bump nick-invision/retry from 2.5.0 to 2.5.1
Bumps [nick-invision/retry](https://github.com/nick-invision/retry) from 2.5.0 to 2.5.1.
- [Release notes](https://github.com/nick-invision/retry/releases)
- [Changelog](https://github.com/nick-invision/retry/blob/master/.releaserc.js)
- [Commits](https://github.com/nick-invision/retry/compare/v2.5.0...v2.5.1)

---
updated-dependencies:
- dependency-name: nick-invision/retry
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-25 17:02:51 +00:00
Daniel Kvist
b108aa4d2d Support json logformatter
This commit introduces a new flag '--log-format' that allows a user
to configure json logging on the pods. If the log-format
is not specified, the formatter will default to the existing
text formatter.
2021-10-25 14:38:53 +02:00
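The behaviour described in this commit message (a `--log-format` flag that falls back to the text formatter when unset) can be sketched with the Go standard library alone. This is an illustrative sketch, not the project's implementation: it swaps in the stdlib `flag` and `encoding/json` packages for whatever flag and logging libraries the project uses, and the helpers `parseLogFormat` and `formatEntry` as well as the exact output layout are assumptions.

```go
package main

import (
	"encoding/json"
	"flag"
	"fmt"
)

// parseLogFormat reads a --log-format flag from args and defaults to "text"
// when the flag is absent. The helper name is hypothetical.
func parseLogFormat(args []string) (string, error) {
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	format := fs.String("log-format", "text", "log output format: text or json")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *format, nil
}

// formatEntry renders one log entry in the chosen format. The field names
// and the text layout are assumptions made for this sketch.
func formatEntry(format, level, msg string) (string, error) {
	switch format {
	case "text":
		return fmt.Sprintf("level=%s msg=%q", level, msg), nil
	case "json":
		b, err := json.Marshal(map[string]string{"level": level, "msg": msg})
		if err != nil {
			return "", err
		}
		return string(b), nil
	default:
		return "", fmt.Errorf("unsupported log format %q", format)
	}
}

func main() {
	format, err := parseLogFormat([]string{"--log-format=json"})
	if err != nil {
		panic(err)
	}
	line, err := formatEntry(format, "info", "reboot not required")
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```

Rejecting unknown values in `formatEntry` is a design choice of this sketch; the commit message only states what happens when the flag is not specified.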
Christian Kotzbauer
2ae0a82510 Merge pull request #454 from weaveworks/dependabot/go_modules/github.com/prometheus/common-0.32.1
build(deps): bump github.com/prometheus/common from 0.32.0 to 0.32.1
2021-10-21 19:31:53 +02:00
dependabot[bot]
f95664156d build(deps): bump github.com/prometheus/common from 0.32.0 to 0.32.1
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.32.0 to 0.32.1.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.32.0...v0.32.1)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-21 17:11:08 +00:00
Christian Kotzbauer
891afda596 Merge pull request #453 from weaveworks/dependabot/go_modules/github.com/prometheus/common-0.32.0
build(deps): bump github.com/prometheus/common from 0.31.1 to 0.32.0
2021-10-21 09:21:07 +02:00
dependabot[bot]
2b89170417 build(deps): bump github.com/prometheus/common from 0.31.1 to 0.32.0
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.31.1 to 0.32.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.31.1...v0.32.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-20 17:10:49 +00:00
Daniel Holbach
de59c2614d Merge pull request #450 from weaveworks/dependabot/go_modules/github.com/containrrr/shoutrrr-0.5.2
build(deps): bump github.com/containrrr/shoutrrr from 0.5.1 to 0.5.2
2021-10-11 19:32:29 +02:00
dependabot[bot]
2e5cb81b4c build(deps): bump github.com/containrrr/shoutrrr from 0.5.1 to 0.5.2
Bumps [github.com/containrrr/shoutrrr](https://github.com/containrrr/shoutrrr) from 0.5.1 to 0.5.2.
- [Release notes](https://github.com/containrrr/shoutrrr/releases)
- [Changelog](https://github.com/containrrr/shoutrrr/blob/main/goreleaser.yml)
- [Commits](https://github.com/containrrr/shoutrrr/compare/v0.5.1...v0.5.2)

---
updated-dependencies:
- dependency-name: github.com/containrrr/shoutrrr
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-11 17:10:43 +00:00
Christian Kotzbauer
fde91041d5 Merge pull request #449 from weaveworks/feature/helm-1.8.0
helm: Prepare release for 1.8.0
2021-10-08 16:01:52 +02:00
Christian Kotzbauer
8a3f486ad9 feat: update to 1.8.0
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-10-08 15:40:57 +02:00
Christian Kotzbauer
513db7ce8c Merge pull request #448 from weaveworks/feature/release-1.8.0
docs: updated version table
2021-10-08 15:06:09 +02:00
Christian Kotzbauer
938cbd428c feat: add also missing prefer-no-schedule-taint
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-10-08 15:05:18 +02:00
Christian Kotzbauer
fa28b550b2 feat: add reboot-sentinel-command to helm-chart
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-10-08 14:56:30 +02:00
Christian Kotzbauer
164183e1bc fix: correct indent
ref: #447

Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-10-08 14:53:12 +02:00
Christian Kotzbauer
7d0499cc0a Merge pull request #430 from amorey/reboot-delay-documentation
Add `reboot-delay` CLI argument to docs, helm charts and manifests
2021-10-08 14:49:05 +02:00
Christian Kotzbauer
5e32864e0b Merge pull request #415 from MattJeanes/prometheus-alert-firing-option-chart
Add --alert-firing-only parameter to chart
2021-10-08 14:48:18 +02:00
Christian Kotzbauer
718faf4d31 Merge branch 'feature/helm-1.8.0' into prometheus-alert-firing-option-chart 2021-10-08 14:47:57 +02:00
Christian Kotzbauer
ac9e669b52 docs: updated version table
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-10-08 14:44:04 +02:00
Daniel Holbach
7c33ad8b6e Merge pull request #436 from weaveworks/dependabot/github_actions/guyarb/golang-test-annoations-0.5.0
Bump guyarb/golang-test-annoations from 0.4.0 to 0.5.0
2021-10-08 10:47:31 +02:00
Daniel Holbach
6f8d36e8db Merge pull request #445 from weaveworks/revert-439-feature/quay-registry
Revert "Add quay.io as second registry"
2021-10-08 10:12:13 +02:00
Daniel Holbach
688346e811 Revert "[WIP] Add quay.io as second registry" 2021-10-08 09:51:04 +02:00
Daniel Holbach
079425349d Merge pull request #444 from weaveworks/dependabot/github_actions/nick-invision/retry-2.5.0
Bump nick-invision/retry from 2.4.1 to 2.5.0
2021-10-08 09:35:50 +02:00
dependabot[bot]
d7589b16d7 Bump nick-invision/retry from 2.4.1 to 2.5.0
Bumps [nick-invision/retry](https://github.com/nick-invision/retry) from 2.4.1 to 2.5.0.
- [Release notes](https://github.com/nick-invision/retry/releases)
- [Changelog](https://github.com/nick-invision/retry/blob/master/.releaserc.js)
- [Commits](https://github.com/nick-invision/retry/compare/v2.4.1...v2.5.0)

---
updated-dependencies:
- dependency-name: nick-invision/retry
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-07 17:03:28 +00:00
atighineanu
bab1425e1a removed notifications/slack package
In this PR the slack-hook-url is translated into shoutrrr syntax.
Therefore, the slack package as well as the checks for slack-hook-url
in the drain and reboot functions are removed.
Also added a unit test for flagCheck(); this function also checks the
(slack) URL syntax.
2021-10-07 10:37:47 +02:00
Daniel Holbach
4e1c05c5e3 Merge pull request #443 from weaveworks/feature/contrib-docs
doc: some clarification of release-docs
2021-10-01 08:45:17 +02:00
Christian Kotzbauer
2c7ca8261f doc: some clarification for release-docs
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-09-30 16:52:40 +02:00
Daniel Holbach
6ebf9a96f9 Merge pull request #439 from weaveworks/feature/quay-registry
[WIP] Add quay.io as second registry
2021-09-29 13:34:50 +02:00
Daniel Holbach
adffa11796 Merge pull request #440 from jackfrancis/maintainers-add-jackfrancis
Add jackfrancis to MAINTAINERS
2021-09-29 12:05:25 +02:00
Daniel Holbach
1152d72d51 Merge pull request #441 from weaveworks/dependabot/go_modules/github.com/prometheus/common-0.31.1
Bump github.com/prometheus/common from 0.31.0 to 0.31.1
2021-09-29 10:13:46 +02:00
dependabot[bot]
fb6a224f66 Bump github.com/prometheus/common from 0.31.0 to 0.31.1
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.31.0 to 0.31.1.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.31.0...v0.31.1)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-09-28 17:10:41 +00:00
Jack
c671dce161 Add jackfrancis to MAINTAINERS 2021-09-28 09:08:05 -07:00
Christian Kotzbauer
f8fc6e5017 build: add quay.io as second registry
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-09-28 17:42:49 +02:00
Daniel Holbach
effbf62987 Merge pull request #428 from weaveworks/k8s-1.21
Updated Kubernetes to 1.21
2021-09-28 10:15:50 +02:00
Daniel Holbach
6423bf0069 update to go 1.16 (follow the lead of k8s 1.21)
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-09-28 09:06:35 +02:00
Christian Kotzbauer
9c81caa92e build: added k8s@1.22 and dropped k8s@1.19
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-09-28 09:06:35 +02:00
Christian Kotzbauer
978acba030 feat: updated to k8s@1.21
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-09-28 09:06:35 +02:00
Daniel Holbach
acef34e916 Merge pull request #437 from weaveworks/dependabot/go_modules/github.com/prometheus/common-0.31.0
Bump github.com/prometheus/common from 0.30.0 to 0.31.0
2021-09-28 08:59:22 +02:00
Daniel Holbach
f72ef8c2ca Merge pull request #438 from jackfrancis/kubectl-cordon-context
fix: don't use nil context in drain helper
2021-09-28 08:56:02 +02:00
Jack
3c2508050d fix: don't use nil context in drain helper 2021-09-27 12:43:20 -07:00
dependabot[bot]
483a5d8211 Bump github.com/prometheus/common from 0.30.0 to 0.31.0
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.30.0 to 0.31.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.30.0...v0.31.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-09-27 17:09:51 +00:00
dependabot[bot]
9b89a8c0fc Bump guyarb/golang-test-annoations from 0.4.0 to 0.5.0
Bumps [guyarb/golang-test-annoations](https://github.com/guyarb/golang-test-annoations) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/guyarb/golang-test-annoations/releases)
- [Commits](https://github.com/guyarb/golang-test-annoations/compare/v0.4.0...v0.5.0)

---
updated-dependencies:
- dependency-name: guyarb/golang-test-annoations
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-09-27 17:02:45 +00:00
Christian Kotzbauer
b5a4bf432c Merge pull request #360 from cnmcavoy/cnmcavoy/force-reboot-timeout-helm
Add force-reboot and drain timeouts to chart config and ds
2021-09-15 18:45:37 +02:00
Cameron McAvoy
cee15cfc32 Add force-reboot and drain timeouts to chart config and ds 2021-09-15 10:42:50 -05:00
Christian Kotzbauer
b2b1940435 fix: do not use array for stale action (#433) 2021-09-10 09:52:44 +02:00
Daniel Holbach
a9eb139f60 Merge pull request #431 from weaveworks/dependabot/go_modules/github.com/containrrr/shoutrrr-0.5.1
Bump github.com/containrrr/shoutrrr from 0.5.0 to 0.5.1
2021-09-02 08:01:45 +02:00
dependabot[bot]
d6e478ec6b Bump github.com/containrrr/shoutrrr from 0.5.0 to 0.5.1
Bumps [github.com/containrrr/shoutrrr](https://github.com/containrrr/shoutrrr) from 0.5.0 to 0.5.1.
- [Release notes](https://github.com/containrrr/shoutrrr/releases)
- [Changelog](https://github.com/containrrr/shoutrrr/blob/main/goreleaser.yml)
- [Commits](https://github.com/containrrr/shoutrrr/compare/v0.5.0...v0.5.1)

---
updated-dependencies:
- dependency-name: github.com/containrrr/shoutrrr
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-09-01 17:11:21 +00:00
Daniel Holbach
0955403470 Merge pull request #429 from weaveworks/alpine-3.14
build: updated to alpine@3.14
2021-08-30 10:54:35 +02:00
Andres Morey
a3f9796305 Add reboot-delay CLI argument to docs, manifests and helm charts 2021-08-26 16:26:21 +03:00
Christian Kotzbauer
9473f831be build: updated to alpine@3.14
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-08-25 20:19:03 +02:00
Daniel Holbach
3682eb36de Merge pull request #418 from amorey/reboot-delay
Add `reboot-delay` command line argument
2021-08-25 18:12:03 +02:00
Daniel Holbach
3900ee8876 Merge pull request #422 from weaveworks/dependabot/go_modules/github.com/containrrr/shoutrrr-0.5.0
Bump github.com/containrrr/shoutrrr from 0.4.4 to 0.5.0
2021-08-23 11:37:37 +02:00
dependabot[bot]
4c31084be8 Bump github.com/containrrr/shoutrrr from 0.4.4 to 0.5.0
Bumps [github.com/containrrr/shoutrrr](https://github.com/containrrr/shoutrrr) from 0.4.4 to 0.5.0.
- [Release notes](https://github.com/containrrr/shoutrrr/releases)
- [Changelog](https://github.com/containrrr/shoutrrr/blob/main/goreleaser.yml)
- [Commits](https://github.com/containrrr/shoutrrr/compare/v0.4.4...v0.5.0)

---
updated-dependencies:
- dependency-name: github.com/containrrr/shoutrrr
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-08-16 17:08:06 +00:00
David Höld
6c9ee57dc1 Change default updateStrategy to RollingUpdate (#420)
Incrementally update Pods by default when changing the DaemonSet spec.

Fixes #413

Co-authored-by: David Hoeld <david.hoeld@fujitsu.com>
2021-08-06 09:38:37 +02:00
Andres Morey
3c5eb968d3 Add reboot-delay command line argument
Currently, kured issues the system reboot command immediately after
kubectl drain finishes.

This is a problem for processes that need extra time to finish but aren't
running on pods and therefore aren't controlled by kubectl drain (e.g.
de-registering nodes from external load balancers).

This patch solves the problem by introducing a `reboot-delay` command
line argument that can be used to add a delay after kubectl drain
finishes but before the reboot command is issued.
2021-08-03 16:48:25 +03:00
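The ordering this commit message describes (drain, then an optional pause, then the reboot command) can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the patch: `rebootAfterDrain` and `simulate` are hypothetical names, and the drain, reboot and sleep steps are passed in as functions so the sequence can be exercised without touching a real node.

```go
package main

import (
	"fmt"
	"time"
)

// rebootAfterDrain runs drain, waits for delay if it is positive, and only
// then runs reboot. All names and the signature are hypothetical.
func rebootAfterDrain(delay time.Duration, drain, reboot func() error, sleep func(time.Duration)) error {
	if err := drain(); err != nil {
		return fmt.Errorf("drain: %w", err)
	}
	if delay > 0 {
		// Gives processes outside of pods (for example load balancer
		// de-registration) time to finish before the node goes down.
		sleep(delay)
	}
	if err := reboot(); err != nil {
		return fmt.Errorf("reboot: %w", err)
	}
	return nil
}

// simulate runs the sequence with recording fakes and returns the steps in
// the order they were executed.
func simulate(delay time.Duration) []string {
	var steps []string
	drain := func() error { steps = append(steps, "drain"); return nil }
	reboot := func() error { steps = append(steps, "reboot"); return nil }
	sleep := func(d time.Duration) { steps = append(steps, "wait "+d.String()) }
	// The fakes never fail, so the returned error is ignored here.
	_ = rebootAfterDrain(delay, drain, reboot, sleep)
	return steps
}

func main() {
	fmt.Println(simulate(90 * time.Second))
}
```

In real use the `sleep` argument would be `time.Sleep`; injecting it keeps the sketch quick to run and easy to test.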
Jean-Philippe Evrard
54c0e4e25f Merge pull request #410 from MattJeanes/prometheus-alert-firing-option
Add --alert-firing-only parameter to only consider firing alerts
2021-07-28 09:02:44 +02:00
Matt Jeanes
afac9d435a Add --alert-firing-only parameter to chart 2021-07-27 11:27:08 +01:00
Matt Jeanes
6af3f1abc1 Add --alert-firing-only parameter to only consider firing alerts 2021-07-27 11:23:10 +01:00
dependabot[bot]
a48da239bc Bump github.com/prometheus/common from 0.29.0 to 0.30.0 (#414)
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.29.0 to 0.30.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.29.0...v0.30.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-07-27 08:03:42 +02:00
SimeonPoot
c7d5810503 Restructuring Prometheus client, added unit-tests to regex-queries active alerts (#386)
* prometheus labels incl tests

* enable label in main, add log, docs

* revert the option to query by label

* revert the option to query by label

* PromClient instantiated by func, whitespace removal

* revert whitespace fix for readability.

* revert removal of newlines for readability

* rename New to NewPromClient to improve readability

Co-authored-by: simp <simp@saxobank.com>
2021-07-27 07:09:46 +02:00
Renaud Hager
6e16e993d9 Added possibility to mount volumes (#407)
* Added possibility to mount volumes

* Added a new line at the end of the file.

* Added a new line at the end of the file.

* Updated README.md
2021-07-26 13:19:02 +02:00
Daniel Holbach
24f4925b3f Merge pull request #408 from jackfrancis/chart-2.7.1-reboot-default
fix: common default reboot command for code and chart
2021-07-16 09:55:33 +02:00
Jack Francis
c0333d186e fix: common default reboot command for code and chart 2021-07-15 12:34:32 -07:00
Jean-Philippe Evrard
7a2b4a6a1a Merge pull request #405 from weaveworks/dependabot/github_actions/actions/stale-4
Bump actions/stale from 3.0.19 to 4
2021-07-14 19:28:23 +02:00
dependabot[bot]
fb7a7feb15 Bump actions/stale from 3.0.19 to 4
Bumps [actions/stale](https://github.com/actions/stale) from 3.0.19 to 4.
- [Release notes](https://github.com/actions/stale/releases)
- [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/stale/compare/v3.0.19...v4)

---
updated-dependencies:
- dependency-name: actions/stale
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-07-14 17:02:48 +00:00
Daniel Holbach
ffddfd7add Merge pull request #402 from piksel/patch-1
link to versioned shoutrrr docs
2021-07-05 11:33:49 +02:00
Daniel Holbach
a0bc7daa32 Merge pull request #401 from weaveworks/dependabot/go_modules/github.com/spf13/cobra-1.2.1
Bump github.com/spf13/cobra from 1.1.3 to 1.2.1
2021-07-05 10:13:39 +02:00
nils måsén
fd6f520b6e link to versioned shoutrrr docs
shoutrrr now has versioned docs, which allows linking directly to the version that matches the one you use.
Changes should always be backwards compatible, but not the other way around.
2021-07-04 03:19:25 +02:00
dependabot[bot]
c2f275ebd0 Bump github.com/spf13/cobra from 1.1.3 to 1.2.1
Bumps [github.com/spf13/cobra](https://github.com/spf13/cobra) from 1.1.3 to 1.2.1.
- [Release notes](https://github.com/spf13/cobra/releases)
- [Changelog](https://github.com/spf13/cobra/blob/master/CHANGELOG.md)
- [Commits](https://github.com/spf13/cobra/compare/v1.1.3...v1.2.1)

---
updated-dependencies:
- dependency-name: github.com/spf13/cobra
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-07-02 17:11:01 +00:00
Daniel Holbach
01b0ca8cea Merge pull request #399 from weaveworks/dependabot/github_actions/helm/kind-action-1.2.0
Bump helm/kind-action from 1.1.0 to 1.2.0
2021-07-01 08:21:23 +02:00
dependabot[bot]
aa45139b80 Bump helm/kind-action from 1.1.0 to 1.2.0
Bumps [helm/kind-action](https://github.com/helm/kind-action) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/helm/kind-action/releases)
- [Commits](https://github.com/helm/kind-action/compare/v1.1.0...v1.2.0)

---
updated-dependencies:
- dependency-name: helm/kind-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-06-30 17:01:53 +00:00
Jean-Philippe Evrard
1654b75ec4 Merge pull request #396 from dholbach/fix-stale
our 'good first issue' issue label has no '-', add 'keep'
2021-06-23 18:11:30 +02:00
Daniel Holbach
e4da44a774 our 'good first issue' issue label has no '-', add 'keep'
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-06-22 15:33:27 +02:00
Jean-Philippe Evrard
e301908ae8 Merge pull request #391 from weaveworks/dependabot/go_modules/github.com/prometheus/common-0.29.0
Bump github.com/prometheus/common from 0.25.0 to 0.29.0
2021-06-20 11:11:45 +02:00
Renaud Hager
f442c6b632 Added rebootCommand values (#394)
* Added rebootCommand values

* Increased chart version from 2.6.0 to 2.7.0

* Updated README.md

* Added a space before a comment.
2021-06-17 18:14:09 +02:00
Daniel Holbach
8fc0a9daf2 Merge pull request #392 from weaveworks/dependabot/github_actions/nick-invision/retry-2.4.1
Bump nick-invision/retry from 2.4.0 to 2.4.1
2021-06-14 16:23:33 +02:00
dependabot[bot]
4d783e4321 Bump nick-invision/retry from 2.4.0 to 2.4.1
Bumps [nick-invision/retry](https://github.com/nick-invision/retry) from 2.4.0 to 2.4.1.
- [Release notes](https://github.com/nick-invision/retry/releases)
- [Changelog](https://github.com/nick-invision/retry/blob/master/.releaserc.js)
- [Commits](https://github.com/nick-invision/retry/compare/v2.4.0...v2.4.1)

---
updated-dependencies:
- dependency-name: nick-invision/retry
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-06-11 06:44:51 +00:00
dependabot[bot]
11f077f689 Bump github.com/prometheus/common from 0.25.0 to 0.29.0
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.25.0 to 0.29.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.25.0...v0.29.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-06-11 05:26:37 +00:00
Daniel Holbach
807b727ab3 Merge pull request #382 from dkulchinsky/fix_deprecation
fix slack deprecation notice
2021-05-31 10:03:04 +02:00
Danny Kulchinsky
c826d73695 fix slack deprecation notice 2021-05-28 13:52:01 -04:00
Daniel Holbach
5193f2de16 Merge pull request #379 from weaveworks/dependabot/github_actions/helm/chart-testing-action-2.1.0
Bump helm/chart-testing-action from 2.0.1 to 2.1.0
2021-05-26 08:59:12 +02:00
dependabot[bot]
310c6c114d Bump helm/chart-testing-action from 2.0.1 to 2.1.0
Bumps [helm/chart-testing-action](https://github.com/helm/chart-testing-action) from 2.0.1 to 2.1.0.
- [Release notes](https://github.com/helm/chart-testing-action/releases)
- [Commits](https://github.com/helm/chart-testing-action/compare/v2.0.1...v2.1.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-05-26 05:13:46 +00:00
Christian Kotzbauer
e1017f47fb Merge pull request #353 from spingel/release-lock-delay-chart
Add lockReleaseDelay parameter to helm chart
2021-05-20 13:55:54 +02:00
Steffen Pingel
42f69c7b1e sort parameters alphabetically 2021-05-20 13:28:12 +02:00
Steffen Pingel
e3f4a88a07 Add documentation for lockReleaseDelay parameter 2021-05-20 13:26:53 +02:00
Steffen Pingel
48dc84b3e6 Add lockReleaseDelay parameter to helm chart 2021-05-19 22:06:25 +02:00
Christian Kotzbauer
816c732f39 Merge pull request #338 from atighineanu/master
update chart definition to include --notify-url
2021-05-19 19:09:53 +02:00
Christian Kotzbauer
0bd22c7c56 Merge branch 'main' into master 2021-05-19 18:49:37 +02:00
Christian Kotzbauer
2850417e48 doc: update image-version 2021-05-19 18:48:51 +02:00
Daniel Holbach
4f8e9a0761 Merge pull request #377 from weaveworks/release-1.7.0
Release 1.7.0: Compatibility docs
2021-05-19 16:01:50 +02:00
Christian Kotzbauer
0cbc2d58d2 doc: add compat-line for 1.7.0 2021-05-19 15:17:02 +02:00
Daniel Holbach
11a62c8ce8 Merge pull request #349 from dholbach/fix-347
Update test matrix to latest 3 sets of k8s releases
2021-05-19 10:43:01 +02:00
Daniel Holbach
89d1fe497c use latest kind
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-05-19 10:20:06 +02:00
Daniel Holbach
870329c7b4 Bounce kubernetes testing versions
This updates the test matrix to the latest set of 3 minor k8s releases

Fixes: #347

Co-Authored-By: Jean-Philippe Evrard <open-source@a.spamming.party>
2021-05-19 10:17:46 +02:00
Daniel Holbach
78bb9d6c14 Merge pull request #376 from weaveworks/dependabot/go_modules/github.com/prometheus/common-0.25.0
Bump github.com/prometheus/common from 0.24.0 to 0.25.0
2021-05-19 10:16:49 +02:00
Daniel Holbach
c035259d0a Merge pull request #374 from weaveworks/dependabot/github_actions/actions/stale-3.0.19
Bump actions/stale from 3.0.18 to 3.0.19
2021-05-19 10:16:26 +02:00
dependabot[bot]
d08b42933d Bump github.com/prometheus/common from 0.24.0 to 0.25.0
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.24.0 to 0.25.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.24.0...v0.25.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-05-19 05:45:01 +00:00
dependabot[bot]
729fa658dc Bump actions/stale from 3.0.18 to 3.0.19
Bumps [actions/stale](https://github.com/actions/stale) from 3.0.18 to 3.0.19.
- [Release notes](https://github.com/actions/stale/releases)
- [Commits](https://github.com/actions/stale/compare/v3.0.18...v3.0.19)

Signed-off-by: dependabot[bot] <support@github.com>
2021-05-18 07:50:09 +00:00
Daniel Holbach
d7377bff1b update golang.org/x/crypto - break out of #349
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-05-18 09:38:38 +02:00
Daniel Holbach
42e4c317ae Merge pull request #369 from weaveworks/dependabot/go_modules/github.com/prometheus/common-0.24.0
Bump github.com/prometheus/common from 0.23.0 to 0.24.0
2021-05-11 09:08:50 +02:00
dependabot[bot]
5061a611a8 Bump github.com/prometheus/common from 0.23.0 to 0.24.0
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.23.0 to 0.24.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.23.0...v0.24.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-05-11 05:17:02 +00:00
Jean-Philippe Evrard
eca6da173c Clarify and simplify tests
Without this, we get multiple questions about our testing.
This should help clarify the tests and our coverage by:
- Simplifying our coverage
- Documenting better the purpose of each workflow file
- Documenting our testing and development activities better.
2021-05-04 11:24:20 +02:00
Jean-Philippe Evrard
7582e166be Merge pull request #367 from weaveworks/dependabot/go_modules/github.com/prometheus/common-0.23.0
Bump github.com/prometheus/common from 0.18.0 to 0.23.0
2021-05-04 08:43:27 +02:00
Jean-Philippe Evrard
de23444a5f Merge pull request #366 from papanito/papanito/update-docu-for-ms-teams
docu: update url for ms teams notifications, fixes #362
2021-05-04 08:42:54 +02:00
dependabot[bot]
4d5ea21db3 Bump github.com/prometheus/common from 0.18.0 to 0.23.0
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.18.0 to 0.23.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.18.0...v0.23.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-04-28 16:16:32 +00:00
papanito
bb56c731bb docu: update url for ms teams notifications, fixes #362 2021-04-28 09:48:23 +02:00
Daniel Holbach
ea6844d315 Merge pull request #365 from evrardjp/fix-kind-action
Use stable kind-action
2021-04-28 08:52:08 +02:00
Jean-Philippe Evrard
247e6f6c70 Use stable kind-action
We are relying on master, which might break anytime (or in this
case, moved to another branch).

Instead we should rely on a stable version, and unfreeze if
necessary. Dependabot helps us maintain those releases anyway.
2021-04-27 10:11:16 +02:00
Jean-Philippe Evrard
43a7a1a1ca Merge pull request #352 from spingel/release-lock-delay
Add parameter for delaying release of lock
2021-04-21 11:42:09 +02:00
Jean-Philippe Evrard
803ecef1de Merge pull request #324 from weaveworks/dependabot/go_modules/github.com/prometheus/client_golang-1.10.0
Bump github.com/prometheus/client_golang from 1.8.0 to 1.10.0
2021-04-21 11:33:49 +02:00
dependabot[bot]
0eb318c1b2 Bump github.com/prometheus/client_golang from 1.8.0 to 1.10.0
Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.8.0 to 1.10.0.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/master/CHANGELOG.md)
- [Commits](https://github.com/prometheus/client_golang/compare/v1.8.0...v1.10.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-04-21 08:22:04 +00:00
Daniel Holbach
6a7494fda5 Merge pull request #363 from weaveworks/dependabot/go_modules/github.com/containrrr/shoutrrr-0.4.4
Bump github.com/containrrr/shoutrrr from 0.4.3 to 0.4.4
2021-04-21 08:27:55 +02:00
dependabot[bot]
7b44fd2eb8 Bump github.com/containrrr/shoutrrr from 0.4.3 to 0.4.4
Bumps [github.com/containrrr/shoutrrr](https://github.com/containrrr/shoutrrr) from 0.4.3 to 0.4.4.
- [Release notes](https://github.com/containrrr/shoutrrr/releases)
- [Changelog](https://github.com/containrrr/shoutrrr/blob/main/goreleaser.yml)
- [Commits](https://github.com/containrrr/shoutrrr/compare/v0.4.3...v0.4.4)

Signed-off-by: dependabot[bot] <support@github.com>
2021-04-21 05:48:57 +00:00
Daniel Holbach
3f322dfbb2 Merge pull request #361 from weaveworks/dependabot/go_modules/github.com/containrrr/shoutrrr-0.4.3
Bump github.com/containrrr/shoutrrr from 0.4.2 to 0.4.3
2021-04-20 08:37:34 +02:00
dependabot[bot]
4a11a95b86 Bump github.com/containrrr/shoutrrr from 0.4.2 to 0.4.3
Bumps [github.com/containrrr/shoutrrr](https://github.com/containrrr/shoutrrr) from 0.4.2 to 0.4.3.
- [Release notes](https://github.com/containrrr/shoutrrr/releases)
- [Changelog](https://github.com/containrrr/shoutrrr/blob/main/goreleaser.yml)
- [Commits](https://github.com/containrrr/shoutrrr/compare/v0.4.2...v0.4.3)

Signed-off-by: dependabot[bot] <support@github.com>
2021-04-20 05:49:19 +00:00
Jean-Philippe Evrard
0b759a9ff6 Update kured-ds.yaml
Without this patch, it's not clear that we added command line
arguments recently. This should expose our latest changes in the
future released manifest.
2021-04-14 19:52:25 +02:00
Daniel Holbach
496d2b26d8 Merge pull request #354 from evrardjp/test-prom
Add prometheus export metrics functional testing
2021-04-14 10:11:26 +02:00
Daniel Holbach
c1a9de6622 Merge pull request #355 from evrardjp/fix-linter-false-positive
Reduce false positives
2021-04-14 10:10:54 +02:00
Jean-Philippe Evrard
79f22cee67 Merge branch 'main' into release-lock-delay 2021-04-14 09:48:28 +02:00
Jean-Philippe Evrard
83415d0e59 Reduce false positives in chart testing
Without this change, the "Test helm chart (install) action" will
rightfully succeed when our helm chart gets installed and has
no syntax issues. However, it doesn't test if kured is properly
installed. For example, the helm chart can try to install a
yet unpublished image, and our test will succeed, as the syntax
is still valid.

This is a problem, as everything looks green, but it's not
effectively working. Our other jobs are focusing on code changes,
so they rightfully override the image tag, which is not what
we want in this "Test helm chart" action.

This fixes it by adding an extra job in the workflow, depending
on the chart testing.
2021-04-13 17:20:06 +02:00
Jean-Philippe Evrard
8046977d1b Merge pull request #341 from cnmcavoy/cnmcavoy/force-reboot-timeout
Add force-reboot after force-timeout duration has been exceeded
2021-04-13 16:47:41 +02:00
Jean-Philippe Evrard
240a669727 Add prometheus export metrics functional testing
Without this, we can't know if the exposed prometheus metrics
behave properly.

This is a problem, as the only way we can evaluate the success
(right now), is a compilation success or failure from kured.
While this is a good start, it doesn't translate to what we
claim to offer: A boolean showing if a reboot is required.

This fixes it by creating a new github action workflow testing
if the float64 gauge is properly showing 0 for no reboot, 1 for reboot.
This is done by exposing the metrics endpoint through a node port.
A helm chart change was required to have the ability to expose
the service on a node port. We connect to the kind node through
docker in the `tests/test-metrics.sh`, where we curl the nodeport,
extract the only relevant metric, and compare it to the expected result.
2021-04-13 16:17:42 +02:00
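The check described in the commit above (fetch the metrics, extract the one relevant gauge, compare it to the expected value) could be sketched in Go roughly as follows. This is not the content of `tests/test-metrics.sh`; the metric name `kured_reboot_required` and the sample payload are assumptions used for illustration, and the HTTP fetch through the node port is left out so the sketch stays self-contained.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// gaugeValue returns the value of the first sample called name in
// Prometheus text-format output. Labels and timestamps are ignored.
func gaugeValue(metrics, name string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		// The metric name ends at the first '{' (labels) or space (value).
		metric, rest := line, ""
		if i := strings.IndexAny(line, "{ "); i >= 0 {
			metric, rest = line[:i], line[i:]
		}
		if metric != name {
			continue
		}
		// Drop the label set, if any, so only "value [timestamp]" remains.
		if j := strings.LastIndexByte(rest, '}'); j >= 0 {
			rest = rest[j+1:]
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		if v, err := strconv.ParseFloat(fields[0], 64); err == nil {
			return v, true
		}
	}
	return 0, false
}

func main() {
	// Hypothetical scrape result; the metric name is an assumption.
	scraped := "# TYPE kured_reboot_required gauge\n" +
		"kured_reboot_required{node=\"kind-control-plane\"} 1\n"

	expected := 1.0 // 1 means a reboot is required, 0 means it is not
	got, ok := gaugeValue(scraped, "kured_reboot_required")
	if !ok || got != expected {
		fmt.Println("metric check failed")
		return
	}
	fmt.Println("metric check passed")
}
```

Comparing against a float keeps the check aligned with the float64 gauge the commit mentions, rather than matching on the raw text of the sample line.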
Steffen Pingel
f7b3de36a6 Add parameter for delaying release of lock
This supports throttling of reboots across the cluster
and allows rebooted nodes to reschedule pods, e.g.
to synchronize replicated state before rebooting the next node.
2021-04-13 10:14:14 +02:00
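To illustrate the throttling effect described above: if a node keeps the cluster-wide lock for an extra delay after it has rebooted, the next node cannot start until that delay has passed. The following Go sketch only models the resulting timeline; the function name, the reboot duration, and the delay value are hypothetical and are not taken from kured's code.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical example values, chosen only for illustration.
const (
	exampleReboot = 5 * time.Minute  // time a node needs to reboot
	exampleDelay  = 30 * time.Minute // extra time the lock is held afterwards
)

// startOffsets returns, for n nodes rebooting one after another, the
// earliest time each node can start relative to the first one, assuming
// every node holds the lock for reboot + releaseDelay.
func startOffsets(n int, reboot, releaseDelay time.Duration) []time.Duration {
	if n < 0 {
		n = 0
	}
	offsets := make([]time.Duration, 0, n)
	var next time.Duration
	for i := 0; i < n; i++ {
		offsets = append(offsets, next)
		next += reboot + releaseDelay
	}
	return offsets
}

func main() {
	for i, off := range startOffsets(3, exampleReboot, exampleDelay) {
		fmt.Printf("node %d may start at +%s\n", i, off)
	}
}
```

With a delay of zero the offsets collapse to back-to-back reboots, which is the behaviour without the new parameter.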
Jean-Philippe Evrard
4c4508a586 Merge pull request #342 from jackfrancis/retry-daemonset-get
chore: retry daemonset get operations
2021-04-13 09:50:45 +02:00
Jean-Philippe Evrard
4e4c29aec0 Merge pull request #350 from dholbach/update-k8s-deps
update to latest k8s deps of 1.20 branch
2021-04-12 11:29:05 +02:00
Daniel Holbach
59d5266005 update to latest k8s deps of 1.20 branch
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-04-12 11:05:07 +02:00
Cameron McAvoy
25dcf3cb12 Expose SkipWaitForDeleteTimeoutSeconds and explicitly return when cordoning fails 2021-04-08 09:52:15 -05:00
Cameron McAvoy
5a86ef40e8 Update the default drain timeout to be infinite 2021-04-07 17:17:33 -05:00
Cameron McAvoy
2400f34cc0 Don't panic if the cordon fails and force-reboot is true 2021-04-07 14:58:21 -05:00
Cameron McAvoy
8db5650510 Refactor force-drain to be a drain-timeout in general 2021-04-07 12:57:01 -05:00
Jack Francis
390f6e9f99 chore: retry daemonset get operations 2021-04-07 09:27:05 -07:00
Cameron McAvoy
65292983f2 Add force-reboot after force-timeout duration has been exceeded 2021-04-07 09:39:01 -05:00
atighineanu
120bf713c0 update chart definition to include --notify-url 2021-04-07 13:26:02 +02:00
Daniel Holbach
d2c9ef8cba Merge pull request #336 from weaveworks/dependabot/go_modules/github.com/containrrr/shoutrrr-0.4.2
Bump github.com/containrrr/shoutrrr from 0.4.1 to 0.4.2
2021-04-07 11:14:55 +02:00
dependabot[bot]
9030f56648 Bump github.com/containrrr/shoutrrr from 0.4.1 to 0.4.2
Bumps [github.com/containrrr/shoutrrr](https://github.com/containrrr/shoutrrr) from 0.4.1 to 0.4.2.
- [Release notes](https://github.com/containrrr/shoutrrr/releases)
- [Changelog](https://github.com/containrrr/shoutrrr/blob/main/goreleaser.yml)
- [Commits](https://github.com/containrrr/shoutrrr/compare/v0.4.1...v0.4.2)

Signed-off-by: dependabot[bot] <support@github.com>
2021-04-07 08:48:02 +00:00
Jean-Philippe Evrard
1c13476b49 Update deps
This is the result of a go mod tidy.
It should clarify our dependencies.
2021-04-07 10:43:59 +02:00
Jean-Philippe Evrard
cd7976ce4f Add chart-testing target-branch
Without this patch, chart-testing is using the branch named
"master" by default.

This is a problem, as we just renamed our development branch
"main" instead of "master".

This should fix it by pointing to the right branch.
2021-04-07 10:43:43 +02:00
Jean-Philippe Evrard
8dfe5f2486 Merge pull request #340 from dholbach/update-dev-docs
Update dev docs
2021-04-06 17:14:58 +02:00
Daniel Holbach
f1c5608bcd Merge pull request #339 from evrardjp/fix-gh-action-cancelling
Update github actions
2021-04-06 17:12:41 +02:00
Daniel Holbach
c2122f3924 update Dev docs to latest
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-04-06 16:40:41 +02:00
Jean-Philippe Evrard
babc9095ef Update github actions
Without this patch, github actions are lagging behind.
This should improve our coverage.
2021-04-06 15:26:33 +02:00
Daniel Holbach
5305d7b34d Merge pull request #337 from dholbach/change-to-main-branch
Change default branch to 'main'.
2021-04-06 15:00:54 +02:00
atighineanu
9583df2e50 update chart definition to include --notify-url 2021-04-06 13:19:38 +02:00
Daniel Holbach
56a26a2f25 Change default branch to 'main'.
- Make markdownlint happier in a couple of places.
- Rename '*-master-*' files
- Change default branches of some other projects
  we rely on. They moved to 'main' as well.
- Standardise version of actions/checkout.
- Update last release in README to 1.6.1.
- Bump chart version.

Eventually closes: #252

Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-04-06 12:46:12 +02:00
Jean-Philippe Evrard
3fa1f3feec Merge pull request #335 from weaveworks/helm-app-version
Use chart appVersion as default image-tag
2021-04-02 10:06:06 +02:00
Christian Kotzbauer
21fdba4ef0 feat: use chart appVersion as default image-tag
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-04-02 09:41:37 +02:00
Jean-Philippe Evrard
4d45fa8bdb Fix invoke reboot for custom commands
Without this patch, the rebootCommand passed to invokeReboot is
ignored, and the command used for reboot is always systemctl reboot.

This is a problem, as we are aiming for flexible commands for this
release.

This fixes it by restoring the previous behaviour before commit
[1] happened.

[1]: 694957d56e
2021-04-02 09:15:59 +02:00
Jean-Philippe Evrard
e09359e46c Merge pull request #330 from weaveworks/dependabot/github_actions/guyarb/golang-test-annoations-v0.4.0
Bump guyarb/golang-test-annoations from v0.3.0 to v0.4.0
2021-03-29 15:53:11 +02:00
Daniel Holbach
770eb1e4f8 Merge pull request #315 from atighineanu/master
Implement universal notification mechanism (NEW)
2021-03-29 15:21:12 +02:00
atighineanu
694957d56e Implement universal notification mechanism
This patch gives the possibility to send notifications
 across different technologies. Also, this patch makes
 slack-hook-url, slack-username and slack-channel
 deprecated (informed by a warning).
 Also, updated the documentation (Readme).
2021-03-29 11:26:18 +02:00
dependabot[bot]
85c42fdb81 Bump guyarb/golang-test-annoations from v0.3.0 to v0.4.0
Bumps [guyarb/golang-test-annoations](https://github.com/guyarb/golang-test-annoations) from v0.3.0 to v0.4.0.
- [Release notes](https://github.com/guyarb/golang-test-annoations/releases)
- [Commits](https://github.com/guyarb/golang-test-annoations/compare/v0.3.0...48645c385003e0c362bf954d4018895be76f1d3d)

Signed-off-by: dependabot[bot] <support@github.com>
2021-03-29 09:19:36 +00:00
Jean-Philippe Evrard
3671c27e37 Add go tests
Without this patch, go test bugs can appear without getting caught,
neither in periodics, nor in PRs.

This should fix it.
2021-03-29 10:26:38 +02:00
Jean-Philippe Evrard
5930d733f8 Fix the Fatal calls using formatting
Without this, go test will rightfully fail.

This is a problem, as we don't have go test enabled, but we want
to have this in the future.

This should fix it.
2021-03-29 09:50:56 +02:00
Jean-Philippe Evrard
fd63e9a74b Add flexible commands parameters
Without this patch, you cannot configure the reboot
command to use, or use another command to trigger
a reboot.

This is a problem, as multiple users have asked for
it in the past, and we are lacking flexibility.

This fixes it by introducing two new parameters,
- one to provide a custom reboot command.
  This should help people running kured on
  non systemd OS
- one to provide a custom sentinel command.
  This should help people running non Ubuntu OS,
  as they can directly use their command instead of
  generating a file (useful for CentOS/SUSE)

For this, several refactors had to be done, to
remove global state in some functions. Making those
functions closer to "pure functions" helps us
increase our test coverage here and later.

As commandReboot was very close to rebootCommand,
the function to reboot the node has been renamed
to invokeReboot.
2021-03-29 09:50:56 +02:00
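The two parameters introduced above, and the move away from global state, can be pictured with a small Go sketch. It is an illustration only, not kured's implementation: the `runner` type, the helper names other than `invokeReboot`, the example command lines, and the convention that a zero exit status of the sentinel command means "reboot needed" are all assumptions. Splitting on whitespace is also a simplification that ignores shell quoting.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// runner executes an argv and reports failure as an error. Passing it in
// as a parameter (instead of using global state) keeps the logic testable.
type runner func(argv []string) error

// splitCommand turns a command line into argv. Quoting is not handled.
func splitCommand(cmdline string) []string {
	return strings.Fields(cmdline)
}

// execRunner is a real runner backed by os/exec.
func execRunner(argv []string) error {
	if len(argv) == 0 {
		return fmt.Errorf("empty command")
	}
	return exec.Command(argv[0], argv[1:]...).Run()
}

// rebootRequired runs the sentinel command; success is taken to mean
// that a reboot is needed (an assumed convention).
func rebootRequired(run runner, sentinelCommand string) bool {
	return run(splitCommand(sentinelCommand)) == nil
}

// invokeReboot runs the configured reboot command.
func invokeReboot(run runner, rebootCommand string) error {
	return run(splitCommand(rebootCommand))
}

func main() {
	// A dry-run runner prints instead of executing, so this is safe to run.
	dryRun := func(argv []string) error {
		fmt.Println("would run:", argv)
		return nil
	}
	// Both command lines below are hypothetical examples.
	if rebootRequired(dryRun, "/usr/local/bin/reboot-needed --check") {
		if err := invokeReboot(dryRun, "/sbin/shutdown -r now"); err != nil {
			fmt.Println("reboot failed:", err)
		}
	}
}
```

Swapping `dryRun` for `execRunner` would execute the commands for real, which is why the sketch defaults to the dry run.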
Jean-Philippe Evrard
837bd4eb2a Refactor reboot blocks
Without this patch, we rely on global state in many functions for
which we check the reboot blockers.

This is a problem, as it's harder to test.

This patch fixes it by refactoring the reboot blockers. This also
includes a first series of unit tests for our main.
2021-03-29 09:50:56 +02:00
Jean-Philippe Evrard
2a95f0b6c8 Fix periodic jobs
Without this patch, the version of 1.20 is taken in jobs as 1.2.
This is a problem, as it breaks all jobs, because there is no
file to provision a cluster with kubernetes 1.2 (and we shouldn't
do this!)

This fixes it by ensuring there is no mangling of the version
strings, and therefore the right file is used.
2021-03-24 14:29:26 +01:00
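One common way a version such as 1.20 turns into 1.2 is that it gets interpreted as a number somewhere along the way; whether that is exactly what happened in these jobs is an assumption. The Go sketch below shows the effect and why keeping the version as a string avoids it. The `kind-<version>.yaml` file naming is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// asNumber mimics a tool that parses the version as a float and prints it
// back: trailing zeros are lost, so "1.20" becomes "1.2".
func asNumber(version string) string {
	f, err := strconv.ParseFloat(version, 64)
	if err != nil {
		return version
	}
	return strconv.FormatFloat(f, 'f', -1, 64)
}

// configFile builds a (hypothetical) cluster config file name.
func configFile(version string) string {
	return "kind-" + version + ".yaml"
}

func main() {
	fmt.Println(configFile(asNumber("1.20"))) // kind-1.2.yaml, which does not exist
	fmt.Println(configFile("1.20"))           // kind-1.20.yaml, the intended file
}
```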
Jean-Philippe Evrard
15c57927c8 Update the deprecated DeleteLocalData
DeleteLocalData was deprecated for users of kubectl in 0.20 [1].
At the same time as the deprecation, the relevant code was also
removed [2] without warning: The DeleteLocalData from the helper
structure was simply renamed DeleteEmptyDirData, without shims
on the exposed pkg.

This is a problem, as it completely breaks kured.

This should fix it, by using the new field name.

[1]: 56ea9621b7
[2]: 56ea9621b7 (diff-041bdcdedca650a38a8d82cf15ab6f3665b7b84a0fb44a8bb5dcdc5cd944c63d)
2021-03-22 14:28:17 +01:00
Jean-Philippe Evrard
20cbf6112d Bouncing go.mod with latest kubernetes packages
Without this patch, go.mod will lag behind for the kubernetes
packages, as it's not automatically tested by dependabot.

We should bump versions with each new minor release of kured.

This should fix it.
2021-03-22 14:28:17 +01:00
Christian Kotzbauer
f668bdb1ba Merge pull request #325 from weaveworks/stale-duration
Extend close-duration for stale issues and prs
2021-03-19 11:36:18 +01:00
Christian Kotzbauer
8209647e69 change comment accordingly
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-03-19 10:20:32 +01:00
Christian Kotzbauer
46354837f9 extend close-duration for stale issues and prs
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2021-03-19 08:26:11 +01:00
Jean-Philippe Evrard
de2e0bb2c8 Merge pull request #321 from dholbach/add-maintainers
Adding a MAINTAINERS file
2021-03-11 14:41:49 +01:00
Daniel Holbach
2b88b72d38 Merge pull request #318 from jackfrancis/node-annotations-chart
update chart definition to include --annotate-nodes
2021-03-11 12:04:39 +01:00
Jack Francis
87e610c25f update chart definition to include --annotate-nodes 2021-03-10 16:03:46 -08:00
Daniel Holbach
fe4ad73c2d Adding a MAINTAINERS file
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-03-10 18:16:11 +01:00
Daniel Holbach
f6ada05c5d Merge pull request #320 from dholbach/alpine-3.13
update to alpine 3.13
2021-03-10 08:50:42 +01:00
Daniel Holbach
355813de30 update to alpine 3.13
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-03-10 08:10:36 +01:00
Daniel Holbach
8a5f69480b Merge pull request #319 from weaveworks/dependabot/go_modules/github.com/sirupsen/logrus-1.8.1
Bump github.com/sirupsen/logrus from 1.8.0 to 1.8.1
2021-03-10 08:07:11 +01:00
Daniel Holbach
1e0fc11b01 Merge pull request #316 from weaveworks/dependabot/github_actions/actions/stale-v3.0.18
Bump actions/stale from v3.0.17 to v3.0.18
2021-03-10 07:55:11 +01:00
dependabot[bot]
2218e29504 Bump github.com/sirupsen/logrus from 1.8.0 to 1.8.1
Bumps [github.com/sirupsen/logrus](https://github.com/sirupsen/logrus) from 1.8.0 to 1.8.1.
- [Release notes](https://github.com/sirupsen/logrus/releases)
- [Changelog](https://github.com/sirupsen/logrus/blob/master/CHANGELOG.md)
- [Commits](https://github.com/sirupsen/logrus/compare/v1.8.0...v1.8.1)

Signed-off-by: dependabot[bot] <support@github.com>
2021-03-10 05:55:36 +00:00
Daniel Holbach
250b9bad05 Merge pull request #296 from jackfrancis/node-annotations
add node annotations to identify kured reboot operations
2021-03-09 10:14:46 +01:00
Daniel Holbach
32e01a8417 Merge pull request #294 from jackfrancis/always-drain
always drain before reboot
2021-03-09 10:13:36 +01:00
Jack Francis
baf83408b8 add node annotations
adds a new --annotate-nodes daemonset runtime argument, which does the following when enabled:

- adds a new node annotation "weave.works/kured-most-recent-reboot-needed" with a value of the current RFC3339 timestamp as soon as kured identifies that a node needs to be rebooted
- adds a new node annotation "weave.works/kured-reboot-in-progress" with a value of the current RFC3339 timestamp as soon as kured identifies that a node needs to be rebooted
- removes the annotation "weave.works/kured-reboot-in-progress" when kured has successfully rebooted the node
2021-03-08 17:22:47 -08:00
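The annotation lifecycle listed above can be sketched with a plain map standing in for a node's annotations. The two annotation keys come from the commit message; the function names are hypothetical, and a real implementation would patch the Node object through the Kubernetes API instead of mutating a map.

```go
package main

import (
	"fmt"
	"time"
)

const (
	annotationRebootNeeded     = "weave.works/kured-most-recent-reboot-needed"
	annotationRebootInProgress = "weave.works/kured-reboot-in-progress"
)

// markRebootNeeded records, as described in the commit message, both
// annotations with the given RFC3339 timestamp.
func markRebootNeeded(annotations map[string]string, timestamp string) {
	annotations[annotationRebootNeeded] = timestamp
	annotations[annotationRebootInProgress] = timestamp
}

// markRebootDone removes the in-progress annotation after a successful
// reboot; the "most recent reboot needed" annotation is kept.
func markRebootDone(annotations map[string]string) {
	delete(annotations, annotationRebootInProgress)
}

func main() {
	annotations := map[string]string{}
	now := time.Now().UTC().Format(time.RFC3339)

	markRebootNeeded(annotations, now)
	fmt.Println("before reboot:", annotations)

	markRebootDone(annotations)
	fmt.Println("after reboot: ", annotations)
}
```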
Jack Francis
93c8242b89 always drain before reboot
This changes the pre-reboot drain functionality so that it always runs, regardless of the value of the Unschedulable node property.

Because kubectl drain is idempotent, we shouldn't have to worry about whether the node has already been set to Unschedulable (perhaps due to a prior, unsuccessful loop of the kured reboot cycle): we can run it over and over again. And because this drain func actually does a cordon + drain (and it only performs the drain if a cordon is successful), we can be sure that we aren't going to be thrashing this node w/ respect to scheduled pods.

This also fixes an edge case: if the node has been marked Unschedulable out-of-band, but workloads remain Running on this node, kured will no longer reboot the node's underlying VM/machine while it is actively running pods.
2021-03-08 17:20:31 -08:00
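The reasoning in this commit (drain is a cordon followed by an eviction, it is safe to repeat, and it no longer depends on the Unschedulable flag) can be illustrated with an in-memory model. The `node` type and both functions are hypothetical stand-ins, not the kubectl drain helper.

```go
package main

import "fmt"

// node is a toy model of a Kubernetes node.
type node struct {
	unschedulable bool
	pods          []string
}

// cordon marks the node unschedulable; repeating it changes nothing.
func cordon(n *node) error {
	n.unschedulable = true
	return nil
}

// drain always cordons first and only evicts pods if the cordon worked.
// It does not look at the previous value of unschedulable.
func drain(n *node) error {
	if err := cordon(n); err != nil {
		return fmt.Errorf("cordon failed, skipping eviction: %w", err)
	}
	n.pods = nil // stand-in for evicting every pod
	return nil
}

func main() {
	// Edge case from the commit message: the node was cordoned
	// out-of-band, but workloads are still running on it.
	n := &node{unschedulable: true, pods: []string{"web-0", "db-0"}}

	for i := 0; i < 2; i++ { // running drain twice gives the same result
		if err := drain(n); err != nil {
			fmt.Println(err)
			return
		}
	}
	fmt.Println("unschedulable:", n.unschedulable, "pods left:", len(n.pods))
}
```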
dependabot[bot]
c3d4c36493 Bump actions/stale from v3.0.17 to v3.0.18
Bumps [actions/stale](https://github.com/actions/stale) from v3.0.17 to v3.0.18.
- [Release notes](https://github.com/actions/stale/releases)
- [Commits](https://github.com/actions/stale/compare/v3.0.17...3b3c3f03cd4d8e2b61e179ef744a0d20efbe90b4)

Signed-off-by: dependabot[bot] <support@github.com>
2021-03-08 06:35:26 +00:00
Daniel Holbach
1fd09dd572 Merge pull request #310 from weaveworks/dependabot/go_modules/github.com/sirupsen/logrus-1.8.0
Bump github.com/sirupsen/logrus from 1.7.0 to 1.8.0
2021-03-02 10:48:41 +01:00
Daniel Holbach
d21a438197 Merge pull request #311 from weaveworks/dependabot/github_actions/actions/stale-v3.0.17
Bump actions/stale from v3.0.16 to v3.0.17
2021-03-02 10:48:15 +01:00
dependabot[bot]
3fdd1cf6f7 Bump actions/stale from v3.0.16 to v3.0.17
Bumps [actions/stale](https://github.com/actions/stale) from v3.0.16 to v3.0.17.
- [Release notes](https://github.com/actions/stale/releases)
- [Commits](https://github.com/actions/stale/compare/v3.0.16...996798eb71ef485dc4c7b4d3285842d714040c4a)

Signed-off-by: dependabot[bot] <support@github.com>
2021-02-19 05:49:06 +00:00
dependabot[bot]
48688044d5 Bump github.com/sirupsen/logrus from 1.7.0 to 1.8.0
Bumps [github.com/sirupsen/logrus](https://github.com/sirupsen/logrus) from 1.7.0 to 1.8.0.
- [Release notes](https://github.com/sirupsen/logrus/releases)
- [Changelog](https://github.com/sirupsen/logrus/blob/master/CHANGELOG.md)
- [Commits](https://github.com/sirupsen/logrus/compare/v1.7.0...v1.8.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-02-18 05:49:25 +00:00
Daniel Holbach
640613565d Merge pull request #305 from weaveworks/dependabot/go_modules/github.com/spf13/cobra-1.1.3
Bump github.com/spf13/cobra from 1.1.2 to 1.1.3
2021-02-16 12:18:40 +01:00
dependabot[bot]
763695de5c Bump github.com/spf13/cobra from 1.1.2 to 1.1.3
Bumps [github.com/spf13/cobra](https://github.com/spf13/cobra) from 1.1.2 to 1.1.3.
- [Release notes](https://github.com/spf13/cobra/releases)
- [Changelog](https://github.com/spf13/cobra/blob/master/CHANGELOG.md)
- [Commits](https://github.com/spf13/cobra/compare/v1.1.2...v1.1.3)

Signed-off-by: dependabot[bot] <support@github.com>
2021-02-11 05:52:43 +00:00
Daniel Holbach
6ff5722728 Merge pull request #304 from weaveworks/dependabot/go_modules/github.com/spf13/cobra-1.1.2
Bump github.com/spf13/cobra from 1.1.1 to 1.1.2
2021-02-10 12:40:27 +01:00
dependabot[bot]
472934e958 Bump github.com/spf13/cobra from 1.1.1 to 1.1.2
Bumps [github.com/spf13/cobra](https://github.com/spf13/cobra) from 1.1.1 to 1.1.2.
- [Release notes](https://github.com/spf13/cobra/releases)
- [Changelog](https://github.com/spf13/cobra/blob/master/CHANGELOG.md)
- [Commits](https://github.com/spf13/cobra/compare/v1.1.1...v1.1.2)

Signed-off-by: dependabot[bot] <support@github.com>
2021-02-10 05:53:05 +00:00
Daniel Holbach
b7f29c76ce Merge pull request #302 from weaveworks/coc
Point to CNCF Code of Conduct
2021-02-08 17:40:40 +01:00
Daniel Holbach
fa4e458f1f Merge pull request #300 from t3mi/master
add podLabels parameter
2021-02-08 16:05:24 +01:00
Daniel Holbach
4fc93d550d Merge pull request #301 from weaveworks/dependabot/github_actions/actions/stale-v3.0.16
Bump actions/stale from v3.0.15 to v3.0.16
2021-02-08 16:04:16 +01:00
Daniel Holbach
6eb9050156 Point to CNCF Code of Conduct 2021-02-08 11:35:50 +01:00
dependabot[bot]
d8b7669ab4 Bump actions/stale from v3.0.15 to v3.0.16
Bumps [actions/stale](https://github.com/actions/stale) from v3.0.15 to v3.0.16.
- [Release notes](https://github.com/actions/stale/releases)
- [Commits](https://github.com/actions/stale/compare/v3.0.15...9d6f46564a515a9ea11e7762ab3957ee58ca50da)

Signed-off-by: dependabot[bot] <support@github.com>
2021-02-08 06:26:07 +00:00
t3mi
d52d78a303 add podLabels parameter 2021-02-07 23:58:55 +02:00
Daniel Holbach
6a8e3f1e98 Merge pull request #298 from weaveworks/dependabot/github_actions/actions/stale-v3.0.15
Bump actions/stale from v3.0.14 to v3.0.15
2021-01-25 10:05:12 +01:00
dependabot[bot]
b39c9011ea Bump actions/stale from v3.0.14 to v3.0.15
Bumps [actions/stale](https://github.com/actions/stale) from v3.0.14 to v3.0.15.
- [Release notes](https://github.com/actions/stale/releases)
- [Commits](https://github.com/actions/stale/compare/v3.0.14...86561461b92875de77a8b2d2e75f004c826e8f45)

Signed-off-by: dependabot[bot] <support@github.com>
2021-01-25 06:54:10 +00:00
Daniel Holbach
fade706cbf Merge pull request #250 from damoon/19-PreferNoSchedule
implement issue-19 add prefer no schedule taint to avoid double draining of pods
2021-01-12 14:28:23 +01:00
David Sauer
5a4e197d27 change taint config to be disabled by default 2021-01-11 18:24:17 +01:00
Daniel Holbach
1320c5d318 Merge pull request #293 from evrardjp/fix-make-helm-chart
Update helm chart README using Make
2021-01-11 16:39:23 +01:00
Jean-Philippe Evrard
0640683fbb Update helm chart README using Make
Without this, it's possible that the helm chart documentation
contains the `image tag` version which might not be equal to
the version in the helm chart, as it's only an example.

This is confusing, so instead we should use make to edit the
application version everywhere.

This fixes it by updating the Makefile to modify text of the
chart's README using a regex looking for something similar to
a version; then I used the updated makefile to edit the README,
which in turn requires a bump of the version of the chart
itself.
2021-01-11 16:14:18 +01:00
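The idea of rewriting anything that looks like a version in the chart README can be sketched as below. The actual change lives in the Makefile; this Go version only illustrates the regex approach, and the pattern, function name, and sample README line are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionLike matches three dot-separated numbers, e.g. 1.5.1.
var versionLike = regexp.MustCompile(`\b\d+\.\d+\.\d+\b`)

// setVersion replaces every version-like token in text with version.
func setVersion(text, version string) string {
	return versionLike.ReplaceAllLiteralString(text, version)
}

func main() {
	// Hypothetical README table row.
	readmeLine := "| `image.tag` | Image tag | `1.5.1` |"
	fmt.Println(setVersion(readmeLine, "1.6.0"))
}
```

A pattern this broad rewrites every version-like string in the text, so it only suits files where the application version is the sole such value.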
Daniel Holbach
ec1a931a39 Merge pull request #292 from evrardjp/update-helm-chart
Update helm chart
2021-01-11 15:18:50 +01:00
Jean-Philippe Evrard
36308cee91 Update helm chart
Bumping the helm chart with minor version bump, due to minor
version bump of the kured appVersion.
2021-01-11 14:57:42 +01:00
Daniel Holbach
b733d00550 Merge pull request #280 from cnmcavoy/cnmcavoy/helm-updates
Expose the service name and maxUnavailable for rolling updates in helm chart
2021-01-11 14:53:53 +01:00
Daniel Holbach
56e2c12d38 Merge pull request #291 from evrardjp/fix-tagging
Fix automated tagging
2021-01-11 14:29:28 +01:00
Jean-Philippe Evrard
48e7ff28bf Fix automated tagging
Without this patch, the name of the image is not templated, which
causes the action to fail.

This should fix it, by ensuring the image scan action uses a
templated value, instead of incorrectly relying on shell templating,
which doesn't run in the action.
2021-01-11 14:23:14 +01:00
Daniel Holbach
14fcc7bf37 Merge pull request #289 from evrardjp/update-README-for-1.6.0
Update README
2021-01-11 11:51:20 +01:00
Daniel Holbach
5b4e5b8533 Merge pull request #288 from evrardjp/update-versions-testing
Refresh kind cluster versions
2021-01-11 11:39:54 +01:00
Jean-Philippe Evrard
0162288ecf Update README
This will prepare the README for 1.6.0 release, showing the
planned version.
2021-01-11 11:26:23 +01:00
Jean-Philippe Evrard
2e09425a45 Refresh kind cluster versions
Without this patch, we are using outdated images in kind cluster
setup.

This should fix it, by removing 1.17 cluster (which is not tested
anymore), and updating 1.19 images.
2021-01-11 11:16:09 +01:00
Daniel Holbach
5cbca18377 Merge pull request #269 from evrardjp/publish-chart-on-change-not-on-release
Auto-publish helm chart on master change
2021-01-11 10:49:37 +01:00
Daniel Holbach
86fe6ff03e Merge pull request #285 from weaveworks/dependabot/github_actions/nick-invision/retry-v2.4.0
Bump nick-invision/retry from v2.2.0 to v2.4.0
2021-01-08 15:10:30 +01:00
Daniel Holbach
a3b782f86b Merge pull request #268 from evrardjp/prep-1.20
Update for kubernetes 1.20 support
2021-01-08 15:09:37 +01:00
Daniel Holbach
14269023e8 Merge pull request #275 from evrardjp/dont-bump-k8s
Do not bump any k8s module
2021-01-08 15:02:29 +01:00
Daniel Holbach
a1e443a9f3 Merge pull request #276 from evrardjp/vuln-image-fix
Temporarily workaround alpine issue
2021-01-08 15:01:38 +01:00
Daniel Holbach
a2cc24e656 Merge pull request #287 from jack-education/correct-README-schedule-example
Corrected README Setting a schedule configuration example
2021-01-08 11:40:54 +01:00
jack-education
77ca6fda07 Corrected README Setting a schedule configuration example 2021-01-08 00:26:15 +00:00
David Sauer
3a35d6a46c remove taint in case the reboot is not needed anymore 2021-01-06 22:21:41 +01:00
David Sauer
e430b1442a updated README 2021-01-06 21:59:53 +01:00
David Sauer
b3e39418ba cache taint state to avoid unnecessary API calls 2021-01-06 21:51:43 +01:00
David Sauer
34446f949e Allow to disable tainting during pending node reboot by setting the taint name to an empty string. 2021-01-06 21:39:32 +01:00
David Sauer
10d95c426f fixed type & renamed variable 2021-01-06 21:29:35 +01:00
David Sauer
e4c684c3af taint node with PreferNoSchedule to prevent receiving (and double draining) additional pods from other rebooting nodes 2021-01-06 21:23:40 +01:00
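Taken together, this series of taint commits describes three behaviours: the taint follows the "reboot pending" state, the last known state is cached to avoid unnecessary API calls, and an empty taint name disables the feature. A minimal Go sketch under those assumptions follows; the `tainter` type, its fields, and the taint name are hypothetical, and the counter stands in for real Kubernetes API calls.

```go
package main

import "fmt"

// tainter keeps a PreferNoSchedule-style taint in sync with whether a
// reboot is pending.
type tainter struct {
	taintName string // empty string disables tainting
	tainted   bool   // cached state of the taint on the node
	apiCalls  int    // counts simulated API updates
}

// sync adds the taint when a reboot is pending and removes it when the
// reboot is no longer needed. It is a no-op if nothing changed.
func (tn *tainter) sync(rebootNeeded bool) {
	if tn.taintName == "" {
		return // feature disabled
	}
	if tn.tainted == rebootNeeded {
		return // cached state already matches, skip the API call
	}
	tn.apiCalls++ // stand-in for patching the node
	tn.tainted = rebootNeeded
}

func main() {
	tn := &tainter{taintName: "example.com/pending-reboot"}
	tn.sync(true)  // taint added
	tn.sync(true)  // cached, no call
	tn.sync(false) // reboot no longer needed, taint removed
	fmt.Println("tainted:", tn.tainted, "api calls:", tn.apiCalls)
}
```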
David Sauer
204a06ca38 fixed call of log.Fatal instead of log.Fatalf 2021-01-06 21:23:40 +01:00
David Sauer
48897eb0ab avoid indentations to ease readability 2021-01-06 21:23:40 +01:00
dependabot[bot]
84407690c6 Bump nick-invision/retry from v2.2.0 to v2.4.0
Bumps [nick-invision/retry](https://github.com/nick-invision/retry) from v2.2.0 to v2.4.0.
- [Release notes](https://github.com/nick-invision/retry/releases)
- [Changelog](https://github.com/nick-invision/retry/blob/master/.releaserc.js)
- [Commits](https://github.com/nick-invision/retry/compare/v2.2.0...7c68161adf97a48beb850a595b8784ec57a98cbb)

Signed-off-by: dependabot[bot] <support@github.com>
2021-01-05 05:58:00 +00:00
Cameron McAvoy
d4893d7bd7 Expose the service name and maxUnavailable for rolling updates in the helm chart 2020-12-17 18:31:28 -05:00
Jean-Philippe Evrard
897834a9db Temporarily workaround alpine issue
Until a new alpine image is created, we should ensure the latest
packages are used, and therefore we should upgrade default
installed packages.

Without this patch, we'll have outdated and vulnerable packages
until a new 3.12 image is released.

This is a problem, as we'll publish broken images.

This should temporarily workaround it, at the expense of larger
images (contains package cache)
2020-12-14 11:20:27 +01:00
Jean-Philippe Evrard
996b1459b1 Do not bump any k8s module
Without this patch, dependabot will still try to bump some k8s
dependencies.

This is a problem, as we need to bump them together, manually.

This should fix it by removing them all from dependabot.
2020-12-14 10:25:52 +01:00
Jean-Philippe Evrard
251c3c8503 Clarify development process for helm charts
Without this, it might be unclear when the chart is published.

This should fix it.
2020-12-11 12:57:47 +01:00
Jean-Philippe Evrard
0bb0cd168b Auto-publish helm chart on master change
We are now testing the helm charts on each PR. They are now
ensured to be passing our tests and reviewed before merging.
This also means that the merged changes in the master branch
are reliable, and therefore can be consumed immediately.

Currently, we are waiting for a release to publish a helm
chart.

This is a problem as it means that the helm chart will
always lag behind, and we'll miss a few semantic versions,
if for example the helm chart is adapted multiple times
before the next release.

This should fix it by ensuring ALL the merged changes in
our helm chart will result in a new published helm chart.
2020-12-10 11:17:25 +01:00
Daniel Holbach
e716e9c2b4 Merge pull request #270 from evrardjp/fix-current-helm-chart
Fix comment spacing
2020-12-09 17:28:08 +01:00
Jean-Philippe Evrard
1362eafb33 Fix comment spacing
Without this patch, chart linting will fail: more than two
spaces are needed before a comment in the helm chart values.

This fixes it by adding one more space, and moving the whole block
of comments for consistency.
2020-12-09 16:37:03 +01:00
Jean-Philippe Evrard
c68937b5ff Update for kubernetes 1.20 support
This ensures we bump the code for 1.20.
It updates the testing to ensure kured works on a 1.20 cluster,
removes the testing on 1.17 (as it is now deprecated).
Libraries remain on 1.19, to avoid breaking 1.18 clusters.
2020-12-09 14:54:35 +01:00
Daniel Holbach
e878e0e5b3 Merge pull request #258 from evrardjp/only-use-github-actions
Publish image on tag with github actions
2020-12-07 15:29:36 +01:00
Jean-Philippe Evrard
525f04b492 Publish image on master merged changes
As we are pretty much committed to github actions, we should
probably rely on it to push the images at each commit merged
on the master branch.
2020-12-07 13:57:58 +01:00
Jean-Philippe Evrard
c7542a5d21 Point docs to current golang version
This is to be on par with the previous documentation.
2020-12-07 13:21:25 +01:00
Ciaran Moran
170a792112 DockerHub auth: use local and org secrets 2020-12-07 13:21:25 +01:00
Jean-Philippe Evrard
ea57673373 Publish image on tag
As we are pretty much committed to github actions, we should
probably rely on it to push the images on tag.

This covers the missing bits.
2020-12-07 13:21:25 +01:00
Daniel Holbach
277a8e30cd Merge pull request #262 from evrardjp/fix-force-golang-version
Fix typo in github workflows
2020-12-07 12:57:28 +01:00
Jean-Philippe Evrard
bd0d901d22 Fix typo in github workflows
Without this patch, the PR jobs are broken and no jobs are running.
This was a recently introduced typo in the last refactor of the
PR jobs.

This should fix it, and make the PR tests work again.
2020-12-07 12:35:52 +01:00
Daniel Holbach
dcd5ec5325 Merge pull request #220 from weaveworks/dependabot/go_modules/github.com/prometheus/client_golang-1.8.0
Bump github.com/prometheus/client_golang from 1.0.0 to 1.8.0
2020-12-05 16:13:32 +01:00
Daniel Holbach
09d44c9ac1 Merge pull request #259 from evrardjp/force-golang-version
Force golang version
2020-12-02 16:51:16 +01:00
dependabot[bot]
8a9ae1ee9d Bump github.com/prometheus/client_golang from 1.0.0 to 1.8.0
Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.0.0 to 1.8.0.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/master/CHANGELOG.md)
- [Commits](https://github.com/prometheus/client_golang/compare/v1.0.0...v1.8.0)

Signed-off-by: dependabot[bot] <support@github.com>
2020-12-02 13:59:26 +00:00
Daniel Holbach
67c9cc0fa7 Merge pull request #226 from weaveworks/dependabot/go_modules/github.com/prometheus/common-0.15.0
Bump github.com/prometheus/common from 0.4.1 to 0.15.0
2020-12-02 14:55:49 +01:00
dependabot[bot]
111d1a1a98 Bump github.com/prometheus/common from 0.4.1 to 0.15.0
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.4.1 to 0.15.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](https://github.com/prometheus/common/compare/v0.4.1...v0.15.0)

Signed-off-by: dependabot[bot] <support@github.com>
2020-12-02 13:29:19 +00:00
Daniel Holbach
6e6ad21b70 Merge pull request #257 from weaveworks/dependabot/go_modules/github.com/spf13/cobra-1.1.1
Bump github.com/spf13/cobra from 1.0.0 to 1.1.1
2020-12-02 14:25:47 +01:00
dependabot[bot]
db9e716a55 Bump github.com/spf13/cobra from 1.0.0 to 1.1.1
Bumps [github.com/spf13/cobra](https://github.com/spf13/cobra) from 1.0.0 to 1.1.1.
- [Release notes](https://github.com/spf13/cobra/releases)
- [Changelog](https://github.com/spf13/cobra/blob/master/CHANGELOG.md)
- [Commits](https://github.com/spf13/cobra/compare/v1.0.0...v1.1.1)

Signed-off-by: dependabot[bot] <support@github.com>
2020-12-02 13:03:21 +00:00
Daniel Holbach
e40a925cf0 Merge pull request #207 from weaveworks/dependabot/go_modules/github.com/sirupsen/logrus-1.7.0
Bump github.com/sirupsen/logrus from 1.2.0 to 1.7.0
2020-12-02 13:59:06 +01:00
Jean-Philippe Evrard
e2dd29748d Force golang version
Without this, golang version used is the golang version decided
by github.

This is a problem, as it might shift over time, without our control.

This fixes it by getting the golang version from the go.mod.
2020-12-01 08:36:35 +01:00
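
As a rough illustration of the idea in the commit above (take the Go version from go.mod instead of whatever the CI runner happens to ship), the lookup could be sketched in Go as follows. The on-main-push workflow in this comparison does the same lookup with an awk one-liner; the function name goVersion and the sample go.mod text here are hypothetical and exist only for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// goVersion is a hypothetical helper: it returns the version named by the
// top-level "go" directive in the text of a go.mod file, or "" if there is none.
func goVersion(gomod string) string {
	sc := bufio.NewScanner(strings.NewReader(gomod))
	for sc.Scan() {
		line := sc.Text()
		// Only consider lines that begin with the directive itself.
		if !strings.HasPrefix(line, "go ") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 {
			return fields[1]
		}
	}
	return ""
}

func main() {
	// Made-up go.mod content, used only to exercise the helper.
	sample := "module example.com/demo\n\ngo 1.15\n\nrequire example.com/lib v1.0.0\n"
	fmt.Println(goVersion(sample))
}
```

Pinning the toolchain to the value found this way keeps CI and local builds on the same Go release, which appears to be the intent of the commit.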
dependabot[bot]
8ab4d7390e Bump github.com/sirupsen/logrus from 1.2.0 to 1.7.0
Bumps [github.com/sirupsen/logrus](https://github.com/sirupsen/logrus) from 1.2.0 to 1.7.0.
- [Release notes](https://github.com/sirupsen/logrus/releases)
- [Changelog](https://github.com/sirupsen/logrus/blob/master/CHANGELOG.md)
- [Commits](https://github.com/sirupsen/logrus/compare/v1.2.0...v1.7.0)

Signed-off-by: dependabot[bot] <support@github.com>
2020-11-30 12:51:45 +00:00
Daniel Holbach
a13dfbb538 Merge pull request #256 from evrardjp/dependabot-ignore-kubernetes
Do not bump kubernetes with dependabot
2020-11-30 13:13:36 +01:00
Daniel Holbach
c9de90c96d Merge pull request #255 from dholbach/merge-k8s-and-go-updates
Merge k8s and go updates
2020-11-30 12:31:34 +01:00
Jean-Philippe Evrard
b4c8b64c2d Do not bump kubernetes with dependabot
Without this patch, we'll get kubernetes updates.

This is not necessary, and could be even a problem on merge:
those kubernetes updates are done separately, knowingly,
to respect the life cycle of the kubernetes we need
(and stay one version below latest to have a larger coverage
of versions).

We could keep dependabot to update those on a lower frequency,
but that sounds clunky and not great. Instead disable them all,
and rely on the team to do this regular maintenance work.
2020-11-30 12:03:50 +01:00
Daniel Holbach
8344015019 Merge pull request #253 from evrardjp/ensure-python-is-installed
Fix chart linter
2020-11-30 11:11:01 +01:00
Daniel Holbach
31bb5363b2 update versions 2020-11-30 10:51:43 +01:00
Daniel Holbach
4894a86f32 Merge remote-tracking branch 'evrardjp/build-with-golang1.14' into merge-k8s-and-go-updates 2020-11-30 10:42:40 +01:00
Daniel Holbach
21eabe2fa6 Merge remote-tracking branch 'upstream/dependabot/go_modules/k8s.io/apimachinery-0.19.2' into merge-k8s-and-go-updates 2020-11-30 10:33:47 +01:00
Daniel Holbach
459d7c53aa Merge remote-tracking branch 'upstream/dependabot/go_modules/k8s.io/kubectl-0.19.4' into merge-k8s-and-go-updates 2020-11-30 10:31:11 +01:00
Jean-Philippe Evrard
40f5eac8aa Simplify action code
There is a lot of duplicated code in this workflow.
This fixes it by making a unique job with parameters. The
matrix buys us the parallelisation and the fail-fast.
2020-11-30 10:30:41 +01:00
Jean-Philippe Evrard
1b54c4bc04 Fix chart linter
Without this patch, the lint action incorrectly returns everything
is fine.

This is a problem, as lint effectively is not running, and
therefore we could merge broken charts.

This fixes it by updating to the latest practices you can find
in the official chart-repo-actions.

(See the official example in
i1a9640d998/.github/workflows/lint-test.yaml)
2020-11-30 10:05:02 +01:00
Daniel Holbach
ef7a7f6320 Merge pull request #251 from weaveworks/dependabot/github_actions/nick-invision/retry-v2.2.0
Bump nick-invision/retry from v1 to v2.2.0
2020-11-30 09:07:14 +01:00
dependabot[bot]
876f72fa50 Bump nick-invision/retry from v1 to v2.2.0
Bumps [nick-invision/retry](https://github.com/nick-invision/retry) from v1 to v2.2.0.
- [Release notes](https://github.com/nick-invision/retry/releases)
- [Changelog](https://github.com/nick-invision/retry/blob/master/.releaserc.js)
- [Commits](https://github.com/nick-invision/retry/compare/v1...fb3bca3fb54f6488d7508c8d1eeb64b94efd5a93)

Signed-off-by: dependabot[bot] <support@github.com>
2020-11-30 07:07:20 +00:00
Daniel Holbach
c62fa36259 Merge pull request #241 from evrardjp/fix-incoherences-in-actions-names
Cleanup github actions
2020-11-27 16:58:00 +01:00
Jean-Philippe Evrard
ba54b199b8 Clarify development process
Without this, it's a little bit hard to grasp how things
are interconnected. This should clarify things.
2020-11-27 16:23:24 +01:00
Jean-Philippe Evrard
679f45c321 Cleanup github actions
- Made all the file extensions ".yaml"
- Regrouped actions together to make it easy to see when they
  are useful: on-pr is useful at every PR, on-tag when we are
  ready to tag next image, on-pr-chart when we have a PR to
  modify the chart with the published image, on-release when
  we have released and need to publish the final helm chart
- Regrouped periodic jobs together, to deal with stale prs/issues
  and ensuring that our helm chart always works.
2020-11-27 14:41:38 +01:00
Daniel Holbach
de4e9a9bd9 Merge pull request #249 from evrardjp/produce-more-logs-for-stopped-containers
Add more logs into gates
2020-11-27 13:49:17 +01:00
Jean-Philippe Evrard
81ee206a87 Add more logs into gates
This will be necessary to find out why some docker containers fail
to come back up in github actions.
2020-11-27 13:31:20 +01:00
Daniel Holbach
77594f2e31 Merge pull request #248 from evrardjp/add-shellcheck
Add Shellcheck
2020-11-27 12:52:25 +01:00
Daniel Holbach
36a9e4e3d6 Merge pull request #247 from evrardjp/make-shellcheck-happy
Fix shellcheck issue
2020-11-27 12:49:21 +01:00
Jean-Philippe Evrard
98b547a66e Add Shellcheck
Ensures our bash is neat!
2020-11-27 12:23:39 +01:00
Jean-Philippe Evrard
1165cfe6f4 Fix shellcheck issue
Without this, shellcheck will complain about double quotes
missing.
2020-11-27 12:12:39 +01:00
Daniel Holbach
91a8ed0638 Merge pull request #246 from evrardjp/separate-debug-output
Improve coordinated reboot output
2020-11-27 11:38:34 +01:00
Jean-Philippe Evrard
67ea5922f4 Improve coordinated reboot output
When a failure is happening and the cluster doesn't manage to
be back up on time, we exit 1, and don't show docker logs.

This is a problem, as we would benefit from a detailed docker
output on those cases, when debugging.

This fixes it by ensuring the logging is always done at the
exit of the script.
2020-11-27 10:59:14 +01:00
Daniel Holbach
88b8b5d223 Merge pull request #181 from evrardjp/kustomize-kind-tests
Add manifests testing
2020-11-27 09:43:07 +01:00
Daniel Holbach
645ca7f88f Merge pull request #242 from weaveworks/dependabot/github_actions/nick-invision/retry-v2.2.0
Bump nick-invision/retry from v1 to v2.2.0
2020-11-27 09:40:34 +01:00
Daniel Holbach
f152a15552 Merge pull request #184 from evrardjp/multiple-k8s-versions
Increase test matrix for smoke tests
2020-11-27 09:36:18 +01:00
dependabot[bot]
470b887ea4 Bump nick-invision/retry from v1 to v2.2.0
Bumps [nick-invision/retry](https://github.com/nick-invision/retry) from v1 to v2.2.0.
- [Release notes](https://github.com/nick-invision/retry/releases)
- [Changelog](https://github.com/nick-invision/retry/blob/master/.releaserc.js)
- [Commits](https://github.com/nick-invision/retry/compare/v1...fb3bca3fb54f6488d7508c8d1eeb64b94efd5a93)

Signed-off-by: dependabot[bot] <support@github.com>
2020-11-27 06:12:15 +00:00
Jean-Philippe Evrard
9ca74a6062 Simplify manifest testing
We don't need to test with kustomize, manifest testing is good
enough, as we just test that the manifests are correct, not that
they are functional (which would require a change in the poll time).
2020-11-26 14:14:18 +01:00
Jean-Philippe Evrard
7f379ac920 Add kustomize testing
This extends our test coverages for kured-* manifest changes on PRs,
and any eventual changes in kubernetes/kubectl on periodics.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2020-11-26 09:46:07 +01:00
Jean-Philippe Evrard
fa9991f929 Increase kubernetes versions test matrix for smoke tests
This allows us to test this branch with multiple kubernetes
versions.
2020-11-26 09:41:32 +01:00
Daniel Holbach
596394db79 Merge pull request #183 from evrardjp/helm-smoke-test
Add smoke/basic functional test
2020-11-26 09:40:18 +01:00
Jean-Philippe Evrard
c9367eeff5 Always have latest helm binary installed
This will ease our maintenance.
2020-11-26 09:19:41 +01:00
Daniel Holbach
8ad6bb7c24 Merge pull request #240 from weaveworks/dependabot/github_actions/actions/stale-v3.0.14
Bump actions/stale from v1 to v3.0.14
2020-11-26 08:57:34 +01:00
Daniel Holbach
722db47b2d Merge pull request #212 from DaniJG/master
GH-125, add additional parameters for the drain/reboot slack message template
2020-11-26 08:53:29 +01:00
dependabot[bot]
3f2027da32 Bump actions/stale from v1 to v3.0.14
Bumps [actions/stale](https://github.com/actions/stale) from v1 to v3.0.14.
- [Release notes](https://github.com/actions/stale/releases)
- [Commits](https://github.com/actions/stale/compare/v1...87c2b794b9b47a9bec68ae03c01aeb572ffebdb1)

Signed-off-by: dependabot[bot] <support@github.com>
2020-11-26 06:26:14 +00:00
Daniel Jimenez Garcia
51cab0dedc rename message template parameters so they are not related to slack 2020-11-25 16:20:54 +00:00
Daniel Jimenez Garcia
f059cec794 GH-125, add additional parameters to override the drain/reboot slack messages 2020-11-25 16:19:31 +00:00
Daniel Holbach
2fef8b1b12 Merge pull request #206 from chentex/time-wrap
Added support for time wrap in timewindow.Contains
2020-11-25 10:28:57 +01:00
Daniel Holbach
e9a7c4535a Merge pull request #224 from evrardjp/auto-expire-issues
Auto expire issues and PRs
2020-11-25 10:14:45 +01:00
Daniel Holbach
da60edbe7b Merge pull request #229 from weaveworks/dependabot/github_actions/helm/kind-action-v1.1.0
Bump helm/kind-action from v1.0.0 to v1.1.0
2020-11-25 10:12:54 +01:00
Daniel Holbach
a6040aa12d Merge pull request #230 from weaveworks/dependabot/github_actions/helm/chart-testing-action-v2.0.1
Bump helm/chart-testing-action from v1.0.0 to v2.0.1
2020-11-25 10:10:05 +01:00
Daniel Holbach
685f32881b Merge pull request #235 from dholbach/prep-1.5.1-release
Prepare 1.5.1 release
2020-11-24 15:34:27 +01:00
Daniel Holbach
1931b9b939 Merge pull request #238 from dholbach/rename-annotationTTL-in-chart
rename annotation-ttl to lock-ttl in all places, follow-up to #213
2020-11-24 09:55:21 +01:00
Daniel Holbach
32aa77f75b rename annotation-ttl to lock-ttl in all places, follow-up to #213 2020-11-24 09:28:57 +01:00
Daniel Holbach
4b1e4036e3 Merge pull request #237 from weaveworks/drain-grace
Drain: allow pods grace period to terminate
2020-11-24 07:46:05 +01:00
Bryan Boreham
1ba3acab98 Drain: allow pods grace period to terminate
The default of 0 is taken as "delete immediately", which is
not appropriate.
2020-11-23 18:07:56 +00:00
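
The distinction in the commit above can be sketched in Go. The convention assumed here is that a grace period of 0 means "delete immediately", while a negative value means "do not override, let each pod use its own termination grace period". This is the behaviour I understand the kubectl drain helper to follow, but treat it as an assumption; deleteGracePeriod is a hypothetical function written for this example and is not taken from kured or kubectl.

```go
package main

import "fmt"

// deleteGracePeriod is a hypothetical helper modelling the assumed convention:
// a negative configured value yields nil (no override, the pod's own grace
// period applies); zero or a positive value is passed through as an override.
func deleteGracePeriod(configured int) *int64 {
	if configured < 0 {
		return nil
	}
	g := int64(configured)
	return &g
}

func main() {
	for _, c := range []int{0, -1, 30} {
		if g := deleteGracePeriod(c); g == nil {
			fmt.Printf("%d: keep the pod's own grace period\n", c)
		} else {
			fmt.Printf("%d: override with %d seconds\n", c, *g)
		}
	}
}
```

Under that convention, leaving the setting at its zero value silently requests immediate deletion, which is the problem the commit message describes.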
Daniel Holbach
038a3412b1 Prepare 1.5.1 release 2020-11-23 10:47:16 +01:00
dependabot[bot]
84b62ba7ec Bump k8s.io/kubectl from 0.18.8 to 0.19.4
Bumps [k8s.io/kubectl](https://github.com/kubernetes/kubectl) from 0.18.8 to 0.19.4.
- [Release notes](https://github.com/kubernetes/kubectl/releases)
- [Commits](https://github.com/kubernetes/kubectl/compare/v0.18.8...v0.19.4)

Signed-off-by: dependabot[bot] <support@github.com>
2020-11-16 07:10:36 +00:00
dependabot[bot]
cc4a4f5161 Bump helm/chart-testing-action from v1.0.0 to v2.0.1
Bumps [helm/chart-testing-action](https://github.com/helm/chart-testing-action) from v1.0.0 to v2.0.1.
- [Release notes](https://github.com/helm/chart-testing-action/releases)
- [Commits](https://github.com/helm/chart-testing-action/compare/v1.0.0...b0d4458c71155b54fcf33e11dd465dc923550009)

Signed-off-by: dependabot[bot] <support@github.com>
2020-11-16 07:06:39 +00:00
dependabot[bot]
8a29c218da Bump helm/kind-action from v1.0.0 to v1.1.0
Bumps [helm/kind-action](https://github.com/helm/kind-action) from v1.0.0 to v1.1.0.
- [Release notes](https://github.com/helm/kind-action/releases)
- [Commits](https://github.com/helm/kind-action/compare/v1.0.0...7a937c0fb648064a83b8b9354151e5e543d9fcec)

Signed-off-by: dependabot[bot] <support@github.com>
2020-11-16 07:06:38 +00:00
Daniel Holbach
64202ff440 Merge pull request #185 from evrardjp/release-helper
Release helper
2020-11-09 16:43:12 +01:00
Daniel Holbach
08ae57579c Merge pull request #221 from evrardjp/lint-job
Add Lint job in github actions
2020-11-09 13:22:26 +01:00
Jean-Philippe Evrard
2e5ea66e91 Add lint job
In the past, we had lint issues which were merged into the code,
and/or lint changed without us adapting our code.

This should allow us to stay on top of linting issues by
highlighting them in PRs.
2020-11-09 13:11:58 +01:00
Daniel Holbach
7461ab8d95 Merge pull request #222 from evrardjp/make-lint-happier-for-pkg-folder
Make go lint on pkg folder happier
2020-11-09 11:50:58 +01:00
Daniel Holbach
aa49cfd8c4 Merge pull request #215 from evrardjp/make-lint-happier
Make go lint on cmd folder happier
2020-11-09 11:49:51 +01:00
Bryan Boreham
4c31184422 Merge pull request #213 from mvisonneau/lock_ttl
Replaced --annotationTTL with --lockTTL and fixed bug
2020-11-06 11:31:19 +00:00
Jean-Philippe Evrard
8a0f38ac2a Auto expire issues and PRs
Without this patch, we might hold old issues and PRs for a long
time. Instead we should close them. People can reopen if necessary.

This would show that we have a proper triage process, and a proper
way to handle those.
2020-11-05 11:23:05 +01:00
Daniel Holbach
6177c3a996 Merge pull request #217 from weaveworks/post-210-cleanup
Clean up deps, update docs to explain state post-210
2020-11-05 11:10:36 +01:00
Jean-Philippe Evrard
5d88e6c6db Make lint happier in pkg folder
Without this patch, lint will complain about a few cosmetic details.
2020-11-05 11:01:49 +01:00
Jean-Philippe Evrard
7091debe23 Make lint happier
Without this, golint is complaining about a few cosmetic changes.
This solves it, and is necessary if we want to add a lint test
in CI.
2020-11-05 10:14:39 +01:00
Jean-Philippe Evrard
ce6075c800 Remove prom-active-alerts
Prom-active-alerts command is not used, not tested, and
currently broken. Let's remove it.
2020-11-05 10:13:50 +01:00
Daniel Holbach
4ed5b823fc update docs following #210 2020-11-04 12:09:20 +01:00
Daniel Holbach
cadb6c263f run 'go mod tidy' 2020-11-04 12:07:05 +01:00
Daniel Holbach
d5fe4fbaec Merge pull request #210 from evrardjp/remove-kubectl-bin
feature: Remove kubectl bin
2020-11-04 12:01:47 +01:00
Maxime VISONNEAU
9648d1d759 Replaced --annotationTTL with --lockTTL and made it work correctly 2020-10-30 10:39:18 +00:00
Jean-Philippe Evrard
e5a2d4acc7 Refactor drain/uncordon
Moving the drainer object close to its usage is more readable.
2020-10-29 11:45:20 +01:00
Jean-Philippe Evrard
e1ba9a975e Remove kubectl exception in container scanning
Because we now have a builtin kubectl, we don't need that
security exception.
2020-10-29 09:56:32 +01:00
Jean-Philippe Evrard
19bf5bf224 Bump prometheus
This is required by the vendoring of kubectl.
2020-10-15 13:02:39 +02:00
Jean-Philippe Evrard
72c4112e20 Use kubectl as library instead of calling from cli 2020-10-15 13:02:35 +02:00
Vicente Zepeda Mas
2f740b7f9a Added support for time wrap in timewindow.Contains
Add test scenarios to test new cases
Organize test scenarios chronologically

Signed-off-by: Vicente Zepeda Mas <vzepedamas@suse.com>
2020-09-28 13:58:43 +02:00
dependabot[bot]
cf58d9a777 Bump k8s.io/apimachinery from 0.18.8 to 0.19.2
Bumps [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) from 0.18.8 to 0.19.2.
- [Release notes](https://github.com/kubernetes/apimachinery/releases)
- [Commits](https://github.com/kubernetes/apimachinery/compare/v0.18.8...v0.19.2)

Signed-off-by: dependabot[bot] <support@github.com>
2020-09-17 05:53:07 +00:00
Daniel Holbach
553e061b94 Merge pull request #199 from evrardjp/ci/add-security-scanner
feat: Add security scanning into CI
2020-09-14 14:50:34 +02:00
Daniel Holbach
598964b0f6 Merge pull request #198 from evrardjp/fix/DKL-DI-0004
fix: Follow DKL-DI-0004 guideline
2020-09-14 12:32:40 +02:00
Jean-Philippe Evrard
b0bd603931 fix: Follow DKL-DI-0004 guideline
Without this patch, we need to build a cache, then remove it.
Since apk allows to work with no-cache and won't leave artifacts,
we should use it.

This will make the dockle best practices scanner happier.
2020-09-11 16:53:59 +02:00
Jean-Philippe Evrard
8961cbf262 feat: Add security scanning into CI
Without this patch, there is no way we can see, in the development
process, if the image we are about to publish is insecure.

This is a problem as we might be releasing new versions of kured
with outdated base image which contains vulnerabilities.

This fixes it by creating a job which will show any eventual
vulnerability.
2020-09-10 15:16:05 +02:00
Daniel Holbach
13708ad1dc Merge pull request #194 from dholbach/add-missing-quote
add missing quote - thanks Karan Arora for reporting
2020-09-09 15:55:06 +02:00
Daniel Holbach
83dccea063 add missing quote - thanks Karan Arora for reporting 2020-09-09 15:13:57 +02:00
Daniel Holbach
10e8dbda97 Merge pull request #166 from smueller18/param-alert-filterregexp
Remove quote for parameter alert-filter-regexp
2020-09-04 19:53:56 +02:00
Stephan Müller
a6c0a4a7cb Bump helm chart version 2020-09-04 17:34:26 +02:00
Stephan Müller
8b4b92f237 Remove quote for parameter alert-filter-regexp 2020-09-04 17:33:47 +02:00
Daniel Holbach
ba8c830d63 Merge pull request #192 from dholbach/prep-1.5.0-release
Prepare 1.5.0 release
2020-09-01 15:08:10 +02:00
Daniel Holbach
19d7dee4aa Prepare 1.5.0 release
- update chart versions
- update release documentation
- Christian and David are the chart maintainers
2020-09-01 11:38:17 +02:00
Daniel Holbach
7408cebb6b Merge pull request #187 from weaveworks/dependabot/github_actions/helm/chart-testing-action-v1.0.0
Bump helm/chart-testing-action from v1.0.0-rc.2 to v1.0.0
2020-09-01 10:59:14 +02:00
Daniel Holbach
cd0f3cee12 Merge pull request #188 from weaveworks/dependabot/github_actions/helm/kind-action-v1.0.0
Bump helm/kind-action from v1.0.0-rc.1 to v1.0.0
2020-09-01 10:58:07 +02:00
dependabot[bot]
dadf2cdd48 Bump helm/kind-action from v1.0.0-rc.1 to v1.0.0
Bumps [helm/kind-action](https://github.com/helm/kind-action) from v1.0.0-rc.1 to v1.0.0.
- [Release notes](https://github.com/helm/kind-action/releases)
- [Commits](https://github.com/helm/kind-action/compare/v1.0.0-rc.1...3af270e3dacc4feded63d810def7e19de77cba72)

Signed-off-by: dependabot[bot] <support@github.com>
2020-09-01 08:53:51 +00:00
dependabot[bot]
15c5c47b49 Bump helm/chart-testing-action from v1.0.0-rc.2 to v1.0.0
Bumps [helm/chart-testing-action](https://github.com/helm/chart-testing-action) from v1.0.0-rc.2 to v1.0.0.
- [Release notes](https://github.com/helm/chart-testing-action/releases)
- [Commits](https://github.com/helm/chart-testing-action/compare/v1.0.0-rc.2...96a4323c6cfa90ddea6e02db43143cd80124a7fa)

Signed-off-by: dependabot[bot] <support@github.com>
2020-09-01 08:53:50 +00:00
Daniel Holbach
012e680061 Merge pull request #186 from evrardjp/dependabot
Add dependabot
2020-09-01 10:53:24 +02:00
Jean-Philippe Evrard
1cab9a1d28 Add dependabot
Without this patch, our deps will have to be manually maintained.

This should fix it.
2020-08-31 15:58:21 +02:00
Jean-Philippe Evrard
6793b0882f Release helper
This automates the manifest and helm chart version handling.
Just pass the organisation and version in the make command to
update the manifests/helm charts.

This does not automate the helm chart version and does not
create a manifest used in the release process.
2020-08-28 16:46:53 +02:00
Jean-Philippe Evrard
ad8c0053e2 Update to golang 1.15 2020-08-28 14:35:58 +02:00
Daniel Holbach
549bb9b415 Merge pull request #165 from dholbach/prep-for-k8s-1.19
Prepare for k8s release 1.19 (Aug 25)
2020-08-28 13:49:48 +02:00
Daniel Holbach
3ebc224958 update alpine to 3.12, k8s 1.18.8 2020-08-28 10:27:39 +02:00
Jean-Philippe Evrard
3d75f1b37a Add smoke/basic functional test
Without this patch, we don't test on release whether kured actually
works and behaves well.

This is a problem, as a functional issue could have been hidden by
a recent change, as our testing is minimalist (only test the
usability, not the functionality).
Instead of testing manually, we should ensure this in CI.

This fixes it by adding a github action which tests the previously
built artifacts before publishing a release. The job consumes the helm
chart in our code tree (note: this relies on the last released image),
and runs a functional test triggering a coordinated restart of a
whole 5 node cluster deployed with kind, through github actions.

Note: The github action needs to reset docker configuration, else
the reboot of the node (a docker container in kind) will fail.
It will be correctly triggered, but the node will not come back up,
with its systemd log mentioning: "Failed to attach 1 to compat systemd cgroup".
2020-08-28 09:25:44 +02:00
Daniel Holbach
16109017ce Prepare for k8s release 1.19 (Aug 25)
This is #152, #139, #127 in disguise.

Maybe this time let it simmer a bit longer until the k8s
release is there?
2020-08-19 17:30:00 +02:00
Daniel Holbach
b024898ed6 Merge pull request #171 from dholbach/prep-1.4.5
Prep 1.4.5 release
2020-08-05 11:21:15 +02:00
Daniel Holbach
19b177372e document how releases are done wrt Helm bits 2020-08-05 11:03:32 +02:00
Daniel Holbach
50f8d037fc bump versions for 1.4.5 release 2020-08-05 11:01:22 +02:00
Daniel Holbach
aea93fd0ac Merge pull request #169 from audunsolemdal/master
Chart: Support extraEnvVars
2020-07-29 15:09:53 +02:00
audunsolemdal
4dd15e3874 Use nindent, not indent 2020-07-29 09:37:31 +02:00
audunsolemdal
c1abad8a92 chart: update readme 2020-07-28 15:07:48 +02:00
audunsolemdal
25491c801b Bump chart version 2020-07-28 15:01:03 +02:00
audunsolemdal
9e42c8ec15 Add missing 'end' 2020-07-28 14:59:23 +02:00
Daniel Holbach
29eb862335 Merge pull request #168 from dholbach/update-to-last-release
update install instructions to use latest
2020-07-25 18:09:53 +02:00
audunsolemdal
d7e58bef3e Chart: Support extraEnvVars 2020-07-21 17:35:59 +02:00
Daniel Holbach
949aa3acf7 update install instructions to use latest 2020-07-21 10:23:49 +02:00
Daniel Holbach
2762837dac Merge pull request #164 from dholbach/1.4.4-release
Prep for 1.4.4 release
2020-07-01 11:27:51 +02:00
Daniel Holbach
d507361a45 update chart version 2020-07-01 10:49:12 +02:00
Daniel Holbach
1d1f22c93b Prep for 1.4.4 release
Drop bit in the docs about updating image tag - not necessary
if you use the instructions.
2020-07-01 10:43:06 +02:00
Daniel Holbach
644aca3fa0 Merge pull request #163 from ckotzbauer/chart-fixes
Additional chart changes for service-handling
2020-06-30 20:28:30 +02:00
Christian Kotzbauer
59b078f38d bump and fix
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2020-06-30 19:21:06 +02:00
Christian Kotzbauer
36cef41c20 split matchLabels template
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2020-06-30 19:18:47 +02:00
Christian Kotzbauer
eb617adc2b restructured and improved service
Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2020-06-30 19:15:32 +02:00
Daniel Holbach
2afd04ddd3 Merge pull request #162 from ckotzbauer/chart-fixes
Several small chart fixes
2020-06-30 18:25:53 +02:00
Christian Kotzbauer
3eb7f17b3a bumped kured to upcoming 1.4.3
fixed servicemonitor indent
fixed quotes for arguments

Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>
2020-06-30 18:00:05 +02:00
Jean-Philippe Evrard
247863109d Bump to golang 1.14 2020-04-08 18:18:36 +02:00
50 changed files with 3691 additions and 583 deletions


@@ -1,28 +0,0 @@
version: 2
jobs:
  build:
    docker:
      - image: cimg/go:1.13
    steps:
      - checkout
      - setup_remote_docker
      - deploy:
          name: Build and push image
          command: |
            echo "$DOCKER_PASS" | docker login --username "$DOCKER_USER" --password-stdin
            if [ -z "${CIRCLE_TAG}" ]; then
              make publish-image
            else
              make VERSION="${CIRCLE_TAG}" publish-image
            fi
workflows:
  version: 2
  build:
    jobs:
      - build:
          filters:
            tags:
              only: /.*/
            branches:
              ignore: gh-pages

.github/ct.yaml

@@ -1,5 +1,6 @@
# See https://github.com/helm/chart-testing#configuration
remote: origin
target-branch: main
chart-dirs:
- charts
chart-repos: []

.github/dependabot.yml

@@ -0,0 +1,21 @@
version: 2
updates:
  # Maintain dependencies for GitHub Actions
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "daily"
  # Maintain dependencies for gomod
  - package-ecosystem: "gomod"
    directory: "/"
    schedule:
      interval: "daily"
    ignore:
      - dependency-name: "k8s.io/api"
      - dependency-name: "k8s.io/apimachinery"
      - dependency-name: "k8s.io/client-go"
      - dependency-name: "k8s.io/kubectl"
  - package-ecosystem: "docker"
    directory: "cmd/kured"
    schedule:
      interval: "daily"

.github/kind-cluster-1.21.yaml

@@ -0,0 +1,13 @@
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    image: kindest/node:v1.21.2
  - role: control-plane
    image: kindest/node:v1.21.2
  - role: control-plane
    image: kindest/node:v1.21.2
  - role: worker
    image: kindest/node:v1.21.2
  - role: worker
    image: kindest/node:v1.21.2

.github/kind-cluster-1.22.yaml

@@ -0,0 +1,13 @@
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    image: kindest/node:v1.22.4
  - role: control-plane
    image: kindest/node:v1.22.4
  - role: control-plane
    image: kindest/node:v1.22.4
  - role: worker
    image: kindest/node:v1.22.4
  - role: worker
    image: kindest/node:v1.22.4

.github/kind-cluster-1.23.yaml

@@ -0,0 +1,13 @@
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    image: "kindest/node:v1.23.0"
  - role: control-plane
    image: "kindest/node:v1.23.0"
  - role: control-plane
    image: "kindest/node:v1.23.0"
  - role: worker
    image: "kindest/node:v1.23.0"
  - role: worker
    image: "kindest/node:v1.23.0"


@@ -1,32 +0,0 @@
name: lint-chart
on:
pull_request:
paths:
- "charts/**"
jobs:
lint-test:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v2
with:
fetch-depth: "0"
- name: Run chart-testing (lint)
id: lint
uses: helm/chart-testing-action@v1.0.0-rc.2
with:
command: lint
config: .github/ct.yaml
- name: Create kind cluster
uses: helm/kind-action@v1.0.0-rc.1
if: steps.lint.outputs.changed == 'true'
- name: Run chart-testing (install)
uses: helm/chart-testing-action@v1.0.0-rc.2
with:
command: install
config: .github/ct.yaml


@@ -1,16 +0,0 @@
name: "Check links"
on: [pull_request, push]
jobs:
docs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v1
- name: Link Checker
id: lc
uses: peter-evans/link-checker@v1
with:
args: -r *.md *.yaml */*/*.go -x .cluster.local
- name: Fail if there were link errors
run: exit ${{ steps.lc.outputs.exit_code }}


@@ -1,11 +1,14 @@
name: release-chart
name: Publish helm chart
on:
push:
tags:
- "*"
branches:
- "main"
paths:
- "charts/**"
jobs:
publish:
publish-helm-chart:
name: Publish latest chart
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2

59
.github/workflows/on-main-push.yaml vendored Normal file

@@ -0,0 +1,59 @@
# We publish every merged commit in the form of an image
# named kured:<branch>-<short tag>
name: Push image of latest main
on:
push:
branches:
- main
jobs:
tag-scan-and-push-final-image:
name: "Build, scan, and publish tagged image"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Find go version
run: |
GO_VERSION=$(awk '/^go/ {print $2};' go.mod)
echo "::set-output name=version::${GO_VERSION}"
id: awk_gomod
- name: Ensure go version
uses: actions/setup-go@v2
with:
go-version: "${{ steps.awk_gomod.outputs.version }}"
- name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME_WEAVEWORKSKUREDCI }}
password: ${{ secrets.DOCKERHUB_TOKEN_WEAVEWORKSKUREDCI }}
- name: Login to ghcr.io
uses: docker/login-action@v1
with:
registry: ghcr.io
username: weave-ghcr-bot
password: ${{ secrets.KURED_WEAVE_GHCR_BOT_TOKEN }}
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@v1
- name: Find current tag version
run: echo "::set-output name=sha_short::$(git rev-parse --short HEAD)"
id: tags
- name: Build image
uses: docker/build-push-action@v2
with:
context: .
file: cmd/kured/Dockerfile.multi
platforms: linux/arm64, linux/amd64
push: true
tags: |
docker.io/${{ GITHUB.REPOSITORY }}:main-${{ steps.tags.outputs.sha_short }}
ghcr.io/${{ GITHUB.REPOSITORY }}:main-${{ steps.tags.outputs.sha_short }}
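
The `Find go version` step in this workflow reads the Go version from `go.mod` with awk and exposes it as a step output. A minimal standalone sketch of that idea follows. The function names are hypothetical, and writing to the file named by `GITHUB_OUTPUT` is an assumption based on GitHub having since deprecated the `::set-output` workflow command; the workflow itself still uses `::set-output`.

```sh
# Print the version declared by the `go` directive of a go.mod file.
# Matching on the first field being exactly "go" avoids picking up any
# other top-level line that merely starts with those two letters.
go_version_from_mod() {
    awk '$1 == "go" { print $2; exit }' "${1:-go.mod}"
}

# Publish the version as a step output named "version".
# GITHUB_OUTPUT is the file the Actions runner provides; outside of a
# runner this sketch falls back to printing on stdout.
publish_go_version() {
    version="$(go_version_from_mod "${1:-go.mod}")"
    echo "version=${version}" >> "${GITHUB_OUTPUT:-/dev/stdout}"
}
```

In a workflow step this would be called as `publish_go_version go.mod`, with later steps reading `steps.<id>.outputs.version` as they do today.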

78
.github/workflows/on-pr-charts.yaml vendored Normal file

@@ -0,0 +1,78 @@
# This is just extra testing: a lint check and a basic installation.
# These can fail earlier than the functional tests (they are shorter)
# and give developers early feedback if they didn't test themselves.
name: PR - charts
on:
pull_request:
paths:
- "charts/**"
jobs:
# We create two jobs (with a matrix) instead of one to make those parallel.
# We don't need to conditionally check if something has changed, due to github actions
# tackling that for us.
# Fail-fast ensures that if one of those matrix jobs fails, the other one gets cancelled.
test-chart:
name: Test helm chart changes
runs-on: ubuntu-latest
strategy:
fail-fast: true
matrix:
test-action:
- lint
- install
steps:
- name: Checkout
uses: actions/checkout@v2
with:
fetch-depth: "0"
- uses: actions/setup-python@v2
with:
python-version: 3.7
# Helm is already present in github actions, so do not re-install it
- name: Setup chart testing
uses: helm/chart-testing-action@v2.2.0
- name: Create default kind cluster
uses: helm/kind-action@v1.2.0
with:
version: v0.11.0
if: ${{ matrix.test-action == 'install' }}
- name: Run chart tests
run: ct ${{ matrix.test-action }} --config .github/ct.yaml
# This doesn't re-use the ct actions, due to many limitations (auto tear down, no real testing)
deploy-chart:
name: Functional test of helm chart in its current state (needs published image of the helm chart)
runs-on: ubuntu-latest
needs: test-chart
steps:
- uses: actions/checkout@v2
# Default name for helm/kind-action kind clusters is "chart-testing"
- name: Create 1 node kind cluster
uses: helm/kind-action@v1.2.0
with:
version: v0.11.0
- name: Deploy kured on default namespace with its helm chart
run: |
# Documented in official helm doc to live on the edge
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Refresh bins
hash -r
helm install kured ./charts/kured/ --set configuration.period=1m --wait
kubectl config set-context kind-chart-testing
kubectl get ds --all-namespaces
kubectl describe ds kured
- name: Test if successful deploy
uses: nick-invision/retry@v2.6.0
with:
timeout_minutes: 10
max_attempts: 10
retry_wait_seconds: 10
# DESIRED CURRENT READY UP-TO-DATE AVAILABLE should all be = to cluster_size
command: "kubectl get ds kured | grep -E 'kured.*1.*1.*1.*1.*1'"

336
.github/workflows/on-pr.yaml vendored Normal file

@@ -0,0 +1,336 @@
name: PR
on:
pull_request:
push:
jobs:
pr-gotest:
name: Run go tests
runs-on: ubuntu-18.04
steps:
- name: checkout
uses: actions/checkout@v2
- name: Find go version
run: |
GO_VERSION=$(awk '/^go/ {print $2};' go.mod)
echo "::set-output name=version::${GO_VERSION}"
id: awk_gomod
- name: Ensure go version
uses: actions/setup-go@v2
with:
go-version: "${{ steps.awk_gomod.outputs.version }}"
- name: run tests
run: go test -json ./... > test.json
- name: Annotate tests
if: always()
uses: guyarb/golang-test-annoations@v0.5.0
with:
test-results: test.json
pr-shellcheck:
name: Lint bash code with shellcheck
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Run ShellCheck
uses: bewuethr/shellcheck-action@v2
pr-lint-code:
name: Lint golang code
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Find go version
run: |
GO_VERSION=$(awk '/^go/ {print $2};' go.mod)
echo "::set-output name=version::${GO_VERSION}"
id: awk_gomod
- name: Ensure go version
uses: actions/setup-go@v2
with:
go-version: "${{ steps.awk_gomod.outputs.version }}"
- name: Lint cmd folder
uses: Jerome1337/golint-action@v1.0.2
with:
golint-path: './cmd/...'
- name: Lint pkg folder
uses: Jerome1337/golint-action@v1.0.2
with:
golint-path: './pkg/...'
pr-check-docs-links:
name: Check docs for incorrect links
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Link Checker
id: lc
uses: peter-evans/link-checker@v1
with:
args: -r *.md *.yaml */*/*.go -x .cluster.local
- name: Fail if there were link errors
run: exit ${{ steps.lc.outputs.exit_code }}
# This should not be made a mandatory test
# It is only used to make us aware of any potential security failure, that
# should trigger a bump of the image in build/.
pr-vuln-scan:
name: Build image and scan it against known vulnerabilities
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Find go version
run: |
GO_VERSION=$(awk '/^go/ {print $2};' go.mod)
echo "::set-output name=version::${GO_VERSION}"
id: awk_gomod
- name: Ensure go version
uses: actions/setup-go@v2
with:
go-version: "${{ steps.awk_gomod.outputs.version }}"
- run: make DH_ORG="${{ github.repository_owner }}" VERSION="${{ github.sha }}" image
- uses: Azure/container-scan@v0
with:
image-name: docker.io/${{ github.repository_owner }}/kured:${{ github.sha }}
# This ensures the latest code works with the manifests built from tree.
# It is useful for two things:
# - Test manifests changes (obviously), ensuring they don't break existing clusters
# - Ensure manifests work with the latest versions even with no manifest change
# (compared to helm charts, manifests cannot easily template changes based on versions)
# Helm charts are _trailing_ releases, while manifests are done during development.
e2e-manifests:
name: End-to-End test with kured with code and manifests from HEAD
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
kubernetes:
- "1.21"
- "1.22"
- "1.23"
steps:
- uses: actions/checkout@v2
- name: Find go version
run: |
GO_VERSION=$(awk '/^go/ {print $2};' go.mod)
echo "::set-output name=version::${GO_VERSION}"
id: awk_gomod
- name: Ensure go version
uses: actions/setup-go@v2
with:
go-version: "${{ steps.awk_gomod.outputs.version }}"
- name: Build artifacts
run: |
make DH_ORG="${{ github.repository_owner }}" VERSION="${{ github.sha }}" image
make DH_ORG="${{ github.repository_owner }}" VERSION="${{ github.sha }}" manifest
- name: Workaround "Failed to attach 1 to compat systemd cgroup /actions_job/..." on gh actions
run: |
sudo bash << EOF
cp /etc/docker/daemon.json /etc/docker/daemon.json.old
echo '{}' > /etc/docker/daemon.json
systemctl restart docker || journalctl --no-pager -n 500
systemctl status docker
EOF
# Default name for helm/kind-action kind clusters is "chart-testing"
- name: Create kind cluster with 5 nodes
uses: helm/kind-action@v1.2.0
with:
config: .github/kind-cluster-${{ matrix.kubernetes }}.yaml
version: v0.11.0
- name: Preload previously built images onto kind cluster
run: kind load docker-image docker.io/${{ github.repository_owner }}/kured:${{ github.sha }} --name chart-testing
- name: Do not wait for an hour before detecting the rebootSentinel
run: |
sed -i 's/#\(.*\)--period=1h/\1--period=30s/g' kured-ds.yaml
- name: Install kured with kubectl
run: |
kubectl apply -f kured-rbac.yaml && kubectl apply -f kured-ds.yaml
- name: Ensure kured is ready
uses: nick-invision/retry@v2.6.0
with:
timeout_minutes: 10
max_attempts: 10
retry_wait_seconds: 60
# DESIRED CURRENT READY UP-TO-DATE AVAILABLE should all be = to cluster_size
command: "kubectl get ds -n kube-system kured | grep -E 'kured.*5.*5.*5.*5.*5'"
- name: Create reboot sentinel files
run: |
./tests/kind/create-reboot-sentinels.sh
- name: Follow reboot until success
env:
DEBUG: true
run: |
./tests/kind/follow-coordinated-reboot.sh
scenario-prom-helm:
name: Test prometheus with latest code from HEAD (=overrides image of the helm chart)
runs-on: ubuntu-latest
# only build with oldest and newest supported, it should be good enough.
strategy:
fail-fast: false
matrix:
kubernetes:
- "1.21"
steps:
- uses: actions/checkout@v2
- name: Find go version
run: |
GO_VERSION=$(awk '/^go/ {print $2};' go.mod)
echo "::set-output name=version::${GO_VERSION}"
id: awk_gomod
- name: Ensure go version
uses: actions/setup-go@v2
with:
go-version: "${{ steps.awk_gomod.outputs.version }}"
- name: Build artifacts
run: |
make DH_ORG="${{ github.repository_owner }}" VERSION="${{ github.sha }}" image
make DH_ORG="${{ github.repository_owner }}" VERSION="${{ github.sha }}" helm-chart
- name: Workaround 'Failed to attach 1 to compat systemd cgroup /actions_job/...' on gh actions
run: |
sudo bash << EOF
cp /etc/docker/daemon.json /etc/docker/daemon.json.old
echo '{}' > /etc/docker/daemon.json
systemctl restart docker || journalctl --no-pager -n 500
systemctl status docker
EOF
# Default name for helm/kind-action kind clusters is "chart-testing"
- name: Create 1 node kind cluster
uses: helm/kind-action@v1.2.0
with:
version: v0.11.0
- name: Preload previously built images onto kind cluster
run: kind load docker-image docker.io/${{ github.repository_owner }}/kured:${{ github.sha }} --name chart-testing
- name: Deploy kured on default namespace with its helm chart
run: |
# Documented in official helm doc to live on the edge
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Refresh bins
hash -r
helm install kured ./charts/kured/ --wait --values ./charts/kured/ci/prometheus-values.yaml
kubectl config set-context kind-chart-testing
kubectl get ds --all-namespaces
kubectl describe ds kured
- name: Ensure kured is ready
uses: nick-invision/retry@v2.6.0
with:
timeout_minutes: 10
max_attempts: 10
retry_wait_seconds: 60
# DESIRED CURRENT READY UP-TO-DATE AVAILABLE
command: "kubectl get ds kured | grep -E 'kured.*1.*1.*1.*1.*1' "
- name: Get metrics (healthy)
uses: nick-invision/retry@v2.6.0
with:
timeout_minutes: 2
max_attempts: 12
retry_wait_seconds: 5
command: "./tests/kind/test-metrics.sh 0"
- name: Create reboot sentinel files
run: |
./tests/kind/create-reboot-sentinels.sh
- name: Get metrics (need reboot)
uses: nick-invision/retry@v2.6.0
with:
timeout_minutes: 15
max_attempts: 10
retry_wait_seconds: 60
command: "./tests/kind/test-metrics.sh 1"
# TEMPLATE Scenario testing.
# Note: keep in mind that the helm chart's appVersion is overridden to test your HEAD of the branch,
# if you `make helm-chart`.
# This will allow you to test properly your scenario and not use an existing image which will not
# contain your feature.
# scenario-<REPLACETHIS>-helm:
# #example: Testing <REPLACETHIS> with helm chart and code from HEAD"
# name: "<REPLACETHIS>"
# runs-on: ubuntu-latest
# strategy:
# fail-fast: false
# # You can define your own kubernetes versions. For example if your helm chart change should behave differently with different kubernetes versions.
# matrix:
# kubernetes:
# - "1.20"
# steps:
# - uses: actions/checkout@v2
# - name: Find go version
# run: |
# GO_VERSION=$(awk '/^go/ {print $2};' go.mod)
# echo "::set-output name=version::${GO_VERSION}"
# id: awk_gomod
# - name: Ensure go version
# uses: actions/setup-go@v2
# with:
# go-version: "${{ steps.awk_gomod.outputs.version }}"
# - name: Build artifacts
# run: |
# make DH_ORG="${{ github.repository_owner }}" VERSION="${{ github.sha }}" image
# make DH_ORG="${{ github.repository_owner }}" VERSION="${{ github.sha }}" helm-chart
#
# - name: "Workaround 'Failed to attach 1 to compat systemd cgroup /actions_job/...' on gh actions"
# run: |
# sudo bash << EOF
# cp /etc/docker/daemon.json /etc/docker/daemon.json.old
# echo '{}' > /etc/docker/daemon.json
# systemctl restart docker || journalctl --no-pager -n 500
# systemctl status docker
# EOF
#
# # Default name for helm/kind-action kind clusters is "chart-testing"
# - name: Create 5 node kind cluster
# uses: helm/kind-action@master
# with:
# config: .github/kind-cluster-${{ matrix.kubernetes }}.yaml
#
# - name: Preload previously built images onto kind cluster
# run: kind load docker-image docker.io/${{ github.repository_owner }}/kured:${{ github.sha }} --name chart-testing
#
# - name: Deploy kured on default namespace with its helm chart
# run: |
# # Documented in official helm doc to live on the edge
# curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# # Refresh bins
# hash -r
# helm install kured ./charts/kured/ --wait --values ./charts/kured/ci/<REPLACETHIS>-values.yaml
# kubectl config set-context kind-chart-testing
# kubectl get ds --all-namespaces
# kubectl describe ds kured
#
# - name: Ensure kured is ready
# uses: nick-invision/retry@v2.6.0
# with:
# timeout_minutes: 10
# max_attempts: 10
# retry_wait_seconds: 60
# # DESIRED CURRENT READY UP-TO-DATE AVAILABLE should all be = 5
# command: "kubectl get ds kured | grep -E 'kured.*5.*5.*5.*5.*5' "
#
# - name: Create reboot sentinel files
# run: |
# ./tests/kind/create-reboot-sentinels.sh
#
# - name: Test <REPLACETHIS>
# env:
# DEBUG: true
# run: |
# <TODO>
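
The `Do not wait for an hour before detecting the rebootSentinel` step in this workflow relies on a single `sed` substitution that both uncomments the `--period` argument and shortens it. A small sketch of the same expression as a stdin-to-stdout filter is below; the function name is hypothetical, and it assumes the manifest carries the flag as a commented-out line such as `#            - --period=1h`.

```sh
# Uncomment the "--period=1h" argument and shorten it to 30s.
# The capture group keeps whatever sits between the leading "#" and the
# flag (indentation and the list dash), so the YAML nesting is preserved.
# Reads a manifest on stdin and writes the result to stdout.
shorten_period() {
    sed 's/#\(.*\)--period=1h/\1--period=30s/g'
}
```

Running `shorten_period < kured-ds.yaml` previews the change without editing the file in place, which can be handy before relying on `sed -i`.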

63
.github/workflows/on-tag.yaml vendored Normal file

@@ -0,0 +1,63 @@
# when we add a tag to the repo, we should publish the kured image to a public repository
# if it's safe.
# It doesn't mean it's ready for release, but at least it's getting us started.
# The next step is to have a PR with the helm chart, to bump the version of the image used
name: Tag repo
on:
push:
tags:
- "*"
jobs:
tag-scan-and-push-final-image:
name: "Build, scan, and publish tagged image"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Find go version
run: |
GO_VERSION=$(awk '/^go/ {print $2};' go.mod)
echo "::set-output name=version::${GO_VERSION}"
id: awk_gomod
- name: Ensure go version
uses: actions/setup-go@v2
with:
go-version: "${{ steps.awk_gomod.outputs.version }}"
- name: Find current tag version
run: echo "::set-output name=version::${GITHUB_REF#refs/tags/}"
id: tags
- run: |
make DH_ORG="${{ github.repository_owner }}" VERSION="${{ steps.tags.outputs.version }}" image
- uses: Azure/container-scan@v0
with:
image-name: docker.io/${{ github.repository_owner }}/kured:${{ steps.tags.outputs.version }}
- name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME_WEAVEWORKSKUREDCI }}
password: ${{ secrets.DOCKERHUB_TOKEN_WEAVEWORKSKUREDCI }}
- name: Login to ghcr.io
uses: docker/login-action@v1
with:
registry: ghcr.io
username: weave-ghcr-bot
password: ${{ secrets.KURED_WEAVE_GHCR_BOT_TOKEN }}
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@v1
- name: Build image
uses: docker/build-push-action@v2
with:
context: .
file: cmd/kured/Dockerfile.multi
platforms: linux/arm64, linux/amd64, linux/arm/v7, linux/arm/v6, linux/386
push: true
tags: |
docker.io/${{ GITHUB.REPOSITORY }}:${{ steps.tags.outputs.version }}
ghcr.io/${{ GITHUB.REPOSITORY }}:${{ steps.tags.outputs.version }}
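
The `Find current tag version` step in this workflow derives the image tag from `GITHUB_REF` with a POSIX prefix-removal expansion. A minimal sketch of that expansion in isolation follows; the function name is hypothetical.

```sh
# Strip the "refs/tags/" prefix from a Git ref using POSIX parameter
# expansion. A ref without that prefix is returned unchanged.
tag_from_ref() {
    ref="$1"
    echo "${ref#refs/tags/}"
}
```

For example, `tag_from_ref "refs/tags/1.9.1"` yields `1.9.1`, which is the value the workflow passes to `make` as `VERSION`.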

136
.github/workflows/periodics-daily.yaml vendored Normal file

@@ -0,0 +1,136 @@
name: Daily jobs
on:
schedule:
- cron: "30 1 * * *"
jobs:
periodics-gotest:
name: Run go tests
runs-on: ubuntu-18.04
steps:
- name: checkout
uses: actions/checkout@v2
- name: run tests
run: go test -json ./... > test.json
- name: Annotate tests
if: always()
uses: guyarb/golang-test-annoations@v0.5.0
with:
test-results: test.json
periodics-mark-stale:
name: Mark stale issues and PRs
runs-on: ubuntu-latest
steps:
# Stale by default waits for 60 days before marking PR/issues as stale, and closes them after 21 days.
# Do not expire the first issues that would allow the community to grow.
- uses: actions/stale@v4
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
stale-issue-message: 'This issue was automatically considered stale due to lack of activity. Please update it and/or join our slack channels to promote it, before it automatically closes (in 21 days).'
stale-pr-message: 'This PR was automatically considered stale due to lack of activity. Please refresh it and/or join our slack channels to highlight it, before it automatically closes (in 21 days).'
stale-issue-label: 'no-issue-activity'
stale-pr-label: 'no-pr-activity'
exempt-issue-labels: 'good first issue,keep'
days-before-close: 21
check-docs-links:
name: Check docs for incorrect links
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Link Checker
id: lc
uses: peter-evans/link-checker@v1
with:
args: -r *.md *.yaml */*/*.go -x .cluster.local
- name: Fail if there were link errors
run: exit ${{ steps.lc.outputs.exit_code }}
vuln-scan:
name: Build image and scan it against known vulnerabilities
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Find go version
run: |
GO_VERSION=$(awk '/^go/ {print $2};' go.mod)
echo "::set-output name=version::${GO_VERSION}"
id: awk_gomod
- name: Ensure go version
uses: actions/setup-go@v2
with:
go-version: "${{ steps.awk_gomod.outputs.version }}"
- run: make DH_ORG="${{ github.repository_owner }}" VERSION="${{ github.sha }}" image
- uses: Azure/container-scan@v0
with:
image-name: docker.io/${{ github.repository_owner }}/kured:${{ github.sha }}
deploy-helm:
name: Ensure our currently released helm chart works on all kubernetes versions
runs-on: ubuntu-latest
# only build with oldest and newest supported, it should be good enough.
strategy:
matrix:
kubernetes:
- "1.21"
- "1.22"
- "1.23"
steps:
- uses: actions/checkout@v2
- name: Find go version
run: |
GO_VERSION=$(awk '/^go/ {print $2};' go.mod)
echo "::set-output name=version::${GO_VERSION}"
id: awk_gomod
- name: Ensure go version
uses: actions/setup-go@v2
with:
go-version: "${{ steps.awk_gomod.outputs.version }}"
- name: "Workaround 'Failed to attach 1 to compat systemd cgroup /actions_job/...' on gh actions"
run: |
sudo bash << EOF
cp /etc/docker/daemon.json /etc/docker/daemon.json.old
echo '{}' > /etc/docker/daemon.json
systemctl restart docker || journalctl --no-pager -n 500
systemctl status docker
EOF
# Default name for helm/kind-action kind clusters is "chart-testing"
- name: Create 5 node kind cluster
uses: helm/kind-action@v1.2.0
with:
config: .github/kind-cluster-${{ matrix.kubernetes }}.yaml
version: v0.11.0
- name: Deploy kured on default namespace with its helm chart
run: |
# Documented in official helm doc to live on the edge
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Refresh bins
hash -r
helm install kured ./charts/kured/ --set configuration.period=1m
kubectl config set-context kind-chart-testing
kubectl get ds --all-namespaces
kubectl describe ds kured
- name: Ensure kured is ready
uses: nick-invision/retry@v2.6.0
with:
timeout_minutes: 10
max_attempts: 10
retry_wait_seconds: 60
# DESIRED CURRENT READY UP-TO-DATE AVAILABLE should all be = 5
command: "kubectl get ds kured | grep -E 'kured.*5.*5.*5.*5.*5' "
- name: Create reboot sentinel files
run: |
./tests/kind/create-reboot-sentinels.sh
- name: Follow reboot until success
env:
DEBUG: true
run: |
./tests/kind/follow-coordinated-reboot.sh

1
.gitignore vendored

@@ -1,4 +1,3 @@
cmd/kured/kured
cmd/prom-active-alerts/prom-active-alerts
vendor
build


@@ -13,11 +13,12 @@ you are planning to contribute code.
[issues]: https://github.com/weaveworks/kured/issues
[readme]: README.md
## Updating k8s support
## Regular development activities
Whenever we want to update e.g. [`kubectl` in the
image](cmd/kured/Dockerfile), we need to consider if we update `client-go`
as well, some RBAC changes might be necessary too.
### Updating k8s support
Whenever we want to update e.g. the `kubectl` or `client-go` dependencies,
some RBAC changes might be necessary too.
This is what it took to support Kubernetes 1.14:
<https://github.com/weaveworks/kured/pull/75>
@@ -25,15 +26,90 @@ This is what it took to support Kubernetes 1.14:
That the process can be more involved than that can be seen in
<https://github.com/weaveworks/kured/commits/support-k8s-1.10>
Please update our `.github/workflows` with the new k8s images: start by
creating a `.github/kind-cluster-<version>.yaml`, then update
our workflows with the new versions.
Once you have updated everything, make sure you update the support matrix on
the main [README][readme] as well.
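
As an illustration of the first step, the existing kind cluster configs all share one shape (three control-plane nodes and two workers on the same node image), so a new one could be generated rather than copied by hand. The sketch below is only a convenience under that assumption; the function name is hypothetical and the node image tag must exist for the kind release in use.

```sh
# Print a kind cluster config with 3 control-plane and 2 worker nodes,
# all running the given kindest/node image tag (for example v1.23.0).
kind_cluster_config() {
    node_image="kindest/node:$1"
    printf '%s\n' 'kind: Cluster'
    printf '%s\n' 'apiVersion: kind.x-k8s.io/v1alpha4'
    printf '%s\n' 'nodes:'
    for role in control-plane control-plane control-plane worker worker; do
        printf '%s\n' "- role: $role"
        printf '%s\n' "  image: \"$node_image\""
    done
}
```

Redirecting the output, for instance `kind_cluster_config v1.23.0 > .github/kind-cluster-1.23.yaml`, produces a file in the same layout as the existing configs.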
## Release testing
### Updating other dependencies
Dependabot proposes changes in our go.mod/go.sum.
Some of those changes are covered by CI testing, some are not.
Please make sure to test those not covered by CI (mostly the integration
with other tools) manually before merging.
### Review periodic jobs
We run periodic jobs (see also the Automated testing section of this documentation).
Those should be monitored for failures.
If a failure happens in the periodic jobs, something has gone seriously wrong
(or GitHub is failing to create a kind cluster). Please monitor those
failures carefully.
### Introducing new features
When you introduce a new feature, the kured team expects you to have tested
your change thoroughly. If possible, include all the necessary testing in your change.
If your change involves a user-facing change (a change in kured's flags, for example),
please expose your new feature in our default manifest (`kured-ds.yaml`),
as a comment.
Do not update the helm chart directly.
Helm charts and our release manifests (see below) are our stable interfaces.
Any user facing changes will therefore have to wait for a while before being
exposed to our users.
This also means that when you expose a new feature, you should create another PR
for your changes in `charts/` to make your feature available for our next kured version.
In this change, you can directly bump the appVersion to the next minor version
(for example, if the current appVersion is 1.6.x, update your appVersion
to 1.7.0). It allows us to have an easy view of what we land each release.
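
The "next minor version" rule above can be computed mechanically. This is a minimal sketch with a hypothetical helper name, assuming plain `MAJOR.MINOR.PATCH` version strings without prefixes or suffixes.

```sh
# Given a version of the form MAJOR.MINOR.PATCH, print the next minor
# release: the minor component is incremented and the patch reset to 0.
next_minor() {
    echo "$1" | awk -F. '{ printf "%d.%d.0\n", $1, $2 + 1 }'
}
```

For instance, `next_minor 1.6.3` prints `1.7.0`, matching the example given in the text.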
Do not hesitate to increase the test coverage for your feature, from unit
testing to full functional testing (even using helm charts).
### Increasing test coverage
We welcome any change that increases our test coverage.
See also our github issues for the label `testing`.
### Updating helm charts
Helm charts are continuously published. Any change in `charts/` will be immediately
pushed to production.
## Automated testing
Our CI is covered by github actions.
You can see their contents in .github/workflows.
We currently run:
- go tests and lint
- shellcheck
- a check for dead links in our docs
- a security check against our base image (alpine)
- a deep functional test using our manifests on all supported k8s versions
- basic deployment using our helm chart on any chart change
Changes in helm charts are not functionally tested on PRs. We assume that
the PRs to implement the feature are properly tested by our users and
contributors before merge.
To test your code manually, follow the section Manual testing.
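
The functional tests in our workflows decide that kured is ready by grepping the output of `kubectl get ds kured` for a pattern such as `kured.*5.*5.*5.*5.*5`. Because `.*` matches anything, that pattern can also match unrelated digits (an AGE of `5m55s`, for instance). A stricter, column-based check could be sketched as follows; the function name is hypothetical, and the column order is the one listed in the workflow comments (DESIRED, CURRENT, READY, UP-TO-DATE, AVAILABLE after the NAME column).

```sh
# Read `kubectl get ds <name>` output on stdin and succeed only when the
# DESIRED, CURRENT, READY, UP-TO-DATE and AVAILABLE columns of the named
# DaemonSet all equal the expected node count.
ds_all_ready() {
    name="$1"
    expected="$2"
    awk -v name="$name" -v n="$expected" '
        $1 == name && $2 == n && $3 == n && $4 == n && $5 == n && $6 == n { ok = 1 }
        END { exit !ok }
    '
}
```

It would be used as `kubectl get ds kured | ds_all_ready kured 5` inside a retry loop, in place of the `grep -E` command.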
## Manual (release) testing
Before `kured` is released, we want to make sure it still works fine on the
previous, current and next minor version of Kubernetes (with respect to the
embedded `client-go` & `kubectl`). For local testing e.g. `minikube` can be
sufficient.
`client-go` & `kubectl` dependencies in use). For local testing e.g.
`minikube` or `kind` can be sufficient. This will allow you to catch issues
that might not have been tested in our CI, like integration with other tools,
or your specific use case.
Deploy kured in your test scenario, make sure you pass the right `image`,
update the e.g. `period` and `reboot-days` options, so you get immediate
@@ -43,13 +119,21 @@ results, if you login to a node and run:
sudo touch /var/run/reboot-required
```
### Testing with `minikube`
### Example of golang testing
Please run `make test`. You should have golint installed.
### Example of testing with `minikube`
A test-run with `minikube` could look like this:
```console
# start minikube
minikube start --vm-driver kvm2 --kubernetes-version <k8s-release>
# build kured image and publish to registry accessible by minikube
make image minikube-publish
# edit kured-ds.yaml to
# - point to new image
# - change e.g. period and reboot-days option for immediate results
@@ -58,6 +142,10 @@ minikube kubectl -- apply -f kured-rbac.yaml
minikube kubectl -- apply -f kured-ds.yaml
minikube kubectl -- logs daemonset.apps/kured -n kube-system -f
# Alternatively use helm to install the chart
# edit values-local.yaml to change any chart parameters
helm install kured ./charts/kured --namespace kube-system -f ./charts/kured/values.minikube.yaml
# In separate terminal
minikube ssh
sudo touch /var/run/reboot-required
@@ -75,32 +163,73 @@ If all the tests ran well, kured maintainers can reach out to the Weaveworks
team to get an upcoming `kured` release tested in the Dev environment for
real life testing.
### Example of testing with `kind`
A test-run with `kind` could look like this:
```console
# create kind cluster
kind create cluster --config .github/kind-cluster-<k8s-version>.yaml
# create reboot required files on pre-defined kind nodes
./tests/kind/create-reboot-sentinels.sh
# check if reboot is working fine
./tests/kind/follow-coordinated-reboot.sh
```
## Publishing a new kured release
### Prepare Documentation
Check that `README.md` has an updated compatibility matrix and that the
url in the `kubectl` incantation (under "Installation") is updated to the
new version you want to release.
### Create a tag on the repo
Before going further, we should freeze the code for a release, by
tagging the code. The Github-Action should start a new job and push
the new image to the registry.
### Create the combined manifest
Now create the `kured-<release>-dockerhub.yaml` for e.g. `1.3.0`:
```sh
VERSION=1.3.0
MANIFEST="kured-$VERSION-dockerhub.yaml"
make DH_ORG="weaveworks" VERSION="${VERSION}" manifest
cat kured-rbac.yaml > "$MANIFEST"
cat kured-ds.yaml >> "$MANIFEST"
sed -i "s#docker.io/weaveworks/kured#docker.io/weaveworks/kured:$VERSION#g" "$MANIFEST"
```
The last thing you need to do is update the `image:` to point to the release
tag, e.g. `docker.io/weaveworks/kured:1.3.0`.
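
Because the image reference is rewritten by more than one command above, a quick check of the combined manifest before uploading it can catch a missing or doubled tag. The following is a minimal sketch with a hypothetical function name; it assumes each image appears on its own unquoted `image: <reference>` line, as in `kured-ds.yaml`.

```sh
# Succeed only if the manifest contains at least one "image:" line and
# every such line references exactly the expected image (name and tag).
manifest_uses_image() {
    manifest="$1"
    want="$2"
    awk -v want="$want" '
        $1 == "image:" { seen = 1; if ($2 != want) bad = 1 }
        END { exit (bad || !seen) }
    ' "$manifest"
}
```

For the example release this would be run as `manifest_uses_image "$MANIFEST" "docker.io/weaveworks/kured:$VERSION"`.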
### Publish release artifacts
Now you can head to the Github UI, use the version number as tag and upload the
`kured-<release>-dockerhub.yaml` file.
### Release notes
Please describe what's new and noteworthy in the release notes, list the PRs
that landed and give a shout-out to everyone who contributed.
Please also note down which releases the upcoming `kured` release was
tested on. (Check old release notes if you're unsure.)
### Update the Helm chart
You can automatically bump the helm chart's application version
with the latest image tag by running:
```sh
make DH_ORG="weaveworks" VERSION="1.3.0" helm-chart
```
A change in the helm chart requires a bump of the `version`
in `charts/kured/Chart.yaml` (following the versioning rules).
Update it, and issue a PR. Upon merge, that PR will automatically
publish the chart to the gh-pages branch.
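
The chart `version` bump is a manual step. As a small aid, the next value can be computed from the current `Chart.yaml`. The sketch below uses a hypothetical helper name and assumes a patch-level bump is the appropriate one under the versioning rules; it only prints the proposed version and does not modify the file.

```sh
# Print the chart's `version:` field from a Chart.yaml with its patch
# component incremented. Surrounding double quotes, if any, are dropped.
# The `appVersion:` field is ignored because the key must match exactly.
next_chart_version() {
    awk '$1 == "version:" {
        v = $2
        gsub(/"/, "", v)
        split(v, p, ".")
        printf "%d.%d.%d\n", p[1], p[2], p[3] + 1
        exit
    }' "$1"
}
```

Calling `next_chart_version charts/kured/Chart.yaml` prints the candidate version to put into the PR.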
When there are open helm-chart PRs which are on hold until the helm-chart has been updated
with the new kured version, they can be merged now (unless a rebase is needed from the contributor).

5
MAINTAINERS Normal file

@@ -0,0 +1,5 @@
Christian Kotzbauer <christian.kotzbauer@gmail.com> (@ckotzbauer)
Daniel Holbach <daniel@weave.works> (@dholbach)
Hidde Beydals <hidde@weave.works> (@hiddeco)
Jean-Philippe Evrard <jean-philippe.evrard@suse.com> (@evrardjp)
Jack Francis <jackfrancis@gmail.com> (@jackfrancis)


@@ -1,5 +1,5 @@
.DEFAULT: all
.PHONY: all clean image publish-image minikube-publish
.PHONY: all clean image publish-image minikube-publish manifest helm-chart test tests
DH_ORG=weaveworks
VERSION=$(shell git symbolic-ref --short HEAD)-$(shell git rev-parse --short HEAD)
@@ -24,12 +24,32 @@ build/.image.done: cmd/kured/Dockerfile cmd/kured/kured
cp $^ build
$(SUDO) docker build -t docker.io/$(DH_ORG)/kured -f build/Dockerfile ./build
$(SUDO) docker tag docker.io/$(DH_ORG)/kured docker.io/$(DH_ORG)/kured:$(VERSION)
$(SUDO) docker tag docker.io/$(DH_ORG)/kured ghcr.io/$(DH_ORG)/kured:$(VERSION)
touch $@
image: build/.image.done
publish-image: image
$(SUDO) docker push docker.io/$(DH_ORG)/kured:$(VERSION)
$(SUDO) docker push ghcr.io/$(DH_ORG)/kured:$(VERSION)
minikube-publish: image
$(SUDO) docker save docker.io/$(DH_ORG)/kured | (eval $$(minikube docker-env) && docker load)
manifest:
sed -i "s#image: docker.io/.*kured.*#image: docker.io/$(DH_ORG)/kured:$(VERSION)#g" kured-ds.yaml
echo "Please generate combined manifest if necessary"
helm-chart:
sed -i "s#repository:.*/kured#repository: $(DH_ORG)/kured#g" charts/kured/values.yaml
sed -i "s#appVersion:.*#appVersion: \"$(VERSION)\"#g" charts/kured/Chart.yaml
sed -i "s#\`[0-9]*\.[0-9]*\.[0-9]*\`#\`$(VERSION)\`#g" charts/kured/README.md
echo "Please bump version in charts/kured/Chart.yaml"
test: tests
echo "Running go tests"
go test ./...
echo "Running golint on pkg"
golint ./pkg/...
echo "Running golint on cmd"
golint ./cmd/...

165
README.md

@@ -1,27 +1,29 @@
# kured - Kubernetes Reboot Daemon
<img src="https://github.com/weaveworks/kured/raw/master/img/logo.png" align="right"/>
<img src="https://github.com/weaveworks/kured/raw/main/img/logo.png" align="right"/>
* [Introduction](#introduction)
* [Kubernetes & OS Compatibility](#kubernetes-&-os-compatibility)
* [Installation](#installation)
* [Configuration](#configuration)
* [Reboot Sentinel File & Period](#reboot-sentinel-file-&-period)
* [Setting a schedule](#setting-a-schedule)
* [Blocking Reboots via Alerts](#blocking-reboots-via-alerts)
* [Blocking Reboots via Pods](#blocking-reboots-via-pods)
* [Prometheus Metrics](#prometheus-metrics)
* [Slack Notifications](#slack-notifications)
* [Overriding Lock Configuration](#overriding-lock-configuration)
* [Operation](#operation)
* [Testing](#testing)
* [Disabling Reboots](#disabling-reboots)
* [Manual Unlock](#manual-unlock)
* [Automatic Unlock](#automatic-unlock)
* [Building](#building)
* [Frequently Asked/Anticipated Questions](#frequently-askedanticipated-questions)
* [Getting Help](#getting-help)
- [Introduction](#introduction)
- [Kubernetes & OS Compatibility](#kubernetes--os-compatibility)
- [Installation](#installation)
- [Configuration](#configuration)
- [Reboot Sentinel File & Period](#reboot-sentinel-file--period)
- [Setting a schedule](#setting-a-schedule)
- [Blocking Reboots via Alerts](#blocking-reboots-via-alerts)
- [Blocking Reboots via Pods](#blocking-reboots-via-pods)
- [Prometheus Metrics](#prometheus-metrics)
- [Notifications](#notifications)
- [Overriding Lock Configuration](#overriding-lock-configuration)
- [Operation](#operation)
- [Testing](#testing)
- [Disabling Reboots](#disabling-reboots)
- [Manual Unlock](#manual-unlock)
- [Automatic Unlock](#automatic-unlock)
- [Delaying Lock Release](#delaying-lock-release)
- [Building](#building)
- [Frequently Asked/Anticipated Questions](#frequently-askedanticipated-questions)
- [Why is there no `latest` tag on Docker Hub?](#why-is-there-no-latest-tag-on-docker-hub)
- [Getting Help](#getting-help)
## Introduction
@@ -29,7 +31,8 @@ Kured (KUbernetes REboot Daemon) is a Kubernetes daemonset that
performs safe automatic node reboots when the need to do so is
indicated by the package management system of the underlying OS.
* Watches for the presence of a reboot sentinel e.g. `/var/run/reboot-required`
* Watches for the presence of a reboot sentinel file e.g. `/var/run/reboot-required`
or the successful run of a sentinel command.
* Utilises a lock in the API server to ensure only one node reboots at
a time
* Optionally defers reboots in the presence of active Prometheus alerts or selected pods
@@ -37,19 +40,25 @@ indicated by the package management system of the underlying OS.
## Kubernetes & OS Compatibility
The daemon image contains versions of `k8s.io/client-go` and the
`kubectl` binary for the purposes of maintaining the lock and draining
worker nodes. Kubernetes aims to provide forwards & backwards
compatibility of one minor version between client and server:
The daemon image contains versions of `k8s.io/client-go` and
`k8s.io/kubectl` (the binary of `kubectl` in older releases) for the purposes of
maintaining the lock and draining worker nodes. Kubernetes aims to provide
forwards and backwards compatibility of one minor version between client and
server:
| kured | kubectl | k8s.io/client-go | k8s.io/apimachinery | expected kubernetes compatibility |
|--------|---------|------------------|---------------------|-----------------------------------|
| master | 1.17.7 | v0.17.0 | v0.17.0 | 1.16.x, 1.17.x, 1.18.x |
| 1.4.2 | 1.17.7 | v0.17.0 | v0.17.0 | 1.16.x, 1.17.x, 1.18.x |
| 1.3.0 | 1.15.10 | v12.0.0 | release-1.15 | 1.15.x, 1.16.x, 1.17.x |
| 1.2.0 | 1.13.6 | v10.0.0 | release-1.13 | 1.12.x, 1.13.x, 1.14.x |
| 1.1.0 | 1.12.1 | v9.0.0 | release-1.12 | 1.11.x, 1.12.x, 1.13.x |
| 1.0.0 | 1.7.6 | v4.0.0 | release-1.7 | 1.6.x, 1.7.x, 1.8.x |
| kured | kubectl | k8s.io/client-go | k8s.io/apimachinery | expected kubernetes compatibility |
|-------|---------|------------------|---------------------|-----------------------------------|
| main | 1.22.4 | v0.22.4 | v0.22.4 | 1.21.x, 1.22.x, 1.23.x |
| 1.9.1 | 1.22.4 | v0.22.4 | v0.22.4 | 1.21.x, 1.22.x, 1.23.x |
| 1.8.1 | 1.21.4 | v0.21.4 | v0.21.4 | 1.20.x, 1.21.x, 1.22.x |
| 1.7.0 | 1.20.5 | v0.20.5 | v0.20.5 | 1.19.x, 1.20.x, 1.21.x |
| 1.6.1 | 1.19.4 | v0.19.4 | v0.19.4 | 1.18.x, 1.19.x, 1.20.x |
| 1.5.1 | 1.18.8 | v0.18.8 | v0.18.8 | 1.17.x, 1.18.x, 1.19.x |
| 1.4.4 | 1.17.7 | v0.17.0 | v0.17.0 | 1.16.x, 1.17.x, 1.18.x |
| 1.3.0 | 1.15.10 | v12.0.0 | release-1.15 | 1.15.x, 1.16.x, 1.17.x |
| 1.2.0 | 1.13.6 | v10.0.0 | release-1.13 | 1.12.x, 1.13.x, 1.14.x |
| 1.1.0 | 1.12.1 | v9.0.0 | release-1.12 | 1.11.x, 1.12.x, 1.13.x |
| 1.0.0 | 1.7.6 | v4.0.0 | release-1.7 | 1.6.x, 1.7.x, 1.8.x |
See the [release notes](https://github.com/weaveworks/kured/releases)
for specific version compatibility information, including which
@@ -64,7 +73,8 @@ To obtain a default installation without Prometheus alerting interlock
or Slack notifications:
```console
kubectl apply -f https://github.com/weaveworks/kured/releases/download/1.3.0/kured-1.3.0-dockerhub.yaml
latest=$(curl -s https://api.github.com/repos/weaveworks/kured/releases | jq -r .[0].tag_name)
kubectl apply -f "https://github.com/weaveworks/kured/releases/download/$latest/kured-$latest-dockerhub.yaml"
```
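The two commands above take the first entry returned by the GitHub releases API and substitute its tag into the manifest URL. The following is a minimal Python sketch of the same steps, assuming the release list has already been fetched and parsed. The function names `latest_tag` and `manifest_url` are illustrative and not part of kured.

```python
import json


def latest_tag(releases):
    """Return the tag of the first release, mirroring `jq -r .[0].tag_name`."""
    if not releases:
        raise ValueError("no releases returned")
    return releases[0]["tag_name"]


def manifest_url(tag):
    """Build the Docker Hub manifest URL for a given release tag."""
    base = "https://github.com/weaveworks/kured/releases/download"
    return f"{base}/{tag}/kured-{tag}-dockerhub.yaml"


if __name__ == "__main__":
    # Sample payload standing in for the API response (assumed shape).
    sample = json.loads('[{"tag_name": "1.9.1"}, {"tag_name": "1.8.1"}]')
    print(manifest_url(latest_tag(sample)))
```

Resolving the tag once and then pinning it in your deployment tooling keeps installations reproducible.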
If you want to customise the installation, download the manifest and
@@ -76,23 +86,37 @@ The following arguments can be passed to kured via the daemonset pod template:
```console
Flags:
--annotation-ttl time force clean annotation after this amount of time (default 0, disabled)
--alert-filter-regexp regexp.Regexp alert names to ignore when checking for active alerts
--alert-firing-only bool only consider firing alerts when checking for active alerts
--blocking-pod-selector stringArray label selector identifying pods whose presence should prevent reboots
--drain-grace-period int time in seconds given to each pod to terminate gracefully, if negative, the default value specified in the pod will be used (default: -1)
--skip-wait-for-delete-timeout int when seconds is greater than zero, skip waiting for the pods whose deletion timestamp is older than N seconds while draining a node (default: 0)
--ds-name string name of daemonset on which to place lock (default "kured")
--ds-namespace string namespace containing daemonset on which to place lock (default "kube-system")
--end-time string only reboot before this time of day (default "23:59")
--end-time string schedule reboot only before this time of day (default "23:59:59")
--force-reboot bool force a reboot even if the drain is still running (default: false)
--drain-timeout duration timeout after which the drain is aborted (default: 0, infinite time)
-h, --help help for kured
--lock-annotation string annotation in which to record locking node (default "weave.works/kured-node-lock")
--lock-release-delay duration hold lock after reboot by this duration (default: 0, disabled)
--lock-ttl duration expire lock annotation after this duration (default: 0, disabled)
--message-template-drain string message template used to notify about a node being drained (default "Draining node %s")
--message-template-reboot string message template used to notify about a node being rebooted (default "Rebooting node %s")
--notify-url url for reboot notifications (cannot use with --slack-hook-url flags)
--period duration reboot check period (default 1h0m0s)
--prefer-no-schedule-taint string Taint name applied during pending node reboot (to prevent receiving additional pods from other rebooting nodes). Disabled by default. Set e.g. to "weave.works/kured-node-reboot" to enable tainting.
--prometheus-url string Prometheus instance to probe for active alerts
--reboot-days strings only reboot on these days (default [su,mo,tu,we,th,fr,sa])
--reboot-command string command to run when a reboot is required by the sentinel (default "/sbin/systemctl reboot")
--reboot-days strings schedule reboot on these days (default [su,mo,tu,we,th,fr,sa])
--reboot-delay duration add a delay after drain finishes but before the reboot command is issued (default 0, no time)
--reboot-sentinel string path to file whose existence signals need to reboot (default "/var/run/reboot-required")
--reboot-sentinel-command string command for which a successful run signals need to reboot (default ""). If non-empty, sentinel file will be ignored.
--slack-channel string slack channel for reboot notifications
--slack-hook-url string slack hook URL for reboot notifications
--slack-hook-url string slack hook URL for reboot notifications [deprecated in favor of --notify-url]
--slack-username string slack username for reboot notifications (default "kured")
--start-time string only reboot after this time of day (default "0:00")
--time-zone string use this timezone to calculate allowed reboot time (default "UTC")
--start-time string schedule reboot only after this time of day (default "0:00")
--time-zone string use this timezone for schedule inputs (default "UTC")
--log-format string log format specified as text or json, defaults to "text"
```
### Reboot Sentinel File & Period
@@ -103,6 +127,10 @@ values with `--reboot-sentinel` and `--period`. Each replica of the
daemon uses a random offset derived from the period on startup so that
nodes don't all contend for the lock simultaneously.
Alternatively, a reboot sentinel command can be used. If a reboot
sentinel command is used, the reboot sentinel file presence will be
ignored.
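The precedence between the two sentinel mechanisms can be sketched as follows. This is an illustration in Python, not kured's implementation: the function name `reboot_required` is hypothetical, and how kured runs the command on the host is not modeled here.

```python
import os
import shlex
import subprocess


def reboot_required(sentinel_file="/var/run/reboot-required", sentinel_command=""):
    """Decide whether a reboot is needed.

    A non-empty sentinel command takes precedence: exit status 0 means
    "reboot required", and the sentinel file is not consulted at all.
    """
    if sentinel_command:
        result = subprocess.run(shlex.split(sentinel_command), check=False)
        return result.returncode == 0
    return os.path.exists(sentinel_file)
```

With this shape, a sentinel command that fails (non-zero exit) means no reboot is requested, even if the sentinel file exists.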
### Setting a schedule
By default, kured will reboot any time it detects the sentinel, but this
@@ -113,10 +141,10 @@ reboots to predictable schedules. Use `--reboot-days`, `--start-time`,
hours on the west coast USA can be specified with:
```console
--reboot-days mon,tue,wed,thu,fri
--start-time 9am
--end-time 5pm
--time-zone America/Los_Angeles
--reboot-days=mon,tue,wed,thu,fri
--start-time=9am
--end-time=5pm
--time-zone=America/Los_Angeles
```
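The effect of these four flags can be sketched as a simple window check. This is a hedged Python illustration: `in_reboot_window` is a hypothetical name, `now` is assumed to be already converted to the configured time zone, the end time is treated as inclusive, and windows that cross midnight are not handled.

```python
from datetime import datetime, time, timedelta, timezone

# Index matches datetime.weekday(), where Monday is 0.
WEEKDAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def in_reboot_window(now, reboot_days, start, end):
    """Return True if `now` falls on an allowed day between start and end.

    `now` must already be expressed in the configured time zone.
    """
    day = WEEKDAYS[now.weekday()]
    return day in reboot_days and start <= now.time() <= end


if __name__ == "__main__":
    # Fixed UTC-8 offset standing in for America/Los_Angeles (assumption).
    pacific = timezone(timedelta(hours=-8))
    business_days = {"mon", "tue", "wed", "thu", "fri"}
    now = datetime.now(pacific)
    print(in_reboot_window(now, business_days, time(9), time(17)))
```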
Times can be formatted in numerous ways, including `5pm`, `5:00pm` `17:00`,
@@ -143,6 +171,11 @@ will block reboots, however you can ignore specific alerts:
--alert-filter-regexp=^(RebootRequired|AnotherBenignAlert|...)$
```
You can also only block reboots for firing alerts:
```console
--alert-firing-only=true
```
See the section on Prometheus metrics for an important application of this
filter.
@@ -205,15 +238,33 @@ If you choose to employ such an alert and have configured kured to
probe for active alerts before rebooting, be sure to specify
`--alert-filter-regexp=^RebootRequired$` to avoid deadlock!
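To see why the filter avoids the deadlock, consider this small Python sketch. It assumes that alerts whose names match the filter are ignored and that any remaining active alert blocks the reboot. The function name `blocking_alerts` is hypothetical, and unanchored search semantics are an assumption.

```python
import re


def blocking_alerts(active_alerts, filter_regexp=None):
    """Return the active alert names that should still block a reboot."""
    if filter_regexp is None:
        return list(active_alerts)
    pattern = re.compile(filter_regexp)
    return [name for name in active_alerts if not pattern.search(name)]
```

Without the filter, a `RebootRequired` alert would itself count as an active alert and block the very reboot that would clear it.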
### Slack Notifications
### Notifications
If you specify a Slack hook via `--slack-hook-url`, kured will notify
you immediately prior to rebooting a node:
When you specify a formatted URL using `--notify-url`, kured sends notifications
about draining and rebooting nodes to the service that the URL identifies.
![Notification](img/slack-notification.png)
We recommend setting `--slack-username` to be the name of the
environment, e.g. `dev` or `prod`.
Alternatively, you can use the `--message-template-drain` and `--message-template-reboot` flags to customize the text of the message, e.g.
```cli
--message-template-drain="Draining node %s part of *my-cluster* in region *xyz*"
```
Here is the syntax:
- slack: `slack://tokenA/tokenB/tokenC`
(`--slack-hook-url` is deprecated but possible to use)
- rocketchat: `rocketchat://[username@]rocketchat-host/token[/channel|@recipient]`
- teams: `teams://tName/token-a/token-b/token-c`
> **Attention** as the [format of the url has changed](https://github.com/containrrr/shoutrrr/issues/138) you also have to specify a `tName`
- Email: `smtp://username:password@host:port/?fromAddress=fromAddress&toAddresses=recipient1[,recipient2,...]`
More details here: [containrrr.dev/shoutrrr/v0.4/services/overview](https://containrrr.dev/shoutrrr/v0.4/services/overview)
### Overriding Lock Configuration
@@ -269,12 +320,16 @@ kubectl -n kube-system annotate ds kured weave.works/kured-node-lock-
In exceptional circumstances (especially when used with cluster-autoscaler), a node
holding the lock might be killed, leaving the annotation in place forever.
Using `--annotation-ttl=30m` will allow other nodes to take over if TTL has expired (in this case 30min) and continue reboot process.
Using `--lock-ttl=30m` allows other nodes to take over once the TTL has expired (in this case 30 minutes) and continue the reboot process.
### Delaying Lock Release
Using `--lock-release-delay=30m` will cause nodes to hold the lock for the specified time frame (in this case 30min) before it is released and the reboot process continues. This can be used to throttle reboots across the cluster.
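The two lock timers can be sketched as follows. This is a hedged Python illustration of the documented behaviour, not kured's code: the names `lock_expired` and `release_time` are hypothetical, and whether expiry is inclusive at exactly the TTL is an assumption.

```python
from datetime import datetime, timedelta


def lock_expired(acquired_at, ttl, now):
    """A TTL of zero disables expiry, matching the documented default."""
    if ttl <= timedelta(0):
        return False
    return now - acquired_at >= ttl


def release_time(reboot_finished_at, release_delay):
    """Earliest moment at which the rebooted node gives up the lock."""
    return reboot_finished_at + max(release_delay, timedelta(0))
```

With a 30 minute release delay and one node rebooting at a time, at most about two nodes per hour can reboot, which is how the flag throttles reboots.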
## Building
See the [CircleCI config](.circleci/config.yml) for the preferred
version of Golang. Kured now uses [Go
Kured now uses [Go
Modules](https://github.com/golang/go/wiki/Modules), so build
instructions vary depending on where you have checked out the
repository:
@@ -291,6 +346,8 @@ make
GO111MODULE=on make
```
You can find the current preferred version of Golang in the [go.mod file](go.mod).
If you are interested in contributing code to kured, please take a look at
our [development][development] docs.
@@ -302,7 +359,7 @@ our [development][development] docs.
Use of `latest` for production deployments is bad practice - see
[here](https://kubernetes.io/docs/concepts/configuration/overview) for
details. The manifest on `master` refers to `latest` for local
details. The manifest on `main` refers to `latest` for local
development testing with minikube only; for production use choose a
versioned manifest from the [release page](https://github.com/weaveworks/kured/releases/).
@@ -316,4 +373,6 @@ If you have any questions about, feedback for or problems with `kured`:
* Join us in [our monthly meeting](https://docs.google.com/document/d/1bsHTjHhqaaZ7yJnXF6W8c89UB_yn-OoSZEmDnIP34n8/edit#),
every fourth Wednesday of the month at 16:00 UTC.
We follow the [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md).
Your feedback is always welcome!


@@ -1,14 +1,14 @@
apiVersion: v1
appVersion: "1.4.2"
appVersion: "1.9.1"
description: A Helm chart for kured
name: kured
version: 2.0.0
version: 2.11.2
home: https://github.com/weaveworks/kured
maintainers:
- name: dholbach
email: daniel@weave.works
- name: ckotzbauer
email: christian.kotzbauer@gmail.com
- name: davidkarlsen
email: david@davidkarlsen.com
sources:
- https://github.com/weaveworks/kured
icon: https://raw.githubusercontent.com/weaveworks/kured/master/img/logo.png
icon: https://raw.githubusercontent.com/weaveworks/kured/main/img/logo.png


@@ -36,42 +36,65 @@ The following changes have been made compared to the stable chart:
| Config | Description | Default |
| ------ | ----------- | ------- |
| `image.repository` | Image repository | `weaveworks/kured` |
| `image.tag` | Image tag | `1.4.2` |
| `image.tag` | Image tag | `1.9.1` |
| `image.pullPolicy` | Image pull policy | `IfNotPresent` |
| `image.pullSecrets` | Image pull secrets | `[]` |
| `updateStrategy` | Daemonset update strategy | `OnDelete` |
| `updateStrategy` | Daemonset update strategy | `RollingUpdate` |
| `maxUnavailable` | The max pods unavailable during a rolling update | `1` |
| `podAnnotations` | Annotations to apply to pods (eg to add Prometheus annotations) | `{}` |
| `dsAnnotations` | Annotations to apply to the kured DaemonSet | `{}` |
| `extraArgs` | Extra arguments to pass to `/usr/bin/kured`. See below. | `{}` |
| `configuration.annotationTtl` | cli-parameter `--annotation-ttl` | `0` |
| `extraEnvVars` | Array of environment variables to pass to the daemonset. | `{}` |
| `configuration.lockTtl` | cli-parameter `--lock-ttl` | `0` |
| `configuration.lockReleaseDelay` | cli-parameter `--lock-release-delay` | `0` |
| `configuration.alertFilterRegexp` | cli-parameter `--alert-filter-regexp` | `""` |
| `configuration.alertFiringOnly` | cli-parameter `--alert-firing-only` | `false` |
| `configuration.blockingPodSelector` | Array of selectors for multiple cli-parameters `--blocking-pod-selector` | `[]` |
| `configuration.endTime` | cli-parameter `--end-time` | `""` |
| `configuration.lockAnnotation` | cli-parameter `--lock-annotation` | `""` |
| `configuration.period` | cli-parameter `--period` | `""` |
| `configuration.forceReboot` | cli-parameter `--force-reboot` | `false` |
| `configuration.drainGracePeriod` | cli-parameter `--drain-grace-period` | `""` |
| `configuration.drainTimeout` | cli-parameter `--drain-timeout` | `""` |
| `configuration.skipWaitForDeleteTimeout` | cli-parameter `--skip-wait-for-delete-timeout` | `""` |
| `configuration.prometheusUrl` | cli-parameter `--prometheus-url` | `""` |
| `configuration.rebootDays` | Array of days for multiple cli-parameters `--reboot-days` | `[]` |
| `configuration.rebootSentinel` | cli-parameter `--reboot-sentinel` | `""` |
| `configuration.rebootSentinelCommand` | cli-parameter `--reboot-sentinel-command` | `""` |
| `configuration.rebootCommand` | cli-parameter `--reboot-command` | `""` |
| `configuration.rebootDelay` | cli-parameter `--reboot-delay` | `""` |
| `configuration.slackChannel` | cli-parameter `--slack-channel` | `""` |
| `configuration.slackHookUrl` | cli-parameter `--slack-hook-url` | `""` |
| `configuration.slackUsername` | cli-parameter `--slack-username` | `""` |
| `configuration.notifyUrl` | cli-parameter `--notify-url` | `""` |
| `configuration.messageTemplateDrain` | cli-parameter `--message-template-drain` | `""` |
| `configuration.messageTemplateReboot` | cli-parameter `--message-template-reboot` | `""` |
| `configuration.startTime` | cli-parameter `--start-time` | `""` |
| `configuration.timeZone` | cli-parameter `--time-zone` | `""` |
| `configuration.annotateNodes` | cli-parameter `--annotate-nodes` | `false` |
| `configuration.logFormat` | cli-parameter `--log-format` | `"text"` |
| `configuration.preferNoScheduleTaint` | Taint name applied during pending node reboot | `""` |
| `rbac.create` | Create RBAC roles | `true` |
| `serviceAccount.create` | Create a service account | `true` |
| `serviceAccount.name` | Service account name to create (or use if `serviceAccount.create` is false) | (chart fullname) |
| `podSecurityPolicy.create` | Create podSecurityPolicy | `false` |
| `resources` | Resources requests and limits. | `{}` |
| `metrics.create` | Create a Service for the metrics endpoint | `false` |
| `metrics.serviceMonitor.create` | Create a ServiceMonitor for prometheus-operator | `true` |
| `metrics.serviceMonitor.namespace` | The namespace to create the ServiceMonitor in | `""` |
| `metrics.serviceMonitor.labels` | Additional labels for the ServiceMonitor | `{}` |
| `metrics.serviceMonitor.interval` | Interval prometheus should scrape the endpoint | `60s` |
| `metrics.serviceMonitor.scrapeTimeout` | A custom scrapeTimeout for prometheus | `""` |
| `metrics.create` | Create a ServiceMonitor for prometheus-operator | `false` |
| `metrics.namespace` | The namespace to create the ServiceMonitor in | `""` |
| `metrics.labels` | Additional labels for the ServiceMonitor | `{}` |
| `metrics.interval` | Interval prometheus should scrape the endpoint | `60s` |
| `metrics.scrapeTimeout` | A custom scrapeTimeout for prometheus | `""` |
| `service.create` | Create a Service for the metrics endpoint | `false` |
| `service.name` | Service name for the metrics endpoint | `""` |
| `service.port` | Port of the service to expose | `8080` |
| `service.annotations` | Annotations to apply to the service (eg to add Prometheus annotations) | `{}` |
| `podLabels` | Additional labels for pods (e.g. CostCenter=IT) | `{}` |
| `priorityClassName` | Priority Class to be used by the pods | `""` |
| `tolerations` | Tolerations to apply to the daemonset (eg to allow running on master) | `[{"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}]`|
| `affinity` | Affinity for the daemonset (ie, restrict which nodes kured runs on) | `{}` |
| `nodeSelector` | Node Selector for the daemonset (ie, restrict which nodes kured runs on) | `{}` |
| `volumeMounts` | Maps of volume mounts to mount | `{}` |
| `volumes` | Maps of volumes to mount | `{}` |
See https://github.com/weaveworks/kured#configuration for values (not contained in the `configuration` object) for `extraArgs`. Note that
```yaml
extraArgs:
@@ -83,7 +106,7 @@ becomes `/usr/bin/kured ... --foo=1 --bar-baz=2`.
## Prometheus Metrics
Kured exposes a single prometheus metric indicating whether a reboot is required or not (see [kured docs](https://github.com/weaveworks/kured#prometheus-metrics)) for details.
Kured exposes a single prometheus metric indicating whether a reboot is required or not (see [kured docs](https://github.com/weaveworks/kured#prometheus-metrics)) for details.
#### Prometheus-Operator
@@ -95,8 +118,9 @@ metrics:
#### Prometheus Annotations
```yaml
podAnnotations:
prometheus.io/scrape: "true"
prometheus.io/path: "/metrics"
prometheus.io/port: "8080"
service:
annotations:
prometheus.io/scrape: "true"
prometheus.io/path: "/metrics"
prometheus.io/port: "8080"
```


@@ -0,0 +1,13 @@
# This is tested twice:
# Basic install test with chart-testing (on charts PRs)
# Functional testing in PRs (other PRs)
service:
create: true
name: kured-prometheus-endpoint
port: 8080
type: NodePort
nodePort: 30000
# Do not override the configuration: period in this, so that
# We can test prometheus exposed metrics without rebooting.


@@ -62,3 +62,11 @@ chart: {{ template "kured.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
{{- end -}}
{{/*
Returns a set of matchLabels applied.
*/}}
{{- define "kured.matchLabels" -}}
app: {{ template "kured.name" . }}
release: {{ .Release.Name }}
{{- end -}}


@@ -10,7 +10,7 @@ rules:
# Allow kubectl to drain/uncordon
#
# NB: These permissions are tightly coupled to the bundled version of kubectl; the ones below
# match https://github.com/kubernetes/kubernetes/blob/v1.12.1/pkg/kubectl/cmd/drain.go
# match https://github.com/kubernetes/kubernetes/blob/v1.19.4/staging/src/k8s.io/kubectl/pkg/cmd/drain/drain.go
#
- apiGroups: [""]
resources: ["nodes"]


@@ -5,16 +5,29 @@ metadata:
namespace: {{ .Release.Namespace }}
labels:
{{- include "kured.labels" . | nindent 4 }}
{{- if .Values.dsAnnotations }}
annotations:
{{- range $key, $value := .Values.dsAnnotations }}
{{ $key }}: {{ $value | quote }}
{{- end }}
{{- end }}
spec:
updateStrategy:
type: {{ .Values.updateStrategy }}
{{- if eq .Values.updateStrategy "RollingUpdate"}}
rollingUpdate:
maxUnavailable: {{ .Values.maxUnavailable }}
{{- end}}
selector:
matchLabels:
{{- include "kured.labels" . | nindent 6 }}
{{- include "kured.matchLabels" . | nindent 6 }}
template:
metadata:
labels:
{{- include "kured.labels" . | nindent 8 }}
{{- if .Values.podLabels }}
{{- toYaml .Values.podLabels | nindent 8 }}
{{- end }}
{{- if .Values.podAnnotations }}
annotations:
{{- range $key, $value := .Values.podAnnotations }}
@@ -34,7 +47,7 @@ spec:
{{- end }}
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
securityContext:
privileged: true # Give permission to nsenter /proc/1/ns/mnt
@@ -45,47 +58,92 @@ spec:
args:
- --ds-name={{ template "kured.fullname" . }}
- --ds-namespace={{ .Release.Namespace }}
{{- if .Values.configuration.annotationTtl }}
- --annotation-ttl={{ .Values.configuration.annotationTtl }}
{{- if .Values.configuration.lockTtl }}
- --lock-ttl={{ .Values.configuration.lockTtl }}
{{- end }}
{{- if .Values.configuration.lockReleaseDelay }}
- --lock-release-delay={{ .Values.configuration.lockReleaseDelay }}
{{- end }}
{{- if .Values.configuration.alertFilterRegexp }}
- --alert-filter-regexp={{ .Values.configuration.alertFilterRegexp | quote }}
- --alert-filter-regexp={{ .Values.configuration.alertFilterRegexp }}
{{- end }}
{{- if .Values.configuration.alertFiringOnly }}
- --alert-firing-only={{ .Values.configuration.alertFiringOnly }}
{{- end }}
{{- range .Values.configuration.blockingPodSelector }}
- --blocking-pod-selector={{ . | quote }}
- --blocking-pod-selector={{ . }}
{{- end }}
{{- if .Values.configuration.endTime }}
- --end-time={{ .Values.configuration.endTime | quote }}
- --end-time={{ .Values.configuration.endTime }}
{{- end }}
{{- if .Values.configuration.lockAnnotation }}
- --lock-annotation={{ .Values.configuration.lockAnnotation | quote }}
- --lock-annotation={{ .Values.configuration.lockAnnotation }}
{{- end }}
{{- if .Values.configuration.period }}
- --period={{ .Values.configuration.period | quote }}
- --period={{ .Values.configuration.period }}
{{- end }}
{{- if .Values.configuration.forceReboot }}
- --force-reboot
{{- end }}
{{- if .Values.configuration.drainGracePeriod }}
- --drain-grace-period={{ .Values.configuration.drainGracePeriod }}
{{- end }}
{{- if .Values.configuration.drainTimeout }}
- --drain-timeout={{ .Values.configuration.drainTimeout }}
{{- end }}
{{- if .Values.configuration.skipWaitForDeleteTimeout }}
- --skip-wait-for-delete-timeout={{ .Values.configuration.skipWaitForDeleteTimeout }}
{{- end }}
{{- if .Values.configuration.prometheusUrl }}
- --prometheus-url={{ .Values.configuration.prometheusUrl | quote }}
- --prometheus-url={{ .Values.configuration.prometheusUrl }}
{{- end }}
{{- range .Values.configuration.rebootDays }}
- --reboot-days={{ . | quote }}
- --reboot-days={{ . }}
{{- end }}
{{- if .Values.configuration.rebootSentinel }}
- --reboot-sentinel={{ .Values.configuration.rebootSentinel | quote }}
- --reboot-sentinel={{ .Values.configuration.rebootSentinel }}
{{- end }}
{{- if .Values.configuration.rebootSentinelCommand }}
- --reboot-sentinel-command={{ .Values.configuration.rebootSentinelCommand }}
{{- end }}
{{- if .Values.configuration.rebootCommand }}
- --reboot-command={{ .Values.configuration.rebootCommand }}
{{- end }}
{{- if .Values.configuration.rebootDelay }}
- --reboot-delay={{ .Values.configuration.rebootDelay }}
{{- end }}
{{- if .Values.configuration.slackChannel }}
- --slack-channel={{ .Values.configuration.slackChannel | quote }}
- --slack-channel={{ .Values.configuration.slackChannel }}
{{- end }}
{{- if .Values.configuration.slackHookUrl }}
- --slack-hook-url={{ .Values.configuration.slackHookUrl | quote }}
- --slack-hook-url={{ .Values.configuration.slackHookUrl }}
{{- end }}
{{- if .Values.configuration.slackUsername }}
- --slack-username={{ .Values.configuration.slackUsername | quote }}
- --slack-username={{ .Values.configuration.slackUsername }}
{{- end }}
{{- if .Values.configuration.notifyUrl }}
- --notify-url={{ .Values.configuration.notifyUrl }}
{{- end }}
{{- if .Values.configuration.messageTemplateDrain }}
- --message-template-drain={{ .Values.configuration.messageTemplateDrain }}
{{- end }}
{{- if .Values.configuration.messageTemplateReboot }}
- --message-template-reboot={{ .Values.configuration.messageTemplateReboot }}
{{- end }}
{{- if .Values.configuration.startTime }}
- --start-time={{ .Values.configuration.startTime | quote }}
- --start-time={{ .Values.configuration.startTime }}
{{- end }}
{{- if .Values.configuration.timeZone }}
- --time-zone={{ .Values.configuration.timeZone | quote }}
- --time-zone={{ .Values.configuration.timeZone }}
{{- end }}
{{- if .Values.configuration.annotateNodes }}
- --annotate-nodes={{ .Values.configuration.annotateNodes }}
{{- end }}
{{- if .Values.configuration.preferNoScheduleTaint }}
- --prefer-no-schedule-taint={{ .Values.configuration.preferNoScheduleTaint }}
{{- end }}
{{- if .Values.configuration.logFormat }}
- --log-format={{ .Values.configuration.logFormat }}
{{- end }}
{{- range $key, $value := .Values.extraArgs }}
{{- if $value }}
@@ -94,9 +152,13 @@ spec:
- --{{ $key }}
{{- end }}
{{- end }}
{{- if .Values.volumeMounts }}
volumeMounts:
{{- toYaml .Values.volumeMounts | nindent 12 }}
{{- end }}
ports:
- containerPort: 8080
name: metrics
name: metrics
env:
# Pass in the name of the node on which this pod is scheduled
# for use with drain/uncordon operations and lock acquisition
@@ -104,6 +166,9 @@ spec:
valueFrom:
fieldRef:
fieldPath: spec.nodeName
{{- if .Values.extraEnvVars }}
{{ toYaml .Values.extraEnvVars | nindent 12 }}
{{- end }}
{{- with .Values.tolerations }}
tolerations:
{{ toYaml . | indent 8 }}
@@ -115,4 +180,8 @@ spec:
{{- with .Values.affinity }}
affinity:
{{ toYaml . | indent 8 }}
{{- end }}
{{- end }}
{{- if .Values.volumes }}
volumes:
{{- toYaml .Values.volumes | nindent 8 }}
{{- end }}


@@ -1,15 +1,29 @@
{{- if .Values.metrics.create }}
{{- if or .Values.service.create .Values.metrics.create }}
apiVersion: v1
kind: Service
metadata:
{{- if .Values.service.name }}
name: {{ .Values.service.name }}
{{- else }}
name: {{ template "kured.fullname" . }}
{{- end }}
labels:
{{- include "kured.labels" . | nindent 4 }}
{{- if .Values.service.annotations }}
annotations:
{{- range $key, $value := .Values.service.annotations }}
{{ $key }}: {{ $value | quote }}
{{- end }}
{{- end }}
spec:
type: ClusterIP
type: {{ .Values.service.type }}
ports:
- name: metrics
port: 8080
port: {{ .Values.service.port }}
targetPort: 8080
{{- if eq .Values.service.type "NodePort" }}
nodePort: {{ .Values.service.nodePort }}
{{- end }}
selector:
{{- include "kured.labels" . | nindent 4 }}
{{- end }}
{{- include "kured.matchLabels" . | nindent 4 }}
{{- end }}


@@ -1,21 +1,21 @@
{{- if and .Values.metrics.create .Values.metrics.serviceMonitor.create }}
{{- if .Values.metrics.create }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: {{ template "kured.fullname" . }}
{{- if .Values.metrics.serviceMonitor.namespace }}
namespace: {{ .Values.metrics.serviceMonitor.namespace }}
{{- if .Values.metrics.namespace }}
namespace: {{ .Values.metrics.namespace }}
{{- end }}
labels:
{{- include "kured.labels" . | nindent 4 }}
{{- if .Values.metrics.serviceMonitor.labels }}
{{- toYaml .Values.metrics.serviceMonitor.labels | nindent 4 }}
{{- if .Values.metrics.labels }}
{{- toYaml .Values.metrics.labels | nindent 4 }}
{{- end }}
spec:
endpoints:
- interval: {{ .Values.metrics.serviceMonitor.interval }}
{{- if .Values.metrics.serviceMonitor.scrapeTimeout }}
scrapeTimeout: {{ .Values.metrics.serviceMonitor.scrapeTimeout }}
- interval: {{ .Values.metrics.interval }}
{{- if .Values.metrics.scrapeTimeout }}
scrapeTimeout: {{ .Values.metrics.scrapeTimeout }}
{{- end }}
honorLabels: true
targetPort: 8080
@@ -24,7 +24,7 @@ spec:
jobLabel: "{{ .Release.Name }}"
selector:
matchLabels:
{{- include "kured.labels" . | nindent 4 }}
{{- include "kured.matchLabels" . | nindent 6 }}
namespaceSelector:
matchNames:
- {{ .Release.Namespace }}


@@ -0,0 +1,31 @@
image:
repository: weaveworks/kured
tag: latest
configuration:
# annotationTtl: 0 # force clean annotation after this amount of time (default 0, disabled)
# alertFilterRegexp: "" # alert names to ignore when checking for active alerts
# alertFiringOnly: false # only consider firing alerts when checking for active alerts
# blockingPodSelector: [] # label selector identifying pods whose presence should prevent reboots
# endTime: "" # only reboot before this time of day (default "23:59")
# lockAnnotation: "" # annotation in which to record locking node (default "weave.works/kured-node-lock")
period: "1m" # reboot check period (default 1h0m0s)
# forceReboot: false # force a reboot even if the drain fails or times out (default: false)
# drainGracePeriod: "" # time in seconds given to each pod to terminate gracefully, if negative, the default value specified in the pod will be used (default: -1)
# drainTimeout: "" # timeout after which the drain is aborted (default: 0, infinite time)
# skipWaitForDeleteTimeout: "" # when time is greater than zero, skip waiting for the pods whose deletion timestamp is older than N seconds while draining a node (default: 0)
# prometheusUrl: "" # Prometheus instance to probe for active alerts
# rebootDays: [] # only reboot on these days (default [su,mo,tu,we,th,fr,sa])
# rebootSentinel: "" # path to file whose existence signals need to reboot (default "/var/run/reboot-required")
# rebootSentinelCommand: "" # command for which a successful run signals need to reboot (default ""). If non-empty, sentinel file will be ignored.
# slackChannel: "" # slack channel for reboot notifications
# slackHookUrl: "" # slack hook URL for reboot notifications
# slackUsername: "" # slack username for reboot notifications (default "kured")
# notifyUrl: "" # notification URL with the syntax as follows: https://containrrr.dev/shoutrrr/services/overview/
# messageTemplateDrain: "" # slack message template when notifying about a node being drained (default "Draining node %s")
# messageTemplateReboot: "" # slack message template when notifying about a node being rebooted (default "Rebooting node %s")
# startTime: "" # only reboot after this time of day (default "0:00")
# timeZone: "" # time-zone to use (valid zones from "time" golang package)
# annotateNodes: false # enable 'weave.works/kured-reboot-in-progress' and 'weave.works/kured-most-recent-reboot-needed' node annotations to signify kured reboot operations
# lockReleaseDelay: "5m" # hold lock after reboot by this amount of time (default 0, disabled)
# logFormat: "text" # log format specified as text or json, defaults to text


@@ -1,30 +1,57 @@
image:
repository: weaveworks/kured
tag: 1.4.2
tag: "" # will default to the appVersion in Chart.yaml
pullPolicy: IfNotPresent
pullSecrets: []
updateStrategy: OnDelete
updateStrategy: RollingUpdate
# requires RollingUpdate updateStrategy
maxUnavailable: 1
podAnnotations: {}
dsAnnotations: {}
extraArgs: {}
extraEnvVars:
# - name: slackHookUrl
# valueFrom:
# secretKeyRef:
# name: secret_name
# key: secret_key
# - name: regularEnvVariable
# value: 123
configuration:
annotationTtl: 0 # force clean annotation after this ammount of time (default 0, disabled)
alertFilterRegexp: "" # alert names to ignore when checking for active alerts
blockingPodSelector: [] # label selector identifying pods whose presence should prevent reboots
endTime: "" # only reboot before this time of day (default "23:59")
lockAnnotation: "" # annotation in which to record locking node (default "weave.works/kured-node-lock")
period: "" # reboot check period (default 1h0m0s)
prometheusUrl: "" # Prometheus instance to probe for active alerts
rebootDays: [] # only reboot on these days (default [su,mo,tu,we,th,fr,sa])
rebootSentinel: "" # path to file whose existence signals need to reboot (default "/var/run/reboot-required")
slackChannel: "" # slack channel for reboot notifications
slackHookUrl: "" # slack hook URL for reboot notifications
slackUsername: "" # slack username for reboot notifications (default "kured")
startTime: "" # only reboot after this time of day (default "0:00")
timeZone: "" # time-zone to use (valid zones from "time" golang package)
lockTtl: 0 # force clean annotation after this amount of time (default 0, disabled)
alertFilterRegexp: "" # alert names to ignore when checking for active alerts
alertFiringOnly: false # only consider firing alerts when checking for active alerts
blockingPodSelector: [] # label selector identifying pods whose presence should prevent reboots
endTime: "" # only reboot before this time of day (default "23:59")
lockAnnotation: "" # annotation in which to record locking node (default "weave.works/kured-node-lock")
period: "" # reboot check period (default 1h0m0s)
forceReboot: false # force a reboot even if the drain fails or times out (default: false)
drainGracePeriod: "" # time in seconds given to each pod to terminate gracefully, if negative, the default value specified in the pod will be used (default: -1)
drainTimeout: "" # timeout after which the drain is aborted (default: 0, infinite time)
skipWaitForDeleteTimeout: "" # when time is greater than zero, skip waiting for the pods whose deletion timestamp is older than N seconds while draining a node (default: 0)
prometheusUrl: "" # Prometheus instance to probe for active alerts
rebootDays: [] # only reboot on these days (default [su,mo,tu,we,th,fr,sa])
rebootSentinel: "" # path to file whose existence signals need to reboot (default "/var/run/reboot-required")
rebootSentinelCommand: "" # command for which a successful run signals need to reboot (default ""). If non-empty, sentinel file will be ignored.
rebootCommand: "/bin/systemctl reboot" # command to run when a reboot is required by the sentinel
rebootDelay: "" # add a delay after drain finishes but before the reboot command is issued
slackChannel: "" # slack channel for reboot notifications
slackHookUrl: "" # slack hook URL for reboot notifications
slackUsername: "" # slack username for reboot notifications (default "kured")
notifyUrl: "" # notification URL with the syntax as follows: https://containrrr.dev/shoutrrr/services/overview/
messageTemplateDrain: "" # slack message template when notifying about a node being drained (default "Draining node %s")
messageTemplateReboot: "" # slack message template when notifying about a node being rebooted (default "Rebooting node %s")
startTime: "" # only reboot after this time of day (default "0:00")
timeZone: "" # time-zone to use (valid zones from "time" golang package)
annotateNodes: false # enable 'weave.works/kured-reboot-in-progress' and 'weave.works/kured-most-recent-reboot-needed' node annotations to signify kured reboot operations
lockReleaseDelay: 0 # hold lock after reboot by this amount of time (default 0, disabled)
preferNoScheduleTaint: "" # Taint name applied during pending node reboot (to prevent receiving additional pods from other rebooting nodes). Disabled by default. Set e.g. to "weave.works/kured-node-reboot" to enable tainting.
logFormat: "text" # log format specified as text or json, defaults to text
rbac:
create: true
@@ -40,12 +67,19 @@ resources: {}
metrics:
create: false
serviceMonitor:
create: true
namespace: ""
labels: {}
interval: 60s
scrapeTimeout: ""
namespace: ""
labels: {}
interval: 60s
scrapeTimeout: ""
service:
create: false
port: 8080
annotations: {}
name: ""
type: ClusterIP
podLabels: {}
priorityClassName: ""
@@ -56,3 +90,7 @@ tolerations:
affinity: {}
nodeSelector: {}
volumeMounts: []
volumes: []


@@ -1,7 +1,4 @@
FROM alpine:3.11
RUN apk update && apk add ca-certificates tzdata && rm -rf /var/cache/apk/*
# NB: you may need to update RBAC permissions when upgrading kubectl - see kured-rbac.yaml for details
ADD https://storage.googleapis.com/kubernetes-release/release/v1.17.7/bin/linux/amd64/kubectl /usr/bin/kubectl
RUN chmod 0755 /usr/bin/kubectl
FROM alpine:3.15.0
RUN apk update --no-cache && apk upgrade --no-cache && apk add --no-cache ca-certificates tzdata
COPY ./kured /usr/bin/kured
ENTRYPOINT ["/usr/bin/kured"]


@@ -0,0 +1,19 @@
FROM --platform=$BUILDPLATFORM golang:bullseye AS build
ARG TARGETOS
ARG TARGETARCH
ARG TARGETVARIANT
ENV GOOS=$TARGETOS
ENV GOARCH=$TARGETARCH
ENV GOVARIANT=$TARGETVARIANT
WORKDIR /src
COPY . .
RUN go list -f '{{join .Deps "\n"}}' ./cmd/kured | grep -v /vendor/ | xargs go list -f '{{if not .Standard}}{{ $dep := . }}{{range .GoFiles}}{{$dep.Dir}}/{{.}} {{end}}{{end}}'
RUN CGO_ENABLED=0 go build -o cmd/kured/kured cmd/kured/*.go
FROM --platform=$TARGETPLATFORM alpine:3.15 as bin
RUN apk update --no-cache && apk upgrade --no-cache && apk add --no-cache ca-certificates tzdata
COPY --from=build /src/cmd/kured/kured /usr/bin/kured
ENTRYPOINT ["/usr/bin/kured"]


@@ -1,26 +1,39 @@
package main
import (
"context"
"encoding/json"
"fmt"
"math/rand"
"net/http"
"net/url"
"os"
"os/exec"
"regexp"
"strings"
"time"
papi "github.com/prometheus/client_golang/api"
log "github.com/sirupsen/logrus"
"github.com/spf13/cobra"
"github.com/spf13/pflag"
"github.com/spf13/viper"
v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/rest"
kubectldrain "k8s.io/kubectl/pkg/drain"
"github.com/google/shlex"
shoutrrr "github.com/containrrr/shoutrrr"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promhttp"
"github.com/weaveworks/kured/pkg/alerts"
"github.com/weaveworks/kured/pkg/daemonsetlock"
"github.com/weaveworks/kured/pkg/delaytick"
"github.com/weaveworks/kured/pkg/notifications/slack"
"github.com/weaveworks/kured/pkg/taints"
"github.com/weaveworks/kured/pkg/timewindow"
)
@@ -28,24 +41,39 @@ var (
version = "unreleased"
// Command line flags
period time.Duration
dsNamespace string
dsName string
lockAnnotation string
prometheusURL string
alertFilter *regexp.Regexp
rebootSentinel string
slackHookURL string
slackUsername string
slackChannel string
podSelectors []string
forceReboot bool
drainTimeout time.Duration
rebootDelay time.Duration
period time.Duration
drainGracePeriod int
skipWaitForDeleteTimeoutSeconds int
dsNamespace string
dsName string
lockAnnotation string
lockTTL time.Duration
lockReleaseDelay time.Duration
prometheusURL string
preferNoScheduleTaintName string
alertFilter *regexp.Regexp
alertFiringOnly bool
rebootSentinelFile string
rebootSentinelCommand string
notifyURL string
slackHookURL string
slackUsername string
slackChannel string
messageTemplateDrain string
messageTemplateReboot string
podSelectors []string
rebootCommand string
logFormat string
nodeID string
rebootDays []string
rebootStart string
rebootEnd string
timezone string
annotationTTL time.Duration
rebootDays []string
rebootStart string
rebootEnd string
timezone string
annotateNodes bool
// Metrics
rebootRequiredGauge = prometheus.NewGaugeVec(prometheus.GaugeOpts{
@@ -55,37 +83,89 @@ var (
}, []string{"node"})
)
const (
// KuredNodeLockAnnotation is the canonical string value for the kured node-lock annotation
KuredNodeLockAnnotation string = "weave.works/kured-node-lock"
// KuredRebootInProgressAnnotation is the canonical string value for the kured reboot-in-progress annotation
KuredRebootInProgressAnnotation string = "weave.works/kured-reboot-in-progress"
// KuredMostRecentRebootNeededAnnotation is the canonical string value for the kured most-recent-reboot-needed annotation
KuredMostRecentRebootNeededAnnotation string = "weave.works/kured-most-recent-reboot-needed"
// EnvPrefix is the prefix of all environment variables bound to our command line flags.
EnvPrefix = "KURED"
)
func init() {
prometheus.MustRegister(rebootRequiredGauge)
}
func main() {
rootCmd := &cobra.Command{
Use: "kured",
Short: "Kubernetes Reboot Daemon",
Run: root}
cmd := NewRootCommand()
if err := cmd.Execute(); err != nil {
log.Fatal(err)
}
}
// NewRootCommand constructs the Cobra root command
func NewRootCommand() *cobra.Command {
rootCmd := &cobra.Command{
Use: "kured",
Short: "Kubernetes Reboot Daemon",
PersistentPreRunE: bindViper,
PreRun: flagCheck,
Run: root}
rootCmd.PersistentFlags().StringVar(&nodeID, "node-id", "",
"node name kured runs on, should be passed down from spec.nodeName via KURED_NODE_ID environment variable")
rootCmd.PersistentFlags().BoolVar(&forceReboot, "force-reboot", false,
"force a reboot even if the drain fails or times out (default: false)")
rootCmd.PersistentFlags().IntVar(&drainGracePeriod, "drain-grace-period", -1,
"time in seconds given to each pod to terminate gracefully, if negative, the default value specified in the pod will be used (default: -1)")
rootCmd.PersistentFlags().IntVar(&skipWaitForDeleteTimeoutSeconds, "skip-wait-for-delete-timeout", 0,
"when seconds is greater than zero, skip waiting for the pods whose deletion timestamp is older than N seconds while draining a node (default: 0)")
rootCmd.PersistentFlags().DurationVar(&drainTimeout, "drain-timeout", 0,
"timeout after which the drain is aborted (default: 0, infinite time)")
rootCmd.PersistentFlags().DurationVar(&rebootDelay, "reboot-delay", 0,
"delay reboot for this duration (default: 0, disabled)")
rootCmd.PersistentFlags().DurationVar(&period, "period", time.Minute*60,
"reboot check period")
"sentinel check period")
rootCmd.PersistentFlags().StringVar(&dsNamespace, "ds-namespace", "kube-system",
"namespace containing daemonset on which to place lock")
rootCmd.PersistentFlags().StringVar(&dsName, "ds-name", "kured",
"name of daemonset on which to place lock")
rootCmd.PersistentFlags().StringVar(&lockAnnotation, "lock-annotation", "weave.works/kured-node-lock",
rootCmd.PersistentFlags().StringVar(&lockAnnotation, "lock-annotation", KuredNodeLockAnnotation,
"annotation in which to record locking node")
rootCmd.PersistentFlags().DurationVar(&lockTTL, "lock-ttl", 0,
"expire lock annotation after this duration (default: 0, disabled)")
rootCmd.PersistentFlags().DurationVar(&lockReleaseDelay, "lock-release-delay", 0,
"delay lock release for this duration (default: 0, disabled)")
rootCmd.PersistentFlags().StringVar(&prometheusURL, "prometheus-url", "",
"Prometheus instance to probe for active alerts")
rootCmd.PersistentFlags().Var(&regexpValue{&alertFilter}, "alert-filter-regexp",
"alert names to ignore when checking for active alerts")
rootCmd.PersistentFlags().StringVar(&rebootSentinel, "reboot-sentinel", "/var/run/reboot-required",
"path to file whose existence signals need to reboot")
rootCmd.PersistentFlags().BoolVar(&alertFiringOnly, "alert-firing-only", false,
"only consider firing alerts when checking for active alerts (default: false)")
rootCmd.PersistentFlags().StringVar(&rebootSentinelFile, "reboot-sentinel", "/var/run/reboot-required",
"path to file whose existence triggers the reboot command")
rootCmd.PersistentFlags().StringVar(&preferNoScheduleTaintName, "prefer-no-schedule-taint", "",
"Taint name applied during pending node reboot (to prevent receiving additional pods from other rebooting nodes). Disabled by default. Set e.g. to \"weave.works/kured-node-reboot\" to enable tainting.")
rootCmd.PersistentFlags().StringVar(&rebootSentinelCommand, "reboot-sentinel-command", "",
"command for which a zero return code will trigger a reboot command")
rootCmd.PersistentFlags().StringVar(&rebootCommand, "reboot-command", "/bin/systemctl reboot",
"command to run when a reboot is required")
rootCmd.PersistentFlags().StringVar(&slackHookURL, "slack-hook-url", "",
"slack hook URL for reboot notfications")
"slack hook URL for notifications")
rootCmd.PersistentFlags().StringVar(&slackUsername, "slack-username", "kured",
"slack username for reboot notfications")
"slack username for notifications")
rootCmd.PersistentFlags().StringVar(&slackChannel, "slack-channel", "",
"slack channel for reboot notifications")
rootCmd.PersistentFlags().StringVar(&notifyURL, "notify-url", "",
"notify URL for reboot notifications")
rootCmd.PersistentFlags().StringVar(&messageTemplateDrain, "message-template-drain", "Draining node %s",
"message template used to notify about a node being drained")
rootCmd.PersistentFlags().StringVar(&messageTemplateReboot, "message-template-reboot", "Rebooting node %s",
"message template used to notify about a node being rebooted")
rootCmd.PersistentFlags().StringArrayVar(&podSelectors, "blocking-pod-selector", nil,
"label selector identifying pods whose presence should prevent reboots")
@@ -99,18 +179,71 @@ func main() {
rootCmd.PersistentFlags().StringVar(&timezone, "time-zone", "UTC",
"use this timezone for schedule inputs")
rootCmd.PersistentFlags().DurationVar(&annotationTTL, "annotation-ttl", 0,
"force clean annotation after this ammount of time (default 0, disabled)")
rootCmd.PersistentFlags().BoolVar(&annotateNodes, "annotate-nodes", false,
"if set, the annotations 'weave.works/kured-reboot-in-progress' and 'weave.works/kured-most-recent-reboot-needed' will be given to nodes undergoing kured reboots")
if err := rootCmd.Execute(); err != nil {
log.Fatal(err)
rootCmd.PersistentFlags().StringVar(&logFormat, "log-format", "text",
"use text or json log format")
return rootCmd
}
// temporary func that checks for deprecated slack-notification-related flags
func flagCheck(cmd *cobra.Command, args []string) {
if slackHookURL != "" && notifyURL != "" {
log.Warnf("Cannot use both --notify-url and --slack-hook-url flags. Kured will use --notify-url flag only...")
}
if slackHookURL != "" {
log.Warnf("Deprecated flag(s). Please use --notify-url flag instead.")
trataURL, err := url.Parse(slackHookURL)
if err != nil {
log.Warnf("slack-hook-url is not properly formatted... no notification will be sent: %v\n", err)
}
if len(strings.Split(strings.Trim(trataURL.Path, "/services/"), "/")) != 3 {
log.Warnf("slack-hook-url is not properly formatted... no notification will be sent: unexpected number of / in URL\n")
} else {
notifyURL = fmt.Sprintf("slack://%s", strings.Trim(trataURL.Path, "/services/"))
}
}
}
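
The deprecated `--slack-hook-url` handling above rewrites a Slack webhook URL into the `slack://` form passed to shoutrrr. A minimal standalone sketch of that conversion follows. The helper name `slackHookToNotifyURL` and the sample URL are hypothetical, and the sketch deliberately differs from the hunk in two ways: it returns on a parse error instead of continuing, and it uses `strings.TrimPrefix` because `strings.Trim` treats its second argument as a set of characters, so a token that begins or ends with one of the letters in `/services/` would be shortened.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// slackHookToNotifyURL is a hypothetical helper (not part of kured) that turns
// https://hooks.slack.com/services/<a>/<b>/<c> into slack://<a>/<b>/<c>.
func slackHookToNotifyURL(hook string) (string, error) {
	u, err := url.Parse(hook)
	if err != nil {
		return "", fmt.Errorf("slack-hook-url is not properly formatted: %w", err)
	}
	// Remove the literal "/services/" prefix, then any stray slashes.
	tokens := strings.Trim(strings.TrimPrefix(u.Path, "/services/"), "/")
	if len(strings.Split(tokens, "/")) != 3 {
		return "", fmt.Errorf("expected three path segments after /services/, got %q", u.Path)
	}
	return "slack://" + tokens, nil
}

func main() {
	// Placeholder tokens; the last one ends in "s" on purpose.
	notify, err := slackHookToNotifyURL("https://hooks.slack.com/services/T000/B000/XXXXs")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(notify)
}
```

With the cutset-based `strings.Trim` call, the trailing `s` of the last placeholder token would be stripped; the prefix-based version keeps it.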
// bindViper initializes viper and binds command flags with environment variables
func bindViper(cmd *cobra.Command, args []string) error {
v := viper.New()
v.SetEnvPrefix(EnvPrefix)
v.AutomaticEnv()
bindFlags(cmd, v)
return nil
}
// bindFlags binds each cobra flag to its associated viper configuration (environment variable)
func bindFlags(cmd *cobra.Command, v *viper.Viper) {
cmd.Flags().VisitAll(func(f *pflag.Flag) {
// Environment variables can't have dashes in them, so bind them to their equivalent keys with underscores
if strings.Contains(f.Name, "-") {
v.BindEnv(f.Name, flagToEnvVar(f.Name))
}
// Apply the viper config value to the flag when the flag is not set and viper has a value
if !f.Changed && v.IsSet(f.Name) {
val := v.Get(f.Name)
log.Infof("Binding %s command flag to environment variable: %s", f.Name, flagToEnvVar(f.Name))
cmd.Flags().Set(f.Name, fmt.Sprintf("%v", val))
}
})
}
// flagToEnvVar converts command flag name to equivalent environment variable name
func flagToEnvVar(flag string) string {
envVarSuffix := strings.ToUpper(strings.ReplaceAll(flag, "-", "_"))
return fmt.Sprintf("%s_%s", EnvPrefix, envVarSuffix)
}
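
As a quick illustration of the flag-to-environment mapping implemented by `flagToEnvVar` above, the following standalone sketch reproduces the same string transformation outside of cobra and viper. The lower-case constant `envPrefix` and the sample flag names are chosen for the example only.

```go
package main

import (
	"fmt"
	"strings"
)

// envPrefix mirrors the EnvPrefix constant in the diff.
const envPrefix = "KURED"

// flagToEnvVar upper-cases the flag name, swaps dashes for underscores,
// and prepends the prefix.
func flagToEnvVar(flag string) string {
	suffix := strings.ToUpper(strings.ReplaceAll(flag, "-", "_"))
	return fmt.Sprintf("%s_%s", envPrefix, suffix)
}

func main() {
	for _, name := range []string{"lock-ttl", "reboot-sentinel-command", "period"} {
		fmt.Printf("--%s -> %s\n", name, flagToEnvVar(name))
	}
}
```

So a flag such as `--lock-ttl` would be read from `KURED_LOCK_TTL` when it is not given on the command line.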
// newCommand creates a new Command with stdout/stderr wired to our standard logger
func newCommand(name string, arg ...string) *exec.Cmd {
cmd := exec.Command(name, arg...)
cmd.Stdout = log.NewEntry(log.StandardLogger()).
WithField("cmd", cmd.Args[0]).
WithField("std", "out").
@@ -124,10 +257,19 @@ func newCommand(name string, arg ...string) *exec.Cmd {
return cmd
}
func sentinelExists() bool {
// Relies on hostPID:true and privileged:true to enter host mount space
sentinelCmd := newCommand("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "--", "/usr/bin/test", "-f", rebootSentinel)
if err := sentinelCmd.Run(); err != nil {
// buildHostCommand builds a new command to run in the host namespace.
// Rancher-based systems need a different PID.
func buildHostCommand(pid int, command []string) []string {
// From the container, we nsenter into the proper PID to run the hostCommand.
// For this, the kured daemonset needs to be configured with hostPID:true and privileged:true
cmd := []string{"/usr/bin/nsenter", fmt.Sprintf("-m/proc/%d/ns/mnt", pid), "--"}
cmd = append(cmd, command...)
return cmd
}
func rebootRequired(sentinelCommand []string) bool {
if err := newCommand(sentinelCommand[0], sentinelCommand[1:]...).Run(); err != nil {
switch err := err.(type) {
case *exec.ExitError:
// We assume a non-zero exit code means 'reboot not required', but of course
@@ -144,36 +286,56 @@ func sentinelExists() bool {
return true
}
func rebootRequired() bool {
if sentinelExists() {
log.Infof("Reboot required")
return true
} else {
log.Infof("Reboot not required")
return false
}
// RebootBlocker interface should be implemented by types
// to know if their instantiations should block a reboot
type RebootBlocker interface {
isBlocked() bool
}
func rebootBlocked(client *kubernetes.Clientset, nodeID string) bool {
if prometheusURL != "" {
alertNames, err := alerts.PrometheusActiveAlerts(prometheusURL, alertFilter)
if err != nil {
log.Warnf("Reboot blocked: prometheus query error: %v", err)
return true
}
count := len(alertNames)
if count > 10 {
alertNames = append(alertNames[:10], "...")
}
if count > 0 {
log.Warnf("Reboot blocked: %d active alerts: %v", count, alertNames)
return true
}
}
// PrometheusBlockingChecker contains info for connecting
// to prometheus, and can give info about whether a reboot should be blocked
type PrometheusBlockingChecker struct {
// promClient makes the Prometheus Go client and API config available
// to the PrometheusBlockingChecker struct
promClient *alerts.PromClient
// regexp used to get alerts
filter *regexp.Regexp
// bool to indicate if only firing alerts should be considered
firingOnly bool
}
fieldSelector := fmt.Sprintf("spec.nodeName=%s", nodeID)
for _, labelSelector := range podSelectors {
podList, err := client.CoreV1().Pods("").List(metav1.ListOptions{
// KubernetesBlockingChecker contains info for connecting
// to k8s, and can give info about whether a reboot should be blocked
type KubernetesBlockingChecker struct {
// client used to contact kubernetes API
client *kubernetes.Clientset
nodename string
// list used to filter pods (podSelector)
filter []string
}
func (pb PrometheusBlockingChecker) isBlocked() bool {
alertNames, err := pb.promClient.ActiveAlerts(pb.filter, pb.firingOnly)
if err != nil {
log.Warnf("Reboot blocked: prometheus query error: %v", err)
return true
}
count := len(alertNames)
if count > 10 {
alertNames = append(alertNames[:10], "...")
}
if count > 0 {
log.Warnf("Reboot blocked: %d active alerts: %v", count, alertNames)
return true
}
return false
}
func (kb KubernetesBlockingChecker) isBlocked() bool {
fieldSelector := fmt.Sprintf("spec.nodeName=%s,status.phase!=Succeeded,status.phase!=Failed,status.phase!=Unknown", kb.nodename)
for _, labelSelector := range kb.filter {
podList, err := kb.client.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
LabelSelector: labelSelector,
FieldSelector: fieldSelector,
Limit: 10})
@@ -194,7 +356,15 @@ func rebootBlocked(client *kubernetes.Clientset, nodeID string) bool {
return true
}
}
return false
}
func rebootBlocked(blockers ...RebootBlocker) bool {
for _, blocker := range blockers {
if blocker.isBlocked() {
return true
}
}
return false
}
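
The refactoring above replaces one monolithic `rebootBlocked` function with a variadic check over `RebootBlocker` implementations. A minimal sketch of that pattern, using a hypothetical `staticBlocker` type in place of the Prometheus and Kubernetes checkers, could look like this:

```go
package main

import "fmt"

// RebootBlocker matches the interface introduced in the diff.
type RebootBlocker interface {
	isBlocked() bool
}

// staticBlocker is a hypothetical stand-in for a real checker; it simply
// reports a fixed answer.
type staticBlocker struct {
	blocked bool
}

func (s staticBlocker) isBlocked() bool { return s.blocked }

// rebootBlocked returns true as soon as any blocker reports a block.
// With no blockers configured, nothing blocks the reboot.
func rebootBlocked(blockers ...RebootBlocker) bool {
	for _, blocker := range blockers {
		if blocker.isBlocked() {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(rebootBlocked())                                            // false
	fmt.Println(rebootBlocked(staticBlocker{false}, staticBlocker{true}))   // true
	fmt.Println(rebootBlocked(staticBlocker{false}, staticBlocker{false})) // false
}
```

One consequence of this design is that callers only append the checkers that are actually configured, which is what the main loop does with `prometheusURL` and `podSelectors`.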
@@ -224,6 +394,13 @@ func acquire(lock *daemonsetlock.DaemonSetLock, metadata interface{}, TTL time.D
}
}
func throttle(releaseDelay time.Duration) {
if releaseDelay > 0 {
log.Infof("Delaying lock release by %v", releaseDelay)
time.Sleep(releaseDelay)
}
}
func release(lock *daemonsetlock.DaemonSetLock) {
log.Infof("Releasing lock")
if err := lock.Release(); err != nil {
@@ -231,50 +408,78 @@ func release(lock *daemonsetlock.DaemonSetLock) {
}
}
func drain(nodeID string) {
log.Infof("Draining node %s", nodeID)
func drain(client *kubernetes.Clientset, node *v1.Node) {
nodename := node.GetName()
if slackHookURL != "" {
if err := slack.NotifyDrain(slackHookURL, slackUsername, slackChannel, nodeID); err != nil {
log.Warnf("Error notifying slack: %v", err)
log.Infof("Draining node %s", nodename)
if notifyURL != "" {
if err := shoutrrr.Send(notifyURL, fmt.Sprintf(messageTemplateDrain, nodename)); err != nil {
log.Warnf("Error notifying: %v", err)
}
}
drainCmd := newCommand("/usr/bin/kubectl", "drain",
"--ignore-daemonsets", "--delete-local-data", "--force", nodeID)
drainer := &kubectldrain.Helper{
Client: client,
Ctx: context.Background(),
GracePeriodSeconds: drainGracePeriod,
SkipWaitForDeleteTimeoutSeconds: skipWaitForDeleteTimeoutSeconds,
Force: true,
DeleteEmptyDirData: true,
IgnoreAllDaemonSets: true,
ErrOut: os.Stderr,
Out: os.Stdout,
Timeout: drainTimeout,
}
if err := drainCmd.Run(); err != nil {
log.Fatalf("Error invoking drain command: %v", err)
if err := kubectldrain.RunCordonOrUncordon(drainer, node, true); err != nil {
if !forceReboot {
log.Fatalf("Error cordoning %s: %v", nodename, err)
}
log.Errorf("Error cordoning %s: %v, continuing with reboot anyway", nodename, err)
return
}
if err := kubectldrain.RunNodeDrain(drainer, nodename); err != nil {
if !forceReboot {
log.Fatalf("Error draining %s: %v", nodename, err)
}
log.Errorf("Error draining %s: %v, continuing with reboot anyway", nodename, err)
return
}
}
func uncordon(nodeID string) {
log.Infof("Uncordoning node %s", nodeID)
uncordonCmd := newCommand("/usr/bin/kubectl", "uncordon", nodeID)
if err := uncordonCmd.Run(); err != nil {
log.Fatalf("Error invoking uncordon command: %v", err)
func uncordon(client *kubernetes.Clientset, node *v1.Node) {
nodename := node.GetName()
log.Infof("Uncordoning node %s", nodename)
drainer := &kubectldrain.Helper{
Client: client,
ErrOut: os.Stderr,
Out: os.Stdout,
Ctx: context.Background(),
}
if err := kubectldrain.RunCordonOrUncordon(drainer, node, false); err != nil {
log.Fatalf("Error uncordoning %s: %v", nodename, err)
}
}
func commandReboot(nodeID string) {
log.Infof("Commanding reboot")
func invokeReboot(nodeID string, rebootCommand []string) {
log.Infof("Running command: %s for node: %s", rebootCommand, nodeID)
if slackHookURL != "" {
if err := slack.NotifyReboot(slackHookURL, slackUsername, slackChannel, nodeID); err != nil {
log.Warnf("Error notifying slack: %v", err)
if notifyURL != "" {
if err := shoutrrr.Send(notifyURL, fmt.Sprintf(messageTemplateReboot, nodeID)); err != nil {
log.Warnf("Error notifying: %v", err)
}
}
// Relies on hostPID:true and privileged:true to enter host mount space
rebootCmd := newCommand("/usr/bin/nsenter", "-m/proc/1/ns/mnt", "/bin/systemctl", "reboot")
if err := rebootCmd.Run(); err != nil {
if err := newCommand(rebootCommand[0], rebootCommand[1:]...).Run(); err != nil {
log.Fatalf("Error invoking reboot command: %v", err)
}
}
func maintainRebootRequiredMetric(nodeID string) {
func maintainRebootRequiredMetric(nodeID string, sentinelCommand []string) {
for {
if sentinelExists() {
if rebootRequired(sentinelCommand) {
rebootRequiredGauge.WithLabelValues(nodeID).Set(1)
} else {
rebootRequiredGauge.WithLabelValues(nodeID).Set(0)
@@ -288,7 +493,45 @@ type nodeMeta struct {
Unschedulable bool `json:"unschedulable"`
}
func rebootAsRequired(nodeID string, window *timewindow.TimeWindow, TTL time.Duration) {
func addNodeAnnotations(client *kubernetes.Clientset, nodeID string, annotations map[string]string) {
node, err := client.CoreV1().Nodes().Get(context.TODO(), nodeID, metav1.GetOptions{})
if err != nil {
log.Fatalf("Error retrieving node object via k8s API: %s", err)
}
for k, v := range annotations {
node.Annotations[k] = v
log.Infof("Adding node %s annotation: %s=%s", node.GetName(), k, v)
}
bytes, err := json.Marshal(node)
if err != nil {
log.Fatalf("Error marshalling node object into JSON: %v", err)
}
_, err = client.CoreV1().Nodes().Patch(context.TODO(), node.GetName(), types.StrategicMergePatchType, bytes, metav1.PatchOptions{})
if err != nil {
var annotationsErr string
for k, v := range annotations {
annotationsErr += fmt.Sprintf("%s=%s ", k, v)
}
log.Fatalf("Error adding node annotations %s via k8s API: %v", annotationsErr, err)
}
}
func deleteNodeAnnotation(client *kubernetes.Clientset, nodeID, key string) {
log.Infof("Deleting node %s annotation %s", nodeID, key)
// JSON Patch takes as path input a JSON Pointer, defined in RFC6901
// So we replace all instances of "/" with "~1" as per:
// https://tools.ietf.org/html/rfc6901#section-3
patch := []byte(fmt.Sprintf("[{\"op\":\"remove\",\"path\":\"/metadata/annotations/%s\"}]", strings.ReplaceAll(key, "/", "~1")))
_, err := client.CoreV1().Nodes().Patch(context.TODO(), nodeID, types.JSONPatchType, patch, metav1.PatchOptions{})
if err != nil {
log.Fatalf("Error deleting node annotation %s via k8s API: %v", key, err)
}
}
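
The JSON Pointer escaping used in `deleteNodeAnnotation` can be tried in isolation. The sketch below builds the same `remove` patch without contacting a cluster. The helper names `escapeJSONPointer` and `removeAnnotationPatch` are hypothetical, and the sketch also escapes `~` as `~0`, which RFC 6901 requires but the hunk above does not need for kured's own annotation keys.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeJSONPointer escapes a single reference token per RFC 6901:
// "~" becomes "~0" first, then "/" becomes "~1".
func escapeJSONPointer(token string) string {
	token = strings.ReplaceAll(token, "~", "~0")
	return strings.ReplaceAll(token, "/", "~1")
}

// removeAnnotationPatch returns a JSON Patch document that removes one
// annotation from an object's metadata.
func removeAnnotationPatch(key string) []byte {
	return []byte(fmt.Sprintf(
		`[{"op":"remove","path":"/metadata/annotations/%s"}]`,
		escapeJSONPointer(key)))
}

func main() {
	fmt.Println(string(removeAnnotationPatch("weave.works/kured-reboot-in-progress")))
}
```

The order of the two replacements matters: escaping `/` first would turn it into `~1`, and the subsequent `~` pass would corrupt that into `~01`.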
func rebootAsRequired(nodeID string, rebootCommand []string, sentinelCommand []string, window *timewindow.TimeWindow, TTL time.Duration, releaseDelay time.Duration) {
config, err := rest.InClusterConfig()
if err != nil {
log.Fatal(err)
@@ -303,40 +546,138 @@ func rebootAsRequired(nodeID string, window *timewindow.TimeWindow, TTL time.Dur
nodeMeta := nodeMeta{}
if holding(lock, &nodeMeta) {
if !nodeMeta.Unschedulable {
uncordon(nodeID)
node, err := client.CoreV1().Nodes().Get(context.TODO(), nodeID, metav1.GetOptions{})
if err != nil {
log.Fatalf("Error retrieving node object via k8s API: %v", err)
}
if !nodeMeta.Unschedulable {
uncordon(client, node)
}
// If we're holding the lock we know we've tried, in a prior run, to reboot
// So (1) we want to confirm that the reboot succeeded practically ( !rebootRequired() )
// And (2) check if we previously annotated the node that it was in the process of being rebooted,
// And finally (3) if it has that annotation, to delete it.
// This indicates to other node tools running on the cluster that this node may be a candidate for maintenance
if annotateNodes && !rebootRequired(sentinelCommand) {
if _, ok := node.Annotations[KuredRebootInProgressAnnotation]; ok {
deleteNodeAnnotation(client, nodeID, KuredRebootInProgressAnnotation)
}
}
throttle(releaseDelay)
release(lock)
}
preferNoScheduleTaint := taints.New(client, nodeID, preferNoScheduleTaintName, v1.TaintEffectPreferNoSchedule)
// Remove taint immediately during startup to quickly allow scheduling again.
if !rebootRequired(sentinelCommand) {
preferNoScheduleTaint.Disable()
}
// instantiate prometheus client
promClient, err := alerts.NewPromClient(papi.Config{Address: prometheusURL})
if err != nil {
log.Fatal("Unable to create prometheus client: ", err)
}
source := rand.NewSource(time.Now().UnixNano())
tick := delaytick.New(source, period)
for _ = range tick {
if window.Contains(time.Now()) && rebootRequired() && !rebootBlocked(client, nodeID) {
node, err := client.CoreV1().Nodes().Get(nodeID, metav1.GetOptions{})
if err != nil {
log.Fatal(err)
}
nodeMeta.Unschedulable = node.Spec.Unschedulable
for range tick {
if !window.Contains(time.Now()) {
// Remove taint outside the reboot time window to allow for normal operation.
preferNoScheduleTaint.Disable()
continue
}
if acquire(lock, &nodeMeta, TTL) {
if !nodeMeta.Unschedulable {
drain(nodeID)
}
commandReboot(nodeID)
for {
log.Infof("Waiting for reboot")
time.Sleep(time.Minute)
}
if !rebootRequired(sentinelCommand) {
log.Infof("Reboot not required")
preferNoScheduleTaint.Disable()
continue
}
log.Infof("Reboot required")
var blockCheckers []RebootBlocker
if prometheusURL != "" {
blockCheckers = append(blockCheckers, PrometheusBlockingChecker{promClient: promClient, filter: alertFilter, firingOnly: alertFiringOnly})
}
if podSelectors != nil {
blockCheckers = append(blockCheckers, KubernetesBlockingChecker{client: client, nodename: nodeID, filter: podSelectors})
}
if rebootBlocked(blockCheckers...) {
continue
}
node, err := client.CoreV1().Nodes().Get(context.TODO(), nodeID, metav1.GetOptions{})
if err != nil {
log.Fatalf("Error retrieving node object via k8s API: %v", err)
}
nodeMeta.Unschedulable = node.Spec.Unschedulable
var timeNowString string
if annotateNodes {
if _, ok := node.Annotations[KuredRebootInProgressAnnotation]; !ok {
timeNowString = time.Now().Format(time.RFC3339)
// Annotate this node to indicate that "I am going to be rebooted!"
// so that other node maintenance tools running on the cluster are aware that this node is in the process of a "state transition"
annotations := map[string]string{KuredRebootInProgressAnnotation: timeNowString}
// & annotate this node with a timestamp so that other node maintenance tools know how long it's been since this node has been marked for reboot
annotations[KuredMostRecentRebootNeededAnnotation] = timeNowString
addNodeAnnotations(client, nodeID, annotations)
}
}
if !acquire(lock, &nodeMeta, TTL) {
// Prefer to not schedule pods onto this node to avoid draining the same pod multiple times.
preferNoScheduleTaint.Enable()
continue
}
drain(client, node)
if rebootDelay > 0 {
log.Infof("Delaying reboot for %v", rebootDelay)
time.Sleep(rebootDelay)
}
invokeReboot(nodeID, rebootCommand)
for {
log.Infof("Waiting for reboot")
time.Sleep(time.Minute)
}
}
}
// buildSentinelCommand creates the shell command line which will need wrapping to escape
// the container boundaries
func buildSentinelCommand(rebootSentinelFile string, rebootSentinelCommand string) []string {
if rebootSentinelCommand != "" {
cmd, err := shlex.Split(rebootSentinelCommand)
if err != nil {
log.Fatalf("Error parsing provided sentinel command: %v", err)
}
return cmd
}
return []string{"test", "-f", rebootSentinelFile}
}
// parseRebootCommand creates the shell command line which will need wrapping to escape
// the container boundaries
func parseRebootCommand(rebootCommand string) []string {
command, err := shlex.Split(rebootCommand)
if err != nil {
log.Fatalf("Error parsing provided reboot command: %v", err)
}
return command
}
func root(cmd *cobra.Command, args []string) {
if logFormat == "json" {
log.SetFormatter(&log.JSONFormatter{})
}
log.Infof("Kubernetes Reboot Daemon: %s", version)
nodeID := os.Getenv("KURED_NODE_ID")
if nodeID == "" {
log.Fatal("KURED_NODE_ID environment variable required")
}
@@ -346,19 +687,38 @@ func root(cmd *cobra.Command, args []string) {
log.Fatalf("Failed to build time window: %v", err)
}
sentinelCommand := buildSentinelCommand(rebootSentinelFile, rebootSentinelCommand)
restartCommand := parseRebootCommand(rebootCommand)
log.Infof("Node ID: %s", nodeID)
log.Infof("Lock Annotation: %s/%s:%s", dsNamespace, dsName, lockAnnotation)
log.Infof("Reboot Sentinel: %s every %v", rebootSentinel, period)
log.Infof("Blocking Pod Selectors: %v", podSelectors)
log.Infof("Reboot on: %v", window)
if annotationTTL > 0 {
log.Infof("Force annotation cleanup after: %v", annotationTTL)
if lockTTL > 0 {
log.Infof("Lock TTL set, lock will expire after: %v", lockTTL)
} else {
log.Info("Force annotation cleanup disabled.")
log.Info("Lock TTL not set, lock will remain until being released")
}
if lockReleaseDelay > 0 {
log.Infof("Lock release delay set, lock release will be delayed by: %v", lockReleaseDelay)
} else {
log.Info("Lock release delay not set, lock will be released immediately after rebooting")
}
log.Infof("PreferNoSchedule taint: %s", preferNoScheduleTaintName)
log.Infof("Blocking Pod Selectors: %v", podSelectors)
log.Infof("Reboot schedule: %v", window)
log.Infof("Reboot check command: %s every %v", sentinelCommand, period)
log.Infof("Reboot command: %s", restartCommand)
if annotateNodes {
log.Infof("Will annotate nodes during kured reboot operations")
}
go rebootAsRequired(nodeID, window, annotationTTL)
go maintainRebootRequiredMetric(nodeID)
// To run those commands as if on the host, we use nsenter.
// This relies on hostPID:true and privileged:true to enter the host mount namespace.
// The PID is set to 1 until we have a better discovery mechanism.
hostSentinelCommand := buildHostCommand(1, sentinelCommand)
hostRestartCommand := buildHostCommand(1, restartCommand)
go rebootAsRequired(nodeID, hostRestartCommand, hostSentinelCommand, window, lockTTL, lockReleaseDelay)
go maintainRebootRequiredMetric(nodeID, hostSentinelCommand)
http.Handle("/metrics", promhttp.Handler())
log.Fatal(http.ListenAndServe(":8080", nil))

235
cmd/kured/main_test.go Normal file

@@ -0,0 +1,235 @@
package main
import (
"reflect"
"testing"
log "github.com/sirupsen/logrus"
"github.com/spf13/cobra"
"github.com/weaveworks/kured/pkg/alerts"
assert "gotest.tools/v3/assert"
papi "github.com/prometheus/client_golang/api"
)
type BlockingChecker struct {
blocking bool
}
func (fbc BlockingChecker) isBlocked() bool {
return fbc.blocking
}
var _ RebootBlocker = BlockingChecker{} // Verify that Type implements Interface.
var _ RebootBlocker = (*BlockingChecker)(nil) // Verify that *Type implements Interface.
func Test_flagCheck(t *testing.T) {
var cmd *cobra.Command
var args []string
slackHookURL = "https://hooks.slack.com/services/BLABLABA12345/IAM931A0VERY/COMPLICATED711854TOKEN1SET"
flagCheck(cmd, args)
if notifyURL != "slack://BLABLABA12345/IAM931A0VERY/COMPLICATED711854TOKEN1SET" {
t.Errorf("Slack URL Parsing is wrong: expecting %s but got %s\n", "slack://BLABLABA12345/IAM931A0VERY/COMPLICATED711854TOKEN1SET", notifyURL)
}
}
func Test_rebootBlocked(t *testing.T) {
noCheckers := []RebootBlocker{}
nonblockingChecker := BlockingChecker{blocking: false}
blockingChecker := BlockingChecker{blocking: true}
// Instantiate a prometheusClient with a broken_url
promClient, err := alerts.NewPromClient(papi.Config{Address: "broken_url"})
if err != nil {
log.Fatal("Can't create prometheusClient: ", err)
}
brokenPrometheusClient := PrometheusBlockingChecker{promClient: promClient, filter: nil, firingOnly: false}
type args struct {
blockers []RebootBlocker
}
tests := []struct {
name string
args args
want bool
}{
{
name: "Do not block on no blocker defined",
args: args{blockers: noCheckers},
want: false,
},
{
name: "Ensure a blocker blocks",
args: args{blockers: []RebootBlocker{blockingChecker}},
want: true,
},
{
name: "Ensure a non-blocker doesn't block",
args: args{blockers: []RebootBlocker{nonblockingChecker}},
want: false,
},
{
name: "Ensure one blocker is enough to block",
args: args{blockers: []RebootBlocker{nonblockingChecker, blockingChecker}},
want: true,
},
{
name: "Do block on error contacting prometheus API",
args: args{blockers: []RebootBlocker{brokenPrometheusClient}},
want: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := rebootBlocked(tt.args.blockers...); got != tt.want {
t.Errorf("rebootBlocked() = %v, want %v", got, tt.want)
}
})
}
}
func Test_buildHostCommand(t *testing.T) {
type args struct {
pid int
command []string
}
tests := []struct {
name string
args args
want []string
}{
{
name: "Ensure command will run with nsenter",
args: args{pid: 1, command: []string{"ls", "-Fal"}},
want: []string{"/usr/bin/nsenter", "-m/proc/1/ns/mnt", "--", "ls", "-Fal"},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := buildHostCommand(tt.args.pid, tt.args.command); !reflect.DeepEqual(got, tt.want) {
t.Errorf("buildHostCommand() = %v, want %v", got, tt.want)
}
})
}
}
func Test_buildSentinelCommand(t *testing.T) {
type args struct {
rebootSentinelFile string
rebootSentinelCommand string
}
tests := []struct {
name string
args args
want []string
}{
{
name: "Ensure a sentinelFile generates a shell 'test' command with the right file",
args: args{
rebootSentinelFile: "/test1",
rebootSentinelCommand: "",
},
want: []string{"test", "-f", "/test1"},
},
{
name: "Ensure a sentinelCommand has priority over a sentinelFile if both are provided (because sentinelFile is always provided)",
args: args{
rebootSentinelFile: "/test1",
rebootSentinelCommand: "/sbin/reboot-required -r",
},
want: []string{"/sbin/reboot-required", "-r"},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := buildSentinelCommand(tt.args.rebootSentinelFile, tt.args.rebootSentinelCommand); !reflect.DeepEqual(got, tt.want) {
t.Errorf("buildSentinelCommand() = %v, want %v", got, tt.want)
}
})
}
}
func Test_parseRebootCommand(t *testing.T) {
type args struct {
rebootCommand string
}
tests := []struct {
name string
args args
want []string
}{
{
name: "Ensure a reboot command is properly parsed",
args: args{
rebootCommand: "/sbin/systemctl reboot",
},
want: []string{"/sbin/systemctl", "reboot"},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := parseRebootCommand(tt.args.rebootCommand); !reflect.DeepEqual(got, tt.want) {
t.Errorf("parseRebootCommand() = %v, want %v", got, tt.want)
}
})
}
}
func Test_rebootRequired(t *testing.T) {
type args struct {
sentinelCommand []string
}
tests := []struct {
name string
args args
want bool
}{
{
name: "Ensure rc = 0 means reboot required",
args: args{
sentinelCommand: []string{"true"},
},
want: true,
},
{
name: "Ensure rc != 0 means reboot NOT required",
args: args{
sentinelCommand: []string{"false"},
},
want: false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := rebootRequired(tt.args.sentinelCommand); got != tt.want {
t.Errorf("rebootRequired() = %v, want %v", got, tt.want)
}
})
}
}
func Test_rebootRequired_fatals(t *testing.T) {
cases := []struct {
param []string
expectFatal bool
}{
{
param: []string{"true"},
expectFatal: false,
},
{
param: []string{"./babar"},
expectFatal: true,
},
}
defer func() { log.StandardLogger().ExitFunc = nil }()
var fatal bool
log.StandardLogger().ExitFunc = func(int) { fatal = true }
for _, c := range cases {
fatal = false
rebootRequired(c.param)
assert.Equal(t, c.expectFatal, fatal)
}
}
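Test_buildHostCommand above pins down the expected shape of the nsenter wrapper, but the helper itself is not part of this hunk. The following is a minimal sketch inferred only from that test expectation; the body is an assumption and may differ from the real function in cmd/kured/main.go.

```go
package main

import (
	"fmt"
)

// buildHostCommand is a sketch inferred from the expectation in
// Test_buildHostCommand; it is NOT copied from kured's main.go.
// It prefixes the command with nsenter so that it runs in the mount
// namespace of the given host PID.
func buildHostCommand(pid int, command []string) []string {
	cmd := []string{"/usr/bin/nsenter", fmt.Sprintf("-m/proc/%d/ns/mnt", pid), "--"}
	return append(cmd, command...)
}

func main() {
	// Wrap the default sentinel check so that it runs against the host filesystem.
	fmt.Println(buildHostCommand(1, []string{"test", "-f", "/var/run/reboot-required"}))
}
```

Returning a []string rather than a single shell string keeps the arguments separate, so no further quoting is needed when the slice is handed to exec.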


@@ -1,23 +0,0 @@
package main
import (
"fmt"
"log"
"os"
"regexp"
"github.com/weaveworks/kured/pkg/alerts"
)
func main() {
if len(os.Args) != 3 {
log.Fatalf("USAGE: %s <prometheusURL> <filterRegexp>", os.Args[0])
}
count, err := alerts.PrometheusCountActive(os.Args[1], regexp.MustCompile(os.Args[2]))
if err != nil {
log.Fatal(err)
}
fmt.Println(count)
}

25
go.mod

@@ -1,15 +1,20 @@
module github.com/weaveworks/kured
go 1.13
go 1.16
require (
github.com/googleapis/gnostic v0.2.0 // indirect
github.com/inconshreveable/mousetrap v1.0.0 // indirect
github.com/prometheus/client_golang v0.0.0-20181230203121-fb3d5cb2ad57
github.com/prometheus/common v0.0.0-20181218105931-67670fe90761
github.com/prometheus/procfs v0.0.0-20190102135031-14fa7590c24d // indirect
github.com/sirupsen/logrus v1.2.0
github.com/spf13/cobra v0.0.0-20181127133106-d2d81d9a96e2
k8s.io/apimachinery v0.17.0
k8s.io/client-go v0.17.0
github.com/containrrr/shoutrrr v0.5.2
github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510
github.com/prometheus/client_golang v1.11.0
github.com/prometheus/common v0.32.1
github.com/sirupsen/logrus v1.8.1
github.com/spf13/cobra v1.3.0
github.com/spf13/pflag v1.0.5
github.com/spf13/viper v1.10.1
github.com/stretchr/testify v1.7.0
gotest.tools/v3 v3.0.3
k8s.io/api v0.22.4
k8s.io/apimachinery v0.22.4
k8s.io/client-go v0.22.4
k8s.io/kubectl v0.22.4
)

1172
go.sum

File diff suppressed because it is too large


@@ -29,7 +29,7 @@ spec:
restartPolicy: Always
containers:
- name: kured
image: docker.io/weaveworks/kured
image: docker.io/weaveworks/kured:1.9.1
# If you find yourself here wondering why there is no
# :latest tag on Docker Hub, see the FAQ in the README
imagePullPolicy: IfNotPresent
@@ -44,20 +44,35 @@ spec:
fieldPath: spec.nodeName
command:
- /usr/bin/kured
# - --alert-filter-regexp=^RebootRequired$
# - --blocking-pod-selector=runtime=long,cost=expensive
# - --blocking-pod-selector=name=temperamental
# - --blocking-pod-selector=...
# - --ds-name=kured
# - --ds-namespace=kube-system
# - --end-time=23:59:59
# - --lock-annotation=weave.works/kured-node-lock
# - --force-reboot=false
# - --drain-grace-period=-1
# - --skip-wait-for-delete-timeout=0
# - --drain-timeout=0
# - --period=1h
# - --ds-namespace=kube-system
# - --ds-name=kured
# - --lock-annotation=weave.works/kured-node-lock
# - --lock-ttl=0
# - --prometheus-url=http://prometheus.monitoring.svc.cluster.local
# - --reboot-days=sun,mon,tue,wed,thu,fri,sat
# - --alert-filter-regexp=^RebootRequired$
# - --alert-firing-only=false
# - --reboot-sentinel=/var/run/reboot-required
# - --prefer-no-schedule-taint=""
# - --reboot-sentinel-command=""
# - --slack-hook-url=https://hooks.slack.com/...
# - --slack-username=prod
# - --slack-channel=alerting
# - --notify-url="" # See also shoutrrr url format
# - --message-template-drain=Draining node %s
# - --message-template-reboot=Rebooting node %s
# - --blocking-pod-selector=runtime=long,cost=expensive
# - --blocking-pod-selector=name=temperamental
# - --blocking-pod-selector=...
# - --reboot-days=sun,mon,tue,wed,thu,fri,sat
# - --reboot-delay=90s
# - --start-time=0:00
# - --end-time=23:59:59
# - --time-zone=UTC
# - --annotate-nodes=false
# - --lock-release-delay=30m
# - --log-format=text


@@ -8,7 +8,7 @@ rules:
# Allow kubectl to drain/uncordon
#
# NB: These permissions are tightly coupled to the bundled version of kubectl; the ones below
# match https://github.com/kubernetes/kubernetes/blob/v1.17.7/staging/src/k8s.io/kubectl/pkg/cmd/drain/drain.go
# match https://github.com/kubernetes/kubernetes/blob/v1.19.4/staging/src/k8s.io/kubectl/pkg/cmd/drain/drain.go
#
- apiGroups: [""]
resources: ["nodes"]


@@ -7,22 +7,39 @@ import (
"sort"
"time"
"github.com/prometheus/client_golang/api"
"github.com/prometheus/client_golang/api/prometheus/v1"
papi "github.com/prometheus/client_golang/api"
v1 "github.com/prometheus/client_golang/api/prometheus/v1"
"github.com/prometheus/common/model"
)
// Returns a list of names of active (e.g. pending or firing) alerts, filtered
// by the supplied regexp.
func PrometheusActiveAlerts(prometheusURL string, filter *regexp.Regexp) ([]string, error) {
client, err := api.NewClient(api.Config{Address: prometheusURL})
// PromClient wraps the Prometheus client together with the v1 API built on it.
// This way, a PromClient can be instantiated with the configuration the client needs
// and still expose the API methods, such as Query.
type PromClient struct {
papi papi.Client
api v1.API
}
// NewPromClient creates a new client to the Prometheus API.
// It returns an error on any problem.
func NewPromClient(conf papi.Config) (*PromClient, error) {
promClient, err := papi.NewClient(conf)
if err != nil {
return nil, err
}
client := PromClient{papi: promClient, api: v1.NewAPI(promClient)}
return &client, nil
}
queryAPI := v1.NewAPI(client)
// ActiveAlerts is a method of type PromClient, it returns a list of names of active alerts
// (e.g. pending or firing), filtered by the supplied regexp or by the includeLabels query.
// Filtering by regexp means that an alert whose name matches the regexp is excluded from the
// block-list and will NOT block rebooting. Querying by includeLabels means that
// any alert the query finds is added to the block-list and WILL block rebooting.
func (p *PromClient) ActiveAlerts(filter *regexp.Regexp, firingOnly bool) ([]string, error) {
value, err := queryAPI.Query(context.Background(), "ALERTS", time.Now())
// get all alerts from prometheus
value, _, err := p.api.Query(context.Background(), "ALERTS", time.Now())
if err != nil {
return nil, err
}
@@ -32,17 +49,17 @@ func PrometheusActiveAlerts(prometheusURL string, filter *regexp.Regexp) ([]stri
activeAlertSet := make(map[string]bool)
for _, sample := range vector {
if alertName, isAlert := sample.Metric[model.AlertNameLabel]; isAlert && sample.Value != 0 {
if filter == nil || !filter.MatchString(string(alertName)) {
if (filter == nil || !filter.MatchString(string(alertName))) && (!firingOnly || sample.Metric["alertstate"] == "firing") {
activeAlertSet[string(alertName)] = true
}
}
}
var activeAlerts []string
for activeAlert, _ := range activeAlertSet {
for activeAlert := range activeAlertSet {
activeAlerts = append(activeAlerts, activeAlert)
}
sort.Sort(sort.StringSlice(activeAlerts))
sort.Strings(activeAlerts)
return activeAlerts, nil
}
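The combined condition in ActiveAlerts (the regexp acts as an ignore-list, and firingOnly additionally drops pending alerts) can be tried in isolation. This is a minimal sketch using only the standard library; the `alert` type, `blockingAlerts`, `sampleAlerts`, and `ignoreRebootRequired` are hypothetical names introduced for illustration and are not part of pkg/alerts.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// alert is a hypothetical, simplified stand-in for one Prometheus ALERTS sample.
type alert struct {
	Name  string
	State string // "pending" or "firing"
}

// sampleAlerts and ignoreRebootRequired are illustration data only.
var sampleAlerts = []alert{
	{Name: "PodCrashing", State: "firing"},
	{Name: "RebootRequired", State: "pending"},
	{Name: "ScheduledRebootFailing", State: "pending"},
}

var ignoreRebootRequired = regexp.MustCompile(`^RebootRequired$`)

// blockingAlerts returns the sorted, de-duplicated names of alerts that would
// block a reboot: names matching filter are ignored, and when firingOnly is
// set, alerts that are not in the "firing" state are ignored as well.
func blockingAlerts(alerts []alert, filter *regexp.Regexp, firingOnly bool) []string {
	set := make(map[string]bool)
	for _, a := range alerts {
		ignored := filter != nil && filter.MatchString(a.Name)
		stateOK := !firingOnly || a.State == "firing"
		if !ignored && stateOK {
			set[a.Name] = true
		}
	}
	names := make([]string, 0, len(set))
	for name := range set {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(blockingAlerts(sampleAlerts, ignoreRebootRequired, false))
	fmt.Println(blockingAlerts(sampleAlerts, ignoreRebootRequired, true))
}
```

A nil filter ignores nothing, which matches the `filter == nil ||` guard in the diff.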


@@ -0,0 +1,141 @@
package alerts
import (
"log"
"net/http"
"net/http/httptest"
"regexp"
"testing"
"github.com/prometheus/client_golang/api"
"github.com/stretchr/testify/assert"
)
type MockResponse struct {
StatusCode int
Body []byte
}
// MockServerProperties ties a mock response to a url and a method
type MockServerProperties struct {
URI string
HTTPMethod string
Response MockResponse
}
// NewMockServer sets up a new MockServer with properties and starts the server.
func NewMockServer(props ...MockServerProperties) *httptest.Server {
handler := http.HandlerFunc(
func(w http.ResponseWriter, r *http.Request) {
for _, proc := range props {
_, err := w.Write(proc.Response.Body)
if err != nil {
log.Fatal(err)
}
}
})
return httptest.NewServer(handler)
}
func TestActiveAlerts(t *testing.T) {
responsebody := `{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"ALERTS","alertname":"GatekeeperViolations","alertstate":"firing","severity":"warning","team":"platform-infra"},"value":[1622472933.973,"1"]},{"metric":{"__name__":"ALERTS","alertname":"PodCrashing-dev","alertstate":"firing","container":"deployment","instance":"1.2.3.4:8080","job":"kube-state-metrics","namespace":"dev","pod":"dev-deployment-78dcbmf25v","severity":"critical","team":"dev"},"value":[1622472933.973,"1"]},{"metric":{"__name__":"ALERTS","alertname":"PodRestart-dev","alertstate":"firing","container":"deployment","instance":"1.2.3.4:1234","job":"kube-state-metrics","namespace":"qa","pod":"qa-job-deployment-78dcbmf25v","severity":"warning","team":"qa"},"value":[1622472933.973,"1"]},{"metric":{"__name__":"ALERTS","alertname":"PrometheusTargetDown","alertstate":"firing","job":"kubernetes-pods","severity":"warning","team":"platform-infra"},"value":[1622472933.973,"1"]},{"metric":{"__name__":"ALERTS","alertname":"ScheduledRebootFailing","alertstate":"pending","severity":"warning","team":"platform-infra"},"value":[1622472933.973,"1"]}]}}`
addr := "http://localhost:10001"
for _, tc := range []struct {
it string
rFilter string
respBody string
aName string
wantN int
firingOnly bool
}{
{
it: "should return no active alerts",
respBody: responsebody,
rFilter: "",
wantN: 0,
firingOnly: false,
},
{
it: "should return a subset of all alerts",
respBody: responsebody,
rFilter: "Pod",
wantN: 3,
firingOnly: false,
},
{
it: "should return all active alerts by regex",
respBody: responsebody,
rFilter: "*",
wantN: 5,
firingOnly: false,
},
{
it: "should return all active alerts by regex filter",
respBody: responsebody,
rFilter: "*",
wantN: 5,
firingOnly: false,
},
{
it: "should return only firing alerts if firingOnly is true",
respBody: responsebody,
rFilter: "*",
wantN: 4,
firingOnly: true,
},
{
it: "should return ScheduledRebootFailing active alerts",
respBody: `{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"ALERTS","alertname":"ScheduledRebootFailing","alertstate":"pending","severity":"warning","team":"platform-infra"},"value":[1622472933.973,"1"]}]}}`,
aName: "ScheduledRebootFailing",
rFilter: "*",
wantN: 1,
firingOnly: false,
},
{
it: "should not return an active alert if RebootRequired is firing (regex filter)",
respBody: `{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"ALERTS","alertname":"RebootRequired","alertstate":"pending","severity":"warning","team":"platform-infra"},"value":[1622472933.973,"1"]}]}}`,
rFilter: "RebootRequired",
wantN: 0,
firingOnly: false,
},
} {
// Start mockServer
mockServer := NewMockServer(MockServerProperties{
URI: addr,
HTTPMethod: http.MethodPost,
Response: MockResponse{
Body: []byte(tc.respBody),
},
})
// Close mockServer after all connections are gone
defer mockServer.Close()
t.Run(tc.it, func(t *testing.T) {
// regex filter
regex, _ := regexp.Compile(tc.rFilter)
// instantiate the prometheus client with the mockserver-address
p, err := NewPromClient(api.Config{Address: mockServer.URL})
if err != nil {
log.Fatal(err)
}
result, err := p.ActiveAlerts(regex, tc.firingOnly)
if err != nil {
log.Fatal(err)
}
// assert
assert.Equal(t, tc.wantN, len(result), "expected amount of alerts %v, got %v", tc.wantN, len(result))
if tc.aName != "" {
assert.Equal(t, tc.aName, result[0], "expected active alert %v, got %v", tc.aName, result[0])
}
})
}
}


@@ -1,15 +1,25 @@
package daemonsetlock
import (
"context"
"encoding/json"
"fmt"
"time"
v1 "k8s.io/api/apps/v1"
"k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/wait"
"k8s.io/client-go/kubernetes"
)
const (
k8sAPICallRetrySleep = 5 * time.Second // How much time to wait in between retrying a k8s API call
k8sAPICallRetryTimeout = 5 * time.Minute // How long to wait until we determine that the k8s API is definitively unavailable
)
// DaemonSetLock holds all necessary information to do actions
// on the kured ds which holds lock info through annotations.
type DaemonSetLock struct {
client *kubernetes.Clientset
nodeID string
@@ -25,15 +35,17 @@ type lockAnnotationValue struct {
TTL time.Duration `json:"TTL"`
}
// New creates a daemonsetLock object containing the necessary data for follow up k8s requests
func New(client *kubernetes.Clientset, nodeID, namespace, name, annotation string) *DaemonSetLock {
return &DaemonSetLock{client, nodeID, namespace, name, annotation}
}
func (dsl *DaemonSetLock) Acquire(metadata interface{}, TTL time.Duration) (acquired bool, owner string, err error) {
// Acquire attempts to annotate the kured daemonset with lock info from instantiated DaemonSetLock using client-go
func (dsl *DaemonSetLock) Acquire(metadata interface{}, TTL time.Duration) (bool, string, error) {
for {
ds, err := dsl.client.AppsV1().DaemonSets(dsl.namespace).Get(dsl.name, metav1.GetOptions{})
ds, err := dsl.GetDaemonSet(k8sAPICallRetrySleep, k8sAPICallRetryTimeout)
if err != nil {
return false, "", err
return false, "", fmt.Errorf("timed out trying to get daemonset %s in namespace %s: %w", dsl.name, dsl.namespace, err)
}
valueString, exists := ds.ObjectMeta.Annotations[dsl.annotation]
@@ -43,11 +55,9 @@ func (dsl *DaemonSetLock) Acquire(metadata interface{}, TTL time.Duration) (acqu
return false, "", err
}
if ttlExpired(value.Created, value.TTL) {
return true, value.NodeID, nil
if !ttlExpired(value.Created, value.TTL) {
return value.NodeID == dsl.nodeID, value.NodeID, nil
}
return value.NodeID == dsl.nodeID, value.NodeID, nil
}
if ds.ObjectMeta.Annotations == nil {
@@ -60,7 +70,7 @@ func (dsl *DaemonSetLock) Acquire(metadata interface{}, TTL time.Duration) (acqu
}
ds.ObjectMeta.Annotations[dsl.annotation] = string(valueBytes)
_, err = dsl.client.AppsV1().DaemonSets(dsl.namespace).Update(ds)
_, err = dsl.client.AppsV1().DaemonSets(dsl.namespace).Update(context.TODO(), ds, metav1.UpdateOptions{})
if err != nil {
if se, ok := err.(*errors.StatusError); ok && se.ErrStatus.Reason == metav1.StatusReasonConflict {
// Something else updated the resource between us reading and writing - try again soon
@@ -74,10 +84,11 @@ func (dsl *DaemonSetLock) Acquire(metadata interface{}, TTL time.Duration) (acqu
}
}
func (dsl *DaemonSetLock) Test(metadata interface{}) (holding bool, err error) {
ds, err := dsl.client.AppsV1().DaemonSets(dsl.namespace).Get(dsl.name, metav1.GetOptions{})
// Test attempts to check the kured daemonset lock status (existence, expiry) from instantiated DaemonSetLock using client-go
func (dsl *DaemonSetLock) Test(metadata interface{}) (bool, error) {
ds, err := dsl.GetDaemonSet(k8sAPICallRetrySleep, k8sAPICallRetryTimeout)
if err != nil {
return false, err
return false, fmt.Errorf("timed out trying to get daemonset %s in namespace %s: %w", dsl.name, dsl.namespace, err)
}
valueString, exists := ds.ObjectMeta.Annotations[dsl.annotation]
@@ -87,21 +98,20 @@ func (dsl *DaemonSetLock) Test(metadata interface{}) (holding bool, err error) {
return false, err
}
if ttlExpired(value.Created, value.TTL) {
return true, nil
if !ttlExpired(value.Created, value.TTL) {
return value.NodeID == dsl.nodeID, nil
}
return value.NodeID == dsl.nodeID, nil
}
return false, nil
}
// Release attempts to remove the lock data from the kured ds annotations using client-go
func (dsl *DaemonSetLock) Release() error {
for {
ds, err := dsl.client.AppsV1().DaemonSets(dsl.namespace).Get(dsl.name, metav1.GetOptions{})
ds, err := dsl.GetDaemonSet(k8sAPICallRetrySleep, k8sAPICallRetryTimeout)
if err != nil {
return err
return fmt.Errorf("timed out trying to get daemonset %s in namespace %s: %w", dsl.name, dsl.namespace, err)
}
valueString, exists := ds.ObjectMeta.Annotations[dsl.annotation]
@@ -110,7 +120,8 @@ func (dsl *DaemonSetLock) Release() error {
if err := json.Unmarshal([]byte(valueString), &value); err != nil {
return err
}
if value.NodeID != dsl.nodeID && !ttlExpired(value.Created, value.TTL) {
if value.NodeID != dsl.nodeID {
return fmt.Errorf("Not lock holder: %v", value.NodeID)
}
} else {
@@ -119,7 +130,7 @@ func (dsl *DaemonSetLock) Release() error {
delete(ds.ObjectMeta.Annotations, dsl.annotation)
_, err = dsl.client.AppsV1().DaemonSets(dsl.namespace).Update(ds)
_, err = dsl.client.AppsV1().DaemonSets(dsl.namespace).Update(context.TODO(), ds, metav1.UpdateOptions{})
if err != nil {
if se, ok := err.(*errors.StatusError); ok && se.ErrStatus.Reason == metav1.StatusReasonConflict {
// Something else updated the resource between us reading and writing - try again soon
@@ -133,6 +144,24 @@ func (dsl *DaemonSetLock) Release() error {
}
}
// GetDaemonSet returns the named DaemonSet resource from the DaemonSetLock's configured client
func (dsl *DaemonSetLock) GetDaemonSet(sleep, timeout time.Duration) (*v1.DaemonSet, error) {
var ds *v1.DaemonSet
var lastError error
err := wait.PollImmediate(sleep, timeout, func() (bool, error) {
ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()
if ds, lastError = dsl.client.AppsV1().DaemonSets(dsl.namespace).Get(ctx, dsl.name, metav1.GetOptions{}); lastError != nil {
return false, nil
}
return true, nil
})
if err != nil {
return nil, fmt.Errorf("Timed out trying to get daemonset %s in namespace %s: %v", dsl.name, dsl.namespace, lastError)
}
return ds, nil
}
func ttlExpired(created time.Time, ttl time.Duration) bool {
if ttl > 0 && time.Since(created) >= ttl {
return true
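The Acquire and Test hunks above invert the TTL check: an unexpired lock is reported as held only when its NodeID matches, while an expired lock is no longer reported as held and falls through to the remaining code path. The sketch below restates that decision with the standard library only. All names (`lockValue`, `expired`, `decide`, `sampleLock`, `after`) are hypothetical, and `expired` mirrors only the condition visible in the truncated ttlExpired hunk; treating every other case as "not expired" is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// lockValue is a hypothetical, simplified version of the lock annotation payload.
type lockValue struct {
	NodeID  string
	Created time.Time
	TTL     time.Duration
}

// t0 is an arbitrary fixed creation time used by the helpers below.
var t0 = time.Date(2022, 1, 1, 0, 0, 0, 0, time.UTC)

// sampleLock builds a lock created at t0 with a TTL given in seconds.
func sampleLock(nodeID string, ttlSeconds int) *lockValue {
	return &lockValue{NodeID: nodeID, Created: t0, TTL: time.Duration(ttlSeconds) * time.Second}
}

// after returns the instant the given number of seconds after t0.
func after(seconds int) time.Time {
	return t0.Add(time.Duration(seconds) * time.Second)
}

// expired reports whether a lock with a positive TTL has outlived it at time now.
// A TTL of zero or less is assumed to mean that the lock never expires.
func expired(v lockValue, now time.Time) bool {
	return v.TTL > 0 && now.Sub(v.Created) >= v.TTL
}

// decide returns whether nodeID currently holds the lock, and whether the
// caller may go on to write a fresh lock (no lock present, or lock expired).
func decide(existing *lockValue, nodeID string, now time.Time) (holding, mayWrite bool) {
	if existing == nil {
		return false, true
	}
	if !expired(*existing, now) {
		return existing.NodeID == nodeID, false
	}
	return false, true
}

func main() {
	lock := sampleLock("node-a", 60)
	fmt.Println(decide(lock, "node-b", after(30))) // lock still valid, held by node-a
	fmt.Println(decide(lock, "node-b", after(90))) // lock expired, node-b may write
}
```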


@@ -5,7 +5,7 @@ import (
"time"
)
// Tick regularly after an initial delay randomly distributed between d/2 and d + d/2
// New ticks regularly after an initial delay randomly distributed between d/2 and d + d/2
func New(s rand.Source, d time.Duration) <-chan time.Time {
c := make(chan time.Time)


@@ -1,52 +0,0 @@
package slack
import (
"bytes"
"encoding/json"
"fmt"
"net/http"
"time"
)
var (
httpClient = &http.Client{Timeout: 5 * time.Second}
)
type body struct {
Text string `json:"text,omitempty"`
Username string `json:"username,omitempty"`
Channel string `json:"channel,omitempty"`
}
func notify(hookURL, username, channel, message string) error {
msg := body{
Text: message,
Username: username,
Channel: channel,
}
var buf bytes.Buffer
if err := json.NewEncoder(&buf).Encode(&msg); err != nil {
return err
}
resp, err := httpClient.Post(hookURL, "application/json", &buf)
if err != nil {
return err
}
defer resp.Body.Close()
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
return fmt.Errorf(resp.Status)
}
return nil
}
func NotifyDrain(hookURL, username, channel, nodeID string) error {
return notify(hookURL, username, channel, fmt.Sprintf("Draining node %s", nodeID))
}
func NotifyReboot(hookURL, username, channel, nodeID string) error {
return notify(hookURL, username, channel, fmt.Sprintf("Rebooting node %s", nodeID))
}

166
pkg/taints/taints.go Normal file

@@ -0,0 +1,166 @@
package taints
import (
"context"
"encoding/json"
"fmt"
log "github.com/sirupsen/logrus"
v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/kubernetes"
)
// Taint allows setting soft and hard limitations for scheduling and executing pods on nodes.
type Taint struct {
client *kubernetes.Clientset
nodeID string
taintName string
effect v1.TaintEffect
exists bool
}
// New provides a new taint.
func New(client *kubernetes.Clientset, nodeID, taintName string, effect v1.TaintEffect) *Taint {
exists, _, _ := taintExists(client, nodeID, taintName)
return &Taint{
client: client,
nodeID: nodeID,
taintName: taintName,
effect: effect,
exists: exists,
}
}
// Enable creates the taint for a node. Creating an existing taint is a noop.
func (t *Taint) Enable() {
if t.taintName == "" {
return
}
if t.exists {
return
}
preferNoSchedule(t.client, t.nodeID, t.taintName, t.effect, true)
t.exists = true
}
// Disable removes the taint for a node. Removing a missing taint is a noop.
func (t *Taint) Disable() {
if t.taintName == "" {
return
}
if !t.exists {
return
}
preferNoSchedule(t.client, t.nodeID, t.taintName, t.effect, false)
t.exists = false
}
func taintExists(client *kubernetes.Clientset, nodeID, taintName string) (bool, int, *v1.Node) {
updatedNode, err := client.CoreV1().Nodes().Get(context.TODO(), nodeID, metav1.GetOptions{})
if err != nil || updatedNode == nil {
log.Fatalf("Error reading node %s: %v", nodeID, err)
}
for i, taint := range updatedNode.Spec.Taints {
if taint.Key == taintName {
return true, i, updatedNode
}
}
return false, 0, updatedNode
}
func preferNoSchedule(client *kubernetes.Clientset, nodeID, taintName string, effect v1.TaintEffect, shouldExists bool) {
taintExists, offset, updatedNode := taintExists(client, nodeID, taintName)
if taintExists && shouldExists {
log.Debugf("Taint %v exists already for node %v.", taintName, nodeID)
return
}
if !taintExists && !shouldExists {
log.Debugf("Taint %v already missing for node %v.", taintName, nodeID)
return
}
type patchTaints struct {
Op string `json:"op"`
Path string `json:"path"`
Value interface{} `json:"value,omitempty"`
}
taint := v1.Taint{
Key: taintName,
Effect: effect,
}
var patches []patchTaints
if len(updatedNode.Spec.Taints) == 0 {
// add first taint and ensure to keep current taints
patches = []patchTaints{
{
Op: "test",
Path: "/spec",
Value: updatedNode.Spec,
},
{
Op: "add",
Path: "/spec/taints",
Value: []v1.Taint{},
},
{
Op: "add",
Path: "/spec/taints/-",
Value: taint,
},
}
} else if taintExists {
// remove taint and ensure to test against race conditions
patches = []patchTaints{
{
Op: "test",
Path: fmt.Sprintf("/spec/taints/%d", offset),
Value: taint,
},
{
Op: "remove",
Path: fmt.Sprintf("/spec/taints/%d", offset),
},
}
} else {
// add missing taint to existing list
patches = []patchTaints{
{
Op: "add",
Path: "/spec/taints/-",
Value: taint,
},
}
}
patchBytes, err := json.Marshal(patches)
if err != nil {
log.Fatalf("Error encoding taint patch for node %s: %v", nodeID, err)
}
_, err = client.CoreV1().Nodes().Patch(context.TODO(), nodeID, types.JSONPatchType, patchBytes, metav1.PatchOptions{})
if err != nil {
log.Fatalf("Error patching taint for node %s: %v", nodeID, err)
}
if shouldExists {
log.Info("Node taint added")
} else {
log.Info("Node taint removed")
}
}


@@ -7,6 +7,8 @@ import (
"time"
)
// EveryDay contains all days of the week; it is exported
// for convenient use in the command-line arguments.
var EveryDay = []string{"su", "mo", "tu", "we", "th", "fr", "sa"}
// dayStrings maps day strings to time.Weekdays
@@ -78,14 +80,12 @@ func parseWeekday(day string) (time.Weekday, error) {
if n, err := strconv.Atoi(day); err == nil {
if n >= 0 && n < 7 {
return time.Weekday(n), nil
} else {
return time.Sunday, fmt.Errorf("Invalid weekday, number out of range: %s", day)
}
return time.Sunday, fmt.Errorf("Invalid weekday, number out of range: %s", day)
}
if weekday, ok := dayStrings[strings.ToLower(day)]; ok {
return weekday, nil
} else {
return time.Sunday, fmt.Errorf("Invalid weekday: %s", day)
}
return time.Sunday, fmt.Errorf("Invalid weekday: %s", day)
}


@@ -47,6 +47,19 @@ func (tw *TimeWindow) Contains(t time.Time) bool {
start := time.Date(loctime.Year(), loctime.Month(), loctime.Day(), tw.startTime.Hour(), tw.startTime.Minute(), tw.startTime.Second(), 0, tw.location)
end := time.Date(loctime.Year(), loctime.Month(), loctime.Day(), tw.endTime.Hour(), tw.endTime.Minute(), tw.endTime.Second(), 1e9-1, tw.location)
// Handle windows that wrap past midnight (start time later than end time).
// If the current time is before today's end time, the window opened yesterday,
// so move start back one day; otherwise it closes tomorrow, so move end
// forward one day.
if tw.startTime.After(tw.endTime) {
if loctime.Before(end) {
start = start.Add(-24 * time.Hour)
} else {
end = end.Add(24 * time.Hour)
}
}
return (loctime.After(start) || loctime.Equal(start)) && (loctime.Before(end) || loctime.Equal(end))
}
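The midnight-wrap adjustment can be exercised on its own, without the TimeWindow type or the weekday check. The sketch below follows the same approach (shift start back a day or end forward a day, then do an inclusive range check), but `contains` and `at` are hypothetical helpers, seconds are fixed at zero, and only UTC is used to keep the example free of time zone data.

```go
package main

import (
	"fmt"
	"time"
)

// at returns 2019-04-05 at the given hour and minute in UTC (illustration only).
func at(hour, minute int) time.Time {
	return time.Date(2019, 4, 5, hour, minute, 0, 0, time.UTC)
}

// contains reports whether t lies inside the daily window that starts at
// startH:startM and ends at endH:endM (inclusive of the whole end minute's
// first second). If the start is later than the end, the window is treated
// as wrapping past midnight.
func contains(t time.Time, startH, startM, endH, endM int) bool {
	y, m, d := t.Date()
	loc := t.Location()
	start := time.Date(y, m, d, startH, startM, 0, 0, loc)
	end := time.Date(y, m, d, endH, endM, 0, 1e9-1, loc)

	if start.After(end) {
		if t.Before(end) {
			// We are in the early-morning tail: the window opened yesterday.
			start = start.Add(-24 * time.Hour)
		} else {
			// We are at or after today's end: the window closes tomorrow.
			end = end.Add(24 * time.Hour)
		}
	}
	return !t.Before(start) && !t.After(end)
}

func main() {
	for _, hm := range [][2]int{{23, 58}, {23, 59}, {0, 0}, {0, 1}, {0, 2}} {
		fmt.Printf("%02d:%02d in 23:59-00:01: %v\n", hm[0], hm[1], contains(at(hm[0], hm[1]), 23, 59, 0, 1))
	}
}
```

Note that adding 24 hours is only equivalent to "one calendar day" when no daylight-saving transition falls in between, which is one reason this sketch sticks to UTC.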


@@ -20,12 +20,12 @@ func TestTimeWindows(t *testing.T) {
cases []testcase
}{
{"mon,tue,wed,thu,fri", "9am", "5pm", "America/Los_Angeles", []testcase{
{"2019/04/04 00:49 PDT", false},
{"2019/04/05 08:59 PDT", false},
{"2019/04/05 9:01 PDT", true},
{"2019/03/31 10:00 PDT", false},
{"2019/04/04 00:49 PDT", false},
{"2019/04/04 12:00 PDT", true},
{"2019/04/04 11:59 UTC", false},
{"2019/04/05 08:59 PDT", false},
{"2019/04/05 9:01 PDT", true},
}},
{"mon,we,fri", "10:01", "11:30am", "America/Los_Angeles", []testcase{
{"2019/04/05 10:30 PDT", true},
@@ -40,6 +40,43 @@ func TestTimeWindows(t *testing.T) {
{"2019/04/18 00:00 UTC", true},
{"2019/04/18 23:59 UTC", true},
}},
{"mon,tue,wed,thu,fri", "9pm", "5am", "America/Los_Angeles", []testcase{
{"2019/03/30 04:00 PDT", false},
{"2019/03/31 10:00 PDT", false},
{"2019/03/31 22:00 PDT", false},
{"2019/04/04 00:49 PDT", true},
{"2019/04/04 12:00 PDT", false},
{"2019/04/04 22:49 PDT", true},
{"2019/04/05 00:49 PDT", true},
{"2019/04/05 08:59 PDT", false},
{"2019/04/05 9:01 PDT", false},
}},
{"mon,tue,wed,thu,fri", "11:59pm", "00:01am", "America/Los_Angeles", []testcase{
{"2019/04/04 23:58 PDT", false},
{"2019/04/04 23:59 PDT", true},
{"2019/04/05 00:00 PDT", true},
{"2019/04/05 00:01 PDT", true},
{"2019/04/05 00:02 PDT", false},
}},
{"mon,tue,wed,fri", "11:59pm", "00:01am", "America/Los_Angeles", []testcase{
{"2019/04/04 23:58 PDT", false},
{"2019/04/04 23:59 PDT", false}, // Within the window's hours, but Thursday is not an allowed day, so it should not run
{"2019/04/05 00:00 PDT", true},
{"2019/04/05 00:02 PDT", false},
}},
{"mon,tue,wed,thu", "11:59pm", "00:01am", "America/Los_Angeles", []testcase{
{"2019/04/04 23:58 PDT", false},
{"2019/04/04 23:59 PDT", true},
{"2019/04/05 00:00 PDT", false}, // Within the window's hours, but Friday is not an allowed day, so it should not run
{"2019/04/05 00:02 PDT", false},
}},
{"mon,tue,wed,thu,fri", "11:59pm", "00:01am", "UTC", []testcase{
{"2019/04/04 23:58 UTC", false},
{"2019/04/04 23:59 UTC", true},
{"2019/04/05 00:00 UTC", true},
{"2019/04/05 00:01 UTC", true},
{"2019/04/05 00:02 UTC", false},
}},
}
for i, tst := range tests {


@@ -0,0 +1,12 @@
#!/usr/bin/env bash
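# Set KUBECTL_CMD to pass a context and/or namespace along with the kubectl binary.
KUBECTL_CMD="${KUBECTL_CMD:-kubectl}"
SENTINEL_FILE="${SENTINEL_FILE:-/var/run/reboot-required}"
echo "Creating reboot sentinel on all nodes"
# KUBECTL_CMD is left unquoted on purpose so that any extra arguments are word-split.
# shellcheck disable=SC2086
for nodename in $($KUBECTL_CMD get nodes -o name); do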
docker exec "${nodename/node\//}" hostname
docker exec "${nodename/node\//}" touch "${SENTINEL_FILE}"
done


@@ -0,0 +1,85 @@
#!/usr/bin/env bash
NODECOUNT=${NODECOUNT:-5}
KUBECTL_CMD="${KUBECTL_CMD:-kubectl}"
DEBUG="${DEBUG:-false}"
CONTAINER_NAME_FORMAT=${CONTAINER_NAME_FORMAT:-"chart-testing-*"}
tmp_dir=$(mktemp -d -t kured-XXXX)
function gather_logs_and_cleanup {
if [[ -f "$tmp_dir"/node_output ]]; then
rm "$tmp_dir"/node_output
fi
rmdir "$tmp_dir"
# The next commands are useful regardless of success or failures.
if [[ "$DEBUG" == "true" ]]; then
echo "############################################################"
# This is useful to see if containers have crashed.
echo "docker ps -a:"
docker ps -a
echo "docker journal logs"
journalctl -u docker --no-pager
# This is useful to see if the nodes have _properly_ rebooted.
# Each node should show two container starts: one before and one after the reboot.
for name in $(docker ps -a -f "name=${CONTAINER_NAME_FORMAT}" -q); do
echo "############################################################"
echo "docker logs for container $name:"
docker logs "$name"
done
fi
}
trap gather_logs_and_cleanup EXIT
declare -A was_unschedulable
declare -A has_recovered
max_attempts="60"
sleep_time=60
attempt_num=1
set +o errexit
echo "There are $NODECOUNT nodes in the cluster"
until [ ${#was_unschedulable[@]} == "$NODECOUNT" ] && [ ${#has_recovered[@]} == "$NODECOUNT" ]
do
echo "${#was_unschedulable[@]} nodes were removed from pool once:" "${!was_unschedulable[@]}"
echo "${#has_recovered[@]} nodes removed from the pool are now back:" "${!has_recovered[@]}"
# The second column prints "true" for cordoned nodes and "<none>" for schedulable ones.
"$KUBECTL_CMD" get nodes -o custom-columns=NAME:.metadata.name,UNSCHEDULABLE:.spec.unschedulable --no-headers > "$tmp_dir"/node_output
if [[ "$DEBUG" == "true" ]]; then
# This is useful to see if a node gets stuck after drain, and doesn't
# come back up.
echo "Result of command $KUBECTL_CMD get nodes ... showing unschedulable nodes:"
cat "$tmp_dir"/node_output
fi
while read -r node; do
unschedulable=$(echo "$node" | grep true | cut -f 1 -d ' ')
if [ -n "$unschedulable" ] && [ -z ${was_unschedulable["$unschedulable"]+x} ] ; then
echo "$unschedulable is now unschedulable!"
was_unschedulable["$unschedulable"]=1
fi
schedulable=$(echo "$node" | grep '<none>' | cut -f 1 -d ' ')
if [ -n "$schedulable" ] && [ ${was_unschedulable["$schedulable"]+x} ] && [ -z ${has_recovered["$schedulable"]+x} ]; then
echo "$schedulable has recovered!"
has_recovered["$schedulable"]=1
fi
done < "$tmp_dir"/node_output
if [[ "${#has_recovered[@]}" == "$NODECOUNT" ]]; then
echo "All nodes recovered."
break
else
if (( attempt_num == max_attempts ))
then
echo "Attempt $attempt_num failed and there are no more attempts left!"
exit 1
else
echo "Attempt $attempt_num failed! Trying again in $sleep_time seconds..."
sleep "$sleep_time"
fi
fi
(( attempt_num++ ))
done
echo "Test successful"

tests/kind/test-metrics.sh Executable file

@@ -0,0 +1,19 @@
#!/usr/bin/env bash
expected="$1"
if [[ "$expected" != "0" && "$expected" != "1" ]]; then
echo "You should give an argument to this script, the gauge value (0 or 1)"
exit 1
fi
HOST="${HOST:-localhost}"
PORT="${PORT:-30000}"
NODENAME="${NODENAME:-chart-testing-control-plane}"
reboot_required=$(docker exec "$NODENAME" curl "http://$HOST:$PORT/metrics" | awk '/^kured_reboot_required/{print $2}')
if [[ "$reboot_required" == "$expected" ]]; then
echo "Test success"
else
echo "Test failed"
exit 1
fi