Commit Graph

76 Commits

Author SHA1 Message Date
Jean-Philippe Evrard
5930d733f8 Fix the Fatal calls using formatting
Without this, go test will rightfully fail.

This is a problem, as we don't have go test enabled, but we want
to have this in the future.

This should fix it.
2021-03-29 09:50:56 +02:00
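The distinction behind this fix can be sketched as follows. This is a minimal illustration, not the code from the patch: it uses Printf on a logger that writes to a buffer (an assumption, so the output can be inspected without exiting the process), because Fatal/Fatalf follow the same split as Print/Printf. The message text and the names formatLockError and errExample are hypothetical.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"log"
)

// errExample is a placeholder error used only for this illustration.
var errExample = errors.New("boom")

// formatLockError renders an error through a *log.Logger using the
// formatting variant (Printf), so the verb in the message is expanded.
// log.Fatalf formats the same way before calling os.Exit(1).
func formatLockError(err error) string {
	var buf bytes.Buffer
	logger := log.New(&buf, "", 0) // no prefix, no timestamp
	// Passing the same arguments to the non-formatting variant (Print or
	// Fatal) would leave the verb unexpanded, and the vet checks that
	// run as part of go test report such calls.
	logger.Printf("Error testing lock: %v", err)
	return buf.String()
}

func main() {
	fmt.Print(formatLockError(errExample))
}
```

Routing the call through a buffer-backed logger is only a convenience for the example; the same formatting rules apply to the package-level log functions.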
Jean-Philippe Evrard
fd63e9a74b Add flexible commands parameters
Without this patch, you cannot configure the reboot
command to use, or use another command to trigger
a reboot.

This is a problem, as multiple users have asked for
it in the past, and we are lacking flexibility.

This fixes it by introducing two new parameters,
- one to provide a custom reboot command.
  This should help people running kured on
  non systemd OS
- one to provide a custom sentinel command.
  This should help people running non Ubuntu OS,
  as they can directly use their command instead of
  generating a file (useful for CentOS/SUSE)

For this, several refactors had to be done, to
remove global state in some functions. Making those
functions closer to "pure functions" helps us
increase our test coverage here and later.

As commandReboot was very close to rebootCommand,
the function to reboot the node has been renamed
to invokeReboot.
2021-03-29 09:50:56 +02:00
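The idea of passing the sentinel and reboot commands in as values, rather than reading global state, can be sketched roughly as below. This is not the code from the patch: parseCommand and rebootRequired are hypothetical names, the whitespace-only splitting is a simplification, the sentinel path is just an example value, and treating exit status 0 as "reboot required" is an assumed convention.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// parseCommand splits a flag value into an argv slice.
// Hypothetical helper: it splits on whitespace only and does not
// handle quoting or escaping.
func parseCommand(raw string) ([]string, error) {
	argv := strings.Fields(raw)
	if len(argv) == 0 {
		return nil, errors.New("empty command")
	}
	return argv, nil
}

// rebootRequired runs the sentinel command and reports whether it exited
// with status 0 (assumed here to mean "reboot required"). Taking argv as
// a parameter keeps the function free of global state.
func rebootRequired(argv []string) bool {
	return exec.Command(argv[0], argv[1:]...).Run() == nil
}

func main() {
	// Example sentinel command; the path is an illustrative value.
	argv, err := parseCommand("test -f /var/run/reboot-required")
	if err != nil {
		fmt.Println("invalid sentinel command:", err)
		return
	}
	fmt.Println("reboot required:", rebootRequired(argv))
}
```

Because both functions depend only on their arguments, parseCommand can be unit tested without touching the host, which is the testability benefit the commit message describes.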
Jean-Philippe Evrard
837bd4eb2a Refactor reboot blocks
Without this patch, we rely on global state in many functions for
which we check the reboot blockers.

This is a problem, as it's harder to test.

This patch fixes it by refactoring the reboot blockers. This also
includes a first series of unit tests for our main.
2021-03-29 09:50:56 +02:00
Jean-Philippe Evrard
15c57927c8 Update the deprecated DeleteLocalData
DeleteLocalData was deprecated for users of kubectl in 0.20 [1].
At the same time as the deprecation, the relevant code was also
removed [2] without warning: The DeleteLocalData from the helper
structure was simply renamed DeleteEmptyDirData, without shims
on the exposed pkg.

This is a problem, as it completely breaks kured.

This should fix it, by using the new field name.

[1]: 56ea9621b7
[2]: 56ea9621b7 (diff-041bdcdedca650a38a8d82cf15ab6f3665b7b84a0fb44a8bb5dcdc5cd944c63d)
2021-03-22 14:28:17 +01:00
Daniel Holbach
f6ada05c5d Merge pull request #320 from dholbach/alpine-3.13
update to alpine 3.13
2021-03-10 08:50:42 +01:00
Daniel Holbach
355813de30 update to alpine 3.13
Signed-off-by: Daniel Holbach <daniel@weave.works>
2021-03-10 08:10:36 +01:00
Daniel Holbach
250b9bad05 Merge pull request #296 from jackfrancis/node-annotations
add node annotations to identify kured reboot operations
2021-03-09 10:14:46 +01:00
Jack Francis
baf83408b8 add node annotations
adds a new --annotate-nodes daemonset runtime argument, which does the following when enabled:

- adds a new node annotation "weave.works/kured-most-recent-reboot-needed" with a value of the current RFC3339 timestamp as soon as kured identifies that a node needs to be rebooted
- adds a new node annotation "weave.works/kured-reboot-in-progress" with a value of the current RFC3339 timestamp as soon as kured identifies that a node needs to be rebooted
- removes the annotation "weave.works/kured-reboot-in-progress" when kured has successfully rebooted the node
2021-03-08 17:22:47 -08:00
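The annotation lifecycle listed in this commit message can be sketched as below. This is a simplified model, not the patch itself: a plain map stands in for the node's annotations (a real implementation would update the node through the Kubernetes API), and markRebootNeeded, markRebootDone and exampleNow are hypothetical names. Only the two annotation keys are taken from the message.

```go
package main

import (
	"fmt"
	"time"
)

// Annotation keys quoted from the commit message.
const (
	rebootNeededKey     = "weave.works/kured-most-recent-reboot-needed"
	rebootInProgressKey = "weave.works/kured-reboot-in-progress"
)

// exampleNow is a fixed timestamp used only for this illustration.
var exampleNow = time.Date(2021, 3, 8, 17, 22, 47, 0, time.UTC)

// markRebootNeeded records both annotations with an RFC3339 timestamp,
// as the commit message describes for a node identified for reboot.
func markRebootNeeded(annotations map[string]string, now time.Time) {
	ts := now.Format(time.RFC3339)
	annotations[rebootNeededKey] = ts
	annotations[rebootInProgressKey] = ts
}

// markRebootDone removes the in-progress annotation after a successful
// reboot and leaves the most-recent-reboot-needed annotation in place.
func markRebootDone(annotations map[string]string) {
	delete(annotations, rebootInProgressKey)
}

func main() {
	annotations := map[string]string{} // stands in for the node's annotations
	markRebootNeeded(annotations, exampleNow)
	fmt.Println(annotations[rebootInProgressKey])
	markRebootDone(annotations)
	fmt.Println(len(annotations))
}
```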
Jack Francis
93c8242b89 always drain before reboot
This changes the pre-reboot drain functionality so that it always runs, regardless of the value of the Unschedulable node property.

Because kubectl drain is idempotent, we shouldn't have to worry about whether the node has already been set to Unschedulable (perhaps due to a prior, unsuccessful loop of the kured reboot cycle): we can run it over and over again. And because this drain func actually does a cordon + drain (and it only performs the drain if a cordon is successful), we can be sure that we aren't going to be thrashing this node with respect to scheduled pods.

This also fixes an edge case: if the node has been marked Unschedulable out-of-band, but workloads remain Running on this node, kured will no longer reboot the node's underlying VM/machine while it is actively running pods.
2021-03-08 17:20:31 -08:00
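The "cordon, then drain only if the cordon succeeded" ordering described above can be sketched with a small interface. Everything here is hypothetical scaffolding (nodeOps, cordonAndDrain, fakeNode, errCordon); it models the control flow only and does not call kubectl or the Kubernetes API.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeOps is a hypothetical interface standing in for the real cordon
// and drain calls.
type nodeOps interface {
	Cordon() error
	Drain() error
}

// errCordon is a placeholder failure used only for this illustration.
var errCordon = errors.New("api unavailable")

// cordonAndDrain always attempts the cordon and drains only if it
// succeeded. It does not look at the node's Unschedulable property first,
// so it is safe to call repeatedly.
func cordonAndDrain(n nodeOps) error {
	if err := n.Cordon(); err != nil {
		return fmt.Errorf("cordon failed, skipping drain: %w", err)
	}
	return n.Drain()
}

// fakeNode records the calls it receives.
type fakeNode struct {
	cordonErr error
	calls     []string
}

func (f *fakeNode) Cordon() error {
	f.calls = append(f.calls, "cordon")
	return f.cordonErr
}

func (f *fakeNode) Drain() error {
	f.calls = append(f.calls, "drain")
	return nil
}

func main() {
	ok := &fakeNode{}
	fmt.Println(cordonAndDrain(ok), ok.calls)

	bad := &fakeNode{cordonErr: errCordon}
	fmt.Println(cordonAndDrain(bad), bad.calls)
}
```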
Daniel Holbach
fade706cbf Merge pull request #250 from damoon/19-PreferNoSchedule
implement issue-19 add prefer no schedule taint to avoid double draining of pods
2021-01-12 14:28:23 +01:00
David Sauer
5a4e197d27 change taint config to be disabled by default 2021-01-11 18:24:17 +01:00
David Sauer
3a35d6a46c remove taint in case the reboot is not needed anymore 2021-01-06 22:21:41 +01:00
David Sauer
34446f949e Allow disabling tainting during pending node reboot by setting the taint name to an empty string. 2021-01-06 21:39:32 +01:00
David Sauer
e4c684c3af taint node with PreferNoSchedule to prevent receiving (and double draining) additional pods from other rebooting nodes 2021-01-06 21:23:40 +01:00
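Taken together, the taint commits above describe three behaviours: add a PreferNoSchedule taint while a reboot is pending, remove it when the reboot is no longer needed, and skip tainting when the taint name is empty. A rough sketch of that logic on an in-memory list follows; the taint struct, setRebootTaint and the example key are hypothetical, and a real implementation would patch the node through the Kubernetes API.

```go
package main

import "fmt"

// taint is a simplified, hypothetical stand-in for a Kubernetes node taint.
type taint struct {
	Key    string
	Effect string
}

const preferNoSchedule = "PreferNoSchedule"

// setRebootTaint returns the taint list with the reboot taint present when
// a reboot is pending and absent otherwise. An empty key disables tainting
// and returns the list unchanged.
func setRebootTaint(taints []taint, key string, rebootPending bool) []taint {
	if key == "" {
		return taints
	}
	out := make([]taint, 0, len(taints)+1)
	for _, t := range taints {
		if t.Key == key && t.Effect == preferNoSchedule {
			continue // drop any existing copy so the result holds at most one
		}
		out = append(out, t)
	}
	if rebootPending {
		out = append(out, taint{Key: key, Effect: preferNoSchedule})
	}
	return out
}

func main() {
	key := "example.com/reboot-pending" // hypothetical taint name
	taints := setRebootTaint(nil, key, true)
	fmt.Println(taints)
	fmt.Println(setRebootTaint(taints, key, false))
}
```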
David Sauer
204a06ca38 fixed call of log.Fatal instead of log.Fatalf 2021-01-06 21:23:40 +01:00
David Sauer
48897eb0ab avoid indentations to ease readability 2021-01-06 21:23:40 +01:00
Jean-Philippe Evrard
897834a9db Temporarily work around alpine issue
Until a new alpine image is created, we should ensure the latest
packages are used, and therefore we should upgrade default
installed packages.

Without this patch, we'll have outdated and vulnerable packages
until a new 3.12 image is released.

This is a problem, as we'll publish broken images.

This should temporarily work around it, at the expense of larger
images (they contain the package cache)
2020-12-14 11:20:27 +01:00
Daniel Jimenez Garcia
51cab0dedc rename message template parameters so they are not related to slack 2020-11-25 16:20:54 +00:00
Daniel Jimenez Garcia
f059cec794 GH-125, add additional parameters to override the drain/reboot slack messages 2020-11-25 16:19:31 +00:00
Bryan Boreham
1ba3acab98 Drain: allow pods grace period to terminate
The default of 0 is taken as "delete immediately", which is
not appropriate.
2020-11-23 18:07:56 +00:00
Daniel Holbach
aa49cfd8c4 Merge pull request #215 from evrardjp/make-lint-happier
Make go lint on cmd folder happier
2020-11-09 11:49:51 +01:00
Bryan Boreham
4c31184422 Merge pull request #213 from mvisonneau/lock_ttl
Replaced --annotationTTL with --lockTTL and fixed bug
2020-11-06 11:31:19 +00:00
Jean-Philippe Evrard
7091debe23 Make lint happier
Without this, golint is complaining about a few cosmetic issues.
This solves it, and is necessary if we want to add a lint test
in CI.
2020-11-05 10:14:39 +01:00
Jean-Philippe Evrard
ce6075c800 Remove prom-active-alerts
Prom-active-alerts command is not used, not tested, and
currently broken. Let's remove it.
2020-11-05 10:13:50 +01:00
Maxime VISONNEAU
9648d1d759 Replaced --annotationTTL with --lockTTL and made it work correctly 2020-10-30 10:39:18 +00:00
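A lock TTL of this kind usually comes down to one comparison between the lock's creation time and the current time. The sketch below shows one way to express it; lockExpired and the example values are hypothetical, and treating a zero TTL as "never expire" is an assumption rather than something the commit message states.

```go
package main

import (
	"fmt"
	"time"
)

// Example values used only for this illustration.
var (
	exampleCreated = time.Date(2020, 10, 30, 10, 0, 0, 0, time.UTC)
	exampleNow     = exampleCreated.Add(45 * time.Minute)
)

const (
	shortTTL = 30 * time.Minute
	longTTL  = time.Hour
)

// lockExpired reports whether a lock created at `created` has outlived ttl
// as of `now`. A ttl of zero or less is assumed to mean "never expire".
func lockExpired(created, now time.Time, ttl time.Duration) bool {
	if ttl <= 0 {
		return false
	}
	return !now.Before(created.Add(ttl))
}

func main() {
	fmt.Println(lockExpired(exampleCreated, exampleNow, shortTTL))
	fmt.Println(lockExpired(exampleCreated, exampleNow, longTTL))
	fmt.Println(lockExpired(exampleCreated, exampleNow, 0))
}
```

Passing `now` explicitly instead of calling time.Now inside the function keeps the check deterministic in tests.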
Jean-Philippe Evrard
e5a2d4acc7 Refactor drain/uncordon
Moving the drainer object close to its usage is more readable.
2020-10-29 11:45:20 +01:00
Jean-Philippe Evrard
72c4112e20 Use kubectl as library instead of calling from cli 2020-10-15 13:02:35 +02:00
Jean-Philippe Evrard
b0bd603931 fix: Follow DKL-DI-0004 guideline
Without this patch, we need to build a cache and then remove it.
Since apk can work with no-cache and won't leave artifacts,
we should use it.

This will make the dockle best practices scanner happier.
2020-09-11 16:53:59 +02:00
Daniel Holbach
3ebc224958 update alpine to 3.12, k8s 1.18.8 2020-08-28 10:27:39 +02:00
Daniel Holbach
16109017ce Prepare for k8s release 1.19 (Aug 25)
This is #152, #139, #127 in disguise.

	Maybe this time let it simmer a bit longer until the k8s
	release is there?
2020-08-19 17:30:00 +02:00
Daniel Holbach
8fafad18bb Revert #139
This is a follow-up to #150, so we can get a 1.4.x release
	out that will be geared towards k8s 1.1[6-8].

	Update to latest 1.17 kubectl: 1.17.7.
2020-06-26 17:30:01 +02:00
Bryan Boreham
ec75533394 Merge pull request #119 from michalschott/annotationTTL
Adding --annotation-ttl for automatic unlock
2020-05-20 11:30:44 +01:00
Michal Schott
59a6700add Renaming flag as suggested. 2020-05-05 20:52:10 +02:00
Michal Schott
64ebf53264 Typo in logic. 2020-05-05 14:32:41 +02:00
Michal Schott
1257d97ead Be clean when this feature is disabled. 2020-05-05 14:10:23 +02:00
Michal Schott
7fb16fed9b Adding annotationTTL. 2020-05-05 14:10:22 +02:00
Daniel Holbach
72a31030db replay changes from #127 2020-05-01 09:07:16 +02:00
Daniel Holbach
8e73cf224d Revert parts of #127, move to client-go/kubectl 1.17
After the release of kured 1.4.0 we should be able to go back.

	This was decided in our meeting
(https://docs.google.com/document/d/1bsHTjHhqaaZ7yJnXF6W8c89UB_yn-OoSZEmDnIP34n8/edit#heading=h.8cgszb6vuhza)

	Let's go with supporting 1.1[678] in this release.
2020-04-22 18:32:25 +02:00
Carlos Garcia Lalicata
800e9e19fb print node ID when commanding reboot, so that any monitoring tool can catch it and act on it 2020-04-20 10:58:35 +02:00
Jean-Philippe Evrard
bdd20c963c Unpin base docker images
The upside is that image building will always use the latest
stable version of the alpine OS, which might include security fixes.
The downside is that it's less reproducible, because the full
version isn't given.

While this commit isn't necessary per se, it's nice to have
an image that will be up to date when we build it.
2020-04-08 18:17:58 +02:00
Daniel Holbach
0a419d0d34 update to 1.18.0 API
confirmed by running https://github.com/kubernetes-sigs/clientgofix

	closes: #123
2020-03-30 10:11:30 +02:00
Daniel Holbach
b75aec87d7 update urls to match k8s 1.18 release 2020-03-30 10:11:30 +02:00
Peter Groenewegen
7e7430f7df Keeping alpine fresh
Updating alpine to the latest version.
Tested this version of alpine and it runs fine; this keeps dependency versions up to date.
2020-02-25 11:09:28 -08:00
Daniel Holbach
7975a78025 update to latest kubectl 1.15 2020-02-21 16:11:10 +01:00
Peter Groenewegen
f86514c1e6 Use newer version of k8s client tools
The current version of k8s has security vulnerabilities, so update to a newer version

Tested this version on our clusters
2020-02-19 07:55:03 -08:00
Praveen Adusumilli
f2ae01120a Upgrading to latest alpine (#100)
* Upgrading to latest alpine 3.10.3
2019-11-26 16:53:43 +00:00
Nighthawk22
5c21206bdb Merge branch 'master' into master 2019-10-28 10:56:13 +01:00
leigh capili
4beddb5338 Reboot only within time window specified on commandline (#66)
2019-10-23 22:23:51 -06:00
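The core of a reboot time window is a range check that also copes with windows spanning midnight. The sketch below is a simplified illustration, not the implementation from this commit: inWindow and hm are hypothetical helpers, times are reduced to minutes since midnight, and weekdays and time zones are ignored.

```go
package main

import "fmt"

// hm converts hours and minutes to minutes since midnight.
func hm(h, m int) int { return h*60 + m }

// inWindow reports whether the minute-of-day `now` falls within
// [start, end). When start is later than end, the window is assumed to
// wrap past midnight (for example 22:00 to 04:00). Equal start and end
// describe an empty window.
func inWindow(now, start, end int) bool {
	if start <= end {
		return now >= start && now < end
	}
	return now >= start || now < end
}

func main() {
	start, end := hm(22, 0), hm(4, 0) // overnight window
	fmt.Println(inWindow(hm(23, 30), start, end))
	fmt.Println(inWindow(hm(3, 59), start, end))
	fmt.Println(inWindow(hm(12, 0), start, end))
}
```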
Maximilian Zollneritsch
d1315c691e Added slack channel name configuration 2019-09-11 13:59:09 +02:00
Adam Harrison
8d809333b3 Update embedded kubectl to v1.14.1 2019-05-16 17:07:17 +01:00