kured

mirror of https://github.com/kubereboot/kured.git synced 2026-05-19 06:46:44 +00:00

Author	SHA1	Message	Date
Daniel Kvist	b108aa4d2d	Support json logformatter This commit introduces a new flag '--log-format' that allows a user to configure json logging on the pods. If the log-format is not specified, the formatter will default to the existing text formatter.	2021-10-25 14:38:53 +02:00
Jack	3c2508050d	fix: don't use nil context in drain helper	2021-09-27 12:43:20 -07:00
Cameron McAvoy	cee15cfc32	Add force-reboot and drain timeouts to chart config and ds	2021-09-15 10:42:50 -05:00
Daniel Holbach	0955403470	Merge pull request #429 from weaveworks/alpine-3.14 build: updated to alpine@3.14	2021-08-30 10:54:35 +02:00
Christian Kotzbauer	9473f831be	build: updated to alpine@3.14 Signed-off-by: Christian Kotzbauer <christian.kotzbauer@gmail.com>	2021-08-25 20:19:03 +02:00
Andres Morey	3c5eb968d3	Add `reboot-delay` command line argument Currently, kured issues the system reboot command immediately after kubectl drain finishes. This is a problem for processes that need extra time to finish but aren't running on pods and therefore aren't controlled by kubectl drain (e.g. de-registering nodes from external load balancers). This patch solves the problem by introducing a `reboot-delay` command line argument that can be used to add a delay after kubectl drain finishes but before the reboot command is issued.	2021-08-03 16:48:25 +03:00
Matt Jeanes	6af3f1abc1	Add --alert-firing-only parameter to only consider firing alerts	2021-07-27 11:23:10 +01:00
SimeonPoot	c7d5810503	Restructuring Prometheus client, added unit-tests to regex-queries active alerts (#386) * prometheus labels incl tests * enable label in main, add log, docs * revert the option to query by label * revert the option to query by label * PromClient instantiate by func, whitespace removal * revert whitespace fix for readability. * revert removal of newlines for readability * rename New to NewPromClient to improve readability Co-authored-by: simp <simp@saxobank.com>	2021-07-27 07:09:46 +02:00
Danny Kulchinsky	c826d73695	fix slack deprecation notice	2021-05-28 13:52:01 -04:00
Jean-Philippe Evrard	79f22cee67	Merge branch 'main' into release-lock-delay	2021-04-14 09:48:28 +02:00
Steffen Pingel	f7b3de36a6	Add parameter for delaying release of lock This supports throttling of reboots across the cluster and allows rebooted nodes to reschedule pods, e.g. to synchronize replicated state before rebooting the next node.	2021-04-13 10:14:14 +02:00
Cameron McAvoy	25dcf3cb12	Expose SkipWaitForDeleteTimeoutSeconds and explicitly return when cordoning fails	2021-04-08 09:52:15 -05:00
Cameron McAvoy	5a86ef40e8	Update the default drain timeout to be infinite	2021-04-07 17:17:33 -05:00
Cameron McAvoy	2400f34cc0	Don't panic if the cordon fails and force-reboot is true	2021-04-07 14:58:21 -05:00
Cameron McAvoy	8db5650510	Refactor force-drain to be a drain-timeout in general	2021-04-07 12:57:01 -05:00
Cameron McAvoy	65292983f2	Add force-reboot after force-timeout duration has been exceeded	2021-04-07 09:39:01 -05:00
Jean-Philippe Evrard	4d45fa8bdb	Fix invoke reboot for custom commands Without this patch, the rebootCommand passed to invokeReboot is ignored, and the command used for reboot is always systemctl reboot. This is a problem, as we are aiming for flexible commands for this release. This fixes it by restoring the previous behaviour before commit [1] happened. [1]: `694957d56e`	2021-04-02 09:15:59 +02:00
atighineanu	694957d56e	Implement universal notification mechanism This patch gives the possibility to send notifications across different technologies. Also, this patch makes slack-hook-url, slack-username and slack-channel deprecated (informed by a warning). Also, updated the documentation (Readme).	2021-03-29 11:26:18 +02:00
Jean-Philippe Evrard	5930d733f8	Fix the Fatal calls using formatting Without this, go test will rightfully fail. This is a problem, as we don't have go test enabled, but we want to have this in the future. This should fix it.	2021-03-29 09:50:56 +02:00
Jean-Philippe Evrard	fd63e9a74b	Add flexible commands parameters Without this patch, you cannot configure the reboot command to use, or use another command to trigger a reboot. This is a problem, as multiple users have asked for it in the past, and we are lacking flexibility. This fixes it by introducing two new parameters, - one to provide a custom reboot command. This should help people running kured on non-systemd OS - one to provide a custom sentinel command. This should help people running non-Ubuntu OS, as they can directly use their command instead of generating a file (useful for CentOS/SUSE) For this, several refactors had to be done, to remove global state in some functions. Making those functions closer to "pure functions" helps us increase our test coverage here and later. As commandReboot was very close to rebootCommand, the function to reboot the node has been renamed to invokeReboot.	2021-03-29 09:50:56 +02:00
Jean-Philippe Evrard	837bd4eb2a	Refactor reboot blocks Without this patch, we rely on global state in many functions for which we check the reboot blockers. This is a problem, as it's harder to test. This patch fixes it by refactoring the reboot blockers. This also includes a first series of unit tests for our main.	2021-03-29 09:50:56 +02:00
Jean-Philippe Evrard	15c57927c8	Update the deprecated DeleteLocalData DeleteLocalData was deprecated for users of kubectl in 0.20 [1]. At the same time of the deprecation, the relevant code was also removed [2] without warning: The DeleteLocalData from the helper structure was simply renamed DeleteEmptyDirData, without shims on the exposed pkg. This is a problem, as it completely breaks kured. This should fix it, by using the new field name. [1]: `56ea9621b7` [2]: `56ea9621b7 (diff-041bdcdedca650a38a8d82cf15ab6f3665b7b84a0fb44a8bb5dcdc5cd944c63d)`	2021-03-22 14:28:17 +01:00
Daniel Holbach	f6ada05c5d	Merge pull request #320 from dholbach/alpine-3.13 update to alpine 3.13	2021-03-10 08:50:42 +01:00
Daniel Holbach	355813de30	update to alpine 3.13 Signed-off-by: Daniel Holbach <daniel@weave.works>	2021-03-10 08:10:36 +01:00
Daniel Holbach	250b9bad05	Merge pull request #296 from jackfrancis/node-annotations add node annotations to identify kured reboot operations	2021-03-09 10:14:46 +01:00
Jack Francis	baf83408b8	add node annotations adds a new --annotate-nodes daemonset runtime argument, which does the following when enabled: - adds a new node annotation "weave.works/kured-most-recent-reboot-needed" with a value of the current RFC3339 timestamp as soon as kured identifies that a node needs to be rebooted - adds a new node annotation "weave.works/kured-reboot-in-progress" with a value of the current RFC3339 timestamp as soon as kured identifies that a node needs to be rebooted - removes the annotation "weave.works/kured-reboot-in-progress" when kured has successfully rebooted the node	2021-03-08 17:22:47 -08:00
Jack Francis	93c8242b89	always drain before reboot This changes the pre-reboot drain functionality so that it always runs, regardless of the value of the Unschedulable node property. Because kubectl drain is idempotent, we shouldn't have to worry about whether the node has already been set to Unschedulable (perhaps due to a prior, unsuccessful loop of the kured reboot cycle): we can run it over and over again. And because this drain func actually does a cordon + drain (and it only performs the drain if a cordon is successful), we can be sure that we aren't going to be thrashing this node w/ respect to scheduled pods. This also fixes an edge case: if the node has been marked Unschedulable out-of-band, but workloads remain Running on this node, kured will no longer reboot the node's underlying VM/machine while it is actively running pods.	2021-03-08 17:20:31 -08:00
Daniel Holbach	fade706cbf	Merge pull request #250 from damoon/19-PreferNoSchedule implement issue-19 add prefer no schedule taint to avoid double draining of pods	2021-01-12 14:28:23 +01:00
David Sauer	5a4e197d27	change taint config to be disabled by default	2021-01-11 18:24:17 +01:00
David Sauer	3a35d6a46c	remove taint in case the reboot is not needed anymore	2021-01-06 22:21:41 +01:00
David Sauer	34446f949e	Allow to disable tainting during pending node reboot by setting the taint name to an empty string.	2021-01-06 21:39:32 +01:00
David Sauer	e4c684c3af	taint node with PreferNoSchedule to prevent receiving (and double draining) additional pods from other rebooting nodes	2021-01-06 21:23:40 +01:00
David Sauer	204a06ca38	fixed call of log.Fatal instead of log.Fatalf	2021-01-06 21:23:40 +01:00
David Sauer	48897eb0ab	avoid indentations to ease readability	2021-01-06 21:23:40 +01:00
Jean-Philippe Evrard	897834a9db	Temporarily workaround alpine issue Until a new alpine image is created, we should ensure the latest packages are used, and therefore we should upgrade default installed packages. Without this patch, we'll have outdated and vulnerable packages until a new 3.12 image is released. This is a problem, as we'll publish broken images. This should temporarily workaround it, at the expense of larger images (contains package cache)	2020-12-14 11:20:27 +01:00
Daniel Jimenez Garcia	51cab0dedc	rename message template parameters so they are not related to slack	2020-11-25 16:20:54 +00:00
Daniel Jimenez Garcia	f059cec794	GH-125, add additional parameters to override the drain/reboot slack messages	2020-11-25 16:19:31 +00:00
Bryan Boreham	1ba3acab98	Drain: allow pods grace period to terminate The default of 0 is taken as "delete immediately", which is not appropriate.	2020-11-23 18:07:56 +00:00
Daniel Holbach	aa49cfd8c4	Merge pull request #215 from evrardjp/make-lint-happier Make go lint on cmd folder happier	2020-11-09 11:49:51 +01:00
Bryan Boreham	4c31184422	Merge pull request #213 from mvisonneau/lock_ttl Replaced --annotationTTL with --lockTTL and fixed bug	2020-11-06 11:31:19 +00:00
Jean-Philippe Evrard	7091debe23	Make lint happier Without this, golint is complaining about a few cosmetic changes. This solves it, and is necessary if we want to add a lint test in CI.	2020-11-05 10:14:39 +01:00
Jean-Philippe Evrard	ce6075c800	Remove prom-active-alerts Prom-active-alerts command is not used, not tested, and currently broken. Let's remove it.	2020-11-05 10:13:50 +01:00
Maxime VISONNEAU	9648d1d759	Replaced --annotationTTL with --lockTTL and made it work correctly	2020-10-30 10:39:18 +00:00
Jean-Philippe Evrard	e5a2d4acc7	Refactor drain/uncordon Moving the drainer object close to its usage is more readable.	2020-10-29 11:45:20 +01:00
Jean-Philippe Evrard	72c4112e20	Use kubectl as library instead of calling from cli	2020-10-15 13:02:35 +02:00
Jean-Philippe Evrard	b0bd603931	fix: Follow DKL-DI-0004 guideline Without this patch, we need to build a cache, remove it. Since apk allows to work with no-cache and won't leave artifacts, we should use it. This will make the dockle best practices scanner happier.	2020-09-11 16:53:59 +02:00
Daniel Holbach	3ebc224958	update alpine to 3.12, k8s 1.18.8	2020-08-28 10:27:39 +02:00
Daniel Holbach	16109017ce	Prepare for k8s release 1.19 (Aug 25) This is #152, #139, #127 in disguise. Maybe this time let it simmer a bit longer until the k8s release is there?	2020-08-19 17:30:00 +02:00
Daniel Holbach	8fafad18bb	Revert #139 This is a follow-up to #150, so we can get a 1.4.x release out that will be geared towards k8s 1.1[6-8]. Update to latest 1.17 kubectl: 1.17.7.	2020-06-26 17:30:01 +02:00
Bryan Boreham	ec75533394	Merge pull request #119 from michalschott/annotationTTL Adding --annotation-ttl for automatic unlock	2020-05-20 11:30:44 +01:00

94 Commits