65 Commits

Author SHA1 Message Date
Jean-Philippe Evrard
cbf9c46474 Add package comments to pass linters
Without this, you get an error about the lack of package comments.
"package-comments: should have a package comment (revive)"

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2025-08-30 17:20:14 +02:00
7h3-3mp7y-m4n
d677b436a0 fix lint of files
Signed-off-by: 7h3-3mp7y-m4n <emailtorash@gmail.com>
2025-08-30 17:20:14 +02:00
Jean-Philippe Evrard
455b3df0dc improve tests (#1021)
* Add e2e test concurrency w/ signal

This will help make sure the big refactoring does not break
the main features.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>

* Add podblocker test

Extends test coverage to ensure nothing breaks

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>

* Rename "version" with "variant" in tests

For tests that do not run against different Kubernetes versions
but have different test subcases/variants, rephrase the wording
"versions", as it is confusing.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>

* Fix Staticcheck's SA1024 (subset with dupe chars)

This replaces Trim, which takes a cutset, with Replace.

This clarifies the intent to remove a substring.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>

* Fix Staticcheck's ST1005

According to staticcheck, error strings should not be capitalized (ST1005).

This changes the cases for our errors.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>

* Fix incorrect string prints

A few strings have evolved until all the templating was removed
from them, yet the calls still use the formatting functions.

This is incorrect, and does not pass staticcheck checks SA1006 and S1039.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>

* Add staticcheck in make tests

Without this, people like myself will forget to run staticcheck.

This fixes it by making it part of make tests, which will run
with all the fast tests in CI.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>

---------

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2025-01-09 14:42:28 -08:00
Jean-Philippe Evrard
94e73465ad Add stdout and stderr to log info
Without this, we are losing features from the previous logrus
implementation. Now, we log the stdout and stderr of
each call.

In addition, we ensure the calls to the log methods are
ready for the future switch away from logrus.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-11-06 08:41:19 +01:00
Jean-Philippe Evrard
f20a1ddd05 Fix goroutine leak
Without this patch, we use WriterLevel, which spawns
goroutines. As we do it at every call of the util commands,
we spawn goroutines at every check.

This is a problem, as the leaked goroutines accumulate and
lead to memory management issues.

This fixes it by using a buffer for stdout and stderr, then
logging the results after the command was executed.

To make sure the logging happens in the same place, I inlined
the code from utils. This results in duplicated code.

However, this is not a big problem as:
- It makes the code more readable
- The implementation between checkers and rebooters _ARE_
  different -- One definitely NEEDS privileges, while the other
  does not... Which could lead to later improvements.

Removing the "utils" package is not a big deal, as the test
coverage was kept (it is even a small win in itself, as a
"utils" package is an anti-pattern).

Partial-Fix: #1004
Fixes: #1013
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-11-05 22:11:13 +01:00
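A minimal Go sketch of the buffered approach described above, using only os/exec and the standard log package; runAndLog is a hypothetical name, not the kured function. With bytes.Buffer as Stdout/Stderr, the copying done inside exec.Cmd finishes when Run returns, whereas a logger pipe writer keeps its reading goroutine alive until it is closed.

```go
package main

import (
	"bytes"
	"log"
	"os/exec"
)

// runAndLog is a hypothetical helper. stdout and stderr go into buffers;
// exec.Cmd's internal copying stops once Run returns, so nothing outlives
// the call. The output is logged once, after the command has finished.
func runAndLog(name string, args ...string) (stdout, stderr string, err error) {
	var outBuf, errBuf bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stdout = &outBuf
	cmd.Stderr = &errBuf
	err = cmd.Run()
	log.Printf("cmd=%q args=%q stdout=%q stderr=%q err=%v",
		name, args, outBuf.String(), errBuf.String(), err)
	return outBuf.String(), errBuf.String(), err
}

func main() {
	if _, _, err := runAndLog("sh", "-c", "echo out; echo err >&2"); err != nil {
		log.Fatal(err)
	}
}
```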
Jean-Philippe Evrard
f559a95304 Make the internal blockers implementation internal
At the same time, this removes the public alert package.
Alert was only used inside the prometheus blocker, so removing it
simplifies the code.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard
73f00ce445 Make all the internal validations ... internal
main was doing flag validation through pflags, then
did further validation by invoking the constructors.

With the recent refactor on the commit "Refactor constructors"
in this branch, we moved away from that pattern.

However, it means we reintroduced a log dependency into our
external API, and the external API now had extra validations
regardless of the type.

This is unnecessary, so I moved away from that pattern and
moved all the validation back into a central place, internal,
which only does what kured needs, without exposing
it to users. The users could still theoretically use the proper
constructors for each type, as they would validate just fine.

The only thing they would lose is the kured internal decision
of validation/precedence.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard
626db87158 Add error to reboot interface
Without this, it is impossible to bubble errors up to main.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard
67df0e935a Remove deprecated PollWithContext
Replaced with PollUntilContextTimeout.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard
d8b9e31ac9 Refactor constructors
Without this, a bit of the validation is done in main, while
the rest is done in each constructor.

This fixes it by creating a new global constructor in checkers/reboot to
handle all the cases and bubble up the errors.

I preferred keeping the old constructors and calling them; this
way someone wanting to fork the code could still directly create
the right checker/rebooter, without the arbitrary decisions
taken by the generic constructor.

However, kured is not a library, and was never intended to be
usable in forks, so we might want to reconsider this in part 2 of the
refactor.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard
104a745305 Make locks more generic
Implementation details of lock should not leak into the calling
methods.

Without this patch, calls are a bit more complex
and error handling is harder to find.

This is a problem for long term maintenance, as it
is tougher to refactor the locks without impacting the main.

Decoupling the two (main usage of the lock, and the lock
themselves) will allow us to introduce other kinds of locks
easily.

I solved this by moving into the daemonsetlock package:
- all the methods for managing locks from the main.go
  functions. Those were mostly doing error handling,
  and that code became a no-op by introducing multiple
  daemonsetlock types
- the lock release delay, as part of the lock info

I also did not like the pattern included in the Test method,
which added a reference to nodeMeta: it was not very clear whether
Test was storing the current metadata of the node
or returning the current state. (Metadata here only means unschedulable.)

The problem I saw was that the metadata was silently
mutated from a lock Test method, which was not obvious at all.

Instead, I chose to explicitly return the lock data.

I also made the Acquire lock method explicitly pass
the node metadata as structured information,
rather than an interface{}. This is a bit more fragile
at runtime, but I prefer having very explicit errors if
the locks are incorrect, rather than having to deal with
unvalidated data.

The lock release delay was part of the rebootasrequired
loop; I believe it makes more sense as part of the
Release method itself, for readability. This hides the
delay in an implementation detail, but keeps the
rebootasrequired goroutine more readable.

Instead of passing rebootDelay as a parameter of the
rebootasrequired method, this refactor moves creation of the lock
object into the main loop, close to all the variables, and then
passes the lock object to rebootasrequired. This makes the
call to rebootasrequired clearer, and the lock now
encompasses everything needed to acquire, release, or get
info about the lock.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
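A minimal in-memory Go sketch of the lock shape described above: Acquire takes structured metadata instead of an interface{}, Holding returns the lock data explicitly instead of mutating the caller's state, and the release delay lives inside Release. Every name here (Lock, NodeMeta, LockData, memoryLock) is hypothetical; the real lock is backed by the cluster through the daemonsetlock package, not by process memory.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// NodeMeta is hypothetical structured metadata; it only tracks unschedulable.
type NodeMeta struct {
	Unschedulable bool
}

// LockData is returned explicitly by Holding instead of mutating caller state.
type LockData struct {
	NodeID   string
	Metadata NodeMeta
}

// Lock is a hypothetical interface; the release delay is an implementation detail.
type Lock interface {
	Acquire(nodeID string, meta NodeMeta) (bool, error)
	Release(nodeID string) error
	Holding(nodeID string) (bool, LockData, error)
}

// memoryLock is an in-memory stand-in for a cluster-backed lock.
type memoryLock struct {
	mu           sync.Mutex
	holder       *LockData
	releaseDelay time.Duration
}

var _ Lock = (*memoryLock)(nil)

func (l *memoryLock) Acquire(nodeID string, meta NodeMeta) (bool, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder != nil {
		// Already taken: succeed only if this node is the holder.
		return l.holder.NodeID == nodeID, nil
	}
	l.holder = &LockData{NodeID: nodeID, Metadata: meta}
	return true, nil
}

func (l *memoryLock) Release(nodeID string) error {
	time.Sleep(l.releaseDelay) // the delay lives in Release, not in the reboot loop
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == nil || l.holder.NodeID != nodeID {
		return errors.New("lock is not held by " + nodeID)
	}
	l.holder = nil
	return nil
}

func (l *memoryLock) Holding(nodeID string) (bool, LockData, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holder == nil || l.holder.NodeID != nodeID {
		return false, LockData{}, nil
	}
	return true, *l.holder, nil
}

func main() {
	var lock Lock = &memoryLock{releaseDelay: 10 * time.Millisecond}
	ok, _ := lock.Acquire("node-a", NodeMeta{Unschedulable: true})
	held, data, _ := lock.Holding("node-a")
	fmt.Println(ok, held, data.Metadata.Unschedulable) // true true true
	fmt.Println(lock.Release("node-a"))                // <nil>
}
```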
Jean-Philippe Evrard
42c4b8bc53 Revert to use a constructor again
Without this, we have no validation of the data in command/signal
reboot.

This was not a problem in the first refactor, as the constructor
was a dummy one, without validation.

However, as we refactored, we now have code in the root method
that validates the reboot command. This can now be
encompassed in the constructor.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
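A minimal Go sketch of constructors that validate command and signal reboot settings once, so the root method no longer has to. The Rebooter interface (no node ID, an error return) follows the direction of the commits in this log, but all names are hypothetical, and sending the signal to PID 1 is an assumption of this sketch. The example only exercises construction; it never reboots anything. syscall.Kill makes it Unix-only.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
)

// Rebooter is a hypothetical interface: no node ID, and an error return so
// failures can bubble up to main.
type Rebooter interface {
	Reboot() error
}

// CommandRebooter runs a reboot command.
type CommandRebooter struct {
	args []string
}

// NewCommandRebooter validates its input once, at construction time.
func NewCommandRebooter(args []string) (*CommandRebooter, error) {
	if len(args) == 0 || args[0] == "" {
		return nil, errors.New("reboot command must not be empty")
	}
	return &CommandRebooter{args: args}, nil
}

func (c *CommandRebooter) Reboot() error {
	return exec.Command(c.args[0], c.args[1:]...).Run()
}

// SignalRebooter sends a signal; targeting PID 1 is an assumption here.
type SignalRebooter struct {
	signal syscall.Signal
}

// NewSignalRebooter rejects non-positive signal numbers.
func NewSignalRebooter(sig int) (*SignalRebooter, error) {
	if sig <= 0 {
		return nil, fmt.Errorf("invalid reboot signal %d", sig)
	}
	return &SignalRebooter{signal: syscall.Signal(sig)}, nil
}

func (s *SignalRebooter) Reboot() error {
	return syscall.Kill(1, s.signal)
}

var (
	_ Rebooter = (*CommandRebooter)(nil)
	_ Rebooter = (*SignalRebooter)(nil)
)

func main() {
	// Only construction is exercised here; nothing is rebooted.
	if _, err := NewCommandRebooter(nil); err != nil {
		fmt.Println(err) // reboot command must not be empty
	}
	if _, err := NewSignalRebooter(0); err != nil {
		fmt.Println(err) // invalid reboot signal 0
	}
}
```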
Jean-Philippe Evrard
3895a2f6d3 Remove nodeID from rebooter interface
Without this patch, the rebooter interface has data which is
not related to the rebooter interface. This should get removed
to make it easier to maintain.

The only loss is in the logging, which mentioned the node.

In order to not have a regression compared to [1], this ensures
that at least the node to be rebooted appears in the main.

[1]:  https://github.com/kubereboot/kured/pull/134

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard
f43ed1484e Cleanup checkers
Without this, the checkers are only shell calls: test -f
sentinelFile, or sentinelCommand.

This changes the behaviour of the existing code: the sentinelFile
checker now tests for the file directly, while the sentinel command
stays a command.

However, to avoid having validation in the root loop, it moves
to a constructor to clean up the code.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard
eeedf203c3 Extract blockers
This will make it easier to manipulate main in the future.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-18 00:53:38 +02:00
Jean-Philippe Evrard
574065ff8a Add checker interface
This will be useful to refactor the checkers loop.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-18 00:53:38 +02:00
Jean-Philippe Evrard
3bfdd76f29 Extract privileged command wrapper into util
Without this, the code is a bit harder to read.

This fixes it by extracting the method.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-18 00:53:38 +02:00
Jean-Philippe Evrard
f34864758e Cleanup rebooter interface
Without this, the interface and the code to reboot is
a bit more complex than it should be.

We do not need setters and getters, as we are just
instantiating a single instance of a rebooter interface.

We create it based on user input, then pass the object
around. This should clean up the code.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-18 00:53:38 +02:00
Christian Hopf
87202d8fcf Add signal-reboot (#814)
* feat: sentinel-command without nsenter by default

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: no readonly mount

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: mount at different folder

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* feat: add signal-reboot

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* feat: make signal configurable and add tests

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* build: rename job

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* cleanup: linter

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* build: also adjust signal manifest

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* test: add e2e-tests

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: small code restructure

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: adjust version-range

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

---------

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
2024-01-06 10:25:11 +01:00
Jim
9a4b8fdb32 add argument to invert the behavior of alert-filter-regexp (#786)
* add argument to invert the behavior of alert-filter-regexp

Signed-off-by: Jim Liming <james.k.liming@gmail.com>

* feat: small code-improvements

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

---------

Signed-off-by: Jim Liming <james.k.liming@gmail.com>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Co-authored-by: Christian Kotzbauer <git@ckotzbauer.de>
2023-08-14 18:52:12 +02:00
Thomas Stringer
3b9b190422 Add multiple concurrent node reboot feature (#660)
* Add ability to have multiple nodes get a lock

Currently in kured a single node can get a lock with Acquire. There
could be situations where multiple nodes might want a lock in the event
that a cluster can handle multiple nodes being rebooted. This adds the
side-by-side implementation for a multiple node lock situation.

Signed-off-by: Thomas Stringer <thomas@trstringer.com>

* Refactor to use the same code path for a single lock and a multilock

Signed-off-by: Thomas Stringer <thomas@trstringer.com>

* test: force rebuild

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* build: log pod-logs

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: change condition

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* build: fix test-script

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* build: add concurrent test

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: final changes

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

---------

Signed-off-by: Thomas Stringer <thomas@trstringer.com>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Co-authored-by: Christian Kotzbauer <git@ckotzbauer.de>
2023-08-14 18:33:18 +02:00
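A minimal Go sketch of the "same code path for a single lock and a multilock" idea: a lock with a maximum number of owners, where a single-node lock is simply maxOwners == 1. multiLock is a hypothetical, single-process model that is not safe for concurrent use; the real lock state is stored in the cluster.

```go
package main

import "fmt"

// multiLock is a hypothetical, single-process model: up to maxOwners nodes
// may hold it at once. A single-node lock is just maxOwners == 1, so one code
// path serves both cases. It is not safe for concurrent use.
type multiLock struct {
	maxOwners int
	owners    []string
}

// Acquire succeeds if the node already holds the lock or a slot is free.
func (m *multiLock) Acquire(node string) bool {
	for _, o := range m.owners {
		if o == node {
			return true
		}
	}
	if len(m.owners) >= m.maxOwners {
		return false
	}
	m.owners = append(m.owners, node)
	return true
}

// Release frees the node's slot; it reports false if the node held none.
func (m *multiLock) Release(node string) bool {
	for i, o := range m.owners {
		if o == node {
			m.owners = append(m.owners[:i], m.owners[i+1:]...)
			return true
		}
	}
	return false
}

func main() {
	lock := &multiLock{maxOwners: 2}
	fmt.Println(lock.Acquire("a"), lock.Acquire("b"), lock.Acquire("c")) // true true false
	lock.Release("a")
	fmt.Println(lock.Acquire("c")) // true
}
```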
atighineanu
bab1425e1a removed notifications/slack package
In this PR the slack-hook-url is translated
 into shoutrrr syntax. Therefore, slack pack
 age as well as checks for slack-hook-url in
 drain and reboot functions are removed.
 Also added a unit test for flagCheck(), this
 function also checks the (slack)URL syntax.
2021-10-07 10:37:47 +02:00
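A minimal Go sketch of translating a Slack webhook URL into a shoutrrr-style URL while checking its syntax. slackHookToShoutrrr is a hypothetical helper, and the three-token slack:// form is an assumption based on shoutrrr releases of that era; check the shoutrrr documentation for the version in use. Note it uses TrimPrefix, not a cutset-based Trim.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// slackHookToShoutrrr is a hypothetical helper that turns
// https://hooks.slack.com/services/A/B/C into slack://A/B/C, assuming the
// three-token slack:// form. It rejects URLs that do not have that shape.
func slackHookToShoutrrr(hook string) (string, error) {
	u, err := url.Parse(hook)
	if err != nil {
		return "", err
	}
	if !strings.HasPrefix(u.Path, "/services/") {
		return "", errors.New("slack-hook-url: path must start with /services/")
	}
	tokens := strings.Split(strings.TrimPrefix(u.Path, "/services/"), "/")
	if len(tokens) != 3 {
		return "", fmt.Errorf("slack-hook-url: expected 3 tokens, got %d", len(tokens))
	}
	for _, t := range tokens {
		if t == "" {
			return "", errors.New("slack-hook-url: empty token")
		}
	}
	return "slack://" + strings.Join(tokens, "/"), nil
}

func main() {
	notifyURL, err := slackHookToShoutrrr("https://hooks.slack.com/services/T01/B02/xyz")
	fmt.Println(notifyURL, err) // slack://T01/B02/xyz <nil>
}
```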
Matt Jeanes
6af3f1abc1 Add --alert-firing-only parameter to only consider firing alerts 2021-07-27 11:23:10 +01:00
SimeonPoot
c7d5810503 Restructuring Prometheus client, added unit-tests to regex-queries active alerts (#386)
* prometheus labels incl tests

* enable label in main, add log, docs

* revert the option to query by label

* revert the option to query by label

* PromClient instantiate by func,white space removal

* revert whitespace fix for readability.

* revert removal of newlines for readability

* rename New to NewPromClient to improve readability

Co-authored-by: simp <simp@saxobank.com>
2021-07-27 07:09:46 +02:00
Jack Francis
390f6e9f99 chore: retry daemonset get operations 2021-04-07 09:27:05 -07:00
David Sauer
b3e39418ba cache taint state to avoid unnecessary API calls 2021-01-06 21:51:43 +01:00
David Sauer
34446f949e Allow to disable tainting during pending node reboot by setting the taint name to an empty string. 2021-01-06 21:39:32 +01:00
David Sauer
10d95c426f fixed type & renamed variable 2021-01-06 21:29:35 +01:00
David Sauer
e4c684c3af taint node with PreferNoSchedule to prevent receiving (and double draining) additional pods from other rebooting nodes 2021-01-06 21:23:40 +01:00
Daniel Jimenez Garcia
f059cec794 GH-125, add additional parameters to override the drain/reboot slack messages 2020-11-25 16:19:31 +00:00
Daniel Holbach
2fef8b1b12 Merge pull request #206 from chentex/time-wrap
Added support for time wrap in timewindow.Contains
2020-11-25 10:28:57 +01:00
Daniel Holbach
7461ab8d95 Merge pull request #222 from evrardjp/make-lint-happier-for-pkg-folder
Make go lint on pkg folder happier
2020-11-09 11:50:58 +01:00
Daniel Holbach
aa49cfd8c4 Merge pull request #215 from evrardjp/make-lint-happier
Make go lint on cmd folder happier
2020-11-09 11:49:51 +01:00
Bryan Boreham
4c31184422 Merge pull request #213 from mvisonneau/lock_ttl
Replaced --annotationTTL with --lockTTL and fixed bug
2020-11-06 11:31:19 +00:00
Jean-Philippe Evrard
5d88e6c6db Make lint happier in pkg folder
Without this patch, lint will complain about a few cosmetic details.
2020-11-05 11:01:49 +01:00
Jean-Philippe Evrard
7091debe23 Make lint happier
Without this, golint is complaining about a few cosmetic changes.
This solves it, and is necessary if we want to add a lint test
in CI.
2020-11-05 10:14:39 +01:00
Maxime VISONNEAU
9648d1d759 Replaced --annotationTTL with --lockTTL and made it work correctly 2020-10-30 10:39:18 +00:00
Jean-Philippe Evrard
19bf5bf224 Bump prometheus
This is required by the vendoring of kubectl.
2020-10-15 13:02:39 +02:00
Vicente Zepeda Mas
2f740b7f9a Added support for time wrap in timewindow.Contains
Add test scenarios to test new cases
Organize test scenarios chronologically

Signed-off-by: Vicente Zepeda Mas <vzepedamas@suse.com>
2020-09-28 13:58:43 +02:00
Daniel Holbach
16109017ce Prepare for k8s release 1.19 (Aug 25)
This is #152, #139, #127 in disguise.

Maybe this time let it simmer a bit longer until the k8s
release is there?
2020-08-19 17:30:00 +02:00
Daniel Holbach
8fafad18bb Revert #139
This is a follow-up to #150, so we can get a 1.4.x release
out that will be geared towards k8s 1.1[6-8].

Update to latest 1.17 kubectl: 1.17.7.
2020-06-26 17:30:01 +02:00
Bryan Boreham
ec75533394 Merge pull request #119 from michalschott/annotationTTL
Adding --annotation-ttl for automatic unlock
2020-05-20 11:30:44 +01:00
Michal Schott
cf03bc587c Adding unit tests for ttlExpired. 2020-05-05 22:37:18 +02:00
Michal Schott
615e3d4840 Calculate time difference easier. 2020-05-05 14:10:23 +02:00
Michal Schott
7fb16fed9b Adding annotationTTL. 2020-05-05 14:10:22 +02:00
Daniel Holbach
72a31030db replay changes from #127 2020-05-01 09:07:16 +02:00
Daniel Holbach
8e73cf224d Revert parts of #127, move to client-go/kubectl 1.17
After the release of kured 1.4.0 we should be able to go back.

This was decided in our meeting
(https://docs.google.com/document/d/1bsHTjHhqaaZ7yJnXF6W8c89UB_yn-OoSZEmDnIP34n8/edit#heading=h.8cgszb6vuhza)

Let's go with supporting 1.1[678] in this release.
2020-04-22 18:32:25 +02:00
Daniel Holbach
0a419d0d34 update to 1.18.0 API
confirmed by running https://github.com/kubernetes-sigs/clientgofix

closes: #123
2020-03-30 10:11:30 +02:00
Nighthawk22
5c21206bdb Merge branch 'master' into master 2019-10-28 10:56:13 +01:00
leigh capili
4beddb5338 Reboot only within time window specified on commandline (#66)
Reboot only within time window specified on commandline
2019-10-23 22:23:51 -06:00