Without this, you get an error about the lack of package comments.
"package-comments: should have a package comment (revive)"
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
* Add e2e test concurrency w/ signal
This will help make sure the big refactoring does not break
the main features.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
* Add podblocker test
Extends test coverage to ensure nothing breaks
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
* Rename "version" with "variant" in tests
For tests that do not run against different Kubernetes versions
but have different subcases/variants, replace the wording
"versions", as it is confusing.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
* Fix Staticcheck's SA1024 (cutset with duplicate chars)
This replaces Trim, which takes a cutset, with Replace.
This clarifies the intent to remove a substring.
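To illustrate the difference, here is a minimal sketch. The path value and the helper names `trimCutset` and `removeSubstring` are made up for this example and are not taken from the kured code.

```go
package main

import (
	"fmt"
	"strings"
)

// trimCutset shows the old behaviour: the second argument is a SET of
// characters, so any leading/trailing character from that set is removed.
func trimCutset(path string) string {
	return strings.Trim(path, "/services/")
}

// removeSubstring removes the literal substring "/services/" instead.
func removeSubstring(path string) string {
	return strings.Replace(path, "/services/", "", -1)
}

func main() {
	path := "/services/sec/ret" // hypothetical input
	fmt.Println(trimCutset(path))      // prints "t"
	fmt.Println(removeSubstring(path)) // prints "sec/ret"
}
```

Every character of "sec/re" also belongs to the cutset, so Trim removes far more than the intended prefix.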
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
* Fix Staticcheck's ST1005
According to staticcheck, error strings should not be capitalized (ST1005).
This changes the case of our error strings accordingly.
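A small sketch of why the lowercase form reads better: error strings are often wrapped into longer messages. The error text and the `describe` helper below are hypothetical, not taken from the kured code.

```go
package main

import (
	"errors"
	"fmt"
)

// errLockHeld starts with a lowercase letter, as ST1005 asks.
var errLockHeld = errors.New("lock already held")

// describe wraps the error; the wrapped text lands mid-sentence,
// where a capital letter would look out of place.
func describe(node string) error {
	return fmt.Errorf("cannot reboot node %s: %w", node, errLockHeld)
}

func main() {
	fmt.Println(describe("node-1"))
}
```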
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
* Fix incorrect string prints
A few messages evolved until they no longer contained any
format verbs, yet they were still passed to the formatting functions.
This is incorrect, and does not pass staticcheck SA1006 and S1039.
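A sketch of the kind of change involved, with the flagged forms kept as comments only. The message text and the `message` helper are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// message returns a plain string; no formatting is needed.
func message() string {
	// Before (flagged by S1039): return fmt.Sprintf("reboot required")
	return "reboot required"
}

func main() {
	msg := message()
	// Before (flagged by SA1006): fmt.Printf(msg)
	// A "%" inside msg would have been interpreted as a format verb.
	fmt.Fprintln(os.Stdout, msg)
}
```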
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
* Add staticcheck in make tests
Without this, people like myself will forget to run staticcheck.
This fixes it by making it part of make tests, which will run
with all the fast tests in CI.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
---------
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Without this, we are losing features based on the previous logrus
implementation. Now, we log the stdout and stderr of
each call.
In addition, we ensure the calls to the log. methods are
ready for the switch away from logrus in the future.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Without this patch, we use WriterLevel, which spawns
goroutines. As we do it at every call of the util commands,
we spawn goroutines at every check.
This is a problem as it leads to memory management issues.
This fixes it by using a buffer for stdout and stderr, then
logging the results after the command was executed.
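A minimal sketch of the buffered approach, assuming `sh` is available on the host. `runAndLog` and `runShell` are hypothetical names, and the standard `log` package stands in for whichever logger is in use.

```go
package main

import (
	"bytes"
	"log"
	"os/exec"
)

// runAndLog captures stdout and stderr in memory and logs them once,
// after the command has finished. No log writer is handed to the
// command, so nothing has to be closed afterwards.
func runAndLog(cmd *exec.Cmd) (string, string, error) {
	var outBuf, errBuf bytes.Buffer
	cmd.Stdout = &outBuf
	cmd.Stderr = &errBuf
	err := cmd.Run()
	log.Printf("cmd=%q stdout=%q stderr=%q", cmd.Args, outBuf.String(), errBuf.String())
	return outBuf.String(), errBuf.String(), err
}

// runShell is a small convenience wrapper used for the example.
func runShell(script string) (string, string, error) {
	return runAndLog(exec.Command("sh", "-c", script))
}

func main() {
	if _, _, err := runShell("echo out; echo err 1>&2"); err != nil {
		log.Fatal(err)
	}
}
```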
To make sure the logging still happens in the same place, I inlined
the code from utils. This results in duplicated code.
However, this is not a big problem as:
- It makes the code more readable
- The implementations of checkers and rebooters _ARE_
different -- one definitely NEEDS privileges, while the other
does not... which could lead to later improvements.
Removing a "utils" package is not really a big deal (it
is kind of a win in itself, as it is an anti-pattern), as the
test coverage was kept.
Partial-Fix: #1004
Fixes: #1013
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
At the same time, this removes the public alert package.
Alert was only used inside the prometheus blocker, so removing
it simplifies the code.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
The main was doing flag validation through pflags, then
did further validation by involving the constructors.
With the recent refactor in the commit "Refactor constructors"
in this branch, we moved away from that pattern.
However, it means we reintroduced a log dependency into our
external API, and the external API now had extra validations
regardless of the type.
This is unnecessary, so I moved away from that pattern, and
moved back all the validation into a central place, internal,
which is only doing what kured would desire, without exposing
it to users. The users could still theoretically use the proper
constructors for each type, as they would validate just fine.
The only thing they would lose is the kured internal decision
of validation/precedence.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Without this, a bit of the validation is done in main, while
the rest is done in each constructor.
This fixes it by creating a new global constructor in checkers/reboot to
solve all the cases and bubble up the errors.
I preferred keeping the old constructors and calling them. This
way, someone wanting to have a fork of the code could still directly
create the right checker/rebooter, without the arbitrary decisions
taken by the generic constructor.
However, kured is not a library, and was never intended to be
usable in forks, so we might want to reconsider this in part 2 of the
refactor.
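The shape of such a generic constructor can be sketched as follows. All type names, method strings and validation rules here are assumptions made for illustration; they are not the actual kured API.

```go
package main

import (
	"errors"
	"fmt"
)

// Rebooter is a hypothetical interface for this sketch.
type Rebooter interface{ Reboot() error }

type commandRebooter struct{ command string }
type signalRebooter struct{ signal int }

func (c *commandRebooter) Reboot() error { return nil } // placeholder
func (s *signalRebooter) Reboot() error  { return nil } // placeholder

// Type-specific constructors stay usable on their own.
func newCommandRebooter(command string) (*commandRebooter, error) {
	if command == "" {
		return nil, errors.New("no reboot command specified")
	}
	return &commandRebooter{command: command}, nil
}

func newSignalRebooter(signal int) (*signalRebooter, error) {
	if signal < 1 {
		return nil, fmt.Errorf("invalid signal: %d", signal)
	}
	return &signalRebooter{signal: signal}, nil
}

// newRebooter picks the constructor and bubbles up its error.
func newRebooter(method, command string, signal int) (Rebooter, error) {
	switch method {
	case "command":
		r, err := newCommandRebooter(command)
		if err != nil {
			return nil, fmt.Errorf("command rebooter: %w", err)
		}
		return r, nil
	case "signal":
		r, err := newSignalRebooter(signal)
		if err != nil {
			return nil, fmt.Errorf("signal rebooter: %w", err)
		}
		return r, nil
	default:
		return nil, fmt.Errorf("unknown reboot method %q", method)
	}
}

func main() {
	_, err := newRebooter("command", "", 0)
	fmt.Println(err)
}
```

Returning `nil, err` explicitly (rather than `return newCommandRebooter(...)`) avoids handing back a non-nil interface that wraps a nil pointer.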
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Implementation details of lock should not leak into the calling
methods.
Without this patch, calls are a bit more complex
and error handling is harder to find.
This is a problem for long term maintenance, as it
is tougher to refactor the locks without impacting the main.
Decoupling the two (main usage of the lock, and the lock
themselves) will allow us to introduce other kinds of locks
easily.
I solve this by moving into the daemonsetlock package:
- all the methods for managing locks from the main.go
functions. Those were mostly doing error handling,
and that code became a no-op once multiple
daemonsetlock types were introduced
- the lock release delay, which is now part of the lock info
I also did not like the pattern in the Test method,
which added a reference to nodeMeta: it was not very clear
whether Test was storing the current metadata of the node
or returning the current state. (Metadata here only means unschedulable.)
The problem I saw was that the metadata was silently
mutated from a lock Test method, which was far from obvious.
I chose to explicitly return the lock data instead.
I also made it explicit that the Acquire lock method
is passing the node metadata as structured information,
rather than an interface{}. This is a bit more fragile
at runtime, but I prefer having very explicit errors if
the locks are incorrect, rather than having to deal with
unvalidated data.
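A rough sketch of the two ideas (typed metadata on acquire, lock data returned explicitly). The `memLock` type keeps the annotation in memory purely for illustration; all names and the JSON layout are assumptions, not the real daemonsetlock code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NodeMeta is the structured metadata stored with the lock.
type NodeMeta struct {
	Unschedulable bool `json:"unschedulable"`
}

// LockData is what gets serialized into the lock annotation.
type LockData struct {
	NodeID   string   `json:"nodeID"`
	Metadata NodeMeta `json:"metadata"`
}

// memLock stands in for a lock stored on a Kubernetes object.
type memLock struct {
	nodeID     string
	annotation string
}

// Acquire takes typed metadata rather than an interface{}.
func (l *memLock) Acquire(meta NodeMeta) (bool, error) {
	if l.annotation != "" {
		return false, nil // already held
	}
	raw, err := json.Marshal(LockData{NodeID: l.nodeID, Metadata: meta})
	if err != nil {
		return false, err
	}
	l.annotation = string(raw)
	return true, nil
}

// Holding returns the lock data instead of mutating an argument.
func (l *memLock) Holding() (bool, LockData, error) {
	if l.annotation == "" {
		return false, LockData{}, nil
	}
	var data LockData
	if err := json.Unmarshal([]byte(l.annotation), &data); err != nil {
		return false, LockData{}, fmt.Errorf("invalid lock data: %w", err)
	}
	return data.NodeID == l.nodeID, data, nil
}

func main() {
	l := &memLock{nodeID: "node-1"}
	if _, err := l.Acquire(NodeMeta{Unschedulable: true}); err != nil {
		fmt.Println(err)
		return
	}
	holding, data, err := l.Holding()
	fmt.Println(holding, data.Metadata.Unschedulable, err)
}
```

Invalid annotation content surfaces as an explicit error from `Holding` rather than being passed along unvalidated.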
The lock release delay was part of the rebootasrequired
loop; I believe it makes more sense as part of the
Release method itself. This hides the
delay in an implementation detail, but it keeps the
rebootasrequired goroutine more readable.
Instead of passing rebootDelay as a parameter of the
rebootasrequired method, this refactor creates the lock
object in the main loop, close to all the variables, and then
passes the lock object to rebootasrequired. This makes the
call to rebootasrequired clearer, and the lock now
encompasses everything needed to acquire, release, or get
info about the lock.
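As a sketch only, a lock that owns its release delay could look like this. `delayedLock` and `rebootAsRequired` are simplified, hypothetical stand-ins for the real types and loop.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// delayedLock carries its own release delay.
type delayedLock struct {
	held         bool
	releaseDelay time.Duration
}

// Release waits for the configured delay before giving up the lock,
// so callers do not need to know about the delay at all.
func (l *delayedLock) Release() error {
	if !l.held {
		return errors.New("lock is not held")
	}
	if l.releaseDelay > 0 {
		time.Sleep(l.releaseDelay)
	}
	l.held = false
	return nil
}

// rebootAsRequired only receives the lock, not a separate delay.
func rebootAsRequired(l *delayedLock) error {
	return l.Release()
}

func main() {
	l := &delayedLock{held: true, releaseDelay: 10 * time.Millisecond}
	fmt.Println(rebootAsRequired(l), l.held)
}
```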
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Without this, we have no validation of the data in command/signal
reboot.
This was not a problem in the first refactor, as the constructor
was a dummy one, without validation.
However, as we refactored, we now have code in the root method
that validates the reboot command. This can now be
encompassed in the constructor.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Without this patch, the rebooter interface has data which is
not related to the rebooter interface. This should get removed
to make it easier to maintain.
The loss comes from the logging, which mentioned the node.
In order to not have a regression compared to [1], this ensures
that at least the node to be rebooted appears in the main.
[1]: https://github.com/kubereboot/kured/pull/134
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Without this, the checkers are only shell calls: test -f
sentinelFile, or sentinelCommand.
This changes the behaviour of existing code to test for the file
directly in the sentinelFile checker, and to keep the sentinel
command as a command.
However, to avoid having validation in the root loop, it moves
to a constructor to clean up the code.
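A minimal sketch of a file-based checker with a validating constructor. The names are hypothetical, and the regular-file check is an assumption made to mirror `test -f`; the actual checker may only test for existence.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

type fileRebootChecker struct{ path string }

// newFileRebootChecker validates its input, so the root loop does not
// have to.
func newFileRebootChecker(path string) (*fileRebootChecker, error) {
	if path == "" {
		return nil, errors.New("sentinel file path must not be empty")
	}
	return &fileRebootChecker{path: path}, nil
}

// RebootRequired reports whether the sentinel exists as a regular file,
// without shelling out.
func (c *fileRebootChecker) RebootRequired() bool {
	info, err := os.Stat(c.path)
	return err == nil && info.Mode().IsRegular()
}

func main() {
	f, err := os.CreateTemp("", "reboot-required")
	if err != nil {
		fmt.Println("cannot create temp file:", err)
		return
	}
	defer os.Remove(f.Name())
	f.Close()

	c, err := newFileRebootChecker(f.Name())
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(c.RebootRequired())
}
```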
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Without this, it makes the code a bit harder to read.
This fixes it by extracting the method.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Without this, the interface and the code to reboot is
a bit more complex than it should be.
We do not need setters and getters, as we are just
instantiating a single instance of a rebooter interface.
We create it based on user input, then pass the object
around. This should clean up the code.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
* feat: sentinel-command without nsenter by default
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* fix: no readonly mount
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* fix: mount at different folder
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* feat: add signal-reboot
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* feat: make signal configurable and add tests
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* build: rename job
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* cleanup: linter
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* build: also adjust signal manifest
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* test: add e2e-tests
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* fix: small code restructure
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* fix: adjust version-range
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
---------
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* Add ability to have multiple nodes get a lock
Currently in kured a single node can get a lock with Acquire. There
could be situations where multiple nodes might want a lock in the event
that a cluster can handle multiple nodes being rebooted. This adds the
side-by-side implementation for a multiple node lock situation.
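The idea can be sketched with an in-memory lock that allows up to a fixed number of owners. The type and its methods are hypothetical, and the sketch is not safe for concurrent use; the real lock lives on a Kubernetes object.

```go
package main

import "fmt"

// multiLock lets at most maxOwners nodes hold the lock at once.
// With maxOwners == 1 it behaves like a single-owner lock.
type multiLock struct {
	maxOwners int
	owners    map[string]bool
}

func newMultiLock(maxOwners int) *multiLock {
	if maxOwners < 1 {
		maxOwners = 1
	}
	return &multiLock{maxOwners: maxOwners, owners: map[string]bool{}}
}

// Acquire returns true if nodeID holds the lock after the call.
func (l *multiLock) Acquire(nodeID string) bool {
	if l.owners[nodeID] {
		return true // already an owner
	}
	if len(l.owners) >= l.maxOwners {
		return false // no free slot
	}
	l.owners[nodeID] = true
	return true
}

// Release frees the slot held by nodeID, if any.
func (l *multiLock) Release(nodeID string) {
	delete(l.owners, nodeID)
}

func main() {
	l := newMultiLock(2)
	fmt.Println(l.Acquire("node-a"), l.Acquire("node-b"), l.Acquire("node-c"))
	// prints "true true false"
}
```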
Signed-off-by: Thomas Stringer <thomas@trstringer.com>
* Refactor to use the same code path for a single lock and a multilock
Signed-off-by: Thomas Stringer <thomas@trstringer.com>
* test: force rebuild
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* build: log pod-logs
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* fix: change condition
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* build: fix test-script
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* build: add concurrent test
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* fix: final changes
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
---------
Signed-off-by: Thomas Stringer <thomas@trstringer.com>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Co-authored-by: Christian Kotzbauer <git@ckotzbauer.de>
In this PR the slack-hook-url is translated
into shoutrrr syntax. Therefore, the slack package
as well as the checks for slack-hook-url in the
drain and reboot functions are removed.
Also added a unit test for flagCheck(); this
function also checks the (slack) URL syntax.
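A hedged sketch of such a translation, assuming the target is the `slack://tokenA/tokenB/tokenC` form; check the shoutrrr documentation for the form your version expects. `slackHookToShoutrrr` is a hypothetical helper name and the hook URL is a made-up placeholder.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// slackHookToShoutrrr turns a Slack webhook URL into a slack:// URL by
// keeping only the token part of the path.
func slackHookToShoutrrr(hook string) (string, error) {
	u, err := url.Parse(hook)
	if err != nil {
		return "", fmt.Errorf("slack-hook-url is not a valid URL: %w", err)
	}
	if !strings.HasPrefix(u.Path, "/services/") {
		return "", fmt.Errorf("slack-hook-url path %q does not start with /services/", u.Path)
	}
	return "slack://" + strings.TrimPrefix(u.Path, "/services/"), nil
}

func main() {
	out, err := slackHookToShoutrrr("https://hooks.slack.com/services/T000/B000/XXX")
	fmt.Println(out, err)
}
```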
* prometheus labels incl tests
* enable label in main, add log, docs
* revert the option to query by label
* revert the option to query by label
* instantiate PromClient by func, whitespace removal
* revert whitespace fix for readability.
* revert removal of newlines for readability
* rename New to NewPromClient to improve readability
Co-authored-by: simp <simp@saxobank.com>