163 Commits

Author SHA1 Message Date
Jean-Philippe Evrard
44a68beb2f Fix incorrect break
Without this, the node cleanup loop is never ending.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2025-08-30 17:20:14 +02:00
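The diff for this fix is not shown here, so its exact cause is unknown. As a purely illustrative sketch, one common way a misplaced break makes a Go loop run forever is a bare break inside a switch (or select), which only leaves the switch and not the enclosing for. The function name cleanupNodes and its input are hypothetical.

```go
package main

import "fmt"

// cleanupNodes is a hypothetical cleanup loop. It returns how many nodes
// were cleaned. The labeled break is what ends the loop: a bare "break" in
// the first case would only leave the switch, and the for would spin forever.
func cleanupNodes(nodes []string) int {
	cleaned := 0
	i := 0
loop:
	for {
		switch {
		case i >= len(nodes):
			break loop
		default:
			cleaned++
			i++
		}
	}
	return cleaned
}

func main() {
	fmt.Println(cleanupNodes([]string{"node-a", "node-b", "node-c"})) // prints 3
}
```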
Jean-Philippe Evrard
cbf9c46474 Add package comments to pass linters
Without this, you get an error about the lack of package comments.
"package-comments: should have a package comment (revive)"

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2025-08-30 17:20:14 +02:00
7h3-3mp7y-m4n
d677b436a0 fix lint of files
Signed-off-by: 7h3-3mp7y-m4n <emailtorash@gmail.com>
2025-08-30 17:20:14 +02:00
Jean-Philippe Evrard
cb84fad891 Fix bad formatting
Without this, make will rightfully trip for main.go
"non-constant format string in call to github.com/sirupsen/logrus.Warnf".

This should fix it.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2025-08-30 13:54:16 +02:00
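A minimal sketch of the class of problem named in the message above. It uses the standard library's fmt instead of logrus so that it stays self-contained, and formatWarning is a hypothetical helper, not code from the repository. The idea is to keep the format string constant and pass the variable text as an argument.

```go
package main

import "fmt"

// formatWarning keeps the format string constant and passes the variable
// text as an argument. Calling fmt.Sprintf(msg) directly would be reported
// by go vet as a non-constant format string, and any '%' inside msg would
// be interpreted as a formatting verb.
func formatWarning(msg string) string {
	return fmt.Sprintf("warning: %s", msg)
}

func main() {
	fmt.Println(formatWarning("disk usage at 100%"))
}
```

With a logging call, the equivalent choices are a constant format plus arguments, or the non-formatting variant of the function.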
Jean-Philippe Evrard
455b3df0dc improve tests (#1021)
* Add e2e test concurrency w/ signal

This will help make sure the big refactoring does not break
the main features.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>

* Add podblocker test

Extends test coverage to ensure nothing breaks

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>

* Rename "version" with "variant" in tests

For tests that do not run against different kubernetes versions
but do have different subcases/variants, replace the wording
"versions", as it is confusing.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>

* Fix Staticcheck's SA1024 (subset with dupe chars)

This will replace trim, taking a cutset, with Replace.

This clarifies the intent to remove a substring.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>

* Fix Staticcheck's ST1005

According to staticcheck, Error strings should not be capitalized (ST1005).

This changes the cases for our errors.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>

* Fix incorrect string prints

A few strings evolved until no templating was left in them,
yet the calls kept using the formatting functions.

This is incorrect, and will not pass staticcheck SA1006 and S1039.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>

* Add staticcheck in make tests

Without this, people like myself will forget to run staticcheck.

This fixes it by making it part of make tests, which will run
with all the fast tests in CI.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>

---------

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2025-01-09 14:42:28 -08:00
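To illustrate the SA1024 item in the commit above: the strings.Trim family takes a set of characters, not a substring, so using it to strip a literal prefix can remove too much. The sketch below contrasts it with strings.Replace. The "docker://" prefix and both function names are hypothetical and are not taken from the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// stripWithCutset treats "docker://" as a set of characters, so it also
// eats leading characters of the ID that happen to belong to that set.
func stripWithCutset(id string) string {
	return strings.TrimLeft(id, "docker://")
}

// stripSubstring removes the first occurrence of the literal "docker://".
func stripSubstring(id string) string {
	return strings.Replace(id, "docker://", "", 1)
}

func main() {
	id := "docker://deadbeef"
	fmt.Println(stripWithCutset(id)) // prints adbeef
	fmt.Println(stripSubstring(id))  // prints deadbeef
}
```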
Jean-Philippe Evrard
e370b0bd4a Remove reassignment in rebootasrequired loop
There is no need to continuously reallocate the check blockers.
They only need to be defined once.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-11-06 18:57:09 +01:00
Jean-Philippe Evrard
f559a95304 Make the internal blockers implementation internal
At the same time, this removes the public alert package.
Alert was only used inside the prometheus blocker, so removing
it simplifies the code.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard
73f00ce445 Make all the internal validations ... internal
The main was doing flag validation through pflags, then
did further validation by involving the constructors.

With the recent refactor on the commit "Refactor constructors"
in this branch, we moved away from that pattern.

However, it means we reintroduced a log dependency into our
external API, and the external API now had extra validations
regardless of the type.

This is unnecessary, so I moved away from that pattern, and
moved back all the validation into a central place, internal,
which is only doing what kured would desire, without exposing
it to users. The users could still theoretically use the proper
constructors for each type, as they would validate just fine.

The only thing they would lose is the kured internal decision
of validation/precedence.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard
626db87158 Add error to reboot interface
Without this, it is impossible to bubble up errors to main.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
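A minimal sketch of what returning an error from a reboot interface makes possible. Rebooter, failingRebooter and run are hypothetical names chosen for illustration; the real interface in the repository may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Rebooter is a hypothetical interface whose method reports failure to the
// caller instead of only logging it.
type Rebooter interface {
	Reboot() error
}

// failingRebooter is a stand-in implementation that always fails.
type failingRebooter struct{}

func (failingRebooter) Reboot() error {
	return errors.New("reboot command not found")
}

// run wraps the error with context so that main can decide what to do.
func run(r Rebooter) error {
	if err := r.Reboot(); err != nil {
		return fmt.Errorf("reboot failed: %w", err)
	}
	return nil
}

func main() {
	if err := run(failingRebooter{}); err != nil {
		fmt.Println(err)
	}
}
```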
Jean-Philippe Evrard
231888e58a Use RegexpValue in pflags
This will remove double pointers, and be explicit about the
type we are using.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
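A sketch of the idea behind a regexp-typed flag value, assuming a type that wraps *regexp.Regexp and implements String, Set and Type (the methods of pflag's Value interface). To stay self-contained it is registered with the standard library's flag package, which only needs String and Set. The names regexpValue and parseFilter are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"regexp"
)

// regexpValue wraps a compiled regexp so a flag can be parsed straight into
// it, without a pointer to a pointer.
type regexpValue struct {
	*regexp.Regexp
}

// String returns the pattern, or "" when no pattern has been set yet.
func (r *regexpValue) String() string {
	if r == nil || r.Regexp == nil {
		return ""
	}
	return r.Regexp.String()
}

// Set compiles the flag argument and rejects invalid patterns.
func (r *regexpValue) Set(s string) error {
	compiled, err := regexp.Compile(s)
	if err != nil {
		return err
	}
	r.Regexp = compiled
	return nil
}

// Type names the value kind; pflag requires it, the stdlib flag does not.
func (r *regexpValue) Type() string { return "regexp" }

// parseFilter parses args and returns the compiled filter.
func parseFilter(args []string) (*regexpValue, error) {
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output
	filter := &regexpValue{}
	fs.Var(filter, "alert-filter-regexp", "alert names to ignore")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return filter, nil
}

func main() {
	filter, err := parseFilter([]string{"--alert-filter-regexp", "^Watchdog$"})
	if err != nil {
		fmt.Println("invalid flag:", err)
		return
	}
	fmt.Println(filter.MatchString("Watchdog")) // prints true
}
```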
Jean-Philippe Evrard
d8b9e31ac9 Refactor constructors
Without this, a bit of the validation is done in main, while
the rest is done in each constructor.

This fixes it by creating a new global constructor in checkers/reboot to
solve all the cases and bubble up the errors.

I preferred keeping the old constructors and calling them; this
way, someone wanting to have a fork of the code could still directly
create the right checker/rebooter, without the arbitrary decisions
taken by the generic constructor.

However, kured is not a library, and was never intended to be
usable in forks, so we might want to reconsider in part 2 of the
refactor.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard
104a745305 Make locks more generic
Implementation details of lock should not leak into the calling
methods.

Without this patch, calls are a bit more complex
and error handling is harder to find.

This is a problem for long term maintenance, as it
is tougher to refactor the locks without impacting the main.

Decoupling the two (main usage of the lock, and the lock
themselves) will allow us to introduce other kinds of locks
easily.

I solve this by inlining into the daemonsetlock package:
- including all the methods for managing locks from the main.go
  functions. Those were mostly doing error handling
  where code became no-op by introducing multiple
  daemonsetlock types
- adding the lock release delay part of lock info

I also did not like the pattern included in the Test method,
which added a reference to nodeMeta: it was not very clear
whether Test was storing the current metadata of the node
or returning the current state. (Metadata here only means unschedulable).

The problem I saw was that the metadata was silently
mutated from a lock Test method, which was far from obvious.

Instead, I chose to explicitly return the lock data.

I also made it explicit that the Acquire lock method
is passing the node metadata as structured information,
rather than an interface{}. This is a bit more fragile
at runtime, but I prefer having very explicit errors if
the locks are incorrect, rather than having to deal with
unvalidated data.

The lock release delay was part of the rebootasrequired
loop, but I believe it makes more sense as part of the
Release method itself. This hides the delay in an
implementation detail, but it keeps the rebootasrequired
goroutine more readable.

Instead of passing rebootDelay as a parameter of the
rebootasrequired method, this refactor creates the lock
object in the main loop, close to all the variables, and then
passes the lock object to rebootasrequired. This makes the
call to rebootasrequired clearer, and the lock now
encompasses everything needed to acquire, release, or get
info about the lock.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
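A small sketch of the design choice described in the message above: returning the lock data explicitly rather than mutating a caller-supplied value inside a Test-like method. All names here (NodeMeta, lockAnnotation, holding) and the JSON layout are hypothetical and only illustrate the pattern.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NodeMeta is hypothetical structured metadata stored alongside the lock.
type NodeMeta struct {
	Unschedulable bool `json:"unschedulable"`
}

// lockAnnotation is a hypothetical serialized form of the lock.
type lockAnnotation struct {
	NodeID   string   `json:"nodeID"`
	Metadata NodeMeta `json:"metadata"`
}

// holding reports whether nodeID owns the lock encoded in raw. It returns the
// decoded lock data instead of writing it into a pointer passed by the caller,
// so nothing is mutated behind the caller's back.
func holding(raw, nodeID string) (bool, lockAnnotation, error) {
	var lock lockAnnotation
	if raw == "" {
		return false, lock, nil
	}
	if err := json.Unmarshal([]byte(raw), &lock); err != nil {
		return false, lockAnnotation{}, fmt.Errorf("decoding lock: %w", err)
	}
	return lock.NodeID == nodeID, lock, nil
}

func main() {
	raw := `{"nodeID":"node-a","metadata":{"unschedulable":true}}`
	held, lock, err := holding(raw, "node-a")
	fmt.Println(held, lock.Metadata.Unschedulable, err) // prints true true <nil>
}
```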
Jean-Philippe Evrard
aae5bb6ebb Raise the error levels for wrong flag
If the notification url configuration is known to be not working,
this should be raised as an error, not a warning.

Without this, it would be easy to miss a misconfiguration.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard
a8132a2286 Remove viper/cobra deps
Without this, the main loop is in need of 3 functions to simply
parse flags and env variables (excluding input validation).

This is a bit more complex than it should, especially since
we only need to parse command line flags and env vars.

This fixes it by simply using pflags (which we were already
using) instead of pflags + viper + cobra (for which we
do not have any benefit), and removing all the methods
outside the mapping of env var with cli flag.

The main code is now far simpler: It handles the reading,
parsing, and returning in case of error.

As we do not bubble up errors from rebootasRequired yet,
this is good enough at this moment.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard
42c4b8bc53 Revert to use a constructor again
Without this, we have no validation of the data in command/signal
reboot.

This was not a problem in the first refactor, as the constructor
was a dummy one, without validation.

However, as we refactored, we now have code in the root method
that is validation for the reboot command. This can now be
encompassed in the constructor.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard
3895a2f6d3 Remove nodeID from rebooter interface
Without this patch, the rebooter interface has data which is
not related to the rebooter interface. This should get removed
to make it easier to maintain.

The loss comes from the logging, which mentioned the node.

In order to not have a regression compared to [1], this ensures
that at least the node to be rebooted appears in the main.

[1]:  https://github.com/kubereboot/kured/pull/134

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard
f43ed1484e Cleanup checkers
Without this, the checkers are only shell calls: test -f
sentinelFile, or sentinelCommand.

This changes the behaviour of existing code to test file for
sentinelFile checker, and to keep the sentinel command as
a command.

However, to avoid having validation in the root loop, it moves
to use a constructor to cleanup the code.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard
36e6c8b4d8 Rename variable
Without this, the variable name is hard to follow.

This fixes it by cleaning up the var name.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-18 00:53:38 +02:00
Jean-Philippe Evrard
00d8a524ab Move command line validations in pre function
Without this, validations are all over the place.
This moves some validations directly into the function, to
make the code simpler to read.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-18 00:53:38 +02:00
Jean-Philippe Evrard
eeedf203c3 Extract blockers
This will make it easier to manipulate main in the future.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-18 00:53:38 +02:00
Jean-Philippe Evrard
574065ff8a Add checker interface
This will be useful to refactor the checkers loop.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-18 00:53:38 +02:00
Jean-Philippe Evrard
3bfdd76f29 Extract privileged command wrapper into util
Without this, it makes the code a bit harder to read.

This fixes it by extracting the method.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-18 00:53:38 +02:00
Jean-Philippe Evrard
f34864758e Cleanup rebooter interface
Without this, the interface and the code to reboot is
a bit more complex than it should be.

We do not need setters and getters, as we are just
instantiating a single instance of a rebooter interface.

We create it based on user input, then pass the object
around. This should cleanup the code.

Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
2024-10-18 00:53:38 +02:00
Christian Hopf
87202d8fcf Add signal-reboot (#814)
* feat: sentinel-command without nsenter by default

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: no readonly mount

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: mount at different folder

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* feat: add signal-reboot

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* feat: make signal configurable and add tests

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* build: rename job

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* cleanup: linter

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* build: also adjust signal manifest

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* test: add e2e-tests

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: small code restructure

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: adjust version-range

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

---------

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
2024-01-06 10:25:11 +01:00
Daniel Malon
d51258ffde feat: add drain delay (#852)
Signed-off-by: Daniel Malon <daniel.malon@me.com>
2023-12-11 10:58:29 -08:00
Jack Francis
8bc66c937d fix: don’t hold node lock if reboot is blocked (#819)
Signed-off-by: Jack Francis <jackfrancis@gmail.com>
2023-08-17 06:15:09 +02:00
Jim
9a4b8fdb32 add argument to invert the behavior of alert-filter-regexp (#786)
* add argument to invert the behavior of alert-filter-regexp

Signed-off-by: Jim Liming <james.k.liming@gmail.com>

* feat: small code-improvements

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

---------

Signed-off-by: Jim Liming <james.k.liming@gmail.com>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Co-authored-by: Christian Kotzbauer <git@ckotzbauer.de>
2023-08-14 18:52:12 +02:00
Thomas Stringer
3b9b190422 Add multiple concurrent node reboot feature (#660)
* Add ability to have multiple nodes get a lock

Currently in kured a single node can get a lock with Acquire. There
could be situations where multiple nodes might want a lock in the event
that a cluster can handle multiple nodes being rebooted. This adds the
side-by-side implementation for a multiple node lock situation.

Signed-off-by: Thomas Stringer <thomas@trstringer.com>

* Refactor to use the same code path for a single lock and a multilock

Signed-off-by: Thomas Stringer <thomas@trstringer.com>

* test: force rebuild

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* build: log pod-logs

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: change condition

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* build: fix test-script

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* build: add concurrent test

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: final changes

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

---------

Signed-off-by: Thomas Stringer <thomas@trstringer.com>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Co-authored-by: Christian Kotzbauer <git@ckotzbauer.de>
2023-08-14 18:33:18 +02:00
nkinkade
351ca71787 Adds new flag --metrics-host (#811)
* Replaces flag --metrics-port with --metrics-addresss

Signed-off-by: Nathan Kinkade <kinkade@measurementlab.net>

* Revert "Replaces flag --metrics-port with --metrics-addresss"

This reverts commit 528c7bb14b.

Signed-off-by: Nathan Kinkade <kinkade@measurementlab.net>

* Adds new --metrics-host flag

The flag --metrics-port already exists. While not as clean, to avoid
introducing a backward incompatible change to flags, this commit adds a
new --metrics-host flag, which in combination with the existing
--metrics-port flag can define a complete listen address for the metrics
server as "<metrics-host>:<metrics-port>"

Signed-off-by: Nathan Kinkade <kinkade@measurementlab.net>

* Adds new, commented flags --metrics-{port,host}

Signed-off-by: Nathan Kinkade <kinkade@measurementlab.net>

---------

Signed-off-by: Nathan Kinkade <kinkade@measurementlab.net>
2023-08-08 08:50:41 +02:00
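A short sketch of how a host and a port can be combined into the "<metrics-host>:<metrics-port>" listen address described above. It uses net.JoinHostPort from the standard library, which also brackets IPv6 literals. The helper name metricsAddr is hypothetical, and the commit may build the address differently.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// metricsAddr builds a listen address from a host and a port.
// An empty host yields ":port", which listens on all interfaces.
func metricsAddr(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	fmt.Println(metricsAddr("", 8080))          // prints :8080
	fmt.Println(metricsAddr("127.0.0.1", 8080)) // prints 127.0.0.1:8080
	fmt.Println(metricsAddr("::1", 8080))       // prints [::1]:8080
}
```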
Christian Kotzbauer
16dc5e30d9 fix: log on unusual sentinel-command exit code (#806)
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
2023-08-02 19:04:52 +02:00
Boris Prüßmann
d019e7a50a Support pod-selector for drain command (#788)
Signed-off-by: Boris Pruessmann <boris@pruessmann.org>
2023-08-02 11:33:29 +02:00
Maxime Leroy
4c75199b41 feat: metrics port command (#780)
Signed-off-by: Maxime Leroy <19607336+maxime1907@users.noreply.github.com>
2023-06-09 21:33:17 +02:00
Jack Francis
1929c11297 fix: annotate nodes for reboot before aborting due to blocked (#749)
Signed-off-by: Jack Francis <jackfrancis@gmail.com>
2023-04-14 10:29:22 +02:00
Christian Kotzbauer
ba1328ca12 feat: Integrate GoReleaser, Cosign and Syft (#595)
* build: integrate goreleaser, syft and cosign

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: chmod for all binaries

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: version-env

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: remove prefix

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: remove prefix

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: shellcheck

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: shellcheck

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: several script updates

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* fix: remove main-prefix

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
2022-10-02 15:25:17 +02:00
Daniel Holbach
bce0bac183 Changed weaveworks to kubereboot in many places
Areas I did not touch:
- bot name, secrets
- image name
- LICENSE (would need to ask how/if that gets changed...?)
- one mention in the Dev docs that we used to do some
  pre-release smoke-testing on the Weave Dev cluster

Signed-off-by: Daniel Holbach <daniel@weave.works>
2022-09-20 13:17:55 +02:00
dependabot[bot]
9d4ebfc1f8 build(deps): bump alpine from 3.16.1 to 3.16.2 in /cmd/kured (#617)
Bumps alpine from 3.16.1 to 3.16.2.

---
updated-dependencies:
- dependency-name: alpine
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-10 06:20:13 +02:00
Jack Francis
777f5b2cce update command line flags in README (#607)
2022-07-23 09:20:52 +02:00
dependabot[bot]
10d42b07a5 build(deps): bump alpine from 3.16.0 to 3.16.1 in /cmd/kured
Bumps alpine from 3.16.0 to 3.16.1.

---
updated-dependencies:
- dependency-name: alpine
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-22 15:16:47 +00:00
Alexei Tighineanu
28c5332450 added notification when uncordoning (#587)
* added notification when uncordoning

 when reboot & uncordoning is successful -> notification will be sent

* added uncordon message tmpl

 added message template for announcing successful uncordoning and reboot.

* added proper documentation about new flag

 added readme note about new flag
2022-06-25 21:08:05 +02:00
Christian Kotzbauer
115fea9d2a Release 1.10.0 preparation (#572)
* feat: updated helm-chart for 1.10.0
close #551

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>

* feat: update multiarch-dockerfile to 3.16.0

Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
2022-06-08 19:32:09 +02:00
David Shay
641c319eb8 Added support for multi-arch image build (#496)
* Added support for multi-arch image build

* Requested changes to multi-arch build

* Further optimizations of multi build

* multi needs QEMU for some pieces

* change main push for all platforms

* Update Dockerfile to call Makefile

* Remove manual workflow
2022-06-07 08:23:36 +02:00
dependabot[bot]
cd7c4f8da3 build(deps): bump alpine from 3.15.4 to 3.16.0 in /cmd/kured (#560)
Bumps alpine from 3.15.4 to 3.16.0.

---
updated-dependencies:
- dependency-name: alpine
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-25 06:54:04 +02:00
harbottle
6191c73a3c Use clean patch to update node labels. Fixes #553
2022-05-20 08:16:45 +02:00
harbottle
48d112ba32 Change after-reboot-node-labels flag to post-reboot-node-labels
2022-05-18 11:39:38 +02:00
harbottle
50aac294b7 Use Errorf instead of Fatalf for node label logging
2022-05-18 11:39:38 +02:00
harbottle
c3cb2bbc6c Tidy node labelling code
2022-05-18 11:39:38 +02:00
harbottle
9be88fb878 Add verification for node labelling flags
2022-05-18 11:39:38 +02:00
harbottle
4fcf6e184b Add node labelling
2022-05-18 11:39:38 +02:00
Jack Francis
aa5c3e7783 strip unnecessary quotes for notify-url configurations
2022-05-17 19:33:35 +02:00
Jack Francis
d965e7f67e Merge pull request #486 from jackfrancis/retry-cordon-drain
retry cordon + drain if fail, keep lock
2022-05-06 12:19:31 -07:00