kured

mirror of https://github.com/kubereboot/kured.git synced 2026-03-03 17:30:20 +00:00

Author	SHA1	Message	Date
Jean-Philippe Evrard	44a68beb2f	Fix incorrect break Without this, the node cleanup loop is never ending. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2025-08-30 17:20:14 +02:00
Jean-Philippe Evrard	cbf9c46474	Add package comments to pass linters Without this, you get an error about the lack of package comments. "package-comments: should have a package comment (revive)" Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2025-08-30 17:20:14 +02:00
7h3-3mp7y-m4n	d677b436a0	fix lint of files Signed-off-by: 7h3-3mp7y-m4n <emailtorash@gmail.com>	2025-08-30 17:20:14 +02:00
Jean-Philippe Evrard	cb84fad891	Fix bad formatting Without this, make will rightfully trip for main.go "non-constant format string in call to github.com/sirupsen/logrus.Warnf". This should fix it. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2025-08-30 13:54:16 +02:00
Jean-Philippe Evrard	455b3df0dc	improve tests (#1021 ) * Add e2e test concurrency w/ signal This will help make sure the big refactoring does not break the main features. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party> * Add podblocker test Extends test coverage to ensure nothing breaks Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party> * Rename "version" with "variant" in tests For tests not running in different kubernetes versions, but have different tests subcases/variants, rephrase the wording "versions" as it is confusing. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party> * Fix Staticcheck's SA1024 (subset with dupe chars) This will replace trim, taking a cutset, with Replace. This clarifies the intent to remove a substring. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party> * Fix Staticcheck's ST1005 According to staticcheck, Error strings should not be capitalized (ST1005). This changes the cases for our errors. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party> * Fix incorrect string prints A few strings have evolved to eventually remove all the templating part of their strings, yet kept the formatting features. This is incorrect, and will not pass staticcheck SA1006 and S1039. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party> * Add staticcheck in make tests Without this, people like myself will forget to run staticcheck. This fixes it by making it part of make tests, which will run with all the fast tests in CI. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party> --------- Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2025-01-09 14:42:28 -08:00
Jean-Philippe Evrard	e370b0bd4a	Remove reassignment in rebootasrequired loop There is no need to continuously reallocate the check blockers. They only need to be defined once. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-11-06 18:57:09 +01:00
Jean-Philippe Evrard	f559a95304	Make the internal blockers implementation internal This at the same time, removes the alert public package. Alert was only used inside prometheus blocker, so it allows to simplify the code. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard	73f00ce445	Make all the internal validations ... internal The main is doing flag validation through pflags, then did further validation by involving the constructors. With the recent refactor on the commit "Refactor constructors" in this branch, we moved away from that pattern. However, it means we reintroduced a log dependency into our external API, and the external API now had extra validations regardless of the type. This is unnecessary, so I moved away from that pattern, and moved back all the validation into a central place, internal, which is only doing what kured would desire, without exposing it to users. The users could still theoretically use the proper constructors for each type, as they would validate just fine. The only thing they would lose is the kured internal decision of validation/precedence. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard	626db87158	Add error to reboot interface Without this, impossible to bubble up errors to main Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard	231888e58a	Use RegexpValue in pflags This will remove double pointers, and be explicit about the type we are using. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard	d8b9e31ac9	Refactor constructors Without this, a bit of the validation is done in main, while the rest is done in each constructor. This fixes it by creating a new global constructor in checkers/reboot to solve all the cases and bubble up the errors. I preferred keeping the old constructors, and calling them, this way someone wanting to have a fork of the code could still create directly the good checker/rebooter, without the arbitrary decisions taken by the generic constructor. However, kured is not a library, and was never intended to be usable in forks, so we might want to reconsider in part 2 of the refactor. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard	104a745305	Make locks more generic Implementation details of lock should not leak into the calling methods. Without this patch, calls are a bit more complex and error handling is harder to find. This is a problem for long term maintenance, as it is tougher to refactor the locks without impacting the main. Decoupling the two (main usage of the lock, and the locks themselves) will allow us to introduce other kinds of locks easily. I solve this by inlining into the daemonsetlock package: - including all the methods for managing locks from the main.go functions. Those were mostly doing error handling where code became no-op by introducing multiple daemonsetlock types - adding the lock release delay part of lock info I also did not like the pattern included in the Test method, which added a reference to nodeMeta: It was not very clear whether Test was storing the current metadata of the node, or was returning the current state. (Metadata here only means unschedulable). The problem I saw was that the metadata was silently mutated from a lock Test method, which was far from obvious. Instead, I chose to explicitly return the lock data. I also made it explicit that the Acquire lock method is passing the node metadata as structured information, rather than an interface{}. This is a bit more fragile at runtime, but I prefer having very explicit errors if the locks are incorrect, rather than having to deal with unvalidated data. For the lock release delay, it was part of the rebootasrequired loop, where I believe it makes more sense to be part of the Release method itself, for readability. This hides the delay in an implementation detail, but it keeps the reboot as required goroutine more readable. Instead of passing the argument rebootDelay as parameter of the rebootasrequired method, this refactor moves creation of the lock object into the main loop, close to all the variables, and then passes the lock object to rebootasrequired. This makes the call for rebootasrequired clearer, and lock is now encompassing everything needed to acquire, release, or get info about the lock. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard	aae5bb6ebb	Raise the error levels for wrong flag If the notification url configuration is known to be not working, this should be raised as an error, not a warning. Without this, it would be easy to miss a misconfiguration. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard	a8132a2286	Remove viper/cobra deps Without this, the main loop is in need of 3 functions to simply parse flags and env variables (excluding input validation). This is a bit more complex than it should, especially since we only need to parse command line flags and env vars. This fixes it by simply using pflags (which we were already using) instead of pflags + viper + cobra (for which we do not have any benefit), and removing all the methods outside the mapping of env var with cli flag. The main code is now far simpler: It handles the reading, parsing, and returning in case of error. As we do not bubble up errors from rebootasRequired yet, this is good enough at this moment. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard	42c4b8bc53	Revert to use a constructor again Without this, we have no validation of the data in command/signal reboot. This was not a problem in the first refactor, as the constructor was a dummy one, without validation. However, as we refactored, we now have code in the root method that is validation for the reboot command. This can now be encompassed in the constructor. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard	3895a2f6d3	Remove nodeID from rebooter interface Without this patch, the rebooter interface has data which is not related to the rebooter interface. This should get removed to make it easier to maintain. The loss comes from the logging, which mentioned the node. In order to not have a regression compared to [1], this ensures that at least the node to be rebooted appears in the main. [1]: https://github.com/kubereboot/kured/pull/134 Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard	f43ed1484e	Cleanup checkers Without this, the checkers are only shell calls: test -f sentinelFile, or sentinelCommand. This changes the behaviour of existing code to test file for sentinelFile checker, and to keep the sentinel command as a command. However, to avoid having validation in the root loop, it moves to use a constructor to cleanup the code. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-10-19 15:51:04 +02:00
Jean-Philippe Evrard	36e6c8b4d8	Rename variable Without this, the variable name is hard to follow. This fixes it by cleaning up the var name. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-10-18 00:53:38 +02:00
Jean-Philippe Evrard	00d8a524ab	Move command line validations in pre function Without this, validations are all over the place. This moves some validations directly into the function, to make the code simpler to read. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-10-18 00:53:38 +02:00
Jean-Philippe Evrard	eeedf203c3	Extract blockers This will make it easier to manipulate main in the future. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-10-18 00:53:38 +02:00
Jean-Philippe Evrard	574065ff8a	Add checker interface This will be useful to refactor the checkers loop. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-10-18 00:53:38 +02:00
Jean-Philippe Evrard	3bfdd76f29	Extract privileged command wrapper into util Without this, it makes the code a bit harder to read. This fixes it by extracting the method. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-10-18 00:53:38 +02:00
Jean-Philippe Evrard	f34864758e	Cleanup rebooter interface Without this, the interface and the code to reboot is a bit more complex than it should be. We do not need setters and getters, as we are just instantiating a single instance of a rebooter interface. We create it based on user input, then pass the object around. This should cleanup the code. Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>	2024-10-18 00:53:38 +02:00
Christian Hopf	87202d8fcf	Add signal-reboot (#814 ) * feat: sentinel-command without nsenter by default Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * fix: no readonly mount Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * fix: mount at different folder Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * feat: add signal-reboot Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * feat: make signal configurable and add tests Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * build: rename job Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * cleanup: linter Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * build: also adjust signal manifest Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * test: add e2e-tests Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * fix: small code restructure Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * fix: adjust version-range Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> --------- Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>	2024-01-06 10:25:11 +01:00
Daniel Malon	d51258ffde	feat: add drain delay (#852 ) Signed-off-by: Daniel Malon <daniel.malon@me.com>	2023-12-11 10:58:29 -08:00
Jack Francis	8bc66c937d	fix: don’t hold node lock if reboot is blocked (#819 ) Signed-off-by: Jack Francis <jackfrancis@gmail.com>	2023-08-17 06:15:09 +02:00
Jim	9a4b8fdb32	add argument to invert the behavior of alert-filter-regexp (#786 ) * add argument to invert the behavior of alert-filter-regexp Signed-off-by: Jim Liming <james.k.liming@gmail.com> * feat: small code-improvements Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> --------- Signed-off-by: Jim Liming <james.k.liming@gmail.com> Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> Co-authored-by: Christian Kotzbauer <git@ckotzbauer.de>	2023-08-14 18:52:12 +02:00
Thomas Stringer	3b9b190422	Add multiple concurrent node reboot feature (#660 ) * Add ability to have multiple nodes get a lock Currently in kured a single node can get a lock with Acquire. There could be situations where multiple nodes might want a lock in the event that a cluster can handle multiple nodes being rebooted. This adds the side-by-side implementation for a multiple node lock situation. Signed-off-by: Thomas Stringer <thomas@trstringer.com> * Refactor to use the same code path for a single lock and a multilock Signed-off-by: Thomas Stringer <thomas@trstringer.com> * test: force rebuild Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * build: log pod-logs Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * fix: change condition Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * build: fix test-script Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * build: add concurrent test Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * fix: final changes Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> --------- Signed-off-by: Thomas Stringer <thomas@trstringer.com> Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> Co-authored-by: Christian Kotzbauer <git@ckotzbauer.de>	2023-08-14 18:33:18 +02:00
nkinkade	351ca71787	Adds new flag --metrics-host (#811 ) * Replaces flag --metrics-port with --metrics-addresss Signed-off-by: Nathan Kinkade <kinkade@measurementlab.net> * Revert "Replaces flag --metrics-port with --metrics-addresss" This reverts commit `528c7bb14b`. Signed-off-by: Nathan Kinkade <kinkade@measurementlab.net> * Adds new --metrics-host flag The flag --metrics-port already exists. While not as clean, to avoid introducing a backward incompatible change to flags, this commit adds a new --metrics-host flag, which in combination with the existing --metrics-port flag can define a complete listen address for the metrics server as "<metrics-host>:<metrics-port>" Signed-off-by: Nathan Kinkade <kinkade@measurementlab.net> * Adds new, commented flags --metrics-{port,host} Signed-off-by: Nathan Kinkade <kinkade@measurementlab.net> --------- Signed-off-by: Nathan Kinkade <kinkade@measurementlab.net>	2023-08-08 08:50:41 +02:00
Christian Kotzbauer	16dc5e30d9	fix: log on unusual sentinel-command exit code (#806 ) Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>	2023-08-02 19:04:52 +02:00
Boris Prüßmann	d019e7a50a	Support pod-selector for drain command (#788 ) Signed-off-by: Boris Pruessmann <boris@pruessmann.org>	2023-08-02 11:33:29 +02:00
Maxime Leroy	4c75199b41	feat: metrics port command (#780 ) Signed-off-by: Maxime Leroy <19607336+maxime1907@users.noreply.github.com>	2023-06-09 21:33:17 +02:00
Jack Francis	1929c11297	fix: annotate nodes for reboot before aborting due to blocked (#749 ) Signed-off-by: Jack Francis <jackfrancis@gmail.com>	2023-04-14 10:29:22 +02:00
Christian Kotzbauer	ba1328ca12	feat: Integrate GoReleaser, Cosign and Syft (#595 ) * build: integrate goreleaser, syft and cosign Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * fix: chmod for all binaries Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * fix: version-env Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * fix: remove prefix Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * fix: remove prefix Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * fix: schellcheck Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * fix: shellcheck Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * fix: several script updates Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * fix: remove main-prefix Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>	2022-10-02 15:25:17 +02:00
Daniel Holbach	bce0bac183	Changed weaveworks to kubereboot in many places Areas I did not touch: - bot name, secrets - image name - LICENSE (would need to ask how/if that gets changed...?) - one mention in the Dev docs that we used to do some pre-release smoke-testing on the Weave Dev cluster Signed-off-by: Daniel Holbach <daniel@weave.works>	2022-09-20 13:17:55 +02:00
dependabot[bot]	9d4ebfc1f8	build(deps): bump alpine from 3.16.1 to 3.16.2 in /cmd/kured (#617 ) Bumps alpine from 3.16.1 to 3.16.2. --- updated-dependencies: - dependency-name: alpine dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-08-10 06:20:13 +02:00
Jack Francis	777f5b2cce	update command line flags in README (#607 )	2022-07-23 09:20:52 +02:00
dependabot[bot]	10d42b07a5	build(deps): bump alpine from 3.16.0 to 3.16.1 in /cmd/kured Bumps alpine from 3.16.0 to 3.16.1. --- updated-dependencies: - dependency-name: alpine dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2022-07-22 15:16:47 +00:00
Alexei Tighineanu	28c5332450	added notification when uncordoning (#587 ) * added notification when uncordoning when reboot & uncordoning is successful -> notification will be sent * added uncordon message tmpl added message template for announcing successful uncordoning and reboot. * added proper documentation about new flag added readme note about new flag	2022-06-25 21:08:05 +02:00
Christian Kotzbauer	115fea9d2a	Release 1.10.0 preparation (#572 ) * feat: updated helm-chart for 1.10.0 close #551 Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * feat: update multiarch-dockerfile to 3.16.0 Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>	2022-06-08 19:32:09 +02:00
David Shay	641c319eb8	Added support for multi-arch image build (#496 ) * Added support for multi-arch image build * Requested changes to multi-arch build * Further optimizations of multi build * multi needs QEMU for some pieces * change main push for all platforms * Update Dockerfile to call Makefile * Remove manual workflow	2022-06-07 08:23:36 +02:00
dependabot[bot]	cd7c4f8da3	build(deps): bump alpine from 3.15.4 to 3.16.0 in /cmd/kured (#560 ) Bumps alpine from 3.15.4 to 3.16.0. --- updated-dependencies: - dependency-name: alpine dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-05-25 06:54:04 +02:00
harbottle	6191c73a3c	Use clean patch to update node labels. Fixes #553	2022-05-20 08:16:45 +02:00
harbottle	48d112ba32	Change after-reboot-node-labels flag to post-reboot-node-labels	2022-05-18 11:39:38 +02:00
harbottle	50aac294b7	Use Errorf instead of Fatalf for node label logging	2022-05-18 11:39:38 +02:00
harbottle	c3cb2bbc6c	Tidy node labelling code	2022-05-18 11:39:38 +02:00
harbottle	9be88fb878	Add verification for node labelling flags	2022-05-18 11:39:38 +02:00
harbottle	4fcf6e184b	Add node labelling	2022-05-18 11:39:38 +02:00
Jack Francis	aa5c3e7783	strip unnecessary quotes for notify-url configurations	2022-05-17 19:33:35 +02:00
Jack Francis	d965e7f67e	Merge pull request #486 from jackfrancis/retry-cordon-drain retry cordon + drain if fail, keep lock	2022-05-06 12:19:31 -07:00
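
Commit 455b3df0dc above mentions replacing a trim call, which takes a cutset, with Replace to make the intent of removing a substring clear (Staticcheck SA1024). The sketch below illustrates that general difference using only the Go standard library. It is not taken from the kured source: the helper names `trimCutset` and `removeSubstring` and the input string are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// trimCutset shows the pitfall: strings.Trim treats its second argument as a
// set of characters to strip from both ends, not as a substring.
func trimCutset(s, cutset string) string {
	return strings.Trim(s, cutset)
}

// removeSubstring removes the first occurrence of sub as a whole string,
// which is usually what the caller meant.
func removeSubstring(s, sub string) string {
	return strings.Replace(s, sub, "", 1)
}

func main() {
	s := "node-kured-done" // hypothetical input, chosen for illustration

	// Every leading and trailing character found in "node-" is stripped.
	fmt.Println(trimCutset(s, "node-")) // prints "kur"

	// Only the literal substring "node-" is removed.
	fmt.Println(removeSubstring(s, "node-")) // prints "kured-done"
}
```

When the substring is known to sit at the start or end of the string, `strings.TrimPrefix` or `strings.TrimSuffix` would express the same intent.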

163 Commits