kured

mirror of https://github.com/kubereboot/kured.git synced 2026-05-25 01:33:18 +00:00

Author	SHA1	Message	Date
Thomas Stringer	3b9b190422	Add multiple concurrent node reboot feature (#660) * Add ability to have multiple nodes get a lock Currently in kured a single node can get a lock with Acquire. There could be situations where multiple nodes might want a lock in the event that a cluster can handle multiple nodes being rebooted. This adds the side-by-side implementation for a multiple node lock situation. Signed-off-by: Thomas Stringer <thomas@trstringer.com> * Refactor to use the same code path for a single lock and a multilock Signed-off-by: Thomas Stringer <thomas@trstringer.com> * test: force rebuild Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * build: log pod-logs Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * fix: change condition Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * build: fix test-script Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * build: add concurrent test Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> * fix: final changes Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> --------- Signed-off-by: Thomas Stringer <thomas@trstringer.com> Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de> Co-authored-by: Christian Kotzbauer <git@ckotzbauer.de>	2023-08-14 18:33:18 +02:00
Jean-Philippe Evrard	240a669727	Add prometheus export metrics functional testing Without this, we can't know if the exposed prometheus metrics behave properly. This is a problem, as the only way we can evaluate success right now is whether kured compiles. While this is a good start, it doesn't translate to what we claim to offer: A boolean showing if a reboot is required. This fixes it by creating a new github action workflow testing if the float64 gauge properly shows 0 for no reboot, 1 for reboot. This is done by exposing the metrics endpoint through a node port. A helm chart change was required to have the ability to expose the service on a node port. We connect to the kind node through docker in `tests/test-metrics.sh`, where we curl the nodeport, extract the only relevant metric, and compare it to the expected result.	2021-04-13 16:17:42 +02:00
Daniel Holbach	de4e9a9bd9	Merge pull request #249 from evrardjp/produce-more-logs-for-stopped-containers Add more logs into gates	2020-11-27 13:49:17 +01:00
Jean-Philippe Evrard	81ee206a87	Add more logs into gates This will be necessary to find out why some docker containers fail to come back up in github actions.	2020-11-27 13:31:20 +01:00
Jean-Philippe Evrard	1165cfe6f4	Fix shellcheck issue Without this, shellcheck will complain about double quotes missing.	2020-11-27 12:12:39 +01:00
Jean-Philippe Evrard	67ea5922f4	Improve coordinated reboot output When a failure happens and the cluster doesn't manage to come back up in time, we exit 1, and don't show docker logs. This is a problem, as we would benefit from detailed docker output in those cases when debugging. This fixes it by ensuring the logging is always done at the exit of the script.	2020-11-27 10:59:14 +01:00
Jean-Philippe Evrard	3d75f1b37a	Add smoke/basic functional test Without this patch, we don't test on release whether kured actually works and behaves well. This is a problem, as a functional issue could have been hidden by a recent change, as our testing is minimalist (only tests the usability, not the functionality). Instead of testing manually, we should ensure this in CI. This fixes it by adding a github action which tests the previously built artifacts before publishing a release. The job consumes the helm chart in our code tree (note: this relies on the last released image), and runs a functional test triggering a coordinated restart of a whole 5-node cluster deployed with kind, through github actions. Note: The github action needs to reset docker configuration, else the reboot of the node (a docker container in kind) will fail. It will be correctly triggered, but the node will not come back up, with its systemd log mentioning: "Failed to attach 1 to compat systemd cgroup".	2020-08-28 09:25:44 +02:00

7 Commits
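
Commit 240a669727 describes its check as: fetch the metrics endpoint, extract the one relevant gauge, and compare it to the expected value (0 for no reboot, 1 for reboot). The test in that commit is a shell script using curl. The Go sketch below only illustrates the "extract and compare" step on a hard-coded sample. The helper name `gaugeValue`, the metric name `kured_reboot_required`, and the sample text are assumptions made for illustration and are not taken from the commit.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// gaugeValue is a hypothetical helper: it scans Prometheus text-format output
// and returns the value of the first sample whose metric name equals name.
// Labels are ignored; label values containing '}' are handled only loosely.
func gaugeValue(exposition, name string) (float64, error) {
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		// The metric name ends at the first '{' (labels) or ' ' (value).
		end := strings.IndexAny(line, "{ ")
		if end < 0 || line[:end] != name {
			continue
		}
		rest := line[end:]
		if strings.HasPrefix(rest, "{") {
			// Drop the label set; LastIndex returns -1 if '}' is missing,
			// in which case parsing below fails with an error.
			rest = rest[strings.LastIndex(rest, "}")+1:]
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			return 0, fmt.Errorf("metric %q has no value", name)
		}
		return strconv.ParseFloat(fields[0], 64)
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("metric %q not found", name)
}

func main() {
	// Assumed sample of what the metrics endpoint might return.
	sample := "# TYPE kured_reboot_required gauge\n" +
		"kured_reboot_required{node=\"kind-worker\"} 1\n"

	const expected = 1.0 // 1 means a reboot is required, 0 means it is not
	got, err := gaugeValue(sample, "kured_reboot_required")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("got %v, expected %v, match=%t\n", got, expected, got == expected)
}
```

In a real check, the sample string would be replaced by the body fetched from the node port, and a mismatch would fail the job.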