* Add ability to have multiple nodes get a lock
Currently in kured, only a single node can hold the lock obtained with
Acquire. There are situations where several nodes should hold a lock at
the same time, for example when a cluster can tolerate multiple nodes
being rebooted concurrently. This adds a side-by-side implementation of
a multiple-node lock.
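
As an illustration only, the bookkeeping behind a multiple-node lock can
be sketched as below. This is not kured's actual code: the `multiLock`
type, its `maxOwners` field, and the in-memory owner list are hypothetical
stand-ins, and the sketch assumes that a single lock can be modelled as
the special case of one allowed owner.

```go
package main

import "fmt"

// multiLock is a hypothetical in-memory stand-in for the lock state;
// it only illustrates the owner bookkeeping, not how kured stores it.
type multiLock struct {
	maxOwners int      // how many nodes may hold the lock at once
	owners    []string // node IDs currently holding the lock
}

// acquire reports whether nodeID holds the lock after the call.
func (l *multiLock) acquire(nodeID string) bool {
	for _, o := range l.owners {
		if o == nodeID {
			return true // already an owner, nothing to do
		}
	}
	if len(l.owners) >= l.maxOwners {
		return false // every slot is taken
	}
	l.owners = append(l.owners, nodeID)
	return true
}

// release removes nodeID from the owners if it is present.
func (l *multiLock) release(nodeID string) {
	for i, o := range l.owners {
		if o == nodeID {
			l.owners = append(l.owners[:i], l.owners[i+1:]...)
			return
		}
	}
}

func main() {
	// One allowed owner behaves like a classic single lock.
	single := &multiLock{maxOwners: 1}
	fmt.Println(single.acquire("node-a"), single.acquire("node-b"))

	// Two allowed owners: a third node has to wait for a release.
	multi := &multiLock{maxOwners: 2}
	fmt.Println(multi.acquire("node-a"), multi.acquire("node-b"), multi.acquire("node-c"))
	multi.release("node-a")
	fmt.Println(multi.acquire("node-c"))
}
```

In this sketch a node that already holds the lock gets `true` again on a
repeated call, which keeps acquisition idempotent across retries.
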
Signed-off-by: Thomas Stringer <thomas@trstringer.com>
* Refactor to use the same code path for a single lock and a multilock
Signed-off-by: Thomas Stringer <thomas@trstringer.com>
* test: force rebuild
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* build: log pod-logs
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* fix: change condition
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* build: fix test-script
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* build: add concurrent test
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
* fix: final changes
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
---------
Signed-off-by: Thomas Stringer <thomas@trstringer.com>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Co-authored-by: Christian Kotzbauer <git@ckotzbauer.de>
Without this, we can't know whether the exposed Prometheus metrics
behave properly.
This is a problem, as the only signal we have right now is whether
kured compiles.
While this is a good start, it doesn't verify what we claim to
offer: a boolean showing whether a reboot is required.
This fixes it by creating a new GitHub Actions workflow that tests
whether the float64 gauge properly shows 0 for no reboot and 1 for reboot.
This is done by exposing the metrics endpoint through a node port.
A Helm chart change was required to allow exposing the service on a
node port. In `tests/test-metrics.sh`, we connect to the kind node
through docker, curl the node port, extract the only relevant metric,
and compare it to the expected result.
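
The extract-and-compare step can be pictured with the sketch below. It is
written in Go for illustration and is not the content of the shell script.
The `gaugeValue` helper is hypothetical, and the metric name
`kured_reboot_required`, the `node` label, and the sample payload are
assumptions made for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// gaugeValue scans a Prometheus text-format payload and returns the
// value of the first sample whose metric name is exactly `metric`.
func gaugeValue(body, metric string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		if !strings.HasPrefix(line, metric) {
			continue
		}
		rest := line[len(metric):]
		// The name must be followed by labels or a space, so that
		// longer names sharing the same prefix are not matched.
		if rest == "" || (rest[0] != '{' && rest[0] != ' ') {
			continue
		}
		if i := strings.LastIndex(rest, "}"); i >= 0 {
			rest = rest[i+1:] // drop the label set
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			continue
		}
		return v, true
	}
	return 0, false
}

func main() {
	// Sample payload invented for this example.
	body := "# TYPE kured_reboot_required gauge\n" +
		"kured_reboot_required{node=\"kind-worker\"} 1\n"

	expected := 1.0 // 1 means a reboot is required, 0 means none
	got, ok := gaugeValue(body, "kured_reboot_required")
	if !ok || got != expected {
		fmt.Println("metric check failed")
		return
	}
	fmt.Println("metric check passed")
}
```

Matching on the exact metric name, rather than on a substring, avoids
picking up an unrelated series that merely starts with the same text.
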
When a failure happens and the cluster doesn't manage to come back
up in time, we exit 1 without showing the docker logs.
This is a problem, as detailed docker output would help when
debugging those cases.
This fixes it by ensuring the logs are always printed when the
script exits.
Without this patch, we don't test on release whether kured actually
works and behaves well.
This is a problem, as a functional issue could have been hidden by
a recent change, since our testing is minimal (it only tests
usability, not functionality).
Instead of testing manually, we should ensure this in CI.
This fixes it by adding a GitHub action which tests the previously
built artifacts before publishing a release. The job consumes the Helm
chart in our code tree (note: this relies on the last released image),
and runs a functional test triggering a coordinated restart of a
whole 5-node cluster deployed with kind, through GitHub Actions.
Note: The GitHub action needs to reset the docker configuration, otherwise
the reboot of the node (a docker container in kind) will fail.
The reboot is correctly triggered, but the node does not come back up,
and its systemd log mentions: "Failed to attach 1 to compat systemd cgroup".