6 Commits

Author SHA1 Message Date
Jean-Philippe Evrard
240a669727 Add prometheus export metrics functional testing
Without this, we can't know if the exposed prometheus metrics
behave properly.

This is a problem, as the only way we can evaluate success right now
is whether kured compiles or not.
While this is a good start, it doesn't translate to what we
claim to offer: a boolean showing if a reboot is required.

This fixes it by creating a new github action workflow testing
if the float64 gauge properly shows 0 for no reboot and 1 for reboot.
This is done by exposing the metrics endpoint through a node port.
A helm chart change was required to be able to expose
the service on a node port. We connect to the kind node through
docker in `tests/test-metrics.sh`, where we curl the nodeport,
extract the only relevant metric, and compare it to the expected result.
2021-04-13 16:17:42 +02:00
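The extract-and-compare step described in the commit above could look roughly like the sketch below. It is not the content of `tests/test-metrics.sh`: the metric name `kured_reboot_required` and both helper function names are assumptions made for illustration, and the parser assumes samples carry no trailing timestamp.

```shell
#!/usr/bin/env bash
# Sketch only. METRIC_NAME is an assumed metric name, not confirmed above.
METRIC_NAME="kured_reboot_required"

# Hypothetical helper: read Prometheus text exposition format on stdin and
# print the value of the first sample whose metric name equals $1.
extract_metric() {
    awk -v name="$1" '
        /^#/ { next }                    # skip HELP/TYPE comment lines
        {
            metric = $1
            sub(/[{].*/, "", metric)     # drop the label set, if any
            if (metric == name) {
                print $NF                # last field is the sample value
                exit
            }
        }'
}

# Hypothetical helper: compare the gauge read from stdin with the
# expected value given as $1 (0 = no reboot required, 1 = reboot required).
check_reboot_required() {
    expected="$1"
    actual="$(extract_metric "$METRIC_NAME")"
    if [ "$actual" = "$expected" ]; then
        echo "ok: ${METRIC_NAME}=${actual}"
    else
        echo "fail: expected ${expected}, got '${actual}'" >&2
        return 1
    fi
}
```

The metrics text would be piped in from whatever command curls the node port on the kind node, for example `<curl command> | check_reboot_required 1`.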
Daniel Holbach
de4e9a9bd9 Merge pull request #249 from evrardjp/produce-more-logs-for-stopped-containers
Add more logs into gates
2020-11-27 13:49:17 +01:00
Jean-Philippe Evrard
81ee206a87 Add more logs into gates
This will be necessary to find out why some docker containers fail
to come back up in github actions.
2020-11-27 13:31:20 +01:00
Jean-Philippe Evrard
1165cfe6f4 Fix shellcheck issue
Without this, shellcheck will complain about missing double
quotes.
2020-11-27 12:12:39 +01:00
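The commit above does not show the offending line. As a generic illustration of why shellcheck asks for double quotes (its SC2086 check covers unquoted expansions), the sketch below uses a made-up variable and helper to show how an unquoted expansion is split into several words:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many arguments it received.
count_args() {
    echo "$#"
}

# Made-up value containing a space.
node_name="kind worker"

# Unquoted: the shell splits the value on whitespace into two arguments.
# shellcheck disable=SC2086
unquoted_count="$(count_args $node_name)"

# Quoted: the value is passed as a single argument.
quoted_count="$(count_args "$node_name")"

echo "unquoted=${unquoted_count} quoted=${quoted_count}"
```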
Jean-Philippe Evrard
67ea5922f4 Improve coordinated reboot output
When a failure happens and the cluster doesn't manage to
come back up in time, we exit 1, and don't show docker logs.

This is a problem, as we would benefit from detailed docker
output in those cases, when debugging.

This fixes it by ensuring the logging is always done at the
exit of the script.
2020-11-27 10:59:14 +01:00
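One common way to get "logging is always done at the exit of the script" is an EXIT trap. The sketch below shows that pattern under assumptions: `dump_logs` and `NODE_NAMES` are hypothetical names, and the actual script may collect its logs differently.

```shell
#!/usr/bin/env bash
# Sketch only. NODE_NAMES is a hypothetical newline-separated list of
# kind node containers; it is empty by default, so no docker call is made.
NODE_NAMES="${NODE_NAMES:-}"

dump_logs() {
    exit_status=$?   # status the script is exiting with
    echo "=== exiting with status ${exit_status}; collecting logs ==="
    printf '%s\n' "$NODE_NAMES" | while IFS= read -r node; do
        [ -n "$node" ] || continue
        # Never let a failing log command mask the original exit status.
        docker logs "$node" 2>&1 || true
    done
}

# Registered once, up front: the handler runs on success and on
# `exit 1` alike, so the timeout path also produces logs.
trap dump_logs EXIT
```

Because the handler does not call `exit` itself, the script still ends with the status it was exiting with, so a CI job that failed keeps failing after the logs are printed.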
Jean-Philippe Evrard
3d75f1b37a Add smoke/basic functional test
Without this patch, we don't test on release whether kured actually
works and behaves well.

This is a problem, as a functional issue could have been hidden by
a recent change, as our testing is minimalist (only test the
usability, not the functionality).
Instead of testing manually, we should ensure this in CI.

This fixes it by adding a github action which tests the previously
built artifacts before publishing a release. The job consumes the helm
chart in our code tree (note: this relies on the last released image),
and runs a functional test triggering a coordinated restart of a
whole 5-node cluster deployed with kind, through github actions.

Note: The github action needs to reset the docker configuration, otherwise
the reboot of the node (a docker container in kind) will fail.
The reboot will be correctly triggered, but the node will not come back up,
with its systemd log mentioning: "Failed to attach 1 to compat systemd cgroup".
2020-08-28 09:25:44 +02:00