Without this, we can't know if the exposed Prometheus metrics
behave properly.
This is a problem, as the only way we can evaluate success
(right now) is whether kured compiles or not.
While this is a good start, it doesn't translate to what we
claim to offer: a boolean showing if a reboot is required.
This fixes it by creating a new GitHub Actions workflow testing
if the float64 gauge properly shows 0 for no reboot and 1 for reboot.
This is done by exposing the metrics endpoint through a node port.
A helm chart change was required to have the ability to expose
the service on a node port. We connect to the kind node through
docker in `tests/test-metrics.sh`, where we curl the node port,
extract the only relevant metric, and compare it to the expected result.
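
The extract-and-compare step can be sketched in Go as follows. This is
only an illustration, not the contents of `tests/test-metrics.sh`: the
metric name `kured_reboot_required`, the node label and the sample
scrape body are assumptions, and `gaugeValue` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// gaugeValue scans a Prometheus text-format scrape and returns the value of
// the first sample whose metric name is exactly name. Comment lines
// (# HELP, # TYPE) and blank lines are skipped.
func gaugeValue(body, name string) (float64, error) {
	scanner := bufio.NewScanner(strings.NewReader(body))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The metric name ends at the first '{' (labels) or space (value).
		metric, rest := line, ""
		if i := strings.IndexAny(line, "{ "); i >= 0 {
			metric, rest = line[:i], line[i:]
		}
		if metric != name {
			continue
		}
		// Drop the optional label set, keeping only "value [timestamp]".
		if j := strings.LastIndex(rest, "}"); j >= 0 {
			rest = rest[j+1:]
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			return 0, fmt.Errorf("metric %q has no value", name)
		}
		return strconv.ParseFloat(fields[0], 64)
	}
	if err := scanner.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("metric %q not found", name)
}

func main() {
	// Hypothetical scrape body; the real endpoint exposes more metrics.
	scrape := "# TYPE kured_reboot_required gauge\n" +
		"kured_reboot_required{node=\"kind-control-plane\"} 1\n"

	got, err := gaugeValue(scrape, "kured_reboot_required")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	expected := 1.0 // 1 means a reboot is required, 0 means it is not
	fmt.Println("matches expectation:", got == expected)
}
```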
Without this patch, chart-testing is using the branch named
"master" by default.
This is a problem, as we just renamed our development branch
from "master" to "main".
This should fix it by pointing to the right branch.
- Make markdownlint happier in a couple of places.
- Rename '*-master-*' files
- Change default branches of some other projects
we rely on. They moved to 'main' as well.
- Standardise version of actions/checkout.
- Update last release in README to 1.6.1.
- Bump chart version.
Eventually closes: #252
Signed-off-by: Daniel Holbach <daniel@weave.works>
Without this patch, the rebootCommand passed to invokeReboot is
ignored, and the command used for reboot is always systemctl reboot.
This is a problem, as we are aiming for flexible commands for this
release.
This fixes it by restoring the behaviour from before commit
[1].
[1]: 694957d56e
This patch makes it possible to send notifications
through different technologies. It also deprecates
slack-hook-url, slack-username and slack-channel
(users are informed by a warning).
The documentation (README) is updated as well.
Without this, go test will rightfully fail.
This is a problem, as we don't have go test enabled, but we want
to have this in the future.
This should fix it.
Without this patch, you cannot configure the reboot
command to use, or use another command to trigger
a reboot.
This is a problem, as multiple users have asked for
it in the past, and we are lacking flexibility.
This fixes it by introducing two new parameters:
- one to provide a custom reboot command.
This should help people running kured on
non-systemd OSes.
- one to provide a custom sentinel command.
This should help people running non-Ubuntu OSes,
as they can directly use their command instead of
generating a file (useful for CentOS/SUSE).
For this, several refactors had to be done, to
remove global state in some functions. Making those
functions closer to "pure functions" helps us
increase our test coverage here and later.
As commandReboot was very close to rebootCommand,
the function to reboot the node has been renamed
to invokeReboot.
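
A minimal sketch of what a sentinel check without global state could
look like, assuming the sentinel command signals "reboot required" by
exiting 0. The signature of `rebootRequired` and the example sentinel
(`test -f /var/run/reboot-required`) are assumptions for illustration,
not necessarily what the patch contains.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// rebootRequired runs the sentinel command it is given and reports true when
// the command exits 0. It reads no global flags, so tests can pass any
// command they like. (Assumed signature, for illustration only.)
func rebootRequired(sentinelCommand []string) (bool, error) {
	if len(sentinelCommand) == 0 {
		return false, errors.New("empty sentinel command")
	}
	cmd := exec.Command(sentinelCommand[0], sentinelCommand[1:]...)
	err := cmd.Run()
	if err == nil {
		return true, nil // exit code 0: a reboot is required
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return false, nil // non-zero exit code: no reboot required
	}
	return false, err // the command could not be started at all
}

func main() {
	// Assumed example sentinel: succeed when the reboot marker file exists.
	required, err := rebootRequired([]string{"test", "-f", "/var/run/reboot-required"})
	fmt.Println("reboot required:", required, "error:", err)
}
```

Because the command is a plain argument, a unit test can substitute
`true` or `false` for the real sentinel on any Unix-like system.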
Without this patch, we rely on global state in many of the
functions that check the reboot blockers.
This is a problem, as it's harder to test.
This patch fixes it by refactoring the reboot blockers. This also
includes a first series of unit tests for our main.
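
One way such a refactor can look, as a sketch only: each blocker sits
behind a small interface, and the check takes the blockers as
arguments instead of reading globals. The names `RebootBlocker`,
`staticBlocker` and `rebootBlocked` are hypothetical here.

```go
package main

import "fmt"

// RebootBlocker is a hypothetical interface: anything that can veto a reboot
// (for example an active alert or a pod that must not be disturbed).
type RebootBlocker interface {
	isBlocked() bool
}

// staticBlocker is a test double whose answer is fixed at construction time.
type staticBlocker bool

func (s staticBlocker) isBlocked() bool { return bool(s) }

// rebootBlocked reports true as soon as one of the given blockers vetoes the
// reboot. It depends only on its arguments, which makes it easy to unit test.
func rebootBlocked(blockers ...RebootBlocker) bool {
	for _, b := range blockers {
		if b.isBlocked() {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(rebootBlocked())                                          // false
	fmt.Println(rebootBlocked(staticBlocker(false)))                      // false
	fmt.Println(rebootBlocked(staticBlocker(false), staticBlocker(true))) // true
}
```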
Without this patch, the version 1.20 is read in jobs as 1.2.
This is a problem, as it breaks all jobs, because there is no
file to provision a cluster with kubernetes 1.2 (and we shouldn't
do this!)
This fixes it by ensuring there is no mangling of the version
strings, and therefore the right file is used.
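
The mangling is the usual effect of a version being treated as a
number instead of a string. The sketch below reproduces it in Go;
it assumes a numeric round trip was the cause, which the patch itself
does not state, and `asNumber` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
)

// asNumber mimics what happens when a version is parsed as a float and then
// printed again: trailing zeros carry no numeric meaning and are dropped.
func asNumber(version string) string {
	f, err := strconv.ParseFloat(version, 64)
	if err != nil {
		return version // not numeric, nothing to mangle
	}
	return strconv.FormatFloat(f, 'f', -1, 64)
}

func main() {
	fmt.Println(asNumber("1.20")) // prints 1.2
	fmt.Println(asNumber("1.19")) // prints 1.19
	fmt.Println("1.20")           // kept as a string: prints 1.20
}
```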
DeleteLocalData was deprecated for users of kubectl in 0.20 [1].
At the same time as the deprecation, the relevant code was also
removed [2] without warning: the DeleteLocalData field of the helper
structure was simply renamed DeleteEmptyDirData, without shims
in the exposed pkg.
This is a problem, as it completely breaks kured.
This should fix it, by using the new field name.
[1]: 56ea9621b7
[2]: 56ea9621b7 (diff-041bdcdedca650a38a8d82cf15ab6f3665b7b84a0fb44a8bb5dcdc5cd944c63d)
Without this patch, go.mod will lag behind for the kubernetes
packages, as it's not automatically tested by dependabot.
We should bump versions with each new minor release of kured.
This should fix it.
Adds a new --annotate-nodes daemonset runtime argument, which does the following when enabled:
- adds a new node annotation "weave.works/kured-most-recent-reboot-needed" with a value of the current RFC3339 timestamp as soon as kured identifies that a node needs to be rebooted
- adds a new node annotation "weave.works/kured-reboot-in-progress" with a value of the current RFC3339 timestamp as soon as kured identifies that a node needs to be rebooted
- removes the annotation "weave.works/kured-reboot-in-progress" when kured has successfully rebooted the node
This changes the pre-reboot drain functionality so that it always runs, regardless of the value of the Unschedulable node property.
Because kubectl drain is idempotent, we shouldn't have to worry about whether the node has already been set to Unschedulable (perhaps due to a prior, unsuccessful loop of the kured reboot cycle): we can run it over and over again. And because this drain func actually does a cordon + drain (and it only performs the drain if the cordon is successful), we can be sure that we aren't going to be thrashing this node with respect to scheduled pods.
This also fixes an edge case: if the node has been marked Unschedulable out-of-band, but workloads remain Running on this node, kured will no longer reboot the node's underlying VM/machine while it is actively running pods.