This commit introduces a new flag '--log-format' that allows a user
to configure JSON logging on the pods. If '--log-format'
is not specified, the formatter defaults to the existing
text formatter.
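
A minimal sketch of the flag handling described above. It uses the Go
standard library's log/slog as a stand-in for the project's logging
library, and newHandler and render are hypothetical names used only
for illustration:

```go
package main

import (
	"fmt"
	"io"
	"log/slog"
	"strings"
)

// newHandler maps the value of a --log-format style flag to a log handler.
// An empty value falls back to the text format, mirroring the default
// described in the commit message.
func newHandler(format string, w io.Writer) (slog.Handler, error) {
	switch format {
	case "", "text":
		return slog.NewTextHandler(w, nil), nil
	case "json":
		return slog.NewJSONHandler(w, nil), nil
	default:
		return nil, fmt.Errorf("unsupported log format %q", format)
	}
}

// render logs one message in the given format and returns the raw output.
func render(format, msg string) (string, error) {
	var sb strings.Builder
	h, err := newHandler(format, &sb)
	if err != nil {
		return "", err
	}
	slog.New(h).Info(msg)
	return sb.String(), nil
}

func main() {
	for _, format := range []string{"", "json"} {
		out, err := render(format, "reboot not required")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Print(out)
	}
}
```

Rejecting unknown values instead of silently falling back is a design
choice of this sketch, not something the commit states.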
Currently, kured issues the system reboot command immediately after
kubectl drain finishes.
This is a problem for processes that need extra time to finish but aren't
running on pods and therefore aren't controlled by kubectl drain (e.g.
de-registering nodes from external load balancers).
This patch solves the problem by introducing a `reboot-delay` command
line argument that can be used to add a delay after kubectl drain
finishes but before the reboot command is issued.
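
The ordering this patch introduces can be sketched as follows. The
function and parameter names are hypothetical, the drain and reboot
steps are passed in as plain functions, and aborting on a failed drain
is an assumption of the sketch:

```go
package main

import (
	"fmt"
	"time"
)

// drainThenReboot drains the node, waits for delay, then reboots.
// sleep is injected so the wait can be observed or skipped in tests.
func drainThenReboot(drain, reboot func() error, delay time.Duration, sleep func(time.Duration)) error {
	if err := drain(); err != nil {
		// Assumption of this sketch: a failed drain aborts the reboot.
		return fmt.Errorf("drain failed: %w", err)
	}
	if delay > 0 {
		sleep(delay)
	}
	return reboot()
}

// demo records the order of the steps for a two minute delay.
func demo() []string {
	var steps []string
	drain := func() error { steps = append(steps, "drain"); return nil }
	reboot := func() error { steps = append(steps, "reboot"); return nil }
	sleep := func(d time.Duration) { steps = append(steps, "sleep "+d.String()) }
	_ = drainThenReboot(drain, reboot, 2*time.Minute, sleep)
	return steps
}

func main() {
	fmt.Println(demo()) // prints "[drain sleep 2m0s reboot]"
}
```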
* prometheus labels incl tests
* enable label in main, add log, docs
* revert the option to query by label
* PromClient instantiate by func,white space removal
* revert whitespace fix for readability.
* revert removal of newlines for readability
* rename New to NewPromClient to improve readability
Co-authored-by: simp <simp@saxobank.com>
This supports throttling of reboots across the cluster
and allows rebooted nodes to reschedule pods, e.g.
to synchronize replicated state before rebooting the next node.
Without this patch, the rebootCommand passed to invokeReboot is
ignored, and the command used for reboot is always systemctl reboot.
This is a problem, as we are aiming for flexible commands for this
release.
This fixes it by restoring the behaviour from before commit [1].
[1]: 694957d56e
This patch makes it possible to send notifications
across different technologies. It also deprecates
slack-hook-url, slack-username and slack-channel
(users are informed by a warning) and updates the
documentation (README).
Without this, go test will rightfully fail.
This is a problem, as we don't have go test enabled, but we want
to have this in the future.
This should fix it.
Without this patch, you cannot configure the reboot
command to use, or use another command to trigger
a reboot.
This is a problem, as multiple users have asked for
it in the past, and we are lacking flexibility.
This fixes it by introducing two new parameters,
- one to provide a custom reboot command.
This should help people running kured on
a non-systemd OS
- one to provide a custom sentinel command.
This should help people running a non-Ubuntu OS,
as they can directly use their command instead of
generating a file (useful for CentOS/SUSE)
For this, several refactors had to be done, to
remove global state in some functions. Making those
functions closer to "pure functions" helps us
increase our test coverage here and later.
As commandReboot was very close to rebootCommand,
the function to reboot the node has been renamed
to invokeReboot.
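
A rough sketch of how the two parameters fit together once the command
is passed in rather than read from global state. The runner type, the
rebootRequired helper and the example command lines are hypothetical,
and treating a zero exit code from the sentinel command as "reboot
required" is an assumption of this sketch:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// runner executes a command line and returns an error on a non-zero exit.
// Injecting it keeps the functions below free of global state.
type runner func(argv []string) error

// rebootRequired reports whether the sentinel command succeeded.
// Assumption: a zero exit code means a reboot is needed.
func rebootRequired(sentinelCommand []string, run runner) bool {
	return run(sentinelCommand) == nil
}

// invokeReboot runs exactly the command it is given.
func invokeReboot(rebootCommand []string, run runner) error {
	if len(rebootCommand) == 0 {
		return errors.New("empty reboot command")
	}
	return run(rebootCommand)
}

// demo uses a fake runner that records what would have been executed.
func demo() []string {
	var ran []string
	fake := func(argv []string) error {
		ran = append(ran, strings.Join(argv, " "))
		return nil
	}
	// Example command lines only; real values come from the new parameters.
	sentinel := []string{"test", "-f", "/var/run/reboot-required"}
	reboot := []string{"/sbin/shutdown", "-r", "now"}
	if rebootRequired(sentinel, fake) {
		_ = invokeReboot(reboot, fake)
	}
	return ran
}

func main() {
	for _, cmd := range demo() {
		fmt.Println(cmd)
	}
}
```

Because the runner is a parameter, both functions can be unit tested
without executing anything on the host.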
Without this patch, we rely on global state in many functions for
which we check the reboot blockers.
This is a problem, as it's harder to test.
This patch fixes it by refactoring the reboot blockers. This also
includes a first series of unit tests for our main.
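
One possible shape for such a refactoring, shown as a sketch only: each
blocker becomes a small value behind an interface, and the check takes
the blockers as arguments. The interface, the function name and the
staticBlocker test double are assumptions for illustration:

```go
package main

import "fmt"

// RebootBlocker is anything that can veto a reboot.
type RebootBlocker interface {
	isBlocked() bool
}

// staticBlocker is a test double with a fixed answer.
type staticBlocker bool

func (s staticBlocker) isBlocked() bool { return bool(s) }

// rebootBlocked reports whether any of the given blockers vetoes the
// reboot. It depends only on its arguments, which keeps it easy to test.
func rebootBlocked(blockers ...RebootBlocker) bool {
	for _, b := range blockers {
		if b.isBlocked() {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(rebootBlocked(staticBlocker(false), staticBlocker(true))) // prints "true"
	fmt.Println(rebootBlocked())                                          // prints "false"
}
```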
DeleteLocalData was deprecated for users of kubectl in 0.20 [1].
At the same time as the deprecation, the relevant code was also
removed [2] without warning: The DeleteLocalData from the helper
structure was simply renamed DeleteEmptyDirData, without shims
on the exposed pkg.
This is a problem, as it completely breaks kured.
This should fix it, by using the new field name.
[1]: 56ea9621b7
[2]: 56ea9621b7 (diff-041bdcdedca650a38a8d82cf15ab6f3665b7b84a0fb44a8bb5dcdc5cd944c63d)
adds a new --annotate-nodes daemonset runtime argument, which does the following when enabled:
- adds a new node annotation "weave.works/kured-most-recent-reboot-needed" with a value of the current RFC3339 timestamp as soon as kured identifies that a node needs to be rebooted
- adds a new node annotation "weave.works/kured-reboot-in-progress" with a value of the current RFC3339 timestamp as soon as kured identifies that a node needs to be rebooted
- removes the annotation "weave.works/kured-reboot-in-progress" when kured has successfully rebooted the node
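
The annotation lifecycle listed above can be sketched on a plain map.
The helper names are hypothetical, and a real implementation would
patch the node object through the Kubernetes API instead of mutating a
local map:

```go
package main

import (
	"fmt"
	"time"
)

const (
	mostRecentRebootNeeded = "weave.works/kured-most-recent-reboot-needed"
	rebootInProgress       = "weave.works/kured-reboot-in-progress"
)

// markRebootNeeded sets both annotations to the RFC3339 form of now.
func markRebootNeeded(annotations map[string]string, now time.Time) {
	ts := now.Format(time.RFC3339)
	annotations[mostRecentRebootNeeded] = ts
	annotations[rebootInProgress] = ts
}

// markRebootDone removes only the in-progress annotation.
func markRebootDone(annotations map[string]string) {
	delete(annotations, rebootInProgress)
}

// demo walks one node through the lifecycle with a fixed timestamp.
func demo() (needed string, inProgressLeft bool) {
	annotations := map[string]string{}
	markRebootNeeded(annotations, time.Date(2021, 1, 2, 3, 4, 5, 0, time.UTC))
	markRebootDone(annotations)
	_, inProgressLeft = annotations[rebootInProgress]
	return annotations[mostRecentRebootNeeded], inProgressLeft
}

func main() {
	needed, left := demo()
	fmt.Println(needed, left) // prints "2021-01-02T03:04:05Z false"
}
```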
This changes the pre-reboot drain functionality so that it always runs, regardless of the value of the Unschedulable node property.
Because kubectl drain is idempotent, we shouldn't have to worry about whether the node has already been set to Unschedulable (perhaps due to a prior, unsuccessful loop of the kured reboot cycle): we can run it over and over again. And because this drain func actually does a cordon + drain (and it only performs the drain if the cordon is successful), we can be sure that we aren't going to be thrashing this node with respect to scheduled pods.
This also fixes an edge case: if the node has been marked Unschedulable out-of-band, but workloads remain Running on this node, kured will no longer reboot the node's underlying VM/machine while it is actively running pods.
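
A toy model of the behaviour described above, assuming a hypothetical
node type and a cordonAndDrain helper. It shows that repeating the
operation on an already Unschedulable node is harmless, and that the
drain is skipped when the cordon fails:

```go
package main

import (
	"errors"
	"fmt"
)

// node is a toy stand-in for a Kubernetes node (hypothetical type).
type node struct {
	unschedulable bool
	pods          int
}

func (n *node) cordon() error {
	n.unschedulable = true
	return nil
}

func (n *node) drain() error {
	n.pods = 0
	return nil
}

// cordonAndDrain always attempts both steps, whatever the current
// Unschedulable value is, and only drains after a successful cordon.
func cordonAndDrain(cordon, drain func() error) error {
	if err := cordon(); err != nil {
		return fmt.Errorf("cordon failed, skipping drain: %w", err)
	}
	return drain()
}

func demo() (podsLeft int, drainedAfterFailedCordon bool) {
	// Node marked Unschedulable out-of-band while still running pods.
	n := &node{unschedulable: true, pods: 3}
	_ = cordonAndDrain(n.cordon, n.drain)
	_ = cordonAndDrain(n.cordon, n.drain) // repeating is harmless

	called := false
	_ = cordonAndDrain(
		func() error { return errors.New("api unavailable") },
		func() error { called = true; return nil },
	)
	return n.pods, called
}

func main() {
	fmt.Println(demo()) // prints "0 false"
}
```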
Until a new alpine image is created, we should ensure the latest
packages are used, and therefore we should upgrade default
installed packages.
Without this patch, we'll have outdated and vulnerable packages
until a new 3.12 image is released.
This is a problem, as we'll publish broken images.
This should temporarily work around it, at the expense of larger
images (they contain the package cache).
Without this patch, we need to build a cache and then remove it.
Since apk can work with no-cache and won't leave artifacts,
we should use it.
This will make the dockle best practices scanner happier.