Without this, we have no validation of the data in command/signal
reboot.
This was not a problem in the first refactor, as the constructor
was a dummy one, without validation.
However, after further refactoring, the root method now contains
validation code for the reboot command. This validation can now
move into the constructor.
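A minimal sketch of the idea, not the actual kured code: the type
names, constructor names and error messages below are hypothetical,
and 39 is an arbitrary example value. It only illustrates validating
input once, at construction time, instead of in the root method.

```go
package main

import (
	"errors"
	"fmt"
)

// SignalRebooter is a hypothetical stand-in for a signal-based rebooter.
type SignalRebooter struct {
	signal int
}

// NewSignalRebooter validates the signal once, at construction time,
// so the caller never holds an invalid rebooter.
func NewSignalRebooter(signal int) (*SignalRebooter, error) {
	if signal <= 0 {
		return nil, errors.New("reboot signal must be a positive integer")
	}
	return &SignalRebooter{signal: signal}, nil
}

// CommandRebooter is a hypothetical stand-in for a command-based rebooter.
type CommandRebooter struct {
	argv []string
}

// NewCommandRebooter rejects an empty command before it can be used.
func NewCommandRebooter(argv []string) (*CommandRebooter, error) {
	if len(argv) == 0 || argv[0] == "" {
		return nil, errors.New("reboot command must not be empty")
	}
	return &CommandRebooter{argv: argv}, nil
}

func main() {
	if _, err := NewSignalRebooter(0); err != nil {
		fmt.Println("rejected:", err)
	}
	if _, err := NewCommandRebooter(nil); err != nil {
		fmt.Println("rejected:", err)
	}
	if r, err := NewSignalRebooter(39); err == nil {
		fmt.Println("accepted signal:", r.signal)
	}
}
```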
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Without this patch, the rebooter interface has data which is
not related to the rebooter interface. This should get removed
to make it easier to maintain.
The only loss is in the logging, which used to mention the node.
To avoid a regression compared to [1], this ensures that at
least the node to be rebooted is still logged from main.
[1]: https://github.com/kubereboot/kured/pull/134
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Without this, the checkers are only shell calls: test -f
sentinelFile, or sentinelCommand.
This changes the behaviour of the existing code: the sentinelFile
checker now tests for the file directly, while the sentinel
command remains a command.
To avoid having validation in the root loop, the checkers are
now built through a constructor, which also cleans up the code.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Without this, the variable name is hard to follow.
This fixes it by cleaning up the var name.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Without this, validations are all over the place.
This moves some validations directly into the function, to
make the code simpler to read.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Without this, the code is a bit harder to read.
This fixes it by extracting the method.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Without this, the interface and the code to reboot are
a bit more complex than they should be.
We do not need setters and getters, as we are just
instantiating a single instance of a rebooter interface.
We create it based on user input, then pass the object
around. This should clean up the code.
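A sketch of that shape, under stated assumptions: the interface
name, the method strings "command" and "signal", and the printing
implementation are hypothetical and exist only so the example is
safe to run.

```go
package main

import (
	"fmt"
)

// Rebooter is the whole interface in this sketch: one method,
// no getters or setters.
type Rebooter interface {
	Reboot() error
}

// printRebooter is a hypothetical implementation that only prints
// what it would do.
type printRebooter struct {
	description string
}

func (p printRebooter) Reboot() error {
	fmt.Println("would reboot using", p.description)
	return nil
}

// NewRebooter builds the single rebooter instance from user input.
func NewRebooter(method string) (Rebooter, error) {
	switch method {
	case "command", "signal":
		return printRebooter{description: method}, nil
	default:
		return nil, fmt.Errorf("unknown reboot method %q", method)
	}
}

// rebootIfNeeded receives the ready-made object and never
// reconfigures it.
func rebootIfNeeded(r Rebooter, needed bool) error {
	if !needed {
		return nil
	}
	return r.Reboot()
}

func main() {
	r, err := NewRebooter("command")
	if err != nil {
		fmt.Println("invalid input:", err)
		return
	}
	if err := rebootIfNeeded(r, true); err != nil {
		fmt.Println("reboot failed:", err)
	}
}
```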
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
* Move to stable kind cluster filenames
Without this, we have to rename files at every version.
This is unnecessary: we should only change the content of the
files and be done with it.
This is a problem, as if we move to programmatic test running,
the tests would need to be mutated at every k8s version.
With this model, we know that only the kind-cluster files
need to be modified for the tests to be automatically
adapted.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
* Create e2e from go tests interface
Without this, e2e tests need tons of manual work to
test locally, and the results are not easily exposed.
People are less likely to use the e2e tests if they
are tough to use outside the CI.
This commit makes it easier to run tests locally,
and ensures the CI is closer to the Makefile.
At the same time, this removes debt in the github
workflows: by switching to newer versions of kind,
we can remove the very old workaround for
"failed to attach pid 1".
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
* Add node stays as cordoned test
Without this, it is impossible to prove that the node stays
cordoned after a reboot by kured.
This refactor also adds the test in the CI, and makes sure the
CI is a bit simpler, by using matrix more extensively.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
* Use hack dir instead of .tmp
This is more idiomatic.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
---------
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Without this, some CI jobs are flaky or slow due to the following
issues:
- Triggering a reboot causes an unrecoverable boot loop.
This fixes it by restarting the containers that are incorrectly
exited.
- The API server is down while operations happen.
This fixes it by ensuring at least one API server is up. In this
case, we don't add a reboot marker on the only API server.
- The number of nodes in a test environment is larger than
necessary.
This fixes it by ensuring only two nodes are required to reboot.
This is enough for concurrency, and for the e2e testing.
- The wait time between operations is high, and can cause
a heartbeat to be missed in the check script.
This fixes it by checking more often, at the expense of
more logging. The shorter interval is compensated by increasing
the number of tries.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Without this, the CI would automatically point DH_ORG to
kubereboot/kured on ghcr, instead of pointing to the owner
of the repo.
This makes the CI smoother.
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
Without this patch, the copy of the CONTRIBUTING.md in our
doc site (generated from [1] and [2]) refers to a local README.md
in the website git repo, which does not exist.
This is a problem, as it generates a dead link for lychee on local
runs in 'content/en/docs/development.md'.
This fixes it by making the link absolute, while keeping the
CONTRIBUTING.md in sync between repos. The alternative would be
to edit the site generator in [1]. Yet, I believe having an
absolute link does not hurt, because we already use the full
git repo in other parts of the same documentation.
[1]: 5da8aba559/hack/gen-content.py
[2]: 5da8aba559/external-sources/kubereboot/kured
Signed-off-by: Jean-Philippe Evrard <open-source@a.spamming.party>
- Explain kured code structure
- Update links, as some docs have moved
- Readability and fix typos
Signed-off-by: Daniel Holbach <daniel.holbach@gmail.com>