* new schema for etcd collector
* add placeholder
* wip
* get supported distribution
* add exec implementation
* wait for etcd pod to be ready
* misc
* update k0s etcd certs path
* fix unit tests
* address code reviews
* update from code review
* add etcdctl version
* feat: node metrics collector
A collector to collect node metrics served by the API server as
per the documented API https://kubernetes.io/docs/reference/instrumentation/node-metrics/
* Update CRD schemas
* Add tests
* Remove clean from build target
* Update comments
* Commit missing tests
* Remove unnecessary log in tests
* feat: allow templating of the outcome message for the JSON and YAML Compare analyzers
* Update pkg/analyze/json_compare.go
Co-authored-by: Evans Mungai <evans@replicated.com>
* Deprecate old Makefile targets
* Name Makefile targets to point to files they 'make' as per convention
* Update the contributing doc
* Update CLI docs left behind
* feat: Add regular expressions host anaylser
This anaylser is the same as the in-cluster text anaylser. You pass in
search expressions to find values in files collected in a bundle
* additional test assertion to check analyser warn
* ignore some more stuff
* watchrsync make command like kurl project
* document watchrsync
* the actual watchrsync script
* dont need these for npm dependencies
* Revert "dont need these for npm dependencies"
This reverts commit bdb4e62c38.
* install gaze-run-interrupt with make
* watchrsync instructions
* Boiler plate for host copy collector
* feat: Add copy host collector
* Add tests
* No need to handle symlinks in a special way
System libraries (os.ReadAll, os.ReadDir) already handle symlinks
The which binary is used to detect if client-gen is installed, and if
it's not, the Makefile will install it. The initial detection prints
an error if it's not found. This is misleading, as it is actually an
expected situation.
When running a support bundle, we want to know how long each operation
(collect, redact, analyze) takes. This commit adds a new trace exporter
that records the start and end times of each operation, and then prints
a summary of the execution. The summary is also stored in the support
bundle.
Related to #923
* adding test coverage for preflight.RunPreflights()
TDD to work on https://github.com/replicatedhq/troubleshoot/issues/906
and verify the fix is successful
* go.mod/go.sum: removing gnomock stuff since it's not in use (yet)
* Makefile: try running the preflight integration test with the e2e tests,
since there's a K3s instance in place already
* Makefile add a dedicated test-integration task, which runs as it's own
github action job
* Makefile: exclude a few things from test-integration that break the
github action job
* WIP on preflight tests, addressing some of @banjoh's feedback, more to
go though (specifically changing over to using assert)
* preflight tests: use the testify libraries, restructure code to be
formatted more like other tests in this project
* feat(redactors): Run redactors on an existing support bundle
Add redact subcommand to support-bundle to allow running redactors on an
existing bundle to creating a new redacted bundle.
The command will be launched like so
support-bundle redact <redactor urls> --bundle support-bundle.tar.gz
Fixes: #705
* Add collect command and remote host collectors
Adds the ability to run a host collector on a set of remote k8s nodes.
Target nodes can be filtered using the --selector flag, with the same
syntax as kubectl. Existing flags for --collector-image,
--collector-pullpolicy and --request-timeout are used. To run on a
specified node, --selector="kubernetes.io/hostname=kind-worker2" could
be used.
The collect command is used by the remote collector to output the
results using a "raw" format, which uses the filename as the key, and
the value the output as a escaped json string. When run manually it
defaults to fully decoded json. The existing block devices,
ipv4interfaces and services host collectors don't decode properly - the
fix is to convert their slice output to a map (fix not included as
unsure what depends on the existing format).
The collect command is also useful for troubleshooting preflight issues.
Examples are included to show remote collector usage.
```
bin/collect --collector-image=croomes/troubleshoot:latest examples/collect/remote/memory.yaml --namespace test
{
"kind-control-plane": {
"system/memory.json": {
"total": 1304207360
}
},
"kind-worker": {
"system/memory.json": {
"total": 1695780864
}
},
"kind-worker2": {
"system/memory.json": {
"total": 1726353408
}
}
}
```
The preflight command has been updated to run remote collectors. To run
a host collector remotely it must be specified in the spec as a
`remoteCollector`:
```
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
name: memory
spec:
remoteCollectors:
- memory:
collectorName: memory
analyzers:
- memory:
outcomes:
- fail:
when: "< 8Gi"
message: At least 8Gi of memory is required
- warn:
when: "< 32Gi"
message: At least 32Gi of memory is recommended
- pass:
message: The system has as sufficient memory
```
Results for each node are analyzed separately, with the node name
appended to the title:
```
bin/preflight --interactive=false --collector-image=croomes/troubleshoot:latest examples/preflight/remote/memory.yaml --format=json
{memory running 0 1}
{memory completed 1 1}
{
"fail": [
{
"title": "Amount of Memory (kind-worker2)",
"message": "At least 8Gi of memory is required"
},
{
"title": "Amount of Memory (kind-worker)",
"message": "At least 8Gi of memory is required"
},
{
"title": "Amount of Memory (kind-control-plane)",
"message": "At least 8Gi of memory is required"
}
]
}
```
Also added a host collector to allow preflight checks of required kernel
modules, which is the main driver for this change.