* Add dry-run flag
* No traces on dry run
* More refactoring
* More updates to support bundle binary
* More refactoring changes
* Different approach of loading specs from URIs
* Self review
* More changes after review and testing
* fix how we parse oci image uri
* Remove unnecessary comment
* Add missing file
* Fix failing tests
* Better error check for no collectors
* Add default collectors when parsing support bundle specs
* Add missed test fixture
* Download specs with correct headers
* Fix typo
* chore: make specs an internal package
* Some minor improvements
* Use LoadClusterSpecs in support bundle implementation
* Remove change accidentally committed
* Use LoadFromCLIArgs in preflight CLI implementation
* Update comment
* Fix edge case where the label selector is an empty string
* Fix failing test
* feat: add loader APIs to load specs from a list of yaml docs
The change introduces a loader package that will contain loader
public APIs. The aim of these APIs will be to, given any source of
troubleshoot specs, the loaders will fetch the specs and parse out
all troubleshoot objects that can be extracted.
* Some refactoring
* Some more changes
* More changes caught when testing vendor portal
* Add tests and rename Troubleshoot kinds struct
* Additional test
* Handle ConfigMap and Secrets with multiple specs in them
* Fix failing test
* Revert multidoc split implementation
* Fix merge conflict
* Change LoadFromXXX functions to a single LoadSpecs function
* we can now read preflight specs out of secrets, either from stdin or file input
* moved spec read logic out into its own function so it can be unit
tested easier
* added more comprehensive unit testing on the different ways we can read in specs
When running a support bundle, we want to know how long each operation
(collect, redact, analyze) takes. This commit adds a new trace exporter
that records the start and end times of each operation, and then prints
a summary of the execution. The summary is also stored in the support
bundle.
Related to #923
* adding dedup for in cluster collectors
* add tests
* return collector as is whenever marshalling to json fails
---------
Co-authored-by: Evans Mungai <evans@replicated.com>
* adding test coverage for preflight.RunPreflights()
TDD to work on https://github.com/replicatedhq/troubleshoot/issues/906
and verify the fix is successful
* go.mod/go.sum: removing gnomock stuff since it's not in use (yet)
* Makefile: try running the preflight integration test with the e2e tests,
since there's a K3s instance in place already
* Makefile add a dedicated test-integration task, which runs as it's own
github action job
* Makefile: exclude a few things from test-integration that break the
github action job
* WIP on preflight tests, addressing some of @banjoh's feedback, more to
go though (specifically changing over to using assert)
* preflight tests: use the testify libraries, restructure code to be
formatted more like other tests in this project
To keep both the Support Bundle and Preflight CLIs similar, this PR adds the ability for the Preflight binary to allow multiple specs be provided as CLI args and for them all to be run.
* add dedup for cluster resources collector
* restructure both collect.go in both pkg/supportbundle and pkg/preflight to be more similar for eventual refactor
* feat(analyze): add ExcludeFiles field to textAnazlye
* feat(analyze): fix test for getFiles
* feat(analyze): change function name to excludeFilePaths
* feat(analyze): fix preflight test fail
* feat(analyze): add tests for excludeFiles
* feat(schemas): run make schemas
* feat(analyze): use getChildCollectedFileContents function prototype
* feat(analyze): reduce time complexity
* feat(longhorn): add getFileContents as getCollectedFileContents
* fix(flag): fix wrong output filename
* fix(flag): add reset flag function
* fix(flag): add output flag test cases
* fix(flag): move resetFlags function into private go test
* fix(flag): restructure flag tests with testify
* fix(flag): remove resetFlags function
* fix(flag): remove duplicated test and rewrite test names
This change ensures that the clusterResources collector runs prior to any others
in order to not collect info on pods that collectors run during collection.
Additionally centralizes functions that are common to all collection to make future
maintenance simpler.
Fixes: #767
* fix: return true when one of analyzers is true
* fix: return true when one of analyzers is true
* update: add unit test
* update: log errors while processing analyzers
* fix: return parse error instead of logging
* fix: update err message
* fix: evaluate strict check err separately
* add strict flag to Analyzer/AnalyzerMeta
and regenerate schemas and controller-gen code
* map analyzer strict to result
* Update stdout for human and json format
* fix review comment
* update interactive result
* update interactive results
* Update types.go
* Update upload_results.go
* print strict when only true
* Add collect command and remote host collectors
Adds the ability to run a host collector on a set of remote k8s nodes.
Target nodes can be filtered using the --selector flag, with the same
syntax as kubectl. Existing flags for --collector-image,
--collector-pullpolicy and --request-timeout are used. To run on a
specified node, --selector="kubernetes.io/hostname=kind-worker2" could
be used.
The collect command is used by the remote collector to output the
results using a "raw" format, which uses the filename as the key, and
the value the output as a escaped json string. When run manually it
defaults to fully decoded json. The existing block devices,
ipv4interfaces and services host collectors don't decode properly - the
fix is to convert their slice output to a map (fix not included as
unsure what depends on the existing format).
The collect command is also useful for troubleshooting preflight issues.
Examples are included to show remote collector usage.
```
bin/collect --collector-image=croomes/troubleshoot:latest examples/collect/remote/memory.yaml --namespace test
{
"kind-control-plane": {
"system/memory.json": {
"total": 1304207360
}
},
"kind-worker": {
"system/memory.json": {
"total": 1695780864
}
},
"kind-worker2": {
"system/memory.json": {
"total": 1726353408
}
}
}
```
The preflight command has been updated to run remote collectors. To run
a host collector remotely it must be specified in the spec as a
`remoteCollector`:
```
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
name: memory
spec:
remoteCollectors:
- memory:
collectorName: memory
analyzers:
- memory:
outcomes:
- fail:
when: "< 8Gi"
message: At least 8Gi of memory is required
- warn:
when: "< 32Gi"
message: At least 32Gi of memory is recommended
- pass:
message: The system has as sufficient memory
```
Results for each node are analyzed separately, with the node name
appended to the title:
```
bin/preflight --interactive=false --collector-image=croomes/troubleshoot:latest examples/preflight/remote/memory.yaml --format=json
{memory running 0 1}
{memory completed 1 1}
{
"fail": [
{
"title": "Amount of Memory (kind-worker2)",
"message": "At least 8Gi of memory is required"
},
{
"title": "Amount of Memory (kind-worker)",
"message": "At least 8Gi of memory is required"
},
{
"title": "Amount of Memory (kind-control-plane)",
"message": "At least 8Gi of memory is required"
}
]
}
```
Also added a host collector to allow preflight checks of required kernel
modules, which is the main driver for this change.