* feat: implement collector and analyser for network namespace connectivity
checks if two network namespaces can talk to each other on udp and tcp.
its usage is as follows:
```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: test
spec:
  hostCollectors:
    - networkNamespaceConnectivity:
        collectorName: check-network-connectivity
        fromCIDR: 10.0.0.0/24
        toCIDR: 10.0.1.0/24
  hostAnalyzers:
    - networkNamespaceConnectivity:
        collectorName: check-network-connectivity
        outcomes:
          - pass:
              message: "Communication between 10.0.0.0/24 and 10.0.1.0/24 is working"
          - fail:
              message: "Communication between 10.0.0.0/24 and 10.0.1.0/24 isn't working"
```
if this fails then you may need to enable `forwarding` with:
```bash
sysctl -w net.ipv4.ip_forward=1
```
if it still fails then you may need to configure firewalld to allow the
traffic, or simply disable it for the sake of testing.
* chore: rebuild schemas
* chore: remove unused property
* chore: disable namespaces for other platforms
* chore: make sure we timeout temporary servers
* feat: analyzer now supports multi-node collection
* feat: check both udp and tcp even on failure
check both protocols even if one fails. this commit also introduces a
timeout that can be set by the user.
* feat: add templating to the failure outcome
allow users to dump the errors found during the analysis.
* chore: addressing pr comments
* feat: delete interface pair before namespace
even though the interface pair is deleted every time we delete the
namespace in my tests, it is safer to delete it explicitly before we
delete the namespace.
this comes out of a review comment: some people still seem to be able
to see the interface pair even after the namespace is deleted.
i.e. better safe than sorry.
* chore: fix typo on comment
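as a rough illustration of the "check both udp and tcp even on failure" change above, the python sketch below probes both protocols and collects every error instead of stopping at the first one. it is not troubleshoot's implementation: the function names, the `ping` payload and the assumption that the udp peer answers a datagram are all hypothetical.

```python
import socket


def check_tcp(host, port, timeout):
    """Return None if a TCP connection succeeds, else an error string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        return "tcp: {}".format(exc)


def check_udp(host, port, timeout):
    """Return None if the peer answers a datagram in time, else an error string.

    Assumption: the remote side replies to whatever it receives.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(b"ping", (host, port))
            sock.recvfrom(1024)
            return None
    except OSError as exc:
        return "udp: {}".format(exc)


def check_connectivity(host, tcp_port, udp_port, timeout=5.0):
    """Run both probes unconditionally and return the list of errors found."""
    errors = []
    for probe, port in ((check_tcp, tcp_port), (check_udp, udp_port)):
        error = probe(host, port, timeout)
        if error is not None:
            errors.append(error)
    return errors
```

an empty list means both probes succeeded; a non-empty list carries one entry per failed protocol, which is the kind of detail a templated failure message could print.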
allow users to check if specific cpu flags are supported by the host.
```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: ec-cluster-preflight
spec:
  collectors:
    - cpu: {}
  analyzers:
    - cpu:
        checkName: CPU
        outcomes:
          - pass:
              when: hasFlags cmov,cx8,fpu,fxsr,mmx
              message: CPU supports all required flags
          - fail:
              message: CPU not supported
```
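the `hasFlags` condition in the example above can be read as a subset test: every listed flag must be present on the host. the python sketch below shows that reading under stated assumptions. `has_flags` and `read_host_flags` are hypothetical helper names, and parsing the `flags` line of `/proc/cpuinfo`-style text is an assumption about where the flags come from, not a description of the collector.

```python
def read_host_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def has_flags(required, host_flags):
    """Return True when every comma-separated flag in `required` is in `host_flags`."""
    wanted = {flag.strip() for flag in required.split(",") if flag.strip()}
    return wanted.issubset(host_flags)
```

with this reading, a single missing flag is enough for the `pass` outcome not to match, so the `fail` outcome applies.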
allows troubleshoot to collect and analyze the CPU micro-architecture. this
is a usage example:
```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: ec-cluster-preflight
spec:
  collectors:
    - cpu: {}
  analyzers:
    - cpu:
        checkName: CPU
        outcomes:
          - pass:
              when: 'supports x86-64-v2'
              message: CPU supports x86-64-v2
          - fail:
              message: CPU does not support x86-64-v2
```
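one way to think about a `supports x86-64-v2` check is as a cumulative flag requirement: a level is supported when the host has the baseline flags plus everything each later level adds. the python sketch below follows that idea using the flag names Linux prints in `/proc/cpuinfo`. the table only covers the baseline and v2, `supports_level` is a hypothetical name, and troubleshoot may determine the level differently.

```python
# Flags added by each micro-architecture level, in /proc/cpuinfo naming.
# "pni" is how Linux reports SSE3; "cx16" is CMPXCHG16B; "lahf_lm" is LAHF/SAHF.
LEVEL_FLAGS = {
    "x86-64": {"cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse", "sse2"},
    "x86-64-v2": {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"},
}

# Levels are cumulative: each one requires all the levels before it.
LEVEL_ORDER = ["x86-64", "x86-64-v2"]


def supports_level(level, host_flags):
    """Return True when `host_flags` covers `level` and every level below it."""
    if level not in LEVEL_ORDER:
        raise ValueError("unknown micro-architecture level: {}".format(level))
    required = set()
    for name in LEVEL_ORDER[: LEVEL_ORDER.index(level) + 1]:
        required |= LEVEL_FLAGS[name]
    return required.issubset(set(host_flags))
```

a host that only reports the baseline flags would pass `supports_level("x86-64", ...)` but not the v2 check.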
Stop re-analysing preflight results when uploadResultsTo is present, which was leading to duplicate results
Signed-off-by: Evans Mungai <evans@replicated.com>
* feat: Install goldpinger if one does not exist when running goldpinger collector
- Deploy goldpinger daemonset if one is not detected in the cluster
- Clean up all deployed resources
- Add delay to allow users to wait for goldpinger to perform checks
Signed-off-by: Evans Mungai <evans@replicated.com>
* Add missing test data file
Signed-off-by: Evans Mungai <evans@replicated.com>
* Better naming of create resource functions
Signed-off-by: Evans Mungai <evans@replicated.com>
---------
Signed-off-by: Evans Mungai <evans@replicated.com>
* new struct and update schemas
* implement Collect function
* add kernel config to collector struct
* generate kernel config analyzer schema
* implement kernel config analyzer
* fail on no match in pass outcome
* run make check-schemas
* fix failed unit test
* update from code review
* add selectedConfigs field
* run make check-schemas
* feat: node metrics analyser
The analyser only checks PVC usage at the moment. More analysers
can be added as the need arises
* Add tests
* Fix flaky test by waiting for goldpinger pods to start
* Fix how outcomes get checked
* Fix catch all outcome condition
* Fix test
* Regenerate schemas
* Fix failing test
---------
Co-authored-by: Dexter Yan <yanshaocong@gmail.com>
In order to be compatible with KOTS, downgrade Velero to 1.10.
This removes some features from the Velero collector, but unblocks KOTS from being able
to import Troubleshoot.
We should be wary of updating Velero in the future to prevent this recurring.
sc-98475
* Add workaround for EKS version string
The EKS version string returned is not semver compliant. To work around this, we remove
the suffix for version strings that contain -eks-.
Fixes #1441
* Add parsing version test cases
* Rename function
---------
Co-authored-by: Evans Mungai <evans@replicated.com>
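The EKS workaround described above amounts to cutting the version string at the `-eks-` marker before handing it to a semver parser. A minimal Python sketch of that step follows; the function name `strip_eks_suffix` and the sample version strings are hypothetical, and the real change may handle more cases.

```python
EKS_MARKER = "-eks-"


def strip_eks_suffix(version):
    """Drop everything from '-eks-' onwards; other version strings pass through unchanged."""
    index = version.find(EKS_MARKER)
    if index == -1:
        return version
    return version[:index]
```

For example, `strip_eks_suffix("v1.26.1-eks-abc1234")` yields `"v1.26.1"`, while a string without the marker is returned as-is.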
* feat: allow templating of the outcome message for the JSON and YAML Compare analyzers
* Update pkg/analyze/json_compare.go
Co-authored-by: Evans Mungai <evans@replicated.com>
* feat: goldpinger analyser
Analyser to generate a report from goldpinger results
* Add goldpinger testdata
* Goldpinger collector
* Improvements after running tests
* More minor updates after further testing
* Better error message if a container fails to start
* A few more updates
* Add goldpinger e2e test
* Update schemas
* Clean up helm installs in e2e tests
* Add resource limits to goldpinger pods
* Some minor improvements
* Some more changes noted when writing docs
* Update schemas
* A few more updates when testing with kURL
* Log goldpinger tests
* Tests before exit code
* feat: add velero analyzer (#806)
* updated schema
* analyzer without collector
* tests
* covers deprecated Restic repository type
* velero version from deployment image to check deprecated type
* read for both velero pod kinds (velero*, node-agent*)
---------
Signed-off-by: Archit Sharma <archit@pm.me>
* stashing changes
* split filesystem collector into fio and legacy functions
* read fio results into analyzer
* remove test script
* update go.mod
* remove old notes
* go mod tidy
* fix up go.mod
* fix up go.mod
* refactor tests for fio
* make schemas
* remove local scripts
* local watch script for building troubleshoot
* document watch script
* fix var names
* handle errors if run as non-root
* go mod tidy
* use String interface
* collector happy path test
* invalid filesize
* invalid filesize
* tests
* remove old code
* remove old init function
* let actions tests run this
* clean up tests
* go mod tidy
* remove duplicated type declaration
* remove old file create code