* Uses secrets from cluster
* updated gitignore to stop ignoring needed files
* Delete specs.go.bak
* make fmt
* added preflight to generic loader
* Tells user to run in cluster if using secretKeyRef
* Update loader.go
* Update loader.go
* feat: add timestamps flag to logs collector
Kubernetes can return container logs with the timestamps it captured for each line. This is useful for containers that do not write timestamps themselves, so this exposes that option as a flag.
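When timestamps are requested (as with `kubectl logs --timestamps`), Kubernetes prefixes each line with an RFC 3339 timestamp followed by a single space. The minimal sketch below shows how a consumer of the collected logs might split that prefix off again. `splitTimestampedLine` is a hypothetical helper, not part of the collector.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// splitTimestampedLine separates the RFC 3339 timestamp that Kubernetes
// prepends to each log line (followed by one space) from the original message.
// This is a hypothetical helper for illustration.
func splitTimestampedLine(line string) (time.Time, string, error) {
	ts, msg, found := strings.Cut(line, " ")
	if !found {
		return time.Time{}, "", fmt.Errorf("no timestamp prefix in %q", line)
	}
	t, err := time.Parse(time.RFC3339Nano, ts)
	if err != nil {
		return time.Time{}, "", err
	}
	return t, msg, nil
}

func main() {
	t, msg, err := splitTimestampedLine("2023-05-01T12:30:45.123456789Z server started")
	if err != nil {
		panic(err)
	}
	fmt.Println(t.Format(time.RFC3339), msg)
}
```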
* fix: update schemas
* feat: implement collector and analyser for network namespace connectivity
Checks whether two network namespaces can talk to each other over UDP and TCP.
Its usage is as follows:
```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: test
spec:
  hostCollectors:
    - networkNamespaceConnectivity:
        collectorName: check-network-connectivity
        fromCIDR: 10.0.0.0/24
        toCIDR: 10.0.1.0/24
  hostAnalyzers:
    - networkNamespaceConnectivity:
        collectorName: check-network-connectivity
        outcomes:
          - pass:
              message: "Communication between 10.0.0.0/24 and 10.0.1.0/24 is working"
          - fail:
              message: "Communication between 10.0.0.0/24 and 10.0.1.0/24 isn't working"
```
If this fails, you may need to enable IP forwarding with:
```bash
sysctl -w net.ipv4.ip_forward=1
```
If it still fails, you may need to configure firewalld to allow the
traffic, or simply disable it for the sake of testing.
* chore: rebuild schemas
* chore: remove unused property
* chore: disable namespaces for other platforms
* chore: make sure we time out temporary servers
* feat: analyzer now supports multi-node collection
* feat: check both udp and tcp even on failure
Check both protocols even if one fails. This commit also introduces a
timeout that the user can set.
* feat: add templating to the failure outcome
Allow users to dump the errors found during the analysis.
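A minimal sketch of templating a failure message with Go's `text/template`. The field names in `failureData` (`FromCIDR`, `ToCIDR`, `Errors`) are assumptions for illustration; the analyser's actual template data may use different names.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// failureData is a hypothetical shape for the values exposed to the fail
// outcome's message template.
type failureData struct {
	FromCIDR string
	ToCIDR   string
	Errors   []string
}

// renderMessage parses the outcome message as a template and executes it
// against data, returning the rendered text.
func renderMessage(msg string, data failureData) (string, error) {
	tmpl, err := template.New("outcome").Parse(msg)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	msg := `Communication between {{ .FromCIDR }} and {{ .ToCIDR }} isn't working:{{ range .Errors }} {{ . }};{{ end }}`
	out, err := renderMessage(msg, failureData{
		FromCIDR: "10.0.0.0/24",
		ToCIDR:   "10.0.1.0/24",
		Errors:   []string{"udp: timeout"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```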
* chore: addressing pr comments
* feat: delete interface pair before namespace
Even though the interface pair was deleted every time we deleted the
namespace in my tests, it is better to delete it before we delete the
namespace.
This comes out of a review comment: some people still seem to be
able to see the interface pair even after the namespace is deleted.
In other words, better safe than sorry.
* chore: fix typo on comment
* add a TLS parameter for cacert
* pass a ca cert into http request
* test preflight
* make schemas
* log extra information from http request
* pass a proxy into the collector spec
* hitting a segfault; breakpoint
* accept a dir, file, or a string-literal as CA
* move tls params into get, put, post methods
* test for cert untrusted response
* make generate
* make schemas
* more test cases
* make schemas
* don't include system certs
* make generate && make schemas
* resolve gosec G402 warning
* remove old check for system certs
* ignore errcheck "return value not checked" linter errors
* Add image parameter to the goldpinger collector
* Pass image directly as a function arg
Also allow util image to be set in spec
* Remove pointless util image override
* Update pkg/collect/goldpinger.go
Co-authored-by: Evans Mungai <evans@replicated.com>
* Simplify image override
---------
Co-authored-by: Evans Mungai <evans@replicated.com>
* feat: Install goldpinger if one does not exist when running goldpinger collector
- Deploy goldpinger daemonset if one is not detected in the cluster
- Clean up all deployed resources
- Add delay to allow users to wait for goldpinger to perform checks
Signed-off-by: Evans Mungai <evans@replicated.com>
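The deploy-if-missing, wait, collect, then clean-up flow above can be sketched as follows. The `cluster` interface and the fake are hypothetical stand-ins for the Kubernetes client calls, not the troubleshoot API. In this sketch the delay only applies after a fresh deploy, which is a design choice made here for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// cluster is a hypothetical stand-in for the client calls the collector makes.
type cluster interface {
	GoldpingerExists() (bool, error)
	DeployGoldpinger() (cleanup func(), err error)
}

// withGoldpinger deploys goldpinger only when none is found, waits for the
// given delay so it can run its checks, runs collect, and then removes only
// the resources it created (the deferred cleanup runs after collect).
func withGoldpinger(c cluster, delay time.Duration, collect func() error) error {
	exists, err := c.GoldpingerExists()
	if err != nil {
		return err
	}
	if !exists {
		cleanup, err := c.DeployGoldpinger()
		if err != nil {
			return err
		}
		defer cleanup()
		time.Sleep(delay)
	}
	return collect()
}

// fakeCluster records what withGoldpinger did, for demonstration.
type fakeCluster struct {
	exists, deployed, cleaned bool
}

func (f *fakeCluster) GoldpingerExists() (bool, error) { return f.exists, nil }

func (f *fakeCluster) DeployGoldpinger() (func(), error) {
	f.deployed = true
	return func() { f.cleaned = true }, nil
}

func main() {
	fc := &fakeCluster{}
	err := withGoldpinger(fc, 10*time.Millisecond, func() error { return nil })
	fmt.Println(err, fc.deployed, fc.cleaned)
}
```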
* Add missing test data file
Signed-off-by: Evans Mungai <evans@replicated.com>
* Better naming of create resource functions
Signed-off-by: Evans Mungai <evans@replicated.com>
---------
Signed-off-by: Evans Mungai <evans@replicated.com>
* add struct for host dns collector
* add miekg/dns
* add more logs
* nit
* new field names
* use Hostnames instead of Names
* misc update
* make schemas
* no error when there is no resolv.conf
* query all searches
* add summary.json file
* merge summary into result file
* query AAAA and CNAME as well
* update schema for hostnames to be required
* store DNS collector in JSON output for analyze later
* fix incorrect path
* configurable dns image
* make non resolvable domain configurable
* nit update address field
* update dns util image
* add unit test
* new schema for etcd collector
* add placeholder
* wip
* get supported distribution
* add exec implementation
* wait for etcd pod to be ready
* misc
* update k0s etcd certs path
* fix unit tests
* address code reviews
* update from code review
* add etcdctl version
Linux control groups host collector that detects whether the specified mountPoint is a cgroup filesystem and, if so, which version it is. The collector also collects information about the configured cgroup controllers.
Signed-off-by: Evans Mungai <evans@replicated.com>
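A simplified sketch of the version detection, assuming `/proc/mounts`-formatted input: a `cgroup2` mount at the mount point means v2, and `cgroup` (v1) hierarchies mounted at or under it mean v1. Hybrid setups are not distinguished here. On v2, the available controllers are listed space-separated in `<mountPoint>/cgroup.controllers`. Function names are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cgroupVersion inspects /proc/mounts-formatted text and reports "v2",
// "v1" or "none" for the given mount point.
func cgroupVersion(mounts, mountPoint string) string {
	mountPoint = strings.TrimSuffix(mountPoint, "/")
	version := "none"
	sc := bufio.NewScanner(strings.NewReader(mounts))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 3 {
			continue
		}
		target, fstype := f[1], f[2]
		switch {
		case fstype == "cgroup2" && target == mountPoint:
			return "v2"
		case fstype == "cgroup" && (target == mountPoint || strings.HasPrefix(target, mountPoint+"/")):
			version = "v1"
		}
	}
	return version
}

// controllersV2 parses the contents of a cgroup v2 cgroup.controllers file.
func controllersV2(content string) []string {
	return strings.Fields(content)
}

func main() {
	mounts := "cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0\n"
	fmt.Println(cgroupVersion(mounts, "/sys/fs/cgroup"))
	fmt.Println(controllersV2("cpuset cpu io memory pids\n"))
}
```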
* new struct and update schemas
* implement Collect function
* add kernel config to collector struct
* generate kernel config analyzer schema
* implement kernel config analyzer
* fail on no match in pass outcome
* run make check-schemas
* fix failed unit test
* update from code review
* add selectedConfigs field
* run make check-schemas
* feat: node metrics analyser
The analyser only checks PVC usage at the moment. More analysers
can be added on an as-needed basis.
* Add tests
* Fix flaky test by waiting for goldpinger pods to start
* Fix how outcomes get checked
* Fix catch all outcome condition
* Fix test
* Regenerate schemas
* Fix failing test
---------
Co-authored-by: Dexter Yan <yanshaocong@gmail.com>
* feat: node metrics collector
A collector to collect node metrics served by the API server as
per the documented API https://kubernetes.io/docs/reference/instrumentation/node-metrics/
* Update CRD schemas
* Add tests
* Remove clean from build target
* Update comments
* Commit missing tests
* Remove unnecessary log in tests
* feat: goldpinger analyser
Analyser to generate a report from goldpinger results
* Add goldpinger testdata
* Goldpinger collector
* Improvements after running tests
* More minor updates after further testing
* Better error message if a container fails to start
* A few more updates
* Add goldpinger e2e test
* Update schemas
* Clean up helm installs in e2e tests
* Add resource limits to goldpinger pods
* Some minor improvements
* Some more changes noted when writing docs
* Update schemas
* A few more updates when testing with kURL
* Log goldpinger tests
* Tests before exit code
* fix: missing omitempty on 2 of the new fields
* fix: Rename TS_WORKSPACE_DIR to TS_OUTPUT_DIR
---------
Co-authored-by: Evans Mungai <evans@replicated.com>
* feat: save cmd run output
* chore: schema changes
* chore: example hostCollector
* chore: add log messages to key actions
* fix: correctly inherit all parent env by default
* chore: do not save input file
The user who invokes the command already has the input, but its contents could be sensitive if another user receives this bundle.
* test: unit test for host run
* revert: "chore: do not save input file"
This reverts commit 6af77ad1ce.
That commit was wrong.
* chore: fix log msg and example yaml
* Ensure child cmd runs in its own working dir
* Check filename for slashes not content
* Update logging
* Allow files at relative paths to be used as commands
---------
Co-authored-by: Evans Mungai <evans@replicated.com>
* feat: add velero analyzer (#806)
* updated schema
* analyzer without collector
* tests
* covers deprecated Restic repository type
* velero version from deployment image to check deprecated type
* read for both velero pod kinds (velero*, node-agent*)
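Reading the Velero version from the deployment image can be sketched as parsing the image tag into major and minor numbers. Registry ports and digests are handled, and untagged images return `ok=false`. `imageVersion` and `atLeast` are hypothetical, and the 1.10 cut-off in `main` is an illustrative threshold only, not a statement about when the Restic type was deprecated.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// imageVersion extracts major and minor from a reference such as
// "velero/velero:v1.11.0". A trailing "@sha256:..." digest is ignored.
func imageVersion(image string) (major, minor int, ok bool) {
	if at := strings.Index(image, "@"); at >= 0 {
		image = image[:at]
	}
	i := strings.LastIndex(image, ":")
	if i < 0 || strings.Contains(image[i+1:], "/") {
		return 0, 0, false // no tag; any colon belonged to a registry port
	}
	parts := strings.SplitN(strings.TrimPrefix(image[i+1:], "v"), ".", 3)
	if len(parts) < 2 {
		return 0, 0, false
	}
	var err error
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, false
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, false
	}
	return major, minor, true
}

// atLeast reports whether major.minor >= wantMajor.wantMinor.
func atLeast(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

func main() {
	major, minor, ok := imageVersion("velero/velero:v1.11.0")
	// Hypothetical cut-off for illustration only.
	fmt.Println(ok && atLeast(major, minor, 1, 10))
}
```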
---------
Signed-off-by: Archit Sharma <archit@pm.me>
* feat: Add regular expressions host analyser
This analyser is the same as the in-cluster text analyser. You pass in
search expressions to find values in files collected in a bundle.
* additional test assertion to check analyser warn