Commit Graph

334 Commits

Author SHA1 Message Date
ada mancini
eacff7112f support adding a CA cert to http collector (#1624)
* add a TLS parameter for cacert

* pass a ca cert into http request

* test preflight

* make schemas

* log extra information from http request

* pass a proxy into the collector spec

* hitting a segfault; breakpoint

* accept a dir, file, or a string-literal as CA

* move tls params into get, put, post methods

* test for cert untrusted response

* make generate

* make schemas

* more test cases

* make schemas

* don't include system certs

* make generate && make schemas

* resolve gosec G402 warning

* remove old check for system certs

* ignore errcheck "return value not checked" linter errors
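
The CA handling listed above can be illustrated with a minimal Go sketch. This is not the code from this commit: the helper name `tlsConfigFromPEM` is hypothetical, and it only covers the string-literal case (not the dir or file cases). It starts from an empty certificate pool so system certs are not included, and sets a minimum TLS version, which is the kind of setting the gosec G402 check looks at.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// tlsConfigFromPEM is a hypothetical helper (not taken from the commit).
// It builds a TLS config that trusts only the CA certificates in caPEM.
func tlsConfigFromPEM(caPEM []byte) (*tls.Config, error) {
	// Start from an empty pool so system certificates are deliberately left out.
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no valid PEM certificates found in CA input")
	}
	return &tls.Config{
		RootCAs:    pool,
		MinVersion: tls.VersionTLS12, // explicit minimum version
	}, nil
}

func main() {
	// Input that is not PEM is rejected rather than silently ignored.
	_, err := tlsConfigFromPEM([]byte("not a certificate"))
	fmt.Println("error for invalid input:", err != nil)
}
```

The returned config could then be set on an `http.Transparent`-free standard `http.Transport` via its `TLSClientConfig` field before making the get, put, or post request.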
2024-10-23 18:15:08 -04:00
Diamon Wiggins
b88bc8ddf7 Refactor Multi Node Analyzers (#1646)
* initial refactor of host os analyzer

* refactor remote collect analysis

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
Co-authored-by: Gerard Nguyen <gerard@replicated.com>
Co-authored-by: Evans Mungai <evans@replicated.com>
2024-10-22 10:45:50 +13:00
Shubhag Saxena
52efd167ad feat: allow users to check cpu arch (#1644) 2024-10-10 18:59:22 +05:30
Diamon Wiggins
8105fa00e9 Refactor Remote Host Collection (#1633)
* refactor remote collectors

* add remotecollect params struct

* remove commented checkrbac function

* removed unused function

* add temp comments

* refactor to not require RemoteCollect method per collector

* removed unneeded param

* removed unneeded param

* more refactor

* more refactor

* remove unneeded function

* remove debug print

* fix analyzer results

* move rbac to separate file

* be more specific with rbac function name

* fix imports

* fix node list file

* make k8s rest client config consistent with in cluster collection

* add ctx and otel tracing

* add test for allCollectedData

* move runHostCollectorsInPod to spec instead of metadata

* make generate

* fix broken references to supportbundle metadata

* add e2e tests

* update loader tests

* fix tests

* fix hostos remote collector spec

* update remoteHostCollectrs.yaml

---------

Co-authored-by: Dexter Yan <yanshaocong@gmail.com>
2024-10-09 18:38:49 +13:00
Evans Mungai
0240a632c9 chore: Collect endpointslices resources (#1636)
Signed-off-by: Evans Mungai <evans@replicated.com>
2024-10-03 14:55:53 +01:00
Ash
f58f02560f Allow goldpinger / goldpinger util images to be set in collector spec (#1635)
* Add image parameter to the goldpinger collector

* Pass image directly as a function arg

Also allow util image to be set in spec

* Remove pointless util image override

* Update pkg/collect/goldpinger.go

Co-authored-by: Evans Mungai <evans@replicated.com>

* Simplify image override

---------

Co-authored-by: Evans Mungai <evans@replicated.com>
2024-10-03 13:18:36 +01:00
Gerard Nguyen
c1c4b612a4 feat: [sc-113128] Create node list file before running remote host collector (#1632)
* create node list
2024-10-01 14:43:24 +10:00
Gerard Nguyen
d60f9a6b76 change resolvedFromSearch content (#1629) 2024-09-30 11:22:10 +10:00
Ricardo Maraschini
668b7ed0b2 feat: add CPU micro architecture support (#1628)
Allows troubleshoot to collect and analyze the CPU micro architecture. This
is a usage example:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: ec-cluster-preflight
spec:
  collectors:
  - cpu: {}
  analyzers:
  - cpu:
      checkName: CPU
      outcomes:
        - pass:
            when: 'supports x86-64-v2'
            message: CPU supports x86-64-v2
        - fail:
            message: CPU does not support x86-64-v2
```
2024-09-27 17:16:49 +02:00
Dexter Yan
142015cce3 feat(analyzer): enable host os info analyzer to support multiple nodes (#1618) 2024-09-26 10:25:08 +12:00
Evans Mungai
83f02f4705 feat: Install goldpinger daemonset if one does not exist when running goldpinger collector (#1619)
* feat: Install goldpinger if one does not exist when running goldpinger collector

- Deploy goldpinger daemonset if one is not detected in the cluster
- Clean up all deployed resources
- Add delay to allow users to wait for goldpinger to perform checks

Signed-off-by: Evans Mungai <evans@replicated.com>

* Add missing test data file

Signed-off-by: Evans Mungai <evans@replicated.com>

* Better naming of create resource functions

Signed-off-by: Evans Mungai <evans@replicated.com>

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-09-24 17:17:14 +01:00
Dexter Yan
e97b9613a5 feat(support-bundle): add runHostCollectorsInPod in spec (#1608) 2024-09-20 11:57:58 -05:00
Gerard Nguyen
8823f7d99e feat: host collector for DNS (#1617)
* add struct for host dns collector

* add miekg/dns

* add more logs

* nit

* new field names

* use Hostnames instead of Names

* misc update

* make schemas

* no error when there is no resolv.conf

* query all searches

* add summary.json file

* merge summary into result file

* query AAAA and CNAME as well

* update schema for hostnames to be required
2024-09-20 08:13:57 +10:00
Evans Mungai
aea4f7c87c feat: Optionally save preflight bundles to disk (#1612)
* feat: Optionally save preflight bundles to disk

Signed-off-by: Evans Mungai <evans@replicated.com>

* Add e2e test of saving preflight bundle

Signed-off-by: Evans Mungai <evans@replicated.com>

* Update cli docs

Signed-off-by: Evans Mungai <evans@replicated.com>

* Expose GetVersionFile function publicly

Signed-off-by: Evans Mungai <evans@replicated.com>

* Store analysis.json file in preflight bundle

Signed-off-by: Evans Mungai <evans@replicated.com>

* Run go fmt when running lint fixers

Signed-off-by: Evans Mungai <evans@replicated.com>

* Always generate a preflight bundle in CLI

Signed-off-by: Evans Mungai <evans@replicated.com>

* Print saving bundle message to stderr

Signed-off-by: Evans Mungai <evans@replicated.com>

* Revert changes in docs directory

Signed-off-by: Evans Mungai <evans@replicated.com>

* Use NewResult constructor

Signed-off-by: Evans Mungai <evans@replicated.com>

* Log always when preflight bundle is saved to disk

Signed-off-by: Evans Mungai <evans@replicated.com>

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-09-16 23:36:52 +01:00
Gerard Nguyen
05dcae2388 fix: [sc-112114] registry collector failed to talk to Replicated private registry (#1613)
decode auth for registry secret
2024-09-13 15:03:52 +01:00
Gerard Nguyen
7484b10914 feat: [sc-110727] troubleshoot: collector/analyzer for wildcard dns (#1606)
* store DNS collector output as JSON for later analysis

* fix incorrect path

* configurable dns image

* make non resolvable domain configurable

* nit update address field

* update dns util image
* add unit test
2024-09-11 14:35:30 +10:00
Ethan Mosbaugh
1b1efa133e feat(fio): add option to disable runtime (#1601) 2024-08-22 16:47:08 -07:00
Gerard Nguyen
47656a8e6f feat: etcd collector (#1589)
* new schema for etcd collector

* add placeholder

* wip

* get supported distribution

* add exec implementation

* wait for etcd pod to be ready

* misc

* update k0s etcd certs path

* fix unit tests

* address code reviews

* update from code review

* add etcdctl version
2024-08-13 08:42:26 +10:00
Gerard Nguyen
60263caf78 feat: [sc-108732] Can't add annotations in pods executed with the runPod collector (#1590)
add new field annotations for run pod collector
2024-08-08 10:13:36 +10:00
Gerard Nguyen
a57171a918 feat: [sc-108689] troubleshoot: journald collector (#1586)
* add schema for Journald Host Collector

* implement journald host collector

* update host collector

* add --no-pager
2024-07-29 09:44:43 +10:00
Evans Mungai
01d5804977 feat: cgroups host collector (#1581)
Linux control groups host collector that detects whether the specified mountPoint is a cgroup filesystem and which version it is. The collector also collects information about the configured cgroup controllers.

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-07-24 16:46:04 +01:00
Gerard Nguyen
f5f02f5a80 fix: [sc-107456] exec collector is running in all pods matched the selector (#1571)
only exec in 1 pod
2024-07-08 09:31:29 +10:00
Gerard Nguyen
e882f44ae9 Gerard/sc 106216/b registry image collector (#1570)
* update registry auth with username and password

* add unit test
2024-07-03 08:06:37 +10:00
Evans Mungai
ce155270c8 fix: Use correct cron job kind when discovering API versions (#1554)
* fix: Use correct cron job kind when discovering API versions

* Fix failing e2e test
2024-05-31 07:23:56 +01:00
Gerard Nguyen
80e5fac07c feat: New host collector and analyzer for Kernel Configs (#1546)
* new struct and update schemas

* implement Collect function

* add kernel config to collector struct

* generate kernel config analyzer schema

* implement kernel config analyzer

* fail on no match in pass outcome

* run make check-schemas

* fix failed unit test

* update from code review

* add selectedConfigs field

* run make check-schemas
2024-05-27 09:55:39 +10:00
Gerard Nguyen
6b368f2221 feat: [sc-103754] Be able to detect search domain misconfiguration #1391 (#1534)
* new collector dns

* implement DNS collector

* add dns service and endpoints check

* add nil check on retrieve endpoints
2024-05-01 07:04:20 +10:00
Dexter Yan
f3bad5f409 fix(collector): fix helm collector with a nil error return to error list (#1535) 2024-04-29 12:19:34 +12:00
Andrew Lavery
f18b5d754e update k8s imports to v0.30.0 and address changed function signature (#1528)
* update k8s imports to v0.30.0 and address changed function signature

* update schemas
2024-04-24 23:05:37 +09:00
Evans Mungai
123d17ab4a feat: node metrics collector (#1516)
* feat: node metrics collector

A collector to collect node metrics served by the API server as
per the documented API https://kubernetes.io/docs/reference/instrumentation/node-metrics/

* Update CRD schemas

* Add tests

* Remove clean from build target

* Update comments

* Commit missing tests

* Remove unnecessary log in tests
2024-04-02 14:59:55 +01:00
Gerard Nguyen
76c52d2b93 New JSON field in HTTP request collector if any (#1511)
* add new raw_json field to http collector

* add unit test
2024-03-28 17:08:51 +11:00
Gerard Nguyen
6f839b389d Quick fix to prevent panic on nil collector result (#1508)
* quick fix to prevent panic on nil collector result
* do not save pod details on error
2024-03-18 16:42:52 +11:00
Salah Al Saleh
8132936e3e Fix automated PRs manager workflow (#1489)
* Fix automated PRs manager workflow
2024-02-26 14:30:21 -08:00
Ethan Mosbaugh
e24ca642aa feat: sonobuoy collector (#1469) 2024-02-16 06:56:15 -08:00
Gerard Nguyen
772d867093 update correct K8S API group for batch and policy (#1461) 2024-02-09 08:17:55 +13:00
Gerard Nguyen
5daf6d6c89 Feat: new collector run-daemonset (#1460)
* feat: new collector run-daemonset
2024-02-09 08:16:32 +13:00
Evans Mungai
2def517d39 chore: additional ceph collector commands (#1450)
* Collect text representations of the ceph data which are easier to read
* Collect ceph df command output
2024-02-06 10:04:25 +13:00
Gerard Nguyen
39b371991e feat: add timeout to run host collector (#1435)
* feat: add timeout to run host collector

* return error on invalid timeout

* return -1 on context deadline exceeded
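
The three steps above (a timeout on the run, an error on an invalid timeout, and -1 when the deadline is exceeded) can be sketched in Go as follows. This is an illustrative sketch under assumptions, not the collector's code: the function name `runWithTimeout` and its signature are hypothetical, and the timeout is assumed to be a Go duration string.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// runWithTimeout is a hypothetical helper (not taken from the commit).
// It runs a command with a deadline and returns its exit code.
func runWithTimeout(timeout, name string, args ...string) (int, error) {
	d, err := time.ParseDuration(timeout)
	if err != nil {
		// An invalid timeout is reported as an error instead of being ignored.
		return 0, fmt.Errorf("invalid timeout %q: %w", timeout, err)
	}

	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()

	err = exec.CommandContext(ctx, name, args...).Run()
	if err != nil && errors.Is(ctx.Err(), context.DeadlineExceeded) {
		// The deadline passed before the command finished: report -1.
		return -1, ctx.Err()
	}

	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// The command ran and exited with a non-zero status.
		return exitErr.ExitCode(), nil
	}
	return 0, err
}

func main() {
	_, err := runWithTimeout("not-a-duration", "true")
	fmt.Println("invalid timeout error:", err)
}
```

Calling the helper with a short timeout and a long-running command would take the deadline branch and return -1 together with `context.DeadlineExceeded`.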
2024-01-31 12:46:01 +00:00
Akash Shrivastava
361e12e691 feat: [ISSUE-1401]: Added helm get values option in Helm collector (#1402)
* feat: [ISSUE-1401]: Added helm get values option in Helm collector

Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>

* Made changes in error handling

Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>

* feat: [ISSUE-1401]: Added test cases and fixes

Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>

* fixed test case

Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>

---------

Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>
2024-01-16 19:12:43 +00:00
Xav Paice
e542f4fd0a bump k8s.io packages to v0.29.0 (#1419)
* bump k8s.io packages to v0.29.0

* update Go to 1.21

* update schemas
2024-01-08 17:24:41 +00:00
Evans Mungai
3012b870bd fix: flaky e2e test (#1400) 2023-12-20 16:07:30 +13:00
Evans Mungai
53113c0170 feat: goldpinger collector and analyser (#1398)
* feat: goldpinger analyser

Analyser to generate a report from goldpinger results

* Add goldpinger testdata

* Goldpinger collector

* Improvements after running tests

* More minor updates after further testing

* Better error message if a container fails to start

* A few more updates

* Add goldpinger e2e test

* Update schemas

* Clean up helm installs in e2e tests

* Add resource limits to goldpinger pods

* Some minor improvements

* Some more changes noted when writing docs

* Update schemas

* A few more updates when testing with kURL

* Log goldpinger tests

* Tests before exit code
2023-12-12 11:02:41 +00:00
Evans Mungai
d4623d9404 fix(collector): Let pgx library parse TLS parameters (#1390)
* fix(collector): Let pgx library parse TLS parameters

This allows the collector to respect the sslmode parameters

Fix: #1163

* Add comment

* Improve postgres collector test
2023-11-16 12:51:43 +00:00
Weiyanli Chen(York)
bc4856869e fix: missing omitempty on 2 of the new fields (#1389)
* fix: missing omitempty on 2 of the new fields

* fix: Rename TS_WORKSPACE_DIR to TS_OUTPUT_DIR

---------

Co-authored-by: Evans Mungai <evans@replicated.com>
2023-11-09 13:25:11 +00:00
Weiyanli Chen(York)
f6373f3e36 feat: save host run file output (#1376)
* feat: save cmd run output

* chore: schema changes

* chore: example hostCollector

* chore: add log messages to key actions

* fix: correctly inherit all parent env by default

* chore: do not save input file

The user who invokes the command already has the input, but its content could be sensitive to another user who receives this bundle.

* test: unit test for host run

* revert: "chore: do not save input file"

This reverts commit 6af77ad1ce.

that commit is wrong

* chore: fix log msg and example yaml

* Ensure child cmd runs in its own working dir

* Check filename for slashes not content

* Update logging

* Add using relative path files as commands

---------

Co-authored-by: Evans Mungai <evans@replicated.com>
2023-11-08 13:49:46 +00:00
Evans Mungai
73a2d882d7 fix: Store custom resources in JSON & YAML format (#1360)
fix: Store custom resources as JSON and YAML files
2023-10-10 17:50:15 +01:00
Xav Paice
cc56522571 feat: add ceph df to ceph collector (#1358) 2023-10-09 16:53:20 +13:00
ada mancini
e3adc1cb35 call out to fio for host filesystem performance (#1275)
* stashing changes

* split filesystem collector into fio and legacy functions

* read fio results into analyzer

* remove test script

* update go.mod

* remove old notes

* go mod tidy

* fix up go.mod

* fix up go.mod

* refactor tests for fio

* make schemas

* remove local scripts

* local watch script for building troubleshoot

* document watch script

* fix var names

* handle errors if run as non-root

* go mod tidy

* use String interface

* collector happy path test

* invalid filesize

* invalid filesize

* tests

* remove old code

* remove old init function

* let actions tests run this

* clean up tests

* go mod tidy

* remove duplicated type declaration

* remove old file create code
2023-10-03 14:21:56 -04:00
Evans Mungai
86279b4ec4 chore(redactors): memory consumption improvements (#1332)
* Document additional go tool profiling flags

* Add a regex cache to avoid compiling regular expressions all the time

* Reduce max buffer capacity

* Prefer bytes to strings

Strings are immutable, so every operation on them
has to create a new one

* Some more changes

* More bytes

* Use writer.Write instead of fmt.Fprintf

* Clear regex cache when resetting redactors

* Log errors when redactors fail, since they otherwise get swallowed

* Add an improvement comment

* Limit the number of goroutines spawned when redacting

* Minor improvement

* Write byte slices one at a time instead of concatenating them first

* Add a test for writeBytes

* Additional tests
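
The regex cache mentioned above can be sketched roughly as follows. This is a minimal illustration of the idea, assuming a map guarded by a read/write mutex; the type and method names (`regexCache`, `get`, `reset`) are hypothetical and are not taken from the commit.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// regexCache is a hypothetical cache of compiled patterns, so the same
// expression is compiled once and then reused.
type regexCache struct {
	mu    sync.RWMutex
	items map[string]*regexp.Regexp
}

func newRegexCache() *regexCache {
	return &regexCache{items: make(map[string]*regexp.Regexp)}
}

// get returns the cached regexp for pattern, compiling it on first use.
func (c *regexCache) get(pattern string) (*regexp.Regexp, error) {
	c.mu.RLock()
	re, ok := c.items[pattern]
	c.mu.RUnlock()
	if ok {
		return re, nil
	}

	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	// Another goroutine may have stored the pattern in the meantime.
	if existing, ok := c.items[pattern]; ok {
		return existing, nil
	}
	c.items[pattern] = re
	return re, nil
}

// reset drops all cached patterns, e.g. when redactors are reset.
func (c *regexCache) reset() {
	c.mu.Lock()
	c.items = make(map[string]*regexp.Regexp)
	c.mu.Unlock()
}

func main() {
	cache := newRegexCache()
	a, _ := cache.get(`password=\S+`)
	b, _ := cache.get(`password=\S+`)
	fmt.Println("same compiled regexp reused:", a == b)
}
```

A compiled `*regexp.Regexp` is safe for concurrent use, which is what makes sharing one instance across redacting goroutines reasonable.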
2023-09-15 13:09:21 -04:00
Evans Mungai
514c86d891 fix: use duration strings for http collector timeout (#1338)
* fix: use duration strings for http collector timeout

This follows the same format that all other collectors use.

* Update from PR comment
2023-09-13 19:18:03 -04:00
Archit Sharma
24c6278e5d add http collector timeouts; fixes #1064 (#1310)
* feat(collector): add http method timeouts (#1064)

Signed-off-by: Archit Sharma <archit@replicated.com>

* feat(collector) add schema updates for the http timeout (#1064)

Signed-off-by: Archit Sharma <archit@replicated.com>

* feat(collector): refactor HTTP method calls (#1064)

Signed-off-by: Archit Sharma <archit@replicated.com>

* feat(collector): add tests for HTTP type collector (#1064)

Signed-off-by: Archit Sharma <archit@replicated.com>

---------

Signed-off-by: Archit Sharma <archit@replicated.com>
2023-09-01 19:08:22 +05:30