748 Commits

Author SHA1 Message Date
Gerard Nguyen
f0b8de68ae feat: multiple nodes analyzers (#1667)
* implement refactor for multiple node analyzers

---------

Co-authored-by: Diamon Wiggins <38189728+diamonwiggins@users.noreply.github.com>
2024-11-04 14:17:39 +11:00
Ash
544a700062 [sc-114813] copy HostCollector fails to copy binary files when run in cluster (#1669)
* Don't convert output bytes to string

This prevents binary files from getting mangled when the collector output is passed between functions

* Update pkg/collect/runner.go

Co-authored-by: Evans Mungai <evans@replicated.com>

* organise imports

---------

Co-authored-by: Evans Mungai <evans@replicated.com>
2024-10-31 10:44:35 +00:00
Dexter Yan
059b5d14d2 fix(collector): limit run pod collector to delete only one related secret (#1668)
* fix(collector): limit run pod collector to delete only related secret

* change to ctx
2024-10-30 14:19:30 +00:00
Evans Mungai
deda4ce98c feat: Do not prompt users to save support bundle analysis results (#1662)
In interactive mode, do not prompt users to save support
bundle analysis results. Users end up providing this file
instead of the support bundle archive. The analysis results
are contained in the support bundle archive already

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-10-25 13:03:16 +01:00
Dexter Yan
350418c6e9 feat(host-collector): add progress for host collector (#1659)
2024-10-25 15:34:09 +13:00
ada mancini
eacff7112f support adding a CA cert to http collector (#1624)
* add a TLS parameter for cacert

* pass a ca cert into http request

* test preflight

* make schemas

* log extra information from http request

* pass a proxy into the collector spec

* hitting a segfault; breakpoint

* accept a dir, file, or a string-literal as CA

* move tls params into get, put, post methods

* test for cert untrusted response

* make generate

* make schemas

* more test cases

* make schemas

* dont include system certs

* make generate && make schemas

* resolve gosec G402 warning

* remove old check for system certs

* ignore errcheck "return value not checked" linter errors
2024-10-23 18:15:08 -04:00
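Several steps in the commit above (a CA passed into the HTTP request, no system certs, the gosec G402 fix) correspond to a familiar pattern in Go's standard library. The following is a minimal sketch of that pattern, assuming the CA has already been read into memory as PEM bytes. `newHTTPClientWithCA` is a hypothetical name, not the collector's actual function.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
)

// newHTTPClientWithCA is a hypothetical helper. It returns a client that
// trusts only the certificates found in caPEM.
func newHTTPClientWithCA(caPEM []byte) (*http.Client, error) {
	// Start from an empty pool so system roots are not included.
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no valid PEM certificates found in CA data")
	}

	transport := &http.Transport{
		TLSClientConfig: &tls.Config{
			RootCAs: pool,
			// An explicit minimum version is the usual way to satisfy
			// linters that flag weak TLS settings.
			MinVersion: tls.VersionTLS12,
		},
	}
	return &http.Client{Transport: transport}, nil
}

func main() {
	// Invalid input is rejected instead of silently trusting nothing.
	_, err := newHTTPClientWithCA([]byte("not a certificate"))
	fmt.Println("error for invalid CA:", err != nil)
}
```

Accepting a directory, a file, or a string literal, as the commit describes, would be a matter of resolving the input to PEM bytes before calling a helper like this one.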
Dexter Yan
0d21eed5f8 fix(support): add missing host collectors for ParseSupportBundle (#1656)
* fix(support): add missing host collectors for ParseSupportBundle

* update

* add host analyzers
2024-10-22 13:07:44 +13:00
Diamon Wiggins
b88bc8ddf7 Refactor Multi Node Analyzers (#1646)
* initial refactor of host os analyzer

* refactor remote collect analysis

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
Co-authored-by: Gerard Nguyen <gerard@replicated.com>
Co-authored-by: Evans Mungai <evans@replicated.com>
2024-10-22 10:45:50 +13:00
Evans Mungai
9c24ab6067 chore: Remove preempted deprecation warnings (#1655)
Signed-off-by: Evans Mungai <evans@replicated.com>
2024-10-22 08:35:36 +11:00
Gerard Nguyen
289102f16d bug: fix nil check in host collector filter (#1653)
* add nil check in filter host collector
2024-10-18 15:58:32 +11:00
Gerard Nguyen
ffa1c040e2 fix: [sc-111255] CRD analyzer outcomes has no Warn field (#1647)
add warn field to CRD analyzer
2024-10-14 14:36:42 +11:00
Evans Mungai
0113624352 chore(support-bundle): respect using load-cluster-specs=false (#1634)
* fix: Allow using load-cluster-specs=false

Signed-off-by: Evans Mungai <evans@replicated.com>

* Some more simplification

Signed-off-by: Evans Mungai <evans@replicated.com>

* Ensure error in loading specs is printed in CLI

Signed-off-by: Evans Mungai <evans@replicated.com>

* Run linter

Signed-off-by: Evans Mungai <evans@replicated.com>

* Fix failing tests

Signed-off-by: Evans Mungai <evans@replicated.com>

* Remove unnecessary test case rename

Signed-off-by: Evans Mungai <evans@replicated.com>

* Fix error wrapping

Signed-off-by: Evans Mungai <evans@replicated.com>

* Check if load-cluster-specs was provided in cli

Signed-off-by: Evans Mungai <evans@replicated.com>

* Better wording in comments

Signed-off-by: Evans Mungai <evans@replicated.com>

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-10-11 13:48:32 -04:00
Shubhag Saxena
52efd167ad feat: allow users to check cpu arch (#1644)
2024-10-10 18:59:22 +05:30
Diamon Wiggins
8105fa00e9 Refactor Remote Host Collection (#1633)
* refactor remote collectors

* add remotecollect params struct

* remove commented checkrbac function

* removed unused function

* add temp comments

* refactor to not require RemoteCollect method per collector

* removed unneeded param

* removed unneeded param

* more refactor

* more refactor

* remove unneeded function

* remove debug print

* fix analyzer results

* move rbac to separate file

* be more specific with rbac function name

* fix imports

* fix node list file

* make k8s rest client config consistent with in cluster collection

* add ctx and otel tracing

* add test for allCollectedData

* move runHostCollectorsInPod to spec instead of metadata

* make generate

* fix broken references to supportbundle metadata

* add e2e tests

* update loader tests

* fix tests

* fix hostos remote collector spec

* update remoteHostCollectrs.yaml

---------

Co-authored-by: Dexter Yan <yanshaocong@gmail.com>
2024-10-09 18:38:49 +13:00
Evans Mungai
0240a632c9 chore: Collect endpointslices resources (#1636)
Signed-off-by: Evans Mungai <evans@replicated.com>
2024-10-03 14:55:53 +01:00
Ash
f58f02560f Allow goldpinger / goldpinger util images to be set in collector spec (#1635)
* Add image parameter to the goldpinger collector

* Pass image directly as a function arg

Also allow util image to be set in spec

* Remove pointless util image override

* Update pkg/collect/goldpinger.go

Co-authored-by: Evans Mungai <evans@replicated.com>

* Simplify image override

---------

Co-authored-by: Evans Mungai <evans@replicated.com>
2024-10-03 13:18:36 +01:00
Ricardo Maraschini
2efbc20b7c feat: allow users to check cpu flags (#1631)
allow users to check if specific cpu flags are supported by the host.

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: ec-cluster-preflight
spec:
  collectors:
  - cpu: {}
  analyzers:
  - cpu:
      checkName: CPU
      outcomes:
        - pass:
            when: hasFlags cmov,cx8,fpu,fxsr,mmx
            message: CPU supports all required flags
        - fail:
            message: CPU not supported
```
2024-10-01 10:48:25 +02:00
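To make the `hasFlags` condition in the spec above concrete, here is a rough sketch of how such a check could be evaluated against `/proc/cpuinfo`-style text. It is an illustration only. The function names and the parsing approach are assumptions, not troubleshoot's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCPUFlags extracts the flag set from the first "flags" line of
// /proc/cpuinfo-style text. Hypothetical helper for illustration.
func parseCPUFlags(cpuinfo string) map[string]bool {
	flags := map[string]bool{}
	for _, line := range strings.Split(cpuinfo, "\n") {
		key, value, found := strings.Cut(line, ":")
		if !found || strings.TrimSpace(key) != "flags" {
			continue
		}
		for _, f := range strings.Fields(value) {
			flags[f] = true
		}
		break // all cores normally report the same flags
	}
	return flags
}

// hasFlags reports whether every flag in the comma-separated list is present.
func hasFlags(available map[string]bool, required string) bool {
	for _, f := range strings.Split(required, ",") {
		f = strings.TrimSpace(f)
		if f == "" {
			continue
		}
		if !available[f] {
			return false
		}
	}
	return true
}

func main() {
	sample := "processor\t: 0\nflags\t\t: fpu vme cmov cx8 fxsr mmx\n"
	flags := parseCPUFlags(sample)

	fmt.Println(hasFlags(flags, "cmov,cx8,fpu,fxsr,mmx"))
	fmt.Println(hasFlags(flags, "cmov,avx2"))
}
```

With the sample input, the first check passes because all five flags are listed, and the second fails because `avx2` is absent.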
Gerard Nguyen
c1c4b612a4 feat: [sc-113128] Create node list file before running remote host collector (#1632)
* create node list
2024-10-01 14:43:24 +10:00
Gerard Nguyen
d60f9a6b76 change resolvedFromSearch content (#1629)
2024-09-30 11:22:10 +10:00
Ricardo Maraschini
668b7ed0b2 feat: add CPU micro architecture support (#1628)
Allows troubleshoot to collect and analyze the CPU micro architecture. This
is a usage example:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: ec-cluster-preflight
spec:
  collectors:
  - cpu: {}
  analyzers:
  - cpu:
      checkName: CPU
      outcomes:
        - pass:
            when: 'supports x86-64-v2'
            message: CPU supports x86-64-v2
        - fail:
            message: CPU does not support x86-64-v2
```
2024-09-27 17:16:49 +02:00
Evans Mungai
2bb611cda1 bug: Remove duplicate results in preflights (#1626)
Stop re-analysing preflight results when uploadResultsTo is present, which led to duplicate results

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-09-26 15:25:39 +01:00
Dexter Yan
142015cce3 feat(analyzer): enable host os info analyzer to support multiple nodes (#1618)
2024-09-26 10:25:08 +12:00
Evans Mungai
83f02f4705 feat: Install goldpinger daemonset if one does not exist when running goldpinger collector (#1619)
* feat: Install goldpinger if one does not exist when running goldpinger collector

- Deploy goldpinger daemonset if one is not detected in the cluster
- Clean up all deployed resources
- Add delay to allow users to wait for goldpinger to perform checks

Signed-off-by: Evans Mungai <evans@replicated.com>

* Add missing test data file

Signed-off-by: Evans Mungai <evans@replicated.com>

* Better naming of create resource functions

Signed-off-by: Evans Mungai <evans@replicated.com>

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-09-24 17:17:14 +01:00
Dexter Yan
e97b9613a5 feat(support-bundle): add runHostCollectorsInPod in spec (#1608)
2024-09-20 11:57:58 -05:00
Gerard Nguyen
8823f7d99e feat: host collector for DNS (#1617)
* add struct for host dns collector

* add miekg/dns

* add more logs

* nit

* new field names

* use Hostnames instead of Names

* misc update

* make schemas

* no error when there is no resolv.conf

* query all searches

* add summary.json file

* merge summary into result file

* query AAAA and CNAME as well

* update schema for hostnames to be required
2024-09-20 08:13:57 +10:00
Evans Mungai
aea4f7c87c feat: Optionally save preflight bundles to disk (#1612)
* feat: Optionally save preflight bundles to disk

Signed-off-by: Evans Mungai <evans@replicated.com>

* Add e2e test of saving preflight bundle

Signed-off-by: Evans Mungai <evans@replicated.com>

* Update cli docs

Signed-off-by: Evans Mungai <evans@replicated.com>

* Expose GetVersionFile function publicly

Signed-off-by: Evans Mungai <evans@replicated.com>

* Store analysis.json file in preflight bundle

Signed-off-by: Evans Mungai <evans@replicated.com>

* Run go fmt when running lint fixers

Signed-off-by: Evans Mungai <evans@replicated.com>

* Always generate a preflight bundle in CLI

Signed-off-by: Evans Mungai <evans@replicated.com>

* Print saving bundle message to stderr

Signed-off-by: Evans Mungai <evans@replicated.com>

* Revert changes in docs directory

Signed-off-by: Evans Mungai <evans@replicated.com>

* Use NewResult constructor

Signed-off-by: Evans Mungai <evans@replicated.com>

* Log always when preflight bundle is saved to disk

Signed-off-by: Evans Mungai <evans@replicated.com>

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-09-16 23:36:52 +01:00
Gerard Nguyen
05dcae2388 fix: [sc-112114] registry collector failed to talk to Replicated private registry (#1613)
decode auth for registry secret
2024-09-13 15:03:52 +01:00
Gerard Nguyen
7484b10914 feat: [sc-110727] troubleshoot: collector/analyzer for wildcard dns (#1606)
* store DNS collector in JSON output for analyze later

* fix incorrect path

* configurable dns image

* make non resolvable domain configurable

* nit update address field

* update dns util image
* add unit test
2024-09-11 14:35:30 +10:00
Dexter Yan
0a2c9c74ab feat(analyzer): allow templating for Node Resources Analyzer (#1605)
* feat(analyzer): allow templating for Node Resources Analyzer
2024-09-02 09:42:40 +12:00
Ethan Mosbaugh
1b1efa133e feat(fio): add option to disable runtime (#1601)
2024-08-22 16:47:08 -07:00
Evans Mungai
ff31f5af0b Log when analysers fail to match any outcome conditions (#1597)
Signed-off-by: Evans Mungai <evans@replicated.com>
2024-08-20 10:52:28 +01:00
Diamon Wiggins
fa14616009 Log non-existent analyzers instead of adding to analyzer results (#1593)
log non-existent analyzers at debug level instead of adding them to analyzer results
2024-08-14 15:34:36 -04:00
Gerard Nguyen
47656a8e6f feat: etcd collector (#1589)
* new schema for etcd collector

* add placeholder

* wip

* get supported distribution

* add exec implementation

* wait for etcd pod to be ready

* misc

* update k0s etcd certs path

* fix unit tests

* address code reviews

* update from code review

* add etcdctl version
2024-08-13 08:42:26 +10:00
Gerard Nguyen
60263caf78 feat: [sc-108732] Can't add annotations in pods executed with the runPod collector (#1590)
add new field annotations for run pod collector
2024-08-08 10:13:36 +10:00
Gerard Nguyen
a57171a918 feat: [sc-108689] troubleshoot: journald collector (#1586)
* add schema for Journald Host Collector

* implement journald host collector

* update host collector

* add --no-pager
2024-07-29 09:44:43 +10:00
Evans Mungai
01d5804977 feat: cgroups host collector (#1581)
Linux control groups host collector that detects whether the specified mountPoint is a cgroup filesystem and which version it is. The collector also collects information about the configured cgroup controllers.

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-07-24 16:46:04 +01:00
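One common way to detect whether a mount point is a cgroup filesystem, and which version, is to compare the filesystem magic number returned by `statfs` against the kernel's constants. The sketch below assumes that technique. The commit does not state how the collector actually does it, and the function names are hypothetical. The `statfs` call is only meaningful on Linux.

```go
package main

import (
	"fmt"
	"syscall"
)

// Filesystem magic numbers from the Linux kernel's linux/magic.h.
const (
	cgroup2SuperMagic = 0x63677270 // CGROUP2_SUPER_MAGIC
	cgroupSuperMagic  = 0x27e0eb   // CGROUP_SUPER_MAGIC
)

// cgroupVersionFromMagic maps a filesystem magic number to "v2", "v1",
// or "none" when the filesystem is not a cgroup filesystem.
func cgroupVersionFromMagic(magic int64) string {
	switch magic {
	case cgroup2SuperMagic:
		return "v2"
	case cgroupSuperMagic:
		return "v1"
	default:
		return "none"
	}
}

// detectCgroupVersion is a hypothetical helper that inspects a mount point.
// On cgroup v1 hosts the hierarchy root is often a tmpfs with one cgroup
// mount per controller beneath it, so the root itself would report "none".
func detectCgroupVersion(mountPoint string) (string, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(mountPoint, &st); err != nil {
		return "", err
	}
	// The width of Type differs by architecture, so convert explicitly.
	return cgroupVersionFromMagic(int64(st.Type)), nil
}

func main() {
	version, err := detectCgroupVersion("/sys/fs/cgroup")
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	fmt.Println("cgroup version:", version)
}
```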
Evans Mungai
1444c01725 feat: json compare host analyser (#1582)
* feat: json compare host analyser

Signed-off-by: Evans Mungai <evans@replicated.com>

* Add missing json compare host analyser file

Signed-off-by: Evans Mungai <evans@replicated.com>

* Generate schemas

Signed-off-by: Evans Mungai <evans@replicated.com>

* Fix failing tests

Signed-off-by: Evans Mungai <evans@replicated.com>

* Ensure json compare analyser always has a title

Signed-off-by: Evans Mungai <evans@replicated.com>

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-07-24 14:27:20 +01:00
Evans Mungai
0020c1129e feat: Allow checking kernel versions only in host os analyzer (#1585)
* Allow checking kernel versions only in host os analyzer

Signed-off-by: Evans Mungai <evans@replicated.com>

* Minor fix in logic

Signed-off-by: Evans Mungai <evans@replicated.com>

* Fix formatting

Signed-off-by: Evans Mungai <evans@replicated.com>

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-07-24 07:04:59 +01:00
Gerard Nguyen
04e656a0a5 fix: [sc-106256] Add missing uri field to troubleshoot.sh types (#1578)
* new no-uri flag for preflight
* implement load additional spec from URIs
2024-07-19 08:23:55 +10:00
Gerard Nguyen
4e999d6bfb fix: [sc-106256] Add missing uri field to troubleshoot.sh types (#1574)
add uri field for top level type
2024-07-15 13:07:04 +10:00
Gerard Nguyen
8173759e52 feat: [sc-106927] Allow kernelConfig analyser to check kernel capability is either built in or loaded for EC host preflights (#1572)
* allow multiple value in kernel config check

* update unit test
2024-07-09 09:45:42 +10:00
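In Linux kernel config files an option set to `y` is built in and an option set to `m` is built as a module, so accepting multiple values lets one check cover both cases. The following sketch shows that idea under the assumption that the config is available as plain `NAME=value` text. The helper names are hypothetical and do not reflect the analyser's real code or spec syntax.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseKernelConfig reads "CONFIG_NAME=value" lines, skipping comments
// (including "# CONFIG_X is not set") and blank lines.
func parseKernelConfig(text string) map[string]string {
	configs := map[string]string{}
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if name, value, ok := strings.Cut(line, "="); ok {
			configs[name] = value
		}
	}
	return configs
}

// configMatchesAny reports whether the named option is set to one of the
// accepted values, for example "y" (built in) or "m" (module).
func configMatchesAny(configs map[string]string, name string, accepted ...string) bool {
	value, ok := configs[name]
	if !ok {
		return false
	}
	for _, a := range accepted {
		if value == a {
			return true
		}
	}
	return false
}

func main() {
	sample := "# sample config\nCONFIG_OVERLAY_FS=m\nCONFIG_CGROUPS=y\n# CONFIG_FOO is not set\n"
	configs := parseKernelConfig(sample)

	fmt.Println(configMatchesAny(configs, "CONFIG_OVERLAY_FS", "y", "m"))
	fmt.Println(configMatchesAny(configs, "CONFIG_OVERLAY_FS", "y"))
}
```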
Gerard Nguyen
f5f02f5a80 fix: [sc-107456] exec collector is running in all pods matched the selector (#1571)
only exec in 1 pod
2024-07-08 09:31:29 +10:00
Gerard Nguyen
e882f44ae9 Gerard/sc 106216/b registry image collector (#1570)
* update registry auth with username and password

* add unit test
2024-07-03 08:06:37 +10:00
Gerard Nguyen
edfa01c5c4 feat: [sc-106625] http analyzer for in-cluster (#1566)
* http analyzer for in-cluster
* make check-schemas
2024-06-19 12:14:37 +10:00
Evans Mungai
ce155270c8 fix: Use correct cron job kind when discovering API versions (#1554)
* fix: Use correct cron job kind when discovering API versions

* Fix failing e2e test
2024-05-31 07:23:56 +01:00
Gerard Nguyen
80e5fac07c feat: New host collector and analyzer for Kernel Configs (#1546)
* new struct and update schemas

* implement Collect function

* add kernel config to collector struct

* generate kernel config analyzer schema

* implement kernel config analyzer

* fail on no match in pass outcome

* run make check-schemas

* fix failed unit test

* update from code review

* add selectedConfigs field

* run make check-schemas
2024-05-27 09:55:39 +10:00
Dexter Yan
51c07b42c3 feat(analyzer): let cluster resource case insensitive to fix name inconsistent (#1547)
* feat(analyzer): let cluster resource case insensitive
2024-05-22 11:37:39 +12:00
Evans Mungai
78bbea18ac feat: Prefer embedded-cluster over k0s when detecting distro (#1544)
* feat: Prefer embedded-cluster over k0s when detecting distro

* Implement check for embedded cluster detection
2024-05-15 09:26:03 +01:00
Gerard Nguyen
6b368f2221 feat: [sc-103754] Be able to detect search domain misconfiguration #1391 (#1534)
* new collector dns

* implement DNS collector

* add dns service and endpoints check

* add nil check on retrieve endpoints
2024-05-01 07:04:20 +10:00
Dexter Yan
cb5db1733a feat(analyzer): make sure ReplicaSetStatus has valid result (#1538)
* feat(analyzer): make sure ReplicaSetStatus has valid result
---------

Co-authored-by: Gerard Nguyen <gerard@replicated.com>
2024-05-01 09:02:45 +12:00