761 Commits

Author SHA1 Message Date
Miguel Varela Ramos
8e2647077d feat: add support for matchExpressions when filtering for nodes (#1697)
* feat: add support for matchExpressions when filtering for nodes

* fix: make generate
2024-11-30 23:15:26 +11:00
Ash
ecc92b1e3e [bug] Quick fix for handling non 200 status codes when loading specs from URI (#1695)
* Quick fix for handling non 200 status codes when loading specs from URI

Go http client already handles 3xx responses for us

* note
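
A minimal sketch of the kind of check this commit describes, assuming a hypothetical helper named `loadSpecFromURI` (the project's real function name and error text are not shown in this log). Because Go's `http.Client` follows redirects by default, only the status of the final response needs checking:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// loadSpecFromURI is a hypothetical helper, not the project's actual code.
// It rejects any response whose final status is not 200. Redirects need no
// special case because http.Client follows them before returning.
func loadSpecFromURI(client *http.Client, uri string) ([]byte, error) {
	resp, err := client.Get(uri)
	if err != nil {
		return nil, fmt.Errorf("fetching %s: %w", uri, err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fetching %s: unexpected status %d", uri, resp.StatusCode)
	}
	return io.ReadAll(resp.Body)
}

// fetchWithStatus starts a throwaway local server that always answers with
// the given status code, then tries to load a spec from it.
func fetchWithStatus(status int) ([]byte, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
		fmt.Fprint(w, "apiVersion: troubleshoot.sh/v1beta2")
	}))
	defer srv.Close()
	return loadSpecFromURI(srv.Client(), srv.URL)
}

func main() {
	_, err := fetchWithStatus(http.StatusNotFound)
	fmt.Println("404 result:", err)

	body, err := fetchWithStatus(http.StatusOK)
	fmt.Println("200 result:", string(body), err)
}
```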
2024-11-25 15:04:38 +00:00
Ricardo Maraschini
9f5f0633cf feat: rename templating variables (#1693)
when templating the output of the namespace connectivity check we were
referring to the 'fromCIDR' as 'fromNamespace'. it makes way more sense
to refer to it as 'fromCIDR' as this is how it is provided in the input
for the collector.

as this is a brand new feature it is very unlikely that anyone is using
this feature (except for the embedded cluster that still needs to be
patched accordingly).

this is how the analyser was defined before:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
    name: ec-cluster-preflight
spec:
    analyzers:
        - networkNamespaceConnectivity:
            collectorName: check-network-connectivity
            outcomes:
            - pass:
                message: "Communication between {{ .FromNamespace }} and {{ .ToNamespace }} is working"
            - fail:
                message: "{{ .ErrorMessage }}"
```

and this is how it is now:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
    name: ec-cluster-preflight
spec:
    analyzers:
        - networkNamespaceConnectivity:
            collectorName: check-network-connectivity
            outcomes:
            - pass:
                message: "Communication between {{ .FromCIDR }} and {{ .ToCIDR }} is working"
            - fail:
                message: "{{ .ErrorMessage }}"
```
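
a rough sketch of how such a message could be rendered with Go's `text/template`. the `outcomeData` struct and `renderMessage` helper are hypothetical; only the field names `FromCIDR`, `ToCIDR` and `ErrorMessage` come from this commit. in this sketch, a spec that still uses the old `.FromNamespace` name fails at render time, which illustrates why existing specs need patching.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// outcomeData is a hypothetical struct for illustration; the real type used
// by the analyser is not shown in this commit message.
type outcomeData struct {
	FromCIDR     string
	ToCIDR       string
	ErrorMessage string
}

// renderMessage parses an outcome message as a Go template and executes it
// against the given data.
func renderMessage(msg string, data outcomeData) (string, error) {
	tmpl, err := template.New("outcome").Parse(msg)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	data := outcomeData{FromCIDR: "10.0.0.0/24", ToCIDR: "10.0.1.0/24"}

	msg, err := renderMessage("Communication between {{ .FromCIDR }} and {{ .ToCIDR }} is working", data)
	fmt.Println(msg, err)

	// The old variable name is not a field of outcomeData, so executing
	// the template returns an error.
	_, err = renderMessage("Communication between {{ .FromNamespace }} and {{ .ToNamespace }} is working", data)
	fmt.Println(err != nil)
}
```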
2024-11-21 16:03:50 +01:00
Dexter Yan
6167fd8a5e fix(collector): fix dns collector limited to 63 chars (#1690) 2024-11-19 17:47:24 +13:00
Gerard Nguyen
7bb88e6b83 feat: ensure Copy collector run last (#1688)
* ensure Copy collector run last

* * add unit test
* reorder in Preflight as well
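
One way to express "Copy runs last" is a stable sort that moves Copy collectors to the end while keeping every other collector in its original order. This is only a sketch: the `collector` type, its `Kind` field and the `copyLast` function are hypothetical and do not reflect the project's actual types.

```go
package main

import (
	"fmt"
	"sort"
)

// collector is a hypothetical stand-in for a collector spec entry.
type collector struct {
	Kind string
}

// copyLast returns a reordered copy of cs in which all "copy" collectors
// come after every other collector. The sort is stable, so the relative
// order within each group is preserved.
func copyLast(cs []collector) []collector {
	out := append([]collector(nil), cs...)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Kind != "copy" && out[j].Kind == "copy"
	})
	return out
}

func main() {
	in := []collector{{"cpu"}, {"copy"}, {"memory"}, {"run"}}
	for _, c := range copyLast(in) {
		fmt.Println(c.Kind)
	}
}
```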
2024-11-15 10:59:38 +11:00
Dexter Yan
1a828fa90b fix(analyzer): add missing warning in outcome (#1687) 2024-11-13 16:32:54 +13:00
Ash
deeeea7cec exec remote host collectors in a daemonset (#1671)
Co-authored-by: Gerard Nguyen <gerard@replicated.com>
Co-authored-by: Dexter Yan <yanshaocong@gmail.com>
2024-11-12 08:47:24 +13:00
João Antunes
197f6de425 feat(host_analyzer): add host sysctl analyzer (#1681)
* feat(host_analyzer): add host sysctl analyzer

* chore: add e2e tests to support bundle collection

* chore: missing spec e2e test update

* chore: cleanup remote collector and use parse operator

* chore: update schemas
2024-11-08 18:55:24 +00:00
Evans Mungai
d25aa7d0ea fix: Do not fail analysis if node list does not exist (#1678)
* fix: Do not error if node list does not exist

Signed-off-by: Evans Mungai <evans@replicated.com>

* fix test fail

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
Co-authored-by: Dexter Yan <yanshaocong@gmail.com>
2024-11-08 09:53:03 +13:00
João Antunes
77c9968ff6 feat(host_sysctl): add host sysctl collector (#1676)
* feat(host_sysctl): add host sysctl collector

* chore: add examples

* Update pkg/collect/host_sysctl.go

Co-authored-by: Evans Mungai <evans@replicated.com>

* chore: use sysctl package vs exec calls

* chore: make linter happy

* chore: make schemas

* chore: go back to sysctl exec

* chore: make linter happy

---------

Co-authored-by: Evans Mungai <evans@replicated.com>
2024-11-07 18:18:11 +00:00
Diamon Wiggins
06506ed95d Fix remote host collection RBAC checks (#1672)
* fix remote host collection rbac checks

* move saveNodeList into collectRemoteHost function

* fix resource attribute list and retrieve namespace from kubeconfig

* revert change to set a default namespace from kubeconfig

* remove duplicate code
2024-11-07 10:07:27 -05:00
Ricardo Maraschini
e272683bce feat: implement collector and analyser for network namespace connectivity (#1670)
* feat: implement collector and analyser for network namespace connectivity

checks if two network namespaces can talk to each other on udp and tcp.
its usage is as follows:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: test
spec:
  hostCollectors:
  - networkNamespaceConnectivity:
      collectorName: check-network-connectivity
      fromCIDR: 10.0.0.0/24
      toCIDR: 10.0.1.0/24
  hostAnalyzers:
  - networkNamespaceConnectivity:
      collectorName: check-network-connectivity
      outcomes:
      - pass:
          message: "Communication between 10.0.0.0/24 and 10.0.1.0/24 is working"
      - fail:
          message: "Communication between 10.0.0.0/24 and 10.0.1.0/24 isn't working"
```

if this fails then you may need to enable `forwarding` with:

```bash
sysctl -w net.ipv4.ip_forward=1
```

if it still fails then you may need to configure firewalld to allow the
traffic or simply disable it for the sake of testing.

* chore: rebuild schemas

* chore: remove unused property

* chore: disable namespaces for other platforms

* chore: make sure we timeout temporary servers

* feat: analyzer now supports multi-node collection

* feat: check both udp and tcp even on failure

check both protocols even if one fails. this commit also introduces a
timeout that can be set by the user.

* feat: add templating to the failure outcome

allow users to dump the errors found during the analysis.

* chore: addressing pr comments

* feat: delete interface pair before namespace

even though, in my tests, the interface pair is deleted every time we
delete the namespace, we had better delete it explicitly before we
delete the namespace.

this comes out of a review comment where some people seem to still be
able to see the interface pair even after the namespace is deleted.

i.e. better safe than sorry.

* chore: fix typo on comment
2024-11-06 11:30:13 +01:00
Ash
ea900a1881 chore: Refactor host cpu analyzer for remote collection (#1664)
* Refactor host cpu analyzer for remote collection

---------

Co-authored-by: Gerard Nguyen <gerard@replicated.com>
2024-11-06 14:43:27 +11:00
Gerard Nguyen
f0b8de68ae feat: multiple nodes analyzers (#1667)
* implement refactor for multiple node analyzers

---------

Co-authored-by: Diamon Wiggins <38189728+diamonwiggins@users.noreply.github.com>
2024-11-04 14:17:39 +11:00
Ash
544a700062 [sc-114813] copy HostCollector fails to copy binary files when run in cluster (#1669)
* Don't convert output bytes to string

This prevents binary files from getting mangled when the collector output is passed around between functions

* Update pkg/collect/runner.go

Co-authored-by: Evans Mungai <evans@replicated.com>

* organise imports

---------

Co-authored-by: Evans Mungai <evans@replicated.com>
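
The commit does not say where the mangling happened. One possible mechanism in Go, shown here purely as an assumption, is JSON serialisation: `encoding/json` replaces invalid UTF-8 in a `string` with U+FFFD, whereas a `[]byte` field is base64-encoded and round-trips unchanged. The types and function names below are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
)

// Hypothetical result types: one carries output as a string, one as bytes.
type stringResult struct{ Data string }
type bytesResult struct{ Data []byte }

// viaString round-trips the payload through JSON as a string field.
func viaString(in []byte) ([]byte, error) {
	enc, err := json.Marshal(stringResult{Data: string(in)})
	if err != nil {
		return nil, err
	}
	var out stringResult
	if err := json.Unmarshal(enc, &out); err != nil {
		return nil, err
	}
	return []byte(out.Data), nil
}

// viaBytes round-trips the payload through JSON as a []byte field,
// which encoding/json stores as base64.
func viaBytes(in []byte) ([]byte, error) {
	enc, err := json.Marshal(bytesResult{Data: in})
	if err != nil {
		return nil, err
	}
	var out bytesResult
	if err := json.Unmarshal(enc, &out); err != nil {
		return nil, err
	}
	return out.Data, nil
}

func main() {
	// A few bytes that are not valid UTF-8, as found in binary files.
	binary := []byte{0x7f, 'E', 'L', 'F', 0xff, 0xfe, 0x00}

	s, err := viaString(binary)
	if err != nil {
		log.Fatal(err)
	}
	b, err := viaBytes(binary)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("string round trip intact:", bytes.Equal(binary, s))
	fmt.Println("bytes round trip intact: ", bytes.Equal(binary, b))
}
```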
2024-10-31 10:44:35 +00:00
Dexter Yan
059b5d14d2 fix(collector): limit run pod collector to delete only one related secret (#1668)
* fix(collector): limit run pod collector to delete only related secret

* change to ctx
2024-10-30 14:19:30 +00:00
Evans Mungai
deda4ce98c feat: Do not prompt users to save support bundle analysis results (#1662)
In interactive mode, do not prompt users to save support
bundle analysis results. Users end up providing this file
instead of the support bundle archive. The analysis results
are contained in the support bundle archive already

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-10-25 13:03:16 +01:00
Dexter Yan
350418c6e9 feat(host-collector): add progress for host collector (#1659) 2024-10-25 15:34:09 +13:00
ada mancini
eacff7112f support adding a CA cert to http collector (#1624)
* add a TLS parameter for cacert

* pass a ca cert into http request

* test preflight

* make schemas

* log extra information from http request

* pass a proxy into the collector spec

* hitting a segfault; breakpoint

* accept a dir, file, or a string-literal as CA

* move tls params into get, put, post methods

* test for cert untrusted response

* make generate

* make schemas

* more test cases

* make schemas

* dont include system certs

* make generate && make schemas

* resolve gosec G402 warning

* remove old check for system certs

* ignore errcheck "return value not checked" linter errors
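
A sketch of an HTTP client that trusts only a supplied CA, in the spirit of the "dont include system certs" step above. The function name `clientWithCA` is hypothetical, and the sketch handles only a PEM string; resolving a directory or file path is left out. Setting `MinVersion` is one common way to satisfy gosec's G402 check, though the commit does not say how the warning was actually resolved.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
)

// clientWithCA is a hypothetical helper. It builds a client whose trust
// store contains only the certificates in caPEM; starting from an empty
// pool means the system roots are not included.
func clientWithCA(caPEM []byte) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no valid PEM certificates found in CA data")
	}
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				RootCAs:    pool,
				MinVersion: tls.VersionTLS12,
			},
		},
	}, nil
}

func main() {
	// Invalid input is rejected instead of silently yielding an empty pool.
	_, err := clientWithCA([]byte("not a certificate"))
	fmt.Println(err)
}
```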
2024-10-23 18:15:08 -04:00
Dexter Yan
0d21eed5f8 fix(support): add missing host collectors for ParseSupportBundle (#1656)
* fix(support): add missing host collectors for ParseSupportBundle

* update

* add host analyzers
2024-10-22 13:07:44 +13:00
Diamon Wiggins
b88bc8ddf7 Refactor Multi Node Analyzers (#1646)
* initial refactor of host os analyzer

* refactor remote collect analysis

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
Co-authored-by: Gerard Nguyen <gerard@replicated.com>
Co-authored-by: Evans Mungai <evans@replicated.com>
2024-10-22 10:45:50 +13:00
Evans Mungai
9c24ab6067 chore: Remove preempted deprecation warnings (#1655)
Signed-off-by: Evans Mungai <evans@replicated.com>
2024-10-22 08:35:36 +11:00
Gerard Nguyen
289102f16d bug: fix nil check in host collector filter (#1653)
* add nil check in filter host collector
2024-10-18 15:58:32 +11:00
Gerard Nguyen
ffa1c040e2 fix: [sc-111255] CRD analyzer outcomes has no Warn field (#1647)
add warn field to CRD analyzer
2024-10-14 14:36:42 +11:00
Evans Mungai
0113624352 chore(support-bundle): respect using load-cluster-specs=false (#1634)
* fix: Allow using load-cluster-specs=false

Signed-off-by: Evans Mungai <evans@replicated.com>

* Some more simplification

Signed-off-by: Evans Mungai <evans@replicated.com>

* Ensure error in loading specs is printed in CLI

Signed-off-by: Evans Mungai <evans@replicated.com>

* Run linter

Signed-off-by: Evans Mungai <evans@replicated.com>

* Fix failing tests

Signed-off-by: Evans Mungai <evans@replicated.com>

* Remove unnecessary test case rename

Signed-off-by: Evans Mungai <evans@replicated.com>

* Fix error wrapping

Signed-off-by: Evans Mungai <evans@replicated.com>

* Check if load-cluster-specs was provided in cli

Signed-off-by: Evans Mungai <evans@replicated.com>

* Better wording in comments

Signed-off-by: Evans Mungai <evans@replicated.com>

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-10-11 13:48:32 -04:00
Shubhag Saxena
52efd167ad feat: allow users to check cpu arch (#1644) 2024-10-10 18:59:22 +05:30
Diamon Wiggins
8105fa00e9 Refactor Remote Host Collection (#1633)
* refactor remote collectors

* add remotecollect params struct

* remove commented checkrbac function

* removed unused function

* add temp comments

* refactor to not require RemoteCollect method per collector

* removed unneeded param

* removed unneeded param

* more refactor

* more refactor

* remove unneeded function

* remove debug print

* fix analyzer results

* move rbac to separate file

* be more specific with rbac function name

* fix imports

* fix node list file

* make k8s rest client config consistent with in cluster collection

* add ctx and otel tracing

* add test for allCollectedData

* move runHostCollectorsInPod to spec instead of metadata

* make generate

* fix broken references to supportbundle metadata

* add e2e tests

* update loader tests

* fix tests

* fix hostos remote collector spec

* update remoteHostCollectrs.yaml

---------

Co-authored-by: Dexter Yan <yanshaocong@gmail.com>
2024-10-09 18:38:49 +13:00
Evans Mungai
0240a632c9 chore: Collect endpointslices resources (#1636)
Signed-off-by: Evans Mungai <evans@replicated.com>
2024-10-03 14:55:53 +01:00
Ash
f58f02560f Allow goldpinger / goldpinger util images to be set in collector spec (#1635)
* Add image parameter to the goldpinger collector

* Pass image directly as a function arg

Also allow util image to be set in spec

* Remove pointless util image override

* Update pkg/collect/goldpinger.go

Co-authored-by: Evans Mungai <evans@replicated.com>

* Simplify image override

---------

Co-authored-by: Evans Mungai <evans@replicated.com>
2024-10-03 13:18:36 +01:00
Ricardo Maraschini
2efbc20b7c feat: allow users to check cpu flags (#1631)
allow users to check if specific cpu flags are supported by the host.

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: ec-cluster-preflight
spec:
  collectors:
  - cpu: {}
  analyzers:
  - cpu:
      checkName: CPU
      outcomes:
        - pass:
            when: hasFlags cmov,cx8,fpu,fxsr,mmx
            message: CPU supports all required flags
        - fail:
            message: CPU not supported
```
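
the `hasFlags` condition above boils down to a subset check. a minimal sketch follows, assuming the available flags have already been collected (for example from the `flags` line of `/proc/cpuinfo` on Linux). the function below is illustrative and is not the project's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// hasFlags reports whether every flag in the comma-separated required list
// is present among the available flags.
func hasFlags(available []string, required string) bool {
	set := make(map[string]struct{}, len(available))
	for _, f := range available {
		set[f] = struct{}{}
	}
	for _, f := range strings.Split(required, ",") {
		f = strings.TrimSpace(f)
		if f == "" {
			continue
		}
		if _, ok := set[f]; !ok {
			return false
		}
	}
	return true
}

func main() {
	available := strings.Fields("fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2")

	fmt.Println(hasFlags(available, "cmov,cx8,fpu,fxsr,mmx")) // all present
	fmt.Println(hasFlags(available, "cmov,avx512f"))          // avx512f missing
}
```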
2024-10-01 10:48:25 +02:00
Gerard Nguyen
c1c4b612a4 feat: [sc-113128] Create node list file before running remote host collector (#1632)
* create node list
2024-10-01 14:43:24 +10:00
Gerard Nguyen
d60f9a6b76 change resolvedFromSearch content (#1629) 2024-09-30 11:22:10 +10:00
Ricardo Maraschini
668b7ed0b2 feat: add CPU micro architecture support (#1628)
allows troubleshoot to collect and analyze CPU micro architecture. this
is a usage example:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: ec-cluster-preflight
spec:
  collectors:
  - cpu: {}
  analyzers:
  - cpu:
      checkName: CPU
      outcomes:
        - pass:
            when: 'supports x86-64-v2'
            message: CPU supports x86-64-v2
        - fail:
            message: CPU does not support x86-64-v2
```
2024-09-27 17:16:49 +02:00
Evans Mungai
2bb611cda1 bug: Remove duplicate results in preflights (#1626)
Stop re-analysing preflight results when uploadResultsTo is present, which led to duplicate results

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-09-26 15:25:39 +01:00
Dexter Yan
142015cce3 feat(analyzer): enable host os info analyzer to support multiple nodes (#1618) 2024-09-26 10:25:08 +12:00
Evans Mungai
83f02f4705 feat: Install goldpinger daemonset if one does not exist when running goldpinger collector (#1619)
* feat: Install goldpinger if one does not exist when running goldpinger collector

- Deploy goldpinger daemonset if one is not detected in the cluster
- Clean up all deployed resources
- Add delay to allow users to wait for goldpinger to perform checks

Signed-off-by: Evans Mungai <evans@replicated.com>

* Add missing test data file

Signed-off-by: Evans Mungai <evans@replicated.com>

* Better naming of create resource functions

Signed-off-by: Evans Mungai <evans@replicated.com>

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-09-24 17:17:14 +01:00
Dexter Yan
e97b9613a5 feat(support-bundle): add runHostCollectorsInPod in spec (#1608) 2024-09-20 11:57:58 -05:00
Gerard Nguyen
8823f7d99e feat: host collector for DNS (#1617)
* add struct for host dns collector

* add miekg/dns

* add more logs

* nit

* new field names

* use Hostnames instead of Names

* misc update

* make schemas

* no error when there is no resolv.conf

* query all searches

* add summary.json file

* merge summary into result file

* query AAAA and CNAME as well

* update schema for hostnames to be required
2024-09-20 08:13:57 +10:00
Evans Mungai
aea4f7c87c feat: Optionally save preflight bundles to disk (#1612)
* feat: Optionally save preflight bundles to disk

Signed-off-by: Evans Mungai <evans@replicated.com>

* Add e2e test of saving preflight bundle

Signed-off-by: Evans Mungai <evans@replicated.com>

* Update cli docs

Signed-off-by: Evans Mungai <evans@replicated.com>

* Expose GetVersionFile function publicly

Signed-off-by: Evans Mungai <evans@replicated.com>

* Store analysis.json file in preflight bundle

Signed-off-by: Evans Mungai <evans@replicated.com>

* Run go fmt when running lint fixers

Signed-off-by: Evans Mungai <evans@replicated.com>

* Always generate a preflight bundle in CLI

Signed-off-by: Evans Mungai <evans@replicated.com>

* Print saving bundle message to stderr

Signed-off-by: Evans Mungai <evans@replicated.com>

* Revert changes in docs directory

Signed-off-by: Evans Mungai <evans@replicated.com>

* Use NewResult constructor

Signed-off-by: Evans Mungai <evans@replicated.com>

* Log always when preflight bundle is saved to disk

Signed-off-by: Evans Mungai <evans@replicated.com>

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-09-16 23:36:52 +01:00
Gerard Nguyen
05dcae2388 fix: [sc-112114] registry collector failed to talk to Replicated private registry (#1613)
decode auth for registry secret
2024-09-13 15:03:52 +01:00
Gerard Nguyen
7484b10914 feat: [sc-110727] troubleshoot: collector/analyzer for wildcard dns (#1606)
* store DNS collector in JSON output for analyze later

* fix incorrect path

* configurable dns image

* make non resolvable domain configurable

* nit update address field

* * update dns util image
* add unit test
2024-09-11 14:35:30 +10:00
Dexter Yan
0a2c9c74ab feat(analyzer): allow templating for Node Resources Analyzer (#1605)
* feat(analyzer): allow templating for Node Resources Analyzer
2024-09-02 09:42:40 +12:00
Ethan Mosbaugh
1b1efa133e feat(fio): add option to disable runtime (#1601) 2024-08-22 16:47:08 -07:00
Evans Mungai
ff31f5af0b Log when analysers fail to match any outcome conditions (#1597)
Signed-off-by: Evans Mungai <evans@replicated.com>
2024-08-20 10:52:28 +01:00
Diamon Wiggins
fa14616009 Log non-existent analyzers instead of adding to analyzer results (#1593)
log non-existent analyzers at debug level instead of adding them to the analyzer results
2024-08-14 15:34:36 -04:00
Gerard Nguyen
47656a8e6f feat: etcd collector (#1589)
* new schema for etcd collector

* add placeholder

* wip

* get supported distribution

* add exec implementation

* wait for etcd pod to be ready

* misc

* update k0s etcd certs path

* fix unit tests

* address code reviews

* update from code review

* add etcdctl version
2024-08-13 08:42:26 +10:00
Gerard Nguyen
60263caf78 feat: [sc-108732] Can't add annotations in pods executed with the runPod collector (#1590)
add new field annotations for run pod collector
2024-08-08 10:13:36 +10:00
Gerard Nguyen
a57171a918 feat: [sc-108689] troubleshoot: journald collector (#1586)
* add schema for Journald Host Collector

* implement journald host collector

* update host collector

* add --no-pager
2024-07-29 09:44:43 +10:00
Evans Mungai
01d5804977 feat: cgroups host collector (#1581)
Linux control groups host collector that detects whether the specified mountPoint is a cgroup filesystem and what version it is. The collector also collects information of the configured cgroup controllers.

Signed-off-by: Evans Mungai <evans@replicated.com>
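
A rough illustration of the detection step, assuming the filesystem type is read from a mount table in `/proc/mounts` format. The real collector may use a different mechanism (such as a `statfs` call), and the `cgroupVersion` function is hypothetical. Note that cgroup v1 hierarchies are usually mounted per controller, so a v1 mount point would typically be a path such as `/sys/fs/cgroup/memory`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cgroupVersion scans mount-table text ("device mountpoint fstype options ...")
// and returns "v2", "v1" or "none" for the given mount point. If the mount
// point appears more than once, the last entry wins, since later mounts
// shadow earlier ones.
func cgroupVersion(mounts, mountPoint string) string {
	version := "none"
	sc := bufio.NewScanner(strings.NewReader(mounts))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 3 || fields[1] != mountPoint {
			continue
		}
		switch fields[2] {
		case "cgroup2":
			version = "v2"
		case "cgroup":
			version = "v1"
		default:
			version = "none"
		}
	}
	return version
}

func main() {
	mounts := "proc /proc proc rw 0 0\n" +
		"cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec 0 0\n"

	fmt.Println(cgroupVersion(mounts, "/sys/fs/cgroup"))
	fmt.Println(cgroupVersion(mounts, "/proc"))
}
```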
2024-07-24 16:46:04 +01:00
Evans Mungai
1444c01725 feat: json compare host analyser (#1582)
* feat: json compare host analyser

Signed-off-by: Evans Mungai <evans@replicated.com>

* Add missing json compare host analyser file

Signed-off-by: Evans Mungai <evans@replicated.com>

* Generate schemas

Signed-off-by: Evans Mungai <evans@replicated.com>

* Fix failing tests

Signed-off-by: Evans Mungai <evans@replicated.com>

* Ensure json compare analyser always has a title

Signed-off-by: Evans Mungai <evans@replicated.com>

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-07-24 14:27:20 +01:00