318 Commits

Author SHA1 Message Date
Salah Al Saleh
d5a6b19417 Add a host analyzer to check if a subnet contains an IP address (#1735)
* Add a host collector / analyzer to check if a subnet contains an IP address
2025-02-13 13:16:59 -08:00
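The subnet check named in d5a6b19417 comes down to a CIDR membership test. The following is a minimal sketch of that idea using Go's standard `net/netip` package. The helper name `subnetContains` and the sample addresses are assumptions made for illustration; they are not taken from the troubleshoot source.

```go
package main

import (
	"fmt"
	"net/netip"
)

// subnetContains is a hypothetical helper: it reports whether ip falls
// inside the subnet written in CIDR notation.
func subnetContains(cidr, ip string) (bool, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return false, fmt.Errorf("parse subnet %q: %w", cidr, err)
	}
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false, fmt.Errorf("parse ip %q: %w", ip, err)
	}
	return prefix.Contains(addr), nil
}

func main() {
	// Sample values only; any CIDR and address pair can be used.
	ok, err := subnetContains("10.0.0.0/24", "10.0.0.17")
	fmt.Println(ok, err)
}
```

Returning the parse error separately from the boolean lets a caller distinguish "not in the subnet" from "input was not a valid subnet or address".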
Ash
de791e951c Enable Daemonsets in ClusterResources analyzer (#1729) 2025-02-06 13:55:39 -05:00
Dexter Yan
64ee9e5596 feat(nodeResources): add GPU support (#1708)
* feat(nodeResources): add GPU support

* add resourceCapacity and sum test

* update with make schemas

* Correct test names

Signed-off-by: Evans Mungai <evans@replicated.com>

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
Co-authored-by: Evans Mungai <evans@replicated.com>
2025-01-03 15:11:10 +13:00
Gerard Nguyen
a6fbf144b8 feat: container statuses analyzer (#1698)
* new schema for analyzer ClusterContainerStatuses
2024-12-04 10:36:23 +11:00
Miguel Varela Ramos
8e2647077d feat: add support for matchExpressions when filtering for nodes (#1697)
* feat: add support for matchExpressions when filtering for nodes

* fix: make generate
2024-11-30 23:15:26 +11:00
Dexter Yan
1a828fa90b fix(analyzer): add missing warning in outcome (#1687) 2024-11-13 16:32:54 +13:00
João Antunes
197f6de425 feat(host_analyzer): add host sysctl analyzer (#1681)
* feat(host_analyzer): add host sysctl analyzer

* chore: add e2e tests to support bundle collection

* chore: missing spec e2e test update

* chore: cleanup remote collector and use parse operator

* chore: update schemas
2024-11-08 18:55:24 +00:00
Evans Mungai
d25aa7d0ea fix: Do not fail analysis if node list does not exist (#1678)
* fix: Do not error if node list does not exist

Signed-off-by: Evans Mungai <evans@replicated.com>

* fix test fail

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
Co-authored-by: Dexter Yan <yanshaocong@gmail.com>
2024-11-08 09:53:03 +13:00
Ricardo Maraschini
e272683bce feat: implement collector and analyser for network namespace connectivity (#1670)
* feat: implement collector and analyser for network namespace connectivity

checks if two network namespaces can talk to each other on udp and tcp.
its usage is as follows:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: test
spec:
  hostCollectors:
  - networkNamespaceConnectivity:
      collectorName: check-network-connectivity
      fromCIDR: 10.0.0.0/24
      toCIDR: 10.0.1.0/24
  hostAnalyzers:
  - networkNamespaceConnectivity:
      collectorName: check-network-connectivity
      outcomes:
      - pass:
          message: "Communication between 10.0.0.0/24 and 10.0.1.0/24 is working"
      - fail:
          message: "Communication between 10.0.0.0/24 and 10.0.1.0/24 isn't working"
```

if this fails then you may need to enable `forwarding` with:

```bash
sysctl -w net.ipv4.ip_forward=1
```

if it still fails then you may need to configure firewalld to allow the
traffic or simply disable it for the sake of testing.

* chore: rebuild schemas

* chore: remove unused property

* chore: disable namespaces for other platforms

* chore: make sure we timeout temporary servers

* feat: analyzer now supports multi-node collection

* feat: check both udp and tcp even on failure

check both protocols even if one fails. this commit also introduces a
timeout that can be set by the user.

* feat: add templating to the failure outcome

allow users to dump the errors found during the analysis.

* chore: addressing pr comments

* feat: delete interface pair before namespace

even though the interface pair is deleted every time we delete the
namespace in my tests, it is safer to delete it before we delete the
namespace.

this comes out of a review comment where some people seem to still be
able to see the interface pair even after the namespace is deleted.

i.e. better safe than sorry.

* chore: fix typo on comment
2024-11-06 11:30:13 +01:00
Ash
ea900a1881 chore: Refactor host cpu analyzer for remote collection (#1664)
* Refactor host cpu analyzer for remote collection

---------

Co-authored-by: Gerard Nguyen <gerard@replicated.com>
2024-11-06 14:43:27 +11:00
Gerard Nguyen
f0b8de68ae feat: multiple nodes analyzers (#1667)
* implement refactor for multiple node analyzers

---------

Co-authored-by: Diamon Wiggins <38189728+diamonwiggins@users.noreply.github.com>
2024-11-04 14:17:39 +11:00
Diamon Wiggins
b88bc8ddf7 Refactor Multi Node Analyzers (#1646)
* initial refactor of host os analyzer

* refactor remote collect analysis

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
Co-authored-by: Gerard Nguyen <gerard@replicated.com>
Co-authored-by: Evans Mungai <evans@replicated.com>
2024-10-22 10:45:50 +13:00
Gerard Nguyen
ffa1c040e2 fix: [sc-111255] CRD analyzer outcomes has no Warn field (#1647)
add warn field to CRD analyzer
2024-10-14 14:36:42 +11:00
Shubhag Saxena
52efd167ad feat: allow users to check cpu arch (#1644) 2024-10-10 18:59:22 +05:30
Ricardo Maraschini
2efbc20b7c feat: allow users to check cpu flags (#1631)
allow users to check if specific cpu flags are supported by the host.

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: ec-cluster-preflight
spec:
  collectors:
  - cpu: {}
  analyzers:
  - cpu:
      checkName: CPU
      outcomes:
        - pass:
            when: hasFlags cmov,cx8,fpu,fxsr,mmx
            message: CPU supports all required flags
        - fail:
            message: CPU not supported
```
2024-10-01 10:48:25 +02:00
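The `hasFlags` condition in the example from 2efbc20b7c can be read as "every listed flag must be present on the host". A minimal sketch of that check is shown here. The function name, its signature, and the sample flag list are hypothetical and only illustrate the set-membership logic; the real analyzer may parse and compare flags differently.

```go
package main

import (
	"fmt"
	"strings"
)

// hasFlags is a hypothetical helper: it reports whether every flag in the
// comma-separated list `required` appears in `available`.
func hasFlags(available []string, required string) bool {
	// Build a set of the flags the host reports.
	set := make(map[string]struct{}, len(available))
	for _, f := range available {
		set[f] = struct{}{}
	}
	// Every non-empty required flag must be in the set.
	for _, f := range strings.Split(required, ",") {
		f = strings.TrimSpace(f)
		if f == "" {
			continue
		}
		if _, ok := set[f]; !ok {
			return false
		}
	}
	return true
}

func main() {
	// Sample host flags, made up for the example.
	host := []string{"fpu", "cmov", "cx8", "fxsr", "mmx", "sse"}
	fmt.Println(hasFlags(host, "cmov,cx8,fpu,fxsr,mmx"))
}
```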
Gerard Nguyen
c1c4b612a4 feat: [sc-113128] Create node list file before running remote host collector (#1632)
* create node list
2024-10-01 14:43:24 +10:00
Ricardo Maraschini
668b7ed0b2 feat: add CPU micro architecture support (#1628)
allows troubleshoot to collect and analyze CPU micro architecture. this
is a usage example:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: ec-cluster-preflight
spec:
  collectors:
  - cpu: {}
  analyzers:
  - cpu:
      checkName: CPU
      outcomes:
        - pass:
            when: 'supports x86-64-v2'
            message: CPU supports x86-64-v2
        - fail:
            message: CPU does not support x86-64-v2
```
2024-09-27 17:16:49 +02:00
Evans Mungai
2bb611cda1 bug: Remove duplicate results in preflights (#1626)
Stop re-analysing preflight results when uploadResultsTo is present, which led to duplicate results

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-09-26 15:25:39 +01:00
Dexter Yan
142015cce3 feat(analyzer): enable host os info analyzer to support multiple nodes (#1618) 2024-09-26 10:25:08 +12:00
Evans Mungai
83f02f4705 feat: Install goldpinger daemonset if one does not exist when running goldpinger collector (#1619)
* feat: Install goldpinger if one does not exist when running goldpinger collector

- Deploy goldpinger daemonset if one is not detected in the cluster
- Clean up all deployed resources
- Add delay to allow users to wait for goldpinger to perform checks

Signed-off-by: Evans Mungai <evans@replicated.com>

* Add missing test data file

Signed-off-by: Evans Mungai <evans@replicated.com>

* Better naming of create resource functions

Signed-off-by: Evans Mungai <evans@replicated.com>

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-09-24 17:17:14 +01:00
Dexter Yan
0a2c9c74ab feat(analyzer): allow templating for Node Resources Analyzer (#1605)
* feat(analyzer): allow templating for Node Resources Analyzer
2024-09-02 09:42:40 +12:00
Ethan Mosbaugh
1b1efa133e feat(fio): add option to disable runtime (#1601) 2024-08-22 16:47:08 -07:00
Evans Mungai
ff31f5af0b Log when analysers fail to match any outcome conditions (#1597)
Signed-off-by: Evans Mungai <evans@replicated.com>
2024-08-20 10:52:28 +01:00
Diamon Wiggins
fa14616009 Log non-existent analyzers instead of adding to analyzer results (#1593)
log to debug non-existent analyzers instead of adding to analyzer results
2024-08-14 15:34:36 -04:00
Evans Mungai
1444c01725 feat: json compare host analyser (#1582)
* feat: json compare host analyser

Signed-off-by: Evans Mungai <evans@replicated.com>

* Add missing json compare host analyser file

Signed-off-by: Evans Mungai <evans@replicated.com>

* Generate schemas

Signed-off-by: Evans Mungai <evans@replicated.com>

* Fix failing tests

Signed-off-by: Evans Mungai <evans@replicated.com>

* Ensure json compare analyser always has a title

Signed-off-by: Evans Mungai <evans@replicated.com>

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-07-24 14:27:20 +01:00
Evans Mungai
0020c1129e feat: Allow checking kernel versions only in host os analyzer (#1585)
* Allow checking kernel versions only in host os analyzer

Signed-off-by: Evans Mungai <evans@replicated.com>

* Minor fix in logic

Signed-off-by: Evans Mungai <evans@replicated.com>

* Fix formatting

Signed-off-by: Evans Mungai <evans@replicated.com>

---------

Signed-off-by: Evans Mungai <evans@replicated.com>
2024-07-24 07:04:59 +01:00
Gerard Nguyen
8173759e52 feat: [sc-106927] Allow kernelConfig analyser to check kernel capability is either built in or loaded for EC host preflights (#1572)
* allow multiple values in kernel config check

* update unit test
2024-07-09 09:45:42 +10:00
Gerard Nguyen
edfa01c5c4 feat: [sc-106625] http analyzer for in-cluster (#1566)
* http analyzer for in-cluster
* make check-schemas
2024-06-19 12:14:37 +10:00
Gerard Nguyen
80e5fac07c feat: New host collector and analyzer for Kernel Configs (#1546)
* new struct and update schemas

* implement Collect function

* add kernel config to collector struct

* generate kernel config analyzer schema

* implement kernel config analyzer

* fail on no match in pass outcome

* run make check-schemas

* fix failed unit test

* update from code review

* add selectedConfigs field

* run make check-schemas
2024-05-27 09:55:39 +10:00
Dexter Yan
51c07b42c3 feat(analyzer): make cluster resource case insensitive to fix inconsistent names (#1547)
* feat(analyzer): make cluster resource case insensitive
2024-05-22 11:37:39 +12:00
Evans Mungai
78bbea18ac feat: Prefer embedded-cluster over k0s when detecting distro (#1544)
* feat: Prefer embedded-cluster over k0s when detecting distro

* Implement check for embedded cluster detection
2024-05-15 09:26:03 +01:00
Dexter Yan
cb5db1733a feat(analyzer): make sure ReplicaSetStatus has valid result (#1538)
* feat(analyzer): make sure ReplicaSetStatus has valid result
---------

Co-authored-by: Gerard Nguyen <gerard@replicated.com>
2024-05-01 09:02:45 +12:00
Evans Mungai
a374b0a3ab feat: Detect EC as a distribution (#1529) 2024-04-25 17:18:52 +01:00
Evans Mungai
6aaba59ebd feat: Detect k0s distribution in analyser (#1527) 2024-04-25 15:29:37 +01:00
Evans Mungai
db871e6889 feat: node metrics analyser (#1520)
* feat: node metrics analyser

The analyser only checks PVC usage at the moment. More analysers
can be added as needed

* Add tests

* Fix flaky test by waiting for goldpinger pods to start

* Fix how outcomes get checked

* Fix catch all outcome condition

* Fix test

* Regenerate schemas

* Fix failing test

---------

Co-authored-by: Dexter Yan <yanshaocong@gmail.com>
2024-04-09 12:14:10 +01:00
Ethan Mosbaugh
6f7acec7b3 feat: distribution analyzer support for kind (#1521) 2024-04-07 09:24:42 +12:00
Gerard Nguyen
742e92f1ee New Event Analyzer (#1474)
* add new Event analyzer
2024-03-07 08:31:45 +13:00
Gerard Nguyen
553d709043 fix panic in Velero analyzer when there's no Velero deployment found (#1497)
* fix panic in Velero analyzer when there's no Velero deployment found

* do not omit error in find files
2024-03-05 21:58:37 +11:00
Xav Paice
8b491b5702 Downgrade Velero to v1.10 (#1462)
In order to be compatible with KOTS, downgrade Velero to 1.10.

This removes some features from the Velero collector, but unblocks KOTS from being able
to import Troubleshoot.

We should be wary of updating Velero in the future to prevent this recurring.

sc-98475
2024-02-09 08:13:26 +13:00
Xav Paice
fe6b1c7448 Add workaround for EKS version string (#1449)
* Add workaround for EKS version string

The EKS version string returned is not semver compliant.  To work around this, we remove
the suffix for version strings that contain -eks-.

Fixes #1441

* Add parsing version test cases

* Rename function

---------

Co-authored-by: Evans Mungai <evans@replicated.com>
2024-02-06 10:04:01 +13:00
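The workaround described in fe6b1c7448 removes the suffix from version strings that contain `-eks-`. A minimal sketch of that trimming step follows. The function name `stripEKSSuffix` and the sample version string are assumptions for illustration; the commit itself does not show the function it added or renamed.

```go
package main

import (
	"fmt"
	"strings"
)

// stripEKSSuffix is a hypothetical helper: if the version string contains
// "-eks-", everything from that marker onward is dropped. Other version
// strings are returned unchanged.
func stripEKSSuffix(version string) string {
	if i := strings.Index(version, "-eks-"); i >= 0 {
		return version[:i]
	}
	return version
}

func main() {
	// The input is a made-up sample in the shape the commit describes.
	fmt.Println(stripEKSSuffix("v1.28.5-eks-5e0fdde"))
}
```

The trimmed string can then be handed to whatever semver parser the caller already uses.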
Diamon Wiggins
1447e18c56 feat: Allow templating of outcome messages for the JSON/YAML compare analyzers (#1432)
* feat: allow templating of the outcome message for the JSON and YAML Compare analyzers

* Update pkg/analyze/json_compare.go

Co-authored-by: Evans Mungai <evans@replicated.com>
2024-01-26 10:56:57 -05:00
Evans Mungai
53113c0170 feat: goldpinger collector and analyser (#1398)
* feat: goldpinger analyser

Analyser to generate a report from goldpinger results

* Add goldpinger testdata

* Goldpinger collector

* Improvements after running tests

* More minor updates after further testing

* Better error message if a container fails to start

* A few more updates

* Add goldpinger e2e test

* Update schemas

* Clean up helm installs in e2e tests

* Add resource limits to goldpinger pods

* Some minor improvements

* Some more changes noted when writing docs

* Update schemas

* A few more updates when testing with kURL

* Log goldpinger tests

* Tests before exit code
2023-12-12 11:02:41 +00:00
Archit Sharma
7038da85b1 Velero analyzer (#1366)
* feat: add velero analyzer (#806)

  * updated schema
  * analyzer without collector
  * tests
  * covers deprecated Restic repository type
  * velero version from deployment image to check deprecated type
  * read for both velero pod kinds (velero*, node-agent*)

---------

Signed-off-by: Archit Sharma <archit@pm.me>
2023-11-03 18:41:17 +05:30
Jason McCampbell
a7bb9ea31e Add support for Oracle OKE environment (#1387) 2023-11-02 17:16:18 +13:00
Andrew Lavery
461cc994ef allow warning when filesystem performance not collected (fio) (#1363)
* add a way to only warn when the host fs perf file was not collected

* make fileNotCollected an exported constant
2023-10-10 11:03:47 -04:00
Diamon Wiggins
32b0e1a890 Fix mssql and mysql analyzers (#1359)
fix mssql and mysql analyzers
2023-10-09 15:34:02 +01:00
ada mancini
e3adc1cb35 call out to fio for host filesystem performance (#1275)
* stashing changes

* split filesystem collector into fio and legacy functions

* read fio results into analyzer

* remove test script

* update go.mod

* remove old notes

* go mod tidy

* fix up go.mod

* fix up go.mod

* refactor tests for fio

* make schemas

* remove local scripts

* local watch script for building troubleshoot

* document watch script

* fix var names

* handle errors if run as non-root

* go mod tidy

* use String interface

* collector happy path test

* invalid filesize

* invalid filesize

* tests

* remove old code

* remove old init function

* let actions tests run this

* clean up tests

* go mod tidy

* remove duplicated type declaration

* remove old file create code
2023-10-03 14:21:56 -04:00
Diamon Wiggins
6cbe188abe Fix incorrect result URI for pass and warn outcomes in common status analyzer (#1333)
* fix result URI
* revert examples
* fix warn outcome
2023-09-16 07:36:58 +12:00
Evans Mungai
96c482a1f5 fix: complete mssql analyser implementation (#1290)
* fix: complete mssql analyser implementation

* Add mssql analyser tests and mssql collector logs

* Close mssql db after collecting data
2023-08-25 10:38:39 +01:00
Dexter Yan
04f69b3f8c fix(json_compare): solve unordered slice deep equal (#1300)
* fix(json_compare): sort slice

* fix(json_compare): improve

* fix(json_compare): remove duplicated tests

---------

Co-authored-by: Evans Mungai <evans@replicated.com>
2023-08-24 11:48:33 +01:00
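The problem named in 04f69b3f8c is that a deep-equality check treats `[1,2]` and `[2,1]` as different, and the commit bullets say the fix sorts slices. One way to do that, sketched here under assumptions, is to normalize both decoded JSON values by sorting every slice before calling `reflect.DeepEqual`. The helper names and the choice of sorting by each element's JSON encoding are illustrative; they are not taken from `pkg/analyze/json_compare.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
)

// normalize recursively sorts every slice in a decoded JSON value so that
// element order no longer affects equality. Hypothetical helper.
func normalize(v interface{}) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		out := make(map[string]interface{}, len(t))
		for k, val := range t {
			out[k] = normalize(val)
		}
		return out
	case []interface{}:
		out := make([]interface{}, len(t))
		for i, val := range t {
			out[i] = normalize(val)
		}
		// Order elements by their JSON encoding; json.Marshal writes map
		// keys in sorted order, so the encoding is stable.
		sort.Slice(out, func(i, j int) bool {
			a, _ := json.Marshal(out[i])
			b, _ := json.Marshal(out[j])
			return string(a) < string(b)
		})
		return out
	default:
		return v
	}
}

// equalIgnoringOrder decodes both documents and compares them after
// normalization. Hypothetical helper.
func equalIgnoringOrder(a, b []byte) (bool, error) {
	var x, y interface{}
	if err := json.Unmarshal(a, &x); err != nil {
		return false, err
	}
	if err := json.Unmarshal(b, &y); err != nil {
		return false, err
	}
	return reflect.DeepEqual(normalize(x), normalize(y)), nil
}

func main() {
	same, err := equalIgnoringOrder([]byte(`{"a":[3,1,2]}`), []byte(`{"a":[2,3,1]}`))
	fmt.Println(same, err)
}
```

Sorting by encoded form costs extra marshalling per comparison, which is acceptable for small documents but worth revisiting for large ones.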