Ethan Mosbaugh
f352396e2e
fix(collect): add context timeout to registry collector ( #1846 )
...
* fix(collect): add context timeout to registry collector
* f
* f
2025-09-10 11:47:58 -07:00
João Antunes
5cd98acdc1
fix(cluster_resources): pod disruption budgets for policy v1 not being collected ( #1843 )
...
* fix(cluster_resources): pod disruption budgets for policy v1 not being collected
* fix: e2e test
* fix: now actually fix the e2e test
2025-09-10 16:30:06 +01:00
Ash
dd48aadf7f
Allow filtering node resources on taint. ( #1840 )
...
* allow filtering node resources on taint
2025-09-09 14:33:51 +01:00
Ethan Mosbaugh
6e62251904
chore(deps): bump the security group with 16 updates ( #1835 )
...
* chore(deps): bump the security group with 16 updates
Bumps the security group with 16 updates:
| Package | From | To |
| --- | --- | --- |
| [github.com/shirou/gopsutil/v4](https://github.com/shirou/gopsutil ) | `4.25.7` | `4.25.8` |
| [github.com/spf13/cobra](https://github.com/spf13/cobra ) | `1.9.1` | `1.10.1` |
| [github.com/spf13/pflag](https://github.com/spf13/pflag ) | `1.0.7` | `1.0.9` |
| [github.com/stretchr/testify](https://github.com/stretchr/testify ) | `1.11.0` | `1.11.1` |
| [go.opentelemetry.io/otel](https://github.com/open-telemetry/opentelemetry-go ) | `1.37.0` | `1.38.0` |
| [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go ) | `1.37.0` | `1.38.0` |
| [k8s.io/api](https://github.com/kubernetes/api ) | `0.33.4` | `0.34.0` |
| [k8s.io/apiextensions-apiserver](https://github.com/kubernetes/apiextensions-apiserver ) | `0.33.4` | `0.34.0` |
| [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery ) | `0.33.4` | `0.34.0` |
| [k8s.io/apiserver](https://github.com/kubernetes/apiserver ) | `0.33.4` | `0.34.0` |
| [k8s.io/cli-runtime](https://github.com/kubernetes/cli-runtime ) | `0.33.4` | `0.34.0` |
| [k8s.io/client-go](https://github.com/kubernetes/client-go ) | `0.33.4` | `0.34.0` |
| [sigs.k8s.io/controller-runtime](https://github.com/kubernetes-sigs/controller-runtime ) | `0.21.0` | `0.22.0` |
| [k8s.io/kubelet](https://github.com/kubernetes/kubelet ) | `0.33.4` | `0.34.0` |
| [k8s.io/metrics](https://github.com/kubernetes/metrics ) | `0.33.4` | `0.34.0` |
| [k8s.io/utils](https://github.com/kubernetes/utils ) | `0.0.0-20241104100929-3ea5e8cea738` | `0.0.0-20250604170112-4c0f3b243397` |
Updates `github.com/shirou/gopsutil/v4` from 4.25.7 to 4.25.8
- [Release notes](https://github.com/shirou/gopsutil/releases )
- [Commits](https://github.com/shirou/gopsutil/compare/v4.25.7...v4.25.8 )
Updates `github.com/spf13/cobra` from 1.9.1 to 1.10.1
- [Release notes](https://github.com/spf13/cobra/releases )
- [Commits](https://github.com/spf13/cobra/compare/v1.9.1...v1.10.1 )
Updates `github.com/spf13/pflag` from 1.0.7 to 1.0.9
- [Release notes](https://github.com/spf13/pflag/releases )
- [Commits](https://github.com/spf13/pflag/compare/v1.0.7...v1.0.9 )
Updates `github.com/stretchr/testify` from 1.11.0 to 1.11.1
- [Release notes](https://github.com/stretchr/testify/releases )
- [Commits](https://github.com/stretchr/testify/compare/v1.11.0...v1.11.1 )
Updates `go.opentelemetry.io/otel` from 1.37.0 to 1.38.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases )
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md )
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.37.0...v1.38.0 )
Updates `go.opentelemetry.io/otel/sdk` from 1.37.0 to 1.38.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases )
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md )
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.37.0...v1.38.0 )
Updates `k8s.io/api` from 0.33.4 to 0.34.0
- [Commits](https://github.com/kubernetes/api/compare/v0.33.4...v0.34.0 )
Updates `k8s.io/apiextensions-apiserver` from 0.33.4 to 0.34.0
- [Release notes](https://github.com/kubernetes/apiextensions-apiserver/releases )
- [Commits](https://github.com/kubernetes/apiextensions-apiserver/compare/v0.33.4...v0.34.0 )
Updates `k8s.io/apimachinery` from 0.33.4 to 0.34.0
- [Commits](https://github.com/kubernetes/apimachinery/compare/v0.33.4...v0.34.0 )
Updates `k8s.io/apiserver` from 0.33.4 to 0.34.0
- [Commits](https://github.com/kubernetes/apiserver/compare/v0.33.4...v0.34.0 )
Updates `k8s.io/cli-runtime` from 0.33.4 to 0.34.0
- [Commits](https://github.com/kubernetes/cli-runtime/compare/v0.33.4...v0.34.0 )
Updates `k8s.io/client-go` from 0.33.4 to 0.34.0
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md )
- [Commits](https://github.com/kubernetes/client-go/compare/v0.33.4...v0.34.0 )
Updates `sigs.k8s.io/controller-runtime` from 0.21.0 to 0.22.0
- [Release notes](https://github.com/kubernetes-sigs/controller-runtime/releases )
- [Changelog](https://github.com/kubernetes-sigs/controller-runtime/blob/main/RELEASE.md )
- [Commits](https://github.com/kubernetes-sigs/controller-runtime/compare/v0.21.0...v0.22.0 )
Updates `k8s.io/kubelet` from 0.33.4 to 0.34.0
- [Commits](https://github.com/kubernetes/kubelet/compare/v0.33.4...v0.34.0 )
Updates `k8s.io/metrics` from 0.33.4 to 0.34.0
- [Commits](https://github.com/kubernetes/metrics/compare/v0.33.4...v0.34.0 )
Updates `k8s.io/utils` from 0.0.0-20241104100929-3ea5e8cea738 to 0.0.0-20250604170112-4c0f3b243397
- [Commits](https://github.com/kubernetes/utils/commits )
---
updated-dependencies:
- dependency-name: github.com/shirou/gopsutil/v4
  dependency-version: 4.25.8
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: security
- dependency-name: github.com/spf13/cobra
  dependency-version: 1.10.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: security
- dependency-name: github.com/spf13/pflag
  dependency-version: 1.0.9
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: security
- dependency-name: github.com/stretchr/testify
  dependency-version: 1.11.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: security
- dependency-name: go.opentelemetry.io/otel
  dependency-version: 1.38.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: security
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.38.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: security
- dependency-name: k8s.io/api
  dependency-version: 0.34.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: security
- dependency-name: k8s.io/apiextensions-apiserver
  dependency-version: 0.34.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: security
- dependency-name: k8s.io/apimachinery
  dependency-version: 0.34.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: security
- dependency-name: k8s.io/apiserver
  dependency-version: 0.34.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: security
- dependency-name: k8s.io/cli-runtime
  dependency-version: 0.34.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: security
- dependency-name: k8s.io/client-go
  dependency-version: 0.34.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: security
- dependency-name: sigs.k8s.io/controller-runtime
  dependency-version: 0.22.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: security
- dependency-name: k8s.io/kubelet
  dependency-version: 0.34.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: security
- dependency-name: k8s.io/metrics
  dependency-version: 0.34.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: security
- dependency-name: k8s.io/utils
  dependency-version: 0.0.0-20250604170112-4c0f3b243397
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: security
...
Signed-off-by: dependabot[bot] <support@github.com >
* f
* f
* f
---------
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-09-08 09:32:56 -07:00
Diamon Wiggins
2861425293
feat(run-pod): allow image pull retries in run pod collector ( #1811 )
...
* allow image pull retries in run pod collector
* fix formatting
2025-07-25 11:30:14 -04:00
Diamon Wiggins
7a6bffeff5
chore: fix noisy info logs ( #1808 )
...
* refine logging
* keep progress message at level 0
2025-07-09 20:58:47 -04:00
Ethan Mosbaugh
38d8a45171
fix(host-analyze): certificate analyzer wrong file path ( #1807 )
2025-07-09 09:32:04 -04:00
Diamon Wiggins
989780af69
feat: allow secrets collector to retrieve all key data if specified ( #1801 )
...
* allow secrets collector to retrieve all key data if specified
* add new line
* remove unneeded comments
2025-06-30 10:06:14 -04:00
Ethan Mosbaugh
a4a387eb0e
chore: CVE-2024-0406 remove github.com/mholt/archiver/v3 dependency ( #1793 )
2025-06-06 11:35:56 -07:00
Dmitriy Ivolgin
03efedf714
Follow logs when using runDaemonSet and runPod collectors ( #1783 )
...
Follow logs when using runDaemonSet collector
Signed-off-by: divolgin <dmitriy@replicated.com >
2025-05-09 12:54:28 -07:00
dependabot[bot]
9bca9c5245
chore(deps): bump github.com/distribution/distribution/v3 from 3.0.0-rc.3 to 3.0.0 ( #1771 )
...
* chore(deps): bump github.com/distribution/distribution/v3
Bumps [github.com/distribution/distribution/v3](https://github.com/distribution/distribution ) from 3.0.0-rc.3 to 3.0.0.
- [Release notes](https://github.com/distribution/distribution/releases )
- [Commits](https://github.com/distribution/distribution/compare/v3.0.0-rc.3...v3.0.0 )
---
updated-dependencies:
- dependency-name: github.com/distribution/distribution/v3
  dependency-version: 3.0.0
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
* update go
* use constant format strings
* f
---------
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Lavery <laverya@umich.edu >
2025-04-21 16:45:47 +00:00
Luke Amdor
3a457d11fc
feat: add timestamps flag to logs collector ( #1776 )
...
* feat: add timestamps flag to logs collector
Kubernetes logs can be transmitted with the captured timestamps. This is useful for containers that do not log with timestamps. So I'm exposing that as a flag.
* fix: update schemas
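As an illustration of what consumers of timestamped logs might do, here is a minimal Go sketch that splits one log line into its timestamp and message. It assumes each line starts with an RFC 3339 timestamp followed by a single space, which is how Kubernetes prefixes lines when timestamps are requested; `splitTimestamp` is a hypothetical helper and not part of the collector.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// splitTimestamp separates a leading RFC 3339 timestamp from the rest of a
// log line. Hypothetical helper: it assumes "<timestamp> <message>" framing.
func splitTimestamp(line string) (time.Time, string, error) {
	stamp, msg, ok := strings.Cut(line, " ")
	if !ok {
		return time.Time{}, "", fmt.Errorf("no timestamp separator in %q", line)
	}
	ts, err := time.Parse(time.RFC3339Nano, stamp)
	if err != nil {
		return time.Time{}, "", fmt.Errorf("parse timestamp %q: %w", stamp, err)
	}
	return ts, msg, nil
}

func main() {
	ts, msg, err := splitTimestamp("2025-04-17T14:51:07.123456789Z starting server")
	fmt.Println(ts, msg, err)
}
```

Lines from containers that already log their own timestamps would then carry two, so a caller may prefer to keep the flag off for those.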
2025-04-17 10:51:07 -04:00
Andrew Lavery
ab7f50d0ce
improve detection of tanzu clusters ( #1769 )
2025-04-03 11:25:44 -07:00
Ethan Mosbaugh
641c195db3
fix(collect.runPod): does not delete image pull secrets without name in spec ( #1761 )
...
* fix(collect.runPod): fix deleting image pull secrets
* f
* f
2025-03-17 16:21:28 -05:00
Johannes Tuchscherer
ef1cd66b1e
Handling the case when the Cluster Analyzer doesn't find a resource ( #1760 )
...
* Handling the case when the Cluster Analyzer doesn't find a resource
* Add namespace information to Resource not found fail message
2025-03-14 11:22:49 -07:00
Greg Schofield
64c63d3f7a
Log namespace when analyzing deployment status ( #1757 )
2025-03-12 13:49:15 +00:00
Andrew Lavery
9d9b3c565c
add additional test cases to the host os info analyzer ( #1754 )
2025-03-06 16:57:59 -06:00
Johannes Tuchscherer
3665d25abf
HTTP comparators ( #1753 )
...
* Allowing more comparators for the http analyzer
* test
* Update pkg/analyze/host_http.go
Co-authored-by: Andrew Lavery <laverya@umich.edu >
---------
Co-authored-by: Andrew Lavery <laverya@umich.edu >
2025-03-06 21:40:47 +00:00
Salah Al Saleh
97dcae9fc7
Ability to use sprig functions in analyzer templates ( #1745 )
...
* Ability to use sprig functions in analyzer templates
2025-02-21 08:10:46 -08:00
Ethan Mosbaugh
b80f38a9a0
fix(redact): multi-line redactors strip empty lines ( #1742 )
2025-02-20 21:55:05 -05:00
Andrew Lavery
dca4e675fa
update shirou/gopsutil to v4 ( #1744 )
2025-02-20 16:04:03 -08:00
Andrew Lavery
fb9ea281cb
improve the host OS collector and analyzer ( #1743 )
...
The OS version analyzer did not allow checking for things like "redhat 8.x" - this equates to >= 8 && < 9 in the new code.
Also, we previously only collected the OS name (like redhat, centos, or ubuntu) not the OS family (which would be rhel, rhel, and debian for the previous OSes) - this greatly reduces the number of cases required in an analyzer.
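The "8.x means >= 8 && < 9" rule can be sketched in Go as follows. This is only an illustration of the range idea under stated assumptions: `majorRange` and `matches` are hypothetical names, and the analyzer's real parsing may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorRange turns a pattern such as "8.x" into the half-open major-version
// range [8, 9). Hypothetical helper, not the analyzer's actual code.
func majorRange(pattern string) (lo, hi int, err error) {
	major, ok := strings.CutSuffix(pattern, ".x")
	if !ok {
		return 0, 0, fmt.Errorf("pattern %q does not end in .x", pattern)
	}
	lo, err = strconv.Atoi(major)
	if err != nil {
		return 0, 0, fmt.Errorf("invalid major version in %q: %w", pattern, err)
	}
	return lo, lo + 1, nil
}

// matches reports whether version (for example "8.10") falls in the range
// described by pattern. Only the major component is compared.
func matches(pattern, version string) (bool, error) {
	lo, hi, err := majorRange(pattern)
	if err != nil {
		return false, err
	}
	majorStr, _, _ := strings.Cut(version, ".")
	major, err := strconv.Atoi(majorStr)
	if err != nil {
		return false, fmt.Errorf("invalid version %q: %w", version, err)
	}
	return major >= lo && major < hi, nil
}

func main() {
	for _, v := range []string{"7.9", "8.0", "8.10", "9.0"} {
		ok, err := matches("8.x", v)
		fmt.Println(v, ok, err)
	}
}
```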
2025-02-20 13:03:53 -08:00
Ethan Mosbaugh
51c3a0c40f
fix(host-preflights): builtin kernel modules file from wrong path ( #1741 )
...
* fix(host-preflights): builtin kernel modules file from wrong path
* f
* f
* f
* f
2025-02-18 16:19:58 -05:00
Ethan Mosbaugh
8e1dc9c5cb
fix(preflights): builtin kernel modules file may be not found ( #1738 )
2025-02-17 15:54:38 -08:00
Ethan Mosbaugh
923293e79a
fix(preflights): support for builtin kernel modules ( #1737 )
...
* fix(preflights): support for builtin kernel modules
* f
2025-02-17 16:57:44 -06:00
Ethan Mosbaugh
ae2b5d1311
fix(ci): remove windows build from goreleaser ( #1736 )
2025-02-13 21:55:12 +00:00
Salah Al Saleh
d5a6b19417
Add a host analyzer to check if a subnet contains an IP address ( #1735 )
...
* Add a host collector / analyzer to check if a subnet contains an IP address
2025-02-13 13:16:59 -08:00
Ethan Mosbaugh
716dda221d
fix(host.kernelModules): /lib/modules does not exist in a container ( #1734 )
2025-02-13 11:54:33 -06:00
Dexter Yan
683391522e
fix(window): improve rename file process and remove windows release ( #1728 )
2025-02-11 17:33:08 +13:00
Ash
de791e951c
Enable Daemonsets in ClusterResources analyzer ( #1729 )
2025-02-06 13:55:39 -05:00
Gerard Nguyen
fa5365cfae
fix: [sc-118962] Unable to Retrieve TLS Parameters from Kubernetes Secrets with the Postgres Collector ( #1724 )
...
* use Data instead of StringData
2025-01-28 21:39:39 +11:00
Xav Paice
86b7e54466
Revert "feat: save YAML spec used to generate support bundle/preflight" ( #1715 )
...
Revert "feat: save YAML spec used to generate support bundle/preflight (#1713 )"
This reverts commit f6f51acbd5 .
2025-01-06 09:42:58 +11:00
Gerard Nguyen
f6f51acbd5
feat: save YAML spec used to generate support bundle/preflight ( #1713 )
...
* save YAML spec of support bundle
* save YAML spec of preflight
* add unit test
* redact TLS private key by default in output spec
* update YAML path for HTTP TLS redactor
2025-01-04 11:35:43 +11:00
Dexter Yan
64ee9e5596
feat(nodeResources): add GPU support ( #1708 )
...
* feat(nodeResources): add GPU support
* add resourceCapacity and sum test
* update with make schemas
* Correct tests names
Signed-off-by: Evans Mungai <evans@replicated.com >
---------
Signed-off-by: Evans Mungai <evans@replicated.com >
Co-authored-by: Evans Mungai <evans@replicated.com >
2025-01-03 15:11:10 +13:00
Gerard Nguyen
a6fbf144b8
feat: container statuses analyzer ( #1698 )
...
* new schema for analyzer ClusterContainerStatuses
2024-12-04 10:36:23 +11:00
Miguel Varela Ramos
8e2647077d
feat: add support for matchExpressions when filtering for nodes ( #1697 )
...
* feat: add support for matchExpressions when filtering for nodes
* fix: make generate
2024-11-30 23:15:26 +11:00
Ash
ecc92b1e3e
[bug] Quick fix for handling non-200 status codes when loading specs from URI ( #1695 )
...
* Quick fix for handling non-200 status codes when loading specs from URI
The Go HTTP client already follows 3xx redirects for us
* note
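A minimal Go sketch of this kind of check, using only the standard library: redirects are followed by the default client, so anything other than 200 on the final response is treated as an error. `fetchSpec` and `newTestServer` are hypothetical names used for illustration; the actual loader code may be structured differently.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchSpec downloads a spec and rejects any final response that is not 200.
// The client follows redirects itself, so 3xx needs no special handling here.
func fetchSpec(client *http.Client, uri string) ([]byte, error) {
	resp, err := client.Get(uri)
	if err != nil {
		return nil, fmt.Errorf("fetch %s: %w", uri, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fetch %s: unexpected status %d", uri, resp.StatusCode)
	}
	return io.ReadAll(resp.Body)
}

// newTestServer serves a spec, a redirect to it, and a 404, for demonstration.
func newTestServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/spec", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "kind: SupportBundle")
	})
	mux.HandleFunc("/moved", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/spec", http.StatusFound)
	})
	mux.HandleFunc("/missing", func(w http.ResponseWriter, r *http.Request) {
		http.NotFound(w, r)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newTestServer()
	defer srv.Close()
	for _, path := range []string{"/spec", "/moved", "/missing"} {
		body, err := fetchSpec(srv.Client(), srv.URL+path)
		fmt.Printf("%s: body=%q err=%v\n", path, body, err)
	}
}
```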
2024-11-25 15:04:38 +00:00
Ricardo Maraschini
9f5f0633cf
feat: rename templating variables ( #1693 )
...
when templating the output of the namespace connectivity check we were
referring to the 'fromCIDR' as 'fromNamespace'. it makes way more sense
to refer to it as 'fromCIDR' as this is how it is provided in the input
for the collector.
as this is a brand new feature it is very unlikely that anyone is using
it (except for the embedded cluster, which still needs to be
patched accordingly).
this is how the analyser was defined before:
```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: ec-cluster-preflight
spec:
  analyzers:
    - networkNamespaceConnectivity:
        collectorName: check-network-connectivity
        outcomes:
          - pass:
              message: "Communication between {{ .FromNamespace }} and {{ .ToNamespace }} is working"
          - fail:
              message: "{{ .ErrorMessage }}"
```
and this is how it is now:
```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: ec-cluster-preflight
spec:
  analyzers:
    - networkNamespaceConnectivity:
        collectorName: check-network-connectivity
        outcomes:
          - pass:
              message: "Communication between {{ .FromCIDR }} and {{ .ToCIDR }} is working"
          - fail:
              message: "{{ .ErrorMessage }}"
```
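To show why the rename matters to existing specs, here is a small Go sketch that renders an outcome message with `text/template`. The `connectivityData` struct and `render` function are hypothetical stand-ins for whatever the analyzer passes to the template; only the field names `FromCIDR`, `ToCIDR`, and `ErrorMessage` come from the examples above.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// connectivityData is a hypothetical stand-in for the analyzer's template data.
type connectivityData struct {
	FromCIDR     string
	ToCIDR       string
	ErrorMessage string
}

// render executes an outcome message template against the data.
func render(message string, data connectivityData) (string, error) {
	tmpl, err := template.New("outcome").Parse(message)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	data := connectivityData{FromCIDR: "10.0.0.0/24", ToCIDR: "10.0.1.0/24"}

	// New variable names resolve against the struct fields.
	fmt.Println(render("Communication between {{ .FromCIDR }} and {{ .ToCIDR }} is working", data))

	// With struct-backed data, the old name no longer exists and execution fails.
	fmt.Println(render("Communication from {{ .FromNamespace }} is working", data))
}
```

In this sketch a message still using `.FromNamespace` fails at execution time, which is why specs written against the old names need patching.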
2024-11-21 16:03:50 +01:00
Dexter Yan
6167fd8a5e
fix(collector): fix dns collector limited to 63 chars ( #1690 )
2024-11-19 17:47:24 +13:00
Gerard Nguyen
7bb88e6b83
feat: ensure Copy collector runs last ( #1688 )
...
* ensure Copy collector runs last
* add unit test
* reorder in Preflight as well
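One way to express "copy collectors run last" is a stable partition that keeps the relative order of everything else. The Go sketch below is an assumption about the approach, not the code from this change; the `collector` type and `copyLast` function are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// collector is a hypothetical, simplified view of a collector spec entry.
type collector struct {
	Kind string
	Name string
}

// copyLast returns a reordered copy in which every "copy" collector comes
// after all other collectors; relative order within each group is preserved.
func copyLast(collectors []collector) []collector {
	ordered := append([]collector(nil), collectors...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Kind != "copy" && ordered[j].Kind == "copy"
	})
	return ordered
}

func main() {
	in := []collector{
		{Kind: "copy", Name: "a"},
		{Kind: "logs", Name: "b"},
		{Kind: "exec", Name: "c"},
		{Kind: "copy", Name: "d"},
	}
	for _, c := range copyLast(in) {
		fmt.Println(c.Kind, c.Name)
	}
}
```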
2024-11-15 10:59:38 +11:00
Dexter Yan
1a828fa90b
fix(analyzer): add missing warning in outcome ( #1687 )
2024-11-13 16:32:54 +13:00
Ash
deeeea7cec
exec remote host collectors in a daemonset ( #1671 )
...
Co-authored-by: Gerard Nguyen <gerard@replicated.com >
Co-authored-by: Dexter Yan <yanshaocong@gmail.com >
2024-11-12 08:47:24 +13:00
João Antunes
197f6de425
feat(host_analyzer): add host sysctl analyzer ( #1681 )
...
* feat(host_analyzer): add host sysctl analyzer
* chore: add e2e tests to support bundle collection
* chore: missing spec e2e test update
* chore: cleanup remote collector and use parse operator
* chore: update schemas
2024-11-08 18:55:24 +00:00
Evans Mungai
d25aa7d0ea
fix: Do not fail analysis if node list does not exist ( #1678 )
...
* fix: Do not error if node list does not exist
Signed-off-by: Evans Mungai <evans@replicated.com >
* fix test fail
---------
Signed-off-by: Evans Mungai <evans@replicated.com >
Co-authored-by: Dexter Yan <yanshaocong@gmail.com >
2024-11-08 09:53:03 +13:00
João Antunes
77c9968ff6
feat(host_sysctl): add host sysctl collector ( #1676 )
...
* feat(host_sysctl): add host sysctl collector
* chore: add examples
* Update pkg/collect/host_sysctl.go
Co-authored-by: Evans Mungai <evans@replicated.com >
* chore: use sysctl package vs exec calls
* chore: make linter happy
* chore: make schemas
* chore: go back to sysctl exec
* chore: make linter happy
---------
Co-authored-by: Evans Mungai <evans@replicated.com >
2024-11-07 18:18:11 +00:00
Diamon Wiggins
06506ed95d
Fix remote host collection RBAC checks ( #1672 )
...
* fix remote host collection rbac checks
* move saveNodeList into collectRemoteHost function
* fix resource attribute list and retrieve namespace from kubeconfig
* revert change to set a default namespace from kubeconfig
* remove duplicate code
2024-11-07 10:07:27 -05:00
Ricardo Maraschini
e272683bce
feat: implement collector and analyser for network namespace connectivity ( #1670 )
...
* feat: implement collector and analyser for network namespace connectivity
checks if two network namespaces can talk to each other on udp and tcp.
its usage is as follows:
```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: test
spec:
  hostCollectors:
    - networkNamespaceConnectivity:
        collectorName: check-network-connectivity
        fromCIDR: 10.0.0.0/24
        toCIDR: 10.0.1.0/24
  hostAnalyzers:
    - networkNamespaceConnectivity:
        collectorName: check-network-connectivity
        outcomes:
          - pass:
              message: "Communication between 10.0.0.0/24 and 10.0.1.0/24 is working"
          - fail:
              message: "Communication between 10.0.0.0/24 and 10.0.1.0/24 isn't working"
```
if this fails then you may need to enable `forwarding` with:
```bash
sysctl -w net.ipv4.ip_forward=1
```
if it still fails then you may need to configure firewalld to allow the
traffic or simply disable it for the sake of testing.
* chore: rebuild schemas
* chore: remove unused property
* chore: disable namespaces for other platforms
* chore: make sure we timeout temporary servers
* feat: analyzer now supports multi-node collection
* feat: check both udp and tcp even on failure
check both protocols even if one fails. this commit also introduces a
timeout that can be set by the user.
* feat: add templating to the failure outcome
allow users to dump the errors found during the analysis.
* chore: addressing pr comments
* feat: delete interface pair before namespace
even though the interface pair is deleted every time we delete the
namespace in my tests, it is safer to delete it before we delete the
namespace.
this comes out of a review comment where some people seem to still be
able to see the interface pair even after the namespace is deleted.
i.e. better safe than sorry.
* chore: fix typo on comment
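The "check both udp and tcp even on failure" behaviour with a user-settable timeout can be sketched in Go as below. This is a simplified illustration, not the collector's implementation: the `check` type, `runAll`, `demoChecks`, and `demoTimeout` are hypothetical, and the probes are stubs rather than real namespace traffic.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// check is a hypothetical probe for one protocol.
type check struct {
	protocol string
	run      func(ctx context.Context) error
}

// runAll runs every check under its own timeout and collects all failures
// instead of stopping at the first one.
func runAll(timeout time.Duration, checks []check) error {
	var errs []error
	for _, c := range checks {
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		err := c.run(ctx)
		cancel()
		if err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", c.protocol, err))
		}
	}
	return errors.Join(errs...) // nil when no check failed
}

// demoTimeout is an arbitrary value for the demonstration.
const demoTimeout = 50 * time.Millisecond

// demoChecks returns a UDP stub that hangs until the timeout fires and a TCP
// stub that succeeds, recording which probes actually ran.
func demoChecks(ran *[]string) []check {
	return []check{
		{protocol: "udp", run: func(ctx context.Context) error {
			*ran = append(*ran, "udp")
			<-ctx.Done()
			return ctx.Err()
		}},
		{protocol: "tcp", run: func(ctx context.Context) error {
			*ran = append(*ran, "tcp")
			return nil
		}},
	}
}

func main() {
	var ran []string
	err := runAll(demoTimeout, demoChecks(&ran))
	fmt.Println("ran:", ran)
	fmt.Println("err:", err)
}
```

Joining the errors keeps each protocol's failure visible, which fits the templated `{{ .ErrorMessage }}` outcome described above.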
2024-11-06 11:30:13 +01:00
Ash
ea900a1881
chore: Refactor host cpu analyzer for remote collection ( #1664 )
...
* Refactor host cpu analyzer for remote collection
---------
Co-authored-by: Gerard Nguyen <gerard@replicated.com >
2024-11-06 14:43:27 +11:00
Gerard Nguyen
f0b8de68ae
feat: multiple nodes analyzers ( #1667 )
...
* implement refactor for multiple node analyzers
---------
Co-authored-by: Diamon Wiggins <38189728+diamonwiggins@users.noreply.github.com >
2024-11-04 14:17:39 +11:00
Ash
544a700062
[sc-114813] copy HostCollector fails to copy binary files when run in cluster ( #1669 )
...
* Don't convert output bytes to string
This prevents binary files from getting mangled when the collector output is passed around between functions
* Update pkg/collect/runner.go
Co-authored-by: Evans Mungai <evans@replicated.com >
* organise imports
---------
Co-authored-by: Evans Mungai <evans@replicated.com >
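A small Go sketch of how binary data can be damaged once it is treated as a string. The commit does not say where the mangling happened; JSON encoding is used here purely as an assumed example, because `encoding/json` replaces invalid UTF-8 in strings but base64-encodes `[]byte` values. All type and function names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// asString and asBytes are hypothetical result envelopes that differ only in
// how the collected output is typed.
type asString struct {
	Output string `json:"output"`
}

type asBytes struct {
	Output []byte `json:"output"`
}

// sample mimics the start of a binary file, including invalid UTF-8 bytes.
var sample = []byte{0x7f, 'E', 'L', 'F', 0xff, 0xfe, 0x00, 0x01}

// roundTripString sends the data through JSON as a string.
func roundTripString(data []byte) ([]byte, error) {
	encoded, err := json.Marshal(asString{Output: string(data)})
	if err != nil {
		return nil, err
	}
	var decoded asString
	if err := json.Unmarshal(encoded, &decoded); err != nil {
		return nil, err
	}
	return []byte(decoded.Output), nil
}

// roundTripBytes sends the data through JSON as a byte slice (base64).
func roundTripBytes(data []byte) ([]byte, error) {
	encoded, err := json.Marshal(asBytes{Output: data})
	if err != nil {
		return nil, err
	}
	var decoded asBytes
	if err := json.Unmarshal(encoded, &decoded); err != nil {
		return nil, err
	}
	return decoded.Output, nil
}

// same reports whether two byte slices hold identical content.
func same(a, b []byte) bool { return bytes.Equal(a, b) }

func main() {
	s, _ := roundTripString(sample)
	b, _ := roundTripBytes(sample)
	fmt.Println("string round trip intact:", same(s, sample))
	fmt.Println("bytes round trip intact: ", same(b, sample))
}
```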
2024-10-31 10:44:35 +00:00