161 Commits

Author SHA1 Message Date
Salah Aldeen Al Saleh
880c7dc3ea ability to specify a list of namespaces for the cluster resources collector (#424)
2021-09-23 08:02:05 -07:00
Andrew Lavery
1b65d1a544 Merge pull request #413 from replicatedhq/laverya/collect-jobs-and-cronjobs
collect jobs and cronjobs as part of cluster-resources
2021-09-03 17:25:41 -04:00
Andrew Lavery
7fcc951c9a collect jobs and cronjobs as part of cluster-resources 2021-09-03 15:46:03 -05:00
Dan Stough
0478a7a60f fix: cluster-res collector fixed to one namespace 2021-09-03 19:23:44 +00:00
Kyle Sorensen
bf7d658313 troubleshoot enables collecting all data from a configmap (#395)
Enabled collecting all data from a ConfigMap instead of by key
2021-07-26 13:00:06 -06:00
Ethan Mosbaugh
851c91b582 remove debug log 2021-07-26 16:28:11 +00:00
Ethan Mosbaugh
cf7864cd97 Copy collectors extractArchive property 2021-07-23 13:37:57 +00:00
emosbaugh
8dcfa9886d Copy from host collector (#391)
* Copy from host collector

* namespace improvements

* better support for multiple nodes
2021-07-22 12:25:59 -07:00
kwsorensen
1ed6100ac8 Feature/validate tcp load balancer address (#387)
Load Balancer Validation part of troubleshoot pre-flight checks
2021-07-14 14:30:47 -06:00
emosbaugh
39350b5722 ConfigMap collector and secrets can be collected by selectors (#384)
* ConfigMap collector and secrets can be collected by selectors

* follow docs

* Pass context and kubernetes client to collectors

* collect tests

* analyze tests

* fix tests

* improvements
2021-07-08 16:30:26 -07:00
John Murphy
eef54d0021 force timezone to upper case 2021-07-06 08:42:12 -05:00
Andrew Reed
1ed8532663 Speed up replica checksum 2021-07-01 16:52:59 +00:00
Andrew Reed
3833955a58 Always include longhorn namespace 2021-07-01 15:03:28 +00:00
divolgin
52bbc0f2bf Don't skip TLS validation on http package's default client 2021-06-30 18:22:15 +00:00
Andrew Reed
cb3925a0af Longhorn replica corruption analyzer
This automates the procedure from
https://longhorn.io/docs/1.1.1/advanced-resources/data-recovery/corrupted-replica/
2021-06-22 21:55:12 +00:00
Andrew Reed
a86f5cae7d Collect all longhorn pod logs 2021-05-27 20:14:05 +00:00
Andrew Reed
646f7a6991 Longhorn collector for all CRDs
Also implement a single analyzer as a proof of concept. More analyzers
can be added using the collected CRDs.
2021-05-26 23:37:15 +00:00
Andrew Lavery
25a92dec56 collect rook block device disk stats
this contains both max size and currently used size for each PV
2021-04-20 15:41:47 -05:00
divolgin
39cf553a03 Merge pull request #359 from replicatedhq/divolgin/maxage
Honor maxAge for log collector if set in the spec
2021-04-19 13:26:29 -07:00
divolgin
e5233dfcf5 Honor maxAge for log collector if set in the spec 2021-04-19 20:15:41 +00:00
Andrew Reed
30f21ac71b Fix background IOPS blocking until timeout 2021-04-13 18:55:53 +00:00
Andrew Reed
0a6c9836e0 Add timeout to filesystem performance collector 2021-04-13 18:30:18 +00:00
Andrew Lavery
44993a5d0d collect RGW status as part of ceph collector 2021-04-12 23:14:00 -05:00
Andrew Reed
477cde7228 Benchmark write latency with background IOPS
Add a background IOPS feature to the filesystem performance collector
that specifies separate read and write background IOPS to perform while
measuring latency. This allows for better assessment of whether etcd
will be stable when running alongside other workloads on the same
cluster.

Also add templating to the outcome message of the filesystem performance
analyzers to allow printing individual latency percentiles or the entire
table.

Remove the random IOPS benchmark since it was attempting to perform
unaligned direct I/O.
2021-04-12 22:56:00 +00:00
divolgin
7a0c6e5383 use containers package instead of go-containerregistry 2021-04-11 21:39:44 +00:00
divolgin
fe414af556 Docker registry collector/analyzer 2021-04-09 16:17:15 +00:00
Andrew Lavery
bf4d26acd2 add host_services analyzer 2021-03-30 16:15:18 -04:00
Andrew Lavery
f3b599c19a collect host systemctl services 2021-03-30 16:15:17 -04:00
Salah Aldeen Al Saleh
afa0bc56d4 fix custom redactors file selectors in support bundle subdirectory (#336)
2021-03-11 08:45:20 -08:00
Ethan Mosbaugh
4b78c430ca Host preflight ux improvements 2021-03-02 17:27:01 +00:00
Ethan Mosbaugh
09d16ff185 Host preflights exclude 2021-03-01 22:45:16 +00:00
Jalaja
3b10d9c9e1 add pv and pvcs to support bundle 2021-02-26 01:02:33 +00:00
Ethan Mosbaugh
a459120516 diskUsage host collector traverse file tree for directory exists 2021-02-24 23:38:57 +00:00
Andrew Reed
b02338762e Fix build for darwin and windows (#327)
* Fix build for darwin and windows

* Remove unexpected dependencies
2021-02-19 18:31:18 -05:00
Andrew Reed
87b4c12274 Analyze TLS certificate 2021-02-19 20:55:16 +00:00
Andrew Reed
b418334a46 Analyze random read IOPS for a directory
The random IOPS benchmark attempts to replicate the results of this
fio command:

fio --ioengine=psync --direct=1 --bs=4k --size=1Gi --readwrite=randread --serialize_overlap=1

Across three tests the fio command reported 1877 IOPS and the preflight
1822 IOPS with the same block and file size.
2021-02-18 23:56:51 +00:00
Andrew Reed
989d5f7dbd Analyze fs write performance
The included example found P99 latency of 2.6ms.
Fio reported P99 latency of 2.5ms with this command:
fio --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/etcd --size=220m --bs=2300
2021-02-17 23:20:38 +00:00
Andrew Reed
fe4db40b43 Move host preflights examples into separate directory
Add all supported analyzers to host preflight sample.
Don't log transient errors waiting for TCP connection.
Begin human stdout results on new line after spinner.
2021-02-15 22:46:12 +00:00
Andrew Reed
6498c34da5 Analyze ipv4 interfaces
Co-authored-by: Andrew Lavery <laverya@umich.edu>
2021-02-15 20:54:53 +00:00
Andrew Reed
b0a005796c Merge pull request #317 from areed/host-remote-port
Analyze TCP connection
2021-02-15 15:18:11 -05:00
Andrew Reed
450d7570eb Analyze HTTP load balancer 2021-02-15 17:22:42 +00:00
Andrew Reed
40af0f8a9c Analyze TCP connection 2021-02-12 21:45:57 +00:00
Andrew Reed
0bcd5183f5 Analyze block devices 2021-02-11 19:19:45 +00:00
Andrew Reed
9984fe2caa Get time info from timedated 2021-02-10 20:01:15 +00:00
Andrew Reed
f25149f45c Host HTTP request analyzer 2021-02-09 20:31:28 +00:00
Andrew Reed
10a34c2e58 Host preflight (#311)
* Add HostPreflight v1beta2

* Work on TCP Load Balancer

* Host disk usage collector and analyzer

* Host memory analyzer

* TCP port status

* TCP load balancer

* Review changes

Co-authored-by: Marc Campbell <marc.e.campbell@gmail.com>
2021-02-08 16:09:01 -05:00
Marc Campbell
c7fdec0291 Removing Scopeagent 2021-01-28 18:22:48 +00:00
divolgin
3e2d90ee9b Create pull secrets if they don't have names 2021-01-08 20:01:47 +00:00
Andrew Lavery
858d00ce41 check whether a collector is excluded before checking RBAC for it 2020-11-19 18:31:56 -05:00
Ethan Mosbaugh
1e8e20a295 Ceph collector does not need a name property 2020-11-13 21:01:02 +00:00