121 Commits

Author SHA1 Message Date
Jalaja Ganapathy
eb795c98b6 fix serializer for unique id (#432) 2021-09-24 14:20:37 -07:00
Jalaja Ganapathy
a0b3b3f7dc add an unique id to each host preflights (#431)
* add an unique id to each host preflights

* auto generated files

* updated schemas for the new field id

* keeping it consistent with the rest of the spec
2021-09-24 13:29:14 -07:00
Salah Aldeen Al Saleh
1bdd3db8c5 update schemas (#428)
* update schemas

* update controller-gen
2021-09-23 11:03:19 -07:00
Salah Aldeen Al Saleh
880c7dc3ea ability to specify a list of namespaces for the cluster resources collector (#424)
* ability to specify a list of namespaces for the cluster resources collector
2021-09-23 08:02:05 -07:00
Andrew Reed
91eb94baaa Weave report analyzers
The IPAM pool analyzer checks that utilization of the pod IP subnet is
less than 85%. For example, if using 10.32.0.0/20 (4,096 addresses),
this analyzer will warn once 3,482 IPs are allocated to pods.
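A minimal sketch of the utilization arithmetic. It assumes utilization is simply allocated pod IPs divided by the size of the pod subnet; the helper names `subnetSize`, `warnAt`, and `shouldWarn` are hypothetical and not troubleshoot's API. A /20 has 4,096 addresses, so the 85% threshold is first reached at 3,482 allocated IPs, while Weave's default 10.32.0.0/12 range has 1,048,576 addresses and reaches it at 891,290.

```go
package main

import (
	"fmt"
	"net"
)

// subnetSize returns the number of addresses in an IPv4 CIDR block.
func subnetSize(cidr string) (int, error) {
	_, ipNet, err := net.ParseCIDR(cidr)
	if err != nil {
		return 0, err
	}
	ones, bits := ipNet.Mask.Size()
	return 1 << uint(bits-ones), nil
}

// warnAt returns the smallest allocation count whose utilization reaches 85%,
// i.e. ceil(0.85 * size), computed with integers to avoid float rounding.
func warnAt(size int) int {
	return (size*85 + 99) / 100
}

// shouldWarn reports whether utilization is not below the 85% threshold.
func shouldWarn(allocated, size int) bool {
	return allocated*100 >= size*85
}

func main() {
	for _, cidr := range []string{"10.32.0.0/20", "10.32.0.0/12"} {
		size, err := subnetSize(cidr)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %d addresses, warn at %d allocated\n", cidr, size, warnAt(size))
	}
}
```

Integer arithmetic keeps the boundary exact: 3,481 of 4,096 is just under 85%, 3,482 is just over.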

The pending allocation analyzer checks that the IPAM status in the
report has no items in the PendingAllocates field. Any pending
allocations indicate that the IPAM service is not ready, according to
the code in the weave status template
e3712152d2/prog/weaver/http.go (L186).
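The pending-allocation check reduces to testing whether a list is empty. This sketch assumes the IPAM section of the status report is JSON with a `PendingAllocates` array of strings; the `ipamStatus` type, its element type, and `ipamReady` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ipamStatus is a hypothetical, trimmed-down view of the IPAM section of a
// weave status report; only the field the analyzer inspects is modeled.
type ipamStatus struct {
	PendingAllocates []string `json:"PendingAllocates"`
}

// ipamReady decodes the IPAM status and reports whether there are no
// pending allocations. A missing or null field counts as empty.
func ipamReady(raw []byte) (bool, error) {
	var s ipamStatus
	if err := json.Unmarshal(raw, &s); err != nil {
		return false, err
	}
	return len(s.PendingAllocates) == 0, nil
}

func main() {
	for _, doc := range []string{
		`{"PendingAllocates": []}`,
		`{"PendingAllocates": ["10.32.0.0/12"]}`,
	} {
		ready, err := ipamReady([]byte(doc))
		fmt.Println(doc, "ready:", ready, "err:", err)
	}
}
```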

The weave connections analyzer checks that all connections to remote
peers are in the established state. The state will be "pending" if UDP
is blocked between nodes and will be "failed" if the weave pod on the
remote node is in a crash loop. To force a pending state for testing,
run the commands `iptables -A INPUT -p udp --dport 6784 -j REJECT` and
`iptables -A INPUT -p udp --dport 6783 -j REJECT` on a peer.
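A sketch of the connection-state check, assuming each peer connection exposes its address and a state string such as "established", "pending", or "failed". The `connection` type and `notEstablished` helper are hypothetical, and the addresses are illustrative.

```go
package main

import "fmt"

// connection is a hypothetical subset of a weave peer connection entry.
type connection struct {
	Address string
	State   string // e.g. "established", "pending", "failed"
}

// notEstablished returns the connections that are not in the established state.
func notEstablished(conns []connection) []connection {
	var bad []connection
	for _, c := range conns {
		if c.State != "established" {
			bad = append(bad, c)
		}
	}
	return bad
}

func main() {
	conns := []connection{
		{Address: "10.0.0.2:6783", State: "established"},
		{Address: "10.0.0.3:6783", State: "pending"}, // e.g. UDP blocked between nodes
		{Address: "10.0.0.4:6783", State: "failed"},  // e.g. remote weave pod crash-looping
	}
	for _, c := range notEstablished(conns) {
		fmt.Printf("warn: connection to %s is %s\n", c.Address, c.State)
	}
}
```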

The weave connections analyzer also checks that all connections are
using the fastdp protocol. A common issue seen in the field on
CentOS/RHEL 7 is that some sides of a connection are using fastdp and
other sides have fallen back to sleeve. Set the WEAVE_NO_FASTDP env var
on the weave daemonset to "true" to test this analyzer.
2021-09-08 21:29:38 +00:00
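The fastdp half of the weave connections analyzer above can be sketched the same way, assuming each connection record carries the data-plane protocol it negotiated ("fastdp" or "sleeve"). `peerConn`, its `Protocol` field, and `nonFastdp` are hypothetical; a real weave status report may expose the protocol differently.

```go
package main

import "fmt"

// peerConn is a hypothetical connection record carrying the data-plane
// protocol in use for that connection ("fastdp" or "sleeve").
type peerConn struct {
	Peer     string
	Protocol string
}

// nonFastdp returns the peers whose connection has fallen back from fastdp.
func nonFastdp(conns []peerConn) []string {
	var peers []string
	for _, c := range conns {
		if c.Protocol != "fastdp" {
			peers = append(peers, c.Peer)
		}
	}
	return peers
}

func main() {
	conns := []peerConn{
		{Peer: "node-a", Protocol: "fastdp"},
		{Peer: "node-b", Protocol: "sleeve"}, // e.g. WEAVE_NO_FASTDP=true on this side
	}
	for _, p := range nonFastdp(conns) {
		fmt.Printf("warn: connection to %s is not using fastdp\n", p)
	}
}
```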
Kyle Sorensen
bf7d658313 troubleshoot enables collecting all data from a configmap (#395)
Enabled collecting all data from a ConfigMap instead of by key
2021-07-26 13:00:06 -06:00
Ethan Mosbaugh
cf7864cd97 Copy collectors extractArchive property 2021-07-23 13:37:57 +00:00
emosbaugh
8dcfa9886d Copy from host collector (#391)
* Copy from host collector

* namespace improvements

* better support for multiple nodes
2021-07-22 12:25:59 -07:00
emosbaugh
39350b5722 ConfigMap collector and secrets can be collected by selectors (#384)
* ConfigMap collector and secrets can be collected by selectors

* follow docs

* Pass context and kubernetes client to collectors

* collect tests

* analyze tests

* fix tests

* improvements
2021-07-08 16:30:26 -07:00
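Selector-based collection relies on Kubernetes equality-based label matching: an object matches when it carries every key=value pair in the selector, and an empty selector matches everything. This stdlib-only sketch, with hypothetical `matchesSelector` and `selectorString` helpers, illustrates those semantics; a real collector would more likely pass the selector string to the API server on a list call.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// matchesSelector reports whether every key=value pair in selector is present
// in labels. Only equality-based matching is handled; set-based operators
// (in, notin, exists) are out of scope for this sketch.
func matchesSelector(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// selectorString renders a selector in "k1=v1,k2=v2" form with keys sorted
// for stable output.
func selectorString(selector map[string]string) string {
	keys := make([]string, 0, len(selector))
	for k := range selector {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+selector[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	sel := map[string]string{"app": "web"}
	objects := []struct {
		Name   string
		Labels map[string]string
	}{
		{"web-config", map[string]string{"app": "web", "tier": "frontend"}},
		{"db-config", map[string]string{"app": "db"}},
	}
	fmt.Println("selector:", selectorString(sel))
	for _, o := range objects {
		fmt.Printf("%s matches: %v\n", o.Name, matchesSelector(sel, o.Labels))
	}
}
```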
divolgin
7381d5086c Update troubleshoot api schema 2021-07-01 17:24:00 +00:00
Andrew Reed
646f7a6991 Longhorn collector for all CRDs
Also implement a single analyzer as a proof of concept. More analyzers
can be added using the collected CRDs.
2021-05-26 23:37:15 +00:00
Andrew Reed
0a6c9836e0 Add timeout to filesystem performance collector 2021-04-13 18:30:18 +00:00
Andrew Reed
477cde7228 Benchmark write latency with background IOPS
Add a background IOPS feature to the filesystem performance collector
that specifies separate read and write background IOPS to perform while
measuring latency. This allows for better assessment of whether etcd
will be stable when running alongside other workloads on the same
cluster.

Also add templating to the outcome message of the filesystem performance
analyzers to allow printing individual latency percentiles or the entire
table.

Remove the random IOPS benchmark since it was attempting to perform
unaligned direct I/O.
2021-04-12 22:56:00 +00:00
divolgin
7a0c6e5383 use containers package instead of go-containerregistry 2021-04-11 21:39:44 +00:00
divolgin
fe414af556 Docker registry collector/analyzer 2021-04-09 16:17:15 +00:00
Andrew Lavery
bf4d26acd2 add host_services analyzer 2021-03-30 16:15:18 -04:00
Andrew Lavery
f3b599c19a collect host systemctl services 2021-03-30 16:15:17 -04:00
Andrew Lavery
256c68feca added two parameters to the eligible block device check
Whether to accept unmounted partitions (default false) and the minimum acceptable device size (default 0).
2021-03-18 19:03:39 -04:00
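A sketch of how the two block device parameters above could combine. It assumes a device is otherwise eligible only when it is unmounted; the `blockDevice` and `eligibilityOptions` types and `isEligible` are hypothetical, chosen so that the zero-value options match the stated defaults.

```go
package main

import "fmt"

// blockDevice is a hypothetical, simplified view of a host block device.
type blockDevice struct {
	Name       string
	Type       string // "disk" or "part"
	Mountpoint string // empty when the device is not mounted
	SizeBytes  uint64
}

// eligibilityOptions models the two parameters named in the commit message.
// Its zero value matches the stated defaults: partitions rejected, minimum size 0.
type eligibilityOptions struct {
	IncludeUnmountedPartitions bool
	MinimumAcceptableSize      uint64
}

// isEligible assumes a device qualifies only when it is unmounted and large
// enough; partitions additionally require IncludeUnmountedPartitions.
func isEligible(d blockDevice, opts eligibilityOptions) bool {
	if d.Mountpoint != "" || d.SizeBytes < opts.MinimumAcceptableSize {
		return false
	}
	if d.Type == "part" {
		return opts.IncludeUnmountedPartitions
	}
	return true
}

func main() {
	devices := []blockDevice{
		{Name: "sdb", Type: "disk", SizeBytes: 100 << 30},
		{Name: "sdc1", Type: "part", SizeBytes: 10 << 30},
		{Name: "sda1", Type: "part", Mountpoint: "/", SizeBytes: 50 << 30},
	}
	opts := eligibilityOptions{MinimumAcceptableSize: 20 << 30}
	for _, d := range devices {
		fmt.Printf("%s eligible: %v\n", d.Name, isEligible(d, opts))
	}
}
```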
Ethan Mosbaugh
09d16ff185 Host preflights exclude 2021-03-01 22:45:16 +00:00
Ethan Mosbaugh
d6acd6d906 Conditional Analyzers 2021-02-26 04:33:11 +00:00
Andrew Reed
87b4c12274 Analyze TLS certificate 2021-02-19 20:55:16 +00:00
Andrew Reed
989d5f7dbd Analyze fs write performance
The included example found a P99 latency of 2.6ms.
Fio reported a P99 latency of 2.5ms with this command:
fio --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/etcd --size=220m --bs=2300
2021-02-17 23:20:38 +00:00
Andrew Reed
6498c34da5 Analyze ipv4 interfaces
Co-authored-by: Andrew Lavery <laverya@umich.edu>
2021-02-15 20:54:53 +00:00
Andrew Reed
b0a005796c Merge pull request #317 from areed/host-remote-port
Analyze TCP connection
2021-02-15 15:18:11 -05:00
Andrew Reed
450d7570eb Analyze HTTP load balancer 2021-02-15 17:22:42 +00:00
Andrew Reed
40af0f8a9c Analyze TCP connection 2021-02-12 21:45:57 +00:00
divolgin
105a718bbb log not logs is the subresource 2021-02-12 18:38:46 +00:00
divolgin
ba22fe9f22 Fix can-i checks 2021-02-12 17:27:48 +00:00
Andrew Reed
0bcd5183f5 Analyze block devices 2021-02-11 19:19:45 +00:00
Andrew Reed
9984fe2caa Get time info from timedated 2021-02-10 20:01:15 +00:00
Andrew Reed
f25149f45c Host HTTP request analyzer 2021-02-09 20:31:28 +00:00
Andrew Reed
10a34c2e58 Host preflight (#311)
* Add HostPreflight v1beta2

* Work on TCP Load Balancer

* Host disk usage collector and analyzer

* Host memory analyzer

* TCP port status

* TCP load balancer

* Review changes

Co-authored-by: Marc Campbell <marc.e.campbell@gmail.com>
2021-02-08 16:09:01 -05:00
Ethan Mosbaugh
1e8e20a295 Ceph collector does not need a name property 2020-11-13 21:01:02 +00:00
emosbaugh
2bf19eaddf Ceph collectors and analyzers (#295)
* Ceph collectors and analyzers

* updating based on prior pr

* fixes

* fixes
2020-11-13 09:12:42 -08:00
divolgin
9e669b3f13 Remove RRD dependency and switch to dynamic linking 2020-11-12 19:13:25 +00:00
divolgin
5a1321da02 Collector and analyzer for RRD data 2020-11-10 17:19:17 +00:00
Matias Manavella
f0d9418e21 parseTimeFlag function and error handling added 2020-10-22 12:26:29 -03:00
Matias Manavella
1ba3840ad9 parseTimeFlag function and error handling added 2020-10-22 11:37:52 -03:00
Matias Manavella
7186b75f7e --since flag added 2020-10-21 09:51:52 -03:00
Matias Manavella
e16eabd531 added flag --since-time 2020-10-19 16:53:13 -03:00
Matias Manavella
4b2e7e153e add or create ImagePullSecret 2020-09-04 12:41:47 -03:00
Marc Campbell
949f0f2213 Fix statefulset status 2020-09-03 13:48:25 -07:00
divolgin
a0ce85ae1e Adding troubleshoot.sh/v1beta2 2020-09-01 19:57:11 +00:00
Matias Manavella
b1e251ed03 selector:MatchLabel (#249)
* selector:MatchLabel

* selector:MatchLabel
2020-08-31 09:46:04 -07:00
Andrew Lavery
6e874483b6 update yaml key name, fixup example troubleshoot 2020-06-15 13:55:07 -04:00
Andrew Lavery
afd2ee95ca Merge remote-tracking branch 'origin/master' into laverya/redactor-reformat 2020-06-15 11:21:37 -04:00
Andrew Lavery
e66d12dab2 combine multiline and single line regex 2020-06-15 11:15:50 -04:00
Marc Campbell
65f957db81 Refactor to support K8s 1.18 2020-06-12 09:28:49 -07:00
Andrew Lavery
a561254756 break apart redactor type 2020-06-09 18:43:44 -04:00
Andrew Lavery
fb2f028fb5 add methods to get and clear redactions 2020-05-20 10:28:46 -04:00