171 Commits

Author SHA1 Message Date
divolgin
afa08e5362 Analyzers should not return multiple results 2021-09-22 22:50:38 +00:00
Salah Aldeen Al Saleh
0c7fede7b6 check for nil analyzers (#421) 2021-09-21 12:12:10 -07:00
Andrew Reed
91eb94baaa Weave report analyzers
The IPAM pool analyzer checks that utilization of the pod IP subnet is
less than 85%. For example, if using 10.32.0.0/12, this analyzer will
warn if 3,482 IPs are currently allocated to pods.
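
The 85% check described above can be sketched as follows. This is a minimal illustration, not the analyzer's actual code: the function names are hypothetical, and it assumes the pool size is simply the number of addresses in the CIDR range. Under that assumption the threshold scales with the range size: 3,482 is the count at which a 4,096-address range (a /20) reaches 85%, while a /12 holds 1,048,576 addresses.

```python
import ipaddress

# Hypothetical helper names; the pool size is assumed to be the full
# address count of the CIDR range.
MAX_UTILIZATION_PERCENT = 85


def pool_size(cidr: str) -> int:
    """Number of addresses in the given IPv4 range."""
    return ipaddress.ip_network(cidr).num_addresses


def ipam_pool_ok(cidr: str, allocated: int) -> bool:
    """True while utilization is strictly below 85% (integer math, no rounding)."""
    return allocated * 100 < pool_size(cidr) * MAX_UTILIZATION_PERCENT


def warn_threshold(cidr: str) -> int:
    """Smallest allocated count at which the check stops passing."""
    total = pool_size(cidr)
    # Ceiling division of total * 85 / 100.
    return -(-total * MAX_UTILIZATION_PERCENT // 100)
```

Calling `warn_threshold` with the pod subnet gives the allocation count at which a warning would first appear under these assumptions.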

The pending allocation analyzer checks that the IPAM status in the
report has no items for the PendingAllocates field. Any items in that
field indicate the IPAM service is not ready, according to the code in
the weave status template
e3712152d2/prog/weaver/http.go (L186).

The weave connections analyzer checks that all connections to remote
peers are in the established state. The state will be "pending" if UDP
is blocked between nodes and will be "failed" if the weave pod on the
remote node is in a crash loop. To force a pending state for testing,
run the commands `iptables -A INPUT -p udp --dport 6784 -j REJECT` and
`iptables -A INPUT -p udp --dport 6783 -j REJECT` on a peer.

The weave connections analyzer also checks that all connections are
using the fastdp protocol. A common issue seen in the field on
CentOS/RHEL 7 is that some sides of a connection are using fastdp and
other sides have fallen back to sleeve. Set the WEAVE_NO_FASTDP env var
on the weave daemonset to "true" to test this analyzer.
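
The two connection checks above (state and protocol) might be combined roughly as in this sketch. The `Connection` type and its field names are hypothetical and do not reflect the real weave report schema; only the state names and the fastdp/sleeve distinction come from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Connection:
    # Hypothetical fields, not the actual weave report layout.
    peer: str
    state: str      # "established", "pending", or "failed"
    protocol: str   # "fastdp" or "sleeve"


def analyze_connections(connections: List[Connection]) -> List[str]:
    """Return one message per problem connection; an empty list means healthy."""
    problems = []
    for conn in connections:
        if conn.state != "established":
            # "pending" suggests blocked UDP; "failed" suggests a crashing remote pod.
            problems.append(f"connection to {conn.peer} is {conn.state}")
        elif conn.protocol != "fastdp":
            problems.append(
                f"connection to {conn.peer} uses {conn.protocol} instead of fastdp"
            )
    return problems
```

A connection that is not established is reported only for its state here, since the protocol of an unestablished connection is not meaningful; that ordering is a design choice of this sketch.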
2021-09-08 21:29:38 +00:00
Salah Aldeen Al Saleh
c7af0dc593 fix openshift cluster detection (#408) 2021-08-24 09:51:12 -07:00
John Murphy
fd3b32293c default result only when no other result exists (#398) 2021-07-28 11:19:41 -05:00
Kyle Sorensen
2977f8f0d3 Stop longhorn false positives on no results. (#397)
The Longhorn analyzer no longer reports positive results when there are no results
2021-07-28 09:37:54 -06:00
Joris 'Josh' De Winne
6349ae8aee Adding support for inverted regex (#370) 2021-07-26 13:06:30 -04:00
emosbaugh
8dcfa9886d Copy from host collector (#391)
* Copy from host collector

* namespace improvements

* better support for multiple nodes
2021-07-22 12:25:59 -07:00
John Murphy
6007f15253 fixed issue where warnings are disseminated along with passes (#390) 2021-07-22 08:27:39 -05:00
Andrew Lavery
6a0fb2e19c greatly improve coverage by adding regex group tests 2021-07-20 19:15:09 -04:00
Andrew Lavery
6861660460 simplify the text analyze code by combining with compareRegex code 2021-07-20 18:43:09 -04:00
emosbaugh
39350b5722 ConfigMap collector and secrets can be collected by selectors (#384)
* ConfigMap collector and secrets can be collected by selectors

* follow docs

* Pass context and kubernetes client to collectors

* collect tests

* analyze tests

* fix tests

* improvements
2021-07-08 16:30:26 -07:00
Andrew Reed
c95dc489a2 Accumulate all longhorn pass results
If there are any error or warning results then return those. Otherwise
return a single healthy pass result.
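
The rule above can be sketched like this. The result shape (a dict with `status` and `message` keys) and the function name are hypothetical, and returning nothing for empty input is an assumption borrowed from the later fix listed above (2977f8f0d3), not from this commit.

```python
from typing import Dict, List

Result = Dict[str, str]  # hypothetical shape: {"status": ..., "message": ...}


def simplify_longhorn_results(results: List[Result]) -> List[Result]:
    """Return all fail/warn results, or one healthy pass if there are none."""
    if not results:
        # Assumption: no input results means nothing to report.
        return []
    problems = [r for r in results if r["status"] in ("fail", "warn")]
    if problems:
        return problems
    # Every result passed, so collapse them into a single pass result.
    return [{"status": "pass", "message": "Longhorn is healthy"}]
```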
2021-07-08 18:25:10 +00:00
John Murphy
c119a16235 Fixed bugs introduced by handling multiple results in host preflights (#383)
Fixed a bug caused by host preflights not handling empty when clauses. It surfaced because we now handle multiple host preflight results. Also expanded test coverage and added an integration test script.
2021-07-08 11:08:53 -05:00
John Murphy
d730e6cad6 fixed tests 2021-07-06 08:42:12 -05:00
John Murphy
7e32de464a implemented code review suggestion 2021-07-06 08:42:12 -05:00
John Murphy
ae4c07027b host preflights can produce multiple results 2021-07-06 08:42:12 -05:00
Andrew Reed
cb3925a0af Longhorn replica corruption analyzer
This automates the procedure from
https://longhorn.io/docs/1.1.1/advanced-resources/data-recovery/corrupted-replica/
2021-06-22 21:55:12 +00:00
Andrew Reed
e1bccd74b5 Analyze longhorn engine 2021-05-27 21:37:39 +00:00
Andrew Reed
0d5f17de3c Analyze longhorn replica 2021-05-27 19:44:52 +00:00
Andrew Reed
646f7a6991 Longhorn collector for all CRDs
Also implement a single analyzer as a proof of concept. More analyzers
can be added using the collected CRDs.
2021-05-26 23:37:15 +00:00
Jelena
c43da65afe More analyzer types checks 2021-04-15 14:30:20 +00:00
jgruica
dd2c2f84e6 Merge pull request #352 from replicatedhq/jelena-analyze-supportbundle
Analyze kind support bundle
2021-04-14 14:15:24 -07:00
Andrew Reed
7d7e3c2664 Remove html escaping in fs performance analyzer 2021-04-13 19:35:11 +00:00
Jelena
a2f4041a1b Analyze kind support bundle 2021-04-12 23:50:15 +00:00
Andrew Reed
477cde7228 Benchmark write latency with background IOPS
Add a background IOPS feature to the filesystem performance collector
that specifies separate read and write background IOPS to perform while
measuring latency. This allows for better assessment of whether etcd
will be stable when running alongside other workloads on the same
cluster.

Also add templating to the outcome message of the filesystem performance
analyzers to allow printing individual latency percentiles or the entire
table.

Remove the random IOPS benchmark since it was attempting to perform
unaligned direct I/O.
2021-04-12 22:56:00 +00:00
divolgin
fe414af556 Docker registry collector/analyzer 2021-04-09 16:17:15 +00:00
Andrew Lavery
19aef8a02f expand systemctl service analyzer to also match service sub/load 2021-04-02 14:48:24 -04:00
Andrew Lavery
559e18d996 lowercase errors 2021-03-30 16:32:19 -04:00
Andrew Lavery
bf4d26acd2 add host_services analyzer 2021-03-30 16:15:18 -04:00
Andrew Lavery
256c68feca added two parameters to the eligible block device check
The parameters are whether to accept unmounted partitions (default false) and the minimum acceptable device size (default 0).
2021-03-18 19:03:39 -04:00
Ethan Mosbaugh
4b78c430ca Host preflight ux improvements 2021-03-02 17:27:01 +00:00
Ethan Mosbaugh
09d16ff185 Host preflights exclude 2021-03-01 22:45:16 +00:00
Andrew Lavery
47f7d98907 add a test that uses a case-insensitive regex analyzer 2021-03-01 13:02:30 -05:00
Andrew Reed
87b4c12274 Analyze TLS certificate 2021-02-19 20:55:16 +00:00
Dan Stough
7647c039e9 Merge pull request #325 from replicatedhq/feat/rke3-k3s-anaylzer
feat(analyzer): rke2 and k3s distro support
2021-02-19 14:52:22 -05:00
Dan Stough
c26824a619 feat(analyzer): rke2 and k3s distro support 2021-02-19 19:06:02 +00:00
Andrew Reed
b418334a46 Analyze random read IOPS for a directory
The random IOPS benchmark attempts to replicate the results of this
fio command:

fio --ioengine=psync --direct=1 --bs=4k --size=1Gi --readwrite=randread --serialize_overlap=1

Across three tests the fio command reported 1877 IOPS and the preflight
1822 IOPS with the same block and file size.
2021-02-18 23:56:51 +00:00
Andrew Reed
989d5f7dbd Analyze fs write performance
The included example found P99 latency of 2.6ms.
Fio reported P99 latency of 2.5ms with this command:
fio --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/etcd
--size=220m --bs=2300
2021-02-17 23:20:38 +00:00
Andrew Reed
6498c34da5 Analyze ipv4 interfaces
Co-authored-by: Andrew Lavery <laverya@umich.edu>
2021-02-15 20:54:53 +00:00
Andrew Reed
b0a005796c Merge pull request #317 from areed/host-remote-port
Analyze TCP connection
2021-02-15 15:18:11 -05:00
Andrew Reed
450d7570eb Analyze HTTP load balancer 2021-02-15 17:22:42 +00:00
Andrew Reed
40af0f8a9c Analyze TCP connection 2021-02-12 21:45:57 +00:00
Andrew Reed
0bcd5183f5 Analyze block devices 2021-02-11 19:19:45 +00:00
Andrew Reed
9984fe2caa Get time info from timedated 2021-02-10 20:01:15 +00:00
Andrew Reed
f25149f45c Host HTTP request analyzer 2021-02-09 20:31:28 +00:00
Andrew Reed
10a34c2e58 Host preflight (#311)
* Add HostPreflight v1beta2

* Work on TCP Load Balancer

* Host disk usage collector and analyzer

* Host memory analyzer

* TCP port status

* TCP load balancer

* Review changes

Co-authored-by: Marc Campbell <marc.e.campbell@gmail.com>
2021-02-08 16:09:01 -05:00
Marc Campbell
c7fdec0291 Removing Scopeagent 2021-01-28 18:22:48 +00:00
Andrew Lavery
f19ac98055 add tests for analyzeNodeResources, including label filtering 2020-12-03 19:19:57 -05:00
Andrew Lavery
55fbd92673 not finding a label is not an error 2020-12-03 19:17:47 -05:00