## Description:
Currently, when we install a k8s cluster with kURL and run `kubectl preflight https://preflight.replicated.com`, it fails:
> ------------
> Check FAIL
> Title: Ingress
> Message: Contour ingress not found!
Therefore, Contour ingress does not seem to be a prerequisite for kURL. So, should we have this check in the default example/test?
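For reference, a check like this one would typically be expressed with the `ingress` analyzer in the preflight spec. A minimal sketch of what the hosted spec presumably contains; the namespace and ingress name below are assumptions for illustration:
```
apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: example
spec:
  analyzers:
    - ingress:
        namespace: projectcontour   # assumed namespace
        ingressName: contour        # assumed ingress name
        outcomes:
          - fail:
              message: Contour ingress not found!
          - pass:
              message: Contour ingress found
```
Dropping the check would then just mean removing this analyzer from the default spec.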
* collector/analyzer for host operating system
* address cr comments
* cleanup
* fix invoking the analyzer
* code cleanup
* fix cr comments
* add corner case unit-test
* fix kernel version parsing
* address review comments
* add default case
* parse using regex
* added more test cases and fixed the bug found in cr
* few small things
* Add collect command and remote host collectors
Adds the ability to run a host collector on a set of remote k8s nodes.
Target nodes can be filtered using the --selector flag, with the same
syntax as kubectl. Existing flags for --collector-image,
--collector-pullpolicy and --request-timeout are used. To run on a
specified node, --selector="kubernetes.io/hostname=kind-worker2" could
be used.
The collect command is used by the remote collector to output the
results in a "raw" format, which uses the filename as the key and the
output, as an escaped JSON string, as the value. When run manually it
defaults to fully decoded JSON. The existing block devices,
ipv4interfaces and services host collectors don't decode properly; the
fix is to convert their slice output to a map (not included here, as
it's unclear what depends on the existing format).
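As a sketch of the raw format described above, a single node's results would key each entry by filename, with the collector output carried as an escaped JSON string (value taken from the example below):
```
{
  "system/memory.json": "{\"total\":1304207360}"
}
```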
The collect command is also useful for troubleshooting preflight issues.
Examples are included to show remote collector usage.
```
bin/collect --collector-image=croomes/troubleshoot:latest examples/collect/remote/memory.yaml --namespace test
{
  "kind-control-plane": {
    "system/memory.json": {
      "total": 1304207360
    }
  },
  "kind-worker": {
    "system/memory.json": {
      "total": 1695780864
    }
  },
  "kind-worker2": {
    "system/memory.json": {
      "total": 1726353408
    }
  }
}
```
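To collect from a single node instead, the same command can be combined with the --selector flag described above:
```
bin/collect --collector-image=croomes/troubleshoot:latest \
  --selector="kubernetes.io/hostname=kind-worker2" \
  examples/collect/remote/memory.yaml --namespace test
```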
The preflight command has been updated to run remote collectors. To run
a host collector remotely it must be specified in the spec as a
`remoteCollector`:
```
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: memory
spec:
  remoteCollectors:
    - memory:
        collectorName: memory
  analyzers:
    - memory:
        outcomes:
          - fail:
              when: "< 8Gi"
              message: At least 8Gi of memory is required
          - warn:
              when: "< 32Gi"
              message: At least 32Gi of memory is recommended
          - pass:
              message: The system has sufficient memory
```
Results for each node are analyzed separately, with the node name
appended to the title:
```
bin/preflight --interactive=false --collector-image=croomes/troubleshoot:latest examples/preflight/remote/memory.yaml --format=json
{memory running 0 1}
{memory completed 1 1}
{
  "fail": [
    {
      "title": "Amount of Memory (kind-worker2)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-worker)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-control-plane)",
      "message": "At least 8Gi of memory is required"
    }
  ]
}
```
Also added a host collector to allow preflight checks of required kernel
modules, which is the main driver for this change.
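A minimal sketch of such a check; the kernel modules collector takes no required options, but the analyzer's `when` syntax shown here is an assumption for illustration, not confirmed API:
```
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: kernel-modules
spec:
  remoteCollectors:
    - kernelModules: {}
  analyzers:
    - kernelModules:
        outcomes:
          - fail:
              when: "overlay != loaded"   # assumed condition syntax
              message: The overlay kernel module is required
          - pass:
              message: Required kernel modules are loaded
```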
The IPAM pool analyzer checks that utilization of the pod IP subnet is
less than 85%. For example, 10.32.0.0/12 contains 2^20 = 1,048,576
addresses, so this analyzer will warn once more than roughly 891,290
IPs (85%) are allocated to pods.
The pending allocation analyzer checks that the IPAM status in the
report has no items in the PendingAllocates field. A non-empty list
indicates the IPAM service is not ready, according to the weave status
template (e3712152d2, prog/weaver/http.go, line 186).
The weave connections analyzer checks that all connections to remote
peers are in the established state. The state will be "pending" if UDP
is blocked between nodes and will be "failed" if the weave pod on the
remote node is in a crash loop. To force a pending state for testing,
run the commands `iptables -A INPUT -p udp --dport 6784 -j REJECT` and
`iptables -A INPUT -p udp --dport 6783 -j REJECT` on a peer.
The weave connections analyzer also checks that all connections are
using the fastdp protocol. A common issue seen in the field on
CentOS/RHEL 7 is that one side of a connection uses fastdp while the
other side has fallen back to sleeve. Set the WEAVE_NO_FASTDP env var
on the weave daemonset to "true" to test this analyzer.
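For example, assuming the daemonset is named weave-net in kube-system as in the standard weave manifests, the sleeve fallback can be forced with:
```
kubectl -n kube-system set env daemonset/weave-net WEAVE_NO_FASTDP=true
```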
Fixed a bug caused by host preflights not handling empty `when` clauses; this cropped up because we now handle multiple host preflight results. Also expanded test coverage and added an integration test script.
Add a background IOPS feature to the filesystem performance collector
that specifies separate read and write background IOPS to perform while
measuring latency. This allows for better assessment of whether etcd
will be stable when running alongside other workloads on the same
cluster.
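A sketch of a filesystem performance collector using the feature; the background IOPS field names below are inferred from the description and should be treated as assumptions:
```
collectors:
  - filesystemPerformance:
      collectorName: etcd
      directory: /var/lib/etcd
      fileSize: 220Mi
      operationSizeBytes: 2300
      datasync: true
      backgroundReadIOPS: 100    # assumed field name
      backgroundWriteIOPS: 300   # assumed field name
```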
Also add templating to the outcome message of the filesystem performance
analyzers to allow printing individual latency percentiles or the entire
table.
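For instance, an outcome message might template in a single percentile; the `when` comparison and the `{{ .P99 }}` template variable shown here are assumptions:
```
analyzers:
  - filesystemPerformance:
      collectorName: etcd
      outcomes:
        - pass:
            when: "p99 < 10ms"
            message: "Write latency is acceptable (p99 target < 10ms, actual: {{ .P99 }})"
```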
Remove the random IOPS benchmark since it was attempting to perform
unaligned direct I/O.
The random IOPS benchmark attempts to replicate the results of this
fio command:
`fio --ioengine=psync --direct=1 --bs=4k --size=1Gi --readwrite=randread --serialize_overlap=1`
Across three tests the fio command reported 1877 IOPS and the preflight
collector 1822 IOPS with the same block and file size.
The included example found P99 latency of 2.6ms.
Fio reported P99 latency of 2.5ms with this command:
`fio --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/etcd --size=220m --bs=2300`
Add all supported analyzers to the host preflight sample.
Don't log transient errors while waiting for a TCP connection.
Begin human-readable stdout results on a new line after the spinner.