troubleshoot

mirror of https://github.com/replicatedhq/troubleshoot.git synced 2026-04-15 07:16:34 +00:00

Author	SHA1	Message	Date
Pavan Sokke Nagaraj	e248ab0f97	Fix `strict` flag mapping (#542) * add func BoolOrDefaultFalse and Bool * use strict.BoolOrDefaultFalse * Update pkg/multitype/boolstring.go Co-authored-by: Andrew Lavery <laverya@umich.edu> * Update pkg/multitype/boolstring_test.go Co-authored-by: Andrew Lavery <laverya@umich.edu> * Update pkg/multitype/boolstring_test.go Co-authored-by: Andrew Lavery <laverya@umich.edu> * Update boolstring_test.go * remove duplicate test * Update pkg/multitype/boolstring_test.go Co-authored-by: garcialuis <garcialuisdev@gmail.com> Co-authored-by: Andrew Lavery <laverya@umich.edu> Co-authored-by: garcialuis <garcialuisdev@gmail.com>	2022-02-24 13:31:51 -05:00
Pavan Sokke Nagaraj	942234da80	Add `strict` flag to Analyzers and ResultAnalyzers (#539) * add strict flag to Analyzer/AnalyzerMeta and regenerate schemas and controller-gen code * map analyzer strict to result * Update stdout for human and json format * fix review comment * update interactive result * update interactive results * Update types.go * Update upload_results.go * print strict when only true	2022-02-23 15:07:51 -05:00
divolgin	3351c289ab	Add GVK to k8s objects in cluster-resources files	2022-02-04 01:31:07 +00:00
divolgin	007edd1181	Allow specifying namespaces when analyzing cluster resources	2021-12-17 21:47:06 +00:00
divolgin	3cedbe16a7	Organize test files by type and namespace	2021-12-17 19:23:54 +00:00
Salah Aldeen Al Saleh	4c72573936	os minor should default to 0 (#513)	2021-12-10 13:17:36 -08:00
Salah Aldeen Al Saleh	d1f341b8ed	host system packages collector/analyzer (#506) * host system packages collector/analyzer	2021-12-10 12:05:21 -08:00
Ethan Mosbaugh	fba0f97225	found not ound	2021-11-30 20:12:29 +00:00
Ethan Mosbaugh	4d0eaf471f	crd not storageClass	2021-11-30 20:12:09 +00:00
divolgin	739ee666af	Allow text analyzer to not generate an error if no files match	2021-10-29 17:52:59 +00:00
divolgin	742ddc8c06	Ensure outcomes are optional in every case	2021-10-29 00:23:32 +00:00
divolgin	7cb6d90a39	replicaset analyzer supports label selectors	2021-10-28 22:06:15 +00:00
Sean Rester	5d9f14fde5	Merge pull request #474 from replicatedhq/add-node-status-check 38798: Adding node status check	2021-10-28 17:52:18 -04:00
Salah Aldeen Al Saleh	45dd980012	update cluster pod analyzers comment (#475)	2021-10-28 10:31:59 -07:00
Salah Aldeen Al Saleh	e100e7c478	get container logs for unhealthy pods (#469) * get container logs for unhealthy pods Co-authored-by: divolgin <dmitriy@replicated.com> Co-authored-by: divolgin <divolgin@users.noreply.github.com>	2021-10-28 09:21:14 -07:00
Sean Rester	1345b200aa	38798: Adding node status check	2021-10-28 11:16:26 -04:00
divolgin	e7daba9d0c	Merge pull request #470 from replicatedhq/divolgin/analyzers Replicaset collector and analyzer	2021-10-27 13:51:42 -07:00
divolgin	ada35eb31c	Replicaset collector and analyzer	2021-10-27 20:24:14 +00:00
Salah Aldeen Al Saleh	f2374cf113	add involved object to clusterPodStatuses analyzer result (#459) * cluster pod statuses analyzer involved object	2021-10-27 12:18:49 -07:00
divolgin	1cdfd96768	Jobs status analyzer	2021-10-26 23:41:02 +00:00
divolgin	f108c3ca57	Analyze all deployments in all namespaces	2021-10-26 21:36:27 +00:00
divolgin	34724e7932	Ability to analyze all statefulsets	2021-10-26 20:51:45 +00:00
Salah Aldeen Al Saleh	26402a7b04	cluster pod statuses analyzer improvements (#458) * add pod status reason to cluster pod statuses analyzer	2021-10-26 08:42:40 -07:00
Salah Aldeen Al Saleh	3d1d53ee9d	ClusterPodStatuses analyzer (#456) * ClusterPodStatuses analyzer Co-authored-by: divolgin <dmitriy@replicated.com>	2021-10-25 17:44:59 -07:00
Andrew Reed	7b36e6a1f8	Copy in longhorn client (#454)	2021-10-22 15:24:07 -05:00
Jalaja Ganapathy	372454651e	collector/analyzer for host operating system (#443) * collector/analyzer for host operating system * address cr comments * cleanup * fix invoking the analyzer code cleanup * fix cr comments * add corner case unit-test * fix kernel version parsing * address review comments * add default case * parse using regex * added more testcases and fixed the bug found in cr * few small things	2021-10-12 14:42:23 -07:00
divolgin	e095a7838f	Check nil pointers	2021-10-12 16:10:02 +00:00
Vera Harless	73609c4fef	feat: add more detail to the ceph analyzer output (#445)	2021-10-06 11:22:56 -04:00
Simon Croome	977fc438ea	Remote host collectors (#392)

* Add collect command and remote host collectors

Adds the ability to run a host collector on a set of remote k8s nodes. Target nodes can be filtered using the --selector flag, with the same syntax as kubectl. Existing flags for --collector-image, --collector-pullpolicy and --request-timeout are used. To run on a specified node, --selector="kubernetes.io/hostname=kind-worker2" could be used.

The collect command is used by the remote collector to output the results using a "raw" format, which uses the filename as the key, and the value the output as a escaped json string. When run manually it defaults to fully decoded json. The existing block devices, ipv4interfaces and services host collectors don't decode properly - the fix is to convert their slice output to a map (fix not included as unsure what depends on the existing format).

The collect command is also useful for troubleshooting preflight issues. Examples are included to show remote collector usage.

```
bin/collect --collector-image=croomes/troubleshoot:latest examples/collect/remote/memory.yaml --namespace test
{
  "kind-control-plane": {
    "system/memory.json": {
      "total": 1304207360
    }
  },
  "kind-worker": {
    "system/memory.json": {
      "total": 1695780864
    }
  },
  "kind-worker2": {
    "system/memory.json": {
      "total": 1726353408
    }
  }
}
```

The preflight command has been updated to run remote collectors. To run a host collector remotely it must be specified in the spec as a `remoteCollector`:

```
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: memory
spec:
  remoteCollectors:
    - memory:
        collectorName: memory
  analyzers:
    - memory:
        outcomes:
          - fail:
              when: "< 8Gi"
              message: At least 8Gi of memory is required
          - warn:
              when: "< 32Gi"
              message: At least 32Gi of memory is recommended
          - pass:
              message: The system has as sufficient memory
```

Results for each node are analyzed separately, with the node name appended to the title:

```
bin/preflight --interactive=false --collector-image=croomes/troubleshoot:latest examples/preflight/remote/memory.yaml --format=json
{memory running 0 1}
{memory completed 1 1}
{
  "fail": [
    {
      "title": "Amount of Memory (kind-worker2)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-worker)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-control-plane)",
      "message": "At least 8Gi of memory is required"
    }
  ]
}
```

Also added a host collector to allow preflight checks of required kernel modules, which is the main driver for this change.	2021-10-06 09:03:53 -05:00
Andrew Reed	4d52760d35	Collector and analyzer for sysctl parameters (#441) Collector and analyzer for sysctl parameters	2021-10-01 13:43:26 -05:00
divolgin	afa08e5362	Analyzers should not return multiple results	2021-09-22 22:50:38 +00:00
Salah Aldeen Al Saleh	0c7fede7b6	check for nil analyzers (#421)	2021-09-21 12:12:10 -07:00
Andrew Reed	91eb94baaa	Weave report analyzers The IPAM pool analyzer checks that utilization of the pod IP subnet is less than 85%. For example, if using 10.32.0.0/12, this analyzer will warn if 3,482 IPs are currently allocated to pods. The pending allocation analyzer checks that the IPAM status in the report has no items for the PendingAllocates field. This indicates the IPAM service is not ready according to the code in the weave status template `e3712152d2/prog/weaver/http.go (L186)`. The weave connections analyzer checks that all connections to remote peers are in the established state. The state will be "pending" if UDP is blocked between nodes and will be "failed" if the weave pod on the remote node is in a crash loop. To force a pending state for testing, run the commands `iptables -A INPUT -p udp --dport 6784 -j REJECT` and `iptables -A INPUT -p udp --dport 6783 -j REJECT` on a peer. The weave connections analyzer also checks that all connections are using the fastdp protocol. A common issue seen in the field on CentOS/RHEL 7 is that some sides of a connection are using fastdp and other sides have fallen back to sleeve. Set the WEAVE_NO_FASTDP env var on the weave daemonset to "true" to test this analyzer.	2021-09-08 21:29:38 +00:00
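The 85% IPAM utilization check in the commit above can be sketched as a small calculation. This is only an illustration, not the analyzer's code: `poolSize` and `warnThreshold` are hypothetical helper names, and rounding up to the first allocation count that reaches 85% is an assumption. Note that the 3,482 figure quoted in the message corresponds to a 4,096-address pool (a /20); for a /12 the same rule gives a much larger count.

```go
package main

import (
	"fmt"
	"net"
)

// poolSize returns the number of addresses in an IPv4 CIDR range.
// Hypothetical helper for illustration; not part of troubleshoot.
func poolSize(cidr string) (int, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return 0, err
	}
	ones, bits := ipnet.Mask.Size()
	return 1 << (bits - ones), nil
}

// warnThreshold returns the smallest allocated-IP count that reaches
// the given utilization percentage of the pool (integer ceiling).
// The rounding rule is an assumption made for this sketch.
func warnThreshold(size, percent int) int {
	return (size*percent + 99) / 100
}

func main() {
	for _, cidr := range []string{"10.32.0.0/12", "10.32.0.0/20"} {
		size, err := poolSize(cidr)
		if err != nil {
			fmt.Println("invalid CIDR:", err)
			continue
		}
		fmt.Printf("%s: %d addresses, warn at %d allocated\n",
			cidr, size, warnThreshold(size, 85))
	}
}
```

Under these assumptions the /20 pool reaches the 85% mark at 3,482 allocated addresses, while the /12 pool has 1,048,576 addresses and reaches it at 891,290.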
Salah Aldeen Al Saleh	c7af0dc593	fix openshift cluster detection (#408)	2021-08-24 09:51:12 -07:00
John Murphy	fd3b32293c	default result only when no other result exists (#398)	2021-07-28 11:19:41 -05:00
Kyle Sorensen	2977f8f0d3	Stop longhorn false positives on no results. (#397) Longhorn analyzer no longer report positive results on no results	2021-07-28 09:37:54 -06:00
Joris 'Josh' De Winne	6349ae8aee	Adding support for inverted regex (#370)	2021-07-26 13:06:30 -04:00
emosbaugh	8dcfa9886d	Copy from host collector (#391) * Copy from host collector * namespace improvements * better support for multiple nodes	2021-07-22 12:25:59 -07:00
John Murphy	6007f15253	fixed issue where warnings are disseminated along with passes (#390)	2021-07-22 08:27:39 -05:00
Andrew Lavery	6a0fb2e19c	greatly improve coverage by adding regex group tests	2021-07-20 19:15:09 -04:00
Andrew Lavery	6861660460	simplify the text analyze code by combining with compareRegex code	2021-07-20 18:43:09 -04:00
emosbaugh	39350b5722	ConfigMap collector and secrets can be collected by selectors (#384) * ConfigMap collector and secrets can be collected by selectors * follow docs * Pass context and kubernetes client to collectors * collect tests * analyze tests * fix tests * improvements	2021-07-08 16:30:26 -07:00
Andrew Reed	c95dc489a2	Accumulate all longhorn pass results If there are any error or warning results then return those. Otherwise return a single healthy pass result.	2021-07-08 18:25:10 +00:00
John Murphy	c119a16235	Fixed bugs introduced by handling multiple results in host preflights (#383) Fixed bug caused by host preflights not handling empty when clauses, this cropped up because we now handle multiple host preflight results. Also expanded test coverage and added integration test script.	2021-07-08 11:08:53 -05:00
John Murphy	d730e6cad6	fixed tests	2021-07-06 08:42:12 -05:00
John Murphy	7e32de464a	implemented code review suggestion	2021-07-06 08:42:12 -05:00
John Murphy	ae4c07027b	host preflights can produce multiple results	2021-07-06 08:42:12 -05:00
Andrew Reed	cb3925a0af	Longhorn replica corruption analyzer This automates the procedure from https://longhorn.io/docs/1.1.1/advanced-resources/data-recovery/corrupted-replica/	2021-06-22 21:55:12 +00:00
Andrew Reed	e1bccd74b5	Analyze longhorn engine	2021-05-27 21:37:39 +00:00
Andrew Reed	0d5f17de3c	Analyze longhorn replica	2021-05-27 19:44:52 +00:00

201 Commits