
212 Commits

Author SHA1 Message Date
Pavan Sokke Nagaraj
750ebc1277 return nil when no matching analyzer result 2022-05-27 14:33:35 -04:00
Ethan Mosbaugh
74b4802b46 Add support for k8s 1.24+ 2022-05-24 11:05:59 -07:00
Ethan Mosbaugh
84b40804b5 filter nil analyzer results to prevent panic 2022-05-23 19:43:43 +00:00
diamonwiggins
cccc9f23a1 adding new test case for ceph analyzers 2022-05-13 23:39:03 +00:00
diamonwiggins
ff4353817f adding ceph health status messages to analyzer result 2022-05-13 22:27:19 +00:00
diamonwiggins
8c62aadcfc using subdirectory for all host collectors in support bundle 2022-05-12 16:46:24 +00:00
diamonwiggins
e7f2685ed8 fix output file for diskUsage collector 2022-05-12 04:20:24 +00:00
diamonwiggins
3b1ba08a6b hardcoding system hostcollector filenames 2022-05-12 03:39:19 +00:00
diamonwiggins
17fe3db79f adding host collectors to support bundles 2022-05-11 22:50:03 +00:00
Ethan Mosbaugh
2c9a37a4f1 BoolOrString pollutes marshalling, does not respect omitempty (#566)
* BoolOrString pollutes marshalling, does not respect omitempty

* fix panic
2022-05-05 16:10:05 -07:00
Andrew Lavery
7eb1d5a5fb better parse namespaces, and parse legacy list-of-pods storage format 2022-03-25 22:13:52 -04:00
Pavan Sokke Nagaraj
e248ab0f97 Fix strict flag mapping (#542)
* add func BoolOrDefaultFalse and Bool

* use strict.BoolOrDefaultFalse

* Update pkg/multitype/boolstring.go

Co-authored-by: Andrew Lavery <laverya@umich.edu>

* Update pkg/multitype/boolstring_test.go

Co-authored-by: Andrew Lavery <laverya@umich.edu>

* Update pkg/multitype/boolstring_test.go

Co-authored-by: Andrew Lavery <laverya@umich.edu>

* Update boolstring_test.go

* remove duplicate test

* Update pkg/multitype/boolstring_test.go

Co-authored-by: garcialuis <garcialuisdev@gmail.com>

Co-authored-by: Andrew Lavery <laverya@umich.edu>
Co-authored-by: garcialuis <garcialuisdev@gmail.com>
2022-02-24 13:31:51 -05:00
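
#542 adds a `BoolOrDefaultFalse` helper. Below is a minimal sketch of what such a helper could look like. It assumes a union that holds either a bool or a string, and that a nil value or an unparseable string should map to false. The `Type` constants and field names are hypothetical. `strconv.ParseBool` is the standard library parser.

```go
package main

import (
	"fmt"
	"strconv"
)

// Type records which member of the union is set (hypothetical).
type Type int

const (
	Bool Type = iota
	String
)

// BoolOrString is a hypothetical bool-or-string union; field names are
// assumptions, not the multitype package's actual definition.
type BoolOrString struct {
	Type    Type
	BoolVal bool
	StrVal  string
}

// BoolOrDefaultFalse returns the boolean value, treating a nil receiver
// or a string that strconv.ParseBool rejects as false.
func (b *BoolOrString) BoolOrDefaultFalse() bool {
	if b == nil {
		return false
	}
	if b.Type == Bool {
		return b.BoolVal
	}
	v, err := strconv.ParseBool(b.StrVal)
	if err != nil {
		return false
	}
	return v
}

func main() {
	var unset *BoolOrString
	fmt.Println(unset.BoolOrDefaultFalse())
	fmt.Println((&BoolOrString{Type: String, StrVal: "true"}).BoolOrDefaultFalse())
	fmt.Println((&BoolOrString{Type: String, StrVal: "maybe"}).BoolOrDefaultFalse())
}
```

Calling the method on a nil pointer is safe because the receiver is checked before any field access.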
Pavan Sokke Nagaraj
942234da80 Add strict flag to Analyzers and ResultAnalyzers (#539)
* add strict flag to Analyzer/AnalyzerMeta

and regenerate schemas and controller-gen code

* map analyzer strict to result

* Update stdout for human and json format

* fix review comment

* update interactive result

* update interactive results

* Update types.go

* Update upload_results.go

* print strict when only true
2022-02-23 15:07:51 -05:00
divolgin
3351c289ab Add GVK to k8s objects in cluster-resources files 2022-02-04 01:31:07 +00:00
divolgin
007edd1181 Allow specifying namespaces when analyzing cluster resources 2021-12-17 21:47:06 +00:00
divolgin
3cedbe16a7 Organize test files by type and namespace 2021-12-17 19:23:54 +00:00
Salah Aldeen Al Saleh
4c72573936 os minor should default to 0 (#513) 2021-12-10 13:17:36 -08:00
Salah Aldeen Al Saleh
d1f341b8ed host system packages collector/analyzer (#506)
* host system packages collector/analyzer
2021-12-10 12:05:21 -08:00
Ethan Mosbaugh
fba0f97225 found not ound 2021-11-30 20:12:29 +00:00
Ethan Mosbaugh
4d0eaf471f crd not storageClass 2021-11-30 20:12:09 +00:00
divolgin
739ee666af Allow text analyzer to not generate an error if no files match 2021-10-29 17:52:59 +00:00
divolgin
742ddc8c06 Ensure outcomes are optional in every case 2021-10-29 00:23:32 +00:00
divolgin
7cb6d90a39 replicaset analyzer supports label selectors 2021-10-28 22:06:15 +00:00
Sean Rester
5d9f14fde5 Merge pull request #474 from replicatedhq/add-node-status-check
38798: Adding node status check
2021-10-28 17:52:18 -04:00
Salah Aldeen Al Saleh
45dd980012 update cluster pod analyzers comment (#475) 2021-10-28 10:31:59 -07:00
Salah Aldeen Al Saleh
e100e7c478 get container logs for unhealthy pods (#469)
* get container logs for unhealthy pods

Co-authored-by: divolgin <dmitriy@replicated.com>
Co-authored-by: divolgin <divolgin@users.noreply.github.com>
2021-10-28 09:21:14 -07:00
Sean Rester
1345b200aa 38798: Adding node status check 2021-10-28 11:16:26 -04:00
divolgin
e7daba9d0c Merge pull request #470 from replicatedhq/divolgin/analyzers
Replicaset collector and analyzer
2021-10-27 13:51:42 -07:00
divolgin
ada35eb31c Replicaset collector and analyzer 2021-10-27 20:24:14 +00:00
Salah Aldeen Al Saleh
f2374cf113 add involved object to clusterPodStatuses analyzer result (#459)
* cluster pod statuses analyzer involved object
2021-10-27 12:18:49 -07:00
divolgin
1cdfd96768 Jobs status analyzer 2021-10-26 23:41:02 +00:00
divolgin
f108c3ca57 Analyze all deployments in all namespaces 2021-10-26 21:36:27 +00:00
divolgin
34724e7932 Ability to analyze all statefulsets 2021-10-26 20:51:45 +00:00
Salah Aldeen Al Saleh
26402a7b04 cluster pod statuses analyzer improvements (#458)
* add pod status reason to cluster pod statuses analyzer
2021-10-26 08:42:40 -07:00
Salah Aldeen Al Saleh
3d1d53ee9d ClusterPodStatuses analyzer (#456)
* ClusterPodStatuses analyzer

Co-authored-by: divolgin <dmitriy@replicated.com>
2021-10-25 17:44:59 -07:00
Andrew Reed
7b36e6a1f8 Copy in longhorn client (#454) 2021-10-22 15:24:07 -05:00
Jalaja Ganapathy
372454651e collector/analyzer for host operating system (#443)
* collector/analyzer for host operating system

* address cr comments

* cleanup

* fix invoking the analyzer
code cleanup

* fix cr comments

* add corner case unit-test

* fix kernel version parsing

* address review comments

* add default case

* parse using regex

* added more testcases and fixed the bug found in cr

* few small things
2021-10-12 14:42:23 -07:00
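
The host operating system commit mentions parsing kernel versions with a regex, and #513 (listed above) says the OS minor version should default to 0. The sketch below illustrates both ideas. It assumes versions begin with a `major[.minor[.patch]]` prefix. The pattern, the `Version` type, and `parseVersion` are illustrative and are not the project's code.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// versionRe captures a leading "major[.minor[.patch]]" prefix and ignores
// any suffix such as "-1034-gcp" (an assumed pattern).
var versionRe = regexp.MustCompile(`^(\d+)(?:\.(\d+))?(?:\.(\d+))?`)

// Version holds numeric components; missing parts default to 0.
type Version struct {
	Major, Minor, Patch int
}

// parseVersion extracts the numeric prefix of s. An absent minor or patch
// component is left at 0.
func parseVersion(s string) (Version, error) {
	m := versionRe.FindStringSubmatch(s)
	if m == nil {
		return Version{}, fmt.Errorf("unrecognized version %q", s)
	}
	var parts [3]int
	for i := 0; i < 3; i++ {
		if m[i+1] == "" {
			continue // optional group did not match; keep 0
		}
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return Version{}, err
		}
		parts[i] = n
	}
	return Version{Major: parts[0], Minor: parts[1], Patch: parts[2]}, nil
}

func main() {
	for _, s := range []string{"5.4.0-1034-gcp", "8", "20.04"} {
		v, err := parseVersion(s)
		fmt.Println(s, v, err)
	}
}
```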
divolgin
e095a7838f Check nil pointers 2021-10-12 16:10:02 +00:00
Vera Harless
73609c4fef feat: add more detail to the ceph analyzer output (#445) 2021-10-06 11:22:56 -04:00
Simon Croome
977fc438ea Remote host collectors (#392)
* Add collect command and remote host collectors

Adds the ability to run a host collector on a set of remote k8s nodes.
Target nodes can be filtered using the --selector flag, with the same
syntax as kubectl.  Existing flags for --collector-image,
--collector-pullpolicy and --request-timeout are used.  To run on a
specified node, --selector="kubernetes.io/hostname=kind-worker2" could
be used.

The collect command is used by the remote collector to output the
results in a "raw" format, which uses the filename as the key and the
output, encoded as an escaped JSON string, as the value. When run
manually, it defaults to fully decoded JSON. The existing block devices,
ipv4interfaces and services host collectors don't decode properly; the
fix is to convert their slice output to a map (not included in this
change, as it is unclear what depends on the existing format).
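
To make the "raw" format concrete, the following sketch decodes it back into nested JSON. It assumes the payload is a flat JSON object that maps each filename to a string which itself contains JSON. The function name `decodeRaw` is hypothetical, and the per-node nesting shown in the remote example output is not handled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeRaw converts a "raw" result, where each value is a JSON document
// encoded as a string, into a map whose values are the embedded JSON.
func decodeRaw(raw []byte) (map[string]json.RawMessage, error) {
	var encoded map[string]string
	if err := json.Unmarshal(raw, &encoded); err != nil {
		return nil, err
	}
	decoded := make(map[string]json.RawMessage, len(encoded))
	for filename, value := range encoded {
		if !json.Valid([]byte(value)) {
			return nil, fmt.Errorf("%s: value is not valid JSON", filename)
		}
		decoded[filename] = json.RawMessage(value)
	}
	return decoded, nil
}

func main() {
	raw := []byte(`{"system/memory.json": "{\"total\": 1304207360}"}`)
	decoded, err := decodeRaw(raw)
	if err != nil {
		panic(err)
	}
	out, _ := json.Marshal(decoded) // RawMessage is embedded, not re-quoted
	fmt.Println(string(out))
}
```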

The collect command is also useful for troubleshooting preflight issues.

Examples are included to show remote collector usage.

```
bin/collect --collector-image=croomes/troubleshoot:latest  examples/collect/remote/memory.yaml --namespace test
{
  "kind-control-plane": {
    "system/memory.json": {
      "total": 1304207360
    }
  },
  "kind-worker": {
    "system/memory.json": {
      "total": 1695780864
    }
  },
  "kind-worker2": {
    "system/memory.json": {
      "total": 1726353408
    }
  }
}
```

The preflight command has been updated to run remote collectors. To run
a host collector remotely, it must be specified in the spec under
`remoteCollectors`:

```
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: memory
spec:
  remoteCollectors:
    - memory:
        collectorName: memory
  analyzers:
    - memory:
        outcomes:
          - fail:
              when: "< 8Gi"
              message: At least 8Gi of memory is required
          - warn:
              when: "< 32Gi"
              message: At least 32Gi of memory is recommended
          - pass:
              message: The system has sufficient memory
```
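
The `when` conditions in this spec compare the collected memory total against a threshold. The sketch below evaluates them under two assumptions: outcomes are checked in order, and the first match wins. It supports only the `Gi` suffix and the `<`, `<=`, `>`, `>=` operators. `Outcome`, `parseGi`, `matches`, and `evaluate` are hypothetical names. With the totals from the collect example (roughly 1.2 to 1.6 GiB), every node falls below 8Gi.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Outcome is a hypothetical, simplified analyzer outcome.
type Outcome struct {
	Kind    string // "fail", "warn" or "pass"
	When    string // e.g. "< 8Gi"; empty means the outcome always matches
	Message string
}

// memoryOutcomes mirrors the outcomes in the spec.
var memoryOutcomes = []Outcome{
	{"fail", "< 8Gi", "At least 8Gi of memory is required"},
	{"warn", "< 32Gi", "At least 32Gi of memory is recommended"},
	{"pass", "", "The system has sufficient memory"},
}

// parseGi converts a quantity such as "8Gi" to bytes (Gi suffix only).
func parseGi(q string) (uint64, error) {
	if !strings.HasSuffix(q, "Gi") {
		return 0, fmt.Errorf("unsupported quantity %q", q)
	}
	n, err := strconv.ParseUint(strings.TrimSuffix(q, "Gi"), 10, 64)
	if err != nil {
		return 0, err
	}
	return n << 30, nil
}

// matches reports whether total satisfies a condition like "< 8Gi".
func matches(when string, total uint64) (bool, error) {
	fields := strings.Fields(when)
	if len(fields) != 2 {
		return false, fmt.Errorf("malformed condition %q", when)
	}
	limit, err := parseGi(fields[1])
	if err != nil {
		return false, err
	}
	switch fields[0] {
	case "<":
		return total < limit, nil
	case "<=":
		return total <= limit, nil
	case ">":
		return total > limit, nil
	case ">=":
		return total >= limit, nil
	}
	return false, fmt.Errorf("unsupported operator %q", fields[0])
}

// evaluate returns the first outcome whose condition matches total.
func evaluate(outcomes []Outcome, total uint64) (Outcome, error) {
	for _, o := range outcomes {
		if o.When == "" {
			return o, nil
		}
		ok, err := matches(o.When, total)
		if err != nil {
			return Outcome{}, err
		}
		if ok {
			return o, nil
		}
	}
	return Outcome{}, fmt.Errorf("no outcome matched")
}

func main() {
	// Memory totals taken from the collect example.
	for _, total := range []uint64{1304207360, 1695780864, 1726353408} {
		o, err := evaluate(memoryOutcomes, total)
		if err != nil {
			panic(err)
		}
		fmt.Println(o.Kind, o.Message)
	}
}
```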

Results for each node are analyzed separately, with the node name
appended to the title:

```
bin/preflight --interactive=false --collector-image=croomes/troubleshoot:latest examples/preflight/remote/memory.yaml --format=json
{memory running 0 1}
{memory completed 1 1}
{
  "fail": [
    {
      "title": "Amount of Memory (kind-worker2)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-worker)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-control-plane)",
      "message": "At least 8Gi of memory is required"
    }
  ]
}
```

Also added a host collector to allow preflight checks of required kernel
modules, which is the main driver for this change.
2021-10-06 09:03:53 -05:00
Andrew Reed
4d52760d35 Collector and analyzer for sysctl parameters (#441)
Collector and analyzer for sysctl parameters
2021-10-01 13:43:26 -05:00
divolgin
afa08e5362 Analyzers should not return multiple results 2021-09-22 22:50:38 +00:00
Salah Aldeen Al Saleh
0c7fede7b6 check for nil analyzers (#421) 2021-09-21 12:12:10 -07:00
Andrew Reed
91eb94baaa Weave report analyzers
The IPAM pool analyzer checks that utilization of the pod IP subnet is
less than 85%. For example, if using 10.32.0.0/12, this analyzer will
warn if 3,482 IPs are currently allocated to pods.
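
The 85% rule can be expressed as a ratio of allocated addresses to pool size. The sketch below assumes the pool is the full CIDR and is intended for IPv4-scale pools. `poolSize` and `shouldWarn` are hypothetical names, and the check uses integer arithmetic to avoid floating-point rounding at the boundary. For scale, a /12 contains 1,048,576 addresses, while 3,482 is just over 85% of a /20 (4,096 addresses).

```go
package main

import (
	"fmt"
	"net"
)

// poolSize returns the number of addresses in a CIDR block.
func poolSize(cidr string) (uint64, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return 0, err
	}
	ones, bits := ipnet.Mask.Size()
	hostBits := bits - ones
	if hostBits >= 64 {
		return 0, fmt.Errorf("%s is too large to count in a uint64", cidr)
	}
	return uint64(1) << uint(hostBits), nil
}

// shouldWarn reports whether utilization is at or above 85%, i.e. not
// "less than 85%".
func shouldWarn(allocated, size uint64) bool {
	return allocated*100 >= size*85
}

func main() {
	for _, cidr := range []string{"10.32.0.0/12", "10.32.0.0/20"} {
		size, err := poolSize(cidr)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %d addresses, warn at 3482 allocated: %v\n",
			cidr, size, shouldWarn(3482, size))
	}
}
```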

The pending allocation analyzer checks that the IPAM status in the
report has no items in the PendingAllocates field. Pending allocations
indicate that the IPAM service is not ready, according to the code in
the weave status template
e3712152d2/prog/weaver/http.go (L186).

The weave connections analyzer checks that all connections to remote
peers are in the established state. The state will be "pending" if UDP
is blocked between nodes and will be "failed" if the weave pod on the
remote node is in a crash loop. To force a pending state for testing,
run the commands `iptables -A INPUT -p udp --dport 6784 -j REJECT` and
`iptables -A INPUT -p udp --dport 6783 -j REJECT` on a peer.

The weave connections analyzer also checks that all connections are
using the fastdp protocol. A common issue seen in the field on
CentOS/RHEL 7 is that one side of a connection is using fastdp while the
other side has fallen back to sleeve. To test this analyzer, set the
WEAVE_NO_FASTDP env var to "true" on the weave daemonset.
2021-09-08 21:29:38 +00:00
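
The two weave connection checks can be sketched over a simplified view of the report. The `Connection` type and its fields are assumptions and do not reflect weave's actual report schema. The protocol is checked only for established connections, so a pending or failed connection is reported once.

```go
package main

import "fmt"

// Connection is a hypothetical, trimmed-down peer connection entry; the
// real weave report schema may differ.
type Connection struct {
	Address  string
	State    string // e.g. "established", "pending", "failed"
	Protocol string // e.g. "fastdp", "sleeve"
}

// checkConnections returns one message per problem: a connection that is
// not established, or an established one that is not using fastdp.
func checkConnections(conns []Connection) []string {
	var problems []string
	for _, c := range conns {
		if c.State != "established" {
			problems = append(problems, fmt.Sprintf("%s: connection is %s", c.Address, c.State))
		} else if c.Protocol != "fastdp" {
			problems = append(problems, fmt.Sprintf("%s: using %s instead of fastdp", c.Address, c.Protocol))
		}
	}
	return problems
}

func main() {
	conns := []Connection{
		{"10.128.0.2:6783", "established", "fastdp"},
		{"10.128.0.3:6783", "pending", ""},
		{"10.128.0.4:6783", "established", "sleeve"},
	}
	for _, p := range checkConnections(conns) {
		fmt.Println(p)
	}
}
```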
Salah Aldeen Al Saleh
c7af0dc593 fix openshift cluster detection (#408) 2021-08-24 09:51:12 -07:00
John Murphy
fd3b32293c default result only when no other result exists (#398) 2021-07-28 11:19:41 -05:00
Kyle Sorensen
2977f8f0d3 Stop longhorn false positives on no results. (#397)
The Longhorn analyzer no longer reports positive results when there are no results.
2021-07-28 09:37:54 -06:00
Joris 'Josh' De Winne
6349ae8aee Adding support for inverted regex (#370) 2021-07-26 13:06:30 -04:00
emosbaugh
8dcfa9886d Copy from host collector (#391)
* Copy from host collector

* namespace improvements

* better support for multiple nodes
2021-07-22 12:25:59 -07:00
John Murphy
6007f15253 fixed issue where warnings are disseminated along with passes (#390) 2021-07-22 08:27:39 -05:00