158 Commits

Author SHA1 Message Date
Edgar Lanting
8fcb65d2a6 Update supportbundle_types.go
Alter comment for the additional `uri` field into one single sentence.
2022-09-02 12:21:40 +02:00
Edgar Lanting
b2c3280623 EL 20220901 - Implement new type for #682 2022-09-01 14:58:51 +02:00
divolgin
75bc9d576c Check if user has access to namespace resources before collecting 2022-08-10 19:40:18 -07:00
divolgin
8e7ea022f7 Adding some utility interfaces for collectors 2022-07-08 14:53:46 -07:00
Diamon Wiggins
a1533d5ec5 adding host analyzers to kind analyzer and supportbundle 2022-07-06 12:19:44 -04:00
Diamon Wiggins
c9c305570b Host Run Collector (#606)
Host Run Collector
2022-06-29 12:14:56 -04:00
divolgin
f02566c712 Use reflection instead of hardcoding all analyzers 2022-06-17 13:54:23 -07:00
Craig O'Donnell
354a996edc feat: adds new yamlCompare and jsonCompare analyzers (#598)
* feat: adds new yamlCompare analyzer

* feat: adds new jsonCompare analyzer

* outcome when for yamlCompare and jsonCompare
2022-06-17 14:43:56 -04:00
divolgin
b308f4a2b2 Wrapper function for analyzer's exclude flag 2022-06-16 14:18:39 -07:00
diamonwiggins
3b1ba08a6b hardcoding system hostcollector filenames 2022-05-12 03:39:19 +00:00
diamonwiggins
17fe3db79f adding host collectors to support bundles 2022-05-11 22:50:03 +00:00
Diamon Wiggins
9f527ee6a5 Merge branch 'main' into diamonwiggins/sc-44286/run-pod-spec 2022-05-06 11:08:15 -04:00
Ethan Mosbaugh
2c9a37a4f1 BoolOrString pollutes marshalling, does not respect omitempty (#566)
* BoolOrString pollutes marshalling, does not respect omitempty

* fix panic
2022-05-05 16:10:05 -07:00
Edgar Ochoa
7289134757 Add Mysql variables to collector (#562)
* Add Mysql variables to collector

* Cleanup row scanning and a few updates based on feedback

* Close db connection

* Move defer db.close

* Updates based on feedback

* Use vars in loop instead of struct

* Only pull parameters specified in collector config

Co-authored-by: Ethan Mosbaugh <ethan@replicated.com>
2022-05-04 10:42:37 -07:00
diamonwiggins
42902405cd adding new runpod collector and refactoring old run collector to use new code 2022-05-02 02:44:02 +00:00
diamonwiggins
648f9b8d35 allow entire podspec to be passed in run collector 2022-04-19 16:25:59 +00:00
Pavan Sokke Nagaraj
3d7a255e32 update: add missing database collector code for func GetName() (#553) 2022-03-23 22:00:25 -04:00
diamonwiggins
2b774e16d7 adding serviceaccountname parameter to run collector 2022-03-03 06:12:42 +00:00
Pavan Sokke Nagaraj
942234da80 Add strict flag to Analyzers and ResultAnalyzers (#539)
* add strict flag to Analyzer/AnalyzerMeta

and regenerate schemas and controller-gen code

* map analyzer strict to result

* Update stdout for human and json format

* fix review comment

* update interactive result

* update interactive results

* Update types.go

* Update upload_results.go

* print strict when only true
2022-02-23 15:07:51 -05:00
Andrew Lavery
8fc7d12e19 mark a number of fields as not being required
namespace/namespaces in resource status analyzers, and the OS list in host package collectors
2022-01-06 23:54:19 +01:00
divolgin
007edd1181 Allow specifying namespaces when analyzing cluster resources 2021-12-17 21:47:06 +00:00
Salah Aldeen Al Saleh
d1f341b8ed host system packages collector/analyzer (#506)
* host system packages collector/analyzer
2021-12-10 12:05:21 -08:00
Ethan Mosbaugh
177f2da16d Update github.com/containers/image/v5 2021-11-30 23:37:25 +00:00
Ethan Mosbaugh
59d50e7679 Fix go mod 2021-11-30 21:26:24 +00:00
divolgin
739ee666af Allow text analyzer to not generate an error if no files match 2021-10-29 17:52:59 +00:00
divolgin
7cb6d90a39 replicaset analyzer supports label selectors 2021-10-28 22:06:15 +00:00
Sean Rester
1345b200aa 38798: Adding node status check 2021-10-28 11:16:26 -04:00
divolgin
ada35eb31c Replicaset collector and analyzer 2021-10-27 20:24:14 +00:00
divolgin
1cdfd96768 Jobs status analyzer 2021-10-26 23:41:02 +00:00
Salah Aldeen Al Saleh
3d1d53ee9d ClusterPodStatuses analyzer (#456)
* ClusterPodStatuses analyzer

Co-authored-by: divolgin <dmitriy@replicated.com>
2021-10-25 17:44:59 -07:00
divolgin
072d2d7a36 Fix ceph collector 2021-10-22 23:01:13 +00:00
Andrew Reed
7b36e6a1f8 Copy in longhorn client (#454) 2021-10-22 15:24:07 -05:00
Jalaja Ganapathy
372454651e collector/analyzer for host operating system (#443)
* collector/analyzer for host operating system

* address cr comments

* cleanup

* fix invoking the analyzer
code cleanup

* fix cr comments

* add corner case unit-test

* fix kernel version parsing

* address review comments

* add default case

* parse using regex

* added more testcases and fixed the bug found in cr

* few small things
2021-10-12 14:42:23 -07:00
Simon Croome
977fc438ea Remote host collectors (#392)
* Add collect command and remote host collectors

Adds the ability to run a host collector on a set of remote k8s nodes.
Target nodes can be filtered using the --selector flag, with the same
syntax as kubectl.  Existing flags for --collector-image,
--collector-pullpolicy and --request-timeout are used.  To run on a
specified node, --selector="kubernetes.io/hostname=kind-worker2" could
be used.

The collect command is used by the remote collector to output the
results using a "raw" format, which uses the filename as the key and
the output, as an escaped JSON string, as the value.  When run manually it
defaults to fully decoded json. The existing block devices,
ipv4interfaces and services host collectors don't decode properly - the
fix is to convert their slice output to a map (fix not included as
unsure what depends on the existing format).
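
As a rough illustration of the "raw" format described above, the sketch below
decodes a filename-to-escaped-JSON map into fully decoded JSON. The exact shape
of the raw payload and the `decodeRaw` helper are assumptions made for this
example; they are not taken from the troubleshoot code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeRaw is a hypothetical helper: it takes a "raw" result, assumed to be
// a JSON object mapping each filename to an escaped JSON string, and returns
// a map of filename to the decoded JSON value.
func decodeRaw(raw []byte) (map[string]interface{}, error) {
	var encoded map[string]string
	if err := json.Unmarshal(raw, &encoded); err != nil {
		return nil, err
	}
	decoded := make(map[string]interface{}, len(encoded))
	for name, content := range encoded {
		var value interface{}
		// Each value is itself a JSON document stored as a string.
		if err := json.Unmarshal([]byte(content), &value); err != nil {
			return nil, fmt.Errorf("decode %s: %w", name, err)
		}
		decoded[name] = value
	}
	return decoded, nil
}

func main() {
	// Assumed raw output for a single node's memory collector.
	raw := []byte(`{"system/memory.json": "{\"total\": 1304207360}"}`)
	decoded, err := decodeRaw(raw)
	if err != nil {
		panic(err)
	}
	out, err := json.MarshalIndent(decoded, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A collector whose output is a JSON array rather than an object would still
decode here, because each value is unmarshalled into an `interface{}`.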

The collect command is also useful for troubleshooting preflight issues.

Examples are included to show remote collector usage.

```
bin/collect --collector-image=croomes/troubleshoot:latest  examples/collect/remote/memory.yaml --namespace test
{
  "kind-control-plane": {
    "system/memory.json": {
      "total": 1304207360
    }
  },
  "kind-worker": {
    "system/memory.json": {
      "total": 1695780864
    }
  },
  "kind-worker2": {
    "system/memory.json": {
      "total": 1726353408
    }
  }
}
```

The preflight command has been updated to run remote collectors.  To run
a host collector remotely it must be specified in the spec as a
`remoteCollector`:

```
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: memory
spec:
  remoteCollectors:
    - memory:
        collectorName: memory
  analyzers:
    - memory:
        outcomes:
          - fail:
              when: "< 8Gi"
              message: At least 8Gi of memory is required
          - warn:
              when: "< 32Gi"
              message: At least 32Gi of memory is recommended
          - pass:
              message: The system has sufficient memory
```
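
A minimal sketch of how outcomes like the ones in this spec could be evaluated
against a collected memory total. It assumes outcomes are checked in order and
the first match wins, and it only understands `<` comparisons with a `Gi`
suffix; the `outcome` type, `parseGi`, and `evaluate` are hypothetical names,
not the analyzer's actual implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// outcome mirrors one entry of the spec's outcomes list (hypothetical type).
type outcome struct {
	kind    string // "fail", "warn" or "pass"
	when    string // e.g. "< 8Gi"; empty means the outcome always matches
	message string
}

// parseGi converts a quantity such as "8Gi" to bytes. Only the Gi suffix is
// handled in this sketch.
func parseGi(s string) (uint64, error) {
	if !strings.HasSuffix(s, "Gi") {
		return 0, fmt.Errorf("unsupported quantity %q", s)
	}
	n, err := strconv.ParseUint(strings.TrimSuffix(s, "Gi"), 10, 64)
	if err != nil {
		return 0, err
	}
	return n << 30, nil
}

// evaluate returns the first outcome whose condition matches total (bytes).
func evaluate(total uint64, outcomes []outcome) (outcome, error) {
	for _, o := range outcomes {
		if o.when == "" {
			return o, nil
		}
		fields := strings.Fields(o.when)
		if len(fields) != 2 || fields[0] != "<" {
			return outcome{}, fmt.Errorf("unsupported condition %q", o.when)
		}
		limit, err := parseGi(fields[1])
		if err != nil {
			return outcome{}, err
		}
		if total < limit {
			return o, nil
		}
	}
	return outcome{}, fmt.Errorf("no outcome matched")
}

func main() {
	outcomes := []outcome{
		{"fail", "< 8Gi", "At least 8Gi of memory is required"},
		{"warn", "< 32Gi", "At least 32Gi of memory is recommended"},
		{"pass", "", "Memory requirement met"},
	}
	// Memory total of an example node, in bytes.
	result, err := evaluate(1726353408, outcomes)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %s\n", result.kind, result.message)
}
```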

Results for each node are analyzed separately, with the node name
appended to the title:

```
bin/preflight --interactive=false --collector-image=croomes/troubleshoot:latest examples/preflight/remote/memory.yaml --format=json
{memory running 0 1}
{memory completed 1 1}
{
  "fail": [
    {
      "title": "Amount of Memory (kind-worker2)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-worker)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-control-plane)",
      "message": "At least 8Gi of memory is required"
    }
  ]
}
```

Also added a host collector to allow preflight checks of required kernel
modules, which is the main driver for this change.
2021-10-06 09:03:53 -05:00
Andrew Reed
4d52760d35 Collector and analyzer for sysctl parameters (#441)
Collector and analyzer for sysctl parameters
2021-10-01 13:43:26 -05:00
Jalaja Ganapathy
8a29442a2a Remove ID from host preflight spec (#438) 2021-09-29 09:49:54 -07:00
divolgin
0e8bedc281 Save collector data to disk directly 2021-09-29 00:15:02 +00:00
Jalaja Ganapathy
eb795c98b6 fix serializer for unique id (#432) 2021-09-24 14:20:37 -07:00
Jalaja Ganapathy
a0b3b3f7dc add a unique id to each host preflight (#431)
* add a unique id to each host preflight

* auto generated files

* updated schemas for the new field id

* keeping it consistent with the rest of the spec
2021-09-24 13:29:14 -07:00
Salah Aldeen Al Saleh
1bdd3db8c5 update schemas (#428)
* update schemas

* update controller-gen
2021-09-23 11:03:19 -07:00
Salah Aldeen Al Saleh
880c7dc3ea ability to specify a list of namespaces for the cluster resources collector (#424)
* ability to specify a list of namespaces for the cluster resources collector
2021-09-23 08:02:05 -07:00
Andrew Reed
91eb94baaa Weave report analyzers
The IPAM pool analyzer checks that utilization of the pod IP subnet is
less than 85%. For example, if using 10.32.0.0/12, this analyzer will
warn if 3,482 IPs are currently allocated to pods.

The pending allocation analyzer checks that the IPAM status in the
report has no items for the PendingAllocates field. This indicates the
IPAM service is not ready according to the code in the weave status
template
e3712152d2/prog/weaver/http.go (L186).

The weave connections analyzer checks that all connections to remote
peers are in the established state. The state will be "pending" if UDP
is blocked between nodes and will be "failed" if the weave pod on the
remote node is in a crash loop. To force a pending state for testing,
run the commands `iptables -A INPUT -p udp --dport 6784 -j REJECT` and
`iptables -A INPUT -p udp --dport 6783 -j REJECT` on a peer.

The weave connections analyzer also checks that all connections are
using the fastdp protocol. A common issue seen in the field on
CentOS/RHEL 7 is that some sides of a connection are using fastdp and
other sides have fallen back to sleeve. Set the WEAVE_NO_FASTDP env var
on the weave daemonset to "true" to test this analyzer.
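
The 85% utilization check can be illustrated with a small calculation. This is
only a sketch: `poolThreshold` is a hypothetical helper, and it assumes the
warning fires once the allocated count reaches 85% of the pool, rounded up.
Under that assumption a 4,096-address (/20) pool gives the 3,482 figure quoted
above, while a /12 pool holds 1,048,576 addresses.

```go
package main

import (
	"fmt"
	"net"
)

// poolThreshold returns the number of addresses in an IPv4 CIDR and the
// smallest allocation count that reaches the given utilization percentage.
// Hypothetical helper for illustration only.
func poolThreshold(cidr string, percent uint64) (size, threshold uint64, err error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return 0, 0, err
	}
	ones, bits := ipnet.Mask.Size()
	if bits != 32 {
		return 0, 0, fmt.Errorf("%s is not an IPv4 range", cidr)
	}
	size = uint64(1) << uint(bits-ones)
	// Integer ceiling of size * percent / 100.
	threshold = (size*percent + 99) / 100
	return size, threshold, nil
}

func main() {
	for _, cidr := range []string{"10.32.0.0/20", "10.32.0.0/12"} {
		size, threshold, err := poolThreshold(cidr, 85)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %d addresses, warn at %d allocated\n", cidr, size, threshold)
	}
}
```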
2021-09-08 21:29:38 +00:00
Kyle Sorensen
bf7d658313 troubleshoot enables collecting all data from a configmap (#395)
Enabled collecting all data from a ConfigMap instead of by key
2021-07-26 13:00:06 -06:00
Ethan Mosbaugh
cf7864cd97 Copy collectors extractArchive property 2021-07-23 13:37:57 +00:00
emosbaugh
8dcfa9886d Copy from host collector (#391)
* Copy from host collector

* namespace improvements

* better support for multiple nodes
2021-07-22 12:25:59 -07:00
emosbaugh
39350b5722 ConfigMap collector and secrets can be collected by selectors (#384)
* ConfigMap collector and secrets can be collected by selectors

* follow docs

* Pass context and kubernetes client to collectors

* collect tests

* analyze tests

* fix tests

* improvements
2021-07-08 16:30:26 -07:00
divolgin
7381d5086c Update troubleshoot api schema 2021-07-01 17:24:00 +00:00
Andrew Reed
646f7a6991 Longhorn collector for all CRDs
Also implement a single analyzer as a proof of concept. More analyzers
can be added using the collected CRDs.
2021-05-26 23:37:15 +00:00
Andrew Reed
0a6c9836e0 Add timeout to filesystem performance collector 2021-04-13 18:30:18 +00:00
Andrew Reed
477cde7228 Benchmark write latency with background IOPS
Add a background IOPS feature to the filesystem performance collector
that specifies separate read and write background IOPS to perform while
measuring latency. This allows for better assessment of whether etcd
will be stable when running alongside other workloads on the same
cluster.

Also add templating to the outcome message of the filesystem performance
analyzers to allow printing individual latency percentiles or the entire
table.

Remove the random IOPS benchmark since it was attempting to perform
unaligned direct I/O.
2021-04-12 22:56:00 +00:00