Commit Graph

19 Commits

Author SHA1 Message Date
Salah Al Saleh
d5a6b19417 Add a host analyzer to check if a subnet contains an IP address (#1735)
* Add a host collector / analyzer to check if a subnet contains an IP address
2025-02-13 13:16:59 -08:00
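The check named in the commit above (whether a subnet contains an IP address) can be sketched in a few lines. This is a minimal illustration using Python's standard `ipaddress` module, not the project's implementation; the helper name `subnet_contains` and the sample addresses are hypothetical.

```python
import ipaddress


def subnet_contains(cidr: str, ip: str) -> bool:
    """Return True if the IP address falls inside the CIDR subnet.

    Hypothetical helper for illustration; not part of troubleshoot.
    """
    # strict=False also accepts CIDRs with host bits set, e.g. "10.0.0.5/8".
    network = ipaddress.ip_network(cidr, strict=False)
    address = ipaddress.ip_address(ip)
    # An IPv4 address is never a member of an IPv6 subnet, and vice versa.
    if network.version != address.version:
        return False
    return address in network


if __name__ == "__main__":
    print(subnet_contains("10.0.0.0/8", "10.1.2.3"))
    print(subnet_contains("10.0.0.0/8", "192.168.1.1"))
```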
João Antunes
197f6de425 feat(host_analyzer): add host sysctl analyzer (#1681)
* feat(host_analyzer): add host sysctl analyzer

* chore: add e2e tests to support bundle collection

* chore: missing spec e2e test update

* chore: cleanup remote collector and use parse operator

* chore: update schemas
2024-11-08 18:55:24 +00:00
João Antunes
77c9968ff6 feat(host_sysctl): add host sysctl collector (#1676)
* feat(host_sysctl): add host sysctl collector

* chore: add examples

* Update pkg/collect/host_sysctl.go

Co-authored-by: Evans Mungai <evans@replicated.com>

* chore: use sysctl package vs exec calls

* chore: make linter happy

* chore: make schemas

* chore: go back to sysctl exec

* chore: make linter happy

---------

Co-authored-by: Evans Mungai <evans@replicated.com>
2024-11-07 18:18:11 +00:00
Diamon Wiggins
6cbe188abe Fix incorrect result URI for pass and warn outcomes in common status analyzer (#1333)
* fix result URI
* revert examples
* fix warn outcome
2023-09-16 07:36:58 +12:00
Nathan Sullivan
f3db02a200 Collector/Analyzer - Subnet Available (#1004)
* Adding a new Subnet Available Collector and Analyzer, used to check if a subnet is available for use on a K8s node.
2023-03-10 12:52:21 +10:00
Ethan Mosbaugh
3419a9b888 feat: support for rhel 9 variants (rhel, centos, ol, rocky) (#1045) 2023-03-08 06:29:16 -08:00
Ethan Mosbaugh
ad1a56251f feat(hostpreflights): udp port status (#981)
* feat(hostpreflights): udp port status

* fix(hostpreflights): tcpPortStatus -> udpPortStatus
2023-01-24 16:38:54 -05:00
Salah Aldeen Al Saleh
d1f341b8ed host system packages collector/analyzer (#506)
* host system packages collector/analyzer
2021-12-10 12:05:21 -08:00
Jalaja Ganapathy
372454651e collector/analyzer for host operating system (#443)
* collector/analyzer for host operating system

* address cr comments

* cleanup

* fix invoking the analyzer
code cleanup

* fix cr comments

* add corner case unit-test

* fix kernel version parsing

* address review comments

* add default case

* parse using regex

* added more testcases and fixed the bug found in cr

* few small things
2021-10-12 14:42:23 -07:00
Simon Croome
977fc438ea Remote host collectors (#392)
* Add collect command and remote host collectors

Adds the ability to run a host collector on a set of remote k8s nodes.
Target nodes can be filtered using the --selector flag, with the same
syntax as kubectl.  Existing flags for --collector-image,
--collector-pullpolicy and --request-timeout are used.  To run on a
specified node, --selector="kubernetes.io/hostname=kind-worker2" could
be used.

The collect command is used by the remote collector to output the
results in a "raw" format, which uses the filename as the key and the
output, as an escaped JSON string, as the value.  When run manually it
defaults to fully decoded JSON.  The existing block devices,
ipv4interfaces and services host collectors don't decode properly; the
fix is to convert their slice output to a map (fix not included, as it
is unclear what depends on the existing format).
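
As a rough illustration of the difference between the "raw" and fully decoded forms, the sketch below assumes the raw output is a JSON object mapping each filename to an escaped JSON string. That shape is inferred from the description above, and the helper name `decode_raw` is hypothetical.

```python
import json


def decode_raw(raw: dict) -> dict:
    """Decode each escaped-JSON string value into a parsed JSON value.

    Assumed raw shape (not confirmed by the source): filename -> JSON string.
    """
    return {filename: json.loads(payload) for filename, payload in raw.items()}


# One raw entry: the value is a string that itself contains JSON.
raw = {"system/memory.json": "{\"total\": 1304207360}"}
decoded = decode_raw(raw)
print(json.dumps(decoded, indent=2))
```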

The collect command is also useful for troubleshooting preflight issues.

Examples are included to show remote collector usage.

```
bin/collect --collector-image=croomes/troubleshoot:latest  examples/collect/remote/memory.yaml --namespace test
{
  "kind-control-plane": {
    "system/memory.json": {
      "total": 1304207360
    }
  },
  "kind-worker": {
    "system/memory.json": {
      "total": 1695780864
    }
  },
  "kind-worker2": {
    "system/memory.json": {
      "total": 1726353408
    }
  }
}
```

The preflight command has been updated to run remote collectors.  To run
a host collector remotely it must be specified in the spec as a
`remoteCollector`:

```
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: memory
spec:
  remoteCollectors:
    - memory:
        collectorName: memory
  analyzers:
    - memory:
        outcomes:
          - fail:
              when: "< 8Gi"
              message: At least 8Gi of memory is required
          - warn:
              when: "< 32Gi"
              message: At least 32Gi of memory is recommended
          - pass:
              message: The system has sufficient memory
```

Results for each node are analyzed separately, with the node name
appended to the title:

```
bin/preflight --interactive=false --collector-image=croomes/troubleshoot:latest examples/preflight/remote/memory.yaml --format=json
{memory running 0 1}
{memory completed 1 1}
{
  "fail": [
    {
      "title": "Amount of Memory (kind-worker2)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-worker)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-control-plane)",
      "message": "At least 8Gi of memory is required"
    }
  ]
}
```
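
The per-node analysis can be approximated as follows. This is a sketch only: it assumes outcomes are checked in spec order with the first match winning, and the function name `analyze_memory` is hypothetical. The memory totals are the ones reported by the collect example earlier in this message.

```python
GI = 1024 ** 3  # bytes in one gibibyte (Gi)

# Totals in bytes, copied from the collect output shown earlier.
NODE_MEMORY = {
    "kind-control-plane": 1304207360,
    "kind-worker": 1695780864,
    "kind-worker2": 1726353408,
}


def analyze_memory(node, total_bytes):
    """Return (outcome, title, message) for one node.

    Assumption: outcomes are checked in spec order and the first match wins.
    """
    title = f"Amount of Memory ({node})"  # node name appended to the title
    if total_bytes < 8 * GI:
        return "fail", title, "At least 8Gi of memory is required"
    if total_bytes < 32 * GI:
        return "warn", title, "At least 32Gi of memory is recommended"
    return "pass", title, "The system has sufficient memory"


results = [analyze_memory(node, total) for node, total in NODE_MEMORY.items()]
for outcome, title, message in results:
    print(f"{outcome}: {title}: {message}")
```

Every node in the example has less than 8Gi of memory, so all three land in the `fail` outcome, which matches the JSON output above.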

Also added a host collector to allow preflight checks of required kernel
modules, which is the main driver for this change.
2021-10-06 09:03:53 -05:00
kwsorensen
1ed6100ac8 Feature/validate tcp load balancer address (#387)
Add load balancer validation as part of troubleshoot preflight checks
2021-07-14 14:30:47 -06:00
John Murphy
c119a16235 Fixed bugs introduced by handling multiple results in host preflights (#383)
Fixed a bug caused by host preflights not handling empty when clauses. It surfaced because we now handle multiple host preflight results. Also expanded test coverage and added an integration test script.
2021-07-08 11:08:53 -05:00
John Murphy
ae4c07027b host preflights can produce multiple results 2021-07-06 08:42:12 -05:00
Andrew Reed
0a6c9836e0 Add timeout to filesystem performance collector 2021-04-13 18:30:18 +00:00
Andrew Reed
477cde7228 Benchmark write latency with background IOPS
Add a background IOPS feature to the filesystem performance collector
that specifies separate read and write background IOPS to perform while
measuring latency. This allows for better assessment of whether etcd
will be stable when running alongside other workloads on the same
cluster.

Also add templating to the outcome message of the filesystem performance
analyzers to allow printing individual latency percentiles or the entire
table.

Remove the random IOPS benchmark since it was attempting to perform
unaligned direct I/O.
2021-04-12 22:56:00 +00:00
Andrew Reed
87b4c12274 Analyze TLS certificate 2021-02-19 20:55:16 +00:00
Andrew Reed
b418334a46 Analyze random read IOPS for a directory
The random IOPS benchmark attempts to replicate the results of this
fio command:

fio --ioengine=psync --direct=1 --bs=4k --size=1Gi --readwrite=randread --serialize_overlap=1

Across three tests the fio command reported 1877 IOPS and the preflight
1822 IOPS with the same block and file size.
2021-02-18 23:56:51 +00:00
Andrew Reed
989d5f7dbd Analyze fs write performance
The included example found P99 latency of 2.6ms.
Fio reported P99 latency of 2.5ms with this command:
fio --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/etcd --size=220m --bs=2300
2021-02-17 23:20:38 +00:00
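For readers unfamiliar with the P99 figure quoted in the commit above, the sketch below shows one common way to derive a percentile from latency samples, the nearest-rank method. The source does not say which method the collector uses, so this is an assumption; the function name and the sample latencies are hypothetical, not measured data.

```python
def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100.

    Returns the smallest sample such that at least pct% of all samples
    are less than or equal to it.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in 1..100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical write latencies in microseconds (not measured data).
latencies_us = [900, 1100, 1000, 3000, 1200, 950, 1050, 1150, 980, 1020]
p99 = percentile_nearest_rank(latencies_us, 99)
print(f"P99 latency: {p99 / 1000:.1f}ms")
```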
Andrew Reed
fe4db40b43 Move host preflights examples into separate directory
Add all supported analyzers to host preflight sample.
Don't log transient errors waiting for TCP connection.
Begin human stdout results on new line after spinner.
2021-02-15 22:46:12 +00:00