
62 Commits

Author SHA1 Message Date
Camila Macedo
f2ffef80af Revert "Remove ingress check from https://preflight.replicated.com (#840)" (#846)
This reverts commit a5244a262c.
2022-11-16 14:57:47 +13:00
Camila Macedo
a5244a262c Remove ingress check from https://preflight.replicated.com (#840)
## Description:

Currently, when we install a Kubernetes cluster with kURL and run `kubectl preflight https://preflight.replicated.com`, it fails:

> ------------
> Check FAIL
> Title: Ingress
> Message: Contour ingress not found!

Contour ingress does not seem to be a prerequisite for kURL, so should we have this check in the default example/test?
2022-11-15 08:55:04 -06:00
Camila Macedo
18d9a16ceb Minimal memory requirement for kURL is 8Gi (#843) 2022-11-15 08:54:29 -06:00
Diamon Wiggins
a0fb06f0b9 E2E Tests for Support Bundle CLI (#761)
* adding e2e tests for support bundle cli

* update e2e.yaml
2022-10-07 14:43:16 +13:00
Evans Mungai
906fa88119 Fit wrapped long url in more information analyser message (#758) 2022-10-06 10:21:52 +13:00
Alex Parker
34c59eb237 Update K8s versions and container runtime 2022-06-22 15:20:48 -04:00
Ethan Mosbaugh
9792289ac1 Remove namespace template from examples 2022-06-02 19:54:31 +00:00
Ethan Mosbaugh
5c269e2aaf E2E preflight tests 2022-05-17 17:20:45 +00:00
Ethan Mosbaugh
755220cd4c Mysql collector example 2022-05-04 17:45:10 +00:00
Salah Aldeen Al Saleh
d1f341b8ed host system packages collector/analyzer (#506)
* host system packages collector/analyzer
2021-12-10 12:05:21 -08:00
Ethan Mosbaugh
213d518136 Time parse doesn't support day notation 2021-11-30 20:11:51 +00:00
Jalaja Ganapathy
372454651e collector/analyzer for host operating system (#443)
* collector/analyzer for host operating system

* address cr comments

* cleanup

* fix invoking the analyzer
code cleanup

* fix cr comments

* add corner case unit-test

* fix kernel version parsing

* address review comments

* add default case

* parse using regex

* added more testcases and fixed the bug found in cr

* few small things
2021-10-12 14:42:23 -07:00
Simon Croome
977fc438ea Remote host collectors (#392)
* Add collect command and remote host collectors

Adds the ability to run a host collector on a set of remote k8s nodes.
Target nodes can be filtered using the --selector flag, with the same
syntax as kubectl.  Existing flags for --collector-image,
--collector-pullpolicy and --request-timeout are used.  To run on a
specified node, --selector="kubernetes.io/hostname=kind-worker2" could
be used.

The collect command is used by the remote collector to output the
results in a "raw" format, which uses the filename as the key and the
collector output, as an escaped JSON string, as the value. When run
manually it defaults to fully decoded JSON. The existing block devices,
ipv4interfaces and services host collectors don't decode properly; the
fix is to convert their slice output to a map (not included, as it is
unclear what depends on the existing format).
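To illustrate the difference between the two output formats, here is a minimal Python sketch (troubleshoot itself is written in Go). It assumes a single node's raw output is a JSON object whose keys are collector filenames and whose values are JSON documents serialized as strings, so decoding takes a second `json.loads` per value. `decode_raw` is a hypothetical helper, not part of the collect command.

```python
import json


def decode_raw(raw_output: str) -> dict:
    """Decode one node's "raw" collector output (hypothetical helper).

    Assumes the outer document maps each filename to a string that holds
    another JSON document, so every value has to be parsed a second time.
    """
    outer = json.loads(raw_output)  # {filename: "<escaped JSON>"}
    return {name: json.loads(value) for name, value in outer.items()}


# Build a raw-style document: the collector result is embedded as a string.
raw = json.dumps({"system/memory.json": json.dumps({"total": 1304207360})})
print(raw)  # {"system/memory.json": "{\"total\": 1304207360}"}

decoded = decode_raw(raw)
print(decoded)  # {'system/memory.json': {'total': 1304207360}}
```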

The collect command is also useful for troubleshooting preflight issues.

Examples are included to show remote collector usage.

```
bin/collect --collector-image=croomes/troubleshoot:latest  examples/collect/remote/memory.yaml --namespace test
{
  "kind-control-plane": {
    "system/memory.json": {
      "total": 1304207360
    }
  },
  "kind-worker": {
    "system/memory.json": {
      "total": 1695780864
    }
  },
  "kind-worker2": {
    "system/memory.json": {
      "total": 1726353408
    }
  }
}
```

The preflight command has been updated to run remote collectors.  To run
a host collector remotely it must be specified in the spec as a
`remoteCollector`:

```
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: memory
spec:
  remoteCollectors:
    - memory:
        collectorName: memory
  analyzers:
    - memory:
        outcomes:
          - fail:
              when: "< 8Gi"
              message: At least 8Gi of memory is required
          - warn:
              when: "< 32Gi"
              message: At least 32Gi of memory is recommended
          - pass:
              message: The system has sufficient memory
```

Results for each node are analyzed separately, with the node name
appended to the title:

```
bin/preflight --interactive=false --collector-image=croomes/troubleshoot:latest examples/preflight/remote/memory.yaml --format=json
{memory running 0 1}
{memory completed 1 1}
{
  "fail": [
    {
      "title": "Amount of Memory (kind-worker2)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-worker)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-control-plane)",
      "message": "At least 8Gi of memory is required"
    }
  ]
}
```
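The per-node analysis can be sketched in Python as follows. This is an illustration, not troubleshoot's Go implementation: it assumes outcomes are checked top to bottom with the first match winning, that an outcome without `when` always matches, and it only understands the `<` operator with binary suffixes such as `Gi`. `parse_quantity`, `evaluate` and `analyze_nodes` are hypothetical names. Fed the memory totals from the collect example, every node is below 8Gi (8,589,934,592 bytes), which is consistent with the three failures shown.

```python
import re

# Binary suffixes as used in Kubernetes-style quantities such as "8Gi".
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

# Mirrors the outcomes of the memory HostPreflight spec, in order.
OUTCOMES = [
    ("fail", "< 8Gi", "At least 8Gi of memory is required"),
    ("warn", "< 32Gi", "At least 32Gi of memory is recommended"),
    ("pass", None, "The system has sufficient memory"),
]


def parse_quantity(text: str) -> int:
    """Convert a quantity like "8Gi" to bytes (binary suffixes only)."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)?", text.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS.get(unit, 1)


def evaluate(total_bytes: int, outcomes=OUTCOMES):
    """Return (kind, message) for the first matching outcome, else None."""
    for kind, when, message in outcomes:
        if when is None:  # an outcome without a condition always matches
            return kind, message
        op, quantity = when.split(maxsplit=1)
        if op != "<":
            raise ValueError(f"unsupported operator: {op!r}")
        if total_bytes < parse_quantity(quantity):
            return kind, message
    return None


def analyze_nodes(totals_by_node: dict) -> dict:
    """Analyze each node separately, appending the node name to the title."""
    results = {}
    for node, total in totals_by_node.items():
        kind, message = evaluate(total)
        results.setdefault(kind, []).append(
            {"title": f"Amount of Memory ({node})", "message": message}
        )
    return results


# Memory totals (bytes) reported by the collect example.
totals = {
    "kind-control-plane": 1304207360,
    "kind-worker": 1695780864,
    "kind-worker2": 1726353408,
}
results = analyze_nodes(totals)
```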

Also added a host collector to allow preflight checks of required kernel
modules, which is the main driver for this change.
2021-10-06 09:03:53 -05:00
Andrew Reed
4d52760d35 Collector and analyzer for sysctl parameters (#441)
Collector and analyzer for sysctl parameters
2021-10-01 13:43:26 -05:00
Andrew Reed
91eb94baaa Weave report analyzers
The IPAM pool analyzer checks that utilization of the pod IP subnet is
less than 85%. For example, with a 10.32.0.0/20 pod subnet (4,096
addresses), this analyzer will warn once 3,482 IPs are allocated to pods.
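A minimal sketch of the threshold arithmetic, assuming utilization is the number of allocated addresses divided by every address in the pod CIDR (the real analyzer may count the range differently). Under that assumption a /20 (4,096 addresses) first warns at 3,482 allocations, while the full 10.32.0.0/12 range (1,048,576 addresses) would not warn until 891,290. The helper names are hypothetical.

```python
import ipaddress


def pod_subnet_size(pod_cidr: str) -> int:
    """Total number of addresses in the pod CIDR (assumed utilization base)."""
    return ipaddress.ip_network(pod_cidr).num_addresses


def ipam_should_warn(pod_cidr: str, allocated: int, threshold_pct: int = 85) -> bool:
    """True once allocated addresses reach threshold_pct of the subnet."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return allocated * 100 >= pod_subnet_size(pod_cidr) * threshold_pct


def first_warning_count(pod_cidr: str, threshold_pct: int = 85) -> int:
    """Smallest allocation count that triggers the warning (ceiling division)."""
    return -(-pod_subnet_size(pod_cidr) * threshold_pct // 100)


for cidr in ("10.32.0.0/20", "10.32.0.0/12"):
    print(cidr, pod_subnet_size(cidr), first_warning_count(cidr))
# 10.32.0.0/20 4096 3482
# 10.32.0.0/12 1048576 891290
```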

The pending allocation analyzer checks that the IPAM status in the
report has no items in the PendingAllocates field. Any items there
indicate that the IPAM service is not ready, according to the code in
the weave status template
e3712152d2/prog/weaver/http.go (L186).

The weave connections analyzer checks that all connections to remote
peers are in the established state. The state will be "pending" if UDP
is blocked between nodes and will be "failed" if the weave pod on the
remote node is in a crash loop. To force a pending state for testing,
run the commands `iptables -A INPUT -p udp --dport 6784 -j REJECT` and
`iptables -A INPUT -p udp --dport 6783 -j REJECT` on a peer.

The weave connections analyzer also checks that all connections are
using the fastdp protocol. A common issue seen in the field on
CentOS/RHEL 7 is that some sides of a connection are using fastdp and
other sides have fallen back to sleeve. Set the WEAVE_NO_FASTDP env var
on the weave daemonset to "true" to test this analyzer.
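Both connection checks can be sketched as below. The `Connection` record is a hypothetical simplification of a weave report entry (a peer, a state, and an info string assumed to begin with the protocol name, e.g. `fastdp` or `sleeve`); the real report's field names and layout may differ. For connections that are not established, this sketch reports only the state.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Hypothetical, simplified view of one weave peer connection."""
    peer: str
    state: str  # e.g. "established", "pending", "failed"
    info: str   # assumed to start with the protocol, e.g. "fastdp ..." or "sleeve ..."


def analyze_connections(connections):
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    for conn in connections:
        if conn.state != "established":
            # Pending usually means UDP is blocked; failed, a crash-looping peer.
            problems.append(f"connection to {conn.peer} is {conn.state}")
        elif not conn.info.startswith("fastdp"):
            # Established but fell back to the slower sleeve protocol.
            problems.append(f"connection to {conn.peer} is not using fastdp")
    return problems
```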
2021-09-08 21:29:38 +00:00
Kyle Sorensen
bf7d658313 troubleshoot enables collecting all data from a configmap (#395)
Enabled collecting all data from a ConfigMap instead of by key
2021-07-26 13:00:06 -06:00
kwsorensen
1ed6100ac8 Feature/validate tcp load balancer address (#387)
Load balancer validation as part of troubleshoot preflight checks
2021-07-14 14:30:47 -06:00
John Murphy
c119a16235 Fixed bugs introduced by handling multiple results in host preflights (#383)
Fixed a bug caused by host preflights not handling empty `when` clauses; this cropped up because we now handle multiple host preflight results. Also expanded test coverage and added an integration test script.
2021-07-08 11:08:53 -05:00
John Murphy
ae4c07027b host preflights can produce multiple results 2021-07-06 08:42:12 -05:00
Andrew Reed
646f7a6991 Longhorn collector for all CRDs
Also implement a single analyzer as a proof of concept. More analyzers
can be added using the collected CRDs.
2021-05-26 23:37:15 +00:00
Andrew Reed
0a6c9836e0 Add timeout to filesystem performance collector 2021-04-13 18:30:18 +00:00
Andrew Reed
477cde7228 Benchmark write latency with background IOPS
Add a background IOPS feature to the filesystem performance collector
that specifies separate read and write background IOPS to perform while
measuring latency. This allows for better assessment of whether etcd
will be stable when running alongside other workloads on the same
cluster.

Also add templating to the outcome message of the filesystem performance
analyzers to allow printing individual latency percentiles or the entire
table.
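troubleshoot's analyzers are written in Go, so the real outcome messages use Go template syntax; the Python sketch below only illustrates the idea with `string.Template`. The placeholder names (`$p50`, `$p90`, `$p99`, `$table`) and the nearest-rank percentile method are assumptions made for illustration.

```python
from string import Template


def percentile(sorted_samples, pct):
    """Nearest-rank percentile of an ascending, non-empty list (assumed method)."""
    rank = -(-pct * len(sorted_samples) // 100)  # ceil(pct * n / 100) in integers
    return sorted_samples[max(rank, 1) - 1]


def render_outcome(message, latencies_ms, pcts=(50, 90, 99)):
    """Fill $pNN placeholders and a $table placeholder in an outcome message."""
    samples = sorted(latencies_ms)
    values = {f"p{p}": percentile(samples, p) for p in pcts}
    table = "\n".join(f"p{p}: {values[f'p{p}']}ms" for p in pcts)
    return Template(message).substitute(values, table=table)


latencies = list(range(1, 101))  # 1..100 ms, illustrative values only
print(render_outcome("P99 write latency is ${p99}ms", latencies))
print(render_outcome("Latency percentiles:\n$table", latencies))
```

Braces in `${p99}ms` are needed because `$p99ms` would be read as a placeholder named `p99ms`.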

Remove the random IOPS benchmark since it was attempting to perform
unaligned direct I/O.
2021-04-12 22:56:00 +00:00
Andrew Reed
87b4c12274 Analyze TLS certificate 2021-02-19 20:55:16 +00:00
Dan Stough
7647c039e9 Merge pull request #325 from replicatedhq/feat/rke3-k3s-anaylzer
feat(analyzer): rke2 and k3s distro support
2021-02-19 14:52:22 -05:00
Dan Stough
c26824a619 feat(analyzer): rke2 and k3s distro support 2021-02-19 19:06:02 +00:00
Andrew Reed
b418334a46 Analyze random read IOPS for a directory
The random IOPS benchmark attempts to replicate the results of this
fio command:

fio --ioengine=psync --direct=1 --bs=4k --size=1Gi --readwrite=randread --serialize_overlap=1

Across three tests, the fio command reported 1877 IOPS and the preflight
reported 1822 IOPS with the same block and file size.
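Direct I/O (fio's `--direct=1` requests non-buffered I/O, usually `O_DIRECT` on Linux) generally requires file offsets, transfer sizes and memory buffers to be aligned to the device's logical block size, which is the kind of problem behind removing the random IOPS benchmark later on. A minimal sketch of choosing block-aligned random offsets for the 4k block, 1Gi file pattern above; `aligned_random_offsets` is a hypothetical helper and the sketch does not itself open files with `O_DIRECT`.

```python
import random


def aligned_random_offsets(file_size, block_size, count, seed=None):
    """Yield `count` random offsets that are multiples of `block_size`.

    Each offset leaves room for a whole block before the end of the file.
    """
    if file_size < block_size:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)
    blocks = file_size // block_size  # number of whole blocks in the file
    for _ in range(count):
        yield rng.randrange(blocks) * block_size


# 4 KiB reads over a 1 GiB file, matching the fio parameters.
offsets = list(aligned_random_offsets(2**30, 4096, count=1000, seed=1))
```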
2021-02-18 23:56:51 +00:00
Andrew Reed
989d5f7dbd Analyze fs write performance
The included example found P99 latency of 2.6ms.
fio reported P99 latency of 2.5ms with this command:

fio --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/etcd --size=220m --bs=2300
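A rough Python equivalent of that fio run, as a sketch only: it writes 2,300-byte blocks sequentially, calls `fdatasync` after each write (falling back to `fsync` where `os.fdatasync` is unavailable), times only the sync call, and reports a nearest-rank P99 in milliseconds. `write_sync_p99` is a hypothetical helper; it is not how the preflight collector is implemented, and its numbers depend entirely on the disk under test.

```python
import os
import tempfile
import time


def write_sync_p99(directory, total_bytes=220 * 1024 * 1024, block_size=2300):
    """Write sequential blocks, syncing after each; return P99 sync latency in ms."""
    sync = getattr(os, "fdatasync", os.fsync)  # fdatasync is not available everywhere
    block = b"\0" * block_size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        written = 0
        while written < total_bytes:
            written += os.write(fd, block)
            start = time.perf_counter()
            sync(fd)  # flush file data to stable storage
            latencies.append((time.perf_counter() - start) * 1000)
    finally:
        os.close(fd)
        os.remove(path)
    latencies.sort()
    rank = -(-99 * len(latencies) // 100)  # nearest-rank P99
    return latencies[rank - 1]
```

For example, `write_sync_p99("/var/lib/etcd")` mirrors the directory and 220m size of the fio command; it performs roughly 100,000 synced writes, so it can take a while on slow disks.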
2021-02-17 23:20:38 +00:00
Andrew Reed
fe4db40b43 Move host preflights examples into separate directory
Add all supported analyzers to host preflight sample.
Don't log transient errors waiting for TCP connection.
Begin human stdout results on new line after spinner.
2021-02-15 22:46:12 +00:00
Andrew Reed
6498c34da5 Analyze ipv4 interfaces
Co-authored-by: Andrew Lavery <laverya@umich.edu>
2021-02-15 20:54:53 +00:00
Andrew Reed
b0a005796c Merge pull request #317 from areed/host-remote-port
Analyze TCP connection
2021-02-15 15:18:11 -05:00
Andrew Reed
450d7570eb Analyze HTTP load balancer 2021-02-15 17:22:42 +00:00
Andrew Reed
40af0f8a9c Analyze TCP connection 2021-02-12 21:45:57 +00:00
Andrew Reed
0bcd5183f5 Analyze block devices 2021-02-11 19:19:45 +00:00
Andrew Reed
9984fe2caa Get time info from timedated 2021-02-10 20:01:15 +00:00
Andrew Reed
f25149f45c Host HTTP request analyzer 2021-02-09 20:31:28 +00:00
Andrew Reed
10a34c2e58 Host preflight (#311)
* Add HostPreflight v1beta2

* Work on TCP Load Balancer

* Host disk usage collector and analyzer

* Host memory analyzer

* TCP port status

* TCP load balancer

* Review changes

Co-authored-by: Marc Campbell <marc.e.campbell@gmail.com>
2021-02-08 16:09:01 -05:00
divolgin
a0ce85ae1e Adding troubleshoot.sh/v1beta2 2020-09-01 19:57:11 +00:00
GraysonNull
cc9d3aedec move logs to collectors array 2020-07-13 22:26:17 +00:00
GraysonNull
a70843cd69 add space 2020-07-13 22:07:37 +00:00
GraysonNull
08c52770a9 fix invalid analyzer in sample support-bundle 2020-07-13 22:06:37 +00:00
GraysonNull
4b1852c3e1 updates from feedback 2020-07-13 16:45:46 +00:00
GraysonNull
dc80733397 update sample preflight and support bundles, update directory name 2020-07-13 15:43:10 +00:00
Marc Campbell
a22e2e25df Analyze from the CLI 2020-06-12 12:47:45 -07:00
Marc Campbell
91e856d6d5 Add validation for collectors and support bundle in e2e 2020-06-12 10:23:28 -07:00
Marc Campbell
4e3a627b21 Remove broken mysql example 2020-03-23 08:51:59 -07:00
Marc Campbell
8f78827002 Redis 2020-03-20 11:03:49 -07:00
Marc Campbell
562a565f1f Adding postgres analyzer 2020-03-19 18:28:13 -07:00
Michael
a1c5247c8b REFACTOR create helper functions for analyzeDistribution 2020-03-12 18:01:05 +00:00
Marc Campbell
f15579a3b2 Update example 2020-02-06 23:10:40 +00:00
Marc Campbell
879c3a67d7 Node resource analyzer 2020-01-29 23:16:40 +00:00