816 Commits

Author SHA1 Message Date
Jalaja Ganapathy
372454651e collector/analyzer for host operating system (#443)
* collector/analyzer for host operating system

* address cr comments

* cleanup

* fix invoking the analyzer
code cleanup

* fix cr comments

* add corner case unit-test

* fix kernel version parsing

* address review comments

* add default case

* parse using regex

* added more testcases and fixed the bug found in cr

* few small things
v0.16.0
2021-10-12 14:42:23 -07:00
divolgin
5dece3eb75 Merge pull request #451 from replicatedhq/divolgin/panic
Check nil pointers
2021-10-12 10:12:11 -07:00
divolgin
e095a7838f Check nil pointers 2021-10-12 16:10:02 +00:00
Vera Harless
08953d46d1 fix: add collect to goreleaser (#450) v0.15.0 2021-10-08 15:44:55 -04:00
Andrew Lavery
bc197761ea Merge pull request #434 from croomes/handle-api-deprecations
Handle k8s api deprecations
2021-10-08 06:52:31 -07:00
Simon Croome
dc8b38d249 Handle k8s api deprecations 2021-10-07 18:55:51 +01:00
Vera Harless
73609c4fef feat: add more detail to the ceph analyzer output (#445) 2021-10-06 11:22:56 -04:00
Simon Croome
977fc438ea Remote host collectors (#392)
* Add collect command and remote host collectors

Adds the ability to run a host collector on a set of remote k8s nodes.
Target nodes can be filtered using the --selector flag, with the same
syntax as kubectl.  Existing flags for --collector-image,
--collector-pullpolicy and --request-timeout are used.  To run on a
specified node, --selector="kubernetes.io/hostname=kind-worker2" could
be used.

The collect command is used by the remote collector to output the
results using a "raw" format, which uses the filename as the key and
the output, as an escaped JSON string, as the value.  When run manually
it defaults to fully decoded JSON.  The existing block devices,
ipv4interfaces and services host collectors don't decode properly; the
fix is to convert their slice output to a map (fix not included, as it
is unclear what depends on the existing format).

The collect command is also useful for troubleshooting preflight issues.

Examples are included to show remote collector usage.

```
bin/collect --collector-image=croomes/troubleshoot:latest  examples/collect/remote/memory.yaml --namespace test
{
  "kind-control-plane": {
    "system/memory.json": {
      "total": 1304207360
    }
  },
  "kind-worker": {
    "system/memory.json": {
      "total": 1695780864
    }
  },
  "kind-worker2": {
    "system/memory.json": {
      "total": 1726353408
    }
  }
}
```
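
As a rough illustration of consuming the fully decoded output shown above, the following sketch loads it and extracts the memory total per node. The function name `memory_by_node` is hypothetical and not part of the tool; the data is copied from the example.

```python
import json

# Output copied from the example above:
# node name -> collected file name -> decoded JSON content.
RAW = """
{
  "kind-control-plane": {"system/memory.json": {"total": 1304207360}},
  "kind-worker": {"system/memory.json": {"total": 1695780864}},
  "kind-worker2": {"system/memory.json": {"total": 1726353408}}
}
"""


def memory_by_node(collect_output):
    """Return {node name: total memory in bytes} from decoded collect output."""
    nodes = json.loads(collect_output)
    return {
        node: files["system/memory.json"]["total"]
        for node, files in nodes.items()
    }


totals = memory_by_node(RAW)
for node, total in sorted(totals.items()):
    # 2**30 bytes per GiB
    print(f"{node}: {total / 2**30:.2f} GiB")
```

This only handles the decoded form; the "raw" form would need a second `json.loads` on each value, since the output is stored there as an escaped string.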

The preflight command has been updated to run remote collectors.  To run
a host collector remotely it must be specified in the spec as a
`remoteCollector`:

```
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: memory
spec:
  remoteCollectors:
    - memory:
        collectorName: memory
  analyzers:
    - memory:
        outcomes:
          - fail:
              when: "< 8Gi"
              message: At least 8Gi of memory is required
          - warn:
              when: "< 32Gi"
              message: At least 32Gi of memory is recommended
          - pass:
              message: The system has sufficient memory
```
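
To make the outcome ordering concrete, here is a minimal sketch of how the `when` expressions in the spec above could be evaluated against a collected byte count. It is not the project's implementation: the helper names, the supported operators and the handling of only binary suffixes (Ki, Mi, Gi, Ti) are assumptions made for illustration.

```python
import operator
import re

# Assumption: only binary suffixes are handled here; real Kubernetes-style
# quantities allow more forms.
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def parse_quantity(text):
    """Convert a string such as '8Gi' into a number of bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)?", text.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {text!r}")
    return int(match.group(1)) * UNITS.get(match.group(2), 1)


def first_matching_outcome(total_bytes, outcomes):
    """Return (kind, message) for the first outcome whose condition holds."""
    for kind, when, message in outcomes:
        if when is None:  # an outcome without `when` always matches
            return kind, message
        op, quantity = when.split()
        if OPS[op](total_bytes, parse_quantity(quantity)):
            return kind, message
    return None


# Outcomes in the same order as the spec above.
OUTCOMES = [
    ("fail", "< 8Gi", "At least 8Gi of memory is required"),
    ("warn", "< 32Gi", "At least 32Gi of memory is recommended"),
    ("pass", None, "The system has sufficient memory"),
]

print(first_matching_outcome(1726353408, OUTCOMES))
```

Because the first matching outcome wins in this sketch, the stricter `fail` condition has to be listed before `warn`, as it is in the spec.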

Results for each node are analyzed separately, with the node name
appended to the title:

```
bin/preflight --interactive=false --collector-image=croomes/troubleshoot:latest examples/preflight/remote/memory.yaml --format=json
{memory running 0 1}
{memory completed 1 1}
{
  "fail": [
    {
      "title": "Amount of Memory (kind-worker2)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-worker)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-control-plane)",
      "message": "At least 8Gi of memory is required"
    }
  ]
}
```
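
For scripts that consume this output, a sketch like the following could group failures by node. It assumes the progress lines and the JSON document are captured together exactly as shown above, and that the node name is always the final parenthesised part of the title; the function names are hypothetical.

```python
import json
import re

# Output copied from the example above, progress lines included.
OUTPUT = """{memory running 0 1}
{memory completed 1 1}
{
  "fail": [
    {
      "title": "Amount of Memory (kind-worker2)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-worker)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-control-plane)",
      "message": "At least 8Gi of memory is required"
    }
  ]
}"""

# "<check title> (<node name>)"
TITLE_RE = re.compile(r"^(?P<check>.+) \((?P<node>[^()]+)\)$")


def split_results(output):
    """Skip the progress lines and parse the JSON document that follows."""
    lines = output.splitlines()
    start = lines.index("{")  # the JSON starts on a line holding only "{"
    return json.loads("\n".join(lines[start:]))


def failures_by_node(results):
    """Return {node name: [failure messages]} from the "fail" list."""
    by_node = {}
    for item in results.get("fail", []):
        match = TITLE_RE.match(item["title"])
        node = match.group("node") if match else ""
        by_node.setdefault(node, []).append(item["message"])
    return by_node


by_node = failures_by_node(split_results(OUTPUT))
for node in sorted(by_node):
    print(node, by_node[node])
```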

Also added a host collector to allow preflight checks of required kernel
modules, which is the main driver for this change.
2021-10-06 09:03:53 -05:00
Andrew Reed
4d52760d35 Collector and analyzer for sysctl parameters (#441)
Collector and analyzer for sysctl parameters
v0.14.0
2021-10-01 13:43:26 -05:00
divolgin
6e34aa615e Merge pull request #442 from replicatedhq/divolgin/closer
Allow memory writers
v0.13.17
2021-09-30 11:42:14 -07:00
divolgin
ca51e92878 Allow memory writers 2021-09-30 18:25:52 +00:00
divolgin
06750d478e Merge pull request #439 from replicatedhq/divolgin/nil
Don't panic when no data is collected
v0.13.16
2021-09-29 15:03:50 -07:00
divolgin
6d0a57b16e Don't panic when no data is collected 2021-09-29 21:25:28 +00:00
Jalaja Ganapathy
8a29442a2a Remove ID from host preflight spec (#438) v0.13.15 2021-09-29 09:49:54 -07:00
divolgin
299497c0c0 Merge pull request #429 from danbudris/copyFromHostForCpNodes
add toleration to copy-from-host daemonset to allow collection from CP nodes
2021-09-29 09:01:14 -07:00
divolgin
050f5939c6 Merge pull request #437 from replicatedhq/divolgin/memory
Save collector data to disk directly
2021-09-29 08:12:05 -07:00
divolgin
0e8bedc281 Save collector data to disk directly 2021-09-29 00:15:02 +00:00
Dan Stough
bb0515830d Merge pull request #436 from replicatedhq/dans-fix-sbom-perms
chore(ci): fix sbom assets for krew
2021-09-28 15:09:47 -04:00
Dan Stough
b903f1f1c4 chore(ci): fix sbom asset perms 2021-09-28 16:37:53 +00:00
Jalaja Ganapathy
f26c9b4136 fix README syntax (#433) v0.13.14 2021-09-24 17:35:36 -07:00
Jalaja Ganapathy
eb795c98b6 fix serializer for unique id (#432) v0.13.13 2021-09-24 14:20:37 -07:00
Jalaja Ganapathy
a0b3b3f7dc add an unique id to each host preflights (#431)
* add an unique id to each host preflights

* auto generated files

* updated schemas for the new field id

* keeping it consistent with the rest of the spec
2021-09-24 13:29:14 -07:00
danbudris
67987a4432 add toleration to allow copy-from-host daemonset to run on CP nodes 2021-09-23 17:53:57 -04:00
Salah Aldeen Al Saleh
1bdd3db8c5 update schemas (#428)
* update schemas

* update controller-gen
v0.13.12
2021-09-23 11:03:19 -07:00
John Murphy
a2b5edb551 added missing cosign.key (#427)
SBOM generation was failing because it was missing the step that generates the private key needed for SBOM signing from the GitHub secret.
v0.13.11
2021-09-23 10:46:30 -05:00
Salah Aldeen Al Saleh
880c7dc3ea ability to specify a list of namespaces for the cluster resources collector (#424)
* ability to specify a list of namespaces for the cluster resources collector
2021-09-23 08:02:05 -07:00
divolgin
922f7c8b23 Merge pull request #425 from replicatedhq/divolgin/results
Analyzers should not return multiple results
2021-09-22 16:13:54 -07:00
divolgin
afa08e5362 Analyzers should not return multiple results 2021-09-22 22:50:38 +00:00
Dan Stough
614aed52c9 Merge pull request #422 from replicatedhq/dans/fix-clean-noninteractive-output
fix(support-bundle): no client-go warnings or control chars if noninteractive.
2021-09-22 13:38:43 -04:00
Dan Stough
72a50ee3f2 fix(support-bundle): no client-go warnings or control chars if noninteractive 2021-09-22 15:59:35 +00:00
Salah Aldeen Al Saleh
0c7fede7b6 check for nil analyzers (#421) 2021-09-21 12:12:10 -07:00
John Murphy
639bf7a832 Add signed SBOM to troubleshoot (#414)
This change will generate a signed software bill of materials and add it to the repository release archives when the project is released.
2021-09-21 13:55:41 -05:00
John Murphy
48287097d8 added email alias to code of conduct (#420) 2021-09-21 13:52:00 -05:00
divolgin
cb5ddf752f Merge pull request #419 from danbudris/machineReadableNonInteractiveOutput
make non-interactive `support-bundle` output more machine readable
2021-09-21 09:21:06 -07:00
danbudris
52e1a04f57 Merge branch 'machineReadableNonInteractiveOutput' of https://github.com/danbudris/troubleshoot into machineReadableNonInteractiveOutput 2021-09-17 11:21:34 -04:00
danbudris
5b4b548aa0 if interactive, only return the print archivePath to stdout; if non-interactive, print whole analysis as json 2021-09-17 11:20:39 -04:00
Daniel Budris
f2a232d174 use analyzerResults not analysis for key 2021-09-17 11:05:34 -04:00
danbudris
f4e675dae0 add json tags to output struct for easier unmarshalling 2021-09-17 10:57:52 -04:00
danbudris
867df407ea convert output bytearray to string before printing 2021-09-17 10:50:22 -04:00
danbudris
e0fb748498 move non-interactive output to discrete struct with marshalling methods; don't show output for non-interactive; format everything in JSON 2021-09-17 10:38:38 -04:00
danbudris
463783d2fa resolve merge conflicts 2021-09-15 21:25:15 -04:00
danbudris
2ce78ac33a Merge branch 'master' of https://github.com/replicatedhq/troubleshoot into machineReadableNonInteractiveOutput 2021-09-15 21:19:01 -04:00
danbudris
4cf0f5881d make non-interactive support-bundle output more machine readable
when using the `interactive=false` flag of `support-bundle`, the spinner would still spin and the archive path and analysis output were kind of smooshed together with the logs.

now, if `interactive=false`, only print each received collector callback message once, and don't spin

also, add a key to the archivePath and analyzerOutput that are returned, for easier programmatic parsing
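
a caller could then read the non-interactive output roughly as sketched below. the exact JSON shape is an assumption: the key names `archivePath` and `analyzerResults` are taken from the commit messages on this branch, while the sample path and the function name are made up for illustration.

```python
import json

# Hypothetical sample of the non-interactive stdout; only the two key names
# come from the commit messages, everything else is illustrative.
SAMPLE_STDOUT = '{"archivePath": "support-bundle.tar.gz", "analyzerResults": []}'


def parse_bundle_output(stdout):
    """Return (archive path, analyzer results) from the JSON on stdout."""
    doc = json.loads(stdout)
    # .get() keeps the sketch tolerant of a missing key.
    return doc.get("archivePath"), doc.get("analyzerResults") or []


archive_path, results = parse_bundle_output(SAMPLE_STDOUT)
print(archive_path, len(results))
```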
2021-09-15 20:58:09 -04:00
Salah Aldeen Al Saleh
465a533640 store analysis in the support bundle (#417)
* store analysis in the support bundle
v0.13.10
2021-09-10 11:58:16 -07:00
Andrew Reed
10785987c5 Merge pull request #415 from areed/areed/weave-analyzer
Weave report analyzers
v0.13.9
2021-09-09 12:47:51 -05:00
Andrew Reed
91eb94baaa Weave report analyzers
The IPAM pool analyzer checks that utilization of the pod IP subnet is
less than 85%. For example, if using 10.32.0.0/12, this analyzer will
warn if 3,482 IPs are currently allocated to pods.
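
The threshold check itself can be sketched as below. This is not the analyzer's code: the function name is hypothetical, treating exactly 85% as a warning is an assumption, and the 4,096-address pool in the usage lines is chosen only for illustration (3,482 is the smallest count that reaches 85% of 4,096).

```python
WARN_PERCENT = 85


def ipam_pool_warns(allocated, pool_size, warn_percent=WARN_PERCENT):
    """Return True when allocated IPs reach warn_percent of the pool."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return allocated * 100 >= warn_percent * pool_size


print(ipam_pool_warns(3482, 4096))
print(ipam_pool_warns(3481, 4096))
```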

The pending allocation analyzer checks that the IPAM status in the
report has no items for the PendingAllocates field. This indicates the
IPAM service is not ready according to the code in the weave status
template
e3712152d2/prog/weaver/http.go (L186).

The weave connections analyzer checks that all connections to remote
peers are in the established state. The state will be "pending" if UDP
is blocked between nodes and will be "failed" if the weave pod on the
remote node is in a crash loop. To force a pending state for testing,
run the commands `iptables -A INPUT -p udp --dport 6784 -j REJECT` and
`iptables -A INPUT -p udp --dport 6783 -j REJECT` on a peer.

The weave connections analyzer also checks that all connections are
using the fastdp protocol. A common issue seen in the field on
CentOS/RHEL 7 is that some sides of a connection are using fastdp and
other sides have fallen back to sleeve. Set the WEAVE_NO_FASTDP env var
on the weave daemonset to "true" to test this analyzer.
2021-09-08 21:29:38 +00:00
Andrew Lavery
1b65d1a544 Merge pull request #413 from replicatedhq/laverya/collect-jobs-and-cronjobs
collect jobs and cronjobs as part of cluster-resources
v0.13.8
2021-09-03 17:25:41 -04:00
Dan Stough
6e09aa641d Merge pull request #412 from replicatedhq/dans-chore-goreleaser-175-updates
chore(ci): update gorelease.yaml to use v175 syntax
2021-09-03 17:23:14 -04:00
Andrew Lavery
7fcc951c9a collect jobs and cronjobs as part of cluster-resources 2021-09-03 15:46:03 -05:00
Dan Stough
123e2e1049 chore(ci): update gorelease.yaml to use 175 syntax 2021-09-03 20:45:18 +00:00