Simon Croome 977fc438ea Remote host collectors (#392)
* Add collect command and remote host collectors

Adds the ability to run a host collector on a set of remote k8s nodes.
Target nodes can be filtered using the --selector flag, with the same
syntax as kubectl.  Existing flags for --collector-image,
--collector-pullpolicy and --request-timeout are used.  To run on a
specified node, --selector="kubernetes.io/hostname=kind-worker2" could
be used.
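
As a rough illustration of how a selector narrows the set of target nodes, the sketch below matches node labels against an equality-only selector. It is not the code from this change: `parseSelector` and `matches` are hypothetical helpers, only the `key=value` subset of the kubectl syntax is handled, and the node labels in `main` are assumed values for a kind cluster.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSelector (hypothetical helper) parses "k1=v1,k2=v2" into a map.
// Only single "=" equality terms are supported; anything else is rejected.
func parseSelector(selector string) (map[string]string, error) {
	want := map[string]string{}
	if strings.TrimSpace(selector) == "" {
		return want, nil // empty selector matches every node
	}
	for _, term := range strings.Split(selector, ",") {
		if strings.Contains(term, "!=") || strings.Contains(term, "==") || !strings.Contains(term, "=") {
			return nil, fmt.Errorf("unsupported selector term %q", term)
		}
		parts := strings.SplitN(term, "=", 2)
		want[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
	}
	return want, nil
}

// matches reports whether labels contain every key/value pair in want.
func matches(labels, want map[string]string) bool {
	for k, v := range want {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	// Assumed node labels, for illustration only.
	nodes := map[string]map[string]string{
		"kind-worker":  {"kubernetes.io/hostname": "kind-worker"},
		"kind-worker2": {"kubernetes.io/hostname": "kind-worker2"},
	}
	want, err := parseSelector("kubernetes.io/hostname=kind-worker2")
	if err != nil {
		panic(err)
	}
	for name, labels := range nodes {
		if matches(labels, want) {
			fmt.Println("selected:", name)
		}
	}
}
```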

The collect command is used by the remote collector to output the
results in a "raw" format, which uses the filename as the key and the
output, as an escaped JSON string, as the value.  When run manually it
defaults to fully decoded JSON.  The existing block devices,
ipv4interfaces and services host collectors don't decode properly; the
fix is to convert their slice output to a map (fix not included, as it
is unclear what depends on the existing format).
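
The difference between the raw and decoded forms can be sketched as follows. This is a minimal illustration, not the command's implementation: `decodeRaw` is a hypothetical helper, it assumes each value is decoded into a JSON object, and the `system/block_devices.json` filename and its contents are invented to show why slice output would fail under that assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeRaw (hypothetical helper) turns the raw form, filename -> escaped
// JSON string, into the decoded form, filename -> JSON object.
func decodeRaw(raw map[string]string) (map[string]map[string]interface{}, error) {
	decoded := make(map[string]map[string]interface{}, len(raw))
	for filename, escaped := range raw {
		var obj map[string]interface{}
		if err := json.Unmarshal([]byte(escaped), &obj); err != nil {
			return nil, fmt.Errorf("decode %s: %w", filename, err)
		}
		decoded[filename] = obj
	}
	return decoded, nil
}

func main() {
	// Object output decodes cleanly.
	decoded, err := decodeRaw(map[string]string{
		"system/memory.json": `{"total": 1304207360}`,
	})
	fmt.Println(decoded, err)

	// Slice (JSON array) output cannot be decoded into a map, so an
	// error is returned. Filename and payload are assumed for illustration.
	_, err = decodeRaw(map[string]string{
		"system/block_devices.json": `[{"name": "sda"}]`,
	})
	fmt.Println("slice output failed to decode:", err != nil)
}
```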

The collect command is also useful for troubleshooting preflight issues.

Examples are included to show remote collector usage.

```
bin/collect --collector-image=croomes/troubleshoot:latest  examples/collect/remote/memory.yaml --namespace test
{
  "kind-control-plane": {
    "system/memory.json": {
      "total": 1304207360
    }
  },
  "kind-worker": {
    "system/memory.json": {
      "total": 1695780864
    }
  },
  "kind-worker2": {
    "system/memory.json": {
      "total": 1726353408
    }
  }
}
```

The preflight command has been updated to run remote collectors.  To run
a host collector remotely it must be specified in the spec as a
`remoteCollector`:

```
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: memory
spec:
  remoteCollectors:
    - memory:
        collectorName: memory
  analyzers:
    - memory:
        outcomes:
          - fail:
              when: "< 8Gi"
              message: At least 8Gi of memory is required
          - warn:
              when: "< 32Gi"
              message: At least 32Gi of memory is recommended
          - pass:
              message: The system has sufficient memory
```

Results for each node are analyzed separately, with the node name
appended to the title:

```
bin/preflight --interactive=false --collector-image=croomes/troubleshoot:latest examples/preflight/remote/memory.yaml --format=json
{memory running 0 1}
{memory completed 1 1}
{
  "fail": [
    {
      "title": "Amount of Memory (kind-worker2)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-worker)",
      "message": "At least 8Gi of memory is required"
    },
    {
      "title": "Amount of Memory (kind-control-plane)",
      "message": "At least 8Gi of memory is required"
    }
  ]
}
```
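
The per-node analysis can be sketched roughly as below, assuming the thresholds from the spec above are evaluated in order (fail, then warn, then pass) against each node's total memory. `analyzeMemory` and the `result` type are hypothetical names for illustration and are not taken from the Troubleshoot code; the totals in `main` are the values from the collect example.

```go
package main

import "fmt"

const gi = 1 << 30 // bytes in one gibibyte

// result is a hypothetical, simplified analyzer result.
type result struct {
	Outcome string // "fail", "warn" or "pass"
	Title   string
	Message string
}

// analyzeMemory applies the 8Gi/32Gi thresholds to one node and appends
// the node name to the title.
func analyzeMemory(node string, totalBytes uint64) result {
	title := fmt.Sprintf("Amount of Memory (%s)", node)
	switch {
	case totalBytes < 8*gi:
		return result{"fail", title, "At least 8Gi of memory is required"}
	case totalBytes < 32*gi:
		return result{"warn", title, "At least 32Gi of memory is recommended"}
	default:
		return result{"pass", title, "The system has sufficient memory"}
	}
}

func main() {
	// Totals reported by the memory collector in the collect example.
	totals := map[string]uint64{
		"kind-control-plane": 1304207360,
		"kind-worker":        1695780864,
		"kind-worker2":       1726353408,
	}
	// Each node is analyzed on its own; map iteration order is not fixed.
	for node, total := range totals {
		r := analyzeMemory(node, total)
		fmt.Printf("%s: %s: %s\n", r.Outcome, r.Title, r.Message)
	}
}
```

All three nodes in the example have well under 8Gi, which is why every result in the preflight output lands under `fail`.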

Also added a host collector to allow preflight checks of required kernel
modules, which is the main driver for this change.

Replicated Troubleshoot

Replicated Troubleshoot is a framework for collecting, redacting, and analyzing highly customizable diagnostic information about a Kubernetes cluster. Troubleshoot specs are created by 3rd-party application developers/maintainers and run by cluster operators in the initial and ongoing operation of those applications.

Troubleshoot provides two CLI tools as kubectl plugins (using Krew): kubectl preflight and kubectl support-bundle. Preflight provides pre-installation cluster conformance testing and validation (preflight checks) and support-bundle provides post-installation troubleshooting and diagnostics (support bundles).

Preflight Checks

Preflight checks are an easy-to-run set of conformance tests that can be written to verify that specific requirements in a cluster are met.

To run a sample preflight check from a sample application, install the preflight kubectl plugin:

curl https://krew.sh/preflight | bash

and run:

kubectl preflight https://preflight.replicated.com

For more details on creating the custom resource files that drive preflight checks, visit creating preflight checks.

Support Bundle

A support bundle is an archive that's created in-cluster, by collecting logs and cluster information, and executing specified commands (including redaction of sensitive information). After creating a support bundle, the cluster operator will normally deliver it to the 3rd-party application vendor for analysis and disconnected debugging. Another Replicated project, KOTS, provides k8s apps an in-cluster UI for processing support bundles and viewing analyzers (as well as support bundle collection).

To collect a sample support bundle, install the troubleshoot kubectl plugin:

curl https://krew.sh/support-bundle | bash

and run:

kubectl support-bundle https://support-bundle.replicated.com

For more details on creating the custom resource files that drive support-bundle collection, visit creating collectors and creating analyzers.

Community

For questions about using Troubleshoot, there's a Replicated Community forum, and a #app-troubleshoot channel in Kubernetes Slack.

Software Bill of Materials

A signed SBOM that includes Troubleshoot dependencies is included in each release.

  • troubleshoot-sbom.tgz contains a software bill of materials for Troubleshoot.
  • troubleshoot-sbom.tgz.sig is the digital signature for troubleshoot-sbom.tgz.
  • key.pub is the public key from the key pair used to sign troubleshoot-sbom.tgz.

The following example illustrates using cosign to verify that troubleshoot-sbom.tgz has not been tampered with.

$ cosign verify-blob -key key.pub -signature troubleshoot-sbom.tgz.sig troubleshoot-sbom.tgz
Verified OK