* Auto-collectors: foundational discovery, image metadata, CLI integration; reset PRD markers * Address PR review feedback - Implement missing namespace exclude patterns functionality - Fix image facts collector to use empty Data field instead of static string - Correct APIVersion to use troubleshoot.sh/v1beta2 consistently * Fix bug bot issues: API parsing, EOF error, and API group corrections - Fix RBAC API parsing errors in rbac_checker.go (getAPIGroup/getAPIVersion functions) - Fix FakeReader EOF error to use standard io.EOF instead of custom error - Fix incorrect API group from troubleshoot.sh to troubleshoot.replicated.com in run.go These changes address the issues identified by the bug bot and ensure proper interface compliance and consistent API group usage. * Fix multiple bug bot issues - Fix RBAC API parsing errors in rbac_checker.go (getAPIGroup/getAPIVersion functions) - Fix FakeReader EOF error to use standard io.EOF instead of custom error - Fix incorrect API group from troubleshoot.sh to troubleshoot.replicated.com in run.go - Fix image facts collector Data field to contain structured JSON instead of static strings These changes address all issues identified by the bug bot and ensure proper interface compliance, consistent API usage, and meaningful data fields. * Update auto_discovery.go * Fix TODO comments in Auto-collector section Fixed 3 of 4 TODOs as requested in PR review: 1. pkg/collect/images/registry_client.go (line 46): - Implement custom CA certificate loading - Add x509 import and certificate parsing logic - Enables image collection from private registries with custom CAs 2. cmd/troubleshoot/cli/diff.go (line 209): - Implement bundle file count functionality - Add tar/gzip imports and getFileCountFromBundle() function - Properly counts files in support bundle archives (.gz/.tgz) 3. cmd/troubleshoot/cli/run.go (line 338): - Replace TODO with clarifying comment about RemoteCollectors usage - Confirmed RemoteCollectors are still actively used in preflights The 4th TODO (diff.go line 196) is left as-is since it's explicitly marked as Phase 4 future work (Support Bundle Differencing implementation). Addresses PR review feedback about unimplemented TODO comments. --------- Co-authored-by: Benjamin Yang <benjaminyang@Benjamins-MacBook-Pro.local>
Replicated Troubleshoot
Replicated Troubleshoot is a framework for collecting, redacting, and analyzing highly customizable diagnostic information about a Kubernetes cluster. Troubleshoot specs are created by 3rd-party application developers/maintainers and run by cluster operators in the initial and ongoing operation of those applications.
Troubleshoot provides two CLI tools as kubectl plugins (using Krew): kubectl preflight and kubectl support-bundle. Preflight provides pre-installation cluster conformance testing and validation (preflight checks) and support-bundle provides post-installation troubleshooting and diagnostics (support bundles).
To know more about troubleshoot, please visit: https://troubleshoot.sh/
Preflight Checks
Preflight checks are an easy-to-run set of conformance tests that can be written to verify that specific requirements in a cluster are met.
To run a sample preflight check from a sample application, install the preflight kubectl plugin:
curl https://krew.sh/preflight | bash
and run, where https://preflight.replicated.com provides an example preflight spec:
kubectl preflight https://preflight.replicated.com
NOTE this is an example. Do not use to validate real scenarios.
For more details on creating the custom resource files that drive preflight checks, visit creating preflight checks.
Support Bundle
A support bundle is an archive that's created in-cluster, by collecting logs and cluster information, and executing specified commands (including redaction of sensitive information). After creating a support bundle, the cluster operator will normally deliver it to the 3rd-party application vendor for analysis and disconnected debugging. Another Replicated project, KOTS, provides k8s apps an in-cluster UI for processing support bundles and viewing analyzers (as well as support bundle collection).
To collect a sample support bundle, install the troubleshoot kubectl plugin:
curl https://krew.sh/support-bundle | bash
and run, where https://support-bundle.replicated.com provides an example support bundle spec:
kubectl support-bundle https://support-bundle.replicated.com
NOTE this is an example. Do not use to validate real scenarios.
For more details on creating the custom resource files that drive support-bundle collection, visit creating collectors and creating analyzers.
And see our other tool sbctl that makes it easier to interact with support bundles using kubectl commands you already know
Community
For questions about using Troubleshoot, how to contribute and engaging with the project in any other way, please refer to the following resources and channels.
- Replicated Community forum
- #app-troubleshoot channel in Kubernetes Slack
- #Community meetings calendar. This happen monthly but dates may change and would be kept upto date in the calendar.
Software Bill of Materials
A signed SBOM that includes Troubleshoot dependencies is included in each release.
- troubleshoot-sbom.tgz contains a software bill of materials for Troubleshoot.
- troubleshoot-sbom.tgz.sig is the digital signature for troubleshoot-sbom.tgz
- key.pub is the public key from the key pair used to sign troubleshoot-sbom.tgz
The following example illustrates using cosign to verify that troubleshoot-sbom.tgz has not been tampered with.
$ cosign verify-blob --key key.pub --signature troubleshoot-sbom.tgz.sig troubleshoot-sbom.tgz
Verified OK
If you were to get an error similar to the one below, it means you are verifying an SBOM signed using cosign v1 using a newer v2 of the binary. This version introduced breaking changes which require an additional flag --insecure-ignore-tlog=true to successfully verify SBOMs like so.
$ cosign verify-blob --key key.pub --signature troubleshoot-sbom.tgz.sig troubleshoot-sbom.tgz --insecure-ignore-tlog=true
WARNING: Skipping tlog verification is an insecure practice that lacks of transparency and auditability verification for the blob.
Verified OK