* created roadmap and yaml claude agent * Update roadmap.md * chore(deps): bump sigstore/cosign-installer from 3.9.2 to 3.10.0 (#1857) Bumps [sigstore/cosign-installer](https://github.com/sigstore/cosign-installer) from 3.9.2 to 3.10.0. - [Release notes](https://github.com/sigstore/cosign-installer/releases) - [Commits](https://github.com/sigstore/cosign-installer/compare/v3.9.2...v3.10.0) --- updated-dependencies: - dependency-name: sigstore/cosign-installer dependency-version: 3.10.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump the security group with 2 updates (#1858) Bumps the security group with 2 updates: [github.com/vmware-tanzu/velero](https://github.com/vmware-tanzu/velero) and [helm.sh/helm/v3](https://github.com/helm/helm). Updates `github.com/vmware-tanzu/velero` from 1.16.2 to 1.17.0 - [Release notes](https://github.com/vmware-tanzu/velero/releases) - [Changelog](https://github.com/vmware-tanzu/velero/blob/main/CHANGELOG.md) - [Commits](https://github.com/vmware-tanzu/velero/compare/v1.16.2...v1.17.0) Updates `helm.sh/helm/v3` from 3.18.6 to 3.19.0 - [Release notes](https://github.com/helm/helm/releases) - [Commits](https://github.com/helm/helm/compare/v3.18.6...v3.19.0) --- updated-dependencies: - dependency-name: github.com/vmware-tanzu/velero dependency-version: 1.17.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: security - dependency-name: helm.sh/helm/v3 dependency-version: 3.19.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: security ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump helm.sh/helm/v3 from 3.18.6 to 3.19.0 in /examples/sdk/helm-template in the security group (#1859) chore(deps): bump helm.sh/helm/v3 Bumps the security group in /examples/sdk/helm-template with 1 update: [helm.sh/helm/v3](https://github.com/helm/helm). Updates `helm.sh/helm/v3` from 3.18.6 to 3.19.0 - [Release notes](https://github.com/helm/helm/releases) - [Commits](https://github.com/helm/helm/compare/v3.18.6...v3.19.0) --- updated-dependencies: - dependency-name: helm.sh/helm/v3 dependency-version: 3.19.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: security ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Add cron job support bundle scheduler Complete implementation with K8s integration: - pkg/schedule/job.go: Job management and persistence - pkg/schedule/daemon.go: Real-time scheduler daemon - pkg/schedule/cli.go: CLI commands (create, list, delete, daemon) - pkg/schedule/schedule_test.go: Comprehensive unit tests - cmd/troubleshoot/cli/root.go: CLI integration * fixing bugbot * Fix all bugbot errors: auto-update stability, job cooldown timing, and daemon execution * Deleting Agent * removed unused flags * fixing auto-upload * fixing markdown files * namespace not required flag for auto collectors to work * loosened cron job validation * writes logs to logfile * fix: resolve autoFromEnv variable scoping issue for CI - Ensure autoFromEnv variable and its usage are in correct scope - Fix build errors: declared and not used / undefined variable - All functionality preserved and tested locally - Force add to override gitignore --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Noah Campbell <noah.edward.campbell@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
28 KiB
Phased execution plan (actionable)
-
Foundation & policy (cross-cutting) • Goal: Establish non-negotiable engineering charters, error taxonomy, deterministic I/O, and output envelope. • Do: • Adopt items under “Cross-cutting engineering charters”. • Implement centralized error codes (see “1) Error codes (centralized)”). • Implement JSON output envelope (see “2) Output envelope (JSON mode)”). • Add idempotency key helper (see “3) Idempotency key”). • Ensure deterministic marshaling patterns (see “4) Deterministic marshaling”). • Define config precedence and env aliases (see section E) Config precedence & env aliases). • Add Make targets (see section F) Make targets). • Acceptance: • “Measurable add-on success criteria” items related to CLI output and determinism are satisfied.
-
Distribution & updates (installers, signing, updater) • Goal: Stop krew; ship Homebrew and curl|bash installers; add secure update with rollback. • Do: • Remove/retire krew guidance; add Homebrew formulas and curl|bash script(s). • Implement “C) Update system (secure + rollback)” including channels, rollback, tamper defense, delta updates (optional later). • Implement “Reproducible, signed, attestable releases” (SBOM, cosign, SLSA, SOURCE_DATE_EPOCH). • Add minimal packaging matrix validation for brew and curl|bash; expand later (see D) Packaging matrix validation (CI)). • Acceptance: • Users can install preflight and support-bundle via brew and curl|bash. • Updater supports --channel, verify, rollback; signatures verified per roadmap details.
-
API v1beta3 schemas and libraries • Goal: Define and own v1beta3 JSON Schemas and supporting defaulting/validation/conversion libraries within performance budgets. • Do: • Implement “API v1beta3 & schema work (deeper)” sections A–D (JSON Schema strategy; defaulting; validation; performance budget). • Add converters and fuzzers per “C) Converters robustness”. • Benchmarks per “D) Performance budget”. • Acceptance: • Schemas published under schemas.troubleshoot.sh/v1beta3/* with $id, $schema, $defs. • Validation/defaulting return structured errors; fuzz and perf budgets pass.
-
Preflight requirements disclosure command • Goal: Let customers preview requirements offline; render table/json/yaml/md; support templating values. • Do: • Implement “Preflight requirements disclosure (new command)” (
preflight requirements), including flags and behaviors. • Implement templating from “Preflight CLI: Values and --set support (templating)”. • Acceptance: • Output validates against docs/preflight-requirements.schema.json and renders within width targets. • Unit and golden tests for table/json/md; fuzz tests for extractor stability. -
Docs generator and portal gate/override • Goal: Generate preflight docs with rationale and support portal gate/override flow. • Do: • Implement “Preflight docs & portal flow (hardening)” sections A–D (merge engine, docs generator, portal client contract, E2E tests). • Ensure CLI prints requestId on error; implement backoff/idempotency per contract. • Acceptance: • E2E portal tests cover pass/fail/override/429/5xx with retries. • Docs generator emits MD/HTML with i18n hooks and template slots.
-
Simplified spec model: intents, presets, imports • Goal: Reduce authoring burden via intents for collect/analyze, redaction profiles with tokenize, and preset/import model. • Do: • Implement “Simplified spec model: intents, presets, imports”: intents.collect.auto; intents.analyze.requirements; redact.profile + tokenize; import/extends; selectors/filters; compatibility flags
--emitand--explain. • Provide examples and downgrade warnings for v1beta2 emit. • Acceptance: • Deterministic expansion demonstrated; explain output shows generated low-level spec; downgrade warnings reported where applicable. -
Public packages & ecosystem factoring • Goal: Establish stable package boundaries to support reuse and avoid logging in libs. • Do: • Create packages listed under “Public packages & ecosystem” (pkg/cli/contract, update, schema, specs/*, docs/render, portal/client). • Export minimal, stable APIs; return structured errors. • Acceptance: • api-diff green or change proposal attached.
-
CI/CD reinforcement • Goal: End-to-end pipelines for verification, install matrix, benchmarks, supply-chain, and releases. • Do: • Implement pipeline stages listed under “CI/CD reinforcement → Pipelines 1–5”. • Add static checks (revive/golangci-lint, api-diff rules) per roadmap. • Acceptance: • Pipelines green; supply chain artifacts (SBOM, cosign, SLSA) produced; release flow notarizes and publishes.
-
Testing strategy, determinism and performance harness, artifacts layout • Goal: Comprehensive unit/contract/fuzz/integration tests, deterministic outputs, and curated fixtures. • Do: • Implement “Testing strategy (Dev 1 scope)” (unit, contract/golden, fuzz/property, integration/matrix tests). • Implement “Determinism & performance” harness and budgets. • Organize artifacts per “Artifacts & layout” and add Make targets for test/fuzz/contracts/e2e/bench. • Acceptance: • Golden tests stable; determinism harness passes under SOURCE_DATE_EPOCH; benchmarks within budgets.
-
Packaging matrix expansion (optional later) • Goal: Expand beyond brew/curl to scoop and deb/rpm when desired. • Do: • Extend “D) Packaging matrix validation (CI)” to include scoop and deb/rpm installers and tests across OSes. • Acceptance: • Installers validated on ubuntu/macos/windows with smoke commands; macOS notarization verified.
Notes • Each phase references detailed specifications below. Implement phases in order; parallelize sub-items where safe. • If scope for an initial milestone is narrower (e.g., brew/curl only), mark the remaining items as deferred but keep tests/docs ready to expand.
Cross-cutting engineering charters
-
Contract/stability policy (one pager, checked into repo) • SemVer & windows: major.minor.patch; flags/commands stable for ≥2 minors; deprecations carry --explain-deprecations. • Breaking-change gate: PR must include contracts/CHANGE_PROPOSAL.md + updated goldens + migration notes. • Determinism: Same inputs ⇒ byte-identical outputs (normalized map ordering, sorted slices, stable timestamps with SOURCE_DATE_EPOCH).
-
Observability & diagnostics • Structured logs (zerolog/zap): --log-format {text,json}, --log-level {info,debug,trace}. • Exit code taxonomy: 0 ok, 1 generic, 2 usage, 3 network, 4 schema, 5 incompatible-api, 6 update-failed, 7 permission, 8 partial-success. • OTel hooks (behind TROUBLESHOOT_OTEL_ENDPOINT): span “loadSpec”, “mergeSpec”, “runPreflight”, “uploadPortal”.
-
Reproducible, signed, attestable releases • SBOM (cyclonedx/spdx) emitted by GoReleaser. • cosign: sign archives + checksums.txt; produce SLSA provenance attestation. • SOURCE_DATE_EPOCH set in CI to pin archive mtimes.
CLI contracts & packaging (more depth)
A) Machine-readable CLI spec • Generate docs/cli-contracts.json from Cobra tree (name, synopsis, flags, defaults, env aliases, deprecation). • Validate at runtime when TROUBLESHOOT_DEBUG_CONTRACT=1 to catch drift in dev builds. • Use that JSON to: • Autogenerate shell completions for bash/zsh/fish/pwsh. • Render the --help text (single source of truth).
B) UX hardening • TTY detection: progress bars only on TTY; --no-progress to force off. • Color policy: --color {auto,always,never} + NO_COLOR env respected. • Output mode: --output {human,json,yaml} for all read commands. For json, include a top-level "schemaVersion": "cli.v1".
C) Update system (secure + rollback) • Channel support: --channel {stable,rc,nightly} (maps to tags: vX.Y.Z, vX.Y.Z-rc.N, nightly-YYYYMMDD). • Rollback: keep N=2 previous binaries under ~/.troubleshoot/bin/versions/…; preflight update --rollback. • Tamper defense: verify cosign sig for checksums.txt; verify SHA256 of selected asset; fail closed with error code 6. • Delta updates (optional later): if asset .patch exists and base version matches, apply bsdiff; fallback to full.
D) Packaging matrix validation (CI) • Matrix test on ubuntu-latest, macos-latest, windows-latest: • Install via brew, scoop, deb/rpm, curl|bash; then run preflight --version and a sample command. • Gatekeeper: spctl -a -v on macOS; print notarization ticket.
E) Config precedence & env aliases • Per-binary config paths (defaults): • macOS/Linux: • preflight: ~/.config/preflight/config.yaml • support-bundle: ~/.config/support-bundle/config.yaml • Windows: • preflight: %APPDATA%\Troubleshoot\Preflight\config.yaml • support-bundle: %APPDATA%\Troubleshoot\SupportBundle\config.yaml • Optional global fallback (lower precedence): ~/.config/troubleshoot/config.yaml • Precedence: flag > binary env > global env > binary config > global config > default • --config overrides discovery; respects XDG_CONFIG_HOME (Unix) and APPDATA (Windows) • Env aliases: • Global: TROUBLESHOOT_PORTAL_URL, TROUBLESHOOT_API_TOKEN • Binary-scoped: PREFLIGHT_* and SUPPORT_BUNDLE_* (take precedence over TROUBLESHOOT_*)
F) Make targets
make contracts # regen CLI JSON + goldens make sbom # build SBOMs make release-dryrun # goreleaser --skip-publish make e2e-install # spins a container farm to test deb/rpm
API v1beta3 & schema work (deeper)
A) JSON Schema strategy • Give every schema an $id and $schema; publish at schemas.troubleshoot.sh/v1beta3/*.json. • Use $defs for shared primitives (Quantity, Duration, CPUSet, Selector). • Add x-kubernetes-validations parity constraints where applicable (even if not applying as CRD).
B) Defaulting & validation library • pkg/validation/validate.go: returns []FieldError with JSONPointer paths and machine codes. • pkg/defaults/defaults.go: idempotent defaulting; fuzz tests prove no oscillation (fuzz: in -> default -> default == default).
C) Converters robustness • Fuzzers (go1.20+): generate random v1beta1/2 structs, convert→internal→v1beta3→internal and assert invariants (lossless roundtrips where representable). • Report downgrade loss: if v1beta3→v1beta2 drops info, print warning list to stderr and annotate output with x-downgrade-warnings.
D) Performance budget • Load+validate 1MB spec ≤ 150ms p95, 10MB ≤ 800ms p95 on GOARCH=amd64 GitHub runner. • Benchmarks in pkg/apis/bench_test.go enforce budgets.
E) Simplified spec model: intents, presets, imports • Problem: vendors handwrite verbose collector/analyzer lists. Goal: smaller, intent-driven specs that expand deterministically. • Tenets: • Additive, backwards-compatible; loader can expand intents into concrete v1beta2-equivalent structures. • Deterministic expansion (same inputs ⇒ same expansion) with --explain to show the generated low-level spec. • Shorthand over raw lists: “what” not “how”. • Top-level additions (v1beta3): • intents.collect.auto: namespace, profiles, includeKinds, excludeKinds, selectors, size caps. • intents.analyze.requirements: high-level checks (k8sVersion, nodes.cpu/memory, podsReady, storageClass, CRDsPresent…). • redact.profile + tokenize: standard|strict; optional token map emission. • import: versioned presets (preset://k8s/basic@v1) with local vendoring. • extends: URL or preset to inherit from, with override blocks. • Selectors & filters: • labelSelector, fieldSelector, name/glob filters; include/exclude precedence clarified in schema docs. • Compatibility: • --emit v1beta2 to produce a concrete legacy spec; downgrade warnings if some intent can’t fully map. • --explain prints the expanded collectors/analyzers to aid review and vendoring. • Example: Preflight with requirements + docs
apiVersion: troubleshoot.sh/v1beta3
kind: Preflight
metadata:
name: example
requirements:
- name: Baseline
docString: "Core Kubernetes and cluster requirements."
checks:
- clusterVersion:
checkName: Kubernetes version
outcomes:
- fail:
when: "< 1.20.0"
message: This application requires at least Kubernetes 1.20.0, and recommends 1.22.0.
uri: https://kubernetes.io
- warn:
when: "< 1.22.0"
message: Your cluster meets the minimum version of Kubernetes, but we recommend you update to 1.22.0 or later.
uri: https://kubernetes.io
- pass:
when: ">= 1.22.0"
message: Your cluster meets the recommended and required versions of Kubernetes.
- customResourceDefinition:
checkName: Ingress
customResourceDefinitionName: ingressroutes.contour.heptio.com
outcomes:
- fail:
message: Contour ingress not found!
- pass:
message: Contour ingress found!
- containerRuntime:
outcomes:
- pass:
when: "== containerd"
message: containerd container runtime was found.
- fail:
message: Did not find containerd container runtime.
- storageClass:
checkName: Required storage classes
storageClassName: "default"
outcomes:
- fail:
message: Could not find a storage class called default.
- pass:
message: All good on storage classes
- distribution:
outcomes:
- fail:
when: "== docker-desktop"
message: The application does not support Docker Desktop Clusters
- fail:
when: "== microk8s"
message: The application does not support Microk8s Clusters
- fail:
when: "== minikube"
message: The application does not support Minikube Clusters
- pass:
when: "== eks"
message: EKS is a supported distribution
- pass:
when: "== gke"
message: GKE is a supported distribution
- pass:
when: "== aks"
message: AKS is a supported distribution
- pass:
when: "== kurl"
message: KURL is a supported distribution
- pass:
when: "== digitalocean"
message: DigitalOcean is a supported distribution
- pass:
when: "== rke2"
message: RKE2 is a supported distribution
- pass:
when: "== k3s"
message: K3S is a supported distribution
- pass:
when: "== oke"
message: OKE is a supported distribution
- pass:
when: "== kind"
message: Kind is a supported distribution
- warn:
message: Unable to determine the distribution of Kubernetes
- nodeResources:
checkName: Must have at least 3 nodes in the cluster, with 5 recommended
outcomes:
- fail:
when: "count() < 3"
message: This application requires at least 3 nodes.
uri: https://kurl.sh/docs/install-with-kurl/adding-nodes
- warn:
when: "count() < 5"
message: This application recommends at last 5 nodes.
uri: https://kurl.sh/docs/install-with-kurl/adding-nodes
- pass:
message: This cluster has enough nodes.
- nodeResources:
checkName: Every node in the cluster must have at least 8 GB of memory, with 32 GB recommended
outcomes:
- fail:
when: "min(memoryCapacity) < 8Gi"
message: All nodes must have at least 8 GB of memory.
uri: https://kurl.sh/docs/install-with-kurl/system-requirements
- warn:
when: "min(memoryCapacity) < 32Gi"
message: All nodes are recommended to have at least 32 GB of memory.
uri: https://kurl.sh/docs/install-with-kurl/system-requirements
- pass:
message: All nodes have at least 32 GB of memory.
- nodeResources:
checkName: Total CPU Cores in the cluster is 4 or greater
outcomes:
- fail:
when: "sum(cpuCapacity) < 4"
message: The cluster must contain at least 4 cores
uri: https://kurl.sh/docs/install-with-kurl/system-requirements
- pass:
message: There are at least 4 cores in the cluster
- nodeResources:
checkName: Every node in the cluster must have at least 40 GB of ephemeral storage, with 100 GB recommended
outcomes:
- fail:
when: "min(ephemeralStorageCapacity) < 40Gi"
message: All nodes must have at least 40 GB of ephemeral storage.
uri: https://kurl.sh/docs/install-with-kurl/system-requirements
- warn:
when: "min(ephemeralStorageCapacity) < 100Gi"
message: All nodes are recommended to have at least 100 GB of ephemeral storage.
uri: https://kurl.sh/docs/install-with-kurl/system-requirements
- pass:
message: All nodes have at least 100 GB of ephemeral storage.
{{- if eq .Values.postgres.enabled true }}
- name: Postgres
docString: "Postgres needs a storage class and sufficient memory."
checks:
- storageClass:
checkName: Postgres storage class
name: "{{ .Values.postgres.storageClassName | default \"default\" }}"
required: true
- nodeResources:
checkName: Postgres memory guidance
outcomes:
- fail:
when: "min(memoryCapacity) < 8Gi"
message: All nodes must have at least 8 GB of memory for Postgres.
- warn:
when: "min(memoryCapacity) < 32Gi"
message: Nodes are recommended to have at least 32 GB of memory for Postgres.
- pass:
message: Nodes have sufficient memory for Postgres.
{{- end }}
{{- if eq .Values.redis.enabled true }}
- name: Redis
docString: "Redis needs a storage class and adequate ephemeral storage."
checks:
- storageClass:
checkName: Redis storage class
name: "{{ .Values.redis.storageClassName | default \"default\" }}"
required: true
- nodeResources:
checkName: Redis ephemeral storage
outcomes:
- fail:
when: "min(ephemeralStorageCapacity) < 40Gi"
message: All nodes must have at least 40 GB of ephemeral storage for Redis.
- warn:
when: "min(ephemeralStorageCapacity) < 100Gi"
message: Nodes are recommended to have at least 100 GB of ephemeral storage for Redis.
- pass:
message: Nodes have sufficient ephemeral storage for Redis.
{{- end }}
• Presets library:
• Versioned URIs (e.g., preset://k8s/basic@v1, preset://app/logs@v1) maintained in-repo and publishable.
• "troubleshoot vendor --import" downloads presets to ./vendor/troubleshoot/ for offline builds.
Preflight docs & portal flow (hardening)
A) Merge engine details • Stable key = GroupKind/Name[/Namespace] (e.g., NodeResource/CPU, FilePermission//etc/hosts). • Conflict detection emits a list with reasons: “same key, differing fields: thresholds.min, description”. • Provenance captured on each merged node: • troubleshoot.sh/provenance: vendor|replicated|merged • troubleshoot.sh/merge-conflict: "thresholds.min, description"
B) Docs generator upgrades • Template slots: why, riskLevel {low,med,high}, owner, runbookURL, estimatedTime. • i18n hooks: template lookup by locale --locale es-ES falls back to en-US. • Output MD + self-contained HTML (inline CSS) when --html. --toc adds a nav sidebar.
C) Portal client contract • Auth: Bearer ; optional mTLS later. • Idempotency: Idempotency-Key header derived from spec SHA256. • Backoff: exponential jitter (100ms → 3s, 6 tries) on 429/5xx; code 3 on exhaustion. • Response model:
{ "requestId": "r_abc123", "decision": "pass|override|fail", "reason": "text", "policyVersion": "2025-09-01" }
• CLI prints requestId on error for support.
D) E2E tests (httptest.Server) • Scenarios: pass, fail, override, 429 with retry-after, 5xx flake, invalid JSON. • Golden transcripts of HTTP exchanges under testdata/e2e/portal.
Public packages & ecosystem
A) Package boundaries
pkg/ cli/contract # cobra->json exporter (no cobra import cycles) update/ # channel, verify, rollback schema/ # embed.FS of JSON Schemas + helpers specs/loader # version sniffing, load any -> internal specs/convert # converters specs/validate # validation library docs/render # md/html generation portal/client # http client + types
• No logging in libs; return structured errors with codes; callers log.
B) SARIF export (nice-to-have) • --output sarif for preflight results so CI systems ingest findings.
C) Back-compat façade • For integrators, add tiny shim: pkg/legacy/v1beta2loader that calls new loader + converter; mark with Deprecated: GoDoc but stable for a window.
CI/CD reinforcement
Pipelines 1. verify: lint, unit, fuzz (short), contracts, schemas → required. 2. matrix-install: brew/scoop/deb/rpm/curl on 3 OSes. 3. bench: enforce perf budgets. 4. supply-chain: build SBOM, cosign sign/verify, slsa attestation. 5. release (tagged): goreleaser publish, notarize, bump brew/scoop, attach SBOM, cosign attest.
Static checks • revive/golangci-lint with a rule to forbid time.Now() in pure functions; must use injected clock. • api-diff: compare exported pkg/** against last tag; fails on breaking changes without contracts/CHANGE_PROPOSAL.md.
- Error codes (centralized)
package xerr type Code int const ( OK Code = iota Usage Network Schema IncompatibleAPI UpdateFailed Permission Partial ) type E struct { Code Code; Op, Msg string; Err error } func (e E) Error() string { return e.Msg } func CodeOf(err error) Code { / unwrap */ }
- Output envelope (JSON mode)
{ "schemaVersion": "cli.v1", "tool": "preflight", "version": "1.12.0", "timestamp": "2025-09-09T17:02:33Z", "result": { /* command-specific */ }, "warnings": [], "errors": [] }
- Idempotency key
func idemKey(spec []byte) string { sum := sha256.Sum256(spec) return hex.EncodeToString(sum[:]) }
- Deterministic marshaling
enc := json.NewEncoder(w) enc.SetEscapeHTML(false) enc.SetIndent("", " ") sort.SliceStable(obj.Items, func(i,j int) bool { return obj.Items[i].Name < obj.Items[j].Name })
Measurable add-on success criteria • preflight --help --output json validates against docs/cli-contracts.schema.json. • make bench passes with stated p95 budgets. • cosign verify-blob succeeds for checksums.txt in CI and on dev machines (doc’d). • E2E portal tests cover all decision branches and 429/5xx paths with retries observed. • api-diff is green or has an attached change proposal.
Testing strategy (Dev 1 scope)
Unit tests
• CLI arg parsing: Cobra ExecuteC with table-driven flag sets for both binaries.
• Config precedence resolver: tmp dirs + OS-specific cases (XDG_CONFIG_HOME/APPDATA).
• Validation/defaulting libraries: happy/edge cases; structured []FieldError assertions.
• Portal client: httptest.Server scenarios (pass/fail/override/429/5xx) with retry/backoff checks.
• Updater: mock release index; cosign verify using test keys; rollback success/failure paths.
Contract/golden tests
• CLI contracts: generate docs/cli-contracts.json and compare to goldens; update via make contracts.
• --help rendering snapshots (normalized width/colors) for core commands.
• Schemas: validate example specs against v1beta3 JSON Schemas; store fixtures in testdata/schemas/.
• Docs generator: preflight-docs.md/HTML goldens for sample merged specs with provenance.
Fuzz/property tests
• Converters: v1beta1/2→internal→v1beta3→internal round-trip fuzz; invariants enforced.
• Defaulting idempotence: default(default(x)) == default(x).
Integration/matrix tests
• Installers: brew/scoop/deb/rpm/curl on ubuntu/macos/windows; run preflight/support-bundle --version and a smoke command.
• macOS notarization: spctl -a -v on built binaries.
• Updater E2E: start mock release server, switch channels, rollback, tamper-detection failure.
Determinism & performance
• Deterministic outputs under SOURCE_DATE_EPOCH; byte-for-byte stable archives in a test harness.
• Benchmarks: load+validate budgets (latency + RSS) enforced via go test -bench and thresholds.
Artifacts & layout
• Fixtures under testdata/: schemas/, cli/, docs/, portal/, updater/ with README explaining regeneration.
• Make targets: make test, make fuzz-short, make contracts, make e2e-install, make bench.
Preflight CLI: Values and --set support (templating)
• Goal: Let end customers pass Values at runtime to drive a single modular YAML with conditionals.
• Scope: preflight gains --values (repeatable) and --set key=value (repeatable), rendered over the input YAML before loading specs.
• Template engine: Go text/template + Sprig, with .Values bound. Standard delimiters {{ }}.
• Precedence:
• --set overrides everything (last one wins when repeated)
• Later --values files override earlier ones (left-to-right deep merge)
• Defaults embedded in the YAML are lowest precedence
• Merge:
• Maps: deep-merge
• Slices: replace (whole list)
• Types:
• true|false parsed as bool, numbers as float/int when unquoted, everything else as string
• Use quotes to force string: --set image.tag="1.2.3"
Example usage
# combine file values with inline overrides
preflight ./some-preflight-checks.yaml \
--values ./values.yaml \
--values ./values-prod.yaml \
--set postgres.enabled=true \
--set redis.enabled=false
Minimal Values schema (illustrative)
postgres:
enabled: false
storageClassName: default
redis:
enabled: true
storageClassName: default
Single-file modular YAML authoring pattern
apiVersion: troubleshoot.sh/v1beta3
kind: Preflight
metadata:
name: example
requirements:
- name: Baseline
docString: "Core Kubernetes requirements."
checks:
- k8sVersion: ">=1.22"
- distribution:
allow: [eks, gke, aks, kurl, digitalocean, rke2, k3s, oke, kind]
deny: [docker-desktop, microk8s, minikube]
- storageClass:
name: "default"
required: true
{{- if eq .Values.postgres.enabled true }}
- name: Postgres
docString: "Postgres needs a storage class and sufficient memory."
checks:
- storageClass:
name: "{{ .Values.postgres.storageClassName | default \"default\" }}"
required: true
- nodes:
memoryPerNode: ">=8Gi"
recommendMemoryPerNode: ">=32Gi"
{{- end }}
{{- if eq .Values.redis.enabled true }}
- name: Redis
docString: "Redis needs a storage class and adequate ephemeral storage."
checks:
- storageClass:
name: "{{ .Values.redis.storageClassName | default \"default\" }}"
required: true
- nodes:
ephemeralPerNode: ">=40Gi"
recommendEphemeralPerNode: ">=100Gi"
{{- end }}
Notes
• Keep everything in one YAML; conditionals gate entire requirement blocks.
• Authors can still drop down to raw analyzers; the renderer runs before spec parsing, so both styles work.
• Add --dry-run to print the rendered spec without executing checks.