mirror of
https://github.com/replicatedhq/troubleshoot.git
synced 2026-02-14 18:29:53 +00:00
docs: refine sbctl bundle contract design
Design clarifications:
- sbctl remains separate binary, imports shared libs from troubleshoot
- Add bundle schema versioning to discovery.json
- Add CRD table collection as explicit requirement
- Add failsafe collection guarantee (table failures never block raw)
- Add hybrid field selector discovery (runtime + static fallback)
- Add Limitations section (watch, subresources, writes, pagination)
- Add Testing Strategy section

Copyediting:
- Apply Strunk's Elements of Style throughout
- Use active voice consistently
- Remove hedging and redundant phrases
- Tighten prose for clarity
@@ -1,4 +1,4 @@
# Design: Shared Bundle Contract for sbctl and Troubleshoot

**Date:** 2026-01-22

**Status:** Draft
@@ -6,34 +6,35 @@

## Problem Statement

sbctl is a companion tool to troubleshoot that creates a local Kubernetes API server from support bundles, enabling users to run `kubectl` commands against captured cluster state. Currently:

1. **Maintenance burden**: sbctl requires constant updates to stay in sync with troubleshoot's collector changes
2. **No formal contract**: No schema defines how troubleshoot writes resources or how sbctl reads them
3. **Type-specific code**: sbctl has ~1800 lines of switch statements handling each resource type individually
4. **Drift risk**: As troubleshoot adds capabilities, sbctl may not reflect them, causing inconsistent behavior

## Goals

1. **Eliminate maintenance burden** - sbctl should not require updates when troubleshoot adds collectors or changes file structure
2. **Enable dynamic resource support** - sbctl should serve any Kubernetes resource troubleshoot collects, without code changes
3. **Perfect fidelity** - sbctl output must match what a real Kubernetes API server would return; debuggers must trust the output
4. **Backwards compatibility** - New and old versions of bundles and sbctl must interoperate

## Non-Goals

- Modifying the sbctl CLI interface
- Supporting write operations
- Connecting to live clusters

## Solution Overview

A new `pkg/bundle` package in troubleshoot defines the contract between collection and serving. sbctl remains a separate binary but imports shared libraries from troubleshoot. Key changes:

1. **Pre-compute table representations** at collection time, when cluster access exists
2. **Store API discovery metadata** alongside resources
3. **Discover field selectors** at runtime with static fallback
4. **Serve resources generically** using unstructured objects with metadata-driven behavior
5. **Collect failsafe** - table and metadata failures never block raw resource collection
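Pre-computing tables (item 1) relies on standard Kubernetes content-type negotiation: a client asks the API server to render a list as a `Table` by setting an `Accept` header. A minimal stdlib sketch — the `newTableRequest` helper and server URL are illustrative, and the real collector would go through client-go's `rest.Interface` instead:

```go
package main

import (
	"fmt"
	"net/http"
)

// tableAccept asks the API server to return a server-rendered Table
// (the same representation kubectl prints) instead of raw objects.
const tableAccept = "application/json;as=Table;v=v1;g=meta.k8s.io"

// newTableRequest builds a GET for a namespaced core-group resource list
// that will receive the Table representation. Hypothetical helper.
func newTableRequest(server, resource, namespace string) (*http.Request, error) {
	url := fmt.Sprintf("%s/api/v1/namespaces/%s/%s", server, namespace, resource)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", tableAccept)
	return req, nil
}

func main() {
	req, _ := newTableRequest("https://example.invalid:6443", "pods", "default")
	fmt.Println(req.Header.Get("Accept"))
}
```

Because the server computes the columns, complex cells such as a Pod's "Ready" count arrive pre-rendered, which is what makes generic serving possible later.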

## Detailed Design

@@ -100,10 +101,13 @@ This is stored as `{resource}/{namespace}.table.json` alongside the raw `{namesp

### Discovery Metadata

`_meta/discovery.json` captures API server discovery information and bundle schema version:

```json
{
  "bundleSchemaVersion": "1.0",
  "collectedAt": "2026-01-22T10:30:00Z",
  "kubernetesVersion": "v1.28.4",
  "apiVersion": "v1",
  "groups": [
    {
@@ -129,6 +133,11 @@ This is stored as `{resource}/{namespace}.table.json` alongside the raw `{namesp
}
```

The `bundleSchemaVersion` field enables future schema evolution:
- sbctl checks version and uses appropriate parsing logic
- Unknown versions fall back to legacy behavior
- Semantic versioning: minor versions are backwards compatible
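A sketch of the version check sbctl could perform, assuming the `discoveryMeta` struct name and the major-version-1 cutoff shown here (only `bundleSchemaVersion` is read; a real parser would handle the full document):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// discoveryMeta models the subset of _meta/discovery.json that version
// negotiation needs (field name follows the example above).
type discoveryMeta struct {
	BundleSchemaVersion string `json:"bundleSchemaVersion"`
}

// schemaSupported reports whether a reader built for schema major version 1
// can parse the bundle: any 1.x minor is compatible, unknown majors are not,
// and callers fall back to legacy behavior on false. A sketch, not shipped logic.
func schemaSupported(raw []byte) bool {
	var meta discoveryMeta
	if err := json.Unmarshal(raw, &meta); err != nil || meta.BundleSchemaVersion == "" {
		// No discovery.json or no version field: treat as legacy bundle.
		return false
	}
	major, _, _ := strings.Cut(meta.BundleSchemaVersion, ".")
	n, err := strconv.Atoi(major)
	return err == nil && n == 1
}

func main() {
	fmt.Println(schemaSupported([]byte(`{"bundleSchemaVersion":"1.2"}`))) // compatible minor bump
	fmt.Println(schemaSupported([]byte(`{"bundleSchemaVersion":"2.0"}`))) // unknown major, legacy path
}
```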

### Shared Package: `pkg/bundle`

A new package shared between collectors and sbctl defines the bundle contract.

@@ -161,6 +170,16 @@ func TablePath(resource, namespace string) string {
```go
	}
	return fmt.Sprintf("%s/%s.table.json", resource, namespace)
}

// SelectableFieldsPath returns the path for discovered selectable fields
func SelectableFieldsPath(gvr schema.GroupVersionResource) string {
	group := gvr.Group
	if group == "" {
		group = "core"
	}
	return filepath.Join(MetaDir, SelectableFieldsDir,
		fmt.Sprintf("%s.%s.%s.json", group, gvr.Version, gvr.Resource))
}
```

#### `pkg/bundle/fields.go`
@@ -249,6 +268,24 @@ func GetSelectableFields(group, version, resource string) []string {
```go
	// Default: all resources support metadata selectors
	return []string{"metadata.name", "metadata.namespace"}
}

// LoadSelectableFields loads selectable fields using hybrid approach:
// 1. Check for discovered fields in bundle (preferred)
// 2. Fall back to static SelectableFields map
// 3. Fall back to metadata-only fields
func LoadSelectableFields(bundlePath string, gvr schema.GroupVersionResource) []string {
	// Try discovered fields first
	discoveredPath := filepath.Join(bundlePath, ClusterResourcesDir, SelectableFieldsPath(gvr))
	if data, err := os.ReadFile(discoveredPath); err == nil {
		var fields []string
		if json.Unmarshal(data, &fields) == nil && len(fields) > 0 {
			return fields
		}
	}

	// Fall back to static map
	return GetSelectableFields(gvr.Group, gvr.Version, gvr.Resource)
}
```

#### `pkg/bundle/extract.go`
@@ -311,26 +348,47 @@ func formatSlice(slice []interface{}) string {

### Collection Changes

The `clusterResources` collector gains additional collection logic. **Critical requirement: collection must be failsafe.** Failures in table or metadata collection must never prevent raw resource collection from succeeding.

```go
func (c *CollectClusterResources) Collect(progressChan chan<- interface{}) (CollectorResult, error) {
	// ... existing collection logic ...

	// For each resource type collected (including CRDs):
	for _, gvr := range collectedResources {
		// 1. Collect raw resources (existing) - this MUST succeed
		rawResult, err := collectRawResources(client, gvr, namespace)
		if err != nil {
			// Handle error per existing behavior
			continue
		}
		output.SaveResult(bundle.ResourcePath(gvr.Resource, namespace), rawResult)

		// 2. Collect table representation (NEW) - failsafe, errors logged but not fatal
		tableResult, err := collectTableRepresentation(client, gvr, namespace)
		if err != nil {
			// Log warning but continue - some resources (especially CRDs) may not support table format
			log.Warnf("Failed to collect table for %s: %v", gvr, err)
		} else {
			output.SaveResult(bundle.TablePath(gvr.Resource, namespace), tableResult)
		}

		// 3. Discover selectable fields (NEW) - failsafe
		fields, err := discoverSelectableFields(client, gvr)
		if err != nil {
			log.Debugf("Could not discover selectable fields for %s: %v", gvr, err)
		} else if len(fields) > 0 {
			output.SaveResult(bundle.SelectableFieldsPath(gvr), fields)
		}
	}

	// Collect discovery metadata (NEW) - failsafe
	discovery, err := collectDiscoveryMetadata(client)
	if err != nil {
		log.Warnf("Failed to collect API discovery metadata: %v", err)
	} else {
		output.SaveResult(path.Join(bundle.MetaDir, bundle.DiscoveryFile), discovery)
	}

	return output, nil
}
@@ -343,6 +401,30 @@ func collectTableRepresentation(client rest.Interface, gvr schema.GroupVersionRe

	return req.DoRaw(ctx)
}

// discoverSelectableFields probes the API server to discover which fields are selectable
func discoverSelectableFields(client rest.Interface, gvr schema.GroupVersionResource) ([]byte, error) {
	// Make request with invalid field selector to get error listing valid fields
	req := client.Get().
		Resource(gvr.Resource).
		Param("fieldSelector", "__invalid_field__=x")

	_, err := req.DoRaw(ctx)
	if err == nil {
		// Unexpected success - return empty
		return nil, nil
	}

	// Parse error message for field list
	// Error format: "field label not supported: __invalid_field__"
	// or lists valid fields in some K8s versions
	fields := parseSelectableFieldsFromError(err)
	if len(fields) == 0 {
		return nil, fmt.Errorf("could not parse selectable fields from error")
	}

	return json.Marshal(fields)
}
```

### sbctl Serving Changes

@@ -426,26 +508,21 @@ This works because:

## Migration Path

sbctl remains a separate binary with its own repository, but imports shared libraries from troubleshoot.

| Phase | Changes |
|-------|---------|
| Phase 1 | Add `pkg/bundle` to troubleshoot with schema definitions |
| Phase 2 | Update collectors to write `_meta/` and `.table.json` files (failsafe) |
| Phase 3 | sbctl adds troubleshoot as a Go module dependency |
| Phase 4 | Refactor sbctl to use `pkg/bundle` with fallback to legacy |
| Phase 5 | (Later) Remove legacy fallback code from sbctl |

## Bundle Size Impact

Pre-computed table files add minimal storage overhead:
- The `cluster-resources/` directory rarely exceeds 5MB
- Table JSON compresses well within the tar.gz bundle

## Alternatives Considered

@@ -454,35 +531,133 @@ This tradeoff is acceptable because:

Generate table output at serve time using Kubernetes' internal printers.

**Rejected because:**
- Typed objects for each resource preserve switch statements
- Complex columns (e.g., Pod "Ready" = running/total containers) demand type-specific logic
- Perfect fidelity requires cluster access

### Alternative 2: Unstructured with Basic Tables

Serve all resources as unstructured, provide only NAME/NAMESPACE/AGE columns.

**Rejected because:**
- Violates the fidelity requirement
- Strips resource-specific columns that debuggers rely on
- Degrades the troubleshooting experience unacceptably

### Alternative 3: Static-Only Selectable Fields Mapping

Maintain only a static mapping of selectable fields without runtime discovery.

**Rejected because:**
- Map becomes stale as Kubernetes adds new selectable fields
- No way to know selectable fields for CRDs
- Requires manual updates to track Kubernetes releases

The chosen hybrid approach (runtime discovery at collection time + static fallback) provides better coverage while maintaining offline functionality.

## Design Decisions

### CRD Table Collection

**Decision: Yes, collect table representations for CRDs.**

Rationale:
- CRDs often provide the most valuable custom printer columns (e.g., Certificate expiry, VolumeSnapshot ready state)
- Users expect consistent behavior between core resources and CRDs
- Failsafe collection ensures raw data even when a CRD lacks table format support

### Failsafe Collection

**Decision: Table and metadata collection failures must never block raw resource collection.**

Rationale:
- The cluster-resources collector exists primarily to capture raw resource data
- Some resources, especially CRDs, lack table format support
- Degraded clusters may fail API server discovery
- Users must always receive at least the raw resource data

### Field Selector Discovery

**Decision: Hybrid approach - runtime discovery at collection time with static fallback.**

At collection time:
1. Probe API server with invalid field selector
2. Parse error message to extract valid field list
3. Store discovered fields in `_meta/selectable-fields/`

At serve time (sbctl):
1. Check for discovered fields in bundle
2. Fall back to static `SelectableFields` map
3. Fall back to metadata-only (`metadata.name`, `metadata.namespace`)

**Note:** Field discovery via error parsing is inherently fragile as error message formats vary between Kubernetes versions. The static fallback ensures functionality even when discovery fails.
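Whichever source the field list comes from, serve-time filtering can then run generically over unstructured objects. A stdlib sketch with illustrative function names; equality-only matching is shown, and the `!=` operator field selectors also accept is omitted for brevity:

```go
package main

import (
	"fmt"
	"strings"
)

// lookupField walks a dotted path (e.g. "status.phase") through an
// unstructured object decoded from the bundle's JSON.
func lookupField(obj map[string]interface{}, path string) (string, bool) {
	cur := interface{}(obj)
	for _, p := range strings.Split(path, ".") {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return "", false
		}
		if cur, ok = m[p]; !ok {
			return "", false
		}
	}
	s, ok := cur.(string)
	return s, ok
}

// matchesFieldSelector applies a selector such as
// "status.phase=Running,spec.nodeName=node-1": every comma-separated
// requirement must match the object.
func matchesFieldSelector(obj map[string]interface{}, selector string) bool {
	for _, req := range strings.Split(selector, ",") {
		key, want, ok := strings.Cut(req, "=")
		if !ok {
			return false
		}
		got, _ := lookupField(obj, strings.TrimSpace(key))
		if got != want {
			return false
		}
	}
	return true
}

func main() {
	pod := map[string]interface{}{
		"metadata": map[string]interface{}{"name": "web-0", "namespace": "default"},
		"status":   map[string]interface{}{"phase": "Running"},
	}
	fmt.Println(matchesFieldSelector(pod, "status.phase=Running"))
	fmt.Println(matchesFieldSelector(pod, "metadata.name=web-1"))
}
```

Because the object stays a generic map, this path works identically for core resources and CRDs, which is what removes the per-type switch statements.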

### Bundle Schema Versioning

**Decision: Include `bundleSchemaVersion` in `_meta/discovery.json`.**

Rationale:
- Enables future schema changes without breaking compatibility
- sbctl can adapt behavior based on schema version
- Unknown versions trigger fallback to legacy behavior
- Semantic versioning: `1.x` versions are backwards compatible with `1.0`

## Limitations

The following features are explicitly **not supported** by sbctl and are out of scope for this design:

### Watch Operations

sbctl serves static bundle data and cannot support watch operations. Commands like `kubectl get pods -w`:
- Return the initial list
- Immediately close the watch stream

This is inherent to sbctl's design as a snapshot-based tool.

### Subresources

Subresources that require live cluster interaction are not supported:

| Subresource | Reason |
|-------------|--------|
| `pods/exec` | Requires running container |
| `pods/attach` | Requires running container |
| `pods/portforward` | Requires running container |
| `pods/log` | Log data not collected by default (separate collector) |
| `*/scale` | Write operation |
| `*/status` | Could be supported if status is in collected data |

### Write Operations

All mutating operations return `405 Method Not Allowed`:
- `kubectl create`, `kubectl apply`, `kubectl delete`
- `kubectl edit`, `kubectl patch`
- `kubectl scale`, `kubectl rollout`

### Real-time Data

sbctl serves point-in-time snapshot data. It cannot reflect:
- Current pod status
- Live metrics
- Recent events (only events collected at bundle time)
- Changes since bundle collection

### List Pagination

Kubernetes `continue` tokens for paginated lists are not implemented. sbctl returns all matching resources in a single response. This design works because:
- Bundles contain finite resources, already filtered at collection time
- Typical bundle sizes fit comfortably in single responses

### resourceVersion Semantics

sbctl does not implement proper `resourceVersion` semantics:
- All resources return a static or omitted `resourceVersion`
- Optimistic concurrency checks are not meaningful
- Watch `resourceVersion` parameters are ignored

## Open Questions

None remaining. All questions have been resolved in the Design Decisions section.

## Success Criteria

@@ -491,6 +666,97 @@ Discover selectable fields by making invalid requests and parsing error messages

3. Field selectors work for documented selectable fields
4. Old bundles continue to work with new sbctl
5. No switch statements for resource types in sbctl serving code
6. Table/metadata collection failures never prevent raw resource collection
7. CRD resources have table representations when the CRD supports them

## Testing Strategy

### Unit Tests

**pkg/bundle tests:**
- `ResourcePath()`, `TablePath()`, `SelectableFieldsPath()` return correct paths
- `GetSelectableFields()` returns correct fields for known GVRs
- `GetSelectableFields()` returns metadata-only fallback for unknown GVRs
- `LoadSelectableFields()` prefers discovered fields over static map
- `ExtractSelectableFields()` correctly extracts nested field values

**Collection tests:**
- Table collection failure doesn't prevent raw collection
- Discovery metadata collection failure doesn't prevent resource collection
- Field selector discovery failure logs warning and continues
- CRDs without table support are collected as raw-only

### Integration Tests

**Fidelity tests (compare sbctl vs live cluster):**

```bash
# Collect bundle from live cluster
support-bundle --spec=test-spec.yaml -o test-bundle.tar.gz

# For each resource type:
# 1. Query live cluster
kubectl get pods -o json > live-pods.json
kubectl get pods > live-pods-table.txt

# 2. Query sbctl
sbctl serve test-bundle.tar.gz &
kubectl --server=https://localhost:8443 get pods -o json > sbctl-pods.json
kubectl --server=https://localhost:8443 get pods > sbctl-pods-table.txt

# 3. Compare (ignoring dynamic fields like resourceVersion)
diff <(jq 'del(.metadata.resourceVersion)' live-pods.json) \
     <(jq 'del(.metadata.resourceVersion)' sbctl-pods.json)
diff live-pods-table.txt sbctl-pods-table.txt
```

**Backwards compatibility tests:**

```bash
# Test new sbctl with old bundle (no _meta/)
sbctl serve old-bundle-without-meta.tar.gz
kubectl --server=https://localhost:8443 get pods  # Should work via legacy path

# Test old sbctl with new bundle (has _meta/)
old-sbctl serve new-bundle-with-meta.tar.gz
kubectl --server=https://localhost:8443 get pods  # Should work, ignoring _meta/
```

**Failsafe tests:**

```bash
# Simulate table collection failure (mock API server returning errors)
# Verify raw resources still collected
# Verify bundle is valid and usable

# Simulate discovery metadata failure
# Verify resources still collected
# Verify sbctl falls back to legacy behavior
```

### Field Selector Tests

```bash
# Test field selectors work correctly
kubectl --server=https://localhost:8443 get pods --field-selector=status.phase=Running
kubectl --server=https://localhost:8443 get pods --field-selector=spec.nodeName=node-1
kubectl --server=https://localhost:8443 get events --field-selector=involvedObject.name=my-pod

# Compare results with live cluster output
```

### CRD Tests

```bash
# Collect bundle from cluster with CRDs (e.g., cert-manager Certificates)
# Verify CRD resources have table representations
# Verify kubectl get certificates shows custom columns (Ready, Secret, Age)
# Verify CRDs without additionalPrinterColumns still work (basic columns)
```

### Performance Tests

For large bundles (10k+ resources):
- Measure response time for list operations
- Measure memory usage during filtering
- Verify no timeouts or OOM conditions

## References