kubevela/docs/spec-diff-proposal.md
2026-02-09 10:19:05 +00:00

Proposal: Store Spec Diffs for Policy Transforms

Problem

Currently, when a policy modifies the Application spec, we only set specModified: true. On its own, that flag isn't helpful for debugging:

{
  "name": "inject-sidecar",
  "specModified": true  // ❌ What did it change?
}

Users can't tell:

  • What exactly was added/removed/modified
  • How multiple policies interact
  • Whether a policy is working correctly

Proposed Solution

Store a structured diff when specModified=true:

{
  "name": "inject-sidecar",
  "specModified": true,
  "specDiff": "W3sib3AiOiJhZGQiLCJ...",  // Base64-encoded JSON Patch
  "specDiffSummary": "Added 1 component, modified 2 properties"
}

Implementation Options

Option 1: JSON Patch (RFC 6902)

Format: Industry standard, compact

[
  {"op": "add", "path": "/components/1", "value": {...}},
  {"op": "replace", "path": "/components/0/properties/replicas", "value": 3}
]

Pros:

  • Standard format (accepted by kubectl patch --type=json)
  • Reversible only if inverse operations are also computed (replace/remove ops don't record the old value)
  • Compact representation
  • Libraries available

Cons:

  • Harder to read for humans
  • Requires JSON marshal/unmarshal

Cost: ~10-15ms overhead
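Here is a rough sketch of how Option 1 could produce such a patch without extra dependencies. The Go code below diffs two decoded JSON documents. It is a simplified generator written for illustration: `Operation`, `diff` and `specPatch` are hypothetical names, not a KubeVela or library API. Objects are compared key by key and equal-length arrays index by index. Anything else that differs is replaced wholesale, so the output is a valid RFC 6902 patch but not always a minimal one. A real implementation would more likely use an existing JSON Patch library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// Operation is one RFC 6902 JSON Patch operation.
type Operation struct {
	Op    string
	Path  string
	Value interface{}
}

// MarshalJSON emits "value" for add/replace (even if it is JSON null), never for remove.
func (o Operation) MarshalJSON() ([]byte, error) {
	m := map[string]interface{}{"op": o.Op, "path": o.Path}
	if o.Op != "remove" {
		m["value"] = o.Value
	}
	return json.Marshal(m)
}

// escapeToken escapes a key for a JSON Pointer (RFC 6901): "~" -> "~0", then "/" -> "~1".
func escapeToken(s string) string {
	return strings.ReplaceAll(strings.ReplaceAll(s, "~", "~0"), "/", "~1")
}

// diff appends the ops turning a into b at path. Objects are compared per key and
// equal-length arrays per index; any other difference becomes a wholesale replace.
func diff(path string, a, b interface{}, ops []Operation) []Operation {
	am, aIsObj := a.(map[string]interface{})
	bm, bIsObj := b.(map[string]interface{})
	if aIsObj && bIsObj {
		keys := make([]string, 0, len(am)+len(bm))
		for k := range am {
			keys = append(keys, k)
		}
		for k := range bm {
			if _, dup := am[k]; !dup {
				keys = append(keys, k)
			}
		}
		sort.Strings(keys) // deterministic operation order
		for _, k := range keys {
			p := path + "/" + escapeToken(k)
			av, inA := am[k]
			bv, inB := bm[k]
			switch {
			case !inB:
				ops = append(ops, Operation{Op: "remove", Path: p})
			case !inA:
				ops = append(ops, Operation{Op: "add", Path: p, Value: bv})
			default:
				ops = diff(p, av, bv, ops)
			}
		}
		return ops
	}
	as, aIsArr := a.([]interface{})
	bs, bIsArr := b.([]interface{})
	if aIsArr && bIsArr && len(as) == len(bs) {
		for i := range as {
			ops = diff(fmt.Sprintf("%s/%d", path, i), as[i], bs[i], ops)
		}
		return ops
	}
	if !reflect.DeepEqual(a, b) {
		ops = append(ops, Operation{Op: "replace", Path: path, Value: b})
	}
	return ops
}

// specPatch decodes two JSON documents and returns the patch between them.
func specPatch(before, after []byte) ([]byte, error) {
	var a, b interface{}
	if err := json.Unmarshal(before, &a); err != nil {
		return nil, err
	}
	if err := json.Unmarshal(after, &b); err != nil {
		return nil, err
	}
	ops := diff("", a, b, []Operation{}) // non-nil so "no changes" encodes as []
	return json.Marshal(ops)
}

func main() {
	before := []byte(`{"components":[{"name":"web","properties":{"replicas":1}}]}`)
	after := []byte(`{"components":[{"name":"web","properties":{"env":[{"name":"SIDECAR_ENABLED","value":"true"}],"replicas":3}}]}`)
	patch, err := specPatch(before, after)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

Running it prints `[{"op":"add","path":"/components/0/properties/env","value":[{"name":"SIDECAR_ENABLED","value":"true"}]},{"op":"replace","path":"/components/0/properties/replicas","value":3}]`. Adding a whole component changes the array length. This sketch turns that into a single `replace` of `/components`, whereas a smarter array diff would emit an `add` at `/components/1` as in the example above.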

Option 2: Structured Summary

Format: Human-readable summary

{
  "componentsAdded": 1,
  "componentsModified": 0,
  "propertiesChanged": ["components[0].properties.replicas"],
  "beforeHash": "abc123",
  "afterHash": "def456"
}

Pros:

  • Very readable
  • Lightweight (no full diff)
  • Fast to compute (~2-5ms)

Cons:

  • Can't see actual values
  • Not reversible
  • Less useful for complex changes

Cost: ~2-5ms overhead

Option 3: Hybrid

Store both a summary and the full diff, keeping the diff only if it is small:

const MaxSpecDiffSize = 10 * 1024 // 10KB

type SpecChange struct {
    Summary      SpecChangeSummary  `json:"summary"`
    FullDiff     string            `json:"fullDiff,omitempty"`  // Base64 JSON patch (if <10KB)
    DiffTooLarge bool              `json:"diffTooLarge,omitempty"`
}

type SpecChangeSummary struct {
    ComponentsAdded    int      `json:"componentsAdded,omitempty"`
    ComponentsModified int      `json:"componentsModified,omitempty"`
    ComponentsRemoved  int      `json:"componentsRemoved,omitempty"`
    FieldsChanged      []string `json:"fieldsChanged,omitempty"`
    BeforeHash         string   `json:"beforeHash"`
    AfterHash          string   `json:"afterHash"`
}

Example:

{
  "name": "inject-sidecar",
  "specModified": true,
  "specChange": {
    "summary": {
      "componentsAdded": 0,
      "componentsModified": 2,
      "fieldsChanged": [
        "components[0].properties.env[0]",
        "components[1].properties.env[0]"
      ],
      "beforeHash": "abc123",
      "afterHash": "def456"
    },
    "fullDiff": "W3sib3AiOiJhZGQiLCJ...",  // Only if <10KB
    "diffTooLarge": false
  }
}

Pros:

  • Human-readable summary for quick diagnosis
  • Full diff available for detailed debugging (when needed)
  • Avoids etcd bloat for large changes
  • Fast path (summary only) is cheap (~5ms)

Cons:

  • More complex implementation
  • Two code paths to maintain

Cost: ~5-15ms depending on size

Scope: Only Diff Spec Changes

Do NOT diff labels/annotations - we already track these explicitly:

"addedLabels": {"team": "platform"},        // ✅ Already clear
"addedAnnotations": {"version": "v1.0"}     // ✅ Already clear

Only diff spec transforms - this is where we need help:

"specModified": true,   // ❌ Not helpful
"specChange": {...}     // ✅ Shows what changed

Computational Impact

Estimated Per-Application Overhead:

  • Option 1 (JSON Patch): ~10-15ms
  • Option 2 (Summary Only): ~2-5ms
  • Option 3 (Hybrid): ~5-15ms (average ~8ms)

Context:

  • Typical reconciliation: 100-500ms
  • Policy rendering (uncached): 30-100ms
  • ~8ms overhead ≈ 1.6-8% of a 100-500ms reconciliation ✅ Acceptable

When to Skip:

  • If no spec transform: 0ms overhead
  • If diff >10KB: store the summary only (the patch must still be computed to learn its size)
  • Labels/annotations only: 0ms overhead

Storage Impact

etcd Size:

  • JSON Patch for typical sidecar injection: ~2-5KB
  • Base64 encoding: +33% → ~3-7KB
  • 5 policies with spec changes: ~15-35KB
  • Total Application size increase: ~15-35KB, significant relative to a 20-100KB Application but ✅ acceptable given the etcd limits below

etcd Limits:

  • Max object size: 1.5MB
  • Typical Application: 20-100KB
  • With diffs: ~23-135KB (1 to 5 policies with spec changes)
  • Still well under limit
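The storage figures above can be checked with a few lines of arithmetic. The check uses the same assumptions: 2-5KB patches, 5 policies, 20-100KB Applications, and the 1.5MB limit treated as 1,536KB.

$$
\text{base64}(n) = 4\left\lceil \tfrac{n}{3} \right\rceil \approx \tfrac{4}{3}\,n
\quad\Rightarrow\quad
\tfrac{4}{3}\times(2\text{ to }5)\,\text{KB} \approx 2.7\text{ to }6.7\,\text{KB per policy}
$$

$$
5 \times (3\text{ to }7)\,\text{KB} = 15\text{ to }35\,\text{KB},
\qquad
\frac{15}{100} = 15\% \;\text{ up to }\; \frac{35}{20} = 175\% \text{ of the Application}
$$

$$
\frac{135\,\text{KB}}{1536\,\text{KB}} \approx 8.8\%\ \text{of the etcd limit}
$$

The absolute growth is small compared with the etcd limit. The relative growth of the Application object, however, is far from negligible.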

Implementation Plan

Phase 1: Summary Only (Quick Win)

type PolicyChanges struct {
    AddedLabels       map[string]string
    AddedAnnotations  map[string]string
    AdditionalContext map[string]interface{}
    SpecModified      bool
    SpecChangeSummary *SpecChangeSummary  // NEW
}

Benefits:

  • Low overhead (~2ms)
  • Helps with debugging
  • No storage concerns

Phase 2: Add Full Diff (If Needed)

type PolicyChanges struct {
    // ... existing fields
    SpecChange *SpecChange  // Replaces SpecModified + Summary
}

Benefits:

  • Complete visibility
  • Can show diffs in UI
  • Enables "undo" functionality (only if inverse patches are also stored)

Alternative: External Diff Storage

If storage is a concern, store diffs externally:

type AppliedGlobalPolicy struct {
    // ... existing fields
    SpecDiffRef string  // "configmap/my-app-policy-diffs/inject-sidecar"
}

Create a ConfigMap per Application with all policy diffs:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-policy-diffs
data:
  inject-sidecar: |
    [{"op": "add", ...}]
  resource-limits: |
    [{"op": "replace", ...}]

Pros:

  • Doesn't bloat Application status
  • Can be cleaned up separately
  • Keeps the Application object small (ConfigMaps still live in etcd and are capped at 1MiB each)

Cons:

  • Extra API call to view diffs
  • More objects to manage
  • Lifecycle management complexity

Decision Criteria

When to Implement Full Diffs:

YES if:

  • Users frequently ask "what did this policy change?"
  • Debugging complex spec transforms is common
  • UI/CLI tools will display diffs
  • "Undo policy effects" is a requirement

NO if:

  • Current tracking (labels/annotations/specModified) is sufficient
  • Performance is critical (every ms counts)
  • Storage is limited

Recommendation:

Start with Phase 1 (Summary Only):

  • Low cost (~2ms, ~1KB)
  • Immediate value for debugging
  • Easy to implement
  • Can upgrade to full diffs later if needed

Add Phase 2 (Full Diffs) if:

  • Users request it after using summaries
  • UI/CLI tools are built to display diffs
  • "Policy dry-run" feature is added

Open Questions

  1. Should diffs be human-readable or machine-parseable?

    • JSON Patch (machine) vs. kubectl-style diff (human)
  2. Should we store diffs for all policies or just spec changes?

    • Current proposal: Only spec changes
  3. Should diffs be compressed?

    • Could use gzip before base64 (estimated ~60% savings on larger, repetitive patches; very small patches may grow)
  4. Retention policy?

    • Clear diffs on successful reconciliation?
    • Keep last N diffs?
  5. Should we support "reverting" policy changes?

    • Would require storing inverse patches

Example Usage

CLI Tool

# Show what a policy changed
kubectl vela policy diff my-app inject-sidecar

# Output:
Spec changes by policy 'inject-sidecar':
  + Added component 'monitoring-sidecar'
  ~ Modified components[0].properties.env
    + Added env var: SIDECAR_ENABLED=true

# Show full JSON patch
kubectl vela policy diff my-app inject-sidecar --format=json-patch
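The human-readable CLI output above could be derived mechanically from the stored JSON Patch. This Go sketch uses hypothetical `renderPatch` and `pointerToField` helpers, not an existing vela API. It maps `add`/`remove`/`replace` to `+`/`-`/`~`. It also converts JSON Pointers such as `/components/0/properties/env` into the `components[0].properties.env` style used in `fieldsChanged`. Friendlier lines such as "Added component 'monitoring-sidecar'" would need extra context from the spec itself.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// patchOp mirrors an RFC 6902 operation; the value is kept raw for display.
type patchOp struct {
	Op    string          `json:"op"`
	Path  string          `json:"path"`
	Value json.RawMessage `json:"value,omitempty"`
}

// markers maps operation types to the prefixes used in the CLI output.
var markers = map[string]string{"add": "+", "remove": "-", "replace": "~"}

// pointerToField converts a JSON Pointer (RFC 6901) into a dotted field path,
// writing numeric tokens as [i] array indices.
func pointerToField(ptr string) string {
	var b strings.Builder
	for _, tok := range strings.Split(strings.TrimPrefix(ptr, "/"), "/") {
		// Unescape per RFC 6901: "~1" -> "/" first, then "~0" -> "~".
		tok = strings.ReplaceAll(strings.ReplaceAll(tok, "~1", "/"), "~0", "~")
		if _, err := strconv.Atoi(tok); err == nil && b.Len() > 0 {
			fmt.Fprintf(&b, "[%s]", tok)
			continue
		}
		if b.Len() > 0 {
			b.WriteByte('.')
		}
		b.WriteString(tok)
	}
	return b.String()
}

// renderPatch formats a JSON Patch as one "+/-/~ field [= value]" line per operation.
func renderPatch(policy string, patch []byte) (string, error) {
	var ops []patchOp
	if err := json.Unmarshal(patch, &ops); err != nil {
		return "", err
	}
	var b strings.Builder
	fmt.Fprintf(&b, "Spec changes by policy '%s':\n", policy)
	for _, o := range ops {
		marker, ok := markers[o.Op]
		if !ok {
			marker = "?" // move/copy/test are not rendered specially here
		}
		fmt.Fprintf(&b, "  %s %s", marker, pointerToField(o.Path))
		if len(o.Value) > 0 {
			fmt.Fprintf(&b, " = %s", o.Value)
		}
		b.WriteString("\n")
	}
	return b.String(), nil
}

func main() {
	patch := []byte(`[{"op":"add","path":"/components/0/properties/env/0","value":{"name":"SIDECAR_ENABLED","value":"true"}}]`)
	out, err := renderPatch("inject-sidecar", patch)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```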

UI Dashboard

Application: my-app
Applied Policies:
  ✅ inject-sidecar (vela-system)
     Spec Changes:
       ├─ Added 1 component
       ├─ Modified 2 properties
       └─ [View Full Diff]

Conclusion

Summary diffs (~2ms, ~1KB) should provide roughly 80% of the debugging value for about 20% of the cost.

Recommend:

  1. ✅ Implement Phase 1 (Summary) now
  2. 🤔 Evaluate Phase 2 (Full Diff) based on usage
  3. 📊 Add metrics to track diff size distribution
  4. 🔍 Monitor performance impact in production