Proposal: Store Spec Diffs for Policy Transforms
Problem
Currently, when a policy modifies the Application spec, we record only specModified: true, which gives no detail for debugging:
{
  "name": "inject-sidecar",
  "specModified": true // ❌ What did it change?
}
Users can't tell:
- What exactly was added/removed/modified
- How multiple policies interact
- If a policy is working correctly
Proposed Solution
Store a structured diff when specModified=true:
{
  "name": "inject-sidecar",
  "specModified": true,
  "specDiff": "eyJhZGRlZCI6e...", // Base64 JSON patch
  "specDiffSummary": "Added 1 component, modified 2 properties"
}
Implementation Options
Option 1: JSON Patch (RFC 6902)
Format: Industry standard, compact
[
{"op": "add", "path": "/components/1", "value": {...}},
{"op": "replace", "path": "/components/0/properties/replicas", "value": 3}
]
Pros:
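- Standard format (accepted by kubectl patch --type=json)
- Reversible only if prior values or inverse patches are also stored (replace and remove ops do not carry the old value)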
- Compact representation
- Libraries available
Cons:
- Harder to read for humans
- Requires JSON marshal/unmarshal
Cost: ~10-15ms overhead
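To make Option 1 concrete, here is a minimal sketch of generating an RFC 6902 patch with only the Go standard library. It is not the proposed implementation: the names PatchOp, diff and DiffSpecs are hypothetical, arrays are replaced wholesale rather than diffed element by element, and a production version would more likely use an existing JSON Patch library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// PatchOp is one RFC 6902 operation (hypothetical type for this sketch).
// Limitation: with omitempty, a "replace" whose new value is null loses its value field.
type PatchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"`
}

// escape applies RFC 6901 escaping to a single object key.
func escape(key string) string {
	return strings.NewReplacer("~", "~0", "/", "~1").Replace(key)
}

// diff walks two decoded JSON values. Objects are compared key by key;
// anything else (including arrays) is replaced as a whole when it differs.
func diff(path string, before, after interface{}) []PatchOp {
	bm, bok := before.(map[string]interface{})
	am, aok := after.(map[string]interface{})
	if !bok || !aok {
		if reflect.DeepEqual(before, after) {
			return nil
		}
		return []PatchOp{{Op: "replace", Path: path, Value: after}}
	}
	keys := make([]string, 0, len(bm)+len(am))
	for k := range bm {
		keys = append(keys, k)
	}
	for k := range am {
		if _, seen := bm[k]; !seen {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // deterministic operation order
	var ops []PatchOp
	for _, k := range keys {
		child := path + "/" + escape(k)
		bv, inBefore := bm[k]
		av, inAfter := am[k]
		switch {
		case !inBefore:
			ops = append(ops, PatchOp{Op: "add", Path: child, Value: av})
		case !inAfter:
			ops = append(ops, PatchOp{Op: "remove", Path: child})
		default:
			ops = append(ops, diff(child, bv, av)...)
		}
	}
	return ops
}

// DiffSpecs decodes both specs and returns the patch that turns before into after.
func DiffSpecs(before, after []byte) ([]PatchOp, error) {
	var b, a interface{}
	if err := json.Unmarshal(before, &b); err != nil {
		return nil, err
	}
	if err := json.Unmarshal(after, &a); err != nil {
		return nil, err
	}
	return diff("", b, a), nil
}

func main() {
	ops, err := DiffSpecs(
		[]byte(`{"replicas":1,"env":{"A":"1"}}`),
		[]byte(`{"replicas":3,"env":{"A":"1","SIDECAR_ENABLED":"true"}}`),
	)
	if err != nil {
		panic(err)
	}
	out, _ := json.Marshal(ops)
	fmt.Println(string(out))
}
```

Because the generated replace and remove operations omit the previous value, this patch alone cannot be inverted.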
Option 2: Structured Summary
Format: Human-readable summary
{
  "componentsAdded": 1,
  "componentsModified": 0,
  "propertiesChanged": ["components[0].properties.replicas"],
  "beforeHash": "abc123",
  "afterHash": "def456"
}
Pros:
- Very readable
- Lightweight (no full diff)
- Fast to compute (~2-5ms)
Cons:
- Can't see actual values
- Not reversible
- Less useful for complex changes
Cost: ~2-5ms overhead
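One way the Option 2 summary could be computed is sketched below. The Spec and Component types are simplified stand-ins for the real Application spec, matching components by name is an assumption, and SHA-256 over the marshaled JSON is only one possible choice for beforeHash/afterHash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"reflect"
)

// Component and Spec are simplified, hypothetical stand-ins for the Application spec.
type Component struct {
	Name       string                 `json:"name"`
	Properties map[string]interface{} `json:"properties,omitempty"`
}

type Spec struct {
	Components []Component `json:"components"`
}

// Summary mirrors the counters and hashes described for Option 2.
type Summary struct {
	ComponentsAdded    int
	ComponentsModified int
	ComponentsRemoved  int
	BeforeHash         string
	AfterHash          string
}

// hashSpec hashes the JSON encoding of a spec. encoding/json writes map keys
// in sorted order, so equal specs produce equal hashes.
func hashSpec(s Spec) (string, error) {
	raw, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

// Summarize counts added, modified and removed components, matched by name (an assumption).
func Summarize(before, after Spec) (Summary, error) {
	var s Summary
	var err error
	if s.BeforeHash, err = hashSpec(before); err != nil {
		return s, err
	}
	if s.AfterHash, err = hashSpec(after); err != nil {
		return s, err
	}
	old := make(map[string]Component, len(before.Components))
	for _, c := range before.Components {
		old[c.Name] = c
	}
	for _, c := range after.Components {
		prev, existed := old[c.Name]
		switch {
		case !existed:
			s.ComponentsAdded++
		case !reflect.DeepEqual(prev, c):
			s.ComponentsModified++
		}
		delete(old, c.Name)
	}
	s.ComponentsRemoved = len(old) // whatever was never matched has been removed
	return s, nil
}

func main() {
	before := Spec{Components: []Component{
		{Name: "web", Properties: map[string]interface{}{"replicas": 1}},
	}}
	after := Spec{Components: []Component{
		{Name: "web", Properties: map[string]interface{}{"replicas": 3}},
		{Name: "monitoring-sidecar"},
	}}
	s, err := Summarize(before, after)
	if err != nil {
		panic(err)
	}
	fmt.Printf("added=%d modified=%d removed=%d changed=%v\n",
		s.ComponentsAdded, s.ComponentsModified, s.ComponentsRemoved, s.BeforeHash != s.AfterHash)
}
```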
Option 3: Hybrid Approach (RECOMMENDED)
Always store the summary; store the full diff only when it is smaller than MaxSpecDiffSize (10KB).
const MaxSpecDiffSize = 10 * 1024 // 10KB

type SpecChange struct {
	Summary      SpecChangeSummary `json:"summary"`
	FullDiff     string            `json:"fullDiff,omitempty"` // Base64 JSON patch (if <10KB)
	DiffTooLarge bool              `json:"diffTooLarge,omitempty"`
}

type SpecChangeSummary struct {
	ComponentsAdded    int      `json:"componentsAdded,omitempty"`
	ComponentsModified int      `json:"componentsModified,omitempty"`
	ComponentsRemoved  int      `json:"componentsRemoved,omitempty"`
	FieldsChanged      []string `json:"fieldsChanged,omitempty"`
	BeforeHash         string   `json:"beforeHash"`
	AfterHash          string   `json:"afterHash"`
}
Example:
{
  "name": "inject-sidecar",
  "specModified": true,
  "specChange": {
    "summary": {
      "componentsAdded": 0,
      "componentsModified": 2,
      "fieldsChanged": [
        "components[0].properties.env[0]",
        "components[1].properties.env[0]"
      ],
      "beforeHash": "abc123",
      "afterHash": "def456"
    },
    "fullDiff": "W3sib3AiOiJhZGQiLCJ...", // Only if <10KB
    "diffTooLarge": false
  }
}
Pros:
- Human-readable summary for quick diagnosis
- Full diff available for detailed debugging (when needed)
- Avoids etcd bloat for large changes
- Fast path (summary only) is cheap (~2-5ms)
Cons:
- More complex implementation
- Two code paths to maintain
Cost: ~5-15ms depending on size
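The size gate at the heart of the hybrid approach could look like the sketch below. Two assumptions are made here that the proposal does not settle: the limit is applied to the Base64-encoded length (what actually lands in etcd) rather than the raw patch, and the Summary field is left out for brevity. NewSpecChange is a hypothetical helper name.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

const MaxSpecDiffSize = 10 * 1024 // 10KB

// SpecChange is trimmed to the two fields the size gate touches.
type SpecChange struct {
	FullDiff     string `json:"fullDiff,omitempty"`
	DiffTooLarge bool   `json:"diffTooLarge,omitempty"`
}

// NewSpecChange stores the encoded patch only when it is under the limit.
// Assumption: the limit is checked against the encoded size, not the raw size.
func NewSpecChange(patch []byte) SpecChange {
	if base64.StdEncoding.EncodedLen(len(patch)) >= MaxSpecDiffSize {
		return SpecChange{DiffTooLarge: true}
	}
	return SpecChange{FullDiff: base64.StdEncoding.EncodeToString(patch)}
}

func main() {
	small := NewSpecChange([]byte(`[{"op":"remove","path":"/components/1"}]`))
	// An 8KB raw patch grows past 10KB once Base64-encoded.
	large := NewSpecChange([]byte(strings.Repeat("x", 8*1024)))

	fmt.Println(small.DiffTooLarge, len(small.FullDiff) > 0) // false true
	fmt.Println(large.DiffTooLarge, large.FullDiff == "")    // true true
}
```

Checking the encoded length keeps the stored field under the cap even after the roughly 33% Base64 expansion. If the limit is meant to apply to the raw patch instead, the comparison would use len(patch) directly.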
Scope: Only Diff Spec Changes
Do NOT diff labels/annotations - we already track these explicitly:
"addedLabels": {"team": "platform"}, // ✅ Already clear
"addedAnnotations": {"version": "v1.0"} // ✅ Already clear
Only diff spec transforms - this is where we need help:
"specModified": true, // ❌ Not helpful
"specChange": {...} // ✅ Shows what changed
Computational Impact
Per-Application Overhead:
- Option 1 (JSON Patch): ~10-15ms
- Option 2 (Summary Only): ~2-5ms
- Option 3 (Hybrid): ~5-15ms (average ~8ms)
Context:
- Typical reconciliation: 100-500ms
- Policy rendering (uncached): 30-100ms
- 8ms overhead = ~1.6-8% of a typical reconciliation (8ms of 500ms up to 8ms of 100ms) ✅ Acceptable
When to Skip:
- If no spec transform: 0ms overhead
- If diff >10KB: store summary only (the patch must still be computed, or its size estimated, to know that it exceeds the limit)
- Labels/annotations only: 0ms overhead
Storage Impact
etcd Size:
- JSON Patch for typical sidecar injection: ~2-5KB
- Base64 encoding: +33% → ~3-7KB
- 5 policies with spec changes: ~15-35KB
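- Total Application size increase: 15-35KB on a typical 20-100KB object, so the object can grow noticeably (roughly 15-175%) while staying far below the etcd limit ✅ Acceptable
etcd Limits:
- Max request size (etcd default): 1.5 MiB
- Typical Application: 20-100KB
- With diffs: 35-135KB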
- Still well under limit ✅
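The Base64 figures above can be checked with the standard library, which reports the exact encoded length for a given raw size. The sketch below uses the proposal's own estimates (2-5KB per patch, 5 policies); statusGrowth is a hypothetical helper.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// statusGrowth returns the bytes added to an Application when `policies`
// policies each store one Base64-encoded patch of rawPatchBytes bytes.
func statusGrowth(rawPatchBytes, policies int) int {
	return base64.StdEncoding.EncodedLen(rawPatchBytes) * policies
}

func main() {
	fmt.Println(base64.StdEncoding.EncodedLen(2 * 1024)) // 2732
	fmt.Println(base64.StdEncoding.EncodedLen(5 * 1024)) // 6828
	fmt.Println(statusGrowth(2*1024, 5))                 // 13660
	fmt.Println(statusGrowth(5*1024, 5))                 // 34140
}
```

These numbers cover only the encoded patches and ignore the summary fields and JSON key names around them.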
Implementation Plan
Phase 1: Summary Only (Quick Win)
type PolicyChanges struct {
	AddedLabels       map[string]string
	AddedAnnotations  map[string]string
	AdditionalContext map[string]interface{}
	SpecModified      bool
	SpecChangeSummary *SpecChangeSummary // NEW
}
Benefits:
- Low overhead (~2ms)
- Helps with debugging
- No storage concerns
Phase 2: Add Full Diff (If Needed)
type PolicyChanges struct {
	// ... existing fields
	SpecChange *SpecChange // Replaces SpecModified + Summary
}
Benefits:
- Complete visibility
- Can show diffs in UI
- Enables "undo" functionality (requires storing inverse patches or prior values)
Alternative: External Diff Storage
If storage is a concern, store diffs externally:
type AppliedGlobalPolicy struct {
	// ... existing fields
	SpecDiffRef string // "configmap/my-app-policy-diffs/inject-sidecar"
}
Create a ConfigMap per Application with all policy diffs:
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-policy-diffs
data:
  inject-sidecar: |
    [{"op": "add", ...}]
  resource-limits: |
    [{"op": "replace", ...}]
Pros:
- Doesn't bloat Application status
- Can be cleaned up separately
- No etcd concerns
Cons:
- Extra API call to view diffs
- More objects to manage
- Lifecycle management complexity
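A reader of SpecDiffRef has to split the reference back into its parts before fetching the ConfigMap. A possible parser is sketched below, assuming the "kind/name/key" layout shown in the struct comment is the whole format (no namespace segment); DiffRef and ParseSpecDiffRef are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// DiffRef is the decoded form of a SpecDiffRef string (hypothetical type).
type DiffRef struct {
	Kind string // only "configmap" is accepted in this sketch
	Name string // ConfigMap name, e.g. "my-app-policy-diffs"
	Key  string // data key, i.e. the policy name
}

// ParseSpecDiffRef splits "configmap/<name>/<key>" and rejects anything else.
func ParseSpecDiffRef(ref string) (DiffRef, error) {
	parts := strings.Split(ref, "/")
	if len(parts) != 3 {
		return DiffRef{}, fmt.Errorf("spec diff ref %q: want kind/name/key", ref)
	}
	for _, p := range parts {
		if p == "" {
			return DiffRef{}, fmt.Errorf("spec diff ref %q: empty segment", ref)
		}
	}
	if parts[0] != "configmap" {
		return DiffRef{}, fmt.Errorf("spec diff ref %q: unsupported kind %q", ref, parts[0])
	}
	return DiffRef{Kind: parts[0], Name: parts[1], Key: parts[2]}, nil
}

func main() {
	ref, err := ParseSpecDiffRef("configmap/my-app-policy-diffs/inject-sidecar")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %s %s\n", ref.Kind, ref.Name, ref.Key)
}
```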
Decision Criteria
When to Implement Full Diffs:
✅ YES if:
- Users frequently ask "what did this policy change?"
- Debugging complex spec transforms is common
- UI/CLI tools will display diffs
- "Undo policy effects" is a requirement
❌ NO if:
- Current tracking (labels/annotations/specModified) is sufficient
- Performance is critical (every ms counts)
- Storage is limited
Recommendation:
Start with Phase 1 (Summary Only):
- Low cost (~2ms, ~1KB)
- Immediate value for debugging
- Easy to implement
- Can upgrade to full diffs later if needed
Add Phase 2 (Full Diffs) if:
- Users request it after using summaries
- UI/CLI tools are built to display diffs
- "Policy dry-run" feature is added
Open Questions
- Should diffs be human-readable or machine-parseable?
  - JSON Patch (machine) vs. kubectl-style diff (human)
- Should we store diffs for all policies or just spec changes?
  - Current proposal: Only spec changes
- Should diffs be compressed?
  - Could use gzip before base64 (saves ~60% space)
- Retention policy?
  - Clear diffs on successful reconciliation?
  - Keep last N diffs?
- Should we support "reverting" policy changes?
  - Would require storing inverse patches
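For the compression question, gzip-then-Base64 is straightforward with the standard library. The sketch below shows a round trip; Compress and Decompress are hypothetical helper names, and the actual savings depend on how repetitive the patch is, so the ratio should be measured on real diffs rather than taken from this example.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
	"io"
	"strings"
)

// Compress gzips a patch and Base64-encodes the result for storage.
func Compress(patch []byte) (string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(patch); err != nil {
		return "", err
	}
	// Close flushes the remaining compressed data and the gzip footer.
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// Decompress reverses Compress.
func Decompress(encoded string) ([]byte, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, err
	}
	zr, err := gzip.NewReader(bytes.NewReader(raw))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	// A deliberately repetitive patch; real patches will compress less well.
	patch := []byte("[" + strings.Repeat(`{"op":"add","path":"/components/0/properties/env/0"},`, 100) + "]")
	enc, err := Compress(patch)
	if err != nil {
		panic(err)
	}
	dec, err := Decompress(enc)
	if err != nil {
		panic(err)
	}
	plain := base64.StdEncoding.EncodedLen(len(patch))
	fmt.Printf("base64 only: %d bytes, gzip+base64: %d bytes, round trip ok: %v\n",
		plain, len(enc), bytes.Equal(patch, dec))
}
```

Very small patches can come out larger after gzip because of its fixed header and footer, so a size threshold for compression may be worth considering.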
Example Usage
CLI Tool
# Show what a policy changed
kubectl vela policy diff my-app inject-sidecar
# Output:
Spec changes by policy 'inject-sidecar':
+ Added component 'monitoring-sidecar'
~ Modified components[0].properties.env
+ Added env var: SIDECAR_ENABLED=true
# Show full JSON patch
kubectl vela policy diff my-app inject-sidecar --format=json-patch
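A CLI like the one above would need to turn the stored fullDiff back into readable lines. The sketch below shows one possible rendering, assuming fullDiff is an uncompressed Base64 JSON Patch; RenderPatch and RenderFullDiff are hypothetical helpers, and the output is simpler than the component-aware text shown in the example.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// symbols maps JSON Patch operations to diff-style markers.
var symbols = map[string]string{"add": "+", "remove": "-", "replace": "~"}

// RenderPatch turns a JSON Patch document into one line per operation.
func RenderPatch(patch []byte) ([]string, error) {
	var ops []struct {
		Op   string `json:"op"`
		Path string `json:"path"`
	}
	if err := json.Unmarshal(patch, &ops); err != nil {
		return nil, err
	}
	lines := make([]string, 0, len(ops))
	for _, op := range ops {
		sym, known := symbols[op.Op]
		if !known {
			sym = "?" // move, copy and test are not rendered specially here
		}
		lines = append(lines, sym+" "+op.Path)
	}
	return lines, nil
}

// RenderFullDiff decodes the Base64 fullDiff field and renders it.
func RenderFullDiff(fullDiff string) ([]string, error) {
	raw, err := base64.StdEncoding.DecodeString(fullDiff)
	if err != nil {
		return nil, err
	}
	return RenderPatch(raw)
}

func main() {
	patch := `[{"op":"add","path":"/components/1","value":{"name":"monitoring-sidecar"}},` +
		`{"op":"replace","path":"/components/0/properties/replicas","value":3}]`
	lines, err := RenderFullDiff(base64.StdEncoding.EncodeToString([]byte(patch)))
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```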
UI Dashboard
Application: my-app
Applied Policies:
✅ inject-sidecar (vela-system)
Spec Changes:
├─ Added 1 component
├─ Modified 2 properties
└─ [View Full Diff]
Conclusion
Summary diffs (~2ms, ~1KB) provide 80% of the value with 20% of the cost.
Recommend:
- ✅ Implement Phase 1 (Summary) now
- 🤔 Evaluate Phase 2 (Full Diff) based on usage
- 📊 Add metrics to track diff size distribution
- 🔍 Monitor performance impact in production