goldpinger

mirror of https://github.com/bloomberg/goldpinger.git synced 2026-05-25 10:02:45 +00:00

Files

Cooper Ry Lees 5e625bbd40 Prune stale Prometheus metrics for defunct peer pod IPs on teardown

After a DaemonSet rolling update, goldpinger retained response-time
histogram and error counter series for old pod IPs that no longer exist.
These stale single-sample series skewed P95/P99 latency calculations and
made transient rollout errors appear permanent. (Fixes #167)

The existing destroyPingers path only cleaned up UDP-specific per-peer
metrics (and only when UDP was enabled). This change adds:

- DeletePeerMetrics(): removes goldpinger_peers_response_time_s histogram
  label sets for destroyed peers, called unconditionally on pinger teardown
- goldpinger_udp_errors_total cleanup in DeletePeerUDPMetrics(), which was
  previously missed
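
The pruning idea in the list above can be sketched with a stdlib-only toy
model. This is not the code from the commit: peerLabels, peerSeries and
DeletePeer are hypothetical names, and the map stands in for a labelled
metric vector. The Prometheus Go client exposes deletion methods on its
vector types (for example DeleteLabelValues), but which call
DeletePeerMetrics() actually uses is not stated in this message.

```go
package main

import (
	"fmt"
	"sync"
)

// peerLabels is a hypothetical label tuple identifying one peer's series.
type peerLabels struct {
	HostIP string
	PodIP  string
}

// peerSeries is a toy stand-in for a labelled metric vector:
// one slice of observations per label tuple.
type peerSeries struct {
	mu   sync.Mutex
	data map[peerLabels][]float64
}

func newPeerSeries() *peerSeries {
	return &peerSeries{data: make(map[peerLabels][]float64)}
}

// Observe records one response time (in seconds) for the given peer.
func (s *peerSeries) Observe(l peerLabels, seconds float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[l] = append(s.data[l], seconds)
}

// DeletePeer drops the peer's series and reports whether one existed.
func (s *peerSeries) DeletePeer(l peerLabels) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.data[l]
	delete(s.data, l)
	return ok
}

// Len returns the number of live series.
func (s *peerSeries) Len() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.data)
}

func main() {
	s := newPeerSeries()
	oldPeer := peerLabels{HostIP: "fd00::1", PodIP: "fd00:10::a"}
	newPeer := peerLabels{HostIP: "fd00::1", PodIP: "fd00:10::b"}
	s.Observe(oldPeer, 0.004)
	s.Observe(newPeer, 0.003)

	// On pinger teardown the old peer's series is removed unconditionally,
	// i.e. regardless of whether UDP probing is enabled.
	s.DeletePeer(oldPeer)
	fmt.Println("live series:", s.Len())
}
```

Without the delete call, the single-sample series for the old pod IP would
stay in the map indefinitely, which is the behavior described above.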

Testing:
- TestDeletePeerMetrics_CleansResponseTimeHistogram: verifies the
  response-time histogram label set is removed after DeletePeerMetrics()
- TestDeletePeerMetrics_LeavesOtherPeersIntact: verifies pruning one
  peer does not affect another peer's metric series
- TestDeletePeerUDPMetrics_CleansAllPerPeerMetrics: extended to also
  verify goldpinger_udp_errors_total cleanup
- All 11 tests pass (go test ./pkg/goldpinger/ -v)

Validated on a 6-node IPv6 kubeadm cluster by upgrading goldpinger with
a rolling update and confirming /metrics only contains current pod IPs
after the rollout completes.
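
A check like the one described above could be automated roughly as follows.
This is a sketch under stated assumptions, not the procedure used for the
validation: stalePodIPs is a hypothetical helper, the pod_ip label name is
assumed, and the sample text is made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// podIPLabel matches a pod_ip="..." label in Prometheus text exposition
// output. The label name is an assumption made for this sketch.
var podIPLabel = regexp.MustCompile(`pod_ip="([^"]+)"`)

// stalePodIPs returns the sorted, de-duplicated pod IPs that appear in the
// metrics text but are not in the set of current pod IPs.
func stalePodIPs(metrics string, current map[string]bool) []string {
	seen := make(map[string]bool)
	var stale []string
	for _, m := range podIPLabel.FindAllStringSubmatch(metrics, -1) {
		ip := m[1]
		if current[ip] || seen[ip] {
			continue
		}
		seen[ip] = true
		stale = append(stale, ip)
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// Illustrative sample only; not captured from a real cluster.
	metrics := `goldpinger_peers_response_time_s_count{host_ip="fd00::1",pod_ip="fd00:10::a"} 12
goldpinger_peers_response_time_s_count{host_ip="fd00::2",pod_ip="fd00:10::b"} 1
`
	current := map[string]bool{"fd00:10::a": true}
	fmt.Println("stale pod IPs:", stalePodIPs(metrics, current))
}
```

An empty result after the rollout would correspond to /metrics containing
only current pod IPs.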

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Cooper Ry Lees <me@cooperlees.com>

2026-04-20 14:02:40 -05:00

client

Add UDP probe metrics: packet loss, hop count, and RTT

2026-03-27 16:05:32 +00:00

goldpinger

Prune stale Prometheus metrics for defunct peer pod IPs on teardown

2026-04-20 14:02:40 -05:00

models

Rename PathLength to HopCount in swagger model and UI

2026-04-02 19:45:31 +00:00

restapi

Rename PathLength to HopCount in swagger model and UI

2026-04-02 19:45:31 +00:00