22 Commits

Author SHA1 Message Date
Cooper Ry Lees
832bc7b598 Add UDP probe metrics: packet loss, hop count, and RTT
Add an opt-in UDP echo probe that runs alongside the existing HTTP
ping. Each goldpinger pod listens on a configurable UDP port (default
6969). During each ping cycle, the prober sends N sequenced packets
to the peer's listener, which echoes them back. From the replies we
compute packet loss percentage, path hop count (from IPv4 TTL / IPv6
HopLimit), and average round-trip time.
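
As a rough illustration of the arithmetic described above, here is a
minimal sketch. It is not the code in udp_probe.go; the function names
(hopsFromTTL, lossPercent) are hypothetical, and the hop estimate
assumes the peer sent with one of the common initial TTL / HopLimit
values (64, 128, or 255).

```go
package main

import "fmt"

// hopsFromTTL guesses how many routers decremented the TTL (IPv4) or
// HopLimit (IPv6) of a received packet.
// Assumption: the sender started from the smallest common default
// (64, 128, 255) that is >= the observed value.
func hopsFromTTL(received int) int {
	if received <= 0 {
		return 0 // no usable TTL information
	}
	for _, initial := range []int{64, 128, 255} {
		if received <= initial {
			return initial - received
		}
	}
	return 0 // value out of range for an 8-bit field
}

// lossPercent returns the share of probes that were never echoed back.
func lossPercent(sent, received int) float64 {
	if sent <= 0 {
		return 0
	}
	if received > sent {
		received = sent // ignore duplicate replies
	}
	// Multiply before dividing so whole-number results stay exact.
	return float64(sent-received) * 100 / float64(sent)
}

func main() {
	fmt.Println("hops:", hopsFromTTL(61))
	fmt.Println("loss %:", lossPercent(10, 7))
}
```

With these assumptions, a reply arriving with TTL 61 is read as three
hops, and 7 replies out of 10 packets as 30% loss.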

New Prometheus metrics:
  - goldpinger_peers_loss_pct      (gauge)     — per-peer UDP loss %
  - goldpinger_peers_path_length   (gauge)     — estimated hop count
  - goldpinger_peers_udp_rtt_ms    (histogram) — UDP RTT in milliseconds

The graph UI shows yellow edges for links with partial loss, and
displays sub-millisecond UDP RTT instead of HTTP latency when UDP
is enabled. Stale metric labels are cleaned up when a pinger is
destroyed so rolled pods don't leave ghost entries.

Configuration (all via env vars, disabled by default):
  UDP_ENABLED=true      enable UDP probing and listener
  UDP_PORT=6969         listener port
  UDP_PACKET_COUNT=10   packets per probe
  UDP_PACKET_SIZE=64    bytes per packet
  UDP_TIMEOUT=1s        probe timeout

New files:
  pkg/goldpinger/udp_probe.go       — echo listener + probe client
  pkg/goldpinger/udp_probe_test.go  — unit tests

Unit tests:
```
=== RUN   TestProbeUDP_NoLoss
    udp_probe_test.go:51: avg UDP RTT: 0.0823 ms
--- PASS: TestProbeUDP_NoLoss (0.00s)
=== RUN   TestProbeUDP_FullLoss
--- PASS: TestProbeUDP_FullLoss (0.00s)
=== RUN   TestProbeUDP_PacketFormat
--- PASS: TestProbeUDP_PacketFormat (0.00s)
=== RUN   TestEstimateHops
--- PASS: TestEstimateHops (0.00s)
PASS
```

Cluster test (6-node IPv6 k8s, UDP_ENABLED=true):
```
Prometheus metrics (healthy cluster, 0% loss):
  goldpinger_peers_loss_pct{...,pod_ip="fd00:4:69:3::3746"} 0
  goldpinger_peers_path_length{...,pod_ip="fd00:4:69:3::3746"} 0

Simulated 50% loss via ip6tables DROP in pod netns on node-0:
  goldpinger_peers_loss_pct{instance="server",...} 60
  goldpinger_peers_loss_pct{instance="node-1",...} 30
  goldpinger_peers_loss_pct{instance="server2",...} 30

UDP RTT vs HTTP RTT (check_all API):
  node-0 -> server:  udp=2.18ms  http=2ms
  node-2 -> node-2:  udp=0.40ms  http=1ms
  server -> node-0:  udp=0.55ms  http=2ms

Post-rollout stale metrics cleanup verified:
  All 36 edges show 0% loss, no stale pod IPs.
```

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Cooper Ry Lees <me@cooperlees.com>
2026-03-27 16:05:32 +00:00
Derek Brown
4af6666853 feat: add helm chart
Signed-off-by: Derek Brown <derektbrown@users.noreply.github.com>
2023-09-25 15:51:14 -07:00
Will Daly
1f3ad0acc9 Remove deprecated rbac.authorization.k8s.io/v1beta1
This commit updates the README and examples to use
rbac.authorization.k8s.io/v1 instead, which has been available
since K8s 1.8.

rbac.authorization.k8s.io/v1beta1 was deprecated in K8s 1.17
and removed in K8s 1.22.

Reference:
https://kubernetes.io/docs/reference/using-api/deprecation-guide/#rbac-resources-v122

Signed-off-by: Will Daly <widaly@microsoft.com>
2023-05-03 11:29:42 -07:00
tgetachew
14ea96999a add external probes
Signed-off-by: kitfoman <thaddeusgetachew@gmail.com>

make timeout flags backwards compatible

Signed-off-by: kitfoman <thaddeusgetachew@gmail.com>
2022-05-08 22:02:09 -04:00
Tyler Lloyd
5b080c7087 update readme for IPv6 example
Signed-off-by: Tyler Lloyd <Tyler.Lloyd@microsoft.com>
2021-11-03 16:58:38 -04:00
Mikolaj Pawlikowski
0260da795f Update example-with-kubeconfig.yaml
Signed-off-by: Mikolaj Pawlikowski <mikolaj@pawlikowski.pl>
2020-05-08 11:44:30 +01:00
Mikolaj Pawlikowski
4f8d872700 Update example-serviceaccounts.yml
Signed-off-by: Mikolaj Pawlikowski <mikolaj@pawlikowski.pl>
2020-05-08 11:44:30 +01:00
Ángel Barrera Sánchez
8a86a74478 Change dashboard's datasource parameter to be variable
Signed-off-by: Ángel Barrera Sánchez <angel@sighup.io>
2019-11-14 17:19:42 +01:00
Ángel Barrera Sánchez
4dab241ff6 Change daemonset definition to be more secure
Signed-off-by: Ángel Barrera Sánchez <angel@sighup.io>
2019-11-14 17:19:42 +01:00
Mikolaj Pawlikowski
433a6b8b88 Add a note about the DNS usage
Signed-off-by: Mikolaj Pawlikowski <mikolaj@pawlikowski.pl>
2019-09-06 14:51:22 +01:00
Danny Kulchinsky
d243f0fb59 update example
Signed-off-by: Danny Kulchinsky <danny.kul@gmail.com>
Signed-off-by: Danny Kulchinsky <dannyk@tuenti.com>
2019-03-17 21:01:15 -04:00
Mikolaj Pawlikowski
c006eede86 Merge branch 'master' into stn/rendezvous-hashing
2019-03-13 17:11:23 +00:00
stuart nelson
771f303062 Add rendezvous hash for selecting subset of nodes
Select a user-defined number of pods via
rendezvous hash. This is important for larger
clusters, where the metric cardinality explosion
is too much for a single prometheus to handle.
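
To make the idea concrete, here is a minimal sketch of rendezvous
(highest-random-weight) selection. It is an illustration only, not the
code from this commit: the names (score, selectPeers), the FNV-1a hash,
and the use of IP strings as keys are all assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// score hashes the (self, peer) pair. Any stable hash works; FNV-1a
// is used here only because it ships with the standard library.
func score(self, peer string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(self))
	h.Write([]byte{0}) // separator so "ab"+"c" differs from "a"+"bc"
	h.Write([]byte(peer))
	return h.Sum64()
}

// selectPeers returns the n peers with the highest score for self.
// The result does not depend on the order of the input slice, and
// adding or removing an unrelated peer leaves most picks unchanged.
func selectPeers(self string, peers []string, n int) []string {
	if n <= 0 {
		return nil
	}
	ranked := append([]string(nil), peers...) // do not mutate the input
	sort.Slice(ranked, func(i, j int) bool {
		si, sj := score(self, ranked[i]), score(self, ranked[j])
		if si != sj {
			return si > sj
		}
		return ranked[i] < ranked[j] // deterministic tie-break
	})
	if n > len(ranked) {
		n = len(ranked)
	}
	return ranked[:n]
}

func main() {
	peers := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"}
	fmt.Println(selectPeers("10.0.0.9", peers, 2))
}
```

Each pod then pings only its n selected peers, so the number of
per-peer metric series grows with n per pod rather than with the
cluster size.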

Signed-off-by: stuart nelson <stuartnelson3@gmail.com>
2019-03-13 15:30:18 +01:00
Mikolaj Pawlikowski
6cd12ef2d5 Update example-with-kubeconfig.yaml
Signed-off-by: Mikolaj Pawlikowski <mikolaj@pawlikowski.pl>
2019-03-12 23:07:54 +00:00
Mikolaj Pawlikowski
8182369c02 Update example-serviceaccounts.yml
Signed-off-by: Mikolaj Pawlikowski <mikolaj@pawlikowski.pl>
2019-03-12 23:07:29 +00:00
Danny Kulchinsky
1de9257854 fix typo and version bump
Signed-off-by: Danny Kulchinsky <danny.kul@gmail.com>
2019-01-02 10:24:04 -05:00
Marcos Diez
e23ddf5083 More complete example-serviceaccounts.yml, now with rbac rules
Signed-off-by: Marcos Diez <marcos@unitron.com.br>
2018-12-26 16:55:58 -02:00
Mikolaj Pawlikowski
7eec26b194 Add a large size screenshot
Signed-off-by: Mikolaj Pawlikowski <mikolaj@pawlikowski.pl>
2018-12-20 18:52:44 +01:00
Mikolaj Pawlikowski
ca464c0e77 Merge branch 'master' into example-serviceaccounts
2018-12-11 16:49:12 -08:00
Mikolaj Pawlikowski
1342f80776 Create example-serviceaccounts.yml
Signed-off-by: Mikolaj Pawlikowski <mikolaj@pawlikowski.pl>
2018-12-11 16:45:31 -08:00
Mikolaj Pawlikowski
9dfed4af5f Update goldpinger-dashboard.json
Signed-off-by: Mikolaj Pawlikowski <mikolaj@pawlikowski.pl>
2018-12-06 15:07:18 -05:00
Kevin P. Fleming
fa643e9be8 Initial commit
2018-12-04 13:33:45 -05:00