443 Commits

Author SHA1 Message Date
skamboj
7935a11f9d Merge pull request #164 from cooperlees/master
Add UDP probe metrics: packet loss, hop count, and RTT
goldpinger-1.1.0 v3.11.0
2026-04-03 13:04:31 -04:00
Sachin Kamboj
de7f4e9004 Bump the version to 3.11.0
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2026-04-03 12:57:41 -04:00
Cooper Ry Lees
145d2bf000 Rename PathLength to HopCount in swagger model and UI
Rename the swagger field from path-length to hop-count so the
generated Go struct field (PathLength → HopCount) and JSON key
(path-length → hop-count) align with the Prometheus metric rename
to goldpinger_peers_hop_count from the previous commit.

Signed-off-by: Cooper Ry Lees <me@cooperlees.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 19:45:31 +00:00
Cooper Ry Lees
641b658f23 Address PR #164 review feedback
Concurrent HTTP + UDP pings:
  HTTP ping and UDP probe now run in separate goroutines joined by a
  sync.WaitGroup, so UDP timeout doesn't add to the ping cycle
  latency. (skamboj on pinger.go:124)

Remove duplicate log:
  Removed the "UDP echo listener started" log from main.go since
  StartUDPListener already logs it. (skamboj on main.go:191)

Prometheus base units (seconds):
  Renamed goldpinger_peers_udp_rtt_ms back to goldpinger_peers_udp_rtt_s
  with sub-millisecond histogram buckets (.0001s to 1s), per Prometheus
  naming conventions. RTT is computed in seconds internally and only
  converted to ms for the JSON API. (skamboj on stats.go:150)

Rename path_length to hop_count:
  goldpinger_peers_path_length → goldpinger_peers_hop_count, and
  SetPeerPathLength → SetPeerHopCount. (skamboj on stats.go:139)

UDP buffer constant and packet size clamping:
  Added udpMaxPacketSize=1500 constant, documented as the standard Ethernet
  MTU. (Strictly, 1500 bytes is the IP-layer MTU; after IP and UDP headers,
  the largest unfragmented UDP payload is 1472 bytes over IPv4 and 1452
  bytes over IPv6.) Used for both listener and prober receive buffers.
  ProbeUDP now clamps UDP_PACKET_SIZE to udpMaxPacketSize to prevent
  silent truncation if someone configures a size > MTU.
  (skamboj on udp_probe.go:54)
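A minimal sketch of the clamping described above. `udpMaxPacketSize` matches the constant named in the commit; `clampPacketSize` and its `minSize` floor (for example, room for a sequence header) are hypothetical additions for illustration.

```go
package main

import "fmt"

// udpMaxPacketSize matches the constant described above (Ethernet MTU).
const udpMaxPacketSize = 1500

// clampPacketSize bounds a configured UDP_PACKET_SIZE to [minSize, udpMaxPacketSize].
// minSize is a hypothetical floor, not something the commit specifies.
func clampPacketSize(configured, minSize int) int {
	if configured > udpMaxPacketSize {
		return udpMaxPacketSize
	}
	if configured < minSize {
		return minSize
	}
	return configured
}

func main() {
	fmt.Println(clampPacketSize(9000, 16), clampPacketSize(64, 16)) // 1500 64
}
```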

Guard count=0:
  ProbeUDP returns an error immediately if count <= 0 instead of
  dividing by zero. (skamboj on udp_probe.go:176)
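A minimal sketch of the guard, with a hypothetical `lossPercent` helper. Note that in Go a runtime float64 division by zero does not panic; 0.0/0.0 yields NaN, which would silently reach the gauge. This makes an explicit up-front check useful even when the arithmetic is done in floating point.

```go
package main

import (
	"errors"
	"fmt"
)

// lossPercent returns the percentage of sent packets that were not echoed.
// It rejects count <= 0 before dividing. The helper name is hypothetical.
func lossPercent(count, received int) (float64, error) {
	if count <= 0 {
		return 0, errors.New("udp probe: packet count must be > 0")
	}
	if received < 0 || received > count {
		return 0, fmt.Errorf("udp probe: received %d out of range [0, %d]", received, count)
	}
	return float64(count-received) / float64(count) * 100, nil
}

func main() {
	fmt.Println(lossPercent(10, 7))
	fmt.Println(lossPercent(0, 0))
}
```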

UDP error counter:
  Added goldpinger_udp_errors_total counter (labels: goldpinger_instance,
  host). CountUDPError is called on dial failures and send errors.
  (skamboj on udp_probe.go:115)

Test: closed ephemeral port for full loss:
  TestProbeUDP_FullLoss now binds an ephemeral port and closes it,
  instead of assuming port 19999 is free. (skamboj on udp_probe_test.go:56)

Test: partial loss validation:
  New TestProbeUDP_PartialLoss uses a lossy echo listener that drops
  every Nth packet to validate loss calculations are exact:
    drop every 2nd → 50.0%, every 3rd → 33.3%,
    every 5th → 20.0%, every 10th → 10.0%
  (skamboj on udp_probe_test.go:96)
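The expected percentages depend on the packet count as well as N. This minimal sketch assumes 1-based sequence numbers and a listener that drops sequence numbers n, 2n, 3n and so on. Under that scheme, the exact 33.3% case needs a count divisible by 3: 30 is used here as an assumption, since the test's actual count is not shown. With a count of 10, the same rule gives 30.0%.

```go
package main

import "fmt"

// dropEveryNth reports whether the hypothetical lossy listener drops the
// packet with 1-based sequence number seq.
func dropEveryNth(seq, n int) bool { return seq%n == 0 }

// expectedLoss returns the loss percentage for count packets when every
// nth packet is dropped.
func expectedLoss(count, n int) float64 {
	dropped := 0
	for seq := 1; seq <= count; seq++ {
		if dropEveryNth(seq, n) {
			dropped++
		}
	}
	return float64(dropped) * 100 / float64(count)
}

func main() {
	for _, n := range []int{2, 3, 5, 10} {
		fmt.Printf("n=%d: count=30 -> %.1f%%, count=10 -> %.1f%%\n",
			n, expectedLoss(30, n), expectedLoss(10, n))
	}
}
```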

Test: zero count:
  New TestProbeUDP_ZeroCount verifies error is returned for count=0.

Test results:
```
=== RUN   TestProbeUDP_NoLoss
    udp_probe_test.go:88: avg UDP RTT: 0.0816 ms
--- PASS: TestProbeUDP_NoLoss (0.00s)
=== RUN   TestProbeUDP_FullLoss
--- PASS: TestProbeUDP_FullLoss (0.00s)
=== RUN   TestProbeUDP_PartialLoss
=== RUN   TestProbeUDP_PartialLoss/drop_every_2nd_(50%)
    udp_probe_test.go:134: loss: 50.0% (expected 50.0%)
=== RUN   TestProbeUDP_PartialLoss/drop_every_3rd_(33.3%)
    udp_probe_test.go:134: loss: 33.3% (expected 33.3%)
=== RUN   TestProbeUDP_PartialLoss/drop_every_5th_(20%)
    udp_probe_test.go:134: loss: 20.0% (expected 20.0%)
=== RUN   TestProbeUDP_PartialLoss/drop_every_10th_(10%)
    udp_probe_test.go:134: loss: 10.0% (expected 10.0%)
--- PASS: TestProbeUDP_PartialLoss (8.00s)
=== RUN   TestProbeUDP_ZeroCount
--- PASS: TestProbeUDP_ZeroCount (0.00s)
=== RUN   TestProbeUDP_PacketFormat
--- PASS: TestProbeUDP_PacketFormat (0.00s)
=== RUN   TestEstimateHops
--- PASS: TestEstimateHops (0.00s)
PASS
```

Signed-off-by: Cooper Ry Lees <me@cooperlees.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 19:37:52 +00:00
Cooper Ry Lees
832bc7b598 Add UDP probe metrics: packet loss, hop count, and RTT
Add an opt-in UDP echo probe that runs alongside the existing HTTP
ping. Each goldpinger pod listens on a configurable UDP port (default
6969). During each ping cycle, the prober sends N sequenced packets
to the peer's listener, which echoes them back. From the replies we
compute packet loss percentage, path hop count (from IPv4 TTL / IPv6
HopLimit), and average round-trip time.

New Prometheus metrics:
  - goldpinger_peers_loss_pct      (gauge)     — per-peer UDP loss %
  - goldpinger_peers_path_length   (gauge)     — estimated hop count
  - goldpinger_peers_udp_rtt_ms    (histogram) — UDP RTT in milliseconds

The graph UI shows yellow edges for links with partial loss, and
displays sub-millisecond UDP RTT instead of HTTP latency when UDP
is enabled. Stale metric labels are cleaned up when a pinger is
destroyed so rolled pods don't leave ghost entries.

Configuration (all via env vars, disabled by default):
  UDP_ENABLED=true      enable UDP probing and listener
  UDP_PORT=6969         listener port
  UDP_PACKET_COUNT=10   packets per probe
  UDP_PACKET_SIZE=64    bytes per packet
  UDP_TIMEOUT=1s        probe timeout

New files:
  pkg/goldpinger/udp_probe.go       — echo listener + probe client
  pkg/goldpinger/udp_probe_test.go  — unit tests

Unit tests:
```
=== RUN   TestProbeUDP_NoLoss
    udp_probe_test.go:51: avg UDP RTT: 0.0823 ms
--- PASS: TestProbeUDP_NoLoss (0.00s)
=== RUN   TestProbeUDP_FullLoss
--- PASS: TestProbeUDP_FullLoss (0.00s)
=== RUN   TestProbeUDP_PacketFormat
--- PASS: TestProbeUDP_PacketFormat (0.00s)
=== RUN   TestEstimateHops
--- PASS: TestEstimateHops (0.00s)
PASS
```

Cluster test (6-node IPv6 k8s, UDP_ENABLED=true):
```
Prometheus metrics (healthy cluster, 0% loss):
  goldpinger_peers_loss_pct{...,pod_ip="fd00:4:69:3::3746"} 0
  goldpinger_peers_path_length{...,pod_ip="fd00:4:69:3::3746"} 0

Simulated 50% loss via ip6tables DROP in pod netns on node-0:
  goldpinger_peers_loss_pct{instance="server",...} 60
  goldpinger_peers_loss_pct{instance="node-1",...} 30
  goldpinger_peers_loss_pct{instance="server2",...} 30

UDP RTT vs HTTP RTT (check_all API):
  node-0 -> server:  udp=2.18ms  http=2ms
  node-2 -> node-2:  udp=0.40ms  http=1ms
  server -> node-0:  udp=0.55ms  http=2ms

Post-rollout stale metrics cleanup verified:
  All 36 edges show 0% loss, no stale pod IPs.
```

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Cooper Ry Lees <me@cooperlees.com>
2026-03-27 16:05:32 +00:00
skamboj
8d63d44fe2 Merge pull request #162 from skamboj/update-golang
Some checks failed
Helm Publish / helm_publish (push) Failing after 9s
CI / build (push) Successful in 5m56s
Update golang to 1.25 and update all dependencies
v3.10.3 goldpinger-1.0.2
2026-01-28 21:12:18 -05:00
Sachin Kamboj
4392ae9f09 Update chart versions as well
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2026-01-28 20:37:47 -05:00
Sachin Kamboj
cb9c8ae248 Update goldpinger version
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2026-01-28 20:28:04 -05:00
Sachin Kamboj
b54e3feea6 Update dependencies
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2026-01-28 20:27:05 -05:00
Sachin Kamboj
0dfa55880c Update to golang 1.25
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2026-01-28 20:26:41 -05:00
skamboj
a93c8040a1 Merge pull request #161 from skamboj/update-workflows
All checks were successful
Helm Publish / helm_publish (push) Successful in 52s
CI / build (push) Successful in 6m19s
Update versions of the various actions
2026-01-28 20:11:48 -05:00
Sachin Kamboj
3ce341330b Attempt to fix the bake step
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2026-01-28 16:30:22 -05:00
Sachin Kamboj
a85572f799 More updates to the versions
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2026-01-28 08:50:01 -05:00
Sachin Kamboj
f29301ed41 Merge remote-tracking branch 'upstream/master' into update-workflows
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2026-01-28 08:37:45 -05:00
Sachin Kamboj
7379914781 Update versions of the various actions
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2026-01-27 21:35:26 -05:00
skamboj
52e86c25f5 Merge pull request #152 from Leundai/add-deepwiki
Some checks failed
Helm Publish / helm_publish (push) Successful in 55s
CI / build (push) Failing after 47m53s
feat: Add deepwiki badge
2025-10-04 11:21:14 -04:00
leundai
ba779f50e7 feat: Add deepwiki badge
Small enhancement to improve quick onboarding for the curious

Signed-off-by: leundai <leogalindofrias@gmail.com>
2025-07-12 14:48:02 -04:00
skamboj
02065cf812 Merge pull request #148 from scoof/improvement-metricrelabelings
improvement: support relabelings in ServiceMonitor
v3.10.2 goldpinger-1.0.1
2024-11-11 09:30:59 -05:00
skamboj
98bee8cc4e Merge branch 'master' into improvement-metricrelabelings
2024-11-11 09:17:09 -05:00
Sachin Kamboj
41680b856a Up the app version
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2024-11-11 09:16:18 -05:00
Sachin Kamboj
8db3d2f2de Fix typo
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2024-11-11 09:14:07 -05:00
skamboj
d1c60472df Merge pull request #147 from avnes/fix/typo-in-chart-description
Fix small typo in Chart description. Change troublshoot to troubleshoot
2024-11-11 09:05:44 -05:00
skamboj
65cf0cab7c Merge branch 'master' into fix/typo-in-chart-description
2024-11-11 09:04:52 -05:00
skamboj
438c5d0739 Merge branch 'master' into improvement-metricrelabelings
2024-11-11 09:02:41 -05:00
skamboj
259ab8f22a Merge pull request #150 from laverya/build-with-go-1.23
build with go 1.23
2024-11-11 09:02:27 -05:00
skamboj
31a851fbb0 Merge branch 'master' into fix/typo-in-chart-description
2024-11-11 08:56:16 -05:00
skamboj
6401b59cb8 Merge branch 'master' into build-with-go-1.23
2024-11-11 08:54:55 -05:00
skamboj
1577ae84b8 Merge pull request #149 from laverya/update-x-image-for-cve-2024-24792
update golang.org/x/image to resolve CVE-2024-24792
2024-11-11 08:54:26 -05:00
Andrew Lavery
e1b06a5236 build with go 1.23
Signed-off-by: Andrew Lavery <laverya@umich.edu>
2024-10-11 17:07:24 +02:00
Andrew Lavery
2f77117b89 update golang.org/x/image to resolve CVE-2024-24792
Signed-off-by: Andrew Lavery <laverya@umich.edu>
2024-10-11 17:00:52 +02:00
Andreas Plesner
d8819d6d6d Fix datatype
Signed-off-by: Andreas Plesner <apj@mutt.dk>
2024-09-09 20:40:26 +02:00
Sachin Kamboj
f7ab34e462 Update the chart version
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2024-09-09 13:38:55 -04:00
Andreas Plesner
b07803d8c6 fix: move metricRelabelings to correct section
Signed-off-by: Andreas Plesner <apj@mutt.dk>
2024-08-26 12:24:08 +02:00
Andreas Plesner
876b3f4068 improvement: support relabelings in ServiceMonitor
Signed-Off-By: Andreas Plesner <apj@mutt.dk>
2024-08-12 09:13:29 +02:00
Audun Nes
2addb57cb4 Fix small typo in Chart description. Change troublshoot to troubleshoot
Signed-off-by: Audun Nes <audun.nes@gmail.com>
2024-06-13 13:12:47 +02:00
skamboj
36b0aed3b1 Merge pull request #137 from DerekTBrown/add-helm-chart
feat: add helm chart
goldpinger-1.0.0
2024-05-14 10:46:43 -04:00
Sachin Kamboj
a909e03de9 The appVersion should not have a v prefix
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2024-05-14 10:33:36 -04:00
Sachin Kamboj
b8035264ed Update the publishing workflow
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2024-05-14 10:16:45 -04:00
Sachin Kamboj
6a3794f3d6 Secure by default - set the security context and pod security context
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2024-05-14 10:05:07 -04:00
Sachin Kamboj
f514bac57c Remove kubernetes version to use the default image
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2024-05-14 10:01:00 -04:00
Sachin Kamboj
a1a481ffe9 Update to kube 1.30 for the kind cluster as well
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2024-05-14 08:42:55 -04:00
Sachin Kamboj
aed183926e Update the versions to the latest
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2024-05-14 08:36:28 -04:00
skamboj
dbd1f5f295 Merge branch 'master' into add-helm-chart
2024-05-13 15:41:59 -04:00
skamboj
a8f1a76691 Merge pull request #143 from pettersolberg88/master
Upgrade golang to 1.22 and update dependencies
v3.10.1
2024-05-13 14:40:08 -04:00
Sachin Kamboj
f4aa170407 Update the version
Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>
2024-05-13 14:32:23 -04:00
Petter Solberg
c740646bc2 Upgrade golang to 1.22 and update dependencies
Signed-off-by: Petter Solberg <pettersolberg88@gmail.com>
2024-04-16 21:27:40 +02:00
skamboj
41af078647 Merge pull request #142 from abctaylor/abctaylor-serviceaccount
Add default namespace `default` to ServiceAccount definition in example yaml
2024-04-12 09:21:17 -04:00
ABC Taylor
c70d8a6a8a Merge branch 'master' into abctaylor-serviceaccount
2024-04-11 08:41:04 +01:00
ABC Taylor
562df92c3a Add default namespace `default` to ServiceAccount definition, to catch the case where users find-and-replace `default` with another namespace but don't change it for the ServiceAccount
Signed-Off-By: ABC Taylor <abc@abctaylor.com>
2024-04-11 08:37:09 +01:00
skamboj
e22842fbfb Merge pull request #135 from j4ckstraw/use-protobuf
use protobuf and add resourceVersion in listOption
v3.10.0
2024-04-08 15:49:26 -04:00