goldpinger

mirror of https://github.com/bloomberg/goldpinger.git synced 2026-04-10 12:26:53 +00:00

Author	SHA1	Message	Date
skamboj	7935a11f9d	Merge pull request #164 from cooperlees/master Add UDP probe metrics: packet loss, hop count, and RTT goldpinger-1.1.0 v3.11.0	2026-04-03 13:04:31 -04:00
Sachin Kamboj	de7f4e9004	Bump the version to 3.11.0 Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2026-04-03 12:57:41 -04:00
Cooper Ry Lees	145d2bf000	Rename PathLength to HopCount in swagger model and UI Rename the swagger field from path-length to hop-count so the generated Go struct field (PathLength → HopCount) and JSON key (path-length → hop-count) align with the Prometheus metric rename to goldpinger_peers_hop_count from the previous commit. Signed-off-by: Cooper Ry Lees <me@cooperlees.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 19:45:31 +00:00
Cooper Ry Lees	641b658f23	Address PR #164 review feedback Concurrent HTTP + UDP pings: HTTP ping and UDP probe now run in separate goroutines via sync.WaitGroup, so UDP timeout doesn't add to the ping cycle latency. (skamboj on pinger.go:124) Remove duplicate log: Removed the "UDP echo listener started" log from main.go since StartUDPListener already logs it. (skamboj on main.go:191) Prometheus base units (seconds): Renamed goldpinger_peers_udp_rtt_ms back to goldpinger_peers_udp_rtt_s with sub-millisecond histogram buckets (.0001s to 1s), per Prometheus naming conventions. RTT is computed in seconds internally and only converted to ms for the JSON API. (skamboj on stats.go:150) Rename path_length to hop_count: goldpinger_peers_path_length → goldpinger_peers_hop_count, and SetPeerPathLength → SetPeerHopCount. (skamboj on stats.go:139) UDP buffer constant and packet size clamping: Added udpMaxPacketSize=1500 constant, documented as standard Ethernet MTU — the largest UDP payload that survives most networks without fragmentation. Used for both listener and prober receive buffers. ProbeUDP now clamps UDP_PACKET_SIZE to udpMaxPacketSize to prevent silent truncation if someone configures a size > MTU. (skamboj on udp_probe.go:54) Guard count=0: ProbeUDP returns an error immediately if count <= 0 instead of dividing by zero. (skamboj on udp_probe.go:176) UDP error counter: Added goldpinger_udp_errors_total counter (labels: goldpinger_instance, host). CountUDPError is called on dial failures and send errors. (skamboj on udp_probe.go:115) Test: random source port for full loss: TestProbeUDP_FullLoss now binds an ephemeral port and closes it, instead of assuming port 19999 is free. 
(skamboj on udp_probe_test.go:56) Test: partial loss validation: New TestProbeUDP_PartialLoss uses a lossy echo listener that drops every Nth packet to validate loss calculations are exact: drop every 2nd → 50.0%, every 3rd → 33.3%, every 5th → 20.0%, every 10th → 10.0% (skamboj on udp_probe_test.go:96) Test: zero count: New TestProbeUDP_ZeroCount verifies error is returned for count=0. Test results: ``` === RUN TestProbeUDP_NoLoss udp_probe_test.go:88: avg UDP RTT: 0.0816 ms --- PASS: TestProbeUDP_NoLoss (0.00s) === RUN TestProbeUDP_FullLoss --- PASS: TestProbeUDP_FullLoss (0.00s) === RUN TestProbeUDP_PartialLoss === RUN TestProbeUDP_PartialLoss/drop_every_2nd_(50%) udp_probe_test.go:134: loss: 50.0% (expected 50.0%) === RUN TestProbeUDP_PartialLoss/drop_every_3rd_(33.3%) udp_probe_test.go:134: loss: 33.3% (expected 33.3%) === RUN TestProbeUDP_PartialLoss/drop_every_5th_(20%) udp_probe_test.go:134: loss: 20.0% (expected 20.0%) === RUN TestProbeUDP_PartialLoss/drop_every_10th_(10%) udp_probe_test.go:134: loss: 10.0% (expected 10.0%) --- PASS: TestProbeUDP_PartialLoss (8.00s) === RUN TestProbeUDP_ZeroCount --- PASS: TestProbeUDP_ZeroCount (0.00s) === RUN TestProbeUDP_PacketFormat --- PASS: TestProbeUDP_PacketFormat (0.00s) === RUN TestEstimateHops --- PASS: TestEstimateHops (0.00s) PASS ``` Signed-off-by: Cooper Ry Lees <me@cooperlees.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 19:37:52 +00:00
Cooper Ry Lees	832bc7b598	Add UDP probe metrics: packet loss, hop count, and RTT Add an opt-in UDP echo probe that runs alongside the existing HTTP ping. Each goldpinger pod listens on a configurable UDP port (default 6969). During each ping cycle, the prober sends N sequenced packets to the peer's listener, which echoes them back. From the replies we compute packet loss percentage, path hop count (from IPv4 TTL / IPv6 HopLimit), and average round-trip time. New Prometheus metrics: - goldpinger_peers_loss_pct (gauge) — per-peer UDP loss % - goldpinger_peers_path_length (gauge) — estimated hop count - goldpinger_peers_udp_rtt_ms (histogram) — UDP RTT in milliseconds The graph UI shows yellow edges for links with partial loss, and displays sub-millisecond UDP RTT instead of HTTP latency when UDP is enabled. Stale metric labels are cleaned up when a pinger is destroyed so rolled pods don't leave ghost entries. Configuration (all via env vars, disabled by default): UDP_ENABLED=true enable UDP probing and listener UDP_PORT=6969 listener port UDP_PACKET_COUNT=10 packets per probe UDP_PACKET_SIZE=64 bytes per packet UDP_TIMEOUT=1s probe timeout New files: pkg/goldpinger/udp_probe.go — echo listener + probe client pkg/goldpinger/udp_probe_test.go — unit tests Unit tests: ``` === RUN TestProbeUDP_NoLoss udp_probe_test.go:51: avg UDP RTT: 0.0823 ms --- PASS: TestProbeUDP_NoLoss (0.00s) === RUN TestProbeUDP_FullLoss --- PASS: TestProbeUDP_FullLoss (0.00s) === RUN TestProbeUDP_PacketFormat --- PASS: TestProbeUDP_PacketFormat (0.00s) === RUN TestEstimateHops --- PASS: TestEstimateHops (0.00s) PASS ``` Cluster test (6-node IPv6 k8s, UDP_ENABLED=true): ``` Prometheus metrics (healthy cluster, 0% loss): goldpinger_peers_loss_pct{...,pod_ip="fd00:4:69:3::3746"} 0 goldpinger_peers_path_length{...,pod_ip="fd00:4:69:3::3746"} 0 Simulated 50% loss via ip6tables DROP in pod netns on node-0: goldpinger_peers_loss_pct{instance="server",...} 60 
goldpinger_peers_loss_pct{instance="node-1",...} 30 goldpinger_peers_loss_pct{instance="server2",...} 30 UDP RTT vs HTTP RTT (check_all API): node-0 -> server: udp=2.18ms http=2ms node-2 -> node-2: udp=0.40ms http=1ms server -> node-0: udp=0.55ms http=2ms Post-rollout stale metrics cleanup verified: All 36 edges show 0% loss, no stale pod IPs. ``` Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Cooper Ry Lees <me@cooperlees.com>	2026-03-27 16:05:32 +00:00
skamboj	8d63d44fe2	Merge pull request #162 from skamboj/update-golang Update golang to 1.25 and update all dependencies v3.10.3 goldpinger-1.0.2	2026-01-28 21:12:18 -05:00
Sachin Kamboj	4392ae9f09	Update chart versions as well Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2026-01-28 20:37:47 -05:00
Sachin Kamboj	cb9c8ae248	Update goldpinger version Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2026-01-28 20:28:04 -05:00
Sachin Kamboj	b54e3feea6	Update dependencies Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2026-01-28 20:27:05 -05:00
Sachin Kamboj	0dfa55880c	Update to golang 1.25 Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2026-01-28 20:26:41 -05:00
skamboj	a93c8040a1	Merge pull request #161 from skamboj/update-workflows Update versions of the various actions	2026-01-28 20:11:48 -05:00
Sachin Kamboj	3ce341330b	Attempt to fix the bake step Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2026-01-28 16:30:22 -05:00
Sachin Kamboj	a85572f799	More updates to the versions Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2026-01-28 08:50:01 -05:00
Sachin Kamboj	f29301ed41	Merge remote-tracking branch 'upstream/master' into update-workflows Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2026-01-28 08:37:45 -05:00
Sachin Kamboj	7379914781	Update versions of the various actions Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2026-01-27 21:35:26 -05:00
skamboj	52e86c25f5	Merge pull request #152 from Leundai/add-deepwiki feat: Add deepwiki badge	2025-10-04 11:21:14 -04:00
leundai	ba779f50e7	feat: Add deepwiki badge Small enhancement to improve quick onboarding for the curious Signed-off-by: leundai <leogalindofrias@gmail.com>	2025-07-12 14:48:02 -04:00
skamboj	02065cf812	Merge pull request #148 from scoof/improvement-metricrelabelings improvement: support relabelings in ServiceMonitor v3.10.2 goldpinger-1.0.1	2024-11-11 09:30:59 -05:00
skamboj	98bee8cc4e	Merge branch 'master' into improvement-metricrelabelings	2024-11-11 09:17:09 -05:00
Sachin Kamboj	41680b856a	Up the app version Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2024-11-11 09:16:18 -05:00
Sachin Kamboj	8db3d2f2de	Fix typo Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2024-11-11 09:14:07 -05:00
skamboj	d1c60472df	Merge pull request #147 from avnes/fix/typo-in-chart-description Fix small typo i Chart description. Change troublshoot to troubleshoot	2024-11-11 09:05:44 -05:00
skamboj	65cf0cab7c	Merge branch 'master' into fix/typo-in-chart-description	2024-11-11 09:04:52 -05:00
skamboj	438c5d0739	Merge branch 'master' into improvement-metricrelabelings	2024-11-11 09:02:41 -05:00
skamboj	259ab8f22a	Merge pull request #150 from laverya/build-with-go-1.23 build with go 1.23	2024-11-11 09:02:27 -05:00
skamboj	31a851fbb0	Merge branch 'master' into fix/typo-in-chart-description	2024-11-11 08:56:16 -05:00
skamboj	6401b59cb8	Merge branch 'master' into build-with-go-1.23	2024-11-11 08:54:55 -05:00
skamboj	1577ae84b8	Merge pull request #149 from laverya/update-x-image-for-cve-2024-24792 update golang.org/x/image to resolve cve-2024-24792	2024-11-11 08:54:26 -05:00
Andrew Lavery	e1b06a5236	build with go 1.23 Signed-off-by: Andrew Lavery <laverya@umich.edu>	2024-10-11 17:07:24 +02:00
Andrew Lavery	2f77117b89	update golang.org/x/image to resolve cve-2024-24792 Signed-off-by: Andrew Lavery <laverya@umich.edu>	2024-10-11 17:00:52 +02:00
Andreas Plesner	d8819d6d6d	Fix datatype Signed-off-by: Andreas Plesner <apj@mutt.dk>	2024-09-09 20:40:26 +02:00
Sachin Kamboj	f7ab34e462	Update the chart version Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2024-09-09 13:38:55 -04:00
Andreas Plesner	b07803d8c6	fix: move metricRelabelings to correct section Signed-off-by: Andreas Plesner <apj@mutt.dk>	2024-08-26 12:24:08 +02:00
Andreas Plesner	876b3f4068	improvement: support relabelings in ServiceMonitor Signed-Off-By: Andreas Plesner <apj@mutt.dk>	2024-08-12 09:13:29 +02:00
Audun Nes	2addb57cb4	iFix small typo i Chart description. Change troublshoot to troubleshoot Signed-off-by: Audun Nes <audun.nes@gmail.com>	2024-06-13 13:12:47 +02:00
skamboj	36b0aed3b1	Merge pull request #137 from DerekTBrown/add-helm-chart feat: add helm chart goldpinger-1.0.0	2024-05-14 10:46:43 -04:00
Sachin Kamboj	a909e03de9	The appVersion should not have a v Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2024-05-14 10:33:36 -04:00
Sachin Kamboj	b8035264ed	Update the publishing workflow Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2024-05-14 10:16:45 -04:00
Sachin Kamboj	6a3794f3d6	Secure by default - set the security context and pod security context Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2024-05-14 10:05:07 -04:00
Sachin Kamboj	f514bac57c	Remove kubernetes version to use the default image Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2024-05-14 10:01:00 -04:00
Sachin Kamboj	a1a481ffe9	Update to kube 1.30 for the kind cluster as well Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2024-05-14 08:42:55 -04:00
Sachin Kamboj	aed183926e	Update the versions to the latest Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2024-05-14 08:36:28 -04:00
skamboj	dbd1f5f295	Merge branch 'master' into add-helm-chart	2024-05-13 15:41:59 -04:00
skamboj	a8f1a76691	Merge pull request #143 from pettersolberg88/master Upgrade golang to 1.22 and update dependencies v3.10.1	2024-05-13 14:40:08 -04:00
Sachin Kamboj	f4aa170407	Update the version Signed-off-by: Sachin Kamboj <skamboj1@bloomberg.net>	2024-05-13 14:32:23 -04:00
Petter Solberg	c740646bc2	Upgrade golang to 1.22 and update dependencies Signed-off-by: Petter Solberg <pettersolberg88@gmail.com>	2024-04-16 21:27:40 +02:00
skamboj	41af078647	Merge pull request #142 from abctaylor/abctaylor-serviceaccount Add default namespace `default` to ServiceAccount definition in example yaml	2024-04-12 09:21:17 -04:00
ABC Taylor	c70d8a6a8a	Merge branch 'master' into abctaylor-serviceaccount	2024-04-11 08:41:04 +01:00
ABC Taylor	562df92c3a	Add default namespace `default` to ServiceAccount definition, to catch case where users find-replace `default` with another namespace but don't change it for the ServiceAccount Signed-Off-By: ABC Taylor <abc@abctaylor.com>	2024-04-11 08:37:09 +01:00
skamboj	e22842fbfb	Merge pull request #135 from j4ckstraw/use-protobuf use protobuf and add resourceVersion in listOption v3.10.0	2024-04-08 15:49:26 -04:00
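
The UDP probe commits above (832bc7b598, 641b658f23) describe three small calculations: loss percentage with a guard against count <= 0, clamping UDP_PACKET_SIZE to udpMaxPacketSize, and estimating hop count from the received TTL / HopLimit. The sketch below illustrates those ideas only; it is not the code in pkg/goldpinger/udp_probe.go. The helper names lossPercent, clampPacketSize, and estimateHops are hypothetical, and the list of initial TTL values (64, 128, 255) is an assumed heuristic that the commit messages do not confirm.

```go
package main

import (
	"errors"
	"fmt"
)

// udpMaxPacketSize uses the value named in commit 641b658f23.
const udpMaxPacketSize = 1500

// clampPacketSize (hypothetical helper) caps a configured packet size
// at udpMaxPacketSize so a reply is not silently truncated.
func clampPacketSize(size int) int {
	if size > udpMaxPacketSize {
		return udpMaxPacketSize
	}
	return size
}

// lossPercent (hypothetical helper) returns the percentage of sent packets
// that received no reply. It returns an error for sent <= 0 instead of
// dividing by zero.
func lossPercent(sent, received int) (float64, error) {
	if sent <= 0 {
		return 0, errors.New("packet count must be greater than zero")
	}
	return float64(sent-received) / float64(sent) * 100, nil
}

// estimateHops (hypothetical helper) guesses the hop count from a received
// TTL or HopLimit. Assumption: the sender started from one of the common
// initial values 64, 128, or 255; the smallest one that is >= the received
// value is taken as the starting point.
func estimateHops(receivedTTL int) int {
	for _, initial := range []int{64, 128, 255} {
		if receivedTTL <= initial {
			return initial - receivedTTL
		}
	}
	return 0 // out-of-range TTL: no estimate
}

func main() {
	loss, err := lossPercent(10, 5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Prints: loss=50.0% hops=3 size=1500
	fmt.Printf("loss=%.1f%% hops=%d size=%d\n", loss, estimateHops(61), clampPacketSize(9000))
}
```

With this heuristic a reply that arrives with its TTL unchanged (for example 64) yields 0 hops, which would be consistent with the 0 values reported for goldpinger_peers_path_length in the healthy-cluster test, though the actual implementation may compute it differently.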


443 Commits