This prevents cluttering host.LocalNetworks with lots of /32
addresses. These were unsightly and rather distracting in the UI. They
also bloated the report and slowed down server-side rendering.
Fixes#2748.
The rendering code checks whether endpoint IPs are part of
cluster-local networks. Due to the prevalence of endpoints - medium
sized reports can contain many thousands of endpoints - this is
performance critical. Alas the existing code performs the check via a
linear scan of a list of networks. That is slow when there are more
than a few, which will be the case in the context of k8s, since there
the probes register service IPs as local /32 networks.
Here we change representation of the set of networks to a prefix
tree (aka trie), which is well-suited for IP network membership checks
since networks are in fact a bitstring prefixes.
The specific representation is a crit-bit tree, but that choice was
purely based on implementation convenience - the chosen library is the
only one I could find that directly supports IP networks.
The rendering code checks whether endpoint IPs are part of
cluster-local networks. Due to the prevalence of endpoints - medium
sized reports can contain many thousands of endpoints - this is
performance critical. Alas the existing code performs the check via a
linear scan of a list of networks. That is slow when there are more
than a few. Unfortunately in some common k8s network setups, e.g. on
AWS, a cluster can contain hundreds of networks, due to /32 networks
derived from interfaces with multiple IPs.
Here we change representation of the set of networks to a prefix
tree (aka trie), which is well-suited for IP network membership checks
since networks are in fact a bitstring prefixes.
The specific representation is a crit-bit tree, but that choice was
purely based on implementation convenience - the chosen library is the
only one I could find that directly supports IP networks.
...by removing them. It was a ridiculous amount of contorted code to
test some utterly trivial functionality that is largely provided by
the golang stdlib.
Another implicit invariant in the data model is that edges are always of the
form (local -> remote). That is, the source of an edge must always be a node
that originates from within Scope's domain of visibility. This was evident by
the presence of ingress and egress fields in edge/aggregate metadata.
When building the sniffer, I accidentally and incorrectly violated this
invariant, by constructing distinct edges for (local -> remote) and (remote ->
local), and collapsing ingress and egress byte counts to a single scalar. I
experienced a variety of subtle undefined behavior as a result. See #339.
This change reverts to the old, correct methodology. Consequently the sniffer
needs to be able to find out which side of the sniffed packet is local v.
remote, and to do that it needs access to local networks. I moved the
discovery from the probe/host package into probe/main.go.
As part of that work I discovered that package report also maintains its own,
independent "cache" of local networks. Except it contains only the (optional)
Docker bridge network, if it's been populated by the probe, and it's only used
by the report.Make{Endpoint,Address}NodeID constructors to scope local
addresses. Normally, scoping happens during rendering, and only for pseudo
nodes -- see current LeafMap Render localNetworks. This is pretty convoluted
and should be either be made consistent or heavily commented.