We defer starting the ebpf tracer until we've set the global var which
is referenced by the callback functions. Previously the var could be
unset when the callbacks are invoked, resulting in a segfault.
Fixes#2687.
The rendering code checks whether endpoint IPs are part of
cluster-local networks. Due to the prevalence of endpoints - medium
sized reports can contain many thousands of endpoints - this is
performance critical. Alas the existing code performs the check via a
linear scan of a list of networks. That is slow when there are more
than a few, which will be the case in the context of k8s, since there
the probes register service IPs as local /32 networks.
Here we change representation of the set of networks to a prefix
tree (aka trie), which is well-suited for IP network membership checks
since networks are in fact a bitstring prefixes.
The specific representation is a crit-bit tree, but that choice was
purely based on implementation convenience - the chosen library is the
only one I could find that directly supports IP networks.
The rendering code checks whether endpoint IPs are part of
cluster-local networks. Due to the prevalence of endpoints - medium
sized reports can contain many thousands of endpoints - this is
performance critical. Alas the existing code performs the check via a
linear scan of a list of networks. That is slow when there are more
than a few. Unfortunately in some common k8s network setups, e.g. on
AWS, a cluster can contain hundreds of networks, due to /32 networks
derived from interfaces with multiple IPs.
Here we change representation of the set of networks to a prefix
tree (aka trie), which is well-suited for IP network membership checks
since networks are in fact a bitstring prefixes.
The specific representation is a crit-bit tree, but that choice was
purely based on implementation convenience - the chosen library is the
only one I could find that directly supports IP networks.
We will use this code to execute the code in some process' network
namespace.
I did the vendoring a bit differently, as gvt seems to be a bit dumb
about getting dependencies for test packages (it tried to vendor
ginkgo and gomega, since cni tests are using it).
Also, instead of vendoring golang.org/x/sys as
github.com/containernetworking/cni/vendor/golang.org/x/sys I moved it
to scope's vendor directory.
* alpine: dl-4.alpinelinux.org is dead, use another server
* increase buffer for docker stats
Attempt to avoid the following message:
docker container: dropping stats.
* probe: better timeout error messages
The logs contains the following messages:
Process reporter took longer than 1s
K8s reporter took longer than 1s
Docker reporter took longer than 1s
Endpoint reporter took longer than 1s
This patch prints how long it takes.