...when initialising eBPF-based connection tracking.
Previously we were ignoring all eBPF events until we had gathered the
existing connections. That means we could a) miss connections created
during the gathering, and b) fail to forget connections that got
closed during the gathering.
The fix comprises the following changes:
1. pay attention to eBPF events immediately. That way we do not
miss anything.
2. remember connections for which we received a Close event during the
initalisation phase, and subsequently drop gathered existing
connections that match these. That way we do not erroneously consider
a gathered connection as open when it got closed since the gathering.
3. drop gathered existing connections which match connections detected
through eBPF events. The latter typically have more / current
metadata. In particular, PIDs can be missing from the former.
Fixes#2689.
Fixes#2700.
Without synchronisation, the isDead() call might return a stale value,
delaying deadness detection potentially indefinitely.
Without the guards / idempotence in .stop(), invoking stop() more than
once could cause a panic, since tracer.Stop() closes a channel (which
panics on a closed channel). Multiple stop() invocations are rare, but
not impossible.
when we got an fd install event but the pid was dead by time we
processed it, we would fail to remove the watcher for that pid from
the fdinstall_pids table.
This is a minor, and bounded, leak, since the table only contains pids
that were alive when we initialized ebpf. And this change only plugs
that leak very partially, since we will never remove pids that die
while sitting in accept().
We defer starting the ebpf tracer until we've set the global var which
is referenced by the callback functions. Previously the var could be
unset when the callbacks are invoked, resulting in a segfault.
Fixes#2687.
ProcNet.Next does not allocate Connection structs, for efficiency.
Instead it always returns a *Connection pointing to the same instance.
As a result, any mutations by the caller to struct elements that
aren't actually set by ProcNet.Next, in particular Connection.Proc,
are carried across to subsequent calls.
This had hilarious consequences: connections referencing an inode
which we hadn't come across during proc walking would be associated
with the process corresponding to the last successfully looked up
inode.
The fix is to clear out the garbage left over from previous calls.
Fixes#2638.
the information is constant and already present in the id, so we can
extract it from there.
That reduces the report size and improves report encoding/decoding
performance. It should reduce memory usage too and improve report
merging performance too.
NB: Probes with this change are incompatible with old apps.
- eliminate the code duplication when falling back to procfs scanning
- trim some superfluous comments
Also fix a bug in the procvess: when falling back to procfs scanning
in ReportConnections, the scanner was given a "--any-nat" param, which
is wrong.
Since https://github.com/weaveworks/tcptracer-bpf/pull/39, tcptracer-bpf
can generate "fd_install" events when a process installs a new file
descriptor in its fd table. Those events must be requested explicitely
on a per-pid basis with tracer.AddFdInstallWatcher(pid).
This is useful to know about "accept" events that would otherwise be
missed because kretprobes are not triggered for functions that were
called before the installation of the kretprobe.
This patch find all the processes that are currently blocked on an
accept() syscall during the EbpfTracker initialization.
feedInitialConnections() will use tracer.AddFdInstallWatcher() to
subscribe to fd_install events. When a fd_install event is received,
synthesise an accept event with the connection tuple and the network
namespace (from /proc).
Since d60874aca8 `connectionTracker` can
fallback when the `EbpfTracker` died. Hence we only have to stop the
`tracer` in `stop()`.
This commit is also a fixup for d60874aca8
where we do a gentle fallback but never actually stop the tracer to stop
polling.
Based on work from Lorenzo, updated by Iago, Alban, Alessandro and
Michael.
This PR adds connection tracking using eBPF. This feature is not enabled by default.
For now, you can enable it by launching scope with the following command:
```
sudo ./scope launch --probe.ebpf.connections=true
```
This patch allows scope to get notified of every connection event,
without relying on the parsing of /proc/$pid/net/tcp{,6} and
/proc/$pid/fd/*, and therefore improve performance.
We vendor https://github.com/iovisor/gobpf in Scope to load the
pre-compiled ebpf program and https://github.com/weaveworks/tcptracer-bpf
to guess the offsets of the structures we need in the kernel. In this
way we don't need a different pre-compiled ebpf object file per kernel.
The pre-compiled ebpf program is included in the vendoring of
tcptracer-bpf.
The ebpf program uses kprobes/kretprobes on the following kernel functions:
- tcp_v4_connect
- tcp_v6_connect
- tcp_set_state
- inet_csk_accept
- tcp_close
It generates "connect", "accept" and "close" events containing the
connection tuple but also pid and netns.
Note: the IPv6 events are not supported in Scope and thus not passed on.
probe/endpoint/ebpf.go maintains the list of connections. Similarly to
conntrack, it also keeps the dead connections for one iteration in order
to report short-lived connections.
The code for parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/* is still
there and still used at start-up because eBPF only brings us the events
and not the initial state. However, the /proc parsing for the initial
state is now done in foreground instead of background, via
newForegroundReader().
NAT resolution on connections from eBPF works in the same way as it did
on connections from /proc: by using conntrack. One of the two conntrack
instances is only started to get the initial state and then it is
stopped since eBPF detects short-lived connections.
The Scope Docker image size comparison:
- weaveworks/scope in current master: 22 MB (compressed), 68 MB
(uncompressed)
- weaveworks/scope with this patchset: 23 MB (compressed), 69 MB
(uncompressed)
Fixes#1168 (walking /proc to obtain connections is very expensive)
Fixes#1260 (Short-lived connections not tracked for containers in
shared networking namespaces)
Fixes#1962 (Port ebpf tracker to Go)
Fixes#1961 (Remove runtime kernel header dependency from ebpf tracker)