Use Utsname from golang.org/x/sys/unix which contains byte array
instead of int8/uint8 array members. This allows to simplify the string
conversions of these members and the marshal.FromUtsname functions are
no longer needed.
ProcNet.Next does not allocate Connection structs, for efficiency.
Instead it always returns a *Connection pointing to the same instance.
As a result, any mutations by the caller to struct elements that
aren't actually set by ProcNet.Next, in particular Connection.Proc,
are carried across to subsequent calls.
This had hilarious consequences: connections referencing an inode
which we hadn't come across during proc walking would be associated
with the process corresponding to the last successfully looked up
inode.
The fix is to clear out the garbage left over from previous calls.
Fixes#2638.
Since https://github.com/weaveworks/tcptracer-bpf/pull/39, tcptracer-bpf
can generate "fd_install" events when a process installs a new file
descriptor in its fd table. Those events must be requested explicitely
on a per-pid basis with tracer.AddFdInstallWatcher(pid).
This is useful to know about "accept" events that would otherwise be
missed because kretprobes are not triggered for functions that were
called before the installation of the kretprobe.
This patch find all the processes that are currently blocked on an
accept() syscall during the EbpfTracker initialization.
feedInitialConnections() will use tracer.AddFdInstallWatcher() to
subscribe to fd_install events. When a fd_install event is received,
synthesise an accept event with the connection tuple and the network
namespace (from /proc).
Based on work from Lorenzo, updated by Iago, Alban, Alessandro and
Michael.
This PR adds connection tracking using eBPF. This feature is not enabled by default.
For now, you can enable it by launching scope with the following command:
```
sudo ./scope launch --probe.ebpf.connections=true
```
This patch allows scope to get notified of every connection event,
without relying on the parsing of /proc/$pid/net/tcp{,6} and
/proc/$pid/fd/*, and therefore improve performance.
We vendor https://github.com/iovisor/gobpf in Scope to load the
pre-compiled ebpf program and https://github.com/weaveworks/tcptracer-bpf
to guess the offsets of the structures we need in the kernel. In this
way we don't need a different pre-compiled ebpf object file per kernel.
The pre-compiled ebpf program is included in the vendoring of
tcptracer-bpf.
The ebpf program uses kprobes/kretprobes on the following kernel functions:
- tcp_v4_connect
- tcp_v6_connect
- tcp_set_state
- inet_csk_accept
- tcp_close
It generates "connect", "accept" and "close" events containing the
connection tuple but also pid and netns.
Note: the IPv6 events are not supported in Scope and thus not passed on.
probe/endpoint/ebpf.go maintains the list of connections. Similarly to
conntrack, it also keeps the dead connections for one iteration in order
to report short-lived connections.
The code for parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/* is still
there and still used at start-up because eBPF only brings us the events
and not the initial state. However, the /proc parsing for the initial
state is now done in foreground instead of background, via
newForegroundReader().
NAT resolution on connections from eBPF works in the same way as it did
on connections from /proc: by using conntrack. One of the two conntrack
instances is only started to get the initial state and then it is
stopped since eBPF detects short-lived connections.
The Scope Docker image size comparison:
- weaveworks/scope in current master: 22 MB (compressed), 68 MB
(uncompressed)
- weaveworks/scope with this patchset: 23 MB (compressed), 69 MB
(uncompressed)
Fixes#1168 (walking /proc to obtain connections is very expensive)
Fixes#1260 (Short-lived connections not tracked for containers in
shared networking namespaces)
Fixes#1962 (Port ebpf tracker to Go)
Fixes#1961 (Remove runtime kernel header dependency from ebpf tracker)
The header checking code was unsafe because:
1. It was accessing the byteslice at [2] without ensuring a length >= 3
2. It was assuming that the indentation of the 'sl' header is always 2 (which seems to be the case in recent kernels 8f18e4d03e/net/ipv4/tcp_ipv4.c (L2304) and 8f18e4d03e/net/ipv6/tcp_ipv6.c (L1831) ) but it's more robust to simply trim the byteslice.
getWalkedProcPid() reads latestBuf every 3 seconds (for each report).
But performWalk() writes latestBuf every 10 seconds or so. So we need to
be able to read the same buffer several times.