Since d60874aca8 `connectionTracker` can
fallback when the `EbpfTracker` died. Hence we only have to stop the
`tracer` in `stop()`.
This commit is also a fixup for d60874aca8
where we do a gentle fallback but never actually stop the tracer to stop
polling.
Based on work from Lorenzo, updated by Iago, Alban, Alessandro and
Michael.
This PR adds connection tracking using eBPF. This feature is not enabled by default.
For now, you can enable it by launching scope with the following command:
```
sudo ./scope launch --probe.ebpf.connections=true
```
This patch allows scope to get notified of every connection event,
without relying on the parsing of /proc/$pid/net/tcp{,6} and
/proc/$pid/fd/*, and therefore improve performance.
We vendor https://github.com/iovisor/gobpf in Scope to load the
pre-compiled ebpf program and https://github.com/weaveworks/tcptracer-bpf
to guess the offsets of the structures we need in the kernel. In this
way we don't need a different pre-compiled ebpf object file per kernel.
The pre-compiled ebpf program is included in the vendoring of
tcptracer-bpf.
The ebpf program uses kprobes/kretprobes on the following kernel functions:
- tcp_v4_connect
- tcp_v6_connect
- tcp_set_state
- inet_csk_accept
- tcp_close
It generates "connect", "accept" and "close" events containing the
connection tuple but also pid and netns.
Note: the IPv6 events are not supported in Scope and thus not passed on.
probe/endpoint/ebpf.go maintains the list of connections. Similarly to
conntrack, it also keeps the dead connections for one iteration in order
to report short-lived connections.
The code for parsing /proc/$pid/net/tcp{,6} and /proc/$pid/fd/* is still
there and still used at start-up because eBPF only brings us the events
and not the initial state. However, the /proc parsing for the initial
state is now done in foreground instead of background, via
newForegroundReader().
NAT resolution on connections from eBPF works in the same way as it did
on connections from /proc: by using conntrack. One of the two conntrack
instances is only started to get the initial state and then it is
stopped since eBPF detects short-lived connections.
The Scope Docker image size comparison:
- weaveworks/scope in current master: 22 MB (compressed), 68 MB
(uncompressed)
- weaveworks/scope with this patchset: 23 MB (compressed), 69 MB
(uncompressed)
Fixes#1168 (walking /proc to obtain connections is very expensive)
Fixes#1260 (Short-lived connections not tracked for containers in
shared networking namespaces)
Fixes#1962 (Port ebpf tracker to Go)
Fixes#1961 (Remove runtime kernel header dependency from ebpf tracker)
With net.netfilter.nf_conntrack_acct = 1, conntrack adds the following
fields in the output: packets=3 bytes=164
And with SELinux (e.g. Fedora), conntrack adds: secctx=...
The parsing with fmt.Sscanf introduced in #2095 was unfortunately
rejecting lines with those fields. This patch fixes that by adding more
complicated parsing in decodeFlowKeyValues() with FieldsFunc and SplitN.
Fixes#2117
Regression from #2095
The header checking code was unsafe because:
1. It was accessing the byteslice at [2] without ensuring a length >= 3
2. It was assuming that the indentation of the 'sl' header is always 2 (which seems to be the case in recent kernels 8f18e4d03e/net/ipv4/tcp_ipv4.c (L2304) and 8f18e4d03e/net/ipv6/tcp_ipv6.c (L1831) ) but it's more robust to simply trim the byteslice.
getWalkedProcPid() reads latestBuf every 3 seconds (for each report).
But performWalk() writes latestBuf every 10 seconds or so. So we need to
be able to read the same buffer several times.
There were two problems:
- the renderer was looking for reverse names on the destination
- the probe was not annotating source nodes with reverse-resolved names
Fixes#1847
* Rework Scope metrics according to Prometheus conventions.
- counters should end with _total
- elaborated and added units to help strings
- recommended for cache hit/miss metrics: track only the total and the
hits and in separate metrics, since the most common query will be
"hits / total"
- track all times in seconds (base units), which has become the standard
recommendation
- other small changes
There could be more changes that would require more thinking (what
dimensions to use, summaries vs. histograms, etc.), but this is probably
enough controversial material already :)
* Use timeRequestStatus() in sqs_control_router.go.
This closes a small window where we might produce reports which contain flows that are NEW but have never seen an UPDATE, which can potentially be invalid.