When the probe first starts we should only be interested in active
connections, and if the loop re-starts it's probably because too many
connections are opening and closing to keep up with, so it's good to
drop any that are already closed then too.
Refactor the code so `handleFlow` is only called on events, and handle
the initial list of connections directly.
- don't need another wrapper round `conntrack.Connections()`
- logPipe() was only for the command-line conntrack
- nobody closes the `event` chan now, so no need to pre-check for quit
1)backend/Dockerfile 2) probe/endpoint/dns_snooper.go
3) client/Dockerfile 4) docker/Dockerfile.cloud-agent
5) probe/process/walker_linux_test.go & 6) tools/lint
1)'backend/Dockerfile' : Conditional added so that the cross-compiling should
be done on amd64. Also removed support for sh-lint for ppc64le for now.
As the version for shfmt mentioned in the dockerfile is not available for
ppc64le and the later version does't work fine with existing application.
2)'probe/endpoint/dns_snooper.go' : Renamed this file so as to reuse for ppc64le
and added a build-constraint. Now this file will be build for amd64 on linux
and ppc64le on linux.
3)'client/Dockerfile' : Modified the version of the base image for node from
8.4.0 to 8.11, as this version supports multiarch.
4)'docker/Dockerfile.cloud-agent' : Modified the version of the base image for
golang from 1.10.2-strech to 1.10.2, which supports multiarch.
5) 'probe/process/walker_linux_test.go' : Test fixed to run for ppc64le,
modified the code to accept RSSBytes based on pageSize value per
architecture, instead of hard-coded values.
6)'tools/lint' : Modified the file to skip the sh-lint implementation for ppc64le.
PR #3231
With c75700fe04 we added code to detect
Ubuntu Xenial kernels with a regression in the eBPF subsystem in order
to gently fallback to procfs scanning on such systems (and not crash the
host system by running eBPF code).
With the latest kernel update for Ubuntu Xenial, the bug was fixed:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1763454
Therefore we can update the added check with an upper limit and make
sure that eBPF connection tracking only is disabled on kernels within
the range having the bug.
xref: https://github.com/weaveworks/scope/issues/3131
The Ubuntu Xenial update to kernel 4.4.0-119.143 from 4.4.0-116.140 did
include a regression in the eBPF code. A basic `bpf_map_lookup_elem`
call as found in the tcptracer-bpf library used by Scope leads to a
kernel panic. As a result, Scope / the system crashes during startup
when the tcptracer is initialized. The Scope bug report can be found
here:
https://github.com/weaveworks/scope/issues/3131
To avoid crashes and gently fallback to procfs (as Scope already does
for systems not supporting eBPF), update `isKernelSupported()` and
explicitly check for Ubuntu Kernel versions with the problem.
Once the bug is fixed and an update published, the `abiNumber` check in
`isKernelSupported()` can and should be updated with an upper limit.
The Ubuntu bug report can be found here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1763454
It is unused and none of the adjacency mapping code in the renderer
takes any notice of it. Removing this shrinks the report size.
Edges were introduced in #838. At the time we had an experimental
packet sniffer under experimental/sniff/sniffer.go. That got removed
in #1646.
We can resurrect this if we ever decide to add meta data to edges.
Use Utsname from golang.org/x/sys/unix which contains byte array
instead of int8/uint8 array members. This allows to simplify the string
conversions of these members and the marshal.FromUtsname functions are
no longer needed.
EbpfTracker can die when the tcp events are received out of order. This
can happen with a buggy kernel or apparently in other cases, see:
https://github.com/weaveworks/scope/issues/2650
As a workaround, restart EbpfTracker when an event is received out of
order. This does not seem to happen often, but as a precaution,
EbpfTracker will not restart if the last failure is less than 5 minutes
ago.
This is not easy to test but I added instrumentation to trigger a
restart:
- Start Scope with:
$ sudo WEAVESCOPE_DOCKER_ARGS="-e SCOPE_DEBUG_BPF=1" ./scope launch
- Request a stop with:
$ echo stop | sudo tee /proc/$(pidof scope-probe)/root/var/run/scope/debug-bpf
...when initialising eBPF-based connection tracking.
Previously we were ignoring all eBPF events until we had gathered the
existing connections. That means we could a) miss connections created
during the gathering, and b) fail to forget connections that got
closed during the gathering.
The fix comprises the following changes:
1. pay attention to eBPF events immediately. That way we do not
miss anything.
2. remember connections for which we received a Close event during the
initalisation phase, and subsequently drop gathered existing
connections that match these. That way we do not erroneously consider
a gathered connection as open when it got closed since the gathering.
3. drop gathered existing connections which match connections detected
through eBPF events. The latter typically have more / current
metadata. In particular, PIDs can be missing from the former.
Fixes#2689.
Fixes#2700.
Without synchronisation, the isDead() call might return a stale value,
delaying deadness detection potentially indefinitely.
Without the guards / idempotence in .stop(), invoking stop() more than
once could cause a panic, since tracer.Stop() closes a channel (which
panics on a closed channel). Multiple stop() invocations are rare, but
not impossible.
when we got an fd install event but the pid was dead by time we
processed it, we would fail to remove the watcher for that pid from
the fdinstall_pids table.
This is a minor, and bounded, leak, since the table only contains pids
that were alive when we initialized ebpf. And this change only plugs
that leak very partially, since we will never remove pids that die
while sitting in accept().
We defer starting the ebpf tracer until we've set the global var which
is referenced by the callback functions. Previously the var could be
unset when the callbacks are invoked, resulting in a segfault.
Fixes#2687.