We should have done this anyway, but I haven't noticed the error
previously. Possibly Go 1.14 is more aggressive about exiting when
some goroutines are still active.
Utility functions to create fake sets of connections for testing, and
then exercising the subset filtering code to check that quantities
come out as expected.
The app will only show one line, regardless of how many connections we
have, so reduce the number to save bandwidth and rendering time.
We filter by choosing a modulus, e.g. send every connection that is a
multiple of 3, or 9, and so on. We avoid multiples of 2 because port
numbers are often a multiple of 2 or 4 for bit-encoding reasons.
The previous code tracked only by four-tuple, which meant that two
connections with same address/port combinations in different namespace
would clash and one would get dropped.
Also previously the tuple was duplicated between the map key and
value, so we remove it from the value.
We only add the namespace in the case that the local address is
loopback, which matches how the rest of Scope treats addresses.
It was possible for `t.ebpfTracker` to change underneath this code
while running on a background goroutine, so change it to take
`ebpfTracker` as a parameter.
While we're here, rename the functions to better match what they do.
If we run `getInitialState()` async there is some chance we will see
another ebpf failure and call `useProcfs()` before `getInitialState()`
gets to the last line, whereupon it will crash on nil pointer.
Also it seems pointless to call `performEbpfTrack()` without waiting
for something to feed in, so I suspect this is what the original
author had in mind.
It will slow down this one `Report()` on machines with a lot of
processes or connections, but ebpfTracker restart is supposed to be a
rare event.
The previous code seems to be relying on a 64-bit to 32-bit conversion
working in a certain way; when gopacket was changed to cast the value
explicitly it starts returning immeditely from pcap.
We observe a slow increase in connections reported, and are unable to
find the root cause, so clear down the data every six hours and start
from a clean sheet.
Delay kernel events by up to 0.2ms, to reduce the chance the ebpf
reporter sends them out-of-order, and allow out-of-order events to
happen up to once a minute without giving up on the ebpf reporter.
The call to register this metric was removed in #633 over three years ago.
If it isn't registered then nobody can see the values.
The measurement is duplicated by metrics added in #658.
When the probe first starts we should only be interested in active
connections, and if the loop re-starts it's probably because too many
connections are opening and closing to keep up with, so it's good to
drop any that are already closed then too.
Refactor the code so `handleFlow` is only called on events, and handle
the initial list of connections directly.
- don't need another wrapper round `conntrack.Connections()`
- logPipe() was only for the command-line conntrack
- nobody closes the `event` chan now, so no need to pre-check for quit
1)backend/Dockerfile 2) probe/endpoint/dns_snooper.go
3) client/Dockerfile 4) docker/Dockerfile.cloud-agent
5) probe/process/walker_linux_test.go & 6) tools/lint
1)'backend/Dockerfile' : Conditional added so that the cross-compiling should
be done on amd64. Also removed support for sh-lint for ppc64le for now.
As the version for shfmt mentioned in the dockerfile is not available for
ppc64le and the later version does't work fine with existing application.
2)'probe/endpoint/dns_snooper.go' : Renamed this file so as to reuse for ppc64le
and added a build-constraint. Now this file will be build for amd64 on linux
and ppc64le on linux.
3)'client/Dockerfile' : Modified the version of the base image for node from
8.4.0 to 8.11, as this version supports multiarch.
4)'docker/Dockerfile.cloud-agent' : Modified the version of the base image for
golang from 1.10.2-strech to 1.10.2, which supports multiarch.
5) 'probe/process/walker_linux_test.go' : Test fixed to run for ppc64le,
modified the code to accept RSSBytes based on pageSize value per
architecture, instead of hard-coded values.
6)'tools/lint' : Modified the file to skip the sh-lint implementation for ppc64le.
PR #3231
With c75700fe04 we added code to detect
Ubuntu Xenial kernels with a regression in the eBPF subsystem in order
to gently fallback to procfs scanning on such systems (and not crash the
host system by running eBPF code).
With the latest kernel update for Ubuntu Xenial, the bug was fixed:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1763454
Therefore we can update the added check with an upper limit and make
sure that eBPF connection tracking only is disabled on kernels within
the range having the bug.
xref: https://github.com/weaveworks/scope/issues/3131