Commit Graph

240 Commits

Author SHA1 Message Date
Bryan Boreham
23d8a418e1 performance: network namespace ID is a 32-bit quantity
This shrinks some data-structures slightly.

Citation: https://github.com/torvalds/linux/blob/6f0d349d922b/include/linux/ns_common.h#L10
2019-10-04 13:11:30 +00:00
Bryan Boreham
c6ce47f87d diagnostics: make fourTuple.String() human-readable 2019-10-04 13:09:50 +00:00
Bryan Boreham
2941850a75 performance: in connection tracker, hold IP addresses in binary rather than strings
This is more compact, and saves effort converting to and from the string format.
2019-09-25 20:15:05 +00:00
Bryan Boreham
9208e08bf3 Change dns snooper timeout to avoid spinning in pcap
The previous code seems to be relying on a 64-bit to 32-bit conversion
working in a certain way; when gopacket was changed to cast the value
explicitly it starts returning immeditely from pcap.
2019-09-20 14:32:53 +00:00
Bryan Boreham
5cba126c12 Merge pull request #3600 from weaveworks/expose-probe-metrics
Expose probe metrics to Prometheus
2019-08-20 14:35:06 +01:00
Bryan Boreham
eba9f31f3f fix(probe): restart conntrack handler periodically to clear out data
We observe a slow increase in connections reported, and are unable to
find the root cause, so clear down the data every six hours and start
from a clean sheet.
2019-08-13 16:30:56 +00:00
Bryan Boreham
6e715d2697 fix(probe): Loosen ebpf parameters to reduce restarts
Delay kernel events by up to 0.2ms, to reduce the chance the ebpf
reporter sends them out-of-order, and allow out-of-order events to
happen up to once a minute without giving up on the ebpf reporter.
2019-08-13 16:17:23 +00:00
Bryan Boreham
5e57b0dbcf Add metrics for conntrack and ebpf errors 2019-07-09 13:01:47 +00:00
Bryan Boreham
161014857d Merge pull request #3646 from weaveworks/remove-useless-metric
fix: remove unused metric SpyDuration
2019-07-04 17:37:16 +01:00
Bryan Boreham
c1e110ac7b fix: remove unused metric SpyDuration
The call to register this metric was removed in #633 over three years ago.
If it isn't registered then nobody can see the values.
The measurement is duplicated by metrics added in #658.
2019-07-04 13:53:00 +00:00
Bryan Boreham
1e6a6a7bb4 fix: handle errors reported by the conntrack package
In particular, ENOBUFS which indicates data has been dropped.
With this change the collector will restart, thus resynchronising with
the OS.
2019-07-04 13:30:46 +00:00
tiriplicamihai
0b4e26ed77 Fix formatting. 2019-04-30 19:24:23 +03:00
tiriplicamihai
92d3c1d5e9 Fix test data. 2019-04-30 19:09:02 +03:00
tiriplicamihai
1fbe648e82 Add newline. 2019-04-30 18:41:16 +03:00
tiriplicamihai
364a7423a5 Add tests for dns snooper. 2019-04-30 18:38:07 +03:00
tiriplicamihai
a6bc6b0148 Use break instead of continue. 2019-01-28 23:06:18 +02:00
tiriplicamihai
1aadf5c606 Fix dnssnooper probe for multiple CNAMEs. 2019-01-28 13:49:16 +02:00
CarlosEDP
4f8fc5e010 Add ARM64 build 2019-01-02 12:08:49 -02:00
Bryan Boreham
c732fee433 Don't add closed connections to 'activeFlows' 2018-11-14 15:34:58 +00:00
Bryan Boreham
95ce2cb1a8 Add build constraint on Linux-only features
Split Reporter into Linux and non-Linux parts, and stubbed it out for
non-Linux targets.
2018-11-14 15:34:58 +00:00
Bryan Boreham
01ef6a104d Eliminate connectionTrackerConfig struct 2018-11-14 15:34:58 +00:00
Bryan Boreham
e3d42676a3 Add back some parts of the original cli code 2018-11-14 15:34:58 +00:00
Bryan Boreham
71c59e87d1 Update comment 2018-11-14 15:34:58 +00:00
Bryan Boreham
f4dc368955 Don't buffer TIME_WAIT flows on conntrack start-up
When the probe first starts we should only be interested in active
connections, and if the loop re-starts it's probably because too many
connections are opening and closing to keep up with, so it's good to
drop any that are already closed then too.

Refactor the code so `handleFlow` is only called on events, and handle
the initial list of connections directly.
2018-11-14 15:34:58 +00:00
Bryan Boreham
c627802664 Refactor: remove some code that is now unnecessary
- don't need another wrapper round `conntrack.Connections()`
- logPipe() was only for the command-line conntrack
- nobody closes the `event` chan now, so no need to pre-check for quit
2018-11-14 15:34:58 +00:00
Bryan Boreham
a29e9fa27a Update to match upstream conntrack library 2018-11-14 15:34:57 +00:00
Bryan Boreham
b9405bcc4b Remove our own copy of the upstream library 2018-11-14 15:34:57 +00:00
Bryan Boreham
73f35fd6d9 Handle nat status from conntrack via netlink
Replacement for the --any-nat command-line parameter
2018-11-14 15:34:57 +00:00
Bryan Boreham
ed6a010330 Decode conntrack status from netlink 2018-11-14 15:34:57 +00:00
Bryan Boreham
3314e1f0c7 Move constants to headers.go to be more like upstream 2018-11-14 15:34:57 +00:00
Bryan Boreham
7a68b5bdb0 Use Nfgenmsg from unix package instead of declaring locally 2018-11-14 15:34:57 +00:00
Bryan Boreham
8b04ef7359 Move conntrack code out to client.go to match upstream 2018-11-14 15:34:57 +00:00
Joseph Glanville
ac63937df7 Switch to new conntrack library 2018-11-14 15:34:57 +00:00
Joseph Glanville
853196f6d1 Import conntrack library 2018-11-14 15:34:57 +00:00
meghalidhoble
625998b91e Change made to the listed files, to enable weaveworks-scope on Power(ppc64le)
1)backend/Dockerfile 2) probe/endpoint/dns_snooper.go
3) client/Dockerfile 4) docker/Dockerfile.cloud-agent
5) probe/process/walker_linux_test.go & 6) tools/lint

1)'backend/Dockerfile' : Conditional added so that the cross-compiling should
   be done on amd64. Also removed support for sh-lint for ppc64le for now.
   As the version for shfmt mentioned in the dockerfile is not available for
   ppc64le and the later version does't work fine with existing application.
2)'probe/endpoint/dns_snooper.go' : Renamed this file so as to reuse for ppc64le
   and added a build-constraint. Now this file will be build for amd64 on linux
   and ppc64le on linux.
3)'client/Dockerfile' : Modified the version of the base image for node from
   8.4.0 to 8.11, as this version supports multiarch.
4)'docker/Dockerfile.cloud-agent' : Modified the version of the base image for
   golang from 1.10.2-strech to 1.10.2, which supports multiarch.
5) 'probe/process/walker_linux_test.go' : Test fixed to run for ppc64le,
    modified the code to accept RSSBytes based on pageSize value per
    architecture, instead of hard-coded values.
6)'tools/lint' : Modified the file to skip the sh-lint implementation for ppc64le.

PR #3231
2018-08-13 12:45:25 +05:30
Marc Carré
d46c2266ce Change Sirupsen/logrus to sirupsen/logrus
```
$ git grep -l Sirupsen | grep -v vendor | xargs sed -i '' 's:github.com/Sirupsen/logrus:github.com/sirupsen/logrus:g'
$ gofmt -s -w app
$ gofmt -s -w common
$ gofmt -s -w probe
$ gofmt -s -w prog
$ gofmt -s -w tools
```
2018-07-23 20:10:14 +02:00
Michael Schubert
7bb1e38de3 ebpf: update check for known faulty Ubuntu kernels
With c75700fe04 we added code to detect
Ubuntu Xenial kernels with a regression in the eBPF subsystem in order
to gently fallback to procfs scanning on such systems (and not crash the
host system by running eBPF code).

With the latest kernel update for Ubuntu Xenial, the bug was fixed:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1763454

Therefore we can update the added check with an upper limit and make
sure that eBPF connection tracking only is disabled on kernels within
the range having the bug.

xref: https://github.com/weaveworks/scope/issues/3131
2018-05-23 11:38:04 +02:00
Michael Schubert
5d036c5ac4 ebpf: add tests for isKernelSupported() 2018-04-13 17:17:51 +02:00
Michael Schubert
c75700fe04 ebpf: check for known faulty Ubuntu kernel
The Ubuntu Xenial update to kernel 4.4.0-119.143 from 4.4.0-116.140 did
include a regression in the eBPF code. A basic `bpf_map_lookup_elem`
call as found in the tcptracer-bpf library used by Scope leads to a
kernel panic. As a result, Scope / the system crashes during startup
when the tcptracer is initialized. The Scope bug report can be found
here:

https://github.com/weaveworks/scope/issues/3131

To avoid crashes and gently fallback to procfs (as Scope already does
for systems not supporting eBPF), update `isKernelSupported()` and
explicitly check for Ubuntu Kernel versions with the problem.

Once the bug is fixed and an update published, the `abiNumber` check in
`isKernelSupported()` can and should be updated with an upper limit.

The Ubuntu bug report can be found here:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1763454
2018-04-13 17:17:51 +02:00
Bryan Boreham
b5cdcb9a42 Move DNS name mapping from endpoint to report 2018-02-20 16:14:21 +00:00
Bryan Boreham
6674ff61e5 Fix incorrect comment 2018-02-20 16:14:20 +00:00
Matthias Radestock
5b30b668ae refactor: don't return receiver in Topology.AddNode()
This had little use and was obscuring the mutating nature of
AddNode().
2018-02-19 05:10:04 +00:00
Matthias Radestock
e93b69cf10 remove Node.Edges
It is unused and none of the adjacency mapping code in the renderer
takes any notice of it. Removing this shrinks the report size.

Edges were introduced in #838. At the time we had an experimental
packet sniffer under experimental/sniff/sniffer.go. That got removed
in #1646.

We can resurrect this if we ever decide to add meta data to edges.
2017-12-17 13:28:22 +00:00
Matthias Radestock
1f2247a8c4 move node metadata keys into report package
Both the probe and the app (for rendering) need to know about them.
2017-12-11 20:26:08 +00:00
Matthias Radestock
1865c46368 refactor: introduce a constant for "copy_of"
since it's shared between the probe and renderer
2017-12-09 10:45:59 +00:00
Tobias Klauser
89f3ce2e64 Simplify Utsname string conversion
Use Utsname from golang.org/x/sys/unix which contains byte array
instead of int8/uint8 array members. This allows to simplify the string
conversions of these members and the marshal.FromUtsname functions are
no longer needed.
2017-11-02 08:45:54 +01:00
Alban Crequy
9c53653997 EbpfTracker: restart it when it dies
EbpfTracker can die when the tcp events are received out of order. This
can happen with a buggy kernel or apparently in other cases, see:
https://github.com/weaveworks/scope/issues/2650

As a workaround, restart EbpfTracker when an event is received out of
order. This does not seem to happen often, but as a precaution,
EbpfTracker will not restart if the last failure is less than 5 minutes
ago.

This is not easy to test but I added instrumentation to trigger a
restart:

- Start Scope with:
    $ sudo WEAVESCOPE_DOCKER_ARGS="-e SCOPE_DEBUG_BPF=1" ./scope launch

- Request a stop with:
    $ echo stop | sudo tee /proc/$(pidof scope-probe)/root/var/run/scope/debug-bpf
2017-08-17 16:39:27 +02:00
Matthias Radestock
e77d40fc16 refactor: inline connectionTracker.performFlowWalk 2017-07-30 09:23:41 +01:00
Matthias Radestock
b93b19a7c7 refactor: simplify connection polarity reversal 2017-07-30 08:48:13 +01:00
Matthias Radestock
65cebed6c4 get rid of endpoint type indicators
The app stopped paying attention to these some time ago.

Removing them shrinks reports by 3-10%.
2017-07-30 08:38:56 +01:00