
716 Commits

Author SHA1 Message Date
Tobias Klauser
89f3ce2e64 Simplify Utsname string conversion
Use Utsname from golang.org/x/sys/unix which contains byte array
instead of int8/uint8 array members. This allows simplifying the string
conversions of these members and the marshal.FromUtsname functions are
no longer needed.
2017-11-02 08:45:54 +01:00
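A minimal sketch of why byte-array members make the conversion simple, assuming each Utsname member is a NUL-terminated fixed-size `[65]byte` (as on Linux). The `utsField` type and `utsFieldToString` helper are hypothetical and defined locally so the example runs without golang.org/x/sys/unix:

```go
package main

import (
	"bytes"
	"fmt"
)

// utsField is a hypothetical local stand-in for one unix.Utsname member,
// assumed to be a fixed-size, NUL-terminated [65]byte array.
type utsField [65]byte

// utsFieldToString returns the bytes up to the first NUL (or the whole
// array if there is no NUL) as a Go string. With byte members no per-element
// int8/uint8 conversion loop is needed.
func utsFieldToString(f utsField) string {
	b := f[:]
	if i := bytes.IndexByte(b, 0); i >= 0 {
		b = b[:i]
	}
	return string(b)
}

func main() {
	var release utsField
	copy(release[:], "4.13.0")
	fmt.Println(utsFieldToString(release))
}
```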
Damien Lespiau
5990ad4947 docker: Close pipe when the docker API call fails
This wasn't found in the wild but by code inspection. If the docker API
call fails, the pipe is never closed. Close it before returning.
2017-10-16 23:30:46 +01:00
Mike Lang
1c6fbffc69 Fix test broken by #2854 2017-09-19 03:54:13 -07:00
Bruno Galindro da Costa
cd21bafa2e Adds ECS Cluster Region option 2017-09-18 20:14:44 -03:00
Alban Crequy
9c53653997 EbpfTracker: restart it when it dies
EbpfTracker can die when the tcp events are received out of order. This
can happen with a buggy kernel or apparently in other cases, see:
https://github.com/weaveworks/scope/issues/2650

As a workaround, restart EbpfTracker when an event is received out of
order. This does not seem to happen often, but as a precaution,
EbpfTracker will not restart if the last failure was less than 5 minutes
ago.

This is not easy to test, but I added instrumentation to trigger a
restart:

- Start Scope with:
    $ sudo WEAVESCOPE_DOCKER_ARGS="-e SCOPE_DEBUG_BPF=1" ./scope launch

- Request a stop with:
    $ echo stop | sudo tee /proc/$(pidof scope-probe)/root/var/run/scope/debug-bpf
2017-08-17 16:39:27 +02:00
Matthias Radestock
7a23afde2c Merge pull request #2781 from weaveworks/2550-non-login-container-shell
run a normal (rather than login) shell in containers
2017-08-02 08:33:43 +01:00
Mike Lang
c149e5792a k8s probe: Fix a panic (nil pointer deref) when a cronjob has never been scheduled
in which case cj.Status.LastScheduled is nil.
The new behaviour is to omit it from the map (and therefore the display) if it has never been scheduled.
2017-08-01 14:14:44 -07:00
Matthias Radestock
1e38e78518 Merge pull request #2779 from weaveworks/2748-synthesise-service-network
synthesise k8s service network from service IPs

Fixes #2748.
2017-08-01 13:17:27 +01:00
Matthias Radestock
4dae7edc9c synthesise k8s service network from service IPs
This prevents cluttering host.LocalNetworks with lots of /32
addresses. These were unsightly and rather distracting in the UI. They
also bloated the report and slowed down server-side rendering.

Fixes #2748.
2017-08-01 12:17:50 +01:00
Matthias Radestock
8935d434c5 run a normal (rather than login) shell in containers
That way PATH is preserved.

Fixes #2550.
2017-08-01 08:53:56 +01:00
Matthias Radestock
e77d40fc16 refactor: inline connectionTracker.performFlowWalk 2017-07-30 09:23:41 +01:00
Matthias Radestock
b93b19a7c7 refactor: simplify connection polarity reversal 2017-07-30 08:48:13 +01:00
Matthias Radestock
65cebed6c4 get rid of endpoint type indicators
The app stopped paying attention to these some time ago.

Removing them shrinks reports by 3-10%.
2017-07-30 08:38:56 +01:00
Alfonso Acosta
3e4b3cbbf5 Add pod restart count to details pane 2017-07-27 13:15:53 +00:00
Mike Lang
486bdcc796 k8s: Use 'DaemonSet', 'StatefulSet' etc instead of 'Daemon Set', 'Stateful Set'
We can't search for terms with spaces.
2017-07-26 13:49:54 -07:00
Matthias Radestock
3ab48bcdbe Merge pull request #2756 from weaveworks/2464-maximize-timeout
maximize report publishing timeout

Fixes #2464.
2017-07-26 11:41:48 +01:00
Matthias Radestock
d8c747ef20 downgrade "Dropping report" log message to warning
It indicates degraded functionality, not catastrophe.
2017-07-25 21:29:18 +01:00
Matthias Radestock
935f6e6c20 maximize report publishing timeout
If the app really does take a long time to process reports, it is
better not to time out and send it more reports. However, we do want
to send at least one report per app.window, otherwise the scope UI
will go blank.

Fixes #2464 (as much as is practically possible)
2017-07-25 21:20:39 +01:00
Matthias Radestock
a5a9180605 do not back off on timeouts when sending reports
...since doing so unnecessarily throttles report sending, to the point
where the app is receiving reports so infrequently that often it has
no data to show.

The timeout period itself is sufficient to prevent thrashing.

Fixes #2745.
2017-07-25 21:16:35 +01:00
Matthias Radestock
3c6ae972ab new full reports are more important than old and shortcut reports
so when there is backpressure in publishing reports, drop shortcut
reports in preference to full reports, and drop old full reports in
preference to new full reports.

Fixes #2738.
2017-07-24 22:19:27 +01:00
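The drop preference described above can be illustrated with a bounded, oldest-first queue: when over capacity, evict the oldest shortcut report if there is one, otherwise the oldest full report. This slice-based `enqueue` and the `report` type are hypothetical; Scope's actual publishing path may be structured quite differently (for example around channels):

```go
package main

import "fmt"

// report is a hypothetical minimal report: an ID plus whether it is a shortcut report.
type report struct {
	ID       int
	Shortcut bool
}

// enqueue appends r to queue (ordered oldest first) and, while the queue
// exceeds limit, drops the oldest shortcut report if any exists, otherwise
// the oldest full report.
func enqueue(queue []report, r report, limit int) []report {
	if limit < 1 {
		limit = 1
	}
	queue = append(queue, r)
	for len(queue) > limit {
		drop := 0 // default: the oldest report
		for i, q := range queue {
			if q.Shortcut {
				drop = i // prefer dropping the oldest shortcut report
				break
			}
		}
		queue = append(queue[:drop], queue[drop+1:]...)
	}
	return queue
}

func main() {
	var q []report
	for _, r := range []report{{1, false}, {2, true}, {3, false}, {4, false}} {
		q = enqueue(q, r, 2)
	}
	fmt.Println(q)
}
```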
Matthias Radestock
9e6ecee37d optimisation: don't copy report stream unnecessarily
We don't need to copy from the reader when publishing to just one
destination.
2017-07-21 17:56:08 +01:00
Matthias Radestock
d368854b90 defend against nils
Fixes #2508. Hopefully.
2017-07-20 15:59:59 +01:00
Mike Lang
fe3bdbfcdc probe/kubernetes: Speed up lookups of large lists of cronjobs and jobs
Currently joining the two lists is O(mn); by putting one of them into a hashmap first, it becomes O(m+n).
2017-07-19 11:31:43 -07:00
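The hashmap join can be sketched as follows: index the cronjobs by name once, then do a single pass over the jobs. The `cronJob` and `job` types and the `Owner` field (the parent cronjob's name) are hypothetical simplifications of the Kubernetes objects:

```go
package main

import "fmt"

// cronJob and job are hypothetical, simplified stand-ins for the Kubernetes
// objects; Owner is the parent cronjob's name ("" if there is none).
type cronJob struct {
	Name string
	Jobs []string
}

type job struct {
	Name  string
	Owner string
}

// attachJobs associates each job with its parent cronjob. Building the
// name index first makes this O(m+n) rather than scanning all m cronjobs
// for each of the n jobs.
func attachJobs(cronJobs []*cronJob, jobs []job) {
	byName := make(map[string]*cronJob, len(cronJobs))
	for _, cj := range cronJobs {
		byName[cj.Name] = cj
	}
	for _, j := range jobs {
		if cj, ok := byName[j.Owner]; ok {
			cj.Jobs = append(cj.Jobs, j.Name)
		}
	}
}

func main() {
	cjs := []*cronJob{{Name: "backup"}, {Name: "report"}}
	attachJobs(cjs, []job{
		{Name: "backup-1", Owner: "backup"},
		{Name: "report-1", Owner: "report"},
		{Name: "backup-2", Owner: "backup"},
		{Name: "adhoc", Owner: ""},
	})
	for _, cj := range cjs {
		fmt.Println(cj.Name, cj.Jobs)
	}
}
```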
Mike Lang
38814d54a6 deployment: Fix usage of Spec.Replicas, which is a pointer
Spec.Replicas is a *int32, with a value of nil occurring when the user doesn't set it.
In this case k8s defaults to 1, so we mimic this to show the effective value.
2017-07-18 11:35:50 -07:00
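A minimal sketch of reading an optional `*int32` replica count and falling back to the Kubernetes default of 1 when it is unset. The helper name `effectiveReplicas` is hypothetical:

```go
package main

import "fmt"

// effectiveReplicas returns the replica count that is actually in effect:
// a nil Spec.Replicas means the user did not set it, and Kubernetes then
// defaults to 1.
func effectiveReplicas(replicas *int32) int32 {
	if replicas == nil {
		return 1
	}
	return *replicas
}

func main() {
	three := int32(3)
	fmt.Println(effectiveReplicas(nil), effectiveReplicas(&three))
}
```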
Mike Lang
481258d8fc kubernetes probe: Collect info on cronjobs and statefulsets
Most of the time you only care about cronjobs, not the jobs that make them up,
so we only collect full cronjob data. We associate pods of jobs with the parent cronjob.
2017-07-18 11:35:50 -07:00
Alfonso Acosta
b7d292e161 Gather Weave Net plugin and proxy info from report
Instead of using Docker, because after Weave Net 2.0 there are neither proxy nor
plugin containers.

This has the drawback of not detecting the plugin/proxy in systems running
Weave Net < 2.0, but I think we can live with it.
2017-07-17 13:23:37 +00:00
Bryan Boreham
88ca9812b2 Fix up tests for change to NewReporter() 2017-07-13 16:24:17 +00:00
Bryan Boreham
3e9eb83d12 Use Kubernetes node name to filter pods if possible 2017-07-13 16:24:17 +00:00
Matthias Radestock
e603a28ca4 Merge pull request #2704 from weaveworks/2689-2700-ebpf-init
don't miss, or fail to forget, initial connections

Fixes #2689.
Fixes #2700.
2017-07-13 11:39:31 +01:00
Matthias Radestock
b087e95711 bump tcptracer-bpf version 2017-07-12 07:27:35 +01:00
Matthias Radestock
ebc3cddf01 don't miss, or fail to forget, initial connections
...when initialising eBPF-based connection tracking.

Previously we were ignoring all eBPF events until we had gathered the
existing connections. That means we could a) miss connections created
during the gathering, and b) fail to forget connections that got
closed during the gathering.

The fix comprises the following changes:

1. pay attention to eBPF events immediately. That way we do not
miss anything.

2. remember connections for which we received a Close event during the
initialisation phase, and subsequently drop gathered existing
connections that match these. That way we do not erroneously consider
a gathered connection as open when it got closed since the gathering.

3. drop gathered existing connections which match connections detected
through eBPF events. The latter typically have more / current
metadata. In particular, PIDs can be missing from the former.

Fixes #2689.
Fixes #2700.
2017-07-11 22:50:47 +01:00
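Steps 2 and 3 above amount to a filter over the gathered connections. A sketch, assuming connections are identified by a comparable key; `conn` and `mergeInitial` are hypothetical names, and step 1 (processing eBPF events immediately) is what populates the two sets:

```go
package main

import "fmt"

// conn is a hypothetical connection key; any comparable type would do.
type conn struct {
	src, dst string // "ip:port"
}

// mergeInitial keeps only the gathered connections that were neither closed
// during initialisation (step 2) nor already detected through eBPF events
// (step 3), preserving the gathered order.
func mergeInitial(gathered []conn, closedDuringInit, seenViaEvents map[conn]bool) []conn {
	var kept []conn
	for _, c := range gathered {
		if closedDuringInit[c] || seenViaEvents[c] {
			continue
		}
		kept = append(kept, c)
	}
	return kept
}

func main() {
	a := conn{"10.0.0.1:40000", "10.0.0.2:80"}
	b := conn{"10.0.0.1:40001", "10.0.0.2:80"}
	c := conn{"10.0.0.1:40002", "10.0.0.2:443"}
	fmt.Println(mergeInitial([]conn{a, b, c}, map[conn]bool{a: true}, map[conn]bool{b: true}))
}
```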
Matthias Radestock
d568c50ec4 make EbpfTracker.dead go-routine-safe and .stop() idempotent
Without synchronisation, the isDead() call might return a stale value,
delaying deadness detection potentially indefinitely.

Without the guards / idempotence in .stop(), invoking stop() more than
once could cause a panic, since tracer.Stop() closes a channel, and
closing an already-closed channel panics. Multiple stop() invocations
are rare, but not impossible.
2017-07-11 19:38:07 +01:00
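One way to get both properties is to guard the `dead` flag with a mutex and have `stop()` return early once it is set, so the channel is closed at most once. This `tracker` type is a hypothetical, stripped-down stand-in for EbpfTracker, not its actual definition:

```go
package main

import (
	"fmt"
	"sync"
)

// tracker is a hypothetical, stripped-down stand-in for EbpfTracker.
type tracker struct {
	mu   sync.Mutex
	dead bool
	done chan struct{} // closed on stop, like the channel closed by tracer.Stop()
}

func newTracker() *tracker {
	return &tracker{done: make(chan struct{})}
}

// isDead reads the flag under the mutex, so it observes the latest value
// written by stop() from any goroutine.
func (t *tracker) isDead() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.dead
}

// stop is idempotent: only the first call closes the channel, because
// closing an already-closed channel panics.
func (t *tracker) stop() {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.dead {
		return
	}
	t.dead = true
	close(t.done)
}

func main() {
	t := newTracker()
	t.stop()
	t.stop() // second call is a no-op instead of a panic
	fmt.Println(t.isDead())
}
```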
Matthias Radestock
cf6353327a eliminate race in ebpf initialization
We were enabling event processing before feeding in the initial
connections, which results in a non-deterministic outcome.
2017-07-11 19:38:07 +01:00
Matthias Radestock
15215d0c2c prevent concurrent map access in ebpf fd install event handler
which presumably could cause havoc
2017-07-11 19:38:07 +01:00
Matthias Radestock
3883d8f1af fix a minor leak in ebpf fdinstall_pids table
when we got an fd install event but the pid was dead by the time we
processed it, we would fail to remove the watcher for that pid from
the fdinstall_pids table.

This is a minor, and bounded, leak, since the table only contains pids
that were alive when we initialized ebpf. And this change only plugs
that leak very partially, since we will never remove pids that die
while sitting in accept().
2017-07-11 19:38:07 +01:00
Matthias Radestock
e2cbe7ac26 refactor: a bit of inlining 2017-07-11 19:38:06 +01:00
Matthias Radestock
3baeb3d238 refactor: use fourTuple as map key instead of string 2017-07-11 19:38:06 +01:00
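In Go a struct whose fields are all comparable can be used directly as a map key, so no string key has to be built for each lookup. A sketch; the field names of this `fourTuple` are an assumption, not necessarily Scope's definition:

```go
package main

import "fmt"

// fourTuple identifies a TCP connection (field names are an assumption).
// All fields are comparable, so the struct itself can serve as a map key.
type fourTuple struct {
	fromAddr, toAddr string
	fromPort, toPort uint16
}

func main() {
	seen := map[fourTuple]int{}
	k := fourTuple{"10.0.0.1", "10.0.0.2", 40000, 80}
	seen[k]++
	seen[fourTuple{"10.0.0.1", "10.0.0.2", 40000, 80}]++ // equal fields, same key
	fmt.Println(seen[k])
}
```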
Matthias Radestock
ad7b5cdc19 refactor: remove pointless interface
premature abstraction
2017-07-11 19:38:06 +01:00
Matthias Radestock
8a56540648 refactor: eliminate global var 2017-07-11 19:38:06 +01:00
Matthias Radestock
8bd0188537 respect UseConntrack setting in ebpf initialisation 2017-07-11 19:37:11 +01:00
Matthias Radestock
7ea0800f8b refactor: extract helper to get initial flows 2017-07-10 07:34:20 +01:00
Matthias Radestock
07e7adbd63 refactor: make performFlowWalk data flow more obvious 2017-07-10 07:22:12 +01:00
Matthias Radestock
19e45ec248 refactor: eliminate global var 2017-07-07 10:18:43 +01:00
Matthias Radestock
8cf79b2e4a bump tcptracer-bpf version and use it to fix race
We defer starting the ebpf tracer until we've set the global var which
is referenced by the callback functions. Previously the var could be
unset when the callbacks are invoked, resulting in a segfault.

Fixes #2687.
2017-07-07 06:56:28 +01:00
Matthias Radestock
f0ae2bd98c refactor: use inline StringSet constructor 2017-07-04 06:29:19 +01:00
Alfonso Acosta
6c03540b1f Merge pull request #2659 from weaveworks/use-new-k8s-go-client
Use new k8s go client
2017-07-03 23:23:41 +02:00
Alfonso Acosta
84afe9fe70 Fix typo 2017-07-03 20:20:28 +00:00
Alfonso Acosta
34bfc22b4f Fix tests 2017-07-03 20:20:28 +00:00
Alfonso Acosta
7d59936d8c HostNetwork is now inlined in the pod spec 2017-07-03 20:20:28 +00:00
Alfonso Acosta
8bbbf25809 Migrate probe to new kubernetes go-client
This mainly involved importing new libraries and using the new Clientset.

Changes worth mentioning:

* The new kubernetes library doesn't provide StoreToLister wrappers, so now I am doing the casting directly.
* Deleting the pods and getting their logs is done in a cleaner way (using the
  Clientset instead of the lower-level RESTclient).
2017-07-03 20:20:27 +00:00