260 Commits

Author SHA1 Message Date
Julius Volz
4fa40e22b2 Rework Scope metrics according to Prometheus conventions. (#1615)
* Rework Scope metrics according to Prometheus conventions.

- counters should end with _total
- elaborated and added units to help strings
- recommended for cache hit/miss metrics: track only the total and the
  hits, in separate metrics, since the most common query will be
  "hits / total"
- track all times in seconds (base units), which has become the standard
  recommendation
- other small changes

There could be more changes that would require more thinking (what
dimensions to use, summaries vs. histograms, etc.), but this is probably
enough controversial material already :)

* Use timeRequestStatus() in sqs_control_router.go.
2016-06-30 09:12:25 +01:00
Jonathan Lange
387c543a87 Fix nil pointer error when memcache not enabled 2016-06-24 14:01:46 +01:00
Tom Wilkie
29133e54ca Add backoff to the consul client (#1608)
* Add backoff to the consul client

* Review feedback
2016-06-24 09:04:08 +01:00
Jonathan Lange
47fcb52354 Optional memcached between probes and S3
If given settings for memcached, services will store & fetch reports
from memcache after checking their in-process cache but before fetching
from S3.
2016-06-22 18:40:50 +01:00
Jonathan Lange
9e0b27840b Delete test for unsupported functionality 2016-06-22 11:19:19 +01:00
Jonathan Lange
40cbf119d3 Nice error on unsupported content type 2016-06-22 10:02:18 +01:00
Jonathan Lange
ce5c933d3c Remove unused import 2016-06-21 11:14:14 +01:00
Jonathan Lange
8bd8f883a1 Restore debugging logic 2016-06-21 11:08:55 +01:00
Jonathan Lange
81b05a33ee Make ReadBinary more general and re-use in router 2016-06-20 18:02:23 +01:00
Jonathan Lange
13269e8110 Helper for reading & writing from binary 2016-06-17 15:24:33 +01:00
Tom Wilkie
c80eb42a4f Add filters for pseudo nodes. (#1581)
* Add filters for pseudo nodes.

- Don't filter the internet node as a pseudo node.
- Rename pseudo filter to unmanaged/uncontained.
- Review feedback
- Move the FilterFoo funcs into the tests
- Drop the 'nodes' from filter labels.

* Fix experimental
2016-06-16 20:09:13 +01:00
Tom Wilkie
a7b34f1601 Use NATS for shortcut reports in the service. (#1568)
* Vendor nats-io/nats

* Use NATS for shortcut reports.

* Review feedback.

* Rejig shortcut subscriptions, so they work.

* Review feedback
2016-06-09 12:48:41 +01:00
Tom Wilkie
141ce75902 Log errors in response to http requests. (#1569) 2016-06-09 09:01:50 +01:00
Jonathan Lange
48fc985a3e Get non-cached reports in parallel 2016-06-07 19:23:45 +01:00
Jonathan Lange
0907cdfa0d Fail fast on error fetching non-cached reports 2016-06-07 19:00:14 +01:00
Jonathan Lange
3d12a2a76c Extract function for getting single report 2016-06-07 18:48:01 +01:00
Tom Wilkie
12f281654d Put reports in S3; add in process caching (#1545)
* Add in-process caching to dynamodb collector

* Add metrics for dynamodb consumed capacity and report size

* Log and return errors during report collection

* Increase compression to the max

* Put reports in S3 and just use DynamoDB as an index.

* Review feedback
2016-05-31 15:40:15 +01:00
Tom Wilkie
7377945302 Use smart merger in the dynamodb collector. (#1543) 2016-05-27 08:57:07 +01:00
Tom Wilkie
c8828826ae Allow user to specify table name and queue prefix. (#1538)
* Allow user to specify table name and queue prefix.

* Trim leading slash, catch missed queue prefix

* Comment out publish step until devwww is fixed.
2016-05-25 10:09:32 +01:00
Tom Wilkie
861605a5ee Instrument SQS calls 2016-05-23 16:48:31 +01:00
Tom Wilkie
5a9aebbcb4 lint 2016-05-23 16:19:23 +01:00
Tom Wilkie
f36bb4e2fb Gather dynamodb latency 2016-05-23 14:02:34 +01:00
Tom Wilkie
334701f92e Add a missing return. 2016-05-20 19:17:15 +01:00
Tom Wilkie
24062be6c9 Increase test replicas (#1529)
* Increase number of test VMs to 5 per shard.

* Make pipe router test shorter.
2016-05-19 11:00:51 +01:00
Tom Wilkie
8f772a696d Add flag to disable reporting of processes (and procspied endpoints) 2016-05-17 17:29:09 +01:00
Alfonso Acosta
3cf3c713ae Measure report sizes 2016-05-10 09:45:29 +00:00
Paul Bellamy
2d10a6a9a6 Merge pull request #1447 from weaveworks/1441-cache-size
Remove cache from SmartMerger.
2016-05-09 10:51:23 +01:00
Tom Wilkie
2dae03501e Remove the caching 2016-05-09 10:08:14 +01:00
Paul Bellamy
541699d193 Review Feedback 2016-05-09 09:19:11 +01:00
Paul Bellamy
16a5c738d9 Deployment and ReplicaSet views for k8s 2016-05-09 09:03:57 +01:00
Paul Bellamy
bb284edee8 set 'default' as the default namespace filter instead of 'all' (#1445) 2016-05-06 18:17:32 +01:00
Tom Wilkie
71d3126c82 Limit merge cache to 200 entries and expire entries older than merge window. 2016-05-06 17:54:57 +01:00
Tom Wilkie
54a760a56d Log(n) complexity report merger. 2016-05-04 17:53:09 +01:00
Paul Bellamy
0e70f70ffd Review feedback 2016-05-03 12:49:02 +01:00
Paul Bellamy
02a0e752e3 fix up stats on sub-topologies 2016-05-03 12:47:26 +01:00
Paul Bellamy
fe853e3f0f filter out deleted pods when calculating available namespaces 2016-05-03 12:47:26 +01:00
Paul Bellamy
8758921215 pass nil for Noop a few other places 2016-05-03 12:47:26 +01:00
Paul Bellamy
2af2b1f15a Filter by Kubernetes Namespaces 2016-05-03 12:47:24 +01:00
Tom Wilkie
cf879b268e Aggressively pass nil for the decorator in the rendering pipeline to improve performance. 2016-04-29 11:42:33 +01:00
Paul Bellamy
64450a4830 Merge pull request #1371 from weaveworks/1219-grouped-node-counts-2
Fixing grouped node count for filtered children nodes
2016-04-28 13:30:15 +01:00
Paul Bellamy
3d3aed2bb3 Fixing grouped node count for filtered children nodes
Squash of:

* We have to keep all the container hostnames until the end so we can
  count how many we've filtered

* Adding tests for ContainerHostnameRenderer and PodServiceRenderer with
  filters

* Because we filter on image name we need the image name before
  filtering

* Alternative approach to passing decorators.

* Refactor out some of the decorator capture

* Don't memoise decorated calls to Render

* Fixing filtered counts on containers topology

  Tricky, because we need the filters to be silent sometimes (when they're
  in the middle), but not when they're at the top, so we take the "top"
  filter's stats. However, this means we have to compose all
  user-specified filters into a single Filter layer, so we can get all
  stats.

  There are no more Silent filters, as all filters are silent (unless they
  are at the top).

  Additionally, I clarified some of the filters as their usage/terminology
  was inconsistent and confused. Now Filter(IsFoo, ...) *keeps* only nodes
  where IsFoo is true.
2016-04-28 12:23:43 +01:00
Tom Wilkie
b05ef74552 Report hostname and version in probe struct, and version in host node. 2016-04-26 09:25:15 +01:00
Tom Wilkie
901f46c5fc Report if newer versions are available in /api (#1366)
* Report if newer versions are available in /api

* Render version update hint in UI, next to version

* Fix lint
2016-04-22 10:25:00 +01:00
Tom Wilkie
99204e1ff7 Add k8s pod log control (#1298)
* Remove individually vendored k8s.io/kubernetes/pkg/<foo>

* Vendor the whole of vendor/k8s.io/kubernetes/pkg

* Add k8s pod log control

* Tag pods with host id and include them in the host topology as children.

* adding a basic test for kubernetes.Reporter.GetLogs
2016-04-21 13:48:50 +01:00
Paul Bellamy
1edeb8d190 Removing report.Node.WithID (#1315)
* removing usage of report.Node.WithID

* report.Topology.AddNode can use the node's ID field
2016-04-19 16:48:03 +01:00
Tom Wilkie
0396a79d7f Don't show non-internet pseudo nodes. (#1326) 2016-04-18 14:18:19 +01:00
Tom Wilkie
e2cb836272 Correctly expire the cache in the collector 2016-04-13 11:29:53 +01:00
Paul Bellamy
f211d48cda Merge pull request #1126 from weaveworks/plugins
Plugins
2016-04-12 18:03:26 +01:00
Paul Bellamy
7632e0b3c5 Adding support for plugins, with basic example of iowait, and ebpf
Squash of:
* Include plugins in the report
* show plugin list in the UI
* moving metric and metadata templates into the probe reports
* update js for prime -> priority
* added retry to plugin handshake
* added iowait plugin
* review feedback
* plugin documentation
2016-04-12 17:22:14 +01:00
Tom Wilkie
281ba58845 Add /api/probes endpoint 2016-04-12 17:17:18 +01:00