Commit Graph

273 Commits

Author SHA1 Message Date
Jonathan Lange
31c88fd62b Instrumentation that we might like to keep 2016-07-04 16:03:50 +01:00
Jonathan Lange
7dd2c6371e Parametrize window rather than assuming default 2016-07-04 13:50:54 +01:00
Jonathan Lange
c1dab17fb3 Make expiration a Duration 2016-07-04 13:30:23 +01:00
Jonathan Lange
f7bdedc149 Config struct for memcache client 2016-07-04 13:25:45 +01:00
Jonathan Lange
96520d7a46 Fixes to memcache support (#1628)
* Fix errors discovered in dev

* Log an error rather than aborting when memcache doesn't resolve
* Initialize map correctly

* Review tweaks
2016-07-04 11:00:11 +01:00
Jonathan Lange
9e0f0c51b9 Configuration type for AWS collector 2016-06-30 17:01:58 +01:00
Jonathan Lange
baacaa8cc5 Rename dynamoCollector to awsCollector 2016-06-30 16:44:43 +01:00
Jonathan Lange
6520f8f5f3 Pass in memcache client 2016-06-30 09:59:55 +01:00
Jonathan Lange
abec257c59 Just pass in the s3 client 2016-06-30 09:57:49 +01:00
Jonathan Lange
d984605de1 Write back to the in-process cache 2016-06-30 09:57:49 +01:00
Jonathan Lange
5ec422c7a3 Fetch all reports at once
Rather than have getReports be responsible for determining keys, instead
call getReportKeys directly and then pass keys to getReports
2016-06-30 09:57:49 +01:00
Jonathan Lange
87da22767e Move s3 logic to separate file 2016-06-30 09:57:49 +01:00
Jonathan Lange
e2bda8f670 Move last memcache bits out of dynamo_collector 2016-06-30 09:55:03 +01:00
Julius Volz
4fa40e22b2 Rework Scope metrics according to Prometheus conventions. (#1615)
* Rework Scope metrics according to Prometheus conventions.

- counters should end with _total
- elaborated and added units to help strings
- recommended for cache hit/miss metrics: track only the total and the
  hits, in separate metrics, since the most common query will be
  "hits / total"
- track all times in seconds (base units), which has become the standard
  recommendation
- other small changes

There could be more changes that would require more thinking (what
dimensions to use, summaries vs. histograms, etc.), but this is probably
enough controversial material already :)

* Use timeRequestStatus() in sqs_control_router.go.
2016-06-30 09:12:25 +01:00
Jonathan Lange
387c543a87 Fix nil pointer error when memcache not enabled 2016-06-24 14:01:46 +01:00
Tom Wilkie
29133e54ca Add backoff to the consul client (#1608)
* Add backoff to the consul client

* Review feedback
2016-06-24 09:04:08 +01:00
Jonathan Lange
47fcb52354 Optional memcached between probes and S3
If given settings for memcached, services will store & fetch reports
from memcache after checking their in-process cache but before fetching
from S3.
2016-06-22 18:40:50 +01:00
Jonathan Lange
9e0b27840b Delete test for unsupported functionality 2016-06-22 11:19:19 +01:00
Jonathan Lange
40cbf119d3 Nice error on unsupported content type 2016-06-22 10:02:18 +01:00
Jonathan Lange
ce5c933d3c Remove unused import 2016-06-21 11:14:14 +01:00
Jonathan Lange
8bd8f883a1 Restore debugging logic 2016-06-21 11:08:55 +01:00
Jonathan Lange
81b05a33ee Make ReadBinary more general and re-use in router 2016-06-20 18:02:23 +01:00
Jonathan Lange
13269e8110 Helper for reading & writing from binary 2016-06-17 15:24:33 +01:00
Tom Wilkie
c80eb42a4f Add filters for pseudo nodes. (#1581)
* Add filters for pseudo nodes.

- Don't filter the internet node as a pseudo node.
- Rename pseudo filter to unmanaged/uncontained.
- Review feedback
- Move the FilterFoo funcs into the tests
- Drop the 'nodes' from filter labels.

* Fix experimental
2016-06-16 20:09:13 +01:00
Tom Wilkie
a7b34f1601 Use NATS for shortcut reports in the service. (#1568)
* Vendor nats-io/nats

* Use NATS for shortcut reports.

* Review feedback.

* Rejig shortcut subscriptions, so they work.

* Review feedback
2016-06-09 12:48:41 +01:00
Tom Wilkie
141ce75902 Log errors in response to http requests. (#1569) 2016-06-09 09:01:50 +01:00
Jonathan Lange
48fc985a3e Get non-cached reports in parallel 2016-06-07 19:23:45 +01:00
Jonathan Lange
0907cdfa0d Fail fast on error fetching non-cached reports 2016-06-07 19:00:14 +01:00
Jonathan Lange
3d12a2a76c Extract function for getting single report 2016-06-07 18:48:01 +01:00
Tom Wilkie
12f281654d Put reports in S3; add in process caching (#1545)
* Add in-process caching to dynamodb collector

* Add metrics for dynamodb consumed capacity and report size

* Log and return errors during report collection

* Increase compression to the max

* Put reports in S3 and just use DynamoDB as an index.

* Review feedback
2016-05-31 15:40:15 +01:00
Tom Wilkie
7377945302 Use smart merger in the dynamodb collector. (#1543) 2016-05-27 08:57:07 +01:00
Tom Wilkie
c8828826ae Allow user to specify table name and queue prefix. (#1538)
* Allow user to specify table name and queue prefix.

* Trim leading slash, catch missed queue prefix

* Comment out publish step until devwww is fixed.
2016-05-25 10:09:32 +01:00
Tom Wilkie
861605a5ee Instrument SQS calls 2016-05-23 16:48:31 +01:00
Tom Wilkie
5a9aebbcb4 lint 2016-05-23 16:19:23 +01:00
Tom Wilkie
f36bb4e2fb Gather dynamodb latency 2016-05-23 14:02:34 +01:00
Tom Wilkie
334701f92e Add a missing return. 2016-05-20 19:17:15 +01:00
Tom Wilkie
24062be6c9 Increase test replicas (#1529)
* Increase number of test VMs to 5 per shard.

* Make pipe router test shorter.
2016-05-19 11:00:51 +01:00
Tom Wilkie
8f772a696d Add flag to disable reporting of processes (and procspied endpoints) 2016-05-17 17:29:09 +01:00
Alfonso Acosta
3cf3c713ae Measure report sizes 2016-05-10 09:45:29 +00:00
Paul Bellamy
2d10a6a9a6 Merge pull request #1447 from weaveworks/1441-cache-size
Remove cache from SmartMerger.
2016-05-09 10:51:23 +01:00
Tom Wilkie
2dae03501e Remove the caching 2016-05-09 10:08:14 +01:00
Paul Bellamy
541699d193 Review Feedback 2016-05-09 09:19:11 +01:00
Paul Bellamy
16a5c738d9 Deployment and ReplicaSet views for k8s 2016-05-09 09:03:57 +01:00
Paul Bellamy
bb284edee8 set 'default' as the default namespace filter instead of 'all' (#1445) 2016-05-06 18:17:32 +01:00
Tom Wilkie
71d3126c82 Limit merge cache to 200 entries and expire entries older than merge window. 2016-05-06 17:54:57 +01:00
Tom Wilkie
54a760a56d Log(n) complexity report merger. 2016-05-04 17:53:09 +01:00
Paul Bellamy
0e70f70ffd Review feedback 2016-05-03 12:49:02 +01:00
Paul Bellamy
02a0e752e3 fix up stats on sub-topologies 2016-05-03 12:47:26 +01:00
Paul Bellamy
fe853e3f0f filter out deleted pods when calculating available namespaces 2016-05-03 12:47:26 +01:00
Paul Bellamy
8758921215 pass nil for Noop a few other places 2016-05-03 12:47:26 +01:00