556 Commits

Author SHA1 Message Date
Tom Wilkie
32edfc9112 Don't reencode reports in the collector (#1819)
* Don't reencode reports in the collector

* Review feedback

* Fix comment

* Update alpine URLs so it will build

* Fix tests
2016-08-22 17:37:41 +01:00
Matthias Radestock
6ba812e82f restore compatibility of old probes with new m/t app
This got broken in #1682
2016-08-16 23:53:07 +01:00
Paul Bellamy
6e2fe78cab Merge pull request #1682 from kinvolk/krnowak/plugin-controls
RFC: forwarding control requests to plugins
2016-08-16 13:44:42 +01:00
Matthias Radestock
e6a474ead7 fix report store query chaining
We fell victim to variable shadowing here. Each store would be fed the
original list of report keys, instead of only the ones that weren't
found in the previous store. So if a single report was missing from the
in-process cache, we would then fetch all reports from memcache. And if
that in turn was missing a single report we would fetch all reports from
S3.

We chain report stores for a reason - to reduce latency and, in case of
the in-process cache, eliminate decoding costs. So this bug has a huge
impact on query service performance.

To make matters worse, we return *all* the reports - now possibly in
triplicate. Fortunately, the SmartMerger filters these out, so at least
we were not incurring extra merge costs.
2016-08-16 00:55:39 +01:00
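The shadowing problem described in e6a474ead7 can be sketched roughly as follows. This is not the project's actual code: the `reportStore` interface, `mapStore`, and `fetchChained` are hypothetical names chosen for illustration. The sketch only shows how assigning to the outer `missing` variable (rather than redeclaring it with `:=` inside the loop) lets each store see just the keys the previous stores could not find.

```go
package main

import "fmt"

// reportStore is a hypothetical interface: a store returns the reports it
// holds plus the keys it could not find.
type reportStore interface {
	FetchReports(keys []string) (found map[string]string, missing []string)
}

// mapStore is a toy in-memory store that counts how many keys it was asked for.
type mapStore struct {
	data      map[string]string
	requested int
}

func (s *mapStore) FetchReports(keys []string) (map[string]string, []string) {
	s.requested += len(keys)
	found := map[string]string{}
	var missing []string
	for _, k := range keys {
		if v, ok := s.data[k]; ok {
			found[k] = v
		} else {
			missing = append(missing, k)
		}
	}
	return found, missing
}

// fetchChained queries each store in order, passing on only the keys that
// the previous stores missed.
func fetchChained(stores []reportStore, keys []string) (map[string]string, []string) {
	result := map[string]string{}
	missing := keys
	for _, store := range stores {
		if len(missing) == 0 {
			break
		}
		var found map[string]string
		// Plain assignment on purpose: "found, missing := ..." would declare
		// a new loop-scoped "missing", leave the outer one untouched, and
		// feed every store the original key list.
		found, missing = store.FetchReports(missing)
		for k, v := range found {
			result[k] = v
		}
	}
	return result, missing
}

func main() {
	cache := &mapStore{data: map[string]string{"a": "A", "b": "B"}}
	memcache := &mapStore{data: map[string]string{"c": "C"}}
	s3 := &mapStore{data: map[string]string{"d": "D"}}
	reports, missing := fetchChained(
		[]reportStore{cache, memcache, s3},
		[]string{"a", "b", "c", "d"},
	)
	fmt.Println(len(reports), len(missing), cache.requested, memcache.requested, s3.requested)
}
```

In this toy setup the slower stores are asked for progressively fewer keys, which is the point of chaining them.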
Matthias Radestock
d6e5f0a154 allow more accurate reporting of memcache hit ratio
A lot of time could pass between recording the request count and hit
count pertaining to a particular report fetching batch, which skewed
cache hit ratio calculations.

Fix that by deferring the request count recording to the end, which is
when we record the hit count.
2016-08-15 16:25:12 +01:00
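A minimal sketch of the idea in d6e5f0a154, under the assumption that the request and hit counts are simple counters: both are recorded together once the batch has finished, so a ratio computed from them never sees requests whose hits have not been counted yet. The `cacheStats` type and `fetchBatch` function are hypothetical and are not taken from the repository.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheStats is a hypothetical pair of counters guarded by one mutex.
type cacheStats struct {
	mu       sync.Mutex
	requests int
	hits     int
}

// record adds both counts under a single lock, so readers never observe
// the requests of a batch without its hits.
func (s *cacheStats) record(requests, hits int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.requests += requests
	s.hits += hits
}

// hitRatio returns hits/requests, or 0 when nothing has been recorded.
func (s *cacheStats) hitRatio() float64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.requests == 0 {
		return 0
	}
	return float64(s.hits) / float64(s.requests)
}

// fetchBatch looks up every key and records the counts only at the end.
func fetchBatch(stats *cacheStats, cache map[string]string, keys []string) []string {
	var found []string
	hits := 0
	for _, k := range keys {
		if v, ok := cache[k]; ok {
			found = append(found, v)
			hits++
		}
	}
	// Request count and hit count are recorded at the same moment.
	stats.record(len(keys), hits)
	return found
}

func main() {
	stats := &cacheStats{}
	cache := map[string]string{"a": "A", "b": "B", "c": "C"}
	fetchBatch(stats, cache, []string{"a", "b", "c", "d"})
	fmt.Println(stats.hitRatio())
}
```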
Matthias Radestock
9cf178f130 fix MemcacheClient.FetchReports miss & leak on corrupt report
Problem: Decoding a corrupt report grows the 'missing' list. Since we
are waiting for 'len(keys)-len(missing)' decoder go-routines, this
results in waiting for fewer go-routines than we should. The surplus
go-routines leak and we ignore their reports. And since the keys of the
ignored reports are not included in 'missing', we won't attempt to fetch
them from S3 either. Oops.

Fix: calculate the number of go-routines once, at the beginning.
2016-08-15 10:44:29 +01:00
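The fix in 9cf178f130 can be illustrated with a small sketch. It assumes a cache that is a plain map and a hypothetical `decode` function that fails on corrupt payloads; none of these names come from the repository. The number of decoder goroutines is counted once, before any results are received, so appending a corrupt report's key to `missing` cannot change how many results are awaited.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// decode is a hypothetical decoder that fails on corrupt payloads.
func decode(raw string) (string, error) {
	if raw == "corrupt" {
		return "", errors.New("corrupt report")
	}
	return "decoded:" + raw, nil
}

type result struct {
	key    string
	report string
	err    error
}

// fetchReports decodes every cached key concurrently and returns the decoded
// reports plus the keys that must be fetched from the next store.
func fetchReports(cache map[string]string, keys []string) (reports, missing []string) {
	ch := make(chan result)
	started := 0
	for _, key := range keys {
		raw, ok := cache[key]
		if !ok {
			missing = append(missing, key)
			continue
		}
		started++
		go func(key, raw string) {
			report, err := decode(raw)
			ch <- result{key: key, report: report, err: err}
		}(key, raw)
	}
	// "started" is fixed before receiving begins. Deriving the count from
	// len(keys)-len(missing) inside this loop would shrink it every time a
	// corrupt report is added to "missing".
	for i := 0; i < started; i++ {
		r := <-ch
		if r.err != nil {
			missing = append(missing, r.key)
			continue
		}
		reports = append(reports, r.report)
	}
	sort.Strings(reports)
	sort.Strings(missing)
	return reports, missing
}

func main() {
	cache := map[string]string{"a": "1", "b": "corrupt", "c": "3"}
	reports, missing := fetchReports(cache, []string{"a", "b", "c", "d"})
	fmt.Println(reports, missing)
}
```

Every started goroutine is received from exactly once, so none are left blocked on the channel, and the corrupt key ends up in `missing` for the next store to retry.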
Krzesimir Nowak
0ecb908c22 Ensure backward compatibility in report's node controls
The new probe will convert each node's LatestControls to Controls, so
the old app can consume them. Also, the new app will convert each
node's Controls to LatestControls, so it can consume the reports from
old probes.
2016-08-12 17:15:43 +02:00
Matthias Radestock
6334836f69 Merge pull request #1768 from weaveworks/1202-silence-abnormal-close
silence abnormal websocket close

Fixes #1202.
2016-08-12 13:53:51 +01:00
Matthias Radestock
c3315f9c99 reduce log level for absent pipe
since we can get this when a pipe has been closed normally
2016-08-05 23:47:35 +01:00
Matthias Radestock
df467d80de log error as error 2016-08-05 23:37:33 +01:00
Matthias Radestock
6d9194cfaf treat EOF and ErrClosedPipe in websocket connections as uninteresting
both occur in various states of disconnectedness
2016-08-05 23:32:34 +01:00
Matthias Radestock
190e840484 reduce some pipe log noise
NB: the m/t version remains unchanged since it is generally a lot
noisier
2016-08-05 19:16:15 +01:00
Krzesimir Nowak
dcaa7503b8 Fix a typo
The typo has been here since March 2016. It is strange that it wasn't
detected earlier.
2016-08-04 11:36:04 +02:00
Alfonso Acosta
b8bf60c6f1 Use slices instead of linked lists for Metric
Also:

* Remove Gob encoder/decoder
* Stop using custom encoders/decoders for Timestamps (both ugorji and the Golang JSON codecs use nanosecond precision).
* Use idiomatic way to check for existence in metric.LastSample()
2016-08-01 10:21:57 +00:00
Paul Bellamy
274158493d Name our routes, so /metrics gives more sensible aggregations 2016-07-26 12:49:04 +01:00
Matthias Radestock
3202cc7e58 hide uncontained/unmanaged by default
They are of no interest to most users and affect the initial user
experience.

Fixes #1689.
2016-07-17 19:00:18 +01:00
Jonathan Lange
a3648f0c89 Inline StoreBytes 2016-07-15 12:58:27 +01:00
Jonathan Lange
bbd75ddd24 Use memcache compression level from config 2016-07-15 11:24:37 +01:00
Jonathan Lange
1fd8a5fb88 Use StoreReport in main AWS routine 2016-07-15 11:24:37 +01:00
Jonathan Lange
46dfeb627d Call it reportKey 2016-07-15 11:24:37 +01:00
Jonathan Lange
0058229687 Extract functions for calculating keys
Not so much for re-use as to help jml understand what's going on
2016-07-15 11:24:36 +01:00
Jonathan Lange
270a55060f Add StoreReport methods to stores
Not sure if we'll use them.
2016-07-15 11:24:36 +01:00
Jonathan Lange
60e14c1dc2 Plumb through an option for compression 2016-07-15 11:24:36 +01:00
Jonathan Lange
2bfd6d7eb7 Parametrize compression level 2016-07-15 11:24:36 +01:00
Jonathan Lange
d83d7318d0 Remove the old metric 2016-07-12 18:16:35 +01:00
Jonathan Lange
d2298aa8f3 Store a histogram of report sizes 2016-07-12 16:37:29 +01:00
Tom Wilkie
3173f6ad75 Use histograms over summaries 2016-07-12 11:15:57 +01:00
Paul Bellamy
ce2fd1e477 Merge pull request #1659 from weaveworks/demo
Adding a static report file mode.
2016-07-11 14:43:04 +01:00
Paul Bellamy
7a37577f71 Review Feedback 2016-07-11 13:36:22 +01:00
Jonathan Lange
49f2e4e40c Count memcache requests even if they time out 2016-07-11 13:01:02 +01:00
Paul Bellamy
bcddfd82c3 Added file collector, to serve a static report from file 2016-07-11 11:50:27 +01:00
Paul Bellamy
8cb1ecdf2c Merge pull request #1642 from weaveworks/refactoring-timing
refactor some timing helpers into a common lib
2016-07-05 13:01:28 +01:00
Paul Bellamy
7736564337 refactor some timing helpers into a common lib 2016-07-05 12:29:00 +01:00
Jonathan Lange
31c88fd62b Instrumentation that we might like to keep 2016-07-04 16:03:50 +01:00
Jonathan Lange
7dd2c6371e Parametrize window rather than assuming default 2016-07-04 13:50:54 +01:00
Jonathan Lange
c1dab17fb3 Make expiration a Duration 2016-07-04 13:30:23 +01:00
Jonathan Lange
f7bdedc149 Config struct for memcache client 2016-07-04 13:25:45 +01:00
Jonathan Lange
96520d7a46 Fixes to memcache support (#1628)
* Fix errors discovered in dev

* Log an error rather than aborting when memcache doesn't resolve
* Initialize map correctly

* Review tweaks
2016-07-04 11:00:11 +01:00
Jonathan Lange
9e0f0c51b9 Configuration type for AWS collector 2016-06-30 17:01:58 +01:00
Jonathan Lange
baacaa8cc5 Rename dynamoCollector to awsCollector 2016-06-30 16:44:43 +01:00
Jonathan Lange
6520f8f5f3 Pass in memcache client 2016-06-30 09:59:55 +01:00
Jonathan Lange
abec257c59 Just pass in the s3 client 2016-06-30 09:57:49 +01:00
Jonathan Lange
d984605de1 Write back to the in-process cache 2016-06-30 09:57:49 +01:00
Jonathan Lange
5ec422c7a3 Fetch all reports at once
Rather than have getReports be responsible for determining keys, instead
call getReportKeys directly and then pass keys to getReports
2016-06-30 09:57:49 +01:00
Jonathan Lange
87da22767e Move s3 logic to separate file 2016-06-30 09:57:49 +01:00
Jonathan Lange
e2bda8f670 Move last memcache bits out of dynamo_collector 2016-06-30 09:55:03 +01:00
Julius Volz
4fa40e22b2 Rework Scope metrics according to Prometheus conventions. (#1615)
* Rework Scope metrics according to Prometheus conventions.

- counters should end with _total
- elaborated and added units to help strings
- recommended for cache hit/miss metrics: track only the total and the
  hits, in separate metrics, since the most common query will be
  "hits / total"
- track all times in seconds (base units), which has become the standard
  recommendation
- other small changes

There could be more changes that would require more thinking (what
dimensions to use, summaries vs. histograms, etc.), but this is probably
enough controversial material already :)

* Use timeRequestStatus() in sqs_control_router.go.
2016-06-30 09:12:25 +01:00
Jonathan Lange
387c543a87 Fix nil pointer error when memcache not enabled 2016-06-24 14:01:46 +01:00
Tom Wilkie
29133e54ca Add backoff to the consul client (#1608)
* Add backoff to the consul client

* Review feedback
2016-06-24 09:04:08 +01:00
Jonathan Lange
47fcb52354 Optional memcached between probes and S3
If given settings for memcached, services will store & fetch reports
from memcache after checking their in-process cache but before fetching
from S3.
2016-06-22 18:40:50 +01:00