Matthias Radestock
d6e5f0a154
allow more accurate reporting of memcache hit ratio
...
A lot of time could pass between recording the request count and hit
count pertaining to a particular report fetching batch, which skewed
calculations of cache hit ratios.
Fix that by deferring the request count recording to the end, which is
when we record the hit count.
2016-08-15 16:25:12 +01:00
Matthias Radestock
9cf178f130
fix MemcacheClient.FetchReports miss & leak on corrupt report
...
Problem: Decoding a corrupt report grows the 'missing' list. Since we
are waiting for 'len(keys)-len(missing)' decoder go-routines, this
results in waiting for fewer go-routines than we should. The surplus
go-routines leak and we ignore their reports. And since the keys of the
ignored reports are not included in 'missing', we won't attempt to fetch
them from S3 either. Oops.
Fix: calculate the number of go-routines once, at the beginning.
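
A minimal sketch of the fix, not the actual MemcacheClient.FetchReports: the number of decoder go-routines is computed once, before any decode failure can grow 'missing'. The functions decode and fetchReports, the string payloads, and the map-backed cache are all hypothetical stand-ins.

```go
package main

import "errors"

// decode is a hypothetical decoder that fails on a corrupt payload.
func decode(payload string) (string, error) {
	if payload == "corrupt" {
		return "", errors.New("corrupt report")
	}
	return "report:" + payload, nil
}

// fetchReports returns the decoded reports and the keys that still
// need to be fetched from elsewhere (for example S3).
func fetchReports(keys []string, cache map[string]string) (reports, missing []string) {
	type result struct {
		key    string
		report string
		err    error
	}
	ch := make(chan result)
	for _, key := range keys {
		payload, ok := cache[key]
		if !ok {
			missing = append(missing, key)
			continue
		}
		go func(key, payload string) {
			report, err := decode(payload)
			ch <- result{key: key, report: report, err: err}
		}(key, payload)
	}
	// Fix the go-routine count now; 'missing' may still grow below.
	pending := len(keys) - len(missing)
	for i := 0; i < pending; i++ {
		res := <-ch
		if res.err != nil {
			missing = append(missing, res.key)
			continue
		}
		reports = append(reports, res.report)
	}
	return reports, missing
}
```

Had the loop bound been re-evaluated as 'len(keys)-len(missing)' on every iteration, each corrupt report would have shortened the wait by one and left a go-routine blocked on the channel.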
2016-08-15 10:44:29 +01:00
Matthias Radestock
6334836f69
Merge pull request #1768 from weaveworks/1202-silence-abnormal-close
...
silence abnormal websocket close
Fixes #1202.
2016-08-12 13:53:51 +01:00
Matthias Radestock
c3315f9c99
reduce log level for absent pipe
...
since we can get this when a pipe has been closed normally
2016-08-05 23:47:35 +01:00
Matthias Radestock
df467d80de
log error as error
2016-08-05 23:37:33 +01:00
Matthias Radestock
6d9194cfaf
treat EOF and ErrClosedPipe in websocket connections as uninteresting
...
both occur in various states of disconnectedness
2016-08-05 23:32:34 +01:00
Matthias Radestock
190e840484
reduce some pipe log noise
...
NB: the m/t version remains unchanged since it is generally a lot
noisier
2016-08-05 19:16:15 +01:00
Krzesimir Nowak
dcaa7503b8
Fix a typo
...
The typo has been here since March 2016. It is strange that it wasn't
detected earlier.
2016-08-04 11:36:04 +02:00
Alfonso Acosta
b8bf60c6f1
Use slices instead of linked lists for Metric
...
Also:
* Remove Gob encoder/decoder
* Stop using custom encoders/decoders for Timestamps (both ugorji and the Golang JSON codecs use nanosecond precision).
* Use idiomatic way to check for existence in metric.LastSample()
2016-08-01 10:21:57 +00:00
Paul Bellamy
274158493d
Name our routes, so /metrics gives more sensible aggregations
2016-07-26 12:49:04 +01:00
Matthias Radestock
3202cc7e58
hide uncontained/unmanaged by default
...
They are of no interest to most users and affect the initial user
experience.
Fixes #1689.
2016-07-17 19:00:18 +01:00
Jonathan Lange
a3648f0c89
Inline StoreBytes
2016-07-15 12:58:27 +01:00
Jonathan Lange
bbd75ddd24
Use memcache compression level from config
2016-07-15 11:24:37 +01:00
Jonathan Lange
1fd8a5fb88
Use StoreReport in main AWS routine
2016-07-15 11:24:37 +01:00
Jonathan Lange
46dfeb627d
Call it reportKey
2016-07-15 11:24:37 +01:00
Jonathan Lange
0058229687
Extract functions for calculating keys
...
Not so much for re-use as to help jml understand what's going on
2016-07-15 11:24:36 +01:00
Jonathan Lange
270a55060f
Add StoreReport methods to stores
...
Not sure if we'll use them.
2016-07-15 11:24:36 +01:00
Jonathan Lange
60e14c1dc2
Plumb through an option for compression
2016-07-15 11:24:36 +01:00
Jonathan Lange
2bfd6d7eb7
Parametrize compression level
2016-07-15 11:24:36 +01:00
Jonathan Lange
d83d7318d0
Remove the old metric
2016-07-12 18:16:35 +01:00
Jonathan Lange
d2298aa8f3
Store a histogram of report sizes
2016-07-12 16:37:29 +01:00
Tom Wilkie
3173f6ad75
Use histograms over summaries
2016-07-12 11:15:57 +01:00
Paul Bellamy
ce2fd1e477
Merge pull request #1659 from weaveworks/demo
...
Adding a static report file mode.
2016-07-11 14:43:04 +01:00
Paul Bellamy
7a37577f71
Review Feedback
2016-07-11 13:36:22 +01:00
Jonathan Lange
49f2e4e40c
Count memcache requests even if they time out
2016-07-11 13:01:02 +01:00
Paul Bellamy
bcddfd82c3
Added file collector, to serve a static report from file
2016-07-11 11:50:27 +01:00
Paul Bellamy
8cb1ecdf2c
Merge pull request #1642 from weaveworks/refactoring-timing
...
refactor some timing helpers into a common lib
2016-07-05 13:01:28 +01:00
Paul Bellamy
7736564337
refactor some timing helpers into a common lib
2016-07-05 12:29:00 +01:00
Jonathan Lange
31c88fd62b
Instrumentation that we might like to keep
2016-07-04 16:03:50 +01:00
Jonathan Lange
7dd2c6371e
Parametrize window rather than assuming default
2016-07-04 13:50:54 +01:00
Jonathan Lange
c1dab17fb3
Make expiration a Duration
2016-07-04 13:30:23 +01:00
Jonathan Lange
f7bdedc149
Config struct for memcache client
2016-07-04 13:25:45 +01:00
Jonathan Lange
96520d7a46
Fixes to memcache support (#1628)
...
* Fix errors discovered in dev
* Log an error rather than aborting when memcache doesn't resolve
* Initialize map correctly
* Review tweaks
2016-07-04 11:00:11 +01:00
Jonathan Lange
9e0f0c51b9
Configuration type for AWS collector
2016-06-30 17:01:58 +01:00
Jonathan Lange
baacaa8cc5
Rename dynamoCollector to awsCollector
2016-06-30 16:44:43 +01:00
Jonathan Lange
6520f8f5f3
Pass in memcache client
2016-06-30 09:59:55 +01:00
Jonathan Lange
abec257c59
Just pass in the s3 client
2016-06-30 09:57:49 +01:00
Jonathan Lange
d984605de1
Write back to the in-process cache
2016-06-30 09:57:49 +01:00
Jonathan Lange
5ec422c7a3
Fetch all reports at once
...
Rather than have getReports be responsible for determining keys, instead
call getReportKeys directly and then pass keys to getReports
2016-06-30 09:57:49 +01:00
Jonathan Lange
87da22767e
Move s3 logic to separate file
2016-06-30 09:57:49 +01:00
Jonathan Lange
e2bda8f670
Move last memcache bits out of dynamo_collector
2016-06-30 09:55:03 +01:00
Julius Volz
4fa40e22b2
Rework Scope metrics according to Prometheus conventions. (#1615)
...
* Rework Scope metrics according to Prometheus conventions.
- counters should end with _total
- elaborated and added units to help strings
- recommended for cache hit/miss metrics: track only the total and the
hits, in separate metrics, since the most common query will be
"hits / total"
- track all times in seconds (base units), which has become the standard
recommendation
- other small changes
There could be more changes that would require more thinking (what
dimensions to use, summaries vs. histograms, etc.), but this is probably
enough controversial material already :)
* Use timeRequestStatus() in sqs_control_router.go.
2016-06-30 09:12:25 +01:00
Jonathan Lange
387c543a87
Fix nil pointer error when memcache not enabled
2016-06-24 14:01:46 +01:00
Tom Wilkie
29133e54ca
Add backoff to the consul client (#1608)
...
* Add backoff to the consul client
* Review feedback
2016-06-24 09:04:08 +01:00
Jonathan Lange
47fcb52354
Optional memcached between probes and S3
...
If given settings for memcached, services will store & fetch reports
from memcache after checking their in-process cache but before fetching
from S3.
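
A minimal sketch of the lookup order described above, not the actual collector code: tiers are consulted in order (in-process cache, then memcache, then S3) and a disabled tier is skipped. The store interface, mapStore, and fetchTiered are hypothetical names, and maps stand in for the real backends.

```go
package main

// store is a hypothetical read-only view of one cache tier.
type store interface {
	fetch(key string) (string, bool)
}

// mapStore is a toy tier backed by a map.
type mapStore map[string]string

func (m mapStore) fetch(key string) (string, bool) {
	v, ok := m[key]
	return v, ok
}

// fetchTiered tries each tier in order and returns the value together
// with the index of the tier that served it, or -1 if none had it.
func fetchTiered(key string, tiers ...store) (string, int) {
	for i, tier := range tiers {
		if tier == nil { // an optional tier, such as memcache, may be disabled
			continue
		}
		if v, ok := tier.fetch(key); ok {
			return v, i
		}
	}
	return "", -1
}
```

Passing nil for the memcache tier models a deployment with no memcached settings: the lookup falls straight through from the in-process cache to S3.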
2016-06-22 18:40:50 +01:00
Jonathan Lange
9e0b27840b
Delete test for unsupported functionality
2016-06-22 11:19:19 +01:00
Jonathan Lange
40cbf119d3
Nice error on unsupported content type
2016-06-22 10:02:18 +01:00
Jonathan Lange
ce5c933d3c
Remove unused import
2016-06-21 11:14:14 +01:00
Jonathan Lange
8bd8f883a1
Restore debugging logic
2016-06-21 11:08:55 +01:00
Jonathan Lange
81b05a33ee
Make ReadBinary more general and re-use in router
2016-06-20 18:02:23 +01:00