Bryan Boreham
5318498d9a
improvement: make command-line parsing more robust
2020-06-11 11:15:35 +00:00
Bryan Boreham
70240fc82d
billing: cope with spy-interval set longer than publish-interval
2020-06-11 11:15:35 +00:00
Bryan Boreham
a20c51e94d
Higher limit on topology size for merged reports.
...
Where a report has been merged from several probes, give it a higher
limit before dropping topologies.
We will already have applied the limit on each single-probe report as
it came in, except for historical ones.
2020-05-25 13:12:47 +00:00
Bryan Boreham
fd65155cd6
Register the metric for dropped topologies
...
Missed earlier.
2020-05-25 13:12:27 +00:00
Bryan Boreham
b117b1a5ef
cosmetic: move misplaced import
2020-05-19 10:09:52 +00:00
Bryan Boreham
ad82fafde8
multitenant: scan container command-lines as well as process
2020-05-19 10:09:28 +00:00
Bryan Boreham
f83ad517d8
multitenant: extract command-line parsing function and add test
2020-05-19 10:07:23 +00:00
Bryan Boreham
323aa46d1c
fix (pipes): check websocket errors inside CopyToWebsocket()
...
Previously we were treating EOF on the reader as no-error, meaning
that operations like Kubernetes Describe would retry endlessly when
finished.
2020-05-06 10:04:40 +00:00
Bryan Boreham
fa4d1c4c2b
Upgrade reports before merging
...
In case they came from an older or an overload probe.
2020-04-16 19:27:40 +00:00
Bryan Boreham
b772fa83b3
Add a metric for topologies dropped because they are over limit
...
Need to modify DropTopologiesOver() to report what it dropped, and
plumb through the userid so the metric can show who has a problem.
2020-04-16 19:27:28 +00:00
Bryan Boreham
b1fc59819a
comment: clarify memcached error cases
2020-04-15 16:49:02 +00:00
Bryan Boreham
9a739fda46
Parallelise sending merged reports to store
...
Writes to DynamoDB and S3 can be done in parallel, which will reduce
the overall flush time.
2020-04-13 19:11:34 +00:00
Bryan Boreham
2629d13780
Add a histogram for flush times
2020-04-13 19:11:34 +00:00
Bryan Boreham
ccf031b8a9
enhancement(multitenant): merge incoming reports in a time window
...
This means we store fewer, bigger, reports, which reduces cost of
storage and time to render when data is viewed.
2020-04-13 19:11:34 +00:00
Bryan Boreham
104b9cba50
refactor: Call Close() on collector
...
Doesn't do anything at present, but will be used later.
Change the signature on BillingEmitter.Close() to match. Note we didn't use the error returned.
2020-04-13 19:11:34 +00:00
Bryan Boreham
777ff07e19
refactor(multitenant): break report storage code out into sub-functions
...
So the main Add() function isn't so long.
2020-04-13 19:11:34 +00:00
Bryan Boreham
8c46367808
fix(multitenant): move use of rounding map inside lock
2020-04-13 16:16:20 +00:00
Bryan Boreham
3f11352435
enhancement(multitenant): Track rounding error in billing calculation
...
Billing takes an integer number of seconds, so keep track of the
amount lost to rounding when the publish interval is not an integer.
2020-04-10 19:04:50 +00:00
Bryan Boreham
c784acc20d
Revert change to use report timestamp
...
This reverts commit 6b72246fe6.
The app merges reports within a 15-second window of its own time, so
if one or more probes have a time that is several seconds different
they will get excluded from the window.
2020-03-28 13:58:34 +00:00
Bryan Boreham
6b72246fe6
fix (multitenant collector): Use consistent report timestamp
...
Previously the code called `time.Now()` in two different places so the
timestamps didn't match. Now we use the timestamp of the report itself.
Add the collector's local time to the report if it didn't have one.
2020-03-26 19:15:34 +00:00
Bryan Boreham
53701aca1f
Cache the last-known report interval per user
...
Delta reports don't contain the string we are looking for, so remember
it from the last full report.
2020-03-06 18:03:51 +00:00
Bryan Boreham
329023b7c5
Improve calculation of usage in multitenant code
...
Use the duration supplied, if there is one.
It was looking for a process named "scope-probe", whereas the
executable is just named "scope".
2020-03-06 13:29:54 +00:00
Bryan Boreham
634e8f1158
Add tracing for pipe operations
2020-01-21 15:51:00 +00:00
Bryan Boreham
d516ed9883
performance(aws_collector): don't persist shortcut reports
...
Shortcut reports are sent to update the UI quickly, on events like a
container starting. We don't need to persist them in the time-travel
data since the same information will be covered by a regular report a
few seconds later.
2019-10-17 18:02:36 +00:00
Akash Srivastava
ca420b07aa
Merge pull request #3687 from weaveworks/refactor-reading
...
Refactor report reading
2019-09-24 12:26:01 +05:30
Bryan Boreham
13af359bcf
refactor: eliminate report.ReadBinary() in favour of MakeFromBinary()
...
The signature of MakeFromFile changed to return a pointer for
consistency.
2019-09-23 10:01:43 +00:00
Bryan Boreham
6ee9738581
Merge pull request #3686 from weaveworks/analyze-reports
...
feature(app): Add a debugging summary function, exposed via http
2019-09-18 15:54:50 +01:00
Bryan Boreham
2bbd4a3f0d
refactor: remove MakeFromBytes() function which is almost the same as ReadBinary()
2019-09-17 10:55:44 +00:00
Bryan Boreham
11e76f1740
feature(app): Add a debugging summary function, exposed via http
...
URL is /admin/summary
2019-09-17 10:48:23 +00:00
Bryan Boreham
04af634065
tracing(app): set a tag for userid on awsCollector.Report
2019-09-15 19:22:08 +00:00
Bryan Boreham
74b6a292d5
Use time.Duration instead of nanoseconds for constants
2019-09-13 07:31:07 +00:00
Bryan Boreham
b5376facf2
Cache merged groups of reports, to reduce the number we handle in parallel
...
Previously we would merge all reports in a 15-second window.
Now we use a 'quantum' of 3 seconds, similar to the single-user app.
E.g. a 30-node cluster will have 150 individual reports over 15
seconds, but the new code will merge 5 pre-merged reports plus 20-ish
very recent individual ones.
This limits the max heap size used for deserialising, since we only do
3 seconds at once per instance.
Individual reports are still put into the cache, but should get
displaced by the pre-merged ones under LRU.
2019-09-09 10:00:26 +00:00
Bryan Boreham
70550ca34a
Refactor: pull userid fetch up out of getReportKeys()
2019-09-09 08:19:55 +00:00
Bryan Boreham
589c4c4d0b
Refactor: pull time interval computation up out of getReportKeys()
2019-09-08 12:27:57 +00:00
Bryan Boreham
26c8760877
Merge pull request #3605 from weaveworks/defer-metrics-registration
...
Defer metrics registration until we need it
2019-07-16 15:45:30 +01:00
Bryan Boreham
89363f5dcf
Defer metrics registration until we need it
...
This avoids app-specific metrics appearing in the probe.
2019-07-04 14:24:22 +00:00
Bryan Boreham
1e2206963a
Merge pull request #3599 from weaveworks/per-tenant-metrics
...
Add metrics for report size and count per tenant
2019-05-15 13:44:42 +01:00
Bryan Boreham
870b52eec0
Review feedback: metric description
2019-05-15 12:43:22 +00:00
Bryan Boreham
711aa66bd5
Add OpenTracing span for report.ReadBinary()
...
So we can see the timing and size in Jaeger.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2019-05-10 15:34:53 +00:00
Bryan Boreham
4c74f8b1cf
Add metrics for report size and count per tenant
...
In a multitenant system it is useful to be able to drill into which
tenants have the most or biggest reports.
Signed-off-by: Bryan Boreham <bryan@weave.works>
2019-05-10 14:49:57 +00:00
Bryan Boreham
ee0ce7b087
Merge pull request #3384 from weaveworks/drop-big-topologies
...
In multitenant app, drop all nodes for big topologies
2018-11-01 17:21:55 +00:00
Bryan Boreham
3be8cf71dd
Add more Opentracing detail to the app (#3383)
...
* Pass Go context down to Renderers
This is useful for cancellation or tracing.
* Add tracing spans to app
Also log things like number of nodes in Map, total number of reports.
2018-10-26 11:21:33 +05:30
Bryan Boreham
05b350850f
Drop topologies which are way too big
2018-10-11 17:20:16 +00:00
Bryan Boreham
27047c3297
Embed AWSCollectorConfig instead of duplicating the fields
...
This simplifies adding more fields later.
2018-10-11 15:57:45 +00:00
Marc Carré
2ba50b8b3d
Update golang.org/x/net/context to latest
...
```
$ gvt delete golang.org/x/net/context
$ gvt fetch golang.org/x/net/context
2018/07/23 18:03:49 Fetching: golang.org/x/net/context
$ git grep -l "golang.org/x/net/context" | grep -v vendor | xargs sed -i '' 's:golang.org/x/net/context:context:g'
$ git grep -l "context/ctxhttp" | grep -v vendor | xargs sed -i '' 's:context/ctxhttp:golang.org/x/net/context/ctxhttp:g'
$ gofmt -s -w app
$ gofmt -s -w common
$ gofmt -s -w probe
$ gofmt -s -w prog
$ gofmt -s -w tools
```
fixed a bunch of:
```
cannot use func literal (type func("github.com/weaveworks/scope/vendor/golang.org/x/net/context".Context) error) as type func("context".Context) error
```
2018-07-23 20:10:18 +02:00
Marc Carré
d46c2266ce
Change Sirupsen/logrus to sirupsen/logrus
...
```
$ git grep -l Sirupsen | grep -v vendor | xargs sed -i '' 's:github.com/Sirupsen/logrus:github.com/sirupsen/logrus:g'
$ gofmt -s -w app
$ gofmt -s -w common
$ gofmt -s -w probe
$ gofmt -s -w prog
$ gofmt -s -w tools
```
2018-07-23 20:10:14 +02:00
Bryan Boreham
126a171f62
Make 'fast' merger the default
2018-06-22 11:59:43 +00:00
Marcus Cobden
ba81924278
Add CLI flag for SQS RPC timeout
2018-05-04 10:11:25 +01:00
Matthias Radestock
72b9e9c6b9
add Reporter.HasReports() for cheap report availability checking
...
This requires no report reading / merging.
We plan to expose this in the HTTP API, so the UI gets a cheap way of
checking whether the app is currently receiving data from probes.
2017-12-14 00:13:45 +00:00
Matthias Radestock
54fe1e37da
cosmetic
2017-12-13 23:52:48 +00:00