The host CPU metric was reported as a percentage of all available CPUs,
but the limit was set to n_cpus * 100%. So on a 4-core machine the
graphs and metrics-on-canvas would never show more than 1/4th usage. Now
the limit is set to 100%.
Fixes #1664.
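A minimal sketch of the corrected limit, assuming a hypothetical Metric
struct with Value and Max fields:

    package main

    import "fmt"

    // Metric is a hypothetical stand-in for a metric row: the
    // current usage plus the limit the graphs scale against.
    type Metric struct {
        Value float64 // usage, in percent
        Max   float64 // limit
    }

    func hostCPUMetric(usagePercentOfAllCores float64) Metric {
        // Usage is already a share of all cores combined, so the
        // limit is a flat 100%, not n_cpus * 100.
        return Metric{Value: usagePercentOfAllCores, Max: 100}
    }

    func main() {
        // A fully-loaded 4-core host now renders as 100/100, not 100/400.
        fmt.Println(hostCPUMetric(100))
    }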
The container CPU metric was reported in units of 100% = 1 CPU. So the
ratio was correct, but since we don't show limits in most places it
is hard to interpret that figure. It also makes sorting by CPU usage
highly misleading. So now we normalise everything to 100%. That too can
be misleading, depending on what you are looking for, but it's generally
less surprising.
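A sketch of the container-side normalisation, assuming cgroup-style
accounting in which a raw reading of 100% means one fully-busy core
(names hypothetical):

    package main

    import "fmt"

    // normalisedContainerCPU rescales a raw container CPU figure so
    // that 100 means "all of the host's cores" rather than "one core".
    func normalisedContainerCPU(rawPercent float64, hostCPUs int) float64 {
        return rawPercent / float64(hostCPUs)
    }

    func main() {
        // Raw 200% (two fully-busy cores) on a 4-core host reads as 50%.
        fmt.Println(normalisedContainerCPU(200, 4))
    }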
- Increase line height so that it doesn't fall over the side of the
container (which is overflow: hidden).
- Compensate for line-height increase w/ small label position tweak.
* Squashed 'tools/' changes from e9e7e6b..db5efc0
db5efc0 Merge pull request #28 from weaveworks/mike/add-image-tag
5312c40 Import image-tag script into build tools so it can be shared
7e850f8 Fix logs path
dda9785 Update deploy api
f2f4e5b Fix the wcloud client
3925eb6 Merge pull request #27 from weaveworks/wcloud-events
77355b9 Lint
d9a1c6c Add wcloud events, update flags and error nicely when there is no config
git-subtree-dir: tools
git-subtree-split: db5efc0537
* Remove ./image-tag and use ./tools/image-tag instead
image-tag is now shared code from the build-tools repo
We fell victim to variable shadowing here. Each store would be fed the
original list of report keys, instead of only the ones that weren't
found in the previous store. So if a single report was missing from the
in-process cache, we would then fetch all reports from memcache. And if
that in turn was missing a single report we would fetch all reports from
S3.
We chain report stores for a reason - to reduce latency and, in the
case of the in-process cache, to eliminate decoding costs. So this bug
has a huge impact on query service performance.
To make matters worse, we return *all* the reports - now possibly in
triplicate. Fortunately, the SmartMerger filters these out, so at least
we were not incurring extra merge costs.
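A minimal sketch of the fixed chain, with hypothetical ReportStore and
FetchReports names; the point is assigning to the outer 'missing' with
'=' rather than shadowing it with ':=':

    package store

    type Report struct{ Key string }

    // ReportStore is one tier in the chain: the in-process cache,
    // then memcache, then S3.
    type ReportStore interface {
        // FetchReports returns the reports it found and the keys it
        // could not find.
        FetchReports(keys []string) (found []Report, missing []string)
    }

    func fetchThroughChain(stores []ReportStore, keys []string) ([]Report, []string) {
        var reports []Report
        missing := keys
        for _, store := range stores {
            var found []Report
            // Plain '=' updates the outer 'missing'. The buggy
            // version used ':=' here, shadowing it, so every tier
            // was fed the full key list and returned every report.
            found, missing = store.FetchReports(missing)
            reports = append(reports, found...)
            if len(missing) == 0 {
                break
            }
        }
        return reports, missing
    }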
A lot of time could pass between recording the request count and the
hit count for a particular report-fetching batch, which skewed the
cache hit ratio calculations.
Fix that by deferring the request count recording to the end, which is
when we record the hit count.
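A sketch of the deferred recording, with hypothetical counters and
cache lookup (the real code presumably uses instrumented metrics;
atomics stand in here):

    package cache

    import "sync/atomic"

    var (
        requestCount atomic.Int64
        hitCount     atomic.Int64
    )

    type Report struct{ Key string }

    // cacheLookup is a hypothetical stand-in for the in-process cache.
    func cacheLookup(key string) (Report, bool) { return Report{}, false }

    func fetchBatch(keys []string) []Report {
        var hits int64
        // Record requests and hits together, once the batch is done,
        // so both counters describe the same set of fetches and the
        // derived hit ratio cannot be skewed by work landing in
        // between the two recordings.
        defer func() {
            requestCount.Add(int64(len(keys)))
            hitCount.Add(hits)
        }()
        var out []Report
        for _, k := range keys {
            if r, ok := cacheLookup(k); ok {
                hits++
                out = append(out, r)
            }
        }
        return out
    }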
Problem: Decoding a corrupt report grows the 'missing' list. Since we
are waiting for 'len(keys)-len(missing)' decoder go-routines, this
results in waiting for fewer go-routines than we should. The surplus
go-routines leak and we ignore their reports. And since the keys of the
ignored reports are not included in 'missing', we won't attempt to fetch
them from S3 either. Oops.
Fix: calculate the number of go-routines once, at the beginning.
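A sketch of the fixed decoding loop, with hypothetical names; the wait
count is computed once, before any decoder can grow 'missing':

    package fetcher

    type Report struct{ Key string }

    // decode is a hypothetical stand-in for report deserialisation.
    func decode(buf []byte) (Report, error) { return Report{}, nil }

    type result struct {
        key string
        rep Report
        err error
    }

    // decodeFetched decodes the blobs fetched from one store;
    // 'fetched' holds one blob per found key, so len(fetched) ==
    // len(keys) - len(missing) on entry.
    func decodeFetched(keys, missing []string, fetched map[string][]byte) ([]Report, []string) {
        // Snapshot the go-routine count up front. The buggy version
        // effectively re-evaluated len(keys)-len(missing) after
        // corrupt keys had been appended to 'missing', waited on too
        // few results, and leaked the surplus go-routines together
        // with their reports.
        n := len(keys) - len(missing)
        ch := make(chan result, n)
        for key, buf := range fetched {
            go func(key string, buf []byte) {
                rep, err := decode(buf)
                ch <- result{key: key, rep: rep, err: err}
            }(key, buf)
        }
        var reports []Report
        for i := 0; i < n; i++ {
            r := <-ch
            if r.err != nil {
                // Corrupt report: record the key so the next store
                // (e.g. S3) is still asked for it.
                missing = append(missing, r.key)
                continue
            }
            reports = append(reports, r.rep)
        }
        return reports, missing
    }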