Similar to video compression, which uses key-frames and differences
between them: every N publishes we send a full report, but in between
we only send what has changed.
Fairly simple approach in the probe - hold on to the last full report,
and for the deltas remove anything that would be merged in from the
full report.
On the receiving side, the app already merges a set of reports
together to produce the final output for rendering, so provided N is
smaller than the number of reports in that set we don't need to do
anything different.
Deltas don't need to represent nodes that have disappeared - an
earlier full report will have that node so it would be merged into the
final output anyway.
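
The key-frame scheme described above can be sketched roughly as follows.
This is only an illustration under simplifying assumptions: Node, Report,
Delta, Merge and Publisher are hypothetical names, and a node is reduced
to a flat string map, which is far simpler than the real report types.

```go
package main

// Node is a hypothetical, simplified node: a flat set of key/value pairs.
type Node map[string]string

// Report is a hypothetical, simplified report keyed by node ID.
type Report map[string]Node

// Delta keeps only what the receiver could not recover by merging in the
// last full report: new nodes, and keys whose values have changed.
// Nodes that disappeared are deliberately not represented.
func Delta(full, current Report) Report {
	delta := Report{}
	for id, node := range current {
		prev, seen := full[id]
		changed := Node{}
		for k, v := range node {
			if old, ok := prev[k]; !ok || old != v {
				changed[k] = v
			}
		}
		if !seen || len(changed) > 0 {
			delta[id] = changed
		}
	}
	return delta
}

// Merge overlays later on top of earlier, as a stand-in for the merge the
// receiving side already performs over its set of reports.
func Merge(earlier, later Report) Report {
	out := Report{}
	for _, r := range []Report{earlier, later} {
		for id, node := range r {
			if out[id] == nil {
				out[id] = Node{}
			}
			for k, v := range node {
				out[id][k] = v
			}
		}
	}
	return out
}

// Publisher sends a full report every N publishes and deltas in between.
type Publisher struct {
	N        int // full-report interval; values <= 1 mean "always full"
	count    int
	lastFull Report
}

// Next returns what to send for the current report and whether it is full.
func (p *Publisher) Next(current Report) (out Report, isFull bool) {
	if p.N <= 1 || p.count%p.N == 0 {
		p.count++
		p.lastFull = current
		return current, true
	}
	p.count++
	return Delta(p.lastFull, current), false
}
```

With N set to 3, the sequence of publishes would be full, delta, delta,
full, and so on. Merging a delta onto the last full report gives back
the current state, plus any nodes that have since disappeared.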
We are already timing all report, tag and tick operations.
If Prometheus is in use, expose those metrics that way.
Adjust metrics naming to fit with Prometheus norms.
The previous way these metrics were exposed was via SIGUSR1, and we
can only have one "sink", so make it either-or.
Signed-off-by: Bryan Boreham <bryan@weave.works>
So that, if many shortcut reports are produced in quick succession,
they will tend to get merged together.
Expand the queue size for shortcut reports, to avoid holding up the
producer so much.
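
The mechanism is not spelled out above, so the following is just one
possible sketch of merging queued reports: the consumer blocks for the
first report and then folds in whatever else is already waiting in a
buffered channel. The Report type, merge, and drainAndMerge are
hypothetical names, and the queue size is chosen by the caller.

```go
package main

// Report is a hypothetical, simplified report: a flat key/value map.
type Report map[string]string

// merge copies every entry of src into dst; later values win.
func merge(dst, src Report) {
	for k, v := range src {
		dst[k] = v
	}
}

// drainAndMerge waits for one report, then merges in every report that is
// already queued, without blocking. It returns false only if the queue was
// closed before any report arrived.
func drainAndMerge(queue <-chan Report) (Report, bool) {
	first, ok := <-queue
	if !ok {
		return nil, false
	}
	merged := Report{}
	merge(merged, first)
	for {
		select {
		case r, ok := <-queue:
			if !ok {
				return merged, true
			}
			merge(merged, r)
		default:
			// Nothing else queued right now: publish what we have.
			return merged, true
		}
	}
}
```

A larger channel buffer lets the producer keep going while the consumer
is busy, and gives the consumer more reports to merge in one pass.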
Simplification: move the 'noControls' functionality into the probe, as
we don't need a whole struct to do that.
The ReportPublisher interface also moves into probe where it belongs:
"the consumer should define the interface" - Dave Cheney
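
As an illustration of the "consumer defines the interface" point, a
minimal sketch might look like this. The Report type, the Publish
signature, the Probe struct and recordingPublisher are all assumptions
made for the example, not the actual definitions.

```go
package main

// Report is a hypothetical, simplified report type.
type Report map[string]string

// ReportPublisher is declared next to its consumer (the probe), and lists
// only the behaviour the probe actually needs. The signature is assumed.
type ReportPublisher interface {
	Publish(r Report) error
}

// Probe depends on the small interface above rather than on a concrete
// publisher from another package.
type Probe struct {
	publisher ReportPublisher
}

// publish hands a report to whichever publisher the probe was given.
func (p *Probe) publish(r Report) error {
	return p.publisher.Publish(r)
}

// recordingPublisher is one possible implementation, useful in tests.
type recordingPublisher struct {
	published []Report
}

// Publish records the report and never fails.
func (rp *recordingPublisher) Publish(r Report) error {
	rp.published = append(rp.published, r)
	return nil
}
```

Any type with a matching Publish method satisfies the interface without
importing the probe package.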
Removed to reduce CPU and memory usage in probes.
This code was added in August 2016 so that newer probes could be used
with older apps. Since then we have adopted the stance that new apps
will accept reports from old probes but not vice-versa, on a version
change.
The new probe will convert each node's LatestControls to Controls, so
the old app can consume them. Also, the new app will convert each
node's Controls to LatestControls, so it can consume the reports from
old probes.
* alpine: dl-4.alpinelinux.org is dead, use another server
* increase buffer for docker stats
Attempt to avoid the following message:
docker container: dropping stats.
* probe: better timeout error messages
The logs contain the following messages:
Process reporter took longer than 1s
K8s reporter took longer than 1s
Docker reporter took longer than 1s
Endpoint reporter took longer than 1s
This patch prints how long it takes.
Squash of:
* Include plugins in the report
* show plugin list in the UI
* moving metric and metadata templates into the probe reports
* update js for prime -> priority
* added retry to plugin handshake
* added iowait plugin
* review feedback
* plugin documentation