557 Commits

Author SHA1 Message Date
Mike Lang
5c19dc792e ecs probe: add tests for reporter 2017-01-13 17:31:29 -08:00
Mike Lang
685af493bf ecs probe: Allow cache settings to be tweaked 2017-01-12 11:37:23 -08:00
Mike Lang
513977081d aws ecs probe: Use a size and time bound LRU gcache for caching
instead of our own hand-rolled size-unbound cache
2017-01-12 10:34:41 -08:00
Mike Lang
e220ae822f wip: 2017-01-12 07:11:12 -08:00
Mike Lang
49d3e7bbd3 wip: 2016-12-16 17:00:57 -08:00
Mike Lang
0fb74d6781 ecs client: more refactoring for nice code
pulls the inner function of describeServices into its own top-level function,
makes the lock part of the client object as a result
2016-12-15 14:11:58 -08:00
Mike Lang
adb6f9d4a1 Appease linter 2016-12-15 14:11:58 -08:00
Mike Lang
6f2efca968 more review feedback 2016-12-15 14:11:58 -08:00
Mike Lang
7d845f9130 ecs reporter: Review feedback, some trivial renames 2016-12-15 14:11:58 -08:00
Mike Lang
7ebb76d0a3 ecs reporter: Move some code around to break up large function 2016-12-15 14:11:58 -08:00
Mike Lang
1d63830792 awsecs reporter: Add lots of debug logging and fix bugs
- describeServices wasn't describing the partial page left over at the end,
  which would cause incorrect results
- the shim between listServices and describeServices was closing the channel every iteration,
  which would cause a panic on write to a closed channel
- client was not being saved when created, so it gets recreated each time
- we were describeTasks'ing even if we had no tasks to describe
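
The first two bugs above can be illustrated with a minimal sketch. This is not the code from this commit: the function name sendBatches and the string-slice types are hypothetical, chosen only to show the pattern of sending the final partial page and closing the channel exactly once.

```go
package main

import "fmt"

// sendBatches (hypothetical name) splits names into groups of at most size
// elements and sends each group on out.
func sendBatches(names []string, size int, out chan<- []string) {
	// Close exactly once, after the last batch. Closing inside the loop
	// would make the next send panic with "send on closed channel".
	defer close(out)
	if size < 1 {
		size = 1 // guard against a loop that never advances
	}
	for start := 0; start < len(names); start += size {
		end := start + size
		if end > len(names) {
			end = len(names) // the partial page left over at the end
		}
		out <- names[start:end]
	}
}

func main() {
	ch := make(chan []string)
	go sendBatches([]string{"a", "b", "c", "d", "e"}, 2, ch)
	for batch := range ch {
		fmt.Println(batch)
	}
}
```

With five names and a page size of two, the receiver sees three batches, the last holding a single name.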
2016-12-15 14:11:57 -08:00
Mike Lang
4234888bf4 ecs: Linter fixes 2016-12-15 14:11:57 -08:00
Mike Lang
357136721d Fix compile errors and go fmt 2016-12-15 14:11:57 -08:00
Mike Lang
9d1e46f81b ECS reporter: Use persistent client objects across reports
Not only does this allow us to re-use connections, but vitally it allows us
to make use of the new task and service caching within the client object.
2016-12-15 14:11:57 -08:00
Mike Lang
6b19bc2da9 Changes to how ECS AWS API is used to minimize API calls
Due to AWS API rate limits, we need to minimize API calls as much as possible.

Our stated objectives:
* for all displayed tasks and services to have up-to-date metadata
* for all tasks to map to services if able

My approach here:
* Tasks only contain immutable fields (that we care about). We cache tasks forever.
  We only DescribeTasks the first time we see a new task.
* We attempt to match tasks to services with what info we have. Any "referenced" service,
  i.e. a service with at least one matching task, needs to be updated to refresh changing data.
* In the event that a task doesn't match any of the (updated) services, i.e. an entirely new service
  needs to be found, we do a full list and detail of all services (we don't re-detail ones we just refreshed).
* To avoid unbounded memory usage, we evict tasks and services from the cache after 1 minute without use.
  This should be long enough for things like temporary failures to be glossed over.
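
The caching and eviction described in the list above might look roughly like the following sketch. It is an illustration under assumptions, not the code in this commit: idleCache, getOrFetch, evictIdle and runDemo are hypothetical names, values are plain strings, and the clock is injected so the one-minute idle window can be simulated.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type entry struct {
	value    string
	lastUsed time.Time
}

// idleCache (hypothetical) forgets entries that go unused for maxIdle.
type idleCache struct {
	mu      sync.Mutex
	maxIdle time.Duration
	now     func() time.Time // injectable clock
	entries map[string]*entry
}

func newIdleCache(maxIdle time.Duration, now func() time.Time) *idleCache {
	return &idleCache{maxIdle: maxIdle, now: now, entries: map[string]*entry{}}
}

// getOrFetch returns the cached value for key and calls fetch only on a miss.
// The lock is held during fetch for simplicity; real code may not want that.
func (c *idleCache) getOrFetch(key string, fetch func(string) string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[key]; ok {
		e.lastUsed = c.now()
		return e.value
	}
	v := fetch(key)
	c.entries[key] = &entry{value: v, lastUsed: c.now()}
	return v
}

// evictIdle drops every entry that has not been used within maxIdle.
func (c *idleCache) evictIdle() {
	c.mu.Lock()
	defer c.mu.Unlock()
	cutoff := c.now().Add(-c.maxIdle)
	for k, e := range c.entries {
		if e.lastUsed.Before(cutoff) {
			delete(c.entries, k)
		}
	}
}

// runDemo returns the number of fetches after a cache hit and after an eviction.
func runDemo() (afterHit, afterEvict int) {
	clock := time.Unix(0, 0)
	calls := 0
	c := newIdleCache(time.Minute, func() time.Time { return clock })
	fetch := func(arn string) string { calls++; return "details for " + arn }
	c.getOrFetch("task-1", fetch)
	c.getOrFetch("task-1", fetch) // served from the cache
	afterHit = calls
	clock = clock.Add(2 * time.Minute) // simulate two idle minutes
	c.evictIdle()
	c.getOrFetch("task-1", fetch) // fetched again after eviction
	return afterHit, calls
}

func main() {
	fmt.Println(runDemo())
}
```

The fetch function stands in for a DescribeTasks call: it runs once while the task stays in use, and again only after the entry has been idle past the limit.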

This gives us exactly one call per task, and one call per referenced service per report,
which is unavoidable to maintain fresh data. Expensive "describe all" service queries are kept
to only when newly-referenced services appear, which should be rare.

We could make a few very minor improvements here, such as trying to refresh unreferenced but known
services before doing a list query, or getting details one by one when "describing all" and stopping
when all matches have been found, but I believe these would produce very minor, if any, gains in
number of calls while having an unjustifiable effect on latency since we wouldn't be able to do requests
as concurrently.

Speaking of which, this change has a minor performance impact.
Even though we're now doing fewer calls, we can't do them as concurrently.

Old code:
	concurrently:
		describe tasks (1 call)
		sequentially:
			list services (1 call)
			describe services (N calls concurrently)
Assuming full concurrency, total latency: 2 end-to-end calls

New code (worst case):
	sequentially:
		describe tasks (1 call)
		describe services (N calls concurrently)
		list services (1 call)
		describe services (N calls concurrently)
Assuming full concurrency, total latency: 4 end-to-end calls

In practical terms, I don't expect this to matter.
2016-12-15 14:11:57 -08:00
Mike Lang
5ed63de306 Merge pull request #2060 from weaveworks/mike/awsecs/fix-log-formatting
ecs reporter: Fix some log lines that were passing *string instead of string
2016-12-14 11:20:59 -08:00
Alfonso Acosta
07aee0ed97 Merge pull request #2020 from kinvolk/alban/fix-getWalkedProcPid
procspy: use a Reader to copy the background reader buffer
2016-12-07 12:53:53 +01:00
Jonathan Lange
1020fc5f85 Use test.Diff from common 2016-12-07 11:22:40 +00:00
Jonathan Lange
b5c750ddea Move test & fs 2016-12-07 11:22:39 +00:00
Jonathan Lange
e8085b01b6 Use 'common' library 2016-12-07 11:22:38 +00:00
Mike Lang
fb12df6036 ecs reporter: Fix some log lines that were passing *string instead of string 2016-12-05 15:43:36 -08:00
Alban Crequy
543f3d5bdc procspy: use a Reader to copy the background reader buffer
getWalkedProcPid() reads latestBuf every 3 seconds (for each report).
But performWalk() writes latestBuf every 10 seconds or so. So we need to
be able to read the same buffer several times.
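
A minimal sketch of the idea, assuming latestBuf is a bytes.Buffer (the commit message does not state its type): reading a bytes.Buffer directly drains it, while a bytes.Reader over the same bytes leaves the buffer intact for the next report. The names snapshot and readTwice are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// snapshot returns the buffer's current contents without draining it.
// Reading from buf directly would consume the data, so a second reader
// would see nothing until the writer filled the buffer again.
func snapshot(buf *bytes.Buffer) (string, error) {
	data, err := io.ReadAll(bytes.NewReader(buf.Bytes()))
	return string(data), err
}

// readTwice is a demo helper: it takes two snapshots of the same buffer
// and reports how many unread bytes the buffer still holds afterwards.
func readTwice(content string) (first, second string, left int) {
	buf := bytes.NewBufferString(content)
	first, _ = snapshot(buf)  // errors ignored in this demo only
	second, _ = snapshot(buf)
	return first, second, buf.Len()
}

func main() {
	first, second, left := readTwice("pid 42\n")
	fmt.Printf("%q %q %d\n", first, second, left)
}
```

Both snapshots return the same content, and the buffer's unread length is unchanged.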
2016-12-05 18:12:11 +01:00
Mike Lang
d0caee4748 Add some basic metadata to the ECS task/service details panels 2016-11-29 07:18:08 -08:00
Alfonso Acosta
9c7282231f Fix tests
Also, refactor some tests and MakeRegistry in api_topologies
2016-11-29 07:18:08 -08:00
Mike Lang
003ef6b4ea Add some basic metadata to ECS nodes 2016-11-29 07:18:08 -08:00
Mike Lang
9a10e9650d Fix the one instance of "make" that is actually apparently required 2016-11-29 07:18:08 -08:00
Mike Lang
b06fee8c0f Review feedback 2016-11-29 07:18:08 -08:00
Mike Lang
b53de4317d Appease linter
spellcheck and required comments and required comment formatting
2016-11-29 07:18:08 -08:00
Mike Lang
f5b7b5bec2 ecs reporter: Fix a bug where parents weren't actually set 2016-11-29 07:18:08 -08:00
Alfonso Acosta
90c8b6eeed Add ECS topologies to tagger 2016-11-29 07:18:08 -08:00
Alfonso Acosta
78775bbdb8 Initial rendering for ECS
(not working yet)
2016-11-29 07:18:05 -08:00
Mike Lang
a2d329dee7 ecs reporter: Associate containers with ECS tasks and services as parents 2016-11-29 07:17:16 -08:00
Mike Lang
511f6dad6a Add report tagger for populating ECS topologies 2016-11-29 07:17:16 -08:00
Filip Barl
d15e884cb1 Table-mode: sort ips numerically (#2007)
Fix #1746 - sort IPs numerically in the table mode
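
For illustration only, numeric IP ordering can be sketched in Go as below. This is not the change made in this commit, and sortIPs is a hypothetical name. It compares parsed addresses byte by byte, so "10.0.0.2" sorts before "10.0.0.10", which a plain string sort gets wrong.

```go
package main

import (
	"bytes"
	"fmt"
	"net"
	"sort"
)

// sortIPs sorts addrs in place by numeric address value.
// Strings that do not parse as IPs sort first, in lexicographic order.
func sortIPs(addrs []string) {
	sort.SliceStable(addrs, func(i, j int) bool {
		a, b := net.ParseIP(addrs[i]), net.ParseIP(addrs[j])
		switch {
		case a == nil && b == nil:
			return addrs[i] < addrs[j]
		case a == nil:
			return true
		case b == nil:
			return false
		default:
			// To16 gives both addresses the same 16-byte form.
			return bytes.Compare(a.To16(), b.To16()) < 0
		}
	})
}

func main() {
	ips := []string{"10.0.0.10", "10.0.0.2", "9.255.0.1"}
	sortIPs(ips)
	fmt.Println(ips)
}
```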
2016-11-22 11:05:59 +01:00
Alfonso Acosta
e57cd8d2e7 Fix time-dependent test (stop testing docker client library) 2016-11-10 16:35:57 +00:00
Alfonso Acosta
74490b8fef Give time to the overlay test backoff collectors to finish
Not the most elegant/robust solution but it solves the problem without adding
extra methods to the backoff interface (only to check if they are ready from
tests).
2016-11-08 13:49:17 +00:00
Alfonso Acosta
fe53752520 Add connections table to Weave Net details panel 2016-11-04 09:41:16 +00:00
Alfonso Acosta
3ba83ddd53 Merge pull request #1973 from weaveworks/1938-enrich-weave-details-panel
Extend metadata in details panel for Weave Net nodes
2016-11-04 09:44:27 +01:00
Alfonso Acosta
fc4eb85de2 More feedback 2016-11-03 22:27:46 +00:00
Alfonso Acosta
b30a9c44b6 Review feedback (and fix metadata bug) 2016-11-03 21:58:54 +00:00
Alfonso Acosta
0884955c95 Extend metadata in details panel for Weave Net nodes 2016-11-03 15:57:23 +00:00
Simon
7e5166e45e Merge pull request #1966 from weaveworks/746-resize-ttys
Resize TTYs
2016-11-03 11:06:16 +01:00
Alfonso Acosta
3af7076f30 Review feedback 2016-11-02 14:46:56 +00:00
Alfonso Acosta
9e378ca4b5 Show image information at the beginning 2016-11-02 13:16:12 +00:00
Alfonso Acosta
216cc0d605 Add image table to container nodes
Also, extend metadata of images with sizes
2016-11-02 13:16:11 +00:00
Alfonso Acosta
9367d95cb0 Allow providing fixed entries in tables 2016-11-02 13:00:15 +00:00
Alfonso Acosta
1cf66419d9 Make linter happy 2016-10-31 14:26:37 +00:00
Alfonso Acosta
253657887c Implement TTY resize for hosts 2016-10-31 12:11:25 +00:00
Alfonso Acosta
6a3910d20a Fix backend tests 2016-10-31 10:15:50 +00:00
Alfonso Acosta
411f8d729e Remove leftover functions 2016-10-31 11:04:41 +01:00