Files
weave-scope/probe
Mike Lang 6b19bc2da9 Changes to how ECS AWS API is used to minimize API calls
Due to AWS API rate limits, we need to minimize API calls as much as possible.

Our stated objectives:
* for all displayed tasks and services to have up-to-date metadata
* for all tasks to map to services if able

My approach here:
* Tasks only contain immutable fields (that we care about). We cache tasks forever.
  We only DescribeTasks the first time we see a new task.
* We attempt to match tasks to services with the info we have. Any "referenced" service,
  i.e. one with at least one matching task, needs to be refreshed, since its data changes.
* If a task doesn't match any of the (refreshed) services, i.e. an entirely new service
  needs to be found, we do a full list-and-describe of all services (skipping ones we just refreshed).
* To avoid unbounded memory usage, we evict tasks and services from the cache after 1 minute without use.
  This should be long enough for things like temporary failures to be glossed over.

This gives us exactly one describe call per new task, plus one call per referenced service
per report, which is unavoidable if we want fresh data. Expensive "describe all" service queries
happen only when newly-referenced services appear, which should be rare.
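The per-report flow can be sketched with a fake API client that counts calls, which also demonstrates the "one call per task, list only for new services" claim. Everything here (`fakeECS`, `processReport`, the field names) is hypothetical, not the probe's actual code:

```go
package main

import "fmt"

// fakeECS stands in for the ECS API and counts calls so the
// strategy's behaviour is visible. All names are illustrative.
type fakeECS struct {
	describeTaskCalls    int
	describeServiceCalls int
	listServiceCalls     int
	taskToService        map[string]string // what DescribeTasks would reveal
	allServices          []string
}

func (f *fakeECS) describeTasks(arns []string) map[string]string {
	f.describeTaskCalls++
	out := map[string]string{}
	for _, a := range arns {
		out[a] = f.taskToService[a]
	}
	return out
}

func (f *fakeECS) describeServices(names []string) { f.describeServiceCalls++ }

func (f *fakeECS) listServices() []string {
	f.listServiceCalls++
	return f.allServices
}

// processReport follows the strategy from this commit message:
// describe only uncached tasks, refresh referenced services, and fall
// back to a full list+describe only when a task matches no known service.
func processReport(api *fakeECS, taskCache map[string]string, knownServices map[string]bool, tasks []string) {
	// 1. DescribeTasks only for tasks we haven't cached yet.
	var newTasks []string
	for _, t := range tasks {
		if _, ok := taskCache[t]; !ok {
			newTasks = append(newTasks, t)
		}
	}
	if len(newTasks) > 0 {
		for arn, svc := range api.describeTasks(newTasks) {
			taskCache[arn] = svc
		}
	}

	// 2. Refresh every referenced service; note any unmatched task.
	referenced := []string{}
	unmatched := false
	for _, t := range tasks {
		if svc := taskCache[t]; knownServices[svc] {
			referenced = append(referenced, svc)
		} else {
			unmatched = true
		}
	}
	if len(referenced) > 0 {
		api.describeServices(referenced)
	}

	// 3. Expensive path: a task referenced a service we don't yet know.
	if unmatched {
		for _, s := range api.listServices() {
			knownServices[s] = true
		}
		api.describeServices(api.allServices) // real code skips just-refreshed ones
	}
}

func main() {
	api := &fakeECS{
		taskToService: map[string]string{"task-1": "svc-a"},
		allServices:   []string{"svc-a"},
	}
	taskCache := map[string]string{}
	knownServices := map[string]bool{}

	processReport(api, taskCache, knownServices, []string{"task-1"}) // first report: full list needed
	processReport(api, taskCache, knownServices, []string{"task-1"}) // second report: cache hit, no list

	fmt.Println(api.describeTaskCalls, api.listServiceCalls) // 1 1
}
```

Across two reports the task is described once and the service list is fetched once, matching the cost analysis above.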

We could make a few very minor improvements here, such as trying to refresh unreferenced but
known services before resorting to a list query, or fetching details one at a time when
"describing all" and stopping once every task is matched. However, I believe these would
produce very minor savings, if any, in the number of calls, while having an unjustifiable
effect on latency, since we could no longer issue requests as concurrently.

Speaking of concurrency, this change has a minor performance impact:
even though we're now making fewer calls, we can't make them as concurrently.

Old code:
	concurrently:
		describe tasks (1 call)
		sequentially:
			list services (1 call)
			describe services (N calls concurrently)
Assuming full concurrency, total latency: 2 end-to-end calls

New code (worst case):
	sequentially:
		describe tasks (1 call)
		describe services (N calls concurrently)
		list services (1 call)
		describe services (N calls concurrently)
Assuming full concurrency, total latency: 4 end-to-end calls
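The "N calls concurrently" steps above are a plain goroutine fan-out; assuming full concurrency, N describes cost one end-to-end round trip. A minimal sketch, where `describeService` is a stand-in for one DescribeServices request:

```go
package main

import (
	"fmt"
	"sync"
)

// describeService is a stand-in for a single ECS DescribeServices call.
func describeService(name string) string {
	return "details-for-" + name
}

// describeAllConcurrently issues one describe per service in parallel,
// so total latency is roughly one end-to-end call rather than N.
func describeAllConcurrently(names []string) map[string]string {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = map[string]string{}
	)
	for _, name := range names {
		wg.Add(1)
		go func(n string) {
			defer wg.Done()
			d := describeService(n) // network call happens outside the lock
			mu.Lock()
			results[n] = d
			mu.Unlock()
		}(name)
	}
	wg.Wait()
	return results
}

func main() {
	out := describeAllConcurrently([]string{"svc-a", "svc-b"})
	fmt.Println(len(out)) // 2
}
```

The sequential steps in the new code can't overlap like this because each one's input depends on the previous one's output, hence the 4-round worst case.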

In practical terms, I don't expect this to matter.
2016-12-15 14:11:57 -08:00