GC takes longer, for instance. Also we should fail if we don't get the IP in time, otherwise
we just deadlock. And fix some concurrent variable access.
- App takes POST report on /api/report
- Probe publishes to configured target(s)
- Name resolution happens on probe-side
- There's no longer an xfer.ProbePort
- xfer.Collector responsibility is reduced
- Fixes to remaining experimental components.
- rm experimental/bridge: it's not being used, and by changing the
app/probe comm model, it would require a complete refactor anyway. We
can easily rebuild it when we need to. It will even be much simpler.
- rm experimental/graphviz: it's been broken for some time anyway, and we
don't really need to play around with it as a rendering option
anymore.
- rm experimental/oneshot: we never use this anymore.
Also, 1 packet may be counted in N topologies, so you can't rely on the
sum of all packet counts across topologies having any relation to the
sampling data.
Another implicit invariant in the data model is that edges are always of the
form (local -> remote). That is, the source of an edge must always be a node
that originates from within Scope's domain of visibility. This was evident from
the presence of ingress and egress fields in edge/aggregate metadata.
When building the sniffer, I accidentally and incorrectly violated this
invariant, by constructing distinct edges for (local -> remote) and (remote ->
local), and collapsing ingress and egress byte counts to a single scalar. I
experienced a variety of subtle undefined behavior as a result. See #339.
This change reverts to the old, correct methodology. Consequently the sniffer
needs to be able to find out which side of the sniffed packet is local vs.
remote, and to do that it needs access to local networks. I moved the
discovery from the probe/host package into probe/main.go.
As part of that work I discovered that package report also maintains its own,
independent "cache" of local networks. Except it contains only the (optional)
Docker bridge network, if it's been populated by the probe, and it's only used
by the report.Make{Endpoint,Address}NodeID constructors to scope local
addresses. Normally, scoping happens during rendering, and only for pseudo
nodes -- see current LeafMap Render localNetworks. This is pretty convoluted
and should either be made consistent or heavily commented.
NewNodeMetadata -> MakeNodeMetadata. It doesn't return a pointer, so
Make is more idiomatic.
Invoke MakeNodeMetadata when necessary. The zero value for a
NodeMetadata is no longer valid.
Split MakeNodeMetadata into two constructors. MakeNodeMetadata when you
don't have anything to prepopulate; MakeNodeMetadataWith when you do.
Also, a fix to the tests in app. We unmarshal a RenderableNode struct,
which has a JSON-ignored NodeMetadata field. The zero value is invalid,
so we need to fix that before performing comparisons.
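A rough sketch of the two-constructor pattern, assuming NodeMetadata wraps a map (which is why its zero value is unusable). The struct layout, the signatures and the defensive copy are assumptions for illustration and may differ from the real type.

```go
package main

import "fmt"

// NodeMetadata is a simplified stand-in for the real type. The map-backed
// layout is an assumption made for this sketch.
type NodeMetadata struct {
	Metadata map[string]string
}

// MakeNodeMetadata returns an empty but usable NodeMetadata.
func MakeNodeMetadata() NodeMetadata {
	return MakeNodeMetadataWith(map[string]string{})
}

// MakeNodeMetadataWith returns a NodeMetadata prepopulated with a copy of m,
// so later changes to m do not leak into the result.
func MakeNodeMetadataWith(m map[string]string) NodeMetadata {
	cp := make(map[string]string, len(m))
	for k, v := range m {
		cp[k] = v
	}
	return NodeMetadata{Metadata: cp}
}

func main() {
	nmd := MakeNodeMetadata()
	nmd.Metadata["pid"] = "1234" // safe: the map is initialized
	fmt.Println(nmd.Metadata["pid"])

	// By contrast, writing to the zero value would panic on the nil map:
	//   var zero NodeMetadata
	//   zero.Metadata["pid"] = "1234"
}
```

The same reasoning applies to values produced by unmarshaling: a JSON-ignored field is left at its zero value, so it has to be replaced with a constructed value before it is used or compared.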
This fixes the regression where process names weren't appearing for
Darwin probes. Makes testing easier.
Also, changes the process walker to operate on value types. There's no
performance advantage to using reference types for something of this
size, and there appeared to be a data race in the Darwin port that
caused nodes to gain and lose process names over time.
Also, restructures how to enable docker scraping. Default false when run
manually, and enabled via --probe.docker true in the scope script.
- Make poll take interfaces, do diff on error
- Use poll in TestRegistryEvents
- Improve the locking to prevent deadlocks and data races in registry_test.go
This causes detailed node lookups for the grouped-by-process-name view to
fail. Also, add a test for the process walker trimming whitespace, and a
test that the process-by-name view gives the right result.
- Move pidtree to its own module and disaggregate it into tree, walker and reporter.
- Extend testing for probe/process
- Extend process metadata; add command line & # threads.
- Move docker probe code into its own module
- Put PIDTree behind an interface for mocking
- Disaggregate dockerTagger into a registry, tagger and reporter
- Similarly disaggregate tests
- Add mocks for docker container and registry
- Add test for docker events & stats
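As an illustration of the "PIDTree behind an interface" item above, a minimal sketch of how an interface lets tests swap in a mock. The single-method interface, mockPIDTree and describe are hypothetical and much smaller than whatever the real code exposes.

```go
package main

import "fmt"

// PIDTree is a hypothetical, pared-down interface for this sketch.
type PIDTree interface {
	ProcessName(pid int) (string, error)
}

// mockPIDTree is a test double backed by a fixed pid -> name table.
type mockPIDTree map[int]string

// ProcessName returns the name registered for pid, or an error if absent.
func (m mockPIDTree) ProcessName(pid int) (string, error) {
	name, ok := m[pid]
	if !ok {
		return "", fmt.Errorf("pid %d not found", pid)
	}
	return name, nil
}

// describe is a hypothetical consumer. It depends only on the interface,
// so it can be tested without walking a real process table.
func describe(t PIDTree, pid int) string {
	name, err := t.ProcessName(pid)
	if err != nil {
		return "unknown"
	}
	return name
}

func main() {
	tree := mockPIDTree{1: "init", 42: "dockerd"}
	fmt.Println(describe(tree, 42))
	fmt.Println(describe(tree, 7))
}
```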
This makes container image details show the containers (and processes) correctly.
Also:
- introduces a 'test' package, and moves the Diff function there.
- adds some tests for this new rendered view.
We only want to scope (i.e. prefix with hostID) those addresses that are
deemed loopback, to disambiguate them. Otherwise, we want to leave
addresses in unscoped form, so they can be matched, and links between
communicating nodes properly made.
So, we make the isLoopback check in MakeAddressID, and omit hostID if
the address isn't loopback. So far so good.
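A minimal sketch of that rule. The signature of MakeAddressID, the scopeDelim separator and the string-based isLoopback helper are assumptions for illustration; only the behavior (scope loopback addresses, leave everything else alone) comes from the description above.

```go
package main

import (
	"fmt"
	"net"
)

// scopeDelim is an assumed separator between host ID and address.
const scopeDelim = ";"

// isLoopback reports whether address parses as a loopback IP.
func isLoopback(address string) bool {
	ip := net.ParseIP(address)
	return ip != nil && ip.IsLoopback()
}

// MakeAddressID prefixes loopback addresses with hostID to disambiguate
// them across hosts. Other addresses stay unscoped so that both ends of a
// connection produce the same ID and can be linked.
func MakeAddressID(hostID, address string) string {
	if isLoopback(address) {
		return hostID + scopeDelim + address
	}
	return address
}

func main() {
	fmt.Println(MakeAddressID("hostA", "127.0.0.1"))
	fmt.Println(MakeAddressID("hostA", "10.0.0.1"))
}
```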
But this breaks topology rendering, as we were relying on extracting
hostID from adjacency node IDs, to populate origin hosts in the rendered
node output. So we need another way to get origin host from an arbitrary
node.
A survey revealed no reliable way to get that information from IDs in
their new form. However, we have access to node metadata, so this
changeset introduces the OriginHostTagger, which tags each node with its
origin host, via the foreign-key semantics we'll use going forward.
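A rough sketch of the tagging idea, assuming node metadata is a string map and the origin host is stored under a well-known key. The key name HostNodeID, the Topology and NodeMetadata types, the Tag signature and the OriginHost helper are all assumptions for illustration.

```go
package main

import "fmt"

// HostNodeID is an assumed metadata key that acts as a foreign key to the
// node's origin host.
const HostNodeID = "host_node_id"

// NodeMetadata and Topology are simplified stand-ins for the real types.
type NodeMetadata struct {
	Metadata map[string]string
}

type Topology map[string]NodeMetadata

// OriginHostTagger tags every node it sees with a fixed host node ID.
type OriginHostTagger struct {
	hostNodeID string
}

// Tag writes the origin host into the metadata of every node in the given
// topologies, initializing the metadata map where it is missing.
func (t OriginHostTagger) Tag(topologies ...Topology) {
	for _, topo := range topologies {
		for id, md := range topo {
			if md.Metadata == nil {
				md.Metadata = map[string]string{}
				topo[id] = md
			}
			md.Metadata[HostNodeID] = t.hostNodeID
		}
	}
}

// OriginHost reads the origin host from metadata rather than parsing it out
// of the node ID, which no longer carries it for unscoped addresses.
func OriginHost(md NodeMetadata) (string, bool) {
	h, ok := md.Metadata[HostNodeID]
	return h, ok
}

func main() {
	addrs := Topology{"10.0.0.1": NodeMetadata{}}
	OriginHostTagger{hostNodeID: "hostA"}.Tag(addrs)
	fmt.Println(OriginHost(addrs["10.0.0.1"]))
}
```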