- App accepts reports via POST on /api/report (see the sketch after this list)
- Probe publishes to configured target(s)
- Name resolution happens on the probe side
- There's no longer an xfer.ProbePort
- xfer.Collector responsibility is reduced
- Fixes to remaining experimental components.
- rm experimental/bridge: it's not being used, and the change to the
app/probe comm model would require a complete refactor of it anyway. We
can easily rebuild it when we need to, and it will even be much simpler.
- rm experimental/graphviz: it's been broken for some time anyway, and we
don't really need to play around with it as a rendering option
anymore.
- rm experimental/oneshot: we never use this anymore.
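A minimal sketch of the new comm model, using only the stdlib; the function name, target list, and port are illustrative, not Scope's actual ones. The probe resolves each configured target itself and POSTs the encoded report to the app's /api/report:

```go
package main

import (
	"bytes"
	"log"
	"net"
	"net/http"
)

// publish POSTs an encoded report to every configured target. Name
// resolution happens here, on the probe side, not in the app.
func publish(targets []string, report []byte) error {
	for _, target := range targets {
		addr, err := net.ResolveTCPAddr("tcp", target) // probe-side resolution
		if err != nil {
			return err
		}
		resp, err := http.Post("http://"+addr.String()+"/api/report", "application/json", bytes.NewReader(report))
		if err != nil {
			return err
		}
		resp.Body.Close()
	}
	return nil
}

func main() {
	if err := publish([]string{"scope.example.local:4040"}, []byte(`{}`)); err != nil {
		log.Fatal(err)
	}
}
```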
Also, one packet may be counted in N topologies, so you can't rely on the
sum of packet counts across topologies bearing any fixed relation to the
sampling data.
Another implicit invariant in the data model is that edges are always of the
form (local -> remote). That is, the source of an edge must always be a node
that originates from within Scope's domain of visibility. This was evident
from the presence of ingress and egress fields in edge/aggregate metadata.
When building the sniffer, I accidentally and incorrectly violated this
invariant, by constructing distinct edges for (local -> remote) and (remote ->
local), and collapsing ingress and egress byte counts to a single scalar. I
experienced a variety of subtle undefined behavior as a result. See #339.
This change reverts to the old, correct methodology. Consequently, the sniffer
needs to be able to work out which side of a sniffed packet is local vs.
remote, and to do that it needs access to the local networks. I moved that
discovery from the probe/host package into probe/main.go.
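A sketch of that decision, assuming the discovered local networks are available as []*net.IPNet; the helper names are hypothetical, but the point is that the edge is always oriented (local -> remote) and the ingress/egress distinction falls out of which side matched:

```go
package main

import (
	"fmt"
	"net"
)

// isLocal reports whether ip falls within any of the local networks.
func isLocal(ip net.IP, localNets []*net.IPNet) bool {
	for _, n := range localNets {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// orient returns the (local, remote) pair for a sniffed packet, so the
// caller accumulates egress/ingress byte counts on one (local -> remote)
// edge instead of constructing two edges.
func orient(src, dst net.IP, localNets []*net.IPNet) (local, remote net.IP, egress bool) {
	if isLocal(src, localNets) {
		return src, dst, true // locally originated: egress bytes
	}
	return dst, src, false // remotely originated: ingress bytes
}

func main() {
	_, lan, _ := net.ParseCIDR("10.0.0.0/8")
	local, remote, egress := orient(net.ParseIP("10.0.0.5"), net.ParseIP("8.8.8.8"), []*net.IPNet{lan})
	fmt.Println(local, remote, egress) // 10.0.0.5 8.8.8.8 true
}
```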
As part of that work I discovered that package report also maintains its own,
independent "cache" of local networks. It contains only the (optional)
Docker bridge network, if that's been populated by the probe, and it's only used
by the report.Make{Endpoint,Address}NodeID constructors to scope local
addresses. Normally, scoping happens during rendering, and only for pseudo
nodes -- see the current LeafMap Render localNetworks. This is pretty convoluted
and should either be made consistent or heavily commented.
During rendering, RenderableNodes are created by Map funcs, which take only
NodeMetadata. Previously, it was simply assumed that the relevant keys for a
given Map func would be present in the metadata. If that implicit invariant
failed, the returned RenderableNode would be invalid, with e.g. an ID of
"hostid::". That, in turn, would trigger undefined behavior later on in the
rendering workflow.
This bug was detected by creating a partial node metadata for a non-local
endpoint node. That node was picked up during the first phase of rendering, and
given an invalid renderable node ID of "myhostname::", which prevented it from
attaching to the TheInternet pseudonode. It eventually got removed from the set
of valid nodes, which meant nodes that were adjacent to it suddenly became
orphans and got filtered out by the FilterUnconnected step of the rendering
pipeline.
With this change, every map func checks for the presence of mandatory fields,
i.e. the fields that compose the resulting renderable node's ID.
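A toy sketch of that check, with drastically simplified types; the "pid" key and the two-value return are assumptions for illustration, not Scope's real Map func signature:

```go
package main

import "fmt"

// NodeMetadata and RenderableNode are simplified stand-ins.
type NodeMetadata map[string]string

type RenderableNode struct{ ID string }

// mapProcess only builds a renderable node when the mandatory "pid"
// field (which composes the node ID) is present; otherwise it reports
// ok=false and the node is dropped instead of getting an ID like
// "myhostname::".
func mapProcess(hostID string, md NodeMetadata) (RenderableNode, bool) {
	pid, ok := md["pid"]
	if !ok {
		return RenderableNode{}, false
	}
	return RenderableNode{ID: hostID + ":" + pid}, true
}

func main() {
	if _, ok := mapProcess("myhostname", NodeMetadata{}); !ok {
		fmt.Println("partial metadata rejected")
	}
}
```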
Also,
- Add unit tests for LeafMapFuncs
- Topology Validate checks that NodeMetadatas do not have nil Metadata
Rename NewNodeMetadata -> MakeNodeMetadata: it doesn't return a pointer,
so Make is more idiomatic.
Invoke MakeNodeMetadata when necessary. The zero value for a
NodeMetadata is no longer valid.
Split MakeNodeMetadata into two constructors: MakeNodeMetadata when you
have nothing to prepopulate, and MakeNodeMetadataWith when you do.
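A sketch of the two constructors under the commit's naming; the struct layout is an assumption, but the essential property is that both return values whose inner map is always initialized:

```go
package main

import "fmt"

// NodeMetadata's layout here is an assumption; what matters is that
// the inner map is never left nil.
type NodeMetadata struct {
	Metadata map[string]string
}

// MakeNodeMetadata is for when there is nothing to prepopulate.
func MakeNodeMetadata() NodeMetadata {
	return MakeNodeMetadataWith(map[string]string{})
}

// MakeNodeMetadataWith is for when there is.
func MakeNodeMetadataWith(m map[string]string) NodeMetadata {
	return NodeMetadata{Metadata: m}
}

func main() {
	fmt.Println(MakeNodeMetadata().Metadata != nil) // true: never the zero value
}
```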
Also, a fix to the tests in app. We unmarshal a RenderableNode struct,
which has a JSON-ignored NodeMetadata field. The zero value is invalid,
so we need to fix that before performing comparisons.
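A compact sketch of that fix-up, assuming a `json:"-"` tag on the field; after unmarshaling, the ignored field is reset with the constructor before any comparison:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type NodeMetadata struct{ Metadata map[string]string }

func MakeNodeMetadata() NodeMetadata { return NodeMetadata{Metadata: map[string]string{}} }

type RenderableNode struct {
	ID           string       `json:"id"`
	NodeMetadata NodeMetadata `json:"-"` // ignored by encoding/json
}

func main() {
	var got RenderableNode
	if err := json.Unmarshal([]byte(`{"id":"a"}`), &got); err != nil {
		panic(err)
	}
	// The ignored field was left at its zero value, which is no longer
	// valid, so reset it before comparing against expected values.
	got.NodeMetadata = MakeNodeMetadata()
	fmt.Println(got.ID, got.NodeMetadata.Metadata != nil) // a true
}
```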
This fixes the regression where process names weren't appearing for
Darwin probes. Makes testing easier.
Also, changes the process walker to operate on value types. There's no
performance advantage to using reference types for something of this
size, and there appeared to be a data race in the Darwin port that
caused nodes to gain and lose process names over time.
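A sketch of the value-type shape, with illustrative names; each callback receives its own copy of a small Process struct, so concurrent readers can't race on shared pointers:

```go
package main

import "fmt"

// Process is small, so copying it is cheap and a pointer buys nothing.
type Process struct {
	PID  int
	Name string
}

// Walker yields each discovered process by value; callers get copies
// and cannot race on shared state.
type Walker interface {
	Walk(func(Process)) error
}

type fakeWalker struct{ procs []Process }

func (w fakeWalker) Walk(f func(Process)) error {
	for _, p := range w.procs {
		f(p) // p is a copy; mutating it cannot affect other readers
	}
	return nil
}

func main() {
	w := fakeWalker{procs: []Process{{PID: 1, Name: "init"}}}
	w.Walk(func(p Process) { fmt.Println(p.PID, p.Name) })
}
```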
Also, restructures how docker scraping is enabled: it now defaults to false
when run manually, and is enabled via --probe.docker true in the scope script.
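A sketch of the wiring, assuming the stdlib flag package (where a boolean flag is set with the --probe.docker=true form); the flag name and default follow the commit, the usage string is invented:

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Docker scraping is off by default for manual runs; the scope
	// script opts in explicitly.
	dockerEnabled := flag.Bool("probe.docker", false, "collect Docker metadata for processes")
	flag.Parse()
	fmt.Println("docker scraping enabled:", *dockerEnabled)
}
```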