Compare commits

5 Commits

Author SHA1 Message Date
Alon Girmonsky
ec7979f826 Network RCA skill: update resolution tools to list_workloads/list_ips
Replace deprecated resolve_workload/resolve_ip references with the new
list_workloads and list_ips tools that support both singular lookup
(name+namespace or IP) and filtered scan (namespace/regex/label filters
against snapshots).

Ref: kubeshark/hub#687
2026-03-27 16:45:01 -07:00
Alon Girmonsky
ddc2e57f12 Network RCA skill: use local timezone instead of UTC (#1880)
* Use local timezone instead of UTC in Network RCA skill output

Add a Timezone Handling section that instructs the agent to detect the
local timezone, present local time as the primary reference with UTC in
parentheses, and convert UTC tool responses before presenting to users.
Update all example timestamps to demonstrate the local+UTC format.

Closes #1879

* Ensure agent proactively starts dissection for workload/API queries

The agent was waiting for dissection to complete without ever starting it.
Add explicit instructions: check dissection status first, start it if
missing, and default to the Dissection route for any non-PCAP question.
Only PCAP-specific requests can skip dissection.

* Translate every API/Kubernetes question into a fresh list_api_calls query

Add "Every Question Is a Query" section: each user prompt with API or
Kubernetes semantics should map to a list_api_calls call with the
appropriate KFL filter. Includes examples of natural language to KFL
translation. Agent should never answer from memory or stale results.

---------

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-03-24 12:03:05 -07:00
Alon Girmonsky
e80fc3319b Revamp README descriptions and structure (#1881)
* Revamp README intro, sections, and descriptions

Rewrite the opening description to focus on indexing and querying.
Replace "What's captured" with actionable "What you can do" bullets.
Add port-forward step and ingress recommendation to Get Started.
Rename and tighten section descriptions: Network Data for AI Agents,
Network Traffic Indexing, Workload Dependency Map, Traffic Retention
& PCAP Export.

* Remove Raw Capture from features table
2026-03-23 08:33:27 -07:00
Volodymyr Stoiko
868b4c1f36 Verify hub/front pods are ready by conditions (#1864)
* Verify hub/front pods are ready by conditions

* log waiting for readiness

* proper sync

---------

Co-authored-by: Alon Girmonsky <1990761+alongir@users.noreply.github.com>
2026-03-21 17:33:48 -07:00
Serhii Ponomarenko
c63740ec45 🐛 Fix dissection-control front env logic (#1878) 2026-03-20 08:20:53 -07:00
4 changed files with 191 additions and 69 deletions

View File

@@ -17,17 +17,13 @@
---
Kubeshark captures cluster-wide network traffic at the speed and scale of Kubernetes, continuously, at the kernel level using eBPF. It consolidates a highly fragmented picture — dozens of nodes, thousands of workloads, millions of connections — into a single, queryable view with full Kubernetes and API context.
Kubeshark indexes cluster-wide network traffic at the kernel level using eBPF — delivering instant answers to any query using network, API, and Kubernetes semantics.
Network data is available to **AI agents via [MCP](https://docs.kubeshark.com/en/mcp)** and to **human operators via a [dashboard](https://docs.kubeshark.com/en/v2)**.
**What you can do:**
**What's captured, cluster-wide:**
- **L4 Packets & TCP Metrics** — retransmissions, RTT, window saturation, connection lifecycle, packet loss across every node-to-node path ([TCP insights →](https://docs.kubeshark.com/en/mcp/tcp_insights))
- **L7 API Calls** — real-time request/response matching with full payload parsing: HTTP, gRPC, GraphQL, Redis, Kafka, DNS ([API dissection →](https://docs.kubeshark.com/en/v2/l7_api_dissection))
- **Decrypted TLS** — eBPF-based TLS decryption without key management
- **Kubernetes Context** — every packet and API call resolved to pod, service, namespace, and node
- **PCAP Retention** — point-in-time raw packet snapshots, exportable for Wireshark ([Snapshots →](https://docs.kubeshark.com/en/v2/traffic_snapshots))
- **Download Retrospective PCAPs** — cluster-wide packet captures filtered by nodes, time, workloads, and IPs. Store PCAPs for long-term retention and later investigation.
- **Visualize Network Data** — explore traffic matching queries with API, Kubernetes, or network semantics through a real-time dashboard.
- **Integrate with AI** — connect your favorite AI assistant (e.g. Claude, Copilot) to include network data in AI-driven workflows like incident response and root cause analysis.
![Kubeshark](https://github.com/kubeshark/assets/raw/master/png/stream.png)
@@ -38,9 +34,12 @@ Network data is available to **AI agents via [MCP](https://docs.kubeshark.com/en
```bash
helm repo add kubeshark https://helm.kubeshark.com
helm install kubeshark kubeshark/kubeshark
kubectl port-forward svc/kubeshark-front 8899:80
```
Dashboard opens automatically. You're capturing traffic.
Open `http://localhost:8899` in your browser. You're capturing traffic.
> For production use, we recommend using an [ingress controller](https://docs.kubeshark.com/en/ingress) instead of port-forward.
**Connect an AI agent** via MCP:
@@ -53,9 +52,9 @@ claude mcp add kubeshark -- kubeshark mcp
---
### AI-Powered Network Analysis
### Network Data for AI Agents
Kubeshark exposes all cluster-wide network data via MCP (Model Context Protocol). AI agents can query L4 metrics, investigate L7 API calls, analyze traffic patterns, and run root cause analysis through natural language. Use cases include incident response, root cause analysis, troubleshooting, debugging, and reliability workflows.
Kubeshark exposes cluster-wide network data via [MCP](https://docs.kubeshark.com/en/mcp) — enabling AI agents to query traffic, investigate API calls, and perform root cause analysis through natural language.
> *"Why did checkout fail at 2:15 PM?"*
> *"Which services have error rates above 1%?"*
@@ -70,25 +69,25 @@ Works with Claude Code, Cursor, and any MCP-compatible AI.
---
### L7 API Dissection
### Network Traffic Indexing
Cluster-wide request/response matching with full payloads, parsed according to protocol specifications. HTTP, gRPC, Redis, Kafka, DNS, and more. Every API call resolved to source and destination pod, service, namespace, and node. No code instrumentation required.
Kubeshark indexes cluster-wide network traffic by parsing it according to protocol specifications, with support for HTTP, gRPC, Redis, Kafka, DNS, and more. This enables queries using Kubernetes semantics (e.g. pod, namespace, node), API semantics (e.g. path, headers, status), and network semantics (e.g. IP, port). No code instrumentation required.
![API context](https://github.com/kubeshark/assets/raw/master/png/api_context.png)
[Learn more →](https://docs.kubeshark.com/en/v2/l7_api_dissection)
### L4/L7 Workload Map
### Workload Dependency Map
Cluster-wide view of service communication: dependencies, traffic flow, and anomalies across all nodes and namespaces.
A visual map of how workloads communicate, showing dependencies, traffic volume, and protocol usage across the cluster.
![Service Map](https://github.com/kubeshark/assets/raw/master/png/servicemap.png)
[Learn more →](https://docs.kubeshark.com/en/v2/service_map)
### Traffic Retention
### Traffic Retention & PCAP Export
Continuous raw packet capture with point-in-time snapshots. Export PCAP files for offline analysis with Wireshark or other tools.
Capture and retain raw network traffic cluster-wide. Download PCAPs scoped by time range, nodes, workloads, and IPs — ready for Wireshark or any PCAP-compatible tool.
![Traffic Retention](https://github.com/kubeshark/assets/raw/master/png/snapshots.png)
@@ -100,7 +99,6 @@ Continuous raw packet capture with point-in-time snapshots. Export PCAP files fo
| Feature | Description |
|---------|-------------|
| [**Raw Capture**](https://docs.kubeshark.com/en/v2/raw_capture) | Continuous cluster-wide packet capture with minimal overhead |
| [**Traffic Snapshots**](https://docs.kubeshark.com/en/v2/traffic_snapshots) | Point-in-time snapshots, export as PCAP for Wireshark |
| [**L7 API Dissection**](https://docs.kubeshark.com/en/v2/l7_api_dissection) | Request/response matching with full payloads and protocol parsing |
| [**Protocol Support**](https://docs.kubeshark.com/en/protocols) | HTTP, gRPC, GraphQL, Redis, Kafka, DNS, and more |

View File

@@ -40,9 +40,11 @@ type Readiness struct {
}
var ready *Readiness
var proxyOnce sync.Once
func tap() {
ready = &Readiness{}
proxyOnce = sync.Once{}
state.startTime = time.Now()
log.Info().Str("registry", config.Config.Tap.Docker.Registry).Str("tag", config.Config.Tap.Docker.Tag).Msg("Using Docker:")
@@ -147,11 +149,21 @@ func printNoPodsFoundSuggestion(targetNamespaces []string) {
log.Warn().Msg(fmt.Sprintf("Did not find any currently running pods that match the regex argument, %s will automatically target matching pods if any are created later%s", misc.Software, suggestionStr))
}
func isPodReady(pod *core.Pod) bool {
for _, condition := range pod.Status.Conditions {
if condition.Type == core.PodReady {
return condition.Status == core.ConditionTrue
}
}
return false
}
func watchHubPod(ctx context.Context, kubernetesProvider *kubernetes.Provider, cancel context.CancelFunc) {
podExactRegex := regexp.MustCompile(fmt.Sprintf("^%s", kubernetes.HubPodName))
podWatchHelper := kubernetes.NewPodWatchHelper(kubernetesProvider, podExactRegex)
eventChan, errorChan := kubernetes.FilteredWatch(ctx, podWatchHelper, []string{config.Config.Tap.Release.Namespace}, podWatchHelper)
isPodReady := false
podReady := false
podRunning := false
timeAfter := time.After(120 * time.Second)
for {
@@ -183,26 +195,30 @@ func watchHubPod(ctx context.Context, kubernetesProvider *kubernetes.Provider, c
Interface("containers-statuses", modifiedPod.Status.ContainerStatuses).
Msg("Watching pod.")
if modifiedPod.Status.Phase == core.PodRunning && !isPodReady {
isPodReady = true
if isPodReady(modifiedPod) && !podReady {
podReady = true
ready.Lock()
ready.Hub = true
ready.Unlock()
log.Info().Str("pod", kubernetes.HubPodName).Msg("Ready.")
} else if modifiedPod.Status.Phase == core.PodRunning && !podRunning {
podRunning = true
log.Info().Str("pod", kubernetes.HubPodName).Msg("Waiting for readiness...")
}
ready.Lock()
proxyDone := ready.Proxy
hubPodReady := ready.Hub
frontPodReady := ready.Front
ready.Unlock()
if !proxyDone && hubPodReady && frontPodReady {
ready.Lock()
ready.Proxy = true
ready.Unlock()
postFrontStarted(ctx, kubernetesProvider, cancel)
if hubPodReady && frontPodReady {
proxyOnce.Do(func() {
ready.Lock()
ready.Proxy = true
ready.Unlock()
postFrontStarted(ctx, kubernetesProvider, cancel)
})
}
case kubernetes.EventBookmark:
break
@@ -223,7 +239,7 @@ func watchHubPod(ctx context.Context, kubernetesProvider *kubernetes.Provider, c
cancel()
case <-timeAfter:
if !isPodReady {
if !podReady {
log.Error().
Str("pod", kubernetes.HubPodName).
Msg("Pod was not ready in time.")
@@ -242,7 +258,8 @@ func watchFrontPod(ctx context.Context, kubernetesProvider *kubernetes.Provider,
podExactRegex := regexp.MustCompile(fmt.Sprintf("^%s", kubernetes.FrontPodName))
podWatchHelper := kubernetes.NewPodWatchHelper(kubernetesProvider, podExactRegex)
eventChan, errorChan := kubernetes.FilteredWatch(ctx, podWatchHelper, []string{config.Config.Tap.Release.Namespace}, podWatchHelper)
isPodReady := false
podReady := false
podRunning := false
timeAfter := time.After(120 * time.Second)
for {
@@ -274,25 +291,29 @@ func watchFrontPod(ctx context.Context, kubernetesProvider *kubernetes.Provider,
Interface("containers-statuses", modifiedPod.Status.ContainerStatuses).
Msg("Watching pod.")
if modifiedPod.Status.Phase == core.PodRunning && !isPodReady {
isPodReady = true
if isPodReady(modifiedPod) && !podReady {
podReady = true
ready.Lock()
ready.Front = true
ready.Unlock()
log.Info().Str("pod", kubernetes.FrontPodName).Msg("Ready.")
} else if modifiedPod.Status.Phase == core.PodRunning && !podRunning {
podRunning = true
log.Info().Str("pod", kubernetes.FrontPodName).Msg("Waiting for readiness...")
}
ready.Lock()
proxyDone := ready.Proxy
hubPodReady := ready.Hub
frontPodReady := ready.Front
ready.Unlock()
if !proxyDone && hubPodReady && frontPodReady {
ready.Lock()
ready.Proxy = true
ready.Unlock()
postFrontStarted(ctx, kubernetesProvider, cancel)
if hubPodReady && frontPodReady {
proxyOnce.Do(func() {
ready.Lock()
ready.Proxy = true
ready.Unlock()
postFrontStarted(ctx, kubernetesProvider, cancel)
})
}
case kubernetes.EventBookmark:
break
@@ -312,7 +333,7 @@ func watchFrontPod(ctx context.Context, kubernetesProvider *kubernetes.Provider,
Msg("Failed creating pod.")
case <-timeAfter:
if !isPodReady {
if !podReady {
log.Error().
Str("pod", kubernetes.FrontPodName).
Msg("Pod was not ready in time.")
@@ -429,9 +450,6 @@ func postFrontStarted(ctx context.Context, kubernetesProvider *kubernetes.Provid
watchScripts(ctx, kubernetesProvider, false)
}
if config.Config.Scripting.Console {
go runConsoleWithoutProxy()
}
}
func updateConfig(kubernetesProvider *kubernetes.Provider) {
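The change above replaces a racy check-then-set on `ready.Proxy` with a `sync.Once` gate, so `postFrontStarted` fires exactly once even when the hub and front watchers observe readiness concurrently. A minimal runnable sketch of the pattern, with `onPodReady` and `startProxy` as illustrative stand-ins for the watcher branches and `postFrontStarted`:

```go
package main

import (
	"fmt"
	"sync"
)

// Readiness mirrors the struct in the diff: mutex-guarded flags
// for the hub pod, the front pod, and the proxy.
type Readiness struct {
	sync.Mutex
	Hub, Front, Proxy bool
}

var (
	ready     = &Readiness{}
	proxyOnce sync.Once
)

// onPodReady stands in for a watcher branch: set this pod's flag,
// then race the other watcher to start the proxy. sync.Once makes
// the race harmless: startProxy runs exactly once.
func onPodReady(set func(*Readiness)) {
	ready.Lock()
	set(ready)
	hubReady, frontReady := ready.Hub, ready.Front
	ready.Unlock()

	if hubReady && frontReady {
		proxyOnce.Do(func() {
			ready.Lock()
			ready.Proxy = true
			ready.Unlock()
			startProxy() // stands in for postFrontStarted
		})
	}
}

func startProxy() { fmt.Println("proxy started (once)") }

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); onPodReady(func(r *Readiness) { r.Hub = true }) }()
	go func() { defer wg.Done(); onPodReady(func(r *Readiness) { r.Front = true }) }()
	wg.Wait()
}
```

The mutex still guards the flags, but the once-only decision no longer depends on reading `ready.Proxy` between an unlock and a relock.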

View File

@@ -70,7 +70,7 @@ spec:
value: '{{- if and (not .Values.demoModeEnabled) (not .Values.tap.capture.dissection.enabled) -}}
true
{{- else -}}
{{ not (default false .Values.demoModeEnabled) | ternary false true }}
{{ (default false .Values.demoModeEnabled) | ternary false true }}
{{- end -}}'
- name: 'REACT_APP_CLOUD_LICENSE_ENABLED'
value: '{{- if or (and .Values.cloudLicenseEnabled (not (empty .Values.license))) (not .Values.internetConnectivity) -}}
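The one-line fix above removes a double negation. With Sprig's `ternary`, `cond | ternary a b` returns `a` when `cond` is true, so the old expression `not .Values.demoModeEnabled | ternary false true` evaluated back to `demoModeEnabled` itself, while the corrected form yields its negation. A Go sketch of the truth table (the `ternary` helper mimics Sprig's pipe form; this is an illustration, not the chart's code):

```go
package main

import "fmt"

// ternary mimics Sprig's pipe form `cond | ternary a b`:
// it returns a when cond is true, b otherwise.
func ternary[T any](a, b T, cond bool) T {
	if cond {
		return a
	}
	return b
}

func main() {
	for _, demoMode := range []bool{false, true} {
		old := ternary(false, true, !demoMode)  // {{ not demoModeEnabled | ternary false true }}
		fixed := ternary(false, true, demoMode) // {{ demoModeEnabled | ternary false true }}
		fmt.Printf("demoModeEnabled=%-5v old=%-5v fixed=%-5v\n", demoMode, old, fixed)
	}
}
```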

View File

@@ -29,6 +29,31 @@ Unlike real-time monitoring, retrospective analysis lets you go back in time:
reconstruct what happened, compare against known-good baselines, and pinpoint
root causes with full L4/L7 visibility.
## Timezone Handling
All timestamps presented to the user **must use the local timezone** of the environment
where the agent is running. Users think in local time ("this happened around 3pm"), and
UTC-only output adds friction during incident response when speed matters.
### Rules
1. **Detect the local timezone** at the start of every investigation. Use the system
clock or environment (e.g., `date +%Z` or equivalent) to determine the timezone.
2. **Present local time as the primary reference** in all output — summaries, event
correlations, time-range references, and tables.
3. **Show UTC in parentheses** for clarity, e.g., `15:03:22 IST (12:03:22 UTC)`.
4. **Convert tool responses** — Kubeshark MCP tools return timestamps in UTC. Always
convert these to local time before presenting to the user.
5. **Use local time in natural language** — when describing events, say "the spike at
3:23 PM" not "the spike at 12:23 UTC".
### Snapshot Creation
When creating snapshots, Kubeshark MCP tools accept UTC timestamps. Convert the user's
local time references to UTC before passing them to tools like `create_snapshot` or
`export_snapshot_pcap`. Confirm the converted window with the user if there's any
ambiguity.
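A companion sketch for the reverse direction, converting a user's local window to UTC strings before calling a snapshot tool; the input layout and the RFC 3339 output format are illustrative assumptions:

```go
package main

import (
	"fmt"
	"time"
)

// toUTCWindow converts a user-supplied local time window into UTC
// strings suitable for a snapshot tool.
func toUTCWindow(startLocal, endLocal string) (string, string, error) {
	const layout = "2006-01-02 15:04"
	start, err := time.ParseInLocation(layout, startLocal, time.Local)
	if err != nil {
		return "", "", err
	}
	end, err := time.ParseInLocation(layout, endLocal, time.Local)
	if err != nil {
		return "", "", err
	}
	return start.UTC().Format(time.RFC3339), end.UTC().Format(time.RFC3339), nil
}

func main() {
	// "Around 3pm local" becomes an explicit UTC window with buffer.
	from, to, err := toUTCWindow("2026-03-14 14:45", "2026-03-14 15:30")
	if err != nil {
		panic(err)
	}
	fmt.Println("snapshot window:", from, "to", to)
}
```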
## Prerequisites
Before starting any analysis, verify the environment is ready.
@@ -103,6 +128,11 @@ Both routes are valid and complementary. Use PCAP when you need raw packets
for human analysis or compliance. Use Dissection when you want an AI agent
to search and analyze traffic programmatically.
**Default to Dissection.** Unless the user explicitly asks for a PCAP file or
Wireshark export, assume Dissection is needed. Any question about workloads,
APIs, services, pods, error rates, latency, or traffic patterns requires
dissected data.
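As a sketch in code, the routing default might look like this; the keyword heuristic is illustrative, not a real Kubeshark API:

```go
package main

import (
	"fmt"
	"strings"
)

// chooseRoute applies the default above: only an explicit request for
// raw packets skips dissection; everything else takes the Dissection route.
func chooseRoute(prompt string) string {
	p := strings.ToLower(prompt)
	for _, kw := range []string{"pcap", "wireshark", "packet capture"} {
		if strings.Contains(p, kw) {
			return "PCAP"
		}
	}
	return "Dissection"
}

func main() {
	fmt.Println(chooseRoute("Export a PCAP of the checkout pods")) // PCAP
	fmt.Println(chooseRoute("Why is the payment service slow?"))   // Dissection
}
```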
## Snapshot Operations
Both routes start here. A snapshot is an immutable freeze of all cluster traffic
@@ -116,19 +146,19 @@ Check what raw capture data exists across the cluster. You can only create
snapshots within these boundaries — data outside the window has been rotated
out of the FIFO buffer.
**Example response**:
**Example response** (raw tool output is in UTC — convert to local time before presenting):
```
Cluster-wide:
Oldest: 2026-03-14 16:12:34 UTC
Newest: 2026-03-14 18:05:20 UTC
Oldest: 2026-03-14 18:12:34 IST (16:12:34 UTC)
Newest: 2026-03-14 20:05:20 IST (18:05:20 UTC)
Per node:
┌─────────────────────────────┬──────────┬──────────┐
│ Node                        │ Oldest   │ Newest   │
├─────────────────────────────┼──────────┼──────────┤
│ ip-10-0-25-170.ec2.internal │ 16:12:34 │ 18:03:39 │
│ ip-10-0-32-115.ec2.internal │ 16:13:45 │ 18:05:20 │
└─────────────────────────────┴──────────┴──────────┘
┌─────────────────────────────┬───────────────────────────────┬───────────────────────────────┐
│ Node                        │ Oldest                        │ Newest                        │
├─────────────────────────────┼───────────────────────────────┼───────────────────────────────┤
│ ip-10-0-25-170.ec2.internal │ 18:12:34 IST (16:12:34 UTC)   │ 20:03:39 IST (18:03:39 UTC)   │
│ ip-10-0-32-115.ec2.internal │ 18:13:45 IST (16:13:45 UTC)   │ 20:05:20 IST (18:05:20 UTC)   │
└─────────────────────────────┴───────────────────────────────┴───────────────────────────────┘
```
If the incident falls outside the available window, the data has been rotated
@@ -191,18 +221,48 @@ When you know the workload names but not their IPs, resolve them from the
snapshot's metadata. Snapshots preserve pod-to-IP mappings from capture time,
so resolution is accurate even if pods have been rescheduled since.
**Tool**: `resolve_workload`
**Tool**: `list_workloads`
**Example workflow** — extract PCAP for specific workloads:
Use `list_workloads` with `name` + `namespace` for a singular lookup (works
live and against snapshots), or with `snapshot_id` + filters for a broader
scan.
1. Resolve IPs: `resolve_workload` for `orders-594487879c-7ddxf``10.0.53.101`
2. Resolve IPs: `resolve_workload` for `payment-service-6b8f9d-x2k4p``10.0.53.205`
**Example workflow — singular lookup** — extract PCAP for specific workloads:
1. Resolve IPs: `list_workloads` with `name: "orders-594487879c-7ddxf"`, `namespace: "prod"` → IPs: `["10.0.53.101"]`
2. Resolve IPs: `list_workloads` with `name: "payment-service-6b8f9d-x2k4p"`, `namespace: "prod"` → IPs: `["10.0.53.205"]`
3. Build BPF: `host 10.0.53.101 or host 10.0.53.205`
4. Export: `export_snapshot_pcap` with that BPF filter
**Example workflow — filtered scan** — extract PCAP for all workloads
matching a pattern in a snapshot:
1. List workloads: `list_workloads` with `snapshot_id`, `namespaces: ["prod"]`,
`name_regex: "payment.*"` → returns all matching workloads with their IPs
2. Collect all IPs from the response
3. Build BPF: `host 10.0.53.205 or host 10.0.53.210 or ...`
4. Export: `export_snapshot_pcap` with that BPF filter
This gives you a cluster-wide PCAP filtered to exactly the workloads involved
in the incident — ready for Wireshark or long-term storage.
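A small Go helper for step 3 of both workflows, assembling the BPF expression from the IPs collected out of `list_workloads` responses:

```go
package main

import (
	"fmt"
	"strings"
)

// bpfForHosts builds the "host A or host B or ..." BPF expression
// used to scope an export_snapshot_pcap call to specific workloads.
func bpfForHosts(ips []string) string {
	terms := make([]string, 0, len(ips))
	for _, ip := range ips {
		terms = append(terms, "host "+ip)
	}
	return strings.Join(terms, " or ")
}

func main() {
	ips := []string{"10.0.53.101", "10.0.53.205"}
	fmt.Println(bpfForHosts(ips)) // host 10.0.53.101 or host 10.0.53.205
}
```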
### IP-to-Workload Resolution
When you have an IP address (e.g., from a PCAP or L4 flow) and need to
identify the workload behind it:
**Tool**: `list_ips`
Use `list_ips` with `ip` for a singular lookup (works live and against
snapshots), or with `snapshot_id` + filters for a broader scan.
**Example — singular lookup**: `list_ips` with `ip: "10.0.53.101"`,
`snapshot_id: "snap-abc"` → returns pod/service identity for that IP.
**Example — filtered scan**: `list_ips` with `snapshot_id: "snap-abc"`,
`namespaces: ["prod"]`, `labels: {"app": "payment"}` → returns all IPs
associated with workloads matching those filters.
---
## Route 2: Dissection
@@ -232,7 +292,30 @@ KFL field names differ from what you might expect (e.g., `status_code` not
`response.status`, `src.pod.namespace` not `src.namespace`). Using incorrect
fields produces wrong results without warning.
### Activate Dissection
### Dissection Is Required — Do Not Skip This
**Any question about workloads, Kubernetes resources, services, pods, namespaces,
or API calls requires dissection.** Only the PCAP route works without it. If the
user asks anything about traffic content, API behavior, error rates, latency,
or service-to-service communication, you **must** ensure dissection is active
before attempting to answer.
**Do not wait for dissection to complete on its own — it will not start by itself.**
Follow this sequence every time before using `list_api_calls`, `get_api_call`,
or `get_api_stats`:
1. **Check status**: Call `get_snapshot_dissection_status` (or `list_snapshot_dissections`)
to see if a dissection already exists for this snapshot.
2. **If dissection exists and is completed** — proceed with your query. No further
action needed.
3. **If dissection is in progress** — wait for it to complete, then proceed.
4. **If no dissection exists** — you **must** call `start_snapshot_dissection` to
trigger it. Then monitor progress with `get_snapshot_dissection_status` until
it completes.
Never assume dissection is running. Never wait for a dissection that was not started.
The agent is responsible for triggering dissection when it is missing.
**Tool**: `start_snapshot_dissection`
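A runnable sketch of the check-start-poll sequence above; `DissectionClient`, its status strings, and the fake client in `main` are illustrative stand-ins for the MCP tools, not a real client API:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// DissectionClient abstracts the tools named above; the interface and
// the status strings are illustrative, not the real MCP API.
type DissectionClient interface {
	Status(snapshotID string) (string, error) // "none", "in_progress", "completed"
	Start(snapshotID string) error
}

// ensureDissection runs before any list_api_calls / get_api_call /
// get_api_stats query: check status, start if missing, poll to completion.
func ensureDissection(c DissectionClient, snapshotID string) error {
	status, err := c.Status(snapshotID)
	if err != nil {
		return err
	}
	if status == "none" {
		if err := c.Start(snapshotID); err != nil {
			return err
		}
	}
	deadline := time.Now().Add(5 * time.Minute)
	for time.Now().Before(deadline) {
		if status, err = c.Status(snapshotID); err != nil {
			return err
		}
		if status == "completed" {
			return nil
		}
		time.Sleep(5 * time.Second)
	}
	return errors.New("dissection did not complete in time")
}

// fakeClient simulates a snapshot with no dissection yet.
type fakeClient struct{ started bool }

func (f *fakeClient) Status(string) (string, error) {
	if !f.started {
		return "none", nil
	}
	return "completed", nil
}
func (f *fakeClient) Start(string) error { f.started = true; return nil }

func main() {
	if err := ensureDissection(&fakeClient{}, "snap-abc"); err != nil {
		panic(err)
	}
	fmt.Println("dissection ready; safe to query")
}
```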
@@ -243,6 +326,27 @@ become available:
- `get_api_call` — Drill into a specific call (headers, body, timing, payload)
- `get_api_stats` — Aggregated statistics (throughput, error rates, latency)
### Every Question Is a Query
**Every user prompt that involves APIs, workloads, services, pods, namespaces,
or Kubernetes semantics should translate into a `list_api_calls` call with an
appropriate KFL filter.** Do not answer from memory or prior results — always
run a fresh query that matches what the user is asking.
Examples of user prompts and the queries they should trigger:
| User says | Action |
|---|---|
| "Show me all 500 errors" | `list_api_calls` with KFL: `http && status_code == 500` |
| "What's hitting the payment service?" | `list_api_calls` with KFL: `dst.service.name == "payment-service"` |
| "Any DNS failures?" | `list_api_calls` with KFL: `dns && status_code != 0` |
| "Show traffic from namespace prod to staging" | `list_api_calls` with KFL: `src.pod.namespace == "prod" && dst.pod.namespace == "staging"` |
| "What are the slowest API calls?" | `list_api_calls` with KFL: `http && elapsed_time > 5000000` |
The user's natural language maps to KFL. Your job is to translate intent into
the right filter and run the query — don't summarize old results or speculate
without fresh data.
### Investigation Strategy
Start broad, then narrow:
@@ -255,16 +359,17 @@ Start broad, then narrow:
full payload to understand what went wrong.
4. Use KFL filters to slice by namespace, service, protocol, or any combination.
**Example `list_api_calls` response** (filtered to `http && status_code >= 500`):
**Example `list_api_calls` response** (filtered to `http && status_code >= 500`,
timestamps converted from UTC to local):
```
┌──────────────────────┬────────┬──────────────────────────┬────────┬───────────┐
│ Timestamp            │ Method │ URL                      │ Status │ Elapsed   │
├──────────────────────┼────────┼──────────────────────────┼────────┼───────────┤
│ 2026-03-14 17:23:45  │ POST   │ /api/v1/orders/charge    │ 503    │ 12,340 ms │
│ 2026-03-14 17:23:46  │ POST   │ /api/v1/orders/charge    │ 503    │ 11,890 ms │
│ 2026-03-14 17:23:48  │ GET    │ /api/v1/inventory/check  │ 500    │ 8,210 ms  │
│ 2026-03-14 17:24:01  │ POST   │ /api/v1/payments/process │ 502    │ 30,000 ms │
└──────────────────────┴────────┴──────────────────────────┴────────┴───────────┘
┌──────────────────────────────────────────┬────────┬──────────────────────────┬────────┬───────────┐
│ Timestamp                                │ Method │ URL                      │ Status │ Elapsed   │
├──────────────────────────────────────────┼────────┼──────────────────────────┼────────┼───────────┤
│ 2026-03-14 19:23:45 IST (17:23:45 UTC)   │ POST   │ /api/v1/orders/charge    │ 503    │ 12,340 ms │
│ 2026-03-14 19:23:46 IST (17:23:46 UTC)   │ POST   │ /api/v1/orders/charge    │ 503    │ 11,890 ms │
│ 2026-03-14 19:23:48 IST (17:23:48 UTC)   │ GET    │ /api/v1/inventory/check  │ 500    │ 8,210 ms  │
│ 2026-03-14 19:24:01 IST (17:24:01 UTC)   │ POST   │ /api/v1/payments/process │ 502    │ 30,000 ms │
└──────────────────────────────────────────┴────────┴──────────────────────────┴────────┴───────────┘
Src: api-gateway (prod) → Dst: payment-service (prod)
```
@@ -305,8 +410,9 @@ conn && conn_state == "open" && conn_local_bytes > 1000000 // High-volume conne
The two routes are complementary. A common pattern:
1. Start with **Dissection** — let the AI agent search and identify the root cause
2. Once you've pinpointed the problematic workloads, use `resolve_workload`
to get their IPs
2. Once you've pinpointed the problematic workloads, use `list_workloads`
to get their IPs (singular lookup by name+namespace, or filtered scan
by namespace/regex/labels against the snapshot)
3. Switch to **PCAP** — export a filtered PCAP of just those workloads for
Wireshark deep-dive, sharing with the network team, or compliance archival
@@ -319,7 +425,7 @@ The two routes are complementary. A common pattern:
3. `create_snapshot` covering the incident window (add 15 minutes buffer)
4. **Dissection route**: `start_snapshot_dissection``get_api_stats`
`list_api_calls``get_api_call` → follow the dependency chain
5. **PCAP route**: `resolve_workload``export_snapshot_pcap` with BPF →
5. **PCAP route**: `list_workloads``export_snapshot_pcap` with BPF →
hand off to Wireshark or archive
### Other Use Cases