Mirror of https://github.com/kubeshark/kubeshark.git, synced 2026-03-27 13:58:01 +00:00
Compare commits: master...fix-kfl-la (1 commit)
| Author | SHA1 | Date | |
|---|---|---|---|
| | 37f8becb7b | | |
36
README.md
@@ -17,13 +17,17 @@
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
Kubeshark indexes cluster-wide network traffic at the kernel level using eBPF — delivering instant answers to any query using network, API, and Kubernetes semantics.
|
Kubeshark captures cluster-wide network traffic at the speed and scale of Kubernetes, continuously, at the kernel level using eBPF. It consolidates a highly fragmented picture — dozens of nodes, thousands of workloads, millions of connections — into a single, queryable view with full Kubernetes and API context.
|
||||||
|
|
||||||
**What you can do:**
|
Network data is available to **AI agents via [MCP](https://docs.kubeshark.com/en/mcp)** and to **human operators via a [dashboard](https://docs.kubeshark.com/en/v2)**.
|
||||||
|
|
||||||
- **Download Retrospective PCAPs** — cluster-wide packet captures filtered by nodes, time, workloads, and IPs. Store PCAPs for long-term retention and later investigation.
|
**What's captured, cluster-wide:**
|
||||||
- **Visualize Network Data** — explore traffic matching queries with API, Kubernetes, or network semantics through a real-time dashboard.
|
|
||||||
- **Integrate with AI** — connect your favorite AI assistant (e.g. Claude, Copilot) to include network data in AI-driven workflows like incident response and root cause analysis.
|
- **L4 Packets & TCP Metrics** — retransmissions, RTT, window saturation, connection lifecycle, packet loss across every node-to-node path ([TCP insights →](https://docs.kubeshark.com/en/mcp/tcp_insights))
|
||||||
|
- **L7 API Calls** — real-time request/response matching with full payload parsing: HTTP, gRPC, GraphQL, Redis, Kafka, DNS ([API dissection →](https://docs.kubeshark.com/en/v2/l7_api_dissection))
|
||||||
|
- **Decrypted TLS** — eBPF-based TLS decryption without key management
|
||||||
|
- **Kubernetes Context** — every packet and API call resolved to pod, service, namespace, and node
|
||||||
|
- **PCAP Retention** — point-in-time raw packet snapshots, exportable for Wireshark ([Snapshots →](https://docs.kubeshark.com/en/v2/traffic_snapshots))
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
@@ -34,12 +38,9 @@ Kubeshark indexes cluster-wide network traffic at the kernel level using eBPF
|
|||||||
```bash
|
```bash
|
||||||
helm repo add kubeshark https://helm.kubeshark.com
|
helm repo add kubeshark https://helm.kubeshark.com
|
||||||
helm install kubeshark kubeshark/kubeshark
|
helm install kubeshark kubeshark/kubeshark
|
||||||
kubectl port-forward svc/kubeshark-front 8899:80
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Open `http://localhost:8899` in your browser. You're capturing traffic.
|
Dashboard opens automatically. You're capturing traffic.
|
||||||
|
|
||||||
> For production use, we recommend using an [ingress controller](https://docs.kubeshark.com/en/ingress) instead of port-forward.
|
|
||||||
|
|
||||||
**Connect an AI agent** via MCP:
|
**Connect an AI agent** via MCP:
|
||||||
|
|
||||||
@@ -52,9 +53,9 @@ claude mcp add kubeshark -- kubeshark mcp
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### Network Data for AI Agents
|
### AI-Powered Network Analysis
|
||||||
|
|
||||||
Kubeshark exposes cluster-wide network data via [MCP](https://docs.kubeshark.com/en/mcp) — enabling AI agents to query traffic, investigate API calls, and perform root cause analysis through natural language.
|
Kubeshark exposes all cluster-wide network data via MCP (Model Context Protocol). AI agents can query L4 metrics, investigate L7 API calls, analyze traffic patterns, and run root cause analysis — through natural language. Use cases include incident response, root cause analysis, troubleshooting, debugging, and reliability workflows.
|
||||||
|
|
||||||
> *"Why did checkout fail at 2:15 PM?"*
|
> *"Why did checkout fail at 2:15 PM?"*
|
||||||
> *"Which services have error rates above 1%?"*
|
> *"Which services have error rates above 1%?"*
|
||||||
@@ -69,25 +70,25 @@ Works with Claude Code, Cursor, and any MCP-compatible AI.
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### Network Traffic Indexing
|
### L7 API Dissection
|
||||||
|
|
||||||
Kubeshark indexes cluster-wide network traffic by parsing it according to protocol specifications, with support for HTTP, gRPC, Redis, Kafka, DNS, and more. This enables queries using Kubernetes semantics (e.g. pod, namespace, node), API semantics (e.g. path, headers, status), and network semantics (e.g. IP, port). No code instrumentation required.
|
Cluster-wide request/response matching with full payloads, parsed according to protocol specifications. HTTP, gRPC, Redis, Kafka, DNS, and more. Every API call resolved to source and destination pod, service, namespace, and node. No code instrumentation required.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
[Learn more →](https://docs.kubeshark.com/en/v2/l7_api_dissection)
|
[Learn more →](https://docs.kubeshark.com/en/v2/l7_api_dissection)
|
||||||
|
|
||||||
### Workload Dependency Map
|
### L4/L7 Workload Map
|
||||||
|
|
||||||
A visual map of how workloads communicate, showing dependencies, traffic volume, and protocol usage across the cluster.
|
Cluster-wide view of service communication: dependencies, traffic flow, and anomalies across all nodes and namespaces.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
[Learn more →](https://docs.kubeshark.com/en/v2/service_map)
|
[Learn more →](https://docs.kubeshark.com/en/v2/service_map)
|
||||||
|
|
||||||
### Traffic Retention & PCAP Export
|
### Traffic Retention
|
||||||
|
|
||||||
Capture and retain raw network traffic cluster-wide. Download PCAPs scoped by time range, nodes, workloads, and IPs — ready for Wireshark or any PCAP-compatible tool.
|
Continuous raw packet capture with point-in-time snapshots. Export PCAP files for offline analysis with Wireshark or other tools.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
@@ -99,6 +100,7 @@ Capture and retain raw network traffic cluster-wide. Download PCAPs scoped by ti
|
|||||||
|
|
||||||
| Feature | Description |
|
| Feature | Description |
|
||||||
|---------|-------------|
|
|---------|-------------|
|
||||||
|
| [**Raw Capture**](https://docs.kubeshark.com/en/v2/raw_capture) | Continuous cluster-wide packet capture with minimal overhead |
|
||||||
| [**Traffic Snapshots**](https://docs.kubeshark.com/en/v2/traffic_snapshots) | Point-in-time snapshots, export as PCAP for Wireshark |
|
| [**Traffic Snapshots**](https://docs.kubeshark.com/en/v2/traffic_snapshots) | Point-in-time snapshots, export as PCAP for Wireshark |
|
||||||
| [**L7 API Dissection**](https://docs.kubeshark.com/en/v2/l7_api_dissection) | Request/response matching with full payloads and protocol parsing |
|
| [**L7 API Dissection**](https://docs.kubeshark.com/en/v2/l7_api_dissection) | Request/response matching with full payloads and protocol parsing |
|
||||||
| [**Protocol Support**](https://docs.kubeshark.com/en/protocols) | HTTP, gRPC, GraphQL, Redis, Kafka, DNS, and more |
|
| [**Protocol Support**](https://docs.kubeshark.com/en/protocols) | HTTP, gRPC, GraphQL, Redis, Kafka, DNS, and more |
|
||||||
|
|||||||
@@ -40,11 +40,9 @@ type Readiness struct {
|
|||||||
}
|
}
|
||||||
|
|
||||||
var ready *Readiness
|
var ready *Readiness
|
||||||
var proxyOnce sync.Once
|
|
||||||
|
|
||||||
func tap() {
|
func tap() {
|
||||||
ready = &Readiness{}
|
ready = &Readiness{}
|
||||||
proxyOnce = sync.Once{}
|
|
||||||
state.startTime = time.Now()
|
state.startTime = time.Now()
|
||||||
log.Info().Str("registry", config.Config.Tap.Docker.Registry).Str("tag", config.Config.Tap.Docker.Tag).Msg("Using Docker:")
|
log.Info().Str("registry", config.Config.Tap.Docker.Registry).Str("tag", config.Config.Tap.Docker.Tag).Msg("Using Docker:")
|
||||||
|
|
||||||
@@ -149,21 +147,11 @@ func printNoPodsFoundSuggestion(targetNamespaces []string) {
|
|||||||
log.Warn().Msg(fmt.Sprintf("Did not find any currently running pods that match the regex argument, %s will automatically target matching pods if any are created later%s", misc.Software, suggestionStr))
|
log.Warn().Msg(fmt.Sprintf("Did not find any currently running pods that match the regex argument, %s will automatically target matching pods if any are created later%s", misc.Software, suggestionStr))
|
||||||
}
|
}
|
||||||
|
|
||||||
func isPodReady(pod *core.Pod) bool {
|
|
||||||
for _, condition := range pod.Status.Conditions {
|
|
||||||
if condition.Type == core.PodReady {
|
|
||||||
return condition.Status == core.ConditionTrue
|
|
||||||
}
|
|
||||||
}
|
|
||||||
return false
|
|
||||||
}
|
|
||||||
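The two sides of this hunk gate on different signals: the left-hand side checks the `PodReady` condition through `isPodReady`, while the right-hand side checks only `Status.Phase == core.PodRunning`. A pod can report the Running phase before its readiness probes pass. The sketch below shows how the two checks can disagree. It uses small stand-in types, which are assumptions that mirror only the fields used here and are not the real `k8s.io/api/core/v1` package.

```go
package main

import "fmt"

// Stand-in types: hypothetical, minimal mirrors of the Kubernetes pod status
// fields referenced in the hunk above. They are not the real core/v1 types.
type PodCondition struct {
	Type   string
	Status string
}

type PodStatus struct {
	Phase      string
	Conditions []PodCondition
}

// isReady reports whether the Ready condition is present and true,
// following the shape of isPodReady in the diff.
func isReady(s PodStatus) bool {
	for _, c := range s.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

// isRunning reports whether the pod phase is Running.
func isRunning(s PodStatus) bool {
	return s.Phase == "Running"
}

func main() {
	// Containers have started, but the readiness probe has not passed yet.
	starting := PodStatus{
		Phase:      "Running",
		Conditions: []PodCondition{{Type: "Ready", Status: "False"}},
	}
	fmt.Println(isRunning(starting), isReady(starting)) // true false
}
```

With the phase-only check, a watcher would log "Ready." for the `starting` pod above; with the condition check it would keep waiting.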
|
|
||||||
func watchHubPod(ctx context.Context, kubernetesProvider *kubernetes.Provider, cancel context.CancelFunc) {
|
func watchHubPod(ctx context.Context, kubernetesProvider *kubernetes.Provider, cancel context.CancelFunc) {
|
||||||
podExactRegex := regexp.MustCompile(fmt.Sprintf("^%s", kubernetes.HubPodName))
|
podExactRegex := regexp.MustCompile(fmt.Sprintf("^%s", kubernetes.HubPodName))
|
||||||
podWatchHelper := kubernetes.NewPodWatchHelper(kubernetesProvider, podExactRegex)
|
podWatchHelper := kubernetes.NewPodWatchHelper(kubernetesProvider, podExactRegex)
|
||||||
eventChan, errorChan := kubernetes.FilteredWatch(ctx, podWatchHelper, []string{config.Config.Tap.Release.Namespace}, podWatchHelper)
|
eventChan, errorChan := kubernetes.FilteredWatch(ctx, podWatchHelper, []string{config.Config.Tap.Release.Namespace}, podWatchHelper)
|
||||||
podReady := false
|
isPodReady := false
|
||||||
podRunning := false
|
|
||||||
|
|
||||||
timeAfter := time.After(120 * time.Second)
|
timeAfter := time.After(120 * time.Second)
|
||||||
for {
|
for {
|
||||||
@@ -195,30 +183,26 @@ func watchHubPod(ctx context.Context, kubernetesProvider *kubernetes.Provider, c
|
|||||||
Interface("containers-statuses", modifiedPod.Status.ContainerStatuses).
|
Interface("containers-statuses", modifiedPod.Status.ContainerStatuses).
|
||||||
Msg("Watching pod.")
|
Msg("Watching pod.")
|
||||||
|
|
||||||
if isPodReady(modifiedPod) && !podReady {
|
if modifiedPod.Status.Phase == core.PodRunning && !isPodReady {
|
||||||
podReady = true
|
isPodReady = true
|
||||||
|
|
||||||
ready.Lock()
|
ready.Lock()
|
||||||
ready.Hub = true
|
ready.Hub = true
|
||||||
ready.Unlock()
|
ready.Unlock()
|
||||||
log.Info().Str("pod", kubernetes.HubPodName).Msg("Ready.")
|
log.Info().Str("pod", kubernetes.HubPodName).Msg("Ready.")
|
||||||
} else if modifiedPod.Status.Phase == core.PodRunning && !podRunning {
|
|
||||||
podRunning = true
|
|
||||||
log.Info().Str("pod", kubernetes.HubPodName).Msg("Waiting for readiness...")
|
|
||||||
}
|
}
|
||||||
|
|
||||||
ready.Lock()
|
ready.Lock()
|
||||||
|
proxyDone := ready.Proxy
|
||||||
hubPodReady := ready.Hub
|
hubPodReady := ready.Hub
|
||||||
frontPodReady := ready.Front
|
frontPodReady := ready.Front
|
||||||
ready.Unlock()
|
ready.Unlock()
|
||||||
|
|
||||||
if hubPodReady && frontPodReady {
|
if !proxyDone && hubPodReady && frontPodReady {
|
||||||
proxyOnce.Do(func() {
|
ready.Lock()
|
||||||
ready.Lock()
|
ready.Proxy = true
|
||||||
ready.Proxy = true
|
ready.Unlock()
|
||||||
ready.Unlock()
|
postFrontStarted(ctx, kubernetesProvider, cancel)
|
||||||
postFrontStarted(ctx, kubernetesProvider, cancel)
|
|
||||||
})
|
|
||||||
}
|
}
|
||||||
case kubernetes.EventBookmark:
|
case kubernetes.EventBookmark:
|
||||||
break
|
break
|
||||||
@@ -239,7 +223,7 @@ func watchHubPod(ctx context.Context, kubernetesProvider *kubernetes.Provider, c
|
|||||||
cancel()
|
cancel()
|
||||||
|
|
||||||
case <-timeAfter:
|
case <-timeAfter:
|
||||||
if !podReady {
|
if !isPodReady {
|
||||||
log.Error().
|
log.Error().
|
||||||
Str("pod", kubernetes.HubPodName).
|
Str("pod", kubernetes.HubPodName).
|
||||||
Msg("Pod was not ready in time.")
|
Msg("Pod was not ready in time.")
|
||||||
@@ -258,8 +242,7 @@ func watchFrontPod(ctx context.Context, kubernetesProvider *kubernetes.Provider,
|
|||||||
podExactRegex := regexp.MustCompile(fmt.Sprintf("^%s", kubernetes.FrontPodName))
|
podExactRegex := regexp.MustCompile(fmt.Sprintf("^%s", kubernetes.FrontPodName))
|
||||||
podWatchHelper := kubernetes.NewPodWatchHelper(kubernetesProvider, podExactRegex)
|
podWatchHelper := kubernetes.NewPodWatchHelper(kubernetesProvider, podExactRegex)
|
||||||
eventChan, errorChan := kubernetes.FilteredWatch(ctx, podWatchHelper, []string{config.Config.Tap.Release.Namespace}, podWatchHelper)
|
eventChan, errorChan := kubernetes.FilteredWatch(ctx, podWatchHelper, []string{config.Config.Tap.Release.Namespace}, podWatchHelper)
|
||||||
podReady := false
|
isPodReady := false
|
||||||
podRunning := false
|
|
||||||
|
|
||||||
timeAfter := time.After(120 * time.Second)
|
timeAfter := time.After(120 * time.Second)
|
||||||
for {
|
for {
|
||||||
@@ -291,29 +274,25 @@ func watchFrontPod(ctx context.Context, kubernetesProvider *kubernetes.Provider,
|
|||||||
Interface("containers-statuses", modifiedPod.Status.ContainerStatuses).
|
Interface("containers-statuses", modifiedPod.Status.ContainerStatuses).
|
||||||
Msg("Watching pod.")
|
Msg("Watching pod.")
|
||||||
|
|
||||||
if isPodReady(modifiedPod) && !podReady {
|
if modifiedPod.Status.Phase == core.PodRunning && !isPodReady {
|
||||||
podReady = true
|
isPodReady = true
|
||||||
ready.Lock()
|
ready.Lock()
|
||||||
ready.Front = true
|
ready.Front = true
|
||||||
ready.Unlock()
|
ready.Unlock()
|
||||||
log.Info().Str("pod", kubernetes.FrontPodName).Msg("Ready.")
|
log.Info().Str("pod", kubernetes.FrontPodName).Msg("Ready.")
|
||||||
} else if modifiedPod.Status.Phase == core.PodRunning && !podRunning {
|
|
||||||
podRunning = true
|
|
||||||
log.Info().Str("pod", kubernetes.FrontPodName).Msg("Waiting for readiness...")
|
|
||||||
}
|
}
|
||||||
|
|
||||||
ready.Lock()
|
ready.Lock()
|
||||||
|
proxyDone := ready.Proxy
|
||||||
hubPodReady := ready.Hub
|
hubPodReady := ready.Hub
|
||||||
frontPodReady := ready.Front
|
frontPodReady := ready.Front
|
||||||
ready.Unlock()
|
ready.Unlock()
|
||||||
|
|
||||||
if hubPodReady && frontPodReady {
|
if !proxyDone && hubPodReady && frontPodReady {
|
||||||
proxyOnce.Do(func() {
|
ready.Lock()
|
||||||
ready.Lock()
|
ready.Proxy = true
|
||||||
ready.Proxy = true
|
ready.Unlock()
|
||||||
ready.Unlock()
|
postFrontStarted(ctx, kubernetesProvider, cancel)
|
||||||
postFrontStarted(ctx, kubernetesProvider, cancel)
|
|
||||||
})
|
|
||||||
}
|
}
|
||||||
case kubernetes.EventBookmark:
|
case kubernetes.EventBookmark:
|
||||||
break
|
break
|
||||||
@@ -333,7 +312,7 @@ func watchFrontPod(ctx context.Context, kubernetesProvider *kubernetes.Provider,
|
|||||||
Msg("Failed creating pod.")
|
Msg("Failed creating pod.")
|
||||||
|
|
||||||
case <-timeAfter:
|
case <-timeAfter:
|
||||||
if !podReady {
|
if !isPodReady {
|
||||||
log.Error().
|
log.Error().
|
||||||
Str("pod", kubernetes.FrontPodName).
|
Str("pod", kubernetes.FrontPodName).
|
||||||
Msg("Pod was not ready in time.")
|
Msg("Pod was not ready in time.")
|
||||||
@@ -450,6 +429,9 @@ func postFrontStarted(ctx context.Context, kubernetesProvider *kubernetes.Provid
|
|||||||
watchScripts(ctx, kubernetesProvider, false)
|
watchScripts(ctx, kubernetesProvider, false)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if config.Config.Scripting.Console {
|
||||||
|
go runConsoleWithoutProxy()
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
func updateConfig(kubernetesProvider *kubernetes.Provider) {
|
func updateConfig(kubernetesProvider *kubernetes.Provider) {
|
||||||
|
|||||||
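In the Go hunks above, both the hub watcher and the front watcher evaluate "hub and front ready" and then call `postFrontStarted`. The left-hand side guards that call with `proxyOnce.Do`; the right-hand side reads `ready.Proxy` under the lock, releases the lock, and sets the flag in a second critical section. If the two watchers run in separate goroutines (an assumption, since the call sites are not part of this diff), a check and a set in two separate steps can let both pass, whereas `sync.Once` cannot. A minimal sketch of the `sync.Once` behavior, with illustrative names that are not taken from the repository:

```go
package main

import (
	"fmt"
	"sync"
)

// startOnce runs start at most once no matter how many goroutines call it.
func startOnce(once *sync.Once, start func()) {
	once.Do(start)
}

// countStarts simulates n watchers that all observe "hub and front ready"
// at about the same moment and try to start the proxy.
func countStarts(n int) int {
	var (
		once   sync.Once
		mu     sync.Mutex
		starts int
		wg     sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			startOnce(&once, func() {
				mu.Lock()
				starts++
				mu.Unlock()
			})
		}()
	}
	wg.Wait()
	return starts
}

func main() {
	fmt.Println(countStarts(8)) // 1
}
```

A flag-based guard can give the same guarantee only if the read of the flag and the write that sets it happen inside one critical section.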
@@ -70,7 +70,7 @@ spec:
|
|||||||
value: '{{- if and (not .Values.demoModeEnabled) (not .Values.tap.capture.dissection.enabled) -}}
|
value: '{{- if and (not .Values.demoModeEnabled) (not .Values.tap.capture.dissection.enabled) -}}
|
||||||
true
|
true
|
||||||
{{- else -}}
|
{{- else -}}
|
||||||
{{ (default false .Values.demoModeEnabled) | ternary false true }}
|
{{ not (default false .Values.demoModeEnabled) | ternary false true }}
|
||||||
{{- end -}}'
|
{{- end -}}'
|
||||||
- name: 'REACT_APP_CLOUD_LICENSE_ENABLED'
|
- name: 'REACT_APP_CLOUD_LICENSE_ENABLED'
|
||||||
value: '{{- if or (and .Values.cloudLicenseEnabled (not (empty .Values.license))) (not .Values.internetConnectivity) -}}
|
value: '{{- if or (and .Values.cloudLicenseEnabled (not (empty .Values.license))) (not .Values.internetConnectivity) -}}
|
||||||
|
|||||||
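The only change in this hunk is the added `not` in front of `(default false .Values.demoModeEnabled)`, which flips the value fed into `ternary false true`. The sketch below evaluates both expressions with Go's `text/template`. The `ternary` and `default` functions are bool-only stand-ins written for this example under the assumption that they match the Sprig helpers Helm provides; `.Demo` is a hypothetical substitute for `.Values.demoModeEnabled`.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// funcs holds bool-only stand-ins for the Sprig helpers used by the chart.
// They are assumptions for this sketch, not Sprig itself: ternary returns vt
// when v is true and vf otherwise; default returns d when given is false.
var funcs = template.FuncMap{
	"ternary": func(vt, vf, v bool) bool {
		if v {
			return vt
		}
		return vf
	},
	"default": func(d, given bool) bool {
		if !given {
			return d
		}
		return given
	},
}

// render evaluates expr with .Demo set to demo and returns the output.
func render(expr string, demo bool) string {
	t := template.Must(template.New("e").Funcs(funcs).Parse(expr))
	var buf bytes.Buffer
	if err := t.Execute(&buf, map[string]bool{"Demo": demo}); err != nil {
		panic(err)
	}
	return buf.String()
}

const (
	leftExpr  = `{{ (default false .Demo) | ternary false true }}`
	rightExpr = `{{ not (default false .Demo) | ternary false true }}`
)

func main() {
	for _, demo := range []bool{false, true} {
		fmt.Printf("demo=%v left=%s right=%s\n",
			demo, render(leftExpr, demo), render(rightExpr, demo))
	}
}
```

Under these stand-in semantics the two expressions always produce opposite values, so the `else` branch of the environment variable is inverted between the two sides of the diff.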
@@ -125,13 +125,26 @@ Match against any direction (src or dst):
|
|||||||
### Labels and Annotations
|
### Labels and Annotations
|
||||||
|
|
||||||
```
|
```
|
||||||
map_get(local_labels, "app", "") == "checkout" // Safe access with default
|
// Direct access — works when the label is expected to exist
|
||||||
|
local_labels.app == "payment" || remote_labels.app == "payment"
|
||||||
|
|
||||||
|
// Safe access with default — use when the label may not exist
|
||||||
|
map_get(local_labels, "app", "") == "checkout"
|
||||||
map_get(remote_labels, "version", "") == "canary"
|
map_get(remote_labels, "version", "") == "canary"
|
||||||
"tier" in local_labels // Label existence check
|
|
||||||
|
// Label existence check
|
||||||
|
"tier" in local_labels
|
||||||
```
|
```
|
||||||
|
|
||||||
Always use `map_get()` for labels and annotations — direct access like
|
Direct access (`local_labels.app`) returns an error if the key doesn't exist.
|
||||||
`local_labels["app"]` errors if the key doesn't exist.
|
Use `map_get()` when you're not sure the label is present on all workloads.
|
||||||
|
|
||||||
|
Queries can be as complex as needed — combine labels with any other fields.
|
||||||
|
Responses are fast because all API elements are indexed:
|
||||||
|
|
||||||
|
```
|
||||||
|
local_labels.app == "payment" && http && status_code >= 500 && dst.pod.namespace == "production"
|
||||||
|
```
|
||||||
|
|
||||||
### Node and Process
|
### Node and Process
|
||||||
|
|
||||||
|
|||||||
@@ -29,31 +29,6 @@ Unlike real-time monitoring, retrospective analysis lets you go back in time:
|
|||||||
reconstruct what happened, compare against known-good baselines, and pinpoint
|
reconstruct what happened, compare against known-good baselines, and pinpoint
|
||||||
root causes with full L4/L7 visibility.
|
root causes with full L4/L7 visibility.
|
||||||
|
|
||||||
## Timezone Handling
|
|
||||||
|
|
||||||
All timestamps presented to the user **must use the local timezone** of the environment
|
|
||||||
where the agent is running. Users think in local time ("this happened around 3pm"), and
|
|
||||||
UTC-only output adds friction during incident response when speed matters.
|
|
||||||
|
|
||||||
### Rules
|
|
||||||
|
|
||||||
1. **Detect the local timezone** at the start of every investigation. Use the system
|
|
||||||
clock or environment (e.g., `date +%Z` or equivalent) to determine the timezone.
|
|
||||||
2. **Present local time as the primary reference** in all output — summaries, event
|
|
||||||
correlations, time-range references, and tables.
|
|
||||||
3. **Show UTC in parentheses** for clarity, e.g., `15:03:22 IST (12:03:22 UTC)`.
|
|
||||||
4. **Convert tool responses** — Kubeshark MCP tools return timestamps in UTC. Always
|
|
||||||
convert these to local time before presenting to the user.
|
|
||||||
5. **Use local time in natural language** — when describing events, say "the spike at
|
|
||||||
3:23 PM" not "the spike at 12:23 UTC".
|
|
||||||
|
|
||||||
### Snapshot Creation
|
|
||||||
|
|
||||||
When creating snapshots, Kubeshark MCP tools accept UTC timestamps. Convert the user's
|
|
||||||
local time references to UTC before passing them to tools like `create_snapshot` or
|
|
||||||
`export_snapshot_pcap`. Confirm the converted window with the user if there's any
|
|
||||||
ambiguity.
|
|
||||||
|
|
||||||
## Prerequisites
|
## Prerequisites
|
||||||
|
|
||||||
Before starting any analysis, verify the environment is ready.
|
Before starting any analysis, verify the environment is ready.
|
||||||
@@ -128,11 +103,6 @@ Both routes are valid and complementary. Use PCAP when you need raw packets
|
|||||||
for human analysis or compliance. Use Dissection when you want an AI agent
|
for human analysis or compliance. Use Dissection when you want an AI agent
|
||||||
to search and analyze traffic programmatically.
|
to search and analyze traffic programmatically.
|
||||||
|
|
||||||
**Default to Dissection.** Unless the user explicitly asks for a PCAP file or
|
|
||||||
Wireshark export, assume Dissection is needed. Any question about workloads,
|
|
||||||
APIs, services, pods, error rates, latency, or traffic patterns requires
|
|
||||||
dissected data.
|
|
||||||
|
|
||||||
## Snapshot Operations
|
## Snapshot Operations
|
||||||
|
|
||||||
Both routes start here. A snapshot is an immutable freeze of all cluster traffic
|
Both routes start here. A snapshot is an immutable freeze of all cluster traffic
|
||||||
@@ -146,19 +116,19 @@ Check what raw capture data exists across the cluster. You can only create
|
|||||||
snapshots within these boundaries — data outside the window has been rotated
|
snapshots within these boundaries — data outside the window has been rotated
|
||||||
out of the FIFO buffer.
|
out of the FIFO buffer.
|
||||||
|
|
||||||
**Example response** (raw tool output is in UTC — convert to local time before presenting):
|
**Example response**:
|
||||||
```
|
```
|
||||||
Cluster-wide:
|
Cluster-wide:
|
||||||
Oldest: 2026-03-14 18:12:34 IST (16:12:34 UTC)
|
Oldest: 2026-03-14 16:12:34 UTC
|
||||||
Newest: 2026-03-14 20:05:20 IST (18:05:20 UTC)
|
Newest: 2026-03-14 18:05:20 UTC
|
||||||
|
|
||||||
Per node:
|
Per node:
|
||||||
┌─────────────────────────────┬───────────────────────────────┬───────────────────────────────┐
|
┌─────────────────────────────┬──────────┬──────────┐
|
||||||
│ Node │ Oldest │ Newest │
|
│ Node │ Oldest │ Newest │
|
||||||
├─────────────────────────────┼───────────────────────────────┼───────────────────────────────┤
|
├─────────────────────────────┼──────────┼──────────┤
|
||||||
│ ip-10-0-25-170.ec2.internal │ 18:12:34 IST (16:12:34 UTC) │ 20:03:39 IST (18:03:39 UTC) │
|
│ ip-10-0-25-170.ec2.internal │ 16:12:34 │ 18:03:39 │
|
||||||
│ ip-10-0-32-115.ec2.internal │ 18:13:45 IST (16:13:45 UTC) │ 20:05:20 IST (18:05:20 UTC) │
|
│ ip-10-0-32-115.ec2.internal │ 16:13:45 │ 18:05:20 │
|
||||||
└─────────────────────────────┴───────────────────────────────┴───────────────────────────────┘
|
└─────────────────────────────┴──────────┴──────────┘
|
||||||
```
|
```
|
||||||
|
|
||||||
If the incident falls outside the available window, the data has been rotated
|
If the incident falls outside the available window, the data has been rotated
|
||||||
@@ -262,30 +232,7 @@ KFL field names differ from what you might expect (e.g., `status_code` not
|
|||||||
`response.status`, `src.pod.namespace` not `src.namespace`). Using incorrect
|
`response.status`, `src.pod.namespace` not `src.namespace`). Using incorrect
|
||||||
fields produces wrong results without warning.
|
fields produces wrong results without warning.
|
||||||
|
|
||||||
### Dissection Is Required — Do Not Skip This
|
### Activate Dissection
|
||||||
|
|
||||||
**Any question about workloads, Kubernetes resources, services, pods, namespaces,
|
|
||||||
or API calls requires dissection.** Only the PCAP route works without it. If the
|
|
||||||
user asks anything about traffic content, API behavior, error rates, latency,
|
|
||||||
or service-to-service communication, you **must** ensure dissection is active
|
|
||||||
before attempting to answer.
|
|
||||||
|
|
||||||
**Do not wait for dissection to complete on its own — it will not start by itself.**
|
|
||||||
|
|
||||||
Follow this sequence every time before using `list_api_calls`, `get_api_call`,
|
|
||||||
or `get_api_stats`:
|
|
||||||
|
|
||||||
1. **Check status**: Call `get_snapshot_dissection_status` (or `list_snapshot_dissections`)
|
|
||||||
to see if a dissection already exists for this snapshot.
|
|
||||||
2. **If dissection exists and is completed** — proceed with your query. No further
|
|
||||||
action needed.
|
|
||||||
3. **If dissection is in progress** — wait for it to complete, then proceed.
|
|
||||||
4. **If no dissection exists** — you **must** call `start_snapshot_dissection` to
|
|
||||||
trigger it. Then monitor progress with `get_snapshot_dissection_status` until
|
|
||||||
it completes.
|
|
||||||
|
|
||||||
Never assume dissection is running. Never wait for a dissection that was not started.
|
|
||||||
The agent is responsible for triggering dissection when it is missing.
|
|
||||||
|
|
||||||
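The check, start, and wait sequence above can be sketched as a small loop. The `Dissector` interface is a hypothetical wrapper around the `get_snapshot_dissection_status` and `start_snapshot_dissection` tools, and the status strings are placeholders, since the source does not specify what the real tools return.

```go
package main

import (
	"errors"
	"fmt"
)

// Status values are placeholders for whatever the real tools report.
type Status string

const (
	StatusNone       Status = "none"
	StatusInProgress Status = "in_progress"
	StatusCompleted  Status = "completed"
)

// Dissector is a hypothetical wrapper around the MCP tools named above.
type Dissector interface {
	GetStatus(snapshot string) (Status, error) // get_snapshot_dissection_status
	Start(snapshot string) error               // start_snapshot_dissection
}

// ensureDissected checks status, starts dissection only when none exists,
// and polls until it completes or maxPolls is exhausted.
func ensureDissected(d Dissector, snapshot string, maxPolls int) error {
	for i := 0; i < maxPolls; i++ {
		st, err := d.GetStatus(snapshot)
		if err != nil {
			return err
		}
		switch st {
		case StatusCompleted:
			return nil
		case StatusNone:
			if err := d.Start(snapshot); err != nil {
				return err
			}
		}
		// A real agent would wait between polls; omitted to keep this short.
	}
	return errors.New("dissection did not complete in time")
}

// fake reports completion on the second status check after Start.
type fake struct {
	started bool
	polls   int
}

func (f *fake) GetStatus(string) (Status, error) {
	if !f.started {
		return StatusNone, nil
	}
	f.polls++
	if f.polls >= 2 {
		return StatusCompleted, nil
	}
	return StatusInProgress, nil
}

func (f *fake) Start(string) error {
	f.started = true
	return nil
}

func main() {
	f := &fake{}
	fmt.Println(ensureDissected(f, "snap-1", 10), f.started) // <nil> true
}
```

The loop never waits on a dissection it did not observe or start, which is the failure mode the rules above warn about.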
**Tool**: `start_snapshot_dissection`
|
**Tool**: `start_snapshot_dissection`
|
||||||
|
|
||||||
@@ -296,27 +243,6 @@ become available:
|
|||||||
- `get_api_call` — Drill into a specific call (headers, body, timing, payload)
|
- `get_api_call` — Drill into a specific call (headers, body, timing, payload)
|
||||||
- `get_api_stats` — Aggregated statistics (throughput, error rates, latency)
|
- `get_api_stats` — Aggregated statistics (throughput, error rates, latency)
|
||||||
|
|
||||||
### Every Question Is a Query
|
|
||||||
|
|
||||||
**Every user prompt that involves APIs, workloads, services, pods, namespaces,
|
|
||||||
or Kubernetes semantics should translate into a `list_api_calls` call with an
|
|
||||||
appropriate KFL filter.** Do not answer from memory or prior results — always
|
|
||||||
run a fresh query that matches what the user is asking.
|
|
||||||
|
|
||||||
Examples of user prompts and the queries they should trigger:
|
|
||||||
|
|
||||||
| User says | Action |
|
|
||||||
|---|---|
|
|
||||||
| "Show me all 500 errors" | `list_api_calls` with KFL: `http && status_code == 500` |
|
|
||||||
| "What's hitting the payment service?" | `list_api_calls` with KFL: `dst.service.name == "payment-service"` |
|
|
||||||
| "Any DNS failures?" | `list_api_calls` with KFL: `dns && status_code != 0` |
|
|
||||||
| "Show traffic from namespace prod to staging" | `list_api_calls` with KFL: `src.pod.namespace == "prod" && dst.pod.namespace == "staging"` |
|
|
||||||
| "What are the slowest API calls?" | `list_api_calls` with KFL: `http && elapsed_time > 5000000` |
|
|
||||||
|
|
||||||
The user's natural language maps to KFL. Your job is to translate intent into
|
|
||||||
the right filter and run the query — don't summarize old results or speculate
|
|
||||||
without fresh data.
|
|
||||||
|
|
||||||
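Translating intent into a filter is mostly string composition. The sketch below builds two of the filters from the table with hypothetical helpers; it assumes that Go's `%q` quoting is acceptable for simple KFL string literals, which the source does not confirm.

```go
package main

import (
	"fmt"
	"strings"
)

// allOf joins non-empty KFL clauses with &&.
func allOf(clauses ...string) string {
	kept := make([]string, 0, len(clauses))
	for _, c := range clauses {
		if strings.TrimSpace(c) != "" {
			kept = append(kept, c)
		}
	}
	return strings.Join(kept, " && ")
}

// eq builds `field == "value"`; the quoting rule is an assumption.
func eq(field, value string) string {
	return fmt.Sprintf("%s == %q", field, value)
}

func main() {
	// "Show me all 500 errors"
	fmt.Println(allOf("http", "status_code == 500"))
	// prints: http && status_code == 500

	// "Show traffic from namespace prod to staging"
	fmt.Println(allOf(eq("src.pod.namespace", "prod"), eq("dst.pod.namespace", "staging")))
	// prints: src.pod.namespace == "prod" && dst.pod.namespace == "staging"
}
```

The resulting string would then be passed as the KFL filter argument of `list_api_calls`.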
### Investigation Strategy
|
### Investigation Strategy
|
||||||
|
|
||||||
Start broad, then narrow:
|
Start broad, then narrow:
|
||||||
@@ -329,17 +255,16 @@ Start broad, then narrow:
|
|||||||
full payload to understand what went wrong.
|
full payload to understand what went wrong.
|
||||||
4. Use KFL filters to slice by namespace, service, protocol, or any combination.
|
4. Use KFL filters to slice by namespace, service, protocol, or any combination.
|
||||||
|
|
||||||
**Example `list_api_calls` response** (filtered to `http && status_code >= 500`,
|
**Example `list_api_calls` response** (filtered to `http && status_code >= 500`):
|
||||||
timestamps converted from UTC to local):
|
|
||||||
```
|
```
|
||||||
┌──────────────────────────────────────────┬────────┬──────────────────────────┬────────┬───────────┐
|
┌──────────────────────┬────────┬──────────────────────────┬────────┬───────────┐
|
||||||
│ Timestamp │ Method │ URL │ Status │ Elapsed │
|
│ Timestamp │ Method │ URL │ Status │ Elapsed │
|
||||||
├──────────────────────────────────────────┼────────┼──────────────────────────┼────────┼───────────┤
|
├──────────────────────┼────────┼──────────────────────────┼────────┼───────────┤
|
||||||
│ 2026-03-14 19:23:45 IST (17:23:45 UTC) │ POST │ /api/v1/orders/charge │ 503 │ 12,340 ms │
|
│ 2026-03-14 17:23:45 │ POST │ /api/v1/orders/charge │ 503 │ 12,340 ms │
|
||||||
│ 2026-03-14 19:23:46 IST (17:23:46 UTC) │ POST │ /api/v1/orders/charge │ 503 │ 11,890 ms │
|
│ 2026-03-14 17:23:46 │ POST │ /api/v1/orders/charge │ 503 │ 11,890 ms │
|
||||||
│ 2026-03-14 19:23:48 IST (17:23:48 UTC) │ GET │ /api/v1/inventory/check │ 500 │ 8,210 ms │
|
│ 2026-03-14 17:23:48 │ GET │ /api/v1/inventory/check │ 500 │ 8,210 ms │
|
||||||
│ 2026-03-14 19:24:01 IST (17:24:01 UTC) │ POST │ /api/v1/payments/process │ 502 │ 30,000 ms │
|
│ 2026-03-14 17:24:01 │ POST │ /api/v1/payments/process │ 502 │ 30,000 ms │
|
||||||
└──────────────────────────────────────────┴────────┴──────────────────────────┴────────┴───────────┘
|
└──────────────────────┴────────┴──────────────────────────┴────────┴───────────┘
|
||||||
Src: api-gateway (prod) → Dst: payment-service (prod)
|
Src: api-gateway (prod) → Dst: payment-service (prod)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|||||||