Compare commits

...

31 Commits

Author SHA1 Message Date
Volodymyr Stoiko
e948637f79 Revert "auth: emit CHART_VERSION into hub ConfigMap (Phase V)"
This reverts c21e4c42. Companion to hub revert 61a275b6 — the hub no
longer reads CHART_VERSION, so emitting it serves no purpose.
2026-05-19 06:23:16 +00:00
Volodymyr Stoiko
c21e4c4276 auth: emit CHART_VERSION into hub ConfigMap (Phase V)
Hub commit 51abc954 reads this key on first SyncConfig and warns when
its embedded version.Ver disagrees with the chart at the major
component. Provided alongside as the chart-side companion to keep
both sides on one PR per repo on the permissions-refactoring branch.
2026-05-18 22:11:34 +00:00
Volodymyr Stoiko
9445806002 auth: add tap.auth.roles operator-defined role catalogue
Chart-side companion to hub commit 67162b2e (Phase C of the permissions
refactor). Operators can now declare named roles with their own
capability set + namespace scope under tap.auth.roles; the
12-config-map renders these as AUTH_ROLES JSON for the hub to consume.

  - config/configStructs: AuthConfig.Roles map[string]RoleConfig with
    Capabilities + Namespaces; doc comments updated for groupMapping +
    defaultRole to reflect that user-defined names are now accepted.
  - config/configStruct.go: zero-value initializer for Roles so
    `kubeshark config` renders `roles: {}` consistently.
  - helm-chart/templates/12-config-map.yaml: AUTH_ROLES emits the
    full roles map as JSON; hub-side syncAuthRoles validates names
    (kubeshark-* prefix reserved) and capabilities (unknown caps
    warn-dropped).
  - helm-chart/values.yaml: regenerated. Diff is the single `roles: {}`
    line under tap.auth.

Spot-checked the rendered ConfigMap:

  AUTH_ROLES: '{"payments-viewer":{"capabilities":["snapshot:read",
                                                   "dissection:live"],
                                    "namespaces":"payments"}}'

which is exactly the shape the hub parser expects.
2026-05-18 21:35:13 +00:00
Volodymyr Stoiko
90a6fb3d40 auth: set helm default role to kubeshark-viewer
Per round-2 permissions clarifications: SSO users whose claim doesn't
match any built-in role and isn't in AUTH_GROUP_MAPPING should fall
back to a read-only baseline instead of strict-deny ("").

defaultRole="" causes the dashboard to 403-storm gated endpoints from
unmatched users; viewer (snapshot:read only) gives them a sensible
read-only UX while still preventing any state change.
2026-05-17 21:34:50 +00:00
Volodymyr Stoiko
fd5bf8c1b5 auth: drop AUTH_ROLES; add AUTH_GROUP_MAPPING + built-in defaultRole
Companion to kubeshark/hub#permissions-refactoring. Aligns the CLI
config struct, chart values, and rendered ConfigMap with the
post-refactor hub.

config/configStructs/tapConfig.go:
  - Drop AuthConfig.Roles (admin-authored map[string]Role) and the
    Role + ScriptingPermissions structs they referenced.
  - Drop AuthConfig.DefaultFilter (no namespace scoping in v1).
  - Add AuthConfig.GroupMapping (map[string]string) — SSO group name
    → built-in role translation.
  - Tighten DefaultRole godoc to reference the four built-in role
    constants (kubeshark-admin / kubeshark-realtime /
    kubeshark-snapshot / kubeshark-viewer) and the strict-deny
    semantics on empty.

config/configStruct.go:
  - Drop the legacy "admin" entry from the AuthConfig default —
    operators now configure DefaultRole + GroupMapping instead.
  - Default RolesClaim is now "groups" (Okta/OIDC convention; was
    "role"), matching the hub's runtime default.

helm-chart/templates/12-config-map.yaml:
  - Drop AUTH_ROLES emission (key no longer read by hub).
  - Add AUTH_GROUP_MAPPING emission from tap.auth.groupMapping (JSON
    map; hub validates each value against the built-in role names at
    sync time).

helm-chart/values.yaml: regenerated from the Go config — drops the
tap.auth.roles block, adds tap.auth.groupMapping with the new
documentation header for DefaultRole.

Breaking change: deployments carrying tap.auth.roles in their values
will silently lose those role definitions. Migration is to remove the
roles: block and either (a) name their SSO groups to match the four
built-in role constants, or (b) populate tap.auth.groupMapping with
explicit translations.
2026-05-14 21:04:48 +00:00
Volodymyr Stoiko
7b5954ea00 helm: grant hub tokenreviews and label worker pods for internal auth (#1926)
* helm: grant hub tokenreviews and pass trusted controllers

Adds RBAC for hub to call the authentication.k8s.io/v1 TokenReview
endpoint, used by the new internalauth middleware to validate projected
ServiceAccountTokens presented by in-cluster gRPC callers.

Adds tap.internalAuth.trustedControllers value (empty by default),
threaded through to hub's -trusted-controllers flag as a CSV. Listing
a controller here lets pods owned by it authenticate to hub via the
projected SA token (audience kubeshark-hub). Hub-spawned Jobs are
always trusted regardless of this list. Hub matches OwnerReferences
by name AND UID, so a name-only forgery does not grant trust.

Sub-issue of kubeshark/hub#656.

* helm: inline trusted controllers in hub deployment template

The chart already knows its own controller names (worker DaemonSet
metadata.name is the literal "kubeshark-worker-daemon-set" in
09-worker-daemon-set.yaml). Pasting the same literal into a user-facing
tap.internalAuth.trustedControllers value adds a step without buying
anything — if the worker DS rename, the deployment template would have
to change in lockstep regardless.

Drop the values knob, render the flag unconditionally with the literal
worker DS name (matching the convention used elsewhere in this chart,
e.g. the hub deployment's {{ include "kubeshark.name" . }}-hub).

* helm: drop redundant comment on tokenreviews RBAC

* helm: drop -trusted-controllers flag (no caller today)

The flag was wiring forward-prep for a hypothetical worker->hub gRPC
caller from the DaemonSet. Hub-spawned Jobs (dissection-job) are
admitted via internalauth.RegisterSpawnedJob, not via this flag.
Re-add when an actual DaemonSet-deployed caller materializes.

* helm: label worker DS pods for hub internal auth

Worker pods don't call hub gRPC today, but pre-labeling the DS pod
template means a future worker->hub gRPC caller is one PR (worker-side)
away from working — no chart change required. Matches the generic
label-driven trust model in hub#783.

* helm: rename trust label to kubeshark.io/internal-auth

Matches the hub rename. Generic name so the same label can mark pods
trusted by future kubeshark services beyond hub.
2026-05-13 10:53:20 -07:00
Volodymyr Stoiko
8186b7891b Authz refactoring (helm chart + CLI) (#1921)
* Migrate auth.saml.roles to unified auth.roles

Follows the hub-side introduction of the backend-neutral AUTH_ROLES /
AUTH_ROLES_CLAIM / AUTH_DEFAULT_ROLE config (hub commit 51177bcb).
CLI and Helm chart now surface the unified location:

  tap.auth.roles         — map of role -> permissions (shared SAML/OIDC)
  tap.auth.rolesClaim    — token/assertion claim name carrying roles
  tap.auth.defaultRole   — fallback role for authenticated users with
                           no matching role in their token

Helm ConfigMap template emits AUTH_ROLES / AUTH_ROLES_CLAIM /
AUTH_DEFAULT_ROLE and no longer emits AUTH_SAML_ROLES or
AUTH_SAML_ROLE_ATTRIBUTE. Hub's back-compat fallback still reads those
keys from any existing ConfigMap that hasn't been helm-upgraded.

Legacy struct fields (SamlConfig.Roles, SamlConfig.RoleAttribute) stay
in place so existing values.yaml files with auth.saml.roles still parse
without errors, but the CLI and the chart ignore them. Follow-up release
can remove the struct fields once telemetry confirms migration.

Breaking for users with customized auth.saml.roles in their values.yaml
— the customization is masked by the new default auth.roles.admin and
must be migrated to auth.roles for the custom permissions to take
effect. Documented in the chart README and release notes.

Part of authz-refactoring (Step 2 of hub-oidc-rbac.md, CLI side).

* Remove legacy

* Align CLI + Helm chart with hub AUTH_TYPE rename

Follows hub commit 11564fef. The canonical AUTH_TYPE is now `oidc` for
generic OIDC; `dex` is a permanent alias; `descope` is a new explicit
label. This change surfaces the new vocabulary in the CLI config struct
and the Helm chart, and renames the nested `auth.dexOidc` values.yaml
field to `auth.oidc` for consistency.

Helm chart:
- 12-config-map.yaml: AUTH_OIDC_* keys now read `.Values.tap.auth.oidc.*`
  instead of `auth.dexOidc.*`. The cloud-license override that forced
  AUTH_TYPE=default unless the admin picked `dex` now accepts `oidc` too.
- 13-secret.yaml: OIDC_CLIENT_ID / OIDC_CLIENT_SECRET read from
  `auth.oidc.*` (was `auth.dexOidc.*`).
- 06-front-deployment.yaml: REACT_APP_AUTH_ENABLED / REACT_APP_AUTH_TYPE
  conditionals accept both `oidc` and `dex` where they previously only
  matched `dex`.
- values.yaml: comment on `tap.auth.type` lists valid values and flags
  the breaking change.
- README.md: `tap.auth.type` row lists valid values. All `dexOidc`
  references renamed to `oidc`. Sample values.yaml blocks now show
  `type: oidc` as the canonical form.

CLI:
- config/configStructs/tapConfig.go: AuthConfig.Type documented with the
  full list of valid values and the migration hint.

Breaking changes (repeated in release notes):
1. `tap.auth.type: oidc` now routes to the generic OIDC middleware
   (previously Descope). Switch to `tap.auth.type: descope` or `default`
   if you were using `oidc` for Descope.
2. `tap.auth.dexOidc.*` values are no longer read. Rename to
   `tap.auth.oidc.*`. No fallback.
3. `tap.auth.type: dex` continues to work — permanent alias of `oidc`.

Part of authz-refactoring (Step 4 of hub-oidc-rbac.md, CLI/Helm side).

* default kfl

* Authz Refactoring: Step 8: namespaces-list role filter

Align with hub PR kubeshark/hub#756. Per-role auth.roles[].filter (KFL)
is replaced by auth.roles[].namespaces (comma-separated list with "*",
literal, and glob semantics). Standalone tap.auth.defaultFilter knob
removed.

helm-chart/values.yaml
- admin role example uses namespaces: "*" instead of filter: "".
- Comment block explains the new namespaces semantics.
- defaultFilter: "" entry + accompanying comment block deleted.

helm-chart/templates/12-config-map.yaml
- AUTH_DEFAULT_FILTER ConfigMap entry removed (hub no longer reads it).

helm-chart/README.md
- tap.auth.defaultFilter row removed.
- tap.auth.roles default value example updated: filter: "" → namespaces: "*";
  description gains the per-role namespaces semantics legend.
2026-05-06 09:08:21 -07:00
Alon Girmonsky
ab81b0c3a7 🔖 Bump the Helm chart version to 53.2.5 (#1920)
Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-05-01 13:36:38 -07:00
Alon Girmonsky
9f5a1a41c0 fix(release-pr): sync bumped Chart.yaml to kubeshark.github.io (#1913)
* fix(release-pr): sync bumped Chart.yaml to kubeshark.github.io

The release-pr target was switching back to master (and pulling)
BEFORE copying helm-chart/ into ../kubeshark.github.io/charts/chart.
That reverted the working tree to the pre-bump Chart.yaml, so the
kubeshark.github.io PR shipped the previous version and the
chart-releaser action failed trying to recreate an existing tag.

Copy the bumped chart from the release/vX.Y.Z working tree, then
switch kubeshark back to master at the end of the target.

Also consolidate iterative robustness improvements: VERSION
validation, idempotent sibling-repo tagging, idempotent branch /
commit / push / PR creation, and a "nothing to commit" guard so
reruns of release-pr do not fail.

* refactor(release): split release-pr into three rerunnable targets

Before, release-pr did three things in one recipe: tag sibling
repos, create the kubeshark release PR, and create the helm chart
PR. If any step failed, the whole target had to be rerun, even for
the parts that had already succeeded, and some sub-steps (like
tagging worker/hub/front after a docker-image-only rebuild) had no
standalone entry point.

Split into:
  - release-siblings     : tag worker, hub, front
  - release-pr-kubeshark : bump Chart.yaml, build, open kubeshark PR
  - release-pr-helm      : sync chart to kubeshark.github.io, open helm PR
  - release-pr           : orchestrates all three in order

Each is idempotent and can be rerun independently. release-siblings
is now the canonical entry point for tagging sibling repos when
refreshing docker images without a full release.

release-pr-helm checks out release/v$(VERSION) (fetching from origin
if absent) before copying helm-chart/, so it has the bumped Chart.yaml
regardless of whether it runs right after release-pr-kubeshark or
days later in a separate invocation.

A shared _release-check-version prerequisite validates VERSION once
per target invocation.

* fix(release): make branch creation and push truly idempotent

Delete and recreate local release/helm branches instead of conditionally
checking out, and use --force-with-lease push to handle local/remote
divergence on reruns.

---------

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-05-01 10:07:20 -07:00
Alon Girmonsky
fef3e8fb05 Add PostgreSQL protocol configuration (#1919)
* Add MySQL protocol to default configuration

Closes #1915

* Add PostgreSQL protocol configuration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 12:59:11 +03:00
Alon Girmonsky
7ae81ccc4b Add MySQL protocol to default configuration (#1916)
Closes #1915

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-04-28 15:49:44 +03:00
Serhii Ponomarenko
27111e48d3 🔨 Create dashboard entries-limit helm value (#1914)
* 🔨 Create dashboard entries-limit helm value

* 🔨 Set default value for entries-limit env
2026-04-23 18:20:22 +03:00
Alon Girmonsky
863be8f47a 🔖 Bump the Helm chart version to 53.2.3 (#1912) 2026-04-20 16:39:25 +03:00
Serhii Ponomarenko
9e4059bc4d 🔨 Set nginx proxy-buffer directives (#1909) 2026-04-18 08:07:47 +03:00
Alon Girmonsky
f79885bd35 🔖 Release v53.2.2 (#1908)
* 🔖 Bump the Helm chart version to 53.2.2

* temp

* temp2

* revert back makefile
2026-04-14 01:21:58 -07:00
Volodymyr Stoiko
31129e570a Provide external volume for dissection job (#1905)
* Pass dissection storage configuration

* add dissection storage test

* Allow pvc management

* Use snapshot storage config as default for dissection storage config

---------

Co-authored-by: Alon Girmonsky <1990761+alongir@users.noreply.github.com>
2026-04-10 09:51:44 -07:00
theechofive
3a1ad64b4c fix: add subPathExpr to worker DaemonSet for shared persistent storage (#1901)
Co-authored-by: Volodymyr Stoiko <me@volodymyrstoiko.com>
Co-authored-by: Alon Girmonsky <1990761+alongir@users.noreply.github.com>
2026-04-10 09:23:31 -07:00
Alon Girmonsky
fa03da2fd4 Enable MongoDB protocol dissector (#1903)
Add mongodb to the enabled dissectors list and port mapping (27017)
in both Go config defaults and Helm chart values.

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 08:05:13 -07:00
stringsbuilder
4de0ac6abd refactor: replace Split in loops with more efficient SplitSeq and gofmt the code (#1888)
Signed-off-by: stringsbuilder <stringsbuilder@outlook.com>
Co-authored-by: Alon Girmonsky <1990761+alongir@users.noreply.github.com>
2026-04-06 21:07:50 -07:00
Alon Girmonsky
9b5ac2821f Network RCA skill: update resolution tools to list_workloads/list_ips (#1887)
Replace deprecated resolve_workload/resolve_ip references with the new
list_workloads and list_ips tools that support both singular lookup
(name+namespace or IP) and filtered scan (namespace/regex/label filters
against snapshots).

Ref: kubeshark/hub#687

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-04-06 12:40:34 -07:00
Alon Girmonsky
1ba6ed94e0 💄 Improve README with AI skills, KFL semantics, and cloud storage (#1892)
* 💄 Improve README with AI skills, KFL semantics image, and cloud storage

- Add AI Skills section with Network RCA and KFL skills, Claude Code plugin install
- Rename "Network Traffic Indexing" to "Query with API, Kubernetes, and Network Semantics" with new KFL semantics image showing how a single query combines all three layers
- Add cloud storage providers (S3, Azure Blob, GCS) and decrypted TLS to Traffic Retention section
- Update Features table: add AI Skills, KFL query language, cloud storage, delayed indexing

* 🔒 Add encrypted traffic visibility to README "What you can do" section

* 🎨 Update snapshots image in README

---------

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-04-02 18:38:13 -07:00
Alon Girmonsky
4695acb41e 🐛 Fix release-pr Makefile target cleanup and macOS sed compatibility (#1890)
- Fix macOS sed -i requiring empty backup extension argument
- Checkout master after creating kubeshark release PR
- Checkout master in kubeshark.github.io before and after creating helm PR
- Run all kubeshark.github.io operations in a single shell to avoid lost cd context

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-03-31 12:05:21 -07:00
Alon Girmonsky
b80723edfb 🔖 Bump the Helm chart version to 53.2.0 (#1889)
Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-03-31 11:30:42 -07:00
Alon Girmonsky
ddc2e57f12 Network RCA skill: use local timezone instead of UTC (#1880)
* Use local timezone instead of UTC in Network RCA skill output

Add a Timezone Handling section that instructs the agent to detect the
local timezone, present local time as the primary reference with UTC in
parentheses, and convert UTC tool responses before presenting to users.
Update all example timestamps to demonstrate the local+UTC format.

Closes #1879

* Ensure agent proactively starts dissection for workload/API queries

The agent was waiting for dissection to complete without ever starting it.
Add explicit instructions: check dissection status first, start it if
missing, and default to the Dissection route for any non-PCAP question.
Only PCAP-specific requests can skip dissection.

* Translate every API/Kubernetes question into a fresh list_api_calls query

Add "Every Question Is a Query" section: each user prompt with API or
Kubernetes semantics should map to a list_api_calls call with the
appropriate KFL filter. Includes examples of natural language to KFL
translation. Agent should never answer from memory or stale results.

---------

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-03-24 12:03:05 -07:00
Alon Girmonsky
e80fc3319b Revamp README descriptions and structure (#1881)
* Revamp README intro, sections, and descriptions

Rewrite the opening description to focus on indexing and querying.
Replace "What's captured" with actionable "What you can do" bullets.
Add port-forward step and ingress recommendation to Get Started.
Rename and tighten section descriptions: Network Data for AI Agents,
Network Traffic Indexing, Workload Dependency Map, Traffic Retention
& PCAP Export.

* Remove Raw Capture from features table
2026-03-23 08:33:27 -07:00
Volodymyr Stoiko
868b4c1f36 Verify hub/front pods are ready by conditions (#1864)
* Verify hub/front pods are ready by conditions

* log waiting for readiness

* proper sync

---------

Co-authored-by: Alon Girmonsky <1990761+alongir@users.noreply.github.com>
2026-03-21 17:33:48 -07:00
Serhii Ponomarenko
c63740ec45 🐛 Fix dissection-control front env logic (#1878) 2026-03-20 08:20:53 -07:00
Alon Girmonsky
10dbedf356 Add KFL and Network RCA skills (#1875)
* Add KFL and Network RCA skills

Introduce the skills/ directory with two Kubeshark MCP skills:

- network-rca: Retrospective traffic analysis via snapshots, dissection,
  KFL queries, PCAP extraction, and trend comparison
- kfl: Complete KFL2 (Kubeshark Filter Language) reference covering all
  supported protocols, variables, operators, and filter patterns

Update CLAUDE.md with skill authoring guidelines, structure conventions,
and the list of available Kubeshark MCP tools.

* Optimize skills and add shared setup reference

- network-rca: cut repeated metaphor, add list_api_calls example response,
  consolidate use cases, remove unbuilt composability section, extract
  setup reference to references/setup.md (409 → 306 lines)
- kfl: merge thin protocol sections, fix map_get inconsistency, add
  negation examples, move capture source to reference doc
- kfl2-reference: add most-commonly-used variables table, add inline
  filter examples per protocol section
- Add skills/README.md with usage and contribution guidelines

* Add plugin infrastructure and update READMEs

- Add .claude-plugin/plugin.json and marketplace.json for Claude Code
  plugin distribution
- Add .mcp.json bundling the Kubeshark MCP configuration
- Update skills/README.md with plugin install, manual install, and
  agent compatibility sections
- Update mcp/README.md with AI skills section and install instructions
- Restructure network-rca skill into two distinct investigation routes:
  PCAP (no dissection, BPF filters, Wireshark/compliance) and
  Dissection (indexed queries, AI-driven analysis, payload inspection)

* Remove CLAUDE.md from tracked files

Content now lives in skills/README.md, mcp/README.md, and the skills themselves.

* Add README to .claude-plugin directory

* Reorder MCP config: default mode first, URL mode for no-kubectl

* Move AI Skills section to top of MCP README

* Reorder manual install: symlink first

* Streamline skills README: focus on usage and contributing

* Enforce KFL skill loading before writing filters

- network-rca: require loading KFL skill before constructing filters,
  suggest installation if unavailable
- kfl: set user-invocable: false (background knowledge skill), strengthen
  description to mandate loading before any filter construction

* Move KFL requirement to top of Dissection route

* Add strict fallback: only use exact examples if KFL skill unavailable

* Add clone step to manual installation

* Use $PWD/kubeshark paths in manual install examples

* Add mkdir before symlinks, simplify paths

* Move prerequisites before installation

---------

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-03-18 15:31:32 -07:00
Serhii Ponomarenko
963b3e4ac2 🐛 Add default value for demoModeEnabled (#1872) 2026-03-17 13:22:42 -07:00
Volodymyr Stoiko
b2813e02bd Add detailed docs for kubeshark irsa setup (#1871)
Co-authored-by: Alon Girmonsky <1990761+alongir@users.noreply.github.com>
2026-03-16 20:29:49 -07:00
Serhii Ponomarenko
707d7351b6 🛂 Demo Portal: Readonly mode / no authn (#1869)
* 🔨 Add snapshots-updating-enabled `front` env

* 🔨 Add snapshots-updating-enabled config

* 🔨 Add demo-enabled `front` env

* 🔨 Add demo-enabled config

* 🔨 Replace `liveConfigMapChangesDisabled` with `demoModeEnabled` flag

* 🐛 Fix dissection-control-enabled env logic

* 🦺 Handle nullish `demoModeEnabled` value
2026-03-16 20:01:18 -07:00
30 changed files with 2285 additions and 297 deletions

33
.claude-plugin/README.md Normal file
View File

@@ -0,0 +1,33 @@
# Kubeshark Claude Code Plugin
This directory contains the [Claude Code plugin](https://docs.anthropic.com/en/docs/claude-code/plugins) configuration for Kubeshark.
## What's here
| File | Purpose |
|------|---------|
| `plugin.json` | Plugin manifest — name, version, description, metadata |
| `marketplace.json` | Marketplace index — allows discovery via `/plugin marketplace add` |
## Installing the plugin
```
/plugin marketplace add kubeshark/kubeshark
/plugin install kubeshark
```
This loads the Kubeshark AI skills and MCP configuration. Skills appear as
`/kubeshark:network-rca` and `/kubeshark:kfl`.
## What the plugin includes
- **Skills** from [`skills/`](../skills/) — network root cause analysis and KFL filter expertise
- **MCP configuration** from [`.mcp.json`](../.mcp.json) — connects to the Kubeshark MCP server
## Local development
Test the plugin without installing:
```bash
claude --plugin-dir /path/to/kubeshark
```

View File

@@ -0,0 +1,15 @@
{
"name": "kubeshark",
"description": "Kubeshark network observability skills for Kubernetes",
"plugins": [
{
"name": "kubeshark",
"description": "Network observability skills powered by Kubeshark MCP — root cause analysis, KFL traffic filtering, snapshot forensics, PCAP extraction.",
"source": {
"source": "github",
"owner": "kubeshark",
"repo": "kubeshark"
}
}
]
}

View File

@@ -0,0 +1,24 @@
{
"name": "kubeshark",
"version": "1.0.0",
"description": "Kubernetes network observability skills powered by Kubeshark MCP. Root cause analysis, traffic filtering, snapshot forensics, PCAP extraction, and more.",
"author": {
"name": "Kubeshark",
"url": "https://kubeshark.com"
},
"homepage": "https://kubeshark.com",
"repository": "https://github.com/kubeshark/kubeshark",
"license": "Apache-2.0",
"keywords": [
"kubeshark",
"kubernetes",
"network",
"observability",
"traffic",
"mcp",
"rca",
"pcap",
"kfl",
"ebpf"
]
}

8
.mcp.json Normal file
View File

@@ -0,0 +1,8 @@
{
"mcpServers": {
"kubeshark": {
"command": "kubeshark",
"args": ["mcp"]
}
}
}

119
Makefile
View File

@@ -253,50 +253,111 @@ port-forward:
kubectl port-forward $$(kubectl get pods | awk '$$1 ~ /^$(POD_PREFIX)/' | awk 'END {print $$1}') $(SRC_PORT):$(DST_PORT)
release: ## Print release workflow instructions.
@echo "Release workflow (2 steps):"
@echo "Release workflow — each step is idempotent and can be rerun on its own:"
@echo ""
@echo " 1. make release-pr VERSION=x.y.z"
@echo " Tags sibling repos, bumps version, creates PRs"
@echo " (kubeshark + kubeshark.github.io helm chart)."
@echo " Review and merge both PRs manually."
@echo " 1. make release-siblings VERSION=x.y.z"
@echo " Tag worker, hub, front with vx.y.z. Also run standalone when"
@echo " rebuilding docker images without cutting a full release."
@echo ""
@echo " 2. (automatic) Tag is created when release PR merges."
@echo " Fallback: make release-tag VERSION=x.y.z"
@echo " 2. make release-pr-kubeshark VERSION=x.y.z"
@echo " Bump Helm Chart.yaml, build, open release PR on kubeshark."
@echo ""
@echo " 3. make release-pr-helm VERSION=x.y.z"
@echo " Sync helm-chart/ into kubeshark.github.io, open helm PR."
@echo " Requires release/vx.y.z branch (created by step 2)."
@echo ""
@echo " Shortcut: make release-pr VERSION=x.y.z runs 1 → 2 → 3."
@echo ""
@echo " After both PRs merge: tag is created automatically,"
@echo " or run: make release-tag VERSION=x.y.z"
release-pr: ## Step 1: Tag sibling repos, bump version, create release PR.
@cd ../worker && git checkout master && git pull && git tag -d v$(VERSION); git tag v$(VERSION) && git push origin --tags
@cd ../hub && git checkout master && git pull && git tag -d v$(VERSION); git tag v$(VERSION) && git push origin --tags
@cd ../front && git checkout master && git pull && git tag -d v$(VERSION); git tag v$(VERSION) && git push origin --tags
# Internal: validate VERSION before any release-* target runs.
_release-check-version:
@if [ -z "$(VERSION)" ]; then echo "ERROR: VERSION is required. Usage: make <target> VERSION=x.y.z"; exit 1; fi
@echo "$(VERSION)" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+' || { echo "ERROR: VERSION must be semver (e.g. 53.2.4)"; exit 1; }
release-siblings: _release-check-version ## Tag worker, hub, front with v$(VERSION). Idempotent; standalone for docker-image-only updates.
@for repo in worker hub front; do \
echo "==> $$repo: ensuring v$(VERSION) tag"; \
(cd ../$$repo && git checkout master && git pull) || exit 1; \
if (cd ../$$repo && git ls-remote --tags origin "refs/tags/v$(VERSION)" | grep -q .); then \
echo " v$(VERSION) already on origin — skipping"; \
else \
(cd ../$$repo && git tag -d v$(VERSION) 2>/dev/null; git tag v$(VERSION) && git push origin "refs/tags/v$(VERSION)") || exit 1; \
fi; \
done
release-pr-kubeshark: _release-check-version ## Bump Chart.yaml, build, open release PR on kubeshark.
@cd ../kubeshark && git checkout master && git pull
@sed -i "s/^version:.*/version: \"$(shell echo $(VERSION) | sed -E 's/^([0-9]+\.[0-9]+\.[0-9]+)\..*/\1/')\"/" helm-chart/Chart.yaml
@NEW=$$(echo $(VERSION) | sed -E 's/^([0-9]+\.[0-9]+\.[0-9]+).*/\1/'); \
CUR=$$(awk '/^version:/ {gsub(/"/,"",$$2); print $$2; exit}' helm-chart/Chart.yaml); \
if [ "$$CUR" != "$$NEW" ]; then \
sed -i '' "s/^version:.*/version: \"$$NEW\"/" helm-chart/Chart.yaml; \
else \
echo "Chart.yaml already at $$NEW"; \
fi
@$(MAKE) build VER=$(VERSION)
@if [ "$(shell uname)" = "Darwin" ]; then \
codesign --sign - --force --preserve-metadata=entitlements,requirements,flags,runtime ./bin/kubeshark__; \
fi
@$(MAKE) generate-helm-values && $(MAKE) generate-manifests
@if git show-ref --verify --quiet refs/heads/release/v$(VERSION); then \
git branch -D release/v$(VERSION); \
fi
@git checkout -b release/v$(VERSION)
@git add -A .
@git commit -m ":bookmark: Bump the Helm chart version to $(VERSION)"
@git push -u origin release/v$(VERSION)
@gh pr create --title ":bookmark: Release v$(VERSION)" \
--body "Automated release PR for v$(VERSION)." \
--base master \
--reviewer corest
@rm -rf ../kubeshark.github.io/charts/chart
@mkdir ../kubeshark.github.io/charts/chart
@cp -r helm-chart/ ../kubeshark.github.io/charts/chart/
@if ! git diff --cached --quiet; then \
git commit -m ":bookmark: Bump the Helm chart version to $(VERSION)"; \
else \
echo "nothing to commit"; \
fi
@git push --force-with-lease -u origin release/v$(VERSION)
@if gh pr view release/v$(VERSION) --json number >/dev/null 2>&1; then \
echo "PR already exists for release/v$(VERSION)"; \
else \
gh pr create --title ":bookmark: Release v$(VERSION)" \
--body "Automated release PR for v$(VERSION)." \
--base master \
--reviewer corest; \
fi
release-pr-helm: _release-check-version ## Sync helm-chart/ to kubeshark.github.io and open the helm PR. Requires release/v$(VERSION) branch (step 2).
@git fetch origin "refs/heads/release/v$(VERSION):refs/heads/release/v$(VERSION)" 2>/dev/null || true
@if ! git show-ref --verify --quiet refs/heads/release/v$(VERSION); then \
echo "ERROR: release/v$(VERSION) branch not found locally or on origin."; \
echo "Run 'make release-pr-kubeshark VERSION=$(VERSION)' first."; \
exit 1; \
fi
@git checkout release/v$(VERSION)
@cd ../kubeshark.github.io && git checkout master && git pull \
&& git checkout -b helm-v$(VERSION) \
&& git add -A . \
&& git commit -m ":sparkles: Update the Helm chart to v$(VERSION)" \
&& git push -u origin helm-v$(VERSION) \
&& gh pr create --title ":sparkles: Helm chart v$(VERSION)" \
&& rm -rf charts/chart && mkdir -p charts/chart \
&& cp -r ../kubeshark/helm-chart/ charts/chart/
@cd ../kubeshark.github.io && \
if git show-ref --verify --quiet refs/heads/helm-v$(VERSION); then \
git branch -D helm-v$(VERSION); \
fi && \
git checkout -b helm-v$(VERSION) && \
git add -A . && \
if ! git diff --cached --quiet; then \
git commit -m ":sparkles: Update the Helm chart to v$(VERSION)"; \
else \
echo "nothing to commit"; \
fi && \
git push --force-with-lease -u origin helm-v$(VERSION) && \
if ! gh pr view helm-v$(VERSION) --json number >/dev/null 2>&1; then \
gh pr create --title ":sparkles: Helm chart v$(VERSION)" \
--body "Update Helm chart for release v$(VERSION)." \
--base master \
--reviewer corest
@cd ../kubeshark
--reviewer corest; \
else \
echo "PR already exists for helm-v$(VERSION)"; \
fi && \
git checkout master
@cd ../kubeshark && git checkout master && git pull
release-pr: release-siblings release-pr-kubeshark release-pr-helm ## Run release-siblings, release-pr-kubeshark, and release-pr-helm in sequence.
@echo ""
@echo "Release PRs created:"
@echo "Release PRs created (or already present):"
@echo " - kubeshark: Review and merge the release PR."
@echo " - kubeshark.github.io: Review and merge the helm chart PR."
@echo "Tag will be created automatically, or run: make release-tag VERSION=$(VERSION)"

View File

@@ -17,17 +17,14 @@
---
Kubeshark captures cluster-wide network traffic at the speed and scale of Kubernetes, continuously, at the kernel level using eBPF. It consolidates a highly fragmented picture — dozens of nodes, thousands of workloads, millions of connections — into a single, queryable view with full Kubernetes and API context.
Kubeshark indexes cluster-wide network traffic at the kernel level using eBPF — delivering instant answers to any query using network, API, and Kubernetes semantics.
Network data is available to **AI agents via [MCP](https://docs.kubeshark.com/en/mcp)** and to **human operators via a [dashboard](https://docs.kubeshark.com/en/v2)**.
**What you can do:**
**What's captured, cluster-wide:**
- **L4 Packets & TCP Metrics** — retransmissions, RTT, window saturation, connection lifecycle, packet loss across every node-to-node path ([TCP insights →](https://docs.kubeshark.com/en/mcp/tcp_insights))
- **L7 API Calls** — real-time request/response matching with full payload parsing: HTTP, gRPC, GraphQL, Redis, Kafka, DNS ([API dissection →](https://docs.kubeshark.com/en/v2/l7_api_dissection))
- **Decrypted TLS** — eBPF-based TLS decryption without key management
- **Kubernetes Context** — every packet and API call resolved to pod, service, namespace, and node
- **PCAP Retention** — point-in-time raw packet snapshots, exportable for Wireshark ([Snapshots →](https://docs.kubeshark.com/en/v2/traffic_snapshots))
- **Download Retrospective PCAPs** — cluster-wide packet captures filtered by nodes, time, workloads, and IPs. Store PCAPs for long-term retention and later investigation.
- **Visualize Network Data** — explore traffic matching queries with API, Kubernetes, or network semantics through a real-time dashboard.
- **See Encrypted Traffic in Plain Text** — automatically decrypt TLS/mTLS traffic using eBPF, with no key management or sidecars required.
- **Integrate with AI** — connect your favorite AI assistant (e.g. Claude, Copilot) to include network data in AI-driven workflows like incident response and root cause analysis.
![Kubeshark](https://github.com/kubeshark/assets/raw/master/png/stream.png)
@@ -38,9 +35,12 @@ Network data is available to **AI agents via [MCP](https://docs.kubeshark.com/en
```bash
helm repo add kubeshark https://helm.kubeshark.com
helm install kubeshark kubeshark/kubeshark
kubectl port-forward svc/kubeshark-front 8899:80
```
Dashboard opens automatically. You're capturing traffic.
Open `http://localhost:8899` in your browser. You're capturing traffic.
> For production use, we recommend using an [ingress controller](https://docs.kubeshark.com/en/ingress) instead of port-forward.
**Connect an AI agent** via MCP:
@@ -53,9 +53,9 @@ claude mcp add kubeshark -- kubeshark mcp
---
### AI-Powered Network Analysis
### Network Data for AI Agents
Kubeshark exposes all cluster-wide network data via MCP (Model Context Protocol). AI agents can query L4 metrics, investigate L7 API calls, analyze traffic patterns, and run root cause analysis through natural language. Use cases include incident response, root cause analysis, troubleshooting, debugging, and reliability workflows.
Kubeshark exposes cluster-wide network data via [MCP](https://docs.kubeshark.com/en/mcp) — enabling AI agents to query traffic, investigate API calls, and perform root cause analysis through natural language.
> *"Why did checkout fail at 2:15 PM?"*
> *"Which services have error rates above 1%?"*
@@ -68,31 +68,51 @@ Works with Claude Code, Cursor, and any MCP-compatible AI.
[MCP setup guide →](https://docs.kubeshark.com/en/mcp)
### AI Skills
Open-source, reusable skills that teach AI agents domain-specific workflows on top of Kubeshark's MCP tools:
| Skill | Description |
|-------|-------------|
| **[Network RCA](skills/network-rca/)** | Retrospective root cause analysis — snapshots, dissection, PCAP extraction, trend comparison |
| **[KFL](skills/kfl/)** | KFL (Kubeshark Filter Language) expert — writes, debugs, and optimizes traffic filters |
Install as a Claude Code plugin:
```
/plugin marketplace add kubeshark/kubeshark
/plugin install kubeshark
```
Or clone and use directly — skills trigger automatically based on conversation context.
[AI Skills docs →](https://docs.kubeshark.com/en/mcp/skills)
---
### L7 API Dissection
### Query with API, Kubernetes, and Network Semantics
Cluster-wide request/response matching with full payloads, parsed according to protocol specifications. HTTP, gRPC, Redis, Kafka, DNS, and more. Every API call resolved to source and destination pod, service, namespace, and node. No code instrumentation required.
Kubeshark indexes cluster-wide network traffic by parsing it according to protocol specifications, with support for HTTP, gRPC, Redis, Kafka, DNS, and more. A single [KFL query](https://docs.kubeshark.com/en/v2/kfl2) can combine all three semantic layers — Kubernetes identity, API context, and network attributes — to pinpoint exactly the traffic you need. No code instrumentation required.
![API context](https://github.com/kubeshark/assets/raw/master/png/api_context.png)
![KFL query combining API, Kubernetes, and network semantics](https://github.com/kubeshark/assets/raw/master/png/kfl-semantics.png)
[Learn more](https://docs.kubeshark.com/en/v2/l7_api_dissection)
[KFL reference →](https://docs.kubeshark.com/en/v2/kfl2) · [Traffic indexing](https://docs.kubeshark.com/en/v2/l7_api_dissection)
### L4/L7 Workload Map
### Workload Dependency Map
Cluster-wide view of service communication: dependencies, traffic flow, and anomalies across all nodes and namespaces.
A visual map of how workloads communicate, showing dependencies, traffic volume, and protocol usage across the cluster.
![Service Map](https://github.com/kubeshark/assets/raw/master/png/servicemap.png)
[Learn more →](https://docs.kubeshark.com/en/v2/service_map)
### Traffic Retention
### Traffic Retention & PCAP Export
Continuous raw packet capture with point-in-time snapshots. Export PCAP files for offline analysis with Wireshark or other tools.
Capture and retain raw network traffic cluster-wide, including decrypted TLS. Download PCAPs scoped by time range, nodes, workloads, and IPs — ready for Wireshark or any PCAP-compatible tool. Store snapshots in cloud storage (S3, Azure Blob, GCS) for long-term retention and cross-cluster sharing.
![Traffic Retention](https://github.com/kubeshark/assets/raw/master/png/snapshots.png)
![Traffic Retention](https://github.com/kubeshark/assets/raw/master/png/snapshots-list.png)
[Snapshots guide →](https://docs.kubeshark.com/en/v2/traffic_snapshots)
[Snapshots guide →](https://docs.kubeshark.com/en/v2/traffic_snapshots) · [Cloud storage →](https://docs.kubeshark.com/en/snapshots_cloud_storage)
---
@@ -100,13 +120,12 @@ Continuous raw packet capture with point-in-time snapshots. Export PCAP files fo
| Feature | Description |
|---------|-------------|
| [**Raw Capture**](https://docs.kubeshark.com/en/v2/raw_capture) | Continuous cluster-wide packet capture with minimal overhead |
| [**Traffic Snapshots**](https://docs.kubeshark.com/en/v2/traffic_snapshots) | Point-in-time snapshots, export as PCAP for Wireshark |
| [**L7 API Dissection**](https://docs.kubeshark.com/en/v2/l7_api_dissection) | Request/response matching with full payloads and protocol parsing |
| [**Traffic Snapshots**](https://docs.kubeshark.com/en/v2/traffic_snapshots) | Point-in-time snapshots with cloud storage (S3, Azure Blob, GCS), PCAP export for Wireshark |
| [**Traffic Indexing**](https://docs.kubeshark.com/en/v2/l7_api_dissection) | Real-time and delayed L7 indexing with request/response matching and full payloads |
| [**Protocol Support**](https://docs.kubeshark.com/en/protocols) | HTTP, gRPC, GraphQL, Redis, Kafka, DNS, and more |
| [**TLS Decryption**](https://docs.kubeshark.com/en/encrypted_traffic) | eBPF-based decryption without key management |
| [**AI-Powered Analysis**](https://docs.kubeshark.com/en/v2/ai_powered_analysis) | Query cluster-wide network data with Claude, Cursor, or any MCP-compatible AI |
| [**Display Filters**](https://docs.kubeshark.com/en/v2/kfl2) | Wireshark-inspired display filters for precise traffic analysis |
| [**TLS Decryption**](https://docs.kubeshark.com/en/encrypted_traffic) | eBPF-based decryption without key management, included in snapshots |
| [**AI Integration**](https://docs.kubeshark.com/en/mcp) | MCP server + open-source AI skills for network RCA and traffic filtering |
| [**KFL Query Language**](https://docs.kubeshark.com/en/v2/kfl2) | CEL-based query language with Kubernetes, API, and network semantics |
| [**100% On-Premises**](https://docs.kubeshark.com/en/air_gapped) | Air-gapped support, no external dependencies |
---

View File

@@ -86,9 +86,9 @@ type mcpContent struct {
}
type mcpPrompt struct {
Name string `json:"name"`
Description string `json:"description,omitempty"`
Arguments []mcpPromptArg `json:"arguments,omitempty"`
Name string `json:"name"`
Description string `json:"description,omitempty"`
Arguments []mcpPromptArg `json:"arguments,omitempty"`
}
type mcpPromptArg struct {
@@ -117,11 +117,11 @@ type mcpGetPromptResult struct {
// Hub MCP API response types
type hubMCPResponse struct {
Name string `json:"name"`
Description string `json:"description"`
Version string `json:"version"`
Tools []hubMCPTool `json:"tools"`
Prompts []hubMCPPrompt `json:"prompts"`
Name string `json:"name"`
Description string `json:"description"`
Version string `json:"version"`
Tools []hubMCPTool `json:"tools"`
Prompts []hubMCPPrompt `json:"prompts"`
}
type hubMCPTool struct {
@@ -131,9 +131,9 @@ type hubMCPTool struct {
}
type hubMCPPrompt struct {
Name string `json:"name"`
Description string `json:"description,omitempty"`
Arguments []hubMCPPromptArg `json:"arguments,omitempty"`
Name string `json:"name"`
Description string `json:"description,omitempty"`
Arguments []hubMCPPromptArg `json:"arguments,omitempty"`
}
type hubMCPPromptArg struct {
@@ -151,10 +151,10 @@ type mcpServer struct {
stdout io.Writer
backendInitialized bool
backendMu sync.Mutex
setFlags []string // --set flags to pass to 'kubeshark tap' when starting
directURL string // If set, connect directly to this URL (no kubectl/proxy)
urlMode bool // True when using direct URL mode
allowDestructive bool // If true, enable start/stop tools
setFlags []string // --set flags to pass to 'kubeshark tap' when starting
directURL string // If set, connect directly to this URL (no kubectl/proxy)
urlMode bool // True when using direct URL mode
allowDestructive bool // If true, enable start/stop tools
cachedHubMCP *hubMCPResponse // Cached tools/prompts from Hub
cachedAt time.Time // When the cache was populated
hubMCPMu sync.Mutex
@@ -772,7 +772,6 @@ func (s *mcpServer) callHubTool(toolName string, args map[string]any) (string, b
return prettyJSON.String(), false
}
func (s *mcpServer) callGetFileURL(args map[string]any) (string, bool) {
filePath, _ := args["path"].(string)
if filePath == "" {
@@ -869,8 +868,8 @@ func (s *mcpServer) callStartKubeshark(args map[string]any) (string, bool) {
// Add namespaces if provided
if v, ok := args["namespaces"].(string); ok && v != "" {
namespaces := strings.Split(v, ",")
for _, ns := range namespaces {
namespaces := strings.SplitSeq(v, ",")
for ns := range namespaces {
ns = strings.TrimSpace(ns)
if ns != "" {
cmdArgs = append(cmdArgs, "-n", ns)

View File

@@ -417,7 +417,7 @@ func TestMCP_CommandArgs(t *testing.T) {
cmdArgs = append(cmdArgs, v)
}
if v, _ := tc.args["namespaces"].(string); v != "" {
for _, ns := range strings.Split(v, ",") {
for ns := range strings.SplitSeq(v, ",") {
cmdArgs = append(cmdArgs, "-n", strings.TrimSpace(ns))
}
}

View File

@@ -40,9 +40,11 @@ type Readiness struct {
}
var ready *Readiness
var proxyOnce sync.Once
func tap() {
ready = &Readiness{}
proxyOnce = sync.Once{}
state.startTime = time.Now()
log.Info().Str("registry", config.Config.Tap.Docker.Registry).Str("tag", config.Config.Tap.Docker.Tag).Msg("Using Docker:")
@@ -147,11 +149,21 @@ func printNoPodsFoundSuggestion(targetNamespaces []string) {
log.Warn().Msg(fmt.Sprintf("Did not find any currently running pods that match the regex argument, %s will automatically target matching pods if any are created later%s", misc.Software, suggestionStr))
}
func isPodReady(pod *core.Pod) bool {
for _, condition := range pod.Status.Conditions {
if condition.Type == core.PodReady {
return condition.Status == core.ConditionTrue
}
}
return false
}
func watchHubPod(ctx context.Context, kubernetesProvider *kubernetes.Provider, cancel context.CancelFunc) {
podExactRegex := regexp.MustCompile(fmt.Sprintf("^%s", kubernetes.HubPodName))
podWatchHelper := kubernetes.NewPodWatchHelper(kubernetesProvider, podExactRegex)
eventChan, errorChan := kubernetes.FilteredWatch(ctx, podWatchHelper, []string{config.Config.Tap.Release.Namespace}, podWatchHelper)
isPodReady := false
podReady := false
podRunning := false
timeAfter := time.After(120 * time.Second)
for {
@@ -183,26 +195,30 @@ func watchHubPod(ctx context.Context, kubernetesProvider *kubernetes.Provider, c
Interface("containers-statuses", modifiedPod.Status.ContainerStatuses).
Msg("Watching pod.")
if modifiedPod.Status.Phase == core.PodRunning && !isPodReady {
isPodReady = true
if isPodReady(modifiedPod) && !podReady {
podReady = true
ready.Lock()
ready.Hub = true
ready.Unlock()
log.Info().Str("pod", kubernetes.HubPodName).Msg("Ready.")
} else if modifiedPod.Status.Phase == core.PodRunning && !podRunning {
podRunning = true
log.Info().Str("pod", kubernetes.HubPodName).Msg("Waiting for readiness...")
}
ready.Lock()
proxyDone := ready.Proxy
hubPodReady := ready.Hub
frontPodReady := ready.Front
ready.Unlock()
if !proxyDone && hubPodReady && frontPodReady {
ready.Lock()
ready.Proxy = true
ready.Unlock()
postFrontStarted(ctx, kubernetesProvider, cancel)
if hubPodReady && frontPodReady {
proxyOnce.Do(func() {
ready.Lock()
ready.Proxy = true
ready.Unlock()
postFrontStarted(ctx, kubernetesProvider, cancel)
})
}
case kubernetes.EventBookmark:
break
@@ -223,7 +239,7 @@ func watchHubPod(ctx context.Context, kubernetesProvider *kubernetes.Provider, c
cancel()
case <-timeAfter:
if !isPodReady {
if !podReady {
log.Error().
Str("pod", kubernetes.HubPodName).
Msg("Pod was not ready in time.")
@@ -242,7 +258,8 @@ func watchFrontPod(ctx context.Context, kubernetesProvider *kubernetes.Provider,
podExactRegex := regexp.MustCompile(fmt.Sprintf("^%s", kubernetes.FrontPodName))
podWatchHelper := kubernetes.NewPodWatchHelper(kubernetesProvider, podExactRegex)
eventChan, errorChan := kubernetes.FilteredWatch(ctx, podWatchHelper, []string{config.Config.Tap.Release.Namespace}, podWatchHelper)
isPodReady := false
podReady := false
podRunning := false
timeAfter := time.After(120 * time.Second)
for {
@@ -274,25 +291,29 @@ func watchFrontPod(ctx context.Context, kubernetesProvider *kubernetes.Provider,
Interface("containers-statuses", modifiedPod.Status.ContainerStatuses).
Msg("Watching pod.")
if modifiedPod.Status.Phase == core.PodRunning && !isPodReady {
isPodReady = true
if isPodReady(modifiedPod) && !podReady {
podReady = true
ready.Lock()
ready.Front = true
ready.Unlock()
log.Info().Str("pod", kubernetes.FrontPodName).Msg("Ready.")
} else if modifiedPod.Status.Phase == core.PodRunning && !podRunning {
podRunning = true
log.Info().Str("pod", kubernetes.FrontPodName).Msg("Waiting for readiness...")
}
ready.Lock()
proxyDone := ready.Proxy
hubPodReady := ready.Hub
frontPodReady := ready.Front
ready.Unlock()
if !proxyDone && hubPodReady && frontPodReady {
ready.Lock()
ready.Proxy = true
ready.Unlock()
postFrontStarted(ctx, kubernetesProvider, cancel)
if hubPodReady && frontPodReady {
proxyOnce.Do(func() {
ready.Lock()
ready.Proxy = true
ready.Unlock()
postFrontStarted(ctx, kubernetesProvider, cancel)
})
}
case kubernetes.EventBookmark:
break
@@ -312,7 +333,7 @@ func watchFrontPod(ctx context.Context, kubernetesProvider *kubernetes.Provider,
Msg("Failed creating pod.")
case <-timeAfter:
if !isPodReady {
if !podReady {
log.Error().
Str("pod", kubernetes.FrontPodName).
Msg("Pod was not ready in time.")
@@ -429,9 +450,6 @@ func postFrontStarted(ctx context.Context, kubernetesProvider *kubernetes.Provid
watchScripts(ctx, kubernetesProvider, false)
}
if config.Config.Scripting.Console {
go runConsoleWithoutProxy()
}
}
func updateConfig(kubernetesProvider *kubernetes.Provider) {

View File

@@ -102,25 +102,10 @@ func CreateDefaultConfig() ConfigStruct {
},
},
Auth: configStructs.AuthConfig{
Saml: configStructs.SamlConfig{
RoleAttribute: "role",
Roles: map[string]configStructs.Role{
"admin": {
Filter: "",
CanDownloadPCAP: true,
CanUseScripting: true,
ScriptingPermissions: configStructs.ScriptingPermissions{
CanSave: true,
CanActivate: true,
CanDelete: true,
},
CanUpdateTargetedPods: true,
CanStopTrafficCapturing: true,
CanControlDissection: true,
ShowAdminConsoleLink: true,
},
},
},
RolesClaim: "groups",
DefaultRole: "kubeshark-viewer",
GroupMapping: map[string]string{},
Roles: map[string]configStructs.RoleConfig{},
},
EnabledDissectors: []string{
"amqp",
@@ -128,6 +113,9 @@ func CreateDefaultConfig() ConfigStruct {
"http",
"icmp",
"kafka",
"mongodb",
"mysql",
"postgresql",
"redis",
// "sctp",
// "syscall",
@@ -147,7 +135,10 @@ func CreateDefaultConfig() ConfigStruct {
HTTP: []uint16{80, 443, 8080},
AMQP: []uint16{5671, 5672},
KAFKA: []uint16{9092},
REDIS: []uint16{6379},
MONGODB: []uint16{27017},
MYSQL: []uint16{3306},
POSTGRESQL: []uint16{5432},
REDIS: []uint16{6379},
LDAP: []uint16{389},
DIAMETER: []uint16{3868},
},

View File

@@ -155,35 +155,58 @@ type ProbeConfig struct {
FailureThreshold int `yaml:"failureThreshold" json:"failureThreshold" default:"3"`
}
type ScriptingPermissions struct {
CanSave bool `yaml:"canSave" json:"canSave" default:"true"`
CanActivate bool `yaml:"canActivate" json:"canActivate" default:"true"`
CanDelete bool `yaml:"canDelete" json:"canDelete" default:"true"`
}
type Role struct {
Filter string `yaml:"filter" json:"filter" default:""`
CanDownloadPCAP bool `yaml:"canDownloadPCAP" json:"canDownloadPCAP" default:"false"`
CanUseScripting bool `yaml:"canUseScripting" json:"canUseScripting" default:"false"`
ScriptingPermissions ScriptingPermissions `yaml:"scriptingPermissions" json:"scriptingPermissions"`
CanUpdateTargetedPods bool `yaml:"canUpdateTargetedPods" json:"canUpdateTargetedPods" default:"false"`
CanStopTrafficCapturing bool `yaml:"canStopTrafficCapturing" json:"canStopTrafficCapturing" default:"false"`
CanControlDissection bool `yaml:"canControlDissection" json:"canControlDissection" default:"false"`
ShowAdminConsoleLink bool `yaml:"showAdminConsoleLink" json:"showAdminConsoleLink" default:"false"`
}
type SamlConfig struct {
IdpMetadataUrl string `yaml:"idpMetadataUrl" json:"idpMetadataUrl"`
X509crt string `yaml:"x509crt" json:"x509crt"`
X509key string `yaml:"x509key" json:"x509key"`
RoleAttribute string `yaml:"roleAttribute" json:"roleAttribute"`
Roles map[string]Role `yaml:"roles" json:"roles"`
IdpMetadataUrl string `yaml:"idpMetadataUrl" json:"idpMetadataUrl"`
X509crt string `yaml:"x509crt" json:"x509crt"`
X509key string `yaml:"x509key" json:"x509key"`
}
type AuthConfig struct {
Enabled bool `yaml:"enabled" json:"enabled" default:"false"`
Type string `yaml:"type" json:"type" default:"saml"`
Saml SamlConfig `yaml:"saml" json:"saml"`
Enabled bool `yaml:"enabled" json:"enabled" default:"false"`
// Type selects the authentication backend. Valid values:
// saml — SAML 2.0 SSO
// oidc — generic OIDC (Dex, Okta, Auth0, Keycloak, Azure AD, …)
// dex — permanent alias of oidc (kept for back-compat)
// descope — Descope SDK
// default — also routes to Descope (kept, not deprecated)
//
// NOTE: prior releases routed `oidc` to Descope. If you were using `oidc`
// to mean Descope, switch to `descope` (or `default`). The rename is a
// breaking change documented in the release notes.
Type string `yaml:"type" json:"type" default:"saml"`
RolesClaim string `yaml:"rolesClaim" json:"rolesClaim"`
// DefaultRole is applied when the authenticated user's SSO claim has no
// recognized group. Must be one of the four built-in roles
// (kubeshark-admin / kubeshark-realtime / kubeshark-snapshot /
// kubeshark-viewer), the name of an operator-defined role under
// `tap.auth.roles`, or empty for strict-deny.
DefaultRole string `yaml:"defaultRole" json:"defaultRole"`
// GroupMapping translates SSO group names into role names (built-in or
// operator-defined). Optional — groups whose name already matches a
// built-in role are identity-matched and don't need an entry here.
// Operator-defined role names MUST appear here to participate in
// resolution (identity-match is built-in-only).
GroupMapping map[string]string `yaml:"groupMapping" json:"groupMapping"`
// Roles is the operator-defined role catalogue, keyed by role name.
// Each role has its own capability set + namespace scope. Names with
// the `kubeshark-` prefix are reserved for built-ins and will be
// rejected at hub startup. Unknown capability strings are dropped
// with a warning; empty / "*" namespace specs mean deny-all-data and
// allow-all respectively (see hub plans/permissions-decisions.md).
Roles map[string]RoleConfig `yaml:"roles" json:"roles"`
Saml SamlConfig `yaml:"saml" json:"saml"`
}
// RoleConfig is an operator-defined role declared under tap.auth.roles.
// Capabilities is the closed vocabulary documented in the hub project
// (snapshot:read / snapshot:write / snapshot:dissection / dissection:live /
// dissection:control / pods:target:write / settings:write); unknown
// capability strings are warn-dropped at hub startup. Namespaces is a
// comma-separated list with `*` (allow-all) and glob (`foo-*`, `*-bar`,
// `*mid*`) support; empty string means deny-all-data.
type RoleConfig struct {
Capabilities []string `yaml:"capabilities" json:"capabilities"`
Namespaces string `yaml:"namespaces" json:"namespaces"`
}
type IngressConfig struct {
@@ -203,6 +226,7 @@ type DashboardConfig struct {
StreamingType string `yaml:"streamingType" json:"streamingType" default:"connect-rpc"`
CompleteStreamingEnabled bool `yaml:"completeStreamingEnabled" json:"completeStreamingEnabled" default:"true"`
ClusterWideMapEnabled bool `yaml:"clusterWideMapEnabled" json:"clusterWideMapEnabled" default:"false"`
EntriesLimit string `yaml:"entriesLimit" json:"entriesLimit" default:"300000"`
}
type FrontRoutingConfig struct {
@@ -282,7 +306,10 @@ type PortMapping struct {
HTTP []uint16 `yaml:"http" json:"http"`
AMQP []uint16 `yaml:"amqp" json:"amqp"`
KAFKA []uint16 `yaml:"kafka" json:"kafka"`
REDIS []uint16 `yaml:"redis" json:"redis"`
MONGODB []uint16 `yaml:"mongodb" json:"mongodb"`
MYSQL []uint16 `yaml:"mysql" json:"mysql"`
POSTGRESQL []uint16 `yaml:"postgresql" json:"postgresql"`
REDIS []uint16 `yaml:"redis" json:"redis"`
LDAP []uint16 `yaml:"ldap" json:"ldap"`
DIAMETER []uint16 `yaml:"diameter" json:"diameter"`
}
@@ -353,8 +380,10 @@ type SnapshotsConfig struct {
}
type DelayedDissectionConfig struct {
CPU string `yaml:"cpu" json:"cpu" default:"1"`
Memory string `yaml:"memory" json:"memory" default:"4Gi"`
CPU string `yaml:"cpu" json:"cpu" default:"1"`
Memory string `yaml:"memory" json:"memory" default:"4Gi"`
StorageSize string `yaml:"storageSize" json:"storageSize" default:""`
StorageClass string `yaml:"storageClass" json:"storageClass" default:""`
}
type DissectionConfig struct {
@@ -412,7 +441,6 @@ type TapConfig struct {
Gitops GitopsConfig `yaml:"gitops" json:"gitops"`
Sentry SentryConfig `yaml:"sentry" json:"sentry"`
DefaultFilter string `yaml:"defaultFilter" json:"defaultFilter" default:""`
LiveConfigMapChangesDisabled bool `yaml:"liveConfigMapChangesDisabled" json:"liveConfigMapChangesDisabled" default:"false"`
GlobalFilter string `yaml:"globalFilter" json:"globalFilter" default:""`
EnabledDissectors []string `yaml:"enabledDissectors" json:"enabledDissectors"`
PortMapping PortMapping `yaml:"portMapping" json:"portMapping"`

View File

@@ -1,6 +1,6 @@
apiVersion: v2
name: kubeshark
version: "53.1.0"
version: "53.2.5"
description: The API Traffic Analyzer for Kubernetes
home: https://kubeshark.com
keywords:

View File

@@ -164,6 +164,8 @@ Example for overriding image names:
| `tap.snapshots.cloud.gcs.credentialsJson` | Service account JSON key. When set, the chart auto-creates a Secret with `SNAPSHOT_GCS_CREDENTIALS_JSON`. | `""` |
| `tap.delayedDissection.cpu` | CPU allocation for delayed dissection jobs | `1` |
| `tap.delayedDissection.memory` | Memory allocation for delayed dissection jobs | `4Gi` |
| `tap.delayedDissection.storageSize` | Storage size for dissection job PVC. When empty, falls back to `tap.snapshots.local.storageSize`. When the resolved value is non-empty, a PVC is created; otherwise an `emptyDir` is used. | `""` |
| `tap.delayedDissection.storageClass` | Storage class for dissection job PVC. When empty, falls back to `tap.snapshots.local.storageClass`. | `""` |
| `tap.release.repo` | URL of the Helm chart repository | `https://helm.kubeshark.com` |
| `tap.release.name` | Helm release name | `kubeshark` |
| `tap.release.namespace` | Helm release namespace | `default` |
@@ -210,14 +212,15 @@ Example for overriding image names:
| `tap.tolerations.hub` | Tolerations for hub component | `[]` |
| `tap.tolerations.front` | Tolerations for front-end component | `[]` |
| `tap.auth.enabled` | Enable authentication | `false` |
| `tap.auth.type` | Authentication type (1 option available: `saml`) | `saml` |
| `tap.auth.type` | Authentication backend. Valid values: `saml`, `oidc` (generic OIDC — Dex, Okta, Auth0, Keycloak, Azure AD, Google, …), `dex` (permanent alias of `oidc`), `descope`, `default` (also routes to Descope). **Breaking**: prior releases routed `oidc` to Descope — if you were using it for Descope, switch to `descope` or `default`. | `saml` |
| `tap.auth.approvedEmails` | List of approved email addresses for authentication | `[]` |
| `tap.auth.approvedDomains` | List of approved email domains for authentication | `[]` |
| `tap.auth.saml.idpMetadataUrl` | SAML IDP metadata URL <br/>(effective, if `tap.auth.type = saml`) | `` |
| `tap.auth.rolesClaim` | Name of the JWT claim (OIDC) or SAML attribute carrying role memberships. | `role` |
| `tap.auth.defaultRole` | Optional role name inside `tap.auth.roles` applied as fallback when an authenticated user has no matching role. Empty string = no fallback, zero-valued permissions. | `""` |
| `tap.auth.roles` | Backend-neutral role map shared by SAML and OIDC. Each role's `namespaces` is a comma-separated list controlling which Kubernetes namespaces the role's users see traffic for: `""` = deny all, `"*"` = allow all, `"foo"` = literal namespace, `"foo,bar"` = OR over literals, `"foo-*"` = glob expansion against the cluster's known namespaces. Empty/unset `tap.auth.roles` grants nothing — admins opt into elevated access by populating this map. | `{"admin":{"namespaces":"*","canDownloadPCAP":true,"canUpdateTargetedPods":true,"canUseScripting":true,"scriptingPermissions":{"canSave":true,"canActivate":true,"canDelete":true},"canStopTrafficCapturing":true,"canControlDissection":true,"showAdminConsoleLink":true}}` |
| `tap.auth.saml.idpMetadataUrl` | SAML IDP metadata URL <br/>(effective, if `tap.auth.type = saml`) | `` |
| `tap.auth.saml.x509crt` | A self-signed X.509 `.cert` contents <br/>(effective, if `tap.auth.type = saml`) | `` |
| `tap.auth.saml.x509key` | A self-signed X.509 `.key` contents <br/>(effective, if `tap.auth.type = saml`) | `` |
| `tap.auth.saml.roleAttribute` | A SAML attribute name corresponding to user's authorization role <br/>(effective, if `tap.auth.type = saml`) | `role` |
| `tap.auth.saml.roles` | A list of SAML authorization roles and their permissions <br/>(effective, if `tap.auth.type = saml`) | `{"admin":{"canDownloadPCAP":true,"canUpdateTargetedPods":true,"canUseScripting":true, "scriptingPermissions":{"canSave":true, "canActivate":true, "canDelete":true}, "canStopTrafficCapturing":true, "canControlDissection":true, "filter":"","showAdminConsoleLink":true}}` |
| `tap.ingress.enabled` | Enable `Ingress` | `false` |
| `tap.ingress.className` | Ingress class name | `""` |
| `tap.ingress.host` | Host of the `Ingress` | `ks.svc.cluster.local` |
@@ -232,7 +235,6 @@ Example for overriding image names:
| `tap.sentry.enabled` | Enable sending of error logs to Sentry | `false` |
| `tap.sentry.environment` | Sentry environment to label error logs with | `production` |
| `tap.defaultFilter` | Sets the default dashboard KFL filter (e.g. `http`). By default, this value is set to filter out noisy protocols such as DNS, UDP, ICMP and TCP. The user can easily change this, **temporarily**, in the Dashboard. For a permanent change, you should change this value in the `values.yaml` or `config.yaml` file. | `""` |
| `tap.liveConfigMapChangesDisabled` | If set to `true`, all user functionality (scripting, targeting settings, global & default KFL modification, traffic recording, traffic capturing on/off, protocol dissectors) involving dynamic ConfigMap changes from UI will be disabled | `false` |
| `tap.globalFilter` | Prepends to any KFL filter and can be used to limit what is visible in the dashboard. For example, `redact("request.headers.Authorization")` will redact the appropriate field. Another example `!dns` will not show any DNS traffic. | `""` |
| `tap.metrics.port` | Pod port used to expose Prometheus metrics | `49100` |
| `tap.enabledDissectors` | This is an array of strings representing the list of supported protocols. Remove or comment out redundant protocols (e.g., dns).| The default list excludes: `udp` and `tcp` |
@@ -376,8 +378,8 @@ Add these helm values to set up OIDC authentication powered by your Dex IdP:
tap:
auth:
enabled: true
type: dex
dexOidc:
type: oidc # canonical; `dex` is accepted as a permanent alias
oidc:
issuer: <put Dex IdP issuer URL here>
clientId: kubeshark
clientSecret: create your own client password
@@ -389,7 +391,7 @@ tap:
---
**Note:**<br/>
Set `tap.auth.dexOidc.bypassSslCaCheck: true`
Set `tap.auth.oidc.bypassSslCaCheck: true`
to allow Kubeshark communication with Dex IdP having an unknown SSL Certificate Authority.
This setting allows you to prevent such SSL CA-related errors:<br/>
@@ -428,7 +430,7 @@ The following Dex settings will have these values:
| Setting | Value |
|-------------------------------------------------------|----------------------------------------------|
| `tap.auth.dexOidc.issuer` | `https://ks.example.com/dex` |
| `tap.auth.oidc.issuer` | `https://ks.example.com/dex` |
| `tap.auth.dexConfig.issuer` | `https://ks.example.com/dex` |
| `tap.auth.dexConfig.staticClients -> redirectURIs` | `https://ks.example.com/api/oauth2/callback` |
| `tap.auth.dexConfig.connectors -> config.redirectURI` | `https://ks.example.com/dex/callback` |
@@ -446,16 +448,16 @@ Please, make sure to prepare the following things first.
- You will need to specify storage settings in `tap.auth.dexConfig.storage`
- default: `memory`
3. Decide on the OAuth2 `?state=` param expiration time:
- field: `tap.auth.dexOidc.oauth2StateParamExpiry`
- field: `tap.auth.oidc.oauth2StateParamExpiry`
- default: `10m` (10 minutes)
- valid time units are `s`, `m`, `h`
4. Decide on the refresh token expiration:
- field 1: `tap.auth.dexOidc.expiry.refreshTokenLifetime`
- field 1: `tap.auth.oidc.expiry.refreshTokenLifetime`
- field 2: `tap.auth.dexConfig.expiry.refreshTokens.absoluteLifetime`
- default: `3960h` (165 days)
- valid time units are `s`, `m`, `h`
5. Create a unique & secure password to set in these fields:
- field 1: `tap.auth.dexOidc.clientSecret`
- field 1: `tap.auth.oidc.clientSecret`
- field 2: `tap.auth.dexConfig.staticClients -> secret`
- password must be the same for these 2 fields
6. Discover more possibilities of **[Dex Configuration](https://dexidp.io/docs/configuration/)**
@@ -477,8 +479,8 @@ Helm `values.yaml`:
tap:
auth:
enabled: true
type: dex
dexOidc:
type: oidc # canonical; `dex` is accepted as a permanent alias
oidc:
issuer: https://<your-ingress-hostname>/dex
# Client ID/secret must be taken from `tap.auth.dexConfig.staticClients -> id/secret`

View File

@@ -95,7 +95,85 @@ helm install kubeshark kubeshark/kubeshark \
### Example: IRSA (recommended for EKS)
Create a ConfigMap with bucket configuration:
[IAM Roles for Service Accounts (IRSA)](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) lets EKS pods assume an IAM role without static credentials. EKS injects a short-lived token into the pod automatically.
**Prerequisites:**
1. Your EKS cluster must have an [OIDC provider](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html) associated with it.
2. An IAM role with a trust policy that allows the Kubeshark service account to assume it.
**Step 1 — Create an IAM policy scoped to your bucket:**
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:GetObjectVersion",
"s3:DeleteObjectVersion",
"s3:ListBucket",
"s3:ListBucketVersions",
"s3:GetBucketLocation",
"s3:GetBucketVersioning"
],
"Resource": [
"arn:aws:s3:::my-kubeshark-snapshots",
"arn:aws:s3:::my-kubeshark-snapshots/*"
]
}
]
}
```
> For read-only access, remove `s3:PutObject`, `s3:DeleteObject`, and `s3:DeleteObjectVersion`.
**Step 2 — Create an IAM role with IRSA trust policy:**
```bash
# Get your cluster's OIDC provider URL
OIDC_PROVIDER=$(aws eks describe-cluster --name CLUSTER_NAME \
--query "cluster.identity.oidc.issuer" --output text | sed 's|https://||')
# Create a trust policy
# The default K8s SA name is "<release-name>-service-account" (e.g. "kubeshark-service-account")
cat > trust-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/${OIDC_PROVIDER}"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"${OIDC_PROVIDER}:sub": "system:serviceaccount:NAMESPACE:kubeshark-service-account",
"${OIDC_PROVIDER}:aud": "sts.amazonaws.com"
}
}
}
]
}
EOF
# Create the role and attach your policy
aws iam create-role \
--role-name KubesharkS3Role \
--assume-role-policy-document file://trust-policy.json
aws iam put-role-policy \
--role-name KubesharkS3Role \
--policy-name KubesharkSnapshotsBucketAccess \
--policy-document file://bucket-policy.json
```
**Step 3 — Create a ConfigMap with bucket configuration:**
```yaml
apiVersion: v1
@@ -107,10 +185,12 @@ data:
SNAPSHOT_AWS_REGION: us-east-1
```
Set Helm values:
**Step 4 — Set Helm values with `tap.annotations` to annotate the service account:**
```yaml
tap:
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/KubesharkS3Role
snapshots:
cloud:
provider: "s3"
@@ -118,7 +198,17 @@ tap:
- kubeshark-s3-config
```
The hub pod's service account must be annotated for IRSA with an IAM role that has S3 access to the bucket.
Or via `--set`:
```bash
helm install kubeshark kubeshark/kubeshark \
--set tap.snapshots.cloud.provider=s3 \
--set tap.snapshots.cloud.s3.bucket=my-kubeshark-snapshots \
--set tap.snapshots.cloud.s3.region=us-east-1 \
--set tap.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::ACCOUNT_ID:role/KubesharkS3Role
```
No `accessKey`/`secretKey` is needed — EKS injects credentials automatically via the IRSA token.
### Example: Static Credentials

View File

@@ -44,6 +44,12 @@ rules:
- create
- update
- delete
- apiGroups:
- authentication.k8s.io
resources:
- tokenreviews
verbs:
- create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
@@ -86,6 +92,15 @@ rules:
verbs:
- create
- get
- apiGroups:
- ""
resources:
- persistentvolumeclaims
verbs:
- create
- get
- list
- delete
- apiGroups:
- batch
resources:

View File

@@ -56,6 +56,16 @@ spec:
- -dissector-memory
- '{{ .Values.tap.delayedDissection.memory }}'
{{- end }}
{{- $dissectorStorageSize := .Values.tap.delayedDissection.storageSize | default .Values.tap.snapshots.local.storageSize }}
{{- if $dissectorStorageSize }}
- -dissector-storage-size
- '{{ $dissectorStorageSize }}'
{{- end }}
{{- $dissectorStorageClass := .Values.tap.delayedDissection.storageClass | default .Values.tap.snapshots.local.storageClass }}
{{- if $dissectorStorageClass }}
- -dissector-storage-class
- '{{ $dissectorStorageClass }}'
{{- end }}
{{- if .Values.tap.gitops.enabled }}
- -gitops
{{- end }}

View File

@@ -26,15 +26,15 @@ spec:
- env:
- name: REACT_APP_AUTH_ENABLED
value: '{{- if or (and .Values.cloudLicenseEnabled (not (empty .Values.license))) (not .Values.internetConnectivity) -}}
{{ (and .Values.tap.auth.enabled (eq .Values.tap.auth.type "dex")) | ternary true false }}
{{ (default false .Values.demoModeEnabled) | ternary true ((and .Values.tap.auth.enabled (or (eq .Values.tap.auth.type "oidc") (eq .Values.tap.auth.type "dex"))) | ternary true false) }}
{{- else -}}
{{ .Values.cloudLicenseEnabled | ternary "true" .Values.tap.auth.enabled }}
{{ .Values.cloudLicenseEnabled | ternary "true" ((default false .Values.demoModeEnabled) | ternary "true" .Values.tap.auth.enabled) }}
{{- end }}'
- name: REACT_APP_AUTH_TYPE
value: '{{- if and .Values.cloudLicenseEnabled (not (eq .Values.tap.auth.type "dex")) -}}
value: '{{- if and .Values.cloudLicenseEnabled (not (or (eq .Values.tap.auth.type "oidc") (eq .Values.tap.auth.type "dex"))) -}}
default
{{- else -}}
{{ .Values.tap.auth.type }}
{{ (default false .Values.demoModeEnabled) | ternary "default" .Values.tap.auth.type }}
{{- end }}'
- name: REACT_APP_COMPLETE_STREAMING_ENABLED
value: '{{- if and (hasKey .Values.tap "dashboard") (hasKey .Values.tap.dashboard "completeStreamingEnabled") -}}
@@ -55,30 +55,22 @@ spec:
false
{{- end }}'
- name: REACT_APP_SCRIPTING_DISABLED
value: '{{- if .Values.tap.liveConfigMapChangesDisabled -}}
{{- if .Values.demoModeEnabled -}}
{{ .Values.demoModeEnabled | ternary false true }}
{{- else -}}
true
{{- end }}
{{- else -}}
false
{{- end }}'
value: '{{ default false .Values.demoModeEnabled }}'
- name: REACT_APP_TARGETED_PODS_UPDATE_DISABLED
value: '{{ .Values.tap.liveConfigMapChangesDisabled }}'
value: '{{ default false .Values.demoModeEnabled }}'
- name: REACT_APP_PRESET_FILTERS_CHANGING_ENABLED
value: '{{ .Values.tap.liveConfigMapChangesDisabled | ternary "false" "true" }}'
value: '{{ not (default false .Values.demoModeEnabled) }}'
- name: REACT_APP_BPF_OVERRIDE_DISABLED
value: '{{ eq .Values.tap.packetCapture "af_packet" | ternary "false" "true" }}'
- name: REACT_APP_RECORDING_DISABLED
value: '{{ .Values.tap.liveConfigMapChangesDisabled }}'
value: '{{ default false .Values.demoModeEnabled }}'
- name: REACT_APP_DISSECTION_ENABLED
value: '{{ .Values.tap.capture.dissection.enabled | ternary "true" "false" }}'
- name: REACT_APP_DISSECTION_CONTROL_ENABLED
value: '{{- if and .Values.tap.liveConfigMapChangesDisabled (not .Values.tap.capture.dissection.enabled) -}}
value: '{{- if and (not .Values.demoModeEnabled) (not .Values.tap.capture.dissection.enabled) -}}
true
{{- else -}}
{{ not .Values.tap.liveConfigMapChangesDisabled | ternary "true" "false" }}
{{ (default false .Values.demoModeEnabled) | ternary false true }}
{{- end -}}'
- name: 'REACT_APP_CLOUD_LICENSE_ENABLED'
value: '{{- if or (and .Values.cloudLicenseEnabled (not (empty .Values.license))) (not .Values.internetConnectivity) -}}
@@ -91,11 +83,17 @@ spec:
- name: REACT_APP_BETA_ENABLED
value: '{{ default false .Values.betaEnabled | ternary "true" "false" }}'
- name: REACT_APP_DISSECTORS_UPDATING_ENABLED
value: '{{ .Values.tap.liveConfigMapChangesDisabled | ternary "false" "true" }}'
value: '{{ not (default false .Values.demoModeEnabled) }}'
- name: REACT_APP_SNAPSHOTS_UPDATING_ENABLED
value: '{{ not (default false .Values.demoModeEnabled) }}'
- name: REACT_APP_DEMO_MODE_ENABLED
value: '{{ default false .Values.demoModeEnabled }}'
- name: REACT_APP_CLUSTER_WIDE_MAP_ENABLED
value: '{{ default false (((.Values).tap).dashboard).clusterWideMapEnabled }}'
- name: REACT_APP_RAW_CAPTURE_ENABLED
value: '{{ .Values.tap.capture.raw.enabled | ternary "true" "false" }}'
- name: REACT_APP_ENTRIES_LIMIT
value: '{{ default 300000 (((.Values).tap).dashboard).entriesLimit }}'
- name: REACT_APP_SENTRY_ENABLED
value: '{{ (include "sentry.enabled" .) }}'
- name: REACT_APP_SENTRY_ENVIRONMENT

View File

@@ -21,6 +21,7 @@ spec:
metadata:
labels:
app.kubeshark.com/app: worker
kubeshark.io/internal-auth: "true"
{{- include "kubeshark.labels" . | nindent 8 }}
name: kubeshark-worker-daemon-set
namespace: kubeshark
@@ -131,6 +132,10 @@ spec:
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: TCP_STREAM_CHANNEL_TIMEOUT_MS
value: '{{ .Values.tap.misc.tcpStreamChannelTimeoutMs }}'
- name: TCP_STREAM_CHANNEL_TIMEOUT_SHOW
@@ -227,6 +232,9 @@ spec:
mountPropagation: HostToContainer
- mountPath: /app/data
name: data
{{- if .Values.tap.persistentStorage }}
subPathExpr: $(NODE_NAME)
{{- end }}
{{- if .Values.tap.tls }}
- command:
- ./tracer
@@ -257,6 +265,10 @@ spec:
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: PROFILING_ENABLED
value: '{{ .Values.tap.pprof.enabled }}'
- name: SENTRY_ENABLED
@@ -328,6 +340,9 @@ spec:
mountPropagation: HostToContainer
- mountPath: /app/data
name: data
{{- if .Values.tap.persistentStorage }}
subPathExpr: $(NODE_NAME)
{{- end }}
- mountPath: /etc/os-release
name: os-release
readOnly: true

View File

@@ -20,6 +20,10 @@ data:
client_header_buffer_size 32k;
large_client_header_buffers 8 64k;
proxy_buffer_size 64k;
proxy_buffers 4 128k;
proxy_busy_buffers_size 128k;
location {{ default "" (((.Values.tap).routing).front).basePath }}/api {
rewrite ^{{ default "" (((.Values.tap).routing).front).basePath }}/api(.*)$ $1 break;
proxy_pass http://kubeshark-hub;

View File

@@ -19,47 +19,41 @@ data:
INGRESS_HOST: '{{ .Values.tap.ingress.host }}'
PROXY_FRONT_PORT: '{{ .Values.tap.proxy.front.port }}'
AUTH_ENABLED: '{{- if and .Values.cloudLicenseEnabled (not (empty .Values.license)) -}}
{{ and .Values.tap.auth.enabled (eq .Values.tap.auth.type "dex") | ternary true false }}
{{ (default false .Values.demoModeEnabled) | ternary true ((and .Values.tap.auth.enabled (or (eq .Values.tap.auth.type "oidc") (eq .Values.tap.auth.type "dex"))) | ternary true false) }}
{{- else -}}
{{ .Values.cloudLicenseEnabled | ternary "true" (.Values.tap.auth.enabled | ternary "true" "") }}
{{ .Values.cloudLicenseEnabled | ternary "true" ((default false .Values.demoModeEnabled) | ternary "true" .Values.tap.auth.enabled) }}
{{- end }}'
AUTH_TYPE: '{{- if and .Values.cloudLicenseEnabled (not (eq .Values.tap.auth.type "dex")) -}}
AUTH_TYPE: '{{- if and .Values.cloudLicenseEnabled (not (or (eq .Values.tap.auth.type "oidc") (eq .Values.tap.auth.type "dex"))) -}}
default
{{- else -}}
{{ .Values.tap.auth.type }}
{{ (default false .Values.demoModeEnabled) | ternary "default" .Values.tap.auth.type }}
{{- end }}'
AUTH_SAML_IDP_METADATA_URL: '{{ .Values.tap.auth.saml.idpMetadataUrl }}'
AUTH_SAML_ROLE_ATTRIBUTE: '{{ .Values.tap.auth.saml.roleAttribute }}'
AUTH_SAML_ROLES: '{{ .Values.tap.auth.saml.roles | toJson }}'
AUTH_OIDC_ISSUER: '{{ default "not set" (((.Values.tap).auth).dexOidc).issuer }}'
AUTH_OIDC_REFRESH_TOKEN_LIFETIME: '{{ default "3960h" (((.Values.tap).auth).dexOidc).refreshTokenLifetime }}'
AUTH_OIDC_STATE_PARAM_EXPIRY: '{{ default "10m" (((.Values.tap).auth).dexOidc).oauth2StateParamExpiry }}'
AUTH_ROLES_CLAIM: '{{ .Values.tap.auth.rolesClaim }}'
AUTH_DEFAULT_ROLE: '{{ default "" .Values.tap.auth.defaultRole }}'
AUTH_GROUP_MAPPING: '{{ default (dict) .Values.tap.auth.groupMapping | toJson }}'
AUTH_ROLES: '{{ default (dict) .Values.tap.auth.roles | toJson }}'
AUTH_OIDC_ISSUER: '{{ default "not set" (((.Values.tap).auth).oidc).issuer }}'
AUTH_OIDC_REFRESH_TOKEN_LIFETIME: '{{ default "3960h" (((.Values.tap).auth).oidc).refreshTokenLifetime }}'
AUTH_OIDC_STATE_PARAM_EXPIRY: '{{ default "10m" (((.Values.tap).auth).oidc).oauth2StateParamExpiry }}'
AUTH_OIDC_BYPASS_SSL_CA_CHECK: '{{- if and
(hasKey .Values.tap "auth")
(hasKey .Values.tap.auth "dexOidc")
(hasKey .Values.tap.auth.dexOidc "bypassSslCaCheck")
(hasKey .Values.tap.auth "oidc")
(hasKey .Values.tap.auth.oidc "bypassSslCaCheck")
-}}
{{ eq .Values.tap.auth.dexOidc.bypassSslCaCheck true | ternary "true" "false" }}
{{ eq .Values.tap.auth.oidc.bypassSslCaCheck true | ternary "true" "false" }}
{{- else -}}
false
{{- end }}'
TELEMETRY_DISABLED: '{{ not .Values.internetConnectivity | ternary "true" (not .Values.tap.telemetry.enabled | ternary "true" "false") }}'
SCRIPTING_DISABLED: '{{- if .Values.tap.liveConfigMapChangesDisabled -}}
{{- if .Values.demoModeEnabled -}}
{{ .Values.demoModeEnabled | ternary false true }}
{{- else -}}
true
{{- end }}
{{- else -}}
false
{{- end }}'
TARGETED_PODS_UPDATE_DISABLED: '{{ .Values.tap.liveConfigMapChangesDisabled | ternary "true" "" }}'
PRESET_FILTERS_CHANGING_ENABLED: '{{ .Values.tap.liveConfigMapChangesDisabled | ternary "false" "true" }}'
RECORDING_DISABLED: '{{ .Values.tap.liveConfigMapChangesDisabled | ternary "true" "" }}'
DISSECTION_CONTROL_ENABLED: '{{- if and .Values.tap.liveConfigMapChangesDisabled (not .Values.tap.capture.dissection.enabled) -}}
SCRIPTING_DISABLED: '{{ default false .Values.demoModeEnabled }}'
TARGETED_PODS_UPDATE_DISABLED: '{{ default false .Values.demoModeEnabled }}'
PRESET_FILTERS_CHANGING_ENABLED: '{{ not (default false .Values.demoModeEnabled) }}'
RECORDING_DISABLED: '{{ (default false .Values.demoModeEnabled) | ternary true false }}'
DISSECTION_CONTROL_ENABLED: '{{- if and (not .Values.demoModeEnabled) (not .Values.tap.capture.dissection.enabled) -}}
true
{{- else -}}
{{ not .Values.tap.liveConfigMapChangesDisabled | ternary "true" "false" }}
{{ (default false .Values.demoModeEnabled) | ternary false true }}
{{- end }}'
GLOBAL_FILTER: {{ include "kubeshark.escapeDoubleQuotes" .Values.tap.globalFilter | quote }}
DEFAULT_FILTER: {{ include "kubeshark.escapeDoubleQuotes" .Values.tap.defaultFilter | quote }}
@@ -76,7 +70,9 @@ data:
DUPLICATE_TIMEFRAME: '{{ .Values.tap.misc.duplicateTimeframe }}'
ENABLED_DISSECTORS: '{{ gt (len .Values.tap.enabledDissectors) 0 | ternary (join "," .Values.tap.enabledDissectors) "" }}'
CUSTOM_MACROS: '{{ toJson .Values.tap.customMacros }}'
DISSECTORS_UPDATING_ENABLED: '{{ .Values.tap.liveConfigMapChangesDisabled | ternary "false" "true" }}'
DISSECTORS_UPDATING_ENABLED: '{{ not (default false .Values.demoModeEnabled) }}'
SNAPSHOTS_UPDATING_ENABLED: '{{ not (default false .Values.demoModeEnabled) }}'
DEMO_MODE_ENABLED: '{{ default false .Values.demoModeEnabled }}'
DETECT_DUPLICATES: '{{ .Values.tap.misc.detectDuplicates | ternary "true" "false" }}'
PCAP_DUMP_ENABLE: '{{ .Values.pcapdump.enabled }}'
PCAP_TIME_INTERVAL: '{{ .Values.pcapdump.timeInterval }}'

View File

@@ -9,8 +9,8 @@ metadata:
stringData:
LICENSE: '{{ .Values.license }}'
SCRIPTING_ENV: '{{ .Values.scripting.env | toJson }}'
OIDC_CLIENT_ID: '{{ default "not set" (((.Values.tap).auth).dexOidc).clientId }}'
OIDC_CLIENT_SECRET: '{{ default "not set" (((.Values.tap).auth).dexOidc).clientSecret }}'
OIDC_CLIENT_ID: '{{ default "not set" (((.Values.tap).auth).oidc).clientId }}'
OIDC_CLIENT_SECRET: '{{ default "not set" (((.Values.tap).auth).oidc).clientSecret }}'
---

View File

@@ -0,0 +1,127 @@
suite: dissection storage configuration
templates:
- templates/04-hub-deployment.yaml
tests:
- it: should fallback to snapshot storageSize when dissection storageSize is empty
asserts:
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-size
- contains:
path: spec.template.spec.containers[0].command
content: "20Gi"
- it: should fallback to snapshot storageClass when dissection storageClass is empty
set:
tap.snapshots.local.storageClass: gp2
asserts:
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-class
- contains:
path: spec.template.spec.containers[0].command
content: gp2
- it: should not render dissector-storage-class when both dissection and snapshot storageClass are empty
asserts:
- notContains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-class
- it: should prefer dissection storageSize over snapshot storageSize
set:
tap.delayedDissection.storageSize: 100Gi
tap.snapshots.local.storageSize: 50Gi
asserts:
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-size
- contains:
path: spec.template.spec.containers[0].command
content: "100Gi"
- it: should prefer dissection storageClass over snapshot storageClass
set:
tap.delayedDissection.storageClass: io2
tap.snapshots.local.storageClass: gp2
asserts:
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-class
- contains:
path: spec.template.spec.containers[0].command
content: io2
- it: should fallback to snapshot config for both storageSize and storageClass
set:
tap.snapshots.local.storageSize: 30Gi
tap.snapshots.local.storageClass: gp3
asserts:
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-size
- contains:
path: spec.template.spec.containers[0].command
content: "30Gi"
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-class
- contains:
path: spec.template.spec.containers[0].command
content: gp3
- it: should not render dissector-storage-size when both dissection and snapshot storageSize are empty
set:
tap.delayedDissection.storageSize: ""
tap.snapshots.local.storageSize: ""
asserts:
- notContains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-size
- it: should render all dissector args together with custom values
set:
tap.delayedDissection.cpu: "4"
tap.delayedDissection.memory: 8Gi
tap.delayedDissection.storageSize: 200Gi
tap.delayedDissection.storageClass: local-path
asserts:
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-cpu
- contains:
path: spec.template.spec.containers[0].command
content: "4"
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-memory
- contains:
path: spec.template.spec.containers[0].command
content: 8Gi
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-size
- contains:
path: spec.template.spec.containers[0].command
content: "200Gi"
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-class
- contains:
path: spec.template.spec.containers[0].command
content: local-path
- it: should still render existing dissector-cpu and dissector-memory args
asserts:
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-cpu
- contains:
path: spec.template.spec.containers[0].command
content: "1"
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-memory
- contains:
path: spec.template.spec.containers[0].command
content: 4Gi

View File

@@ -37,6 +37,8 @@ tap:
delayedDissection:
cpu: "1"
memory: 4Gi
storageSize: ""
storageClass: ""
snapshots:
local:
storageClass: ""
@@ -151,24 +153,14 @@ tap:
auth:
enabled: false
type: saml
rolesClaim: groups
defaultRole: kubeshark-viewer
groupMapping: {}
roles: {}
saml:
idpMetadataUrl: ""
x509crt: ""
x509key: ""
roleAttribute: role
roles:
admin:
filter: ""
canDownloadPCAP: true
canUseScripting: true
scriptingPermissions:
canSave: true
canActivate: true
canDelete: true
canUpdateTargetedPods: true
canStopTrafficCapturing: true
canControlDissection: true
showAdminConsoleLink: true
ingress:
enabled: false
className: ""
@@ -186,6 +178,7 @@ tap:
streamingType: connect-rpc
completeStreamingEnabled: true
clusterWideMapEnabled: false
entriesLimit: "300000"
telemetry:
enabled: true
resourceGuard:
@@ -198,7 +191,6 @@ tap:
enabled: false
environment: production
defaultFilter: ""
liveConfigMapChangesDisabled: false
globalFilter: ""
enabledDissectors:
- amqp
@@ -206,6 +198,9 @@ tap:
- http
- icmp
- kafka
- mongodb
- mysql
- postgresql
- redis
- ws
- ldap
@@ -225,6 +220,12 @@ tap:
- 5672
kafka:
- 9092
mongodb:
- 27017
mysql:
- 3306
postgresql:
- 5432
redis:
- 6379
ldap:

View File

@@ -4,10 +4,10 @@ apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
name: kubeshark-hub-network-policy
namespace: default
@@ -33,10 +33,10 @@ apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
annotations:
name: kubeshark-front-network-policy
@@ -60,10 +60,10 @@ apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
annotations:
name: kubeshark-dex-network-policy
@@ -87,10 +87,10 @@ apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
annotations:
name: kubeshark-worker-network-policy
@@ -116,10 +116,10 @@ apiVersion: v1
kind: ServiceAccount
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
name: kubeshark-service-account
namespace: default
@@ -132,10 +132,10 @@ metadata:
namespace: default
labels:
app.kubeshark.com/app: hub
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
stringData:
LICENSE: ''
@@ -151,10 +151,10 @@ metadata:
namespace: default
labels:
app.kubeshark.com/app: hub
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
stringData:
AUTH_SAML_X509_CRT: |
@@ -167,10 +167,10 @@ metadata:
namespace: default
labels:
app.kubeshark.com/app: hub
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
stringData:
AUTH_SAML_X509_KEY: |
@@ -182,10 +182,10 @@ metadata:
name: kubeshark-nginx-config-map
namespace: default
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
data:
default.conf: |
@@ -199,6 +199,10 @@ data:
client_header_buffer_size 32k;
large_client_header_buffers 8 64k;
proxy_buffer_size 64k;
proxy_buffers 4 128k;
proxy_busy_buffers_size 128k;
location /api {
rewrite ^/api(.*)$ $1 break;
proxy_pass http://kubeshark-hub;
@@ -248,10 +252,10 @@ metadata:
namespace: default
labels:
app.kubeshark.com/app: hub
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
data:
POD_REGEX: '.*'
@@ -276,9 +280,9 @@ data:
AUTH_OIDC_BYPASS_SSL_CA_CHECK: 'false'
TELEMETRY_DISABLED: 'false'
SCRIPTING_DISABLED: 'false'
TARGETED_PODS_UPDATE_DISABLED: ''
TARGETED_PODS_UPDATE_DISABLED: 'false'
PRESET_FILTERS_CHANGING_ENABLED: 'true'
RECORDING_DISABLED: ''
RECORDING_DISABLED: 'false'
DISSECTION_CONTROL_ENABLED: 'true'
GLOBAL_FILTER: ""
DEFAULT_FILTER: ""
@@ -289,15 +293,17 @@ data:
TIMEZONE: ' '
CLOUD_LICENSE_ENABLED: 'true'
DUPLICATE_TIMEFRAME: '200ms'
ENABLED_DISSECTORS: 'amqp,dns,http,icmp,kafka,redis,ws,ldap,radius,diameter,udp-flow,tcp-flow,udp-conn,tcp-conn'
ENABLED_DISSECTORS: 'amqp,dns,http,icmp,kafka,mongodb,mysql,postgresql,redis,ws,ldap,radius,diameter,udp-flow,tcp-flow,udp-conn,tcp-conn'
CUSTOM_MACROS: '{"https":"tls and (http or http2)"}'
DISSECTORS_UPDATING_ENABLED: 'true'
SNAPSHOTS_UPDATING_ENABLED: 'true'
DEMO_MODE_ENABLED: 'false'
DETECT_DUPLICATES: 'false'
PCAP_DUMP_ENABLE: 'false'
PCAP_TIME_INTERVAL: '1m'
PCAP_MAX_TIME: '1h'
PCAP_MAX_SIZE: '500MB'
PORT_MAPPING: '{"amqp":[5671,5672],"diameter":[3868],"http":[80,443,8080],"kafka":[9092],"ldap":[389],"redis":[6379]}'
PORT_MAPPING: '{"amqp":[5671,5672],"diameter":[3868],"http":[80,443,8080],"kafka":[9092],"ldap":[389],"mongodb":[27017],"mysql":[3306],"postgresql":[5432],"redis":[6379]}'
RAW_CAPTURE_ENABLED: 'true'
RAW_CAPTURE_STORAGE_SIZE: '1Gi'
---
@@ -306,10 +312,10 @@ apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
name: kubeshark-cluster-role-default
namespace: default
@@ -353,10 +359,10 @@ apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
name: kubeshark-cluster-role-binding-default
namespace: default
@@ -374,10 +380,10 @@ apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
annotations:
name: kubeshark-self-config-role
@@ -412,6 +418,15 @@ rules:
verbs:
- create
- get
- apiGroups:
- ""
resources:
- persistentvolumeclaims
verbs:
- create
- get
- list
- delete
- apiGroups:
- batch
resources:
@@ -424,10 +439,10 @@ apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
annotations:
name: kubeshark-self-config-role-binding
@@ -447,10 +462,10 @@ kind: Service
metadata:
labels:
app.kubeshark.com/app: hub
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
name: kubeshark-hub
namespace: default
@@ -468,10 +483,10 @@ apiVersion: v1
kind: Service
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
name: kubeshark-front
namespace: default
@@ -489,10 +504,10 @@ kind: Service
apiVersion: v1
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
annotations:
prometheus.io/scrape: 'true'
@@ -502,10 +517,10 @@ metadata:
spec:
selector:
app.kubeshark.com/app: worker
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
ports:
- name: metrics
@@ -518,10 +533,10 @@ kind: Service
apiVersion: v1
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
annotations:
prometheus.io/scrape: 'true'
@@ -531,10 +546,10 @@ metadata:
spec:
selector:
app.kubeshark.com/app: hub
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
ports:
- name: metrics
@@ -549,10 +564,10 @@ metadata:
labels:
app.kubeshark.com/app: worker
sidecar.istio.io/inject: "false"
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
name: kubeshark-worker-daemon-set
namespace: default
@@ -566,10 +581,10 @@ spec:
metadata:
labels:
app.kubeshark.com/app: worker
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
name: kubeshark-worker-daemon-set
namespace: kubeshark
@@ -579,7 +594,7 @@ spec:
- /bin/sh
- -c
- mkdir -p /sys/fs/bpf && mount | grep -q '/sys/fs/bpf' || mount -t bpf bpf /sys/fs/bpf
image: 'docker.io/kubeshark/worker:v53.1'
image: 'docker.io/kubeshark/worker:v53.2'
imagePullPolicy: Always
name: mount-bpf
securityContext:
@@ -618,7 +633,7 @@ spec:
- '500Mi'
- -cloud-api-url
- 'https://api.kubeshark.com'
image: 'docker.io/kubeshark/worker:v53.1'
image: 'docker.io/kubeshark/worker:v53.2'
imagePullPolicy: Always
name: sniffer
ports:
@@ -634,6 +649,10 @@ spec:
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: TCP_STREAM_CHANNEL_TIMEOUT_MS
value: '10000'
- name: TCP_STREAM_CHANNEL_TIMEOUT_SHOW
@@ -690,7 +709,7 @@ spec:
- -disable-tls-log
- -loglevel
- 'warning'
image: 'docker.io/kubeshark/worker:v53.1'
image: 'docker.io/kubeshark/worker:v53.2'
imagePullPolicy: Always
name: tracer
env:
@@ -702,6 +721,10 @@ spec:
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: PROFILING_ENABLED
value: 'false'
- name: SENTRY_ENABLED
@@ -782,10 +805,10 @@ kind: Deployment
metadata:
labels:
app.kubeshark.com/app: hub
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
name: kubeshark-hub
namespace: default
@@ -800,10 +823,10 @@ spec:
metadata:
labels:
app.kubeshark.com/app: hub
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
spec:
dnsPolicy: ClusterFirstWithHostNet
@@ -819,13 +842,15 @@ spec:
- -capture-stop-after
- "5m"
- -snapshot-size-limit
- ''
- '20Gi'
- -dissector-image
- 'docker.io/kubeshark/worker:v53.1'
- 'docker.io/kubeshark/worker:v53.2'
- -dissector-cpu
- '1'
- -dissector-memory
- '4Gi'
- -dissector-storage-size
- '20Gi'
- -cloud-api-url
- 'https://api.kubeshark.com'
env:
@@ -843,7 +868,7 @@ spec:
value: 'production'
- name: PROFILING_ENABLED
value: 'false'
image: 'docker.io/kubeshark/hub:v53.1'
image: 'docker.io/kubeshark/hub:v53.2'
imagePullPolicy: Always
readinessProbe:
periodSeconds: 5
@@ -911,10 +936,10 @@ kind: Deployment
metadata:
labels:
app.kubeshark.com/app: front
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
name: kubeshark-front
namespace: default
@@ -929,10 +954,10 @@ spec:
metadata:
labels:
app.kubeshark.com/app: front
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.2.5
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.2.5"
app.kubernetes.io/managed-by: Helm
spec:
containers:
@@ -973,13 +998,21 @@ spec:
value: 'false'
- name: REACT_APP_DISSECTORS_UPDATING_ENABLED
value: 'true'
- name: REACT_APP_SNAPSHOTS_UPDATING_ENABLED
value: 'true'
- name: REACT_APP_DEMO_MODE_ENABLED
value: 'false'
- name: REACT_APP_CLUSTER_WIDE_MAP_ENABLED
value: 'false'
- name: REACT_APP_RAW_CAPTURE_ENABLED
value: 'true'
- name: REACT_APP_ENTRIES_LIMIT
value: '300000'
- name: REACT_APP_SENTRY_ENABLED
value: 'false'
- name: REACT_APP_SENTRY_ENVIRONMENT
value: 'production'
image: 'docker.io/kubeshark/front:v53.1'
image: 'docker.io/kubeshark/front:v53.2'
imagePullPolicy: Always
name: kubeshark-front
livenessProbe:

View File

@@ -2,6 +2,18 @@
[Kubeshark](https://kubeshark.com) MCP (Model Context Protocol) server enables AI assistants like Claude Desktop, Cursor, and other MCP-compatible clients to query real-time Kubernetes network traffic.
## AI Skills
The MCP provides the tools — [AI skills](../skills/) teach agents how to use them.
Skills turn raw MCP capabilities into domain-specific workflows like root cause
analysis, traffic filtering, and forensic investigation. See the
[skills README](../skills/README.md) for installation and usage.
| Skill | Description |
|-------|-------------|
| [`network-rca`](../skills/network-rca/) | Network Root Cause Analysis — snapshot-based retrospective investigation with PCAP and dissection routes |
| [`kfl`](../skills/kfl/) | KFL2 filter expert — write, debug, and optimize traffic queries across all supported protocols |
## Features
- **L7 API Traffic Analysis**: Query HTTP, gRPC, Redis, Kafka, DNS transactions
@@ -34,20 +46,20 @@ Add to your Claude Desktop configuration:
**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
**Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
#### URL Mode (Recommended for existing deployments)
#### Default (requires kubectl access / kube context)
```json
{
"mcpServers": {
"kubeshark": {
"command": "kubeshark",
"args": ["mcp", "--url", "https://kubeshark.example.com"]
"args": ["mcp"]
}
}
}
```
#### Proxy Mode (Requires kubectl access)
With an explicit kubeconfig path:
```json
{
@@ -59,14 +71,18 @@ Add to your Claude Desktop configuration:
}
}
```
or:
#### URL Mode (no kubectl required)
Use this when the machine doesn't have kubectl access or a kube context.
Connect directly to an existing Kubeshark deployment:
```json
{
"mcpServers": {
"kubeshark": {
"command": "kubeshark",
"args": ["mcp"]
"args": ["mcp", "--url", "https://kubeshark.example.com"]
}
}
}

120
skills/README.md Normal file
View File

@@ -0,0 +1,120 @@
# Kubeshark AI Skills
Open-source AI skills that work with the [Kubeshark MCP](https://github.com/kubeshark/kubeshark).
Skills teach AI agents how to use Kubeshark's MCP tools for specific workflows
like root cause analysis, traffic filtering, and forensic investigation.
Skills use the open [Agent Skills](https://github.com/anthropics/skills) format
and work with Claude Code, OpenAI Codex CLI, Gemini CLI, Cursor, and other
compatible agents.
## Available Skills
| Skill | Description |
|-------|-------------|
| [`network-rca`](network-rca/) | Network Root Cause Analysis. Retrospective traffic analysis via snapshots, with two investigation routes: PCAP (for Wireshark/compliance) and Dissection (for AI-driven API-level investigation). |
| [`kfl`](kfl/) | KFL2 (Kubeshark Filter Language) expert. Complete reference for writing, debugging, and optimizing CEL-based traffic filters across all supported protocols. |
## Prerequisites
All skills require the Kubeshark MCP:
```bash
# Claude Code
claude mcp add kubeshark -- kubeshark mcp
# Without kubectl access (direct URL)
claude mcp add kubeshark -- kubeshark mcp --url https://kubeshark.example.com
```
For Claude Desktop, add to `claude_desktop_config.json`:
```json
{
"mcpServers": {
"kubeshark": {
"command": "kubeshark",
"args": ["mcp"]
}
}
}
```
## Installation
### Option 1: Plugin (recommended)
Install as a Claude Code plugin directly from GitHub:
```
/plugin marketplace add kubeshark/kubeshark
/plugin install kubeshark
```
Skills appear as `/kubeshark:network-rca` and `/kubeshark:kfl`. The plugin
also bundles the Kubeshark MCP configuration automatically.
### Option 2: Clone and run
```bash
git clone https://github.com/kubeshark/kubeshark
cd kubeshark
claude
```
Skills trigger automatically based on your conversation.
### Option 3: Manual installation
Clone the repo (if you haven't already), then symlink or copy the skills:
```bash
git clone https://github.com/kubeshark/kubeshark
mkdir -p ~/.claude/skills
# Symlink to stay in sync with the repo (recommended)
ln -s kubeshark/skills/network-rca ~/.claude/skills/network-rca
ln -s kubeshark/skills/kfl ~/.claude/skills/kfl
# Or copy to your project (project scope only)
mkdir -p .claude/skills
cp -r kubeshark/skills/network-rca .claude/skills/
cp -r kubeshark/skills/kfl .claude/skills/
# Or copy for personal use (all your projects)
cp -r kubeshark/skills/network-rca ~/.claude/skills/
cp -r kubeshark/skills/kfl ~/.claude/skills/
```
## Contributing
We welcome contributions — whether improving an existing skill or proposing a new one.
- **Suggest improvements**: Open an issue or PR with changes to an existing skill's `SKILL.md`
or reference docs. Better examples, clearer workflows, and additional filter patterns
are always appreciated.
- **Add a new skill**: Open an issue describing the use case first. New skills should
follow the structure below and reference Kubeshark MCP tools by exact name.
### Skill structure
```
skills/
└── <skill-name>/
├── SKILL.md # Required. YAML frontmatter + markdown body.
└── references/ # Optional. Detailed reference docs.
└── *.md
```
### Guidelines
- Keep `SKILL.md` under 500 lines. Use `references/` for detailed content.
- Use imperative tone. Reference MCP tools by exact name.
- Include realistic example tool responses.
- The `description` frontmatter should be generous with trigger keywords.
### Planned skills
- `api-security` — OWASP API Top 10 assessment against live or snapshot traffic.
- `incident-response` — 7-phase forensic incident investigation methodology.
- `network-engineering` — Real-time traffic analysis, latency debugging, dependency mapping.

384
skills/kfl/SKILL.md Normal file
View File

@@ -0,0 +1,384 @@
---
name: kfl
user-invocable: false
description: >
KFL2 (Kubeshark Filter Language) reference. This skill MUST be loaded before
writing, constructing, or suggesting any KFL filter expression. KFL is statically
typed — incorrect field names or syntax will fail silently or error. Do not guess
at KFL syntax without this skill loaded. Trigger on any mention of KFL, CEL filters,
traffic filtering, display filters, query syntax, filter expressions, write a filter,
construct a query, build a KFL, create a filter expression, "how do I filter",
"show me only", "find traffic where", protocol-specific queries (HTTP status codes,
DNS lookups, Redis commands, Kafka topics), Kubernetes-aware filtering (by namespace,
pod, service, label, annotation), L4 connection/flow filters, time-based queries,
or any request to slice/search/narrow network traffic in Kubeshark. Also trigger
when other skills need to construct filters — KFL is the query language for all
Kubeshark traffic analysis.
---
# KFL2 — Kubeshark Filter Language
You are a KFL2 expert. KFL2 is built on Google's CEL (Common Expression Language)
and is the query language for all Kubeshark traffic analysis. It operates as a
**display filter** — it doesn't affect what's captured, only what you see.
Think of KFL the way you think of SQL for databases or Google search syntax for
the web. Kubeshark captures and indexes all cluster traffic; KFL is how you
search it.
For the complete variable and field reference, see `references/kfl2-reference.md`.
## Core Syntax
KFL expressions are boolean CEL expressions. An empty filter matches everything.
### Operators
| Category | Operators |
|----------|-----------|
| Comparison | `==`, `!=`, `<`, `<=`, `>`, `>=` |
| Logical | `&&`, `\|\|`, `!` |
| Arithmetic | `+`, `-`, `*`, `/`, `%` |
| Membership | `in` |
| Ternary | `condition ? true_val : false_val` |
### String Functions
```
str.contains(substring) // Substring search
str.startsWith(prefix) // Prefix match
str.endsWith(suffix) // Suffix match
str.matches(regex) // Regex match
size(str) // String length
```
### Collection Functions
```
size(collection) // List/map/string length
key in map // Key existence
map[key] // Value access
map_get(map, key, default) // Safe access with default
value in list // List membership
```
### Time Functions
```
timestamp("2026-03-14T22:00:00Z") // Parse ISO timestamp
duration("5m") // Parse duration
now() // Current time (snapshot at filter creation)
```
### Negation
```
!http // Everything that is NOT HTTP
http && status_code != 200 // HTTP responses that aren't 200
http && !path.contains("/health") // Exclude health checks
!(src.pod.namespace == "kube-system") // Exclude system namespace
```
## Protocol Detection
Boolean flags that indicate which protocol was detected. Use these as the first
filter term — they're fast and narrow the search space immediately.
| Flag | Protocol | Flag | Protocol |
|------|----------|------|----------|
| `http` | HTTP/1.1, HTTP/2 | `redis` | Redis |
| `dns` | DNS | `kafka` | Kafka |
| `tls` | eBPF TLS interception | `amqp` | AMQP |
| `tcp` | TCP | `ldap` | LDAP |
| `udp` | UDP | `ws` | WebSocket |
| `sctp` | SCTP | `gql` | GraphQL (v1+v2) |
| `icmp` | ICMP | `gqlv1` / `gqlv2` | GraphQL version-specific |
| `grpc` | gRPC (HTTP/2 sub-protocol) | `mongodb` | MongoDB |
| `mysql` | MySQL | `radius` | RADIUS |
| `diameter` | Diameter | `conn` / `flow` | L4 connection/flow tracking |
| | | `tcp_conn` / `udp_conn` | Transport-specific connections |
## Kubernetes Context
The most common starting point. Filter by where traffic originates or terminates.
### Pod and Service Fields
```
src.pod.name == "orders-594487879c-7ddxf"
dst.pod.namespace == "production"
src.service.name == "api-gateway"
dst.service.namespace == "payments"
```
Pod fields fall back to service data when pod info is unavailable, so
`dst.pod.namespace` works even for service-level entries.
### Summary Name and Namespace
Convenience variables that pick the best available identity for a peer:
```
src.name == "api-gateway" // pod > service > dns > process
dst.name.contains("payment") // works across identity types
src.namespace == "production" // pod namespace, falls back to service
dst.namespace != "kube-system" // exclude system namespace
```
### Aggregate Collections
Match against any direction (src or dst):
```
"production" in namespaces // Any namespace match
"orders" in pods // Any pod name match
"api-gateway" in services // Any service name match
```
### Labels and Annotations
```
map_get(local_labels, "app", "") == "checkout" // Safe access with default
map_get(remote_labels, "version", "") == "canary"
"tier" in local_labels // Label existence check
```
Always use `map_get()` for labels and annotations — direct access like
`local_labels["app"]` errors if the key doesn't exist.
### Node and Process
```
node_name == "ip-10-0-25-170.ec2.internal"
local_process_name == "nginx"
remote_process_name.contains("postgres")
```
### DNS Resolution
```
src.dns == "api.example.com"
dst.dns.contains("redis")
```
## HTTP Filtering
HTTP is the most common protocol for API-level investigation.
### Fields
| Field | Type | Example |
|-------|------|---------|
| `method` | string | `"GET"`, `"POST"`, `"PUT"`, `"DELETE"` |
| `url` | string | Full path + query: `"/api/users?id=123"` |
| `path` | string | Path only: `"/api/users"` |
| `status_code` | int | `200`, `404`, `500` |
| `http_version` | string | `"HTTP/1.1"`, `"HTTP/2"` |
| `request.headers` | map | `request.headers["content-type"]` |
| `response.headers` | map | `response.headers["server"]` |
| `request.cookies` | map | `request.cookies["session"]` |
| `response.cookies` | map | `response.cookies["token"]` |
| `query_string` | map | `query_string["id"]` |
| `request_body_size` | int | Request body bytes |
| `response_body_size` | int | Response body bytes |
| `elapsed_time` | int | Duration in **microseconds** |
### Common Patterns
```
// Error investigation
http && status_code >= 500 // Server errors
http && status_code == 429 // Rate limiting
http && status_code >= 400 && status_code < 500 // Client errors
// Endpoint targeting
http && method == "POST" && path.contains("/orders")
http && url.matches(".*/api/v[0-9]+/users.*")
// Performance
http && elapsed_time > 5000000 // > 5 seconds
http && response_body_size > 1000000 // > 1MB responses
// Header inspection
http && "authorization" in request.headers
http && request.headers["content-type"] == "application/json"
// GraphQL (subset of HTTP)
gql && method == "POST" && status_code >= 400
// Only eBPF-intercepted TLS traffic (decrypted HTTPS)
tls && http && status_code >= 500
```
> **Note on `tls`**: The `tls` flag is an alias for `capture_source == "ebpf_tls"`.
> It indicates traffic captured via eBPF TLS interception, not TLS protocol dissection.
## DNS Filtering
DNS issues are often the hidden root cause of outages.
| Field | Type | Description |
|-------|------|-------------|
| `dns_questions` | []string | Question domain names |
| `dns_answers` | []string | Answer domain names |
| `dns_question_types` | []string | Record types: A, AAAA, CNAME, MX, TXT, SRV, PTR |
| `dns_request` | bool | Is request |
| `dns_response` | bool | Is response |
| `dns_request_length` | int | Request size |
| `dns_response_length` | int | Response size |
```
dns && "api.external-service.com" in dns_questions
dns && dns_response && status_code != 0 // Failed lookups
dns && "A" in dns_question_types // A record queries
dns && size(dns_questions) > 1 // Multi-question
```
## Database and Messaging Protocols
### Redis
```
redis && redis_type == "GET" // Command type
redis && redis_key.startsWith("session:") // Key pattern
redis && redis_command.contains("DEL") // Command search
redis && redis_total_size > 10000 // Large operations
```
### Kafka
```
kafka && kafka_api_key_name == "PRODUCE" // Produce operations
kafka && kafka_client_id == "payment-processor" // Client filtering
kafka && kafka_request_summary.contains("orders") // Topic filtering
kafka && kafka_size > 10000 // Large messages
```
### MongoDB
```
mongodb && mongodb_command == "find" // Find operations
mongodb && mongodb_collection == "users" // Collection filtering
mongodb && mongodb_database == "mydb" // Database filtering
mongodb && !mongodb_success // Failed operations
mongodb && mongodb_error_code != 0 // Error code filtering
mongodb && mongodb_total_size > 10000 // Large operations
```
### MySQL
```
mysql && mysql_command == "COM_QUERY" // SQL queries
mysql && mysql_query.contains("SELECT") // SELECT statements
mysql && mysql_database == "orders_db" // Database filtering
mysql && !mysql_success // Failed queries
mysql && mysql_error_code != 0 // Error code filtering
mysql && mysql_total_size > 10000 // Large queries
```
### gRPC
gRPC is a sub-protocol of HTTP/2. All HTTP variables are also available on gRPC entries.
```
grpc && grpc_method == "SayHello" // Method filtering
grpc && grpc_status != 0 // Non-OK status codes
grpc && grpc_status == 14 // UNAVAILABLE
grpc && grpc_method.contains("Create") // Method pattern
grpc && elapsed_time > 1000000 // Slow gRPC calls (>1s)
```
### AMQP, LDAP, RADIUS, Diameter
```
amqp && amqp_method == "basic.publish" // AMQP publish
ldap && ldap_type == "bind" // LDAP bind requests
radius && radius_code_name == "Access-Request" // RADIUS auth
diameter && diameter_method.contains("Credit") // Diameter credit control
```
For the full variable list for these protocols, see `references/kfl2-reference.md`.
## Transport Layer (L4)
### TCP/UDP Fields
```
tcp && tcp_error_type != "" // TCP errors
udp && udp_length > 1000 // Large UDP packets
```
### Connection Tracking
```
conn && conn_state == "open" // Active connections
conn && conn_local_bytes > 1000000 // High-volume
conn && "HTTP" in conn_l7_detected // L7 protocol detection
tcp_conn && conn_state == "closed" // Closed TCP connections
```
### Flow Tracking (with Rate Metrics)
```
flow && flow_local_pps > 1000 // High packet rate
flow && flow_local_bps > 1000000 // High bandwidth
flow && flow_state == "closed" && "TLS" in flow_l7_detected
tcp_flow && flow_local_bps > 5000000 // High-throughput TCP
```
## Network Layer
```
src.ip == "10.0.53.101"
dst.ip.startsWith("192.168.")
src.port == 8080
dst.port >= 8000 && dst.port <= 9000
```
## Time-Based Filtering
```
timestamp > timestamp("2026-03-14T22:00:00Z")
timestamp >= timestamp("2026-03-14T22:00:00Z") && timestamp <= timestamp("2026-03-14T23:00:00Z")
timestamp > now() - duration("5m") // Last 5 minutes
elapsed_time > 2000000 // Latency > 2 seconds
```
## Building Filters: Progressive Narrowing
The most effective investigation technique — start broad, add constraints:
```
// Step 1: Protocol + namespace
http && dst.pod.namespace == "production"
// Step 2: Add error condition
http && dst.pod.namespace == "production" && status_code >= 500
// Step 3: Narrow to service
http && dst.pod.namespace == "production" && status_code >= 500 && dst.service.name == "payment-service"
// Step 4: Narrow to endpoint
http && dst.pod.namespace == "production" && status_code >= 500 && dst.service.name == "payment-service" && path.contains("/charge")
// Step 5: Add timing
http && dst.pod.namespace == "production" && status_code >= 500 && dst.service.name == "payment-service" && path.contains("/charge") && elapsed_time > 2000000
```
## Performance Tips
1. **Protocol flags first**`http && ...` is faster than `... && http`
2. **`startsWith`/`endsWith` over `contains`** — prefix/suffix checks are faster
3. **Specific ports before string ops**`dst.port == 80` is cheaper than `url.contains(...)`
4. **Use `map_get` for labels** — avoids errors on missing keys
5. **Keep filters simple** — CEL short-circuits on `&&`, so put cheap checks first
## Type Safety
KFL2 is statically typed. Common gotchas:
- `status_code` is `int`, not string — use `status_code == 200`, not `"200"`
- `elapsed_time` is in **microseconds** — 5 seconds = `5000000`
- `timestamp` requires `timestamp()` function — not a raw string
- Map access on missing keys errors — use `key in map` or `map_get()` first
- List membership uses `value in list` — not `list.contains(value)`

View File

@@ -0,0 +1,467 @@
# KFL2 Complete Variable and Field Reference
This is the exhaustive reference for every variable available in KFL2 filters.
KFL2 is built on Google's CEL (Common Expression Language) and evaluates against
Kubeshark's protobuf-based `BaseEntry` structure.
## Most Commonly Used Variables
These are the variables you'll reach for in 90% of investigations:
| Variable | Type | What it's for |
|----------|------|---------------|
| `status_code` | int | HTTP response status (200, 404, 500) |
| `method` | string | HTTP method (GET, POST, PUT, DELETE) |
| `path` | string | URL path without query string |
| `dst.pod.namespace` | string | Where traffic is going (namespace) |
| `dst.service.name` | string | Where traffic is going (service) |
| `src.pod.name` | string | Where traffic comes from (pod) |
| `elapsed_time` | int | Request duration in microseconds |
| `dns_questions` | []string | DNS domains being queried |
| `namespaces` | []string | All namespaces involved (src + dst) |
## Network-Level Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `src.ip` | string | Source IP address | `"10.0.53.101"` |
| `dst.ip` | string | Destination IP address | `"192.168.1.1"` |
| `src.port` | int | Source port number | `43210` |
| `dst.port` | int | Destination port number | `8080` |
| `protocol` | string | Detected protocol type | `"HTTP"`, `"DNS"` |
## Identity and Metadata Variables
| Variable | Type | Description |
|----------|------|-------------|
| `id` | int | BaseEntry unique identifier (assigned by sniffer) |
| `node_id` | string | Node identifier (assigned by hub) |
| `index` | int | Entry index for stream uniqueness |
| `stream` | string | Stream identifier (hex string) |
| `timestamp` | timestamp | Event time (UTC), use with `timestamp()` function |
| `elapsed_time` | int | Response-request latency in microseconds |
| `worker` | string | Worker identifier |
## Cross-Reference Variables
| Variable | Type | Description |
|----------|------|-------------|
| `conn_id` | int | L7 to L4 connection cross-reference ID |
| `flow_id` | int | L7 to L4 flow cross-reference ID |
| `has_pcap` | bool | Whether PCAP data is available for this entry |
## Capture Source Variables
| Variable | Type | Description | Values |
|----------|------|-------------|--------|
| `capture_source` | string | Canonical capture source | `"unspecified"`, `"af_packet"`, `"ebpf"`, `"ebpf_tls"` |
| `capture_backend` | string | Backend family | `"af_packet"`, `"ebpf"` |
| `capture_source_code` | int | Numeric enum | 0=unspecified, 1=af_packet, 2=ebpf, 3=ebpf_tls |
| `capture` | map | Nested map access | `capture["source"]`, `capture["backend"]` |
## Protocol Detection Flags
Boolean variables indicating detected protocol. Use as first filter term for performance.
| Variable | Protocol | Variable | Protocol |
|----------|----------|----------|----------|
| `http` | HTTP/1.1, HTTP/2 | `redis` | Redis |
| `dns` | DNS | `kafka` | Kafka |
| `tls` | eBPF TLS interception | `amqp` | AMQP messaging |
| `tcp` | TCP transport | `ldap` | LDAP directory |
| `udp` | UDP transport | `ws` | WebSocket |
| `sctp` | SCTP streaming | `gql` | GraphQL (v1 or v2) |
| `icmp` | ICMP | `gqlv1` | GraphQL v1 only |
| `grpc` | gRPC (HTTP/2 sub-protocol) | `gqlv2` | GraphQL v2 only |
| `mongodb` | MongoDB | `mysql` | MySQL |
| `radius` | RADIUS auth | `diameter` | Diameter |
| | | `conn` | L4 connection tracking |
| `flow` | L4 flow tracking | `tcp_conn` | TCP connection tracking |
| `tcp_flow` | TCP flow tracking | `udp_conn` | UDP connection tracking |
| `udp_flow` | UDP flow tracking | | |
## HTTP Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `method` | string | HTTP method | `"GET"`, `"POST"`, `"PUT"`, `"DELETE"`, `"PATCH"` |
| `url` | string | Full URL path and query string | `"/api/users?id=123"` |
| `path` | string | URL path component (no query) | `"/api/users"` |
| `status_code` | int | HTTP response status code | `200`, `404`, `500` |
| `http_version` | string | HTTP protocol version | `"HTTP/1.1"`, `"HTTP/2"` |
| `query_string` | map[string]string | Parsed URL query parameters | `query_string["id"]``"123"` |
| `request.headers` | map[string]string | Request HTTP headers | `request.headers["content-type"]` |
| `response.headers` | map[string]string | Response HTTP headers | `response.headers["server"]` |
| `request.cookies` | map[string]string | Request cookies | `request.cookies["session"]` |
| `response.cookies` | map[string]string | Response cookies | `response.cookies["token"]` |
| `request_headers_size` | int | Request headers size in bytes | |
| `request_body_size` | int | Request body size in bytes | |
| `response_headers_size` | int | Response headers size in bytes | |
| `response_body_size` | int | Response body size in bytes | |
GraphQL requests have `gql` (or `gqlv1`/`gqlv2`) set to true and all HTTP
variables available.
**Example**: `http && method == "POST" && status_code >= 500 && path.contains("/api")`
## DNS Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `dns_questions` | []string | Question domain names (request + response) | `["example.com"]` |
| `dns_answers` | []string | Answer domain names | `["1.2.3.4"]` |
| `dns_question_types` | []string | Record types in questions | `["A"]`, `["AAAA"]`, `["CNAME"]` |
| `dns_request` | bool | Is DNS request message | |
| `dns_response` | bool | Is DNS response message | |
| `dns_request_length` | int | DNS request size in bytes (0 if absent) | |
| `dns_response_length` | int | DNS response size in bytes (0 if absent) | |
| `dns_total_size` | int | Sum of request + response sizes | |
Supported question types: A, AAAA, NS, CNAME, SOA, MX, TXT, SRV, PTR, ANY.
**Example**: `dns && dns_response && status_code != 0` (failed DNS lookups)
## TLS Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `tls` | bool | eBPF TLS interception (alias for `capture_source == "ebpf_tls"`) | |
| `tls_summary` | string | TLS handshake summary | `"ClientHello"`, `"ServerHello"` |
| `tls_info` | string | TLS connection details | `"TLS 1.3, AES-256-GCM"` |
| `tls_request_size` | int | TLS request size in bytes | |
| `tls_response_size` | int | TLS response size in bytes | |
| `tls_total_size` | int | Sum of request + response (computed if not provided) | |
## TCP Variables
| Variable | Type | Description |
|----------|------|-------------|
| `tcp` | bool | TCP payload detected |
| `tcp_method` | string | TCP method information |
| `tcp_payload` | bytes | Raw TCP payload data |
| `tcp_error_type` | string | TCP error type (empty if none) |
| `tcp_error_message` | string | TCP error message (empty if none) |
## UDP Variables
| Variable | Type | Description |
|----------|------|-------------|
| `udp` | bool | UDP payload detected |
| `udp_length` | int | UDP packet length |
| `udp_checksum` | int | UDP checksum value |
| `udp_payload` | bytes | Raw UDP payload data |
## SCTP Variables
| Variable | Type | Description |
|----------|------|-------------|
| `sctp` | bool | SCTP payload detected |
| `sctp_checksum` | int | SCTP checksum value |
| `sctp_chunk_type` | string | SCTP chunk type |
| `sctp_length` | int | SCTP chunk length |
## ICMP Variables
| Variable | Type | Description |
|----------|------|-------------|
| `icmp` | bool | ICMP payload detected |
| `icmp_type` | string | ICMP type code |
| `icmp_version` | int | ICMP version (4 or 6) |
| `icmp_length` | int | ICMP message length |
## WebSocket Variables
| Variable | Type | Description | Values |
|----------|------|-------------|--------|
| `ws` | bool | WebSocket payload detected | |
| `ws_opcode` | string | WebSocket operation code | `"text"`, `"binary"`, `"close"`, `"ping"`, `"pong"` |
| `ws_request` | bool | Is WebSocket request | |
| `ws_response` | bool | Is WebSocket response | |
| `ws_request_payload_data` | string | Request payload (safely truncated) | |
| `ws_request_payload_length` | int | Request payload length in bytes | |
| `ws_response_payload_length` | int | Response payload length in bytes | |
## Redis Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `redis` | bool | Redis payload detected | |
| `redis_type` | string | Redis command verb | `"GET"`, `"SET"`, `"DEL"`, `"HGET"` |
| `redis_command` | string | Full Redis command line | `"GET session:1234"` |
| `redis_key` | string | Key (truncated to 64 bytes) | `"session:1234"` |
| `redis_request_size` | int | Request size (0 if absent) | |
| `redis_response_size` | int | Response size (0 if absent) | |
| `redis_total_size` | int | Sum of request + response | |
**Example**: `redis && redis_type == "GET" && redis_key.startsWith("session:")`
## Kafka Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `kafka` | bool | Kafka payload detected | |
| `kafka_api_key` | int | Kafka API key number | 0=FETCH, 1=PRODUCE |
| `kafka_api_key_name` | string | Human-readable API operation | `"PRODUCE"`, `"FETCH"` |
| `kafka_client_id` | string | Kafka client identifier | `"payment-processor"` |
| `kafka_size` | int | Message size (request preferred, else response) | |
| `kafka_request` | bool | Is Kafka request | |
| `kafka_response` | bool | Is Kafka response | |
| `kafka_request_summary` | string | Request summary/topic | `"orders-topic"` |
| `kafka_request_size` | int | Request size (0 if absent) | |
| `kafka_response_size` | int | Response size (0 if absent) | |
**Example**: `kafka && kafka_api_key_name == "PRODUCE" && kafka_request_summary.contains("orders")`
## AMQP Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `amqp` | bool | AMQP payload detected | |
| `amqp_method` | string | AMQP method name | `"basic.publish"`, `"channel.open"` |
| `amqp_summary` | string | Operation summary | |
| `amqp_request` | bool | Is AMQP request | |
| `amqp_response` | bool | Is AMQP response | |
| `amqp_request_length` | int | Request length (0 if absent) | |
| `amqp_response_length` | int | Response length (0 if absent) | |
| `amqp_total_size` | int | Sum of request + response | |
## LDAP Variables
| Variable | Type | Description |
|----------|------|-------------|
| `ldap` | bool | LDAP payload detected |
| `ldap_type` | string | LDAP operation type (request preferred) |
| `ldap_summary` | string | Operation summary |
| `ldap_request` | bool | Is LDAP request |
| `ldap_response` | bool | Is LDAP response |
| `ldap_request_length` | int | Request length (0 if absent) |
| `ldap_response_length` | int | Response length (0 if absent) |
| `ldap_total_size` | int | Sum of request + response |
## RADIUS Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `radius` | bool | RADIUS payload detected | |
| `radius_code` | int | RADIUS code (request preferred) | |
| `radius_code_name` | string | Code name | `"Access-Request"` |
| `radius_request` | bool | Is RADIUS request | |
| `radius_response` | bool | Is RADIUS response | |
| `radius_request_authenticator` | string | Request authenticator (hex) | |
| `radius_request_length` | int | Request size (0 if absent) | |
| `radius_response_length` | int | Response size (0 if absent) | |
| `radius_total_size` | int | Sum of request + response | |
## Diameter Variables
| Variable | Type | Description |
|----------|------|-------------|
| `diameter` | bool | Diameter payload detected |
| `diameter_method` | string | Method name (request preferred) |
| `diameter_summary` | string | Operation summary |
| `diameter_request` | bool | Is Diameter request |
| `diameter_response` | bool | Is Diameter response |
| `diameter_request_length` | int | Request size (0 if absent) |
| `diameter_response_length` | int | Response size (0 if absent) |
| `diameter_total_size` | int | Sum of request + response |
## MongoDB Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `mongodb` | bool | MongoDB payload detected | |
| `mongodb_command` | string | Operation type | `"find"`, `"insert"`, `"update"`, `"delete"` |
| `mongodb_database` | string | Database name | `"mydb"` |
| `mongodb_collection` | string | Collection name | `"users"` |
| `mongodb_opcode` | string | Operation opcode name | |
| `mongodb_request_size` | int | Request size in bytes | |
| `mongodb_response_size` | int | Response size in bytes | |
| `mongodb_total_size` | int | Combined request + response size | |
| `mongodb_success` | bool | Operation success status | |
| `mongodb_error_code` | int | Error code | |
| `mongodb_error_message` | string | Error description | |
| `mongodb_error_code_name` | string | Named error code | |
**Example**: `mongodb && mongodb_command == "find" && mongodb_collection == "users"`
## MySQL Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `mysql` | bool | MySQL payload detected | |
| `mysql_command` | string | SQL command name | `"COM_QUERY"`, `"COM_STMT_PREPARE"` |
| `mysql_query` | string | Full SQL query text | `"SELECT * FROM users"` |
| `mysql_database` | string | Active database name | `"orders_db"` |
| `mysql_statement_id` | int | Prepared statement identifier | |
| `mysql_request_size` | int | Request payload size in bytes | |
| `mysql_response_size` | int | Response payload size in bytes | |
| `mysql_total_size` | int | Combined request + response size | |
| `mysql_success` | bool | Response OK status | |
| `mysql_error_code` | int | MySQL error code | |
| `mysql_error_message` | string | Error description | |
**Example**: `mysql && mysql_query.contains("SELECT") && !mysql_success`
## gRPC Variables
gRPC is a sub-protocol of HTTP/2. When `grpc` is true, all HTTP variables are also available.
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `grpc` | bool | gRPC payload detected | |
| `grpc_method` | string | Trailing method name from gRPC :path | `"SayHello"` (from `/helloworld.Greeter/SayHello`) |
| `grpc_status` | int | gRPC status code from Grpc-Status trailer | `0`=OK, `5`=NOT_FOUND, `14`=UNAVAILABLE; `-1` on non-gRPC |
**Example**: `grpc && grpc_status != 0 && grpc_method.contains("Create")`
## L4 Connection Tracking Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `conn` | bool | Connection tracking entry | |
| `conn_state` | string | Connection state | `"open"`, `"in_progress"`, `"closed"` |
| `conn_local_pkts` | int | Packets from local peer | |
| `conn_local_bytes` | int | Bytes from local peer | |
| `conn_remote_pkts` | int | Packets from remote peer | |
| `conn_remote_bytes` | int | Bytes from remote peer | |
| `conn_l7_detected` | []string | L7 protocols detected on connection | `["HTTP", "TLS"]` |
| `conn_group_id` | int | Connection group identifier | |
**Example**: `conn && conn_state == "open" && conn_local_bytes > 1000000` (high-volume open connections)
## L4 Flow Tracking Variables
Flows extend connections with rate metrics (packets/bytes per second).
| Variable | Type | Description |
|----------|------|-------------|
| `flow` | bool | Flow tracking entry |
| `flow_state` | string | Flow state (`"open"`, `"in_progress"`, `"closed"`) |
| `flow_local_pkts` | int | Packets from local peer |
| `flow_local_bytes` | int | Bytes from local peer |
| `flow_remote_pkts` | int | Packets from remote peer |
| `flow_remote_bytes` | int | Bytes from remote peer |
| `flow_local_pps` | int | Local packets per second |
| `flow_local_bps` | int | Local bytes per second |
| `flow_remote_pps` | int | Remote packets per second |
| `flow_remote_bps` | int | Remote bytes per second |
| `flow_l7_detected` | []string | L7 protocols detected on flow |
| `flow_group_id` | int | Flow group identifier |
**Example**: `tcp_flow && flow_local_bps > 5000000` (high-bandwidth TCP flows)
## Kubernetes Variables
### Pod and Service (Directional)
| Variable | Type | Description |
|----------|------|-------------|
| `src.pod.name` | string | Source pod name |
| `src.pod.namespace` | string | Source pod namespace |
| `dst.pod.name` | string | Destination pod name |
| `dst.pod.namespace` | string | Destination pod namespace |
| `src.service.name` | string | Source service name |
| `src.service.namespace` | string | Source service namespace |
| `dst.service.name` | string | Destination service name |
| `dst.service.namespace` | string | Destination service namespace |
**Fallback behavior**: Pod namespace/name fields automatically fall back to
service data when pod info is unavailable. This means `dst.pod.namespace` works
even when only service-level resolution exists.
**Example**: `src.service.name == "api-gateway" && dst.pod.namespace == "production"`
### Summary Name and Namespace
| Variable | Type | Description |
|----------|------|-------------|
| `src.name` | string | Worker-enriched summary name of source (pod > service > dns > process) |
| `dst.name` | string | Worker-enriched summary name of destination |
| `src.namespace` | string | Source namespace with service fallback |
| `dst.namespace` | string | Destination namespace with service fallback |
### Aggregate Collections (Non-Directional)
| Variable | Type | Description |
|----------|------|-------------|
| `namespaces` | []string | All namespaces (src + dst, pod + service) |
| `pods` | []string | All pod names (src + dst) |
| `services` | []string | All service names (src + dst) |
### Labels and Annotations
| Variable | Type | Description |
|----------|------|-------------|
| `local_labels` | map[string]string | Kubernetes labels of local peer |
| `local_annotations` | map[string]string | Kubernetes annotations of local peer |
| `remote_labels` | map[string]string | Kubernetes labels of remote peer |
| `remote_annotations` | map[string]string | Kubernetes annotations of remote peer |
Use `map_get(local_labels, "key", "default")` for safe access that won't error
on missing keys.
**Example**: `map_get(local_labels, "app", "") == "checkout" && "production" in namespaces`
### Node Information
| Variable | Type | Description |
|----------|------|-------------|
| `node` | map | Nested: `node["name"]`, `node["ip"]` |
| `node_name` | string | Node name (flat alias) |
| `node_ip` | string | Node IP (flat alias) |
| `local_node_name` | string | Node name of local peer |
| `remote_node_name` | string | Node name of remote peer |
### Process Information
| Variable | Type | Description |
|----------|------|-------------|
| `local_process_name` | string | Process name on local peer |
| `remote_process_name` | string | Process name on remote peer |
### DNS Resolution
| Variable | Type | Description |
|----------|------|-------------|
| `src.dns` | string | DNS resolution of source IP |
| `dst.dns` | string | DNS resolution of destination IP |
| `dns_resolutions` | []string | All DNS resolutions (deduplicated) |
### Resolution Status
| Variable | Type | Values |
|----------|------|--------|
| `local_resolution_status` | string | `""` (resolved), `"no_node_mapping"`, `"rpc_error"`, `"rpc_empty"`, `"cache_miss"`, `"queue_full"` |
| `remote_resolution_status` | string | Same as above |
## Default Values
When a variable is not present in an entry, KFL2 uses these defaults:
| Type | Default |
|------|---------|
| string | `""` |
| int | `0` |
| bool | `false` |
| list | `[]` |
| map | `{}` |
| bytes | `[]` |
## Protocol Variable Precedence
For protocols with request/response pairs (Kafka, RADIUS, Diameter), merged
fields prefer the **request** side. If no request exists, the response value
is used. Size totals are always computed as `request_size + response_size`.
## CEL Language Features
KFL2 supports the full CEL specification:
- **Short-circuit evaluation**: `&&` stops on first false, `||` stops on first true
- **Ternary**: `condition ? value_if_true : value_if_false`
- **Regex**: `str.matches("pattern")` uses RE2 syntax
- **Type coercion**: Timestamps require `timestamp()`, durations require `duration()`
- **Null safety**: Use `in` operator or `map_get()` before accessing map keys
For the full CEL specification, see the
[CEL Language Definition](https://github.com/google/cel-spec/blob/master/doc/langdef.md).

444
skills/network-rca/SKILL.md Normal file
View File

@@ -0,0 +1,444 @@
---
name: network-rca
description: >
Kubernetes network root cause analysis skill powered by Kubeshark MCP. Use this skill
whenever the user wants to investigate past incidents, perform retrospective traffic
analysis, take or manage traffic snapshots, extract PCAPs, dissect L7 API calls from
historical captures, compare traffic patterns over time, detect drift or anomalies
between snapshots, or do any kind of forensic network analysis in Kubernetes.
Also trigger when the user mentions snapshots, raw capture, PCAP extraction,
traffic replay, postmortem analysis, "what happened yesterday/last week",
root cause analysis, RCA, cloud snapshot storage, snapshot dissection, or KFL filters
for historical traffic. Even if the user just says "figure out what went wrong"
or "compare today's traffic to yesterday" in a Kubernetes context, use this skill.
---
# Network Root Cause Analysis with Kubeshark MCP
You are a Kubernetes network forensics specialist. Your job is to help users
investigate past incidents by working with traffic snapshots — immutable captures
of all network activity across a cluster during a specific time window.
Kubeshark is a search engine for network traffic. Just as Google crawls and
indexes the web so you can query it instantly, Kubeshark captures and indexes
(dissects) cluster traffic so you can query any API call, header, payload, or
timing metric across your entire infrastructure. Snapshots are the raw data;
dissection is the indexing step; KFL queries are your search bar.
Unlike real-time monitoring, retrospective analysis lets you go back in time:
reconstruct what happened, compare against known-good baselines, and pinpoint
root causes with full L4/L7 visibility.
## Timezone Handling
All timestamps presented to the user **must use the local timezone** of the environment
where the agent is running. Users think in local time ("this happened around 3pm"), and
UTC-only output adds friction during incident response when speed matters.
### Rules
1. **Detect the local timezone** at the start of every investigation. Use the system
clock or environment (e.g., `date +%Z` or equivalent) to determine the timezone.
2. **Present local time as the primary reference** in all output — summaries, event
correlations, time-range references, and tables.
3. **Show UTC in parentheses** for clarity, e.g., `15:03:22 IST (12:03:22 UTC)`.
4. **Convert tool responses** — Kubeshark MCP tools return timestamps in UTC. Always
convert these to local time before presenting to the user.
5. **Use local time in natural language** — when describing events, say "the spike at
3:23 PM" not "the spike at 12:23 UTC".
### Snapshot Creation
When creating snapshots, Kubeshark MCP tools accept UTC timestamps. Convert the user's
local time references to UTC before passing them to tools like `create_snapshot` or
`export_snapshot_pcap`. Confirm the converted window with the user if there's any
ambiguity.
## Prerequisites
Before starting any analysis, verify the environment is ready.
### Kubeshark MCP Health Check
Confirm the Kubeshark MCP is accessible and tools are available. Look for tools
like `list_api_calls`, `list_l4_flows`, `create_snapshot`, etc.
**Tool**: `check_kubeshark_status`
If tools like `list_api_calls` or `list_l4_flows` are missing from the response,
something is wrong with the MCP connection. Guide the user through setup
(see Setup Reference at the bottom).
### Raw Capture Must Be Enabled
Retrospective analysis depends on raw capture — Kubeshark's kernel-level (eBPF)
packet recording that stores traffic at the node level. Without it, snapshots
have nothing to work with.
Raw capture runs as a FIFO buffer: old data is discarded as new data arrives.
The buffer size determines how far back you can go. Larger buffer = wider
snapshot window.
```yaml
tap:
capture:
raw:
enabled: true
storageSize: 10Gi # Per-node FIFO buffer
```
If raw capture isn't enabled, inform the user that retrospective analysis
requires it and share the configuration above.
### Snapshot Storage
Snapshots are assembled on the Hub's storage, which is ephemeral by default.
For serious forensic work, persistent storage is recommended:
```yaml
tap:
snapshots:
local:
storageClass: gp2
storageSize: 1000Gi
```
## Core Workflow
Every investigation starts with a snapshot. After that, you choose one of two
investigation routes depending on your goal:
1. **Determine time window** — When did the issue occur? Use `get_data_boundaries`
to see what raw capture data is available.
2. **Create or locate a snapshot** — Either take a new snapshot covering the
incident window, or find an existing one with `list_snapshots`.
3. **Choose your investigation route** — PCAP or Dissection (see below).
### Choosing the Right Route
| | PCAP Route | Dissection Route |
|---|---|---|
| **Speed** | Immediate — no indexing needed | Takes time to index |
| **Filtering** | Nodes, time window, BPF filters | Kubernetes & API-level (pods, labels, paths, status codes) |
| **Output** | Cluster-wide PCAP files | Structured query results |
| **Investigation by** | Human (Wireshark) | AI agent or human (queryable database) |
| **Best for** | Compliance, sharing with network teams, Wireshark deep-dives | Root cause analysis, API-level debugging, automated investigation |
Both routes are valid and complementary. Use PCAP when you need raw packets
for human analysis or compliance. Use Dissection when you want an AI agent
to search and analyze traffic programmatically.
**Default to Dissection.** Unless the user explicitly asks for a PCAP file or
Wireshark export, assume Dissection is needed. Any question about workloads,
APIs, services, pods, error rates, latency, or traffic patterns requires
dissected data.
## Snapshot Operations
Both routes start here. A snapshot is an immutable freeze of all cluster traffic
in a time window.
### Check Data Boundaries
**Tool**: `get_data_boundaries`
Check what raw capture data exists across the cluster. You can only create
snapshots within these boundaries — data outside the window has been rotated
out of the FIFO buffer.
**Example response** (raw tool output is in UTC — convert to local time before presenting):
```
Cluster-wide:
Oldest: 2026-03-14 18:12:34 IST (16:12:34 UTC)
Newest: 2026-03-14 20:05:20 IST (18:05:20 UTC)
Per node:
┌─────────────────────────────┬───────────────────────────────┬───────────────────────────────┐
│ Node │ Oldest │ Newest │
├─────────────────────────────┼───────────────────────────────┼───────────────────────────────┤
│ ip-10-0-25-170.ec2.internal │ 18:12:34 IST (16:12:34 UTC) │ 20:03:39 IST (18:03:39 UTC) │
│ ip-10-0-32-115.ec2.internal │ 18:13:45 IST (16:13:45 UTC) │ 20:05:20 IST (18:05:20 UTC) │
└─────────────────────────────┴───────────────────────────────┴───────────────────────────────┘
```
If the incident falls outside the available window, the data has been rotated
out. Suggest increasing `storageSize` for future coverage.
### Create a Snapshot
**Tool**: `create_snapshot`
Specify nodes (or cluster-wide) and a time window within the data boundaries.
Snapshots include raw capture files, Kubernetes pod events, and eBPF cgroup events.
Snapshots take time to build. Check status with `get_snapshot` — wait until
`completed` before proceeding with either route.
### List Existing Snapshots
**Tool**: `list_snapshots`
Shows all snapshots on the local Hub, with name, size, status, and node count.
### Cloud Storage
Snapshots on the Hub are ephemeral. Cloud storage (S3, GCS, Azure Blob)
provides long-term retention. Snapshots can be downloaded to any cluster
with Kubeshark — not necessarily the original one.
**Check cloud status**: `get_cloud_storage_status`
**Upload to cloud**: `upload_snapshot_to_cloud`
**Download from cloud**: `download_snapshot_from_cloud`
---
## Route 1: PCAP
The PCAP route does **not** require dissection. It works directly with the raw
snapshot data to produce filtered, cluster-wide PCAP files. Use this route when:
- You need raw packets for Wireshark analysis
- You're sharing captures with network teams
- You need evidence for compliance or audit
- A human will perform the investigation (not an AI agent)
### Filtering a PCAP
**Tool**: `export_snapshot_pcap`
Filter the snapshot down to what matters using:
- **Nodes** — specific cluster nodes only
- **Time** — sub-window within the snapshot
- **BPF filter** — standard Berkeley Packet Filter syntax (e.g., `host 10.0.53.101`,
`port 8080`, `net 10.0.0.0/16`)
These filters are combinable — select specific nodes, narrow the time range,
and apply a BPF expression all at once.
### Workload-to-BPF Workflow
When you know the workload names but not their IPs, resolve them from the
snapshot's metadata. Snapshots preserve pod-to-IP mappings from capture time,
so resolution is accurate even if pods have been rescheduled since.
**Tool**: `list_workloads`
Use `list_workloads` with `name` + `namespace` for a singular lookup (works
live and against snapshots), or with `snapshot_id` + filters for a broader
scan.
**Example workflow — singular lookup** — extract PCAP for specific workloads:
1. Resolve IPs: `list_workloads` with `name: "orders-594487879c-7ddxf"`, `namespace: "prod"` → IPs: `["10.0.53.101"]`
2. Resolve IPs: `list_workloads` with `name: "payment-service-6b8f9d-x2k4p"`, `namespace: "prod"` → IPs: `["10.0.53.205"]`
3. Build BPF: `host 10.0.53.101 or host 10.0.53.205`
4. Export: `export_snapshot_pcap` with that BPF filter
**Example workflow — filtered scan** — extract PCAP for all workloads
matching a pattern in a snapshot:
1. List workloads: `list_workloads` with `snapshot_id`, `namespaces: ["prod"]`,
`name_regex: "payment.*"` → returns all matching workloads with their IPs
2. Collect all IPs from the response
3. Build BPF: `host 10.0.53.205 or host 10.0.53.210 or ...`
4. Export: `export_snapshot_pcap` with that BPF filter
This gives you a cluster-wide PCAP filtered to exactly the workloads involved
in the incident — ready for Wireshark or long-term storage.
### IP-to-Workload Resolution
When you have an IP address (e.g., from a PCAP or L4 flow) and need to
identify the workload behind it:
**Tool**: `list_ips`
Use `list_ips` with `ip` for a singular lookup (works live and against
snapshots), or with `snapshot_id` + filters for a broader scan.
**Example — singular lookup**: `list_ips` with `ip: "10.0.53.101"`,
`snapshot_id: "snap-abc"` → returns pod/service identity for that IP.
**Example — filtered scan**: `list_ips` with `snapshot_id: "snap-abc"`,
`namespaces: ["prod"]`, `labels: {"app": "payment"}` → returns all IPs
associated with workloads matching those filters.
---
## Route 2: Dissection
The Dissection route indexes raw packets into structured L7 API calls, building
a queryable database from the snapshot. Use this route when:
- An AI agent is performing the investigation
- You need to search by Kubernetes context (pods, namespaces, labels, services)
- You need to search by API elements (paths, status codes, headers, payloads)
- You want structured responses you can analyze programmatically
- You need to drill into the payload of a specific API call
**KFL requirement**: The Dissection route uses KFL filters for all queries
(`list_api_calls`, `get_api_stats`, etc.). Before constructing any KFL filter,
load the KFL skill (`skills/kfl/`). KFL is statically typed — incorrect field
names or syntax will fail silently or error. If the KFL skill is not available,
suggest the user install it:
```bash
ln -s /path/to/kubeshark/skills/kfl ~/.claude/skills/kfl
```
**If the KFL skill cannot be loaded**, only use the exact filter examples shown
in this skill. Do not improvise or guess at field names, operators, or syntax.
KFL field names differ from what you might expect (e.g., `status_code` not
`response.status`, `src.pod.namespace` not `src.namespace`). Using incorrect
fields produces wrong results without warning.
### Dissection Is Required — Do Not Skip This
**Any question about workloads, Kubernetes resources, services, pods, namespaces,
or API calls requires dissection.** Only the PCAP route works without it. If the
user asks anything about traffic content, API behavior, error rates, latency,
or service-to-service communication, you **must** ensure dissection is active
before attempting to answer.
**Do not wait for dissection to complete on its own — it will not start by itself.**
Follow this sequence every time before using `list_api_calls`, `get_api_call`,
or `get_api_stats`:
1. **Check status**: Call `get_snapshot_dissection_status` (or `list_snapshot_dissections`)
to see if a dissection already exists for this snapshot.
2. **If dissection exists and is completed** — proceed with your query. No further
action needed.
3. **If dissection is in progress** — wait for it to complete, then proceed.
4. **If no dissection exists** — you **must** call `start_snapshot_dissection` to
trigger it. Then monitor progress with `get_snapshot_dissection_status` until
it completes.
Never assume dissection is running. Never wait for a dissection that was not started.
The agent is responsible for triggering dissection when it is missing.
**Tool**: `start_snapshot_dissection`
Dissection takes time proportional to snapshot size — it parses every packet,
reassembles streams, and builds the index. After completion, these tools
become available:
- `list_api_calls` — Search API transactions with KFL filters
- `get_api_call` — Drill into a specific call (headers, body, timing, payload)
- `get_api_stats` — Aggregated statistics (throughput, error rates, latency)
### Every Question Is a Query
**Every user prompt that involves APIs, workloads, services, pods, namespaces,
or Kubernetes semantics should translate into a `list_api_calls` call with an
appropriate KFL filter.** Do not answer from memory or prior results — always
run a fresh query that matches what the user is asking.
Examples of user prompts and the queries they should trigger:
| User says | Action |
|---|---|
| "Show me all 500 errors" | `list_api_calls` with KFL: `http && status_code == 500` |
| "What's hitting the payment service?" | `list_api_calls` with KFL: `dst.service.name == "payment-service"` |
| "Any DNS failures?" | `list_api_calls` with KFL: `dns && status_code != 0` |
| "Show traffic from namespace prod to staging" | `list_api_calls` with KFL: `src.pod.namespace == "prod" && dst.pod.namespace == "staging"` |
| "What are the slowest API calls?" | `list_api_calls` with KFL: `http && elapsed_time > 5000000` |
The user's natural language maps to KFL. Your job is to translate intent into
the right filter and run the query — don't summarize old results or speculate
without fresh data.
### Investigation Strategy
Start broad, then narrow:
1. `get_api_stats` — Get the overall picture: error rates, latency percentiles,
throughput. Look for spikes or anomalies.
2. `list_api_calls` filtered by error codes (4xx, 5xx) or high latency — find
the problematic transactions.
3. `get_api_call` on specific calls — inspect headers, bodies, timing, and
full payload to understand what went wrong.
4. Use KFL filters to slice by namespace, service, protocol, or any combination.
**Example `list_api_calls` response** (filtered to `http && status_code >= 500`,
timestamps converted from UTC to local):
```
┌──────────────────────────────────────────┬────────┬──────────────────────────┬────────┬───────────┐
│ Timestamp │ Method │ URL │ Status │ Elapsed │
├──────────────────────────────────────────┼────────┼──────────────────────────┼────────┼───────────┤
│ 2026-03-14 19:23:45 IST (17:23:45 UTC) │ POST │ /api/v1/orders/charge │ 503 │ 12,340 ms │
│ 2026-03-14 19:23:46 IST (17:23:46 UTC) │ POST │ /api/v1/orders/charge │ 503 │ 11,890 ms │
│ 2026-03-14 19:23:48 IST (17:23:48 UTC) │ GET │ /api/v1/inventory/check │ 500 │ 8,210 ms │
│ 2026-03-14 19:24:01 IST (17:24:01 UTC) │ POST │ /api/v1/payments/process │ 502 │ 30,000 ms │
└──────────────────────────────────────────┴────────┴──────────────────────────┴────────┴───────────┘
Src: api-gateway (prod) → Dst: payment-service (prod)
```
Use the pattern of repeated failures and high latency to identify the failing
service chain, then drill into individual calls with `get_api_call`.
### KFL Filters for Dissected Traffic
Layer filters progressively when investigating:
```
// Step 1: Protocol + namespace
http && dst.pod.namespace == "production"
// Step 2: Add error condition
http && dst.pod.namespace == "production" && status_code >= 500
// Step 3: Narrow to service
http && dst.pod.namespace == "production" && status_code >= 500 && dst.service.name == "payment-service"
// Step 4: Narrow to endpoint
http && dst.pod.namespace == "production" && status_code >= 500 && dst.service.name == "payment-service" && path.contains("/charge")
```
Other common RCA filters:
```
dns && dns_response && status_code != 0 // Failed DNS lookups
src.service.namespace != dst.service.namespace // Cross-namespace traffic
http && elapsed_time > 5000000 // Slow transactions (> 5s)
conn && conn_state == "open" && conn_local_bytes > 1000000 // High-volume connections
```
---
## Combining Both Routes
The two routes are complementary. A common pattern:
1. Start with **Dissection** — let the AI agent search and identify the root cause
2. Once you've pinpointed the problematic workloads, use `list_workloads`
to get their IPs (singular lookup by name+namespace, or filtered scan
by namespace/regex/labels against the snapshot)
3. Switch to **PCAP** — export a filtered PCAP of just those workloads for
Wireshark deep-dive, sharing with the network team, or compliance archival
## Use Cases
### Post-Incident RCA
1. Identify the incident time window from alerts, logs, or user reports
2. Check `get_data_boundaries` — is the window still in raw capture?
3. `create_snapshot` covering the incident window (add 15 minutes buffer)
4. **Dissection route**: `start_snapshot_dissection``get_api_stats`
`list_api_calls``get_api_call` → follow the dependency chain
5. **PCAP route**: `list_workloads``export_snapshot_pcap` with BPF →
hand off to Wireshark or archive
### Other Use Cases
- **Trend analysis** — Take snapshots at regular intervals and compare
`get_api_stats` across them to detect latency drift, error rate changes,
or new service-to-service connections.
- **Forensic preservation** — `create_snapshot` + `upload_snapshot_to_cloud`
for immutable, long-term evidence. Downloadable to any cluster months later.
- **Production-to-local replay** — Upload a production snapshot to cloud,
download it on a local KinD cluster, and investigate safely.
## Setup Reference
For CLI installation, MCP configuration, verification, and troubleshooting,
see `references/setup.md`.

View File

@@ -0,0 +1,70 @@
# Kubeshark MCP Setup Reference
## Installing the CLI
**Homebrew (macOS)**:
```bash
brew install kubeshark
```
**Linux**:
```bash
sh <(curl -Ls https://kubeshark.com/install)
```
**From source**:
```bash
git clone https://github.com/kubeshark/kubeshark
cd kubeshark && make
```
## MCP Configuration
**Claude Desktop / Cowork** (`claude_desktop_config.json`):
```json
{
"mcpServers": {
"kubeshark": {
"command": "kubeshark",
"args": ["mcp"]
}
}
}
```
**Claude Code (CLI)**:
```bash
claude mcp add kubeshark -- kubeshark mcp
```
**Without kubectl access** (direct URL mode):
```json
{
"mcpServers": {
"kubeshark": {
"command": "kubeshark",
"args": ["mcp", "--url", "https://kubeshark.example.com"]
}
}
}
```
```bash
# Claude Code equivalent:
claude mcp add kubeshark -- kubeshark mcp --url https://kubeshark.example.com
```
## Verification
- Claude Code: `/mcp` to check connection status
- Terminal: `kubeshark mcp --list-tools`
- Cluster: `kubectl get pods -l app=kubeshark-hub`
## Troubleshooting
- **Binary not found** → Install via Homebrew or the install script above
- **Connection refused** → Deploy Kubeshark first: `kubeshark tap`
- **No L7 data** → Check `get_dissection_status` and `enable_dissection`
- **Snapshot creation fails** → Verify raw capture is enabled in Kubeshark config
- **Empty snapshot** → Check `get_data_boundaries` — the requested window may
fall outside available data