Compare commits

30 Commits

Author SHA1 Message Date
Alon Girmonsky
c1dbbd2386 Replace HTML embeds with image placeholders in security-audit README 2026-05-19 22:37:24 -07:00
Alon Girmonsky
35dea1bc9a Add security-audit skill README with demo session and sample report 2026-05-19 22:13:09 -07:00
Alon Girmonsky
f97866f747 🔖 Release v53.3.0 (#1937)
* 🔖 Bump the Helm chart version to 53.3.0

* 🙈 Add .claude/ to .gitignore

* 🔥 Remove .claude/ and RELEASE_NOTES_v53.2.5.md

* Revert changes to release-tag.yml

---------

Co-authored-by: Alon Girmonsky <alongir@Alons-MacBook-Air.local>
Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-05-19 02:00:17 -07:00
Ilya Gavrilov
b2a0fb0cea Add L7 data boundaries MCP tool, API endpoint and frontend LIVE filter button (#1935)
Co-authored-by: Alon Girmonsky <1990761+alongir@users.noreply.github.com>
2026-05-18 22:53:01 -07:00
Alon Girmonsky
2475f6e260 Add PostgreSQL protocol support to KFL skill (#1936)
Add PostgreSQL filter examples, variable reference table, and protocol
table entry. Notes the key difference that postgresql_error_code is a
string (SQLSTATE) unlike MySQL's int error code.

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-05-18 02:25:04 -07:00
Alon Girmonsky
cd13d8f89e Add security-audit skill for MITRE ATT&CK-based threat detection (#1934)
New skill that guides systematic 8-phase network security audits across
MITRE ATT&CK tactics using snapshot-based traffic analysis. Includes
threat catalog, KFL security filter reference, and report template.

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-05-15 10:16:37 -07:00
Alon Girmonsky
ad9dfbf5f9 Add install skill for Kubeshark deployment guidance (#1933)
* Add install skill for Kubeshark deployment guidance

New skill that helps users install and configure Kubeshark with a clear
CLI vs Helm decision tree, opinionated production defaults, and
platform-specific storage class recommendations.

* Add user-invocable flag to install skill frontmatter

* Add backup/overwrite check guidance for ~/.kubeshark/ config files

---------

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-05-15 08:31:33 -07:00
Alon Girmonsky
ed1d2e1a4d Enable tlsx dissector by default (#1928)
Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-05-14 11:40:02 -07:00
Volodymyr Stoiko
7b5954ea00 helm: grant hub tokenreviews and label worker pods for internal auth (#1926)
* helm: grant hub tokenreviews and pass trusted controllers

Adds RBAC for hub to call the authentication.k8s.io/v1 TokenReview
endpoint, used by the new internalauth middleware to validate projected
ServiceAccountTokens presented by in-cluster gRPC callers.

Adds tap.internalAuth.trustedControllers value (empty by default),
threaded through to hub's -trusted-controllers flag as a CSV. Listing
a controller here lets pods owned by it authenticate to hub via the
projected SA token (audience kubeshark-hub). Hub-spawned Jobs are
always trusted regardless of this list. Hub matches OwnerReferences
by name AND UID, so a name-only forgery does not grant trust.

Sub-issue of kubeshark/hub#656.

* helm: inline trusted controllers in hub deployment template

The chart already knows its own controller names (worker DaemonSet
metadata.name is the literal "kubeshark-worker-daemon-set" in
09-worker-daemon-set.yaml). Pasting the same literal into a user-facing
tap.internalAuth.trustedControllers value adds a step without buying
anything; if the worker DS is renamed, the deployment template would have
to change in lockstep regardless.

Drop the values knob, render the flag unconditionally with the literal
worker DS name (matching the convention used elsewhere in this chart,
e.g. the hub deployment's {{ include "kubeshark.name" . }}-hub).

* helm: drop redundant comment on tokenreviews RBAC

* helm: drop -trusted-controllers flag (no caller today)

The flag was wiring forward-prep for a hypothetical worker->hub gRPC
caller from the DaemonSet. Hub-spawned Jobs (dissection-job) are
admitted via internalauth.RegisterSpawnedJob, not via this flag.
Re-add when an actual DaemonSet-deployed caller materializes.

* helm: label worker DS pods for hub internal auth

Worker pods don't call hub gRPC today, but pre-labeling the DS pod
template means a future worker->hub gRPC caller is one PR (worker-side)
away from working — no chart change required. Matches the generic
label-driven trust model in hub#783.

* helm: rename trust label to kubeshark.io/internal-auth

Matches the hub rename. Generic name so the same label can mark pods
trusted by future kubeshark services beyond hub.
2026-05-13 10:53:20 -07:00
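
The name-and-UID comparison described in the first bullet of this commit can be sketched in Go. This is a minimal illustration, not hub's code: the `ownerRef` and `controller` types and the `ownedByTrusted` helper are hypothetical, and the later bullets of the same commit replaced the list-based approach with a label on the worker pod template.

```go
package main

import "fmt"

// ownerRef holds only the OwnerReference fields this sketch needs.
// Hypothetical type; the real Kubernetes type carries more fields.
type ownerRef struct {
	Kind string
	Name string
	UID  string
}

// controller is a hypothetical record of a controller that is trusted.
type controller struct {
	Name string
	UID  string
}

// ownedByTrusted reports whether any owner reference matches a trusted
// controller on both name and UID. A reference that copies only the
// name does not match.
func ownedByTrusted(refs []ownerRef, trusted []controller) bool {
	for _, r := range refs {
		for _, c := range trusted {
			if r.Name == c.Name && r.UID == c.UID {
				return true
			}
		}
	}
	return false
}

func main() {
	trusted := []controller{{Name: "kubeshark-worker-daemon-set", UID: "uid-1"}}
	genuine := []ownerRef{{Kind: "DaemonSet", Name: "kubeshark-worker-daemon-set", UID: "uid-1"}}
	forged := []ownerRef{{Kind: "DaemonSet", Name: "kubeshark-worker-daemon-set", UID: "uid-x"}}
	fmt.Println(ownedByTrusted(genuine, trusted), ownedByTrusted(forged, trusted))
}
```

Comparing the UID as well as the name is what makes a name-only forgery fail, since a pod author can choose an owner name but not the UID the API server assigned to the real controller.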
Volodymyr Stoiko
8186b7891b Authz refactoring (helm chart + CLI) (#1921)
* Migrate auth.saml.roles to unified auth.roles

Follows the hub-side introduction of the backend-neutral AUTH_ROLES /
AUTH_ROLES_CLAIM / AUTH_DEFAULT_ROLE config (hub commit 51177bcb).
CLI and Helm chart now surface the unified location:

  tap.auth.roles         — map of role -> permissions (shared SAML/OIDC)
  tap.auth.rolesClaim    — token/assertion claim name carrying roles
  tap.auth.defaultRole   — fallback role for authenticated users with
                           no matching role in their token

Helm ConfigMap template emits AUTH_ROLES / AUTH_ROLES_CLAIM /
AUTH_DEFAULT_ROLE and no longer emits AUTH_SAML_ROLES or
AUTH_SAML_ROLE_ATTRIBUTE. Hub's back-compat fallback still reads those
keys from any existing ConfigMap that hasn't been helm-upgraded.

Legacy struct fields (SamlConfig.Roles, SamlConfig.RoleAttribute) stay
in place so existing values.yaml files with auth.saml.roles still parse
without errors, but the CLI and the chart ignore them. Follow-up release
can remove the struct fields once telemetry confirms migration.

Breaking for users with customized auth.saml.roles in their values.yaml
— the customization is masked by the new default auth.roles.admin and
must be migrated to auth.roles for the custom permissions to take
effect. Documented in the chart README and release notes.

Part of authz-refactoring (Step 2 of hub-oidc-rbac.md, CLI side).

* Remove legacy

* Align CLI + Helm chart with hub AUTH_TYPE rename

Follows hub commit 11564fef. The canonical AUTH_TYPE is now `oidc` for
generic OIDC; `dex` is a permanent alias; `descope` is a new explicit
label. This change surfaces the new vocabulary in the CLI config struct
and the Helm chart, and renames the nested `auth.dexOidc` values.yaml
field to `auth.oidc` for consistency.

Helm chart:
- 12-config-map.yaml: AUTH_OIDC_* keys now read `.Values.tap.auth.oidc.*`
  instead of `auth.dexOidc.*`. The cloud-license override that forced
  AUTH_TYPE=default unless the admin picked `dex` now accepts `oidc` too.
- 13-secret.yaml: OIDC_CLIENT_ID / OIDC_CLIENT_SECRET read from
  `auth.oidc.*` (was `auth.dexOidc.*`).
- 06-front-deployment.yaml: REACT_APP_AUTH_ENABLED / REACT_APP_AUTH_TYPE
  conditionals accept both `oidc` and `dex` where they previously only
  matched `dex`.
- values.yaml: comment on `tap.auth.type` lists valid values and flags
  the breaking change.
- README.md: `tap.auth.type` row lists valid values. All `dexOidc`
  references renamed to `oidc`. Sample values.yaml blocks now show
  `type: oidc` as the canonical form.

CLI:
- config/configStructs/tapConfig.go: AuthConfig.Type documented with the
  full list of valid values and the migration hint.

Breaking changes (repeated in release notes):
1. `tap.auth.type: oidc` now routes to the generic OIDC middleware
   (previously Descope). Switch to `tap.auth.type: descope` or `default`
   if you were using `oidc` for Descope.
2. `tap.auth.dexOidc.*` values are no longer read. Rename to
   `tap.auth.oidc.*`. No fallback.
3. `tap.auth.type: dex` continues to work — permanent alias of `oidc`.
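
The alias rule in the list above can be sketched as a small normalization step. The helper name `canonicalAuthType` is hypothetical and is not taken from the hub or CLI source; it only illustrates that `dex` resolves to `oidc` while other values pass through unchanged.

```go
package main

import "fmt"

// canonicalAuthType maps the permanent `dex` alias onto `oidc` and
// returns every other value as-is. Hypothetical helper for illustration.
func canonicalAuthType(t string) string {
	if t == "dex" {
		return "oidc"
	}
	return t
}

func main() {
	for _, t := range []string{"dex", "oidc", "descope"} {
		fmt.Printf("%s -> %s\n", t, canonicalAuthType(t))
	}
}
```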

Part of authz-refactoring (Step 4 of hub-oidc-rbac.md, CLI/Helm side).

* default kfl

* Authz Refactoring: Step 8: namespaces-list role filter

Align with hub PR kubeshark/hub#756. Per-role auth.roles[].filter (KFL)
is replaced by auth.roles[].namespaces (comma-separated list with "*",
literal, and glob semantics). Standalone tap.auth.defaultFilter knob
removed.

helm-chart/values.yaml
- admin role example uses namespaces: "*" instead of filter: "".
- Comment block explains the new namespaces semantics.
- defaultFilter: "" entry + accompanying comment block deleted.

helm-chart/templates/12-config-map.yaml
- AUTH_DEFAULT_FILTER ConfigMap entry removed (hub no longer reads it).

helm-chart/README.md
- tap.auth.defaultFilter row removed.
- tap.auth.roles default value example updated: filter: "" → namespaces: "*";
  description gains the per-role namespaces semantics legend.
2026-05-06 09:08:21 -07:00
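
The per-role `namespaces` value (a comma-separated list with "*", literal, and glob entries) might be evaluated along the following lines. This is a sketch under assumptions: the helper name `namespaceAllowed` is hypothetical, and shell-style matching via Go's `path.Match` stands in for whatever matcher hub actually uses.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// namespaceAllowed reports whether ns matches any entry of the
// comma-separated list. Entries are trimmed; empty entries are skipped.
// Assumption: glob entries follow path.Match syntax.
func namespaceAllowed(list, ns string) bool {
	for _, pattern := range strings.Split(list, ",") {
		pattern = strings.TrimSpace(pattern)
		if pattern == "" {
			continue
		}
		// A malformed pattern is treated as a non-match.
		if ok, err := path.Match(pattern, ns); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(namespaceAllowed("*", "default"))
	fmt.Println(namespaceAllowed("prod-*, staging", "prod-eu"))
	fmt.Println(namespaceAllowed("prod-*, staging", "dev"))
}
```

With this reading, `namespaces: "*"` on the admin role admits every namespace, which lines up with the old `filter: ""` default it replaces.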
Alon Girmonsky
ab81b0c3a7 🔖 Bump the Helm chart version to 53.2.5 (#1920)
Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-05-01 13:36:38 -07:00
Alon Girmonsky
9f5a1a41c0 fix(release-pr): sync bumped Chart.yaml to kubeshark.github.io (#1913)
* fix(release-pr): sync bumped Chart.yaml to kubeshark.github.io

The release-pr target was switching back to master (and pulling)
BEFORE copying helm-chart/ into ../kubeshark.github.io/charts/chart.
That reverted the working tree to the pre-bump Chart.yaml, so the
kubeshark.github.io PR shipped the previous version and the
chart-releaser action failed trying to recreate an existing tag.

Copy the bumped chart from the release/vX.Y.Z working tree, then
switch kubeshark back to master at the end of the target.

Also consolidate iterative robustness improvements: VERSION
validation, idempotent sibling-repo tagging, idempotent branch /
commit / push / PR creation, and a "nothing to commit" guard so
reruns of release-pr do not fail.

* refactor(release): split release-pr into three rerunnable targets

Before, release-pr did three things in one recipe: tag sibling
repos, create the kubeshark release PR, and create the helm chart
PR. If any step failed, the whole target had to be rerun, even for
the parts that had already succeeded, and some sub-steps (like
tagging worker/hub/front after a docker-image-only rebuild) had no
standalone entry point.

Split into:
  - release-siblings     : tag worker, hub, front
  - release-pr-kubeshark : bump Chart.yaml, build, open kubeshark PR
  - release-pr-helm      : sync chart to kubeshark.github.io, open helm PR
  - release-pr           : orchestrates all three in order

Each is idempotent and can be rerun independently. release-siblings
is now the canonical entry point for tagging sibling repos when
refreshing docker images without a full release.

release-pr-helm checks out release/v$(VERSION) (fetching from origin
if absent) before copying helm-chart/, so it has the bumped Chart.yaml
regardless of whether it runs right after release-pr-kubeshark or
days later in a separate invocation.

A shared _release-check-version prerequisite validates VERSION once
per target invocation.

* fix(release): make branch creation and push truly idempotent

Delete and recreate local release/helm branches instead of conditionally
checking out, and use --force-with-lease push to handle local/remote
divergence on reruns.

---------

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-05-01 10:07:20 -07:00
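
The VERSION validation mentioned in this commit is done in the Makefile with `grep` and `sed`. A rough Go equivalent of the same check is sketched below to show what the pattern accepts; the `chartVersion` helper is hypothetical and not part of the repository.

```go
package main

import (
	"fmt"
	"regexp"
)

// leadingSemver matches three dot-separated numeric fields at the start
// of the string, the same prefix the Makefile check looks for.
var leadingSemver = regexp.MustCompile(`^[0-9]+\.[0-9]+\.[0-9]+`)

// chartVersion returns the leading x.y.z part of v and whether v passed
// the check. Anything after the third numeric field is dropped.
func chartVersion(v string) (string, bool) {
	m := leadingSemver.FindString(v)
	return m, m != ""
}

func main() {
	for _, v := range []string{"53.2.4", "53.2.4.1", "53.2", "v53.2.4"} {
		chart, ok := chartVersion(v)
		fmt.Printf("%q -> %q %v\n", v, chart, ok)
	}
}
```

Because the pattern is anchored only at the start, a value with a suffix such as `53.2.4.1` still passes and is trimmed to `53.2.4` for the chart version.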
Alon Girmonsky
fef3e8fb05 Add PostgreSQL protocol configuration (#1919)
* Add MySQL protocol to default configuration

Closes #1915

* Add PostgreSQL protocol configuration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 12:59:11 +03:00
Alon Girmonsky
7ae81ccc4b Add MySQL protocol to default configuration (#1916)
Closes #1915

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-04-28 15:49:44 +03:00
Serhii Ponomarenko
27111e48d3 🔨 Create dashboard entries-limit helm value (#1914)
* 🔨 Create dashboard entries-limit helm value

* 🔨 Set default value for entries-limit env
2026-04-23 18:20:22 +03:00
Alon Girmonsky
863be8f47a 🔖 Bump the Helm chart version to 53.2.3 (#1912) 2026-04-20 16:39:25 +03:00
Serhii Ponomarenko
9e4059bc4d 🔨 Set nginx proxy-buffer directives (#1909) 2026-04-18 08:07:47 +03:00
Alon Girmonsky
f79885bd35 🔖 Release v53.2.2 (#1908)
* 🔖 Bump the Helm chart version to 53.2.2

* temp

* temp2

* revert back makefile
2026-04-14 01:21:58 -07:00
Volodymyr Stoiko
31129e570a Provide external volume for dissection job (#1905)
* Pass dissection storage configuration

* add dissection storage test

* Allow pvc management

* Use snapshot storage config as default for dissection storage config

---------

Co-authored-by: Alon Girmonsky <1990761+alongir@users.noreply.github.com>
2026-04-10 09:51:44 -07:00
theechofive
3a1ad64b4c fix: add subPathExpr to worker DaemonSet for shared persistent storage (#1901)
Co-authored-by: Volodymyr Stoiko <me@volodymyrstoiko.com>
Co-authored-by: Alon Girmonsky <1990761+alongir@users.noreply.github.com>
2026-04-10 09:23:31 -07:00
Alon Girmonsky
fa03da2fd4 Enable MongoDB protocol dissector (#1903)
Add mongodb to the enabled dissectors list and port mapping (27017)
in both Go config defaults and Helm chart values.

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 08:05:13 -07:00
stringsbuilder
4de0ac6abd refactor: replace Split in loops with more efficient SplitSeq and gofmt the code (#1888)
Signed-off-by: stringsbuilder <stringsbuilder@outlook.com>
Co-authored-by: Alon Girmonsky <1990761+alongir@users.noreply.github.com>
2026-04-06 21:07:50 -07:00
Alon Girmonsky
9b5ac2821f Network RCA skill: update resolution tools to list_workloads/list_ips (#1887)
Replace deprecated resolve_workload/resolve_ip references with the new
list_workloads and list_ips tools that support both singular lookup
(name+namespace or IP) and filtered scan (namespace/regex/label filters
against snapshots).

Ref: kubeshark/hub#687

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-04-06 12:40:34 -07:00
Alon Girmonsky
1ba6ed94e0 💄 Improve README with AI skills, KFL semantics, and cloud storage (#1892)
* 💄 Improve README with AI skills, KFL semantics image, and cloud storage

- Add AI Skills section with Network RCA and KFL skills, Claude Code plugin install
- Rename "Network Traffic Indexing" to "Query with API, Kubernetes, and Network Semantics" with new KFL semantics image showing how a single query combines all three layers
- Add cloud storage providers (S3, Azure Blob, GCS) and decrypted TLS to Traffic Retention section
- Update Features table: add AI Skills, KFL query language, cloud storage, delayed indexing

* 🔒 Add encrypted traffic visibility to README "What you can do" section

* 🎨 Update snapshots image in README

---------

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-04-02 18:38:13 -07:00
Alon Girmonsky
4695acb41e 🐛 Fix release-pr Makefile target cleanup and macOS sed compatibility (#1890)
- Fix macOS sed -i requiring empty backup extension argument
- Checkout master after creating kubeshark release PR
- Checkout master in kubeshark.github.io before and after creating helm PR
- Run all kubeshark.github.io operations in a single shell to avoid lost cd context

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-03-31 12:05:21 -07:00
Alon Girmonsky
b80723edfb 🔖 Bump the Helm chart version to 53.2.0 (#1889)
Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-03-31 11:30:42 -07:00
Alon Girmonsky
ddc2e57f12 Network RCA skill: use local timezone instead of UTC (#1880)
* Use local timezone instead of UTC in Network RCA skill output

Add a Timezone Handling section that instructs the agent to detect the
local timezone, present local time as the primary reference with UTC in
parentheses, and convert UTC tool responses before presenting to users.
Update all example timestamps to demonstrate the local+UTC format.

Closes #1879

* Ensure agent proactively starts dissection for workload/API queries

The agent was waiting for dissection to complete without ever starting it.
Add explicit instructions: check dissection status first, start it if
missing, and default to the Dissection route for any non-PCAP question.
Only PCAP-specific requests can skip dissection.

* Translate every API/Kubernetes question into a fresh list_api_calls query

Add "Every Question Is a Query" section: each user prompt with API or
Kubernetes semantics should map to a list_api_calls call with the
appropriate KFL filter. Includes examples of natural language to KFL
translation. Agent should never answer from memory or stale results.

---------

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-03-24 12:03:05 -07:00
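
The local-plus-UTC presentation described in the first bullet of this commit could be produced as follows. This is only a sketch of the format: the `localWithUTC` helper and the layout string are assumptions, and a fixed offset is used in place of real timezone detection.

```go
package main

import (
	"fmt"
	"time"
)

// stampLayout prints date, time, and zone abbreviation.
const stampLayout = "2006-01-02 15:04:05 MST"

// localWithUTC renders the instant unixSec as local time first, with the
// UTC equivalent in parentheses. The zone is passed as a name and a fixed
// offset in seconds east of UTC so the result does not depend on the host.
func localWithUTC(unixSec int64, zoneName string, offsetSec int) string {
	t := time.Unix(unixSec, 0)
	loc := time.FixedZone(zoneName, offsetSec)
	return fmt.Sprintf("%s (%s)", t.In(loc).Format(stampLayout), t.UTC().Format(stampLayout))
}

func main() {
	// The Unix epoch shown in a zone seven hours behind UTC.
	fmt.Println(localWithUTC(0, "PDT", -7*3600))
}
```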
Alon Girmonsky
e80fc3319b Revamp README descriptions and structure (#1881)
* Revamp README intro, sections, and descriptions

Rewrite the opening description to focus on indexing and querying.
Replace "What's captured" with actionable "What you can do" bullets.
Add port-forward step and ingress recommendation to Get Started.
Rename and tighten section descriptions: Network Data for AI Agents,
Network Traffic Indexing, Workload Dependency Map, Traffic Retention
& PCAP Export.

* Remove Raw Capture from features table
2026-03-23 08:33:27 -07:00
Volodymyr Stoiko
868b4c1f36 Verify hub/front pods are ready by conditions (#1864)
* Verify hub/front pods are ready by conditions

* log waiting for readiness

* proper sync

---------

Co-authored-by: Alon Girmonsky <1990761+alongir@users.noreply.github.com>
2026-03-21 17:33:48 -07:00
Serhii Ponomarenko
c63740ec45 🐛 Fix dissection-control front env logic (#1878) 2026-03-20 08:20:53 -07:00
32 changed files with 3003 additions and 295 deletions

3
.gitignore vendored

@@ -66,4 +66,5 @@ scripts/
kubeshark.yaml
# Claude Code
CLAUDE.md
CLAUDE.md
.claude/

119
Makefile

@@ -253,50 +253,111 @@ port-forward:
kubectl port-forward $$(kubectl get pods | awk '$$1 ~ /^$(POD_PREFIX)/' | awk 'END {print $$1}') $(SRC_PORT):$(DST_PORT)
release: ## Print release workflow instructions.
@echo "Release workflow (2 steps):"
@echo "Release workflow — each step is idempotent and can be rerun on its own:"
@echo ""
@echo " 1. make release-pr VERSION=x.y.z"
@echo " Tags sibling repos, bumps version, creates PRs"
@echo " (kubeshark + kubeshark.github.io helm chart)."
@echo " Review and merge both PRs manually."
@echo " 1. make release-siblings VERSION=x.y.z"
@echo " Tag worker, hub, front with vx.y.z. Also run standalone when"
@echo " rebuilding docker images without cutting a full release."
@echo ""
@echo " 2. (automatic) Tag is created when release PR merges."
@echo " Fallback: make release-tag VERSION=x.y.z"
@echo " 2. make release-pr-kubeshark VERSION=x.y.z"
@echo " Bump Helm Chart.yaml, build, open release PR on kubeshark."
@echo ""
@echo " 3. make release-pr-helm VERSION=x.y.z"
@echo " Sync helm-chart/ into kubeshark.github.io, open helm PR."
@echo " Requires release/vx.y.z branch (created by step 2)."
@echo ""
@echo " Shortcut: make release-pr VERSION=x.y.z runs 1 → 2 → 3."
@echo ""
@echo " After both PRs merge: tag is created automatically,"
@echo " or run: make release-tag VERSION=x.y.z"
release-pr: ## Step 1: Tag sibling repos, bump version, create release PR.
@cd ../worker && git checkout master && git pull && git tag -d v$(VERSION); git tag v$(VERSION) && git push origin --tags
@cd ../hub && git checkout master && git pull && git tag -d v$(VERSION); git tag v$(VERSION) && git push origin --tags
@cd ../front && git checkout master && git pull && git tag -d v$(VERSION); git tag v$(VERSION) && git push origin --tags
# Internal: validate VERSION before any release-* target runs.
_release-check-version:
@if [ -z "$(VERSION)" ]; then echo "ERROR: VERSION is required. Usage: make <target> VERSION=x.y.z"; exit 1; fi
@echo "$(VERSION)" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+' || { echo "ERROR: VERSION must be semver (e.g. 53.2.4)"; exit 1; }
release-siblings: _release-check-version ## Tag worker, hub, front with v$(VERSION). Idempotent; standalone for docker-image-only updates.
@for repo in worker hub front; do \
echo "==> $$repo: ensuring v$(VERSION) tag"; \
(cd ../$$repo && git checkout master && git pull) || exit 1; \
if (cd ../$$repo && git ls-remote --tags origin "refs/tags/v$(VERSION)" | grep -q .); then \
echo " v$(VERSION) already on origin — skipping"; \
else \
(cd ../$$repo && git tag -d v$(VERSION) 2>/dev/null; git tag v$(VERSION) && git push origin "refs/tags/v$(VERSION)") || exit 1; \
fi; \
done
release-pr-kubeshark: _release-check-version ## Bump Chart.yaml, build, open release PR on kubeshark.
@cd ../kubeshark && git checkout master && git pull
@sed -i "s/^version:.*/version: \"$(shell echo $(VERSION) | sed -E 's/^([0-9]+\.[0-9]+\.[0-9]+)\..*/\1/')\"/" helm-chart/Chart.yaml
@NEW=$$(echo $(VERSION) | sed -E 's/^([0-9]+\.[0-9]+\.[0-9]+).*/\1/'); \
CUR=$$(awk '/^version:/ {gsub(/"/,"",$$2); print $$2; exit}' helm-chart/Chart.yaml); \
if [ "$$CUR" != "$$NEW" ]; then \
sed -i '' "s/^version:.*/version: \"$$NEW\"/" helm-chart/Chart.yaml; \
else \
echo "Chart.yaml already at $$NEW"; \
fi
@$(MAKE) build VER=$(VERSION)
@if [ "$(shell uname)" = "Darwin" ]; then \
codesign --sign - --force --preserve-metadata=entitlements,requirements,flags,runtime ./bin/kubeshark__; \
fi
@$(MAKE) generate-helm-values && $(MAKE) generate-manifests
@if git show-ref --verify --quiet refs/heads/release/v$(VERSION); then \
git branch -D release/v$(VERSION); \
fi
@git checkout -b release/v$(VERSION)
@git add -A .
@git commit -m ":bookmark: Bump the Helm chart version to $(VERSION)"
@git push -u origin release/v$(VERSION)
@gh pr create --title ":bookmark: Release v$(VERSION)" \
--body "Automated release PR for v$(VERSION)." \
--base master \
--reviewer corest
@rm -rf ../kubeshark.github.io/charts/chart
@mkdir ../kubeshark.github.io/charts/chart
@cp -r helm-chart/ ../kubeshark.github.io/charts/chart/
@if ! git diff --cached --quiet; then \
git commit -m ":bookmark: Bump the Helm chart version to $(VERSION)"; \
else \
echo "nothing to commit"; \
fi
@git push --force-with-lease -u origin release/v$(VERSION)
@if gh pr view release/v$(VERSION) --json number >/dev/null 2>&1; then \
echo "PR already exists for release/v$(VERSION)"; \
else \
gh pr create --title ":bookmark: Release v$(VERSION)" \
--body "Automated release PR for v$(VERSION)." \
--base master \
--reviewer corest; \
fi
release-pr-helm: _release-check-version ## Sync helm-chart/ to kubeshark.github.io and open the helm PR. Requires release/v$(VERSION) branch (step 2).
@git fetch origin "refs/heads/release/v$(VERSION):refs/heads/release/v$(VERSION)" 2>/dev/null || true
@if ! git show-ref --verify --quiet refs/heads/release/v$(VERSION); then \
echo "ERROR: release/v$(VERSION) branch not found locally or on origin."; \
echo "Run 'make release-pr-kubeshark VERSION=$(VERSION)' first."; \
exit 1; \
fi
@git checkout release/v$(VERSION)
@cd ../kubeshark.github.io && git checkout master && git pull \
&& git checkout -b helm-v$(VERSION) \
&& git add -A . \
&& git commit -m ":sparkles: Update the Helm chart to v$(VERSION)" \
&& git push -u origin helm-v$(VERSION) \
&& gh pr create --title ":sparkles: Helm chart v$(VERSION)" \
&& rm -rf charts/chart && mkdir -p charts/chart \
&& cp -r ../kubeshark/helm-chart/ charts/chart/
@cd ../kubeshark.github.io && \
if git show-ref --verify --quiet refs/heads/helm-v$(VERSION); then \
git branch -D helm-v$(VERSION); \
fi && \
git checkout -b helm-v$(VERSION) && \
git add -A . && \
if ! git diff --cached --quiet; then \
git commit -m ":sparkles: Update the Helm chart to v$(VERSION)"; \
else \
echo "nothing to commit"; \
fi && \
git push --force-with-lease -u origin helm-v$(VERSION) && \
if ! gh pr view helm-v$(VERSION) --json number >/dev/null 2>&1; then \
gh pr create --title ":sparkles: Helm chart v$(VERSION)" \
--body "Update Helm chart for release v$(VERSION)." \
--base master \
--reviewer corest
@cd ../kubeshark
--reviewer corest; \
else \
echo "PR already exists for helm-v$(VERSION)"; \
fi && \
git checkout master
@cd ../kubeshark && git checkout master && git pull
release-pr: release-siblings release-pr-kubeshark release-pr-helm ## Run release-siblings, release-pr-kubeshark, and release-pr-helm in sequence.
@echo ""
@echo "Release PRs created:"
@echo "Release PRs created (or already present):"
@echo " - kubeshark: Review and merge the release PR."
@echo " - kubeshark.github.io: Review and merge the helm chart PR."
@echo "Tag will be created automatically, or run: make release-tag VERSION=$(VERSION)"

View File

@@ -17,17 +17,14 @@
---
Kubeshark captures cluster-wide network traffic at the speed and scale of Kubernetes, continuously, at the kernel level using eBPF. It consolidates a highly fragmented picture — dozens of nodes, thousands of workloads, millions of connections — into a single, queryable view with full Kubernetes and API context.
Kubeshark indexes cluster-wide network traffic at the kernel level using eBPF — delivering instant answers to any query using network, API, and Kubernetes semantics.
Network data is available to **AI agents via [MCP](https://docs.kubeshark.com/en/mcp)** and to **human operators via a [dashboard](https://docs.kubeshark.com/en/v2)**.
**What you can do:**
**What's captured, cluster-wide:**
- **L4 Packets & TCP Metrics** — retransmissions, RTT, window saturation, connection lifecycle, packet loss across every node-to-node path ([TCP insights →](https://docs.kubeshark.com/en/mcp/tcp_insights))
- **L7 API Calls** — real-time request/response matching with full payload parsing: HTTP, gRPC, GraphQL, Redis, Kafka, DNS ([API dissection →](https://docs.kubeshark.com/en/v2/l7_api_dissection))
- **Decrypted TLS** — eBPF-based TLS decryption without key management
- **Kubernetes Context** — every packet and API call resolved to pod, service, namespace, and node
- **PCAP Retention** — point-in-time raw packet snapshots, exportable for Wireshark ([Snapshots →](https://docs.kubeshark.com/en/v2/traffic_snapshots))
- **Download Retrospective PCAPs** — cluster-wide packet captures filtered by nodes, time, workloads, and IPs. Store PCAPs for long-term retention and later investigation.
- **Visualize Network Data** — explore traffic matching queries with API, Kubernetes, or network semantics through a real-time dashboard.
- **See Encrypted Traffic in Plain Text** — automatically decrypt TLS/mTLS traffic using eBPF, with no key management or sidecars required.
- **Integrate with AI** — connect your favorite AI assistant (e.g. Claude, Copilot) to include network data in AI-driven workflows like incident response and root cause analysis.
![Kubeshark](https://github.com/kubeshark/assets/raw/master/png/stream.png)
@@ -38,9 +35,12 @@ Network data is available to **AI agents via [MCP](https://docs.kubeshark.com/en
```bash
helm repo add kubeshark https://helm.kubeshark.com
helm install kubeshark kubeshark/kubeshark
kubectl port-forward svc/kubeshark-front 8899:80
```
Dashboard opens automatically. You're capturing traffic.
Open `http://localhost:8899` in your browser. You're capturing traffic.
> For production use, we recommend using an [ingress controller](https://docs.kubeshark.com/en/ingress) instead of port-forward.
**Connect an AI agent** via MCP:
@@ -53,9 +53,9 @@ claude mcp add kubeshark -- kubeshark mcp
---
### AI-Powered Network Analysis
### Network Data for AI Agents
Kubeshark exposes all cluster-wide network data via MCP (Model Context Protocol). AI agents can query L4 metrics, investigate L7 API calls, analyze traffic patterns, and run root cause analysis through natural language. Use cases include incident response, root cause analysis, troubleshooting, debugging, and reliability workflows.
Kubeshark exposes cluster-wide network data via [MCP](https://docs.kubeshark.com/en/mcp) — enabling AI agents to query traffic, investigate API calls, and perform root cause analysis through natural language.
> *"Why did checkout fail at 2:15 PM?"*
> *"Which services have error rates above 1%?"*
@@ -68,31 +68,51 @@ Works with Claude Code, Cursor, and any MCP-compatible AI.
[MCP setup guide →](https://docs.kubeshark.com/en/mcp)
### AI Skills
Open-source, reusable skills that teach AI agents domain-specific workflows on top of Kubeshark's MCP tools:
| Skill | Description |
|-------|-------------|
| **[Network RCA](skills/network-rca/)** | Retrospective root cause analysis — snapshots, dissection, PCAP extraction, trend comparison |
| **[KFL](skills/kfl/)** | KFL (Kubeshark Filter Language) expert — writes, debugs, and optimizes traffic filters |
Install as a Claude Code plugin:
```
/plugin marketplace add kubeshark/kubeshark
/plugin install kubeshark
```
Or clone and use directly — skills trigger automatically based on conversation context.
[AI Skills docs →](https://docs.kubeshark.com/en/mcp/skills)
---
### L7 API Dissection
### Query with API, Kubernetes, and Network Semantics
Cluster-wide request/response matching with full payloads, parsed according to protocol specifications. HTTP, gRPC, Redis, Kafka, DNS, and more. Every API call resolved to source and destination pod, service, namespace, and node. No code instrumentation required.
Kubeshark indexes cluster-wide network traffic by parsing it according to protocol specifications, with support for HTTP, gRPC, Redis, Kafka, DNS, and more. A single [KFL query](https://docs.kubeshark.com/en/v2/kfl2) can combine all three semantic layers — Kubernetes identity, API context, and network attributes — to pinpoint exactly the traffic you need. No code instrumentation required.
![API context](https://github.com/kubeshark/assets/raw/master/png/api_context.png)
![KFL query combining API, Kubernetes, and network semantics](https://github.com/kubeshark/assets/raw/master/png/kfl-semantics.png)
[Learn more](https://docs.kubeshark.com/en/v2/l7_api_dissection)
[KFL reference →](https://docs.kubeshark.com/en/v2/kfl2) · [Traffic indexing](https://docs.kubeshark.com/en/v2/l7_api_dissection)
### L4/L7 Workload Map
### Workload Dependency Map
Cluster-wide view of service communication: dependencies, traffic flow, and anomalies across all nodes and namespaces.
A visual map of how workloads communicate, showing dependencies, traffic volume, and protocol usage across the cluster.
![Service Map](https://github.com/kubeshark/assets/raw/master/png/servicemap.png)
[Learn more →](https://docs.kubeshark.com/en/v2/service_map)
### Traffic Retention
### Traffic Retention & PCAP Export
Continuous raw packet capture with point-in-time snapshots. Export PCAP files for offline analysis with Wireshark or other tools.
Capture and retain raw network traffic cluster-wide, including decrypted TLS. Download PCAPs scoped by time range, nodes, workloads, and IPs — ready for Wireshark or any PCAP-compatible tool. Store snapshots in cloud storage (S3, Azure Blob, GCS) for long-term retention and cross-cluster sharing.
![Traffic Retention](https://github.com/kubeshark/assets/raw/master/png/snapshots.png)
![Traffic Retention](https://github.com/kubeshark/assets/raw/master/png/snapshots-list.png)
[Snapshots guide →](https://docs.kubeshark.com/en/v2/traffic_snapshots)
[Snapshots guide →](https://docs.kubeshark.com/en/v2/traffic_snapshots) · [Cloud storage →](https://docs.kubeshark.com/en/snapshots_cloud_storage)
---
@@ -100,13 +120,12 @@ Continuous raw packet capture with point-in-time snapshots. Export PCAP files fo
| Feature | Description |
|---------|-------------|
| [**Raw Capture**](https://docs.kubeshark.com/en/v2/raw_capture) | Continuous cluster-wide packet capture with minimal overhead |
| [**Traffic Snapshots**](https://docs.kubeshark.com/en/v2/traffic_snapshots) | Point-in-time snapshots, export as PCAP for Wireshark |
| [**L7 API Dissection**](https://docs.kubeshark.com/en/v2/l7_api_dissection) | Request/response matching with full payloads and protocol parsing |
| [**Traffic Snapshots**](https://docs.kubeshark.com/en/v2/traffic_snapshots) | Point-in-time snapshots with cloud storage (S3, Azure Blob, GCS), PCAP export for Wireshark |
| [**Traffic Indexing**](https://docs.kubeshark.com/en/v2/l7_api_dissection) | Real-time and delayed L7 indexing with request/response matching and full payloads |
| [**Protocol Support**](https://docs.kubeshark.com/en/protocols) | HTTP, gRPC, GraphQL, Redis, Kafka, DNS, and more |
| [**TLS Decryption**](https://docs.kubeshark.com/en/encrypted_traffic) | eBPF-based decryption without key management |
| [**AI-Powered Analysis**](https://docs.kubeshark.com/en/v2/ai_powered_analysis) | Query cluster-wide network data with Claude, Cursor, or any MCP-compatible AI |
| [**Display Filters**](https://docs.kubeshark.com/en/v2/kfl2) | Wireshark-inspired display filters for precise traffic analysis |
| [**TLS Decryption**](https://docs.kubeshark.com/en/encrypted_traffic) | eBPF-based decryption without key management, included in snapshots |
| [**AI Integration**](https://docs.kubeshark.com/en/mcp) | MCP server + open-source AI skills for network RCA and traffic filtering |
| [**KFL Query Language**](https://docs.kubeshark.com/en/v2/kfl2) | CEL-based query language with Kubernetes, API, and network semantics |
| [**100% On-Premises**](https://docs.kubeshark.com/en/air_gapped) | Air-gapped support, no external dependencies |
---

@@ -86,9 +86,9 @@ type mcpContent struct {
}
type mcpPrompt struct {
Name string `json:"name"`
Description string `json:"description,omitempty"`
Arguments []mcpPromptArg `json:"arguments,omitempty"`
Name string `json:"name"`
Description string `json:"description,omitempty"`
Arguments []mcpPromptArg `json:"arguments,omitempty"`
}
type mcpPromptArg struct {
@@ -117,11 +117,11 @@ type mcpGetPromptResult struct {
// Hub MCP API response types
type hubMCPResponse struct {
Name string `json:"name"`
Description string `json:"description"`
Version string `json:"version"`
Tools []hubMCPTool `json:"tools"`
Prompts []hubMCPPrompt `json:"prompts"`
Name string `json:"name"`
Description string `json:"description"`
Version string `json:"version"`
Tools []hubMCPTool `json:"tools"`
Prompts []hubMCPPrompt `json:"prompts"`
}
type hubMCPTool struct {
@@ -131,9 +131,9 @@ type hubMCPTool struct {
}
type hubMCPPrompt struct {
Name string `json:"name"`
Description string `json:"description,omitempty"`
Arguments []hubMCPPromptArg `json:"arguments,omitempty"`
Name string `json:"name"`
Description string `json:"description,omitempty"`
Arguments []hubMCPPromptArg `json:"arguments,omitempty"`
}
type hubMCPPromptArg struct {
@@ -151,10 +151,10 @@ type mcpServer struct {
stdout io.Writer
backendInitialized bool
backendMu sync.Mutex
setFlags []string // --set flags to pass to 'kubeshark tap' when starting
directURL string // If set, connect directly to this URL (no kubectl/proxy)
urlMode bool // True when using direct URL mode
allowDestructive bool // If true, enable start/stop tools
setFlags []string // --set flags to pass to 'kubeshark tap' when starting
directURL string // If set, connect directly to this URL (no kubectl/proxy)
urlMode bool // True when using direct URL mode
allowDestructive bool // If true, enable start/stop tools
cachedHubMCP *hubMCPResponse // Cached tools/prompts from Hub
cachedAt time.Time // When the cache was populated
hubMCPMu sync.Mutex
@@ -772,7 +772,6 @@ func (s *mcpServer) callHubTool(toolName string, args map[string]any) (string, b
return prettyJSON.String(), false
}
func (s *mcpServer) callGetFileURL(args map[string]any) (string, bool) {
filePath, _ := args["path"].(string)
if filePath == "" {
@@ -869,8 +868,8 @@ func (s *mcpServer) callStartKubeshark(args map[string]any) (string, bool) {
// Add namespaces if provided
if v, ok := args["namespaces"].(string); ok && v != "" {
namespaces := strings.Split(v, ",")
for _, ns := range namespaces {
namespaces := strings.SplitSeq(v, ",")
for ns := range namespaces {
ns = strings.TrimSpace(ns)
if ns != "" {
cmdArgs = append(cmdArgs, "-n", ns)

@@ -417,7 +417,7 @@ func TestMCP_CommandArgs(t *testing.T) {
cmdArgs = append(cmdArgs, v)
}
if v, _ := tc.args["namespaces"].(string); v != "" {
for _, ns := range strings.Split(v, ",") {
for ns := range strings.SplitSeq(v, ",") {
cmdArgs = append(cmdArgs, "-n", strings.TrimSpace(ns))
}
}

@@ -40,9 +40,11 @@ type Readiness struct {
}
var ready *Readiness
var proxyOnce sync.Once
func tap() {
ready = &Readiness{}
proxyOnce = sync.Once{}
state.startTime = time.Now()
log.Info().Str("registry", config.Config.Tap.Docker.Registry).Str("tag", config.Config.Tap.Docker.Tag).Msg("Using Docker:")
@@ -147,11 +149,21 @@ func printNoPodsFoundSuggestion(targetNamespaces []string) {
log.Warn().Msg(fmt.Sprintf("Did not find any currently running pods that match the regex argument, %s will automatically target matching pods if any are created later%s", misc.Software, suggestionStr))
}
func isPodReady(pod *core.Pod) bool {
for _, condition := range pod.Status.Conditions {
if condition.Type == core.PodReady {
return condition.Status == core.ConditionTrue
}
}
return false
}
func watchHubPod(ctx context.Context, kubernetesProvider *kubernetes.Provider, cancel context.CancelFunc) {
podExactRegex := regexp.MustCompile(fmt.Sprintf("^%s", kubernetes.HubPodName))
podWatchHelper := kubernetes.NewPodWatchHelper(kubernetesProvider, podExactRegex)
eventChan, errorChan := kubernetes.FilteredWatch(ctx, podWatchHelper, []string{config.Config.Tap.Release.Namespace}, podWatchHelper)
isPodReady := false
podReady := false
podRunning := false
timeAfter := time.After(120 * time.Second)
for {
@@ -183,26 +195,30 @@ func watchHubPod(ctx context.Context, kubernetesProvider *kubernetes.Provider, c
Interface("containers-statuses", modifiedPod.Status.ContainerStatuses).
Msg("Watching pod.")
if modifiedPod.Status.Phase == core.PodRunning && !isPodReady {
isPodReady = true
if isPodReady(modifiedPod) && !podReady {
podReady = true
ready.Lock()
ready.Hub = true
ready.Unlock()
log.Info().Str("pod", kubernetes.HubPodName).Msg("Ready.")
} else if modifiedPod.Status.Phase == core.PodRunning && !podRunning {
podRunning = true
log.Info().Str("pod", kubernetes.HubPodName).Msg("Waiting for readiness...")
}
ready.Lock()
proxyDone := ready.Proxy
hubPodReady := ready.Hub
frontPodReady := ready.Front
ready.Unlock()
if !proxyDone && hubPodReady && frontPodReady {
ready.Lock()
ready.Proxy = true
ready.Unlock()
postFrontStarted(ctx, kubernetesProvider, cancel)
if hubPodReady && frontPodReady {
proxyOnce.Do(func() {
ready.Lock()
ready.Proxy = true
ready.Unlock()
postFrontStarted(ctx, kubernetesProvider, cancel)
})
}
case kubernetes.EventBookmark:
break
@@ -223,7 +239,7 @@ func watchHubPod(ctx context.Context, kubernetesProvider *kubernetes.Provider, c
cancel()
case <-timeAfter:
if !isPodReady {
if !podReady {
log.Error().
Str("pod", kubernetes.HubPodName).
Msg("Pod was not ready in time.")
@@ -242,7 +258,8 @@ func watchFrontPod(ctx context.Context, kubernetesProvider *kubernetes.Provider,
podExactRegex := regexp.MustCompile(fmt.Sprintf("^%s", kubernetes.FrontPodName))
podWatchHelper := kubernetes.NewPodWatchHelper(kubernetesProvider, podExactRegex)
eventChan, errorChan := kubernetes.FilteredWatch(ctx, podWatchHelper, []string{config.Config.Tap.Release.Namespace}, podWatchHelper)
isPodReady := false
podReady := false
podRunning := false
timeAfter := time.After(120 * time.Second)
for {
@@ -274,25 +291,29 @@ func watchFrontPod(ctx context.Context, kubernetesProvider *kubernetes.Provider,
Interface("containers-statuses", modifiedPod.Status.ContainerStatuses).
Msg("Watching pod.")
if modifiedPod.Status.Phase == core.PodRunning && !isPodReady {
isPodReady = true
if isPodReady(modifiedPod) && !podReady {
podReady = true
ready.Lock()
ready.Front = true
ready.Unlock()
log.Info().Str("pod", kubernetes.FrontPodName).Msg("Ready.")
} else if modifiedPod.Status.Phase == core.PodRunning && !podRunning {
podRunning = true
log.Info().Str("pod", kubernetes.FrontPodName).Msg("Waiting for readiness...")
}
ready.Lock()
proxyDone := ready.Proxy
hubPodReady := ready.Hub
frontPodReady := ready.Front
ready.Unlock()
if !proxyDone && hubPodReady && frontPodReady {
ready.Lock()
ready.Proxy = true
ready.Unlock()
postFrontStarted(ctx, kubernetesProvider, cancel)
if hubPodReady && frontPodReady {
proxyOnce.Do(func() {
ready.Lock()
ready.Proxy = true
ready.Unlock()
postFrontStarted(ctx, kubernetesProvider, cancel)
})
}
case kubernetes.EventBookmark:
break
@@ -312,7 +333,7 @@ func watchFrontPod(ctx context.Context, kubernetesProvider *kubernetes.Provider,
Msg("Failed creating pod.")
case <-timeAfter:
if !isPodReady {
if !podReady {
log.Error().
Str("pod", kubernetes.FrontPodName).
Msg("Pod was not ready in time.")
@@ -429,9 +450,6 @@ func postFrontStarted(ctx context.Context, kubernetesProvider *kubernetes.Provid
watchScripts(ctx, kubernetesProvider, false)
}
if config.Config.Scripting.Console {
go runConsoleWithoutProxy()
}
}
func updateConfig(kubernetesProvider *kubernetes.Provider) {
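
Reviewer sketch of the two ideas in this file's hunks: readiness is now taken from the pod's `Ready` condition rather than from the `Running` phase, and the post-start step is guarded by `sync.Once` instead of a check-then-set on `ready.Proxy`. The old pattern read the flag under one lock and set it under another, so two watchers could plausibly both pass the check. The `pod` and `condition` types below are simplified stand-ins for the Kubernetes API types, and `podIsReady` and `startProxyOnce` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// condition and pod are simplified stand-ins for the Kubernetes API types;
// they are assumptions made to keep this sketch self-contained.
type condition struct {
	Type   string
	Status string
}

type pod struct {
	Phase      string
	Conditions []condition
}

// podIsReady mirrors the shape of isPodReady in the diff: a pod counts as
// ready only when its "Ready" condition is "True", not merely when it runs.
func podIsReady(p pod) bool {
	for _, c := range p.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

// startProxyOnce simulates several watchers that all observe "hub and front
// are ready" and all try to run the post-start step. It returns how many
// times the step actually ran.
func startProxyOnce(watchers int) int {
	var (
		once  sync.Once
		mu    sync.Mutex
		calls int
		wg    sync.WaitGroup
	)
	for i := 0; i < watchers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Only the first caller executes the function; the others
			// block until it has finished and then return.
			once.Do(func() {
				mu.Lock()
				calls++
				mu.Unlock()
			})
		}()
	}
	wg.Wait()
	return calls
}

func main() {
	running := pod{Phase: "Running"}
	ready := pod{Phase: "Running", Conditions: []condition{{Type: "Ready", Status: "True"}}}
	fmt.Println(podIsReady(running), podIsReady(ready), startProxyOnce(2))
}
```

Resetting the guard with `proxyOnce = sync.Once{}` at the start of `tap()`, as the diff does, lets a later run execute the step again. That assignment is only safe while no watcher goroutine from a previous run is still using the old value.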

@@ -102,23 +102,21 @@ func CreateDefaultConfig() ConfigStruct {
},
},
Auth: configStructs.AuthConfig{
Saml: configStructs.SamlConfig{
RoleAttribute: "role",
Roles: map[string]configStructs.Role{
"admin": {
Filter: "",
CanDownloadPCAP: true,
CanUseScripting: true,
ScriptingPermissions: configStructs.ScriptingPermissions{
CanSave: true,
CanActivate: true,
CanDelete: true,
},
CanUpdateTargetedPods: true,
CanStopTrafficCapturing: true,
CanControlDissection: true,
ShowAdminConsoleLink: true,
RolesClaim: "role",
Roles: map[string]configStructs.Role{
"admin": {
Filter: "",
CanDownloadPCAP: true,
CanUseScripting: true,
ScriptingPermissions: configStructs.ScriptingPermissions{
CanSave: true,
CanActivate: true,
CanDelete: true,
},
CanUpdateTargetedPods: true,
CanStopTrafficCapturing: true,
CanControlDissection: true,
ShowAdminConsoleLink: true,
},
},
},
@@ -128,13 +126,16 @@ func CreateDefaultConfig() ConfigStruct {
"http",
"icmp",
"kafka",
"mongodb",
"mysql",
"postgresql",
"redis",
// "sctp",
// "syscall",
// "tcp",
// "udp",
"ws",
// "tlsx",
"tlsx",
"ldap",
"radius",
"diameter",
@@ -147,7 +148,10 @@ func CreateDefaultConfig() ConfigStruct {
HTTP: []uint16{80, 443, 8080},
AMQP: []uint16{5671, 5672},
KAFKA: []uint16{9092},
REDIS: []uint16{6379},
MONGODB: []uint16{27017},
MYSQL: []uint16{3306},
POSTGRESQL: []uint16{5432},
REDIS: []uint16{6379},
LDAP: []uint16{389},
DIAMETER: []uint16{3868},
},

@@ -173,17 +173,29 @@ type Role struct {
}
type SamlConfig struct {
IdpMetadataUrl string `yaml:"idpMetadataUrl" json:"idpMetadataUrl"`
X509crt string `yaml:"x509crt" json:"x509crt"`
X509key string `yaml:"x509key" json:"x509key"`
RoleAttribute string `yaml:"roleAttribute" json:"roleAttribute"`
Roles map[string]Role `yaml:"roles" json:"roles"`
IdpMetadataUrl string `yaml:"idpMetadataUrl" json:"idpMetadataUrl"`
X509crt string `yaml:"x509crt" json:"x509crt"`
X509key string `yaml:"x509key" json:"x509key"`
}
type AuthConfig struct {
Enabled bool `yaml:"enabled" json:"enabled" default:"false"`
Type string `yaml:"type" json:"type" default:"saml"`
Saml SamlConfig `yaml:"saml" json:"saml"`
Enabled bool `yaml:"enabled" json:"enabled" default:"false"`
// Type selects the authentication backend. Valid values:
// saml — SAML 2.0 SSO
// oidc — generic OIDC (Dex, Okta, Auth0, Keycloak, Azure AD, …)
// dex — permanent alias of oidc (kept for back-compat)
// descope — Descope SDK
// default — also routes to Descope (kept, not deprecated)
//
// NOTE: prior releases routed `oidc` to Descope. If you were using `oidc`
// to mean Descope, switch to `descope` (or `default`). The rename is a
// breaking change documented in the release notes.
Type string `yaml:"type" json:"type" default:"saml"`
Roles map[string]Role `yaml:"roles" json:"roles"`
RolesClaim string `yaml:"rolesClaim" json:"rolesClaim"`
DefaultRole string `yaml:"defaultRole" json:"defaultRole"`
DefaultFilter string `yaml:"defaultFilter" json:"defaultFilter"`
Saml SamlConfig `yaml:"saml" json:"saml"`
}
type IngressConfig struct {
@@ -203,6 +215,7 @@ type DashboardConfig struct {
StreamingType string `yaml:"streamingType" json:"streamingType" default:"connect-rpc"`
CompleteStreamingEnabled bool `yaml:"completeStreamingEnabled" json:"completeStreamingEnabled" default:"true"`
ClusterWideMapEnabled bool `yaml:"clusterWideMapEnabled" json:"clusterWideMapEnabled" default:"false"`
EntriesLimit string `yaml:"entriesLimit" json:"entriesLimit" default:"300000"`
}
type FrontRoutingConfig struct {
@@ -282,7 +295,10 @@ type PortMapping struct {
HTTP []uint16 `yaml:"http" json:"http"`
AMQP []uint16 `yaml:"amqp" json:"amqp"`
KAFKA []uint16 `yaml:"kafka" json:"kafka"`
REDIS []uint16 `yaml:"redis" json:"redis"`
MONGODB []uint16 `yaml:"mongodb" json:"mongodb"`
MYSQL []uint16 `yaml:"mysql" json:"mysql"`
POSTGRESQL []uint16 `yaml:"postgresql" json:"postgresql"`
REDIS []uint16 `yaml:"redis" json:"redis"`
LDAP []uint16 `yaml:"ldap" json:"ldap"`
DIAMETER []uint16 `yaml:"diameter" json:"diameter"`
}
@@ -353,8 +369,10 @@ type SnapshotsConfig struct {
}
type DelayedDissectionConfig struct {
CPU string `yaml:"cpu" json:"cpu" default:"1"`
Memory string `yaml:"memory" json:"memory" default:"4Gi"`
CPU string `yaml:"cpu" json:"cpu" default:"1"`
Memory string `yaml:"memory" json:"memory" default:"4Gi"`
StorageSize string `yaml:"storageSize" json:"storageSize" default:""`
StorageClass string `yaml:"storageClass" json:"storageClass" default:""`
}
type DissectionConfig struct {
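
Reviewer sketch of the routing described in the `AuthConfig.Type` comment in this file: `dex` is an alias of `oidc`, and `default` routes to Descope. The function name `canonicalAuthType` and the error on unknown values are assumptions for illustration. The diff does not show how the hub actually resolves the type.

```go
package main

import "fmt"

// canonicalAuthType is a hypothetical helper (not in the diff) that maps the
// accepted tap.auth.type values to the backend named in the struct comment.
func canonicalAuthType(t string) (string, error) {
	switch t {
	case "saml":
		return "saml", nil
	case "oidc", "dex": // dex is kept as a permanent alias of oidc
		return "oidc", nil
	case "descope", "default": // default also routes to Descope
		return "descope", nil
	default:
		// Rejecting unknown values is an assumption of this sketch.
		return "", fmt.Errorf("unknown auth type %q", t)
	}
}

func main() {
	for _, t := range []string{"saml", "oidc", "dex", "descope", "default"} {
		backend, _ := canonicalAuthType(t)
		fmt.Printf("%s -> %s\n", t, backend)
	}
}
```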

@@ -1,6 +1,6 @@
apiVersion: v2
name: kubeshark
version: "53.1.0"
version: "53.3.0"
description: The API Traffic Analyzer for Kubernetes
home: https://kubeshark.com
keywords:

@@ -164,6 +164,8 @@ Example for overriding image names:
| `tap.snapshots.cloud.gcs.credentialsJson` | Service account JSON key. When set, the chart auto-creates a Secret with `SNAPSHOT_GCS_CREDENTIALS_JSON`. | `""` |
| `tap.delayedDissection.cpu` | CPU allocation for delayed dissection jobs | `1` |
| `tap.delayedDissection.memory` | Memory allocation for delayed dissection jobs | `4Gi` |
| `tap.delayedDissection.storageSize` | Storage size for dissection job PVC. When empty, falls back to `tap.snapshots.local.storageSize`. When the resolved value is non-empty, a PVC is created; otherwise an `emptyDir` is used. | `""` |
| `tap.delayedDissection.storageClass` | Storage class for dissection job PVC. When empty, falls back to `tap.snapshots.local.storageClass`. | `""` |
| `tap.release.repo` | URL of the Helm chart repository | `https://helm.kubeshark.com` |
| `tap.release.name` | Helm release name | `kubeshark` |
| `tap.release.namespace` | Helm release namespace | `default` |
@@ -210,14 +212,15 @@ Example for overriding image names:
| `tap.tolerations.hub` | Tolerations for hub component | `[]` |
| `tap.tolerations.front` | Tolerations for front-end component | `[]` |
| `tap.auth.enabled` | Enable authentication | `false` |
| `tap.auth.type` | Authentication type (1 option available: `saml`) | `saml` |
| `tap.auth.type` | Authentication backend. Valid values: `saml`, `oidc` (generic OIDC — Dex, Okta, Auth0, Keycloak, Azure AD, Google, …), `dex` (permanent alias of `oidc`), `descope`, `default` (also routes to Descope). **Breaking**: prior releases routed `oidc` to Descope — if you were using it for Descope, switch to `descope` or `default`. | `saml` |
| `tap.auth.approvedEmails` | List of approved email addresses for authentication | `[]` |
| `tap.auth.approvedDomains` | List of approved email domains for authentication | `[]` |
| `tap.auth.saml.idpMetadataUrl` | SAML IDP metadata URL <br/>(effective, if `tap.auth.type = saml`) | `` |
| `tap.auth.rolesClaim` | Name of the JWT claim (OIDC) or SAML attribute carrying role memberships. | `role` |
| `tap.auth.defaultRole` | Optional role name inside `tap.auth.roles` applied as fallback when an authenticated user has no matching role. Empty string = no fallback, zero-valued permissions. | `""` |
| `tap.auth.roles` | Backend-neutral role map shared by SAML and OIDC. Each role's `namespaces` is a comma-separated list controlling which Kubernetes namespaces the role's users see traffic for: `""` = deny all, `"*"` = allow all, `"foo"` = literal namespace, `"foo,bar"` = OR over literals, `"foo-*"` = glob expansion against the cluster's known namespaces. Empty/unset `tap.auth.roles` grants nothing — admins opt into elevated access by populating this map. | `{"admin":{"namespaces":"*","canDownloadPCAP":true,"canUpdateTargetedPods":true,"canUseScripting":true,"scriptingPermissions":{"canSave":true,"canActivate":true,"canDelete":true},"canStopTrafficCapturing":true,"canControlDissection":true,"showAdminConsoleLink":true}}` |
| `tap.auth.saml.idpMetadataUrl` | SAML IDP metadata URL <br/>(effective, if `tap.auth.type = saml`) | `` |
| `tap.auth.saml.x509crt` | A self-signed X.509 `.cert` contents <br/>(effective, if `tap.auth.type = saml`) | `` |
| `tap.auth.saml.x509key` | A self-signed X.509 `.key` contents <br/>(effective, if `tap.auth.type = saml`) | `` |
| `tap.auth.saml.roleAttribute` | A SAML attribute name corresponding to user's authorization role <br/>(effective, if `tap.auth.type = saml`) | `role` |
| `tap.auth.saml.roles` | A list of SAML authorization roles and their permissions <br/>(effective, if `tap.auth.type = saml`) | `{"admin":{"canDownloadPCAP":true,"canUpdateTargetedPods":true,"canUseScripting":true, "scriptingPermissions":{"canSave":true, "canActivate":true, "canDelete":true}, "canStopTrafficCapturing":true, "canControlDissection":true, "filter":"","showAdminConsoleLink":true}}` |
| `tap.ingress.enabled` | Enable `Ingress` | `false` |
| `tap.ingress.className` | Ingress class name | `""` |
| `tap.ingress.host` | Host of the `Ingress` | `ks.svc.cluster.local` |
@@ -375,8 +378,8 @@ Add these helm values to set up OIDC authentication powered by your Dex IdP:
tap:
auth:
enabled: true
type: dex
dexOidc:
type: oidc # canonical; `dex` is accepted as a permanent alias
oidc:
issuer: <put Dex IdP issuer URL here>
clientId: kubeshark
clientSecret: create your own client password
@@ -388,7 +391,7 @@ tap:
---
**Note:**<br/>
Set `tap.auth.dexOidc.bypassSslCaCheck: true`
Set `tap.auth.oidc.bypassSslCaCheck: true`
to allow Kubeshark communication with Dex IdP having an unknown SSL Certificate Authority.
This setting allows you to prevent such SSL CA-related errors:<br/>
@@ -427,7 +430,7 @@ The following Dex settings will have these values:
| Setting | Value |
|-------------------------------------------------------|----------------------------------------------|
| `tap.auth.dexOidc.issuer` | `https://ks.example.com/dex` |
| `tap.auth.oidc.issuer` | `https://ks.example.com/dex` |
| `tap.auth.dexConfig.issuer` | `https://ks.example.com/dex` |
| `tap.auth.dexConfig.staticClients -> redirectURIs` | `https://ks.example.com/api/oauth2/callback` |
| `tap.auth.dexConfig.connectors -> config.redirectURI` | `https://ks.example.com/dex/callback` |
@@ -445,16 +448,16 @@ Please, make sure to prepare the following things first.
- You will need to specify storage settings in `tap.auth.dexConfig.storage`
- default: `memory`
3. Decide on the OAuth2 `?state=` param expiration time:
- field: `tap.auth.dexOidc.oauth2StateParamExpiry`
- field: `tap.auth.oidc.oauth2StateParamExpiry`
- default: `10m` (10 minutes)
- valid time units are `s`, `m`, `h`
4. Decide on the refresh token expiration:
- field 1: `tap.auth.dexOidc.expiry.refreshTokenLifetime`
- field 1: `tap.auth.oidc.expiry.refreshTokenLifetime`
- field 2: `tap.auth.dexConfig.expiry.refreshTokens.absoluteLifetime`
- default: `3960h` (165 days)
- valid time units are `s`, `m`, `h`
5. Create a unique & secure password to set in these fields:
- field 1: `tap.auth.dexOidc.clientSecret`
- field 1: `tap.auth.oidc.clientSecret`
- field 2: `tap.auth.dexConfig.staticClients -> secret`
- password must be the same for these 2 fields
6. Discover more possibilities of **[Dex Configuration](https://dexidp.io/docs/configuration/)**
@@ -476,8 +479,8 @@ Helm `values.yaml`:
tap:
auth:
enabled: true
type: dex
dexOidc:
type: oidc # canonical; `dex` is accepted as a permanent alias
oidc:
issuer: https://<your-ingress-hostname>/dex
# Client ID/secret must be taken from `tap.auth.dexConfig.staticClients -> id/secret`
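
Reviewer sketch of the `namespaces` semantics documented for `tap.auth.roles` in the values table above: empty denies everything, `*` allows everything, commas mean OR, and `foo-*` is a glob. The documented behaviour expands globs against the cluster's known namespaces. This sketch makes a simplifying assumption and tests one namespace at a time with `path.Match`. The function name `roleAllows` is hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// roleAllows is a hypothetical helper (not in the diff) that reports whether
// a role's comma-separated namespaces value admits the given namespace.
func roleAllows(spec, namespace string) bool {
	for _, pattern := range strings.Split(spec, ",") {
		pattern = strings.TrimSpace(pattern)
		if pattern == "" {
			continue // an empty spec therefore matches nothing
		}
		// path.Match treats "*" as any run of non-"/" characters, which
		// covers both the allow-all case and prefixes such as "foo-*".
		// A malformed pattern returns an error and is treated as no match.
		if ok, err := path.Match(pattern, namespace); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	for _, spec := range []string{"", "*", "foo,bar", "foo-*"} {
		fmt.Printf("%q allows foo-prod: %v\n", spec, roleAllows(spec, "foo-prod"))
	}
}
```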

@@ -44,6 +44,12 @@ rules:
- create
- update
- delete
- apiGroups:
- authentication.k8s.io
resources:
- tokenreviews
verbs:
- create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
@@ -86,6 +92,15 @@ rules:
verbs:
- create
- get
- apiGroups:
- ""
resources:
- persistentvolumeclaims
verbs:
- create
- get
- list
- delete
- apiGroups:
- batch
resources:

@@ -56,6 +56,16 @@ spec:
- -dissector-memory
- '{{ .Values.tap.delayedDissection.memory }}'
{{- end }}
{{- $dissectorStorageSize := .Values.tap.delayedDissection.storageSize | default .Values.tap.snapshots.local.storageSize }}
{{- if $dissectorStorageSize }}
- -dissector-storage-size
- '{{ $dissectorStorageSize }}'
{{- end }}
{{- $dissectorStorageClass := .Values.tap.delayedDissection.storageClass | default .Values.tap.snapshots.local.storageClass }}
{{- if $dissectorStorageClass }}
- -dissector-storage-class
- '{{ $dissectorStorageClass }}'
{{- end }}
{{- if .Values.tap.gitops.enabled }}
- -gitops
{{- end }}
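
Reviewer sketch of the fallback that the template lines above express with `| default`: the dissection-specific value wins when set, otherwise the snapshot value is used, and the flag is omitted when both are empty. Per the README rows added in this change, a non-empty resolved size means a PVC and an empty one means `emptyDir`. The type and function names below are hypothetical.

```go
package main

import "fmt"

// dissectorStorage is a hypothetical result type used only in this sketch.
type dissectorStorage struct {
	Size   string
	Class  string
	UsePVC bool // false means the job would fall back to an emptyDir
}

// firstNonEmpty returns a when it is set and b otherwise, which matches how
// the template's `default` function treats an empty string.
func firstNonEmpty(a, b string) string {
	if a != "" {
		return a
	}
	return b
}

// resolveDissectorStorage applies the fallback to both size and class.
func resolveDissectorStorage(dissSize, dissClass, snapSize, snapClass string) dissectorStorage {
	size := firstNonEmpty(dissSize, snapSize)
	return dissectorStorage{
		Size:   size,
		Class:  firstNonEmpty(dissClass, snapClass),
		UsePVC: size != "",
	}
}

func main() {
	fmt.Printf("%+v\n", resolveDissectorStorage("", "", "30Gi", "gp3"))
	fmt.Printf("%+v\n", resolveDissectorStorage("100Gi", "io2", "50Gi", "gp2"))
	fmt.Printf("%+v\n", resolveDissectorStorage("", "", "", ""))
}
```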

@@ -26,12 +26,12 @@ spec:
- env:
- name: REACT_APP_AUTH_ENABLED
value: '{{- if or (and .Values.cloudLicenseEnabled (not (empty .Values.license))) (not .Values.internetConnectivity) -}}
{{ (default false .Values.demoModeEnabled) | ternary true ((and .Values.tap.auth.enabled (eq .Values.tap.auth.type "dex")) | ternary true false) }}
{{ (default false .Values.demoModeEnabled) | ternary true ((and .Values.tap.auth.enabled (or (eq .Values.tap.auth.type "oidc") (eq .Values.tap.auth.type "dex"))) | ternary true false) }}
{{- else -}}
{{ .Values.cloudLicenseEnabled | ternary "true" ((default false .Values.demoModeEnabled) | ternary "true" .Values.tap.auth.enabled) }}
{{- end }}'
- name: REACT_APP_AUTH_TYPE
value: '{{- if and .Values.cloudLicenseEnabled (not (eq .Values.tap.auth.type "dex")) -}}
value: '{{- if and .Values.cloudLicenseEnabled (not (or (eq .Values.tap.auth.type "oidc") (eq .Values.tap.auth.type "dex"))) -}}
default
{{- else -}}
{{ (default false .Values.demoModeEnabled) | ternary "default" .Values.tap.auth.type }}
@@ -70,7 +70,7 @@ spec:
value: '{{- if and (not .Values.demoModeEnabled) (not .Values.tap.capture.dissection.enabled) -}}
true
{{- else -}}
{{ not (default false .Values.demoModeEnabled) | ternary false true }}
{{ (default false .Values.demoModeEnabled) | ternary false true }}
{{- end -}}'
- name: 'REACT_APP_CLOUD_LICENSE_ENABLED'
value: '{{- if or (and .Values.cloudLicenseEnabled (not (empty .Values.license))) (not .Values.internetConnectivity) -}}
@@ -92,6 +92,8 @@ spec:
value: '{{ default false (((.Values).tap).dashboard).clusterWideMapEnabled }}'
- name: REACT_APP_RAW_CAPTURE_ENABLED
value: '{{ .Values.tap.capture.raw.enabled | ternary "true" "false" }}'
- name: REACT_APP_ENTRIES_LIMIT
value: '{{ default 300000 (((.Values).tap).dashboard).entriesLimit }}'
- name: REACT_APP_SENTRY_ENABLED
value: '{{ (include "sentry.enabled" .) }}'
- name: REACT_APP_SENTRY_ENVIRONMENT

@@ -21,6 +21,7 @@ spec:
metadata:
labels:
app.kubeshark.com/app: worker
kubeshark.io/internal-auth: "true"
{{- include "kubeshark.labels" . | nindent 8 }}
name: kubeshark-worker-daemon-set
namespace: kubeshark
@@ -131,6 +132,10 @@ spec:
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: TCP_STREAM_CHANNEL_TIMEOUT_MS
value: '{{ .Values.tap.misc.tcpStreamChannelTimeoutMs }}'
- name: TCP_STREAM_CHANNEL_TIMEOUT_SHOW
@@ -227,6 +232,9 @@ spec:
mountPropagation: HostToContainer
- mountPath: /app/data
name: data
{{- if .Values.tap.persistentStorage }}
subPathExpr: $(NODE_NAME)
{{- end }}
{{- if .Values.tap.tls }}
- command:
- ./tracer
@@ -257,6 +265,10 @@ spec:
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: PROFILING_ENABLED
value: '{{ .Values.tap.pprof.enabled }}'
- name: SENTRY_ENABLED
@@ -328,6 +340,9 @@ spec:
mountPropagation: HostToContainer
- mountPath: /app/data
name: data
{{- if .Values.tap.persistentStorage }}
subPathExpr: $(NODE_NAME)
{{- end }}
- mountPath: /etc/os-release
name: os-release
readOnly: true

@@ -20,6 +20,10 @@ data:
client_header_buffer_size 32k;
large_client_header_buffers 8 64k;
proxy_buffer_size 64k;
proxy_buffers 4 128k;
proxy_busy_buffers_size 128k;
location {{ default "" (((.Values.tap).routing).front).basePath }}/api {
rewrite ^{{ default "" (((.Values.tap).routing).front).basePath }}/api(.*)$ $1 break;
proxy_pass http://kubeshark-hub;

@@ -19,27 +19,28 @@ data:
INGRESS_HOST: '{{ .Values.tap.ingress.host }}'
PROXY_FRONT_PORT: '{{ .Values.tap.proxy.front.port }}'
AUTH_ENABLED: '{{- if and .Values.cloudLicenseEnabled (not (empty .Values.license)) -}}
{{ (default false .Values.demoModeEnabled) | ternary true ((and .Values.tap.auth.enabled (eq .Values.tap.auth.type "dex")) | ternary true false) }}
{{ (default false .Values.demoModeEnabled) | ternary true ((and .Values.tap.auth.enabled (or (eq .Values.tap.auth.type "oidc") (eq .Values.tap.auth.type "dex"))) | ternary true false) }}
{{- else -}}
{{ .Values.cloudLicenseEnabled | ternary "true" ((default false .Values.demoModeEnabled) | ternary "true" .Values.tap.auth.enabled) }}
{{- end }}'
AUTH_TYPE: '{{- if and .Values.cloudLicenseEnabled (not (eq .Values.tap.auth.type "dex")) -}}
AUTH_TYPE: '{{- if and .Values.cloudLicenseEnabled (not (or (eq .Values.tap.auth.type "oidc") (eq .Values.tap.auth.type "dex"))) -}}
default
{{- else -}}
{{ (default false .Values.demoModeEnabled) | ternary "default" .Values.tap.auth.type }}
{{- end }}'
AUTH_SAML_IDP_METADATA_URL: '{{ .Values.tap.auth.saml.idpMetadataUrl }}'
AUTH_SAML_ROLE_ATTRIBUTE: '{{ .Values.tap.auth.saml.roleAttribute }}'
AUTH_SAML_ROLES: '{{ .Values.tap.auth.saml.roles | toJson }}'
AUTH_OIDC_ISSUER: '{{ default "not set" (((.Values.tap).auth).dexOidc).issuer }}'
AUTH_OIDC_REFRESH_TOKEN_LIFETIME: '{{ default "3960h" (((.Values.tap).auth).dexOidc).refreshTokenLifetime }}'
AUTH_OIDC_STATE_PARAM_EXPIRY: '{{ default "10m" (((.Values.tap).auth).dexOidc).oauth2StateParamExpiry }}'
AUTH_ROLES: '{{ .Values.tap.auth.roles | toJson }}'
AUTH_ROLES_CLAIM: '{{ .Values.tap.auth.rolesClaim }}'
AUTH_DEFAULT_ROLE: '{{ default "" .Values.tap.auth.defaultRole }}'
AUTH_OIDC_ISSUER: '{{ default "not set" (((.Values.tap).auth).oidc).issuer }}'
AUTH_OIDC_REFRESH_TOKEN_LIFETIME: '{{ default "3960h" (((.Values.tap).auth).oidc).refreshTokenLifetime }}'
AUTH_OIDC_STATE_PARAM_EXPIRY: '{{ default "10m" (((.Values.tap).auth).oidc).oauth2StateParamExpiry }}'
AUTH_OIDC_BYPASS_SSL_CA_CHECK: '{{- if and
(hasKey .Values.tap "auth")
(hasKey .Values.tap.auth "dexOidc")
(hasKey .Values.tap.auth.dexOidc "bypassSslCaCheck")
(hasKey .Values.tap.auth "oidc")
(hasKey .Values.tap.auth.oidc "bypassSslCaCheck")
-}}
{{ eq .Values.tap.auth.dexOidc.bypassSslCaCheck true | ternary "true" "false" }}
{{ eq .Values.tap.auth.oidc.bypassSslCaCheck true | ternary "true" "false" }}
{{- else -}}
false
{{- end }}'

@@ -9,8 +9,8 @@ metadata:
stringData:
LICENSE: '{{ .Values.license }}'
SCRIPTING_ENV: '{{ .Values.scripting.env | toJson }}'
OIDC_CLIENT_ID: '{{ default "not set" (((.Values.tap).auth).dexOidc).clientId }}'
OIDC_CLIENT_SECRET: '{{ default "not set" (((.Values.tap).auth).dexOidc).clientSecret }}'
OIDC_CLIENT_ID: '{{ default "not set" (((.Values.tap).auth).oidc).clientId }}'
OIDC_CLIENT_SECRET: '{{ default "not set" (((.Values.tap).auth).oidc).clientSecret }}'
---

@@ -0,0 +1,127 @@
suite: dissection storage configuration
templates:
- templates/04-hub-deployment.yaml
tests:
- it: should fallback to snapshot storageSize when dissection storageSize is empty
asserts:
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-size
- contains:
path: spec.template.spec.containers[0].command
content: "20Gi"
- it: should fallback to snapshot storageClass when dissection storageClass is empty
set:
tap.snapshots.local.storageClass: gp2
asserts:
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-class
- contains:
path: spec.template.spec.containers[0].command
content: gp2
- it: should not render dissector-storage-class when both dissection and snapshot storageClass are empty
asserts:
- notContains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-class
- it: should prefer dissection storageSize over snapshot storageSize
set:
tap.delayedDissection.storageSize: 100Gi
tap.snapshots.local.storageSize: 50Gi
asserts:
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-size
- contains:
path: spec.template.spec.containers[0].command
content: "100Gi"
- it: should prefer dissection storageClass over snapshot storageClass
set:
tap.delayedDissection.storageClass: io2
tap.snapshots.local.storageClass: gp2
asserts:
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-class
- contains:
path: spec.template.spec.containers[0].command
content: io2
- it: should fallback to snapshot config for both storageSize and storageClass
set:
tap.snapshots.local.storageSize: 30Gi
tap.snapshots.local.storageClass: gp3
asserts:
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-size
- contains:
path: spec.template.spec.containers[0].command
content: "30Gi"
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-class
- contains:
path: spec.template.spec.containers[0].command
content: gp3
- it: should not render dissector-storage-size when both dissection and snapshot storageSize are empty
set:
tap.delayedDissection.storageSize: ""
tap.snapshots.local.storageSize: ""
asserts:
- notContains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-size
- it: should render all dissector args together with custom values
set:
tap.delayedDissection.cpu: "4"
tap.delayedDissection.memory: 8Gi
tap.delayedDissection.storageSize: 200Gi
tap.delayedDissection.storageClass: local-path
asserts:
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-cpu
- contains:
path: spec.template.spec.containers[0].command
content: "4"
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-memory
- contains:
path: spec.template.spec.containers[0].command
content: 8Gi
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-size
- contains:
path: spec.template.spec.containers[0].command
content: "200Gi"
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-storage-class
- contains:
path: spec.template.spec.containers[0].command
content: local-path
- it: should still render existing dissector-cpu and dissector-memory args
asserts:
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-cpu
- contains:
path: spec.template.spec.containers[0].command
content: "1"
- contains:
path: spec.template.spec.containers[0].command
content: -dissector-memory
- contains:
path: spec.template.spec.containers[0].command
content: 4Gi

@@ -37,6 +37,8 @@ tap:
delayedDissection:
cpu: "1"
memory: 4Gi
storageSize: ""
storageClass: ""
snapshots:
local:
storageClass: ""
@@ -151,24 +153,26 @@ tap:
auth:
enabled: false
type: saml
roles:
admin:
filter: ""
canDownloadPCAP: true
canUseScripting: true
scriptingPermissions:
canSave: true
canActivate: true
canDelete: true
canUpdateTargetedPods: true
canStopTrafficCapturing: true
canControlDissection: true
showAdminConsoleLink: true
rolesClaim: role
defaultRole: ""
defaultFilter: ""
saml:
idpMetadataUrl: ""
x509crt: ""
x509key: ""
roleAttribute: role
roles:
admin:
filter: ""
canDownloadPCAP: true
canUseScripting: true
scriptingPermissions:
canSave: true
canActivate: true
canDelete: true
canUpdateTargetedPods: true
canStopTrafficCapturing: true
canControlDissection: true
showAdminConsoleLink: true
ingress:
enabled: false
className: ""
@@ -186,6 +190,7 @@ tap:
streamingType: connect-rpc
completeStreamingEnabled: true
clusterWideMapEnabled: false
entriesLimit: "300000"
telemetry:
enabled: true
resourceGuard:
@@ -205,8 +210,12 @@ tap:
- http
- icmp
- kafka
- mongodb
- mysql
- postgresql
- redis
- ws
- tlsx
- ldap
- radius
- diameter
@@ -224,6 +233,12 @@ tap:
- 5672
kafka:
- 9092
mongodb:
- 27017
mysql:
- 3306
postgresql:
- 5432
redis:
- 6379
ldap:


@@ -4,10 +4,10 @@ apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
name: kubeshark-hub-network-policy
namespace: default
@@ -33,10 +33,10 @@ apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
annotations:
name: kubeshark-front-network-policy
@@ -60,10 +60,10 @@ apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
annotations:
name: kubeshark-dex-network-policy
@@ -87,10 +87,10 @@ apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
annotations:
name: kubeshark-worker-network-policy
@@ -116,10 +116,10 @@ apiVersion: v1
kind: ServiceAccount
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
name: kubeshark-service-account
namespace: default
@@ -132,10 +132,10 @@ metadata:
namespace: default
labels:
app.kubeshark.com/app: hub
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
stringData:
LICENSE: ''
@@ -151,10 +151,10 @@ metadata:
namespace: default
labels:
app.kubeshark.com/app: hub
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
stringData:
AUTH_SAML_X509_CRT: |
@@ -167,10 +167,10 @@ metadata:
namespace: default
labels:
app.kubeshark.com/app: hub
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
stringData:
AUTH_SAML_X509_KEY: |
@@ -182,10 +182,10 @@ metadata:
name: kubeshark-nginx-config-map
namespace: default
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
data:
default.conf: |
@@ -199,6 +199,10 @@ data:
client_header_buffer_size 32k;
large_client_header_buffers 8 64k;
proxy_buffer_size 64k;
proxy_buffers 4 128k;
proxy_busy_buffers_size 128k;
location /api {
rewrite ^/api(.*)$ $1 break;
proxy_pass http://kubeshark-hub;
@@ -248,10 +252,10 @@ metadata:
namespace: default
labels:
app.kubeshark.com/app: hub
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
data:
POD_REGEX: '.*'
@@ -268,17 +272,18 @@ data:
AUTH_ENABLED: 'true'
AUTH_TYPE: 'default'
AUTH_SAML_IDP_METADATA_URL: ''
AUTH_SAML_ROLE_ATTRIBUTE: 'role'
AUTH_SAML_ROLES: '{"admin":{"canControlDissection":true,"canDownloadPCAP":true,"canStopTrafficCapturing":true,"canUpdateTargetedPods":true,"canUseScripting":true,"filter":"","scriptingPermissions":{"canActivate":true,"canDelete":true,"canSave":true},"showAdminConsoleLink":true}}'
AUTH_ROLES: '{"admin":{"canControlDissection":true,"canDownloadPCAP":true,"canStopTrafficCapturing":true,"canUpdateTargetedPods":true,"canUseScripting":true,"filter":"","scriptingPermissions":{"canActivate":true,"canDelete":true,"canSave":true},"showAdminConsoleLink":true}}'
AUTH_ROLES_CLAIM: 'role'
AUTH_DEFAULT_ROLE: ''
AUTH_OIDC_ISSUER: 'not set'
AUTH_OIDC_REFRESH_TOKEN_LIFETIME: '3960h'
AUTH_OIDC_STATE_PARAM_EXPIRY: '10m'
AUTH_OIDC_BYPASS_SSL_CA_CHECK: 'false'
TELEMETRY_DISABLED: 'false'
SCRIPTING_DISABLED: 'false'
TARGETED_PODS_UPDATE_DISABLED: ''
TARGETED_PODS_UPDATE_DISABLED: 'false'
PRESET_FILTERS_CHANGING_ENABLED: 'true'
RECORDING_DISABLED: ''
RECORDING_DISABLED: 'false'
DISSECTION_CONTROL_ENABLED: 'true'
GLOBAL_FILTER: ""
DEFAULT_FILTER: ""
@@ -289,15 +294,17 @@ data:
TIMEZONE: ' '
CLOUD_LICENSE_ENABLED: 'true'
DUPLICATE_TIMEFRAME: '200ms'
ENABLED_DISSECTORS: 'amqp,dns,http,icmp,kafka,redis,ws,ldap,radius,diameter,udp-flow,tcp-flow,udp-conn,tcp-conn'
ENABLED_DISSECTORS: 'amqp,dns,http,icmp,kafka,mongodb,mysql,postgresql,redis,ws,tlsx,ldap,radius,diameter,udp-flow,tcp-flow,udp-conn,tcp-conn'
CUSTOM_MACROS: '{"https":"tls and (http or http2)"}'
DISSECTORS_UPDATING_ENABLED: 'true'
SNAPSHOTS_UPDATING_ENABLED: 'true'
DEMO_MODE_ENABLED: 'false'
DETECT_DUPLICATES: 'false'
PCAP_DUMP_ENABLE: 'false'
PCAP_TIME_INTERVAL: '1m'
PCAP_MAX_TIME: '1h'
PCAP_MAX_SIZE: '500MB'
PORT_MAPPING: '{"amqp":[5671,5672],"diameter":[3868],"http":[80,443,8080],"kafka":[9092],"ldap":[389],"redis":[6379]}'
PORT_MAPPING: '{"amqp":[5671,5672],"diameter":[3868],"http":[80,443,8080],"kafka":[9092],"ldap":[389],"mongodb":[27017],"mysql":[3306],"postgresql":[5432],"redis":[6379]}'
RAW_CAPTURE_ENABLED: 'true'
RAW_CAPTURE_STORAGE_SIZE: '1Gi'
---
@@ -306,10 +313,10 @@ apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
name: kubeshark-cluster-role-default
namespace: default
@@ -347,16 +354,22 @@ rules:
- create
- update
- delete
- apiGroups:
- authentication.k8s.io
resources:
- tokenreviews
verbs:
- create
---
# Source: kubeshark/templates/03-cluster-role-binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
name: kubeshark-cluster-role-binding-default
namespace: default
@@ -374,10 +387,10 @@ apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
annotations:
name: kubeshark-self-config-role
@@ -412,6 +425,15 @@ rules:
verbs:
- create
- get
- apiGroups:
- ""
resources:
- persistentvolumeclaims
verbs:
- create
- get
- list
- delete
- apiGroups:
- batch
resources:
@@ -424,10 +446,10 @@ apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
annotations:
name: kubeshark-self-config-role-binding
@@ -447,10 +469,10 @@ kind: Service
metadata:
labels:
app.kubeshark.com/app: hub
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
name: kubeshark-hub
namespace: default
@@ -468,10 +490,10 @@ apiVersion: v1
kind: Service
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
name: kubeshark-front
namespace: default
@@ -489,10 +511,10 @@ kind: Service
apiVersion: v1
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
annotations:
prometheus.io/scrape: 'true'
@@ -502,10 +524,10 @@ metadata:
spec:
selector:
app.kubeshark.com/app: worker
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
ports:
- name: metrics
@@ -518,10 +540,10 @@ kind: Service
apiVersion: v1
metadata:
labels:
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
annotations:
prometheus.io/scrape: 'true'
@@ -531,10 +553,10 @@ metadata:
spec:
selector:
app.kubeshark.com/app: hub
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
ports:
- name: metrics
@@ -549,10 +571,10 @@ metadata:
labels:
app.kubeshark.com/app: worker
sidecar.istio.io/inject: "false"
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
name: kubeshark-worker-daemon-set
namespace: default
@@ -566,10 +588,11 @@ spec:
metadata:
labels:
app.kubeshark.com/app: worker
helm.sh/chart: kubeshark-53.1.0
kubeshark.io/internal-auth: "true"
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
name: kubeshark-worker-daemon-set
namespace: kubeshark
@@ -579,7 +602,7 @@ spec:
- /bin/sh
- -c
- mkdir -p /sys/fs/bpf && mount | grep -q '/sys/fs/bpf' || mount -t bpf bpf /sys/fs/bpf
image: 'docker.io/kubeshark/worker:v53.1'
image: 'docker.io/kubeshark/worker:v53.3'
imagePullPolicy: Always
name: mount-bpf
securityContext:
@@ -618,7 +641,7 @@ spec:
- '500Mi'
- -cloud-api-url
- 'https://api.kubeshark.com'
image: 'docker.io/kubeshark/worker:v53.1'
image: 'docker.io/kubeshark/worker:v53.3'
imagePullPolicy: Always
name: sniffer
ports:
@@ -634,6 +657,10 @@ spec:
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: TCP_STREAM_CHANNEL_TIMEOUT_MS
value: '10000'
- name: TCP_STREAM_CHANNEL_TIMEOUT_SHOW
@@ -690,7 +717,7 @@ spec:
- -disable-tls-log
- -loglevel
- 'warning'
image: 'docker.io/kubeshark/worker:v53.1'
image: 'docker.io/kubeshark/worker:v53.3'
imagePullPolicy: Always
name: tracer
env:
@@ -702,6 +729,10 @@ spec:
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: PROFILING_ENABLED
value: 'false'
- name: SENTRY_ENABLED
@@ -782,10 +813,10 @@ kind: Deployment
metadata:
labels:
app.kubeshark.com/app: hub
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
name: kubeshark-hub
namespace: default
@@ -800,10 +831,10 @@ spec:
metadata:
labels:
app.kubeshark.com/app: hub
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
spec:
dnsPolicy: ClusterFirstWithHostNet
@@ -819,13 +850,15 @@ spec:
- -capture-stop-after
- "5m"
- -snapshot-size-limit
- ''
- '20Gi'
- -dissector-image
- 'docker.io/kubeshark/worker:v53.1'
- 'docker.io/kubeshark/worker:v53.3'
- -dissector-cpu
- '1'
- -dissector-memory
- '4Gi'
- -dissector-storage-size
- '20Gi'
- -cloud-api-url
- 'https://api.kubeshark.com'
env:
@@ -843,7 +876,7 @@ spec:
value: 'production'
- name: PROFILING_ENABLED
value: 'false'
image: 'docker.io/kubeshark/hub:v53.1'
image: 'docker.io/kubeshark/hub:v53.3'
imagePullPolicy: Always
readinessProbe:
periodSeconds: 5
@@ -911,10 +944,10 @@ kind: Deployment
metadata:
labels:
app.kubeshark.com/app: front
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
name: kubeshark-front
namespace: default
@@ -929,10 +962,10 @@ spec:
metadata:
labels:
app.kubeshark.com/app: front
helm.sh/chart: kubeshark-53.1.0
helm.sh/chart: kubeshark-53.3.0
app.kubernetes.io/name: kubeshark
app.kubernetes.io/instance: kubeshark
app.kubernetes.io/version: "53.1.0"
app.kubernetes.io/version: "53.3.0"
app.kubernetes.io/managed-by: Helm
spec:
containers:
@@ -973,13 +1006,21 @@ spec:
value: 'false'
- name: REACT_APP_DISSECTORS_UPDATING_ENABLED
value: 'true'
- name: REACT_APP_SNAPSHOTS_UPDATING_ENABLED
value: 'true'
- name: REACT_APP_DEMO_MODE_ENABLED
value: 'false'
- name: REACT_APP_CLUSTER_WIDE_MAP_ENABLED
value: 'false'
- name: REACT_APP_RAW_CAPTURE_ENABLED
value: 'true'
- name: REACT_APP_ENTRIES_LIMIT
value: '300000'
- name: REACT_APP_SENTRY_ENABLED
value: 'false'
- name: REACT_APP_SENTRY_ENVIRONMENT
value: 'production'
image: 'docker.io/kubeshark/front:v53.1'
image: 'docker.io/kubeshark/front:v53.3'
imagePullPolicy: Always
name: kubeshark-front
livenessProbe:


@@ -14,6 +14,7 @@ compatible agents.
|-------|-------------|
| [`network-rca`](network-rca/) | Network Root Cause Analysis. Retrospective traffic analysis via snapshots, with two investigation routes: PCAP (for Wireshark/compliance) and Dissection (for AI-driven API-level investigation). |
| [`kfl`](kfl/) | KFL2 (Kubeshark Filter Language) expert. Complete reference for writing, debugging, and optimizing CEL-based traffic filters across all supported protocols. |
| [`security-audit`](security-audit/) | Network Security Audit. Systematic 8-phase threat detection across MITRE ATT&CK tactics — C2, exfiltration, lateral movement, credential theft, cryptomining, protocol abuse — using snapshot-based traffic analysis. |
## Prerequisites

489
skills/install/SKILL.md Normal file

@@ -0,0 +1,489 @@
---
name: install
user-invocable: true
description: >
Kubeshark installation and deployment skill. Use this skill whenever the user wants
to install Kubeshark, deploy Kubeshark to a Kubernetes cluster, set up Kubeshark,
configure Kubeshark helm values, generate a Kubeshark config file, customize
Kubeshark deployment, troubleshoot Kubeshark installation, upgrade Kubeshark,
uninstall Kubeshark, or manage the Kubeshark Helm release. Also trigger when
the user mentions "kubeshark tap", "kubeshark clean", "helm install kubeshark",
"get kubeshark running", "set up traffic capture", "deploy kubeshark",
"kubeshark not starting", "kubeshark pods not ready", "configure namespaces",
"persistent storage", "cloud storage for snapshots", "kubeshark ingress",
"kubeshark auth", "kubeshark SAML", "kubeshark license", "kubeshark config",
"custom helm values", "kubeshark on EKS/GKE/AKS", "kubeshark on OpenShift",
"kubeshark on KinD/minikube/k3s", "air-gapped", "offline install",
or any request related to getting Kubeshark installed, configured, and running
in a Kubernetes cluster.
---
# Kubeshark Installation & Deployment
You are a Kubeshark deployment specialist. Your job is to help users install,
configure, and deploy Kubeshark to their Kubernetes cluster — tailoring the
configuration to their specific environment, requirements, and use case.
Kubeshark deploys via Helm. The CLI (`kubeshark tap`) is a thin wrapper that
installs a basic Helm chart and establishes a port-forward — nothing more.
For larger or production clusters, use Helm directly with a custom values file.
## Decision: CLI or Helm?
**Use the CLI** when:
- Quick install on a dev/test cluster (minikube, KinD, k3s)
- Personal environment, single user
- Just want to try Kubeshark quickly
**Use Helm directly** when:
- Larger cluster (staging, production)
- Need custom configuration (ingress, auth, storage, namespaces)
- GitOps / infrastructure-as-code workflows
- Team environment
## Path A: CLI (Dev/Test Clusters)
### Step 1 — Install the CLI
Check if Kubeshark is already installed:
```bash
kubeshark version
```
If not installed, offer one of these methods:
**Homebrew (easiest, where available):**
```bash
brew tap kubeshark/kubeshark
brew install kubeshark
```
**Binary download:**
For the full list of platforms and architectures, see https://docs.kubeshark.com/en/install
```bash
# Linux (amd64)
curl -Lo kubeshark https://github.com/kubeshark/kubeshark/releases/latest/download/kubeshark_linux_amd64
chmod +x kubeshark
sudo mv kubeshark /usr/local/bin/
# Linux (arm64)
curl -Lo kubeshark https://github.com/kubeshark/kubeshark/releases/latest/download/kubeshark_linux_arm64
chmod +x kubeshark
sudo mv kubeshark /usr/local/bin/
# macOS (Apple Silicon)
curl -Lo kubeshark https://github.com/kubeshark/kubeshark/releases/latest/download/kubeshark_darwin_arm64
chmod +x kubeshark
sudo mv kubeshark /usr/local/bin/
# macOS (Intel)
curl -Lo kubeshark https://github.com/kubeshark/kubeshark/releases/latest/download/kubeshark_darwin_amd64
chmod +x kubeshark
sudo mv kubeshark /usr/local/bin/
```
### Step 2 — Check for Updates
**Always check for updates before using the CLI.** This is critical — Kubeshark
releases frequently and running an outdated version can cause issues.
```bash
# Homebrew
brew upgrade kubeshark
# Binary — check the latest release and re-download if newer
kubeshark version
# Compare with https://github.com/kubeshark/kubeshark/releases/latest
```
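The comparison in the last comment can be sketched in shell. This is a minimal sketch, not part of the Kubeshark CLI: `version_lt` is a hypothetical helper, the two version strings are example values typed in by hand, and it assumes a `sort` that supports `-V` (version sort).

```bash
# Hypothetical helper: exit 0 when version $1 is strictly older than version $2.
# A leading "v" (as used in release tags) is ignored. Assumes `sort -V` exists.
version_lt() {
  a="${1#v}"
  b="${2#v}"
  [ "$a" != "$b" ] && [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$a" ]
}

# Example values only: copy the real ones from `kubeshark version`
# and from the latest release page.
installed="53.1.0"
latest="v53.3.0"
if version_lt "$installed" "$latest"; then
  echo "update available: $installed -> ${latest#v}"
else
  echo "up to date"
fi
```

Version sort is used instead of a plain string comparison so that, for example, `53.10.0` is treated as newer than `53.9.0`.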
### Step 3 — Deploy with `kubeshark tap`
```bash
kubeshark tap
```
This installs the Helm chart with defaults and opens the dashboard in your browser.
That's it for dev/test clusters.
### Step 4 — Reconnect if Connection Breaks
If the port-forward drops (laptop sleep, network change, terminal closed):
```bash
kubeshark proxy
```
This re-establishes the port-forward and reopens the dashboard. It does **not**
reinstall — Kubeshark is still running in the cluster.
### Step 5 — Clean Up After Use
**Always clean up when done.** Kubeshark runs eBPF probes and DaemonSet workers
on every node — leaving it running wastes cluster resources.
```bash
kubeshark clean
```
Always remind the user to run `kubeshark clean` when they're finished. This is
easy to forget and important.
## Path B: Helm (Larger / Production Clusters)
### Step 1 — Upgrade the Helm Chart
**Always update the Helm repo first.** This is the most important first step —
running an outdated chart can cause issues.
```bash
helm repo add kubeshark https://helm.kubeshark.com
helm repo update
```
### Step 2 — Create a Config Directory
Store all configuration files in `~/.kubeshark/`:
```bash
mkdir -p ~/.kubeshark
```
**Before writing any file to `~/.kubeshark/`, check if it already exists.**
If `~/.kubeshark/values.yaml` (or any target filename) already exists, **ask the
user** before overwriting. Either:
1. Back up the existing file first: `cp ~/.kubeshark/values.yaml ~/.kubeshark/values.yaml.bak.$(date +%s)`
2. Use a descriptive name for the new file (e.g., `values-production.yaml`, `values-staging.yaml`)
The user may have multiple values files for different clusters or environments.
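The backup option can be wrapped in a small helper so it is applied the same way every time. This is a sketch under stated assumptions: `backup_if_exists` is a hypothetical name, and the backup suffix follows the `.bak.$(date +%s)` pattern shown above. It does not replace asking the user first.

```bash
# Hypothetical helper: if FILE exists, copy it to FILE.bak.<epoch seconds>
# and print the backup path; if FILE does not exist, do nothing.
backup_if_exists() {
  target="$1"
  if [ -e "$target" ]; then
    backup="${target}.bak.$(date +%s)"
    cp -p "$target" "$backup" && printf '%s\n' "$backup"
  fi
}
```

A typical call would be `backup_if_exists ~/.kubeshark/values.yaml` right before writing the new file. Because the suffix has one-second resolution, two backups taken within the same second would share a name.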
### Step 3 — Build the Values File
Walk through the following configuration areas with the user. Each section
explains what the value does and what to recommend.
#### Pod Targeting (CRITICAL)
```yaml
tap:
regex: .*
namespaces: []
excludedNamespaces: []
```
**This is one of the most important configuration decisions.** By default,
Kubeshark monitors the entire cluster's traffic. On a large cluster this is a
huge undertaking that consumes significant CPU and memory on every node.
**Always set namespace targeting.** Ask the user which namespaces contain the
workloads they care about, and set those explicitly:
```yaml
tap:
namespaces:
- production
- staging
```
Alternatively, use `excludedNamespaces` to monitor everything except specific
namespaces:
```yaml
tap:
excludedNamespaces:
- kube-system
- monitoring
- kubeshark
```
The `regex` field filters by pod name within the targeted namespaces. Leave as
`.*` unless the user wants to focus on specific pods.
Pod targeting rules make Kubeshark process traffic only from the selected workloads,
which significantly reduces its CPU and memory consumption.
#### Docker Registry (Air-Gapped Environments)
```yaml
tap:
docker:
registry: docker.io/kubeshark
tag: ""
```
- `tap.docker.registry` — Change this for air-gapped environments where there's
no access to `docker.io`. Point to your internal registry. Additional config
may be needed (pull secrets, registry credentials).
- `tap.docker.tag` — Set a specific version. If a patch version is missing, the
latest patch in that minor version is used. **Leave empty (recommended)** to
use the version matching the Helm chart.
For air-gapped clusters, also set:
```yaml
internetConnectivity: false
```
This is the **most important setting for air-gapped clusters** — it disables all
outbound connectivity checks (license validation, telemetry, update checks).
#### Capture & Dissection
```yaml
tap:
capture:
dissection:
enabled: true
stopAfter: 5m
raw:
enabled: true
storageSize: 1Gi
dbMaxSize: 500Mi
```
**`tap.capture.dissection.enabled`** — Controls real-time dissection (L7 protocol
parsing on production nodes). Real-time dissection consumes significant compute
resources from production nodes. **Recommend starting with `false` (disabled).**
This can be toggled on-demand from the dashboard when needed, so it's used only
when necessary and doesn't consume resources the rest of the time.
Dissection is independent from raw capture + snapshots. Raw capture is lightweight
and runs continuously; dissection is the heavy operation.
**`tap.capture.dissection.stopAfter`** — Time after which dissection automatically
disables once all client connections end. Set to `0` to never auto-disable (manual
control only).
**`tap.capture.raw.enabled`** — Keep this `true`. Raw capture consumes very little
production resources yet captures all traffic. This is what powers snapshots and
retrospective analysis.
**`tap.capture.raw.storageSize`** — The FIFO buffer for raw capture per node.
**Recommend 100Gi** for production. The larger this is, the further back in time
snapshots can reach.
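To make the size-versus-history trade-off concrete, the reach of the buffer can be estimated as buffer size divided by the average raw-capture write rate on a node. The sketch below is illustrative only: `retention_minutes` is a hypothetical helper, the 10 MiB/s rate is an assumed example figure (measure the real rate on your nodes), and it assumes the buffer fills at a steady rate.

```bash
# Hypothetical helper: whole minutes of traffic that a FIFO buffer of
# SIZE_GIB (Gi) can hold at an average write rate of RATE_MIB_S (MiB/s) per node.
retention_minutes() {
  size_gib="$1"
  rate_mib_s="$2"
  echo $(( size_gib * 1024 / rate_mib_s / 60 ))
}

retention_minutes 1 10     # default 1Gi buffer at an assumed 10 MiB/s: prints 1
retention_minutes 100 10   # 100Gi buffer at the same assumed rate: prints 170
```

Under these assumptions the default buffer covers about a minute of traffic on a busy node, while 100Gi covers close to three hours.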
**`tap.capture.dbMaxSize`** — Size of the database holding dissected API calls.
Bigger = more history kept. Adjust based on how much queryable history the user needs.
**`tap.capture.captureSelf`** — Debug option. Ignore during installation.
**`bpfOverride`** — Debug option. Ignore during installation.
#### Delayed Dissection
```yaml
tap:
delayedDissection:
cpu: "1"
memory: 4Gi
```
Delayed dissection is the process on the Hub that dissects raw capture data within
a snapshot. It runs on the Hub node (not production nodes) and is triggered when
a delayed dissection operation is requested on a snapshot.
**Give this as many resources as possible.** Recommend `cpu: "5"` and `memory: 5Gi`.
This speeds up snapshot analysis significantly.
#### Snapshot Storage (Local)
```yaml
tap:
snapshots:
local:
storageClass: ""
storageSize: 20Gi
```
Snapshots are stored locally on this volume. **Be very generous with this.**
**Recommend 2Ti (2 TiB)** for production environments that will accumulate snapshots.
**`storageClass`** — Must match a valid storage class in the cluster. Suggest
based on the cloud provider:
| Provider | Recommended Storage Class |
|----------|-------------------------|
| EKS (AWS) | `gp2` or `gp3` |
| GKE (Google) | `standard` or `premium-rwo` |
| AKS (Azure) | `managed-csi` or `managed-premium` |
| OpenShift | Check `kubectl get sc` — varies by provider |
| KinD / minikube | `standard` (default) |
| Private / bare metal | Ask the user for their storage class |
Always verify available storage classes with `kubectl get sc`.
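When the user has no preference, the cluster's default storage class is a reasonable first suggestion. The sketch below picks it out of the table printed by `kubectl get sc`. It assumes the usual table layout, in which the default class carries a `(default)` marker right after its name; `default_storage_class` is a hypothetical helper name.

```bash
# Hypothetical helper: read `kubectl get sc` table output on stdin and print
# the name of each class marked "(default)". Prints nothing if none is marked.
default_storage_class() {
  awk 'NR > 1 && $2 == "(default)" { print $1 }'
}
```

Usage would be `kubectl get sc | default_storage_class`. An empty result means no default is set, so the storage class has to be chosen explicitly.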
#### Cloud Storage (Long-Term Retention)
Cloud storage enables uploading snapshots to S3, GCS, or Azure Blob for long-term
retention, cross-cluster sharing, and backup/restore.
For detailed configuration per provider (including IRSA, Workload Identity, static
credentials, and ConfigMap/Secret setup), see `references/cloud-storage.md`.
Summary of provider values:
```yaml
tap:
snapshots:
cloud:
provider: "" # "s3", "azblob", or "gcs" (empty = disabled)
prefix: "" # Key prefix in bucket
configMaps: [] # Pre-existing ConfigMaps with cloud config
secrets: [] # Pre-existing Secrets with cloud credentials
```
Help the user select the right provider based on where their cluster runs and
walk them through the authentication setup.
#### Resources
For a first installation, **do not change the resource defaults.** Let the user
run Kubeshark with defaults first and tune based on actual usage patterns later.
The defaults are reasonable starting points. Resource consumption depends heavily
on how much traffic is processed, which is controlled by pod targeting rules.
#### Node Selectors
```yaml
tap:
nodeSelectorTerms:
workers:
- matchExpressions:
- key: kubernetes.io/os
operator: In
values: [linux]
```
Use `nodeSelectorTerms` when the user wants to focus on specific nodes. The less
traffic Kubeshark processes, the less CPU and memory it consumes. The goal is
to process workloads of interest, not the entire cluster.
#### Ingress (STRONGLY RECOMMENDED)
```yaml
tap:
ingress:
enabled: false
className: ""
host: ks.svc.cluster.local
path: /
tls: []
annotations: {}
```
**Ingress is the strongly preferred access method.** Port-forward is available, but
it is **strongly discouraged** for anything beyond quick local testing: it is fragile,
drops connections, and doesn't scale for team use.
**Always help the user configure ingress.** Ask them about their ingress controller
(nginx, ALB, Traefik, etc.) and build the ingress config:
```yaml
tap:
ingress:
enabled: true
className: nginx
host: kubeshark.example.com
tls:
- secretName: kubeshark-tls
hosts:
- kubeshark.example.com
annotations: {}
```
For ALB on AWS:
```yaml
tap:
ingress:
enabled: true
className: alb
host: kubeshark.example.com
annotations:
alb.ingress.kubernetes.io/scheme: internal
alb.ingress.kubernetes.io/target-type: ip
```
#### Air-Gapped Clusters
For air-gapped environments, two settings are essential:
```yaml
tap:
docker:
registry: your-internal-registry.example.com/kubeshark
internetConnectivity: false
```
`internetConnectivity: false` is the **single most important option** for
air-gapped clusters. Without it, Kubeshark will attempt outbound connections
that will fail and cause issues.
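Pointing `tap.docker.registry` at an internal registry only works if the images are already there. The sketch below prints, without running them, the `docker` commands that would copy the `worker`, `hub`, and `front` images across. `mirror_plan` is a hypothetical helper, the `v53.3` tag is an example, and any other images the chart may reference are not covered.

```bash
# Hypothetical helper: print (do not run) the docker commands that would copy
# the worker, hub, and front images from registry SRC to registry DST at TAG.
mirror_plan() {
  src="$1"
  dst="$2"
  tag="$3"
  for component in worker hub front; do
    echo "docker pull ${src}/${component}:${tag}"
    echo "docker tag ${src}/${component}:${tag} ${dst}/${component}:${tag}"
    echo "docker push ${dst}/${component}:${tag}"
  done
}

mirror_plan docker.io/kubeshark your-internal-registry.example.com/kubeshark v53.3
```

Printing the plan first lets the user review it, then run it from a machine that can reach both registries.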
### Step 4 — Install
```bash
helm install kubeshark kubeshark/kubeshark \
-f ~/.kubeshark/values.yaml \
-n kubeshark --create-namespace
```
### Step 5 — Upgrade
When upgrading, **always update the Helm repo first**:
```bash
helm repo update
helm upgrade kubeshark kubeshark/kubeshark \
-f ~/.kubeshark/values.yaml \
-n kubeshark
```
## Uninstalling
**Via CLI:**
```bash
kubeshark clean
kubeshark clean -s kubeshark # Specific namespace
```
**Via Helm:**
```bash
helm uninstall kubeshark -n kubeshark
```
PersistentVolumeClaims are not deleted by default. Remove manually if needed:
```bash
kubectl delete pvc -l app.kubernetes.io/name=kubeshark -n kubeshark
```
## Troubleshooting
- **Pods not starting**: Check `kubectl get pods -l app.kubernetes.io/name=kubeshark -n <ns>`
and `kubectl describe pod`. Common: ImagePullBackOff (registry), Pending (storage/resources),
CrashLoopBackOff (check `kubectl logs`).
- **No traffic**: Verify namespaces have running pods, check pod regex, ensure eBPF is supported
(kernel 4.14+, 5.4+ recommended).
- **Permissions**: Requires privileged containers with NET_RAW, NET_ADMIN, SYS_ADMIN,
SYS_PTRACE, SYS_RESOURCE, IPC_LOCK capabilities.
- **Storage**: Verify storage class exists (`kubectl get sc`), PVC is bound (`kubectl get pvc`).
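The kernel thresholds from the "No traffic" item can be checked mechanically. This is a sketch only: `kernel_ebpf_level` is a hypothetical helper that classifies a release string such as the one printed by `uname -r`. It looks only at the major and minor numbers, so it says nothing about distribution backports or kernel build options.

```bash
# Hypothetical helper: classify a kernel release string against the thresholds
# above (4.14 minimum, 5.4 recommended). Only major.minor is inspected.
kernel_ebpf_level() {
  release="$1"
  major="${release%%.*}"        # text before the first dot
  rest="${release#*.}"          # text after the first dot
  minor="${rest%%[!0-9]*}"      # leading digits of the remainder
  if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 4 ]; }; then
    echo recommended
  elif [ "$major" -eq 5 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 14 ]; }; then
    echo supported
  else
    echo unsupported
  fi
}

kernel_ebpf_level 5.15.0-91-generic   # prints recommended
kernel_ebpf_level 4.14.336            # prints supported
kernel_ebpf_level 4.9.0               # prints unsupported
```

The check is meaningful only for the kernel of a cluster node, so pass it the output of `uname -r` from a node, not from the workstation.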
## Setup Reference
### Kubeshark MCP for AI Agents
After installation, connect the Kubeshark MCP so AI agents can interact with Kubeshark:
```bash
# Claude Code
claude mcp add kubeshark -- kubeshark mcp
# Direct URL (no kubectl needed)
claude mcp add kubeshark -- kubeshark mcp --url https://kubeshark.example.com
```


@@ -0,0 +1,96 @@
# Cloud Storage for Snapshots
This is a pointer to the authoritative cloud storage documentation maintained in
the Helm chart:
**Source of truth**: `helm-chart/docs/snapshots_cloud_storage.md`
Always read that file for the latest configuration details, including:
- Amazon S3 (static credentials, IRSA, cross-account AssumeRole)
- Azure Blob Storage (storage key, Workload Identity / DefaultAzureCredential)
- Google Cloud Storage (service account JSON, GKE Workload Identity)
- IAM permissions and trust policy examples
- ConfigMap and Secret setup patterns
- Inline values vs. external ConfigMap/Secret approaches
## Quick Reference
### Helm Values Structure
```yaml
tap:
snapshots:
cloud:
provider: "" # "s3", "azblob", or "gcs" (empty = disabled)
prefix: "" # Key prefix in the bucket/container
configMaps: [] # Pre-existing ConfigMaps with cloud config env vars
secrets: [] # Pre-existing Secrets with cloud credentials
s3:
bucket: ""
region: ""
accessKey: ""
secretKey: ""
roleArn: ""
externalId: ""
azblob:
storageAccount: ""
container: ""
storageKey: ""
gcs:
bucket: ""
project: ""
credentialsJson: ""
```
### Recommended Auth Per Provider
| Provider | Production Recommendation |
|----------|-------------------------|
| S3 (EKS) | IRSA (IAM Roles for Service Accounts) — no static credentials |
| S3 (non-EKS) | Static credentials via Secret, or default AWS credential chain |
| Azure Blob (AKS) | Workload Identity / Managed Identity |
| Azure Blob (non-AKS) | Storage account key via Secret |
| GCS (GKE) | GKE Workload Identity — no JSON key file |
| GCS (non-GKE) | Service account JSON key via Secret |
### Inline Values (Simplest Approach)
Set the provider settings (and static credentials, if used) directly in `values.yaml`.
The Helm chart creates the necessary ConfigMap/Secret resources automatically.
**S3:**
```yaml
tap:
  snapshots:
    cloud:
      provider: "s3"
      s3:
        bucket: my-kubeshark-snapshots
        region: us-east-1
```
**GCS:**
```yaml
tap:
  snapshots:
    cloud:
      provider: "gcs"
      gcs:
        bucket: my-kubeshark-snapshots
        project: my-gcp-project
```
**Azure Blob:**
```yaml
tap:
  snapshots:
    cloud:
      provider: "azblob"
      azblob:
        storageAccount: mykubesharksa
        container: snapshots
```
For production setups with proper IAM integration, see the full documentation
in `helm-chart/docs/snapshots_cloud_storage.md`.

@@ -0,0 +1,376 @@
# Kubeshark Helm Values Reference
Complete reference for all Kubeshark Helm chart values. Use this when building
custom `values.yaml` files or `--set` flags.
## Docker Images
```yaml
tap:
  docker:
    registry: docker.io/kubeshark  # Docker registry
    tag: ""                        # Image tag (empty = chart appVersion)
    tagLocked: true                # Lock to specific tag
    imagePullPolicy: Always        # Always, IfNotPresent, Never
    imagePullSecrets: []           # Registry pull secrets
    overrideImage:                 # Override individual component images
      worker: ""
      hub: ""
      front: ""
    overrideTag:                   # Override individual component tags
      worker: ""
      hub: ""
      front: ""
```
## Proxy / Port-Forward
```yaml
tap:
  proxy:
    worker:
      srvPort: 48999
    hub:
      srvPort: 8898
    front:
      port: 8899       # Local port for port-forward
    host: 127.0.0.1    # Bind address
```
## Pod Targeting
```yaml
tap:
  regex: .*               # Pod name regex filter
  namespaces: []          # Target namespaces (empty = all)
  excludedNamespaces: []  # Namespaces to exclude
  bpfOverride: ""         # Custom BPF filter override
```
## Capture & Dissection
```yaml
tap:
  capture:
    dissection:
      enabled: true      # Enable L7 dissection
      stopAfter: 5m      # Auto-stop dissection after duration
    captureSelf: false   # Capture Kubeshark's own traffic
    raw:
      enabled: true      # Enable raw packet capture (needed for snapshots)
      storageSize: 1Gi   # FIFO buffer size per node
    dbMaxSize: 500Mi     # Max L7 database size per node
  delayedDissection:
    cpu: "1"             # CPU for delayed dissection jobs
    memory: 4Gi          # Memory for delayed dissection jobs
    storageSize: ""      # Storage for delayed dissection
    storageClass: ""     # Storage class for delayed dissection
```
## Snapshots
```yaml
tap:
  snapshots:
    local:
      storageClass: ""       # Storage class for local snapshots
      storageSize: 20Gi      # PVC size for local snapshots
    cloud:
      provider: ""           # s3, gcs, or azblob
      prefix: ""             # Path prefix in bucket
      configMaps: []         # Additional ConfigMaps to mount
      secrets: []            # Additional Secrets to mount
      s3:
        bucket: ""
        region: ""
        accessKey: ""
        secretKey: ""
        roleArn: ""          # IAM role ARN (IRSA)
        externalId: ""       # STS external ID
      azblob:
        storageAccount: ""
        container: ""
        storageKey: ""
      gcs:
        bucket: ""
        project: ""
        credentialsJson: ""  # Service account JSON
```
## Helm Release
```yaml
tap:
  release:
    repo: https://helm.kubeshark.com  # Helm chart repository
    name: kubeshark                   # Release name
    namespace: default                # Release namespace
    helmChartPath: ""                 # Path to local chart (overrides repo)
```
## Storage
```yaml
tap:
  persistentStorage: false                    # Enable PVC for worker data
  persistentStorageStatic: false              # Static provisioning
  persistentStoragePvcVolumeMode: FileSystem  # FileSystem or Block
  efsFileSytemIdAndPath: ""                   # EFS file system ID (EKS)
  secrets: []                                 # Additional secrets to mount
  storageLimit: 10Gi                          # Max storage per node
  storageClass: standard                      # Default storage class
```
## Resources
```yaml
tap:
  resources:
    hub:
      limits:
        cpu: "0"        # 0 = no limit
        memory: 5Gi
      requests:
        cpu: 50m
        memory: 50Mi
    sniffer:
      limits:
        cpu: "0"
        memory: 5Gi
      requests:
        cpu: 50m
        memory: 50Mi
    tracer:
      limits:
        cpu: "0"
        memory: 5Gi
      requests:
        cpu: 50m
        memory: 50Mi
```
## Health Probes
```yaml
tap:
  probes:
    hub:
      initialDelaySeconds: 5
      periodSeconds: 5
      successThreshold: 1
      failureThreshold: 3
    sniffer:
      initialDelaySeconds: 5
      periodSeconds: 5
      successThreshold: 1
      failureThreshold: 3
```
## TLS & Service Mesh
```yaml
tap:
  serviceMesh: true     # Capture mTLS traffic (service mesh)
  tls: true             # Capture OpenSSL/Go TLS traffic
  disableTlsLog: true   # Suppress TLS debug logging
  packetCapture: best   # Capture method: best, af_packet, pcap
```
## Labels, Annotations & Scheduling
```yaml
tap:
  labels: {}         # Additional labels for all pods
  annotations: {}    # Additional annotations for all pods
  nodeSelectorTerms:
    hub:             # Hub pod node selector
      - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values: [linux]
    workers:         # Worker DaemonSet node selector
      - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values: [linux]
    front:           # Frontend pod node selector
      - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values: [linux]
  tolerations:
    hub: []
    workers:
      - operator: Exists
        effect: NoExecute  # Workers tolerate NoExecute by default
    front: []
  priorityClass: ""  # PriorityClassName for pods
```
## Authentication
```yaml
tap:
auth:
enabled: false
type: saml # Only SAML supported currently
roles:
admin:
filter: "" # KFL filter restricting visible traffic
canDownloadPCAP: true
canUseScripting: true
scriptingPermissions:
canSave: true
canActivate: true
canDelete: true
canUpdateTargetedPods: true
canStopTrafficCapturing: true
canControlDissection: true
showAdminConsoleLink: true
rolesClaim: role # SAML attribute for role mapping
defaultRole: "" # Role for users without a role claim
defaultFilter: "" # Default KFL filter for all users
saml:
idpMetadataUrl: "" # SAML IdP metadata URL
x509crt: "" # SP certificate
x509key: "" # SP private key
```
## Ingress
```yaml
tap:
  ingress:
    enabled: false
    className: ""      # nginx, alb, traefik, etc.
    host: ks.svc.cluster.local
    path: /
    tls: []            # TLS configuration
    annotations: {}    # Ingress annotations
```
## Protocol Dissectors
```yaml
tap:
  enabledDissectors:
    - amqp
    - dns
    - http
    - icmp
    - kafka
    - mongodb
    - mysql
    - postgresql
    - redis
    - ws
    - ldap
    - radius
    - diameter
    - udp-flow
    - tcp-flow
    - udp-conn
    - tcp-conn
  portMapping:         # Default port-to-protocol mappings
    http: [80, 443, 8080]
    amqp: [5671, 5672]
    kafka: [9092]
    mongodb: [27017]
    mysql: [3306]
    postgresql: [5432]
    redis: [6379]
    ldap: [389]
    diameter: [3868]
  customMacros:
    https: "tls and (http or http2)"
```
## Networking & Security
```yaml
tap:
  hostNetwork: true   # Use host network (required for capture)
  ipv6: true          # Enable IPv6 support
  mountBpf: true      # Mount BPF filesystem
  securityContext:
    privileged: true
    appArmorProfile:
      type: ""
      localhostProfile: ""
    seLinuxOptions:
      level: ""
      role: ""
      type: ""
      user: ""
    capabilities:
      networkCapture: [NET_RAW, NET_ADMIN]
      serviceMeshCapture: [SYS_ADMIN, SYS_PTRACE, DAC_OVERRIDE]
      ebpfCapture: [SYS_ADMIN, SYS_PTRACE, SYS_RESOURCE, IPC_LOCK]
```
## Dashboard
```yaml
tap:
dashboard:
streamingType: connect-rpc
completeStreamingEnabled: true
clusterWideMapEnabled: false
entriesLimit: "300000"
routing:
front:
basePath: "" # Base path for reverse proxy
```
## Scripting
```yaml
scripting:
  enabled: false
  env: {}              # Environment variables for scripts
  source: ""           # Git repo for scripts
  sources: []          # Multiple script sources
  watchScripts: true   # Watch for script changes
  active: []           # Active scripts
  console: true        # Enable script console
```
## Misc
```yaml
tap:
dryRun: false # Preview targeted pods without deploying
debug: false # Enable debug mode
telemetry:
enabled: true # Anonymous usage telemetry
resourceGuard:
enabled: false # Resource usage guard
watchdog:
enabled: false # Watchdog process
gitops:
enabled: false # GitOps mode
defaultFilter: "" # Default KFL display filter
globalFilter: "" # Global KFL filter (cannot be overridden)
dns:
nameservers: [] # Custom DNS nameservers
searches: [] # Custom DNS search domains
options: [] # Custom DNS options
misc:
jsonTTL: 5m # TTL for JSON entries
pcapTTL: "0" # TTL for PCAP files (0 = no TTL)
trafficSampleRate: 100 # Traffic sampling rate (1-100)
resolutionStrategy: auto # IP resolution: auto, dns, k8s
detectDuplicates: false # Detect duplicate packets
staleTimeoutSeconds: 30 # Timeout for stale connections
tcpFlowTimeout: 1200 # TCP flow idle timeout (seconds)
udpFlowTimeout: 1200 # UDP flow idle timeout (seconds)
headless: false # Suppress browser auto-open
license: "" # Kubeshark Pro license key
timezone: "" # Override timezone
logLevel: warning # Log level: debug, info, warning, error
kube:
configPath: "" # Custom kubeconfig path
context: "" # Kubernetes context name
```

@@ -14,6 +14,7 @@ description: >
or any request to slice/search/narrow network traffic in Kubeshark. Also trigger
when other skills need to construct filters — KFL is the query language for all
Kubeshark traffic analysis.
last-updated: 2026-05-08
---
# KFL2 — Kubeshark Filter Language
@@ -88,13 +89,16 @@ filter term — they're fast and narrow the search space immediately.
|------|----------|------|----------|
| `http` | HTTP/1.1, HTTP/2 | `redis` | Redis |
| `dns` | DNS | `kafka` | Kafka |
| `tls` | TLS/SSL | `amqp` | AMQP |
| `tls` | eBPF TLS interception | `amqp` | AMQP |
| `tcp` | TCP | `ldap` | LDAP |
| `udp` | UDP | `ws` | WebSocket |
| `sctp` | SCTP | `gql` | GraphQL (v1+v2) |
| `icmp` | ICMP | `gqlv1` / `gqlv2` | GraphQL version-specific |
| `radius` | RADIUS | `conn` / `flow` | L4 connection/flow tracking |
| `diameter` | Diameter | `tcp_conn` / `udp_conn` | Transport-specific connections |
| `grpc` | gRPC (HTTP/2 sub-protocol) | `mongodb` | MongoDB |
| `mysql` | MySQL | `postgresql` | PostgreSQL |
| `radius` | RADIUS | | |
| `diameter` | Diameter | `conn` / `flow` | L4 connection/flow tracking |
| | | `tcp_conn` / `udp_conn` | Transport-specific connections |
## Kubernetes Context
@@ -112,6 +116,17 @@ dst.service.namespace == "payments"
Pod fields fall back to service data when pod info is unavailable, so
`dst.pod.namespace` works even for service-level entries.
### Summary Name and Namespace
Convenience variables that pick the best available identity for a peer:
```
src.name == "api-gateway" // pod > service > dns > process
dst.name.contains("payment") // works across identity types
src.namespace == "production" // pod namespace, falls back to service
dst.namespace != "kube-system" // exclude system namespace
```
### Aggregate Collections
Match against any direction (src or dst):
@@ -125,26 +140,13 @@ Match against any direction (src or dst):
### Labels and Annotations
```
// Direct access — works when the label is expected to exist
local_labels.app == "payment" || remote_labels.app == "payment"
// Safe access with default — use when the label may not exist
map_get(local_labels, "app", "") == "checkout"
map_get(local_labels, "app", "") == "checkout" // Safe access with default
map_get(remote_labels, "version", "") == "canary"
// Label existence check
"tier" in local_labels
"tier" in local_labels // Label existence check
```
Direct access (`local_labels.app`) returns an error if the key doesn't exist.
Use `map_get()` when you're not sure the label is present on all workloads.
Queries can be as complex as needed — combine labels with any other fields.
Responses are fast because all API elements are indexed:
```
local_labels.app == "payment" && http && status_code >= 500 && dst.pod.namespace == "production"
```
Always use `map_get()` for labels and annotations — direct access like
`local_labels["app"]` errors if the key doesn't exist.
### Node and Process
@@ -205,8 +207,14 @@ http && request.headers["content-type"] == "application/json"
// GraphQL (subset of HTTP)
gql && method == "POST" && status_code >= 400
// Only eBPF-intercepted TLS traffic (decrypted HTTPS)
tls && http && status_code >= 500
```
> **Note on `tls`**: The `tls` flag is an alias for `capture_source == "ebpf_tls"`.
> It indicates traffic captured via eBPF TLS interception, not TLS protocol dissection.
## DNS Filtering
DNS issues are often the hidden root cause of outages.
@@ -248,6 +256,55 @@ kafka && kafka_request_summary.contains("orders") // Topic filtering
kafka && kafka_size > 10000 // Large messages
```
### MongoDB
```
mongodb && mongodb_command == "find" // Find operations
mongodb && mongodb_collection == "users" // Collection filtering
mongodb && mongodb_database == "mydb" // Database filtering
mongodb && !mongodb_success // Failed operations
mongodb && mongodb_error_code != 0 // Error code filtering
mongodb && mongodb_total_size > 10000 // Large operations
```
### MySQL
```
mysql && mysql_command == "COM_QUERY" // SQL queries
mysql && mysql_query.contains("SELECT") // SELECT statements
mysql && mysql_database == "orders_db" // Database filtering
mysql && !mysql_success // Failed queries
mysql && mysql_error_code != 0 // Error code filtering
mysql && mysql_total_size > 10000 // Large queries
```
### PostgreSQL
```
postgresql && postgresql_command == "SELECT" // Command tag filtering
postgresql && postgresql_query.contains("SELECT") // SELECT statements
postgresql && postgresql_database == "orders_db" // Database filtering
postgresql && postgresql_user == "admin" // User filtering
postgresql && !postgresql_success // Failed queries
postgresql && postgresql_error_code != "" // Error code filtering (SQLSTATE string)
postgresql && postgresql_total_size > 10000 // Large queries
```
> **Note**: `postgresql_error_code` is a **string** (SQLSTATE code like `"23505"`),
> not an int. This differs from MySQL's `mysql_error_code` which is an int.
### gRPC
gRPC is a sub-protocol of HTTP/2. All HTTP variables are also available on gRPC entries.
```
grpc && grpc_method == "SayHello" // Method filtering
grpc && grpc_status != 0 // Non-OK status codes
grpc && grpc_status == 14 // UNAVAILABLE
grpc && grpc_method.contains("Create") // Method pattern
grpc && elapsed_time > 1000000 // Slow gRPC calls (>1s)
```
### AMQP, LDAP, RADIUS, Diameter
```
@@ -301,7 +358,7 @@ dst.port >= 8000 && dst.port <= 9000
timestamp > timestamp("2026-03-14T22:00:00Z")
timestamp >= timestamp("2026-03-14T22:00:00Z") && timestamp <= timestamp("2026-03-14T23:00:00Z")
timestamp > now() - duration("5m") // Last 5 minutes
elapsed_time > 2000000 // Older than 2 seconds
elapsed_time > 2000000 // Latency > 2 seconds
```
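Because `elapsed_time` is expressed in microseconds, latency thresholds that people quote in milliseconds or seconds need converting before they go into a filter. The following is a minimal sketch of that conversion; `latency_filter` is a hypothetical helper written for illustration and is not part of Kubeshark or KFL.

```python
def latency_filter(protocol: str, min_latency_ms: float) -> str:
    """Build a KFL latency filter from a threshold given in milliseconds.

    Hypothetical helper: KFL itself only sees the resulting string.
    elapsed_time is in microseconds, so 1 ms == 1000 units.
    """
    if min_latency_ms < 0:
        raise ValueError("latency threshold must be non-negative")
    micros = int(round(min_latency_ms * 1000))
    return f"{protocol} && elapsed_time > {micros}"


# A 2-second threshold, matching the filter shown above.
print(latency_filter("http", 2000))
```

Placing the protocol flag first follows the guidance earlier in this file that protocol flags narrow the search space quickly.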
## Building Filters: Progressive Narrowing

@@ -1,5 +1,7 @@
# KFL2 Complete Variable and Field Reference
> Last synced with [kfl2 repo](https://github.com/kubeshark/kfl2): 2026-05-08
This is the exhaustive reference for every variable available in KFL2 filters.
KFL2 is built on Google's CEL (Common Expression Language) and evaluates against
Kubeshark's protobuf-based `BaseEntry` structure.
@@ -39,7 +41,7 @@ These are the variables you'll reach for in 90% of investigations:
| `index` | int | Entry index for stream uniqueness |
| `stream` | string | Stream identifier (hex string) |
| `timestamp` | timestamp | Event time (UTC), use with `timestamp()` function |
| `elapsed_time` | int | Age since timestamp in microseconds |
| `elapsed_time` | int | Response-request latency in microseconds |
| `worker` | string | Worker identifier |
## Cross-Reference Variables
@@ -67,13 +69,16 @@ Boolean variables indicating detected protocol. Use as first filter term for per
|----------|----------|----------|----------|
| `http` | HTTP/1.1, HTTP/2 | `redis` | Redis |
| `dns` | DNS | `kafka` | Kafka |
| `tls` | TLS/SSL handshake | `amqp` | AMQP messaging |
| `tls` | eBPF TLS interception | `amqp` | AMQP messaging |
| `tcp` | TCP transport | `ldap` | LDAP directory |
| `udp` | UDP transport | `ws` | WebSocket |
| `sctp` | SCTP streaming | `gql` | GraphQL (v1 or v2) |
| `icmp` | ICMP | `gqlv1` | GraphQL v1 only |
| `radius` | RADIUS auth | `gqlv2` | GraphQL v2 only |
| `diameter` | Diameter | `conn` | L4 connection tracking |
| `grpc` | gRPC (HTTP/2 sub-protocol) | `gqlv2` | GraphQL v2 only |
| `mongodb` | MongoDB | `mysql` | MySQL |
| `postgresql` | PostgreSQL | `diameter` | Diameter |
| `radius` | RADIUS auth | | |
| | | `conn` | L4 connection tracking |
| `flow` | L4 flow tracking | `tcp_conn` | TCP connection tracking |
| `tcp_flow` | TCP flow tracking | `udp_conn` | UDP connection tracking |
| `udp_flow` | UDP flow tracking | | |
@@ -123,7 +128,7 @@ Supported question types: A, AAAA, NS, CNAME, SOA, MX, TXT, SRV, PTR, ANY.
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `tls` | bool | TLS payload detected | |
| `tls` | bool | eBPF TLS interception (alias for `capture_source == "ebpf_tls"`) | |
| `tls_summary` | string | TLS handshake summary | `"ClientHello"`, `"ServerHello"` |
| `tls_info` | string | TLS connection details | `"TLS 1.3, AES-256-GCM"` |
| `tls_request_size` | int | TLS request size in bytes | |
@@ -263,6 +268,76 @@ Supported question types: A, AAAA, NS, CNAME, SOA, MX, TXT, SRV, PTR, ANY.
| `diameter_response_length` | int | Response size (0 if absent) |
| `diameter_total_size` | int | Sum of request + response |
## MongoDB Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `mongodb` | bool | MongoDB payload detected | |
| `mongodb_command` | string | Operation type | `"find"`, `"insert"`, `"update"`, `"delete"` |
| `mongodb_database` | string | Database name | `"mydb"` |
| `mongodb_collection` | string | Collection name | `"users"` |
| `mongodb_opcode` | string | Operation opcode name | |
| `mongodb_request_size` | int | Request size in bytes | |
| `mongodb_response_size` | int | Response size in bytes | |
| `mongodb_total_size` | int | Combined request + response size | |
| `mongodb_success` | bool | Operation success status | |
| `mongodb_error_code` | int | Error code | |
| `mongodb_error_message` | string | Error description | |
| `mongodb_error_code_name` | string | Named error code | |
**Example**: `mongodb && mongodb_command == "find" && mongodb_collection == "users"`
## MySQL Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `mysql` | bool | MySQL payload detected | |
| `mysql_command` | string | SQL command name | `"COM_QUERY"`, `"COM_STMT_PREPARE"` |
| `mysql_query` | string | Full SQL query text | `"SELECT * FROM users"` |
| `mysql_database` | string | Active database name | `"orders_db"` |
| `mysql_statement_id` | int | Prepared statement identifier | |
| `mysql_request_size` | int | Request payload size in bytes | |
| `mysql_response_size` | int | Response payload size in bytes | |
| `mysql_total_size` | int | Combined request + response size | |
| `mysql_success` | bool | Response OK status | |
| `mysql_error_code` | int | MySQL error code | |
| `mysql_error_message` | string | Error description | |
**Example**: `mysql && mysql_query.contains("SELECT") && !mysql_success`
## PostgreSQL Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `postgresql` | bool | PostgreSQL payload detected | |
| `postgresql_command` | string | Command tag | `"SELECT"`, `"INSERT"`, `"UPDATE"` |
| `postgresql_query` | string | Full SQL query text | `"SELECT * FROM users WHERE id = 1"` |
| `postgresql_database` | string | Active database name | `"orders_db"` |
| `postgresql_user` | string | Authenticated user name | `"app_service"` |
| `postgresql_request_size` | int | Request payload size in bytes | |
| `postgresql_response_size` | int | Response payload size in bytes | |
| `postgresql_total_size` | int | Combined request + response size | |
| `postgresql_success` | bool | Response OK status | |
| `postgresql_error_code` | **string** | SQLSTATE error code (NOT int) | `"23505"` (unique violation), `"42P01"` (undefined table) |
| `postgresql_error_message` | string | Error description | |
**Important**: Unlike MySQL's `mysql_error_code` (int), `postgresql_error_code` is a
**string** because PostgreSQL uses 5-character SQLSTATE codes.
**Example**: `postgresql && postgresql_query.contains("SELECT") && !postgresql_success`
## gRPC Variables
gRPC is a sub-protocol of HTTP/2. When `grpc` is true, all HTTP variables are also available.
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `grpc` | bool | gRPC payload detected | |
| `grpc_method` | string | Trailing method name from gRPC :path | `"SayHello"` (from `/helloworld.Greeter/SayHello`) |
| `grpc_status` | int | gRPC status code from Grpc-Status trailer | `0`=OK, `5`=NOT_FOUND, `14`=UNAVAILABLE; `-1` on non-gRPC |
**Example**: `grpc && grpc_status != 0 && grpc_method.contains("Create")`
## L4 Connection Tracking Variables
| Variable | Type | Description | Example |
@@ -320,6 +395,15 @@ even when only service-level resolution exists.
**Example**: `src.service.name == "api-gateway" && dst.pod.namespace == "production"`
### Summary Name and Namespace
| Variable | Type | Description |
|----------|------|-------------|
| `src.name` | string | Worker-enriched summary name of source (pod > service > dns > process) |
| `dst.name` | string | Worker-enriched summary name of destination |
| `src.namespace` | string | Source namespace with service fallback |
| `dst.namespace` | string | Destination namespace with service fallback |
### Aggregate Collections (Non-Directional)
| Variable | Type | Description |

@@ -29,6 +29,31 @@ Unlike real-time monitoring, retrospective analysis lets you go back in time:
reconstruct what happened, compare against known-good baselines, and pinpoint
root causes with full L4/L7 visibility.
## Timezone Handling
All timestamps presented to the user **must use the local timezone** of the environment
where the agent is running. Users think in local time ("this happened around 3pm"), and
UTC-only output adds friction during incident response when speed matters.
### Rules
1. **Detect the local timezone** at the start of every investigation. Use the system
clock or environment (e.g., `date +%Z` or equivalent) to determine the timezone.
2. **Present local time as the primary reference** in all output — summaries, event
correlations, time-range references, and tables.
3. **Show UTC in parentheses** for clarity, e.g., `14:03:22 IST (12:03:22 UTC)`.
4. **Convert tool responses** — Kubeshark MCP tools return timestamps in UTC. Always
convert these to local time before presenting to the user.
5. **Use local time in natural language** — when describing events, say "the spike at
3:23 PM" not "the spike at 12:23 UTC".
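As a rough illustration of rules 2 to 4, the sketch below renders a UTC timestamp as local time with UTC in parentheses. `format_local_with_utc` is a hypothetical helper, and the ISO-8601 input with a trailing `Z` is an assumption based on the KFL timestamp examples; the actual format returned by the MCP tools may differ.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional


def format_local_with_utc(utc_iso: str, local_tz: Optional[tzinfo] = None) -> str:
    """Render a UTC timestamp as 'HH:MM:SS ZONE (HH:MM:SS UTC)'.

    local_tz=None means "use the system local timezone", which is what an
    agent would normally want; passing a tzinfo makes the result reproducible.
    """
    # Older Python versions do not parse a trailing 'Z', so normalize it.
    utc_dt = datetime.fromisoformat(utc_iso.replace("Z", "+00:00")).astimezone(timezone.utc)
    local_dt = utc_dt.astimezone(local_tz)
    return f"{local_dt:%H:%M:%S} {local_dt:%Z} ({utc_dt:%H:%M:%S} UTC)"


# Fixed UTC+2 zone labelled "IST", used only to make the example deterministic.
example_tz = timezone(timedelta(hours=2), "IST")
print(format_local_with_utc("2026-03-14T16:12:34Z", example_tz))
# prints "18:12:34 IST (16:12:34 UTC)"
```

Calling the function without `local_tz` relies on the system clock configuration, which corresponds to rule 1.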
### Snapshot Creation
When creating snapshots, Kubeshark MCP tools accept UTC timestamps. Convert the user's
local time references to UTC before passing them to tools like `create_snapshot` or
`export_snapshot_pcap`. Confirm the converted window with the user if there's any
ambiguity.
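The reverse conversion can be sketched the same way. The helper below turns a user's local time window into UTC strings; `local_window_to_utc` is hypothetical, and both the `YYYY-MM-DD HH:MM` input format and the ISO-8601 `Z` output format are assumptions, since the exact format the snapshot tools expect is not stated here.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Tuple


def local_window_to_utc(start_local: str, end_local: str, local_tz: tzinfo) -> Tuple[str, str]:
    """Convert a local 'YYYY-MM-DD HH:MM' window into UTC ISO-8601 strings."""

    def to_utc(text: str) -> str:
        naive = datetime.strptime(text, "%Y-%m-%d %H:%M")
        aware = naive.replace(tzinfo=local_tz)  # interpret as local wall-clock time
        return aware.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    start, end = to_utc(start_local), to_utc(end_local)
    # Same-format ISO strings sort chronologically, so string comparison is safe.
    if start >= end:
        raise ValueError("window start must precede window end")
    return start, end


# "Around 7:20 PM local" in a UTC+2 zone becomes 17:20 UTC.
print(local_window_to_utc("2026-03-14 19:20", "2026-03-14 19:30", timezone(timedelta(hours=2))))
```

Echoing the converted window back to the user before creating the snapshot covers the ambiguity case mentioned above.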
## Prerequisites
Before starting any analysis, verify the environment is ready.
@@ -84,10 +109,17 @@ Every investigation starts with a snapshot. After that, you choose one of two
investigation routes depending on your goal:
1. **Determine time window** — When did the issue occur? Use `get_data_boundaries`
to see what raw capture data is available.
2. **Create or locate a snapshot** — Either take a new snapshot covering the
to see what raw capture data (L4) is available.
2. **Check the L7 (dissected) window** — Before any KFL query on *live* data,
call `get_l7_data_boundaries`. It returns the per-node + cluster-wide range
of dissected API call data plus a `dissection_enabled` flag. Treat L4
(`get_data_boundaries`) as the snapshot/PCAP window and L7
(`get_l7_data_boundaries`) as the KFL-query window — they can differ
significantly because L7 only starts producing entries once dissection is
enabled (existing raw capture is **not** retroactively dissected).
3. **Create or locate a snapshot** — Either take a new snapshot covering the
incident window, or find an existing one with `list_snapshots`.
3. **Choose your investigation route** — PCAP or Dissection (see below).
4. **Choose your investigation route** — PCAP or Dissection (see below).
### Choosing the Right Route
@@ -103,6 +135,11 @@ Both routes are valid and complementary. Use PCAP when you need raw packets
for human analysis or compliance. Use Dissection when you want an AI agent
to search and analyze traffic programmatically.
**Default to Dissection.** Unless the user explicitly asks for a PCAP file or
Wireshark export, assume Dissection is needed. Any question about workloads,
APIs, services, pods, error rates, latency, or traffic patterns requires
dissected data.
## Snapshot Operations
Both routes start here. A snapshot is an immutable freeze of all cluster traffic
@@ -116,24 +153,52 @@ Check what raw capture data exists across the cluster. You can only create
snapshots within these boundaries — data outside the window has been rotated
out of the FIFO buffer.
**Example response**:
**Example response** (raw tool output is in UTC — convert to local time before presenting):
```
Cluster-wide:
Oldest: 2026-03-14 16:12:34 UTC
Newest: 2026-03-14 18:05:20 UTC
Oldest: 2026-03-14 18:12:34 IST (16:12:34 UTC)
Newest: 2026-03-14 20:05:20 IST (18:05:20 UTC)
Per node:
┌─────────────────────────────┬────────────────────┐
│ Node │ Oldest │ Newest
├─────────────────────────────┼────────────────────┤
│ ip-10-0-25-170.ec2.internal │ 16:12:34 │ 18:03:39 │
│ ip-10-0-32-115.ec2.internal │ 16:13:45 │ 18:05:20 │
└─────────────────────────────┴────────────────────┘
┌─────────────────────────────┬───────────────────────────────┬───────────────────────────────┐
│ Node                        │ Oldest                        │ Newest                        │
├─────────────────────────────┼───────────────────────────────┼───────────────────────────────┤
│ ip-10-0-25-170.ec2.internal │ 18:12:34 IST (16:12:34 UTC)   │ 20:03:39 IST (18:03:39 UTC)   │
│ ip-10-0-32-115.ec2.internal │ 18:13:45 IST (16:13:45 UTC)   │ 20:05:20 IST (18:05:20 UTC)   │
└─────────────────────────────┴───────────────────────────────┴───────────────────────────────┘
```
If the incident falls outside the available window, the data has been rotated
out. Suggest increasing `storageSize` for future coverage.
### Check L7 (Dissected) Data Boundaries
**Tool**: `get_l7_data_boundaries`
Check what *dissected* L7 entries exist across the cluster. This is the
pre-flight check before any KFL query against live data. The response
contains:
- `dissection_enabled`: if `false`, KFL queries on live data will return
empty regardless of L4 boundaries. Enabling dissection only captures
*forward* — raw capture is **not** retroactively dissected.
- `cluster.oldest_ts` / `cluster.newest_ts`: cluster-wide window where KFL
on live data has any chance of returning results.
- `nodes[].oldest_ts` / `nodes[].newest_ts`: per-node windows for narrowing
queries.
**Key distinction:**
| | L4 (`get_data_boundaries`) | L7 (`get_l7_data_boundaries`) |
|---|---|---|
| Data | Raw PCAP capture | Dissected API call entries |
| Useful for | Snapshots, PCAP extraction | KFL queries |
| Backfill | Comes from FIFO ring buffer | Only forward from dissection-enable |
If the user is asking an API-level question and `dissection_enabled` is
`false`, enable it first — but tell the user they will only see entries
captured *after* enabling, never the historical window.
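The pre-flight decision can be sketched as a small pure function. This assumes the `get_l7_data_boundaries` response has already been parsed into a dict with the field names listed above, and that timestamps are numbers on a common scale (for example epoch seconds); both are assumptions, and `kfl_window_check` is a hypothetical helper.

```python
def kfl_window_check(l7: dict, want_start: float, want_end: float) -> str:
    """Classify whether a KFL query on live data can cover a time window.

    Returns one of: dissection_disabled, no_l7_data, outside_l7_window,
    partially_covered, covered.
    """
    if not l7.get("dissection_enabled", False):
        # KFL on live data returns nothing until dissection is enabled.
        return "dissection_disabled"
    cluster = l7.get("cluster") or {}
    oldest, newest = cluster.get("oldest_ts"), cluster.get("newest_ts")
    if oldest is None or newest is None:
        return "no_l7_data"
    if want_end < oldest or want_start > newest:
        return "outside_l7_window"
    if want_start < oldest or want_end > newest:
        return "partially_covered"
    return "covered"


response = {"dissection_enabled": True, "cluster": {"oldest_ts": 100, "newest_ts": 200}}
print(kfl_window_check(response, 120, 180))  # prints "covered"
```

An `outside_l7_window` or `partially_covered` result is the cue to fall back to a snapshot, since the L4 window may still contain the raw capture.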
### Create a Snapshot
**Tool**: `create_snapshot`
@@ -191,18 +256,48 @@ When you know the workload names but not their IPs, resolve them from the
snapshot's metadata. Snapshots preserve pod-to-IP mappings from capture time,
so resolution is accurate even if pods have been rescheduled since.
**Tool**: `resolve_workload`
**Tool**: `list_workloads`
**Example workflow** — extract PCAP for specific workloads:
Use `list_workloads` with `name` + `namespace` for a singular lookup (works
live and against snapshots), or with `snapshot_id` + filters for a broader
scan.
1. Resolve IPs: `resolve_workload` for `orders-594487879c-7ddxf` → `10.0.53.101`
2. Resolve IPs: `resolve_workload` for `payment-service-6b8f9d-x2k4p` → `10.0.53.205`
**Example workflow — singular lookup** — extract PCAP for specific workloads:
1. Resolve IPs: `list_workloads` with `name: "orders-594487879c-7ddxf"`, `namespace: "prod"` → IPs: `["10.0.53.101"]`
2. Resolve IPs: `list_workloads` with `name: "payment-service-6b8f9d-x2k4p"`, `namespace: "prod"` → IPs: `["10.0.53.205"]`
3. Build BPF: `host 10.0.53.101 or host 10.0.53.205`
4. Export: `export_snapshot_pcap` with that BPF filter
**Example workflow — filtered scan** — extract PCAP for all workloads
matching a pattern in a snapshot:
1. List workloads: `list_workloads` with `snapshot_id`, `namespaces: ["prod"]`,
`name_regex: "payment.*"` → returns all matching workloads with their IPs
2. Collect all IPs from the response
3. Build BPF: `host 10.0.53.205 or host 10.0.53.210 or ...`
4. Export: `export_snapshot_pcap` with that BPF filter
This gives you a cluster-wide PCAP filtered to exactly the workloads involved
in the incident — ready for Wireshark or long-term storage.
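The "Build BPF" step in both workflows amounts to joining the collected IPs into `host` terms. A minimal sketch, assuming the IPs have already been gathered from the `list_workloads` responses; `build_host_bpf` is a hypothetical helper, not a Kubeshark function.

```python
import ipaddress
from typing import Iterable


def build_host_bpf(ips: Iterable[str]) -> str:
    """Join IP addresses into a BPF expression of the form 'host A or host B'."""
    unique = []
    for ip in ips:
        # ip_address() raises ValueError on malformed input, so a bad value
        # fails here instead of producing an invalid capture filter.
        normalized = str(ipaddress.ip_address(ip))
        if normalized not in unique:
            unique.append(normalized)
    if not unique:
        raise ValueError("at least one IP address is required")
    return " or ".join(f"host {ip}" for ip in unique)


print(build_host_bpf(["10.0.53.101", "10.0.53.205"]))
# prints "host 10.0.53.101 or host 10.0.53.205"
```

Deduplicating keeps the filter short when several workloads in a filtered scan report the same IP.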
### IP-to-Workload Resolution
When you have an IP address (e.g., from a PCAP or L4 flow) and need to
identify the workload behind it:
**Tool**: `list_ips`
Use `list_ips` with `ip` for a singular lookup (works live and against
snapshots), or with `snapshot_id` + filters for a broader scan.
**Example — singular lookup**: `list_ips` with `ip: "10.0.53.101"`,
`snapshot_id: "snap-abc"` → returns pod/service identity for that IP.
**Example — filtered scan**: `list_ips` with `snapshot_id: "snap-abc"`,
`namespaces: ["prod"]`, `labels: {"app": "payment"}` → returns all IPs
associated with workloads matching those filters.
---
## Route 2: Dissection
@@ -232,7 +327,30 @@ KFL field names differ from what you might expect (e.g., `status_code` not
`response.status`, `src.pod.namespace` not `src.namespace`). Using incorrect
fields produces wrong results without warning.
### Activate Dissection
### Dissection Is Required — Do Not Skip This
**Any question about workloads, Kubernetes resources, services, pods, namespaces,
or API calls requires dissection.** Only the PCAP route works without it. If the
user asks anything about traffic content, API behavior, error rates, latency,
or service-to-service communication, you **must** ensure dissection is active
before attempting to answer.
**Do not wait for dissection to complete on its own — it will not start by itself.**
Follow this sequence every time before using `list_api_calls`, `get_api_call`,
or `get_api_stats`:
1. **Check status**: Call `get_snapshot_dissection_status` (or `list_snapshot_dissections`)
to see if a dissection already exists for this snapshot.
2. **If dissection exists and is completed** — proceed with your query. No further
action needed.
3. **If dissection is in progress** — wait for it to complete, then proceed.
4. **If no dissection exists** — you **must** call `start_snapshot_dissection` to
trigger it. Then monitor progress with `get_snapshot_dissection_status` until
it completes.
Never assume dissection is running. Never wait for a dissection that was not started.
The agent is responsible for triggering dissection when it is missing.
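The sequence above reduces to a three-way decision on the dissection status. The sketch below encodes it as a pure function; the status strings `"completed"` and `"in_progress"` are assumed for illustration, since the actual values returned by `get_snapshot_dissection_status` are not shown here, and `next_dissection_action` is a hypothetical name.

```python
from typing import Optional


def next_dissection_action(status: Optional[str]) -> str:
    """Map a snapshot's dissection status to the agent's next step.

    status is None when no dissection exists for the snapshot.
    """
    if status is None:
        # Nothing will start on its own: the agent must trigger dissection.
        return "start_snapshot_dissection"
    if status == "completed":
        return "query"
    if status == "in_progress":
        return "wait"
    raise ValueError(f"unrecognized dissection status: {status!r}")


for state in (None, "in_progress", "completed"):
    print(state, "->", next_dissection_action(state))
```

Raising on an unknown status, rather than defaulting to `wait`, avoids the failure mode described above of waiting for a dissection that was never started.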
**Tool**: `start_snapshot_dissection`
@@ -243,6 +361,27 @@ become available:
- `get_api_call` — Drill into a specific call (headers, body, timing, payload)
- `get_api_stats` — Aggregated statistics (throughput, error rates, latency)
### Every Question Is a Query
**Every user prompt that involves APIs, workloads, services, pods, namespaces,
or Kubernetes semantics should translate into a `list_api_calls` call with an
appropriate KFL filter.** Do not answer from memory or prior results — always
run a fresh query that matches what the user is asking.
Examples of user prompts and the queries they should trigger:
| User says | Action |
|---|---|
| "Show me all 500 errors" | `list_api_calls` with KFL: `http && status_code == 500` |
| "What's hitting the payment service?" | `list_api_calls` with KFL: `dst.service.name == "payment-service"` |
| "Any DNS failures?" | `list_api_calls` with KFL: `dns && status_code != 0` |
| "Show traffic from namespace prod to staging" | `list_api_calls` with KFL: `src.pod.namespace == "prod" && dst.pod.namespace == "staging"` |
| "What are the slowest API calls?" | `list_api_calls` with KFL: `http && elapsed_time > 5000000` |
The user's natural language maps to KFL. Your job is to translate intent into
the right filter and run the query — don't summarize old results or speculate
without fresh data.
### Investigation Strategy
Start broad, then narrow:
@@ -255,16 +394,17 @@ Start broad, then narrow:
full payload to understand what went wrong.
4. Use KFL filters to slice by namespace, service, protocol, or any combination.
**Example `list_api_calls` response** (filtered to `http && status_code >= 500`):
**Example `list_api_calls` response** (filtered to `http && status_code >= 500`,
timestamps converted from UTC to local):
```
┌──────────────────────┬────────┬──────────────────────────┬────────┬───────────┐
│ Timestamp │ Method │ URL │ Status │ Elapsed │
├──────────────────────┼────────┼──────────────────────────┼────────┼───────────┤
│ 2026-03-14 17:23:45 │ POST │ /api/v1/orders/charge │ 503 │ 12,340 ms │
│ 2026-03-14 17:23:46 │ POST │ /api/v1/orders/charge │ 503 │ 11,890 ms │
│ 2026-03-14 17:23:48 │ GET │ /api/v1/inventory/check │ 500 │ 8,210 ms │
│ 2026-03-14 17:24:01 │ POST │ /api/v1/payments/process │ 502 │ 30,000 ms │
└──────────────────────┴────────┴──────────────────────────┴────────┴───────────┘
┌──────────────────────────────────────────┬────────┬──────────────────────────┬────────┬───────────┐
│ Timestamp │ Method │ URL │ Status │ Elapsed │
├──────────────────────────────────────────┼────────┼──────────────────────────┼────────┼───────────┤
│ 2026-03-14 19:23:45 IST (17:23:45 UTC) │ POST │ /api/v1/orders/charge │ 503 │ 12,340 ms │
│ 2026-03-14 19:23:46 IST (17:23:46 UTC) │ POST │ /api/v1/orders/charge │ 503 │ 11,890 ms │
│ 2026-03-14 19:23:48 IST (17:23:48 UTC) │ GET │ /api/v1/inventory/check │ 500 │ 8,210 ms │
│ 2026-03-14 19:24:01 IST (17:24:01 UTC) │ POST │ /api/v1/payments/process │ 502 │ 30,000 ms │
└──────────────────────────────────────────┴────────┴──────────────────────────┴────────┴───────────┘
Src: api-gateway (prod) → Dst: payment-service (prod)
```
@@ -305,8 +445,9 @@ conn && conn_state == "open" && conn_local_bytes > 1000000 // High-volume conne
The two routes are complementary. A common pattern:
1. Start with **Dissection** — let the AI agent search and identify the root cause
2. Once you've pinpointed the problematic workloads, use `resolve_workload`
to get their IPs
2. Once you've pinpointed the problematic workloads, use `list_workloads`
to get their IPs (singular lookup by name+namespace, or filtered scan
by namespace/regex/labels against the snapshot)
3. Switch to **PCAP** — export a filtered PCAP of just those workloads for
Wireshark deep-dive, sharing with the network team, or compliance archival
@@ -315,11 +456,16 @@ The two routes are complementary. A common pattern:
### Post-Incident RCA
1. Identify the incident time window from alerts, logs, or user reports
2. Check `get_data_boundaries` — is the window still in raw capture?
3. `create_snapshot` covering the incident window (add 15 minutes buffer)
4. **Dissection route**: `start_snapshot_dissection` → `get_api_stats` →
2. Check `get_data_boundaries` — is the window still in raw capture (L4)?
3. Check `get_l7_data_boundaries` — was dissection enabled at that time, and
does the window overlap with the L7 entry range? If `dissection_enabled`
is `false` or the window predates the L7 range, the Dissection route is
limited to whatever entries exist now — falling back to the PCAP route
is often the right call.
4. `create_snapshot` covering the incident window (add 15 minutes buffer)
5. **Dissection route**: `start_snapshot_dissection` → `get_api_stats` →
`list_api_calls` → `get_api_call` → follow the dependency chain
5. **PCAP route**: `resolve_workload` → `export_snapshot_pcap` with BPF →
6. **PCAP route**: `list_workloads` → `export_snapshot_pcap` with BPF →
hand off to Wireshark or archive
### Other Use Cases

@@ -0,0 +1,26 @@
# Security Audit Skill
A Kubeshark MCP skill that teaches AI agents to perform systematic Kubernetes
network security audits using the MITRE ATT&CK framework. It examines DNS
queries, HTTP requests, L4 flows, and protocol-level payloads to detect
compromised workloads, C2 communication, data exfiltration, cryptomining,
lateral movement, and credential theft.
See [SKILL.md](SKILL.md) for the full methodology.
## Demo
The demo below shows a real security audit session against a compromised
`k8s-mule` namespace containing 21 workloads, 6 of which were actively
compromised with C2, cryptomining, secret theft, S3 exfiltration, port
scanning, and Redis reconnaissance.
### Claude Code Session
<!-- TODO: replace with animated GIF once recorded -->
![Security Audit Demo](https://raw.githubusercontent.com/kubeshark/assets/master/png/security-audit-demo.gif)
### Sample Audit Report
<!-- TODO: replace with animated GIF once recorded -->
![Security Audit Report](https://raw.githubusercontent.com/kubeshark/assets/master/png/security-audit-report.gif)

@@ -0,0 +1,724 @@
---
name: security-audit
description: >
Kubernetes network security audit skill powered by Kubeshark MCP. Use this skill
whenever the user wants to audit a cluster for security threats, detect compromised
workloads, find malicious traffic patterns, hunt for indicators of compromise (IOCs),
check for data exfiltration, identify C2 (command and control) communication,
detect cryptomining, find lateral movement, discover credential theft attempts,
assess network security posture, or perform threat hunting in Kubernetes.
Also trigger when the user mentions security audit, threat detection, compromise
assessment, vulnerability scan, "is my cluster compromised", "find malicious traffic",
"check for threats", DNS exfiltration, DNS tunneling, port scanning, IMDS access,
reverse shell, crypto miner, MITRE ATT&CK, IOC detection, anomaly detection,
suspicious traffic, rogue workloads, unauthorized access, or any request to
evaluate cluster security through network traffic analysis.
---
# Kubernetes Network Security Audit with Kubeshark MCP
You are a Kubernetes network security specialist. Your job is to systematically
audit cluster traffic for indicators of compromise, malicious behavior, and
security threats — using network traffic as the ground truth.
Network traffic cannot lie. Logs can be tampered with, metrics can be spoofed,
but packets on the wire reveal what workloads actually do — what they connect to,
what protocols they speak, what data they send. Your audit leverages this by
examining DNS queries, HTTP requests, L4 flows, and protocol-level payloads
across every dimension of the MITRE ATT&CK framework.
## Prerequisites
Before starting any audit, verify the environment is ready.
**Tool**: `check_kubeshark_status`
Confirm Kubeshark is deployed and tools are available. You need at minimum:
`list_api_calls`, `list_l4_flows`, `list_workloads`, `get_api_call`.
**KFL requirement**: This skill uses KFL filters for all queries. Before
constructing any filter, load the KFL skill (`skills/kfl/`). KFL is statically
typed — incorrect field names will fail silently. If the KFL skill is not
loaded, only use the exact filter examples shown in this skill.
**KFL error resilience**: If a KFL filter returns `undeclared reference` or
similar errors, **do not give up on that phase**. Fall back to:
1. Port-based filtering: `dst.port == 5432` instead of protocol flags
2. Name-based filtering: `dst.name.contains("db")` or `src.name.contains("pod-name")`
3. Browsing entries with `get_api_call` on IDs from `list_l4_flows`
A KFL error means the filter syntax is wrong, not that the data doesn't exist.
## Audit Methodology
A security audit is NOT an incident investigation. You are not responding to
a known event — you are proactively searching for threats that may be hiding
in normal traffic. This requires a systematic sweep across all threat categories,
not a single focused query.
The audit has **two sections** that run in sequence:
```
SECTION A: Real-Time Analysis → Instant, uses live dissected traffic
SECTION B: Snapshot Deep Dive → Immutable evidence, protocol-level inspection
```
### Why Two Sections?
Kubeshark has two modes of data access:
1. **Real-time dissection** — traffic is dissected as it flows through the
cluster. Provides instant access to L7 data (DNS, HTTP, etc.) that is
already captured and indexed. However, real-time dissection is resource-
intensive and may not be enabled, or may have gaps in coverage.
2. **Snapshots** — immutable captures of raw traffic within a time window.
Must be created explicitly, then dissected separately. Guarantees complete
coverage of all packets in the window, but takes time to create and index.
Section A uses whatever is already available — fast, immediate, but possibly
incomplete. Section B creates snapshots for thorough, evidence-grade analysis.
### Severity Classification
Classify every finding using this framework:
| Severity | Criteria | Examples |
|----------|----------|---------|
| **CRITICAL** | Active data exfiltration, credential theft in progress, confirmed C2 | DNS tunneling, IMDS credential harvest, mining pool connections |
| **HIGH** | Reconnaissance with cluster-wide scope, confirmed unauthorized access | K8s API secret enumeration, port scanning, cluster-admin abuse |
| **MEDIUM** | Suspicious patterns requiring investigation, limited-scope recon | Cross-namespace probes, outdated User-Agents, unusual external connections |
| **LOW** | Anomalies that may be benign, single-instance events | Unknown workloads, new external destinations, noisy but not malicious |
### Timezone
Kubeshark returns timestamps in UTC. Always convert to local time before
presenting to the user. Detect the local timezone at the start (e.g.,
`date +%Z`). Present local time as primary, with UTC in parentheses:
`15:03:22 IST (12:03:22 UTC)`.
**Conversion**: Kubeshark timestamps are Unix milliseconds. To convert:
`ms / 1000` → Unix seconds → datetime → format with timezone offset.
Example: `1778534735974` → `2026-05-11 14:25:35 PDT (21:25:35 UTC)`.
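As an illustration of the conversion rule above, here is a minimal Python sketch. The function name `format_kubeshark_ts` and the fixed-offset `PDT` timezone object are assumptions made for this example; in practice the local timezone would be detected at the start of the audit.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: a fixed UTC-7 offset labelled "PDT".
PDT = timezone(timedelta(hours=-7), "PDT")


def format_kubeshark_ts(ms: int, local_tz: timezone) -> str:
    """Render a Unix-millisecond timestamp as 'local time TZ (UTC time UTC)'."""
    # ms // 1000 drops the millisecond part and yields Unix seconds.
    utc_dt = datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
    local_dt = utc_dt.astimezone(local_tz)
    return (
        f"{local_dt:%Y-%m-%d %H:%M:%S} {local_dt.tzname()} "
        f"({utc_dt:%H:%M:%S} UTC)"
    )


if __name__ == "__main__":
    print(format_kubeshark_ts(1778534735974, PDT))
```

The local time is placed first and UTC in parentheses, matching the presentation format described above.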
---
## SECTION A: Real-Time Analysis
**Goal**: Fast initial sweep using live data that's already available. No
waiting for snapshot creation or dissection.
### Step 1: Check What's Available
**Tool**: `check_kubeshark_status`
Confirm Kubeshark is running and which tools are available.
**Tool**: `get_data_boundaries`
Check how far back raw capture data exists. You need this to plan snapshot
creation in Step 3 — call it now so the data is ready when you need it.
**Tool**: `list_workloads` (no snapshot_id — queries live state)
Get the current workload inventory for the target namespace. This returns
pod names, namespaces, and IP addresses. Save the IPs — you'll need them
throughout the audit.
**Note**: `list_workloads` without a `snapshot_id` may fail with some
Kubeshark versions (`snapshot_id is required for filtered listing`). If
this happens, use individual lookups with `name` + `namespace` parameters,
or skip to Step 3 and get the workload inventory from the first snapshot.
### Step 2: Query Live Traffic
In parallel, query the real-time dissected traffic across key dimensions.
Use `list_api_calls` and `list_l4_flows` **without** a `snapshot_id` to
hit the live data.
Run these queries simultaneously:
| Query | KFL Filter | What You're Looking For |
|-------|-----------|------------------------|
| DNS traffic | `dns` | Mining domains, high-entropy subdomains, external resolution, NXDOMAIN flood |
| HTTP traffic | `http` | C2 beaconing, suspicious URLs, external destinations, anomalous headers |
| L4 flows | (via `list_l4_flows`) | External IPs, suspicious ports (3333, 4444), IMDS (169.254.169.254), fan-out patterns |
| PostgreSQL | `postgresql` | SQL injection patterns, sensitive table access |
| Redis | `redis` | Dangerous commands (CONFIG, KEYS, CLIENT LIST) |
Filter by namespace if the user specified one (e.g., `dns && src.pod.namespace == "k8s-mule"`).
**Important**: Real-time dissection may have incomplete data — traffic that
arrived before dissection was enabled, or during gaps in coverage, won't
appear. Treat Section A findings as a fast first pass, not the final word.
### Step 3: Create Snapshots (Sequential — One at a Time)
While analyzing real-time data, begin creating snapshots for Section B.
**CRITICAL: Create snapshots ONE AT A TIME, sequentially.** Kubeshark only
supports one concurrent snapshot download. Parallel creation will cause
failures and data loss. The pattern is:
1. Create snapshot → wait for completion → start dissection → move to next
2. Snapshot creation is fast (seconds). Dissection is slow (minutes).
3. You do NOT need to wait for dissection before creating the next snapshot.
Create the next snapshot while the previous one dissects.
Use the data boundaries from Step 1 (`get_data_boundaries`) to calculate
how many snapshots are needed:
```
total_range_ms = newest_timestamp - oldest_timestamp
window_ms = 240000 # 4 minutes
num_snapshots = ceil(total_range_ms / window_ms)
```
Then create snapshots in **4-minute increments**, starting from the most
recent:
```
Step 1: create_snapshot (now - 4min → now)
→ poll get_snapshot until status == "completed"
→ start_snapshot_dissection
Step 2: create_snapshot (now - 8min → now - 4min)
→ poll get_snapshot until status == "completed"
→ start_snapshot_dissection
Step 3: create_snapshot (now - 12min → now - 8min)
→ poll get_snapshot until status == "completed"
→ start_snapshot_dissection
```
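The window arithmetic and the newest-first ordering can be sketched in Python as follows. This is only an illustration: `plan_windows` is a hypothetical helper, and the boundary values are assumed to be Unix milliseconds taken from `get_data_boundaries`.

```python
import math

WINDOW_MS = 240_000  # 4 minutes, as in the formula above


def plan_windows(oldest_ms: int, newest_ms: int, window_ms: int = WINDOW_MS):
    """Return (start_ms, end_ms) pairs, most recent window first."""
    total_range_ms = newest_ms - oldest_ms
    if total_range_ms <= 0:
        return []
    num_snapshots = math.ceil(total_range_ms / window_ms)
    windows = []
    end = newest_ms
    for _ in range(num_snapshots):
        # Clamp the last (oldest) window to the start of available data.
        start = max(oldest_ms, end - window_ms)
        windows.append((start, end))
        end = start
    return windows
```

Each returned pair would feed one `create_snapshot` call, processed strictly one at a time.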
**Polling pattern**: After `create_snapshot`, call `get_snapshot` with the
returned snapshot ID to check status. Repeat until `status == "completed"`.
After `start_snapshot_dissection`, call `get_snapshot_dissection_status`
and check until `progress == 100`.
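A generic polling loop for this pattern might look like the sketch below. The `check` callable stands in for whatever wraps `get_snapshot` or `get_snapshot_dissection_status`; the helper name, interval, and timeout are assumptions for this example, not Kubeshark defaults.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() repeatedly until it returns True or the timeout elapses."""
    waited = 0.0
    while True:
        if check():
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

For a snapshot, `check` would return `True` once `status == "completed"`; for dissection, once `progress == 100`. Injecting `sleep` keeps the loop testable without real delays.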
4-minute windows balance snapshot size (fast to create and dissect) against
coverage (captures threats with sleep cycles up to ~3 minutes). Most attack
patterns in the wild repeat within 30-120 seconds.
**Do not skip this step.** A single short snapshot will miss threats with
longer sleep cycles. The 4-minute windows ensure full coverage.
**Note**: Small snapshots (under ~15 minutes of traffic) often dissect in
seconds rather than minutes. If dissection completes quickly, you can
collapse the phased approach (immediate data first, L7 after) into a
single pass through all phases.
### Step 4: Present Intermediate Results
Present Section A findings to the user as **intermediate results** — clearly
labeled as preliminary:
```
## Intermediate Results (Real-Time Analysis)
⚠️ These findings are based on live dissected traffic, which may have
gaps in coverage. Snapshot analysis is in progress and will provide
the complete, evidence-grade audit.
[findings table and details]
Snapshots are being created and dissected. Full report to follow.
```
This gives the user immediate value while snapshots process. But be explicit:
**the audit is not complete until Section B finishes.**
---
## SECTION B: Snapshot Deep Dive
**Goal**: Systematic, thorough analysis against immutable snapshot data.
This is the evidence-grade section — complete coverage, reproducible results.
**The audit is NOT done until this section completes.** Snapshots must be
created, dissected, and analyzed at L7 before the final report is generated.
Section A may miss traffic that wasn't being dissected in real-time — Section B
captures everything in the raw PCAP buffer, including traffic that real-time
dissection dropped or never saw. Do not skip this section or treat Section A
results as the final word.
### What a Snapshot Gives You
A completed snapshot provides **four independent data sources**. Do not
wait for dissection to use the first three:
| Source | Available | Tool | What It Provides |
|--------|-----------|------|-----------------|
| **Workloads & IPs** | Immediately | `list_workloads` with `snapshot_id` | Pod names, namespaces, IPs at capture time |
| **L4 Flows** | Immediately | `list_l4_flows` with `snapshot_id` | TCP/UDP connections: src/dst IPs, ports, bytes, duration |
| **PCAP Export** | Immediately | `export_snapshot_pcap` | Raw packets filtered by BPF expression |
| **L7 Dissection** | After indexing | `list_api_calls`, `get_api_call`, `get_api_stats` | DNS queries, HTTP requests, SQL statements, Redis commands, gRPC methods |
### Audit Flow Per Snapshot
For each 4-minute snapshot, run the full 7-phase sweep. Start with immediate
data while dissection completes:
```
Snapshot ready
├── Start dissection (background)
├── Phase 1: list_workloads (immediate) — workload inventory + IPs
│ export_snapshot_pcap (immediate) — raw packet evidence
├── Phase 3: list_l4_flows (immediate) — external flows, port scanning
├── Phase 4: list_l4_flows (immediate) — lateral movement, fan-out
├── [dissection completes]
├── Phase 2: list_api_calls — DNS threat analysis
├── Phase 5: list_api_calls — protocol abuse (PG, Redis, gRPC)
├── Phase 6: list_api_calls — credential access (IMDS, cloud APIs)
└── Phase 7: correlate all findings
```
Process snapshots in reverse chronological order (most recent first). If the
first snapshot reveals enough threats, you may not need to analyze all of them.
### PCAP for Deep Inspection
PCAP export happens in Phase 1b (immediately after snapshot creation). In
later phases, if a new finding needs deeper packet-level analysis beyond
what `list_api_calls` provides, export additional PCAPs using the workload
IPs collected in Phase 1a:
```
export_snapshot_pcap(snapshot_id, bpf_filter="host <workload_ip>")
```
### Merging Findings Across Snapshots
Threats that appear in multiple snapshots are confirmed persistent. One-time
events in a single snapshot may be transient. Note which findings repeat
across snapshots — persistence is a strong signal of real compromise vs.
a single anomalous event.
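One way to track persistence is to count how many snapshots each finding appears in. The sketch below assumes findings are represented as hashable keys (for example a `"pod:threat"` string) grouped by snapshot ID; both the representation and the helper name are hypothetical.

```python
from collections import Counter


def classify_persistence(findings_by_snapshot: dict) -> dict:
    """Label each finding 'persistent' (seen in >1 snapshot) or 'transient'."""
    counts = Counter()
    for findings in findings_by_snapshot.values():
        # set() so a finding repeated within one snapshot counts once.
        counts.update(set(findings))
    return {
        finding: ("persistent" if n > 1 else "transient")
        for finding, n in counts.items()
    }
```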
---
## Phase 1: Workload Inventory & PCAP Evidence
**Goal**: Identify all active workloads, collect their IPs, and export raw
PCAP evidence — all before dissection completes.
**Data source**: Immediate (no dissection needed).
### 1a: Workload Inventory
**Tool**: `list_workloads` with `snapshot_id`
Query with the target namespace (or all namespaces). The response includes
pod names, namespaces, and **IP addresses at capture time** — these IPs are
critical for building BPF filters in later phases and for correlating L4
flows to workload identities.
For each workload, note:
- Pod name and namespace
- IP address (save these — you'll need them for PCAP export and L4 analysis)
- Whether it's expected (matches known deployments)
**What to flag**:
- Workloads not matching any known Deployment/DaemonSet/StatefulSet
- Pods with names that mimic system components (e.g., `kube-proxy-debug`)
- Unexpected number of replicas or pods in the namespace
### 1b: PCAP Export (Immediate — No Dissection Needed)
**Tool**: `export_snapshot_pcap` with `snapshot_id`
PCAP export is available immediately after snapshot creation — it reads raw
packets, not dissected data. Use it now to preserve evidence and get raw
packet-level visibility before L7 dissection completes.
**Export PCAP for every CRITICAL finding** from Section A's real-time analysis.
Use the workload IPs from 1a to build BPF filters:
```
export_snapshot_pcap(snapshot_id, bpf_filter="host <workload_ip>")
```
This is especially useful for:
- Verifying encrypted C2 (TLS ClientHello SNI inspection)
- Confirming Stratum mining protocol content
- Extracting DNS tunnel payloads at packet level
- Preserving forensic evidence before cluster changes
If Section A identified no CRITICAL findings yet, export a broad PCAP for
the most suspicious workloads based on L4 flow analysis (Phase 3).
---
## Phase 2: DNS Threat Analysis
**Goal**: DNS is the single most reliable indicator of compromise. Every attack
that communicates externally needs DNS resolution. Sweep DNS traffic for all
known threat patterns.
### 2a: External DNS (Non-Cluster Queries)
**Tool**: `list_api_calls` with KFL: `dns`
Examine all DNS queries. Flag anything that is NOT `*.cluster.local` or
`*.svc.cluster.local` — these are external resolutions that reveal what
workloads are reaching out to.
**What to flag**:
| Pattern | Threat | KFL Filter |
|---------|--------|------------|
| Mining pool domains (minexmr, nanopool, mining-pool) | Cryptojacking | `dns && dns_questions.exists(q, q.contains("minexmr"))` |
| High-entropy subdomains (base64-like, >30 chars) | DNS tunneling / exfiltration | `dns` — then inspect subdomain length and entropy |
| DGA patterns (random .com/.net with NXDOMAIN) | C2 beaconing | `dns && dns_response && size(dns_answers) == 0` |
| DoH resolver domains (cloudflare-dns.com, dns.google) | DNS bypass / C2 channel | `dns && dns_questions.exists(q, q.contains("cloudflare-dns"))` |
| Cloud API domains (sts.amazonaws.com, s3.amazonaws.com) | Stolen credential usage | `dns && dns_questions.exists(q, q.contains("amazonaws.com"))` |
| C2/attacker domains (attacker, c2, darknet, exfil) | Command & Control | `dns && dns_questions.exists(q, q.contains("c2"))` |
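For the high-entropy subdomain row, the manual inspection step can be approximated with a Shannon entropy check. This is a rough sketch: the 30-character threshold comes from the table, while the entropy cutoff of 4.0 bits per character and the function names are assumptions chosen for illustration and would need tuning against real traffic.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def is_suspicious_qname(qname: str, min_len: int = 30, min_entropy: float = 4.0) -> bool:
    """Flag a DNS name whose longest label is both long and high-entropy."""
    labels = [label for label in qname.split(".") if label]
    if not labels:
        return False
    longest = max(labels, key=len)
    return len(longest) > min_len and shannon_entropy(longest) >= min_entropy
```

Long but repetitive labels (such as padded hostnames) score low on entropy and are not flagged, which reduces false positives compared with a length check alone.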
### 2b: DNS Query Volume and Types
High query volume from a single pod is suspicious. Also check for unusual
record types:
- **TXT queries** to external domains → data exfiltration
- **NULL queries** → DNS tunneling (iodine, dnscat2)
- **AXFR queries** → zone transfer attempts (reconnaissance)
- **SRV queries** to many namespaces → service enumeration
### 2c: NXDOMAIN Ratio
A high NXDOMAIN ratio (>20% of queries) from a single source suggests DGA
beaconing — the malware tries many generated domains, most of which don't exist.
**Tool**: `list_api_calls` with KFL: `dns && dns_response && size(dns_answers) == 0`
Compare the count of failed queries to total queries per source pod.
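The per-pod comparison can be sketched as below. The record fields `src_pod` and `answer_count` are hypothetical names for data extracted from the `list_api_calls` results; an empty answer section is used as a proxy for NXDOMAIN, mirroring the KFL filter above.

```python
from collections import defaultdict


def high_failure_pods(records, threshold: float = 0.20) -> dict:
    """Return {pod: failed/total ratio} for pods above the threshold."""
    totals = defaultdict(int)
    failed = defaultdict(int)
    for rec in records:
        totals[rec["src_pod"]] += 1
        if rec["answer_count"] == 0:  # proxy for NXDOMAIN / empty response
            failed[rec["src_pod"]] += 1
    ratios = {pod: failed[pod] / totals[pod] for pod in totals}
    return {pod: ratio for pod, ratio in ratios.items() if ratio > threshold}
```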
---
## Phase 3: External Communication
**Goal**: Identify all traffic leaving the cluster. Any pod connecting to
external IPs or domains needs justification.
**Data source**: Immediate (no dissection needed). Use L4 flows first,
then enrich with L7 data from dissection when available.
### 3a: L4 External Flows
**Tool**: `list_l4_flows` with `snapshot_id`
This is available immediately — do not wait for dissection. Use the workload
IPs from Phase 1 to map flows to pod identities.
Look for flows where the destination is NOT a cluster-internal IP (not RFC 1918:
10.x.x.x, 172.16-31.x.x, 192.168.x.x). Every external flow is a potential
exfiltration or C2 channel.
**What to flag**:
| Pattern | Threat | Severity |
|---------|--------|----------|
| Destination 169.254.169.254 | IMDS metadata credential theft | CRITICAL |
| Destination port 3333, 14433, 45700 | Stratum mining protocol | CRITICAL |
| Destination port 4444, 1337 | Reverse shell / backdoor | CRITICAL |
| Persistent connections to single external IP | C2 beaconing | HIGH |
| Large outbound data volume (>1MB) to external | Data exfiltration | HIGH |
| Connections to cloud API endpoints (port 443) | Stolen credential usage | MEDIUM |
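A simple triage of L4 destinations along the lines of this table could look like the following sketch. The function name, return strings, and the choice to apply port checks only to non-RFC 1918 destinations are assumptions for illustration; clusters whose pod CIDR is not RFC 1918 would need an adjusted internal range.

```python
import ipaddress

RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
IMDS = ipaddress.ip_address("169.254.169.254")
MINING_PORTS = {3333, 14433, 45700}
SHELL_PORTS = {4444, 1337}


def classify_flow(dst_ip: str, dst_port: int):
    """Return a triage label for a flow destination, or None if internal."""
    ip = ipaddress.ip_address(dst_ip)
    if ip == IMDS:
        return "CRITICAL: IMDS access"
    if any(ip in net for net in RFC1918):
        return None  # treated as cluster-internal in this sketch
    if dst_port in MINING_PORTS:
        return "CRITICAL: possible Stratum mining"
    if dst_port in SHELL_PORTS:
        return "CRITICAL: possible reverse shell"
    return "REVIEW: external destination"
```

The IMDS check comes first because 169.254.169.254 is link-local, not RFC 1918, and would otherwise be reported as a generic external destination.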
### 3b: HTTP External Requests
**Tool**: `list_api_calls` with KFL: `http && !dst.pod.namespace.startsWith("kube")`
Inspect outbound HTTP requests for:
- **Beaconing patterns**: Regular-interval requests to the same external URL
- **Suspicious User-Agents**: `Mozilla/4.0`, `curl/`, empty, or malware-like
- **Suspicious paths**: `/check?s=`, `/beacon`, `/heartbeat`, `/proxy?coin=`
- **Base64 in headers**: Oversized Cookie or custom X-* headers with encoded data
- **gRPC to external**: `Content-Type: application/grpc` to non-cluster destinations
- **WebSocket upgrades**: `Upgrade: websocket` to external hosts (potential mining)
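Regular-interval beaconing can be checked by measuring how uniform the gaps between requests are. The sketch below uses the coefficient of variation (standard deviation divided by mean) of the inter-request gaps; the thresholds and the function name are assumptions, and real beacons with deliberate jitter would need a looser cutoff.

```python
from statistics import mean, pstdev


def looks_like_beacon(timestamps_ms, min_events: int = 5, max_cv: float = 0.1) -> bool:
    """True if requests to one destination arrive at near-constant intervals."""
    if len(timestamps_ms) < min_events:
        return False
    ts = sorted(timestamps_ms)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    avg = mean(gaps)
    if avg <= 0:
        return False
    # Low coefficient of variation means the gaps are nearly identical.
    return pstdev(gaps) / avg <= max_cv
```

The input would be the timestamps of all requests from one source pod to one external URL within a snapshot.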
---
## Phase 4: Lateral Movement
**Goal**: Identify pods communicating with services they shouldn't — crossing
namespace boundaries, probing infrastructure, or scanning the network.
**Data source**: L4 flows (immediate) for port scanning detection. L7
dissection (after indexing) for cross-namespace HTTP and API server analysis.
### 4a: Cross-Namespace Traffic
**Tool**: `list_api_calls` with KFL: `src.pod.namespace != dst.pod.namespace`
Most pods should only talk within their namespace (and to kube-system services).
Cross-namespace traffic to unexpected destinations is a lateral movement indicator.
### 4b: Kubernetes API Server Access
**Tool**: `list_api_calls` with KFL: `http && dst.port == 443 && path.startsWith("/api")`
Check what pods are querying the K8s API server and what they're requesting:
| API Path | Threat | Severity |
|----------|--------|----------|
| `/api/v1/secrets` | Secret enumeration | CRITICAL |
| `/api/v1/pods` | Workload discovery | HIGH |
| `/apis/rbac.authorization.k8s.io` | RBAC reconnaissance | HIGH |
| `/api/v1/configmaps` | Config enumeration | MEDIUM |
| `/api/v1/namespaces` | Namespace discovery | MEDIUM |
A pod hitting **multiple** of these paths is performing systematic enumeration,
not legitimate API access. Legitimate workloads typically access 1-2 specific
resources, not sweep across resource types.
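The "multiple paths" heuristic can be sketched by counting distinct sensitive path prefixes per source pod. The input shape (pairs of pod name and request path), the helper name, and the cutoff of three distinct resource types are assumptions for this example; the prefixes are the ones listed in the table above.

```python
from collections import defaultdict

SENSITIVE_PREFIXES = (
    "/api/v1/secrets",
    "/api/v1/pods",
    "/apis/rbac.authorization.k8s.io",
    "/api/v1/configmaps",
    "/api/v1/namespaces",
)


def enumeration_suspects(calls, min_distinct: int = 3) -> dict:
    """Return {pod: sorted prefixes hit} for pods sweeping several resource types."""
    hits = defaultdict(set)
    for pod, path in calls:
        for prefix in SENSITIVE_PREFIXES:
            if path.startswith(prefix):
                hits[pod].add(prefix)
    return {
        pod: sorted(prefixes)
        for pod, prefixes in hits.items()
        if len(prefixes) >= min_distinct
    }
```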
### 4c: Port Scanning Detection
**Tool**: `list_l4_flows` with `snapshot_id` (immediate — no dissection needed)
Use the workload IPs from Phase 1 to identify the source pod.
Look for a single source IP with connections to:
- Many distinct destination IPs (>10)
- Many distinct destination ports (>5)
- High connection failure rate (RST/timeout)
This is a textbook port scan pattern.
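The fan-out thresholds above can be applied to L4 flow data with a sketch like the following. Flows are assumed to be `(src_ip, dst_ip, dst_port)` tuples extracted from `list_l4_flows` output, and a source is flagged when either threshold is exceeded; both the tuple shape and the either-or rule are assumptions for illustration. Connection failure rate is not modelled here.

```python
from collections import defaultdict


def find_port_scanners(flows, max_ips: int = 10, max_ports: int = 5):
    """Return sorted source IPs that exceed the distinct-IP or distinct-port threshold."""
    dst_ips = defaultdict(set)
    dst_ports = defaultdict(set)
    for src, dst, port in flows:
        dst_ips[src].add(dst)
        dst_ports[src].add(port)
    return sorted(
        src
        for src in dst_ips
        if len(dst_ips[src]) > max_ips or len(dst_ports[src]) > max_ports
    )
```

The returned source IPs can then be mapped back to pod names using the workload inventory from Phase 1.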
### 4d: Service Fingerprinting
**Tool**: `list_api_calls` with KFL: `http && (path == "/.env" || path == "/actuator/info" || path == "/server-info" || path == "/version")`
These paths are used for service fingerprinting — mapping what software is
running on internal endpoints. A pod probing multiple services with these
paths is performing reconnaissance.
### 4e: Service Account Permission Audit via Traffic
Cross-reference Phase 4b findings (K8s API traffic) with the source pod's
actual service account to determine if permissions are excessive.
For each pod making API server calls:
1. **Identify the service account**: From the workload inventory or via
`kubectl get pod <name> -n <ns> -o jsonpath='{.spec.serviceAccountName}'`
2. **Check what it accessed**: The API paths from Phase 4b reveal what the
pod actually queried (secrets, pods, RBAC, configmaps)
3. **Compare against expected access**: A `frontend` pod should never hit
`/api/v1/secrets`. A `batch-processor` has no reason to query
`/apis/rbac.authorization.k8s.io/v1/clusterrolebindings`.
**What to flag**:
| Pattern | Threat | Severity |
|---------|--------|----------|
| Pod queries secrets but its SA only needs pod read | Over-privileged SA or stolen token | HIGH |
| Pod hits cluster-wide endpoints (`--all-namespaces` style queries) | Cluster-admin binding | CRITICAL |
| Pod's SA is `default` but makes authenticated API calls | Token mounted unnecessarily | MEDIUM |
| Multiple pods share the same over-privileged SA | Lateral blast radius | HIGH |
This converts a network finding (API traffic volume) into an actionable RBAC
recommendation — telling the user exactly which ClusterRoleBinding to revoke.
### 4f: Cross-Namespace Threat Correlation
When port scanning or lateral movement targets IPs outside the audited
namespace (e.g., IPs in the pod CIDR `10.244.x.x` that don't belong to
any workload in the target namespace), resolve them to identify the
cross-namespace blast radius:
1. Use `list_workloads` (all namespaces) to map destination IPs to pods
2. Identify which namespaces are being probed
3. Flag the scope: "port scan from `k8s-mule/network-diagnostics` is
targeting pods in `default`, `monitoring`, and `kube-system`"
This turns a single-namespace finding into a cluster-wide risk assessment.
---
## Phase 5: Protocol Abuse
**Goal**: Inspect L7 payload content for attack patterns within supported
protocols. This is the phase most often skipped — and where subtle threats hide.
### 5a: PostgreSQL Wire Protocol
**Tool**: `list_api_calls` with KFL: `postgresql`
The `postgresql_query` variable contains the full SQL text. Use it to detect:
| KFL Filter | Threat | Severity |
|------------|--------|----------|
| `postgresql && postgresql_query.contains("UNION SELECT")` | SQL injection | HIGH |
| `postgresql && postgresql_query.contains("pg_shadow")` | Password hash theft | CRITICAL |
| `postgresql && postgresql_query.contains("information_schema")` | Schema enumeration | MEDIUM |
| `postgresql && postgresql_query.contains("TRUNCATE")` | Data destruction | CRITICAL |
| `postgresql && postgresql_query.contains("DROP TABLE")` | Data destruction | CRITICAL |
| `postgresql && !postgresql_success` | Failed queries (may indicate probing) | MEDIUM |
Use `get_api_call` to inspect the full SQL content. Also check `postgresql_user`
— queries from unexpected users are suspicious.
### 5b: Redis Protocol
**Tool**: `list_api_calls` with KFL: `redis`
Use `redis_type` (command verb) and `redis_command` (full command line) to detect:
| KFL Filter | Threat | Severity |
|------------|--------|----------|
| `redis && redis_type == "CONFIG"` | Server config dump/write | HIGH |
| `redis && redis_type == "KEYS"` | Full key enumeration | HIGH |
| `redis && redis_type == "CLIENT"` | Connection enumeration | MEDIUM |
| `redis && redis_type == "DEBUG"` | Debug access | MEDIUM |
| `redis && redis_command.contains("CONFIG SET dir")` | Arbitrary file write (RCE) | CRITICAL |
| `redis && redis_type == "FLUSHALL"` | Data destruction | CRITICAL |
### 5c: gRPC Endpoints
**Tool**: `list_api_calls` with KFL: `grpc`
Use `grpc_method` to inspect method names:
| KFL Filter | Threat | Severity |
|------------|--------|----------|
| `grpc && grpc_method.contains("Reflection")` | API surface enumeration | MEDIUM |
| `grpc && dst.name.contains("attacker")` | Data exfiltration | HIGH |
| `grpc && grpc_status != 0` | Failed gRPC calls (may indicate probing) | LOW |
### 5d: HTTP Request Anomalies
**Tool**: `list_api_calls` with KFL: `http`
Check for:
- **WebSocket upgrades to external hosts**: `Upgrade: websocket` header — potential
mining proxy or persistent C2 channel
- **DNS-over-HTTPS requests**: `accept: application/dns-json` header — DNS bypass
- **AWS Signature headers**: `Authorization: AWS4-HMAC-SHA256` — stolen cloud creds
- **IMDS-specific headers**: `X-aws-ec2-metadata-token-ttl-seconds` — token request
---
## Phase 6: Credential Access
**Goal**: Detect active credential theft — IMDS access, service account abuse,
cloud API exploitation.
### 6a: Instance Metadata Service (IMDS)
**Tool**: `list_api_calls` with KFL: `dst.ip == "169.254.169.254"`
Or use `list_l4_flows` to find connections to 169.254.169.254.
Any pod connecting to this IP is attempting to steal the node's cloud credentials.
Check the HTTP paths:
| Path | What's Being Stolen |
|------|-------------------|
| `/latest/meta-data/iam/security-credentials/` | IAM role name |
| `/latest/meta-data/iam/security-credentials/<role>` | Actual AWS credentials |
| `/latest/dynamic/instance-identity/document` | Instance identity (account ID, region) |
| `/latest/user-data` | Instance bootstrap scripts (may contain secrets) |
| `/latest/api/token` (PUT) | IMDSv2 session token |
### 6b: Service Account Token Exfiltration
Look for HTTP requests where the body or headers contain JWT tokens
(strings starting with `eyJ`). These may be service account tokens being
sent to external endpoints.
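A text search for `eyJ` alone produces false positives, so a candidate can be validated by decoding its first segment. The sketch below is one possible approach: it matches three dot-separated base64url segments and keeps only those whose header decodes to a JSON object containing `alg`. The function names are hypothetical, and the signature is not verified.

```python
import base64
import json
import re

# Three base64url segments separated by dots, the first starting with "eyJ".
JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")


def _decode_segment(segment: str):
    """Decode a base64url segment (padding restored) as JSON, or return None."""
    padded = segment + "=" * (-len(segment) % 4)
    try:
        return json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return None


def find_jwts(text: str):
    """Return JWT-shaped strings in text whose header is JSON with an 'alg' key."""
    found = []
    for candidate in JWT_RE.findall(text):
        header = _decode_segment(candidate.split(".")[0])
        if isinstance(header, dict) and "alg" in header:
            found.append(candidate)
    return found
```

This would be run over header values and bodies retrieved with `get_api_call` for requests going to external destinations.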
---
## Phase 7: Attack Chain Correlation
**Goal**: Connect individual findings into a coherent attack narrative.
After completing phases 1-6, synthesize findings into an attack chain. Real
attacks follow a progression:
```
1. INITIAL ACCESS → How did the attacker get in?
2. RECONNAISSANCE → Port scanning, DNS enumeration, API discovery
3. CREDENTIAL ACCESS → IMDS theft, secret enumeration, token exfil
4. LATERAL MOVEMENT → Cross-namespace probing, SSRF, service scanning
5. EXFILTRATION → DNS tunneling, HTTP exfil, gRPC streaming
6. PERSISTENCE → C2 beaconing, cryptomining (monetization)
```
Map each finding to a stage. If you see findings across multiple stages from
the same namespace or related workloads, you've found a coordinated attack.
### Output Format
Present the audit results as:
1. **Workload inventory** — table of all observed workloads with threat level
2. **Detailed findings** — one section per finding, ordered by severity
3. **Attack chain summary** — if findings correlate, map the kill chain
4. **Immediate actions** — prioritized remediation steps
---
## Audit Report — Two-Stage Delivery
The audit produces **two outputs** — an intermediate report during Section A,
and a final PDF report after Section B completes.
### Stage 1: Intermediate Report (after Section A)
Present findings from real-time analysis directly in the conversation. Clearly
label as preliminary. This gives the user immediate value while snapshots
are being created and dissected.
### Stage 2: Final PDF Report (after Section B)
This is the primary deliverable. It is generated **only after all snapshots
have been dissected and analyzed at L7**. Do not generate the final report
based on Section A alone — that would miss protocol-level threats (SQL
injection, Redis abuse, gRPC exfil) that only appear after dissection.
1. **Write** the report as markdown: `security-audit-<namespace>-<date>.md`
Follow the template in `references/report-template.md` — it defines
the full structure: executive summary, threat table, detailed findings
with evidence, attack chain analysis, detection coverage, and remediation.
2. **Convert to PDF** (in preference order):
```bash
# Preferred: md-to-pdf (best quality)
npx md-to-pdf security-audit-<namespace>-<date>.md
# Fallback if md-to-pdf is unavailable: pandoc
pandoc security-audit-<namespace>-<date>.md -o security-audit-<namespace>-<date>.pdf
```
If neither tool is available, leave the markdown as the deliverable.
3. **The final report must include findings from both sections** — Section A
(real-time) and Section B (snapshot dissection). Findings confirmed by
both sections are marked with higher confidence. Findings only in
Section B (missed by real-time) should be noted — this reveals gaps
in real-time dissection coverage.
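The confidence rule in step 3 amounts to a set comparison. The sketch below assumes each finding has a stable identifier shared between the two sections; the identifiers and label strings are hypothetical.

```python
def merge_findings(section_a, section_b):
    """Combine finding IDs from both sections and label where each was seen."""
    a, b = set(section_a), set(section_b)
    merged = {}
    for finding in sorted(a | b):
        if finding in a and finding in b:
            merged[finding] = "confirmed-by-both"  # higher confidence
        elif finding in b:
            merged[finding] = "section-b-only"  # real-time coverage gap
        else:
            merged[finding] = "section-a-only"
    return merged
```

Entries labeled `section-b-only` are the ones to call out as gaps in real-time dissection coverage.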
### Key Report Requirements
- **Quote raw evidence** — actual DNS queries, HTTP URLs, SQL statements,
Redis commands. The reader must be able to verify without re-running.
- **Timestamp every finding** — snapshot ID + local time (UTC in parentheses).
- **Specific recommendations** — not "fix RBAC" but "revoke ClusterRoleBinding
`mule-recon-cluster-admin`".
- **Include MITRE ATT&CK IDs** for each finding.
- **Evidence preservation** — list snapshot IDs, recommend cloud storage upload.
---
## What Network Auditing Cannot Detect
Be transparent about blind spots. Network traffic analysis **cannot** detect:
- **Configuration vulnerabilities**: Privileged containers, missing resource
limits, permissive RBAC, hostPath mounts — these are YAML-level issues with
no traffic signature
- **Secrets in environment variables**: Hardcoded credentials don't generate
network traffic until used
- **Image vulnerabilities**: CVEs in container images are not visible on the wire
- **Idle threats**: A malicious pod that hasn't started communicating yet
Recommend `kubectl`-based configuration auditing to cover these gaps. Network
auditing complements config-level security scanning; it does not replace it.
## Threat Intelligence Reference
For detailed descriptions of all 22 network-observable threat scenarios with
MITRE ATT&CK mappings and detection guidance, see `references/threat-catalog.md`.


@@ -0,0 +1,64 @@
# KFL Quick Reference: Security Audit Filters
## DNS Threat Hunting
```
dns // All DNS traffic
dns && dns_response && size(dns_answers) == 0 // Empty responses (NXDOMAIN or NODATA, no answers)
dns && dns_questions.exists(q, q.contains("minexmr")) // Mining pool DNS
dns && dns_questions.exists(q, q.contains("nanopool")) // Mining pool DNS
dns && dns_questions.exists(q, q.contains("amazonaws")) // Cloud API resolution
dns && dns_questions.exists(q, q.contains("cloudflare-dns")) // DoH bypass
dns && dns_questions.exists(q, q.contains("dns.google")) // DoH bypass
```
## External Communication
```
http && dst.name.contains("attacker") // Known-bad destinations
http && map_get(request.headers, "user-agent", "").contains("Mozilla/4.0") // Suspicious UA
http && map_get(request.headers, "accept", "").contains("dns-json") // DoH requests
http && map_get(request.headers, "upgrade", "") == "websocket" // WebSocket (potential mining)
```
## Lateral Movement
```
src.pod.namespace != dst.pod.namespace // Cross-namespace traffic
http && path.startsWith("/api/v1/secrets") // Secret enumeration
http && path == "/.env" // Service fingerprinting
http && path == "/actuator/info" // Spring Boot fingerprinting
http && path == "/version" // Version fingerprinting
```
## Protocol Inspection
```
postgresql // PostgreSQL wire protocol
postgresql && postgresql_query.contains("UNION SELECT") // SQL injection patterns
postgresql && !postgresql_success // Failed PostgreSQL queries
redis // Redis protocol
grpc // gRPC calls (native detection)
grpc && grpc_method.contains("Reflection") // gRPC reflection enumeration
```
## Credential Theft
```
dst.ip == "169.254.169.254" // IMDS access
http && path.contains("/meta-data/iam") // IAM credential paths
http && map_get(request.headers, "authorization", "").startsWith("AWS4-HMAC-SHA256") // Stolen AWS creds
http && "x-aws-ec2-metadata-token-ttl-seconds" in request.headers // IMDSv2 token request
```
## Resource Hijacking
```
dst.port == 3333 // Stratum mining (standard)
dst.port == 14433 // Stratum mining (alt)
dst.port == 45700 // Stratum mining (alt)
dst.port == 4444 // Reverse shell / backdoor
```
## Per-Namespace Scoping
Add namespace filters to any query above:
```
dns && src.pod.namespace == "k8s-mule" // DNS from specific namespace
http && src.pod.namespace == "k8s-mule" // HTTP from specific namespace
redis && src.pod.namespace == "k8s-mule" // Redis from specific namespace
```
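When many filters need the same scope, the suffix can be appended programmatically. The helper below is a hypothetical convenience, not a Kubeshark API. It wraps the base filter in parentheses (KFL expressions elsewhere in this skill use parentheses for grouping) so that an `||` in the base filter keeps its meaning.

```python
def scope_to_namespace(kfl, namespace):
    """Append a source-namespace clause to a KFL filter string."""
    if '"' in namespace or "\\" in namespace:
        raise ValueError("namespace must not contain quotes or backslashes")
    return f'({kfl}) && src.pod.namespace == "{namespace}"'
```

For example, `scope_to_namespace("dst.port == 3333 || dst.port == 4444", "k8s-mule")` scopes both port checks rather than only the second.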


@@ -0,0 +1,102 @@
# Security Audit Report Template
Use this template for the markdown report. Fill in all sections, then convert
to PDF.
```markdown
# Kubernetes Network Security Audit Report
**Cluster**: <cluster name/context>
**Namespace**: <target namespace>
**Date**: <audit date and time, local timezone>
**Audit window**: <start time> — <end time> (<duration>)
**Snapshots analyzed**: <count and IDs>
**Audited by**: Claude Code + Kubeshark MCP
---
## Executive Summary
<2-3 sentence summary: how many threats found, highest severity,
whether an active attack chain was identified, top recommendation>
## Threat Summary
| # | Severity | Workload | Threat | MITRE ATT&CK |
|---|----------|----------|--------|---------------|
| 1 | CRITICAL | log-shipper | DNS Tunneling | T1048.003 |
| 2 | CRITICAL | cloud-health-monitor | IMDS Credential Theft | T1552.005 |
| ... | | | | |
## Detailed Findings
### Finding 1: <Title> (CRITICAL)
**Workload**: <pod name>
**MITRE ATT&CK**: <technique ID and name>
**Snapshot**: <snapshot ID>
**Detection method**: <which phase and tool detected this>
**Evidence**:
<Specific traffic data — DNS queries, HTTP requests, L4 flows,
protocol payloads. Include timestamps, source/dest, and relevant
content. Quote actual query names, URLs, SQL statements, or
Redis commands observed.>
**Impact**:
<What this means — data at risk, credentials exposed, scope of access>
**Recommendation**:
<Specific remediation — NetworkPolicy, RBAC change, pod deletion, credential rotation>
---
(repeat for each finding)
## Attack Chain Analysis
<If findings correlate, map the kill chain:
Initial Access → Reconnaissance → Credential Access → Lateral Movement →
Exfiltration → Persistence. Identify which workloads participate in each stage.>
## Detection Coverage
| Phase | Checked | Findings |
|-------|---------|----------|
| Workload Inventory | Yes | <count> |
| DNS Threat Analysis | Yes | <count> |
| External Communication | Yes | <count> |
| Lateral Movement | Yes | <count> |
| Protocol Abuse | Yes | <count> |
| Credential Access | Yes | <count> |
## Limitations
<What this audit cannot detect — config-level vulnerabilities,
image CVEs, idle threats. Recommend complementary tools.>
## Immediate Actions
1. <Highest priority action>
2. <Second priority>
3. ...
## Evidence Preservation
<List snapshot IDs created during this audit. Recommend uploading
to cloud storage for long-term retention. Include PCAP export
commands for key findings.>
```
## Quality Guidelines
- **Include raw evidence** — quote actual DNS queries, HTTP URLs, SQL
statements, Redis commands. The reader should be able to verify findings
without re-running the audit.
- **Timestamp everything** — every finding should reference the snapshot ID
and timestamp (local time with UTC in parentheses).
- **Be specific in recommendations** — not "fix RBAC" but "revoke
ClusterRoleBinding `mule-recon-cluster-admin` and replace with a
namespace-scoped Role granting only `get` on `pods`".
- **Include MITRE ATT&CK IDs** — makes the report actionable for security
teams that track coverage against the framework.


@@ -0,0 +1,190 @@
# Network Threat Catalog
22 network-observable threat patterns organized by MITRE ATT&CK tactic.
Each entry describes the attack, what it looks like on the wire, and how
to detect it with Kubeshark.
## Command & Control (TA0011)
### DGA Beaconing (T1568.002)
- **What**: Malware generates pseudo-random domain names daily and queries DNS
for each. The C2 operator registers a few; most resolve to NXDOMAIN.
- **Wire signature**: Burst of DNS queries for high-entropy .com/.net domains
with >80% NXDOMAIN response rate.
- **KFL**: `dns && dns_response && size(dns_answers) == 0` — then check for entropy in queried names.
- **Difficulty**: Medium. NXDOMAIN flood is distinctive but low-rate DGA can
blend with legitimate DNS failures.
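The entropy check mentioned in the KFL note has to happen outside the filter, on the queried names it returns. A minimal sketch follows, using Shannon entropy over the longest non-TLD label. The 3.5-bit threshold and 10-character minimum are assumptions for illustration and would need tuning against real traffic.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_generated(domain, threshold=3.5, min_length=10):
    """Heuristic: the longest label (ignoring the TLD) is long and high-entropy."""
    labels = domain.rstrip(".").lower().split(".")
    candidate = max(labels[:-1] or labels, key=len)
    return len(candidate) >= min_length and shannon_entropy(candidate) > threshold
```

Short or dictionary-like labels score low, so this works best combined with the NXDOMAIN rate rather than on its own.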
### HTTP C2 Beaconing (T1071.001)
- **What**: Implant calls home via HTTP GET at regular intervals, receiving
tasking in the response body. Cobalt Strike, Meterpreter pattern.
- **Wire signature**: Periodic HTTP GET to fixed external URL at suspiciously
regular intervals (30-60s). Outdated User-Agent (Mozilla/4.0). Session
identifiers in URL path.
- **KFL**: `http && dst.name.contains("attacker")`, or match the outdated User-Agent with `http && map_get(request.headers, "user-agent", "").contains("Mozilla/4.0")`.
- **Difficulty**: Medium. Regularity is the key anomaly.
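Regularity can be quantified from request timestamps. The sketch below uses the coefficient of variation of the gaps between requests, where lower means more regular. The function names and the 0.1 cutoff are assumptions, and an implant that adds jitter will score higher than this threshold.

```python
from statistics import mean, pstdev


def beacon_score(timestamps):
    """Coefficient of variation of inter-request gaps (seconds).

    Returns None when there are fewer than three requests or all
    requests share one timestamp.
    """
    if len(timestamps) < 3:
        return None
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average = mean(gaps)
    if average == 0:
        return None
    return pstdev(gaps) / average


def is_beacon(timestamps, max_cv=0.1):
    """True when the gaps are regular enough to look machine-generated."""
    score = beacon_score(timestamps)
    return score is not None and score <= max_cv
```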
### Encrypted C2 (T1573.002)
- **What**: C2 over HTTPS. Content is encrypted but TLS SNI reveals suspicious
domain names.
- **Wire signature**: Outbound TLS to non-standard domains (darknet, cdn-mirror).
DNS queries preceding the connection reveal the target.
- **KFL**: `dns && (dns_questions.exists(q, q.contains("darknet")) || dns_questions.exists(q, q.contains("cdn-mirror")))`.
- **Difficulty**: Hard. Encrypted, uses standard port 443.
### DNS-over-HTTPS C2 (T1572)
- **What**: Bypasses cluster DNS by sending queries as HTTPS to public DoH
resolvers (cloudflare-dns.com, dns.google). C2 commands embedded in TXT
responses.
- **Wire signature**: HTTP requests to DoH endpoints with `accept: application/dns-json`
header. No corresponding queries on port 53.
- **KFL**: `http && (dst.name.contains("cloudflare-dns") || dst.name.contains("dns.google"))`.
- **Difficulty**: Hard. Looks like regular HTTPS to trusted providers.
## Exfiltration (TA0010)
### DNS Tunneling (T1048.003)
- **What**: Full bidirectional data channel over DNS using tools like iodine,
dnscat2. Data encoded in long subdomain labels.
- **Wire signature**: High-frequency DNS queries (20+/burst) with subdomain
labels near 63-byte limit. Mix of A, TXT, NULL query types.
- **KFL**: `dns && dns_questions.exists(q, q.contains("data-relay"))` or look for
high query rates per source.
- **Difficulty**: Medium. Volume and long subdomains are distinctive.
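Both signals, label length and query volume, can be checked together on exported query names. In the sketch below the 50-byte cutoff for "near the limit" is an assumption, and the 20-query minimum mirrors the burst size in the wire signature. The input shape `(source, query_name)` is hypothetical.

```python
from collections import Counter

NEAR_LIMIT = 50  # assumed cutoff for "near" the 63-byte DNS label limit


def longest_label(name):
    """Length in bytes of the longest dot-separated label in a DNS name."""
    labels = name.rstrip(".").split(".")
    return max(len(label.encode("utf-8")) for label in labels)


def tunneling_suspects(queries, min_label=NEAR_LIMIT, min_queries=20):
    """Return {source: count} for sources sending many long-label queries.

    `queries` is an iterable of (source, query_name) pairs.
    """
    counts = Counter(
        source for source, name in queries if longest_label(name) >= min_label
    )
    return {source: n for source, n in counts.items() if n >= min_queries}
```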
### HTTP Header Exfiltration (T1048.001)
- **What**: Data exfiltrated in HTTP headers (Cookie, X-Trace-ID) disguised
as analytics tracking. Low volume to evade detection.
- **Wire signature**: HTTP GET to analytics-looking URL with oversized Cookie
or custom headers containing base64-encoded data.
- **KFL**: `http && dst.name.contains("cdn-provider")`.
- **Difficulty**: Hard. Low volume, standard HTTP, looks like analytics.
### DNS Credential Exfiltration (T1048.003)
- **What**: Stolen JWT tokens or credentials encoded in DNS TXT queries to
attacker-controlled authoritative nameserver.
- **Wire signature**: DNS TXT queries with structured multi-label subdomains
containing base64-like encoded data.
- **KFL**: `dns && dns_questions.exists(q, q.contains("steal-creds"))`.
- **Difficulty**: Medium. Multi-label structure is distinctive.
### gRPC Stream Exfiltration (T1048.001)
- **What**: Data exfiltration via gRPC (HTTP/2) POST to external endpoint.
Blends with normal microservice traffic.
- **Wire signature**: HTTP/2 POST with `Content-Type: application/grpc` to
external destination with exfil-related method names.
- **KFL**: `grpc && dst.name.contains("attacker")`.
- **Difficulty**: Hard. gRPC is normal in K8s. External destination is the signal.
## Lateral Movement (TA0008)
### K8s API Enumeration (T1613)
- **What**: Compromised pod uses mounted service account token to enumerate
secrets, pods, RBAC bindings across all namespaces.
- **Wire signature**: HTTPS to kubernetes.default.svc with broad GET requests
across /api/v1/secrets, /pods, /configmaps, /clusterrolebindings.
- **KFL**: `http && dst.port == 443 && path.contains("/api/v1/secrets")`.
- **Difficulty**: Medium. The fanout across resource types is the anomaly.
### SSRF to Internal Services (T1090)
- **What**: Pod probes cross-namespace internal services it shouldn't talk to —
kube-dns metrics, Prometheus, Grafana, dashboards.
- **Wire signature**: HTTP to multiple ClusterIP services across namespaces
from a single source pod.
- **KFL**: `http && src.pod.namespace == "k8s-mule" && dst.pod.namespace != "k8s-mule"`.
- **Difficulty**: Medium. Cross-namespace breadth is the signal.
### Port Scanning (T1046)
- **What**: Sweep of common ports across pod CIDR after initial access.
- **Wire signature**: Rapid TCP SYN from single source to many IPs on ports
80, 443, 3306, 5432, 6379, 8080, 9090, 27017. High RST/timeout rate.
- **KFL**: `tcp && src.name == "network-diagnostics"`.
- **Difficulty**: Easy. Classic scan pattern — high fan-out, high failure rate.
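Fan-out and failure rate can be computed per source from L4 flow records. This is a sketch under assumptions: the tuple shape and both thresholds are hypothetical, and `connected` stands in for whatever field marks a reset or timed-out attempt in your export.

```python
from collections import defaultdict


def scan_suspects(flows, min_targets=20, min_failure_rate=0.5):
    """Return sources that contact many distinct targets and mostly fail.

    `flows` is an iterable of (src, dst_ip, dst_port, connected) tuples,
    where `connected` is False for a reset or timed-out attempt.
    Result maps src -> (distinct_targets, failure_rate).
    """
    targets = defaultdict(set)
    attempts = defaultdict(int)
    failures = defaultdict(int)
    for src, dst_ip, dst_port, connected in flows:
        targets[src].add((dst_ip, dst_port))
        attempts[src] += 1
        if not connected:
            failures[src] += 1
    suspects = {}
    for src, seen in targets.items():
        failure_rate = failures[src] / attempts[src]
        if len(seen) >= min_targets and failure_rate >= min_failure_rate:
            suspects[src] = (len(seen), round(failure_rate, 2))
    return suspects
```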
### Service Fingerprinting (T1046)
- **What**: HTTP probes to discovery paths across multiple services to identify
running software.
- **Wire signature**: HTTP GET to /version, /healthz, /.env, /actuator/info,
/server-info. HEAD and OPTIONS methods. Multiple targets from one source.
- **KFL**: `http && (path == "/.env" || path == "/actuator/info")`.
- **Difficulty**: Medium. Path patterns are distinctive.
## Credential Access (TA0006)
### IMDS Metadata Theft (T1552.005)
- **What**: Query the cloud instance metadata service (AWS or GCP) to steal
IAM role credentials. This was the vector in the Capital One breach.
- **Wire signature**: HTTP to 169.254.169.254. On AWS the paths are
/latest/meta-data/iam/, /latest/user-data, and /latest/api/token (PUT for IMDSv2).
- **KFL**: `dst.ip == "169.254.169.254"`.
- **Difficulty**: Easy. Destination IP is unique and unmistakable.
### Cloud API Abuse (T1078.004)
- **What**: Direct calls to AWS APIs (STS, S3, EC2) with stolen credentials
from a workload pod.
- **Wire signature**: DNS for sts.amazonaws.com, s3.amazonaws.com. HTTPS
requests with AWS Signature V4 Authorization headers.
- **KFL**: `dns && dns_questions.exists(q, q.contains("amazonaws.com"))`.
- **Difficulty**: Medium. Cloud API DNS from a non-controller pod is suspicious.
## Impact: Resource Hijacking (TA0040)
### Stratum Mining Protocol (T1496)
- **What**: XMRig/miner connecting to mining pool via Stratum JSON-RPC over TCP.
- **Wire signature**: TCP connection to port 3333/14433/45700 with JSON-RPC
messages: mining.subscribe, mining.authorize, mining.submit.
- **KFL**: `dst.port == 3333`.
- **Difficulty**: Medium. Port 3333 is a well-known mining indicator.
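Because port numbers alone can mislead, the payload is worth checking for the method names listed in the wire signature. The sketch below assumes the TCP payload has been exported as bytes and that messages are newline-delimited JSON, which is how Stratum is commonly framed. The function name is hypothetical.

```python
import json

# Method names taken from the wire signature above.
STRATUM_METHODS = {"mining.subscribe", "mining.authorize", "mining.submit"}


def stratum_methods(payload):
    """Return the Stratum method names found in a newline-delimited payload."""
    found = set()
    for line in payload.splitlines():
        try:
            message = json.loads(line)
        except ValueError:
            continue  # not JSON, ignore this line
        if not isinstance(message, dict):
            continue
        method = message.get("method")
        if isinstance(method, str) and method in STRATUM_METHODS:
            found.add(method)
    return found
```

A non-empty result on any port is a stronger indicator than a connection to port 3333 by itself.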
### Mining Pool DNS (T1496)
- **What**: DNS resolution of known mining pool domains before connecting.
- **Wire signature**: DNS queries for domains containing minexmr, nanopool,
mining-pool, hashvault, supportxmr.
- **KFL**: `dns && (dns_questions.exists(q, q.contains("minexmr")) || dns_questions.exists(q, q.contains("mining-pool")))`.
- **Difficulty**: Easy. Mining domain names are unmistakable.
### WebSocket Mining (T1496)
- **What**: Browser-based miner communicating via WebSocket on standard ports.
- **Wire signature**: HTTP Upgrade: websocket request to external host with
mining-related URL path (/proxy?coin=, ?algo=randomx).
- **KFL**: `http && map_get(request.headers, "upgrade", "") == "websocket"`.
- **Difficulty**: Hard. WebSocket on port 80/443 looks normal. Only the URL reveals intent.
## Protocol Abuse
### SQL Injection via PG Wire (T1190)
- **What**: SQL injection payloads sent through PostgreSQL wire protocol.
- **Wire signature**: PG protocol carrying UNION SELECT, information_schema,
pg_shadow queries.
- **KFL**: `postgresql && postgresql_query.contains("UNION SELECT")`, or plain `postgresql` to review every query.
- **Difficulty**: Medium. PG dissection reveals the SQL content directly.
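Once the SQL text is dissected, the indicators in the wire signature can be matched case-insensitively. This is a rough triage sketch, not a SQL parser: the pattern list covers only the three indicators named above and the function name is hypothetical.

```python
import re

# Indicators taken from the wire signature above.
SQLI_PATTERNS = {
    "union-select": re.compile(r"\bunion\s+(?:all\s+)?select\b", re.IGNORECASE),
    "information_schema": re.compile(r"\binformation_schema\b", re.IGNORECASE),
    "pg_shadow": re.compile(r"\bpg_shadow\b", re.IGNORECASE),
}


def sqli_indicators(query):
    """Return the names of the injection indicators present in a SQL string."""
    return [name for name, pattern in SQLI_PATTERNS.items() if pattern.search(query)]
```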
### Redis Unauthorized Access (T1190)
- **What**: Unauthenticated Redis instance probed with dangerous commands.
- **Wire signature**: Redis protocol: CONFIG GET *, KEYS *, CLIENT LIST, DEBUG.
- **KFL**: `redis`.
- **Difficulty**: Easy. Redis command names are directly visible.
### Database Destruction (T1485)
- **What**: Ransomware pattern — SELECT * (data theft) then TRUNCATE/DROP (destruction).
- **Wire signature**: PG protocol showing SELECT followed by TRUNCATE on same table.
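- **KFL**: `postgresql && (postgresql_query.contains("TRUNCATE") || postgresql_query.contains("DROP"))`, or plain `postgresql` to review the full query sequence.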
- **Difficulty**: Medium. DDL commands in PG protocol are visible with dissection.
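The read-then-destroy sequence can be checked on an ordered list of dissected queries. The sketch below is a simplification under assumptions: it uses regular expressions rather than a SQL parser, only tracks the first table after `FROM`, and the function name is hypothetical.

```python
import re

SELECT_RE = re.compile(r"\bselect\b.*?\bfrom\s+([\w.\"]+)", re.IGNORECASE | re.DOTALL)
DESTROY_RE = re.compile(
    r"\b(?:truncate(?:\s+table)?|drop\s+table(?:\s+if\s+exists)?)\s+([\w.\"]+)",
    re.IGNORECASE,
)


def read_then_destroyed(queries):
    """Return tables that were SELECTed and later TRUNCATEd or DROPped.

    `queries` is a list of SQL strings in the order they were observed.
    """
    read = set()
    hits = []
    for query in queries:
        destroyed = DESTROY_RE.search(query)
        if destroyed:
            table = destroyed.group(1).strip('"').lower()
            if table in read and table not in hits:
                hits.append(table)
            continue
        selected = SELECT_RE.search(query)
        if selected:
            read.add(selected.group(1).strip('"').lower())
    return hits
```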
## Reconnaissance (TA0043)
### DNS Zone Enumeration (T1018)
- **What**: Brute-force DNS queries across namespaces to discover services.
Includes SRV lookups and AXFR zone transfer attempts.
- **Wire signature**: High volume of DNS queries for *.svc.cluster.local patterns
across many namespaces. Many NXDOMAIN responses.
- **KFL**: `dns && src.name == "service-discovery"`.
- **Difficulty**: Easy. The volume and cross-namespace pattern are obvious.
### gRPC Reflection Enumeration (T1046)
- **What**: Probing gRPC server reflection to discover API surfaces without
needing proto files.
- **Wire signature**: HTTP/2 POST to /grpc.reflection.v1alpha.ServerReflection/
ServerReflectionInfo across multiple services.
- **KFL**: `grpc && grpc_method.contains("Reflection")` or `http && path.contains("grpc.reflection")`.
- **Difficulty**: Medium. Reflection path is a known enumeration vector.