Compare commits

..

2 Commits

Author SHA1 Message Date
Alon Girmonsky
c1dbbd2386 Replace HTML embeds with image placeholders in security-audit README 2026-05-19 22:37:24 -07:00
Alon Girmonsky
35dea1bc9a Add security-audit skill README with demo session and sample report 2026-05-19 22:13:09 -07:00
4 changed files with 119 additions and 38 deletions

24
.github/workflows/release-tag.yml vendored Normal file
View File

@@ -0,0 +1,24 @@
name: Auto-tag release
on:
pull_request:
types: [closed]
branches: [master]
jobs:
tag:
if: github.event.pull_request.merged == true && startsWith(github.event.pull_request.head.ref, 'release/v')
runs-on: ubuntu-latest
permissions:
contents: write
steps:
- uses: actions/checkout@v5
with:
fetch-depth: 0
- name: Create and push tag
run: |
VERSION="${GITHUB_HEAD_REF#release/}"
echo "Creating tag $VERSION on master"
git tag "$VERSION"
git push origin "$VERSION"

View File

@@ -268,8 +268,8 @@ release: ## Print release workflow instructions.
@echo ""
@echo " Shortcut: make release-pr VERSION=x.y.z runs 1 → 2 → 3."
@echo ""
@echo " After both PRs merge, create the release tag:"
@echo " make release-tag VERSION=x.y.z"
@echo " After both PRs merge: tag is created automatically,"
@echo " or run: make release-tag VERSION=x.y.z"
# Internal: validate VERSION before any release-* target runs.
_release-check-version:
@@ -362,18 +362,14 @@ release-pr: release-siblings release-pr-kubeshark release-pr-helm ## Run release
@echo " - kubeshark.github.io: Review and merge the helm chart PR."
@echo "Tag will be created automatically, or run: make release-tag VERSION=$(VERSION)"
release-tag: _release-check-version ## Step 2: Tag master after release PR is merged. Idempotent; re-run to retrigger the release build.
release-tag: ## Step 2 (fallback): Tag master after release PR is merged.
@echo "Verifying release PR was merged..."
@if ! gh pr list --state merged --head release/v$(VERSION) --json number --jq '.[0].number' | grep -q .; then \
echo "Error: No merged PR found for release/v$(VERSION). Merge the PR first."; \
exit 1; \
fi
@git checkout master && git pull
@if git ls-remote --tags origin "refs/tags/v$(VERSION)" | grep -q .; then \
echo "Tag v$(VERSION) already exists on origin — deleting to retrigger release..."; \
git push origin :refs/tags/v$(VERSION); \
fi
@git tag -d v$(VERSION) 2>/dev/null; git tag v$(VERSION) && git push origin "refs/tags/v$(VERSION)"
@git tag -d v$(VERSION) 2>/dev/null; git tag v$(VERSION) && git push origin --tags
@echo ""
@echo "Tagged v$(VERSION) on master. GitHub Actions will build the release."

View File

@@ -0,0 +1,26 @@
# Security Audit Skill
A Kubeshark MCP skill that teaches AI agents to perform systematic Kubernetes
network security audits using the MITRE ATT&CK framework. It examines DNS
queries, HTTP requests, L4 flows, and protocol-level payloads to detect
compromised workloads, C2 communication, data exfiltration, cryptomining,
lateral movement, and credential theft.
See [SKILL.md](SKILL.md) for the full methodology.
## Demo
The demo below shows a real security audit session against a compromised
`k8s-mule` namespace containing 21 workloads, 6 of which were actively
compromised with C2, cryptomining, secret theft, S3 exfiltration, port
scanning, and Redis reconnaissance.
### Claude Code Session
<!-- TODO: replace with animated GIF once recorded -->
![Security Audit Demo](https://raw.githubusercontent.com/kubeshark/assets/master/png/security-audit-demo.gif)
### Sample Audit Report
<!-- TODO: replace with animated GIF once recorded -->
![Security Audit Report](https://raw.githubusercontent.com/kubeshark/assets/master/png/security-audit-report.gif)

View File

@@ -113,15 +113,25 @@ waiting for snapshot creation or dissection.
Confirm Kubeshark is running and which tools are available.
**Tool**: `get_data_boundaries`
Check how far back raw capture data exists. You need this to plan snapshot
creation in Step 3 — call it now so the data is ready when you need it.
**Tool**: `list_workloads` (no snapshot_id — queries live state)
Get the current workload inventory for the target namespace. This returns
pod names, namespaces, and IP addresses. Save the IPs — you'll need them
throughout the audit.
**Note**: `list_workloads` without a `snapshot_id` may fail with some
Kubeshark versions (`snapshot_id is required for filtered listing`). If
this happens, use individual lookups with `name` + `namespace` parameters,
or skip to Step 3 and get the workload inventory from the first snapshot.
### Step 2: Query Live Traffic
**Tool**: `get_l7_data_boundaries`
Check the time boundaries of dissected API calls in the real-time database.
This tells you how far back L7 data is available — use it to understand
the scope of your real-time queries before running them.
Then query the real-time dissected traffic across key dimensions.
In parallel, query the real-time dissected traffic across key dimensions.
Use `list_api_calls` and `list_l4_flows` **without** a `snapshot_id` to
hit the live data.
@@ -145,12 +155,6 @@ appear. Treat Section A findings as a fast first pass, not the final word.
While analyzing real-time data, begin creating snapshots for Section B.
**Tool**: `get_data_boundaries`
Check how far back raw capture data exists. Raw capture is the FIFO buffer
that feeds snapshot creation — this tells you the time window available
for snapshots (which is different from the L7 boundaries in Step 2).
**CRITICAL: Create snapshots ONE AT A TIME, sequentially.** Kubeshark only
supports one concurrent snapshot download. Parallel creation will cause
failures and data loss. The pattern is:
@@ -160,7 +164,8 @@ failures and data loss. The pattern is:
3. You do NOT need to wait for dissection before creating the next snapshot.
Create the next snapshot while the previous one dissects.
Use `get_data_boundaries` to calculate how many snapshots are needed:
Use the data boundaries from Step 1 (`get_data_boundaries`) to calculate
how many snapshots are needed:
```
total_range_ms = newest_timestamp - oldest_timestamp
@@ -242,6 +247,7 @@ wait for dissection to use the first two:
| Source | Available | Tool | What It Provides |
|--------|-----------|------|-----------------|
| **Workloads & IPs** | Immediately | `list_workloads` with `snapshot_id` | Pod names, namespaces, IPs at capture time |
| **L4 Flows** | Immediately | `list_l4_flows` with `snapshot_id` | TCP/UDP connections: src/dst IPs, ports, bytes, duration |
| **PCAP Export** | Immediately | `export_snapshot_pcap` | Raw packets filtered by BPF expression |
| **L7 Dissection** | After indexing | `list_api_calls`, `get_api_call`, `get_api_stats` | DNS queries, HTTP requests, SQL statements, Redis commands, gRPC methods |
@@ -255,12 +261,12 @@ Snapshot ready
├── Start dissection (background)
├── Phase 1: list_workloads (immediate) — workload inventory + IPs
│ export_snapshot_pcap (immediate) — raw packet evidence
├── Phase 3: list_l4_flows (immediate) — external flows, port scanning
├── Phase 4: list_l4_flows (immediate) — lateral movement, fan-out
├── [dissection completes]
├── Phase 2: list_api_calls — DNS threat analysis
├── Phase 3: list_api_calls — external HTTP communication
├── Phase 4: list_api_calls — lateral movement, K8s API access
├── Phase 5: list_api_calls — protocol abuse (PG, Redis, gRPC)
├── Phase 6: list_api_calls — credential access (IMDS, cloud APIs)
└── Phase 7: correlate all findings
@@ -390,14 +396,32 @@ Compare the count of failed queries to total queries per source pod.
**Goal**: Identify all traffic leaving the cluster. Any pod connecting to
external IPs or domains needs justification.
**Data source**: L7 dissection (after indexing).
**Data source**: Immediate (no dissection needed). Use L4 flows first,
then enrich with L7 data from dissection when available.
**Note**: L4 flow analysis for external communication is covered in
Section A (Step 2) using `list_l4_flows` against real-time data. In
Section B, use `list_api_calls` against dissected snapshot data for
deeper L7 inspection of external traffic.
### 3a: L4 External Flows
### 3a: HTTP External Requests
**Tool**: `list_l4_flows` with `snapshot_id`
This is available immediately — do not wait for dissection. Use the workload
IPs from Phase 1 to map flows to pod identities.
Look for flows where the destination is NOT a cluster-internal IP (not RFC 1918:
10.x.x.x, 172.16-31.x.x, 192.168.x.x). Every external flow is a potential
exfiltration or C2 channel.
**What to flag**:
| Pattern | Threat | Severity |
|---------|--------|----------|
| Destination 169.254.169.254 | IMDS metadata credential theft | CRITICAL |
| Destination port 3333, 14433, 45700 | Stratum mining protocol | CRITICAL |
| Destination port 4444, 1337 | Reverse shell / backdoor | CRITICAL |
| Persistent connections to single external IP | C2 beaconing | HIGH |
| Large outbound data volume (>1MB) to external | Data exfiltration | HIGH |
| Connections to cloud API endpoints (port 443) | Stolen credential usage | MEDIUM |
### 3b: HTTP External Requests
**Tool**: `list_api_calls` with KFL: `http && !dst.pod.namespace.startsWith("kube")`
@@ -416,11 +440,8 @@ Inspect outbound HTTP requests for:
**Goal**: Identify pods communicating with services they shouldn't — crossing
namespace boundaries, probing infrastructure, or scanning the network.
**Data source**: L7 dissection (after indexing) for cross-namespace HTTP
and API server analysis.
**Note**: Port scanning detection via `list_l4_flows` is covered in
Section A (Step 2) against real-time data.
**Data source**: L4 flows (immediate) for port scanning detection. L7
dissection (after indexing) for cross-namespace HTTP and API server analysis.
### 4a: Cross-Namespace Traffic
@@ -447,7 +468,19 @@ A pod hitting **multiple** of these paths is performing systematic enumeration,
not legitimate API access. Legitimate workloads typically access 1-2 specific
resources, not sweep across resource types.
### 4c: Service Fingerprinting
### 4c: Port Scanning Detection
**Tool**: `list_l4_flows` with `snapshot_id` (immediate — no dissection needed)
Use the workload IPs from Phase 1 to identify the source pod.
Look for a single source IP with connections to:
- Many distinct destination IPs (>10)
- Many distinct destination ports (>5)
- High connection failure rate (RST/timeout)
This is a textbook port scan pattern.
### 4d: Service Fingerprinting
**Tool**: `list_api_calls` with KFL: `http && (path == "/.env" || path == "/actuator/info" || path == "/server-info" || path == "/version")`
@@ -455,7 +488,7 @@ These paths are used for service fingerprinting — mapping what software is
running on internal endpoints. A pod probing multiple services with these
paths is performing reconnaissance.
### 4d: Service Account Permission Audit via Traffic
### 4e: Service Account Permission Audit via Traffic
Cross-reference Phase 4b findings (K8s API traffic) with the source pod's
actual service account to determine if permissions are excessive.
@@ -482,7 +515,7 @@ For each pod making API server calls:
This converts a network finding (API traffic volume) into an actionable RBAC
recommendation — telling the user exactly which ClusterRoleBinding to revoke.
### 4e: Cross-Namespace Threat Correlation
### 4f: Cross-Namespace Threat Correlation
When port scanning or lateral movement targets IPs outside the audited
namespace (e.g., IPs in the pod CIDR `10.244.x.x` that don't belong to
@@ -570,6 +603,8 @@ cloud API exploitation.
**Tool**: `list_api_calls` with KFL: `dst.ip == "169.254.169.254"`
Or use `list_l4_flows` to find connections to 169.254.169.254.
Any pod connecting to this IP is attempting to steal the node's cloud credentials.
Check the HTTP paths: