Kubernetes Network Flow Security Monitoring with Cilium Hubble and Retina
Problem
Kubernetes NetworkPolicy controls which pods can communicate with which other pods and services. But NetworkPolicy enforcement without observability is blind enforcement: you know what should be allowed, but you don’t know what is actually happening. Standard Kubernetes audit logs capture API server operations — pod creation, RBAC decisions, secret access — but they contain nothing about the network traffic those pods generate.
The gap matters enormously for security:
Lateral movement is invisible. An attacker who has compromised a pod and is probing internal services — scanning for open database ports, attempting connections to the secrets vault, reaching the Kubernetes API server from a namespace that shouldn’t have access — generates no Kubernetes audit log entries and no application log entries. The only record is in network traffic, which default Kubernetes infrastructure does not capture.
NetworkPolicy bypass is undetected. Misconfigured NetworkPolicy objects, CNI plugin bugs, and kernel-level policy bypass techniques can all result in connections that should be blocked succeeding silently. Without flow-level monitoring, a policy bypass is only discovered when its consequences (data exfiltration, lateral pivot) become apparent.
DNS-based indicators are missed. Before making a connection, an attacker’s pod performs DNS lookups for internal service discovery, C2 callback domains, and data exfiltration endpoints. DNS-layer monitoring can catch these lookups before a connection is established, but it requires either CoreDNS query logging (high volume) or eBPF-based DNS interception at the pod level.
Multi-namespace communication patterns are opaque. In a cluster with dozens of namespaces, understanding which namespaces communicate with which others — and which communications are unexpected — requires network flow data. This is the equivalent of VPC flow logs, but for the internal cluster network.
Cilium Hubble (the observability layer for Cilium CNI), Microsoft Retina (a CNI-agnostic eBPF observability plane), and custom eBPF XDP programs address this gap. They capture network flows at the kernel level via eBPF hooks on pod network interfaces, providing per-flow visibility with pod identity labels, namespace, protocol, source/destination IP and port, and DNS query data — all without application modification.
Target systems: Kubernetes 1.24+ clusters with Cilium ≥1.14, or any cluster where Retina or similar eBPF observability can be deployed as a DaemonSet; clusters where lateral movement detection and NetworkPolicy audit are security requirements.
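The namespace-to-namespace view described above can be derived directly from exported flow records. The following is a minimal Python sketch, assuming flows arrive as JSON lines in the shape Hubble emits with `--output json` (a top-level `flow` object carrying `verdict`, `source.namespace`, and `destination.namespace`). The function name and the sample records are illustrative and not part of any tool.

```python
import json
from collections import Counter

def namespace_matrix(json_lines):
    """Count forwarded, non-reply flows per (source, destination) namespace pair."""
    pairs = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        flow = json.loads(line).get("flow", {})
        # Skip denied flows and reply traffic so only the initiating direction is counted.
        if flow.get("verdict") != "FORWARDED" or flow.get("is_reply"):
            continue
        # Endpoints without a namespace are outside the cluster or host-level.
        src = flow.get("source", {}).get("namespace") or "(no namespace)"
        dst = flow.get("destination", {}).get("namespace") or "(no namespace)"
        pairs[(src, dst)] += 1
    return pairs

# Hypothetical sample records for illustration only.
SAMPLE_LINES = [
    '{"flow": {"verdict": "FORWARDED", "source": {"namespace": "app"}, "destination": {"namespace": "database"}}}',
    '{"flow": {"verdict": "FORWARDED", "source": {"namespace": "app"}, "destination": {"namespace": "database"}}}',
    '{"flow": {"verdict": "FORWARDED", "is_reply": true, "source": {"namespace": "database"}, "destination": {"namespace": "app"}}}',
    '{"flow": {"verdict": "DROPPED", "source": {"namespace": "dev"}, "destination": {"namespace": "production"}}}',
    '{"flow": {"verdict": "FORWARDED", "source": {"namespace": "app"}, "destination": {}}}',
]
```

Comparing the resulting pairs against a list of approved namespace pairs is one way to surface unexpected communication paths.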
Threat Model
Adversary 1: Lateral movement from compromised pod. Access level: code execution inside a pod in namespace app. Objective: probe internal services in namespace database to find an accessible database port. Flow monitoring detects: app/compromised-pod → database/* port 5432 DROPPED (the policy blocks the probe, but the activity is visible) or, worse, app/compromised-pod → database/postgres port 5432 FORWARDED (policy gap).
Adversary 2: Data exfiltration via unexpected egress. Access level: code execution inside an application pod. Objective: exfiltrate data to an external endpoint. Flow monitoring detects: app/pod → 203.0.113.1:443 FORWARDED, an external connection from a pod that should only talk to internal services.
Adversary 3: C2 callback via DNS. Access level: malicious code in a pod. Objective: resolve a C2 domain and establish a callback connection. DNS flow monitoring detects: a DNS query from app/pod for c2.attacker.com, answered either with NXDOMAIN (domain sinkholed) or with a resolved IP address that precedes the callback connection.
Adversary 4: Kubernetes API server access from unexpected namespace. Access level: pod in ml-training namespace with no legitimate need to call the Kubernetes API. Objective: use the pod’s service account token to enumerate cluster resources. Flow monitoring detects: ml-training/pod → kube-apiserver:443 FORWARDED (should be blocked by NetworkPolicy).
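Without monitoring, all four attacks proceed undetected until their consequences are visible. With flow monitoring, each appears as a flow record within seconds and raises an alert or a deviation from the flow baseline once the relevant rule’s evaluation window has elapsed (up to a few minutes for rules that use a for: clause).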
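The four adversary patterns can be expressed as simple checks on a single flow record. The sketch below is illustrative only: it assumes Hubble-style flow dictionaries, assumes that destinations outside the cluster and the API server carry the Cilium labels `reserved:world` and `reserved:kube-apiserver`, and uses hypothetical finding names. A real deployment would compare matches against a baseline of expected traffic rather than alerting on every match.

```python
def classify_flow(flow, api_clients=("kube-system", "monitoring")):
    """Return a finding name for a Hubble-style flow dict, or None if nothing matches."""
    src_ns = flow.get("source", {}).get("namespace", "")
    dst = flow.get("destination", {})
    dst_labels = dst.get("labels", [])
    verdict = flow.get("verdict")
    dns_query = flow.get("l7", {}).get("dns", {}).get("query", "")

    # Adversary 3: DNS lookup for a name outside the cluster domain.
    if dns_query and not dns_query.rstrip(".").endswith("cluster.local"):
        return "external-dns-lookup"
    # Adversary 4: API server reached from a namespace with no need for it.
    if "reserved:kube-apiserver" in dst_labels and src_ns not in api_clients:
        return "unexpected-apiserver-access"
    # Adversary 2: forwarded traffic to a destination outside the cluster.
    if "reserved:world" in dst_labels and verdict == "FORWARDED":
        return "external-egress"
    # Adversary 1: cross-namespace traffic into the database namespace.
    if dst.get("namespace") == "database" and src_ns != "database":
        return "policy-gap" if verdict == "FORWARDED" else "blocked-probe"
    return None
```

The namespace names `database`, `kube-system`, and `monitoring` mirror the examples in this section and would be replaced by the namespaces of the cluster being monitored.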
Configuration / Implementation
Step 1 — Enable Hubble in Cilium
If your cluster already uses Cilium, Hubble is included and needs only to be enabled:
# Check if Cilium is running
kubectl get pods -n kube-system -l k8s-app=cilium
# Enable Hubble via Helm upgrade
helm upgrade cilium cilium/cilium \
--namespace kube-system \
--reuse-values \
--set hubble.enabled=true \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set hubble.metrics.enabled="{dns,drop,tcp,flow,icmp,http}" \
--set hubble.export.fileMaxSizeMb=10 \
--set hubble.export.fileMaxBackups=5
# Verify Hubble is running
kubectl get pods -n kube-system -l k8s-app=hubble-relay
kubectl get pods -n kube-system -l k8s-app=hubble-ui
Step 2 — Query flows with Hubble CLI
# Install Hubble CLI
curl -L --remote-name-all https://github.com/cilium/hubble/releases/latest/download/hubble-linux-amd64.tar.gz
tar xzvf hubble-linux-amd64.tar.gz
sudo mv hubble /usr/local/bin/hubble
# Port-forward to Hubble relay
kubectl port-forward -n kube-system svc/hubble-relay 4245:80 &
# Observe all flows in real time
hubble observe --follow
# Filter for specific security-relevant flows:
# 1. Dropped flows (NetworkPolicy denials)
hubble observe --verdict DROPPED --follow
# 2. Flows to Kubernetes API server from unexpected namespaces
# (--not negates the filter that follows it; there is no --not-namespace flag)
hubble observe \
--to-namespace kube-system \
--to-port 443 \
--not --from-namespace kube-system \
--not --from-namespace monitoring \
--follow
# 3. External egress flows from pods
# ("world" is Cilium's reserved identity for destinations outside the cluster)
hubble observe \
--from-pod app/suspicious-pod \
--traffic-direction egress \
--to-identity world \
--follow
# 4. DNS queries for unusual domains
hubble observe --protocol DNS --follow | grep -v "\.svc\.cluster\.local\|\.internal"
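When flows are post-processed by destination address rather than by Cilium identity, text matching on address prefixes is imprecise: a pattern such as `172\.` also hides public addresses like 172.32.0.1. The following Python sketch classifies addresses by CIDR membership instead. It assumes JSON lines in the `hubble observe --output json` shape (with `flow.IP.destination`), and the listed networks are placeholder assumptions to be replaced with the cluster's actual pod, service, and node ranges.

```python
import ipaddress
import json

# Assumed internal ranges; replace with your pod, service, and node CIDRs.
INTERNAL_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def is_external(ip):
    """True if the address falls outside every configured internal network."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in INTERNAL_NETWORKS)

def external_egress(json_lines):
    """Yield (namespace/pod, destination IP) for non-reply flows leaving the internal ranges."""
    for line in json_lines:
        if not line.strip():
            continue
        flow = json.loads(line).get("flow", {})
        dst_ip = flow.get("IP", {}).get("destination")
        if dst_ip and not flow.get("is_reply") and is_external(dst_ip):
            src = flow.get("source", {})
            yield f'{src.get("namespace", "")}/{src.get("pod_name", "")}', dst_ip
```

Passing `sys.stdin` as `json_lines` lets the function consume the output of `hubble observe --output json --follow` through a pipe.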
Step 3 — Export flows to your SIEM
Configure Hubble to export flows in JSON to a file or directly to a log aggregator:
# Export denied and audited flows via Hubble relay and forward them to a log aggregator
# (assumes a Fluent Bit TCP input listening on port 5170)
hubble observe \
--output json \
--follow \
--verdict DROPPED \
--verdict AUDIT 2>/dev/null | \
nc -q 1 fluentbit.monitoring.svc.cluster.local 5170
Better: use Hubble’s built-in static export configuration (confirm that your Cilium chart version exposes hubble.export.static):
# Cilium Helm values: write flows to a file on each node for the log collector to pick up
hubble:
  export:
    static:
      enabled: true
      filePath: /var/run/cilium/hubble/events.log
      fieldMask:
        - time
        - source
        - destination
        - verdict
        - drop_reason
        - l4
        - l7
        - node_name
      allowList:
        # Only export security-relevant flows to reduce volume.
        # Each entry is a JSON-encoded Hubble flow filter; a flow is exported if it matches any entry.
        - '{"verdict":["DROPPED","AUDIT"]}'   # All denials
        - '{"source_pod":["production/"],"destination_port":["443","80"]}'   # Prod traffic to web ports
        - '{"protocol":["dns"]}'   # All DNS queries
Deploy Fluent Bit as a DaemonSet (one collector per node, with the host’s /var/run/cilium/hubble directory mounted) to forward Hubble logs to your SIEM:
# fluent-bit-hubble-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-hubble
  namespace: kube-system
data:
  fluent-bit.conf: |
    [INPUT]
        Name    tail
        Path    /var/run/cilium/hubble/events.log
        Parser  json
        Tag     hubble.flows
    [FILTER]
        Name    record_modifier
        Match   hubble.*
        Record  source kubernetes_network_flow
    [OUTPUT]
        Name    forward
        Match   hubble.*
        Host    fluentd.logging.svc.cluster.local
        Port    24224
Step 4 — Deploy Microsoft Retina for CNI-agnostic flow monitoring
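If your cluster does not use Cilium, Retina works with any CNI. Retina is published as an OCI Helm chart, and per-namespace pod-level metrics are scoped with a MetricsConfiguration resource. The values and label names below follow the Retina documentation as best understood here; verify them against the release you install:
# Install Retina via Helm, pinned to the latest published release
VERSION=$(curl -sL https://api.github.com/repos/microsoft/retina/releases/latest | jq -r .name)
helm upgrade --install retina oci://ghcr.io/microsoft/retina/charts/retina \
--version "$VERSION" \
--namespace kube-system \
--set image.tag="$VERSION" \
--set operator.tag="$VERSION" \
--set operator.enabled=true \
--set operator.enableRetinaEndpoint=true \
--set enablePodLevel=true \
--set remoteContext=true \
--set enabledPlugin_linux="\[dropreason\,packetforward\,linuxutil\,dns\,packetparser\]"
# Scope pod-level drop, forward, and DNS metrics to the production namespace
kubectl apply -f - <<'EOF'
apiVersion: retina.sh/v1alpha1
kind: MetricsConfiguration
metadata:
  name: production-network-monitor
spec:
  contextOptions:
    - metricName: drop_count
      sourceLabels: [ip, podname, port]
      additionalLabels: [direction]
    - metricName: forward_count
      sourceLabels: [ip, podname, port]
      additionalLabels: [direction]
    - metricName: dns_request_count
      sourceLabels: [ip, podname]
  namespaces:
    include:
      - production
EOF
# Metrics are exposed for Prometheus scraping by the Retina agents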
Step 5 — Create alerting rules for security-relevant flow patterns
# Prometheus alerting rules for Hubble/Retina flow metrics
# Assumption: hubble_flows_processed_total carries the source_*, destination_*, and
# traffic_direction labels used below. Hubble only adds context labels when the flow metric
# is enabled with a labelsContext option, and port or reply labels are not among the
# documented labelsContext values. Check the labels in your scrape output first: a matcher
# on a label that is never emitted silently matches nothing, so the alert never fires.
groups:
  - name: kubernetes-network-security
    rules:
      # Alert on pods reaching the Kubernetes API from non-system namespaces
      - alert: UnexpectedKubeAPIAccess
        expr: |
          sum by (source_namespace, source_pod) (
            rate(hubble_flows_processed_total{
              destination_namespace="kube-system",
              destination_port="443",
              source_namespace!~"kube-system|monitoring|cilium"
            }[5m])
          ) > 0
        labels:
          severity: warning
        annotations:
          summary: "Pod in {{ $labels.source_namespace }} is accessing kube-apiserver"
          description: "{{ $labels.source_pod }} is making connections to kube-apiserver:443"
      # Alert on unexpected external egress from production
      # (as written this matches all egress; add a destination matcher to restrict it to external traffic)
      - alert: ProductionExternalEgress
        expr: |
          sum by (source_pod, destination_ip) (
            rate(hubble_flows_processed_total{
              source_namespace="production",
              is_reply="false",
              traffic_direction="EGRESS"
            }[5m])
          ) > 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Production pod making external connection"
      # Alert on spike in dropped flows (potential NetworkPolicy bypass attempt or misconfiguration)
      - alert: NetworkPolicyDropSpike
        expr: |
          rate(hubble_drop_total[5m]) >
          10 * avg_over_time(rate(hubble_drop_total[5m])[1h:5m])
        labels:
          severity: warning
        annotations:
          summary: "Network policy drop rate spike: possible scanning or misconfiguration"
      # Alert on cross-namespace traffic not expected by topology
      - alert: UnexpectedCrossNamespaceFlow
        expr: |
          sum by (source_namespace, destination_namespace) (
            rate(hubble_flows_processed_total{
              verdict="FORWARDED",
              source_namespace=~"dev|test",
              destination_namespace="production"
            }[5m])
          ) > 0
        labels:
          severity: critical
        annotations:
          summary: "Dev/test namespace accessing production: isolation breach"
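The NetworkPolicyDropSpike rule compares the current 5-minute drop rate with ten times its average over the previous hour; the `[1h:5m]` subquery yields twelve samples for that average. The arithmetic can be sketched in Python as follows. The function name is hypothetical, and the `floor` parameter is an addition that the PromQL rule does not have: it keeps a baseline of zero drops from turning every single drop into an alert.

```python
def drop_spike(current_rate, baseline_rates, factor=10.0, floor=0.1):
    """Return True when current_rate exceeds factor times the mean baseline rate.

    floor is an addition not present in the PromQL rule: it sets a minimum
    baseline so that a quiet hour does not make any drop count as a spike.
    """
    if not baseline_rates:
        return False  # no history yet, so there is no baseline to compare against
    baseline = sum(baseline_rates) / len(baseline_rates)
    return current_rate > factor * max(baseline, floor)
```

With an hourly baseline of 2 drops per second, the threshold is 20 drops per second, so a current rate of 25 counts as a spike and a rate of 15 does not.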
Step 6 — Capture DNS flows for C2 detection
# Hubble DNS export configuration
hubble:
  export:
    static:
      allowList:
        # Export all DNS queries for threat hunting
        - '{"protocol":["dns"]}'
# Query DNS flows for suspicious domains
# (JSON output wraps each record in a top-level "flow" object; query names may end with a trailing dot)
hubble observe --protocol DNS --follow --output json | \
jq '.flow | select(.l7.dns.query != null) | select(.l7.dns.query | test("\\.(xyz|top|tk|ml|ga|cf)\\.?$")) |
{time: .time, pod: .source.pod_name, namespace: .source.namespace, query: .l7.dns.query}'
SIEM detection rule for DNS-based C2 indicators:
# Splunk SPL sketch (index and field names are illustrative; adapt them to your ingestion schema)
# Alert when a pod queries more than 50 distinct non-cluster domains within one hour,
# a pattern consistent with DNS tunnelling or generated C2 domains
index=hubble_flows type=DNS NOT query="*.svc.cluster.local*" NOT query="*.internal*"
| bin _time span=1h
| stats dc(query) as unique_domains by source_namespace, source_pod, _time
| where unique_domains > 50
| sort -unique_domains
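The same hourly fan-out logic can be prototyped outside the SIEM before committing to a rule. The following Python sketch assumes DNS events have already been reduced to `(hour, namespace, pod, query)` tuples; the tuple layout, the function names, and the internal suffix list are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed suffixes for names that count as internal to the cluster or organisation.
INTERNAL_SUFFIXES = (".cluster.local", ".internal")

def normalize(query):
    """Lower-case a DNS name and drop the trailing root dot."""
    return query.rstrip(".").lower()

def high_fanout_pods(events, threshold=50):
    """events: iterable of (hour, namespace, pod, query) tuples.

    Returns {(hour, namespace, pod): distinct external domain count} for every
    pod that exceeded the threshold within one hour bucket.
    """
    domains = defaultdict(set)
    for hour, namespace, pod, query in events:
        name = normalize(query)
        if not name.endswith(INTERNAL_SUFFIXES):
            domains[(hour, namespace, pod)].add(name)
    return {key: len(names) for key, names in domains.items() if len(names) > threshold}
```

The threshold of 50 mirrors the detection rule above and would normally be tuned against the cluster's observed DNS behaviour.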
Expected Behaviour
| Signal | Without flow monitoring | With Hubble/Retina |
|---|---|---|
| Pod scanning internal services | Invisible | hubble observe --verdict DROPPED shows blocked probe attempts |
| NetworkPolicy bypass | Invisible until consequences | FORWARDED flow to restricted destination triggers alert |
| External egress from production pod | Invisible | Alert fires within 2 minutes |
| DNS query to suspicious TLD from pod | Invisible | Captured in DNS flow log; matched by threat intel |
| dev namespace accessing production | Invisible | Cross-namespace alert fires immediately |
Verification:
# Generate a test dropped flow and verify it appears in Hubble
# (the service name postgres in namespace database is assumed; substitute your own)
kubectl exec -n app test-pod -- curl --max-time 3 http://postgres.database.svc.cluster.local:5432 2>&1
# This should be blocked by NetworkPolicy, so curl times out
# Verify flow appeared in Hubble
hubble observe --verdict DROPPED --from-namespace app --to-port 5432 --last 20
# Expected: flow entry showing source pod → database:5432 DROPPED
# Verify Prometheus metrics are exported. Hubble metrics are served by the Cilium agents
# behind the hubble-metrics service, not by hubble-relay; run this from inside the cluster.
curl -s http://hubble-metrics.kube-system.svc.cluster.local:9965/metrics | \
grep hubble_flows_processed_total | head -5
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Full flow capture | Complete visibility; enables threat hunting | High log volume in busy clusters; storage cost | Filter to DROPPED + external egress for baseline; enable full capture for production namespaces only |
| DNS flow logging | C2 detection; service discovery visibility | Very high volume (every DNS query from every pod) | Sample or filter to non-internal domains; use rate-limiting in Fluent Bit |
| Retina on non-Cilium cluster | CNI-agnostic; works with Flannel, Calico, AWS VPC CNI | Less deep integration; some features require Cilium | Retina provides the key security flows (egress, denials, DNS); Cilium provides deeper L7 visibility |
| SIEM export of all flows | Rich investigation data | Cost of SIEM storage for flow data | Tier storage: hot (DROPPED + external, 7 days), warm (all production flows, 30 days), cold (archive, 1 year) |
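The tiered-storage mitigation in the last row can be read as a routing decision made per flow at ingestion time. The Python sketch below shows one such reading, assuming Hubble-style flow dictionaries and assuming that external destinations carry the `reserved:world` label; the function name and the tuple it returns are hypothetical.

```python
def retention_tier(flow):
    """Map a Hubble-style flow dict to (tier, retention_days)."""
    src = flow.get("source", {})
    dst = flow.get("destination", {})
    external = "reserved:world" in dst.get("labels", [])
    # Hot: denials and external traffic, kept 7 days for fast investigation.
    if flow.get("verdict") == "DROPPED" or external:
        return ("hot", 7)
    # Warm: any flow that touches the production namespace, kept 30 days.
    if "production" in (src.get("namespace"), dst.get("namespace")):
        return ("warm", 30)
    # Cold: everything else goes to the one-year archive.
    return ("cold", 365)
```

A log pipeline could use the returned tier name to choose the destination index or bucket for each record.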
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Hubble relay is unavailable | hubble observe times out; Prometheus metrics gap | Hubble relay pod not ready; metrics scrape fails | Hubble relay is stateless: restart the pod; flow history is lost but current flows resume |
| Flow log volume overwhelms Fluent Bit | Fluent Bit drops events; gaps in SIEM data | Fluent Bit reports backpressure; tail lag increases | Add Fluent Bit output buffering; reduce Hubble export scope to DROPPED+AUDIT only |
| Alert fires on legitimate cross-namespace traffic | False positive on known integration | On-call investigates; traffic is expected | Add the legitimate namespace pair to the alert exception list; document the approved path |
| Cilium upgrade breaks Hubble export | Flow logs stop appearing in SIEM | Alerting on flow log gap (absence of expected volume) fires | Run cilium status on the affected nodes; restart the agents with kubectl -n kube-system rollout restart daemonset/cilium, which also restarts the Hubble exporter |
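Several of the failure modes above are detected by the absence of flow data rather than by an error. A minimal Python sketch of such a gap check follows; it assumes flow timestamps have already been parsed into `datetime` objects, and the function name and the five-minute default are illustrative choices rather than recommended values.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, max_gap=timedelta(minutes=5), now=None):
    """Return (start, end) pairs where consecutive flow timestamps are more than max_gap apart.

    If now is given, it is treated as a final timestamp so that an exporter
    that has gone silent is reported as a gap ending at the current time.
    """
    ordered = sorted(timestamps)
    if now is not None:
        ordered.append(now)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]
```

Running the check per node separates a single failed agent from a cluster-wide export outage.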
Related Articles
- Kubernetes Network Policies — the NetworkPolicy objects whose enforcement Hubble makes visible
- Network Flow Analysis — broader network flow analysis covering cloud VPC flows and on-prem NetFlow
- Cilium Network Policy — Cilium-specific network policy features that Hubble can monitor
- Lateral Movement Detection — detection rules for lateral movement patterns that network flow data surfaces
- Kubernetes Forensics Post-Compromise — using Hubble flow data as forensic evidence after a suspected compromise