Kubernetes Network Flow Security Monitoring with Cilium Hubble and Retina

Problem

Kubernetes NetworkPolicy controls which pods can communicate with which other pods and services. But NetworkPolicy enforcement without observability is blind enforcement: you know what should be allowed, but you don’t know what is actually happening. Standard Kubernetes audit logs capture API server operations — pod creation, RBAC decisions, secret access — but they contain nothing about the network traffic those pods generate.

The gap matters enormously for security:

Lateral movement is invisible. An attacker who has compromised a pod and is probing internal services (scanning for open database ports, attempting connections to the secrets vault, or opening connections to the Kubernetes API server from a namespace that shouldn’t have access) generates no application log entries, and the probes themselves generate no Kubernetes audit log entries: the API server audits the API requests it receives, not connection attempts. The only record is in network traffic, which default Kubernetes infrastructure does not capture.

NetworkPolicy bypass is undetected. Misconfigured NetworkPolicy objects, CNI plugin bugs, and kernel-level policy bypass techniques can all result in connections that should be blocked succeeding silently. Without flow-level monitoring, a policy bypass is only discovered when its consequences (data exfiltration, lateral pivot) become apparent.

DNS-based detection is missed. Before making a connection, an attacker’s pod performs DNS lookups — for internal service discovery, for C2 callback domains, for data exfiltration endpoints. DNS-layer monitoring in Kubernetes catches these before connections are established, but it requires either CoreDNS query logging (high volume) or eBPF-based DNS interception at the pod level.

Multi-namespace communication patterns are opaque. In a cluster with dozens of namespaces, understanding which namespaces communicate with which others — and which communications are unexpected — requires network flow data. This is the equivalent of VPC flow logs, but for the internal cluster network.

Cilium Hubble (the observability layer for Cilium CNI), Microsoft Retina (a CNI-agnostic eBPF observability plane), and custom eBPF XDP programs address this gap. They capture network flows at the kernel level via eBPF hooks on pod network interfaces, providing per-flow visibility with pod identity labels, namespace, protocol, source/destination IP and port, and DNS query data — all without application modification.

Target systems: Kubernetes 1.24+ clusters with Cilium ≥1.14, or any cluster where Retina or similar eBPF observability can be deployed as a DaemonSet; clusters where lateral movement detection and NetworkPolicy audit are security requirements.


Threat Model

Adversary 1: Lateral movement from a compromised pod. Access level: code execution inside a pod in namespace app. Objective: probe services in namespace database for an accessible database port. Flow monitoring detects: app/compromised-pod → database/* port 5432 DROPPED (the policy blocks the probe, but the attempt is visible) or, worse, app/compromised-pod → database/postgres port 5432 FORWARDED (a policy gap).

Adversary 2: Data exfiltration via unexpected egress. Access level: code execution inside an application pod. Objective: exfiltrate data to an external endpoint. Flow monitoring detects: app/pod → 203.0.113.1:443 FORWARDED, an external connection from a pod that should only talk to internal services.

Adversary 3: C2 callback via DNS. Access level: malicious code in a pod. Objective: resolve a C2 domain and establish a callback connection. DNS flow monitoring detects: app/pod DNS query for c2.attacker.com, answered either with NXDOMAIN (the domain is blocked or sinkholed by a DNS firewall) or with an IP address, in which case the subsequent egress flow to that IP is also visible.

Adversary 4: Kubernetes API server access from an unexpected namespace. Access level: pod in the ml-training namespace with no legitimate need to call the Kubernetes API. Objective: use the pod’s service account token to enumerate cluster resources. Flow monitoring detects: ml-training/pod → kube-apiserver:443 FORWARDED (this should have been blocked by NetworkPolicy).

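Without monitoring, all four attacks proceed undetected until their consequences are visible. With flow monitoring, each one appears in the flow stream as it happens and can raise an alert, or stand out as an anomaly against the flow baseline, within one metrics scrape and rule-evaluation cycle (typically seconds to a few minutes).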
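
The four detection patterns above can be expressed as simple predicates over a single flow record. The following is a minimal sketch, assuming flows in the JSON shape emitted by hubble observe --output json (a top-level "flow" object with snake_case fields such as verdict, is_reply, source.namespace and destination.labels). The finding names and the API_ALLOWED_NAMESPACES set are hypothetical.

```python
from typing import Any

# Hypothetical: namespaces with a legitimate need to reach the Kubernetes API.
API_ALLOWED_NAMESPACES = {"kube-system", "monitoring", "cilium"}


def findings(record: dict[str, Any]) -> list[str]:
    """Return the threat-model detections (hypothetical names) a flow triggers."""
    flow = record.get("flow", record)  # accept wrapped or bare flow objects
    src = flow.get("source") or {}
    dst = flow.get("destination") or {}
    dst_labels = dst.get("labels") or []
    verdict = flow.get("verdict", "")
    out: list[str] = []

    # Adversary 1: a blocked probe is still visible as a DROPPED flow.
    if verdict == "DROPPED":
        out.append("policy-drop")
    # Adversary 2: forwarded, non-reply traffic to a destination outside the
    # cluster; such destinations carry a reserved:world* identity label.
    if (verdict == "FORWARDED" and not flow.get("is_reply", False)
            and any(label.startswith("reserved:world") for label in dst_labels)):
        out.append("external-egress")
    # Adversary 3: every DNS query is an input for C2 domain matching.
    dns = (flow.get("l7") or {}).get("dns") or {}
    if dns.get("query"):
        out.append("dns-query")
    # Adversary 4: API server traffic from a namespace outside the allowlist.
    if ("reserved:kube-apiserver" in dst_labels
            and src.get("namespace") not in API_ALLOWED_NAMESPACES):
        out.append("kube-apiserver-access")
    return out
```

In practice each line of hubble observe output would go through json.loads and then findings(), with non-empty results routed to alerting.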


Configuration / Implementation

Step 1 — Enable Hubble in Cilium

If your cluster already uses Cilium, Hubble is included and needs only to be enabled:

# Check if Cilium is running
kubectl get pods -n kube-system -l k8s-app=cilium

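# Enable Hubble via Helm upgrade. The labelsContext options add the namespace, pod,
# IP and direction labels that the alerting rules in Step 5 rely on; destination_ip
# is high-cardinality, so drop it if you do not need per-IP alerts.
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set hubble.metrics.enabled="{dns,drop:labelsContext=source_namespace\,source_pod\,destination_namespace,tcp,flow:labelsContext=source_namespace\,source_pod\,destination_namespace\,destination_pod\,destination_ip\,traffic_direction,port-distribution:labelsContext=source_namespace\,source_pod\,destination_namespace,icmp,http}" \
  --set hubble.export.fileMaxSizeMb=10 \
  --set hubble.export.fileMaxBackups=5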

# Verify Hubble is running
kubectl get pods -n kube-system -l k8s-app=hubble-relay
kubectl get pods -n kube-system -l k8s-app=hubble-ui

Step 2 — Query flows with Hubble CLI

# Install Hubble CLI
curl -L --remote-name-all https://github.com/cilium/hubble/releases/latest/download/hubble-linux-amd64.tar.gz
tar xzvf hubble-linux-amd64.tar.gz
sudo mv hubble /usr/local/bin/hubble

# Port-forward to Hubble relay
kubectl port-forward -n kube-system svc/hubble-relay 4245:80 &

# Observe all flows in real time
hubble observe --follow

# Filter for specific security-relevant flows:

# 1. Dropped flows (NetworkPolicy denials)
hubble observe --verdict DROPPED --follow

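# 2. Flows to the Kubernetes API server from unexpected namespaces.
#    Hubble negates the next filter with --not; there is no --not-namespace flag.
#    Where the API server runs outside the pod network (e.g. managed control planes),
#    Hubble labels it with the reserved kube-apiserver identity instead of the
#    kube-system namespace, so adjust the destination filter accordingly.
hubble observe \
  --to-namespace kube-system \
  --to-port 443 \
  --not --from-namespace kube-system \
  --not --from-namespace monitoring \
  --follow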

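# 3. External egress flows from a pod. Destinations outside the cluster carry a
#    reserved:world identity label; matching on it is more reliable than grepping
#    out private address prefixes.
hubble observe \
  --from-pod app/suspicious-pod \
  --output json \
  --follow 2>/dev/null | \
  jq -c 'select(any(.flow.destination.labels[]?; startswith("reserved:world")))'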

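# 4. DNS queries for unusual domains (DNS flows appear only for pods selected by a
#    CiliumNetworkPolicy with DNS rules, because Cilium's DNS proxy observes the queries)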
hubble observe --protocol DNS --follow | grep -v "\.svc\.cluster\.local\|\.internal"
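
Filtering external destinations by matching address prefixes as text is error-prone: a pattern such as "172\." treats all of 172.0.0.0/8 as internal although RFC 1918 reserves only 172.16.0.0/12, and a bare "10\." also matches timestamps and other fields. When post-processing JSON output, a CIDR check avoids both problems. A minimal sketch, assuming the Hubble JSON flow shape (destination address under flow.IP.destination, direction under flow.traffic_direction); INTERNAL_NETS is a hypothetical list that you would extend with your own pod and service CIDRs.

```python
import ipaddress
import json
from typing import Iterable, Iterator

# Hypothetical internal ranges: RFC 1918, IPv6 ULA, plus your pod/service CIDRs.
INTERNAL_NETS = [ipaddress.ip_network(cidr) for cidr in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "fc00::/7",
)]


def is_external(ip: str) -> bool:
    """True if the address falls outside every internal range."""
    addr = ipaddress.ip_address(ip)
    # Membership is False when IP versions differ, so v4 and v6 mix safely.
    return not any(addr in net for net in INTERNAL_NETS)


def external_egress(lines: Iterable[str]) -> Iterator[dict]:
    """Yield egress flows whose destination IP is outside the internal ranges."""
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        flow = record.get("flow", record)
        dst_ip = (flow.get("IP") or {}).get("destination")
        if dst_ip and flow.get("traffic_direction") == "EGRESS" and is_external(dst_ip):
            yield flow
```

Feeding sys.stdin from hubble observe --output json --follow into external_egress() would emit only flows leaving the cluster.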

Step 3 — Export flows to your SIEM

Configure Hubble to export flows in JSON to a file or directly to a log aggregator:

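# Export flows via Hubble Relay to stdout and forward them to a Fluent Bit tcp input
# (which listens on port 5170 by default). Repeat --verdict to match several verdicts.
# Run this where the cluster DNS name resolves, or point nc at a reachable address.
hubble observe \
  --output json \
  --follow \
  --verdict DROPPED \
  --verdict AUDIT 2>/dev/null | \
  nc fluentbit.monitoring.svc.cluster.local 5170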

Better: use Hubble’s built-in export configuration:

# Cilium Helm values: write filtered flows as JSON lines to a file on each node for a log collector to tail
hubble:
  export:
    static:
      enabled: true
      filePath: /var/run/cilium/hubble/events.log
      fieldMask:
      - time
      - source
      - destination
      - IP
      - traffic_direction
      - verdict
      - drop_reason_desc
      - l4
      - l7
      - node_name
      # Each entry is a Hubble flow filter: fields within one entry are ANDed,
      # and a flow is exported if it matches any entry.
      allowList:
      # Only export security-relevant flows to reduce volume
      - '{"verdict":["DROPPED","AUDIT"]}'  # All denials
      - '{"destination_port":["443","80"],"source_pod":["production/"]}'  # Prod web egress
      - '{"dns_query":[".*"]}'  # All DNS queries
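
To reason about what a given allowList will export, it helps to model the matching rules: fields inside one entry must all match, entries are alternatives, and an empty list exports everything. The following is a simplified sketch of those semantics covering only the four fields used above, not Hubble's implementation. It treats source_pod as a "namespace/pod-name" prefix and dns_query as a regular expression, and assumes the Hubble JSON flow shape.

```python
import re
from typing import Any


def _field_matches(flow: dict[str, Any], key: str, values: list[str]) -> bool:
    """Check one filter field against a flow (simplified semantics)."""
    if key == "verdict":
        return flow.get("verdict") in values
    if key == "source_pod":
        src = flow.get("source") or {}
        name = f"{src.get('namespace', '')}/{src.get('pod_name', '')}"
        # "production/" is a prefix that selects every pod in the namespace
        return any(name.startswith(prefix) for prefix in values)
    if key == "destination_port":
        l4 = flow.get("l4") or {}
        ports = {str(p.get("destination_port")) for p in l4.values() if isinstance(p, dict)}
        return bool(ports & set(values))
    if key == "dns_query":
        query = ((flow.get("l7") or {}).get("dns") or {}).get("query", "")
        return bool(query) and any(re.search(pattern, query) for pattern in values)
    raise ValueError(f"field not modelled in this sketch: {key}")


def exported(flow: dict[str, Any], allow_list: list[dict[str, list[str]]]) -> bool:
    """Fields within an entry are ANDed; entries are ORed; an empty list exports all."""
    if not allow_list:
        return True
    return any(
        all(_field_matches(flow, key, values) for key, values in entry.items())
        for entry in allow_list
    )
```

For example, a forwarded flow from staging/web-1 to port 443 is not exported by the list above, because the second entry also requires a production/ source.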

Deploy Fluent Bit as a DaemonSet (one instance per node, with /var/run/cilium/hubble mounted from the host) to forward Hubble logs to your SIEM:

# fluent-bit-hubble-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-hubble
  namespace: kube-system
data:
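  fluent-bit.conf: |
    [SERVICE]
        # Load the stock parsers file, which defines the "json" parser used below;
        # adjust the path if your image keeps it elsewhere
        Parsers_File /fluent-bit/etc/parsers.conf

    [INPUT]
        Name tail
        Path /var/run/cilium/hubble/events.log
        Parser json
        Tag hubble.flows
        # Cap memory used while the output is backpressured
        Mem_Buf_Limit 50MB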

    [FILTER]
        Name record_modifier
        Match hubble.*
        Record source kubernetes_network_flow

    [OUTPUT]
        Name  forward
        Match hubble.*
        Host  fluentd.logging.svc.cluster.local
        Port  24224
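
Downstream searches are easier to write against flat field names than against nested Hubble JSON. The following is a minimal sketch of a flattening step, as it might run in a collector-side processing hook or a small forwarder. The output field names (source_namespace, source_pod, query, type and so on) are hypothetical and chosen to match the SPL search in Step 6, and the input is assumed to be one line of the Hubble JSON export.

```python
import json
from typing import Any


def flatten(line: str) -> dict[str, Any]:
    """Flatten one exported Hubble JSON line into SIEM-friendly fields (names hypothetical)."""
    record = json.loads(line)
    flow = record.get("flow", record)
    src = flow.get("source") or {}
    dst = flow.get("destination") or {}
    # l4 holds a single protocol key such as "TCP" or "UDP"
    l4 = next(iter((flow.get("l4") or {}).values()), {})
    dns = (flow.get("l7") or {}).get("dns") or {}
    return {
        "time": flow.get("time") or record.get("time"),
        "node_name": flow.get("node_name") or record.get("node_name"),
        "verdict": flow.get("verdict"),
        "drop_reason": flow.get("drop_reason_desc"),
        "source_namespace": src.get("namespace", ""),
        "source_pod": src.get("pod_name", ""),
        "destination_namespace": dst.get("namespace", ""),
        "destination_pod": dst.get("pod_name", ""),
        "destination_ip": (flow.get("IP") or {}).get("destination"),
        "destination_port": l4.get("destination_port"),
        "type": "DNS" if dns else flow.get("Type", ""),
        # DNS names carry a trailing dot in Hubble output; strip it for matching
        "query": dns.get("query", "").rstrip("."),
    }
```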

Step 4 — Deploy Microsoft Retina for CNI-agnostic flow monitoring

If your cluster does not use Cilium, Retina works with any CNI:

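# Install Retina via Helm. The chart is published as an OCI artifact; check the Retina
# installation docs for the current version and the plugin set you need.
VERSION=$(curl -sL https://api.github.com/repos/microsoft/retina/releases/latest | jq -r .name)
helm upgrade --install retina oci://ghcr.io/microsoft/retina/charts/retina \
  --version "$VERSION" \
  --namespace kube-system \
  --set image.tag="$VERSION" \
  --set operator.tag="$VERSION" \
  --set operator.enabled=true \
  --set enablePodLevel=true \
  --set enabledPlugin_linux="\[dropreason\,packetforward\,linuxutil\,dns\,packetparser\]"

# RetinaEndpoint objects are managed by the Retina operator, not created by users.
# Pod-level metrics for specific namespaces are scoped with a MetricsConfiguration
# resource; verify the schema against the Retina version you deploy.
kubectl apply -f - <<'EOF'
apiVersion: retina.sh/v1alpha1
kind: MetricsConfiguration
metadata:
  name: production-network-monitor
spec:
  contextOptions:
    - metricName: drop_count
      sourceLabels: [ip, podname, port]
      destinationLabels: [ip, podname, port]
      additionalLabels: [direction]
    - metricName: forward_count
      sourceLabels: [ip, podname, port]
      destinationLabels: [ip, podname, port]
      additionalLabels: [direction]
  namespaces:
    include:
      - production
EOF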

Step 5 — Create alerting rules for security-relevant flow patterns

# Prometheus alerting rules for Hubble flow metrics (Retina exports differently named metrics).
# The namespace, pod, IP and direction labels exist only if the Hubble metrics are
# configured with matching labelsContext options. Hubble flow metrics carry no port or
# is_reply label, so the API server rule uses the port-distribution metric instead.

groups:
- name: kubernetes-network-security
  rules:
  # Alert on pods reaching port 443 of kube-system workloads from non-system namespaces.
  # Where the API server runs outside the pod network (e.g. managed control planes)
  # it has no namespace label; use the hubble observe query from Step 2 there.
  - alert: UnexpectedKubeAPIAccess
    expr: |
      sum by (source_namespace, source_pod) (
        rate(hubble_port_distribution_total{
          port="443",
          destination_namespace="kube-system",
          source_namespace!~"kube-system|monitoring|cilium"
        }[5m])
      ) > 0
    labels:
      severity: warning
    annotations:
      summary: "Pod in {{ $labels.source_namespace }} is accessing kube-apiserver"
      description: "{{ $labels.source_pod }} is making connections to kube-apiserver:443"

  # Alert on external egress from production. Destinations outside the cluster have
  # no namespace label (host-network endpoints also match, so expect a few exceptions).
  - alert: ProductionExternalEgress
    expr: |
      sum by (source_pod, destination_ip) (
        rate(hubble_flows_processed_total{
          verdict="FORWARDED",
          source_namespace="production",
          destination_namespace="",
          traffic_direction=~"(?i)egress"
        }[5m])
      ) > 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Production pod making external connection"

  # Alert on a spike in dropped flows (potential NetworkPolicy bypass attempt or
  # misconfiguration). The floor stops a zero baseline from firing on a single drop.
  - alert: NetworkPolicyDropSpike
    expr: |
      rate(hubble_drop_total[5m]) >
        10 * avg_over_time(rate(hubble_drop_total[5m])[1h:5m])
      and rate(hubble_drop_total[5m]) > 0.1
    labels:
      severity: warning
    annotations:
      summary: "Network policy drop rate spike: possible scanning or misconfiguration"

  # Alert on cross-namespace traffic not expected by topology
  - alert: UnexpectedCrossNamespaceFlow
    expr: |
      sum by (source_namespace, destination_namespace) (
        rate(hubble_flows_processed_total{
          verdict="FORWARDED",
          source_namespace=~"dev|test",
          destination_namespace="production"
        }[5m])
      ) > 0
    labels:
      severity: critical
    annotations:
      summary: "Dev/test namespace accessing production: isolation breach"
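
The drop-spike expression compares the current 5-minute drop rate with ten times its average over the past hour. Because the current sample is itself part of that hourly average, the effective threshold is higher than it looks, and without a minimum-rate guard a zero baseline lets a single drop fire the alert. A minimal Python model of the comparison makes both effects easy to check. It assumes twelve equally spaced 5-minute samples per hour and a hypothetical floor of 0.1 drops per second.

```python
from statistics import mean


def drop_spike(rates: list[float], factor: float = 10.0, floor: float = 0.1) -> bool:
    """Model of: rate > factor * avg_over_time(rate[1h:5m]) and rate > floor.

    rates holds per-second drop rates sampled every 5 minutes over the last hour,
    oldest first. The last element is the current rate and, as in the PromQL
    subquery, it is included in the average.
    """
    if not rates:
        return False
    current = rates[-1]
    return current > factor * mean(rates) and current > floor
```

With twelve samples, a steady prior rate b and a current rate c, the condition c > 10 * (11b + c) / 12 simplifies to c > 55b, so the rule fires only at roughly 55 times the prior rate. Lower the factor if 10x was the intended sensitivity.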

Step 6 — Capture DNS flows for C2 detection

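# Hubble DNS export configuration (merge into the static export allowList from Step 3).
# DNS flows are produced only for pods selected by a CiliumNetworkPolicy with DNS rules,
# because the queries are observed by Cilium's DNS proxy.
hubble:
  export:
    static:
      allowList:
      # Export all DNS queries for threat hunting
      - '{"dns_query":[".*"]}'

# Query DNS flows for suspicious TLDs (Hubble reports query names with a trailing dot)
hubble observe --protocol dns --follow --output json | \
  jq -c 'select(.flow.l7.dns.query // "" | test("\\.(xyz|top|tk|ml|ga|cf)\\.?$"))
      | {time: .flow.time, pod: .flow.source.pod_name, namespace: .flow.source.namespace, query: .flow.l7.dns.query}'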
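
The same TLD check can be reused outside jq, for example in an enrichment step before flows reach the SIEM. A minimal sketch: it strips the trailing dot that DNS names carry in Hubble output, normalises case, skips in-cluster names, and matches the illustrative TLD list from the jq example. The list is a heuristic rather than threat intelligence, and the function name is hypothetical.

```python
import re

# Illustrative TLD list from the jq example; not a threat-intel feed.
SUSPICIOUS_TLD = re.compile(r"\.(xyz|top|tk|ml|ga|cf)$")
CLUSTER_SUFFIXES = (".svc.cluster.local", ".internal")


def suspicious_query(query: str) -> bool:
    """True if a DNS query name ends in one of the listed TLDs."""
    name = query.rstrip(".").lower()  # Hubble reports FQDNs with a trailing dot
    if name.endswith(CLUSTER_SUFFIXES):
        return False
    return bool(SUSPICIOUS_TLD.search(name))
```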

SIEM detection rule for DNS-based C2 indicators:

# Splunk SPL: flag pods that resolve an unusually large number of distinct
# non-cluster domains within an hour (a common sign of DGA-based C2 or DNS tunnelling).
# Field names (type, query, source_namespace, source_pod) assume the flow JSON was
# flattened at ingestion; adjust them to your index schema.

index=hubble_flows type=DNS NOT query="*.svc.cluster.local*" NOT query="*.internal*"
| bin _time span=1h
| stats dc(query) AS unique_domains BY _time, source_namespace, source_pod
| where unique_domains > 50
| sort - unique_domains
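
The search above counts, per pod and per hour, the distinct domains outside the cluster suffixes and flags pods above 50. A minimal Python model of that aggregation is useful for testing the threshold against exported flow data before deploying the search. It assumes events are (timestamp, namespace, pod, query) tuples already extracted from the flow logs, with ISO 8601 UTC timestamps so that the first 13 characters identify the hour.

```python
from collections import defaultdict
from typing import Iterable

CLUSTER_SUFFIXES = (".svc.cluster.local", ".internal")


def noisy_pods(events: Iterable[tuple[str, str, str, str]],
               threshold: int = 50) -> dict[tuple[str, str, str], int]:
    """Count distinct non-cluster domains per (hour, namespace, pod).

    Returns only the buckets whose distinct count exceeds threshold.
    """
    buckets: dict[tuple[str, str, str], set[str]] = defaultdict(set)
    for ts, namespace, pod, query in events:
        name = query.rstrip(".").lower()  # normalise trailing dot and case
        if name.endswith(CLUSTER_SUFFIXES):
            continue
        hour = ts[:13]  # "YYYY-MM-DDTHH" for ISO 8601 UTC timestamps
        buckets[(hour, namespace, pod)].add(name)
    return {key: len(names) for key, names in buckets.items() if len(names) > threshold}
```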

Expected Behaviour

| Signal | Without flow monitoring | With Hubble/Retina |
|---|---|---|
| Pod scanning internal services | Invisible | hubble observe --verdict DROPPED shows blocked probe attempts |
| NetworkPolicy bypass | Invisible until consequences | FORWARDED flow to restricted destination triggers alert |
| External egress from production pod | Invisible | Alert fires within 2 minutes |
| DNS query to suspicious TLD from pod | Invisible | Captured in DNS flow log; matched by threat intel |
| dev namespace accessing production | Invisible | Cross-namespace alert fires immediately |

Verification:

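# Generate a test dropped flow and verify it appears in Hubble
# (test-pod and the postgres Service in namespace database are placeholders)
kubectl exec -n app test-pod -- curl --max-time 3 http://postgres.database.svc.cluster.local:5432 2>&1
# This should be blocked by NetworkPolicy

# Verify the flow appeared in Hubble
hubble observe --verdict DROPPED --from-namespace app --to-port 5432 --last 50
# Expected: flow entry showing source pod → database/postgres:5432 DROPPED

# Verify Prometheus metrics are exported. Hubble metrics are served by each
# Cilium agent on port 9965, not by Hubble Relay.
kubectl -n kube-system port-forward \
  "$(kubectl -n kube-system get pod -l k8s-app=cilium -o name | head -1)" 9965:9965 &
curl -s http://localhost:9965/metrics | \
  grep hubble_flows_processed_total | head -5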

Trade-offs

| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Full flow capture | Complete visibility; enables threat hunting | High log volume in busy clusters; storage cost | Filter to DROPPED + external egress for baseline; enable full capture for production namespaces only |
| DNS flow logging | C2 detection; service discovery visibility | Very high volume (every DNS query from every pod) | Sample or filter to non-internal domains; use rate-limiting in Fluent Bit |
| Retina on non-Cilium cluster | CNI-agnostic; works with Flannel, Calico, AWS VPC | Less deep integration; some features require Cilium | Retina provides the key security flows (egress, denials, DNS); Cilium provides deeper L7 visibility |
| SIEM export of all flows | Rich investigation data | Cost of SIEM storage for flow data | Tier storage: hot (DROPPED + external, 7 days), warm (all production flows, 30 days), cold (archive, 1 year) |

Failure Modes

| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Hubble Relay is unavailable | hubble observe times out | Relay pod not ready | Relay is stateless: restart the pod. Recent flows stay in each Cilium agent's ring buffer and are queryable again once Relay recovers; flows older than the buffer are lost unless exported. Hubble metrics are served by the agents, so scraping continues |
| Flow log volume overwhelms Fluent Bit | Fluent Bit drops events; gaps in SIEM data | Fluent Bit reports backpressure; tail lag increases | Add Fluent Bit output buffering (e.g. filesystem storage); reduce Hubble export scope to DROPPED + AUDIT only |
| Alert fires on legitimate cross-namespace traffic | False positive on known integration | On-call investigates; traffic is expected | Add the legitimate namespace pair to the alert exception list; document the approved path |
| Cilium upgrade breaks Hubble export | Flow logs stop appearing in SIEM | An alert on flow log gaps (absence of expected volume) fires | Check kubectl -n kube-system exec ds/cilium -- cilium status; confirm the hubble.export values survived the upgrade (helm get values cilium -n kube-system); restart the agents with kubectl -n kube-system rollout restart ds/cilium |
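
An exception list of approved namespace pairs is easiest to maintain as an explicit set of (source namespace, destination namespace) tuples that the alerting logic consults. The following is a minimal sketch that counts forwarded, non-reply flows between namespaces and reports the pairs outside that set. It assumes flow dictionaries in the Hubble JSON shape; APPROVED and its contents are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Hypothetical approved cross-namespace paths: (source, destination).
APPROVED = {("app", "database"), ("monitoring", "app")}


def unexpected_pairs(flows: Iterable[dict],
                     approved: set[tuple[str, str]] = APPROVED) -> Counter:
    """Count forwarded, non-reply cross-namespace flows not in the approved set."""
    counts: Counter = Counter()
    for flow in flows:
        if flow.get("verdict") != "FORWARDED" or flow.get("is_reply", False):
            continue  # drops are covered by other alerts; replies mirror requests
        src = (flow.get("source") or {}).get("namespace", "")
        dst = (flow.get("destination") or {}).get("namespace", "")
        if src and dst and src != dst and (src, dst) not in approved:
            counts[(src, dst)] += 1
    return counts
```

Keeping the set in version control alongside the NetworkPolicy manifests gives each approved path a reviewable history.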