Behavioral Detection for Active CVE Exploitation in Production

The Problem

The patch window is real. Between CVE publication and complete patch deployment across a production fleet, there is a period of hours to days during which an exploit can succeed. Vulnerability scanners tell you what’s unpatched. They don’t tell you whether it’s being exploited right now. Behavioral detection fills that gap.

Behavioral detection for CVE exploitation differs from general anomaly detection in an important way: each CVE has a specific exploitation pattern. A buffer overflow in NGINX’s HTTP/2 HPACK decoder has a different syscall signature than a deserialization gadget chain in a Java application, which has a different signature than a use-after-free in the kernel’s io_uring implementation. Writing detection rules that match a specific CVE’s exploitation pattern — rather than “anything unusual” — produces high-fidelity alerts with low false-positive rates.

The data sources and signal types by exploit class:

Kernel LPE exploits (io_uring, netfilter, nf_tables, namespace UAF): characteristic sequences of clone(), unshare(), setuid() calls from a process that was previously unprivileged; sudden appearance of a root shell spawned from a non-root container process.

Web application CVEs (path traversal, SSTI, SSRF, deserialization): anomalous outbound connections from web server processes; execve() calls from the web server UID; access to files outside the web root from the web process context.

Container escape CVEs (runc, containerd, OCI runtime): mount() syscalls from within a container namespace that resolve to host paths; container process appearing in host PID namespace unexpectedly.

Memory corruption CVEs (heap overflow, UAF in libraries): program receiving a SIGSEGV or SIGABRT that is immediately caught and continues executing (exploit successfully turns a crash into code execution); /proc/self/maps read followed by mprotect() calls (heap spray or JIT-style exploitation).

Target systems: Linux hosts and Kubernetes nodes with Falco, Tetragon, or custom eBPF programs; Prometheus or a SIEM (Splunk, Elastic, OpenSearch) for correlation; structured audit logging enabled.

Threat Model

1. Attacker exploiting unpatched kernel LPE CVE (container breakout path). Objective: exploit a known kernel CVE from within a container to gain root on the host. Impact: full host compromise; lateral movement to other containers. Behavioral signal: privilege escalation syscall sequence; setuid(0) from non-root; ns_capable bypass pattern.

2. Attacker exploiting web app CVE via remote code execution (external). Objective: exploit a deserialization or SSTI CVE in a web application to spawn a reverse shell. Impact: arbitrary command execution under web server context; pivot to broader network access. Behavioral signal: execve() from web server process with unusual command; outbound connection to non-whitelisted IP.

3. Attacker exploiting CVE in a shared library (supply chain). Objective: exploit a CVE in a commonly-used library (OpenSSL, libpng, zlib) that was patched upstream but not yet updated in the application container. Impact: varies by library and CVE class; typically memory disclosure or RCE. Behavioral signal: abnormal memory access patterns; /proc/self/mem reads; signal handling anomalies.

4. Attacker performing CVE enumeration (reconnaissance). Objective: send payloads for known CVEs to enumerate which are present and unpatched. Impact: reconnaissance informs targeted exploitation. Behavioral signal: repeated error responses from the same source IP with CVE-associated request patterns.
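
The reconnaissance signal in scenario 4 (repeated error responses from one source IP carrying CVE-associated request patterns) can be approximated with a sliding-window counter. The following is a minimal sketch, not a production detector: the `CVE_PATTERNS` markers, the event tuple layout, and the window and threshold values are all illustrative assumptions, and a real deployment would parse its own access-log format.

```python
from collections import defaultdict, deque

# Hypothetical request markers for illustration only; derive real ones
# from the CVEs that matter for your own stack.
CVE_PATTERNS = ("${jndi:", "../../", "{{7*7}}")


def find_enumerators(events, window_s=60, threshold=3):
    """Return source IPs with >= threshold suspicious error responses
    inside any window_s-second window.

    events: iterable of (timestamp_s, src_ip, status, request_line),
    sorted by timestamp.
    """
    recent = defaultdict(deque)  # src_ip -> timestamps of suspicious hits
    flagged = set()
    for ts, ip, status, request in events:
        if status < 400:
            continue  # only error responses count
        if not any(p in request for p in CVE_PATTERNS):
            continue  # ignore ordinary 404s and similar noise
        hits = recent[ip]
        hits.append(ts)
        # Drop hits that have fallen out of the window.
        while hits and ts - hits[0] > window_s:
            hits.popleft()
        if len(hits) >= threshold:
            flagged.add(ip)
    return flagged


sample = [
    (0, "203.0.113.7", 404, "GET /?q=${jndi:ldap://x} HTTP/1.1"),
    (5, "203.0.113.7", 400, "GET /static/../../etc/passwd HTTP/1.1"),
    (9, "198.51.100.2", 404, "GET /favicon.ico HTTP/1.1"),
    (12, "203.0.113.7", 500, "GET /render?t={{7*7}} HTTP/1.1"),
]
print(find_enumerators(sample))
```

Requiring both an error status and a pattern match keeps ordinary crawler 404s from counting toward the threshold.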

Hardening Configuration

Falco Rules for CVE-Class Behavioral Detection

# /etc/falco/rules/cve-behavioral-detection.yaml

# ─────────────────────────────────────────────────────────────────
# CVE Class: Kernel LPE via namespace privilege escalation
# Matches patterns seen in io_uring, nf_tables, and namespace UAFs
# ─────────────────────────────────────────────────────────────────
- rule: Potential Kernel LPE Exploitation - Privilege Escalation Sequence
  desc: >
    Detects a non-root process calling setuid/setgid, the final step of many
    namespace-based LPE exploits. Falco evaluates events one at a time, so the
    preceding unshare/clone calls are not checked by this rule.
  condition: >
    (syscall.type = setuid or syscall.type = setgid)
    and not user.uid = 0
    and proc.pname != "su"
    and proc.pname != "sudo"
    and not proc.name in (trusted_privilege_executables)
  output: >
    Potential LPE exploitation detected (user=%user.name uid=%user.uid
    proc=%proc.name pid=%proc.pid parent=%proc.pname cmdline=%proc.cmdline
    container=%container.name)
  priority: CRITICAL
  tags: [cve, lpe, privilege-escalation]

# ─────────────────────────────────────────────────────────────────
# CVE Class: Web application RCE — shell spawned from web process
# ─────────────────────────────────────────────────────────────────
- rule: Potential Web App RCE - Shell Spawned from Web Server
  desc: >
    A shell (sh, bash, dash) spawned by a web server process indicates
    potential RCE exploitation via command injection, SSTI, or deserialization.
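  # The last clause excludes a health check that runs `sh -c echo ...`.
  # (Comments must stay outside the folded scalar, or they become part of the condition.)
  condition: >
    spawned_process
    and shell_procs
    and proc.pname in (web_server_procs)
    and not proc.args startswith "-c echo"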
  output: >
    Shell spawned from web server process (shell=%proc.name
    parent=%proc.pname pid=%proc.pid cmdline=%proc.cmdline
    container=%container.name image=%container.image.repository)
  priority: CRITICAL
  tags: [cve, rce, web]

# ─────────────────────────────────────────────────────────────────
# CVE Class: Container escape — mount from within container to host path
# ─────────────────────────────────────────────────────────────────
- rule: Potential Container Escape - Suspicious Mount
  desc: >
    A mount() syscall from within a container namespace targeting a
    device or path that resolves to host filesystem is indicative of
    container escape exploitation (runc, containerd CVE classes).
  # Privileged containers are expected to mount, so they are excluded.
  # The mount event exposes its target path as evt.arg.dir.
  condition: >
    syscall.type = mount
    and container.id != host
    and (evt.arg.dev startswith "/dev/sd" or evt.arg.dev startswith "/dev/nvme"
         or evt.arg.dir startswith "/proc/1/")
    and not container.privileged = true
  output: >
    Suspicious mount attempt from container (container=%container.name
    image=%container.image.repository dev=%evt.arg.dev
    target=%evt.arg.dir user=%user.name)
  priority: CRITICAL
  tags: [cve, container-escape, mount]

# ─────────────────────────────────────────────────────────────────
# CVE Class: Memory corruption — /proc/self/maps read + mprotect pattern
# ─────────────────────────────────────────────────────────────────
- rule: Potential Memory Corruption Exploitation - ASLR Bypass Pattern
  desc: >
    A non-JIT process marks memory executable with mprotect(), the step that
    follows a /proc/self/maps read (ASLR bypass) in many heap-based exploits.
    Falco conditions are stateless and have no time-window operator, so the
    preceding maps read has to be correlated downstream.
  # jit_runtimes (JVMs, V8, etc.) legitimately make memory executable.
  condition: >
    syscall.type = mprotect
    and evt.arg.prot contains PROT_EXEC
    and not proc.name in (jit_runtimes)
  output: >
    Potential ASLR bypass + mprotect pattern (proc=%proc.name pid=%proc.pid
    container=%container.name)
  priority: WARNING
  tags: [cve, memory-corruption, aslr]

# Lists referenced with `in (...)` above.
# Falco expects a list to be defined before the rules that use it,
# so place these at the top of the file.
- list: web_server_procs
  items: [nginx, apache2, httpd, gunicorn, uvicorn, node, php-fpm]

- list: jit_runtimes
  items: [java, node, python3, ruby, dotnet]

- list: trusted_privilege_executables
  items: [newgrp, sg, passwd, chsh, chfn, login, sshd, polkit]
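
The memory-corruption pattern above depends on ordering: a /proc/self/maps read followed shortly by an executable mprotect() from the same process. That ordering can be checked by a small stateful consumer sitting downstream of the rule engine. The sketch below assumes events have already been normalized into dicts with `ts`, `pid`, and `kind` keys; those names and the two `kind` values are hypothetical, not a Falco output format.

```python
def correlate_maps_then_mprotect(events, window_s=2.0):
    """Yield (pid, maps_ts, mprotect_ts) when a PID reads /proc/self/maps
    and then calls mprotect with PROT_EXEC within window_s seconds.

    events: iterable of dicts with keys 'ts' (float seconds), 'pid', and
    'kind' ('maps_read' or 'mprotect_exec'), sorted by ts.
    """
    last_maps_read = {}  # pid -> timestamp of most recent maps read
    for ev in events:
        pid = ev["pid"]
        if ev["kind"] == "maps_read":
            last_maps_read[pid] = ev["ts"]
        elif ev["kind"] == "mprotect_exec":
            start = last_maps_read.get(pid)
            if start is not None and ev["ts"] - start <= window_s:
                yield (pid, start, ev["ts"])
                del last_maps_read[pid]  # at most one match per maps read


events = [
    {"ts": 10.0, "pid": 100, "kind": "maps_read"},
    {"ts": 10.5, "pid": 200, "kind": "mprotect_exec"},  # no prior maps read
    {"ts": 11.2, "pid": 100, "kind": "mprotect_exec"},  # within 2s: match
    {"ts": 20.0, "pid": 300, "kind": "maps_read"},
    {"ts": 25.0, "pid": 300, "kind": "mprotect_exec"},  # too late: no match
]
matches = list(correlate_maps_then_mprotect(events))
print(matches)
```

Keying state by PID keeps unrelated processes from satisfying each other's sequence. A long-running consumer would also need to expire stale entries, which this sketch omits.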

Tetragon TracingPolicy for Fine-Grained CVE Detection

Tetragon provides lower-level kernel call tracing with argument inspection:

# tetragon-cve-detection.yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: cve-lpe-detection
spec:
  kprobes:
    # Detect io_uring-based CVE exploitation patterns
    - call: "io_uring_create"
      syscall: false
      args:
        - index: 0
          type: "uint"   # entries
      selectors:
        - matchPIDs:
            - operator: NotIn
              isNamespacePID: true
              values: [1]   # Not PID 1 (init)
          matchCapabilities:
            - type: Effective
              operator: NotIn
              values: ["CAP_SYS_ADMIN"]
          matchActions:
            - action: Sigkill   # Kill the process that reached this kernel function

    # Detect kernel keyring exploitation (as used in several LPE CVEs)
    - call: "sys_request_key"
      syscall: true
      args:
        - index: 0
          type: "string"   # key type, a NUL-terminated string
      selectors:
        - matchArgs:
            - index: 0
              operator: Equal
              values: ["user"]
          matchCapabilities:
            - type: Effective
              operator: NotIn
              values: ["CAP_SYS_ADMIN", "CAP_SETUID"]
          matchActions:
            - action: Post   # Log; don't kill (request_key is also benign)

SIEM Correlation Rules

For teams forwarding Falco/Tetragon events to a SIEM, implement correlation rules that detect multi-stage exploit chains:

# Splunk SPL: detect LPE attempt followed by root shell within 60 seconds
index=falco_events 
| eval is_lpe_attempt = if(like(rule, "Potential Kernel LPE Exploitation%"), 1, 0)
| eval is_root_shell  = if(like(rule, "%Shell%") AND 'user.uid'=0, 1, 0)
| transaction host maxspan=60s startswith="is_lpe_attempt=1" endswith="is_root_shell=1"
| where eventcount >= 2
| table _time host container.name proc.name
| sort -_time

# Elasticsearch KQL: container escape attempt in the last hour
event.kind: "alert" and
rule.name: "Potential Container Escape - Suspicious Mount" and
@timestamp >= now-1h

CVE-Specific Detection: Adding Context at Alert Time

When a new CVE drops, create a CVE-specific detection rule that captures the exact exploit pattern:

#!/usr/bin/env python3
# generate-cve-rule.py — scaffold a Falco rule from CVE details

import sys

if len(sys.argv) != 2:
    sys.exit(f"usage: {sys.argv[0]} CVE-ID")

CVE_ID = sys.argv[1]   # e.g., CVE-2026-XXXXX

CVE_DB = {
    "CVE-2026-XXXXX": {
        "description": "io_uring sqpoll thread capability bypass LPE",
        "syscall_pattern": ["io_uring_setup", "setuid"],
        "process_condition": "user.uid != 0",   # only flag unprivileged callers
        "priority": "CRITICAL",
    }
}

cve = CVE_DB.get(CVE_ID)
if cve is None:
    # sys.exit with a string prints it to stderr and exits with status 1
    sys.exit(f"Unknown CVE: {CVE_ID}")

rule = f"""
- rule: Active Exploitation Attempt - {CVE_ID}
  desc: >
    Behavioral detection rule for {CVE_ID}: {cve['description']}.
    This rule was generated at CVE publication time.
  condition: >
    (syscall.type in ({', '.join(cve['syscall_pattern'])}))
    and {cve['process_condition']}
  output: >
    POTENTIAL {CVE_ID} EXPLOITATION (proc=%proc.name pid=%proc.pid
    user=%user.name container=%container.name)
  priority: {cve['priority']}
  tags: [cve, {CVE_ID.lower().replace('-', '_')}]
"""
print(rule)

Alerting and Response Integration

# alertmanager: route CVE exploitation alerts to PagerDuty with high urgency
route:
  receiver: security-oncall
  group_by: [alertname, container_name]
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 1h

  routes:
    - match:
        tags: cve
      receiver: security-oncall-critical
      continue: true

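receivers:
  # Default receiver referenced by the top-level route; add its notifier config here.
  - name: security-oncall
  - name: security-oncall-critical
    pagerduty_configs:
      # Alertmanager does not expand environment variables; substitute the key at deploy time.
      - routing_key: "${PAGERDUTY_KEY}"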
        severity: critical
        description: "{{ .CommonAnnotations.output }}"
        details:
          container: "{{ .CommonLabels.container_name }}"
          host: "{{ .CommonLabels.node }}"
          runbook: "https://wiki.internal/runbooks/cve-exploitation-response"
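
The routing above only works if incoming alerts carry the labels and annotations it references: a `tags` label equal to `cve`, `container_name` and `node` labels, and an `output` annotation. The sketch below shows one way a custom forwarder might map a Falco JSON event onto an Alertmanager v2 alert object. The `falco_to_alert` function and its label mapping are assumptions for illustration; an off-the-shelf forwarder will have its own mapping, which should be checked against the route.

```python
import json


def falco_to_alert(event):
    """Map one Falco JSON event to an Alertmanager v2 alert object."""
    fields = event.get("output_fields") or {}
    tags = event.get("tags") or []
    return {
        "labels": {
            "alertname": event["rule"],
            "container_name": str(fields.get("container.name") or "unknown"),
            "node": event.get("hostname") or "unknown",
            # `match` compares label values exactly, so emit the bare
            # string "cve" rather than a joined list of tags.
            "tags": "cve" if "cve" in tags else "other",
        },
        "annotations": {"output": event["output"]},
    }


event = {
    "rule": "Potential Web App RCE - Shell Spawned from Web Server",
    "priority": "Critical",
    "hostname": "node-a",
    "output": "Shell spawned from web server process (shell=sh parent=nginx)",
    "output_fields": {"container.name": "frontend", "proc.name": "sh"},
    "tags": ["cve", "rce", "web"],
}
payload = [falco_to_alert(event)]  # JSON array, as expected by POST /api/v2/alerts
print(json.dumps(payload, indent=2))
```

Collapsing the tag list to a single `cve` value is a deliberate simplification so that the exact-match route fires; a regex matcher on a joined tag string would be an alternative.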

Expected Behaviour After Hardening

| Attack Scenario | Without Detection | With Behavioral Detection |
| --- | --- | --- |
| Kernel LPE exploit from container | Succeeds silently; discovered only in post-incident review | Falco alert fires on setuid from non-root; PagerDuty page within 30s |
| Web app RCE spawning reverse shell | Shell executes; attacker has session | Shell-spawned-from-web-server rule fires; container isolated or killed |
| Container escape via runc CVE | Host compromised; other containers at risk | Suspicious mount syscall detected; Tetragon kills the process |
| Exploit attempt with wrong payload (false start) | No detection | Alert fires; analysts investigate and confirm the exploit attempt |

Verification:

# Test Falco rule without triggering real exploit
# Simulate the syscall sequence using a test binary
cat > /tmp/test-lpe-sim.c << 'EOF'
#include <unistd.h>
int main(void) {
    // Simulate: non-root setuid attempt (will fail but should trigger Falco)
    setuid(0);
    return 0;
}
EOF
gcc -o /tmp/test-lpe-sim /tmp/test-lpe-sim.c
sudo -u nobody /tmp/test-lpe-sim

# Check Falco output
journalctl -u falco -n 20 | grep "LPE"
# Expected: alert line with process details

Trade-offs and Operational Considerations

| Aspect | Benefit | Cost | Mitigation |
| --- | --- | --- | --- |
| Syscall-level detection | Catches exploitation regardless of network layer | High telemetry volume; storage cost | Sample high-frequency benign syscalls; capture security-relevant calls in full |
| CVE-specific rules | Low false-positive rate for known CVE patterns | Must write a new rule for each new CVE | Template-based rule generation; maintain a CVE rule library |
| Tetragon process kill action | Immediate containment; stops exploit mid-execution | Risk of killing a legitimate process that matches the pattern | Start with Post action (log only); promote to Sigkill after tuning |
| SIEM correlation for multi-stage chains | Detects sophisticated attackers who split exploit stages | Requires event timeline correlation; complex queries | Use transaction-based correlation (Splunk) or EQL sequences (Elastic) |

Failure Modes

| Failure | Symptom | Detection | Recovery |
| --- | --- | --- | --- |
| Falco rule too broad | False positives on JVM/Node.js mprotect patterns | Alert volume spike; analyst fatigue | Add an "and not proc.name in (jit_runtimes)" exclusion; re-tune |
| Tetragon policy kills legitimate workload | Pod terminates unexpectedly; liveness probe fails | Pod restart count metric; OOMKilled vs Sigkill in reason | Revert to Post action; add process name exclusion; redeploy |
| Exploit uses novel syscall sequence not in rule | Attack succeeds; no alert | Post-incident analysis of syscall audit log | Update rule with observed sequence; add to CVE rule library |
| Falco daemon crash | No behavioral alerts during outage | falco_events_total counter stops incrementing; monitoring gap alert | Ensure Falco runs as DaemonSet with restartPolicy: Always; monitor health metric |
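
The "counter stops incrementing" check for a crashed daemon amounts to a flatline test over recent samples of the event counter. A minimal sketch follows, assuming samples arrive as `(timestamp_s, counter_value)` pairs scraped from the metric named above; the `is_stalled` name and the 300-second default are illustrative choices.

```python
def is_stalled(samples, max_flat_s=300):
    """Return True if the counter has not changed for at least max_flat_s
    seconds at the end of the series.

    samples: list of (timestamp_s, counter_value), sorted by time.
    An empty series is treated as stalled (no data at all).
    """
    if not samples:
        return True
    last_change_ts = samples[0][0]
    for (_, prev_val), (ts, val) in zip(samples, samples[1:]):
        if val != prev_val:  # an increase, or a reset after a restart
            last_change_ts = ts
    return samples[-1][0] - last_change_ts >= max_flat_s


healthy = [(0, 10), (60, 12), (120, 15)]
flat = [(0, 10), (60, 12), (120, 12), (480, 12)]
print(is_stalled(healthy), is_stalled(flat))
```

A quiet node can legitimately produce no events for a while, so a flatline is a prompt to check the daemon's health rather than proof of a crash.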