Detecting and Containing eBPF-Based Rootkits That Blind Your Observability Stack

Problem

eBPF observability tools — Falco, Tetragon, Cilium’s Hubble, and custom BPF programs — have become the dominant method for kernel-level security monitoring in Linux environments. They are low-overhead, production-safe, and capable of tracing system calls, network connections, and process executions with sub-millisecond latency. The assumption underlying all of them is that the BPF programs running in the kernel have exclusive, unfiltered visibility into kernel events.

eBPF rootkits break that assumption. Because eBPF programs run in kernel space alongside the observability stack, a sufficiently privileged attacker who loads their own BPF programs can attach to the same kernel hooks that security tools use and alter or suppress evidence of their own activity before it reaches user space. The rootkit and the security tool are competing programs in the same BPF subsystem, and the rootkit has the second-mover advantage: it is written with knowledge of which hooks the defender relies on.

Published eBPF rootkit toolkits demonstrate this concretely:

  • ebpfkit (2021, Guillaume Fournier/Datadog research): hooks sys_read and sys_write to hide files and processes from user space tools; attaches to tc network hooks to filter network telemetry; hooks the bpf() syscall itself to hide its BPF programs and maps from bpftool inspection.

  • bad-bpf (2022, PatH): a collection of eBPF programs demonstrating TTY hijacking, process hiding via getdents64 hooking, and privilege escalation via bpf_probe_write_user.

  • TripleCross (2022): eBPF rootkit that uses bpf_probe_write_user to tamper with user-space memory at runtime (library injection, execution hijacking, privilege escalation through sudo) and hides its own files and directories from directory listings.

The attack pattern is consistent across implementations:

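  1. Load BPF programs with CAP_BPF plus CAP_PERFMON (Linux 5.8+), or with CAP_SYS_ADMIN.
  2. Hook kprobes or fentry/fexit on the same kernel functions that Falco/Tetragon monitor.
  3. Filter events: if the caller matches the rootkit’s process, rewrite or suppress the data (for example with bpf_probe_write_user or bpf_override_return) so that only a cleaned-up view reaches user space tools and ring buffer consumers. Multiple programs on one hook all run, so the rootkit tampers with the data rather than stopping the security tool’s program from firing.
  4. Optionally hook the bpf() syscall to hide the rootkit’s own BPF programs from bpftool prog list.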
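
Step 2 can be looked at from the defender's side by checking whether any kernel function has kprobe programs attached by more than one process. The sketch below is a minimal illustration, assuming the plain-text output format of bpftool perf list; the helper name shared_kprobe_funcs is hypothetical. It only sees perf-event-based attachments held open by running processes, so fentry/fexit links and pinned programs are outside its view, and a rootkit that hooks bpf() can hide from it.

```shell
#!/bin/sh
# Sketch: list kernel functions that have kprobe/kretprobe programs attached
# by more than one PID. Reads `bpftool perf list` plain-text output on stdin.
# Assumed line format (as shown in the bpftool-perf man page):
#   pid 21711  fd 5: prog_id 5  kprobe  func __x64_sys_write  offset 0
# shared_kprobe_funcs is a hypothetical helper name.
shared_kprobe_funcs() {
  awk '
    ($7 == "kprobe" || $7 == "kretprobe") && $8 == "func" {
      func_name = $9
      owner = $2
      key = func_name SUBSEP owner
      if (!(key in seen)) {          # count each (function, pid) pair once
        seen[key] = 1
        owners[func_name]++
        pids[func_name] = pids[func_name] " " owner
      }
    }
    END {
      for (f in owners)
        if (owners[f] > 1)
          print f ":" pids[f]
    }
  ' | sort
}
```

A possible invocation on a node is bpftool perf list | shared_kprobe_funcs. Any function reported with a PID that does not belong to a known monitoring agent deserves a closer look.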

The effectiveness of this approach depends on two structural weaknesses in current BPF security tooling. First, most tools have no mechanism to detect that another BPF program is modifying the event stream they are reading. They receive filtered data and report it faithfully. Second, the unprivileged BPF restrictions (kernel.unprivileged_bpf_disabled) block unauthorized loading but do not protect against a process that has already obtained CAP_BPF through a preceding privilege escalation.

Kubernetes environments add complexity: Falco and Tetragon typically run as DaemonSets with CAP_SYS_ADMIN or CAP_BPF. A container escape that achieves code execution as root on the node gains the same capability level as the security tool and can load competing BPF programs.
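
To see which processes on a node already hold the capabilities discussed above, the effective capability mask in /proc/PID/status can be decoded. This is a minimal sketch, assuming the bit numbers from linux/capability.h (CAP_SYS_ADMIN is bit 21, CAP_BPF is bit 39); the helper names can_load_bpf and list_bpf_capable are hypothetical. Note that CAP_BPF alone is not enough for tracing programs, which also need CAP_PERFMON, so the result is an upper bound.

```shell
#!/bin/sh
# Sketch: report whether an effective-capability mask (the CapEff hex string
# from /proc/PID/status) includes a capability that permits BPF loading.
# Bit numbers from linux/capability.h: CAP_SYS_ADMIN=21, CAP_BPF=39.
# can_load_bpf and list_bpf_capable are hypothetical helper names.
can_load_bpf() {
  mask=$1
  sys_admin=$(( (0x$mask >> 21) & 1 ))
  cap_bpf=$(( (0x$mask >> 39) & 1 ))
  [ "$sys_admin" -eq 1 ] || [ "$cap_bpf" -eq 1 ]
}

# Print "pid name mask" for every process whose effective set qualifies.
list_bpf_capable() {
  for status in /proc/[0-9]*/status; do
    [ -r "$status" ] || continue
    mask=$(awk '/^CapEff:/ { print $2 }' "$status" 2>/dev/null)
    [ -n "$mask" ] || continue
    if can_load_bpf "$mask"; then
      name=$(awk '/^Name:/ { print $2 }' "$status" 2>/dev/null)
      pid=${status#/proc/}
      echo "${pid%/status} $name $mask"
    fi
  done
}
```

Running list_bpf_capable as root on a node gives a quick inventory of candidate BPF loaders; anything outside the expected DaemonSets is worth reviewing.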

Target systems: Linux 5.8–6.12 with eBPF-based security monitoring (Falco ≥0.32, Tetragon ≥0.10, Cilium ≥1.12, custom BPF agents); Kubernetes nodes with DaemonSet-based observability; any host where CAP_BPF or CAP_SYS_ADMIN can be obtained by an attacker.


Threat Model

Adversary 1 — Container escape with host root. Access level: root on the Kubernetes node after a container escape. Objective: load eBPF rootkit programs to blind Falco and Tetragon before performing lateral movement, preventing any alert from firing during the attack.

Adversary 2 — Compromised privileged DaemonSet. Access level: code execution inside a DaemonSet pod that runs with CAP_BPF (e.g., a compromised custom monitoring agent). Objective: use the pod’s existing BPF capability to load rootkit programs that hide the attacker’s subsequent activity.

Adversary 3 — Kernel LPE followed by BPF rootkit. Access level: initial unprivileged shell, then kernel privilege escalation (e.g., via n_gsm or io_uring UAF). Objective: load BPF rootkit after achieving root to evade post-compromise detection.

Adversary 4 — Supply chain compromise of BPF agent. Access level: control over the container image or binary for an eBPF monitoring tool. Objective: replace the monitoring agent with a version that selectively suppresses alerts while appearing to function normally.

Without hardening: eBPF rootkit loads undetected; security tools report clean telemetry; attacker moves freely. With hardening: BPF program loading triggers alerts; BPF map count anomaly is detected; kernel lockdown restricts BPF loading; secondary observability layer provides independent view.


Configuration / Implementation

Step 1 — Restrict unprivileged BPF loading

# /etc/sysctl.d/90-bpf-hardening.conf

# Block all BPF program loading by unprivileged users
kernel.unprivileged_bpf_disabled = 1

# Enable BPF JIT hardening for all users (constant blinding; mitigates JIT spraying)
net.core.bpf_jit_harden = 2

# Restrict BPF JIT kallsyms exposure
net.core.bpf_jit_kallsyms = 0

# Restrict kernel pointer exposure (limits info leak for exploit chains)
kernel.kptr_restrict = 2

# Apply the settings without rebooting (run as root)
sysctl --system

# Verify
sysctl kernel.unprivileged_bpf_disabled
# kernel.unprivileged_bpf_disabled = 1

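kernel.unprivileged_bpf_disabled = 1 does not prevent root from loading BPF programs, but it removes the unprivileged bpf() attack surface, which attackers have used through verifier bugs as a privilege escalation path toward the capabilities a rootkit needs. Once set to 1, the value cannot be changed back without a reboot.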
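
Settings like these tend to drift, so it can help to compare the live values against the intended ones on a schedule. The following is a minimal sketch, not a definitive implementation: check_sysctl, check_bpf_hardening and the SYSCTL_ROOT variable are hypothetical names, and SYSCTL_ROOT exists only so the check can be pointed at a test tree instead of /proc/sys. Some of these files are readable by root only, so run it as root.

```shell
#!/bin/sh
# Sketch: compare live sysctl values against the hardening values above.
# check_sysctl, check_bpf_hardening and SYSCTL_ROOT are hypothetical names.
SYSCTL_ROOT=${SYSCTL_ROOT:-/proc/sys}

check_sysctl() {
  key=$1
  want=$2
  # kernel.kptr_restrict -> $SYSCTL_ROOT/kernel/kptr_restrict
  path="$SYSCTL_ROOT/$(echo "$key" | tr . /)"
  if [ ! -r "$path" ]; then
    echo "MISSING $key"
    return 1
  fi
  have=$(cat "$path")
  if [ "$have" = "$want" ]; then
    echo "OK $key = $have"
  else
    echo "DRIFT $key = $have (expected $want)"
    return 1
  fi
}

# Returns non-zero if any of the four settings is missing or has drifted.
check_bpf_hardening() {
  rc=0
  check_sysctl kernel.unprivileged_bpf_disabled 1 || rc=1
  check_sysctl net.core.bpf_jit_harden 2 || rc=1
  check_sysctl net.core.bpf_jit_kallsyms 0 || rc=1
  check_sysctl kernel.kptr_restrict 2 || rc=1
  return $rc
}
```

Calling check_bpf_hardening from the same cron job or monitoring agent that runs the other checks in this guide keeps the sysctl state under the same alerting path.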

Step 2 — Enable kernel lockdown to restrict BPF from unsigned code

# Check current lockdown mode
cat /sys/kernel/security/lockdown

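# Enable lockdown=integrity at boot (disables bpf_probe_write_user, /dev/mem access, unsigned modules)
# Append to the existing kernel command line in /etc/default/grub:
GRUB_CMDLINE_LINUX="lockdown=integrity"
update-grub    # Debian/Ubuntu; RHEL-family systems use grub2-mkconfig instead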

# Or on systems using systemd-boot:
# Add lockdown=integrity to /boot/loader/entries/linux.conf options line

lockdown=integrity disables the bpf_probe_write_user helper on kernels that include the corresponding lockdown check (used by TripleCross for privilege escalation), blocks /dev/mem access, and prevents unsigned kernel module loading. It does not prevent legitimate BPF programs from being loaded by root; it removes specific primitives that BPF rootkits use to tamper with memory. Reading /proc/kcore is blocked only in confidentiality mode.

Note: lockdown=confidentiality is more restrictive: it also blocks kprobes, perf, and BPF reads of kernel memory, which breaks most eBPF tracing tools, including the monitoring agents discussed here. Start with integrity.
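
The lockdown file lists every mode and puts the active one in brackets, which is easy to misread by eye. A minimal sketch for extracting the active mode in a script follows; active_lockdown_mode and require_lockdown are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: extract the active mode from the lockdown file contents.
# The kernel lists all modes and brackets the active one, e.g.
#   none [integrity] confidentiality
# active_lockdown_mode and require_lockdown are hypothetical helper names.
active_lockdown_mode() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# Exit non-zero unless the host runs in integrity or confidentiality mode.
require_lockdown() {
  file=${1:-/sys/kernel/security/lockdown}
  [ -r "$file" ] || { echo "lockdown LSM not available"; return 1; }
  mode=$(active_lockdown_mode "$(cat "$file")")
  case $mode in
    integrity|confidentiality) echo "lockdown active: $mode" ;;
    *) echo "lockdown inactive: ${mode:-unknown}"; return 1 ;;
  esac
}
```

Calling require_lockdown with no argument checks the live host; a non-zero exit status can feed a node readiness or compliance check.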

Step 3 — Monitor BPF program loading and map creation

Use Tetragon or Falco to alert on every BPF program load that does not come from an allowlisted monitoring agent:

Tetragon TracingPolicy:

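apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: monitor-bpf-loading
spec:
  kprobes:
  - call: "security_bpf_prog_load"
    syscall: false
    # No args captured: the hook's parameters are kernel struct pointers, not
    # integers, and differ across kernel versions. Process context is enough to alert.
    selectors:
    # Selector 1: loaders outside the allowlist
    - matchBinaries:
      - operator: "NotIn"
        values:
        # Example paths only; replace with the agent binaries on your nodes.
        # PIDs are not stable across restarts, so match on binaries instead.
        - "/usr/lib/systemd/systemd"
        - "/usr/bin/tetragon"
        - "/usr/bin/falco"
        - "/usr/bin/cilium-agent"
      matchActions:
      - action: Post       # Audit-only; switch to Sigkill after validating the allowlist
    # Selector 2: log every other BPF program load
    - matchActions:
      - action: Post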

Falco rule:

- rule: BPF Program Loaded
  desc: bpf() syscall issued by a process outside the allowlist (covers program loads and map operations)
  condition: >
    evt.type = bpf and
    evt.dir = > and
    not proc.name in (falco, tetragon, cilium-agent, bpftool, node_exporter) and
    not container.name in (falco, tetragon-agent)
  output: >
    BPF program loaded by unexpected process
    (proc=%proc.name pid=%proc.pid user=%user.name
     container=%container.name image=%container.image.repository)
  priority: CRITICAL
  tags: [ebpf, rootkit, kernel]

Step 4 — Implement BPF program inventory baseline

Run a periodic check of loaded BPF programs and alert on unexpected additions:

#!/bin/bash
# /usr/local/bin/bpf-inventory-check.sh
# Run this as a cron job or via your monitoring stack
set -euo pipefail

BASELINE_DIR=/var/lib/bpf-inventory
BASELINE_FILE="$BASELINE_DIR/baseline.json"
# mktemp avoids a predictable file name in a world-writable directory
CURRENT_FILE=$(mktemp /tmp/bpf-current.XXXXXX)
trap 'rm -f "$CURRENT_FILE"' EXIT

# Capture current BPF program list; abort rather than baseline a failed capture
if ! bpftool prog list -j > "$CURRENT_FILE" ||
   ! jq -e 'type == "array"' "$CURRENT_FILE" > /dev/null; then
  echo "ERROR: could not capture BPF program list" >&2
  exit 1
fi

if [[ ! -f "$BASELINE_FILE" ]]; then
  mkdir -p "$BASELINE_DIR"
  cp "$CURRENT_FILE" "$BASELINE_FILE"
  echo "Baseline established with $(jq length "$BASELINE_FILE") programs"
  exit 0
fi

# Compare: look for program IDs that are not present in the baseline
NEW_PROGRAMS=$(jq -r --slurpfile baseline "$BASELINE_FILE" '
  ($baseline[0] | map(.id)) as $baseline_ids |
  .[] | select(.id as $id | $baseline_ids | index($id) == null) |
  "\(.id) \(.type) \(.name // "unnamed") loaded_at=\(.loaded_at // "unknown")"
' "$CURRENT_FILE")

if [[ -n "$NEW_PROGRAMS" ]]; then
  echo "ALERT: New BPF programs detected since baseline:"
  echo "$NEW_PROGRAMS"
  # Send to your alerting system
  # curl -X POST $ALERT_WEBHOOK -d "{\"text\": \"New BPF programs: $NEW_PROGRAMS\"}"
else
  echo "No new BPF programs since baseline"
fi

# Install the script (run once, as root)
chmod +x /usr/local/bin/bpf-inventory-check.sh

# Run every 5 minutes
echo "*/5 * * * * root /usr/local/bin/bpf-inventory-check.sh >> /var/log/bpf-inventory.log 2>&1" \
  > /etc/cron.d/bpf-inventory
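
Program IDs change every time an agent restarts, which is one source of the false positives listed under Trade-offs. One alternative is to compare fingerprints built from each program's type, name, and tag, since the tag is derived from the program's instructions and is expected to stay the same across reloads of the same program. The sketch below is a minimal illustration under that assumption; new_fingerprints is a hypothetical helper name, and the jq expression in the comment is one possible way to produce its input.

```shell
#!/bin/sh
# Sketch: report program fingerprints present now but absent from a baseline.
# Each input file holds one "type name tag" line per program, for example
# produced with:
#   bpftool prog list -j | jq -r '.[] | "\(.type) \(.name // "unnamed") \(.tag)"'
# new_fingerprints is a hypothetical helper name.
new_fingerprints() {
  baseline=$1
  current=$2
  tmp_base=$(mktemp) || return 1
  tmp_cur=$(mktemp) || { rm -f "$tmp_base"; return 1; }
  # comm requires both inputs sorted in the same collation order
  LC_ALL=C sort -u "$baseline" > "$tmp_base"
  LC_ALL=C sort -u "$current" > "$tmp_cur"
  # -13 keeps only lines unique to the second file (the current snapshot)
  LC_ALL=C comm -13 "$tmp_base" "$tmp_cur"
  rm -f "$tmp_base" "$tmp_cur"
}
```

A name is chosen by whoever loads the program, so a rootkit can imitate a known-good name; the tag is harder to imitate without changing the program's behaviour, which is why the fingerprint includes both.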

Step 5 — Detect BPF hook hiding via cross-layer comparison

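A BPF rootkit that hides processes by filtering getdents64 results creates a discrepancy between the /proc directory listing seen from user space and the processes that kernel tracepoints observe running. Implement cross-layer process enumeration to surface that gap: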

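#!/bin/bash
# /usr/local/bin/process-consistency-check.sh
# Compare PIDs listed under /proc with PIDs that bpftrace observes running
export LC_ALL=C   # comm needs both inputs in the same byte-wise sort order

# Collect PIDs seen on-CPU for 5 seconds via the scheduler tracepoint.
# This path does not depend on /proc traversal. The interval probe ends the
# run; at exit bpftrace prints the map as lines like "@seen[1234]: 1".
BPFTRACE_PIDS=$(bpftrace -q -e '
  tracepoint:sched:sched_switch /pid != 0/ { @seen[pid] = 1; }
  interval:s:5 { exit(); }
' 2>/dev/null | grep -oE '^@seen\[[0-9]+\]' | grep -oE '[0-9]+' | sort -u)

# Enumerate PIDs via the /proc listing (getdents64; may be filtered by a BPF hook).
# Taken after the trace so that processes started during the window are listed.
PROC_PIDS=$(ls /proc | grep -E '^[0-9]+$' | sort -u)

# PIDs seen running but absent from the listing. A direct lookup of
# /proc/PID separates hidden processes from ones that simply exited.
HIDDEN=""
for pid in $(comm -23 <(echo "$BPFTRACE_PIDS") <(echo "$PROC_PIDS")); do
  [[ -d "/proc/$pid" ]] && HIDDEN+="$pid "
done

if [[ -n "$HIDDEN" ]]; then
  echo "ALERT: PIDs active but hidden from /proc: $HIDDEN"
fi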

Note: a sophisticated rootkit can also tamper with the tracepoint data or with what bpftrace reports. True independence requires hardware-based monitoring (Intel PT, PMU) or a hypervisor-level view. This script detects naive rootkits that only hook getdents64.
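
A second cross-check that needs no BPF at all is to probe /proc/PID paths directly and compare the hits with the directory listing, since a getdents64-only hook filters the listing but not a per-path lookup. The sketch below is a minimal illustration of that idea; unlisted_pids is a hypothetical helper name, and the root directory is a parameter only so the function can be exercised against a test tree.

```shell
#!/bin/sh
# Sketch: find PIDs that resolve by direct path lookup but are missing from
# a directory listing of the same tree.
# unlisted_pids is a hypothetical helper name.
#   $1 = proc root (normally /proc)
#   $2 = highest PID to probe (e.g. the value of /proc/sys/kernel/pid_max)
#   $3 = file holding the listed PIDs, one per line
unlisted_pids() {
  proc_root=$1
  max_pid=$2
  listing=$3
  pid=1
  while [ "$pid" -le "$max_pid" ]; do
    if [ -d "$proc_root/$pid" ] && ! grep -qx "$pid" "$listing"; then
      # Thread IDs also resolve under /proc but are never listed; keep only
      # entries whose thread-group ID equals the PID (real processes).
      tgid=$(awk '/^Tgid:/ { print $2 }' "$proc_root/$pid/status" 2>/dev/null)
      [ "$tgid" = "$pid" ] && echo "$pid"
    fi
    pid=$((pid + 1))
  done
}
```

On a live host the listing file could be produced with ls /proc filtered to numeric names immediately before the call. Processes that start between the listing and the probe will show up as hits, so confirm any result against a fresh listing before alerting.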

Step 6 — Deploy a secondary observability layer outside BPF

The most reliable defence is a monitoring layer that does not share the same kernel hook points as the primary BPF security tool:

Option A — auditd (kernel audit subsystem, separate from BPF):

# /etc/audit/rules.d/90-bpf-monitor.rules

# Monitor bpf() syscall
-a always,exit -F arch=b64 -S bpf -F key=bpf_syscall

# Monitor module loading (adjacent to BPF loading)
-a always,exit -F arch=b64 -S finit_module -S init_module -F key=module_load

# Monitor ptrace (used by some rootkit install paths)
-a always,exit -F arch=b64 -S ptrace -F key=ptrace_call

# Load the rules and restart auditd (run as root)
augenrules --load
systemctl restart auditd

# Verify BPF syscall is being audited
ausearch -k bpf_syscall --start today | head -20

auditd is fed by the kernel audit subsystem, which records syscalls through its own entry and exit hooks rather than through the kprobe/tracepoint paths used by Falco/Tetragon. That makes it harder to blind with a BPF program, although not impossible.
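
The audit rule above records every bpf() call, including routine map operations, so the interesting records are the program loads. The sketch below is a minimal illustration of summarizing them per executable, under two stated assumptions: the host is x86_64 (where bpf is syscall 321) and the first argument a0=5 identifies BPF_PROG_LOAD. The helper name bpf_loaders_from_audit is hypothetical.

```shell
#!/bin/sh
# Sketch: summarize which executables issued bpf(BPF_PROG_LOAD) according to
# raw audit SYSCALL records read from stdin.
# Assumptions: x86_64 (syscall=321 is bpf) and a0=5 (BPF_PROG_LOAD is
# command 5 in enum bpf_cmd; audit prints syscall arguments in hex).
# bpf_loaders_from_audit is a hypothetical helper name.
bpf_loaders_from_audit() {
  awk '
    /type=SYSCALL/ && / syscall=321 / && / a0=5 / {
      exe = "unknown"
      for (i = 1; i <= NF; i++)
        if ($i ~ /^exe=/) { exe = substr($i, 5); gsub(/"/, "", exe) }
      count[exe]++
    }
    END { for (e in count) print count[e], e }
  ' | sort -k1,1nr -k2
}
```

A possible invocation is ausearch -k bpf_syscall --raw | bpf_loaders_from_audit, which prints a load count followed by the executable path, highest count first. Executables outside the expected agent paths are the ones to investigate.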

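Option B — signed kernel module that observes security_bpf_prog_load:

For organizations that can deploy custom kernel modules with module signing, a loadable kernel module that attaches its own probe (for example a kprobe or ftrace hook) to security_bpf_prog_load provides a monitoring path that does not depend on the BPF subsystem and that a BPF-only rootkit cannot easily intercept. Current kernels do not let loadable modules register new LSM hooks, so the module has to observe the existing hook rather than add one.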

Step 7 — Enforce Seccomp to block bpf() in workload containers

For application containers that have no legitimate need to load BPF programs, block the bpf() syscall via Seccomp:

# For all production workloads — add to pod spec
securityContext:
  seccompProfile:
    type: RuntimeDefault

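The RuntimeDefault Seccomp profile in containerd and Docker permits bpf() only for containers that hold CAP_SYS_ADMIN or CAP_BPF; for every other container the call fails with EPERM. The profile is built into the container runtime, so it does not appear under /var/lib/kubelet/seccomp. Verify from inside a running pod:

# Confirm a seccomp filter is active for the container's main process
# (POD_NAME is a placeholder for one of your pods; 2 means filter mode)
kubectl exec "$POD_NAME" -- grep Seccomp: /proc/1/status
# Seccomp:	2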
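
The same Seccomp field can be checked from the node for any container process, which is useful when a pod has no shell. This is a minimal sketch; seccomp_mode is a hypothetical helper name, and the mapping of values (0 disabled, 1 strict, 2 filter) follows the proc status format.

```shell
#!/bin/sh
# Sketch: report the seccomp mode recorded in a /proc/PID/status file.
# The Seccomp field is 0 (disabled), 1 (strict) or 2 (filter).
# seccomp_mode is a hypothetical helper name.
seccomp_mode() {
  status_file=$1
  value=$(awk '/^Seccomp:/ { print $2 }' "$status_file" 2>/dev/null)
  case $value in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "filter" ;;
    *) echo "unknown" ;;
  esac
}
```

Given the host PID of a container process, seccomp_mode /proc/PID/status should print filter for every workload that uses RuntimeDefault. A result of disabled points to a pod that is unconfined and can reach bpf() if it holds the capability.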

Expected Behaviour

Signal | Before hardening | After hardening
bpftool prog list shows unexpected programs | Not alerted | BPF inventory check fires alert within 5 minutes
kernel.unprivileged_bpf_disabled | 0 (upstream default; many distributions ship 2) | 1
lockdown mode | none | integrity
New BPF program load from non-system process | No alert from Falco/Tetragon | Falco CRITICAL alert fires
bpf() syscall in application container | Permitted | Blocked by RuntimeDefault Seccomp
auditd logs bpf() calls | Not configured | All bpf() syscalls logged with uid, pid, comm

Verification:

# Confirm BPF inventory baseline is established
ls -la /var/lib/bpf-inventory/baseline.json

# Confirm auditd rule is active
auditctl -l | grep bpf_syscall
# -a always,exit -F arch=b64 -S bpf -F key=bpf_syscall

# Confirm lockdown mode
cat /sys/kernel/security/lockdown
# none [integrity] confidentiality   (the active mode is shown in brackets)

# Attempt BPF load as unprivileged user
su -s /bin/bash nobody -c "bpftool prog load /tmp/test.bpf /sys/fs/bpf/test 2>&1"
# Expected: the load fails with an "Operation not permitted" error (exact wording varies by bpftool version)

Trade-offs

Aspect | Benefit | Cost | Mitigation
lockdown=integrity | Blocks BPF memory-write primitives used by rootkits | Breaks tools that rely on bpf_probe_write_user, /dev/mem, or unsigned kernel modules; also disables hibernation | Test on a non-production node first; integrity leaves kprobes and perf available, unlike confidentiality
BPF inventory check | Detects new programs added post-baseline | Generates false positives on every kernel update or agent restart | Re-baseline after planned updates; exclude known-good program names
auditd BPF rule | Independent from BPF-based monitoring | High log volume on BPF-heavy systems | Rate-limit by uid; exempt known monitoring service accounts
Seccomp blocking bpf() in containers | Prevents container workloads from loading BPF | Breaks workloads that legitimately use BPF (rare, but e.g. some network tools) | Allowlist specific pods that need BPF via explicit Seccomp profile override

Failure Modes

Failure | Symptom | Detection | Recovery
Falco/Tetragon self-alerts on own BPF programs | Flood of false-positive alerts on startup | Alert volume spike at agent start | Add exemption for the monitoring agent’s own pod/namespace in Falco/Tetragon rules
lockdown=integrity breaks legacy monitoring tool | Tool fails to start; dmesg shows lockdown rejection | Searching dmesg output for "lockdown" shows the blocked operation | Update the tool; if not possible, run on a dedicated node without lockdown
BPF inventory baseline becomes stale after kernel upgrade | Every new BPF program fires an alert; alert fatigue | All alerts after upgrade reference known-good programs | Re-baseline post-upgrade as part of maintenance runbook
Sophisticated rootkit hooks both kprobes and auditd paths | No alert from either monitoring layer | Gap detected only via hypervisor-level or hardware tracing | Layer in VM introspection (e.g., KVM VMI); schedule periodic offline forensic analysis

  • eBPF LSM — building LSM hooks that enforce security policy alongside eBPF monitoring programs
  • eBPF Tetragon — deploying Tetragon for kernel-level process and network tracing
  • Falco Security Rules — writing and tuning Falco rules for runtime threat detection
  • Linux LPE Defence in Depth — the privilege escalation paths that give an attacker the CAP_BPF needed to load rootkits
  • File Integrity Monitoring — complementary monitoring layer that detects rootkit installation artifacts