eBPF-LSM (lsm_bpf): Kernel Security Policy as Hot-Loadable BPF Programs

Problem

Linux Security Modules (LSMs) — AppArmor, SELinux, Smack — define security policy at kernel hooks: every file open, network operation, capability check passes through the LSM, which decides allow / deny. Two long-standing limits:

  • Policy authoring is painful. SELinux policy is a custom DSL; AppArmor is more readable but limited; both require recompilation or restart for substantial changes.
  • Policy distribution is operational. Updating an AppArmor profile means updating filesystem files; cluster-wide propagation is a per-host concern; no atomic rollout.

lsm_bpf (kernel 5.7+) attaches eBPF programs to LSM hooks. The same hooks AppArmor and SELinux use become programmable in C or Rust via libbpf, loaded at runtime, and observable through normal eBPF tooling. By 2026 the production examples are mature:

  • Cilium Tetragon uses BPF LSM for runtime enforcement of process and file policies.
  • Falco’s BPF probes can attach to LSM hooks for richer event capture.
  • Cilium’s network policy (in some configurations) uses BPF LSM for socket-level enforcement.
  • Custom internal policies in cloud providers’ Linux base images.

Compared to AppArmor:

  • BPF LSM policies compile to BPF bytecode; loaded via bpf() syscall; live on the kernel’s verifier-checked path.
  • Hot reload: replacing a policy doesn’t reboot or restart anything.
  • Policy is C / Rust source, version-controlled like any other code.
  • Same observability primitives as eBPF tracing — counters, perf events, ring buffers.
  • Cloud-native packaging: a single binary contains the policy; deploy via DaemonSet.

The specific gaps in default Linux:

  • AppArmor / SELinux profiles are filesystem-based; require config-management for distribution.
  • Profile updates require a reload (AppArmor) or a policy rebuild and possible relabel (SELinux).
  • Policy debugging is per-host; no native observability into rule hits.
  • Custom policies for ephemeral-shaped infrastructure (per-Pod, per-tenant) are awkward to express.

This article covers writing BPF LSM programs, deploying via libbpf-bootstrap or BCC, integration with Cilium / Tetragon, performance characteristics, and the hardening patterns that BPF LSM uniquely enables.

Target systems: Linux kernel 5.7+ for BPF LSM hooks; 6.9+ if you want BPF token delegation for loading from less-privileged contexts; CONFIG_BPF_LSM=y and bpf present in the CONFIG_LSM list. Most distributions in 2026 ship this enabled (Ubuntu 24.04, RHEL 10, Fedora 41+).

Threat Model

  • Adversary 1 — Compromised root inside a container: wants to bypass standard syscall filters via uncommon paths (uncovered by seccomp).
  • Adversary 2 — Container escape attempt via known kernel CVE: seeks to perform unusual capability or namespace operations.
  • Adversary 3 — Insider with privileged access modifying critical files outside expected work areas.
  • Adversary 4 — Workload anomaly doing legitimate-looking but risky operations (mounting hostPath, opening /proc/<pid>/mem, etc.).
  • Access level: Adversaries 1-2 have root inside a container. Adversary 3 has an interactive shell. Adversary 4 has standard workload privilege.
  • Objective: Bypass the workload’s intended capability surface, escape the container, persist on the host, exfiltrate sensitive data.
  • Blast radius: Without LSM enforcement, the kernel grants the operation if standard DAC permits. With BPF LSM, every operation passes through the policy; deny decisions are immediate.

Configuration

Step 1: Verify Kernel Support

# Check that BPF LSM is enabled.
grep CONFIG_BPF_LSM /boot/config-$(uname -r)
# CONFIG_BPF_LSM=y

cat /sys/kernel/security/lsm
# capability,landlock,lockdown,yama,integrity,apparmor,bpf

# bpf must appear in the active LSM list. If absent, edit kernel cmdline:
# lsm=...,bpf

Step 2: Write a Simple Policy

Example: deny /etc/shadow reads from any process named unprivileged.

// shadow_protect.bpf.c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

#define EACCES 13

SEC("lsm/file_open")
int BPF_PROG(file_open_check, struct file *file)
{
    char filename[256] = {};
    char comm[16] = {};
    /* d_name.name is only the final path component (basename), not the
     * full path — so "shadow" here also matches e.g. "shadow.bak". */
    bpf_probe_read_kernel_str(filename, sizeof(filename),
        BPF_CORE_READ(file, f_path.dentry, d_name.name));
    bpf_get_current_comm(comm, sizeof(comm));

    /* Match: process named "unprivileged" opening a file named "shadow*". */
    bool is_target = (
        __builtin_memcmp(comm, "unprivileged", 12) == 0 &&
        __builtin_memcmp(filename, "shadow", 6) == 0
    );

    if (is_target) {
        bpf_printk("DENY: %s tried to open %s\n", comm, filename);
        return -EACCES;
    }
    return 0;   /* allow */
}

char LICENSE[] SEC("license") = "GPL";

Compile with libbpf-bootstrap:

clang -O2 -g -target bpf -D__TARGET_ARCH_x86 \
  -I /usr/include/x86_64-linux-gnu \
  -c shadow_protect.bpf.c \
  -o shadow_protect.bpf.o

Step 3: Load the Policy

# bpftool: load, pin, and auto-attach. Note that `bpftool prog attach`
# does not handle LSM programs; the `autoattach` keyword (recent bpftool)
# attaches based on the SEC("lsm/...") annotation.
sudo bpftool prog load shadow_protect.bpf.o /sys/fs/bpf/shadow_protect autoattach

# Verify attached.
sudo bpftool prog show
# 213: lsm  name file_open_check  tag ... gpl
#   loaded_at 2026-04-29T10:00:00+0000  uid 0
#   xlated 1024B  jited 768B  memlock 4096B
#   btf_id 41

The policy is now active. A process named unprivileged reading /etc/shadow will get EACCES.

# Verify.
cp /usr/bin/cat /tmp/unprivileged
/tmp/unprivileged /etc/shadow
# /tmp/unprivileged: /etc/shadow: Permission denied

# Audit messages.
sudo cat /sys/kernel/debug/tracing/trace_pipe
# unprivileged-12345 [001] DENY: unprivileged tried to open shadow

Step 4: Deploy via Cilium Tetragon

Tetragon manages BPF LSM policies declaratively via Kubernetes CRDs.

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: deny-shadow-read
spec:
  kprobes:
    - call: "security_file_open"
      syscall: false
      args:
        - index: 0
          type: "file"
      selectors:
        - matchArgs:
            - index: 0
              operator: "Postfix"
              values:
                - "/shadow"
          matchActions:
            - action: Override
              argError: -13   # EACCES

Tetragon compiles the CRD into a BPF LSM program, loads it on every node, and reports policy hits. New policies roll out via kubectl apply — no reboot or pod restart.

Step 5: Per-Cgroup / Per-Container Policy

BPF LSM programs can scope policy by cgroup (and therefore by container or Pod):

/* Policy record keyed by cgroup ID; layout is illustrative. */
struct policy {
    char denied[4][64];   /* NUL-terminated basename prefixes to deny */
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 4096);
    __type(key, u64);                  /* cgroup ID */
    __type(value, struct policy);
} cgroup_policies SEC(".maps");

SEC("lsm/file_open")
int BPF_PROG(file_open_check, struct file *file)
{
    /* Get the cgroup ID of the current process. */
    u64 cgroup_id = bpf_get_current_cgroup_id();

    /* Look up policy for this cgroup. */
    struct policy *p = bpf_map_lookup_elem(&cgroup_policies, &cgroup_id);
    if (!p)
        return 0;   /* no policy for this cgroup */

    /* Check the file's basename against the policy's denied names.
     * path_matches_denylist is a bounded-loop helper (definition omitted). */
    char filename[256] = {};
    bpf_probe_read_kernel_str(filename, sizeof(filename),
        BPF_CORE_READ(file, f_path.dentry, d_name.name));
    if (path_matches_denylist(filename, p)) {
        return -EACCES;
    }
    return 0;
}

Userspace populates cgroup_policies map per Pod / per Container. Different workloads on the same host run under different policies without separate AppArmor profiles per workload.
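The path_matches_denylist helper assumed above can be written as a bounded-loop prefix matcher — the same shape the BPF verifier accepts. A userspace C sketch (struct layout and bounds are illustrative, mirroring the map value):

```c
#include <stdbool.h>
#include <string.h>

#define MAX_DENY  4     /* illustrative bounds */
#define MAX_NAME  64

struct policy {
    /* NUL-terminated basename prefixes to deny, e.g. "shadow" */
    char denied[MAX_DENY][MAX_NAME];
};

/* Returns true if filename starts with any non-empty denied prefix.
 * Fixed loop bounds keep the equivalent BPF version verifier-friendly. */
static bool path_matches_denylist(const char *filename, const struct policy *p)
{
    for (int i = 0; i < MAX_DENY; i++) {
        const char *d = p->denied[i];
        if (d[0] == '\0')
            continue;               /* empty slot */
        bool match = true;
        for (int j = 0; j < MAX_NAME && d[j] != '\0'; j++) {
            if (filename[j] != d[j]) {
                match = false;
                break;
            }
        }
        if (match)
            return true;
    }
    return false;
}
```

In the BPF program the same function would be marked static __always_inline and iterate over the map value; the logic is identical.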

Step 6: Auditable Logging

Every policy hit emits an event. Userspace consumes via perf event or ring buffer:

struct lsm_event {
    u64 timestamp;
    u64 cgroup_id;   /* cgroup IDs are 64-bit */
    u32 pid;
    int decision;
    char comm[16];
    char filename[256];
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 24);
} events SEC(".maps");

SEC("lsm/file_open")
int BPF_PROG(file_open_check, struct file *file)
{
    bool deny = false;
    /* ... policy check sets deny ... */
    if (deny) {
        struct lsm_event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
        if (e) {
            e->timestamp = bpf_ktime_get_ns();
            e->pid = bpf_get_current_pid_tgid() >> 32;
            e->cgroup_id = bpf_get_current_cgroup_id();
            bpf_get_current_comm(e->comm, sizeof(e->comm));
            bpf_probe_read_kernel_str(e->filename, sizeof(e->filename),
                BPF_CORE_READ(file, f_path.dentry, d_name.name));
            e->decision = -EACCES;
            bpf_ringbuf_submit(e, 0);
        }
        return -EACCES;
    }
    return 0;
}

A userspace daemon reads the ring buffer and ships events to Loki / Splunk / your SIEM. Per-event detail is richer than auditd's, with lower overhead than AppArmor's audit mode.
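On the daemon side, rendering each event is a few lines; a hedged sketch (struct mirrors the kernel-side definition, the logfmt field names are illustrative):

```c
#include <stdint.h>
#include <stdio.h>

/* Userspace mirror of the kernel-side struct (cgroup IDs are 64-bit). */
struct lsm_event {
    uint64_t timestamp;
    uint64_t cgroup_id;
    uint32_t pid;
    int decision;
    char comm[16];
    char filename[256];
};

/* Render one event as a logfmt-style line; returns characters written. */
static int format_lsm_event(const struct lsm_event *e, char *buf, size_t len)
{
    return snprintf(buf, len,
        "ts=%llu pid=%u cgroup=%llu comm=%s file=%s decision=%d",
        (unsigned long long)e->timestamp, e->pid,
        (unsigned long long)e->cgroup_id, e->comm, e->filename, e->decision);
}
```

The formatted line goes to whatever shipper you already run; the point is that the event is a plain struct, not a text log to parse.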

Step 7: Common Patterns

Beyond file access:

  • Capability checks (lsm/capable): deny CAP_SYS_PTRACE to specific containers.
  • Socket creation (lsm/socket_create): block raw-socket creation in containers that have no legitimate need for it.
  • Bprm checks (lsm/bprm_check_security): block execution of binaries from /tmp or /dev/shm.
  • Mount checks (lsm/sb_mount): refuse hostPath mounts at the kernel level even when Pod Security Admission permits them.

Each of these is 50-100 lines of BPF code and a userspace loader — distributed via DaemonSet.
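The bprm check, for example, reduces to a directory-prefix test on the executable path. A userspace sketch of that decision logic (the prefix list is illustrative; in the BPF version it lives in a map and the loop is bounded with fixed lengths instead of strlen):

```c
#include <stdbool.h>
#include <string.h>

/* Directories from which execution is denied (illustrative list). */
static const char *deny_exec_prefixes[] = { "/tmp/", "/dev/shm/" };

/* Returns true if path begins with any denied directory prefix. */
static bool exec_path_denied(const char *path)
{
    size_t n_prefixes = sizeof(deny_exec_prefixes) / sizeof(deny_exec_prefixes[0]);
    for (size_t i = 0; i < n_prefixes; i++) {
        size_t n = strlen(deny_exec_prefixes[i]);
        if (strncmp(path, deny_exec_prefixes[i], n) == 0)
            return true;
    }
    return false;
}
```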

Step 8: Performance

BPF LSM programs run on every relevant syscall path. Performance matters.

# Benchmark per-syscall overhead.
sudo perf stat -e cycles,instructions ./benchmark
# without policy: 1.2 us/syscall
# with policy:    1.4 us/syscall  (+17%)

Typical overhead: 50-200 ns per check. For most workloads the impact is invisible. For latency-critical paths (high-frequency network sockets, many small file reads), measure before deploying.

Optimize:

  • Avoid bpf_probe_read_kernel_str for variable-length paths in the hot path; use cached values.
  • Use BPF maps for lookup-heavy decisions rather than recomputing.
  • Limit the per-program complexity (verifier rejects programs over 1M instructions).

Expected Behaviour

Signal                           | AppArmor / SELinux                            | BPF LSM
---------------------------------|-----------------------------------------------|---------------------------------------------------
Update propagation               | Filesystem-based; per-host                    | Kernel-loaded; atomic; cluster-wide via DaemonSet
Policy authoring                 | Custom DSL                                    | C / Rust / libbpf
Per-container policy             | Profile per workload (operational nightmare)  | Per-cgroup map lookup; uniform program
Hot reload                       | Reload command, sometimes restart             | Replace BPF program; no restart
Observability                    | auditd messages                               | Ring-buffer events; eBPF tracing
Compatibility with existing LSMs | Stack with bpf as one of multiple             | Same: bpf is a co-resident LSM
Performance                      | Comparable                                    | Comparable; verifier-bounded

Verify a policy is active:

sudo bpftool prog show | grep lsm
# 213: lsm  name file_open_check ...

sudo bpftool prog tracelog
# (live trace of policy hits)

Trade-offs

Aspect                   | Benefit                        | Cost                                                | Mitigation
-------------------------|--------------------------------|-----------------------------------------------------|------------------------------------------------------------
Policy as code           | Version-controlled, reviewable | Requires BPF / kernel familiarity                   | Use Tetragon / Cilium as a declarative wrapper; teams write CRDs, not BPF directly.
Hot reload               | No restart for policy changes  | Operator must understand load / attach lifecycle    | Standard via libbpf-bootstrap or higher-level tools.
Per-cgroup scoping       | Per-workload policy            | Map maintenance per Pod                             | Automate via container runtime hooks; tools like Tetragon handle this.
BPF verifier strictness  | Prevents kernel panics         | Some natural code patterns rejected (unbounded loops, complex pointer arithmetic) | Use BPF helpers; structure code for the verifier; libbpf macros help.
Performance overhead     | Low but non-zero               | Latency-critical paths impacted                     | Benchmark before deploying; optimize the hot path.
Stacking with other LSMs | Defense in depth               | Policy interactions can be subtle                   | Test thoroughly; each LSM evaluates independently and any deny wins.

Failure Modes

Failure                            | Symptom                          | Detection                                 | Recovery
-----------------------------------|----------------------------------|-------------------------------------------|------------------------------------------------------------
BPF verifier rejects policy        | Load fails                       | bpftool returns a verifier error          | Read the verifier output; restructure code (typically: bound loops, simplify pointer arithmetic).
Policy too restrictive             | Legitimate workload breaks       | App reports EACCES on unexpected paths    | Review hit logs; loosen policy; deploy in audit-only mode first.
BPF map memory exhausted           | Lookups fail                     | Map dump shows max_entries reached        | Increase map size; use LRU maps for cache-style use.
bpf not in active LSM list         | Policy loads but doesn't enforce | cat /sys/kernel/security/lsm lacks bpf    | Add lsm=...,bpf to the kernel cmdline; reboot.
Policy logic bug                   | Some legitimate ops blocked      | Specific pattern of failures              | The verifier catches memory-safety bugs, not logic bugs; check the policy logic and the userspace loader.
Per-cgroup map drift               | New Pods have no policy          | New workloads run unprotected             | Userspace daemon must populate the map on Pod start; test the integration.
Stale entries after Pod terminates | Map entries leak                 | Map size grows                            | Clean up on Pod terminate; periodic GC.