AI-Generated System Code vs. the Linux Kernel's 30-Year Audit Trail

The Problem

The question is not whether LLMs write syntactically correct kernel code. They do. The question is whether that code carries the security properties that make upstream kernel code safe to run on production infrastructure — and on that question, the answer is systematically no.

An engineer today can type “write an eBPF program that rate-limits connections per IP, 100 connections per second, dropping excess” and receive a complete, compilable BPF program in 30 seconds. On a sufficiently modern prompt, the code will use libbpf, load a hash map keyed on source IP, and implement a token-bucket check. It will compile. It will load into the kernel. The verifier will accept it. Then it will silently miscount connections in high-concurrency environments, permit a per-CPU race that makes the rate limit nondeterministic, and break entirely on the next kernel version because it references an interface the AI trained on that has since been removed.

This is not a corner case. It is the predictable result of what AI code generation is and is not. A language model learns syntactic and semantic patterns from training data. It does not know which kernel interfaces are stable vs. internal, which map access patterns require spinlocks, or which edge cases in the BPF verifier’s abstract interpretation have been identified and fixed since the kernel version in its training cut-off. The Linux kernel security model has specific structural properties that took decades of adversarial testing, coordinated disclosure infrastructure, and community review to build. None of those properties are transferable via a prompt.

What the Linux Kernel Security Model Actually Provides

1. The kernel security response team and coordinated CVE process. When a vulnerability is found in an upstream kernel subsystem, the reporter sends a private disclosure to security@kernel.org. The security team brings in the affected subsystem maintainer, who develops a fix in private. Once the patch is reviewed and tested, it can be coordinated with major Linux distributors (Red Hat, Debian, Ubuntu, SUSE, and others), who receive advance access through the linux-distros embargo list hosted by Openwall. Embargoes there are short: the list caps them at 14 days and the kernel security team prefers about a week. The result is a CVE identifier (since February 2024 assigned by the kernel project’s own CNA), a fix commit in mainline and the stable trees, backports to the actively maintained LTS kernels (such as 5.15, 6.1, 6.6, and 6.12 at the time of writing; kernel.org publishes the current list), and distribution packages appearing within days of the embargo lift. When you discover a bug in your AI-generated kernel module (and bugs will be discovered, either by you, by an attacker, or by a kernel API change that triggers a panic) there is no process. No CVE. No patch. No backport. No distribution notification. The bug exists in your infrastructure and in nobody else’s threat model.

2. Interface stability and the EXPORT_SYMBOL contract. The kernel’s stability promise covers the userspace ABI: system calls, /proc and /sys semantics, and similar interfaces. It does not cover in-kernel interfaces. Loadable modules can link only against symbols exported with EXPORT_SYMBOL (available to any module) or EXPORT_SYMBOL_GPL (available to GPL-compatible modules), and even exported symbols can be renamed, change signature, or disappear between releases, as Documentation/process/stable-api-nonsense.rst spells out. Functions that are not exported cannot be linked by a module at all. AI-generated kernel modules routinely call functions the model saw in in-tree code in its training data. Some of those were never exported; others were valid at training time and have since been renamed, refactored, or removed. The usual failure mode is a build error or an “Unknown symbol” rejection at module load time. The worse one is subtle memory corruption, when a module built against stale headers is forced onto a kernel whose struct layouts have shifted. Upstream kernel drivers and subsystems, by contrast, are updated in the same change that alters the interface they use, are built and booted continuously by CI systems such as KernelCI (kernelci.org), and are covered by the KUnit and kselftest suites. An upstream driver that references a removed function breaks the kernel’s own build, so the problem is caught and fixed before release.

3. Thirty-plus years of adversarial testing. The Linux TCP/IP stack (net/ipv4/tcp.c, net/core/, net/netfilter/) has been fuzzed by Google’s syzkaller continuously since 2016, producing thousands of bug reports. It has been analysed by academic security researchers, by NSA-funded projects, and by security teams across the industry. The kernel’s BPF subsystem has had verifier bugs found by external security researchers (CVE-2021-3490, CVE-2021-31440, and CVE-2022-23222 among them), each triggering coordinated disclosure, patches to the affected stable kernels, and distribution updates. The heavily used code paths in a major kernel subsystem have been subjected to some form of adversarial review at some point. AI-generated code starts at zero. The first adversarial review of your custom eBPF rate limiter or kernel module is whatever audit you conduct before shipping, followed by whatever an attacker who targets your system later chooses to do.

4. The eBPF verifier contract and what it does not guarantee. The eBPF verifier at kernel/bpf/verifier.c checks every program that passes through bpf(BPF_PROG_LOAD, ...) before it is JIT-compiled and attached. It verifies that the program cannot perform out-of-bounds memory access, cannot loop unboundedly, and cannot perform illegal pointer arithmetic. What the verifier does not check is whether the program is semantically correct — whether it does what you intend, whether its map access patterns are race-condition-free, or whether its interpretation of kernel data structures is accurate. A program that passes the verifier is safe to load in the sense that it cannot corrupt kernel memory. It is not safe in the sense that it will behave correctly.

A Concrete Comparison: AI-Generated eBPF Rate Limiter vs. the Correct Approach

Here is what a typical LLM produces when asked to write a per-source-IP connection rate limiter in eBPF:

// AI-generated: typical output, multiple bugs
#include <linux/bpf.h>
#include <linux/if_ether.h>   // struct ethhdr
#include <linux/ip.h>         // struct iphdr
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, __u32);    // source IP
    __type(value, __u64);  // connection count
} conn_count SEC(".maps");

SEC("xdp")
int rate_limit(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;

    if (data + sizeof(*eth) > data_end)
        return XDP_PASS;

    struct iphdr *ip = data + sizeof(*eth);
    if ((void *)ip + sizeof(*ip) > data_end)
        return XDP_PASS;

    __u32 src_ip = ip->saddr;
    __u64 *count = bpf_map_lookup_elem(&conn_count, &src_ip);

    if (!count) {
        __u64 initial = 1;
        bpf_map_update_elem(&conn_count, &src_ip, &initial, BPF_ANY);
        return XDP_PASS;
    }

    if (*count > 100) {        // Bug 1: no time window
        return XDP_DROP;
    }

    (*count)++;                // Bug 2: non-atomic read-modify-write on a HASH map
    bpf_map_update_elem(&conn_count, &src_ip, count, BPF_ANY);
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";

There are three distinct correctness bugs here, none of which the verifier catches:

Bug 1: No time window. The counter increments on every packet and never resets. Once it exceeds 100, every subsequent packet from that IP is dropped for the lifetime of the map entry. There is no sliding window, no token bucket, no counter reset. The counter also tracks packets rather than connections, because nothing in the program looks at TCP flags. What the engineer asked for (100 connections per second) and what this program implements (roughly 100 packets total, ever) are completely different. The AI generated “rate limiting” code that is actually a permanent IP blocklist after the first hundred or so packets.
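The missing time window can be illustrated with a token bucket. The following is a minimal userspace sketch in plain C, not a drop-in BPF program: the struct and function names are hypothetical, and it assumes a monotonic nanosecond clock. In an eBPF program the timestamp would typically come from bpf_ktime_get_ns(), the state would live in the map value, and the update would still need protection against concurrent CPUs.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical per-source state; in a BPF program this would be the map value. */
struct bucket {
    uint64_t tokens;   /* whole tokens currently available */
    uint64_t last_ns;  /* time up to which tokens have been credited */
};

/*
 * Credit tokens for the elapsed time, then try to spend one.
 * Returns 1 if the event is allowed, 0 if it should be dropped.
 * rate is tokens per second; burst is the bucket capacity.
 * Assumes now_ns never goes backwards.
 */
static int bucket_allow(struct bucket *b, uint64_t now_ns,
                        uint64_t rate, uint64_t burst)
{
    assert(b != 0 && rate > 0 && burst > 0);

    uint64_t elapsed = now_ns - b->last_ns;
    /* Time needed to fill an empty bucket; also bounds the multiply below. */
    uint64_t full_ns = burst * NSEC_PER_SEC / rate + 1;

    if (elapsed >= full_ns) {
        b->tokens = burst;
        b->last_ns = now_ns;
    } else {
        uint64_t add = elapsed * rate / NSEC_PER_SEC;
        if (add > 0) {
            b->tokens = (b->tokens + add > burst) ? burst : b->tokens + add;
            /* Advance only by the time that was converted into tokens. */
            b->last_ns += add * NSEC_PER_SEC / rate;
        }
    }

    if (b->tokens == 0)
        return 0;
    b->tokens--;
    return 1;
}
```

With rate 100 and burst 2, a source that has spent both tokens is refused until 10 ms have passed, then allowed again. Unlike the generated program, a refused source recovers on its own.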

Bug 2: Non-atomic read-modify-write on a HASH map. The pattern count = bpf_map_lookup_elem(...); (*count)++; bpf_map_update_elem(...) is a read-modify-write sequence on a BPF_MAP_TYPE_HASH map. On a multi-CPU system, two CPUs executing this path concurrently for the same source IP can both read the same value, both increment it, and both write back, so one of the increments is lost. For a rate limiter, this means the effective limit is nondeterministic under load. The same race exists on the insert path, where two CPUs can both find no entry and both store an initial count of 1. The usual fixes are an atomic add on the value in place (__sync_fetch_and_add works on hash and array map values alike), a bpf_spin_lock in the map value when several fields must change together, or BPF_MAP_TYPE_PERCPU_HASH, which removes contention by giving each CPU its own counter but then enforces the limit per CPU unless the values are aggregated. The AI was unaware of the distinction.
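The lost update is easier to see when the interleaving is written out by hand. The sketch below is plain userspace C with hypothetical function names. It replays the two-CPU schedule in a single thread so that the outcome is deterministic, then performs the same two increments with the __sync_fetch_and_add compiler builtin. It illustrates the arithmetic of the race; it is not a concurrency test.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Replays two CPUs running lookup / increment / write-back on one shared
 * counter, with both reads happening before either write.
 */
static uint64_t racy_two_increments(uint64_t start)
{
    uint64_t shared = start;
    uint64_t cpu0 = shared;   /* CPU 0 reads */
    uint64_t cpu1 = shared;   /* CPU 1 reads the same value */

    cpu0 += 1;
    cpu1 += 1;
    shared = cpu0;            /* CPU 0 writes back */
    shared = cpu1;            /* CPU 1 overwrites: one increment is lost */
    return shared;
}

/*
 * The same two increments as indivisible read-modify-write operations.
 * __sync_fetch_and_add is a GCC/Clang builtin and returns the old value.
 */
static uint64_t atomic_two_increments(uint64_t start)
{
    uint64_t shared = start;
    uint64_t before = __sync_fetch_and_add(&shared, 1);

    assert(before == start);
    __sync_fetch_and_add(&shared, 1);
    return shared;
}
```

Starting from zero, the replayed schedule ends at 1 while the atomic version ends at 2. Under real concurrency the racy path loses an unpredictable number of increments, which is why the effective limit drifts with load.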

Bug 3: Fragile use of the map value pointer. The code writes through count, a pointer into the map’s hash table ((*count)++), and then passes that same pointer back to bpf_map_update_elem. The in-place write is permitted, because the pointer returned by a lookup stays valid for the duration of the program’s execution, which makes the update call that follows redundant: the increment has already landed in the map. The pattern is fragile and masks the intent. More critically, the AI-generated code uses it without recognising that the element behind the pointer can be deleted or recycled by another CPU while the program still holds it. That becomes routine if the map type is changed to BPF_MAP_TYPE_LRU_HASH, where a concurrent insert for a different key can evict the entry and reuse its storage. This is exactly the kind of subtle invariant that kernel subsystem documentation captures and that an AI trained on a mix of correct and incorrect code examples will miss.

The correct upstream approach to this problem is not a custom eBPF program — it is the nftables meter statement, which implements per-source rate limiting using a kernel-maintained hash table with proper locking:

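# Upstream nftables per-source-IP rate limiting, no custom code.
# "over" makes the rule match (and drop) only traffic above the rate.
# Add "ct state new" after "ip protocol tcp" to count new connections rather than packets.
nft add table inet filter
nft add chain inet filter input '{ type filter hook input priority 0; }'
nft add rule inet filter input \
  ip protocol tcp \
  meter conn_rate { ip saddr timeout 10s limit rate over 100/second burst 200 packets } \
  drop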

This is a single rule, implemented entirely within the kernel’s netfilter subsystem, tested across every Linux kernel release since the nftables meter was introduced in 4.3, and maintained by the netfilter team. It implements a token bucket with configurable burst. Its bugs (and there have been some, such as CVE-2023-6817, a use-after-free in the nft_pipapo set backend) were found by security researchers, disclosed through the kernel security team, patched in all LTS kernels, and distributed within days. No equivalent process exists for custom eBPF code.

Threat Model

AI-generated kernel module with a memory corruption bug. Kernel modules execute at ring 0. A stack buffer overflow in a custom module — or an integer overflow in a length calculation, or a use-after-free in an IRQ handler — is not a userspace crash. It is a kernel panic at best, or kernel arbitrary write at an attacker-controlled moment. An AI-generated module that passes insmod without error but contains a latent memory corruption bug is a time bomb. The module has no CVE, no upstream maintainer, no coordinated patching process. The organisation running it may not know the bug exists until a kernel panic occurs in production or, worse, until an attacker who reverse-engineers the module discovers it first.

AI-generated eBPF with incorrect map semantics. Silent data corruption in production. Rate limiters that don’t rate-limit. Accounting maps that undercount. Monitoring programs that miss events due to per-CPU aggregation bugs. None of this causes a crash or an error log entry — it simply produces wrong results that may not be noticed until the production consequence is visible.
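The per-CPU aggregation bug mentioned above usually comes from reading only one CPU’s slot. A userspace lookup on a per-CPU map yields one value per possible CPU, and the reader is expected to sum them. The following is a minimal sketch of that step in plain C. The function names are hypothetical, and the array stands in for the buffer a real lookup would fill.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Correct aggregation: add up the slot of every possible CPU. */
static uint64_t percpu_total(const uint64_t *slots, size_t ncpus)
{
    uint64_t total = 0;

    assert(slots != NULL);
    for (size_t i = 0; i < ncpus; i++)
        total += slots[i];
    return total;
}

/* Typical mistake: treat the first slot as if it were the whole counter. */
static uint64_t percpu_first_only(const uint64_t *slots, size_t ncpus)
{
    assert(slots != NULL);
    return ncpus ? slots[0] : 0;
}
```

For slots holding 10, 0, 7, and 3, the total is 20, while the first-slot shortcut reports 10. Nothing crashes and nothing is logged; the dashboard is simply wrong.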

AI-generated netfilter hook that breaks conntrack state. Netfilter hooks interact with the connection tracking subsystem through a specific API. Incorrectly sequenced hook calls, wrong priority values, or unbalanced reference counting (nf_ct_get returns the entry attached to the packet without taking a reference, so a hook that keeps the pointer must pair nf_conntrack_get with nf_ct_put) produce connection state corruption that manifests as intermittent network failures: connections reset for no apparent reason, NAT translations broken, stateful rules matching incorrectly. These bugs are extraordinarily hard to diagnose because the symptom (a TCP reset) appears far from the cause (a reference counting error in a rarely-executed code path that only triggers under specific load patterns). The AI that generated the hook had no idea about the conntrack reference counting contract.

No upstream maintenance. A kernel API an AI-generated module uses is deprecated and removed. In upstream code, this triggers a compile error caught by the kernel’s CI system months before the removal lands in a stable release. In AI-generated code, the module silently fails to load on the new kernel, or — if the removed function was replaced by one with an incompatible signature — loads but produces undefined behaviour. Nobody files a CVE. Nobody ships a fix. The module simply breaks.

Hardening Configuration

1. Audit Your Infrastructure for AI-Generated Kernel Code

Before implementing any other control, identify what is already running. AI-generated kernel code tends to lack the boilerplate that upstream drivers include as a matter of convention:

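# List loaded modules whose .ko file is not owned by any installed package.
# On merged-/usr systems dpkg may record the path under /usr, so try both.
for m in $(lsmod | awk 'NR>1 {print $1}'); do
  path=$(modinfo -n "$m" 2>/dev/null) || continue
  { dpkg -S "$path" || dpkg -S "/usr$path"; } >/dev/null 2>&1 || echo "$m  $path"
done

# Out-of-tree (O) and unsigned (E) modules also show up as taint flags
grep -H . /sys/module/*/taint 2>/dev/null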

Any module appearing in this output is either out-of-tree (potentially AI-generated or custom), from a third-party DKMS package, or from a vendor driver not shipped in the distribution kernel. Audit each one.

For eBPF programs currently loaded:

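# List all loaded BPF programs with their type, name, and load time
bpftool prog list --json | jq -r '.[] | [.id, .type, (.name // "-"), .loaded_at] | @tsv'

# For each program, show which processes currently hold it, and compare against
# known upstream tools (Cilium, Falco, systemd, bcc). The pids field is present
# only when bpftool was built with process-lookup support, and it is empty for
# pinned programs whose loader has exited.
bpftool prog list --json | jq -r '.[] | [.id, ((.pids // []) | map(.comm) | join(","))] | @tsv'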

Known upstream tools (Cilium, Falco, Tetragon, systemd-network, the kernel’s own samples) will load programs with predictable names and from known binary paths. Programs loaded from ad-hoc scripts or one-off binaries with names that don’t match any installed package warrant immediate review.

2. Always Prefer Upstream Kernel Features Before Writing Custom Code

The first question before writing any kernel code — whether AI-assisted or not — is whether the kernel already provides the feature through a stable, maintained interface.

# Rate limiting: nftables meter, no custom eBPF required
# Token bucket per source address, with configurable burst
nft add rule inet filter input \
  ip protocol tcp \
  meter conn_rate { ip saddr timeout 10s limit rate over 100/second burst 200 packets } \
  drop

# Verify the meter is operating:
nft list meter inet filter conn_rate

# Connection tracking inspection: conntrack userspace tool
# Read-only access to the kernel's conntrack table — no custom hook needed
conntrack -L
conntrack -E --event-mask=NEW    # Stream new connection events

# XDP-based packet dropping: use the kernel's built-in XDP samples
# rather than writing custom XDP programs for standard use cases
# linux/samples/bpf/ contains reviewed, tested reference implementations
ls /usr/src/linux-headers-$(uname -r)/samples/bpf/ 2>/dev/null || \
  find /usr/share/doc/bpf-examples/ -name "*.c" 2>/dev/null

The nftables rate limiting example above replaces the AI-generated eBPF program entirely. The conntrack example replaces any need for a custom netfilter hook for read-only state inspection. The kernel’s samples/bpf/ directory, found in the kernel source tree and in the linux-source package on Debian/Ubuntu, contains reference BPF programs that have been reviewed by BPF subsystem maintainers. They are illustrative rather than production-hardened, but they are a far better starting point than a blank prompt if a custom BPF program is genuinely required.

3. If Using AI-Generated eBPF: Validate Against libbpf-bootstrap and CO-RE

AI models trained on BPF code prior to 2022 tend to generate BCC-style programs. BCC (BPF Compiler Collection) compiles BPF programs at runtime using the kernel headers installed on the target system. When the kernel version changes, BCC programs may fail to compile against the new headers, or compile but access struct fields that have moved. The CO-RE (Compile Once, Run Everywhere) approach, available through libbpf with BTF (BPF Type Format), generates relocations that adapt to the running kernel’s actual struct layouts at load time. An AI generating BCC-style code for a system that requires portability across kernel versions is producing inherently fragile output.
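The consequence of a baked-in offset can be shown without a kernel. The sketch below is plain userspace C; both struct layouts are hypothetical stand-ins, not the real task_struct. It reads a name field from the newer layout using the offset computed against the older one, which is what a program without CO-RE relocations effectively does after a layout change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two hypothetical layouts of the same kernel struct. */
struct task_old { int32_t pid; char comm[16]; };
struct task_new { int32_t pid; uint64_t added_field; char comm[16]; };

/* Offset frozen into a program that was compiled against the old layout. */
#define STALE_COMM_OFFSET offsetof(struct task_old, comm)

/* Copies 16 bytes from a fixed offset, as a non-relocated program would. */
static void read_comm_at(const void *task, size_t offset, char out[16])
{
    assert(task != NULL);
    memcpy(out, (const char *)task + offset, 16);
}

/* Returns 1 if reading the new layout at `offset` yields the real name. */
static int read_matches(size_t offset)
{
    struct task_new t;
    char out[16];

    memset(&t, 0, sizeof(t));
    t.pid = 42;
    t.added_field = 0x4141414141414141ULL;
    strcpy(t.comm, "nginx");

    read_comm_at(&t, offset, out);
    return memcmp(out, t.comm, sizeof(out)) == 0;
}
```

read_matches(offsetof(struct task_new, comm)) succeeds, while read_matches(STALE_COMM_OFFSET) returns 0: the stale read stays inside the struct, raises no error, and returns bytes belonging to the added field. CO-RE exists to rewrite such offsets at load time.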

# Clone libbpf-bootstrap — the canonical starting template for CO-RE programs
git clone https://github.com/libbpf/libbpf-bootstrap /opt/libbpf-bootstrap
ls /opt/libbpf-bootstrap/examples/c/

# Compare the AI-generated program against the bootstrap templates:
# - Does it use SEC() macros correctly?
# - Does it use bpf_core_read() instead of direct struct field access?
# - Does it use ring buffers (BPF_MAP_TYPE_RINGBUF) rather than perf buffers
#   for event output? (perf buffers are per-CPU and require aggregation)
# - Does it handle map cleanup in error paths?

# Check whether the target kernel has BTF enabled (required for CO-RE)
ls /sys/kernel/btf/vmlinux
# If this file exists, CO-RE is available

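# Verify that the AI-generated object carries BTF sections (.BTF and .BTF.ext):
llvm-readelf -S ai_generated_prog.o | grep -F '.BTF'
# CO-RE relocation records live in .BTF.ext. No BTF sections = not CO-RE = likely
# to break on kernel updates. Their presence alone does not prove every field
# access is relocatable, so still check the source for bpf_core_read().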

4. Inspect AI-Generated eBPF Programs with bpftool Before Production

The verifier accepts a program that is memory-safe. It does not tell you what the program actually does at runtime. bpftool exposes the JIT-compiled and translated bytecode, the map definitions, and the program’s actual operation:

# Load the program in a test environment first, then inspect it
# bpftool shows the verifier-accepted bytecode translation
bpftool prog list
# Sample output:
# 42: xdp  name rate_limit  tag a1b2c3d4e5f60718  gpl
#         loaded_at 2026-05-08T10:23:41+0000  uid 0
#         xlated 312B  jited 198B  memlock 4096B  map_ids 7

# Dump the translated (xlated) bytecode: the BPF instructions after verifier rewrites
bpftool prog dump xlated id 42
# Review for unexpected memory accesses, map operations, or helper calls

# Dump the actual JIT output
bpftool prog dump jited id 42

# Inspect map definitions — check types match what the code expects
bpftool map list
bpftool map dump id 7

# Cross-check with an independent verifier
# https://github.com/vbpf/ebpf-verifier (PREVAIL) is an academic verifier built on
# abstract interpretation; it reasons about memory safety, not about policy
git clone https://github.com/vbpf/ebpf-verifier /opt/ebpf-verifier
cd /opt/ebpf-verifier && cmake -B build && cmake --build build
./build/check ai_generated_prog.o

The kernel verifier’s output is binary: the program loads, or it doesn’t. bpftool prog dump xlated shows you what instructions were actually accepted. The ebpf-verifier academic tool uses a different abstract domain from the kernel’s verifier and may flag programs that the kernel accepts. Like the kernel verifier, it reasons about safety, so neither tool confirms that the program implements the intended policy.

5. Enforce Kernel Module Signing to Prevent Unsigned Module Loading

If your infrastructure ever runs custom kernel modules — AI-generated or not — the minimum control is requiring all modules to be signed with a key the kernel trusts. An unsigned module cannot load on a kernel with CONFIG_MODULE_SIG_FORCE=y. This converts “AI-generated module gets dropped into /lib/modules/ and loaded” from a zero-friction attack path to one that requires key material the attacker does not have.

# Check whether the running kernel was built with module signature enforcement
grep CONFIG_MODULE_SIG /boot/config-$(uname -r)
# CONFIG_MODULE_SIG=y         — signing infrastructure present
# CONFIG_MODULE_SIG_FORCE=y   — unsigned modules are rejected (required)
# CONFIG_MODULE_SIG_SHA256=y  — SHA-256 digest (correct choice)

# If CONFIG_MODULE_SIG_FORCE is not set, the kernel accepts unsigned modules.
# Fixes: boot with module.sig_enforce=1 on the kernel command line, run a kernel
# built with FORCE enabled, or use Secure Boot with MOK enforcement.

# Generate an organisation signing key:
openssl req -new -x509 -newkey rsa:4096 \
  -keyout /etc/kernel-signing/signing_key.pem \
  -out /etc/kernel-signing/signing_key.x509 \
  -days 3650 -subj "/CN=Module Signing Key/O=Your Org" \
  -nodes

# Sign a module before deploying:
/usr/src/linux-headers-$(uname -r)/scripts/sign-file \
  sha256 \
  /etc/kernel-signing/signing_key.pem \
  /etc/kernel-signing/signing_key.x509 \
  /path/to/custom_module.ko

# Verify the module has a valid signature:
modinfo /path/to/custom_module.ko | grep signer
# signer:         Module Signing Key    (typically the certificate's CN)

# Attempt to load an unsigned module on a FORCE-enabled kernel:
insmod unsigned_module.ko
# Expected: insmod: ERROR: could not insert module unsigned_module.ko: Key was rejected by service
# (older kernels report "Required key not available" instead)

On systems with Secure Boot enabled, the MOK (Machine Owner Key) database provides a second layer: the module signing key must itself be enrolled in the UEFI MOK database, which requires physical presence at the UEFI console to approve. This prevents an attacker who has compromised the OS from enrolling their own key and signing arbitrary modules.
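A quick triage step before running modinfo across a fleet is to check whether a module file carries an appended signature at all. Signed modules end with the marker string “~Module signature appended~” followed by a newline. The sketch below is plain C with a hypothetical function name. It only detects the marker in an uncompressed module image already read into memory; it does not validate the signature or the key behind it, which remains the kernel’s job.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Marker that module signing appends after the signature data. */
static const char SIG_MAGIC[] = "~Module signature appended~\n";

/*
 * Returns 1 if the buffer ends with the marker, 0 otherwise.
 * A compressed module (.ko.xz, .ko.zst) must be decompressed first.
 */
static int has_sig_trailer(const unsigned char *buf, size_t len)
{
    size_t n = sizeof(SIG_MAGIC) - 1;   /* exclude the terminating NUL */

    assert(buf != NULL || len == 0);
    if (len < n)
        return 0;
    return memcmp(buf + len - n, SIG_MAGIC, n) == 0;
}
```

A module without the trailer is certainly unsigned. A module with it may still be signed by a key the kernel does not trust, so this check narrows the audit rather than replacing enforcement.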

6. Track AI-Generated Code Explicitly in Your SBOM

Any AI-generated kernel code that reaches production must be tracked with a clear provenance marker. Scanners that correlate against CVE databases will find no matches for custom code — that silence is not the same as safety. The SBOM annotation communicates to vulnerability management tooling that this component requires manual audit rather than automated CVE correlation:

# Add AI-generated component to SBOM with explicit provenance marker
# Using SPDX 2.3 format:
cat >> sbom.spdx << 'EOF'

PackageName: custom-rate-limiter-xdp
PackageVersion: 1.0.0
SPDXID: SPDXRef-custom-rate-limiter-xdp
PackageSupplier: NOASSERTION
PackageOriginator: NOASSERTION
PackageDownloadLocation: NOASSERTION
FilesAnalyzed: false
PackageLicenseConcluded: GPL-2.0-only
PackageComment: <text>AI-generated XDP rate limiter (model: claude-3-7-sonnet-20250219).
No upstream CVE feed. Manual security audit required. Kernel API
compatibility must be reverified on each kernel update. No coordinated
disclosure process for bugs found in this component.</text>
ExternalRef: SECURITY cpe23Type cpe:2.3:a:custom:rate-limiter-xdp:1.0.0:*:*:*:*:*:*:*
EOF

The SPDX 2.3 PackageOriginator field accepts only a Person, an Organization, or NOASSERTION, so the model name is recorded in PackageComment, which also communicates the maintenance gap to anyone who reads the SBOM later. This is not a security control. It is an audit trail that prevents the component from being treated as equivalent to an upstream dependency with an active CVE feed.

7. Decision Matrix: When AI-Generated Kernel Code Is Acceptable

Not all custom kernel code carries equal risk. The key variable is whether the code has write access to kernel memory, network state, or security-enforcing data structures:

AI-GENERATED eBPF: LOWER RISK (acceptable with review)
  - Read-only kprobe/tracepoint programs that only call bpf_perf_event_output()
    or bpf_ringbuf_output() — they read kernel state but do not modify it
  - Performance counters: BPF_MAP_TYPE_PERCPU_ARRAY counters aggregated in
    userspace — no network stack modification, no security enforcement
  - Process execution tracing: bpf_get_current_comm(), bpf_get_current_pid_tgid()
    for audit logging — read-only, limited blast radius if semantically wrong

AI-GENERATED eBPF: HIGHER RISK (require extensive review or avoid)
  - XDP or TC programs that drop or modify packets — semantic bugs have network-
    level consequences; incorrect drop rules affect availability
  - Programs that write to maps shared with userspace that controls security decisions
  - Programs attached to LSM hooks — bugs in security enforcement code
  - Any program using bpf_probe_write_user() — direct userspace memory writes

AI-GENERATED KERNEL MODULES: AVOID in production
  - Modules run at ring 0 with no isolation boundary
  - A single memory corruption bug is a full host compromise vector
  - No upstream maintenance; no CVE process; no LTS backport
  - If a custom module is genuinely required: write it from scratch with a named
    engineer taking maintenance ownership, code review, and testing against
    the kernel's own test infrastructure (KUnit, kselftest)

8. Monitor the Upstream Kernel Tree for APIs Your Custom Code Uses

If you ship custom kernel code — AI-generated or otherwise — you are responsible for tracking when the APIs it depends on change. The kernel’s linux-api list and the MAINTAINERS file identify who owns each subsystem:

# Check which kernel functions your module uses
nm -u /path/to/custom_module.ko | grep -v '__crc_' | awk '{print $2}'

# For each function, check whether it is exported and how.
# /proc/kallsyms does not record the export type; Module.symvers from the
# headers package does (columns: CRC, symbol, module, export type, namespace):
grep -w function_name /usr/src/linux-headers-$(uname -r)/Module.symvers
# EXPORT_SYMBOL_GPL: available to GPL-compatible modules only
# EXPORT_SYMBOL: available to all modules
# No match: not exported, so a module cannot link against it

# Track the relevant subsystem's git log for API changes
# Example: watching netfilter for changes to nf_conntrack interfaces
git clone https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git \
  /opt/nf-kernel-tree
git -C /opt/nf-kernel-tree log --oneline --since="30 days ago" \
  -- include/linux/netfilter/ net/netfilter/
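Checking export status by hand does not scale past a few symbols. The kernel build records every exported symbol in Module.symvers, one tab-separated line per symbol: CRC, symbol name, owning module, export type, and an optional namespace. The following is a minimal sketch of a classifier for one such line in plain C. The enum and function names are hypothetical, and the sample lines in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum export_kind { EXPORT_NONE = 0, EXPORT_ANY, EXPORT_GPL_ONLY };

/*
 * Classifies one Module.symvers line for the given symbol.
 * Returns EXPORT_NONE if the line is malformed or names another symbol.
 */
static enum export_kind symvers_line_kind(const char *line, const char *symbol)
{
    char sym[128], mod[256], type[64];

    assert(line != NULL && symbol != NULL);
    /* Skip the CRC, then read symbol, module, and export type. */
    if (sscanf(line, "%*s %127s %255s %63s", sym, mod, type) != 3)
        return EXPORT_NONE;
    if (strcmp(sym, symbol) != 0)
        return EXPORT_NONE;
    if (strcmp(type, "EXPORT_SYMBOL_GPL") == 0)
        return EXPORT_GPL_ONLY;
    if (strcmp(type, "EXPORT_SYMBOL") == 0)
        return EXPORT_ANY;
    return EXPORT_NONE;
}
```

Feeding it each line of Module.symvers for every undefined symbol reported by nm -u gives a per-kernel report. A symbol that classifies as EXPORT_NONE on the target kernel means the module will be rejected at load time with an unknown-symbol error.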

Expected Behaviour

After enforcing CONFIG_MODULE_SIG_FORCE=y, loading an unsigned AI-generated module produces an immediate, unambiguous error (the exact wording varies by kernel version):

$ insmod ai_rate_limiter.ko
insmod: ERROR: could not insert module ai_rate_limiter.ko: Key was rejected by service
$ dmesg | tail -1
[12345.679] Loading of unsigned module is rejected

After replacing an AI-generated eBPF rate limiter with the nftables meter equivalent, the nft list ruleset output includes the meter definition with its timeout and rate parameters:

table inet filter {
  meter conn_rate {
    type ipv4_addr
    size 65535
    flags dynamic,timeout
  }

  chain input {
    type filter hook input priority filter; policy accept;
    ip protocol tcp meter conn_rate { ip saddr timeout 10s limit rate over 100/second burst 200 packets } drop
  }
}

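The rate limit itself is a token bucket refilled at 100 tokens per second, which is the time-based behaviour the AI-generated eBPF program did not have. The timeout parameter (10s) only expires entries for sources that have gone idle, which keeps the set from growing without bound. The burst 200 packets parameter allows short bursts without dropping, matching expected browser behaviour for connection-heavy page loads.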

After loading the AI-generated eBPF rate limiter (for comparison) and inspecting it with bpftool, the map type gives the first hint of the concurrency bug:

$ bpftool map list
7: hash  name conn_count  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 4718592B

A hash map type (BPF_MAP_TYPE_HASH) holding a counter value is the cue to go back to the source and check how the value is updated. Here it is read with bpf_map_lookup_elem, incremented with a plain (*count)++, and written back, which is the non-atomic pattern. The same map would be fine with an atomic add on the value; alternatively, a per-CPU map (BPF_MAP_TYPE_PERCPU_HASH, shown as percpu_hash in bpftool map list) avoids sharing altogether. The map type alone does not prove the bug, but it tells you exactly which lines of the program to read.

Trade-offs

The upstream preference creates a feature gap. Not everything you need exists upstream. Some use cases are genuinely novel: an organisation-specific telemetry program that tags BPF events with internal identifiers, a custom XDP program for a proprietary hardware offload path, a kernel module for a device with no upstream driver. The argument is not that custom kernel code is always wrong — it is that it carries a maintenance and security burden that must be consciously accepted and resourced. “We will write and maintain this module and own the security response process for it” is a viable engineering decision. “An LLM will write this module and we’ll ship it” is not.

Module signing adds key management overhead. An organisation that enforces module signing needs a key management process: generating the signing key with appropriate entropy, storing it in a hardware security module or at minimum a secrets manager, rotating it when personnel who had access leave, and enrolling it in the MOK database on each host. For development workflows where engineers frequently compile and test kernel modules locally, enforcing module signing creates friction. The correct response is a development-specific signing key enrolled on developer machines, separate from the production key, with a policy that prohibits production deployment of modules signed only with the development key.

AI-generated eBPF for read-only tracing is genuinely lower risk. A tracepoint program attached to sched_process_exec that calls bpf_get_current_comm() and writes to a ring buffer has a very limited blast radius if it is semantically wrong. The worst case is that it reports incorrect data to a monitoring tool — a correctness problem, not a security problem. This is meaningfully different from an XDP program that drops packets or a module that hooks into the VFS layer. The risk calculation should be proportional to the privilege and write access the code exercises.

Failure Modes

Shipping AI-generated kernel modules to production because the tests pass. The kernel module loads without error. A test that fires SYN packets confirms the rate limiter drops packets above the threshold. The module is merged and deployed. Three weeks later, a kernel update renames or removes a kernel function the module depends on. On the next node reboot after the kernel update, the module fails to load with Unknown symbol in module (or fails to rebuild at all, if it is built on the host). The rate limiting feature silently disappears on any node that has rebooted. The problem presents as mysterious traffic behaviour rather than a clear error, because most monitoring does not check whether specific kernel modules are loaded.

Using BCC-style AI-generated eBPF on infrastructure that upgrades kernels. BCC compiles eBPF programs at runtime against the kernel header files installed on the host. When Ubuntu 24.04 ships a kernel update from 6.8 to 6.11, BCC programs that reference struct fields that were renamed or removed will fail to compile, and programs compiled against headers that do not match the running kernel (or precompiled without CO-RE relocations) will access the wrong memory. This is particularly dangerous for programs that read kernel data structures: if task_struct gains a field that shifts the offset of comm[], a program still using the old offset to read the process name will silently read whatever is now at that offset. The AI that generated the BCC-style program had no knowledge of which struct layouts are stable and which change between kernel versions, because that information lives in commit history rather than in the code itself.

No inventory of which production systems run custom vs. upstream kernel code. An organisation discovers a semantic bug in its AI-generated XDP packet filter — not through a CVE process, because there isn’t one, but through traffic analysis that reveals the rate limiter is not functioning correctly under load. The fix requires identifying every node where the affected BPF program is loaded. Without a deployment record that distinguishes custom BPF programs from those loaded by Cilium, Falco, or the kernel itself, the hunt requires connecting to each node and running bpftool prog list. At a scale of hundreds of nodes, this is a multi-hour incident response exercise for a bug that could have been avoided by using an upstream solution. You cannot patch what you cannot find.

Treating “passes the verifier” as equivalent to “is correct.” The kernel’s eBPF verifier is a memory-safety tool. It proves that the program cannot corrupt kernel memory. It does not prove that the program implements the intended policy. An organisation that reviews AI-generated eBPF by confirming it loads successfully and then checking that it appears to work in basic testing has conflated safety with correctness. The bugs that matter for security enforcement programs are semantic — the ones the verifier explicitly does not check. Review for correctness requires reading the generated code, understanding the map access semantics, verifying the race conditions, and validating against the upstream kernel documentation for the specific helpers and map types used. That review takes longer than generating the code. If you are not willing to spend the time on that review, the correct decision is to use an upstream kernel feature instead.