Reducing Your Applicable Linux Kernel CVE Count via Attack Surface Reduction

Problem

The Linux kernel is the most CVE-intensive piece of software in any Linux deployment. In 2024 alone, over 5,700 kernel CVEs were published — a number that has been accelerating as automated vulnerability discovery tools, better fuzzing infrastructure (syzkaller, Google’s OSS-Fuzz), and AI-assisted code analysis find vulnerabilities faster than ever.

The alarming number is deceptive in a useful way: the vast majority of kernel CVEs affect code paths that are not present or reachable on most production servers. A CVE in the Bluetooth L2CAP stack does not matter on a server with no Bluetooth hardware. A CVE in the Amateur Radio AX.25 subsystem is irrelevant on any cloud instance. A CVE in io_uring is not exploitable if io_uring is disabled.

The practical consequence: the number of kernel CVEs that apply to your specific deployment is far smaller than the total CVE count — but only if you have actively reduced the kernel’s attack surface. A default server kernel from most distributions ships with a wide range of compiled-in subsystems and dynamically loadable modules. Any CVE in those subsystems applies to your host, even if the functionality is never intentionally used.

The attack surface reduction leverage is high. Disabling Bluetooth removes a consistent source of kernel CVEs (multiple per year). Disabling obsolete networking protocols (IPX, ECONET, X.25, DECnet) eliminates entire CVE families on kernels that still ship them. Disabling io_uring on server workloads that do not use it removes exposure to the sustained series of privilege escalation CVEs published in 2022–2024. Each subsystem disabled is a category of CVEs that no longer requires tracking or patching on that host.

This approach complements, but does not replace, patching. Its value is reducing the scope of what must be patched and what must be monitored.

Target systems: production Linux servers (bare metal and VM); any host where kernel CVE tracking is part of the security programme; systems where kernel live patching is used and patch cost per CVE matters.


Threat Model

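Adversary 1: Local privilege escalation via unused subsystem. A server has never used Bluetooth, but the Bluetooth stack ships as a loadable module for the running kernel and is autoloaded on demand; an unprivileged socket(AF_BLUETOOTH, ...) call is enough to trigger the load. A CVE in the Bluetooth subsystem allows local privilege escalation. The server is exploited by a low-privilege attacker (a compromised service account, or code running inside a container). With the Bluetooth modules blocked from loading, the vulnerable code is not reachable. This only works for loadable modules: code built directly into the kernel image cannot be blocked this way.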

Adversary 2: io_uring LPE chain. A series of io_uring CVEs (2022–2024 pattern) allow privilege escalation from any process that can call io_uring_setup. Most server workloads do not use io_uring. Restricting it via sysctl removes this CVE category for unprivileged processes, or for all processes when io_uring is disabled entirely.

Adversary 3: Container escape via kernel network namespace CVE. A CVE in a rarely-used networking protocol allows manipulation of kernel state accessible from a container. Blocking the kernel module for that protocol from loading eliminates the escape vector.


Configuration / Implementation

Step 1 — Audit which kernel modules are currently loaded and used

# List all currently loaded kernel modules
lsmod | sort

# Identify loaded modules that nothing currently holds
# A "Used by" count of 0 means no other module or open handle pins the module
# right now; treat it as a removal candidate, not as proof that it is unused
lsmod | awk 'NR>1 && $3 == 0 {print "UNUSED (refcount=0): " $1}'

# Find all available modules on the system (much larger than loaded set)
find /lib/modules/$(uname -r) -name "*.ko*" | wc -l
# This number is the total module attack surface if modprobe is unrestricted

# Check which modules have had CVEs recently
# Cross-reference loaded modules against the kernel CVE tracker
# https://www.cve.org/CVERecord — search "kernel" + module name
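
The reference-count audit above can also be done against /proc/modules, which carries the same data as lsmod plus the list of dependent modules. The following is a minimal sketch, assuming the usual /proc/modules column layout (name, size, reference count, dependents); the `Module` record and both function names are illustrative, not part of any existing tool.

```python
from pathlib import Path
from typing import NamedTuple


class Module(NamedTuple):
    name: str
    refcount: int
    used_by: tuple


def parse_proc_modules(text: str) -> list:
    """Parse the content of /proc/modules into Module records."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        # Skip malformed lines and kernels that print "-" instead of a
        # reference count (built without module unloading support).
        if len(fields) < 4 or not fields[2].isdigit():
            continue
        # Fourth field: "-" for no dependents, otherwise a comma-terminated
        # list such as "btusb,btintel,"; bracketed flags are not module names.
        used_by = tuple(
            d for d in fields[3].split(",")
            if d and d != "-" and not d.startswith("[")
        )
        modules.append(Module(fields[0], int(fields[2]), used_by))
    return modules


def removal_candidates(modules: list) -> list:
    """Names of modules with reference count 0 and no dependent modules."""
    return sorted(m.name for m in modules if m.refcount == 0 and not m.used_by)


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    if proc_modules.exists():
        for name in removal_candidates(parse_proc_modules(proc_modules.read_text())):
            print(f"candidate: {name}")
```

A module listed as a candidate may still be needed by hardware that is idle at the moment, so the output is a starting point for review rather than a removal list.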

Step 2 — Map high-CVE kernel subsystems to module names

# High-CVE subsystems commonly present on servers that are rarely needed:

# Bluetooth — consistent CVE source, irrelevant on servers
BT_MODULES=(bluetooth btusb btbcm btintel btrtl bt822x)

# Amateur Radio
AMPR_MODULES=(ax25 netrom rose)

# Obsolete networking protocols
OBSOLETE_NET=(decnet econet ipx x25 appletalk)

# SCTP — rarely needed on most servers
SCTP_MODULES=(sctp)

# NFS client/server — if not using NFS
NFS_MODULES=(nfs nfsd)

# CAN bus — industrial protocol, not needed on most servers
CAN_MODULES=(can can_bcm can_raw can_gw)  # lsmod reports underscores, not hyphens

# Video/display modules on headless servers
VIDEO_MODULES=(drm drm_kms_helper ttm)

# Check which of these are currently loaded
echo "=== Loaded high-CVE modules that may not be needed ==="
for mod in "${BT_MODULES[@]}" "${AMPR_MODULES[@]}" "${OBSOLETE_NET[@]}" \
           "${SCTP_MODULES[@]}" "${CAN_MODULES[@]}"; do
    if lsmod | grep -q "^$mod "; then
        echo "LOADED: $mod"
    fi
done
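
One pitfall in the check above is spelling: modprobe treats "-" and "_" in module names as equivalent, but lsmod always prints underscores, so a plain string comparison can miss a loaded module. The sketch below normalises both sides before comparing. The group names and the `loaded_by_group` helper are hypothetical, and the sample list stands in for real lsmod output.

```python
# Candidate groups taken from the lists above; adjust per server role.
CANDIDATE_GROUPS = {
    "bluetooth": ["bluetooth", "btusb", "btbcm", "btintel", "btrtl"],
    "amateur_radio": ["ax25", "netrom", "rose"],
    "can": ["can", "can-bcm", "can-raw", "can-gw"],
}


def normalize(name: str) -> str:
    """Use the underscore spelling that lsmod and /proc/modules report."""
    return name.strip().replace("-", "_")


def loaded_by_group(groups: dict, loaded: list) -> dict:
    """Map each group to the candidate modules that are currently loaded."""
    loaded_set = {normalize(m) for m in loaded}
    report = {}
    for group, names in groups.items():
        hits = sorted({normalize(n) for n in names} & loaded_set)
        if hits:
            report[group] = hits
    return report


if __name__ == "__main__":
    # Sample module names standing in for the first column of lsmod.
    sample_loaded = ["can_bcm", "can", "ext4", "btusb"]
    for group, hits in loaded_by_group(CANDIDATE_GROUPS, sample_loaded).items():
        print(f"{group}: {', '.join(hits)}")
```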

Step 3 — Blacklist unused kernel modules

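# /etc/modprobe.d/attack-surface-reduction.conf
# Block modules that are not needed on this server type
# "install <module> /bin/false" is used rather than "blacklist <module>":
# blacklist only stops alias-based autoloading, while the install override
# also makes explicit modprobe requests fail
# Each blocked module removes a category of CVEs from applicability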

cat > /etc/modprobe.d/attack-surface-reduction.conf << 'EOF'
# Bluetooth — CVE-dense subsystem; not needed on servers
install bluetooth /bin/false
install btusb /bin/false
install btbcm /bin/false
install btintel /bin/false

# Amateur Radio protocols
install ax25 /bin/false
install netrom /bin/false
install rose /bin/false

# Obsolete/unused networking protocols
install decnet /bin/false
install econet /bin/false
install ipx /bin/false
install x25 /bin/false
install appletalk /bin/false

# TIPC — rarely needed
install tipc /bin/false

# RDS — rarely needed  
install rds /bin/false

# DCCP — deprecated
install dccp /bin/false

# CAN bus — industrial; not for general servers
install can /bin/false
install can-bcm /bin/false
install can-raw /bin/false

# SCTP — only needed if explicitly used
# Uncomment if your workload doesn't use SCTP:
# install sctp /bin/false

# USB audio — not needed on headless servers
install snd_usb_audio /bin/false
install snd_usbmidi_lib /bin/false

# Firewire — rarely present on modern servers
install firewire_ohci /bin/false
install firewire_sbp2 /bin/false
EOF

# Rebuild initramfs to apply blacklist
update-initramfs -u 2>/dev/null || dracut -f 2>/dev/null

# Unload currently loaded modules, dependents first (some require a reboot)
modprobe -r btusb bluetooth 2>/dev/null
modprobe -r netrom ax25 2>/dev/null
modprobe -r decnet econet ipx 2>/dev/null

# Verify that loading is blocked
modprobe --dry-run --verbose bluetooth
# Expected: install /bin/false
# A real "modprobe bluetooth" then fails with "Error running install command"
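
Writing the configuration file does not unload anything, and a module pulled in from the initramfs before the file took effect stays loaded. A quick way to catch that is to compare the blocked names against the loaded set. This is a minimal sketch, assuming the file only uses the `install <module> /bin/false` form shown above; `denied_modules` and `policy_violations` are illustrative names.

```python
FALSE_COMMANDS = {"/bin/false", "/usr/bin/false"}


def denied_modules(config_text: str) -> set:
    """Module names blocked via 'install <module> /bin/false' lines."""
    denied = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no directive
        parts = line.split()
        if len(parts) == 3 and parts[0] == "install" and parts[2] in FALSE_COMMANDS:
            # Store the underscore spelling that lsmod reports.
            denied.add(parts[1].replace("-", "_"))
    return denied


def policy_violations(config_text: str, loaded: list) -> list:
    """Blocked modules that are nevertheless loaded right now."""
    loaded_set = {m.replace("-", "_") for m in loaded}
    return sorted(denied_modules(config_text) & loaded_set)


if __name__ == "__main__":
    sample_config = "install bluetooth /bin/false\ninstall can-bcm /bin/false\n"
    sample_loaded = ["can_bcm", "ext4"]  # stand-in for lsmod output
    print(policy_violations(sample_config, sample_loaded))
```

Running a check like this after each kernel update gives an early signal if the configuration was not picked up by the new initramfs.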

Step 4 — Restrict io_uring via sysctl

# io_uring has been a consistent source of LPE CVEs (CVE-2022-29968,
# CVE-2023-2598, CVE-2024-0582, and others in the same family)
# Most server workloads do not use io_uring

# Disable io_uring for unprivileged users (sysctl available since Linux 6.6)
# This does not break workloads that don't use io_uring
echo 1 > /proc/sys/kernel/io_uring_disabled
# 0 = enabled (default)
# 1 = disabled for unprivileged processes outside kernel.io_uring_group
#     (processes with CAP_SYS_ADMIN can still create rings)
# 2 = disabled entirely

# Make permanent
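cat > /etc/sysctl.d/99-attack-surface.conf << 'EOF'
# Disable io_uring for unprivileged users; removes a family of LPE CVEs
# Set to 2 if no workloads on this host use io_uring at all
kernel.io_uring_disabled = 1

# Restrict unprivileged user namespace creation; removes container escape
# CVE families that require creating user namespaces
# This key exists only on Debian/Ubuntu kernels; elsewhere the closest
# equivalent is user.max_user_namespaces = 0, which disables user namespaces
# for all users
kernel.unprivileged_userns_clone = 0

# Disable the bpf() syscall for unprivileged users; removes the unprivileged
# eBPF verifier and JIT attack surface
kernel.unprivileged_bpf_disabled = 1

# Restrict perf_event_open to reduce kernel attack surface
# 2 = unprivileged users limited to user-space measurements (upstream maximum)
# 3 = no unprivileged access at all (honoured only by kernels carrying the
#     Debian/Ubuntu perf_event_paranoid patch)
kernel.perf_event_paranoid = 3

# Disable kexec; prevents loading a new kernel (limits LPE impact)
# One-way switch: it cannot be re-enabled until the next reboot
kernel.kexec_load_disabled = 1
EOF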

sysctl -p /etc/sysctl.d/99-attack-surface.conf
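
Several of these keys are version- or distribution-specific, so a key that is silently missing is as important to detect as one with the wrong value. The sketch below reads the live values from /proc/sys and classifies each as ok, mismatch, or missing. The `EXPECTED` table mirrors the settings above; `read_sysctl` and `audit` are illustrative helpers, and the path mapping assumes keys whose components contain no dots (true for the `kernel.*` keys used here).

```python
from pathlib import Path

EXPECTED = {
    "kernel.io_uring_disabled": "1",
    "kernel.unprivileged_userns_clone": "0",
    "kernel.unprivileged_bpf_disabled": "1",
    "kernel.perf_event_paranoid": "3",
    "kernel.kexec_load_disabled": "1",
}


def read_sysctl(key: str):
    """Return the live value of a sysctl key, or None if it cannot be read."""
    path = Path("/proc/sys", *key.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None  # key absent on this kernel, or not readable


def audit(expected: dict, read=read_sysctl) -> dict:
    """Classify each expected key as 'ok', 'missing', or 'mismatch (got X)'."""
    report = {}
    for key, want in expected.items():
        got = read(key)
        if got is None:
            report[key] = "missing"
        elif got == want:
            report[key] = "ok"
        else:
            report[key] = f"mismatch (got {got})"
    return report


if __name__ == "__main__":
    for key, status in audit(EXPECTED).items():
        print(f"{key}: {status}")
```

Passing the reader as a parameter keeps the comparison logic testable without touching the host; a "missing" result for a key usually means the control has to be achieved another way on that kernel.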

Step 5 — Cross-reference enabled subsystems against CVE feeds

#!/usr/bin/env python3
# scripts/kernel-cve-surface-check.py
# Cross-reference loaded kernel modules against recent CVE data

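import json
import platform
import re
import subprocess
import sys
import urllib.request
from datetime import date, timedelta

def get_loaded_modules() -> list[str]:
    """Return the names of currently loaded modules as reported by lsmod."""
    result = subprocess.run(["lsmod"], capture_output=True, text=True)
    # Skip the header row; the first column is the module name
    return [line.split()[0] for line in result.stdout.splitlines()[1:] if line.strip()]

def fetch_recent_kernel_cves(days: int = 90) -> list[dict]:
    """Fetch kernel vulnerabilities from OSV published in the last `days` days."""
    # OSV API: free, no auth required. The OSV schema names the kernel package
    # "Kernel" in the "Linux" ecosystem. Coverage varies, so an empty result
    # means "no data", not "no CVEs".
    url = "https://api.osv.dev/v1/query"
    # uname -r carries a distro suffix (e.g. 6.1.0-18-amd64); query the upstream
    # part only. This is an approximation because distro kernels backport fixes.
    version = platform.release().split("-")[0]
    query = json.dumps({
        "package": {"name": "Kernel", "ecosystem": "Linux"},
        "version": version,
    }).encode()
    cutoff = (date.today() - timedelta(days=days)).isoformat()

    try:
        req = urllib.request.Request(url, data=query,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req, timeout=10) as resp:
            vulns = json.loads(resp.read()).get("vulns", [])
    except Exception as e:
        print(f"Warning: Could not fetch CVE data: {e}", file=sys.stderr)
        return []
    # "published" is an RFC 3339 timestamp, so its date prefix sorts as a string
    return [v for v in vulns if v.get("published", "")[:10] >= cutoff]

def check_cve_module_overlap(modules: list[str], cves: list[dict]) -> list[dict]:
    """Find CVEs whose summary or details mention a loaded module name."""
    findings = []
    # Match whole words only and accept both "_" and "-" spellings. Names that
    # are also English words (such as "can") still produce false positives, so
    # treat the findings as leads to review, not as confirmed applicability.
    patterns = {
        mod: re.compile(r"\b" + re.escape(mod).replace("_", "[_-]") + r"\b",
                        re.IGNORECASE)
        for mod in set(modules)
    }

    for cve in cves:
        text = f"{cve.get('summary', '')} {cve.get('details', '')}"
        for mod, pattern in patterns.items():
            if pattern.search(text):
                findings.append({
                    "cve": cve.get("id", "unknown"),
                    "module": mod,
                    "summary": cve.get("summary", ""),
                    "severity": str((cve.get("database_specific") or {})
                                    .get("severity", "unknown")),
                })
                break

    return findings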

if __name__ == "__main__":
    print("Checking loaded kernel modules against recent CVEs...")
    modules = get_loaded_modules()
    print(f"Loaded modules: {len(modules)}")
    
    cves = fetch_recent_kernel_cves()
    print(f"Recent kernel CVEs fetched: {len(cves)}")
    
    if cves:
        findings = check_cve_module_overlap(modules, cves)
        if findings:
            print(f"\nPotentially applicable CVEs ({len(findings)}):")
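            # Rank severities explicitly; a reverse alphabetical sort would list LOW before HIGH
            rank = {"CRITICAL": 4, "HIGH": 3, "MODERATE": 2, "MEDIUM": 2, "LOW": 1}
            for f in sorted(findings, key=lambda x: rank.get(str(x["severity"]).upper(), 0), reverse=True):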
                print(f"  [{f['severity'].upper()}] {f['cve']}: {f['module']}")
                print(f"    {f['summary'][:80]}...")
        else:
            print("\nNo CVEs matched to currently loaded modules")
    
    # Report high-CVE modules that are loaded but potentially unnecessary
    HIGH_CVE_MODULES = {
        "bluetooth": "Bluetooth stack — consistent LPE CVE source",
        "ax25": "Amateur Radio — niche protocol, rarely needed",
        "ipx": "IPX — obsolete protocol",
        "can": "CAN bus — industrial protocol",
        "dccp": "DCCP — deprecated transport protocol",
        "tipc": "TIPC — cluster protocol, rarely used on general servers",
    }
    
    print("\n=== High-CVE modules currently loaded ===")
    found_any = False
    for mod, desc in HIGH_CVE_MODULES.items():
        if mod in modules:
            print(f"  LOADED: {mod} — {desc}")
            found_any = True
    if not found_any:
        print("  None — good attack surface hygiene")

Step 6 — Track surface reduction as a security metric

# Prometheus metrics for kernel attack surface
# Expose via node_exporter custom collector or textfile collector

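#!/bin/bash
# /etc/node_exporter/textfile_collector/kernel_surface.sh
# Run from cron. Write to a temporary file and rename it so that node_exporter
# never reads a partially written file:
# */15 * * * * /etc/node_exporter/textfile_collector/kernel_surface.sh > /var/lib/node_exporter/textfile_collector/kernel_surface.prom.tmp && mv /var/lib/node_exporter/textfile_collector/kernel_surface.prom.tmp /var/lib/node_exporter/textfile_collector/kernel_surface.prom

# Skip the lsmod header row when counting
LOADED_COUNT=$(lsmod | tail -n +2 | wc -l)
# grep -c prints 0 when nothing matches; default to 0 if the file is missing
BLACKLISTED=$(grep -c '^install .*/bin/false' /etc/modprobe.d/attack-surface-reduction.conf 2>/dev/null)
BLACKLISTED=${BLACKLISTED:-0}
IO_URING=$(sysctl -n kernel.io_uring_disabled 2>/dev/null || echo 0)
UNPRIV_NS=$(sysctl -n kernel.unprivileged_userns_clone 2>/dev/null || echo 1)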

cat << EOF
# HELP kernel_loaded_modules_total Number of currently loaded kernel modules
# TYPE kernel_loaded_modules_total gauge
kernel_loaded_modules_total $LOADED_COUNT

# HELP kernel_blacklisted_modules_total Number of blacklisted kernel modules
# TYPE kernel_blacklisted_modules_total gauge
kernel_blacklisted_modules_total $BLACKLISTED

# HELP kernel_io_uring_disabled io_uring restriction level (0=enabled, 1=unpriv-disabled, 2=disabled)
# TYPE kernel_io_uring_disabled gauge
kernel_io_uring_disabled $IO_URING

# HELP kernel_unprivileged_userns_disabled Whether unprivileged user namespaces are disabled
# TYPE kernel_unprivileged_userns_disabled gauge
kernel_unprivileged_userns_disabled $((1 - UNPRIV_NS))
EOF
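
The same textfile output can be produced from Python, which makes the atomic-write step explicit: the collector only reads files ending in .prom, so writing to a temporary name in the same directory and renaming it avoids exposing a half-written file. This is a sketch under those assumptions; `render_metrics` and `write_atomically` are hypothetical helpers, and the metric values would come from the checks described above.

```python
import os
import tempfile


def render_metrics(metrics: list) -> str:
    """Render (name, help_text, value) tuples as Prometheus gauge text."""
    lines = []
    for name, help_text, value in metrics:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def write_atomically(path: str, text: str) -> None:
    """Write text to path via a temporary file and an atomic rename."""
    directory = os.path.dirname(path) or "."
    # The temporary file lives in the target directory so the rename stays
    # on one filesystem; its ".tmp" suffix keeps the collector from reading it.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(text)
        os.chmod(tmp, 0o644)  # mkstemp creates the file as 0600
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


if __name__ == "__main__":
    sample = [
        ("kernel_io_uring_disabled", "io_uring restriction level", 1),
        ("kernel_blacklisted_modules_total", "Number of blacklisted kernel modules", 24),
    ]
    print(render_metrics(sample), end="")
```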

Expected Behaviour

| Control | Before | After |
|---|---|---|
| Bluetooth CVE published | Applicable (btusb loaded) | Not applicable (module blacklisted) |
| io_uring LPE CVE | Reachable from any process | Blocked for unprivileged users via sysctl |
| Amateur Radio protocol CVE | Applicable (ax25 present) | Not applicable (blacklisted) |
| Kernel CVE count requiring tracking | All kernel CVEs for the running version | Subset matching enabled subsystems only |
| lsmod output | 150+ modules | Reduced to modules actually needed |

Trade-offs

| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Module blacklisting | Reduces applicable CVE count; eliminates code paths | Some applications may unexpectedly require a blacklisted module | Audit applications before blacklisting; test in staging; check for module load failures |
| io_uring_disabled=1 | Removes sustained LPE CVE family | Unprivileged applications that use io_uring will fail unless they fall back to another I/O path | Before disabling, run grep -l io_uring /proc/*/maps as root to identify processes with mapped rings |
| unprivileged_userns_clone=0 | Eliminates container escape CVE classes | Rootless containers (Podman, Docker rootless) require user namespaces | Accept the trade-off on non-container hosts; do not disable on Kubernetes nodes or container hosts |
| Compiling a minimal kernel | Maximum CVE reduction | Complex; breaks hardware detection; not feasible for cloud instances | Prefer module blacklisting on existing kernels over recompilation |
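
To identify io_uring users before changing the sysctl, one option is to look for processes that hold an io_uring file descriptor. The sketch below assumes that such descriptors appear under /proc/<pid>/fd as links to `anon_inode:[io_uring]`, and that the scan runs as root so other users' processes are visible. Both function names are illustrative. It only sees rings that are open at scan time, so short-lived users can be missed.

```python
import os


def collect_fd_targets(proc_root: str = "/proc") -> dict:
    """Map each visible PID to the link targets of its open descriptors."""
    targets = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join(proc_root, entry, "fd")
        links = []
        try:
            for fd in os.listdir(fd_dir):
                try:
                    links.append(os.readlink(os.path.join(fd_dir, fd)))
                except OSError:
                    pass  # descriptor was closed while scanning
        except OSError:
            continue  # process exited, or we may not inspect it
        targets[int(entry)] = links
    return targets


def io_uring_pids(fd_targets: dict) -> list:
    """PIDs holding at least one io_uring descriptor, in ascending order."""
    return sorted(
        pid for pid, links in fd_targets.items()
        if any(l.startswith("anon_inode:") and "io_uring" in l for l in links)
    )


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for pid in io_uring_pids(collect_fd_targets()):
            print(pid)
```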

Failure Modes

| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Blacklisted module needed by unexpected dependency | Application fails to start because the module load is blocked | Running modprobe X by hand fails with "Error running install command" for the module | Remove the specific module from the blacklist; add a comment explaining why it is allowed |
| io_uring_disabled=1 breaks database with io_uring backend | PostgreSQL/MySQL with io_uring backend fails | Application error log; database startup failure | Set kernel.io_uring_disabled=0; evaluate if the database's io_uring use is optional |
| Module blacklist not applied after kernel update | New kernel image loads without blacklist | Verify with modprobe --dry-run --verbose bluetooth (expected output: install /bin/false) | Ensure /etc/modprobe.d/ config persists across kernel updates; verify in post-update runbook |
| Overly aggressive blacklisting breaks cloud provider tooling | Cloud agent fails (AWS SSM, GCP guest agent) | Cloud metadata and agent logs | Review cloud provider module dependencies before blacklisting; exclude those modules from the list |