Linux netfilter and nf_tables Security Hardening
Problem
Linux netfilter is the kernel framework that underpins every form of packet filtering and network address translation on Linux systems. iptables, nftables, conntrack, ipset, and every stateful firewall feature the kernel exposes are built on top of netfilter hooks embedded in the network stack. Since kernel 3.13, nf_tables has been the modern packet filtering subsystem, replacing iptables with a more expressive rule set language evaluated by a register-based virtual machine inside the kernel. Unlike iptables, which used fixed match/target modules, nf_tables allows userspace to compose arbitrary rule expressions — comparisons, arithmetic operations, set lookups, map lookups, and verdict expressions — that the kernel evaluates inline in the packet path. This flexibility is powerful for firewall authors and catastrophic when the expression evaluation engine has a bug.
CVE-2026-31414 was reserved on March 9, 2026 and published in May 2026. The vulnerability resides in the netfilter connection tracking expectations mechanism, specifically in nf_conntrack_expect. Connection tracking helpers are kernel modules that inspect application-layer protocols (FTP, SIP, IRC, H.323) and create “expectations” — pre-authorised connection entries in the conntrack table — to allow the secondary data connections these protocols establish. The unsafe helper lookup procedure in CVE-2026-31414 permitted improper access to connection tracking helpers through the expectation creation path, potentially allowing a local attacker to trigger privilege escalation or a denial of service by crafting malformed expectations. The vulnerability affects kernels in the 6.1–6.10 range. The nearly two-month gap between CVE reservation (March 9) and public disclosure (May 2026) gave major distributions adequate time to prepare kernel updates — but the reservation itself was publicly visible on MITRE from March 9, clearly signalling that a netfilter security fix was in progress to anyone watching the MITRE CVE feed.
The netfilter subsystem has a long and consistent history of high-severity CVEs. CVE-2022-1972 was an out-of-bounds write in nf_tables. CVE-2023-32233 was a use-after-free in nf_tables that enabled local privilege escalation to root and was demonstrated in working exploit code within days of disclosure. CVE-2024-1086 was another nf_tables use-after-free that was widely exploited in the wild; a working public exploit achieved local root on kernels from 5.14 to 6.6. CVE-2026-31414 continues this pattern. The attack surface is structurally large: nf_tables accepts complex rule expressions from userspace via netlink sockets; any process with CAP_NET_ADMIN can load nf_tables rules; because unprivileged user namespace creation grants CAP_NET_ADMIN within the new network namespace, and many distributions enable this by default, a completely unprivileged local user can reach the nf_tables expression engine. Bugs in expression evaluation, set/map lookup, or the verdict interpreter yield kernel arbitrary read/write primitives.
The open-source visibility problem with netfilter is well-documented and operationally relevant. The netfilter developers maintain two public git trees: git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git (the stable fixes tree) and nf-next.git (the development tree). Security fixes land in the nf tree and are visible in its commit log before being pulled into Linus’s mainline tree and subsequently into distribution kernels. CVE-2024-1086 illustrates the pattern precisely: the fix was committed to the nf tree on January 31, 2024 with commit f342de4e2f33e0e39165d8639387aa6c19dff660, touching nft_verdict_init() in nf_tables_api.c. The commit message described a use-after-free fix but carried no CVE reference. This commit was publicly visible in the nf tree for approximately three weeks before the CVE was formally assigned and before the fix propagated into Ubuntu, Debian, or RHEL kernel packages. Researchers monitoring nf.git identified the fix type from the diff — freeing a reference that was being held past its valid lifetime in the verdict path — and developed working exploits during this window. The same early-visibility pattern is expected to recur for every future netfilter security fix, including the class of bug addressed by CVE-2026-31414.
Effective defence against this class of vulnerability has two tracks. The first is architectural: eliminate or restrict the conditions under which unprivileged code can reach the nf_tables expression engine. The second is operational: monitor the nf tree for security-relevant commits and use that signal to trigger accelerated kernel patching, before distribution packages arrive. Neither track alone is sufficient. Monitoring the nf tree at https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git/log/ for commits touching net/netfilter/nf_conntrack*.c and net/netfilter/nf_tables*.c, and subscribing to netfilter-devel@vger.kernel.org, provides early warning. Cross-referencing commits against https://www.openwall.com/lists/oss-security/ for netfilter disclosure threads completes the picture.
Target systems: Linux kernel 6.1–6.10 for CVE-2026-31414 specifically; all kernels with nf_tables enabled and unprivileged user namespace creation permitted for the broader nf_tables LPE attack class — this covers Ubuntu 22.04 and later, Debian 11 and later, RHEL 9 and later, Fedora 37 and later, and most container-hosting systems.
Threat Model
-
Local unprivileged user via user namespace. An attacker with a local shell account runs
unshare -n— which requires no privileges on most modern distributions — gainingCAP_NET_ADMINwithin a new network namespace. The attacker then loads a crafted nf_tables ruleset that triggers CVE-2026-31414 or a related expression evaluation bug in the kernel. The result is a kernel arbitrary write primitive, used to overwrite a process credential structure or a function pointer, achieving full local privilege escalation to root. -
Container escape via
CAP_NET_ADMIN. Kubernetes pods running network plugins (Calico, Cilium, Flannel) frequently requireCAP_NET_ADMINwithin their network namespace. An attacker who achieves code execution inside such a pod — through a vulnerable application or a supply chain compromise — uses the granted capability to load a crafted nf_tables ruleset exploiting a use-after-free or out-of-bounds write, escaping the container’s network namespace boundary and obtaining kernel-level access on the host node. -
Patch-gap attacker monitoring the
nftree. An attacker monitorshttps://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git/log/for commits touchingnf_conntrack_expect.cornf_tables_core.c. When the commit fixing CVE-2026-31414’s unsafe helper lookup appears in the tree, the attacker reads the diff, identifies the precise code path that was changed (the expectation creation and helper assignment sequence innf_conntrack_expect), and begins developing a proof-of-concept against unpatched systems. With an approximately eight-week window between the upstream commit and the arrival of patched kernels in major distribution package repositories, the attacker has substantial time to produce a working exploit and deploy it against exposed targets before defenders apply the fix. -
Denial of service via conntrack exhaustion. An attacker able to send network traffic to a target (or able to influence traffic patterns through a compromised adjacent system) crafts packets designed to create large numbers of connection tracking entries or malformed expectations. Without conntrack table size limits, this can fill the entire conntrack table (
nf_conntrack_max), causing all new connection establishment to fail. On a system using automatic conntrack helper assignment (nf_conntrack_helper=1), crafted application-layer traffic can also trigger unexpected helper invocations that interact with the vulnerability class described in CVE-2026-31414.
The blast radius of an nf_tables or conntrack exploit is wide because the netfilter subsystem runs in kernel context with full kernel memory access. A successful local privilege escalation reaches every process, every credential, every secret on the machine. In a Kubernetes environment, a compromised node can be used to exfiltrate secrets from etcd-backed volumes, impersonate the kubelet, or pivot to other cluster nodes via the control plane. Reducing the blast radius requires eliminating the preconditions — unprivileged user namespace creation and automatic conntrack helper assignment — and maintaining a kernel patching cadence fast enough to close the patch-gap window.
Configuration / Implementation
Restricting Unprivileged User Namespace Creation
Unprivileged user namespace creation is the primary vector for local privilege escalation via nf_tables. A user who cannot create a new network namespace cannot reach CAP_NET_ADMIN without already having elevated privileges. On Debian and Ubuntu, a non-upstream sysctl knob controls this:
# Disable unprivileged user namespace creation (Debian/Ubuntu)
sysctl -w kernel.unprivileged_userns_clone=0
# Persist across reboots
echo "kernel.unprivileged_userns_clone=0" >> /etc/sysctl.d/99-netfilter-hardening.conf
sysctl -p /etc/sysctl.d/99-netfilter-hardening.conf
On RHEL, Fedora, and distributions without the unprivileged_userns_clone knob, use the upstream kernel parameter:
# Disable user namespaces entirely (upstream kernel)
sysctl -w user.max_user_namespaces=0
# Persist
echo "user.max_user_namespaces=0" >> /etc/sysctl.d/99-netfilter-hardening.conf
Verify the restriction is effective:
# Should fail with "unshare: unshare failed: Operation not permitted"
unshare -n ip link
The trade-off is real: kernel.unprivileged_userns_clone=0 breaks Podman rootless containers (which use user namespaces to avoid requiring root for container operations), Chrome and Chromium’s process sandbox (which uses a user namespace to isolate renderer processes), and Bubblewrap-based sandboxing tools used by Flatpak. Document this trade-off explicitly before deploying. On servers and production Linux hosts where rootless container tooling is not required, this is the single most effective control against nf_tables local privilege escalation.
Disabling nf_tables If Not Required
Systems using iptables-legacy as their firewall backend may have nf_tables loaded as a kernel module but unused. Removing the module eliminates the attack surface entirely.
# Check whether nf_tables is currently in active use
nft list ruleset
# Check which iptables backend is active
iptables-legacy-save | head -5
# If no rules exist in nf_tables and iptables-legacy is in use, unload the module
modprobe -r nf_tables
# Verify unload
lsmod | grep nf_tables
If modprobe -r nf_tables returns an error because the module is in use, identify what is holding it:
lsmod | grep -E "^nf_tables|^nft_"
To prevent nf_tables from loading on future boots:
echo "blacklist nf_tables" >> /etc/modprobe.d/blacklist-nftables.conf
update-initramfs -u # Debian/Ubuntu
# or
dracut --force # RHEL/Fedora
Note: blacklisting nf_tables prevents Docker and Kubernetes from using iptables-nft (the default on RHEL 9 and newer distributions). Verify your container runtime’s firewall backend before applying this control in a container hosting environment.
Conntrack Hardening
Three sysctl settings address the conntrack-specific attack surface, including the helper assignment behaviour directly implicated in CVE-2026-31414-class bugs.
# Cap the conntrack table to prevent exhaustion DoS
# Default is often unlimited or set based on system RAM; 131072 is reasonable for most servers
sysctl -w net.netfilter.nf_conntrack_max=131072
# Reduce the idle timeout for established TCP connections
# Default is 432000 (5 days); 3600 (1 hour) aggressively reclaims table entries
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=3600
# Disable automatic conntrack helper assignment
# This is the direct mitigation for CVE-2026-31414-class unsafe helper lookup bugs
# Conntrack helpers will no longer be automatically assigned to new connections
sysctl -w net.netfilter.nf_conntrack_helper=0
Persist these settings:
cat >> /etc/sysctl.d/99-netfilter-hardening.conf << 'EOF'
net.netfilter.nf_conntrack_max=131072
net.netfilter.nf_conntrack_tcp_timeout_established=3600
net.netfilter.nf_conntrack_helper=0
EOF
sysctl -p /etc/sysctl.d/99-netfilter-hardening.conf
Verify the helper setting took effect:
cat /proc/sys/net/netfilter/nf_conntrack_helper
# Expected output: 0
Monitor current conntrack table utilisation to avoid inadvertently setting nf_conntrack_max too low:
# Current number of conntrack entries
cat /proc/sys/net/netfilter/nf_conntrack_count
# Should be well below nf_conntrack_max; if >80%, increase max
The nf_conntrack_helper=0 setting disables automatic helper assignment globally. If your environment uses NAT traversal for FTP, SIP, or IRC — protocols that require conntrack helpers for ALG (Application Layer Gateway) functionality — you must explicitly configure helpers on a per-connection basis using nft ct helper rules rather than relying on automatic assignment. This is covered in the Trade-offs section.
Restricting Netlink Socket Access via Seccomp
Processes that do not need to configure the firewall should not be able to open AF_NETLINK sockets, which are the interface through which userspace communicates with netfilter and nf_tables. A seccomp profile blocking AF_NETLINK socket creation for non-firewall workloads eliminates the ability of a compromised process to reach nf_tables at all.
The seccomp rule to block socket(AF_NETLINK, ...) — AF_NETLINK is address family 16:
{
"defaultAction": "SCMP_ACT_ALLOW",
"syscalls": [
{
"names": ["socket"],
"action": "SCMP_ACT_ERRNO",
"errnoRet": 1,
"args": [
{
"index": 0,
"value": 16,
"op": "SCMP_CMP_EQ"
}
]
}
]
}
Save this as /etc/seccomp/block-netlink.json and apply it to non-network-admin workloads. In Kubernetes, apply it as a pod-level seccomp profile for pods that do not need firewall access:
apiVersion: v1
kind: Pod
metadata:
name: app-pod
spec:
securityContext:
seccompProfile:
type: Localhost
localhostProfile: block-netlink.json
containers:
- name: app
image: app:latest
Store the seccomp profile in the kubelet’s seccomp profile directory (typically /var/lib/kubelet/seccomp/) and reference it with localhostProfile: block-netlink.json.
For systemd services, apply the equivalent restriction using RestrictAddressFamilies:
# In the [Service] section of a systemd unit file
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
# Omitting AF_NETLINK prevents the service from opening netlink sockets
Auditing netlink Socket Access
Track which processes attempt to open netlink sockets to identify unexpected access attempts:
# Add an audit rule to log AF_NETLINK socket creation
auditctl -a always,exit -F arch=b64 -S socket -F a0=16 -k netfilter_socket
# Watch the audit log for hits
ausearch -k netfilter_socket -ts today
Audit the currently loaded nf_tables ruleset to detect unexpected rules that may have been loaded by a compromised process:
# List the full ruleset with handle numbers for auditability
nft list ruleset -a
# Validate a ruleset file before loading it
nft -c -f /etc/nftables.conf
Monitoring the netfilter Tree for Silent Fixes
Establish a local clone of the nf tree and run a cron job to detect new commits before they acquire CVE numbers:
# Initial clone of the netfilter stable fixes tree
git clone https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git /opt/netfilter-nf --bare
# Weekly cron job: fetch and report new commits touching netfilter source files
cat > /etc/cron.d/netfilter-monitor << 'EOF'
0 8 * * 1 root git -C /opt/netfilter-nf fetch origin 2>/dev/null && git -C /opt/netfilter-nf log --oneline ORIG_HEAD..origin/main -- net/netfilter/ 2>/dev/null | grep -q . && git -C /opt/netfilter-nf log --oneline ORIG_HEAD..origin/main -- net/netfilter/ | mail -s "netfilter nf tree: new commits" security-team@example.com
EOF
A more robust daily monitoring script with focused filtering for high-risk paths:
#!/bin/bash
# /usr/local/sbin/netfilter-monitor.sh
# Run daily; alerts on new commits touching connection tracking or nf_tables core
NF_REPO=/opt/netfilter-nf
ALERT_EMAIL="${NETFILTER_ALERT_EMAIL:-security-team@example.com}"
HIGH_RISK_PATHS="net/netfilter/nf_conntrack net/netfilter/nf_tables"
git -C "$NF_REPO" fetch origin --quiet 2>&1
for path in $HIGH_RISK_PATHS; do
commits=$(git -C "$NF_REPO" log --oneline ORIG_HEAD..origin/main -- "$path" 2>/dev/null)
if [ -n "$commits" ]; then
echo "New commits in $path:"
echo "$commits"
echo ""
echo "Review at: https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git/log/"
fi
done | mail -s "[ALERT] netfilter nf tree new commits - $(date +%Y-%m-%d)" "$ALERT_EMAIL"
git -C "$NF_REPO" update-ref refs/previous/main origin/main 2>/dev/null || true
Cross-reference alerts with:
https://www.openwall.com/lists/oss-security/— search for “netfilter” threadshttps://osv.dev/list?ecosystem=Linux— kernel CVEs with netfilter tagsnetfilter-devel@vger.kernel.orgmailing list archives
Expected Behaviour
| Signal | Unpatched / default config | Patched + hardened |
|---|---|---|
Unprivileged user runs unshare -n then loads crafted nf_tables ruleset |
Succeeds; attacker reaches CAP_NET_ADMIN in new network namespace and submits nf_tables expressions to the kernel expression engine |
unshare -n fails with “Operation not permitted”; kernel never receives the crafted ruleset |
| Connection to FTP server triggers conntrack helper auto-assignment | nf_conntrack_ftp helper assigned automatically; CVE-2026-31414-class unsafe helper lookup reachable |
nf_conntrack_helper=0 prevents automatic assignment; helper lookup path is not reached for untrusted traffic |
| Attacker sends traffic crafted to exhaust conntrack table | Conntrack table fills without bound; all new connections fail; kernel may OOM or become unresponsive | nf_conntrack_max=131072 caps table size; nf_conntrack_tcp_timeout_established=3600 reclaims idle entries; table exhaustion is bounded |
| Compromised pod attempts nf_tables UAF exploit via netlink socket | Pod with CAP_NET_ADMIN successfully opens AF_NETLINK socket and loads crafted rules into the expression engine |
Seccomp profile blocks socket(AF_NETLINK, ...) for non-network-admin pods; pod receives EPERM; audit rule logs the attempt |
Patch-gap attacker monitors nf.git for CVE-2026-31414 fix commit |
Fix commit visible in nf tree for weeks before distribution kernels are updated; attacker develops exploit against unpatched systems |
Automated monitoring detects new commits in net/netfilter/nf_conntrack* within 24 hours; alert triggers accelerated kernel patching workflow |
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
kernel.unprivileged_userns_clone=0 |
Eliminates the primary unprivileged path to CAP_NET_ADMIN and the nf_tables expression engine; closes CVE-2023-32233, CVE-2024-1086, and CVE-2026-31414 class attacks for local unprivileged users |
Breaks Podman rootless (user namespaces required for uid mapping), Chrome/Chromium renderer sandbox (uses user namespace for isolation), Flatpak/Bubblewrap sandboxing | Use on servers and production nodes where rootless container tools are not deployed; document the requirement explicitly; use Podman with explicit root or a dedicated socket instead |
nf_conntrack_helper=0 |
Disables automatic conntrack helper assignment; directly mitigates CVE-2026-31414-class unsafe helper lookup; reduces attack surface in nf_conntrack_expect.c |
Breaks passive mode FTP, SIP call setup, IRC DCC, and other protocols that rely on ALG helpers for secondary connection authorisation | Configure explicit ct helper rules in nft for the specific protocols and ports your environment uses; document which conntrack helpers are intentionally enabled |
| nf_tables module blacklist | Removes the nf_tables expression engine from kernel memory entirely; eliminates the entire CVE class for kernels where nf_tables is unused | Prevents Docker (iptables-nft mode) and Kubernetes kube-proxy from loading iptables rules on RHEL 9+ and newer distributions that default to the nft backend | Switch container runtime and kube-proxy to the legacy iptables backend before blacklisting; verify with iptables -V that “legacy” appears in the output |
| AF_NETLINK seccomp restriction | Prevents compromised application processes from opening netlink sockets to interact with netfilter; contains post-exploitation capability | Breaks any tool inside the sandboxed process that legitimately needs netlink (network namespace introspection, CNI plugins, systemd-networkd interactions) | Apply only to well-understood workloads that provably never need netlink; do not apply to CNI plugin containers, network monitoring agents, or kube-proxy |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
nf_conntrack_helper=0 breaks FTP passive mode |
FTP clients connect successfully but PASV command returns a port that firewalls do not open; file transfers hang or fail with connection refused |
Test with ftp or lftp to a known-good FTP server; PASV connection times out; conntrack -L shows no expectation entries created for FTP |
Add an explicit nft ct helper rule: nft add rule inet filter input ct helper set "ftp-21"; load the nf_conntrack_ftp module; verify expectations appear with conntrack -L -p tcp --dport 21 |
| nf_tables blacklist prevents Docker iptables from loading | Docker daemon fails to start or reports “Failed to program FORWARD chain”; containers have no network connectivity; docker network ls shows networks but iptables -L shows no Docker rules |
journalctl -u docker shows iptables errors; `lsmod |
grep nf_tables` returns empty |
| Seccomp AF_NETLINK block breaks CNI plugin | Kubernetes pod networking fails; pods start but immediately enter CrashLoopBackOff or cannot reach cluster DNS; CNI plugin logs show socket errors |
kubectl describe pod shows container started but network setup failed; strace -e trace=network on the CNI binary shows socket(AF_NETLINK, ...) = -1 EPERM; auditd logs show netfilter_socket hits from the CNI process |
Remove the seccomp profile from the node’s CNI DaemonSet pods; apply the profile only to application pods; maintain a separate, less restrictive profile for network infrastructure pods |
| Conntrack table full causes connection drops | New TCP connections to the host fail with “connection refused” or time out; existing connections are unaffected; SSH to the host from a new source fails | cat /proc/sys/net/netfilter/nf_conntrack_count equals nf_conntrack_max; `dmesg |
grep "nf_conntrack: table full"shows recent messages;conntrack -L |