eBPF-XDP for L4 DDoS Mitigation: Line-Rate Drop in the Kernel
Problem
Layer-4 floods (SYN floods, UDP amplification, raw packet floods at >1 Mpps) overwhelm a server long before the application gets a chance to respond. Every packet pays for the kernel’s path from NIC to application — driver receive, sk_buff allocation, conntrack, netfilter, socket lookup, application receive. At a few million packets per second of attack traffic, the host’s CPUs saturate on kernel-level packet bookkeeping alone.
XDP (eXpress Data Path) is an eBPF hook in the network driver that runs before the kernel allocates an sk_buff. A program at this hook decides:
- XDP_DROP — packet discarded immediately. No kernel memory allocated, no further processing. Cost: on the order of tens of nanoseconds per packet.
- XDP_TX — packet bounced back out the same NIC.
- XDP_REDIRECT — packet sent to another CPU or another NIC.
- XDP_PASS — packet enters normal kernel processing.
For DDoS mitigation, dropping at XDP gives line-rate filter capacity on commodity hardware: a 25 Gbps NIC drops the full attack rate while leaving CPU available for legitimate traffic.
By 2026 the tooling is mature. Cilium uses XDP for its load-balancer fast path; Katran (Facebook’s L4 LB) and Cloudflare’s edge use XDP for production DDoS mitigation; the xdp-tools suite (xdp-filter, xdp-loader, xdpdump) provides pre-built filters and loaders for common patterns.
The specific gaps in a default Linux server facing DDoS:
- iptables / nftables run in netfilter — long after sk_buff allocation. Drop happens, but the cost of allocating-and-discarding sk_buff per packet still saturates CPU.
- tcp_syncookies mitigates SYN floods specifically but doesn’t help against UDP amplification or generic floods.
- Cloud-provider DDoS protection (AWS Shield, Cloudflare) handles most volumetric traffic but not internal east-west floods or smaller-scale attacks below cloud-provider thresholds.
- Self-hosted edges (NGINX, HAProxy, custom load balancers) lack a kernel-level filter; everything reaches userspace.
This article covers writing simple XDP programs for SYN flood, UDP amplification, and rate-limiting; loading via bpftool and integration with Cilium; observability via per-action counters; the trade-offs vs. cloud-managed DDoS.
Target systems: Linux kernel 5.4+ (XDP native mode); 5.10+ for stable XDP-CPUMAP; NICs with native XDP driver support (Intel ixgbe / i40e / ice, Mellanox mlx5, Broadcom bnxt, virtio-net). Most cloud instances support XDP in generic mode (slower but functional).
Threat Model
- Adversary 1 — Volumetric SYN flood: botnet sends 5-50 million SYN packets/second to exhaust server connection-tracking and CPU.
- Adversary 2 — UDP amplification: spoofed-source UDP traffic to a service that responds with larger payloads (DNS, NTP, SSDP).
- Adversary 3 — Pulse-wave attack: short bursts of attack traffic followed by gaps; bypasses cloud-DDoS detection thresholds.
- Adversary 4 — Encrypted L4 flood: UDP / TCP packets to ports the server expects (HTTPS 443, DNS 53), so the cloud DDoS edge cannot drop based on protocol mismatch.
- Access level: all adversaries have network reach to the public IP; some have spoofing capability (BCP38-non-compliant networks).
- Objective: Service unavailability via CPU exhaustion, network-stack saturation, or downstream resource exhaustion.
- Blast radius: Without XDP-level filtering, a multi-Mpps flood saturates host CPU and triggers TCP-stack-level effects (conntrack table overflow, SYN-backlog exhaustion). With XDP filtering, the host stays available; legitimate traffic is unaffected.
Configuration
Step 1: Verify XDP Capabilities
# Check NIC driver support.
ip link show eth0
# 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> ... mode DEFAULT group default
# Native XDP support per driver.
ethtool -i eth0 | grep driver
# driver: ixgbe (or i40e, ice, mlx5, bnxt — these support native mode)
# Test XDP attach with a minimal pass-through program (xdp_pass.o — see the sketch below).
sudo ip link set dev eth0 xdpgeneric obj xdp_pass.o sec xdp
sudo ip link set dev eth0 xdpgeneric off
# (succeeds = generic mode works; use "xdpdrv" instead of "xdpgeneric" to test native mode)
For VMs in clouds, native XDP often isn’t available; fall back to generic mode (XDP_FLAGS_SKB_MODE), which is slower but still meaningfully better than netfilter.
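The test object referenced above can be any trivial program that returns XDP_PASS. A minimal sketch — the file name xdp_pass.c is my choice; compile it with the same clang invocation shown in Step 2:
// xdp_pass.c — minimal probe program: passes every packet untouched.
// Build: clang -O2 -g -target bpf -c xdp_pass.c -o xdp_pass.o
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_pass(struct xdp_md *ctx) {
    return XDP_PASS; /* no parsing, no maps — only proves the attach path works */
}

char _license[] SEC("license") = "GPL";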
Step 2: A Simple SYN Flood Filter
// xdp_synflood.c
// Drop SYN packets from sources exceeding rate limit; pass others.
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h> /* IPPROTO_TCP, IPPROTO_UDP */
#include <linux/tcp.h>
#include <linux/udp.h> /* struct udphdr, used by the UDP filter in Step 3 */
#include <bpf/bpf_helpers.h>
#define MAX_SOURCES 1000000
#define SYN_RATE_LIMIT 100 /* SYNs per second per source */
struct {
__uint(type, BPF_MAP_TYPE_LRU_HASH);
__type(key, __u32); /* source IP */
__type(value, __u64); /* upper 32 bits: window (seconds), lower 32 bits: SYN count */
__uint(max_entries, MAX_SOURCES);
} syn_rate_map SEC(".maps");
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, __u32);
__type(value, __u64);
__uint(max_entries, 4);
} stats SEC(".maps");
#define STAT_PASS 0
#define STAT_DROP_RATE 1
#define STAT_DROP_BAD 2
#define STAT_DROP_LIMIT 3
SEC("xdp")
int xdp_syn_filter(struct xdp_md *ctx) {
void *data = (void *)(long)ctx->data;
void *data_end = (void *)(long)ctx->data_end;
struct ethhdr *eth = data;
if ((void *)(eth + 1) > data_end) return XDP_DROP;
if (eth->h_proto != bpf_htons(ETH_P_IP)) return XDP_PASS;
struct iphdr *ip = (void *)(eth + 1);
if ((void *)(ip + 1) > data_end) return XDP_DROP;
if (ip->protocol != IPPROTO_TCP) return XDP_PASS;
if (ip->ihl < 5) return XDP_DROP; /* malformed IP header length */
struct tcphdr *tcp = (void *)ip + ip->ihl * 4;
if ((void *)(tcp + 1) > data_end) return XDP_DROP;
/* Only rate-limit SYN packets without ACK (initial connection). */
if (!(tcp->syn) || tcp->ack) return XDP_PASS;
__u32 src = ip->saddr;
__u64 *bucket = bpf_map_lookup_elem(&syn_rate_map, &src);
__u64 now_sec = bpf_ktime_get_ns() / 1000000000ULL;
if (!bucket) {
__u64 init_val = (now_sec << 32) | 1;
bpf_map_update_elem(&syn_rate_map, &src, &init_val, BPF_ANY);
__u32 k = STAT_PASS;
__u64 *s = bpf_map_lookup_elem(&stats, &k);
if (s) (*s)++;
return XDP_PASS;
}
__u32 window = (*bucket) >> 32;
__u32 count = (*bucket) & 0xFFFFFFFF;
if (window != now_sec) {
/* Reset window. */
__u64 new_val = (now_sec << 32) | 1;
bpf_map_update_elem(&syn_rate_map, &src, &new_val, BPF_ANY);
return XDP_PASS;
}
count++;
if (count > SYN_RATE_LIMIT) {
__u32 k = STAT_DROP_RATE;
__u64 *s = bpf_map_lookup_elem(&stats, &k);
if (s) (*s)++;
return XDP_DROP;
}
__u64 new_val = ((__u64)now_sec << 32) | count;
bpf_map_update_elem(&syn_rate_map, &src, &new_val, BPF_ANY);
return XDP_PASS;
}
char _license[] SEC("license") = "GPL";
Compile and load:
clang -O2 -g -target bpf -c xdp_synflood.c -o xdp_synflood.o
sudo ip link set dev eth0 xdp obj xdp_synflood.o sec xdp
Verify:
sudo bpftool prog list | grep xdp_syn_filter
sudo bpftool map dump name stats
# [
# { "key": 0, "value": [ /* PASS counts per CPU */ ] },
# { "key": 1, "value": [ /* DROP_RATE counts */ ] },
# ]
A SYN flood from a single source exceeding 100 SYNs/sec is dropped at the NIC. Legitimate connections from the same source pass through.
Step 3: UDP Amplification Filter (DNS / NTP)
// Drop UDP responses that don't match an outbound query in flight.
// Useful as a UDP amplification reflection drop.
SEC("xdp")
int xdp_udp_filter(struct xdp_md *ctx) {
void *data = (void *)(long)ctx->data;
void *data_end = (void *)(long)ctx->data_end;
struct ethhdr *eth = data;
if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
return XDP_PASS;
struct iphdr *ip = (void *)(eth + 1);
if ((void *)(ip + 1) > data_end) return XDP_PASS;
if (ip->protocol != IPPROTO_UDP) return XDP_PASS;
if (ip->ihl < 5) return XDP_PASS; /* malformed IP header length */
struct udphdr *udp = (void *)ip + ip->ihl * 4;
if ((void *)(udp + 1) > data_end) return XDP_PASS;
__u16 sport = bpf_ntohs(udp->source);
/* Reflection traffic arrives with source port 53 (DNS) or 123 (NTP). */
if (sport == 53 || sport == 123) {
/* On a host with no matching outbound query in flight, drop these responses. */
__u32 k = STAT_DROP_BAD;
__u64 *s = bpf_map_lookup_elem(&stats, &k);
if (s) (*s)++;
return XDP_DROP;
}
return XDP_PASS;
}
This is a coarse filter; a real deployment correlates UDP responses with outbound queries via a connection-tracking map. The principle: at XDP, drop traffic that has no business reaching this host.
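A minimal sketch of that correlation, meant to drop into the same file: an egress program (not shown) records the peer IP in udp_out_map whenever this host sends a DNS or NTP query, and the XDP filter passes responses only while such an entry is fresh. The map name, the 5-second freshness window, and the helper are illustrative choices, not a fixed recipe.
/* Populated by an egress hook (e.g., a TC egress BPF program, not shown):
 * key = remote server IP, value = bpf_ktime_get_ns() of the last outbound query. */
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __type(key, __u32);
    __type(value, __u64);
    __uint(max_entries, 65536);
} udp_out_map SEC(".maps");

static __always_inline int udp_response_allowed(__u32 peer_ip) {
    __u64 *sent = bpf_map_lookup_elem(&udp_out_map, &peer_ip);
    if (!sent)
        return 0; /* no query in flight: treat as reflection traffic */
    if (bpf_ktime_get_ns() - *sent > 5ULL * 1000000000ULL)
        return 0; /* stale: last query older than 5 seconds */
    return 1;
}

/* In xdp_udp_filter, replace the unconditional XDP_DROP with:
 *   return udp_response_allowed(ip->saddr) ? XDP_PASS : XDP_DROP;  */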
Step 4: IP Allowlist / Blocklist via XDP Maps
Often the simplest mitigation is “drop traffic from known-bad sources.” Maintain a BPF_MAP_TYPE_LPM_TRIE for CIDR blocklists:
struct cidr_key {
__u32 prefixlen; /* CIDR prefix length */
__u32 addr; /* IPv4 address */
};
struct {
__uint(type, BPF_MAP_TYPE_LPM_TRIE);
__type(key, struct cidr_key);
__type(value, __u32);
__uint(max_entries, 100000);
__uint(map_flags, BPF_F_NO_PREALLOC);
} blocklist SEC(".maps");
SEC("xdp")
int xdp_blocklist(struct xdp_md *ctx) {
/* ... parse to ip header ... */
struct cidr_key k = {.prefixlen = 32, .addr = ip->saddr};
if (bpf_map_lookup_elem(&blocklist, &k)) {
return XDP_DROP;
}
return XDP_PASS;
}
Userspace pushes blocklist updates to the map at runtime — no XDP program reload needed; a new entry takes effect on the next packet lookup.
# Add 192.0.2.0/24: key = prefixlen as a native-endian u32 (24 0 0 0) followed by the address bytes; value is a 4-byte u32.
sudo bpftool map update name blocklist key 24 0 0 0 192 0 2 0 value 1 0 0 0
Couple this to a threat-intel feed: pull blocked IPs from a service like Spamhaus DROP, AbuseIPDB, or your SIEM’s hot-list.
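Feed-driven updates can also go straight to the map via libbpf. A sketch, assuming the map has been pinned first (e.g., bpftool map pin name blocklist /sys/fs/bpf/blocklist); the pin path, file name, and command-line usage are illustrative:
// blocklist_update.c — push one CIDR into the pinned blocklist map.
// Build: cc blocklist_update.c -lbpf -o blocklist_update
// Usage: ./blocklist_update 192.0.2.0 24
#include <arpa/inet.h>
#include <bpf/bpf.h>
#include <linux/types.h>
#include <stdio.h>
#include <stdlib.h>

struct cidr_key {
    __u32 prefixlen;
    __u32 addr; /* network byte order, same layout as the XDP-side key */
};

int main(int argc, char **argv) {
    if (argc != 3) {
        fprintf(stderr, "usage: %s <ipv4> <prefixlen>\n", argv[0]);
        return 1;
    }
    int fd = bpf_obj_get("/sys/fs/bpf/blocklist"); /* assumed pin path */
    if (fd < 0) { perror("bpf_obj_get"); return 1; }
    struct cidr_key key = { .prefixlen = (__u32)atoi(argv[2]) };
    if (inet_pton(AF_INET, argv[1], &key.addr) != 1) {
        fprintf(stderr, "bad address: %s\n", argv[1]);
        return 1;
    }
    __u32 value = 1; /* presence is what matters; the value itself is unused */
    if (bpf_map_update_elem(fd, &key, &value, BPF_ANY)) {
        perror("bpf_map_update_elem");
        return 1;
    }
    return 0;
}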
Step 5: Integrating With Cilium
Cilium can use XDP for its load-balancer / NodePort acceleration (loadBalancer.acceleration: native). Only one XDP program attaches natively to a given interface at a time, so a custom DDoS filter either has to be chained ahead of Cilium’s program (e.g., via libxdp’s multi-program dispatcher) or run on an interface Cilium does not manage. bpftool net and the cilium bpf CLI show what is currently attached.
To steer traffic for a specific address and port into a local filtering Pod (a complement to node-level XDP, not a replacement), Cilium’s CiliumLocalRedirectPolicy redirects it to a node-local backend:
apiVersion: cilium.io/v2
kind: CiliumLocalRedirectPolicy
metadata:
  name: ddos-filter
spec:
  redirectFrontend:
    addressMatcher:
      ip: "192.0.2.1"
      toPorts:
        - port: "443"
          protocol: TCP
  redirectBackend:
    localEndpointSelector:
      matchLabels:
        app: ddos-filter
    toPorts:
      - port: "8443"
        protocol: TCP
For a high-traffic edge, treat XDP as part of the host-level setup (independent of Cilium); use Cilium for service-mesh and policy.
Step 6: Observability
XDP counters via per-CPU array maps. Aggregate to Prometheus:
# Read stats via bpftool, aggregate per-CPU.
sudo bpftool -j map dump name stats | jq '
  .[] | {
    action: .formatted.key,
    total: ([.formatted.values[].value] | add)
  }'
Wire into Prometheus via a small exporter:
# bpf_xdp_exporter.py
# Assumes a kernel with BTF, so bpftool emits "formatted" keys/values.
from prometheus_client import start_http_server, Gauge
import json, subprocess, time

ACTIONS = ["pass", "drop_rate", "drop_bad", "drop_limit"]
metric = Gauge("xdp_packets_total", "XDP per-action packet count", ["action"])

start_http_server(9100)  # expose /metrics before entering the polling loop
while True:
    out = subprocess.check_output(
        ["bpftool", "-j", "map", "dump", "name", "stats"]).decode()
    for entry in json.loads(out):
        action_idx = entry["formatted"]["key"]
        # Per-CPU array: sum the per-CPU counters for this action.
        total = sum(v["value"] for v in entry["formatted"]["values"])
        metric.labels(action=ACTIONS[action_idx]).set(total)
    time.sleep(1)
Alert rules:
- rate(xdp_packets_total{action="drop_rate"}[1m]) > 100000 — sustained SYN flood.
- rate(xdp_packets_total{action="pass"}[1m]) drops sharply while drop_* counters rise — active attack.
Step 7: Failure Recovery
Loaded XDP programs survive an interface going down and up, but not a host reboot. Reload at boot via systemd:
# /etc/systemd/system/xdp-ddos-filter.service
[Unit]
Description=Load XDP DDoS filter on eth0
After=network.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/sbin/xdp-loader load -m native eth0 /usr/local/lib/xdp/synflood.o
ExecStop=/usr/local/sbin/xdp-loader unload eth0 --all
[Install]
WantedBy=multi-user.target
Always include unload logic; a buggy XDP program loaded in native mode can wedge the NIC, and unloading via systemctl stop (ExecStop) is the recovery path.
Expected Behaviour
| Signal | Without XDP | With XDP |
|---|---|---|
| SYN flood at 5 Mpps | Host CPU saturated; legitimate traffic stalls | CPU usage stays near baseline; flood dropped at the NIC; legitimate traffic flows |
| Network-stack memory under flood | sk_buff allocations explode | bounded; flood drops before allocation |
| Latency for legitimate connection | Severely degraded | Unchanged |
| netfilter / iptables overhead | Linear with packet rate | Bypassed at XDP |
| Cloud DDoS provider triggers | At threshold | Below threshold (XDP absorbs) |
| Per-flood reaction time | Reactive (slowdown noticed) | Sub-second (XDP rate-limit) |
Synthetic test (use only against your own infrastructure):
# Generate a SYN flood from a single source (hping3; the achievable rate depends on the sender).
sudo hping3 -S -p 443 --flood 192.0.2.1
# On target with XDP loaded:
watch sudo bpftool map dump name stats
# Confirm drop_rate counter rises proportional to over-limit traffic.
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Native XDP on hardware NIC | Line-rate drop | Limited to driver-supported NICs | Use generic mode for VMs; functional but slower. |
| Per-source rate limit | Bounds noisy-neighbor floods | Memory for source-tracking map | LRU map sizes well; 1M entries fit in ~80 MB. |
| Programmability | Custom logic for unique attacks | eBPF programming complexity | Use existing libraries (xdp-filter, Katran) for common patterns; write custom only for special cases. |
| In-kernel speed | No userspace context switch | Debugging is harder than userspace code | Use bpftool for inspection; structured logging via per-CPU arrays. |
| Drop-and-forget | No latency from defensive logic | Legitimate traffic from rate-limited source dropped during burst | Tune limits to legitimate-burst tolerances; couple with state-aware logic. |
| Persistence via systemd | Survives NIC restart | Requires unload procedure on reboot | Include unload in systemd ExecStop; idempotent reload. |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| XDP program rejected by verifier | Loading fails | bpftool prog load error | Verifier errors are detailed; fix the BPF program. Common: too many map lookups in a loop. |
| Native XDP unsupported on NIC | ip link set xdp fails | Error: “operation not supported” | Use generic mode (xdpgeneric) or upgrade the NIC / driver. |
| Map table fills | Some sources unrate-limited | LRU evicts entries; overall function still works | Increase max_entries. LRU is forgiving here. |
| XDP wedges NIC | Network unreachable | Cannot reach host | Have an out-of-band recovery path (IPMI, console). Always test new XDP programs in a non-production environment first. |
| Verifier loop limit hit | Program rejected on update | Verifier log: loop too complex | Refactor so loops are provably bounded; use the bpf_loop helper (5.17+) for explicit iteration. |
| XDP and conntrack interaction | Connections drop unexpectedly | conntrack table grows; legitimate traffic dropped | Don’t rate-limit at XDP without considering conntrack state; combine with userspace policy. |
| False-positive drop | Legitimate clients rate-limited | Customer reports of blocked traffic | Per-IP limits misfire behind CGNAT, where many users share one address. Tune limits; consider identification beyond source IP (e.g., TLS fingerprints). |
When to Consider a Managed Alternative
Self-hosted XDP DDoS mitigation requires kernel tuning, NIC selection, BPF expertise, and 24/7 ops to tune limits as attacks evolve (10-30 hours/month for an exposed-edge fleet).
- Cloudflare Magic Transit / Spectrum: L4 DDoS at the edge; absorbs volumetric traffic before it reaches your origin.
- AWS Shield Advanced: L3/L4 protection with custom rate-rules.
- Google Cloud Armor: L7 + L4 protection with managed rules.
- OVH / DDoS-Guard: dedicated DDoS-mitigation providers for self-hosted edges.
For internal east-west traffic where cloud-edge protection doesn’t help, XDP remains the right answer.