VXLAN and Geneve Overlay Network Security

Problem

VXLAN (RFC 7348) and Geneve (RFC 8926) are the dominant overlay protocols in cloud-native environments. Kubernetes CNI plugins (Flannel, Cilium in VXLAN mode, Calico with vxlanMode: Always or CrossSubnet, and Antrea) can all tunnel pod traffic over VXLAN; Flannel does so by default, while Antrea defaults to Geneve unless configured otherwise. OpenStack Neutron uses VXLAN for tenant network isolation. Hypervisor overlays in VMware NSX and AWS VPC internals rely on similar encapsulation semantics. The question is not whether your environment uses overlay networks; it is whether you have secured them.

The fundamental security problem with VXLAN is explicit in the RFC: there is no authentication or encryption. VXLAN encapsulates Ethernet frames inside UDP datagrams sent to port 4789. Any host that can deliver a UDP packet to that port on a VTEP (VXLAN Tunnel Endpoint) can inject an arbitrary Ethernet frame into any VNI (VXLAN Network Identifier) it knows. The encapsulated frame will be decapsulated and delivered as if it originated from a legitimate tenant host. From the perspective of the receiving workload, the injected frame is indistinguishable from a genuine one.
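
To make the absence of any authentication field concrete, here is a minimal sketch of the 8-byte VXLAN header from RFC 7348: one flags byte with the VNI-present bit set (0x08), three reserved bytes, the 24-bit VNI, and one reserved byte. The function name vxlan_header_hex is a hypothetical helper used only for illustration.

```shell
#!/bin/sh
# Sketch: render the 8-byte VXLAN header (RFC 7348) for a given VNI as hex.
# vxlan_header_hex is a hypothetical helper name, not part of any tool.
vxlan_header_hex() {
  vni=$1
  # Reject values outside the 24-bit VNI space.
  if [ "$vni" -lt 0 ] || [ "$vni" -gt 16777215 ]; then
    echo "VNI out of range: $vni" >&2
    return 1
  fi
  # flags=0x08 (VNI present) | 3 reserved bytes | 24-bit VNI | 1 reserved byte
  printf '08000000%06x00\n' "$vni"
}

vxlan_header_hex 100   # prints 0800000000006400
```

Nothing in these eight bytes identifies or authenticates the sender, which is why the ACL and encryption controls later in this document have to be applied outside the protocol itself.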

VTEP spoofing exploits this directly. An attacker sends VXLAN UDP datagrams whose outer source IP is either its own address or a forged one matching a legitimate VTEP (a Kubernetes node, a hypervisor host, or an OpenStack compute node). With dynamic learning enabled, the receiving VTEP updates its forwarding database (FDB) to associate the inner source MAC with whatever outer source IP the packet carried. If that is the attacker’s address, future unicast traffic toward that MAC is tunnelled to the attacker; if it is a forged address, the traffic is misdirected to the wrong VTEP. This is the overlay equivalent of ARP spoofing, operating at the tunnel layer rather than the local segment.

BUM traffic — Broadcast, Unknown-unicast, and Multicast — is an amplification vector specific to overlay networks. In a multicast-based VXLAN deployment, a BUM packet sent to the VTEP multicast group reaches every VTEP participating in that VNI. An attacker generating high-rate BUM traffic causes every node in the flooding domain to process and forward the traffic. The amplification factor scales with the number of VTEPs. In unicast VXLAN deployments using head-end replication, each VTEP floods BUM to every other VTEP individually; the computational load moves from the network to the host CPU.

VNI enumeration is a reconnaissance technique that exploits the 24-bit VNI space (0 to 16,777,215) shared by VXLAN and Geneve. An attacker with access to UDP/4789 on even a single VTEP can probe VNIs by sending VXLAN packets with different VNI values and observing ICMP unreachables, timing differences, or responses from encapsulated stacks. A full scan of the 24-bit space at moderate packet rates completes in minutes. Discovered VNIs reveal tenant network topology and may expose VNIs with weaker isolation.
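
The "completes in minutes" claim can be checked with simple arithmetic. The sketch below computes the time to send one probe per VNI at a given rate; the helper name vni_scan_seconds and the 100,000 packets-per-second rate are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: seconds needed to probe every VNI once at a given packet rate.
# vni_scan_seconds is an illustrative helper; the rate below is an assumed example.
vni_scan_seconds() {
  pps=$1
  total=16777216          # 2^24 possible VNI values
  # Integer ceiling of total / pps
  echo $(( (total + pps - 1) / pps ))
}

vni_scan_seconds 100000   # prints 168
```

At the assumed rate the whole space takes 168 seconds, a little under three minutes, so rate limiting alone does not make enumeration impractical.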

Geneve (RFC 8926) extends VXLAN’s capabilities with extensible TLV (Type-Length-Value) option headers and supports multiple inner protocol types. It uses UDP port 6081. Geneve is the encapsulation format used by OVN (Open Virtual Network), the control plane backing Kubernetes networking in OpenShift and many bare-metal deployments. The TLV extension mechanism is frequently cited as an advantage — operators can embed tenant metadata, security context labels, or flow identifiers in the option headers. However, these options add complexity without adding security. There is no authentication in the base Geneve spec. A receiver cannot verify that a TLV option value was set by a legitimate sender rather than an attacker.

The Linux kernel networking stack underneath these overlays has accumulated CVEs alongside its growing feature set. CVE-2021-3773 (a netfilter flaw that lets a network-connected attacker infer tunnel endpoint information) and CVE-2022-0435 (a remotely triggerable stack overflow in the TIPC module) illustrate how kernel code adjacent to the overlay data path is part of the attack surface. Kernel version discipline, meaning tracking a maintained stable branch at 5.15 LTS or later, is a prerequisite, not a substitute, for configuration hardening.

The contrast with WireGuard-encrypted overlays is instructive. WireGuard enforces mutual authentication via public-key cryptography on every tunnel; there is no unauthenticated packet path. Cilium’s WireGuard encryption mode replaces VXLAN as the overlay transport entirely. For environments that must remain on VXLAN or Geneve — due to CNI compatibility, existing tooling, or hardware offload requirements — layering IPsec in transport mode over the VXLAN UDP flows provides encryption equivalent to WireGuard at the cost of additional key management complexity.

Target systems: Linux kernel >= 5.15, Flannel 0.21+, Cilium 1.14+ (VXLAN mode), Calico 3.26+ (VXLAN mode), Antrea 1.13+.
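
A minimal sketch of checking the kernel prerequisite from the target list, assuming a `uname -r` string of the usual major.minor.patch form. version_ge is a hypothetical helper, not a standard utility, and it compares only the major and minor components.

```shell
#!/bin/sh
# Sketch: succeed if version $1 (e.g. 5.15.0-91-generic) is >= $2 (e.g. 5.15).
# version_ge is a hypothetical helper; both arguments must contain "major.minor".
version_ge() {
  have=$1 want=$2
  have_major=${have%%.*}
  have_rest=${have#*.}
  have_minor=${have_rest%%[!0-9]*}   # keep leading digits of the minor part
  want_major=${want%%.*}
  want_rest=${want#*.}
  want_minor=${want_rest%%[!0-9]*}
  [ "$have_major" -gt "$want_major" ] && return 0
  [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]
}

if version_ge "$(uname -r)" 5.15; then
  echo "kernel meets the 5.15 baseline"
else
  echo "kernel is older than 5.15"
fi
```

A passing check says nothing about whether the distribution has backported specific fixes, so treat it as a coarse gate rather than a vulnerability assessment.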

Threat Model

Co-tenant VTEP injection. An attacker on a shared hypervisor or cloud compute node, co-located with legitimate workloads, sends crafted VXLAN UDP packets to a neighbouring VTEP. By setting the outer source IP to a legitimate VTEP’s address and choosing a known VNI, the attacker injects Ethernet frames into the target tenant’s overlay segment. No kernel exploit is required. Where VTEPs accept VXLAN from any source, an unprivileged process can inject frames with sendto() on an ordinary UDP socket; forging the outer source IP additionally requires a raw socket (CAP_NET_RAW) and an underlay that does not filter spoofed sources.
BUM amplification DoS. A network-adjacent attacker — or an insider on any node that can reach the multicast group or the VTEP unicast addresses — sends high-rate broadcast or unknown-unicast frames into a VNI. Every VTEP in the flooding domain receives and processes each packet. In a 500-node cluster where every node is a VTEP, a single-stream 100 Mbps BUM flood causes each of the 499 other nodes to process 100 Mbps. The effective amplification is close to the node count. CPU cycles consumed by VXLAN decapsulation are unavailable for tenant workloads.
VNI enumeration. An insider, contractor, or compromised node probes the 24-bit VNI space by iterating VNI values in VXLAN packets delivered to any accessible VTEP. Differences in ICMP responses, TCP RST behaviour from encapsulated stacks, or FDB side-channel timing reveal which VNIs are active, how many tenants are present, and the approximate size of each tenant’s overlay segment. This map accelerates targeted injection and cross-tenant reconnaissance.
ARP/ND spoofing inside the overlay. A compromised Kubernetes node injects ARP replies inside an active VNI, associating a peer pod’s IP with the attacker’s MAC address. Subsequent traffic to that pod is redirected to the compromised node. Unlike classic on-link ARP spoofing, the attack originates at the overlay layer, bypassing physical network ARP inspection. CNI plugins that do not implement ARP proxy or static FDB entries are vulnerable.
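
The BUM amplification figure in the list above can be reproduced with one line of arithmetic. This is a sketch under the same assumptions as the text (every node is a VTEP and every other node receives a full copy of the flood); bum_aggregate_mbps is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: total decapsulation load across a cluster caused by one BUM flood.
# bum_aggregate_mbps NODES FLOOD_MBPS -> aggregate Mbps processed by the other nodes.
bum_aggregate_mbps() {
  nodes=$1 flood_mbps=$2
  # Each of the (nodes - 1) receiving VTEPs processes the full flood rate.
  echo $(( (nodes - 1) * flood_mbps ))
}

bum_aggregate_mbps 500 100   # prints 49900
```

A 100 Mbps input therefore turns into 49,900 Mbps of aggregate decapsulation work in the 500-node example, which is why bounding the flood list matters more than raw link capacity.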

The blast radius of an unmitigated overlay breach is large. VXLAN and Geneve overlays frequently carry East-West traffic that does not pass through perimeter firewalls or service mesh sidecars. A compromised VTEP can silently inspect, modify, or drop traffic between any pods or VMs sharing a VNI, with no log entries at the application layer. Multi-tenant clusters — common in managed Kubernetes services and private cloud environments — face the additional risk that a breach of one tenant’s overlay segment enables lateral movement into adjacent VNIs if VTEP ACLs are absent.

Configuration / Implementation

VTEP access control with nftables

The first line of defence is limiting which source IPs can deliver VXLAN (UDP/4789) and Geneve (UDP/6081) packets to a VTEP. Build an nftables set of allowed VTEP source addresses and drop everything else. Apply this ruleset to every node in the cluster.

#!/usr/sbin/nft -f
# /etc/nftables.d/vtep-acl.nft
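# Reload with: nft -f /etc/nftables.d/vtep-acl.nft
# The next two lines make reloads idempotent; without them each reload
# appends duplicate rules. They also discard elements added at runtime.
table inet vtep_acl
delete table inet vtep_acl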

table inet vtep_acl {

  set allowed_vteps {
    type ipv4_addr
    flags interval
    # Enumerate all node IPs that act as VTEPs.
    # Automate population from your node registry.
    elements = {
      10.0.1.0/24,   # node subnet A
      10.0.2.0/24    # node subnet B
    }
  }

  chain input_vtep {
    type filter hook input priority filter - 5; policy accept;

    # Allow VXLAN only from known VTEPs
    udp dport 4789 ip saddr @allowed_vteps accept
    udp dport 4789 drop

    # Allow Geneve only from known VTEPs
    udp dport 6081 ip saddr @allowed_vteps accept
    udp dport 6081 drop
  }
}

Apply and persist:

# Install and test
nft -c -f /etc/nftables.d/vtep-acl.nft   # dry-run check
nft -f /etc/nftables.d/vtep-acl.nft

# Verify the set is loaded
nft list set inet vtep_acl allowed_vteps

# Persist across reboots (systemd). This assumes the ruleset file loaded by
# nftables.service includes /etc/nftables.d/*.nft.
systemctl enable --now nftables

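When nodes are added to the cluster, add their IPs to the running set without reloading the ruleset. Elements added this way are not written back to the file, so also record them in /etc/nftables.d/vtep-acl.nft (or regenerate the file), otherwise they are lost on the next reload: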

# Add a new node VTEP without replacing the full set
nft add element inet vtep_acl allowed_vteps { 10.0.3.5 }
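
Keeping the set in step with cluster membership is easier if the nft command is generated from a node list. The sketch below reads one IP per line and prints a single add-element command; render_vtep_elements is a hypothetical helper, and the kubectl query in the comment is an assumed source of truth that you would replace with your own node registry.

```shell
#!/bin/sh
# Sketch: turn a list of node IPs (one per line on stdin) into one nft command
# that adds them all to the allowed_vteps set. Prints nothing for empty input.
# render_vtep_elements is a hypothetical helper name.
render_vtep_elements() {
  awk 'NF { printf "%s%s", sep, $1; sep = ", " }
       END { if (sep != "") print "" }' |
  while IFS= read -r ips; do
    printf 'add element inet vtep_acl allowed_vteps { %s }\n' "$ips"
  done
}

# Example input. In a cluster the list might instead come from (assumption):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
printf '10.0.3.5\n10.0.3.6\n' | render_vtep_elements
```

The printed line can be reviewed and then piped to `nft -f -`, which applies it as a single transaction.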

VXLAN with IPsec transport encryption

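Linux’s ip xfrm subsystem can encrypt VXLAN UDP traffic at the IP layer in transport mode. This wraps each VXLAN UDP datagram in ESP while keeping the original outer IP addresses, so underlay routing between VTEPs is unchanged. The outer protocol becomes ESP (IP protocol 50), however, so the UDP header is no longer visible on the wire: underlay ECMP can no longer hash on the VXLAN source port, and NIC VXLAN offloads generally stop applying unless the NIC also offloads IPsec.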

Configure a transport-mode ESP policy between two VTEPs (10.0.1.10 and 10.0.1.20):

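# Generate the keys once, then copy the same values to both VTEPs over a
# secure channel (for illustration only; use IKEv2 in production)
SKEY=$(openssl rand -hex 32)   # 256-bit AES key
AKEY=$(openssl rand -hex 32)   # 256-bit HMAC-SHA256 key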

# On VTEP A (10.0.1.10):
# Inbound SA from VTEP B
ip xfrm state add src 10.0.1.20 dst 10.0.1.10 \
  proto esp spi 0x1002 mode transport \
  auth sha256 0x${AKEY} enc aes 0x${SKEY}

# Outbound SA to VTEP B
ip xfrm state add src 10.0.1.10 dst 10.0.1.20 \
  proto esp spi 0x1001 mode transport \
  auth sha256 0x${AKEY} enc aes 0x${SKEY}

# Policy: encrypt outbound VXLAN to VTEP B
ip xfrm policy add src 10.0.1.10/32 dst 10.0.1.20/32 \
  proto udp dport 4789 dir out \
  tmpl src 10.0.1.10 dst 10.0.1.20 proto esp mode transport

# Policy: require encrypted inbound VXLAN from VTEP B.
# Match on dport 4789: the sender picks the UDP source port per flow.
ip xfrm policy add src 10.0.1.20/32 dst 10.0.1.10/32 \
  proto udp dport 4789 dir in \
  tmpl src 10.0.1.20 dst 10.0.1.10 proto esp mode transport
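
VTEP B needs the mirror image of these policies, while the two `ip xfrm state` entries are installed unchanged on both hosts. The sketch below prints the policy pair for either side; render_vxlan_policies is a hypothetical helper that only prints commands and does not touch the kernel. It matches on destination port 4789 in both directions because the VXLAN source port is chosen per flow by the sender.

```shell
#!/bin/sh
# Sketch: print the two transport-mode policies for one side of a VTEP pair.
# render_vxlan_policies LOCAL_IP PEER_IP (hypothetical helper, output only).
render_vxlan_policies() {
  local_ip=$1 peer_ip=$2
  # Outbound: VXLAN this host sends to the peer (destination port 4789)
  printf 'ip xfrm policy add src %s/32 dst %s/32 proto udp dport 4789 dir out tmpl src %s dst %s proto esp mode transport\n' \
    "$local_ip" "$peer_ip" "$local_ip" "$peer_ip"
  # Inbound: VXLAN the peer sends to this host (still destination port 4789)
  printf 'ip xfrm policy add src %s/32 dst %s/32 proto udp dport 4789 dir in tmpl src %s dst %s proto esp mode transport\n' \
    "$peer_ip" "$local_ip" "$peer_ip" "$local_ip"
}

# VTEP B's view of the pair used above
render_vxlan_policies 10.0.1.20 10.0.1.10
```

Review the printed commands before running them as root on the corresponding host.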

For production, replace manual key management with StrongSwan IKEv2. A minimal /etc/swanctl/conf.d/vxlan-mesh.conf:

connections {
  vtep-mesh {
    version = 2
    local_addrs  = 10.0.1.10
    remote_addrs = 10.0.1.20
    proposals    = aes256gcm16-prfsha384-ecp384

    local {
      auth = pubkey
      certs = /etc/swanctl/x509/node-a.pem
    }
    remote {
      auth = pubkey
      certs = /etc/swanctl/x509/node-b.pem
    }

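    children {
      # VXLAN picks a per-flow UDP source port, so only the destination port
      # can be pinned to 4789. One child per direction; the peer's vxlan-udp
      # child matches this node's vxlan-udp-in.
      vxlan-udp {
        local_ts  = 10.0.1.10[udp]
        remote_ts = 10.0.1.20[udp/4789]
        mode      = transport
        esp_proposals = aes256gcm16-ecp384
        rekey_time = 3600
      }
      vxlan-udp-in {
        local_ts  = 10.0.1.10[udp/4789]
        remote_ts = 10.0.1.20[udp]
        mode      = transport
        esp_proposals = aes256gcm16-ecp384
        rekey_time = 3600
      }
    }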
  }
}

Load and verify:

swanctl --load-all
swanctl --initiate --child vxlan-udp
swanctl --list-sas   # confirm the IKE SA is ESTABLISHED and the child SA is INSTALLED
ip xfrm state list   # confirm SAs in kernel

Cilium WireGuard encryption mode

For clusters running Cilium 1.14+, switching from VXLAN to WireGuard encryption eliminates the need for IPsec management. WireGuard provides authenticated encryption natively; there is no unauthenticated path.

Switch via Helm values:

# values-cilium-wg.yaml
routingMode: native       # disable VXLAN encapsulation (replaces the deprecated 'tunnel: disabled')
encryption:
  enabled: true
  type: wireguard
  wireguard:
    userspaceFallback: false   # use kernel WireGuard (requires kernel >= 5.6)

Apply to an existing cluster:

helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  -f values-cilium-wg.yaml

Verify that all nodes have WireGuard tunnels active:

# Inside a Cilium agent pod (the binary is named cilium before Cilium 1.15)
kubectl -n kube-system exec ds/cilium -- cilium-dbg encrypt status
# Expected output includes:
#   Encryption: Wireguard
#   Interface: cilium_wg0
#   Public key: <node-pubkey>
#   Peers: <N peers negotiated>

# Check WireGuard interface directly
wg show cilium_wg0

WireGuard mode requires that nodes can reach each other’s WireGuard port (UDP/51871 by default in Cilium). Allow this port from the addresses in your allowed_vteps set with an explicit nftables rule.

BUM traffic limitation

Disable multicast-based VXLAN flooding and switch to unicast head-end replication with a static FDB. This bounds the BUM flooding domain to explicitly configured VTEPs.

Create a VXLAN interface in unicast mode (no group or remote default):

# Create VXLAN interface without a multicast group
ip link add vxlan100 type vxlan \
  id 100 \
  dstport 4789 \
  local 10.0.1.10 \
  nolearning        # disable dynamic MAC learning from the data plane

ip link set vxlan100 up
ip addr add 192.168.100.1/24 dev vxlan100

Populate the FDB with explicit per-VTEP entries. Use the all-zeros MAC to define flood targets (BUM goes only to these VTEPs):

# Add known VTEPs to the flood list
bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 10.0.1.20
bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 10.0.1.30

# Add a specific MAC-to-VTEP mapping (unicast, no flooding)
bridge fdb append aa:bb:cc:dd:ee:ff dev vxlan100 dst 10.0.1.20

# Verify the FDB
bridge fdb show dev vxlan100
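
With more than a handful of VTEPs the flood list is best generated rather than typed. The sketch below reads peer VTEP addresses from stdin and prints one flood entry per peer, skipping the local address; render_flood_fdb is a hypothetical helper and the peer list is example data.

```shell
#!/bin/sh
# Sketch: render_flood_fdb DEV LOCAL_IP
# Reads peer VTEP IPs (one per line) on stdin and prints one
# `bridge fdb append` flood entry per peer, skipping LOCAL_IP and blank lines.
render_flood_fdb() {
  dev=$1 local_ip=$2
  while IFS= read -r peer; do
    [ -n "$peer" ] || continue
    [ "$peer" = "$local_ip" ] && continue
    printf 'bridge fdb append 00:00:00:00:00:00 dev %s dst %s\n' "$dev" "$peer"
  done
}

# Example: the three VTEPs used in this section, rendered for 10.0.1.10
printf '10.0.1.10\n10.0.1.20\n10.0.1.30\n' | render_flood_fdb vxlan100 10.0.1.10
```

Feeding the same node list to every host keeps the flood lists symmetric, which matters because a VTEP missing from one peer's list silently stops receiving that peer's broadcasts.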

For Flannel, bind the VTEP to a specific interface rather than the wildcard address to prevent unintended exposure on additional interfaces:

# In flannel systemd unit or ConfigMap
--iface=eth0    # bind VTEP to this interface only, not 0.0.0.0

VNI isolation in multi-tenant environments

Assign a distinct VNI range per tenant and enforce strict VTEP ACLs so nodes belonging to tenant A cannot reach the VTEP port of nodes belonging to tenant B.

For Calico with VXLAN, configure per-pool vxlanMode: CrossSubnet to limit VXLAN tunneling to cross-subnet traffic and use direct routing within subnets:

# calico-ippool-tenant-a.yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: tenant-a-pool
spec:
  cidr: 10.100.0.0/16
  vxlanMode: CrossSubnet
  natOutgoing: true
  nodeSelector: "tenant == 'a'"

Use Linux network namespaces to ensure VXLAN interfaces for different VNIs live in separate namespaces, preventing cross-VNI packet leakage:

# Create isolated namespace for tenant B's VTEP
ip netns add tenant-b
ip link add vxlan200 type vxlan id 200 dstport 4789 local 10.0.1.10 nolearning
ip link set vxlan200 netns tenant-b

# Operate within the namespace
ip netns exec tenant-b ip link set vxlan200 up
ip netns exec tenant-b ip addr add 192.168.200.1/24 dev vxlan200

Geneve security options

Geneve’s TLV option headers can carry tenant metadata that the receiving VTEP validates before accepting inner frames. This does not authenticate the packet, but it provides a soft check that prevents accidental cross-tenant frame delivery from misconfigured endpoints.

Create a Geneve interface with explicit destination port and TOS inheritance:

ip link add geneve0 type geneve \
  id 100 \
  remote 10.0.1.20 \
  dstport 6081 \
  tos inherit \
  ttl 64

ip link set geneve0 up
ip addr add 192.168.100.1/24 dev geneve0

The tos inherit flag copies the inner frame’s DSCP markings to the outer IP header, which preserves QoS treatment through the physical underlay. For security purposes, Geneve offers no advantage over VXLAN without an authenticated transport (IPsec or WireGuard) underneath. Prefer VXLAN for CNI compatibility; prefer Geneve only when OVN/OVS is the data plane, since OVN encodes flow metadata in Geneve TLV options natively.

Monitoring VXLAN anomalies

Deploy an eBPF tc classifier to count VXLAN packets arriving from source IPs not in the allowed VTEP set and expose the count as a Prometheus metric:

# Attach a tc BPF filter to count unknown VTEP sources (illustrative; load compiled BPF object)
tc qdisc add dev eth0 clsact
tc filter add dev eth0 ingress protocol ip prio 1 bpf obj vtep-monitor.o sec classifier direct-action

# Verify filter is attached
tc filter show dev eth0 ingress

Prometheus alerting rule for unknown VTEP source traffic:

# prometheus-rules/vxlan-anomaly.yaml
groups:
  - name: vxlan_security
    rules:
      - alert: UnknownVTEPSourceTraffic
        expr: rate(vxlan_unknown_vtep_packets_total[5m]) > 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "VXLAN traffic from unknown VTEP source on {{ $labels.node }}"
          description: "{{ $value }} pps from unrecognised VTEP. Possible injection or misconfiguration."

For incident investigation, capture VXLAN traffic at the physical interface and decode the inner frames:

# Capture VXLAN on the underlay interface to a pcap file
tcpdump -i eth0 -n udp port 4789 -w /tmp/vxlan-capture.pcap

# Decode with tshark (shows both outer and inner headers)
tshark -r /tmp/vxlan-capture.pcap -d udp.port==4789,vxlan -V | head -80

# Quick summary of outer source IPs seen on port 4789
# (strip the trailing .port so counts are per IP, not per flow)
tcpdump -i eth0 -n -c 10000 udp port 4789 2>/dev/null | \
  awk '{sub(/\.[0-9]+$/, "", $3); print $3}' | sort | uniq -c | sort -rn | head -20
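
During an investigation it is often more useful to see only the sources that are not expected. The sketch below filters `tcpdump -n` output against an allowlist file of exact IPs, one per line; unknown_vtep_sources and the allowlist path in the usage comment are hypothetical, and CIDR ranges are deliberately not handled to keep the example short.

```shell
#!/bin/sh
# Sketch: unknown_vtep_sources ALLOWLIST_FILE
# Reads `tcpdump -n` lines on stdin and prints "count ip" for every outer
# source IP that is not listed (as an exact address) in ALLOWLIST_FILE.
unknown_vtep_sources() {
  awk -v allow="$1" '
    BEGIN { while ((getline ip < allow) > 0) ok[ip] = 1 }
    $2 == "IP" {
      src = $3
      sub(/\.[0-9]+$/, "", src)        # strip the trailing ".port"
      if (!(src in ok)) seen[src]++
    }
    END { for (s in seen) print seen[s], s }
  ' | sort -rn
}

# Usage (hypothetical allowlist path):
#   tcpdump -i eth0 -n -c 10000 udp port 4789 2>/dev/null | \
#     unknown_vtep_sources /etc/vtep-allowlist.txt
```

Any output at all on a node protected by the VTEP ACL suggests either that the capture point sits in front of the filter or that the allowlist and the nftables set have drifted apart.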

Expected Behaviour

Signal	Without hardening	With hardening
Rogue VTEP injection (UDP/4789 from unlisted IP)	Frame decapsulated and delivered to tenant overlay; FDB poisoned	nftables drops packet at physical interface; rogue VTEP cannot reach VXLAN stack
BUM amplification flood (high-rate broadcast to VNI)	All VTEPs in multicast group process and forward; CPU saturation on all nodes	Static unicast FDB limits flood targets to known VTEPs; BUM rate alerts fire within 1 minute
VNI scan (sequential VNI probe via UDP/4789)	Probes reach VXLAN stack; FDB responses and ICMP leakage reveal active VNIs	nftables drops probes from unlisted sources; only legitimate VTEPs can probe; monitoring alerts on probe pattern
Cross-tenant ARP spoof (ARP reply injected in overlay)	ARP cache poisoned in target tenant segment; traffic redirected to attacker	`nolearning` flag prevents dynamic FDB updates; static FDB entries enforced; Calico ARP proxy mode returns authoritative responses

Trade-offs

Aspect	Benefit	Cost	Mitigation
IPsec transport encryption over VXLAN	Encrypts all inter-VTEP traffic; compatible with existing VXLAN CNI deployments	5–15% throughput reduction per hop; CPU overhead without hardware offload; IKEv2 key management complexity	Use AES-GCM (hardware-accelerated on x86); deploy StrongSwan with automated cert renewal; use NICs with IPsec offload (e.g., Mellanox ConnectX-6)
Static VTEP FDB table	Eliminates dynamic MAC learning as an attack surface; disables BUM amplification via unknown sources	Operational burden: every node add or remove requires FDB updates on all other nodes	Automate FDB updates via node lifecycle hooks in Kubernetes (DaemonSet or CNI plugin); use a cluster-state-driven reconciler
Cilium WireGuard encryption mode	Zero unauthenticated packet paths; simple key management (Cilium-managed); no IPsec SA negotiation overhead	Requires Cilium as CNI; not portable to Flannel, Calico, or Antrea without replacing CNI; requires kernel >= 5.6	Standardise on Cilium for greenfield clusters; for brownfield, use IPsec transport as the encryption layer instead
nftables VTEP ACL maintenance	Blocks all unauthenticated VTEP injection from non-cluster IPs; simple to audit	Node IP changes (cloud instance replacement) temporarily break overlay until ACL is updated	Drive nftables set updates from the same source of truth (Kubernetes node object) that the CNI uses; update ACL before CNI is initialised on new nodes

Failure Modes

Failure	Symptom	Detection	Recovery
nftables ACL blocks new node VTEP	Pods on new node cannot communicate with pods on existing nodes; new node shows as Ready in Kubernetes but pod-to-pod traffic fails	`tcpdump -i eth0 udp port 4789` on an existing node shows no packets from new node’s IP; `nft list set inet vtep_acl allowed_vteps` does not include new node IP	Add new node IP to nftables set on all existing nodes: `nft add element inet vtep_acl allowed_vteps { <new-node-ip> }`; automate via node join hook
IPsec SA mismatch between VTEPs	Intermittent packet loss on paths between specific node pairs; loss disappears and reappears as SAs cycle	`swanctl --list-sas` shows SA in REKEYING or DELETING state; `ip xfrm state list` shows expired SAs; per-node traceroute drops on specific hops	`swanctl --terminate --ike vtep-mesh && swanctl --initiate --child vxlan-udp`; check system clock skew (NTP drift causes IKEv2 failures); verify certificate expiry
VXLAN FDB table overflow	Packet loss for unknown MACs after FDB capacity exceeded; `bridge fdb show` returns truncated output	`ip -s link show vxlan100` shows increasing RX drop counter; kernel log: `vxlan: fdb: maximum reached`; Prometheus alert on interface drop rate	Raise `maxaddress` if a limit is configured, or switch to `nolearning` with static entries; set `ageing` on FDB entries (`ip link set vxlan100 type vxlan ageing 300`); reduce FDB pressure by limiting pods per node or using ARP proxy
WireGuard key rotation disrupts existing connections	Active TCP connections through the Cilium WireGuard overlay drop during key rotation; brief connectivity gap (< 1 second typical)	`wg show cilium_wg0` shows `latest handshake` timestamp cycling; application-layer retransmit counters spike during rotation	WireGuard renegotiates session keys by itself through periodic handshakes (roughly every two minutes while traffic flows), so no manual rotation is needed; ensure application-layer retry logic handles < 2s gaps; set `encryption.wireguard.persistentKeepalive` where NAT mappings between nodes must be kept alive