Hardening Linux AF_VSOCK Against VM-to-Host Escape
Problem
AF_VSOCK (Virtual Socket) is a socket address family designed for efficient communication between virtual machines and the host that runs them. Unlike network sockets that require a full IP stack, VSOCK uses a CID (Context Identifier) addressing model: CID 0 is reserved for the hypervisor itself, CID 1 is the local loopback address, CID 2 always refers to the host (the machine running the hypervisor and its VSOCK services), and each VM is assigned a unique CID of 3 or higher when it is launched. Services on the hypervisor host bind VSOCK ports; processes inside VMs connect to them.
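The addressing model can be made concrete with a short Python sketch. The numeric values are the well-known CIDs from the Linux UAPI header linux/vm_sockets.h; the helper names classify_cid and connect_to_host are illustrative only, and the port passed to connect_to_host would be whatever your own service uses.

```python
import socket

# Well-known CIDs as defined in <linux/vm_sockets.h>
VMADDR_CID_HYPERVISOR = 0      # reserved for hypervisor-internal use
VMADDR_CID_LOCAL = 1           # loopback within the same machine
VMADDR_CID_HOST = 2            # the host running the VMs
FIRST_GUEST_CID = 3            # guests are assigned CIDs from here upwards
VMADDR_CID_ANY = 0xFFFFFFFF    # wildcard, only meaningful when binding


def classify_cid(cid: int) -> str:
    """Return the role a VSOCK context identifier plays (illustrative helper)."""
    if cid == VMADDR_CID_ANY:
        return "any"
    if cid == VMADDR_CID_HYPERVISOR:
        return "hypervisor"
    if cid == VMADDR_CID_LOCAL:
        return "local"
    if cid == VMADDR_CID_HOST:
        return "host"
    if FIRST_GUEST_CID <= cid < VMADDR_CID_ANY:
        return "guest"
    raise ValueError(f"CID out of range: {cid}")


def connect_to_host(port: int) -> socket.socket:
    """Open a stream connection from a guest to a host-side VSOCK service."""
    s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    s.connect((VMADDR_CID_HOST, port))
    return s
```

From a guest's point of view, every host-side listener is one connect_to_host(port) call away, which is why the rest of this article treats each such listener as attack surface.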
The practical uses are pervasive: guest agents (VMware Tools, QEMU guest agent, AWS SSM agent), container runtime shims (containerd uses VSOCK for Firecracker micro-VM communication), nested virtualisation, and cloud provider metadata services all use VSOCK. In many environments, every VM runs at least one VSOCK-connected process.
The security problem is structural: VSOCK creates a direct, low-level communication channel from untrusted guest code to the hypervisor. Any vulnerability in a VSOCK-listening service on the hypervisor is reachable from inside every VM that machine hosts. And vulnerabilities have appeared regularly:
CVE-2021-26708 (Linux kernel VSOCK, fixed in 5.10.13): multiple race conditions caused by incorrect locking in net/vmw_vsock/af_vsock.c, enabling local privilege escalation. Inside a VM this gives an attacker control of the guest kernel, a stepping stone towards attacking the host.
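CVE-2022-26354: flaw in QEMU's vhost-vsock device in which, on an error path, an invalid element was freed without being detached from the virtqueue, letting a malicious guest cause memory leakage and other unexpected behaviour in the host-side QEMU process.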
CVE-2024-50264: use-after-free in the vsock/virtio transport, caused by a dangling vsk->trans pointer left behind during loopback communication and exploitable for kernel privilege escalation.
VMware Tools VSOCK exposure: VMware’s guest agent exposes a VSOCK service with a documented protocol. Research in 2024 demonstrated that under-validated message parsing in this agent created a command injection path from guest to host.
Beyond specific CVEs, VSOCK has a structural risk that is often overlooked in hardened VM deployments: the surface is invisible to standard network security tooling. Firewall rules, network ACLs, and packet capture do not see VSOCK traffic, so an attacker who reaches VSOCK-based services bypasses every network-layer control. The channel is fast, reliable, and by default leaves no audit trail.
Target systems: Linux KVM/QEMU virtual machines, AWS EC2 instances with SSM agent, VMware vSphere guests, Firecracker-based container environments, any Linux host running VSOCK-listening services (virtio-vsock, vhost-vsock).
Threat Model
Adversary 1 — Compromised VM code reaching hypervisor services. Access level: code execution inside a guest VM. Objective: connect to VSOCK ports on the hypervisor (CID 2), exploit a vulnerability in a listening service, achieve host code execution or read host memory.
Adversary 2 — Container escape via VSOCK in Firecracker. Access level: code inside a Firecracker micro-VM (used as a container sandbox). Objective: exploit a VSOCK vulnerability in the containerd-shim VSOCK listener to escape the Firecracker boundary and reach the host.
Adversary 3 — Malicious guest VSOCK packet injection. Access level: root inside a guest VM with VSOCK device access. Objective: send malformed VSOCK messages that trigger kernel bugs in the vhost-vsock backend on the host, corrupting host kernel memory.
Adversary 4 — VSOCK lateral movement between VMs. Access level: code inside one guest VM. Objective: connect to VSOCK ports on sibling VMs (if the hypervisor allows inter-VM VSOCK). Most hypervisors restrict this, but misconfigurations exist.
Without hardening: VSOCK is an unmonitored, unconstrained channel from guest to hypervisor. With hardening: Seccomp blocks AF_VSOCK socket creation in workloads that don’t need it; hypervisor-side service isolation limits blast radius; audit logging captures VSOCK connection patterns.
Configuration / Implementation
Step 1 — Audit current VSOCK usage
# List processes with open VSOCK sockets
ss --vsock --processes
# Or, using the address-family selector (-x would list UNIX sockets, not VSOCK):
ss -f vsock -a -p
# Check whether the vhost-vsock device is present
# (Run on the KVM/QEMU host, not inside a VM)
ls /dev/vhost-vsock 2>/dev/null && echo "vhost-vsock device present"
# Check which processes listen on VSOCK ports
# Inside a guest VM:
ss --vsock --listening
# On KVM host: find processes that hold /dev/vhost-vsock open (typically QEMU).
# AF_VSOCK sockets show up in /proc/<pid>/fd only as socket:[inode], so use
# "ss --vsock --processes" for those.
for fd_dir in /proc/[0-9]*/fd; do
    pid=${fd_dir#/proc/}
    pid=${pid%/fd}
    if ls -l "$fd_dir" 2>/dev/null | grep -q 'vhost-vsock'; then
        echo "PID $pid ($(cat "/proc/$pid/comm" 2>/dev/null)) has /dev/vhost-vsock open"
    fi
done
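The same /proc walk can be sketched in Python, which is easier to wire into an inventory or compliance job. This is a minimal sketch rather than a definitive tool: the function name find_device_holders and its parameters are assumptions made for illustration, and it needs root to see other users' file descriptors.

```python
import os


def find_device_holders(device="/dev/vhost-vsock", proc_root="/proc"):
    """Return sorted (pid, comm) pairs for processes that have `device` open."""
    holders = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "net"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were looking
            if target == device:
                try:
                    with open(os.path.join(proc_root, entry, "comm")) as f:
                        comm = f.read().strip()
                except OSError:
                    comm = "?"
                holders.append((int(entry), comm))
                break  # one hit per process is enough
    return sorted(holders)
```

Calling find_device_holders() on a KVM host should list one entry per running VM that has a vsock device attached; anything other than the expected VMM process deserves a closer look.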
Step 2 — Block AF_VSOCK via Seccomp for workloads that don’t need it
Most application workloads inside VMs have no legitimate need to open VSOCK sockets directly. Block the socket family:
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["socket"],
      "action": "SCMP_ACT_ERRNO",
      "errnoRet": 1,
      "args": [
        {
          "index": 0,
          "value": 40,
          "op": "SCMP_CMP_EQ"
        }
      ],
      "comment": "Block AF_VSOCK (40) socket creation"
    }
  ]
}
# Verify AF_VSOCK family number on your kernel
python3 -c "import socket; print(socket.AF_VSOCK)"
# Should print: 40
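Rather than hard-coding the family number, the profile can be generated from the constant the running system reports. The following is a minimal sketch: build_deny_vsock_profile is a hypothetical helper name, and the fallback value of 40 is only used on platforms where Python does not expose AF_VSOCK.

```python
import errno
import json
import socket


def build_deny_vsock_profile() -> dict:
    """Build a Seccomp profile that denies socket(AF_VSOCK, ...) with EPERM."""
    # Fall back to the Linux value when Python does not expose the constant
    af_vsock = int(getattr(socket, "AF_VSOCK", 40))
    return {
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [
            {
                "names": ["socket"],
                "action": "SCMP_ACT_ERRNO",
                "errnoRet": errno.EPERM,
                # Argument 0 of socket() is the address family
                "args": [{"index": 0, "value": af_vsock, "op": "SCMP_CMP_EQ"}],
            }
        ],
    }


def render_profile() -> str:
    """Serialise the profile as the JSON document a container runtime expects."""
    return json.dumps(build_deny_vsock_profile(), indent=2)
```

Writing the output of render_profile() to deny-vsock.json at build time keeps the profile and the kernel constant from drifting apart.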
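# Store the profile in a ConfigMap for distribution in Kubernetes.
# The ConfigMap alone enforces nothing: the kubelet loads Localhost profiles
# from its seccomp directory on each node (default /var/lib/kubelet/seccomp/),
# so deny-vsock.json must also be copied there on every node.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: seccomp-deny-vsock
  namespace: default
data:
  deny-vsock.json: |
    {
      "defaultAction": "SCMP_ACT_ALLOW",
      "syscalls": [{
        "names": ["socket"],
        "action": "SCMP_ACT_ERRNO",
        "args": [{"index": 0, "value": 40, "op": "SCMP_CMP_EQ"}]
      }]
    }
EOF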
For Kubernetes pods:
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: deny-vsock.json
  containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
Step 3 — Restrict VSOCK on the hypervisor side
On KVM/QEMU hosts, restrict which processes can access /dev/vhost-vsock:
# Check current permissions on vhost-vsock device
ls -la /dev/vhost-vsock
# Kernel default without udev rules: crw------- 1 root root (root only);
# many distributions ship a udev rule that makes it crw-rw---- root kvm
# If the device is more permissive than that, tighten it
chown root:kvm /dev/vhost-vsock
chmod 0660 /dev/vhost-vsock
# Restrict via udev rule
cat > /etc/udev/rules.d/90-vsock.rules << 'EOF'
KERNEL=="vhost-vsock", GROUP="kvm", MODE="0660"
EOF
udevadm control --reload-rules && udevadm trigger
# Verify: non-kvm processes cannot access the device
su -s /bin/bash nobody -c "cat /dev/vhost-vsock" 2>&1
# Expected: Permission denied
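For fleet-wide checks, the same permission test can be expressed as a small Python function that returns findings instead of printing them. This is a sketch under stated assumptions: audit_device and its parameters are hypothetical names, and the expected owner, group, and mode mirror the udev rule above.

```python
import grp
import os
import stat


def audit_device(path="/dev/vhost-vsock", expected_mode=0o660,
                 expected_group="kvm", expected_uid=0):
    """Return a list of findings; an empty list means the node is compliant."""
    findings = []
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)
    # Flag any permission bit that the expected mode does not grant
    if mode & ~expected_mode:
        findings.append(f"mode {mode:o} grants bits beyond {expected_mode:o}")
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # gid has no name on this system
    if group != expected_group:
        findings.append(f"group is {group}, expected {expected_group}")
    if st.st_uid != expected_uid:
        findings.append(f"owner uid is {st.st_uid}, expected {expected_uid}")
    return findings
```

A stricter mode than expected (for example 0600) passes the check, since the goal is only to catch permissions that are wider than intended.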
Step 4 — Harden VSOCK-listening services on the hypervisor
Services that legitimately listen on VSOCK should be hardened with minimal privileges:
# /etc/systemd/system/qemu-guest-agent.service: example hardening
# (qemu-ga itself runs inside the guest; the same directives apply to host-side VSOCK listeners)
[Unit]
Description=QEMU Guest Agent
[Service]
ExecStart=/usr/bin/qemu-ga --method=virtio-serial
# Run as dedicated user, not root
User=qemu-guest
Group=qemu-guest
# Restrict capabilities
CapabilityBoundingSet=
AmbientCapabilities=
# Restrict filesystem access
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
# Restrict syscalls
SystemCallFilter=@system-service
SystemCallErrorNumber=EPERM
NoNewPrivileges=true
For custom VSOCK services, validate all messages rigorously:
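// vsock_server.rs: secure VSOCK listener pattern
use std::io::Read;
use vsock::{VsockListener, VMADDR_CID_HOST};

// Largest request the service accepts (4 KiB)
const MAX_MSG_LEN: usize = 4096;
// Only these guest CIDs may connect (example values)
const ALLOWED_CIDS: [u32; 3] = [3, 4, 5];

fn secure_vsock_server() -> std::io::Result<()> {
    let listener = VsockListener::bind_with_cid_port(VMADDR_CID_HOST, 9999)?;
    for stream in listener.incoming() {
        // Per-connection failures are logged and skipped so that one
        // misbehaving guest cannot take the listener down
        let mut stream = match stream {
            Ok(s) => s,
            Err(e) => {
                eprintln!("Accept failed: {}", e);
                continue;
            }
        };
        // Get the peer CID and check that it is an expected guest
        let peer_cid = match stream.peer_addr() {
            Ok(addr) => addr.cid(),
            Err(e) => {
                eprintln!("Could not read peer address: {}", e);
                continue;
            }
        };
        if !ALLOWED_CIDS.contains(&peer_cid) {
            eprintln!("Rejected connection from unexpected CID: {}", peer_cid);
            continue;
        }
        // Read one byte more than the limit so oversized messages are detectable
        let mut buf = vec![0u8; MAX_MSG_LEN + 1];
        let n = match stream.read(&mut buf) {
            Ok(n) => n,
            Err(e) => {
                eprintln!("Read failed from CID {}: {}", peer_cid, e);
                continue;
            }
        };
        if n == 0 || n > MAX_MSG_LEN {
            eprintln!("Invalid message size: {} bytes from CID {}", n, peer_cid);
            continue;
        }
        // Parse and validate the message strictly; handle_message is
        // application-specific and not shown here
        if let Err(e) = handle_message(peer_cid, &buf[..n], &mut stream) {
            eprintln!("Handler error for CID {}: {}", peer_cid, e);
        }
    }
    Ok(())
}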
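What "validate rigorously" means in practice depends on the protocol. As an illustration only, the sketch below assumes a hypothetical wire format (2-byte big-endian payload length, 1-byte message type, then the payload) and rejects anything that deviates from it before the handler sees a single byte. The names parse_message, MAX_PAYLOAD, and ALLOWED_TYPES are assumptions, not part of any real agent protocol.

```python
import struct

MAX_PAYLOAD = 1024          # hypothetical upper bound for one payload
ALLOWED_TYPES = {1, 2}      # hypothetical message types, e.g. 1 = ping, 2 = status
HEADER = struct.Struct("!HB")  # 2-byte length, 1-byte type (network byte order)


class InvalidMessage(ValueError):
    """Raised when a frame violates the expected format."""


def parse_message(data: bytes):
    """Validate one complete frame and return (msg_type, payload)."""
    if len(data) < HEADER.size:
        raise InvalidMessage("truncated header")
    length, msg_type = HEADER.unpack_from(data)
    if msg_type not in ALLOWED_TYPES:
        raise InvalidMessage(f"unknown message type {msg_type}")
    if length > MAX_PAYLOAD:
        raise InvalidMessage(f"payload too large: {length} bytes")
    # The declared length must match what was actually received
    if len(data) != HEADER.size + length:
        raise InvalidMessage("length field does not match frame size")
    return msg_type, data[HEADER.size:]
```

The key design choice is that the length field is never trusted on its own: it is bounded first and then compared against the bytes actually received, which closes off the classic over-read and under-read parsing bugs.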
Step 5 — Enable VSOCK audit logging
VSOCK connections don’t appear in iptables logs or standard network logs. Add explicit audit:
# /etc/audit/rules.d/92-vsock.rules
# Audit socket() calls for AF_VSOCK
-a always,exit -F arch=b64 -S socket -F a0=40 -F key=vsock_socket
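# Audit connect() calls. The address family lives in the sockaddr rather than in
# a syscall argument, so this rule records every connect() and can be noisy.
-a always,exit -F arch=b64 -S connect -F key=vsock_connect
# Load the rules into the running kernel (no auditd restart is needed)
augenrules --load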
# Monitor VSOCK socket creation
ausearch -k vsock_socket --start today | \
grep -v "^----" | head -20
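To turn the raw records into something reviewable, the SYSCALL lines can be summarised per executable. This is a minimal sketch: summarise_vsock_events is a hypothetical helper, it assumes the raw key=value record format that auditd writes to its log file, and it relies on the fact that syscall arguments are logged in hexadecimal (so AF_VSOCK appears as a0=28).

```python
import re
from collections import Counter

AF_VSOCK = 40
# Matches key=value and key="quoted value" pairs in a raw audit record
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def summarise_vsock_events(lines, key="vsock_socket"):
    """Count AF_VSOCK socket() calls per executable in raw audit records."""
    counts = Counter()
    for line in lines:
        if not line.startswith("type=SYSCALL"):
            continue
        fields = {k: v.strip('"') for k, v in _FIELD.findall(line)}
        if fields.get("key") != key:
            continue
        try:
            family = int(fields.get("a0", ""), 16)  # arguments are logged in hex
        except ValueError:
            continue
        if family == AF_VSOCK:
            counts[fields.get("exe", "?")] += 1
    return counts
```

Feeding it the lines of the audit log gives a quick picture of which binaries create VSOCK sockets; any executable outside the expected set of agents is worth investigating.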
Step 6 — Apply VSOCK firewall rules via eBPF
For Firecracker or vhost-vsock environments where you need fine-grained filtering:
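// vsock_filter.bpf.c: BPF LSM program that filters outbound VSOCK connections.
// The cgroup/connect4 and cgroup/connect6 hooks only fire for AF_INET and
// AF_INET6 sockets, so the LSM socket_connect hook is used instead. This needs
// CONFIG_BPF_LSM and "bpf" in the active LSM list.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#define AF_VSOCK 40
#define EPERM 1

char LICENSE[] SEC("license") = "GPL";

// Allow list of permitted peer CIDs
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 256);
    __type(key, __u32);   // Peer CID
    __type(value, __u8);  // 1 = allowed
} allowed_cids SEC(".maps");

SEC("lsm/socket_connect")
int BPF_PROG(vsock_connect_filter, struct socket *sock,
             struct sockaddr *address, int addrlen, int ret)
{
    if (ret)
        return ret;  // An earlier LSM program already denied the call
    if (address->sa_family != AF_VSOCK)
        return 0;    // Allow non-VSOCK
    // struct sockaddr_vm layout: family (2 bytes), reserved (2), port (4), cid (4)
    __u32 peer_cid = 0;
    if (bpf_probe_read_kernel(&peer_cid, sizeof(peer_cid),
                              (const char *)address + 8))
        return -EPERM;  // Fail closed if the address cannot be read
    if (!bpf_map_lookup_elem(&allowed_cids, &peer_cid)) {
        bpf_printk("VSOCK blocked: CID %u not in allowlist\n", peer_cid);
        return -EPERM;  // Block
    }
    return 0;  // Allow
}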
Expected Behaviour
| Signal | Before hardening | After hardening |
|---|---|---|
| App container opens VSOCK socket | Succeeds | Blocked by Seccomp — EPERM |
| /dev/vhost-vsock permissions | May be world-readable | 0660, group kvm only |
| auditd logs VSOCK socket creation | Not captured | vsock_socket key fires |
| Guest connects to hypervisor VSOCK from unexpected CID | No logging, no filtering | Rejected by service-level CID allowlist |
| VSOCK service runs as root | Common default | Runs as dedicated user with restricted capabilities |
Verification:
# Inside a VM — confirm Seccomp blocks VSOCK
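python3 -c "
import errno, socket
try:
    s = socket.socket(40, socket.SOCK_STREAM)  # AF_VSOCK = 40
    s.close()
    print('FAIL: VSOCK socket created')
except OSError as e:
    # EPERM comes from the Seccomp filter; other errors (such as EAFNOSUPPORT)
    # only mean that VSOCK is unavailable in this kernel
    verdict = 'PASS' if e.errno == errno.EPERM else 'INCONCLUSIVE'
    print(f'{verdict}: {e}')
"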
# On host — confirm vhost-vsock permissions
stat -c "%a %U %G" /dev/vhost-vsock
# Expected: 660 root kvm
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Seccomp AF_VSOCK block | Eliminates VSOCK exploitation from app containers | Breaks workloads that legitimately need VSOCK (VM agents, container shims) | Apply only to application containers; exempt system agent pods/services |
| CID allowlist in VSOCK service | Limits which VMs can connect to the service | Requires knowing CIDs at service startup; CIDs can change | For cloud environments, use CID-to-instance metadata mapping; update allowlist via service restart on VM lifecycle events |
| eBPF VSOCK filtering | Kernel-level enforcement; cannot be bypassed by userspace | Requires Linux 5.10+; adds complexity | Use as belt-and-suspenders with service-level CID checks |
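The CID-churn cost in the table can be reduced by keeping the allowlist in a file that VM lifecycle hooks rewrite, so the service picks up changes without a restart. The sketch below is one possible shape for that, not a prescribed design: the CidAllowlist class, the JSON file format (a plain list of integers), and the reload-on-mtime strategy are all assumptions made for illustration.

```python
import json
import os

FIRST_GUEST_CID = 3
VMADDR_CID_ANY = 0xFFFFFFFF


class CidAllowlist:
    """Allowlist of guest CIDs backed by a JSON file (a list of integers)."""

    def __init__(self, path):
        self.path = path
        self._mtime_ns = None
        self._cids = frozenset()
        self.reload_if_changed()

    def reload_if_changed(self) -> bool:
        """Re-read the file if its modification time changed; return True if reloaded."""
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns == self._mtime_ns:
            return False
        with open(self.path) as f:
            cids = frozenset(int(c) for c in json.load(f))
        # Reserved CIDs (hypervisor, local, host, wildcard) never belong to a guest
        if any(c < FIRST_GUEST_CID or c >= VMADDR_CID_ANY for c in cids):
            raise ValueError("allowlist contains a reserved or out-of-range CID")
        self._cids = cids
        self._mtime_ns = mtime_ns
        return True

    def allows(self, cid: int) -> bool:
        """Check a peer CID against the current contents of the file."""
        self.reload_if_changed()
        return cid in self._cids
```

A lifecycle hook would then write the new list to a temporary file and rename it into place, so the service never observes a half-written allowlist.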
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Seccomp blocks legitimate VM agent | VM agent cannot communicate with hypervisor; agent health checks fail | Agent logs show socket creation error; VM management plane loses contact | Add VSOCK socket to agent’s Seccomp exemption; use a targeted profile instead of blocking all AF_VSOCK |
| CID allowlist too restrictive | New VM cannot connect to hypervisor service; agent fails | New VM connectivity issues; agent logs show connection refused | Add new VM’s CID to the allowlist; automate via VM lifecycle hooks |
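| Kernel update changes VSOCK family number | Seccomp filter silently stops matching (unlikely: AF_VSOCK = 40 is part of the stable kernel ABI) | The verification test reports that a VSOCK socket was created | Verify with python3 -c "import socket; print(socket.AF_VSOCK)" after kernel updates and regenerate the profile if the value differs |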
Related Articles
- Linux Netlink Socket Hardening — the same pattern of restricting kernel socket families that have produced LPE vulnerabilities
- Linux Unprivileged Namespace Restriction — complements VSOCK hardening by limiting the capability grants that make VSOCK exploitation reachable
- Seccomp BPF Without Containers — applying Seccomp at the service level to restrict socket families including AF_VSOCK
- Firecracker and Kata CI Runners — Firecracker uses VSOCK for the containerd-shim interface; hardening that deployment
- Linux LPE Defence in Depth — layered controls that contain exploitation even when individual VSOCK vulnerabilities are discovered