Limiting NGINX Worker Process Blast Radius with OS-Level Controls
Problem
NGINX’s process model is deliberately simple: a master process runs as root to bind privileged ports (80, 443) and manage configuration; worker processes handle all request processing and run as an unprivileged user (nginx, www-data, or a custom account). The design intent is that even if a worker process is compromised, the attacker inherits only the unprivileged worker’s context — not root.
Recent NGINX CVEs demonstrate why this design deserves scrutiny rather than trust. CVE-2024-7347 (a buffer over-read in ngx_http_mp4_module), the HTTP/3 QUIC module vulnerabilities CVE-2024-24989 and CVE-2024-24990, and earlier memory corruption bugs all target the worker process. The published impact of these particular bugs is mostly worker termination or memory disclosure, but the same class of flaw can yield code execution, and when it does, the attacker’s code runs as the NGINX worker user. On a typical deployment, that user can:
- Read all files accessible to the worker — including SSL private keys, application configuration, and files that are world-readable on the host
- Make outbound network connections to arbitrary destinations (no egress restriction by default)
- Read /proc entries for other processes (though limited by ptrace restrictions)
- Write to any directory writable by the worker user
- Call almost any syscall — there is no Seccomp filter on NGINX workers by default
- On systems with lax filesystem permissions, pivot to application secrets or credentials
The gap between “runs as unprivileged user” and “fully contained” is significant. OS-level controls that the NGINX process model does not provide by default include: syscall filtering (Seccomp), network namespace isolation, filesystem namespace restrictions, and capability bounding sets.
These controls matter most in the window between CVE disclosure and patch deployment. If your emergency patching SLA is 7 days for critical vulnerabilities, these controls are your defence for those 7 days.
Target systems: any Linux host running NGINX as a public-facing web server or reverse proxy, including bare metal, VM, and other non-container deployments where NGINX is managed via systemd or init scripts. This article focuses on non-containerised NGINX; containerised deployments have different tooling.
Threat Model
Adversary 1 — RCE via memory corruption CVE. A vulnerability in an NGINX module (mp4, QUIC, image_filter) allows an attacker to achieve code execution in a worker process. With baseline hardening: attacker can read SSL private keys, make outbound connections, and attempt further privilege escalation. With OS-level controls: attacker is restricted to a narrow syscall whitelist and cannot reach most of the filesystem or external network.
Adversary 2 — SSRF via proxy_pass misconfiguration. A misconfigured proxy_pass directive allows the NGINX worker to make requests to internal services on behalf of an attacker. Without network namespace isolation: the worker can reach internal services on any network interface. With namespace isolation: the worker is limited to the network interfaces explicitly shared.
Adversary 3 — Post-exploitation privilege escalation. After achieving worker-level code execution, an attacker attempts to escalate to root via a kernel LPE. Without Seccomp: all LPE syscall chains are reachable. With Seccomp: the restricted syscall set eliminates most common LPE primitives.
Configuration / Implementation
Step 1 — Baseline worker user configuration
Before adding OS-level controls, ensure the worker runs with minimal permissions:
# /etc/nginx/nginx.conf
# Dedicated worker user with no login shell and no home directory
user nginx nginx;
# Worker process count — one per CPU core
worker_processes auto;
# Limit worker connections
events {
worker_connections 1024;
use epoll;
}
# Create dedicated user if it doesn't exist
useradd --system --no-create-home --shell /bin/false --user-group nginx
# Verify the user has no sudo rights and no writable home
id nginx
# uid=xxx(nginx) gid=xxx(nginx) groups=xxx(nginx)
# Ensure SSL private keys are NOT readable by the worker user
ls -la /etc/ssl/private/nginx.key
# Should be: -rw-r----- root ssl-cert (not readable by nginx)
# nginx master reads the key before dropping privileges; workers never need direct access
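The key-permission check above can be turned into a small repeatable test. The following is a minimal sketch, assuming GNU coreutils `stat`; the function name `key_perms_ok` is hypothetical, and the expected group (ssl-cert in this article) is passed in by the caller rather than assumed.

```bash
# Sketch: succeed only if the key file grants nothing to "other" and is
# owned by the expected group. Both arguments are supplied by the caller.
key_perms_ok() {
    key_file=$1
    expected_group=$2
    # stat -c is GNU coreutils syntax: %a is the octal mode, %G the group name
    mode=$(stat -c '%a' "$key_file") || return 2
    group=$(stat -c '%G' "$key_file") || return 2
    # Last octal digit of the mode holds the permissions for "other"
    other_bits=${mode#"${mode%?}"}
    if [ "$other_bits" != "0" ]; then
        echo "FAIL: $key_file is accessible to other (mode $mode)"
        return 1
    fi
    if [ "$group" != "$expected_group" ]; then
        echo "FAIL: $key_file has group $group, expected $expected_group"
        return 1
    fi
    echo "OK: $key_file mode $mode group $group"
}
```

For example, `key_perms_ok /etc/ssl/private/nginx.key ssl-cert` should succeed on the layout described above. The check is only meaningful if the worker user is not itself a member of that group, which `id -nG nginx` will show.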
Step 2 — Apply a Seccomp filter via systemd
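The most impactful control is restricting which syscalls NGINX can make. systemd applies the filter to the whole service, so it covers the master and every worker. With the filter below, a compromised worker cannot call ptrace or process_vm_readv for memory scanning, cannot load kernel modules, and loses access to many kernel LPE primitives. Note that systemd implicitly allows execve (it needs it to start the service), so limiting what a spawned shell can do relies on the other controls in this article: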
# /etc/systemd/system/nginx.service.d/seccomp-hardening.conf
[Service]
# Allowlist: start from systemd's predefined syscall groups
SystemCallFilter=@system-service @network-io @file-system @io-event @signal @timer
# Remove dangerous groups from the allowlist
SystemCallFilter=~@privileged @obsolete @reboot @swap @cpu-emulation @debug
# Specifically deny syscalls commonly used in kernel LPE exploits
SystemCallFilter=~ptrace process_vm_readv process_vm_writev userfaultfd
# Deny module-related syscalls
SystemCallFilter=~finit_module init_module delete_module
# Re-allow the setuid/setgid family removed with @privileged: the master needs it
# to switch workers to the unprivileged user (workers don't need it afterwards)
SystemCallFilter=@setuid
# For more restrictive setup, consider separate service units for master and workers
SystemCallArchitectures=native
# Additional hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=read-only
PrivateTmp=yes
PrivateDevices=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
RestrictNamespaces=yes
RestrictRealtime=yes
LockPersonality=yes
MemoryDenyWriteExecute=yes
# Paths NGINX may write to, followed by paths it may only read (config and served files)
ReadWritePaths=/var/log/nginx /var/cache/nginx /run/nginx
ReadOnlyPaths=/etc/nginx /usr/share/nginx /var/www
systemctl daemon-reload
systemctl restart nginx
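# Verify the filter was loaded into the unit
systemctl show -p SystemCallFilter nginx
# Should print a non-empty SystemCallFilter= list
# Verify Seccomp is active on the running master process
grep Seccomp /proc/$(systemctl show -p MainPID --value nginx)/status
# Should show: Seccomp: 2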
# Test that nginx still works
curl -I http://localhost/
# Expected: HTTP/1.1 200 OK
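To confirm that a specific syscall ended up allowed or denied after all the SystemCallFilter= lines were combined, the resolved filter can be checked mechanically. This is a sketch under one assumption: that `systemctl show -p SystemCallFilter nginx` prints the resolved filter as a single `SystemCallFilter=` line, with a leading `~` when the list is a deny-list. The function name `filter_allows` is hypothetical.

```bash
# Reads one "SystemCallFilter=..." line on stdin and succeeds if the named
# syscall is permitted by it. A leading "~" marks a deny-list; an empty list
# means no filter is configured, so everything is permitted.
filter_allows() {
    wanted=$1
    IFS= read -r line || [ -n "$line" ] || return 2
    list=${line#SystemCallFilter=}
    [ -n "$list" ] || return 0
    case $list in
        "~"*) deny=1; list=${list#"~"} ;;
        *)    deny=0 ;;
    esac
    found=0
    for name in $list; do
        if [ "$name" = "$wanted" ]; then
            found=1
        fi
    done
    if [ "$deny" -eq 1 ]; then
        [ "$found" -eq 0 ]
    else
        [ "$found" -eq 1 ]
    fi
}
```

A typical call would be `systemctl show -p SystemCallFilter nginx | filter_allows ptrace && echo "ptrace is still allowed"`. It only inspects the unit's configuration; the /proc check on a running process remains the authoritative test.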
Step 3 — Restrict filesystem access
# /etc/systemd/system/nginx.service.d/filesystem-hardening.conf
[Service]
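# Hide home directories from NGINX entirely. Step 2 sets ProtectHome=read-only;
# keep only one value, because the drop-in that sorts last by filename wins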
ProtectHome=yes
# Read-only system except for writable paths
ProtectSystem=strict
# Explicit writable paths only
ReadWritePaths=/var/log/nginx /var/cache/nginx /run /tmp
# Prevent access to sensitive directories
InaccessiblePaths=/root /home /boot /proc/1
# Keep the NGINX config and CA certificates visible and read-only
# (this does not hide the rest of /etc)
BindReadOnlyPaths=/etc/nginx /etc/ssl/certs
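Because Steps 2 and 3 both set ProtectHome and ProtectSystem in separate drop-ins, it helps to see every assignment of a directive side by side. For single-valued settings the last assignment read wins, while list settings such as ReadWritePaths accumulate. The sketch below lists assignments in filename order; the function name `list_directive` is hypothetical, and it only handles plain `Directive=value` lines.

```bash
# Prints every assignment of one directive found in the *.conf drop-ins of a
# directory, in lexical filename order, so conflicting values stand out.
list_directive() {
    dropin_dir=$1
    directive=$2
    for f in "$dropin_dir"/*.conf; do
        [ -f "$f" ] || continue
        # Match "Directive=value" at the start of a line; comments are skipped
        grep -E "^${directive}=" "$f" | while IFS= read -r line; do
            printf '%s: %s\n' "$(basename "$f")" "$line"
        done
    done
}
```

Running `list_directive /etc/systemd/system/nginx.service.d ProtectHome` against the drop-ins in this article would show one line per file. `systemctl cat nginx` gives the same information in full, and `systemctl show -p ProtectHome nginx` shows the value that actually took effect.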
Step 4 — Apply capability bounding set
Strip capabilities that NGINX workers don’t need post-startup:
# /etc/systemd/system/nginx.service.d/capabilities.conf
[Service]
# The master process needs NET_BIND_SERVICE to bind port 80/443
# Workers lose their effective capabilities when the master switches them to the
# unprivileged user; the bounding set caps what any process in the service can hold
CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_SETUID CAP_SETGID CAP_DAC_OVERRIDE
# Ambient capabilities: none needed after startup
AmbientCapabilities=
# Prevent processes from gaining privileges through execve (setuid binaries, file capabilities)
NoNewPrivileges=yes
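The bounding set can be verified against the CapBnd field of /proc/PID/status, which is a 16-digit hex bitmask. As a worked example, the sketch below computes the mask for the four capabilities listed above, using their bit numbers from linux/capability.h. The function names `cap_bit` and `cap_mask` are hypothetical, and only those four capabilities are covered.

```bash
# Bit numbers from <linux/capability.h> for the capabilities kept above
cap_bit() {
    case $1 in
        CAP_DAC_OVERRIDE)     echo 1 ;;
        CAP_SETGID)           echo 6 ;;
        CAP_SETUID)           echo 7 ;;
        CAP_NET_BIND_SERVICE) echo 10 ;;
        *) return 1 ;;
    esac
}

# Prints the hex mask that CapBnd shows when exactly the named capabilities
# are in the bounding set
cap_mask() {
    mask=0
    for cap in "$@"; do
        bit=$(cap_bit "$cap") || { echo "unknown capability: $cap" >&2; return 1; }
        mask=$((mask | (1 << bit)))
    done
    printf '%016x\n' "$mask"
}
```

`cap_mask CAP_NET_BIND_SERVICE CAP_SETUID CAP_SETGID CAP_DAC_OVERRIDE` prints 00000000000004c2, which is what `grep CapBnd /proc/$PID/status` should report for both master and workers. A worker running as the unprivileged user should additionally show an all-zero CapEff.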
Step 5 — Write a targeted Seccomp BPF profile for NGINX
For higher-security deployments, replace systemd’s generic filter with an NGINX-specific BPF profile:
{
"defaultAction": "SCMP_ACT_ERRNO",
"architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_AARCH64"],
"syscalls": [
{
"names": [
"accept4", "bind", "close", "connect", "epoll_create1", "epoll_ctl",
"epoll_wait", "eventfd2", "fstat", "futex", "getdents64", "getpid",
"getuid", "geteuid", "getgid", "getegid", "ioctl", "listen",
"lseek", "mmap", "mprotect", "munmap", "nanosleep", "open", "openat",
"pipe2", "poll", "ppoll", "pread64", "pwrite64", "read", "readv",
"recv", "recvfrom", "recvmsg", "rename", "rt_sigaction",
"rt_sigprocmask", "rt_sigreturn", "send", "sendfile", "sendmsg",
"sendto", "set_robust_list", "setsockopt", "getsockopt",
"set_tid_address", "shutdown", "socket", "stat", "newfstatat",
"write", "writev", "exit", "exit_group", "clock_gettime",
"gettimeofday", "getrlimit", "setrlimit", "prctl",
"sched_getaffinity", "sched_yield", "unlink", "mkdir", "chmod",
"chown", "utime", "utimensat"
],
"action": "SCMP_ACT_ALLOW"
}
]
}
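Save as /etc/nginx/nginx-worker.seccomp.json. systemd cannot load this file directly: it has no SeccompFilter= directive, and this JSON layout is the one read by OCI container runtimes. To enforce the profile from a unit file, copy the names into a SystemCallFilter= allowlist in a drop-in that sorts after the Step 2 file, following an empty assignment that clears the generic filter. Because systemd applies the filter to the master as well as the workers, trace your own build and add the syscalls it needs at startup (process creation and the setuid family, for example) before enforcing the list:
[Service]
# An empty assignment clears any SystemCallFilter= set by earlier drop-ins
SystemCallFilter=
# Names copied from the profile; repeated SystemCallFilter= lines accumulate
SystemCallFilter=accept4 bind close connect epoll_create1 epoll_ctl epoll_wait eventfd2 fstat futex getdents64 getpid
SystemCallFilter=getuid geteuid getgid getegid ioctl listen lseek mmap mprotect munmap nanosleep open openat
SystemCallFilter=pipe2 poll ppoll pread64 pwrite64 read readv recv recvfrom recvmsg rename rt_sigaction
SystemCallFilter=rt_sigprocmask rt_sigreturn send sendfile sendmsg sendto set_robust_list setsockopt getsockopt
SystemCallFilter=set_tid_address shutdown socket stat newfstatat write writev exit exit_group clock_gettime
SystemCallFilter=gettimeofday getrlimit setrlimit prctl sched_getaffinity sched_yield unlink mkdir chmod
SystemCallFilter=chown utime utimensat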
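One way to express the JSON profile as directives systemd understands is to generate SystemCallFilter= lines from the names array, so the JSON file stays the reviewed source of truth. The sketch below assumes the simple layout shown above (a single names array of quoted strings spread over several lines) and uses only sed, grep, and tr; a real JSON parser such as jq would be more robust. The function name `profile_to_dropin` is hypothetical.

```bash
# Prints a systemd drop-in that allows exactly the syscall names found in the
# profile's "names" array. The empty SystemCallFilter= line resets any filter
# configured by earlier drop-ins.
profile_to_dropin() {
    profile=$1
    names=$(sed -n '/"names"/,/]/p' "$profile" \
        | grep -o '"[a-z0-9_]*"' \
        | tr -d '"' \
        | grep -v '^names$' \
        | tr '\n' ' ')
    printf '[Service]\n'
    printf 'SystemCallFilter=\n'
    printf 'SystemCallFilter=%s\n' "${names% }"
}
```

Usage would look like `profile_to_dropin /etc/nginx/nginx-worker.seccomp.json`, with the output reviewed before it is written to a drop-in. The generated list contains only what the profile names, so any startup syscalls the master needs still have to be added to the profile first.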
Step 6 — Verify worker process isolation
# Check what the nginx worker can read
WORKER_PID=$(pgrep -f "nginx: worker" | head -1)
echo "Worker PID: $WORKER_PID"
# Verify worker runs as the expected user
cat /proc/$WORKER_PID/status | grep -E "^(Name|Uid|Gid|CapPrm|CapEff)"
# Check the worker's open files — should not include sensitive paths
ls -la /proc/$WORKER_PID/fd/ | grep -v "pipe\|socket\|nginx"
# Confirm Seccomp is active on the worker
cat /proc/$WORKER_PID/status | grep Seccomp
# Expected: Seccomp: 2 (filter active)
# Smoke test (not a full exploit test): show the syscall the worker is currently blocked in
# An idle worker normally waits in epoll_wait (number 232 on x86_64); reading this file requires root
cat /proc/$WORKER_PID/syscall
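The Seccomp field in /proc/PID/status is a number: 0 means disabled, 1 strict mode, 2 filter mode. A small helper makes the check readable when looping over several workers. This is a sketch; the function name `seccomp_mode` is hypothetical.

```bash
# Reads the text of /proc/<pid>/status on stdin and names the Seccomp mode.
# Matching the field name exactly avoids the separate Seccomp_filters line
# that newer kernels print.
seccomp_mode() {
    mode=$(awk '$1 == "Seccomp:" { print $2 }')
    case $mode in
        0) echo "disabled" ;;
        1) echo "strict" ;;
        2) echo "filter" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

To check every worker rather than only the first: `for pid in $(pgrep -f "nginx: worker"); do seccomp_mode < /proc/$pid/status; done`. Each line should read filter once the hardening is in place.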
Expected Behaviour
| Control | Before hardening | After hardening |
|---|---|---|
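| Worker calls execve to spawn shell | Succeeds | Not blocked by SystemCallFilter (systemd implicitly allows execve); the spawned shell remains confined by the same filter, the filesystem restrictions, and NoNewPrivileges |
| Worker reads /root/.ssh/ | Succeeds if world-readable | Blocked by InaccessiblePaths |
| Worker makes outbound connection | Unrestricted | Not restricted by these controls: Seccomp cannot filter on destination port, so limit egress separately (for example with systemd IPAddressDeny= or host firewall rules) |
| /proc/$worker/status shows Seccomp | Seccomp: 0 (disabled) | Seccomp: 2 (filter active) |
| Worker attempts ptrace on another process | Succeeds | Blocked by Seccomp |
| New capabilities after privilege drop | May be present | Prevented by NoNewPrivileges=yes |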
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Strict Seccomp filter | Blocks most LPE exploit chains | NGINX modules that need unusual syscalls will break | Audit each module’s syscall requirements; add exceptions with comments explaining why |
| ProtectSystem=strict | Prevents worker from writing to system paths | NGINX modules may write to unexpected paths | Map all legitimate write paths; add them to ReadWritePaths |
| MemoryDenyWriteExecute | Blocks writable-and-executable memory mappings, so injected shellcode cannot simply be made executable | Modules that JIT-compile code (for example PCRE JIT or LuaJIT) will break | Disable only if a specific module requires it; document the exception |
| Separate seccomp for master vs. worker | Tighter worker restrictions | Complex to implement with systemd’s single-service model | Use Type=forking with custom startup wrapper if needed |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Seccomp blocks legitimate NGINX syscall | NGINX fails to start or serve requests; systemd shows SIGSYS | dmesg shows audit: type=1326 (Seccomp violation); NGINX error log shows unexpected exit | Identify the blocked syscall via strace nginx -t 2>&1 \| head -50; add to allowlist |
| ProtectHome breaks serving files from home dirs | 403 Forbidden for files under /home/ | NGINX error log shows permission denied | Move served files out of home directories; use /var/www |
| MemoryDenyWriteExecute breaks Lua/njs module | Module fails to load; NGINX exits | NGINX error log shows memory mapping error | Add MemoryDenyWriteExecute=no and document why |
| InaccessiblePaths hides path NGINX legitimately needs | NGINX cannot find config file or cert | NGINX fails to start; nginx -t shows path error | Move the resource to a non-protected path or remove from InaccessiblePaths |
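For the first failure mode, the audit record itself names the blocked syscall in its syscall= field, which is often quicker than re-running under strace. The sketch below extracts that number from kernel log lines; the function name `blocked_syscall_numbers` is hypothetical, and the record layout is assumed to follow the usual type=1326 format with space-separated key=value fields.

```bash
# Reads kernel log or audit lines on stdin and prints the syscall= value of
# every type=1326 (seccomp) record, one number per line.
blocked_syscall_numbers() {
    sed -n 's/.*type=1326 .*[[:space:]]syscall=\([0-9][0-9]*\).*/\1/p'
}
```

A typical call is `dmesg | blocked_syscall_numbers | sort -n | uniq -c`. The numbers are architecture-specific; if the audit userspace tools are installed, `ausyscall` can usually translate a number to a name, after which the name can be added to the allowlist with a comment explaining why.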
Related Articles
- Linux LPE Defence in Depth — the layered OS controls that contain exploitation even without a patch
- Seccomp BPF Without Containers — applying Seccomp at the service level to non-containerised processes like NGINX
- Systemd Unit Hardening — the full set of systemd security directives used in this article
- NGINX Hardening Beyond TLS — application-layer NGINX hardening that complements OS-level controls
- NGINX Fleet Patch Management — managing NGINX patches across the fleet while OS-level controls provide interim protection