NGINX CVE Patch Management Across Mixed Bare Metal, VM, and Kubernetes Fleets
Problem
Most organisations running NGINX at scale have it deployed in at least two different ways: as a systemd service on bare metal or VMs (managed via Ansible, Salt, or manual configuration), and as the ingress-nginx controller in Kubernetes (managed via Helm). Some have a third tier: NGINX in Docker containers built into application images, often with no automatic update path.
When a critical NGINX CVE is published — like CVE-2024-7347, CVE-2025-23419, or the 2025 ingress annotation injection family — the patching process differs for each tier:
Bare metal / VM tier: NGINX is installed as a distro package or from nginx.org’s repository. Patching requires running the package manager (apt-get install --only-upgrade nginx or yum update nginx) on each host. Ansible can orchestrate this, but the fleet may span different distros, package repositories, and OS configurations. Some hosts may be running NGINX compiled from source with custom modules, which requires a recompile.
Kubernetes ingress-nginx tier: NGINX lives inside the ingress-nginx controller image, versioned by Helm chart. Patching requires updating the Helm chart version, which changes the controller image, which triggers a rolling restart of ingress-nginx pods. The challenge is that breaking changes in the Helm chart can affect all inbound traffic to the cluster.
Application container tier: NGINX is baked into application Docker images as a reverse proxy or static file server. These images have their own build pipelines and may use outdated base images (nginx:1.24-alpine from months ago). There is no central way to patch these — each application team must rebuild and redeploy their image.
The inventory problem. Before patching, you need to know what you have. Many organisations cannot answer “how many NGINX instances are running right now, and what versions?” across the full fleet. Without inventory, there is no way to measure patch progress or confirm coverage.
The patch window. For critical CVEs, the window between disclosure and active exploitation can be less than 48 hours. A patching process that takes two weeks leaves the fleet exposed. OS-level controls (Seccomp, capability bounding sets — see the companion article on NGINX worker hardening) buy time in this window, but patching is the actual remediation.
Target systems: any organisation operating NGINX at scale across more than one deployment tier; security teams responsible for CVE SLAs; platform teams who own NGINX infrastructure but not individual application NGINX deployments.
Threat Model
Risk 1 — Extended exposure window. A critical NGINX CVE is disclosed. The fleet has 200 NGINX instances. Without a patch management process, the team must manually identify all instances, determine their versions, and patch them individually. The process takes three weeks; active exploitation begins in week one.
Risk 2 — Hidden NGINX in application containers. Application teams have embedded NGINX in their Docker images as a static file server. These images are not tracked in the NGINX fleet inventory. A CVE that affects all NGINX versions is patched on the infrastructure tier but persists in 15 application containers that the security team doesn’t know exist.
Risk 3 — Package repository pinning blocks patches. A host has a pinned NGINX version (via /etc/apt/preferences.d/nginx or yum.conf excludes) because a previous upgrade caused issues. The pin silently prevents the security update from being applied. The host reports “NGINX up to date” but is running a vulnerable version.
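Risk 3 can be checked for mechanically. The following is a minimal sketch, assuming the pin lives in an apt preferences file made of blank-line-separated stanzas with Package, Pin and Pin-Priority fields. The function name find_nginx_pins and the sample text are illustrative, not part of any existing tooling, and a wildcard pin such as Package: * would not be caught by this simple match.

```python
import re


def find_nginx_pins(preferences_text: str) -> list[dict[str, str]]:
    """Return apt preference stanzas that pin a package whose name starts with nginx."""
    pins = []
    # Stanzas in apt preferences files are separated by blank lines.
    for stanza in re.split(r"\n\s*\n", preferences_text.strip()):
        fields: dict[str, str] = {}
        for line in stanza.splitlines():
            if line.lstrip().startswith("#") or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip().lower()] = value.strip()
        packages = fields.get("package", "").split()
        if "pin" in fields and any(p.startswith("nginx") for p in packages):
            pins.append(fields)
    return pins


# Hypothetical content of /etc/apt/preferences.d/nginx
SAMPLE = """\
Package: nginx
Pin: version 1.24.0*
Pin-Priority: 1001

Package: curl
Pin: release a=stable
Pin-Priority: 500
"""

for pin in find_nginx_pins(SAMPLE):
    print(f"nginx pinned: {pin['pin']} (priority {pin.get('pin-priority', '?')})")
```

Running a check like this on each host before the patch playbook surfaces pins that would otherwise make the upgrade a silent no-op.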
Configuration / Implementation
Step 1 — Build fleet inventory across all tiers
#!/bin/bash
# nginx-fleet-inventory.sh
# Comprehensive NGINX version inventory across all deployment tiers

OUTPUT_FILE="/tmp/nginx-fleet-inventory-$(date +%Y%m%d-%H%M).csv"
echo "Tier,Host/Cluster,Location,NGINX Version,Source,Last Updated" > "$OUTPUT_FILE"

# Tier 1: Systemd hosts via Ansible ad-hoc
# One-line output (-o) keeps each host's result on a single line, typically:
#   web-01 | CHANGED | rc=0 | (stdout) 1.26.2
echo "=== Tier 1: Bare metal / VM NGINX versions ==="
TIER1_RE='^([^[:space:]|]+)[[:space:]]*\|.*\(stdout\)[[:space:]]*([0-9.]+)'
ansible all -o -m shell -a "nginx -v 2>&1 | grep -oP '(?<=nginx/)[\d.]+'" \
    --limit nginx_hosts 2>/dev/null | \
while IFS= read -r line; do
    if [[ "$line" =~ $TIER1_RE ]]; then
        HOST="${BASH_REMATCH[1]}"
        VERSION="${BASH_REMATCH[2]}"
        echo "systemd,$HOST,/usr/sbin/nginx,$VERSION,package,$(date +%Y-%m-%d)" >> "$OUTPUT_FILE"
    fi
done

# Tier 2: Kubernetes ingress-nginx via kubectl
# NOTE: the image tag is the ingress-nginx controller release, not the version
# of the NGINX binary bundled inside it. Map the tag to its NGINX version
# before comparing it against NGINX CVE ranges.
echo ""
echo "=== Tier 2: Kubernetes ingress-nginx versions ==="
for context in $(kubectl config get-contexts -o name 2>/dev/null); do
    kubectl --context="$context" get pods -A \
        -l "app.kubernetes.io/name=ingress-nginx" \
        -o jsonpath='{range .items[*]}{.spec.containers[0].image}{"\t"}{.metadata.namespace}{"\n"}{end}' \
        2>/dev/null | \
    while IFS=$'\t' read -r image namespace; do
        version=$(echo "$image" | grep -oP '(?<=:v)[\d.]+')
        echo "kubernetes,$context,$namespace/ingress-nginx,${version:-unknown},helm,$(date +%Y-%m-%d)" >> "$OUTPUT_FILE"
    done
done

# Tier 3: NGINX in Docker containers (Kubernetes pods)
# Only catches images whose name contains "nginx"; images that install NGINX
# under another name are recorded nowhere and need an image scanner.
echo ""
echo "=== Tier 3: Application containers running NGINX ==="
for context in $(kubectl config get-contexts -o name 2>/dev/null); do
    kubectl --context="$context" get pods -A \
        -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}: {range .spec.containers[*]}{.image}{" "}{end}{"\n"}{end}' \
        2>/dev/null | \
    while IFS= read -r pod_images; do
        if echo "$pod_images" | grep -qE "nginx:[0-9]|/nginx:"; then
            POD=$(echo "$pod_images" | cut -d: -f1)
            IMAGE=$(echo "$pod_images" | grep -oE "nginx:[^[:space:]]+" | head -1)
            VERSION=$(echo "$IMAGE" | grep -oP '(?<=:)[\d.]+')
            echo "container,$context,$POD,${VERSION:-embedded},docker-image,unknown" >> "$OUTPUT_FILE"
        fi
    done
done

echo ""
echo "Inventory written to: $OUTPUT_FILE"
cat "$OUTPUT_FILE"
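Once the CSV exists, a quick per-tier summary makes it easier to see where the fleet stands before any CVE matching. This is a rough sketch that assumes the column names written by the inventory script above; the function name summarise_inventory and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def summarise_inventory(csv_text: str) -> Counter:
    """Count inventory rows per (tier, version) pair."""
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        version = (row.get("NGINX Version") or "unknown").strip()
        counts[(row["Tier"], version)] += 1
    return counts


# Hypothetical inventory content using the same header as the script
SAMPLE_INVENTORY = """\
Tier,Host/Cluster,Location,NGINX Version,Source,Last Updated
systemd,web-01,/usr/sbin/nginx,1.26.1,package,2025-01-01
systemd,web-02,/usr/sbin/nginx,1.26.1,package,2025-01-01
container,prod,shop/frontend-abc,embedded,docker-image,unknown
"""

for (tier, version), count in sorted(summarise_inventory(SAMPLE_INVENTORY).items()):
    print(f"{tier:<10} {version:<10} {count}")
```

Rows with an "embedded" or "unknown" version are the ones to chase first, because they cannot be matched against any CVE range.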
Step 2 — Check inventory against CVE-affected versions
#!/usr/bin/env python3
# scripts/nginx-cve-check.py
# Compares fleet inventory against known CVE-affected version ranges
# Requires the third-party "packaging" library (pip install packaging)
import csv
import sys
from dataclasses import dataclass
from typing import Optional

from packaging.version import Version, InvalidVersion


@dataclass
class NginxCVE:
    cve_id: str
    severity: str
    affected_mainline_lt: Optional[str]  # first fixed mainline release
    affected_stable_lt: Optional[str]    # first fixed stable release
    description: str


# Current CVE database: update as new CVEs are published
NGINX_CVES = [
    NginxCVE("CVE-2024-7347", "MEDIUM", "1.27.1", "1.26.2",
             "ngx_http_mp4_module buffer over-read"),
    NginxCVE("CVE-2024-24989", "HIGH", "1.25.4", None,
             "QUIC module NULL pointer dereference"),
    NginxCVE("CVE-2024-24990", "HIGH", "1.25.4", None,
             "QUIC module use-after-free"),
    NginxCVE("CVE-2025-23419", "MEDIUM", "1.27.4", "1.26.3",
             "mTLS session resumption bypass"),
]


def parse_version(version_str: str) -> Optional[Version]:
    try:
        # Strip 'v' prefix if present
        return Version(version_str.lstrip('v'))
    except InvalidVersion:
        return None


def check_vulnerabilities(nginx_version: str) -> list[NginxCVE]:
    v = parse_version(nginx_version)
    if v is None:
        return []
    # NGINX numbering: odd minor is mainline (1.25, 1.27), even minor is stable (1.24, 1.26)
    is_mainline = v.minor % 2 == 1
    findings = []
    for cve in NGINX_CVES:
        fixed_in = cve.affected_mainline_lt if is_mainline else cve.affected_stable_lt
        patched = parse_version(fixed_in) if fixed_in else None
        if patched and v < patched:
            findings.append(cve)
    return findings


if __name__ == "__main__":
    inventory_file = sys.argv[1] if len(sys.argv) > 1 else "/tmp/nginx-fleet-inventory.csv"
    critical_hosts = []
    warning_hosts = []
    unknown_hosts = []

    with open(inventory_file, newline="") as inventory:
        for row in csv.DictReader(inventory):
            host_id = f"{row['Tier']}/{row['Host/Cluster']}/{row['Location']}"
            version = row.get("NGINX Version", "unknown")
            if version in ("unknown", "embedded", ""):
                print(f"UNKNOWN version: {host_id}")
                unknown_hosts.append(host_id)
                continue

            findings = check_vulnerabilities(version)
            if not findings:
                continue

            critical = any(cve.severity in ("HIGH", "CRITICAL") for cve in findings)
            (critical_hosts if critical else warning_hosts).append(host_id)
            print(f"\n{'CRITICAL' if critical else 'WARNING'}: {host_id}")
            print(f"  Running NGINX {version}")
            for finding in findings:
                print(f"  [{finding.severity}] {finding.cve_id}: {finding.description}")

    if critical_hosts:
        print(f"\n\nCRITICAL: {len(critical_hosts)} hosts require immediate patching")
        sys.exit(1)
    if warning_hosts or unknown_hosts:
        # Do not report a clean fleet while lower-severity or unversioned hosts remain
        print(f"\nNo high-severity findings; {len(warning_hosts)} hosts have "
              f"lower-severity findings and {len(unknown_hosts)} have unknown versions")
    else:
        print("\nAll hosts running patched NGINX versions")
    sys.exit(0)
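The branch logic is the part of the check that is easiest to get wrong, so it may help to see it in isolation. The sketch below uses only the standard library and plain integer tuples instead of the packaging library; the helper names parse, branch and is_affected are illustrative. It relies on the NGINX convention that odd minor versions are mainline and even minor versions are stable.

```python
from typing import Optional


def parse(version: str) -> tuple[int, ...]:
    """Turn '1.26.2' or 'v1.26.2' into (1, 26, 2)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def branch(version: str) -> str:
    """Odd minor versions are mainline, even minor versions are stable."""
    return "mainline" if parse(version)[1] % 2 == 1 else "stable"


def is_affected(version: str,
                fixed_mainline: Optional[str],
                fixed_stable: Optional[str]) -> bool:
    """True when the version predates the first fixed release of its own branch."""
    fixed = fixed_mainline if branch(version) == "mainline" else fixed_stable
    return fixed is not None and parse(version) < parse(fixed)


# CVE-2024-7347 is listed above as fixed in 1.27.1 (mainline) and 1.26.2 (stable)
for candidate in ("1.24.0", "1.26.1", "1.26.2", "1.27.0", "1.27.1"):
    verdict = "affected" if is_affected(candidate, "1.27.1", "1.26.2") else "ok"
    print(f"{candidate:<8} {branch(candidate):<9} {verdict}")
```

Comparing a version only against the fix for its own branch avoids flagging a patched stable release just because it is numerically lower than the mainline fix.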
Step 3 — Patch Tier 1: bare metal / VM via Ansible
# playbooks/nginx-emergency-patch.yml
# Emergency patching playbook for NGINX CVE remediation
---
- name: Emergency NGINX Patch for CVE Remediation
  hosts: nginx_hosts
  serial: "20%"             # Roll out to 20% of hosts at a time
  max_fail_percentage: 10   # Abort if more than 10% of hosts fail
  vars:
    target_nginx_version_debian: "1.26.3"
    target_nginx_version_rhel: "1.26.3"
    slack_webhook: "{{ lookup('env', 'SLACK_WEBHOOK') }}"

  pre_tasks:
    - name: Record current NGINX version
      command: nginx -v
      register: nginx_version_before
      changed_when: false
      ignore_errors: true

    - name: Verify NGINX is running before patching
      service_facts:

    - name: Check if NGINX is active
      assert:
        that: "ansible_facts.services['nginx.service'].state == 'running'"
        fail_msg: "NGINX is not running on {{ inventory_hostname }} before patching"
      ignore_errors: true   # Report only; the host is still patched

  tasks:
    - name: Patch NGINX on Debian/Ubuntu
      apt:
        name: nginx
        state: latest
        update_cache: yes
      when: ansible_os_family == "Debian"
      notify: restart nginx

    - name: Patch NGINX on RHEL/CentOS
      yum:
        name: nginx
        state: latest
        update_cache: yes
      when: ansible_os_family == "RedHat"
      notify: restart nginx

    - name: Record new NGINX version
      command: nginx -v
      register: nginx_version_after
      changed_when: false

    # A failure here fails the host, so the restart handler never runs
    # against an invalid configuration
    - name: Test NGINX configuration is valid
      command: nginx -t
      register: nginx_test
      changed_when: false

    - name: Log patch result
      debug:
        msg: "{{ inventory_hostname }}: {{ nginx_version_before.stderr | default('unknown') }} → {{ nginx_version_after.stderr }}"

  handlers:
    # Restart rather than reload: a reload re-reads configuration but keeps
    # the old, vulnerable binary running in memory
    - name: restart nginx
      service:
        name: nginx
        state: restarted

  post_tasks:
    - name: Verify NGINX is serving requests after restart
      uri:
        url: "http://localhost/health"
        status_code: 200
      ignore_errors: true
      register: health_check

    - name: Alert on failed health check
      debug:
        msg: "WARNING: Health check failed on {{ inventory_hostname }} after NGINX patch"
      when: health_check.status is defined and health_check.status != 200
Step 4 — Patch Tier 2: Kubernetes ingress-nginx
#!/bin/bash
# scripts/patch-ingress-nginx.sh
# Staged ingress-nginx update across Kubernetes clusters

PATCHED_CHART_VERSION="${1:?Usage: $0 <chart-version> [cluster-context...]}"
CLUSTERS="${*:2}"

# If no clusters specified, use all contexts
if [[ -z "$CLUSTERS" ]]; then
    CLUSTERS=$(kubectl config get-contexts -o name 2>/dev/null)
fi

STAGING_CLUSTER="${STAGING_CLUSTER:-staging}"
WAIT_MINUTES="${WAIT_MINUTES:-15}"

patch_cluster() {
    local context="$1"
    echo ""
    echo "=== Patching ingress-nginx on cluster: $context ==="

    # Get current version
    CURRENT=$(helm -n ingress-nginx list --kube-context="$context" \
        -o json 2>/dev/null | jq -r '.[0].chart')
    echo "Current chart: $CURRENT"

    # Upgrade. --atomic rolls back automatically on failure.
    # (A comment after a trailing backslash would break the line continuation.)
    if ! helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
        --kube-context="$context" \
        --namespace ingress-nginx \
        --version "$PATCHED_CHART_VERSION" \
        --wait \
        --timeout 5m \
        --atomic \
        -f "helm/ingress-nginx/values-${context}.yaml" 2>&1; then
        echo "FAIL: Helm upgrade failed on $context; check for automatic rollback"
        return 1
    fi

    # Verify new pods are running
    kubectl --context="$context" rollout status \
        deployment/ingress-nginx-controller \
        -n ingress-nginx \
        --timeout=3m || return 1

    # Smoke test: send a test request through ingress
    # (load balancers that expose a hostname instead of an IP need .hostname here)
    echo "Running smoke test on $context..."
    INGRESS_IP=$(kubectl --context="$context" get svc \
        -n ingress-nginx ingress-nginx-controller \
        -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
        --connect-timeout 5 \
        -H "Host: health.example.com" \
        "http://$INGRESS_IP/health" 2>/dev/null)

    if [[ "$HTTP_STATUS" != "200" ]]; then
        echo "WARN: Smoke test returned HTTP $HTTP_STATUS on $context"
    else
        echo "OK: Smoke test passed on $context"
    fi
    return 0
}

# Always patch staging first
echo "Phase 1: Staging cluster"
patch_cluster "$STAGING_CLUSTER" || {
    echo "ABORT: Staging patch failed; halting production rollout"
    exit 1
}

echo "Waiting ${WAIT_MINUTES} minutes before production rollout..."
echo "Monitor staging: kubectl --context=$STAGING_CLUSTER get pods -n ingress-nginx -w"
sleep "${WAIT_MINUTES}m"

# Patch production clusters
echo ""
echo "Phase 2: Production clusters"
FAILED_CLUSTERS=()
for context in $CLUSTERS; do
    [[ "$context" == "$STAGING_CLUSTER" ]] && continue
    patch_cluster "$context" || FAILED_CLUSTERS+=("$context")
done

if [[ ${#FAILED_CLUSTERS[@]} -gt 0 ]]; then
    echo ""
    echo "FAILED clusters: ${FAILED_CLUSTERS[*]}"
    exit 1
fi

echo ""
echo "Patch complete. Run the CVE check script to verify:"
echo "  python3 scripts/nginx-cve-check.py /tmp/nginx-fleet-inventory.csv"
Step 5 — Track patch progress and SLA compliance
#!/bin/bash
# scripts/nginx-patch-sla-report.sh
# Generates a patch SLA compliance report for a given CVE
# Requires GNU date (uses date -d)

CVE_ID="${1:?Usage: $0 <cve-id> <disclosure-date> <sla-days>}"
DISCLOSURE_DATE="${2:?}"  # Format: YYYY-MM-DD
SLA_DAYS="${3:-7}"
INVENTORY_FILE="/tmp/nginx-fleet-inventory.csv"

DISCLOSURE_EPOCH=$(date -d "$DISCLOSURE_DATE" +%s)
NOW_EPOCH=$(date +%s)
DAYS_ELAPSED=$(( (NOW_EPOCH - DISCLOSURE_EPOCH) / 86400 ))
SLA_DEADLINE=$(date -d "$DISCLOSURE_DATE + $SLA_DAYS days" +%Y-%m-%d)

echo "=== NGINX $CVE_ID Patch SLA Report ==="
echo "CVE disclosure: $DISCLOSURE_DATE"
echo "SLA deadline: $SLA_DEADLINE (${SLA_DAYS} days)"
echo "Days elapsed: $DAYS_ELAPSED"
echo ""

# Keep per-host result lines only; the checker's final summary line also
# starts with CRITICAL and would otherwise be counted as a host
HOST_LINE_RE='^(CRITICAL|WARNING): (systemd|kubernetes|container)/|^UNKNOWN version:'
RESULTS=$(python3 scripts/nginx-cve-check.py "$INVENTORY_FILE" 2>/dev/null | \
    grep -E "$HOST_LINE_RE")

while IFS= read -r line; do
    [[ -z "$line" ]] && continue
    if [[ "$DAYS_ELAPSED" -gt "$SLA_DAYS" ]]; then
        echo "SLA BREACH: $line"
    else
        echo "OPEN: $line ($(( SLA_DAYS - DAYS_ELAPSED )) days remaining)"
    fi
done <<< "$RESULTS"

# Count total vs patched (header row excluded; unknown versions are not counted as patched)
TOTAL=$(( $(wc -l < "$INVENTORY_FILE") - 1 ))
VULNERABLE=$(grep -cE '^(CRITICAL|WARNING):' <<< "$RESULTS")
UNKNOWN=$(grep -c '^UNKNOWN' <<< "$RESULTS")
PATCHED=$(( TOTAL - VULNERABLE - UNKNOWN ))

echo ""
if [[ "$TOTAL" -gt 0 ]]; then
    echo "Summary: $PATCHED/$TOTAL hosts patched ($(( PATCHED * 100 / TOTAL ))%), $UNKNOWN with unknown version"
else
    echo "Summary: inventory is empty"
fi
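The date arithmetic in the report depends on GNU date, which is not available everywhere. As a portable alternative, the same deadline calculation could be sketched with Python's standard library. The function name sla_status and the dates below are illustrative only, not real disclosure dates.

```python
from datetime import date, timedelta


def sla_status(disclosed: date, sla_days: int, today: date) -> dict:
    """Compute the SLA deadline and whether it has been breached."""
    deadline = disclosed + timedelta(days=sla_days)
    days_remaining = (deadline - today).days
    return {
        "deadline": deadline,
        "days_elapsed": (today - disclosed).days,
        "days_remaining": days_remaining,
        # Breached once more than sla_days have elapsed, as in the shell report
        "breached": days_remaining < 0,
    }


# Hypothetical disclosure date with a 7-day SLA, evaluated on two different days
for today in (date(2025, 1, 5), date(2025, 1, 10)):
    status = sla_status(date(2025, 1, 1), 7, today)
    state = "SLA BREACH" if status["breached"] else "OPEN"
    print(f"{today}: {state}, deadline {status['deadline']}, "
          f"{status['days_remaining']} days remaining")
```

Passing today in explicitly keeps the calculation deterministic, which makes it straightforward to unit test.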
Expected Behaviour
| Scenario | Without process | With process |
|---|---|---|
| CVE published; fleet inventory needed | Manual host-by-host audit; 2–3 days | Inventory script runs in 10 minutes; vulnerable hosts identified immediately |
| Patch Ansible tier | Manual per-host; no coordination | serial: 20% rolling playbook; run aborts when more than 10% of a batch fails; health check post-patch |
| Patch Kubernetes ingress-nginx | Manual Helm upgrade per cluster; no staging gate | Staged script: staging first, 15-min wait, then production with atomic rollback |
| Application container tier | Not tracked; patching unknown | Docker image scan in CI catches NGINX base image version; application teams alerted |
| CVE SLA compliance reporting | Not tracked | SLA report script shows per-host compliance percentage and days remaining |
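The application container row depends on a CI check that reports which NGINX base image a Dockerfile builds from. A minimal sketch of such a check follows, assuming plain Dockerfile FROM lines. The function name nginx_base_images and the sample Dockerfile are hypothetical, and images that install NGINX through a package manager on a different base image would not be detected this way.

```python
import re

# Matches "FROM [--platform=...] <image>" at the start of a line
FROM_RE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)",
                     re.IGNORECASE | re.MULTILINE)
# Image whose final path component starts with "nginx"
NGINX_IMAGE_RE = re.compile(r"(^|/)nginx[\w.-]*(:|@|$)")


def nginx_base_images(dockerfile_text: str) -> list[str]:
    """Return the base images in a Dockerfile that look like NGINX images."""
    return [image for image in FROM_RE.findall(dockerfile_text)
            if NGINX_IMAGE_RE.search(image)]


# Hypothetical multi-stage Dockerfile for a static frontend
SAMPLE_DOCKERFILE = """\
FROM node:20-alpine AS build
RUN npm ci && npm run build
FROM nginx:1.24-alpine
COPY --from=build /app/dist /usr/share/nginx/html
"""

for image in nginx_base_images(SAMPLE_DOCKERFILE):
    print(f"NGINX base image found: {image}")
```

A CI job could fail the build, or open a ticket for the owning team, when the reported tag is older than the currently approved one.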
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| serial: 20% in Ansible | Limits blast radius of bad patch | Takes longer to patch full fleet | Acceptable trade-off for zero-downtime patching; increase serial percentage for critical CVEs |
| Helm --atomic flag | Automatic rollback on upgrade failure | Rollback may cause brief traffic interruption during rollout | Prefer rollback over leaving a broken ingress state; monitor rollout in real time |
| Inventorying application container NGINX | Full fleet visibility | Requires scanning all pod images in all clusters | Run inventory script weekly and on CVE publication; pipe into Slack or ticketing |
| CVE database in script | Immediate check without external dependencies | Must be updated manually as CVEs are published | Subscribe to nginx-announce mailing list; create a bot that opens a PR to update the CVE list |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Ansible NGINX package pinned by OS configuration | apt reports nginx is already the newest version, but the vulnerable version remains | CVE check script still flags host post-patch | Check /etc/apt/preferences.d/ for nginx pins; remove pin and retry patch |
| Helm --atomic rollback triggers mid-production rollout | Ingress-nginx rolls back; cluster still on old version | Helm release shows previous revision; CVE check flags cluster | Investigate why upgrade failed (helm history ingress-nginx -n ingress-nginx); fix and retry |
| Application container NGINX not caught by inventory | Hidden vulnerable NGINX in application pods | Post-patch CVE scan reports clean; exploitation via application container | Add image scanning to application CI pipelines; require approved NGINX base image tags |
| Patch applied but NGINX not restarted | Master process still running the old binary from before the update | nginx -v shows the new version but ls -l /proc/$(pgrep -o nginx)/exe shows the binary path marked (deleted) | systemctl restart nginx (not reload) loads the new binary; confirm the service is still active afterwards |
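The last failure mode can be detected automatically on Linux, where the /proc/PID/exe link of a process whose binary was replaced on disk has " (deleted)" appended to its target. The following is a sketch under that assumption. The function names are illustrative, the proc_root parameter exists only so the logic can be exercised against a test directory, and reading other users' processes generally needs root.

```python
import os

DELETED_SUFFIX = " (deleted)"


def exe_is_stale(exe_target: str) -> bool:
    """True when the executable link shows the on-disk binary was replaced."""
    return exe_target.endswith(DELETED_SUFFIX)


def stale_nginx_pids(proc_root: str = "/proc") -> list[int]:
    """List PIDs of nginx processes still running a binary deleted from disk."""
    stale = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            target = os.readlink(os.path.join(proc_root, entry, "exe"))
        except OSError:
            continue  # process exited, kernel thread, or permission denied
        binary = os.path.basename(target.removesuffix(DELETED_SUFFIX))
        if binary == "nginx" and exe_is_stale(target):
            stale.append(int(entry))
    return sorted(stale)


print(exe_is_stale("/usr/sbin/nginx (deleted)"))
print(exe_is_stale("/usr/sbin/nginx"))
```

A non-empty result from stale_nginx_pids() after a patch run indicates a host that was upgraded on disk but never restarted.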
Related Articles
- NGINX Worker Privilege Hardening — OS-level controls that contain exploitation during the patch window
- ingress-nginx Version Pinning — Renovate and ArgoCD-based automated version management for ingress-nginx
- Cyber Insurance Technical Requirements — patch SLA requirements that cyber insurance mandates and how to demonstrate compliance
- Vulnerability Management Program — the broader vulnerability management process NGINX CVEs fall within
- NGINX CVE Exploitation Detection — detection rules that alert on exploitation while the patch is being rolled out