NGINX CVE Patch Management Across Mixed Bare Metal, VM, and Kubernetes Fleets

Problem

Most organisations running NGINX at scale have it deployed in at least two different ways: as a systemd service on bare metal or VMs (managed via Ansible, Salt, or manual configuration), and as the ingress-nginx controller in Kubernetes (managed via Helm). Some have a third tier: NGINX in Docker containers built into application images, often with no automatic update path.

When an NGINX CVE is published (for example CVE-2024-7347, CVE-2025-23419, or the 2025 ingress-nginx annotation injection family), the patching process differs for each tier:

Bare metal / VM tier: NGINX is installed as a distro package or from the nginx.org repository. Patching means running the package manager (apt-get install --only-upgrade nginx or yum update nginx) on each host. Ansible can orchestrate this, but the fleet may span different distros, package repositories, and OS configurations. Hosts running NGINX compiled from source with custom modules need a recompile instead.

Kubernetes ingress-nginx tier: NGINX lives inside the ingress-nginx controller image, versioned by Helm chart. Patching requires updating the Helm chart version, which changes the controller image, which triggers a rolling restart of ingress-nginx pods. The challenge is that breaking changes in the Helm chart can affect all inbound traffic to the cluster.

Application container tier: NGINX is baked into application Docker images as a reverse proxy or static file server. These images have their own build pipelines and may use outdated base images (nginx:1.24-alpine from months ago). There is no central way to patch these — each application team must rebuild and redeploy their image.

The inventory problem. Before patching, you need to know what you have. Many organisations cannot answer “how many NGINX instances are running right now, and what versions?” across the full fleet. Without inventory, there is no way to measure patch progress or confirm coverage.

The patch window. For critical CVEs, the window between disclosure and active exploitation can be less than 48 hours. A patching process that takes two weeks leaves the fleet exposed for most of that time. OS-level controls (seccomp, capability bounding sets; see the companion article on NGINX worker hardening) buy time in this window, but patching is the actual remediation.

Target systems: any organisation operating NGINX at scale across more than one deployment tier; security teams responsible for CVE SLAs; platform teams who own NGINX infrastructure but not individual application NGINX deployments.

Threat Model

Risk 1 — Extended exposure window. A critical NGINX CVE is disclosed. The fleet has 200 NGINX instances. Without a patch management process, the team must manually identify all instances, determine their versions, and patch them individually. The process takes three weeks; active exploitation begins in week one.

Risk 2 — Hidden NGINX in application containers. Application teams have embedded NGINX in their Docker images as a static file server. These images are not tracked in the NGINX fleet inventory. A CVE that affects all NGINX versions is patched on the infrastructure tier but persists in 15 application containers that the security team doesn’t know exist.

Risk 3 — Package repository pinning blocks patches. A host has a pinned NGINX version (via /etc/apt/preferences.d/nginx or yum.conf excludes) because a previous upgrade caused issues. The pin silently prevents the security update from being applied. The host reports “NGINX up to date” but is running a vulnerable version.
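Risk 3 can be checked for directly on Debian-family hosts. The following is a minimal sketch, assuming the contents of a file under /etc/apt/preferences.d/ have already been read into a string; the function name and the sample stanzas are hypothetical, and a wildcard pin such as `Package: *` is deliberately not handled.

```python
import re

def find_nginx_pins(preferences_text: str) -> list[dict[str, str]]:
    """Return apt preference stanzas whose Package field mentions nginx."""
    pins = []
    # Stanzas in apt preferences files are separated by blank lines.
    for stanza in re.split(r"\n\s*\n", preferences_text.strip()):
        fields = {}
        for line in stanza.splitlines():
            if line.lstrip().startswith("#") or ":" not in line:
                continue  # skip comments and malformed lines
            key, value = line.split(":", 1)
            fields[key.strip().lower()] = value.strip()
        packages = fields.get("package", "").split()
        if any("nginx" in name for name in packages):
            pins.append(fields)
    return pins

# Hypothetical preferences content for illustration only.
sample = """\
Package: nginx
Pin: version 1.24.*
Pin-Priority: 1001

Package: curl
Pin: release a=stable
Pin-Priority: 500
"""

for pin in find_nginx_pins(sample):
    print(f"nginx pinned: {pin['pin']} (priority {pin['pin-priority']})")
```

Running a check like this before the patch playbook turns a silent pin into an explicit finding. Holds set with apt-mark and yum excludes would need their own checks.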

Configuration / Implementation

Step 1 — Build fleet inventory across all tiers

#!/bin/bash
# nginx-fleet-inventory.sh
# Comprehensive NGINX version inventory across all deployment tiers

OUTPUT_FILE="/tmp/nginx-fleet-inventory-$(date +%Y%m%d-%H%M).csv"
echo "Tier,Host/Cluster,Location,NGINX Version,Source,Last Updated" > "$OUTPUT_FILE"

# Tier 1: Systemd hosts via Ansible ad-hoc
echo "=== Tier 1: Bare metal / VM NGINX versions ==="
# Default ad-hoc output is a "host | CHANGED | rc=0 >>" header line followed by
# the command output, so remember the host and pair it with the next version line.
HOST=""
ansible all -m shell -a "nginx -v 2>&1 | grep -oP '(?<=nginx/)[\d.]+'" \
  --limit nginx_hosts 2>/dev/null | \
  while IFS= read -r line; do
    if [[ "$line" =~ ^([^[:space:]|]+)[[:space:]]*\|[[:space:]]*(CHANGED|SUCCESS) ]]; then
      HOST="${BASH_REMATCH[1]}"
    elif [[ -n "$HOST" && "$line" =~ ^[0-9]+(\.[0-9]+)+$ ]]; then
      echo "systemd,$HOST,/usr/sbin/nginx,$line,package,$(date +%Y-%m-%d)" >> "$OUTPUT_FILE"
      HOST=""
    fi
  done

# Tier 2: Kubernetes ingress-nginx via kubectl
echo ""
echo "=== Tier 2: Kubernetes ingress-nginx versions ==="
for context in $(kubectl config get-contexts -o name 2>/dev/null); do
  kubectl --context="$context" get pods -A \
    -l "app.kubernetes.io/name=ingress-nginx,app.kubernetes.io/component=controller" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.namespace}{"\n"}{end}' \
    2>/dev/null | \
  while IFS=$'\t' read -r pod namespace; do
    # The image tag is the controller release (for example v1.11.x), not the
    # NGINX version, so ask the controller binary which NGINX it bundles.
    version=$(kubectl --context="$context" -n "$namespace" exec "$pod" -- \
      /nginx-ingress-controller --version 2>/dev/null </dev/null | \
      grep -oP '(?<=nginx/)[\d.]+' | head -1)
    echo "kubernetes,$context,$namespace/ingress-nginx,${version:-unknown},helm,$(date +%Y-%m-%d)" >> "$OUTPUT_FILE"
  done
done

# Tier 3: NGINX in Docker containers (Kubernetes pods)
echo ""
echo "=== Tier 3: Application containers running NGINX ==="
for context in $(kubectl config get-contexts -o name 2>/dev/null); do
  kubectl --context="$context" get pods -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}: {range .spec.containers[*]}{.image}{" "}{end}{"\n"}{end}' \
    2>/dev/null | \
  while IFS= read -r pod_images; do
    if echo "$pod_images" | grep -qE "nginx:[0-9]|/nginx:"; then
      POD=$(echo "$pod_images" | cut -d: -f1)
      IMAGE=$(echo "$pod_images" | grep -oE "nginx:[^[:space:]]+" | head -1)
      VERSION=$(echo "$IMAGE" | grep -oP '(?<=:)[\d.]+')
      echo "container,$context,$POD,${VERSION:-embedded},docker-image,unknown" >> "$OUTPUT_FILE"
    fi
  done
done

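# The later scripts default to a fixed path, so point it at the latest run
ln -sf "$OUTPUT_FILE" /tmp/nginx-fleet-inventory.csv

echo ""
echo "Inventory written to: $OUTPUT_FILE"
cat "$OUTPUT_FILE"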
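
The inventory only answers "how many instances, and what versions?" once it is aggregated. A minimal sketch of that aggregation follows, assuming the CSV columns written by the script above; the host and cluster names in the sample are hypothetical.

```python
import csv
import io
from collections import Counter

def summarise_inventory(csv_text: str) -> Counter:
    """Count instances per (tier, version) from inventory CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["Tier"], row["NGINX Version"]) for row in reader)

# Hypothetical inventory rows in the same column layout as the script output.
sample = """\
Tier,Host/Cluster,Location,NGINX Version,Source,Last Updated
systemd,web-01,/usr/sbin/nginx,1.26.2,package,2025-03-01
systemd,web-02,/usr/sbin/nginx,1.26.2,package,2025-03-01
kubernetes,prod-eu,ingress-nginx/ingress-nginx,1.25.5,helm,2025-03-01
container,prod-eu,shop/frontend-7d9f,embedded,docker-image,unknown
"""

for (tier, version), count in sorted(summarise_inventory(sample).items()):
    print(f"{tier:<12} {version:<10} {count}")
```

Rows whose version is "embedded" or "unknown" are worth tracking as their own bucket, since they represent instances the fleet cannot yet measure.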

Step 2 — Check inventory against CVE-affected versions

#!/usr/bin/env python3
# scripts/nginx-cve-check.py
# Compares fleet inventory against known CVE-affected version ranges

import csv
import sys
from dataclasses import dataclass
from typing import Optional
from packaging.version import Version, InvalidVersion

@dataclass
class NginxCVE:
    cve_id: str
    severity: str
    affected_mainline_lt: Optional[str]
    affected_stable_lt: Optional[str]
    description: str

# Current CVE database — update as new CVEs are published
NGINX_CVES = [
    NginxCVE("CVE-2024-7347", "MEDIUM", "1.27.1", "1.26.2",
              "ngx_http_mp4_module buffer over-read"),
    NginxCVE("CVE-2024-24989", "HIGH", "1.25.4", None,
              "QUIC module NULL pointer dereference"),
    NginxCVE("CVE-2024-24990", "HIGH", "1.25.4", None,
              "QUIC module use-after-free"),
    NginxCVE("CVE-2025-23419", "MEDIUM", "1.27.4", "1.26.3",
              "mTLS session resumption bypass"),
]

def parse_version(version_str: str) -> Optional[Version]:
    try:
        # Strip 'v' prefix if present
        return Version(version_str.lstrip('v'))
    except InvalidVersion:
        return None

def check_vulnerabilities(nginx_version: str) -> list[NginxCVE]:
    v = parse_version(nginx_version)
    if v is None:
        return []

    # NGINX convention: odd minor versions (1.25, 1.27) are mainline,
    # even minor versions (1.24, 1.26) are stable.
    is_mainline = v.minor % 2 == 1

    # The "fixed in" bounds ignore the first affected release, so versions
    # older than a CVE's introduction are still reported.
    findings = []
    for cve in NGINX_CVES:
        fixed_in = cve.affected_mainline_lt if is_mainline else cve.affected_stable_lt
        patched = parse_version(fixed_in) if fixed_in else None
        if patched is not None and v < patched:
            findings.append(cve)

    return findings

if __name__ == "__main__":
    inventory_file = sys.argv[1] if len(sys.argv) > 1 else "/tmp/nginx-fleet-inventory.csv"

    critical_hosts = []
    warning_hosts = []
    unknown_hosts = []

    with open(inventory_file, newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            host_id = f"{row['Tier']}/{row['Host/Cluster']}/{row['Location']}"
            version = (row.get("NGINX Version") or "").strip()
            # Treat missing or unparseable versions as unknown rather than clean
            if parse_version(version) is None:
                unknown_hosts.append(host_id)
                print(f"UNKNOWN version: {host_id}")
                continue

            findings = check_vulnerabilities(version)
            if not findings:
                continue

            critical = any(c.severity in ("HIGH", "CRITICAL") for c in findings)
            (critical_hosts if critical else warning_hosts).append(host_id)

            print(f"\n{'CRITICAL' if critical else 'WARNING'}: {host_id}")
            print(f"  Running NGINX {version}")
            for finding in findings:
                print(f"  [{finding.severity}] {finding.cve_id}: {finding.description}")

    print(f"\n\nSummary: {len(critical_hosts)} critical, {len(warning_hosts)} warning, "
          f"{len(unknown_hosts)} unknown")
    if critical_hosts:
        print(f"{len(critical_hosts)} hosts require immediate patching")
        sys.exit(1)
    if not warning_hosts and not unknown_hosts:
        print("All hosts running patched NGINX versions")
    sys.exit(0)
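
A "fixed in" bound alone over-reports: the HTTP/3 issues, for instance, cannot affect releases that predate the QUIC module. One possible refinement is to record the first and last affected release per CVE. The sketch below uses only the standard library; the two ranges are my reading of the nginx.org advisories and should be verified before use, and the names `AffectedRange` and `cves_for` are hypothetical.

```python
from typing import NamedTuple, Optional

def as_tuple(version: str) -> Optional[tuple[int, ...]]:
    """Parse '1.25.3' or 'v1.25.3' into (1, 25, 3); pad short versions with zeros."""
    try:
        parts = [int(p) for p in version.strip().lstrip("v").split(".")]
    except ValueError:
        return None
    return tuple(parts + [0] * (3 - len(parts)))

class AffectedRange(NamedTuple):
    cve_id: str
    first_affected: tuple[int, ...]
    last_affected: tuple[int, ...]

# Assumed ranges for illustration; confirm against the vendor advisories.
RANGES = [
    AffectedRange("CVE-2024-24990", (1, 25, 0), (1, 25, 3)),
    AffectedRange("CVE-2024-24989", (1, 25, 3), (1, 25, 3)),
]

def cves_for(version: str) -> list[str]:
    """Return CVE IDs whose inclusive affected range contains the version."""
    v = as_tuple(version)
    if v is None:
        return []
    return [r.cve_id for r in RANGES if r.first_affected <= v <= r.last_affected]

print(cves_for("1.25.3"))
print(cves_for("1.23.4"))
```

A range check of this kind still cannot tell whether the affected module is compiled in or enabled, so it narrows the list of hosts to investigate rather than proving exposure.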

Step 3 — Patch Tier 1: bare metal / VM via Ansible

# playbooks/nginx-emergency-patch.yml
# Emergency patching playbook for NGINX CVE remediation

---
- name: Emergency NGINX Patch — CVE Remediation
  hosts: nginx_hosts
  serial: "20%"  # Roll out to 20% of hosts at a time
  max_fail_percentage: 10  # Abort if more than 10% of hosts fail
  
  vars:
    target_nginx_version_debian: "1.26.3"
    target_nginx_version_rhel: "1.26.3"
    slack_webhook: "{{ lookup('env', 'SLACK_WEBHOOK') }}"
  
  pre_tasks:
    - name: Record current NGINX version
      command: nginx -v
      register: nginx_version_before
      changed_when: false
      ignore_errors: true
    
    - name: Gather service state before patching
      service_facts:

    # Stopped hosts are skipped here and still need patching before NGINX
    # is started on them again
    - name: Skip hosts where NGINX is not running
      meta: end_host
      when: ansible_facts.services.get('nginx.service', {}).get('state') != 'running'

  tasks:
    - name: Patch NGINX on Debian/Ubuntu
      apt:
        name: nginx
        state: latest
        update_cache: yes
      when: ansible_os_family == "Debian"
      notify: restart nginx

    - name: Patch NGINX on RHEL/CentOS
      yum:
        name: nginx
        state: latest
        update_cache: yes
      when: ansible_os_family == "RedHat"
      notify: restart nginx

    - name: Record new NGINX version
      command: nginx -v
      register: nginx_version_after
      changed_when: false

    # Runs before the handler fires, so a broken config fails the host
    # without restarting NGINX
    - name: Test NGINX configuration is valid
      command: nginx -t
      register: nginx_test
      changed_when: false

    - name: Log patch result
      debug:
        msg: "{{ inventory_hostname }}: {{ nginx_version_before.stderr | default('unknown') }} → {{ nginx_version_after.stderr }}"

  handlers:
    # Restart rather than reload: a reload keeps the old binary running
    - name: restart nginx
      service:
        name: nginx
        state: restarted

  post_tasks:
    - name: Verify NGINX is serving requests after restart
      uri:
        url: "http://localhost/health"
        status_code: 200
      register: health_check
      retries: 5
      delay: 3
      until: health_check.status == 200
      ignore_errors: true

    # Failing the host lets max_fail_percentage abort the remaining batches
    - name: Fail the host on a failed health check
      fail:
        msg: "Health check failed on {{ inventory_hostname }} after NGINX patch"
      when: health_check is failed
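
The before and after output of `nginx -v` recorded by the playbook can be compared to catch a pinned host where the package manager succeeded but nothing changed. This is a minimal sketch under the assumption that the banner contains `nginx/<version>`; the function names and result labels are hypothetical.

```python
import re
from typing import Optional

def nginx_version(banner: str) -> Optional[tuple[int, ...]]:
    """Extract the version from text such as 'nginx version: nginx/1.26.3'."""
    match = re.search(r"nginx/(\d+(?:\.\d+)*)", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))

def classify_patch(before: str, after: str, minimum_fixed: str) -> str:
    """Label a host by comparing pre- and post-patch banners to the fixed version."""
    old = nginx_version(before)
    new = nginx_version(after)
    fixed = nginx_version(f"nginx/{minimum_fixed}")
    if new is None or fixed is None:
        return "unknown"
    if new >= fixed:
        return "patched"
    if old == new:
        return "unchanged"  # possibly a pin, a hold, or a repository without the fix
    return "upgraded-but-still-vulnerable"

print(classify_patch("nginx version: nginx/1.26.2",
                     "nginx version: nginx/1.26.3", "1.26.3"))
print(classify_patch("nginx version: nginx/1.24.0 (Ubuntu)",
                     "nginx version: nginx/1.24.0 (Ubuntu)", "1.26.3"))
```

Distro packages often backport security fixes without changing the upstream version number, so an "unchanged" result on a distro build should be cross-checked against the distro's own security tracker before being treated as vulnerable.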

Step 4 — Patch Tier 2: Kubernetes ingress-nginx

#!/bin/bash
# scripts/patch-ingress-nginx.sh
# Staged ingress-nginx update across Kubernetes clusters

PATCHED_CHART_VERSION="${1:?Usage: $0 <chart-version> [cluster-context...]}"
CLUSTERS="${*:2}"  # remaining arguments joined into one space-separated string

# If no clusters specified, use all contexts
if [[ -z "$CLUSTERS" ]]; then
    CLUSTERS=$(kubectl config get-contexts -o name 2>/dev/null)
fi

STAGING_CLUSTER="${STAGING_CLUSTER:-staging}"
WAIT_MINUTES="${WAIT_MINUTES:-15}"

patch_cluster() {
    local context="$1"
    echo ""
    echo "=== Patching ingress-nginx on cluster: $context ==="
    
    # Get current version
    CURRENT=$(helm -n ingress-nginx list --kube-context="$context" \
        -o json 2>/dev/null | jq -r '.[0].chart')
    echo "Current chart: $CURRENT"
    
    # Upgrade; --atomic rolls back automatically on failure.
    # (A comment after a trailing backslash would break the line continuation.)
    if ! helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
        --kube-context="$context" \
        --namespace ingress-nginx \
        --version "$PATCHED_CHART_VERSION" \
        --wait \
        --timeout 5m \
        --atomic \
        -f "helm/ingress-nginx/values-${context}.yaml" 2>&1; then
        echo "FAIL: Helm upgrade failed on $context; check for automatic rollback"
        return 1
    fi
    
    # Verify new pods are running
    kubectl --context="$context" rollout status \
        deployment/ingress-nginx-controller \
        -n ingress-nginx \
        --timeout=3m || return 1
    
    # Smoke test — send a test request through ingress
    echo "Running smoke test on $context..."
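    # Some load balancers publish a hostname instead of an IP; whichever
    # field is set ends up in INGRESS_IP
    INGRESS_IP=$(kubectl --context="$context" get svc \
        -n ingress-nginx ingress-nginx-controller \
        -o jsonpath='{.status.loadBalancer.ingress[0].ip}{.status.loadBalancer.ingress[0].hostname}')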
    
    HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
        --connect-timeout 5 \
        -H "Host: health.example.com" \
        "http://$INGRESS_IP/health" 2>/dev/null)
    
    if [[ "$HTTP_STATUS" != "200" ]]; then
        echo "WARN: Smoke test returned HTTP $HTTP_STATUS on $context"
    else
        echo "OK: Smoke test passed on $context"
    fi
    
    return 0
}

# Always patch staging first
echo "Phase 1: Staging cluster"
patch_cluster "$STAGING_CLUSTER" || {
    echo "ABORT: Staging patch failed — halting production rollout"
    exit 1
}

echo "Waiting ${WAIT_MINUTES} minutes before production rollout..."
echo "Monitor staging: kubectl --context=$STAGING_CLUSTER get pods -n ingress-nginx -w"
sleep "${WAIT_MINUTES}m"

# Patch production clusters
echo ""
echo "Phase 2: Production clusters"
FAILED_CLUSTERS=()
for context in $CLUSTERS; do
    [[ "$context" == "$STAGING_CLUSTER" ]] && continue
    patch_cluster "$context" || FAILED_CLUSTERS+=("$context")
done

if [[ ${#FAILED_CLUSTERS[@]} -gt 0 ]]; then
    echo ""
    echo "FAILED clusters: ${FAILED_CLUSTERS[*]}"
    exit 1
fi

echo ""
echo "Patch complete. Run the CVE check script to verify:"
echo "  python3 scripts/nginx-cve-check.py /tmp/nginx-fleet-inventory.csv"
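
Before starting a staged rollout it helps to know which clusters are already on the patched chart. The sketch below assumes the JSON from `helm list -o json` has been captured per context and that the release is named ingress-nginx, as in the script above; the context names and chart versions in the sample are hypothetical.

```python
import json

def clusters_needing_upgrade(releases_by_context: dict[str, str],
                             target_version: str,
                             chart_name: str = "ingress-nginx") -> list[str]:
    """Return contexts whose release is missing or not on the target chart version."""
    pending = []
    for context, helm_json in releases_by_context.items():
        releases = json.loads(helm_json)
        # Helm reports the chart as "<name>-<version>", e.g. "ingress-nginx-4.11.2".
        current = next(
            (r["chart"].removeprefix(f"{chart_name}-")
             for r in releases if r.get("name") == chart_name),
            None,
        )
        if current != target_version:
            pending.append(context)
    return pending

# Hypothetical captured output, one JSON document per kube context.
sample = {
    "staging": '[{"name": "ingress-nginx", "chart": "ingress-nginx-4.12.1"}]',
    "prod-eu": '[{"name": "ingress-nginx", "chart": "ingress-nginx-4.11.2"}]',
    "prod-us": "[]",
}

print(clusters_needing_upgrade(sample, "4.12.1"))
```

The comparison is a plain equality check on purpose: it flags anything that is not exactly the intended version, including clusters where the release is absent.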

Step 5 — Track patch progress and SLA compliance

#!/bin/bash
# scripts/nginx-patch-sla-report.sh
# Generates a patch SLA compliance report for a given CVE

CVE_ID="${1:?Usage: $0 <cve-id> <disclosure-date> <sla-days>}"
DISCLOSURE_DATE="${2:?}"  # Format: YYYY-MM-DD
SLA_DAYS="${3:-7}"

INVENTORY_FILE="/tmp/nginx-fleet-inventory.csv"

DISCLOSURE_EPOCH=$(date -d "$DISCLOSURE_DATE" +%s)
NOW_EPOCH=$(date +%s)
DAYS_ELAPSED=$(( (NOW_EPOCH - DISCLOSURE_EPOCH) / 86400 ))
SLA_DEADLINE=$(date -d "$DISCLOSURE_DATE + $SLA_DAYS days" +%Y-%m-%d)

echo "=== NGINX $CVE_ID Patch SLA Report ==="
echo "CVE disclosure: $DISCLOSURE_DATE"
echo "SLA deadline: $SLA_DEADLINE (${SLA_DAYS} days)"
echo "Days elapsed: $DAYS_ELAPSED"
echo ""

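# Per-host lines from the check script look like "CRITICAL: tier/host/location";
# anchoring on the tier name keeps its summary lines out of the counts.
HOST_LINE='^(CRITICAL|WARNING|UNKNOWN version): (systemd|kubernetes|container)/'
REPORT=$(python3 scripts/nginx-cve-check.py "$INVENTORY_FILE" 2>/dev/null)

echo "$REPORT" | grep -E "$HOST_LINE" | \
    while IFS= read -r line; do
        if [[ "$DAYS_ELAPSED" -gt "$SLA_DAYS" ]]; then
            echo "SLA BREACH: $line"
        else
            echo "OPEN: $line ($(( SLA_DAYS - DAYS_ELAPSED )) days remaining)"
        fi
    done

# Count total vs patched; hosts with unknown versions are not counted as patched
TOTAL=$(( $(wc -l < "$INVENTORY_FILE") - 1 ))  # -1 for header
OPEN=$(echo "$REPORT" | grep -cE "$HOST_LINE")
PATCHED=$(( TOTAL - OPEN ))

echo ""
if [[ "$TOTAL" -gt 0 ]]; then
    echo "Summary: $PATCHED/$TOTAL hosts patched ($(( PATCHED * 100 / TOTAL ))%)"
else
    echo "Summary: inventory is empty"
fi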
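
The date arithmetic in the report relies on GNU `date -d`. Where that is unavailable, the same SLA calculation can be sketched with the Python standard library; the function names and the dates used below are hypothetical.

```python
from datetime import date, timedelta

def sla_status(disclosed: date, sla_days: int, today: date) -> tuple[str, int]:
    """Return the SLA state and days remaining (negative once breached)."""
    deadline = disclosed + timedelta(days=sla_days)
    remaining = (deadline - today).days
    return ("SLA BREACH" if remaining < 0 else "OPEN", remaining)

def patched_percentage(total: int, open_findings: int) -> int:
    """Whole-number percentage of hosts with no open finding; 0 for an empty fleet."""
    if total <= 0:
        return 0
    return (total - open_findings) * 100 // total

state, remaining = sla_status(date(2025, 2, 5), 7, date(2025, 2, 10))
print(f"{state}: {remaining} days remaining")
print(f"{patched_percentage(200, 30)}% patched")
```

Passing `today` in explicitly keeps the calculation reproducible, which matters when the report is attached to an audit trail.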

Expected Behaviour

Scenario	Without process	With process
CVE published; fleet inventory needed	Manual host-by-host audit; 2–3 days	Inventory script runs in 10 minutes; vulnerable hosts identified immediately
Patch Ansible tier	Manual per-host; no coordination	`serial: 20%` rolling playbook; rollout aborts if more than 10% of a batch fails (no automatic rollback); health check post-patch
Patch Kubernetes ingress-nginx	Manual Helm upgrade per cluster; no staging gate	Staged script: staging first, 15-min wait, then production with atomic rollback
Application container tier	Not tracked; patching unknown	Docker image scan in CI catches NGINX base image version; application teams alerted
CVE SLA compliance reporting	Not tracked	SLA report script shows per-host compliance percentage and days remaining
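
The application container row depends on a CI check that is not shown in the steps above. One way such a check might look is sketched below: it reads a Dockerfile and reports `FROM` lines that use an NGINX base image. The function name and the sample Dockerfile are hypothetical, and NGINX installed through a package manager inside the image is not detected.

```python
import re

# Matches "FROM [--platform=...] <image>" at the start of a line.
FROM_RE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)",
                     re.IGNORECASE | re.MULTILINE)

def nginx_base_images(dockerfile_text: str) -> list[str]:
    """Return base images whose repository name is exactly 'nginx'."""
    images = FROM_RE.findall(dockerfile_text)
    return [img for img in images if re.search(r"(^|/)nginx(:|@|$)", img)]

# Hypothetical multi-stage Dockerfile for illustration.
sample = """\
FROM node:20 AS build
RUN npm ci && npm run build

FROM nginx:1.24-alpine
COPY --from=build /app/dist /usr/share/nginx/html
"""

print(nginx_base_images(sample))
```

A CI job could fail the build when the reported tag is not on an approved list, which gives application teams the alert before the image ships.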

Trade-offs

Aspect	Benefit	Cost	Mitigation
`serial: 20%` in Ansible	Limits blast radius of bad patch	Takes longer to patch full fleet	Acceptable trade-off for zero-downtime patching; increase serial percentage for critical CVEs
Helm `--atomic` flag	Automatic rollback on upgrade failure	Rollback may cause brief traffic interruption during rollout	Prefer rollback over leaving a broken ingress state; monitor rollout in real time
Inventorying application container NGINX	Full fleet visibility	Requires scanning all pod images in all clusters	Run inventory script weekly and on CVE publication; pipe into Slack or ticketing
CVE database in script	Immediate check without external dependencies	Must be updated manually as CVEs are published	Subscribe to nginx-announce mailing list; create a bot that opens a PR to update the CVE list
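
The cost of `serial: 20%` can be estimated before an emergency rather than during one. The sketch below assumes the batch size is the percentage of hosts rounded down with a minimum of one, which is my understanding of how Ansible resolves a percentage; the per-batch duration is a hypothetical input that would need to be measured.

```python
import math

def rollout_batches(hosts: int, serial_percent: int) -> tuple[int, int]:
    """Return (batch_size, batch_count) for a percentage-based rolling update."""
    # Assumption: percentage of hosts rounded down, never fewer than one per batch.
    batch_size = max(1, hosts * serial_percent // 100)
    return batch_size, math.ceil(hosts / batch_size)

def rollout_minutes(hosts: int, serial_percent: int, minutes_per_batch: int) -> int:
    """Estimate total rollout time when batches run one after another."""
    _, batch_count = rollout_batches(hosts, serial_percent)
    return batch_count * minutes_per_batch

for percent in (20, 50):
    size, count = rollout_batches(200, percent)
    total = rollout_minutes(200, percent, minutes_per_batch=6)
    print(f"serial {percent}%: {count} batches of {size} hosts, about {total} minutes")
```

For the 200-instance fleet from the threat model, raising the percentage from 20 to 50 cuts the number of batches from five to two, at the price of exposing half the fleet to a bad package at once.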

Failure Modes

Failure	Symptom	Detection	Recovery
NGINX package pinned or held by OS configuration	Package manager reports NGINX is “already the newest version” but a vulnerable version remains	CVE check script still flags host post-patch	Check `/etc/apt/preferences.d/`, `apt-mark showhold`, and `yum.conf` excludes for nginx pins; remove the pin and retry the patch
Helm `--atomic` rollback triggers mid-production rollout	Ingress-nginx rolls back; cluster still on old version	Helm release shows previous revision; CVE check flags cluster	Investigate why upgrade failed (`helm history ingress-nginx -n ingress-nginx`); fix and retry
Application container NGINX not caught by inventory	Hidden vulnerable NGINX in application pods	Post-patch CVE scan reports clean; exploitation via application container	Add image scanning to application CI pipelines; require approved NGINX base image tags
Patch applied but NGINX not restarted	Master and worker processes still run the old binary from before the update	`nginx -v` shows the new version but `ls -l /proc/$(pgrep -o nginx)/exe` shows the binary as `(deleted)`	`systemctl restart nginx` (not reload) starts the new binary; confirm the service is still active afterwards
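
The last failure mode can be detected mechanically on Linux: once a package upgrade replaces the file on disk, the `/proc/<pid>/exe` link of a process still running the old binary ends in " (deleted)". A minimal sketch follows; the function names are hypothetical, and reading the link for a process owned by another user generally requires root.

```python
import os

def exe_is_stale(exe_link_target: str) -> bool:
    """True when a /proc/<pid>/exe target shows the on-disk binary was replaced."""
    return exe_link_target.endswith(" (deleted)")

def running_binary_is_stale(pid: int) -> bool:
    """Check a live process; Linux only, may raise PermissionError or FileNotFoundError."""
    return exe_is_stale(os.readlink(f"/proc/{pid}/exe"))

# Link targets as they would appear before and after a package upgrade.
print(exe_is_stale("/usr/sbin/nginx"))
print(exe_is_stale("/usr/sbin/nginx (deleted)"))
```

Feeding the NGINX master PID into `running_binary_is_stale` after each patch run gives a per-host signal that a restart is still outstanding.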

Related Articles

NGINX Worker Privilege Hardening: OS-level controls that contain exploitation during the patch window
ingress-nginx Version Pinning: Renovate and ArgoCD-based automated version management for ingress-nginx
Cyber Insurance Technical Requirements: patch SLA requirements that cyber insurance mandates and how to demonstrate compliance
Vulnerability Management Program: the broader vulnerability management process NGINX CVEs fall within
NGINX CVE Exploitation Detection: detection rules that alert on exploitation while the patch is being rolled out