Patch SLA Compression in the LLM Exploit Era: From 30 Days to 24 Hours
The Problem
Enterprise vulnerability management programs typically define patch SLAs by CVSS severity: Critical (CVSS ≥ 9.0) within 30 days, High (CVSS 7.0–8.9) within 60 days. These numbers were calibrated in an era when exploit development took days to weeks and attackers needed significant skill to turn a CVE description into working code.
That calibration is obsolete. LLM-assisted exploit pipelines can produce a functional exploit for a well-described CVE within hours. The CISA KEV list grows weekly with CVEs that were weaponised the same week they were published. The 30-day SLA for critical CVEs now means that a significant fraction of organisations will be exploited before their change-control process even schedules the patch.
The right answer is not simply to say “patch faster” — that advice ignores the infrastructure and process constraints that make fast patching hard. The answer is to identify and eliminate the structural blockers that make sub-24-hour patching impossible, then implement the automation that makes it reliable.
The structural blockers are:
Mutable infrastructure. If patching requires SSH-ing into servers and running apt upgrade, you need available engineers, maintenance windows, and manual verification. Immutable images (rebuild and redeploy, never patch in place) make patching a deployment operation, not a system administration operation.
Change-control bureaucracy. Many enterprise change-control processes require a CAB meeting, a change ticket, a rollback plan, and a multi-hour approval window. A 24-hour SLA cannot coexist with a weekly CAB cycle. Security-exception pathways and automated approval workflows are required.
Test/staging environment lag. If patching requires the patch to traverse dev → test → staging → production with manual approval at each gate, the latency is days even with an immutable image model. A progressive staged rollout in production (5% → 25% → 100%) with automated health gates removes human approval from the hot path.
Escalation paths not defined. A weekend CVE requires an on-call engineer with the authority and access to deploy a patch. Many organisations have on-call for production incidents but not for security patches. The patch SLA timer does not stop on weekends.
Target systems: Any internet-facing production infrastructure; organisations with CVSS 9+ CVEs appearing in their scanner results more than once per quarter; teams currently operating on a 30-day or 60-day patch cycle.
Threat Model
1. Automated exploit deployment targeting unpatched fleet (attacker with LLM exploit pipeline). Objective: scan the internet for services returning version strings or response headers indicating vulnerable software; deploy LLM-generated exploit at scale. Impact: organisations still in their 30-day patch window are compromised before they patch.
2. Targeted attack on high-value organisation (APT or criminal group). Objective: identify a specific target’s technology stack via prior reconnaissance; wait for a CVE affecting their stack; deploy an LLM-generated exploit within 24 hours of publication. Impact: targeted intrusion; data exfiltration; ransomware deployment.
3. Patch regression causing production incident (accidental, during emergency patch). Objective (unintentional): rushing a patch through without adequate testing causes a regression that breaks production. Impact: availability incident, which in some scenarios may be worse than the CVE itself. This is the risk the staged rollout approach mitigates.
Hardening Configuration
Step 1: Define SLA Tiers and Measure Baseline
Before you can compress SLAs, you need to measure where you are:
```bash
# Measure current patch latency from CVE publication (approximated here by
# first detection) to production deployment.
# Data sources: scanner history and deployment logs,
# e.g. Trivy scan history + Kubernetes deployment timestamps.
# For each CVE: find the earliest scan that reports it and the first later scan
# that no longer reports it (patch deployed), then compute the duration.
python3 << 'EOF'
import statistics

# In practice, build this list from your scanner's API or database
patch_times = [72, 144, 336, 48, 720, 24, 168]  # hours, sample data

print(f"Median patch time: {statistics.median(patch_times)}h")
# quantiles(n=10) returns 9 decile cut points; index 8 is the 90th percentile
p90 = statistics.quantiles(patch_times, n=10, method="inclusive")[8]
print(f"P90 patch time: {p90:.0f}h")
print(f"Max patch time: {max(patch_times)}h")
EOF
```
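The scan-diff step described in the comments can be sketched as a pure function. This is a minimal sketch, assuming your scanner history can be exported per target as `(scan timestamp, set of CVE IDs reported)` pairs; `patch_latencies` and the placeholder CVE IDs are hypothetical. A CVE that disappears and later reappears keeps its first resolution time, and CVEs still present in the latest scan are left out because they are not yet patched.

```python
from datetime import datetime


def patch_latencies(scans: list[tuple[datetime, set[str]]]) -> dict[str, float]:
    """Hours from the first scan reporting each CVE to the first later scan
    that no longer reports it. Unresolved CVEs are omitted."""
    first_seen: dict[str, datetime] = {}
    latencies: dict[str, float] = {}
    for ts, cves in sorted(scans, key=lambda s: s[0]):
        # Record the first detection of each CVE
        for cve in cves:
            first_seen.setdefault(cve, ts)
        # A previously seen, not-yet-resolved CVE missing from this scan was patched
        for cve, seen in first_seen.items():
            if cve not in cves and cve not in latencies:
                latencies[cve] = (ts - seen).total_seconds() / 3600
    return latencies


scans = [
    (datetime(2025, 3, 3, 9, 0), {"CVE-A", "CVE-B"}),
    (datetime(2025, 3, 4, 9, 0), {"CVE-B"}),
    (datetime(2025, 3, 10, 9, 0), set()),
]
print(patch_latencies(scans))  # → {'CVE-A': 24.0, 'CVE-B': 168.0}
```

The values of the returned dict are the hours that feed the median and P90 calculation.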
Define your target SLA tiers:
| Tier | Criteria | Target SLA | Automation Level |
|---|---|---|---|
| P0 | CISA KEV + internet-exposed | 12 hours | Fully automated; security approval required |
| P1 | CVSS ≥ 9.5 OR EPSS ≥ 0.30 | 24 hours | Automated + emergency change-control |
| P2 | CVSS 9.0–9.4 OR EPSS 0.15–0.30 | 72 hours | Normal fast-track process |
| P3 | CVSS 7.0–8.9, EPSS < 0.15 | 7 days | Standard process |
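The tier table translates into an ordered rule list where the first matching row wins, which also settles the shared 0.30 EPSS boundary between P1 and P2 (0.30 lands in P1). A minimal sketch: `classify_tier` is a hypothetical helper, and because the table does not say how to treat KEV entries on internal-only assets or CVEs below CVSS 7.0, this version lets both fall through to the CVSS/EPSS rules and the P3 catch-all. That is an assumption you may want to tighten.

```python
def classify_tier(in_kev: bool, internet_exposed: bool, cvss: float, epss: float) -> tuple[str, int]:
    """Map a finding to (tier, SLA hours); the first matching row wins."""
    if in_kev and internet_exposed:
        return "P0", 12
    if cvss >= 9.5 or epss >= 0.30:
        return "P1", 24
    if cvss >= 9.0 or epss >= 0.15:
        return "P2", 72
    # Catch-all (assumption): everything else, including CVSS < 7.0, is P3
    return "P3", 168


print(classify_tier(in_kev=False, internet_exposed=True, cvss=9.8, epss=0.35))  # → ('P1', 24)
```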
Step 2: Immutable Infrastructure Prerequisites
```dockerfile
# Base image pattern for immutable deployments
# Always pin to a digest, not a tag
FROM debian:bookworm@sha256:1234abc... AS base
# All package installations happen at build time (version pin is illustrative)
RUN apt-get update && apt-get install -y --no-install-recommends \
    libssl3=3.0.15-1 \
    && rm -rf /var/lib/apt/lists/*
# Application binary from the build context (path is illustrative)
COPY server /app/server

# No shell, SSH or package manager in the final image.
# Pin this stage to a digest as well; the tag is shown for readability.
FROM gcr.io/distroless/base-debian12 AS final
COPY --from=base /usr/lib /usr/lib
COPY --from=base /app /app
USER nonroot:nonroot
ENTRYPOINT ["/app/server"]
```
Automated base image rebuild on upstream CVE patches:
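```json5
// renovate.json5: auto-update base image digests
// (JSON5, because plain JSON does not allow comments)
{
  "extends": ["config:recommended"],
  "packageRules": [
    {
      // Pin every Docker image reference to a digest
      "matchDatasources": ["docker"],
      "pinDigests": true
    },
    {
      "matchDatasources": ["docker"],
      "matchPackageNames": ["/debian/", "/alpine/", "/ubuntu/"],
      "automerge": false, // Security team reviews base image updates
      "labels": ["security", "base-image-update"]
    }
  ]
}
```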
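Renovate keeps pinned digests current, but it does not stop someone from committing a tag-only `FROM` line. A small CI check can enforce the "pin to a digest" rule. This is a minimal sketch: `unpinned_base_images` is a hypothetical helper, and it ignores `ARG` substitution and `FROM` instructions split across lines.

```python
import re

# Matches "FROM [--flag=value ...] <image>" at the start of a line
FROM_RE = re.compile(r"^\s*FROM\s+(?:--\S+\s+)*(\S+)", re.IGNORECASE)
ALIAS_RE = re.compile(r"\bAS\s+(\S+)\s*$", re.IGNORECASE)


def unpinned_base_images(dockerfile_text: str) -> list[str]:
    """Return FROM image references that are not pinned to a sha256 digest."""
    stages: set[str] = set()
    unpinned = []
    for line in dockerfile_text.splitlines():
        m = FROM_RE.match(line)
        if not m:
            continue
        image = m.group(1)
        # Earlier build stages ("FROM base") and scratch are not registry images
        if image not in stages and image != "scratch" and "@sha256:" not in image:
            unpinned.append(image)
        alias = ALIAS_RE.search(line)
        if alias:
            stages.add(alias.group(1))
    return unpinned
```

Run it over every Dockerfile in CI and fail the build when the returned list is non-empty.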
Step 3: Emergency Change-Control Pathway
Define a security change pathway that bypasses the standard CAB cycle for P0/P1 patches:
```yaml
# change-control-policy.yaml
standard_pathway:
  approval_required: ["change-advisory-board"]
  minimum_lead_time_hours: 48
  applies_to: [P3, P4]
fast_track_pathway:
  approval_required: ["engineering-lead", "security-team"]
  minimum_lead_time_hours: 4
  applies_to: [P2]
emergency_security_pathway:
  approval_required: ["security-oncall", "ciso-delegate"]
  minimum_lead_time_hours: 0 # Can start immediately
  applies_to: [P0, P1]
  constraints:
    - staged_rollout_required: true
    - automated_rollback_required: true
    - post_deployment_review_hours: 24
  notification:
    - ciso@example.com
    - security-leads@example.com
```
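A deployment pipeline can enforce this policy mechanically before accepting a change. A minimal sketch, assuming the YAML has been loaded into the dict shown (in practice a YAML parser would read the file) and that `approval_required` means every listed role must approve, which the policy file does not state explicitly. `pathway_for` and `may_deploy` are hypothetical names, and the `constraints` and `notification` keys are left out for brevity.

```python
# Mirror of change-control-policy.yaml (parse the real file with a YAML library)
POLICY = {
    "standard_pathway": {
        "approval_required": ["change-advisory-board"],
        "minimum_lead_time_hours": 48,
        "applies_to": ["P3", "P4"],
    },
    "fast_track_pathway": {
        "approval_required": ["engineering-lead", "security-team"],
        "minimum_lead_time_hours": 4,
        "applies_to": ["P2"],
    },
    "emergency_security_pathway": {
        "approval_required": ["security-oncall", "ciso-delegate"],
        "minimum_lead_time_hours": 0,
        "applies_to": ["P0", "P1"],
    },
}


def pathway_for(tier: str) -> str:
    """Return the name of the pathway whose applies_to list contains the tier."""
    for name, rules in POLICY.items():
        if tier in rules["applies_to"]:
            return name
    raise ValueError(f"no change pathway defined for tier {tier!r}")


def may_deploy(tier: str, approvals: set[str], hours_since_request: float) -> bool:
    """True once every required role has approved and the lead time has passed."""
    rules = POLICY[pathway_for(tier)]
    missing = set(rules["approval_required"]) - approvals
    return not missing and hours_since_request >= rules["minimum_lead_time_hours"]


print(may_deploy("P1", {"security-oncall", "ciso-delegate"}, 0))  # → True
```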
Automate the pathway selection:
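```python
#!/usr/bin/env python3
# patch-pathway.py: determines the correct pathway for a given CVE
import sys

import requests

EPSS_API = "https://api.first.org/data/v1/epss"
KEV_FEED = "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def get_patch_pathway(cve_id: str) -> tuple[str, int]:
    """Return (tier, SLA hours) for a CVE using KEV, CVSS and EPSS."""
    # Fetch EPSS score (0.0 if the CVE has not been scored yet)
    resp = requests.get(EPSS_API, params={"cve": cve_id}, timeout=10)
    data = resp.json().get("data", []) if resp.ok else []
    epss = float(data[0]["epss"]) if data else 0.0

    # Fetch KEV status (downloads the whole catalogue; cache it in production)
    kev_resp = requests.get(KEV_FEED, timeout=30)
    kev_resp.raise_for_status()
    kev_ids = {v["cveID"] for v in kev_resp.json()["vulnerabilities"]}

    # Fetch CVSS v3.1 base score (simplified: ignores v3.0 and v4.0 metrics)
    nvd_resp = requests.get(NVD_API, params={"cveId": cve_id}, timeout=30)
    try:
        metrics = nvd_resp.json()["vulnerabilities"][0]["cve"]["metrics"]
        cvss = float(metrics["cvssMetricV31"][0]["cvssData"]["baseScore"])
    except (KeyError, IndexError, ValueError):
        cvss = 0.0

    # KEV alone maps to P0 here; the tier table also requires internet
    # exposure, which has to come from your asset inventory.
    if cve_id in kev_ids:
        return "P0", 12
    elif cvss >= 9.5 or epss >= 0.30:
        return "P1", 24
    elif cvss >= 9.0 or epss >= 0.15:
        return "P2", 72
    else:
        return "P3", 168


if __name__ == "__main__":
    tier, sla_hours = get_patch_pathway(sys.argv[1])
    print(f"Tier: {tier}, SLA: {sla_hours}h")
```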
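`get_patch_pathway` downloads the entire KEV catalogue on every call, and the SLA tracker calls it once per finding every 15 minutes. A small time-to-live cache keeps the feed fresh without re-downloading it per CVE. A minimal sketch: `KevCache` is a hypothetical helper, the fetch function and clock are injected so the cache can be tested without network access, and the one-hour default TTL is an arbitrary assumption.

```python
import time
from collections.abc import Callable


class KevCache:
    """Caches the set of KEV CVE IDs, refetching after ttl_seconds."""

    def __init__(self, fetch: Callable[[], set[str]], ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._ids: set[str] | None = None
        self._fetched_at = 0.0

    def ids(self) -> set[str]:
        now = self._clock()
        # Refetch on first use or once the cached copy is older than the TTL
        if self._ids is None or now - self._fetched_at >= self._ttl:
            self._ids = self._fetch()
            self._fetched_at = now
        return self._ids
```

To wire it in, the set comprehension over the KEV feed moves into the `fetch` callable, and `get_patch_pathway` checks `cve_id in kev_cache.ids()` instead of downloading the feed itself.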
Step 4: Staged Rollout with Automated Health Gates
```yaml
# Argo Rollouts strategy for emergency patches
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: app
spec:
  # selector and template omitted for brevity
  strategy:
    canary:
      # Timed pauses and passing analysis steps advance automatically, so no
      # human approval is needed; a failed analysis aborts the rollout and
      # shifts traffic back to the stable version.
      steps:
        - setWeight: 5 # 5% of traffic for 5 minutes
        - pause: {duration: 5m}
        - analysis:
            templates:
              - templateName: error-rate-check
        - setWeight: 25
        - pause: {duration: 10m}
        - analysis:
            templates:
              - templateName: error-rate-check
        - setWeight: 100
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check
spec:
  metrics:
    - name: error-rate
      successCondition: result[0] < 0.005 # < 0.5% error rate
      failureLimit: 1
      interval: 60s
      count: 5 # with interval but no count, the metric runs indefinitely and the step never completes
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{status=~"5.."}[2m])) /
            sum(rate(http_requests_total[2m]))
```
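Argo Rollouts performs this evaluation itself; the sketch below only models the decision so the interaction between the 0.5% threshold and `failureLimit: 1` is easy to reason about. It is a simplified, hypothetical model (`evaluate_error_rate_gate` is not an Argo API): each sample is one measurement's 5xx and total request rates, a measurement fails when the ratio is not below the threshold, and the gate fails once failed measurements exceed `failureLimit`. It ignores Argo's inconclusive and error states and treats a zero-traffic measurement as failed, which is an assumption.

```python
import math


def evaluate_error_rate_gate(samples: list[tuple[float, float]],
                             threshold: float = 0.005,
                             failure_limit: int = 1) -> str:
    """Simplified model of the error-rate-check gate."""
    failed = 0
    for errors, total in samples:
        # No traffic gives no usable ratio; count it as a failed measurement
        ratio = errors / total if total > 0 else math.inf
        if not ratio < threshold:
            failed += 1
            if failed > failure_limit:
                return "Failed"
    return "Successful"
```

At 5% weight a low-traffic service may see fewer than 200 requests in the 2-minute window, in which case a single 5xx already pushes the ratio to 0.5% or more; check the threshold against real canary traffic before trusting the gate.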
Step 5: SLA Tracking and Escalation
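```python
# sla-tracker.py: tracks CVE patch SLA compliance
# Runs every 15 minutes as a Kubernetes CronJob
from datetime import datetime, timedelta, timezone

# Assumes patch-pathway.py is deployed alongside as an importable module (patch_pathway.py)
from patch_pathway import get_patch_pathway


class CVESLATracker:
    def __init__(self):
        self.findings = self.load_active_findings()

    def load_active_findings(self) -> list[dict]:
        # Query your scanner API for unpatched CVEs.
        # Return a list of {cve_id, tier, discovered_at, target_host_count},
        # with discovered_at as a timezone-aware datetime.
        return []

    def check_sla_status(self) -> list[dict]:
        now = datetime.now(timezone.utc)
        results = []
        for finding in self.findings:
            # Re-classify on every run so KEV additions and EPSS changes are picked up
            tier, sla_hours = get_patch_pathway(finding["cve_id"])
            deadline = finding["discovered_at"] + timedelta(hours=sla_hours)
            hours_remaining = (deadline - now).total_seconds() / 3600
            status = "on-track"
            if hours_remaining < 0:
                status = "breached"
            elif hours_remaining < sla_hours * 0.25:  # < 25% of the SLA window left
                status = "at-risk"
            results.append({
                "cve_id": finding["cve_id"],
                "tier": tier,
                "sla_hours": sla_hours,
                "hours_remaining": hours_remaining,
                "status": status,
                "affected_hosts": finding["target_host_count"],
            })
        return results

    def escalate_at_risk(self, findings: list[dict]) -> None:
        for f in findings:
            if f["status"] == "at-risk":
                self.notify_oncall(f)
            elif f["status"] == "breached":
                self.notify_ciso(f)
                self.open_incident(f)

    # Integration points: connect these to your paging, email and ticketing systems
    def notify_oncall(self, finding: dict) -> None:
        raise NotImplementedError

    def notify_ciso(self, finding: dict) -> None:
        raise NotImplementedError

    def open_incident(self, finding: dict) -> None:
        raise NotImplementedError


if __name__ == "__main__":
    tracker = CVESLATracker()
    tracker.escalate_at_risk(tracker.check_sla_status())
```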
Prometheus metrics for SLA dashboard:
```bash
# SLA compliance metric
cat << 'EOF' | curl --data-binary @- http://pushgateway:9091/metrics/job/patch-sla
# HELP cve_patch_sla_hours_remaining Hours until SLA breach for unpatched CVE
# TYPE cve_patch_sla_hours_remaining gauge
cve_patch_sla_hours_remaining{cve="CVE-2026-XXXXX",tier="P1"} 18.5
# HELP cve_patch_sla_breached Whether a CVE has breached its patch SLA
# TYPE cve_patch_sla_breached gauge
cve_patch_sla_breached{cve="CVE-2025-YYYYY",tier="P2"} 0
EOF
```
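Rather than hand-writing the exposition text, the tracker's results can be rendered into the same two gauges and pushed on each CronJob run. A minimal sketch: `render_sla_metrics` and `push_sla_metrics` are hypothetical helpers that consume the dicts returned by `check_sla_status`; label values are not escaped, which is acceptable for CVE IDs and tier names. It uses HTTP PUT, which replaces every metric in the `patch-sla` group, so series for CVEs that have since been patched do not linger on the dashboard.

```python
import urllib.request

PUSHGATEWAY_URL = "http://pushgateway:9091/metrics/job/patch-sla"


def render_sla_metrics(results: list[dict]) -> str:
    """Render check_sla_status() results in the Prometheus text exposition format."""
    remaining = [
        "# HELP cve_patch_sla_hours_remaining Hours until SLA breach for unpatched CVE",
        "# TYPE cve_patch_sla_hours_remaining gauge",
    ]
    breached = [
        "# HELP cve_patch_sla_breached Whether a CVE has breached its patch SLA",
        "# TYPE cve_patch_sla_breached gauge",
    ]
    for r in results:
        labels = f'cve="{r["cve_id"]}",tier="{r["tier"]}"'
        remaining.append(f"cve_patch_sla_hours_remaining{{{labels}}} {r['hours_remaining']:.1f}")
        breached.append(f"cve_patch_sla_breached{{{labels}}} {int(r['status'] == 'breached')}")
    # The text format requires the last line to end with a newline too
    return "\n".join(remaining + breached) + "\n"


def push_sla_metrics(body: str, url: str = PUSHGATEWAY_URL) -> None:
    # PUT replaces all metrics in the job's group
    req = urllib.request.Request(url, data=body.encode("utf-8"), method="PUT")
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()
```

On each run the tracker would call `push_sla_metrics(render_sla_metrics(tracker.check_sla_status()))`.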
Step 6: Weekend and Holiday Coverage
```yaml
# On-call rotation that includes security patch authority.
# PagerDuty escalation policy for P0/P1 CVEs (REST API field names, shown as
# YAML; ids are placeholders). escalation_delay_in_minutes is how long an
# unacknowledged incident stays on a rule before escalating to the next rule.
escalation_policy:
  name: Security Patch Response
  escalation_rules:
    - escalation_delay_in_minutes: 15 # 0 min: primary security on-call
      targets:
        - type: schedule
          id: security-oncall-primary
    - escalation_delay_in_minutes: 15 # 15 min: secondary on-call
      targets:
        - type: schedule
          id: security-oncall-secondary
    - escalation_delay_in_minutes: 30 # 30 min: CISO delegate
      targets:
        - type: user
          id: ciso-delegate
```
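In PagerDuty, `escalation_delay_in_minutes` belongs to each rule and means how long an unacknowledged incident waits on that rule before escalating to the next one, so the time at which a rule is reached is the sum of the delays of the rules before it. The sketch below, a hypothetical helper rather than a PagerDuty API, computes who has been paged after a given number of minutes, which makes it easy to check that the configured delays produce the intended 0 / 15 / 30 minute escalation. It ignores repeat loops.

```python
from itertools import accumulate


def notified_by(rules: list[tuple[str, int]], minutes_unacked: float) -> list[str]:
    """Targets paged so far for an incident nobody has acknowledged.

    rules: (target, escalation_delay_in_minutes) in escalation order.
    """
    # Rule i is reached once the delays of all earlier rules have elapsed
    start_times = [0, *accumulate(delay for _, delay in rules[:-1])]
    return [target for (target, _), start in zip(rules, start_times)
            if minutes_unacked >= start]


policy = [
    ("security-oncall-primary", 15),
    ("security-oncall-secondary", 15),
    ("ciso-delegate", 30),
]
print(notified_by(policy, 20))  # → ['security-oncall-primary', 'security-oncall-secondary']
```

Per-rule delays of 0, 15 and 30, read with these semantics, would instead reach the secondary at minute 0 and the CISO delegate at minute 15.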
Pre-authorise emergency deployments to avoid weekend approval delays:
```bash
# Pre-authorisation stored in a secrets manager as a time-bounded credential.
# The on-call engineer presents the AppRole secret ID to obtain a 1-hour
# deployment token, without needing CAB approval for P0/P1 patches.
vault write auth/approle/role/emergency-patch-deployer \
    secret_id_ttl=24h \
    token_ttl=1h \
    token_policies="production-deploy,read-only-config"
```
Expected Behaviour After Hardening
| Scenario | 30-Day SLA Era | 24-Hour SLA with Hardening |
|---|---|---|
| CISA KEV published Friday 17:00 | Triage begins Monday; patch deployed next week | On-call paged Friday 17:05; patch deployed Saturday 05:00 |
| CVSS 9.8, EPSS 0.35, active scanning detected | SLA clock starts; 30-day target | P1 classification; 24-hour clock; automated rollout starts |
| Patch causes regression at 5% canary | Full rollout deployed; regression found in production | Argo Rollouts health gate fails; automatic rollback; no wider impact |
| SLA breached for P3 CVE | No monitoring; team discovers weeks later | Metric fires; ticket opened; escalation to engineering lead |
Trade-offs and Operational Considerations
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Immutable image model | Patching is a deployment operation; fast and repeatable | Base image rebuild time adds 10-30 min to patch timeline | Pre-build base images nightly; patch only needs app layer rebuild |
| Automated staged rollout | No human approval needed for healthy patches | Auto-rollout deploys a regression if health checks miss it | Define health checks that exercise the functionality the patch touches, not only generic error rates |
| Emergency change pathway | Bypasses CAB latency | Reduced oversight for changes; higher regression risk | Require post-deployment review within 24h; log all emergency deployments |
| EPSS-based tier assignment | Focuses resources on likely-to-be-exploited CVEs | EPSS lags 24-72h post-publication; may misclassify fresh CVEs | Use KEV membership and CVSS as EPSS-independent tier classifiers; re-evaluate EPSS at 24h |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| On-call engineer lacks deployment access | P0 CVE unpatched despite escalation | SLA metric shows breach; deployment log empty | Pre-provision break-glass credentials with dual-approval for on-call role |
| Staged rollout health check too lenient | Regression reaches 100% before detection | Post-deployment error rate spike | Tighten health check thresholds; add smoke tests specific to the patched component |
| SLA tracker misclassifies CVE tier | Wrong urgency; wrong timeline | Manual review against policy; EPSS/KEV mismatch | Allow security team to manually override tier assignment with justification |
| Immutable image rebuild fails (base image pull error) | Patch timeline extended by build failure | CI build failure alert; SLA clock continues | Pre-pull base images to local mirror; use cached layer if upstream unavailable |