Managing CVE Remediation Pipelines at Scale
Problem
The volume of CVEs affecting software dependencies has increased sharply. A moderately complex application with 200 direct and transitive dependencies may see 50–100 CVE-related dependency update PRs per month from Renovate or Dependabot. A platform team managing 30 repositories may see 1,500–3,000 dependency PRs per month.
The conventional response — have a human review and merge each dependency update PR — does not scale at these volumes. The consequences of volume-induced overload are predictable:
Alert fatigue and queue abandonment. The team falls behind on dependency PRs. A large backlog accumulates. Team members stop reviewing the queue because it feels futile. PRs that contain critical CVE patches are lost in the noise alongside cosmetic version bumps.
Inconsistent SLA compliance. Without automation, some critical CVEs are patched quickly and others linger for months depending on which engineer happened to notice them. Audit trails show inconsistent patch times.
False positive noise. Vulnerability scanners flag CVEs in test dependencies, build-time-only packages, and packages where the vulnerable code path is unreachable. These false positives generate PRs that consume review time and erode trust in the scanner output.
The automation gap. Most teams have Renovate or Dependabot configured to open PRs automatically. The missing piece is the triage and merge automation: which PRs should be auto-merged, which need review, which can be deferred, and which need immediate escalation.
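This article builds a CVE remediation pipeline that handles volume through automation: auto-merge for low-risk patches, prioritization weighted by EPSS (Exploit Prediction Scoring System) scores with an override for CVEs on CISA's Known Exploited Vulnerabilities (KEV) list, false positive suppression, and SLA-driven escalation for high-risk findings.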
Target systems: any repository using Renovate or Dependabot for dependency updates; platform teams managing multiple repositories; security teams responsible for CVE remediation SLAs.
Threat Model
Risk 1 — CVE in KEV list merged late due to backlog. A KEV-listed CVE affects a production dependency. It generates an update PR that sits in the 200-PR backlog for three weeks. The patch SLA requires 24-hour remediation. The queue delay causes a compliance breach and potential exploitation window.
Risk 2 — Auto-merge introduces breaking change. An automated policy auto-merges a “patch” version bump (e.g., 1.2.3 → 1.2.4). The maintainer incorrectly labelled a breaking API change as a patch release. The application breaks in production.
Risk 3 — False positive drives unnecessary churn. A CVE in a development dependency (e.g., a testing tool) generates urgent-looking PRs. The team spends time reviewing and merging updates for CVEs that are not reachable in production, creating churn and eroding confidence in the scanner.
Configuration / Implementation
Step 1 — Configure Renovate with EPSS-aware grouping
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:recommended"],
  "vulnerabilityAlerts": {
    "enabled": true,
    "labels": ["security", "vulnerability", "priority"],
    "groupName": null,
    "automerge": false,
    "prPriority": 100,
    "reviewers": ["team:security-team"],
    "schedule": ["at any time"],
    "prTitle": "SECURITY: {{depName}} {{newVersion}} (CVE fix)"
  },
  "packageRules": [
    {
      "description": "Auto-merge patch updates for low-risk packages once status checks pass; make ci/test and ci/build required checks in branch protection",
      "matchUpdateTypes": ["patch"],
      "matchPackagePatterns": [".*"],
      "excludePackagePatterns": [
        "openssl", "libssl", "cryptography", "jwt", "auth",
        "postgres", "mysql", "sqlite", "redis"
      ],
      "automerge": true,
      "automergeType": "pr",
      "automergeStrategy": "squash",
      "prPriority": 0
    },
    {
      "description": "Never auto-merge security-critical packages; always require review",
      "matchPackagePatterns": [
        "openssl", "libssl", "cryptography", "bcrypt", "argon2",
        "jsonwebtoken", "passport", "django", "flask", "rails",
        "spring-security", "shiro", "bouncy-castle"
      ],
      "automerge": false,
      "labels": ["security", "requires-review"],
      "prPriority": 10,
      "reviewers": ["team:security-team"]
    },
    {
      "description": "Group non-security patch updates to reduce PR noise",
      "matchUpdateTypes": ["patch"],
      "matchPackagePatterns": ["^@types/", "^eslint", "^prettier", "^jest", "^@testing-library"],
      "groupName": "dev tooling patch updates",
      "automerge": true,
      "automergeType": "pr"
    }
  ],
  "prHourlyLimit": 5,
  "prConcurrentLimit": 15,
  "minimumReleaseAge": "2 days"
}
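Renovate evaluates `packageRules` in order, and a later matching rule overrides an earlier one, so it is not always obvious whether a given package ends up auto-merged. The sketch below is a simplified model of that precedence, not Renovate's own matcher. `RULES` mirrors a subset of the configuration above, and `rule_matches` and `would_automerge` are hypothetical helpers that only understand `matchUpdateTypes`, `matchPackagePatterns`, and `excludePackagePatterns`.

```python
import re

# Simplified subset of the packageRules above (illustrative only).
RULES = [
    {"matchUpdateTypes": ["patch"], "matchPackagePatterns": [".*"],
     "excludePackagePatterns": ["openssl", "cryptography", "jwt", "auth", "postgres"],
     "automerge": True},
    {"matchPackagePatterns": ["openssl", "cryptography", "jsonwebtoken", "django"],
     "automerge": False},
    {"matchUpdateTypes": ["patch"], "matchPackagePatterns": ["^@types/", "^eslint"],
     "automerge": True},
]


def rule_matches(rule: dict, package: str, update_type: str) -> bool:
    """Return True when every matcher present on the rule accepts the update."""
    types = rule.get("matchUpdateTypes")
    if types is not None and update_type not in types:
        return False
    if any(re.search(p, package) for p in rule.get("excludePackagePatterns", [])):
        return False
    patterns = rule.get("matchPackagePatterns")
    if patterns is not None and not any(re.search(p, package) for p in patterns):
        return False
    return True


def would_automerge(package: str, update_type: str, rules=RULES) -> bool:
    """Apply rules top to bottom; the last matching rule decides."""
    decision = False  # automerge is off unless a rule turns it on
    for rule in rules:
        if rule_matches(rule, package, update_type) and "automerge" in rule:
            decision = rule["automerge"]
    return decision


for name in ("lodash", "jsonwebtoken", "pyjwt", "eslint-plugin-react"):
    print(f"{name:<22} patch -> automerge={would_automerge(name, 'patch')}")
```

Running a handful of representative package names through a model like this before rolling the configuration out can surface gaps between the exclusion list in the first rule and the security-critical list in the second.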
Step 2 — Build an EPSS-weighted PR triage workflow
#!/usr/bin/env python3
# scripts/triage-cve-prs.py
# Fetches open CVE-related PRs and enriches them with EPSS scores.
# Outputs a prioritized triage list for the security team.
import json
import re
import subprocess
import sys
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class CVEPullRequest:
    pr_number: int
    title: str
    repo: str
    cve_ids: list[str]
    created_at: str
    age_days: float
    epss_scores: list[float]
    in_kev: bool
    priority: str = "unknown"


def get_open_cve_prs(repo: str) -> list[dict]:
    """Fetch open PRs labelled "vulnerability" via the GitHub CLI."""
    result = subprocess.run(
        ["gh", "pr", "list", "--repo", repo,
         "--label", "vulnerability",
         "--limit", "500",  # gh returns only 30 PRs by default
         "--json", "number,title,createdAt,labels,body"],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        print(f"warning: gh pr list failed for {repo}: {result.stderr.strip()}",
              file=sys.stderr)
        return []
    return json.loads(result.stdout)


def extract_cves_from_pr(pr: dict) -> list[str]:
    """Extract unique CVE IDs from PR title and body, preserving order."""
    text = f"{pr.get('title') or ''} {pr.get('body') or ''}"
    return list(dict.fromkeys(re.findall(r'CVE-\d{4}-\d{4,7}', text)))


def fetch_epss_batch(cve_ids: list[str]) -> dict[str, float]:
    """Look up EPSS scores, querying the FIRST API in chunks of 100 CVE IDs."""
    scores: dict[str, float] = {}
    for start in range(0, len(cve_ids), 100):
        cve_param = ",".join(cve_ids[start:start + 100])
        url = f"https://api.first.org/data/v1/epss?cve={cve_param}"
        try:
            with urllib.request.urlopen(url, timeout=15) as resp:
                data = json.loads(resp.read())
        except Exception as exc:
            print(f"warning: EPSS lookup failed: {exc}", file=sys.stderr)
            continue
        for item in data.get("data", []):
            scores[item["cve"]] = float(item.get("epss", 0))
    return scores


def fetch_kev() -> set[str]:
    """Download the CISA KEV catalog and return the set of listed CVE IDs."""
    url = "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
    try:
        with urllib.request.urlopen(url, timeout=15) as resp:
            data = json.loads(resp.read())
        return {v["cveID"] for v in data.get("vulnerabilities", [])}
    except Exception as exc:
        print(f"warning: KEV download failed: {exc}", file=sys.stderr)
        return set()


def calculate_priority(pr: CVEPullRequest) -> str:
    # KEV membership always wins, regardless of EPSS
    if pr.in_kev:
        return "P0-MERGE-NOW"
    max_epss = max(pr.epss_scores) if pr.epss_scores else 0
    if max_epss >= 0.3:
        return "P1-THIS-SPRINT"
    if max_epss >= 0.05:
        return "P2-NEXT-SPRINT"
    if pr.age_days > 30:
        return "P2-OVERDUE"
    return "P3-BACKLOG"


def triage_repos(repos: list[str]) -> list[CVEPullRequest]:
    kev = fetch_kev()
    all_cve_ids = []
    raw_prs = []
    for repo in repos:
        for pr in get_open_cve_prs(repo):
            cves = extract_cves_from_pr(pr)
            all_cve_ids.extend(cves)
            raw_prs.append((repo, pr, cves))
    # One de-duplicated EPSS lookup for all repositories
    epss_scores = fetch_epss_batch(sorted(set(all_cve_ids)))
    now = datetime.now(timezone.utc)
    result = []
    for repo, pr, cves in raw_prs:
        pr_scores = [epss_scores.get(c, 0.0) for c in cves]
        pr_in_kev = any(c in kev for c in cves)
        created = datetime.fromisoformat(pr["createdAt"].replace("Z", "+00:00"))
        age_days = (now - created).total_seconds() / 86400
        cpr = CVEPullRequest(
            pr_number=pr["number"],
            title=pr["title"],
            repo=repo,
            cve_ids=cves,
            created_at=pr["createdAt"],
            age_days=age_days,
            epss_scores=pr_scores,
            in_kev=pr_in_kev
        )
        cpr.priority = calculate_priority(cpr)
        result.append(cpr)
    # Keys must match the strings returned by calculate_priority
    priority_order = {"P0-MERGE-NOW": 0, "P1-THIS-SPRINT": 1, "P2-NEXT-SPRINT": 2,
                      "P2-OVERDUE": 2, "P3-BACKLOG": 3}
    result.sort(key=lambda x: (priority_order.get(x.priority, 99), -max(x.epss_scores or [0])))
    return result


if __name__ == "__main__":
    REPOS = ["myorg/app", "myorg/platform", "myorg/infra"]
    prs = triage_repos(REPOS)
    print(f"{'Priority':<18} {'PR':>6} {'Age':>5} {'EPSS':>6} {'KEV':>5} {'Repo':<20} Title")
    print("-" * 100)
    for pr in prs:
        max_epss = max(pr.epss_scores) if pr.epss_scores else 0.0
        kev = "YES" if pr.in_kev else "no"
        print(f"{pr.priority:<18} #{pr.pr_number:>5} {pr.age_days:>4.0f}d {max_epss:>6.4f} "
              f"{kev:>5} {pr.repo:<20} {pr.title[:50]}")
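The triage script only prints its result, while the SLA check in Step 4 keys off PR labels. One way to connect the two is to write the computed priority back onto each PR. The sketch below assumes the label names `kev` and `priority` (the second matches the Renovate configuration in Step 1; `kev` is an assumption, and both labels must already exist in the repository). It uses `gh pr edit --add-label`; `labels_for`, `build_label_command`, and `label_pr` are hypothetical helpers.

```python
import subprocess


def labels_for(priority: str) -> list[str]:
    """Map a triage priority string to the labels the SLA check looks for."""
    if priority.startswith("P0"):
        return ["kev", "priority"]
    if priority.startswith("P1"):
        return ["priority"]
    return []  # P2/P3 keep whatever labels Renovate applied


def build_label_command(repo: str, pr_number: int, priority: str) -> list[str]:
    """Return the gh argv that adds the labels, or [] when nothing is needed."""
    labels = labels_for(priority)
    if not labels:
        return []
    return ["gh", "pr", "edit", str(pr_number), "--repo", repo,
            "--add-label", ",".join(labels)]


def label_pr(repo: str, pr_number: int, priority: str, dry_run: bool = True) -> bool:
    """Apply the labels; with dry_run=True only print the command."""
    cmd = build_label_command(repo, pr_number, priority)
    if not cmd:
        return False
    if dry_run:
        print(" ".join(cmd))
        return True
    return subprocess.run(cmd, capture_output=True, text=True).returncode == 0


label_pr("myorg/app", 42, "P0-MERGE-NOW")  # dry run: prints the gh command
```

Keeping the command construction separate from its execution makes the mapping easy to test without touching the GitHub API.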
Step 3 — Suppress false positives systematically
# .trivyignore — suppress known false positives
# Each suppression requires a justification and expiry date
# Format: CVE-ID exp:YYYY-MM-DD # justification
CVE-2023-XXXX exp:2027-01-01 # Only affects Windows builds; our CI is Linux-only
CVE-2024-YYYY exp:2026-12-01 # In test-only dependency; not present in production image
# For Grype:
# .grype.yaml
ignore:
  - vulnerability: CVE-2023-XXXX
    reason: "Windows-only vulnerability; our deployments are Linux"
    expires: "2027-01-01"
    fix-state: wont-fix
  - vulnerability: CVE-2024-YYYY
    package:
      name: pytest
      type: python
    reason: "Test-only dependency; not included in production Docker image"
    expires: "2026-12-01"
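#!/bin/bash
# scripts/audit-suppressions.sh
# Review the Trivy suppression file and fail when any entry has expired.
# Entries in .grype.yaml are not parsed by this script.
TRIVY_IGNORE=".trivyignore"
TODAY=$(date +%Y-%m-%d)
expired=0
total=0

echo "=== Checking suppression expiry ==="
if [[ -f "$TRIVY_IGNORE" ]]; then
  while IFS= read -r line; do
    # Only CVE entries carry an exp: date; skip comments and blank lines
    [[ "$line" =~ ^CVE- ]] || continue
    if [[ "$line" =~ exp:([0-9]{4}-[0-9]{2}-[0-9]{2}) ]]; then
      expiry="${BASH_REMATCH[1]}"
      cve="${line%%[[:space:]]*}"
      # ISO 8601 dates sort lexicographically, so a string comparison is enough
      if [[ "$expiry" < "$TODAY" ]]; then
        echo "EXPIRED: $cve (expiry: $expiry): review and remove or renew"
        expired=$((expired + 1))
      fi
    fi
  done < "$TRIVY_IGNORE"
  # grep -c prints 0 itself when nothing matches; "|| true" only masks its exit status
  total=$(grep -c '^CVE-' "$TRIVY_IGNORE" || true)
fi

echo ""
echo "Total suppressions: $total"
# Exit non-zero so a CI job running this script fails on expired entries
if (( expired > 0 )); then
  exit 1
fi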
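The header comment in `.trivyignore` says every suppression needs a justification and an expiry date, but nothing shown so far enforces that. A small lint pass could, as sketched here. The expected line format (`CVE-ID exp:YYYY-MM-DD # justification`) follows the example entries above; `lint_suppressions` is a hypothetical helper and not part of Trivy.

```python
import re
from datetime import date

ENTRY = re.compile(r"^(?P<id>CVE-\S+)(?P<rest>.*)$")
EXPIRY = re.compile(r"\bexp:(\d{4}-\d{2}-\d{2})\b")


def lint_suppressions(text: str, today: date) -> list[str]:
    """Return one message per suppression that lacks an expiry or justification."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        match = ENTRY.match(raw.strip())
        if not match:
            continue  # blank lines and comment-only lines
        cve = match.group("id")
        # Everything before "#" is the entry; everything after is the justification
        entry, _, comment = match.group("rest").partition("#")
        expiry = EXPIRY.search(entry)
        if not expiry:
            problems.append(f"line {lineno}: {cve} has no exp: date")
        elif date.fromisoformat(expiry.group(1)) < today:
            problems.append(f"line {lineno}: {cve} expired on {expiry.group(1)}")
        if not comment.strip():
            problems.append(f"line {lineno}: {cve} has no justification")
    return problems


sample = """# suppressions
CVE-2023-0001 exp:2020-01-01 # Windows-only
CVE-2023-0002 # test-only dependency
CVE-2023-0003 exp:2999-01-01
"""
for problem in lint_suppressions(sample, date.today()):
    print(problem)
```

Passing `today` in as a parameter keeps the function deterministic, which makes it straightforward to run as a CI check alongside the shell audit script.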
Step 4 — SLA-driven escalation workflow
# .github/workflows/cve-sla-escalation.yml
# Checks CVE PRs against SLA and escalates overdue ones
name: CVE SLA Escalation
on:
  schedule:
    - cron: "0 9 * * 1-5"  # Weekdays at 09:00 UTC
permissions:
  contents: read
  pull-requests: read
jobs:
  check-sla:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check CVE PR SLAs
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          python3 - << 'PYEOF'
          import json, subprocess, sys
          from datetime import datetime, timezone

          # SLA thresholds in hours
          SLA = {"P0-KEV": 24, "P1": 168, "P2": 720}

          result = subprocess.run(
              ["gh", "pr", "list", "--label", "vulnerability",
               "--limit", "500",  # gh returns only 30 PRs by default
               "--json", "number,title,createdAt,labels"],
              capture_output=True, text=True
          )
          if result.returncode != 0:
              sys.exit(f"gh pr list failed: {result.stderr.strip()}")
          prs = json.loads(result.stdout)
          now = datetime.now(timezone.utc)
          breaches = []
          for pr in prs:
              created = datetime.fromisoformat(pr["createdAt"].replace("Z", "+00:00"))
              age_hours = (now - created).total_seconds() / 3600
              labels = " ".join(l["name"] for l in pr.get("labels", [])).lower()
              # Assumes KEV-listed PRs carry a label containing "kev"
              if "kev" in labels:
                  sla_hours = SLA["P0-KEV"]
              elif "priority" in labels:
                  sla_hours = SLA["P1"]
              else:
                  sla_hours = SLA["P2"]
              if age_hours > sla_hours:
                  breaches.append({
                      "pr": pr["number"],
                      "title": pr["title"],
                      "age_hours": age_hours,
                      "sla_hours": sla_hours
                  })
          if breaches:
              print(f"SLA BREACH: {len(breaches)} CVE PRs are overdue")
              for b in breaches:
                  print(f"  PR #{b['pr']}: {b['age_hours']:.0f}h / {b['sla_hours']}h SLA")
                  print(f"    {b['title']}")
              sys.exit(1)  # a failed run is the escalation signal
          else:
              print("All CVE PRs within SLA")
          PYEOF
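The workflow reports a PR only after it has breached its SLA. Computing the deadline itself makes it possible to warn ahead of time as well. The following is a minimal sketch that reuses the SLA thresholds from the workflow; `sla_deadline` and `hours_remaining` are hypothetical helpers, and the tier names are the keys of the workflow's `SLA` dictionary.

```python
from datetime import datetime, timedelta, timezone

# Same thresholds (in hours) as the SLA dictionary in the workflow
SLA_HOURS = {"P0-KEV": 24, "P1": 168, "P2": 720}


def sla_deadline(created_at: str, tier: str) -> datetime:
    """Return the moment the PR breaches its SLA, given GitHub's createdAt value."""
    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return created + timedelta(hours=SLA_HOURS[tier])


def hours_remaining(created_at: str, tier: str, now: datetime) -> float:
    """Positive while within SLA, negative once the deadline has passed."""
    return (sla_deadline(created_at, tier) - now).total_seconds() / 3600


now = datetime.now(timezone.utc)
left = hours_remaining("2025-01-01T00:00:00Z", "P0-KEV", now)
print("breached" if left < 0 else f"{left:.1f}h remaining")
```

A check built on `hours_remaining` could, for example, post a reminder when less than a quarter of the SLA window is left rather than waiting for the breach.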
Step 5 — Metrics for the CVE remediation pipeline
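#!/bin/bash
# /etc/node_exporter/textfile_collector/cve_pipeline.sh
# Expose CVE remediation pipeline metrics for Prometheus.
# node_exporter's textfile collector only reads *.prom files, so run this
# script on a schedule and redirect its output to a .prom file in the
# collector directory.
OPEN_CVE_PRS=$(gh pr list --label vulnerability --state open --limit 500 --json number | jq length 2>/dev/null)
OPEN_CVE_PRS=${OPEN_CVE_PRS:-0}
# grep -c prints 0 itself when nothing matches; "|| true" only masks its exit
# status (the original "|| echo 0" would emit a second 0 and corrupt the metric)
OVERDUE_PRS=$(python3 scripts/triage-cve-prs.py 2>/dev/null | grep -c "P0\|OVERDUE" || true)
SUPPRESSIONS=$(grep -c "^CVE-" .trivyignore 2>/dev/null || true)
SUPPRESSIONS=${SUPPRESSIONS:-0}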
cat << EOF
# HELP cve_open_prs_total Open CVE remediation pull requests
# TYPE cve_open_prs_total gauge
cve_open_prs_total $OPEN_CVE_PRS
# HELP cve_overdue_prs_total CVE PRs exceeding SLA
# TYPE cve_overdue_prs_total gauge
cve_overdue_prs_total $OVERDUE_PRS
# HELP cve_active_suppressions_total Active vulnerability scan suppressions
# TYPE cve_active_suppressions_total gauge
cve_active_suppressions_total $SUPPRESSIONS
EOF
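node_exporter's textfile collector reads `*.prom` files from its configured directory, so the metrics have to land in a file rather than on stdout. Below is a minimal sketch of rendering the three gauges and writing them atomically (write to a temporary file, then rename) so the collector never reads a half-written file. The output path in the usage line is an assumption, and `render_metrics` and `write_prom` are hypothetical names.

```python
import os
import tempfile


def render_metrics(open_prs: int, overdue_prs: int, suppressions: int) -> str:
    """Render the three pipeline gauges in Prometheus text exposition format."""
    metrics = [
        ("cve_open_prs_total", "Open CVE remediation pull requests", open_prs),
        ("cve_overdue_prs_total", "CVE PRs exceeding SLA", overdue_prs),
        ("cve_active_suppressions_total",
         "Active vulnerability scan suppressions", suppressions),
    ]
    lines = []
    for name, help_text, value in metrics:
        lines += [f"# HELP {name} {help_text}",
                  f"# TYPE {name} gauge",
                  f"{name} {value}"]
    return "\n".join(lines) + "\n"


def write_prom(path: str, content: str) -> None:
    """Write content to path atomically via a temp file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(content)
        os.chmod(tmp, 0o644)  # mkstemp creates the file with mode 0600
        os.replace(tmp, path)  # atomic rename on the same filesystem
    except Exception:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


if __name__ == "__main__":
    # Assumed output location; adjust to the collector's --collector.textfile.directory
    write_prom("cve_pipeline.prom", render_metrics(12, 3, 5))
```

The temporary file is created in the target directory on purpose: a rename is only atomic when source and destination are on the same filesystem.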
Expected Behaviour
| Scenario | Unmanaged pipeline | Managed pipeline |
|---|---|---|
| 200 CVE PRs open simultaneously | Team overwhelmed; nothing merged | Priority queue; P0-KEV merged in < 24h; P3 auto-triaged |
| Patch-level update for low-risk library | Manual review required | Auto-merged after CI passes |
| KEV CVE appears in dependency | Discovered manually; may take days | Triage script flags P0; SLA check escalates at its next scheduled run |
| EPSS score rises on open PR | Not detected | Daily re-triage re-prioritizes rising-EPSS PRs |
| Suppression expires | No detection | Audit script flags expired suppressions in CI |
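The rising-EPSS row depends on comparing the current scores with those from the previous triage run. A minimal sketch of that comparison follows, assuming scores are kept between runs as a `{cve_id: epss}` mapping. It flags a CVE when its score crosses one of the cut-offs used by `calculate_priority` (0.05 and 0.3); `band` and `rising_epss` are hypothetical names.

```python
# Priority cut-offs taken from calculate_priority: P2 at 0.05, P1 at 0.3
THRESHOLDS = (0.05, 0.3)


def band(score: float) -> int:
    """Number of thresholds the score has reached (0, 1, or 2)."""
    return sum(score >= t for t in THRESHOLDS)


def rising_epss(previous: dict, current: dict) -> list:
    """Return (cve, old_score, new_score) for CVEs that moved into a higher band."""
    risen = []
    for cve, score in current.items():
        old = previous.get(cve, 0.0)  # a CVE not seen before counts as 0.0
        if band(score) > band(old):
            risen.append((cve, old, score))
    # Highest new score first
    return sorted(risen, key=lambda item: item[2], reverse=True)


yesterday = {"CVE-2024-0001": 0.01, "CVE-2024-0002": 0.10}
today = {"CVE-2024-0001": 0.06, "CVE-2024-0002": 0.12, "CVE-2024-0003": 0.35}
for cve, old, new in rising_epss(yesterday, today):
    print(f"{cve}: {old:.2f} -> {new:.2f}")
```

Comparing bands rather than raw deltas avoids alerting on small day-to-day fluctuations that would not change a PR's priority.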
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Auto-merge for patch updates | Eliminates review burden for low-risk changes | Semantic versioning violations (breaking change in patch release) | Require full CI pass before auto-merge; add integration test suite |
| EPSS-based prioritization | Focus on exploitable CVEs | EPSS may underweight novel zero-days with no historical data | Always preserve P0-KEV override; treat unknown EPSS as medium priority |
| False positive suppression | Reduces noise; focuses reviews | Suppressions may be wrong; could mask real risk | Require justification and expiry on every suppression; run audit script in CI |
| Central triage script | Consistent prioritization across repos | Requires maintenance as EPSS/KEV APIs evolve | Version the script; test API responses; cache aggressively |
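Treating unknown EPSS as medium priority means telling "no score available" apart from a score of 0. One way to express that is a variant of `calculate_priority` in which a missing score is `None`. The function name `priority_with_unknown` and the choice of `P2-NEXT-SPRINT` for the unknown case are assumptions based on the mitigation column above.

```python
from typing import Optional


def priority_with_unknown(in_kev: bool, epss: Optional[float], age_days: float) -> str:
    """Like calculate_priority, but a missing EPSS score maps to medium priority."""
    if in_kev:
        return "P0-MERGE-NOW"  # KEV override always wins
    if epss is None:
        return "P2-NEXT-SPRINT"  # unknown score: do not let it sink to backlog
    if epss >= 0.3:
        return "P1-THIS-SPRINT"
    if epss >= 0.05:
        return "P2-NEXT-SPRINT"
    if age_days > 30:
        return "P2-OVERDUE"
    return "P3-BACKLOG"


print(priority_with_unknown(False, None, 2))   # unknown score
print(priority_with_unknown(False, 0.0, 2))    # known low score
```

With this variant, an EPSS lookup that returns nothing for a CVE would pass `None` instead of defaulting the score to `0.0`.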
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Auto-merge introduces breaking change | Application fails to start after merge | CI catches it before merge (if tests are comprehensive) | Revert the auto-merged commit; add regression test for broken interface |
| Triage script EPSS API times out | All non-KEV PRs classified as P3-BACKLOG (or P2-OVERDUE once older than 30 days) | Alert when a triage run contains no P1 or P2-NEXT-SPRINT results (abnormal distribution) | Cache last-known EPSS scores; use cached scores when API is unavailable |
| KEV check not running for weekend CVE | KEV-listed CVE not escalated until Monday | Workflow run history shows no SLA checks between Friday 09:00 and Monday 09:00 UTC | Run SLA check 3×/day including weekends for KEV checks specifically |
| Suppression file grows without audit | Large list of suppressions; some masking real risk | Suppression count metric exceeds threshold | Require security team review for suppression files > 20 entries |
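The recovery for an EPSS outage is to fall back to cached scores. The sketch below shows one possible wrapper, assuming a local JSON cache file (the default path is hypothetical) and taking the fetch function as a parameter so that it can wrap `fetch_epss_batch` from Step 2. An empty result from the fetcher is treated as a failure, since that is what the Step 2 function returns when the API call fails.

```python
import json
import os
from typing import Callable


def epss_with_cache(cve_ids: list, fetch: Callable[[list], dict],
                    cache_path: str = "epss-cache.json") -> dict:
    """Return EPSS scores, falling back to the last cached values on failure."""
    cached = {}
    if os.path.exists(cache_path):
        with open(cache_path) as fh:
            cached = json.load(fh)
    fresh = fetch(cve_ids) if cve_ids else {}
    if fresh:
        # Successful lookup: refresh the cache with the new scores
        cached.update(fresh)
        with open(cache_path, "w") as fh:
            json.dump(cached, fh)
    # On failure (empty result) this serves whatever the cache already holds
    return {cve: cached[cve] for cve in cve_ids if cve in cached}


if __name__ == "__main__":
    # Stand-in fetcher for illustration; the real one would be fetch_epss_batch
    scores = epss_with_cache(["CVE-2024-0001"], lambda ids: {"CVE-2024-0001": 0.42},
                             cache_path="epss-cache-example.json")
    print(scores)
```

CVEs that have never been scored stay absent from the result, so the caller can still distinguish "unknown" from a cached value.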
Related Articles
- Renovate Dependabot Security Configuration — configuring the dependency update bots that generate CVE PRs
- EPSS-Driven Patch Prioritization — the EPSS scoring methodology used in the triage pipeline
- CISA KEV Alerting Integration — real-time KEV updates that override the normal triage priority
- Container Vulnerability Scanning CI — the scanning tools whose output feeds the remediation pipeline
- Vulnerability Management Program — the broader programme within which the CI pipeline operates