Managing CVE Remediation Pipelines at Scale

Problem

The volume of CVEs affecting software dependencies has increased sharply. A moderately complex application with 200 direct and transitive dependencies may see 50–100 CVE-related dependency update PRs per month from Renovate or Dependabot. A platform team managing 30 such repositories may therefore see 1,500–3,000 dependency PRs per month.

The conventional response — have a human review and merge each dependency update PR — does not scale at these volumes. The consequences of volume-induced overload are predictable:

Alert fatigue and queue abandonment. The team falls behind on dependency PRs. A large backlog accumulates. Team members stop reviewing the queue because it feels futile. PRs that contain critical CVE patches are lost in the noise alongside cosmetic version bumps.

Inconsistent SLA compliance. Without automation, some critical CVEs are patched quickly and others linger for months depending on which engineer happened to notice them. Audit trails show inconsistent patch times.

False positive noise. Vulnerability scanners flag CVEs in test dependencies, build-time-only packages, and packages where the vulnerable code path is unreachable. These false positives generate PRs that consume review time and erode trust in the scanner output.

The automation gap. Most teams have Renovate or Dependabot configured to open PRs automatically. The missing piece is the triage and merge automation: which PRs should be auto-merged, which need review, which can be deferred, and which need immediate escalation.

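This article builds a CVE remediation pipeline that handles volume through automation: auto-merge for low-risk patches, prioritization weighted by EPSS (the Exploit Prediction Scoring System, FIRST's estimate of the probability that a CVE will be exploited within 30 days), an override for CVEs listed in the CISA Known Exploited Vulnerabilities (KEV) catalog, false positive suppression, and SLA-driven escalation for high-risk findings.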
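The four triage outcomes named above (auto-merge, review, defer, escalate) can be sketched as a single routing function. This is an illustration only: the `Finding` fields and the `route_pr` name are hypothetical, and the 0.3 EPSS cut-off is borrowed from the triage script in Step 2 rather than taken from Renovate or Dependabot.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical summary of one dependency update PR."""
    update_type: str   # "patch", "minor" or "major"
    has_cve: bool      # the PR fixes at least one CVE
    in_kev: bool       # a fixed CVE is on the CISA KEV list
    max_epss: float    # highest EPSS score among the PR's CVEs (0.0 if none)
    dev_only: bool     # the dependency is not shipped to production


def route_pr(f: Finding) -> str:
    """Return one of: escalate, review, defer, auto-merge."""
    if f.in_kev:
        return "escalate"            # known exploited: page someone
    if f.has_cve and f.max_epss >= 0.3:
        return "review"              # likely to be exploited: human review now
    if f.has_cve and f.dev_only:
        return "defer"               # unreachable in production: can wait
    if f.has_cve:
        return "review"
    if f.update_type == "patch":
        return "auto-merge"          # no CVE, patch bump: let CI decide
    return "review"


print(route_pr(Finding("patch", False, False, 0.0, False)))  # auto-merge
```

The order of the checks is the design choice that matters here: KEV membership is tested first so that no later rule can downgrade a known-exploited CVE.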

Target systems: any repository using Renovate or Dependabot for dependency updates; platform teams managing multiple repositories; security teams responsible for CVE remediation SLAs.


Threat Model

Risk 1 — CVE in KEV list merged late due to backlog. A KEV-listed CVE affects a production dependency. It generates an update PR that sits in the 200-PR backlog for three weeks. The patch SLA requires 24-hour remediation. The queue delay causes a compliance breach and potential exploitation window.

Risk 2 — Auto-merge introduces breaking change. An automated policy auto-merges a “patch” version bump (e.g., 1.2.3 → 1.2.4). The maintainer incorrectly labelled a breaking API change as a patch release. The application breaks in production.

Risk 3 — False positive drives unnecessary churn. A CVE in a development dependency (e.g., a testing tool) generates urgent-looking PRs. The team spends time reviewing and merging updates for CVEs that are not reachable in production, creating churn and eroding confidence in the scanner.
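Risk 2 exists because an update tool classifies a bump purely from the version numbers. The sketch below makes that explicit; `update_type` is a hypothetical helper that handles only plain MAJOR.MINOR.PATCH strings (no pre-release tags or ranges), and nothing in its input says whether the API actually changed.

```python
def update_type(old: str, new: str) -> str:
    """Classify a bump between two plain MAJOR.MINOR.PATCH versions."""
    o = [int(part) for part in old.split(".")]
    n = [int(part) for part in new.split(".")]
    if len(o) != 3 or len(n) != 3:
        raise ValueError("expected MAJOR.MINOR.PATCH")
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    if n[2] != o[2]:
        return "patch"
    return "none"


print(update_type("1.2.3", "1.2.4"))  # patch
```

Because a mislabelled release is indistinguishable from a genuine patch at this level, the only safeguard against Risk 2 is the test suite that runs before the merge.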


Configuration / Implementation

Step 1 — Configure Renovate with EPSS-aware grouping

{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:recommended"],

  "vulnerabilityAlerts": {
    "enabled": true,
    "groupName": null,
    "automerge": false,
    "labels": ["security", "vulnerability", "priority"],
    "prPriority": 100,
    "reviewers": ["team:security-team"],
    "schedule": ["at any time"],
    "prTitle": "SECURITY: {{depName}} {{newVersion}} - CVE fix"
  },

  "packageRules": [
    {
      "description": "Auto-merge patch updates for low-risk packages (no CVE). Renovate only auto-merges once all status checks pass.",
      "matchUpdateTypes": ["patch"],
      "matchPackagePatterns": [".*"],
      "excludePackagePatterns": [
        "openssl", "libssl", "cryptography", "jwt", "auth",
        "postgres", "mysql", "sqlite", "redis"
      ],
      "automerge": true,
      "automergeType": "pr",
      "automergeStrategy": "squash",
      "prPriority": 0
    },

    {
      "description": "Never auto-merge security-critical packages: always requires review",
      "matchPackagePatterns": [
        "openssl", "libssl", "cryptography", "bcrypt", "argon2",
        "jsonwebtoken", "passport", "django", "flask", "rails",
        "spring-security", "shiro", "bouncy-castle"
      ],
      "automerge": false,
      "labels": ["security", "requires-review"],
      "prPriority": 10,
      "reviewers": ["team:security-team"]
    },

    {
      "description": "Group non-security patch updates to reduce PR noise",
      "matchUpdateTypes": ["patch"],
      "matchPackagePatterns": ["^@types/", "^eslint", "^prettier", "^jest", "^@testing-library"],
      "groupName": "dev tooling patch updates",
      "automerge": true,
      "automergeType": "pr"
    }
  ],

  "prHourlyLimit": 5,
  "prConcurrentLimit": 15,

  "minimumReleaseAge": "2 days"
}
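Renovate applies every matching entry in packageRules in order, so a later rule overrides an earlier one. The sketch below approximates that resolution for the `automerge` field only. `effective_automerge` is a hypothetical helper that understands just the three matchers used in this config; it is not Renovate's own matching code, and the `RULES` list is an abbreviated copy of the first two rules.

```python
import re


def effective_automerge(rules: list[dict], name: str, update_type: str) -> bool:
    """Apply matching rules in order; the last one that sets automerge wins."""
    automerge = False  # Renovate's default when no rule enables it
    for rule in rules:
        types = rule.get("matchUpdateTypes")
        if types and update_type not in types:
            continue
        include = rule.get("matchPackagePatterns")
        if include and not any(re.search(p, name) for p in include):
            continue
        exclude = rule.get("excludePackagePatterns", [])
        if any(re.search(p, name) for p in exclude):
            continue
        if "automerge" in rule:
            automerge = rule["automerge"]
    return automerge


RULES = [
    {"matchUpdateTypes": ["patch"], "matchPackagePatterns": [".*"],
     "excludePackagePatterns": ["openssl", "jwt", "auth"], "automerge": True},
    {"matchPackagePatterns": ["openssl", "jsonwebtoken", "django"],
     "automerge": False},
]

print(effective_automerge(RULES, "lodash", "patch"))        # True
print(effective_automerge(RULES, "jsonwebtoken", "patch"))  # False
print(effective_automerge(RULES, "lodash", "minor"))        # False
```

Note that the `jwt` exclusion in the first rule does not match `jsonwebtoken`; it is the second rule that keeps that package out of auto-merge. Keeping the explicit deny rule after the broad allow rule is what makes the config safe against gaps in the exclusion list.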

Step 2 — Build an EPSS-weighted PR triage workflow

#!/usr/bin/env python3
# scripts/triage-cve-prs.py
# Fetches open CVE-related PRs and enriches them with EPSS scores
# Outputs a prioritized triage list for the security team

import json
import subprocess
import urllib.request
from dataclasses import dataclass
from typing import Optional

@dataclass
class CVEPullRequest:
    pr_number: int
    title: str
    repo: str
    cve_ids: list[str]
    created_at: str
    age_days: float
    epss_scores: list[float]
    in_kev: bool
    priority: str = "unknown"

def get_open_cve_prs(repo: str) -> list[dict]:
    """Fetch open PRs carrying the "vulnerability" label via GitHub CLI."""
    result = subprocess.run(
        ["gh", "pr", "list", "--repo", repo,
         "--label", "vulnerability",
         "--limit", "1000",  # gh lists only 30 PRs unless a limit is given
         "--json", "number,title,createdAt,labels,body"],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        # A failed call yields no PRs for this repo; check gh auth if a repo is always empty
        return []
    return json.loads(result.stdout)

def extract_cves_from_pr(pr: dict) -> list[str]:
    """Extract CVE IDs from PR title and body."""
    import re
    text = f"{pr.get('title', '')} {pr.get('body', '')}"
    return re.findall(r'CVE-\d{4}-\d{4,7}', text)

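def fetch_epss_batch(cve_ids: list[str]) -> dict[str, float]:
    """Look up EPSS scores from the FIRST API, 100 CVE IDs per request."""
    scores: dict[str, float] = {}
    # Query in chunks so CVEs beyond the first 100 are not silently dropped
    for start in range(0, len(cve_ids), 100):
        cve_param = ",".join(cve_ids[start:start + 100])
        url = f"https://api.first.org/data/v1/epss?cve={cve_param}"
        try:
            with urllib.request.urlopen(url, timeout=15) as resp:
                data = json.loads(resp.read())
        except Exception:
            continue  # CVEs in a failed chunk stay unscored (treated as 0.0 by the caller)
        for item in data.get("data", []):
            scores[item["cve"]] = float(item.get("epss", 0))
    return scores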

def fetch_kev() -> set[str]:
    url = "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
    try:
        with urllib.request.urlopen(url, timeout=15) as resp:
            data = json.loads(resp.read())
            return {v["cveID"] for v in data.get("vulnerabilities", [])}
    except Exception:
        return set()

def calculate_priority(pr: CVEPullRequest) -> str:
    if pr.in_kev:
        return "P0-MERGE-NOW"
    max_epss = max(pr.epss_scores) if pr.epss_scores else 0
    if max_epss >= 0.3:
        return "P1-THIS-SPRINT"
    if max_epss >= 0.05:
        return "P2-NEXT-SPRINT"
    if pr.age_days > 30:
        return "P2-OVERDUE"
    return "P3-BACKLOG"

def triage_repos(repos: list[str]) -> list[CVEPullRequest]:
    from datetime import datetime, timezone
    
    kev = fetch_kev()
    all_cve_ids = []
    raw_prs = []
    
    for repo in repos:
        for pr in get_open_cve_prs(repo):
            cves = extract_cves_from_pr(pr)
            all_cve_ids.extend(cves)
            raw_prs.append((repo, pr, cves))
    
    epss_scores = fetch_epss_batch(list(set(all_cve_ids)))
    now = datetime.now(timezone.utc)
    
    result = []
    for repo, pr, cves in raw_prs:
        pr_scores = [epss_scores.get(c, 0.0) for c in cves]
        pr_in_kev = any(c in kev for c in cves)
        created = datetime.fromisoformat(pr["createdAt"].replace("Z", "+00:00"))
        age_days = (now - created).total_seconds() / 86400
        
        cpr = CVEPullRequest(
            pr_number=pr["number"],
            title=pr["title"],
            repo=repo,
            cve_ids=cves,
            created_at=pr["createdAt"],
            age_days=age_days,
            epss_scores=pr_scores,
            in_kev=pr_in_kev
        )
        cpr.priority = calculate_priority(cpr)
        result.append(cpr)
    
    # Keys must match the strings returned by calculate_priority()
    priority_order = {"P0-MERGE-NOW": 0, "P1-THIS-SPRINT": 1, "P2-NEXT-SPRINT": 2,
                      "P2-OVERDUE": 2, "P3-BACKLOG": 3}
    result.sort(key=lambda x: (priority_order.get(x.priority, 99), -max(x.epss_scores or [0])))
    return result

if __name__ == "__main__":
    REPOS = ["myorg/app", "myorg/platform", "myorg/infra"]
    prs = triage_repos(REPOS)
    
    print(f"{'Priority':<18} {'PR':>6} {'Age':>5} {'EPSS':>6} {'KEV':>5} {'Repo':<20} Title")
    print("-" * 100)
    for pr in prs:
        max_epss = max(pr.epss_scores) if pr.epss_scores else 0.0
        kev = "YES" if pr.in_kev else "no"
        print(f"{pr.priority:<18} #{pr.pr_number:>5} {pr.age_days:>4.0f}d {max_epss:>6.4f} "
              f"{kev:>5} {pr.repo:<20} {pr.title[:50]}")
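When the EPSS request fails, `fetch_epss_batch` returns no scores and every CVE is treated as 0.0. One way to soften that, sketched here under assumptions, is to keep the last successful scores on disk and fall back to them. `epss_with_cache`, its `fetch` parameter, and the cache file path are hypothetical names, and the CVE ID in the usage lines is a placeholder.

```python
import json
from pathlib import Path
from typing import Callable


def epss_with_cache(cve_ids: list[str],
                    fetch: Callable[[list[str]], dict[str, float]],
                    cache_path: Path) -> dict[str, float]:
    """Merge fresh EPSS scores over the last-known scores stored on disk."""
    cached: dict[str, float] = {}
    if cache_path.exists():
        cached = json.loads(cache_path.read_text())
    fresh = fetch(cve_ids)
    merged = {**cached, **fresh}  # fresh scores win over cached ones
    if fresh:
        # Only rewrite the cache after a fetch that returned something
        cache_path.write_text(json.dumps(merged))
    return {cve: merged[cve] for cve in cve_ids if cve in merged}
```

In the triage script this would wrap the existing call, for example `epss_with_cache(ids, fetch_epss_batch, Path("epss-cache.json"))`. Cached scores can be stale, so a run that relies on them is still worth flagging in the output.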

Step 3 — Suppress false positives systematically

# .trivyignore — suppress known false positives
# Each suppression requires a justification and expiry date

# Format: CVE-ID exp:YYYY-MM-DD # justification
CVE-2023-XXXX exp:2027-01-01 # Only affects Windows builds; our CI is Linux-only
CVE-2024-YYYY exp:2026-12-01 # In test-only dependency; not present in production image

# For Grype:
# .grype.yaml
ignore:
  - vulnerability: CVE-2023-XXXX
    reason: "Windows-only vulnerability; our deployments are Linux"
    expires: "2027-01-01"
    fix-state: wont-fix

  - vulnerability: CVE-2024-YYYY
    package:
      name: pytest
      ecosystem: python
    reason: "Test-only dependency; not included in production Docker image"
    expires: "2026-12-01"

#!/bin/bash
# scripts/audit-suppressions.sh
# Review suppression files and alert on expired entries

TRIVY_IGNORE=".trivyignore"
GRYPE_IGNORE=".grype.yaml"
TODAY=$(date +%Y-%m-%d)

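echo "=== Checking suppression expiry ==="

expired=0
if [[ -f "$TRIVY_IGNORE" ]]; then
    while IFS= read -r line; do
        if [[ "$line" =~ exp:([0-9]{4}-[0-9]{2}-[0-9]{2}) ]]; then
            expiry="${BASH_REMATCH[1]}"
            cve=$(echo "$line" | grep -oE 'CVE-[0-9]{4}-[0-9]+')
            # ISO 8601 dates sort correctly as plain strings
            if [[ "$expiry" < "$TODAY" ]]; then
                echo "EXPIRED: $cve (expiry: $expiry): review and remove or renew"
                expired=$((expired + 1))
            fi
        fi
    done < "$TRIVY_IGNORE"
fi

echo ""
# grep -c prints 0 but exits non-zero when nothing matches, so "|| echo 0" would print 0 twice
total=$(grep -c '^CVE-' "$TRIVY_IGNORE" 2>/dev/null) || total=${total:-0}
echo "Total suppressions: $total"

# Exit non-zero when any suppression has expired so a CI job fails visibly
[[ "$expired" -eq 0 ]]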
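The shell audit above checks expiry dates only, while the policy stated in the suppression file also requires a justification on every entry. A sketch of a stricter check follows; `audit` is a hypothetical helper, it assumes the one-line `CVE-ID exp:YYYY-MM-DD # justification` layout used in this article, and the CVE IDs in the usage lines are placeholders.

```python
import re
from datetime import date

# CVE id, optional exp:DATE, optional trailing "# justification"
LINE = re.compile(r"^(CVE-\S+)(?:\s+exp:(\d{4}-\d{2}-\d{2}))?\s*(?:#\s*(.*))?$")


def audit(lines: list[str], today: date) -> list[str]:
    """Return one message per policy violation found in the suppression lines."""
    problems = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment-only lines are not suppressions
        match = LINE.match(line)
        if not match:
            problems.append(f"unparseable: {line}")
            continue
        cve, expiry, reason = match.groups()
        if expiry is None:
            problems.append(f"{cve}: missing expiry")
        elif date.fromisoformat(expiry) < today:
            problems.append(f"{cve}: expired {expiry}")
        if not (reason or "").strip():
            problems.append(f"{cve}: missing justification")
    return problems


sample = ["# comment", "CVE-0000-0001 exp:2020-01-01 # test-only dependency"]
print(audit(sample, date(2025, 1, 1)))  # ['CVE-0000-0001: expired 2020-01-01']
```

Passing `today` as a parameter, rather than reading the clock inside the function, keeps the check deterministic and easy to test.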

Step 4 — SLA-driven escalation workflow

# .github/workflows/cve-sla-escalation.yml
# Checks CVE PRs against SLA and escalates overdue ones

name: CVE SLA Escalation

on:
  schedule:
    - cron: "0 9 * * 1-5"  # Weekdays at 09:00 UTC

jobs:
  check-sla:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Check CVE PR SLAs
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          python3 - << 'PYEOF'
          import subprocess, json, sys
          from datetime import datetime, timezone

          # SLA thresholds in hours
          SLA = {"P0-KEV": 24, "P1": 168, "P2": 720}

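          result = subprocess.run(
              ["gh", "pr", "list", "--label", "vulnerability",
               "--limit", "1000",  # gh lists only 30 PRs unless a limit is given
               "--json", "number,title,createdAt,labels"],
              capture_output=True, text=True
          )
          if result.returncode != 0:
              # Fail loudly instead of crashing on empty JSON
              sys.exit(f"gh pr list failed: {result.stderr.strip()}")
          prs = json.loads(result.stdout)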
          now = datetime.now(timezone.utc)

          breaches = []
          for pr in prs:
              created = datetime.fromisoformat(pr["createdAt"].replace("Z", "+00:00"))
              age_hours = (now - created).total_seconds() / 3600
              labels = [l["name"] for l in pr.get("labels", [])]
              
              sla_hours = None
              if "kev" in " ".join(labels).lower():
                  sla_hours = SLA["P0-KEV"]
              elif "priority" in " ".join(labels).lower():
                  sla_hours = SLA["P1"]
              
              if sla_hours and age_hours > sla_hours:
                  breaches.append({
                      "pr": pr["number"],
                      "title": pr["title"],
                      "age_hours": age_hours,
                      "sla_hours": sla_hours
                  })
          
          if breaches:
              print(f"SLA BREACH: {len(breaches)} CVE PRs are overdue")
              for b in breaches:
                  print(f"  PR #{b['pr']}: {b['age_hours']:.0f}h / {b['sla_hours']}h SLA")
                  print(f"    {b['title']}")
              sys.exit(1)
          else:
              print("All CVE PRs within SLA")
          PYEOF
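The inline script assigns an SLA only to PRs labelled kev or priority, so the P2 threshold it defines is never applied. The sketch below pulls the label-to-SLA mapping into a testable function and uses the P2 window as the fallback for any other vulnerability PR. The function names are hypothetical, and it assumes that some earlier step (for example the triage script) applies a `kev` label, which the Renovate config alone does not do.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SLA_HOURS = {"P0-KEV": 24, "P1": 168, "P2": 720}


def sla_hours_for(labels: list[str]) -> Optional[int]:
    """Map PR labels to an SLA window, most urgent label first."""
    names = {name.lower() for name in labels}  # exact names, not substrings
    if "kev" in names:
        return SLA_HOURS["P0-KEV"]
    if "priority" in names:
        return SLA_HOURS["P1"]
    if "vulnerability" in names:
        return SLA_HOURS["P2"]
    return None


def is_breached(created_at: str, labels: list[str], now: datetime) -> bool:
    """True when the PR has been open longer than its SLA window."""
    limit = sla_hours_for(labels)
    if limit is None:
        return False
    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return now - created > timedelta(hours=limit)


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
print(is_breached("2025-01-08T00:00:00Z", ["vulnerability", "kev"], now))  # True
```

Matching on exact label names avoids the substring test in the inline version, where a label such as `low-priority` would also count as `priority`.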

Step 5 — Metrics for the CVE remediation pipeline

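#!/bin/bash
# /etc/node_exporter/textfile_collector/cve_pipeline.sh
# Expose CVE remediation pipeline metrics for Prometheus.
# node_exporter's textfile collector reads only *.prom files, so redirect
# this script's output to one (for example cve_pipeline.prom).

# grep -c prints 0 but exits non-zero when nothing matches, so "|| echo 0"
# would emit a second line; default empty values afterwards instead.
OPEN_CVE_PRS=$(gh pr list --label vulnerability --state open --limit 1000 --json number 2>/dev/null | jq length 2>/dev/null)
OVERDUE_PRS=$(python3 scripts/triage-cve-prs.py 2>/dev/null | grep -cE "P0|OVERDUE")
SUPPRESSIONS=$(grep -c "^CVE-" .trivyignore 2>/dev/null)
OPEN_CVE_PRS=${OPEN_CVE_PRS:-0}
OVERDUE_PRS=${OVERDUE_PRS:-0}
SUPPRESSIONS=${SUPPRESSIONS:-0}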

cat << EOF
# HELP cve_open_prs_total Open CVE remediation pull requests
# TYPE cve_open_prs_total gauge
cve_open_prs_total $OPEN_CVE_PRS

# HELP cve_overdue_prs_total CVE PRs exceeding SLA
# TYPE cve_overdue_prs_total gauge
cve_overdue_prs_total $OVERDUE_PRS

# HELP cve_active_suppressions_total Active vulnerability scan suppressions
# TYPE cve_active_suppressions_total gauge
cve_active_suppressions_total $SUPPRESSIONS
EOF
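If the metrics are produced from Python rather than shell, the same output can be rendered and written atomically, so that a scrape never reads a half-written file. This is a sketch: `render_gauges` and `write_prom_file` are hypothetical helpers, and the target directory is whatever the textfile collector is configured to read.

```python
import os
import tempfile
from pathlib import Path


def render_gauges(gauges: dict[str, tuple[str, int]]) -> str:
    """Render {name: (help text, value)} in the Prometheus text format."""
    lines = []
    for name, (help_text, value) in gauges.items():
        lines += [f"# HELP {name} {help_text}",
                  f"# TYPE {name} gauge",
                  f"{name} {value}"]
    return "\n".join(lines) + "\n"


def write_prom_file(directory: Path, filename: str, content: str) -> Path:
    """Write to a temporary file, then rename it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(tmp_name, 0o644)     # mkstemp creates the file as 0600
    target = directory / filename
    os.replace(tmp_name, target)  # atomic on POSIX within one filesystem
    return target
```

A call such as `write_prom_file(Path("/var/lib/node_exporter"), "cve_pipeline.prom", render_gauges({...}))` would replace the heredoc; the directory shown is only an example.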

Expected Behaviour

| Scenario | Unmanaged pipeline | Managed pipeline |
|---|---|---|
| 200 CVE PRs open simultaneously | Team overwhelmed; nothing merged | Priority queue; P0-KEV merged in < 24h; P3 auto-triaged |
| Patch-level update for low-risk library | Manual review required | Auto-merged after CI passes |
| KEV CVE appears in dependency | Discovered manually; may take days | Triage script flags P0; escalation fires within hours |
| EPSS score rises on open PR | Not detected | Daily re-triage re-prioritizes rising-EPSS PRs |
| Suppression expires | No detection | Audit script flags expired suppressions in CI |
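The daily re-triage row implies comparing today's priorities with the previous run. A minimal sketch of that comparison follows, assuming each run is stored as a mapping from a PR key to its priority string; the `escalated` helper and the `repo#number` key format are hypothetical.

```python
RANK = {"P0-MERGE-NOW": 0, "P1-THIS-SPRINT": 1, "P2-NEXT-SPRINT": 2,
        "P2-OVERDUE": 2, "P3-BACKLOG": 3}


def escalated(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """List PRs whose priority became more urgent since the previous run."""
    moved = []
    for pr, priority in current.items():
        before = previous.get(pr)
        # PRs that are new in this run have no earlier priority to compare
        if before is not None and RANK[priority] < RANK[before]:
            moved.append(f"{pr}: {before} -> {priority}")
    return sorted(moved)


yesterday = {"myorg/app#12": "P3-BACKLOG", "myorg/app#15": "P1-THIS-SPRINT"}
today = {"myorg/app#12": "P1-THIS-SPRINT", "myorg/app#15": "P1-THIS-SPRINT"}
print(escalated(yesterday, today))
# ['myorg/app#12: P3-BACKLOG -> P1-THIS-SPRINT']
```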

Trade-offs

| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Auto-merge for patch updates | Eliminates review burden for low-risk changes | Semantic versioning violations (breaking change in patch release) | Require full CI pass before auto-merge; add integration test suite |
| EPSS-based prioritization | Focus on exploitable CVEs | EPSS may underweight novel zero-days with no historical data | Always preserve P0-KEV override; treat unknown EPSS as medium priority |
| False positive suppression | Reduces noise; focuses reviews | Suppressions may be wrong; could mask real risk | Require justification and expiry on every suppression; run audit script in CI |
| Central triage script | Consistent prioritization across repos | Requires maintenance as EPSS/KEV APIs evolve | Version the script; test API responses; cache aggressively |
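The mitigation "treat unknown EPSS as medium priority" differs from the triage script in Step 2, which substitutes 0.0 for a missing score and so sends unscored CVEs to the backlog. A sketch of the alternative follows; `priority_with_unknown` is a hypothetical variant of `calculate_priority` that keeps the same thresholds and takes `None` to mean that no score was available.

```python
from typing import Optional


def priority_with_unknown(in_kev: bool, max_epss: Optional[float]) -> str:
    """Like calculate_priority, but a missing EPSS score is not read as 'safe'."""
    if in_kev:
        return "P0-MERGE-NOW"        # the KEV override always wins
    if max_epss is None:
        return "P2-NEXT-SPRINT"      # unknown score: medium priority
    if max_epss >= 0.3:
        return "P1-THIS-SPRINT"
    if max_epss >= 0.05:
        return "P2-NEXT-SPRINT"
    return "P3-BACKLOG"


print(priority_with_unknown(False, None))   # P2-NEXT-SPRINT
print(priority_with_unknown(False, 0.001))  # P3-BACKLOG
```

Adopting this would also mean building the score list with `epss_scores.get(c)` rather than `epss_scores.get(c, 0.0)`, so that a missing score stays distinguishable from a genuinely low one.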

Failure Modes

| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Auto-merge introduces breaking change | Application fails to start after merge | CI catches it before merge (if tests are comprehensive) | Revert the auto-merged commit; add regression test for broken interface |
| Triage script EPSS API times out | All PRs classified as P3-BACKLOG | Alert on all-P3 output (abnormal distribution) | Cache last-known EPSS scores; use cached scores when API is unavailable |
| KEV check not running for weekend CVE | KEV-listed CVE not escalated until Monday | Weekday-only SLA check misses Friday KEV addition | Run SLA check 3×/day including weekends for KEV checks specifically |
| Suppression file grows without audit | Large list of suppressions; some masking real risk | Suppression count metric exceeds threshold | Require security team review for suppression files > 20 entries |
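The "alert on all-P3 output" detection can be sketched as a check on the distribution of priorities from one triage run. Since PRs older than 30 days become P2-OVERDUE even without scores, the sketch looks for the absence of any EPSS-derived tier rather than for literal all-P3 output. `epss_looks_missing` and the 20-PR minimum are assumptions, not values from the pipeline above.

```python
# Priorities that calculate_priority can only assign from an EPSS score
EPSS_DERIVED = {"P1-THIS-SPRINT", "P2-NEXT-SPRINT"}


def epss_looks_missing(priorities: list[str], min_prs: int = 20) -> bool:
    """True when a sizeable run contains no EPSS-derived priority at all."""
    if len(priorities) < min_prs:
        return False  # too few PRs to call the distribution abnormal
    return not any(p in EPSS_DERIVED for p in priorities)


print(epss_looks_missing(["P3-BACKLOG"] * 25))                       # True
print(epss_looks_missing(["P3-BACKLOG"] * 24 + ["P1-THIS-SPRINT"]))  # False
```

This is a heuristic: a healthy run in which every CVE has a low score would also trigger it, so it suits a warning better than a hard failure.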