Tuning CI Vulnerability Scanner Gates for High CVE Volume
The Problem
When NVD published roughly 28,000 CVEs in 2023 and 40,000 in 2024, teams running Trivy or Grype as hard CI gates started hitting the “CVE gridlock” pattern. Every dependency update triggers a new finding, every finding blocks the pipeline, and engineers spend more time writing suppression exceptions than writing code. Eventually the team either disables the scanner gate entirely or installs a blanket suppression for anything below CVSS 8.
Both outcomes are security regressions. The first removes all scanning. The second misses CVEs that carry a low initial CVSS score but are later exploited; EPSS, not CVSS, estimates the likelihood of exploitation.
The CVE volume problem is accelerating. LLM-assisted vulnerability research lets security researchers find and report CVEs in widely used libraries faster than before. The NVD backlog and enrichment lag (a separate problem) mean many CVEs arrive with incomplete data (no CVSS score, no CWE, no CPE), which confuses score-based gates.
A well-tuned CI gate needs to:
- Block on CVEs that matter: CISA KEV entries, CVEs with EPSS > configured threshold, CVEs with public PoC, CVSS ≥ 9.0.
- Warn on CVEs that need tracking: CVSS 7–9, EPSS 5–15%, no public PoC yet.
- Silently suppress noise: CVEs with no CVSS (data-deficient), CVEs in dev/test-only dependencies, CVEs in packages not in the final image layer, CVEs with vendor-issued not_affected VEX statements.
- Auto-clear on fix availability: when a patched version exists and the build can be updated, generate a PR automatically rather than blocking.
Target systems: Trivy, Grype, or Snyk as CI scanners; GitHub Actions, GitLab CI, or Tekton pipelines; teams shipping container images or language-ecosystem packages.
Threat Model
1. CVE flood causes gate bypass (accidental). Objective (unintentional): engineers suppress every CVE to keep pipelines green; a critical exploitable CVE is suppressed alongside thousands of noise entries. Impact: a CISA KEV-listed CVE with public exploit ships to production.
2. False-negative gate tuning (misconfiguration). Objective: EPSS threshold set too high (0.50+); many exploited CVEs with EPSS 0.10–0.30 pass silently. Impact: CVEs that are actively being exploited in the wild pass CI without review.
3. Suppression file manipulation (insider or supply-chain attacker). Objective: add a suppression entry for a specific CVE that affects a malicious dependency the attacker introduced. Impact: the CVE in the backdoored dependency is suppressed and the package ships to production.
4. NVD lag + EPSS lag (systemic). Objective (none; a timing gap): a CVE has no score for 48 hours or more after publication (NVD backlog), EPSS has not yet been computed, and the scanner treats the CVE as low priority. Impact: a high-severity CVE is published and a PoC drops the same day, but the CI gate does not block because the CVE has no score yet. Mitigation: escalate score-absent CVEs that have a public PoC instead of letting them pass silently (the policy script in this guide warns on them; stricter teams can block).
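Scenario 3 is easier to catch when every new or loosened suppression in a pull request is surfaced for security review. The following is a minimal sketch, assuming the base-branch and PR versions of the suppression file have already been loaded into dicts keyed by CVE ID. `suppression_changes` is a hypothetical helper, not part of Trivy or any other scanner.

```python
def suppression_changes(base: dict, head: dict) -> dict[str, list[str]]:
    """Compare two suppression mappings and report changes that need review."""
    # Entries that exist only in the PR version
    added = sorted(cve for cve in head if cve not in base)
    # Entries whose expiry moved later (ISO dates compare correctly as strings)
    extended = sorted(
        cve for cve in head
        if cve in base
        and str(head[cve].get("expires_after", ""))
        > str(base[cve].get("expires_after", ""))
    )
    return {"added": added, "extended": extended}


# Hypothetical data for illustration
base = {"CVE-2024-12345": {"expires_after": "2026-12-31", "ticket": "SEC-1234"}}
head = {
    "CVE-2024-12345": {"expires_after": "2027-06-30", "ticket": "SEC-1234"},
    "CVE-2025-67890": {"expires_after": "2026-09-01", "ticket": "SEC-1567"},
}
changes = suppression_changes(base, head)
print(changes)
```

A CI job could fail, or request a security-team review, whenever either list is non-empty. That keeps a suppression for a backdoored dependency from slipping in alongside routine changes.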
Hardening Configuration
Gate Architecture
CI Pipeline: build image → scan with Trivy → evaluate gate policy → pass/fail/warn
                                                      ↓
                                             EPSS API:  fetch exploitation probability
                                             KEV API:   fetch known-exploited list
                                             VEX store: fetch vendor not-affected statements
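The VEX lookup in the diagram can be sketched as a filter applied before the policy thresholds. This is a simplified illustration that assumes the vendor statement is an OpenVEX-style JSON document (a `statements` list whose entries carry `vulnerability.name` and `status`). The helper names are hypothetical. A production gate should also match each statement's product identifiers against the scanned artifact, which this sketch skips.

```python
def not_affected_cves(vex_doc: dict) -> set[str]:
    """Collect CVE IDs that the VEX document marks as not_affected."""
    cves = set()
    for stmt in vex_doc.get("statements", []):
        if stmt.get("status") != "not_affected":
            continue
        vuln = stmt.get("vulnerability", {})
        # OpenVEX uses an object with a "name" field; tolerate a bare string too
        name = vuln.get("name") if isinstance(vuln, dict) else vuln
        if name:
            cves.add(name)
    return cves


def drop_not_affected(findings: dict[str, dict], vex_doc: dict) -> dict[str, dict]:
    """Return only the findings that no VEX statement has cleared."""
    cleared = not_affected_cves(vex_doc)
    return {cve: vuln for cve, vuln in findings.items() if cve not in cleared}


# Hypothetical data for illustration
vex = {
    "statements": [
        {"vulnerability": {"name": "CVE-2025-67890"}, "status": "not_affected"},
        {"vulnerability": {"name": "CVE-2024-11111"}, "status": "affected"},
    ]
}
findings = {"CVE-2025-67890": {"PkgName": "libfoo"}, "CVE-2024-11111": {"PkgName": "libbar"}}
remaining = drop_not_affected(findings, vex)
print(sorted(remaining))
```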
Trivy Configuration with Policy Enforcement
# .trivy.yaml: project-level Trivy config (load with: trivy image --config .trivy.yaml <image>)
# Key names follow Trivy's config-file reference; verify them against your Trivy version.
scan:
  scanners:
    - vuln
    - secret  # catch secrets in image layers too
# Report every severity, including UNKNOWN (CVEs with no score yet); the policy script decides what blocks
severity:
  - UNKNOWN
  - LOW
  - MEDIUM
  - HIGH
  - CRITICAL
vulnerability:
  # Keep unfixed CVEs in the report; the policy script decides how to treat them
  ignore-unfixed: false
exit-code: 0  # Never fail inside Trivy; the policy script sets the exit status
format: json
output: trivy-results.json
Evaluate the results with a policy script rather than relying on Trivy’s built-in exit codes:
#!/usr/bin/env python3
# gate-policy.py: evaluates Trivy output against the tuned policy.
# Requires Python 3.11+ (datetime.UTC), requests, and PyYAML.
import json
import sys
from datetime import datetime, UTC

import requests
import yaml

EPSS_API = "https://api.first.org/data/v1/epss"
KEV_URL = "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"

# Tuning parameters: adjust per team risk appetite
EPSS_BLOCK_THRESHOLD = 0.15  # Block if exploitation probability >= 15%
EPSS_WARN_THRESHOLD = 0.05   # Warn if exploitation probability >= 5%
CVSS_BLOCK_THRESHOLD = 9.0   # Block if CVSS score >= 9.0
CVSS_WARN_THRESHOLD = 7.0    # Warn if CVSS score >= 7.0
EPSS_BATCH_SIZE = 100        # CVE IDs per request (API limit)


def fetch_epss_scores(cve_ids: list[str]) -> dict[str, float]:
    """Fetch EPSS scores in batches so CVEs past the first 100 are not dropped."""
    scores: dict[str, float] = {}
    for start in range(0, len(cve_ids), EPSS_BATCH_SIZE):
        cve_param = ",".join(cve_ids[start:start + EPSS_BATCH_SIZE])
        resp = requests.get(f"{EPSS_API}?cve={cve_param}", timeout=10)
        resp.raise_for_status()
        for item in resp.json().get("data", []):
            scores[item["cve"]] = float(item["epss"])
    return scores


def fetch_kev_ids() -> set[str]:
    resp = requests.get(KEV_URL, timeout=15)
    resp.raise_for_status()
    return {v["cveID"] for v in resp.json().get("vulnerabilities", [])}


def load_suppressions(path: str = ".cve-suppressions.yaml") -> dict:
    try:
        with open(path) as f:
            return yaml.safe_load(f) or {}
    except FileNotFoundError:
        return {}


def evaluate(trivy_results_path: str) -> None:
    with open(trivy_results_path) as f:
        results = json.load(f)

    # Collect findings keyed by CVE ID (one entry per CVE, even if several packages match)
    findings_by_cve = {}
    for result in results.get("Results", []):
        for vuln in result.get("Vulnerabilities") or []:
            cve_id = vuln.get("VulnerabilityID", "")
            if cve_id:
                findings_by_cve[cve_id] = vuln

    epss_scores = fetch_epss_scores(list(findings_by_cve))
    kev_ids = fetch_kev_ids()
    suppressions = load_suppressions()
    blocks = []
    warnings = []

    for cve_id, vuln in findings_by_cve.items():
        # Check suppression. An entry without expires_after is ignored,
        # so the CVE is evaluated as if it were unsuppressed.
        suppression = suppressions.get(cve_id)
        if suppression and suppression.get("expires_after"):
            expiry = datetime.fromisoformat(str(suppression["expires_after"]))
            if expiry.tzinfo is None:
                # Date-only values parse as naive; make them UTC-aware before comparing
                expiry = expiry.replace(tzinfo=UTC)
            if expiry >= datetime.now(UTC):
                continue  # Valid suppression; skip
            # Suppression expired: treat as unsuppressed
            print(f"WARNING: Suppression for {cve_id} expired on {expiry.date()}")

        epss = epss_scores.get(cve_id, 0.0)
        cvss = (vuln.get("CVSS") or {}).get("nvd", {}).get("V3Score") or 0.0
        is_kev = cve_id in kev_ids

        # Block conditions
        if is_kev:
            blocks.append({"cve": cve_id, "reason": "CISA KEV", "epss": epss, "cvss": cvss})
        elif epss >= EPSS_BLOCK_THRESHOLD:
            blocks.append({"cve": cve_id, "reason": f"EPSS {epss:.3f}", "epss": epss, "cvss": cvss})
        elif cvss >= CVSS_BLOCK_THRESHOLD:
            blocks.append({"cve": cve_id, "reason": f"CVSS {cvss}", "epss": epss, "cvss": cvss})
        # Warn conditions (not blocking)
        elif cvss >= CVSS_WARN_THRESHOLD or epss >= EPSS_WARN_THRESHOLD:
            warnings.append({"cve": cve_id, "cvss": cvss, "epss": epss})
        # CVE with no score but a PoC or exploit URL in its references: treat as warn
        elif not cvss and vuln.get("References"):
            poc_refs = [r for r in vuln["References"]
                        if "poc" in r.lower() or "exploit" in r.lower()]
            if poc_refs:
                warnings.append({"cve": cve_id, "cvss": "N/A", "epss": epss,
                                 "note": "no CVSS but PoC reference found"})

    # Output results
    for b in blocks:
        print(f"BLOCK: {b['cve']}: {b['reason']}")
    for w in warnings:
        print(f"WARN: {w['cve']}: CVSS={w['cvss']} EPSS={w['epss']:.3f}")

    if blocks:
        print(f"\n{len(blocks)} blocking CVEs found. Pipeline FAILED.")
        sys.exit(1)
    elif warnings:
        print(f"\n{len(warnings)} warning CVEs (non-blocking). Review recommended.")
    else:
        print("No blocking or warning CVEs. Pipeline PASSED.")
    sys.exit(0)


if __name__ == "__main__":
    evaluate(sys.argv[1] if len(sys.argv) > 1 else "trivy-results.json")
Suppression Policy File
# .cve-suppressions.yaml
# Format: CVE-ID: { reason, expires_after, ticket }
# All suppressions require: reason, expiry, and ticket reference
# Suppressions without expiry are rejected by the gate policy
CVE-2024-12345:
  reason: "Dev-only dependency (pytest); not present in final image"
  expires_after: "2026-12-31"
  ticket: "SEC-1234"
  approved_by: "security-team"
CVE-2025-67890:
  reason: "Vendor VEX statement: not_affected for our usage pattern"
  expires_after: "2026-09-01"
  ticket: "SEC-1567"
  vex_url: "https://vendor.example.com/security/CVE-2025-67890-vex.json"
  approved_by: "security-team"
Gate policy enforcement on suppressions:
# CI step: reject suppressions without reason, expiry, or ticket
python3 - << 'EOF'
import sys
import yaml

REQUIRED_FIELDS = ("reason", "expires_after", "ticket")

with open(".cve-suppressions.yaml") as f:
    suppressions = yaml.safe_load(f) or {}

errors = []
for cve, data in suppressions.items():
    data = data or {}  # tolerate an entry that has no fields at all
    for field in REQUIRED_FIELDS:
        if not data.get(field):
            errors.append(f"{cve}: missing {field}")

if errors:
    print("SUPPRESSION POLICY VIOLATIONS:")
    for e in errors:
        print(f"  {e}")
    sys.exit(1)
EOF
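Expiry dates only help if someone is told before a suppression lapses. The sketch below lists entries that expire within a configurable window (already-expired entries are included). It assumes the suppression file has been loaded into a dict as in the validation step. `expiring_soon` and the 30-day default are illustrative choices, not part of the original policy.

```python
from datetime import date, timedelta


def expiring_soon(suppressions: dict, today: date,
                  within_days: int = 30) -> list[tuple[str, date]]:
    """Return (cve, expiry) pairs that expire on or before today + within_days."""
    cutoff = today + timedelta(days=within_days)
    due = []
    for cve, data in suppressions.items():
        raw = (data or {}).get("expires_after")
        if not raw:
            continue  # missing expiry is handled by the validation step
        # Accept "YYYY-MM-DD" strings or date objects parsed by the YAML loader
        expiry = date.fromisoformat(str(raw)[:10])
        if expiry <= cutoff:
            due.append((cve, expiry))
    return sorted(due, key=lambda item: item[1])


# Hypothetical data mirroring the suppression file format
suppressions = {
    "CVE-2024-12345": {"expires_after": "2026-12-31", "ticket": "SEC-1234"},
    "CVE-2025-67890": {"expires_after": "2026-09-01", "ticket": "SEC-1567"},
}
due = expiring_soon(suppressions, today=date(2026, 8, 15))
for cve, expiry in due:
    print(f"{cve} expires on {expiry}")
```

A scheduled job could run this daily and post the result to the ticket named in each entry.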
GitHub Actions Integration
# .github/workflows/security-scan.yml
name: Security Scan
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t app:${{ github.sha }} .
      - name: Run Trivy scan
        # Pin to a release tag or commit SHA in production instead of a moving branch
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: app:${{ github.sha }}
          format: json
          output: trivy-results.json
          exit-code: "0"  # Policy script handles exit code
      - name: Install policy script dependencies
        run: pip install requests pyyaml
      - name: Validate suppression file
        run: python3 scripts/validate-suppressions.py
      - name: Evaluate gate policy
        run: python3 scripts/gate-policy.py trivy-results.json
        env:
          # TTL in seconds for cached EPSS responses, to avoid rate limiting in parallel jobs
          # (only has an effect if the policy script implements the cache)
          EPSS_CACHE_TTL: "3600"
      - name: Upload scan results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: trivy-scan-${{ github.sha }}
          path: trivy-results.json
          retention-days: 90
Monitoring Gate Effectiveness
Track gate decisions over time to detect drift:
# Prometheus metrics from gate policy script
cat << 'EOF' > gate-policy-metrics.py
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
g_blocks = Gauge("ci_cve_gate_blocks", "CVEs blocked in CI", ["reason"], registry=registry)
g_warns = Gauge("ci_cve_gate_warnings", "CVEs warned in CI", [], registry=registry)
g_suppressed = Gauge("ci_cve_gate_suppressed", "CVEs suppressed", [], registry=registry)
# ... populate from gate evaluation results ...
push_to_gateway("pushgateway:9091", job="cve-gate", registry=registry)
EOF
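The policy script's reason strings embed the score (for example `EPSS 0.450`), which would create a new label value for almost every CVE. One way to fill in the "populate" step is to collapse each reason into a small fixed set of categories first. This sketch uses only the standard library. `reason_category` and `summarize` are hypothetical helpers, and the category names are an assumption.

```python
from collections import Counter


def reason_category(reason: str) -> str:
    """Map a free-form block reason to a low-cardinality label value."""
    lowered = reason.lower()
    if "kev" in lowered:
        return "kev"
    if lowered.startswith("epss"):
        return "epss"
    if lowered.startswith("cvss"):
        return "cvss"
    return "other"


def summarize(blocks: list[dict], warnings: list[dict], suppressed: list[str]) -> dict:
    """Aggregate one gate run into the counts the gauges expect."""
    return {
        "blocks_by_reason": dict(Counter(reason_category(b["reason"]) for b in blocks)),
        "warnings": len(warnings),
        "suppressed": len(suppressed),
    }


# Hypothetical results from one gate run
blocks = [
    {"cve": "CVE-2024-0001", "reason": "CISA KEV"},
    {"cve": "CVE-2024-0002", "reason": "EPSS 0.450"},
    {"cve": "CVE-2024-0003", "reason": "EPSS 0.200"},
    {"cve": "CVE-2024-0004", "reason": "CVSS 9.8"},
]
summary = summarize(blocks, warnings=[{"cve": "CVE-2024-0005"}],
                    suppressed=["CVE-2024-12345", "CVE-2025-67890"])
print(summary)
```

Each `blocks_by_reason` entry can then be written with `g_blocks.labels(reason=category).set(count)`, and the two totals with `g_warns.set(...)` and `g_suppressed.set(...)`.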
Expected Behaviour After Hardening
| CVE Type | Before Tuning | After Tuning |
|---|---|---|
| CISA KEV — CVSS 7.5, EPSS 0.45 | Passes if team only gates on CVSS ≥ 9 | Blocked by KEV rule |
| CVSS 9.8, EPSS 0.002 (theoretical) | Blocks pipeline | Blocked (CVSS threshold) — correct |
| CVSS 5.0, EPSS 0.001, dev-only dep | Blocks pipeline (noise) | Suppressed with expiry; passes |
| No CVSS yet, PoC in references | Not evaluated | Warns; flags for manual review |
| Suppression expired | Applies indefinitely | Gate rejects expired suppression; CVE re-evaluated |
Trade-offs and Operational Considerations
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| EPSS-based blocking | Catches exploited CVEs missed by CVSS | EPSS scores lag 24-72h post-publication | Supplement with KEV blocking; treat no-CVSS CVEs with PoC as warnings |
| Suppression with expiry | Prevents indefinite noise suppression | Requires process discipline to re-evaluate | Automate expiry notification; block PRs adding suppressions without ticket |
| Policy script vs scanner flags | Full control over logic; auditable | More maintenance than native scanner flags | Keep policy script in a shared security team repo; version it |
| EPSS API call in CI | Always-current scores | External API dependency; potential rate limit | Cache responses for 1 hour; fall back to CVSS-only if API unavailable |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| EPSS API unreachable | Policy falls back to CVSS and KEV only; CVEs with high EPSS but low CVSS that are not yet in KEV may pass | CI log: EPSS API timeout; metric shows fallback mode | Use cached EPSS data from last successful fetch; alert on fallback duration > 4h |
| KEV JSON endpoint changes URL | KEV check fails; KEV-listed CVEs below the CVSS and EPSS thresholds pass | CI log: HTTP 404 fetching KEV; metric shows KEV disabled | Update KEV URL; use mirrored copy for resilience |
| Suppression file merge conflict | CI fails on suppression validation | Suppression validation step fails with parse error | Resolve merge conflict; ensure only one entry per CVE |
| EPSS threshold too aggressive (0.05) | Too many blocks; pipeline gridlock resumes | Pipeline failure rate metric rises; engineer complaints | Raise threshold to 0.10 or 0.15; review blocked CVEs to calibrate |
| CVE with no CVSS blocks production | Gate evaluates no-CVSS CVE as critical | Post-incident review: CVE had no CVSS yet was in final image | Add grace period for no-CVSS CVEs: warn only for 48h after publication |