Tuning CI Vulnerability Scanner Gates for High CVE Volume

The Problem

As NVD published more than 28,000 CVEs in 2023 and roughly 40,000 in 2024, teams running Trivy or Grype as hard CI gates started hitting the “CVE gridlock” pattern: every dependency update triggers new findings, every finding blocks the pipeline, and engineers spend more time writing suppression exceptions than writing code. Eventually the team either disables the scanner gate entirely or installs a blanket suppression for anything below CVSS 8.

Both outcomes are security regressions. The first removes scanning altogether. The second misses dangerous CVEs that start with a low CVSS score but are later exploited: CVSS rates severity, while EPSS estimates the probability that a CVE will be exploited in the next 30 days.

The CVE volume problem is accelerating. LLM-assisted vulnerability research lets researchers find and report CVEs in widely used libraries faster than before. Separately, the NVD backlog and enrichment lag mean that many CVEs arrive with incomplete data (no CVSS score, no CWE, no CPE), which confuses score-based gates.

A well-tuned CI gate needs to:

Block on CVEs that matter: CISA KEV entries, CVEs with EPSS at or above a configured threshold, CVEs with a public PoC, and CVSS ≥ 9.0.
Warn on CVEs that need tracking: CVSS 7.0–8.9 or EPSS 5–15%, with no public PoC yet.
Silently suppress noise: CVEs with no CVSS and no exploitation signal (data-deficient), CVEs in dev/test-only dependencies, CVEs in packages not in the final image layer, and CVEs covered by a vendor-issued not_affected VEX statement.
Auto-clear on fix availability: when a patched version exists and the build can be updated, generate a PR automatically rather than blocking.
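Taken together, the four rules above form a small decision function. The sketch below is illustrative rather than the gate script shown later: the `Finding` fields (`has_public_poc`, `dev_only`, `in_final_image`, `vex_not_affected`, `fix_available`) are hypothetical names for signals you would derive from scanner output, SBOM or lockfile metadata, and VEX documents. It applies the noise rules first, so a KEV-listed CVE in a dev-only dependency is suppressed rather than blocked, and, as a conservative reading of the auto-clear rule, it only turns warn-level findings into upgrade PRs; blocking findings still block even when a fix exists.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    BLOCK = "block"
    WARN = "warn"
    AUTOFIX = "autofix"    # open an upgrade PR instead of only warning
    SUPPRESS = "suppress"
    PASS = "pass"


@dataclass
class Finding:
    cve_id: str
    cvss: Optional[float]            # None while NVD has not scored the CVE
    epss: float = 0.0
    in_kev: bool = False
    has_public_poc: bool = False
    dev_only: bool = False           # hypothetical signal, e.g. from the lockfile
    in_final_image: bool = True      # hypothetical signal, e.g. from the SBOM
    vex_not_affected: bool = False
    fix_available: bool = False


EPSS_BLOCK, EPSS_WARN = 0.15, 0.05
CVSS_BLOCK, CVSS_WARN = 9.0, 7.0


def classify(f: Finding) -> Action:
    # Noise by context: the vulnerable code does not ship, or the vendor says not_affected
    if f.dev_only or not f.in_final_image or f.vex_not_affected:
        return Action.SUPPRESS
    cvss = f.cvss if f.cvss is not None else 0.0
    # Block on any strong exploitation or severity signal
    if f.in_kev or f.epss >= EPSS_BLOCK or f.has_public_poc or cvss >= CVSS_BLOCK:
        return Action.BLOCK
    # Track mid-range findings; prefer an automated upgrade PR when a fix exists
    if cvss >= CVSS_WARN or f.epss >= EPSS_WARN:
        return Action.AUTOFIX if f.fix_available else Action.WARN
    # No CVSS and no exploitation signal: data-deficient noise
    if f.cvss is None:
        return Action.SUPPRESS
    return Action.PASS
```

For example, `classify(Finding("CVE-X", cvss=None, has_public_poc=True))` blocks, while the same finding without a PoC is suppressed as data-deficient.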

Target systems: Trivy, Grype, or Snyk as CI scanners; GitHub Actions, GitLab CI, or Tekton pipelines; teams shipping container images or language-ecosystem packages.

Threat Model

1. CVE flood causes gate bypass (accidental). Objective (unintentional): engineers suppress every CVE to keep pipelines green; a critical exploitable CVE is suppressed alongside thousands of noise entries. Impact: a CISA KEV-listed CVE with a public exploit ships to production.

2. False-negative gate tuning (misconfiguration). Objective: EPSS threshold set too high (0.50+); many exploited CVEs with EPSS 0.10–0.30 pass silently. Impact: CVEs that are actively being exploited in the wild pass CI without review.

3. Suppression file manipulation (insider or supply-chain attacker). Objective: add a suppression entry for a specific CVE that affects a malicious dependency the attacker introduced. Impact: the CVE in the backdoored dependency is suppressed and the package ships to production.

4. NVD lag + EPSS lag (systemic). Objective: CVE has no score for 48 hours after publication (NVD backlog); scanner treats it as low priority; EPSS hasn’t had time to be computed. Impact: a high-severity CVE is published, PoC drops same day, but CI gate doesn’t block because the CVE has no score yet. Mitigation: treat score-absent CVEs with public PoC as critical.
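Threat 2 is a calibration question: a block threshold is only safe if it still catches most CVEs that are known to be exploited. A minimal way to check, assuming KEV membership is an acceptable proxy for "exploited", is to compare the share of KEV-listed CVEs that clear each candidate threshold (recall) with the share of your own findings that would block (pipeline cost). The scores below are invented for illustration; real inputs would come from the EPSS API for the KEV catalog and for your scan history.

```python
def share_at_or_above(scores: list[float], threshold: float) -> float:
    """Fraction of scores >= threshold (0.0 for an empty list)."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s >= threshold) / len(scores)


def sweep_thresholds(kev_epss: list[float], finding_epss: list[float],
                     thresholds: list[float]) -> list[tuple[float, float, float]]:
    """For each candidate threshold: (threshold, KEV recall, share of findings blocked)."""
    return [
        (t, share_at_or_above(kev_epss, t), share_at_or_above(finding_epss, t))
        for t in thresholds
    ]


if __name__ == "__main__":
    # Invented example scores, not real EPSS data
    kev = [0.02, 0.12, 0.25, 0.60, 0.94]
    findings = [0.001, 0.003, 0.02, 0.12, 0.25, 0.60, 0.001, 0.004, 0.002, 0.94]
    for t, recall, blocked in sweep_thresholds(kev, findings, [0.05, 0.15, 0.50]):
        print(f"threshold {t:.2f}: KEV recall {recall:.0%}, findings blocked {blocked:.0%}")
```

With these invented numbers, moving the threshold from 0.15 to 0.50 cuts KEV recall from 60% to 40% while only reducing blocked findings from 30% to 20%, which is the trade-off threat 2 warns about.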

Hardening Configuration

Gate Architecture

CI Pipeline: build image → scan with Trivy → evaluate gate policy → pass/fail/warn
                                                   ↓
                             EPSS API: fetch exploitation probability
                             KEV API:  fetch known-exploited list
                             VEX store: fetch vendor not-affected statements

Trivy Configuration with Policy Enforcement

# trivy.yaml: project-level Trivy config (read from the working directory by
# default; pass --config to point at another path)
scan:
  scanners:
    - vuln
    - secret    # catch secrets in image layers too

vulnerability:
  # Keep unfixed CVEs in the report; the policy script decides how to treat them
  ignore-unfixed: false

exit-code: 0    # Trivy never fails the build; the policy script sets the exit code
format: json
output: trivy-results.json

Evaluate the results with a policy script rather than relying on Trivy’s built-in exit codes:

#!/usr/bin/env python3
# gate-policy.py: evaluates Trivy output against a tuned policy.
# Requires Python 3.11+ (datetime.UTC) plus the requests and PyYAML packages.

import json
import sys
from datetime import datetime, UTC

import requests
import yaml

EPSS_API = "https://api.first.org/data/v1/epss"
KEV_URL  = "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"

# Tuning parameters: adjust per team risk appetite
EPSS_BLOCK_THRESHOLD = 0.15    # Block if exploitation probability >= 15%
CVSS_BLOCK_THRESHOLD = 9.0     # Block if CVSS score >= 9.0
CVSS_WARN_THRESHOLD  = 7.0     # Warn if CVSS score >= 7.0
EPSS_BATCH_SIZE      = 100     # CVE IDs per EPSS request (API limit)

def fetch_epss_scores(cve_ids: list[str]) -> dict[str, float]:
    """Fetch EPSS scores in batches so scans with more than 100 CVEs are fully scored.
    If the API fails, return what was fetched so the gate degrades to KEV/CVSS only."""
    scores: dict[str, float] = {}
    unique_ids = sorted(set(cve_ids))
    for i in range(0, len(unique_ids), EPSS_BATCH_SIZE):
        cve_param = ",".join(unique_ids[i:i + EPSS_BATCH_SIZE])
        try:
            resp = requests.get(f"{EPSS_API}?cve={cve_param}", timeout=10)
            resp.raise_for_status()
        except requests.RequestException as exc:
            print(f"WARNING: EPSS API unavailable ({exc}); falling back to KEV/CVSS only")
            break
        for item in resp.json().get("data", []):
            scores[item["cve"]] = float(item["epss"])
    return scores

def fetch_kev_ids() -> set[str]:
    resp = requests.get(KEV_URL, timeout=15)
    resp.raise_for_status()
    return {v["cveID"] for v in resp.json().get("vulnerabilities", [])}

def load_suppressions(path: str = ".cve-suppressions.yaml") -> dict:
    try:
        with open(path) as f:
            return yaml.safe_load(f) or {}
    except FileNotFoundError:
        return {}

def suppression_is_active(cve_id: str, suppression: dict) -> bool:
    """A suppression applies only if it has an expiry date that has not passed."""
    raw = suppression.get("expires_after")
    if not raw:
        print(f"WARNING: Suppression for {cve_id} has no expires_after; ignoring it")
        return False
    # PyYAML returns a date for unquoted values and a str for quoted ones; comparing
    # plain dates avoids mixing naive and timezone-aware datetimes (a TypeError).
    expiry = datetime.fromisoformat(str(raw)[:10]).date()
    if expiry < datetime.now(UTC).date():
        print(f"WARNING: Suppression for {cve_id} expired on {expiry}")
        return False
    return True

def evaluate(trivy_results_path: str) -> None:
    with open(trivy_results_path) as f:
        results = json.load(f)

    # Collect findings keyed by CVE ID (a CVE found in several packages counts once)
    findings_by_cve: dict[str, dict] = {}
    for result in results.get("Results", []):
        for vuln in result.get("Vulnerabilities") or []:
            cve_id = vuln.get("VulnerabilityID", "")
            if cve_id:
                findings_by_cve[cve_id] = vuln

    epss_scores  = fetch_epss_scores(list(findings_by_cve))
    kev_ids      = fetch_kev_ids()
    suppressions = load_suppressions()

    blocks = []
    warnings = []

    for cve_id, vuln in findings_by_cve.items():
        # Skip only suppressions that exist and have not expired
        suppression = suppressions.get(cve_id)
        if suppression and suppression_is_active(cve_id, suppression):
            continue

        epss   = epss_scores.get(cve_id, 0.0)
        cvss   = vuln.get("CVSS", {}).get("nvd", {}).get("V3Score", 0.0)
        is_kev = cve_id in kev_ids

        # Block conditions
        if is_kev:
            blocks.append({"cve": cve_id, "reason": "CISA KEV", "epss": epss, "cvss": cvss})
        elif epss >= EPSS_BLOCK_THRESHOLD:
            blocks.append({"cve": cve_id, "reason": f"EPSS {epss:.3f}", "epss": epss, "cvss": cvss})
        elif cvss >= CVSS_BLOCK_THRESHOLD:
            blocks.append({"cve": cve_id, "reason": f"CVSS {cvss}", "epss": epss, "cvss": cvss})

        # Warn conditions (not blocking)
        elif cvss >= CVSS_WARN_THRESHOLD or epss >= 0.05:
            warnings.append({"cve": cve_id, "cvss": cvss, "epss": epss})

        # No CVSS yet but a PoC/exploit reference: warn and flag for manual review
        elif not cvss:
            refs = vuln.get("References") or []
            if any("poc" in r.lower() or "exploit" in r.lower() for r in refs):
                warnings.append({"cve": cve_id, "cvss": "N/A", "epss": epss,
                                 "note": "no CVSS but PoC reference found"})

    # Output results
    for b in blocks:
        print(f"BLOCK: {b['cve']} ({b['reason']})")
    for w in warnings:
        print(f"WARN:  {w['cve']} CVSS={w['cvss']} EPSS={w['epss']:.3f}")

    if blocks:
        print(f"\n{len(blocks)} blocking CVEs found. Pipeline FAILED.")
        sys.exit(1)
    if warnings:
        print(f"\n{len(warnings)} warning CVEs (non-blocking). Review recommended.")
    else:
        print("No blocking or warning CVEs. Pipeline PASSED.")

if __name__ == "__main__":
    evaluate(sys.argv[1] if len(sys.argv) > 1 else "trivy-results.json")
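The policy script never acts on the fourth goal, auto-clearing findings that already have a fix. A minimal sketch of the first half of that step, assuming Trivy's per-finding `FixedVersion` field is populated: the hypothetical `upgrade_candidates` helper groups fixable findings by package so that a dependency bot or custom job (not shown) can open one upgrade PR per package. Picking the single best target version needs ecosystem-aware version comparison, so the sketch simply reports every fixed version it saw. Packages with the same name in different targets (an OS package and a language package, say) are merged here; key on `(Target, PkgName)` if that matters for your images.

```python
def upgrade_candidates(trivy_report: dict) -> dict[str, dict]:
    """Group fixable findings by package name.

    Returns {pkg: {"installed": str, "fixed_versions": set[str], "cves": list[str]}}.
    Findings without a FixedVersion have no patched release yet and are left
    to the gate policy.
    """
    candidates: dict[str, dict] = {}
    for result in trivy_report.get("Results", []):
        for vuln in result.get("Vulnerabilities") or []:
            fixed = vuln.get("FixedVersion")
            if not fixed:
                continue
            entry = candidates.setdefault(vuln["PkgName"], {
                "installed": vuln.get("InstalledVersion", ""),
                "fixed_versions": set(),
                "cves": [],
            })
            entry["fixed_versions"].add(fixed)
            entry["cves"].append(vuln["VulnerabilityID"])
    return candidates
```

Typical use is `upgrade_candidates(json.load(open("trivy-results.json")))`, with each returned package becoming one PR title and body.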

Suppression Policy File

# .cve-suppressions.yaml
# Format: CVE-ID: { reason, expires_after, ticket } plus optional approved_by, vex_url
# All suppressions require a reason, an expiry date (YYYY-MM-DD), and a ticket reference
# Entries without an expiry fail suppression validation and are ignored by the gate policy

CVE-2024-12345:
  reason: "Dev-only dependency (pytest); not present in final image"
  expires_after: "2026-12-31"
  ticket: "SEC-1234"
  approved_by: "security-team"

CVE-2025-67890:
  reason: "Vendor VEX statement: not_affected for our usage pattern"
  expires_after: "2026-09-01"
  ticket: "SEC-1567"
  vex_url: "https://vendor.example.com/security/CVE-2025-67890-vex.json"
  approved_by: "security-team"
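A suppression that cites a `vex_url` can be verified rather than taken on trust. A minimal sketch, assuming the vendor publishes an OpenVEX JSON document: it extracts the CVEs marked `not_affected` so a CI check can confirm the suppressed CVE really appears there. The helper names are hypothetical, and the sketch does not match each statement's `products` against your image, which a production check should also do.

```python
import json
import urllib.request


def not_affected_cves(vex_doc: dict) -> dict[str, str]:
    """Map CVE ID -> justification for each OpenVEX statement with status not_affected."""
    result: dict[str, str] = {}
    for stmt in vex_doc.get("statements", []):
        if stmt.get("status") != "not_affected":
            continue
        vuln = stmt.get("vulnerability")
        # OpenVEX v0.2.0 uses an object with "name"; earlier spec versions used a string
        name = vuln.get("name") if isinstance(vuln, dict) else vuln
        if name:
            result[name] = stmt.get("justification", "")
    return result


def load_vex(url: str) -> dict:
    """Download a VEX document (JSON) from a suppression's vex_url."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def suppression_backed_by_vex(cve_id: str, vex_doc: dict) -> bool:
    """True if the VEX document declares this CVE not_affected."""
    return cve_id in not_affected_cves(vex_doc)
```

In CI, a suppression with a `vex_url` would fail validation when `suppression_backed_by_vex(cve, load_vex(entry["vex_url"]))` is false.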

Gate policy enforcement on suppressions:

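# CI step: reject suppressions without a reason, a valid expiry, or a ticket
# (the workflow below runs the same check as scripts/validate-suppressions.py)
python3 - << 'EOF'
import sys
from datetime import date

import yaml

with open(".cve-suppressions.yaml") as f:
    suppressions = yaml.safe_load(f) or {}

errors = []
for cve, data in suppressions.items():
    if not isinstance(data, dict):
        errors.append(f"{cve}: entry must be a mapping")
        continue
    for field in ("reason", "expires_after", "ticket"):
        if not data.get(field):
            errors.append(f"{cve}: missing {field}")
    if data.get("expires_after"):
        try:
            date.fromisoformat(str(data["expires_after"])[:10])
        except ValueError:
            errors.append(f"{cve}: expires_after is not a YYYY-MM-DD date")

if errors:
    print("SUPPRESSION POLICY VIOLATIONS:")
    for e in errors:
        print(f"  {e}")
    sys.exit(1)
EOF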
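Threat 3 (a planted suppression) is easiest to catch at review time by looking only at what a pull request changes. A minimal sketch with hypothetical helper names: it diffs the suppression entries against the base branch and flags new or edited entries whose `approved_by` is not an allowed approver. Because anyone can type `approved_by: security-team`, this only surfaces entries for review; real enforcement still needs a required security-team review on the file, for example through CODEOWNERS and branch protection.

```python
import subprocess


def added_or_changed(old: dict, new: dict) -> dict:
    """Suppression entries that are new in `new` or differ from `old`."""
    return {cve: data for cve, data in new.items() if old.get(cve) != data}


def approval_errors(changes: dict, allowed_approvers: set[str]) -> list[str]:
    """Flag added or edited entries whose approved_by is not an allowed approver."""
    errors = []
    for cve, data in changes.items():
        approver = data.get("approved_by") if isinstance(data, dict) else None
        if approver not in allowed_approvers:
            errors.append(f"{cve}: approved_by must be one of {sorted(allowed_approvers)}")
    return errors


def file_at_ref(ref: str, path: str = ".cve-suppressions.yaml") -> str:
    """Return the file's text at a git ref, or "" if it did not exist there."""
    proc = subprocess.run(["git", "show", f"{ref}:{path}"],
                          capture_output=True, text=True, check=False)
    return proc.stdout if proc.returncode == 0 else ""
```

In CI, parse `file_at_ref("origin/main")` and the working-tree file with `yaml.safe_load` (treating an empty result as `{}`), then fail the job if `approval_errors(...)` returns anything.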

GitHub Actions Integration

# .github/workflows/security-scan.yml
name: Security Scan
on: [push, pull_request]

permissions:
  contents: read

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"    # gate-policy.py needs 3.11+ for datetime.UTC

      - name: Install policy dependencies
        run: pip install requests pyyaml

      - name: Build image
        run: docker build -t app:${{ github.sha }} .

      - name: Run Trivy scan
        # Pin a release tag (or a full commit SHA) rather than tracking @master
        uses: aquasecurity/trivy-action@0.28.0
        with:
          image-ref: app:${{ github.sha }}
          format: json
          output: trivy-results.json
          exit-code: "0"    # Policy script handles exit code

      - name: Validate suppression file
        run: python3 scripts/validate-suppressions.py

      - name: Evaluate gate policy
        run: python3 scripts/gate-policy.py trivy-results.json
        env:
          # Not read by gate-policy.py as written; reserved for an EPSS cache layer
          EPSS_CACHE_TTL: "3600"

      - name: Upload scan results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: trivy-scan-${{ github.sha }}
          path: trivy-results.json
          retention-days: 90
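The workflow passes `EPSS_CACHE_TTL`, but the gate script fetches EPSS on every run. One way to honour a TTL, sketched here with a hypothetical `cached_fetch` helper, is a small file cache that reuses fresh entries and, if the API call fails, returns the last stale copy instead of failing; this is the "use cached EPSS data from last successful fetch" recovery listed under failure modes. On hosted runners the cache directory only survives between jobs if you persist it, for example with `actions/cache` (not shown).

```python
import json
import os
import time
from pathlib import Path
from typing import Callable

# Hypothetical: reuse the workflow's EPSS_CACHE_TTL (seconds), default 1 hour
CACHE_TTL = int(os.environ.get("EPSS_CACHE_TTL", "3600"))


def cached_fetch(key: str, fetch: Callable[[], dict], cache_dir: Path,
                 ttl: int = CACHE_TTL) -> tuple[dict, bool]:
    """Return (data, from_cache).

    A cache entry younger than `ttl` seconds is reused. Otherwise `fetch()` is
    called; if it raises OSError (requests and urllib network errors both
    subclass it), the last stale entry is returned instead of failing.
    """
    path = cache_dir / f"{key}.json"
    if path.exists() and time.time() - path.stat().st_mtime < ttl:
        return json.loads(path.read_text()), True
    try:
        data = fetch()
    except OSError:
        if path.exists():
            return json.loads(path.read_text()), True  # stale, but keeps the gate running
        raise
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data, False
```

Wrapping the EPSS call as `cached_fetch("epss", lambda: fetch_epss_scores(ids), Path(".cache"))` would let the `from_cache` flag drive the "fallback mode" metric mentioned in the failure modes.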

Monitoring Gate Effectiveness

Track gate decisions over time to detect drift:

# gate-policy-metrics.py: push gate decisions to a Prometheus Pushgateway
# (requires the prometheus_client package)
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
# Keep "reason" low-cardinality (e.g. kev / epss / cvss), not the raw reason text
g_blocks     = Gauge("ci_cve_gate_blocks", "CVEs blocked in CI", ["reason"], registry=registry)
g_warns      = Gauge("ci_cve_gate_warnings", "CVEs warned in CI", registry=registry)
g_suppressed = Gauge("ci_cve_gate_suppressed", "CVEs suppressed", registry=registry)

# ... populate from gate evaluation results ...

push_to_gateway("pushgateway:9091", job="cve-gate", registry=registry)
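Passing the gate's free-text reasons straight into the `reason` label (`"EPSS 0.231"`, `"CVSS 9.8"`) would create a new time series per score. A minimal sketch of the "populate" step, assuming the `blocks` and `warnings` lists built by the gate script: the hypothetical helpers collapse reasons into a few categories and return plain counts. The gate script as written does not count suppressions, so `suppressed` is an input you would need to add.

```python
from collections import Counter


def reason_category(reason: str) -> str:
    """Collapse reasons like "EPSS 0.231" or "CVSS 9.8" into low-cardinality labels."""
    if reason == "CISA KEV":
        return "kev"
    if reason.startswith("EPSS"):
        return "epss"
    if reason.startswith("CVSS"):
        return "cvss"
    return "other"


def gate_metric_values(blocks: list[dict], warnings: list[dict], suppressed: int) -> dict:
    """Counts for ci_cve_gate_blocks{reason}, ci_cve_gate_warnings, ci_cve_gate_suppressed."""
    by_reason = Counter(reason_category(b["reason"]) for b in blocks)
    return {"blocks_by_reason": dict(by_reason),
            "warnings": len(warnings),
            "suppressed": suppressed}
```

The counts map onto the gauges as `g_blocks.labels(reason=category).set(count)` for each category, plus `g_warns.set(...)` and `g_suppressed.set(...)`.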

Expected Behaviour After Hardening

CVE Type	Before Tuning	After Tuning
CISA KEV — CVSS 7.5, EPSS 0.45	Passes if team only gates on CVSS ≥ 9	Blocked by KEV rule
CVSS 9.8, EPSS 0.002 (theoretical)	Blocks pipeline	Blocked (CVSS threshold) — correct
CVSS 5.0, EPSS 0.001, dev-only dep	Blocks pipeline (noise)	Suppressed with expiry; passes
No CVSS yet, PoC in references	Not evaluated	Warns; flags for manual review
Suppression expired	Applies indefinitely	Gate rejects expired suppression; CVE re-evaluated

Trade-offs and Operational Considerations

Aspect	Benefit	Cost	Mitigation
EPSS-based blocking	Catches exploited CVEs missed by CVSS	EPSS scores lag 24-72h post-publication	Supplement with KEV blocking; treat no-CVSS CVEs with PoC as warnings
Suppression with expiry	Prevents indefinite noise suppression	Requires process discipline to re-evaluate	Automate expiry notification; block PRs adding suppressions without ticket
Policy script vs scanner flags	Full control over logic; auditable	More maintenance than native scanner flags	Keep policy script in a shared security team repo; version it
EPSS API call in CI	Always-current scores	External API dependency; potential rate limit	Cache responses for 1 hour; fall back to KEV and CVSS only if the API is unavailable

Failure Modes

Failure	Symptom	Detection	Recovery
EPSS API unreachable	Policy falls back to KEV and CVSS only; non-KEV CVEs with high EPSS but CVSS below 9.0 may pass	CI log: EPSS API timeout; metric shows fallback mode	Use cached EPSS data from last successful fetch; alert on fallback duration > 4h
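KEV JSON endpoint changes URL	KEV fetch fails; the gate script errors out and every pipeline fails, or, if the fetch is wrapped to fail open, KEV-listed CVEs below the EPSS/CVSS thresholds pass	CI log: HTTP 404 fetching KEV; metric shows KEV disabled	Update KEV URL; use mirrored copy for resilience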
Suppression file merge conflict	CI fails on suppression validation	Suppression validation step fails with parse error	Resolve merge conflict; ensure only one entry per CVE
EPSS threshold too aggressive (0.05)	Too many blocks; pipeline gridlock resumes	Pipeline failure rate metric rises; engineer complaints	Raise threshold to 0.10 or 0.15; review blocked CVEs to calibrate
CVE with no CVSS blocks production	Gate evaluates no-CVSS CVE as critical	Post-incident review: CVE had no CVSS yet was in final image	Add grace period for no-CVSS CVEs: warn only for 48h after publication
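The last recovery can be sketched as a small rule, assuming Trivy's `PublishedDate` field and a PoC flag derived as in the policy script. The `no_cvss_action` helper is hypothetical and encodes one reading of "warn only for 48h": inside the window an unscored CVE with a PoC warns; once the window passes and it is still unscored, it falls back to threat 4's treat-as-critical rule and blocks. Unscored CVEs without a PoC pass, matching the policy script.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

NO_CVSS_GRACE = timedelta(hours=48)


def no_cvss_action(published: Optional[str], has_poc: bool, now: datetime) -> str:
    """Return "pass", "warn", or "block" for a finding that has no CVSS score.

    `published` is an ISO 8601 timestamp such as Trivy's PublishedDate;
    `now` must be timezone-aware.
    """
    if not has_poc:
        return "pass"              # matches the policy script: unscored, no PoC
    if not published:
        return "warn"              # unknown age: flag for manual review
    published_at = datetime.fromisoformat(published.replace("Z", "+00:00"))
    if published_at.tzinfo is None:
        published_at = published_at.replace(tzinfo=timezone.utc)
    if now - published_at < NO_CVSS_GRACE:
        return "warn"              # inside the grace window: warn only
    return "block"                 # still unscored after 48h with a PoC: treat as critical
```

Passing `now` explicitly (for example `datetime.now(timezone.utc)` in CI) keeps the rule deterministic in tests.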