Tuning CI Vulnerability Scanner Gates for High CVE Volume

The Problem

When NVD published 28,000 CVEs in 2023 and 40,000 in 2024, teams running Trivy or Grype as hard CI gates started experiencing the “CVE gridlock” pattern: every dependency update triggers a new finding; every finding blocks the pipeline; engineers spend more time writing suppression exceptions than writing code; eventually the team either disables the scanner gate entirely or installs a blanket suppression for anything below CVSS 8.

Both outcomes are security regressions. The first removes all scanning. The second lets through CVEs that score low on CVSS but are later exploited: CVSS measures severity, while EPSS estimates the probability of exploitation.

The CVE volume problem is accelerating. LLM-assisted vulnerability research lets security researchers find and report CVEs in widely used libraries faster than before. The NVD backlog and enrichment lag (a separate problem) mean many CVEs arrive with incomplete data (no CVSS score, no CWE, no CPE), which confuses score-based gates.

A well-tuned CI gate needs to:

  1. Block on CVEs that matter: CISA KEV entries, CVEs with EPSS > configured threshold, CVEs with public PoC, CVSS ≥ 9.0.
  2. Warn on CVEs that need tracking: CVSS 7–9, EPSS 5–15%, no public PoC yet.
  3. Silently suppress noise: CVEs with no CVSS score and no exploitation signal (data-deficient), CVEs in dev/test-only dependencies, CVEs in packages not in the final image layer, CVEs with vendor-issued not_affected VEX statements.
  4. Auto-clear on fix availability: when a patched version exists and the build can be updated, generate a PR automatically rather than blocking.
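
Item 4 depends on knowing which findings already have a patched release. The following is a minimal sketch, assuming each vulnerability entry in the Trivy JSON report carries `PkgName`, `InstalledVersion`, and `FixedVersion` fields (worth confirming against the Trivy version in use). The function name `fixable_findings` and the sample data are hypothetical; turning the result into a pull request is left to whatever dependency-update tooling the team already runs.

```python
def fixable_findings(report: dict) -> list[dict]:
    """List findings for which the scanner reports a fixed version."""
    fixable = []
    for result in report.get("Results", []):
        # "Vulnerabilities" may be missing or null when a target is clean
        for vuln in result.get("Vulnerabilities") or []:
            fixed = (vuln.get("FixedVersion") or "").strip()
            if not fixed:
                continue  # no patched release known yet; leave it to the gate
            fixable.append({
                "cve": vuln.get("VulnerabilityID", ""),
                "package": vuln.get("PkgName", ""),
                "installed": vuln.get("InstalledVersion", ""),
                "fixed": fixed,
            })
    return fixable


# Made-up report fragment, for illustration only
sample_report = {
    "Results": [{
        "Target": "requirements.txt",
        "Vulnerabilities": [
            {"VulnerabilityID": "CVE-0000-0001", "PkgName": "examplelib",
             "InstalledVersion": "1.2.0", "FixedVersion": "1.2.4"},
            {"VulnerabilityID": "CVE-0000-0002", "PkgName": "otherlib",
             "InstalledVersion": "3.0.0", "FixedVersion": ""},
        ],
    }]
}

for item in fixable_findings(sample_report):
    print(f"{item['package']}: {item['installed']} -> {item['fixed']} ({item['cve']})")
```

Findings returned here are candidates for an automated upgrade; findings without a fixed version still go through the block/warn/suppress policy.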

Target systems: Trivy, Grype, or Snyk as CI scanners; GitHub Actions, GitLab CI, or Tekton pipelines; teams shipping container images or language-ecosystem packages.

Threat Model

1. CVE flood causes gate bypass (accidental). Objective (unintentional): engineers suppress every CVE to keep pipelines green; a critical exploitable CVE is suppressed alongside thousands of noise entries. Impact: a CISA KEV-listed CVE with public exploit ships to production.

2. False-negative gate tuning (misconfiguration). Objective: EPSS threshold set too high (0.50+); many exploited CVEs with EPSS 0.10–0.30 pass silently. Impact: CVEs that are actively being exploited in the wild pass CI without review.

3. Suppression file manipulation (insider or supply-chain attacker). Objective: add a suppression entry for a specific CVE that affects a malicious dependency the attacker introduced. Impact: the CVE in the backdoored dependency is suppressed and the package ships to production.

4. NVD lag + EPSS lag (systemic). Objective: CVE has no score for 48 hours after publication (NVD backlog); scanner treats it as low priority; EPSS has not yet been computed. Impact: a high-severity CVE is published, a PoC drops the same day, but the CI gate does not block because the CVE has no score yet. Mitigation: never let score-absent CVEs with a public PoC pass silently; at minimum raise a warning for manual review, and escalate to a block if your risk appetite requires it.
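
Threat 2 can be checked empirically before a threshold is adopted: given EPSS scores and the KEV list, count how many known-exploited CVEs would fall below a candidate threshold. A minimal sketch follows; the function name `kev_miss_rate` is hypothetical and the scores are made-up illustration data, not real EPSS values.

```python
def kev_miss_rate(epss_scores: dict[str, float], kev_ids: set[str],
                  threshold: float) -> float:
    """Fraction of KEV-listed CVEs (that have an EPSS score) scoring below threshold."""
    scored = [cve for cve in kev_ids if cve in epss_scores]
    if not scored:
        return 0.0
    missed = sum(1 for cve in scored if epss_scores[cve] < threshold)
    return missed / len(scored)


# Illustrative data only
epss = {"CVE-0000-0001": 0.45, "CVE-0000-0002": 0.12,
        "CVE-0000-0003": 0.03, "CVE-0000-0004": 0.80}
kev = {"CVE-0000-0001", "CVE-0000-0002", "CVE-0000-0003"}

for candidate in (0.50, 0.15, 0.05):
    rate = kev_miss_rate(epss, kev, candidate)
    print(f"threshold {candidate:.2f}: {rate:.0%} of KEV entries would pass an EPSS-only gate")
```

A high miss rate at the chosen threshold is the signal that EPSS alone is not enough and that the KEV rule has to stay in the block path.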

Hardening Configuration

Gate Architecture

CI Pipeline: build image → scan with Trivy → evaluate gate policy → pass/fail/warn
                                                   ↓
                             EPSS API: fetch exploitation probability
                             KEV API:  fetch known-exploited list
                             VEX store: fetch vendor not-affected statements
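
The VEX lookup in the diagram is not covered by the policy script later in this section. As a sketch of what it might look like, assume the VEX store returns documents shaped like OpenVEX: a `statements` list whose items carry `vulnerability.name`, a `products` list with `@id` entries, and a `status`. The function name `not_affected_cves`, the product identifier, and the sample document are hypothetical.

```python
def not_affected_cves(vex_doc: dict, product_id: str) -> set[str]:
    """Return CVE IDs the vendor marks not_affected for the given product."""
    cves = set()
    for stmt in vex_doc.get("statements", []):
        if stmt.get("status") != "not_affected":
            continue  # affected / fixed / under_investigation never suppress
        products = {p.get("@id") for p in stmt.get("products", [])}
        if product_id not in products:
            continue  # statement is about a different product
        name = stmt.get("vulnerability", {}).get("name")
        if name:
            cves.add(name)
    return cves


# Made-up document, for illustration only
sample_vex = {
    "statements": [
        {"vulnerability": {"name": "CVE-0000-0001"},
         "products": [{"@id": "pkg:oci/app"}],
         "status": "not_affected"},
        {"vulnerability": {"name": "CVE-0000-0002"},
         "products": [{"@id": "pkg:oci/app"}],
         "status": "under_investigation"},
        {"vulnerability": {"name": "CVE-0000-0003"},
         "products": [{"@id": "pkg:oci/other"}],
         "status": "not_affected"},
    ]
}

print(sorted(not_affected_cves(sample_vex, "pkg:oci/app")))
```

Matching on the product identifier matters: a not_affected statement for a different product should not suppress a finding in this image.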

Trivy Configuration with Policy Enforcement

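# .trivy.yaml: project-level Trivy config
# Trivy auto-loads ./trivy.yaml; for this filename pass --config .trivy.yaml.
# The scan target (the image) is given on the command line, not in this file.
scan:
  scanners:
    - vuln
    - secret    # catch secrets in image layers too

# No severity filter: UNKNOWN-severity findings (often CVEs without a CVSS
# score yet) stay in the report and are handled by the policy script.
severity:
  - UNKNOWN
  - LOW
  - MEDIUM
  - HIGH
  - CRITICAL

vulnerability:
  # Keep unfixed CVEs in the report; the policy script decides how to treat them
  ignore-unfixed: false

exit-code: 0    # Never fail inside Trivy; the policy script sets the pipeline result
format: json
output: trivy-results.json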

Evaluate the results with a policy script rather than relying on Trivy’s built-in exit codes:

#!/usr/bin/env python3
# gate-policy.py — evaluates Trivy output against tuned policy

import json
import sys
import requests
from datetime import datetime, UTC

EPSS_API = "https://api.first.org/data/v1/epss"
KEV_URL  = "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"

# Tuning parameters — adjust per team risk appetite
EPSS_BLOCK_THRESHOLD = 0.15    # Block if exploitation probability >= 15%
CVSS_BLOCK_THRESHOLD = 9.0     # Block if CVSS score >= 9.0
CVSS_WARN_THRESHOLD  = 7.0     # Warn if CVSS score >= 7.0

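def fetch_epss_scores(cve_ids: list[str]) -> dict[str, float]:
    # EPSS only covers CVE identifiers; skip GHSA and other ID schemes
    cve_ids = [c for c in cve_ids if c.startswith("CVE-")]
    scores: dict[str, float] = {}
    # Query in batches of 100 so findings beyond the first batch are still scored
    for start in range(0, len(cve_ids), 100):
        cve_param = ",".join(cve_ids[start:start + 100])
        resp = requests.get(f"{EPSS_API}?cve={cve_param}", timeout=10)
        resp.raise_for_status()
        for item in resp.json().get("data", []):
            scores[item["cve"]] = float(item["epss"])
    return scores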

def fetch_kev_ids() -> set[str]:
    resp = requests.get(KEV_URL, timeout=15)
    resp.raise_for_status()
    return {v["cveID"] for v in resp.json().get("vulnerabilities", [])}

def load_suppressions(path: str = ".cve-suppressions.yaml") -> dict:
    import yaml
    try:
        with open(path) as f:
            return yaml.safe_load(f) or {}
    except FileNotFoundError:
        return {}

def evaluate(trivy_results_path: str):
    with open(trivy_results_path) as f:
        results = json.load(f)

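    # Collect all CVEs from scan results (the same CVE can appear in several targets)
    findings_by_cve = {}
    for result in results.get("Results", []):
        # "Vulnerabilities" may be absent or null when a target has no findings
        for vuln in result.get("Vulnerabilities") or []:
            cve_id = vuln.get("VulnerabilityID", "")
            if cve_id:
                findings_by_cve[cve_id] = vuln
    all_cves = sorted(findings_by_cve)  # de-duplicated list for the EPSS lookup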

    epss_scores = fetch_epss_scores(all_cves)
    kev_ids     = fetch_kev_ids()
    suppressions = load_suppressions()

    blocks = []
    warnings = []

    for cve_id, vuln in findings_by_cve.items():
        # Check suppression (entries without expires_after are not honoured)
        suppression = suppressions.get(cve_id)
        if suppression:
            expires = suppression.get("expires_after")
            if expires:
                # YAML yields a str (quoted) or a date (unquoted); build an aware
                # datetime so the comparison with datetime.now(UTC) is valid
                expiry = datetime.fromisoformat(str(expires)).replace(tzinfo=UTC)
                if expiry < datetime.now(UTC):
                    # Suppression expired: treat the CVE as unsuppressed
                    print(f"WARNING: Suppression for {cve_id} expired on {expiry.date()}")
                else:
                    continue  # Valid suppression; skip

        epss   = epss_scores.get(cve_id, 0.0)
        cvss   = vuln.get("CVSS", {}).get("nvd", {}).get("V3Score", 0.0)
        is_kev = cve_id in kev_ids

        # Block conditions
        if is_kev:
            blocks.append({"cve": cve_id, "reason": "CISA KEV", "epss": epss, "cvss": cvss})
        elif epss >= EPSS_BLOCK_THRESHOLD:
            blocks.append({"cve": cve_id, "reason": f"EPSS {epss:.3f}", "cvss": cvss})
        elif cvss >= CVSS_BLOCK_THRESHOLD:
            blocks.append({"cve": cve_id, "reason": f"CVSS {cvss}", "epss": epss})

        # Warn conditions (not blocking)
        elif cvss >= CVSS_WARN_THRESHOLD or epss >= 0.05:
            warnings.append({"cve": cve_id, "cvss": cvss, "epss": epss})

        # CVE with no score but public PoC URL — treat as warn
        elif not cvss and vuln.get("References"):
            poc_refs = [r for r in vuln["References"] if "poc" in r.lower() or "exploit" in r.lower()]
            if poc_refs:
                warnings.append({"cve": cve_id, "cvss": "N/A", "epss": epss,
                                  "note": "no CVSS but PoC reference found"})

    # Output results
    for b in blocks:
        print(f"BLOCK: {b['cve']} — {b['reason']}")
    for w in warnings:
        print(f"WARN:  {w['cve']} — CVSS={w['cvss']} EPSS={w.get('epss', 'N/A'):.3f}")

    if blocks:
        print(f"\n{len(blocks)} blocking CVEs found. Pipeline FAILED.")
        sys.exit(1)
    elif warnings:
        print(f"\n{len(warnings)} warning CVEs (non-blocking). Review recommended.")
        sys.exit(0)
    else:
        print("No blocking or warning CVEs. Pipeline PASSED.")

if __name__ == "__main__":
    evaluate(sys.argv[1] if len(sys.argv) > 1 else "trivy-results.json")

Suppression Policy File

# .cve-suppressions.yaml
# Format: CVE-ID: { reason, expires_after, ticket }
# All suppressions require: reason, expiry, and ticket reference
# Suppressions without expiry are rejected by the gate policy

CVE-2024-12345:
  reason: "Dev-only dependency (pytest); not present in final image"
  expires_after: "2026-12-31"
  ticket: "SEC-1234"
  approved_by: "security-team"

CVE-2025-67890:
  reason: "Vendor VEX statement: not_affected for our usage pattern"
  expires_after: "2026-09-01"
  ticket: "SEC-1567"
  vex_url: "https://vendor.example.com/security/CVE-2025-67890-vex.json"
  approved_by: "security-team"

Gate policy enforcement on suppressions:

# CI step: reject suppressions without reason, expiry, or ticket
python3 - << 'EOF'
import yaml, sys
with open(".cve-suppressions.yaml") as f:
    suppressions = yaml.safe_load(f) or {}
errors = []
for cve, data in suppressions.items():
    # An empty or scalar entry would otherwise crash on .get()
    if not isinstance(data, dict):
        errors.append(f"{cve}: entry must be a mapping")
        continue
    for field in ("reason", "expires_after", "ticket"):
        if not data.get(field):
            errors.append(f"{cve}: missing {field}")
if errors:
    print("SUPPRESSION POLICY VIOLATIONS:")
    for e in errors: print(f"  {e}")
    sys.exit(1)
EOF
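
Expiry only helps if someone is told before a suppression lapses. A minimal sketch of an expiry report follows, assuming the suppression file has already been loaded into a dict in the format above; the function name `expiring_soon` and the 30-day window are hypothetical choices.

```python
from datetime import date, timedelta


def expiring_soon(suppressions: dict, today: date,
                  within_days: int = 30) -> list[tuple[str, date]]:
    """Suppressions that expire on or before today + within_days, soonest first.

    Already-expired entries are included so they can be cleaned up.
    """
    cutoff = today + timedelta(days=within_days)
    due = []
    for cve, data in suppressions.items():
        raw = (data or {}).get("expires_after")
        if not raw:
            continue  # the validation step reports missing expiries
        # YAML gives a str when quoted and a date/datetime when unquoted
        expiry = date.fromisoformat(str(raw)[:10])
        if expiry <= cutoff:
            due.append((cve, expiry))
    return sorted(due, key=lambda pair: pair[1])


# Same entries as the sample suppression file
sample = {
    "CVE-2024-12345": {"expires_after": "2026-12-31", "ticket": "SEC-1234"},
    "CVE-2025-67890": {"expires_after": "2026-09-01", "ticket": "SEC-1567"},
}

for cve, expiry in expiring_soon(sample, today=date(2026, 8, 15)):
    print(f"{cve} suppression expires {expiry}; re-evaluate under its ticket")
```

Run on a schedule, the output can feed whatever notification channel the team uses; the gate itself stays unchanged.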

GitHub Actions Integration

# .github/workflows/security-scan.yml
name: Security Scan
on: [push, pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build image
        run: docker build -t app:${{ github.sha }} .

      - name: Run Trivy scan
        # @master is a moving ref; pin to a release tag or commit SHA in production
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: app:${{ github.sha }}
          format: json
          output: trivy-results.json
          exit-code: "0"    # Policy script handles exit code

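      - name: Install policy script dependencies
        run: pip install requests pyyaml

      - name: Validate suppression file
        run: python3 scripts/validate-suppressions.py

      - name: Evaluate gate policy
        run: python3 scripts/gate-policy.py trivy-results.json
        env:
          # Intended TTL (seconds) for cached EPSS responses, to avoid rate limiting
          # in parallel jobs. gate-policy.py as shown does not read this variable;
          # it only takes effect once caching is added to the script.
          EPSS_CACHE_TTL: "3600"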

      - name: Upload scan results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: trivy-scan-${{ github.sha }}
          path: trivy-results.json
          retention-days: 90
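
The workflow sets `EPSS_CACHE_TTL`, but the policy script shown earlier fetches EPSS on every run. One way the script might honour that variable is a small file-backed cache that also falls back to stale entries when the API is unreachable. This is a sketch under assumptions: the function name `cached_epss`, the cache file name, and the injected `fetch` callable (standing in for `fetch_epss_scores`) are hypothetical, and the cache file only persists between CI runs if the pipeline restores it, for example through the platform's cache mechanism.

```python
import json
import os
import time
from pathlib import Path
from typing import Callable, Optional


def cached_epss(cve_ids: list[str],
                fetch: Callable[[list[str]], dict[str, float]],
                cache_path: Path = Path(".epss-cache.json"),
                ttl: Optional[int] = None) -> dict[str, float]:
    """Return EPSS scores, calling fetch() only for IDs without a fresh cache entry."""
    if ttl is None:
        ttl = int(os.environ.get("EPSS_CACHE_TTL", "3600"))
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    now = time.time()
    scores = {c: cache[c]["score"] for c in cve_ids
              if c in cache and now - cache[c]["fetched_at"] < ttl}
    missing = [c for c in cve_ids if c not in scores]
    if missing:
        try:
            fetched = fetch(missing)
        except Exception:
            # API unreachable: reuse stale entries rather than losing EPSS entirely
            fetched = {}
            scores.update({c: cache[c]["score"] for c in missing if c in cache})
        for cve, score in fetched.items():
            cache[cve] = {"score": score, "fetched_at": now}
            scores[cve] = score
        cache_path.write_text(json.dumps(cache))
    return scores
```

In the policy script this would replace the direct call, for example `epss_scores = cached_epss(all_cves, fetch_epss_scores)`. CVEs for which the API returns no score are never cached, so they are re-queried on each run.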

Monitoring Gate Effectiveness

Track gate decisions over time to detect drift:

# Prometheus metrics from gate policy script
# Quoted 'EOF' stops shell expansion; '>' overwrites instead of appending on re-runs
cat << 'EOF' > gate-policy-metrics.py
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
g_blocks  = Gauge("ci_cve_gate_blocks",  "CVEs blocked in CI", ["reason"], registry=registry)
g_warns   = Gauge("ci_cve_gate_warnings","CVEs warned in CI",  [], registry=registry)
g_suppressed = Gauge("ci_cve_gate_suppressed", "CVEs suppressed", [], registry=registry)

# ... populate from gate evaluation results ...

push_to_gateway("pushgateway:9091", job="cve-gate", registry=registry)
EOF
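
The "populate from gate evaluation results" step needs some care with labels: the `reason` strings built by the policy script embed score values (for example "EPSS 0.234"), and using them directly as a label would create one time series per distinct score. A minimal sketch that collapses reasons into a few stable categories first; the function name `block_counts_by_category` and the category names are hypothetical.

```python
from collections import Counter


def block_counts_by_category(blocks: list[dict]) -> dict[str, int]:
    """Count blocking findings per rule, using low-cardinality label values."""
    def category(reason: str) -> str:
        if reason == "CISA KEV":
            return "kev"
        if reason.startswith("EPSS"):
            return "epss"
        if reason.startswith("CVSS"):
            return "cvss"
        return "other"

    return dict(Counter(category(b["reason"]) for b in blocks))


# Entries shaped like the `blocks` list in gate-policy.py (made-up values)
sample_blocks = [
    {"cve": "CVE-0000-0001", "reason": "CISA KEV"},
    {"cve": "CVE-0000-0002", "reason": "EPSS 0.234"},
    {"cve": "CVE-0000-0003", "reason": "EPSS 0.610"},
    {"cve": "CVE-0000-0004", "reason": "CVSS 9.8"},
]

counts = block_counts_by_category(sample_blocks)
print(counts)

# With the gauges defined in the metrics script, the counts would be applied as:
#   for cat, n in counts.items():
#       g_blocks.labels(reason=cat).set(n)
```

Keeping the label set to a handful of values makes it practical to alert on shifts such as a sudden rise in EPSS-driven blocks.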

Expected Behaviour After Hardening

| CVE Type | Before Tuning | After Tuning |
|---|---|---|
| CISA KEV entry (CVSS 7.5, EPSS 0.45) | Passes if team only gates on CVSS ≥ 9 | Blocked by KEV rule |
| CVSS 9.8, EPSS 0.002 (theoretical) | Blocks pipeline | Blocked (CVSS threshold); correct |
| CVSS 5.0, EPSS 0.001, dev-only dep | Blocks pipeline (noise) | Suppressed with expiry; passes |
| No CVSS yet, PoC in references | Not evaluated | Warns; flags for manual review |
| Suppression expired | Applies indefinitely | Gate rejects expired suppression; CVE re-evaluated |
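
The rows above can be turned into executable checks so that a threshold change that alters an expected outcome is caught in review. The sketch below restates the decision order of the policy script as a pure function; `classify` and `EPSS_WARN_THRESHOLD` are hypothetical names (the script inlines 0.05), and suppression handling is left out.

```python
from typing import Optional

EPSS_BLOCK_THRESHOLD = 0.15
CVSS_BLOCK_THRESHOLD = 9.0
CVSS_WARN_THRESHOLD = 7.0
EPSS_WARN_THRESHOLD = 0.05  # inlined as 0.05 in gate-policy.py


def classify(cvss: Optional[float], epss: float, is_kev: bool,
             has_poc_ref: bool = False) -> str:
    """Return "block", "warn", or "pass" using the same order as the gate script."""
    score = cvss or 0.0
    if is_kev or epss >= EPSS_BLOCK_THRESHOLD or score >= CVSS_BLOCK_THRESHOLD:
        return "block"
    if score >= CVSS_WARN_THRESHOLD or epss >= EPSS_WARN_THRESHOLD:
        return "warn"
    if not cvss and has_poc_ref:
        return "warn"  # no score yet, but a PoC reference exists
    return "pass"


# (description, actual decision, expected decision)
cases = [
    ("KEV entry, CVSS 7.5, EPSS 0.45", classify(7.5, 0.45, True), "block"),
    ("CVSS 9.8, EPSS 0.002", classify(9.8, 0.002, False), "block"),
    ("CVSS 5.0, EPSS 0.001", classify(5.0, 0.001, False), "pass"),
    ("No CVSS, PoC in references", classify(None, 0.0, False, True), "warn"),
]

for description, actual, expected in cases:
    status = "ok" if actual == expected else "MISMATCH"
    print(f"{status}: {description} -> {actual}")
```

With these thresholds the CVSS 5.0 / EPSS 0.001 case passes on score alone, so a suppression entry for it mainly serves as documentation of why the finding is considered noise.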

Trade-offs and Operational Considerations

| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| EPSS-based blocking | Catches exploited CVEs missed by CVSS | EPSS scores lag 24-72h post-publication | Supplement with KEV blocking; treat no-CVSS CVEs with PoC as warnings |
| Suppression with expiry | Prevents indefinite noise suppression | Requires process discipline to re-evaluate | Automate expiry notification; block PRs adding suppressions without ticket |
| Policy script vs scanner flags | Full control over logic; auditable | More maintenance than native scanner flags | Keep policy script in a shared security team repo; version it |
| EPSS API call in CI | Always-current scores | External API dependency; potential rate limit | Cache responses for 1 hour; fall back to CVSS-only if API unavailable |

Failure Modes

| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| EPSS API unreachable | Policy falls back to CVSS-only; high-EPSS CVEs with low CVSS that are not in KEV may pass | CI log: EPSS API timeout; metric shows fallback mode | Use cached EPSS data from last successful fetch; alert on fallback duration > 4h |
| KEV JSON endpoint changes URL | KEV check is skipped; KEV-listed CVEs below the EPSS and CVSS thresholds pass | CI log: HTTP 404 fetching KEV; metric shows KEV disabled | Update KEV URL; use mirrored copy for resilience |
| Suppression file merge conflict | CI fails on suppression validation | Suppression validation step fails with parse error | Resolve merge conflict; ensure only one entry per CVE |
| EPSS threshold too aggressive (0.05) | Too many blocks; pipeline gridlock resumes | Pipeline failure rate metric rises; engineer complaints | Raise threshold to 0.10 or 0.15; review blocked CVEs to calibrate |
| CVE with no CVSS blocks production | Gate evaluates no-CVSS CVE as critical | Post-incident review: CVE had no CVSS yet was in final image | Add grace period for no-CVSS CVEs: warn only for 48h after publication |
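
The grace period in the last row can be sketched as a small time check. This assumes the scanner report exposes a publication timestamp per finding (Trivy reports include a `PublishedDate` field in RFC 3339 form; confirm for the version in use) and reads "warn only for 48h" as: warn during the window, block afterwards if the CVE is still unscored. The function name `no_score_action` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def no_score_action(published_iso: str, now: datetime,
                    grace_hours: int = 48) -> str:
    """Decide how to treat a CVE that has no CVSS score yet."""
    # Older Python versions do not parse a trailing "Z", so normalise it first
    published = datetime.fromisoformat(published_iso.replace("Z", "+00:00"))
    if published.tzinfo is None:
        published = published.replace(tzinfo=timezone.utc)
    if now - published < timedelta(hours=grace_hours):
        return "warn"   # inside the grace period: surface it, do not block
    return "block"      # still unscored after the grace period: escalate


now = datetime(2025, 1, 3, tzinfo=timezone.utc)
print(no_score_action("2025-01-02T00:00:00Z", now))  # published 24h earlier
print(no_score_action("2024-12-30T00:00:00Z", now))  # published 4 days earlier
```

Whether the post-grace outcome should be a block or a continued warning is a risk-appetite decision; the sketch only shows where that decision would sit.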