Vulnerability Management Program: Scanning, SLAs, and Risk-Based Prioritisation

Problem

Most organisations scan for vulnerabilities. Far fewer have a functioning vulnerability management program — a systematic process for ensuring that discovered vulnerabilities are actually remediated within a timeframe that limits attacker opportunity.

The gap between discovery and remediation is where attacks happen. Commonly cited industry estimates put the average time from public CVE disclosure to a weaponised exploit at around 12 days, and the average time from disclosure to enterprise patch application at 60+ days. That gap of 48 days or more is the attacker’s opportunity.

Common failure modes in vulnerability management:

  • Scanning without remediation workflow. A vulnerability scanner runs weekly and produces a report with 4,000 findings. The report lands in a shared inbox. Nobody has clear ownership. The same findings appear in next week’s report.
  • CVSS severity without context. CVSS scores assess vulnerability severity in isolation. A CVSS 9.8 critical on an internal service with no network exposure may pose lower actual risk than a CVSS 7.0 on an internet-exposed authentication endpoint. Treating all criticals equally creates remediation backlogs for low-risk issues while high-risk contextual vulnerabilities are missed.
  • No SLAs. Without defined remediation timeframes, “remediate soon” means “remediate never”. Ownership disputes resolve in favour of inaction.
  • Patch verification not required. A ticket is closed when a developer marks it done; no scan is re-run to verify that the patch is actually deployed. The same vulnerabilities reappear in future scans.
  • Container images not in scope. Infrastructure patching is mature; container base image patching is not. Application teams run outdated base images (for example, an ubuntu:20.04 image pulled two years ago) that are 50+ CVEs behind.
  • No exception process. Some vulnerabilities cannot be immediately patched (vendor-supplied components, no fix available). Without a formal exception process with compensating controls and review dates, they are either closed incorrectly or stuck indefinitely.

Target systems: Trivy (container/IaC scanning), Grype, Nuclei (web application scanning), AWS Inspector, GitHub Dependabot, Qualys/Tenable for infrastructure; Jira/Linear for remediation tracking.

Threat Model

  • Adversary 1 — Known CVE exploitation: An attacker uses a weaponised exploit for a known CVE against an unpatched system. The vulnerability was in the scan report but not remediated. Exploitation is automated and opportunistic.
  • Adversary 2 — Container image CVE: An attacker compromises a service via a known vulnerability in a base image (e.g., OpenSSL in ubuntu:focal). Application code is current but the base image has not been updated in 18 months.
  • Adversary 3 — Dependency vulnerability in application code: A transitive dependency in a Node.js or Python application has a known deserialization or RCE vulnerability. Dependabot opened a PR but it was not merged; the vulnerability is active in production.
  • Adversary 4 — Misconfiguration as vulnerability: A cloud resource is publicly accessible without authentication. Vulnerability scanners that check only CVEs miss this; configuration scanners (CSPM) are required.
  • Adversary 5 — Exception without compensating controls: A vulnerability with no available patch was granted an exception. The exception was not reviewed at renewal. The patch was released six months ago but nobody acted on it.
  • Access level: Adversaries 1, 2, and 3 exploit internet-exposed services. Adversary 4 exploits misconfigured cloud resources. Adversary 5 exploits abandoned exception management.
  • Objective: Initial access via unpatched vulnerability; privilege escalation; data exfiltration.
  • Blast radius: A single unpatched critical CVE on an internet-exposed service can provide initial access to the entire environment.

Configuration

Step 1: Asset Inventory (Scanning Scope)

You cannot remediate what you have not found. Maintain a comprehensive asset inventory:

# asset-inventory.yaml — scope of vulnerability management program.
asset_categories:
  - category: "internet_exposed"
    description: "Services directly reachable from the internet"
    scan_frequency: daily
    critical_sla_days: 3
    high_sla_days: 7
    includes:
      - "Load balancers and WAF endpoints"
      - "API gateways"
      - "Authentication services"
      - "Web application servers"

  - category: "internal_services"
    description: "Internal services reachable only from VPN/internal network"
    scan_frequency: weekly
    critical_sla_days: 7
    high_sla_days: 30
    includes:
      - "Internal APIs"
      - "Databases (internal access)"
      - "Kubernetes control plane"
      - "CI/CD systems"

  - category: "container_images"
    description: "Container base images and application dependencies"
    scan_frequency: on_build_and_daily
    critical_sla_days: 7
    high_sla_days: 14
    includes:
      - "Production container images in registry"
      - "Base images (FROM in Dockerfiles)"
      - "Application package dependencies (npm, pip, go.mod)"

  - category: "cloud_infrastructure"
    description: "Cloud resources and configurations"
    scan_frequency: continuous
    critical_sla_hours: 24  # Cloud criticals are measured in hours, not days.
    high_sla_days: 7
    includes:
      - "S3 bucket configurations"
      - "IAM policy analysis"
      - "Security group rules"
      - "KMS key policies"
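Turning the inventory into concrete deadlines is where unit mix-ups creep in, since cloud criticals are measured in hours while everything else is in days. A minimal sketch, assuming the SLA values are transcribed by hand from the YAML above (`SLA_WINDOWS` and `remediation_due` are hypothetical names, not part of any tool):

```python
from datetime import datetime, timedelta

# SLA windows transcribed from asset-inventory.yaml (hypothetical mapping).
# Using timedelta keeps the cloud "24 hours" critical SLA unambiguous.
SLA_WINDOWS = {
    "internet_exposed": {"CRITICAL": timedelta(days=3), "HIGH": timedelta(days=7)},
    "internal_services": {"CRITICAL": timedelta(days=7), "HIGH": timedelta(days=30)},
    "container_images": {"CRITICAL": timedelta(days=7), "HIGH": timedelta(days=14)},
    "cloud_infrastructure": {"CRITICAL": timedelta(hours=24), "HIGH": timedelta(days=7)},
}

def remediation_due(category: str, severity: str, detected_at: datetime) -> datetime:
    """Return the remediation deadline for a finding in the given asset category."""
    try:
        window = SLA_WINDOWS[category][severity.upper()]
    except KeyError:
        raise ValueError(f"No SLA defined for {category}/{severity}") from None
    return detected_at + window
```

For example, `remediation_due("cloud_infrastructure", "critical", datetime(2026, 5, 1, 9, 0))` gives 09:00 on 2026-05-02, while the same finding on an internal service would be due a week later.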

Step 2: Scanning Tools by Category

# scanning-tools.yaml — tool assignments per asset category.
tools:
  infrastructure:
    - name: Trivy
      scope: "Container images, Kubernetes manifests, IaC (Terraform, Helm)"
      command: "trivy image --severity HIGH,CRITICAL --exit-code 1 {image}"
    - name: Nuclei
      scope: "Web application endpoints — known CVE checks, misconfiguration"
      command: "nuclei -target {target} -severity high,critical -jsonl -o {output}"
    - name: AWS Inspector
      scope: "EC2 instances, Lambda functions, ECR images"
      notes: "Continuous; findings push to Security Hub"

  application_dependencies:
    - name: GitHub Dependabot
      scope: "npm, pip, Go modules, Maven — repository-level"
      automation: "PR auto-creation for patch-level updates"
    - name: Grype
      scope: "Container SBOM scanning in CI pipeline"
      command: "grype sbom:{sbom_file} --fail-on high"
    - name: Trivy filesystem
      scope: "Application source tree scanning"
      command: "trivy fs --severity HIGH,CRITICAL ."

  cloud_configuration:
    - name: AWS Config + Security Hub
      scope: "AWS resource misconfigurations"
    - name: Prowler
      scope: "CIS AWS Benchmark, PCI DSS, HIPAA checks"
      command: "prowler aws --checks check_id_1 check_id_2 -M json"
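
# Integrate Trivy into CI: block builds with critical CVEs.
# Run these commands in a step of .github/workflows/security-scan.yml.

# Produce SARIF for the GitHub Security tab first, so it exists even when the
# gating scan below fails the step; upload it with a separate
# github/codeql-action/upload-sarif step.
trivy image \
  --format sarif \
  --output trivy-sarif.json \
  "$IMAGE_NAME:$IMAGE_TAG"

# Gating scan. A comment cannot follow a line-continuation backslash, so the
# flags are documented here:
#   --exit-code 1     fail the build when critical CVEs are found
#   --ignore-unfixed  ignore CVEs that have no available fix
trivy image \
  --severity CRITICAL \
  --exit-code 1 \
  --ignore-unfixed \
  --format json \
  --output trivy-results.json \
  "$IMAGE_NAME:$IMAGE_TAG"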
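The JSON report is also useful for telling developers exactly what to upgrade. A minimal sketch, assuming Trivy's JSON layout: a top-level `Results` list whose entries may carry a `Vulnerabilities` list with `VulnerabilityID`, `PkgName`, `InstalledVersion`, `FixedVersion` and `Severity`. The helper name `summarise_trivy_results` is hypothetical:

```python
from collections import Counter

def summarise_trivy_results(report: dict) -> dict:
    """Count findings by severity and list those with a fixed version available."""
    by_severity = Counter()
    fixable = []
    # "Results" and per-target "Vulnerabilities" can be absent or null when
    # nothing was found, so treat both as empty lists.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            by_severity[vuln["Severity"]] += 1
            # Findings without FixedVersion cannot be remediated by upgrading yet.
            if vuln.get("FixedVersion"):
                fixable.append((vuln["VulnerabilityID"], vuln["PkgName"],
                                vuln["InstalledVersion"], vuln["FixedVersion"]))
    return {"by_severity": dict(by_severity), "fixable": fixable}
```

In CI this would be fed `json.load(open("trivy-results.json"))`; each `fixable` tuple maps directly to an "upgrade X from A to B" instruction in the remediation ticket.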

Step 3: Risk-Based Prioritisation (Beyond CVSS)

Raw CVSS scores are a poor proxy for actual risk. Use contextual factors:

# vulnerability_scoring/risk_scorer.py
from dataclasses import dataclass
from enum import Enum

class ExposureLevel(Enum):
    INTERNET_FACING = 5
    INTERNAL_VPN = 3
    AIR_GAPPED = 1

class DataClassification(Enum):
    PII_OR_FINANCIAL = 3
    INTERNAL_CONFIDENTIAL = 2
    PUBLIC = 1

@dataclass
class Vulnerability:
    cve_id: str
    cvss_score: float
    cvss_vector: str
    has_exploit: bool          # From ExploitDB, Metasploit, CISA KEV.
    exploit_is_weaponised: bool  # Weaponised = point-and-click exploit.
    cisa_kev: bool             # In CISA Known Exploited Vulnerabilities catalog.
    fix_available: bool

@dataclass
class Asset:
    asset_id: str
    exposure: ExposureLevel
    data_classification: DataClassification
    business_critical: bool

def calculate_risk_priority(vuln: Vulnerability, asset: Asset) -> int:
    """
    Returns a risk priority score (higher = more urgent).
    Used to sort the remediation queue.
    """
    score = vuln.cvss_score * 10  # Base: 0-100.

    # Exploit availability is the strongest predictor of exploitation.
    if vuln.cisa_kev:
        score += 50     # Known to be exploited in the wild.
    elif vuln.exploit_is_weaponised:
        score += 30     # Weaponised but not confirmed in wild.
    elif vuln.has_exploit:
        score += 15     # Proof-of-concept exists.

    # Context: exposure level.
    score *= asset.exposure.value / 3.0

    # Context: data at stake.
    score *= asset.data_classification.value / 2.0

    # No fix available — cannot remediate; still track, but lower priority.
    if not vuln.fix_available:
        score *= 0.5

    return int(score)

def assign_sla(risk_score: int, asset_exposure: ExposureLevel) -> int:
    """Returns SLA in days."""
    if asset_exposure == ExposureLevel.INTERNET_FACING:
        if risk_score > 150: return 1
        if risk_score > 100: return 3
        if risk_score > 60:  return 7
        return 30
    else:
        if risk_score > 150: return 3
        if risk_score > 100: return 7
        if risk_score > 60:  return 14
        return 60
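Written out, `calculate_risk_priority` computes the score below, where s is the CVSS base score, b the exploit bonus (50 for CISA KEV, 30 for a weaponised exploit, 15 for a proof of concept, otherwise 0), E the `ExposureLevel` value (5, 3 or 1), D the `DataClassification` value (3, 2 or 1) and f = 0.5 when no fix is available, otherwise 1. The two worked cases use illustrative inputs (an internal service holding internal-confidential data, and an internet-facing endpoint holding PII) to show how context reverses the raw CVSS ordering; SLAs come from `assign_sla`:

```latex
\begin{aligned}
P &= \left\lfloor (10\,s + b)\cdot\frac{E}{3}\cdot\frac{D}{2}\cdot f \right\rfloor \\
P_{\text{internal, CVSS 9.8, no exploit}} &= (10 \cdot 9.8 + 0)\cdot\frac{3}{3}\cdot\frac{2}{2}\cdot 1 = 98 \;\Rightarrow\; 14\text{ days} \\
P_{\text{internet, CVSS 7.0, KEV}} &= (10 \cdot 7.0 + 50)\cdot\frac{5}{3}\cdot\frac{3}{2}\cdot 1 = 300 \;\Rightarrow\; 1\text{ day}
\end{aligned}
```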

Check against the CISA Known Exploited Vulnerabilities catalog:

# Download the CISA KEV catalog (updated regularly).
curl -fsS https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json \
  -o /tmp/kev.json

# Check whether a specific CVE is in KEV.
CVE_ID="CVE-2024-12345"
KEV_CHECK=$(jq -r --arg cve "$CVE_ID" \
  '.vulnerabilities[] | select(.cveID == $cve) | .cveID' \
  /tmp/kev.json)

if [ -n "$KEV_CHECK" ]; then
  echo "CRITICAL: $CVE_ID is in CISA KEV; immediate patching required"
fi
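Checking one CVE at a time does not scale to a full scan report. A minimal sketch that loads the downloaded catalog once and intersects it with every CVE a scanner reported; it reads the same `vulnerabilities[].cveID` field as the jq filter, and the function names are hypothetical:

```python
import json
from pathlib import Path

def kev_ids_from_catalog(catalog: dict) -> set:
    """Extract CVE IDs from a parsed KEV catalog."""
    return {entry["cveID"] for entry in catalog.get("vulnerabilities", [])}

def load_kev_ids(path: str = "/tmp/kev.json") -> set:
    """Load the catalog file written by the curl command."""
    return kev_ids_from_catalog(json.loads(Path(path).read_text()))

def kev_matches(cve_ids, kev_ids: set) -> list:
    """Return the scanner CVE IDs that appear in KEV, de-duplicated and sorted."""
    return sorted(set(cve_ids) & kev_ids)
```

Any non-empty result from `kev_matches` is a candidate for the `cisa_kev=True` flag in the risk scorer and for the immediate-escalation path.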

Step 4: Remediation Tracking Workflow

# Vulnerability ticket lifecycle in Jira/Linear.
vulnerability_workflow:
  states:
    - name: "New"
      description: "Automatically created from scanner finding"
      trigger: "Scanner finds new CVE with CVSS >= 7.0"
      
    - name: "Triaged"
      description: "Risk score calculated; owner assigned; SLA set"
      sla: "24 hours from New"
      required_fields: ["owner", "risk_score", "remediation_due_date", "affected_assets"]
      
    - name: "In Progress"
      description: "Owner is actively working on remediation"
      
    - name: "Remediated"
      description: "Fix applied; awaiting verification scan"
      
    - name: "Verified"
      description: "Re-scan confirms vulnerability is gone"
      automation: "Re-run scanner against affected asset; auto-close if not found"
      
    - name: "Accepted Risk"
      description: "Formal exception granted with compensating controls"
      required_fields: ["approver", "compensating_controls", "review_date", "business_justification"]
      
    - name: "False Positive"
      description: "Finding confirmed to be a false positive"
      required_fields: ["analyst", "reason"]

  sla_breach_escalation:
    - days_overdue: 3
      action: "Notify owner and manager"
    - days_overdue: 7
      action: "Escalate to VP/director"
    - days_overdue: 14
      action: "Escalate to CISO; formal risk acceptance required"
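
# Automation: create Jira tickets from scanner findings.
# vulnerability_mgmt/ticket_creator.py
#
# Uses the `jira` Python client (pip install jira). lookup_asset_owner(),
# render_vuln_description() and risk_to_jira_priority() are program-specific
# helpers assumed to exist elsewhere. The customfield_* IDs are placeholders;
# look up the real IDs in your Jira instance.
import os
from datetime import datetime, timedelta

from jira import JIRA

jira = JIRA(
    server=os.environ["JIRA_URL"],
    basic_auth=(os.environ["JIRA_USER"], os.environ["JIRA_API_TOKEN"]),
)

def create_vuln_ticket(finding: dict, asset: dict, risk_score: int, sla_days: int) -> str:
    due_date = datetime.now() + timedelta(days=sla_days)
    owner = lookup_asset_owner(asset["asset_id"])  # Jira Cloud account ID.

    ticket = jira.create_issue(fields={
        "project": {"key": "SEC"},
        "issuetype": {"name": "Security Vulnerability"},
        "summary": f"[{finding['cve_id']}] {finding['title']} on {asset['name']}",
        "description": render_vuln_description(finding, asset, risk_score),
        "priority": {"name": risk_to_jira_priority(risk_score)},
        "assignee": {"accountId": owner},  # Jira Server/DC uses {"name": ...}.
        "duedate": due_date.strftime("%Y-%m-%d"),
        "labels": ["vulnerability", finding["severity"].lower(), "auto-created"],
        "customfield_10101": finding["cvss_score"],  # CVSS score (placeholder ID).
        "customfield_10102": risk_score,             # Risk score (placeholder ID).
        "customfield_10103": asset["asset_id"],      # Affected asset (placeholder ID).
        "customfield_10104": finding["scanner"],     # Scanner (placeholder ID).
    })

    # Link to the asset's CMDB issue; assumes an "affects" issue link type exists.
    jira.create_issue_link("affects", ticket.key, asset["jira_asset_key"])
    return ticket.key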
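The workflow rules above only help if something enforces them. A minimal sketch, assuming tickets are plain dicts and that "N days overdue" in `sla_breach_escalation` means N or more days past the due date; all names here are hypothetical and transcribe the YAML rather than any Jira feature:

```python
from datetime import date
from typing import Optional

# Required fields per state, transcribed from vulnerability_workflow.
REQUIRED_FIELDS = {
    "Triaged": ["owner", "risk_score", "remediation_due_date", "affected_assets"],
    "Accepted Risk": ["approver", "compensating_controls", "review_date",
                      "business_justification"],
    "False Positive": ["analyst", "reason"],
}

# sla_breach_escalation, checked from the largest threshold down.
ESCALATION_LADDER = [
    (14, "Escalate to CISO; formal risk acceptance required"),
    (7, "Escalate to VP/director"),
    (3, "Notify owner and manager"),
]

def missing_fields(target_state: str, ticket: dict) -> list:
    """Required fields still empty before a ticket may enter target_state."""
    # Only None, "" and [] count as empty, so a legitimate risk_score of 0 passes.
    return [f for f in REQUIRED_FIELDS.get(target_state, [])
            if ticket.get(f) in (None, "", [])]

def escalation_for(due: date, today: date) -> Optional[str]:
    """Escalation action for an open ticket; None if fewer than 3 days overdue."""
    days_overdue = (today - due).days
    for threshold, action in ESCALATION_LADDER:
        if days_overdue >= threshold:
            return action
    return None

def after_verification_scan(cve_id: str, rescan_cve_ids: set) -> str:
    """Next state after re-scanning the affected asset: close or reopen."""
    # Assumption: a finding that is still present sends the ticket back to In Progress.
    return "In Progress" if cve_id in rescan_cve_ids else "Verified"
```

Running `missing_fields` as a transition validator and `escalation_for` as a daily job covers the "Owner not assigned" and "Ticket closed without verification" failure modes without relying on reviewers to remember the rules.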

Step 5: Container Image Patching Automation

#!/bin/bash
# Detect critical CVEs in commonly used base images and open update tickets.
# Run daily in CI.
set -euo pipefail

IMAGES_TO_CHECK=(
  "ubuntu:22.04"
  "python:3.12-slim"
  "node:20-alpine"
  "golang:1.22-alpine"
)

for IMAGE in "${IMAGES_TO_CHECK[@]}"; do
  # Pull the latest version so Trivy does not scan a stale local copy.
  docker pull "$IMAGE"

  # Count critical CVEs. The `?` operators make jq yield 0 instead of failing
  # when Trivy reports no results or a null vulnerability list.
  CRITICAL_COUNT=$(trivy image --severity CRITICAL --no-progress \
    --format json "$IMAGE" 2>/dev/null | \
    jq '[.Results[]?.Vulnerabilities[]?] | length')

  if [ "$CRITICAL_COUNT" -gt 0 ]; then
    echo "ALERT: $IMAGE has $CRITICAL_COUNT critical CVEs"
    # Create a ticket to update all Dockerfiles using this base image.
    python3 create_base_image_ticket.py --image "$IMAGE" --count "$CRITICAL_COUNT"
  fi
done

# Renovate configuration for automated dependency PRs.
# renovate.json must be strict JSON (no comments, no trailing commas):
# patch updates auto-merge (low risk); major updates require platform-team review.
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:recommended"],
  "vulnerabilityAlerts": {
    "enabled": true,
    "labels": ["security", "vulnerability"]
  },
  "packageRules": [
    {
      "matchUpdateTypes": ["patch"],
      "automerge": true,
      "automergeType": "branch"
    },
    {
      "matchUpdateTypes": ["major"],
      "reviewers": ["team:platform"]
    }
  ]
}
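The ticket step in the base-image script needs to know which Dockerfiles actually reference the affected image. A minimal sketch, assuming Dockerfiles live in the repository under names containing "Dockerfile" and reference images exactly as listed in `IMAGES_TO_CHECK` (registry-prefixed names such as `docker.io/library/ubuntu:22.04` would need normalising first); `dockerfiles_using` is a hypothetical helper:

```python
import re
from pathlib import Path

# Matches "FROM [--platform=...] image[:tag][@digest] [AS name]", case-insensitively.
FROM_RE = re.compile(r"^[ \t]*FROM[ \t]+(?:--\S+[ \t]+)*(\S+)",
                     re.IGNORECASE | re.MULTILINE)

def dockerfiles_using(root: str, base_image: str) -> list:
    """Return paths of Dockerfiles under root whose FROM lines use base_image."""
    hits = []
    for path in sorted(Path(root).rglob("*Dockerfile*")):
        if not path.is_file():
            continue
        images = FROM_RE.findall(path.read_text(errors="replace"))
        # Ignore any @sha256 digest pin when comparing image references.
        if any(img.split("@", 1)[0] == base_image for img in images):
            hits.append(str(path))
    return hits
```

Multi-stage builds are handled because every `FROM` line is checked, so a vulnerable builder stage is reported even if the final stage uses a different base.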

Step 6: Exception Management

# Exception request template.
# sec-exception-request.yaml
exception:
  vulnerability: "CVE-2024-12345"
  affected_assets: ["api.example.com/v2", "api.example.com/v3"]
  reason: "Vendor-supplied component; no patch available as of 2026-05-01"
  cvss_score: 8.1
  exploitability: "PoC available; no weaponised exploit"
  
  compensating_controls:
    - "WAF rule deployed blocking the exploit path (rule ID: waf-cve-2024-12345)"
    - "Network egress from affected service blocked to prevent data exfiltration"
    - "Enhanced monitoring: alert on any access to vulnerable endpoint path"
  
  risk_acceptance:
    approver: "security-eng-lead@example.com"
    date_accepted: "2026-05-01"
    review_date: "2026-08-01"    # Quarterly (~90 days); re-evaluate earlier if a patch is released.
    business_justification: "Critical vendor dependency; no replacement available"
  
  monitoring:
    # Automated check: if a patch becomes available, reopen the ticket.
    watch_nvd: true
    watch_vendor_advisory: "https://vendor.example.com/security/advisories"
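A review date is only useful if something checks it. A minimal sketch of a daily check over parsed exception files, assuming each file's `exception` mapping has been loaded into a dict and using the 14-day lead time from the failure-modes table; the function names are hypothetical:

```python
from datetime import date, timedelta

# Warn this far ahead of review_date.
REVIEW_LEAD_TIME = timedelta(days=14)

def exception_status(review_date: str, today: date) -> str:
    """Classify an exception by its ISO-format review_date."""
    due = date.fromisoformat(review_date)
    if today > due:
        return "expired"      # Past review date: reopen the ticket or renew formally.
    if due - today <= REVIEW_LEAD_TIME:
        return "review_due"   # Inside the lead time: notify the approver.
    return "ok"

def exceptions_needing_action(exceptions: list, today: date) -> list:
    """Return (vulnerability, status) for every exception that is not 'ok'."""
    results = []
    for exc in exceptions:
        status = exception_status(exc["risk_acceptance"]["review_date"], today)
        if status != "ok":
            results.append((exc["vulnerability"], status))
    return results
```

For the example above, the check starts reporting `review_due` on 2026-07-18 and `expired` from 2026-08-02 onward.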

Step 7: Metrics and Reporting

# vulnerability_mgmt/metrics.py — KPIs for the vulnerability management program.

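from datetime import datetime
from statistics import mean
from typing import Optional

# Each ticket is assumed to expose: severity, created_date, resolved_date
# (None while open), sla_days and cisa_kev.

def _pct(part: int, whole: int) -> float:
    return part / whole * 100 if whole else 0.0

def _mttr_days(tickets: list, severity: str) -> Optional[float]:
    durations = [
        (t.resolved_date - t.created_date).days
        for t in tickets
        if t.severity == severity and t.resolved_date
    ]
    return mean(durations) if durations else None  # None until something is resolved.

def calculate_program_metrics(tickets: list, scanned_assets: set, total_assets: set) -> dict:
    now = datetime.now()
    critical = [t for t in tickets if t.severity == "CRITICAL"]
    resolved_critical = [t for t in critical if t.resolved_date]
    return {
        # Coverage: what percentage of inventoried assets are scanned?
        "scan_coverage_pct": _pct(len(scanned_assets & total_assets), len(total_assets)),

        # Mean time to remediate by severity.
        "mttr_critical_days": _mttr_days(tickets, "CRITICAL"),
        "mttr_high_days": _mttr_days(tickets, "HIGH"),

        # SLA compliance: what % of resolved criticals were closed within SLA?
        "sla_compliance_critical_pct": _pct(
            len([t for t in resolved_critical
                 if (t.resolved_date - t.created_date).days <= t.sla_days]),
            len(resolved_critical),
        ),

        # Open vulnerability age: criticals already past their SLA.
        "open_critical_over_sla": len([
            t for t in critical
            if not t.resolved_date and (now - t.created_date).days > t.sla_days
        ]),

        # CISA KEV coverage: KEV-listed vulnerabilities still open.
        "kev_open_count": len([t for t in tickets if t.cisa_kev and not t.resolved_date]),
    }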

Step 8: Telemetry

vuln_open_total{severity, asset_category}              gauge
vuln_overdue_total{severity, asset_category}           gauge
vuln_mttr_days{severity}                               histogram
vuln_sla_compliance_pct{severity}                      gauge
vuln_new_per_day{severity, scanner}                    gauge
vuln_kev_open_total{}                                  gauge
vuln_exceptions_active_total{}                         gauge
vuln_scan_coverage_pct{}                               gauge

Alert on:

  • vuln_kev_open_total > 0 — a CISA KEV (confirmed exploited) vulnerability is unpatched; immediate escalation.
  • vuln_overdue_total{severity="CRITICAL"} > 0 — critical vulnerability past SLA; escalate to management.
  • vuln_scan_coverage_pct < 95 — assets are not being scanned; coverage gap.
  • vuln_exceptions_active_total growing week-over-week — exceptions accumulating without remediation; program discipline issue.
  • vuln_sla_compliance_pct{severity="CRITICAL"} < 90 — critical SLA compliance below target; review workflow.
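In a Prometheus deployment these thresholds would normally live in alerting rules; the sketch below just makes them explicit and testable. It assumes a hypothetical `ProgramSnapshot` holding current gauge values, and approximates "growing week-over-week" as a single comparison against the value one week earlier:

```python
from dataclasses import dataclass

@dataclass
class ProgramSnapshot:
    """Point-in-time values of the gauges above (hypothetical container)."""
    kev_open_total: int
    critical_overdue_total: int
    scan_coverage_pct: float
    exceptions_active_total: int
    critical_sla_compliance_pct: float

def evaluate_alerts(now: ProgramSnapshot, week_ago: ProgramSnapshot) -> list:
    """Return one message per alert condition that currently fires."""
    alerts = []
    if now.kev_open_total > 0:
        alerts.append(f"{now.kev_open_total} CISA KEV vulnerabilities unpatched: escalate immediately")
    if now.critical_overdue_total > 0:
        alerts.append(f"{now.critical_overdue_total} critical vulnerabilities past SLA: escalate to management")
    if now.scan_coverage_pct < 95:
        alerts.append(f"Scan coverage {now.scan_coverage_pct:.1f}% is below 95%")
    if now.exceptions_active_total > week_ago.exceptions_active_total:
        alerts.append("Active exceptions grew week-over-week: review program discipline")
    if now.critical_sla_compliance_pct < 90:
        alerts.append(f"Critical SLA compliance {now.critical_sla_compliance_pct:.1f}% is below 90%")
    return alerts
```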

Expected Behaviour

| Signal | Ad-hoc patching | Structured VM program |
|---|---|---|
| Critical CVE disclosed | Noticed if someone reads security news | Automatically detected in next scan; ticket created within 24h |
| CISA KEV vulnerability | Unknown if unpatched | Immediate alert; escalation triggered |
| Vulnerability “closed” without fix | Common; no verification | Re-scan required; auto-reopen if still present |
| Exception never reviewed | Permanent exception accumulates | Automated review-date check; ticket created at renewal |
| MTTR for critical CVEs | Unknown | Measured; trending; SLA breach alert |

Trade-offs

| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Risk scoring over pure CVSS | Prioritises actual threat | More complex; requires asset context data | Build asset inventory incrementally; start with known high-value assets |
| Auto-create tickets from scanner | No findings fall through the cracks | Ticket volume can be overwhelming | Filter to CVSS >= 7.0 initially; tune thresholds based on team capacity |
| Verification scan before close | Confirms actual remediation | Adds time to close; requires scanner access | Automate re-scan on ticket status change to “Remediated” |
| --ignore-unfixed in CI | Reduces noise from no-fix CVEs | May miss issues where a fix exists but the scanner DB is stale | Update scanner DB daily; cross-reference with NVD |

Failure Modes

| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Scanner DB not updated | New CVEs not detected | Scanner output references old signatures | Automate daily DB update; alert if update fails |
| Owner not assigned | Ticket stays in “New” state | Tickets older than 24h without assignment | Auto-escalate; enforce ownership in ticket creation automation |
| Ticket closed without verification | Vulnerability reappears in next scan | Re-scan finds same CVE; ticket re-opened | Require linked scan result as close condition |
| Exception review date missed | Expired exception still active | Automated review-date check in CI/calendar | Alert 14 days before review date; require renewal or close |
| Scanner false positives overwhelm team | Team stops triaging | SLA breach rate increases; MTTR increases | Tune scanner; add false-positive suppression; tune severity thresholds |