Container Patch SLA Policy Enforcement: From Severity Tiers to Admission Control

Problem

Most organisations have a container vulnerability management policy. Fewer have one that is enforced. The gap between policy-on-paper and technical enforcement is where exploits live.

The typical failure pattern looks like this: Trivy runs in CI and flags a critical CVE. A Jira ticket is created. The ticket is assigned to an application team. The team’s sprint is full. The ticket is moved to the next sprint. Three weeks later the image with the critical CVE is still running in production. The Jira ticket is closed as “won’t fix this sprint” and the finding re-emerges in the next Trivy scan.

There are three root causes:

No automated patching on a schedule. Teams depend on application developers to rebuild base images. Copa — the open-source container patching tool from Microsoft — can patch OS-level vulnerabilities in an existing image layer without a full rebuild, but only if something triggers it automatically on new CVE disclosure.

No admission control blocking vulnerable images. Even if patched images exist, nothing stops a deployment of an older, vulnerable tag. Kubernetes will happily run whatever image reference is in the manifest. Without an admission webhook that validates the patch state of the image at deployment time, the pipeline gate is pure theatre.

Exceptions are informal and never expire. A Slack message saying “this image can’t be patched right now, it’s vendor-supplied” becomes a permanent exemption. Nobody tracks when vendor fixes are available, and the exception is never revisited.

This article builds a complete enforcement stack: severity-to-SLA tier definitions, Copa as the rapid-response patching engine, OCI annotation-based patch state recording, Kyverno policies enforcing SLA windows at admission time, and a time-bounded exception process that expires automatically.

Target systems: Kubernetes 1.28+, Kyverno 1.11+, Copa 0.7+, Trivy 0.51+, crane or cosign for annotation management.

Threat Model

Threat 1 — Critical CVE exploitation during SLA window. A critical vulnerability is disclosed on Monday morning. By Monday evening a weaponised exploit is available in Metasploit. Your container image, still running the vulnerable library, is exposed to that exploit through Tuesday, Wednesday, and into the following week because your patching process has no urgency tier. The attacker exploits during the delay. With a 24-hour SLA for CVSS ≥ 9.0 and CISA KEV entries, and admission control that blocks images beyond that window, the opportunity collapses to under a day.

Threat 2 — Permanent exception masquerading as temporary. A developer submits an exception request for an image that cannot be patched because it is a vendor-supplied third-party container. The exception is approved verbally. The vendor releases a fixed version three months later. Nobody notices. The exception was never recorded in a machine-readable form, so nothing triggers a review. The unpatched image runs for 14 months. A Kyverno PolicyException with a security.hardening/expires-at annotation and an enforcement CronJob converts this informal agreement into a time-bounded technical constraint.

Threat 3 — Policy enforced in staging, bypassed in production. Teams learn which namespaces have the Kyverno admission webhook enforcing SLA policies. They deploy directly to production namespaces using kubectl apply or a CI job that bypasses the staging gate, knowing production is in “audit-only” mode. Namespace-based policy labels ensure production namespaces carry the enforcement mode, not audit mode, and that this label is protected from removal by a separate Kyverno policy.

Configuration and Implementation

SLA Tier Definitions

Define tiers before writing any code. The policies below use four severity tiers, plus an exception path for vulnerabilities that have no fix yet:

Tier	Condition	Patch SLA	Rationale
Critical	CVSS ≥ 9.0 or in CISA KEV	24 hours	Weaponised exploits appear within hours of disclosure for KEV entries
High	CVSS 7.0–8.9	7 days	Significant exploitability but typically requires more attacker effort
Medium	CVSS 4.0–6.9	30 days	Limited scope; compensating controls often sufficient short-term
Low	CVSS < 4.0	90 days	Tracked but not operationally urgent
No fix available	Any severity	Exception required	Vendor or upstream must release fix; compensating controls mandatory

Record these tiers in a ConfigMap consumed by your tooling so they are a single source of truth:

apiVersion: v1
kind: ConfigMap
metadata:
  name: patch-sla-tiers
  namespace: security-policy
data:
  critical_sla_hours: "24"
  high_sla_hours: "168"    # 7 days
  medium_sla_hours: "720"  # 30 days
  low_sla_hours: "2160"    # 90 days
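The tier table reduces to a small decision function. The sketch below shows one way to express it in shell for a pipeline step. The function name `sla_tier` and its two arguments (a CVSS base score and a yes/no KEV flag) are illustrative assumptions, not part of any tool named in this article. The floating-point comparison is delegated to awk because shell arithmetic is integer-only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a CVSS score and KEV membership to a tier and SLA.
# Prints "<tier> <sla_hours>"; thresholds mirror the tier table above.
sla_tier() {
  local cvss="$1" in_kev="${2:-no}"

  # Any CISA KEV entry is treated as critical regardless of its score.
  if [ "${in_kev}" = "yes" ]; then
    echo "critical 24"
    return 0
  fi

  awk -v s="${cvss}" 'BEGIN {
    if      (s >= 9.0) print "critical 24"
    else if (s >= 7.0) print "high 168"
    else if (s >= 4.0) print "medium 720"
    else               print "low 2160"
  }'
}
```

Calling `sla_tier 7.5` prints `high 168`, and `sla_tier 5.0 yes` prints `critical 24` because KEV membership overrides the score. In a real pipeline the hour values would more likely be read from the ConfigMap than hard-coded.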

Copa as the Rapid-Response Patching Engine

Copa patches a container image by adding a layer of updated OS packages on top of the existing image, without rebuilding from source. For a critical CVE disclosure, this means you can produce a patched copy of a production image within minutes of the fix reaching the upstream package repository: no Dockerfile change, no application rebuild, no developer involvement. Running workloads still have to be redeployed onto the patched tag.

A Copa patching pipeline triggered on new critical CVE disclosures:

#!/usr/bin/env bash
# patch-critical.sh: triggered by a Trivy scan finding CVSS ≥ 9.0

set -euo pipefail

IMAGE="${1:?Image reference required}"
REGISTRY="${2:?Registry required}"
REPORT_PATH="$(mktemp -d)/trivy-report.json"
# Strip only the trailing tag so a registry port (host:5000/app:1.2) survives.
# Assumes IMAGE carries a tag rather than a digest.
REPO="${IMAGE%:*}"
PATCHED_TAG="$(date -u +%Y%m%d-%H%M%S)-patched"
PATCHED_IMAGE="${REPO}:${PATCHED_TAG}"
DEST_IMAGE="${REGISTRY}/${REPO##*/}:${PATCHED_TAG}"

echo "[+] Scanning ${IMAGE} with Trivy..."
trivy image \
  --format json \
  --output "${REPORT_PATH}" \
  --severity CRITICAL,HIGH \
  "${IMAGE}"

# "[]?" tolerates a report with no Results section.
CVE_COUNT=$(jq '[.Results[]?.Vulnerabilities // [] | .[]] | length' "${REPORT_PATH}")
if [[ "${CVE_COUNT}" -eq 0 ]]; then
  echo "[+] No Critical/High CVEs found, no patch required"
  exit 0
fi

echo "[+] Found ${CVE_COUNT} Critical/High CVEs, running Copa..."
# --tag takes the tag only. --push sends the result to the source repository
# so that crane, which works against registries, can reach it. If your Copa
# release lacks --push, push the locally loaded image with docker instead.
copa patch \
  --image "${IMAGE}" \
  --report "${REPORT_PATH}" \
  --tag "${PATCHED_TAG}" \
  --addr "unix:///run/buildkit/buildkitd.sock" \
  --push

echo "[+] Annotating patched image..."
MAX_SEVERITY=$(jq -r '
  [.Results[]?.Vulnerabilities // [] | .[].Severity] |
  if any(. == "CRITICAL") then "CRITICAL"
  elif any(. == "HIGH") then "HIGH"
  else "MEDIUM"
  end
' "${REPORT_PATH}")

PATCH_TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# crane mutate rewrites the manifest in the registry and re-points the tag.
# The annotation keys must match the ones the Kyverno policy reads.
crane mutate \
  --annotation "security.hardening/patched-at=${PATCH_TIMESTAMP}" \
  --annotation "security.hardening/max-cve-severity=${MAX_SEVERITY}" \
  --annotation "security.hardening/patch-tool=copa" \
  "${PATCHED_IMAGE}"

echo "[+] Copying ${PATCHED_IMAGE} to ${DEST_IMAGE}..."
crane copy "${PATCHED_IMAGE}" "${DEST_IMAGE}"

echo "[+] Patch complete: ${DEST_IMAGE}"

Trigger this script from a Tekton or GitHub Actions pipeline subscribed to your vulnerability scanner’s webhook, or from a CronJob that re-scans and re-patches on a four-hour interval for images running in production namespaces.
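Deriving a new tag from an image reference is easy to get wrong when the registry host includes a port, or when the reference carries a digest or no tag at all. The following is a minimal sketch of a more defensive split. The function names `image_repo` and `patched_ref` are hypothetical helpers, and the example hostnames are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the repository part of an image reference,
# dropping any tag or digest but keeping a registry port intact.
image_repo() {
  local ref="${1%%@*}"      # drop an @sha256:... digest if present
  local last="${ref##*/}"   # a tag can only appear in the final path component
  case "${last}" in
    *:*) printf '%s\n' "${ref%:*}" ;;   # strip the tag only
    *)   printf '%s\n' "${ref}" ;;      # untagged reference, nothing to strip
  esac
}

# Build the reference for a patched image from the original reference and a new tag.
patched_ref() {
  printf '%s:%s\n' "$(image_repo "$1")" "$2"
}
```

For example, `patched_ref registry.example.com:5000/team/app:1.2 20260509-patched` yields `registry.example.com:5000/team/app:20260509-patched`, whereas stripping at the first colon would have cut the reference off at the port.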

OCI Annotation Strategy

After Copa patches an image and the result is in your registry, annotate its manifest with machine-readable patch state. Kyverno reads these annotations at admission time.

The enforcement policy reads the first two annotations below; the third records the tier for reporting:

# After Copa has patched and pushed the image
crane mutate "${PATCHED_IMAGE}" \
  --annotation "security.hardening/patched-at=2026-05-09T14:23:00Z" \
  --annotation "security.hardening/max-cve-severity=CRITICAL" \
  --annotation "security.hardening/patch-sla-tier=critical"

If you use cosign for signing alongside annotation, embed them in the signature payload:

cosign sign \
  --annotations "security.hardening/patched-at=2026-05-09T14:23:00Z" \
  --annotations "security.hardening/max-cve-severity=CRITICAL" \
  --key "${KMS_KEY_ID}" \
  "${PATCHED_IMAGE}"

Using cosign-signed annotations provides tamper evidence: an attacker who can push to your registry cannot forge a patched-at annotation without also controlling the signing key. See the failure mode discussion for why unsigned annotations alone are insufficient in high-assurance environments.

Kyverno ClusterPolicy: Enforcing the Patch SLA

This is the core enforcement mechanism. The policy runs at admission time against every Pod in namespaces labelled patch-sla-enforcement: strict. It reads the patched-at OCI annotation from the image manifest (resolved via the Kyverno image metadata feature) and computes whether the image is within its SLA window.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: enforce-container-patch-sla
  annotations:
    policies.kyverno.io/title: Container Patch SLA Enforcement
    policies.kyverno.io/category: Vulnerability Management
    policies.kyverno.io/severity: high
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/description: >
      Blocks deployment of container images that have breached their patch SLA.
      Images annotated max-cve-severity=CRITICAL are rejected once their patched-at
      timestamp is more than 24h old; HIGH images after 7 days. Images without
      patch annotations are blocked in production namespaces.
spec:
  validationFailureAction: Enforce
  background: false
  rules:
    - name: require-patch-annotation-in-production
      match:
        any:
          - resources:
              kinds: [Pod]
              namespaceSelector:
                matchLabels:
                  patch-sla-enforcement: strict
      validate:
        message: >
          Image {{ request.object.spec.containers[0].image }} is missing required
          patch annotations. All images in production namespaces must carry
          security.hardening/patched-at and security.hardening/max-cve-severity
          annotations. Run the Copa patching pipeline before deploying.
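        foreach:
          - list: "request.object.spec.containers"
            # Fetch the image manifest from the registry for each container.
            context:
              - name: imageData
                imageRegistry:
                  reference: "{{ element.image }}"
            deny:
              conditions:
                any:
                  - key: "{{ imageData.manifest.annotations.\"security.hardening/patched-at\" || '' }}"
                    operator: Equals
                    value: ""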

    - name: enforce-critical-patch-sla-24h
      match:
        any:
          - resources:
              kinds: [Pod]
              namespaceSelector:
                matchLabels:
                  patch-sla-enforcement: strict
      validate:
        message: >
          Image {{ request.object.spec.containers[0].image }} has max-cve-severity=CRITICAL
          but was patched more than 24 hours ago. The patch SLA for critical vulnerabilities
          is 24 hours. Rebuild or re-patch the image, or submit a formal PolicyException.
        foreach:
          - list: "request.object.spec.containers"
            # Fetch the image manifest from the registry for each container.
            context:
              - name: imageData
                imageRegistry:
                  reference: "{{ element.image }}"
            deny:
              conditions:
                all:
                  - key: "{{ imageData.manifest.annotations.\"security.hardening/max-cve-severity\" || 'NONE' }}"
                    operator: Equals
                    value: CRITICAL
                  - key: "{{ time_since('', imageData.manifest.annotations.\"security.hardening/patched-at\", '') }}"
                    operator: GreaterThan
                    value: "24h"

    - name: enforce-high-patch-sla-7d
      match:
        any:
          - resources:
              kinds: [Pod]
              namespaceSelector:
                matchLabels:
                  patch-sla-enforcement: strict
      validate:
        message: >
          Image has max-cve-severity=HIGH and the patch is older than 7 days.
          Re-patch with Copa or submit a time-bound PolicyException.
        foreach:
          - list: "request.object.spec.containers"
            # Fetch the image manifest from the registry for each container.
            context:
              - name: imageData
                imageRegistry:
                  reference: "{{ element.image }}"
            deny:
              conditions:
                all:
                  - key: "{{ imageData.manifest.annotations.\"security.hardening/max-cve-severity\" || 'NONE' }}"
                    operator: Equals
                    value: HIGH
                  - key: "{{ time_since('', imageData.manifest.annotations.\"security.hardening/patched-at\", '') }}"
                    operator: GreaterThan
                    value: "168h"

Kyverno's imageRegistry context entry fetches the OCI manifest from the registry at admission time, so the Kyverno admission controller needs pull credentials for private registries. Either start it with the --imagePullSecrets flag, or reference a Secret from the Kyverno namespace on the context entry itself:

context:
  - name: imageData
    imageRegistry:
      reference: "{{ element.image }}"
      imageRegistryCredentials:
        secrets:
          - registry-pull-secret
        allowInsecureRegistry: false
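It can help to evaluate the same decision outside the cluster, for instance in CI before a deployment is attempted. The sketch below mirrors the three admission rules in plain shell under stated assumptions: the names `to_epoch` and `check_patch_sla` are hypothetical, timestamps use the fixed `YYYY-MM-DDTHH:MM:SSZ` form shown in the annotations, and the optional third argument only exists to make the check reproducible. It is not a substitute for the admission policy.

```shell
#!/usr/bin/env bash
# Hypothetical offline mirror of the admission rules, for use in CI.

# Convert YYYY-MM-DDTHH:MM:SSZ to epoch seconds (GNU date, BSD date fallback).
to_epoch() {
  local ts
  ts=$(printf '%s' "${1%Z}" | tr 'T' ' ')
  date -u -d "${ts}" +%s 2>/dev/null ||
    date -u -j -f '%Y-%m-%d %H:%M:%S' "${ts}" +%s
}

# Usage: check_patch_sla SEVERITY PATCHED_AT [NOW]
# Prints "allow" or "deny: <reason>" and returns 0 or 1 accordingly.
check_patch_sla() {
  local severity="$1" patched_at="$2" now="${3:-}"
  local limit_hours age_seconds
  [ -n "${now}" ] || now=$(date -u +%Y-%m-%dT%H:%M:%SZ)

  # Rule 1: every image must carry a patched-at annotation.
  if [ -z "${patched_at}" ]; then
    echo "deny: missing patched-at annotation"
    return 1
  fi

  # Rules 2 and 3: only CRITICAL and HIGH have an age limit in the policy.
  case "${severity}" in
    CRITICAL) limit_hours=24 ;;
    HIGH)     limit_hours=168 ;;
    *)        echo "allow"; return 0 ;;
  esac

  age_seconds=$(( $(to_epoch "${now}") - $(to_epoch "${patched_at}") ))
  if [ "${age_seconds}" -gt $(( limit_hours * 3600 )) ]; then
    echo "deny: patch age exceeds ${limit_hours}h SLA"
    return 1
  fi
  echo "allow"
}
```

With a fixed clock, `check_patch_sla CRITICAL 2026-05-09T14:23:00Z 2026-05-11T02:23:00Z` reports a denial because the patch is 36 hours old, while the same image checked two hours after patching is allowed.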

Alternative: Kyverno with Trivy Operator External Data

Rather than trusting OCI annotations (which can be spoofed if registry access controls are weak), query live vulnerability data from the Trivy Operator running in-cluster. The Trivy Operator continuously scans running images and writes results to VulnerabilityReport CRDs.

A Kyverno policy using external data from the Trivy Operator:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: enforce-patch-sla-via-trivy-operator
spec:
  validationFailureAction: Enforce
  background: false
  rules:
    - name: check-live-vulnerability-report
      match:
        any:
          - resources:
              kinds: [Pod]
              namespaceSelector:
                matchLabels:
                  patch-sla-enforcement: strict
      context:
        - name: vulnReport
          apiCall:
            # A quoted single-line string avoids the trailing newline a folded scalar would add.
            urlPath: "/apis/aquasecurity.github.io/v1alpha1/namespaces/{{request.namespace}}/vulnerabilityreports?labelSelector=trivy-operator.resource.kind=ReplicaSet"
            jmesPath: "items[0]"
      validate:
        message: >
          Live Trivy scan shows critical vulnerabilities in this image.
          Patch with Copa before deploying to production.
        deny:
          conditions:
            any:
              - key: "{{ vulnReport.report.summary.criticalCount || `0` }}"
                operator: GreaterThan
                value: "0"

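The Trivy Operator approach provides fresher data (it reflects the actual current scan state rather than the annotation timestamp) but introduces latency because new images must complete a scan cycle before a report exists. Note also that the query as written takes the first ReplicaSet report in the namespace, not the report for the image being admitted; narrow the label selector (for example on trivy-operator.resource.name) before relying on it. Use a hybrid: OCI annotations for initial admission gating, Trivy Operator reports for continuous background validation.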

Namespace-Based Policy Tiers

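Apply enforcement selectively using namespace labels. This prevents a single misconfigured policy from disrupting development workflows while maintaining strict controls in production. The enforce-container-patch-sla policy matches only namespaces labelled patch-sla-enforcement: strict, so to obtain audit results for staging, deploy a copy of the policy with validationFailureAction: Audit that matches patch-sla-enforcement: audit.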

# Production namespaces — strict enforcement
apiVersion: v1
kind: Namespace
metadata:
  name: payments-prod
  labels:
    patch-sla-enforcement: strict
    environment: production
---
# Staging namespaces — audit only (Kyverno logs but does not block)
apiVersion: v1
kind: Namespace
metadata:
  name: payments-staging
  labels:
    patch-sla-enforcement: audit
    environment: staging

Protect the enforcement label from removal. An adversary with namespace edit permissions could remove the patch-sla-enforcement: strict label to bypass the policy. A second Kyverno policy prevents this:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: protect-patch-sla-enforcement-label
spec:
  validationFailureAction: Enforce
  background: false
  rules:
    - name: block-removal-of-enforcement-label
      match:
        any:
          - resources:
              kinds: [Namespace]
              operations: [UPDATE]
              selector:
                matchLabels:
                  environment: production
      validate:
        message: >
          Removing or changing the patch-sla-enforcement label on a production namespace is not permitted.
          Submit a change request for security team review.
        deny:
          conditions:
            any:
              # Deny removal and any downgrade (for example to "audit").
              - key: "{{ request.object.metadata.labels.\"patch-sla-enforcement\" || '' }}"
                operator: NotEquals
                value: strict
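An admission policy only sees changes made after it is installed, so a periodic drift check is a useful complement. The sketch below is one possible shape for it: a filter that reads `name environment enforcement` lines and prints production namespaces whose enforcement label is anything other than `strict`. The function name and the three-column input format are assumptions made for illustration, as is the kubectl invocation shown in the comment.

```shell
#!/usr/bin/env bash
# Hypothetical drift check: reads "name environment enforcement" lines on stdin
# and prints production namespaces whose enforcement label is not "strict".
find_unenforced_production() {
  awk '$2 == "production" && $3 != "strict" { print $1 }'
}

# Assumed way to produce the input (missing labels print as <none>):
#   kubectl get namespaces --no-headers \
#     -o custom-columns='NAME:.metadata.name,ENV:.metadata.labels.environment,SLA:.metadata.labels.patch-sla-enforcement' \
#     | find_unenforced_production
```

Any output from the filter is a finding: either the label was never applied or it was changed before the protection policy existed.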

Exception Process: Time-Bound PolicyExceptions

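When a legitimate exception is required (a vendor image with no available patch, a third-party component pending upstream fix), use a Kyverno PolicyException with an explicit expiry annotation rather than modifying the ClusterPolicy directly. In Kyverno 1.11 the PolicyException feature is off by default and must be enabled with the --enablePolicyException flag on the admission controller.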

apiVersion: kyverno.io/v2beta1
kind: PolicyException
metadata:
  name: vendor-kafka-image-exception
  namespace: data-platform
  annotations:
    security.hardening/exception-reason: >
      Confluent Kafka 7.6.1 image contains CVE-2026-12345 in bundled libssl.
      Vendor fix expected in 7.7.0, scheduled for release 2026-06-01.
      Compensating control: network policy restricts ingress to internal services only.
    security.hardening/approved-by: security-team@example.com
    security.hardening/approved-date: "2026-05-09"
    security.hardening/expires-at: "2026-06-15T00:00:00Z"
    security.hardening/jira-ticket: SEC-4521
spec:
  exceptions:
    - policyName: enforce-container-patch-sla
      ruleNames:
        - enforce-critical-patch-sla-24h
  match:
    any:
      - resources:
          kinds: [Pod]
          namespaces: [data-platform]
          selector:
            matchLabels:
              app: kafka-broker

A CronJob runs hourly and revokes expired exceptions by deleting PolicyException resources where the expires-at annotation is in the past:

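#!/usr/bin/env bash
# expire-policy-exceptions.sh

set -euo pipefail

# Fixed-width UTC timestamps (YYYY-MM-DDTHH:MM:SSZ) compare correctly as
# plain strings, which is what the jq filter below relies on.
NOW=$(date -u +%Y-%m-%dT%H:%M:%SZ)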

kubectl get policyexception \
  --all-namespaces \
  -o json |
jq -r --arg now "${NOW}" '
  .items[] |
  select(
    .metadata.annotations["security.hardening/expires-at"] != null and
    .metadata.annotations["security.hardening/expires-at"] < $now
  ) |
  "\(.metadata.namespace)/\(.metadata.name)"
' | while IFS='/' read -r ns name; do
  echo "[+] Revoking expired PolicyException: ${ns}/${name}"
  kubectl delete policyexception \
    --namespace "${ns}" \
    "${name}"
done

Wrap this in a Kubernetes CronJob running with a ServiceAccount scoped to list and delete on policyexceptions:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: expire-policy-exceptions
  namespace: security-policy
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: exception-expiry-controller
          restartPolicy: OnFailure
          containers:
            - name: expiry-controller
              image: bitnami/kubectl:1.29
              command: ["/scripts/expire-policy-exceptions.sh"]
              volumeMounts:
                - name: scripts
                  mountPath: /scripts
          volumes:
            - name: scripts
              configMap:
                name: expiry-controller-scripts
                defaultMode: 0755
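The expiry script compares timestamps as strings, which is only safe when every `expires-at` value uses the same fixed-width UTC form. A value such as `June 2026` sorts after every timestamp and would never expire. The sketch below shows one way to guard against that; the function name `is_expired` and its return-code convention are assumptions for illustration, not part of Kyverno.

```shell
#!/usr/bin/env bash
# Hypothetical guard for the expiry check.
# Returns 0 if expired, 1 if still valid, 2 if the value is malformed.
is_expired() {
  local expires_at="$1" now="$2"

  # Accept only YYYY-MM-DDTHH:MM:SSZ so that string order equals time order.
  case "${expires_at}" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z) ;;
    *) return 2 ;;   # malformed: flag for human review rather than guessing
  esac

  [ "${expires_at}" \< "${now}" ]
}
```

A caller might treat return code 2 as a reason to alert the security team, so that a typo in an annotation neither deletes an exception early nor keeps it alive indefinitely.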

SLA Compliance Reporting

A weekly compliance report shows which images are in each tier, how many are in breach, and which exceptions are approaching expiry. Run this as a CronJob or on-demand:

#!/usr/bin/env bash
# patch-sla-report.sh

echo "=== Container Patch SLA Compliance Report ==="
echo "Generated: $(date -u)"
echo ""

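echo "--- Images in strict namespaces: recorded severity and patch time ---"
# The enforcement label lives on namespaces, not Pods, so select namespaces first.
kubectl get namespaces \
  -l patch-sla-enforcement=strict \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' |
while read -r ns; do
  kubectl get pods --namespace "${ns}" -o json | jq -r '
    .items[] |
    .metadata.name as $pod |
    .spec.containers[] |
    "\($pod)\t\(.image)"
  ' | while IFS=$'\t' read -r pod image; do
    # Severity and patch time are annotations on the image manifest, not the Pod.
    meta=$(crane manifest "${image}" 2>/dev/null | jq -r '
      [.annotations["security.hardening/max-cve-severity"] // "NONE",
       .annotations["security.hardening/patched-at"] // "UNKNOWN"] | @tsv
    ')
    printf '%s\t%s\t%s\t%s\n' "${ns}" "${pod}" "${image}" "${meta}"
  done
done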

echo ""
echo "--- PolicyExceptions expiring within 7 days ---"
SEVEN_DAYS_FROM_NOW=$(date -u -d '+7 days' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || \
  date -u -v+7d +%Y-%m-%dT%H:%M:%SZ)

kubectl get policyexception \
  --all-namespaces \
  -o json | jq -r --arg cutoff "${SEVEN_DAYS_FROM_NOW}" '
  .items[] |
  select(
    .metadata.annotations["security.hardening/expires-at"] != null and
    .metadata.annotations["security.hardening/expires-at"] <= $cutoff
  ) |
  "EXPIRING SOON: \(.metadata.namespace)/\(.metadata.name) — expires \(.metadata.annotations["security.hardening/expires-at"])"
'

Expected Behaviour

Scenario	Kyverno Decision	Audit Log Entry
Image with `max-cve-severity=CRITICAL`, `patched-at` 2 hours ago, deployed to production namespace	Allow	`PolicyResponse: Pass — critical SLA within 24h window`
Image with `max-cve-severity=CRITICAL`, `patched-at` 36 hours ago, no PolicyException	Block (Enforce mode)	`PolicyResponse: Fail — enforce-critical-patch-sla-24h: patched-at exceeds 24h SLA`
Image with `max-cve-severity=CRITICAL`, `patched-at` 36 hours ago, valid non-expired PolicyException	Allow	`PolicyResponse: Pass — PolicyException vendor-kafka-image-exception matched`
Image with no `patched-at` annotation, deployed to production namespace	Block (Enforce mode)	`PolicyResponse: Fail — require-patch-annotation-in-production: annotation missing`
Image with `max-cve-severity=HIGH`, `patched-at` 5 days ago, deployed to production	Allow	`PolicyResponse: Pass — high SLA within 7-day window`
Image with `max-cve-severity=HIGH`, `patched-at` 9 days ago, no PolicyException	Block (Enforce mode)	`PolicyResponse: Fail — enforce-high-patch-sla-7d: patched-at exceeds 168h SLA`
Expired PolicyException (past `expires-at`), deleted by CronJob, image deployed	Block (Enforce mode)	`PolicyResponse: Fail — no matching PolicyException, SLA exceeded`
Any image deployed to staging namespace (labelled `patch-sla-enforcement: audit`)	Allow with warning	`PolicyResponse: Audit — would have failed enforce-critical-patch-sla-24h`

Trade-offs

Decision	Option A	Option B	Recommendation
Kyverno action mode	`Enforce` — blocks non-compliant deployments	`Audit` — logs violations, allows deployment	Enforce in production namespaces; audit in staging and development. Never run audit-only in production.
Patch state source	OCI annotations on the image (`patched-at`)	Live query to Trivy Operator `VulnerabilityReport`	Annotations are fast and portable but can be forged. Trivy Operator provides fresh data but requires scan completion before admission. Use both: annotations as the gate, Trivy Operator for background monitoring.
Patch annotation trust	Unsigned annotations written with `crane mutate`	Cosign-signed annotations verified at admission	Signed annotations are more secure but require Kyverno cosign integration and KMS infrastructure. For regulated environments, sign. For internal clusters with strong registry access controls, unsigned is acceptable.
Namespace-based tiers	Per-namespace enforcement labels — allows gradual rollout	Cluster-wide policy applying to all namespaces	Namespace tiers allow phased rollout but require label protection policies. Cluster-wide is simpler but risks disrupting non-production workloads. Namespace tiers are the pragmatic path for most organisations.
Exception handling	Kyverno `PolicyException` with expiry CronJob	Inline policy exceptions via `exclude` blocks in `ClusterPolicy`	`PolicyException` resources are auditable, namespaced, and deletable. Inline `exclude` blocks require policy changes (code review, approval) which is a feature, not a bug — but `PolicyException` is the right tool for temporary operational exceptions.
Copa trigger	On-demand per-CVE trigger from scanner webhook	Scheduled CronJob re-patching all images on a fixed interval	Webhook triggers are faster for critical CVEs but require integration plumbing. Scheduled re-patching is more reliable but may miss the 24h window for truly new criticals. Run both: webhook for critical and high, scheduled daily for medium.

Failure Modes

Failure	Symptom	Impact	Mitigation
Kyverno admission webhook down	Pod creation requests are rejected or hang until the webhook times out	Depending on the `failurePolicy` setting, either all deployments fail (`Fail`) or all deployments succeed bypassing policy (`Ignore`)	Keep `failurePolicy: Fail` on the policy (Kyverno's default) so that an outage blocks deployments rather than bypassing enforcement. Accept the operational impact of Kyverno unavailability as the safer default. Run Kyverno in high-availability mode (3 replicas, PodDisruptionBudget). Alert on Kyverno webhook response latency.
OCI annotation spoofed by attacker	Attacker with registry write access pushes image with forged `patched-at` timestamp newer than actual patch	Policy allows deployment of an unpatched image because annotation timestamp is within SLA	Use cosign-signed annotations. Kyverno can verify the cosign signature against a known public key or KMS key at admission time, making forgery require control of the signing key.
Exception CronJob fails silently	Expired `PolicyException` resources remain active	Expired exceptions continue to allow SLA-breaching images indefinitely	Alert on CronJob failure (last successful job time). Add a Prometheus metric or Kyverno background scan report showing PolicyExceptions past their `expires-at`. Consider a secondary process (security engineer weekly review) as a backstop.
Production namespace missing enforcement label	New namespace created without `patch-sla-enforcement: strict` label	Policy does not apply; production workloads run without SLA enforcement	A Kyverno policy requiring all namespaces labelled `environment: production` to also carry `patch-sla-enforcement: strict`. Enforce this at namespace creation time.
Image manifest lookup fails	Kyverno cannot fetch image manifest from registry (network issue, auth expiry)	Admission evaluation fails; policy falls back to denying or allowing based on error handling	Configure Kyverno registry credentials via `imageRegistryCredentials` on the context entry or the `--imagePullSecrets` flag. Set registry credentials with long-lived service account tokens. Test registry connectivity from Kyverno pods. Combine with OCI annotation fallback: if the lookup fails, deny unless annotation is present in the pod spec (belt-and-suspenders).
Copa patch introduces regression	Patched image passes SLA check but fails application tests	Production outage from bad patch	Run Copa patches through a staging validation pipeline before production. Copa patches only OS packages, not application code, which limits the regression surface. Canary deployments in production catch issues before full rollout.
