Container Patch SLA Policy Enforcement: From Severity Tiers to Admission Control
Problem
Most organisations have a container vulnerability management policy. Fewer have one that is enforced. The gap between policy-on-paper and technical enforcement is where exploits live.
The typical failure pattern looks like this: Trivy runs in CI and flags a critical CVE. A Jira ticket is created. The ticket is assigned to an application team. The team’s sprint is full. The ticket is moved to the next sprint. Three weeks later the image with the critical CVE is still running in production. The Jira ticket is closed as “won’t fix this sprint” and the finding re-emerges in the next Trivy scan.
There are three root causes:
No automated patching on a schedule. Teams depend on application developers to rebuild base images. Copa — the open-source container patching tool from Microsoft — can patch OS-level vulnerabilities in an existing image layer without a full rebuild, but only if something triggers it automatically on new CVE disclosure.
No admission control blocking vulnerable images. Even if patched images exist, nothing stops a deployment of an older, vulnerable tag. Kubernetes will happily run whatever image reference is in the manifest. Without an admission webhook that validates the patch state of the image at deployment time, the pipeline gate is pure theatre.
Exceptions are informal and never expire. A Slack message saying “this image can’t be patched right now, it’s vendor-supplied” becomes a permanent exemption. Nobody tracks when vendor fixes are available, and the exception is never revisited.
This article builds a complete enforcement stack: severity-to-SLA tier definitions, Copa as the rapid-response patching engine, OCI annotation-based patch state recording, Kyverno policies enforcing SLA windows at admission time, and a time-bounded exception process that expires automatically.
Target systems: Kubernetes 1.28+, Kyverno 1.11+, Copa 0.7+, Trivy 0.51+, crane or cosign for annotation management.
Threat Model
Threat 1 — Critical CVE exploitation during SLA window. A critical vulnerability is disclosed on Monday morning. By Monday evening a weaponised exploit is available in Metasploit. Your container image, still running the vulnerable library, is exposed to that exploit through Tuesday, Wednesday, and into the following week because your patching process has no urgency tier. The attacker exploits during the delay. With a 24-hour SLA for CVSS ≥ 9.0 and CISA KEV entries, and admission control that blocks images beyond that window, the opportunity collapses to under a day.
Threat 2 — Permanent exception masquerading as temporary. A developer submits an exception request for an image that cannot be patched because it is a vendor-supplied third-party container. The exception is approved verbally. The vendor releases a fixed version three months later. Nobody notices. The exception was never recorded in a machine-readable form, so nothing triggers a review. The unpatched image runs for 14 months. A Kyverno PolicyException with a security.hardening/expires-at annotation and an enforcement CronJob converts this informal agreement into a time-bounded technical constraint.
Threat 3 — Policy enforced in staging, bypassed in production. Teams learn which namespaces have the Kyverno admission webhook enforcing SLA policies. They deploy directly to production namespaces using kubectl apply or a CI job that bypasses the staging gate, knowing production is in “audit-only” mode. Namespace-based policy labels ensure production namespaces carry the enforcement mode, not audit mode, and that this label is protected from removal by a separate Kyverno policy.
Configuration and Implementation
SLA Tier Definitions
Define tiers before writing any code. The policies below use four severity tiers, plus a fifth category for vulnerabilities with no available fix:
| Tier | Condition | Patch SLA | Rationale |
|---|---|---|---|
| Critical | CVSS ≥ 9.0 or in CISA KEV | 24 hours | Weaponised exploits appear within hours of disclosure for KEV entries |
| High | CVSS 7.0–8.9 | 7 days | Significant exploitability but typically requires more attacker effort |
| Medium | CVSS 4.0–6.9 | 30 days | Limited scope; compensating controls often sufficient short-term |
| Low | CVSS < 4.0 | 90 days | Tracked but not operationally urgent |
| No fix available | Any severity | Exception required | Vendor or upstream must release fix; compensating controls mandatory |
Record these tiers in a ConfigMap consumed by your tooling so they are a single source of truth:
apiVersion: v1
kind: ConfigMap
metadata:
  name: patch-sla-tiers
  namespace: security-policy
data:
  critical_sla_hours: "24"
  high_sla_hours: "168"     # 7 days
  medium_sla_hours: "720"   # 30 days
  low_sla_hours: "2160"     # 90 days
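The tier table can also be expressed as a small classification function, which is useful when tooling has to pick the SLA for a finding. The following is a minimal sketch, not part of the original pipeline: the function names `cvss_to_tenths` and `classify_tier` are hypothetical, and the thresholds and hours are hard-coded copies of the table above rather than being read from the ConfigMap.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a CVSS base score and a KEV flag to an SLA tier.
# Thresholds and hours mirror the tier table and the patch-sla-tiers ConfigMap.

# cvss_to_tenths "9.8" -> 98 ; bash has no floating point, so compare in tenths.
cvss_to_tenths() {
  local score="$1" whole frac
  [[ "${score}" =~ ^([0-9]+)(\.([0-9]))?$ ]] || return 1
  whole="${BASH_REMATCH[1]}"
  frac="${BASH_REMATCH[3]:-0}"
  echo $(( 10#${whole} * 10 + 10#${frac} ))
}

# classify_tier <cvss> <in_kev: true|false> -> prints "<tier> <sla_hours>"
classify_tier() {
  local in_kev="${2:-false}" tenths
  tenths="$(cvss_to_tenths "$1")" || { echo "invalid CVSS score: $1" >&2; return 1; }
  if [[ "${in_kev}" == "true" ]] || (( tenths >= 90 )); then
    echo "critical 24"      # CVSS >= 9.0 or listed in CISA KEV
  elif (( tenths >= 70 )); then
    echo "high 168"         # 7 days
  elif (( tenths >= 40 )); then
    echo "medium 720"       # 30 days
  else
    echo "low 2160"         # 90 days
  fi
}
```

A caller could read the result with `read -r tier hours < <(classify_tier 8.1 false)`. Note that a KEV listing promotes any finding to the critical tier regardless of its score, matching the "or in CISA KEV" condition in the table.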
Copa as the Rapid-Response Patching Engine
Copa patches container images directly: it adds a layer of updated OS packages on top of the existing image, without rebuilding from source. For a critical CVE disclosure, this means you can produce a patched version of a production image within minutes of the fix being available in the upstream package repository, with no Dockerfile change, no application rebuild, and no developer involvement. The patched image must still be rolled out to replace the running containers.
A Copa patching pipeline triggered on new critical CVE disclosures:
#!/usr/bin/env bash
# patch-critical.sh: triggered when a Trivy scan reports Critical/High CVEs
set -euo pipefail

IMAGE="${1:?Image reference required}"   # must include an explicit tag
REGISTRY="${2:?Registry required}"
REPORT_PATH="$(mktemp -d)/trivy-report.json"

# Strip only the trailing ":tag" so registries with a port (host:5000/app:1.0) survive.
REPO="${IMAGE%:*}"
PATCHED_TAG="$(date -u +%Y%m%d-%H%M%S)-patched"
TARGET="${REGISTRY}/${REPO##*/}:${PATCHED_TAG}"

echo "[+] Scanning ${IMAGE} with Trivy..."
trivy image \
  --format json \
  --output "${REPORT_PATH}" \
  --severity CRITICAL,HIGH \
  "${IMAGE}"

CVE_COUNT=$(jq '[.Results[]?.Vulnerabilities[]?] | length' "${REPORT_PATH}")
if [[ "${CVE_COUNT}" -eq 0 ]]; then
  echo "[+] No Critical/High CVEs found; no patch required"
  exit 0
fi

echo "[+] Found ${CVE_COUNT} Critical/High CVEs; running Copa..."
# Pass the tag only; Copa applies it to the source repository and, by default,
# loads the result into the local Docker daemon as ${REPO}:${PATCHED_TAG}.
copa patch \
  --image "${IMAGE}" \
  --report "${REPORT_PATH}" \
  --tag "${PATCHED_TAG}" \
  --addr "unix:///run/buildkit/buildkitd.sock"

echo "[+] Pushing ${TARGET}..."
# crane operates on registries, not the local daemon, so push before annotating.
docker tag "${REPO}:${PATCHED_TAG}" "${TARGET}"
docker push "${TARGET}"

echo "[+] Annotating patched image..."
MAX_SEVERITY=$(jq -r '
  [.Results[]?.Vulnerabilities[]?.Severity] |
  if any(. == "CRITICAL") then "CRITICAL"
  elif any(. == "HIGH") then "HIGH"
  else "MEDIUM"
  end
' "${REPORT_PATH}")
PATCH_TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# Annotation keys must match the ones the admission policy reads.
# --tag re-points the tag at the annotated manifest; without it crane pushes by digest only.
crane mutate "${TARGET}" \
  --annotation "security.hardening/patched-at=${PATCH_TIMESTAMP}" \
  --annotation "security.hardening/max-cve-severity=${MAX_SEVERITY}" \
  --annotation "security.hardening/patch-tool=copa" \
  --tag "${TARGET}"

echo "[+] Patch complete: ${TARGET}"
Trigger this script from a Tekton or GitHub Actions pipeline subscribed to your vulnerability scanner’s webhook, or from a CronJob that re-scans and re-patches on a four-hour interval for images running in production namespaces.
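Deriving a patched tag means separating the repository from the tag in an image reference, and a naive split at the first colon breaks for registries that include a port. The sketch below shows one way to do the split in pure bash. The function name `split_image_ref` is hypothetical, and the `latest` default for untagged references is an assumption that mirrors common container tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split an image reference into repository and tag.
# Handles registries with a port (localhost:5000/app:1.0) and digest suffixes.

# split_image_ref <reference> -> prints "<repository> <tag>"
split_image_ref() {
  local ref="${1%%@*}"          # drop any @sha256:... digest suffix
  local last="${ref##*/}"       # final path component, e.g. "app:1.0"
  local repo tag
  if [[ "${last}" == *:* ]]; then
    tag="${last##*:}"
    repo="${ref%:*}"            # strip only the trailing ":tag"
  else
    tag="latest"                # assumed default when no tag is given
    repo="${ref}"
  fi
  printf '%s %s\n' "${repo}" "${tag}"
}
```

Usage: `read -r repo tag < <(split_image_ref "${IMAGE}")`. Checking for the colon only in the final path component is what keeps a registry port from being mistaken for a tag.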
OCI Annotation Strategy
After Copa patches an image, record machine-readable patch state as OCI manifest annotations on the image in your registry. Kyverno reads these annotations at admission time.
The enforcement policy reads patched-at and max-cve-severity; patch-sla-tier is recorded alongside them for reporting:
# After the Copa-patched image has been pushed to the registry.
# crane mutate rewrites the manifest; --tag re-points the tag at the annotated manifest.
crane mutate "${PATCHED_IMAGE}" \
  --annotation "security.hardening/patched-at=2026-05-09T14:23:00Z" \
  --annotation "security.hardening/max-cve-severity=CRITICAL" \
  --annotation "security.hardening/patch-sla-tier=critical" \
  --tag "${PATCHED_IMAGE}"
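The annotation values can be generated from the scan result instead of being typed by hand. The following is a minimal sketch under stated assumptions: `severity_to_tier` and `annotation_args` are hypothetical helper names, and the lowercase tier labels are assumed to follow the example value `critical` shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for building annotation arguments from a scan result.

# severity_to_tier <CRITICAL|HIGH|MEDIUM|LOW> -> lowercase tier label
severity_to_tier() {
  case "$1" in
    CRITICAL) echo "critical" ;;
    HIGH)     echo "high" ;;
    MEDIUM)   echo "medium" ;;
    LOW)      echo "low" ;;
    *)        echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

# annotation_args <patched_at> <severity>
# Prints alternating "--annotation" and "key=value" lines, ready for mapfile.
annotation_args() {
  local patched_at="$1" severity="$2" tier
  tier="$(severity_to_tier "${severity}")" || return 1
  local prefix="security.hardening"
  # printf repeats the format once per argument, giving one flag pair each.
  printf -- '--annotation\n%s\n' \
    "${prefix}/patched-at=${patched_at}" \
    "${prefix}/max-cve-severity=${severity}" \
    "${prefix}/patch-sla-tier=${tier}"
}
```

A pipeline could collect the output with `mapfile -t args < <(annotation_args "${PATCH_TIMESTAMP}" "${MAX_SEVERITY}")` and pass `"${args[@]}"` to the annotation command, so the three keys are always written together and an unrecognised severity fails the run instead of producing a partial annotation set.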
If you use cosign for signing alongside annotation, embed them in the signature payload:
cosign sign \
--annotations "security.hardening/patched-at=2026-05-09T14:23:00Z" \
--annotations "security.hardening/max-cve-severity=CRITICAL" \
--key "${KMS_KEY_ID}" \
"${PATCHED_IMAGE}"
Using cosign-signed annotations provides tamper evidence: an attacker who can push to your registry cannot forge a patched-at annotation without also controlling the signing key. See the failure mode discussion for why unsigned annotations alone are insufficient in high-assurance environments.
Kyverno ClusterPolicy: Enforcing the Patch SLA
This is the core enforcement mechanism. The policy runs at admission time against every Pod in namespaces labelled patch-sla-enforcement: strict. It reads the patched-at OCI annotation from the image manifest (resolved via the Kyverno image metadata feature) and computes whether the image is within its SLA window.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: enforce-container-patch-sla
  annotations:
    policies.kyverno.io/title: Container Patch SLA Enforcement
    policies.kyverno.io/category: Vulnerability Management
    policies.kyverno.io/severity: high
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/description: >-
      Blocks deployment of container images that have breached their patch SLA.
      Images annotated max-cve-severity=CRITICAL are blocked once their
      patched-at timestamp is more than 24h old, HIGH images after 7 days.
      Images without patch annotations are blocked in production namespaces.
spec:
  validationFailureAction: Enforce
  background: false
  rules:
    - name: require-patch-annotation-in-production
      match:
        any:
          - resources:
              kinds: [Pod]
              namespaceSelector:
                matchLabels:
                  patch-sla-enforcement: strict
      validate:
        message: >-
          Image {{ request.object.spec.containers[0].image }} is missing required
          patch annotations. All images in production namespaces must carry
          security.hardening/patched-at and security.hardening/max-cve-severity
          annotations. Run the Copa patching pipeline before deploying.
        foreach:
          - list: "request.object.spec.containers"
            context:
              # Fetch the OCI manifest for this container's image from the registry.
              - name: imageData
                imageRegistry:
                  reference: "{{ element.image }}"
            deny:
              conditions:
                any:
                  - key: "{{ imageData.manifest.annotations.\"security.hardening/patched-at\" || '' }}"
                    operator: Equals
                    value: ""
    - name: enforce-critical-patch-sla-24h
      match:
        any:
          - resources:
              kinds: [Pod]
              namespaceSelector:
                matchLabels:
                  patch-sla-enforcement: strict
      validate:
        message: >-
          Image {{ request.object.spec.containers[0].image }} has max-cve-severity=CRITICAL
          but was patched more than 24 hours ago. The patch SLA for critical vulnerabilities
          is 24 hours. Rebuild or re-patch the image, or submit a formal PolicyException.
        foreach:
          - list: "request.object.spec.containers"
            context:
              - name: imageData
                imageRegistry:
                  reference: "{{ element.image }}"
            deny:
              conditions:
                all:
                  - key: "{{ imageData.manifest.annotations.\"security.hardening/max-cve-severity\" || 'NONE' }}"
                    operator: Equals
                    value: CRITICAL
                  # time_since with empty layout and end arguments measures from the timestamp to now.
                  - key: "{{ time_since('', imageData.manifest.annotations.\"security.hardening/patched-at\", '') }}"
                    operator: GreaterThan
                    value: "24h"
    - name: enforce-high-patch-sla-7d
      match:
        any:
          - resources:
              kinds: [Pod]
              namespaceSelector:
                matchLabels:
                  patch-sla-enforcement: strict
      validate:
        message: >-
          Image has max-cve-severity=HIGH and the patch is older than 7 days.
          Re-patch with Copa or submit a time-bound PolicyException.
        foreach:
          - list: "request.object.spec.containers"
            context:
              - name: imageData
                imageRegistry:
                  reference: "{{ element.image }}"
            deny:
              conditions:
                all:
                  - key: "{{ imageData.manifest.annotations.\"security.hardening/max-cve-severity\" || 'NONE' }}"
                    operator: Equals
                    value: HIGH
                  - key: "{{ time_since('', imageData.manifest.annotations.\"security.hardening/patched-at\", '') }}"
                    operator: GreaterThan
                    value: "168h"
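Kyverno fetches the OCI manifest from the registry at admission time, so the Kyverno admission controller needs registry pull credentials. Create the credentials as a Kubernetes Secret in the Kyverno namespace and name it in the controller's --imagePullSecrets container flag:
# Fragment of the admission controller Deployment.
# Deployment and container names are those typically created by the Helm chart; adjust to your installation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kyverno-admission-controller
  namespace: kyverno
spec:
  template:
    spec:
      containers:
        - name: kyverno
          args:
            - --imagePullSecrets=registry-pull-secret
            - --allowInsecureRegistry=false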
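The SLA window logic enforced at admission can be reproduced offline, which helps when debugging why a deployment was rejected. The sketch below is an approximation under stated assumptions, not Kyverno's implementation: `sla_decision` is a hypothetical function, the 24h and 168h limits are copied from the policy rules, and GNU `date` is assumed for the `-d` option.

```shell
#!/usr/bin/env bash
# Hypothetical offline check mirroring the admission rules: deny when the
# annotation is missing, or when patched-at is older than the tier's window.
# Requires GNU date (for `date -d`).

# sla_decision <severity> <patched_at ISO-8601 or ""> [now ISO-8601]
sla_decision() {
  local severity="$1" patched_at="$2"
  local now="${3:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  local limit_hours patched_s now_s age_hours
  if [[ -z "${patched_at}" ]]; then
    echo "deny: missing patched-at annotation"
    return 0
  fi
  case "${severity}" in
    CRITICAL) limit_hours=24 ;;
    HIGH)     limit_hours=168 ;;
    *)        echo "allow: no admission rule for severity ${severity}"; return 0 ;;
  esac
  patched_s="$(date -u -d "${patched_at}" +%s)" || return 1
  now_s="$(date -u -d "${now}" +%s)" || return 1
  age_hours=$(( (now_s - patched_s) / 3600 ))
  # Compare in seconds so that 24h01m counts as a breach, like GreaterThan "24h".
  if (( now_s - patched_s > limit_hours * 3600 )); then
    echo "deny: patched ${age_hours}h ago, limit ${limit_hours}h"
  else
    echo "allow: patched ${age_hours}h ago, limit ${limit_hours}h"
  fi
}
```

For example, `sla_decision CRITICAL 2026-05-09T14:23:00Z 2026-05-11T02:23:00Z` reports a denial because 36 hours exceed the 24-hour window. The optional third argument pins "now", which makes the check reproducible in tests.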
Alternative: Kyverno with Trivy Operator External Data
Rather than trusting OCI annotations (which can be spoofed if registry access controls are weak), query live vulnerability data from the Trivy Operator running in-cluster. The Trivy Operator continuously scans running images and writes results to VulnerabilityReport CRDs.
A Kyverno policy using external data from the Trivy Operator:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: enforce-patch-sla-via-trivy-operator
spec:
  validationFailureAction: Enforce
  background: false
  rules:
    - name: check-live-vulnerability-report
      match:
        any:
          - resources:
              kinds: [Pod]
              namespaceSelector:
                matchLabels:
                  patch-sla-enforcement: strict
      context:
        - name: vulnReport
          apiCall:
            # Single-line string: a folded scalar (>) would append a newline to the path.
            urlPath: "/apis/aquasecurity.github.io/v1alpha1/namespaces/{{request.namespace}}/vulnerabilityreports?labelSelector=trivy-operator.resource.kind=ReplicaSet"
            # items[0] is simply the first report returned for the namespace.
            # Filter on report.artifact to match the image actually being admitted.
            jmesPath: "items[0]"
      validate:
        message: >-
          Live Trivy scan shows critical vulnerabilities in this image.
          Patch with Copa before deploying to production.
        deny:
          conditions:
            any:
              - key: "{{ vulnReport.report.summary.criticalCount || `0` }}"
                operator: GreaterThan
                value: 0
The Trivy Operator approach provides fresher data — it reflects the actual current scan state rather than the annotation timestamp — but introduces latency because new images must complete a scan cycle before a report exists. Use a hybrid: OCI annotations for initial admission gating, Trivy Operator reports for continuous background validation.
Namespace-Based Policy Tiers
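Apply enforcement selectively using namespace labels. This prevents a single misconfigured policy from disrupting development workflows while maintaining strict controls in production. Note that the policies above match only namespaces labelled patch-sla-enforcement: strict. To get audit results for staging, deploy a second copy of the policy with validationFailureAction: Audit and a namespaceSelector matching patch-sla-enforcement: audit.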
# Production namespaces — strict enforcement
apiVersion: v1
kind: Namespace
metadata:
  name: payments-prod
  labels:
    patch-sla-enforcement: strict
    environment: production
---
# Staging namespaces: audit only (Kyverno logs but does not block)
apiVersion: v1
kind: Namespace
metadata:
  name: payments-staging
  labels:
    patch-sla-enforcement: audit
    environment: staging
Protect the enforcement label from removal. An adversary with namespace edit permissions could remove the patch-sla-enforcement: strict label to bypass the policy. A second Kyverno policy prevents this:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: protect-patch-sla-enforcement-label
spec:
  validationFailureAction: Enforce
  background: false
  rules:
    - name: block-removal-of-enforcement-label
      match:
        any:
          - resources:
              kinds: [Namespace]
              operations: [UPDATE]
              selector:
                matchLabels:
                  environment: production
      validate:
        message: >-
          Removing patch-sla-enforcement label from a production namespace is not permitted.
          Submit a change request for security team review.
        deny:
          conditions:
            any:
              - key: "{{ request.object.metadata.labels.\"patch-sla-enforcement\" || '' }}"
                operator: Equals
                value: ""
Exception Process: Time-Bound PolicyExceptions
When a legitimate exception is required — a vendor image with no available patch, a third-party component pending upstream fix — use a Kyverno PolicyException with an explicit expiry annotation rather than modifying the ClusterPolicy directly.
apiVersion: kyverno.io/v2beta1
kind: PolicyException
metadata:
  name: vendor-kafka-image-exception
  namespace: data-platform
  annotations:
    security.hardening/exception-reason: >-
      Confluent Kafka 7.6.1 image contains CVE-2026-12345 in bundled libssl.
      Vendor fix expected in 7.7.0, scheduled for release 2026-06-01.
      Compensating control: network policy restricts ingress to internal services only.
    security.hardening/approved-by: security-team@example.com
    security.hardening/approved-date: "2026-05-09"
    security.hardening/expires-at: "2026-06-15T00:00:00Z"
    security.hardening/jira-ticket: SEC-4521
spec:
  exceptions:
    - policyName: enforce-container-patch-sla
      ruleNames:
        - enforce-critical-patch-sla-24h
  match:
    any:
      - resources:
          kinds: [Pod]
          namespaces: [data-platform]
          selector:
            matchLabels:
              app: kafka-broker
A CronJob runs hourly and revokes expired exceptions by deleting PolicyException resources where the expires-at annotation is in the past:
#!/usr/bin/env bash
# expire-policy-exceptions.sh
set -euo pipefail   # fail the Job if kubectl or jq errors, so alerting can catch it

NOW=$(date -u +%Y-%m-%dT%H:%M:%SZ)

# String comparison is valid here because both timestamps use the same
# zero-padded UTC format (YYYY-MM-DDTHH:MM:SSZ).
kubectl get policyexception \
  --all-namespaces \
  -o json |
jq -r --arg now "${NOW}" '
  .items[] |
  select(
    .metadata.annotations["security.hardening/expires-at"] != null and
    .metadata.annotations["security.hardening/expires-at"] < $now
  ) |
  "\(.metadata.namespace)/\(.metadata.name)"
' | while IFS='/' read -r ns name; do
  echo "[+] Revoking expired PolicyException: ${ns}/${name}"
  kubectl delete policyexception \
    --namespace "${ns}" \
    "${name}"
done
Wrap this in a Kubernetes CronJob running with a ServiceAccount scoped to list and delete on policyexceptions:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: expire-policy-exceptions
  namespace: security-policy
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: exception-expiry-controller
          restartPolicy: OnFailure
          containers:
            - name: expiry-controller
              # The image must provide bash, kubectl, and jq for the script to run.
              image: bitnami/kubectl:1.29
              command: ["/scripts/expire-policy-exceptions.sh"]
              volumeMounts:
                - name: scripts
                  mountPath: /scripts
          volumes:
            - name: scripts
              configMap:
                name: expiry-controller-scripts
                defaultMode: 0755
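The expiry script compares timestamps as strings, which is only safe when both values share the exact zero-padded UTC format. The following sketch makes that precondition explicit. The function name `is_expired` and the choice to treat malformed values as an error (rather than as expired or not expired) are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide whether an expires-at value has passed.
# Lexicographic comparison is only safe when both values share the exact
# format YYYY-MM-DDTHH:MM:SSZ (UTC, zero-padded), so the format is checked first.

ISO_UTC_RE='^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$'

# is_expired <expires_at> <now> -> exit 0 if expired, 1 if not, 2 if malformed
is_expired() {
  local expires_at="$1" now="$2"
  if ! [[ "${expires_at}" =~ ${ISO_UTC_RE} && "${now}" =~ ${ISO_UTC_RE} ]]; then
    echo "malformed timestamp: ${expires_at} / ${now}" >&2
    return 2
  fi
  # Same-length, zero-padded timestamps sort lexicographically in time order.
  [[ "${expires_at}" < "${now}" ]]
}
```

Without the format check, a date-only value such as `2026-06-15` would sort before any full timestamp on the same day and the exception would be revoked early. Returning a distinct exit code for malformed values lets a caller alert on them instead of silently keeping or deleting the exception.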
SLA Compliance Reporting
A weekly compliance report shows which images are in each tier, how many are in breach, and which exceptions are approaching expiry. Run this as a CronJob or on-demand:
#!/usr/bin/env bash
# patch-sla-report.sh
set -euo pipefail

echo "=== Container Patch SLA Compliance Report ==="
echo "Generated: $(date -u)"
echo ""
echo "--- Images running in namespaces labelled patch-sla-enforcement ---"
# patch-sla-enforcement is a namespace label, not a pod label, so select
# the namespaces first and then list the pods inside each one.
kubectl get namespaces -l patch-sla-enforcement \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' |
while read -r ns; do
  kubectl get pods --namespace "${ns}" -o json | jq -r '
    .items[] |
    .spec.containers[] as $c |
    "\(.metadata.namespace)\t\(.metadata.name)\t\($c.image)"
  '
done

echo ""
echo "--- PolicyExceptions expiring within 7 days ---"
# GNU date first, BSD/macOS date as a fallback.
SEVEN_DAYS_FROM_NOW=$(date -u -d '+7 days' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || \
  date -u -v+7d +%Y-%m-%dT%H:%M:%SZ)
kubectl get policyexception \
  --all-namespaces \
  -o json | jq -r --arg cutoff "${SEVEN_DAYS_FROM_NOW}" '
  .items[] |
  select(
    .metadata.annotations["security.hardening/expires-at"] != null and
    .metadata.annotations["security.hardening/expires-at"] <= $cutoff
  ) |
  "EXPIRING SOON: \(.metadata.namespace)/\(.metadata.name), expires \(.metadata.annotations["security.hardening/expires-at"])"
'
Expected Behaviour
| Scenario | Kyverno Decision | Audit Log Entry |
|---|---|---|
| Image with max-cve-severity=CRITICAL, patched-at 2 hours ago, deployed to production namespace | Allow | PolicyResponse: Pass (critical SLA within 24h window) |
| Image with max-cve-severity=CRITICAL, patched-at 36 hours ago, no PolicyException | Block (Enforce mode) | PolicyResponse: Fail (enforce-critical-patch-sla-24h: patched-at exceeds 24h SLA) |
| Image with max-cve-severity=CRITICAL, patched-at 36 hours ago, valid non-expired PolicyException | Allow | PolicyResponse: Pass (PolicyException vendor-kafka-image-exception matched) |
| Image with no patched-at annotation, deployed to production namespace | Block (Enforce mode) | PolicyResponse: Fail (require-patch-annotation-in-production: annotation missing) |
| Image with max-cve-severity=HIGH, patched-at 5 days ago, deployed to production | Allow | PolicyResponse: Pass (high SLA within 7-day window) |
| Image with max-cve-severity=HIGH, patched-at 9 days ago, no PolicyException | Block (Enforce mode) | PolicyResponse: Fail (enforce-high-patch-sla-7d: patched-at exceeds 168h SLA) |
| Expired PolicyException (past expires-at), deleted by CronJob, image deployed | Block (Enforce mode) | PolicyResponse: Fail (no matching PolicyException, SLA exceeded) |
| Any image deployed to staging namespace (labelled patch-sla-enforcement: audit) | Allow with warning | PolicyResponse: Audit (would have failed enforce-critical-patch-sla-24h) |
Trade-offs
| Decision | Option A | Option B | Recommendation |
|---|---|---|---|
| Kyverno action mode | Enforce: blocks non-compliant deployments | Audit: logs violations, allows deployment | Enforce in production namespaces; audit in staging and development. Never run audit-only in production. |
| Patch state source | OCI annotations on the image (patched-at) | Live query to Trivy Operator VulnerabilityReport | Annotations are fast and portable but can be forged. Trivy Operator provides fresh data but requires scan completion before admission. Use both: annotations as the gate, Trivy Operator for background monitoring. |
| Patch annotation trust | Unsigned annotations written with crane | Cosign-signed annotations verified at admission | Signed annotations are more secure but require Kyverno cosign integration and KMS infrastructure. For regulated environments, sign. For internal clusters with strong registry access controls, unsigned is acceptable. |
| Namespace-based tiers | Per-namespace enforcement labels, allowing gradual rollout | Cluster-wide policy applying to all namespaces | Namespace tiers allow phased rollout but require label protection policies. Cluster-wide is simpler but risks disrupting non-production workloads. Namespace tiers are the pragmatic path for most organisations. |
| Exception handling | Kyverno PolicyException with expiry CronJob | Inline policy exceptions via exclude blocks in ClusterPolicy | PolicyException resources are auditable, namespaced, and deletable. Inline exclude blocks require policy changes (code review, approval), which is a feature, not a bug, but PolicyException is the right tool for temporary operational exceptions. |
| Copa trigger | On-demand per-CVE trigger from scanner webhook | Scheduled CronJob re-patching all images on a fixed interval | Webhook triggers are faster for critical CVEs but require integration plumbing. Scheduled re-patching is more reliable but may miss the 24h window for truly new criticals. Run both: webhook for critical and high, scheduled daily for medium. |
Failure Modes
| Failure | Symptom | Impact | Mitigation |
|---|---|---|---|
| Kyverno admission webhook down | Pod creation requests time out at the webhook | Depending on the failurePolicy setting, either all deployments fail (Fail) or all deployments succeed, bypassing policy (Ignore) | Set failurePolicy: Fail for production namespaces. Accept the operational impact of Kyverno unavailability as the safer default. Run Kyverno in high-availability mode (3 replicas, PodDisruptionBudget). Alert on Kyverno webhook response latency. |
| OCI annotation spoofed by attacker | Attacker with registry write access pushes image with forged patched-at timestamp newer than actual patch | Policy allows deployment of an unpatched image because annotation timestamp is within SLA | Use cosign-signed annotations. Kyverno can verify the cosign signature against a known public key or KMS key at admission time, making forgery require control of the signing key. |
| Exception CronJob fails silently | Expired PolicyException resources remain active | Expired exceptions continue to allow SLA-breaching images indefinitely | Alert on CronJob failure (last successful job time). Add a Prometheus metric or Kyverno background scan report showing PolicyExceptions past their expires-at. Consider a secondary process (security engineer weekly review) as a backstop. |
| Production namespace missing enforcement label | New namespace created without patch-sla-enforcement: strict label | Policy does not apply; production workloads run without SLA enforcement | A Kyverno policy requiring all namespaces labelled environment: production to also carry patch-sla-enforcement: strict. Enforce this at namespace creation time. |
| Image manifest lookup fails | Kyverno cannot fetch image manifest from registry (network issue, auth expiry) | Admission evaluation fails; policy falls back to denying or allowing based on error handling | Configure registry pull credentials for the Kyverno admission controller. Set registry credentials with long-lived service account tokens. Test registry connectivity from Kyverno pods. Combine with OCI annotation fallback: if the manifest lookup fails, deny unless annotation is present in the pod spec (belt-and-suspenders). |
| Copa patch introduces regression | Patched image passes SLA check but fails application tests | Production outage from bad patch | Run Copa patches through a staging validation pipeline before production. Copa patches only OS packages, not application code, which limits the regression surface. Canary deployments in production catch issues before full rollout. |
Related Articles
- Copa: Patching Distroless and Minimal Container Images — how Copa handles distroless images where traditional package managers are absent
- Copa in Kubernetes: Automated In-Cluster Patching Pipelines — running Copa as an in-cluster job triggered by the Trivy Operator
- Copa in CI/CD: Integrating Container Patch Automation into Your Pipeline — pipeline patterns for GitHub Actions, Tekton, and GitLab CI
- Kyverno Controller Security: Hardening the Policy Engine — securing the Kyverno installation itself, including RBAC, network policies, and admission webhook configuration
- Policy as Code at Scale: OPA, Rego Testing, and Enterprise Policy Libraries — managing policy libraries at enterprise scale, including testing and exception governance