Kyverno Controller Security: Hardening the Policy Engine That Enforces Your Security Policies

The Problem

Kyverno occupies the most trusted position in a Kubernetes cluster: every API request that creates or modifies a resource matched by its webhooks passes through it before the API server persists anything to etcd. When you define a ClusterPolicy that blocks privileged containers, requires image signatures, or prevents RBAC escalation, Kyverno’s admission webhook is the mechanism that gives those policies teeth. It is also the mechanism that, if broken, renders your entire policy posture meaningless — silently, without any visible indication that enforcement has stopped.

This creates a structural problem that most teams do not reason about: the system that enforces your security controls is itself a high-value attack target, and it runs with the permissions necessary to enforce policy across the entire cluster. Kyverno’s service account needs to read and list virtually every resource type to evaluate policies. Its webhook intercepts every sensitive API call. Its controller pod processes arbitrary YAML from untrusted namespaces. An attacker who compromises Kyverno does not just gain control of one pod — they gain cluster-wide API access and the ability to silently approve any resource request, including ones that all your policies should block.

The Kubernetes admission control model exposes two distinct failure modes depending on how webhooks are configured:

  • failurePolicy: Fail — if Kyverno is unreachable or returns an error, the API server rejects the request. Enforcement is maintained, but any Kyverno instability cascades into a cluster-wide denial of service for resource creation.
  • failurePolicy: Ignore — if Kyverno is unreachable or returns an error, the API server allows the request through. Enforcement silently fails. An attacker who can reliably crash the Kyverno pod, induce webhook timeouts, or force Kyverno into an error state bypasses all policies without touching them directly.

The Kyverno Helm chart installs webhooks with failurePolicy: Ignore by default for several resource types, prioritising availability. That default configuration means Kyverno crashing — for any reason, including a CVE exploit, a bug triggered by crafted input, or even a legitimate OOM kill — silently disables policy enforcement for those resources.

Attack vectors against Kyverno:

CVE in the Kyverno binary. Kyverno parses arbitrary YAML resources submitted by users in any namespace. It evaluates JMESPath expressions and CEL expressions from ClusterPolicies. Its HTTP handler processes webhook admission review requests from the API server. Any vulnerability in the YAML parser, the expression evaluator, or the HTTP layer can lead to remote code execution inside the Kyverno pod. In 2023, CVE-2023-42443 demonstrated that Kyverno’s policy evaluation path could panic on crafted input, causing a denial of service — exactly the condition that turns failurePolicy: Ignore into an open door. RCE in Kyverno gives an attacker access to every secret and resource the Kyverno service account can read, which is nearly everything in the cluster.

Webhook misconfiguration. The failurePolicy, timeoutSeconds, matchConditions, and namespaceSelector fields on a ValidatingWebhookConfiguration or MutatingWebhookConfiguration define exactly when Kyverno’s enforcement applies and what happens when it fails. Misconfiguring these fields — setting too-broad namespace exclusions, overly generous timeouts that allow crafted slow requests to accumulate and exhaust Kyverno workers, or Ignore failure policies on security-critical paths — creates enforcement gaps. These misconfigurations are often introduced during upgrades when Kyverno regenerates webhook configurations, and they are rarely audited after the fact.

Supply chain attack on the Kyverno image. Kyverno is published to ghcr.io/kyverno/kyverno. Deploying by tag (kyverno:v1.12.0) creates a mutable reference: a compromised image pushed to the same tag deploys silently on the next rollout. A malicious Kyverno image that intercepts all admission requests and returns Allowed: true regardless of policy is indistinguishable from a working Kyverno at the Kubernetes API layer — until something that should have been blocked gets through.

RBAC escalation via Kyverno’s service account. Kyverno’s ClusterRole includes get, list, and watch on nearly all resource types in the cluster. It needs create, update, and delete permissions on several resource types to generate resources and manage its own webhook configurations. An attacker who executes code inside the Kyverno pod inherits this service account token and can use it to read Secrets across all namespaces, enumerate every workload, and potentially escalate further by manipulating Kyverno’s own ClusterPolicies.
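
A quick way to gauge the blast radius of that token is to enumerate what Kyverno’s service accounts are actually granted. A minimal sketch, assuming chart-default names (the admission controller service account name varies by chart version and release name):

# ClusterRoles bound to service accounts in the kyverno namespace
kubectl get clusterrolebindings -o json | jq -r '
  .items[]
  | select(any(.subjects[]?; .kind == "ServiceAccount" and .namespace == "kyverno"))
  | "\(.metadata.name): \(.roleRef.name)"'

# Effective permissions of the admission controller service account
# (the service account name here is an assumption; adjust to your release)
kubectl auth can-i --list \
  --as=system:serviceaccount:kyverno:kyverno-admission-controller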

ClusterPolicy deletion. An attacker with sufficient RBAC permissions to delete clusterpolicies.kyverno.io resources can surgically remove specific enforcement policies — the one blocking privileged pods, the one requiring image signatures, the one preventing hostPID access — without touching Kyverno itself. With no alerting on policy deletion events, this is invisible until the next security audit.
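
Before relying on alerting alone, enumerate who can actually perform that deletion. A minimal sketch, assuming the who-can kubectl plugin is installed (available via krew); the service account checked in the second command is only an example principal:

# Principals permitted to delete Kyverno ClusterPolicies
kubectl who-can delete clusterpolicies.kyverno.io

# Spot-check a specific principal without the plugin
kubectl auth can-i delete clusterpolicies.kyverno.io \
  --as=system:serviceaccount:ci:ci-deploy-sa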

Threat Model

  • Kyverno pod compromise via RCE — attacker gains cluster-wide read access via the Kyverno service account token, can read Secrets in all namespaces, enumerate all workloads, and exfiltrate credentials. Kyverno’s generate permissions can be used to create resources across namespaces.
  • failurePolicy: Ignore + induced Kyverno crash — attacker submits crafted input that panics or OOM-kills the Kyverno pod; all resources matching Ignore-policy webhooks are admitted without policy evaluation. Privileged pods, unsigned images, and RBAC escalations proceed without denial.
  • Supply chain attack on Kyverno image — a malicious image deployed via a mutable tag returns Allowed for all admission requests regardless of ClusterPolicy content. Policy objects exist and appear correct; enforcement does not occur. Detection requires monitoring denial rates, not just policy presence.
  • ClusterPolicy deletion — attacker with clusterpolicies delete permission removes enforcement policies selectively. The cluster accepts the next privileged pod creation because the blocking policy no longer exists. No alert fires unless audit logging on Kyverno CRDs is explicitly configured.
  • Webhook configuration tampering — attacker with validatingwebhookconfigurations update permission modifies the Kyverno webhook to add namespace exclusions or change failurePolicy to Ignore. Kyverno continues running normally; enforcement is silently narrowed.

Access levels assumed: Adversary 1 (RCE path) achieves initial access via a CVE triggered from an untrusted namespace or via a compromised workload co-located with Kyverno. Adversary 2 (policy deletion / webhook tampering) has Kubernetes API access with permissions that appear legitimate but are scoped too broadly — a developer account with residual cluster-admin, a CI service account granted excessive permissions during initial setup.

Blast radius without hardening: All policy enforcement collapses. Privileged containers deploy. Unsigned images run in production. RBAC escalation proceeds unchecked. The enforcement collapse is silent unless you are monitoring Kyverno metrics for denial rate drops.

Hardening Configuration

1. Pin the Kyverno Image to a SHA Digest

A tag reference is a mutable pointer. Pin Kyverno to an immutable digest:

# Get the digest for the current release
crane digest ghcr.io/kyverno/kyverno:v1.13.4
# Output: sha256:7c9a2b4f1e8d3c5a6b0f2e4d8c1a3e5b7d9f0c2a4e6b8d0f2c4a6e8b0d2f4c6a

# Also get digests for the background controller and reports controller
crane digest ghcr.io/kyverno/background-controller:v1.13.4
crane digest ghcr.io/kyverno/reports-controller:v1.13.4
crane digest ghcr.io/kyverno/cleanup-controller:v1.13.4

In your Helm values file:

# kyverno-values.yaml
kyverno:
  image:
    repository: ghcr.io/kyverno/kyverno
    tag: ""
    digest: "sha256:7c9a2b4f1e8d3c5a6b0f2e4d8c1a3e5b7d9f0c2a4e6b8d0f2c4a6e8b0d2f4c6a"

backgroundController:
  image:
    repository: ghcr.io/kyverno/background-controller
    tag: ""
    digest: "sha256:<background-controller-digest>"

reportsController:
  image:
    repository: ghcr.io/kyverno/reports-controller
    tag: ""
    digest: "sha256:<reports-controller-digest>"

cleanupController:
  image:
    repository: ghcr.io/kyverno/cleanup-controller
    tag: ""
    digest: "sha256:<cleanup-controller-digest>"

Kyverno v1.11+ splits the monolithic controller into four separate deployments. Pin all four. Leaving the background controller or cleanup controller on a mutable tag means a supply chain attack on those images bypasses the digest protection entirely and can compromise Kyverno’s management of generated resources and policies.
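
A post-deploy check that no Kyverno container has slipped back to a tag reference catches a missed component. A minimal sketch:

# Flag any Kyverno container whose running image reference is not digest-pinned
# (extend with .spec.initContainers[*].image to also cover the kyvernopre init container)
kubectl get pods -n kyverno \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}' \
  | grep -v '@sha256:' \
  || echo "All Kyverno containers are digest-pinned"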

Use Renovate with pinDigests: true or Dependabot to track digest updates automatically:

# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "helm"
    directory: "/helm/kyverno"
    schedule:
      interval: "weekly"
    commit-message:
      prefix: "chore(kyverno)"

Renovate with pinDigests: true converts Helm image values to digest form on first run and opens PRs when new Kyverno releases publish updated digests. Review the PR before merging — automated digest updates are themselves a trust decision.
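
A minimal renovate.json sketch for that behaviour, assuming Renovate’s helm-values manager picks up the image references in kyverno-values.yaml:

{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:recommended"],
  "packageRules": [
    {
      "description": "Pin container image references, including those in Helm values, to digests",
      "matchDatasources": ["docker"],
      "pinDigests": true
    }
  ]
}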

2. Verify Kyverno Image Signatures with Cosign

Kyverno releases are signed with cosign using GitHub Actions OIDC (keyless signing). Verify the signature before deploying:

KYVERNO_IMAGE="ghcr.io/kyverno/kyverno:v1.13.4"

cosign verify \
  --certificate-identity-regexp="https://github.com/kyverno/kyverno/.*" \
  --certificate-oidc-issuer="https://token.actions.githubusercontent.com" \
  "${KYVERNO_IMAGE}"

A valid signature outputs the certificate details and the verified payload. Verification failure exits non-zero. Use this as a gate in your deployment pipeline:

#!/bin/bash
# deploy-kyverno.sh — run before helm upgrade

set -euo pipefail

KYVERNO_VERSION="v1.13.4"
IMAGES=(
  "ghcr.io/kyverno/kyverno:${KYVERNO_VERSION}"
  "ghcr.io/kyverno/background-controller:${KYVERNO_VERSION}"
  "ghcr.io/kyverno/reports-controller:${KYVERNO_VERSION}"
  "ghcr.io/kyverno/cleanup-controller:${KYVERNO_VERSION}"
  "ghcr.io/kyverno/kyvernopre:${KYVERNO_VERSION}"
)

for image in "${IMAGES[@]}"; do
  echo "Verifying: ${image}"
  cosign verify \
    --certificate-identity-regexp="https://github.com/kyverno/kyverno/.*" \
    --certificate-oidc-issuer="https://token.actions.githubusercontent.com" \
    "${image}" > /dev/null 2>&1 || {
      echo "FATAL: Signature verification failed for ${image}"
      echo "Aborting deployment."
      exit 1
    }
  echo "  OK: ${image}"
done

helm upgrade --install kyverno kyverno/kyverno \
  --namespace kyverno \
  --values kyverno-values.yaml

The certificate identity regexp matches Kyverno’s GitHub Actions signing workflow. A compromised image signed outside that workflow fails verification. A legitimate Kyverno image with a tampered digest fails cosign’s hash check before signature verification is attempted.

3. Harden the Kyverno Controller Pod

The default Kyverno Helm chart does not apply the most restrictive security context settings. Configure them explicitly:

# kyverno-values.yaml (security context and availability section)
kyverno:
  podSecurityContext:
    runAsNonRoot: true
    runAsUser: 65534
    runAsGroup: 65534
    fsGroup: 65534
    seccompProfile:
      type: RuntimeDefault

  securityContext:
    allowPrivilegeEscalation: false
    readOnlyRootFilesystem: true
    capabilities:
      drop:
        - ALL

  # Size limits to prevent memory exhaustion attacks via crafted large resources
  # Monitor container_memory_working_set_bytes and set to 2x p99 observed usage
  resources:
    limits:
      memory: 1536Mi
      cpu: "2"
    requests:
      memory: 512Mi
      cpu: 500m

  # Three replicas minimum for failurePolicy: Fail to be operationally viable
  replicaCount: 3

  # Spread replicas across nodes — all three on the same node = single point of failure
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: kyverno
          app.kubernetes.io/component: admission-controller

  # Kyverno needs writable temp space for policy compilation and cosign cache
  extraVolumes:
    - name: tmp-dir
      emptyDir: {}
    - name: sigstore-cache
      emptyDir: {}
  extraVolumeMounts:
    - name: tmp-dir
      mountPath: /tmp
    - name: sigstore-cache
      mountPath: /.sigstore

Add a PodDisruptionBudget to prevent node maintenance from draining all Kyverno replicas simultaneously:

# kyverno-pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kyverno-admission-controller
  namespace: kyverno
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: kyverno
      app.kubernetes.io/component: admission-controller

With minAvailable: 2 and three replicas, node drain during cluster upgrades will not reduce Kyverno below two ready pods. Two ready pods remain sufficient to handle admission requests, preserving both enforcement and availability during maintenance windows.

4. Enforce failurePolicy: Fail on Security-Critical Webhooks

Audit the current failurePolicy configuration for all Kyverno-managed webhooks:

# Check all Kyverno webhook configurations
for webhook_type in validating mutating; do
  echo "=== ${webhook_type} webhooks ==="
  kubectl get ${webhook_type}webhookconfigurations \
    -l "webhook.kyverno.io/managed-by=kyverno" \
    -o json | jq -r '
      .items[] | 
      .metadata.name as $name |
      .webhooks[] | 
      "\($name): \(.name) → failurePolicy=\(.failurePolicy // "nil") timeoutSeconds=\(.timeoutSeconds)"
    '
done

Identify which webhooks cover security-critical resources:

# Find webhooks matching Pod, ClusterRole, ClusterRoleBinding
kubectl get validatingwebhookconfigurations kyverno-resource-validating-webhook-cfg \
  -o json | jq '
    .webhooks[] | 
    select(
      .rules[]? | 
      .resources[]? | 
      test("pods|clusterroles|clusterrolebindings|secrets")
    ) | 
    {name: .name, failurePolicy: .failurePolicy}
  '

Set failurePolicy at the ClusterPolicy level for security-critical policies:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-privileged-containers
  annotations:
    policies.kyverno.io/title: Restrict Privileged Containers
    policies.kyverno.io/severity: high
spec:
  webhookConfiguration:
    failurePolicy: Fail
    timeoutSeconds: 10
  validationFailureAction: Enforce
  background: false
  rules:
  - name: restrict-privileged
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Privileged containers are blocked by policy restrict-privileged-containers."
      pattern:
        spec:
          containers:
          - =(securityContext):
              =(privileged): "false"
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  webhookConfiguration:
    failurePolicy: Fail
    timeoutSeconds: 15
  validationFailureAction: Enforce
  background: false
  rules:
  - name: verify-image-signature
    match:
      any:
      - resources:
          kinds:
          - Pod
    verifyImages:
    - imageReferences:
      - "ghcr.io/your-org/*"
      attestors:
      - count: 1
        entries:
        - keyless:
            subject: "https://github.com/your-org/*/.*"
            issuer: "https://token.actions.githubusercontent.com"
            rekor:
              url: https://rekor.sigstore.dev

failurePolicy: Fail on the image verification policy means that if Kyverno crashes while a pod creation is in flight, the pod is rejected. This is the correct outcome — an unsigned image that cannot be verified should not run.

5. Alert on ClusterPolicy Deletion and Webhook Configuration Changes

Add Kyverno CRD operations to your Kubernetes audit policy:

# audit-policy.yaml — add to existing audit policy
rules:
# Kyverno CRD changes — policy creation, modification, deletion
- level: RequestResponse
  verbs: ["create", "update", "patch", "delete"]
  resources:
  - group: "kyverno.io"
    resources:
    - clusterpolicies
    - policies
    - clusteradmissionpolicies
    - policyexceptions

# Webhook configuration changes — Kyverno-managed and any other
- level: RequestResponse
  verbs: ["update", "patch", "delete"]
  resources:
  - group: "admissionregistration.k8s.io"
    resources:
    - validatingwebhookconfigurations
    - mutatingwebhookconfigurations
  omitManagedFields: false

The RequestResponse level captures both the request and the full response body. For webhook configuration changes, the response includes the complete updated object — you can reconstruct exactly which fields changed, including whether failurePolicy was flipped from Fail to Ignore.
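
As an illustration of what that level buys you, a jq filter over the audit log surfaces who changed a Kyverno webhook and the failurePolicy values that resulted. A sketch, assuming audit events are written as JSON lines to /var/log/kubernetes/audit.log (the path is deployment-specific):

jq -c 'select(
    .objectRef.resource == "validatingwebhookconfigurations"
    and (.verb == "update" or .verb == "patch")
    and .stage == "ResponseComplete"
  )
  | {user: .user.username,
     webhook: .objectRef.name,
     failurePolicies: [.responseObject.webhooks[]?.failurePolicy]}' \
  /var/log/kubernetes/audit.log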

Route audit events to Falco via the k8saudit plugin:

# falco-rules-kyverno.yaml
- rule: Kyverno ClusterPolicy deleted
  desc: >
    A Kyverno ClusterPolicy was deleted. All enforcement by this policy is 
    immediately removed. Resources that were previously blocked by this policy 
    will now be admitted without restriction.
  condition: >
    ka.verb = "delete" and
    ka.target.resource = "clusterpolicies" and
    ka.target.group = "kyverno.io"
  output: >
    CRITICAL: Kyverno ClusterPolicy deleted
    (policy=%ka.target.name
     user=%ka.user.name
     groups=%ka.user.groups
     useragent=%ka.useragent
     sourceip=%ka.source.ip)
  priority: CRITICAL
  source: k8saudit
  tags: [kyverno, policy, admission-control]

- rule: Kyverno webhook configuration modified by non-Kyverno principal
  desc: >
    A Kyverno-managed ValidatingWebhookConfiguration or 
    MutatingWebhookConfiguration was modified by a principal that is not the 
    Kyverno service account. This may indicate an attempt to weaken or bypass 
    policy enforcement by changing failurePolicy, namespaceSelector, or 
    matchConditions.
  condition: >
    (ka.verb in ("update", "patch")) and
    (ka.target.resource in 
      ("validatingwebhookconfigurations", "mutatingwebhookconfigurations")) and
    ka.target.name startswith "kyverno" and
    not ka.user.name startswith "system:serviceaccount:kyverno:"
  output: >
    WARNING: Kyverno webhook configuration modified by non-Kyverno principal
    (webhook=%ka.target.name
     verb=%ka.verb
     user=%ka.user.name
     groups=%ka.user.groups
     sourceip=%ka.source.ip)
  priority: WARNING
  source: k8saudit
  tags: [kyverno, webhook, admission-control]

- rule: Kyverno PolicyException created
  desc: >
    A Kyverno PolicyException was created. PolicyExceptions exempt specific 
    resources from one or more ClusterPolicies. Review to confirm this exception 
    is authorised and scoped to the minimum necessary resources and namespaces.
  condition: >
    ka.verb = "create" and
    ka.target.resource = "policyexceptions" and
    ka.target.group = "kyverno.io"
  output: >
    INFO: Kyverno PolicyException created
    (exception=%ka.target.name
     namespace=%ka.target.namespace
     user=%ka.user.name
     groups=%ka.user.groups)
  priority: INFORMATIONAL
  source: k8saudit
  tags: [kyverno, policy-exception, admission-control]

- rule: Kyverno RBAC permissions escalated
  desc: >
    The Kyverno ClusterRole or ClusterRoleBinding was modified by a principal 
    other than the Kyverno service account. Unexpected changes to Kyverno's RBAC 
    may grant the Kyverno pod additional cluster permissions.
  condition: >
    (ka.verb in ("update", "patch")) and
    (ka.target.resource in ("clusterroles", "clusterrolebindings")) and
    ka.target.name startswith "kyverno" and
    not ka.user.name startswith "system:serviceaccount:kyverno:"
  output: >
    WARNING: Kyverno RBAC modified by non-Kyverno principal
    (resource=%ka.target.resource
     name=%ka.target.name
     user=%ka.user.name
     sourceip=%ka.source.ip)
  priority: WARNING
  source: k8saudit
  tags: [kyverno, rbac, admission-control]

The webhook modification rule excludes Kyverno service accounts — Kyverno legitimately updates its own webhook configurations during startup and upgrades. Any other principal modifying Kyverno webhooks is anomalous and worth immediate investigation.

6. Monitor Kyverno Metrics for Enforcement Degradation

A healthy Kyverno cluster shows a steady rate of admission requests, some fraction of which are denied. The denial rate dropping to zero in a busy cluster is a strong signal that enforcement has failed — either Kyverno is down, or a bypass has succeeded:

# prometheus-alerts-kyverno.yaml
groups:
- name: kyverno_health
  rules:

  # Alert if Kyverno has processed requests but denied nothing recently
  # In any active cluster, some requests fail policy validation
  # Adjust the request count threshold for your cluster's traffic volume
  - alert: KyvernoZeroDenials
    expr: |
      (
        sum(increase(kyverno_admission_requests_total[15m])) > 50
      )
      and
      (
        sum(increase(kyverno_admission_requests_total{
          resource_type="Pod",
          action="deny"
        }[15m])) == 0
      )
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Kyverno processed >50 requests but denied zero in 15 minutes"
      description: >
        Either all submitted resources are fully policy-compliant (verify manually),
        or policy enforcement has silently failed. Check:
        kubectl get clusterpolicies
        kubectl get pods -n kyverno
        kubectl top pods -n kyverno

  # p99 admission latency above 5s — webhook timeout is typically 10s
  # If latency exceeds timeout, API server applies failurePolicy
  - alert: KyvernoAdmissionLatencyHigh
    expr: |
      histogram_quantile(0.99, 
        rate(kyverno_admission_review_duration_seconds_bucket[5m])
      ) > 5
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Kyverno admission p99 latency >5s — risk of webhook timeout"
      description: >
        Kyverno admission review p99 latency is {{ $value }}s. 
        Webhook timeout is typically 10s. At timeout, the API server applies 
        failurePolicy. For Fail webhooks this rejects the request; for Ignore 
        webhooks this silently bypasses enforcement.

  # Fewer than 2 ready replicas — single pod failure makes Kyverno unavailable
  - alert: KyvernoReplicasLow
    expr: |
      kube_deployment_status_replicas_ready{
        namespace="kyverno",
        deployment=~"kyverno-admission.*"
      } < 2
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Kyverno admission controller has fewer than 2 ready replicas"
      description: >
        Ready replicas: {{ $value }}. A single pod failure now makes Kyverno 
        unavailable. For failurePolicy: Fail webhooks, this causes resource 
        creation failures cluster-wide. For failurePolicy: Ignore webhooks, 
        this silently disables enforcement.

  # Processing errors — depending on failurePolicy, may be causing silent bypasses
  - alert: KyvernoAdmissionErrors
    expr: |
      rate(kyverno_admission_requests_total{action="error"}[5m]) > 0.1
    for: 3m
    labels:
      severity: warning
    annotations:
      summary: "Kyverno processing >0.1 admission errors/second"
      description: >
        Kyverno is encountering internal errors evaluating policies. 
        Depending on webhook failurePolicy, these errors may be resulting 
        in silent bypasses for Ignore-policy webhooks.
        Errors/s: {{ $value }}

  # Background controller down — generated resources will drift from policy
  - alert: KyvernoBackgroundControllerDown
    expr: |
      kube_deployment_status_replicas_ready{
        namespace="kyverno",
        deployment="kyverno-background-controller"
      } < 1
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Kyverno background controller has no ready replicas"
      description: >
        The background controller handles generate rules and policy-triggered 
        resource creation. Its absence does not affect admission enforcement 
        but will cause generated resources (NetworkPolicies, RoleBindings created 
        by generate rules) to diverge from policy intent.

7. Validate Webhook TLS and Certificate Rotation

Kyverno’s admission webhook communicates with the API server over mTLS. Verify the TLS configuration and certificate validity:

# Check the Kyverno TLS secret's certificate validity
kubectl get secret kyverno-tls-pair -n kyverno -o json | \
  jq -r '.data["tls.crt"]' | \
  base64 -d | \
  openssl x509 -noout -dates -subject -issuer

# Verify the CA bundle in the webhook configuration has not expired
kubectl get validatingwebhookconfigurations kyverno-resource-validating-webhook-cfg \
  -o json | \
  jq -r '.webhooks[0].clientConfig.caBundle' | \
  base64 -d | \
  openssl x509 -noout -checkend 604800
# "Certificate will not expire" = good
# "Certificate WILL expire" = certificate rotation failure — webhook will break

# Check whether the CA bundle in the webhook config matches what Kyverno serves
webhook_ca=$(kubectl get validatingwebhookconfigurations \
  kyverno-resource-validating-webhook-cfg \
  -o json | jq -r '.webhooks[0].clientConfig.caBundle')
secret_ca=$(kubectl get secret kyverno-tls-pair -n kyverno \
  -o json | jq -r '.data["tls.crt"]')
[ "$webhook_ca" = "$secret_ca" ] && echo "CA bundles match" || echo "MISMATCH — webhook may fail"

For production clusters, integrate Kyverno TLS with cert-manager for reliable rotation and expiry alerting:

# cert-manager Certificate for Kyverno's webhook TLS
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: kyverno-tls
  namespace: kyverno
spec:
  secretName: kyverno-tls-pair
  duration: 8760h    # 1 year
  renewBefore: 720h  # Rotate 30 days before expiry
  subject:
    organizations:
      - kyverno
  commonName: kyverno-svc.kyverno.svc
  dnsNames:
  - kyverno-svc
  - kyverno-svc.kyverno
  - kyverno-svc.kyverno.svc
  - kyverno-svc.kyverno.svc.cluster.local
  issuerRef:
    name: cluster-internal-ca
    kind: ClusterIssuer
    group: cert-manager.io

With cert-manager managing the webhook certificate, rotation happens automatically 30 days before expiry, and cert-manager’s own metrics expose the time remaining before expiry — alertable through the standard cert-manager Prometheus rules.
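
A corresponding alert on the Kyverno webhook certificate, sketched against cert-manager’s standard expiry metric (label names can differ slightly across cert-manager versions):

# prometheus-alerts-kyverno-tls.yaml
groups:
- name: kyverno_tls
  rules:
  - alert: KyvernoWebhookCertExpiringSoon
    expr: |
      certmanager_certificate_expiration_timestamp_seconds{
        namespace="kyverno",
        name="kyverno-tls"
      } - time() < 21 * 24 * 3600
    for: 1h
    labels:
      severity: warning
    annotations:
      summary: "Kyverno webhook certificate expires in under 21 days"
      description: >
        cert-manager should have rotated this certificate 30 days before expiry
        (renewBefore: 720h). Rotation appears to have stalled. If the certificate
        expires, the API server can no longer call the Kyverno webhook and
        failurePolicy decides whether requests are rejected or silently admitted.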

Expected Behaviour

After image signature verification is integrated into the deployment pipeline: a deployment run against a Kyverno image that has not been signed by Kyverno’s GitHub Actions workflow exits non-zero before helm upgrade runs:

Verifying: ghcr.io/kyverno/kyverno:v1.13.4
FATAL: Signature verification failed for ghcr.io/kyverno/kyverno:v1.13.4
Error: no matching signatures: failed to verify certificate. Expected 
  SAN identity matching "https://github.com/kyverno/kyverno/..." 
  but found no matching entries.
Aborting deployment.

After configuring failurePolicy: Fail, if Kyverno becomes unavailable, a kubectl run nginx --image=nginx produces:

Error from server (InternalError): Internal error occurred: failed calling 
webhook "validate.kyverno.svc-fail": Post 
"https://kyverno-svc.kyverno.svc:443/validate/fail": 
dial tcp 10.96.45.12:443: connect: connection refused

The pod is not created. Enforcement is maintained at the cost of availability. The correct response is to restore Kyverno, not to switch webhooks to Ignore.

After the ClusterPolicy deletion Falco rule fires:

CRITICAL: Kyverno ClusterPolicy deleted
  policy=restrict-privileged-containers
  user=ci-deploy-sa
  groups=[system:serviceaccounts, system:authenticated]
  useragent=kubectl/v1.29.2
  sourceip=10.0.1.44

The combination of deleted policy name and actor gives you immediate context: a CI service account deleted a security-critical policy. The alert fires within seconds of the deletion — not on the next scheduled audit.

After configuring the KyvernoZeroDenials Prometheus alert: in a cluster where Kyverno normally denies several requests per hour, the alert fires within 5 minutes of a Kyverno crash paired with an Ignore-policy webhook. The 50-request threshold prevents false positives from genuinely quiet periods — the cluster must have been active recently for the alert to fire.

Trade-offs

failurePolicy: Fail for all Pod-related webhooks. If Kyverno loses all replicas — upgrade gone wrong, node failure draining the kyverno namespace, persistent OOM condition — the entire cluster stops accepting new Pods. Deployments cannot scale. Jobs cannot start. This is the correct security outcome, but it transforms a Kyverno incident from an enforcement gap into a cluster availability incident. Running failurePolicy: Fail safely requires three replicas minimum, topology spread constraints preventing all replicas from landing on the same node, a PodDisruptionBudget preventing simultaneous eviction during maintenance, and resource limits sized conservatively to prevent OOM kills.

Image digest pinning across four components. Kyverno v1.11+ splits into admission controller, background controller, reports controller, and cleanup controller — each with its own image. Digest pinning requires maintaining four digests through each Kyverno release. Automated tooling handles this, but it adds PR review overhead per release. Teams that automerge digest updates without review are reintroducing the trust problem that digest pinning solves.

KyvernoZeroDenials alert in small clusters. A cluster where legitimate policy violations genuinely never occur — a single-application deployment with all resources fully compliant — produces a persistent false-positive alert. The alert is designed for clusters with active policy violations as a baseline. Threshold tuning requires understanding your cluster’s normal violation rate before enabling the alert. Consider using a cluster-specific threshold derived from a 7-day p5 denial rate baseline.
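
One way to derive that baseline is a PromQL subquery over a week of history; a sketch to run ad hoc or as a recording rule, with the resulting value substituted into the alert threshold:

# 5th percentile of hourly denial counts over the past 7 days
quantile_over_time(0.05,
  sum(increase(kyverno_admission_requests_total{action="deny"}[1h]))[7d:1h]
)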

PolicyExceptions as a bypass mechanism. Kyverno v1.11+ supports PolicyException resources that exempt specific workloads from specific policies. In clusters with active policy engineering teams, PolicyExceptions accumulate over time. Each exception represents a deliberate reduction in coverage. Consider enforcing PolicyExceptions themselves via a Kyverno meta-policy — requiring that every PolicyException resource be created by a restricted service account, scoped to specific namespaces, and annotated with a Jira ticket reference:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-policy-exception-creation
spec:
  validationFailureAction: Enforce
  rules:
  - name: require-ticket-annotation
    match:
      any:
      - resources:
          kinds: [PolicyException]
    validate:
      message: "PolicyExceptions must include a jira-ticket annotation."
      pattern:
        metadata:
          annotations:
            security/jira-ticket: "?*"

Failure Modes

Running Kyverno with a single replica. A single Kyverno pod is a single point of failure for all admission control. Any crash — crash loop backoff, OOM kill, node failure — removes all enforcement. With failurePolicy: Ignore, enforcement is silently absent until the pod restarts. This is the most common Kyverno misconfiguration in production clusters: the Helm chart defaults to one replica for resource-constrained environments, and it is rarely changed at install time.
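
A quick check that catches this in an existing cluster (deployment names vary slightly by chart version):

# Replica counts for all Kyverno deployments; anything at 1 is a single point of failure
kubectl get deployments -n kyverno \
  -o custom-columns='NAME:.metadata.name,DESIRED:.spec.replicas,READY:.status.readyReplicas'

# Confirm a PodDisruptionBudget protects the admission controller
kubectl get pdb -n kyverno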

Not auditing failurePolicy after Kyverno upgrades. Kyverno regenerates its webhook configurations on startup. Depending on the upgrade path and Helm values, the regenerated webhooks may not preserve the failurePolicy overrides you configured. After every Kyverno upgrade, run the audit command from Section 4 to verify the failurePolicy configuration is intact. This is not a theoretical risk — Kyverno upgrades that changed webhook management behaviour between minor versions have silently reset failurePolicy in the wild.

Excluding the kyverno namespace from all Kyverno policies. The standard Kyverno installation excludes the kyverno namespace from policy enforcement to prevent recursive webhook calls. This is necessary for Kyverno to function, but it means that resources in the kyverno namespace — including the Kyverno pods themselves and any attacker-deployed resources — are not subject to your ClusterPolicies. If an attacker compromises the kyverno namespace, they can create privileged pods, unprotected deployments, and credential-extracting resources without triggering any policy denial. Verify the namespace exclusion is scoped to the minimum necessary resources rather than a blanket exclusion of all policy types from the entire namespace.
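
To see how broad the exclusion actually is, inspect both the webhook namespaceSelector and Kyverno’s resourceFilters. A sketch, assuming the chart-default ConfigMap name (kyverno) and webhook labels:

# namespaceSelector on each Kyverno-managed webhook; look for blanket exclusions
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations \
  -l webhook.kyverno.io/managed-by=kyverno -o json \
  | jq '.items[].webhooks[] | {name: .name, namespaceSelector: .namespaceSelector}'

# resourceFilters in the Kyverno ConfigMap: entries listed here are never evaluated by any policy
kubectl get configmap kyverno -n kyverno \
  -o jsonpath='{.data.resourceFilters}'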

Trusting policy existence as proof of enforcement. The most operationally dangerous failure mode is organisational: running kubectl get clusterpolicies and seeing all expected policies listed is treated as confirmation that enforcement is working. Policies can exist while enforcement is bypassed via failurePolicy: Ignore, webhook misconfiguration, or a Kyverno image that silently returns Allow for all requests. Periodic enforcement validation — intentionally submitting a policy-violating resource in a non-production namespace and confirming it is rejected — is the only reliable test that the full enforcement chain is functioning end to end. Automate this test in your cluster health check suite alongside readiness probe checks.
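
A minimal canary sketch for that validation, assuming a dedicated policy-canary namespace exists and that the privileged-container policy from Section 4 is deployed in Enforce mode. Server-side dry run exercises admission webhooks without persisting anything:

#!/bin/bash
# kyverno-enforcement-canary.sh: fails loudly if a privileged pod would be admitted
set -euo pipefail

CANARY_NS="${CANARY_NS:-policy-canary}"

if kubectl apply --dry-run=server -n "${CANARY_NS}" -f - <<'EOF' >/dev/null 2>&1
apiVersion: v1
kind: Pod
metadata:
  name: kyverno-canary-privileged
spec:
  containers:
  - name: canary
    image: registry.k8s.io/pause:3.9
    securityContext:
      privileged: true
EOF
then
  echo "FAIL: privileged canary pod was admitted; Kyverno enforcement is not working"
  exit 1
else
  echo "OK: privileged canary pod was rejected by admission control"
fi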