Kyverno Policy Development and Testing: Validate, Mutate, and Generate
Problem
Kubernetes admission control stops misconfigured workloads at deployment time. Kyverno implements admission control as YAML policies — no Rego, no webhooks to write, no custom controllers. A security engineer who understands Kubernetes YAML can write Kyverno policies without learning a new programming language.
But effective Kyverno policy authorship requires understanding several non-obvious behaviours:
- Audit vs Enforce. A ClusterPolicy with `validationFailureAction: Audit` logs violations but never blocks anything. Teams often deploy in Audit mode and never flip to Enforce, so the policy provides no actual protection.
- Pattern matching subtleties. Kyverno patterns match using an anchor system. `(key): value` is a conditional anchor (the sibling checks apply only when the key matches), `=(key): value` is an equality anchor (the value is checked only if the key exists), and `X(key)` is a negation anchor (the key must not be present). Getting these wrong produces silent non-enforcement.
- Context and variables. Policies that reference `request.object` vs `request.oldObject` behave differently on creates vs updates. Policies that use `context` entries for external data lookups require understanding Kyverno’s JMESPath implementation.
- Mutation ordering. Multiple mutate rules apply in sequence. A later rule can undo what an earlier one set. Order matters.
- Testing is skipped. Most teams write policies and test by trying to deploy a bad workload manually. This misses edge cases, doesn’t catch regressions when policies are updated, and doesn’t verify that the policy fires for all intended resource types.
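Anchor behaviour is easiest to internalise with a toy model. The Python sketch below is an illustration of the matching semantics, not Kyverno's actual engine: a plain pattern key requires the field to be present and match, while an equality anchor `=(key)` constrains the value only when the key exists.

```python
# Toy model of Kyverno pattern matching (illustration only, not Kyverno code):
# a plain key must be present and match; an "=(key)" equality anchor is
# checked only when the key exists.

def matches(pattern, resource):
    """Return True if `resource` satisfies `pattern` under these toy rules."""
    for key, expected in pattern.items():
        if key.startswith("=(") and key.endswith(")"):
            real_key = key[2:-1]
            if real_key not in resource:
                continue                     # anchored: absent key passes
            actual = resource[real_key]
        else:
            if key not in resource:
                return False                 # plain key: must be present
            actual = resource[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual != expected:
            return False
    return True

anchored = {"=(securityContext)": {"=(privileged)": False}}
plain    = {"securityContext": {"privileged": False}}

no_ctx     = {}                                        # securityContext omitted
explicit   = {"securityContext": {"privileged": False}}
privileged = {"securityContext": {"privileged": True}}

print(matches(anchored, no_ctx))      # True  -- anchor tolerates absence
print(matches(anchored, privileged))  # False -- present and wrong: blocked
print(matches(plain, no_ctx))         # False -- plain key demands presence
```

The last case is the over-blocking failure: an un-anchored `securityContext` in a validate pattern rejects every pod that simply omits the field.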
Target systems: Kyverno 1.12+; Kubernetes 1.28+; Chainsaw 0.1.9+ (Kyverno’s E2E testing tool); kyverno CLI 1.12+ (local policy testing without a cluster).
Threat Model
- Adversary 1 — Policy in Audit mode only: A policy exists to block privileged containers. It was deployed in Audit mode for testing and never switched to Enforce. A developer deploys a privileged container; the policy logs a violation but the container runs.
- Adversary 2 — Pattern mismatch bypasses policy: A validate policy checks `spec.containers[*].securityContext.privileged == false`. An attacker deploys a pod with `securityContext.privileged` unset (absent, not false). The pattern match fails to catch the omission.
- Adversary 3 — Kyverno webhook unavailable: The Kyverno admission webhook is down during a deployment. If `failurePolicy: Ignore` is set, all admission requests are allowed through. The policy provides no protection during Kyverno downtime.
- Adversary 4 — Namespace exemption too broad: A policy excludes namespaces labelled `policy-exempt: true`. A developer adds that label to a production namespace to unblock a deployment, permanently exempting it.
- Adversary 5 — Generate policy creates overpermissive defaults: A Kyverno generate rule creates a default NetworkPolicy for new namespaces. The generated policy is overly permissive; all new namespaces inherit a weak posture.
- Access level: Adversaries 1–4 have developer access to deploy workloads or modify namespace labels. Adversary 5 is a Kyverno misconfiguration affecting all new namespaces.
- Objective: Deploy non-compliant workloads, bypass admission controls, inherit weak security defaults.
- Blast radius: An audit-mode-only policy provides zero protection. A misconfigured pattern match leaves the intended gap open. Kyverno in Ignore failure mode means any Kyverno outage = open admission.
Configuration
Step 1: Install Kyverno with HA and Fail-Closed
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
# admissionController.replicas=3          -> HA: 3 admission controller replicas.
# webhookFailurePolicy=Fail               -> CRITICAL: fail closed if Kyverno is unavailable.
# webhookTimeout=15                       -> 15s webhook timeout before failing closed.
# features.policyExceptions.enabled=true  -> enable structured PolicyExceptions.
helm install kyverno kyverno/kyverno \
  --namespace kyverno --create-namespace \
  --set admissionController.replicas=3 \
  --set backgroundController.replicas=2 \
  --set webhookFailurePolicy=Fail \
  --set webhookTimeout=15 \
  --set "features.policyExceptions.enabled=true"
webhookFailurePolicy=Fail means if Kyverno’s webhook is unreachable, admission requests are denied rather than allowed. This is the secure default for production.
Verify the webhook is configured correctly:
kubectl get validatingwebhookconfigurations kyverno-resource-validating-webhook-cfg \
-o jsonpath='{.webhooks[0].failurePolicy}'
# Expected: Fail
Step 2: Validate Policy — Blocking Privileged Containers
A complete, production-grade validate policy catches both explicit and implicit privileged access:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
  annotations:
    policies.kyverno.io/title: Disallow Privileged Containers
    policies.kyverno.io/description: >
      Privileged containers share the host kernel namespaces and have full
      access to host resources. This policy blocks privileged containers
      and privilege escalation.
spec:
  validationFailureAction: Enforce # Not Audit — actually blocks.
  background: true # Also report on existing resources.
  rules:
  - name: no-privileged
    match:
      any:
      - resources:
          kinds: [Pod]
    exclude:
      any:
      - resources:
          namespaces: [kyverno, kube-system] # Minimal exemptions only.
    validate:
      message: "Privileged containers are not allowed."
      pattern:
        spec:
          containers:
          # =(key) is an equality anchor: the value is checked only when
          # the key exists; an absent key passes. A plain (un-anchored) key
          # would instead require securityContext on every container.
          - =(securityContext):
              =(privileged): "false | null"
              =(allowPrivilegeEscalation): "false | null"
          # Anchor the optional lists too, or the pattern would require
          # every pod to have initContainers/ephemeralContainers.
          =(initContainers):
          - =(securityContext):
              =(privileged): "false | null"
              =(allowPrivilegeEscalation): "false | null"
          =(ephemeralContainers):
          - =(securityContext):
              =(privileged): "false | null"
  - name: require-drop-all
    match:
      any:
      - resources:
          kinds: [Pod]
    exclude:
      any:
      - resources:
          namespaces: [kyverno, kube-system]
    validate:
      message: "Containers must drop ALL capabilities."
      deny:
        conditions:
          any:
          - key: "{{ request.object.spec.containers[].securityContext.capabilities.drop[] | contains(@, 'ALL') }}"
            operator: Equals
            value: false
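The `require-drop-all` deny condition relies on JMESPath flattening. The pure-Python sketch below (an illustration, not Kyverno's evaluator) shows what the expression computes, and a subtlety: because all containers' `drop` lists are flattened into one array, a single container dropping ALL satisfies the check even when a sibling container does not.

```python
# What containers[].securityContext.capabilities.drop[] | contains(@, 'ALL')
# computes, written out in plain Python: flatten every container's drop list
# into one array, then ask whether 'ALL' appears anywhere in it.

def flattened_drop_contains_all(pod_spec):
    flattened = []
    for c in pod_spec.get("containers", []):
        drop = (c.get("securityContext", {})
                 .get("capabilities", {})
                 .get("drop", []))
        flattened.extend(drop)
    return "ALL" in flattened

good = {"containers": [{"securityContext": {"capabilities": {"drop": ["ALL"]}}}]}
bad  = {"containers": [{"securityContext": {}}]}
mixed = {"containers": [
    {"securityContext": {"capabilities": {"drop": ["ALL"]}}},
    {"securityContext": {}},   # this sibling drops nothing...
]}

print(flattened_drop_contains_all(good))   # True
print(flattened_drop_contains_all(bad))    # False -> deny condition fires
print(flattened_drop_contains_all(mixed))  # True  -> the sibling slips through
```

A per-container check (for example, a foreach validation over `request.object.spec.containers`) is stricter than the flattened expression and closes that gap.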
Step 3: Mutate Policy — Enforcing Defaults
Mutate policies add or overwrite fields at admission time, ensuring defaults are applied even when developers omit them:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-security-context
spec:
  rules:
  - name: set-security-defaults
    match:
      any:
      - resources:
          kinds: [Pod]
          namespaces: [production, staging]
    mutate:
      patchStrategicMerge:
        spec:
          securityContext:
            # +(key) is an add anchor: set the field only when the workload
            # omits it. A plain key would overwrite developer-set values
            # on every admission.
            +(runAsNonRoot): true
            +(seccompProfile):
              type: RuntimeDefault
          containers:
          # (name): "*" is a conditional anchor: apply to all containers.
          - (name): "*"
            securityContext:
              +(allowPrivilegeEscalation): false
              +(readOnlyRootFilesystem): true
              +(capabilities):
                drop: [ALL]
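The intended merge behaviour (fill in defaults without clobbering values the developer set deliberately) can be sketched in plain Python. This is an illustration of the intent, not Kyverno's strategic-merge implementation:

```python
# Sketch of "set a default only when the field is absent" merge semantics
# (illustration only; Kyverno's strategic merge is more elaborate).

def apply_defaults(defaults, spec):
    """Recursively fill in defaults without overwriting developer-set values."""
    out = dict(spec)
    for key, value in defaults.items():
        if key not in out:
            out[key] = value
        elif isinstance(value, dict) and isinstance(out[key], dict):
            out[key] = apply_defaults(value, out[key])
        # else: keep the value the developer set
    return out

defaults = {"securityContext": {"runAsNonRoot": True,
                                "seccompProfile": {"type": "RuntimeDefault"}}}
spec = {"securityContext": {"runAsNonRoot": False}}   # developer opted out

merged = apply_defaults(defaults, spec)
print(merged["securityContext"]["runAsNonRoot"])           # False (kept)
print(merged["securityContext"]["seccompProfile"]["type"]) # RuntimeDefault (added)
```

The design point: a mutate policy that silently overwrites explicit values hides developer intent and is harder to debug than one that only fills gaps.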
Mutate with JMESPath for conditional logic:
  - name: add-resource-limits-if-missing
    match:
      any:
      - resources:
          kinds: [Pod]
    preconditions:
      any:
      # Only run the rule if at least one container is missing limits.
      - key: "{{ request.object.spec.containers[?!resources.limits] | length(@) }}"
        operator: GreaterThan
        value: 0
    mutate:
      foreach:
      - list: "request.object.spec.containers"
        # Per-element precondition: skip containers that already set limits,
        # so existing limits are never overwritten.
        preconditions:
          any:
          - key: "{{ element.resources.limits == `null` }}"
            operator: Equals
            value: true
        patchStrategicMerge:
          spec:
            containers:
            - name: "{{ element.name }}"
              resources:
                limits:
                  memory: "512Mi"
                  cpu: "500m"
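The rule-level precondition's JMESPath filter deserves a close look: `containers[?!resources.limits]` keeps only containers whose `resources.limits` is absent or empty, and `length(@)` counts them. A plain-Python equivalent (a sketch, not Kyverno's JMESPath engine):

```python
# Plain-Python equivalent of the JMESPath filter
# containers[?!resources.limits] | length(@): count containers whose
# resources.limits is missing or empty.

def containers_missing_limits(pod_spec):
    return [c for c in pod_spec.get("containers", [])
            if not c.get("resources", {}).get("limits")]

spec = {"containers": [
    {"name": "app"},                                    # no limits at all
    {"name": "sidecar",
     "resources": {"limits": {"memory": "256Mi"}}},     # limits already set
]}

missing = containers_missing_limits(spec)
print(len(missing))        # 1 -> the "GreaterThan 0" precondition is satisfied
print(missing[0]["name"])  # app
```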
Step 4: Generate Policy — Secure Defaults for New Namespaces
Generate policies create resources in response to other resource creation:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-default-network-policy
spec:
  rules:
  - name: default-deny-all
    match:
      any:
      - resources:
          kinds: [Namespace]
    exclude:
      any:
      - resources:
          names: [kube-system, kyverno, monitoring]
    generate:
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      name: default-deny-all
      namespace: "{{ request.object.metadata.name }}"
      synchronize: true # Keep generated resource in sync if policy changes.
      data:
        spec:
          podSelector: {} # Applies to all pods in the namespace.
          policyTypes: [Ingress, Egress]
          # Empty Ingress/Egress = deny all by default.
  - name: default-allow-dns
    match:
      any:
      - resources:
          kinds: [Namespace]
    exclude:
      any:
      - resources:
          names: [kube-system, kyverno, monitoring]
    generate:
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      name: allow-dns-egress
      namespace: "{{ request.object.metadata.name }}"
      synchronize: true
      data:
        spec:
          podSelector: {}
          policyTypes: [Egress]
          egress:
          - to:
            - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: kube-system
            ports:
            - port: 53
              protocol: UDP
Step 5: Testing with the Kyverno CLI
Test policies without a cluster:
# Install Kyverno CLI.
curl -LO https://github.com/kyverno/kyverno/releases/latest/download/kyverno_linux_amd64.tar.gz
tar xzf kyverno_linux_amd64.tar.gz && mv kyverno /usr/local/bin/
# Test a policy against a resource file.
kyverno apply disallow-privileged.yaml --resource pod-privileged.yaml
# Output: PASS count: 0, FAIL count: 1
# [pod-privileged] policy/disallow-privileged-containers/no-privileged FAIL
# Test against a directory of resources.
kyverno apply policies/ --resource resources/
# Test with generate (shows what would be created).
kyverno apply generate-netpol.yaml --resource namespace.yaml
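Ad-hoc `kyverno apply` runs do not catch regressions. The CLI also supports declarative test manifests, run with `kyverno test`, that assert the expected result per policy, rule, and resource. A sketch, where the policy and resource filenames and resource names are assumptions matching the examples above:

```yaml
# kyverno-test.yaml — run with: kyverno test .
# Filenames and resource names below are illustrative assumptions.
apiVersion: cli.kyverno.io/v1alpha1
kind: Test
metadata:
  name: disallow-privileged-tests
policies:
- disallow-privileged.yaml
resources:
- pod-privileged.yaml
- pod-compliant.yaml
results:
- policy: disallow-privileged-containers
  rule: no-privileged
  resources: [pod-privileged]
  result: fail
- policy: disallow-privileged-containers
  rule: no-privileged
  resources: [pod-compliant]
  result: pass
```

Because the expected results are committed alongside the policies, a policy change that silently stops blocking the bad pod fails the test instead of shipping.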
Step 6: Integration Testing with Chainsaw
Chainsaw runs Kyverno policies against a real cluster with declarative test scenarios:
# Install Chainsaw.
go install github.com/kyverno/chainsaw@latest
# Directory structure for a policy test.
# tests/
# disallow-privileged/
# chainsaw-test.yaml
# manifests/
# bad-pod.yaml (should be blocked)
# good-pod.yaml (should be allowed)
# tests/disallow-privileged/chainsaw-test.yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: disallow-privileged-containers
spec:
  steps:
  - name: apply-policy
    try:
    - apply:
        file: ../../../../policies/disallow-privileged.yaml
  - name: test-privileged-pod-blocked
    try:
    - apply:
        file: manifests/bad-pod.yaml
        expect:
        - match:
            apiVersion: v1
            kind: Pod
          check:
            ($error != null): true # Expect the apply to fail.
  - name: test-compliant-pod-allowed
    try:
    - apply:
        file: manifests/good-pod.yaml
    - assert:
        file: manifests/good-pod.yaml
  - name: cleanup
    try:
    - delete:
        file: ../../../../policies/disallow-privileged.yaml
# Run all tests.
chainsaw test tests/
# Output:
# Running tests...
# PASS: disallow-privileged-containers/test-privileged-pod-blocked
# PASS: disallow-privileged-containers/test-compliant-pod-allowed
Add Chainsaw tests to CI:
# .github/workflows/kyverno-test.yml
name: Kyverno Policy Tests
on:
  push:
    paths: ["kyverno/policies/**", "kyverno/tests/**"]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Create kind cluster
        uses: helm/kind-action@v1
      - name: Install Kyverno
        run: |
          helm repo add kyverno https://kyverno.github.io/kyverno/
          helm install kyverno kyverno/kyverno -n kyverno --create-namespace \
            --set webhookFailurePolicy=Fail
      - uses: actions/setup-go@v5
      - name: Install Chainsaw
        run: go install github.com/kyverno/chainsaw@latest
      - name: Run Chainsaw tests
        run: chainsaw test kyverno/tests/ --parallel 4
Step 7: Policy Exceptions — Structured Override
Never use broad namespace exemptions. Use PolicyException for specific, auditable overrides:
# PolicyException: allow a specific legacy workload to use a privileged container.
apiVersion: kyverno.io/v2beta1
kind: PolicyException
metadata:
name: legacy-monitoring-agent-exception
namespace: monitoring # Exception is namespace-scoped.
annotations:
# Justification is mandatory — document why the exception exists.
policy.exception/justification: >
The legacy-monitoring-agent requires host PID access for metrics collection.
Migration to the new agent is tracked in ticket MON-1234.
Exception expires: 2026-12-01.
policy.exception/approved-by: "security-team@example.com"
policy.exception/expires: "2026-12-01"
spec:
exceptions:
- policyName: disallow-privileged-containers
ruleNames:
- no-privileged
match:
any:
- resources:
kinds: [Pod]
names: [legacy-monitoring-agent-*]
namespaces: [monitoring]
Audit all PolicyExceptions periodically:
kubectl get policyexceptions -A -o json | jq '
.items[] | {
name: .metadata.name,
namespace: .metadata.namespace,
policy: .spec.exceptions[].policyName,
justification: .metadata.annotations["policy.exception/justification"],
expires: .metadata.annotations["policy.exception/expires"]
}'
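The `expires` annotation is only useful if something checks it. Below is a minimal Python sketch of a scheduled audit job that flags expired exceptions; it assumes the annotation keys used in the example above and takes the JSON produced by `kubectl get policyexceptions -A -o json` as input:

```python
# Sketch of an exception-expiry audit. Annotation keys follow the
# PolicyException example above; input is `kubectl ... -o json` output.
import json
from datetime import date

def expired_exceptions(kubectl_json, today=None):
    """Return names of PolicyExceptions whose expiry date has passed."""
    today = today or date.today()
    expired = []
    for item in json.loads(kubectl_json).get("items", []):
        annotations = item.get("metadata", {}).get("annotations", {})
        expires = annotations.get("policy.exception/expires")
        if expires and date.fromisoformat(expires) < today:
            expired.append(item["metadata"]["name"])
    return expired

sample = json.dumps({"items": [
    {"metadata": {"name": "legacy-monitoring-agent-exception",
                  "annotations": {"policy.exception/expires": "2026-12-01"}}},
    {"metadata": {"name": "old-exception",
                  "annotations": {"policy.exception/expires": "2024-01-01"}}},
]})

print(expired_exceptions(sample, today=date(2026, 6, 1)))  # ['old-exception']
```

Run this weekly from a CronJob or CI schedule and open a ticket (or delete the exception) for every name it returns.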
Step 8: Telemetry
kyverno_policy_results_total{policy, rule, resource_namespace, result} counter
kyverno_admission_requests_total{resource_kind, operation, result} counter
kyverno_policy_execution_duration_seconds{policy, rule} histogram
kyverno_exceptions_used_total{policy, exception} counter
kyverno_controller_reconcile_errors_total counter
Alert on:
- `kyverno_policy_results_total{result="fail"}` in namespaces where Enforce is expected — someone deployed a non-compliant resource; investigate.
- `kyverno_controller_reconcile_errors_total` non-zero — the Kyverno controller is failing; policies may not be applied correctly.
- `kyverno_exceptions_used_total` increasing — exceptions are accumulating; trigger a review.
- Kyverno pods `NotReady` — webhook unavailable; with `failurePolicy=Fail`, all admission requests are blocked; urgent.
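These alerts can be codified as Prometheus alerting rules. The sketch below assumes kube-state-metrics is available and that the admission controller Deployment is named `kyverno-admission-controller`; thresholds, durations, and selectors are assumptions to adapt to your environment:

```yaml
# Sketch of Prometheus alerting rules for the signals above.
# Deployment name and thresholds are illustrative assumptions.
groups:
- name: kyverno-alerts
  rules:
  - alert: KyvernoEnforcePolicyFailures
    expr: increase(kyverno_policy_results_total{result="fail"}[5m]) > 0
    labels:
      severity: warning
    annotations:
      summary: Non-compliant resource attempted where Enforce is expected.
  - alert: KyvernoWebhookUnavailable
    expr: kube_deployment_status_replicas_available{namespace="kyverno",deployment="kyverno-admission-controller"} < 1
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: Kyverno admission controller unavailable; with failurePolicy=Fail, all admission is blocked.
```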
Expected Behaviour
| Signal | No Kyverno | Kyverno Audit mode | Kyverno Enforce mode |
|---|---|---|---|
| Privileged container deployed | Allowed | Allowed; violation logged | Blocked; error returned to kubectl |
| Missing resource limits | Allowed | Allowed (violation logged if validate rule) | Set to defaults by mutate rule |
| New namespace created | No default NetworkPolicy | No default NetworkPolicy | NetworkPolicy generated automatically |
| Kyverno webhook down | N/A | All requests allowed (Ignore failurePolicy) | All requests denied (Fail failurePolicy) |
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| `webhookFailurePolicy=Fail` | No bypass during Kyverno downtime | Kyverno downtime blocks all admission | Run 3 HA replicas; monitor Kyverno pod health with priority alerting. |
| `validationFailureAction: Enforce` | Actually blocks non-compliant workloads | Breaks existing non-compliant workloads on first deploy | Use Audit mode to discover violations; fix before switching to Enforce. |
| Generate with `synchronize: true` | Generated resources stay in sync with policy | Policy change propagates to all existing namespaces | Test policy changes in staging; use `synchronize: false` for break-glass scenarios. |
| PolicyException over namespace exemption | Auditable; specific; time-bounded | More effort than adding a namespace label | The overhead is the point — exceptions should require effort. |
| Chainsaw E2E tests | Catch regressions in policy logic | Requires a cluster (even kind) | Use kind in CI; fast spin-up (~60s). |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Kyverno pod OOMKilled | Webhook becomes unavailable | `kyverno_controller_reconcile_errors_total`; pods restarting | Increase Kyverno memory limits; large clusters need more resources. |
| Pattern anchor mismatch | Policy does not block the intended configuration | Chainsaw test reveals the gap | Fix the anchor syntax; use `kyverno apply` locally to test specific resources. |
| `synchronize: true` deletes a manually created resource | A manually created NetworkPolicy is deleted when Kyverno generates one | Resource disappears; application connectivity breaks | Check `synchronize: true` policies after any generate-policy change. |
| Exception not expired when due | Expired exception still grants access | PolicyException list shows a past `expires` annotation | Automate exception expiry checks via a weekly job; delete or renew exceptions on schedule. |
| Mutate rule breaks existing workload | Pod fails health check after mutation changes a setting | Pod fails; logs show unexpected configuration | Add preconditions to the mutate rule; only apply when a specific condition is met. |