Kubernetes Structured Authorization Config: Hardening Multi-Webhook Auth Chains
The Problem
Before the structured authorization configuration, the API server's authorizer chain was set with a flat --authorization-mode flag, for example Node,RBAC,Webhook. The order was fixed at startup, each request was passed to the authorizers in turn until one returned Allow or Deny, only a single webhook could be configured, and how a webhook failure was handled could not be configured. There was no way to express “run this webhook only for requests in the kube-system namespace” or “skip the external PAM webhook for service accounts” without building that logic into the webhook itself.
The Structured Authorization Config API (KEP-3221; alpha in 1.29, beta and enabled by default in 1.30, GA in 1.32) replaces this with a YAML file in which each authorizer can carry matchConditions written in CEL, the same expression language used by ValidatingAdmissionPolicies. This is powerful, but it introduces classes of misconfiguration bugs that didn't exist before:
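Silent bypass via CEL condition gap. A webhook whose matchConditions evaluate to false for a request is skipped, exactly as if it had returned NoOpinion, and the request falls through to the next authorizer, typically RBAC. If an expression is slightly wrong, the gap is invisible. For example, request.user.startsWith("system:serviceaccounts:prod:") written when the intent was “all service accounts in the prod namespace” never matches, because service account usernames use the singular prefix system:serviceaccount:<namespace>:<name> (system:serviceaccounts:prod is the group name). Every prod service account request then skips the webhook, and depending on RBAC configuration, unauthorized access can pass silently.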
Chain ordering confusion. The structured config evaluates authorizers in declaration order, and the first Allow or Deny ends evaluation. Teams migrating from the old flag-based config sometimes inadvertently reorder authorizers, placing a permissive webhook before RBAC and granting access that RBAC alone would not have granted.
Failure policy interaction. Each webhook in the chain can set failurePolicy: Deny or failurePolicy: NoOpinion. With NoOpinion, a webhook failure is treated as no opinion and the chain continues, which is safe only if later authorizers are strict. With Deny, a failure stops the chain and rejects the request. Teams frequently set every webhook to Deny as a “safe default”, then find that a webhook outage rejects all routine API traffic routed to that webhook, not only the sensitive requests it was meant to guard.
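Stale authorizer configs. The structured config is a file that the API server watches and reloads automatically when it changes. An update that fails validation (for example a CEL expression that does not compile) is rejected on reload, so the API server silently keeps running the previous, now stale config, and the same file stops the API server from starting at its next restart. An expression that compiles but matches the wrong requests is loaded without complaint.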
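The bypass and failure-policy problems above both come from how the API server combines matchConditions results for a single webhook. The following is a hedged model in bash, not API server code: the function name webhook_dispatch is hypothetical, and the rules it encodes follow the AuthorizationConfiguration reference documentation (any false condition skips the webhook, all true calls it, and an error with no false result is handed to failurePolicy).

```bash
# Model of matchConditions dispatch for one webhook authorizer (hypothetical helper).
# Arguments: failurePolicy (Deny|NoOpinion), then one result per condition:
# true | false | error. An empty condition list matches every request.
webhook_dispatch() {
  local failure_policy=$1; shift
  local saw_error=0 result
  for result in "$@"; do
    case $result in
      false) echo "skip"; return 0 ;;   # any FALSE skips the webhook (NoOpinion)
      error) saw_error=1 ;;             # remembered; a later FALSE still wins
      true) ;;
      *) echo "invalid result: $result" >&2; return 2 ;;
    esac
  done
  if (( saw_error )); then
    # Errors with no FALSE result: failurePolicy decides
    if [[ $failure_policy == "Deny" ]]; then echo "deny"; else echo "skip"; fi
    return 0
  fi
  echo "call"                           # all TRUE: the webhook is called
}

webhook_dispatch Deny true error        # deny: a runtime CEL error fails closed
webhook_dispatch NoOpinion true error   # skip: the same error silently bypasses the webhook
webhook_dispatch Deny error false       # skip: a FALSE condition wins over an error
```

The NoOpinion row is the dangerous one: a condition that errors at runtime is indistinguishable, from the request's point of view, from a condition that deliberately excluded it.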
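Target systems: Kubernetes 1.30+ API servers started with --authorization-config (apiVersion apiserver.config.k8s.io/v1beta1 on 1.30 and 1.31, v1 from 1.32; the examples below use v1); clusters migrating from the legacy --authorization-mode flag.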
Threat Model
1. Insider attacker with limited RBAC (authenticated cluster user, no cluster-admin). Objective: craft a request that matches a CEL condition gap in the authorization chain, bypassing a restrictive external webhook and relying on a permissive RBAC binding. Impact: privilege escalation without triggering the external webhook’s audit log.
2. Webhook operator with webhook service write access (compromised webhook deployment). Objective: return NoOpinion for all requests to force fall-through to a permissive RBAC binding installed as a backdoor. Impact: cluster-wide authorization bypass persists until the webhook’s responses are audited.
3. Platform engineer with API server config write access (misconfiguration, not malice). Scenario: pushes an authorization config update with a faulty CEL condition. A syntax error is rejected on reload, leaving the old config running and breaking the next API server restart; an expression that compiles but matches the wrong requests is loaded and takes effect immediately. Impact: denial of service to cluster operations or unauthorized access, depending on the failure direction.
4. External attacker with stolen service account token (network access to API server). Objective: find a service account whose requests match a CEL gap that skips all restrictive webhooks, then escalate from there. Impact: lateral movement within the cluster from a compromised pod.
Blast radius without hardening: a single CEL expression error in a matchConditions block can silently bypass an entire authorizer for all requests that don’t match, giving adversaries a structural bypass window that survives API server restarts.
Hardening Configuration
Structured Authorization Config File Layout
Create /etc/kubernetes/authorization-config.yaml (path referenced by --authorization-config on kube-apiserver):
```yaml
apiVersion: apiserver.config.k8s.io/v1   # v1beta1 on Kubernetes 1.30 and 1.31
kind: AuthorizationConfiguration
authorizers:
  # 1. Node authorizer: must be first; handles kubelet credential scope
  - type: Node
    name: node
  # 2. External policy webhook (OPA, Kyverno-authz, etc.)
  - type: Webhook
    name: opa-authz
    webhook:
      authorizedTTL: 5m
      unauthorizedTTL: 30s
      timeout: 3s
      subjectAccessReviewVersion: v1
      # Required whenever matchConditions are set
      matchConditionSubjectAccessReviewVersion: v1
      # Fail CLOSED: if the webhook is unavailable, deny the request
      failurePolicy: Deny
      connectionInfo:
        type: KubeConfigFile
        kubeConfigFile: /etc/kubernetes/opa-authz-kubeconfig.yaml
      # All conditions must be true for OPA to be called. A condition that
      # errors (e.g. reading resourceAttributes on a non-resource request)
      # triggers failurePolicy, so guard optional fields with has().
      matchConditions:
        # Skip OPA for system components (kube-controller-manager, scheduler,
        # nodes), which rely on Node/RBAC only. Note: this also skips
        # system:anonymous, which then falls through to RBAC alone.
        - expression: >
            !(request.user.startsWith("system:") &&
              !request.user.startsWith("system:serviceaccount:"))
        # Skip OPA for cluster-scoped reads of namespaces and nodes
        # (reduces latency on a hot path)
        - expression: >
            !(has(request.resourceAttributes) &&
              request.resourceAttributes.verb in ["get","list","watch"] &&
              request.resourceAttributes.resource in ["namespaces","nodes"] &&
              request.resourceAttributes.namespace == "")
  # 3. RBAC: final authorizer; must always be present
  - type: RBAC
    name: rbac
```
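Because a reordered chain is one of the failure modes named earlier, it can help to lint the authorizer order in CI before the file reaches a control-plane node. This is a hedged, line-based sketch: the function names are hypothetical, it encodes the ordering policy used in this article (Node first, RBAC last), and it assumes each authorizer entry starts with a `- type: <Type>` line as in the example configuration; a production pipeline should parse the YAML properly.

```bash
# Print authorizer types in declaration order, dropping trailing comments.
# Nested "type:" keys (e.g. connectionInfo.type) have no leading "-" and are ignored.
authorizer_types() {
  awk '/^[ \t]*-[ \t]+type:/ {
         sub(/^[ \t]*-[ \t]+type:[ \t]*/, "")
         sub(/[ \t]*(#.*)?$/, "")
         print
       }' "$1"
}

# Fail unless the chain starts with Node and ends with RBAC.
check_chain_order() {
  local file=$1 first last
  local -a types
  mapfile -t types < <(authorizer_types "$file")
  if (( ${#types[@]} == 0 )); then
    echo "FAIL: no authorizers found in $file"; return 1
  fi
  first=${types[0]}
  last=${types[${#types[@]}-1]}
  if [[ $first != "Node" ]]; then
    echo "FAIL: first authorizer is $first, expected Node"; return 1
  fi
  if [[ $last != "RBAC" ]]; then
    echo "FAIL: last authorizer is $last, expected RBAC"; return 1
  fi
  echo "OK: ${types[*]}"
}

# Usage in CI:
#   check_chain_order /etc/kubernetes/authorization-config.yaml || exit 1
```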
Apply the config to the API server:
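```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (static pod, excerpt)
spec:
  containers:
  - command:
    - kube-apiserver
    - --authorization-config=/etc/kubernetes/authorization-config.yaml
    # IMPORTANT: remove --authorization-mode and any --authorization-webhook-*
    # flags if present; they cannot be combined with --authorization-config
    volumeMounts:
    - mountPath: /etc/kubernetes/authorization-config.yaml
      name: auth-config
      readOnly: true
    # Mount the webhook kubeconfig files (e.g. opa-authz-kubeconfig.yaml) the same way.
    # Note: a single-file mount pins the file's inode; if the file is later replaced
    # by rename (atomic write), mount the parent directory instead so reloads see it.
  volumes:
  - hostPath:
      path: /etc/kubernetes/authorization-config.yaml
      type: File
    name: auth-config
```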
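Leftover legacy flags are easy to miss when a manifest is edited by hand. A minimal pre-flight check, sketched under the assumption that the manifest keeps one flag per list item as in the excerpt above (check_authz_flags is a hypothetical name), looks only at list items so the IMPORTANT comment itself does not trip it:

```bash
# Fail if --authorization-config is missing, or if legacy authorization
# flags remain as container arguments (comment lines are not list items).
check_authz_flags() {
  local manifest=$1
  if ! grep -Eq '^[[:space:]]*-[[:space:]]*--authorization-config=' "$manifest"; then
    echo "FAIL: --authorization-config is not set"; return 1
  fi
  if grep -Eq '^[[:space:]]*-[[:space:]]*--authorization-(mode|webhook-)' "$manifest"; then
    echo "FAIL: remove --authorization-mode / --authorization-webhook-* flags"; return 1
  fi
  echo "OK"
}

# Usage:
#   check_authz_flags /etc/kubernetes/manifests/kube-apiserver.yaml
```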
Validating CEL Expressions Before Deployment
AuthorizationConfiguration is not an API resource, so kubectl cannot validate it: kubectl create --dry-run=server fails with a “no matches for kind” error. The API server validates the file itself, at startup and on every reload, so catch errors earlier in CI: list every expression for review, compile each one with the CEL library (https://github.com/google/cel-go) against the SubjectAccessReview request shape, and start a disposable API server (for example in a kind cluster) with the candidate file.
```bash
# List every webhook matchConditions expression for review
# (mikefarah yq v4 syntax)
yq '.authorizers[] | select(.type == "Webhook") | (.webhook.matchConditions // []) | .[] | .expression' \
  /etc/kubernetes/authorization-config.yaml
```
Write a test matrix that exercises each CEL condition:
```bash
#!/bin/bash
# test-authz-conditions.sh
# Exercises each matchConditions branch by impersonating representative
# principals (the caller needs impersonation rights).
# Note: `kubectl auth can-i` reports only the chain's final decision; confirm
# which authorizer answered from the OPA decision log or the
# authorization.k8s.io/reason annotation in the audit log.

# Test 1: node credentials should bypass OPA (handled by the Node authorizer,
# which also requires the system:nodes group)
result=$(kubectl auth can-i get pods --as="system:node:node01" \
  --as-group="system:nodes" -n default 2>&1)
echo "Node credential bypasses OPA: $result"

# Test 2: a regular service account should reach OPA
result=$(kubectl auth can-i create deployments \
  --as="system:serviceaccount:prod:my-svc" -n prod 2>&1)
echo "Service account reaches OPA: $result"

# Test 3: cluster-scoped namespace listing bypasses OPA
result=$(kubectl auth can-i list namespaces --as="some-user" 2>&1)
echo "Namespace listing bypasses OPA: $result"
```
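The live matrix needs a cluster; an offline companion makes it cheap to enumerate which principals never reach OPA. The following is a hand-written model of the two opa-authz matchConditions for resource requests only (opa_routing is a hypothetical helper that must be kept in sync with the CEL by hand):

```bash
# Returns "skip" if either opa-authz condition is false, otherwise "opa".
# Arguments: user verb resource namespace ("-" for cluster-scoped requests)
opa_routing() {
  local user=$1 verb=$2 resource=$3 ns=$4
  local read_verbs='^(get|list|watch)$' hot_resources='^(namespaces|nodes)$'
  [[ $ns == "-" ]] && ns=""
  # Condition 1 is false for "system:" users that are not service accounts
  if [[ $user == system:* && $user != system:serviceaccount:* ]]; then
    echo "skip"; return 0
  fi
  # Condition 2 is false for cluster-scoped reads of namespaces and nodes
  if [[ $verb =~ $read_verbs && $resource =~ $hot_resources && -z $ns ]]; then
    echo "skip"; return 0
  fi
  echo "opa"
}

for principal in system:anonymous system:kube-scheduler system:node:node01 \
                 system:serviceaccount:prod:my-svc alice; do
  printf '%-34s %s\n' "$principal" "$(opa_routing "$principal" create deployments prod)"
done
```

The first row is the one worth acting on: if anonymous authentication is enabled, system:anonymous skips OPA and is judged by RBAC alone, so review any RBAC bindings to system:anonymous or system:unauthenticated, or start the API server with --anonymous-auth=false.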
Configuring Reload and Drift Detection
The API server watches the authorization config file and reloads it after it changes on disk, typically within about a minute; a file that fails validation is rejected and the previously loaded config stays in effect. Protect against silent config drift:
```bash
# Record the expected hash at each tracked update. Keep a copy off the node
# as well (e.g. in source control): anyone who can edit the config here can
# also rewrite this file.
sha256sum /etc/kubernetes/authorization-config.yaml | \
  awk '{print $1}' > /etc/kubernetes/authorization-config.sha256

# /usr/local/bin/check-authz-drift.sh (example path), run from cron
#!/bin/bash
current=$(sha256sum /etc/kubernetes/authorization-config.yaml | awk '{print $1}')
stored=$(cat /etc/kubernetes/authorization-config.sha256)
if [ "$current" != "$stored" ]; then
  echo "ALERT: authorization-config changed without tracked update" | \
    logger -t k8s-authz-drift
fi

# /etc/cron.d/k8s-authz-drift: cron does not support backslash line
# continuations, so call the script on a single line
*/5 * * * * root /usr/local/bin/check-authz-drift.sh
```
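A tracked update should write the file atomically and refresh the stored hash in the same step, so the API server never reads a half-written file and the drift check stays quiet. This is a sketch under stated assumptions: deploy_authz_config is a hypothetical helper, it relies on GNU coreutils, and it expects the candidate file to have passed validation already. As noted for single-file hostPath mounts, replacing the file by rename swaps its inode, so the container must mount the parent directory for the reload to see the new content.

```bash
# Atomically replace the authorization config and record its new hash.
deploy_authz_config() {
  local src=$1 dest=${2:-/etc/kubernetes/authorization-config.yaml}
  local dir tmp
  dir=$(dirname -- "$dest")
  # Stage next to the target: rename(2) is atomic only within one filesystem.
  # mktemp creates the file with mode 0600, and cp into it keeps that mode.
  tmp=$(mktemp "$dir/.authorization-config.XXXXXX") || return 1
  if ! cp -- "$src" "$tmp" || ! mv -f -- "$tmp" "$dest"; then
    rm -f -- "$tmp"
    return 1
  fi
  # Same path the drift check reads: authorization-config.sha256
  sha256sum -- "$dest" | awk '{print $1}' > "${dest%.yaml}.sha256"
}

# Usage:
#   deploy_authz_config ./authorization-config.yaml
```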
Enforcing Fail-Closed for Critical Namespaces
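For namespaces that hold sensitive workloads, add a strictly fail-closed webhook and place it before the general-purpose webhook: evaluation stops at the first Allow or Deny, so a webhook listed after one that allows the request is never consulted.
```yaml
authorizers:
  - type: Node
    name: node
  # Strict webhook first. It should answer Deny or NoOpinion, never Allow:
  # an Allow here would end the chain before the general webhook runs.
  - type: Webhook
    name: privileged-ns-strict-webhook
    webhook:
      failurePolicy: Deny
      timeout: 2s
      subjectAccessReviewVersion: v1
      matchConditionSubjectAccessReviewVersion: v1
      matchConditions:
        # Only applies to security-sensitive namespaces; has() keeps
        # non-resource requests from erroring into failurePolicy: Deny
        - expression: >
            has(request.resourceAttributes) &&
            request.resourceAttributes.namespace in
            ["kube-system","cert-manager","vault","monitoring"]
      connectionInfo:
        type: KubeConfigFile
        kubeConfigFile: /etc/kubernetes/strict-authz-kubeconfig.yaml
  - type: Webhook
    name: default-policy-webhook
    webhook:
      failurePolicy: Deny
      timeout: 3s
      subjectAccessReviewVersion: v1
      matchConditions: []   # matches all requests
      # connectionInfo (required) omitted for brevity
  - type: RBAC
    name: rbac
```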
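Because evaluation stops at the first Allow or Deny, the relative order of the two webhooks decides whether the strict policy is ever applied. A small model of chain evaluation makes the effect concrete; chain_decision is a hypothetical name, and the model ignores the reasons and errors that real authorizers also return.

```bash
# Walk decisions in chain order: the first Allow or Deny wins, NoOpinion
# passes to the next authorizer, and a request nobody decides is denied.
chain_decision() {
  local d
  for d in "$@"; do
    case $d in
      Allow|Deny) echo "$d"; return 0 ;;
      NoOpinion) ;;
      *) echo "unknown decision: $d" >&2; return 2 ;;
    esac
  done
  echo "Deny"
}

# A kube-system write that the general webhook allows and the strict one denies.
# Order Node, general, strict, RBAC: the strict webhook is never consulted.
chain_decision NoOpinion Allow Deny NoOpinion   # prints Allow
# Order Node, strict, general, RBAC: the strict policy applies.
chain_decision NoOpinion Deny Allow NoOpinion   # prints Deny
```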
Auditing the Authorizer Decision Path
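Audit events carry authorization.k8s.io/decision and authorization.k8s.io/reason annotations, and the reason usually identifies the authorizer that decided (RBAC reasons begin with “RBAC:”). Enable audit logging so these are recorded: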
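```yaml
# audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Full request/response bodies for writes. (escalate, bind and impersonate
  # are authorization checks, not request verbs, so they never match here.)
  - level: RequestResponse
    verbs: ["create","update","patch","delete"]
    resources:
      - group: ""
        resources: ["*"]
      - group: "rbac.authorization.k8s.io"
        resources: ["*"]
  # Metadata, including the authorization annotations, for everything else,
  # so reads that skipped the webhook are visible too
  - level: Metadata
```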
Parse the audit log to find requests that bypassed the webhook (fell through to RBAC only):
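```bash
# Requests approved by RBAC, i.e. no earlier authorizer decided (every webhook
# was skipped or returned NoOpinion). -c prints one event per line so head cuts
# cleanly; filtering on ResponseComplete avoids counting a request once per stage.
jq -c 'select(.stage == "ResponseComplete") |
  select(.annotations["authorization.k8s.io/decision"] == "allow") |
  select((.annotations["authorization.k8s.io/reason"] // "") | test("^RBAC")) |
  {user: .user.username, verb: .verb, resource: .objectRef.resource,
   ns: .objectRef.namespace}' /var/log/kubernetes/audit.log | head -50
```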
Expected Behaviour After Hardening
| Scenario | Before Hardening | After Hardening |
|---|---|---|
| CEL expression syntax error pushed to API server | Reload rejected with little visibility; the previous config keeps running and the next API server restart fails | Expressions compiled in CI before deployment; reload failures monitored |
| Webhook unavailable during pod create | Depends on failurePolicy; often NoOpinion (pass-through) | Deny failurePolicy blocks request; alerts fire on webhook error rate |
| System component (kubelet) request routed to OPA | CEL not available; all requests hit webhook | CEL condition skips OPA for system:node:*; no latency added |
| Privileged namespace access attempt with stolen token | Reaches RBAC only if webhook chain has gap | Secondary strict webhook enforces additional policy for sensitive namespaces |
| Authorization config drift | No detection | Checksum alert fires within 5 minutes |
Verification:
```bash
# Confirm the authorization config is loaded and reloads are succeeding
kubectl get --raw /metrics | grep apiserver_authorization_config
# Check webhook latency distribution
kubectl get --raw /metrics | grep apiserver_authorization_webhook_duration_seconds_bucket
# Confirm the OPA webhook is being called for expected requests
kubectl logs -n opa deployment/opa-authz | grep "decision" | tail -20
```
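A grep is fine interactively; for an alert or a deployment gate it helps to reduce the scrape to a single number. The sketch below counts failed reloads from Prometheus text-format input on stdin. The metric name apiserver_authorization_config_controller_automatic_reloads_total and its status="failure" label are assumptions based on the KEP-3221 reload metrics, so confirm them against the output of the grep above for your API server version; authz_reload_failures is a hypothetical name.

```bash
# Sum failed authorization-config reloads from Prometheus text format on stdin.
authz_reload_failures() {
  awk '/^apiserver_authorization_config_controller_automatic_reloads_total[{]/ && /status="failure"/ { total += $NF }
       END { printf "%d\n", total }'
}

# Usage (run against each API server instance):
#   kubectl get --raw /metrics | authz_reload_failures
```

A non-zero value means at least one update was rejected and that instance is still serving an older policy.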
Trade-offs and Operational Considerations
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| CEL matchConditions | Precise targeting reduces webhook latency by skipping irrelevant requests | Complex CEL expressions can silently produce gaps | Test with a request matrix; add coverage assertions to CI |
| Fail-closed webhooks | Prevents authorizer bypass when webhook is down | Webhook outage causes API server denial-of-service | Deploy webhooks with high availability (≥2 replicas, PodDisruptionBudget) |
| Multiple authorizer chain | Layered defence; different tools cover different policies | Debugging which authorizer denied a request is harder | Parse authorization.k8s.io/reason annotation in audit logs |
| Config file reload | No API server restart needed for policy updates | Reload window (up to about 60s) where old and new policies coexist, including across API server replicas | Confirm the reload on every API server via the reload metrics before relying on the new policy |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Broken CEL expression in config | Requests the expression fails to match skip the authorizer silently | Audit log shows RBAC-only approvals for webhook-targeted requests | Roll back the config file (picked up on reload without a restart); re-run the test matrix |
| Webhook TLS certificate expiry | All webhook-targeted requests denied with TLS error | Webhook error rate metric spikes; API server logs show x509 errors | Rotate webhook TLS cert; update kubeconfig reference |
| Webhook latency exceeds timeout | Requests time out; failurePolicy determines outcome | apiserver_authorization_webhook_duration_seconds p99 approaching the configured timeout | Increase webhook replicas; narrow matchConditions to reduce call volume |
| Chain ordering reversed accidentally | Permissive webhook approves before restrictive RBAC | Authorization audit shows unexpected approvals from first-in-chain | Restore original chain order; verify with test matrix |
| Config reload picks up partial write | API server rejects the half-written config and keeps the old one | API server log: failed to reload authorization config | Use atomic file writes (mv from temp); monitor API server config reload errors |