Kubernetes Structured Authorization Config: Hardening Multi-Webhook Auth Chains

The Problem

Before Kubernetes 1.30, authorization was configured with a flat --authorization-mode flag on the API server, for example Node,RBAC,Webhook. The order was fixed at startup, each request was passed down the chain until one authorizer allowed or denied it, and only a single webhook could be configured through the --authorization-webhook-* flags. There was no way to express “run this webhook only for requests in the kube-system namespace” or “skip the external PAM webhook for service accounts” without building that logic into the webhook itself.

The Structured Authorization Config API (KEP-3221; beta and enabled by default in 1.30, GA in 1.32) replaces this with a YAML file that can declare multiple webhooks, each with its own failure policy and per-authorizer matchConditions written in CEL, the same expression language used by ValidatingAdmissionPolicy. This is powerful, but it introduces a class of misconfiguration bugs that didn’t exist before:

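Silent bypass via CEL condition gap. If a matchConditions expression is slightly wrong, every request it fails to match skips the webhook and falls through to the next authorizer, typically RBAC. For example, request.user.startsWith("system:serviceaccounts:prod:") reads like “all service accounts in the prod namespace”, but service account usernames use the singular prefix system:serviceaccount:<namespace>:<name>; the plural form is the group name system:serviceaccounts:<namespace>. The condition therefore matches nothing, and every prod service account bypasses the webhook. Depending on RBAC configuration, this can mean unauthorized access silently passes.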

Chain ordering confusion. The structured config evaluates authorizers in declaration order, and the first allow or deny ends evaluation. Teams migrating from the flag-based config sometimes reorder authorizers by accident, placing a permissive webhook before RBAC so that its allow grants access RBAC would not have granted.

Failure policy interaction. Each webhook in the chain sets failurePolicy: Deny or failurePolicy: NoOpinion. With NoOpinion, a webhook failure lets the chain continue, which is safe only if later authorizers are strict. With Deny, a failure stops the chain and rejects the request. Teams frequently set every webhook to Deny as a “safe default”, then find that a webhook outage also rejects routine traffic, such as controller list/watch calls, that the policy was never meant to gate.

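Stale authorizer configs. The structured config is a file that the API server reloads automatically when its contents change. An update that fails validation (for example, a CEL syntax error) is rejected on reload and the previous config keeps running, but the broken file stays on disk and the next API server restart fails outright. An update that validates but encodes the wrong logic is loaded immediately and can deny, or allow, far more than intended.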
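A common way to open a CEL condition gap is confusing the service account username prefix (system:serviceaccount:<namespace>:, singular) with the service account group name (system:serviceaccounts:<namespace>, plural). The sketch below reproduces that without a cluster. It approximates CEL's startsWith() with a POSIX shell case pattern, which is an illustration rather than the API server's CEL engine; starts_with and my-svc are hypothetical names.

```shell
#!/bin/sh
# Emulates CEL's string startsWith() with a POSIX case pattern. Illustration
# only: the API server evaluates the real CEL expression, not this function.
starts_with() {
  # $1 = string, $2 = prefix; exit status 0 when $1 begins with $2
  case "$1" in
    "$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Username format for a service account: system:serviceaccount:<namespace>:<name>
user="system:serviceaccount:prod:my-svc"

if starts_with "$user" "system:serviceaccount:prod:"; then
  echo "singular prefix: matches, webhook is called"
fi

# The plural prefix is the service account *group* format, so it never
# matches a username and every prod service account skips the webhook.
if ! starts_with "$user" "system:serviceaccounts:prod:"; then
  echo "plural prefix: no match, request falls through to the next authorizer"
fi
```

Note that the trailing colon in the correct prefix also matters: without it, system:serviceaccount:prod would match a prod-staging namespace as well.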

Target systems: Kubernetes 1.30+ API servers using --authorization-config file; clusters migrating from the legacy --authorization-mode flag.
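The chain ordering and failure policy problems both follow from how the chain combines decisions: evaluation stops at the first allow or deny, a no-opinion answer (or a webhook skipped by matchConditions) passes the request on, a failed webhook call is mapped through its failurePolicy, and a request that no authorizer allows is forbidden. The POSIX shell model below encodes only those rules; evaluate_chain is a hypothetical helper, not API server code.

```shell
#!/bin/sh
# Simplified model of how the authorizer chain combines decisions. Each
# argument is one authorizer's outcome, in declaration order:
#   allow | deny | noopinion | skip (matchConditions excluded the request)
#   error:Deny | error:NoOpinion (webhook call failed; suffix = failurePolicy)
# This mirrors the documented semantics; it is not API server code.
evaluate_chain() {
  for outcome in "$@"; do
    case "$outcome" in
      allow) echo "allow"; return 0 ;;             # first allow ends evaluation
      deny|error:Deny) echo "deny"; return 0 ;;    # explicit deny or fail-closed webhook
      noopinion|skip|error:NoOpinion) ;;           # fall through to the next authorizer
      *) echo "unknown outcome: $outcome" >&2; return 2 ;;
    esac
  done
  echo "deny"                                      # nobody allowed: request is forbidden
}

# Each chain below is: Node, webhook, RBAC
evaluate_chain noopinion allow noopinion       # permissive webhook before RBAC: allow
evaluate_chain noopinion skip allow            # webhook skipped by a CEL gap, RBAC decides: allow
evaluate_chain noopinion error:Deny allow      # webhook down, failurePolicy Deny: deny
evaluate_chain noopinion error:NoOpinion allow # webhook down, failurePolicy NoOpinion: allow
```

The last two chains are the failure policy trade-off in miniature: the same outage either rejects the request or hands the decision to RBAC.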

Threat Model

1. Insider attacker with limited RBAC (authenticated cluster user, no cluster-admin). Objective: craft a request that matches a CEL condition gap in the authorization chain, bypassing a restrictive external webhook and relying on a permissive RBAC binding. Impact: privilege escalation without triggering the external webhook’s audit log.

2. Webhook operator with webhook service write access (compromised webhook deployment). Objective: return NoOpinion for all requests to force fall-through to a permissive RBAC binding installed as a backdoor. Impact: cluster-wide authorization bypass persists until the webhook’s responses are audited.

3. Platform engineer with API server config write access (misconfiguration, not malice). Objective: push an authorization config update with a CEL mistake. A syntax error is rejected on reload, leaving the old config active, but stops the API server from starting at its next restart; a syntactically valid but wrong expression is loaded and begins denying or permitting requests incorrectly. Impact: denial of service to cluster operations or unauthorized access depending on the failure direction.

4. External attacker with stolen service account token (network access to API server). Objective: find a service account whose requests match a CEL gap that skips all restrictive webhooks, then escalate from there. Impact: lateral movement within the cluster from a compromised pod.

Blast radius without hardening: a single mistaken CEL expression in a matchConditions block can silently bypass an entire authorizer for every request it fails to match, giving adversaries a structural bypass window that survives API server restarts.

Hardening Configuration

Structured Authorization Config File Layout

Create /etc/kubernetes/authorization-config.yaml (path referenced by --authorization-config on kube-apiserver):

# apiserver.config.k8s.io/v1 requires Kubernetes 1.32+; use v1beta1 on 1.30 and 1.31
apiVersion: apiserver.config.k8s.io/v1
kind: AuthorizationConfiguration
authorizers:
  # 1. Node authorizer — must be first; handles kubelet credential scope
  - type: Node

  # 2. External policy webhook (OPA, Kyverno-authz, etc.)
  - type: Webhook
    name: opa-authz
    webhook:
      authorizedTTL: 5m
      unauthorizedTTL: 30s
      timeout: 3s
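      subjectAccessReviewVersion: v1
      # Required: the SubjectAccessReview version the CEL matchConditions see
      matchConditionSubjectAccessReviewVersion: v1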
      # Fail CLOSED: if the webhook is unavailable, deny the request
      failurePolicy: Deny
      connectionInfo:
        type: KubeConfigFile
        kubeConfigFile: /etc/kubernetes/opa-authz-kubeconfig.yaml
      matchConditions:
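        # Only send requests to OPA if they are NOT from non-service-account system
        # identities (system:node:*, system:kube-scheduler,
        # system:kube-controller-manager), which rely on Node/RBAC only.
        # Controllers running with --use-service-account-credentials authenticate
        # as kube-system service accounts and still reach OPA.
        # This condition also exempts system:anonymous, so keep RBAC grants to
        # anonymous users minimal.
        - expression: >
            !(request.user.startsWith("system:") &&
              !request.user.startsWith("system:serviceaccount:"))
        # Skip OPA for cluster-scoped get/list/watch on namespaces and nodes
        # (reduces latency on a hot path). has() keeps non-resource requests,
        # such as /apis discovery, from erroring, which failurePolicy: Deny
        # would turn into a rejection.
        - expression: >
            !(has(request.resourceAttributes) &&
              request.resourceAttributes.verb in ["get","list","watch"] &&
              request.resourceAttributes.resource in ["namespaces","nodes"] &&
              request.resourceAttributes.namespace == "")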

  # 3. RBAC — final authorizer; must always be present
  - type: RBAC
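The routing produced by the two opa-authz conditions can be checked offline before touching a cluster. Below is a hand-written POSIX shell emulation (opa_match is a hypothetical helper, not a CEL evaluator) that assumes the read-only exemption is guarded with has(request.resourceAttributes), so non-resource requests, modelled here by an empty verb, still reach OPA. The last example row shows a gap worth noticing: system:anonymous also starts with "system:" and is therefore exempted from OPA by the first condition.

```shell
#!/bin/sh
# Hand-written emulation of the two opa-authz matchConditions, for building an
# offline test matrix. NOT a CEL evaluator: keep it in sync with the YAML by hand.
# Assumes the exemption is guarded by has(request.resourceAttributes); an empty
# VERB models a non-resource request, which is still sent to OPA.
# Usage: opa_match USER VERB RESOURCE NAMESPACE  -> prints "opa" or "skip"
opa_match() {
  om_user=$1 om_verb=$2 om_resource=$3 om_ns=$4

  # Condition 1: non-service-account system:* identities skip OPA
  case "$om_user" in
    system:serviceaccount:*) ;;
    system:*) echo "skip"; return 0 ;;
  esac

  # Condition 2: cluster-scoped get/list/watch on namespaces or nodes skip OPA
  if [ -n "$om_verb" ] && [ -z "$om_ns" ]; then
    case "$om_verb:$om_resource" in
      get:namespaces|list:namespaces|watch:namespaces|get:nodes|list:nodes|watch:nodes)
        echo "skip"; return 0 ;;
    esac
  fi

  echo "opa"
}

# Example matrix: user, verb, resource, namespace
opa_match "system:node:node01" get pods default                       # skip
opa_match "system:serviceaccount:prod:my-svc" create deployments prod # opa
opa_match "some-user" list namespaces ""                              # skip
opa_match "system:anonymous" get pods default                         # skip (RBAC only)
```

Expected results from an emulation like this still need to be cross-checked against a real API server, since the shell logic can drift from the YAML.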

Apply the config to the API server:

# /etc/kubernetes/manifests/kube-apiserver.yaml (static pod)
spec:
  containers:
  - command:
    - kube-apiserver
    - --authorization-config=/etc/kubernetes/authorization-config.yaml
    # IMPORTANT: remove --authorization-mode and any --authorization-webhook-* flags;
    # they cannot be combined with --authorization-config
    volumeMounts:
    - mountPath: /etc/kubernetes/authorization-config.yaml
      name: auth-config
      readOnly: true
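  volumes:
  - hostPath:
      # A single-file hostPath mount pins the file's inode: if the file is later
      # replaced with mv, the container keeps seeing the old copy. Mount the
      # parent directory instead if you roll out changes by rename.
      path: /etc/kubernetes/authorization-config.yaml
      type: File
    name: auth-config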
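Because the kubelet restarts a static pod as soon as its manifest changes, it is worth linting a copy of the manifest before moving it into place. The sketch below (check_authz_flags is a hypothetical helper) greps for the conflicting flags; treating the legacy --authorization-webhook-* flags as conflicting follows the upstream documentation as we understand it, so confirm it for your version.

```shell
#!/bin/sh
# Pre-flight lint for a kube-apiserver static pod manifest. It fails when
# --authorization-config is missing, or when it is combined with
# --authorization-mode or a legacy --authorization-webhook-* flag.
# Plain grep on the file; it does not parse YAML.
check_authz_flags() {
  manifest=$1
  if ! grep -q -- '--authorization-config=' "$manifest"; then
    echo "missing --authorization-config" >&2
    return 1
  fi
  if grep -Eq -- '--authorization-(mode|webhook-[a-z-]+)=' "$manifest"; then
    echo "legacy authorization flag conflicts with --authorization-config" >&2
    return 1
  fi
  echo "ok"
}
```

For example, run check_authz_flags /tmp/kube-apiserver.yaml on the edited copy and only then move it into /etc/kubernetes/manifests/.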

Validating CEL Expressions Before Deployment

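There is no server-side dry run for this file. AuthorizationConfiguration is read by kube-apiserver from disk rather than served as an API resource, so kubectl create --dry-run=server has nothing to validate it against. kube-apiserver itself checks the schema and compiles every CEL expression at startup and on each reload, so use a non-production API server running the same Kubernetes minor version as the validator: load the candidate file there first and confirm the reload succeeded before rolling it out.

# On the test API server: confirm the candidate file was (re)loaded without errors
kubectl get --raw /metrics | grep apiserver_authorization_config_controller_automatic_reload

# Print every matchConditions expression with line numbers for review.
# Expressions written as folded scalars (">") continue on the following lines,
# hence the trailing context (-A 4).
grep -n -A 4 "expression:" /etc/kubernetes/authorization-config.yaml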
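Alongside reviewing expressions, a cheap structural lint catches the chain ordering mistakes described earlier: Node should come first and RBAC last, as in the layout above. check_chain_order is a hypothetical helper; it greps the "- type:" list items, so it assumes one such line per authorizer and is not a substitute for a YAML parser.

```shell
#!/bin/sh
# Structural lint for an AuthorizationConfiguration file: prints the authorizer
# types in declaration order and fails unless Node is first and RBAC is last.
# It greps "- type: X" list items (connectionInfo's "type:" has no leading
# dash, so it is ignored). A sketch for this guide's layout, not a YAML parser.
check_chain_order() {
  cco_order=$(grep -E '^[[:space:]]*- type:[[:space:]]*[A-Za-z]+' "$1" | awk '{print $3}')
  cco_first=$(printf '%s\n' "$cco_order" | head -n 1)
  cco_last=$(printf '%s\n' "$cco_order" | tail -n 1)
  # Unquoted on purpose: word splitting joins the types onto one line
  echo "chain:" $cco_order
  if [ "$cco_first" != "Node" ]; then
    echo "first authorizer is '$cco_first', expected Node" >&2
    return 1
  fi
  if [ "$cco_last" != "RBAC" ]; then
    echo "last authorizer is '$cco_last', expected RBAC" >&2
    return 1
  fi
  return 0
}
```

For the main config above, check_chain_order /etc/kubernetes/authorization-config.yaml prints "chain: Node Webhook RBAC" and exits 0; it is cheap enough to run as a CI step on every change to the file.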

Write a test matrix that exercises each CEL condition:

#!/bin/bash
# test-authz-conditions.sh
# Exercises request types that should be routed to different authorizers.
# kubectl auth can-i reports only the final decision, not which authorizer
# made it; correlate each run with the OPA decision log to confirm routing.
# Impersonation (--as/--as-group) requires impersonate permission.

# Test 1: kubelet identity; the Node authorizer answers and OPA is skipped.
# The Node authorizer only recognizes users in the system:nodes group.
result=$(kubectl auth can-i get pods --as="system:node:node01" \
  --as-group="system:nodes" -n default 2>&1)
echo "Node credential (OPA expected to be skipped): $result"

# Test 2: regular service account should reach OPA
result=$(kubectl auth can-i create deployments \
  --as="system:serviceaccount:prod:my-svc" -n prod 2>&1)
echo "Service account (OPA expected to be called): $result"

# Test 3: cluster-scoped list of namespaces is exempted from OPA
result=$(kubectl auth can-i list namespaces --as="some-user" 2>&1)
echo "List namespaces (OPA expected to be skipped): $result"

Configuring Reload and Drift Detection

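The API server checks the authorization config file for changes and reloads it automatically, without a restart. A reload that fails validation is rejected and the previous configuration stays active; the failure is visible only in the API server log and the apiserver_authorization_config_controller_automatic_reloads_total metric. Protect against silent config drift: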

# Record the checksum of the approved config as part of every tracked rollout.
# Stored next to the file, this catches accidental or untracked edits; it does
# not stop someone with root on the node, who can rewrite both files.
sha256sum /etc/kubernetes/authorization-config.yaml > /etc/kubernetes/authorization-config.sha256

# Crontab entry: every 5 minutes, verify the live file against the recorded
# checksum and log an alert on mismatch. Cron entries must be a single line;
# backslash continuations are not supported.
*/5 * * * * sha256sum -c --status /etc/kubernetes/authorization-config.sha256 || echo "ALERT: authorization-config changed without tracked update" | logger -t k8s-authz-drift
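Writing the config file in place lets the API server observe a half-written file. A minimal sketch of the usual write-then-rename pattern follows; install_authz_config is a hypothetical helper, and the RBAC check inside it is only a guard against truncation, not validation. A rename is only visible to the API server if its container mounts the parent directory: a single-file hostPath mount keeps pointing at the replaced inode.

```shell
#!/bin/sh
# Sketch of an atomic config rollout. The candidate is copied to a temp file in
# the SAME directory as the target, so the final mv is a rename on one
# filesystem: a reader sees either the complete old file or the complete new
# one, never a partial write. install_authz_config is a hypothetical helper.
install_authz_config() {
  iac_candidate=$1 iac_target=$2
  iac_tmp=$(mktemp "$(dirname "$iac_target")/.authz-config.XXXXXX") || return 1

  if ! cp "$iac_candidate" "$iac_tmp"; then
    rm -f "$iac_tmp"; return 1
  fi

  # Refuse a candidate without an RBAC authorizer, e.g. a truncated file that
  # is still valid YAML and would otherwise be loaded as-is
  if ! grep -Eq '^[[:space:]]*- type:[[:space:]]*RBAC' "$iac_tmp"; then
    echo "candidate has no RBAC authorizer; not installing" >&2
    rm -f "$iac_tmp"; return 1
  fi

  chmod 0644 "$iac_tmp"   # mktemp creates files as 0600; adjust to your policy
  mv -f "$iac_tmp" "$iac_target"
}
```

After a successful install, re-record the checksum that the drift check compares against, so the cron job does not flag the tracked change.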

Enforcing Fail-Closed for Critical Namespaces

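For namespaces that hold sensitive workloads, add a strictly fail-closed webhook that runs before the general policy webhook. Order matters here: the first authorizer to allow a request ends evaluation, so a strict webhook placed after a webhook that matches every request would only see what that webhook passed on. The strict webhook should therefore answer only deny or no opinion, never allow, so that permitted requests continue down the chain:

authorizers:
  - type: Node

  # Evaluated first for sensitive namespaces; returns deny or no opinion only
  - type: Webhook
    name: privileged-ns-strict-webhook
    webhook:
      failurePolicy: Deny
      timeout: 2s
      subjectAccessReviewVersion: v1
      matchConditionSubjectAccessReviewVersion: v1
      matchConditions:
        # Resource requests only: has() makes non-resource requests skip this
        # webhook instead of erroring on the namespace lookup below
        - expression: has(request.resourceAttributes)
        # Only applies to security-sensitive namespaces
        - expression: >
            request.resourceAttributes.namespace in
            ["kube-system","cert-manager","vault","monitoring"]
      connectionInfo:
        type: KubeConfigFile
        kubeConfigFile: /etc/kubernetes/strict-authz-kubeconfig.yaml

  - type: Webhook
    name: default-policy-webhook
    webhook:
      failurePolicy: Deny
      timeout: 3s
      # subjectAccessReviewVersion, matchConditionSubjectAccessReviewVersion and
      # connectionInfo omitted for brevity; required in a real config
      matchConditions: []   # matches all requests

  - type: RBAC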
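For a strict webhook to add policy without short-circuiting the rest of the chain, its replies should be limited to deny or no opinion. In the webhook authorizer's SubjectAccessReview reply, "denied": true rejects the request and ends evaluation, while "allowed": false without "denied" means no opinion. The shell sketch below only prints those two reply bodies; the policy logic that chooses between them is out of scope, and sar_response is a hypothetical helper.

```shell
#!/bin/sh
# Prints the two replies a deny-or-abstain webhook would return.
# "denied": true ends the chain with a rejection; "allowed": false without
# "denied" is no opinion, so evaluation continues to the next authorizer.
# The reason string is not JSON-escaped: it must not contain quotes or backslashes.
sar_response() {
  # $1 = deny | noopinion, $2 = human-readable reason (deny only)
  if [ "$1" = "deny" ]; then
    printf '{"apiVersion":"authorization.k8s.io/v1","kind":"SubjectAccessReview","status":{"allowed":false,"denied":true,"reason":"%s"}}\n' "$2"
  else
    printf '{"apiVersion":"authorization.k8s.io/v1","kind":"SubjectAccessReview","status":{"allowed":false}}\n'
  fi
}

sar_response deny "writes to kube-system require change approval"
sar_response noopinion
```

A real webhook would build this JSON with a proper encoder; the point of the sketch is that the strict webhook never emits "allowed": true.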

Auditing the Authorizer Decision Path

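Enable audit logging. Every audited request carries authorization.k8s.io/decision and authorization.k8s.io/reason annotations, and the reason text shows which authorizer allowed the request (RBAC reasons begin with "RBAC:"):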

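# audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages: ["RequestReceived"]
rules:
  # Never log Secret/ConfigMap bodies; Metadata still records the
  # authorization annotations without the payload (rules are first-match)
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets","configmaps"]
  - level: RequestResponse
    verbs: ["create","update","patch","delete","escalate","bind","impersonate"]
    resources:
      - group: ""
        resources: ["*"]
      - group: "rbac.authorization.k8s.io"
        resources: ["*"]
  # Everything else, including reads, at Metadata so the decision and reason
  # annotations are available for every request
  - level: Metadata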

Parse the audit log to find requests that bypassed the webhook (fell through to RBAC only):

# Find requests approved by RBAC: the webhook was either skipped by
# matchConditions or returned no opinion. -c prints one object per line.
jq -c 'select(.annotations["authorization.k8s.io/decision"] == "allow") |
    select((.annotations["authorization.k8s.io/reason"] // "") | test("^RBAC")) |
    {user: .user.username, verb: .verb, resource: .objectRef.resource,
     ns: .objectRef.namespace}' /var/log/kubernetes/audit.log | head -50

Expected Behaviour After Hardening

| Scenario | Before hardening | After hardening |
|---|---|---|
| CEL syntax error pushed to the API server | Reload is rejected and the old config keeps running, visible only in logs and metrics; the next restart fails | Candidate file is loaded on a non-production API server before rollout; reload-failure metric is alerted on |
| Webhook unavailable during pod create | Outcome depends on failurePolicy; often NoOpinion (pass-through) | failurePolicy: Deny blocks the request; alerts fire on webhook error rate |
| System component (e.g. kubelet) request routed to OPA | No matchConditions, so every request reaching the webhook is sent to it | CEL condition skips OPA for system:node:* and other non-service-account system:* users; no latency added |
| Privileged namespace access attempt with stolen token | Decided by RBAC alone if the webhook chain has a gap | Strict webhook, evaluated first, enforces additional policy for sensitive namespaces |
| Authorization config drift | No detection | Checksum alert fires within 5 minutes |

Verification:

# Confirm the authorization config is loaded
kubectl get --raw /metrics | grep apiserver_authorization_config

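# Check the decision distribution per authorizer (allowed/denied counts by name)
kubectl get --raw /metrics | grep apiserver_authorization_decisions_total

# Check webhook latency per webhook
kubectl get --raw /metrics | grep apiserver_authorization_webhook_duration_seconds_bucket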

# Confirm OPA webhook is being called for expected requests
kubectl logs -n opa deployment/opa-authz | grep "decision" | tail -20
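Beyond grepping, it helps to reduce a metric to a single number that can be compared between runs or alerted on. metric_sum below is a hypothetical helper that sums every sample of one metric, across all label sets, from Prometheus text output on stdin; it assumes samples carry no trailing timestamp, which is the usual case for kube-apiserver /metrics.

```shell
#!/bin/sh
# Sums every sample of one metric (all label sets) from Prometheus text
# format on stdin. Exit status 1 if the metric is absent.
# Assumes no per-sample timestamps, so the value is the last field.
metric_sum() {
  awk -v m="$1" '
    /^#/ { next }                  # skip HELP/TYPE comment lines
    {
      name = $0
      sub(/[{ ].*/, "", name)      # metric name ends at "{" or a space
      if (name == m) { total += $NF; found = 1 }
    }
    END {
      if (!found) exit 1
      print total
    }'
}
```

For example, kubectl get --raw /metrics | metric_sum apiserver_authorization_match_condition_exclusions_total, assuming your release exposes that counter (check with grep first). A sudden jump right after a config change is the signature of a new CEL gap.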

Trade-offs and Operational Considerations

| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| CEL matchConditions | Precise targeting reduces webhook latency by skipping irrelevant requests | Complex CEL expressions can silently produce gaps | Test with a request matrix; add coverage assertions to CI |
| Fail-closed webhooks | Prevents authorizer bypass when a webhook is down | A webhook outage becomes an API server denial of service | Run webhooks highly available (≥2 replicas, PodDisruptionBudget) |
| Multiple authorizer chain | Layered defence; different tools cover different policies | Harder to debug which authorizer decided a request | Parse the authorization.k8s.io/reason annotation in audit logs |
| Config file reload | Policy updates without an API server restart | Each API server in an HA control plane reloads its own copy independently, so replicas can briefly enforce different policies | Roll out to one API server at a time and confirm each reload via apiserver_authorization_config_controller_automatic_reloads_total before continuing |

Failure Modes

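| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Uncompilable CEL expression in config | Reload is rejected and the previous config stays active; the API server will not start from this file | API server logs a reload error; reload failure metric increments | Fix or roll back the file before any API server restart |
| Logically wrong CEL expression | Requests for which a condition evaluates to false skip the authorizer silently | Audit log shows RBAC-only approvals for webhook-targeted requests | Roll back the config file (it reloads automatically); re-run the test matrix |
| CEL expression errors at runtime (e.g. missing has() guard) | With failurePolicy: Deny the request is rejected; with NoOpinion the webhook is skipped | apiserver_authorization_match_condition_evaluation_errors_total increases | Add has() guards; re-test non-resource requests |
| Webhook TLS certificate expiry | Webhook calls fail with TLS errors; with failurePolicy: Deny, all webhook-targeted requests are rejected | Webhook error rate metric spikes; API server logs show x509 errors | Rotate the webhook TLS cert; update the kubeconfig reference |
| Webhook latency exceeds timeout | Requests time out; failurePolicy determines the outcome | apiserver_authorization_webhook_duration_seconds p99 > timeout | Increase webhook replicas; tighten matchConditions to reduce call volume |
| Chain ordering reversed accidentally | A permissive webhook allows requests before RBAC is consulted | Audit reasons show unexpected approvals from the first authorizer in the chain | Restore the original chain order; verify with the test matrix |
| Config reload picks up a partial write | A malformed file is rejected and the old config kept; a truncation that is still valid YAML (e.g. cut off before the RBAC entry) is loaded as-is | API server logs a reload error, or the chain changes unexpectedly | Use atomic writes (temp file in the same directory, then mv); monitor reload errors |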