Admission Webhook PR Poisoning: How a Merged PR Becomes a Cluster Backdoor

The Problem

Every resource create and update operation in a Kubernetes cluster passes through the admission webhook pipeline before it reaches the API server’s persistent store. A ValidatingWebhookConfiguration that enforces pod security standards — blocking privileged containers, enforcing seccomp profiles, requiring read-only root filesystems — is the authoritative enforcement point for those policies. If an attacker can modify the webhook configuration, they can remove enforcement without touching the policies themselves. The policies remain in place, the webhook remains registered, but its scope is silently narrowed to exclude the resources the attacker wants to abuse.

The attack surface is a change management problem. Admission webhook configurations are typically managed as GitOps-controlled Kubernetes manifests. A pull request that changes failurePolicy: Fail to failurePolicy: Ignore, rewrites a namespaceSelector so that it excludes production namespaces, or adds a rules exclusion for a specific resource kind changes the security posture of every workload in scope without any change to the workload manifests themselves. The PR diff looks like a small configuration tweak. The security implication is that pod security enforcement is now absent for specific namespaces or resource types.
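
The changes described above can be detected mechanically by comparing the security-relevant fields of the old and new webhook configuration. The following is a minimal sketch, assuming both configurations have already been parsed into dictionaries (for example from YAML); the function name `risky_changes` and the sample webhook names in the usage note are illustrative, not part of any existing tool.

```python
from typing import Any


def risky_changes(old: dict[str, Any], new: dict[str, Any]) -> list[str]:
    """List security-relevant differences between two webhook configurations."""
    findings: list[str] = []
    old_hooks = {w["name"]: w for w in old.get("webhooks", [])}
    new_hooks = {w["name"]: w for w in new.get("webhooks", [])}

    for name, before in old_hooks.items():
        after = new_hooks.get(name)
        if after is None:
            findings.append(f"{name}: webhook entry removed")
            continue
        # failurePolicy defaults to Fail in admissionregistration.k8s.io/v1.
        was_fail = before.get("failurePolicy", "Fail") == "Fail"
        now_ignore = after.get("failurePolicy", "Fail") == "Ignore"
        if was_fail and now_ignore:
            findings.append(f"{name}: failurePolicy downgraded from Fail to Ignore")
        # Any selector change can move namespaces or objects out of scope.
        for field in ("namespaceSelector", "objectSelector"):
            if before.get(field) != after.get(field):
                findings.append(f"{name}: {field} changed")
        # A rule that no longer appears verbatim was removed or altered.
        new_rules = after.get("rules", [])
        dropped = [r for r in before.get("rules", []) if r not in new_rules]
        if dropped:
            findings.append(f"{name}: {len(dropped)} rule(s) removed or altered")
    return findings
```

A reviewer or CI job could print these findings next to the raw diff so that a one-line change is labelled with its effect. The sketch deliberately reports every selector change instead of trying to decide whether it narrows or widens scope.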

The harder variant: a PR that does not modify the webhook manifest directly but instead modifies the controller deployment that registers the webhook at startup. Webhook controllers typically read their configuration at boot and call the Kubernetes API to register or update ValidatingWebhookConfiguration objects. A patch that changes the controller’s registration logic — adjusting the namespaceSelector it passes to the API, removing a rule from its admission handler, or pointing the clientConfig.url at a different endpoint — is functionally equivalent to modifying the manifest but is invisible to tools that only watch webhook manifest changes.
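
One way to make a self-registering controller reviewable is to render the configuration it would register and compare a fingerprint of the security-relevant fields against a committed baseline. The sketch below illustrates the idea under stated assumptions: `build_webhook_config` is a hypothetical stand-in for a controller's registration code, and the webhook and service names are invented for the example.

```python
import hashlib
import json
from typing import Any


def _security_view(webhook: dict[str, Any]) -> dict[str, Any]:
    """Keep only the fields that decide what is enforced and where requests go."""
    client = dict(webhook.get("clientConfig") or {})
    client.pop("caBundle", None)  # certificate rotation is expected churn
    return {
        "name": webhook.get("name"),
        "failurePolicy": webhook.get("failurePolicy", "Fail"),
        "namespaceSelector": webhook.get("namespaceSelector"),
        "objectSelector": webhook.get("objectSelector"),
        "rules": webhook.get("rules"),
        "clientConfig": client,
    }


def security_fingerprint(config: dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON rendering of the security-relevant fields."""
    views = sorted(
        (_security_view(w) for w in config.get("webhooks", [])),
        key=lambda view: str(view["name"]),
    )
    canonical = json.dumps(views, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_webhook_config(excluded_namespaces: list[str]) -> dict[str, Any]:
    """Hypothetical stand-in for a controller's registration logic."""
    return {
        "webhooks": [{
            "name": "pods.security.example.com",
            "failurePolicy": "Fail",
            "clientConfig": {"service": {"namespace": "security", "name": "pod-security-webhook"}},
            "namespaceSelector": {"matchExpressions": [{
                "key": "kubernetes.io/metadata.name",
                "operator": "NotIn",
                "values": sorted(excluded_namespaces),
            }]},
            "rules": [{"operations": ["CREATE", "UPDATE"], "apiGroups": [""],
                       "apiVersions": ["v1"], "resources": ["pods"]}],
        }]
    }
```

A unit test in the controller repository could assert that the fingerprint of the rendered configuration equals a value stored in a file owned by the security team. A code change that alters the registered selector, rules, failure policy, or endpoint then fails the test and forces an explicit baseline update in the same PR.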

The xz-utils attack pattern applies: a contributor submits a series of legitimate improvements to a webhook controller over months — better error handling, performance improvements, cleaner code — building the commit history and reputation that causes reviewers to give subsequent PRs less scrutiny. The malicious PR, when it arrives, looks like another incremental improvement. The security regression is buried in a changed default value or a removed validation case.

Specific gaps in environments without webhook change controls:

  • Admission webhook configs managed in a busy GitOps repository may receive less scrutiny than application code PRs.
  • failurePolicy: Ignore is the correct setting for non-critical webhooks and a catastrophic setting for security-enforcement webhooks — a reviewer unfamiliar with the specific webhook may not recognise the significance of the change.
  • Webhook controllers that self-register don’t leave a static manifest to review — the effective configuration only exists at runtime.
  • ArgoCD and Flux treat webhook configurations as regular Kubernetes objects; with automated sync and pruning enabled and no specific protection, a PR that removes a webhook object is applied on the next sync without further review.

Target systems: Kubernetes 1.25+; OPA Gatekeeper and Kyverno admission controllers; ArgoCD and Flux GitOps controllers; GitHub Actions and GitLab CI PR workflows; cluster RBAC configurations.

Threat Model

Adversary 1 — Direct PR to webhook configuration manifest. An attacker with write access to the GitOps repository (via compromised credentials, social engineering, or a bot account) submits a PR that modifies a ValidatingWebhookConfiguration or MutatingWebhookConfiguration manifest. The change may be subtle: narrowing a namespaceSelector so that fewer namespaces are covered, changing failurePolicy, adding an objectSelector exclusion, or removing a rules entry. The PR description explains the change as a compatibility fix or performance improvement. If the change passes review, the next ArgoCD sync applies it to all clusters that source from this repository.

Adversary 2 — PR to controller code that registers webhooks. A PR modifies the webhook controller’s Go or Python source code. The registration logic constructs a ValidatingWebhookConfiguration object and applies it to the cluster at startup. A patch that changes the namespaceSelector expression, removes a rule, or modifies the failurePolicy default produces a different runtime webhook configuration than the current one but leaves no diff in any static manifest file. This is more subtle and requires reviewers to understand the controller’s registration code path.

Adversary 3 — PR that adds a new webhook endpoint the attacker controls. A PR adds a new entry to an existing MutatingWebhookConfiguration pointing to a webhook server under the attacker’s control. The new entry’s rules are broad — applying to all pod creates. The attacker’s webhook server mutates pods to add environment variables containing secrets, or to replace container images with attacker-controlled images. The webhook server endpoint looks like a legitimate internal service in the PR diff.
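
A new webhook entry can be screened by checking where its clientConfig sends admission requests. The sketch below assumes a strict policy in which every webhook must reference an in-cluster Service from an allowlist and any clientConfig.url is reported for review; the function name and the service names in the usage note are hypothetical.

```python
from typing import Any


def unexpected_endpoints(
    config: dict[str, Any],
    allowed_services: set[tuple[str, str]],
) -> list[str]:
    """Report webhook entries whose endpoint is not an allowlisted Service.

    allowed_services holds (namespace, name) pairs of trusted webhook Services.
    """
    findings: list[str] = []
    for webhook in config.get("webhooks", []):
        name = webhook.get("name", "(unnamed)")
        client = webhook.get("clientConfig") or {}
        if "url" in client:
            # A URL endpoint can point anywhere, including outside the cluster.
            findings.append(f"{name}: uses clientConfig.url {client['url']}")
            continue
        service = client.get("service") or {}
        ref = (service.get("namespace"), service.get("name"))
        if ref not in allowed_services:
            findings.append(f"{name}: service {ref[0]}/{ref[1]} is not allowlisted")
    return findings
```

Called with an allowlist such as `{("security", "pod-security-webhook")}`, the function returns one finding per entry that points elsewhere. An endpoint that merely looks like an internal service in the diff is still reported unless it is on the list.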

Adversary 4 — Maintainer account compromise followed by webhook modification. An attacker compromises a repository maintainer’s account through credential stuffing, phishing, or session token theft. They bypass the PR process and push directly to the main branch, or approve their own PR using the compromised account. The webhook configuration change is applied in the next GitOps sync cycle — potentially within minutes.

  • Access objective: Disable pod security enforcement for target namespaces, enabling deployment of privileged containers; redirect mutation webhook to attacker-controlled server to intercept or modify pod specifications.
  • Detection surface: PR diff analysis, ArgoCD sync diff alerts, Kubernetes audit logs, OPA/Kyverno meta-policies.
  • Blast radius: Depending on the webhook’s scope, a single configuration change can remove pod security enforcement for an entire cluster or specific production namespaces.

Hardening Configuration

Step 1: OPA/Rego Policy Denying Webhook Configuration Changes from Non-Approved Identities

OPA Gatekeeper meta-policies enforce constraints on admission webhook configurations themselves, creating a second enforcement layer that operates independently of the GitOps process.

# gatekeeper/webhook-config-protection.rego
# Enforces that ValidatingWebhookConfiguration and MutatingWebhookConfiguration
# objects can only be created or modified by approved service accounts.

package kubernetes.validating.webhookprotection

import future.keywords.in

# Approved identities — only these service accounts may modify webhook configs.
approved_identities := {
  "system:serviceaccount:argocd:argocd-application-controller",
  "system:serviceaccount:kube-system:webhook-controller",
}

# Security-critical webhooks that require stricter protection.
protected_webhook_names := {
  "gatekeeper-validating-webhook-configuration",
  "kyverno-resource-validating-webhook-cfg",
  "pod-security-webhook",
}

# Gatekeeper evaluates violation rules and exposes the AdmissionRequest
# as input.review (not input.request).
violation[{"msg": msg}] {
  # Match create/update on webhook configuration types.
  input.review.kind.kind in {
    "ValidatingWebhookConfiguration",
    "MutatingWebhookConfiguration",
  }
  input.review.operation in {"CREATE", "UPDATE"}
  
  # Check if the requesting identity is approved.
  requesting_user := input.review.userInfo.username
  not requesting_user in approved_identities
  
  msg := sprintf(
    "Webhook configuration modification denied: %v is not in the approved identity list. Webhook configs may only be modified by: %v",
    [requesting_user, approved_identities]
  )
}

violation[{"msg": msg}] {
  # Specifically protect named critical webhooks.
  input.review.kind.kind in {
    "ValidatingWebhookConfiguration",
    "MutatingWebhookConfiguration",
  }
  input.review.operation == "UPDATE"
  
  webhook_name := input.review.object.metadata.name
  webhook_name in protected_webhook_names
  
  # Detect failurePolicy downgrade by pairing old and new entries by name.
  old_webhooks := {w | w := input.review.oldObject.webhooks[_]}
  new_webhooks := {w | w := input.review.object.webhooks[_]}
  
  some old_wh in old_webhooks
  some new_wh in new_webhooks
  old_wh.name == new_wh.name
  old_wh.failurePolicy == "Fail"
  new_wh.failurePolicy == "Ignore"
  
  msg := sprintf(
    "Webhook failurePolicy downgrade denied: webhook %v in %v had failurePolicy: Fail and cannot be changed to Ignore without explicit approval",
    [old_wh.name, webhook_name]
  )
}

Apply the constraint:

# gatekeeper/webhook-config-constraint-template.yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: webhookconfigprotection
spec:
  crd:
    spec:
      names:
        kind: WebhookConfigProtection
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        # (paste the Rego above)
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: WebhookConfigProtection
metadata:
  name: protect-admission-webhooks
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: ["admissionregistration.k8s.io"]
        kinds:
          - ValidatingWebhookConfiguration
          - MutatingWebhookConfiguration
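
Before deploying an identity allowlist like the one in Step 1, it helps to run a few table-driven cases against the decision logic. The sketch below mirrors the identity rule in Python purely for illustration; the `request` dictionary follows the shape of a Kubernetes AdmissionRequest, and the function name is hypothetical. It does not replace testing the Rego itself, for example with `opa test`.

```python
from typing import Any, Optional

WEBHOOK_KINDS = {"ValidatingWebhookConfiguration", "MutatingWebhookConfiguration"}


def identity_violation(request: dict[str, Any], approved: set[str]) -> Optional[str]:
    """Return a denial message, or None when the request is allowed."""
    if request.get("kind", {}).get("kind") not in WEBHOOK_KINDS:
        return None  # not a webhook configuration, out of scope
    if request.get("operation") not in {"CREATE", "UPDATE"}:
        return None  # only create and update are checked, as in the policy
    user = request.get("userInfo", {}).get("username", "")
    if user in approved:
        return None
    return (
        f"Webhook configuration modification denied: {user} "
        "is not in the approved identity list"
    )
```

Useful cases include an approved service account, a human user with cluster-admin, and a request for an unrelated kind, which should pass through untouched.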

Step 2: Kyverno ClusterPolicy Blocking Webhook Configuration Drift

Kyverno’s validate rules can enforce invariants on webhook configurations, blocking specific high-risk changes:

# kyverno/webhook-integrity-policy.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: webhook-configuration-integrity
  annotations:
    policies.kyverno.io/title: Admission Webhook Configuration Integrity
    policies.kyverno.io/severity: critical
    policies.kyverno.io/description: >
      Prevents security-degrading changes to admission webhook configurations.
      Blocks failurePolicy downgrades on all webhooks and exclusion-style
      namespace selectors (NotIn, DoesNotExist) on security-critical webhooks.
spec:
  validationFailureAction: Enforce
  rules:
    - name: block-failurepolicy-downgrade
      match:
        any:
          - resources:
              kinds:
                - ValidatingWebhookConfiguration
                - MutatingWebhookConfiguration
              operations:
                - UPDATE
      validate:
        message: >
          Changing failurePolicy from Fail to Ignore is not permitted.
          This change silently disables enforcement when the webhook is unavailable.
        foreach:
          - list: "request.object.webhooks"
            deny:
              conditions:
                all:
                  - key: "{{ element.failurePolicy }}"
                    operator: Equals
                    value: Ignore
                  - key: >
                      {{ request.oldObject.webhooks[?name=='{{ element.name }}'].failurePolicy | [0] }}
                    operator: Equals
                    value: Fail

    - name: block-namespace-selector-exclusions
      match:
        any:
          - resources:
              kinds:
                - ValidatingWebhookConfiguration
              names:
                - "gatekeeper-*"
                - "kyverno-*"
                - "pod-security-*"
              operations:
                - UPDATE
      validate:
        message: >
          Exclusion-style namespaceSelector operators (NotIn, DoesNotExist) are
          not permitted on security-critical webhooks because they can remove
          production namespaces from security enforcement.
        foreach:
          - list: "request.object.webhooks"
            deny:
              conditions:
                any:
                  # Fall back to an empty list when no matchExpressions are set.
                  - key: "{{ element.namespaceSelector.matchExpressions[].operator || `[]` }}"
                    operator: AnyIn
                    value:
                      - NotIn
                      - DoesNotExist
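
The reason NotIn and DoesNotExist deserve attention is visible in how a label selector is evaluated: a single such expression is enough to take a namespace out of a webhook's scope. The following sketch implements standard Kubernetes label-selector semantics for illustration; the function names are hypothetical, and the example relies on the `kubernetes.io/metadata.name` label that Kubernetes sets on every namespace.

```python
from typing import Any


def expression_matches(expr: dict[str, Any], labels: dict[str, str]) -> bool:
    """Evaluate one matchExpressions entry against a set of labels."""
    key, operator = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        # Also matches when the key is absent.
        return labels.get(key) not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unknown operator: {operator}")


def selector_matches(selector: dict[str, Any], labels: dict[str, str]) -> bool:
    """True if the namespace labels satisfy the selector (empty matches all)."""
    selector = selector or {}
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    return all(
        expression_matches(expr, labels)
        for expr in selector.get("matchExpressions", [])
    )
```

With a selector containing `{"key": "kubernetes.io/metadata.name", "operator": "NotIn", "values": ["production"]}`, the function returns False for the production namespace and True for every other namespace, which means the webhook is never consulted for production workloads.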

Step 3: ArgoCD GitOps Configuration Treating Webhook Manifests as Immutable

Configure ArgoCD so that webhook configurations are sourced only from the GitOps repository and any drift of the in-cluster state from Git is reverted automatically:

# argocd/webhook-configs-app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: admission-webhook-configs
  namespace: argocd
  annotations:
    # Notify security team on any sync that touches webhook resources.
    notifications.argoproj.io/subscribe.on-sync-succeeded.slack: security-alerts
spec:
  project: security-controlled
  source:
    repoURL: https://github.com/org/cluster-security-configs
    targetRevision: HEAD
    path: webhook-configurations/
  destination:
    server: https://kubernetes.default.svc
    namespace: ""  # Cluster-scoped resources.
  syncPolicy:
    automated:
      prune: true      # Remove webhook configs not in git.
      selfHeal: true   # Immediately revert manual changes.
    syncOptions:
      - RespectIgnoreDifferences=false
      - ApplyOutOfSyncOnly=true
  # Ignore only labels and annotations that change during normal operation.
  ignoreDifferences:
    - group: admissionregistration.k8s.io
      kind: ValidatingWebhookConfiguration
      jsonPointers:
        - /metadata/labels/app.kubernetes.io~1version

Restrict the ArgoCD project to allow only the security team’s repository as a source for webhook configurations:

# argocd/security-controlled-project.yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: security-controlled
  namespace: argocd
spec:
  description: "Security-controlled resources — webhook configs, RBAC, NetworkPolicies"
  sourceRepos:
    # Only this specific repository may manage webhook configurations.
    - "https://github.com/org/cluster-security-configs"
  destinations:
    - namespace: "*"
      server: https://kubernetes.default.svc
  clusterResourceWhitelist:
    - group: admissionregistration.k8s.io
      kind: ValidatingWebhookConfiguration
    - group: admissionregistration.k8s.io
      kind: MutatingWebhookConfiguration
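  # Sync windows: webhook config changes sync only during business hours.
  # Automated syncs, including selfHeal, are held outside an allow window.
  syncWindows:
    - kind: allow
      # Cron field order is minute, hour, day, month, weekday: 09:00 on Mon-Fri.
      schedule: "0 9 * * 1-5"
      duration: 8h
      applications:
        - admission-webhook-configs
      # false also blocks manual syncs outside the window; true would permit them.
      manualSync: false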
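
Cron expressions list fields in the order minute, hour, day of month, month, day of week, so a window that opens at 09:00 on weekdays is written `0 9 * * 1-5`. The sketch below shows which timestamps such a window with an 8h duration covers. It is an illustrative helper with a hypothetical name, it ignores time zones, and it does not handle windows that span midnight.

```python
from datetime import datetime, timedelta


def in_weekday_window(now: datetime, start_hour: int = 9, duration_hours: int = 8) -> bool:
    """True if `now` falls inside a window that opens at start_hour, Monday to Friday."""
    if now.weekday() > 4:  # 5 is Saturday, 6 is Sunday
        return False
    opens = now.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    return opens <= now < opens + timedelta(hours=duration_hours)
```

A change merged at 18:30 on a Friday therefore waits until the window opens on Monday morning, which gives reviewers time to notice it before it reaches the cluster.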

Step 4: Diff-Based CI Check Alerting on Webhook Registration Changes

The following CI check runs on every PR that touches webhook-related code or manifests and prints an explicit diff of the effective webhook configuration:

#!/bin/bash
# .github/scripts/check-webhook-drift.sh
# Compares the webhook configuration that would be applied after this PR
# with the current cluster configuration.
# Runs in CI with read-only cluster access.

set -euo pipefail

CHANGED_FILES=$(git diff --name-only origin/main...HEAD)
WEBHOOK_AFFECTED=false

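# Check if webhook manifests or controller code changed. Paths are matched by
# directory and file name; manifests are also matched on their kind, because
# a manifest file name rarely contains the resource kind.
for f in $CHANGED_FILES; do
  if echo "$f" | grep -qE \
    "(webhook-configurations/|webhook.*controller|admission.*handler|(^|/)(controllers|webhooks|admission)/)" ||
    { [ -f "$f" ] && grep -qE "^kind: (Validating|Mutating)WebhookConfiguration" "$f"; }; then
    WEBHOOK_AFFECTED=true
    echo "Webhook-related change detected: $f"
  fi
done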

if [ "$WEBHOOK_AFFECTED" = "false" ]; then
  echo "No webhook-related changes detected — skipping webhook diff check."
  exit 0
fi

echo "=== Webhook Configuration Diff Analysis ==="
echo "This PR modifies files that affect admission webhook behaviour."
echo ""

# Extract current webhook configs (validating and mutating) from the cluster.
for resource in validatingwebhookconfigurations mutatingwebhookconfigurations; do
  kubectl get "$resource" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | \
    while read -r webhook_name; do
      kubectl get "$resource" "$webhook_name" -o yaml \
        > "/tmp/current-${webhook_name}.yaml"
    done
done

# For manifest changes, diff the YAML directly.
# Assumes one webhook configuration per file and mikefarah yq v4.
for f in $CHANGED_FILES; do
  # Skip files deleted by the PR.
  [ -f "$f" ] || continue
  # kubectl dry-run prints lowercase resource names, so match on the kind field instead.
  if grep -qE "^kind: (Validating|Mutating)WebhookConfiguration" "$f"; then
    WEBHOOK_NAME=$(yq '.metadata.name' "$f")
    if [ -f "/tmp/current-${WEBHOOK_NAME}.yaml" ]; then
      echo "--- Diff for webhook: ${WEBHOOK_NAME} ---"
      diff -u \
        <(yq '.webhooks[] | {"name": .name, "failurePolicy": .failurePolicy, "namespaceSelector": .namespaceSelector, "rules": .rules}' \
          "/tmp/current-${WEBHOOK_NAME}.yaml" 2>/dev/null) \
        <(yq '.webhooks[] | {"name": .name, "failurePolicy": .failurePolicy, "namespaceSelector": .namespaceSelector, "rules": .rules}' \
          "$f" 2>/dev/null) || true
    fi
  fi
done

# Fail if failurePolicy is being downgraded: the diff must both remove a
# "failurePolicy: Fail" line and add a "failurePolicy: Ignore" line.
WEBHOOK_DIFF=$(git diff origin/main...HEAD -- "*.yaml" "*.yml")
if grep -qE "^-.*failurePolicy:[[:space:]]*Fail" <<< "$WEBHOOK_DIFF" && \
   grep -qE "^\+.*failurePolicy:[[:space:]]*Ignore" <<< "$WEBHOOK_DIFF"; then
  echo ""
  echo "FAIL: failurePolicy downgrade detected (Fail -> Ignore)."
  echo "This change disables webhook enforcement on API server errors."
  echo "Requires security team review."
  exit 1
fi

echo ""
echo "Webhook diff check complete. Changes require security team review."
# Exit 0 to allow PR but generate an explicit annotation for reviewers.
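
A grep over the whole diff cannot tell whether the removed Fail and the added Ignore belong to the same webhook. One refinement, sketched here under the assumption that the input is standard unified diff text such as `git diff` output, is to require both lines inside the same hunk. The function name is hypothetical and the sketch needs Python 3.9 or later.

```python
import re

_REMOVED_FAIL = re.compile(r"^-\s*failurePolicy:\s*Fail\b")
_ADDED_IGNORE = re.compile(r"^\+\s*failurePolicy:\s*Ignore\b")


def downgraded_hunks(diff_text: str) -> list[str]:
    """Return 'file: hunk header' for hunks that replace Fail with Ignore."""
    findings: list[str] = []
    current_file = "(unknown file)"
    hunk_header = None
    removed_fail = added_ignore = False

    def close_hunk() -> None:
        # Record the hunk only if it contained both halves of the downgrade.
        if hunk_header is not None and removed_fail and added_ignore:
            findings.append(f"{current_file}: {hunk_header}")

    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            close_hunk()
            current_file = line[4:].removeprefix("b/")
            hunk_header, removed_fail, added_ignore = None, False, False
        elif line.startswith("@@"):
            close_hunk()
            hunk_header, removed_fail, added_ignore = line, False, False
        elif _REMOVED_FAIL.match(line):
            removed_fail = True
        elif _ADDED_IGNORE.match(line):
            added_ignore = True
    close_hunk()
    return findings
```

Hunk-level pairing is still a heuristic: a large hunk can cover several webhooks, so a finding is a prompt for human review and not a verdict.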

Step 5: RBAC Restricting Webhook Configuration Modification

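Grant write access to webhook configurations only to the service accounts that manage them. RBAC is additive, so the role below restricts nothing by itself; the restriction comes from removing every other binding that grants these verbs: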

# rbac/webhook-config-admin-clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: webhook-configuration-admin
rules:
  - apiGroups: ["admissionregistration.k8s.io"]
    resources:
      - validatingwebhookconfigurations
      - mutatingwebhookconfigurations
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
# Only ArgoCD and specific webhook controllers get this role.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: webhook-configuration-admin-binding
subjects:
  - kind: ServiceAccount
    name: argocd-application-controller
    namespace: argocd
  - kind: ServiceAccount
    name: gatekeeper-admin
    namespace: gatekeeper-system
roleRef:
  kind: ClusterRole
  name: webhook-configuration-admin
  apiGroup: rbac.authorization.k8s.io
---
# Audit who currently has this access.
# Run: kubectl get clusterrolebindings -o json | \
#   jq '.items[] | select(.roleRef.name | test("admin|cluster-admin")) |
#   {binding: .metadata.name, subjects: .subjects}'

Verify that no other principals hold webhook modification access, for example through cluster-admin bindings:

# Audit webhook config write access.
kubectl auth can-i update validatingwebhookconfigurations \
  --as system:serviceaccount:default:default
# Expected: no

# Check all service accounts in application namespaces. Webhook configurations
# are cluster-scoped, so ask the API server directly whether each account can
# update them instead of grepping binding names.
for ns in $(kubectl get ns -o name | sed 's|namespace/||' | grep -v "kube-\|argocd\|gatekeeper"); do
  for sa in $(kubectl get serviceaccounts -n "$ns" -o name | sed 's|serviceaccount/||'); do
    if kubectl auth can-i update validatingwebhookconfigurations \
         --as "system:serviceaccount:${ns}:${sa}" >/dev/null 2>&1; then
      echo "REVIEW: ${ns}/${sa} can update validatingwebhookconfigurations"
    fi
  done
done
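
Wildcard rules are the reason broad admin roles defeat a narrowly scoped webhook role. The sketch below checks whether a list of ClusterRole rules, as found under `.rules` in `kubectl get clusterrole <name> -o json`, grants write access to webhook configurations, including through `*` wildcards. The function name is hypothetical, and `resourceNames` restrictions are ignored for simplicity.

```python
from typing import Any

WEBHOOK_GROUP = "admissionregistration.k8s.io"
WEBHOOK_RESOURCES = {"validatingwebhookconfigurations", "mutatingwebhookconfigurations"}
WRITE_VERBS = {"create", "update", "patch", "delete", "deletecollection"}


def grants_webhook_write(rules: list[dict[str, Any]]) -> bool:
    """True if any rule allows modifying webhook configurations."""
    for rule in rules:
        groups = set(rule.get("apiGroups", []))
        resources = set(rule.get("resources", []))
        verbs = set(rule.get("verbs", []))
        group_ok = "*" in groups or WEBHOOK_GROUP in groups
        resource_ok = "*" in resources or bool(resources & WEBHOOK_RESOURCES)
        verb_ok = "*" in verbs or bool(verbs & WRITE_VERBS)
        if group_ok and resource_ok and verb_ok:
            return True
    return False
```

Applied to the rules of cluster-admin, which use `*` for groups, resources, and verbs, the function returns True. Any subject bound to such a role can modify webhook configurations regardless of the dedicated role defined above.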

Step 6: Audit Log Monitoring for Webhook Configuration Modifications

# falco/webhook-config-modification-rule.yaml
# Falco rule to alert on runtime webhook configuration changes.
# Requires the Falco k8saudit plugin; older Falco releases name the source k8s_audit.
- rule: Admission Webhook Configuration Modified
  desc: >
    Detect modification of ValidatingWebhookConfiguration or
    MutatingWebhookConfiguration objects via the Kubernetes API.
    These objects gate all admission to the cluster.
  condition: >
    ka.verb in (create, update, patch, delete) and
    ka.target.resource in (
      validatingwebhookconfigurations,
      mutatingwebhookconfigurations
    ) and
    not ka.user.name in (
      "system:serviceaccount:argocd:argocd-application-controller",
      "system:serviceaccount:gatekeeper-system:gatekeeper-admin",
      "system:serviceaccount:kyverno:kyverno-admission-controller"
    )
  output: >
    Webhook configuration modified by unexpected identity
    (user=%ka.user.name verb=%ka.verb
     resource=%ka.target.resource name=%ka.target.name
     response=%ka.response.code)
  priority: CRITICAL
  source: k8saudit
  tags: [admission-control, supply-chain, kubernetes]
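
To review recent changes directly, query the API server audit log:

# Query audit logs for webhook configuration changes (last 24 hours).
# Assumes the API server writes audit events to stdout (--audit-log-path=-);
# otherwise query the audit log file or the cloud provider's audit log service.
# --tail=-1 is needed because kubectl returns only 10 lines per pod when a
# label selector is used.
kubectl logs -n kube-system \
  -l component=kube-apiserver \
  --since=24h --tail=-1 2>/dev/null | \
  jq -rR 'fromjson? |
    select(.objectRef.resource == "validatingwebhookconfigurations" or
           .objectRef.resource == "mutatingwebhookconfigurations") |
    "\(.requestReceivedTimestamp) \(.user.username) \(.verb) \(.objectRef.name)"' \
  2>/dev/null | sort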

# For Datadog/Splunk users, the equivalent query:
# kubernetes.audit.objectRef.resource:validatingwebhookconfigurations
# OR kubernetes.audit.objectRef.resource:mutatingwebhookconfigurations
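
Where audit events are available as JSON lines, the same filter can be applied offline. The sketch below reads audit events, keeps completed write operations on webhook configurations, and drops those made by approved identities. The field names follow the Kubernetes audit Event format; the function name and the approved set are assumptions for illustration.

```python
import json
from typing import Iterable

WEBHOOK_RESOURCES = {"validatingwebhookconfigurations", "mutatingwebhookconfigurations"}
WRITE_VERBS = {"create", "update", "patch", "delete", "deletecollection"}


def unexpected_webhook_writes(
    lines: Iterable[str],
    approved: set[str],
) -> list[tuple[str, str, str, str]]:
    """Return (timestamp, user, verb, name) for writes by non-approved users."""
    findings = []
    for raw in lines:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip non-JSON log lines
        if not isinstance(event, dict):
            continue
        # Each request is logged at several stages; keep the final one only.
        if event.get("stage") != "ResponseComplete":
            continue
        ref = event.get("objectRef") or {}
        if ref.get("resource") not in WEBHOOK_RESOURCES:
            continue
        verb = event.get("verb", "")
        if verb not in WRITE_VERBS:
            continue
        user = (event.get("user") or {}).get("username", "")
        if user in approved:
            continue
        findings.append(
            (event.get("requestReceivedTimestamp", ""), user, verb, ref.get("name", ""))
        )
    return sorted(findings)
```

The approved set should match the identities excluded in the Falco rule so that both detection paths agree on what counts as unexpected.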

Expected Behaviour After Hardening

PR with failurePolicy downgrade blocked in CI. A PR modifies manifests/webhook-configurations/gatekeeper.yaml, the manifest for gatekeeper-validating-webhook-configuration, to change failurePolicy: Fail to failurePolicy: Ignore as part of a “stability improvement for high-load periods.” The CI webhook drift check detects the change:

Webhook-related change detected: manifests/webhook-configurations/gatekeeper.yaml

=== Webhook Configuration Diff Analysis ===

--- Diff for webhook: gatekeeper-validating-webhook-configuration ---
-  failurePolicy: Fail
+  failurePolicy: Ignore

FAIL: failurePolicy downgrade detected (Fail -> Ignore).
This change disables webhook enforcement on API server errors.
Requires security team review.
Error: Process completed with exit code 1.

The CI check fails. The PR cannot merge until the security team explicitly reviews and approves.

Kyverno policy blocks runtime modification. An operator with cluster-admin access attempts to patch the Gatekeeper webhook configuration directly from their terminal, and the Kyverno policy from Step 2 rejects the request:

$ kubectl patch validatingwebhookconfiguration gatekeeper-validating-webhook-configuration \
  --type=merge -p '{"webhooks":[{"name":"validation.gatekeeper.sh","failurePolicy":"Ignore"}]}'

Error from server ([webhook-configuration-integrity] Changing failurePolicy from
Fail to Ignore is not permitted. This change silently disables enforcement when
the webhook is unavailable.): admission webhook
"webhook-configuration-integrity.kyverno.svc" denied the request

Falco alert on unexpected modification. A pod running in the cluster attempts to modify the webhook configuration using a service account token:

10:42:17 CRITICAL Admission Webhook Configuration Modified
  user=system:serviceaccount:app-team:deployer
  verb=update
  resource=validatingwebhookconfigurations
  name=pod-security-webhook
  response=403

Trade-offs and Operational Considerations

Control | Benefit | Cost / Friction
OPA/Kyverno meta-policies | Prevents runtime webhook modification; enforces invariants cluster-wide | Must not be enforced before the webhook controller bootstraps; circular dependency risk during cluster init
ArgoCD immutable sync with selfHeal | Immediately reverts manual webhook changes; prevents drift | Breaks “emergency” manual changes; requires runbook for legitimate emergency modifications
CI diff check | Catches changes in PRs before they reach the cluster | Requires cluster API read access in CI; false positives on legitimate webhook additions
RBAC restriction | Limits blast radius of compromised credentials | Wide ClusterAdmin bindings (common in many clusters) override RBAC restrictions; audit is one-time
Falco audit monitoring | Real-time alerting on unexpected webhook modifications | Requires Falco deployment; audit log format varies by cloud provider; alert routing adds operational burden
Kyverno failurePolicy rule | Prevents the highest-impact single webhook change | Kyverno itself has a webhook configuration; Kyverno’s own configuration needs separate protection

The bootstrap ordering problem: OPA Gatekeeper and Kyverno themselves register webhook configurations at startup. If your meta-policy blocks all webhook configuration updates from non-approved identities, and the approved identities list doesn’t include the Gatekeeper/Kyverno service accounts, bootstrapping the cluster requires a deliberate exception or a two-phase deployment.

Failure Modes

Failure Mode | Consequence | Prevention
OPA/Kyverno meta-policy has failurePolicy: Ignore | The meta-policy that protects other webhooks can itself be bypassed when the admission controller is unavailable | Set the meta-policy’s own failurePolicy to Fail; monitor the admission controller’s availability as a critical service
ArgoCD service account has overly broad RBAC | Compromising ArgoCD gives direct webhook modification access | Scope ArgoCD’s service account using AppProject restrictions; use separate service accounts per application
Protected webhook names list is incomplete | New security webhooks deployed without adding them to the protection policy | Use label-based matching (app.kubernetes.io/component: security-enforcement) rather than name-based
CI check only runs on YAML changes | Controller code PR bypasses the check entirely | Trigger webhook diff check on changes to any directory under controllers/, webhooks/, or admission/
Legitimate emergency modification is blocked | Incident response requires a webhook change but all paths are blocked | Maintain a documented break-glass procedure: specific human approval in the GitOps repo grants a temporary RBAC binding; all steps are audited
Contributor adds a new webhook endpoint PR (Adversary 3) | New MutatingWebhookConfiguration entry is missed by policies focused on existing webhook names | Alert on all create operations for MutatingWebhookConfiguration, not just updates to existing ones