Security Validation for AI-Generated Kubernetes Manifests

Problem

AI coding assistants are widely used to generate Kubernetes manifests. When a developer asks Copilot, Claude, or Cursor to “write a Deployment for my Python API with 3 replicas”, the assistant produces working YAML in seconds. The manifest deploys correctly, runs the application, and satisfies the developer’s immediate need.

The security problem is that AI-generated Kubernetes manifests systematically reproduce the worst patterns from their training data. The internet contains vastly more insecure Kubernetes YAML than secure YAML — tutorials that skip security context, Stack Overflow answers that add privileged: true to fix permission issues, blog posts from 2019 that predate Pod Security Standards. AI assistants learn these patterns and apply them by default.

The specific misconfiguration classes that appear reliably in AI-generated manifests:

Missing securityContext entirely. Most AI-generated Deployments omit the securityContext block at both pod and container levels. This means containers run as root (uid 0) unless the image sets a non-root user, can escalate privileges via setuid binaries, have no explicit seccomp profile, and have a writable root filesystem. Running as root, allowing privilege escalation, and leaving seccomp unset are what Pod Security Admission’s restricted policy exists to reject (the baseline policy allows all three), but many clusters do not enforce it.

privileged: true in containers. When an AI assistant encounters a workload that needs device access, hostPath mounts, or elevated capabilities, it defaults to privileged: true, which is the equivalent of giving the container root on the host node. The correct solution is almost always to drop all capabilities and add back only the specific ones needed, but this requires a detailed understanding of the workload that the AI lacks.

Over-broad ClusterRoleBindings. AI-generated RBAC frequently uses cluster-admin as the roleRef, or creates ClusterRoles with wildcard resources and verbs (resources: ["*"], verbs: ["*"]). This pattern is almost never correct for application workloads and gives any compromise of the associated service account full cluster access.

hostNetwork: true, hostPID: true, hostIPC: true. These are shortcuts that AI assistants suggest when they observe that a workload needs to interact with the host network or processes. Each is a significant security boundary violation that eliminates the namespace isolation containers are supposed to provide.

No resource limits. AI-generated pods omit resources.limits, which means they can consume unlimited CPU and memory — enabling denial of service against the node’s other workloads.

Permissive service exposure. AI assistants frequently generate type: LoadBalancer or type: NodePort for internal services, exposing them externally when ClusterIP would suffice. Combined with the lack of NetworkPolicy, this creates unintended external exposure.

These patterns are not hypothetical edge cases. A study of public GitHub repositories containing AI-generated Kubernetes YAML (identifiable by characteristic comment patterns) found that AI-generated manifests had significantly higher rates of each of these misconfigurations compared to manifests written without AI assistance. The risk compounds as AI assistance becomes more prevalent — every new service started by an AI-assisted developer begins its life with the same set of security debts.
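As a rough illustration of how recognisable these classes are, the sketch below greps a manifest for the fields named above. It is a crude text match, not a YAML-aware scanner, and the helper name `flag_risky_fields` and the sample manifest are hypothetical; the tools in the implementation steps do this job properly.

```shell
#!/bin/sh
# Crude text-based triage for the misconfiguration classes listed above.
# flag_risky_fields is a hypothetical helper, not part of any tool.
# $1: path to a manifest file. Prints one line per finding.
flag_risky_fields() {
  # Fields whose presence is a red flag
  for pattern in \
    'privileged: true' \
    'hostNetwork: true' \
    'hostPID: true' \
    'hostIPC: true' \
    'name: cluster-admin' \
    'type: LoadBalancer' \
    'type: NodePort'
  do
    if grep -qF -- "$pattern" "$1"; then
      echo "RISK: $1 contains '$pattern'"
    fi
  done
  # Blocks whose absence is a red flag in a workload manifest
  for required in 'securityContext:' 'limits:'; do
    if ! grep -qF -- "$required" "$1"; then
      echo "RISK: $1 has no '$required' block"
    fi
  done
}

# Demo: a typical unhardened Deployment fragment (illustrative only)
sample=$(mktemp)
cat > "$sample" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent
spec:
  template:
    spec:
      hostNetwork: true
      containers:
      - name: agent
        image: example/agent:1.0
        securityContext:
          privileged: true
EOF
flag_risky_fields "$sample"
rm -f "$sample"
```

For the sample fragment this reports the privileged flag, the hostNetwork setting, and the missing limits block. Because it matches strings rather than structure, it will miss equivalent spellings (for example quoted values) and should only be treated as a quick first look.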

Target systems: any Kubernetes cluster (1.24+) where developers use AI coding assistants to generate manifests; CI/CD pipelines that deploy AI-generated YAML; platform teams responsible for security policy enforcement across development teams.


Threat Model

Adversary 1 — Privilege escalation via AI-generated privileged container. A developer asks an AI assistant for a monitoring agent manifest. The AI generates privileged: true because the agent needs to read /proc. An attacker who compromises the monitoring agent uses privileged access to escape to the host node, reaching all other pods’ secrets and network traffic.

Adversary 2 — Cluster takeover via AI-generated RBAC. A developer asks an AI for an operator manifest with RBAC. The AI generates a ClusterRoleBinding referencing cluster-admin. When the operator pod is compromised via a dependency vulnerability, the attacker has full cluster access.

Adversary 3 — DoS via no resource limits. A developer deploys an AI-generated manifest with no resource limits during a traffic spike. The pod consumes all CPU and memory on the node, evicting other pods including critical system components.

Adversary 4 — Unintended external exposure. An AI assistant generates a LoadBalancer service for an internal microservice. The service is exposed on a public IP. An attacker discovers it via automated scanning and accesses an API that was intended to be internal-only.

Without validation gates: all four attacks succeed because misconfigurations go undetected until they are exploited. With validation: automated tooling catches each class at PR time with actionable remediation guidance.


Configuration / Implementation

Step 1 — Scan manifests with kube-score at PR time

kube-score performs static analysis of Kubernetes manifests and produces detailed, actionable findings:

# Install kube-score (pinned release, matching the CI workflow)
curl -L https://github.com/zegl/kube-score/releases/download/v1.18.0/kube-score_1.18.0_linux_amd64.tar.gz | tar xz
mv kube-score /usr/local/bin/

# Scan a manifest
kube-score score deployment.yaml

# Focus on critical failures
kube-score score deployment.yaml --output-format ci | grep "CRITICAL\|WARNING"

# Typical AI-generated manifest output:
# [CRITICAL] Container Security Context
#   · container: app -> Container has no configured security context
# [CRITICAL] Pod Security Context
#   · container: app -> Pod has no configured security context
# [WARNING] Container Resources
#   · container: app -> CPU limit is not set
# [CRITICAL] Container Seccomp Profile
#   · container: app -> Container has no configured Seccomp profile

Add to CI/CD as a required check:

# .github/workflows/validate-manifests.yml
name: Validate Kubernetes Manifests

on:
  pull_request:
    paths:
      - 'k8s/**'
      - 'deploy/**'
      - '**/*.yaml'
      - '**/*.yml'

permissions:
  contents: read

jobs:
  kube-score:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683

    - name: Run kube-score
      run: |
        curl -L https://github.com/zegl/kube-score/releases/download/v1.18.0/kube-score_1.18.0_linux_amd64.tar.gz | tar xz
        find . \( -name "*.yaml" -o -name "*.yml" \) \
          \( -path "*/k8s/*" -o -path "*/deploy/*" \) -print0 \
          | xargs -0 -r ./kube-score score \
              --output-format ci \
              --exit-one-on-warning
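The workflow triggers on any YAML change, but not every YAML file in a repository is a Kubernetes manifest (workflow files, for instance). One way to narrow the scan to manifests touched by the pull request is sketched below. The helper name `filter_manifests` is hypothetical, and the `k8s/` and `deploy/` prefixes are assumptions taken from the workflow's path filters.

```shell
#!/bin/sh
# Keep only paths that look like Kubernetes manifests under k8s/ or deploy/.
# Reads newline-separated paths on stdin. Adjust the directory names to the
# repository layout; they are assumptions based on the workflow path filters.
filter_manifests() {
  grep -E '^(k8s|deploy)/.+\.ya?ml$' || true
}

# Demo with a hard-coded list of changed files
printf '%s\n' 'k8s/api.yaml' 'README.md' 'deploy/db.yml' '.github/workflows/ci.yaml' \
  | filter_manifests

# In CI the list would come from git, for example:
#   git diff --name-only --diff-filter=AM origin/main...HEAD | filter_manifests \
#     | xargs -r ./kube-score score --output-format ci --exit-one-on-warning
```

The demo prints `k8s/api.yaml` and `deploy/db.yml`. The commented git invocation assumes the checkout fetched enough history for `origin/main` to be available, which the default shallow checkout does not guarantee.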

Step 2 — Run Polaris for policy-based validation

Polaris checks manifests against security best practices with configurable severity:

# Install Polaris
curl -L https://github.com/FairwindsOps/polaris/releases/latest/download/polaris_linux_amd64.tar.gz | tar xz
mv polaris /usr/local/bin/

# Audit a manifest file
polaris audit --audit-path deployment.yaml --format pretty

# Common AI-manifest findings:
# Danger: runAsPrivileged - Container is running as privileged
# Danger: runAsRootAllowed - Container may run as root
# Warning: readinessProbeMissing - Readiness probe is not configured
# Danger: dangerousCapabilities - Container has dangerous capabilities

Custom Polaris configuration to flag AI-generated patterns:

# polaris-config.yaml
checks:
  # Container security context checks
  runAsPrivileged: danger
  runAsRootAllowed: danger
  notReadOnlyRootFilesystem: warning
  privilegeEscalationAllowed: danger
  
  # Capabilities
  dangerousCapabilities: danger
  insecureCapabilities: warning
  
  # Host sharing (common in AI-generated monitoring agents)
  hostNetworkSet: danger
  hostPIDSet: danger
  hostIPCSet: danger
  
  # Resource limits
  cpuLimitsMissing: warning
  memoryLimitsMissing: warning
  cpuRequestsMissing: warning
  memoryRequestsMissing: warning
  
  # Image
  tagNotSpecified: danger
  pullPolicyNotAlways: warning

exemptions:
  # Exempt specific system namespaces
  - namespace: kube-system
    controllerNames: [cilium-agent, calico-node]
    rules: [hostNetworkSet, hostPIDSet]
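To make the custom configuration gate a pipeline, Polaris can be pointed at it and told to exit non-zero on danger-level findings. The wrapper below is a minimal sketch: `run` and `polaris_gate` are hypothetical helper names, and the `POLARIS_CONFIG` variable is an assumption introduced here for illustration.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1,
# which is handy for reviewing what a CI step would execute.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Audit a manifest file or directory ($1) with the custom configuration.
# --set-exit-code-on-danger makes the audit fail on danger-level findings.
polaris_gate() {
  run polaris audit \
    --config "${POLARIS_CONFIG:-polaris-config.yaml}" \
    --audit-path "$1" \
    --format pretty \
    --set-exit-code-on-danger
}

# Show the command that would run, without needing Polaris installed
DRY_RUN=1 polaris_gate k8s/
```

Without `DRY_RUN=1` the function runs the audit itself, so a CI step can call `polaris_gate k8s/` directly and rely on the exit code.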

Step 3 — Enforce with Kyverno admission policies

Catch misconfigurations at admission time, before pods even start:

# kyverno-ai-manifest-policy.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: validate-ai-generated-manifests
  annotations:
    policies.kyverno.io/description: >
      Prevents the most common AI-generated manifest misconfigurations
      from reaching production clusters.
spec:
  validationFailureAction: Enforce
  background: true
  rules:

  # Block privileged containers
  - name: deny-privileged
    match:
      any:
      - resources:
          kinds: [Pod]
    exclude:
      any:
      - resources:
          namespaces: [kube-system, cilium, calico-system]
    validate:
      message: "Privileged containers are not permitted. Drop all capabilities and add only what is needed."
      deny:
        conditions:
          any:
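          # Count containers that set privileged: true (JMESPath literals need backticks).
          # initContainers and ephemeralContainers would need equivalent conditions.
          - key: "{{ request.object.spec.containers[?securityContext.privileged == `true`] | length(@) }}"
            operator: GreaterThan
            value: 0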

  # Require security context
  - name: require-security-context
    match:
      any:
      - resources:
          kinds: [Pod]
          namespaces: ["production", "staging", "development"]
    validate:
      message: "All containers must have a securityContext with allowPrivilegeEscalation: false"
      pattern:
        spec:
          containers:
          - securityContext:
              allowPrivilegeEscalation: false

  # Block cluster-admin ClusterRoleBindings for application namespaces
  - name: deny-cluster-admin-binding
    match:
      any:
      - resources:
          kinds: [ClusterRoleBinding]
    exclude:
      any:
      - resources:
          names:
          - "cluster-admin"  # System binding
          - "system:*"
    validate:
      message: "ClusterRoleBindings referencing cluster-admin require security team approval annotation"
      deny:
        conditions:
          all:
          - key: "{{ request.object.roleRef.name }}"
            operator: Equals
            value: "cluster-admin"
          - key: "{{ request.object.metadata.annotations.\"security.example.com/cluster-admin-approved\" || '' }}"
            operator: NotEquals
            value: "true"

  # Block host namespace sharing
  - name: deny-host-namespaces
    match:
      any:
      - resources:
          kinds: [Pod]
          namespaces: ["production", "staging"]
    validate:
      message: "hostNetwork, hostPID, and hostIPC must not be enabled in production/staging"
      pattern:
        spec:
          =(hostNetwork): false
          =(hostPID): false
          =(hostIPC): false

  # Require resource limits
  - name: require-resource-limits
    match:
      any:
      - resources:
          kinds: [Pod]
          namespaces: ["production", "staging"]
    validate:
      message: "Resource limits (CPU and memory) must be set on all containers"
      pattern:
        spec:
          containers:
          - resources:
              limits:
                memory: "?*"
                cpu: "?*"
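Switching a cluster straight to Enforce can break existing workloads, so the policy is usually applied in Audit mode first and flipped once findings are fixed. The sketch below rewrites the policy-wide action with sed; `set_failure_action` is a hypothetical helper, and it assumes the policy keeps the single `validationFailureAction` field shown above.

```shell
#!/bin/sh
# Rewrite the policy-wide failure action. Reads the policy on stdin.
# $1 is the target mode: Audit or Enforce.
set_failure_action() {
  sed -E "s/^([[:space:]]*validationFailureAction:[[:space:]]*)(Audit|Enforce)[[:space:]]*$/\1$1/"
}

# Demo on a trimmed-down policy header
cat <<'EOF' | set_failure_action Audit
spec:
  validationFailureAction: Enforce
  background: true
EOF

# Against the real file, something like:
#   set_failure_action Audit < kyverno-ai-manifest-policy.yaml | kubectl apply -f -
```

In Audit mode Kyverno records violations in policy reports instead of rejecting requests, so the findings can be reviewed before the same command is run again with `Enforce`.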

Step 4 — Provide secure AI prompt templates

Reduce the problem at the source by giving developers prompts that produce more secure manifests:

# Secure Kubernetes Manifest Prompt Template
# Save as .github/prompts/k8s-deployment.md and reference in AI coding assistant settings

When generating Kubernetes Deployment manifests:
1. Always include a pod-level securityContext (spec.securityContext) with: runAsNonRoot: true, seccompProfile: {type: RuntimeDefault}
2. Always include a container-level securityContext on every container with: allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, capabilities: {drop: ["ALL"]}
3. Always include resource requests AND limits for CPU and memory
4. Use ClusterIP (not LoadBalancer) for internal services unless explicitly asked for external access
5. Set imagePullPolicy: Always for mutable tags; IfNotPresent for digest-pinned images
6. Never use privileged: true — instead, add specific capabilities back with capabilities.add
7. Never set hostNetwork, hostPID, or hostIPC unless the workload is a system agent that genuinely requires it

Alternatively, maintain a secure Deployment template that AI tools can reference:

# .github/templates/secure-deployment.yaml
# AI TEMPLATE: Use this as a base for all Deployment manifests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ app-name }}
  namespace: {{ namespace }}
spec:
  replicas: 2
  selector:
    matchLabels:
      app: {{ app-name }}
  template:
    metadata:
      labels:
        app: {{ app-name }}
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 10000
        runAsGroup: 10000
        fsGroup: 10000
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: {{ app-name }}
        image: {{ image }}:{{ tag }}
        imagePullPolicy: Always
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop: ["ALL"]
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 15
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
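The `{{ ... }}` placeholders make the template invalid YAML until they are filled in. A minimal way to render it without extra tooling is sketched below; `render_template` is a hypothetical helper, and it assumes the substituted values contain no `|` or `&` characters, which sed would treat specially.

```shell
#!/bin/sh
# Fill the template placeholders. Reads the template on stdin.
# $1: app name, $2: namespace, $3: image repository, $4: image tag
render_template() {
  sed \
    -e "s|{{ app-name }}|$1|g" \
    -e "s|{{ namespace }}|$2|g" \
    -e "s|{{ image }}|$3|g" \
    -e "s|{{ tag }}|$4|g"
}

# Demo on a few lines of the template (values are illustrative)
cat <<'EOF' | render_template my-api production registry.example.com/my-api 1.4.2
metadata:
  name: {{ app-name }}
  namespace: {{ namespace }}
image: {{ image }}:{{ tag }}
EOF

# Against the real file, something like:
#   render_template my-api production registry.example.com/my-api 1.4.2 \
#     < .github/templates/secure-deployment.yaml > k8s/my-api.yaml
```

The rendered file can then go through the same kube-score and Polaris checks as any other manifest, which confirms the template itself stays clean as it evolves.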

Step 5 — RBAC audit for AI-generated roles

#!/bin/bash
# audit-ai-rbac.sh — detect over-broad AI-generated RBAC

# Find ClusterRoles with wildcard permissions (AI hallmark)
kubectl get clusterroles -o json | jq -r '
  .items[] |
  select(
    .metadata.name | test("^system:") | not
  ) |
  . as $role |
  .rules[]? |
  select(
    any(.resources[]?; . == "*") or
    any(.verbs[]?; . == "*") or
    any(.apiGroups[]?; . == "*")
  ) |
  $role.metadata.name
' | sort -u | while read -r role; do
  echo "FINDING: ClusterRole '$role' has wildcard permissions"
done

# Find ClusterRoleBindings referencing cluster-admin
kubectl get clusterrolebindings -o json | jq -r '
  .items[] |
  select(.roleRef.name == "cluster-admin") |
  select(.metadata.name | test("^system:") | not) |
  "\(.metadata.name): \(.subjects[]? | "\(.kind)/\(.name)")"
'
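The script above inspects role definitions. A complementary check asks the API server what a given service account can actually do. The sketch below uses `kubectl auth can-i` with impersonation; `check_sa_wildcard` is a hypothetical helper, and it assumes the caller is allowed to impersonate service accounts.

```shell
#!/bin/sh
# Report whether a service account can perform any verb on any resource
# in all namespaces, which is effectively cluster-admin.
# $1: namespace, $2: service account name
check_sa_wildcard() {
  # can-i prints "yes" or "no" and exits non-zero on "no", hence the || true
  answer=$(kubectl auth can-i '*' '*' --all-namespaces \
    --as="system:serviceaccount:$1:$2" 2>/dev/null) || true
  if [ "$answer" = "yes" ]; then
    echo "FINDING: $1/$2 has wildcard access across the cluster"
  else
    echo "OK: $1/$2 is not cluster-admin equivalent"
  fi
}

# Example call against a live cluster (names are placeholders):
#   check_sa_wildcard production my-operator
```

This only detects full wildcard access. A role assembled from specific verbs and resources can still be over-broad, so the result is a floor rather than a clean bill of health.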

Expected Behaviour

| Manifest characteristic | AI-generated (unvalidated) | After validation pipeline |
|---|---|---|
| Container runs as root | Common (no securityContext) | Missing securityContext blocked by Kyverno at admission; root user flagged by Polaris (runAsRootAllowed) |
| privileged: true | Suggested by AI for monitoring agents | Blocked; developer receives actionable error |
| cluster-admin ClusterRoleBinding | Generated when AI sees “needs cluster access” | Blocked unless approved annotation present |
| Missing resource limits | Near-universal in AI output | Blocked for production/staging namespaces |
| kube-score CI check | Not present | Fails PR on CRITICAL or WARNING findings in scanned YAML |

Trade-offs

| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Kyverno Enforce mode | Hard block before admission | Breaks workflows that depended on insecure defaults | Run in Audit mode for 2 weeks; fix all findings; switch to Enforce |
| Require resource limits | Prevents DoS from AI-generated unlimited pods | Requires developers to know their app’s resource usage | Provide request/limit guidance in the secure template; start with soft limits |
| Deny cluster-admin RBAC | Prevents AI-generated cluster takeover path | Slows setup of legitimate operators that genuinely need cluster scope | Build approval process: annotate with ticket; automate via GitOps review |
| Secure prompt templates | Reduces misconfiguration at generation time | Developers must use the template; hard to enforce | Add template reference to team onboarding; include in AI assistant workspace settings |

Failure Modes

| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Kyverno blocks legitimate system agent | DaemonSet fails admission; node monitoring broken | Admission error on apply, or kubectl describe daemonset shows the Kyverno policy violation in events | Add the DaemonSet’s namespace to the Kyverno exclusion list; document the exception |
| kube-score false positive on valid pattern | PR blocked on a valid manifest pattern | Developer reports; kube-score shows unexpected CRITICAL | Add the kube-score/ignore annotation naming the specific check; review whether the check should be disabled for this resource type |
| AI generates valid YAML that passes all checks but is semantically wrong | Manifest deploys but behaves unexpectedly (wrong port, wrong selector) | Application health check fails post-deploy | Static analysis catches security misconfigs but not semantic errors; pair with integration tests |
| Wildcard RBAC check misses specific over-broad roles | Role has specific verbs/resources that together constitute cluster admin | Manual RBAC review | Supplement automated check with quarterly human RBAC audit; use kubectl auth can-i --list --as system:serviceaccount:ns:sa to verify effective permissions |