Security Validation for AI-Generated Kubernetes Manifests
Problem
AI coding assistants are widely used to generate Kubernetes manifests. When a developer asks Copilot, Claude, or Cursor to “write a Deployment for my Python API with 3 replicas”, the assistant produces working YAML in seconds. The manifest deploys correctly, runs the application, and satisfies the developer’s immediate need.
The security problem is that AI-generated Kubernetes manifests systematically reproduce the worst patterns from their training data. The internet contains vastly more insecure Kubernetes YAML than secure YAML — tutorials that skip security context, Stack Overflow answers that add privileged: true to fix permission issues, blog posts from 2019 that predate Pod Security Standards. AI assistants learn these patterns and apply them by default.
The specific misconfiguration classes that appear reliably in AI-generated manifests:
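Missing securityContext entirely. Most AI-generated Deployments omit the securityContext block at both pod and container levels. Containers then run as whatever user the image specifies, which for many images is root (uid 0). They can escalate privileges via setuid binaries because allowPrivilegeEscalation is not set to false. They run with the Unconfined seccomp profile unless the kubelet's seccomp default is enabled. They also have a writable root filesystem. Pod Security Admission's restricted policy exists to reject most of these defaults, although no Pod Security Standard level requires a read-only root filesystem. Many clusters still do not enforce it.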
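One cluster-side backstop for these defaults is Pod Security Admission, which is configured through namespace labels. The sketch below assumes a namespace named production (hypothetical). It opts that namespace into the restricted Pod Security Standard, which rejects pods that may run as root, allow privilege escalation, lack a RuntimeDefault or Localhost seccomp profile, or do not drop ALL capabilities. Enforcement happens when pods are created. The warn and audit modes also evaluate workload resources such as Deployments, so developers see violations when they apply the manifest.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production  # hypothetical namespace
  labels:
    # Reject pods that violate the restricted Pod Security Standard
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # Also warn the client and record audit annotations on violations
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```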
privileged: true in containers. When an AI assistant encounters a workload that needs device access, hostPath mounts, or elevated capabilities, it defaults to privileged: true. That is roughly equivalent to giving the container root on the host node. The correct solution is almost always to drop all capabilities and add back only the specific ones needed, but that requires a detailed understanding of the workload, which the AI lacks.
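The drop-all-then-add-back pattern looks like this in a container-level securityContext. The sketch assumes a hypothetical web server whose only elevated need is binding a port below 1024. NET_BIND_SERVICE is also the only capability the restricted Pod Security Standard allows a container to add back. A non-root process usually can only use an added capability if its binary carries the matching file capability, for example one set with setcap at image build time.

```yaml
# Container-level securityContext for a hypothetical web server that only needs to bind port 80
securityContext:
  privileged: false
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]              # start from an empty capability set
    add: ["NET_BIND_SERVICE"]  # add back the single capability the workload needs
```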
Over-broad ClusterRoleBindings. AI-generated RBAC frequently uses cluster-admin as the roleRef, or creates ClusterRoles with wildcard resources and verbs (resources: ["*"], verbs: ["*"]). This pattern is almost never correct for application workloads. It also means that anyone who compromises the associated service account gains full cluster access.
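For most application workloads, a namespaced Role and RoleBinding listing explicit resources and verbs is enough. A minimal sketch, assuming a hypothetical orders-api service account in a production namespace that only needs to read ConfigMaps:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: orders-api-config-reader
  namespace: production
rules:
  - apiGroups: [""]              # "" is the core API group
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: orders-api-config-reader
  namespace: production
subjects:
  - kind: ServiceAccount
    name: orders-api             # hypothetical service account
    namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: orders-api-config-reader
```

Because the binding is a RoleBinding to a Role, a compromised token is confined to reading ConfigMaps in one namespace.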
hostNetwork: true, hostPID: true, hostIPC: true. These are shortcuts that AI assistants suggest when they observe that a workload needs to interact with the host network or processes. Each is a significant security boundary violation that eliminates the namespace isolation containers are supposed to provide.
No resource limits. AI-generated pods omit resources.limits. Unless a namespace LimitRange supplies defaults, such pods can consume all of the node's spare CPU and memory and deny service to the node's other workloads.
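A LimitRange gives a namespace-level safety net. At admission, it fills in requests and limits for containers that do not set their own. The values below mirror the secure Deployment template later in this article. The namespace name is an assumption.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-resources
  namespace: production  # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:  # applied when a container sets no requests
        cpu: 100m
        memory: 128Mi
      default:         # applied when a container sets no limits
        cpu: 500m
        memory: 512Mi
```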
Permissive service exposure. AI assistants frequently generate type: LoadBalancer or type: NodePort for internal services, exposing them externally when ClusterIP would suffice. Combined with the lack of NetworkPolicy, this creates unintended external exposure.
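For an internal microservice, the usual combination is a ClusterIP Service plus a NetworkPolicy that admits only known callers. A sketch with hypothetical names (orders-api, called by pods labelled app: frontend). NetworkPolicy only takes effect if the cluster's CNI plugin enforces it.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: orders-api
  namespace: production
spec:
  type: ClusterIP  # reachable only from inside the cluster
  selector:
    app: orders-api
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: orders-api-allow-frontend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: orders-api
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend  # hypothetical client label
      ports:
        - protocol: TCP
          port: 8080         # NetworkPolicy ports refer to the pod's port, not the Service port
```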
These patterns are not hypothetical edge cases. A study of public GitHub repositories containing AI-generated Kubernetes YAML (identifiable by characteristic comment patterns) found that AI-generated manifests had significantly higher rates of each of these misconfigurations compared to manifests written without AI assistance. The risk compounds as AI assistance becomes more prevalent — every new service started by an AI-assisted developer begins its life with the same set of security debts.
Target systems: any Kubernetes cluster (1.24+) where developers use AI coding assistants to generate manifests; CI/CD pipelines that deploy AI-generated YAML; platform teams responsible for security policy enforcement across development teams.
Threat Model
Adversary 1 — Privilege escalation via AI-generated privileged container. A developer asks an AI assistant for a monitoring agent manifest. The AI generates privileged: true because the agent needs to read /proc. An attacker who compromises the monitoring agent uses privileged access to escape to the host node. From there, they can reach the secrets and network traffic of every other pod scheduled on that node.
Adversary 2 — Cluster takeover via AI-generated RBAC. A developer asks an AI for an operator manifest with RBAC. The AI generates a ClusterRoleBinding referencing cluster-admin. When the operator pod is compromised via a dependency vulnerability, the attacker has full cluster access.
Adversary 3 — DoS via no resource limits. A developer deploys an AI-generated manifest with no resource limits, and then traffic spikes. The pod consumes the node's available CPU and memory and starves co-located pods. The resulting memory pressure triggers evictions or OOM kills that can hit other workloads, including critical system components.
Adversary 4 — Unintended external exposure. An AI assistant generates a LoadBalancer service for an internal microservice. The service is exposed on a public IP. An attacker discovers it via automated scanning and accesses an API that was intended to be internal-only.
Without validation gates: all four attacks succeed because misconfigurations go undetected until they are exploited. With validation: automated tooling catches each class at PR time with actionable remediation guidance.
Configuration / Implementation
Step 1 — Scan manifests with kube-score at PR time
kube-score performs static analysis of Kubernetes manifests and produces detailed, actionable findings:
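# Install kube-score (release assets include the version in their names, so pin a release)
curl -fsSL https://github.com/zegl/kube-score/releases/download/v1.18.0/kube-score_1.18.0_linux_amd64.tar.gz | tar xz
sudo install -m 0755 kube-score /usr/local/bin/kube-score
# Scan a manifest
kube-score score deployment.yaml
# Focus on critical failures and warnings
kube-score score deployment.yaml --output-format ci | grep -E "CRITICAL|WARNING"
# Typical findings for an AI-generated manifest (illustrative and abridged; check names and severities vary by kube-score version):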
# [CRITICAL] Container Security Context
# · container: app -> Container has no configured security context
# [CRITICAL] Pod Security Context
# · container: app -> Pod has no configured security context
# [WARNING] Container Resources
# · container: app -> CPU limit is not set
# [CRITICAL] Container Seccomp Profile
# · container: app -> Container has no configured Seccomp profile
Add it to CI/CD, then mark the job as a required status check in branch protection:
# .github/workflows/validate-manifests.yml
name: Validate Kubernetes Manifests
on:
  pull_request:
    paths:
      - 'k8s/**'
      - 'deploy/**'
      - '**/*.yaml'
      - '**/*.yml'
permissions:
  contents: read
jobs:
  kube-score:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
      - name: Run kube-score
        run: |
          curl -fsSL https://github.com/zegl/kube-score/releases/download/v1.18.0/kube-score_1.18.0_linux_amd64.tar.gz | tar xz
          # Scan every .yaml/.yml file under k8s/ or deploy/; xargs -r skips the run when nothing matches
          find . -type f \( -path './k8s/*' -o -path './deploy/*' \) \( -name '*.yaml' -o -name '*.yml' \) -print0 \
            | xargs -0 -r ./kube-score score \
                --output-format ci \
                --exit-one-on-warning
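The job above scans the whole manifest tree on every PR. To fail only on manifests that the pull request actually touches, a variant can first diff against the target branch. This is a minimal sketch with several assumptions: the same k8s/ and deploy/ layout, the GNU xargs available on ubuntu-latest runners, and a hypothetical job name, kube-score-changed.

```yaml
jobs:
  kube-score-changed:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
        with:
          fetch-depth: 0  # full history so the merge base with the target branch is available
      - name: Run kube-score on changed manifests
        env:
          BASE_REF: ${{ github.base_ref }}  # passed via env rather than inlined into the script
        run: |
          curl -fsSL https://github.com/zegl/kube-score/releases/download/v1.18.0/kube-score_1.18.0_linux_amd64.tar.gz | tar xz
          # Added, copied, modified, or renamed files relative to the target branch
          git diff --name-only --diff-filter=ACMR "origin/${BASE_REF}...HEAD" > changed-all.txt
          # Keep only YAML under k8s/ or deploy/ (grep exits 1 when nothing matches, hence || true)
          grep -E '^(k8s|deploy)/.*\.ya?ml$' changed-all.txt > changed.txt || true
          if [ -s changed.txt ]; then
            # One path per line; -d '\n' keeps paths containing spaces intact
            xargs -r -d '\n' ./kube-score score --output-format ci --exit-one-on-warning < changed.txt
          else
            echo "No changed manifests to scan"
          fi
```

Deleted files are excluded by the diff filter, since there is nothing left to score.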
Step 2 — Run Polaris for policy-based validation
Polaris checks manifests against security best practices with configurable severity:
# Install Polaris
curl -L https://github.com/FairwindsOps/polaris/releases/latest/download/polaris_linux_amd64.tar.gz | tar xz
mv polaris /usr/local/bin/
# Audit a manifest file
polaris audit --audit-path deployment.yaml --format pretty
# Common AI-manifest findings (illustrative; exact wording varies by Polaris version):
# Danger: runAsPrivileged - Container is running as privileged
# Danger: runAsRootAllowed - Container may run as root
# Warning: readinessProbeMissing - Readiness probe is not configured
# Danger: dangerousCapabilities - Container has dangerous capabilities
Custom Polaris configuration to flag AI-generated patterns:
# polaris-config.yaml
checks:
  # Container security context checks
  runAsRootAllowed: danger
  runAsPrivileged: danger
  notReadOnlyRootFilesystem: warning
  privilegeEscalationAllowed: danger
  # Capabilities
  dangerousCapabilities: danger
  insecureCapabilities: warning
  # Host sharing (common in AI-generated monitoring agents)
  hostNetworkSet: danger
  hostPIDSet: danger
  hostIPCSet: danger
  # Resource limits
  cpuLimitsMissing: warning
  memoryLimitsMissing: warning
  cpuRequestsMissing: warning
  memoryRequestsMissing: warning
  # Image
  tagNotSpecified: danger
  pullPolicyNotAlways: warning
exemptions:
  # Exempt specific system agents (controllerNames are DaemonSet names) in kube-system
  - namespace: kube-system
    controllerNames: [cilium, calico-node]
    rules: [hostNetworkSet, hostPIDSet]
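To make Polaris a PR gate as well, a job alongside the kube-score job can run the audit with the custom config and fail on any danger-level finding. This is a sketch. It assumes polaris-config.yaml sits at the repository root and the manifests live under k8s/. The job name is hypothetical.

```yaml
jobs:
  polaris:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
      - name: Run Polaris with the custom config
        run: |
          curl -fsSL https://github.com/FairwindsOps/polaris/releases/latest/download/polaris_linux_amd64.tar.gz | tar xz
          # Exit non-zero if any check configured as "danger" fails
          ./polaris audit \
            --config polaris-config.yaml \
            --audit-path k8s/ \
            --format pretty \
            --set-exit-code-on-danger
```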
Step 3 — Enforce with Kyverno admission policies
Catch misconfigurations at admission time, before pods even start:
# kyverno-ai-manifest-policy.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: validate-ai-generated-manifests
  annotations:
    policies.kyverno.io/description: >
      Prevents the most common AI-generated manifest misconfigurations
      from reaching production clusters.
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    # Block privileged containers
    - name: deny-privileged
      match:
        any:
          - resources:
              kinds: [Pod]
      exclude:
        any:
          - resources:
              namespaces: [kube-system, cilium, calico-system]
      validate:
        message: "Privileged containers are not permitted. Drop all capabilities and add only what is needed."
        deny:
          conditions:
            any:
              # `true` must be a JMESPath literal (backticks); a bare true is a field lookup
              - key: "{{ request.object.spec.containers[].securityContext.privileged | contains(@, `true`) }}"
                operator: Equals
                value: true
    # Require security context
    - name: require-security-context
      match:
        any:
          - resources:
              kinds: [Pod]
              namespaces: ["production", "staging", "development"]
      validate:
        message: "All containers must have a securityContext with allowPrivilegeEscalation: false"
        pattern:
          spec:
            containers:
              - securityContext:
                  allowPrivilegeEscalation: false
    # Block cluster-admin ClusterRoleBindings for application workloads
    - name: deny-cluster-admin-binding
      match:
        any:
          - resources:
              kinds: [ClusterRoleBinding]
      exclude:
        any:
          - resources:
              names:
                - "cluster-admin" # System binding
                - "system:*"
      validate:
        message: "ClusterRoleBindings referencing cluster-admin require security team approval annotation"
        deny:
          conditions:
            all:
              - key: "{{ request.object.roleRef.name }}"
                operator: Equals
                value: "cluster-admin"
              - key: "{{ request.object.metadata.annotations.\"security.example.com/cluster-admin-approved\" || '' }}"
                operator: NotEquals
                value: "true"
    # Block host namespace sharing
    - name: deny-host-namespaces
      match:
        any:
          - resources:
              kinds: [Pod]
              namespaces: ["production", "staging"]
      validate:
        message: "hostNetwork, hostPID, and hostIPC must not be enabled in production/staging"
        pattern:
          spec:
            =(hostNetwork): false
            =(hostPID): false
            =(hostIPC): false
    # Require resource limits
    - name: require-resource-limits
      match:
        any:
          - resources:
              kinds: [Pod]
              namespaces: ["production", "staging"]
      validate:
        message: "Resource limits (CPU and memory) must be set on all containers"
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"
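None of the rules above addresses Adversary 4 (unintended external exposure). A separate policy could flag LoadBalancer and NodePort Services in production and staging unless an approval annotation is present, mirroring the cluster-admin rule. This is a sketch. The annotation key security.example.com/external-exposure-approved is hypothetical. The policy starts in Audit mode so that existing Services are reported, not blocked.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-external-services
spec:
  validationFailureAction: Audit  # report only; switch to Enforce after reviewing findings
  background: true
  rules:
    - name: deny-external-service-types
      match:
        any:
          - resources:
              kinds: [Service]
              namespaces: ["production", "staging"]
      validate:
        message: "Services must be ClusterIP unless external exposure is approved via the security.example.com/external-exposure-approved annotation."
        deny:
          conditions:
            all:
              # Defensive fallback; the API server normally defaults spec.type to ClusterIP
              - key: "{{ request.object.spec.type || 'ClusterIP' }}"
                operator: AnyIn
                value: ["LoadBalancer", "NodePort"]
              - key: "{{ request.object.metadata.annotations.\"security.example.com/external-exposure-approved\" || '' }}"
                operator: NotEquals
                value: "true"
```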
Step 4 — Provide secure AI prompt templates
Reduce the problem at the source: give developers prompts that produce more secure manifests:
# Secure Kubernetes Manifest Prompt Template
# Save as .github/prompts/k8s-deployment.md and reference in AI coding assistant settings
When generating Kubernetes Deployment manifests:
1. Always include a pod-level securityContext (spec.template.spec.securityContext) with: runAsNonRoot: true, seccompProfile: {type: RuntimeDefault}
2. Always include a container-level securityContext on every container with: allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, capabilities: {drop: ["ALL"]}
3. Always include resource requests AND limits for CPU and memory
4. Use ClusterIP (not LoadBalancer) for internal services unless explicitly asked for external access
5. Set imagePullPolicy: Always for mutable tags; IfNotPresent for digest-pinned images
6. Never use privileged: true; instead, drop ALL capabilities and add back only the specific ones needed with capabilities.add
7. Never set hostNetwork, hostPID, or hostIPC unless the workload is a system agent that genuinely requires it
Alternatively, maintain a secure Deployment template that AI tools can reference:
# .github/templates/secure-deployment.yaml
# AI TEMPLATE: Use this as a base for all Deployment manifests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ app-name }}
  namespace: {{ namespace }}
spec:
  replicas: 2
  selector:
    matchLabels:
      app: {{ app-name }}
  template:
    metadata:
      labels:
        app: {{ app-name }}
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 10000
        runAsGroup: 10000
        fsGroup: 10000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: {{ app-name }}
          image: {{ image }}:{{ tag }}
          imagePullPolicy: Always
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
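The template sets readOnlyRootFilesystem: true, and many applications still write temporary files. The common workaround is to mount an emptyDir volume at the paths that need to be writable. The fragment below is meant to be merged into spec.template.spec of the template. It assumes the application only writes to /tmp. The 64Mi cap is an arbitrary illustrative value.

```yaml
# Fragment: merge into spec.template.spec of the secure template
containers:
  - name: {{ app-name }}
    volumeMounts:
      - name: tmp
        mountPath: /tmp  # writable scratch space; the root filesystem stays read-only
volumes:
  - name: tmp
    emptyDir:
      sizeLimit: 64Mi    # the kubelet evicts the pod if usage exceeds this
```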
Step 5 — RBAC audit for AI-generated roles
#!/usr/bin/env bash
# audit-ai-rbac.sh: detect over-broad AI-generated RBAC
set -euo pipefail
# Find non-system ClusterRoles with a wildcard in resources, verbs, or apiGroups (AI hallmark).
# any(...) also catches rules that have verbs but no resources (e.g. nonResourceURLs rules).
kubectl get clusterroles -o json | jq -r '
  .items[]
  | select(.metadata.name | test("^system:") | not)
  | select(any(.rules[]?; any(.resources[]?, .verbs[]?, .apiGroups[]?; . == "*")))
  | .metadata.name
' | sort -u | while read -r role; do
  echo "FINDING: ClusterRole '$role' has wildcard permissions"
done
# Find non-system ClusterRoleBindings referencing cluster-admin, one line per subject
kubectl get clusterrolebindings -o json | jq -r '
  .items[]
  | select(.roleRef.kind == "ClusterRole" and .roleRef.name == "cluster-admin")
  | select(.metadata.name | test("^system:") | not)
  | .metadata.name as $binding
  | .subjects[]?
  | "FINDING: ClusterRoleBinding \($binding) grants cluster-admin to \(.kind)/\(.name)"
'
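The script above finds wildcard roles that already exist. A Kyverno rule can stop new ones at admission. This sketch is adapted from the common pattern of checking contains() over the flattened rules. The excluded names are assumptions. The built-in admin, edit, and view roles are excluded because the aggregation controller rewrites them. The policy starts in Audit mode.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-wildcard-rbac
spec:
  validationFailureAction: Audit  # switch to Enforce once existing findings are resolved
  background: true
  rules:
    - name: deny-wildcard-roles
      match:
        any:
          - resources:
              kinds: [ClusterRole, Role]
      exclude:
        any:
          - resources:
              # Built-in and aggregated roles managed by Kubernetes itself
              names: ["cluster-admin", "admin", "edit", "view", "system:*"]
      validate:
        message: "Roles and ClusterRoles must not use '*' in apiGroups, resources, or verbs."
        deny:
          conditions:
            any:
              # `[]` fallback keeps contains() valid when no rule lists the field
              - key: "{{ contains(request.object.rules[].apiGroups[] || `[]`, '*') }}"
                operator: Equals
                value: true
              - key: "{{ contains(request.object.rules[].resources[] || `[]`, '*') }}"
                operator: Equals
                value: true
              - key: "{{ contains(request.object.rules[].verbs[] || `[]`, '*') }}"
                operator: Equals
                value: true
```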
Expected Behaviour
| Manifest characteristic | AI-generated (unvalidated) | After validation pipeline |
|---|---|---|
| Container runs as root | Common (no securityContext) | Flagged at PR time by kube-score and Polaris; blocking at admission needs a runAsNonRoot rule or PSA restricted, which the Kyverno policy above does not include |
| privileged: true | Suggested by AI for monitoring agents | Blocked by Kyverno; developer receives actionable error |
| cluster-admin ClusterRoleBinding | Generated when AI sees “needs cluster access” | Blocked unless approved annotation present |
| Missing resource limits | Near-universal in AI output | Blocked for production/staging namespaces |
| kube-score CI check | Not present | Fails the PR on CRITICAL findings (and on WARNING findings, via --exit-one-on-warning) in scanned manifests |
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Kyverno Enforce mode | Hard block at admission | Breaks workflows that depended on insecure defaults | Run in Audit mode (validationFailureAction: Audit) for 2 weeks; fix all findings; switch to Enforce |
| Require resource limits | Prevents DoS from AI-generated unlimited pods | Requires developers to know their app’s resource usage | Provide request/limit guidance in the secure template; start with generous limits and tighten them based on observed usage |
| Deny cluster-admin RBAC | Prevents AI-generated cluster takeover path | Slows setup of legitimate operators that genuinely need cluster scope | Build an approval process: set the approval annotation through a reviewed GitOps change that references the ticket |
| Secure prompt templates | Reduces misconfiguration at generation time | Developers must use the template; hard to enforce | Add template reference to team onboarding; include in AI assistant workspace settings |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Kyverno blocks legitimate system agent | DaemonSet fails admission; node monitoring broken | kubectl apply returns the Kyverno policy violation; for an existing DaemonSet, kubectl describe daemonset shows FailedCreate events | Add the DaemonSet’s namespace to the Kyverno exclusion list; document the exception |
| kube-score false positive on valid pattern | PR blocked on a valid manifest pattern | Developer reports; kube-score shows unexpected CRITICAL | Add a kube-score/ignore annotation naming the specific check ID to that resource; review whether the check should be disabled for this resource type |
| AI generates valid YAML that passes all checks but is semantically wrong | Manifest deploys but behaves unexpectedly (wrong port, wrong selector) | Application health check fails post-deploy | Static analysis catches security misconfigs but not semantic errors; pair with integration tests |
| Wildcard RBAC check misses specific over-broad roles | Role lists specific verbs/resources that together amount to cluster-admin (for example, escalate or bind on roles) | Manual RBAC review | Supplement the automated check with a quarterly human RBAC audit; use kubectl auth can-i --list --as system:serviceaccount:<ns>:<sa> to verify effective permissions |
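For the kube-score false-positive case, kube-score skips the checks whose IDs appear in a comma-separated kube-score/ignore annotation on the object. One legitimate example is the digest-pinned image from the prompt template, which uses IfNotPresent. kube-score's container-image-pull-policy check would otherwise flag it. The Deployment name, registry, and image below are hypothetical, and <digest> is a placeholder for a real sha256 digest.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
  annotations:
    # Skip only this check: the image is pinned by digest, so IfNotPresent is deliberate
    kube-score/ignore: container-image-pull-policy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      containers:
        - name: orders-api
          image: registry.example.com/orders-api@sha256:<digest>  # placeholder digest
          imagePullPolicy: IfNotPresent
```

Scoping the annotation to one check ID, not disabling the check globally, keeps the exception visible in review.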
Related Articles
- AI-Generated CI/CD Config Security — the same AI misconfiguration pattern applied to pipeline YAML; complementary validation pipeline
- Kyverno Policy Development — writing the admission policies that gate AI-generated manifests
- Pod Security Context — the security context fields that AI assistants most commonly omit
- RBAC Design Patterns — designing the minimal RBAC that AI assistants should be prompted to generate
- Kubernetes Subresource RBAC Escalation — the specific RBAC escalation paths that AI-generated ClusterRoleBindings enable