AI-Generated Kubernetes Operators vs. Maintained Open Source: The CVE Response Gap

The Problem

In early 2025, a platform team at a mid-sized fintech company needed a Kubernetes operator to manage their internal configuration CRD — a resource that described service mesh configuration for hundreds of microservices. No OperatorHub entry matched their schema. Building from scratch using controller-runtime was estimated at three weeks of engineering time. A senior engineer prompted Claude 3.5 Sonnet and had a working operator — CRD definition, reconciliation loop, admission webhook, RBAC manifests, Dockerfile — in about four hours.

Eighteen months later, govulncheck ran against that codebase as part of a quarterly security review and found four vulnerabilities in the operator’s Go dependency chain: CVE-2025-22869 (a key exchange denial-of-service in golang.org/x/crypto/ssh, CVSS 7.5), CVE-2025-22866 (scalar invalidation in golang.org/x/crypto, CVSS 4.0), and two findings in a pinned version of k8s.io/apiserver from controller-runtime’s transitive dependency tree. The operator was running in production with a ClusterRole that included wildcard verb grants on all resources in its API group — which the LLM had generated because wildcard RBAC makes the reconciliation loop work without needing to enumerate specific sub-resources. Nobody had changed those RBAC grants in eighteen months because the operator worked and the team had moved on to other work.

Nobody had filed a CVE against the operator. Nobody had received a security advisory. There was no SECURITY.md, no security@company.com email address, no GitHub Security Advisory. There was no maintainer in the sense that open source operators have maintainers: someone who watches for CVEs in dependencies and issues a patch release. There was just a Go module in a private repository, running as a Kubernetes operator with cluster-wide permissions, accumulating vulnerability debt.

This is the structural problem. An LLM can produce controller-runtime code that passes basic validation. It cannot produce the ecosystem of security processes that surrounds a maintained open source operator.

What maintained operators actually provide. Compare the AI-generated operator’s security posture with cert-manager (CNCF graduated, 11,000+ GitHub stars as of 2026, security contact at cert-manager-security@googlegroups.com):

  • CVE response: cert-manager’s June 2025 advisory GHSA-r4pg-vg54-wxx4 (a denial-of-service in ACME challenge processing) was reported privately, fixed in a private advisory branch, and released as v1.15.3 with coordinated disclosure within the 90-day embargo window. The advisory listed affected versions, fixed versions, and upgrade instructions. Clusters running cert-manager received the information through GitHub Release notifications, the cert-manager Slack channel, and the CNCF security mailing list. Your AI-generated operator will never produce that sequence of events.
  • Dependency updates: The External Secrets Operator (ESO) runs Renovate on its dependencies. When controller-runtime v0.18.x addressed a reflection-based resource access issue in late 2025, ESO shipped a patch release within two weeks. Renovate opened the PR; the maintainer merged it and cut a release.
  • RBAC audits: Prometheus Operator’s ClusterRole was audited and scoped down in a dedicated PR in 2025, removing permissions that accumulated over years of feature additions but were no longer required by current reconciliation logic. An AI-generated operator has no equivalent process — the RBAC from the initial generation is the RBAC that runs in production, likely forever.
  • Webhook certificate management: Strimzi Kafka operator integrates cert-manager for webhook TLS certificate issuance and rotation. The operator’s own certificates are refreshed without manual intervention. AI-generated admission webhooks frequently use a self-signed certificate generated at deployment time with a 10-year validity, stored in a Secret, and never rotated. If the webhook certificate expires — or if an attacker obtains the private key — the consequences range from admission control failure to mutation injection.

The scale of the gap. Kubernetes operators are not utility scripts. They are long-running privileged processes. The Kubernetes API server treats an operator’s service account token with the same authority as any other bearer token — if the operator has cluster-admin, so does any attacker who achieves code execution inside the operator pod. The attack surface includes: the operator’s Go dependency chain (every imported library is a potential vulnerability source), the container image’s base OS packages, the admission webhook’s TLS handling, the operator’s reconciliation logic, and the custom resource validation. Maintaining security posture across all of these dimensions for a single operator is a non-trivial ongoing commitment — one that mature open source operators fulfill through structured processes that AI-generated code inherently lacks.

Threat Model

Threat 1: Dependency chain CVE with no patch path. An AI-generated operator built on controller-runtime v0.16.3 in mid-2025 accumulates Go dependency CVEs as the upstream ecosystem moves forward. The operator’s go.mod pins transitive dependencies at their initial versions. govulncheck run six months later identifies CVE-2025-22869 in golang.org/x/crypto v0.31.0 — the SSH key exchange denial-of-service that affects any code that imports the SSH client, including some indirect paths through k8s.io/client-go. The CVE is rated CVSS 7.5. The fixed version is golang.org/x/crypto v0.35.0. Updating the operator requires rebuilding the binary, updating the container image, and redeploying — work that requires someone with Go knowledge to take ownership. If the team that built the operator has moved on, this is work that simply does not get done.

Threat 2: Overprivileged RBAC under attacker control. An AI-generated operator running with a wildcard ClusterRole has its pod compromised through a path-traversal vulnerability in the HTTP client library it uses for health check callbacks. Library-level flaws surface regularly in widely imported Go modules; CVE-2024-24786 in protobuf-go (an infinite loop in protojson.Unmarshal) shows how deep in the dependency tree they can sit, although that particular bug is a denial of service rather than code execution. Code execution inside the operator pod immediately grants the attacker the operator’s full RBAC access. With wildcard grants on API groups, resources, and verbs, that is equivalent to cluster-admin: the attacker can list all Secrets, create ClusterRoleBindings, modify RBAC, exfiltrate TLS private keys from cert-manager’s Secret store, and deploy DaemonSets on every node. A maintained operator with scoped RBAC limits this blast radius to the specific resources the operator legitimately manages.

Threat 3: Admission webhook with insecure TLS as a persistent MITM surface. A generated validating webhook uses a self-signed TLS certificate hardcoded into a Kubernetes Secret at deploy time. The certificate has a 10-year TTL and uses a 2048-bit RSA key. The caBundle in the ValidatingWebhookConfiguration matches the cert at deploy time and is never updated. An attacker who obtains the private key — through the operator’s own Secret read permissions, through a Secret exfiltration from another compromised workload, or through a misconfigured RBAC policy — can perform a man-in-the-middle attack on webhook traffic. Because the operator’s failurePolicy was set to Ignore (the LLM’s default, to avoid webhook availability problems), webhook failures silently admit all requests. The attacker does not need to MITM: they can just take the webhook server offline and all admission validation for the operator’s CRD stops working.

Threat 4: No security advisory channel means responsible disclosure fails. A security researcher auditing a deployed AI-generated operator finds that the operator writes a derived database credential into the custom resource’s .status field for the application to consume — a common pattern in AI-generated operators that don’t model the secret reference pattern used by mature operators. The credential is readable by anyone with get on the CR, including developers who should not have access to production database passwords. The researcher wants to disclose this responsibly. There is no SECURITY.md, no security contact, no GitHub Security Advisory button that reaches an attentive maintainer. The researcher files a public GitHub issue. Within hours, the vulnerability is public. Clusters running the operator are now exposed with no patch available and no coordinated communication to the teams running it.

Configuration

1. Use OperatorHub Before Building Custom

Before investing in a custom operator — AI-generated or otherwise — audit OperatorHub and the CNCF landscape for existing solutions.

# Check if a maintained operator exists for your use case.
# OperatorHub.io lists 300+ operators with maturity ratings.
# CNCF landscape: https://landscape.cncf.io/card-mode?category=database&grouping=category

# For common infrastructure needs, maintained operators exist:
#
# Relational databases:
#   CloudNativePG (Postgres): CNCF sandbox, 3,000+ GitHub stars
#   PlanetScaleDB (MySQL/Vitess): commercial, open core
#
# Message queues:
#   Strimzi (Kafka): CNCF incubating; security contact in SECURITY.md
#
# Secrets:
#   External Secrets Operator: CNCF sandbox; Renovate on deps; signed releases
#   Vault Secrets Operator (HashiCorp): commercial support available
#
# Certificates:
#   cert-manager: CNCF graduated; GHSA process; 11k+ stars
#
# Autoscaling:
#   KEDA: CNCF graduated; signed artifacts; security advisories on GitHub
#
# GitOps:
#   Argo CD: CNCF graduated; 17k+ stars; active CVE response history
#   Flux: CNCF graduated; signed artifacts; security response team

# Check if a candidate operator has a security contact:
curl -s https://api.github.com/repos/cert-manager/cert-manager/contents/SECURITY.md \
  | jq -r '.content' | base64 -d | head -30

# Check the operator's last release date — a good proxy for active maintenance:
gh api repos/cert-manager/cert-manager/releases/latest \
  --jq '{tag: .tag_name, published: .published_at, prerelease: .prerelease}'

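# Check if releases are signed (Sigstore/cosign), a maturity signal.
# Artifact names below are illustrative; list the release assets first
# (gh release view v1.15.3 --repo cert-manager/cert-manager) and adjust.
# The trailing glob fetches the tarball together with its .pem and .sig.
gh release download v1.15.3 --repo cert-manager/cert-manager \
  --pattern 'cert-manager-controller-linux-amd64.tar.gz*' \
  --dir /tmp/cert-manager-check
# Pin the signer identity to the project's repository. A regexp of '.*'
# would accept a signature from any GitHub Actions workflow.
cosign verify-blob \
  --certificate /tmp/cert-manager-check/cert-manager-controller-linux-amd64.tar.gz.pem \
  --signature /tmp/cert-manager-check/cert-manager-controller-linux-amd64.tar.gz.sig \
  --certificate-identity-regexp '^https://github.com/cert-manager/cert-manager/' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  /tmp/cert-manager-check/cert-manager-controller-linux-amd64.tar.gz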

If no maintained operator exists and a custom operator is the only path forward, document this decision explicitly. The absence of an upstream is a security liability that must be owned — not assumed away.
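
The release-date check above lends itself to a simple gate. The following is a minimal sketch, not part of any operator's tooling: it takes the `published_at` timestamp returned by the GitHub releases API and flags a project whose latest release is older than a threshold. The function names and the 180-day threshold are assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def release_age_days(published_at, now):
    """Whole days between a release's `published_at` timestamp and `now`."""
    # GitHub timestamps end in "Z"; swap it for an explicit offset so that
    # datetime.fromisoformat accepts it on Python versions before 3.11.
    published = datetime.fromisoformat(published_at.replace("Z", "+00:00"))
    return (now - published).days


def is_stale(published_at, now, max_age_days=180):
    """True when the latest release is older than max_age_days (assumed threshold)."""
    return release_age_days(published_at, now) > max_age_days


if __name__ == "__main__":
    today = datetime.now(timezone.utc)
    print(is_stale("2023-03-01T10:00:00Z", today))
```

A stale latest release is a prompt for a closer look rather than proof of abandonment, since some stable projects release rarely.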

2. Audit RBAC for Any Custom Operator Before Production

The most common AI-generated RBAC pattern is wildcard grants because LLMs optimize for functional correctness, not least privilege. Audit every ClusterRole before deployment.

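# Install rbac-tool: https://github.com/alcideio/rbac-tool
# policy-rules lists the rules bound to subjects whose name matches the regex.
rbac-tool policy-rules -e '^my-operator$'

# Cross-check with kubectl's built-in view of the same service account:
kubectl auth can-i --list \
  --as=system:serviceaccount:my-operator-system:my-operator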

# Output to look for — these are immediate red flags:
#
#   Cluster-wide:
#     apiGroups: ["*"]  resources: ["*"]  verbs: ["*"]    <- cluster-admin equivalent
#     apiGroups: [""]   resources: ["secrets"]  verbs: ["*"]  <- all secrets in all namespaces
#     resources: ["clusterrolebindings"]  verbs: ["create"]    <- privilege escalation
#
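# Verify what the operator actually uses at runtime (audit log).
# This assumes audit logging is enabled and written to the API server's
# stdout (--audit-log-path=-); otherwise read the audit log file or backend.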
kubectl logs -n kube-system \
  $(kubectl get pods -n kube-system -l component=kube-apiserver -o name | head -1) \
  | grep '"user":{"username":"system:serviceaccount:my-operator-system:my-operator"' \
  | jq -r '.requestURI + " " + .verb + " " + (.objectRef // {} | .resource + "/" + (.subresource // ""))' \
  | sort -u

The audit log check tells you what the operator actually calls at runtime — use this to construct a minimal RBAC, then remove everything else:

# Red flag: AI-generated wildcard ClusterRole.
# This is what LLMs generate when they prioritise "it works."
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: my-operator  # REJECT THIS.
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]

# Required: audit-derived minimal ClusterRole.
# Constructed from actual API server audit log entries for this operator.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: my-operator
rules:
# Own CRDs — the operator's raison d'être.
- apiGroups: ["myconfig.mycompany.io"]
  resources: ["meshconfigs", "meshconfigs/status", "meshconfigs/finalizers"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# Deployments: update mesh config sidecar annotations.
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "patch"]
# Events: emit status events.
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "patch"]
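# Leader election lease. RBAC cannot restrict "create" by resourceNames
# (the object name is not known at authorization time), so create is
# granted on its own and the remaining verbs are pinned to the named lease.
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["create"]
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  resourceNames: ["my-operator-leader"]
  verbs: ["get", "update", "patch"]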
#
# Explicitly absent:
#   secrets: none — operator does not handle credentials
#   clusterroles/clusterrolebindings: none — operator does not manage RBAC
#   nodes: none — operator does not need node topology
#   namespaces: none — reconciliation is namespace-scoped
#   pods: none — operator manages Deployments, not Pods directly
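
One way to turn audit events into a rule list like the one above is to group the verbs the service account actually used by API group and resource. The sketch below assumes JSON-formatted audit events, one per line, using the `user.username`, `verb`, and `objectRef` fields of the Kubernetes audit Event schema. `minimal_rules` is a hypothetical helper, and its output still needs human review before it becomes a ClusterRole.

```python
import json
from collections import defaultdict


def minimal_rules(audit_lines, username):
    """Group the verbs `username` exercised by (apiGroup, resource[/subresource])."""
    verbs = defaultdict(set)
    for line in audit_lines:
        event = json.loads(line)
        if event.get("user", {}).get("username") != username:
            continue
        ref = event.get("objectRef")
        if not ref:
            # Non-resource requests such as /healthz carry no objectRef.
            continue
        resource = ref["resource"]
        if ref.get("subresource"):
            resource += "/" + ref["subresource"]
        # The core API group is omitted from objectRef; RBAC spells it "".
        verbs[(ref.get("apiGroup", ""), resource)].add(event["verb"])
    return [
        {"apiGroups": [group], "resources": [resource], "verbs": sorted(used)}
        for (group, resource), used in sorted(verbs.items())
    ]
```

Audit logs only reflect code paths that ran during the capture window, so rarely exercised paths (finalizer cleanup, error recovery) may need grants added by hand.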

Enforce this policy at admission time with Kyverno. An operator that requests wildcard grants should fail policy before it ever runs:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: block-operator-wildcard-rbac
  annotations:
    policies.kyverno.io/description: >
      Operators must not request wildcard verb or resource grants.
      AI-generated operators frequently request "*" on "*" resources.
      ClusterRoles are cluster-scoped, so this policy applies to every
      ClusterRole create or update that contains a wildcard pattern.
spec:
  validationFailureAction: Enforce
  background: false
  rules:
  - name: block-wildcard-verbs
    match:
      any:
      - resources:
          kinds: ["ClusterRole"]
    validate:
      message: >
        ClusterRole uses wildcard verbs or resources. Enumerate
        specific verbs and resources. See operator RBAC runbook.
      deny:
        conditions:
          any:
          - key: "{{ request.object.rules[].verbs[] | contains(@, '*') }}"
            operator: Equals
            value: true
          - key: "{{ request.object.rules[].resources[] | contains(@, '*') }}"
            operator: Equals
            value: true
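
The same wildcard check can run in CI before a manifest reaches the cluster. This is a rough sketch of the logic, assuming the ClusterRole has already been parsed into a dictionary (for example with a YAML library). `wildcard_findings` is a hypothetical name, and unlike the Kyverno rule above it also inspects apiGroups.

```python
def wildcard_findings(cluster_role):
    """Return one message per rule field that contains the '*' wildcard."""
    findings = []
    for index, rule in enumerate(cluster_role.get("rules") or []):
        for field in ("apiGroups", "resources", "verbs"):
            if "*" in (rule.get(field) or []):
                findings.append(f"rules[{index}].{field} contains '*'")
    return findings


if __name__ == "__main__":
    generated = {
        "kind": "ClusterRole",
        "metadata": {"name": "my-operator"},
        "rules": [{"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}],
    }
    for message in wildcard_findings(generated):
        print(message)
```

An empty list means the role enumerates everything explicitly; it does not mean the grants are minimal, which is what the audit-log comparison is for.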

3. Daily Dependency Vulnerability Scanning

An AI-generated operator has no Renovate, no Dependabot, no release process. You are the maintainer. That means you need automated scanning to know when your dependency chain accumulates CVEs — because no upstream will tell you.

# .github/workflows/govulncheck.yaml
# Run daily to catch new CVEs added to the Go vulnerability database
# against an unchanged codebase. A CVE can be published today against
# a dependency you pinned six months ago.
name: Dependency Vulnerability Scan
on:
  schedule:
    - cron: '0 6 * * *'   # 06:00 UTC daily
  push:
    branches: [main]
  pull_request:

jobs:
  govulncheck:
    name: govulncheck
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write   # For SARIF upload
    steps:
    - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11

    - uses: actions/setup-go@v5
      with:
        go-version-file: go.mod

    - name: Run govulncheck
      uses: golang/govulncheck-action@v1
      with:
        go-version-input: '1.22'
        go-package: './...'
        # govulncheck exits non-zero if any vulnerability is found
        # that affects reachable code paths. This distinguishes between
        # vulnerabilities in code you actually call vs. transitive
        # dependencies you import but never invoke.

    - name: Also run Trivy for container image CVEs
      uses: aquasecurity/trivy-action@18f2510ee396bbf400402947b394f2dd8c87dbb0
      with:
        image-ref: 'ghcr.io/myorg/my-operator:latest'
        format: sarif
        output: trivy-results.sarif
        severity: 'CRITICAL,HIGH'
        exit-code: '1'

    - name: Upload Trivy SARIF
      uses: github/codeql-action/upload-sarif@v3
      if: always()
      with:
        sarif_file: trivy-results.sarif

When govulncheck finds a CVE in a maintained operator, the upstream issues a patch release that you can adopt. When it finds a CVE in your AI-generated operator, you have to fix it yourself. Illustrative govulncheck output for the scenario above (abridged; confirm vulnerability IDs and descriptions against pkg.go.dev/vuln before acting on them):

Vulnerability #1: GO-2025-3447
    A denial of service vulnerability exists in golang.org/x/crypto/ssh
    in versions before 0.35.0. The SSH server may panic if the client
    sends an invalid Diffie-Hellman key exchange message.
  More info: https://pkg.go.dev/vuln/GO-2025-3447
  Module: golang.org/x/crypto
  Found in: golang.org/x/crypto@v0.31.0
  Fixed in: golang.org/x/crypto@v0.35.0
  Example traces found:
    #1: cmd/manager/main.go:47:22: main calls
        sigs.k8s.io/controller-runtime/pkg/manager.New,
        which eventually calls golang.org/x/crypto/ssh...

Vulnerability #2: GO-2025-3503
    An attacker can craft a message which results in a large amount
    of CPU time and/or memory being allocated in p256NegCond.
  More info: https://pkg.go.dev/vuln/GO-2025-3503
  Module: golang.org/x/crypto
  Found in: golang.org/x/crypto@v0.31.0
  Fixed in: golang.org/x/crypto@v0.35.0

Your code is affected by 2 vulnerabilities from 1 module.

The fix — bumping golang.org/x/crypto to v0.35.0 in go.mod, running go mod tidy, rebuilding the image — takes about twenty minutes. The problem is knowing that the work needs to be done, having someone assigned to do it, and having a release process to ship it. None of that comes from the LLM.
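
Knowing which modules need a bump can be automated from govulncheck's JSON mode. The sketch below assumes the `govulncheck -json` output is a stream of concatenated JSON objects in which each `finding` carries `osv`, `fixed_version`, and a `trace` whose first frame names the vulnerable module; treat that shape as an assumption to verify against your govulncheck version. The helper names and the sample IDs are hypothetical.

```python
import json


def iter_json_objects(stream):
    """Yield each object from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(stream):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(stream) and stream[pos].isspace():
            pos += 1
        if pos >= len(stream):
            break
        obj, pos = decoder.raw_decode(stream, pos)
        yield obj


def required_bumps(govulncheck_json):
    """Map each vulnerable module to the set of fixed versions reported for it."""
    bumps = {}
    for message in iter_json_objects(govulncheck_json):
        finding = message.get("finding")
        if not finding or not finding.get("trace"):
            continue
        fixed = finding.get("fixed_version")
        if not fixed:
            continue  # no fix published yet; needs a manual decision
        module = finding["trace"][0]["module"]  # assumed: first frame is the vulnerable module
        bumps.setdefault(module, set()).add(fixed)
    return bumps


if __name__ == "__main__":
    sample = (
        '{"finding": {"osv": "GO-0000-0001", "fixed_version": "v0.35.0",'
        ' "trace": [{"module": "golang.org/x/crypto", "version": "v0.31.0"}]}}'
    )
    print(required_bumps(sample))
```

The result is a to-do list, not a fix: someone still has to own the bump, the rebuild, and the rollout.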

4. Admission Webhook Security

AI-generated admission webhooks carry a predictable set of security defects. The most dangerous are failurePolicy: Ignore (the LLM default, chosen to avoid breaking the cluster during webhook outages), self-signed certificates with decade-long TTLs, and no TLS rotation.

# What LLMs typically generate — insecure:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: my-operator-webhook
webhooks:
- name: validate.meshconfig.mycompany.io
  failurePolicy: Ignore           # WRONG: bypassed when webhook is down
  clientConfig:
    caBundle: LS0tLS1CRUdJTi...   # Static self-signed cert, never rotated
    service:
      name: my-operator-webhook
      namespace: my-operator-system
      port: 443
  rules:
  - operations: ["CREATE", "UPDATE", "DELETE"]
    apiGroups: ["*"]              # WRONG: intercepts all API groups
    apiVersions: ["*"]
    resources: ["*"]             # WRONG: intercepts all resources

# Hardened replacement:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: my-operator-webhook
  annotations:
    # cert-manager injects the caBundle automatically when this annotation
    # is present and the Certificate resource below is configured.
    cert-manager.io/inject-ca-from: my-operator-system/my-operator-webhook-cert
webhooks:
- name: validate.meshconfig.mycompany.io
  # Fail means: if the webhook is unreachable or returns an error,
  # the admission request is denied. This is correct for security-relevant
  # validation. Run the webhook with 2+ replicas and a PodDisruptionBudget
  # to handle this availability requirement.
  failurePolicy: Fail
  sideEffects: None
  admissionReviewVersions: ["v1"]
  timeoutSeconds: 10
  clientConfig:
    # caBundle injected by cert-manager — not static.
    service:
      name: my-operator-webhook
      namespace: my-operator-system
      port: 9443
      path: /validate-myconfig-mycompany-io-v1-meshconfig
  rules:
  - operations: ["CREATE", "UPDATE"]
    apiGroups: ["myconfig.mycompany.io"]  # Only our API group
    apiVersions: ["v1"]
    resources: ["meshconfigs"]            # Only our resource
  # Exclude the operator's own namespace from interception to prevent
  # infinite loops during operator startup.
  namespaceSelector:
    matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: NotIn
      values: ["my-operator-system", "kube-system"]
---
# cert-manager Certificate for the webhook server (automatic rotation).
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-operator-webhook-cert
  namespace: my-operator-system
spec:
  secretName: my-operator-webhook-cert
  dnsNames:
  - my-operator-webhook.my-operator-system.svc
  - my-operator-webhook.my-operator-system.svc.cluster.local
  issuerRef:
    name: my-operator-ca-issuer
    kind: Issuer
  # 90-day TTL with 30-day early renewal.
  # Compare to the AI-generated default: 10-year self-signed certificate.
  duration: 2160h    # 90 days
  renewBefore: 720h  # Renew 30 days before expiry
  privateKey:
    algorithm: ECDSA
    size: 256

The cert-manager integration requires cert-manager to be installed in the cluster — which is itself a dependency on a maintained open source operator. This is not an accident: maintained operators compose with each other. An AI-generated operator that reinvents TLS certificate management from scratch is solving a solved problem badly.
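
The duration and renewBefore values in the Certificate above determine when rotation happens. As a worked example (a sketch of the arithmetic only, not cert-manager's implementation), renewal falls due at the expiry time minus the renewBefore window. `renewal_time` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before, duration_hours, renew_before_hours):
    """Return the moment renewal falls due: expiry minus the renewBefore window."""
    if renew_before_hours >= duration_hours:
        raise ValueError("renewBefore must be shorter than duration")
    not_after = not_before + timedelta(hours=duration_hours)
    return not_after - timedelta(hours=renew_before_hours)


if __name__ == "__main__":
    issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
    # duration: 2160h (90 days), renewBefore: 720h (30 days)
    due = renewal_time(issued, 2160, 720)
    print((due - issued).days)  # 60
```

With 2160h and 720h the certificate is replaced 60 days after issuance, so a stolen key is useful for at most 90 days, compared with roughly ten years for the static certificate.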

5. OpenSSF Scorecard as a Maturity Gate

Before deploying any operator — open source or internally generated — run OpenSSF Scorecard against it. Scorecard quantifies the security hygiene signals that separate a maintained project from abandoned code with a working reconciliation loop.

# Run Scorecard against a candidate open source operator:
docker run \
  -e GITHUB_AUTH_TOKEN=$GITHUB_TOKEN \
  gcr.io/openssf/scorecard:stable \
  --repo=github.com/cert-manager/cert-manager \
  --format=json \
  | jq '.checks[] | {name, score, reason}' | head -60

# Illustrative output for cert-manager, a well-maintained operator (scores vary between runs):
# {"name":"Maintained","score":10,"reason":"30 commit(s) out of 30 are found in the last 90 days"}
# {"name":"Security-Policy","score":10,"reason":"security policy file detected"}
# {"name":"Signed-Releases","score":10,"reason":"all release artifacts are signed"}
# {"name":"Dependency-Update-Tool","score":10,"reason":"Renovate detected"}
# {"name":"Branch-Protection","score":9,"reason":"branch protection is not maximal..."}
# {"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected"}

# Run Scorecard against an AI-generated operator's repo:
docker run \
  -e GITHUB_AUTH_TOKEN=$GITHUB_TOKEN \
  gcr.io/openssf/scorecard:stable \
  --repo=github.com/myorg/my-operator \
  --format=json \
  | jq '.checks[] | select(.score < 5) | {name, score, reason}'

# Typical AI-generated operator output:
# {"name":"Maintained","score":0,"reason":"repo was created 8 months ago, 0 commit(s) in last 90 days"}
# {"name":"Security-Policy","score":0,"reason":"security policy file not detected"}
# {"name":"Signed-Releases","score":0,"reason":"no release artifacts found"}
# {"name":"Dependency-Update-Tool","score":0,"reason":"no update tool detected"}
# {"name":"Vulnerabilities","score":4,"reason":"2 existing vulnerabilities detected"}
# {"name":"SAST","score":0,"reason":"no SAST tool detected"}

A Scorecard result with Maintained: 0 and Security-Policy: 0 is a direct measurement of the CVE response gap. It quantifies what “no maintainer, no security process” looks like in practice.
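
A Scorecard run can act as a pass/fail gate rather than a report. The sketch below assumes the JSON layout implied by the jq filters above (a top-level `checks` array whose entries have `name` and `score`). The `REQUIRED` thresholds and the `gate_failures` name are illustrative assumptions, not values recommended by OpenSSF.

```python
import json

# Assumed minimum scores; tune these to your own risk tolerance.
REQUIRED = {
    "Maintained": 5,
    "Security-Policy": 10,
    "Dependency-Update-Tool": 5,
    "Vulnerabilities": 7,
}


def gate_failures(scorecard_json, required=REQUIRED):
    """Return a message for every required check that is missing or below its minimum."""
    checks = {c["name"]: c["score"] for c in json.loads(scorecard_json)["checks"]}
    failures = []
    for name, minimum in required.items():
        score = checks.get(name, -1)  # a missing check counts as a failure
        if score < minimum:
            failures.append(f"{name}: {score} < {minimum}")
    return failures


if __name__ == "__main__":
    report = json.dumps({"checks": [
        {"name": "Maintained", "score": 0},
        {"name": "Security-Policy", "score": 0},
    ]})
    for failure in gate_failures(report):
        print(failure)
```

Applying the same gate to internal repositories and to candidate upstream operators keeps the comparison honest.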

6. Define the Maintenance Commitment Before Shipping

The decision to deploy a custom operator — AI-assisted or otherwise — is a decision to become its maintainer. That commitment must be explicit before the operator reaches production. Create an OPERATOR-SECURITY.md in the operator’s repository and require it as a merge prerequisite:

# OPERATOR-SECURITY.md

## Operator: my-operator (MeshConfig Controller)
## Origin: Initially generated with Claude 3.5 Sonnet (2025-08-12), significantly modified
## Production since: 2025-09-01
## Maintainer: Platform Engineering Team <platform-eng@mycompany.com>

## Why This Operator Instead of an Open Source Alternative

No existing OperatorHub operator manages the MeshConfig CRD schema we require.
Candidates evaluated:
- Istio operator: manages Istio control plane, not application mesh config CRDs
- Custom controller from service-mesh-hub: last release 2023, no security contact

Decision: custom operator with documented maintenance commitment.

## Security Process

**CVE reporting:** security@mycompany.com (internal Jira security queue)
**Patch SLA:**
  - Critical (CVSS 9.0+): 48 hours
  - High (CVSS 7.0-8.9): 7 days
  - Medium (CVSS 4.0-6.9): 30 days

**Dependency update process:**
  - Renovate Bot is configured (renovate.json) with Go module updates
  - govulncheck runs daily via GitHub Actions; alerts route to #platform-security Slack
  - Container base image updated monthly via automated PR

## Known Limitations vs. Open Source Alternatives

- No external security audit has been performed
- No formal threat model document exists (tracked in SEC-1447)
- No signed release artifacts (planned for Q3 2026)
- RBAC was audited against production audit logs on 2026-03-15; next audit due 2026-09-15

## RBAC Justification

See ClusterRole manifest at deploy/rbac/clusterrole.yaml.
Justification for each grant is documented inline as YAML comments.
Wildcard grants are explicitly prohibited by Kyverno policy block-operator-wildcard-rbac.

This document forces an explicit reckoning before deployment. It prevents the most common failure mode: an operator ships to production with the implicit assumption that “someone will handle security” — and that someone turns out to be nobody.
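
The patch SLA in the template reduces to a small lookup. A minimal sketch, assuming the CVSS bands and time limits listed above; `patch_deadline` is a hypothetical helper that an alerting job could use to stamp a due date on each finding.

```python
from datetime import datetime, timedelta, timezone


def patch_deadline(cvss, reported):
    """Return the patch due date for a finding, or None below the Medium band."""
    if cvss >= 9.0:
        return reported + timedelta(hours=48)  # Critical
    if cvss >= 7.0:
        return reported + timedelta(days=7)    # High
    if cvss >= 4.0:
        return reported + timedelta(days=30)   # Medium
    return None  # the template defines no SLA for Low findings


if __name__ == "__main__":
    found = datetime(2026, 3, 2, 6, 0, tzinfo=timezone.utc)
    print(patch_deadline(7.5, found))
```

Stamping the deadline at detection time turns "someone will handle security" into a dated ticket with an owner.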

Expected Behaviour

govulncheck on a six-month-old AI-generated operator will typically find CVEs in the golang.org/x family (crypto, net, text) and in k8s.io transitive dependencies, because these packages release security fixes frequently and versions pinned in go.mod do not update on their own. An operator maintained on a quarterly cadence will have addressed these. An AI-generated operator that hasn’t been touched since initial deployment will show multiple findings, often including at least one High-severity CVE.

OpenSSF Scorecard comparison between a CNCF-graduated operator and an AI-generated operator shows consistent deltas: Maintained (10 vs 0), Security-Policy (10 vs 0), Signed-Releases (10 vs 0), Dependency-Update-Tool (10 vs 0). These are not cosmetic differences — they are the mechanisms through which vulnerabilities get discovered, communicated, and fixed.

RBAC audit with rbac-tool on an AI-generated operator will show the verbatim grants from initial generation. In practice, LLMs generating controller-runtime operators produce ClusterRoles with wildcard verbs on the operator’s own API group, plus frequent additions of wildcard secrets access and RBAC management permissions that the reconciliation logic does not actually require. These accumulate because the LLM generates the RBAC to cover all possible paths through the code it generates, including error-handling paths and development-time scaffolding that was never removed.

Webhook failurePolicy: Ignore in production means that any event that makes the webhook unavailable (pod restart, OOM kill, network partition) silently admits every request that would otherwise be validated. In a well-run cluster this is detectable via webhook request metrics and API server audit logs. In practice, it often goes unnoticed until a misconfigured custom resource causes downstream reconciliation failures.
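
This configuration can be caught before it ships. The following sketch checks a parsed ValidatingWebhookConfiguration for the three defects discussed in this article: failurePolicy: Ignore, a static caBundle with no cert-manager injection annotation, and wildcard match rules. `webhook_findings` is a hypothetical helper, and the input is assumed to be a dictionary loaded from the manifest.

```python
INJECT_ANNOTATION = "cert-manager.io/inject-ca-from"


def webhook_findings(config):
    """Return a message for each insecure setting found in the webhook config."""
    findings = []
    annotations = (config.get("metadata") or {}).get("annotations") or {}
    injected = INJECT_ANNOTATION in annotations
    for hook in config.get("webhooks") or []:
        name = hook.get("name", "<unnamed>")
        if hook.get("failurePolicy") == "Ignore":
            findings.append(f"{name}: failurePolicy is Ignore")
        if (hook.get("clientConfig") or {}).get("caBundle") and not injected:
            findings.append(f"{name}: static caBundle without cert-manager injection")
        for rule in hook.get("rules") or []:
            if "*" in rule.get("apiGroups", []) or "*" in rule.get("resources", []):
                findings.append(f"{name}: rule matches wildcard apiGroups or resources")
                break  # one message per webhook is enough
    return findings
```

Run against the two manifests in the Admission Webhook Security section, a check like this would report three findings for the generated configuration and none for the hardened one.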

Trade-offs

Build custom when genuinely no alternative exists. The argument for AI-assisted operator development is strongest when the CRD schema is genuinely company-specific and no OperatorHub operator comes close. Taking an operator from generated scaffold to production readiness is still weeks of work even with LLM assistance; the trade-off is not “AI vs. hand-written” but “custom operator vs. redesigning the abstraction to fit a maintained operator.” When the latter is possible, it almost always produces better long-term security posture.

AI assistance for scaffolding is lower risk than AI assistance for security logic. Using an LLM to generate the boilerplate reconciliation loop structure, CRD type definitions, and controller-runtime wiring is defensible when a senior engineer reviews and owns the output. Using an LLM to generate admission webhook TLS handling, RBAC policy, or finalizer cleanup logic without review is not — these are the exact domains where the gap between “functional code” and “secure code” is widest and hardest to detect through testing.

The maintenance cost of a custom operator is consistently underestimated. Initial build time with LLM assistance is fast. The ongoing maintenance cost — dependency updates, CVE monitoring, RBAC re-audits as the operator evolves, webhook certificate rotation, compatibility with new Kubernetes minor versions — runs to several engineering days per quarter for a moderately complex operator. Teams that build custom operators without explicitly budgeting this cost will find that the operator accumulates security debt in proportion to how long they ignore it.

Failure Modes

Deploying with implicit cluster-admin because “we’ll fix the RBAC later.” This never gets fixed because the operator works, the team is busy, and tightening RBAC on a running operator in production requires careful testing to avoid breaking reconciliation. The RBAC from initial deployment runs in production indefinitely. This is the default outcome for AI-generated operators without an explicit RBAC audit gate before production.

No CVE scanning means critical vulnerabilities sit for months. Without a daily govulncheck job, a Critical CVE published against a Go library in your dependency tree is not discovered until someone runs a manual audit, or until a security assessor finds it during a penetration test. For unmonitored internal services, the gap between CVE publication and patch adoption is typically measured in months, not days.

Treating AI-generated code as equivalent to community-vetted open source. The code may be functionally equivalent. The security posture is not. The difference is not in the quality of any specific function — it is in the surrounding ecosystem: the people watching for CVEs, the processes that route vulnerability reports to maintainers, the release infrastructure that ships patches to users. LLMs do not generate those things.

Webhook failurePolicy: Ignore as a permanent configuration. Teams set this because failurePolicy: Fail requires the webhook to be reliably available, which requires proper replica counts, PodDisruptionBudgets, and health check configuration. The LLM sets Ignore because it avoids the availability problem at the cost of creating a security bypass. Teams inherit this configuration and never revisit it because the operator functions correctly and the bypass is invisible during normal operation.

No SECURITY.md means responsible disclosure fails open. A researcher who finds a privilege escalation in an AI-generated operator and finds no security contact will eventually file a public issue or post to a mailing list. Your clusters running that operator get no advance notice. Adding SECURITY.md and a monitored security contact costs thirty minutes. Not having one guarantees that any vulnerability disclosure will be as damaging as possible.