Cluster API Security for Kubernetes Fleet Management
Problem
Cluster API (CAPI) is a Kubernetes project that brings declarative, Kubernetes-style APIs to cluster lifecycle management. Instead of running shell scripts, Terraform modules, or cloud-vendor wizards to create workload clusters, platform teams create Cluster, Machine, MachineDeployment, and KubeadmControlPlane custom resources inside a dedicated management cluster. CAPI controllers reconcile those resources against cloud provider APIs to provision, upgrade, scale, and delete entire Kubernetes clusters on demand. The appeal is enormous: cluster creation becomes a kubectl apply, upgrades become a field update, and fleet consistency becomes policy-as-code. The security implications are equally large, and they are underappreciated in most CAPI adoption stories.
The management cluster is the single most sensitive system in a CAPI fleet. It holds cloud provider credentials — service account keys, IAM role bindings, or federated identity tokens — for every workload cluster it manages. Those credentials are scoped to provision compute, networking, storage, and identity resources across an entire cloud account or subscription. An attacker who gains cluster-admin on the management cluster, or who can exec into a CAPI controller pod, effectively holds credentials to provision or destroy compute across the entire fleet. The blast radius is not limited to Kubernetes: IAM roles with broad EC2 permissions can be used to exfiltrate data from S3, modify VPC routes, or pivot to other AWS services entirely. This is a qualitatively different threat profile from compromising a single workload cluster.
Bootstrap tokens are short-lived Kubernetes secrets used during the kubeadm node join process. A new node presents the token to the API server to retrieve its TLS bootstrap credentials. CAPI creates these tokens automatically during Machine provisioning and embeds them in node cloud-init UserData. The token lifetime is configurable (for the kubeadm bootstrap provider, through the controller's --bootstrap-token-ttl flag), and nothing prevents operators from raising it to one hour or more. If a node takes longer than expected to boot, whether because of AMI pull time, cloud capacity constraints, or network issues, operators sometimes respond by increasing the TTL rather than investigating the underlying cause. A bootstrap token with a long TTL sitting in the workload cluster's kube-system namespace is a standing invitation: any actor who can read that secret, or who intercepts the UserData through the cloud metadata service, can join an unauthorized node to the cluster until the token expires.
CAPI providers — CAPA for AWS, CAPZ for Azure, CAPG for GCP, and others — require IAM permissions that are difficult to scope minimally. Provisioning a cluster requires creating EC2 instances, load balancers, security groups, subnets, IAM instance profiles, EBS volumes, and Route 53 records. Most teams start with the example IAM policies from provider documentation and never revisit them. Those example policies are intentionally broad to minimize setup friction. In production, a single shared IAM role used by the CAPA controller for all managed clusters means that credential compromise affects every cluster in the fleet regardless of workload criticality, environment, or business unit.
ClusterClass, part of CAPI's managed topology feature (enabled through the ClusterTopology feature gate), centralizes cluster template definitions. A ClusterClass defines a reusable skeleton for control plane and worker configuration, and individual Cluster resources reference it via spec.topology.class. This reduces configuration duplication and enables fleet-wide upgrades by updating a single ClusterClass. The security implication is that ClusterClass becomes a high-value target: a mutation to the template propagates to every cluster referencing it. An attacker or misconfigured automation pipeline with write access to a ClusterClass can change machine types, inject UserData, alter network configuration, or disable security controls fleet-wide without touching individual cluster objects.
Most CAPI fleets are managed via GitOps — Flux or Argo CD applies cluster manifests from a Git repository to the management cluster. This is the correct operational pattern, but it shifts the security boundary to the Git repository and the GitOps tooling. Branch protection rules, required reviewers for the clusters/ directory, and the RBAC constraints on the GitOps service account all become part of the cluster security posture. A developer who can push directly to the main branch, or who can approve their own pull requests, effectively has the ability to create, modify, or delete clusters without human review.
Target systems: Cluster API v1.7+, CAPA v2.x (AWS), CAPZ v1.x (Azure), clusterctl v1.7+.
Threat Model
- Management cluster compromise for fleet-wide credential access. An attacker gains code execution inside the management cluster through a vulnerable workload, a misconfigured admission policy, or a stolen kubeconfig. They locate the AWSClusterControllerIdentity or provider secret, extract cloud credentials, and use them to provision unauthorized EC2 instances, read S3 buckets, or pivot to other services. All workload clusters managed by the compromised controller are affected.
- Developer with Machine write access provisioning unauthorized compute. A developer granted write access to Machine or MachineDeployment objects, for troubleshooting or scaling purposes, creates additional machines in a production cluster outside of the normal change process. Without cost controls or MachineDeployment quotas, they can provision GPU instances, overprovision capacity, or cause cloud account spending spikes. With a custom AWSMachineTemplate referencing a backdoored AMI, the unauthorized node joins the cluster and runs attacker-controlled workloads.
- Bootstrap token reuse by rogue node. A bootstrap token with a TTL of one hour is embedded in a Machine's cloud-init UserData. The intended node never boots: cloud capacity is unavailable, or the machine is deleted after a failed join attempt. The token remains valid in kube-system. Thirty minutes later, an attacker who has read the token from the cloud metadata service, from a leaked UserData log, or from a compromised secrets store uses it to join an unauthorized node to the cluster. The node receives a valid client certificate and appears as a legitimate cluster member.
- ClusterClass mutation escaping template constraints. An attacker or misconfigured CI pipeline with write access to a ClusterClass modifies the worker MachineTemplate reference to one with hostNetwork: true or a privileged container in the JoinConfiguration. Because ClusterClass changes propagate during the next reconciliation cycle to all referencing clusters, every managed cluster that reconciles the topology upgrade deploys the modified, and now privileged, node configuration.
The blast radius across all four scenarios extends beyond Kubernetes. Cloud credential compromise enables actions against the cloud control plane: creating new IAM users, modifying S3 bucket policies, altering VPC firewall rules. Bootstrap token abuse produces nodes that are trusted members of the workload cluster’s control plane trust domain. ClusterClass mutations affect every cluster in the fleet simultaneously rather than a single workload cluster. CAPI fleet management concentrates risk in a way that individual cluster management does not, and defenses must be proportionally stronger.
Configuration / Implementation
Management Cluster Isolation
The management cluster must not run production workloads. Mixing workload deployments with CAPI controllers means that a compromised workload application can access controller service account tokens through the shared API server. Dedicate a small, hardened cluster exclusively to fleet management. For cost-sensitive environments, a single-node or three-node management cluster running on minimal instance types is acceptable; the criticality of the cluster does not correlate with its size.
Enforce the Pod Security Standards restricted profile on all CAPI namespaces to prevent the controllers themselves from running as root or with elevated capabilities:
apiVersion: v1
kind: Namespace
metadata:
  name: capi-system
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
---
apiVersion: v1
kind: Namespace
metadata:
  name: capa-system
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
Restrict CAPA controller egress with a NetworkPolicy. A standard NetworkPolicy cannot match destinations by DNS name, so the policy below limits egress by port instead: DNS, HTTPS for the cloud API endpoints, and the Kubernetes API server port. The controller has no legitimate reason to initiate connections to workload cluster pod CIDRs or to other internal services on arbitrary ports:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: capa-controller-egress
  namespace: capa-system
spec:
  podSelector:
    matchLabels:
      control-plane: capa-controller-manager
  policyTypes:
    - Egress
  egress:
    # Allow DNS resolution
    - ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    # Allow AWS API endpoints (resolved via DNS, so 443 is allowed to any destination)
    - ports:
        - port: 443
          protocol: TCP
    # Allow management cluster API server
    - ports:
        - port: 6443
          protocol: TCP
Provider Credential Scoping (CAPA / AWS)
Replace static credentials with IRSA (IAM Roles for Service Accounts). The CAPA controller service account in capa-system should assume a role whose trust policy requires the EKS OIDC condition matching that specific service account:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLEOIDCID"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.us-east-1.amazonaws.com/id/EXAMPLEOIDCID:sub": "system:serviceaccount:capa-system:capa-controller-manager",
"oidc.eks.us-east-1.amazonaws.com/id/EXAMPLEOIDCID:aud": "sts.amazonaws.com"
}
}
}
]
}
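The pinning requirement above can be checked mechanically before a role is attached to the controller. The following is a minimal sketch, not an AWS tool: the function name trust_policy_problems is hypothetical, and it assumes the trust policy has already been loaded into a Python dict (for example with json.load). It only accepts exact StringEquals matches, since a StringLike condition with wildcards would not pin a single service account.

```python
def trust_policy_problems(policy: dict, expected_sub: str) -> list:
    """List ways a web-identity trust policy fails to pin the caller.

    Hypothetical helper for illustration; it does not call AWS.
    """
    problems = []
    found = False
    for stmt in policy.get("Statement", []):
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRoleWithWebIdentity" not in actions:
            continue
        found = True
        # Only exact matches count; StringLike with wildcards is not a pin.
        equals = stmt.get("Condition", {}).get("StringEquals", {})
        subs = [v for k, v in equals.items() if k.endswith(":sub")]
        auds = [v for k, v in equals.items() if k.endswith(":aud")]
        if subs != [expected_sub]:
            problems.append("sub is not pinned to " + expected_sub)
        if auds != ["sts.amazonaws.com"]:
            problems.append("aud is not pinned to sts.amazonaws.com")
    if not found:
        problems.append("no sts:AssumeRoleWithWebIdentity statement found")
    return problems
```

Run against the policy shown above with expected_sub set to system:serviceaccount:capa-system:capa-controller-manager, the function should return an empty list. A policy that drops the Condition block, or that uses StringLike, produces one entry per missing pin.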
Use per-workload-cluster IAM roles by creating a dedicated AWSClusterRoleIdentity for each workload cluster rather than relying on the default AWSClusterControllerIdentity. This limits the credential blast radius to one cluster. AWSClusterRoleIdentity objects are cluster-scoped, so they carry no metadata.namespace; allowedNamespaces is what restricts which namespaces may reference the identity:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSClusterRoleIdentity
metadata:
  name: production-us-east-1-identity
spec:
  allowedNamespaces:
    list:
      - clusters
  roleARN: arn:aws:iam::111122223333:role/capa-production-us-east-1
  sessionName: capa-production-us-east-1
  durationSeconds: 3600
  # The controller's own identity is the starting point for the role assumption
  sourceIdentityRef:
    kind: AWSClusterControllerIdentity
    name: default
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-us-east-1
  namespace: clusters
spec:
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: production-us-east-1
  # ... other fields
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSCluster
metadata:
  name: production-us-east-1
  namespace: clusters
spec:
  region: us-east-1
  identityRef:
    kind: AWSClusterRoleIdentity
    name: production-us-east-1-identity
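Attach a service control policy (SCP) with an explicit deny to the AWS account or organizational unit that hosts the CAPA roles, so that those roles cannot perform actions outside their intended scope, such as creating IAM users or modifying billing settings. SCPs do not affect principals in the AWS Organizations management account, so the CAPA roles should live in a member account: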
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DenyCAPAEscalation",
"Effect": "Deny",
"Action": [
"iam:CreateUser",
"iam:DeleteUser",
"iam:AttachUserPolicy",
"organizations:*",
"account:*"
],
"Resource": "*",
"Condition": {
"ArnLike": {
"aws:PrincipalArn": "arn:aws:iam::111122223333:role/capa-*"
}
}
}
]
}
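To reason about which requests a deny statement like this one catches, it can help to evaluate it against a few sample principals and actions. The sketch below is a deliberately simplified approximation of IAM matching, not the real policy engine: it ignores Resource, handles only the ArnLike condition on aws:PrincipalArn, and uses Python's fnmatch for the wildcards. The function name scp_denies is hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM allows a single string wherever a list is accepted."""
    return [value] if isinstance(value, str) else list(value)


def scp_denies(scp: dict, principal_arn: str, action: str) -> bool:
    """Approximate check: does any Deny statement match this request?

    Assumptions: Resource is "*", and the only condition in use is
    ArnLike on aws:PrincipalArn. Real IAM evaluation has more rules.
    """
    for stmt in scp.get("Statement", []):
        if stmt.get("Effect") != "Deny":
            continue
        arn_like = stmt.get("Condition", {}).get("ArnLike", {})
        patterns = arn_like.get("aws:PrincipalArn")
        if patterns is not None:
            if not any(fnmatchcase(principal_arn, p) for p in _as_list(patterns)):
                continue
        # Action names are case-insensitive in IAM, so compare lowercased.
        for pattern in _as_list(stmt.get("Action", [])):
            if fnmatchcase(action.lower(), pattern.lower()):
                return True
    return False
```

With the SCP above, a capa-production-us-east-1 role requesting iam:CreateUser is denied, while the same role requesting ec2:RunInstances is not matched by this statement and falls through to its normal IAM permissions.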
Bootstrap Token Hardening
Keep bootstrap token lifetimes at fifteen minutes or less. For the kubeadm init token, set ttl under initConfiguration.bootstrapTokens in the KubeadmControlPlane. Join tokens for additional nodes are generated by the kubeadm bootstrap provider, and in current releases their lifetime is governed by the --bootstrap-token-ttl flag on that controller rather than by a field in KubeadmConfigTemplate; verify the flag on the controller Deployment instead of assuming the default. The window must be long enough for the cloud instance to boot, pull the container runtime, and reach the API server, but short enough that a token from a failed provisioning attempt cannot be reused:
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: production-control-plane
  namespace: clusters
spec:
  kubeadmConfigSpec:
    clusterConfiguration: {}
    initConfiguration:
      bootstrapTokens:
        - ttl: 15m0s
          usages:
            - signing
            - authentication
          groups:
            - system:bootstrappers:kubeadm:default-node-token
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          cloud-provider: external
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: production-workers
  namespace: clusters
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: external
      # Join tokens are generated by the kubeadm bootstrap controller; their TTL
      # is set on that controller (--bootstrap-token-ttl), not in this template
Audit expired and orphaned bootstrap tokens regularly. CAPI does not always clean up tokens from machines that failed to join:
# List all bootstrap tokens with their expiration times
# (--no-headers keeps the column header row out of the loop)
kubectl get secrets -n kube-system \
  --field-selector type=bootstrap.kubernetes.io/token \
  --no-headers \
  -o custom-columns='NAME:.metadata.name,EXPIRATION:.data.expiration' | \
while read -r name exp; do
  echo "$name expires $(echo "$exp" | base64 -d)"
done
# Delete only the tokens whose expiration timestamp is already in the past.
# Assumes the expiration is stored as a UTC RFC 3339 timestamp ending in "Z".
kubectl get secrets -n kube-system \
  --field-selector type=bootstrap.kubernetes.io/token \
  -o json | \
jq -r '.items[]
  | select(.data.expiration != null)
  | select((.data.expiration | @base64d | fromdateiso8601) < now)
  | .metadata.name' | \
xargs -r kubectl delete secret -n kube-system
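The same audit can also flag tokens that are still valid but live longer than the fifteen-minute target, which the shell commands above do not report. The following is a minimal sketch under stated assumptions: the input is the parsed output of kubectl get secrets -o json for bootstrap token secrets, the expiration value is a base64-encoded RFC 3339 timestamp, and the function name classify_tokens is hypothetical. A token secret with no expiration field never expires, so it is reported as long-lived.

```python
import base64
from datetime import datetime, timedelta, timezone


def parse_expiration(secret: dict):
    """Decode .data.expiration into an aware datetime, or None if absent."""
    raw = secret.get("data", {}).get("expiration")
    if raw is None:
        return None
    text = base64.b64decode(raw).decode()
    # fromisoformat() on older Python versions does not accept a trailing "Z".
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def classify_tokens(secret_list: dict, now: datetime,
                    max_remaining: timedelta = timedelta(minutes=15)) -> dict:
    """Sort bootstrap token secrets into expired, long_lived, and ok."""
    report = {"expired": [], "long_lived": [], "ok": []}
    for secret in secret_list.get("items", []):
        name = secret["metadata"]["name"]
        expires = parse_expiration(secret)
        if expires is None:
            report["long_lived"].append(name)  # no expiration: never expires
        elif expires <= now:
            report["expired"].append(name)
        elif expires - now > max_remaining:
            report["long_lived"].append(name)
        else:
            report["ok"].append(name)
    return report
```

In practice now would be datetime.now(timezone.utc), and anything in the long_lived bucket is a prompt to check how that token was created rather than something to delete blindly.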
RBAC for CAPI CRDs
Platform teams who create and manage clusters need write access to core CAPI CRDs. Developers who request clusters through a self-service interface should have read-only access at most, with actual cluster creation mediated by a GitOps workflow and human review:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: capi-platform-admin
rules:
  - apiGroups: ["cluster.x-k8s.io"]
    resources:
      - clusters
      - machines
      - machinedeployments
      - machinesets
      - machinehealthchecks
      - clusterclasses
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["infrastructure.cluster.x-k8s.io"]
    resources: ["*"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["controlplane.cluster.x-k8s.io"]
    resources: ["*"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["bootstrap.cluster.x-k8s.io"]
    resources: ["*"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: capi-developer-readonly
rules:
  - apiGroups: ["cluster.x-k8s.io"]
    resources:
      - clusters
      - machines
      - machinedeployments
      - machinesets
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: capi-platform-admin-binding
subjects:
  - kind: Group
    name: platform-team
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: capi-platform-admin
  apiGroup: rbac.authorization.k8s.io
Use ClusterClass variables with an enum schema to constrain which instance types developers can request through a self-service Cluster object. The schema is validated at admission time; the variable only changes the generated machine templates once a ClusterClass patch references it:
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: standard-production
  namespace: clusters
spec:
  workers:
    machineDeployments:
      - class: default-worker
        template:
          bootstrap:
            ref:
              apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
              kind: KubeadmConfigTemplate
              name: standard-production-worker-bootstrap
          infrastructure:
            ref:
              apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
              kind: AWSMachineTemplate
              name: standard-production-worker
        machineHealthCheck:
          maxUnhealthy: 33%
          nodeStartupTimeout: 10m
          unhealthyConditions:
            - type: Ready
              status: Unknown
              timeout: 300s
            - type: Ready
              status: "False"
              timeout: 300s
  variables:
    - name: workerInstanceType
      required: true
      schema:
        openAPIV3Schema:
          type: string
          enum:
            - m6i.large
            - m6i.xlarge
            - m6i.2xlarge
          description: "Allowed worker instance types"
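A pre-merge check in the GitOps pipeline can catch a disallowed instance type before the manifest ever reaches the management cluster. The sketch below approximates only the enum and required checks, not the full OpenAPI validation that CAPI's admission webhook performs. It assumes both manifests have already been parsed into dicts and that the Cluster supplies its values under spec.topology.variables as name/value pairs; the function name invalid_variables is hypothetical.

```python
def invalid_variables(cluster_class: dict, cluster: dict) -> list:
    """Compare a Cluster's topology variables with the ClusterClass schema.

    Checks only `required` and `enum`; everything else is out of scope.
    """
    definitions = {v["name"]: v for v in cluster_class["spec"].get("variables", [])}
    topology = cluster["spec"].get("topology", {})
    supplied = {v["name"]: v["value"] for v in topology.get("variables", [])}
    errors = []
    for name, definition in definitions.items():
        if name not in supplied:
            if definition.get("required"):
                errors.append(name + ": required variable is missing")
            continue
        enum = definition.get("schema", {}).get("openAPIV3Schema", {}).get("enum")
        if enum is not None and supplied[name] not in enum:
            errors.append(name + ": " + repr(supplied[name]) + " is not an allowed value")
    for name in supplied:
        if name not in definitions:
            errors.append(name + ": not defined in the ClusterClass")
    return errors
```

A Cluster that sets workerInstanceType to m6i.xlarge passes with an empty list, while a request for a GPU instance type such as p4d.24xlarge is reported as not allowed.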
Machine Image Pinning
Pin AMI references in AWSMachineTemplate to specific AMI IDs validated by your image pipeline. Avoid lookup-based selection (imageLookupFormat, imageLookupOrg, imageLookupBaseOS), which resolves at provision time and can be influenced by whoever is able to publish or rename matching AMIs:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: production-worker-v1-29-4
  namespace: clusters
spec:
  template:
    spec:
      instanceType: m6i.xlarge
      ami:
        id: ami-0a1b2c3d4e5f67890 # Pinned; validated by AMI pipeline on 2026-04-28
      iamInstanceProfile: production-nodes-instance-profile
      sshKeyName: "" # Disable SSH key injection; use SSM Session Manager
      additionalSecurityGroups:
        - id: sg-0abc123def456789a
      rootVolume:
        size: 50
        type: gp3
        encrypted: true
      nonRootVolumes: []
      instanceMetadataOptions:
        httpPutResponseHopLimit: 1
        httpTokens: required # IMDSv2 required
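Pinning is easy to regress when templates are copied, so a small lint step in CI can reject templates that fall back to lookup-based image selection. The following is a minimal sketch, assuming the AWSMachineTemplate has already been parsed into a dict. The function name pinning_problems is hypothetical, and the list of lookup fields reflects the CAPA field names as I understand them, so verify it against the CRD version you run.

```python
import re

# AMI IDs are "ami-" followed by 8 (legacy) or 17 hexadecimal characters.
AMI_ID = re.compile(r"^ami-(?:[0-9a-f]{8}|[0-9a-f]{17})$")

# Assumed names of the CAPA fields that resolve an image at provision time.
LOOKUP_FIELDS = ("imageLookupFormat", "imageLookupOrg", "imageLookupBaseOS")


def pinning_problems(template: dict) -> list:
    """Return reasons an AWSMachineTemplate is not pinned to a vetted image."""
    spec = template["spec"]["template"]["spec"]
    problems = []
    ami_id = spec.get("ami", {}).get("id")
    if not ami_id or not AMI_ID.match(ami_id):
        problems.append("ami.id is not set to a concrete AMI ID")
    for field in LOOKUP_FIELDS:
        if field in spec:
            problems.append(field + " enables lookup-based image selection")
    if spec.get("sshKeyName") != "":
        problems.append("sshKeyName is not explicitly empty")
    return problems
```

The template above passes this check. The sketch confirms only that an ID is present and well-formed; whether that ID was actually produced by the image pipeline needs a separate allow-list lookup.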
Deploy a MachineHealthCheck per MachineDeployment to enable automatic remediation of unhealthy nodes without requiring manual intervention that might involve elevated privileges:
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: production-worker-health
  namespace: clusters
spec:
  clusterName: production-us-east-1
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: production-workers
  maxUnhealthy: 33%
  nodeStartupTimeout: 10m
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
  # No remediationTemplate is set: the default remediation deletes the unhealthy
  # Machine and lets the owning MachineSet create a replacement. An external
  # remediation template is only needed when a provider supplies one.
GitOps Integration
Scope the Flux or Argo CD service account used to apply CAPI objects to only the clusters namespace, not cluster-admin across the management cluster:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: capi-gitops-applier
  namespace: clusters
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: capi-gitops-role
  namespace: clusters
rules:
  - apiGroups: ["cluster.x-k8s.io", "infrastructure.cluster.x-k8s.io",
                "controlplane.cluster.x-k8s.io", "bootstrap.cluster.x-k8s.io"]
    resources: ["*"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
    # delete is intentionally omitted: RBAC is allow-only, so leaving the verb
    # out means cluster deletion requires a separate, manually approved action
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: capi-gitops-binding
  namespace: clusters
subjects:
  - kind: ServiceAccount
    name: capi-gitops-applier
    namespace: clusters
roleRef:
  kind: Role
  name: capi-gitops-role
  apiGroup: rbac.authorization.k8s.io
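Because RBAC rules are purely additive, the quickest way to confirm that the GitOps role cannot delete clusters is to evaluate its rules against the request in question. The sketch below is a simplified model of rule matching that ignores resourceNames, subresources, and nonResourceURLs; the function name rbac_allows is hypothetical. Against a live cluster, kubectl auth can-i with the --as flag answers the same question authoritatively.

```python
def rbac_allows(rules: list, api_group: str, resource: str, verb: str) -> bool:
    """Return True if any rule grants the verb on the resource.

    Simplified: no resourceNames, subresources, or nonResourceURLs.
    """
    def matches(allowed, wanted):
        return wanted in allowed or "*" in allowed

    for rule in rules:
        if (matches(rule.get("apiGroups", []), api_group)
                and matches(rule.get("resources", []), resource)
                and matches(rule.get("verbs", []), verb)):
            return True
    # No rule matched: RBAC has no deny rules, so the request is not allowed.
    return False
```

Evaluating the capi-gitops-role rules with this function shows patch on clusters as allowed and delete as not allowed, which matches the intent of omitting the verb.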
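Reference that service account from the Flux Kustomization. Flux resolves serviceAccountName in the Kustomization's own namespace, so the Kustomization lives in clusters alongside the service account and points at the GitRepository in flux-system. Health checks take explicit object names, not wildcards:
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: capi-clusters
  namespace: clusters # Same namespace as the capi-gitops-applier service account
spec:
  interval: 5m
  path: ./clusters
  prune: false # Disable prune; accidental cluster deletion is catastrophic
  sourceRef:
    kind: GitRepository
    name: fleet-config
    namespace: flux-system
  serviceAccountName: capi-gitops-applier
  targetNamespace: clusters
  healthChecks:
    - apiVersion: cluster.x-k8s.io/v1beta1
      kind: Cluster
      name: production-us-east-1
      namespace: clusters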
Enforce branch protection in GitHub or GitLab for the clusters/ directory path. A CODEOWNERS file naming the platform-team group, combined with a branch protection rule that requires code owner review before merge, achieves this without custom tooling:
# .github/CODEOWNERS
/clusters/ @org/platform-team
/clusters/production/ @org/platform-team @org/security-team
Audit Logging on the Management Cluster
Configure the API server audit policy to capture all write and delete operations on CAPI resources at the RequestResponse level. Read operations on Cluster and Machine objects should be captured at Metadata level to track who is querying the fleet state:
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Drop read noise from the kube-controller-manager. Rules are evaluated in
  # order and the first match wins, so this must come before the rules below.
  - level: None
    users: ["system:kube-controller-manager"]
    verbs: ["get", "list", "watch"]
  # Capture all writes to CAPI resources at full request/response
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "cluster.x-k8s.io"
        resources: ["clusters", "machines", "machinedeployments", "clusterclasses"]
      - group: "infrastructure.cluster.x-k8s.io"
        resources: ["awsclusters", "awsmachines", "awsmachinetemplates",
                    "awsclusterroleidentities"]
      - group: "controlplane.cluster.x-k8s.io"
        resources: ["kubeadmcontrolplanes"]
      - group: "bootstrap.cluster.x-k8s.io"
        resources: ["kubeadmconfigs", "kubeadmconfigtemplates"]
  # Capture reads at metadata level
  - level: Metadata
    verbs: ["get", "list", "watch"]
    resources:
      - group: "cluster.x-k8s.io"
        resources: ["clusters", "machines", "machinedeployments"]
  # Capture bootstrap token access at Metadata level only: logging Secrets at
  # RequestResponse would write the token values into the audit log
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
    namespaces: ["kube-system"]
  - level: Metadata
    omitStages:
      - RequestReceived
Ship these audit logs to your SIEM. Alert on: Machine creates outside of business hours, AWSClusterRoleIdentity modifications, any ClusterClass updates, and bootstrap secret reads by non-controller principals.
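The four alert conditions can be prototyped against raw audit events before they are translated into SIEM rules. The following is a minimal sketch, assuming each event is a parsed audit.k8s.io/v1 Event. The function name alerts_for, the alert labels, the business-hours window (08:00 to 18:00 UTC), and the list of controller principals are all assumptions to replace with local values.

```python
from datetime import datetime

WRITE_VERBS = {"create", "update", "patch", "delete"}
READ_VERBS = {"get", "list", "watch"}
# Hypothetical allow-list of principals expected to read bootstrap tokens.
CONTROLLER_PREFIXES = (
    "system:serviceaccount:capi-",
    "system:serviceaccount:capa-",
    "system:kube-controller-manager",
)


def alerts_for(event: dict, business_hours=(8, 18)) -> list:
    """Return the alert labels (assumed names) that one audit event triggers."""
    ref = event.get("objectRef", {})
    verb = event.get("verb", "")
    user = event.get("user", {}).get("username", "")
    resource = ref.get("resource", "")
    alerts = []
    if resource == "clusterclasses" and verb in WRITE_VERBS:
        alerts.append("clusterclass-write")
    if resource == "awsclusterroleidentities" and verb in WRITE_VERBS:
        alerts.append("role-identity-write")
    if resource == "machines" and verb == "create":
        stamp = event["requestReceivedTimestamp"].replace("Z", "+00:00")
        hour = datetime.fromisoformat(stamp).hour  # audit timestamps are UTC
        if not business_hours[0] <= hour < business_hours[1]:
            alerts.append("machine-create-off-hours")
    if (resource == "secrets" and ref.get("namespace") == "kube-system"
            and verb in READ_VERBS and not user.startswith(CONTROLLER_PREFIXES)):
        # list and watch requests carry no object name, so treat them as reads too
        name = ref.get("name", "")
        if name == "" or name.startswith("bootstrap-token-"):
            alerts.append("bootstrap-token-read")
    return alerts
```

Feeding the function a day of audit log lines (one json.loads per line) gives a rough sense of alert volume, which helps tune the allow-list before the rules page anyone.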
Expected Behaviour
| Signal | Without Hardening | With Hardening |
|---|---|---|
| Provider credential blast radius | Single AWSClusterControllerIdentity with broad permissions covers all clusters; one compromise exposes entire fleet | Per-cluster AWSClusterRoleIdentity with minimal permissions; compromise of one role affects one cluster |
| Bootstrap token reuse | Token TTL 1 hour; failed Machine leaves valid token; rogue node can join up to 60 minutes after failure | Token TTL 15 minutes; audit alert fires on bootstrap secret reads by non-controller principals; expired tokens cleaned up automatically |
| Unauthorized Machine provisioning | Developer with Machine write access creates GPU instances with custom AMI; joins cluster as trusted node | RBAC restricts Machine write to platform-team group; ClusterClass variable schema enforces allowed instance types; GitOps service account cannot delete clusters |
| ClusterClass constraint bypass | Attacker with ClusterClass write modifies worker template fleet-wide; propagates to all clusters on next reconcile | ClusterClass write restricted to platform-team; CODEOWNERS requires security-team review; audit alert fires on any ClusterClass update |
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Per-cluster IAM roles | Blast radius limited to one workload cluster per credential compromise | N IAM roles to create and manage; role rotation complexity scales with fleet size | Automate role creation as part of cluster bootstrap pipeline; use AWS Organizations SCPs as a backstop |
| Short bootstrap TTL (≤ 15 minutes) | Stale tokens from failed provisioning cannot be reused | On slow cloud regions or large AMIs, node may not reach API server within window, causing join failure and requiring Machine reprovisioning | Profile actual node boot time per region; set TTL to measured p99 boot time plus two minutes; use MachineHealthCheck to auto-remediate join failures |
| Dedicated management cluster | No shared blast radius with workload applications; PodSecurityStandard can be enforced aggressively | Additional cluster cost and operational overhead; one more cluster to patch and upgrade | Use minimal instance types (three m6i.large nodes is sufficient for most CAPI controllers); treat management cluster upgrades as highest-priority maintenance |
| ClusterClass variable schema constraints | Prevents developers from requesting oversized or disallowed instance types; enforces fleet consistency | Reduces developer self-service flexibility; new instance types require schema update before adoption | Maintain a documented process for schema additions with an SLA (for example, 48-hour turnaround for approved instance type additions) |
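The TTL rule of thumb in the table (measured p99 boot time plus two minutes) is simple arithmetic, sketched below. The function name recommended_ttl_seconds is hypothetical, boot times are assumed to be measured in seconds from instance launch to first successful API server contact, and the percentile uses the nearest-rank method. The result is capped at the fifteen-minute ceiling, with a flag when the measured value would exceed it.

```python
def recommended_ttl_seconds(boot_seconds: list, margin_seconds: int = 120,
                            ceiling_seconds: int = 900) -> tuple:
    """Return (ttl, exceeds_ceiling) from observed node boot times.

    ttl is p99 boot time plus the margin, capped at the ceiling.
    exceeds_ceiling is True when the uncapped value was above the ceiling,
    which signals a slow-boot problem to investigate rather than a TTL to raise.
    """
    if not boot_seconds:
        raise ValueError("at least one boot time sample is required")
    ordered = sorted(boot_seconds)
    # Nearest-rank p99 using integer arithmetic: ceil(0.99 * n)
    rank = (99 * len(ordered) + 99) // 100
    p99 = ordered[rank - 1]
    uncapped = p99 + margin_seconds
    return min(uncapped, ceiling_seconds), uncapped > ceiling_seconds
```

For example, one hundred samples spread evenly between 201 and 300 seconds have a p99 of 299 seconds, so the recommended TTL is 419 seconds, a little under seven minutes.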
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| IRSA role assumption denied | CAPA controller logs AccessDenied: sts:AssumeRoleWithWebIdentity; AWSCluster object stuck in Provisioning with event failed to get AWS session | CloudTrail AssumeRoleWithWebIdentity failures from the controller pod IP; CAPA controller error metric spike | Verify OIDC thumbprint in IAM identity provider matches cluster; confirm service account annotation eks.amazonaws.com/role-arn is set; check trust policy StringEquals condition matches exact service account name and namespace |
| Bootstrap token expires before node joins | Machine object shows BootstrapReady: false; node never appears in kubectl get nodes; cloud instance is running but kubeadm join repeatedly fails to authenticate against the API server | CloudWatch/cloud logs showing HTTP 401 from node bootstrap requests; CAPI Machine event bootstrap token expired | Delete the failed Machine object; CAPI will create a new one with a fresh bootstrap token; investigate underlying slow boot cause (AMI pull time, cloud init scripts) |
| MachineHealthCheck remediation loop | Nodes repeatedly deleted and recreated; MachineDeployment replica count oscillates; cluster unavailable as remediation consumes node quota | CAPI MachineHealthCheck metric unhealthy_machines above 0 for more than two reconciliation periods; cloud cost spike from instance churn | Check the unhealthyConditions timeouts: if they are too short, transient failures trigger remediation. Check maxUnhealthy as well: it is the ceiling above which remediation stops, so a value set too high lets remediation continue through a widespread outage. Pause remediation with the cluster.x-k8s.io/paused annotation while investigating root cause (node condition misconfiguration, cloud instance quota, kernel crash) |
| ClusterClass schema validation rejects valid topology | Cluster object with a legitimate topology update fails admission with spec.topology.workers.machineDeployments[0].variables: Invalid value; platform team cannot apply cluster change | Kubernetes admission webhook rejection events in API server audit log; Flux/Argo CD sync failure notification | Add the required instance type or variable value to the ClusterClass schema enum; re-apply the Cluster object; review ClusterClass variable schema as part of the change approval process to prevent recurrence |