SPIFFE and SPIRE for Workload Identity Across Clusters and Clouds

Problem

Workloads need to authenticate to other workloads. The dominant patterns each have a structural problem:

  • Shared API keys / passwords — leaks live forever, rotation requires redeploys, secrets scattered across CI systems and config files.
  • Cloud IAM roles — work within one cloud but break at the boundary. A Kubernetes pod in cluster A cannot directly assume an IAM role in cloud B’s account without bridging through a shared secret or a federated mechanism that ends up grounding back to a shared secret.
  • mTLS with hand-managed certificates — operationally heavy at scale; certificate distribution, rotation, and revocation become a job in themselves.
  • Service account tokens (Kubernetes) — bound to the cluster’s signing key; portable only via OIDC federation, and only to systems that accept that issuer.

SPIFFE (Secure Production Identity Framework For Everyone) is an open specification for workload identity. SPIRE (the SPIFFE Runtime Environment) is the most widely-deployed implementation. Together they provide:

  • A cryptographic identity for every workload, expressed as a SPIFFE ID (a URI like spiffe://prod.example.com/ns/payments/sa/api).
  • Short-lived X.509 certificates (SVIDs) and JWT-SVIDs issued automatically to each workload, rotated continuously, never persisted to disk.
  • Attestation of workloads based on operating system facts (Kubernetes pod metadata, AWS instance identity, Azure VM identity, hostname signatures) — no shared secrets to bootstrap trust.
  • Federation between trust domains, allowing identities issued in cluster A to be verified in cluster B or in a different cloud entirely.
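
A SPIFFE ID is a plain URI, so its two components (trust domain and workload path) fall out with standard parsing. The following is a minimal stdlib-only Go sketch; production code should use the spiffeid package from go-spiffe, which also enforces the spec’s character and length rules:

```go
package main

import (
	"fmt"
	"net/url"
)

// parseSPIFFEID splits a SPIFFE ID into its trust domain (the URI host)
// and workload path (the URI path). Illustrative only.
func parseSPIFFEID(id string) (trustDomain, path string, err error) {
	u, err := url.Parse(id)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "spiffe" || u.Host == "" {
		return "", "", fmt.Errorf("not a SPIFFE ID: %s", id)
	}
	return u.Host, u.Path, nil
}

func main() {
	td, path, err := parseSPIFFEID("spiffe://prod.example.com/ns/payments/sa/api")
	if err != nil {
		panic(err)
	}
	fmt.Println(td)   // prod.example.com
	fmt.Println(path) // /ns/payments/sa/api
}
```

The trust domain scopes the issuing authority; the path is an opaque, operator-chosen hierarchy (here namespace and service account).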

By 2026 SPIRE has substantial adoption: Istio uses SPIFFE for service identity, AWS App Mesh ships SPIRE integrations, Tetrate and HPE offer commercial SPIRE deployments, and Tetragon supports SPIFFE-aware policy. The pattern is mature enough for production use, not just a single team’s pet project.

This article covers SPIRE Server topology, agent placement on Kubernetes and VMs, attestation policies, federation between trust domains, and integration with applications via the Workload API.

Target systems: SPIRE 1.10+, Kubernetes 1.28+, optional integrations with Istio 1.22+, Envoy 1.30+, and HashiCorp Vault 1.16+. Federates with cloud OIDC issuers (AWS STS, GCP STS, Azure AD).

Threat Model

  • Adversary 1 — Compromised pod: attacker with code execution in pod A wants to impersonate pod B to access B’s permissions on a downstream service.
  • Adversary 2 — Cluster-to-cluster pivot: attacker who has compromised a workload in dev cluster wants to access a service in prod cluster using a stolen credential.
  • Adversary 3 — Insider with cloud-account access: ops engineer with read access to one cloud account uses it to impersonate workloads in another account.
  • Adversary 4 — Token replay: attacker captures a JWT-SVID in transit and replays it after the original session ended.
  • Access level: Adversary 1 has pod-level execution; cannot read other pods’ filesystems by default. Adversary 2 has full credential access in their compromised cluster. Adversary 3 has IAM-read in one account. Adversary 4 has passive network capture.
  • Objective: Authenticate to a service as a different workload. Cross trust boundaries (cluster, cloud, environment). Persist access beyond the legitimate session window.
  • Blast radius: Without SPIFFE: compromised tokens are usable until rotation (often days/weeks). Cross-cluster pivot succeeds whenever a shared secret bridges environments. With SPIFFE: SVIDs are rotated every hour by default; replay attacks expire quickly. Attestation binds identity to workload-level facts (pod UID, container image), so impersonation requires reproducing those facts.

Configuration

Step 1: Deploy SPIRE Server

The Server is the certificate authority for the trust domain. Run one Server (HA-deployed) per trust domain; a trust domain typically maps to one organizational boundary (prod.example.com, staging.example.com).

# spire-server-statefulset.yaml
# SPIRE Server in HA. Backed by a managed Postgres for HA state.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: spire-server
  namespace: spire
spec:
  serviceName: spire-server
  replicas: 3
  selector:
    matchLabels:
      app: spire-server
  template:
    metadata:
      labels:
        app: spire-server
    spec:
      serviceAccountName: spire-server
      containers:
        - name: spire-server
          image: ghcr.io/spiffe/spire-server:1.10.0
          args: ["-config", "/run/spire/config/server.conf"]
          ports:
            - containerPort: 8081
              name: grpc
          volumeMounts:
            - name: config
              mountPath: /run/spire/config
              readOnly: true
            - name: data
              mountPath: /run/spire/data
          livenessProbe:
            httpGet:
              path: /live
              port: 8080
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
      volumes:
        - name: config
          configMap:
            name: spire-server-config
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 10Gi
# server.conf
server {
  bind_address = "0.0.0.0"
  bind_port = "8081"
  trust_domain = "prod.example.com"
  data_dir = "/run/spire/data"
  log_level = "INFO"
  ca_subject = {
    country = ["US"],
    organization = ["Example Corp"],
    common_name = "SPIRE Server CA"
  }
  default_x509_svid_ttl = "1h"
  default_jwt_svid_ttl = "5m"
  ca_ttl = "168h"
  ca_key_type = "ec-p384"
}

# Serves the /live and /ready endpoints the probes above target.
health_checks {
  listener_enabled = true
  bind_address = "0.0.0.0"
  bind_port = "8080"
  live_path = "/live"
  ready_path = "/ready"
}

plugins {
  DataStore "sql" {
    plugin_data {
      database_type = "postgres"
      connection_string = "postgres://spire:..@spire-db:5432/spire?sslmode=require"
    }
  }
  KeyManager "aws_kms" {
    plugin_data {
      region = "us-east-1"
      key_metadata_file = "/run/spire/data/keys.json"
      key_policy_file = "/run/spire/config/key-policy.json"
    }
  }
  NodeAttestor "k8s_psat" {
    plugin_data {
      clusters = {
        "prod-us-east-1" = {
          service_account_allow_list = ["spire:spire-agent"]
        }
      }
    }
  }
  UpstreamAuthority "disk" {
    plugin_data {
      cert_file_path = "/run/spire/config/upstream/intermediate.crt"
      key_file_path = "/run/spire/config/upstream/intermediate.key"
    }
  }
}

Key choices:

  • default_x509_svid_ttl = "1h" — short SVID lifetimes; SPIRE Agents auto-rotate.
  • KeyManager "aws_kms" — the trust-domain root key lives in KMS, never in plaintext on disk.
  • UpstreamAuthority "disk" — chains to your organization’s PKI so SPIRE-issued SVIDs validate against your existing trust roots.

Step 2: Deploy SPIRE Agent on Each Node

The Agent runs on every node where SPIRE-aware workloads need identities. On Kubernetes, deploy as a DaemonSet exposing the Workload API as a Unix socket.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: spire-agent
  namespace: spire
spec:
  selector:
    matchLabels:
      app: spire-agent
  template:
    metadata:
      labels:
        app: spire-agent
    spec:
      serviceAccountName: spire-agent
      hostPID: true
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: spire-agent
          image: ghcr.io/spiffe/spire-agent:1.10.0
          args: ["-config", "/run/spire/config/agent.conf"]
          volumeMounts:
            - name: config
              mountPath: /run/spire/config
              readOnly: true
            - name: agent-socket
              mountPath: /run/spire/agent-sockets
            - name: agent-token
              mountPath: /var/run/secrets/tokens
              readOnly: true
            - name: kubelet-socket
              mountPath: /var/lib/kubelet/pod-resources
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: spire-agent-config
        # Projected service account token consumed by the k8s_psat node
        # attestor (default token_path: /var/run/secrets/tokens/spire-agent).
        # The audience must match what the Server expects.
        - name: agent-token
          projected:
            sources:
              - serviceAccountToken:
                  path: spire-agent
                  audience: spire-server
                  expirationSeconds: 7200
        - name: agent-socket
          hostPath:
            path: /run/spire/agent-sockets
            type: DirectoryOrCreate
        - name: kubelet-socket
          hostPath:
            path: /var/lib/kubelet/pod-resources
# agent.conf
agent {
  data_dir = "/run/spire/data"
  log_level = "INFO"
  server_address = "spire-server.spire.svc.cluster.local"
  server_port = "8081"
  socket_path = "/run/spire/agent-sockets/spire-agent.sock"
  trust_bundle_path = "/run/spire/config/bootstrap.crt"
  trust_domain = "prod.example.com"
}

plugins {
  NodeAttestor "k8s_psat" {
    plugin_data {
      cluster = "prod-us-east-1"
    }
  }
  KeyManager "memory" {}
  WorkloadAttestor "k8s" {
    plugin_data {
      kubelet_read_only_port = 0
      skip_kubelet_verification = false
    }
  }
  WorkloadAttestor "unix" {
    plugin_data {
      discover_workload_path = true
    }
  }
}

The k8s WorkloadAttestor inspects pod metadata to identify the workload. The unix WorkloadAttestor identifies based on the calling process’s UID, GID, and binary path — useful for non-Kubernetes workloads on the same host.
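
As a sketch of the non-Kubernetes case, a VM process attested by the unix WorkloadAttestor is registered (Step 3) with unix selectors in place of k8s ones. The SPIFFE ID, UID, binary path, and parent ID placeholder here are hypothetical:

```shell
# Hypothetical: register a billing daemon running as uid 1001 on a VM node.
# The parent ID is the SPIFFE ID the VM's agent received at node
# attestation, which depends on the node attestor in use.
spire-server entry create \
  -spiffeID spiffe://prod.example.com/vm/billing-daemon \
  -parentID "<vm-agent-spiffe-id>" \
  -selector unix:uid:1001 \
  -selector unix:path:/opt/billing/bin/daemon
```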

Step 3: Register Workloads

A registration entry maps an attestation selector (a set of facts about a workload) to a SPIFFE ID. Without a registration, even an attested workload gets no identity.

# Create a node alias matching every PSAT-attested agent in the cluster,
# so workload entries can use one stable parent ID instead of per-node
# agent IDs (which embed the node UID).
kubectl exec -n spire spire-server-0 -- \
  /opt/spire/bin/spire-server entry create \
    -node \
    -spiffeID spiffe://prod.example.com/ns/spire/sa/spire-agent \
    -selector k8s_psat:cluster:prod-us-east-1 \
    -selector k8s_psat:agent_ns:spire \
    -selector k8s_psat:agent_sa:spire-agent

# Register the payments-api workload in the payments namespace.
kubectl exec -n spire spire-server-0 -- \
  /opt/spire/bin/spire-server entry create \
    -spiffeID spiffe://prod.example.com/ns/payments/sa/api \
    -parentID spiffe://prod.example.com/ns/spire/sa/spire-agent \
    -selector k8s:ns:payments \
    -selector k8s:sa:api \
    -selector k8s:container-image:ghcr.io/myorg/payments-api@sha256:abc123 \
    -x509SVIDTTL 3600

The selectors specify what must be true for the workload to receive this SPIFFE ID. Pinning the container image digest (the third selector) means a tampered or substituted image with a different digest cannot impersonate the workload.
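
Because the entry pins a digest, every image rebuild needs a matching entry update. A CD step can do this automatically; this sketch assumes crane for digest lookup, an awk extraction of the entry ID from spire-server entry show output, and a node-alias parent ID, all of which are illustrative rather than a SPIRE-provided interface:

```shell
# Hypothetical CD step: re-pin the registration entry to the new digest.
DIGEST=$(crane digest ghcr.io/myorg/payments-api:"$TAG")
ENTRY_ID=$(spire-server entry show \
    -spiffeID spiffe://prod.example.com/ns/payments/sa/api \
  | awk '/Entry ID/ {print $NF}')
spire-server entry update \
  -entryID "$ENTRY_ID" \
  -spiffeID spiffe://prod.example.com/ns/payments/sa/api \
  -parentID spiffe://prod.example.com/ns/spire/sa/spire-agent \
  -selector k8s:ns:payments \
  -selector k8s:sa:api \
  -selector "k8s:container-image:ghcr.io/myorg/payments-api@${DIGEST}"
```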

Step 4: Application Integration via the Workload API

The application connects to the Workload API socket to fetch its SVID. SPIFFE provides language SDKs.

// main.go - Go application using the SPIFFE Workload API.
package main

import (
    "context"
    "log"
    "net/http"

    "github.com/spiffe/go-spiffe/v2/spiffeid"
    "github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
    "github.com/spiffe/go-spiffe/v2/workloadapi"
)

func main() {
    ctx := context.Background()
    src, err := workloadapi.NewX509Source(ctx)
    if err != nil { log.Fatal(err) }
    defer src.Close()

    // Server: accept only known peer SPIFFE IDs.
    allowedClient := spiffeid.RequireFromString(
        "spiffe://prod.example.com/ns/web/sa/frontend")
    tlsCfg := tlsconfig.MTLSServerConfig(src, src,
        tlsconfig.AuthorizeID(allowedClient))

    server := &http.Server{
        Addr:      ":8443",
        TLSConfig: tlsCfg,
        Handler:   http.HandlerFunc(handler),
    }
    log.Fatal(server.ListenAndServeTLS("", ""))
}

func handler(w http.ResponseWriter, r *http.Request) {
    // r.TLS.PeerCertificates[0] contains the verified peer SVID.
    // SPIFFE ID is in URI SAN.
    w.Write([]byte("OK"))
}

The application code never handles key material directly, never writes a certificate to disk, and picks up rotated SVIDs transparently.

Step 5: Federation Between Trust Domains

To allow workloads in staging.example.com to authenticate to services in prod.example.com, federate the trust domains. Each Server exports a “trust bundle” (its root CA) to the other.

# In prod, register the staging trust domain as federated.
spire-server federation create \
  -trustDomain staging.example.com \
  -bundleEndpointURL https://spire-server.staging.example.com/bundle \
  -bundleEndpointProfile https_spiffe \
  -endpointSpiffeID spiffe://staging.example.com/spire/server

# In staging, register prod the same way.

Services in prod can now apply an authorization policy that accepts SPIFFE IDs from staging.example.com for specific use cases: for example, allowing a staging build pipeline to write to a prod artifact bucket via the SPIFFE-authenticated path.

Step 6: Federation with Cloud OIDC

Federation isn’t only between SPIRE deployments. SPIRE issues JWT-SVIDs that are valid OIDC ID tokens; cloud providers can be configured to trust SPIRE as an OIDC issuer.

# Expose SPIRE's OIDC discovery document and JWKS. These are served by the
# spire-oidc-discovery-provider, deployed alongside the Server and published
# at a stable public URL (here, https://oidc.spire.example.com).

# Register SPIRE as an OIDC provider in AWS.
aws iam create-open-id-connect-provider \
  --url https://oidc.spire.example.com \
  --client-id-list spiffe://prod.example.com \
  --thumbprint-list <thumbprint-of-spire-jwks-endpoint-cert>

# Trust policy on the AWS role:
{
  "Effect": "Allow",
  "Principal": {"Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.spire.example.com"},
  "Action": "sts:AssumeRoleWithWebIdentity",
  "Condition": {
    "StringEquals": {
      "oidc.spire.example.com:sub":
        "spiffe://prod.example.com/ns/payments/sa/api"
    }
  }
}

The payments API pod now assumes an AWS role using its SPIFFE identity — no AWS-specific credentials, no OIDC bridging through GitHub Actions or another federation provider.

Expected Behaviour

Signal by signal, without SPIFFE versus with SPIFFE:

  • Workload-to-workload auth: shared secret or hand-rolled mTLS → cryptographic SVID, rotated hourly.
  • Cluster-to-cluster auth: bridged via a shared secret or external IdP → federation between SPIRE trust domains.
  • Cloud-to-cloud auth: per-cloud federation chains → a single SPIFFE identity trusted by multiple cloud OIDC providers.
  • Certificate lifetime: days to years → 1 hour (default), continuously rotated.
  • Compromised pod blast radius: until secret rotation → until SVID expiry (at most 1 hour).
  • Onboarding a new service: issue, distribute, and monitor a secret → create a registration entry; no secret distribution.

Verify a workload has a valid SPIFFE identity:

# From inside the pod, query the Workload API.
kubectl exec -n payments deploy/api -- \
  /opt/spire/bin/spire-agent api fetch x509 \
    -socketPath /run/spire/agent-sockets/spire-agent.sock
# SPIFFE ID:        spiffe://prod.example.com/ns/payments/sa/api
# SVID Valid After: 2026-04-27 16:30:00 +0000 UTC
# SVID Valid Until: 2026-04-27 17:30:00 +0000 UTC
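
The JWT path can be checked the same way; as a sketch, the audience value is whatever the relying party expects (sts.amazonaws.com for the Step 6 setup):

```shell
# Fetch a JWT-SVID scoped to an audience.
kubectl exec -n payments deploy/api -- \
  /opt/spire/bin/spire-agent api fetch jwt \
    -audience sts.amazonaws.com \
    -socketPath /run/spire/agent-sockets/spire-agent.sock
```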

Trade-offs

  • Continuous SVID rotation. Benefit: compromised SVIDs expire within an hour. Cost: workloads must use the Workload API rather than reading a static cert. Mitigation: SDK integration via go-spiffe, or spiffe-helper for sidecar injection of cert files.
  • Attestation by image digest. Benefit: container substitution cannot impersonate the workload. Cost: registration entries pin to digests, so every image rebuild requires a new entry. Mitigation: automate registration entry updates from your CD pipeline, which knows the new digest.
  • Multi-trust-domain federation. Benefit: services in different clusters can mTLS without shared keys. Cost: operational overhead of bundle endpoints and federation entries. Mitigation: use the federation bundle endpoint with auto-refresh; SPIRE refreshes bundles on its own schedule.
  • OIDC-based cloud federation. Benefit: a single identity model across cloud accounts. Cost: OIDC providers must trust your SPIRE issuer URL. Mitigation: run the SPIRE OIDC discovery endpoint behind a stable, public URL (Cloudflare or similar); rotate the JWKS endpoint cert with care.
  • Workload API as a Unix domain socket. Benefit: simple, fast, no network involvement. Cost: requires a hostPath or DaemonSet socket mount. Mitigation: use a CSI driver (spiffe-csi) for ephemeral socket mounts that respect Pod Security Standards.
  • KMS-backed root. Benefit: Server compromise does not expose the root key. Cost: KMS API costs and a dependency on KMS availability. Mitigation: use a regional KMS; SPIRE caches the active intermediate and only contacts KMS for rotation.

Failure Modes

  • Workload registration missing. Symptom: pod cannot fetch an SVID; logs show "no SVID found for ...". Detection: spire-server entry show for the SPIFFE ID returns nothing. Recovery: add the registration entry; automate this via your deploy pipeline so it is not a manual step.
  • Selector mismatch (image digest changed). Symptom: pods lose their identity after a deploy. Detection: spire-server entry show shows the old digest pinned. Recovery: update the registration entry with the new digest before or alongside the deploy.
  • SPIRE Server outage. Symptom: new SVIDs cannot be issued; existing SVIDs expire within their TTL. Detection: workload error rates rise about an hour after the outage begins. Recovery: run the Server in HA (3+ replicas); for multi-region, deploy a Server per region with a shared backing store and federation between them.
  • Federation bundle stale. Symptom: cross-domain mTLS fails; remote SVIDs no longer validate. Detection: application logs show unknown-CA errors for federated SPIFFE IDs. Recovery: verify the bundle endpoint is reachable (SPIRE auto-refreshes bundles every 5 minutes by default) and check the federation entry's last-refresh time.
  • Workload API socket inaccessible. Symptom: pod cannot find the Unix socket. Detection: pod logs show connection refused on the socket path. Recovery: verify the SPIRE Agent DaemonSet is running on the node; for new pod templates, ensure the hostPath mount or CSI volume is configured.
  • Time skew breaks SVID validation. Symptom: pods report "SVID not yet valid" or "expired" errors. Detection: handshake failures combined with clock disagreement between nodes and the CA. Recovery: run NTP on every node; SPIRE tolerates roughly 30s of clock skew, so investigate node clocks before tuning.
  • Compromised SPIRE Agent. Symptom: attacker can obtain SVIDs for workloads attested on that node. Detection: audit logs show SVID fetches matching unexpected selectors. Recovery: keep registration selectors specific and rotate the Agent's bootstrap credentials; the blast radius is bounded to that node's pods.

When SPIFFE Is the Wrong Tool

  • Single-cloud, single-cluster, no cross-boundary identity needed. The cloud’s native IAM (Pod Identity, Workload Identity Federation) is simpler.
  • Workloads cannot integrate with the Workload API. SPIFFE is most powerful when applications fetch SVIDs directly. For legacy workloads, spiffe-helper writes certs to disk; that works but loses the in-memory-only benefit.
  • Trust domain boundaries do not align with operational boundaries. SPIFFE assumes a logical trust domain hierarchy. If your environment is a flat shared cluster, the value is reduced.