Zero Trust Architecture: From BeyondCorp Principles to Production Implementation

The Problem With Perimeter Thinking

The classic security model draws a hard line between inside and outside. Inside the firewall is trusted; outside is not. VPN endpoints authenticate once at the boundary, and users inherit the network’s implicit trust for the rest of the session.

This model has two foundational problems. First, the perimeter no longer maps to anything coherent: engineers work from coffee shops, cloud workloads run in providers you don’t control, and contractors access systems from unmanaged devices. Second, a single compromised credential or misconfigured service inside the perimeter grants lateral movement to everything on the flat internal network. The 2020 SolarWinds compromise illustrates the pattern: a trojanised software update gave attackers a foothold inside victim networks, and from there they moved laterally and escalated privileges largely undetected.

Zero trust replaces implicit trust with continuous verification. Every request — from a user, a service, a device — must be authenticated, authorised, and validated against policy regardless of where it originates.

NIST SP 800-207: The Seven Tenets

NIST SP 800-207 (2020) defines the logical components and behavioural properties of a zero trust architecture (ZTA) and summarises them in seven tenets:

  1. All data sources and computing services are resources. Laptops, servers, cloud services, IoT devices — all require the same access verification. Nothing earns implicit trust by being “internal.”

  2. All communication is secured regardless of network location. Traffic between two pods in the same namespace requires the same authentication and encryption as traffic crossing the internet.

  3. Access to individual enterprise resources is granted per-session. Trust is scoped to a specific session on a specific resource, not to a network segment. Access to the time-tracking app does not inherit access to the payroll API.

  4. Access to resources is determined by dynamic policy. Policy incorporates device health, user identity, behavioural signals, and environmental context. The same credential evaluates differently from a managed device at a known location versus an unmanaged device in an unfamiliar jurisdiction.

  5. The enterprise monitors and measures the integrity and security posture of all owned assets. Continuous posture assessment, not point-in-time. A device compromised at 10am should not continue to have 9am tokens accepted.

  6. Authentication and authorisation are dynamic and strictly enforced before access is allowed. No cached trust. Every request goes through policy evaluation.

  7. The enterprise collects information about the current state of assets and communications to improve its security posture. The policy engine is fed by telemetry. Dead sensors mean degraded decisions.
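Tenets 5 and 6 meet in token handling: a token is only as trustworthy as the posture it was issued against. The following is a minimal Python sketch, assuming a hypothetical inventory record that stores whether each device is compliant and when its posture last changed; real signals would come from MDM/EDR.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Token:
    subject: str
    device_id: str
    issued_at: datetime


@dataclass
class DevicePosture:  # hypothetical inventory record
    compliant: bool
    last_changed: datetime  # when this device's posture last changed state


def accept_token(token: Token, inventory: dict[str, DevicePosture]) -> bool:
    """Evaluated on every request: nothing is cached from issuance time."""
    record = inventory.get(token.device_id)
    if record is None or not record.compliant:
        return False  # unknown or currently non-compliant device
    # A token minted before the latest posture change was issued against
    # state that no longer holds, so the caller must re-authenticate.
    return token.issued_at >= record.last_changed


def utc(hour: int) -> datetime:
    return datetime(2024, 5, 1, hour, tzinfo=timezone.utc)


# Device flagged non-compliant at 10:00; token issued at 09:00.
inventory = {"laptop-42": DevicePosture(compliant=False, last_changed=utc(10))}
print(accept_token(Token("alice", "laptop-42", issued_at=utc(9)), inventory))  # False
```

The same rule also rejects the 9am token after the device is remediated at 11am, because the posture change postdates issuance.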

BeyondCorp: The Reference Implementation

Google published the BeyondCorp architecture in a series of papers starting in 2014. By 2017 the majority of Google’s internal applications were accessible without VPN. The design is worth understanding because it solves concrete engineering problems that make zero trust hard to ship.

BeyondCorp rests on three interacting components:

Device inventory service. Every device that may access internal applications is enrolled in an inventory database storing hardware identity, OS version, patch level, disk encryption status, and managed status. Unenrolled devices cannot pass the access check regardless of the user’s credentials.

User and group database. Identity federated from the corporate directory. Group membership defines role-level access grants, but group membership alone is insufficient — the device posture check must also pass.

Access proxy. All application access flows through a reverse proxy that enforces policy by querying the device inventory and user database for every request. Applications sit behind the proxy and are not directly network-reachable from clients.

The key insight is the elimination of the network as a trust signal. Whether the request arrives from the corporate campus WiFi or a hotel abroad, the proxy applies the same policy. Network location is context — it can influence risk scores — but is never a sufficient condition for access.
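The proxy’s decision can be sketched as a pure function. In this Python sketch the record shapes, group names, and the risk score values are hypothetical. It shows that the source network may raise a risk score but no code path grants access because of it.

```python
from dataclasses import dataclass


@dataclass
class Device:
    enrolled: bool
    disk_encrypted: bool
    os_patched: bool


@dataclass
class Request:
    user: str
    device_id: str
    app: str
    source_network: str  # e.g. "corp-wifi" or "hotel": context only


def decide(req: Request,
           devices: dict[str, Device],
           groups: dict[str, set[str]],
           app_acl: dict[str, str]) -> tuple[bool, int]:
    """Return (allowed, risk_score); identical policy on every network."""
    # Location may raise the risk score passed downstream (e.g. to step-up
    # logic), but no branch below grants access because of it.
    risk = 0 if req.source_network == "corp-wifi" else 10
    device = devices.get(req.device_id)
    if device is None or not device.enrolled:
        return False, risk  # unenrolled devices never pass
    if not (device.disk_encrypted and device.os_patched):
        return False, risk
    required_group = app_acl.get(req.app)
    if required_group is None or required_group not in groups.get(req.user, set()):
        return False, risk  # group membership is required but not sufficient
    return True, risk
```

An enrolled, compliant laptop in a hotel is allowed with a higher risk score; the same user on an unenrolled device is denied even on campus WiFi.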

Google Cloud’s managed offering, BeyondCorp Enterprise, exposes this through Identity-Aware Proxy (IAP). HashiCorp Boundary, Cloudflare Access, Zscaler Private Access, and Palo Alto Prisma Access implement the same pattern: access proxy fronting internal resources, identity and device posture evaluated per request, no VPN.

The Five Pillars and Their Controls

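CISA’s Zero Trust Maturity Model organises implementation across five pillars; the DoD Zero Trust Reference Architecture uses seven, promoting visibility and analytics, and automation and orchestration, to pillars of their own. Each pillar has concrete control mappings.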

Identity

The anchor for all zero trust decisions. Controls:

  • OIDC/SAML federation from an authoritative IdP (Okta, Entra ID, Keycloak). Every human user authenticates through the IdP. No local accounts, no service passwords. See OAuth2 and OIDC hardening for securing the token pipeline.
  • Phishing-resistant MFA (FIDO2/WebAuthn passkeys or hardware keys). TOTP codes are insufficient for the identity pillar of a serious ZTA — they are phishable.
  • Session duration limits with continuous re-evaluation. Long-lived sessions are terminated or re-challenged when context changes: device posture degrades, the user travels to a flagged location, an alert fires on their account.
  • Privileged identity management. Just-in-time elevation with time-bounded sessions for privileged operations. No always-on privileged accounts.
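Just-in-time elevation reduces to grants that carry their own expiry. A minimal Python sketch, with a hypothetical role name and an in-memory store standing in for a real privileged identity management system:

```python
from datetime import datetime, timedelta, timezone


class JitGrants:
    """In-memory stand-in for a PIM store: every grant expires."""

    def __init__(self, max_duration: timedelta = timedelta(hours=1)):
        self.max_duration = max_duration
        self._grants: dict[tuple[str, str], datetime] = {}

    def elevate(self, user: str, role: str, duration: timedelta,
                now: datetime) -> datetime:
        # Clamp the requested duration so no elevation is effectively always-on.
        expires = now + min(duration, self.max_duration)
        self._grants[(user, role)] = expires
        return expires

    def has_role(self, user: str, role: str, now: datetime) -> bool:
        expires = self._grants.get((user, role))
        return expires is not None and now < expires


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
pim = JitGrants()
pim.elevate("alice", "db-admin", timedelta(hours=8), t0)  # clamped to 1 hour
```

Approval workflows and audit logging would wrap `elevate`; they are omitted to keep the expiry logic visible.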

Device

A valid user credential on a compromised or unmanaged device is a compromised credential.

  • Device posture checks at authentication time. At minimum: managed/enrolled status, OS patch level, disk encryption, endpoint agent running, no known-bad software.
  • Certificate-based device authentication. Managed devices carry a device certificate issued by the enterprise PKI, bound to the hardware via a TPM-backed private key. Verified alongside the user credential at the access proxy.
  • Continuous posture evaluation. Posture assessed at time-of-request, not just enrollment. Devices that fall out of compliance have sessions terminated or access downgraded. MDM/EDR integration provides the real-time signal.
  • Unenrolled device access paths. Define explicitly what unmanaged devices may access — often nothing internal, or a narrow isolated portal. No implicit fallback.
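The posture checks and the explicit unmanaged path fit into one evaluation that returns an access tier rather than a boolean. This Python sketch assumes hypothetical posture fields and a hypothetical minimum OS build number; in practice the values would come from MDM/EDR integration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    FULL = "full"          # managed and compliant
    ISOLATED = "isolated"  # narrow isolated portal only
    DENY = "deny"


@dataclass
class Posture:
    managed: bool
    os_build: int
    disk_encrypted: bool
    edr_running: bool
    known_bad_software: bool


MIN_OS_BUILD = 22631  # hypothetical patch-level floor


def access_tier(p: Optional[Posture]) -> Tier:
    if p is None or not p.managed:
        return Tier.ISOLATED  # explicit unmanaged path, never a silent fallback
    if p.known_bad_software:
        return Tier.DENY
    compliant = p.os_build >= MIN_OS_BUILD and p.disk_encrypted and p.edr_running
    # Falling out of compliance downgrades access rather than keeping FULL.
    return Tier.FULL if compliant else Tier.ISOLATED
```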

Network

Network controls are not the primary enforcement mechanism in zero trust — that role belongs to identity and device — but they are a required defence-in-depth layer.

  • Microsegmentation replaces flat internal networks. Services communicate only on explicitly declared paths. See microsegmentation controls for implementation patterns.
  • East-west traffic encryption. All service-to-service traffic is encrypted in transit. A service mesh with mutual TLS is standard at the workload layer.
  • Software-defined perimeter. The network boundary is defined by policy, not physical topology. Access proxies and ZTNA gateways replace static firewall rules.

Application (Workload)

Applications and services are resources. Workloads need machine identity just as users need human identity.

  • SPIFFE workload identity. Each workload carries a SPIFFE ID (spiffe://trust-domain/path) backed by a short-lived X.509 certificate (SVID). SPIRE issues the certificate based on workload attestation (pod UID, container image, namespace), not a shared secret. SVIDs are renewed automatically before they expire; SPIRE’s default X.509-SVID TTL is one hour.
  • mTLS between services. Mutual TLS using SVIDs means both sides of a service call prove their identity. A service that has not been issued an SVID cannot successfully establish a connection, regardless of what network path it uses.
  • Per-route authorisation. Service identity (who is calling) is a necessary but insufficient condition. Authorisation policy specifies which identities may call which endpoints. A SPIFFE ID for the inventory service does not grant it access to the payments endpoint.
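Because a SPIFFE ID is a URI, identity checks start with strict parsing, and authorisation is a separate lookup. This Python sketch uses only the standard library; the trust domain and route table are hypothetical, and the validation covers only a subset of the SPIFFE ID rules (scheme, non-empty trust domain, no port, userinfo, query, or fragment).

```python
from urllib.parse import urlsplit

TRUST_DOMAIN = "prod.example.com"  # hypothetical

# Per-route authorisation: a valid identity alone is not enough.
ROUTES = {
    "/api/payments": {"spiffe://prod.example.com/ns/payments/sa/processor"},
    "/api/inventory": {"spiffe://prod.example.com/ns/inventory/sa/api"},
}


def parse_spiffe_id(uri: str) -> tuple[str, str]:
    """Return (trust_domain, path); raise ValueError on a malformed ID."""
    parts = urlsplit(uri)
    domain = parts.netloc
    if parts.scheme != "spiffe" or not domain:
        raise ValueError(f"not a SPIFFE ID: {uri!r}")
    if parts.query or parts.fragment or "@" in domain or ":" in domain:
        raise ValueError(f"SPIFFE IDs carry no port, userinfo, query or fragment: {uri!r}")
    return domain, parts.path


def authorised(caller: str, route: str) -> bool:
    try:
        domain, _ = parse_spiffe_id(caller)
    except ValueError:
        return False
    return domain == TRUST_DOMAIN and caller in ROUTES.get(route, set())
```

With this table the inventory service’s ID parses cleanly yet is refused on `/api/payments`.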

Data

Data classification drives which controls apply to which flows.

  • Data classification schema. At minimum: public, internal, confidential, restricted. Classification attached to data repositories, APIs, and egress paths.
  • DLP controls at egress points. Data loss prevention inspection on email gateways, cloud sync, and unmanaged upload paths, tuned to classification.
  • Encryption at rest keyed to sensitivity. Restricted data uses HSM-backed keys with access logging. Blanket at-rest encryption is a baseline, not a differentiator.
  • Data access logging. Who accessed which resource, when, from which identity and device — the audit trail that makes policy decisions reviewable.
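Classification only drives controls if it is machine-readable at the enforcement point. A minimal Python sketch, assuming the four-level schema above and hypothetical egress paths with a per-path ceiling; every decision is logged with identity and device, as the audit bullet requires.

```python
import logging
from enum import IntEnum

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("data-access")


class Classification(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical egress paths and the highest classification each may carry.
EGRESS_CEILING = {
    "corporate-email": Classification.CONFIDENTIAL,
    "approved-cloud-sync": Classification.INTERNAL,
    "unmanaged-upload": Classification.PUBLIC,
}


def egress_allowed(label: Classification, destination: str,
                   user: str, device: str) -> bool:
    # Unknown destinations are treated as public-only.
    ceiling = EGRESS_CEILING.get(destination, Classification.PUBLIC)
    allowed = label <= ceiling
    log.info("user=%s device=%s dest=%s label=%s allowed=%s",
             user, device, destination, label.name, allowed)
    return allowed
```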

The PDP/PEP Architecture

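NIST 800-207 splits the zero trust policy machinery into logical components: a policy engine and a policy administrator, which together form the policy decision point, and a policy enforcement point.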

Policy Decision Point (PDP): Evaluates policy against the requesting identity, device posture signals, target resource, and environmental context. Produces permit or deny, optionally with conditions (require step-up auth, restrict to read-only scope). Inputs come from the identity provider, device inventory, threat intelligence feeds, and SIEM/UEBA.

Policy Enforcement Point (PEP): Intercepts the request, queries the PDP, and allows or blocks the request according to the decision. In practice: the access proxy for user-to-application access, the sidecar proxy (Envoy) or ingress controller for service-to-service access, the database proxy for data-layer access.

The PDP/PEP separation is the key architectural constraint: enforcement logic must not live in the application. Applications trust the PEP to have verified the caller and receive verified identity as a header or mTLS peer certificate. Policy changes take effect at the PEP without application redeployments, and enforcement is auditable in one place.
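The separation can be sketched in a few lines of Python: the PEP holds no policy of its own, asks the PDP on every request, honours conditions such as step-up, and hands the application a verified identity. The `Decision` shape, the header name `x-verified-identity`, and the toy policy are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

Context = dict[str, object]


@dataclass
class Decision:
    permit: bool
    conditions: set[str] = field(default_factory=set)  # e.g. {"step_up_mfa"}


def toy_pdp(ctx: Context) -> Decision:
    """Stand-in PDP: identity, posture and environment in, decision out."""
    if not ctx.get("device_compliant"):
        return Decision(False)
    if ctx.get("resource") == "payroll" and not ctx.get("recent_mfa"):
        return Decision(True, {"step_up_mfa"})
    return Decision(True)


class PEP:
    def __init__(self, pdp: Callable[[Context], Decision],
                 app: Callable[[dict[str, str]], str]):
        self.pdp, self.app = pdp, app

    def handle(self, ctx: Context) -> tuple[int, str]:
        decision = self.pdp(ctx)  # queried on every request, no cached trust
        if not decision.permit:
            return 403, "denied"
        if "step_up_mfa" in decision.conditions:
            return 401, "step-up authentication required"
        # The application never re-implements policy; it trusts this header.
        return 200, self.app({"x-verified-identity": str(ctx["user"])})


pep = PEP(toy_pdp, lambda headers: f"hello {headers['x-verified-identity']}")
```

Swapping `toy_pdp` for another policy function changes enforcement without touching the application callable.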

In Kubernetes: the PEP is the service mesh sidecar (Envoy/ztunnel) or gateway; the PDP is OPA or the cloud identity-aware proxy. Machine-to-machine policy lives in OPA; user-to-service policy in the access proxy rule engine.

Zero Trust for Kubernetes

Kubernetes clusters present a specific challenge: by default, all pods can reach all other pods across all namespaces. This is a maximally permissive flat network. Layering zero trust onto a Kubernetes cluster requires addressing four areas.

Network policy enforcement. Kubernetes NetworkPolicy resources take effect only with a CNI plugin that enforces them (Calico, Cilium, Antrea). Start by denying all traffic and permitting only declared paths. Cilium’s CiliumNetworkPolicy extends this to L7: a rule can permit only GET /api/* from frontend, so a request such as POST /admin/* is rejected even though the TCP path is open.
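NetworkPolicy semantics are easy to misread: a pod becomes isolated for ingress once any policy selects it, and its allowed traffic is the union of what the selecting policies permit. This Python sketch models only that rule, with hypothetical label-based policies; it ignores namespace selectors, ports, IP blocks, and egress.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    pod_selector: dict[str, str]            # {} selects every pod, as in default-deny
    allow_from: tuple[dict[str, str], ...]  # empty tuple allows nothing


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(src: dict[str, str], dst: dict[str, str],
                    policies: list[Policy]) -> bool:
    selecting = [p for p in policies if matches(p.pod_selector, dst)]
    if not selecting:
        return True  # not isolated: the Kubernetes default is allow-all
    # Isolated: allowed only if some selecting policy admits the source.
    return any(matches(peer, src) for p in selecting for peer in p.allow_from)


default_deny = Policy(pod_selector={}, allow_from=())
web_to_api = Policy(pod_selector={"app": "api"}, allow_from=({"app": "web"},))
```

With both policies applied, web can reach api, while batch cannot reach api and web cannot reach db.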

Workload identity via SPIFFE. Deploy SPIRE into the cluster. Every workload gets a SPIFFE ID anchored to its service account and namespace. The SPIRE agent, deployed as a DaemonSet, proves its own node’s identity with a Kubernetes node attestor (such as k8s_psat), attests individual pods with the Kubernetes workload attestor, and delivers SVIDs via the Workload API (a Unix socket), rotating them without restarts. See SPIFFE/SPIRE workload identity for full deployment details.
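Rotation without restarts works because the workload fetches a fresh SVID before the current one expires. This Python sketch shows the timing arithmetic under an assumed renew-at-half-lifetime rule and a one-hour TTL; it illustrates the idea rather than SPIRE’s exact internal scheduling.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime,
                 fraction: float = 0.5) -> datetime:
    """When to fetch a fresh SVID: after `fraction` of its lifetime (assumed 0.5)."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be in (0, 1)")
    return not_before + (not_after - not_before) * fraction


def needs_renewal(not_before: datetime, not_after: datetime, now: datetime) -> bool:
    return now >= renewal_time(not_before, not_after)


issued = datetime(2024, 1, 1, 12, tzinfo=timezone.utc)
expires = issued + timedelta(hours=1)  # assumed one-hour TTL
```

Renewing at half-life leaves the other half of the TTL as slack for a slow or briefly unavailable SPIRE server.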

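mTLS between services. A service mesh enforces mTLS with SPIFFE identities; Istio issues SPIFFE-format workload certificates from istiod by default and can be integrated with SPIRE. In Istio, PeerAuthentication in STRICT mode refuses connections without a valid client certificate. A pod without a valid certificate cannot connect regardless of network proximity.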

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT
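The sidecar handles this transparently, but the mechanism behind STRICT mode is a TLS server that requires a client certificate and then reads the caller’s SPIFFE ID from the URI SAN. A Python sketch using the standard `ssl` module; the certificate file paths are hypothetical, and `peer_spiffe_id` operates on the dictionary returned by `SSLSocket.getpeercert()`.

```python
import ssl
from typing import Optional


def strict_server_context(cert: str, key: str, bundle: str) -> ssl.SSLContext:
    """Server context that refuses clients without a certificate chaining to `bundle`."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=cert, keyfile=key)
    ctx.load_verify_locations(cafile=bundle)
    ctx.verify_mode = ssl.CERT_REQUIRED  # the "mutual" part: client must present a cert
    return ctx


def peer_spiffe_id(peercert: dict) -> Optional[str]:
    """Return the single spiffe:// URI SAN, or None if absent or ambiguous."""
    uris = [value for kind, value in peercert.get("subjectAltName", ())
            if kind == "URI" and value.startswith("spiffe://")]
    return uris[0] if len(uris) == 1 else None

# After the handshake: caller = peer_spiffe_id(conn.getpeercert())
```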

OPA for authorisation. OPA Gatekeeper enforces admission-time policy (no container may run as root; all images must come from the approved registry). At runtime, Envoy’s external authorisation filter queries an OPA sidecar per request. See Kyverno controller security for a policy-as-code alternative at the admission layer.

A minimal OPA policy enforcing SPIFFE-based authorisation for service-to-service calls:

package authz

import future.keywords.if
import future.keywords.in

default allow := false

# Permitted service-to-service calls
allowed_paths := {
  "spiffe://prod.example.com/ns/frontend/sa/web": {"/api/products", "/api/search"},
  "spiffe://prod.example.com/ns/payments/sa/processor": {"/api/orders"},
}

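allow if {
  caller := input.attributes.source.principal
  # Envoy's :path includes any query string; compare the path component only
  target_path := split(input.attributes.request.http.path, "?")[0]
  # Unknown callers leave allowed_paths[caller] undefined, so default deny holds
  target_path in allowed_paths[caller]
}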

The target cluster posture: all network paths explicitly declared, no implicit trust between namespaces, every workload holds a SPIFFE-issued certificate, every east-west call is mutually authenticated, OPA evaluates each call against a least-privilege access graph.

Phased Migration Plan

A production environment cannot switch to zero trust overnight. The migration must be staged so that each phase delivers measurable security value without breaking production.

Phase 1: Inventory and classify. Enumerate all services, APIs, data stores, and access paths. Classify each by sensitivity. Identify which services have no authentication on internal calls — these are the highest-priority targets. Identify all VPN users and what they access. This asset inventory is the prerequisite for every subsequent phase.

Phase 2: Enforce identity at the perimeter. Deploy an identity-aware access proxy in front of internal applications. Force all human access through the proxy. Implement phishing-resistant MFA. Begin logging device posture at authentication time even if you are not yet denying on posture. This immediately eliminates the “VPN credential plus flat access” pattern.

Phase 3: Enforce mTLS between services. Deploy a service mesh or SPIFFE/SPIRE. Start in permissive mode, which accepts both plaintext and mTLS, and use mesh telemetry to identify callers still sending plaintext without blocking them. Fix application configurations that hardcode IP addresses or bypass DNS. Flip namespaces to STRICT mTLS mode starting with the least critical services, targeting all namespaces within 60–90 days.
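Deciding which namespaces are safe to flip can be driven directly by that telemetry. A Python sketch over hypothetical connection records of the form (destination namespace, used mTLS); in Istio the equivalent signal is, for example, the `connection_security_policy` label on the standard metrics.

```python
from collections import Counter


def strict_ready(records: list[tuple[str, bool]]) -> tuple[list[str], dict[str, int]]:
    """Split namespaces into ready-for-STRICT and those still receiving plaintext.

    Each record is (destination_namespace, used_mtls) for one observed connection.
    """
    plaintext = Counter(ns for ns, used_mtls in records if not used_mtls)
    seen = {ns for ns, _ in records}
    ready = sorted(ns for ns in seen if ns not in plaintext)
    return ready, dict(plaintext)


observed = [("catalog", True), ("catalog", True), ("payments", True),
            ("payments", False), ("legacy", False), ("legacy", False)]
ready, blockers = strict_ready(observed)  # ["catalog"], {"payments": 1, "legacy": 2}
```

A namespace with no observed traffic appears in neither result; absence of evidence is not readiness.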

Phase 4: Enforce network microsegmentation. Implement deny-by-default network policy across namespaces. Start with a permissive-observe mode (Cilium policy audit mode, Calico staged policies) to log what would be denied without blocking. Use the traffic data to write permit rules for legitimate paths, then flip to enforcing. This sharply limits lateral movement, including from workloads that bypass the mTLS layer.
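Turning observe-mode flow logs into permit rules is mostly deduplication. This Python sketch assumes hypothetical flow records carrying source workload, destination workload, port, and a flag for flows the staged policy would have dropped; its output is a candidate allowlist for human review, not a policy to apply blindly.

```python
from collections import Counter
from typing import NamedTuple


class Flow(NamedTuple):
    src: str
    dst: str
    port: int
    would_deny: bool  # True if the staged/audit policy would have dropped it


def candidate_rules(flows: list[Flow], min_count: int = 1) -> list[tuple[str, str, int]]:
    """Unique (src, dst, port) paths the staged policy would break, most frequent first."""
    counts = Counter((f.src, f.dst, f.port) for f in flows if f.would_deny)
    return [path for path, n in counts.most_common() if n >= min_count]


flows = [Flow("web", "api", 8080, True), Flow("web", "api", 8080, True),
         Flow("batch", "db", 5432, True), Flow("web", "cache", 6379, False)]
```

Raising `min_count` filters one-off flows, which deserve investigation before they are permitted: a single unexpected connection is what lateral movement looks like.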

Phase 5: Replace VPN with ZTNA. Once applications sit behind the access proxy and services are protected by mTLS, the VPN has little remaining purpose. Move the remaining services behind the proxy and decommission the VPN for application access. Retain it only for break-glass infrastructure management, with a plan to eliminate that path too.

Phase 6: Eliminate implicit trust zones. Audit for remaining implicit trust: shared service accounts with broad permissions, API endpoints that accept requests from 10.0.0.0/8 without authentication, allowPrivilegeEscalation: true on containers, network policies with catch-all permits. Use the NIST 800-207 tenets as a checklist: for each resource, confirm access is per-session, policy is dynamic, and posture signals feed the PDP.
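Parts of this audit are mechanical. A Python sketch that scans pod and NetworkPolicy objects, as plain dicts shaped like the Kubernetes API objects (for example from `kubectl get -o json`), for two of the patterns above; it checks nothing else. An unset `allowPrivilegeEscalation` is flagged too, because only an explicit `false` prevents escalation.

```python
def escalation_findings(pods: list[dict]) -> list[str]:
    findings = []
    for pod in pods:
        name = pod["metadata"]["name"]
        spec = pod.get("spec", {})
        for c in spec.get("containers", []) + spec.get("initContainers", []):
            sc = c.get("securityContext") or {}
            # Unset behaves like true, so require an explicit false.
            if sc.get("allowPrivilegeEscalation") is not False:
                findings.append(f"pod/{name} container {c['name']}: privilege escalation allowed")
    return findings


def catch_all_findings(policies: list[dict]) -> list[str]:
    findings = []
    for policy in policies:
        name = policy["metadata"]["name"]
        for rule in policy.get("spec", {}).get("ingress") or []:
            peers = rule.get("from") or []
            if not peers:
                # An ingress rule without 'from' admits every source.
                findings.append(f"networkpolicy/{name}: ingress rule with no 'from' allows all sources")
            elif any(p.get("ipBlock", {}).get("cidr") == "0.0.0.0/0" for p in peers):
                findings.append(f"networkpolicy/{name}: ingress from 0.0.0.0/0")
    return findings
```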

Common Pitfalls

“We have MFA, therefore we have zero trust.” MFA satisfies part of the identity pillar. It does not address device posture, session-level authorisation, workload identity, network segmentation, or data classification. It is table stakes, not a destination.

Relying entirely on network-layer controls. Microsegmentation and firewall rules are brittle. Rules accumulate, exceptions get carved out, and the documented topology diverges from reality. A service with a valid SPIFFE certificate that is not authorised to call the payments API must be denied by L7 policy at the PEP, regardless of whether a network path happens to exist.

Incomplete device coverage. A ZTA covering 80% of devices has a substantial gap. Define the policy for unmanaged devices explicitly — either they access nothing, or they reach a narrow sandboxed set of resources through an isolated path. Do not let them silently inherit managed-device access paths.

Treating zero trust as a product. Buying a ZTNA gateway implements one component — the PEP — at one layer. Device posture integration, identity federation, service-to-service authentication, data classification, and the policy decision engine all still require deliberate design across all five pillars.

Skipping the inventory phase. Organisations that attempt enforcement before mapping their assets block unknown services, carve out exceptions under pressure, and watch those exceptions outlive the incident. The inventory is the foundation, not optional pre-work.

Measuring Progress

Zero trust is not a binary state. Track these metrics quarterly:

  • % of user-to-application sessions flowing through the access proxy (target: 100%)
  • % of east-west service calls authenticated with mTLS (target: 100%)
  • % of namespaces with default-deny network policy enforced (target: 100%)
  • % of managed endpoints checking into the posture evaluation system in the last 24 hours (target: >95%)
  • Mean time to revoke access after a termination event (target: <1 hour)
  • % of privileged access granted via just-in-time elevation (target: 100%)
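Most of these are ratios over data you already collect; the revocation metric is the one most often computed loosely. A Python sketch with hypothetical event pairs (termination time from HR, time access was confirmed revoked):

```python
from datetime import datetime, timedelta
from statistics import mean


def pct(numerator: int, denominator: int) -> float:
    """Coverage as a percentage; an empty denominator reports 0, not 100."""
    return 100.0 * numerator / denominator if denominator else 0.0


def mean_time_to_revoke(events: list[tuple[datetime, datetime]]) -> timedelta:
    """Each event is (termination_time, access_revoked_time)."""
    if not events:
        raise ValueError("no termination events in period")
    seconds = mean((revoked - terminated).total_seconds()
                   for terminated, revoked in events)
    return timedelta(seconds=seconds)


t = datetime(2024, 3, 1, 9, 0)
events = [(t, t + timedelta(minutes=20)), (t, t + timedelta(minutes=40))]
print(mean_time_to_revoke(events))  # 0:30:00
```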

The direction matters as much as the absolute value. A team that moves from 30% mTLS coverage to 70% in two quarters is making genuine progress. Reaching 100% on these metrics means you have built the enforcement infrastructure. Maintaining it — as services are added, devices turn over, access patterns change — is the ongoing operational work of a zero trust architecture.