Zero Trust Network Access with WireGuard: Replacing VPN with Per-Resource Tunnels

VPN vs ZTNA: The Access Model Shift

Traditional VPN is network-centric. A user authenticates once — credentials, certificate, or both — and the VPN concentrator inserts them into an internal IP subnet. From that point the user’s device has Layer-3 reachability to everything in that subnet until the session disconnects or expires. The access decision is binary and front-loaded. The concentrator has no knowledge of which application the user is accessing, which device they are using, or whether the device is compliant with policy. A contractor and a privileged engineer with the same VPN profile are indistinguishable at the network layer.

Zero Trust Network Access (ZTNA) is resource-centric. The access decision happens per connection, per resource, after evaluating user identity, device posture, and context. The tunnel is narrow by definition — it carries exactly the traffic the policy permits to exactly the destination allowed. There is no IP subnet membership; there is no implicit trust after authentication. Each time a client connects to a new resource, the control plane re-evaluates policy.
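A minimal Python sketch of a per-connection decision makes the contrast concrete. It assumes a hypothetical policy table keyed by host:port and reduces device posture to a single boolean. Real ZTNA brokers evaluate far richer context, such as device certificates, location, and risk signals. The point is structural: every connection attempt calls decide() afresh, so a posture failure or a policy change takes effect on the next connection rather than at the next VPN login.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """One connection attempt (hypothetical fields, not a product schema)."""
    user: str
    groups: frozenset[str]
    device_compliant: bool  # e.g. MDM-enrolled, disk encrypted, OS patched
    resource: str           # "host:port"


# Hypothetical policy: resource -> groups allowed to reach it.
POLICY: dict[str, set[str]] = {
    "db-primary:5432": {"sre"},
    "grafana:443": {"sre", "engineers"},
}


def decide(req: AccessRequest) -> bool:
    """Evaluate a single connection; nothing carries over from earlier decisions."""
    allowed_groups = POLICY.get(req.resource)
    if allowed_groups is None:
        return False  # unknown resource: default deny
    if not req.device_compliant:
        return False  # posture failure overrides identity
    return bool(req.groups & allowed_groups)
```

A VPN concentrator, by contrast, makes one equivalent check when the tunnel comes up and none afterwards.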

The concrete differences:

Dimension	VPN	ZTNA
Access scope	Subnet — everything in the network	Per-resource — specific host:port pairs
Trust decision	Once at tunnel establishment	Per-connection or per-session
Device posture	Unknown or checked at connect only	Evaluated continuously or at each resource
Lateral movement on compromise	Trivial — attacker has subnet access	Contained — attacker has only the policy-permitted resources
Audit granularity	“User connected VPN”	“User accessed resource X at time T from device Y”
Protocol	Layer 3	Varies; WireGuard, HTTPS proxy, or both

WireGuard makes ZTNA operationally practical. Its cryptokey routing model maps cleanly to per-resource access. Each peer is identified by a public key, and the AllowedIPs field of that peer's configuration bounds two things: which source addresses are accepted from the peer and which destinations are routed to it. The control plane (Tailscale, Headscale, or a custom broker) becomes the policy enforcement point. It decides which public keys appear in each node's configuration and with which AllowedIPs, both at provisioning time and whenever policy changes.

For the broader architectural context of why network-location-based trust fails, see Zero Trust Architecture Principles.

WireGuard Fundamentals

WireGuard is a Layer-3 VPN protocol implemented as a Linux kernel module (mainline since 5.6) and a cross-platform userspace implementation (wireguard-go). The protocol surface is deliberately minimal: Curve25519 for key exchange, ChaCha20-Poly1305 for encryption, BLAKE2s for hashing, SipHash for hashtable keys. There is no cipher negotiation and no protocol versioning — reducing the attack surface to the point where formal verification is tractable.

Cryptokey routing is WireGuard’s core routing mechanism. For each peer, the interface maintains a mapping from the peer’s public key to a set of AllowedIPs prefixes. When a packet arrives from a peer, it is decrypted with the session keys established in that peer’s most recent handshake, a Noise protocol exchange that mixes static and ephemeral Curve25519 keys. If the decrypted packet’s source IP falls within that peer’s AllowedIPs, it is accepted; otherwise it is dropped. On egress, the packet’s destination IP is matched against every peer’s AllowedIPs (longest prefix wins) to select the peer it is encrypted to and forwarded toward. The operating system’s routing table only has to deliver packets to the WireGuard interface. Beyond that, this lookup is the entire routing model: no certificates and no session state beyond the handshake.
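The lookup half of cryptokey routing can be modeled in a few lines of Python using the standard ipaddress module. This is a simplified sketch. Peer keys are placeholder strings, the cryptography is omitted entirely, and a linear scan stands in for the kernel’s prefix trie.

```python
import ipaddress
from typing import Optional

# Simplified cryptokey routing table: peer public key -> AllowedIPs.
# Keys are placeholder strings, not real base64 Curve25519 public keys.
PEERS = {
    "peer-b-pubkey": [ipaddress.ip_network("100.64.0.2/32"), ipaddress.ip_network("10.0.1.0/24")],
    "peer-c-pubkey": [ipaddress.ip_network("100.64.0.3/32"), ipaddress.ip_network("10.0.0.0/16")],
}


def egress_peer(dst: str) -> Optional[str]:
    """Choose the peer to encrypt to: longest-prefix match of dst across all AllowedIPs."""
    addr = ipaddress.ip_address(dst)
    best, best_len = None, -1
    for key, nets in PEERS.items():
        for net in nets:
            if addr in net and net.prefixlen > best_len:
                best, best_len = key, net.prefixlen
    return best  # None means no peer claims dst, so the packet is dropped


def ingress_ok(peer_key: str, src: str) -> bool:
    """After decrypting a packet from peer_key, keep it only if src is in that peer's AllowedIPs."""
    addr = ipaddress.ip_address(src)
    return any(addr in net for net in PEERS.get(peer_key, []))
```

Overlapping prefixes of different lengths are legal. That is why 10.0.1.7 routes to peer-b (its /24 beats peer-c’s /16) while 10.0.2.7 routes to peer-c, and why a packet from peer-b claiming source 10.0.2.7 is rejected.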

Kernel module vs userspace: For raw WireGuard, the kernel module (wireguard.ko) is generally the faster option, because encryption and decryption happen in kernel context without copying packets to and from userspace. wireguard-go is the userspace implementation, used where no kernel implementation is available (notably macOS and iOS). Tailscale embeds wireguard-go on every platform, including Linux. There it drives a kernel TUN device, or gVisor’s netstack in userspace-networking mode, and it does not use wireguard.ko. For production Linux servers running raw WireGuard, prefer the kernel module.

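# Verify the kernel module is loaded (no output if WireGuard is built into the kernel rather than loaded as a module)
lsmod | grep wireguard
# wireguard              94208  0

# If not loaded, load it now and on every boot:
sudo modprobe wireguard
echo wireguard | sudo tee /etc/modules-load.d/wireguard.conf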

A minimal peer configuration:

# /etc/wireguard/wg0.conf
[Interface]
PrivateKey = <base64-private-key>
Address = 100.64.0.1/32
ListenPort = 51820

[Peer]
PublicKey = <peer-b-public-key>
AllowedIPs = 100.64.0.2/32, 10.0.1.0/24
Endpoint = peer-b.example.com:51820
PersistentKeepalive = 25

The line AllowedIPs = 100.64.0.2/32, 10.0.1.0/24 is a policy statement that applies in both directions. Packets arriving from peer-B are accepted only if their source address is 100.64.0.2 or falls within 10.0.1.0/24. Packets this host sends to either prefix are encrypted to peer-B. Narrowing AllowedIPs is the primitive that enables per-resource access at the WireGuard layer.
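A control plane can apply that primitive on the side the client cannot tamper with. The resource host’s configuration contains a [Peer] entry only for clients the policy permits, each pinned to its single tunnel address. The Python sketch below renders those entries from a hypothetical client inventory and policy. All names, keys, and addresses are placeholders.

```python
# Hypothetical inventory: client name -> (public key placeholder, tunnel address).
CLIENTS = {
    "alice-laptop": ("ALICE_PUBKEY", "100.64.0.2"),
    "carol-laptop": ("CAROL_PUBKEY", "100.64.0.3"),
}

# Hypothetical policy: resource host -> clients allowed to connect to it.
POLICY = {
    "db-primary": ["alice-laptop"],
    "grafana": ["alice-laptop", "carol-laptop"],
}


def render_peers(resource: str) -> str:
    """Render [Peer] sections for a resource host: permitted clients only, each pinned to a /32."""
    blocks = []
    for name in sorted(POLICY.get(resource, [])):
        pubkey, addr = CLIENTS[name]
        blocks.append(
            "\n".join([
                "[Peer]",
                f"# {name}",
                f"PublicKey = {pubkey}",
                f"AllowedIPs = {addr}/32",
            ])
        )
    return "\n\n".join(blocks)
```

A client missing from the policy has no [Peer] entry on the resource host, so its handshake is rejected and no tunnel forms, whatever it puts in its own AllowedIPs. Port-level restrictions still need a packet filter on the host.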

Tailscale Architecture

Tailscale is a managed control plane layered over WireGuard. It solves the operational problems that raw WireGuard leaves to the operator: key distribution, NAT traversal, peer discovery, and access policy.

Coordination server: Every Tailscale node maintains a long-lived connection to controlplane.tailscale.com. The coordination server distributes public keys and endpoint addresses to all nodes in the tailnet. When node A wants to connect to node B, the coordination server provides B’s current public key and its candidate endpoints. The actual WireGuard traffic never traverses the coordination server — it negotiates the data path and gets out of the way.

DERP relays (Designated Encrypted Relay for Packets): Not all peers can establish direct UDP paths. Symmetric NAT, enterprise firewalls, and carrier-grade NAT all block direct peer-to-peer UDP. Tailscale maintains a global network of DERP servers. When a direct path cannot be established, WireGuard traffic is relayed through a DERP server — still end-to-end encrypted (the relay sees only ciphertext), but with added latency. Tailscale continuously attempts to upgrade relayed connections to direct P2P paths using techniques similar to ICE (Interactive Connectivity Establishment).

Direct P2P: When both nodes can exchange UDP packets directly, Tailscale establishes a direct WireGuard tunnel without a relay. This is the fast path, adding little latency beyond that of the underlying network. Most Linux-to-Linux connections within the same cloud provider or VPC achieve direct paths. Cross-NAT connections often start on DERP and upgrade to a direct path once NAT traversal succeeds.

ACL tags: Tags are the ZTNA primitive in Tailscale. A tag (tag:production, tag:ci-runner) is applied to a device, decoupling its access rights from the user who authenticated it. Servers and service accounts use tags; human users use group membership via SSO. Tags are owned by users specified in tagOwners; only those users can assign the tag to a device. This prevents privilege escalation via tag self-assignment.
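The tagOwners rule amounts to a membership check. A minimal Python sketch follows. It assumes a tagOwners map shaped like the policy files later in this article and uses a simplified group expansion; the real policy engine accepts more owner forms than plain users and groups.

```python
# Hypothetical tagOwners and groups, shaped like the policy file sections of the same names.
TAG_OWNERS = {
    "tag:production": ["admin@corp.example.com", "group:sre"],
    "tag:ci-runner": ["admin@corp.example.com"],
}
GROUPS = {
    "group:sre": ["alice@corp.example.com", "bob@corp.example.com"],
}


def may_assign(user: str, tag: str) -> bool:
    """True if user owns tag directly or through a group listed in tagOwners."""
    for owner in TAG_OWNERS.get(tag, []):
        if owner == user:
            return True
        if owner.startswith("group:") and user in GROUPS.get(owner, []):
            return True
    return False  # undeclared tag or non-owner: the assignment is refused
```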

# Install Tailscale
curl -fsSL https://tailscale.com/install.sh | sh

# Join a tailnet with a tag (for a server — no interactive login)
sudo tailscale up \
  --auth-key=tskey-auth-k... \
  --advertise-tags=tag:production \
  --hostname=db-primary-01

Headscale: Self-Hosted Control Plane

Headscale is an open-source reimplementation of the Tailscale coordination protocol. Standard Tailscale clients (tailscaled) are pointed at a Headscale server instead of controlplane.tailscale.com by passing --login-server=https://headscale.corp.example.com to tailscale up. The data plane is the same WireGuard; only the control plane moves on-premises.

Deployment:

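# Pull and run Headscale, publishing ports on localhost only: a TLS-terminating
# reverse proxy on :443 fronts port 8080, and metrics stay off the public interface
docker run -d \
  --name headscale \
  -v /etc/headscale:/etc/headscale \
  -v /var/lib/headscale:/var/lib/headscale \
  -p 127.0.0.1:8080:8080 \
  -p 127.0.0.1:9090:9090 \
  ghcr.io/juanfont/headscale:0.24.3 \
  serve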

An excerpt of Headscale’s configuration file, /etc/headscale/config.yaml:

server_url: https://headscale.corp.example.com
listen_addr: 0.0.0.0:8080
metrics_listen_addr: 0.0.0.0:9090

# TLS: terminate at a reverse proxy (nginx/caddy) in front for production
tls_letsencrypt_hostname: ""

# Database: SQLite is recommended; PostgreSQL is supported mainly for legacy setups
database:
  type: sqlite
  sqlite:
    path: /var/lib/headscale/headscale.db

# OIDC: users sign in through your IdP
oidc:
  issuer: https://accounts.google.com
  client_id: "headscale-client-id"
  client_secret_path: /run/secrets/headscale-oidc-secret
  scope: ["openid", "profile", "email"]
  extra_params:
    hd: "corp.example.com"   # Google account-chooser hint; not an access control
  allowed_domains:
    - corp.example.com       # enforced by Headscale at login

# ACL policy file
policy:
  mode: file
  path: /etc/headscale/acls.hujson

# DERP: use Tailscale's public DERP network or run your own
derp:
  server:
    enabled: false
  urls:
    - https://controlplane.tailscale.com/derpmap/default

User and key management via CLI:

# Create a user (maps to a Tailscale "tailnet user")
headscale users create platform-team

# Create a reusable pre-auth key for a server
headscale preauthkeys create \
  --user platform-team \
  --reusable \
  --expiration 24h

# Create a one-time ephemeral key (for CI runners or auto-scaling nodes)
headscale preauthkeys create \
  --user ci \
  --ephemeral \
  --expiration 2h

# List nodes
headscale nodes list

# Assign a tag to a node
headscale nodes tag --identifier 7 --tags tag:production

ACL policy in Headscale uses the same HuJSON format as Tailscale, although Headscale implements only a subset of Tailscale’s policy features:

// /etc/headscale/acls.hujson
{
  "tagOwners": {
    "tag:production": ["admin@corp.example.com"],
    "tag:staging":    ["admin@corp.example.com"],
    "tag:ci-runner":  ["admin@corp.example.com"]
  },

  "groups": {
    "group:sre":       ["alice@corp.example.com", "bob@corp.example.com"],
    "group:engineers": ["carol@corp.example.com", "dave@corp.example.com"]
  },

  "acls": [
    // SRE: SSH to production; Postgres from production app tier
    {
      "action": "accept",
      "src":    ["group:sre"],
      "dst":    ["tag:production:22"]
    },
    {
      "action": "accept",
      "src":    ["tag:production"],
      "dst":    ["tag:production:5432"]
    },
    // Engineers: SSH to staging; HTTPS to internal tooling
    {
      "action": "accept",
      "src":    ["group:engineers"],
      "dst":    ["tag:staging:22", "tag:staging:443"]
    },
    // CI runners: reach staging on 443 and 8080 only
    {
      "action": "accept",
      "src":    ["tag:ci-runner"],
      "dst":    ["tag:staging:443", "tag:staging:8080"]
    }
    // Implicit deny-all for everything not matched above
  ]
}

Tailscale ACLs have only an accept action. Rules are combined as a union, their order does not matter, and any connection that no rule permits is denied, so the default posture is deny. For a detailed treatment of identity-aware proxy enforcement patterns that complement this ACL model, see Identity-Aware Proxy Security.
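With only accept rules, evaluation amounts to searching for any rule that matches and denying when none does. The Python sketch below models that. It assumes a hypothetical inventory that maps each device to the identities it carries (its tags, or its owner and groups). It handles single ports, ranges, comma-separated lists, and *, but ignores autogroups, host aliases, IP destinations, and protocol matching.

```python
# Hypothetical inventory: device -> identities it carries (tags, or owner and groups).
DEVICES = {
    "db-primary-01": {"tag:production"},
    "stg-web-01": {"tag:staging"},
    "alice-laptop": {"alice@corp.example.com", "group:sre"},
    "carol-laptop": {"carol@corp.example.com", "group:engineers"},
}

# Accept rules in the same shape as the policy file.
ACLS = [
    {"action": "accept", "src": ["group:sre"], "dst": ["tag:production:22"]},
    {"action": "accept", "src": ["group:engineers"], "dst": ["tag:staging:22", "tag:staging:443"]},
    {"action": "accept", "src": ["tag:prometheus"], "dst": ["tag:production:9090-9100"]},
]


def port_matches(spec: str, port: int) -> bool:
    """Match "*", a single port, a range "lo-hi", or a comma-separated list of those."""
    for part in spec.split(","):
        if part == "*":
            return True
        lo, _, hi = part.partition("-")
        if int(lo) <= port <= int(hi or lo):
            return True
    return False


def allowed(src_dev: str, dst_dev: str, port: int) -> bool:
    """Any matching accept rule permits; no match denies."""
    src_ids, dst_ids = DEVICES[src_dev], DEVICES[dst_dev]
    for rule in ACLS:
        if not src_ids & set(rule["src"]):
            continue
        for dst in rule["dst"]:
            selector, _, ports = dst.rpartition(":")  # "tag:production:22" -> ("tag:production", "22")
            if selector in dst_ids and port_matches(ports, port):
                return True
    return False
```

Reversing ACLS leaves every decision unchanged; with accept-only rules, order cannot matter. In practice this means you cannot carve an exception out of a broad rule by placing a narrower rule above it. Narrow the broad rule instead.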

Tailscale ACL Policy: Precise Port-Level Access

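The ACL policy language supports both group membership and tag-based matching, with port-level granularity on the destination: a single port, a comma-separated list, a range such as 9090-9100, or * for all ports. The fragment below shows only the acls and ssh sections; every tag it references must also be declared in tagOwners.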

{
  "acls": [
    // Database access: only the app tier can reach Postgres; port-specific
    {
      "action": "accept",
      "src":    ["tag:app-server"],
      "dst":    ["tag:db-primary:5432", "tag:db-replica:5432"]
    },
    // Metrics scraping: Prometheus can reach production-tagged nodes on 9090-9100
    {
      "action": "accept",
      "src":    ["tag:prometheus"],
      "dst":    ["tag:production:9090-9100"]
    },
    // Developers: read-only port on staging DB; no production DB access
    {
      "action": "accept",
      "src":    ["group:engineers"],
      "dst":    ["tag:staging:5433"]
    },
    // Tailscale SSH: SRE to production, engineers to staging
    {
      "action": "accept",
      "src":    ["group:sre"],
      "dst":    ["tag:production:22"]
    },
    {
      "action": "accept",
      "src":    ["group:engineers"],
      "dst":    ["tag:staging:22"]
    }
  ],
  "ssh": [
    {
      "action": "accept",
      "src":    ["group:sre"],
      "dst":    ["tag:production"],
      "users":  ["autogroup:nonroot"]
    },
    {
      "action": "accept",
      "src":    ["group:engineers"],
      "dst":    ["tag:staging"],
      "users":  ["autogroup:nonroot"]
    }
  ]
}

ACLs are applied by the Tailscale/Headscale control plane when distributing peer configurations. A node only receives the public keys of peers it is allowed to communicate with in at least one direction. If no rule permits traffic between two nodes, neither learns that the other exists in the tailnet, because the other’s key is simply never distributed to it. Port-level rules are then enforced by a packet filter that the control plane pushes to each node. Enforcement therefore happens both at the control plane and on the destination node itself, not just at a firewall.
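The distribution rule can be sketched as a filter over a compiled policy. This Python sketch assumes the policy has already been reduced to hypothetical (source, destination) device pairs. Port checks belong to each node’s packet filter and are not modeled.

```python
from __future__ import annotations

# Hypothetical compiled policy: (source device, destination device) pairs permitted on some port.
ALLOWED_PAIRS = {
    ("alice-laptop", "db-primary-01"),
    ("app-01", "db-primary-01"),
    ("carol-laptop", "stg-web-01"),
}
DEVICES = ["alice-laptop", "carol-laptop", "app-01", "db-primary-01", "stg-web-01"]


def visible_peers(node: str) -> set[str]:
    """Peers whose keys node receives: anything it may reach, plus anything that may reach it."""
    return {
        other
        for other in DEVICES
        if other != node
        and ((node, other) in ALLOWED_PAIRS or (other, node) in ALLOWED_PAIRS)
    }
```

Here db-primary-01 learns about alice-laptop and app-01, because both may connect to it. It never learns about carol-laptop, which has no permitted path to it in either direction.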

Ephemeral Auth Keys for Servers and CI

Ephemeral keys authenticate nodes that should be cleaned up automatically. A CI runner, a container, or an autoscaled instance that spins up and down should not leave stale Tailscale nodes accumulating in the tailnet.

# Tailscale: create an ephemeral key via the admin API
curl -s https://api.tailscale.com/api/v2/tailnet/-/keys \
  -u "${TAILSCALE_API_KEY}:" \
  -d '{
    "capabilities": {
      "devices": {
        "create": {
          "reusable": false,
          "ephemeral": true,
          "preauthorized": true,
          "tags": ["tag:ci-runner"]
        }
      }
    },
    "expirySeconds": 3600
  }' | jq -r '.key'

The returned key is single-use. The CI job uses it to join the tailnet:

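# In CI pipeline (GitHub Actions syntax shown; GitLab CI and others are analogous)
- name: Connect to tailnet
  run: |
    curl -fsSL https://tailscale.com/install.sh | sh
    # No extra flag needed: the node is ephemeral because the key was created with "ephemeral": true
    sudo tailscale up \
      --auth-key="${EPHEMERAL_TAILSCALE_KEY}" \
      --hostname="ci-runner-${GITHUB_RUN_ID}"

- name: Leave tailnet
  if: always()
  run: sudo tailscale logout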

Because the key was created with ephemeral: true, the runner’s node is removed from the tailnet automatically once it goes offline. Running tailscale logout at the end of the job removes it immediately. No manual cleanup is required, and no stale nodes accumulate.

For Headscale, the equivalent:

headscale preauthkeys create \
  --user ci \
  --ephemeral \
  --expiration 1h \
  --output json | jq -r '.key'

For long-lived servers that should not require human interaction to re-authenticate, use reusable (non-ephemeral) pre-auth keys with tag constraints and a controlled expiry. Inject the key at provisioning time via a secrets manager (Vault, AWS Secrets Manager, GCP Secret Manager):

# Terraform provisioner pattern
sudo tailscale up \
  --auth-key="$(aws secretsmanager get-secret-value \
    --secret-id tailscale/prod-auth-key \
    --query SecretString \
    --output text)" \
  --advertise-tags=tag:production \
  --hostname="$(hostname)" \
  --ssh

Subnet Routers: Exposing On-Prem Resources

Not every on-prem device can run Tailscale or WireGuard. Network printers, legacy PLCs, on-prem databases, and IoT devices cannot install an agent. Subnet routers solve this: a single Tailscale node that runs on a machine within the on-prem subnet advertises that subnet to the tailnet, acting as a gateway.

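# Enable IPv4 and IPv6 forwarding on the subnet router host, now and across reboots
echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/99-tailscale.conf
echo "net.ipv6.conf.all.forwarding = 1" | sudo tee -a /etc/sysctl.d/99-tailscale.conf
sudo sysctl -p /etc/sysctl.d/99-tailscale.conf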

# Advertise the on-prem subnet
sudo tailscale up \
  --advertise-routes=192.168.10.0/24 \
  --auth-key="${AUTH_KEY}" \
  --advertise-tags=tag:subnet-router \
  --hostname=onprem-gateway-01

In the Tailscale admin console or via the API, approve the advertised route:

# Via Tailscale API
curl -s -X POST \
  "https://api.tailscale.com/api/v2/device/${DEVICE_ID}/routes" \
  -u "${TAILSCALE_API_KEY}:" \
  -d '{"routes": ["192.168.10.0/24"]}'

After approval, tailnet nodes whose ACL permits it can reach 192.168.10.0/24 through the subnet router. The ACL policy controls which tailnet members can use the subnet route:

{
  "acls": [
    // Network team: full access to on-prem subnet
    {
      "action": "accept",
      "src":    ["group:network-ops"],
      "dst":    ["192.168.10.0/24:*"]
    },
    // SRE: only the on-prem monitoring endpoint
    {
      "action": "accept",
      "src":    ["group:sre"],
      "dst":    ["192.168.10.50:9100"]
    }
  ]
}
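For IP-literal destinations, evaluation reduces to prefix containment plus a port check. This Python sketch hand-translates the two rules above into tuples and uses the standard ipaddress module. It does not parse HuJSON, port lists, or port ranges.

```python
import ipaddress

# The two rules above, hand-translated: (group, destination prefix, port or "*").
RULES = [
    ("group:network-ops", ipaddress.ip_network("192.168.10.0/24"), "*"),
    ("group:sre", ipaddress.ip_network("192.168.10.50/32"), "9100"),
]


def may_reach(groups: set, dst_ip: str, port: int) -> bool:
    """True if any rule for one of the caller's groups covers dst_ip and port."""
    addr = ipaddress.ip_address(dst_ip)
    return any(
        group in groups and addr in net and (ports == "*" or int(ports) == port)
        for group, net, ports in RULES
    )
```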

For high availability, run two subnet routers advertising the same route. Tailscale performs automatic failover between them.

Exit Nodes and Traffic Inspection Implications

An exit node routes all non-tailnet traffic through itself — the equivalent of a full-tunnel VPN concentrator, but under Tailscale ACL control. A tailnet node configured as an exit node forwards internet-bound traffic from clients that choose to use it.

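# Designate a node as an exit node (requires IP forwarding, as for a subnet router,
# and approval in the admin console or via autoApprovers)
sudo tailscale up \
  --advertise-exit-node \
  --auth-key="${AUTH_KEY}" \
  --advertise-tags=tag:exit-node \
  --hostname=exit-node-01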

From a client:

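# Use a specific exit node (tailscale set changes just this setting; re-running
# tailscale up would require repeating every non-default flag)
sudo tailscale set --exit-node=100.64.0.50

# Or by name
sudo tailscale set --exit-node=exit-node-01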

The security implications depend on whether the exit node is trusted by the organization. If your exit node is a corporate-managed VM with a TLS inspection proxy or an IDS, all egress traffic from remote employees who use it flows through it, providing visibility. If the exit node is a personal server or a cloud VM without inspection, it hides traffic from the local network (for example, public Wi-Fi) but gives the organization no visibility.

For corporate use cases with DLP or egress inspection requirements, the exit node should be:

A managed VM with a TLS inspection proxy (e.g., Squid with SSL Bump, mitmproxy, or a commercial SSE/CASB appliance).
Tagged tag:exit-node, with exit node usage restricted by ACL to specific groups (grant autogroup:internet:* only to those groups).
Logging DNS queries and TCP connections to a SIEM.

Exit node traffic does not bypass Tailscale ACLs for tailnet resources. A node using an exit node still cannot reach tailnet resources that the ACL forbids.

Hardening WireGuard and Tailscale Endpoints

Key Expiry

By default, Tailscale requires nodes to re-authenticate periodically (key expiry). Disabling key expiry turns a device’s access into a standing grant that lasts until someone removes the device from the tailnet. A decommissioned device therefore keeps its membership indefinitely if the admin forgets to remove it. Never disable key expiry on user devices. Note that Tailscale disables key expiry by default on devices authenticated with tags, which covers most servers. For those, either make device removal part of decommissioning, or re-enable expiry and re-authenticate from your provisioning pipeline with a fresh pre-auth key.

# Check key expiry for this node and every peer it can see
tailscale status --json | jq '(.Self, .Peer[]?) | {hostname: .HostName, keyExpiry: .KeyExpiry}'

# Headscale: list nodes with expiry info
headscale nodes list
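To turn the status output into something alertable, a short Python script can classify each node. This sketch assumes the tailscale status --json shape that the jq filter above reads: a Self object and a Peer map, each entry carrying HostName and an RFC 3339 KeyExpiry. It treats a missing KeyExpiry as expiry disabled, which is an assumption about the JSON output. load_status() shells out to the CLI and is not called automatically.

```python
from __future__ import annotations

import json
import re
import subprocess
from datetime import datetime, timedelta, timezone


def parse_rfc3339(raw: str) -> datetime:
    """Parse timestamps like 2025-06-01T12:00:00Z; fractional seconds are dropped."""
    return datetime.fromisoformat(re.sub(r"\.\d+", "", raw).replace("Z", "+00:00"))


def expiry_report(status: dict, now: datetime, warn_within: timedelta) -> dict[str, str]:
    """Classify this node and every visible peer by node key expiry."""
    nodes = [status.get("Self") or {}] + list((status.get("Peer") or {}).values())
    report = {}
    for node in nodes:
        name = node.get("HostName", "?")
        raw = node.get("KeyExpiry")
        if not raw:
            report[name] = "expiry disabled"  # assumption: field absent when expiry is off
            continue
        expiry = parse_rfc3339(raw)
        if expiry <= now:
            report[name] = "expired"
        elif expiry - now <= warn_within:
            report[name] = "expiring soon"
        else:
            report[name] = "ok"
    return report


def load_status() -> dict:
    """Run the Tailscale CLI and parse its JSON status (requires tailscale on PATH)."""
    out = subprocess.run(["tailscale", "status", "--json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)
```

For example, expiry_report(load_status(), datetime.now(timezone.utc), timedelta(days=14)) returns a host-to-state mapping that an alerting job can consume; entries marked "expiry disabled" are the ones to reconcile against your decommissioning records.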

Audit Logging

Tailscale’s network flow logs capture connection metadata (source, destination, bytes, start/end times). Enable log streaming to your SIEM:

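# Enable network flow log streaming via the Tailscale API
# (destinationType and credentials depend on your SIEM; SIEM_INGEST_TOKEN is a placeholder)
curl -s -X PUT \
  "https://api.tailscale.com/api/v2/tailnet/-/logging/network/stream" \
  -u "${TAILSCALE_API_KEY}:" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg token "${SIEM_INGEST_TOKEN}" '{
        destinationType: "panther",
        url: "https://logs.corp.example.com/tailscale",
        token: $token
      }')"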

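Headscale has no dedicated audit log. Because it runs only the control plane, it also never sees network flows. Configure structured JSON logging and ship Headscale’s stdout to your SIEM with a log forwarder such as Vector, Fluentd, or the OpenTelemetry Collector:

# headscale config
log:
  level: info
  format: json

# Collect the headscale container's stdout (docker logs) with Vector/Fluentd/OTel Collector for SIEM forwarding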

Monitor for:

Connections to resources that are not in the expected ACL set (ACL misconfiguration or policy drift).
High-volume data transfer from a single node (potential exfiltration).
Nodes connecting from unexpected geographic regions.
Authentication failures or repeated key generation events.
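The first two signals can be computed from exported flow records. The Python sketch below assumes the records have already been flattened into dicts with hypothetical keys (src_node, src_tag, dst_tag, dst_port, tx_bytes). Tailscale’s actual flow-log records will not match these keys exactly, so map their fields onto these before calling. The expected set would be derived from your ACL policy.

```python
from __future__ import annotations

from collections import defaultdict


def heavy_senders(records: list[dict], threshold_bytes: int) -> list[tuple[str, int]]:
    """Total bytes sent per source node in the window; return nodes over threshold, largest first."""
    totals: dict[str, int] = defaultdict(int)
    for rec in records:
        totals[rec["src_node"]] += rec["tx_bytes"]
    flagged = [(node, total) for node, total in totals.items() if total > threshold_bytes]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def unexpected_flows(records: list[dict], expected: set[tuple[str, str, int]]) -> list[dict]:
    """Flows whose (src_tag, dst_tag, dst_port) is absent from the expected set."""
    return [r for r in records if (r["src_tag"], r["dst_tag"], r["dst_port"]) not in expected]
```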

Tailscale SSH vs Traditional SSH

Tailscale SSH replaces traditional SSH host key verification and authorized_keys management with identity derived from the tailnet. When --ssh is passed to tailscale up, tailscaled intercepts SSH connections to port 22 on the node’s Tailscale address, verifies the connecting user’s tailnet identity, and applies the ssh ACL block before handing off to a shell.

# Enable Tailscale SSH on a server node
sudo tailscale up \
  --ssh \
  --advertise-tags=tag:production \
  --auth-key="${AUTH_KEY}"

Benefits over traditional SSH:

No authorized_keys files to manage or audit. Access is controlled entirely by the ACL policy.
SSH sessions are auditable via Tailscale’s session logging, including session recordings if configured.
No SSH user keys or certificates to rotate: the client is authenticated by its WireGuard node key, which the control plane manages.
User identity is cryptographically bound to tailnet membership, not to static key files on disk.

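The trade-off: Tailscale SSH requires tailscaled to be running on the node. If the daemon stops or the node loses its tailnet connectivity, SSH over Tailscale is unavailable. (A brief control-plane outage alone does not drop established tunnels, because nodes keep their last network map.) For production servers, keep traditional SSH on a non-standard port as a break-glass mechanism. Bind it to the server’s on-prem address rather than its tailnet interface, and accept connections only from the subnet router’s IP:

# /etc/ssh/sshd_config (break-glass config)
Port 2222
# Server's on-prem address; sshd does not listen on the tailnet interface
ListenAddress 192.168.10.1
# Only via the subnet router (192.168.10.2 is a hypothetical address); subnet routers SNAT tailnet traffic by default
AllowUsers breakglass-admin@192.168.10.2
PasswordAuthentication no
PubkeyAuthentication yes
AuthorizedKeysFile /etc/ssh/authorized_keys.d/%u
MaxAuthTries 3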

WireGuard Interface Hardening

On nodes running raw WireGuard (without Tailscale):

# Firewall rules: accept WireGuard UDP and return traffic; drop everything else inbound
nft add table inet wg-filter
nft add chain inet wg-filter input { type filter hook input priority 0 \; policy drop \; }
nft add rule inet wg-filter input ct state established,related accept
nft add rule inet wg-filter input iif lo accept
nft add rule inet wg-filter input udp dport 51820 accept
# Decrypted tunnel traffic also traverses this chain and would hit the drop policy;
# accept it here (or, better, only the service ports you expose over wg0)
nft add rule inet wg-filter input iifname wg0 accept

# On the WireGuard interface itself: re-check source addresses against the peer's AllowedIPs
# (Defense-in-depth: WireGuard's cryptokey routing already enforces this in the kernel)
nft add chain inet wg-filter wg0-input { type filter hook input priority -100 \; }
nft add rule inet wg-filter wg0-input iifname wg0 ip saddr != { 100.64.0.2, 10.0.1.0/24 } drop

Rotate private keys on a schedule. WireGuard renegotiates session keys automatically, with a fresh handshake every couple of minutes, but it never rotates a peer’s static private key. A compromised private key grants access until the matching public key is removed from every peer configuration.

#!/usr/bin/env bash
# Key rotation script: run via cron or configuration management
set -euo pipefail
umask 077  # key files are created owner-only from the start (no chmod race)

wg genkey | tee /etc/wireguard/private.key.new | wg pubkey > /etc/wireguard/public.key.new

# Distribute the new public key to all peers before cutting over
# ... (peer update step via Ansible/Salt/API) ...

# Atomically swap in the new key (mv within one filesystem is a rename)
mv /etc/wireguard/private.key.new /etc/wireguard/private.key
mv /etc/wireguard/public.key.new /etc/wireguard/public.key
wg set wg0 private-key /etc/wireguard/private.key
# If wg0.conf embeds PrivateKey inline, update it as well, or wg-quick restores the old key on restart
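The rotation script relies on wg pubkey, which applies the X25519 function from RFC 7748 to the private key and the base point u = 9. The pure-Python sketch below reproduces that derivation. It is handy for checking that a public key distributed to peers really matches the private key about to be swapped in. It is not constant-time, so treat it as a verification aid rather than something to run against production keys on shared hosts.

```python
import base64

P = 2**255 - 19
A24 = 121665  # (486662 - 2) / 4, per RFC 7748


def _clamp_scalar(k: bytes) -> int:
    """Decode a 32-byte scalar with RFC 7748 clamping."""
    b = bytearray(k)
    b[0] &= 248
    b[31] &= 127
    b[31] |= 64
    return int.from_bytes(b, "little")


def _decode_u(u: bytes) -> int:
    """Decode a u-coordinate, masking the unused top bit."""
    b = bytearray(u)
    b[31] &= 127
    return int.from_bytes(b, "little") % P


def x25519(k: bytes, u: bytes) -> bytes:
    """Montgomery ladder from RFC 7748, section 5 (not constant-time)."""
    scalar, x1 = _clamp_scalar(k), _decode_u(u)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (scalar >> t) & 1
        swap ^= bit
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def wg_pubkey(private_b64: str) -> str:
    """What `wg pubkey` computes: base64(X25519(private key, base point 9))."""
    private = base64.b64decode(private_b64)
    if len(private) != 32:
        raise ValueError("a WireGuard private key is 32 bytes")
    return base64.b64encode(x25519(private, (9).to_bytes(32, "little"))).decode()
```

Feeding wg_pubkey() the contents of private.key.new should yield exactly the string the script wrote to public.key.new.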

Hardening Checklist

[ ] ACL policy stored in version control; changes reviewed via pull request before applying
[ ] Default-deny enforced: no wildcard dst: ["*:*"] entries except for explicit admin grants
[ ] Ephemeral keys used for all CI runners and autoscaled compute; no stale nodes accumulate
[ ] Key expiry enabled on all user devices; servers use pre-auth keys with automated re-auth
[ ] Tailscale SSH enabled on production nodes; per-user authorized_keys removed from those nodes (break-glass account excepted)
[ ] Break-glass SSH configured on a non-standard port reachable only via subnet router
[ ] Subnet router routes approved explicitly; not set to auto-approve all advertised routes
[ ] Exit nodes tagged and ACL-restricted; not available to all tailnet users
[ ] Network flow logs streaming to SIEM; alerts on unexpected cross-tag connections
[ ] WireGuard interface (if using raw wg-quick) firewalled: inbound only on 51820/UDP
[ ] Private key rotation scheduled (annual minimum; immediate on device compromise)
[ ] Headscale (if self-hosted) running behind a TLS-terminating reverse proxy; metrics endpoint not publicly accessible
[ ] Device posture checks integrated: only MDM-enrolled devices can receive auth keys for production tags