Production Access Management with Teleport and Boundary: Brokered, Recorded, Auditable Access

Problem

Operator access to production hosts has long been a structural weakness:

Static SSH keys distributed by config management; key rotation rarely happens; departed engineers’ keys often persist.
Bastion hosts with shared accounts; “who logged in” requires correlating multiple logs.
VPN + direct SSH model gives broad network access on top of host access.
Database access via shared passwords in 1Password / Vault that everyone copies into their .psqlrc.
Kubernetes access via long-lived kubeconfigs distributed manually.

The pattern: operators need access; access becomes static; access drifts; access leaks. Each compromise of an operator’s laptop / credentials grants the attacker the same broad, persistent reach.

By 2026, brokered access management is the default. Teleport (originally built by Gravitational, which later renamed itself after the product) and HashiCorp Boundary are the two leading self-hostable options; commercial offerings like StrongDM, Tailscale SSH, and Cloudflare Access provide similar capabilities.

The architecture: a centralized broker sits between operators and production. Operators authenticate to the broker via SSO (OIDC); the broker issues a short-lived certificate or session; the operator uses it to access production. The broker records the session and enforces RBAC, and access lapses on its own when the certificate expires, with no separate revocation step.
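The expiry behaviour described above can be sketched in a few lines. This is a minimal illustration, not how Teleport or Boundary implement certificates: the `Credential` class and the `issue` and `is_valid` helpers are hypothetical names, and the 8-hour TTL is just an example value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Credential:
    """Hypothetical stand-in for a broker-issued certificate."""
    user: str
    roles: tuple
    not_after: datetime


def issue(user, roles, ttl, now):
    """Issue a credential that stops being valid `ttl` after `now`."""
    return Credential(user=user, roles=tuple(roles), not_after=now + ttl)


def is_valid(cred, now):
    """A credential is usable only strictly before its expiry time."""
    return now < cred.not_after


now = datetime(2026, 4, 29, 10, 0, tzinfo=timezone.utc)
cred = issue("alice", ["sre"], timedelta(hours=8), now)
print(is_valid(cred, now + timedelta(hours=1)))  # inside the TTL
print(is_valid(cred, now + timedelta(hours=9)))  # after the TTL
```

The point of the design is that nothing has to be cleaned up: a credential copied off a laptop stops working when `not_after` passes, whether or not anyone noticed the theft.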

The properties that matter:

Just-in-time access with short TTLs (typically 1-8 hours).
Session recording for SSH, kubectl exec, database queries.
Identity-bound to the SSO user, not a shared account.
RBAC at the broker, not per-host.
Audit centralized and structured.
No-VPN model — broker handles network access.

The specific gaps in pre-broker setups:

Operator SSH keys persist after employees leave.
Database access uses shared credentials; no per-user audit.
Kubernetes context switching is manual; everyone has every cluster’s kubeconfig.
“Quick fix” production access becomes permanent.
Compliance audits manually correlate “who did what” across many sources.

This article covers Teleport’s architecture, RBAC and approval workflow, session recording for SSH / k8s / databases, the migration from static SSH, and the operational integration with on-call / break-glass scenarios.

Target systems: Teleport 16+, HashiCorp Boundary 0.18+, StrongDM (commercial), Cloudflare Access; integrates with Okta, Microsoft Entra ID (formerly Azure AD), Google Workspace, and GitHub for SSO.

Threat Model

Adversary 1 — Stolen operator credential: an attacker has the operator’s laptop, SSH keys, or VPN cert. Wants to reach production.
Adversary 2 — Departed employee: still has access via legacy SSH keys not yet rotated.
Adversary 3 — Insider abuse: legitimate operator using their access for unauthorized actions, expecting no accountability.
Adversary 4 — Lateral movement: attacker with one host’s access tries to reach others on the production network.
Adversary 5 — Credential exfil from operator endpoint: malware on operator laptop reads SSH keys, browser cookies, Vault tokens.
Access level: Adversary 1 has operator endpoint compromise. Adversary 2 has historical credentials. Adversary 3 has legitimate access. Adversary 4 has one host. Adversary 5 has malware on endpoint.
Objective: Read or modify production data; pivot through production; act without leaving traceable footprint.
Blast radius: With static SSH: a stolen key reaches every host the user had access to, indefinitely. With brokered access: stolen credentials grant only what’s currently active (often nothing — sessions are short-lived); the broker enforces fresh authentication for each access.

Configuration

Step 1: Teleport Architecture

Teleport has three roles:

Auth Service: issues certificates; manages roles and users; integrates with SSO.
Proxy Service: the public-facing entry point; handles user-facing connections.
Agents: installed on each managed resource (server, k8s cluster, database). Connect outbound to the Proxy.

Operators connect to the Proxy via tsh (CLI) or a web UI; the Proxy routes to the Agent on the requested resource.

# Install Teleport on the auth + proxy server.
# TELEPORT_VERSION is a placeholder: set it to the exact release you intend to run.
curl https://cdn.teleport.dev/install.sh | bash -s ${TELEPORT_VERSION?}

# /etc/teleport.yaml on the central server.
version: v3
teleport:
  nodename: teleport-prod
  data_dir: /var/lib/teleport
  log:
    output: stdout
auth_service:
  enabled: yes
  cluster_name: prod.internal.example.com
  authentication:
    type: oidc
    connector_name: corp-sso   # hypothetical name; must match the connector below
proxy_service:
  enabled: yes
  public_addr: teleport.example.com:443
  https_keypairs:
    - cert_file: /etc/teleport/tls.crt
      key_file: /etc/teleport/tls.key

# oidc-connector.yaml: the SSO connector is its own resource, created with
# `tctl create -f oidc-connector.yaml` rather than embedded in teleport.yaml.
kind: oidc
version: v3
metadata:
  name: corp-sso
spec:
  issuer_url: https://login.example.com/
  client_id: teleport-prod
  client_secret: <client-secret>   # supply out of band; keep out of version control
  redirect_url: https://teleport.example.com/v1/webapi/oidc/callback
  claims_to_roles:
    - claim: groups
      value: sre-team
      roles: [sre]
    - claim: groups
      value: payments-team
      roles: [payments-developer]

Once SSO is connected, users authenticate through the existing identity provider, and Teleport issues a certificate scoped to the roles mapped from their claims.
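The claims_to_roles mapping in the configuration above amounts to a lookup from IdP group membership to role names. A minimal sketch of that lookup follows, assuming exact string matching only; `roles_for_claims` is a hypothetical helper, not a Teleport API.

```python
def roles_for_claims(claims, claims_to_roles):
    """Return the sorted set of roles granted by the matching mappings."""
    granted = set()
    for mapping in claims_to_roles:
        value = claims.get(mapping["claim"], [])
        # An IdP may send a claim as a single string or as a list of strings.
        values = [value] if isinstance(value, str) else list(value)
        if mapping["value"] in values:
            granted.update(mapping["roles"])
    return sorted(granted)


# Same mappings as the configuration above.
CLAIMS_TO_ROLES = [
    {"claim": "groups", "value": "sre-team", "roles": ["sre"]},
    {"claim": "groups", "value": "payments-team", "roles": ["payments-developer"]},
]

print(roles_for_claims({"groups": ["sre-team", "design"]}, CLAIMS_TO_ROLES))
print(roles_for_claims({"groups": ["design"]}, CLAIMS_TO_ROLES))
```

A user whose groups match no mapping ends up with no roles, which is the behaviour to want: removing someone from the IdP group removes their production access at the next login.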

Step 2: Role Definitions

# roles/sre.yaml
kind: role
version: v7
metadata:
  name: sre
spec:
  options:
    max_session_ttl: 8h
    forward_agent: false
    require_session_mfa: true   # MFA on every session; replays don't help
  allow:
    logins: [ec2-user, ubuntu]
    node_labels:
      'env': ['production', 'staging']
    kubernetes_labels:
      'cluster': ['*']
    db_labels:
      'env': ['production']
    db_users: ['readonly', 'breakglass']
    db_names: ['*']
    rules:
      - resources: [session]
        verbs: [list, read]
  deny:
    logins: [root]
    node_labels:
      'tag': ['hardened-prod']   # certain tagged hosts are off-limits even to SRE

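The role sketches a least-privilege shape: SREs can SSH as ec2-user or ubuntu to production and staging hosts, reach any Kubernetes cluster matched by the cluster label, and query production databases as the readonly or breakglass user. They cannot log in as root directly, and they cannot reach hosts labelled tag: hardened-prod, because deny rules take precedence over allow rules.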
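The way allow and deny interact for SSH can be sketched as follows. This is a simplified model of the sre role above, not Teleport's actual evaluator: `labels_match` and `can_ssh` are hypothetical helpers, and only exact values and the `*` wildcard are handled.

```python
def labels_match(selector, labels):
    """True if every selector key is present on the host with an allowed value.

    An empty selector matches nothing, so a rule without labels grants
    (or denies) no hosts.
    """
    if not selector:
        return False
    for key, allowed in selector.items():
        if key not in labels:
            return False
        if "*" not in allowed and labels[key] not in allowed:
            return False
    return True


def can_ssh(role, login, host_labels):
    """Deny rules are checked first and always win over allow rules."""
    deny, allow = role["deny"], role["allow"]
    if login in deny.get("logins", []):
        return False
    if labels_match(deny.get("node_labels", {}), host_labels):
        return False
    return (login in allow.get("logins", [])
            and labels_match(allow.get("node_labels", {}), host_labels))


SRE_ROLE = {
    "allow": {
        "logins": ["ec2-user", "ubuntu"],
        "node_labels": {"env": ["production", "staging"]},
    },
    "deny": {
        "logins": ["root"],
        "node_labels": {"tag": ["hardened-prod"]},
    },
}

print(can_ssh(SRE_ROLE, "ubuntu", {"env": "production"}))
print(can_ssh(SRE_ROLE, "ubuntu", {"env": "production", "tag": "hardened-prod"}))
```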

Step 3: Session Recording

Every session is recorded by default: SSH sessions as a replayable terminal stream, kubectl exec sessions as command and output, and database queries as audit events.

auth_service:
  session_recording: node-sync     # record at the node, streaming to the Auth Service
  proxy_listener_mode: multiplex

Recording modes:

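node-sync: recorded at the node and streamed synchronously to the Auth Service, which writes it to the configured storage backend (for example S3 or GCS). Nothing is buffered on the node's disk, so an operator with a shell on the node has no local recording to delete; the trade-off is that recording depends on the Auth Service being reachable for the life of the session.
proxy: recorded at the Proxy, which terminates the SSH connection itself. This is the mode for hosts running plain OpenSSH without a Teleport agent.
off: explicitly disabled; not recommended for production.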

Replays:

tsh recordings ls
# Illustrative output; the real column layout differs.
# 2026-04-29 10:00  alice@teleport-prod  node prod-web-01  duration 12m
# 2026-04-29 11:30  bob@teleport-prod    node prod-db-01   duration 5m

tsh play <session-id>
# Replays the session at original speed in the terminal.

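For database access, each query is emitted as a structured JSON audit event (db.session.query) alongside the session's other audit events. These can be reviewed in the Web UI's audit log or in whichever audit backend the cluster is configured to use.

# Simplified audit event for one query (real events carry more fields):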
# {"timestamp": "2026-04-29T10:01:23Z", "user": "alice", "query": "SELECT * FROM orders WHERE customer_id = 5"}
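Because the query log is line-delimited JSON, per-user attribution becomes a few lines of scripting. The sketch below assumes the simplified field names from the sample event above (`timestamp`, `user`, `query`); the second and third events are made-up sample data, and the write-detection heuristic is deliberately naive.

```python
import json
from collections import Counter

# First event is the sample above; the other two are invented for illustration.
SAMPLE = """\
{"timestamp": "2026-04-29T10:01:23Z", "user": "alice", "query": "SELECT * FROM orders WHERE customer_id = 5"}
{"timestamp": "2026-04-29T10:02:10Z", "user": "alice", "query": "UPDATE orders SET status = 'void' WHERE id = 9"}
{"timestamp": "2026-04-29T11:31:00Z", "user": "bob", "query": "SELECT count(*) FROM payments"}
"""

WRITE_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE")


def parse_events(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def is_write(query):
    """Naive check: does the statement start with a data-modifying keyword?"""
    return query.lstrip().upper().startswith(WRITE_KEYWORDS)


events = parse_events(SAMPLE)
queries_per_user = Counter(event["user"] for event in events)
writes = [event for event in events if is_write(event["query"])]

print(dict(queries_per_user))
for event in writes:
    print(event["timestamp"], event["user"], event["query"])
```

With shared database credentials the same report would require correlating database-side logs with bastion logins; here the user is already on every event.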

Step 4: Access Requests / JIT

For elevated permissions beyond a user’s standing role, use access requests:

# roles/sre.yaml — extends the role.
spec:
  allow:
    request:
      roles: [prod-write, prod-admin]
      thresholds:
        - approve: 1
          deny: 1
      annotations:
        purpose: ['*']

# Operator requests elevated access.
tsh request create --roles=prod-write --reason="Investigating SEV2 incident #1234, fixing payment-api memory leak"

# Approver receives notification (Slack, email).
tsh request review --approve <request-id> --reason="Approved per incident-1234"

# Operator now has prod-write role for the request's TTL.
tsh login --request-id=<id>

Standing access stays minimal; elevation is recorded with explicit business reason.
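The approve/deny thresholds in the role above can be read as a small state machine over incoming reviews. This is a hedged sketch of that reading, not Teleport's implementation: `request_state` is a hypothetical function, and it assumes reviews are processed in order with the first threshold reached deciding the outcome.

```python
def request_state(reviews, approve_threshold=1, deny_threshold=1):
    """Return 'APPROVED', 'DENIED', or 'PENDING' for an ordered list of reviews.

    Each review is the string 'approve' or 'deny'. The first threshold
    that is reached decides the request; later reviews are ignored.
    """
    approvals = denials = 0
    for decision in reviews:
        if decision == "approve":
            approvals += 1
        elif decision == "deny":
            denials += 1
        else:
            raise ValueError(f"unknown review decision: {decision!r}")
        if denials >= deny_threshold:
            return "DENIED"
        if approvals >= approve_threshold:
            return "APPROVED"
    return "PENDING"


print(request_state([]))
print(request_state(["approve"]))
print(request_state(["approve"], approve_threshold=2))
```

Raising `approve_threshold` to 2 for the most sensitive roles gives a two-person rule without changing anything else in the workflow.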

Step 5: Database Access

Database connections route through Teleport, with per-query logging.

db_service:
  enabled: yes
  resources:
    - labels:
        env: production
  databases:
    - name: payments-db
      protocol: postgres
      uri: payments-db.internal:5432
      static_labels:
        env: production   # lets a role's db_labels selector match this database

Operator connects via tsh:

tsh db login payments-db --db-user=readonly --db-name=payments
tsh db connect payments-db
# psql session opens; queries logged via Teleport's audit pipeline.

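The database password is never shared with operators. Teleport authenticates to the database on the operator's behalf: for self-hosted PostgreSQL it presents a short-lived client certificate for the requested database user, and for cloud-managed databases it typically relies on the provider's IAM authentication.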

Step 6: Kubernetes Access

Teleport can serve as the Kubernetes API entrypoint:

kubernetes_service:
  enabled: yes
  kube_cluster_name: prod-east
  resources:
    - labels:
        env: production

Operators get a kubeconfig automatically:

tsh kube login prod-east
kubectl get pods   # routed through Teleport

The kubeconfig carries short-lived credentials that are renewed by running tsh login again, so credentials taken from a compromised laptop stop working once the session TTL elapses.

Kubernetes RBAC is layered with Teleport’s role:

kind: role
version: v7
metadata:
  name: payments-developer
spec:
  allow:
    kubernetes_labels:
      env: ['production']
    kubernetes_groups: ['payments-developer']
    kubernetes_users: ['payments-developer']
    kubernetes_resources:
      - kind: pod
        namespace: payments
        verbs: ['get', 'list', 'exec']
      - kind: deployment
        namespace: payments
        verbs: ['get', 'list', 'patch']

Access is scoped per resource at the Kubernetes layer: the kubeconfig from tsh points at Teleport, which checks each request against these rules before forwarding it to the cluster.
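The kubernetes_resources rules above are effectively a list of (kind, namespace, verbs) tuples that each request is checked against. A minimal sketch of that check follows, assuming glob matching on the namespace and `*` as a wildcard for kind and verbs; `is_allowed` is a hypothetical helper and does not reproduce Teleport's full matching rules.

```python
from fnmatch import fnmatchcase

# Same rules as the payments-developer role above.
RULES = [
    {"kind": "pod", "namespace": "payments", "verbs": ["get", "list", "exec"]},
    {"kind": "deployment", "namespace": "payments", "verbs": ["get", "list", "patch"]},
]


def is_allowed(rules, kind, namespace, verb):
    """True if any rule covers this kind, namespace, and verb."""
    for rule in rules:
        kind_ok = rule["kind"] in ("*", kind)
        namespace_ok = fnmatchcase(namespace, rule["namespace"])
        verb_ok = "*" in rule["verbs"] or verb in rule["verbs"]
        if kind_ok and namespace_ok and verb_ok:
            return True
    return False


print(is_allowed(RULES, "pod", "payments", "exec"))
print(is_allowed(RULES, "deployment", "payments", "delete"))
print(is_allowed(RULES, "pod", "kube-system", "get"))
```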

Step 7: Boundary as an Alternative

Boundary’s model is similar but with a different decomposition:

# Boundary controller config.
controller {
  name = "controller-prod"
  description = "Production controller"

  database {
    url = "postgresql://boundary@db.internal:5432/boundary"
  }
}

listener "tcp" {
  address = "0.0.0.0:9200"
  purpose = "api"
  tls_disable = false
  tls_cert_file = "/etc/boundary/tls.crt"
  tls_key_file = "/etc/boundary/tls.key"
}

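# Define a target, then attach a host set to it.
# PROJECT_SCOPE_ID, TARGET_ID, and HOST_SET_ID are placeholders for IDs from your deployment.
boundary targets create tcp -name "payments-db" \
  -scope-id "${PROJECT_SCOPE_ID}" \
  -default-port 5432 \
  -session-connection-limit 10 \
  -session-max-seconds 14400
boundary targets add-host-sources -id "${TARGET_ID}" -host-source "${HOST_SET_ID}"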

# Operator connects.
boundary connect postgres -target-id <target-id>
# Boundary establishes a tunnel to payments-db; operator's psql connects to localhost.

Boundary is lighter on session recording than Teleport but strong at TCP-level brokering, and it is often paired with Vault for dynamic database credentials.

Step 8: Integration With On-Call

For emergency access during incidents:

# roles/oncall-emergency.yaml
kind: role
version: v7
metadata:
  name: oncall-emergency
spec:
  options:
    max_session_ttl: 4h
    require_session_mfa: true
  allow:
    request:
      roles: [prod-admin]
      thresholds:
        - approve: 1                     # only need 1 approver for emergencies
          deny: 1
      annotations:
        # Annotations are attached to each request for approvers and plugins to read;
        # they do not validate anything on their own.
        incident: ['SEV1', 'SEV2']
        pagerduty_active_incident: ['true']

A custom plugin (here, a PagerDuty integration) checks whether the requester is currently on call and approves the request automatically if so. The audit log captures every emergency elevation together with the linked incident.
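The decision such a plugin makes can be sketched independently of any PagerDuty or Teleport client code. Everything below is hypothetical: `auto_approve`, its parameters, and the idea that the on-call roster arrives as a plain set of usernames are assumptions made for illustration.

```python
def auto_approve(requester, requested_roles, annotations, on_call_users,
                 allowed_roles=frozenset({"prod-admin"}),
                 allowed_severities=frozenset({"SEV1", "SEV2"})):
    """Return (approved, reason) for an emergency access request.

    Approval needs all three: the requester is on call, only allowed
    roles are requested, and a qualifying incident is attached.
    """
    if requester not in on_call_users:
        return False, "requester is not on call"
    if not requested_roles or not set(requested_roles) <= allowed_roles:
        return False, "requested roles are outside the emergency set"
    if not set(annotations.get("incident", [])) & allowed_severities:
        return False, "no SEV1/SEV2 incident attached"
    return True, "on-call requester with qualifying incident"


on_call = {"alice"}
print(auto_approve("alice", ["prod-admin"], {"incident": ["SEV1"]}, on_call))
print(auto_approve("bob", ["prod-admin"], {"incident": ["SEV1"]}, on_call))
```

Returning the reason alongside the decision keeps the audit trail useful: a denied auto-approval falls back to manual review with the cause already recorded.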

Step 9: Telemetry

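The metric names below are illustrative; confirm them against the metrics your deployed version actually exposes before building dashboards or alerts on them.

teleport_sessions_started_total{cluster, type}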
teleport_sessions_recorded_total{cluster}
teleport_access_requests_total{role, result}
teleport_session_duration_seconds
teleport_failed_auth_total{user, reason}
teleport_audit_events_total{type}

Alert on:

failed_auth_total rising for a specific user — possible compromised credential or stale config.
access_requests_total{result="denied"} rising — possible attempted privilege escalation.
Sessions exceeding expected duration — possible long-running unauthorized activity.
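The first alert above is a rate-of-change check on a per-user counter. A minimal sketch follows, assuming two scrapes of a cumulative failure counter are available as plain dictionaries; `users_over_threshold` is a hypothetical helper, and in practice this logic would usually live in the monitoring system's own rule language.

```python
def users_over_threshold(previous, current, threshold):
    """Return users whose failure count rose by at least `threshold`.

    `previous` and `current` map user -> cumulative failures at two
    scrape times. A drop in the counter is treated as a reset, so the
    current total is used as the increase.
    """
    flagged = []
    for user, total in current.items():
        delta = total - previous.get(user, 0)
        if delta < 0:  # counter reset, for example after a restart
            delta = total
        if delta >= threshold:
            flagged.append(user)
    return sorted(flagged)


before = {"alice": 3, "bob": 10}
after = {"alice": 4, "bob": 25, "carol": 6}
print(users_over_threshold(before, after, threshold=5))
```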

Expected Behaviour

Signal	Static SSH + bastions	Teleport / Boundary
Departed employee SSH access	Until rotation	Expires at TTL (8h) automatically
Per-user audit	Manual log correlation	Centralized; structured by user/session
Session replay	Manual / impossible	Built-in for SSH and kubectl exec; per-query log for databases
Per-resource access control	Per-host config	Centralized RBAC
Database query audit	Database-side audit (often disabled)	Per-query log with user attribution
Kubernetes access	Kubeconfig per cluster, distributed manually	Routed through broker; identity-bound
Compromise of one machine	Broad reach	Bounded to that one TTL window

Trade-offs

Aspect	Benefit	Cost	Mitigation
Centralized broker	Single audit pane	Single point of failure	Run in HA; for short outages, break-glass procedure documented.
Session recording	Forensic clarity	Storage cost; privacy implications	Encrypted at rest; short retention for routine sessions, longer for elevated.
Identity-bound certificates	No shared credentials	SSO outage = no access	Plan break-glass for SSO unavailability.
Just-in-time elevation	Minimal standing access	Friction during incidents	Auto-approve for on-call during active SEV1; manual approval otherwise.
Database brokerage	Per-query audit	Latency overhead	Negligible for interactive queries; matters more for query-heavy applications, which may need a documented exemption.
Migration from static	Long-term security improvement	Engineering effort	Phased rollout; per-team migration; coexist briefly.

Failure Modes

Failure	Symptom	Detection	Recovery
Teleport / Boundary controller down	All operator access blocked	Service health check fails	Deploy in HA; if an outage outlasts what HA can absorb, fall back to the documented break-glass procedure.
SSO outage	All authentication blocked	Auth provider error	Local fallback users with high audit; avoid using unless emergency.
Stale role assignments	Departed user retains access	Periodic comparison of SSO group membership against role assignments	Continuous sync with SSO; alert on stale role assignments.
Session recording storage full	Sessions stop recording	`teleport_sessions_recorded_total` rate stalls	Alert on storage utilization; migrate to S3 / GCS at >70% local capacity.
Approval flow misuse	Auto-approval for non-emergency	Audit shows elevations without active incidents	Tighten auto-approval criteria; require manual approval for non-incident elevations.
Privilege drift	Role accumulates over time	Periodic role audit	Review roles quarterly; remove unused permissions.
Latency-sensitive workload broken	DB queries slow due to broker	App-level latency monitors	Some workloads (high-throughput batch) bypass broker; document the exemption with compensating controls.
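
The storage-full row above depends on catching utilization before recordings stop. A minimal sketch of the 70% check follows, using only the standard library; `needs_attention` and `check_path` are hypothetical helpers, and the path passed in would be wherever recordings are stored locally.

```python
import shutil


def utilization(used_bytes, total_bytes):
    """Fraction of capacity in use, between 0.0 and 1.0."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def needs_attention(used_bytes, total_bytes, threshold=0.70):
    """True once utilization is above the threshold."""
    return utilization(used_bytes, total_bytes) > threshold


def check_path(path, threshold=0.70):
    """Apply the same check to the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return needs_attention(usage.used, usage.total, threshold)


print(needs_attention(71, 100))
print(needs_attention(50, 100))
```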

When to Consider a Managed Alternative

Self-hosted Teleport / Boundary requires HA infrastructure, session-recording storage, integration with SSO, and ongoing operational care (8-15 hours/month for a multi-environment fleet).

Teleport Cloud: managed Teleport; SSO integration; session storage included.
StrongDM: commercial broker; multi-protocol, audit pipeline integrated.
Cloudflare Access: identity-bound zero-trust gateway; integrates with existing IdP.

For organizations with strict regulatory constraints prohibiting third-party brokers, self-hosted Teleport with on-prem session recording is the right choice.