ContainerSSH as a Bastion Host Replacement: Ephemeral Containers per SSH Session

Problem

A traditional bastion host starts life as a clean, hardened jump server. Six months later it has accumulated forty system user accounts (some belonging to people who left the company), a dozen authorized_keys files of uncertain provenance, shared credentials for downstream systems stored in home directories, and no isolation whatsoever between concurrent sessions. It is the most sensitive host on the network and the least frequently audited.

The structural problems are inherent to the model, not the configuration:

Persistent user accounts. Adding a new operator means creating a system user on the bastion. Offboarding requires deleting it — but offboarding processes fail, people change roles, contractors finish engagements. Accounts accumulate. Each one is a potential entry point.

No session isolation. Two administrators logged in simultaneously share the same kernel, the same process namespace, and often the same filesystem. A compromised session that runs as the same account as another session, or that escalates to root, can read /proc/<pid>/environ of adjacent processes and attach to them with ptrace to scrape credentials from memory. Any session can write to shared directories.

Shared state becomes a liability. Files placed on the bastion persist. Operators copy credentials, keys, or sensitive data to the bastion for convenience. Over time it becomes a data store nobody intended to create.

Lateral movement risk. Compromising the bastion gives an attacker an authenticated foothold on the internal network with established trust relationships to every host the bastion reaches. The bastion’s network position — wide egress, trusted by internal hosts — makes post-exploitation trivial.

No automatic cleanup. When a session ends, nothing changes on the host. Logs persist in the user’s home directory. Bash history accumulates. Files remain until manually purged.

ContainerSSH replaces this model with ephemeral isolation. Every SSH connection triggers a webhook authentication call and, optionally, a configuration webhook call that returns a per-user container specification. ContainerSSH launches a fresh container, attaches the SSH session to it, and destroys the container when the connection closes. The bastion host itself runs no persistent user sessions. There are no system users to accumulate. The only persistent state is the ContainerSSH binary, its config, and the SSH host key.

Target systems: Linux hosts running Docker, Podman, or with access to a Kubernetes cluster. ContainerSSH supports Docker, Podman, and Kubernetes backends. Minimum Go 1.21 for building from source; prebuilt binaries and container images are available. The auth webhook can be written in any language.

Threat Model

The threat model for a bastion host is one of concentration risk. The bastion aggregates access to the entire internal network, so its compromise has outsized consequences.

Adversary 1 — Attacker compromises an active session. Via a vulnerability in a tool installed on the bastion, an attacker escapes from a user’s shell into the bastion’s OS.

  • Traditional bastion: Attacker is now on the bastion OS with access to all other active sessions, all home directories, and the bastion’s network interfaces. They can read other sessions via /proc, scrape credentials from memory, exfiltrate files from shared directories, and begin lateral movement to internal hosts immediately.
  • ContainerSSH: Attacker is inside a container with the network, filesystem, and process namespace of that single session. Other sessions run in separate containers. The bastion OS is not directly accessible. Escape requires a container breakout (a separate, harder exploit class). The blast radius is scoped to one container’s lifetime.

Adversary 2 — Credential theft from the bastion host itself. An attacker with read access to the bastion’s filesystem.

  • Traditional bastion: All user home directories are readable. SSH private keys, API tokens, .bash_history files containing plaintext commands with embedded secrets — all present, all persistent.
  • ContainerSSH: The bastion filesystem contains the ContainerSSH binary, a config file, and the SSH host key. No user home directories. No persistent credentials. Auth decisions are made by the external webhook, not by files on disk.

Adversary 3 — Insider threat / data exfiltration after session ends. A malicious operator copies sensitive data to the bastion and retrieves it later.

  • Traditional bastion: Data persists indefinitely in the user’s home directory unless purged by policy. No automatic enforcement.
  • ContainerSSH: The container is destroyed on disconnect. Any data inside it is gone. There is no persistent storage unless the container spec explicitly mounts a volume — which the auth webhook controls and can deny.

Adversary 4 — Stale account exploitation. A former employee’s account on the bastion.

  • Traditional bastion: The account exists until manually removed. SSH keys may still be valid. Organisational offboarding processes are imperfect.
  • ContainerSSH: There are no system accounts on the bastion host. Authentication is entirely delegated to the webhook. Disabling access means updating the auth backend (LDAP, IAM, a database row). No bastion-specific cleanup required.

| Scenario | Traditional Bastion Blast Radius | ContainerSSH Blast Radius |
|---|---|---|
| Active session compromise | Full host access, all sessions, internal network | Single session container, container lifetime only |
| Filesystem read access | All user home dirs, all credentials, history files | ContainerSSH binary, config, host key; no user data |
| Post-session data retrieval | Data persists indefinitely | Container destroyed on disconnect |
| Stale account | Valid until manually removed; SSH keys live on host | No system accounts; disable in auth backend only |
| Auth backend compromised | N/A (no central auth) | Full SSH access granted; requires auth backend hardening |

Configuration

Architecture Overview

ContainerSSH sits in front of a container backend. The flow for every SSH connection is:

  1. Client connects to ContainerSSH’s listening port (default 2222).
  2. ContainerSSH calls the auth webhook with the username and credential (password or public key). The webhook returns success or failure.
  3. On success, ContainerSSH calls the config webhook with the username. This step is optional, because configuration can also be static. The config webhook returns per-user backend and container configuration.
  4. On successful auth, ContainerSSH instructs the backend (Docker, Kubernetes, Podman) to launch a container.
  5. The SSH session is attached to the container process. The user gets a shell inside the container.
  6. On disconnect, ContainerSSH signals the backend to remove the container.
Client ──SSH──► ContainerSSH ──webhook──► Auth Service (LDAP/IAM/DB)
                     │
                     └──backend API──► Docker / Kubernetes / Podman
                                            │
                                            └──► Ephemeral Container (per session)
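
The flow above reduces to one invariant: whatever happens during the session, the container is removed afterwards. The Python sketch below models that lifecycle. It is not ContainerSSH's implementation. `FakeBackend`, `handle_connection`, and the callables passed to it are hypothetical stand-ins for the webhooks and the container runtime. The sketch authenticates before fetching per-user configuration, so nothing is configured or launched for a rejected client.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeBackend:
    """Hypothetical stand-in for Docker/Kubernetes: tracks live containers."""
    running: set = field(default_factory=set)
    next_id: int = 0

    def launch(self, image: str) -> str:
        self.next_id += 1
        cid = f"ctr-{self.next_id}"
        self.running.add(cid)
        return cid

    def remove(self, cid: str) -> None:
        self.running.discard(cid)


def handle_connection(
    username: str,
    credential: str,
    auth: Callable[[str, str], bool],
    config: Callable[[str], dict],
    backend: FakeBackend,
    session: Callable[[str], None],
) -> bool:
    """Model one SSH connection. Returns True if a session ran."""
    if not auth(username, credential):
        return False  # Rejected before any container exists.
    image = config(username).get("image", "registry.internal/bastion-shell:latest")
    cid = backend.launch(image)
    try:
        session(cid)  # The user's shell; may end cleanly or raise (e.g. network drop).
    finally:
        backend.remove(cid)  # Runs on clean exit and on errors alike.
    return True
```

The `try`/`finally` is the point of the sketch. A network drop surfaces as an exception from the session, and the container is still removed before the error propagates.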

Installing ContainerSSH

# Download the latest release binary (check https://containerssh.io/releases for current version).
# Release asset names can differ between versions; confirm them on the release page.
CSSH_VERSION="0.5.1"
BASE_URL="https://github.com/ContainerSSH/ContainerSSH/releases/download/v${CSSH_VERSION}"
cd /tmp
curl -fLO "${BASE_URL}/containerssh-linux-amd64"
curl -fLO "${BASE_URL}/containerssh-linux-amd64.sha256"

# Verify the checksum before installing. sha256sum -c looks up the file name
# recorded in the checksum file, so it must run in the download directory.
sha256sum -c containerssh-linux-amd64.sha256 && \
  install -m 0755 containerssh-linux-amd64 /usr/local/bin/containerssh

# Generate the SSH host key for ContainerSSH
# This is the key clients will verify — keep it stable and back it up.
mkdir -p /etc/containerssh
ssh-keygen -t ed25519 -f /etc/containerssh/host_key -C "containerssh-bastion" -N ""
chmod 600 /etc/containerssh/host_key

Minimal config.yaml (Docker Backend)

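# /etc/containerssh/config.yaml
# ContainerSSH minimal configuration, Docker backend.
# Key names are illustrative: the configuration schema (particularly the auth
# and docker sections) has changed between ContainerSSH releases, so check
# every key against the reference for the version you deploy.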

log:
  level: info
  format: json

ssh:
  listen: "0.0.0.0:2222"
  # Path to the SSH host private key generated above.
  # Clients will see this key's fingerprint — rotate with care.
  hostkeys:
    - /etc/containerssh/host_key

auth:
  # ContainerSSH calls this URL for every authentication attempt.
  # The webhook receives the username and credential; returns success/failure.
  webhook:
    url: "http://auth-service.internal:8080/auth"
    timeout: 5s

configserver:
  # Optional: per-user container config overrides.
  # Omit this section to use static backend config for all users.
  url: "http://auth-service.internal:8080/config"
  timeout: 5s

backend: docker

docker:
  connection:
    # Docker socket — use a Unix socket for local Docker, TCP for remote.
    host: "unix:///var/run/docker.sock"

  execution:
    # Container image for SSH sessions.
    # Build and maintain this image separately — see "Session Container Image" below.
    launch:
      containerConfig:
        image: "registry.internal/bastion-shell:latest"
        # Run as a non-root user inside the container.
        user: "10000:10000"
        # No privilege escalation inside the container.
        securityOpt:
          - "no-new-privileges:true"
        # Read-only root filesystem — no persistent writes.
        readonlyRootfs: true
        # Tmpfs for /tmp so tools that need temp space still work.
        tmpfs:
          /tmp: "rw,noexec,nosuid,size=64m"
      # Automatically remove the container when the SSH session ends.
      # This is ContainerSSH's default behaviour; shown explicitly for clarity.
      removeOnExit: true
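    # Limit resources so a single session cannot starve others.
    # Docker enforces hard limits only; it has no equivalent of Kubernetes
    # "requests". Docker's API takes memory in bytes and CPU in NanoCPUs
    # (check the exact key names for your ContainerSSH version).
    resources:
      memory: 268435456      # 256 MiB
      nanoCPUs: 500000000    # 0.5 CPU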
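
Kubernetes-style quantities such as `500m` CPU and `256Mi` memory do not map directly onto Docker. Docker's API takes CPU limits as NanoCPUs (billionths of a CPU) and memory limits in bytes. By that arithmetic, 500m is 0.5 CPU, or 500,000,000 NanoCPUs, and 256Mi is 256 × 1024² = 268,435,456 bytes. The helpers below make the conversion explicit. They are a sketch: the function names are hypothetical, and only the suffixes listed are handled.

```python
from decimal import Decimal

# Binary (Ki/Mi/Gi) and decimal (k/M/G) memory suffixes, as in Kubernetes quantities.
_MEMORY_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
    "k": 1000, "M": 1000**2, "G": 1000**3,
}


def cpu_to_nanocpus(quantity: str) -> int:
    """Convert a CPU quantity ('500m', '2', '0.25') to Docker NanoCPUs."""
    if quantity.endswith("m"):
        # 1 millicpu = 1,000,000 NanoCPUs.
        return int(Decimal(quantity[:-1]) * 1_000_000)
    return int(Decimal(quantity) * 1_000_000_000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity ('256Mi', '1G', '512') to bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))  # Plain number: already bytes.
```

`Decimal` avoids floating-point rounding for fractional values such as `0.25`.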

Auth Webhook

ContainerSSH sends a JSON POST to the auth URL for each authentication attempt. The payload structure differs slightly for password vs. public key auth:

// Password auth payload sent by ContainerSSH to the auth webhook
{
  "username": "alice",
  "remoteAddress": "203.0.113.45:54321",
  "connectionId": "b3f2a...",
  "passwordBase64": "c2VjcmV0"
}

// Public key auth payload
{
  "username": "alice",
  "remoteAddress": "203.0.113.45:54321",
  "connectionId": "b3f2a...",
  "publicKeyBase64": "AAAAB3NzaC1yc2EAAAA..."
}

The webhook returns a JSON object indicating success or failure:

// Success response
{"success": true}

// Failure response
{"success": false}
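
One endpoint receives both payload shapes, so the webhook must first work out which credential it was given. The sketch below uses a hypothetical function name, `classify_auth_request`, and takes its field names from the example payloads above. It decodes the credential, and it treats a payload carrying neither credential, both credentials, or malformed base64 as a failure.

```python
import base64
import binascii
import hashlib
from typing import Optional, Tuple


def classify_auth_request(payload: dict) -> Optional[Tuple[str, str]]:
    """Return ("password", plaintext) or ("pubkey", SHA256 fingerprint); None if unusable."""
    password_b64 = payload.get("passwordBase64")
    pubkey_b64 = payload.get("publicKeyBase64")
    if bool(password_b64) == bool(pubkey_b64):
        return None  # Neither or both credentials present: ambiguous, so reject.
    try:
        if password_b64:
            return ("password", base64.b64decode(password_b64, validate=True).decode("utf-8"))
        blob = base64.b64decode(pubkey_b64, validate=True)
    except (binascii.Error, UnicodeDecodeError):
        return None  # Malformed base64 or non-UTF-8 password: fail closed.
    # OpenSSH-style fingerprint: unpadded base64 of the SHA-256 of the key blob.
    digest = base64.b64encode(hashlib.sha256(blob).digest()).decode().rstrip("=")
    return ("pubkey", "SHA256:" + digest)
```

For the password example above, `passwordBase64` `c2VjcmV0` decodes to `secret`.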

A minimal Python webhook that validates against a static list (replace with LDAP, IAM, or a database lookup in production):

# auth_webhook.py: minimal ContainerSSH auth webhook
# Run with: uvicorn auth_webhook:app --host 0.0.0.0 --port 8080
# The request fields follow the example payloads above. Field names have
# changed between ContainerSSH releases; confirm them for your version.

import base64
import binascii
import hashlib
import json
import sys
from typing import Optional

from fastapi import FastAPI, Request
from pydantic import BaseModel

app = FastAPI()

# In production: look these up from LDAP, a database, or an IAM service.
# Keys are usernames; values are authorised public key fingerprints in the
# unpadded "SHA256:..." form printed by `ssh-keygen -lf` (placeholders below).
AUTHORISED_KEYS: dict[str, list[str]] = {
    "alice": ["SHA256:AbCdEfGhIjKlMnOpQrStUvWxYz0123456789abcdef"],
    "bob":   ["SHA256:ZyXwVuTsRqPoNmLkJiHgFeDcBa9876543210fedcba"],
}


class AuthRequest(BaseModel):
    username: str
    remoteAddress: str
    connectionId: str
    passwordBase64: Optional[str] = None
    publicKeyBase64: Optional[str] = None


def ssh_pubkey_fingerprint(pubkey_b64: str) -> Optional[str]:
    """Derive the OpenSSH SHA256 fingerprint from a base64-encoded key blob.

    Returns None if the input is not valid base64.
    """
    try:
        raw = base64.b64decode(pubkey_b64, validate=True)
    except binascii.Error:
        return None
    digest = hashlib.sha256(raw).digest()
    # OpenSSH prints the digest as base64 without "=" padding.
    fp = base64.b64encode(digest).decode().rstrip("=")
    return f"SHA256:{fp}"


@app.post("/auth")
async def authenticate(req: AuthRequest):
    allowed_fps = AUTHORISED_KEYS.get(req.username, [])

    if req.publicKeyBase64:
        fp = ssh_pubkey_fingerprint(req.publicKeyBase64)
        # A malformed key yields None and never matches: fail closed.
        success = fp is not None and fp in allowed_fps
    else:
        # Password auth is rejected outright. If you must support it,
        # validate against your directory with MFA, not a local list.
        success = False

    # Log every attempt as one JSON line on stderr, for SIEM ingestion.
    print(json.dumps({
        "event": "auth_attempt",
        "username": req.username,
        "remote_address": req.remoteAddress,
        "connection_id": req.connectionId,
        "auth_type": "pubkey" if req.publicKeyBase64 else "password",
        "success": success,
    }), file=sys.stderr)

    return {"success": success}


@app.post("/config")
async def container_config(req: Request):
    # Optional: return per-user container config overrides.
    # Return an empty object to use the static config from config.yaml.
    return {}

Session Container Image

The container image defines the tools available inside each SSH session. Apply the same principles as a distroless or hardened base image: include only what operators need, nothing more.

# Dockerfile.bastion-shell
# Minimal session container for ContainerSSH bastion access.
# Build: docker build -t registry.internal/bastion-shell:latest -f Dockerfile.bastion-shell .

FROM debian:12-slim AS base

# Install only the tools operators need for bastion access.
# Adjust to your environment — add kubectl, aws-cli, etc. as required.
RUN apt-get update && apt-get install -y --no-install-recommends \
    bash \
    openssh-client \
    curl \
    ca-certificates \
    less \
    vim-tiny \
    jq \
    netcat-openbsd \
    iputils-ping \
    dnsutils \
  && rm -rf /var/lib/apt/lists/*

# Create a non-root user for the SSH session.
# All sessions run as this user regardless of the username used to connect.
RUN groupadd -g 10000 operator && \
    useradd -u 10000 -g operator -m -s /bin/bash -d /home/operator operator

# Strip setuid/setgid bits from every binary. Sessions run as a non-root user
# with no-new-privileges, so nothing legitimate needs them, and removing them
# also protects sessions launched without that flag.
RUN find / -xdev -type f -perm /6000 -exec chmod u-s,g-s {} + 2>/dev/null || true

USER 10000:10000
WORKDIR /home/operator

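# ContainerSSH runs the shell through the container runtime, so no SSH daemon is
# needed in the image. ContainerSSH can also use a small guest agent inside the
# container; check the docs for whether your version expects it in the image.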
CMD ["/bin/bash"]

Build and push this image to your internal registry. Tag by date or digest, not just latest, so container launches are reproducible and rollback is straightforward.
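
The Dockerfile's `find` step removes setuid bits at build time. It is still worth checking the result independently, so that regressions are caught when the base image or package list changes. One way is to run a check in CI against the image's root filesystem, unpacked from `docker export`. The sketch below does this with a hypothetical helper name. It walks a directory tree without following symlinks and reports regular files that still carry the setuid or setgid bit.

```python
import os
import stat
from typing import List


def find_privileged_files(root: str) -> List[str]:
    """Return sorted paths under `root` whose mode has the setuid or setgid bit set.

    Symlinks are not followed, so links pointing outside `root` are ignored.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # Vanished or unreadable: skip rather than crash.
            if stat.S_ISREG(mode) and mode & (stat.S_ISUID | stat.S_ISGID):
                hits.append(path)
    return sorted(hits)
```

Fail the pipeline if the returned list contains anything you have not explicitly allowed.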

SSH Certificate Authority Integration

If your organisation uses an SSH CA (see SSH Certificate Authority), the auth webhook validates certificates rather than raw public keys. The certificate’s principal becomes the identity passed to your authorisation logic.

ContainerSSH passes the presented public key (or certificate public key) to the auth webhook. To accept certificates signed by your CA, the webhook extracts the certificate, verifies the CA signature, checks the principal, and validates the validity window:

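# Certificate validation addition to auth_webhook.py (requires OpenSSH's ssh-keygen).
# OpenSSH verifies a certificate's signature against its embedded signing key when
# parsing it; this code confirms that key is our CA and checks type, principals, validity.
import base64
import binascii
import os
import struct
import subprocess
import tempfile
from datetime import datetime
from typing import Optional

TIME_FMT = "%Y-%m-%dT%H:%M:%S"  # ssh-keygen -L prints local times in this format


def _ssh_keygen(*args: str) -> Optional[str]:
    try:
        r = subprocess.run(["ssh-keygen", *args], capture_output=True, text=True, timeout=5)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return r.stdout if r.returncode == 0 else None


def _currently_valid(valid: str) -> bool:
    """Interpret the 'Valid:' field: forever | from A to B | after A | before B."""
    w, now = valid.split(), datetime.now()
    try:
        ts = [datetime.strptime(x, TIME_FMT) for x in w[1::2]]
    except ValueError:
        return False
    if w == ["forever"]:
        return True
    if len(w) == 4 and w[0] == "from" and w[2] == "to":
        return ts[0] <= now < ts[1]
    if len(w) == 2 and w[0] in ("after", "before"):
        return ts[0] <= now if w[0] == "after" else now < ts[0]
    return False


def validate_ssh_certificate(pubkey_b64: str, username: str, ca_pubkey_path: str) -> bool:
    """True only for a currently valid user cert, signed by our CA, listing `username`."""
    try:
        raw = base64.b64decode(pubkey_b64, validate=True)
        (n,) = struct.unpack(">I", raw[:4])  # wire format: length-prefixed key type
        key_type = raw[4:4 + n].decode("ascii")
    except (binascii.Error, struct.error, UnicodeDecodeError):
        return False
    ca_out = _ssh_keygen("-l", "-E", "sha256", "-f", ca_pubkey_path)
    if not key_type.endswith("-cert-v01@openssh.com") or ca_out is None:
        return False  # plain key (not a certificate) or unreadable CA key

    # ssh-keygen -L needs an OpenSSH-format line: "<key-type> <base64-blob>".
    with tempfile.NamedTemporaryFile("w", suffix="-cert.pub", delete=False) as f:
        f.write(f"{key_type} {pubkey_b64}\n")
        cert_path = f.name
    try:
        out = _ssh_keygen("-L", "-f", cert_path)
    finally:
        os.unlink(cert_path)
    if out is None:
        return False

    fields, principals, in_principals = {}, [], False
    for line in out.splitlines()[1:]:  # first line is the file name
        if in_principals and line.startswith(" " * 16):
            principals.append(line.strip())
            continue
        key, _, value = line.strip().partition(":")
        in_principals = key == "Principals"
        fields[key] = value.strip()
    signing_ca = fields.get("Signing CA", "").split()
    return (fields.get("Type", "").endswith("user certificate")
            and signing_ca[1:2] == [ca_out.split()[1]]  # CA fingerprint must match
            and username in principals and _currently_valid(fields.get("Valid", "")))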

In practice, use a Go-based webhook for certificate validation — Go’s golang.org/x/crypto/ssh package parses SSH certificates natively and verifies CA signatures without shelling out.

Migrating from a Traditional Bastion

Migration is a DNS cutover combined with moving SSH key distribution to the auth webhook:

  1. Inventory current bastion users. Export all authorized_keys entries. Map them to identities in your directory (LDAP, Active Directory, GitHub usernames).
  2. Build the auth webhook. Import the public key fingerprints into your auth backend. Validate that every current user can authenticate against the webhook before cutting over.
  3. Deploy ContainerSSH in parallel. Run it on a different port or hostname. Have operators test access without changing the production bastion.
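  4. Update DNS. Change the bastion.example.com A record to point to the ContainerSSH host. The old bastion remains reachable at legacy-bastion.example.com during the transition window. ContainerSSH listens on port 2222 in the example configuration, so either update operators' SSH client configuration or forward port 22 to 2222 on the new host.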
  5. Communicate the host key change. Clients will see a new SSH host key (ContainerSSH’s key, not the old bastion’s). Distribute the new fingerprint or use an SSH CA host certificate so clients verify it automatically.
  6. Decommission the old bastion. After a stabilisation period (one to two weeks), remove the legacy bastion. Delete all its system user accounts and rotate any credentials that were stored on it.
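
Steps 1 and 2 amount to converting every authorized_keys entry on the old bastion into fingerprints the auth backend can store. The fingerprints should use the same unpadded `SHA256:` form that `ssh-keygen -lf` prints and that the example webhook compares against. The sketch below works under stated assumptions:

- `fingerprints_from_authorized_keys` is a hypothetical name.
- It recognises only the common key types listed in the regular expression.
- It skips option prefixes such as `from="..."` by locating the key-type token, rather than parsing the full options grammar.

```python
import base64
import binascii
import hashlib
import re
from typing import List

# Key types commonly found in authorized_keys (plain keys and FIDO "sk-" keys).
_KEY_RE = re.compile(
    r"(?:^|\s)(ssh-ed25519|ssh-rsa|ecdsa-sha2-nistp(?:256|384|521)"
    r"|sk-ssh-ed25519@openssh\.com|sk-ecdsa-sha2-nistp256@openssh\.com)"
    r"\s+([A-Za-z0-9+/]+={0,2})"
)


def fingerprints_from_authorized_keys(text: str) -> List[str]:
    """Return SHA256 fingerprints for every parseable key line in `text`."""
    fps = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # Blank line or comment.
        m = _KEY_RE.search(line)
        if not m:
            continue  # Unrecognised line: review it by hand rather than guess.
        try:
            blob = base64.b64decode(m.group(2), validate=True)
        except binascii.Error:
            continue
        digest = hashlib.sha256(blob).digest()
        fps.append("SHA256:" + base64.b64encode(digest).decode().rstrip("="))
    return fps
```

Run it once per user's authorized_keys file and load the results into the auth backend under that user's directory identity. Lines the function cannot parse are dropped, so review those by hand.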

Host Key Management

ContainerSSH’s host key is what SSH clients verify to confirm they are connecting to the legitimate bastion. It must be stable, securely stored, and backed up.

# Generate a dedicated host key for ContainerSSH.
# Use Ed25519: compact, fast, strong.
ssh-keygen -t ed25519 -f /etc/containerssh/host_key -C "bastion.example.com" -N ""

# Store the private key in a secrets manager (Vault, AWS Secrets Manager, etc.)
# and retrieve it at service startup rather than leaving it on disk.
# Example with HashiCorp Vault: store the key once after generating it...
vault kv put secret/containerssh/host_key private_key=@/etc/containerssh/host_key
# ...then, at startup, write it back with owner-only permissions from creation.
(umask 077 && vault kv get -field=private_key secret/containerssh/host_key > /etc/containerssh/host_key)
chmod 600 /etc/containerssh/host_key

# Distribute the public key fingerprint to operators' known_hosts,
# or sign it with your SSH CA so clients verify it automatically.
ssh-keygen -l -f /etc/containerssh/host_key.pub
# Output: 256 SHA256:xxxx... bastion.example.com (ED25519)

Add the fingerprint to operator workstations’ ~/.ssh/known_hosts, or better, use an SSH CA host certificate:

# Sign ContainerSSH's host key with your SSH CA
# (requires the host CA private key — see /articles/linux/ssh-certificate-authority/)
ssh-keygen -s /etc/ssh/ca/host_ca \
  -I "bastion.example.com" \
  -h \
  -n "bastion.example.com,bastion,10.0.1.50" \
  -V "+52w" \
  /etc/containerssh/host_key.pub
# Creates /etc/containerssh/host_key-cert.pub

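# Reference the cert in ContainerSSH config.yaml, if your ContainerSSH version
# supports presenting host certificates (the "hostcerts" key is illustrative;
# check the SSH server reference for your version):
# ssh:
#   hostkeys:
#     - /etc/containerssh/host_key
#   hostcerts:
#     - /etc/containerssh/host_key-cert.pub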

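With a host certificate, clients trust the CA (one @cert-authority line in known_hosts) rather than individual host fingerprints. No known_hosts update is needed when the bastion is rebuilt or its host key is rotated, provided two conditions hold. The new key must be signed by the same CA, and every name clients connect with, including any IP address, must be listed as a principal.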
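
The single known_hosts line that replaces per-host fingerprints uses OpenSSH's `@cert-authority` marker, followed by comma-separated host patterns and the host CA's public key. The sketch below generates that line for distribution to workstations. `cert_authority_line` is a hypothetical name, and the patterns you pass should be the names operators actually connect to.

```python
from typing import Iterable


def cert_authority_line(host_patterns: Iterable[str], ca_pub_line: str) -> str:
    """Build a known_hosts entry trusting host certificates signed by a CA.

    `ca_pub_line` is the content of the host CA's .pub file:
    "<key-type> <base64-blob> [comment]". The comment is dropped.
    """
    fields = ca_pub_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    patterns = ",".join(p.strip() for p in host_patterns if p.strip())
    if not patterns:
        raise ValueError("at least one host pattern is required")
    return f"@cert-authority {patterns} {fields[0]} {fields[1]}"
```

Append the result to each operator's ~/.ssh/known_hosts, or to the system-wide /etc/ssh/ssh_known_hosts.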

Running ContainerSSH as a Systemd Service

# /etc/systemd/system/containerssh.service
[Unit]
Description=ContainerSSH — ephemeral container SSH gateway
After=network.target docker.service
Requires=docker.service

[Service]
Type=simple
User=containerssh
Group=containerssh
ExecStart=/usr/local/bin/containerssh --config /etc/containerssh/config.yaml
Restart=on-failure
RestartSec=5s

# Harden the service process itself.
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
# No ReadWritePaths: ContainerSSH only needs to read /etc/containerssh.
CapabilityBoundingSet=
AmbientCapabilities=

[Install]
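WantedBy=multi-user.target

# Create a dedicated service account (no login shell, no home directory).
useradd -r -s /usr/sbin/nologin -M -d /nonexistent containerssh
# Root owns the configuration; the service account only needs to read it,
# including the host key.
chown -R root:containerssh /etc/containerssh
chmod 750 /etc/containerssh
chmod 640 /etc/containerssh/config.yaml /etc/containerssh/host_key
# Docker socket access: membership in the docker group is equivalent to root
# on this host, so a compromised ContainerSSH process means a compromised host.
# Use rootless Docker where possible.
usermod -aG docker containerssh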

systemctl daemon-reload
systemctl enable --now containerssh
systemctl status containerssh

Expected Behaviour

| Scenario | ContainerSSH Behaviour | Security Outcome |
|---|---|---|
| User connects with valid SSH key | Auth webhook called → returns success → container launched from specified image → user dropped into shell | Fresh isolated container, no shared state with prior sessions |
| User disconnects (clean exit) | ContainerSSH signals backend to remove container immediately | Container and all in-container data destroyed; no residual state |
| User disconnects (ungraceful: network drop) | ContainerSSH detects TCP close or SSH keepalive timeout → signals backend to remove container | Container removed even without clean session termination |
| Attacker compromises the session container | Attacker is inside one container's namespace; container has no persistent storage, read-only root FS, non-root user | Blast radius limited to that container's lifetime; no access to other sessions or bastion OS without additional exploit |
| Auth webhook returns failure | SSH connection rejected at the authentication stage; no container launched | Zero-trust enforcement: no backend access without explicit auth approval |
| Auth webhook is unavailable | All SSH connections fail (ContainerSSH cannot authenticate without a webhook response) | Fail-closed behaviour; no unauthenticated access; requires webhook HA for production |
| Container backend (Docker) unreachable | Auth webhook may succeed; container launch fails; SSH connection dropped with error | User cannot connect; alert on backend errors; ensures no half-open sessions |
| Container image pull fails | Container launch fails; SSH connection dropped | User cannot connect; pre-pull images on the host to avoid runtime pull delays |
| Unresponsive client (dead connection) | Configure ssh.clientAliveInterval and ssh.clientAliveCountMax; ContainerSSH closes the connection and removes the container | Dead sessions do not persist indefinitely; containers reclaimed automatically. Keepalives do not end sessions that are idle but still connected; for those, set a shell-level timeout such as bash's TMOUT in the session image |

Trade-offs

| Trade-off | Implication | Mitigation |
|---|---|---|
| Stateless sessions (no persistent work directory) | Operators cannot leave files between sessions; any work-in-progress is lost on disconnect | Mount a network volume (NFS, S3FS) into the container via the config webhook override; scope it per user |
| Container startup latency | Each SSH connection waits for a container to start (typically 0.5–3 s with a pre-pulled image, longer on a cold pull) | Pre-pull the session image on the ContainerSSH host; use a slim image; accept the latency as a security trade-off |
| Webhook as a single point of failure | If the auth webhook is down, all SSH access is blocked, which makes incidents harder to troubleshoot | Run multiple webhook instances behind a load balancer; keep a local fallback (a break-glass account on the bastion OS itself, separate from ContainerSSH) |
| Container image maintenance burden | The session image must be patched for OS CVEs like any other container image | Add the session image to your container scanning pipeline; automate rebuilds on base image updates |
| Container escape risk | A container breakout exploit would give the attacker access to the bastion OS | Use a hardened container runtime (gVisor, Kata Containers) for higher-assurance deployments; keep the bastion OS minimal |
| No persistent audit trail inside the container | Session activity inside the container is not captured unless auditing is enabled | Enable ContainerSSH's built-in audit log, which records sessions at the gateway rather than inside the container; ship the logs to a centralised SIEM |
| Auth webhook must understand SSH public key formats | Validating SSH public keys and certificates requires SSH-specific parsing logic | Use a Go-based webhook with golang.org/x/crypto/ssh; avoid reinventing certificate validation |
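
Scoping a persistent work volume per user means building a host path from the SSH username. The username arrives in the webhook request as a client-supplied string, so it must be validated before it becomes part of a filesystem path. The sketch below shows that validation step. It assumes a hypothetical `/srv/bastion-work` directory on the Docker host and Docker's `source:target:mode` bind syntax. How the resulting string is placed into the config webhook's response depends on your ContainerSSH version's schema.

```python
import re

# Conservative POSIX-style username: lowercase start, then lowercase, digits, "_" or "-".
_USERNAME_RE = re.compile(r"[a-z_][a-z0-9_-]{0,31}")
WORK_ROOT = "/srv/bastion-work"  # Hypothetical host directory for per-user volumes.


def per_user_bind(username: str) -> str:
    """Return a Docker bind spec mounting only this user's directory, or raise ValueError."""
    if not _USERNAME_RE.fullmatch(username):
        # Rejects "", "..", "alice/../bob", "bob:/etc", uppercase, and over-long names.
        raise ValueError(f"refusing unsafe username {username!r}")
    return f"{WORK_ROOT}/{username}:/home/operator/work:rw"
```

Allow-listing characters is safer than trying to strip dangerous ones. A username that fails the pattern should produce no volume at all, rather than a sanitised guess.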

Failure Modes

| Failure Mode | Symptom | Immediate Impact | Resolution |
|---|---|---|---|
| Auth webhook down | SSH connections hang, then fail once the webhook timeout expires (5 s in the example config) | All SSH access blocked; no sessions can be established | Deploy webhook as an HA service (2+ replicas); monitor webhook health endpoint; maintain break-glass access via a separate mechanism |
| Container backend unreachable (Docker socket gone, Kubernetes API unavailable) | Auth succeeds but container launch fails; SSH client receives connection error | No new sessions. Existing sessions survive only if their already-open backend streams stay up; a Docker daemon restart will normally end them | Monitor Docker/Kubernetes API health; alert on ContainerSSH launch errors; restart Docker daemon or restore API connectivity |
| SSH host key lost or rotated unexpectedly | Clients see WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED and refuse to connect | All client connections rejected until known_hosts updated | Back up host key in a secrets manager; use SSH CA host certificates so clients trust the CA, not individual keys; document rotation procedure |
| Session container OOM (Out of Memory) | Container is killed by the OOM killer; SSH session drops | User's session is terminated abruptly; no data loss beyond in-session work | Set appropriate memory limits in config; monitor container metrics; alert on OOM events; resize limits if legitimate |
| Container not cleaned up on ungraceful disconnect | Container keeps running after the TCP session drops; resources consumed until ContainerSSH detects the timeout | Wasted compute; potential data exposure if the container has mounted volumes | Configure SSH keepalive aggressively (clientAliveInterval 30, clientAliveCountMax 3); monitor for orphaned containers; ContainerSSH removes the container once it detects the disconnect |
| Auth webhook returns wrong result | False positive: unauthorised user gains SSH access. False negative: legitimate user is denied | Security control failure (false positive) or loss of access (false negative) | Add integration tests to auth webhook; log all decisions; alert on unexpected access patterns; review webhook code as security-critical |
| Config webhook returns malformed container spec | Container launch fails; SSH connection dropped | User cannot connect; may affect all users if using a shared config endpoint | Validate webhook responses in CI; test config webhook separately from auth webhook; fall back to static config if the config webhook is optional |
| ContainerSSH process crashes | SSH port becomes unreachable | All SSH access blocked | Run under systemd with Restart=on-failure; monitor port availability; alert on service restarts |