WASM Cold-Start Optimization for Security Workloads: Pre-Compilation, Snapshots, and AOT

Problem

Security-relevant WASM workloads run on the request hot path: auth filters, policy decisions, content classifiers, prompt-injection detectors, request-rewriters. Each request invokes the WASM module; cold-start latency per invocation is the latency users experience.

A naive WASM deploy:

[Module bytes loaded]
  -> [JIT compile to native code]    100-500 ms
  -> [Module instantiate]              5-50 ms
  -> [First call execution]            normal

500 ms cold-start is acceptable for batch workloads. For per-request invocation it’s catastrophic; users see seconds of latency on cold tenants. The standard mitigations:

  • Pre-compilation (AOT). Compile the WASM module to native code at build time; ship the compiled artifact.
  • Snapshot resume. Boot the runtime once, snapshot post-init state, resume from snapshot in microseconds.
  • Module pooling. Keep instances warm across requests rather than instantiating per-call.
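Back-of-the-envelope, the first two mitigations shrink the cold-start term while pooling changes how often it is paid. A sketch of the effective-latency arithmetic (all numbers are assumptions for illustration, not benchmarks):

```rust
// Illustrative model of effective per-request latency under instance pooling.
// All numbers below are assumptions for the sketch, not measurements.
fn effective_latency_ms(hit_rate: f64, warm_ms: f64, cold_ms: f64) -> f64 {
    hit_rate * warm_ms + (1.0 - hit_rate) * cold_ms
}

fn main() {
    // Assume a 95% pool hit rate, a 0.5 ms warm path, and midpoint cold costs:
    // ~300 ms for JIT-on-first-call vs ~5 ms for AOT load + instantiate.
    let jit = effective_latency_ms(0.95, 0.5, 300.0);
    let aot = effective_latency_ms(0.95, 0.5, 5.0);
    println!("JIT effective: {jit:.3} ms, AOT effective: {aot:.3} ms");
}
```

Even at a high hit rate, the JIT cold tail dominates the effective latency; AOT collapses it to sub-millisecond.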

By 2026, Wasmtime, WAMR, and Wasmer all support AOT compilation (Wasmtime's .cwasm files being one example); Spin pre-compiles modules at upload; Cloudflare Workers and Fastly Compute use proprietary equivalents.

Yet many production deployments still ship .wasm and accept the cold-start. The hardening implication: a slow cold start makes operators tempted to keep modules warm too long, share state across requests, or skip security-relevant restart/rotation cycles. Fast cold-start enables tighter operational discipline.

The specific gaps in default deployments:

  • WASM modules shipped as .wasm; runtime JIT-compiles at first call.
  • Module pool reuse across tenants in multi-tenant deployments (state leakage risk).
  • No snapshot infrastructure; every cold start pays JIT cost.
  • AOT artifact distribution requires per-architecture builds, which default pipelines don't handle.
  • Pre-warming strategies are ad-hoc; uneven warmth across tenants.

This article covers AOT compilation in Wasmtime, snapshot-resume patterns, per-tenant pool management, AOT artifact validation (signing the AOT output), and the security trade-offs of fast cold-start patterns.

Target systems: Wasmtime 22+ with .cwasm AOT compilation; Spin 2.6+ with pre-compilation; WasmEdge 0.14+ and WAMR, each with their own AOT compilers; Cloudflare Workers / Fastly Compute (managed equivalents).

Threat Model

  • Adversary 1 — Slow-start attacker: generates requests targeting cold tenants to amplify per-request cost; DoS via cold-start capacity exhaustion.
  • Adversary 2 — Tampered AOT artifact: an attacker substitutes the pre-compiled artifact between build and execution.
  • Adversary 3 — Cross-tenant pool leakage: a runtime that pools WASM instances reuses one across tenants, possibly leaking state.
  • Adversary 4 — Snapshot tampering: an attacker modifies the persisted snapshot file before resume.
  • Access level: Adversary 1 has request-input ability. Adversary 2 has artifact-distribution-path access. Adversary 3 has tenant-level capability. Adversary 4 has filesystem access on the host.
  • Objective: Bypass WASM-mediated security checks (by exploiting cold-start weaknesses), exhaust capacity (cold-start DoS), execute attacker-modified code via tampered artifacts.
  • Blast radius: Slow cold-start enables DoS; tampered AOT bypasses signature checks; pool leakage = cross-tenant state access.

Configuration

Step 1: Pre-Compile at Build Time

Compile .wasm to .cwasm once, distribute the AOT artifact:

# Source compilation.
cargo build --release --target wasm32-wasip2

# Pre-compile to native.
wasmtime compile target/wasm32-wasip2/release/auth-filter.wasm \
  -o auth-filter.cwasm \
  --cranelift-opt-level speed

The .cwasm file is architecture-specific (x86_64 vs aarch64). For multi-arch deployments, build per architecture.

# x86_64.
wasmtime compile auth-filter.wasm -o auth-filter-x86_64.cwasm \
  --target x86_64-unknown-linux-gnu

# arm64.
wasmtime compile auth-filter.wasm -o auth-filter-arm64.cwasm \
  --target aarch64-unknown-linux-gnu

# Sign each.
cosign sign-blob --yes auth-filter-x86_64.cwasm > auth-filter-x86_64.cwasm.sig
cosign sign-blob --yes auth-filter-arm64.cwasm > auth-filter-arm64.cwasm.sig

Distribute via OCI registry (per OCI WASM Module Signing) with cosign signatures. The runtime verifies signatures before loading.

Step 2: Runtime AOT Loading

Wasmtime loads .cwasm files much faster than .wasm:

// Load AOT artifact.
let engine = Engine::new(&Config::new())?;
let module = unsafe { Module::deserialize_file(&engine, "auth-filter.cwasm")? };

// Instantiate is now sub-millisecond.
let mut store = Store::new(&engine, ());
let instance = Instance::new(&mut store, &module, &[])?;
let auth_check = instance.get_typed_func::<(i32, i32), i32>(&mut store, "auth_check")?;

The unsafe is meaningful: deserializing pre-compiled code is faster but skips the validation the WASM compiler performs on source bytecode, so the runtime executes whatever native code the file contains. Always verify a cosign signature before loading, and only load .cwasm files from trusted sources.
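As a minimal extra gate before the unsafe deserialize, a loader can refuse artifacts whose digest doesn't match a value pinned at build time. The sketch below uses FNV-1a purely as a stand-in for a real cryptographic digest; in production this would be SHA-256 plus the cosign verification described in Step 5.

```rust
// FNV-1a, used here ONLY as a stand-in for a real digest (use SHA-256 in practice).
fn fnv1a(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(0xcbf29ce484222325u64, |h, b| (h ^ *b as u64).wrapping_mul(0x100000001b3))
}

// Refuse to deserialize unless the artifact matches the pinned build-time digest.
fn load_gate(artifact: &[u8], pinned: u64) -> Result<(), &'static str> {
    if fnv1a(artifact) != pinned {
        return Err("digest mismatch: refusing to deserialize .cwasm");
    }
    Ok(())
}

fn main() {
    let artifact = b"example artifact bytes".to_vec();
    let pinned = fnv1a(&artifact); // recorded at build time
    match load_gate(&artifact, pinned) {
        Ok(()) => println!("digest ok; safe to deserialize"),
        Err(e) => println!("{e}"),
    }
}
```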

Step 3: Snapshot-Based Cold Start

For runtimes with heavyweight boot (loading large WASI configurations, initializing host state, populating policy databases), snapshot the post-init state. Note that core Wasmtime does not expose a store snapshot/resume API; the code below is schematic. In practice, pre-initialization tools such as Wizer achieve the same effect by running init at build time and baking the initialized memory into a new module.

// Boot once, snapshot.
let engine = Engine::new(&Config::new())?;
let mut linker = Linker::new(&engine);
wasmtime_wasi::add_to_linker_sync(&mut linker, |s| s)?;

let module = Module::from_file(&engine, "policy-engine.wasm")?;
let mut store = Store::new(&engine, /* initial state */);
let instance = linker.instantiate(&mut store, &module)?;

// Trigger any one-time init.
let init_fn = instance.get_typed_func::<(), ()>(&mut store, "init")?;
init_fn.call(&mut store, ())?;

// Save snapshot. (Illustrative API: core Wasmtime does not expose
// store snapshot/resume; pre-init tools like Wizer fill this role.)
let snapshot = store.snapshot()?;
snapshot.save_to_file("policy-engine.snap")?;

On cold start:

// Resume from snapshot (illustrative API; not in core Wasmtime).
let store = Store::resume_from_snapshot(&engine, "policy-engine.snap")?;
// All host-state is restored; ready to serve requests in microseconds.

For Wasmtime specifically, pooling allocator + AOT + snapshot combine to reach single-digit-microsecond cold-start.

Step 4: Per-Tenant Pool Management

Multi-tenant deployments must isolate per-tenant state across instances:

use std::collections::VecDeque;
use std::sync::Mutex;
use std::time::Instant;

struct TenantPool {
    engine: Engine,
    module: Module,
    instances: Mutex<VecDeque<TenantInstance>>,
    max_pool_size: usize,
}

struct TenantInstance {
    store: Store<TenantState>,
    instance: Instance,
    last_used: Instant,
}

impl TenantPool {
    fn acquire(&self, tenant_id: &str) -> TenantInstance {
        let mut pool = self.instances.lock().unwrap();

        // Find an idle instance for this tenant; the tenant id lives in the store's data.
        if let Some(idx) = pool.iter().position(|i| i.store.data().tenant_id == tenant_id) {
            return pool.remove(idx).unwrap();
        }

        // Otherwise, instantiate a new one (sub-millisecond with AOT).
        let mut store = Store::new(&self.engine, TenantState::new(tenant_id));
        let instance = Instance::new(&mut store, &self.module, &[]).unwrap();
        TenantInstance { store, instance, last_used: Instant::now() }
    }

    fn release(&self, mut inst: TenantInstance) {
        // Reset per-request state (clear caches, etc.) before returning to the pool.
        inst.store.data_mut().reset_per_request();
        inst.last_used = Instant::now();
        let mut pool = self.instances.lock().unwrap();
        if pool.len() >= self.max_pool_size {
            // Evict the least recently released (front of the deque).
            pool.pop_front();
        }
        pool.push_back(inst);
    }
}

Per-tenant pools mean Tenant A’s state can’t leak to Tenant B’s instance. The reset_per_request step clears any per-request data (caches, temporary variables) before returning the instance to the pool — critical for state hygiene.

For platforms with high tenant counts (10k+ tenants), pool size per tenant is small (1-3 instances); LRU eviction across tenants. Cold tenants pay first-request cost; warm tenants serve from pool.
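The cross-tenant LRU policy can be sketched independently of the runtime. In this sketch, TenantSlot stands in for a pooled instance and the global cap counts instances; a real pool would also bound memory, not just instance count:

```rust
use std::collections::VecDeque;
use std::time::Instant;

// TenantSlot stands in for a pooled WASM instance; fields are illustrative.
struct TenantSlot {
    tenant_id: String,
    last_used: Instant,
}

struct GlobalPool {
    slots: VecDeque<TenantSlot>,
    max_total: usize,
}

impl GlobalPool {
    fn insert(&mut self, slot: TenantSlot) {
        if self.slots.len() >= self.max_total {
            // Evict the least-recently-used slot, whichever tenant owns it.
            if let Some(idx) = self
                .slots
                .iter()
                .enumerate()
                .min_by_key(|(_, s)| s.last_used)
                .map(|(i, _)| i)
            {
                self.slots.remove(idx);
            }
        }
        self.slots.push_back(slot);
    }
}

fn main() {
    let mut pool = GlobalPool { slots: VecDeque::new(), max_total: 2 };
    for id in ["tenant-a", "tenant-b", "tenant-c"] {
        pool.insert(TenantSlot { tenant_id: id.to_string(), last_used: Instant::now() });
    }
    // tenant-a was least recently used, so the cap of 2 evicted it.
    let ids: Vec<&str> = pool.slots.iter().map(|s| s.tenant_id.as_str()).collect();
    println!("{ids:?}");
}
```

Cold tenants re-enter through the miss path and pay one sub-millisecond AOT instantiation; the cap keeps total pool memory bounded regardless of tenant count.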

Step 5: AOT Artifact Verification

The .cwasm file is more dangerous than .wasm — it skips the validator. Treat it as a compiled binary:

// `cosign_verify` and `Signature` are illustrative host-side helpers, not a published crate API.
fn load_signed_aot(path: &str, expected_sig: &Signature) -> Result<Module, Error> {
    let bytes = std::fs::read(path)?;
    // Verify signature first.
    let cosign_result = cosign_verify(bytes.as_slice(), expected_sig)?;
    if !cosign_result.is_valid() {
        return Err(Error::SignatureInvalid);
    }
    // Verify the build provenance ties back to a known source.
    if cosign_result.subject != EXPECTED_BUILD_WORKFLOW {
        return Err(Error::ProvenanceInvalid);
    }
    let engine = Engine::default();
    let module = unsafe { Module::deserialize(&engine, &bytes)? };
    Ok(module)
}

Without signature verification, an attacker who substitutes the .cwasm runs arbitrary native code (the runtime trusts the file). With verification + provenance: the file came from an approved build pipeline.

Step 6: Snapshot Hygiene

Snapshots persist runtime state. For security-relevant modules, what gets snapshotted is consequential.

// Don't snapshot an instance that has processed user data.
fn ready_for_snapshot(state: &TenantState) -> bool {
    state.requests_served == 0 && state.config_loaded
}

Snapshot only post-init, pre-request. A snapshot taken after handling user data could persist that data; resuming a later request from that snapshot leaks it.
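A usage sketch of that gate, with a minimal TenantState carrying just the two fields the check reads:

```rust
// Minimal TenantState with just the fields the snapshot gate reads.
struct TenantState {
    requests_served: u64,
    config_loaded: bool,
}

// Snapshot only instances that are initialized but have never seen a request.
fn ready_for_snapshot(state: &TenantState) -> bool {
    state.requests_served == 0 && state.config_loaded
}

fn main() {
    let fresh = TenantState { requests_served: 0, config_loaded: true };
    let used = TenantState { requests_served: 3, config_loaded: true };
    assert!(ready_for_snapshot(&fresh));
    assert!(!ready_for_snapshot(&used)); // has handled user data: never snapshot
    println!("snapshot gate ok");
}
```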

For platforms managing snapshot files:

  • Per-tenant snapshot directory; never share across tenants.
  • Encrypt snapshots at rest (the snapshot contains compiled native code paths and any host-state).
  • Validate signatures on snapshot files at load time (similar to .cwasm).

Step 7: Pool Lifetime and Refresh

Even with fast cold-start, instance pools should be refreshed periodically. A long-lived instance accumulates JIT-compiled code paths, host-side memory, and any state-related drift.

fn should_evict(inst: &TenantInstance, max_idle: Duration, max_requests: u64) -> bool {
    inst.last_used.elapsed() > max_idle
        || inst.store.data().requests_served > max_requests
}

// Background task: evict stale instances.
fn evict_stale(pool: &TenantPool) {
    let mut p = pool.instances.lock().unwrap();
    p.retain(|i| !should_evict(i, Duration::from_secs(3600), 10_000));
}

For security-critical modules, force a fresh instance per request (no pooling). The cold-start cost is paid on every call; the benefit is that no state survives across requests.

Step 8: Telemetry

wasm_cold_start_seconds{tenant, module}                histogram
wasm_aot_load_seconds                                  histogram
wasm_snapshot_resume_seconds                           histogram
wasm_pool_hit_total{tenant}                            counter
wasm_pool_miss_total{tenant}                           counter
wasm_instance_pool_size{tenant}                        gauge
wasm_signature_verification_failure_total              counter

Alert on:

  • wasm_signature_verification_failure_total non-zero — possible tampering.
  • Cold-start latency p99 rising — pool starvation; hit-rate dropping.
  • Pool sizes growing unbounded — memory leak.
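The pool-starvation alert can be driven by a hit rate derived from the two counters in the table above (a sketch; the 90% threshold is an assumed starting point, to be tuned per deployment):

```rust
// Derive pool hit rate from the wasm_pool_hit_total / wasm_pool_miss_total counters.
fn pool_hit_rate(hits: u64, misses: u64) -> Option<f64> {
    let total = hits + misses;
    if total == 0 {
        None // no traffic yet; don't alert on an undefined rate
    } else {
        Some(hits as f64 / total as f64)
    }
}

fn main() {
    // Assumed alert threshold: page when the hit rate drops below 90%.
    if let Some(rate) = pool_hit_rate(930, 70) {
        let alert = rate < 0.90;
        println!("pool hit rate {rate:.2}, alert: {alert}");
    }
}
```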

Step 9: Per-Architecture Build and Distribution

For multi-arch deployments (x86_64 + arm64), build .cwasm per arch. OCI artifacts can include per-arch variants:

# OCI manifest for multi-arch WASM artifact.
schemaVersion: 2
mediaType: application/vnd.oci.image.index.v1+json
manifests:
  - mediaType: application/vnd.wasm.config.v0+json
    digest: sha256:abc123...   # x86_64 cwasm
    platform: {architecture: amd64, os: linux}
  - mediaType: application/vnd.wasm.config.v0+json
    digest: sha256:def456...   # aarch64 cwasm
    platform: {architecture: arm64, os: linux}
  - mediaType: application/vnd.wasm.config.v0+json
    digest: sha256:ghi789...   # source .wasm (architecture-independent)

Runtime selects the matching artifact at pull time. Source .wasm is included for portability (runtimes without AOT support).
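On the host side, selecting the right variant can be as simple as mapping the local architecture name to the OCI platform value (a sketch; the fallback-to-source behavior mirrors the portability note above):

```rust
// Map host architecture names (as Rust's std::env::consts::ARCH reports them)
// to OCI platform.architecture values used in the manifest index.
fn oci_arch(host_arch: &str) -> Option<&'static str> {
    match host_arch {
        "x86_64" => Some("amd64"),
        "aarch64" => Some("arm64"),
        // Unknown arch: fall back to the architecture-independent source .wasm.
        _ => None,
    }
}

fn main() {
    let host = std::env::consts::ARCH;
    println!("host arch {host} -> {:?}", oci_arch(host));
}
```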

Expected Behaviour

Signal                     .wasm + JIT                  .cwasm AOT
Cold start time            100-500 ms                   1-10 ms
First call latency         Cold-start + execution       Just execution
Memory at startup          High (JIT code generation)   Lower (pre-compiled)
Verification before load   WASM verifier validates      Cosign signature validates
Architecture support       Source-only                  Per-arch artifacts needed
Instance pool reuse        Pays JIT once per pool       Sub-ms instantiation

Trade-offs

  • AOT — benefit: sub-ms cold start. Cost: per-arch artifacts; signing required. Mitigation: build matrix; reuse cosign infrastructure.
  • Snapshot resume — benefit: microsecond cold start. Cost: snapshot lifecycle complexity. Mitigation: per-tenant snapshot dirs; encrypted storage.
  • Per-tenant pools — benefit: strong tenant isolation. Cost: more instances cached in memory. Mitigation: LRU eviction across tenants; bound total pool memory.
  • Pool reuse — benefit: eliminates per-request cost. Cost: risk of state leakage. Mitigation: reset_per_request discipline; for high-stakes modules, no pooling.
  • Signature on .cwasm — benefit: tamper detection. Cost: build-pipeline integration. Mitigation: standard cosign + SLSA workflow.
  • Multi-arch builds — benefit: native performance per arch. Cost: build complexity. Mitigation: standard CI matrix.

Failure Modes

  • .cwasm loaded without signature — symptom: tampered code can run. Detection: audit logs show a .cwasm load without a verify step. Recovery: code-review the loader; CI test that signature verification is mandatory.
  • Snapshot leaks user data — symptom: cross-tenant state visible. Detection: investigation reveals data cross-correlation across tenants. Recovery: snapshot pre-request only; never after request data.
  • Pool size unbounded — symptom: memory exhausts. Detection: process OOM. Recovery: LRU eviction; cap total pool memory.
  • reset_per_request missed — symptom: state leaks across requests within the same tenant. Detection: hard to detect; manifests as occasional "wrong context" responses. Recovery: test pool reuse with deliberately distinct requests; automated check for state clearing.
  • Per-arch build mismatch — symptom: module fails to load on some hosts. Detection: runtime errors. Recovery: build matrix in CI; test on each target arch.
  • AOT version skew — symptom: a module compiled with an old Wasmtime fails on a new one. Detection: loading errors after a Wasmtime upgrade. Recovery: pin runtime and module-build versions together; rebuild on runtime upgrade.
  • Signature key rotation — symptom: old artifacts no longer load. Detection: loading errors after key rotation. Recovery: re-sign during the rotation window; transition gracefully.