WASM Cold-Start Optimization for Security Workloads: Pre-Compilation, Snapshots, and AOT
Problem
Security-relevant WASM workloads run on the request hot path: auth filters, policy decisions, content classifiers, prompt-injection detectors, request-rewriters. Each request invokes the WASM module; cold-start latency per invocation is the latency users experience.
A naive WASM deploy:
[Module bytes loaded]
-> [JIT compile to native code] 100-500 ms
-> [Module instantiate] 5-50 ms
-> [First call execution] normal
A 500 ms cold start is acceptable for batch workloads. For per-request invocation it's catastrophic: users see hundreds of milliseconds to seconds of added latency whenever they hit a cold tenant. The standard mitigations:
- Pre-compilation (AOT). Compile the WASM module to native code at build time; ship the compiled artifact.
- Snapshot resume. Boot the runtime once, snapshot post-init state, resume from snapshot in microseconds.
- Module pooling. Keep instances warm across requests rather than instantiating per-call.
By 2026 Wasmtime, WAMR, and Wasmer all support AOT compilation (Wasmtime's .cwasm files, for example); Spin pre-compiles modules at upload; Cloudflare Workers and Fastly Compute use proprietary equivalents.
Yet many production deployments still ship .wasm and accept the cold-start. The hardening implication: a slow cold start makes operators tempted to keep modules warm too long, share state across requests, or skip security-relevant restart/rotation cycles. Fast cold-start enables tighter operational discipline.
The specific gaps in default deployments:
- WASM modules shipped as .wasm; the runtime JIT-compiles at first call.
- Module pool reuse across tenants in multi-tenant deployments (state leakage risk).
- No snapshot infrastructure; every cold start pays JIT cost.
- AOT artifact distribution requires per-architecture builds, which default pipelines don't produce.
- Pre-warming strategies are ad-hoc; uneven warmth across tenants.
This article covers AOT compilation in Wasmtime, snapshot-resume patterns, per-tenant pool management, AOT artifact validation (signing the AOT output), and the security trade-offs of fast cold-start patterns.
Target systems: Wasmtime 22+ with .cwasm AOT compilation; Spin 2.6+ with pre-compilation; WasmEdge 0.14+ and WAMR with their own AOT compilers; Cloudflare Workers / Fastly Compute (managed equivalents).
Threat Model
- Adversary 1 — Slow-start attacker: generates requests targeting cold tenants to amplify per-request cost; DoS via cold-start capacity exhaustion.
- Adversary 2 — Tampered AOT artifact: an attacker substitutes the pre-compiled artifact between build and execution.
- Adversary 3 — Cross-tenant pool leakage: a runtime that pools WASM instances reuses one across tenants, possibly leaking state.
- Adversary 4 — Snapshot tampering: an attacker modifies the persisted snapshot file before resume.
- Access level: Adversary 1 has request-input ability. Adversary 2 has artifact-distribution-path access. Adversary 3 has tenant-level capability. Adversary 4 has filesystem access on the host.
- Objective: Bypass WASM-mediated security checks (by exploiting cold-start weaknesses), exhaust capacity (cold-start DoS), execute attacker-modified code via tampered artifacts.
- Blast radius: Slow cold-start enables DoS; tampered AOT bypasses signature checks; pool leakage = cross-tenant state access.
Configuration
Step 1: Pre-Compile at Build Time
Compile .wasm to .cwasm once, distribute the AOT artifact:
# Source compilation.
cargo build --release --target wasm32-wasip2
# Pre-compile to native. (Optimization-flag spelling varies across
# Wasmtime CLI versions; check `wasmtime compile --help`.)
wasmtime compile target/wasm32-wasip2/release/auth-filter.wasm \
  -o auth-filter.cwasm \
  --cranelift-opt-level speed
The .cwasm file is architecture-specific (x86_64 vs aarch64). For multi-arch deployments, build per architecture.
# x86_64.
wasmtime compile auth-filter.wasm -o auth-filter-x86_64.cwasm --target x86_64-unknown-linux-gnu
# arm64.
wasmtime compile auth-filter.wasm -o auth-filter-arm64.cwasm --target aarch64-unknown-linux-gnu
# Sign each.
cosign sign-blob --yes auth-filter-x86_64.cwasm > auth-filter-x86_64.cwasm.sig
cosign sign-blob --yes auth-filter-arm64.cwasm > auth-filter-arm64.cwasm.sig
Distribute via OCI registry (per OCI WASM Module Signing) with cosign signatures. The runtime verifies signatures before loading.
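One way to wire that up with the oras CLI — the registry path is hypothetical, and the layer media type follows the WASM OCI artifact convention:

# Push each per-arch artifact to the registry with a WASM layer media type.
oras push registry.example.com/wasm/auth-filter:v1-x86_64 \
  auth-filter-x86_64.cwasm:application/vnd.wasm.content.layer.v1+wasm
# Sign the pushed artifact; cosign stores the signature alongside it.
cosign sign --yes registry.example.com/wasm/auth-filter:v1-x86_64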
Step 2: Runtime AOT Loading
Wasmtime loads .cwasm files much faster than .wasm:
use wasmtime::{Config, Engine, Instance, Module, Store};

// Load AOT artifact. `deserialize_file` skips compilation entirely;
// see the safety note below.
let engine = Engine::new(&Config::new())?;
let module = unsafe { Module::deserialize_file(&engine, "auth-filter.cwasm")? };
// Instantiation is now sub-millisecond.
let mut store = Store::new(&engine, ());
let instance = Instance::new(&mut store, &module, &[])?;
let auth_check = instance.get_typed_func::<(i32, i32), i32>(&mut store, "auth_check")?;
The unsafe is meaningful: deserializing pre-compiled code is fast precisely because it skips WASM bytecode validation. Always verify a cosign signature before loading; only load .cwasm files from trusted sources.
Step 3: Snapshot-Based Cold Start
For runtimes that boot heavyweight (loading large WASI configurations, initializing host state, populating policy databases), snapshot post-init state. Wasmtime has no built-in snapshot API, so the sketch below uses hypothetical snapshot/resume calls to show the shape of the pattern; in practice, tools like Wizer implement the equivalent by baking post-init state into a new module.
use wasmtime::{Config, Engine, Linker, Module, Store};

// Boot once, snapshot. `store.snapshot()` / `save_to_file` are hypothetical
// APIs sketching the pattern, not part of Wasmtime itself.
let engine = Engine::new(&Config::new())?;
let mut linker = Linker::new(&engine);
// Exact WASI wiring depends on the wasmtime-wasi version and module type.
wasmtime_wasi::add_to_linker_sync(&mut linker, |s| s)?;
let module = Module::from_file(&engine, "policy-engine.wasm")?;
let mut store = Store::new(&engine, /* initial state */);
let instance = linker.instantiate(&mut store, &module)?;
// Trigger any one-time init.
let init_fn = instance.get_typed_func::<(), ()>(&mut store, "init")?;
init_fn.call(&mut store, ())?;
// Save snapshot.
let snapshot = store.snapshot()?;
snapshot.save_to_file("policy-engine.snap")?;
On cold start:
// Resume from snapshot (again, a hypothetical resume API).
let store = Store::resume_from_snapshot(&engine, "policy-engine.snap")?;
// All host state is restored; ready to serve requests in microseconds.
For Wasmtime specifically, pooling allocator + AOT + snapshot combine to reach single-digit-microsecond cold-start.
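The pooling allocator is opt-in; a minimal sketch of enabling it follows. Slot counts and memory limits are tunable via PoolingAllocationConfig setters, whose names vary across Wasmtime versions:

use wasmtime::{Config, Engine, InstanceAllocationStrategy, PoolingAllocationConfig};

fn pooled_engine() -> anyhow::Result<Engine> {
    // Pre-allocates instance slots up front; instantiation then reuses a
    // slot instead of mapping fresh memory for every cold start.
    let pool = PoolingAllocationConfig::default();
    let mut config = Config::new();
    config.allocation_strategy(InstanceAllocationStrategy::Pooling(pool));
    Engine::new(&config)
}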
Step 4: Per-Tenant Pool Management
Multi-tenant deployments must isolate per-tenant state across instances:
use std::collections::VecDeque;
use std::sync::Mutex;
use std::time::Instant;
use wasmtime::{Engine, Instance, Module, Store};

struct TenantPool {
    engine: Engine,
    module: Module,
    instances: Mutex<VecDeque<TenantInstance>>,
    max_pool_size: usize,
}

struct TenantInstance {
    store: Store<TenantState>,
    instance: Instance,
    last_used: Instant,
}

impl TenantPool {
    fn acquire(&self, tenant_id: &str) -> TenantInstance {
        let mut pool = self.instances.lock().unwrap();
        // Find an idle instance for this tenant.
        if let Some(idx) = pool
            .iter()
            .position(|i| i.store.data().tenant_id == tenant_id)
        {
            return pool.remove(idx).unwrap();
        }
        // Otherwise, instantiate a new one (sub-millisecond with AOT).
        let mut store = Store::new(&self.engine, TenantState::new(tenant_id));
        let instance = Instance::new(&mut store, &self.module, &[]).unwrap();
        TenantInstance { store, instance, last_used: Instant::now() }
    }

    fn release(&self, mut inst: TenantInstance) {
        // Reset per-request state (clear caches, etc.) before returning to pool.
        inst.store.data_mut().reset_per_request();
        inst.last_used = Instant::now();
        let mut pool = self.instances.lock().unwrap();
        if pool.len() >= self.max_pool_size {
            // Evict the least recently returned instance.
            pool.pop_front();
        }
        pool.push_back(inst);
    }
}
Per-tenant pools mean Tenant A's state can't leak into Tenant B's instance. The reset_per_request step clears any per-request data (caches, temporary buffers) before the instance returns to the pool — critical for state hygiene.
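A sketch of what TenantState and reset_per_request might look like — the field set is hypothetical; the invariant is that everything request-derived gets cleared:

use std::collections::HashMap;

struct TenantState {
    tenant_id: String,
    config_loaded: bool,
    requests_served: u64,
    // Per-request scratch; must never survive into the next request.
    request_headers: HashMap<String, String>,
    decision_cache: HashMap<String, bool>,
}

impl TenantState {
    fn new(tenant_id: &str) -> Self {
        TenantState {
            tenant_id: tenant_id.to_string(),
            config_loaded: false,
            requests_served: 0,
            request_headers: HashMap::new(),
            decision_cache: HashMap::new(),
        }
    }

    fn reset_per_request(&mut self) {
        // Tenant-scoped config survives; request-derived data does not.
        self.request_headers.clear();
        self.decision_cache.clear();
        // Count the request just served (used by the refresh logic in Step 7).
        self.requests_served += 1;
    }
}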
For platforms with high tenant counts (10k+), keep the pool per tenant small (1-3 instances) and apply LRU eviction across tenants, as in the sketch below. Cold tenants pay the first-request cost; warm tenants serve from the pool.
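Assuming idle instances are tracked in a map keyed by tenant (types and names hypothetical):

use std::collections::HashMap;
use std::time::Instant;

struct IdleEntry {
    last_used: Instant,
    // The pooled TenantInstance itself is omitted in this sketch.
}

// Drop the globally least-recently-used idle instance until the total
// instance count across all tenants fits the memory budget.
fn enforce_global_cap(pools: &mut HashMap<String, Vec<IdleEntry>>, max_total: usize) {
    while pools.values().map(Vec::len).sum::<usize>() > max_total {
        // Find the tenant holding the oldest idle instance.
        let oldest = pools
            .iter()
            .filter_map(|(tenant, entries)| {
                entries.iter().map(|e| e.last_used).min().map(|t| (tenant.clone(), t))
            })
            .min_by_key(|(_, t)| *t)
            .map(|(tenant, _)| tenant);
        let Some(tenant) = oldest else { return };
        if let Some(entries) = pools.get_mut(&tenant) {
            if let Some(idx) = (0..entries.len()).min_by_key(|&i| entries[i].last_used) {
                entries.remove(idx);
            }
            if entries.is_empty() {
                pools.remove(&tenant);
            }
        }
    }
}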
Step 5: AOT Artifact Verification
The .cwasm file is more dangerous than .wasm — it skips bytecode validation. Treat it as you would any compiled native binary:
use wasmtime::{Engine, Module};

// `cosign_verify`, `Signature`, and `Error` are illustrative placeholders
// for your actual verification library (e.g., sigstore bindings).
fn load_signed_aot(path: &str, expected_sig: &Signature) -> Result<Module, Error> {
    let bytes = std::fs::read(path)?;
    // Verify the signature before the bytes go anywhere near the runtime.
    let cosign_result = cosign_verify(bytes.as_slice(), expected_sig)?;
    if !cosign_result.is_valid() {
        return Err(Error::SignatureInvalid);
    }
    // Verify the build provenance ties back to a known source.
    if cosign_result.subject != EXPECTED_BUILD_WORKFLOW {
        return Err(Error::ProvenanceInvalid);
    }
    let engine = Engine::default();
    let module = unsafe { Module::deserialize(&engine, &bytes)? };
    Ok(module)
}
Without signature verification, an attacker who substitutes the .cwasm runs arbitrary native code (the runtime trusts the file). With verification plus provenance, the file provably came from an approved build pipeline.
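To keep that property from regressing, a CI test can assert the loader rejects unsigned artifacts; a sketch against the loader above, where the fixture path and bad_signature() helper are hypothetical:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn unsigned_cwasm_is_rejected() {
        // A .cwasm whose signature fails cosign verification must error
        // out before Module::deserialize is ever reached.
        let result = load_signed_aot("fixtures/unsigned.cwasm", &bad_signature());
        assert!(matches!(result, Err(Error::SignatureInvalid)));
    }
}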
Step 6: Snapshot Hygiene
Snapshots persist runtime state. For security-relevant modules, what gets snapshotted is consequential.
// Don't snapshot an instance that has processed user data.
fn ready_for_snapshot(state: &TenantState) -> bool {
    state.requests_served == 0 && state.config_loaded
}
Snapshot only post-init, pre-request. A snapshot taken after handling user data can persist that data; resuming later requests from it leaks it.
For platforms managing snapshot files:
- Per-tenant snapshot directory; never share across tenants.
- Encrypt snapshots at rest (a snapshot captures the instance's memory contents and any host state).
- Validate signatures (or at minimum digests) on snapshot files at load time, similar to .cwasm; see the sketch below.
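A minimal load-time integrity check, assuming each snapshot's SHA-256 was recorded in a trusted, signed manifest when it was taken (sha2 and hex crates; names illustrative):

use sha2::{Digest, Sha256};

// Refuse to resume from a snapshot whose digest doesn't match the
// value recorded at snapshot time.
fn verify_snapshot(path: &str, expected_sha256_hex: &str) -> std::io::Result<Vec<u8>> {
    let bytes = std::fs::read(path)?;
    let digest = Sha256::digest(&bytes);
    if hex::encode(digest) != expected_sha256_hex {
        return Err(std::io::Error::new(
            std::io::ErrorKind::InvalidData,
            "snapshot digest mismatch — possible tampering",
        ));
    }
    Ok(bytes)
}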
Step 7: Pool Lifetime and Refresh
Even with fast cold-start, instance pools should be refreshed periodically. A long-lived instance accumulates grown linear memory, host-side allocations, and state drift.
use std::time::Duration;

fn should_evict(inst: &TenantInstance, max_age: Duration, max_requests: u64) -> bool {
    inst.last_used.elapsed() > max_age || inst.store.data().requests_served > max_requests
}

// Background task: evict stale instances.
fn evict_stale(pool: &TenantPool) {
    let mut p = pool.instances.lock().unwrap();
    p.retain(|i| !should_evict(i, Duration::from_secs(3600), 10_000));
}
For security-critical modules, force a fresh instance per request (no pooling at all). The cold-start cost is paid; the benefit is no cross-request state whatsoever, as in the sketch below.
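A sketch of that no-pooling hot path (guest-memory I/O elided; function and export names follow the earlier examples):

use anyhow::Result;
use wasmtime::{Engine, Instance, Module, Store};

// A fresh Store and Instance per request: with an AOT-loaded Module this
// costs roughly one instantiation, and dropping the Store at the end
// guarantees nothing survives into the next request.
fn handle_request(engine: &Engine, module: &Module, ptr: i32, len: i32) -> Result<i32> {
    let mut store = Store::new(engine, ());
    let instance = Instance::new(&mut store, module, &[])?;
    let auth_check = instance.get_typed_func::<(i32, i32), i32>(&mut store, "auth_check")?;
    // Writing the request bytes into guest memory is elided here.
    auth_check.call(&mut store, (ptr, len))
    // store dropped here: linear memory and host state are gone.
}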
Step 8: Telemetry
wasm_cold_start_seconds{tenant, module} histogram
wasm_aot_load_seconds histogram
wasm_snapshot_resume_seconds histogram
wasm_pool_hit_total{tenant} counter
wasm_pool_miss_total{tenant} counter
wasm_instance_pool_size{tenant} gauge
wasm_signature_verification_failure_total counter
Alert on:
- wasm_signature_verification_failure_total non-zero — possible tampering.
- Cold-start latency p99 rising — pool starvation or a dropping hit rate.
- Pool sizes growing unbounded — memory leak.
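These map directly onto Prometheus alerting rules; a sketch with illustrative thresholds:

# prometheus-rules.yaml (thresholds illustrative).
groups:
  - name: wasm-cold-start
    rules:
      - alert: WasmSignatureVerificationFailure
        expr: increase(wasm_signature_verification_failure_total[5m]) > 0
        labels: {severity: page}
      - alert: WasmColdStartP99High
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(wasm_cold_start_seconds_bucket[5m]))) > 0.05
        for: 10m
        labels: {severity: ticket}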
Step 9: Per-Architecture Build and Distribution
For multi-arch deployments (x86_64 + arm64), build .cwasm per arch. OCI artifacts can include per-arch variants:
# OCI image index for a multi-arch WASM artifact (digests elided).
schemaVersion: 2
mediaType: application/vnd.oci.image.index.v1+json
manifests:
  - mediaType: application/vnd.oci.image.manifest.v1+json
    digest: sha256:abc123...   # x86_64 cwasm
    platform: {architecture: amd64, os: linux}
  - mediaType: application/vnd.oci.image.manifest.v1+json
    digest: sha256:def456...   # aarch64 cwasm
    platform: {architecture: arm64, os: linux}
  - mediaType: application/vnd.oci.image.manifest.v1+json
    digest: sha256:ghi789...   # source .wasm (architecture-independent)
Runtime selects the matching artifact at pull time. Source .wasm is included for portability (runtimes without AOT support).
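On the host, artifact selection plus the portable fallback can look like this sketch — the file layout is hypothetical, and the signature verification from Step 5 is assumed to run before the unsafe load:

use anyhow::Result;
use wasmtime::{Engine, Module};

// Prefer the AOT artifact matching the host architecture; fall back to
// JIT-compiling the portable .wasm when no matching .cwasm was pulled.
fn load_module(engine: &Engine, dir: &str) -> Result<Module> {
    // std::env::consts::ARCH is "x86_64" or "aarch64" on the targets above.
    let aot = format!("{dir}/auth-filter-{}.cwasm", std::env::consts::ARCH);
    if std::path::Path::new(&aot).exists() {
        // Only after the signature check from Step 5 has passed.
        return unsafe { Module::deserialize_file(engine, &aot) };
    }
    // Portable fallback: pays the JIT cost once at load.
    Module::from_file(engine, format!("{dir}/auth-filter.wasm"))
}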
Expected Behaviour
| Signal | .wasm + JIT | .cwasm AOT |
|---|---|---|
| Cold start time | 100-500 ms | 1-10 ms |
| First call latency | Cold-start + execution | Just execution |
| Memory at startup | High (JIT code generation) | Lower (pre-compiled) |
| Verification before load | WASM verifier validates | Cosign signature validates |
| Architecture support | Source-only | Per-arch artifacts needed |
| Instance pool reuse | Pays JIT once per pool | Sub-ms instantiation |
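To reproduce these numbers in your own environment, time both paths directly; a rough harness:

use std::time::Instant;
use anyhow::Result;
use wasmtime::{Engine, Instance, Module, Store};

fn measure_cold_starts(engine: &Engine) -> Result<()> {
    // JIT path: compile from source bytes, then instantiate.
    let t = Instant::now();
    let module = Module::from_file(engine, "auth-filter.wasm")?;
    let mut store = Store::new(engine, ());
    Instance::new(&mut store, &module, &[])?;
    println!("wasm + JIT cold start: {:?}", t.elapsed());

    // AOT path: deserialize the pre-compiled artifact, then instantiate.
    let t = Instant::now();
    let module = unsafe { Module::deserialize_file(engine, "auth-filter.cwasm")? };
    let mut store = Store::new(engine, ());
    Instance::new(&mut store, &module, &[])?;
    println!(".cwasm AOT cold start: {:?}", t.elapsed());
    Ok(())
}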
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| AOT | Sub-ms cold start | Per-arch artifacts; signing required | Build matrix; reuse cosign infrastructure. |
| Snapshot resume | Microsecond cold start | Snapshot lifecycle complexity | Per-tenant snapshot dirs; encrypted storage. |
| Per-tenant pools | Strong tenant isolation | More instances cached in memory | LRU eviction across tenants; bound total pool memory. |
| Pool reuse | Eliminates per-request cost | Risk of state leakage | reset_per_request discipline; for high-stakes modules, no pooling. |
| Signature on .cwasm | Tamper detection | Build-pipeline integration | Standard cosign + SLSA workflow. |
| Multi-arch builds | Native performance per arch | Build complexity | Standard CI matrix. |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| .cwasm loaded without signature | Tampered code can run | Audit logs show .cwasm load without a verify step | Code-review the loader; CI test that signature verification is mandatory. |
| Snapshot leaks user data | Cross-tenant state visible | Investigation reveals data cross-correlation across tenants | Snapshot pre-request only; never after request data. |
| Pool size unbounded | Memory exhausts | Process OOM | LRU eviction; cap total pool memory. |
| reset_per_request missed | State leaks across requests within same tenant | Hard to detect; manifests as occasional "wrong context" responses | Test pool reuse with deliberately distinct requests; automated check for state clearing. |
| Per-arch build mismatch | Module fails to load on some hosts | Runtime errors | Build matrix in CI; test on each target arch. |
| AOT version skew | Module compiled with old Wasmtime fails on new | Loading errors after Wasmtime upgrade | Pin runtime + module-build versions together; rebuild on runtime upgrade. |
| Signature key rotation | Old artifacts no longer load | Loading errors after key rotation | Re-sign during rotation window; transition gracefully. |