Running User-Provided WASM Safely: Sandboxing Untrusted Customer Code
Problem
User-provided code execution is the hardest surface to secure on any platform. When a SaaS product lets customers upload and run their own logic — a data-pipeline transform, a game-mod script, a billing-rules plugin, a CI test runner — the platform accepts an adversarial artefact and executes it with real compute and real access to downstream systems.
WebAssembly is now the standard substrate for this pattern. Its linear-memory model, explicit import surface, and no-ambient-authority design make it far safer than native plugins or Docker containers for untrusted third-party code. But the sandbox is only as strong as its configuration. A default Wasmtime embedding with no resource limits and no import restrictions provides almost no additional protection over a bare function call. Making it safe requires deliberate design across six distinct layers.
The use cases driving this problem are concrete and widespread:
- SaaS extensibility platforms (think Shopify Functions, Figma plugins, Stripe rule engines) where merchants or developers upload business logic that runs inside the product.
- Data pipelines that allow users to supply transformation or filtering functions applied to their own data streams.
- Game-modding platforms where players upload WASM modules that execute inside the game engine or simulation loop.
- Low-code/no-code builders that generate or accept WASM-compiled logic attached to workflow nodes.
Each of these has the same threat surface: a module of unknown provenance executes on shared infrastructure, with some channel into your platform’s internal services.
The specific failure modes in a default user-code embedding:
- No CPU bound. A malicious or infinite-looping module occupies a thread indefinitely. Even with async runtimes, enough such modules saturate the thread pool.
- Unrestricted host import access. If host functions for file I/O, outbound network, or internal API calls are linked into the module’s linker without restriction, any user module can call them.
- No memory ceiling. A module requesting 2 GB of linear memory will get it if the host has it available, at the expense of every other tenant on the box.
- No cross-tenant isolation. Two tenants’ modules may share an
Engine, a compilation cache, or a host-side resource table. A compromise or misbehaviour in one module can affect the other. - Output passed downstream unchecked. A module that returns crafted output exploiting downstream SQL parsers, template engines, or deserialisers becomes an injection vector even if it never escapes the WASM sandbox.
Target systems: Wasmtime 22+ embedded as a Rust library. The patterns apply directly to WasmEdge 0.14+ and wazero 1.7+ with equivalent APIs. Examples are in Rust; the Wasmtime C API and Python/Go bindings expose the same resource-limit hooks.
Threat Model
User-supplied WASM introduces a distinct set of adversarial goals compared to trusted first-party code. Understanding them precisely is the prerequisite for choosing controls.
Sandbox escape. The most serious goal. A module that can break out of its linear-memory isolation and read or write host-process memory can exfiltrate secrets, overwrite control structures, and achieve arbitrary code execution on the host. WASM’s memory model makes this hard by design — linear memory is fully isolated — but the sandbox is only the first barrier. Vulnerabilities in the JIT compiler, in the host-function boundary (type confusion, use-after-free in host-side memory passed to the module), or in the seccomp profile can open escape paths. Defense in depth is the response: the WASM sandbox is the primary layer; OS-level seccomp is the fallback.
Denial of service. A module with no CPU or memory cap can exhaust either until the worker process becomes unresponsive. DoS does not require a sandbox escape; it is effective at the application layer. An adversary with module-upload access who wants to damage your platform can upload a module containing loop {} and call it in a tight loop from multiple accounts. Resource limits per execution — fuel, epoch interrupts, memory ceiling — are the direct countermeasure.
Data exfiltration via allowed channels. This is the subtler threat. If a module has legitimate access to a KV store, a database query result, or a user’s file list, it can encode that data in an allowed outbound channel — an HTTP request body, a log message, a numeric return value — and extract it to an attacker-controlled destination. The WASM sandbox does not prevent this; the module is using permitted APIs. The countermeasure is per-tenant allowlisting of outbound destinations, rate-limiting of outbound calls, and audit logging of every host function invocation.
Covert channels. Two tenant modules that cannot directly communicate may be able to use shared platform resources as a covert channel. Tenant A’s module measures the latency of a KV get call; Tenant B’s module writes and deletes keys rapidly to modulate that latency. This is a classical covert channel through a shared resource. Mitigations include per-tenant resource pools for sensitive backends, artificial response-time jitter, and rate limiting that prevents the high-frequency writes needed to sustain a covert channel.
Output injection into downstream systems. A module that cannot escape the sandbox may still attack downstream systems through its return values. If a module that processes user data returns a string, and that string is interpolated into a SQL query or rendered in an HTML template without escaping, the module has achieved SQL injection or XSS despite never touching the host filesystem or network. Output validation — schema checking, type enforcement, string sanitisation — is the control.
Access level across all adversaries: module-upload and module-invocation access via the normal customer API. They do not have host-process access, host-filesystem access, or cross-tenant memory access. Those are the properties the platform must enforce, not assume.
Blast radius: With a correctly hardened embedding, the blast radius of a compromised or malicious module is bounded to the offending tenant’s single invocation. It traps on a resource limit, returns an error, and the next tenant’s invocation is unaffected. Without hardening, a single module upload can take down the entire execution tier.
Configuration
Step 1: Pre-Execution Validation — Inspect the Module Before Accepting It
The earliest defensive layer runs before the module is persisted or ever executed. Static analysis of the WASM binary at upload time catches dangerous patterns before they become runtime risk. Two properties matter: structural validity and import surface.
// module_validator.rs
use wasmparser::{Parser, Payload, WasmFeatures};
use std::collections::HashSet;
/// Imports that are never permitted for user-supplied modules.
/// These names should never appear in user code; they are internal
/// host functions not exposed through the public linker. If a module
/// imports them, it was crafted specifically to probe the platform.
const FORBIDDEN_IMPORTS: &[(&str, &str)] = &[
("env", "__platform_internal_key"),
("env", "exec_command"),
("wasi_snapshot_preview1", "sock_accept"),
("wasi_snapshot_preview1", "sock_open"),
("wasi_snapshot_preview1", "path_open"),
("wasi_snapshot_preview1", "fd_write"),
];
pub fn validate_user_module(
wasm: &[u8],
allowed_imports: &AllowedImports,
) -> Result<(), ValidationError> {
// Size gate first — fast, no parsing needed.
// A 50 MiB WASM binary is unusual; 500 MiB is an attack.
if wasm.len() > 50 * 1024 * 1024 {
return Err(ValidationError::ModuleTooLarge(wasm.len()));
}
// Structural validity. wasmparser checks the binary format and
// type-correctness of the module before we touch it further.
let features = WasmFeatures {
threads: false, // Disallow shared memory; cross-instance comms vector.
multi_memory: false, // No multiple linear memories.
memory64: false, // 64-bit linear memory; rarely needed, broadens attack surface.
relaxed_simd: false, // Less-audited SIMD path.
exceptions: false, // Exception-handling proposal adds interface complexity.
gc: false, // GC; not yet stable for untrusted use.
..WasmFeatures::default()
};
let mut validator = wasmparser::Validator::new_with_features(features);
validator
.validate_all(wasm)
.map_err(|e| ValidationError::MalformedModule(e.to_string()))?;
// Import surface audit.
// Every import the module declares must appear in the allowed set.
// Any import not on the allowlist is an immediate reject.
let mut declared_imports: Vec<(String, String)> = Vec::new();
for payload in Parser::new(0).parse_all(wasm) {
if let Ok(Payload::ImportSection(reader)) = payload {
for import in reader {
let imp = import
.map_err(|e| ValidationError::ParseError(e.to_string()))?;
declared_imports.push((imp.module.to_string(), imp.name.to_string()));
}
}
}
for (module, name) in &declared_imports {
// Hard-blocked: crafted probes for internal functions.
if FORBIDDEN_IMPORTS.contains(&(module.as_str(), name.as_str())) {
return Err(ValidationError::ForbiddenImport {
module: module.clone(),
name: name.clone(),
});
}
// Not on the platform's public allowlist.
if !allowed_imports.permits(module, name) {
return Err(ValidationError::UnpermittedImport {
module: module.clone(),
name: name.clone(),
});
}
}
Ok(())
}
The AllowedImports value is not derived from the module itself — it is the platform’s explicit list of functions that have been reviewed and approved for user code. Anything outside that list is rejected, regardless of whether it looks harmless.
Validation runs synchronously in the upload handler, before the module is stored. A module that fails validation is never persisted and never executed.
// upload_handler.rs
pub async fn handle_upload(
tenant_id: TenantId,
wasm_bytes: Bytes,
allowed_imports: &AllowedImports,
) -> Result<ModuleId, UploadError> {
// Validate before storing.
validate_user_module(&wasm_bytes, allowed_imports)
.map_err(UploadError::ValidationFailed)?;
// Hash the validated bytes. The stored artefact is identified by content hash;
// re-executing with the same hash is guaranteed to re-execute the validated bytes.
let module_id = ModuleId::from_sha256(&wasm_bytes);
store_module(tenant_id, module_id, &wasm_bytes).await?;
// Pre-compile asynchronously. Subsequent executions load the .cwasm artifact;
// they do not pay compilation cost on the request path.
let engine = platform_engine();
let bytes = wasm_bytes.clone();
tokio::task::spawn_blocking(move || {
let module = Module::new(&engine, &bytes)?;
let cwasm = module.serialize()?;
write_cwasm_artifact(tenant_id, module_id, &cwasm)
})
.await??;
Ok(module_id)
}
Step 2: Execution Isolation — One Store Per Execution, No Cross-Tenant State
Cross-tenant isolation starts at the Wasmtime object hierarchy. An Engine shares a JIT compilation cache across all Module and Store objects that use it. A Store holds per-execution state: linear memory, table entries, fuel accounting. The isolation properties follow from this hierarchy.
For the strongest cross-tenant isolation, give each tenant their own Engine. Their compiled code cache is separate; a malicious module cannot poison the cache of another tenant.
// tenant_runtime.rs
use wasmtime::{Config, Engine};
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
pub struct TenantEngine {
pub engine: Engine,
}
impl TenantEngine {
pub fn new(tenant_id: &str) -> anyhow::Result<Self> {
let mut config = Config::new();
// CPU-limiting mechanisms — both enabled.
config.consume_fuel(true);
config.epoch_interruption(true);
// Feature surface: disable everything not needed for user code.
config.wasm_threads(false);
config.wasm_multi_memory(false);
config.wasm_memory64(false);
config.wasm_relaxed_simd(false);
config.wasm_exceptions(false);
config.wasm_gc(false);
config.wasm_reference_types(true); // Safe; needed by component model.
config.wasm_bulk_memory(true); // Safe.
config.wasm_simd(true); // Standard SIMD; audited.
// Per-tenant compilation cache. Tenant A's artifacts cannot
// interfere with Tenant B's at the filesystem layer.
let cache_toml = format!("/var/cache/wasm-platform/{tenant_id}/cache.toml");
let _ = config.cache_config_load(&cache_toml);
Ok(Self {
engine: Engine::new(&config)?,
})
}
}
pub struct EnginePool {
engines: RwLock<HashMap<String, Arc<TenantEngine>>>,
}
impl EnginePool {
pub fn get_or_create(&self, tenant_id: &str) -> anyhow::Result<Arc<TenantEngine>> {
if let Some(e) = self.engines.read().unwrap().get(tenant_id) {
return Ok(e.clone());
}
let mut w = self.engines.write().unwrap();
if let Some(e) = w.get(tenant_id) {
return Ok(e.clone());
}
let te = Arc::new(TenantEngine::new(tenant_id)?);
w.insert(tenant_id.to_string(), te.clone());
Ok(te)
}
}
Each execution creates a fresh Store. No per-execution state leaks across invocations — the store is dropped at the end of the call. This is the core principle: a Store is not a long-lived object that one tenant’s module reuses. It is created immediately before the call and dropped immediately after. Any state that needs to persist between calls lives in the host, scoped to the tenant, and is accessed only through the approved host function surface.
Step 3: Resource Limits — Fuel, Memory, Stack, and Tables
Every resource dimension needs a hard limit. A module that exhausts one limit should trap cleanly without affecting other executions.
// execution.rs
use wasmtime::{Store, ResourceLimiter, Module, Linker, Engine};
struct ExecutionLimits {
max_memory_bytes: usize, // Linear memory ceiling.
max_table_entries: u32, // Function-pointer table ceiling.
max_instances: usize, // Nested instance count.
}
impl ResourceLimiter for ExecutionLimits {
fn memory_growing(
&mut self,
_current: usize,
desired: usize,
_max: Option<usize>,
) -> anyhow::Result<bool> {
Ok(desired <= self.max_memory_bytes)
}
fn table_growing(
&mut self,
_current: u32,
desired: u32,
_max: Option<u32>,
) -> anyhow::Result<bool> {
Ok(desired <= self.max_table_entries)
}
fn instances(&self) -> usize { self.max_instances }
fn tables(&self) -> usize { 4 }
fn memories(&self) -> usize { 1 }
}
pub async fn execute_user_module(
engine: &Engine,
cwasm_path: &Path,
linker: &Linker<HostState>,
input: &[u8],
) -> anyhow::Result<Vec<u8>> {
let limits = ExecutionLimits {
max_memory_bytes: 32 * 1024 * 1024, // 32 MiB — tune per workload tier.
max_table_entries: 4096,
max_instances: 1,
};
let mut store = Store::new(engine, HostState::new(limits));
store.limiter(|s| &mut s.limits);
// Fuel grant: platform-tier-specific.
// 50M fuel units ≈ roughly 5–10 seconds of compute depending on workload.
store.set_fuel(50_000_000)?;
// Epoch deadline: 5 ticks. If the platform epoch thread fires every 50ms,
// this caps wall-clock to ~250ms regardless of fuel.
store.set_epoch_deadline(5);
// Load the pre-compiled artifact — no JIT on the request path.
let module = unsafe { Module::deserialize_file(engine, cwasm_path)? };
let instance = linker.instantiate(&mut store, &module)?;
let run = instance.get_typed_func::<(u32, u32), u32>(&mut store, "run")?;
let output_ptr = run.call(&mut store, write_input(&mut store, input)?)?;
Ok(read_output(&store, output_ptr))
}
The epoch-incrementing thread is started once per process, shared across all engines:
// main.rs — start once at process init.
fn start_epoch_ticker(engines: Vec<Engine>) {
std::thread::Builder::new()
.name("epoch-ticker".into())
.spawn(move || loop {
std::thread::sleep(Duration::from_millis(50));
for engine in &engines {
engine.increment_epoch();
}
})
.expect("epoch ticker thread must start");
}
Using both fuel and epoch interrupts together is intentional. Fuel provides precise metering useful for billing and fair accounting; it counts every WASM operation. Epoch interrupts provide a wall-clock hard deadline that fires regardless of which operations the module is executing. A module that finds a way to consume fuel slowly — long pauses between operations, for instance — still hits the epoch deadline.
Step 4: Restricted Import Surface — Only Approved Host APIs
The linker controls what host functions user code can call. Build the user-module linker from a fixed, audited set. Do not use wasmtime_wasi::add_to_linker — that grants the full WASI surface. Define each permitted function explicitly.
// host_functions.rs
use wasmtime::{Caller, Linker};
pub fn build_user_linker(
engine: &Engine,
tenant: &Tenant,
) -> anyhow::Result<Linker<HostState>> {
let mut linker: Linker<HostState> = Linker::new(engine);
// Logging: structured output only. No filesystem paths, no raw byte dumps.
let tenant_id = tenant.id.clone();
linker.func_wrap(
"platform",
"log",
move |mut caller: Caller<'_, HostState>, msg_ptr: u32, msg_len: u32| {
let msg = read_string_bounded(&caller, msg_ptr, msg_len, 2048)?;
// Strip control characters before handing to logger.
let msg = msg.chars().filter(|c| !c.is_control()).collect::<String>();
caller.data_mut().audit_log.push(AuditEntry::ModuleLog {
tenant: tenant_id.clone(),
message: msg,
});
Ok(())
},
)?;
// KV store: tenant-scoped. The tenant prefix is injected by the host;
// the module cannot address another tenant's keys.
let tenant_prefix = format!("tenant/{}/", tenant.id);
linker.func_wrap(
"platform",
"kv_get",
move |mut caller: Caller<'_, HostState>,
key_ptr: u32,
key_len: u32,
out_ptr: u32| {
let key = read_string_bounded(&caller, key_ptr, key_len, 256)?;
let scoped_key = format!("{tenant_prefix}{key}");
let value = caller.data_mut().kv.get(&scoped_key)?;
write_bytes(&mut caller, out_ptr, &value)
},
)?;
// What is NOT linked:
// - Any filesystem access (path_open, fd_read, fd_write).
// - Any WASI socket or network primitives.
// - Any process-control (proc_exit).
// - Any random source beyond approved get_random.
// - Any internal platform API not on the public surface.
Ok(linker)
}
The key discipline is maintaining the allowlist as a positive list, not a blocklist. Every function that is linked is there because it was explicitly reviewed and approved. No function is linked by default. Adding a new host function requires a code review that assesses what module authors can do with it and whether the access it grants is proportionate to the use case.
Step 5: Network Isolation — No Outbound Connections Unless Explicitly Granted
User modules get no network access unless explicitly granted per-tenant through a platform configuration knob — not through WASI sockets, not through host functions. This is a default-deny policy enforced by the linker: if wasi:sockets is not linked and no HTTP host function is registered, the module has no network path regardless of what it declares in its imports.
For tenants on tiers that legitimately need outbound HTTP, the host function enforces an allowlist that is stored in the platform database — not derived from the module:
// Wire this in only for tenants whose tier includes outbound HTTP.
// Tenants on the base tier get no http_fetch function at all.
if tenant.tier.allows_outbound_http() {
let allowed_hosts: Arc<HashSet<String>> = tenant.allowed_hosts.clone();
linker.func_wrap(
"platform",
"http_fetch",
move |mut caller: Caller<'_, HostState>,
url_ptr: u32,
url_len: u32,
body_ptr: u32,
body_len: u32,
out_ptr: u32| {
let url = read_string_bounded(&caller, url_ptr, url_len, 2048)?;
let parsed = url::Url::parse(&url)
.map_err(|_| anyhow::anyhow!("invalid URL"))?;
// Must be HTTPS — no plaintext exfiltration paths.
if parsed.scheme() != "https" {
return Err(anyhow::anyhow!("only https allowed"));
}
// Host must be on the per-tenant allowlist.
let host = parsed.host_str().unwrap_or("");
if !allowed_hosts.contains(host) {
caller
.data_mut()
.metrics
.blocked_network_attempt(host);
return Err(anyhow::anyhow!("host not permitted: {host}"));
}
let body = read_bytes_bounded(&caller, body_ptr, body_len, 64 * 1024)?;
let response = caller
.data()
.http_client
.post(&url)
.body(body)
.send()?;
write_bytes(&mut caller, out_ptr, &response.bytes()?)
},
)?;
}
Network isolation is absolute for tenants on the base tier. For tenants with allows_outbound_http, the allowed hosts list is reviewed at tier-upgrade time and stored in the platform database, not in the module itself. A module cannot expand its own outbound allowlist by returning a different value; the list is immutable from the module’s perspective.
Step 6: Output Validation Before Downstream Use
WASM modules that return crafted output can attack downstream systems even if they never escape the sandbox. Validate every output before passing it to any downstream consumer.
// output_validation.rs
pub fn validate_module_output(
output: &[u8],
expected_schema: &OutputSchema,
) -> Result<ValidatedOutput, OutputError> {
// Size check — before any deserialization.
if output.len() > expected_schema.max_output_bytes {
return Err(OutputError::TooLarge(output.len()));
}
// Parse against the declared schema. User modules produce JSON or a
// platform-specific binary format. Reject anything that does not
// parse cleanly.
let value: serde_json::Value = serde_json::from_slice(output)
.map_err(|e| OutputError::InvalidJson(e.to_string()))?;
// Schema validation — required fields, type constraints, value ranges.
expected_schema
.validate(&value)
.map_err(OutputError::SchemaMismatch)?;
// Sanitise string fields before downstream use.
// The downstream system (template renderer, SQL builder, etc.) must
// apply its own escaping; the platform applies a second pass here.
let sanitised = sanitise_string_fields(&value);
Ok(ValidatedOutput(sanitised))
}
Output validation is not optional. The validated output type is distinct from raw bytes; downstream functions only accept ValidatedOutput, not &[u8]. This makes it a compile-time error to pass unvalidated module output to a downstream consumer. The schema is versioned alongside the module API: when a module author changes the shape of what they return, they update the declared schema and the platform re-validates.
Step 7: Multi-Layer Isolation — WASM Sandbox Plus OS seccomp
The WASM sandbox is a software boundary. Defence in depth requires an OS-level boundary underneath it. If a Wasmtime bug allows a module to escape linear-memory isolation — a JIT compiler vulnerability, a host-function boundary confusion — the seccomp profile is the next barrier. It cannot fix the escape, but it prevents the module from using that escape to do anything useful.
Wrap the module execution worker process in a seccomp-BPF profile that allows only the syscalls Wasmtime itself needs:
# seccomp-profile.yaml — used via libseccomp or Kubernetes seccomp support.
# Derived from a Wasmtime execution baseline using strace/seccomp-tools.
defaultAction: SCMP_ACT_KILL_PROCESS
architectures:
- SCMP_ARCH_X86_64
- SCMP_ARCH_AARCH64
syscalls:
- names:
[ read, write, mmap, mprotect, munmap, brk, clone3,
futex, nanosleep, clock_gettime, gettid, getpid,
epoll_wait, epoll_ctl, eventfd2, close, openat,
fstat, newfstatat, getrandom, rt_sigaction,
rt_sigprocmask, exit_group, tgkill, sigaltstack ]
action: SCMP_ACT_ALLOW
# Wasmtime JIT requires mmap with PROT_EXEC for compiled code.
# Restrict: MAP_ANONYMOUS only, not file-backed mappings.
- names: [mmap]
action: SCMP_ACT_ALLOW
args:
- index: 3
value: 34 # MAP_ANON | MAP_PRIVATE
op: SCMP_CMP_EQ
If the Wasmtime runtime ever executes a path that tries to call execve, socket, connect, or open with a real filesystem path, the kernel kills the process immediately. The module’s WASM trap is already contained by Wasmtime; the seccomp profile contains any bug in Wasmtime itself.
On Kubernetes, attach the seccomp profile to the Pod spec:
apiVersion: v1
kind: Pod
metadata:
name: user-wasm-worker
spec:
securityContext:
seccompProfile:
type: Localhost
localhostProfile: profiles/wasm-worker.json
containers:
- name: worker
image: registry.internal/wasm-worker:1.0.0
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 10001
capabilities:
drop: [ALL]
The container’s root filesystem is read-only, no capabilities are granted, and privilege escalation is denied. The seccomp profile is applied at the pod level. The combination means that a module which somehow escapes the WASM sandbox is still constrained to the kernel API surface that Wasmtime needs for normal operation — network, filesystem, and process-creation calls are all blocked.
Step 8: Audit Logging Per Execution
Every module execution generates an audit record. Security incidents involving user-provided code are impossible to investigate without an immutable, per-execution log. At minimum, record: which tenant, which module (by content hash), when, how much CPU and memory was consumed, whether a trap occurred and why, and which host functions were called.
// audit.rs
#[derive(Serialize)]
pub struct ExecutionAuditRecord {
pub execution_id: Uuid,
pub tenant_id: String,
pub module_id: String, // Identifier from the platform's module registry.
pub module_hash: String, // SHA-256 of the WASM bytes; ties the record to exact bytes.
pub invoked_at: DateTime<Utc>,
pub wall_clock_ms: u64,
pub fuel_consumed: u64,
pub fuel_budget: u64,
pub memory_peak_bytes: usize,
pub trap_kind: Option<String>, // None = clean exit; "fuel_exhausted", "epoch_deadline", etc.
pub output_bytes: usize,
pub output_validation_status: ValidationStatus,
pub blocked_network_attempts: Vec<String>,
pub host_function_calls: Vec<HostFunctionCall>,
}
pub async fn emit_audit(record: ExecutionAuditRecord) {
// Write to the append-only audit log. The audit store is separate from
// the operational database; it uses a WORM policy and is not accessible
// from user-module execution workers.
AUDIT_SINK.emit(serde_json::to_string(&record).unwrap()).await;
// Emit metrics for real-time alerting.
metrics::counter!(
"user_wasm_executions_total",
"tenant" => record.tenant_id.clone(),
"trap_kind" => record.trap_kind.clone().unwrap_or_default(),
)
.increment(1);
metrics::histogram!(
"user_wasm_fuel_consumed",
"tenant" => record.tenant_id,
)
.record(record.fuel_consumed as f64);
}
Key metrics for operational alerting:
user_wasm_executions_total{tenant, trap_kind} counter
user_wasm_fuel_consumed{tenant} histogram
user_wasm_memory_peak_bytes{tenant} histogram
user_wasm_execution_wall_ms{tenant} histogram
user_wasm_blocked_network_attempts_total{tenant, host} counter
user_wasm_validation_rejected_total{tenant, reason} counter
user_wasm_output_validation_failed_total{tenant} counter
Alert thresholds:
user_wasm_executions_total{trap_kind="fuel_exhausted"}spiking for a tenant — the module is hitting its CPU budget on every call; investigate for intentional DoS or runaway logic.user_wasm_blocked_network_attempts_totalnon-zero — a module attempted an outbound connection to a host not on the allowlist; treat as a probable exfiltration attempt and review the module.user_wasm_validation_rejected_totalnon-zero — a module upload was rejected at the import-surface stage; review the rejected import list.user_wasm_output_validation_failed_totalnon-zero — a module produced output that failed schema validation; the module may be attempting output injection.
Expected Behaviour
| Signal | Unprotected embedding | Hardened embedding |
|---|---|---|
Module with loop {} |
Worker thread hangs indefinitely | Epoch deadline fires within ~250ms; execution traps; next request proceeds normally |
| Module requests 2 GB memory | Allocates; worker OOMs | memory_growing returns false; module traps with MemoryGrowError |
Module imports platform.exec_command |
Executes if linked | Pre-execution validation rejects at upload; module never stored |
| Module imports a host function not on allowlist | Available if carelessly linked | Validation rejects at upload; linker does not expose it regardless |
| Tenant A module reads Tenant B data via KV | Possible if KV is unscoped | KV host function injects tenant prefix; Tenant A cannot address Tenant B’s key namespace |
| Module attempts outbound TCP to exfiltrate | Succeeds via WASI sockets | WASI sockets not linked; seccomp kills any socket(2) call that reaches the kernel |
| Module returns SQL injection payload in output | Injected downstream | Output validation schema-checks and sanitises all string fields before downstream use |
| Module triggers JIT compilation pause on execution | All concurrent requests pause | Module pre-compiled at upload time; execution loads .cwasm with no JIT pause |
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Per-tenant Engine | JIT cache isolation between tenants | ~10–50 MiB engine overhead per tenant | Use per-tenant engines for paid/production tiers; share an engine for free/eval tiers with extra runtime monitoring |
| Pre-execution import validation | Catches dangerous modules before execution | Adds ~50–200ms to the upload path | Run validation asynchronously after a fast size check; defer full parse to async worker |
Explicit linker (no add_to_linker) |
Minimum attack surface; no accidental WASI exposure | More code to maintain; each new API must be wired explicitly | Use a host-function registry pattern; unit-test each host function’s access control independently |
| Output schema validation | Stops injection attacks on downstream systems | Schema maintenance overhead | Version the output schema alongside the module API; store the expected schema hash with each module version |
| seccomp profile | OS-level defence in depth; contains Wasmtime bugs | Profile maintenance as Wasmtime’s syscall profile changes across versions | Build the profile from a baseline audit on each major Wasmtime upgrade; test with seccomp-tools trace |
| Fuel + epoch (both) | Fuel provides metering; epoch provides wall-clock hard deadline | Fuel adds ~5–15% per-operation overhead | For latency-critical workloads, use epoch-only with a tight tick interval; reserve fuel for billing-grade accounting |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Epoch ticker thread dies | Modules run past their deadline; trap_kind metric stops recording epoch_deadline |
user_wasm_executions_total{trap_kind="epoch_deadline"} drops to zero while module count stays nonzero |
Supervise the epoch thread with a watchdog; restart the worker process if the ticker is not observed within 2 ticks |
| Import allowlist too permissive | A newly-added host function exposes capability user modules should not have | Security review of linker registration code; blocked_network_attempts is silent because the call is allowed |
Review the linker build function on every PR touching host functions; require a security review annotation |
| seccomp profile missing after Wasmtime upgrade | Wasmtime uses a new syscall; worker process killed on legitimate execution | Execution failure rate spikes; SIGKILL in process logs |
Run the seccomp trace audit on new Wasmtime versions in staging before promoting |
| Output schema not updated with module API | Legitimate module output fails validation after a module update | user_wasm_output_validation_failed_total rises; customer reports failures |
Version the schema alongside the module; deploy both together; roll back the module if schema cannot be updated |
| Covert channel via timing | Two tenants’ modules exchange information through latency of shared KV or HTTP backends | Difficult to detect without statistical analysis of call timing across tenants | Rate-limit all host function calls per tenant; add jitter to host function response times; use separate host-function pools per tenant for high-security deployments |
| Cache directory permissions misconfigured | Tenant A’s compiled .cwasm artifact is readable by Tenant B’s worker |
File permission audit reveals world-readable cache paths | Own each tenant’s cache directory with a per-tenant UID; enforce mode 0700 on the directory |
| Pre-compilation fails silently | Module upload succeeds but execution always falls back to JIT; compilation pauses spike latency | p99 latency anomaly on first execution per module version | Alert on pre-compilation failures at upload time; surface the error to the platform operator |
Managed Alternatives
Building this stack in-house requires sustained investment: host-function auditing, seccomp maintenance, quota infrastructure, output validation schemas, and ongoing Wasmtime version tracking. The managed alternatives shift that burden to the provider:
- Cloudflare Workers: isolate-per-request, managed multi-tenant WASM platform; Cloudflare maintains the seccomp and isolation stack.
- Fastly Compute: Wasmtime-based with platform-managed resource limits; suitable for pipeline and edge-logic use cases.
- Fermyon Cloud: Spin-based managed hosting; handles multi-tenant isolation for plugin-style workloads.
Build in-house when your use case has compliance or data-residency requirements that preclude managed hosting, when you need custom host functions not available on managed platforms, or when you need per-tenant billing at granular fuel resolution.