WASM Multi-Tenancy Patterns: Resource Quotas, Fair Scheduling, and Tenant Isolation Failures
Problem
Running multiple tenants’ WASM workloads in a single runtime instance is the hard case for WASM platforms. Single-tenant Wasmtime hardening (covered in Wasmtime Production Hardening) bounds one workload. Multi-tenant adds three orthogonal concerns:
- Fairness: one tenant’s workload should not starve another’s. Naive per-call resource caps still allow one greedy tenant to dominate by submitting many concurrent calls.
- Isolation: beyond linear-memory boundaries, tenants share JIT-compiled code caches, host-side resource tables, and network connections to shared host services. Each shared resource is an isolation surface.
- Predictability: in a multi-tenant system, latency for tenant B depends on what tenant A is doing. Production SLAs require bounded interference.
By 2026, multi-tenant WASM is mainstream: Cloudflare Workers, Fastly Compute, Spin running in shared Kubernetes Pods, wasmCloud lattice deployments. The hardening contract for these platforms is significantly more involved than its single-tenant counterpart.
The specific gaps in a default multi-tenant Wasmtime embedding:
- All tenants share one `Engine`. The compiled-code cache is shared; cache poisoning by a malicious tenant could affect others.
- Per-`Store` resource limiters apply per call, not per tenant aggregate. Tenant A submitting 1000 concurrent calls, each at the cap, consumes 1000× the cap.
- Host functions called by WASM modules touch shared backends (database, KV store, message queue). Without per-tenant quotas at the host-function layer, a tenant can saturate shared backends.
- Scheduling is FIFO across calls. A tenant with a high arrival rate gets the same time slice as one with infrequent calls.
- JIT-compilation pauses block all calls in the same `Engine` during the pause. A tenant uploading a new module triggers latency spikes for all tenants.
This article covers per-tenant resource accounting, fair scheduling across tenants, host-function quotas, isolated Engine instances per trust boundary, and the failure modes that violate tenant isolation.
Target systems: Wasmtime 22+ embedded in a multi-tenant control plane; Spin platform 2.6+, wasmCloud 1.2+, Fastly Compute and Cloudflare Workers (managed analogs).
Threat Model
- Adversary 1 — Greedy tenant: a tenant submits unusually high call rates, each within per-call caps but cumulatively starving others.
- Adversary 2 — Malicious tenant attempting cross-tenant attack: crafted module trying to read another tenant’s module memory, poison shared cache, or consume tokens / credentials issued by host functions.
- Adversary 3 — Compromised module attempting JIT-cache poisoning: module designed to exercise codepaths that trigger compilation; hopes to land malicious code in the cache that other modules then execute.
- Adversary 4 — Resource-exhaustion attack on host functions: module calls a shared host function (database write, network egress) at high rate to exhaust the host’s downstream capacity.
- Access level: Adversary 1 has standard tenant API. Adversary 2 has module-upload. Adversary 3 has module-upload + knowledge of the runtime’s compilation behavior. Adversary 4 has standard API.
- Objective: Degrade service for other tenants, cross tenant boundaries, exhaust shared resources.
- Blast radius: Without per-tenant quotas, one tenant can affect every other. With proper accounting, blast radius is bounded to the offending tenant’s quota; other tenants see no degradation.
Configuration
Pattern 1: Per-Tenant Resource Aggregate Quotas
Rather than (or in addition to) per-call limits, track per-tenant totals across a sliding window.
```rust
// per_tenant_quota.rs
use std::collections::HashMap;
use std::sync::Arc;
use std::time::{Duration, Instant};

use parking_lot::Mutex;

#[derive(Debug)]
pub enum QuotaError {
    UnknownTenant,
    ConcurrencyLimit,
    CpuLimit,
}

pub struct TenantQuota {
    cpu_seconds_per_window: f64,
    memory_max_bytes: usize,
    concurrent_calls_max: usize,
    egress_bytes_per_window: u64,
    window: Duration,
}

pub struct TenantState {
    cpu_used_in_window: f64,
    window_started: Instant,
    concurrent_calls: usize,
    memory_in_use: usize,
    egress_bytes_in_window: u64,
}

fn default_state() -> TenantState {
    TenantState {
        cpu_used_in_window: 0.0,
        window_started: Instant::now(),
        concurrent_calls: 0,
        memory_in_use: 0,
        egress_bytes_in_window: 0,
    }
}

pub struct TenantTracker {
    quotas: HashMap<String, TenantQuota>,
    state: Arc<Mutex<HashMap<String, TenantState>>>,
}

impl TenantTracker {
    pub fn admit_call(&self, tenant_id: &str) -> Result<CallGuard, QuotaError> {
        // Reject unknown tenants before creating any state for them.
        let q = self.quotas.get(tenant_id).ok_or(QuotaError::UnknownTenant)?;
        let mut state = self.state.lock();
        let s = state.entry(tenant_id.to_string()).or_insert_with(default_state);
        // Roll the window if expired.
        if s.window_started.elapsed() >= q.window {
            s.cpu_used_in_window = 0.0;
            s.egress_bytes_in_window = 0;
            s.window_started = Instant::now();
        }
        if s.concurrent_calls >= q.concurrent_calls_max {
            return Err(QuotaError::ConcurrencyLimit);
        }
        if s.cpu_used_in_window >= q.cpu_seconds_per_window {
            return Err(QuotaError::CpuLimit);
        }
        s.concurrent_calls += 1;
        // The guard decrements the concurrency count when dropped.
        Ok(CallGuard {
            tenant_id: tenant_id.into(),
            tracker: self.state.clone(),
        })
    }

    pub fn record_cpu(&self, tenant_id: &str, seconds: f64) {
        let mut state = self.state.lock();
        if let Some(s) = state.get_mut(tenant_id) {
            s.cpu_used_in_window += seconds;
        }
    }
}

pub struct CallGuard {
    tenant_id: String,
    tracker: Arc<Mutex<HashMap<String, TenantState>>>,
}

impl Drop for CallGuard {
    fn drop(&mut self) {
        if let Some(s) = self.tracker.lock().get_mut(&self.tenant_id) {
            s.concurrent_calls = s.concurrent_calls.saturating_sub(1);
        }
    }
}
```
Wrap every WASM call site with admit_call → execute → record_cpu. A tenant exceeding any quota dimension gets rejected at the boundary; other tenants’ calls proceed.
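The admit → execute → record loop can be exercised with a simplified, std-only tracker. This is an illustrative sketch, not the article's `TenantTracker`: it tracks only the CPU and concurrency dimensions and uses `std::sync::Mutex` instead of `parking_lot`.

```rust
// Simplified per-tenant quota tracker (sketch): CPU + concurrency only.
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

#[derive(Default)]
struct State {
    cpu_used: f64,
    concurrent: usize,
    window_started: Option<Instant>,
}

struct Tracker {
    cpu_per_window: f64,
    max_concurrent: usize,
    window: Duration,
    state: Arc<Mutex<HashMap<String, State>>>,
}

impl Tracker {
    fn admit(&self, tenant: &str) -> Result<(), &'static str> {
        let mut map = self.state.lock().unwrap();
        let s = map.entry(tenant.to_string()).or_default();
        // Roll the window if it expired (or was never started).
        let now = Instant::now();
        match s.window_started {
            Some(t) if now.duration_since(t) < self.window => {}
            _ => {
                s.cpu_used = 0.0;
                s.window_started = Some(now);
            }
        }
        if s.concurrent >= self.max_concurrent {
            return Err("concurrency");
        }
        if s.cpu_used >= self.cpu_per_window {
            return Err("cpu");
        }
        s.concurrent += 1;
        Ok(())
    }

    // Called when the WASM call finishes, with the CPU time it consumed.
    fn finish(&self, tenant: &str, cpu_seconds: f64) {
        let mut map = self.state.lock().unwrap();
        if let Some(s) = map.get_mut(tenant) {
            s.concurrent = s.concurrent.saturating_sub(1);
            s.cpu_used += cpu_seconds;
        }
    }
}

fn main() {
    let t = Tracker {
        cpu_per_window: 1.0,
        max_concurrent: 2,
        window: Duration::from_secs(60),
        state: Default::default(),
    };
    assert!(t.admit("a").is_ok());
    assert!(t.admit("a").is_ok());
    assert_eq!(t.admit("a"), Err("concurrency")); // third concurrent call rejected
    t.finish("a", 1.5); // burns the whole CPU budget for this window
    assert_eq!(t.admit("a"), Err("cpu")); // rejected until the window rolls
    println!("ok");
}
```

Note the asymmetry: concurrency is charged at admission and released on drop, while CPU is only known (and charged) after the call completes, which is why a burst can slightly overshoot the CPU quota before rejections begin.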
Pattern 2: Fair Scheduling Across Tenants
When the system is at capacity, who waits and who runs? FIFO favors high-volume tenants. Use a weighted-fair queue.
```rust
// fair_scheduler.rs
// Per-tenant FIFO; round-robin across tenants weighted by quota.
use std::collections::{HashMap, VecDeque};
use std::time::Instant;

use tokio::sync::Notify;

pub struct Job {
    pub tenant_id: String,
    // Payload fields elided.
}

pub struct FairScheduler {
    tenant_queues: HashMap<String, VecDeque<Job>>,
    tenant_weights: HashMap<String, u32>,
    last_served: HashMap<String, Instant>,
    notify: Notify,
}

impl FairScheduler {
    pub fn submit(&mut self, job: Job) {
        self.tenant_queues
            .entry(job.tenant_id.clone())
            .or_default()
            .push_back(job);
        self.notify.notify_one();
    }

    pub async fn next(&mut self) -> Job {
        // Pick the tenant whose deficit (weight × time-since-served) is highest.
        loop {
            let mut best: Option<(&String, f64)> = None;
            for (tid, queue) in &self.tenant_queues {
                if queue.is_empty() {
                    continue;
                }
                let weight = *self.tenant_weights.get(tid).unwrap_or(&1);
                // A never-served tenant starts with zero deficit and is picked
                // as soon as its wait outgrows everyone else's score.
                let last = self.last_served.get(tid).copied().unwrap_or_else(Instant::now);
                let score = last.elapsed().as_secs_f64() * weight as f64;
                if best.map_or(true, |(_, s)| score > s) {
                    best = Some((tid, score));
                }
            }
            if let Some((tid, _)) = best {
                let tid = tid.clone();
                let job = self.tenant_queues.get_mut(&tid).unwrap().pop_front().unwrap();
                self.last_served.insert(tid, Instant::now());
                return job;
            }
            // No queued jobs anywhere; wait for the next submit.
            self.notify.notified().await;
        }
    }
}
```
Two tenants with weight 1 receive equal time-shares; a tenant with weight 5 gets 5× the time-share of a weight-1 tenant. Quotas remain hard caps; weights only determine priority within those caps.
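The scoring rule is easy to verify in isolation. Below is a hypothetical standalone helper (not part of the scheduler above) that applies the same rule: given each tenant's weight and time since it was last served, pick the highest weight × wait product.

```rust
// Standalone deficit-score picker (illustrative helper).
// candidates: (tenant_id, weight, seconds_since_last_served)
fn pick<'a>(candidates: &'a [(&'a str, u32, f64)]) -> Option<&'a str> {
    candidates
        .iter()
        .max_by(|a, b| {
            let sa = a.1 as f64 * a.2; // weight × wait
            let sb = b.1 as f64 * b.2;
            sa.partial_cmp(&sb).unwrap()
        })
        .map(|c| c.0)
}

fn main() {
    // "bulk" has weight 1 and waited 2.0 s (score 2.0); "premium" has weight 5
    // and waited 0.5 s (score 2.5) — premium runs next despite waiting less.
    let next = pick(&[("bulk", 1, 2.0), ("premium", 5, 0.5)]);
    assert_eq!(next, Some("premium"));
    println!("{next:?}");
}
```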
Pattern 3: Per-Tenant Engines for Trust Boundaries
Sharing one Engine (and therefore one compilation cache) across all tenants is fast but couples them. For the highest isolation, give each tenant — or each trust class — its own Engine.
```rust
use std::collections::HashMap;
use std::path::PathBuf;

use wasmtime::{Config, Engine};

struct TenantRuntime {
    engine: Engine,
    cache_dir: PathBuf,
}

fn get_or_create_runtime<'a>(
    runtimes: &'a mut HashMap<String, TenantRuntime>,
    tenant_id: &str,
) -> &'a TenantRuntime {
    runtimes.entry(tenant_id.to_string()).or_insert_with(|| {
        let mut config = Config::new();
        // Each tenant gets its own on-disk compilation cache.
        config
            .cache_config_load(format!("/var/cache/wasmtime/{tenant_id}/cache.toml"))
            .expect("cache config");
        config.consume_fuel(true);
        config.epoch_interruption(true);
        // Per-tenant compilation costs; engine creation can be parallelized.
        let engine = Engine::new(&config).expect("engine");
        TenantRuntime {
            engine,
            cache_dir: format!("/var/cache/wasmtime/{tenant_id}").into(),
        }
    })
}
```
Each tenant’s compiled .cwasm artifacts live in their own cache directory. A malicious tenant cannot poison cache entries for others.
The trade-off: more memory (each engine has its own JIT data structures, roughly 10-50 MB) and each tenant pays its own first-compile cost. For 100 tenants, that is 1-5 GB of engine overhead — acceptable for high-trust separation.
Pattern 4: Host-Function Quotas
Host functions reach shared backends (databases, KV, message queues, the network). A tenant calling db.query(...) 1M times consumes shared backend capacity.
Wrap host functions with per-tenant quotas:
```rust
async fn instrumented_db_query(
    mut caller: Caller<'_, Host>,
    tenant_id: &str, // from host state / the instance's closure, never from the guest
    query_ptr: u32,
    query_len: u32,
) -> Result<u32, Error> {
    // Per-tenant rate limit for this specific host function.
    let quota = caller
        .data()
        .tenant_tracker
        .host_function_quota(tenant_id, "db.query")?;
    quota.consume_or_reject(1)?;
    // Per-call timeout: a slow database must not consume the tenant's CPU budget.
    let query = read_string(&caller, query_ptr, query_len)?;
    let result = tokio::time::timeout(Duration::from_millis(500), caller.data().db.query(query))
        .await??;
    Ok(write_result(&mut caller, result)?)
}
```
The host_function_quota is a per-tenant token bucket: tenant gets N tokens per second for db.query, refills naturally. Bursts are bounded.
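A minimal token bucket of the kind `consume_or_reject` implies might look like the following. This is a sketch; the struct and method names are illustrative, not a platform API.

```rust
// Per-(tenant, host-function) token bucket: burst bounded by `capacity`,
// sustained rate bounded by `refill_per_sec`.
use std::time::Instant;

struct TokenBucket {
    capacity: f64,
    refill_per_sec: f64,
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Self {
            capacity,
            refill_per_sec,
            tokens: capacity, // start full: a fresh tenant may burst
            last_refill: Instant::now(),
        }
    }

    fn consume_or_reject(&mut self, n: f64) -> Result<(), &'static str> {
        // Refill lazily based on elapsed time, capped at capacity.
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= n {
            self.tokens -= n;
            Ok(())
        } else {
            Err("rate limit")
        }
    }
}

fn main() {
    let mut b = TokenBucket::new(2.0, 1.0); // burst of 2, refills 1 token/s
    assert!(b.consume_or_reject(1.0).is_ok());
    assert!(b.consume_or_reject(1.0).is_ok());
    assert!(b.consume_or_reject(1.0).is_err()); // bucket empty until refill
    println!("ok");
}
```

Lazy refill keeps the bucket allocation-free and avoids a background timer per tenant; state is touched only on the call path.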
Pattern 5: JIT Compilation Pause Mitigation
Wasmtime’s compilation can briefly pause execution of other modules in the same engine. For latency-critical multi-tenant workloads:
- Pre-compile modules at upload time, never on first request.
- Use `Config::strategy(Strategy::Cranelift)` with `Config::cranelift_opt_level(OptLevel::None)` to favor compile speed over execution speed when compile pauses dominate.
- Run compilation in a dedicated thread pool, distinct from the request-handling pool.
```rust
// Pre-compile asynchronously at upload time; serve from cache when ready.
let module_bytes = upload_module(tenant_id, &wasm_bytes).await?;
let engine = engine.clone(); // Engine clones share the underlying state cheaply.
tokio::task::spawn_blocking(move || -> anyhow::Result<()> {
    let module = Module::new(&engine, &module_bytes)?;
    let cwasm = module.serialize()?;
    fs::write(
        format!("/var/cache/wasmtime/{tenant_id}/{module_id}.cwasm"),
        cwasm,
    )?;
    Ok(())
})
.await??;

// Subsequent invocations load the precompiled artifact.
// SAFETY: deserialization trusts the artifact; load only files this host wrote.
let module = unsafe { Module::deserialize_file(&engine, &cwasm_path)? };
```
Pattern 6: Telemetry Per Tenant
Every metric carries the tenant label.
```
wasm_tenant_invocations_total{tenant, module}            counter
wasm_tenant_cpu_seconds_total{tenant}                    counter
wasm_tenant_memory_pages{tenant, module}                 gauge
wasm_tenant_quota_rejected_total{tenant, reason}         counter
wasm_tenant_concurrent_calls{tenant}                     gauge
wasm_tenant_egress_bytes_total{tenant, target}           counter
wasm_tenant_host_function_calls_total{tenant, function}  counter
wasm_tenant_jit_compile_seconds_total{tenant, module}    counter
```
Build per-tenant dashboards. Anomaly alerts:
- `wasm_tenant_quota_rejected_total{reason="cpu_limit"}` rising: tenant is hitting their CPU quota repeatedly. Either move them up a tier or investigate the behavior.
- `wasm_tenant_cpu_seconds_total` for one tenant comparable to total system CPU: a single tenant is dominating; verify the quota assignment is correct.
- `wasm_tenant_jit_compile_seconds_total` correlated with latency spikes: compilation is on the request path; pre-compile.
Expected Behaviour
| Signal | Single-engine, no quotas | Per-tenant quotas + fair scheduler |
|---|---|---|
| Tenant A submits 10x normal load | Other tenants slow | Tenant A throttled to their quota; others unaffected |
| Tenant A uploads new large module | All tenants pause during JIT compile | Compile in background; other tenants unaffected |
| Tenant A’s module attempts cache poisoning | Could succeed if implementation buggy | Per-tenant cache directory; impossible |
| Tenant A’s host calls saturate database | Database queue grows for everyone | Tenant A’s calls rejected at host-function quota; database unaffected |
| Tenant A’s quota limit reached | Effectively unbounded | Hard rejection; tenant sees 429 quota_exceeded |
| Latency for Tenant B during Tenant A spike | Variable | Bounded by Tenant B’s own quota; insensitive to Tenant A |
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Per-tenant aggregate quotas | Bounds noisy-neighbor effects | Tracking and accounting overhead | Implement in a single tenant-tracker module shared across all hot paths. |
| Fair scheduling | Predictable latency for low-volume tenants | More complex than FIFO | Use existing libraries (tokio’s JoinSet with scheduler hooks); avoid hand-rolling. |
| Per-tenant engines | Strong isolation; per-tenant cache | Memory overhead per tenant | Use only for high-trust separation (paid customers); shared engine for free-tier where cost-per-tenant matters. |
| Host-function quotas | Bounds backend resource use | Each host function needs explicit quota | Centralize quota logic in a wrapper; apply via macros or codegen. |
| Pre-compilation at upload | Eliminates JIT-on-request latency | Upload is slower; modules retained even if never invoked | Acceptable; upload is a less latency-sensitive operation. |
| Per-tenant telemetry | Detection of anomalies | Metric cardinality grows with tenant count | High-cardinality tenant labels are expensive in some TSDBs (Prometheus, Mimir); use exemplars rather than full per-tenant breakdown for high-volume metrics. |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Quota lookup race / TOCTOU | Tenant briefly exceeds quota during concurrent admission | `wasm_tenant_cpu_seconds_total` exceeds the configured quota | Use atomic operations (compare-and-swap) for quota updates, or a single-writer thread per tenant. |
| Per-tenant cache directory permissions wrong | Cross-tenant cache read possible | OS-level file permission check fails | Own each tenant-cache directory by a per-tenant UID (combined with user namespaces); enforce mode 0700. |
| Fair scheduler livelock | One tenant's queue is never served | Per-tenant queue depth grows unbounded for a specific tenant | Usually a bug in the scoring logic. Add a maximum-wait deadline to each job; jobs exceeding it are rejected rather than stranded. |
| Host-function quota too low | Legitimate tenant hits quota during normal use | Tenant complaints; metrics show a high quota-rejected rate from a known-good tenant | Profile real workloads; size quotas to 1.5× the 99th-percentile peak. |
| Quota exhaustion attack | Tenant deliberately burns their own quota, denying service to themselves | Rare and self-inflicted; quota-rejected rate spikes for one tenant | Keep the quota refresh window short so the tenant recovers naturally; rate-limit account-level lockouts. |
| Tenant grows beyond their tier | Production load now exceeds quota; legitimate degradation | Quota-rejected rate sustained over hours | Move to higher tier; trigger billing alert; do not silently raise quota for a tenant. |
| JIT-cache corruption | Specific tenant module always traps after a partial-write event | Crash with cache-related errors | Per-tenant cache; delete the affected directory; force recompile. Use atomic rename for cache writes. |
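The atomic-rename recovery in the last row can be sketched as a write-to-temp-then-rename helper. Paths and names below are illustrative; the key invariant is that readers only ever see a fully written `.cwasm` file.

```rust
// Crash-safe cache writes: write to a temp file, fsync, then rename into place.
use std::fs;
use std::io::Write;
use std::path::Path;

fn write_cache_atomically(dir: &Path, name: &str, bytes: &[u8]) -> std::io::Result<()> {
    let tmp = dir.join(format!(".{name}.tmp"));
    let dst = dir.join(name);
    let mut f = fs::File::create(&tmp)?;
    f.write_all(bytes)?;
    f.sync_all()?; // flush to disk before rename so a crash never leaves a torn file
    fs::rename(&tmp, &dst)?; // atomic on POSIX filesystems within one mount
    Ok(())
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir();
    write_cache_atomically(&dir, "mod.cwasm", b"\0asm-cache")?;
    assert_eq!(fs::read(dir.join("mod.cwasm"))?, b"\0asm-cache".to_vec());
    println!("ok");
    Ok(())
}
```

Because temp file and destination live in the same per-tenant directory, the rename never crosses a mount point, and a partially written file can only ever exist under the hidden `.tmp` name.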
When to Consider a Managed Alternative
Building a multi-tenant WASM platform in-house requires quota infrastructure, fair scheduling, per-tenant engines, telemetry, and ongoing tuning (15-30 hours/month for a platform team).
- Cloudflare Workers: isolate-based, multi-tenant by design.
- Fastly Compute: Wasmtime-based with platform-managed isolation.
- Fermyon Cloud: Spin-based managed multi-tenant platform.
- Wasmer Edge: managed multi-tenant WASM hosting.