Safe Module Termination with Wasmtime Epoch-Based Interruption
Problem
A Wasm module that enters an infinite loop, allocates unbounded memory, or performs a denial-of-service computation cannot be stopped from outside without terminating the entire host process. This is the fundamental challenge of executing untrusted code: the sandbox prevents the module from escaping, but it does not prevent the module from consuming unlimited host resources within the sandbox.
Wasmtime offers two mechanisms for interrupting runaway modules: fuel-based execution limits and epoch-based interruption. Fuel assigns a fixed instruction budget to each Store; when the budget is exhausted, the module traps. Epochs combine a counter that the host advances (usually from a background timer) with checks compiled into the generated machine code, so the host can end execution at safe points.
Each approach has different security properties and operational tradeoffs:
Fuel is deterministic: the same program always uses the same fuel for the same input, which makes it useful for CPU budgets and reproducible execution limits. But fuel meters only Wasm instructions. Time spent inside host functions is not charged. A module whose work happens mostly in host calls, or in one long-blocking host call, can therefore run far longer in wall-clock time than its fuel budget suggests. Fuel is also synchronous: it cannot be adjusted from a different thread while a module is running.
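Epochs are asynchronous and wall-clock-based. A background thread increments the engine's epoch counter at a configurable interval. Code compiled with epoch interruption checks, at each function entry and loop back-edge, whether the current epoch has reached the Store's deadline. If it has, execution traps at that safe point (or, if configured, runs a callback or yields to the async executor). This enables genuine wall-clock timeouts: “this module must complete within 5 seconds” holds even if the module spends most of its time sleeping in host function calls, because the deadline is checked as soon as each call returns to Wasm. It does not interrupt a host call that is still running, so a single host function that blocks indefinitely must be bounded by the host itself.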
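The following toy model in plain Rust makes the difference between fuel and epochs concrete. It is not how Wasmtime is implemented. The shared `AtomicU64` stands in for the engine's epoch counter, a per-iteration fuel charge stands in for fuel metering, and the names `Stop`, `toy_guest_loop` and `spawn_toy_ticker` are hypothetical. Under those assumptions it shows two properties: a host call burns wall-clock time without burning fuel, and the epoch check only runs once control is back in guest code.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Why the toy guest stopped (mirrors, but is not, Trap::Interrupt / Trap::OutOfFuel).
#[derive(Debug, PartialEq, Eq)]
pub enum Stop {
    Interrupt,
    OutOfFuel,
}

/// Toy "compiled Wasm loop": each iteration charges fuel for guest instructions,
/// makes a host call that sleeps (not charged), then checks the epoch at the back-edge.
pub fn toy_guest_loop(
    epoch: &AtomicU64,
    deadline_ticks: u64,
    mut fuel: u64,
    fuel_per_iteration: u64,
    host_sleep: Duration,
) -> Stop {
    // The deadline is relative to the epoch observed when it is set.
    let deadline = epoch.load(Ordering::Relaxed) + deadline_ticks;
    loop {
        // Fuel meters guest instructions only.
        if fuel < fuel_per_iteration {
            return Stop::OutOfFuel;
        }
        fuel -= fuel_per_iteration;
        // "Host call": consumes wall-clock time but no fuel.
        thread::sleep(host_sleep);
        // Loop back-edge: the epoch check runs only after the host call returns.
        if epoch.load(Ordering::Relaxed) >= deadline {
            return Stop::Interrupt;
        }
    }
}

/// Background "ticker" that advances the toy epoch counter forever.
pub fn spawn_toy_ticker(epoch: Arc<AtomicU64>, interval: Duration) {
    thread::spawn(move || loop {
        thread::sleep(interval);
        epoch.fetch_add(1, Ordering::Relaxed);
    });
}
```

With a generous fuel budget and short host sleeps, the loop ends with `Stop::Interrupt`. With a small fuel budget and a distant deadline, it ends with `Stop::OutOfFuel`. If `host_sleep` were longer than the whole deadline, the loop would still finish the sleep before noticing, which is how a real blocking host call behaves.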
The security gap in most Wasmtime deployments is that neither fuel nor epochs is enabled by default. With the default configuration, a module runs until it finishes, with no limit on execution time. For production use cases (edge functions, LLM agent tool plugins, user-provided Wasm execution, plugin systems), a single malicious or buggy module can therefore tie up host threads and CPU indefinitely, potentially affecting every other module served by the same process.
Epoch interruption is the right mechanism for production timeout enforcement, but it requires:
- A background thread that advances the epoch counter
- Per-Store epoch deadlines configured before execution
- Cooperative yield points for async execution contexts
- A timeout policy that matches the security requirements of each execution context
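Wasmtime’s epoch mechanism also interacts with the async execution model in subtle ways. Epoch checks run only while Wasm code is executing. A store configured to yield and extend its deadline never traps on its own. If either point is overlooked, the timeout silently stops being enforced.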
Target systems: any Wasmtime embedding that executes untrusted or third-party Wasm modules; edge function platforms; LLM agent tool plugins (see Wasm AI Plugin Sandboxing); plugin systems where module authors are external; Wasmtime ≥13.0.
Threat Model
Adversary 1 — Infinite loop denial of service. A malicious third-party plugin executes an infinite loop. Without epoch interruption, the host thread executing the module is permanently occupied. Other modules sharing the engine cannot be scheduled; the service degrades.
Adversary 2 — Exponential backtracking algorithm. A plugin performs a computation with exponential worst-case complexity (regex matching, JSON parsing of adversarial input, cryptographic operations on attacker-controlled parameters). Fuel limits are insufficient if the computation uses host-imported functions that don’t consume fuel. Without epoch interruption, the computation runs until the host OOMs or the process is killed.
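Adversary 3 — Sleep-based timeout evasion. A malicious module calls a host sleep function repeatedly, consuming wall-clock time while using minimal CPU. Fuel-based limits are evaded because sleep calls don’t consume fuel. Epoch ticks keep advancing regardless of CPU usage, so the deadline check that runs each time a sleep returns to Wasm catches this evasion. A single host call that never returns is out of reach of epochs and must be bounded by the host function itself.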
Adversary 4 — Async execution epoch bypass. A module is executed in an async context. Epoch checks run only at function entries and loop back-edges, that is, only while Wasm code is executing. Nothing in Wasmtime ends the call in two cases: while the call is suspended in an async host function, or if the store is configured to yield and extend its deadline instead of trapping. In either case an async execution slot can be held indefinitely.
Without epoch interruption, modules run indefinitely, enabling all four attacks. With epoch interruption properly configured, plus an executor-level timeout for async calls, every execution is bounded by a wall-clock deadline checked at safe instruction boundaries. When the deadline passes, the module traps cleanly (or, in async mode, the call is cancelled).
Configuration / Implementation
Step 1 — Enable epoch interruption in the engine
```rust
use wasmtime::{Config, Engine};

fn production_engine() -> Engine {
    let mut config = Config::new();
    // Enable epoch-based interruption: the compiler inserts epoch checks
    // at function entries and loop back-edges.
    config.epoch_interruption(true);
    // Also meter fuel as a CPU instruction budget (complementary control).
    // Note: every Store on this engine starts with zero fuel and traps with
    // Trap::OutOfFuel as soon as it runs Wasm, unless set_fuel() is called.
    config.consume_fuel(true);
    Engine::new(&config).expect("failed to create engine")
}
```
Step 2 — Start the epoch ticker
Epoch interruption requires a background thread that increments the epoch counter:
```rust
use std::sync::Arc;
use std::thread;
use std::time::Duration;
use wasmtime::Engine;

fn start_epoch_ticker(engine: Arc<Engine>, tick_interval: Duration) -> thread::JoinHandle<()> {
    thread::spawn(move || loop {
        thread::sleep(tick_interval);
        engine.increment_epoch();
    })
}

fn main() {
    let engine = Arc::new(production_engine());
    // Tick every 10ms: epoch checks fire at roughly 10ms granularity.
    // For tighter timeouts, use a shorter interval (at the cost of more wakeups).
    let _ticker = start_epoch_ticker(engine.clone(), Duration::from_millis(10));
    // ... run modules
}
```
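A ticker that never reports anything is hard to monitor. The sketch below uses only the standard library and shows one way to make the ticker stoppable and observable. It counts each tick so a watchdog can alert when the count stops growing. `TickerHandle` and `spawn_ticker` are hypothetical names. Taking a closure instead of an `Engine` is an assumption that keeps the sketch independent of Wasmtime.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Handle to a running ticker thread (hypothetical helper, not a wasmtime type).
pub struct TickerHandle {
    ticks: Arc<AtomicU64>,
    stop: Arc<AtomicBool>,
    thread: Option<JoinHandle<()>>,
}

impl TickerHandle {
    /// Ticks delivered so far; export this as a metric for a watchdog.
    pub fn ticks(&self) -> u64 {
        self.ticks.load(Ordering::Relaxed)
    }

    /// Signals the thread to stop and waits for it (up to one interval).
    pub fn stop(mut self) {
        self.stop.store(true, Ordering::Relaxed);
        if let Some(handle) = self.thread.take() {
            let _ = handle.join();
        }
    }
}

/// Calls `increment` every `interval` on a dedicated thread and counts each call.
pub fn spawn_ticker<F>(interval: Duration, increment: F) -> TickerHandle
where
    F: Fn() + Send + 'static,
{
    let ticks = Arc::new(AtomicU64::new(0));
    let stop = Arc::new(AtomicBool::new(false));
    let (ticks_in_thread, stop_in_thread) = (Arc::clone(&ticks), Arc::clone(&stop));
    let join_handle = thread::Builder::new()
        .name("epoch-ticker".into())
        .spawn(move || {
            while !stop_in_thread.load(Ordering::Relaxed) {
                thread::sleep(interval);
                increment();
                ticks_in_thread.fetch_add(1, Ordering::Relaxed);
            }
        })
        .expect("failed to spawn the epoch ticker thread");
    TickerHandle { ticks, stop, thread: Some(join_handle) }
}
```

In an embedding, the closure would capture a clone of the engine handle, for example `let e = engine.clone(); spawn_ticker(Duration::from_millis(10), move || e.increment_epoch())`. A watchdog then samples `ticks()` periodically and alerts if two consecutive samples are equal.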
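For async runtimes, a Tokio interval can drive the epoch instead. Unlike the dedicated OS thread above, this task only runs when the executor schedules it. If every worker thread is busy executing Wasm, or the runtime is `current_thread`, the epoch stops advancing and no deadline is ever reached. An async Wasm call that only yields at its deadline makes this a deadlock. Spawn it with `tokio::spawn` on a multi-threaded runtime, or keep the OS-thread ticker: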
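```rust
use std::sync::Arc;
use tokio::time::{interval, Duration};
use wasmtime::Engine;

// Spawn with: tokio::spawn(start_epoch_ticker_async(engine.clone(), Duration::from_millis(10)));
async fn start_epoch_ticker_async(engine: Arc<Engine>, tick_interval: Duration) {
    let mut ticker = interval(tick_interval);
    loop {
        // The first tick completes immediately; later ticks follow the interval.
        ticker.tick().await;
        engine.increment_epoch();
    }
}
```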
Step 3 — Set per-execution epoch deadlines
Each Store (execution context) must have an epoch deadline configured before the module runs:
```rust
use wasmtime::{Engine, Store};

struct ExecutionConfig {
    /// Wall-clock timeout as a number of epoch ticks.
    /// With a 10ms tick interval, 500 ticks is roughly 5 seconds.
    epoch_deadline: u64,
    /// Fuel budget: a guest-instruction budget that bounds CPU work
    /// independently of the wall-clock deadline.
    fuel_budget: u64,
}

fn execute_with_timeout<T: 'static>(
    engine: &Engine,
    config: ExecutionConfig,
    host_data: T,
    run: impl FnOnce(&mut Store<T>) -> anyhow::Result<()>,
) -> anyhow::Result<()> {
    let mut store = Store::new(engine, host_data);
    // The deadline is relative to the current epoch: trap once the epoch
    // has advanced `epoch_deadline` times from now.
    store.set_epoch_deadline(config.epoch_deadline);
    // Fuel budget (requires Config::consume_fuel(true)).
    store.set_fuel(config.fuel_budget)?;
    // What happens when the deadline is reached. Trapping is the default;
    // the call fails with Trap::Interrupt.
    store.epoch_deadline_trap();
    // Alternative: decide in a callback each time the deadline is reached, e.g.
    // store.epoch_deadline_callback(|_| Ok(UpdateDeadline::Continue(1)));
    // (that particular callback extends the deadline forever and never traps).
    run(&mut store)
}
```
```rust
// Usage: run a plugin's `process` export with a ~5-second wall-clock timeout.
// Assumes `engine`, `linker`, `module`, `wasi_context` and `input_ptr` are in scope,
// inside a function that returns anyhow::Result.
execute_with_timeout(
    &engine,
    ExecutionConfig {
        epoch_deadline: 500,     // 500 × 10ms ≈ 5 seconds
        fuel_budget: 10_000_000, // roughly 10M Wasm instructions
    },
    wasi_context,
    |store| {
        let instance = linker.instantiate(&mut *store, &module)?;
        let func = instance.get_typed_func::<(i32,), i32>(&mut *store, "process")?;
        let _output = func.call(&mut *store, (input_ptr,))?;
        Ok(())
    },
)?;
```
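The tick arithmetic in the usage comment is approximate. Assume ticks arrive exactly every interval I. The first tick after `set_epoch_deadline` can then land anywhere within one interval. A deadline of N ticks therefore expires between (N-1)·I and N·I of wall-clock time, plus the time it takes to reach the next check in guest code. A sleep-based ticker only drifts later. Here is a small sketch of that arithmetic. `deadline_window` and `ticks_guaranteeing` are hypothetical helpers, not Wasmtime APIs:

```rust
use std::time::Duration;

/// Wall-clock window in which a deadline of `ticks` expires, assuming exact
/// tick spacing and ignoring the latency until the next guest-side check.
pub fn deadline_window(ticks: u64, interval: Duration) -> (Duration, Duration) {
    assert!(ticks <= u64::from(u32::MAX), "tick count too large for this sketch");
    let t = ticks as u32;
    (interval * t.saturating_sub(1), interval * t)
}

/// Smallest tick count that guarantees at least `min_runtime` before the deadline can fire.
pub fn ticks_guaranteeing(min_runtime: Duration, interval: Duration) -> u64 {
    let per_tick = interval.as_nanos();
    assert!(per_tick > 0, "tick interval must be non-zero");
    // ceil(min_runtime / interval) full ticks, plus one for the partial first tick.
    let full = (min_runtime.as_nanos() + per_tick - 1) / per_tick;
    full as u64 + 1
}
```

So 500 ticks at 10ms guarantees a little under 5 seconds. A plugin that must be allowed a full 5 seconds needs 501 ticks.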
Step 4 — Handle epoch-triggered traps cleanly
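When an epoch deadline is reached and the store is set to trap, the call returns an error carrying Trap::Interrupt. Fuel exhaustion carries Trap::OutOfFuel instead. Distinguish these from other traps so that timeouts are reported as timeouts: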
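```rust
use wasmtime::{Store, Trap, TypedFunc};

/// Host state for a plugin (hypothetical; only the name is used here).
struct PluginState {
    plugin_name: String,
}

/// Errors surfaced to callers (hypothetical type).
#[derive(Debug)]
enum PluginError {
    Timeout,
    FuelExhausted,
    ExecutionError(String),
}

/// True if the error is the trap raised when the epoch deadline is reached.
fn is_epoch_timeout(err: &anyhow::Error) -> bool {
    matches!(err.downcast_ref::<Trap>(), Some(Trap::Interrupt))
}

/// True if the error is the trap raised when the fuel budget runs out.
fn is_fuel_exhausted(err: &anyhow::Error) -> bool {
    matches!(err.downcast_ref::<Trap>(), Some(Trap::OutOfFuel))
}

fn run_plugin_safely(
    store: &mut Store<PluginState>,
    func: TypedFunc<(i32,), i32>,
    input: i32,
) -> Result<i32, PluginError> {
    // Reborrow so `store` is still usable in the error arms below.
    match func.call(&mut *store, (input,)) {
        Ok(result) => Ok(result),
        Err(err) if is_epoch_timeout(&err) => {
            tracing::warn!(
                plugin = %store.data().plugin_name,
                "Plugin exceeded execution deadline"
            );
            Err(PluginError::Timeout)
        }
        Err(err) if is_fuel_exhausted(&err) => Err(PluginError::FuelExhausted),
        Err(err) => Err(PluginError::ExecutionError(err.to_string())),
    }
}
```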
Step 5 — Configure for async execution (Wasmtime async)
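In async contexts, the Store can yield to the executor at each epoch deadline instead of trapping, which keeps a busy module from monopolising a worker thread. Yielding alone never ends the call, so pair it with an executor-level timeout: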
```rust
use std::time::Duration;
use wasmtime::{Config, Engine, Linker, Module, Store};

fn async_engine_config() -> Config {
    let mut config = Config::new();
    config.async_support(true);
    config.epoch_interruption(true);
    // Whether reaching the deadline traps or yields to the async runtime
    // is chosen per Store (see below), not here.
    config
}

// Async execution with an epoch deadline
async fn execute_async_with_timeout(
    engine: &Engine,
    module: &Module,
    deadline_ticks: u64,
    wall_clock_limit: Duration,
) -> anyhow::Result<()> {
    let mut store = Store::new(engine, ());
    store.set_epoch_deadline(deadline_ticks);
    // Each time the deadline is reached, yield to the executor, then extend
    // the deadline by another `deadline_ticks`. This never traps by itself.
    store.epoch_deadline_async_yield_and_update(deadline_ticks);

    // This linker defines no imports, so only import-free modules instantiate.
    let linker: Linker<()> = Linker::new(engine);
    let instance = linker.instantiate_async(&mut store, module).await?;
    let func = instance.get_typed_func::<(), ()>(&mut store, "_start")?;

    // The Tokio timeout is what actually ends the call: it is observed at the
    // first yield after it expires, and the suspended call_async future is
    // dropped. Drop the store afterwards rather than reusing it.
    tokio::time::timeout(wall_clock_limit, func.call_async(&mut store, ())).await??;
    Ok(())
}
```
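A toy simulation in plain Rust can check the key point of this step: yield-and-update never ends a call by itself. `OnDeadline`, `Outcome` and `simulate` are hypothetical. They only mirror the documented behaviour of `epoch_deadline_trap` and `epoch_deadline_async_yield_and_update`, under the assumption that the update moves the deadline to the current epoch plus the given delta. The simulated guest never finishes on its own.

```rust
/// What happens when the deadline is reached (toy mirror of the two Store settings).
#[derive(Clone, Copy)]
pub enum OnDeadline {
    Trap,
    YieldAndUpdate(u64),
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Execution trapped when the epoch reached the deadline.
    Trapped { at_epoch: u64 },
    /// The guest was still running after all ticks, having yielded this many times.
    RanToEnd { yields: u64 },
}

/// Observes a never-finishing guest for `total_ticks` epoch ticks.
pub fn simulate(on_deadline: OnDeadline, initial_deadline: u64, total_ticks: u64) -> Outcome {
    let mut deadline = initial_deadline;
    let mut yields = 0;
    for epoch in 1..=total_ticks {
        if epoch >= deadline {
            match on_deadline {
                OnDeadline::Trap => return Outcome::Trapped { at_epoch: epoch },
                OnDeadline::YieldAndUpdate(delta) => {
                    yields += 1; // the executor gets control here...
                    deadline = epoch + delta; // ...then the deadline moves forward again
                }
            }
        }
    }
    Outcome::RanToEnd { yields }
}
```

With `Trap`, the guest stops at the deadline. With `YieldAndUpdate`, it is still running after any number of ticks, having handed control to the executor once per deadline. That handover is what gives `tokio::time::timeout` a chance to drop the call.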
Step 6 — Define a timeout policy per execution context
Document and enforce different timeout tiers for different plugin classes:
```rust
/// Timeout policy: match execution context to appropriate limits
pub struct TimeoutPolicy {
    pub epoch_deadline_ticks: u64,
    pub fuel_budget: u64,
    pub description: &'static str,
}

pub const INTERACTIVE_REQUEST: TimeoutPolicy = TimeoutPolicy {
    epoch_deadline_ticks: 100, // 1 second (at 10ms tick)
    fuel_budget: 1_000_000,    // 1M instructions
    description: "Interactive HTTP request: user-facing latency",
};

pub const BACKGROUND_JOB: TimeoutPolicy = TimeoutPolicy {
    epoch_deadline_ticks: 3000, // 30 seconds
    fuel_budget: 100_000_000,   // 100M instructions
    description: "Background processing: longer timeout acceptable",
};

pub const AI_TOOL_PLUGIN: TimeoutPolicy = TimeoutPolicy {
    epoch_deadline_ticks: 500, // 5 seconds
    fuel_budget: 10_000_000,   // 10M instructions
    description: "LLM agent tool: must complete within agent turn budget",
};

pub const UNTRUSTED_PLUGIN: TimeoutPolicy = TimeoutPolicy {
    epoch_deadline_ticks: 50, // 500ms: aggressive timeout for untrusted code
    fuel_budget: 100_000,     // 100K instructions
    description: "Third-party plugin: minimal resource budget",
};
```
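These constants hard-code a 10ms tick, so changing the ticker interval silently changes every timeout. One way to avoid that is to state each policy as a `Duration` and derive the tick count at compile time. The sketch below does this with hypothetical names (`TICK_INTERVAL`, `ticks_for`). Rounding up means the upper end of the deadline window is at least the stated timeout.

```rust
use std::time::Duration;

/// Tick interval shared by the ticker and every policy (assumed value).
pub const TICK_INTERVAL: Duration = Duration::from_millis(10);

/// Epoch ticks covering `timeout` at `tick_interval`, rounded up.
/// A zero interval is a bug and fails (at compile time in const contexts).
pub const fn ticks_for(timeout: Duration, tick_interval: Duration) -> u64 {
    let per_tick = tick_interval.as_nanos();
    ((timeout.as_nanos() + per_tick - 1) / per_tick) as u64
}

pub struct TimeoutPolicy {
    pub epoch_deadline_ticks: u64,
    pub fuel_budget: u64,
    pub description: &'static str,
}

pub const INTERACTIVE_REQUEST: TimeoutPolicy = TimeoutPolicy {
    epoch_deadline_ticks: ticks_for(Duration::from_secs(1), TICK_INTERVAL),
    fuel_budget: 1_000_000,
    description: "Interactive HTTP request: user-facing latency",
};

pub const UNTRUSTED_PLUGIN: TimeoutPolicy = TimeoutPolicy {
    epoch_deadline_ticks: ticks_for(Duration::from_millis(500), TICK_INTERVAL),
    fuel_budget: 100_000,
    description: "Third-party plugin: minimal resource budget",
};
```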
Step 7 — Monitor epoch interruption events
```rust
// Instrument epoch timeout events for alerting and capacity planning.
// The macro calls use the metrics crate's pre-0.22 syntax; from 0.22 on the
// equivalents are counter!("name", "plugin" => name).increment(1) and
// histogram!("name", "plugin" => name).record(value).
use metrics::{counter, histogram};
use wasmtime::{Store, TypedFunc};

fn instrumented_execute(
    store: &mut Store<PluginState>,
    func: TypedFunc<(i32,), i32>,
    input: i32,
) -> Result<i32, PluginError> {
    let start = std::time::Instant::now();
    let plugin_name = store.data().plugin_name.clone();
    let result = run_plugin_safely(store, func, input);
    let elapsed = start.elapsed();
    histogram!("plugin.execution_duration_ms",
        elapsed.as_millis() as f64,
        "plugin" => plugin_name.clone());
    match &result {
        Err(PluginError::Timeout) => {
            // Log before the label below takes ownership of `plugin_name`.
            tracing::warn!(plugin = %plugin_name, "Epoch timeout fired");
            counter!("plugin.epoch_timeouts_total", 1, "plugin" => plugin_name);
        }
        Err(PluginError::FuelExhausted) => {
            counter!("plugin.fuel_exhausted_total", 1, "plugin" => plugin_name);
        }
        Ok(_) => {
            counter!("plugin.executions_success_total", 1, "plugin" => plugin_name);
        }
        _ => {}
    }
    result
}
```
Expected Behaviour
| Signal | Without epoch interruption | With epoch interruption |
|---|---|---|
| Module in infinite loop | Host thread blocked indefinitely | Traps with Trap::Interrupt after deadline |
| 5-second timeout on plugin | Not enforced | Plugin traps after ~5 seconds regardless of CPU usage |
| Sleep-in-loop evasion | Evades fuel limits; runs indefinitely | Deadline checked each time a sleep returns to Wasm; enforced at the first check after it passes |
| Malicious plugin consuming all CPU | Other modules starved | Timeout fires; module terminated; other modules continue |
| Async module runs or stays suspended indefinitely | Never times out | Tokio-level timeout drops the call_async future; epoch yields let the timeout be observed while the guest is busy |
Verification:
```rust
#[test]
fn test_epoch_terminates_infinite_loop() {
    use std::sync::Arc;
    use std::time::{Duration, Instant};
    use wasmtime::{Instance, Module, Store};

    let engine = production_engine();
    let _ticker = start_epoch_ticker(Arc::new(engine.clone()), Duration::from_millis(10));

    // Wasm module with an infinite loop
    let module = Module::new(&engine, r#"
        (module
          (func $loop (loop br 0)) ;; infinite loop
          (export "_start" (func $loop)))
    "#).unwrap();

    let mut store = Store::new(&engine, ());
    // production_engine() enables fuel and a new Store has zero fuel; give it
    // effectively unlimited fuel so the epoch deadline, not fuel, ends the loop.
    store.set_fuel(u64::MAX).unwrap();
    store.set_epoch_deadline(10); // 10 ticks ≈ 100ms at a 10ms interval
    store.epoch_deadline_trap();

    let instance = Instance::new(&mut store, &module, &[]).unwrap();
    let func = instance.get_typed_func::<(), ()>(&mut store, "_start").unwrap();

    let start = Instant::now();
    let result = func.call(&mut store, ());
    let elapsed = start.elapsed();

    assert!(result.is_err(), "should have trapped");
    assert!(is_epoch_timeout(&result.unwrap_err()), "should be an epoch timeout, not another error");
    assert!(elapsed < Duration::from_millis(500),
        "should have terminated in < 500ms, took {:?}", elapsed);
    println!("PASS: infinite loop terminated in {:?}", elapsed);
}
```
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Epoch tick interval (10ms) | Fine-grained interruption; low overshoot | Background thread wakes 100 times per second per engine | Share one engine (and one ticker) per process; each tick is a single atomic increment, so the cost is mostly the wakeups |
| Epoch vs fuel | Fuel is deterministic; epochs track wall-clock time; both are needed | Fuel doesn’t count time spent in host functions; epoch deadlines have ±1 tick granularity | Use both: fuel for a CPU budget, epochs for a wall-clock timeout |
| epoch_deadline_async_yield_and_update | Executor regains control at each deadline, so other tasks keep running | Never traps by itself; needs an executor-level timeout to end the call | Use in async runtimes together with tokio::time::timeout; use epoch_deadline_trap for synchronous execution |
| Per-Store deadline configuration | Fine-grained control per execution context | Requires setting a deadline on every Store | Define a TimeoutPolicy struct with named constants as shown; apply it at Store creation time |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Epoch ticker thread exits | Modules no longer time out; infinite loops can run indefinitely | Have the ticker export its own tick counter or last-tick timestamp; alert if it stops advancing | Restart the ticker thread; run a watchdog that verifies the counter is advancing |
| epoch_interruption not enabled in Config | set_epoch_deadline has no effect; modules never time out | Run an infinite-loop module in tests; it should trap within the deadline | Verify config.epoch_interruption(true) is called before the engine is created |
| Yield-and-update deadline without an outer timeout | Async module yields at each deadline and then resumes; it never traps | Execution-duration metrics exceed the intended limit; no Trap::Interrupt is ever recorded | Wrap call_async in tokio::time::timeout, or use epoch_deadline_trap if yielding is not needed; on async-enabled stores always use call_async (the synchronous call panics there) |
| Deadline too short for legitimate computation | Plugin traps mid-execution; legitimate requests fail | Plugin execution metrics show timeout rate on known-good inputs | Benchmark worst-case legitimate execution time; set deadline to 3× P99 execution time |
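The last mitigation in the table, a deadline of 3× the P99 execution time, can be computed from the recorded durations of known-good runs. Below is a minimal sketch that assumes the nearest-rank percentile definition and a fixed tick interval. `percentile` and `deadline_from_p99` are hypothetical helpers.

```rust
use std::time::Duration;

/// Nearest-rank percentile of observed durations, for p in (0, 100].
pub fn percentile(samples: &[Duration], p: f64) -> Option<Duration> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Nearest-rank method: rank = ceil(p / 100 * n), 1-based.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}

/// Epoch ticks for "3x P99 of known-good runs", rounded up to whole ticks.
pub fn deadline_from_p99(samples: &[Duration], tick_interval: Duration) -> Option<u64> {
    let budget = percentile(samples, 99.0)? * 3;
    let per_tick = tick_interval.as_nanos();
    if per_tick == 0 {
        return None;
    }
    Some(((budget.as_nanos() + per_tick - 1) / per_tick) as u64)
}
```

For 100 known-good runs taking 1ms to 100ms, P99 is 99ms, so the budget is 297ms, which rounds up to 30 ticks at a 10ms interval.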
Related Articles
- Wasm Fuel Metering — the complementary fuel-based resource limit that bounds CPU instruction count
- Wasmtime Production Hardening — comprehensive Wasmtime configuration covering memory limits, epoch interruption, and security flags
- Wasm AI Plugin Sandboxing — epoch interruption is the primary DoS defence for Wasm-sandboxed LLM tool plugins
- User-Provided Wasm Execution — executing untrusted Wasm safely; epoch interruption is a core requirement
- Wasmtime WASI Resource Limits — WASI-level resource limits that complement epoch interruption for I/O-bound modules