Safe Module Termination with Wasmtime Epoch-Based Interruption

Problem

A Wasm module that enters an infinite loop, allocates unbounded memory, or performs a denial-of-service computation cannot be stopped from outside without terminating the entire host process. This is the fundamental challenge of executing untrusted code: the sandbox prevents the module from escaping, but it does not prevent the module from consuming unlimited host resources within the sandbox.

Wasmtime offers two mechanisms for interrupting runaway modules: fuel-based execution limits and epoch-based interruption. Fuel assigns a fixed instruction budget to a module execution; when exhausted, the module traps. Epochs use a background timer and check points in the generated machine code to allow the host to request termination at safe points.

Each approach has different security properties and operational tradeoffs:

Fuel is deterministic: the same program always uses the same fuel for the same input. It is useful for CPU budgets and reproducible execution limits. But fuel counts only Wasm instructions: time spent inside host functions consumes no fuel, so a loop around slow host calls can occupy a thread for a long wall-clock time on a small fuel budget. Fuel is also synchronous; it cannot be adjusted from a different thread while a module is running.

Epochs are asynchronous and wall-clock-based. A background thread increments an epoch counter at a configurable interval. Each Wasm function prologue (and loop back-edge) checks whether the current epoch has reached the store’s deadline. If so, the module traps at a safe instruction boundary. This enables wall-clock timeouts: “this module must complete within 5 seconds” is enforced even if the module spends most of its time in host function calls, with one caveat. The check runs only in compiled Wasm code, so a deadline that passes during a host call takes effect at the first check after that call returns; a host function that can block indefinitely needs its own timeout.
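The check described above can be pictured with a small standard-library model. This is only a sketch: `run_guest_loop` and `Interrupted` are hypothetical names, the atomic counter stands in for the engine-wide epoch, and the assumption that the trap fires once the counter reaches the deadline is a simplification of what the generated code does.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the trap raised when the deadline has passed.
#[derive(Debug, PartialEq)]
pub struct Interrupted {
    pub iterations_completed: u64,
}

/// Models a guest loop that performs an epoch check on every iteration.
/// `epoch` plays the role of the engine-wide counter and `deadline` the
/// per-store deadline; neither is a real Wasmtime type.
pub fn run_guest_loop(
    epoch: &AtomicU64,
    deadline: u64,
    iterations: u64,
) -> Result<u64, Interrupted> {
    let mut done = 0;
    while done < iterations {
        // The check itself is one atomic load and one comparison.
        if epoch.load(Ordering::Relaxed) >= deadline {
            return Err(Interrupted { iterations_completed: done });
        }
        done += 1; // stands in for the loop body
    }
    Ok(done)
}
```

The point of the model is that nothing interrupts the loop from outside: the guest only stops because its own code looks at the counter, which is why time spent outside compiled Wasm code is not covered by the check.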

The security gap in most Wasmtime deployments is that neither fuel nor epochs are configured by default. The default configuration runs modules indefinitely with no resource limit. For production use cases — edge functions, LLM agent tool plugins, user-provided Wasm execution, plugin systems — this means a single malicious or buggy module can exhaust a host process’s resources, potentially affecting all other modules running in the same engine.

Epoch interruption is the right mechanism for production timeout enforcement, but it requires:

A background thread that advances the epoch counter
Per-module epoch deadlines configured before execution
Cooperative yield points for async execution contexts
A timeout policy that matches the security requirements of each execution context

Wasmtime’s epoch mechanism interacts with the async execution model in subtle ways that, if misconfigured, can cause epoch checks to be skipped entirely — making the timeout ineffective.

Target systems: any Wasmtime embedding that executes untrusted or third-party Wasm modules; edge function platforms; LLM agent tool plugins (see Wasm AI Plugin Sandboxing); plugin systems where module authors are external; Wasmtime ≥13.0.

Threat Model

Adversary 1 — Infinite loop denial of service. A malicious third-party plugin executes an infinite loop. Without epoch interruption, the host thread executing the module is permanently occupied. Other modules sharing the engine cannot be scheduled; the service degrades.

Adversary 2 — Exponential backtracking algorithm. A plugin performs a computation with exponential worst-case complexity (regex matching, JSON parsing of adversarial input, cryptographic operations on attacker-controlled parameters). Fuel limits are insufficient if the computation uses host-imported functions that don’t consume fuel. Without epoch interruption, the computation runs until the host OOMs or the process is killed.

Adversary 3 — Sleep-based timeout evasion. A malicious module calls a host sleep function repeatedly, consuming wall-clock time while using minimal CPU. Fuel-based limits are evaded because sleep calls don’t consume fuel. The epoch counter advances regardless of CPU usage, so the deadline is detected at the first epoch check after a sleep call returns, provided each individual host sleep is bounded.

Adversary 4 — Async execution epoch bypass. A module is executed in an async context. The epoch check is inserted at function prologues and loop back-edges. If the async yield points are not configured correctly, the epoch check may not fire during async-suspended states, allowing indefinite suspension of an async execution slot.

Without epoch interruption: modules run indefinitely, enabling all four attacks. With epoch interruption properly configured, and host functions that return in bounded time: all executions are bounded by a wall-clock deadline; the deadline fires at safe instruction boundaries; the module traps cleanly.

Configuration / Implementation

Step 1 — Enable epoch interruption in the engine

use wasmtime::{Engine, Config};

fn production_engine() -> Engine {
    let mut config = Config::new();
    
    // Enable epoch-based interruption
    // This adds epoch check instructions to the compiled code
    config.epoch_interruption(true);
    
    // Also enable fuel for CPU instruction budget (complementary control)
    config.consume_fuel(true);
    
    Engine::new(&config).expect("Failed to create engine")
}

Step 2 — Start the epoch ticker

Epoch interruption requires a background thread that increments the epoch counter:

use std::sync::Arc;
use std::time::Duration;
use std::thread;
use wasmtime::Engine;

fn start_epoch_ticker(engine: Arc<Engine>, tick_interval: Duration) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        loop {
            thread::sleep(tick_interval);
            engine.increment_epoch();
        }
    })
}

fn main() {
    let engine = Arc::new(production_engine());
    
    // Tick every 10ms — epoch checks fire at roughly 10ms granularity
    // For tighter timeouts, use a shorter interval (at the cost of more overhead)
    let _ticker = start_epoch_ticker(engine.clone(), Duration::from_millis(10));
    
    // ... run modules
}
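The ticker above runs until the process exits. Where the engine has a shorter lifetime than the process (tests, hot reload), a ticker that can be stopped may be preferable. The following is a minimal sketch using only the standard library; `Ticker` is a hypothetical wrapper, and the `tick` closure is where a real embedding would call `engine.increment_epoch()`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical wrapper that owns the ticker thread and stops it on request.
pub struct Ticker {
    stop: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
}

impl Ticker {
    /// Calls `tick` once per `interval` until `stop` is called.
    /// In a real embedding `tick` would be `move || engine.increment_epoch()`.
    pub fn start(interval: Duration, tick: impl Fn() + Send + 'static) -> Self {
        let stop = Arc::new(AtomicBool::new(false));
        let stop_flag = Arc::clone(&stop);
        let handle = thread::spawn(move || {
            while !stop_flag.load(Ordering::Relaxed) {
                thread::sleep(interval);
                tick();
            }
        });
        Ticker { stop, handle: Some(handle) }
    }

    /// Signals the thread to exit and waits for it to finish.
    pub fn stop(mut self) {
        self.stop.store(true, Ordering::Relaxed);
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}
```

Stopping the ticker while stores are still executing removes their timeout, so `stop` belongs at the very end of engine shutdown.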

For async runtimes, use a Tokio interval instead:

use std::sync::Arc;
use tokio::time::{interval, Duration};
use wasmtime::Engine;

async fn start_epoch_ticker_async(engine: Arc<Engine>, tick_interval: Duration) {
    let mut ticker = interval(tick_interval);
    loop {
        ticker.tick().await;
        engine.increment_epoch();
    }
}

Step 3 — Set per-execution epoch deadlines

Each Store (execution context) must have an epoch deadline configured before the module runs:

use wasmtime::{Store, Engine};
use wasmtime_wasi::WasiCtx;

struct ExecutionConfig {
    /// Wall-clock timeout as number of epoch ticks
    /// With 10ms tick interval, deadline=500 means 5 seconds
    epoch_deadline: u64,
    /// Fuel budget — instructions before trap (covers tight CPU loops between epoch checks)
    fuel_budget: u64,
}

fn execute_with_timeout<T>(
    engine: &Engine,
    config: ExecutionConfig,
    host_data: T,
    run: impl FnOnce(&mut Store<T>) -> anyhow::Result<()>,
) -> anyhow::Result<()> {
    let mut store = Store::new(engine, host_data);
    
    // Set epoch deadline relative to current epoch
    // deadline=N means: trap if epoch has advanced N times since now
    store.set_epoch_deadline(config.epoch_deadline);
    
    // Set fuel budget (older Wasmtime releases expose `add_fuel` instead of `set_fuel`)
    store.set_fuel(config.fuel_budget)?;
    
    // Configure what happens when epoch deadline is reached
    // Options: trap (default) or callback
    store.epoch_deadline_trap(); // Trap with "interrupt" error
    // Alternative: store.epoch_deadline_callback(|_| Ok(UpdateDeadline::Continue(1)));
    
    run(&mut store)
}

// Usage: execute a plugin with a 5-second wall-clock timeout
execute_with_timeout(
    &engine,
    ExecutionConfig {
        epoch_deadline: 500,  // 500 × 10ms = 5 seconds
        fuel_budget: 10_000_000,  // ~10M instructions
    },
    wasi_context,
    |store| {
        let instance = linker.instantiate(&mut *store, &module)?;
        let func = instance.get_typed_func::<(i32,), i32>(&mut *store, "process")?;
        func.call(&mut *store, (input_ptr,))?;
        Ok(())
    }
)
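The deadline values in this step are hand-computed from the tick interval (500 × 10ms = 5 seconds). Deriving them in code keeps the two from drifting apart if the interval changes. A minimal sketch, where `ticks_for` is a hypothetical helper and not part of Wasmtime:

```rust
use std::time::Duration;

/// Converts a wall-clock timeout into a tick count for a given tick interval,
/// rounding up so the tick count never under-represents the requested timeout.
/// Hypothetical helper; not part of Wasmtime.
pub fn ticks_for(timeout: Duration, tick_interval: Duration) -> u64 {
    assert!(!tick_interval.is_zero(), "tick interval must be non-zero");
    let timeout_ns = timeout.as_nanos();
    let tick_ns = tick_interval.as_nanos();
    // Ceiling division in u128 so large durations cannot overflow.
    let ticks = (timeout_ns + tick_ns - 1) / tick_ns;
    ticks.min(u64::MAX as u128) as u64
}
```

With a 10ms interval, `ticks_for(Duration::from_secs(5), Duration::from_millis(10))` gives 500, matching the value used above. Because the first tick can arrive at any point within an interval, a deadline of N ticks is reached somewhere between (N − 1) and N intervals after it is set; adding one tick is a reasonable choice when the timeout must never fire early.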

Step 4 — Handle epoch-triggered traps cleanly

When an epoch deadline fires, Wasmtime raises a trap. Distinguish this from other traps and handle it as a timeout:

use wasmtime::Trap;

// `Trap` is an enum in current Wasmtime releases; an expired epoch deadline
// surfaces as the `Trap::Interrupt` variant inside the `anyhow::Error`
fn is_epoch_timeout(err: &anyhow::Error) -> bool {
    matches!(err.downcast_ref::<Trap>(), Some(Trap::Interrupt))
}

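use wasmtime::{Store, TypedFunc};

fn run_plugin_safely(
    store: &mut Store<PluginState>,
    func: TypedFunc<(i32,), i32>,
    input: i32,
) -> Result<i32, PluginError> {
    // Reborrow the store so it can still be read in the error arms below
    match func.call(&mut *store, (input,)) {
        Ok(result) => Ok(result),
        Err(err) if is_epoch_timeout(&err) => {
            tracing::warn!(
                plugin = %store.data().plugin_name,
                "Plugin exceeded execution deadline"
            );
            Err(PluginError::Timeout)
        }
        // Fuel exhaustion is a different trap; report it separately
        Err(err) if matches!(err.downcast_ref::<Trap>(), Some(Trap::OutOfFuel)) => {
            Err(PluginError::FuelExhausted)
        }
        Err(err) => Err(PluginError::ExecutionError(err.to_string())),
    }
}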
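The snippets in this step and in Step 7 use a `PluginError` type that is never defined. One possible shape, inferred only from the variants those snippets name (`Timeout`, `FuelExhausted`, `ExecutionError`), is sketched below; the messages and the `is_limit` helper are assumptions added for illustration.

```rust
use std::fmt;

/// Hypothetical error type matching the variants used in the snippets here.
#[derive(Debug, Clone, PartialEq)]
pub enum PluginError {
    /// The epoch deadline was reached before the call returned.
    Timeout,
    /// The fuel budget ran out before the call returned.
    FuelExhausted,
    /// Any other trap or host error, carried as text.
    ExecutionError(String),
}

impl fmt::Display for PluginError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PluginError::Timeout => write!(f, "plugin exceeded its execution deadline"),
            PluginError::FuelExhausted => write!(f, "plugin exhausted its fuel budget"),
            PluginError::ExecutionError(msg) => write!(f, "plugin failed: {}", msg),
        }
    }
}

impl std::error::Error for PluginError {}

impl PluginError {
    /// True for errors caused by a resource limit rather than a plugin fault.
    pub fn is_limit(&self) -> bool {
        matches!(self, PluginError::Timeout | PluginError::FuelExhausted)
    }
}
```

Separating limit errors from other failures lets callers apply different handling, for example rate-limiting a plugin that repeatedly hits its deadline while surfacing other traps as bugs.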

Step 5 — Configure for async execution (Wasmtime async)

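In async contexts, a store can be configured to yield to the executor when the deadline is reached instead of trapping. A yield does not terminate the module by itself; termination happens when the host drops the suspended call future, for example through a Tokio timeout: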

use wasmtime::Config;

fn async_engine_config() -> Config {
    let mut config = Config::new();
    config.async_support(true);
    config.epoch_interruption(true);
    // Whether an expired deadline traps or yields is chosen per Store
    // (see epoch_deadline_async_yield_and_update below), not in the Config
    config
}

use std::time::Duration;
use wasmtime::{Engine, Linker, Module, Store};

// Async execution with epoch deadline
async fn execute_async_with_timeout(
    engine: &Engine,
    module: &Module,
    deadline_ticks: u64,
) -> anyhow::Result<()> {
    let mut store = Store::new(engine, ());
    store.set_epoch_deadline(deadline_ticks);
    
    // When the deadline is reached, yield to the async executor instead of
    // trapping, then extend the deadline by `deadline_ticks` and resume.
    // This never terminates the module by itself; it only creates a
    // suspension point at which the host can drop the future.
    store.epoch_deadline_async_yield_and_update(deadline_ticks);
    
    let instance = Linker::new(engine)
        .instantiate_async(&mut store, module)
        .await?;
    
    // The Tokio timeout is what cancels a runaway module: once it expires,
    // the call future is dropped at the next yield. Keep the yield interval
    // (deadline_ticks × tick interval) well below this timeout.
    tokio::time::timeout(
        Duration::from_secs(10),
        async {
            let func = instance
                .get_typed_func::<(), ()>(&mut store, "_start")?;
            func.call_async(&mut store, ()).await
        }
    ).await??;
    
    Ok(())
}
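Because a Tokio timeout can only take effect when the wrapped future returns control to the executor, the yield interval adds to the effective limit. The arithmetic can be sketched as follows; `worst_case_runtime` is a hypothetical helper, and it assumes the guest reaches an epoch check promptly after each tick.

```rust
use std::time::Duration;

/// Rough upper bound on how long a runaway guest can run when it is
/// cancelled by a host-side timeout and yields once every
/// `yield_every_ticks` epoch ticks. Hypothetical helper; it ignores the
/// short time the guest needs to reach its next epoch check.
pub fn worst_case_runtime(
    host_timeout: Duration,
    tick_interval: Duration,
    yield_every_ticks: u32,
) -> Duration {
    // The timeout is only observed at a yield, so in the worst case the
    // guest runs for one more full yield interval after it expires.
    let yield_interval = tick_interval.saturating_mul(yield_every_ticks);
    host_timeout.saturating_add(yield_interval)
}
```

With a 10-second timeout and a 10ms tick, yielding every tick bounds the run at about 10.01 seconds, while yielding every 500 ticks stretches it to about 15 seconds. This is the reason to keep the yield interval short when the host timeout is the mechanism that ends the call.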

Step 6 — Define a timeout policy per execution context

Document and enforce different timeout tiers for different plugin classes:

/// Timeout policy: match execution context to appropriate limits
pub struct TimeoutPolicy {
    pub epoch_deadline_ticks: u64,
    pub fuel_budget: u64,
    pub description: &'static str,
}

pub const INTERACTIVE_REQUEST: TimeoutPolicy = TimeoutPolicy {
    epoch_deadline_ticks: 100,   // 1 second (at 10ms tick)
    fuel_budget: 1_000_000,      // 1M instructions
    description: "Interactive HTTP request — user-facing latency",
};

pub const BACKGROUND_JOB: TimeoutPolicy = TimeoutPolicy {
    epoch_deadline_ticks: 3000,  // 30 seconds
    fuel_budget: 100_000_000,    // 100M instructions
    description: "Background processing — longer timeout acceptable",
};

pub const AI_TOOL_PLUGIN: TimeoutPolicy = TimeoutPolicy {
    epoch_deadline_ticks: 500,   // 5 seconds
    fuel_budget: 10_000_000,     // 10M instructions
    description: "LLM agent tool — must complete within agent turn budget",
};

pub const UNTRUSTED_PLUGIN: TimeoutPolicy = TimeoutPolicy {
    epoch_deadline_ticks: 50,    // 500ms — aggressive timeout for untrusted code
    fuel_budget: 100_000,        // 100K instructions
    description: "Third-party plugin — minimal resource budget",
};
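One way to make the tier choice explicit at call sites is an enum that maps each execution class to its limits. The sketch below restates the values from the constants above; `ExecutionClass`, `policy`, and `wall_clock` are hypothetical names, and the struct is trimmed to its two numeric fields so that the example stands alone.

```rust
use std::time::Duration;

/// Trimmed copy of the policy struct, limited to the numeric limits.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TimeoutPolicy {
    pub epoch_deadline_ticks: u64,
    pub fuel_budget: u64,
}

/// Hypothetical classification of execution contexts.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ExecutionClass {
    InteractiveRequest,
    BackgroundJob,
    AiToolPlugin,
    UntrustedPlugin,
}

impl ExecutionClass {
    /// Returns the limits for this class (tick counts assume a 10ms tick).
    pub fn policy(self) -> TimeoutPolicy {
        let (ticks, fuel) = match self {
            ExecutionClass::InteractiveRequest => (100, 1_000_000),
            ExecutionClass::BackgroundJob => (3000, 100_000_000),
            ExecutionClass::AiToolPlugin => (500, 10_000_000),
            ExecutionClass::UntrustedPlugin => (50, 100_000),
        };
        TimeoutPolicy { epoch_deadline_ticks: ticks, fuel_budget: fuel }
    }
}

impl TimeoutPolicy {
    /// Nominal wall-clock limit implied by the tick count and tick interval.
    pub fn wall_clock(&self, tick_interval: Duration) -> Duration {
        let ticks = self.epoch_deadline_ticks.min(u32::MAX as u64) as u32;
        tick_interval.saturating_mul(ticks)
    }
}
```

An exhaustive `match` means that adding a new execution class fails to compile until limits are chosen for it, which avoids a silent fall-through to some default budget.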

Step 7 — Monitor epoch interruption events

// Instrument epoch timeout events for alerting and capacity planning
use metrics::{counter, histogram};

fn instrumented_execute(
    store: &mut Store<PluginState>,
    func: TypedFunc<(i32,), i32>,
    input: i32,
) -> Result<i32, PluginError> {
    let start = std::time::Instant::now();
    let plugin_name = store.data().plugin_name.clone();
    
    let result = run_plugin_safely(store, func, input);
    let elapsed = start.elapsed();
    
    histogram!("plugin.execution_duration_ms", 
               elapsed.as_millis() as f64,
               "plugin" => plugin_name.clone());
    
    match &result {
        Err(PluginError::Timeout) => {
            counter!("plugin.epoch_timeouts_total", 1, "plugin" => plugin_name);
            tracing::warn!(plugin = %plugin_name, "Epoch timeout fired");
        }
        Err(PluginError::FuelExhausted) => {
            counter!("plugin.fuel_exhausted_total", 1, "plugin" => plugin_name);
        }
        Ok(_) => {
            counter!("plugin.executions_success_total", 1, "plugin" => plugin_name);
        }
        _ => {}
    }
    
    result
}

Expected Behaviour

Signal	Without epoch interruption	With epoch interruption
Module in infinite loop	Host thread blocked indefinitely	Traps with `Trap::Interrupt` after deadline
5-second timeout on plugin	Not enforced	Plugin traps after ~5 seconds regardless of CPU usage
Sleep-in-loop evasion	Evades fuel limits; runs indefinitely	Deadline detected at the first epoch check after each sleep call returns
Malicious plugin consuming all CPU	Other modules starved	Timeout fires; module terminated; other modules continue
Async module suspended indefinitely	Never times out	Module yields via `epoch_deadline_async_yield_and_update`; the Tokio timeout drops the call

Verification:

use std::sync::Arc;
use std::time::Duration;
use wasmtime::{Instance, Module, Store};

#[test]
fn test_epoch_terminates_infinite_loop() {
    let engine = production_engine();
    let _ticker = start_epoch_ticker(Arc::new(engine.clone()), Duration::from_millis(10));
    
    // Wasm module with an infinite loop
    let module = Module::new(&engine, r#"
        (module
          (func $loop (loop br 0))  ;; infinite loop
          (export "_start" (func $loop))
        )
    "#).unwrap();
    
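    let mut store = Store::new(&engine, ());
    // production_engine() enables fuel and a new Store starts with none,
    // so grant a large budget to make the epoch deadline the limit that fires
    store.set_fuel(u64::MAX).unwrap();
    store.set_epoch_deadline(10);  // 10 ticks = ~100ms at 10ms interval
    store.epoch_deadline_trap();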
    
    let instance = Instance::new(&mut store, &module, &[]).unwrap();
    let func = instance.get_typed_func::<(), ()>(&mut store, "_start").unwrap();
    
    let start = std::time::Instant::now();
    let result = func.call(&mut store, ());
    let elapsed = start.elapsed();
    
    assert!(result.is_err(), "Should have trapped");
    assert!(is_epoch_timeout(&result.unwrap_err()), "Should be epoch timeout, not other error");
    assert!(elapsed < Duration::from_millis(500), 
            "Should have terminated in < 500ms, took {:?}", elapsed);
    
    println!("PASS: Infinite loop terminated in {:?}", elapsed);
}

Trade-offs

Aspect	Benefit	Cost	Mitigation
Epoch tick interval (10ms)	Fine-grained interruption; low overshoot	Background thread overhead; one wake per 10ms per engine	Use a single shared engine per process so there is one ticker (100 wakeups per second at a 10ms tick)
Epoch vs fuel	Fuel: deterministic; Epoch: wall-clock — both are needed	Fuel doesn’t count host function time; epoch has ±tick granularity	Use both: fuel for CPU budget, epoch for wall-clock timeout
`epoch_deadline_async_yield_and_update`	Graceful cancellation in async context	Slightly more complex than `epoch_deadline_trap`	Use async variant when module is in an async runtime; sync variant otherwise
Per-module deadline configuration	Fine-grained control per execution context	Requires tracking deadlines per Store	Define `TimeoutPolicy` constants as shown in Step 6; apply at Store creation time

Failure Modes

Failure	Symptom	Detection	Recovery
Epoch ticker thread exits	Modules no longer time out; infinite loops can run indefinitely	Have the ticker export its own heartbeat counter (Wasmtime does not publish epoch metrics itself); alert if it stops advancing	Restart the ticker thread; implement a watchdog that verifies epoch is advancing
`epoch_interruption` not enabled in Config	`set_epoch_deadline` has no effect; modules never time out	Test with an infinite-loop module — it should trap within deadline	Verify `config.epoch_interruption(true)` is called before engine creation
Async epoch fires but not caught	Async module receives yield signal but Tokio task doesn’t check it	Module appears to run past deadline; no timeout trap fired	Ensure `call_async` is used (not `call`) for async-enabled stores; add Tokio-level timeout as defence-in-depth
Deadline too short for legitimate computation	Plugin traps mid-execution; legitimate requests fail	Plugin execution metrics show timeout rate on known-good inputs	Benchmark worst-case legitimate execution time; set deadline to 3× P99 execution time
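
The watchdog mentioned in the first row can be sketched with a host-side heartbeat that the ticker increments next to each `engine.increment_epoch()` call. `Heartbeat` and `Watchdog` are hypothetical names; the sketch assumes the watchdog is polled at an interval several times longer than the tick interval.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical host-side counter, incremented by the ticker thread each
/// time it advances the engine epoch.
#[derive(Clone, Default)]
pub struct Heartbeat(Arc<AtomicU64>);

impl Heartbeat {
    /// Records one tick.
    pub fn beat(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    /// Returns the number of ticks recorded so far.
    pub fn count(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Remembers the last observed count and reports a stall when it has not moved.
pub struct Watchdog {
    heartbeat: Heartbeat,
    last_seen: u64,
}

impl Watchdog {
    pub fn new(heartbeat: Heartbeat) -> Self {
        let last_seen = heartbeat.count();
        Watchdog { heartbeat, last_seen }
    }

    /// Returns true when no tick happened since the previous call.
    pub fn check_stalled(&mut self) -> bool {
        let now = self.heartbeat.count();
        let stalled = now == self.last_seen;
        self.last_seen = now;
        stalled
    }
}
```

A stalled result is the signal to alert and restart the ticker; the same counter can be exported as a gauge so that the stall is also visible in dashboards.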

Wasm Fuel Metering — the complementary fuel-based resource limit that bounds CPU instruction count
Wasmtime Production Hardening — comprehensive Wasmtime configuration covering memory limits, epoch interruption, and security flags
Wasm AI Plugin Sandboxing — epoch interruption is the primary DoS defence for Wasm-sandboxed LLM tool plugins
User-Provided Wasm Execution — executing untrusted Wasm safely; epoch interruption is a core requirement
Wasmtime WASI Resource Limits — WASI-level resource limits that complement epoch interruption for I/O-bound modules