Sandboxing LLM Agent Tool Plugins with WebAssembly

Problem

LLM agents derive their utility from tool use: the ability to call external functions — search, database queries, file operations, API calls, code execution — to complete tasks that require information beyond the model’s training data or actions beyond text generation. Every tool a model can call is a potential attack surface. When the tool is a function in the same process as the agent orchestrator, a malicious or buggy tool implementation can compromise the entire agent runtime.

The security challenge is that agent tool ecosystems are inherently extensible. Enterprise agent platforms allow teams to register custom tools, marketplace platforms allow third-party tool authors, and agentic frameworks like LangChain, CrewAI, and Claude’s tool use API encourage developers to write arbitrary Python functions as tools. The implicit trust model is that every registered tool is trusted code. This assumption fails when:

The tool is third-party code. A team installs a tool from a package registry or marketplace. The tool’s author is unknown. The tool has access to whatever capabilities the agent runtime provides — file system access, network calls, environment variables, subprocess execution.

The tool is AI-generated. The agent itself generates tool implementations dynamically, or a developer uses an LLM to write a tool function quickly. AI-generated code can reach for capabilities the author never intended (reading os.environ, making unexpected network calls) and may be manipulated via prompt injection to include malicious logic.

The tool receives attacker-controlled input. Tools that process user-supplied data, external documents, or web content can be exploited via injection. A tool that executes code strings, runs SQL queries, or constructs shell commands from LLM-generated arguments is a direct injection target.

The tool is compromised post-installation. Like any dependency, a tool registered today may be modified tomorrow via a supply chain attack, a package update, or a repository compromise.

WebAssembly addresses these risks through structural sandboxing. A Wasm plugin runs in a memory-isolated execution environment. It has access only to the host functions explicitly imported into it. It cannot make syscalls directly, cannot access memory outside its own linear memory, and cannot reach the host process’s file system, environment, or network stack unless the host explicitly provides those capabilities.

The tradeoff is implementation cost: tools must be compiled to Wasm, the host must manage Wasm runtime instances, and the interface between host and plugin must be explicitly defined. The Extism framework (a Wasm-based plugin system) and Wasmtime’s embedding API substantially reduce this cost, making Wasm sandboxing practical for agent tool plugins.

Compared to process-based sandboxing (running each tool in a subprocess or container):

  • Wasm is lower-overhead: instantiating a module typically takes microseconds, against milliseconds to start a process
  • Wasm gives explicit capability control: a plugin can call only the host functions the manifest grants
  • Wasm is language-agnostic: plugins can be written in Rust, Go, C, AssemblyScript, or Python (via an interpreter such as MicroPython compiled to Wasm)
  • Wasm code has no direct syscall interface, so it cannot attack the kernel directly; an escape requires a bug in the Wasm runtime or in a host function

Target systems: any LLM agent platform that executes registered tool functions; agentic frameworks where third-party or AI-generated tools run alongside production orchestration code; enterprise agent deployments where tool authors are not all internal trusted engineers.


Threat Model

Adversary 1: Malicious third-party tool. An attacker publishes a tool plugin to a marketplace or package registry, and an organisation’s agent platform installs it. The implementation exfiltrates environment variables (LLM API keys, cloud credentials) on first execution, or leaks data from every tool call. With process sandboxing: possible whenever the process has network access. With Wasm sandboxing and no network or environment host function: blocked, because the plugin has no way to read the host environment or open an outbound connection.

Adversary 2: Prompt injection via tool input. An agent tool receives attacker-controlled content (a web page, an email, a document) as input. The content contains an injection that manipulates the tool into executing a malicious operation: writing a file, calling an API, reading a secret. With Wasm sandboxing: the tool can perform only the operations its host functions expose, so an injected instruction to “read /etc/passwd” fails in any tool that was not granted a filesystem host function.

Adversary 3: AI-generated tool with unintended capability. A developer uses an LLM to generate a Python tool function. The generated code includes import subprocess; subprocess.run(cmd) for what the model took to be a legitimate use. With Wasm sandboxing: the compiled plugin has no way to spawn a process, because no such host function exists in its capability set.

Adversary 4: Supply chain compromise of tool package. A previously safe tool package is updated to include malicious code, and the next deployment picks up the update. With Wasm sandboxing: the compromised tool is still limited to its declared capability set and cannot escalate beyond what the host permits.

Without sandboxing: tool compromise = agent runtime compromise = host system compromise. With Wasm sandboxing: tool compromise is contained to the declared capability set of that tool.


Configuration / Implementation

Step 1 — Define the tool capability manifest

Before implementing anything, define what capabilities each tool type is permitted:

# tool-capability-manifest.yaml
# Defines what host functions each tool category may access

capability_sets:
  search_tool:
    allowed_host_functions:
      - http_get          # Outbound HTTP GET only (no POST)
      - log_debug         # Logging only
    network_egress:
      allowed_domains:
        - "*.google.com"
        - "api.bing.com"
      blocked_domains:
        - "*"             # All others blocked
    filesystem: none
    env_access: none
    
  database_tool:
    allowed_host_functions:
      - db_query_readonly # Read-only SQL query via host-mediated connection
      - log_debug
    network_egress: none  # Database connection managed by host; tool never touches network
    filesystem: none
    env_access: none
    
  file_tool:
    allowed_host_functions:
      - file_read         # Host provides sandboxed file read within allowed prefix
      - file_list         # Directory listing within allowed prefix
      - log_debug
    filesystem:
      allowed_prefix: "/workspace/agent-files/"  # Scoped to agent workspace only
    network_egress: none
    env_access: none

  code_execution_tool:
    # Code execution tools get the most restricted sandbox
    allowed_host_functions:
      - log_debug
      # Explicitly: no network, no filesystem, no env
    network_egress: none
    filesystem: none
    env_access: none
    resource_limits:
      fuel: 1000000        # Wasmtime instruction fuel limit
      memory_pages: 16     # 1 MB max
      execution_timeout_ms: 5000
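
One way to make the manifest enforceable is to resolve each capability set into an immutable, deny-by-default structure before any plugin is loaded. The sketch below assumes the YAML has already been parsed into a plain dict by a YAML library; `CapabilitySet`, `load_capability_set`, and `WASM_PAGE_BYTES` are illustrative names, not part of Extism or Wasmtime. Note that a YAML value written as `none` parses as the string "none", so the loader only treats mappings as grants.

```python
from dataclasses import dataclass
from typing import Optional

WASM_PAGE_BYTES = 64 * 1024  # A WebAssembly linear-memory page is 64 KiB


@dataclass(frozen=True)
class CapabilitySet:
    """Deny-by-default view of one entry under capability_sets."""
    name: str
    host_functions: frozenset = frozenset()
    allowed_domains: tuple = ()
    filesystem_prefix: Optional[str] = None
    fuel: Optional[int] = None
    memory_pages: Optional[int] = None

    def allows(self, host_function: str) -> bool:
        """True only if the manifest explicitly lists the host function."""
        return host_function in self.host_functions

    @property
    def memory_bytes(self) -> Optional[int]:
        """Memory ceiling in bytes, or None when no limit is declared."""
        if self.memory_pages is None:
            return None
        return self.memory_pages * WASM_PAGE_BYTES


def load_capability_set(manifest: dict, name: str) -> CapabilitySet:
    """Resolve one capability set; anything not declared stays denied."""
    try:
        raw = manifest["capability_sets"][name]
    except KeyError:
        raise ValueError(f"Unknown capability set: {name}") from None

    # "none" (a string) or a missing key both mean: no grant
    egress = raw.get("network_egress")
    domains = tuple(egress.get("allowed_domains", [])) if isinstance(egress, dict) else ()
    filesystem = raw.get("filesystem")
    prefix = filesystem.get("allowed_prefix") if isinstance(filesystem, dict) else None
    limits = raw.get("resource_limits") or {}

    return CapabilitySet(
        name=name,
        host_functions=frozenset(raw.get("allowed_host_functions", [])),
        allowed_domains=domains,
        filesystem_prefix=prefix,
        fuel=limits.get("fuel"),
        memory_pages=limits.get("memory_pages"),
    )


# Two entries copied from the manifest above, as a YAML parser would return them
manifest = {
    "capability_sets": {
        "search_tool": {
            "allowed_host_functions": ["http_get", "log_debug"],
            "network_egress": {"allowed_domains": ["*.google.com", "api.bing.com"]},
            "filesystem": "none",
            "env_access": "none",
        },
        "code_execution_tool": {
            "allowed_host_functions": ["log_debug"],
            "network_egress": "none",
            "filesystem": "none",
            "env_access": "none",
            "resource_limits": {"fuel": 1000000, "memory_pages": 16},
        },
    }
}

search = load_capability_set(manifest, "search_tool")
code = load_capability_set(manifest, "code_execution_tool")
print(search.allows("http_get"), search.allows("file_read"), code.memory_bytes)
```

Freezing the dataclass means a capability set cannot be widened after it is loaded; a host would consult `allows` when deciding which host functions to register for a plugin.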

Step 2 — Implement a Wasm plugin host with Extism

Extism provides a high-level plugin host that wraps Wasmtime:

// plugin_host.rs — Wasm plugin host using Extism
use extism::{Plugin, PluginBuilder, Manifest, Wasm, Function, UserData, Val, ValType};
use std::path::Path;

struct ToolCapabilities {
    allowed_domains: Vec<String>,
    filesystem_prefix: Option<String>,
    fuel_limit: u64,
}

/// Build a plugin with scoped capabilities from a manifest
fn build_sandboxed_plugin(
    wasm_path: &Path,
    capabilities: &ToolCapabilities,
) -> anyhow::Result<Plugin> {
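    // 16 pages x 64 KiB = 1 MiB, the memory_pages value from the manifest
    const MAX_MEMORY_PAGES: u32 = 16;

    let wasm = Wasm::file(wasm_path);
    let manifest = Manifest::new([wasm])
        // Cap linear memory in pages. Fuel is an instruction budget and
        // must not be reused as a memory size.
        .with_memory_max(MAX_MEMORY_PAGES);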

    // Define host functions — only what the capability manifest allows
    let mut host_functions = vec![];

    // Always available: structured logging (no raw file/network access)
    let log_fn = Function::new(
        "log_debug",
        [ValType::I64],  // Pointer to log message in plugin memory
        [],
        UserData::default(),
        |plugin, inputs, _outputs, _user_data| {
            let msg_ptr = inputs[0].unwrap_i64() as u64;
            // Read the string from plugin memory (type-safe, bounded)
            let msg = plugin.memory_string(msg_ptr)?;
            tracing::debug!(plugin_log = %msg);
            Ok(())
        }
    );
    host_functions.push(log_fn);

    // Conditionally provide http_get if in capability set
    if !capabilities.allowed_domains.is_empty() {
        let allowed_domains = capabilities.allowed_domains.clone();
        let http_fn = Function::new(
            "http_get",
            [ValType::I64, ValType::I64],  // (url_ptr, url_len)
            [ValType::I64],                 // response body pointer
            UserData::default(),
            move |plugin, inputs, outputs, _| {
                let url_ptr = inputs[0].unwrap_i64() as u64;
                let url = plugin.memory_string(url_ptr)?;
                
                // Enforce domain allowlist BEFORE making the request
                let parsed = url::Url::parse(&url)?;
                let host = parsed.host_str().unwrap_or("");
                let allowed = allowed_domains.iter().any(|pattern| {
                    if pattern.starts_with("*.") {
                        host.ends_with(&pattern[1..])
                    } else {
                        host == pattern
                    }
                });
                
                if !allowed {
                    return Err(anyhow::anyhow!(
                        "Domain '{}' not in capability allowlist", host
                    ));
                }
                
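                // Make the HTTP request on behalf of the plugin.
                // Redirects are disabled so that a permitted host cannot
                // bounce the request to a domain outside the allowlist.
                let client = reqwest::blocking::Client::builder()
                    .redirect(reqwest::redirect::Policy::none())
                    .build()?;
                let response = client.get(&url).send()?;
                let body = response.text()?;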
                
                // Write response to plugin memory and return pointer
                let ptr = plugin.memory_alloc_bytes(body.as_bytes())?;
                outputs[0] = Val::I64(ptr as i64);
                Ok(())
            }
        );
        host_functions.push(http_fn);
    }

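    // Build the plugin with scoped host functions
    let plugin = PluginBuilder::new(manifest)
        .with_wasi(false)  // Disable WASI: no filesystem/network via WASI
        // Assumption: an Extism release whose PluginBuilder exposes
        // with_fuel_limit. Without this call, fuel_limit is never enforced.
        .with_fuel_limit(capabilities.fuel_limit)
        .with_functions(host_functions)
        .build()?;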

    Ok(plugin)
}

/// Execute a tool function in the sandboxed plugin
pub fn execute_tool(
    plugin: &mut Plugin,
    function_name: &str,
    input_json: &str,
) -> anyhow::Result<String> {
    let result = plugin.call::<&str, &str>(function_name, input_json)?;
    Ok(result.to_string())
}
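
The allowlist check inside `http_get` is the piece most worth testing in isolation. Below is a minimal Python sketch of the same matching rule, so its edge cases can be exercised without a Wasm runtime. The names `host_matches` and `is_url_allowed` are hypothetical, and the https-only restriction is an extra assumption that the Rust host above does not make.

```python
from urllib.parse import urlsplit


def host_matches(host: str, pattern: str) -> bool:
    """Match a hostname against an exact name or a "*.suffix" wildcard."""
    host = host.lower().rstrip(".")
    pattern = pattern.lower()
    if pattern.startswith("*."):
        suffix = pattern[1:]  # ".example.com", leading dot included
        # The leading dot forces a label boundary, so "evilexample.com"
        # does not match "*.example.com"; the bare apex does not match either.
        return host.endswith(suffix) and len(host) > len(suffix)
    return host == pattern


def is_url_allowed(url: str, allowed_domains: list) -> bool:
    """Return True only for https URLs whose host is on the allowlist."""
    parts = urlsplit(url)
    # urlsplit strips userinfo, so "https://good.com@evil.example/" yields
    # the hostname "evil.example" and is rejected below.
    if parts.scheme != "https" or not parts.hostname:
        return False
    return any(host_matches(parts.hostname, p) for p in allowed_domains)


allowed = ["*.google.com", "api.bing.com"]
for candidate in [
    "https://api.bing.com/v7.0/search?q=wasm",
    "https://maps.google.com/",
    "https://evilgoogle.com/",
    "https://google.com.evil.example/",
    "https://maps.google.com@evil.example/",
]:
    print(candidate, is_url_allowed(candidate, allowed))
```

A check like this covers only the URL the plugin supplied. If the host's HTTP client follows redirects automatically, the final destination is never checked, so the host should either disable redirects or re-run the check on every hop.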

Step 3 — Write a Wasm tool plugin (Rust)

Tool authors write their plugin in Rust, Go, or another language compiled to Wasm. The plugin can only use the host functions declared in its capability manifest:

// search_tool/src/lib.rs — Example search tool plugin
use extism_pdk::*;
use serde::{Deserialize, Serialize};

// Import only the host functions declared in the capability manifest
extern "C" {
    fn http_get(url_ptr: i64, url_len: i64) -> i64;
    fn log_debug(msg_ptr: i64);
}

#[derive(Deserialize)]
struct SearchInput {
    query: String,
    num_results: Option<usize>,
}

#[derive(Serialize)]
struct SearchResult {
    title: String,
    snippet: String,
    url: String,
}

#[plugin_fn]
pub fn search(input: Json<SearchInput>) -> FnResult<Json<Vec<SearchResult>>> {
    let query = urlencoding::encode(&input.0.query);
    let url = format!("https://api.bing.com/v7.0/search?q={}&count={}", 
                       query, input.0.num_results.unwrap_or(5));
    
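    // Call the host's http_get. The host enforces the domain allowlist, so
    // this cannot reach any domain outside the capability manifest.
    // Host functions read their arguments from Extism-managed memory, so the
    // URL is copied there first instead of passing a raw linear-memory pointer.
    let url_mem = Memory::from_bytes(url.as_bytes())?;
    let response_offset = unsafe {
        http_get(url_mem.offset() as i64, url_mem.len() as i64)
    };

    // The host wrote the response into Extism memory and returned its offset.
    // to_string validates UTF-8 rather than trusting the response bytes.
    let response_mem = Memory::find(response_offset as u64)
        .ok_or_else(|| Error::msg("host returned an invalid memory offset"))?;
    let response_str = response_mem.to_string()?;

    let parsed: serde_json::Value = serde_json::from_str(&response_str)?;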
    let results = parsed["webPages"]["value"]
        .as_array()
        .unwrap_or(&vec![])
        .iter()
        .map(|v| SearchResult {
            title: v["name"].as_str().unwrap_or("").to_string(),
            snippet: v["snippet"].as_str().unwrap_or("").to_string(),
            url: v["url"].as_str().unwrap_or("").to_string(),
        })
        .collect();
    
    Ok(Json(results))
}

Compile to Wasm:

# Add Wasm target
rustup target add wasm32-unknown-unknown

# Build the plugin
cargo build --target wasm32-unknown-unknown --release

# The output .wasm file is what gets distributed and loaded by the host
ls target/wasm32-unknown-unknown/release/search_tool.wasm
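
Before a compiled plugin is registered, the host side can inspect the binary and confirm that it imports nothing beyond its capability set. The sketch below reads the import section of a core Wasm module using only the standard library. The function names and the `ignore_modules` parameter are hypothetical; plugins built with a PDK typically also import the runtime's own support functions, and `ignore_modules` is one assumed way to exempt those namespaces.

```python
WASM_HEADER = b"\x00asm\x01\x00\x00\x00"  # magic number + binary format version 1
IMPORT_SECTION_ID = 2
KIND_NAMES = {0: "func", 1: "table", 2: "memory", 3: "global"}


def read_leb128(data: bytes, pos: int) -> tuple:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def read_name(data: bytes, pos: int) -> tuple:
    """Decode a length-prefixed UTF-8 name; return (name, next position)."""
    length, pos = read_leb128(data, pos)
    return data[pos:pos + length].decode("utf-8"), pos + length


def skip_limits(data: bytes, pos: int) -> int:
    """Skip a limits record: flags byte, minimum, and an optional maximum."""
    flags = data[pos]
    _, pos = read_leb128(data, pos + 1)
    if flags & 0x01:
        _, pos = read_leb128(data, pos)
    return pos


def list_imports(wasm: bytes) -> list:
    """Return (module, field, kind) for every import in a core Wasm module."""
    if wasm[:8] != WASM_HEADER:
        raise ValueError("not a WebAssembly version 1 binary module")
    imports, pos = [], 8
    while pos < len(wasm):
        section_id = wasm[pos]
        size, pos = read_leb128(wasm, pos + 1)
        section_end = pos + size
        if section_id == IMPORT_SECTION_ID:
            count, cursor = read_leb128(wasm, pos)
            for _ in range(count):
                module, cursor = read_name(wasm, cursor)
                field, cursor = read_name(wasm, cursor)
                kind = wasm[cursor]
                cursor += 1
                if kind == 0:    # function: type index
                    _, cursor = read_leb128(wasm, cursor)
                elif kind == 1:  # table: element type byte, then limits
                    cursor = skip_limits(wasm, cursor + 1)
                elif kind == 2:  # memory: limits
                    cursor = skip_limits(wasm, cursor)
                elif kind == 3:  # global: value type byte, mutability byte
                    cursor += 2
                else:
                    raise ValueError(f"unsupported import kind {kind}")
                imports.append((module, field, KIND_NAMES[kind]))
        pos = section_end  # Skip every other section unread
    return imports


def disallowed_imports(wasm: bytes, allowed: set,
                       ignore_modules: frozenset = frozenset()) -> list:
    """List imported functions that the capability set does not grant."""
    return [
        field for module, field, kind in list_imports(wasm)
        if kind == "func" and module not in ignore_modules and field not in allowed
    ]
```

A registry could call `disallowed_imports(Path(wasm_path).read_bytes(), allowed)` at registration time and refuse the plugin when the list is non-empty. This is an early, readable rejection rather than a security boundary: the runtime still refuses to link any import the host did not register.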

Step 4 — Integrate with an agent framework

Wire the sandboxed plugin system into an LLM agent:

# agent_tool_registry.py — register Wasm plugins as LLM tools
import json
import subprocess
from pathlib import Path

from anthropic import Anthropic

client = Anthropic()

class WasmToolRegistry:
    """Registry that wraps Wasm plugins as LLM tool definitions."""
    
    def __init__(self, plugin_host_binary: str):
        self.plugin_host = plugin_host_binary  # Rust binary wrapping Extism
        self.registered_tools: dict[str, dict] = {}
    
    def register(
        self,
        name: str,
        wasm_path: Path,
        capability_set: str,
        description: str,
        input_schema: dict
    ) -> None:
        self.registered_tools[name] = {
            "wasm_path": str(wasm_path),
            "capability_set": capability_set,
            "definition": {
                "name": name,
                "description": description,
                "input_schema": input_schema,
            }
        }
    
    def execute(self, tool_name: str, tool_input: dict) -> str:
        """Execute a tool in its sandboxed Wasm environment."""
        if tool_name not in self.registered_tools:
            raise ValueError(f"Unknown tool: {tool_name}")
        
        tool = self.registered_tools[tool_name]
        
        # Call the Rust plugin host binary
        # In production: embed the host library directly
        result = subprocess.run(
            [self.plugin_host,
             "--plugin", tool["wasm_path"],
             "--capability", tool["capability_set"],
             "--function", tool_name,
             "--input", json.dumps(tool_input)],
            capture_output=True,
            text=True,
            timeout=10  # Hard timeout regardless of plugin fuel
        )
        
        if result.returncode != 0:
            raise RuntimeError(f"Plugin execution failed: {result.stderr}")
        
        return result.stdout
    
    @property
    def tool_definitions(self) -> list[dict]:
        """Return tool definitions in Anthropic API format."""
        return [t["definition"] for t in self.registered_tools.values()]


# Set up registry with sandboxed tools
registry = WasmToolRegistry("/usr/local/bin/wasm-tool-host")

registry.register(
    name="search",  # Must match the function the plugin exports, since execute() passes it as --function
    wasm_path=Path("/plugins/search_tool.wasm"),
    capability_set="search_tool",
    description="Search the web for information",
    input_schema={
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "num_results": {"type": "integer", "default": 5}
        },
        "required": ["query"]
    }
)

# Run agent loop with sandboxed tools
messages = [{"role": "user", "content": "What are the latest news about WebAssembly security?"}]

while True:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        tools=registry.tool_definitions,
        messages=messages
    )
    
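    if response.stop_reason != "tool_use":
        # Covers end_turn as well as max_tokens and other terminal stop
        # reasons, so the loop never sends an empty tool_result message
        print("".join(b.text for b in response.content if b.type == "text"))
        break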
    
    messages.append({"role": "assistant", "content": response.content})
    
    # Execute tool calls in sandboxed Wasm environments
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            try:
                result = registry.execute(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result
                })
            except Exception as e:
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"Tool execution failed (sandboxed): {str(e)}",
                    "is_error": True
                })
    
    messages.append({"role": "user", "content": tool_results})

Expected Behaviour

| Signal | Without sandboxing | With Wasm sandboxing |
|---|---|---|
| Malicious tool reads /etc/passwd | Succeeds: tool runs in agent process | Fails: no filesystem host function provided |
| Tool makes outbound HTTP to exfil endpoint | Succeeds: tool has network access | Blocked: domain not in capability allowlist |
| Tool reads environment variable ANTHROPIC_API_KEY | Succeeds | Fails: no env access host function |
| Prompt injection causes tool to run subprocess | Succeeds if tool uses Python subprocess | Fails: subprocess requires host OS access unavailable to Wasm |
| Tool plugin executes for >10 seconds (resource abuse) | Hangs agent | Terminated by fuel limit + timeout |
| Supply chain: malicious tool update published | Executes with full agent runtime permissions | Executes with only declared capability set |

Verification:

// isolation_test.rs — verify sandbox boundaries
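#[test]
fn test_plugin_cannot_read_filesystem() {
    let capabilities = ToolCapabilities {
        allowed_domains: vec![],
        filesystem_prefix: None,
        fuel_limit: 100_000,
    };

    // A plugin that imports a filesystem function the host never registered
    // may be rejected when it is linked (at build) or when it is called, so
    // both stages are chained into one fallible operation.
    let result = build_sandboxed_plugin(
        Path::new("test_plugins/escape_attempt.wasm"),
        &capabilities,
    )
    .and_then(|mut plugin| execute_tool(&mut plugin, "try_read_file", "/etc/passwd"));

    // Should fail: no filesystem host function provided
    assert!(result.is_err(), "Filesystem access should be blocked in sandbox");
    // {:#} prints the whole error chain, not only the outermost context
    let message = format!("{:#}", result.unwrap_err());
    assert!(message.contains("unknown import"), "unexpected error: {message}");
}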

Trade-offs

| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Wasm compilation requirement | Structural security boundary | Tools must be compiled to Wasm; Python tools need an interpreter that itself runs in Wasm, such as MicroPython | Provide toolchain scaffolding; common tool types have Rust/Go templates; Python tools use componentize-py |
| Host-mediated network access | Domain allowlist enforcement | HTTP client must be implemented in host, not plugin | Use Extism’s built-in HTTP capability or write a thin host function; acceptable for REST APIs |
| Fuel/instruction limits | Prevents resource abuse | Limits tools that legitimately need long execution | Set fuel per capability set; search tools need less than code analysis tools; tune empirically |
| No WASI | Prevents filesystem/network bypass via WASI | Some library crates expect WASI; compilation may fail | Use wasm32-unknown-unknown target (no WASI) for security-critical plugins; wasm32-wasi (named wasm32-wasip1 in recent Rust toolchains) only with explicit WASI host restrictions |
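
Tuning fuel empirically can be as simple as taking a high percentile of observed usage and adding headroom. The sketch below assumes the host records the fuel each successful call consumed, which is not shown in the host code above; `nearest_rank_percentile`, `recommend_fuel_limit`, and the default percentile and headroom are illustrative choices, not recommendations from Wasmtime or Extism.

```python
def nearest_rank_percentile(samples: list, percentile: int) -> int:
    """Return the nearest-rank percentile (1..100) of integer samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in the range 1..100")
    ordered = sorted(samples)
    # Ceiling division in integer arithmetic: rank = ceil(p * n / 100)
    rank = -(-percentile * len(ordered) // 100)
    return ordered[rank - 1]


def recommend_fuel_limit(samples: list, percentile: int = 99,
                         headroom_percent: int = 150) -> int:
    """Suggest a fuel limit: a high percentile of observed use plus headroom."""
    baseline = nearest_rank_percentile(samples, percentile)
    return baseline * headroom_percent // 100


# Hypothetical fuel readings from ten successful calls of one tool type
observed = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
print(recommend_fuel_limit(observed, percentile=90))
```

Computing the limit per capability set keeps a generous budget for one tool type from loosening the limit on another. A legitimate call that still hits the limit is a signal to re-collect samples rather than to raise the ceiling by hand.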

Failure Modes

| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Plugin compilation fails for complex tool | Wasm binary missing or incomplete; tool registration fails | Build CI step fails; test suite catches | Simplify tool API; extract complex logic into host function; provide a Docker-based fallback for tools that cannot be compiled to Wasm |
| Fuel limit too low for legitimate tool | Tool returns “fuel exhausted” error for valid queries | Agent receives tool_error; retries fail | Increase fuel limit for the specific capability set; monitor fuel usage per tool type |
| Host function API mismatch after update | Plugin calls function with wrong signature; runtime type error | Runtime error in plugin execution | Version host function APIs; use capability manifest versioning; test plugin compatibility after host updates |
| Attacker submits malicious Wasm binary via plugin marketplace | Plugin loads but attempts to exploit Wasm runtime vulnerability | Runtime sandbox escape (very rare with Wasmtime); or capability boundary exceeded | Keep Wasmtime/Extism updated; run plugins in separate OS processes as defence-in-depth; cryptographically verify plugin identity |