Sandboxing LLM Agent Tool Plugins with WebAssembly
Problem
LLM agents derive their utility from tool use: the ability to call external functions — search, database queries, file operations, API calls, code execution — to complete tasks that require information beyond the model’s training data or actions beyond text generation. Every tool a model can call is a potential attack surface. When the tool is a function in the same process as the agent orchestrator, a malicious or buggy tool implementation can compromise the entire agent runtime.
The security challenge is that agent tool ecosystems are inherently extensible. Enterprise agent platforms allow teams to register custom tools, marketplace platforms allow third-party tool authors, and agentic frameworks like LangChain, CrewAI, and Claude’s tool use API encourage developers to write arbitrary Python functions as tools. The implicit trust model is that every registered tool is trusted code. This assumption fails when:
The tool is third-party code. A team installs a tool from a package registry or marketplace. The tool’s author is unknown. The tool has access to whatever capabilities the agent runtime provides — file system access, network calls, environment variables, subprocess execution.
The tool is AI-generated. The agent itself generates tool implementations dynamically, or a developer uses an LLM to write a tool function quickly. AI-generated code can reach for capabilities the author never intended (reading os.environ, making unexpected network calls), and a model manipulated via prompt injection may emit malicious logic.
The tool receives attacker-controlled input. Tools that process user-supplied data, external documents, or web content can be exploited via injection. A tool that executes code strings, runs SQL queries, or constructs shell commands from LLM-generated arguments is a direct injection target.
The tool is compromised post-installation. Like any dependency, a tool registered today may be modified tomorrow via a supply chain attack, a package update, or a repository compromise.
WebAssembly addresses these risks through structural sandboxing. A Wasm plugin runs in a memory-isolated execution environment. It has access only to the host functions explicitly imported into it. It cannot make syscalls directly, cannot access memory outside its own linear memory, and cannot reach the host process’s file system, environment, or network stack unless the host explicitly provides those capabilities.
The tradeoff is implementation cost: tools must be compiled to Wasm, the host must manage Wasm runtime instances, and the interface between host and plugin must be explicitly defined. The Extism framework (a Wasm-based plugin system) and Wasmtime’s embedding API substantially reduce this cost, making Wasm sandboxing practical for agent tool plugins.
Compared to process-based sandboxing (running each tool in a subprocess or container):
- Wasm is lower-overhead — instantiation takes microseconds vs. milliseconds for a process
- Wasm provides deterministic capability control — exactly what the manifest specifies
- Wasm is language-agnostic — plugins written in Rust, Go, Python (via MicroPython), C, or AssemblyScript
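- Wasm narrows the kernel attack surface: a plugin cannot issue syscalls directly, so a kernel vulnerability is reachable only through the host functions the host chooses to expose (a bug in the Wasm runtime itself remains a residual risk)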
Target systems: any LLM agent platform that executes registered tool functions; agentic frameworks where third-party or AI-generated tools run alongside production orchestration code; enterprise agent deployments where tool authors are not all internal trusted engineers.
Threat Model
Adversary 1 — Malicious third-party tool. An attacker publishes a tool plugin to a marketplace or package registry. The tool is installed by an organisation’s agent platform. The tool implementation exfiltrates environment variables (LLM API keys, cloud credentials) on first execution, or exfiltrates data from every tool call. With process sandboxing: possible if the process has network access. With Wasm sandboxing and no network host function: impossible — the tool cannot make outbound connections.
Adversary 2 — Prompt injection via tool input. An agent tool receives attacker-controlled content (a web page, an email, a document) as input. The content contains an injection that manipulates the tool into executing a malicious operation — writing a file, calling an API, reading a secret. With Wasm sandboxing: the tool can only perform operations explicitly provided by host functions; injected instructions to “read /etc/passwd” fail because there is no filesystem host function.
Adversary 3 — AI-generated tool with unintended capability. A developer uses an LLM to generate a Python tool function. The generated code includes import subprocess; subprocess.run(cmd) for what the model thought was a legitimate use. With Wasm sandboxing: the compiled plugin has no subprocess capability; the import is absent from the host function manifest.
Adversary 4 — Supply chain compromise of tool package. A previously safe tool package is updated to include malicious code, and the next deployment or dependency refresh pulls in the update. With Wasm sandboxing: the compromised tool is limited to its declared capability set; it cannot escalate beyond what the host permits.
Without sandboxing: tool compromise = agent runtime compromise = host system compromise. With Wasm sandboxing: tool compromise is contained to the declared capability set of that tool.
Configuration / Implementation
Step 1 — Define the tool capability manifest
Before implementing anything, define what capabilities each tool type is permitted:
# tool-capability-manifest.yaml
# Defines what host functions each tool category may access
capability_sets:
  search_tool:
    allowed_host_functions:
      - http_get    # Outbound HTTP GET only (no POST)
      - log_debug   # Logging only
    network_egress:
      allowed_domains:
        - "*.google.com"
        - "api.bing.com"
      blocked_domains:
        - "*"       # All others blocked
    filesystem: none
    env_access: none

  database_tool:
    allowed_host_functions:
      - db_query_readonly   # Read-only SQL query via host-mediated connection
      - log_debug
    network_egress: none    # Database connection managed by host; tool never touches network
    filesystem: none
    env_access: none

  file_tool:
    allowed_host_functions:
      - file_read   # Host provides sandboxed file read within allowed prefix
      - file_list   # Directory listing within allowed prefix
      - log_debug
    filesystem:
      allowed_prefix: "/workspace/agent-files/"   # Scoped to agent workspace only
    network_egress: none
    env_access: none

  code_execution_tool:
    # Code execution tools get the most restricted sandbox
    allowed_host_functions:
      - log_debug
    # Explicitly: no network, no filesystem, no env
    network_egress: none
    filesystem: none
    env_access: none
    resource_limits:
      fuel: 1000000               # Wasmtime instruction fuel limit
      memory_pages: 16            # 1 MB max
      execution_timeout_ms: 5000
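One way a host could consume this manifest is to resolve a capability set name into an immutable, deny-by-default policy before any plugin is built. The sketch below assumes the manifest has already been parsed into a dict (for example by a YAML parser); `ToolPolicy` and `resolve_policy` are hypothetical names, not part of Extism or Wasmtime.

```python
from dataclasses import dataclass
from typing import Optional

WASM_PAGE_BYTES = 64 * 1024  # one WebAssembly linear-memory page is 64 KiB


@dataclass(frozen=True)
class ToolPolicy:
    """Resolved, read-only view of one capability set (hypothetical type)."""
    host_functions: frozenset
    allowed_domains: tuple
    filesystem_prefix: Optional[str]
    fuel: Optional[int]
    max_memory_bytes: Optional[int]


def resolve_policy(manifest: dict, capability_set: str) -> ToolPolicy:
    """Look up a capability set; anything not declared is treated as denied."""
    sets = manifest.get("capability_sets", {})
    if capability_set not in sets:
        raise KeyError(f"unknown capability set: {capability_set}")
    spec = sets[capability_set]

    # `network_egress: none` and `filesystem: none` parse as the string "none",
    # so only a mapping grants anything.
    egress = spec.get("network_egress", "none")
    domains = tuple(egress.get("allowed_domains", [])) if isinstance(egress, dict) else ()
    fs = spec.get("filesystem", "none")
    prefix = fs.get("allowed_prefix") if isinstance(fs, dict) else None

    limits = spec.get("resource_limits", {})
    pages = limits.get("memory_pages")
    return ToolPolicy(
        host_functions=frozenset(spec.get("allowed_host_functions", [])),
        allowed_domains=domains,
        filesystem_prefix=prefix,
        fuel=limits.get("fuel"),
        max_memory_bytes=pages * WASM_PAGE_BYTES if pages is not None else None,
    )


# Two capability sets from the manifest above, written as a parsed dict
MANIFEST = {
    "capability_sets": {
        "search_tool": {
            "allowed_host_functions": ["http_get", "log_debug"],
            "network_egress": {"allowed_domains": ["*.google.com", "api.bing.com"]},
            "filesystem": "none",
        },
        "code_execution_tool": {
            "allowed_host_functions": ["log_debug"],
            "network_egress": "none",
            "filesystem": "none",
            "resource_limits": {"fuel": 1000000, "memory_pages": 16},
        },
    }
}

policy = resolve_policy(MANIFEST, "code_execution_tool")
print(policy.max_memory_bytes)  # 16 pages * 64 KiB
```

Freezing the policy means a later code path cannot quietly widen it, and an unknown capability set fails loudly instead of falling back to a permissive default.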
Step 2 — Implement a Wasm plugin host with Extism
Extism provides a high-level plugin host that wraps Wasmtime:
// plugin_host.rs: Wasm plugin host using Extism
// Written against the Extism Rust SDK 1.x API; method names differ in earlier releases.
use extism::{Function, Manifest, Plugin, PluginBuilder, UserData, ValType, Wasm};
use std::path::Path;

/// Upper bound on plugin linear memory, in 64 KiB Wasm pages (16 pages = 1 MiB)
const MAX_MEMORY_PAGES: u32 = 16;

struct ToolCapabilities {
    allowed_domains: Vec<String>,
    filesystem_prefix: Option<String>,
    fuel_limit: u64,
}

/// Build a plugin with scoped capabilities from a manifest
fn build_sandboxed_plugin(
    wasm_path: &Path,
    capabilities: &ToolCapabilities,
) -> anyhow::Result<Plugin> {
    let wasm = Wasm::file(wasm_path);
    let manifest = Manifest::new([wasm])
        // Set memory limit in pages (independent of the fuel limit)
        .with_memory_max(MAX_MEMORY_PAGES);

    // Define host functions: only what the capability manifest allows
    let mut host_functions = vec![];

    // Always available: structured logging (no raw file/network access)
    let log_fn = Function::new(
        "log_debug",
        [ValType::I64], // Handle to the log message in Extism-managed plugin memory
        [],
        UserData::new(()),
        |plugin, inputs, _outputs, _user_data| {
            // Read the string from plugin memory (type-safe, bounded)
            let msg: String = plugin.memory_get_val(&inputs[0])?;
            tracing::debug!(plugin_log = %msg);
            Ok(())
        },
    );
    host_functions.push(log_fn);

    // Conditionally provide http_get if in capability set
    if !capabilities.allowed_domains.is_empty() {
        let allowed_domains = capabilities.allowed_domains.clone();
        // Do not follow redirects: a redirect target would bypass the allowlist check
        let client = reqwest::blocking::Client::builder()
            .redirect(reqwest::redirect::Policy::none())
            .build()?;
        let http_fn = Function::new(
            "http_get",
            [ValType::I64], // URL handle (Extism memory handles carry their own length)
            [ValType::I64], // Response body handle
            UserData::new(()),
            move |plugin, inputs, outputs, _user_data| {
                let url: String = plugin.memory_get_val(&inputs[0])?;

                // Enforce domain allowlist BEFORE making the request
                let parsed = url::Url::parse(&url)?;
                let host = parsed.host_str().unwrap_or("");
                let allowed = allowed_domains.iter().any(|pattern| {
                    if pattern.starts_with("*.") {
                        // "*.example.com" matches subdomains only; keeping the leading
                        // dot stops "evilexample.com" from matching
                        host.ends_with(&pattern[1..])
                    } else {
                        host == pattern.as_str()
                    }
                });
                if !allowed {
                    return Err(anyhow::anyhow!(
                        "Domain '{}' not in capability allowlist", host
                    ));
                }

                // Make the HTTP request on behalf of the plugin
                let body = client.get(url.as_str()).send()?.text()?;

                // Write response to plugin memory and return its handle
                let handle = plugin.memory_new(&body)?;
                outputs[0] = plugin.memory_to_val(handle);
                Ok(())
            },
        );
        host_functions.push(http_fn);
    }

    // Build the plugin with scoped host functions
    let plugin = PluginBuilder::new(manifest)
        .with_wasi(false) // Disable WASI: no filesystem/network via WASI
        .with_functions(host_functions)
        // Instruction budget per call (with_fuel_limit is available in recent 1.x SDK releases)
        .with_fuel_limit(capabilities.fuel_limit)
        .build()?;
    Ok(plugin)
}

/// Execute a tool function in the sandboxed plugin
pub fn execute_tool(
    plugin: &mut Plugin,
    function_name: &str,
    input_json: &str,
) -> anyhow::Result<String> {
    let result = plugin.call::<&str, &str>(function_name, input_json)?;
    Ok(result.to_string())
}
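The allowlist check is the part of the host most worth testing in isolation, since a matching mistake silently widens egress. The following sketch restates the same matching rule (exact host, or `*.` prefix for subdomains only) so that tricky inputs can be probed quickly; `host_allowed` and `url_allowed` are hypothetical helper names, and the URLs are illustrative.

```python
from urllib.parse import urlsplit


def host_allowed(host: str, patterns: list) -> bool:
    """Exact match, or suffix match for patterns of the form '*.example.com'."""
    for pattern in patterns:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            # Keep the leading dot so 'evilexample.com' cannot match '*.example.com'
            if host.endswith(pattern[1:]):
                return True
        elif host == pattern:
            return True
    return False


def url_allowed(url: str, patterns: list) -> bool:
    """Parse the URL first, then check only the parsed hostname."""
    hostname = urlsplit(url).hostname  # lower-cased by urlsplit; None if absent
    if hostname is None:
        return False
    return host_allowed(hostname, patterns)


PATTERNS = ["*.google.com", "api.bing.com"]

for candidate in [
    "https://api.bing.com/v7.0/search?q=wasm",
    "https://www.google.com/",
    "https://evilgoogle.com/",             # suffix without the dot boundary
    "https://api.bing.com@evil.example/",  # allowed name hidden in userinfo
]:
    print(candidate, url_allowed(candidate, PATTERNS))
```

Checking the parsed hostname rather than searching the raw URL string is what defeats the userinfo trick in the last case. Note that under this rule `*.google.com` does not match the bare `google.com`; list the apex separately if it should be reachable.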
Step 3 — Write a Wasm tool plugin (Rust)
Tool authors write their plugin in Rust, Go, or another language compiled to Wasm. The plugin can only use the host functions declared in its capability manifest:
// search_tool/src/lib.rs: Example search tool plugin
use extism_pdk::*;
use serde::{Deserialize, Serialize};

// Import only the host functions declared in the capability manifest.
// #[host_fn] generates wrappers that copy arguments into Extism-managed
// memory and pass handles to the host, so no raw pointers are exchanged.
#[host_fn]
extern "ExtismHost" {
    fn http_get(url: String) -> String;
    fn log_debug(msg: String);
}

#[derive(Deserialize)]
struct SearchInput {
    query: String,
    num_results: Option<usize>,
}

#[derive(Serialize)]
struct SearchResult {
    title: String,
    snippet: String,
    url: String,
}

#[plugin_fn]
pub fn search(input: Json<SearchInput>) -> FnResult<Json<Vec<SearchResult>>> {
    let query = urlencoding::encode(&input.0.query);
    let url = format!(
        "https://api.bing.com/v7.0/search?q={}&count={}",
        query,
        input.0.num_results.unwrap_or(5)
    );

    // Call host's http_get: the host enforces the domain allowlist, so this
    // cannot reach any domain outside the capability manifest
    let response = unsafe { http_get(url)? };

    // Parse the response body returned by the host
    let parsed: serde_json::Value = serde_json::from_str(&response)?;
    let results: Vec<SearchResult> = parsed["webPages"]["value"]
        .as_array()
        .map(|items| {
            items
                .iter()
                .map(|v| SearchResult {
                    title: v["name"].as_str().unwrap_or("").to_string(),
                    snippet: v["snippet"].as_str().unwrap_or("").to_string(),
                    url: v["url"].as_str().unwrap_or("").to_string(),
                })
                .collect()
        })
        .unwrap_or_default();

    unsafe { log_debug(format!("search returned {} results", results.len()))? };
    Ok(Json(results))
}
Compile to Wasm:
# Add Wasm target
rustup target add wasm32-unknown-unknown
# Build the plugin
cargo build --target wasm32-unknown-unknown --release
# The output .wasm file is what gets distributed and loaded by the host
ls target/wasm32-unknown-unknown/release/search_tool.wasm
Step 4 — Integrate with an agent framework
Wire the sandboxed plugin system into an LLM agent:
# agent_tool_registry.py: register Wasm plugins as LLM tools
import json
import subprocess
from pathlib import Path

from anthropic import Anthropic

client = Anthropic()


class WasmToolRegistry:
    """Registry that wraps Wasm plugins as LLM tool definitions."""

    def __init__(self, plugin_host_binary: str):
        self.plugin_host = plugin_host_binary  # Rust binary wrapping Extism
        self.registered_tools: dict[str, dict] = {}

    def register(
        self,
        name: str,
        wasm_path: Path,
        function: str,
        capability_set: str,
        description: str,
        input_schema: dict,
    ) -> None:
        self.registered_tools[name] = {
            "wasm_path": str(wasm_path),
            # Exported plugin function; may differ from the tool name shown to the model
            "function": function,
            "capability_set": capability_set,
            "definition": {
                "name": name,
                "description": description,
                "input_schema": input_schema,
            },
        }

    def execute(self, tool_name: str, tool_input: dict) -> str:
        """Execute a tool in its sandboxed Wasm environment."""
        if tool_name not in self.registered_tools:
            raise ValueError(f"Unknown tool: {tool_name}")
        tool = self.registered_tools[tool_name]

        # Call the Rust plugin host binary
        # In production: embed the host library directly
        result = subprocess.run(
            [self.plugin_host,
             "--plugin", tool["wasm_path"],
             "--capability", tool["capability_set"],
             "--function", tool["function"],
             "--input", json.dumps(tool_input)],
            capture_output=True,
            text=True,
            timeout=10,  # Hard timeout regardless of plugin fuel
        )
        if result.returncode != 0:
            raise RuntimeError(f"Plugin execution failed: {result.stderr}")
        return result.stdout

    @property
    def tool_definitions(self) -> list[dict]:
        """Return tool definitions in Anthropic API format."""
        return [t["definition"] for t in self.registered_tools.values()]


# Set up registry with sandboxed tools
registry = WasmToolRegistry("/usr/local/bin/wasm-tool-host")
registry.register(
    name="web_search",
    wasm_path=Path("/plugins/search_tool.wasm"),
    function="search",  # The function exported by the search plugin
    capability_set="search_tool",
    description="Search the web for information",
    input_schema={
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "num_results": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
)

# Run agent loop with sandboxed tools
messages = [{"role": "user", "content": "What is the latest news about WebAssembly security?"}]
while True:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        tools=registry.tool_definitions,
        messages=messages,
    )
    if response.stop_reason != "tool_use":
        # No tool call requested (end_turn, max_tokens, ...): print any text and stop
        print("".join(block.text for block in response.content if block.type == "text"))
        break

    messages.append({"role": "assistant", "content": response.content})

    # Execute tool calls in sandboxed Wasm environments
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            try:
                result = registry.execute(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
            except Exception as e:
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": f"Tool execution failed (sandboxed): {e}",
                    "is_error": True,
                })
    messages.append({"role": "user", "content": tool_results})
Expected Behaviour
| Signal | Without sandboxing | With Wasm sandboxing |
|---|---|---|
| Malicious tool reads /etc/passwd | Succeeds — tool runs in agent process | Fails — no filesystem host function provided |
| Tool makes outbound HTTP to exfil endpoint | Succeeds — tool has network access | Blocked — domain not in capability allowlist |
| Tool reads environment variable ANTHROPIC_API_KEY | Succeeds | Fails — no env access host function |
| Prompt injection causes tool to run subprocess | Succeeds if tool uses Python subprocess | Fails — subprocess requires host OS access unavailable to Wasm |
| Tool plugin executes for >10 seconds (resource abuse) | Hangs agent | Terminated by fuel limit + timeout |
| Supply chain: malicious tool update published | Executes with full agent runtime permissions | Executes with only declared capability set |
Verification:
// isolation_test.rs: verify sandbox boundaries
#[test]
fn test_plugin_cannot_read_filesystem() {
    // A missing import may be rejected when the plugin is built or when it is
    // first called, so both stages are folded into a single outcome.
    let outcome = build_sandboxed_plugin(
        Path::new("test_plugins/escape_attempt.wasm"),
        &ToolCapabilities {
            allowed_domains: vec![],
            filesystem_prefix: None,
            fuel_limit: 100_000,
        },
    )
    .and_then(|mut plugin| execute_tool(&mut plugin, "try_read_file", "/etc/passwd"));

    // Should fail: no filesystem host function provided
    assert!(outcome.is_err(), "Filesystem access should be blocked in sandbox");
    // {:#} prints the whole anyhow error chain, not only the outermost context
    let message = format!("{:#}", outcome.unwrap_err());
    assert!(message.contains("unknown import"), "unexpected error: {message}");
}
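A complementary check can run before a plugin is ever instantiated: read the module's import section and reject any import that is not on the allowlist. The sketch below is a minimal parser for the core WebAssembly binary format, not a substitute for the runtime's own validation; `list_imports` and `disallowed_imports` are hypothetical names, and the caller must list every import namespace it accepts, including whatever namespace its runtime uses for its own kernel functions.

```python
def _read_u32(data: bytes, pos: int):
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def _read_name(data: bytes, pos: int):
    """Read a length-prefixed UTF-8 name."""
    length, pos = _read_u32(data, pos)
    return data[pos:pos + length].decode("utf-8"), pos + length


def _skip_limits(data: bytes, pos: int) -> int:
    """Skip a limits record: flags byte, minimum, optional maximum."""
    flags = data[pos]
    _, pos = _read_u32(data, pos + 1)
    if flags & 0x01:
        _, pos = _read_u32(data, pos)
    return pos


def list_imports(wasm: bytes) -> list:
    """Return (module, field, kind) for every entry in the import section."""
    if wasm[:8] != b"\x00asm\x01\x00\x00\x00":
        raise ValueError("not a WebAssembly binary module (version 1)")
    pos = 8
    while pos < len(wasm):
        section_id = wasm[pos]
        size, pos = _read_u32(wasm, pos + 1)
        if section_id != 2:          # 2 = import section; skip everything else
            pos += size
            continue
        count, p = _read_u32(wasm, pos)
        imports = []
        for _ in range(count):
            module, p = _read_name(wasm, p)
            field, p = _read_name(wasm, p)
            kind = wasm[p]
            p += 1
            if kind == 0x00:         # function: type index
                _, p = _read_u32(wasm, p)
            elif kind == 0x01:       # table: reference type byte, then limits
                p = _skip_limits(wasm, p + 1)
            elif kind == 0x02:       # memory: limits
                p = _skip_limits(wasm, p)
            elif kind == 0x03:       # global: value type byte, mutability byte
                p += 2
            else:
                raise ValueError(f"unsupported import kind {kind:#x}")
            imports.append((module, field, kind))
        return imports
    return []


def disallowed_imports(wasm: bytes, allowed_by_module: dict) -> list:
    """List imports whose module or field name is not explicitly allowed."""
    bad = []
    for module, field, _kind in list_imports(wasm):
        if field not in allowed_by_module.get(module, set()):
            bad.append(f"{module}::{field}")
    return bad
```

A registry could call `disallowed_imports(wasm_bytes, {...})` at registration time and refuse the plugin if the list is non-empty, which turns an "unknown import" failure at load time into an explicit policy decision with a readable report.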
Trade-offs
| Aspect | Benefit | Cost | Mitigation |
|---|---|---|---|
| Wasm compilation requirement | Structural security boundary | Tools must be compiled to Wasm; Python tools need an interpreter that itself runs in Wasm (for example MicroPython) or a Python-to-Wasm toolchain | Provide toolchain scaffolding; common tool types have Rust/Go templates; Python tools use Componentize-py |
| Host-mediated network access | Domain allowlist enforcement | HTTP client must be implemented in host, not plugin | Use Extism’s built-in HTTP capability or write a thin host function; acceptable for REST APIs |
| Fuel/instruction limits | Prevents resource abuse | Limits tools that legitimately need long execution | Set fuel per capability set; search tools need less than code analysis tools; tune empirically |
| No WASI | Prevents filesystem/network bypass via WASI | Some library crates expect WASI; compilation may fail | Use wasm32-unknown-unknown target (no WASI) for security-critical plugins; use wasm32-wasi (named wasm32-wasip1 in newer Rust toolchains) only with explicit WASI host restrictions |
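
Tuning fuel empirically can be as simple as recording the fuel consumed by successful calls and setting the limit at a high percentile plus headroom. The sketch below assumes the host can record fuel consumed per call for each capability set; `suggest_fuel_limit`, the percentile, and the headroom factor are illustrative choices rather than recommended values.

```python
import math


def suggest_fuel_limit(samples: list, percentile: float = 99.0, headroom: float = 1.5) -> int:
    """Suggest a fuel limit from observed per-call fuel consumption.

    Uses the nearest-rank percentile (the smallest sample with at least
    `percentile` percent of samples at or below it), then multiplies by a
    headroom factor so ordinary variation does not trip the limit.
    """
    if not samples:
        raise ValueError("need at least one fuel sample")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return math.ceil(ordered[rank - 1] * headroom)


# 99 ordinary calls and one outlier: the outlier does not drive the limit
observed = [100] * 99 + [10_000]
print(suggest_fuel_limit(observed))                  # 99th percentile with 1.5x headroom
print(suggest_fuel_limit(observed, percentile=100))  # worst case with 1.5x headroom
```

Choosing a percentile below 100 keeps a single pathological input from inflating the budget for every call, at the cost of rejecting that input; which trade is right depends on the tool category.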
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Plugin compilation fails for complex tool | Wasm binary missing or incomplete; tool registration fails | Build CI step fails; test suite catches | Simplify tool API; extract complex logic into host function; provide a Docker-based fallback for tools that cannot be compiled to Wasm |
| Fuel limit too low for legitimate tool | Tool returns “fuel exhausted” error for valid queries | Agent receives tool_error; retries fail | Increase fuel limit for the specific capability set; monitor fuel usage per tool type |
| Host function API mismatch after update | Plugin calls function with wrong signature; runtime type error | Runtime error in plugin execution | Version host function APIs; use capability manifest versioning; test plugin compatibility after host updates |
| Attacker submits malicious Wasm binary via plugin marketplace | Plugin loads but attempts to exploit Wasm runtime vulnerability | Runtime sandbox escape (very rare with Wasmtime); or capability boundary exceeded | Keep Wasmtime/Extism updated; run plugins in separate OS processes as defence-in-depth; cryptographically verify plugin identity |
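
The last recovery item can start with something modest: pin a SHA-256 digest for each approved plugin and refuse to load anything that does not match. This is a sketch under the assumption that the pinned digests are stored somewhere the plugin author cannot modify; `verify_plugin` and the pin table layout are hypothetical. A digest proves the file is unchanged since review, not who published it, so a signature scheme would still be needed to establish publisher identity.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path, chunk_size: int = 65536) -> str:
    """Hash a file incrementally so large plugins are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_plugin(path, pinned_digests: dict) -> str:
    """Return the digest if it matches the pin for this file name; otherwise refuse."""
    expected = pinned_digests.get(Path(path).name)
    if expected is None:
        raise PermissionError(f"{path}: no pinned digest, refusing to load")
    actual = sha256_file(path)
    # compare_digest avoids leaking how many leading characters matched
    if not hmac.compare_digest(actual, expected.lower()):
        raise PermissionError(f"{path}: digest mismatch, refusing to load")
    return actual
```

Calling `verify_plugin` immediately before handing the file to the plugin host closes the window in which a reviewed file could be swapped for a modified one, provided the host then loads the same bytes it hashed.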
Related Articles
- Wasm Sandboxed MCP Tools — using Wasm to sandbox MCP tool implementations, the same pattern applied to the MCP protocol
- Agent Tool Use Sandboxing — the AI-landscape perspective on sandboxing LLM agent tools
- Extism Plugin Security — hardening the Extism plugin framework used as the Wasm tool host
- Wasm Host Function Security — designing secure host function interfaces that enforce capability boundaries
- User-Provided Wasm Execution — broader guidance on safely executing untrusted Wasm code, applicable to third-party tool plugins