Sandboxing and Blast Radius
Series: Agent Engineering

An agent that can do anything will eventually do something catastrophic. Not from malice — from a hallucinated tool argument, a misunderstood constraint, or a compound error across five steps, each of which looked reasonable on its own. Sandboxing is not paranoia; it is the engineering discipline of defining exactly how much damage an agent can cause when it goes wrong, and ensuring that number is acceptable before you deploy.
Blast Radius as a Design Constraint
Blast radius is the maximum damage a single agent run can cause if everything goes wrong. It is a design input, not an afterthought. Before writing any agent code, answer: "If this agent runs completely off the rails, what is the worst outcome?" The answer defines your sandbox requirements.
Every action goes through three gates: permission scope, dry-run flag, and rollback record. None of these are optional for agents that touch production systems.
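The three gates compose into a single guarded call path. As a minimal sketch (the `Scope`, `RollbackLog`, and `guarded_run` names here are illustrative stand-ins, condensed from the fuller components developed in the sections below):

```python
from dataclasses import dataclass, field

@dataclass
class Scope:
    allowed_tools: set[str]
    def check(self, tool: str) -> None:
        if tool not in self.allowed_tools:
            raise PermissionError(f"Tool `{tool}` not in scope")

@dataclass
class RollbackLog:
    records: list[dict] = field(default_factory=list)
    def record(self, tool: str, args: dict) -> None:
        self.records.append({"tool": tool, "args": args})

def guarded_run(tool: str, args: dict, *, scope: Scope,
                dry_run: bool, rollback: RollbackLog) -> str:
    scope.check(tool)                       # gate 1: permission scope
    if dry_run:                             # gate 2: dry-run flag
        return f"[DRY RUN] {tool}({args})"
    rollback.record(tool, args)             # gate 3: rollback record, written first
    return f"executed {tool}"               # the real side effect happens last
```

Note the ordering: the rollback record is written before the side effect, so a crash mid-action still leaves a recovery artifact.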
Permission Scopes
The principle of least privilege applies to agents more than almost any other software system. An agent that reads customer data to answer support queries does not need write access to the billing database. Model your permissions explicitly.
from enum import Flag, auto
from dataclasses import dataclass

class Permission(Flag):
    READ_FS = auto()
    WRITE_FS = auto()
    DELETE_FS = auto()
    READ_DB = auto()
    WRITE_DB = auto()
    SEND_EMAIL = auto()
    CALL_API = auto()
    RUN_CODE = auto()

TOOL_REQUIRED_PERMS: dict[str, Permission] = {
    "read_file": Permission.READ_FS,
    "write_file": Permission.WRITE_FS,
    "delete_file": Permission.DELETE_FS | Permission.WRITE_FS,
    "query_db": Permission.READ_DB,
    "update_db": Permission.WRITE_DB,
    "send_email": Permission.SEND_EMAIL,
    "http_request": Permission.CALL_API,
    "run_python": Permission.RUN_CODE,
}

@dataclass
class AgentScope:
    granted: Permission
    allowed_paths: list[str] | None = None  # filesystem path allowlist
    allowed_domains: list[str] | None = None  # HTTP domain allowlist

    def check(self, tool: str, args: dict) -> None:
        required = TOOL_REQUIRED_PERMS.get(tool, Permission(0))
        missing = required & ~self.granted
        if missing:
            raise PermissionError(
                f"Tool `{tool}` requires {missing!r} — not in agent scope."
            )
        # Path allowlist enforcement. Note: startswith is plain prefix
        # matching; resolve paths first (Path.resolve) so `..` traversal
        # cannot escape the allowlist.
        if self.allowed_paths and "path" in args:
            path = args["path"]
            if not any(path.startswith(p) for p in self.allowed_paths):
                raise PermissionError(
                    f"Path `{path}` not in allowed paths: {self.allowed_paths}"
                )

Scope objects are created at task instantiation and passed through the entire agent loop. The agent never self-grants permissions — scope is determined by the caller (the orchestration layer), not the agent.
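In use, the deny path is the interesting one. A condensed two-permission version of the classes above (restated here so the snippet runs standalone; the support-agent scenario is illustrative):

```python
from enum import Flag, auto
from dataclasses import dataclass

class Permission(Flag):
    READ_FS = auto()
    WRITE_FS = auto()

TOOL_REQUIRED_PERMS = {
    "read_file": Permission.READ_FS,
    "write_file": Permission.WRITE_FS,
}

@dataclass
class AgentScope:
    granted: Permission
    def check(self, tool: str, args: dict) -> None:
        required = TOOL_REQUIRED_PERMS.get(tool, Permission(0))
        missing = required & ~self.granted
        if missing:
            raise PermissionError(f"Tool `{tool}` requires {missing!r}")

# A support agent gets a read-only scope; any write attempt is denied.
scope = AgentScope(granted=Permission.READ_FS)
scope.check("read_file", {"path": "/data/tickets.txt"})   # passes silently
try:
    scope.check("write_file", {"path": "/data/tickets.txt"})
except PermissionError as e:
    print(e)
```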
Dry-Run Mode
Dry-run mode executes the permission and validation logic but skips the actual side effect. It is essential for two use cases: testing agent behavior without touching real systems, and showing users a preview of what the agent plans to do before it does it.
from typing import Any

class ToolRunner:
    def __init__(self, scope: AgentScope, dry_run: bool = False):
        self.scope = scope
        self.dry_run = dry_run
        self.dry_run_log: list[dict] = []

    def run(self, tool: str, args: dict) -> Any:
        # Always check permissions, even in dry-run
        self.scope.check(tool, args)
        if self.dry_run:
            preview = self._preview(tool, args)
            self.dry_run_log.append({"tool": tool, "args": args, "preview": preview})
            return f"[DRY RUN] Would execute: {preview}"
        return self._execute(tool, args)

    def _preview(self, tool: str, args: dict) -> str:
        previews = {
            "write_file": lambda a: f"Write {len(a.get('content', ''))} bytes to {a['path']}",
            "delete_file": lambda a: f"Delete {a['path']}",
            "send_email": lambda a: f"Email to {a['to']}: '{a.get('subject', '')}'",
            "update_db": lambda a: f"UPDATE {a.get('table', '?')} WHERE {a.get('where', '?')}",
        }
        return previews.get(tool, lambda a: f"{tool}({a})")(args)

    def _execute(self, tool: str, args: dict) -> Any:
        # Real implementation delegates to actual tool functions
        raise NotImplementedError

Show the dry_run_log to users or the human-in-the-loop reviewer as the "plan preview" before switching to live mode. This combines nicely with the checkpoint pattern from post 8.
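One way to surface that log is a small formatting helper. The `render_plan` function below is a hypothetical addition, not part of the ToolRunner itself:

```python
def render_plan(dry_run_log: list[dict]) -> str:
    """Format a dry-run log as a numbered plan for human review."""
    lines = [f"{i}. {entry['preview']}" for i, entry in enumerate(dry_run_log, 1)]
    return "Proposed actions:\n" + "\n".join(lines)

# Example log entries in the shape ToolRunner accumulates.
log = [
    {"tool": "write_file", "args": {"path": "/srv/report.md"},
     "preview": "Write 2048 bytes to /srv/report.md"},
    {"tool": "send_email", "args": {"to": "ops@example.com"},
     "preview": "Email to ops@example.com: 'Weekly report'"},
]
print(render_plan(log))
```

The reviewer approves or rejects this rendered plan; only on approval do you re-run the same tool calls with `dry_run=False`.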
Rollback Records
For every destructive action, write a rollback record before executing. This is the agent equivalent of a database transaction log.
import json
import shutil
from pathlib import Path
from datetime import datetime, timezone

class RollbackManager:
    def __init__(self, rollback_dir: str = "/tmp/agent_rollback"):
        self.dir = Path(rollback_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def snapshot_file(self, path: str) -> str:
        """Copy file to rollback dir before overwriting/deleting."""
        src = Path(path)
        if not src.exists():
            return "nonexistent"
        snapshot_path = self.dir / f"{src.name}.{int(datetime.now().timestamp())}.bak"
        shutil.copy2(src, snapshot_path)
        return str(snapshot_path)

    def record_db_change(self, table: str, where: str, before_values: dict) -> str:
        """Write inverse SQL to a rollback record."""
        record_path = self.dir / f"db_{table}_{int(datetime.now().timestamp())}.json"
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "table": table,
            "where": where,
            "before_values": before_values,
            "rollback_sql": (
                f"UPDATE {table} SET "
                + ", ".join(f"{k}={json.dumps(v)}" for k, v in before_values.items())
                + f" WHERE {where};"
            ),
        }
        record_path.write_text(json.dumps(record, indent=2))
        return str(record_path)

    def rollback_file(self, snapshot_path: str, original_path: str):
        if snapshot_path == "nonexistent":
            Path(original_path).unlink(missing_ok=True)
        else:
            shutil.copy2(snapshot_path, original_path)

The rollback record is written before the action executes. If the action partially succeeds and then fails (network cut, crash), you still have the rollback artifact.
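The full lifecycle, snapshot then damage then restore, round-trips like this. A standalone sketch using a temporary directory rather than the RollbackManager's fixed rollback dir:

```python
import shutil
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
target = workdir / "config.txt"
target.write_text("original")

# 1. Snapshot before the destructive write (mirrors snapshot_file above).
snapshot = workdir / "config.txt.bak"
shutil.copy2(target, snapshot)

# 2. The agent performs its (possibly wrong) write.
target.write_text("agent-modified")

# 3. Something went wrong: restore from the snapshot.
shutil.copy2(snapshot, target)
assert target.read_text() == "original"
```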
Sandboxing Code Execution
If your agent can run arbitrary code (run_python, shell exec), sandboxing becomes existential. The minimum viable approach: subprocess isolation with resource limits.
import subprocess
import resource

def run_sandboxed_python(code: str, timeout_sec: float = 10.0) -> str:
    """
    Run Python in a subprocess with CPU/memory limits.
    In production, use Docker, gVisor, or Firecracker instead.
    """
    # Blocklist dangerous imports (crude substring matching)
    BLOCKED = ["os.system", "subprocess", "importlib", "__import__", "eval", "exec"]
    for pattern in BLOCKED:
        if pattern in code:
            raise PermissionError(f"Blocked pattern in code: `{pattern}`")
    result = subprocess.run(
        ["python3", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_sec,
        # Resource limits via preexec_fn (Unix only)
        preexec_fn=lambda: (
            resource.setrlimit(resource.RLIMIT_AS, (256 * 1024 * 1024,) * 2),
            resource.setrlimit(resource.RLIMIT_CPU, (5, 5)),
        ),
    )
    if result.returncode != 0:
        raise RuntimeError(f"Code failed: {result.stderr[:500]}")
    return result.stdout

For production, use container-level sandboxing (Docker with --network=none --read-only --memory=256m) or dedicated sandbox services (E2B, Morph). The subprocess approach above is appropriate only for dev/test environments.
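For the container route, the lockdown lives in the `docker run` flags. A sketch that only assembles the command (pass it to `subprocess.run` where Docker is available; the image name and exact limits are assumptions to tune for your workload):

```python
import shlex

def docker_sandbox_cmd(code: str, image: str = "python:3.12-slim") -> list[str]:
    """Assemble (but do not execute) a locked-down `docker run` command."""
    return [
        "docker", "run", "--rm",
        "--network=none",     # no outbound network
        "--read-only",        # immutable root filesystem
        "--memory=256m",      # memory cap
        "--cpus=1",           # CPU cap
        "--pids-limit=64",    # fork-bomb protection
        image, "python3", "-c", code,
    ]

cmd = docker_sandbox_cmd("print('hello')")
print(shlex.join(cmd))
```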
Key Takeaways
- Blast radius is a design input: define the worst-case outcome before writing agent code and ensure sandboxing constrains it to acceptable levels.
- Permission scopes are set by the orchestration layer at task creation — agents must never self-grant permissions.
- Dry-run mode should enforce the same permission checks as live mode — it is not a bypass, it is a preview.
- Write rollback records before destructive actions execute, not after — a mid-action crash still needs a recovery path.
- Blocklisting dangerous patterns in code execution is a first line of defense, not a complete solution — use container sandboxing for production.
- Every permission denial and rollback event should emit a structured log entry; the audit trail is your incident response tool when something goes wrong.
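On that last point, a minimal shape for such an audit entry (the field names and `audit_event` helper are illustrative; in practice route the record to your logging pipeline rather than stderr):

```python
import json
import sys
from datetime import datetime, timezone

def audit_event(event: str, tool: str, detail: dict) -> dict:
    """Build and emit one structured record per security-relevant event."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": event,       # e.g. "permission_denied", "rollback"
        "tool": tool,
        "detail": detail,
    }
    print(json.dumps(record), file=sys.stderr)
    return record

rec = audit_event("permission_denied", "delete_file",
                  {"path": "/etc/passwd", "missing": "DELETE_FS"})
```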