Agent Engineering

Sandboxing and Blast Radius

Ravinder·March 29, 2025·6 min read

Agents · AI · LLM · Security · Sandboxing · Safety

Series

Agent Engineering

Part 9 of 10

← Part 8

Human-in-the-Loop Checkpoints

Part 10 →

Production Rollout Patterns

An agent that can do anything will eventually do something catastrophic. Not from malice, but from a hallucinated tool argument, a misunderstood constraint, or a compound error across five steps that each looked reasonable on its own. Sandboxing is not paranoia; it is the engineering discipline of defining exactly how much damage an agent can cause when it goes wrong, and ensuring that amount is acceptable before you deploy.

Blast Radius as a Design Constraint

Blast radius is the maximum damage a single agent run can cause if everything goes wrong. It is a design input, not an afterthought. Before writing any agent code, answer: "If this agent runs completely off the rails, what is the worst outcome?" The answer defines your sandbox requirements.
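One way to make the worst-case answer enforceable is a per-run budget that counts side effects and refuses anything over the cap. The sketch below is illustrative only: `RunBudget`, `BudgetExceeded`, and the limit names are hypothetical and not part of any library.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a run tries to exceed its declared blast radius."""


@dataclass
class RunBudget:
    # Hypothetical caps; pick values from your own worst-case analysis.
    limits: dict[str, int]
    used: dict[str, int] = field(default_factory=dict)

    def spend(self, kind: str, amount: int = 1) -> None:
        """Count an action of `kind`; refuse it if the cap would be exceeded."""
        cap = self.limits.get(kind, 0)  # unlisted kinds get a cap of zero
        total = self.used.get(kind, 0) + amount
        if total > cap:
            raise BudgetExceeded(f"{kind}: {total} would exceed cap of {cap}")
        self.used[kind] = total


budget = RunBudget(limits={"files_deleted": 2, "emails_sent": 1})
budget.spend("files_deleted")
budget.spend("files_deleted")
try:
    budget.spend("files_deleted")  # third delete is over budget
except BudgetExceeded as exc:
    print(exc)
```

Kinds that are not listed default to a cap of zero, so an action nobody planned for is refused rather than silently allowed.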

flowchart TD
    A[Agent Action Request] --> B{Permission Scope Check}
    B -- denied --> C[Return PermissionError]
    B -- allowed --> D{Dry-Run Mode?}
    D -- yes --> E[Simulate & Return Preview]
    D -- no --> F[Execute Action]
    F --> G[Snapshot State Before]
    G --> H[Write Rollback Record]
    H --> I[Perform Action]
    I --> J{Success?}
    J -- yes --> K[Commit + Log]
    J -- no --> L[Trigger Rollback]
    L --> M[Restore from Snapshot]

Every action goes through three gates: permission scope, dry-run flag, and rollback record. None of these are optional for agents that touch production systems.
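The three gates can be sketched as one small function. This is a simplified illustration, assuming the state is an in-memory dict and the permission decision has already been reduced to a boolean; `guarded_call` and `failing_update` are hypothetical names.

```python
from typing import Any, Callable


def guarded_call(
    allowed: bool,
    dry_run: bool,
    state: dict[str, Any],
    action: Callable[[dict[str, Any]], None],
) -> str:
    """Run `action` against `state` behind the three gates."""
    # Gate 1: permission scope.
    if not allowed:
        raise PermissionError("action is outside the agent's scope")
    # Gate 2: dry-run returns a preview and touches nothing.
    if dry_run:
        return "preview"
    # Gate 3: snapshot before acting so a failure can be undone.
    # (Shallow copy: enough for flat values, not for nested structures.)
    snapshot = dict(state)
    try:
        action(state)
    except Exception:
        state.clear()
        state.update(snapshot)  # restore the pre-action state
        return "rolled back"
    return "committed"


def failing_update(s: dict[str, Any]) -> None:
    s["balance"] = -1           # partial write...
    raise RuntimeError("boom")  # ...then the action fails


account = {"balance": 100}
print(guarded_call(True, False, account, failing_update))  # rolled back
print(account)  # {'balance': 100}
```

The ordering is the point: the permission check runs even for dry runs, and the snapshot exists before the action has any chance to fail halfway.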

Permission Scopes

The principle of least privilege applies to agents more than almost any other software system. An agent that reads customer data to answer support queries does not need write access to the billing database. Model your permissions explicitly.

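import os
from dataclasses import dataclass
from enum import Flag, auto
from urllib.parse import urlparse

class Permission(Flag):
    READ_FS     = auto()
    WRITE_FS    = auto()
    DELETE_FS   = auto()
    READ_DB     = auto()
    WRITE_DB    = auto()
    SEND_EMAIL  = auto()
    CALL_API    = auto()
    RUN_CODE    = auto()

TOOL_REQUIRED_PERMS: dict[str, Permission] = {
    "read_file":     Permission.READ_FS,
    "write_file":    Permission.WRITE_FS,
    "delete_file":   Permission.DELETE_FS | Permission.WRITE_FS,
    "query_db":      Permission.READ_DB,
    "update_db":     Permission.WRITE_DB,
    "send_email":    Permission.SEND_EMAIL,
    "http_request":  Permission.CALL_API,
    "run_python":    Permission.RUN_CODE,
}

@dataclass
class AgentScope:
    granted: Permission
    allowed_paths: list[str] | None = None     # filesystem path allowlist
    allowed_domains: list[str] | None = None   # HTTP domain allowlist

    def check(self, tool: str, args: dict) -> None:
        # Fail closed: a tool with no declared requirements is denied,
        # not treated as requiring nothing.
        if tool not in TOOL_REQUIRED_PERMS:
            raise PermissionError(f"Tool `{tool}` is not registered; denied by default.")
        missing = TOOL_REQUIRED_PERMS[tool] & ~self.granted
        if missing:
            raise PermissionError(
                f"Tool `{tool}` requires {missing!r}, which is not in the agent's scope."
            )
        # Path allowlist enforcement. Resolve `..` and symlinks first: a plain
        # prefix test would accept `/data/../etc/passwd` or `/data-evil`.
        if self.allowed_paths is not None and "path" in args:
            path = os.path.realpath(args["path"])
            roots = [os.path.realpath(p) for p in self.allowed_paths]
            if not any(os.path.commonpath([path, r]) == r for r in roots):
                raise PermissionError(
                    f"Path `{path}` not in allowed paths: {self.allowed_paths}"
                )
        # Domain allowlist enforcement (exact hostname match). The "url"
        # argument name is an assumption about how HTTP tools pass targets.
        if self.allowed_domains is not None and "url" in args:
            host = urlparse(args["url"]).hostname or ""
            if host not in self.allowed_domains:
                raise PermissionError(
                    f"Domain `{host}` not in allowed domains: {self.allowed_domains}"
                )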

Scope objects are created at task instantiation and passed through the entire agent loop. The agent never self-grants permissions — scope is determined by the caller (the orchestration layer), not the agent.
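A sketch of what "determined by the caller" can look like in practice: the orchestration layer owns a table of scopes per task type and hands the agent an immutable object. Everything here is illustrative; `Perm`, `FrozenScope`, `SCOPES_BY_TASK`, and the task names are hypothetical stand-ins.

```python
from dataclasses import FrozenInstanceError, dataclass
from enum import Flag, auto


class Perm(Flag):  # trimmed-down stand-in for a permission enum
    READ_DB = auto()
    WRITE_DB = auto()
    SEND_EMAIL = auto()


@dataclass(frozen=True)
class FrozenScope:
    granted: Perm


# Hypothetical task types mapped to scopes by the orchestration layer.
SCOPES_BY_TASK = {
    "support_lookup": FrozenScope(Perm.READ_DB),
    "send_receipt": FrozenScope(Perm.READ_DB | Perm.SEND_EMAIL),
}


def scope_for(task_type: str) -> FrozenScope:
    """Unknown task types get no scope at all rather than a default one."""
    try:
        return SCOPES_BY_TASK[task_type]
    except KeyError:
        raise PermissionError(f"no scope defined for task `{task_type}`") from None


scope = scope_for("support_lookup")
try:
    scope.granted = scope.granted | Perm.WRITE_DB  # attempted self-grant
except FrozenInstanceError:
    print("scope is read-only")
```

A frozen dataclass only guards against accidental mutation inside the same process. It is not a security boundary on its own, so the actual enforcement still belongs in the tool runner that the agent cannot modify.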

Dry-Run Mode

Dry-run mode executes the permission and validation logic but skips the actual side effect. It is essential for two use cases: testing agent behavior without touching real systems, and showing users a preview of what the agent plans to do before it does it.

from typing import Any
 
class ToolRunner:
    def __init__(self, scope: AgentScope, dry_run: bool = False):
        self.scope = scope
        self.dry_run = dry_run
        self.dry_run_log: list[dict] = []
 
    def run(self, tool: str, args: dict) -> Any:
        # Always check permissions, even in dry-run
        self.scope.check(tool, args)
 
        if self.dry_run:
            preview = self._preview(tool, args)
            self.dry_run_log.append({"tool": tool, "args": args, "preview": preview})
            return f"[DRY RUN] Would execute: {preview}"
 
        return self._execute(tool, args)
 
    def _preview(self, tool: str, args: dict) -> str:
        # Use .get() throughout so a malformed call still yields a preview
        # instead of a KeyError; len() of a str counts characters, not bytes.
        previews = {
            "write_file": lambda a: f"Write {len(a.get('content', ''))} characters to {a.get('path', '?')}",
            "delete_file": lambda a: f"Delete {a.get('path', '?')}",
            "send_email": lambda a: f"Email to {a.get('to', '?')}: '{a.get('subject', '')}'",
            "update_db": lambda a: f"UPDATE {a.get('table', '?')} WHERE {a.get('where', '?')}",
        }
        return previews.get(tool, lambda a: f"{tool}({a})")(args)
 
    def _execute(self, tool: str, args: dict) -> Any:
        # Real implementation delegates to actual tool functions
        raise NotImplementedError

Show the dry_run_log to users or the human-in-the-loop reviewer as the "plan preview" before switching to live mode. This combines well with the checkpoint pattern from Part 8 (Human-in-the-Loop Checkpoints).
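A minimal sketch of turning that log into something a reviewer can read, assuming each entry has the `tool` and `preview` keys that the runner appends. `format_plan` is a hypothetical helper, and the sample entries are made up for illustration.

```python
def format_plan(dry_run_log: list[dict]) -> str:
    """Render dry-run entries as a numbered plan for a reviewer."""
    if not dry_run_log:
        return "No side effects planned."
    lines = [
        f"{i}. [{entry['tool']}] {entry['preview']}"
        for i, entry in enumerate(dry_run_log, start=1)
    ]
    return "\n".join(lines)


# Entries shaped like the ones appended in dry-run mode.
log = [
    {"tool": "delete_file", "args": {"path": "/data/old.csv"},
     "preview": "Delete /data/old.csv"},
    {"tool": "send_email", "args": {"to": "ops@example.com"},
     "preview": "Email to ops@example.com: 'Cleanup done'"},
]
print(format_plan(log))
```

Numbering the steps gives the reviewer a way to approve or reject individual actions by index instead of the whole plan.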

Rollback Records

For every destructive action, write a rollback record before executing. This is the agent equivalent of a database transaction log.

import json
import shutil
from pathlib import Path
from datetime import datetime, timezone
 
class RollbackManager:
    def __init__(self, rollback_dir: str = "/tmp/agent_rollback"):
        self.dir = Path(rollback_dir)
        self.dir.mkdir(parents=True, exist_ok=True)
 
    @staticmethod
    def _stamp() -> str:
        """UTC timestamp with microseconds, so rapid snapshots do not collide."""
        return datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")

    def snapshot_file(self, path: str) -> str:
        """Copy file to rollback dir before overwriting/deleting."""
        src = Path(path)
        if not src.exists():
            return "nonexistent"
        snapshot_path = self.dir / f"{src.name}.{self._stamp()}.bak"
        shutil.copy2(src, snapshot_path)
        return str(snapshot_path)

    def record_db_change(self, table: str, where: str, before_values: dict) -> str:
        """Write a parameterized inverse UPDATE to a rollback record."""
        record_path = self.dir / f"db_{table}_{self._stamp()}.json"
        columns = list(before_values)
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "table": table,
            "where": where,
            "before_values": before_values,
            # Values travel as bound parameters. Inlining them with json.dumps
            # would emit double-quoted strings, which SQL reads as identifiers.
            # The `?` placeholder style is an assumption (sqlite3-style drivers).
            "rollback_sql": (
                f"UPDATE {table} SET "
                + ", ".join(f"{c} = ?" for c in columns)
                + f" WHERE {where};"
            ),
            "rollback_params": [before_values[c] for c in columns],
        }
        record_path.write_text(json.dumps(record, indent=2))
        return str(record_path)
 
    def rollback_file(self, snapshot_path: str, original_path: str):
        if snapshot_path == "nonexistent":
            Path(original_path).unlink(missing_ok=True)
        else:
            shutil.copy2(snapshot_path, original_path)

The rollback record is written before the action executes. If the action partially succeeds and then fails (network cut, crash), you still have the rollback artifact.
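To make the crash case concrete, here is a small sketch of a journal where each record starts as "pending" and only a successful action marks it "committed". On restart, anything still pending is restored. The function names (`begin`, `commit`, `recover`) and the record layout are assumptions for illustration, not part of the RollbackManager above.

```python
import json
import shutil
import tempfile
from pathlib import Path


def begin(journal: Path, target: Path) -> Path:
    """Back up `target` and write a pending record before it is touched."""
    journal.mkdir(parents=True, exist_ok=True)
    backup = journal / f"{target.name}.bak"
    shutil.copy2(target, backup)
    record = journal / f"{target.name}.json"
    record.write_text(json.dumps(
        {"status": "pending", "target": str(target), "backup": str(backup)}
    ))
    return record


def commit(record: Path) -> None:
    """Mark the action as finished so recovery will leave it alone."""
    data = json.loads(record.read_text())
    data["status"] = "committed"
    record.write_text(json.dumps(data))


def recover(journal: Path) -> list[str]:
    """Restore every target whose record is still pending."""
    restored = []
    for record in sorted(journal.glob("*.json")):
        data = json.loads(record.read_text())
        if data["status"] == "pending":
            shutil.copy2(data["backup"], data["target"])
            data["status"] = "rolled_back"
            record.write_text(json.dumps(data))
            restored.append(data["target"])
    return restored


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "config.ini"
    target.write_text("mode=safe\n")
    begin(Path(tmp) / "journal", target)
    target.write_text("mode=")          # simulated crash mid-write, no commit
    recover(Path(tmp) / "journal")
    print(target.read_text().strip())   # mode=safe
```

This sketch keeps one record per file name for brevity; a real journal would need unique record names and an atomic write for the record itself.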

Sandboxing Code Execution

If your agent can run arbitrary code (run_python, shell exec), sandboxing stops being optional: a single bad snippet can read secrets, exhaust the host, or reach the network. The minimum viable approach is subprocess isolation with resource limits.

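import resource  # Unix only
import subprocess
import sys

# Substring blocklist: a coarse first filter that is easy to bypass
# (for example via getattr or string concatenation). Never rely on it alone.
BLOCKED = ["os.system", "subprocess", "importlib", "__import__", "eval", "exec"]

def _limit_resources() -> None:
    """Runs in the child before exec: cap address space and CPU time."""
    resource.setrlimit(resource.RLIMIT_AS, (256 * 1024 * 1024,) * 2)
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))

def run_sandboxed_python(code: str, timeout_sec: float = 10.0) -> str:
    """
    Run Python in a subprocess with CPU/memory limits.
    In production, use Docker, gVisor, or Firecracker instead.
    """
    for pattern in BLOCKED:
        if pattern in code:
            raise PermissionError(f"Blocked pattern in code: `{pattern}`")

    try:
        result = subprocess.run(
            # -I: isolated mode, ignores environment variables and user site-packages
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_sec,
            # Resource limits via preexec_fn (Unix only)
            preexec_fn=_limit_resources,
        )
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"Code exceeded the {timeout_sec}s wall-clock limit") from exc
    if result.returncode != 0:
        raise RuntimeError(f"Code failed: {result.stderr[:500]}")
    return result.stdout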

For production, use container-level sandboxing (Docker with --network=none --read-only --memory=256m) or dedicated sandbox services (E2B, Morph). The subprocess approach above is appropriate only for dev/test environments.
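As a rough sketch of the container route, the helper below only builds the `docker run` argument list with the flags mentioned above; it does not start a container. The function name and the `python:3.12-slim` image are assumptions chosen for illustration.

```python
def docker_sandbox_argv(code: str, image: str = "python:3.12-slim") -> list[str]:
    """Build (but do not run) a `docker run` command for untrusted Python."""
    return [
        "docker", "run",
        "--rm",             # remove the container when it exits
        "--network=none",   # no network access
        "--read-only",      # root filesystem is read-only
        "--memory=256m",    # hard memory cap
        image,
        "python", "-c", code,
    ]


argv = docker_sandbox_argv("print(2 + 2)")
print(" ".join(argv[:6]))
# To execute: subprocess.run(argv, capture_output=True, text=True, timeout=30)
```

Passing the command as a list rather than a shell string means the agent-supplied code is a single argument and never goes through shell parsing.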

Key Takeaways

Blast radius is a design input: define the worst-case outcome before writing agent code and ensure sandboxing constrains it to acceptable levels.
Permission scopes are set by the orchestration layer at task creation — agents must never self-grant permissions.
Dry-run mode should enforce the same permission checks as live mode — it is not a bypass, it is a preview.
Write rollback records before destructive actions execute, not after — a mid-action crash still needs a recovery path.
Blocklisting dangerous patterns in code execution is a first line of defense, not a complete solution — use container sandboxing for production.
Every permission denial and rollback event should emit a structured log entry; the audit trail is your incident response tool when something goes wrong.
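For the audit trail, a minimal sketch using the standard library: one JSON object per line, emitted through `logging`. The logger name, the `audit_event` helper, and the field names are hypothetical; adapt them to whatever your log pipeline expects.

```python
import json
import logging
from datetime import datetime, timezone

audit = logging.getLogger("agent.audit")  # hypothetical logger name


def audit_event(event: str, run_id: str, tool: str, **details) -> str:
    """Emit one JSON line per security-relevant event and return it."""
    line = json.dumps(
        {
            "ts": datetime.now(timezone.utc).isoformat(),
            "event": event,   # e.g. "permission_denied", "rollback"
            "run_id": run_id,
            "tool": tool,
            **details,
        },
        sort_keys=True,
        default=str,          # fall back to str() for non-JSON values
    )
    audit.warning(line)
    return line


logging.basicConfig(level=logging.INFO, format="%(message)s")
audit_event("permission_denied", run_id="run-42", tool="delete_file", path="/etc/passwd")
```

Keeping `run_id` on every entry lets you reconstruct the full sequence of a single agent run during an incident.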
