Human-in-the-Loop Checkpoints
Full autonomy is a product decision, not a technical default. The question is not whether to add human checkpoints but where — and how to make them feel like safety nets rather than friction. A poorly designed checkpoint that triggers on every trivial action trains users to click "approve" without reading. A poorly placed checkpoint that triggers too late fails to prevent the damage it was meant to catch.
Anatomy of a Checkpoint
A checkpoint is a pause in the agent loop where a human reviews and approves (or rejects) a pending action before it executes. Every checkpoint needs four things: a trigger condition, a human-readable summary of the pending action, a response interface, and a timeout policy.
The timeout policy is where most implementations make a mistake. Auto-approve on timeout is dangerous (the agent proceeds without consent). Auto-reject on timeout is safe but creates deadlocks for long-running tasks. The right answer depends on the action's reversibility: auto-approve reversible actions, auto-reject irreversible ones.
Trigger Conditions
Not all actions need approval. The trigger condition is the key design decision.
```python
from enum import Enum
from dataclasses import dataclass

class ActionRisk(Enum):
    LOW = "low"        # read-only, reversible
    MEDIUM = "medium"  # writes, but recoverable
    HIGH = "high"      # irreversible, external effects

@dataclass
class AgentAction:
    tool: str
    args: dict
    risk: ActionRisk
    estimated_cost_usd: float = 0.0
    affects_external_systems: bool = False

CHECKPOINT_RULES = [
    # Always checkpoint high-risk
    lambda a: a.risk == ActionRisk.HIGH,
    # Checkpoint if external system affected
    lambda a: a.affects_external_systems,
    # Checkpoint if estimated cost above threshold
    lambda a: a.estimated_cost_usd > 5.0,
    # Checkpoint file deletions
    lambda a: a.tool == "delete_file",
    # Checkpoint emails/messages
    lambda a: a.tool in ("send_email", "post_message", "send_sms"),
]

def needs_checkpoint(action: AgentAction) -> bool:
    return any(rule(action) for rule in CHECKPOINT_RULES)
```

This rule-based approach is explicit and auditable. Avoid fuzzy heuristics like "ask the LLM if this action is risky" — you will get inconsistent results and no audit trail for the trigger decision itself.
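As a quick sanity check, the rules can be exercised directly. The definitions are repeated here (slightly condensed) so the snippet runs standalone; the example actions are hypothetical:

```python
from enum import Enum
from dataclasses import dataclass

class ActionRisk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

@dataclass
class AgentAction:
    tool: str
    args: dict
    risk: ActionRisk
    estimated_cost_usd: float = 0.0
    affects_external_systems: bool = False

CHECKPOINT_RULES = [
    lambda a: a.risk == ActionRisk.HIGH,
    lambda a: a.affects_external_systems,
    lambda a: a.estimated_cost_usd > 5.0,
    lambda a: a.tool == "delete_file",
    lambda a: a.tool in ("send_email", "post_message", "send_sms"),
]

def needs_checkpoint(action: AgentAction) -> bool:
    return any(rule(action) for rule in CHECKPOINT_RULES)

# A read-only search: no rule fires, so it runs without approval.
search = AgentAction(tool="search_docs", args={"q": "quarterly report"},
                     risk=ActionRisk.LOW)
# A file deletion: the tool-name rule fires even though risk is only MEDIUM.
delete = AgentAction(tool="delete_file", args={"path": "/tmp/scratch.csv"},
                     risk=ActionRisk.MEDIUM)
# A cheap-looking tool that crosses the cost threshold.
batch = AgentAction(tool="run_batch_job", args={}, risk=ActionRisk.LOW,
                    estimated_cost_usd=9.0)

print(needs_checkpoint(search))  # False
print(needs_checkpoint(delete))  # True
print(needs_checkpoint(batch))   # True
```

Note that rules are independent: an action is checkpointed if *any* rule fires, so a low-risk label never overrides a cost or tool-name rule.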
The Human-Readable Summary
A checkpoint is only useful if the human can understand what they are approving. Raw tool args ({"path": "/var/data/users.csv", "mode": "delete"}) are not enough.
```python
def generate_checkpoint_summary(action: AgentAction, context: dict) -> str:
    """Generate a plain-English summary for human review."""
    templates = {
        "delete_file": (
            "The agent wants to permanently delete the file at `{path}`. "
            "This cannot be undone. Current task: {task_summary}."
        ),
        "send_email": (
            "The agent wants to send an email to {to} with subject '{subject}'. "
            "Preview of body: {body_preview}. Current task: {task_summary}."
        ),
        "write_file": (
            "The agent wants to write to `{path}` ({size} bytes). "
            "Existing file will be overwritten. Current task: {task_summary}."
        ),
    }
    template = templates.get(
        action.tool,
        "The agent wants to call `{tool}` with args: {args}. Current task: {task_summary}."
    )
    # Merge into one dict so a key like "tool" or "args" inside action.args
    # cannot collide with the explicit keyword arguments (which would raise
    # a TypeError if both were passed to format() directly).
    fields = {
        **action.args,
        "tool": action.tool,
        "args": action.args,
        "task_summary": context.get("task_summary", "unknown"),
        "body_preview": str(action.args.get("body", ""))[:200],
    }
    return template.format(**fields)
```

Include the current task summary in every checkpoint message. Without it, reviewers have no context for why the action is being proposed.
Checkpoint UX Patterns
Inline approval: The agent pauses and the user responds in the same conversation thread. Works well for chat-based agents. Friction is low when done right.
Async queue: The agent queues the action and continues other work. A human reviews a batch of pending approvals. Works for non-blocking workflows where the human is not actively watching.
Email/SMS escalation: For high-stakes irreversible actions, send a notification with a one-click approve/reject link. Use when the stakes warrant the latency.
```python
import asyncio
from typing import Literal

CheckpointResponse = Literal["approved", "rejected", "timeout"]

async def await_human_approval(
    action: AgentAction,
    summary: str,
    notify_fn,  # e.g., send_slack, send_email
    timeout_sec: float = 300.0,
    timeout_policy: Literal["approved", "rejected"] = "rejected",
) -> CheckpointResponse:
    """Pause agent, notify human, wait for response."""
    approval_event = asyncio.Event()
    response: list[CheckpointResponse] = []

    def on_response(decision: CheckpointResponse):
        response.append(decision)
        approval_event.set()

    # Register response handler (webhook, websocket, polling);
    # register_checkpoint is provided by your delivery layer.
    checkpoint_id = register_checkpoint(action, on_response)
    await notify_fn(summary, checkpoint_id)
    try:
        await asyncio.wait_for(approval_event.wait(), timeout=timeout_sec)
        return response[0]
    except asyncio.TimeoutError:
        return timeout_policy
```

The `timeout_policy` parameter makes the reversibility decision explicit at the call site, not buried in a config file. Note that it is typed as `Literal["approved", "rejected"]`, not the full `CheckpointResponse`: a caller should never be able to configure "timeout" as the outcome of a timeout.
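The flow can be exercised end to end without a real webhook or chat integration. Here `register_checkpoint` and `human_responds` are hypothetical in-memory stand-ins for the delivery layer, and the short timeouts are for demonstration only:

```python
import asyncio

# Hypothetical in-memory delivery layer: checkpoint_id -> response callback.
_handlers = {}

def register_checkpoint(action, on_response):
    checkpoint_id = f"chk-{len(_handlers)}"
    _handlers[checkpoint_id] = on_response
    return checkpoint_id

def human_responds(checkpoint_id, decision):
    # In production this would be the webhook handler for the approve link.
    _handlers.pop(checkpoint_id)(decision)

async def await_human_approval(action, summary, notify_fn,
                               timeout_sec=300.0, timeout_policy="rejected"):
    approval_event = asyncio.Event()
    response = []

    def on_response(decision):
        response.append(decision)
        approval_event.set()

    checkpoint_id = register_checkpoint(action, on_response)
    await notify_fn(summary, checkpoint_id)
    try:
        await asyncio.wait_for(approval_event.wait(), timeout=timeout_sec)
        return response[0]
    except asyncio.TimeoutError:
        return timeout_policy

async def main():
    async def notify(summary, checkpoint_id):
        # Simulate the reviewer approving shortly after the notification.
        asyncio.get_running_loop().call_later(
            0.01, human_responds, checkpoint_id, "approved")

    approved = await await_human_approval(
        None, "Delete /tmp/scratch.csv?", notify, timeout_sec=1.0)

    async def silent_notify(summary, checkpoint_id):
        pass  # reviewer never responds; the timeout policy decides

    timed_out = await await_human_approval(
        None, "Delete /tmp/scratch.csv?", silent_notify,
        timeout_sec=0.05, timeout_policy="rejected")

    return approved, timed_out

results = asyncio.run(main())
print(results)  # ('approved', 'rejected')
```

The same coroutine serves both the inline and async-queue UX patterns; only `notify_fn` and the timeout change.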
Audit Trail
Every checkpoint decision — approve, reject, timeout — must be logged with who made the decision, when, and what summary they saw. This is non-negotiable for any agent operating on production systems.
```python
import json
from datetime import datetime, timezone
from pathlib import Path

class AuditLog:
    def __init__(self, log_path: str):
        self.path = Path(log_path)
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def record_checkpoint(
        self,
        task_id: str,
        action: AgentAction,
        summary: str,
        decision: str,
        reviewer: str,
        review_latency_sec: float,
    ):
        entry = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "task_id": task_id,
            "tool": action.tool,
            "risk": action.risk.value,
            "summary_shown": summary,
            "decision": decision,
            "reviewer": reviewer,
            "review_latency_sec": review_latency_sec,
        }
        with self.path.open("a") as f:
            f.write(json.dumps(entry) + "\n")
```

The `summary_shown` field is critical: you need to know exactly what the human saw when they made the decision, not just what the action was. If your summary generation has a bug, you can trace back to which approvals were made on bad summaries.
Avoiding Checkpoint Fatigue
The death of a good checkpoint system is alarm fatigue. If 70% of agent actions require approval, users will approve without reading. To prevent this:
- Tune triggers continuously — track approval rate. If >20% of actions hit checkpoints, tighten the trigger rules.
- Auto-approve patterns — actions the same user has approved 10+ times for the same tool/args pattern are candidates for a trusted-action allowlist.
- Batch low-risk checkpoints — instead of one notification per action, batch multiple low-risk pending actions into a single review.
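The trusted-action allowlist can be sketched as a small counter keyed on the approved pattern. The threshold and the keying choice (exact canonical args, per user) are assumptions; keying on exact args is the conservative option, since trusting only the tool name would silently extend trust to new targets:

```python
import json
from collections import Counter

class TrustTracker:
    """Auto-approve an action after the same user has approved the
    identical (tool, args) pattern `threshold` times."""

    def __init__(self, threshold: int = 10):
        self.threshold = threshold
        self._approvals: Counter = Counter()

    def _key(self, user: str, tool: str, args: dict) -> tuple:
        # Canonical JSON so dict key ordering does not split the count.
        return (user, tool, json.dumps(args, sort_keys=True, default=str))

    def record_approval(self, user: str, tool: str, args: dict) -> None:
        self._approvals[self._key(user, tool, args)] += 1

    def is_trusted(self, user: str, tool: str, args: dict) -> bool:
        return self._approvals[self._key(user, tool, args)] >= self.threshold

tracker = TrustTracker(threshold=3)
for _ in range(3):
    tracker.record_approval("alice", "write_file", {"path": "/tmp/report.md"})

# Identical pattern, same user: trusted.
print(tracker.is_trusted("alice", "write_file", {"path": "/tmp/report.md"}))  # True
# Different args: not trusted, even though the tool and user match.
print(tracker.is_trusted("alice", "write_file", {"path": "/etc/passwd"}))     # False
# Different user: trust does not transfer.
print(tracker.is_trusted("bob", "write_file", {"path": "/tmp/report.md"}))    # False
```

Trusted actions should still be written to the audit log with a decision like "auto_approved", so oversight is reduced, not removed.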
Key Takeaways
- Checkpoint trigger conditions should be explicit, rule-based, and auditable — not delegated to an LLM's judgment about riskiness.
- Auto-approve reversible actions on timeout; auto-reject irreversible actions on timeout — make this decision explicit at the call site.
- Every checkpoint notification must include the current task context, not just the raw action args.
- Log the exact summary the human saw when approving — not just the action — so you can audit decisions made on bad summaries.
- Monitor checkpoint approval rate: if >20% of actions require approval, trigger rules are miscalibrated and fatigue will degrade review quality.
- A trusted-action allowlist (auto-approve repeated identical patterns) is a safe way to reduce friction without removing oversight.