When LLMs can call tools that change real state, the design decisions that matter most are about what's gated, what's automatic, and what triggers a human checkpoint.
We've been shipping LLM agents for about a year — workers that can call tools to read and write real state. The interesting design decisions aren't about the LLM at all; they're about what tools to expose, how to gate writes, and where to require a human checkpoint. This post is what we've landed on after a few near-misses and one real incident.
An LLM agent gets a task ("update the customer's mailing address"), decides which tool to call, and the tool changes something. The model isn't deterministic; the task isn't always well-specified; the input might be adversarial. The question is: what guardrails make this safe enough to ship?
Two failure modes we've seen:

- Calling the wrong tool: delete_customer instead of archive_customer.
- Calling the right tool with the wrong arguments: the correct operation pointed at the wrong record (a real example below).

Both are real. Both happen in a low percentage of cases, but at scale "low percentage" is real customer impact. The design problem is making the cost of these failures low enough to live with.
The single most important design decision: separate read tools from write tools and gate them differently.
Read tools (get_customer_profile, list_recent_orders, search_knowledge_base) — we expose freely. The model can call them as often as it wants. The blast radius is read-only; the worst case is some wasted tokens.
Write tools (update_address, cancel_subscription, issue_refund) — gated. Different model, different prompt, different invocation flow, different audit trail. Sometimes a different process entirely.
We had read+write tools in the same prompt early on. The model was happy to call writes for tasks where it should have just read. Splitting changed agent behavior dramatically — calling writes became deliberate, not incidental.
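A minimal sketch of the split, assuming a simple in-process dispatcher; the registries and the write_approved flag are hypothetical, not a real framework API:

```python
# Hypothetical tool registries; real implementations are stubbed out.
READ_TOOLS = {
    "get_customer_profile": lambda args: {"name": "stub"},
    "list_recent_orders": lambda args: [],
}

WRITE_TOOLS = {
    "update_address": lambda args: {"status": "updated"},
}

def dispatch(tool_name: str, args: dict, write_approved: bool = False):
    if tool_name in READ_TOOLS:
        # Reads are ungated: the worst case is wasted tokens.
        return READ_TOOLS[tool_name](args)
    if tool_name in WRITE_TOOLS:
        # Writes take a separate, gated path with its own audit trail.
        if not write_approved:
            raise PermissionError(f"{tool_name} requires the gated write flow")
        return WRITE_TOOLS[tool_name](args)
    raise ValueError(f"unknown tool: {tool_name}")
```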
Three patterns that earn their place:
Two-step: propose then confirm. The agent runs the write tool in "propose" mode, which returns a description of what would change. A human (or a second agent with a separate prompt) confirms before the actual write executes. For most consequential writes we use this. Adds latency; adds safety.
Dry-run by default. Some tools default to dry-run mode. The agent has to explicitly pass confirm=true to actually execute. The default makes the "I'm not sure" path safe; the explicit confirmation makes the "I'm sure" path deliberate.
Read-only impersonation in test mode. For some internal-only tools, we run the agent against a read-only mirror of production data, only swapping to live writes after we've validated the agent's behavior. The mirror is updated nightly; agents that read it can't affect production state.
The right pattern depends on the write's reversibility. Adding a tag to a record — fine, just do it. Issuing a $500 refund — propose, confirm, then execute. A sketch of the latter follows.
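Here's a minimal sketch of the first two patterns on one tool: propose-then-confirm, with dry-run as the default. issue_refund and its return shape are illustrative, not a real API:

```python
def issue_refund(order_id: str, amount: float, confirm: bool = False) -> dict:
    """Dry-run by default: without confirm=True, the tool only proposes."""
    description = f"Refund ${amount:.2f} on order {order_id}"
    if not confirm:
        # Step 1: propose. A human or a safety-review agent sees this first.
        return {"mode": "proposal", "would_do": description}
    # Step 2: execute, only after explicit confirmation upstream.
    # (The real payments call would go here.)
    return {"mode": "executed", "did": description}

print(issue_refund("ord_123", 500.00))
# {'mode': 'proposal', 'would_do': 'Refund $500.00 on order ord_123'}
print(issue_refund("ord_123", 500.00, confirm=True))
# {'mode': 'executed', 'did': 'Refund $500.00 on order ord_123'}
```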
Once you've decided writes need confirmation, the question is: who?
Two-step flow #1: agent → user → agent. The agent proposes a write to the user via the UI ("I'm about to cancel your subscription, confirm?"). User clicks confirm. Agent executes. Good for direct user-driven flows; not useful for automated backend agents.
Two-step flow #2: agent → second agent → execution. The second agent has a different prompt focused on safety review. Cheaper than human review at scale. Quality is OK for simple "does this look like a normal operation?" checks; not enough for high-stakes operations.
Two-step flow #3: agent → on-call human → execution. For genuinely high-stakes operations, a human in the loop. We use this for anything that touches money or destroys data.
We layer these. A typical agent run for a non-trivial task: agent proposes → safety-review agent checks against policy → if green, execute; if questionable, page on-call.
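In sketch form, with every function a hypothetical stub standing in for a real service:

```python
def safety_review_agent(proposal: dict) -> str:
    """Separate prompt, review-only. Returns 'green', 'questionable', or 'red'."""
    return "green"  # stub

def execute(proposal: dict) -> str:
    return f"executed: {proposal['would_do']}"  # stub

def page_on_call(proposal: dict) -> None:
    print(f"paging on-call: {proposal['would_do']}")  # stub

def run_consequential_write(proposal: dict) -> str:
    verdict = safety_review_agent(proposal)
    if verdict == "green":
        return execute(proposal)   # routine case: no human involved
    if verdict == "questionable":
        page_on_call(proposal)     # a human makes the final call
        return "escalated"
    return "rejected"              # hard no from the reviewer
```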
Vague tool descriptions give vague behavior. Specific schemas with examples in the description give predictable behavior.
Example of a bad schema:
```json
{
  "name": "update_user",
  "description": "Updates the user record.",
  "parameters": {"user_id": "string", "data": "object"}
}
```
Example of a good schema:
```json
{
  "name": "update_user_address",
  "description": "Updates only the mailing address on a user record. Use this for moving-address requests. Do NOT use this to update email, phone, or name — those have separate tools.",
  "parameters": {
    "user_id": {"type": "string", "description": "The internal user ID (not their email)"},
    "address": {
      "type": "object",
      "properties": {
        "street": {"type": "string"},
        "city": {"type": "string"},
        "postal_code": {"type": "string"},
        "country": {"type": "string", "description": "ISO 3166-1 alpha-2 country code"}
      },
      "required": ["street", "city", "country"]
    }
  }
}
```
The narrower the tool's scope, the easier it is for the model to use correctly. We've moved from a few broad tools to many narrow tools. "Update the user" became "update address", "update email", "update phone", "update name" — four tools instead of one. Each individually is easier for the model to call right.
Tool implementations validate every argument server-side. Don't trust the schema to constrain the model's output. We've seen:

- emails passed where internal user IDs were expected
- country names where ISO 3166 codes were required
- extra fields the schema never defined

For each, the validation rejects the call with a structured error. The agent often handles the error gracefully — it retries with a fix or asks the user for clarification. But the safety boundary is the server-side validation, not the model's compliance.
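A sketch of what that boundary can look like for the update_user_address tool above; the error shape and the email heuristic are assumptions, not our exact implementation:

```python
import re

class ToolValidationError(Exception):
    """Carries a structured payload the agent can read and recover from."""
    def __init__(self, field: str, message: str):
        super().__init__(message)
        self.payload = {"error": "validation_failed", "field": field, "message": message}

def update_user_address(user_id: str, address: dict) -> dict:
    # Re-check everything server-side; the schema is documentation, not a boundary.
    if "@" in user_id:
        raise ToolValidationError("user_id", "expected an internal user ID, not an email")
    if not re.fullmatch(r"[A-Z]{2}", address.get("country", "")):
        raise ToolValidationError("address.country", "expected an ISO 3166-1 alpha-2 code")
    for field in ("street", "city"):
        if not address.get(field):
            raise ToolValidationError(f"address.{field}", "required field is missing")
    # ... perform the actual write here ...
    return {"status": "updated"}
```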
Every tool invocation logs:

- the tool name and the full arguments
- the result, or the structured error
- a timestamp and the agent run that made the call

Why: when something goes wrong, the only way to debug is to see exactly what was called. We've reconstructed incidents weeks later from these logs. They live in S3 with one-year retention; expensive, but worth it.
We also log the LLM's reasoning trace (the <thinking> content or equivalent) for the most consequential operations. Reading "the model thought X about the user's intent" is sometimes the only way to understand why a wrong tool was called.
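A sketch of the record, one JSON object per invocation; the exact field names are illustrative:

```python
import json
import time
import uuid

def log_tool_call(run_id: str, tool: str, args: dict,
                  result=None, error=None, reasoning=None) -> None:
    record = {
        "call_id": str(uuid.uuid4()),
        "run_id": run_id,        # which agent run made the call
        "tool": tool,
        "arguments": args,       # the full arguments, exactly as received
        "result": result,
        "error": error,          # the structured error, if the call was rejected
        "reasoning": reasoning,  # <thinking> trace, consequential ops only
        "ts": time.time(),
    }
    print(json.dumps(record, default=str))  # stand-in for shipping to S3
```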
A real incident, anonymized: the agent had a complete_task tool that marked a customer's task as done. The agent decided a task was complete (correctly), but called the tool with the wrong task ID — one that referred to a different customer's task. The wrong task got marked done.
What we changed:

- complete_task now requires both task_id and a sanity-check field (the customer's email). Mismatch → rejected (sketched below).

None of these are clever. They're boring engineering. The boring engineering is what makes agents safe to ship.
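The cross-check itself is exactly that kind of boring; a sketch with a hypothetical in-memory task store:

```python
TASKS = {"task_42": {"customer_email": "jane@example.com", "done": False}}

def complete_task(task_id: str, customer_email: str) -> dict:
    task = TASKS.get(task_id)
    if task is None:
        return {"error": "not_found", "task_id": task_id}
    if task["customer_email"] != customer_email:
        # The two identifiers disagree: reject rather than guess.
        return {"error": "mismatch",
                "message": "task_id does not belong to this customer"}
    task["done"] = True
    return {"status": "completed", "task_id": task_id}
```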
A few patterns we've considered and skipped:
Letting agents create new tools. Some frameworks support agents writing their own code, then executing it. We don't. The blast radius is too large and we can't audit it.
Agents calling agents recursively. Same reason — unbounded recursion, unbounded cost, unbounded blast radius. We allow at most one level of agent-to-agent calls.
Tools that take freeform SQL or code as arguments. The model loves writing creative SQL. Some of it would be fine; some would not. We expose query builders with constrained parameters instead; a sketch follows this list.
Skipping the confirmation step "because the user is impatient." Speed isn't worth the failure modes. If users find confirmations annoying, that's a UX problem to fix differently — not by removing the safety layer.
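As promised, a sketch of a constrained query builder; the field and operator whitelists are illustrative, and values are always parameterized rather than interpolated:

```python
ALLOWED_FIELDS = {"status", "country", "created_at"}
ALLOWED_OPS = {"eq": "=", "gte": ">=", "lte": "<="}

def build_query(filters: list) -> tuple:
    clauses, params = [], []
    for f in filters:
        if f["field"] not in ALLOWED_FIELDS or f["op"] not in ALLOWED_OPS:
            raise ValueError(f"filter not allowed: {f}")
        clauses.append(f"{f['field']} {ALLOWED_OPS[f['op']]} %s")
        params.append(f["value"])
    sql = "SELECT id, status FROM orders WHERE " + " AND ".join(clauses)
    return sql, params  # the driver binds params; the model never writes SQL

sql, params = build_query([{"field": "status", "op": "eq", "value": "open"}])
# ('SELECT id, status FROM orders WHERE status = %s', ['open'])
```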
- Read tools wide open; write tools heavily gated. The most important design decision.
- Many narrow tools, not few broad ones. Easier for the model to pick correctly.
- Server-side argument validation, always. The schema is documentation, not a security boundary.
- Log every call. When something goes wrong (it will), the logs are how you debug.
- Two-step for consequential writes. Propose, confirm, execute. The latency cost is small; the safety improvement is large.
- Tabletop the failure modes. Walk through what happens if the agent calls each tool wrong. The exercise reveals which tools need more gating.
Agent tool design is mostly software engineering, not ML. Once you have those gates in place, the LLM at the core matters less than you'd think — the safety comes from the surrounding system. The teams that ship agents responsibly are the ones who treat tool-use as a security boundary, not a convenience.