The four failure modes of AI agents in production
Hallucinated tool calls, scope creep at runtime, cost runaway, audit-trail loss. Each one has a boundary condition. Each one has a specific mitigation. Here's the operator's read on what actually breaks when an agent ships.
What actually breaks when an agent ships?
Four things, in order of how often we see them in production audits.
Most agent failures are not catastrophic. They're cumulative. A small percentage of sessions go sideways, the operator doesn't notice for a few weeks, and the cost — in money, in reputation, in misrouted decisions — compounds quietly. The fixes are mostly architectural rather than reactive. Knowing which failure mode is yours is most of the work.
Failure mode 1 — hallucinated tool calls
The agent calls a tool that doesn't exist, or calls a tool that does exist with arguments the tool can't accept.
What it looks like: the agent's reasoning trace shows "I'll call lookup_customer_history(customer_id='ACME-2024')" — but lookup_customer_history was never registered in the agent's tool schema, or customer_id is supposed to be a UUID rather than a string. The runtime returns an error, the agent retries with a different invocation that's also wrong, and the session either burns through its tool-call budget or returns an unhelpful response to the user.
Why it happens: the model's tool-use behavior generalizes from its training examples. When the schema you give it at runtime resembles patterns from training, it fills in plausibly named tools or plausibly shaped arguments that don't match your specific deployment.
The mitigation: tight tool-schema validation. Every registered tool emits a strict JSON schema. The runtime rejects any tool call where the name isn't in the registered set, or the arguments don't validate against the schema. The rejection returns to the agent as an explicit error with the registered tool list re-included in context, so the agent's next call can correct the mistake.
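A minimal sketch of that validation layer, assuming a registry keyed by tool name and the open-source `jsonschema` library; the tool name and schema here are hypothetical:

```python
from jsonschema import FormatChecker, ValidationError, validate

# Hypothetical registry: every tool the agent may call, with a strict schema.
TOOL_REGISTRY = {
    "lookup_customer_history": {
        "type": "object",
        "properties": {"customer_id": {"type": "string", "format": "uuid"}},
        "required": ["customer_id"],
        "additionalProperties": False,
    },
}

def validate_tool_call(name: str, args: dict) -> str | None:
    """Return None if the call is valid, else an error string for the agent."""
    if name not in TOOL_REGISTRY:
        return f"Unknown tool '{name}'. Registered tools: {sorted(TOOL_REGISTRY)}"
    try:
        validate(instance=args, schema=TOOL_REGISTRY[name],
                 format_checker=FormatChecker())
    except ValidationError as exc:
        return f"Invalid arguments for '{name}': {exc.message}"
    return None
```

The returned error string is what goes back into the agent's context, registered tool list included, so the next call can self-correct.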
This is not an exotic safeguard. It's the default in the OpenAI Agent SDK and the Anthropic Agent SDK. Operators who built agents directly against the foundation-model APIs (without an SDK abstraction) often skip it. The fix takes a few hours and reduces hallucinated tool calls by approximately 90%.
Failure mode 2 — scope creep at runtime
Given a vague goal, the agent does more than the operator expected.
What it looks like: the operator scoped the agent to "draft a first-pass response to inbound sales inquiries." The agent in production reads an inquiry, drafts a response, then notices the inquiry asks for a custom integration. Without permission, the agent calls the CRM tool to update the lead's stage, calls the calendar tool to suggest a meeting, calls the email tool to send the draft, and adds a note in the CRM about the custom integration that misrepresents what the firm actually offers. Four tool calls beyond the goal. One factual error. The operator finds out a week later when the prospect calls confused.
Why it happens: the agent's reasoning loop optimizes for task completion against the goal it inferred. If the goal is vague, it infers a broader goal than the operator intended and takes more actions to hit the broader goal. The actions feel helpful in isolation; they're scope-violating in aggregate.
The mitigation: sharper goal definitions plus explicit tool-call budgets. The goal becomes "draft a first-pass response. Do not send it. Do not update CRM stage. Do not schedule meetings. The output is a draft email, returned for human review." The tool-call budget caps total actions per session at a defined number — usually 3 to 5 for narrow agent jobs.
Operators consistently underestimate how much the goal definition matters. A goal stated in five words produces a different behavioral envelope than the same goal stated in two paragraphs with explicit refusals.
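As a concrete illustration, here is what the longer form might look like for the sales-inquiry agent above, written as a Python constant; the wording is a hypothetical sketch, not a template from any SDK:

```python
# Hypothetical goal definition for the inbound-sales drafting agent.
# Each refusal is explicit, which narrows the behavioral envelope.
GOAL_DEFINITION = """\
Draft a first-pass response to the inbound sales inquiry below.
Do not send the draft. Do not update the CRM lead stage.
Do not schedule or suggest meetings. Do not add CRM notes.
If the inquiry asks about capabilities you are not certain the firm
offers, say so in the draft rather than asserting an answer.
The output is a single draft email, returned for human review.
"""

MAX_TOOL_CALLS_PER_SESSION = 4  # within the 3-to-5 range for narrow jobs
```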
Failure mode 3 — cost runaway
A single agent session runs hundreds of tool calls and costs dollars.
What it looks like: the operator's monthly bill for the agent comes in at 4x the budgeted number. Investigation shows the runaway came from 12 specific sessions over the month, each of which entered a reasoning loop where the agent kept calling tools, getting partial answers, calling more tools, never converging on a terminal answer. Each of those 12 sessions cost $4 to $12 in API tokens; the rest of the sessions cost cents.
Why it happens: without explicit termination conditions, the agent's reasoning loop has no reason to stop. The model is trained to continue trying when it's not confident in the answer. Without a hard cap on tool calls per session, total tokens per session, or wall-clock seconds per session, the loop runs until it hits some implicit limit.
The mitigation: three caps, all hard.
- Tool-call cap per session. A defined integer (typically 5 to 25 depending on task complexity). Once the agent crosses the cap, the runtime returns the agent's best answer so far and terminates the session.
- Token cap per session. A defined integer (typically 50K to 200K). Once the session crosses the token cap, the runtime terminates it.
- Wall-clock cap per session. Typically 120 to 300 seconds. Once the session crosses the time cap, the runtime terminates it.
These caps are non-negotiable for production deployments. The cost of a 5%-failure-rate agent without caps is unbounded. The cost with caps is upper-bounded at (cap × cost-per-call × session-count). The bound is what makes agent budgeting possible.
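A sketch of how the three caps might sit in the runtime loop; `agent_step`, `is_terminal`, and `best_answer_so_far` are hypothetical stand-ins for whatever the SDK exposes, and the cap values fall inside the ranges above:

```python
import time

MAX_TOOL_CALLS = 15      # within the 5-to-25 range above
MAX_TOKENS = 100_000     # within the 50K-to-200K range above
MAX_WALL_SECONDS = 180   # within the 120-to-300 range above

def run_session(agent_step, is_terminal, state):
    """Run the agent loop until it converges or any hard cap trips.

    Returns (answer, cap_hit); cap_hit is None when the session
    terminated on its own.
    """
    started = time.monotonic()
    tool_calls, tokens = 0, 0
    while not is_terminal(state):
        if tool_calls >= MAX_TOOL_CALLS:
            return state.best_answer_so_far, "tool_call_cap"
        if tokens >= MAX_TOKENS:
            return state.best_answer_so_far, "token_cap"
        if time.monotonic() - started >= MAX_WALL_SECONDS:
            return state.best_answer_so_far, "wall_clock_cap"
        state, step_tokens = agent_step(state)  # one reasoning + tool-call step
        tool_calls += 1
        tokens += step_tokens
    return state.best_answer_so_far, None
```

The `cap_hit` reason is worth logging per session; the monitoring move at the end of this piece depends on it.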
Operators who built agents in 2024 without caps and were burned by the bill usually deploy their 2026 agents with all three. Operators new to agent deployment often skip them and discover the math the hard way.
Failure mode 4 — audit-trail loss
The agent took an action; nobody can reconstruct why.
What it looks like: an operator gets a question from a customer about why their account was credited $400 by an automated agent. The operator pulls up the agent's logs and finds the prompt, the tool calls, and the response, but not the reasoning trace between them. The agent decided the credit was warranted; the chain of evidence it used is gone. The operator can't defend the decision because they can't reproduce the decision-making process.
Why it happens: agent runtimes vary in what they log by default. Many log only the final response and the sequence of tool calls, not the model's full chain-of-thought. When the operator needs to debug or defend a specific decision, the missing trace is the gap.
The mitigation: structured logging with three required fields per agent action.
- Prompt (verbatim, including the system prompt and tool schemas)
- Tool calls (each call with arguments, response, and timestamp)
- Reasoning trace (the agent's internal reasoning between tool calls, captured if the model exposes it; otherwise the model's intermediate thinking blocks per the SDK's API)
The logs need to be queryable by session ID, timestamp range, and (where applicable) user ID or customer ID. They need retention long enough to cover the longest dispute window in the operator's category — for regulated decisions, that may be 7 years. For unregulated decisions, 90 days is usually enough.
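One way to shape the record, assuming one entry per agent action written to whatever store the operator already queries; the field names are illustrative:

```python
from dataclasses import dataclass, field
import time
import uuid

@dataclass
class AgentActionLog:
    """One record per agent action, covering the three required fields."""
    session_id: str
    prompt: str             # verbatim, including system prompt and tool schemas
    tool_calls: list[dict]  # each: {"name", "arguments", "response", "timestamp"}
    reasoning_trace: str    # chain-of-thought or intermediate thinking blocks
    user_id: str | None = None
    timestamp: float = field(default_factory=time.time)
    record_id: str = field(default_factory=lambda: str(uuid.uuid4()))
```

Index on `session_id`, `timestamp`, and `user_id` and the queryability requirement above falls out for free.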
The cost of structured logging is small. The cost of not having it surfaces only when something goes wrong, by which time the trace is gone. Operators who experience a single audit-trail-loss incident usually retrofit logging immediately. Operators who haven't experienced one yet often skip it.
Where do these modes intersect?
The four are not independent. They compound.
A hallucinated tool call (mode 1) can trigger scope creep (mode 2) when the agent retries with a different tool that does exist but is broader than the operator intended. Scope creep can trigger cost runaway (mode 3) when the broader scope generates more tool calls. Cost runaway can trigger audit-trail loss (mode 4) when the runtime truncates logs to manage storage of sessions that ran too long.
The compounding is why the four mitigations are usually deployed together rather than picked individually. Tight tool schemas + sharp goal definitions + hard caps + structured logging is one architectural commitment, not four separate ones.
What does the operator do this week?
Three concrete moves before deploying any agent past the prototype phase.
Move 1 — audit your tool schema validation. If you built directly against the foundation-model APIs without an SDK, your tool schemas may be advisory rather than enforced. The fix is to wrap the runtime with a validator that rejects malformed tool calls explicitly.
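Building on the validator sketched under failure mode 1, the wrap might look like this; `execute_tool` is a hypothetical stand-in for the operator's actual dispatch function:

```python
def dispatch_tool_call(name: str, args: dict) -> dict:
    """Validate before executing; hand rejections back to the agent verbatim."""
    error = validate_tool_call(name, args)  # from the failure-mode-1 sketch
    if error is not None:
        # The explicit error, registered tool list included, goes back into
        # the agent's context so its next call can correct the mistake.
        return {"role": "tool", "content": error}
    return {"role": "tool", "content": execute_tool(name, args)}
```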
Move 2 — write the goal definition in a paragraph, with explicit refusals. "Do X. Do not do Y. Do not do Z. The output is W." Five sentences minimum. The longer goal definition shrinks the behavioral envelope.
Move 3 — set the three caps, with monitoring. Tool-call cap, token cap, wall-clock cap. Monitor the percentage of sessions that hit each cap. If more than 5% of sessions hit any cap, revisit the agent's design: frequent cap hits mean the agent isn't converging cleanly.
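A quick way to compute those percentages from session logs, assuming each session record carries the `cap_hit` reason from the loop sketched under failure mode 3 (None when the session converged on its own):

```python
from collections import Counter

def cap_hit_rates(sessions: list[dict]) -> dict[str, float]:
    """Percentage of sessions that hit each cap."""
    if not sessions:
        return {}
    hits = Counter(s["cap_hit"] for s in sessions if s["cap_hit"] is not None)
    return {cap: 100 * count / len(sessions) for cap, count in hits.items()}

# Any value above 5.0 is the signal to revisit the agent's design.
```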
The full decision tree on whether to ship an agent at all lives at "does my company actually need an AI agent." The shape comparison against alternatives lives at "AI agents vs automations vs internal tools."
Where to go from here
- The decision tree: does my company actually need an AI agent.
- The shape comparison: AI agents vs automations vs internal tools.
- The readiness check: AI readiness checklist for operators past PMF.
- Or just request the audit: /audit. The four failure modes are mitigatable. The decision to ship the agent at all is what the audit answers first.