When Budget Runs Out: AI Agent Degradation Patterns

Part of: AI Agent Risk & Blast Radius Reference — the full pillar covering action authority, risk scoring, blast-radius containment, and degradation paths.

Runtime enforcement solves one problem and creates another. Once your guardrails can block an agent action before execution, you need to decide what the agent should do when it gets blocked.

A hard stop — "budget exceeded, goodbye" — is better than a runaway agent. But it's not a good user experience. The agent was in the middle of something. The user was waiting for a result. A bare error message doesn't help either of them.

And budget isn't the only reason an agent gets blocked. Runtime authority enforces both cost limits (you've spent your $10) and risk limits (you've used your 3 allowed send_email() calls, or your RISK_POINTS budget is exhausted). The DENY is the same signal. The recovery path is different.

This post covers five patterns for handling enforcement decisions gracefully — whether the trigger is cost, risk, or both, and whether the response is a full DENY, a constrained ALLOW_WITH_CAPS, or a soft signal that limits are approaching. The goal is the same in every case: the agent completes as much useful work as possible within the boundaries it's given.

What Frameworks Do Today

Before covering the patterns, it's worth understanding the current state of the art. Frameworks are getting better at detecting limits and intercepting stops, but graceful degradation is still mostly application logic.

LangGraph raises a GRAPH_RECURSION_LIMIT error when the agent hits its iteration cap — a hard stop by default. LangChain's newer middleware surface adds prebuilt components for summarization, model fallback, model-call limits, tool-call limits, and human-in-the-loop approval. These are real building blocks, but assembling them into a coherent degradation strategy per agent is still application logic.

CrewAI is the closest to graceful degradation built-in. When an agent hits max_iter (default 20), it's forced to provide its best answer rather than crash. CrewAI also auto-summarizes context when the token window overflows — genuine capability narrowing at the framework level.

OpenAI Agents SDK raises MaxTurnsExceeded when the run exceeds the max_turns you configure. But it offers an error_handlers mechanism where you can return a controlled final output instead of raising the exception — the infrastructure for Pattern 5, if you wire it up yourself.

Claude Agent SDK has first-class session budget controls via max_budget_usd and returns structured result messages, which makes graceful termination easier to handle than a generic crash. Anthropic's engineering blog also recommends checkpointing with progress files so agents can resume from where they stopped rather than restart from scratch.

The common gap: Frameworks provide pieces — fallback, summarization, iteration or tool-call limits, error handlers — but no unified degradation policy model that ties them together. That's what the five patterns below address.

The Three Signals Your Agent Receives

Every pattern below depends on the three enforcement decisions an agent can receive:

Decision	Triggered by	Meaning	Agent should...
ALLOW	—	Proceed normally	Execute the planned action
ALLOW_WITH_CAPS	Cost, risk, or both	Proceed, but with constraints	Adapt — cheaper model (cost), narrower tools (risk), or both
DENY	Cost, risk, or both	Action not permitted	Stop, degrade, queue, or inform the user

The binary allow/deny model forces hard stops. The three-way model gives agents room to adapt. Most of the patterns below depend on ALLOW_WITH_CAPS — the middle ground where enforcement narrows what the agent can do without killing the session entirely.

Pattern 1: Model Downgrade

When to use: Budget is running low but the task can still be completed with a cheaper model.

The simplest degradation path — and the one with the most existing framework support. LangChain's fallback middleware provides built-in model fallback components for this. When the agent receives ALLOW_WITH_CAPS or detects that remaining budget is thin, it switches from a high-capability model to a cheaper one.

The fallback chain:

GPT-4o → GPT-4o-mini → cached/partial response
Claude Opus → Claude Sonnet → Claude Haiku

How it works in practice:

Agent requests a reservation for the next LLM call
Enforcement returns ALLOW_WITH_CAPS with a reduced budget ceiling
Agent checks whether the preferred model fits within the cap
If not, it walks down the fallback chain until it finds a model that fits
If no model fits, it falls through to Pattern 4 or 5

Trade-offs:

Users get a result instead of an error
Output quality drops — summarization may be less nuanced, code generation less precise
Requires designing and testing fallback chains per use case
Some tasks genuinely need a high-capability model and can't degrade meaningfully

Watch out for the context window trap. If the agent has accumulated a large context (150K+ tokens of conversation history and retrieved documents), downgrading to a model with a smaller effective context window — or one that handles long context less well — can cause a hard failure or severe quality drop. Model downgrade often requires a truncation or summarization step before the fallback call, not just a model swap.

When it doesn't work: Tasks where the cheaper model produces wrong answers rather than less polished ones. A legal document review that needs Opus-level reasoning shouldn't silently fall back to Haiku.

Pattern 2: Capability Narrowing

When to use: The agent has multiple tools, and enforcement progressively restricts which ones it can use. This is the primary pattern for risk-driven enforcement — it's triggered by RISK_POINTS consumption, not just dollar spend.

This is the progressive capability narrowing pattern. As the agent consumes its RISK_POINTS budget — where high-blast-radius actions like send_email (20 points) or deploy (50 points) cost far more than read-only operations — the enforcement layer returns ALLOW_WITH_CAPS with increasingly restrictive tool_denylist or tool_allowlist constraints.

Example policy progression:

Budget consumed	Decision	Caps	Agent can still...
0-50%	ALLOW	—	Full tool access
50-80%	ALLOW_WITH_CAPS	`tool_denylist: [send_email, deploy, delete_record]`	Read, search, generate, write drafts
80-100%	ALLOW_WITH_CAPS	`tool_allowlist: [search, read_file, summarize]`	Read-only operations
100%	DENY	—	Must stop or inform user

What the agent does at each stage:

Full access: Normal operation
High-risk tools denied: Agent continues but skips actions that send external messages, mutate data, or trigger deployments. It can still research, draft, and prepare outputs for human review.
Read-only mode: Agent can answer questions, retrieve information, and summarize — but can't take any action with side effects.
DENY: Graceful stop (see Pattern 5).

Trade-offs:

The agent degrades gracefully instead of hard-stopping
It can still complete useful work — reading files, running searches, generating summaries — while dangerous capabilities are removed
Requires defining risk tiers for your tools (see risk assessment guide)
Users may not realize the agent is operating in a constrained mode unless you tell them
Critical: the agent must know why a tool was denied. If the enforcement layer returns a generic "tool execution failed" error, a persistent agent will retry the tool in an expensive loop or hallucinate alternatives. The DENY or ALLOW_WITH_CAPS response should include context the LLM can reason about — e.g., "send_email capability revoked due to risk limits. Proceed with read-only operations." — so the agent changes strategy instead of retrying blindly

Pattern 3: Queue and Defer

When to use: The action is important but not urgent, and can wait for a new budget window.

Not every DENY needs to be a dead end. Some actions — batch processing, scheduled reports, non-urgent notifications — can be queued for later execution when budget replenishes.

How it works:

Agent receives DENY for the current action
Agent checks whether the action type is deferrable (based on priority or configuration)
If deferrable: agent queues the action with metadata (what to do, for whom, estimated cost)
Agent confirms to the user: "I've queued this report for processing. It'll run when the next budget window opens."
A scheduler or background worker picks up the action when budget is available

Where this fits naturally:

Nightly batch summarizations
Scheduled data exports
Non-urgent email drafts (queue for human review instead of sending immediately)
Research tasks that can run during off-peak hours

Where it doesn't fit:

Real-time user interactions where the user is waiting for a response
Time-sensitive actions (incident response, live customer support)
Actions with external deadlines

Trade-offs:

Converts a hard failure into a delayed success
Requires a queueing mechanism and budget-aware scheduler
State rehydration is hard. Deferring a simple task (send this report) is straightforward. Deferring an agent mid-reasoning-chain requires serializing its memory, scratchpad, and graph state, storing it durably, and rehydrating it later. Realistically this requires a durable execution engine (like Temporal or Inngest) or a robust state-store design — making this the heaviest pattern to implement from scratch
Users need to understand that "deferred" doesn't mean "lost"
Budget windows must be well-defined and communicated

Pattern 4: Partial Completion with Summary

When to use: The agent is mid-task when it hits a budget boundary, and some work is already done.

The worst user experience isn't a DENY at the start — it's a DENY in the middle, with no indication of what was accomplished. This pattern ensures the agent saves its progress and tells the user what it completed and what remains.

How it works:

Agent is executing a multi-step task (e.g., reviewing 10 files, processing 50 records)
Mid-task, a reservation returns DENY or ALLOW_WITH_CAPS with max_steps_remaining: 1
Agent uses its final allowed step to:
- Save intermediate results (drafts, partial analyses, processed records)
- Generate a summary of work completed vs. work remaining
- Provide a clear handoff: "I reviewed 6 of 10 files. Here are my findings so far. The remaining 4 files are: ..."

Example agent response:

"I've analyzed 6 of the 10 pull requests you asked about. Here's what I found so far:
PR #142: Approved — minor style changes only
PR #145: Needs review — changes to auth middleware
[4 more results]
I wasn't able to review PRs #149, #150, #151, and #153 — session budget reached its limit. You can continue in a new session or I can queue the remaining reviews."

Trade-offs:

Users get partial value instead of nothing
Clear handoff reduces frustration and enables manual follow-up
Requires the agent to track its own progress and be able to summarize on demand
The "summarize what you've done" step itself costs tokens — budget enough for it

Anthropic's engineering team recommends a similar approach for long-running agents: maintain a progress file that logs completed actions, so new sessions can pick up where the previous one stopped rather than restarting from scratch.

Design tip: Reserve a small budget buffer (~5% of total) specifically for the summarization step. If the agent uses 100% of budget on work and has nothing left for the summary, the user gets the worst of both worlds: incomplete work and no explanation.

Pattern 5: Inform and Stop

When to use: The action cannot be degraded, deferred, or partially completed. The only option is a clean stop.

This is the last resort — but it should still be designed, not accidental. This is error handling for humans, not just for logs. A well-crafted stop message is dramatically better than an unhandled error or a silent timeout.

What a good stop looks like:

Bad	Better
`Error: BUDGET_EXCEEDED`	"I've reached the session budget limit. Here's a summary of what I completed. To continue, you can start a new session."
`500 Internal Server Error`	"I'm not able to complete this request right now. Your data is saved and nothing was sent externally."
Silent hang	"This action requires more budget than currently available. I've queued it for your review."

Key elements of a good stop message:

What happened — be specific (budget limit, not "something went wrong")
What was saved — reassure the user that no work was lost
What wasn't done — so they know what to follow up on
What they can do next — start a new session, wait for budget refresh, escalate

Trade-offs:

Clean user experience instead of a cryptic error
Requires designing stop messages per agent type (internal-facing vs. customer-facing)
Users may not understand why the agent stopped if the message is too vague
Over-explaining internal mechanics (budgets, RISK_POINTS) confuses end users

For customer-facing agents, the message should never expose internal budget mechanics. Instead:

"I'm not able to look up your order status right now. Our systems are experiencing temporary limits. Please try again in a few minutes, or I can connect you with a human agent."

The user doesn't need to know about RISK_POINTS or budget hierarchies. They need to know their request isn't lost and there's a next step.

Choosing the Right Pattern

Not every DENY needs the same response. The right pattern depends on the action, the user, and the context:

Scenario	Recommended pattern
Budget low, task can use cheaper model	Pattern 1: Model downgrade
Budget low, agent has multiple tools	Pattern 2: Capability narrowing
Action denied but not time-sensitive	Pattern 3: Queue and defer
Agent mid-task when budget hits	Pattern 4: Partial completion + summary
Hard DENY, no degradation possible	Pattern 5: Inform and stop
Budget low + mid-task + deferrable remainder	Combine 1 + 4 + 3: Downgrade model, complete what you can, queue the rest

In practice, most production agents should implement at least patterns 1, 4, and 5 — they cover the core graceful degradation path. Model downgrade catches the majority of budget constraints. Partial completion handles mid-task boundaries. Inform-and-stop is the fallback for everything else.

Patterns 2 and 3 add sophistication for agents with complex tool access or batch workloads, but they require more infrastructure — risk-point tiers for capability narrowing, a queue and scheduler for deferral.

Cost-Driven vs. Risk-Driven Degradation

The trigger matters because the recovery options are different:

	Cost-driven DENY	Risk-driven DENY
Why it fired	Dollar or token budget exhausted	RISK_POINTS budget exhausted or specific tool denied
Can a cheaper model help?	Yes — Pattern 1	No — the model isn't the problem, the action is
Can the agent keep working?	Often yes, at lower quality	Often yes, with fewer tools (Pattern 2)
Can it be deferred?	Yes — wait for next budget window	Depends — some actions are time-sensitive
Typical recovery path	Downgrade model → partial completion → stop	Narrow capabilities → read-only mode → stop

The key insight: a cost DENY means the agent can't afford to act. A risk DENY means the agent isn't allowed to act. Agents that handle both need to inspect the DENY reason and choose the right degradation path.

In practice, enforcement decisions often involve both simultaneously — an agent running low on dollar budget and approaching its RISK_POINTS ceiling. The patterns compose: downgrade the model (cost) while narrowing tool access (risk), complete what you can (partial completion), and queue the rest (defer).

The Design Principle

All five patterns share one principle: the agent should never leave the user wondering what happened.

A DENY without context is an error. A DENY with a summary, a handoff, and a next step is a feature. The difference isn't in the enforcement layer — it's in how the agent is designed to handle the response.

Enforcement tells the agent no. Graceful degradation design tells it what to do instead. That's the difference between error handling that serves the system and error handling that serves the user.

When Budget Runs Out: AI Agent Degradation Patterns ​

What Frameworks Do Today ​

The Three Signals Your Agent Receives ​

Pattern 1: Model Downgrade ​

Pattern 2: Capability Narrowing ​

Pattern 3: Queue and Defer ​

Pattern 4: Partial Completion with Summary ​

Pattern 5: Inform and Stop ​

Choosing the Right Pattern ​

Cost-Driven vs. Risk-Driven Degradation ​

The Design Principle ​

Related how-to guides ​

More from the Blog

When Budget Runs Out: AI Agent Degradation Patterns

What Frameworks Do Today

The Three Signals Your Agent Receives

Pattern 1: Model Downgrade

Pattern 2: Capability Narrowing

Pattern 3: Queue and Defer

Pattern 4: Partial Completion with Summary

Pattern 5: Inform and Stop

Choosing the Right Pattern

Cost-Driven vs. Risk-Driven Degradation

The Design Principle

Related how-to guides