Stop Agents from Burning Your API Budget Overnight
A coding agent hit an ambiguous error. It retried with an expanding context window. Each retry cost more than the last. By the time someone checked the dashboard the next morning, the agent had looped 240 times and spent $4,200.
The model pricing was exactly right. The call volume was not.
Why existing controls didn't help
Provider spending caps are typically monthly and org-wide. They don't distinguish between your production agent and your staging test. By the time the monthly cap kicks in, the damage is done — and it blocks every other agent on the account too.
Rate limits control how fast, not how much. The agent stayed within its requests-per-second limit. It was making perfectly well-formed API calls. Just 240 of them.
Observability dashboards showed the spike — the next morning. The cost graph was a vertical line at 2 AM. Useful for the post-mortem. Useless for prevention.
How Cycles fixes it
```python
import openai
from runcycles import cycles

# The decorator reserves budget before each call; if the reservation is denied,
# the function body (and therefore the API request) never runs.
@cycles(estimate=2_000_000, action_kind="llm.completion", action_name="gpt-4o")
def call_llm(prompt: str) -> str:
    return openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
```

That's it. Before every LLM call, the @cycles decorator reserves budget. If the budget is exhausted, BudgetExceededError is raised and the model is never called. No tokens consumed. No cost incurred.
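Here is a minimal in-process sketch of the reserve-before-call idea. It does not use runcycles. RunBudget, budget_guard, and the local BudgetExceededError are hypothetical stand-ins. The reserve-then-settle step is an assumption about how such a gate typically works, not a description of the Cycles service: the estimate is held first, and the actual cost is recorded after the call.

```python
import functools
from dataclasses import dataclass


class BudgetExceededError(Exception):
    """Local stand-in: raised when a reservation would exceed the run's limit."""


@dataclass
class RunBudget:
    """Hypothetical per-run ledger; amounts are in US dollars."""
    limit: float
    spent: float = 0.0
    reserved: float = 0.0

    def remaining(self) -> float:
        return self.limit - self.spent - self.reserved

    def reserve(self, estimate: float) -> None:
        # Deny the reservation up front so the model call never happens.
        if estimate > self.remaining():
            raise BudgetExceededError(
                f"need ${estimate:.2f}, only ${self.remaining():.2f} left"
            )
        self.reserved += estimate

    def settle(self, estimate: float, actual: float) -> None:
        # Release the hold and record what the call really cost.
        self.reserved -= estimate
        self.spent += actual


def budget_guard(budget: RunBudget, estimate: float, actual_cost):
    """Hypothetical decorator: reserve `estimate` before the call, settle after."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            budget.reserve(estimate)  # raises before fn runs if the budget is exhausted
            try:
                result = fn(*args, **kwargs)
            except Exception:
                budget.settle(estimate, 0.0)  # assumption: a failed call is not billed
                raise
            budget.settle(estimate, actual_cost(result))
            return result
        return wrapper
    return decorator


budget = RunBudget(limit=5.00)
calls: list = []


@budget_guard(budget, estimate=2.00, actual_cost=lambda _result: 1.50)
def fake_llm(prompt: str) -> str:
    calls.append(prompt)  # stands in for the real API request
    return f"answer to {prompt}"


try:
    for i in range(10):
        fake_llm(f"attempt {i}")
except BudgetExceededError as exc:
    print(f"stopped after {len(calls)} calls: {exc}")
    # prints: stopped after 3 calls: need $2.00, only $0.50 left
```

The fourth call is refused while only $0.50 remains. The refusal happens inside reserve(), so fake_llm's body, and with it the simulated request, never runs.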
The same agent with a $15 per-run budget stops after 8 iterations and surfaces the problem immediately: "Budget exhausted. This task needs human review."
What happens now
- Budget checked before every call. The agent can't overspend — the reservation is denied before the API call executes.
- Graceful degradation, not a crash. The agent can catch BudgetExceededError and wind down: summarize progress, switch to a cheaper model, or queue the task for later.
- Per-run isolation. Each agent run has its own budget. A runaway in run #47 can't affect run #48 or another customer's allocation.
- You find out at $15, not $4,200. The budget limit surfaces the problem immediately instead of letting it compound overnight.
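This hypothetical sketch shows degradation and per-run isolation together. RunBudget and BudgetExceededError are local stand-ins, not the runcycles API. Each retry costs 30% more than the last to mimic an expanding context, and the agent catches the budget error to hand off instead of crashing. All dollar figures are illustrative and are not taken from the incident.

```python
from typing import Optional


class BudgetExceededError(Exception):
    """Local stand-in for the error a budget gate raises."""


class RunBudget:
    """Hypothetical per-run ledger; every agent run gets its own instance."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.spent = 0.0

    def charge(self, cost: float) -> None:
        # Checked before the call: refuse anything that would cross the limit.
        if self.spent + cost > self.limit:
            raise BudgetExceededError(f"run limit ${self.limit:.2f} reached")
        self.spent += cost


def run_agent(task: str, budget: RunBudget, succeeds_on: Optional[int] = None) -> str:
    """Retry until success; each attempt costs more as the context grows."""
    cost = 0.50  # hypothetical cost of the first attempt, in dollars
    attempt = 0
    try:
        while True:
            attempt += 1
            budget.charge(cost)  # denied here, before the (simulated) model call
            if attempt == succeeds_on:
                return f"{task}: done after {attempt} attempts"
            cost *= 1.3  # the next retry resends a longer context
    except BudgetExceededError:
        # Wind down instead of crashing: report progress and hand off to a human.
        return (f"{task}: budget exhausted after {attempt - 1} attempts "
                f"(${budget.spent:.2f} spent), needs human review")


# Separate budgets: the runaway run cannot drain the healthy one.
runaway = run_agent("fix flaky test", RunBudget(15.00))
healthy = run_agent("summarize ticket", RunBudget(15.00), succeeds_on=2)
print(runaway)  # fix flaky test: budget exhausted after 8 attempts ($11.93 spent), needs human review
print(healthy)  # summarize ticket: done after 2 attempts
```

Switching to a cheaper model or queueing the task would go in the same except branch. The point is that the loop ends with a report rather than an unbounded bill.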
The math
| | Without Cycles | With Cycles ($15/run cap) |
|---|---|---|
| Agent loops | 240 | 8 |
| Cost | $4,200 | $15 |
| Time to detect | Next morning | Immediately |
| Impact on other agents | All blocked by provider cap | None — per-run isolation |
| Recovery action | Post-mortem and budget reset | Fix the prompt |
Now run the numbers for your workload
The calculator below is pre-seeded with a similar retry-loop profile: 200K input tokens per call by the time someone notices, and 240 calls. The exact $4,200 in the story above depends on context-window growth across retries, which no static calculator captures perfectly; what the budget gate actually bounds is the shape of the cost curve. Adjust the input/output tokens, calls/day, and model rates to match your own incident.
| Model | Input $ / M | Output $ / M | Per call | Per day | Per month | Per year |
|---|---|---|---|---|---|---|
| | | | $0.0340 | $340 | $10,200 | $124,100 |
| | | | $0.0170 | $170 | $5,100 | $62,050 |
| | | | $0.0051 | $51.00 | $1,530 | $18,615 |
| | | | $0.0014 | $14.00 | $420 | $5,110 |
| | | | $0.0300 | $300 | $9,000 | $109,500 |
| | | | $0.0300 | $300 | $9,000 | $109,500 |
| | | | $0.0180 | $180 | $5,400 | $65,700 |
| | | | $0.0060 | $60.00 | $1,800 | $21,900 |
Defaults reflect published list pricing as of 2026-04 from OpenAI and Anthropic; verify them before relying on these figures for budgeting, and edit any cell to match your contracted rate. Calculation: (input_tokens × input_$/M + output_tokens × output_$/M) ÷ 1,000,000 × calls/day; the per-month and per-year columns multiply the daily figure by 30 and 365. Excludes prompt caching, batch discounts, fine-tuning, fast-mode, and data-residency premiums.
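The calculation translates directly into a few lines of Python. This sketch assumes the 30-day month and 365-day year that the table's columns imply (for example, $340 × 365 = $124,100). The 200K input tokens and 240 calls come from the retry-loop profile above. The 1,000 output tokens and the $3 / $15 per-million rates are placeholders, not quotes for any particular model.

```python
from dataclasses import dataclass

DAYS_PER_MONTH = 30  # consistent with the table: per-month = per-day x 30
DAYS_PER_YEAR = 365  # consistent with the table: per-year = per-day x 365


@dataclass
class CostEstimate:
    per_call: float
    per_day: float
    per_month: float
    per_year: float


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_usd_per_m: float, output_usd_per_m: float,
                  calls_per_day: int) -> CostEstimate:
    # (input_tokens x input_$/M + output_tokens x output_$/M) / 1,000,000
    per_call = (input_tokens * input_usd_per_m
                + output_tokens * output_usd_per_m) / 1_000_000
    per_day = per_call * calls_per_day
    return CostEstimate(per_call, per_day,
                        per_day * DAYS_PER_MONTH, per_day * DAYS_PER_YEAR)


# Retry-loop profile: 200K input tokens per call, 240 calls.
# Output tokens and rates are illustrative placeholders.
loop = estimate_cost(200_000, 1_000, 3.00, 15.00, calls_per_day=240)
print(f"${loop.per_call:.3f} per call, ${loop.per_day:.2f} for 240 calls")
# prints: $0.615 per call, $147.60 for 240 calls
```

Because every call is priced with the same token counts, this is a flat estimate. It understates a loop whose context grows on each retry.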
Go deeper
- LLM Cost Runtime Control Reference — the full topic guide: incident taxonomy, runtime authority patterns, multi-tenant isolation, unit economics, and rollout
- End-to-End Tutorial — zero to budget-guarded LLM call in 10 minutes
- Cost Estimation Cheat Sheet — how much to reserve per model
- Degradation Paths — what to do when budget runs out
- 5 Failures Budget Controls Would Prevent — more incidents with dollar math
- Why Rate Limits Are Not Enough — the deeper argument