Why Cycles
If you're a team shipping AI agents to customers — support copilots, coding assistants, document processors, workflow automations — Cycles is the runtime authority layer that enforces hard limits on spend and actions before every LLM call, tool invocation, and side effect. Per-tenant, per-workflow, per-run. So one customer's runaway agent never blows through another customer's budget, and your feature margin stays predictable.
What Cycles solves
Protect margin. Agent costs follow a heavy-tail distribution — the top 10% of users consume 72% of total spend. Without per-user budget caps, a feature priced for 80% gross margin delivers 23%. Cycles bounds the tail so unit economics stay predictable.
Contain cross-tenant blast radius. A single runaway agent can burn $4,200 in three hours. Cycles enforces hierarchical budgets — tenant, workspace, workflow, run — so one customer's bad agent cannot starve the platform or another customer's allocation.
Audit every action. Every reservation, commit, and event creates a structured record with full scope context. Queryable via API, 90-day hot retention, exportable to cold storage. No log reconstruction required — the budget ledger is the audit trail. Details →
Gate high-consequence actions. A support agent sent 200 collections emails instead of welcome emails. Total model spend: $1.40. Business impact: $50K+ in lost pipeline. No spending limit would have caught it. Cycles supports RISK_POINTS — budgets denominated in blast radius, not dollars — so agents can read and reason freely while dangerous capabilities (email, deploy, delete) are gated separately.
By role
Engineering — Contain blast radius, protect margins
Every agent action passes through a reserve-commit gate. Before an LLM call executes, Cycles atomically checks the budget and locks the estimated cost. If the budget is exhausted, the call is denied and the agent degrades gracefully — cheaper model, shorter response, or explicit stop.
Without this gate, a single runaway agent can burn $4,200 in three hours — a coding agent hit an ambiguous error, retried with expanding context windows, and looped 240 times before anyone noticed. With a $15 per-run budget in Cycles, the same agent stops after 8 iterations and surfaces the problem immediately.
Blast radius is bounded at every level: per-run, per-workflow, per-tenant. One bad agent cannot starve the platform. Budgets are hierarchical — tenant, workspace, app, workflow, agent — so you set ceilings at the level that matches your architecture.
The margin impact is direct. In one unit economics analysis, a team pricing an AI feature at $15/user/month saw costs hit $11.50/user — a 23% margin against an 80% target. Per-user caps restore it to 68%.
Cost is the first dimension. Action authority is the second — gating what agents do, not just what they spend, via risk-point budgets per toolset.
Security / Compliance — Every action is auditable
Every reservation, commit, release, and event in Cycles creates a structured, queryable record. Each record includes: full scope hierarchy (tenant, workspace, app, workflow, agent, toolset), amounts reserved and committed, timestamp, status, and arbitrary metadata.
This means every budget operation — every reservation, commit, release, and event — is logged with the context needed for audit. You can answer "which agent spent how much, on what, and when" from the event log alone, without reconstructing it from scattered application logs.
The event log is queryable via the REST API. Retention is 90 days in hot storage (Redis), with export to cold storage (S3, GCS) for long-term compliance. The admin server records audit logs for all administrative operations — API key creation, tenant changes, budget modifications.
Access control separates the runtime plane (port 7878, scoped API keys with least-privilege permissions) from the management plane (port 7979, admin-only, never exposed to the public internet). API keys support rotation, revocation, and per-permission scoping.
Self-hosted deployments keep all data in your infrastructure — nothing leaves your network. SOC 2 Type I is in progress for the managed cloud offering.
Security overview → · Event log API → · Security hardening →
Finance — From unpredictable spend to bounded unit economics
In one real deployment, a team estimated $800/month for a customer support agent based on prototype traffic. The first production invoice was $4,200. The per-token pricing was exactly right — the call volume was not. Agents averaged 11 LLM calls per conversation instead of the 3 assumed in the estimate. Retries doubled call counts on bad days.
This is the norm. In a unit economics analysis of an AI copilot feature, the top 10% of users consumed 72% of total spend. One user triggered 340 conversations in a month and cost $310 alone — wiping out the margin from 50+ light users.
Cycles bounds this tail. In the same analysis, a $15/month per-user cap turned a 23% gross margin into 68%, with only 5% of users ever hitting the limit. A $15 per-run cap prevents the $4,200 tool loop entirely — the agent stops at $15 and surfaces the problem for human review.
Budget enforcement is not a cost center. It is the mechanism that makes AI feature unit economics predictable.
AI Agent — You get a budget, not just a limit
Your SDK tells you what each call cost — after it happened. You could track that locally. But a local counter breaks when 20 of you run concurrently against the same customer budget — you all read "remaining: $500" and all proceed. If you crash, your local state is gone. And you can't see the tenant's allocation, the workflow's run budget, or the workspace capacity. That context lives outside you. This isn't a gap you close with more tools — it requires a shared authority.
Cycles is that authority. Call check_balance before you plan. Call reserve before each costly step. You get back:
- ALLOW — proceed.
- ALLOW_WITH_CAPS — proceed with constraints (
maxTokens,toolDenylist,maxStepsRemaining). The authority tells you how to adapt, not just whether to proceed. - DENY — do not execute. Summarize what you completed, tell the user what remains, exit cleanly.
Call commit after execution so unused budget returns to the pool. If you crash, reservations expire and budget recovers automatically.
This is the difference between being killed mid-action and choosing how to wind down. Drop to a cheaper model, skip optional calls, defer work — and explain the trade-off. Bounds you can see, not limits you discover by crashing into them.
Why this earns you more autonomy
Without visible self-regulation, teams restrict you. They add human-in-the-loop approval steps, limit your tool access, reduce your context window, cap your iterations at 5. They do this because they have no evidence you'll stop on your own.
Cycles gives them that evidence. Every reservation is a record that you checked before acting. Every commit is proof you reported actual usage. Every DENY you handled gracefully — winding down, explaining what's left — is a signal that you can be trusted with more. Teams that see agents self-regulate within bounds respond by increasing budgets, granting access to higher-risk tools, and removing manual gates. Constraint you can demonstrate is the path to autonomy you earn.
What about latency?
A full reserve+commit cycle adds ~11ms end-to-end. A typical LLM call takes 500ms–30s. Budget enforcement adds less time than the variance in your provider's response latency. You won't notice it.
What if the budget is set too low?
This is real — a budget of $0.50 on a task that needs $5 means you get DENY on step 3. But without Cycles, you'd discover the mismatch after spending $5 (or $50, or $4,200). With Cycles, you discover it at $0.50 and can tell the user: "Budget exhausted after 3 steps. This task needs a larger allocation." That's better for both of you. And teams can calibrate budgets with shadow mode — running enforcement in dry-run against real traffic before turning it on.
What if your estimates are wrong?
They will be. Estimates don't need to be perfect — they need to be bounded. If you overestimate, the excess is released on commit. If you underestimate, the overage policy decides what happens — reject, allow if available, or allow with overdraft. The protocol is designed for imperfect information, because that's what agents have.
When you don't need this
If you're a solo agent — one session, one user, no concurrency, no multi-tenancy — a local counter is probably fine. Cycles solves coordination across agents, tenants, and workflows. If there's nothing to coordinate, the overhead isn't justified. Where Cycles matters is when you're one of many: multiple agents drawing from shared budgets, multiple customers on the same platform, multiple workflows that need isolation from each other.
Get started
- What is Cycles? — 5-minute overview
- End-to-End Tutorial — from zero to working enforcement in 30 minutes
- 5 Failures Budget Controls Would Prevent — cost incidents, with dollar math
- 5 Failures Only Action Controls Would Prevent — action incidents, where the spend was negligible