AI Agent Unit Economics: Cost and Margin Analysis
A B2B SaaS company adds an AI copilot to their customer support product. They price the feature at $15/user/month and estimate $3/user/month in LLM costs based on their pilot: 20 conversations per user per month, 6 turns per conversation, GPT-4o at ~$0.15 per conversation. Gross margin target: 80%.
Month one in production with 200 users: average cost per user is $4.20. Close enough. Month two: $6.80. Month three: $11.50. The average is not the problem — the distribution is. 70% of users cost under $3/month. 20% cost $8-25/month. 10% cost $40-120/month. One user triggered 340 conversations in a month — automated integration testing against the copilot endpoint. That single user cost $310.
The company's overall gross margin on the AI feature in month three: 23% — far below their 80% target. The top 10% of users by cost consume 72% of total spend. Worse, three users each cost over $200, wiping out the margin from 50+ light users apiece. The product is profitable for most users and catastrophically unprofitable for a few — and there is no mechanism to distinguish between them at the point of execution.
From Tokens to Business Metrics
Most engineering teams track cost at the wrong level of abstraction. They know their per-token price. They know their monthly API bill. They cannot answer: "What does it cost us to resolve one support ticket?" or "What is our cost per active user this month?"
The translation requires four inputs:
- Raw token cost — per-model pricing from the provider
- Calls per unit of work — how many LLM calls does one conversation, review, or document take?
- Units of work per user — how many conversations, reviews, or documents does one user generate per month?
- Variance distribution — what does the cost spread look like across users?
The first three give you the average. The fourth determines whether the average is useful.
| Use case | Unit of work | Avg calls/unit | Avg cost/unit | Median | P90 | P99 | P99/Median |
|---|---|---|---|---|---|---|---|
| Support copilot | Conversation | 9 | $0.21 | $0.08 | $0.45 | $3.80 | 47× |
| Code review agent | Pull request | 22 | $1.85 | $1.20 | $4.50 | $18.00 | 15× |
| Document processor | Document | 4 | $0.12 | $0.09 | $0.30 | $2.10 | 23× |
The rightmost column — P99/Median — is the variance multiplier. For the support copilot, the most expensive 1% of conversations cost 47× the median. This ratio determines whether average-based pricing works or breaks.
Why Variance Destroys Margin Predictions
If you price at 3× average cost — a standard SaaS margin target — you need the cost distribution to be tight enough that 3× average covers nearly all users. For normal distributions, it does. Agent cost distributions are not normal. They follow a heavy-tail pattern because:
Context window growth is superlinear. Each turn in a conversation sends all previous turns. A 6-turn conversation sends ~21 message payloads total (1+2+3+4+5+6). A 20-turn conversation sends ~210. The cost scales with the square of conversation length, not linearly.
Retries cluster. A 5% overall failure rate sounds manageable. But failures are not evenly distributed — some conversations hit 50% failure rates because they trigger edge cases in tool execution. Those conversations cost 2-3× more than their content suggests, and the extra cost is invisible in average metrics.
Tool call depth varies 10-50×. A "what's my order status?" query makes 2 LLM calls. A "help me debug this integration" query makes 30+. Both are "one conversation" in your metrics.
User behavior is unpredictable. Some users send one message per conversation. Others send 40-message threads. Some users open 5 conversations per month. Others open 200. The variance in user behavior compounds the variance in per-conversation cost.
To illustrate, consider a generic AI feature with $4.00 average cost per user:
| Pricing at | Avg cost/user | Price/user | Margin (tight distribution, CV=0.5) | Margin (heavy-tail, CV=3.0) |
|---|---|---|---|---|
| 2× average cost | $4.00 | $8.00 | 65% | -15% |
| 3× average cost | $4.00 | $12.00 | 78% | 22% |
| 5× average cost | $4.00 | $20.00 | 87% | 55% |
CV is the coefficient of variation — standard deviation divided by mean. Tight distributions have CV < 1. Agent cost distributions typically have CV of 2-4. At CV=3.0, even pricing at 3× average only yields 22% margin, because a small number of high-cost users eat the profit from everyone else.
Building a Unit Economics Dashboard
Four metrics every team running AI features should track:
1. Cost per unit of work. For a support copilot, this is cost per conversation. For a code review agent, cost per pull request. Track the median, P90, P95, and P99 — not just the average. The average masks the tail.
2. Cost per active user per month. Total spend attributed to a user divided by one month. Break this out by percentile: what does your P50 user cost? Your P90? Your P99? The gap between P50 and P99 is your variance exposure.
3. Variance ratio (P95/Median). A single number that captures how fat the tail is. If P95/Median < 5, your pricing model can rely on averages. If P95/Median > 10, averages are misleading and you need per-user budget enforcement.
4. Margin per user cohort. Revenue minus cost, grouped by usage tier. This reveals whether your product is profitable for all users or subsidized by light users to cover heavy ones.
Cycles' Subject hierarchy maps directly to these metrics. Each reservation and commit is tagged with a subject — tenant, workflow, agent — so cost attribution is structural, not inferred from API logs after the fact.
from runcycles import CyclesClient, CyclesConfig, Subject
client = CyclesClient(CyclesConfig.from_env())
# Get balance for a specific user's monthly spend
balance = client.get_balance(
subject=Subject(
tenant="acme",
workflow="support-copilot",
agent=f"user-{user_id}",
),
)
cost_usd = balance.committed / 100_000_000 # microcents to dollarsWith per-subject cost attribution, you can compute all four metrics directly: aggregate by conversation ID for cost-per-conversation, by user ID for cost-per-user, and by user cohort for margin analysis. No log parsing, no reconciliation against provider invoices.
How Budget Enforcement Bounds Variance
You cannot control variance at the pricing layer. You must control it at the execution layer. Budget enforcement — a runtime authority that makes a deterministic allow/deny decision before every LLM call — transforms the cost distribution from unbounded heavy-tail to bounded exposure.
Three enforcement strategies, each mapped to margin impact:
Per-conversation cap. Set a $2.00 hard limit per conversation. Conversations that would have cost $0.45 (P90) pass through unaffected, but the $3.80 outliers (P99) are capped. The agent degrades gracefully — shorter responses, cheaper model fallback, or an explicit "I've reached my limit for this conversation, please start a new one" message. The tail is cut.
Per-user monthly cap. Set a $15.00/month ceiling per user — matching the price point. Users who would have cost $80/month are bounded. The feature becomes profitable for every user, by definition. This is the same pattern used in multi-tenant AI cost control for per-tenant isolation.
Tiered budgets by plan. Free users get $2/month in agent budget. Pro users get $20/month. Enterprise gets custom limits. The budget enforcement implements the pricing model directly — the hard limit and the price point are the same number.
| Strategy | Without cap | $2/conversation cap | $15/user/month cap |
|---|---|---|---|
| Avg cost/user/month | $11.50 | $4.80 | $4.80 |
| P99 cost/user/month | $120.00 | $14.00 | $15.00 |
| Worst-case user | $310.00 | $22.00 | $15.00 |
| Feature gross margin | 23% | 68% | 68% |
| Users hitting cap | 0% | 12% | 5% |
The $15/user/month cap turns a 23% margin feature into a 68% margin feature — close to the 80% target — with only 5% of users ever hitting the limit. For those users, the agent degrades gracefully — it does not hard-stop. It can switch to a cheaper model, reduce response length, or defer non-critical tasks.
Cost Per Conversation as a Business KPI
Token pricing is an engineering metric. Cost per conversation is a business KPI. Three patterns for using it:
Chargeback. Enterprise customers pay for actual AI usage. Cycles' per-tenant tracking provides the billing data — every reservation and commit is scoped to a tenant, so cost attribution is automatic. The usage report is the invoice. See Multi-Tenant AI Cost Control for the full chargeback model.
Feature-level P&L. Treat the AI copilot as its own cost center. Track cost per conversation as COGS. Monitor margin weekly. Set alerts when margin drops below threshold. This is Tier 3 of the cost management maturity model — alerting on business metrics, not just raw spend.
Model routing by economics. Route simple conversations to GPT-4o-mini ($0.15/1M input tokens) and complex conversations to GPT-4o ($2.50/1M input tokens). The routing decision is economic, not just capability-based. A simple "what's my order status?" query does not need a $2.50/1M-token model. A complex debugging session does. Routing and enforcement complement each other — the router picks the model, the runtime authority bounds the cost.
Next steps
- How Much Do AI Agents Actually Cost? — raw provider pricing and per-scenario cost breakdowns
- AI Agent Cost Management: The Complete Guide — the five-tier maturity model from monitoring to hard enforcement
- Multi-Tenant AI Cost Control — per-tenant budgets and chargeback models
- 5 AI Agent Failures Budget Controls Would Prevent — what happens when variance is unbounded
- AI Agent Budget Control: Enforce Hard Spend Limits — the reserve-commit pattern for pre-execution enforcement
- Cycles vs LLM Proxies and Observability Tools — why dashboards cannot prevent the overspend
