Integrating Cycles with the Vercel AI SDK
This guide shows how to add budget governance to a Next.js application using the Vercel AI SDK and the runcycles TypeScript client.
The Vercel AI SDK streams by default, so this guide uses the `reserveForStream` pattern — reserving budget before the stream starts, keeping the reservation alive during streaming, and committing actual usage when the stream finishes.
Prerequisites
- A running Cycles stack with a tenant, API key, and budget (Deploy the Full Stack)
- A Next.js project with the Vercel AI SDK installed
- Node.js 20+
Installation
```bash
npm install runcycles ai @ai-sdk/openai
```

Environment variables
```bash
CYCLES_BASE_URL=http://localhost:7878
CYCLES_API_KEY=cyc_live_...
CYCLES_TENANT=acme-corp
OPENAI_API_KEY=sk-...
```

60-Second Quick Start
```ts
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";
import { CyclesClient, CyclesConfig, reserveForStream } from "runcycles";

const cycles = new CyclesClient(CyclesConfig.fromEnv());

const handle = await reserveForStream({
  client: cycles,
  estimate: 2_000_000,
  unit: "USD_MICROCENTS",
  actionKind: "llm.completion",
  actionName: "gpt-4o",
});

const result = streamText({
  model: openai("gpt-4o"),
  prompt: "What is budget authority?",
  onFinish: async ({ usage }) => {
    // GPT-4o: 250 microcents per input token, 1000 per output token.
    await handle.commit(
      (usage.inputTokens ?? 0) * 250 + (usage.outputTokens ?? 0) * 1000,
    );
  },
});

// Consume the stream; onFinish fires once it completes.
for await (const text of result.textStream) {
  process.stdout.write(text);
}
```

Budget is reserved before the stream starts and committed when it finishes. Read on for the full Next.js API route pattern with error handling.
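The 250 and 1000 microcent rates used above come from converting GPT-4o's list prices (dollars per million tokens) into microcents per token. A small helper — illustrative only, not part of runcycles — makes the conversion explicit:

```typescript
// Convert a price in dollars per 1M tokens into microcents per token.
// 1 dollar = 100 cents = 100,000,000 microcents.
function microcentsPerToken(dollarsPerMillionTokens: number): number {
  return (dollarsPerMillionTokens * 100_000_000) / 1_000_000;
}

// GPT-4o list prices used throughout this guide:
microcentsPerToken(2.5); // input:  250 microcents/token
microcentsPerToken(10); //  output: 1000 microcents/token
```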
API route with budget governance
Create an API route that reserves budget before streaming and commits actual usage after:
```ts
// app/api/chat/route.ts
import { streamText, type UIMessage, convertToModelMessages } from "ai";
import { openai } from "@ai-sdk/openai";
import {
  CyclesClient,
  CyclesConfig,
  reserveForStream,
  BudgetExceededError,
} from "runcycles";

export const runtime = "nodejs"; // Required for AsyncLocalStorage

const cyclesClient = new CyclesClient(CyclesConfig.fromEnv());

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  // Estimate cost from message text (1 token ≈ 4 chars).
  // GPT-4o: input $2.50/1M tokens (250 microcents/token),
  // output $10/1M tokens (1000 microcents/token).
  const estimatedInputTokens = messages.reduce(
    (sum, m) =>
      sum +
      m.parts.reduce(
        (len, p) => len + (p.type === "text" ? p.text.length : 0),
        0,
      ) / 4,
    0,
  );
  // Assume output is roughly twice the input length.
  const estimatedCost = Math.ceil(
    estimatedInputTokens * 250 + estimatedInputTokens * 2 * 1000,
  );

  // 1. Reserve budget
  let handle;
  try {
    handle = await reserveForStream({
      client: cyclesClient,
      estimate: estimatedCost,
      unit: "USD_MICROCENTS",
      actionKind: "llm.completion",
      actionName: "gpt-4o",
    });
  } catch (err) {
    if (err instanceof BudgetExceededError) {
      return new Response(
        JSON.stringify({
          error: "budget_exceeded",
          message: "Budget exhausted. Contact your administrator.",
        }),
        { status: 402, headers: { "Content-Type": "application/json" } },
      );
    }
    throw err;
  }

  // 2. Stream with budget tracking
  try {
    const result = streamText({
      model: openai("gpt-4o"),
      messages: convertToModelMessages(messages),
      onFinish: async ({ usage }) => {
        const actualCost = Math.ceil(
          (usage.inputTokens ?? 0) * 250 + (usage.outputTokens ?? 0) * 1000,
        );
        await handle.commit(actualCost, {
          tokensInput: usage.inputTokens,
          tokensOutput: usage.outputTokens,
        });
      },
    });
    return result.toUIMessageStreamResponse();
  } catch (err) {
    await handle.release("stream_error");
    throw err;
  }
}
```

How it works
- Before streaming: `reserveForStream` creates a reservation and starts an automatic heartbeat to keep it alive during the stream.
- During streaming: The Vercel AI SDK streams tokens to the client. The heartbeat extends the reservation TTL automatically.
- After streaming: The `onFinish` callback calculates actual cost from token usage and calls `handle.commit()`. The heartbeat stops automatically.
- On error: The `catch` block calls `handle.release()` to return the reserved budget to the pool.
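The lifecycle above can be sketched in a few lines. This is an illustrative model of what `reserveForStream` manages for you — the class name, method shapes, and timer-based heartbeat are assumptions for exposition, not the actual runcycles internals:

```typescript
// Illustrative sketch: a reservation that heartbeats while a stream runs,
// and stops heartbeating on either commit (success) or release (error).
class StreamReservation {
  private heartbeat: ReturnType<typeof setInterval>;

  constructor(
    extendTtl: (reservationId: string) => void,
    reservationId: string,
    heartbeatMs: number,
  ) {
    // While the stream runs, periodically extend the reservation TTL so the
    // server does not reap it as abandoned.
    this.heartbeat = setInterval(() => extendTtl(reservationId), heartbeatMs);
  }

  // Commit actual usage; the heartbeat stops once usage is settled.
  commit(actualCost: number): number {
    clearInterval(this.heartbeat);
    return actualCost;
  }

  // Return the reserved budget to the pool, e.g. after a stream error.
  release(reason: string): string {
    clearInterval(this.heartbeat);
    return reason;
  }
}
```

In practice runcycles picks the heartbeat interval and TTL for you; the sketch only shows why both `commit()` and `release()` stop the heartbeat.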
Respecting budget caps
When the budget is running low, Cycles may return `ALLOW_WITH_CAPS` with a suggested `max_tokens` limit. Respect it by capping the model's output:
```ts
const handle = await reserveForStream({ ... });

// Cap output tokens using the reservation's suggested limit, if any.
let maxOutputTokens = 4096;
if (handle.caps?.maxTokens) {
  maxOutputTokens = Math.min(maxOutputTokens, handle.caps.maxTokens);
}

const result = streamText({
  model: openai("gpt-4o"),
  maxOutputTokens,
  messages: convertToModelMessages(messages),
  onFinish: async ({ usage }) => { /* ... */ },
});
```

Client-side error handling
Handle the 402 response in your React component:
```tsx
// components/chat.tsx
"use client";
import { useState } from "react";
import { useChat } from "@ai-sdk/react";

export function Chat() {
  const [input, setInput] = useState("");
  const { messages, sendMessage, error } = useChat();

  if (error?.message?.includes("budget_exceeded")) {
    return <div>Your budget has been exhausted. Please contact support.</div>;
  }

  return (
    <form
      onSubmit={(e) => {
        e.preventDefault();
        sendMessage({ text: input });
        setInput("");
      }}
    >
      {messages.map((m) => (
        <div key={m.id}>
          {m.parts.map((p) => (p.type === "text" ? p.text : "")).join("")}
        </div>
      ))}
      <input value={input} onChange={(e) => setInput(e.target.value)} />
    </form>
  );
}
```

Next steps
- Integrating with Next.js — middleware, server actions, per-tenant isolation
- Handling Streaming Responses — streaming patterns in detail
- Cost Estimation Cheat Sheet — how much to reserve per model
- Choosing the Right Integration Pattern — when to use `withCycles` vs `reserveForStream`
- Vercel AI SDK example — runnable Vercel AI SDK integration