Integrating Cycles with LangChain.js
This guide shows how to add budget governance to LangChain.js applications using a custom callback handler that wraps every LLM call with a Cycles reservation.
Prerequisites
npm install runcycles @langchain/core @langchain/openai uuid zod
export CYCLES_BASE_URL="http://localhost:7878"
export CYCLES_API_KEY="cyc_live_..."
export CYCLES_TENANT="acme"
export OPENAI_API_KEY="sk-..."
The callback handler approach
LangChain.js fires callback events on every LLM call. A custom BaseCallbackHandler can hook into handleLLMStart and handleLLMEnd to create and commit Cycles reservations:
import { BaseCallbackHandler } from "@langchain/core/callbacks/base";
import type { Serialized } from "@langchain/core/load/serializable";
import type { LLMResult } from "@langchain/core/outputs";
import { v4 as uuidv4 } from "uuid";
import { CyclesClient } from "runcycles";

interface CyclesBudgetHandlerOptions {
  client: CyclesClient;
  subject: { tenant: string; workflow?: string; agent?: string; toolset?: string };
  estimateAmount?: number; // amount reserved per call, in USD microcents
  actionKind?: string;
  actionName?: string;
}
export class CyclesBudgetHandler extends BaseCallbackHandler {
  name = "CyclesBudgetHandler";
  // By default LangChain.js logs handler errors instead of rethrowing them,
  // and may run handlers in the background. Both flags are set so that a
  // failed reservation stops the LLM call before it starts.
  raiseError = true;
  awaitHandlers = true;

  private client: CyclesClient;
  private subject: CyclesBudgetHandlerOptions["subject"];
  private estimateAmount: number;
  private actionKind: string;
  private actionName: string;
  // In-flight calls: runId -> reservation ID, and runId -> idempotency key.
  private reservations = new Map<string, string>();
  private keys = new Map<string, string>();

  constructor(options: CyclesBudgetHandlerOptions) {
    super();
    this.client = options.client;
    this.subject = options.subject;
    this.estimateAmount = options.estimateAmount ?? 2_000_000;
    this.actionKind = options.actionKind ?? "llm.completion";
    this.actionName = options.actionName ?? "gpt-4o";
  }
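  async handleLLMStart(
    _serialized: Serialized,
    _prompts: string[],
    runId: string,
  ): Promise<void> {
    // One idempotency key per run; the commit and release keys derive from it.
    const key = uuidv4();
    const res = await this.client.createReservation({
      idempotency_key: key,
      subject: this.subject,
      action: { kind: this.actionKind, name: this.actionName },
      estimate: { unit: "USD_MICROCENTS", amount: this.estimateAmount },
      ttl_ms: 60_000,
    });
    if (!res.isSuccess) {
      // This is a plain Error, not a BudgetExceededError, so a caller's
      // `instanceof BudgetExceededError` check will not match it.
      throw new Error(res.errorMessage ?? "Reservation failed");
    }
    // Record the run only after the reservation succeeds, so a failed
    // reservation leaves no stale entry behind.
    this.keys.set(runId, key);
    this.reservations.set(runId, res.getBodyAttribute("reservation_id") as string);
  }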
  async handleLLMEnd(output: LLMResult, runId: string): Promise<void> {
    const rid = this.reservations.get(runId);
    const key = this.keys.get(runId);
    this.reservations.delete(runId);
    this.keys.delete(runId);
    if (!rid || !key) return;

    // Token usage from llmOutput; missing counts fall back to 0.
    const usage = output.llmOutput?.tokenUsage ?? {};
    const inputTokens = usage.promptTokens ?? 0;
    const outputTokens = usage.completionTokens ?? 0;
    await this.client.commitReservation(rid, {
      idempotency_key: `commit-${key}`,
      actual: {
        unit: "USD_MICROCENTS",
        // Hard-coded rates: 250 microcents per input token and 1,000 per
        // output token. Adjust these for the model you actually call.
        amount: inputTokens * 250 + outputTokens * 1_000,
      },
      metrics: {
        tokens_input: inputTokens,
        tokens_output: outputTokens,
      },
    });
  }
  async handleLLMError(_error: Error, runId: string): Promise<void> {
    const rid = this.reservations.get(runId);
    const key = this.keys.get(runId);
    this.reservations.delete(runId);
    this.keys.delete(runId);
    // Nothing was charged, so hand the held budget back.
    if (rid && key) {
      await this.client.releaseReservation(rid, {
        idempotency_key: `release-${key}`,
      });
    }
  }
}
Using the handler
With a chat model
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";
import { CyclesClient, CyclesConfig, BudgetExceededError } from "runcycles";

const client = new CyclesClient(CyclesConfig.fromEnv());
const handler = new CyclesBudgetHandler({
  client,
  subject: { tenant: "acme", agent: "my-agent" },
});
const llm = new ChatOpenAI({ model: "gpt-4o", callbacks: [handler] });
try {
  const result = await llm.invoke([new HumanMessage("Hello!")]);
  console.log(result.content);
} catch (err) {
  if (err instanceof BudgetExceededError) {
    console.log("Budget exhausted.");
  } else {
    throw err;
  }
}
With an agent and tools
Every LLM call the agent makes (including tool-calling turns) gets its own reservation:
// Reuses client, ChatOpenAI, HumanMessage, and BudgetExceededError
// from the previous example.
import { tool } from "@langchain/core/tools";
import { z } from "zod";

const getWeather = tool(
  async ({ location }: { location: string }) => `72°F in ${location}`,
  {
    name: "get_weather",
    description: "Get weather for a location.",
    schema: z.object({ location: z.string() }),
  },
);
const handler = new CyclesBudgetHandler({
  client,
  subject: { tenant: "acme", agent: "tool-agent", toolset: "weather" },
});
const llm = new ChatOpenAI({ model: "gpt-4o", callbacks: [handler] });
const llmWithTools = llm.bindTools([getWeather]);
try {
  const result = await llmWithTools.invoke([
    new HumanMessage("What's the weather in NYC?"),
  ]);
  // A tool-calling turn usually has empty content; the requested calls
  // are listed in tool_calls.
  console.log(result.tool_calls?.length ? result.tool_calls : result.content);
} catch (err) {
  if (err instanceof BudgetExceededError) {
    console.log("Agent stopped: budget exhausted.");
  } else {
    throw err;
  }
}
How it works
| Event | Action |
|---|---|
| handleLLMStart | Create a reservation with the estimated cost |
| handleLLMEnd | Commit the actual cost from token usage |
| handleLLMError | Release the reservation to free held budget |
The handler tracks active reservations by LangChain's runId, so concurrent calls are handled correctly.
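The reserve, commit, and release flow in the table can be sketched without LangChain or a running Cycles server. This is a minimal, synchronous model under stated assumptions: FakeLedger and RunTracker are hypothetical names invented for illustration, not part of runcycles, and the real client is asynchronous. It shows why keying reservations by runId keeps two overlapping calls from interfering with each other.

```typescript
// Hypothetical in-memory stand-in for a budget service (not the Cycles API).
class FakeLedger {
  remaining: number;
  spent = 0;
  private held = new Map<string, number>(); // reservationId -> held amount
  private nextId = 1;

  constructor(initialBudget: number) {
    this.remaining = initialBudget;
  }

  // Hold the estimated amount, or fail if the budget cannot cover it.
  reserve(amount: number): string {
    if (amount > this.remaining) throw new Error("budget exceeded");
    this.remaining -= amount;
    const id = `res-${this.nextId++}`;
    this.held.set(id, amount);
    return id;
  }

  // Charge the actual cost and return the unused part of the hold.
  commit(id: string, actual: number): void {
    const held = this.take(id);
    this.remaining += held - actual;
    this.spent += actual;
  }

  // Return the whole hold; nothing is charged.
  release(id: string): void {
    this.remaining += this.take(id);
  }

  private take(id: string): number {
    const held = this.held.get(id);
    if (held === undefined) throw new Error(`unknown reservation ${id}`);
    this.held.delete(id);
    return held;
  }
}

// Mirrors the handler's bookkeeping: one reservation per runId.
class RunTracker {
  private reservations = new Map<string, string>();
  private ledger: FakeLedger;
  private estimate: number;

  constructor(ledger: FakeLedger, estimate: number) {
    this.ledger = ledger;
    this.estimate = estimate;
  }

  onStart(runId: string): void {
    this.reservations.set(runId, this.ledger.reserve(this.estimate));
  }

  onEnd(runId: string, actual: number): void {
    const id = this.pop(runId);
    if (id) this.ledger.commit(id, actual);
  }

  onError(runId: string): void {
    const id = this.pop(runId);
    if (id) this.ledger.release(id);
  }

  private pop(runId: string): string | undefined {
    const id = this.reservations.get(runId);
    this.reservations.delete(runId);
    return id;
  }
}

const ledger = new FakeLedger(10_000);
const tracker = new RunTracker(ledger, 2_000);
tracker.onStart("run-a");
tracker.onStart("run-b");    // overlaps with run-a
tracker.onEnd("run-b", 500); // commit: 500 charged, 1,500 returned
tracker.onError("run-a");    // release: full 2,000 returned
```

After this sequence only run-b's actual cost has been charged; the failed run-a costs nothing because its hold was released.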
Streaming with LangChain.js
For streaming responses, use reserveForStream instead of the callback handler. This keeps the reservation alive with an automatic heartbeat while tokens are being streamed:
import { ChatOpenAI } from "@langchain/openai";
import { HumanMessage } from "@langchain/core/messages";
import {
  CyclesClient,
  CyclesConfig,
  reserveForStream,
  BudgetExceededError,
} from "runcycles";

const client = new CyclesClient(CyclesConfig.fromEnv());
const handle = await reserveForStream({
  client,
  estimate: 2_000_000,
  unit: "USD_MICROCENTS",
  actionKind: "llm.completion",
  actionName: "gpt-4o",
});
const llm = new ChatOpenAI({ model: "gpt-4o" });
try {
  const stream = await llm.stream([new HumanMessage("Write a short poem.")]);
  let fullText = "";
  for await (const chunk of stream) {
    // Only plain string content is collected; other content shapes are skipped.
    const content = typeof chunk.content === "string" ? chunk.content : "";
    process.stdout.write(content);
    fullText += content;
  }
  // Estimate actual cost from output length (1 token ~ 4 chars).
  // The 500 is an assumed input token count, because this loop does not
  // track prompt usage.
  const estimatedOutputTokens = Math.ceil(fullText.length / 4);
  const actualCost = 500 * 250 + estimatedOutputTokens * 1_000;
  await handle.commit(actualCost, {
    tokensOutput: estimatedOutputTokens,
  });
} catch (err) {
  // Free the held budget if streaming or the commit fails.
  await handle.release("stream_error");
  throw err;
}
Per-agent budgets
Use Cycles' subject hierarchy to give each agent its own budget scope:
// Planning agent with its own budget
const plannerHandler = new CyclesBudgetHandler({
  client,
  subject: { tenant: "acme", workflow: "support", agent: "planner" },
});
// Executor agent with a separate budget
const executorHandler = new CyclesBudgetHandler({
  client,
  subject: { tenant: "acme", workflow: "support", agent: "executor" },
});
const planner = new ChatOpenAI({ model: "gpt-4o", callbacks: [plannerHandler] });
const executor = new ChatOpenAI({ model: "gpt-4o", callbacks: [executorHandler] });
Each agent draws from its own budget allocation. If the executor exhausts its budget, the planner can still operate independently.
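The isolation between agents can be illustrated with a toy model. This sketch assumes each agent scope holds its own allocation and ignores any limits a real deployment may also enforce at the tenant or workflow level. ScopedBudgets and scopeKey are hypothetical names, not part of runcycles.

```typescript
type Subject = { tenant: string; workflow?: string; agent?: string };

// Hypothetical scope key; the real service may derive scopes differently.
function scopeKey(s: Subject): string {
  return [s.tenant, s.workflow ?? "*", s.agent ?? "*"].join("/");
}

class ScopedBudgets {
  private remaining = new Map<string, number>();

  allocate(subject: Subject, amount: number): void {
    this.remaining.set(scopeKey(subject), amount);
  }

  // Deducts and returns true if the scope can afford the amount;
  // otherwise leaves the scope untouched and returns false.
  tryReserve(subject: Subject, amount: number): boolean {
    const key = scopeKey(subject);
    const left = this.remaining.get(key) ?? 0;
    if (amount > left) return false;
    this.remaining.set(key, left - amount);
    return true;
  }
}

const budgets = new ScopedBudgets();
const plannerSubject: Subject = { tenant: "acme", workflow: "support", agent: "planner" };
const executorSubject: Subject = { tenant: "acme", workflow: "support", agent: "executor" };
budgets.allocate(plannerSubject, 5_000);
budgets.allocate(executorSubject, 3_000);

const executorFirst = budgets.tryReserve(executorSubject, 3_000); // uses the whole executor budget
const executorSecond = budgets.tryReserve(executorSubject, 1);    // denied: executor is exhausted
const plannerStillOk = budgets.tryReserve(plannerSubject, 2_000); // planner scope is unaffected
```

Because the two subjects differ only in the agent field, they resolve to different scope keys, which is what lets the planner keep running after the executor is denied.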
Key points
- One reservation per LLM call. The callback creates a reservation on every handleLLMStart and commits on handleLLMEnd.
- Agents are automatically covered. Multi-turn agents that call the LLM repeatedly get budget-checked on every turn.
- Errors release budget. If the LLM call fails, the reservation is released immediately.
- Concurrent-safe. Reservations are tracked by runId, supporting concurrent LLM calls.
- Streaming uses a different pattern. Use reserveForStream with its automatic heartbeat instead of the callback handler.
- Works with any LangChain.js model. Attach the handler to ChatOpenAI, ChatAnthropic, or any other model via callbacks: [handler].
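The cost arithmetic used in this guide can be checked in isolation. The sketch below reuses the guide's per-token rates (250 and 1,000 microcents) and its 4-characters-per-token heuristic for streamed text. It assumes one microcent is one millionth of a cent; the helper names are hypothetical and not part of runcycles.

```typescript
// Rates in USD microcents per token, taken from the formulas in this guide.
const INPUT_RATE = 250;
const OUTPUT_RATE = 1_000;
// Assumption: 1 USD = 100 cents, and 1 cent = 1,000,000 microcents.
const MICROCENTS_PER_USD = 100_000_000;

function costMicrocents(inputTokens: number, outputTokens: number): number {
  return inputTokens * INPUT_RATE + outputTokens * OUTPUT_RATE;
}

// Rough token count for streamed text (about 4 characters per token).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function toUsd(microcents: number): number {
  return microcents / MICROCENTS_PER_USD;
}

// Exact usage: 1,200 * 250 + 300 * 1,000 = 600,000 microcents.
const exact = costMicrocents(1_200, 300);

// Streamed output of 1,001 characters rounds up to 251 tokens, and the input
// side uses the guide's assumed 500 tokens: 125,000 + 251,000 = 376,000.
const streamed = costMicrocents(500, estimateTokens("x".repeat(1_001)));

// 600,000 microcents is 0.6 cents, or 0.006 USD.
const exactUsd = toUsd(exact);
```

Under these rates the default estimate of 2,000,000 microcents corresponds to 2 cents per call, so it only covers calls whose actual cost stays below that figure.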
Next steps
- Integrating Cycles with LangChain (Python) — the Python version of this guide
- Handling Streaming Responses — streaming patterns in detail
- Cost Estimation Cheat Sheet — how much to reserve per model
- Error Handling Patterns in TypeScript — handling Cycles errors in TypeScript
