Integrating Cycles with Anthropic (TypeScript)
This guide shows how to guard Anthropic Messages API calls with Cycles budget reservations in TypeScript, including streaming support and per-tool-call budget tracking for agentic workflows.
For the Python version, see Integrating with Anthropic (Python).
Prerequisites
- A running Cycles stack with a tenant, API key, and budget (Deploy the Full Stack)
- Node.js 20+
Installation
npm install runcycles @anthropic-ai/sdk
export CYCLES_BASE_URL="http://localhost:7878"
export CYCLES_API_KEY="cyc_live_..."
export ANTHROPIC_API_KEY="sk-ant-..."
60-Second Quick Start
import Anthropic from "@anthropic-ai/sdk";
import { CyclesClient, CyclesConfig, withCycles } from "runcycles";
const cycles = new CyclesClient(CyclesConfig.fromEnv());
const anthropic = new Anthropic();
const ask = withCycles(
{
client: cycles,
actionKind: "llm.completion",
actionName: "claude-sonnet-4",
estimate: () => 2_000_000,
actual: (r: Anthropic.Message) =>
r.usage.input_tokens * 300 + r.usage.output_tokens * 1_500,
},
async (prompt: string) => {
return anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 1024,
messages: [{ role: "user", content: prompt }],
});
},
);
const response = await ask("What is budget authority?");
console.log(response.content[0].type === "text" ? response.content[0].text : "");
Budget is reserved before the call and committed with actual token cost after. If budget is exhausted, BudgetExceededError is thrown before the Anthropic call is made.
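To make the reserve → execute → commit flow concrete, here is a minimal in-memory sketch of the pattern. It is not how runcycles is implemented: LocalBudget, guarded, and their signatures are hypothetical names used only for illustration, and the real budget authority runs on the Cycles server.

```typescript
// Hypothetical in-memory stand-in for a budget authority (illustration only).
class LocalBudget {
  private remaining: number;
  private reserved = 0;

  constructor(total: number) {
    this.remaining = total;
  }

  // Hold `amount` before the work starts; refuse if it would overdraw.
  reserve(amount: number): number {
    if (amount > this.remaining - this.reserved) {
      throw new Error("budget exceeded");
    }
    this.reserved += amount;
    return amount;
  }

  // Settle a hold: drop the hold and charge the actual cost.
  commit(held: number, actual: number): void {
    this.reserved -= held;
    this.remaining -= actual;
  }

  // Cancel a hold without charging anything.
  release(held: number): void {
    this.reserved -= held;
  }

  get available(): number {
    return this.remaining - this.reserved;
  }
}

// Reserve, run the call, then commit the actual cost or release on failure.
async function guarded<T>(
  budget: LocalBudget,
  estimate: number,
  actual: (result: T) => number,
  fn: () => Promise<T>,
): Promise<T> {
  const held = budget.reserve(estimate); // throws before fn is ever called
  try {
    const result = await fn();
    budget.commit(held, actual(result));
    return result;
  } catch (err) {
    budget.release(held);
    throw err;
  }
}
```

The estimate is deliberately an upper bound: it is held while the call runs, and only the actual cost is charged once the response arrives.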
Non-streaming calls with withCycles
Use the withCycles higher-order function to wrap Anthropic calls with automatic reserve → execute → commit:
import Anthropic from "@anthropic-ai/sdk";
import {
CyclesClient, CyclesConfig, withCycles,
setDefaultClient, getCyclesContext, BudgetExceededError,
} from "runcycles";
const cyclesClient = new CyclesClient(CyclesConfig.fromEnv());
setDefaultClient(cyclesClient);
const anthropic = new Anthropic();
// Claude Sonnet 4 pricing (microcents per token)
const INPUT_PRICE = 300; // $3.00 / 1M tokens
const OUTPUT_PRICE = 1_500; // $15.00 / 1M tokens
const DEFAULT_MAX_TOKENS = 1024;
const sendMessage = withCycles(
{
client: cyclesClient,
actionKind: "llm.completion",
actionName: "claude-sonnet-4-20250514",
estimate: (prompt: string) => {
const inputTokens = Math.ceil(prompt.length / 4);
return inputTokens * INPUT_PRICE + DEFAULT_MAX_TOKENS * OUTPUT_PRICE;
},
actual: (response: Anthropic.Message) => {
return response.usage.input_tokens * INPUT_PRICE
+ response.usage.output_tokens * OUTPUT_PRICE;
},
},
async (prompt: string) => {
const ctx = getCyclesContext();
// Respect budget caps
let maxTokens = DEFAULT_MAX_TOKENS;
if (ctx?.caps?.maxTokens) {
maxTokens = Math.min(maxTokens, ctx.caps.maxTokens);
}
const response = await anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: maxTokens,
messages: [{ role: "user", content: prompt }],
});
// Report metrics for observability
if (ctx) {
ctx.metrics = {
tokensInput: response.usage.input_tokens,
tokensOutput: response.usage.output_tokens,
modelVersion: response.model,
};
}
return response;
},
);
try {
const response = await sendMessage("Explain budget governance.");
console.log(response.content[0].type === "text" ? response.content[0].text : "");
} catch (err) {
if (err instanceof BudgetExceededError) {
console.log("Budget exhausted.");
} else {
throw err;
}
}
Streaming with reserveForStream
For streaming responses, use reserveForStream to manage the reservation lifecycle:
import Anthropic from "@anthropic-ai/sdk";
import {
CyclesClient, CyclesConfig, reserveForStream, BudgetExceededError,
} from "runcycles";
const cyclesClient = new CyclesClient(CyclesConfig.fromEnv());
const anthropic = new Anthropic();
const INPUT_PRICE = 300;
const OUTPUT_PRICE = 1_500;
async function streamWithBudget(prompt: string) {
const estimatedInputTokens = Math.ceil(prompt.length / 4);
const estimate = estimatedInputTokens * INPUT_PRICE + 1024 * OUTPUT_PRICE;
// 1. Reserve budget
const handle = await reserveForStream({
client: cyclesClient,
estimate,
unit: "USD_MICROCENTS",
actionKind: "llm.completion",
actionName: "claude-sonnet-4-20250514",
});
try {
// Respect budget caps
let maxTokens = 1024;
if (handle.caps?.maxTokens) {
maxTokens = Math.min(maxTokens, handle.caps.maxTokens);
}
// 2. Stream the response
const stream = anthropic.messages.stream({
model: "claude-sonnet-4-20250514",
max_tokens: maxTokens,
messages: [{ role: "user", content: prompt }],
});
for await (const event of stream) {
if (
event.type === "content_block_delta" &&
event.delta.type === "text_delta"
) {
process.stdout.write(event.delta.text);
}
}
// 3. Commit actual usage from the final message
const finalMessage = await stream.finalMessage();
const actualCost =
finalMessage.usage.input_tokens * INPUT_PRICE +
finalMessage.usage.output_tokens * OUTPUT_PRICE;
await handle.commit(actualCost, {
tokensInput: finalMessage.usage.input_tokens,
tokensOutput: finalMessage.usage.output_tokens,
modelVersion: finalMessage.model,
});
} catch (err) {
await handle.release("stream_error");
throw err;
}
}
Per-tool-call budget tracking
When Claude uses tools, each LLM turn consumes tokens. Create a separate reservation for each turn; the example below uses reserveForStream for this and commits or releases each reservation manually:
import Anthropic from "@anthropic-ai/sdk";
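import {
CyclesClient, CyclesConfig, reserveForStream, BudgetExceededError,
} from "runcycles";
const cyclesClient = new CyclesClient(CyclesConfig.fromEnv());
const anthropic = new Anthropic();
const INPUT_PRICE = 300;
const OUTPUT_PRICE = 1_500;
// TOOLS and executeTool are application-specific; supply your own definitions.
declare const TOOLS: Anthropic.Tool[];
declare function executeTool(name: string, input: unknown): Promise<string>;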
async function chatWithTools(prompt: string): Promise<string> {
const messages: Anthropic.MessageParam[] = [
{ role: "user", content: prompt },
];
for (let turn = 1; turn <= 5; turn++) {
// Reserve budget for this turn
const handle = await reserveForStream({
client: cyclesClient,
estimate: 2_000_000,
unit: "USD_MICROCENTS",
actionKind: "llm.completion",
actionName: "claude-sonnet-4-20250514",
});
try {
const response = await anthropic.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 1024,
tools: TOOLS,
messages,
});
// Commit actual cost
const actualCost =
response.usage.input_tokens * INPUT_PRICE +
response.usage.output_tokens * OUTPUT_PRICE;
await handle.commit(actualCost, {
tokensInput: response.usage.input_tokens,
tokensOutput: response.usage.output_tokens,
modelVersion: response.model,
});
if (response.stop_reason === "end_turn") {
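// The type predicate narrows the result to a text block so .text type-checks
const textBlock = response.content.find(
(b): b is Anthropic.TextBlock => b.type === "text",
);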
return textBlock ? textBlock.text : "";
}
// Process tool calls and continue
if (response.stop_reason === "tool_use") {
messages.push({ role: "assistant", content: response.content });
const toolResults: Anthropic.ToolResultBlockParam[] = [];
for (const block of response.content) {
if (block.type === "tool_use") {
const result = await executeTool(block.name, block.input);
toolResults.push({
type: "tool_result",
tool_use_id: block.id,
content: result,
});
}
}
messages.push({ role: "user", content: toolResults });
}
} catch (err) {
await handle.release("tool_call_error");
throw err;
}
}
return "Max turns reached.";
}
Each turn gets its own reservation, so the budget authority can deny mid-conversation if the agent is burning through budget too fast.
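The loop above leaves TOOLS and executeTool to the application. A minimal sketch of what they might look like follows. The get_weather tool, its schema, the runTool helper, and the canned result are all hypothetical placeholders; the only thing taken from the Messages API is the name / description / input_schema shape of a tool definition.

```typescript
// Hypothetical tool list in the Messages API tool-definition shape.
const TOOLS = [
  {
    name: "get_weather",
    description: "Return the current weather for a city.",
    input_schema: {
      type: "object" as const,
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
];

// Hypothetical synchronous dispatcher: maps a tool name to a local handler
// and returns a string suitable for the content of a tool_result block.
function runTool(name: string, input: unknown): string {
  const args = (input ?? {}) as Record<string, unknown>;
  switch (name) {
    case "get_weather":
      // Canned data; a real handler would call a weather service here.
      return JSON.stringify({ city: String(args.city), tempC: 21 });
    default:
      // Returning an error string lets the model see the failure.
      return JSON.stringify({ error: `unknown tool: ${name}` });
  }
}

// Async wrapper matching the executeTool(name, input) call used in the loop.
async function executeTool(name: string, input: unknown): Promise<string> {
  return runTool(name, input);
}
```

Returning an error payload for unknown tools, rather than throwing, keeps the conversation going and avoids the catch path for a turn whose cost has already been committed.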
Pricing reference
Adjust these constants for the model you use:
| Model | Input (microcents/token) | Output (microcents/token) |
|---|---|---|
| Claude Haiku 3.5 | 80 | 400 |
| Claude Sonnet 4 | 300 | 1,500 |
| Claude Opus 4 | 1,500 | 7,500 |
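
As a sketch of how the table could be wired into code, the snippet below keeps the prices in a lookup and derives a cost from a usage object. The map keys, costMicrocents, and microcentsToUsd are illustrative names, not part of runcycles. The conversion assumes 1 USD = 100,000,000 microcents, which is what makes 300 microcents per token equal to $3.00 per million tokens.

```typescript
type Price = { input: number; output: number };

// Prices in microcents per token, copied from the table above.
// The keys are hypothetical labels; use whatever identifiers your code passes around.
const PRICES: Record<string, Price> = {
  "claude-haiku-3.5": { input: 80, output: 400 },
  "claude-sonnet-4": { input: 300, output: 1_500 },
  "claude-opus-4": { input: 1_500, output: 7_500 },
};

type Usage = { input_tokens: number; output_tokens: number };

// Cost of one response in microcents, from its reported token usage.
function costMicrocents(model: string, usage: Usage): number {
  const price = PRICES[model];
  if (!price) throw new Error(`no price configured for ${model}`);
  return usage.input_tokens * price.input + usage.output_tokens * price.output;
}

// 1 USD = 100 cents = 100_000_000 microcents.
function microcentsToUsd(microcents: number): number {
  return microcents / 100_000_000;
}

const cost = costMicrocents("claude-sonnet-4", {
  input_tokens: 1_000,
  output_tokens: 500,
});
console.log(cost, microcentsToUsd(cost)); // 1050000 0.0105
```

Throwing on an unknown model is a deliberate choice in this sketch: silently pricing an unrecognized model at zero would under-report spend.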
Key points
- withCycles for non-streaming. Wraps a single Anthropic call with automatic reserve → execute → commit.
- reserveForStream for streaming. Manages the reservation lifecycle with automatic heartbeat during the stream.
- Token fields differ from OpenAI. Anthropic uses usage.input_tokens / usage.output_tokens (not prompt_tokens / completion_tokens).
- Per-turn reservations for tool use. Each LLM turn in a tool-use loop gets its own reservation for fine-grained budget control.
- Respect caps. Check handle.caps?.maxTokens to honor budget authority limits.
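
If the same budget code has to handle both providers, one option is to normalize the usage object before pricing it. The helper below is a hypothetical sketch: normalizeUsage and TokenUsage are not part of either SDK or of runcycles, and it relies only on the field names listed above.

```typescript
type AnthropicUsage = { input_tokens: number; output_tokens: number };
type OpenAIUsage = { prompt_tokens: number; completion_tokens: number };

// Provider-neutral shape, using the same field names as the metrics in this guide.
type TokenUsage = { tokensInput: number; tokensOutput: number };

function normalizeUsage(usage: AnthropicUsage | OpenAIUsage): TokenUsage {
  // The `in` check narrows the union to the Anthropic shape.
  if ("input_tokens" in usage) {
    return {
      tokensInput: usage.input_tokens,
      tokensOutput: usage.output_tokens,
    };
  }
  return {
    tokensInput: usage.prompt_tokens,
    tokensOutput: usage.completion_tokens,
  };
}
```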
Full example
See examples/anthropic-sdk/ for a complete, runnable example.
Next steps
- Integrating with Anthropic (Python) — Python version of this guide
- Handling Streaming Responses — streaming patterns in detail
- Cost Estimation Cheat Sheet — pricing reference for estimation
- Error Handling in TypeScript — handling budget errors
- Production Operations Guide — running Cycles in production