Integrating Cycles with OpenAI
This guide shows how to guard OpenAI API calls with Cycles budget reservations so that every chat completion is cost-controlled, caps-aware, and observable.
Prerequisites
```shell
pip install runcycles openai
```

Set environment variables:

```shell
export CYCLES_BASE_URL="http://localhost:7878"
export CYCLES_API_KEY="your-api-key"  # create via the Admin Server (see note below)
export CYCLES_TENANT="acme"
export OPENAI_API_KEY="sk-..."
```

Need an API key? Create one via the Admin Server; see Deploy the Full Stack or API Key Management.
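Before running the examples, it can help to verify these variables are actually set. A small sketch using plain `os.environ` (the helper is illustrative, not part of runcycles):

```python
import os

# The four variables exported above.
REQUIRED = ("CYCLES_BASE_URL", "CYCLES_API_KEY", "CYCLES_TENANT", "OPENAI_API_KEY")

def missing_env(environ=os.environ) -> list:
    # Names that are absent or empty in the given environment mapping.
    return [name for name in REQUIRED if not environ.get(name)]

if missing_env():
    print("Set these variables before running the examples:", missing_env())
```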
Basic pattern
Use the `@cycles` decorator to wrap an OpenAI call with automatic reserve → execute → commit:

```python
from openai import OpenAI

from runcycles import (
    CyclesClient, CyclesConfig, CyclesMetrics,
    cycles, get_cycles_context, set_default_client,
)

# Set up clients
config = CyclesConfig.from_env()
set_default_client(CyclesClient(config))
openai_client = OpenAI()

# Per-token pricing in USD microcents (1 USD = 100_000_000 microcents)
PRICE_PER_INPUT_TOKEN = 250     # $2.50 / 1M tokens
PRICE_PER_OUTPUT_TOKEN = 1_000  # $10.00 / 1M tokens

@cycles(
    estimate=lambda prompt, **kw: len(prompt.split()) * 2 * PRICE_PER_INPUT_TOKEN
    + kw.get("max_tokens", 1024) * PRICE_PER_OUTPUT_TOKEN,
    actual=lambda result: (
        result["usage"]["prompt_tokens"] * PRICE_PER_INPUT_TOKEN
        + result["usage"]["completion_tokens"] * PRICE_PER_OUTPUT_TOKEN
    ),
    action_kind="llm.completion",
    action_name="gpt-4o",
    unit="USD_MICROCENTS",
    ttl_ms=60_000,
)
def chat_completion(prompt: str, max_tokens: int = 1024) -> dict:
    ctx = get_cycles_context()

    # Respect caps from the budget authority
    if ctx and ctx.has_caps() and ctx.caps.max_tokens:
        max_tokens = min(max_tokens, ctx.caps.max_tokens)

    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )

    # Report metrics
    if ctx:
        ctx.metrics = CyclesMetrics(
            tokens_input=response.usage.prompt_tokens,
            tokens_output=response.usage.completion_tokens,
            model_version=response.model,
        )

    return {
        "content": response.choices[0].message.content,
        "usage": {
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
        },
    }
```

Cost estimation strategies
The `estimate` function runs before the API call. The more accurate it is, the less budget you hold unnecessarily:
| Strategy | Accuracy | Example |
|---|---|---|
| Constant | Low | `estimate=500_000` |
| Token-proportional | Medium | `estimate=lambda p, **kw: kw.get("max_tokens", 1024) * PRICE_PER_OUTPUT_TOKEN` |
| Input + output | High | Count input tokens (or approximate from word count) plus max output tokens |
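To make these numbers concrete, here is the microcent conversion behind the pricing constants, plus the decorator's word-count estimate applied to a 100-word prompt (a sketch reusing the constants and formula from the basic pattern above):

```python
MICROCENTS_PER_USD = 100_000_000

def per_token_price(usd_per_million_tokens: float) -> int:
    # Convert a $/1M-token price into integer microcents per token.
    return int(usd_per_million_tokens * MICROCENTS_PER_USD / 1_000_000)

PRICE_PER_INPUT_TOKEN = per_token_price(2.50)    # 250
PRICE_PER_OUTPUT_TOKEN = per_token_price(10.00)  # 1_000

def word_count_estimate(prompt: str, max_tokens: int = 1024) -> int:
    # Same formula as the decorator's estimate lambda: ~2 tokens per word,
    # plus the full output allowance.
    return (len(prompt.split()) * 2 * PRICE_PER_INPUT_TOKEN
            + max_tokens * PRICE_PER_OUTPUT_TOKEN)

hundred_words = " ".join(["word"] * 100)
print(word_count_estimate(hundred_words))  # 1_074_000 microcents, about $0.011
```

Note that the output allowance dominates the reservation here, which is why tightening `max_tokens` is usually the quickest way to hold less budget per call.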
For production use, consider using `tiktoken` for accurate input token counts:
```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")

def estimate_cost(prompt: str, max_tokens: int = 1024) -> int:
    input_tokens = len(enc.encode(prompt))
    return (
        input_tokens * PRICE_PER_INPUT_TOKEN
        + max_tokens * PRICE_PER_OUTPUT_TOKEN
    )
```

Handling budget exhaustion
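If `tiktoken` (or its downloaded encoding data) is unavailable at runtime, the estimator can fall back to the word-count heuristic rather than fail. A sketch; the 2-tokens-per-word factor mirrors the decorator's estimate lambda and is only a rough approximation:

```python
def count_input_tokens(prompt: str) -> int:
    """Exact count via tiktoken when possible, else a rough word-count guess."""
    try:
        import tiktoken
        enc = tiktoken.encoding_for_model("gpt-4o")
        return len(enc.encode(prompt))
    except Exception:
        # tiktoken missing or its encoding data unavailable:
        # fall back to ~2 tokens per whitespace-separated word.
        return len(prompt.split()) * 2
```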
When the budget is insufficient, the `@cycles` decorator raises `BudgetExceededError` without calling OpenAI:

```python
from runcycles import BudgetExceededError

try:
    result = chat_completion("Summarize this document...")
except BudgetExceededError:
    # Degrade gracefully
    result = {"content": "Service temporarily unavailable.", "usage": {}}
```

See Degradation Paths for patterns like queueing, model downgrade, and caching.
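Because denial surfaces as an ordinary exception, degradation logic composes with plain Python. A minimal sketch of a reusable fallback wrapper; the helper names and the stubbed exception are illustrative (in real code, import `BudgetExceededError` from runcycles as above):

```python
class BudgetExceededError(Exception):
    """Local stand-in so this sketch runs without runcycles installed."""

def with_fallback(primary, fallback):
    # Try the budgeted call first; degrade to the cheap path on denial.
    def run(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except BudgetExceededError:
            return fallback(*args, **kwargs)
    return run

def expensive_completion(prompt):
    # Pretend the budget authority denied this @cycles-wrapped call.
    raise BudgetExceededError()

def cached_completion(prompt):
    # Hypothetical cheap degradation path (cache, smaller model, queue, ...).
    return {"content": "cached summary", "usage": {}}

answer = with_fallback(expensive_completion, cached_completion)("Summarize...")
print(answer["content"])  # cached summary
```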
Respecting caps
When the decision is `ALLOW_WITH_CAPS`, the budget authority may limit token usage. Always check and respect caps inside your function:

```python
ctx = get_cycles_context()
if ctx and ctx.has_caps() and ctx.caps.max_tokens:
    max_tokens = min(max_tokens, ctx.caps.max_tokens)
```

This lets the budget authority throttle expensive requests without fully denying them.
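The clamp itself is just `min`. A tiny hypothetical helper (not part of runcycles) makes the two cases explicit:

```python
def apply_token_cap(requested: int, cap: "int | None") -> int:
    # Keep the caller's request unless the authority set a lower cap.
    return min(requested, cap) if cap is not None else requested

print(apply_token_cap(2048, 512))   # capped: 512
print(apply_token_cap(1024, None))  # no cap set: 1024
```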
Reporting metrics
Metrics attached to the context are included in the commit and become available for observability:
```python
ctx.metrics = CyclesMetrics(
    tokens_input=response.usage.prompt_tokens,
    tokens_output=response.usage.completion_tokens,
    latency_ms=elapsed_ms,
    model_version=response.model,
)
```

Key points
- Estimate before, commit after. The `estimate` function determines how much budget to reserve; the `actual` function computes the real cost from the response.
- Caps are advisory. The budget authority sets them; your code decides how to enforce them.
- Metrics are optional but valuable. They flow into Cycles for per-model, per-tenant cost visibility.
- The function never executes on `DENY`. OpenAI is never called if the budget is exhausted, saving both money and latency.
Full example
See `examples/openai_integration.py` for a complete, runnable script.
