Integrating Cycles with Flask
This guide shows how to add budget management to a Flask application using error handlers, per-tenant isolation, and preflight budget checks.
Prerequisites
pip install runcycles flask

export CYCLES_BASE_URL="http://localhost:7878"
export CYCLES_API_KEY="your-api-key" # create via Admin Server (see note below)
export CYCLES_TENANT="acme"

Need an API key? Create one via the Admin Server: see Deploy the Full Stack or API Key Management.
Client initialization
Create a Cycles client at app startup:
from flask import Flask
from runcycles import CyclesClient, CyclesConfig, set_default_client

app = Flask(__name__)
client = CyclesClient(CyclesConfig.from_env())
set_default_client(client)
app.config["CYCLES_CLIENT"] = client

Setting the default client means @cycles-decorated functions work without passing client= explicitly.
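To illustrate why a default client removes the need for client=, here is a minimal sketch of the general pattern. It is not the runcycles implementation: set_default, guarded, and FakeClient are hypothetical stand-ins invented for this example, and the real decorator may resolve its client differently.

```python
from functools import wraps
from typing import Optional

# Hypothetical stand-ins for illustration only; not the runcycles API.
_default_client: Optional["FakeClient"] = None


class FakeClient:
    """Records which guarded functions were called through it."""

    def __init__(self) -> None:
        self.calls = []


def set_default(client: FakeClient) -> None:
    """Register a process-wide fallback client."""
    global _default_client
    _default_client = client


def guarded(client: Optional[FakeClient] = None):
    """Decorator that prefers an explicit client and falls back to the default."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Resolve at call time, so the default may be set after decoration.
            active = client if client is not None else _default_client
            if active is None:
                raise RuntimeError("no client configured")
            active.calls.append(func.__name__)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@guarded()  # no client= argument needed
def summarize(text: str) -> str:
    return text.upper()


shared = FakeClient()
set_default(shared)
result = summarize("hello")
```

Because the fallback is looked up when the function runs, modules can define decorated helpers at import time, before the app has created its client.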
Error handlers
Convert Cycles exceptions into appropriate HTTP responses:
from flask import jsonify
from runcycles import BudgetExceededError, CyclesProtocolError

@app.errorhandler(BudgetExceededError)
def handle_budget_exceeded(exc):
    return jsonify({
        "error": "budget_exceeded",
        "message": "Insufficient budget for this request.",
        "retry_after_ms": exc.retry_after_ms,
    }), 402

@app.errorhandler(CyclesProtocolError)
def handle_protocol_error(exc):
    status = 429 if exc.is_retryable() else 503
    return jsonify({
        "error": str(exc.error_code),
        "message": str(exc),
        "retry_after_ms": exc.retry_after_ms,
    }), status

Budget-guarded routes
Use the @cycles decorator on route handler functions or helper functions:
from flask import request, jsonify
from runcycles import cycles, get_cycles_context, CyclesMetrics

PRICE_PER_INPUT_TOKEN = 250
PRICE_PER_OUTPUT_TOKEN = 1_000

@cycles(
    estimate=lambda prompt, **kw: len(prompt.split()) * 2 * PRICE_PER_INPUT_TOKEN
        + kw.get("max_tokens", 1024) * PRICE_PER_OUTPUT_TOKEN,
    actual=lambda result: result["cost"],
    action_kind="llm.completion",
    action_name="gpt-4o",
    unit="USD_MICROCENTS",
)
def guarded_llm_call(prompt: str, max_tokens: int = 1024) -> dict:
    ctx = get_cycles_context()
    if ctx and ctx.has_caps() and ctx.caps.max_tokens:
        max_tokens = min(max_tokens, ctx.caps.max_tokens)
    # Your LLM call here
    response = call_llm(prompt, max_tokens=max_tokens)
    if ctx:
        ctx.metrics = CyclesMetrics(
            tokens_input=response["usage"]["input_tokens"],
            tokens_output=response["usage"]["output_tokens"],
        )
    return {
        "content": response["content"],
        "cost": (response["usage"]["input_tokens"] * PRICE_PER_INPUT_TOKEN
                 + response["usage"]["output_tokens"] * PRICE_PER_OUTPUT_TOKEN),
    }

@app.route("/chat", methods=["POST"])
def chat():
    body = request.get_json()
    result = guarded_llm_call(body["prompt"])
    return jsonify({"response": result["content"]})

Preflight budget check
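Both the @cycles estimate in the previous section and the preflight reservation in this one are expressed in USD_MICROCENTS, so it helps to sanity-check the magnitudes. The sketch below reuses the guide's per-token prices and its two-tokens-per-word heuristic. The conversion factor of 100,000,000 microcents per dollar is an assumption (one microcent taken as a millionth of a cent), and estimate_microcents and to_usd are hypothetical helper names.

```python
PRICE_PER_INPUT_TOKEN = 250      # microcents per input token (from the guide)
PRICE_PER_OUTPUT_TOKEN = 1_000   # microcents per output token (from the guide)
MICROCENTS_PER_USD = 100_000_000  # assumption: 1 microcent = 1e-6 cent


def estimate_microcents(prompt: str, max_tokens: int = 1024) -> int:
    """Same arithmetic as the estimate lambda: ~2 tokens per word, plus max output."""
    input_tokens = len(prompt.split()) * 2
    return input_tokens * PRICE_PER_INPUT_TOKEN + max_tokens * PRICE_PER_OUTPUT_TOKEN


def to_usd(microcents: int) -> float:
    """Convert microcents to dollars under the assumed conversion factor."""
    return microcents / MICROCENTS_PER_USD


prompt = "Summarize the quarterly report in three bullet points"  # 8 words
estimate = estimate_microcents(prompt)

print(estimate)            # 1028000 (16 * 250 + 1024 * 1000)
print(to_usd(estimate))    # 0.01028
print(to_usd(1_000_000))   # 0.01
```

Under that assumption, an amount of 1_000_000 is about one cent, which is roughly the worst-case cost of a single default-sized completion at these prices.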
Use client.decide() with a before_request hook to check budget before processing expensive requests:
import uuid
from flask import request, jsonify, g
from runcycles import DecisionRequest, Subject, Action, Amount, Unit

BUDGET_GUARDED_PATHS = {"/chat", "/summarize"}

@app.before_request
def budget_preflight():
    if request.path not in BUDGET_GUARDED_PATHS:
        return None
    tenant = request.headers.get("X-Tenant-ID", "acme")
    g.tenant = tenant
    response = client.decide(DecisionRequest(
        idempotency_key=str(uuid.uuid4()),
        subject=Subject(tenant=tenant, app="my-flask-api"),
        action=Action(kind="api.request", name=request.path),
        estimate=Amount(unit=Unit.USD_MICROCENTS, amount=1_000_000),
    ))
    if response.is_success:
        decision = response.get_body_attribute("decision")
        if decision == "DENY":
            return jsonify({"error": "budget_exceeded"}), 402
    return None

Per-tenant isolation
Extract the tenant from request headers and scope budgets per tenant:
from flask import request, g

@app.before_request
def extract_tenant():
    g.tenant = request.headers.get("X-Tenant-ID", "acme")

@app.route("/chat", methods=["POST"])
def chat():
    body = request.get_json()
    # Note: guarded_llm_call as defined earlier takes only prompt and max_tokens.
    # Passing tenant= assumes the helper (or the @cycles decorator) handles that keyword.
    result = guarded_llm_call(body["prompt"], tenant=g.tenant)
    return jsonify({"response": result["content"]})

Budget dashboard endpoint
Expose per-tenant budget information:
@app.route("/budget/<tenant_id>")
def get_budget(tenant_id):
    response = client.get_balances(tenant=tenant_id)
    if not response.is_success:
        return jsonify({"error": response.error_message}), 500
    return jsonify(response.body)

Key points
- Use CyclesClient (sync) in Flask: Flask views are synchronous.
- Initialize at app startup: create the client once, store in app.config.
- Map HTTP errors: BudgetExceededError → 402, retryable errors → 429.
- Preflight with before_request: lightweight budget check before expensive work.
- Isolate tenants: use g.tenant from request headers.
- Set a default client: avoids passing client= to every @cycles decorator.
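The error-mapping point can be condensed into one pure function, which is easy to unit-test without a running server. This is a sketch under stated assumptions: FakeProtocolError and FakeBudgetExceededError are hypothetical stand-ins for the runcycles exceptions, the subclass relationship between them is assumed, and the 500 fallback for unrelated errors is an illustrative choice rather than something this guide prescribes.

```python
class FakeProtocolError(Exception):
    """Hypothetical stand-in for a Cycles protocol error."""

    def __init__(self, message: str, retryable: bool = False) -> None:
        super().__init__(message)
        self._retryable = retryable

    def is_retryable(self) -> bool:
        return self._retryable


class FakeBudgetExceededError(FakeProtocolError):
    """Hypothetical stand-in; assumed here to be a more specific protocol error."""


def http_status_for(exc: Exception) -> int:
    """Mirror the handlers in this guide: 402, then 429 or 503, else 500."""
    # Check the most specific type first, as Flask does when picking a handler.
    if isinstance(exc, FakeBudgetExceededError):
        return 402
    if isinstance(exc, FakeProtocolError):
        return 429 if exc.is_retryable() else 503
    return 500  # assumption: anything else is an ordinary server error


statuses = [
    http_status_for(FakeBudgetExceededError("out of budget")),
    http_status_for(FakeProtocolError("server busy", retryable=True)),
    http_status_for(FakeProtocolError("bad gateway")),
    http_status_for(ValueError("unrelated")),
]
print(statuses)  # [402, 429, 503, 500]
```

Keeping the mapping in one place makes it harder for the error handlers and the preflight hook to drift apart on which status code a denial returns.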
Full example
See examples/flask_integration.py for a complete, runnable server.
Next steps
- Integrating with Django — Django web framework integration
- Integrating with FastAPI — async Python web framework integration
- Error Handling Patterns in Python — handling budget errors in Python
- Testing with Cycles — testing budget-guarded code
- Production Operations Guide — running Cycles in production