Adding Cycles to an Existing Application

This guide covers how to incrementally add budget governance to an application that already makes LLM or API calls. Rather than rewriting your integration layer, you can adopt Cycles in stages.

MCP-compatible agents

If your agent runs in Claude Desktop, Claude Code, Cursor, or Windsurf, the fastest path is the Cycles MCP Server — zero code changes needed. The guide below covers SDK-based integration for application code.

Spring Boot / Java

This guide shows Python and TypeScript. For Spring Boot, equivalent patterns use the @Cycles annotation — see the Spring Boot Quickstart for full setup and examples.

The incremental adoption path

1. Shadow mode     →  Observe what enforcement would do, without blocking anything
2. Wrap one call   →  Add budget governance to a single LLM call path
3. Expand coverage →  Wrap additional call paths
4. Enforce         →  Switch from shadow mode to live enforcement

Stage 1: Deploy Cycles and observe with the decide endpoint

Start by deploying the Cycles stack and using the decide endpoint as a side-channel to observe what budget decisions would be made without blocking any calls.

Important

The dry_run flag on the @cycles decorator / withCycles HOF skips executing the wrapped function entirely (it returns a DryRunResult instead). For shadow observation where your existing code still runs, use the decide endpoint separately.

PythonTypeScript

python

from runcycles import CyclesClient, CyclesConfig

client = CyclesClient(CyclesConfig.from_env())

def existing_chat_function(prompt: str) -> str:
    # Observe: check what Cycles would decide, but don't block
    try:
        decision = client.decide({
            "idempotency_key": str(uuid.uuid4()),
            "subject": {"tenant": client._config.tenant},
            "action": {"kind": "llm.completion", "name": "openai:gpt-4o"},
            "estimate": {"amount": 2000000, "unit": "USD_MICROCENTS"},
        })
        logger.info("Cycles decision: %s", decision.body.get("decision"))
    except Exception:
        pass  # Don't let observation failures affect production

    # Your existing code — completely unchanged
    return call_openai(prompt)

typescript

import { CyclesClient, CyclesConfig } from "runcycles";

const client = new CyclesClient(CyclesConfig.fromEnv());

async function existingChatFunction(prompt: string) {
  // Observe: check what Cycles would decide, but don't block
  try {
    const decision = await client.decide({
      idempotencyKey: crypto.randomUUID(),
      subject: { tenant: client.config.tenant! },
      action: { kind: "llm.completion", name: "openai:gpt-4o" },
      estimate: { amount: 2000000, unit: "USD_MICROCENTS" },
    });
    console.log("Cycles decision:", decision.body.decision);
  } catch {
    // Don't let observation failures affect production
  }

  // Your existing code — completely unchanged
  return await callOpenAI(prompt);
}

This approach lets you observe decisions in production logs while your existing code continues to run unmodified. Use the data to tune budgets before moving to enforcement.

See Shadow Mode Rollout for the full guide on dry-run evaluation at the protocol level.

Stage 2: Wrap your first call

Pick the highest-value call path to wrap first. Good candidates:

The call that costs the most per invocation (e.g., GPT-4o or Claude Opus)
The call that runs most frequently
The call most likely to loop or retry

Wrapping an existing function

Before:

PythonTypeScript

python

def generate_summary(document: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize: {document}"}],
        max_tokens=2000,
    )
    return response.choices[0].message.content

typescript

async function generateSummary(document: string): Promise<string> {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: `Summarize: ${document}` }],
    max_tokens: 2000,
  });
  return response.choices[0].message.content!;
}

After:

PythonTypeScriptSpring Boot

python

from runcycles import cycles

@cycles(
    estimate=lambda document: int(len(document) / 4 * 250 + 2000 * 1000) * 1.2,
    action_kind="llm.completion",
    action_name="openai:gpt-4o",
)
def generate_summary(document: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize: {document}"}],
        max_tokens=2000,
    )
    return response.choices[0].message.content

typescript

import { withCycles } from "runcycles";

const generateSummary = withCycles(
  {
    estimate: (document: string) => Math.ceil(document.length / 4 * 250 + 2000 * 1000) * 1.2,
    actionKind: "llm.completion",
    actionName: "openai:gpt-4o",
  },
  async (document: string) => {
    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: `Summarize: ${document}` }],
      max_tokens: 2000,
    });
    return response.choices[0].message.content!;
  },
);

java

import io.runcycles.client.java.spring.annotation.Cycles;

@Cycles(value = "#{ T(Math).ceil(#document.length() / 4 * 250 + 2000 * 1000) * 1.2 }",
        actionKind = "llm.completion", actionName = "openai:gpt-4o")
public String generateSummary(String document) {
    // Same OpenAI call — business logic unchanged
    return openAiClient.chat(document);
}

The only change is adding the @cycles decorator (Python), withCycles wrapper (TypeScript), or @Cycles annotation (Spring Boot). Your business logic stays exactly the same.

Handling budget denial

Your existing error handling needs one new branch — what to do when budget is denied:

PythonTypeScriptSpring Boot

python

from runcycles import BudgetExceededError

try:
    result = generate_summary(document)
except BudgetExceededError:
    # Option A: Return a graceful fallback
    result = "Summary unavailable — budget limit reached."
    # Option B: Use a cheaper model
    result = generate_summary_cheap(document)
    # Option C: Queue for later
    queue_for_retry(document)

typescript

import { BudgetExceededError } from "runcycles";

try {
  const result = await generateSummary(document);
} catch (err) {
  if (err instanceof BudgetExceededError) {
    // Option A: Return a graceful fallback
    result = "Summary unavailable — budget limit reached.";
    // Option B: Use a cheaper model
    result = await generateSummaryCheap(document);
    // Option C: Queue for later
    queueForRetry(document);
  }
}

java

import io.runcycles.client.java.spring.model.CyclesProtocolException;

try {
    String result = summaryService.generateSummary(document);
} catch (CyclesProtocolException e) {
    if (e.isBudgetExceeded()) {
        // Option A: Return a graceful fallback
        result = "Summary unavailable — budget limit reached.";
        // Option B: Use a cheaper model
        result = summaryService.generateSummaryCheap(document);
        // Option C: Queue for later
        queueForRetry(document);
    }
}

See Degradation Paths for a full treatment of fallback strategies.

Stage 3: Expand coverage

Once the first call path is working, wrap additional calls. Use a consistent pattern:

PythonTypeScript

python

@cycles(estimate=500000, action_kind="llm.completion", action_name="openai:gpt-4o-mini")
def classify_intent(text: str) -> str:
    ...

@cycles(estimate=3000000, action_kind="llm.completion", action_name="openai:gpt-4o")
def generate_response(context: str, intent: str) -> str:
    ...

@cycles(estimate=100000, action_kind="tool.call", action_name="web-search")
def search_web(query: str) -> list:
    ...

typescript

const classifyIntent = withCycles(
  { estimate: 500000, actionKind: "llm.completion", actionName: "openai:gpt-4o-mini" },
  async (text: string) => { ... },
);

const generateResponse = withCycles(
  { estimate: 3000000, actionKind: "llm.completion", actionName: "openai:gpt-4o" },
  async (context: string, intent: string) => { ... },
);

const searchWeb = withCycles(
  { estimate: 100000, actionKind: "tool.call", actionName: "web-search" },
  async (query: string) => { ... },
);

Each wrapped function reserves independently. If the agent calls all three in sequence, the total budget consumed is the sum of actual usage — and each call is individually authorized before it runs.

Stage 4: Switch to live enforcement

Once you're confident in your budget allocations (from shadow mode data), remove dry_run=True:

PythonTypeScriptSpring Boot

python

# Remove dry_run to enable enforcement
@cycles(
    estimate=2000000,
    action_kind="llm.completion",
    action_name="openai:gpt-4o",
    # dry_run=True,  ← remove this line
)
def generate_summary(document: str) -> str:
    ...

typescript

// Remove dryRun to enable enforcement
const generateSummary = withCycles(
  {
    estimate: 2000000,
    actionKind: "llm.completion",
    actionName: "openai:gpt-4o",
    // dryRun: true,  ← remove this line
  },
  async (document: string) => { ... },
);

java

// Remove dryRun to enable enforcement
@Cycles(value = "2000000",
        actionKind = "llm.completion", actionName = "openai:gpt-4o"
        // dryRun = true  ← remove this line
)
public String generateSummary(String document) { ... }

Tips for existing applications

Keep your existing error handling

Don't replace your existing try/except or try/catch blocks. Add Cycles error handling alongside them:

python

try:
    result = generate_summary(document)
except BudgetExceededError:
    result = "Budget limit reached."
except openai.APIError as e:
    # Your existing error handling stays
    result = handle_openai_error(e)

Start with generous budgets

When first deploying, set budgets higher than you think you need. You can tighten them after collecting real usage data. Under-budgeting on day one creates unnecessary friction.

Use scopes to separate environments

Use different workspace scopes for dev, staging, and production:

python

@cycles(
    estimate=2000000,
    action_kind="llm.completion",
    action_name="openai:gpt-4o",
    workspace=os.environ.get("ENVIRONMENT", "dev"),  # dev, staging, prod
)
def generate_summary(document: str) -> str:
    ...

Don't wrap everything at once

It's better to have 3 well-instrumented call paths than 30 poorly-estimated ones. Start with the calls that matter most.

Next steps

Shadow Mode Rollout — full guide to safe rollout
Cost Estimation Cheat Sheet — how much to reserve per model
Choosing the Right Integration Pattern — decorator vs programmatic vs middleware
Integrations Overview — all supported frameworks and providers
Degradation Paths — what to do when budget is denied

Adding Cycles to an Existing Application ​

The incremental adoption path ​

Stage 1: Deploy Cycles and observe with the decide endpoint ​

Stage 2: Wrap your first call ​

Wrapping an existing function ​

Handling budget denial ​

Stage 3: Expand coverage ​

Stage 4: Switch to live enforcement ​

Tips for existing applications ​

Keep your existing error handling ​

Start with generous budgets ​

Use scopes to separate environments ​

Don't wrap everything at once ​

Next steps ​

Adding Cycles to an Existing Application

The incremental adoption path

Stage 1: Deploy Cycles and observe with the decide endpoint

Stage 2: Wrap your first call

Wrapping an existing function

Handling budget denial

Stage 3: Expand coverage

Stage 4: Switch to live enforcement

Tips for existing applications

Keep your existing error handling

Start with generous budgets

Use scopes to separate environments

Don't wrap everything at once

Next steps