Integrating Cycles with Spring AI
This guide shows how to guard Spring AI chat completions and tool calls with Cycles budget reservations so that every LLM interaction is cost-controlled, caps-aware, and observable.
For strategic guidance on where to integrate, see Budget Limits with Spring AI.
Prerequisites
Add the Cycles Spring Boot Starter to your project:
<dependency>
<groupId>io.runcycles</groupId>
<artifactId>cycles-client-java-spring</artifactId>
<version>0.2.0</version>
</dependency>
Or, with Gradle:
implementation 'io.runcycles:cycles-client-java-spring:0.2.0'
Configure the connection in application.yml:
cycles:
base-url: http://localhost:7878
api-key: ${CYCLES_API_KEY}
tenant: acme
app: my-spring-ai-app
Need an API key? Create one via the Admin Server — see Deploy the Full Stack or API Key Management.
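If you prefer flat properties files, Spring Boot's relaxed binding maps the same keys from application.properties (equivalent to the YAML above):

```properties
cycles.base-url=http://localhost:7878
cycles.api-key=${CYCLES_API_KEY}
cycles.tenant=acme
cycles.app=my-spring-ai-app
```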
60-Second Quick Start
import io.runcycles.client.java.spring.annotation.Cycles;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
@Service
public class ChatService {
private final ChatClient chatClient;
public ChatService(ChatClient.Builder builder) {
this.chatClient = builder.build();
}
// GPT-4o: ~$2.50/1M input tokens ≈ 250 microcents/token
@Cycles(value = "#maxTokens * 250",
actionKind = "llm.completion",
actionName = "gpt-4o")
public String chat(String prompt, int maxTokens) {
return chatClient.prompt(prompt)
.call()
.content();
}
}
That's it. Every call to chat() is now budget-guarded: Cycles reserves the estimated cost before execution, commits actual usage after, and throws CyclesProtocolException if the budget is exceeded.
Dynamic cost estimation with Spring AI
Use SpEL expressions to estimate cost from method parameters. The value (or estimate) attribute is evaluated before the method runs:
// Estimate based on max tokens × price per token (in USD_MICROCENTS)
// GPT-4o: ~$2.50/1M input tokens = 250 microcents/token
@Cycles(value = "#maxTokens * 250",
actionKind = "llm.completion",
actionName = "gpt-4o")
public String generate(String prompt, int maxTokens) {
return chatClient.prompt(prompt)
.call()
.content();
}
// Estimate from prompt length (rough token approximation: ~4 chars per token)
@Cycles(value = "#prompt.length() / 4 * 250",
actionKind = "llm.completion",
actionName = "gpt-4o")
public String summarize(String prompt) {
return chatClient.prompt(prompt)
.call()
.content();
}
See SpEL Expression Reference for all available expressions.
Reporting actual usage
The actual attribute is evaluated after the method returns, using #result to reference the return value. This lets Cycles commit the real cost instead of the estimate:
@Cycles(value = "#maxTokens * 250",
actual = "#result.length() / 4 * 250",
actionKind = "llm.completion",
actionName = "gpt-4o")
public String generate(String prompt, int maxTokens) {
return chatClient.prompt(prompt)
.call()
.content();
}
For precise token counts, access the ChatResponse metadata and report via CyclesMetrics:
import io.runcycles.client.java.spring.annotation.Cycles;
import io.runcycles.client.java.spring.context.CyclesContextHolder;
import io.runcycles.client.java.spring.context.CyclesReservationContext;
import io.runcycles.client.java.spring.model.CyclesMetrics;
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.ai.chat.model.ChatResponse;
@Cycles(value = "#maxTokens * 250",
actionKind = "llm.completion",
actionName = "gpt-4o")
public String generateWithMetrics(String prompt, int maxTokens) {
long start = System.currentTimeMillis();
ChatResponse response = chatClient.prompt(prompt)
.call()
.chatResponse();
String content = response.getResult().getOutput().getText();
// Report exact token usage via the reservation context
CyclesReservationContext ctx = CyclesContextHolder.get();
if (ctx != null) {
Usage usage = response.getMetadata().getUsage();
CyclesMetrics metrics = new CyclesMetrics();
metrics.setTokensInput((int) usage.getPromptTokens());
metrics.setTokensOutput((int) usage.getCompletionTokens());
metrics.setLatencyMs((int) (System.currentTimeMillis() - start));
metrics.setModelVersion("gpt-4o-2024-08-06");
ctx.setMetrics(metrics);
}
return content;
}
The actual SpEL attribute on @Cycles handles cost calculation. Use CyclesMetrics for observability data (token counts, latency, model version) that is attached to the commit for reporting.
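Once exact token counts are in hand, the commit amount is just tokens times rate per direction. A hedged sketch of that calculation (the per-token rates are assumptions about GPT-4o pricing, not values supplied by Cycles or Spring AI):

```java
// Sketch: exact cost from provider-reported usage, in USD_MICROCENTS.
// Rates are assumptions (~$2.50/1M input, ~$10.00/1M output).
public class ActualCost {
    static final long INPUT_RATE_MICROCENTS = 250L;    // per input token
    static final long OUTPUT_RATE_MICROCENTS = 1000L;  // per output token

    // Amount to commit against the reservation
    static long commitAmount(long promptTokens, long completionTokens) {
        return promptTokens * INPUT_RATE_MICROCENTS
             + completionTokens * OUTPUT_RATE_MICROCENTS;
    }

    public static void main(String[] args) {
        // 1,200 prompt tokens + 300 completion tokens
        System.out.println(commitAmount(1200, 300));
    }
}
```

Pricing output tokens separately matters: at typical rates they cost several times more than input tokens, so an input-only estimate systematically under-reserves.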
Respecting budget caps in Spring AI
When budget is running low, Cycles may return ALLOW_WITH_CAPS instead of a flat ALLOW. Caps tell you how to constrain the operation — for example, reducing max tokens to conserve budget. Read them from the reservation context:
@Cycles(value = "#maxTokens * 250",
actionKind = "llm.completion",
actionName = "gpt-4o")
public String capsAwareChat(String prompt, int maxTokens) {
CyclesReservationContext ctx = CyclesContextHolder.get();
// Respect token cap from budget authority
int effectiveMaxTokens = maxTokens;
if (ctx != null && ctx.hasCaps() && ctx.getCaps().getMaxTokens() != null) {
effectiveMaxTokens = Math.min(maxTokens, ctx.getCaps().getMaxTokens());
}
return chatClient.prompt(prompt)
.options(ChatOptions.builder()
.maxTokens(effectiveMaxTokens)
.build())
.call()
.content();
}
Error handling
Catch CyclesProtocolException to degrade gracefully when budget is exceeded. This should be part of your service layer from the start:
import io.runcycles.client.java.spring.model.CyclesProtocolException;
@Service
public class ResilientChatService {
private final GuardedLlmService premiumLlm;
private final GuardedLlmService budgetLlm;
public String chat(String prompt) {
try {
return premiumLlm.generate(prompt, 4096); // GPT-4o
} catch (CyclesProtocolException e) {
if (e.isBudgetExceeded()) {
return budgetLlm.generate(prompt, 1024); // GPT-4o-mini fallback
}
if (e.getRetryAfterMs() != null) {
scheduleRetry(prompt, e.getRetryAfterMs());
return "Request queued. Retrying shortly.";
}
throw e;
}
}
}
GuardedLlmService is a separate @Service bean whose methods are annotated with @Cycles. This is needed because Spring AOP proxies only intercept calls from outside the bean — see Self-invocation workaround below.
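The scheduleRetry call above is left to the application. A minimal sketch using the JDK scheduler to honor the server-suggested delay (assumes a single in-memory retry; a real system would persist the queued work):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: re-run work after the retry-after delay reported by the budget server.
public class RetryScheduler {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Schedule one retry of the work after retryAfterMs milliseconds
    public void scheduleRetry(Runnable work, long retryAfterMs) {
        scheduler.schedule(work, retryAfterMs, TimeUnit.MILLISECONDS);
    }

    // Release the scheduler thread on shutdown
    public void shutdown() {
        scheduler.shutdown();
    }
}
```

In a Spring application you would more likely inject a TaskScheduler bean, but the contract is the same: delay the retry by at least the server-suggested interval rather than retrying immediately.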
For global exception handling in a REST API:
@ControllerAdvice
public class CyclesExceptionHandler {
@ExceptionHandler(CyclesProtocolException.class)
public ResponseEntity<Map<String, Object>> handleBudgetError(CyclesProtocolException e) {
if (e.isBudgetExceeded()) {
return ResponseEntity.status(429)
.header("Retry-After", String.valueOf(
e.getRetryAfterMs() != null ? e.getRetryAfterMs() / 1000 : 60))
.body(Map.of("error", "budget_exceeded", "message", "Budget limit reached."));
}
return ResponseEntity.status(503)
.body(Map.of("error", e.getReasonCode(), "message", e.getMessage()));
}
}
Guarding Spring AI tool calls
For Spring AI function callbacks, wrap the tool execution with @Cycles on a separate service bean:
@Service
public class GuardedToolService {
@Cycles(value = "500000", // $0.005 per tool call
actionKind = "tool.search",
actionName = "web-search",
toolset = "search-tools")
public String webSearch(String query) {
return searchApi.search(query);
}
@Cycles(value = "100000", // $0.001 per DB query
actionKind = "tool.database",
actionName = "sql-query",
toolset = "data-tools")
public String queryDatabase(String sql) {
return jdbcTemplate.queryForList(sql).toString();
}
}
Then register these as Spring AI function callbacks:
@Configuration
public class ToolConfig {
@Bean
public FunctionCallback webSearchTool(GuardedToolService tools) {
return FunctionCallback.builder()
.function("web_search", (String query) -> tools.webSearch(query))
.description("Search the web")
.inputType(String.class)
.build();
}
}
The toolset attribute scopes budget per tool category, so you can set different budgets for search tools vs. database tools via the Admin API.
Spring AI streaming with budget control
For streaming, use the programmatic CyclesClient instead of the annotation, since the stream needs to commit after all chunks arrive:
import io.runcycles.client.java.spring.client.CyclesClient;
import io.runcycles.client.java.spring.model.*;
import reactor.core.publisher.Flux;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;
@Service
public class StreamingChatService {
private final ChatClient chatClient;
private final CyclesClient cyclesClient;
public StreamingChatService(ChatClient.Builder builder, CyclesClient cyclesClient) {
this.chatClient = builder.build();
this.cyclesClient = cyclesClient;
}
public Flux<String> streamChat(String prompt, int maxTokens) {
// Reserve budget before streaming
Map<String, Object> body = Map.of(
"idempotency_key", UUID.randomUUID().toString(),
"subject", Map.of("tenant", "acme"),
"action", Map.of("kind", "llm.completion", "name", "gpt-4o"),
"estimate", Map.of("unit", "USD_MICROCENTS", "amount", maxTokens * 250L),
"ttl_ms", 120000
);
var response = cyclesClient.createReservation(body);
String reservationId = (String) response.get("reservation_id");
String decision = (String) response.get("decision");
if (!"ALLOW".equals(decision) && !"ALLOW_WITH_CAPS".equals(decision)) {
throw new CyclesProtocolException("Budget denied: " + decision);
}
AtomicInteger tokenCount = new AtomicInteger();
return chatClient.prompt(prompt)
.stream()
.content()
.doOnNext(chunk -> tokenCount.addAndGet(chunk.length() / 4))
.doOnComplete(() -> {
cyclesClient.commitReservation(reservationId, Map.of(
"idempotency_key", UUID.randomUUID().toString(),
"actual", Map.of("unit", "USD_MICROCENTS",
"amount", tokenCount.get() * 250L)
));
})
.doOnError(err -> {
cyclesClient.releaseReservation(reservationId, Map.of(
"idempotency_key", UUID.randomUUID().toString(),
"reason", "stream_error: " + err.getMessage()
));
});
}
}
Agent loop budget control
For multi-step agent workflows, guard each iteration. Each call gets its own reservation, so Cycles can deny mid-workflow when budget runs out:
@Service
public class AgentService {
private final GuardedLlmService llm;
public String runAgent(String task, int maxIterations) {
String context = task;
for (int i = 0; i < maxIterations; i++) {
try {
String response = llm.generate(context, 2048);
if (isComplete(response)) {
return response;
}
context = response;
} catch (CyclesProtocolException e) {
if (e.isBudgetExceeded()) {
return "Agent stopped: budget exhausted after " + i + " iterations.";
}
throw e;
}
}
return "Agent reached max iterations.";
}
}
Production patterns
Dry-run rollout
Start in shadow mode to measure budget impact before enforcing:
@Cycles(value = "#maxTokens * 250",
actionKind = "llm.completion",
actionName = "gpt-4o",
dryRun = true)
public String shadowChat(String prompt, int maxTokens) {
return chatClient.prompt(prompt).call().content();
}
WARNING
When dryRun = true, the guarded method does not execute. The annotation evaluates the reservation against the budget but skips method execution and returns a framework result object. Use dry-run mode to measure what the budget impact would be, not to serve production traffic.
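Shadow mode tells you per-call reservation amounts; projecting those into a budget is simple arithmetic. A standalone sketch of that projection (the per-token rate, token budget, and call volume are all made-up numbers for illustration):

```java
// Sketch: project daily spend from dry-run observations. All inputs assumed.
public class SpendProjection {
    // 1 dollar = 100 cents = 100,000,000 USD_MICROCENTS
    static double toDollars(long microcents) {
        return microcents / 100_000_000.0;
    }

    public static void main(String[] args) {
        long perCallEstimate = 2048L * 250;  // 2048 tokens at 250 microcents/token
        long callsPerDay = 50_000;
        long dailyMicrocents = perCallEstimate * callsPerDay;
        System.out.println(toDollars(dailyMicrocents)); // projected dollars/day
    }
}
```

Running this kind of projection against shadow-mode numbers is a quick sanity check that the budget you configure in the Admin API matches real traffic.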
Multi-tenant via SpEL
Resolve tenant from the method parameters:
@Cycles(value = "#maxTokens * 250",
tenant = "#tenantId",
actionKind = "llm.completion",
actionName = "gpt-4o")
public String tenantChat(String tenantId, String prompt, int maxTokens) {
return chatClient.prompt(prompt).call().content();
}
Self-invocation workaround
Spring AOP proxies do not intercept self-calls within the same bean. If you call an @Cycles method from another method in the same class, the annotation is bypassed. Use a separate service bean:
// This bean's @Cycles annotations ARE intercepted by the proxy
@Service
public class GuardedLlmService {
private final ChatClient chatClient;
public GuardedLlmService(ChatClient.Builder builder) {
this.chatClient = builder.build();
}
@Cycles(value = "#maxTokens * 250", actionKind = "llm.completion", actionName = "gpt-4o")
public String generate(String prompt, int maxTokens) {
return chatClient.prompt(prompt).call().content();
}
}
// This bean calls the guarded bean — proxy intercepts correctly
@Service
public class AgentOrchestrator {
@Autowired private GuardedLlmService llm;
public String orchestrate(String task) {
return llm.generate(task, 2048); // @Cycles is applied
}
}
Key points
- `@Cycles` works with any Spring AI `ChatClient` or `ChatModel` call — no adapter needed
- Use `value` (SpEL) to estimate cost before execution, `actual` to commit real cost after
- `CyclesContextHolder.get()` provides reservation context inside the guarded method — use it for caps and metrics
- Guard tool calls with `@Cycles` on a separate `@Service` bean, scoped with `toolset`
- For streaming, use the programmatic `CyclesClient` instead of the annotation
- Catch `CyclesProtocolException` to degrade to a cheaper model or queue for retry
- Start with `dryRun = true` for shadow-mode rollouts before enforcing
Next steps
- Spring Boot Starter Quickstart — demo app, annotation reference, full walkthrough
- Spring Client Configuration — all `cycles.*` properties
- SpEL Expression Reference — estimate and actual expressions
- Choosing the Right Overage Policy — REJECT vs ALLOW_IF_AVAILABLE vs ALLOW_WITH_OVERDRAFT
- Budget Limits with Spring AI — strategic guidance