Client Performance Tuning
The default client configuration works well for most workloads. This guide covers when and how to tune for high-throughput or latency-sensitive scenarios.
When to tune
Consider tuning when you observe any of these:
- Concurrent reservation count regularly exceeds 100
- Client-side p99 latency exceeds 50ms
- Commit retry rate exceeds 5%
- Connection timeout errors in client logs
For server-side benchmarks and baseline expectations, see the Performance Benchmarks blog post.
Timeout tuning
All three clients configure two timeouts: connection (TCP handshake) and read (waiting for server response).
Default values
| Setting | Python | TypeScript | Spring Boot |
|---|---|---|---|
| Connect timeout | connect_timeout=2.0 (seconds) | connectTimeout=2000 (ms) | cycles.http.connect-timeout=2s |
| Read timeout | read_timeout=5.0 (seconds) | readTimeout=5000 (ms) | cycles.http.read-timeout=5s |
TypeScript timeout behavior
Node's built-in fetch does not distinguish connection from read timeout. connectTimeout and readTimeout are summed into a single AbortSignal.timeout() value (default: 7000ms total).
Tuning profiles
Choose a profile based on your deployment topology:
Co-located (client and server on same network):
# Python
config = CyclesConfig(
base_url="http://cycles-server:7878",
api_key="cyc_live_...",
connect_timeout=0.5, # 500ms — same-network handshake is fast
read_timeout=2.0, # 2s — server p99 is <5ms, leave headroom
)// TypeScript
const config = new CyclesConfig({
baseUrl: "http://cycles-server:7878",
apiKey: "cyc_live_...",
connectTimeout: 500,
readTimeout: 2000,
});# Spring Boot
cycles:
http:
connect-timeout: 500ms
read-timeout: 2sCross-region or high-latency network:
# Python
config = CyclesConfig(
base_url="https://cycles.us-east.example.com",
api_key="cyc_live_...",
connect_timeout=5.0, # TLS handshake across regions
read_timeout=15.0, # account for network jitter
)# Spring Boot
cycles:
http:
connect-timeout: 5s
read-timeout: 15sHigh-throughput with aggressive retry:
# Python — fail fast, retry quickly
config = CyclesConfig(
base_url="http://cycles-server:7878",
api_key="cyc_live_...",
connect_timeout=1.0,
read_timeout=3.0,
retry_max_attempts=10,
retry_initial_delay=0.1,
retry_multiplier=1.5,
retry_max_delay=5.0,
)WARNING
Never set the read timeout below the server's expected p99 latency. At default load, the server's p99 for reserve+commit is ~20ms. Under heavy load it can reach 50ms+. A read timeout of 100ms will cause spurious failures.
Connection pooling
Python (httpx)
The Python client uses httpx.Client, which manages a connection pool automatically. Key details:
- Connections are reused across requests (HTTP keep-alive)
- Pool timeout is hardcoded at 5 seconds (time to acquire a connection from the pool)
- Write timeout is hardcoded at 5 seconds
- httpx uses a default pool of 100 connections (10 per host)
For most workloads, the defaults are sufficient. If you see PoolTimeout errors, you likely have too many concurrent requests for a single client instance. Solutions:
Increase concurrency limit — create the client with a custom transport:
pythonimport httpx from runcycles import CyclesConfig config = CyclesConfig(base_url="...", api_key="...") # Custom transport with larger pool transport = httpx.HTTPTransport( limits=httpx.Limits(max_connections=200, max_keepalive_connections=50) )Use multiple client instances — partition by tenant or workload if a single pool is a bottleneck.
TypeScript (fetch)
The TypeScript client uses Node's built-in fetch, which relies on the runtime's HTTP agent for connection reuse:
- Node.js 20+ reuses connections automatically via its global
undiciagent - No explicit pool configuration is exposed
- Keep-alive is enabled by default
For high-throughput Node.js services, ensure you're running Node 20+ where fetch connection reuse is reliable.
Spring Boot (Reactor Netty)
The Spring Boot starter uses WebClient backed by Reactor Netty's HttpClient. Reactor Netty manages its own connection pool:
- Default pool: shared global pool (500 max connections, 45s idle timeout)
- Configure via Reactor Netty system properties or provide a custom
WebClientbean
To customize the connection pool:
@Bean
public WebClient cyclesWebClient(CyclesProperties props) {
ConnectionProvider provider = ConnectionProvider.builder("cycles")
.maxConnections(200)
.maxIdleTime(Duration.ofSeconds(30))
.build();
HttpClient httpClient = HttpClient.create(provider)
.option(ChannelOption.CONNECT_TIMEOUT_MILLIS,
(int) props.getHttp().getConnectTimeout().toMillis())
.responseTimeout(props.getHttp().getReadTimeout());
return WebClient.builder()
.clientConnector(new ReactorClientHttpConnector(httpClient))
.baseUrl(props.getBaseUrl())
.defaultHeader("X-Cycles-API-Key", props.getApiKey())
.build();
}Retry strategy tuning
The commit retry engine handles transient failures (network errors, 5xx responses) with exponential backoff. Two common profiles:
Fast-fail (latency-sensitive paths)
Fail quickly so the caller can degrade. Use when the user is waiting for a response.
| Setting | Python | TypeScript | Spring Boot |
|---|---|---|---|
| Max attempts | retry_max_attempts=2 | retryMaxAttempts: 2 | cycles.retry.max-attempts=2 |
| Initial delay | retry_initial_delay=0.1 | retryInitialDelay: 100 | cycles.retry.initial-delay=100ms |
| Multiplier | retry_multiplier=1.5 | retryMultiplier: 1.5 | cycles.retry.multiplier=1.5 |
| Max delay | retry_max_delay=1.0 | retryMaxDelay: 1000 | cycles.retry.max-delay=1s |
| Total worst-case | ~250ms | ~250ms | ~250ms |
Durable (must-commit workloads)
Retry aggressively because the action already happened and the ledger must reflect it.
| Setting | Python | TypeScript | Spring Boot |
|---|---|---|---|
| Max attempts | retry_max_attempts=10 | retryMaxAttempts: 10 | cycles.retry.max-attempts=10 |
| Initial delay | retry_initial_delay=0.2 | retryInitialDelay: 200 | cycles.retry.initial-delay=200ms |
| Multiplier | retry_multiplier=1.5 | retryMultiplier: 1.5 | cycles.retry.multiplier=1.5 |
| Max delay | retry_max_delay=60.0 | retryMaxDelay: 60000 | cycles.retry.max-delay=60s |
| Total worst-case | ~2 minutes | ~2 minutes | ~2 minutes |
When to disable retry
config = CyclesConfig(base_url="...", api_key="...", retry_enabled=False)Disable retry when:
- You handle retries at a higher level (e.g. job queue with built-in retry)
- You need deterministic latency with no background work
- Testing, where retry masks failures
Anti-patterns
Creating a new client per request
Every CyclesClient instance creates a new HTTP connection pool. Creating one per request wastes connections and prevents reuse.
# BAD — new client (and connection pool) for every call
@cycles(estimate=1000, client=CyclesClient(config))
def process(text: str) -> str:
return call_llm(text)# GOOD — reuse a single client via module default
client = CyclesClient(config)
set_default_client(client)
@cycles(estimate=1000)
def process(text: str) -> str:
return call_llm(text)The same applies in TypeScript (setDefaultClient) and Spring Boot (the auto-configured CyclesClient bean is a singleton).
Expensive estimate computations in the hot path
The estimate callable runs synchronously before each reservation. Keep it fast:
# BAD — network call in the estimate
@cycles(estimate=lambda text: fetch_token_count_from_api(text))
def process(text: str) -> str:
return call_llm(text)# GOOD — pre-compute or use a fast heuristic
@cycles(estimate=lambda text: len(text) * 4) # ~4 tokens per character
def process(text: str) -> str:
return call_llm(text)High-throughput checklist
- Reuse a single client instance across all requests (all 3 clients)
- Warm up on startup — make a health check call to establish the connection pool:python
import httpx httpx.get(f"{config.base_url}/actuator/health") - Graceful shutdown — commit or release active reservations before process exit
- Pre-compute estimates outside the decorator/HOF hot path
- Lower timeouts if co-located, raise if cross-region
- Use durable retry for must-commit workloads
- Monitor retry rates and timeout errors — rising rates signal infrastructure issues
Server-side tuning
For server-side performance:
- Redis connection pool — default 50 connections. See Production Operations Guide.
- Expiry sweep interval — default 5000ms. See Server Configuration Reference.
- Benchmarks — Reserve 4.5ms p50, 2390 ops/sec at 32 threads. See Performance Benchmarks.
Next steps
- Production Operations Guide — server infrastructure and Redis tuning
- Monitoring and Alerting — metrics and alerting setup
- Observability Setup — Prometheus, Grafana, and Datadog integration
- Python Client Configuration — all Python config options
- TypeScript Client Configuration — all TypeScript config options
- Spring Client Configuration — all Spring Boot config options