Production Operations Guide
This guide covers what you need to run Cycles reliably in production. It assumes you've already deployed the stack per Deploy the Full Stack and are preparing for production traffic.
Redis configuration for production
Cycles stores all state in Redis. Redis availability directly determines Cycles availability.
Persistence
Enable both RDB snapshots and AOF append-only logging:
# redis.conf (keep comments on their own lines; Redis does not parse trailing comments)
# Snapshot every 15 min if at least 1 key changed
save 900 1
# Snapshot every 5 min if at least 10 keys changed
save 300 10
# Enable AOF
appendonly yes
# Fsync once per second (good balance of safety and performance)
appendfsync everysec
In Docker Compose:
redis:
  image: redis:7-alpine
  command: redis-server --appendonly yes --save "900 1" --save "300 10"
  volumes:
    - redis-data:/data
Memory management
Set a max memory limit and eviction policy:
maxmemory 2gb
# IMPORTANT: never evict budget data
maxmemory-policy noeviction
Always use noeviction. Evicting budget keys silently loses budget state. It is better for Redis to reject writes (causing reservation failures that can be retried) than to silently drop data.
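The memory settings above are easy to lose during a config refactor, so a deploy-time check can help. The following is a minimal sketch, not part of Cycles: the function name `check_memory_settings` is hypothetical, and it only inspects redis.conf text (it does not query a running Redis). It treats lines starting with `#` as comments and requires the policy to be set explicitly.

```python
def check_memory_settings(conf_text: str) -> list[str]:
    """Return a list of problems found in redis.conf text (empty if none)."""
    settings: dict[str, list[str]] = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        parts = line.split()
        settings[parts[0].lower()] = parts[1:]  # last occurrence wins

    problems: list[str] = []
    if settings.get("maxmemory-policy") != ["noeviction"]:
        problems.append("maxmemory-policy should be set explicitly to noeviction")
    if "maxmemory" not in settings:
        problems.append("maxmemory is not set")
    return problems
```

An empty list means both directives are present; anything else can fail the deploy pipeline before Redis ever starts with an eviction policy that drops budget keys.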
High availability
For production, consider:
- Redis Sentinel — automatic failover with a primary + replica setup. Good for most deployments.
- Redis Cluster — sharded across multiple nodes. Required for very large deployments.
Cycles uses Lua scripts for atomic operations. All keys for a single reservation operation are in the same Redis keyspace, so single-instance and Sentinel setups work out of the box. Redis Cluster requires every key touched by one script to map to the same hash slot, so ensure the key prefix strategy keeps related keys on the same shard.
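To see what "same shard" means in practice, it can help to compute Redis Cluster hash slots by hand. The sketch below follows the Redis Cluster key-slot rule (CRC16 of the key modulo 16384, hashing only the hash-tag portion between the first `{` and the next `}` when that portion is non-empty). The key names in the usage note are hypothetical illustrations; this guide does not show how Cycles names its keys or whether the prefix is configurable.

```python
import binascii


def hash_slot(key: str) -> int:
    """Compute the Redis Cluster hash slot for a key, honoring hash tags."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag such as {tenant-a} narrows what gets hashed.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is CRC16/XMODEM,
    # the variant Redis Cluster uses.
    return binascii.crc_hqx(data, 0) % 16384
```

For example, `hash_slot("{tenant-a}:budget")` and `hash_slot("{tenant-a}:reservation:123")` return the same slot because only `tenant-a` is hashed, so a Lua script touching both keys can run on a single shard.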
Backup strategy
- Automated RDB snapshots stored offsite (S3, GCS, etc.)
- AOF backups for point-in-time recovery
- Test restores regularly — untested backups are not backups
Cycles Server configuration
Running multiple instances
The Cycles Server is stateless. You can run multiple instances behind a load balancer:
cycles-server-1:
  image: ghcr.io/runcycles/cycles-server:0.1.23
  environment:
    REDIS_HOST: redis-primary
    REDIS_PORT: 6379
    REDIS_PASSWORD: ${REDIS_PASSWORD}
cycles-server-2:
  image: ghcr.io/runcycles/cycles-server:0.1.23
  environment:
    REDIS_HOST: redis-primary
    REDIS_PORT: 6379
    REDIS_PASSWORD: ${REDIS_PASSWORD}
Any load balancing strategy works (round-robin, least-connections). No sticky sessions required.
Health checks
Both servers expose Spring Boot Actuator health endpoints:
# Cycles Server
curl http://localhost:7878/actuator/health
# Admin Server
curl http://localhost:7979/actuator/health
Configure your load balancer or orchestrator to check these endpoints.
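Where a load balancer cannot call the endpoints directly, a small script can do the same check. This is a minimal sketch using only the Python standard library; it assumes the endpoints return the standard Spring Boot Actuator body with a top-level `status` field, and the function names `is_up` and `check` are hypothetical.

```python
import json
import urllib.request


def is_up(body: bytes) -> bool:
    """True if an Actuator health response body reports status UP."""
    try:
        return json.loads(body).get("status") == "UP"
    except (ValueError, AttributeError):
        return False  # not JSON, or JSON that is not an object


def check(url: str, timeout: float = 2.0) -> bool:
    """Fetch a health URL; any network error or non-200 response counts as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_up(resp.read())
    except OSError:  # covers URLError, HTTPError, timeouts, refused connections
        return False
```

Calling `check("http://localhost:7878/actuator/health")` and `check("http://localhost:7979/actuator/health")` covers both servers. Treating every failure mode as "down" keeps the script safe to use as a liveness probe.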
JVM tuning
The default JVM settings work for most deployments. For high-throughput environments:
JAVA_OPTS="-Xms512m -Xmx1g -XX:+UseG1GC"
Reservation expiry
The server runs a background sweep to expire stale reservations:
cycles:
  expiry:
    interval-ms: 5000 # Default: sweep every 5 seconds
Reduce the interval for tighter TTL enforcement. Increase it to reduce Redis load if TTL precision is not critical.
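The trade-off can be put in rough numbers. The sketch below assumes a simple model that this guide does not confirm in detail: a stale reservation is only released when the next sweep runs, so it can outlive its TTL by up to one sweep interval. Both function names are hypothetical.

```python
def worst_case_hold_ms(ttl_ms: int, interval_ms: int = 5000) -> int:
    """Longest time a reservation may hold budget under the assumed model:
    its TTL plus up to one full sweep interval."""
    return ttl_ms + interval_ms


def sweeps_per_minute(interval_ms: int = 5000) -> float:
    """How often the sweep runs against Redis per minute."""
    return 60_000 / interval_ms
```

Under this model, a 30-second TTL with the default interval can hold budget for up to 35 seconds, at a cost of 12 sweeps per minute. Halving the interval tightens the bound to 32.5 seconds and doubles the sweep rate.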
Network architecture
Recommended topology
┌─────────────────┐
│  Load Balancer  │
│   (port 7878)   │  ← Application traffic (public or internal)
└────────┬────────┘
         │
    ┌────┴────┐
    │ Cycles  │  ← Multiple instances for HA
    │ Server  │
    └────┬────┘
         │
    ┌────┴────┐
    │  Redis  │  ← Internal network only
    └─────────┘

┌─────────────────┐
│  Admin Server   │  ← Internal/VPN only (port 7979)
│  (management)   │
└────────┬────────┘
         │
    ┌────┴────┐
    │  Redis  │  ← Same Redis instance
    └─────────┘
Network isolation
- Cycles Server (port 7878): Accessible to your application. Can be on an internal network or behind an API gateway.
- Admin Server (port 7979): Internal access only. This manages tenants, API keys, and budgets. Never expose to the public internet.
- Redis (port 6379): Internal access only. Never expose directly.
TLS termination
Terminate TLS at the load balancer or API gateway. The Cycles Server itself runs plain HTTP. Example with nginx:
server {
    listen 443 ssl;
    server_name cycles.internal.example.com;
    ssl_certificate /etc/ssl/certs/cycles.crt;
    ssl_certificate_key /etc/ssl/private/cycles.key;
    location / {
        proxy_pass http://cycles-server:7878;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
Capacity planning
Rules of thumb
- Redis memory: ~1 KB per active reservation, ~500 bytes per budget ledger. 1 GB of Redis memory supports roughly 500K concurrent reservations (about 500 MB of reservation data, leaving the rest as headroom).
- Server CPU: Each reservation involves 1 Redis Lua script execution (~1ms). A single server instance can handle thousands of reservations per second.
- Latency: Expect <5ms for reservation operations on a well-configured setup (server co-located with Redis).
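The memory rule of thumb can be turned into a quick sizing estimate. This is a rough sketch built only on the per-item figures above; the `headroom` parameter is an assumption introduced here (defaulting to 50%, which reproduces the "1 GB for roughly 500K reservations" figure), and the function name is hypothetical.

```python
RESERVATION_BYTES = 1024  # ~1 KB per active reservation
LEDGER_BYTES = 500        # ~500 bytes per budget ledger


def estimate_redis_bytes(active_reservations: int,
                         budget_ledgers: int,
                         headroom: float = 0.5) -> int:
    """Estimate the Redis memory to provision, keeping a fraction free.

    headroom is the share of total memory left unused (an assumption,
    not a Cycles setting); it must be in the range [0, 1).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    raw = active_reservations * RESERVATION_BYTES + budget_ledgers * LEDGER_BYTES
    return int(raw / (1 - headroom))
```

For instance, `estimate_redis_bytes(500_000, 0)` gives 1,024,000,000 bytes, in line with the 1 GB guideline. Measure real usage with Redis `INFO memory` before committing to a size, since these figures are only rules of thumb.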
Scaling triggers
Add more Cycles Server instances when:
- Response latency exceeds 50ms at p99
- CPU utilization exceeds 70%
Scale Redis when:
- Memory utilization exceeds 80%
- Command latency exceeds 5ms
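These triggers translate directly into an alerting rule. The sketch below encodes the four thresholds listed above; the `Metrics` class and its field names are hypothetical, and how the values are collected (for example from your monitoring system) is left out.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    server_p99_latency_ms: float
    server_cpu_pct: float
    redis_memory_pct: float
    redis_cmd_latency_ms: float


def scaling_actions(m: Metrics) -> list[str]:
    """Return the scaling actions suggested by the thresholds in this guide."""
    actions: list[str] = []
    if m.server_p99_latency_ms > 50 or m.server_cpu_pct > 70:
        actions.append("add Cycles Server instances")
    if m.redis_memory_pct > 80 or m.redis_cmd_latency_ms > 5:
        actions.append("scale Redis")
    return actions
```

Note that high server latency can be a symptom of slow Redis commands, so when both actions are returned it is usually worth looking at Redis first.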
Upgrade procedures
Rolling upgrade
Since the Cycles Server is stateless, you can do rolling upgrades with zero downtime:
- Pull the new image: docker pull ghcr.io/runcycles/cycles-server:NEW_VERSION
- Stop one instance at a time
- Start the new version
- Verify health check passes
- Repeat for remaining instances
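The steps above can be sketched as a loop that halts as soon as one instance fails its health check, so a bad release never reaches the remaining instances. This is an illustrative outline, not a Cycles tool: `stop`, `start`, and `healthy` are hypothetical callables that you would implement for your orchestrator (for example by wrapping docker commands and the Actuator health endpoint).

```python
import time
from typing import Callable, Iterable


def rolling_upgrade(instances: Iterable[str],
                    stop: Callable[[str], None],
                    start: Callable[[str], None],
                    healthy: Callable[[str], bool],
                    timeout_s: float = 60.0,
                    poll_s: float = 1.0) -> list[str]:
    """Upgrade instances one at a time; raise if one never becomes healthy."""
    upgraded: list[str] = []
    for name in instances:
        stop(name)   # take one instance out of service
        start(name)  # bring it back on the new version
        deadline = time.monotonic() + timeout_s
        while not healthy(name):
            if time.monotonic() >= deadline:
                raise RuntimeError(f"{name} did not become healthy; halting rollout")
            time.sleep(poll_s)
        upgraded.append(name)
    return upgraded
```

Because the remaining instances keep serving traffic while one is restarted, the rollout needs at least two instances behind the load balancer to avoid downtime.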
Version compatibility
The Cycles protocol is versioned (/v1). Minor version upgrades (e.g., 0.1.22 → 0.1.23) are backward-compatible. Check the changelog for breaking changes before major upgrades.
Rollback
If an upgrade causes issues:
- Stop the new version
- Start the previous version
- Redis state is compatible across minor versions
Logging
Log levels
Configure via Spring Boot:
logging:
  level:
    io.runcycles: INFO # Application logs
    org.springframework: WARN # Framework logs
Set io.runcycles: DEBUG for troubleshooting (includes full request/response logging).
Structured logging
Add JSON logging for log aggregation systems:
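logging:
  pattern:
    console: '{"timestamp":"%d","level":"%p","logger":"%c","message":"%m"}%n'
This pattern does not escape quotes or newlines inside log messages, so such messages produce invalid JSON. For full structured output, use the Spring Boot JSON logging starter instead.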
Operational runbooks
Budget exhaustion alert
Symptom: Applications report BUDGET_EXCEEDED errors.
Response:
- Check which scope is exhausted: GET /v1/balances?tenant=...
- Determine if this is expected (legitimate traffic) or unexpected (runaway agent)
- If expected: fund the budget via admin API (POST .../fund with CREDIT)
- If unexpected: check active reservations for anomalies (GET /v1/reservations?status=ACTIVE)
Reservation leak
Symptom: Budget reserved amount grows but spent stays flat. Reservations are being created but never committed or released.
Response:
- List active reservations: GET /v1/reservations?status=ACTIVE
- Check for reservations past their expected TTL
- The expiry sweep should eventually clean these up. If it's not running, check the server logs.
- Investigate the client application, which may be failing to commit or release.
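The symptom itself (reserved grows while spent stays flat) can be detected automatically from periodic balance readings. The sketch below is a simple heuristic under stated assumptions: `samples` is a hypothetical chronological list of `(reserved, spent)` pairs for one budget scope, and how those readings are fetched is left out because this guide does not show the balance response format.

```python
def looks_like_leak(samples: list[tuple[int, int]], min_growth: int = 1) -> bool:
    """Flag a scope whose reserved amount grew while spent did not change.

    samples: chronological (reserved, spent) readings for one budget scope.
    min_growth: smallest increase in reserved that counts as growth.
    """
    if len(samples) < 2:
        return False  # not enough data to see a trend
    first_reserved, first_spent = samples[0]
    last_reserved, last_spent = samples[-1]
    reserved_grew = last_reserved - first_reserved >= min_growth
    spent_flat = last_spent == first_spent
    return reserved_grew and spent_flat
```

A sampling window several times longer than the typical reservation TTL reduces false positives from healthy workloads that simply have long-running reservations in flight.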
Redis connection loss
Symptom: All reservation operations fail with 500 errors.
Response:
- Check Redis connectivity: redis-cli ping
- Check server logs for connection errors
- Restart the Cycles Server if the Redis connection pool is exhausted
- Active reservations with remaining TTL are preserved in Redis and will resume when connectivity returns
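On hosts where redis-cli is not installed, the connectivity check can be approximated with a raw socket. This is a minimal sketch using the Redis inline command form of PING; the function names are hypothetical, and it does not authenticate, so a Redis that requires AUTH answers with an error and the check reports `False` even though the server is reachable.

```python
import socket


def is_pong(reply: bytes) -> bool:
    """True if the reply is the simple-string response to PING."""
    return reply.startswith(b"+PONG")


def redis_ping(host: str = "localhost", port: int = 6379,
               timeout: float = 2.0) -> bool:
    """Send an unauthenticated PING; any network error counts as unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            return is_pong(sock.recv(64))
    except OSError:
        return False
```

Running this from the Cycles Server host rather than from the Redis host checks the actual network path the server uses, which helps separate a Redis outage from a firewall or DNS problem.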
Next steps
- Security Hardening — Redis AUTH, TLS, key rotation
- Monitoring and Alerting — metrics and alerting setup
- Server Configuration Reference — all configuration properties
