Deploying the Events Service
The events service (cycles-server-events) has two async jobs: webhook delivery and CyclesEvidence signing. Use webhook delivery to get real-time alerts in Slack, PagerDuty, or your own systems when budgets run out, thresholds are crossed, or reservations are denied. Use CyclesEvidence signing when runtime responses need verifiable audit receipts.
As of v0.1.25.9 the service binds two ports: the application port 7980 and a separate management port 9980 for actuator endpoints (/actuator/health, /actuator/info, /actuator/prometheus). The current reference service is an outbound worker; webhook delivery and evidence signing do not require inbound application traffic. Keep 9980 internal-only for health checks and Prometheus scraping, and do not publish 7980 unless your deployment has an explicit internal control-plane use for that app port.
The events service is optional: the admin and runtime servers operate normally without it. When deployed, it consumes delivery jobs from Redis and sends HTTP POST requests to webhook endpoints, each signed with HMAC-SHA256. If CyclesEvidence is enabled, it also consumes evidence-source records from Redis, signs envelopes with Ed25519, and stores them by content hash for GET /v1/evidence/{id}.
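On the receiving side, an HMAC-SHA256 signature is usually checked by recomputing the digest over the raw request body and comparing it in constant time. The sketch below assumes the signature arrives as a hex string and covers exactly the raw body bytes. The real header name and encoding are defined by the Webhook Event Delivery Protocol, so `sign_body` and `verify_webhook_signature` are hypothetical helper names, not part of the service.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 digest of the raw body (assumed encoding)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook_signature(secret: bytes, body: bytes, received_hex: str) -> bool:
    """Recompute the digest and compare in constant time to avoid timing leaks."""
    expected = sign_body(secret, body).encode("ascii")
    received = received_hex.strip().lower().encode("utf-8")
    return hmac.compare_digest(expected, received)
```

Verify against the bytes exactly as received, before any JSON parsing or re-serialization, since re-encoding the payload can change the digest.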
Quick start with Docker
If you already have the full stack running via Deploying the Full Cycles Stack, uncomment the cycles-events block in your docker-compose.yml and restart. Otherwise, use the full-stack compose from the admin repo:
# From the cycles-server-admin directory
export WEBHOOK_SECRET_ENCRYPTION_KEY=$(openssl rand -base64 32)
docker compose -f docker-compose.full-stack.yml up
Services: Redis (6379), Admin (7979), Runtime (7878), Events app port (7980), Events management/actuator (9980).
Standalone deployment
From pre-built image
docker run -d --name cycles-events \
-e REDIS_HOST=redis.example.com \
-e REDIS_PORT=6379 \
-e REDIS_PASSWORD=your-redis-password \
-e WEBHOOK_SECRET_ENCRYPTION_KEY=your-base64-key \
ghcr.io/runcycles/cycles-server-events:0.1.25.15
The service does not need inbound traffic from applications or webhook targets; it sends webhook HTTP requests outbound. For local inspection, temporarily add -p 9980:9980 and query the management endpoint from the host. In production, scrape 9980 from Prometheus on an internal network path and leave 7980 unpublished unless you have a specific internal use for the app port.
From JAR
REDIS_HOST=redis.example.com \
REDIS_PORT=6379 \
REDIS_PASSWORD=your-redis-password \
WEBHOOK_SECRET_ENCRYPTION_KEY=your-base64-key \
java -jar cycles-server-events-*.jar
Configuration
Required
| Variable | Description |
|---|---|
| REDIS_HOST | Redis hostname (shared with admin and runtime servers) |
| REDIS_PORT | Redis port (default: 6379) |
| REDIS_PASSWORD | Redis password (empty for no auth) |
Recommended
| Variable | Default | Description |
|---|---|---|
| WEBHOOK_SECRET_ENCRYPTION_KEY | (empty) | AES-256-GCM key for signing secret decryption. Base64-encoded 32 bytes. Must match admin and runtime. Generate: openssl rand -base64 32. If empty, secrets are read as plaintext. |
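A malformed key is easy to miss until decryption fails, so a pre-deploy check can help. The following sketch tests only the one property stated above, that the value is Base64 decoding to exactly 32 bytes. The function name `is_valid_encryption_key` is hypothetical and this check is not part of the service.

```python
import base64
import binascii


def is_valid_encryption_key(value: str) -> bool:
    """True if value is strict Base64 that decodes to exactly 32 bytes (AES-256)."""
    try:
        raw = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    return len(raw) == 32
```

A key produced by openssl rand -base64 32 is 44 characters long, including one trailing = of padding. An empty value fails this check, which is consistent with the plaintext fallback described in the table.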
Optional: CyclesEvidence signing
Configure these only when runtime responses should include verifiable cycles_evidence references. The public identity must match the runtime server's EVIDENCE_SERVER_ID and EVIDENCE_SIGNING_SIGNER_DID; the private key belongs only on cycles-server-events.
| Variable | Description |
|---|---|
| EVIDENCE_SERVER_ID | Issuer base URL including /v1, for example https://cycles.example.com/v1. Blank disables evidence signing; pending source records are left untouched, not dead-lettered. |
| EVIDENCE_SIGNING_SIGNER_DID | Raw-hex Ed25519 public key. Must match the runtime server's public signer identity. |
| EVIDENCE_SIGNING_PRIVATE_KEY_HEX | Raw-hex Ed25519 private key. Keep this secret and deploy it only to cycles-server-events. |
The runtime server also publishes public JWKS metadata with EVIDENCE_SIGNING_KID, EVIDENCE_SIGNING_NBF_MS, and EVIDENCE_SIGNING_RETIRED_KEYS. Those variables are not read by the events service.
Tuning
| Variable | Default | Description |
|---|---|---|
| dispatch.pending.timeout-seconds | 5 | BRPOP blocking timeout (seconds) |
| dispatch.retry.poll-interval-ms | 5000 | How often to check for ready retries (ms) |
| dispatch.http.timeout-seconds | 30 | HTTP request timeout for webhook delivery (seconds) |
| dispatch.http.connect-timeout-seconds | 5 | HTTP connect timeout (seconds) |
| MAX_DELIVERY_AGE_MS | 86400000 | Deliveries older than this auto-fail (24h) |
| EVENT_TTL_DAYS | 90 | Redis TTL for event records |
| DELIVERY_TTL_DAYS | 14 | Redis TTL for delivery records |
| RETENTION_CLEANUP_INTERVAL_MS | 3600000 | ZSET index cleanup interval (1h) |
Full configuration example
REDIS_HOST=redis.example.com
REDIS_PORT=6379
REDIS_PASSWORD=your-redis-password
WEBHOOK_SECRET_ENCRYPTION_KEY=your-base64-key
EVIDENCE_SERVER_ID=https://cycles.example.com/v1
EVIDENCE_SIGNING_SIGNER_DID=b10554...c522
EVIDENCE_SIGNING_PRIVATE_KEY_HEX=4f9c...d20a
dispatch.pending.timeout-seconds=5
dispatch.retry.poll-interval-ms=5000
dispatch.http.timeout-seconds=30
dispatch.http.connect-timeout-seconds=5
MAX_DELIVERY_AGE_MS=86400000
EVENT_TTL_DAYS=90
DELIVERY_TTL_DAYS=14
RETENTION_CLEANUP_INTERVAL_MS=3600000
Health check
The events service exposes a Spring Boot Actuator health endpoint on the management port (9980 by default as of v0.1.25.9):
curl http://localhost:9980/actuator/health
# {"status":"UP"}
Pre-v0.1.25.9 deployments exposed /actuator/health on the application port 7980. Update kubelet probes and Docker HEALTHCHECK commands to hit :9980 when upgrading. The published Docker image's built-in HEALTHCHECK (30s interval, 60s start period, 5 retries) has already been updated.
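A custom probe or deploy script needs the same check as the curl command, in code. Here is a minimal sketch using only the Python standard library. It assumes the default management port 9980 on localhost and treats any network or HTTP error as unhealthy. The names `is_up` and `probe` are illustrative.

```python
import json
import urllib.request


def is_up(health_json: str) -> bool:
    """True if an Actuator health payload reports status UP."""
    try:
        payload = json.loads(health_json)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "UP"


def probe(url: str = "http://localhost:9980/actuator/health", timeout: float = 3.0) -> bool:
    """Fetch the health endpoint; any network or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_up(resp.read().decode("utf-8", "replace"))
    except OSError:  # URLError, HTTPError, and socket timeouts all derive from OSError
        return False
```

Keeping the parsing in `is_up` separate from the network call makes the check easy to unit test without a running service.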
What happens when the events service is down
- Admin and runtime servers are unaffected: event emission and evidence source writes are fire-and-forget, never blocking API responses
- Events and deliveries accumulate in Redis: event:{id} keys (90-day TTL), delivery:{id} keys (14-day TTL), dispatch:pending list grows
- Redis memory is bounded: TTLs ensure keys auto-expire even if never consumed
- When the events service restarts:
  - Stale deliveries (older than MAX_DELIVERY_AGE_MS, default 24h) are immediately marked FAILED
  - Fresh deliveries are processed normally via BRPOP
  - RetentionCleanupService trims orphaned ZSET index entries hourly
- No data loss for events: event records persist in Redis for 90 days regardless of delivery status
- Evidence may be temporarily unavailable: responses can still include cycles_evidence, but GET /v1/evidence/{id} may return transient 404 until the events service signs and stores the envelope
Auto-disable for persistently failing subscriptions
The events service tracks consecutive_failures per subscription. When the counter reaches disable_after_failures (default 10), the subscription transitions to DISABLED and no further deliveries are attempted. The counter resets to 0 on any successful delivery. Re-enable a disabled subscription with PATCH /v1/admin/webhooks/{id} once the receiver is healthy.
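The counter behaviour described above amounts to a small state machine, sketched here for illustration. The `Subscription` class, the `record_attempt` helper, and the "ACTIVE" status name are assumptions. The text also does not say whether the counter increments per HTTP attempt or per exhausted delivery, so this sketch simply counts each failure it is told about.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    disable_after_failures: int = 10  # default stated in the text above
    consecutive_failures: int = 0
    status: str = "ACTIVE"  # assumed name for the enabled state


def record_attempt(sub: Subscription, success: bool) -> Subscription:
    """Update the failure counter and status after one recorded outcome."""
    if sub.status == "DISABLED":
        return sub  # no further deliveries are attempted once disabled
    if success:
        sub.consecutive_failures = 0  # any success resets the streak
    else:
        sub.consecutive_failures += 1
        if sub.consecutive_failures >= sub.disable_after_failures:
            sub.status = "DISABLED"
    return sub
```

Because a single success resets the streak, an intermittently failing receiver may never reach the threshold. Only an unbroken run of failures disables the subscription.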
Stale deliveries (older than MAX_DELIVERY_AGE_MS, default 24h) are marked FAILED without attempting HTTP delivery. This prevents a large backlog from triggering thundering-herd traffic against a receiver after a long events-service outage.
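The age gate can be pictured as a single comparison applied before any HTTP call. This is a rough sketch under stated assumptions: each delivery carries a creation timestamp in epoch milliseconds (`created_at_ms` is a hypothetical field name), and "older than" is read as a strict comparison. The service's actual field and boundary handling may differ.

```python
import time
from typing import Iterable, List, Optional, Tuple

MAX_DELIVERY_AGE_MS = 86_400_000  # 24h default from the tuning table


def is_stale(created_at_ms: int, now_ms: Optional[int] = None,
             max_age_ms: int = MAX_DELIVERY_AGE_MS) -> bool:
    """True if the delivery is strictly older than the allowed age."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - created_at_ms > max_age_ms


def triage(deliveries: Iterable[Tuple[str, int]], now_ms: int) -> Tuple[List[str], List[str]]:
    """Split (id, created_at_ms) pairs into ids to mark FAILED and ids to send."""
    failed: List[str] = []
    fresh: List[str] = []
    for delivery_id, created_at_ms in deliveries:
        (failed if is_stale(created_at_ms, now_ms) else fresh).append(delivery_id)
    return failed, fresh
```

After a multi-day outage, most of the backlog would land in the first list and be failed without any outbound traffic, which is the thundering-herd protection the paragraph describes.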
Signing secrets are encrypted at rest with AES-256-GCM using WEBHOOK_SECRET_ENCRYPTION_KEY (v0.1.25.2+). The events service decrypts per delivery; plaintext never lives on disk.
Prometheus metrics
The events service publishes webhook delivery metrics under the cycles_webhook_* namespace on /actuator/prometheus, served on the management port (9980 by default as of v0.1.25.9; 7980 on earlier builds). Update Prometheus scrape targets accordingly; the metric names and labels are unchanged.
| Metric | Tags | Description |
|---|---|---|
| cycles_webhook_delivery_attempts_total | tenant, event_type | Every outbound HTTP attempt (including retries) |
| cycles_webhook_delivery_success_total | tenant, event_type, status_code_family (2xx/3xx/4xx/5xx) | Attempts that received HTTP 2xx |
| cycles_webhook_delivery_failed_total | tenant, event_type, reason | Failed attempts, bucketed by failure reason |
| cycles_webhook_delivery_retried_total | tenant, event_type | Retry attempts scheduled on the dispatch:retry ZSET |
| cycles_webhook_delivery_stale_total | tenant | Deliveries auto-failed by the MAX_DELIVERY_AGE_MS gate |
| cycles_webhook_subscription_auto_disabled_total | tenant, reason | Subscriptions transitioned to DISABLED after disable_after_failures |
| cycles_webhook_delivery_latency_seconds | tenant, event_type, outcome | Timer: HTTP RTT per delivery attempt |
| cycles_webhook_events_payload_invalid_total | type, rule | Event payload validation discrepancies (no tenant tag; shape issue, not traffic) |
The tenant tag on all counters is gated by cycles.metrics.tenant-tag.enabled (default true) — set to false in deployments with many thousands of tenants to bound Prometheus cardinality.
Alert on cycles_webhook_subscription_auto_disabled_total (any increase is a receiver health issue) and on a sustained rise in cycles_webhook_delivery_failed_total{reason!~"client_4xx"} (non-client-error failures indicate dispatch issues).
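To sanity-check the failure filter outside Prometheus, the same label exclusion can be applied to a raw scrape of /actuator/prometheus. The sketch below is a deliberately simple parser for the text exposition format, not a full implementation, and `non_client_failures` is a hypothetical helper. It sums the cumulative counter values, whereas a real alert would look at the rate over a time window.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def non_client_failures(exposition: str) -> float:
    """Sum cycles_webhook_delivery_failed_total over series whose reason is not client_4xx."""
    total = 0.0
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if not match or match.group("name") != "cycles_webhook_delivery_failed_total":
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        if labels.get("reason") != "client_4xx":
            total += float(match.group("value"))
    return total
```

Comparing the result of two scrapes taken a few minutes apart gives a rough view of whether non-client failures are climbing.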
Scaling
Multiple events service instances can safely BRPOP from the same dispatch:pending list. BRPOP is atomic, so each delivery is popped by exactly one consumer, and no distributed locking is needed.
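The competing-consumer pattern can be demonstrated without Redis. The sketch below swaps in Python's in-process `queue.Queue` for the Redis list and a blocking `get` with a timeout for BRPOP. It is an analogy only: it shows that an atomic blocking pop hands each item to a single worker, and it makes no claim about how the service's dispatcher is written.

```python
import queue
import threading
from typing import Dict, Iterable, List


def run_workers(items: Iterable[str], worker_count: int = 3) -> Dict[int, List[str]]:
    """Drain a shared queue with several workers; return the items each one handled."""
    pending: "queue.Queue[str]" = queue.Queue()
    for item in items:
        pending.put(item)
    handled: Dict[int, List[str]] = {i: [] for i in range(worker_count)}

    def worker(worker_id: int) -> None:
        while True:
            try:
                # Stand-in for BRPOP with a timeout: block briefly, then give up.
                item = pending.get(timeout=0.1)
            except queue.Empty:
                return
            handled[worker_id].append(item)  # each worker writes only to its own list

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled
```

Flattening the returned lists yields every item exactly once, regardless of how the work was split between workers. The split itself varies from run to run.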
Next steps
- Webhook Event Delivery Protocol — full event type catalog and delivery specification
- CyclesEvidence Envelopes — evidence signing, JWKS verification, and rotation
- Managing Webhooks — create, test, and monitor webhooks
- Webhook Integrations — PagerDuty, Slack, ServiceNow examples
- Configuration Reference — all events service settings
- Architecture Overview — how the events service fits in the system