Deploying the Events Service
The events service (cycles-server-events) delivers webhook events asynchronously — use it to get real-time alerts in Slack, PagerDuty, or your own systems when budgets run out, thresholds are crossed, or reservations are denied.
As of v0.1.25.9 the service binds two ports: the public API port 7980 (dispatch control surface) and a separate management port 9980 for actuator endpoints (/actuator/health, /actuator/info, /actuator/prometheus). Expose 7980 via public ingress; keep 9980 internal-only.
It is optional — the admin and runtime servers operate normally without it. When deployed, it consumes delivery jobs from Redis and sends HTTP POST requests to webhook endpoints with HMAC-SHA256 signatures.
Quick start with Docker
If you already have the full stack running via Deploying the Full Cycles Stack, uncomment the cycles-events block in your docker-compose.yml and restart. Otherwise, use the full-stack compose from the admin repo:
# From the cycles-server-admin directory
export WEBHOOK_SECRET_ENCRYPTION_KEY=$(openssl rand -base64 32)
docker compose -f docker-compose.full-stack.yml upServices: Redis (6379), Admin (7979), Runtime (7878), Events API (7980), Events management/actuator (9980).
Standalone deployment
From pre-built image
docker run -d --name cycles-events \
-p 7980:7980 \
-p 9980:9980 \
-e REDIS_HOST=redis.example.com \
-e REDIS_PORT=6379 \
-e REDIS_PASSWORD=your-redis-password \
-e WEBHOOK_SECRET_ENCRYPTION_KEY=your-base64-key \
ghcr.io/runcycles/cycles-server-events:0.1.25.10Only 7980 needs to be reachable from clients and downstream webhook targets. 9980 should remain internal — scrape it from your Prometheus cluster on its own network path.
From JAR
REDIS_HOST=redis.example.com \
REDIS_PORT=6379 \
REDIS_PASSWORD=your-redis-password \
WEBHOOK_SECRET_ENCRYPTION_KEY=your-base64-key \
java -jar cycles-server-events-*.jarConfiguration
Required
| Variable | Description |
|---|---|
REDIS_HOST | Redis hostname (shared with admin and runtime servers) |
REDIS_PORT | Redis port (default: 6379) |
REDIS_PASSWORD | Redis password (empty for no auth) |
Recommended
| Variable | Default | Description |
|---|---|---|
WEBHOOK_SECRET_ENCRYPTION_KEY | (empty) | AES-256-GCM key for signing secret decryption. Base64-encoded 32 bytes. Must match admin and runtime. Generate: openssl rand -base64 32. If empty, secrets are read as plaintext. |
Tuning
| Variable | Default | Description |
|---|---|---|
dispatch.pending.timeout-seconds | 5 | BRPOP blocking timeout (seconds) |
dispatch.retry.poll-interval-ms | 5000 | How often to check for ready retries (ms) |
dispatch.http.timeout-seconds | 30 | HTTP request timeout for webhook delivery |
dispatch.http.connect-timeout-seconds | 5 | HTTP connect timeout |
MAX_DELIVERY_AGE_MS | 86400000 | Deliveries older than this auto-fail (24h) |
EVENT_TTL_DAYS | 90 | Redis TTL for event records |
DELIVERY_TTL_DAYS | 14 | Redis TTL for delivery records |
RETENTION_CLEANUP_INTERVAL_MS | 3600000 | ZSET index cleanup interval (1h) |
Full configuration example
REDIS_HOST=redis.example.com
REDIS_PORT=6379
REDIS_PASSWORD=your-redis-password
WEBHOOK_SECRET_ENCRYPTION_KEY=K7x2mP9qR4sT6wB1cD3fG5hJ8kL0nA2=
dispatch.pending.timeout-seconds=5
dispatch.retry.poll-interval-ms=5000
dispatch.http.timeout-seconds=30
dispatch.http.connect-timeout-seconds=5
MAX_DELIVERY_AGE_MS=86400000
EVENT_TTL_DAYS=90
DELIVERY_TTL_DAYS=14
RETENTION_CLEANUP_INTERVAL_MS=3600000Health check
The events service exposes a Spring Boot Actuator health endpoint on the management port (9980 by default as of v0.1.25.9):
curl http://localhost:9980/actuator/health
# {"status":"UP"}Pre-v0.1.25.9 deployments exposed /actuator/health on the public API port 7980. Update kubelet probes and Docker HEALTHCHECK commands to hit :9980 when upgrading. The published Docker image's built-in HEALTHCHECK (30s interval, 60s start period, 5 retries) has already been updated.
What happens when the events service is down
- Admin and runtime servers are unaffected — event emission is fire-and-forget, never blocks API responses
- Events and deliveries accumulate in Redis —
event:{id}keys (90-day TTL),delivery:{id}keys (14-day TTL),dispatch:pendinglist grows - Redis memory is bounded — TTLs ensure keys auto-expire even if never consumed
- When the events service restarts:
- Stale deliveries (older than
MAX_DELIVERY_AGE_MS, default 24h) are immediately marked FAILED - Fresh deliveries are processed normally via BRPOP
RetentionCleanupServicetrims orphaned ZSET index entries hourly
- Stale deliveries (older than
- No data loss for events — event records persist in Redis for 90 days regardless of delivery status
Auto-disable for persistently failing subscriptions
The events service tracks consecutive_failures per subscription. When the counter reaches disable_after_failures (default 10), the subscription transitions to DISABLED and no further deliveries are attempted. The counter resets to 0 on any successful delivery. Re-enable a disabled subscription with PATCH /v1/admin/webhooks/{id} once the receiver is healthy.
Stale deliveries (older than MAX_DELIVERY_AGE_MS, default 24h) are marked FAILED without attempting HTTP delivery. This prevents a large backlog from triggering thundering-herd traffic against a receiver after a long events-service outage.
Signing secrets are encrypted at rest with AES-256-GCM using WEBHOOK_SECRET_ENCRYPTION_KEY (v0.1.25.2+). The events service decrypts per delivery; plaintext never lives on disk.
Prometheus metrics
The events service publishes webhook delivery metrics under the cycles_webhook_* namespace on /actuator/prometheus, served on the management port (9980 by default as of v0.1.25.9; was 7980 on pre-.9 builds). Update Prometheus scrape targets accordingly — the metric names and labels are unchanged.
| Metric | Tags | Description |
|---|---|---|
cycles_webhook_delivery_attempts_total | tenant, event_type | Every outbound HTTP attempt (including retries) |
cycles_webhook_delivery_success_total | tenant, event_type, status_code_family (2xx/3xx/4xx/5xx) | Attempts that received HTTP 2xx |
cycles_webhook_delivery_failed_total | tenant, event_type, reason | Failed attempts, bucketed by failure reason |
cycles_webhook_delivery_retried_total | tenant, event_type | Retry attempts scheduled on the dispatch:retry ZSET |
cycles_webhook_delivery_stale_total | tenant | Deliveries auto-failed by the MAX_DELIVERY_AGE_MS gate |
cycles_webhook_subscription_auto_disabled_total | tenant, reason | Subscriptions transitioned to DISABLED after disable_after_failures |
cycles_webhook_delivery_latency_seconds | tenant, event_type, outcome | Timer — HTTP RTT per delivery attempt |
cycles_webhook_events_payload_invalid_total | type, rule | Event payload validation discrepancies (no tenant tag — shape issue, not traffic) |
The tenant tag on all counters is gated by cycles.metrics.tenant-tag.enabled (default true) — set to false in deployments with many thousands of tenants to bound Prometheus cardinality.
Alert on cycles_webhook_subscription_auto_disabled_total (any increase is a receiver health issue) and on a sustained rise in cycles_webhook_delivery_failed_total{reason=!~"client_4xx"} (non-client-error failures indicate dispatch issues).
Scaling
Multiple events service instances can safely BRPOP from the same dispatch:pending list — BRPOP is atomic, so each delivery is processed by exactly one consumer. No distributed locking is needed.
Next steps
- Webhook Event Delivery Protocol — full event type catalog and delivery specification
- Managing Webhooks — create, test, and monitor webhooks
- Webhook Integrations — PagerDuty, Slack, ServiceNow examples
- Configuration Reference — all events service settings
- Architecture Overview — how the events service fits in the system