Deploying the Events Service

The events service (cycles-server-events) delivers webhook events asynchronously — use it to get real-time alerts in Slack, PagerDuty, or your own systems when budgets run out, thresholds are crossed, or reservations are denied.

As of v0.1.25.9 the service binds two ports: the public API port 7980 (dispatch control surface) and a separate management port 9980 for actuator endpoints (/actuator/health, /actuator/info, /actuator/prometheus). Expose 7980 via public ingress; keep 9980 internal-only.

It is optional — the admin and runtime servers operate normally without it. When deployed, it consumes delivery jobs from Redis and sends HTTP POST requests to webhook endpoints with HMAC-SHA256 signatures.
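On the receiving side, a signature check can be sketched as follows. This is a minimal illustration, not the service's own code: it assumes the signature is the hex-encoded HMAC-SHA256 of the raw request body, and the function names `hmac_sha256_hex` and `verify_signature` are hypothetical. The actual header name and encoding are defined in the Webhook Event Delivery Protocol.

```bash
#!/usr/bin/env bash
# Sketch: recompute an HMAC-SHA256 signature for a webhook body and compare it
# with the value the sender supplied. Hex encoding is an assumption.

# hmac_sha256_hex SECRET BODY -> lowercase hex digest on stdout
hmac_sha256_hex() {
  local secret=$1 body=$2
  # printf '%s' avoids appending a newline, which would change the digest.
  # awk keeps only the last field, dropping the "(stdin)= " style prefix that
  # OpenSSL prints before the digest.
  printf '%s' "$body" | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}'
}

# verify_signature SECRET BODY RECEIVED_HEX -> exit status 0 when they match
verify_signature() {
  local expected
  expected=$(hmac_sha256_hex "$1" "$2")
  [ "$expected" = "$3" ]
}
```

The string comparison here is not constant-time and the secret is visible in the process list, so treat this as a debugging aid; a production receiver would normally verify signatures in application code.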

Quick start with Docker

If you already run the full stack as described in Deploying the Full Cycles Stack, uncomment the cycles-events block in your docker-compose.yml and restart. Otherwise, use the full-stack compose file from the admin repo:

bash

# From the cycles-server-admin directory
export WEBHOOK_SECRET_ENCRYPTION_KEY=$(openssl rand -base64 32)
docker compose -f docker-compose.full-stack.yml up

Services: Redis (6379), Admin (7979), Runtime (7878), Events API (7980), Events management/actuator (9980).
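Before starting the stack, it can help to confirm the generated key has the expected size. The sketch below assumes the service expects the base64 encoding of 32 random bytes, which is inferred from the `openssl rand -base64 32` command above and from the use of AES-256 rather than stated in this guide. The helper name `key_byte_length` is hypothetical.

```bash
#!/usr/bin/env bash
# key_byte_length BASE64 -> number of bytes the value decodes to
key_byte_length() {
  # -A tells OpenSSL to treat the input as a single unwrapped base64 line.
  # tr strips the padding spaces some wc implementations add.
  printf '%s' "$1" | openssl base64 -d -A | wc -c | tr -d ' '
}

WEBHOOK_SECRET_ENCRYPTION_KEY=$(openssl rand -base64 32)
export WEBHOOK_SECRET_ENCRYPTION_KEY

# Warn early if the key would not be a 256-bit key (assumed requirement).
if [ "$(key_byte_length "$WEBHOOK_SECRET_ENCRYPTION_KEY")" -ne 32 ]; then
  echo "WEBHOOK_SECRET_ENCRYPTION_KEY does not decode to 32 bytes" >&2
fi
```

Store the generated value somewhere durable, such as a secrets manager, since signing secrets encrypted with one key cannot be decrypted with another.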

Standalone deployment

From pre-built image

bash

docker run -d --name cycles-events \
  -p 7980:7980 \
  -p 9980:9980 \
  -e REDIS_HOST=redis.example.com \
  -e REDIS_PORT=6379 \
  -e REDIS_PASSWORD=your-redis-password \
  -e WEBHOOK_SECRET_ENCRYPTION_KEY=your-base64-key \
  ghcr.io/runcycles/cycles-server-events:0.1.25.10

Only 7980 needs to be reachable from clients and downstream webhook targets. 9980 should remain internal — scrape it from your Prometheus cluster on its own network path.

From JAR

bash

REDIS_HOST=redis.example.com \
REDIS_PORT=6379 \
REDIS_PASSWORD=your-redis-password \
WEBHOOK_SECRET_ENCRYPTION_KEY=your-base64-key \
java -jar cycles-server-events-*.jar

Configuration

Required

Variable	Description
`REDIS_HOST`	Redis hostname (shared with admin and runtime servers)
`REDIS_PORT`	Redis port (default: 6379)
`REDIS_PASSWORD`	Redis password (empty for no auth)

Tuning

Variable	Default	Description
`dispatch.pending.timeout-seconds`	5	BRPOP blocking timeout (seconds)
`dispatch.retry.poll-interval-ms`	5000	How often to check for ready retries (ms)
`dispatch.http.timeout-seconds`	30	HTTP request timeout for webhook delivery
`dispatch.http.connect-timeout-seconds`	5	HTTP connect timeout
`MAX_DELIVERY_AGE_MS`	86400000	Deliveries older than this auto-fail (24h)
`EVENT_TTL_DAYS`	90	Redis TTL for event records
`DELIVERY_TTL_DAYS`	14	Redis TTL for delivery records
`RETENTION_CLEANUP_INTERVAL_MS`	3600000	ZSET index cleanup interval (1h)

Full configuration example

bash

REDIS_HOST=redis.example.com
REDIS_PORT=6379
REDIS_PASSWORD=your-redis-password
WEBHOOK_SECRET_ENCRYPTION_KEY=your-base64-key
dispatch.pending.timeout-seconds=5
dispatch.retry.poll-interval-ms=5000
dispatch.http.timeout-seconds=30
dispatch.http.connect-timeout-seconds=5
MAX_DELIVERY_AGE_MS=86400000
EVENT_TTL_DAYS=90
DELIVERY_TTL_DAYS=14
RETENTION_CLEANUP_INTERVAL_MS=3600000

Health check

The events service exposes a Spring Boot Actuator health endpoint on the management port (9980 by default as of v0.1.25.9):

bash

curl http://localhost:9980/actuator/health
# {"status":"UP"}

Deployments before v0.1.25.9 exposed /actuator/health on the public API port 7980. When upgrading, update Kubernetes liveness and readiness probes and any custom Docker HEALTHCHECK commands to hit :9980. The published Docker image's built-in HEALTHCHECK (30s interval, 60s start period, 5 retries) has already been updated.
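For deploy scripts that need to block until the service is ready, a polling loop along these lines may be useful. It is a sketch under assumptions: the function names `health_is_up` and `wait_for_events_service` are hypothetical, and the attempt count and delay are arbitrary defaults rather than values taken from the image's HEALTHCHECK.

```bash
#!/usr/bin/env bash
# health_is_up: reads an actuator health JSON document on stdin and exits 0
# when it contains "status":"UP".
health_is_up() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"UP"'
}

# wait_for_events_service URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until it reports UP; returns 1 after ATTEMPTS failed tries.
wait_for_events_service() {
  local url=$1 attempts=${2:-5} delay=${3:-2} i
  for ((i = 1; i <= attempts; i++)); do
    # -f makes curl suppress the body on HTTP errors (such as the 503 that
    # Actuator returns when DOWN), so grep then sees no match.
    if curl -fsS --max-time 5 "$url" 2>/dev/null | health_is_up; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}
```

A typical call would be `wait_for_events_service http://localhost:9980/actuator/health 10 3`. The pattern match is deliberately loose; if health details are enabled, the JSON also contains per-component statuses, so rely on the HTTP status handling rather than the pattern alone.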

What happens when the events service is down

- Admin and runtime servers are unaffected: event emission is fire-and-forget and never blocks API responses.
- Events and deliveries accumulate in Redis: event:{id} keys (90-day TTL), delivery:{id} keys (14-day TTL), and a growing dispatch:pending list.
- Redis memory is bounded: TTLs ensure keys auto-expire even if never consumed.
- When the events service restarts:
  - Stale deliveries (older than MAX_DELIVERY_AGE_MS, default 24h) are immediately marked FAILED.
  - Fresh deliveries are processed normally via BRPOP.
  - RetentionCleanupService trims orphaned ZSET index entries hourly.
- No data loss for events: event records persist in Redis for 90 days regardless of delivery status.
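During an outage, the backlog size and the staleness cutoff described above can be checked from a shell. The sketch below is illustrative only: `is_stale` and `pending_backlog` are hypothetical helper names, the strict "older than" comparison is an assumption about how the service applies MAX_DELIVERY_AGE_MS, and `pending_backlog` needs `redis-cli` and a reachable Redis.

```bash
#!/usr/bin/env bash
# is_stale CREATED_MS NOW_MS [MAX_AGE_MS]
# Exits 0 when a delivery created at CREATED_MS is older than MAX_AGE_MS at
# NOW_MS. The default mirrors MAX_DELIVERY_AGE_MS (86400000 ms = 24h).
is_stale() {
  local created_ms=$1 now_ms=$2 max_age_ms=${3:-86400000}
  (( now_ms - created_ms > max_age_ms ))
}

# pending_backlog: prints the current length of the dispatch:pending list.
# redis-cli reads the password from the REDISCLI_AUTH environment variable.
pending_backlog() {
  redis-cli -h "${REDIS_HOST:-localhost}" -p "${REDIS_PORT:-6379}" LLEN dispatch:pending
}
```

For example, `is_stale 0 86400001` succeeds (one millisecond past the 24h cutoff), while `is_stale 0 86400000` does not.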

Auto-disable for persistently failing subscriptions

The events service tracks consecutive_failures per subscription. When the counter reaches disable_after_failures (default 10), the subscription transitions to DISABLED and no further deliveries are attempted. The counter resets to 0 on any successful delivery. Re-enable a disabled subscription with PATCH /v1/admin/webhooks/{id} once the receiver is healthy.
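The counter behaviour can be modelled with a few lines of shell. This is a toy simulation for reasoning about the rule, not the service's implementation; the function name `simulate_subscription` and the `ACTIVE` state label are assumptions (this guide only names the `DISABLED` state).

```bash
#!/usr/bin/env bash
# simulate_subscription THRESHOLD OUTCOME...
# Each OUTCOME is "ok" or "fail". Prints the final state and the value of the
# consecutive-failure counter, separated by a space.
simulate_subscription() {
  local threshold=$1 failures=0 state=ACTIVE outcome
  shift
  for outcome in "$@"; do
    # Once disabled, no further deliveries are attempted.
    [ "$state" = DISABLED ] && break
    if [ "$outcome" = ok ]; then
      failures=0                      # any success resets the counter
    else
      failures=$((failures + 1))
      if [ "$failures" -ge "$threshold" ]; then
        state=DISABLED
      fi
    fi
  done
  echo "$state $failures"
}

simulate_subscription 3 fail fail ok fail    # prints "ACTIVE 1"
simulate_subscription 3 fail fail fail ok    # prints "DISABLED 3"
```

The first run shows why an intermittently failing receiver is never disabled: a single success between failures resets the count.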

Stale deliveries (older than MAX_DELIVERY_AGE_MS, default 24h) are marked FAILED without attempting HTTP delivery. This prevents a large backlog from triggering thundering-herd traffic against a receiver after a long events-service outage.

Signing secrets are encrypted at rest with AES-256-GCM using WEBHOOK_SECRET_ENCRYPTION_KEY (v0.1.25.2+). The events service decrypts per delivery; plaintext never lives on disk.

Prometheus metrics

The events service publishes webhook delivery metrics under the cycles_webhook_* namespace on /actuator/prometheus, served on the management port (9980 by default as of v0.1.25.9; 7980 on earlier builds). Update Prometheus scrape targets accordingly; the metric names and labels are unchanged.

Metric	Tags	Description
`cycles_webhook_delivery_attempts_total`	`tenant`, `event_type`	Every outbound HTTP attempt (including retries)
`cycles_webhook_delivery_success_total`	`tenant`, `event_type`, `status_code_family` (`2xx`/`3xx`/`4xx`/`5xx`)	Attempts that received HTTP 2xx
`cycles_webhook_delivery_failed_total`	`tenant`, `event_type`, `reason`	Failed attempts, bucketed by failure reason
`cycles_webhook_delivery_retried_total`	`tenant`, `event_type`	Retry attempts scheduled on the `dispatch:retry` ZSET
`cycles_webhook_delivery_stale_total`	`tenant`	Deliveries auto-failed by the `MAX_DELIVERY_AGE_MS` gate
`cycles_webhook_subscription_auto_disabled_total`	`tenant`, `reason`	Subscriptions transitioned to `DISABLED` after `disable_after_failures`
`cycles_webhook_delivery_latency_seconds`	`tenant`, `event_type`, `outcome`	Timer — HTTP RTT per delivery attempt
`cycles_webhook_events_payload_invalid_total`	`type`, `rule`	Event payload validation discrepancies (no tenant tag — shape issue, not traffic)

The tenant tag on all counters is gated by cycles.metrics.tenant-tag.enabled (default true) — set to false in deployments with many thousands of tenants to bound Prometheus cardinality.

Alert on cycles_webhook_subscription_auto_disabled_total (any increase indicates an unhealthy receiver) and on a sustained rise in cycles_webhook_delivery_failed_total{reason!~"client_4xx"} (failures other than client errors point to dispatch issues).
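For a quick check without a Prometheus server, the counters can be summed straight from the scrape output. The helper below is a sketch: `sum_metric` is a hypothetical name, and it assumes samples carry no trailing timestamp, so the last field on each line is the value.

```bash
#!/usr/bin/env bash
# sum_metric NAME
# Reads Prometheus text exposition on stdin and prints the sum of all samples
# of NAME across label sets, as an integer.
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }                  # skip HELP and TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)     # drop the label set, if present
      if (metric == name) total += $NF
    }
    END { printf "%d\n", total }
  '
}
```

Usage against a local instance would look like `curl -fsS http://localhost:9980/actuator/prometheus | sum_metric cycles_webhook_subscription_auto_disabled_total`; any value above zero means at least one subscription has been auto-disabled since the process started.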

Scaling

Multiple events service instances can safely BRPOP from the same dispatch:pending list. BRPOP is atomic, so each queued delivery is popped by exactly one instance, and no distributed locking is needed.

Next steps

- Webhook Event Delivery Protocol: full event type catalog and delivery specification
- Managing Webhooks: create, test, and monitor webhooks
- Webhook Integrations: PagerDuty, Slack, ServiceNow examples
- Configuration Reference: all events service settings
- Architecture Overview: how the events service fits in the system
