Real-Time Budget Alerts for AI Agents: Designing Cycles' Webhook Event System

Part of: LLM Cost Runtime Control Reference — the full pillar covering causes, enforcement patterns, multi-tenant boundaries, and unit economics.

Consider a common scenario: an infrastructure team has budget dashboards. Prometheus scrapes every 15 seconds. Grafana panels show utilization curves. Alert rules fire when thresholds cross 90%.

An agent hit a retry loop on a Friday afternoon. It burned through $450 of budget in under 3 minutes. The 90% threshold alert fired at minute one. The on-call engineer saw it at minute four — after the Slack notification, after checking context, after pulling up the dashboard. By then the budget was exhausted, 12 other agents in the same workspace were blocked, and a customer-facing workflow was down.

The monitoring worked. The problem was latency. Polling-based alerting has a structural delay between "state changed" and "someone knows about it." For budget enforcement, where a single runaway agent can exhaust funds in seconds, that delay is where the damage happens.

The detection gap

Representative detection latencies for a budget exhaustion event:

Detection Method	Typical Time to Alert	Typical Time to Human Action
Polling dashboard (60s interval)	30-60s	2-5 minutes
Prometheus alert (15s scrape + 1m for-duration)	~75s	3-6 minutes
Webhook event (push on state change)	<1s	1-3 minutes

The webhook doesn't make humans faster. It removes the polling delay from detection: the event is emitted as soon as the budget state changes (typically delivered in under a second), not on the next scrape and not after a for-duration pending window has elapsed.
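To put rough numbers on the gap, the sketch below combines the $450-in-3-minutes burn from the opening scenario with the upper-bound alert latencies from the table. The constant burn rate is a simplifying assumption, and the function and variable names are illustrative only.

```python
# Illustrative arithmetic only: assumes the runaway agent spends at a
# constant rate, using the $450 / 3 minute figure from the scenario above.
BURN_RATE_USD_PER_SEC = 450 / 180  # 2.50 USD per second


def spend_before_alert(alert_latency_sec: float) -> float:
    """Budget consumed between the state change and the alert firing."""
    return BURN_RATE_USD_PER_SEC * alert_latency_sec


# Upper-bound alert latencies from the table above, in seconds.
ALERT_LATENCY_SEC = {
    "polling dashboard": 60,
    "prometheus alert": 75,
    "webhook event": 1,
}

for method, seconds in ALERT_LATENCY_SEC.items():
    print(f"{method}: ~${spend_before_alert(seconds):.2f} spent before the alert")
```

Under these assumptions, the Prometheus path lets roughly $187.50 go out the door before anyone is notified, against about $2.50 for the webhook path.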

This is why we built a webhook event system into Cycles v0.1.25.

41 event types across 6 categories

Note: the counts in this section are as of the post date. The current Admin API EventType enum registers 47 event types across 7 categories (the webhook category was added later). For the live count and per-category breakdown, see the Event Payloads Reference.

Every observable state change in the system produces an event. We organized them into 6 categories covering the full lifecycle:

Category	Count	Covers
budget	16	Created, funded, debited, reset, reset_spent (billing period), frozen, closed, threshold crossed, exhausted, over-limit, debt incurred, burn rate anomaly
reservation	5	Denied, denial rate spike, expired, expiry rate spike, commit overage
tenant	6	Created, updated, suspended, reactivated, closed, settings changed
api_key	6	Created, revoked, expired, permissions changed, auth failed, auth failure rate spike
policy	3	Created, updated, deleted
system	5	Store connection lost/restored, high latency, webhook delivery failed, webhook test

The six events that matter most for incident response:

Event	When It Fires	Why It Matters
`budget.exhausted`	Remaining = 0	All reservations for this scope will be denied until funded
`budget.over_limit_entered`	Debt exceeds overdraft limit	New reservations blocked; operator intervention required
`reservation.denied`	Agent can't reserve budget	Agents are failing — check if budget needs funding or if there's a runaway consumer
`budget.threshold_crossed`	Utilization crosses 80%, 95%, or 100%	Early warning before exhaustion
`api_key.auth_failed`	Authentication attempt with invalid key	Security event — possible credential leak or misconfiguration
`system.store_connection_lost`	Redis connection failed	Infrastructure incident — budget enforcement depends on Redis availability

Every event includes a standard payload: who caused it (actor), what changed (data), where it happened (scope path like tenant:acme-corp/workspace:prod/agent:support-bot), and a millisecond-precision timestamp. Events are emitted by both the runtime enforcement server (reserve/commit operations) and the admin control plane (CRUD operations) — a single webhook subscription captures both.
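As a rough illustration of that payload shape, the sketch below models an event and splits its scope path into levels. The field names (`event_id`, `event_type`, `timestamp_ms`) and the `parse_scope` helper are assumptions for this example; the authoritative schema is in the Event Payloads Reference.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class CyclesEvent:
    # Field names are assumptions for illustration, not the real schema.
    event_id: str
    event_type: str      # e.g. "budget.exhausted"
    actor: str           # who caused the change
    scope: str           # e.g. "tenant:acme-corp/workspace:prod/agent:support-bot"
    timestamp_ms: int    # millisecond-precision Unix timestamp
    data: Dict[str, Any] = field(default_factory=dict)  # what changed

    @property
    def category(self) -> str:
        """The part of the event type before the first dot, e.g. "budget"."""
        return self.event_type.split(".", 1)[0]


def parse_scope(scope: str) -> Dict[str, str]:
    """Split a scope path into {level: name} pairs, preserving order."""
    levels: Dict[str, str] = {}
    for segment in scope.split("/"):
        level, _, name = segment.partition(":")
        if not level or not name:
            raise ValueError(f"malformed scope segment: {segment!r}")
        levels[level] = name
    return levels
```

A receiver could use the parsed scope to route alerts, for example paging only when the `workspace` level is `prod`.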

Architecture: why a separate delivery service

The most important engineering decision in this system: webhook delivery runs as its own service, separate from the runtime enforcement server and the admin API.

Runtime server (port 7878) ──┐
                             ├── LPUSH dispatch:pending
Admin server (port 7979) ────┘
                                    │
                              Redis ─┤
                                    │
Events service (port 7980) ──── BRPOP → HTTP POST with HMAC signature

Three services, three workloads, three scaling profiles:

Service	Workload	Latency Target	Scaling Driver
Runtime (reserve/commit)	Synchronous, hot path	<10ms p99	Agent request volume
Admin (CRUD)	Synchronous, operator-facing	<200ms	Human operator actions
Events (webhook delivery)	Asynchronous, variable latency	Best-effort	Subscription count × event rate

Why not embed delivery in the runtime server? Webhook endpoints are external HTTP services with unpredictable latency. A slow endpoint or DNS timeout would add hundreds of milliseconds to the reserve/commit path. For a system designed to enforce budgets at sub-10ms latency, that's unacceptable. Even running delivery on a background thread doesn't help — thread pool exhaustion from slow endpoints would eventually affect the main request threads.

Why not embed in the admin server? Same problem, different magnitude. Admin API latency matters less (operators tolerate 200ms), but a webhook endpoint that hangs for 30 seconds ties up a thread pool slot. Multiply by 50 subscriptions and a burst of events, and the admin API becomes unresponsive for tenant management.

The shared Redis queue solves both problems. Admin and runtime servers fire and forget: they LPUSH a delivery ID to dispatch:pending and return immediately. The events service does the slow work: load the event, look up the subscription, compute the HMAC signature, make the HTTP call, handle retries. If the events service falls behind, the queue buffers. If the events service is down entirely, events accumulate in Redis with a 90-day TTL and the backlog drains when it restarts, subject to the 24-hour stale-delivery cutoff described under failure handling.

Multiple events service instances can run concurrently. BRPOP is atomic, so each queued delivery ID is handed to exactly one consumer. No distributed locking, no coordination, no split-brain risk. Scale horizontally by adding instances.
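The queue semantics can be sketched without a Redis instance. The class below is an in-process stand-in, not the Cycles implementation: `lpush` adds at the head and `brpop` blocks on the tail, which gives FIFO order and hands each ID to a single consumer. All names are hypothetical; with a real Redis client the equivalent calls would be LPUSH and BRPOP on the `dispatch:pending` key.

```python
import threading
from collections import deque
from typing import List, Optional


class InMemoryDispatchQueue:
    """Stand-in for the Redis list: push at the head, pop from the tail."""

    def __init__(self) -> None:
        self._items = deque()
        self._cond = threading.Condition()

    def lpush(self, delivery_id: str) -> None:
        with self._cond:
            self._items.appendleft(delivery_id)
            self._cond.notify()

    def brpop(self, timeout: float) -> Optional[str]:
        """Block until an item is available or the timeout elapses."""
        with self._cond:
            if not self._cond.wait_for(lambda: bool(self._items), timeout=timeout):
                return None
            return self._items.pop()


def run_workers(queue: InMemoryDispatchQueue, worker_count: int) -> List[List[str]]:
    """Drain the queue with several consumers; return what each one handled."""
    handled: List[List[str]] = [[] for _ in range(worker_count)]

    def worker(slot: int) -> None:
        # Each worker stops once the queue has stayed empty for the timeout.
        while (delivery_id := queue.brpop(timeout=0.2)) is not None:
            handled[slot].append(delivery_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled
```

Because the pop happens under a single lock here (and atomically in Redis), adding workers changes throughput but not which deliveries get picked up.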

Delivery guarantees: at-least-once with HMAC signing

We chose at-least-once delivery over exactly-once. In a distributed system where the webhook receiver is an external HTTP service, exactly-once is impossible without two-phase commit — and two-phase commit across the internet is a fiction. The practical choice is: deliver at least once and give receivers the tools to deduplicate.

Every delivery includes an X-Cycles-Event-Id header containing the event's unique ID. Receivers store processed event IDs and skip duplicates. This is the same deduplication pattern used with Stripe, GitHub, and many other large-scale webhook systems.
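A minimal sketch of receiver-side deduplication follows. It keys on the X-Cycles-Event-Id header described above; the class name and the in-memory set are assumptions for illustration, and a production receiver would more likely use a durable store with an expiry.

```python
from typing import Callable, Mapping


class DeduplicatingReceiver:
    """Processes each event ID at most once, skipping redelivered copies."""

    def __init__(self) -> None:
        # Assumption: an in-memory set stands in for a durable store.
        self._processed_ids = set()

    def handle(self, headers: Mapping[str, str], process: Callable[[], None]) -> bool:
        """Run `process` unless this event ID was already handled.

        Returns True if the event was processed, False if it was a duplicate.
        """
        event_id = headers["X-Cycles-Event-Id"]
        if event_id in self._processed_ids:
            return False
        process()
        # Record the ID only after success, so a crash mid-processing
        # leaves the event eligible for the next retry.
        self._processed_ids.add(event_id)
        return True
```

Recording the ID after processing rather than before trades a small chance of double-processing for never silently dropping an event, which is the same trade-off at-least-once delivery makes on the sending side.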

Why HMAC-SHA256?

We evaluated four approaches for webhook payload verification:

Approach	Proves Identity	Proves Integrity	Setup Complexity	Industry Standard
Bearer token in header	Yes	No	Low	Common but incomplete
IP allowlisting	Partial	No	Medium	Brittle with CDNs/proxies
mTLS	Yes	Yes	High	Heavy for webhook receivers
HMAC-SHA256	Yes	Yes	Low	GitHub, Stripe, Slack

HMAC-SHA256 proves both identity (the sender knows the shared secret) and integrity (the body hasn't been modified in transit). It requires no certificate infrastructure, no IP management, and no special HTTP client configuration. Receivers can verify a signature in a few lines of code using the standard HMAC libraries available in most languages.

The signature is sent in the X-Cycles-Signature header as sha256=<hex>, matching GitHub's webhook signature format. Signing secrets can be encrypted at rest in Redis using AES-256-GCM (enabled via the WEBHOOK_SECRET_ENCRYPTION_KEY environment variable). When configured, a compromise of the Redis data store doesn't expose the signing secrets.
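Verification on the receiving side might look like the sketch below. It assumes the signature is an HMAC-SHA256 over the raw request body, keyed with the subscription's signing secret, and formatted as `sha256=<hex>`. The function names are illustrative; check the integration guides for the exact signing input.

```python
import hashlib
import hmac


def compute_signature(secret: bytes, body: bytes) -> str:
    """Return the expected X-Cycles-Signature value for a request body."""
    digest = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check the header against the body using a constant-time comparison."""
    expected = compute_signature(secret, body)
    # Compare as bytes so non-ASCII header values cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

Verify against the raw bytes as received, before any JSON parsing or re-serialization, since re-encoding can change whitespace or key order and invalidate the signature.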

Failure handling: what happens when things break

This is the section that matters most for on-call engineers evaluating whether to trust this system with their alerting pipeline.

Scenario	What Happens	Recovery
Endpoint returns 500	Retry with exponential backoff (default: 1s, 2s, 4s, 8s, 16s)	Auto-recovers when endpoint returns 2xx
Endpoint unreachable	Same retry sequence	Auto-recovers when reachable
Endpoint down for hours	Retries exhaust (5 by default) → delivery marked FAILED	Re-enable subscription via API, replay missed events
10 consecutive failures	Subscription auto-disabled (status → DISABLED)	Fix endpoint, PATCH subscription to ACTIVE (resets counter)
Events service down	Events accumulate in Redis (90-day TTL)	Drains backlog on restart; deliveries older than 24h auto-fail
Redis down	Budget enforcement is unavailable; event delivery enqueue fails (logged, does not block API callers)	Enforcement and event delivery resume when Redis recovers
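The default retry schedule in the table can be reproduced with a few lines. This is a sketch of the backoff arithmetic and a retry loop, not the events service's code; it assumes one initial attempt followed by up to five retries, and the `send` and `sleep` callables are injected so the loop can be exercised without a network.

```python
from typing import Callable, List


def backoff_schedule(max_retries: int = 5, base_delay_sec: float = 1.0,
                     factor: float = 2.0) -> List[float]:
    """Delays before each retry: base, base*factor, base*factor^2, ..."""
    return [base_delay_sec * factor ** attempt for attempt in range(max_retries)]


def deliver_with_retries(send: Callable[[], bool],
                         sleep: Callable[[float], None],
                         schedule: List[float]) -> str:
    """Attempt delivery once, then retry after each delay in the schedule.

    Assumption: the first attempt is not counted among the retries.
    Returns "SUCCESS" on the first 2xx-equivalent result, else "FAILED".
    """
    if send():
        return "SUCCESS"
    for delay in schedule:
        sleep(delay)
        if send():
            return "SUCCESS"
    return "FAILED"
```

With the defaults, the delays sum to 31 seconds, so an endpoint that stays down longer than that will see individual deliveries marked FAILED well before the subscription itself is disabled.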

Two design decisions are worth calling out:

Stale delivery protection. If the events service is down for a week and then restarts, it won't deliver week-old webhook notifications. Deliveries older than 24 hours (configurable via MAX_DELIVERY_AGE_MS) are automatically marked FAILED. This prevents flooding receivers with irrelevant historical alerts. If you need those events, use the replay API to selectively re-deliver.
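The staleness check itself is a single comparison. The sketch below assumes each delivery records its creation time in milliseconds; the function name and the strict greater-than boundary are illustrative choices, with the default mirroring the 24-hour value of MAX_DELIVERY_AGE_MS.

```python
# 24 hours in milliseconds, matching the documented default.
DEFAULT_MAX_DELIVERY_AGE_MS = 24 * 60 * 60 * 1000


def is_stale(created_at_ms: int, now_ms: int,
             max_age_ms: int = DEFAULT_MAX_DELIVERY_AGE_MS) -> bool:
    """True if the delivery is older than the cutoff and should be marked FAILED."""
    return now_ms - created_at_ms > max_age_ms
```

A worker would run this check right after dequeuing a delivery and before making any HTTP call, so a long outage produces a batch of FAILED records rather than a burst of outdated alerts.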

Auto-disable with manual re-enable. After 10 consecutive delivery failures (configurable via disable_after_failures), the subscription is automatically disabled. This prevents hammering a dead endpoint for hours. Re-enabling is a single API call that resets the failure counter. We chose manual re-enable over automatic re-enable to avoid surprise traffic spikes when endpoints recover.
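The auto-disable behavior amounts to a small state machine. In the sketch below, the class and method names are hypothetical, and resetting the counter on any successful delivery is an assumption based on the word "consecutive".

```python
from dataclasses import dataclass


@dataclass
class SubscriptionState:
    status: str = "ACTIVE"
    consecutive_failures: int = 0
    disable_after_failures: int = 10  # documented default

    def record_delivery(self, success: bool) -> None:
        """Update the failure counter and disable once the limit is reached."""
        if success:
            # Assumption: any success resets the consecutive-failure count.
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.disable_after_failures:
            self.status = "DISABLED"

    def reenable(self) -> None:
        """Manual re-enable, as with a PATCH to ACTIVE: resets the counter."""
        self.status = "ACTIVE"
        self.consecutive_failures = 0
```

Nothing in this model ever moves a subscription from DISABLED back to ACTIVE on its own, which is the point of requiring an explicit operator action.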

Retention and resource management

Event data doesn't grow without bounds:

Data	TTL	Cleanup
Event records (`event:{id}`)	90 days	Redis EXPIRE on creation
Delivery records (`delivery:{id}`)	14 days	Redis EXPIRE on creation
ZSET index entries	N/A	Hourly trimming via `RetentionCleanupService`
Dispatch queue (`dispatch:pending`)	Self-draining	Consumed by BRPOP

All TTLs are configurable via environment variables (EVENT_TTL_DAYS, DELIVERY_TTL_DAYS) — no code changes, no redeployment. The events service is optional: if you don't deploy it, admin and runtime servers are completely unaffected. Events accumulate in Redis until either the TTL expires or you start the events service.
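As an illustration of how such settings might be resolved, the sketch below reads a TTL in days from the environment and converts it to the seconds value a Redis EXPIRE call expects. The helper name and validation are assumptions; only the variable names and the 90-day and 14-day defaults come from the table above.

```python
import os
from typing import Mapping

SECONDS_PER_DAY = 86_400


def ttl_seconds(name: str, default_days: int,
                env: Mapping[str, str] = os.environ) -> int:
    """Resolve a TTL from an environment variable expressed in days."""
    raw = env.get(name)
    days = default_days if raw is None else int(raw)
    if days <= 0:
        raise ValueError(f"{name} must be a positive number of days, got {days}")
    return days * SECONDS_PER_DAY


# Defaults taken from the retention table above.
EVENT_TTL_SEC = ttl_seconds("EVENT_TTL_DAYS", default_days=90)
DELIVERY_TTL_SEC = ttl_seconds("DELIVERY_TTL_DAYS", default_days=14)
```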

Integration: PagerDuty in 5 minutes

Creating a webhook subscription and routing events to PagerDuty takes two steps:

```bash
# 1. Create subscription for critical budget events
curl -X POST http://localhost:7979/v1/admin/webhooks \
  -H "X-Admin-API-Key: $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-middleware.example.com/cycles-to-pagerduty",
    "event_types": [
      "budget.exhausted",
      "budget.over_limit_entered",
      "reservation.denied"
    ]
  }'
```

The response includes a signing secret (returned once — store it). Your middleware transforms Cycles events into PagerDuty Events API v2 format, mapping budget.exhausted to severity critical and reservation.denied to warning. Use the event_id as PagerDuty's dedup_key to correlate retried deliveries to the same alert.
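The transformation step in the middleware could be sketched as follows. The Cycles field names (`event_id`, `event_type`, `scope`, `data`) are assumptions about the payload, the `critical` severity for budget.over_limit_entered is a guess since only the other two mappings are stated, and the `info` fallback is arbitrary. The output follows the PagerDuty Events API v2 trigger shape.

```python
from typing import Any, Dict

SEVERITY_BY_EVENT_TYPE = {
    "budget.exhausted": "critical",
    "budget.over_limit_entered": "critical",  # assumption: not stated in the post
    "reservation.denied": "warning",
}


def to_pagerduty_event(event: Dict[str, Any], routing_key: str) -> Dict[str, Any]:
    """Map a Cycles event dict to a PagerDuty Events API v2 trigger payload."""
    severity = SEVERITY_BY_EVENT_TYPE.get(event["event_type"], "info")
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Same event_id on every retry, so PagerDuty folds them into one alert.
        "dedup_key": event["event_id"],
        "payload": {
            "summary": f'{event["event_type"]} at {event["scope"]}',
            "source": "cycles",
            "severity": severity,
            "custom_details": event.get("data", {}),
        },
    }
```

The middleware would verify the signature first, then POST the resulting dict as JSON to PagerDuty's Events API v2 enqueue endpoint.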

We have full integration guides with code examples for PagerDuty, Slack, Datadog, Microsoft Teams, Opsgenie, and ServiceNow, plus a custom receiver pattern with signature verification in Python, Node.js, and Go.

Tenants can also create their own webhook subscriptions via /v1/webhooks using their API key — restricted to budget, reservation, and tenant events (27 of 41 types as of post date; the live count is 27 of 47 — see the Webhook Event Delivery Protocol for current category eligibility). Admin-only events (api_key, policy, webhook, system) require admin key access.

Webhook URLs are validated at creation time with SSRF protection enabled by default: RFC 1918 private IP ranges, loopback, and link-local addresses are blocked, and HTTPS is required in production. These can be configured via PUT /v1/admin/config/webhook-security for environments that need internal endpoint access.
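A simplified version of that check is sketched below. It covers only literal IP addresses and the `localhost` name; a real validator would also resolve hostnames and check every resolved address. The function names are hypothetical and this is not the Cycles implementation.

```python
import ipaddress
from urllib.parse import urlsplit

RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_blocked_address(address: str) -> bool:
    """True for loopback, link-local, and RFC 1918 private addresses."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback or ip.is_link_local:
        return True
    return ip.version == 4 and any(ip in net for net in RFC1918_NETWORKS)


def is_allowed_webhook_url(url: str, require_https: bool = True) -> bool:
    """Apply scheme and address checks to a candidate webhook URL."""
    parts = urlsplit(url)
    if require_https and parts.scheme != "https":
        return False
    host = parts.hostname
    if not host or host == "localhost":
        return False
    try:
        return not is_blocked_address(host)
    except ValueError:
        # Not an IP literal. Assumption: hostnames pass here; a real
        # validator would resolve them and re-check the addresses.
        return True
```

Checking at creation time alone leaves a DNS-rebinding window, so a stricter design would repeat the address check at delivery time as well.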

What's next

The v0.1.25 event system delivers threshold alerts at the default levels (80%, 95%, 100% utilization). Coming next on the implementation roadmap:

- Per-subscription threshold customization: override the default 80%/95%/100% thresholds for specific subscriptions, e.g., a high-priority workspace that should alert at 50%
- Burn rate anomaly detection: alert when spend rate exceeds the rolling average by a configurable multiplier
- Rate spike detection: alert on reservation denial rate spikes and expiry rate spikes across rolling windows

These are defined in the v0.1.25 spec as WebhookThresholdConfig. The schema is finalized; server-side implementation is on the roadmap.
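To make the threshold behavior concrete, here is a sketch of crossing detection with an optional per-subscription override. It does not reproduce the WebhookThresholdConfig schema; the function name, the fractional representation of utilization, and the "fires once when utilization moves up through a level" rule are all assumptions.

```python
from typing import List, Sequence

# Default levels from the post, expressed as fractions of the budget.
DEFAULT_THRESHOLDS = (0.80, 0.95, 1.00)


def crossed_thresholds(previous: float, current: float,
                       thresholds: Sequence[float] = DEFAULT_THRESHOLDS) -> List[float]:
    """Thresholds passed when utilization rises from `previous` to `current`.

    A level counts as crossed when it lies in the half-open interval
    (previous, current], so staying above a level does not re-fire it.
    """
    return [level for level in sorted(thresholds) if previous < level <= current]
```

A single large debit can cross more than one level at once, in which case this returns each of them, and a hypothetical override such as `(0.50,)` would alert a high-priority workspace at half utilization.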

Get started:

- Managing Webhooks: create, test, monitor, and troubleshoot subscriptions
- Webhook Integrations: PagerDuty, Slack, Datadog, Teams, Opsgenie, ServiceNow code examples
- Webhooks and Events Concepts: architecture, delivery semantics, security model
- Deploy the Full Stack: admin + runtime + events in one command
