Force-Releasing Stuck Reservations as an Operator
Reservations can get stuck. A client crashes without committing. A network partition hangs a request indefinitely. A bug in an agent leaves a reservation open past its intended lifetime. The budget stays reserved, and — because remaining = allocated - spent - reserved - debt — other work starts getting denied even though no actual money was spent.
The normal path is to wait for the reservation to expire (ttl_ms plus the grace period) and let the server-side sweeper release it. That is usually fine. But during an incident you often cannot wait: customer-facing agents are failing reservation requests right now because budget is parked on a hung run that has already been cancelled upstream.
For this case the runtime plane supports admin-on-behalf-of release on POST /v1/reservations/{id}/release (added in cycles-server v0.1.25.8). Operators can force-release any reservation by presenting both the tenant-scoped X-Cycles-API-Key and the X-Admin-API-Key on the same request. Every such call is tagged in the audit log so the action is never invisible.
When to reach for this
Use force-release only when you cannot wait for normal expiry and you have confirmed the downstream work is truly dead. Releasing a reservation whose client is still executing will cause the client's commit to land on a freshly-restored balance — double-counting. If there is any doubt, wait for expiry.
Dual-authentication request
The force-release call is the standard release endpoint with a second header:
curl -X POST "http://localhost:7878/v1/reservations/rsv_abc123/release" \
-H "X-Cycles-API-Key: $TENANT_API_KEY" \
-H "X-Admin-API-Key: $ADMIN_KEY" \
-H "Content-Type: application/json" \
-d '{
"idempotency_key": "incident-2026-04-17-release-rsv_abc123",
"reason": "Force-release during incident INC-842; client confirmed dead by oncall"
}'The runtime server:
- Validates the
X-Admin-API-Keyagainst its localADMIN_API_KEYenv var (must match the admin server's key — deploy them with the same value). - Validates the
X-Cycles-API-Keyand enforces that the reservation belongs to that key's tenant. - Runs the
release.luascript to returnreservedbudget to the pool. - Writes an audit entry with
actor_type = admin_on_behalf_of, the operator's admin key fingerprint, and the tenant key's ID.
Response is identical to a normal release:
{
"reservation_id": "rsv_abc123",
"status": "RELEASED",
"released_at_ms": 1744905600000
}ADMIN_API_KEY must be the same on both planes
Admin-on-behalf-of authenticates on the runtime plane, so cycles-server and cycles-server-admin must share the same ADMIN_API_KEY. If the runtime plane has a different value or none at all, the dual-auth header is rejected with 401 UNAUTHORIZED and the release does not happen.
Finding the reservation to release
You usually do not know the reservation_id off the top of your head. Two common paths to find it:
From an idempotency key
If the client was built to generate idempotency keys for reservations before calling reserve (which it should be — that is the canonical recovery pattern), you can look up the reservation from the key the application last logged:
curl -G "http://localhost:7878/v1/reservations" \
-H "X-Cycles-API-Key: $TENANT_API_KEY" \
--data-urlencode "idempotency_key=run-ac35f-step-4" | jq .Idempotency keys are unique per (tenant, endpoint, key), so this returns at most one match.
From a filtered listing
If the client did not log the key, find it by subject and status:
curl -G "http://localhost:7878/v1/reservations" \
-H "X-Cycles-API-Key: $TENANT_API_KEY" \
--data-urlencode "status=ACTIVE" \
--data-urlencode "app=support-bot" \
--data-urlencode "sort_by=expires_at_ms" \
--data-urlencode "sort_dir=asc" \
--data-urlencode "limit=20" | jq .Oldest-expiring first is the most operationally useful view — hung reservations usually have expires_at_ms well in the past. See Reservation Recovery and Listing for the full sort parameter catalog.
Audit trail
Every force-release is captured in the audit log. Query it from the admin plane:
curl -G "http://localhost:7979/v1/admin/audit/logs" \
-H "X-Admin-API-Key: $ADMIN_KEY" \
--data-urlencode "action_kind=reservation.release" \
--data-urlencode "actor_type=admin_on_behalf_of" | jq .Fields to expect on the audit entry:
| Field | Meaning |
|---|---|
action_kind | reservation.release |
actor_type | admin_on_behalf_of (distinguishes from tenant-initiated releases) |
actor_admin_key_id | Fingerprint of the X-Admin-API-Key used |
actor_api_key_id | ID of the tenant key used |
tenant_id | Tenant the reservation belonged to |
reservation_id | The reservation |
reason | Free-form text from the request body, if provided |
idempotency_key | From the request body |
The audit entry is retained under the authenticated-audit retention policy (default 400 days in v0.1.25.20+). This comfortably covers SOC2 audit windows.
Recovery checklist
A pragmatic runbook for hung-reservation incidents:
- Confirm the client is dead. Check logs, look for the idempotency key, grep for the reservation ID in traces. If the client is still running, do not force-release — either wait for TTL or release from the client itself.
- Find the reservation. Use idempotency key lookup first, filtered listing second.
- Inspect it.
GET /v1/reservations/{id}returns the full state — status, subject, reserved amount, TTL, and finalized time (null if still ACTIVE). - Decide: release vs wait. If
expires_at_msis within minutes, waiting is safer. If it is hours away and budget is being denied right now, force-release. - Force-release with a reason. Include the incident ID and the reason in both
idempotency_keyandreason— the audit entry is the record of why. - Verify. Re-fetch the reservation; it should now be
RELEASEDwith afinalized_at_ms. - Post-incident. Document the hung reservation in the incident report. If the pattern repeats, look at shortening client TTLs or adding heartbeat extension.
Dashboard equivalent
The Reservations page in the Cycles Admin Dashboard exposes this exact flow — filter to status=ACTIVE, sort by expires_at_ms ascending, click a row, and use the Force Release button. The dashboard sends the dual-auth call through its nginx reverse proxy to the runtime plane.
Failure modes
- 401 UNAUTHORIZED — either the admin key is wrong or the runtime server does not have
ADMIN_API_KEYconfigured (dual-auth is rejected instead of silently falling back). - 403 FORBIDDEN — the tenant key's tenant does not own the reservation. Tenant isolation is enforced even for admin-on-behalf-of calls.
- 404 NOT_FOUND — the reservation never existed.
- 409 CONFLICT — the reservation is already
COMMITTED,RELEASED, orEXPIRED. The release is a no-op; the response echoes the current state. - 410 RESERVATION_EXPIRED — the reservation already expired before you got to it; the budget is already back.
Next steps
- Reservation Recovery and Listing — listing and sort parameters
- Production Operations Guide — broader runbook patterns
- Admin API reference — audit log query endpoints
- Security Hardening — protecting the admin key across planes