Rate limits

Driftstack uses a per-account token-bucket scheme to keep one noisy customer from degrading the platform for everyone else. This page documents the buckets, the response headers your client should consume, and how to react to 429s gracefully.

How the buckets work

Each request consumes a token from one or more named buckets. Buckets refill at a tier-dependent rate. If any bucket is empty when your request lands, you get a 429 response with a Retry-After header.

Buckets are per account, not per API key. If you mint 10 keys to spread your load, you're still hitting the same buckets — the limit is on the account.
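The behaviour described above can be sketched as a small simulation. This is a minimal illustration of a generic token bucket, not Driftstack's server code: the `TokenBucket` class, the `admit` helper, and the choice that a rejected request consumes no tokens are all assumptions made for the example. The capacities and refill rates are the Personal figures from the bucket reference.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Illustrative token bucket; not Driftstack's actual implementation."""
    capacity: float            # burst: the most tokens the bucket can hold
    refill_per_min: float      # sustained rate: tokens added per minute
    tokens: float = field(init=False)
    updated_at: float = 0.0    # time of the last refill, in seconds

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # buckets start full

    def refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_min / 60.0)
        self.updated_at = now


def admit(buckets: list[TokenBucket], now: float) -> bool:
    """Return True if every bucket has a token; False models a 429."""
    for bucket in buckets:
        bucket.refill(now)
    if any(bucket.tokens < 1.0 for bucket in buckets):
        return False  # assumption: a rejected request consumes nothing
    for bucket in buckets:
        bucket.tokens -= 1.0
    return True


# A POST /v1/sessions call is checked against both buckets.
global_bucket = TokenBucket(capacity=120, refill_per_min=120)
create_bucket = TokenBucket(capacity=10, refill_per_min=2)
results = [admit([global_bucket, create_bucket], now=0.0) for _ in range(11)]
```

In this sketch the first ten calls at `now=0.0` are admitted and the eleventh is rejected, because `sessions:create` is empty even though `global` still has tokens. The buckets are shared objects, which mirrors the point that extra API keys do not add capacity.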

Bucket reference

Three buckets are enforced today: global (every authenticated request), sessions:create (POST /v1/sessions), and agent_sessions:message (POST /v1/agent-sessions/:id/messages). Burst is the maximum bucket capacity; sustained throughput is the refill rate.

Bucket	Personal	API Builder	What it covers
`global`	120 burst · 120 req/min	1,800 burst · 1,800 req/min	Every authenticated request draws a token from this bucket. The default rate-limit catch-all.
`sessions:create`	10 burst · 2 req/min	60 burst · 60 req/min	POST /v1/sessions — burst-sensitive; throttled tighter than `global` to keep one customer from saturating the fleet.
`agent_sessions:message`	40 burst · 20 req/min	300 burst · 180 req/min	POST /v1/agent-sessions/:id/messages — isolated from global so an LLM-driven message loop can't drain the global cap.

Higher tiers (API Scale, Enterprise) get larger buckets — see the tier comparison for the full matrix. Free uses the same bucket sizes as Solo Manual.
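To make the burst and sustained figures concrete, the arithmetic can be sketched as follows. The helper names are hypothetical, and the formulas assume a plain token bucket that starts full and refills continuously; the real refill granularity may differ.

```python
def max_requests(burst: int, per_min: float, minutes: float) -> int:
    """Upper bound on admitted requests in a window that starts with a full bucket."""
    return int(burst + per_min * minutes)


def minutes_to_refill(burst: int, per_min: float) -> float:
    """Time for an empty bucket to return to full capacity."""
    return burst / per_min


# Personal tier, sessions:create (10 burst, 2 req/min).
personal_create_10_min = max_requests(10, 2, 10)
personal_create_refill = minutes_to_refill(10, 2)
```

Under these assumptions, a Personal account can create at most 30 sessions in a ten-minute window (10 from the burst plus 2 per minute), and a drained `sessions:create` bucket takes 5 minutes to fill again.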

Response headers

Every API response carries headers describing the calling account's rate-limit state for the bucket hit by the request:

Header	Value
`X-RateLimit-Limit`	The bucket capacity (tokens).
`X-RateLimit-Remaining`	Tokens left after this request.
`X-RateLimit-Reset`	Unix timestamp (in seconds) at which the bucket will be full again.
`X-RateLimit-Bucket`	The bucket name (e.g. `sessions:create`).

Track X-RateLimit-Remaining against a low-water mark in your client; if it drops below, say, 20% of the limit, slow your request rate proactively rather than waiting for the 429.
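A low-water check along these lines might look like the sketch below. The function names and the 20% default are illustrative, and `headers` stands for whatever mapping your HTTP client exposes for response headers.

```python
from typing import Mapping


def _normalise(headers: Mapping[str, str]) -> dict:
    # HTTP header names are case-insensitive, so lower-case them before lookup.
    return {name.lower(): value for name, value in headers.items()}


def should_slow_down(headers: Mapping[str, str], low_water: float = 0.2) -> bool:
    """True when remaining tokens fall below `low_water` of the bucket capacity."""
    h = _normalise(headers)
    limit = int(h["x-ratelimit-limit"])
    remaining = int(h["x-ratelimit-remaining"])
    return remaining < low_water * limit


def seconds_until_full(headers: Mapping[str, str], now: float) -> float:
    """Seconds until X-RateLimit-Reset, clamped at zero."""
    reset = int(_normalise(headers)["x-ratelimit-reset"])
    return max(0.0, reset - now)
```

A client could call `should_slow_down` after every response and, when it returns true, pause or reduce concurrency for the bucket named in `X-RateLimit-Bucket`.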

When you hit a 429

The body follows RFC 7807 (application/problem+json) — flat keys, no error envelope. The type URI dereferences to the long-form explanation; the retry_after_seconds extension carries the same value as the Retry-After header for clients that prefer to read it from the body.

HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Bucket: sessions:create
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1715472012
Content-Type: application/problem+json

{
  "type": "https://errors.driftstack.dev/rate-limited",
  "title": "Too Many Requests",
  "status": 429,
  "detail": "Rate limit for \"sessions:create\" exceeded for tier \"solo_manual\".",
  "retry_after_seconds": 12
}
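One way to read a response like the one above is sketched below. The `Problem` class and both helpers are hypothetical names; the sketch assumes Retry-After is sent as an integer number of seconds, as in the example, and the one-second fallback is an arbitrary choice.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Problem:
    type: str
    title: str
    status: int
    detail: str
    retry_after_seconds: Optional[int]


def parse_problem(body: str) -> Problem:
    """Parse a flat application/problem+json body."""
    data = json.loads(body)
    return Problem(
        type=data.get("type", "about:blank"),  # RFC 7807 default when absent
        title=data.get("title", ""),
        status=int(data.get("status", 0)),
        detail=data.get("detail", ""),
        retry_after_seconds=data.get("retry_after_seconds"),
    )


def retry_delay(retry_after_header: Optional[str], problem: Problem) -> int:
    """Prefer the Retry-After header; fall back to the body extension."""
    if retry_after_header is not None and retry_after_header.strip().isdigit():
        return int(retry_after_header)
    if problem.retry_after_seconds is not None:
        return problem.retry_after_seconds
    return 1  # assumed conservative default when neither value is present
```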

Your client should:

1. Stop sending requests to the same bucket immediately.
2. Wait at least Retry-After seconds.
3. Resume with reduced concurrency; gradually ramp back up.

Do not retry in a tight loop without backoff. We log sustained 429s as abuse and may rate-limit further or temporarily disable your key.
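The recommended client behaviour could be implemented roughly as follows. This is a sketch under assumptions: `send` is a hypothetical callable that performs one request and returns `(status, retry_after_seconds)`, the exponential schedule and 10% jitter are illustrative defaults, and `ConcurrencyGovernor` only tracks a suggested cap that your own worker pool would have to enforce.

```python
import random
import time
from typing import Callable, Optional, Tuple


class ConcurrencyGovernor:
    """Additive-increase / multiplicative-decrease cap on in-flight requests."""

    def __init__(self, ceiling: int, floor: int = 1) -> None:
        self.ceiling = ceiling
        self.floor = floor
        self.limit = ceiling

    def on_rate_limited(self) -> None:
        self.limit = max(self.floor, self.limit // 2)   # back off sharply

    def on_success(self) -> None:
        self.limit = min(self.ceiling, self.limit + 1)  # ramp up gradually


def send_with_backoff(
    send: Callable[[], Tuple[int, Optional[float]]],
    governor: ConcurrencyGovernor,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it stops returning 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, retry_after = send()
        if status != 429:
            governor.on_success()
            return status
        governor.on_rate_limited()
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        backoff = min(max_delay, base_delay * 2 ** attempt)
        delay = max(retry_after or 0.0, backoff)
        # Jitter is added on top, so the wait never undercuts Retry-After.
        sleep(delay + random.uniform(0.0, 0.1 * delay))
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```

Taking the larger of Retry-After and the exponential backoff means the server's hint is always honoured, while repeated 429s still lengthen the wait. Halving the cap on each 429 and adding one slot per success gives the gradual ramp-up described in step 3.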

Concurrent sessions

Separate from per-bucket request rate, your tier caps the number of concurrently running sessions. The caps mirror TIER_CONCURRENT_SESSION_LIMITS exactly; see /docs/concurrency for the authoritative table and backoff guidance.

Exceeding the concurrency cap returns 429 Too Many Requests with the https://errors.driftstack.dev/concurrency-limit problem-type — distinct from rate-limit 429s, which use the https://errors.driftstack.dev/rate-limited type. Dispatch on the type URI, not the status code.
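Dispatching on the type URI might look like this minimal sketch. The two URIs come from this page; the function name and the returned labels are hypothetical, and `problem` is the decoded JSON body of the 429 response.

```python
RATE_LIMITED = "https://errors.driftstack.dev/rate-limited"
CONCURRENCY_LIMIT = "https://errors.driftstack.dev/concurrency-limit"


def classify_429(problem: dict) -> str:
    """Map a 429 problem document to a handling strategy label."""
    problem_type = problem.get("type")
    if problem_type == RATE_LIMITED:
        return "wait_for_retry_after"      # request-rate bucket is empty
    if problem_type == CONCURRENCY_LIMIT:
        return "wait_for_session_to_end"   # too many sessions running at once
    return "unknown"                       # treat unrecognised types conservatively
```

Keeping the two cases apart matters because waiting out Retry-After does not free a concurrency slot; only a running session finishing does.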

Requesting a temporary override

For one-off events (load tests, customer-facing demos), email [email protected] describing the bucket(s), multiplier, and duration you need. For paid tiers, we generally approve up to 5× for up to 48 hours on the spot. Anything larger needs a quick conversation about why.

Overrides are visible in your dashboard under Settings → Rate limits. They expire automatically at the configured time.

Testing rate-limit behaviour locally

The self-hosted dev stack (see the self-hosted mac local runbook) runs the same rate-limit code path against a local Redis. Use it to exercise your 429 handling without consuming real production budget.

Support

For sustained 429s that you didn't expect, or to discuss a production-impacting limit, contact [email protected].