Incidents & SLA
This page covers what happens during a Driftstack outage: how we detect, communicate, and resolve incidents, and the credit policy if we miss our SLA.
Status page
status.driftstack.dev is the single source of truth during an incident. The same data the status page renders is also published via GET /v1/status (overall and per-component status) and GET /v1/status/incidents (recent and live incidents). To be notified by email when an incident is filed or resolved, subscribe via POST /v1/status/subscribe; see /docs/status-subscriptions.
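For scripted checks, the two read endpoints can be polled with the Python standard library. This is a minimal sketch, not an official client: the base URL is an assumption (api.driftstack.dev is listed as an SLA target on this page, but no base URL is stated here), the helper names are hypothetical, and the response body is returned undecoded beyond JSON parsing because its shape is not documented on this page.

```python
import json
import urllib.request

# Assumption: the public API is served from this host. Adjust if your
# account uses a different base URL.
BASE_URL = "https://api.driftstack.dev"

# The two read-only status endpoints named on this page.
STATUS_PATHS = {
    "overall": "/v1/status",
    "incidents": "/v1/status/incidents",
}


def status_url(kind: str, base_url: str = BASE_URL) -> str:
    """Build the URL for one of the public status endpoints."""
    return base_url.rstrip("/") + STATUS_PATHS[kind]


def fetch_status(kind: str, timeout: float = 10.0):
    """GET a status endpoint and return the parsed JSON body unchanged."""
    request = urllib.request.Request(
        status_url(kind),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_status("incidents")` would return whatever JSON the endpoint serves; inspect it before relying on specific field names.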
Severity ladder
| Severity | Definition | First update | Update cadence |
|---|---|---|---|
| Critical | Core API down across all customers, or data-loss risk. | ≤ 15 min | Every 30 min until resolved. |
| Major | API degraded (>5% error rate) OR a critical surface (auth, sessions) unavailable for a subset of customers. | ≤ 30 min | Every 60 min. |
| Minor | Single non-critical surface (dashboard, an SDK build pipeline) degraded. | ≤ 60 min | At resolution. |
| Maintenance | Planned change with potential impact. Always announced ≥48h in advance. | Pre-announced | Start + end. |
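If you track our updates from your own tooling, the ladder above can be encoded as a small lookup. The sketch below is illustrative only: the function names are hypothetical, and it covers the three unplanned severities (Maintenance is pre-announced, so it has no first-update deadline to compute).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# (first-update deadline, recurring cadence) in minutes, taken from the
# severity table. A cadence of None means the next scheduled update is
# the resolution notice itself.
LADDER = {
    "Critical": (15, 30),
    "Major": (30, 60),
    "Minor": (60, None),
}


def first_update_due(severity: str, opened_at: datetime) -> datetime:
    """Latest time by which the first public update is expected."""
    first_minutes, _ = LADDER[severity]
    return opened_at + timedelta(minutes=first_minutes)


def next_update_due(severity: str, last_update_at: datetime) -> Optional[datetime]:
    """Latest time for the following update, or None if only resolution remains."""
    _, cadence_minutes = LADDER[severity]
    if cadence_minutes is None:
        return None
    return last_update_at + timedelta(minutes=cadence_minutes)


opened = datetime(2026, 5, 9, 14, 23, tzinfo=timezone.utc)
print(first_update_due("Critical", opened).isoformat())
```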
Detection
Three signals trigger an incident:
- V-295b health probes: a 60-second poller against /v1/health and the per-region API endpoints. Three consecutive failures auto-create a Critical incident.
- Customer reports: emails to [email protected] and Slack channel monitoring. We acknowledge within 30 min during EU business hours.
- Internal alerting: Sentry + cost-monitoring thresholds page on-call. Anything that warrants customer comms gets escalated to a public status-page entry.
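The "three consecutive failures" rule for health probes can be sketched as a small counter. This is an illustration of the rule as stated above, not our probe implementation: the class and method names are hypothetical, and the `resolve` step is an assumption about how the counter would be reset once an incident closes.

```python
from dataclasses import dataclass

# "Three consecutive failures" from the detection list above.
FAILURE_THRESHOLD = 3


@dataclass
class ProbeTracker:
    consecutive_failures: int = 0
    incident_open: bool = False

    def record(self, ok: bool) -> bool:
        """Record one probe result.

        Returns True only on the probe that should open a Critical
        incident, so repeated failures do not open duplicates.
        """
        if ok:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= FAILURE_THRESHOLD and not self.incident_open:
            self.incident_open = True
            return True
        return False

    def resolve(self) -> None:
        """Assumed reset once the incident is marked resolved."""
        self.incident_open = False
        self.consecutive_failures = 0


tracker = ProbeTracker()
results = [tracker.record(ok) for ok in (True, False, False, True, False, False, False, False)]
print(results)
```

A single success resets the count, so two failures followed by a success and two more failures do not open an incident.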
Customer communications during an incident
- Status page entry filed with severity + title + affected components.
- Email fan-out to confirmed /v1/status/subscribe subscribers.
- Progress updates on the cadence above. Each update appends to the incident's timeline (visible via GET /v1/status/incidents).
- Resolution marks the incident resolved and triggers a final email to subscribers with a root-cause summary + remediation steps.
- Postmortem for Critical / Major incidents published within 7 business days on the public status page as a permanent entry under the resolved incident. Minor incidents get an inline summary on the resolved status entry.
Note: incident.created, incident.updated, and incident.resolved are admin-audit / internal SSE event types. They are not yet in SubscribableWebhookEventTypeSchema, so they can't be the target of a POST /v1/webhooks subscription. Email subscription is the customer-facing notification path today.
SLA + credit policy
Driftstack's SLA is on the API control plane (session create + lifecycle endpoints). The dashboard, SDK distribution, and any features marked roadmap in /docs/api-versioning are best-effort.
Tier-by-tier SLA targets, the windowing methodology, the credit bands, and the dispute process all live in /docs/sla-policy, which is the authoritative reference. Tier identifiers used there match the AccountTier enum exactly.
Reading SLA data from the API
GET /v1/status/sla
→ {
  "data": [
    {
      "target": "api.driftstack.dev",
      "uptimePct": 99.99,
      "totalProbes": 43200,
      "okCount": 43196,
      "failCount": 4,
      "lastProbeAt": "2026-05-11T13:00:00Z",
      "lastFailureAt": "2026-05-09T14:23:00Z",
      "windowStart": "2026-04-11T13:00:00Z",
      "windowEnd": "2026-05-11T13:00:00Z"
    }
  ]
}
No auth: the status surface is public. The window is a rolling 30 days; its length is fixed. Field names are camelCase (the SLA report serialises its internal model directly).
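The counters in the sample response are consistent with each other: 43,200 probes is one probe per minute for 30 days, and 43,196 / 43,200 rounds to 99.99%. A quick sanity check on an entry might look like the sketch below. The function names are hypothetical, and rounding to two decimal places is an assumption inferred from the sample, not a documented guarantee.

```python
def uptime_pct(ok_count: int, total_probes: int) -> float:
    """Uptime as a percentage of successful probes, rounded to 2 places.

    The 2-decimal rounding is an assumption based on the sample response.
    """
    if total_probes <= 0:
        raise ValueError("total_probes must be positive")
    return round(100.0 * ok_count / total_probes, 2)


def entry_is_consistent(entry: dict) -> bool:
    """True when okCount + failCount == totalProbes and uptimePct matches."""
    counts_add_up = entry["okCount"] + entry["failCount"] == entry["totalProbes"]
    pct_matches = uptime_pct(entry["okCount"], entry["totalProbes"]) == entry["uptimePct"]
    return counts_add_up and pct_matches


# The entry from the sample response above (timestamps omitted).
sample = {
    "target": "api.driftstack.dev",
    "uptimePct": 99.99,
    "totalProbes": 43200,
    "okCount": 43196,
    "failCount": 4,
}

print(uptime_pct(sample["okCount"], sample["totalProbes"]))
print(entry_is_consistent(sample))
```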
Postmortems
Public postmortems for Critical + Major incidents live on the public status page, attached to the resolved incident entry. Each follows the same template: timeline, root cause, what we changed to prevent recurrence. Postmortems are blameless and detailed enough to be useful — we'd rather over-share than under-share.
Reporting a problem
- Acute outage: if the status page doesn't already show the incident, email [email protected]. That goes straight to on-call.
- Non-acute bug / weird behaviour: [email protected] with a session id + timestamp.
- Security: [email protected] — PGP available on the page. See also /docs/api-security-headers for the response-header reference reviewers ask about.